Get linkage disequilibrium data for variants
Source: R/linkage_disequilibrium.R
Gets linkage disequilibrium data for variants from the Ensembl REST API. There are four ways to query:
- Genomic window centred on variants:
get_ld_variants_by_window(variant_id, genomic_window_size, ...)
- Pairs of variants:
get_ld_variants_by_pair(variant_id1, variant_id2, ...)
- Genomic range:
get_ld_variants_by_range(genomic_range, ...)
- All pair combinations of variants:
get_ld_variants_by_pair_combn(variant_id, ...)
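The pairs covered by the last mode can be previewed with base R's utils::combn(). The sketch below assumes that get_ld_variants_by_pair_combn() considers every unordered pair of distinct identifiers in the order combn() produces them. This matches the row order of the pair-combination example in Examples, but it is an illustration, not the package's internal code.

```r
# Preview of the pairs that an all-pair-combinations query would cover:
# every unordered pair of distinct variant identifiers.
variant_id <- c("rs6978506", "rs12718102", "rs13307200")

# combn() returns a 2 x choose(n, 2) matrix; each column is one pair.
pairs <- utils::combn(variant_id, 2)

# Split the matrix rows into two vectors, in the shape that the
# variant_id1 and variant_id2 arguments of get_ld_variants_by_pair() take.
variant_id1 <- pairs[1, ]
variant_id2 <- pairs[2, ]

print(data.frame(variant_id1, variant_id2))
```

With n identifiers there are choose(n, 2) pairs, so the number of pairs grows quadratically with the length of variant_id.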
Usage
get_ld_variants_by_window(
  variant_id,
  genomic_window_size = 500L,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair(
  variant_id1,
  variant_id2,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_range(
  genomic_range,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

get_ld_variants_by_pair_combn(
  variant_id,
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  d_prime = 0,
  r_squared = 0.05,
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
Arguments
- variant_id
Variant identifiers, e.g., 'rs123'. This argument is to be used with either function get_ld_variants_by_window() or get_ld_variants_by_pair_combn(). In the case of get_ld_variants_by_pair_combn(), all pairwise combinations of elements of variant_id are used to define pairs of variants for querying. Note that this argument is not the same as variant_id1 or variant_id2, which are to be used with function get_ld_variants_by_pair().
- genomic_window_size
An integer vector specifying the genomic window size in kilobases (kb) around the variant indicated in variant_id. This argument is to be used with function get_ld_variants_by_window(). At the moment, the Ensembl REST API does not allow values greater than 500 kb. A window size of 500 means looking 250 kb upstream and 250 kb downstream of the variant passed as variant_id. The minimum value for this argument is 1L, not 0L.
- species_name
The species name, i.e., the scientific name with all letters in lowercase and spaces replaced by underscores. Examples: 'homo_sapiens' (human), 'ovis_aries' (domestic sheep) or 'capra_hircus' (goat).
- population
Population for which to compute linkage disequilibrium. See get_populations() for how to find the available populations for a species.
- d_prime
\(D'\) is a measure of linkage disequilibrium. d_prime defines a cut-off threshold: only variant pairs with \(D' \ge\) d_prime are returned.
- r_squared
\(r^2\) is a measure of linkage disequilibrium. r_squared defines a cut-off threshold: only variant pairs with \(r^2 \ge\) r_squared are returned. The lower bound for r_squared is 0.05, not 0; the upper bound is 1.
- verbose
Whether to be verbose about the HTTP requests and the status of their responses.
- warnings
Whether to show warnings.
- progress_bar
Whether to show a progress bar.
- variant_id1
The first variant of a pair of variants. Used with variant_id2. Note that this argument is not the same as variant_id. This argument is to be used with function get_ld_variants_by_pair().
- variant_id2
The second variant of a pair of variants. Used with variant_id1. Note that this argument is not the same as variant_id. This argument is to be used with function get_ld_variants_by_pair().
- genomic_range
Genomic range formatted as a string "chr:start..end", e.g., "X:1..10000". See function genomic_range() to easily create these ranges from vectors of start and end positions. This argument is to be used with function get_ld_variants_by_range().
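The window and range arguments can be related with a little arithmetic. Following the description of genomic_window_size (a window of w kb spans w/2 kb on each side of the variant), the sketch below computes the interval a window covers and formats it in the "chr:start..end" style used by genomic_range. The helper window_to_range() and the variant position in the usage line are hypothetical, and clamping the start at 1 assumes 1-based coordinates; for building range strings in practice, the package's own genomic_range() function is the one to use.

```r
# Hypothetical helper (not part of the package): the interval covered by a
# window of `genomic_window_size` kb centred on a variant at `position`,
# formatted as a "chr:start..end" string.
window_to_range <- function(chr, position, genomic_window_size = 500L) {
  position <- as.integer(position)
  genomic_window_size <- as.integer(genomic_window_size)

  # Documented bounds for genomic_window_size: 1L to 500L (kb).
  stopifnot(all(genomic_window_size >= 1L), all(genomic_window_size <= 500L))

  # Half of the window goes upstream and half downstream, in base pairs.
  half_bp <- (genomic_window_size * 1000L) %/% 2L

  # Assumption: 1-based coordinates, so the start is clamped at 1.
  start <- pmax(1L, position - half_bp)
  end <- position + half_bp

  sprintf("%s:%d..%d", chr, start, end)
}

# Hypothetical variant position on chromosome 7, with a 1 kb window:
window_to_range("7", 100250L, genomic_window_size = 1L)
```

This only illustrates the interval arithmetic. A window query pairs the given variant with its neighbours (in the first example, every row has variant_id1 equal to rs123), whereas a range query reports pairs among the variants inside the range, so the two generally return different results even over the same interval.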
Value
A tibble of 6 variables:
species_name
Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name written in lowercase, with spaces converted to underscores, e.g., 'homo_sapiens'.
population
Population for which to compute linkage disequilibrium.
variant_id1
First variant identifier.
variant_id2
Second variant identifier.
d_prime
\(D'\) between the two variants.
r_squared
\(r^2\) between the two variants.
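Because d_prime and r_squared act as lower cut-offs (\(\ge\)), a result retrieved with permissive thresholds can be narrowed afterwards without querying again (loosening them, of course, requires a new query). The sketch below re-types the output of the first example as a plain data.frame, with values rounded as printed, and applies stricter cut-offs in base R; the cut-off values 0.5 and 0.8 are arbitrary choices for illustration.

```r
# Output of the first example, re-typed from the printed tibble
# (values are rounded as printed; illustration only).
ld <- data.frame(
  species_name = "homo_sapiens",
  population = "1000GENOMES:phase_3:CEU",
  variant_id1 = "rs123",
  variant_id2 = c("rs114", "rs124", "rs122", "rs12536724", "rs115", "rs10239961"),
  r_squared = c(0.475, 0.722, 0.722, 0.255, 0.721, 0.255),
  d_prime = c(0.703, 1.00, 1.00, 1.00, 1.00, 1.00)
)

# Same ">=" semantics as the d_prime and r_squared arguments,
# applied locally with stricter cut-offs.
strict <- ld[ld$r_squared >= 0.5 & ld$d_prime >= 0.8, ]
print(strict$variant_id2)
```

The same logical row subset works directly on the returned tibble.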
Examples
# Retrieve variants in LD with rs123 within a 1 kb window:
# 1 kb means 500 bp upstream and 500 bp downstream of the variant.
get_ld_variants_by_window('rs123', genomic_window_size = 1L)
#> # A tibble: 6 × 6
#> species_name population variant_id1 variant_id2 r_squared d_prime
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs123 rs114 0.475 0.703
#> 2 homo_sapiens 1000GENOMES:phase_3:CEU rs123 rs124 0.722 1.00
#> 3 homo_sapiens 1000GENOMES:phase_3:CEU rs123 rs122 0.722 1.00
#> 4 homo_sapiens 1000GENOMES:phase_3:CEU rs123 rs12536724 0.255 1.00
#> 5 homo_sapiens 1000GENOMES:phase_3:CEU rs123 rs115 0.721 1.00
#> 6 homo_sapiens 1000GENOMES:phase_3:CEU rs123 rs10239961 0.255 1.00
# Retrieve LD measures for pairs of variants:
get_ld_variants_by_pair(
variant_id1 = c('rs123', 'rs35439278'),
variant_id2 = c('rs122', 'rs35174522')
)
#> # A tibble: 2 × 6
#> species_name population variant_id1 variant_id2 r_squared d_prime
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs123 rs122 0.722 1.00
#> 2 homo_sapiens 1000GENOMES:phase_3:CEU rs35439278 rs35174522 0.0973 1.00
# Retrieve variants in LD within a genomic range
get_ld_variants_by_range('7:100000..100500')
#> # A tibble: 1 × 6
#> species_name population variant_id1 variant_id2 r_squared d_prime
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs35439278 rs35174522 0.0973 1.00
# Retrieve LD measures for all pairwise combinations of variants:
get_ld_variants_by_pair_combn(c('rs6978506', 'rs12718102', 'rs13307200'))
#> # A tibble: 3 × 6
#> species_name population variant_id1 variant_id2 r_squared d_prime
#> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 homo_sapiens 1000GENOMES:phase_3:CEU rs6978506 rs12718102 0.111 0.999
#> 2 homo_sapiens 1000GENOMES:phase_3:CEU rs6978506 rs13307200 0.320 1.00
#> 3 homo_sapiens 1000GENOMES:phase_3:CEU rs12718102 rs13307200 0.266 0.875