This function retrieves a few extra details about one or more toplevel sequences. These sequences correspond to genomic regions in the genome assembly that are not a component of another sequence region. Thus, toplevel sequences are chromosomes and any unlocalised or unplaced scaffolds.
Usage
get_toplevel_sequence_info(
  species_name = "homo_sapiens",
  toplevel_sequence = c(1:22, "X", "Y", "MT"),
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)
Arguments
- species_name
The species name, i.e., the scientific name, all letters lowercase and spaces replaced by underscores. Examples: 'homo_sapiens' (human), 'ovis_aries' (domestic sheep) or 'capra_hircus' (goat).
- toplevel_sequence
One or more toplevel sequence names, e.g., chromosome names such as "1", "X", or "Y", or non-chromosome sequences, e.g., a scaffold such as "KI270757.1".
- verbose
Whether to be chatty.
- warnings
Whether to print warnings.
- progress_bar
Whether to show a progress bar.
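The `species_name` argument expects the scientific name in a specific format. The sketch below shows one way to derive it from a conventionally written scientific name. `to_ensembl_species_name()` is a hypothetical helper, not part of this package; it only applies the rule stated above (lowercase, spaces replaced by underscores).

```r
# Hypothetical helper (not part of this package): convert a scientific name
# such as "Homo sapiens" into the format expected by `species_name`,
# i.e., all letters lowercase and spaces replaced by underscores.
to_ensembl_species_name <- function(scientific_name) {
  # trimws() drops leading/trailing whitespace; "\\s+" turns each run of
  # internal whitespace into a single underscore
  gsub("\\s+", "_", tolower(trimws(scientific_name)))
}

# returns c("homo_sapiens", "ovis_aries", "capra_hircus")
to_ensembl_species_name(c("Homo sapiens", "Ovis aries", "Capra hircus"))

# Possible use with get_toplevel_sequence_info() (requires network access
# to Ensembl, so it is left commented out here):
# get_toplevel_sequence_info(
#   species_name = to_ensembl_species_name("Ovis aries"),
#   toplevel_sequence = c("1", "X")
# )
```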
Value
A tibble, each row being a toplevel sequence, of 8 variables:
species_name
Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species by name. It is the scientific name formatted in lowercase with spaces replaced by underscores, e.g., 'homo_sapiens'.
toplevel_sequence
Name of the toplevel sequence.
is_chromosome
A logical indicating whether the toplevel sequence is a chromosome (TRUE) or not (FALSE).
coordinate_system
Coordinate system type, e.g., "chromosome" or "scaffold".
assembly_exception_type
Assembly exception type.
is_circular
A logical indicating whether the toplevel sequence is a circular sequence (TRUE) or not (FALSE).
assembly_name
Assembly name.
length
Genomic length of the toplevel sequence, in base pairs.
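Because each row carries `is_chromosome` and `length`, the returned tibble can be summarised directly. The sketch below is a hypothetical helper (not part of this package) that totals the length of chromosomal versus non-chromosomal toplevel sequences; it assumes only the two documented columns and works on any data frame that has them.

```r
# Hypothetical helper (not part of this package): total length, in base
# pairs, of chromosomal vs non-chromosomal toplevel sequences in a tibble
# returned by get_toplevel_sequence_info().
total_length_by_type <- function(info) {
  stopifnot(all(c("is_chromosome", "length") %in% names(info)))
  # `length` is an integer column; summing whole-genome lengths can exceed
  # R's integer range (about 2.1e9), so convert to double first
  lengths <- as.numeric(info$length)
  c(
    chromosome = sum(lengths[info$is_chromosome], na.rm = TRUE),
    other = sum(lengths[!info$is_chromosome], na.rm = TRUE)
  )
}

# Possible use (requires network access to Ensembl):
# info <- get_toplevel_sequence_info(progress_bar = FALSE)
# total_length_by_type(info)
```

The conversion to double matters for genomes such as human, whose combined chromosome length is larger than the largest integer R can represent; summing the raw integer column would return `NA` with an overflow warning.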
Examples
# Get details about human chromosomes (default)
get_toplevel_sequence_info()
#> # A tibble: 25 × 8
#> species_name toplevel_sequence is_chromosome coordinate_system
#> <chr> <chr> <lgl> <chr>
#> 1 homo_sapiens 1 TRUE chromosome
#> 2 homo_sapiens 2 TRUE chromosome
#> 3 homo_sapiens 3 TRUE chromosome
#> 4 homo_sapiens 4 TRUE chromosome
#> 5 homo_sapiens 5 TRUE chromosome
#> 6 homo_sapiens 6 TRUE chromosome
#> 7 homo_sapiens 7 TRUE chromosome
#> 8 homo_sapiens 8 TRUE chromosome
#> 9 homo_sapiens 9 TRUE chromosome
#> 10 homo_sapiens 10 TRUE chromosome
#> # ℹ 15 more rows
#> # ℹ 4 more variables: assembly_exception_type <chr>, is_circular <lgl>,
#> # assembly_name <chr>, length <int>
# Get details about a scaffold
# (To find available toplevel sequences to query use the function
# `get_toplevel_sequences()`)
get_toplevel_sequence_info(species_name = 'homo_sapiens', toplevel_sequence = 'KI270757.1')
#> # A tibble: 1 × 8
#> species_name toplevel_sequence is_chromosome coordinate_system
#> <chr> <chr> <lgl> <chr>
#> 1 homo_sapiens KI270757.1 FALSE scaffold
#> # ℹ 4 more variables: assembly_exception_type <chr>, is_circular <lgl>,
#> # assembly_name <chr>, length <int>
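When querying many toplevel sequences at once, it can help to check which of the requested names actually appear in the result. How the function treats names that do not exist for a species is not stated on this page, so the sketch below simply compares the request against the `toplevel_sequence` column. `missing_toplevel_sequences()` is a hypothetical helper, not part of this package.

```r
# Hypothetical helper (not part of this package): report which requested
# toplevel sequence names are absent from the `toplevel_sequence` column
# of a tibble returned by get_toplevel_sequence_info().
missing_toplevel_sequences <- function(requested, info) {
  # as.character() mirrors the default argument, where c(1:22, "X", ...)
  # is coerced to a character vector anyway
  setdiff(as.character(requested), info$toplevel_sequence)
}

# Possible use (requires network access to Ensembl):
# requested <- c(1:22, "X", "Y", "MT")
# info <- get_toplevel_sequence_info(toplevel_sequence = requested)
# missing_toplevel_sequences(requested, info)
```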