
This function retrieves details about one or more toplevel sequences. A toplevel sequence is a genomic region in the genome assembly that is not a component of another sequence region. Toplevel sequences are therefore chromosomes and any unlocalised or unplaced scaffolds.

Usage

get_toplevel_sequence_info(
  species_name = "homo_sapiens",
  toplevel_sequence = c(1:22, "X", "Y", "MT"),
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name in all lowercase letters with spaces replaced by underscores. Examples: 'homo_sapiens' (human), 'ovis_aries' (domestic sheep) or 'capra_hircus' (goat).

toplevel_sequence

One or more toplevel sequence names, e.g., chromosome names such as "1", "X", or "Y", or non-chromosome sequences such as the scaffold "KI270757.1".

verbose

Whether to be verbose, i.e., to print informative messages.

warnings

Whether to print warnings.

progress_bar

Whether to show a progress bar.
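
The formats expected by species_name and toplevel_sequence can be sketched in base R. This is only an illustration: the helper to_ensembl_species_name() is hypothetical and not part of the package; it assumes the input is a plain scientific name such as "Ovis aries".

```r
# Hypothetical helper (not part of the package): convert a scientific name
# such as "Ovis aries" to lowercase with whitespace replaced by underscores.
to_ensembl_species_name <- function(scientific_name) {
  gsub("\\s+", "_", tolower(trimws(scientific_name)))
}

species <- to_ensembl_species_name("Ovis aries")
species
#> [1] "ovis_aries"

# Mixing integers and strings in c() coerces every element to character,
# so the default `toplevel_sequence` is a character vector of 25 names.
sequences <- c(1:22, "X", "Y", "MT")
is.character(sequences)
#> [1] TRUE
length(sequences)
#> [1] 25
```

Because the default vector holds 25 names, a call with the default arguments returns one row per name, i.e., 25 rows.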

Value

A tibble, each row being a toplevel sequence, of 8 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species. It is the scientific name in lowercase with spaces replaced by underscores, e.g., 'homo_sapiens'.

toplevel_sequence

Name of the toplevel sequence.

is_chromosome

A logical indicating whether the toplevel sequence is a chromosome (TRUE) or not (FALSE).

coordinate_system

Coordinate system type.

assembly_exception_type

Assembly exception type.

is_circular

A logical indicating whether the toplevel sequence is a circular sequence (TRUE) or not (FALSE).

assembly_name

Assembly name.

length

Genomic length of the toplevel sequence, in base pairs.

Examples

# Get details about human chromosomes (default)
get_toplevel_sequence_info()
#> # A tibble: 25 × 8
#>    species_name toplevel_sequence is_chromosome coordinate_system
#>    <chr>        <chr>             <lgl>         <chr>            
#>  1 homo_sapiens 1                 TRUE          chromosome       
#>  2 homo_sapiens 2                 TRUE          chromosome       
#>  3 homo_sapiens 3                 TRUE          chromosome       
#>  4 homo_sapiens 4                 TRUE          chromosome       
#>  5 homo_sapiens 5                 TRUE          chromosome       
#>  6 homo_sapiens 6                 TRUE          chromosome       
#>  7 homo_sapiens 7                 TRUE          chromosome       
#>  8 homo_sapiens 8                 TRUE          chromosome       
#>  9 homo_sapiens 9                 TRUE          chromosome       
#> 10 homo_sapiens 10                TRUE          chromosome       
#> # ℹ 15 more rows
#> # ℹ 4 more variables: assembly_exception_type <chr>, is_circular <lgl>,
#> #   assembly_name <chr>, length <int>

# Get details about a scaffold
# (To find available toplevel sequences to query use the function
# `get_toplevel_sequences()`)
get_toplevel_sequence_info(species_name = 'homo_sapiens', toplevel_sequence = 'KI270757.1')
#> # A tibble: 1 × 8
#>   species_name toplevel_sequence is_chromosome coordinate_system
#>   <chr>        <chr>             <lgl>         <chr>            
#> 1 homo_sapiens KI270757.1        FALSE         scaffold         
#> # ℹ 4 more variables: assembly_exception_type <chr>, is_circular <lgl>,
#> #   assembly_name <chr>, length <int>