This function retrieves toplevel sequences. These sequences correspond to genomic regions in the genome assembly that are not a component of another sequence region. Thus, toplevel sequences will be chromosomes and any unlocalised or unplaced scaffolds.

Usage

get_toplevel_sequences(
  species_name = "homo_sapiens",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name in all lowercase letters, with spaces replaced by underscores. Examples: 'homo_sapiens' (human), 'ovis_aries' (domestic sheep) or 'capra_hircus' (goat).

verbose

Whether to be chatty, i.e., to print informational messages while running.

warnings

Whether to print warnings.

progress_bar

Whether to show a progress bar.
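
The formatting rule described for species_name (lowercase, spaces replaced by underscores) can be sketched as a small helper. Note that to_ensembl_species_name() is a hypothetical function written for illustration and is not part of this package. It only applies the textual rule and does not check that Ensembl actually knows the resulting name.

```r
# Hypothetical helper (not provided by the package): apply the naming rule
# described for `species_name` to a scientific name.
to_ensembl_species_name <- function(scientific_name) {
  name <- trimws(scientific_name)          # drop leading/trailing whitespace
  name <- gsub("[[:space:]]+", "_", name)  # runs of whitespace become one underscore
  tolower(name)                            # all letters lowercase
}

to_ensembl_species_name("Homo sapiens")
to_ensembl_species_name("Ovis aries")
```

The result could then be passed as the species_name argument, e.g., get_toplevel_sequences(to_ensembl_species_name("Capra hircus")).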

Value

A tibble, each row being a toplevel sequence, of 4 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species. It is the scientific name written in lowercase with spaces replaced by underscores, e.g., 'homo_sapiens'.

coord_system

Coordinate system type.

toplevel_sequence

Name of the toplevel sequence.

length

Genomic length of the toplevel sequence in base pairs.

Examples

# Get toplevel sequences for the human genome (default)
get_toplevel_sequences()
#> # A tibble: 194 × 4
#>    species_name coord_system toplevel_sequence length
#>    <chr>        <chr>        <chr>              <int>
#>  1 homo_sapiens scaffold     KI270757.1         71251
#>  2 homo_sapiens scaffold     KI270741.1        157432
#>  3 homo_sapiens scaffold     KI270756.1         79590
#>  4 homo_sapiens scaffold     KI270730.1        112551
#>  5 homo_sapiens scaffold     KI270739.1         73985
#>  6 homo_sapiens scaffold     KI270738.1         99375
#>  7 homo_sapiens scaffold     KI270737.1        103838
#>  8 homo_sapiens scaffold     KI270312.1           998
#>  9 homo_sapiens scaffold     KI270591.1          5796
#> 10 homo_sapiens scaffold     KI270371.1          2805
#> # ℹ 184 more rows

# Get toplevel sequences for Caenorhabditis elegans
get_toplevel_sequences('caenorhabditis_elegans')
#> # A tibble: 7 × 4
#>   species_name           coord_system toplevel_sequence   length
#>   <chr>                  <chr>        <chr>                <int>
#> 1 caenorhabditis_elegans chromosome   I                 15072434
#> 2 caenorhabditis_elegans chromosome   II                15279421
#> 3 caenorhabditis_elegans chromosome   III               13783801
#> 4 caenorhabditis_elegans chromosome   IV                17493829
#> 5 caenorhabditis_elegans chromosome   V                 20924180
#> 6 caenorhabditis_elegans chromosome   X                 17718942
#> 7 caenorhabditis_elegans chromosome   MtDNA                13794
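
Because the result is a table with one row per toplevel sequence, it can be summarised with ordinary data frame operations. The following is a minimal sketch under two assumptions: the values are copied by hand from the C. elegans output above, and a base R data.frame stands in for the tibble so that the code runs without network access or extra packages. The object names (celegans, total_length, longest, by_coord_system) are illustrative only.

```r
# Values copied from the C. elegans example output above; a plain data.frame
# is used here in place of the tibble returned by get_toplevel_sequences().
celegans <- data.frame(
  species_name = "caenorhabditis_elegans",
  coord_system = "chromosome",
  toplevel_sequence = c("I", "II", "III", "IV", "V", "X", "MtDNA"),
  length = c(15072434L, 15279421L, 13783801L, 17493829L,
             20924180L, 17718942L, 13794L),
  stringsAsFactors = FALSE
)

# Total length of all toplevel sequences, in base pairs
total_length <- sum(celegans$length)

# Name of the longest toplevel sequence
longest <- celegans$toplevel_sequence[which.max(celegans$length)]

# Summed length per coordinate system (only "chromosome" in this example)
by_coord_system <- aggregate(length ~ coord_system, data = celegans, FUN = sum)

total_length
longest
by_coord_system
```

The same expressions should apply to the tibble returned by the function itself, since they only rely on the four documented columns.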