This function retrieves cross-references to external databases by Ensembl identifier. The data is returned as a tibble in which each row is one cross-reference for the provided Ensembl identifier. See the Value section below for details about each column.

Usage

get_xrefs_by_ensembl_id(
  species_name,
  ensembl_id,
  all_levels = FALSE,
  ensembl_db = "core",
  external_db = "",
  feature = "",
  verbose = FALSE,
  warnings = TRUE,
  progress_bar = TRUE
)

Arguments

species_name

The species name, i.e., the scientific name in lowercase with spaces replaced by underscores. Examples: 'homo_sapiens' (human), 'ovis_aries' (domestic sheep) or 'capra_hircus' (goat).

ensembl_id

An Ensembl stable identifier, e.g. "ENSG00000248378".

all_levels

A logical vector. Set to TRUE to find all genetic features linked to the stable ID and fetch all external references for them. Specifying this on a gene will also return values from its transcripts and translations.

ensembl_db

Restrict the search to an Ensembl database: typically one of 'core', 'rnaseq', 'cdna', 'funcgen' and 'otherfeatures'.

external_db

External database to be filtered by. By default no filtering is applied.

feature

Restrict the search to a feature type: gene ('gene'), exon ('exon'), transcript ('transcript'), or protein ('translation').

verbose

Whether to be verbose about the HTTP requests and the status of their responses.

warnings

Whether to show warnings.

progress_bar

Whether to show a progress bar.

Value

A tibble of 12 variables:

species_name

Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species. It is the scientific name in lowercase with spaces replaced by underscores, e.g., 'homo_sapiens'.

ensembl_id

An Ensembl stable identifier, e.g. "ENSG00000248378".

ensembl_db

Ensembl database.

primary_id

Primary identification in external database.

display_id

Display identification in external database.

external_db_name

External database name.

external_db_display_name

External database display name.

version

TODO

info_type

There are two types of external cross references (XRef): direct ('DIRECT') or dependent ('DEPENDENT'). A direct cross reference is one that can be directly linked to a gene, transcript or translation object in Ensembl Genomes by synonymy or sequence similarity. A dependent cross reference is one that is transitively linked to the object via the direct cross reference. The value can also be 'UNMAPPED' for unmapped cross references, or 'PROJECTION' for TODO.

info_text

TODO

synonyms

Other names or acronyms used to refer to the external database entry. Note that this column is of the list type.

description

Brief description of the external database entry.
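Because synonyms is a list column, it usually needs to be unnested before it can be filtered or joined on. The following is a minimal sketch of one way to do that with tidyr. It uses a hand-built tibble as a hypothetical stand-in for the function's output (the identifiers and synonyms are made up), and it assumes each list element is a character vector.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for the returned tibble; values are illustrative only.
xrefs <- tibble(
  display_id = c("ID_A", "ID_B"),
  synonyms = list(c("syn1", "syn2"), character(0))
)

# One row per synonym. keep_empty = TRUE keeps entries that have no synonyms,
# filling the synonyms column with NA for them.
xrefs_long <- unnest(xrefs, synonyms, keep_empty = TRUE)
xrefs_long
```

Dropping keep_empty = TRUE would instead discard cross-references that have no synonyms.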

Ensembl REST API endpoints

get_xrefs_by_ensembl_id() makes GET requests to /xrefs/id/:id.
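To illustrate what such a request might look like, the sketch below builds (but does not send) a URL for this endpoint in base R. The helper build_xrefs_url() is hypothetical and not part of the package; the base URL and the query parameter names (content-type, all_levels, species) are assumptions taken from the public Ensembl REST API, and the package may construct its requests differently.

```r
# Hypothetical helper: builds a request URL for /xrefs/id/:id without sending it.
# Assumption: parameter names follow the public Ensembl REST API documentation.
build_xrefs_url <- function(ensembl_id, species = NULL, all_levels = FALSE,
                            base_url = "https://rest.ensembl.org") {
  query <- c(
    "content-type" = "application/json",
    all_levels = if (all_levels) "1" else "0"
  )
  # Only add the species parameter when one is supplied.
  if (!is.null(species)) query <- c(query, species = species)
  query_string <- paste0(names(query), "=", query, collapse = "&")
  paste0(base_url, "/xrefs/id/", ensembl_id, "?", query_string)
}

url <- build_xrefs_url("ENSG00000248378", species = "homo_sapiens",
                       all_levels = TRUE)
url
```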

Examples

get_xrefs_by_ensembl_id('human', 'ENSG00000248378')
#> # A tibble: 1 × 12
#>   species_name ensembl_id      ensembl_db primary_id display_id external_db_name
#>   <chr>        <chr>           <chr>      <chr>      <chr>      <chr>           
#> 1 human        ENSG00000248378 core       ENSG00000… ENSG00000… ArrayExpress    
#> # ℹ 6 more variables: external_db_display_name <chr>, version <chr>,
#> #   info_type <chr>, info_text <chr>, synonyms <list>, description <lgl>

get_xrefs_by_ensembl_id('human', 'ENSG00000248378', all_levels = TRUE)
#> # A tibble: 3 × 12
#>   species_name ensembl_id      ensembl_db primary_id display_id external_db_name
#>   <chr>        <chr>           <chr>      <chr>      <chr>      <chr>           
#> 1 human        ENSG00000248378 core       ENSG00000… ENSG00000… ArrayExpress    
#> 2 human        ENSG00000248378 core       uc063csj.1 uc063csj.1 UCSC            
#> 3 human        ENSG00000248378 core       URS000007… URS000007… RNAcentral      
#> # ℹ 6 more variables: external_db_display_name <chr>, version <chr>,
#> #   info_type <chr>, info_text <chr>, synonyms <list>, description <lgl>