The MGI report MRK_ENSEMBL.rpt provides associations between MGI genetic markers and Ensembl identifiers.

To read this report using the key "ensembl_ids", use the following code:

# To read all records (more than 70,000), use `read_report("ensembl_ids")`.
(assoc_ensembl_ids <- read_report(report_key = "ensembl_ids", n_max = 30L))
## # A tibble: 30 × 13
##    marker_id   marker_symbol marker_name   feature_type chromosome  start    end
##    <chr>       <chr>         <chr>         <fct>        <fct>       <int>  <int>
##  1 MGI:1919275 1600012P17Rik RIKEN cDNA 1… lncRNA gene  1          1.59e8 1.59e8
##  2 MGI:1914753 1700001G17Rik RIKEN cDNA 1… lncRNA gene  1          3.37e7 3.37e7
##  3 MGI:1916606 1700003I22Rik RIKEN cDNA 1… lncRNA gene  1          5.61e7 5.61e7
##  4 MGI:1925628 1700006P03Rik RIKEN cDNA 1… lncRNA gene  1          1.37e8 1.37e8
##  5 MGI:1916558 1700007P06Rik RIKEN cDNA 1… lncRNA gene  1          1.87e8 1.87e8
##  6 MGI:1923817 1700012E03Rik RIKEN cDNA 1… lncRNA gene  1          1.20e8 1.20e8
##  7 MGI:1919458 1700016L21Rik RIKEN cDNA 1… lncRNA gene  1          8.04e7 8.05e7
##  8 MGI:1916647 1700019A02Rik RIKEN cDNA 1… protein cod… 1          5.32e7 5.32e7
##  9 MGI:1914330 1700019D03Rik RIKEN cDNA 1… protein cod… 1          5.30e7 5.30e7
## 10 MGI:1922796 1700019P21Rik RIKEN cDNA 1… lncRNA gene  1          1.39e8 1.39e8
## # ℹ 20 more rows
## # ℹ 6 more variables: strand <fct>, genetic_map_pos <dbl>,
## #   ensembl_gen_id <chr>, ensembl_trp_id <list>, ensembl_prt_id <list>,
## #   biotype <list>

Ensembl identifiers

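These variables hold the Ensembl identifiers and biotypes associated with each genetic marker. As the column types above show, ensembl_gen_id is a character column, whereas ensembl_trp_id, ensembl_prt_id and biotype are list-columns that can hold several values per marker: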

  • ensembl_gen_id: Ensembl gene identifier(s)
  • ensembl_trp_id: Ensembl transcript identifier(s)
  • ensembl_prt_id: Ensembl protein identifier(s)
  • biotype: Ensembl’s biotype

Example: MGI:1919458 (lncRNA)

Marker MGI:1919458, a lncRNA gene, is associated with multiple Ensembl transcripts:

assoc_ensembl_ids |>
  dplyr::filter(marker_id == "MGI:1919458") |>
  dplyr::select("marker_id", "marker_symbol", "ensembl_gen_id", "ensembl_trp_id") |>
  tidyr::unnest("ensembl_trp_id")
## # A tibble: 9 × 4
##   marker_id   marker_symbol ensembl_gen_id     ensembl_trp_id    
##   <chr>       <chr>         <chr>              <chr>             
## 1 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000187497
## 2 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000189139
## 3 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000242693
## 4 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000244035
## 5 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000246306
## 6 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000246474
## 7 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000247492
## 8 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000247629
## 9 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000247689

Example: MGI:1914330 (protein coding)

A second example is the protein-coding gene MGI:1914330, which is associated with both transcript and protein identifiers.

Transcripts:

# Transcripts
assoc_ensembl_ids |>
  dplyr::filter(marker_id == "MGI:1914330") |>
  dplyr::select("marker_id", "marker_symbol", "ensembl_gen_id", "ensembl_trp_id") |>
  tidyr::unnest("ensembl_trp_id")
## # A tibble: 7 × 4
##   marker_id   marker_symbol ensembl_gen_id     ensembl_trp_id    
##   <chr>       <chr>         <chr>              <chr>             
## 1 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000050567
## 2 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000114492
## 3 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000114493
## 4 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000186266
## 5 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000190726
## 6 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000190831
## 7 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000191441

Proteins:

# Proteins
assoc_ensembl_ids |>
  dplyr::filter(marker_id == "MGI:1914330") |>
  dplyr::select("marker_id", "marker_symbol", "ensembl_gen_id", "ensembl_prt_id") |>
  tidyr::unnest("ensembl_prt_id")
## # A tibble: 7 × 4
##   marker_id   marker_symbol ensembl_gen_id     ensembl_prt_id    
##   <chr>       <chr>         <chr>              <chr>             
## 1 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000055413
## 2 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000110136
## 3 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000110137
## 4 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000139750
## 5 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000140160
## 6 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000140273
## 7 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000140530
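
When only the number of associated transcripts is needed, a list-column can be summarised without unnesting it. The sketch below does not call `read_report()`; it rebuilds a two-row stand-in by hand from the identifiers printed above. The object `toy` and the column `n_transcripts` are illustrative names, not part of the package.

```r
library(tibble)
library(dplyr)

# Hand-built stand-in for two rows of the report (hypothetical object `toy`);
# the identifiers are copied from the outputs shown above.
toy <- tibble(
  marker_id = c("MGI:1919458", "MGI:1914330"),
  ensembl_trp_id = list(
    c("ENSMUST00000187497", "ENSMUST00000189139", "ENSMUST00000242693",
      "ENSMUST00000244035", "ENSMUST00000246306", "ENSMUST00000246474",
      "ENSMUST00000247492", "ENSMUST00000247629", "ENSMUST00000247689"),
    c("ENSMUST00000050567", "ENSMUST00000114492", "ENSMUST00000114493",
      "ENSMUST00000186266", "ENSMUST00000190726", "ENSMUST00000190831",
      "ENSMUST00000191441")
  )
)

# lengths() returns the number of elements held in each list entry
counts <- toy |>
  mutate(n_transcripts = lengths(ensembl_trp_id)) |>
  select(marker_id, n_transcripts)

counts
```

The same `mutate()` call should work on `assoc_ensembl_ids` itself, assuming its list-column entries are character vectors as the unnested outputs above suggest.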

Variables

marker_status

marker_status: genetic marker status is a factor of two levels: 'O' for official, and 'W' for withdrawn. Official indicates a currently in-use genetic marker, whereas withdrawn means that the symbol or name was once approved but has since been replaced.

marker_type

marker_type: genetic marker type is a factor of 10 levels: Gene, GeneModel, Pseudogene, DNA Segment, Transgene, QTL, Cytogenetic Marker, BAC/YAC end, Complex/Cluster/Region, Other Genome Feature. See ?marker_type_definitions for the meaning of each type.

marker_id

marker_id: MGI accession identifier. A unique alphanumeric character string that is used to unambiguously identify a particular record in the Mouse Genome Informatics database. The format is MGI:nnnnnn, where n is a digit.
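
The identifier format can be checked with a regular expression. This is a minimal sketch in base R; the helper `is_mgi_id()` is a hypothetical name, and the pattern assumes the prefix `MGI:` followed by one or more digits, since the identifiers shown in this report vary in digit count.

```r
# Hypothetical helper, not part of the package.
# Assumption: an MGI accession is "MGI:" followed by one or more digits.
is_mgi_id <- function(x) {
  grepl("^MGI:[0-9]+$", x)
}

is_mgi_id(c("MGI:1919458", "MGI1919458", "mgi:1914330"))
## [1]  TRUE FALSE FALSE
```

The match is case-sensitive, so lower-case prefixes are rejected.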

marker_symbol

marker_symbol: marker symbol is a unique abbreviation of the marker name.

marker_name

marker_name: marker name is a word or phrase that uniquely identifies the genetic marker, e.g. a gene or allele name.

feature_type

feature_type: an attribute of a portion of a genomic sequence. See the dataset ?feature_type_definitions for details.

chromosome

chromosome: mouse chromosome name. Possible values are the names of the autosomes, the sex chromosomes and the mitochondrial chromosome.

start

start: genomic start position (one-offset, i.e. the first base of a chromosome is position 1).

end

end: genomic end position (one-offset).
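
With one-offset coordinates, the length of a feature follows from `start` and `end` by simple arithmetic. The sketch below uses made-up coordinates and assumes that both positions are inclusive, which this document does not state explicitly.

```r
# Hypothetical coordinates, for illustration only (not taken from the report)
start <- 1001L
end   <- 1500L

# Assumption: one-offset coordinates with both ends inclusive,
# so the feature spans end - start + 1 bases
feature_length <- end - start + 1L
feature_length
## [1] 500
```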

strand

strand: DNA strand, ‘+’ for sense, and ‘-’ for antisense.

genetic_map_pos

genetic_map_pos: genetic map position in centimorgans (cM), the unit of length in a genetic map. Two loci are 1 cM apart if recombination is detected between them in 1% of meioses.

ensembl_gen_id

ensembl_gen_id: mouse Ensembl gene identifier, a string consisting of the prefix ENSMUSG followed by a unique eleven-digit number, e.g. ENSMUSG00000101483.
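
The gene identifier format can be checked the same way as any fixed-pattern string. This is a sketch in base R; the helper `is_ensembl_mouse_gene_id()` is a hypothetical name, and the pattern assumes unversioned identifiers, that is, no `.1`-style suffix.

```r
# Hypothetical helper, not part of the package.
# Assumption: unversioned IDs, "ENSMUSG" followed by exactly eleven digits.
is_ensembl_mouse_gene_id <- function(x) {
  grepl("^ENSMUSG[0-9]{11}$", x)
}

is_ensembl_mouse_gene_id(c(
  "ENSMUSG00000101483",  # gene ID from the examples above
  "ENSMUST00000187497",  # transcript ID, wrong prefix
  "ENSMUSG0000010148"    # only ten digits
))
## [1]  TRUE FALSE FALSE
```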

ensembl_trp_id

ensembl_trp_id: Ensembl transcript identifier(s), a list-column.

ensembl_prt_id

ensembl_prt_id: Ensembl protein identifier(s), a list-column.
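
Because each entry of a list-column is itself a vector, looking up the marker that owns a given identifier means testing membership row by row. The sketch below illustrates this on a hand-built stand-in holding a subset of the transcript identifiers shown earlier; `toy`, `target` and `hits` are illustrative names.

```r
library(tibble)
library(dplyr)

# Hand-built stand-in (hypothetical object `toy`) with a subset of the
# transcript identifiers shown earlier in this document
toy <- tibble(
  marker_id = c("MGI:1919458", "MGI:1914330"),
  ensembl_trp_id = list(
    c("ENSMUST00000187497", "ENSMUST00000189139"),
    c("ENSMUST00000050567", "ENSMUST00000114492")
  )
)

target <- "ENSMUST00000114492"

# vapply() tests each list entry and returns one logical value per row
hits <- toy |>
  filter(vapply(ensembl_trp_id, function(ids) target %in% ids, logical(1)))

hits$marker_id
## [1] "MGI:1914330"
```

An alternative is to unnest the list-column first and filter on the resulting character column, at the cost of a longer intermediate table.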

biotype

biotype: Ensembl’s biotype, gene or transcript classification. See Ensembl documentation on Biotypes for more details.