The MGI report MRK_ENSEMBL.rpt provides associations between MGI genetic
markers and Ensembl identifiers. To read this report using the key
"ensembl_ids", use the following code:
# To read all records (more than 70,000), use `read_report("ensembl_ids")`.
(assoc_ensembl_ids <- read_report(report_key = "ensembl_ids", n_max = 30L))
## # A tibble: 30 × 13
## marker_id marker_symbol marker_name feature_type chromosome start end
## <chr> <chr> <chr> <fct> <fct> <int> <int>
## 1 MGI:1919275 1600012P17Rik RIKEN cDNA 1… lncRNA gene 1 1.59e8 1.59e8
## 2 MGI:1914753 1700001G17Rik RIKEN cDNA 1… lncRNA gene 1 3.37e7 3.37e7
## 3 MGI:1916606 1700003I22Rik RIKEN cDNA 1… lncRNA gene 1 5.61e7 5.61e7
## 4 MGI:1925628 1700006P03Rik RIKEN cDNA 1… lncRNA gene 1 1.37e8 1.37e8
## 5 MGI:1916558 1700007P06Rik RIKEN cDNA 1… lncRNA gene 1 1.87e8 1.87e8
## 6 MGI:1923817 1700012E03Rik RIKEN cDNA 1… lncRNA gene 1 1.20e8 1.20e8
## 7 MGI:1919458 1700016L21Rik RIKEN cDNA 1… lncRNA gene 1 8.04e7 8.05e7
## 8 MGI:1916647 1700019A02Rik RIKEN cDNA 1… protein cod… 1 5.32e7 5.32e7
## 9 MGI:1914330 1700019D03Rik RIKEN cDNA 1… protein cod… 1 5.30e7 5.30e7
## 10 MGI:1922796 1700019P21Rik RIKEN cDNA 1… lncRNA gene 1 1.39e8 1.39e8
## # ℹ 20 more rows
## # ℹ 6 more variables: strand <fct>, genetic_map_pos <dbl>,
## # ensembl_gen_id <chr>, ensembl_trp_id <list>, ensembl_prt_id <list>,
## # biotype <list>
Ensembl identifiers
These variables hold one or more identifiers associated with each genetic marker:
- ensembl_gen_id: Ensembl gene identifier(s)
- ensembl_trp_id: Ensembl transcript identifier(s)
- ensembl_prt_id: Ensembl protein identifier(s)
- biotype: Ensembl’s biotype
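The transcript, protein and biotype variables are list-columns, so a single row can carry several values. As a minimal sketch, the hand-built table below stands in for the report (it holds only a subset of the transcript identifiers shown in this article, and `toy` is a made-up name) and counts the identifiers per marker with base R's `lengths()`:

```r
# Toy stand-in for the report: NOT the output of read_report(), just a
# two-row data frame with a subset of the identifiers shown in this article.
toy <- data.frame(
  marker_id = c("MGI:1919458", "MGI:1914330"),
  stringsAsFactors = FALSE
)

# Assigning a list with one element per row creates a list-column.
toy$ensembl_trp_id <- list(
  c("ENSMUST00000187497", "ENSMUST00000189139", "ENSMUST00000242693"),
  c("ENSMUST00000050567", "ENSMUST00000114492")
)

# lengths() returns the number of elements in each list entry,
# i.e. the number of transcripts per marker in this toy table.
toy$n_transcripts <- lengths(toy$ensembl_trp_id)
toy[, c("marker_id", "n_transcripts")]
```

The same `lengths()` call should work on the list-columns of the tibble returned by `read_report()`, assuming they are ordinary lists of character vectors as the column types above suggest.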
Example: MGI:1919458 (lncRNA)
Marker MGI:1919458, a lncRNA gene, is associated with multiple Ensembl transcripts:
assoc_ensembl_ids |>
  dplyr::filter(marker_id == "MGI:1919458") |>
  dplyr::select("marker_id", "marker_symbol", "ensembl_gen_id", "ensembl_trp_id") |>
  tidyr::unnest("ensembl_trp_id")
## # A tibble: 9 × 4
## marker_id marker_symbol ensembl_gen_id ensembl_trp_id
## <chr> <chr> <chr> <chr>
## 1 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000187497
## 2 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000189139
## 3 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000242693
## 4 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000244035
## 5 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000246306
## 6 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000246474
## 7 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000247492
## 8 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000247629
## 9 MGI:1919458 1700016L21Rik ENSMUSG00000101483 ENSMUST00000247689
Example: MGI:1914330 (protein coding)
A second example is MGI:1914330, a protein-coding gene.
Transcripts:
# Transcripts
assoc_ensembl_ids |>
  dplyr::filter(marker_id == "MGI:1914330") |>
  dplyr::select("marker_id", "marker_symbol", "ensembl_gen_id", "ensembl_trp_id") |>
  tidyr::unnest("ensembl_trp_id")
## # A tibble: 7 × 4
## marker_id marker_symbol ensembl_gen_id ensembl_trp_id
## <chr> <chr> <chr> <chr>
## 1 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000050567
## 2 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000114492
## 3 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000114493
## 4 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000186266
## 5 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000190726
## 6 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000190831
## 7 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUST00000191441
Proteins:
# Proteins
assoc_ensembl_ids |>
  dplyr::filter(marker_id == "MGI:1914330") |>
  dplyr::select("marker_id", "marker_symbol", "ensembl_gen_id", "ensembl_prt_id") |>
  tidyr::unnest("ensembl_prt_id")
## # A tibble: 7 × 4
## marker_id marker_symbol ensembl_gen_id ensembl_prt_id
## <chr> <chr> <chr> <chr>
## 1 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000055413
## 2 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000110136
## 3 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000110137
## 4 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000139750
## 5 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000140160
## 6 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000140273
## 7 MGI:1914330 1700019D03Rik ENSMUSG00000043629 ENSMUSP00000140530
Variables
marker_status
marker_status: genetic marker status, a factor with two levels: 'O' for
official and 'W' for withdrawn. Official indicates a genetic marker that is
currently in use, whereas withdrawn means that the symbol or name was once
approved but has since been replaced.
marker_type
marker_type: genetic marker type, a factor with 10 levels: Gene, GeneModel,
Pseudogene, DNA Segment, Transgene, QTL, Cytogenetic Marker, BAC/YAC end,
Complex/Cluster/Region, Other Genome Feature. See ?marker_type_definitions
for the meaning of each type.
marker_id
marker_id: MGI accession identifier, a unique alphanumeric character string
that unambiguously identifies a particular record in the Mouse Genome
Informatics database. The format is MGI:nnnnnn, where n is a digit.
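A quick format check can catch malformed identifiers before filtering on this variable. The sketch below assumes that the number of digits is not fixed (the identifiers shown in this article have seven), so the pattern accepts one or more digits after the `MGI:` prefix. `is_mgi_id()` is a hypothetical helper written for this illustration, not a function provided by the package.

```r
# Hypothetical helper: TRUE when x looks like "MGI:" followed by digits only.
# The match is case-sensitive, so a lower-case "mgi:" prefix is rejected.
is_mgi_id <- function(x) {
  grepl("^MGI:[0-9]+$", x)
}

# Two well-formed identifiers followed by two malformed ones.
is_mgi_id(c("MGI:1919458", "MGI:1914330", "mgi:1914330", "MGI:19X4330"))
```

With this pattern the first two values pass and the last two fail.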
marker_name
marker_name: a word or phrase that uniquely identifies the genetic marker,
e.g. a gene or allele name.
feature_type
feature_type: an attribute of a portion of a genomic sequence. See the
dataset ?feature_type_definitions for details.
chromosome
chromosome: mouse chromosome name. Possible values are the names of the
autosomes, the sex chromosomes and the mitochondrial chromosome.
genetic_map_pos
genetic_map_pos: genetic map position in centiMorgan (cM), a unit of length
in a genetic map. Two loci are 1 cM apart if recombination is detected
between them in 1% of meioses.
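As a worked illustration of this definition, with made-up counts: if recombination between two loci is detected in 12 of 400 meioses, the recombination frequency is 3%, which corresponds to roughly 3 cM. The direct conversion is only a reasonable approximation for closely linked loci, and `recomb_to_cm()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: convert recombinant counts into a map distance in cM,
# using the direct rule "1% recombinant meioses = 1 cM".
# This ignores multiple crossovers, so it is a sketch for short distances only.
recomb_to_cm <- function(n_recombinant, n_meioses) {
  100 * n_recombinant / n_meioses
}

# 12 recombinants among 400 meioses: 3% recombination, about 3 cM.
recomb_to_cm(n_recombinant = 12, n_meioses = 400)
```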
ensembl_gen_id
ensembl_gen_id: mouse Ensembl gene identifier, a string of the format
ENSMUSG followed by a unique eleven-digit number.
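The format lends itself to a simple pattern check. The sketch below labels identifiers by their prefix. The `ENSMUST` and `ENSMUSP` patterns are inferred from the transcript and protein identifiers printed in this article, and `ensembl_id_kind()` is a hypothetical helper rather than a package function. Identifiers carrying a version suffix would not match these patterns.

```r
# Hypothetical helper: label mouse Ensembl identifiers by their prefix.
# Anything that does not match one of the three patterns is returned as NA.
ensembl_id_kind <- function(x) {
  kind <- rep(NA_character_, length(x))
  kind[grepl("^ENSMUSG[0-9]{11}$", x)] <- "gene"
  kind[grepl("^ENSMUST[0-9]{11}$", x)] <- "transcript"
  kind[grepl("^ENSMUSP[0-9]{11}$", x)] <- "protein"
  kind
}

# Three identifiers taken from this article, plus one with too few digits.
ensembl_id_kind(c(
  "ENSMUSG00000101483", "ENSMUST00000187497",
  "ENSMUSP00000055413", "ENSMUSG101483"
))
```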
biotype
biotype: Ensembl’s biotype, a gene or transcript classification. See the
Ensembl documentation on biotypes for more details.