
The MGI report MGI_Gene_Model_Coord.rpt lists the subset of mouse genetic markers that correspond to NCBI and Ensembl gene models.

To read this report using the key "gene_model_coordinates", use the following code:

# To read all records (more than 70,000), use `read_report("gene_model_coordinates")`.
(gene_model_coord <- read_report(report_key = "gene_model_coordinates", n_max = 300L))
## # A tibble: 300 × 15
##    marker_type marker_id marker_symbol marker_name        assembly entrez_gen_id
##    <fct>       <chr>     <chr>         <chr>              <fct>            <int>
##  1 Gene        MGI:87853 a             nonagouti          GRCm39           50518
##  2 Gene        MGI:87854 Pzp           PZP, alpha-2-macr… GRCm39           11287
##  3 Gene        MGI:87859 Abl1          c-abl oncogene 1,… GRCm39           11350
##  4 Gene        MGI:87860 Abl2          ABL proto-oncogen… GRCm39           11352
##  5 Gene        MGI:87862 Scgb1b27      secretoglobin, fa… GRCm39           11354
##  6 Gene        MGI:87863 Scgb2b27      secretoglobin, fa… GRCm39          233099
##  7 Gene        MGI:87864 Scgb2b26      secretoglobin, fa… GRCm39          110187
##  8 Gene        MGI:87866 Acadl         acyl-Coenzyme A d… GRCm39           11363
##  9 Gene        MGI:87867 Acadm         acyl-Coenzyme A d… GRCm39           11364
## 10 Gene        MGI:87868 Acads         acyl-Coenzyme A d… GRCm39           11409
## # ℹ 290 more rows
## # ℹ 9 more variables: entrez_chromosome <fct>, entrez_start <int>,
## #   entrez_end <int>, entrez_strand <fct>, ensembl_gen_id <chr>,
## #   ensembl_chromosome <fct>, ensembl_start <int>, ensembl_end <int>,
## #   ensembl_strand <fct>

NCBI gene model variables

NCBI gene model annotation variables are prefixed with entrez:

dplyr::select(gene_model_coord, "marker_symbol", dplyr::starts_with("entrez"))
## # A tibble: 300 × 6
##    marker_symbol entrez_gen_id entrez_chromosome entrez_start entrez_end
##    <chr>                 <int> <fct>                    <int>      <int>
##  1 a                     50518 2                    154792519  154892932
##  2 Pzp                   11287 6                    128460530  128503683
##  3 Abl1                  11350 2                     31578256   31697105
##  4 Abl2                  11352 1                    156386160  156477189
##  5 Scgb1b27              11354 7                     33720906   33722306
##  6 Scgb2b27             233099 7                     33711344   33713367
##  7 Scgb2b26             110187 7                     33642422   33644410
##  8 Acadl                 11363 1                     66869998   66902468
##  9 Acadm                 11364 3                    153627990  153650280
## 10 Acads                 11409 5                    115248358  115257405
## # ℹ 290 more rows
## # ℹ 1 more variable: entrez_strand <fct>

Ensembl gene model variables

Ensembl gene model annotation variables are prefixed with ensembl:

dplyr::select(gene_model_coord, "marker_symbol", dplyr::starts_with("ensembl"))
## # A tibble: 300 × 6
##    marker_symbol ensembl_gen_id     ensembl_chromosome ensembl_start ensembl_end
##    <chr>         <chr>              <fct>                      <int>       <int>
##  1 a             ENSMUSG00000027596 2                      154633322   154892932
##  2 Pzp           ENSMUSG00000030359 6                      128460530   128503683
##  3 Abl1          ENSMUSG00000026842 2                       31578388    31694239
##  4 Abl2          ENSMUSG00000026596 1                      156386356   156477138
##  5 Scgb1b27      ENSMUSG00000066583 7                       33720908    33722306
##  6 Scgb2b27      ENSMUSG00000066584 7                       33711346    33713367
##  7 Scgb2b26      ENSMUSG00000066586 7                       33642427    33644465
##  8 Acadl         ENSMUSG00000026003 1                       66869998    66902436
##  9 Acadm         ENSMUSG00000062908 3                      153627994   153650269
## 10 Acads         ENSMUSG00000029545 5                      115248358   115257405
## # ℹ 290 more rows
## # ℹ 1 more variable: ensembl_strand <fct>
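
Because the NCBI and Ensembl coordinates sit side by side in the same table, it is easy to check where the two gene models agree. The following is a minimal base R sketch, not part of the package: it rebuilds a few rows by hand from the two outputs above so that it runs on its own, and the `same_span` column is a name introduced here for illustration.

```r
# Rows copied from the NCBI and Ensembl outputs shown above.
coords <- data.frame(
  marker_symbol = c("a", "Pzp", "Abl1", "Acads"),
  entrez_start  = c(154792519L, 128460530L, 31578256L, 115248358L),
  entrez_end    = c(154892932L, 128503683L, 31697105L, 115257405L),
  ensembl_start = c(154633322L, 128460530L, 31578388L, 115248358L),
  ensembl_end   = c(154892932L, 128503683L, 31694239L, 115257405L)
)

# TRUE when both gene models report identical start and end positions.
# `same_span` is a hypothetical column name, not part of the report.
coords$same_span <- coords$entrez_start == coords$ensembl_start &
  coords$entrez_end == coords$ensembl_end

coords[, c("marker_symbol", "same_span")]
```

The same comparison should work unchanged on `gene_model_coord`, since it only relies on the four coordinate columns.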

Variables

marker_type

marker_type: genetic marker type, a factor with 10 levels: Gene, GeneModel, Pseudogene, DNA Segment, Transgene, QTL, Cytogenetic Marker, BAC/YAC end, Complex/Cluster/Region, Other Genome Feature. See ?marker_type_definitions for the meaning of each type.

marker_id

marker_id: MGI accession identifier. A unique alphanumeric character string that is used to unambiguously identify a particular record in the Mouse Genome Informatics database. The format is MGI:nnnnnn, where n is a digit.
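
The identifier format described above can be checked with a regular expression. This is a rough sketch in base R: the pattern assumes "MGI:" followed by one or more digits (the identifiers in the output above have five digits, so no fixed length is enforced), and the last two test values are made-up malformed inputs used only for illustration.

```r
# Two identifiers from the output above, plus two hypothetical malformed values.
marker_ids <- c("MGI:87853", "MGI:87854", "mgi:87859", "87860")

# Assumed pattern: literal "MGI:" followed by digits only.
is_mgi_id <- grepl("^MGI:[0-9]+$", marker_ids)

marker_ids[is_mgi_id]
```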

marker_symbol

marker_symbol: marker symbol is a unique abbreviation of the marker name.

marker_name

marker_name: marker name is a word or phrase that uniquely identifies the genetic marker, e.g. a gene or allele name.

assembly

assembly: mouse genome assembly version, a factor with two levels: 'GRCm38' and 'GRCm39'. Almost always 'GRCm39'.

entrez_gen_id

entrez_gen_id: mouse NCBI Entrez gene identifier, an integer number.

entrez_chromosome

entrez_chromosome: mouse chromosome name, according to NCBI gene model.

entrez_start

entrez_start: genomic start position (one-offset), according to NCBI gene model.

entrez_end

entrez_end: genomic end position (one-offset), according to NCBI gene model.
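
With one-offset coordinates, the span of a gene model follows directly from its start and end positions. The sketch below assumes that both positions are inclusive, which the report description does not state explicitly, so the "+ 1" should be read as an assumption. `gene_span` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: length in base pairs of a one-offset interval,
# assuming start and end are both inclusive.
gene_span <- function(start, end) {
  end - start + 1L
}

# NCBI coordinates for Pzp and Acads, copied from the output above.
entrez_start <- c(Pzp = 128460530L, Acads = 115248358L)
entrez_end   <- c(Pzp = 128503683L, Acads = 115257405L)

spans <- gene_span(entrez_start, entrez_end)
spans
```

The same helper applies to `ensembl_start` and `ensembl_end`, which use the same coordinate convention.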

entrez_strand

entrez_strand: DNA strand, ‘+’ for sense, and ‘-’ for antisense, according to NCBI gene model.

ensembl_gen_id

ensembl_gen_id: mouse Ensembl gene identifier, a string of the format ENSMUSG followed by a unique eleven-digit number.
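
As an illustration of this format, a pattern match can flag values that do not look like mouse Ensembl gene identifiers. This is only a sketch in base R: the pattern encodes the description above (the prefix ENSMUSG and exactly eleven digits), and the second and third test values are invented for the example.

```r
# One identifier from the output above, plus two hypothetical malformed values.
ensembl_ids <- c("ENSMUSG00000027596", "ENSMUSG27596", "ENSG00000027596")

# Prefix "ENSMUSG" followed by exactly eleven digits.
is_mouse_ensembl_id <- grepl("^ENSMUSG[0-9]{11}$", ensembl_ids)

ensembl_ids[is_mouse_ensembl_id]
```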

ensembl_chromosome

ensembl_chromosome: mouse chromosome name, according to Ensembl gene model.

ensembl_start

ensembl_start: genomic start position (one-offset), according to Ensembl gene model.

ensembl_end

ensembl_end: genomic end position (one-offset), according to Ensembl gene model.

ensembl_strand

ensembl_strand: DNA strand, ‘+’ for sense, and ‘-’ for antisense, according to Ensembl gene model.