Gene Model Coordinates
Source: vignettes/articles/gene_model_coordinates.Rmd
The MGI report MGI_Gene_Model_Coord.rpt lists the subset of mouse genetic markers that correspond to NCBI and Ensembl gene models.
To read this report using the key "gene_model_coordinates", use the following code:
# To read all records (more than 70,000), use `read_report("gene_model_coordinates")`.
(gene_model_coord <- read_report(report_key = "gene_model_coordinates", n_max = 300L))
## # A tibble: 300 × 15
## marker_type marker_id marker_symbol marker_name assembly entrez_gen_id
## <fct> <chr> <chr> <chr> <fct> <int>
## 1 Gene MGI:87853 a nonagouti GRCm39 50518
## 2 Gene MGI:87854 Pzp PZP, alpha-2-macr… GRCm39 11287
## 3 Gene MGI:87859 Abl1 c-abl oncogene 1,… GRCm39 11350
## 4 Gene MGI:87860 Abl2 ABL proto-oncogen… GRCm39 11352
## 5 Gene MGI:87862 Scgb1b27 secretoglobin, fa… GRCm39 11354
## 6 Gene MGI:87863 Scgb2b27 secretoglobin, fa… GRCm39 233099
## 7 Gene MGI:87864 Scgb2b26 secretoglobin, fa… GRCm39 110187
## 8 Gene MGI:87866 Acadl acyl-Coenzyme A d… GRCm39 11363
## 9 Gene MGI:87867 Acadm acyl-Coenzyme A d… GRCm39 11364
## 10 Gene MGI:87868 Acads acyl-Coenzyme A d… GRCm39 11409
## # ℹ 290 more rows
## # ℹ 9 more variables: entrez_chromosome <fct>, entrez_start <int>,
## # entrez_end <int>, entrez_strand <fct>, ensembl_gen_id <chr>,
## # ensembl_chromosome <fct>, ensembl_start <int>, ensembl_end <int>,
## # ensembl_strand <fct>
NCBI gene model variables
NCBI gene model annotation variables are prefixed with entrez:
dplyr::select(gene_model_coord, "marker_symbol", dplyr::starts_with("entrez"))
## # A tibble: 300 × 6
## marker_symbol entrez_gen_id entrez_chromosome entrez_start entrez_end
## <chr> <int> <fct> <int> <int>
## 1 a 50518 2 154792519 154892932
## 2 Pzp 11287 6 128460530 128503683
## 3 Abl1 11350 2 31578256 31697105
## 4 Abl2 11352 1 156386160 156477189
## 5 Scgb1b27 11354 7 33720906 33722306
## 6 Scgb2b27 233099 7 33711344 33713367
## 7 Scgb2b26 110187 7 33642422 33644410
## 8 Acadl 11363 1 66869998 66902468
## 9 Acadm 11364 3 153627990 153650280
## 10 Acads 11409 5 115248358 115257405
## # ℹ 290 more rows
## # ℹ 1 more variable: entrez_strand <fct>
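The start and end columns can be combined to get the genomic span of each NCBI gene model. The sketch below rebuilds the first three rows shown above as a plain data frame so that it runs without the package. The column entrez_length is a name introduced here for illustration, and the `+ 1L` assumes that coordinates are 1-based and inclusive, which this article does not state.

```r
# First three rows of the NCBI columns shown above, rebuilt by hand.
ncbi <- data.frame(
  marker_symbol = c("a", "Pzp", "Abl1"),
  entrez_start  = c(154792519L, 128460530L, 31578256L),
  entrez_end    = c(154892932L, 128503683L, 31697105L)
)

# Assumption: start and end are 1-based and inclusive, hence the + 1L.
# `entrez_length` is a hypothetical column name, not part of the report.
ncbi$entrez_length <- ncbi$entrez_end - ncbi$entrez_start + 1L

ncbi
```

The same arithmetic should apply to the full gene_model_coord tibble, for example inside dplyr::mutate().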
Ensembl gene model variables
Ensembl gene model annotation variables are prefixed with ensembl:
dplyr::select(gene_model_coord, "marker_symbol", dplyr::starts_with("ensembl"))
## # A tibble: 300 × 6
## marker_symbol ensembl_gen_id ensembl_chromosome ensembl_start ensembl_end
## <chr> <chr> <fct> <int> <int>
## 1 a ENSMUSG00000027596 2 154633322 154892932
## 2 Pzp ENSMUSG00000030359 6 128460530 128503683
## 3 Abl1 ENSMUSG00000026842 2 31578388 31694239
## 4 Abl2 ENSMUSG00000026596 1 156386356 156477138
## 5 Scgb1b27 ENSMUSG00000066583 7 33720908 33722306
## 6 Scgb2b27 ENSMUSG00000066584 7 33711346 33713367
## 7 Scgb2b26 ENSMUSG00000066586 7 33642427 33644465
## 8 Acadl ENSMUSG00000026003 1 66869998 66902436
## 9 Acadm ENSMUSG00000062908 3 153627994 153650269
## 10 Acads ENSMUSG00000029545 5 115248358 115257405
## # ℹ 290 more rows
## # ℹ 1 more variable: ensembl_strand <fct>
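The two outputs above show that NCBI and Ensembl do not always agree on the boundaries of a gene model (compare a and Abl1 with Pzp and Acads). As a minimal sketch, the rows for those four markers are rebuilt by hand below and flagged according to whether both providers report the same start and end. The column same_span is a name introduced here for illustration.

```r
# Rows 1, 2, 3 and 10 of the outputs above, rebuilt by hand.
models <- data.frame(
  marker_symbol = c("a", "Pzp", "Abl1", "Acads"),
  entrez_start  = c(154792519L, 128460530L, 31578256L, 115248358L),
  entrez_end    = c(154892932L, 128503683L, 31697105L, 115257405L),
  ensembl_start = c(154633322L, 128460530L, 31578388L, 115248358L),
  ensembl_end   = c(154892932L, 128503683L, 31694239L, 115257405L)
)

# `same_span` (hypothetical name) is TRUE when both start and end coincide.
models$same_span <-
  models$entrez_start == models$ensembl_start &
  models$entrez_end == models$ensembl_end

models[, c("marker_symbol", "same_span")]
```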
Variables
marker_type
marker_type: genetic marker type, a factor of 10 levels: Gene, GeneModel, Pseudogene, DNA Segment, Transgene, QTL, Cytogenetic Marker, BAC/YAC end, Complex/Cluster/Region, Other Genome Feature. See ?marker_type_definitions for the meaning of each type.
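To illustrate what a factor with these ten levels looks like, the sketch below builds one by hand from the marker types of the first ten rows shown earlier (all Gene). This is not necessarily how the package constructs the column; the object names are chosen here for illustration.

```r
# The ten levels listed above, in the order given.
marker_type_levels <- c(
  "Gene", "GeneModel", "Pseudogene", "DNA Segment", "Transgene", "QTL",
  "Cytogenetic Marker", "BAC/YAC end", "Complex/Cluster/Region",
  "Other Genome Feature"
)

# Marker types of the first ten rows of the report output: all "Gene".
marker_type <- factor(rep("Gene", 10), levels = marker_type_levels)

# table() also reports levels with no observations (count 0).
table(marker_type)
```

Because the levels are fixed, counts for marker types that do not occur in a given subset are still reported as zero.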
marker_id
marker_id: MGI accession identifier. A unique alphanumeric character string that is used to unambiguously identify a particular record in the Mouse Genome Informatics database. The format is MGI:nnnnnn, where n is a digit.
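A format check for these identifiers can be sketched with a regular expression. The pattern below is an assumption: it accepts any number of digits after the MGI: prefix, since the identifiers shown earlier (e.g. MGI:87853) have five digits rather than six. The last two test strings are deliberately malformed and do not come from the report.

```r
# Two identifiers from the report output, plus two malformed strings
# added here purely for illustration.
ids <- c("MGI:87853", "MGI:87854", "mgi:87859", "87860")

# Assumed pattern: literal "MGI:" followed by one or more digits.
is_mgi_id <- grepl("^MGI:[0-9]+$", ids)

data.frame(ids, is_mgi_id)
```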
marker_name
marker_name: marker name, a word or phrase that uniquely identifies the genetic marker, e.g. a gene or allele name.
assembly
assembly: mouse genome assembly version, a factor of two levels: 'GRCm38' and 'GRCm39'. Almost always 'GRCm39'.
entrez_strand
entrez_strand: DNA strand according to the NCBI gene model: ‘+’ for sense and ‘-’ for antisense.