Ensembl, NCBI or MGI Biotype Conflicts
Source: vignettes/articles/biotype_conflicts.Rmd
The MGI report MGI_BioTypeConflict.rpt lists markers whose biotype classification differs among Ensembl, NCBI and MGI. To read this report, use the key "biotype_conflicts":
# To read all records (more than 16,000), use `read_report("biotype_conflicts")`.
(biotype_conflicts <- read_report(report_key = "biotype_conflicts", n_max = 30L))
## # A tibble: 30 × 6
## marker_id marker_symbol database gene_id biotype is_mgi_rep
## <chr> <chr> <fct> <chr> <chr> <lgl>
## 1 MGI:103211 Hoxb3os NCBI Gene Model 102632302 ncRNA TRUE
## 2 MGI:103211 Hoxb3os Ensembl Gene Model ENSMUSG000000… lncRNA FALSE
## 3 MGI:104522 Phxr4 MGI MGI:104522 protei… FALSE
## 4 MGI:104522 Phxr4 Ensembl Gene Model ENSMUSG000000… TEC FALSE
## 5 MGI:104522 Phxr4 NCBI Gene Model 18689 ncRNA TRUE
## 6 MGI:104524 Phxr2 MGI MGI:104524 protei… FALSE
## 7 MGI:104524 Phxr2 Ensembl Gene Model ENSMUSG000000… lncRNA TRUE
## 8 MGI:104547 Amy2b MGI MGI:104547 protei… FALSE
## 9 MGI:104547 Amy2b NCBI Gene Model 545562 protei… TRUE
## 10 MGI:104547 Amy2b Ensembl Gene Model ENSMUSG000000… unproc… FALSE
## # ℹ 20 more rows
Each row corresponds to one combination of genetic marker and biotype classification. The database variable indicates the source of the classification. The gene_id variable holds the gene identifier used by that source: an NCBI gene identifier, an Ensembl gene identifier, or an MGI accession identifier.
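Because each marker contributes one row per source, grouping by marker gives a compact view of a conflict. The following is a minimal sketch rather than part of the package: `toy` and `conflict_summary` are hypothetical names, and `toy` is a hand-built copy of the first five rows printed above, with truncated values ("…") kept exactly as printed.

```r
library(dplyr)

# Hypothetical toy data copied from the first five rows of the printed output.
# Truncated values ("…") are reproduced as printed, not completed.
toy <- tibble(
  marker_id = c("MGI:103211", "MGI:103211", "MGI:104522", "MGI:104522", "MGI:104522"),
  marker_symbol = c("Hoxb3os", "Hoxb3os", "Phxr4", "Phxr4", "Phxr4"),
  database = c("NCBI Gene Model", "Ensembl Gene Model", "MGI",
               "Ensembl Gene Model", "NCBI Gene Model"),
  biotype = c("ncRNA", "lncRNA", "protei…", "TEC", "ncRNA")
)

# One row per marker: number of sources involved and the biotypes they report.
conflict_summary <- toy |>
  group_by(marker_id, marker_symbol) |>
  summarise(
    n_sources = n_distinct(database),
    biotypes = paste(biotype, collapse = " / "),
    .groups = "drop"
  )

conflict_summary
```

The same pipeline should apply unchanged to the tibble returned by `read_report()`, assuming its column names match those shown in the output above.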
MGI Representative Gene Model
The variable is_mgi_rep (short for "is the MGI genomic representative sequence") is a logical vector that indicates whether the corresponding gene_id and biotype values are the ones associated with the MGI representative sequence. See vignette("representative_sequence") for more details.
biotype_conflicts |>
dplyr::filter(is_mgi_rep)
## # A tibble: 10 × 6
## marker_id marker_symbol database gene_id biotype is_mgi_rep
## <chr> <chr> <fct> <chr> <chr> <lgl>
## 1 MGI:103211 Hoxb3os NCBI Gene Model 102632302 ncRNA TRUE
## 2 MGI:104522 Phxr4 NCBI Gene Model 18689 ncRNA TRUE
## 3 MGI:104524 Phxr2 Ensembl Gene Model ENSMUSG000000… lncRNA TRUE
## 4 MGI:104547 Amy2b NCBI Gene Model 545562 protei… TRUE
## 5 MGI:104642 Pla2g2a Ensembl Gene Model ENSMUSG000000… protei… TRUE
## 6 MGI:105101 Rnu7-ps3 Ensembl Gene Model ENSMUSG000000… snRNA TRUE
## 7 MGI:105103 Rprl3 NCBI Gene Model 19785 ncRNA TRUE
## 8 MGI:105104 Rprl2 NCBI Gene Model 19784 ncRNA TRUE
## 9 MGI:105105 Rprl1 NCBI Gene Model 19783 ncRNA TRUE
## 10 MGI:106023 Hmga2-ps1 NCBI Gene Model 15365 pseudo TRUE
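In the rows printed above, each marker appears to have exactly one representative row. Whether this holds for the whole report is an assumption, not something this article states, so it may be worth checking. A minimal base R sketch on a hypothetical toy data frame (`toy`, copied from the first five rows of the first output):

```r
# Hypothetical toy data copied from the first five rows printed earlier.
toy <- data.frame(
  marker_symbol = c("Hoxb3os", "Hoxb3os", "Phxr4", "Phxr4", "Phxr4"),
  database = c("NCBI Gene Model", "Ensembl Gene Model", "MGI",
               "Ensembl Gene Model", "NCBI Gene Model"),
  is_mgi_rep = c(TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Count representative rows per marker (summing a logical counts the TRUEs).
rep_per_marker <- tapply(toy$is_mgi_rep, toy$marker_symbol, sum)

# TRUE only if every marker has exactly one representative row.
one_rep_each <- all(rep_per_marker == 1L)

# Which source provides the representative gene model for each marker.
rep_source <- toy$database[toy$is_mgi_rep]
```

On the full report, a `FALSE` value of `one_rep_each` would mean that some markers have zero or several representative rows, in which case filtering on is_mgi_rep alone would not yield exactly one row per marker.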
Variables
marker_id
marker_id: MGI accession identifier. A unique alphanumeric character string that is used to unambiguously identify a particular record in the Mouse Genome Informatics database. The format is MGI:nnnnnn, where n is a digit.
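The format described above can be checked with a regular expression. This is a small sketch under one assumption: the pattern accepts any number of digits after the `MGI:` prefix, rather than exactly six, since the description does not say whether the digit count is fixed. The names `ids` and `is_valid_mgi_id` are illustrative only.

```r
# Two identifiers taken from the output above, plus two malformed variants.
ids <- c("MGI:103211", "MGI:104522", "mgi:104524", "104547")

# Assumption: "MGI:" followed by one or more digits, and nothing else.
is_valid_mgi_id <- grepl("^MGI:[0-9]+$", ids)

is_valid_mgi_id
```

The lowercase prefix and the bare number are both rejected, because the pattern is case-sensitive and anchored at both ends.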