
The MGI report MGI_BioTypeConflict.rpt lists markers whose biotype classification differs among Ensembl, NCBI and MGI.

To read this report using the key "biotype_conflicts", use the following code:

# To read all records (more than 16,000), use `read_report("biotype_conflicts")`.
(biotype_conflicts <- read_report(report_key = "biotype_conflicts", n_max = 30L))
## # A tibble: 30 × 6
##    marker_id  marker_symbol database           gene_id        biotype is_mgi_rep
##    <chr>      <chr>         <fct>              <chr>          <chr>   <lgl>     
##  1 MGI:103211 Hoxb3os       NCBI Gene Model    102632302      ncRNA   TRUE      
##  2 MGI:103211 Hoxb3os       Ensembl Gene Model ENSMUSG000000… lncRNA  FALSE     
##  3 MGI:104522 Phxr4         MGI                MGI:104522     protei… FALSE     
##  4 MGI:104522 Phxr4         Ensembl Gene Model ENSMUSG000000… TEC     FALSE     
##  5 MGI:104522 Phxr4         NCBI Gene Model    18689          ncRNA   TRUE      
##  6 MGI:104524 Phxr2         MGI                MGI:104524     protei… FALSE     
##  7 MGI:104524 Phxr2         Ensembl Gene Model ENSMUSG000000… lncRNA  TRUE      
##  8 MGI:104547 Amy2b         MGI                MGI:104547     protei… FALSE     
##  9 MGI:104547 Amy2b         NCBI Gene Model    545562         protei… TRUE      
## 10 MGI:104547 Amy2b         Ensembl Gene Model ENSMUSG000000… unproc… FALSE     
## # ℹ 20 more rows

Each row corresponds to one combination of genetic marker and biotype classification. The source of the classification is given in the database variable. The gene_id is the gene identifier used by that source, so it can be an NCBI, Ensembl or MGI identifier.
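Because a marker contributes one row per source, its classifications can be lined up side by side. The following is a minimal base R sketch, not part of the package: `toy_conflicts` is a hand-typed subset of the rows printed above (an assumption made so the example runs without downloading the report), and gene_id is left out because it is truncated in the printed output.

```r
# Hand-typed subset of the printed output; not read from the report itself.
toy_conflicts <- data.frame(
  marker_id     = c("MGI:103211", "MGI:103211", "MGI:104522", "MGI:104522"),
  marker_symbol = c("Hoxb3os", "Hoxb3os", "Phxr4", "Phxr4"),
  database      = c("NCBI Gene Model", "Ensembl Gene Model",
                    "Ensembl Gene Model", "NCBI Gene Model"),
  biotype       = c("ncRNA", "lncRNA", "TEC", "ncRNA"),
  stringsAsFactors = FALSE
)

# Number of classifications (rows) available for each marker.
rows_per_marker <- table(toy_conflicts$marker_id)

# Cross-tabulate marker symbol against source: one cell per combination,
# holding the biotype assigned by that source.
biotype_by_source <- tapply(
  toy_conflicts$biotype,
  list(toy_conflicts$marker_symbol, toy_conflicts$database),
  FUN = function(x) paste(x, collapse = ";")
)

print(rows_per_marker)
print(biotype_by_source)
```

Reading `biotype_by_source` row by row shows where the sources disagree for a given marker.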

MGI Representative Gene Model

The variable is_mgi_rep (short for "is the MGI genomic representative sequence") is a logical vector indicating whether the corresponding gene_id and biotype values are the ones associated with the MGI representative sequence. See vignette("representative_sequence") for more details.

biotype_conflicts |>
  dplyr::filter(is_mgi_rep)
## # A tibble: 10 × 6
##    marker_id  marker_symbol database           gene_id        biotype is_mgi_rep
##    <chr>      <chr>         <fct>              <chr>          <chr>   <lgl>     
##  1 MGI:103211 Hoxb3os       NCBI Gene Model    102632302      ncRNA   TRUE      
##  2 MGI:104522 Phxr4         NCBI Gene Model    18689          ncRNA   TRUE      
##  3 MGI:104524 Phxr2         Ensembl Gene Model ENSMUSG000000… lncRNA  TRUE      
##  4 MGI:104547 Amy2b         NCBI Gene Model    545562         protei… TRUE      
##  5 MGI:104642 Pla2g2a       Ensembl Gene Model ENSMUSG000000… protei… TRUE      
##  6 MGI:105101 Rnu7-ps3      Ensembl Gene Model ENSMUSG000000… snRNA   TRUE      
##  7 MGI:105103 Rprl3         NCBI Gene Model    19785          ncRNA   TRUE      
##  8 MGI:105104 Rprl2         NCBI Gene Model    19784          ncRNA   TRUE      
##  9 MGI:105105 Rprl1         NCBI Gene Model    19783          ncRNA   TRUE      
## 10 MGI:106023 Hmga2-ps1     NCBI Gene Model    15365          pseudo  TRUE
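To see which classifications conflict with the representative one, the representative rows can be joined back onto the remaining rows of the same marker. This is a minimal base R sketch under the assumption of a hand-typed data frame (`toy_conflicts`, a subset of the rows shown earlier with gene_id omitted); it is an illustration, not a function provided by the package.

```r
# Hand-typed subset of the printed output; not read from the report itself.
toy_conflicts <- data.frame(
  marker_symbol = c("Hoxb3os", "Hoxb3os", "Phxr4", "Phxr4"),
  database      = c("NCBI Gene Model", "Ensembl Gene Model",
                    "Ensembl Gene Model", "NCBI Gene Model"),
  biotype       = c("ncRNA", "lncRNA", "TEC", "ncRNA"),
  is_mgi_rep    = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Split into representative and non-representative classifications.
representative <- toy_conflicts[toy_conflicts$is_mgi_rep, ]
alternatives   <- toy_conflicts[!toy_conflicts$is_mgi_rep, ]

# Pair each alternative classification with its marker's representative one.
comparison <- merge(
  representative[, c("marker_symbol", "database", "biotype")],
  alternatives[, c("marker_symbol", "database", "biotype")],
  by = "marker_symbol",
  suffixes = c("_rep", "_alt")
)

print(comparison)
```

Each row of `comparison` pairs the representative biotype (`biotype_rep`) with a classification from another source (`biotype_alt`) for the same marker.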

Variables

marker_id

marker_id: MGI accession identifier. A unique alphanumeric character string that is used to unambiguously identify a particular record in the Mouse Genome Informatics database. The format is MGI:nnnnnn, where n is a digit.
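The identifier format can be checked with a regular expression. The helper below, `is_mgi_id()`, is hypothetical and not part of the package; it assumes the numeric part is one or more digits rather than enforcing a fixed length.

```r
# Hypothetical helper: TRUE for strings made of "MGI:" followed only by digits.
# Assumption: the number of digits is not fixed, so one or more are accepted.
is_mgi_id <- function(x) {
  grepl("^MGI:[0-9]+$", x)
}

candidates <- c("MGI:103211", "MGI103211", "103211", "MGI:10a211")
valid <- is_mgi_id(candidates)
print(valid)
```

Only the first candidate matches; the others lack the prefix, lack the colon, or contain a non-digit character.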

marker_symbol

marker_symbol: a unique abbreviation of the marker name.

database

database: database or catalogue within the source that provides the genomic annotation.

biotype

biotype: a gene or transcript biotype, according to any of the gene models by NCBI, Ensembl or MGI.

gene_id

gene_id: a gene identifier from NCBI, Ensembl or MGI.

is_mgi_rep

is_mgi_rep: a logical indicating whether the row's gene model is the MGI representative for the genetic marker.