UniProtKB Swiss-Prot and TrEMBL
Source: vignettes/articles/swiss_trembl_ids.Rmd
The MGI report MRK_SwissProt_TrEMBL.rpt provides associations between MGI genetic markers and UniProtKB/Swiss-Prot and UniProtKB/TrEMBL identifiers.
To read this report using the key "swiss_trembl_ids", use the following code:
# To read all records (more than 20,000), use `read_report("swiss_trembl_ids")`.
(assoc_to_swiss_trembl_ids <- read_report(report_key = "swiss_trembl_ids", n_max = 30L))
## # A tibble: 30 × 7
## marker_status marker_id marker_symbol marker_name chromosome genetic_map_pos
## <fct> <chr> <chr> <chr> <fct> <dbl>
## 1 O MGI:19156… 0610025J13Rik RIKEN cDNA… 4 45.6
## 2 O MGI:19156… 0610030E20Rik RIKEN cDNA… 6 32.3
## 3 O MGI:19175… 0610038B21Rik RIKEN cDNA… 8 36.6
## 4 O MGI:19156… 0610039K10Rik RIKEN cDNA… 2 84.4
## 5 O MGI:19149… 0610040B10Rik RIKEN cDNA… 5 82.1
## 6 O MGI:19156… 0610042G04Rik RIKEN cDNA… 9 NA
## 7 O MGI:19156… 1010001I08Rik RIKEN cDNA… NA NA
## 8 O MGI:19150… 1110002E22Rik RIKEN cDNA… 3 64.0
## 9 O MGI:19158… 1110002L01Rik RIKEN cDNA… 12 1.79
## 10 O MGI:19292… 1110004F10Rik RIKEN cDNA… 7 61.4
## # ℹ 20 more rows
## # ℹ 1 more variable: uniprot_id <list>
UniProtKB/Swiss-Prot and UniProtKB/TrEMBL identifiers
The list-column uniprot_id provides both UniProtKB/Swiss-Prot and UniProtKB/TrEMBL identifiers. To unnest uniprot_id, use:
assoc_to_swiss_trembl_ids |>
dplyr::select("marker_id", "uniprot_id") |>
dplyr::filter(marker_id == "MGI:1915609") |>
tidyr::unnest("uniprot_id")
## # A tibble: 0 × 2
## # ℹ 2 variables: marker_id <chr>, uniprot_id <???>
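The empty tibble above means that none of the 30 records read has that marker_id, so the filter leaves nothing to unnest. As a minimal sketch of what unnesting does when a match exists, the example below uses a small hand-made tibble. The object names toy and toy_long, the marker IDs and the UniProt accessions are made-up placeholders, not real MGI data.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Toy data: placeholder identifiers, not real MGI or UniProt records.
toy <- tibble(
  marker_id = c("MGI:0000001", "MGI:0000002"),
  uniprot_id = list(c("P00001", "Q00002"), "P00003")
)

# Filter on an identifier known to be present, then expand the
# list-column so that each UniProt identifier gets its own row.
toy_long <- toy |>
  filter(marker_id == toy$marker_id[1]) |>
  unnest("uniprot_id")

toy_long
```

With real data, one way to avoid an empty result is to take the identifier from the marker_id column of the tibble that was actually read, rather than typing it by hand.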
Variables
marker_status
marker_status: genetic marker status is a factor of two levels: 'O' for official, and 'W' for withdrawn. Official indicates a currently in-use genetic marker, whereas withdrawn means that the symbol or name was once approved but has since been replaced.
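Because marker_status is a factor, its levels can be used directly to keep only markers that are currently in use. The following is a minimal sketch on a hand-made tibble. The object names toy_status and official and the gene symbols are hypothetical placeholders.

```r
library(tibble)
library(dplyr)

# Toy data: placeholder symbols, with the two status levels described above.
toy_status <- tibble(
  marker_status = factor(c("O", "W", "O"), levels = c("O", "W")),
  marker_symbol = c("GeneA", "GeneB", "GeneC")
)

# Keep only official ('O') markers, dropping withdrawn ('W') ones.
official <- toy_status |>
  filter(marker_status == "O")

official
```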
marker_id
marker_id: MGI accession identifier. A unique alphanumeric character string that is used to unambiguously identify a particular record in the Mouse Genome Informatics database. The format is MGI:nnnnnn, where n is a digit.
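A quick format check can catch mistyped identifiers before filtering. This is a minimal sketch, assuming that the number of digits after the MGI: prefix may vary (the identifier MGI:1915609 used earlier has seven). The regular expression and the names ids and is_mgi_id are illustrative choices and not part of the package.

```r
# Candidate identifiers: the first is taken from the example above,
# the others are deliberately malformed.
ids <- c("MGI:1915609", "mgi:1915609", "1915609", "MGI:19156x")

# Assumed pattern: the literal prefix "MGI:" followed by one or more digits.
is_mgi_id <- grepl("^MGI:[0-9]+$", ids)

# Show each identifier next to the outcome of the check.
data.frame(id = ids, valid = is_mgi_id)
```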
marker_name
marker_name: marker name is a word or phrase that uniquely identifies the genetic marker, e.g. a gene or allele name.
chromosome
chromosome: mouse chromosome name. Possible values are names for the autosomal, sex or mitochondrial chromosomes.