UniProtKB Swiss-Prot and TrEMBL
The MGI report MRK_SwissProt_TrEMBL.rpt provides associations between MGI genetic markers and UniProtKB/Swiss-Prot and UniProtKB/TrEMBL identifiers. To read this report using the key "swiss_trembl_ids", use the following code:
# To read all records (more than 20,000), use `read_report("swiss_trembl_ids")`.
(assoc_to_swiss_trembl_ids <- read_report(report_key = "swiss_trembl_ids", n_max = 30L))
## # A tibble: 30 × 7
## marker_status marker_id marker_symbol marker_name chromosome genetic_map_pos
## <fct> <chr> <chr> <chr> <fct> <dbl>
## 1 O MGI:19156… 0610010K14Rik RIKEN cDNA… 11 43.0
## 2 O MGI:19156… 0610025J13Rik RIKEN cDNA… 4 45.6
## 3 O MGI:19156… 0610030E20Rik RIKEN cDNA… 6 32.3
## 4 O MGI:19175… 0610038B21Rik RIKEN cDNA… 8 36.6
## 5 O MGI:19156… 0610039K10Rik RIKEN cDNA… 2 84.4
## 6 O MGI:19149… 0610040B10Rik RIKEN cDNA… 5 82.1
## 7 O MGI:19235… 0610040J01Rik RIKEN cDNA… 5 32.8
## 8 O MGI:19156… 0610042G04Rik RIKEN cDNA… 9 NA
## 9 O MGI:19156… 1010001I08Rik RIKEN cDNA… NA NA
## 10 O MGI:19150… 1110002E22Rik RIKEN cDNA… 3 64.0
## # ℹ 20 more rows
## # ℹ 1 more variable: uniprot_id <list>
UniProtKB/Swiss-Prot and UniProtKB/TrEMBL identifiers
The list-column uniprot_id holds, for each marker, its UniProtKB/Swiss-Prot and UniProtKB/TrEMBL identifiers (a marker may have several). To unnest uniprot_id into one row per identifier, use:
assoc_to_swiss_trembl_ids |>
dplyr::select("marker_id", "uniprot_id") |>
dplyr::filter(marker_id == "MGI:1915609") |>
tidyr::unnest("uniprot_id")
## # A tibble: 8 × 2
## marker_id uniprot_id
## <chr> <chr>
## 1 MGI:1915609 A2CF80
## 2 MGI:1915609 A2CF83
## 3 MGI:1915609 D3Z687
## 4 MGI:1915609 F6S0D5
## 5 MGI:1915609 F6XN97
## 6 MGI:1915609 F8WH46
## 7 MGI:1915609 H3BJI0
## 8 MGI:1915609 Q9DCT6
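Because uniprot_id is a list-column, the number of UniProtKB identifiers per marker can also be obtained without unnesting. The minimal base-R sketch below uses a small hand-built data frame in place of assoc_to_swiss_trembl_ids; only MGI:1915609 and its eight identifiers come from the output above, while the second marker (MGI:0000001) and its identifier are hypothetical placeholders.

```r
# Toy stand-in for assoc_to_swiss_trembl_ids: one row per marker plus a
# list-column of UniProtKB identifiers.
# MGI:1915609 and its identifiers are copied from the unnested output above;
# MGI:0000001 and "P00000" are hypothetical placeholders.
toy <- data.frame(marker_id = c("MGI:1915609", "MGI:0000001"))
toy$uniprot_id <- list(
  c("A2CF80", "A2CF83", "D3Z687", "F6S0D5",
    "F6XN97", "F8WH46", "H3BJI0", "Q9DCT6"),
  "P00000"
)

# lengths() returns the length of each list element, i.e. the number of
# UniProtKB identifiers associated with each marker.
toy$n_uniprot <- lengths(toy$uniprot_id)
toy[, c("marker_id", "n_uniprot")]
```

On the real tibble, the same lengths() call can be applied to assoc_to_swiss_trembl_ids$uniprot_id.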
Variables
marker_status
marker_status: genetic marker status, a factor with two levels: 'O' for official and 'W' for withdrawn. Official indicates a genetic marker currently in use, whereas withdrawn means that the symbol or name was once approved but has since been replaced.
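Because marker_status is a factor, official markers can be selected with an ordinary comparison against the level 'O'. The base-R sketch below runs on hypothetical toy vectors (the identifiers are made up); with the real report, the same comparison applies to assoc_to_swiss_trembl_ids$marker_status.

```r
# Hypothetical toy data mimicking the marker_status column: a factor with
# the two documented levels, 'O' (official) and 'W' (withdrawn).
marker_status <- factor(c("O", "W", "O", "O"), levels = c("O", "W"))
marker_id <- c("MGI:0000001", "MGI:0000002", "MGI:0000003", "MGI:0000004")

# Keep only the identifiers of official markers.
official <- marker_id[marker_status == "O"]
official

# Count markers per status; levels with zero counts would still be listed.
table(marker_status)
```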
marker_id
marker_id: MGI accession identifier, a unique alphanumeric string that unambiguously identifies a particular record in the Mouse Genome Informatics database. The format is MGI:nnnnnn, where each n is a digit; the number of digits varies between accessions (e.g. MGI:1915609 has seven).
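The MGI:nnnnnn format can be checked with a regular expression. This sketch assumes the accession is the literal prefix "MGI:" followed by one or more digits; apart from MGI:1915609, which appears in the output above, the inputs are made-up examples.

```r
# Illustrative inputs: two well-formed accessions (MGI:12345 is made up)
# and two malformed strings.
ids <- c("MGI:1915609", "MGI:12345", "MGI:19156x9", "1915609")

# TRUE when the whole string is "MGI:" followed only by digits.
is_mgi_id <- grepl("^MGI:[0-9]+$", ids)
is_mgi_id
```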
marker_name
marker_name: a word or phrase that uniquely identifies the genetic marker, e.g. a gene or allele name.
chromosome
chromosome: mouse chromosome name. Possible values are the names of the autosomes, the sex chromosomes or the mitochondrial chromosome.
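In the first rows printed above, some markers have NA in chromosome. A base-R sketch for tallying markers per chromosome while keeping those NA values visible, using the chromosome values copied from those ten printed rows as toy input:

```r
# Chromosome values copied from the first ten rows of the tibble printed
# above; NA marks a marker shown without a chromosome in that output.
chromosome <- factor(c("11", "4", "6", "8", "2", "5", "5", "9", NA, "3"))

# Tally markers per chromosome; useNA = "ifany" adds an <NA> count.
counts <- table(chromosome, useNA = "ifany")
counts
```

With the full report, table(assoc_to_swiss_trembl_ids$chromosome, useNA = "ifany") would give the same kind of summary.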