This package provides a single function, read_report(), for importing MGI reports. We try to use a consistent naming scheme for variables across reports, choose appropriate variable types (e.g. factors for variables with a small set of possible values), convert disparate missing-value encodings to NA, and take care of other time-consuming tidying steps, so that you don’t have to.
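
To give a sense of the kind of tidying meant here, the following base R sketch converts a few missing-value encodings to NA and then sets column types. It does not show the package's internals: the data frame, the list of missing-value tokens, and the helper to_na() are all made up for illustration.

```r
# Hypothetical raw values, as they might look before any tidying.
raw <- data.frame(
  marker_symbol = c("geneA", "geneB", "geneC"),
  strand = c("+", "-", "null"),
  start = c("100", "N/A", ""),
  stringsAsFactors = FALSE
)

# Strings treated as missing in this sketch (an assumption, not the
# package's actual list).
missing_tokens <- c("", "null", "N/A")

# Replace every missing-value token in a vector with NA.
to_na <- function(x) {
  x[x %in% missing_tokens] <- NA
  x
}

tidy <- raw
tidy[] <- lapply(tidy, to_na)

# A variable with a small enumeration becomes a factor;
# a coordinate becomes an integer.
tidy$strand <- factor(tidy$strand, levels = c("+", "-"))
tidy$start <- as.integer(tidy$start)

str(tidy)
```

With read_report() these steps are already done for you, which is why the columns in the output further down arrive as <fct>, <int> and so on.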




To read an MGI report into R, call read_report() with the location of the report file and the name of the report:


base_url <- ""
marker_list1_rpt <- file.path(base_url, "MRK_List1.rpt")
coordinates_rpt <- file.path(base_url, "MGI_MRK_Coord.rpt")

# Import the Mouse Genetic Markers (including withdrawn marker symbols) Report
read_report(marker_list1_rpt, "MRK_List1", n_max = 10L)
#> # A tibble: 10 × 12
#>    marker_id   marker_symbol marker_name marker_type status cM_pos chr     start
#>    <chr>       <chr>         <chr>       <fct>       <fct>   <dbl> <fct>   <int>
#>  1 MGI:1341858 03B03F        DNA segmen… BAC/YAC end O        NA   5     NA     
#>  2 MGI:1341869 03B03R        DNA segmen… BAC/YAC end O        NA   5     NA     
#>  3 MGI:1337005 03.MMHAP34FR… DNA segmen… DNA Segment O        NA   11    NA     
#>  4 <NA>        0610005A07Rik withdrawn,… Gene        W        NA   3     NA     
#>  5 MGI:1918911 0610005C13Rik RIKEN cDNA… Gene        O        29.4 7      4.52e7
#>  6 <NA>        0610005K03Rik withdrawn,… Gene        W        NA   15    NA     
#>  7 <NA>        0610005M07Rik withdrawn,… Gene        W        NA   6     NA     
#>  8 <NA>        0610006A03Rik withdrawn,… Gene        W        NA   4     NA     
#>  9 <NA>        0610006A11Rik withdrawn,… Gene        W        NA   <NA>  NA     
#> 10 <NA>        0610006C01Rik withdrawn,… Gene        W        NA   <NA>  NA     
#> # ℹ 4 more variables: end <int>, strand <fct>, feature_type <fct>,
#> #   synonyms <list>
# Import the MGI Marker Coordinates Report
read_report(coordinates_rpt, "MGI_MRK_Coord", n_max = 10L)
#> # A tibble: 10 × 12
#>    marker_id marker_type marker_symbol marker_name  genome_assembly chr    start
#>    <chr>     <fct>       <chr>         <chr>        <fct>           <fct>  <int>
#>  1 MGI:87853 Gene        a             nonagouti    GRCm39          2     1.55e8
#>  2 MGI:87854 Gene        Pzp           PZP, alpha-… GRCm39          6     1.28e8
#>  3 MGI:87881 Gene        Acp1          acid phosph… GRCm39          12    3.09e7
#>  4 MGI:87926 Gene        Adh7          alcohol deh… GRCm39          3     1.38e8
#>  5 MGI:87929 Gene        Adh5          alcohol deh… GRCm39          3     1.38e8
#>  6 MGI:87859 Gene        Abl1          c-abl oncog… GRCm39          2     3.16e7
#>  7 MGI:87882 Gene        Acp2          acid phosph… GRCm39          2     9.10e7
#>  8 MGI:87862 Gene        Scgb1b27      secretoglob… GRCm39          7     3.37e7
#>  9 MGI:87883 Gene        Acp5          acid phosph… GRCm39          9     2.20e7
#> 10 MGI:87930 Gene        Adk           adenosine k… GRCm39          14    2.11e7
#> # ℹ 5 more variables: end <int>, strand <fct>, feature_type <fct>,
#> #   provider <fct>, provider_display <fct>

