Skip to contents

The goal of grantham is to provide a minimal set of routines to calculate the Grantham distance (Grantham (1974)).

The Grantham distance attempts to provide a proxy for the evolutionary distance between two amino acids based on three key side chain chemical properties: composition, polarity and molecular volume. In turn, evolutionary distance is used as a proxy for the impact of missense substitutions. The higher the distance, the more deleterious the substitution is expected to be.

Installation

Install grantham from CRAN:

install.packages("grantham")

Usage

Grantham distance between two amino acids:

library(grantham)

grantham_distance(x = "Ser", y = "Phe")
#> # A tibble: 1 × 3
#>   x     y         d
#>   <chr> <chr> <dbl>
#> 1 Ser   Phe     155

The function grantham_distance() is vectorised with amino acids being matched element-wise to form pairs for comparison:

grantham_distance(x = c("Ser", "Arg"), y = c("Phe", "Leu"))
#> # A tibble: 2 × 3
#>   x     y         d
#>   <chr> <chr> <dbl>
#> 1 Ser   Phe     155
#> 2 Arg   Leu     102

The two vectors of amino acids must have compatible sizes in the sense of vec_recycle() for element recycling to be possible, i.e., either the two vectors have the same length, or one of them is of length one, and it is recycled up to the length of the other.

# `'Ser'` is recycled to match the length of the second vector, i.e. 3.
grantham_distance(x = "Ser", y = c("Phe", "Leu", "Arg"))
#> # A tibble: 3 × 3
#>   x     y         d
#>   <chr> <chr> <dbl>
#> 1 Ser   Phe     155
#> 2 Ser   Leu     145
#> 3 Ser   Arg     110

Use the function amino_acid_pairs() to generate all 20 x 20 amino acid pairs:

aa_pairs <- amino_acid_pairs()
aa_pairs
#> # A tibble: 400 × 2
#>    x     y    
#>    <chr> <chr>
#>  1 Ser   Ser  
#>  2 Ser   Arg  
#>  3 Ser   Leu  
#>  4 Ser   Pro  
#>  5 Ser   Thr  
#>  6 Ser   Ala  
#>  7 Ser   Val  
#>  8 Ser   Gly  
#>  9 Ser   Ile  
#> 10 Ser   Phe  
#> # ℹ 390 more rows

And now calculate all Grantham distances for all pairs aa_pairs:

grantham_distance(x = aa_pairs$x, y = aa_pairs$y)
#> # A tibble: 400 × 3
#>    x     y         d
#>    <chr> <chr> <dbl>
#>  1 Ser   Ser       0
#>  2 Ser   Arg     110
#>  3 Ser   Leu     145
#>  4 Ser   Pro      74
#>  5 Ser   Thr      58
#>  6 Ser   Ala      99
#>  7 Ser   Val     124
#>  8 Ser   Gly      56
#>  9 Ser   Ile     142
#> 10 Ser   Phe     155
#> # ℹ 390 more rows

Because distances are symmetric, and for pairs formed by the same amino acid are trivially zero, you might want to exclude these pairs:

# `keep_self = FALSE`: excludes pairs such as ("Ser", "Ser")
# `keep_reverses = FALSE`: excludes reversed pairs, e.g. ("Arg", "Ser") will be
# removed because ("Ser", "Arg") already exists.
aa_pairs <- amino_acid_pairs(keep_self = FALSE, keep_reverses = FALSE)

# These amino acid pairs are the 190 pairs shown in Table 2 of Grantham's
# original publication.
aa_pairs
#> # A tibble: 190 × 2
#>    x     y    
#>    <chr> <chr>
#>  1 Ser   Arg  
#>  2 Ser   Leu  
#>  3 Ser   Pro  
#>  4 Ser   Thr  
#>  5 Ser   Ala  
#>  6 Ser   Val  
#>  7 Ser   Gly  
#>  8 Ser   Ile  
#>  9 Ser   Phe  
#> 10 Ser   Tyr  
#> # ℹ 180 more rows

# Grantham distance for the 190 unique amino acid pairs
grantham_distance(x = aa_pairs$x, y = aa_pairs$y)
#> # A tibble: 190 × 3
#>    x     y         d
#>    <chr> <chr> <dbl>
#>  1 Ser   Arg     110
#>  2 Ser   Leu     145
#>  3 Ser   Pro      74
#>  4 Ser   Thr      58
#>  5 Ser   Ala      99
#>  6 Ser   Val     124
#>  7 Ser   Gly      56
#>  8 Ser   Ile     142
#>  9 Ser   Phe     155
#> 10 Ser   Tyr     144
#> # ℹ 180 more rows

The Grantham distance di, j for two amino acids i and j is:

di, j = ρ(α(cicj)2+β(pipj)2+γ(vivj)2)1/2 .

The distance is based on three chemical properties of amino acid side chains:

  • composition (c)
  • polarity (p)
  • molecular volume (v)

We provide a data set with these properties:

amino_acids_properties
#> # A tibble: 20 × 4
#>    amino_acid     c     p     v
#>    <chr>      <dbl> <dbl> <dbl>
#>  1 Ser         1.42   9.2  32  
#>  2 Arg         0.65  10.5 124  
#>  3 Leu         0      4.9 111  
#>  4 Pro         0.39   8    32.5
#>  5 Thr         0.71   8.6  61  
#>  6 Ala         0      8.1  31  
#>  7 Val         0      5.9  84  
#>  8 Gly         0.74   9     3  
#>  9 Ile         0      5.2 111  
#> 10 Phe         0      5.2 132  
#> 11 Tyr         0.2    6.2 136  
#> 12 Cys         2.75   5.5  55  
#> 13 His         0.58  10.4  96  
#> 14 Gln         0.89  10.5  85  
#> 15 Asn         1.33  11.6  56  
#> 16 Lys         0.33  11.3 119  
#> 17 Asp         1.38  13    54  
#> 18 Glu         0.92  12.3  83  
#> 19 Met         0      5.7 105  
#> 20 Trp         0.13   5.4 170

If you want to calculate the Grantham distance from these property values you may use the function grantham_equation().

Other sources we’ve found in the R ecosystem that also provide code for calculation of the Grantham distance:

  • A GitHub Gist by Daniel E Cook provides the function calculate_grantham(), see Fetch_Grantham.R.
  • The {midasHLA} package includes the unexported function distGrantham() in utils.R.
  • The {HLAdivR} package exports a data set with the Grantham distances in the format of a matrix, see data.R.
  • The Bioconductor package {MSA2dist} by Kristian K. Ullrich provides the function aastring2dist().

Code of Conduct

Please note that the grantham package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

References

1. Grantham, R. Amino acid difference formula to help explain protein evolution. Science 185, 862–864 (1974). doi: 10.1126/science.185.4154.862.