BioPerl Alphabets
BioPerl modules use the standard extended single-letter genetic alphabets to represent nucleotide and amino acid sequences.
In addition to the standard alphabet, the following symbols are also acceptable in a biosequence:
| Symbol | Meaning |
|---|---|
| ? | a missing nucleotide or amino acid |
| - | gap in sequence |
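As a rough illustration of how these extra symbols combine with an alphabet, the sketch below checks a sequence string against the extended nucleotide symbols listed in the next section plus `?` and `-`. It is plain Python, not BioPerl code; the names `NUCLEOTIDE_SYMBOLS`, `EXTRA_SYMBOLS`, and `invalid_symbols` are hypothetical, and treating input as case-insensitive is an assumption of this example.

```python
# Illustrative sketch only; not part of BioPerl.
# Symbols of the extended DNA/RNA alphabet (see the table in the next section).
NUCLEOTIDE_SYMBOLS = set("ACGTUMRWSYKVHDBXN")
# The two additional symbols accepted in a biosequence.
EXTRA_SYMBOLS = {"?", "-"}


def invalid_symbols(seq, alphabet=NUCLEOTIDE_SYMBOLS):
    """Return the sorted list of distinct characters in seq that are not allowed.

    Assumption: the check is case-insensitive, so 'acgt' is treated like 'ACGT'.
    """
    allowed = alphabet | EXTRA_SYMBOLS
    return sorted({ch for ch in seq.upper() if ch not in allowed})
```

An empty list means every character is an alphabet symbol, a gap, or a missing-residue marker; for instance `invalid_symbols("ACG-T?N")` returns `[]`.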
Extended DNA / RNA alphabet
| Symbol | Meaning | Name or mnemonic |
|---|---|---|
| A | A | Adenine |
| C | C | Cytosine |
| G | G | Guanine |
| T | T | Thymine |
| U | U | Uracil |
| M | A or C | aMino |
| R | A or G | puRine |
| W | A or T | Weak |
| S | C or G | Strong |
| Y | C or T | pYrimidine |
| K | G or T | Keto |
| V | A or C or G | not T (V follows U) |
| H | A or C or T | not G (H follows G) |
| D | A or G or T | not C (D follows C) |
| B | C or G or T | not A (B follows A) |
| X | G or A or T or C | any (not recommended) |
| N | G or A or T or C | aNy |
IUPAC-IUB SYMBOLS FOR NUCLEOTIDE NOMENCLATURE: Cornish-Bowden (1985) Nucl. Acids Res. 13: 3021-3030.
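The table above can be read as a mapping from each symbol to the set of bases it may stand for. The following is a minimal Python sketch of that mapping, not BioPerl's own implementation; `IUPAC_DNA`, `symbols_compatible`, and `expand` are hypothetical names, and folding `U` onto `T` is a simplifying assumption made here so that DNA and RNA symbols can be compared.

```python
# Illustrative sketch only; not part of BioPerl.
from itertools import product

# Each symbol maps to the bases it can represent, following the table above.
# Assumption: U is folded onto T so DNA and RNA symbols are comparable.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def symbols_compatible(a, b):
    """Return True if the two symbols share at least one possible base."""
    return bool(set(IUPAC_DNA[a.upper()]) & set(IUPAC_DNA[b.upper()]))


def expand(seq):
    """List every unambiguous sequence consistent with an ambiguous one."""
    choices = (IUPAC_DNA[symbol] for symbol in seq.upper())
    return ["".join(bases) for bases in product(*choices)]
```

For example, `expand("AYG")` yields `["ACG", "ATG"]`, and `symbols_compatible("R", "Y")` is false because a purine and a pyrimidine have no base in common. Note that `expand` grows exponentially with the number of ambiguous positions, so it is only practical for short sequences.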
Amino Acid alphabet
Note that every letter of the alphabet is now used in the amino acid code.
| Symbol | Meaning |
|---|---|
| A | Alanine |
| B | Aspartic Acid or Asparagine |
| C | Cysteine |
| D | Aspartic Acid |
| E | Glutamic Acid |
| F | Phenylalanine |
| G | Glycine |
| H | Histidine |
| I | Isoleucine |
| J | Leucine or Isoleucine |
| K | Lysine |
| L | Leucine |
| M | Methionine |
| N | Asparagine |
| O | Pyrrolysine |
| P | Proline |
| Q | Glutamine |
| R | Arginine |
| S | Serine |
| T | Threonine |
| U | Selenocysteine |
| V | Valine |
| W | Tryptophan |
| X | Unknown |
| Y | Tyrosine |
| Z | Glutamic Acid or Glutamine |
| * | Terminator |
IUPAC-IUB AMINO ACID SYMBOLS: Biochem J. 1984 Apr 15; 219(2): 345-373; Eur J Biochem. 1993 Apr 1; 213(1): 2.
G. Srinivasan, C. M. James, J. A. Krzycki. Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA. Science 2002, 296:1459-1462.
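In the amino acid table, `B`, `J`, and `Z` each stand for one of two residues, and `X` stands for an unknown residue. A small Python sketch of how such positions might be located in a protein string follows; it is not BioPerl code, the names `AMBIGUOUS_AA` and `ambiguous_positions` are hypothetical, and the use of 0-based indices and an empty tuple for `X` are choices made for this example.

```python
# Illustrative sketch only; not part of BioPerl.
# Two-way ambiguity codes from the amino acid table above.
AMBIGUOUS_AA = {
    "B": ("D", "N"),  # Aspartic Acid or Asparagine
    "J": ("L", "I"),  # Leucine or Isoleucine
    "Z": ("E", "Q"),  # Glutamic Acid or Glutamine
    "X": (),          # Unknown: no candidate list (assumption of this sketch)
}


def ambiguous_positions(protein):
    """Return (index, symbol, candidates) for each ambiguous residue.

    Indices are 0-based; the input is upper-cased before lookup.
    """
    return [
        (index, symbol, AMBIGUOUS_AA[symbol])
        for index, symbol in enumerate(protein.upper())
        if symbol in AMBIGUOUS_AA
    ]
```

For the string `"MBJX*"` this reports positions 1, 2, and 3, while `M` and the terminator `*` are left alone.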