MAF multiple alignment format

From BioPerl

Description

"The multiple alignment format stores a series of multiple alignments in a format that is easy to parse and relatively easy to read. This format stores multiple alignments at the DNA level between entire genomes. Previously used formats are suitable for multiple alignments of single proteins or regions of DNA without rearrangements, but would require considerable extension to cope with genomic issues such as forward and reverse strand directions, multiple pieces to the alignment, and so forth." --UCSC FAQ

September 10 2007: Lincoln Stein says MAF file support is "coming soon" to GBrowse (bioperl-l mailing list).

The MAF format has recently changed. The new version is backward compatible, but it contains more than just a- and s-lines (see the UCSC FAQ). The additional information will probably not fit into the alignment classes of BioPerl.
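To illustrate the a- and s-line structure, here is a minimal sketch of a reader in plain Python. It is not the BioPerl API. The names `SeqRow`, `Block` and `parse_maf` are made up for this illustration, and the field order (src, start, size, strand, srcSize, text) follows the UCSC FAQ. Line types other than a and s are skipped, which is one simple way to cope with the newer format.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class SeqRow:
    """One s-line (hypothetical container, not a BioPerl class)."""
    src: str        # source name, e.g. "hg16.chr21"
    start: int      # zero-based start on the given strand
    size: int       # number of non-gap bases in this row
    strand: str     # "+" or "-"
    src_size: int   # total length of the source sequence
    text: str       # aligned bases, "-" marks a gap


@dataclass
class Block:
    """One alignment block: an a-line plus its s-lines."""
    score: Optional[float] = None
    rows: List[SeqRow] = field(default_factory=list)


def parse_maf(lines: Iterable[str]) -> Iterator[Block]:
    """Yield one Block per paragraph, reading only a- and s-lines."""
    block = None
    for raw in lines:
        fields = raw.split()
        if not fields:                       # blank line closes the block
            if block is not None:
                yield block
                block = None
        elif fields[0] == "a":               # a-line opens a new block
            if block is not None:
                yield block
            attrs = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
            score = float(attrs["score"]) if "score" in attrs else None
            block = Block(score=score)
        elif fields[0] == "s" and block is not None:
            src, start, size, strand, src_size, text = fields[1:7]
            block.rows.append(
                SeqRow(src, int(start), int(size), strand, int(src_size), text)
            )
        # header, comment and any other line types (i, e, q, ...) are ignored
    if block is not None:                    # file may end without a blank line
        yield block


# Usage with two blocks taken from the example on this page.
sample = """##maf version=1 scoring=humor.v4
a score=378.0
s hg16.chr21   9928803 10 +  46976097 AATAAATCTG
s mm3.chr8   107811442 10 - 128923138 AATCAATTAG

a score=1532.0
s hg16.chr21   9930514 44 +  46976097 CAGAAGGTTTTTTTGGAACAATAATCTCCAAATCCAATTAATAA
s mm3.chr8   107812892 36 - 128923138 CAGAAGATTCTTTGGG----ATAATATTCAAATCCAATTA----
"""
for block in parse_maf(sample.splitlines()):
    print(block.score, [row.src for row in block.rows])
```

Ignoring unknown line types keeps the reader tolerant of the extended format, at the cost of silently dropping the extra information.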

NOTE - The MIRA alignment tool produces a completely different MAF file format (MIRA Assembly Format (MAF)).

Example

TODO - check if this is a real MAF file.

##maf version=1 scoring=humor.v4
# humor.v4 R=30 M=10 /cluster/bluearc/hg16/bed/blastz.mm3/mafNet300/chr21.mm3.maf /cluster/bluearc/hg16/bed/blastz.rn3/mafNet300/chr21.rn3.maf
a score=9502.0
s hg16.chr21   9928623 118 +  46976097 AGCTTGTCAAGTAAGCTACCTATTTAGTGCTCGGAATGAAAGGGAGTGTGTGTTGGGAGTTGGGGGACTG----CTTGCGTGAAACATTTCTCTCTTCTGGATTTAAAAC-TTAGTCTTGGTT
s mm3.chr8   107811077 114 - 128923138 AGTTTGTCTCATAAGCCACCTGTGCACTGCCAG----AGAAAGGGAAGTGAGCTAGCAGTGAGGGCCCAG----TGTGTGTGTGTGCTTTC-CTTCTCTGGATTTAGAACCTTGGTTTTGCTT
s rn3.chr16   74307054 118 +  90224819 AGTTTGTCATGTCTGCTACCTGCGTTGCTAGAG----AAAAGGGAAAGGGAGGTAGCAGTGAGGGCCTAGGAGTTTTGTTTGTTTTTTTTC-CTTCTCTGGGTTTAAAACATTGGTTTTGCTC

a score=378.0
s hg16.chr21   9928803 10 +  46976097 AATAAATCTG
s mm3.chr8   107811442 10 - 128923138 AATCAATTAG

a score=20305.0
s hg16.chr21   9928813 183 +  46976097 ----AAATTACAAATGTGAACCAAAGCAGGGAATAAATACTTGACCAAAAATATGTAAGTAAGTGGGTGTTGGGGAATCACAATTTTTGAATATCTCAAGT-TTTTGCTTTGAAAGTTC---TATTTCAAAGTTCTTC-AAAATGATGCCTGATGTTCCTGCATACTG--TGTTCCAAATTTAGGTAAATACAA
s mm3.chr8   107811452 179 - 128923138 AAGCAAATCACAAACTTAAAGCAGA-------ATAAATACTTGTTCAGAAACACGTAAGTACCTGTGTTTTGTGGGAACGCAATTTTTGTATAGCTCAATTATTTTGACAGAAAAGATCTGTTATTTCAAAGTTCTTC-AAAATGATACCTAGTACTCCCATGTACTGTATGTTCCAAA-------GAGTACAA
s rn3.chr16   74307421 158 +  90224819 ----AAATCACAAATATAAAACAGA---------GAATACTTGTTCAGAAATATGTAAGTACATGTGCTTTGTGGGAACACAATTTTTATATAGCTCTCTT----------------TTTGTTACTTCAAATTTCTTCAAAAATTATACCTAATACTCCCGGGTAGTGCGTATTCCAAA-------CAATACAA

a score=-3898.0
s hg16.chr21   9928996  17 +  46976097 GAT------------------------------------------------------------------------------------------------------------------------------------GGAAACTGTGAAGT
s mm3.chr8   107811631 149 - 128923138 AGACaaccaggcatggcggggcatgcctatgatccaggattcaggcaggtagatttctgtgagttcagactagcttacatagtgaattccaggccagcaaaggctaaatagtgcaaccctctctcaaaaagagaaGGAGACTGTATTGT

a score=68485.0
s hg16.chr21   9929013 674 +  46976097 ATGTACCTTCAAAAA--AGAAGAAAGACA-CTGACAT--TTtatctatatatatgtaatagatttatgaagaacatatataaacatatataaac------ataaataa---------ataCTTCAAGGAACATTTAGGATAGATTTAGGATATATGAACATGTGGCA-GGGTTGGAAAGAACATAATTCTTTCCCAGAAGGGGAAGGGGGAGCTATACTTAATCGGATCCAGCTACAACATCACTGGAAGTCATTTTCTCGCCAAAAAGTATCTCCACGGCAAAATCTGATGGATAAATTCTCCGTGCTTTTTGTTTATGTAGATTATCCAATTCATTTTTTGGTAGATAAAGCCTAAGAATAGAAA--AAAATTATTACATTTTATATTGGGGCTCACTGAAAAGCCACACAGTTGGGTACCCACGTTAGA--GCTGGAAGAAACAAAAAAAGAACCTCACCA-TGAATAGAACCTCAGCCCTTTT---------------TTTGTGTCTCAGTTGGCTCCTTCCACCTTTTTACTGAGACATAAAAATACTTTAATCTTCAGCAACACATCAGTAACACATGCTGACTCTCTTAGCATTGCTTTATATGGAATTAATGACATCCAAGTTTAATAAAATATCTAAATTTCTCTCTGTGACAGAAATCAGTCAGGATAGACATGAATGAAAGCCCAGTAATAAAATACTATATCT
s mm3.chr8   107811780 620 - 128923138 ATATGCCTTCAAAAGAAAAAAAAATGACAGTTGACATAATTTTCCTGTATACATTTAATATACTTCCCAGAAacatatgtacatacatatacacacacatatatataAAATACAGACATACCATCAAAAACATT--------GGTTAGAAGACATAACTATGTAGCATGGATCTGGAAGAACATGATTCTATCCAAGAA-------GGGGAGCGGCACTTACTCAA---CAGCTAGT-------------------TCTCACCAAAATCCACCTCTACTGCAAACTTTTCACCATAAATTTTCCATGTTTTTGGTTTATGGGTATTATCCAATTCGTTTCTTGGTAGATAAAGCCTAAAAATAGGGGAGAAAGCCATTACCCTTTCTTTTAGGGCTAACAGAAAGG-CATGCAGTAGTTTACCCATTTTGGCATATCAGAAAAGAC-----------CTCACTGCTGATCAGAGCCTCCGTCCCTCTCAGTCCCTCAGAGCCTCTGTGTGTTGGTTGGCTGCTTACAATTCTT---CGAGATAC-AAAATCCTT--GT----ATGGATATGT----------TGCTGATTCTTCCAACATTGGTTTTTATTACCTTATTGA----------TGAT-----------ATTTCTATAAGTGAAAGATACT--CCAG---AGATGTATAATAAAACTCAGTAATAAAATACTGCTTCT
s rn3.chr16   74307734 603 +  90224819 ATATACCTTCAAAAG---AAAACATGACAGTTGACGTAATTTTCCTGTATACATTTAATATATTTCCCAGAAACATATGTACATGCATATACACTCACATATATATAAAATACGGACATACCATCAAAAACATA--------GGTTAGAAGACATGACTATGTAGCACGGATCTG---GAACATAATTCTATCCAAGAA-------GGGGAGCGGCACTTATTCAA---CAGCTAGTACATAACCGTGGAGC-TTATCTCCTCAAAATACACCTCGACTGCAAACTTTGGACCATAAATTTTCCATGTTTTTTGTTTATGGGTATTATCCAATTCGCTTCTTGGTAGATAAAGCCTAAAGACAGGAGA-AAACCCATTACACTTTCTTTTAGGGCTGACAGAAAGG-CATGCGATAGATTACCCATTTTGGCATGTCAGAAGAGAC-------------CACTACTGATCGGGGCCTCAGTCCCTCT---------------TCTGTGTGTCAGCTGGCTCCTTGCAATTCTT---CAAGATAC-AAAATACCTTAGT----AAGGACATGT----------TGCTGATGCTTCCAGCATTCCTTTAT-----------------------TGAT-----------ATTTCTCTAAGTGAGAAATACT--CCAG---AGATGTCTAATGAAACCCAGTAATAAAATACCGCTTCT

a score=32753.0
s hg16.chr21   9929848 460 +  46976097 AGAAGTTAGCATTTTTAGCTAAACAACAATCTC---------ATAACAAAAACAGCTTTACCAAGTAGGATGTAAATTTAAATGTTACAGAAATCTTTAGAAATTTATATAAAA-TAAGAA-TAAAAGTGACCTAGCTTATCACTTCTCCAAAATGAACATAGTGTTTTAAAGGAAAAAAAAAATGGTATCCTTTAGCAAGAACCACTTTTGAGGAGCAGCATCAAATGAAGCTCCACCCAGGTCTCACTTTTTGAGGGTCTTTGCTCATGTTAGAATAAAA-AGCTTATTGTTTGTATGCATCCAAAAAAAAAAACTTGTAAAAAATTTCCATCAAATACAAAGTTGACTCTATCAAAATCCATTAAATGTTTTGCATTGCAAGTGTGCAGACCAGAGGTTTAATTTCCTGTTGCCTTGCTGGACTTAAGGAATCATTCGATCCAGTTCACATTTGAAGAAAAGATTAGGA
s mm3.chr8   107812411 408 - 128923138 AAAAGTTGGTG-TTTGGGCTAAACAGCCACCTCCTGACTATGACTATAAAAACATCATACTCA-----TATGCAAA-TTAAATGTTACAGACACGTTTAGAATCTCCTATAAAAGTGAGCACTGAAAATGATCCAACT--TTGCTTCTTTTATATTAACATTG-ACCATAAAGAACGAAAC------TGTCCTCTAGCAAAAATCACTCTTGAGAAATAGCAACAAATTA--------CCAG--CTGGCTTTTGAAG-------ACTCAC----GAGTCAAAGGGTT---------TTCACATCCAGAAATAAA----TGGTAAAAATCTCCCCCAGATTCAGGGTCGCTCTTAC-----TCATCTAAATGT------GTGTGCGTGTCTGGGCCACTGGCTC--CCTGCTGGTGTCTTACAAGGCTTGACTTGTTACCCAGTCTCGGTCA-GCTTGAAGAGAAGGTTAGGA
s rn3.chr16   74308348 389 +  90224819 AAAAGTTGGTG-TTTTGGCTAAACAGCCACCTCT--------ACTATATAAACATCATGCTCA-----TATGCAAA-TTAAACAGTGCGAACA--TTTAGAATCTCATATAAAAGTGAGCATTAAAAACAATCCAGCT--TTGCTTC---TATATTAACATTG-ACCATGAAGAACGAAAT------CATCCTTTAGCAAAAATCACTCTTGAGAACTAGCACCCGAGTA--------CCAG--CTGGCTGTCAAAG-------ACTCAT----GACTCAAA-GGCT---------TGCACACTCAGCAATAAG----TGGTAAAAATCTCCATCAAATCCAGGGTGGCT-TTAC-----GCATCTAAGTGT------GTGCGCA-GTGTGAGTCTGTGG-----GCCACTAGTGTCTTACAAACCTTGACCCGTTATCCAGTCTTGGGCA-GCTTGAAGAGAAGGTGAGGA

a score=-1105.0
s hg16.chr21  9930308 97 + 46976097 CTGGATGTAACAATAACTATCAATTCATGCCACATATAATCATAGCCACTTCTTCAACTCTGACCTAAATCATTTAAAAAATATTTTGTCCTTTTGT
s rn3.chr16  74308737 29 + 90224819 ----------------------------------------CATAGTGGATTCT----------------------------CATTCTGTCTGTCTGT

a score=1532.0
s hg16.chr21   9930514 44 +  46976097 CAGAAGGTTTTTTTGGAACAATAATCTCCAAATCCAATTAATAA
s mm3.chr8   107812892 36 - 128923138 CAGAAGATTCTTTGGG----ATAATATTCAAATCCAATTA----
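
One way to approach the TODO above is to run a few consistency checks on each block. The sketch below is plain Python with made-up helper names (`forward_start`, `check_block`). It assumes the UCSC conventions that size counts only non-gap characters and that a start on the "-" strand is measured on the reverse-complemented source. Passing these checks does not prove the file is genuine; it only shows that the fields agree with each other.

```python
def forward_start(start, size, strand, src_size):
    """Return the zero-based start of the aligned region on the forward strand."""
    if strand == "+":
        return start
    # On the "-" strand, start counts from the end of the source sequence.
    return src_size - start - size


def check_block(s_lines):
    """Return a list of problems found in the s-lines of one block."""
    problems = []
    widths = set()
    for line in s_lines:
        _, src, start, size, strand, src_size, text = line.split()
        start, size, src_size = int(start), int(size), int(src_size)
        ungapped = len(text) - text.count("-")
        if ungapped != size:
            problems.append(f"{src}: size is {size} but text has {ungapped} bases")
        if start + size > src_size:
            problems.append(f"{src}: region runs past the end of the source")
        widths.add(len(text))
    if len(widths) > 1:                      # all rows must share one width
        problems.append("rows have different aligned lengths")
    return problems


# Usage with the second block of the example.
block = [
    "s hg16.chr21   9928803 10 +  46976097 AATAAATCTG",
    "s mm3.chr8   107811442 10 - 128923138 AATCAATTAG",
]
print(check_block(block))
print(forward_start(107811442, 10, "-", 128923138))
```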