FASTA alignment program
Description
This entry refers to the FASTA alignment program (fasta97, fasta98). Its output can be parsed in BioPerl by Bio::SearchIO. There is also a FASTA sequence format, which refers to the sequence file format that was initially designed as input to these tools. A simple extension of the sequence format gives the FASTA multiple alignment format, which is different from the database search result format that the FASTA applications output.
FASTA is Bill Pearson's package for sequence database searching.
History
(Wanted: someone to add some history of FASTA here)
Tips and Hints
Output options
BioPerl can parse both the default output and the -m 9 output, which is much more compact and gives smaller files because alignments are not produced. If you only need E-values from SSEARCH or FASTA, you can use the following options to produce a small tab-delimited file with the fastam9_to_table.PLS script.
fasta34 -H -E 1e-5 -m 9 -d 0 QueryFile SearchDatabase | fastam9_to_table > results.tab
This keeps the output file small, which reduces disk usage and may speed up your analysis.
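Once a tab-delimited table exists, a further E-value cutoff can be applied without re-running the search. The following is a minimal sketch, not part of BioPerl or the FASTA package: the function name `filter_by_evalue` is hypothetical, and the position of the E-value column is an assumption that the caller must supply after inspecting the actual results.tab file. The sample rows are invented for illustration only.

```python
import csv
import io


def filter_by_evalue(handle, evalue_col, max_evalue=1e-5):
    """Yield rows of a tab-delimited table whose E-value is <= max_evalue.

    evalue_col is the zero-based index of the E-value column. It is an
    assumption supplied by the caller, not something read from the file.
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and rows that are too short.
        if not row or row[0].startswith("#") or len(row) <= evalue_col:
            continue
        try:
            evalue = float(row[evalue_col])
        except ValueError:
            # Header row or non-numeric field: skip it.
            continue
        if evalue <= max_evalue:
            yield row


# Hypothetical two-column data (hit name, E-value), for illustration only.
example = io.StringIO("hitA\t1e-30\nhitB\t0.5\nhitC\t2e-6\n")
kept = [row[0] for row in filter_by_evalue(example, evalue_col=1)]
print(kept)
```

For a real file, open results.tab with `open("results.tab", newline="")` and pass the handle in place of the `io.StringIO` object, after checking which column holds the E-value.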
Profile searches
From the release notes, here is information on how to search a sequence profile against a database using the Smith-Waterman (SW) algorithm.
>>June 16, 2003 version: fasta34t22
ssearch34 now supports PSI-BLAST PSSM/profiles. Currently, it only
supports the "checkpoint" file produced by blastall, and only on
certain architectures where byte-reordering is unnecessary. It has not
been tested extensively with the -S option.
ssearch34 -P blast.ckpt -f -11 -g -1 -s BL62 query.aa library
Will use the frequency information in the blast.ckpt file to do a
position specific scoring matrix (PSSM) search using the
Smith-Waterman algorithm. Because ssearch34 calculates scores for
each of the sequences in the database, we anticipate that PSSM
ssearch34 statistics will be more reliable than PSI-Blast statistics.
The Blast checkpoint file is mostly double precision frequency
numbers, which are represented in a machine specific way. Thus, you
must generate the checkpoint file on the same machine that you run
ssearch34 or prss34 -P query.ckpt. To generate a checkpoint file,
run:
blastpgp -j 2 -h 1e-6 -i query.fa -d swissprot -C query.ckpt -o /dev/null
(This searches swissprot for 2 iterations ("-j 2") using an E()
threshold of 1e-6, saving the resulting position specific frequencies in
query.ckpt. Note that the original query.fa and query.ckpt must match.)
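The release note's warning about machine-specific double precision numbers can be illustrated with a short sketch. It does not read a real checkpoint file, and the actual layout of that file is not assumed here. It only shows that the same value, stored as an 8-byte IEEE 754 double (an assumption about the representation), has a different byte sequence on little-endian and big-endian machines, so a file written on one and read unchanged on the other yields wrong numbers.

```python
import struct
import sys

value = 0.25  # an arbitrary example frequency, not taken from any real file

little = struct.pack("<d", value)  # bytes as written on a little-endian machine
big = struct.pack(">d", value)     # bytes as written on a big-endian machine

print("native byte order:", sys.byteorder)
print("little-endian bytes:", little.hex())
print("big-endian bytes:   ", big.hex())

# Reading little-endian bytes as if they were big-endian gives a different number.
misread = struct.unpack(">d", little)[0]
print("misread equals original:", misread == value)
```

This is why the checkpoint file should be generated on the same machine, or at least the same architecture, that runs ssearch34.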
References
<biblio>
- fasta98 pmid=3162770
- fasta97 pmid=9403055
</biblio>