FASTA alignment program

From BioPerl


Description

This entry refers to the FASTA alignment program (fasta97, fasta98). It produces output that can be parsed in BioPerl by Bio::SearchIO. There is also a FASTA sequence format, which refers to the sequence file format that was initially designed as input to these tools. A simple extension of the sequence format gives a FASTA multiple alignment format, which is different from the database search result format that the FASTA applications output.

Bill Pearson's package for sequence database searching.

History

(Wanted: someone to add some history of FASTA here)

Tips and Hints

Output options

BioPerl can parse both the default output and the -m 9 output. The -m 9 output is much more compact and leads to smaller files, since alignments are not produced. If you only need E-value scores from SSEARCH or FASTA, you can use the following options to produce a small tab-delimited file with the fastam9_to_table.PLS script.

fasta34 -H -E 1e-5 -m 9 -d 0 QueryFile SearchDatabase | fastam9_to_table > results.tab

The resulting file is small, which reduces disk space usage and can speed up your analysis.
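Once a tab-delimited table exists, hits can be filtered by E-value with a few lines of code. The following is a minimal Python sketch, not part of BioPerl or the FASTA package. The function name filter_by_evalue is hypothetical, and the position of the E-value column is an assumption that the caller must supply after inspecting the real table; the sample rows are made up for illustration.

```python
import csv
import io


def filter_by_evalue(handle, evalue_column, max_evalue=1e-5):
    """Yield rows (lists of strings) whose E-value is at or below max_evalue.

    evalue_column is a zero-based index chosen by the caller. This sketch
    does not know the real column layout of the table (assumption).
    """
    reader = csv.reader(handle, delimiter="\t")
    for row in reader:
        if len(row) <= evalue_column:
            continue  # skip blank or short lines
        try:
            evalue = float(row[evalue_column])
        except ValueError:
            continue  # skip header or comment lines that are not numeric
        if evalue <= max_evalue:
            yield row


if __name__ == "__main__":
    # Made-up rows for illustration only; this is not real FASTA output.
    sample = io.StringIO(
        "query\thit\tevalue\n"
        "q1\thitA\t1e-30\n"
        "q1\thitB\t0.5\n"
        "q2\thitC\t2e-6\n"
    )
    for row in filter_by_evalue(sample, evalue_column=2):
        print("\t".join(row))
```

To use it on a real file, open results.tab and pass the file handle together with the column index that holds the E-value in your table.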

Profile searches

From the release notes, here is information on how to search a sequence profile against a database using the Smith-Waterman (SW) algorithm.

>>June 16, 2003 version: fasta34t22
ssearch34 now supports PSI-BLAST PSSM/profiles.  Currently, it only
supports the "checkpoint" file produced by blastpgp, and only on
certain architectures where byte-reordering is unnecessary.  It has not
been tested extensively with the -S option.

       ssearch34 -P blast.ckpt -f -11 -g -1 -s BL62 query.aa library

Will use the frequency information in the blast.ckpt file to do a
position specific scoring matrix (PSSM) search using the
Smith-Waterman algorithm.  Because ssearch34 calculates scores for
each of the sequences in the database, we anticipate that PSSM
ssearch34 statistics will be more reliable than PSI-Blast statistics.

The Blast checkpoint file is mostly double precision frequency
numbers, which are represented in a machine specific way.  Thus, you 
must generate the checkpoint file on the same machine that you run
ssearch34 or prss34 -P query.ckpt.  To generate a checkpoint file,
run:

blastpgp -j 2 -h 1e-6 -i query.fa -d swissprot -C query.ckpt -o /dev/null

(This searches swissprot for 2 iterations ("-j 2") using an E()
threshold of 1e-6, saving the resulting position specific frequencies in
query.ckpt.  Note that the original query.fa and query.ckpt must match.)
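
The two steps described in the release notes (generate the checkpoint file, then search with it) can be chained from one script so that both run on the same machine with the same query file. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, blastpgp and ssearch34 are assumed to be installed and on the PATH, and the flags are copied from the commands quoted above rather than checked against any particular release.

```python
import subprocess


def build_profile_search_commands(query, blast_db, library, checkpoint="query.ckpt"):
    """Return (blastpgp_cmd, ssearch_cmd) as argument lists.

    The same query file is passed to both programs because the query
    and the checkpoint file must match.
    """
    blastpgp_cmd = [
        "blastpgp", "-j", "2", "-h", "1e-6",
        "-i", query, "-d", blast_db,
        "-C", checkpoint, "-o", "/dev/null",
    ]
    ssearch_cmd = [
        "ssearch34", "-P", checkpoint,
        "-f", "-11", "-g", "-1", "-s", "BL62",
        query, library,
    ]
    return blastpgp_cmd, ssearch_cmd


def run_profile_search(query, blast_db, library, checkpoint="query.ckpt"):
    """Run both steps in order on the local machine (assumes both tools exist)."""
    blastpgp_cmd, ssearch_cmd = build_profile_search_commands(
        query, blast_db, library, checkpoint
    )
    # Step 1: write the checkpoint file; check=True stops on a non-zero exit.
    subprocess.run(blastpgp_cmd, check=True)
    # Step 2: PSSM search with the checkpoint that was just written.
    subprocess.run(ssearch_cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for cmd in build_profile_search_commands("query.fa", "swissprot", "library"):
        print(" ".join(cmd))
```

Passing the arguments as lists avoids shell quoting problems, and running both steps from one process on one host respects the note that the checkpoint file is machine specific.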


References

<biblio>

  1. fasta98 pmid=3162770
  2. fasta97 pmid=9403055

</biblio>
