Module:Bio::Index::Fasta
Pdoc documentation: Bio::Index::Fasta | CPAN documentation: Bio::Index::Fasta
Setting Up the BioPerl Indices (Bio::Index::*)
If you want to use BioPerl indices of FASTA, EMBL, Swissprot .dat files, SwissPfam, GenBank, or BLAST files, the bp_fetch.PLS and bp_index.PLS scripts are a good place to start; reading them also shows how to use the BioPerl indexing modules. Both scripts are in the scripts/index directory (see Bioperl scripts for a complete list of BioPerl scripts). The commands below call them as bp_index.pl and bp_fetch.pl.
bp_fetch.PLS and bp_index.PLS share their configuration through two environment variables:
- BIOPERL_INDEX - directory where the indices are kept
- BIOPERL_INDEX_TYPE - type of DBM file to use for the index (see AnyDBM_File)
For example, in csh-style shells (e.g. tcsh):
setenv BIOPERL_INDEX /nfs/datadisk/bioperlindex/
setenv BIOPERL_INDEX_TYPE SDBM_File
Or in sh-style shells (e.g. bash):
export BIOPERL_INDEX=/nfs/datadisk/bioperlindex/
export BIOPERL_INDEX_TYPE=SDBM_File
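Because both scripts rely on these variables, it can be handy to check them before indexing or fetching. The following is a minimal sh-style sketch only: the function name check_bioperl_index_env is hypothetical and not part of BioPerl, and treating an unset BIOPERL_INDEX_TYPE as a note rather than an error is an assumption.

```shell
# Hypothetical helper, not part of BioPerl: checks the two environment
# variables described above before bp_index.pl or bp_fetch.pl is run.
check_bioperl_index_env() {
    # The index directory must be named.
    if [ -z "${BIOPERL_INDEX:-}" ]; then
        echo "BIOPERL_INDEX is not set" >&2
        return 1
    fi
    # ... and it must exist as a directory.
    if [ ! -d "$BIOPERL_INDEX" ]; then
        echo "BIOPERL_INDEX is not a directory: $BIOPERL_INDEX" >&2
        return 1
    fi
    # Assumption: a missing DBM type is only worth a note, not a failure.
    if [ -z "${BIOPERL_INDEX_TYPE:-}" ]; then
        echo "note: BIOPERL_INDEX_TYPE is not set" >&2
    fi
    return 0
}
```

A typical use would be to run check_bioperl_index_env && bp_index.pl ... so that indexing only starts when the directory is in place.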
Once BIOPERL_INDEX has been set up, the basic way to index a database is to run:
bp_index.pl <index-name> <filenames as full path>
For example, for FASTA files:
bp_index.pl est /nfs/somewhere/fastafiles/est*.fa
Or, for EMBL/Swissprot files:
bp_index.pl -fmt=EMBL swiss /nfs/somewhere/swiss/swissprot.dat
Retrieving Sequences
To retrieve sequences from the index, use:
bp_fetch.pl <index-name>:<id>
For example:
bp_fetch.pl est:AA01234
or
bp_fetch.pl swiss:VAV_HUMAN
bp_fetch.pl also has other options to connect to GenBank across the network.
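To fetch several entries from the same index, the <index-name>:<id> form shown above can be built in a loop. This is a rough sh-style sketch, not something shipped with BioPerl: the function name fetch_ids and the BP_FETCH override variable are hypothetical, and the sketch assumes bp_fetch.pl is on the PATH and takes one <index-name>:<id> argument per call.

```shell
# Hypothetical helper, not part of BioPerl: fetches several IDs from one index.
# BP_FETCH is an assumed override for the command name (default: bp_fetch.pl).
fetch_ids() {
    index_name=$1
    shift
    for id in "$@"; do
        # Build the <index-name>:<id> argument and stop at the first failure.
        "${BP_FETCH:-bp_fetch.pl}" "${index_name}:${id}" || return 1
    done
}
```

For instance, fetch_ids est AA01234 AA05678 would call bp_fetch.pl once per ID; the second ID here is a made-up placeholder.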
Other Modules
See Bio::Index::Fasta, Bio::Index::GenBank, Bio::Index::Blast, Bio::Index::Hmmer, Bio::Index::EMBL, Bio::Index::SwissPfam, and Bio::Index::Swissprot for more.
Flat-file indexing of FASTA files is also provided by Bio::DB::Fasta, which offers some functionality absent from Bio::Index::Fasta.