Module:Bio::SearchIO
| Pdoc documentation: Bio::SearchIO | CPAN documentation: Bio::SearchIO |
|---|
This is a factory module for plugging in different parsers for pairwise alignment objects. It will produce Bio::Search::Result::ResultI compliant objects which in turn contain Bio::Search::Hit::HitI compliant objects and these contain Bio::Search::HSP::HSPI modules.
Contents |
Supported formats
This module can parser many different pairwise alignment search algorithm results.
| Application | Bio::SearchIO module | sub-formats | comments |
| BLAST | Bio::SearchIO::blast | WU BLAST, NCBI BLAST, bl2seq, rpsblast, psiblast, phiblast, TimeLogic BLAST, UCSC BLAST-like | This supports many different BLAST flavors |
| Tabular BLAST | Bio::SearchIO::blasttable | -m9 and -m8 (NCBI) or -mformat 2 and -mformat 3 (WU-BLAST) tabular format | This is the tab-delimited column format from NCBI BLAST or WU BLAST |
| Megablast BLAST | Bio::SearchIO::megablast | Format 0 | Format 2 should be parseable as standard BLAST output. |
| XML BLAST | Bio::SearchIO::blastxml | NCBI XML format | NCBI's XML DTD |
| FASTA | Bio::SearchIO::fasta | -m 9 -d 0 OR the default -m 1 options | Default or compact formats |
| HMMER | Bio::SearchIO::hmmer | hmmsearch and hmmpfam output parsed | Domains are Hits and alignments fall into the HSP category. Note that query-subject relationship flips between hmmsearch and hmmpfam (i.e. the domain is subject in hmmpfam and query in hmmsearch) |
| Exonerate | Bio::SearchIO::exonerate | CIGAR and VULGAR formats | Only one of CIGAR or VULGAR should be parsed (provide the -vulgar =>1 or -cigar=>1 when initializing the object) |
| Genewise Genomewise | Bio::SearchIO::wise | -genesf output | This parses the -genesf or -genes output from Genewise and Genomewise |
| Sim4 | Bio::SearchIO::sim4 | A={0,1,3,4} output | Cannot parse LAV or 'exon file' formats (A=2 or A=5) |
| PSL alignment format BLAT UCSC tools | Bio::SearchIO::psl | psl format | with or without -noHead option |
| AXT alignment format BLAT UCSC tools | Bio::SearchIO::axt | Directly produced from blat or see the lavToAxt | If you run BLASTZ you can convert the LAV alignment format to AXT alignment format with lavToAxt |
| WABA | Bio::SearchIO::waba | This processes the waba output (not the human readable portion) | The format is a lot like AXT but has four lines per alignment block |
Note: The module name gives you the 'format code' to use when creating a new Bio::SearchIO object. For example: axt, cross_match, megablast, exonerate, infernal, blastxml, and fasta. See also the SearchIO HOWTO.
These objects represent the three components of a BLAST or FASTA pairwise database search result. The can be though of like
- Result - a container object for a given query sequence, there will be a Result for every query sequence in a database search
- Hit - a container object for each identified sequence found to be similar to the query sequence, it contains HSPs
Note
Module:Bio::SearchIO produces Module:Bio::Search::Result::ResultI compliant objects which in turn contain Module:Bio::Search::Hit::HitI compliant objects (e.g. Module:Bio::Search::Hit::GenericHit) and these contain Module:Bio::Search::HSP::HSPI compliant objects.
Using SearchIO
Here's example code which processes a BLAST report, then iterates through the generated objects, finding all the hits where the HSPs are greater than 50 residues and the percent identity is less than 75 percent. This code demonstrates that a result, in this case from a BLAST report, contains one or more hits, and a hit contains one or more HSPs.
use strict; use Bio::SearchIO; my $in = new Bio::SearchIO(-format => 'blast', -file => 'report.bls'); while( my $result = $in->next_result ) { ## $result is a Bio::Search::Result::ResultI compliant object while( my $hit = $result->next_hit ) { ## $hit is a Bio::Search::Hit::HitI compliant object while( my $hsp = $hit->next_hsp ) { ## $hsp is a Bio::Search::HSP::HSPI compliant object if( $hsp->length('total') > 50 ) { if ( $hsp->percent_identity >= 75 ) { print "Query=", $result->query_name, " Hit=", $hit->name, " Length=", $hsp->length('total'), " Percent_id=", $hsp->percent_identity, "\n"; } } } } }
The example above shows just a few of the many methods available in Bio::SearchIO system.