From BioPerl
Jump to: navigation, search
PDoc Bio::SearchIO
CPAN Bio::SearchIO
metaCPAN Bio::SearchIO

This is a factory module for plugging in different parsers for pairwise alignment objects. It will produce Bio::Search::Result::ResultI compliant objects which in turn contain Bio::Search::Hit::HitI compliant objects and these contain Bio::Search::HSP::HSPI modules.


Supported formats

This module can parser many different pairwise alignment search algorithm results.

Application Bio::SearchIO module sub-formats comments
BLAST Bio::SearchIO::blast WU BLAST, NCBI BLAST, bl2seq, rpsblast, psiblast, phiblast, TimeLogic BLAST, UCSC BLAST-like This supports many different BLAST flavors
Tabular BLAST Bio::SearchIO::blasttable -m9 and -m8 (NCBI) or -mformat 2 and -mformat 3 (WU-BLAST) tabular format This is the tab-delimited column format from NCBI BLAST or WU BLAST
Megablast BLAST Bio::SearchIO::megablast Format 0 Format 2 should be parseable as standard BLAST output.
XML BLAST Bio::SearchIO::blastxml NCBI XML format NCBI's XML DTD
FASTA Bio::SearchIO::fasta -m 9 -d 0 OR the default -m 1 options Default or compact formats
HMMER Bio::SearchIO::hmmer hmmsearch and hmmpfam output parsed Domains are Hits and alignments fall into the HSP category. Note that query-subject relationship flips between hmmsearch and hmmpfam (i.e. the domain is subject in hmmpfam and query in hmmsearch)
Exonerate Bio::SearchIO::exonerate CIGAR and VULGAR formats Only one of CIGAR or VULGAR should be parsed (provide the -vulgar =>1 or -cigar=>1 when initializing the object)
Genewise Genomewise Bio::SearchIO::wise -genesf output This parses the -genesf or -genes output from Genewise and Genomewise
Sim4 Bio::SearchIO::sim4 A={0,1,3,4} output Cannot parse LAV or 'exon file' formats (A=2 or A=5)
PSL alignment format BLAT UCSC tools Bio::SearchIO::psl psl format with or without -noHead option
AXT alignment format BLAT UCSC tools Bio::SearchIO::axt Directly produced from blat or see the lavToAxt If you run BLASTZ you can convert the LAV alignment format to AXT alignment format with lavToAxt
WABA Bio::SearchIO::waba This processes the waba output (not the human readable portion) The format is a lot like AXT but has four lines per alignment block

Note: The module name gives you the 'format code' to use when creating a new Bio::SearchIO object. For example: axt, cross_match, megablast, exonerate, infernal, blastxml, and fasta. See also the SearchIO HOWTO.

These objects represent the three components of a BLAST or FASTA pairwise database search result. The can be though of like

  • Result - a container object for a given query sequence, there will be a Result for every query sequence in a database search
    • Hit - a container object for each identified sequence found to be similar to the query sequence, it contains HSPs
      • HSP - represents the alignment of the query and hit sequence. For BLAST there can be multiple HSPs while only a single one for FASTA results. The HSP object will contain reference to the query and subject alignment start and end.


Module:Bio::SearchIO produces Module:Bio::Search::Result::ResultI compliant objects which in turn contain Module:Bio::Search::Hit::HitI compliant objects (e.g. Module:Bio::Search::Hit::GenericHit) and these contain Module:Bio::Search::HSP::HSPI compliant objects.

Using SearchIO

Here's example code which processes a BLAST report, then iterates through the generated objects, finding all the hits where the HSPs are greater than 50 residues and the percent identity is less than 75 percent. This code demonstrates that a result, in this case from a BLAST report, contains one or more hits, and a hit contains one or more HSPs.

use strict;
use Bio::SearchIO; 
my $in = new Bio::SearchIO(-format => 'blast', 
                           -file   => 'report.bls');
while( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      if( $hsp->length('total') > 50 ) {
        if ( $hsp->percent_identity >= 75 ) {
          print "Query=",   $result->query_name,
            " Hit=",        $hit->name,
            " Length=",     $hsp->length('total'),
            " Percent_id=", $hsp->percent_identity, "\n";

The example above shows just a few of the many methods available in Bio::SearchIO system.

See also

Personal tools
Main Links