Module:Bio::SeqIO
Introduction
Bio::SeqIO provides a factory interface for reading and writing sequence files. The system is designed to be pluggable, so new formats can be added easily. Additional documentation is provided by the SeqIO HOWTO. The module reads and writes many different sequence formats; some of them are listed below.
| Format | Bio::SeqIO module | comments |
|---|---|---|
| FASTA | Bio::SeqIO::fasta | |
| FASTQ | Bio::SeqIO::fastq | |
| NEXML | Bio::SeqIO::nexml | |
| SEQXML | Bio::SeqIO::seqxml | |
| BSML | Bio::SeqIO::bsml | |
| GenBank | Bio::SeqIO::genbank | |
| EMBL | Bio::SeqIO::embl | |
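As a rough illustration of the factory interface, the sketch below converts FASTA records to EMBL, in the spirit of the Bio::SeqIO synopsis. It assumes BioPerl is installed; the in-memory filehandles passed through -fh stand in for the -file arguments a real script would use, and the sample records are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Made-up FASTA input held in memory; a real script would use -file => 'in.fa'.
my $fasta = ">seq1 first example\nACGTACGTAC\n>seq2 second example\nGGGCCCAAAT\n";
open my $in_fh, '<', \$fasta or die "cannot open input: $!";

# Output collected in a string; a real script would use -file => '>out.embl'.
my $embl = '';
open my $out_fh, '>', \$embl or die "cannot open output: $!";

# The factory loads Bio::SeqIO::fasta and Bio::SeqIO::embl based on -format.
my $in  = Bio::SeqIO->new(-fh => $in_fh,  -format => 'fasta');
my $out = Bio::SeqIO->new(-fh => $out_fh, -format => 'embl');

my @ids;
while (my $seq = $in->next_seq) {
    push @ids, $seq->display_id;
    $out->write_seq($seq);
}
$out->close;    # flush the EMBL writer before inspecting $embl

print "converted: @ids\n";
print $embl;
```

Only the -format strings differ between reader and writer; swapping 'embl' for another writable format from the table is the whole change needed to target a different output.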
Implementing new sequence parsers and writers
A new Bio::SeqIO subclass must:
- have an all-lowercase name, e.g. simpleseq, because Bio::SeqIO->new lowercases the -format value before loading the matching module
- be in the Bio::SeqIO package namespace, e.g. Bio::SeqIO::simpleseq
- reside in the Bio/SeqIO directory (below a directory on @INC)
- implement the next_seq method (for reading)
- implement the write_seq method (for writing)
next_seq
The next_seq method should read data using the $self->_readline method, which every Bio::SeqIO module inherits from Bio::Root::IO. It should return a new Bio::PrimarySeqI object. If the file or stream contains more than one sequence, each repeated call to next_seq should return the next sequence until the stream is exhausted, at which point an undefined value should be returned. If the sequence data is rich, meaning it contains features and annotations, then Bio::SeqI or Bio::Seq::RichSeqI objects should be returned instead; both interfaces extend Bio::PrimarySeqI.
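From the caller's side, this contract means a stream can be drained with a simple while loop: each call hands back one object implementing Bio::PrimarySeqI, and the loop ends when next_seq returns undef. A minimal sketch, assuming BioPerl is installed and using made-up FASTA data read through an in-memory filehandle:

```perl
use strict;
use warnings;
use Bio::SeqIO;

# Three made-up records; a real script would pass -file => 'in.fa' instead.
my $data = ">a\nACGT\n>b\nGGCCTT\n>c\nACGTA\n";
open my $fh, '<', \$data or die "cannot open input: $!";
my $in = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

my %length_of;
# next_seq returns one sequence object per call and undef once the stream is exhausted
while (defined(my $seq = $in->next_seq)) {
    # every returned object implements the Bio::PrimarySeqI interface
    die "unexpected object\n" unless $seq->isa('Bio::PrimarySeqI');
    $length_of{ $seq->display_id } = $seq->length;
}

printf "%s: %d\n", $_, $length_of{$_} for sort keys %length_of;
```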
write_seq
The write_seq method should accept a list of one or more Bio::PrimarySeqI objects and write each of them in the desired format. The data should be written to the stream using the $self->_print method, which is also inherited from Bio::Root::IO. By convention, write_seq returns 1 on success.
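Because write_seq takes a list, a single call can write several sequences. The sketch below, assuming BioPerl is installed, builds two Bio::PrimarySeq objects with made-up identifiers and writes both to an in-memory FASTA stream:

```perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::SeqIO;

# Made-up sequences for illustration.
my @seqs = (
    Bio::PrimarySeq->new(-display_id => 'alpha', -seq => 'ACGTACGT'),
    Bio::PrimarySeq->new(-display_id => 'beta',  -seq => 'TTGGCCAA'),
);

my $buffer = '';
open my $fh, '>', \$buffer or die "cannot open output: $!";
my $out = Bio::SeqIO->new(-fh => $fh, -format => 'fasta');

# one call, many sequences: write_seq accepts a list of Bio::PrimarySeqI objects
$out->write_seq(@seqs);
$out->close;    # flush before reading $buffer

print $buffer;
```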
Example module
```perl
package Bio::SeqIO::simpleseq;

use strict;
use warnings;
use Bio::PrimarySeq;
use base qw(Bio::SeqIO);

# default field separator: one "ID<TAB>SEQUENCE" record per line
our $SEP = "\t";

# if this module has its own special initialization options
sub _initialize {
    my ($self, @args) = @_;
    $self->SUPER::_initialize(@args);
    my ($sep) = $self->_rearrange([qw(SEP)], @args);
    $self->sep($sep) if defined $sep;
}

# method to write a sequence out

=head2 write_seq

 Title   : write_seq
 Usage   : $stream->write_seq(@seq)
 Function: writes the $seq objects into the stream
 Returns : 1 for success and 0 for error
 Args    : array of 1 to n Bio::PrimarySeqI objects

=cut

sub write_seq {
    my ($self, @seqs) = @_;
    my $sep = $self->sep;
    for my $seq (@seqs) {
        $self->_print(join($sep, $seq->display_id, $seq->seq), "\n");
    }
    return 1;
}

# method to read a sequence in

=head2 next_seq

 Title   : next_seq
 Usage   : my $seq = $stream->next_seq
 Function: reads a $seq object from the stream
 Returns : Bio::PrimarySeqI, or undef at the end of the stream
 Args    : None

=cut

sub next_seq {
    my $self = shift;
    my $line = $self->_readline;
    # a bare return yields undef in scalar context and an empty list in list context
    return unless defined $line && $line =~ /\S/;
    chomp $line;    # strip the newline so it does not end up in the sequence
    my $sep = $self->sep;
    # \Q...\E matches separators such as '|' literally instead of as a regex
    my ($id, $seq) = split /\Q$sep\E/, $line, 2;
    return Bio::PrimarySeq->new(-seq => $seq, -display_id => $id);
}

=head2 sep

 Title   : sep
 Usage   : $obj->sep($newval)
 Function: Get/Set the field separator
 Returns : value of separator
 Args    : newvalue (optional)

=cut

sub sep {
    my ($self, $value) = @_;
    $self->{'_sep'} = $value if defined $value;
    return defined $self->{'_sep'} ? $self->{'_sep'} : $SEP;
}

1;    # a Perl module must return a true value
```