Extracting sequence in the neighborhood of a feature
From BioPerl
(see the bioperl-l thread starting here)
Rich Green asks:
I am looking for a command that would extract 2000 base pairs upstream of the sequence. Is there a way to extract just the gene promoter region from genbank?
Kevin Brown responds:
Get the start point for a Sequence Feature and request a subseq of the main object that starts 2000 above that.
- Jason Stajich made a technical change and kibbitzed in consideration of strandedness. The hybrid scrap is below --Ed.
my $gb = new Bio::DB::GenBank; $entry = $gb->get_Seq_by_id($id); my $promotor; foreach my $f ($entry->all_SeqFeatures()) { if ($f->primary_tag eq 'CDS') { if( $f->strand < 0 ) { $promotor = $entry->trunc($f->end+1, $f->end + 2000)->revcom; } else { $promotor = $entry->trunc($f->start - 2000, $f->start -1); } $out->write_seq($promotor); } }
to the #top