Merge gapped sequences across a common region
From BioPerl
(see thread)
Albert Vilella sez: I basically want to start with something like this:
seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
seq2.234 QWERTYU-------------------
seq2.345 ----------ASDFGH----------
seq2.456 -------------------ZXCVBNM
and end with something like this:
seq1.123 QWERTYUIOPASDFGHJKLZXCVBNM
seq2.mrg QWERTYU---ASDFGH---ZXCVBNM
Here's one of my favorite tricks for this: XOR mask on gap symbol. Fast! --ed.
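use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;

# Read the gapped fragments from the FASTA records after __END__.
my $seqio = Bio::SeqIO->new( -fh => \*DATA, -format => 'fasta' );

# XOR each sequence with a full-length mask of gap symbols: gap columns
# become NUL bytes, so XORing the masked strings together overlays the
# fragments. Assumes all sequences have the same length and that no
# column holds a residue in more than one fragment.
my $first = $seqio->next_seq->seq;
my $mask  = '-' x length $first;
my $acc   = $first ^ $mask;
while ( my $seq = $seqio->next_seq ) {
    $acc ^= ( $seq->seq ^ $mask );
}

# XOR with the mask once more to turn the NUL bytes back into gap symbols.
my $mrg = Bio::Seq->new( -id => 'merged', -seq => $acc ^ $mask );
print $mrg->seq, "\n";

1;
__END__
>seq2.234
QWERTYU-------------------
>seq2.345
----------ASDFGH----------
>seq2.456
-------------------ZXCVBNM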
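
For anyone who wants to try the XOR trick without BioPerl, here is a minimal sketch on plain strings. The name merge_gapped is made up for this example and is not a BioPerl function. It assumes all fragments have the same length, and it dies if two fragments carry a residue in the same column, since XOR would silently scramble those positions. It also assumes the `bitwise` feature is not enabled, so that `^` and `&` act as string operators.

```perl
use strict;
use warnings;

# merge_gapped() is a hypothetical helper, not part of BioPerl.
# It overlays equal-length gapped strings whose residues do not overlap.
sub merge_gapped {
    my ( $gap, @seqs ) = @_;
    die "no sequences given\n" unless @seqs;

    my $len  = length $seqs[0];
    my $mask = $gap x $len;     # full-length mask of gap symbols
    my $acc  = "\0" x $len;     # all-NUL accumulator: nothing merged yet

    for my $seq (@seqs) {
        die "sequences differ in length\n" unless length($seq) == $len;

        # String XOR with the mask: gaps become NUL, residues become non-NUL.
        my $bits = $seq ^ $mask;

        # Reduce both strings to occupancy maps (\x01 = residue, NUL = gap)
        # and AND them; any \x01 left marks a column filled twice.
        ( my $seen = $acc )  =~ tr/\0/\x01/c;
        ( my $new  = $bits ) =~ tr/\0/\x01/c;
        die "residues overlap\n" if ( $seen & $new ) =~ tr/\x01//;

        $acc ^= $bits;
    }

    # XOR with the mask again: NUL goes back to the gap symbol,
    # and each masked residue goes back to itself.
    return $acc ^ $mask;
}

my $merged = merge_gapped(
    '-',
    'QWERTYU-------------------',
    '----------ASDFGH----------',
    '-------------------ZXCVBNM',
);
print "$merged\n";    # QWERTYU---ASDFGH---ZXCVBNM
```

Because the mask covers the full length of the sequence, the result does not depend on how many fragments are merged: columns that are gaps in every fragment stay NUL in the accumulator and come back as gaps at the end.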