Site frequencies in an alignment

From BioPerl
Jump to: navigation, search
  • So, here's one reason for the Scrapbook. You say, "why write something I already sort of wrote?" So today I came to the Scrapbook to cannibalize Site entropy in an alignment, to get it to squirt out the actual frequencies of each nucleotide at each site. The key here is the round-trip: a new scrap for you, based on the mods. --Ed.

To get an array of hashes of hashes containing the frequencies of each residue at each site of an alignment:

$freqs = freqs_by_column($aln);
=head2 freqs_by_column
 Title   : freqs_by_column
 Usage   : freqs_by_column( $aln )
 Function: returns the freqs for each column in an alignment
 Example :
 Returns : hashref of the form { $column_number => {'A'=>$afreq,...}, ... }
 Args    : a Bio::SimpleAlign object
=cut
 
 sub freqs_by_column {
    my ($aln) = @_;
    my (%freqs);
    foreach my $col (1..$aln->length) {
      my %res;
      foreach my $seq ($aln->each_seq) {
          my $loc = $seq->location_from_column($col);
          next if (!defined($loc) || $loc->location_type eq 'IN-BETWEEN');
          $res{$seq->subseq($loc)}++;
      }
      $freqs{$col} = freqs(\%res);
    }
    return {%freqs};
 }
 
=head2 freqs
 Title   : freqs
 Usage   : freqs( \%ntct )
 Function: returns the site frequencies of nts in an aligment
 Returns : hashref of freqs: { A => 0.5, G => 0.5 }
 Args    : hashref of nt counts { A => 100, G => 100 }
=cut
 
sub freqs {
    my $ntct = shift;
    my $t = eval join('+', values %$ntct);
    my @a = qw( A T G C );
    my @f = map {$_ /= $t} @{$ntct}{@a};
    $t = eval join('+', @f);
    map { $_ /= $t } @f;
    my %ret;
    @ret{@a} = @f;
    return {%ret}
}

but don't expect to set any speed records.

Personal tools
Namespaces
Variants
Actions
Main Links
documentation
community
development
Toolbox