Site frequencies in an alignment
From BioPerl
- So, here's one reason for the Scrapbook. You say, "why write something I already sort of wrote?" So today I came to the Scrapbook to cannibalize Site entropy in an alignment, to get it to squirt out the actual frequencies of each nucleotide at each site. The key here is the round-trip: a new scrap for you, based on the mods. --Ed.
To get an array of hashes of hashes containing the frequencies of each residue at each site of an alignment:
$freqs = freqs_by_column($aln);
=head2 freqs_by_column Title : freqs_by_column Usage : freqs_by_column( $aln ) Function: returns the freqs for each column in an alignment Example : Returns : hashref of the form { $column_number => {'A'=>$afreq,...}, ... } Args : a Bio::SimpleAlign object =cut sub freqs_by_column { my ($aln) = @_; my (%freqs); foreach my $col (1..$aln->length) { my %res; foreach my $seq ($aln->each_seq) { my $loc = $seq->location_from_column($col); next if (!defined($loc) || $loc->location_type eq 'IN-BETWEEN'); $res{$seq->subseq($loc)}++; } $freqs{$col} = freqs(\%res); } return {%freqs}; } =head2 freqs Title : freqs Usage : freqs( \%ntct ) Function: returns the site frequencies of nts in an aligment Returns : hashref of freqs: { A => 0.5, G => 0.5 } Args : hashref of nt counts { A => 100, G => 100 } =cut sub freqs { my $ntct = shift; my $t = eval join('+', values %$ntct); my @a = qw( A T G C ); my @f = map {$_ /= $t} @{$ntct}{@a}; $t = eval join('+', @f); map { $_ /= $t } @f; my %ret; @ret{@a} = @f; return {%ret} }
but don't expect to set any speed records.