Site entropy in an alignment
From BioPerl
(see bioperl-l discussion here, and see also Site frequencies in an alignment)
Cláudio Sampaio asks
How to get the entropy of each nucleotide of an alignment?
from Mark Jensen:
If you have a Bio::SimpleAlign object prepared -- maybe from
$alnio = new Bio::AlignIO(-format=>'fasta', -file=>'your.fas'); $aln = $alnio->next_aln;
try the following function, as
$entropies = entropy_by_column( $aln ) =head2 entropy_by_column Title : entropy_by_column Usage : entropy_by_column( $aln ) Function: returns the Shannon entropy for each column in an alignment Example : Returns : hashref of the form { $column_number => $entropy, ... } Args : a Bio::SimpleAlign object =cut sub entropy_by_column { my ($aln) = @_; my (%ent); foreach my $col (1..$aln->length) { my %res; foreach my $seq ($aln->each_seq) { my $loc = $seq->location_from_column($col); next if (!defined($loc) || $loc->location_type eq 'IN-BETWEEN'); $res{$seq->subseq($loc)}++; } $ent{$col} = entropy(values %res); } return [%ent]; } =head2 entropy Title : entropy Usage : entropy( @numbers ) Function: returns the Shannon entropy of an array of numbers, each number represents the count of a unique category in a collection of items Example : entropy ( 1, 1, 1 ) # returns 1.09861228866811 = log(1/3) Returns : Shannon entropy or undef if entropy undefined; Args : an array =cut use List::Util qw(sum); sub entropy { @a = grep {$_} @_; return undef unless grep {$_>0} @a; return undef if grep {$_<0} @a; my $t = sum(@a); $_ /= $t foreach @a; return sum(map { -$_*log($_) } grep {$_} @a); }