HOWTO:Testing molecular clock
From BioPerl
Testing for a molecular clock
This is just a dumping ground for some scripts for testing for a molecular clock. Justification and background should be placed here.
Code and Scripts
Scripts focused on PAML
Run Codeml (PAML) with and without clock on a directory of files ending in '.cds.phy'. Will create a bunch of files ending in .codeml in the clock and noclock directories. It assumes that the alignments are in a directory call 'aln' and that the genes are named with the species abbreviation that is used in the phylogeny.tre file. I will put up examples of phylogeny.tre and a file from the aln directory.
#!/usr/bin/perl -w my $filecontent = <<EOF seqfile = %s * sequence data filename treefile = %s * tree structure file name outfile = %s * main result file name noisy = 2 * 0,1,2,3,9: how much rubbish on the screen verbose = 0 * 0: concise; 1: detailed, 2: too much runmode = 0 * 0: user tree; seqtype = 1 * 1:codons; 2:AAs; 3:codons-->AAs CodonFreq = 2 * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table clock = %d * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis aaDist = 0 * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a model = 0 NSsites = 0 icode = 0 * 0:universal code; 1:mammalian mt; 2-10:see below Mgene = 0 fix_kappa = 0 * 1: kappa fixed, 0: kappa to be estimated kappa = 2 * initial or fixed kappa fix_omega = 0 * 1: omega or omega_1 fixed, 0: estimate omega = .4 * initial or fixed omega, for codons or codon-based AAs fix_alpha = 1 * 0: estimate gamma shape parameter; 1: fix it at alpha alpha = 0. * initial or fixed alpha, 0:infinity (constant rate) Malpha = 0 * different alphas for genes ncatG = 8 * # of categories in dG of NSsites models getSE =1 * RateAncestor = 0 * (0,1,2): rates (alpha>0) or ancestral states (1 or 2) Small_Diff = .5e-6 cleandata = 1 * remove sites with ambiguity data (1:yes, 0:no)? method = 0 * 0: simultaneous; 1: one branch at a time EOF ; my $dir = 'aln'; my $clockdir = 'clock'; my $noclockdir = 'noclock'; mkdir($clockdir) unless -d $clockdir; mkdir($noclockdir) unless -d $noclockdir; my $ctl = 'codeml.ctl'; my $force = shift || 0; opendir(DIR,$dir) || die $!; for my $file (readdir(DIR) ) { next unless $file =~ /(\S+)\.cds\.phy$/; my $stem = $1; my $clock = 0; for my $o ( ["$noclockdir/$stem.codeml", "phylogeny.unrooted", "$clockdir/$stem.codeml", "phylogeny.rooted" ) { my ($outfile,$phylogeny) = @{$o}; # unroll the arrayref if( $force || (! -f $outfile || -z $outfile) ) { # clock open(CODEML,">$ctl") || die $!; printf CODEML $filecontent, "$dir/$file", $phylogeny,$outfile, $clock; close(CODEML); `codeml`; } $clock++; } }
Processing the results
#!/usr/bin/perl -w use strict; use Statistics::Distributions; my $signif_high = '0.01'; my $signif_mod = '0.05'; my @dirs = qw(clock noclock); my $ext = 'codeml'; my %sets; for my $dir ( @dirs ) { opendir(DIR, $dir) || die $!; for my $file ( readdir(DIR) ) { next unless $file =~ /(\S+)\.$ext$/; my $num = $1; open(IN,"$dir/$file") || die $!; while(<IN>) { if( /lnL\([^\)]+\):\s+(\-?\d+\.\d+)/ ) { $sets{$num}->{$dir} = $1 * -1; # this is just a lnL it is not -lnL } elsif( /^ns\s+=\s+(\d+)/ ) { $sets{$num}->{'ns'} = $1; } } } } # to then compare the clock vs no-clock my $i = 0; print join("\t", qw(CLUSTER NOCLOCK CLOCK LRT SIGNIF)),"\n"; for my $k ( sort keys %sets ) { my $d = $sets{$k}; if( defined $d ) { # H1,H0 = - log (likelihood) # LRT = 2*(lnL1 - lnL2); my $lrt = 2 * ($d->{'clock'} - $d->{'noclock'} ); if( ! defined $d->{clock} || ! defined $d->{'noclock'} ) { warn("no dat for $i\n"); } my $p = Statistics::Distributions::chisqrprob ($d->{'ns'} - 2,$lrt); printf "%s\t%9.3f\t%9.3f\t%-7.2f\t%-5.2g%s\n",$i,$d->{'noclock'}, $d->{'clock'}, $lrt, $p, $p < $signif_high ? '**' : $p < $signif_mod ? '*' : ''; } $i++; }