HOWTO:Trees (post-refactor)
For the Lazy
use strict;
use warnings;
use Bio::Tree::Tree;
use Bio::TreeIO;

my $tree;

# Step 1: Load a tree
# from a file
#$tree = Bio::Tree::Tree->from_file("my-tree.xml");
# from a string
$tree = Bio::Tree::Tree->from_string("(a,(b,c));");
# using the TreeIO system (the format can be autodetected if the -format argument is missing)
#my $treein = Bio::TreeIO->new(-file => "my-tree.xml", -format => 'phyloxml');
#$tree = $treein->next_tree;

# Step 2: Save a tree
# you can get the Newick string directly
print "Newick format: " . $tree->newick . "\n";
# or use a TreeIO writer
my $treeout = Bio::TreeIO->new(-file => ">tree-out.xml", -format => 'phyloxml');
$treeout->write_tree($tree);

# Step 3: Analyze a tree
print "Leaf count: " . scalar($tree->leaves) . "\n";
print "Node count: " . scalar($tree->nodes) . "\n";
print "Total branch length: " . $tree->total_branch_length . "\n";
print "Max root-to-tip branch length: " . $tree->max_distance_to_leaf . "\n";
print "Max root-to-tip node depth: " . $tree->max_depth_to_leaf . "\n";

# Step 4: Modify a tree
# print a human-readable ASCII diagram
print "Original tree:\n" . $tree->ascii;
my $orig_root = $tree->root;

# re-root the tree on the node labeled 'b'
$tree->reroot($tree->find('b'));
print "Rerooted on b:\n" . $tree->ascii;

# re-root halfway along the branch leading to 'c'
# (this can be more intuitive, but it adds a new internal node to the tree)
$tree->reroot_above($tree->find('c'), 0.5);
print "Rerooted above c:\n" . $tree->ascii;

# return to the original root and remove the internal node created
# by the previous re-rooting
$tree->reroot($orig_root);
$tree->contract_linear_paths;

# use a key-value mapping to translate the tree's node labels
my $id_map = { 'a' => 'Aardvark', 'b' => 'Banana', 'c' => 'Coyote' };
$tree->translate_ids($id_map);
print "Translated IDs:\n" . $tree->ascii;

# add a new node to the tree
# ... first create a new object of the same class as the root
my $root_node = $tree->root;
my $new_node = ref($root_node)->new;
# ... then add it as a third child of Banana's parent node
$tree->find('Banana')->parent->add_child($new_node);
$new_node->branch_length(1);
$new_node->id('z');
print "New node added:\n" . $tree->ascii;

# the tree now has a multifurcation (a node with three children);
# ask BioPerl to resolve it randomly into binary splits
$tree->force_binary;
print "Forced to binary structure:\n" . $tree->ascii;

# Step 5: Extract a subtree (slice)
# keep only the chosen nodes and the branches that connect them
$tree = Bio::Tree::Tree->from_string("(a,(b,(c,(d,(e,(f,(g,(h,i))))))));");
print "Full tree:\n" . $tree->ascii;
my $slice = $tree->slice($tree->find('a'), $tree->find('d'), $tree->find('i'));
print "Slice:\n" . $slice->ascii;
# or slice directly by node labels
$slice = $tree->slice_by_ids('a', 'c', 'e', 'i');
print "Slice:\n" . $slice->ascii;
Output:
Newick format: (a,(b,c));
Leaf count: 3
Node count: 5
Total branch length: 0
Max root-to-tip branch length: 2
Max root-to-tip node depth: 1
Original tree:
          /-a
---------|
         |          /-b
          \--------|
                    \-c
Rerooted on b:
                    /-c
-b------- /--------|
                    \-------- /-a
Rerooted above c:
          /-c
---------|
         |          /-b
          \--------|
                    \-------- /-a
Translated IDs:
          /-Aardvark
---------|
         |          /-Banana
          \--------|
                    \-Coyote
New node added:
          /-Aardvark
         |
---------|          /-Banana
         |         |
          \--------|--Coyote
                   |
                    \-z
Forced to binary structure:
          /-Aardvark
---------|
         |          /-Banana
          \--------|
                   |          /-Coyote
                    \--------|
                              \-z
Full tree:
          /-a
---------|
         |          /-b
          \--------|
                   |          /-c
                    \--------|
                             |          /-d
                              \--------|
                                       |          /-e
                                        \--------|
                                                 |          /-f
                                                  \--------|
                                                           |          /-g
                                                            \--------|
                                                                     |          /-h
                                                                      \--------|
                                                                                \-i
Slice:
          /-a
---------|
         |          /-d
          \--------|
                    \-i
Slice:
          /-a
---------|
         |          /-c
          \--------|
                   |          /-e
                    \--------|
                              \-i
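The `slice` and `slice_by_ids` calls above keep only the requested leaves plus the branches that join them. To make that idea concrete without depending on the refactored API, here is a minimal plain-Perl sketch. It is not BioPerl's implementation. It assumes topology-only Newick input (no branch lengths, quoted labels or comments) and stores each node as a `{ id => ..., children => [...] }` hash. The helper names `parse_newick`, `prune_to_leaves` and `to_newick` are hypothetical. Internal nodes left with a single surviving child are contracted away. That is an assumption about how slicing behaves, although it does reproduce the two slices shown in the output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse a topology-only Newick string such as
# "(a,(b,c));" into nested hashes { id => $label, children => [...] }.
# Branch lengths, quoted labels and comments are NOT supported.
sub parse_newick {
    my ($str) = @_;
    $str =~ s/\s+//g;    # ignore whitespace
    $str =~ s/;\z//;     # drop the terminating semicolon
    my $pos  = 0;
    my $node = _parse_node(\$str, \$pos);
    die "Unexpected trailing text in Newick string\n" if $pos != length $str;
    return $node;
}

# Recursive-descent step: read one node (and its subtree) at the current position.
sub _parse_node {
    my ($str_ref, $pos_ref) = @_;
    my $node = { id => '', children => [] };
    if (substr($$str_ref, $$pos_ref, 1) eq '(') {
        $$pos_ref++;    # consume '('
        while (1) {
            push @{ $node->{children} }, _parse_node($str_ref, $pos_ref);
            my $sep = substr($$str_ref, $$pos_ref++, 1);
            last if $sep eq ')';
            die "Expected ',' or ')' in Newick string\n" unless $sep eq ',';
        }
    }
    # Optional label: everything up to the next structural character.
    if (substr($$str_ref, $$pos_ref) =~ /\A([^(),;]+)/) {
        $node->{id} = $1;
        $$pos_ref += length $1;
    }
    return $node;
}

# Keep only leaves whose labels are keys of %$keep, plus the internal nodes
# joining them. An internal node left with one surviving child is replaced
# by that child. Returns undef if no requested leaf is present.
sub prune_to_leaves {
    my ($node, $keep) = @_;
    my @children = @{ $node->{children} };
    if (!@children) {    # leaf
        return exists $keep->{ $node->{id} }
            ? { id => $node->{id}, children => [] }
            : undef;
    }
    my @kept = grep { defined } map { prune_to_leaves($_, $keep) } @children;
    return undef unless @kept;
    return $kept[0] if @kept == 1;    # contract the now-unary node
    return { id => $node->{id}, children => \@kept };
}

# Serialize back to Newick (without the trailing semicolon).
sub to_newick {
    my ($node) = @_;
    my @kids  = @{ $node->{children} };
    my $inner = @kids ? '(' . join(',', map { to_newick($_) } @kids) . ')' : '';
    return $inner . $node->{id};
}

# Usage: reproduce the two slices from the example above.
my $full = parse_newick('(a,(b,(c,(d,(e,(f,(g,(h,i))))))));');
for my $ids ([qw(a d i)], [qw(a c e i)]) {
    my %keep   = map { $_ => 1 } @$ids;
    my $newick = to_newick(prune_to_leaves($full, \%keep));
    print "$newick;\n";
}
```

Run as a script, it prints `(a,(d,i));` and then `(a,(c,(e,i)));`, the same topologies as the two Slice diagrams. Because branch lengths are ignored here, a version that supports them would presumably also need to add up the lengths of branches merged when a unary node is contracted.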
Motivation
The evolutionary tree is a fundamental concept in biology, and the tree is a common data structure in computer science and bioinformatics. Almost all biological studies involve a phylogenetic tree at some stage, whether to locate an organism in the tree of life, to infer the evolutionary history of a protein family, or to test detailed evolutionary hypotheses using mathematical models and a known evolutionary tree.
A powerful API for loading, modifying, analyzing, and saving trees is therefore essential for any 'Swiss Army knife' biological toolkit such as BioPerl.
Goals
The ultimate goal of the BioPerl Tree API is to make working with phylogenetic trees simple.
Bio::Tree::* is not meant to be the fastest or most powerful tree library available; rather, it should provide the basic tools that 'glue' scripts need when working with trees produced by, or passed to, other programs. For more advanced tasks, such as manipulating very large trees (e.g., the entire NCBI taxonomy) or inferring phylogenetic trees from molecular data, please look elsewhere. Joe Felsenstein maintains a page of phylogenetic software that is a useful starting point.
Some key aspects of the API:
- The most common tree operations should require little code and even less documentation
- More advanced functionality should be well-abstracted and encapsulated into methods of reasonable size and complexity
- Short yet accurate method names should be used when possible
- Convenience methods should be provided to spare users needless typing and repetitive boilerplate code.