Why BioPerl is slow

From BioPerl

A common question we get is, "Why is BioPerl slow when I do X?" The short answer is that it has to do with the hacky nature of how objects are created in BioPerl. When running parsers like Bio::SearchIO, it is not the parsing that is slow, but the need to create all the sub-objects. For example, when parsing a BLAST report, an object is created for each Result, Hit, and HSP. Because each HSP is a Bio::Search::HSP::HSPI, it contains two Bio::SeqFeature::Similarity objects, each of which in turn contains a Bio::LocationI object. Object creation is slow in BioPerl because we use inheritance quite heavily to allow initialization options to be shared among derived classes. For example, if the parent object has the initialization option score, you would like all of its children to have it as well, without having to copy and paste the code for testing and setting the score field into every subclass; it should be done just once, in the parent.
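The pattern described above can be sketched in plain Perl. This is a minimal illustration, not BioPerl's actual code: the Sketch::* classes are hypothetical stand-ins for the HSP, similarity feature, and location objects, and the only thing taken from the text is the idea that the score option is handled once in a parent and that each HSP drags several sub-objects along with it.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only; these are NOT BioPerl's real classes.
my %created;    # class name => number of objects constructed

package Sketch::Root;
sub new {
    my ($class, %args) = @_;
    $created{$class}++;
    return bless {}, $class;
}

package Sketch::Scored;
our @ISA = ('Sketch::Root');
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    # The -score option is tested and set once, here in the parent.
    $self->{score} = $args{-score} if defined $args{-score};
    return $self;
}
sub score { $_[0]{score} }

package Sketch::Location;
our @ISA = ('Sketch::Root');
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    @{$self}{qw(start end)} = @args{qw(-start -end)};
    return $self;
}

package Sketch::Feature;    # stands in for a similarity feature
our @ISA = ('Sketch::Scored');
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    $self->{location} =
        Sketch::Location->new(-start => $args{-start}, -end => $args{-end});
    return $self;
}

package Sketch::HSP;
our @ISA = ('Sketch::Scored');
sub new {
    my ($class, %args) = @_;
    my $self = $class->SUPER::new(%args);
    # Each HSP owns two features, and each feature owns a location.
    $self->{query} = Sketch::Feature->new(-score => $args{-score}, -start => 1, -end => 100);
    $self->{hit}   = Sketch::Feature->new(-score => $args{-score}, -start => 5, -end => 104);
    return $self;
}

package main;

my $hsp   = Sketch::HSP->new(-score => 42);
my $total = 0;
$total += $_ for values %created;
printf "score=%d, objects created for one HSP: %d\n", $hsp->score, $total;
# prints "score=42, objects created for one HSP: 5"
```

Even in this toy version, one HSP means five constructor chains (one HSP, two features, two locations), each passing its arguments up through SUPER::new. A report with thousands of HSPs multiplies that cost accordingly.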

The long answer will involve fancy diagrams, stack traces, Devel::DProf output, and so on. Actually producing these will help us better understand why it is slow, and may provide insight into a solution.
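Short of a full profile, a rough first measurement can be made with the core Benchmark module. The sketch below is an assumption-laden toy, not a measurement of BioPerl itself: the Sketch::* classes are hypothetical, and it only compares a flat constructor against one that passes its arguments through two extra SUPER::new hops.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);    # core module

# Hypothetical classes for illustration only; not part of BioPerl.
package Sketch::Flat;
sub new {
    my ($class, %args) = @_;
    return bless { score => $args{-score} }, $class;
}

package Sketch::Level1;
sub new {
    my ($class, %args) = @_;
    return bless { score => $args{-score} }, $class;
}

package Sketch::Level2;
our @ISA = ('Sketch::Level1');
sub new {
    my ($class, %args) = @_;
    return $class->SUPER::new(%args);    # one extra hop up the chain
}

package Sketch::Level3;
our @ISA = ('Sketch::Level2');
sub new {
    my ($class, %args) = @_;
    return $class->SUPER::new(%args);    # a second extra hop
}

package main;

# Time a fixed number of constructions of each kind without printing
# the per-test lines, then print a comparison table.
my $results = timethese(200_000, {
    flat    => sub { Sketch::Flat->new(-score => 42) },
    chained => sub { Sketch::Level3->new(-score => 42) },
}, 'none');

cmpthese($results);
```

The numbers will vary by machine and Perl version, so no output is shown here. One would expect the chained constructor to come out slower, since every level copies the argument list again, but only a real profile of a real parse can say how much of BioPerl's run time this accounts for.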

The design principles which have gone into the next major version of Perl, Perl 6, may also help.

How can we make it faster?

One option is to remove some aspects of the inheritance, so that the inheritance hierarchy does not have to be queried every time an object is created. The result of that query should be cached....
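One possible reading of this idea, sketched under stated assumptions: walk the class hierarchy once per class, remember which initializers apply, and reuse that list for every later object instead of chaining through SUPER::new each time. The Sketch::* classes and the _init and _initializers names are hypothetical, and this is not how BioPerl is implemented; mro::get_linear_isa is a real core function (Perl 5.10 and later).

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Hypothetical classes for illustration only; not part of BioPerl.
package Sketch::Base;

my %init_cache;    # class name => array ref of initializer code refs
my $walks = 0;     # how many times the hierarchy was actually walked

sub _initializers {
    my ($class) = @_;
    return $init_cache{$class} ||= do {
        $walks++;
        no strict 'refs';
        # Parent-first order, keeping only classes that define their own _init.
        [ map  { \&{"${_}::_init"} }
          grep { defined &{"${_}::_init"} }
          reverse @{ mro::get_linear_isa($class) } ];
    };
}

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # Apply the cached initializers; no SUPER::new chain is involved.
    $_->($self, \%args) for @{ $class->_initializers };
    return $self;
}

sub _init { my ($self, $args) = @_; $self->{score} = $args->{-score} }
sub score { $_[0]{score} }

package Sketch::Child;
our @ISA = ('Sketch::Base');

sub _init { my ($self, $args) = @_; $self->{name} = $args->{-name} }
sub name  { $_[0]{name} }

package main;

my @objs = map { Sketch::Child->new(-score => $_, -name => "hsp$_") } 1 .. 1000;
print "objects: ", scalar(@objs), ", hierarchy walks: $walks\n";
# prints "objects: 1000, hierarchy walks: 1"
```

The score handling still lives only in the parent, so nothing is copied into subclasses, but the hierarchy is consulted once per class rather than once per object. Whether this would pay off in BioPerl would have to be confirmed by profiling.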

Why BioPerl is fast

BLAST tiling in C++: development 1 month + analysis 5 min;

BLAST tiling in BioPerl: development 2 days + analysis 30 min.
