Feature Annotation rollback
This is a tracking page for the planned rollback of BioPerl SeqFeature/Annotation changes prior to a new stable release. The code is being tested on a CVS branch (tagname: featann_rollback) prior to merging back onto the main branch. It is hoped this can be completed within a relatively short period of time, though many of the changes introduced are complex and may require extensive testing prior to any release (let alone a stable one). Portions of this page will likely be incorporated into the final release notes; it is not meant to be permanent.
I plan on posting changes and test results in case I run into issues. I am limiting comments on this page to regular and core devs; however, anyone can use the discussion page to make suggestions at any time.
The plan is to roll back changes gradually (in rounds), fixing tests before starting the next round of rollbacks. Hopefully, once everything is done, Bio::FeatureIO will work without problems.
For all work, tests were run on Mac OS X (Tiger, Intel) using Perl 5.8.6.
First round
- Remove tag methods from Bio::AnnotatableI; add tag methods back to Bio::SeqFeatureI as unimplemented; Bio::AnnotatableI now consists of one method (annotation())
- Remove Bio::AnnotatableI as parent class to Bio::SeqFeatureI
- Implement Bio::SeqFeatureI methods in Bio::SeqFeature::Generic and Bio::SeqFeature::Annotated by rolling back the tag methods in Bio::SeqFeature::Generic to rel. 1.4 and moving the formerly implemented Bio::AnnotatableI tag methods into Bio::SeqFeature::Annotated
- Bio::SeqFeature::Annotated now implements Bio::AnnotatableI; Bio::SeqFeature::Annotated::get_Annotations() is now just a wrapper method around $sfann->annotation->get_Annotations()
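The wrapper arrangement in the last item can be sketched in plain Perl. This is only an illustration of the delegation pattern, not the BioPerl code: Sketch::Collection and Sketch::Feature are hypothetical stand-ins, and only the method names annotation() and get_Annotations() are taken from the list above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation collection (NOT Bio::Annotation::Collection).
{
    package Sketch::Collection;

    sub new {
        my $class = shift;
        return bless { by_key => {} }, $class;
    }

    # Store a value under a key.
    sub add_Annotation {
        my ($self, $key, $value) = @_;
        push @{ $self->{by_key}{$key} }, $value;
        return;
    }

    # Return all values stored under a key (empty list if none).
    sub get_Annotations {
        my ($self, $key) = @_;
        return @{ $self->{by_key}{$key} || [] };
    }
}

# Hypothetical stand-in for a feature that owns a collection via annotation().
{
    package Sketch::Feature;

    sub new {
        my $class = shift;
        return bless { annotation => Sketch::Collection->new }, $class;
    }

    sub annotation { return $_[0]{annotation} }

    # The wrapper: it keeps no data of its own and only forwards the call.
    sub get_Annotations {
        my ($self, @args) = @_;
        return $self->annotation->get_Annotations(@args);
    }
}

my $feat = Sketch::Feature->new;
$feat->annotation->add_Annotation(note => 'example note');

my @notes = $feat->get_Annotations('note');
print "note: $_\n" for @notes;
```

Because the wrapper only forwards, both call paths always see the same data, which is the point of keeping a single annotation store.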
Tests
Failed Test Stat Wstat Total Fail List of Failed
-------------------------------------------------------------------------------
t/BioGraphics.t 3 768 38 3 3-5
t/DB.t 255 65280 116 31 101-116
t/Genewise.t 3 768 53 3 37 41 45
t/Sopma.t 2 512 16 2 8 15
(2 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 33 subtests skipped.
Failed 4/247 test scripts. 24/17250 subtests failed.
Files=247, Tests=17250, 460 wallclock secs (130.49 cusr + 18.80 csys = 149.29 CPU)
Failed 4/247 test programs. 24/17250 subtests failed.
Notes
- The following test failures also occur on the main branch, so they are being ignored:
- BioGraphics.t
- DB.t
- Genewise.t
- Sopma.t
- SeqVersion.t failure appeared to be server-related (passed on subsequent retest), ignoring for now.
- AnnotationAdaptor.t fixed
- SeqFeature.t - fixed; split off into SeqFeatAnnotated.t, which also now passes.
- Bio::SeqFeature::Annotated::from_feature() now uses Bio::SeqFeature::AnnotationAdaptor to retrieve a homogeneous annotation collection from any Bio::SeqFeatureI
- Annotation.t failure was due to an isa check of Bio::SeqFeature::Generic as Bio::AnnotatableI. Fixed by reverting Bio::SeqFeature::Generic back to a Bio::AnnotatableI.
- targetp.t tests were failing because several tags have undefined values. This exposes an interesting side effect of making tags Bio::AnnotationI: any tag value (even undef) apparently results in a Bio::AnnotationI-implementing instance.
- Reverted to the older API by checking has_tag($tagname) first (as in the Feature/Annotation HOWTO). Fixed.
- Several tests failed because methods once in Bio::AnnotatableI are no longer there (such as calling get_Annotations from a Bio::Seq::RichSeq or Bio::SeqFeature::Generic, or indirect failures from calls made by Bio::SeqIO::FTHelper). Fixed.
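The has_tag() guard mentioned for targetp.t follows a simple pattern, sketched here in plain Perl. Sketch::Feature is a hypothetical stand-in whose method names mirror Bio::SeqFeatureI (add_tag_value, has_tag, get_tag_values). The helper tag_values_or_empty is likewise invented for illustration, and the stand-in is assumed to throw when asked for a tag that does not exist.

```perl
use strict;
use warnings;

# Hypothetical stand-in feature class (NOT the BioPerl implementation).
{
    package Sketch::Feature;

    sub new {
        my $class = shift;
        return bless { tags => {} }, $class;
    }

    sub add_tag_value {
        my ($self, $tag, $value) = @_;
        push @{ $self->{tags}{$tag} }, $value;
        return;
    }

    sub has_tag {
        my ($self, $tag) = @_;
        return exists $self->{tags}{$tag} ? 1 : 0;
    }

    # Assumed behaviour: asking for an unknown tag is an error.
    sub get_tag_values {
        my ($self, $tag) = @_;
        die "asking for tag value that does not exist: $tag\n"
            unless exists $self->{tags}{$tag};
        return @{ $self->{tags}{$tag} };
    }
}

# Guarded lookup: check has_tag() first, so a missing tag yields an empty list.
sub tag_values_or_empty {
    my ($feat, $tag) = @_;
    return () unless $feat->has_tag($tag);
    return $feat->get_tag_values($tag);
}

my $feat = Sketch::Feature->new;
$feat->add_tag_value(score => 0.93);

for my $tag (qw(score missing)) {
    my @values = tag_values_or_empty($feat, $tag);
    printf "%-8s %d value(s)\n", $tag, scalar @values;
}
```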
Second round
- Stringification or 'eq' operator overloading in Bio::AnnotationI removed.
- The first round of fixes changes implicit 'stringified' calls in tests to explicit calls to the new method display_text(); still need to work on GenBank/SwissProt tests (the Handler.t tests duplicate the GenBank/EMBL/UniProt tests but use a different, experimental driver).
- Need to check that tags such as the ones in this thread are caught.
- Clean up Bio::SeqFeatureI implementations
- Implement Bio::SeqFeature::TypedSeqFeatureI in Bio::SeqFeature::Annotated
- Maybe deprecate Bio::SeqFeature::Annotated::type() and delegate over to Bio::SeqFeature::TypedSeqFeatureI::ontology_term()?
Tests
Failed Test Stat Wstat Total Fail List of Failed
-------------------------------------------------------------------------------
t/BioGraphics.t 3 768 38 3 3-5
t/DB.t 255 65280 116 31 101-116
t/Genewise.t 3 768 53 3 37 41 45
t/Handler.t 2 512 546 2 242 315
t/SeqFeatAnnotated.t 3 768 26 3 24-26
t/Sopma.t 2 512 16 2 8 15
t/genbank.t 1 256 244 1 242
t/swiss.t 1 256 240 1 9
(3 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 33 subtests skipped.
Failed 8/247 test scripts. 31/17253 subtests failed.
Files=247, Tests=17253, 396 wallclock secs (129.15 cusr + 18.68 csys = 147.83 CPU)
Failed 8/247 test programs. 31/17253 subtests failed.
Notes
- A few GenBank and SwissProt tests:
- genbank.t test fail was due to a dropped AnnotationI (not sure why). Test was adjusted to account for the extra AnnotationI for now, but worth further investigation to ensure the extra AnnotationI is legit.
- swiss.t doesn't roundtrip cleanly because of changes in date formats. The affected tests have been marked as TODO for now; a more thorough set of roundtripping tests is needed.
- Some of these are 'lazy' tests using a Bio::AnnotationI object directly as if it is a string. Changing the test to make an explicit method call, adding comment ('no "" operator overloading') to test indicating overloading is not permitted.
- There appears to be some confusion about method deprecation in Bio::SeqFeatureI, which needs to be cleared up
- Bio::SeqFeature::Annotated doesn't appear to be complete, with some methods returning data types inconsistent with Bio::SeqFeature::Generic; needs a complete audit and revised (more strenuous) tests
- Bio::SeqFeature::Annotated::score() changed to explicitly return textual output (no objects). More method changes need to be made for consistency.
- genbank.t, swiss.t, Handler.t, and SeqFeatAnnotated.t now pass; need to address or file bugs on the above issues.
Third round
- Stepping through the various Bio::AnnotationI and adding exceptions to overloads to catch any instances where overloading is used.
Tests
Exceptions added to overloads:
- All Bio::AnnotationI
Failed Test Stat Wstat Total Fail List of Failed
-------------------------------------------------------------------------------
t/BioGraphics.t 3 768 38 3 3-5
t/DB.t 255 65280 116 31 101-116
t/GOterm.t 255 65280 61 108 8-61
t/Genewise.t 3 768 53 3 37 41 45
t/SeqFeatAnnotated.t 255 65280 26 42 6-26
t/Sopma.t 2 512 16 2 8 15
t/obo_parser.t 255 65280 45 86 3-45
t/simpleGOparser.t 255 65280 102 202 2-102
(2 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 37 subtests skipped.
Failed 8/247 test scripts. 243/17248 subtests failed.
Files=247, Tests=17248, 398 wallclock secs (122.08 cusr + 18.05 csys = 140.13 CPU)
Failed 8/247 test programs. 243/17248 subtests failed.
Notes
- Most errors were due to unexpected overloads:
- if($ann) triggers the overloaded sub, but if(defined $ann) doesn't.
- There is an API conflict: some modules use Bio::Ontology::Term::add_dblink_context incorrectly, passing plain values instead of Bio::Annotation::DBLink instances. Possible solution here. I added an exception to catch any call that does not pass objects, which notably kills these tests:
t/GOterm.t 255 65280 61 108 8-61
t/SeqFeatAnnotated.t 255 65280 26 42 6-26
t/obo_parser.t 255 65280 45 86 3-45
t/simpleGOparser.t 255 65280 102 202 2-102
- Several fix-me's found in Bio::FeatureIO::gff.
- Noticed that OntologyStore.t tests are consistently failing to make server contact.
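The difference between if($ann) and if(defined $ann) noted above can be reproduced with a small plain-Perl sketch. Sketch::Annotation is a hypothetical class, not a BioPerl one: its string overload throws, imitating the exceptions added to the overloads in this round, and fallback => 1 is an assumption about how such an overload would be declared.

```perl
use strict;
use warnings;

# Hypothetical annotation class whose string overload throws (NOT a BioPerl class).
{
    package Sketch::Annotation;

    use overload
        '""'     => sub { die "implicit stringification is not allowed\n" },
        fallback => 1;

    sub new {
        my ($class, $text) = @_;
        return bless { text => $text }, $class;
    }

    # Explicit accessor that replaces implicit stringification.
    sub display_text { return $_[0]{text} }
}

my $ann = Sketch::Annotation->new('kinase');

# defined() only asks whether the scalar holds a value; no overload is called.
print "defined check: ok\n" if defined $ann;

# A bare truth test has no 'bool' overload to use, so Perl derives it from
# the '""' overload, which throws.
my $truth_ok = eval { my $seen = $ann ? 1 : 0; 1 };
print "truth test died: $@" unless $truth_ok;

# An implicit 'eq' stringifies the object first, so it throws as well.
my $eq_ok = eval { my $same = ($ann eq 'kinase'); 1 };
print "eq died: $@" unless $eq_ok;

# The explicit method call is unaffected.
print "explicit: ", $ann->display_text, "\n";
```

This is why a test written as if($ann) needs to become if(defined $ann), and why string comparisons need an explicit display_text() call.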
Fourth round
- Bio::SeqFeature::Annotated cleanup will wait until after merging to the main branch (minor fixes only here)
- Fix inconsistencies among the ontology-related dblink methods in the various Bio::Ontology::TermI/Bio::OntologyIO classes and Bio::Annotation::OntologyTerm
- As noted above, there were several modules which passed in simple scalars to Bio::Ontology::Term while others passed in Bio::Annotation::DBLink instances. Notably the method documentation is ambiguous as to what is required (some indicate scalar values for arguments, others Bio::Annotation::DBLink).
- In order to rectify this we are reimplementing these methods to be more consistent and specifically allow both strings and Bio::Annotation::DBLink instances. Therefore, use of any Bio::Ontology::Term-related dblink method is deprecated in favor of the following methods:
- get_dbxrefs (in place of get_dblinks). This method uses parameters (-type and -context); -type can be used to get specific data types in cases where there are mixes of strings and Bio::Annotation::DBLink
- add_dbxref (in place of add_dblink)
- remove_dbxrefs (in place of remove_dblinks)
- has_dbxref (in place of has_dblink)
- add_dbxref_context (in place of add_dblink_context)
- Any text comparison between two instances, or between a scalar and an instance, uses the text output of Bio::Annotation::DBLink::display_text (an explicit comparison, as opposed to the implicitly overloaded 'eq' comparisons)
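A rough sketch of how mixed strings and objects might be compared through display_text follows. Everything here is assumed for illustration: Sketch::Term and Sketch::DBLink are hypothetical stand-ins, the positional arguments do not reflect the real Bio::Ontology::Term signatures, and the "database:primary_id" text format is a guess. Only the method names add_dbxref, has_dbxref and display_text come from the list above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database link object (NOT Bio::Annotation::DBLink).
{
    package Sketch::DBLink;

    sub new {
        my ($class, %args) = @_;
        return bless { %args }, $class;
    }

    # Assumed text form: "database:primary_id".
    sub display_text {
        my $self = shift;
        return join ':', $self->{database}, $self->{primary_id};
    }
}

# Hypothetical stand-in for an ontology term (NOT Bio::Ontology::Term).
{
    package Sketch::Term;

    use Scalar::Util qw(blessed);

    sub new {
        my $class = shift;
        return bless { dbxrefs => [] }, $class;
    }

    # Reduce either a plain string or an object to comparable text.
    sub _as_text {
        my ($item) = @_;
        return blessed($item) ? $item->display_text : $item;
    }

    # Accept strings and objects alike.
    sub add_dbxref {
        my ($self, @items) = @_;
        push @{ $self->{dbxrefs} }, @items;
        return;
    }

    # Explicit text comparison; no overloaded 'eq' involved.
    sub has_dbxref {
        my ($self, $query) = @_;
        my $wanted = _as_text($query);
        return scalar grep { _as_text($_) eq $wanted } @{ $self->{dbxrefs} };
    }
}

my $term = Sketch::Term->new;
$term->add_dbxref('XX:0001', Sketch::DBLink->new(database => 'YY', primary_id => '0002'));

for my $query ('XX:0001', 'YY:0002', 'ZZ:9999') {
    printf "%s: %s\n", $query, $term->has_dbxref($query) ? 'found' : 'not found';
}
```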
Tests
Failed Test Stat Wstat Total Fail List of Failed
-------------------------------------------------------------------------------
t/BioGraphics.t 3 768 38 3 3-5
t/DB.t 255 65280 116 31 101-116
t/Genewise.t 3 768 53 3 37 41 45
t/Sopma.t 2 512 16 2 8 15
(3 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 33 subtests skipped.
Failed 4/246 test scripts. 24/17195 subtests failed.
Files=246, Tests=17195, 371 wallclock secs (121.93 cusr + 17.29 csys = 139.22 CPU)
Failed 4/246 test programs. 24/17195 subtests failed.
Notes
- All tests now pass (above failures, as noted above, also fail on MAIN). Will merge to main branch soon.
Cleanup
- Implement Bio::SeqFeature::TypedSeqFeatureI using Bio::SeqFeature::Annotated
- Fix roundtripping issue with swiss.t/Handler.t
- Eventually remove Bio::AnnotationI overloads after testing
- Add tests:
- Term.t, new methods and test deprecation warnings
- Annotation.t display_text() (replacement for stringification overloads).
- More rigorous tests to FeatureIO.t and SeqFeatAnnotated.t
Tests (CVS HEAD)
Failed Test Stat Wstat Total Fail List of Failed
-------------------------------------------------------------------------------
t/DB.t 255 65280 116 31 101-116
t/Genewise.t 3 768 53 3 37 41 45
(3 subtests UNEXPECTEDLY SUCCEEDED), 7 tests and 33 subtests skipped.
Failed 2/247 test scripts. 19/17250 subtests failed.
Files=247, Tests=17250, 388 wallclock secs (127.72 cusr + 18.88 csys = 146.60 CPU)
Failed 2/247 test programs. 19/17250 subtests failed.
Notes
- Fixed some bugs with tests in CVS HEAD (DB.t may be a server-related issue).
Simple Benchmark
Though we all know benchmarks have issues, here's a simple benchmark using the following script and GenBank CP000473 (a 10 Mbp microbial genome), comparing the bioperl-live MAIN branch with the featann_rollback branch:
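use strict;
use warnings;
use Benchmark;
use Bio::SeqIO;

# GenBank file to parse, given as the first command-line argument
my $test = shift || die "Must supply file for benchmark\n";

# Time 10 complete parses of the file
timethis(10, \&live);

# Parse every sequence in the file and report how many were read
sub live {
    my $in = Bio::SeqIO->new(-format => 'genbank', -file => $test);
    my $ct = 0;
    while (my $seq = $in->next_seq) {
        $ct++;
    }
    print "Live : Parsed $ct seq(s)\n";
}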
- bioperl-live : 126 wallclock secs (122.79 usr + 0.85 sys = 123.64 CPU) @ 0.08/s (n=10)
- rollback : 86 wallclock secs (84.87 usr + 0.57 sys = 85.44 CPU) @ 0.12/s (n=10)