

Bug submission

As of mid-April 2014, BioPerl has switched to GitHub Issues for bug tracking. The Redmine-based tracking system is still accessible, but it is effectively read-only. We no longer support access to the original Bugzilla instance.

Don't make me bug you!

Please submit bugs or enhancement requests to BioPerl GitHub Issues. The older BioPerl Redmine tracking system remains available but will no longer be used, and the oldest, Bugzilla-based system is not supported; adding new bugs to it has been disabled.

We really do want you to submit bugs, even though it means more things for us to do! A bug report means there is something we didn't think of or test in a particular module, and we won't know about it unless you tell us. The Redmine system requires you to register, both to avoid spam and to allow us to contact you again when the bug is fixed or when we need to clarify the problem and its solutions.

It is important that you record the version of BioPerl you are running (if you don't know how to find it, see the FAQ, where the question is addressed). You can also include the version of Perl you are using and the operating system you are running.
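The version details can be collected in one pass and pasted into the report. The following is a minimal sketch, assuming a POSIX shell with perl on the PATH. It reads `$Bio::Root::Version::VERSION`, the package variable where BioPerl keeps its version number; the fallback messages and variable names are illustrative only.

```shell
#!/bin/sh
# Gather the details a bug report should include.

# BioPerl version; falls back to a note if the module is not on @INC.
bioperl_version=$(perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION' 2>/dev/null) \
  || bioperl_version="not found (is BioPerl installed for this perl?)"

# Perl version in dotted form; falls back if perl itself is missing.
perl_version=$(perl -e 'printf "%vd", $^V' 2>/dev/null) \
  || perl_version="unknown"

# Operating system name and release.
os_info=$(uname -sr)

printf 'BioPerl: %s\nPerl:    %s\nOS:      %s\n' \
  "$bioperl_version" "$perl_version" "$os_info"
```

Run it with the same perl that runs the failing script, since several perl installations on one machine can each see a different BioPerl.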

Simon Tatham has a great resource on how to effectively report bugs.

Submitting Bugs

When submitting a new bug to BioPerl on GitHub, enter a brief description and other general information. You can use Markdown to add links, syntax highlighting, and so on; see the GitHub Markdown docs for more.

You can paste example code in the description, but we suggest submitting it as a GitHub Gist. If you have example fixes, we strongly suggest using the tools GitHub has in place, namely the ability to fork the code and create a pull request with the relevant fix. The pull request will show up as an issue automatically, so there is no need to file one separately.

Note that attachments on GitHub issues only work for images. If the example is a text file then use a GitHub Gist; alternatively, if the file is something available publicly then please provide a link to the file.

Submitting Patches

We gladly welcome patches. Patches for BioPerl code should be created as described in the SubmitPatch HOWTO. Try to ensure the patch is made against the latest code checked out from Git, particularly if the patch is large.

Briefly, you can generate the patch using the following command:

diff -u old new

For best results, follow this example:

git pull
git diff GrokFrobnicator.pm > my-patch.dif
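The plain `diff -u old new` form can be tried in a scratch directory before touching a real checkout. This is a sketch only: the file contents and the `.orig` suffix are made up for illustration, and `GrokFrobnicator.pm` is the placeholder name used in the example above.

```shell
#!/bin/sh
# Scratch directory so nothing in a real checkout is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# "old" is the pristine file, "new" is the edited copy (both hypothetical).
printf 'sub grok {\n    return 0;\n}\n' > GrokFrobnicator.pm.orig
printf 'sub grok {\n    return 1;\n}\n' > GrokFrobnicator.pm

# diff exits 1 when the files differ, which is the expected case here;
# only a status of 2 or more signals a real error.
status=0
diff -u GrokFrobnicator.pm.orig GrokFrobnicator.pm > my-patch.dif || status=$?
[ "$status" -le 1 ] || exit "$status"

# A non-empty patch means there is something to submit.
[ -s my-patch.dif ] && echo "patch written to $workdir/my-patch.dif"
```

Reading the resulting file before sending it is a quick way to confirm that it contains only the intended change.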

We also accept patches as an issue on GitHub; submit the patch as a GitHub Gist. Even better, submit a pull request on GitHub. It's also worth discussing these on the mailing list.

Submitting New Modules and Code Snippets

We also accept new code, either as full-fledged modules or as snippets of code (snippets work better as a patch). New code must include documentation, example code (typically listed in a SYNOPSIS section), and tests with decent test coverage (following the testing standards in our Writing_BioPerl_Tests HOWTO). Because we are moving to a more modular scheme for future BioPerl installations, we highly suggest submitting modules individually to CPAN, primarily to help lower the barrier to submitting bug fixes.
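A quick pre-submission check can catch missing documentation early. The sketch below is an assumption-laden illustration, not a BioPerl tool: `Bio::Example::Frobnicator` is a made-up module, and the list of expected POD sections (NAME, SYNOPSIS, DESCRIPTION) is a guess at a reasonable minimum, with only SYNOPSIS named in the text above.

```shell
#!/bin/sh
# Write a hypothetical module to a scratch file; the name is made up.
module=$(mktemp)
cat > "$module" <<'EOF'
package Bio::Example::Frobnicator;
use strict;
use warnings;

=head1 NAME

Bio::Example::Frobnicator - hypothetical module used only for this sketch

=head1 SYNOPSIS

  use Bio::Example::Frobnicator;
  my $result = Bio::Example::Frobnicator->grok('ACGT');

=head1 DESCRIPTION

Returns its input unchanged.

=cut

sub grok { my ($class, $seq) = @_; return $seq }

1;
EOF

# Collect any expected POD section that is absent (assumed minimum set).
missing=""
for section in NAME SYNOPSIS DESCRIPTION; do
    grep -q "^=head1 $section\$" "$module" || missing="$missing $section"
done

if [ -z "$missing" ]; then
    echo "POD sections present"
else
    echo "missing POD sections:$missing"
fi
```

A check like this covers only the documentation; the tests still need to be written and run separately.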


Good bug reports are ones which provide a small amount of code and the necessary test files to reproduce your bug. By doing this work up front you ensure the developer spends most of his or her time actually working on the problem. Pasting your entire 600-line program into the comment buffer is probably not going to get an enthusiastic response. In addition, isolating the problem down to a small amount of your code will help ensure that the bug is not on your end before we dive in and start working on it.

Open Issues (GitHub)

This list details the open bugs on GitHub for BioPerl.

Github Issues to RSS

Issue 109: Stop masking $seq variable in entrezgene.pm next_seq
$seq is declared at line 151 so that it doesn't have to be passed as an argument to subs. However, at line 199, in the next_seq sub, $seq is declared again, so the original $seq variable is never defined. As far as I can tell this works in practice because for the most part $seq is only accessed in the next_seq sub. However, if you run into a variable without a tagname and try to throw the warning at line 571, your script dies with an error when trying to call method 'id' on an undefined variable. The suggested fix is simply to stop re-declaring $seq at line 199.

Issue 108: parsing segmented Genbank records
Hi there,

I'm parsing a whole bunch of Genbank records to get CDS sequences, and found one weird record that messes up my pipeline. The Genbank accession is S81162. It turns out it's a segmented record (the CDS joins four regions from four different Genbank entries). Reading the wiki, it seems like Bioperl should be able to recognize this, but I think maybe the code no longer parses that part of the Genbank record? Details are below.

I'd like to just check for segmented records and skip them so they don't throw my code and I can still parse all the other records in the same file (I don't need every single CDS - I think segmented records will be rare).

From the help given here:
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_the_Annotations
it looks like I should be able to use code like this:
    my $anno_collection = $seq_obj->annotation;
and
    $anno_collection->get_all_annotation_keys;
to recognize that this record has a SEGMENT annotation. However, it actually looks like the SEGMENT annotation is ignored when the Genbank record is parsed. Shouldn't it have come through into $anno_collection? I updated my bioperl from github this morning.

I guess I can parse each record outside of bioperl before I go into Bioperl, but it'd be great if I can just use Bioperl to get at those SEGMENT annotations. Does that seem easy to implement (or re-implement - from the HOWTO page it looks like it was something that used to be possible)?

Some code that hopefully shows what I mean is below.

thanks very much,
Janet Young

-------------------------------------------------------------------
Dr. Janet Young
Malik lab
http://research.fhcrc.org/malik/en.html
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., A2-025,
P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 4512
email: jayoung ...at... fhcrc.org
-------------------------------------------------------------------

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Bio::SeqIO;
    use Bio::DB::EUtilities;

    ##### get the troublesome sequence from Genbank:
    my $file = "S81162.gb";
    if (!-e $file) {
        my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                               -db      => 'protein',
                                               -rettype => 'gb',
                                               -email   => 'jayoung@fhcrc.org',
                                               -id      => "S81162");
        $factory->get_Response(-file => $file);
    }

    #### parse it:
    my $seqIN = Bio::SeqIO->new(-file => "$file", '-format' => 'Genbank');
    while (my $seq = $seqIN->next_seq) {
        ##### look at the annotations - SEGMENT is not captured
        my $anno_collection = $seq->annotation;
        for my $key ( $anno_collection->get_all_annotation_keys ) {
            my @annotations = $anno_collection->get_Annotations($key);
            for my $value ( @annotations ) {
                print "tagname : ", $value->tagname, "\n";
                print " annotation value: ", $value->display_text, "\n\n";
            }
        }
    }
Issue 107: Use of uninitialized value $Bio::DB::NCBIHelper::HOSTBASE in concatenation (.) or string at /usr/share/perl5/Bio/DB/Query/GenBank.pm line 103.
We have updated our bioperl version to 1.6923. Now, when we run the local version of Guidance, a DNA sequence multiple alignment program (http://guidance.tau.ac.il/), we get the following errors: "Use of uninitialized value $Bio::DB::NCBIHelper::HOSTBASE in concatenation (.) or string at /usr/share/perl5/Bio/DB/Query/GenBank.pm line 103." "Use of uninitialized value $Bio::DB::NCBIHelper::HOSTBASE in concatenation (.) or string at /usr/share/perl5/Bio/DB/Query/GenBank.pm line 104." Could you provide any help? Best regards, Joaquim
Issue 104: Open Bio::DB::Taxonomy flatfile indices read-only
Do existing Bio::DB::Taxonomy flatfile indices need to be opened O_RDWR? This is failing for us where one user has created the indices and another user is simply using them.
Issue 102: Current bioperl-live won't commence Build
Just tried to build latest live 01ec10dda23b6c5ed7592808cff4ae0d34278611 and got the error below. I have File::Copy 2.09 from Perl 5.14 (Ubuntu Server 12) but latest 2.29 is part of Perl 5.20 core which I can only force install.

```
% git clone https://github.com/bioperl/bioperl-live.git
% cd bioperl-live
% perl ./Build.PL
% ./Build test
'blib/script/bp_chaos_plot.pl' and 'blib/script/bp_chaos_plot.pl' are identical (not copied) at /usr/share/perl5/Bio/Root/Build.pm line 219
Use of uninitialized value $atime in utime at /usr/share/perl/5.14/File/Copy.pm line 393.
Use of uninitialized value $mtime in utime at /usr/share/perl/5.14/File/Copy.pm line 393.
Can't rename 'blib/script/bp_chaos_plot.pl' to 'blib/script/bp_chaos_plot.pl': No such file or directory at /usr/share/perl5/Bio/Root/Build.pm line 219.
```
Issue 101: Expand RemoteBlast synopsis sample code
This illustrates how to send BLAST searches to a cloud service provider.
Issue 100: Bio::DB::Taxonomy::flatfile needs rewrite
We need to rethink the index structure here now that the number of taxonomy items is so large; the DB_File hash interface is not working for so many entries. Can we explore either BerkeleyDB or perhaps a NoSQL option that can still run as a flatfile indexed file?
Issue 99: bioperl-guts-l notifications
Would like to get bioperl-guts-l notifications for github projects. Will look into this.
Issue 91: Support FTS attribute table and WITHOUT ROWID optimization for Bio::DB::SeqFeature::Store::DBI::SQLite
Several of our GBrowse instances had issues with full-text (attribute) searches timing out. Profiling revealed that the execution time of Bio::DB::SeqFeature::Store::search_attributes on our Bio::DB::SeqFeature::Store::DBI::SQLite databases contributed significantly to this problem. The proposed changes, which add support for indexing the attribute table using SQLite's FTS (full-text search) extension, resolved the issue (when used in conjunction with Scott Cain's recent commit that removed the use of CGI::Pretty in GBrowse: https://github.com/GMOD/GBrowse/commit/298cab3ece8f68b7baf2e9d085fab9055772fa3c).

To demonstrate the performance impact of an FTS attribute table on search_attributes(), consider the following script:

```
# search_attributes.pl
use strict;
use warnings;
use Bio::DB::SeqFeature::Store;

my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::SQLite', -dsn => $ARGV[0]);
my @features = $db->search_attributes($ARGV[1], ['arabidopsis_defline', 'arabidopsis_symbol', 'pfam', 'go', 'panther', 'kegg_enzyme', 'kegg_orthology', 'cog_cluster']);
print 'Features: ' . scalar(@features) . "\n";
```

Given a Bio::DB::SeqFeature::Store::DBI::SQLite database with gene models & annotation created from a GFF3 file with 692,300 features, the following were typical observed execution times in our environment:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ time perl search_attributes.pl track-orig.db iron
Features: 1283

real 0m1.61s
user 0m0.71s
sys 0m0.89s

$ time perl search_attributes.pl track-fts.db iron
Features: 1280

real 0m0.27s
user 0m0.20s
sys 0m0.06s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The difference in number of matches is due to the difference in behavior between the two methods: FTS (with the MATCH operator) searches for tokens, while LIKE '%iron%' finds substrings. The extra three results returned with "LIKE '%iron%'" contained a spurious match containing "iron" as a substring:

    acclimation of photosynthesis to environment

and two occurrences of "diiron", which may be relevant to a user:

    dicarboxylate diiron protein, putative (Crd1)

OTOH, if the user searches for "Fe" instead, they get "real" hits with an FTS attribute table, whereas the non-FTS search returns thousands of spurious hits where "fe" is a substring. Because of this difference in behavior (and possible portability issues to systems with old DBD::SQLite instances---see below), I thought that FTS should be opt-in rather than the default.

Also, note that FTS support depends on the version of DBD::SQLite. The current DBD::SQLite by default supports two versions: FTS3 since sometime before 1.30_04 (2010-08-25), and FTS4 since 1.36_01 (2012-01-19).

At least one design decision I made while implementing this change should be considered/debated before accepting this pull request: the -fts option is just a boolean flag. My initial implementation supported the creation of an FTS attribute table using a user-specified FTS version, but at the last minute I decided to KISS and just use the most recent version supported by the installed DBD::SQLite. This isn't a problem if someone decides to implement an FTS attribute table for MySQL, which supports only one such index type (FULLTEXT: http://dev.mysql.com/doc/refman/5.6/en/fulltext-search.html). However, it's conceivable that one might want to implement FTS for PostgreSQL and have control over whether GIN or GiST indexing is used (http://www.postgresql.org/docs/9.3/static/textsearch-indexes.html), or, with SQLite, specify FTS3 (instead of the most recent FTS4) at database creation time to allow its use on a host with an older DBD::SQLite.
Issue 87: Issues with bioperl versioning
Blocker on new 1.6/1.7 releases, see: https://github.com/andk/pause/issues/75
Issue 83: GenBank parsing CONTIG issues
See: http://mailman.open-bio.org/pipermail/bioperl-l/2014-September/088945.html Basically, this works with the August patch for GenBank parsing (so the bug isn't there) but some change since 1.6.922 has caused parsing to slow dramatically. We'll need to bisect this.
Issue 79: Added script to extract DNA sequences (as well as 5' or 3' regions if specified) from a FASTA file using a BLAST output file
Added a personal script to extract a DNA sequence from a FASTA file using a BLAST output file. Expects at least two arguments, the BLAST file and the FASTA file. There are a number of optional arguments that are explained in the script. This script is especially useful when trying to extract sequences with variance (hence the BLAST search beforehand) from FASTA files. For example, say that you are trying to extract a given gene and 2000 base pairs 5' to it from 20 different genomes. All you have is one gene sequence, however. By doing a BLAST search between each of the genomes and the gene and then using this script, you can extract the sequences that you are interested in. The script also has options to extract a specified 3' or 5' sequence from the FASTA file, as well as an e-value cut off. The final output is the extracted sequence in FASTA format. Is this useful/generic enough to be included in the scripts directory? The script is well tested and takes command-line arguments.
Issue 61: new modules Bio::Tools::Alignment::Overview and Bio::DB::NextProt
Some time ago I made the mistake of uploading to CPAN a module called Bio::Tools::Alignment::Overview. The module wasn't associated with bioperl, but for some reason I decided to use the namespace anyway (yeah, I know...). Now I'm organizing some projects and I decided to include the module in bioperl, so if you accept this pull request I will remove the current module from PAUSE/CPAN so that you can upload it under bioperl. Sorry for the noobish mistake =) Cheers.
Issue 46: Simplealign
Hi, Chris, This is my implementation of Bio::SimpleAlign. I have a detailed report describing the improvement to the code, and explaining the failed tests using t/Align/SimpleAlign.t. If you need any help, please just let me know. Cheers, Jun
