ListSummary: April 26-May 9, 2006
April 26-May 9, 2006
YT wants to apologize for a rushed and late ListSummaries report this week; he’s getting a poster together and heading off to a conference. Speaking of conferences, the next ListSummaries report will not be posted until May 26. It would have been on time that week, but YT’s better half had a laptop malfunction and Steve Jobs and Co. decided not to release the newest MacBooks until mid-May. Sigh…
I am making a small change this week; I’m adding ‘Announcements’ to the top of the list. These are anything, well, announced to the world at large (releases, changes in code, etc.).
Make sure to show the ‘orphans’ your love! Well, except YT and his mad ramblings…
BioPerl-run in FreeBSD
Bio::DB::GenBank and complexity parameter
YT announces that changes were made to incorporate ‘complexity’ into Bio::DB::GenBank (and yes, adding it was complex).
Batch retrieval partially implemented in Bio::DB::GenBank/GenPept
YT mentions the implementation of batch retrieval in Bio::DB::GenBank/GenPept and gets laughed at for his fat fingers…
Changes to NCBIHelper (RE: CONTIG, genome files)
And now to return to our regularly scheduled broadcast…
Reena Yadav wrote in with installation problems on Linux. Torsten points out that a web proxy error is the likely culprit…
Windows and BLAST
Raghunath Verabelli wanted to know about a local BLASTP program for Windows. YT shows him the light (the ‘light’ being NCBI’s BLAST download page) and sets him right on how to set everything up…
Problems with GenBank file GI identifier and Bio::Index::GenBank?
Paul Mooney makes a change to Bio::DB::GFF for parsing all the special tags for GFF3. YT, being skittish from all the commits coming from the GMOD guys, warns that they probably need to make the change. Scott Cain, paying rapt attention, obliges…
How to obtain GIs from clone_ids
Anand Venkatraman wants to know how to get the clone_id from the Features section of a GenBank file and use it to retrieve the GI. Wenwu Cui shows him how…
Problems with parsing bacterial strain from EMBL OS or RC lines
Mark A. Miller notes problems with parsing the strain information from bacterial species. Jason and Brian O. point out the previous thread talking about this, and James Abbott adds that his $job is keeping him busy at the moment but that he has it on his priority list.
Bio::RangeI intersections and crossroads
Marco Blanchette reveals a possible problem with Bio::DB::GFF when using the intersection function from a Bio::RangeI object. Malcolm Cook and Sean Davis offer suggestions. Marco digresses on a tangent about how to get the BioPerl version number, which Heikki and Torsten gladly share…
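For readers unfamiliar with what a range intersection computes, here is a minimal sketch in Python rather than BioPerl. It assumes inclusive start and end coordinates and ignores strand; the `Range` tuple and the `intersection` function are illustrative names for this sketch and are not the Bio::RangeI implementation.

```python
from typing import Optional, Tuple

# Hypothetical representation: (start, end), both inclusive, start <= end.
Range = Tuple[int, int]


def intersection(a: Range, b: Range) -> Optional[Range]:
    """Return the region shared by two inclusive ranges, or None if they do not overlap."""
    start = max(a[0], b[0])  # the overlap begins at the later of the two starts
    end = min(a[1], b[1])    # and ends at the earlier of the two ends
    if start > end:
        return None          # no shared positions
    return (start, end)
```

With inclusive coordinates, two ranges that merely touch at one position (for example 1-5 and 5-9) still share that single position.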
Problems in GenBank CONTIG parsing
Michael Rogoff points out a serious bug in GenBank file parsing when there is a CONTIG line present. YT decides to make a bug fix to Bio::SeqIO::genbank to get around this issue and crosses his fingers that it doesn’t break anything. He also makes related changes to Bio::DB::NCBIHelper to allow direct downloading of these GenBank CON files (see below)…
Frames, strands, and BLASTX
Li Xiao wants to know how to get the strand information from an NCBI BLASTX report; YT shows him how…
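The reply from the thread is not reproduced here, but the underlying idea can be sketched independently of BioPerl: in a BLASTX report the sign of the query frame indicates the strand (+1 to +3 for the plus strand, -1 to -3 for the minus strand). The Python sketch below assumes a plain-text report with HSP lines of the form `Frame = -2`; the function name and regular expression are illustrative only.

```python
import re
from typing import Optional

# Matches an HSP frame line such as " Frame = +2" or " Frame = -3".
FRAME_RE = re.compile(r"Frame\s*=\s*([+-]?[1-3])")


def query_strand_from_frame_line(line: str) -> Optional[int]:
    """Return 1 (plus strand) or -1 (minus strand) for a frame line, else None."""
    match = FRAME_RE.search(line)
    if match is None:
        return None  # not a frame line
    frame = int(match.group(1))
    return 1 if frame > 0 else -1
```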
T.D. Houfek points out that using Bio::Seq::Quality to write FASTA-style quality files doesn’t output the expected description line. Hilmar offers a suggestion to check it out but Dave Messina provides a fix…
Problems with BLAST file parsing
Hubert Prielinger submitted a problem with parsing multiple BLAST reports. After various attempts, a bug submission, and some ‘hemming and hawing’, Jason finally points out that the logic of the loop was in error…
AlignIO problems parsing FASTA files
Jason responds to an off-list email about problems with parsing FASTA files using Bio::AlignIO. Turns out there was an errant space causing the problem. He then realizes that Gloria may just be trying to read sequences, so he suggests using Bio::SeqIO and consulting the HOWTOs…
Primer Parameters and Primer3 Parsing, post haste
Chen Li wanted to know how to change a parameter for Bio::Tools::Run::Primer3 to increase the PCR product size. Various suggestions were put forward, with Roy Chaudhuri providing the final solution. Chen then submitted a second issue, in which results were being overwritten; Brian O. and Paul Wiersma offer solutions, including using Bio::Tools::Primer3…
PAML + Codeml problem
Raw Blast Alignment
Simon wanted to know how to parse normal BLAST reports. YT shows him HOWTO…
Colorizing GFF features
Peter wanted to know how to colorize transcription factor sequence features based on the factor. Marc Logghe shows him HOWTO (wow, that’s becoming repetitive…)
Orphan 1 : Converting gene predictions to GFF - Mark Gosink wants to know how to get the output from genscan, glimmerhmm, and fgenesh into GFF format…
Orphan 3 : Calculating Fu and Li's D statistic - Saurabh Johri wanted to know how to calculate Fu and Li’s D statistic using BioPerl…
Bioperl-guts (for the die-hards)
Note: Significant module changes and additions to CVS are normally announced on the main bioperl-l list if they are in decent enough condition for production work. If modules listed below have not been announced, then there may be a very good reason for it. If you plan on trying to use these, consider contacting the author(s). Many of the modules discussed in this section are highly experimental and are in various stages of development. They may or may not work at all. Therefore, we are not responsible for any problems faced with using this code.
Abandon hope all ye who enter here!
Bugs Old and New
- RESOLVED - Bug 1985 - PSI-BLAST parsing fails on Windows – fix made to Bio::SearchIO::blast (YT)
- NEW - Bug 1986 - SearchIO::blast mixes up hits where duplicate accessions are present. Not sure what to think about this bug; is a database that has accessions with exactly the same name considered legitimate? YT and Jason take this one on…
- NEW – RESOLVED - Bug 1988 - SearchIO::blast parses bit score incorrectly when score is reported in scientific notation – looks like this was fixed in CVS a while ago.
- NEW - Bug 1989 - GI identifier missing when using Bio::Index::GenBank. This was first reported on the regular mail list.
- NEW - ENHANCEMENT - Bug 1990 - patch for SAX2 parsing in SearchIO::blastxml using XML::SAX – added JIC we decide the XML parser is to be updated for SAX2 parsing
- NEW – RESOLVED - Bug 1991 - Bio::Align::DNAStatistics requires sequence in uppercase, but doesn't always do a "uc" – Heikki fixed this one
- NEW - Bug 1993 - Changes to Bio::SeqIO::genbank – YT announces change to the way the CONTIG line is parsed in GenBank CON files; this is a placeholder bug in case someone wonders why their script broke due to this change
- NEW – RESOLVED - Bug 1994 - SearchIO can't parse blast output file – Jason figured this one out
- RESOLVED - Bug 1405 - Enhancement to Bio::DB::GenBank to allow subsequence retrieval – YT added the ‘complexity’ parameter to NCBIHelper.
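To illustrate the kind of problem behind Bug 1988, here is a minimal Python sketch of a score pattern that accepts both plain decimals and scientific notation. It is not the fix that went into Bio::SearchIO::blast; the function name, the regular expression, and the sample lines used to exercise it are assumptions made for illustration.

```python
import re
from typing import Optional

# Accept "52.8" as well as "1.266e+04" before the word "bits".
SCORE_RE = re.compile(r"Score\s*=\s*([0-9.]+(?:[eE][+-]?[0-9]+)?)\s+bits")


def parse_bit_score(line: str) -> Optional[float]:
    """Extract the bit score from an HSP score line, or return None if absent."""
    match = SCORE_RE.search(line)
    if match is None:
        return None
    # float() understands scientific notation, so no special casing is needed.
    return float(match.group(1))
```

A pattern that only allows digits and a dot would stop at the `e` and capture `1.266` instead of the full value, which is one way a score in scientific notation can be misread.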
YT made changes to the following modules:
Bio::Tools::Run::RemoteBlast - fixed bug 1935 – added option to save XML output
Bio::SeqIO::genbank – bugfix to prevent CONTIG file parsing from crashing BioPerl
t/DB.t - added tests for complexity, CONTIG file
Bio::Location::Split – obfuscation is fun!
Brian O.’s changes:
t/Range.t, Bio::RangeI – add tests, clarified POD for problem described in the mail list (Marco Blanchette)
Bio::Align::DNAStatistics, t/data/hs_owlmonkey.aln, t/data/insulin.water - fixes for bug 1991
Scott Cain’s changes:
Bio::Tools::GFF - updated the list of reserved tags (prompted by mail list request)
Bio::Graphics::FeatureFile – make more robust against bad data, add sanity checks
Bio::Graphics::Glyph – fixed problems drawing ideograms