April 12 to April 25
It’s the second post already?!? Many things have happened in the last few weeks, including one Jason Stajich passing his defense (against the dark arts?) and completing his dissertation, thus paving the way towards glory (“glory” likely being a 2-42 year postdoc somewhere). Many kudos, cheers, and beers, Dr. Stajich!
I had some nice feedback about the first edition. Hope this is a useful summary for the Bioperl users… Bioperlites? Bioperlers? Bioperlies? None of these sound really catchy.
Okay, enough fooling around, on to the summary…
no_redirect-ing Bio::DB::GenBank to grab NCBI seqfeatures
Eleni Rapsomaniki wrote about problems retrieving RefSeq sequence features from GenBank. Several suggestions were proposed; it turns out that $RefSeq_EBI != $RefSeq_NCBI, at least for the sequence listed (i.e. you can retrieve the sequence from either source, but the EBI record lacked seqfeatures while the NCBI record had them). Basic resolution: -no_redirect (if you have updated to bioperl-live, that is).
See the fun begin here…
Questions about Bio::AlignIO question marks
Kai Müller asked why Bio::AlignIO ignores question marks in sequences; some code suggestions were made before YT realized that this was reported as a bug and the fix was already made. Question is, what is ‘?’?
Confused? Go here for answers…
Another GFF3 validator
Continuing a thread about GFF3 validators, Chris Mungall suggested another one that validates parent/child relationships and sequence ontology. I wanted to add a bad joke about Nanny 911 here but just couldn’t quite make myself do it...
PSI-BLAST and Windows
David Waner found that blastpgp output does not parse correctly on Windows due to a variation in the executable output. YT told him to report it as a bug... which YT forgot to fix! Good thing there’s Bugzilla!
Codon file parsing problems
Marc Logghe pointed out a problem with parsing codon files using Bio::CodonUsage::IO. Brian O. found that the module requires codon files in the same format as the output of Kazusa’s CUTG database…
R. Prabu wanted to know, given a GenBank RefSeq file, if there is a way to retrieve each gene with its transcript and protein IDs. Sean Davis and YT had some pointers (Sean’s final post being the most direct solution IMHO)…
StandAloneBlast, Windows, and Patience
Daniel Bornman emailed the list trying to determine how to use StandAloneBlast with Windows. Barry Moore, Brian O., and Alexander Kozik tried to help out, with Barry’s suggestion providing the fix. NCBI hasn’t updated their instructions on how to set up local BLAST for Windows in a long time (surprise surprise). YT found some updated instructions. YT doesn’t think this affects other, better UNIX’y systems in any way (fingers crossed).
To BLAST or not to BLAST…
Robert Murphy asked, when one has a sequence and a database, how can one use BLAST to find unique 20-mers in the sequence? Torsten suggested the use of Perl’s index() function to test exact matches while Marc, sticking with the idea of using BLAST, suggested changing the wordsize…
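For the curious, the exact-match idea can be sketched in a few lines. This is only an illustration, written in Python rather than Perl, and it is not code from the thread: the function name unique_kmers is made up, and it assumes that "unique" means "has no exact match anywhere in the database" and that only the forward strand matters. Python's str.find() plays the role of Perl's index() here, since both return -1 when the substring is absent.

```python
def unique_kmers(query, database_seqs, k=20):
    """Return (offset, kmer) pairs for every k-mer of `query` that has
    no exact match in any sequence of `database_seqs`.

    Assumptions (not from the original thread): forward strand only,
    0-based offsets, and plain strings standing in for sequence objects.
    """
    unique = []
    for offset in range(len(query) - k + 1):
        kmer = query[offset:offset + k]
        # str.find() returns -1 when the substring is absent,
        # just like Perl's index().
        if all(seq.find(kmer) == -1 for seq in database_seqs):
            unique.append((offset, kmer))
    return unique


if __name__ == "__main__":
    # Toy data with k=4 so the output stays readable.
    for offset, kmer in unique_kmers("ACGTACGTAA", ["TTACGTACGG"], k=4):
        print(offset, kmer)
```

A linear scan like this is fine for a small database; for anything genome-sized, an index or BLAST with a smaller wordsize (Marc's suggestion) is likely the more practical route.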
Problems with parsing Species names and exceptions
Stefano Ghignone wants to know how to prevent a script from exiting with an exception error when parsing seemingly invalid species names. Mauricio offers to help out and Heikki offers some tips about using an
Bioperl Wiki Weirdness and the RSS feed (could be a great movie title, like Snakes on a Plane)
YT finds a little problem with a Bioperl wiki RSS feed. Jason finds a trailing newline in a wiki script is the problem and fixes it...
Merging sequences and features from GenBank files
Haiming Wang wonders if there is a tool or script that can merge GenBank sequences together into one record with updated feature coordinates. Brian O. offers some help, but Roy Chaudhuri replies (off-list BTW) that a cat() function was added to Bio::SeqUtils to do just what Haiming wants…
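What "updated feature coordinates" means in practice: each feature from a later record has to be shifted by the total length of everything concatenated before it. Here is a rough sketch of that bookkeeping, in Python rather than Perl. It is not the Bio::SeqUtils implementation; the Feature and Record classes and the cat_records function are hypothetical stand-ins for illustration, and coordinates are assumed to be 1-based and inclusive as in GenBank files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    kind: str
    start: int  # 1-based, inclusive (GenBank-style); an assumption of this sketch
    end: int


@dataclass
class Record:
    seq: str
    features: List[Feature] = field(default_factory=list)


def cat_records(records):
    """Concatenate records into one, shifting each feature by the
    length of the sequence that precedes its source record."""
    merged = Record(seq="")
    for rec in records:
        offset = len(merged.seq)  # bases already in the merged record
        for feat in rec.features:
            merged.features.append(
                Feature(feat.kind, feat.start + offset, feat.end + offset)
            )
        merged.seq += rec.seq
    return merged


if __name__ == "__main__":
    first = Record("ACGTAC", [Feature("gene", 2, 5)])
    second = Record("GGTT", [Feature("CDS", 1, 4)])
    merged = cat_records([first, second])
    print(merged.seq)
    for feat in merged.features:
        print(feat.kind, feat.start, feat.end)
```

Real records also carry strand, split locations, and annotations, none of which this sketch handles.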
Orphans and Leftovers
Only one lonely orphan this week…
Reena Yadav posted some installation problems with Bioperl. Looks like it is on Windows, though he mentions ‘root priviledges’…
Update: Oops, he mentions Linux. My bad. His use of PPM mixed me up a bit...
There was actually some action on the BioSQL list this week (no tumbleweed analogies needed)!
BioSQL on Oracle 10.1.0.3 needs an Oracle Patch
Gerben Menschaert mentions that using bioperl-db and BioSQL on Oracle 10.1.0.3 requires a patch, otherwise you’ll get a nasty error.
You want a script? Well, here’s your script!
Gerben also submitted a simple question: where’s this script called load_ncbi_taxonomy.pl? Within minutes he gets three responses. Now that’s service!
Bioperl-guts (for the die-hards)
Note: Significant module changes and additions to CVS are normally announced on the main bioperl-l list if they are in decent enough condition for production work. If modules listed below have not been announced, then there may be a very good reason for it. If you plan on trying to use these, consider contacting the author(s). Many of the modules discussed in this section are highly experimental and are in various stages of development. They may or may not work at all. Therefore, we are not responsible for any problems faced with using this code.
Abandon hope all ye who enter here!
Lincoln Stein made many more revisions in the quest to integrate GFF3 into Bioperl. Looks like if you are currently toying around with or plan on using these you might want to update from CVS. I’m sure I probably left off some module here…
- Deleted Bio::DB::SeqFeature::Store::Cacher (code folded into Bio::DB::SeqFeature::Store)
- Bio::DB::SeqFeature::LazyTableFeature renamed to Bio::DB::SeqFeature
- Bio::DB::SeqFeature::LazyFeature renamed to Bio::DB::SeqFeature::NormalizedFeature
Update: Hilmar suggests holding off on using this code until Sohel gets a chance to make his changes, in case there is an API change.
YT submitted a new module for parsing ERPIN output (a program for finding RNA motifs):
Modifications were made by Brian O. to update a few modules, partly in response to mailing list users:
Modifications were made by YT to update Bio::DB::NCBIHelper and Bio::DB::WebDBSeqI in response to problems in retrieving sequence chunks from GenBank using Bio::DB::GenBank and the ‘strand’ tag (these were added for a requested enhancement from Bugzilla, see below). The ‘complexity’ tag doesn’t work as expected yet. Also, YT figures that these methods should really be in Bio::DB::NCBIHelper and not Bio::DB::WebDBSeqI, so expect more fixes…
Jason worked on Bio::Tools::Phylo::PAML and Bio::Tools::Phylo::PAML::ModelResult to fix a bug with model parsing. He also further distanced himself from Bio::Tools::Run::RemoteBlast, which Roger Hall has taken over (YT thinks)…
- bug #1983 : Paml parser for NSsites parse only first 3 models – fixed by Jason
- bug #1984 : proposed enhancement : 2 new SimpleAlign methods: set_new_reference & uniq_seq
- bug #1985 : PSI-BLAST parsing fails on Windows – YT checking on this one
- bug #1405 : Enhancement to Bio::DB::GenBank to allow subsequence retrieval – YT working on; ‘strand’ tag implemented
- bug #1986 : SearchIO::blast mixes up hits where duplicate accessions are present – not assigned yet
Odds and ends
- INSTALL.WIN - Brian Osborne – updated BLAST docs for Windows
- scripts/Bio-DB-GFF/bulk_load_gff.PLS - Scott Cain – postgre flag fix
For suggestions, errors, gripes, etc., please post on the Talk page here or on the mailing list.
The next post should be around May 9th. See you then...