From lincoln.stein at gmail.com Fri May 1 13:33:09 2009
From: lincoln.stein at gmail.com (Lincoln Stein)
Date: Fri, 1 May 2009 13:33:09 -0400
Subject: [Bioperl-l] Bio::DB::SeqFeature::Segment problem
In-Reply-To: <23319982.post@talk.nabble.com>
References: <23319982.post@talk.nabble.com>
Message-ID: <6dce9a0b0905011033r307e8b88l5caaddc953f7de95@mail.gmail.com>

Hi Jon,

Sounds like your multiple chromosome-1 problems have been cleared up. The
documentation should mention the exception and doesn't. I will fix it.

Lincoln

On Thu, Apr 30, 2009 at 12:40 PM, Jon Flowers wrote:

> Dear colleagues,
>
> I have set up a MySQL database and loaded a GFF3 and fasta file using
> Bio::DB::SeqFeature::Store::GFF3Loader. Everything appears to be working
> normally except when I attempt to create a Bio::DB::SeqFeature::Segment
> object.
>
> The following works as expected:
>
> my $db = Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql',
>                                          -dsn     => 'dbi:mysql:foo',
>                                          -user    => 'myuser',
>                                          -pass    => 'mypassword',
>                                          -write   => '1');
>
> my @features = $db->features(-seq_id => 'chr1',
>                              -start  => 1,
>                              -end    => 10000,
>                              -types  => ['gene']);
>
> However, when I try to create a segment object using either of the two
> following method calls I get an error:
>
> my $segment = $db->segment('chr1',1=>10000);
>
> my $segment = $db->segment( -seq_id => 'chr1', -start => '1', -end => '10000');
>
> -------------------------------- EXCEPTION ------------------------------------
> MSG: segment() called in a scalar context but multiple features match.
> Either call in a list context or narrow your search using the -types or
> -class arguments
> STACK Bio::DB::SeqFeature::Store::segment /usr/share/perl5/Bio/DB/SeqFeature/Store.pm:1178
> STACK toplevel trial.pl:42
> -------------------------------------------------------
>
> Calling in list context (which is not defined in the documentation) produces
> an array of 22 identical scalars = 'chr1:1..10000'.
>
> Any ideas?
>
> Thanks
>
> Jonathan
>
> --
> View this message in context:
> http://www.nabble.com/Bio%3A%3ADB%3A%3ASeqFeature%3A%3ASegment-problem-tp23319982p23319982.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Lincoln D. Stein
Director, Informatics and Biocomputing Platform
Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Renata Musa

From hlapp at gmx.net Sun May 3 14:36:59 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sun, 3 May 2009 14:36:59 -0400
Subject: [Bioperl-l] Other object oddities
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu>
Message-ID:

I agree, $seq->seq() could possibly be better named. Maybe $seq->seqstr()?

The thing is that having $seq->seq() return an object would be
meaningless - it would be $self.

You can test what kind of object you have using ref() or isa():

    $seq = $obj->seq();
    # we need the sequence string
    $seq = $seq->seq() if ref($seq) && $seq->isa("Bio::PrimarySeqI");

There has been a naming consistency review, but it's been a long time.

    -hilmar

On Apr 30, 2009, at 5:56 PM, Kevin Brown wrote:

> So, I'm using quite a bit of bioperl code in my own stuff and have been
> seeing some oddities with the naming of methods. A good example would be
> in the Bio::Seq and Bio::SeqFeature::Generic. Both have a method called
> "seq" but in the latter case it returns an object (and expects an object
> when doing a Set) and in the former it returns a string and expects a
> string when doing a Set.
>
> This makes for a bit of brain freeze on my part when the return from
> another object might be a Bio::Seq or Bio::SeqFeature::Generic and now
> calling the ->seq returns different things.
>
> Guess I'm just curious if anyone has done an audit of the methods of the
> various objects and their return types to see how consistent they are
> across even a subsection of the codebase?

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From wangyi2412 at gmail.com Mon May 4 00:42:31 2009
From: wangyi2412 at gmail.com (yi wang)
Date: Mon, 4 May 2009 12:42:31 +0800
Subject: [Bioperl-l] bioperl / emboss on windows
Message-ID:

---------- Forwarded message ----------
From: yi wang
Date: 2009/5/4
Subject: bioperl on windows
To: bioperl-l at bioperl.org

I have installed BioPerl and EMBOSS on my Windows XP machine, as guided on
the web. But it gives:

--------------------- WARNING ---------------------
MSG: Application [needle] is not available!
---------------------------------------------------

use warnings;
use CGI;
use Bio::Perl;
use Bio::Root::Root;
use Bio::Factory::ApplicationFactoryI;
use Bio::Factory::EMBOSS;
use Bio::Tools::Run::EMBOSSApplication;

my $f = Bio::Factory::EMBOSS->new();
$f->program("needle");
#my $factory = new Bio::Factory::EMBOSS;
#my $compseqapp = $factory->program("needle");

I checked the manual and EMBOSS.pm and wrote the program as in the demo,
but it doesn't work. What could the problem be? Thank you very much!
Looking forward to your reply!

Best Wishes,
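
A common first check for a warning like "Application [needle] is not
available" is whether the executable is visible on PATH at all. The sketch
below uses only core Perl; find_on_path() is a hypothetical helper written
for this illustration and is not part of BioPerl or EMBOSS. The Windows
extension handling (PATHEXT with a small fallback list) is an assumption
about a typical setup.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first match for $name on PATH, or nothing.
# find_on_path() is a hypothetical helper, not a BioPerl function.
sub find_on_path {
    my ($name) = @_;

    # On Windows an executable is normally found through an extension
    # listed in PATHEXT; elsewhere the bare name is enough.
    my $is_win = $^O =~ /MSWin/i;
    my @exts   = ('');
    push @exts, split /;/, ($ENV{PATHEXT} || '.EXE;.BAT;.CMD') if $is_win;

    for my $dir (File::Spec->path) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            # Require a regular file; also require the execute bit off Windows.
            return $candidate if -f $candidate && ($is_win || -x _);
        }
    }
    return;
}

my $hit = find_on_path('needle');
print defined $hit ? "needle found at $hit\n" : "needle is not on PATH\n";
```

If this reports that needle is missing, the problem is in the environment
rather than in the script; if it is found, the cause lies elsewhere.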
From SMarkel at accelrys.com Mon May 4 09:41:06 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Mon, 4 May 2009 09:41:06 -0400
Subject: [Bioperl-l] bioperl / emboss on windows
In-Reply-To:
References:
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A11B418@exch1-hi.accelrys.net>

Is needle in your path?

Note that needle needs two input sequences, which you don't provide.
You might try invoking embossversion, which takes no inputs.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From Kevin.M.Brown at asu.edu Mon May 4 11:31:30 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 4 May 2009 08:31:30 -0700
Subject: [Bioperl-l] Other object oddities
In-Reply-To:
References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu>

I don't mind that Bio::Seq uses seq to return a string. In fact I prefer
that. Just would be nice if other objects obeyed the same convention.
Bio::SeqFeature::Generic returns an object for both entire_seq and seq,
but uses attach_seq to store the Bio::Seq object into the Feature.

Maybe SeqFeature could be adjusted so that ->seq returns the sequence
string of the feature (just like Bio::Seq) and ->feature_seq returns the
Bio::Seq object.

From uludag at ebi.ac.uk Mon May 4 11:39:55 2009
From: uludag at ebi.ac.uk (uludag at ebi.ac.uk)
Date: Mon, 4 May 2009 16:39:55 +0100 (BST)
Subject: [Bioperl-l] bioperl / emboss on windows
In-Reply-To:
References:
Message-ID: <60194.86.149.78.35.1241451595.squirrel@webmail.ebi.ac.uk>

It looks like EMBOSS was disabled in Bio\Factory\EMBOSS.pm for Windows platform.
After commenting out the related condition in the _program_list function
(as shown below) i don't get the "Application [needle] is not available"
error any more.

    if( #$^O =~ /MSWIN/i ||

Regards,
Mahmut

From maj at fortinbras.us Mon May 4 11:50:59 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 4 May 2009 11:50:59 -0400
Subject: [Bioperl-l] Other object oddities
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu>
Message-ID: <4D0732D667FD4A26B6161660107920E5@NewLife>

This is definitely a reasonable issue to chase down. How to do it needs
a little care. I personally see 'seq' and think 'object', and have
resorted to 'seqstr' in my own code to hold/access just strings. FWIW, my
preference would be to have any object that has a seq object as a property
return objects when a '..._seq' accessor is called. However, the seq
objects themselves generally contain the sequence string in their seq()
property. We wouldn't want to disrupt that, but would it be worth creating
an alias getter/setter for the Seq classes seq() property called 'seqstr'?
We could then count on

    $foo->bar_seq, an object
    $foo->bar_seq->seqstr, a string
    $foo->seqstr, a string (not nec same as above)

cheers
Mark

From Kevin.M.Brown at asu.edu Mon May 4 11:58:05 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 4 May 2009 08:58:05 -0700
Subject: [Bioperl-l] Other object oddities
In-Reply-To: <4D0732D667FD4A26B6161660107920E5@NewLife>
References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu> <4D0732D667FD4A26B6161660107920E5@NewLife>
Message-ID: <1A4207F8295607498283FE9E93B775B405F7F347@EX02.asurite.ad.asu.edu>

I guess since my first exposure to BioPerl was reading in FASTA data,
that I picked up the preference for ->seq to be a string as that is what
happens in Bio::Seq objects. So, I see seq and think sequence string,
heheh.

Just be aware, ->seq returning/setting a string seems to be far more
common than it returning an object.

From cjfields at illinois.edu Mon May 4 11:53:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 May 2009 10:53:47 -0500
Subject: [Bioperl-l] bioperl / emboss on windows
In-Reply-To: <60194.86.149.78.35.1241451595.squirrel@webmail.ebi.ac.uk>
References: <60194.86.149.78.35.1241451595.squirrel@webmail.ebi.ac.uk>
Message-ID:

Okay, so I assume everything works then? I remember getting this to
work at some point on WinXP years ago (I have since moved on to
Linux/Mac).

chris

From cjfields at illinois.edu Mon May 4 12:04:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 May 2009 11:04:10 -0500
Subject: [Bioperl-l] Other object oddities
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu>
Message-ID: <87C756F8-44FB-4930-8154-478BE50AE270@illinois.edu>

On May 4, 2009, at 10:31 AM, Kevin Brown wrote:

> I don't mind that Bio::Seq uses seq to return a string. In fact I prefer
> that. Just would be nice if other objects obeyed the same convention.
> Bio::SeqFeature::Generic returns an object for both entire_seq and seq,
> but uses attach_seq to store the Bio::Seq object into the Feature.

I think most of these are legacy issues that (for the most part) have
just been dealt with ('they just work'), and with the thought that
changing things breaks legacy code. I agree with you, though; it's a
good time to rethink how we're naming methods, work towards some
consistency, and possibly do this for the next significant release. I
don't want to fall into the trap that perl 5.x had fallen into (and is
working towards digging out of), namely fear of breaking old code.
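
Until the naming is settled, calling code can defend itself against the
two behaviours of seq() described in this thread, in the spirit of the
ref()/isa() check suggested earlier. The following is only a minimal
sketch: Demo::Seq and Demo::Feature are stand-in classes invented for the
illustration (they are not the BioPerl classes), and seq_string() is a
hypothetical helper.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in classes for illustration only; they are NOT the BioPerl classes.
{
    package Demo::Seq;        # seq() returns a plain string
    sub new { my ($class, $str) = @_; return bless { str => $str }, $class }
    sub seq { return $_[0]{str} }

    package Demo::Feature;    # seq() returns a sequence object
    sub new { my ($class, $obj) = @_; return bless { seqobj => $obj }, $class }
    sub seq { return $_[0]{seqobj} }
}

# Keep calling seq() for as long as the value is an object that offers
# seq(); stop as soon as a plain (unblessed) value comes back.
sub seq_string {
    my ($thing) = @_;
    $thing = $thing->seq while blessed($thing) && $thing->can('seq');
    return $thing;
}

my $seq     = Demo::Seq->new('ATGC');
my $feature = Demo::Feature->new($seq);

print seq_string($seq),     "\n";   # string taken from the sequence object
print seq_string($feature), "\n";   # same string, reached through the feature
```

The loop would never terminate for a class whose seq() returns the object
itself, which is the case pointed out above as meaningless, so a real
helper would need a guard for that.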
> Maybe SeqFeature could be adjusted so that ->seq returns the sequence
> string of the feature (just like Bio::Seq) and ->feature_seq returns the
> Bio::Seq object.

That would be a significant API change and would be inconsistent with
seq() in other classes returning a Bio::Seq. Not that it's any
different than some of the current behavior, but if we want to correct
this it should be done in a *consistent*, well-defined way.

My thoughts: To me, seq() should always return a Bio::PrimarySeqI
(derived from invocant PrimarySeqI class). However, this is currently
inconsistent as illustrated by your example. Changing this would
require a deprecation cycle. A new method, seqstr()/str()/rawseq(),
could be guaranteed to return a raw sequence. Similarly, bioseq(),
could always return a Bio::PrimarySeqI.

chris

From uludag at ebi.ac.uk Mon May 4 12:26:20 2009
From: uludag at ebi.ac.uk (uludag at ebi.ac.uk)
Date: Mon, 4 May 2009 17:26:20 +0100 (BST)
Subject: [Bioperl-l] bioperl / emboss on windows
In-Reply-To:
References: <60194.86.149.78.35.1241451595.squirrel@webmail.ebi.ac.uk>
Message-ID: <43939.86.149.78.35.1241454380.squirrel@webmail.ebi.ac.uk>

> Okay, so I assume everything works then? I remember getting this to
> work at some point on WinXP years ago (I have since moved on to
> Linux/Mac).

I cannot say everything works but it looks like at least basic things are
working. I just tested the 'water' example given on top of EMBOSS.pm.
Example Bio::Seq inputs were properly transferred to 'water' and bioperl
was able to construct Bio::AlignIO object from the output file EMBOSS
generated.

In the example, 'water' inputs are named as 'sequencea' and 'seqall',
however, i needed to rename them as 'asequence' and 'bsequence' (i use
mEMBOSS-6.0.1).

Regards,
Mahmut

From cjfields at illinois.edu Mon May 4 12:30:23 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 May 2009 11:30:23 -0500
Subject: [Bioperl-l] bioperl / emboss on windows
In-Reply-To: <43939.86.149.78.35.1241454380.squirrel@webmail.ebi.ac.uk>
References: <60194.86.149.78.35.1241451595.squirrel@webmail.ebi.ac.uk> <43939.86.149.78.35.1241454380.squirrel@webmail.ebi.ac.uk>
Message-ID:

Yes, I recall something along those lines. Parameters are something
that need to be genericized for EMBOSS use. Good to hear it works,
though.

chris

From cjfields at illinois.edu Mon May 4 13:51:23 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 May 2009 12:51:23 -0500
Subject: [Bioperl-l] Can I load ontologies into BioSQL?
In-Reply-To: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
References: <0F6F530C-3EE5-4F1D-AA03-151B810AB068@berkeleybop.org>
Message-ID: <6D2B293A-7BC5-4F4D-8D8C-3579BB4FD5AB@illinois.edu>

We can note it as deprecated for the next minor release (1.7).

chris

On Apr 29, 2009, at 3:58 PM, Chris Mungall wrote:

> The .ontology files have been deprecated by GO.
> Use the .obo files instead.
>
> It appears the bioperl parser for the .ontology files isn't able to
> deal with the new relations in GO. I suggest that the
> bioperl .ontology parser is deprecated too
>
> On Apr 22, 2009, at 6:38 AM, Hilmar Lapp wrote:
>
>> Hi Carlos,
>>
>> I am moving your inquiry to the BioPerl list, as the tool is a part
>> of Bioperl-db and uses BioPerl for parsing the ontologies.
>>
>> In your case, the goflat parser in BioPerl seems to balk at the
>> second one of the input files. It may be that the input file is
>> (was?) corrupted, that does happen every once in a while. More
>> likely though is that the goflat parser hasn't kept up with some
>> format changes. Have you tried using the obo format version instead?
>>
>> -hilmar
>>
>> On Apr 20, 2009, at 11:44 AM, Carlos A. Canchaya wrote:
>>
>>> Hi guys
>>>
>>> I'm working with biosql and I try to figure out how to load
>>> ontologies into biosql.
>>>
>>> I've tried
>>>
>>> load_ontology.pl --driver mysql --dbuser carlos --dbpass xxx --host localhost --dbname biosql --namespace "Gene Ontology" --format goflat --fmtargs "-defs_file,GO.defs" function.ontology process.ontology component.ontology
>>>
>>> as in the script info but I have an error,
>>>
>>> ------------------- WARNING ---------------------
>>> MSG: DBLink exists in the dblink of _default
>>> ---------------------------------------------------
>>>
>>> ------------- EXCEPTION -------------
>>> MSG: format error (file process.ontology) offending line:
>>> -negative regulation of angiogenesis ; GO:0016525 ; synonym:down
>>> regulation of angiogenesis ; synonym:down\-regulation of
>>> angiogenesis ; synonym:downregulation of angiogenesis ;
>>> synonym:inhibition of angiogenesis % negative regulation of
>>> developmental process ; GO:0051093 % regulation of angiogenesis ;
>>> GO:0045765
>>>
>>> STACK Bio::OntologyIO::dagflat::_parse_flat_file /usr/local/share/perl/5.10.0/Bio/OntologyIO/dagflat.pm:627
>>> STACK Bio::OntologyIO::dagflat::parse /usr/local/share/perl/5.10.0/Bio/OntologyIO/dagflat.pm:284
>>> STACK Bio::OntologyIO::dagflat::next_ontology /usr/local/share/perl/5.10.0/Bio/OntologyIO/dagflat.pm:317
>>> STACK toplevel /usr/local/share/biosql/bioperl-db/scripts/biosql/load_ontology.pl:604
>>> -------------------------------------
>>>
>>> Any suggestion?
>>>
>>> Cheers,
>>>
>>> Carlos
>>>
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l

From cjfields at illinois.edu Mon May 4 14:20:16 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 May 2009 13:20:16 -0500
Subject: [Bioperl-l] Other object oddities
In-Reply-To: <4D0732D667FD4A26B6161660107920E5@NewLife>
References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu> <4D0732D667FD4A26B6161660107920E5@NewLife>
Message-ID: <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu>

Sorry I haven't chimed in, but $job had killed me the last couple weeks!

Unfortunately the reason this hasn't been chased down before is the
headache involved. It requires significant API changes to a broadly
used codebase (read: so devs are scared about breaking someone's old
scripts), having to deal with deprecation cycles, not to mention the
most critical aspect, which would be tuits.
Saying that, the reason I made a 1.6 branch is to maintain the snapshot of the code for API reasons. There is no reason we can't add in more explicit methods to main trunk. We can deprecate the use of more ambiguous methods down the road. chris On May 4, 2009, at 10:50 AM, Mark A. Jensen wrote: > This is definitely a reasonable issue to chase down. How to do it > needs > a little care. I personally see 'seq' and think 'object', and have > resorted to > 'seqstr' in my own code to hold/access just strings. FWIW, my > preference would > be to have any object that has a seq object as a property return > objects > when a '..._seq' accessor is called. However, the seq objects > themselves > generally contain the sequence string in their seq() property. We > wouldn't > want to disrupt that, but would it be worth creating an alias getter/ > setter for > the Seq classes seq() property called 'seqstr'? We could then count on > > $foo->bar_seq, an object > $foo->bar_seq->seqstr, a string > $foo->seqstr, a string (not nec same as above) > > cheers Mark > ----- Original Message ----- From: "Kevin Brown" > > Cc: "BioPerl List" > Sent: Monday, May 04, 2009 11:31 AM > Subject: Re: [Bioperl-l] Other object oddities > > >> I don't mind that Bio::Seq uses seq to return a string. In fact I >> prefer >> that. Just would be nice if other objects obeyed the same convention. >> Bio::SeqFeature::Generic returns an object for both entire_seq and >> seq, >> but uses attach_seq to store the Bio::Seq object into the Feature. >> >> Maybe SeqFeature could be adjusted so that ->seq returns the sequence >> string of the feature (just like Bio::Seq) and ->feature_seq >> returns the >> Bio::Seq object. >> >>> -----Original Message----- >>> From: Hilmar Lapp [mailto:hlapp at gmx.net] >>> Sent: Sunday, May 03, 2009 11:37 AM >>> To: Kevin Brown >>> Cc: BioPerl List >>> Subject: Re: [Bioperl-l] Other object oddities >>> >>> I agree, $seq->seq() could possibly be better named. Maybe $seq- >>> >seqstr()? 
>>> >>> The thing is that having $seq->seq() return an object would be >>> meaningless - it would be $self. >>> >>> You can test what kind of object you have using ref() or isa(): >>> >>> $seq = $obj->seq(); >>> # we need the sequence string >>> $seq = $seq->seq() if ref($seq) && >>> $seq->isa("Bio::PrimarySeqI"); >>> >>> There has been a naming consistency review, but it's been a long >>> time. >>> >>> -hilmar >>> >>> >>> On Apr 30, 2009, at 5:56 PM, Kevin Brown wrote: >>> >>> > So, I'm using quite a bit of bioperl code in my own stuff and have >>> > been >>> > seeing some oddities with the naming of methods. A good example >>> > would be >>> > in the Bio::Seq and Bio::SeqFeature::Generic. Both have a method >>> > called >>> > "seq" but in the latter case it returns an object (and expects an >>> > object >>> > when doing a Set) and in the former it returns a string and >>> expects a >>> > string when doing a Set. >>> > >>> > This makes for a bit of brain freeze on my part when the return >>> from >>> > another object might be a Bio::Seq or >>> Bio::SeqFeature::Generic and now >>> > calling the ->seq returns different things. >>> > >>> > Guess I'm just curious if anyone has done an audit of the >>> methods of >>> > the >>> > various objects and their return types to see how >>> consistent they are >>> > across even a subsection of the codebase? 
>>> > >>> > _______________________________________________ >>> > Bioperl-l mailing list >>> > Bioperl-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Mon May 4 14:25:54 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 4 May 2009 11:25:54 -0700 Subject: [Bioperl-l] Other object oddities In-Reply-To: <87C756F8-44FB-4930-8154-478BE50AE270@illinois.edu> References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu> <87C756F8-44FB-4930-8154-478BE50AE270@illinois.edu> Message-ID: <1A4207F8295607498283FE9E93B775B405F7F3F4@EX02.asurite.ad.asu.edu> > > I don't mind that Bio::Seq uses seq to return a string. In fact I > > prefer > > that. Just would be nice if other objects obeyed the same > convention. > > Bio::SeqFeature::Generic returns an object for both entire_seq and > > seq, > > but uses attach_seq to store the Bio::Seq object into the Feature. > > I think most of these are legacy issues that (for the most > part) have > just been dealt with ('they just work'), and with the thought that > changing things breaks legacy code. I agree with you, > though; it's a > good time to rethink how we're naming methods, work towards some > consistency, and possibly do this for the next significant > release. 
I > don't want to fall into the trap that perl 5.x had fallen > into (and is > working towards digging out of), namely fear of breaking old code. > > > Maybe SeqFeature could be adjusted so that ->seq returns > the sequence > > string of the feature (just like Bio::Seq) and > ->feature_seq returns > > the > > Bio::Seq object. > > That would be a significant API change and would be > inconsistent with > seq() in other classes returning a Bio::Seq. Not that it's any > different than some of the current behavior, but if we want > to correct > this it should be done in a *consistent*, well-defined way. Changing it in either set of objects would be a break in the API. Either it always returns an object or always returns a string. Right now Bio::Seq/LocatableSeq/PrimarySeq/etc... and others of its ilk return strings when calling ->seq() and also allow one to set the sequence with that same method. Bio::SeqFeature::*, the Bio::DB objects, etc... only allow one to get the seq object that way, but set it via a different method. > My thoughts: > > To me, seq() should always return a Bio::PrimarySeqI (derived from > invocant PrimarySeqI class). However, this is currently > inconsistent > as illustrated by your example. Changing this would require a > deprecation cycle. > > A new method, seqstr()/str()/rawseq(), could be guaranteed to > return a > raw sequence. Similarly, bioseq(), could always return a > Bio::PrimarySeqI. Those sound like possibilities. With one or another of the methods being aliased to ->seq if you still want to keep the call around. 
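A minimal sketch of the accessor naming discussed in this thread, for illustration only. The packages Sketch::Seq and Sketch::Feature and the helper seq_string() are hypothetical stand-ins, not BioPerl classes, and seqstr() is only the alias proposed above, not an existing BioPerl method. It shows a string-returning seq(), an object-returning seq(), and an explicit seqstr() that always yields the raw string, plus caller-side unwrapping in the spirit of the ref()/isa() test quoted earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::Seq-like class: seq() gets/sets a string.
package Sketch::Seq;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{-seq} }, $class;
}

sub seq {
    my $self = shift;
    $self->{seq} = shift if @_;
    return $self->{seq};
}

# Proposed explicit alias (an assumption): always the raw sequence string.
sub seqstr { my $self = shift; return $self->seq(@_) }

# Hypothetical stand-in for a feature-like class: seq() returns an object.
package Sketch::Feature;

sub new {
    my ($class, %args) = @_;
    return bless { seqobj => $args{-seqobj} }, $class;
}

sub seq    { return $_[0]->{seqobj} }          # an object
sub seqstr { return $_[0]->{seqobj}->seqstr }  # a string

package main;

# Caller-side normalisation: keep unwrapping while seq() hands back an
# object, so the caller always ends up with a plain string.
sub seq_string {
    my ($obj) = @_;
    my $seq = $obj->seq;
    $seq = $seq->seq while ref($seq) && $seq->can('seq');
    return $seq;
}

my $seq  = Sketch::Seq->new(-seq => 'ATGC');
my $feat = Sketch::Feature->new(-seqobj => $seq);

print seq_string($seq),  "\n";   # string via a string-returning seq()
print seq_string($feat), "\n";   # string via an object-returning seq()
print $feat->seqstr,     "\n";   # string via the explicit alias
```

With an explicit seqstr() on both kinds of object, callers would not need the unwrapping helper at all, which is the consistency argument made above.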
From cjfields at illinois.edu Mon May 4 15:11:18 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 4 May 2009 14:11:18 -0500 Subject: [Bioperl-l] Other object oddities In-Reply-To: <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu><1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu><4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> Message-ID: <30B87481-2314-48D4-8E84-F90FB02E90DB@illinois.edu> On May 4, 2009, at 2:01 PM, Mark A. Jensen wrote: > [I hear you re: $job] > Def. thanks for chiming- Maybe this should be an element of > the "Align refactor" that perhaps should be an overall > "Seq refactor". > > Are you saying that the trunk is fair game for api additions > for this issue? > cheers I don't think anyone should feel afraid to change things on trunk, but I think significant changes should be discussed here so everyone has a chance to chime in. And API additions are not nearly as severe as having a method like seq() return a different value. In fact, I personally don't have a problem with merging that to the 1.6 branch (others may disagree though). I consider it a 'bug fix' in a loose way. chris From maj at fortinbras.us Mon May 4 15:01:41 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 4 May 2009 15:01:41 -0400 Subject: [Bioperl-l] Other object oddities In-Reply-To: <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu><1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu><4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> Message-ID: <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> [I hear you re: $job] Def. thanks for chiming- Maybe this should be an element of the "Align refactor" that perhaps should be an overall "Seq refactor". 
Are you saying that the trunk is fair game for api additions for this issue? cheers ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" ; "Kevin Brown" Sent: Monday, May 04, 2009 2:20 PM Subject: Re: [Bioperl-l] Other object oddities > Sorry I haven't chimed in, but $job had killed me the last couple weeks! > > Unfortunately the reason this hasn't been chased down before is the headache > involved. It requires significant API changes to a broadly used codebase > (read: so devs are scared about breaking someone's old scripts), having to > deal with deprecation cycles, not to mention the most critical aspect, which > would be tuits. > > Saying that, the reason I made a 1.6 branch is to maintain the snapshot of > the code for API reasons. There is no reason we can't add in more explicit > methods to main trunk. We can deprecate the use of more ambiguous methods > down the road. > > chris > > On May 4, 2009, at 10:50 AM, Mark A. Jensen wrote: > >> This is definitely a reasonable issue to chase down. How to do it needs >> a little care. I personally see 'seq' and think 'object', and have resorted >> to >> 'seqstr' in my own code to hold/access just strings. FWIW, my preference >> would >> be to have any object that has a seq object as a property return objects >> when a '..._seq' accessor is called. However, the seq objects themselves >> generally contain the sequence string in their seq() property. We wouldn't >> want to disrupt that, but would it be worth creating an alias getter/ setter >> for >> the Seq classes seq() property called 'seqstr'? We could then count on >> >> $foo->bar_seq, an object >> $foo->bar_seq->seqstr, a string >> $foo->seqstr, a string (not nec same as above) >> >> cheers Mark >> ----- Original Message ----- From: "Kevin Brown" > > >> Cc: "BioPerl List" >> Sent: Monday, May 04, 2009 11:31 AM >> Subject: Re: [Bioperl-l] Other object oddities >> >> >>> I don't mind that Bio::Seq uses seq to return a string. 
In fact I prefer >>> that. Just would be nice if other objects obeyed the same convention. >>> Bio::SeqFeature::Generic returns an object for both entire_seq and seq, >>> but uses attach_seq to store the Bio::Seq object into the Feature. >>> >>> Maybe SeqFeature could be adjusted so that ->seq returns the sequence >>> string of the feature (just like Bio::Seq) and ->feature_seq returns the >>> Bio::Seq object. >>> >>>> -----Original Message----- >>>> From: Hilmar Lapp [mailto:hlapp at gmx.net] >>>> Sent: Sunday, May 03, 2009 11:37 AM >>>> To: Kevin Brown >>>> Cc: BioPerl List >>>> Subject: Re: [Bioperl-l] Other object oddities >>>> >>>> I agree, $seq->seq() could possibly be better named. Maybe $seq- >>>> >seqstr()? >>>> >>>> The thing is that having $seq->seq() return an object would be >>>> meaningless - it would be $self. >>>> >>>> You can test what kind of object you have using ref() or isa(): >>>> >>>> $seq = $obj->seq(); >>>> # we need the sequence string >>>> $seq = $seq->seq() if ref($seq) && >>>> $seq->isa("Bio::PrimarySeqI"); >>>> >>>> There has been a naming consistency review, but it's been a long time. >>>> >>>> -hilmar >>>> >>>> >>>> On Apr 30, 2009, at 5:56 PM, Kevin Brown wrote: >>>> >>>> > So, I'm using quite a bit of bioperl code in my own stuff and have >>>> > been >>>> > seeing some oddities with the naming of methods. A good example >>>> > would be >>>> > in the Bio::Seq and Bio::SeqFeature::Generic. Both have a method >>>> > called >>>> > "seq" but in the latter case it returns an object (and expects an >>>> > object >>>> > when doing a Set) and in the former it returns a string and >>>> expects a >>>> > string when doing a Set. >>>> > >>>> > This makes for a bit of brain freeze on my part when the return >>>> from >>>> > another object might be a Bio::Seq or >>>> Bio::SeqFeature::Generic and now >>>> > calling the ->seq returns different things. 
>>>> > >>>> > Guess I'm just curious if anyone has done an audit of the >>>> methods of >>>> > the >>>> > various objects and their return types to see how >>>> consistent they are >>>> > across even a subsection of the codebase? >>>> > >>>> > _______________________________________________ >>>> > Bioperl-l mailing list >>>> > Bioperl-l at lists.open-bio.org >>>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> =========================================================== >>>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>>> =========================================================== >>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >
From punit_vergoboy2004 at yahoo.co.in Mon May 4 15:31:34 2009 From: punit_vergoboy2004 at yahoo.co.in (punit kumar) Date: Tue, 5 May 2009 01:01:34 +0530 (IST) Subject: [Bioperl-l] machine learnings Message-ID: <704392.20390.qm@web8402.mail.in.yahoo.com> Hello, I am Punit Kumar. I would like to know whether modules for artificial neural networks and other machine learning techniques are available in BioPerl and, if so, how I can use them. Punit Kumar Kadimi.
From wangyi2412 at gmail.com Mon May 4 23:59:54 2009 From: wangyi2412 at gmail.com (yi wang) Date: Tue, 5 May 2009 11:59:54 +0800 Subject: [Bioperl-l] bioperl / emboss on windows In-Reply-To: <60194.86.149.78.35.1241451595.squirrel@webmail.ebi.ac.uk> References: <60194.86.149.78.35.1241451595.squirrel@webmail.ebi.ac.uk> Message-ID: Thanks for the good idea, which prompted me to trace through emboss.pm. I found that, besides the MSWin check, there is another problem: *open(WOSSOUT, "wossname -auto |") is not successful*, so WOSSOUT is empty and the following while loop, in which important data is set, is never executed. So, does anybody know how to fix this problem? Thanks very much! Best Wishes! 2009/5/4 > > It looks like EMBOSS was disabled in Bio\Factory\EMBOSS.pm for Windows > platform. After commenting out the related condition in the _program_list > function (as shown below) i don't get the "Application [needle] is not > available" error any more. > > if( #$^O =~ /MSWIN/i || > > Regards, > Mahmut > > > > I have installed the bioperl and emboss on my* windows xp*, as guided on > > the web. But it > > --------------------- WARNING --------------------- > > *MSG: Application [needle] is not available!* > > --------------------------------------------------- > > > > > > use warnings; > > use CGI; > > use Bio::Perl; > > use Bio::Root::Root; > > use Bio::Factory::ApplicationFactoryI; > > use Bio::Factory::EMBOSS; > > use Bio::Tools::Run::EMBOSSApplication; > > > > > > > > *my $f = Bio::Factory::EMBOSS -> new();* > > *$f->program("needle");* > > #my $factory = new Bio::Factory::EMBOSS; > > #my $compseqapp = $factory->program("needle"); > > > >
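A minimal sketch of how a failing piped open like the one described above can be diagnosed, assuming only core Perl. The helper name command_output() is hypothetical and this is not the code in Bio::Factory::EMBOSS; only the command 'wossname -auto' comes from the thread. It checks both open() and close(), since a missing program may show up at either point depending on platform and shell.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a command through a piped open and return its
# output lines. Returns an empty list (after a warning) if the command
# cannot be started or exits with a non-zero status.
sub command_output {
    my ($cmd) = @_;

    my $fh;
    unless (open($fh, '-|', $cmd)) {
        warn "Could not start '$cmd': $!\n";
        return;
    }

    my @lines = <$fh>;

    unless (close($fh)) {
        # For a piped handle, close() also fails when the command exits
        # non-zero; in that case $! is 0 and $? holds the wait status.
        my $why = $! ? "$!" : 'exit status ' . ($? >> 8);
        warn "'$cmd' failed: $why\n";
        return;
    }

    return @lines;
}

my @programs = command_output('wossname -auto');
if (@programs) {
    printf "wossname returned %d lines\n", scalar @programs;
}
else {
    print "No output from wossname; is EMBOSS installed and on the PATH?\n";
}
```

An empty result here points at the environment (EMBOSS missing or not on the PATH) rather than at the loop that consumes the output.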
From uludag at ebi.ac.uk Tue May 5 00:55:46 2009 From: uludag at ebi.ac.uk (uludag at ebi.ac.uk) Date: Tue, 5 May 2009 05:55:46 +0100 (BST) Subject: [Bioperl-l] bioperl / emboss on windows In-Reply-To: References: <60194.86.149.78.35.1241451595.squirrel@webmail.ebi.ac.uk> Message-ID: <42480.86.149.78.35.1241499346.squirrel@webmail.ebi.ac.uk> > I found beside mswin supporting, there is another problem: > *open(WOSSOUT, "wossname -auto |") > is not successful*, so the WOSSOUT is got empty, and the following while > loop does not executed, in which important data is set. As Scott wrote yesterday, you can double-check whether the EMBOSS programs are included in your PATH environment variable. Installing mEMBOSS (the version of EMBOSS for Windows) from the EMBOSS FTP site should automatically update the PATH environment variable (otherwise you should update it manually: Control Panel->System->Advanced->Environment Variables). ftp://emboss.open-bio.org/pub/EMBOSS/windows/ To check whether your PATH environment variable has been updated successfully, you can run the 'wossname -auto' command from the Windows Command Prompt; it should return the names of the EMBOSS programs with their short descriptions. Regards, Mahmut
From wangyi2412 at gmail.com Tue May 5 03:32:42 2009 From: wangyi2412 at gmail.com (yi wang) Date: Tue, 5 May 2009 15:32:42 +0800 Subject: [Bioperl-l] Solved: bioperl / emboss on windows Message-ID: Thank you very much! What a big mistake I made! I had not actually installed EMBOSS at all; I thought bioperl and bioperl-run were enough, because the installed emboss.pm made me think so. Now it's clear that bioperl and bioperl-run only provide the base for calling BioPerl modules and external programs like EMBOSS, and do not contain those programs themselves. Emboss.pm is just a handle for calling the EMBOSS programs. How foolish I was! Thank you very much for your patient and detailed answer! Best Wishes!
2009/5/5 > > > I found beside mswin supporting, there is another problem: > > *open(WOSSOUT, "wossname -auto |") > > is not successful*, so the WOSSOUT is got empty, and the following while > > loop does not executed, in which important data is set. > > As Scott wrote yesterday you can double check whether EMBOSS programs are > included in your PATH environment variable. Installing mEMBOSS (version of > EMBOSS for Windows) from EMBOSS ftp site should automatically update PATH > environment variable (otherwise you should update it manually, Control > Panel->System->Advanced->Environment Variables). > > ftp://emboss.open-bio.org/pub/EMBOSS/windows/ > > to check whether your PATH environment variable has successfully been > updated you can call 'wossname -auto' command from Windows Command Prompt, > it should return names of EMBOSS programs with their short descriptions. > > Regards, > Mahmut > > >
From hlapp at gmx.net Tue May 5 08:31:41 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 5 May 2009 08:31:41 -0400 Subject: [Bioperl-l] Other object oddities In-Reply-To: <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu><1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu><4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> Message-ID: <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> On May 4, 2009, at 3:01 PM, Mark A. Jensen wrote: > Maybe this should be an element of > the "Align refactor" that perhaps should be an overall > "Seq refactor". Possibly. Most importantly, it'd be great if someone would volunteer to summarize what's been said here so it won't get lost. > Are you saying that the trunk is fair game for api additions > for this issue?
There's been talk some (a long, actually) time ago about BioPerl 2.0 that would start on a clean slate and not be bothered by backwards compatibility demands. That effort never really took off, but maybe this is also a good time to ask the question again whether it's better to introduce the API changes we desire in add/deprecate/remove cycles, or in a more radical fashion starting on a clean slate. The obvious advantage of the former is that we get API improvements sooner, but making them is possibly more dreadful, discouraging, or not even doable due to compatibility constraints. The disadvantage of the latter is that it really needs a committed crew of people to see it through or otherwise all the nice changes die in some grand but half-finished 2.0 construction site. I think Chris also had plans to branch off a Perl6 version of BioPerl - maybe those could be the same efforts? I'm not trying to advocate one over the other here; rather, I'd like to help push on that front that is best able to capture the energy of volunteers, as that's what it takes in the end. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Tue May 5 10:31:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 5 May 2009 09:31:23 -0500 Subject: [Bioperl-l] Other object oddities In-Reply-To: <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu><1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu><4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> Message-ID: <6B76016F-60E8-4FE5-B083-E64762D79039@illinois.edu> On May 5, 2009, at 7:31 AM, Hilmar Lapp wrote: > > On May 4, 2009, at 3:01 PM, Mark A. 
Jensen wrote: > >> Maybe this should be an element of >> the "Align refactor" that perhaps should be an overall >> "Seq refactor". > > Possibly. Most importantly, it'd be great if someone would volunteer > to summarize what's been said here so it won't get lost. Looks like mark's done it. >> Are you saying that the trunk is fair game for api additions >> for this issue? > > There's been talk some (a long, actually) time ago about BioPerl 2.0 > that would start on a clean slate and not be bothered by backwards > compatibility demands. That effort never really took off, but maybe > this is also a good time to ask the question again whether it's > better to introduce the API changes we desire in add/deprecate/ > remove cycles, or in a more radical fashion starting on a clean slate. That's what I'm thinking. > The obvious advantage of the former is that we get API improvements > sooner, but making them is possibly more dreadful, discouraging, or > not even doable due to compatibility constraints. The disadvantage > of the latter is that it really needs a committed crew of people to > see it through or otherwise all the nice changes die in some grand > but half-finished 2.0 construction site. I think Chris also had > plans to branch off a Perl6 version of BioPerl - maybe those could > be the same efforts? I have been toying around with perl6 for a bit now (Rakudo on Parrot implementation). It's possible an alpha for perl6 will be available by christmas this year; Rakudo is now passing over 11000 spec tests. Just to note, Perl6 is another beast altogether from Perl5. Yes, there is supposed to be a backwards compatibility mode, but no one has implemented that yet, and it likely won't be implemented in the near future. Based on that I'm not sure we could really call a bioperl in perl6 bioperl 2.0, more like bioperl6 1.0, as it would be a complete refactor. As for perl5, it has a nice OO set of modules (Moose) that could be used for refactoring. 
It implements roles and a few other perl6-ish bits (along with MooseX modules). perl 5.10 also has a few things backported from p6, say(), given/when, state vars, etc. We could require Modern::Perl (perl5.10 with strict/warnings pragmas on) and Moose. I have played around with both and find them quite nice, so I suggest if we were to start a 2.0 effort it should include Moose, and we should push most of the interfaces into roles. Anyway, I grabbed the git repos for bioperl6 and biomoose (bioperl implemented in Moose) on github. We can set up something there using those namespaces if needed. > I'm not trying to advocate one over the other here; rather, I'd like > to help push on that front that is best able to capture the energy > of volunteers, as that's what it takes in the end. > > -hilmar Depends on where everyone wants to place their efforts. May be less work to port the most important core classes over to Moose, and a simple test implementation will give us an idea on what works Role- wise and what doesn't. From there we could work on p6 variants; that would have to be a separate project altogether. We could also include a few other MooseX modules if it makes life easier. chris From maj at fortinbras.us Tue May 5 10:13:04 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 5 May 2009 10:13:04 -0400 Subject: [Bioperl-l] Other object oddities In-Reply-To: <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu><1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu><4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> Message-ID: <727BD57B31FE464082258697A4D742A7@NewLife> > Possibly. Most importantly, it'd be great if someone would volunteer > to summarize what's been said here so it won't get lost. 
> http://www.bioperl.org/wiki/Naming_Conventions_and_the_Future From cjm at berkeleybop.org Tue May 5 14:28:02 2009 From: cjm at berkeleybop.org (Chris Mungall) Date: Tue, 5 May 2009 11:28:02 -0700 Subject: [Bioperl-l] Moose [was Re: Other object oddities] In-Reply-To: <6B76016F-60E8-4FE5-B083-E64762D79039@illinois.edu> References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu><1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu><4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> <6B76016F-60E8-4FE5-B083-E64762D79039@illinois.edu> Message-ID: <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> On May 5, 2009, at 7:31 AM, Chris Fields wrote: > On May 5, 2009, at 7:31 AM, Hilmar Lapp wrote: > >> >> On May 4, 2009, at 3:01 PM, Mark A. Jensen wrote: >> >>> Maybe this should be an element of >>> the "Align refactor" that perhaps should be an overall >>> "Seq refactor". >> >> Possibly. Most importantly, it'd be great if someone would >> volunteer to summarize what's been said here so it won't get lost. > > Looks like mark's done it. > >>> Are you saying that the trunk is fair game for api additions >>> for this issue? >> >> There's been talk some (a long, actually) time ago about BioPerl >> 2.0 that would start on a clean slate and not be bothered by >> backwards compatibility demands. That effort never really took off, >> but maybe this is also a good time to ask the question again >> whether it's better to introduce the API changes we desire in add/ >> deprecate/remove cycles, or in a more radical fashion starting on a >> clean slate. > > That's what I'm thinking. > >> The obvious advantage of the former is that we get API improvements >> sooner, but making them is possibly more dreadful, discouraging, or >> not even doable due to compatibility constraints. 
The disadvantage >> of the latter is that it really needs a committed crew of people to >> see it through or otherwise all the nice changes die in some grand >> but half-finished 2.0 construction site. I think Chris also had >> plans to branch off a Perl6 version of BioPerl - maybe those could >> be the same efforts? > > I have been toying around with perl6 for a bit now (Rakudo on Parrot > implementation). It's possible an alpha for perl6 will be available > by christmas this year; Rakudo is now passing over 11000 spec tests. > > Just to note, Perl6 is another beast altogether from Perl5. Yes, > there is supposed to be a backwards compatibility mode, but no one > has implemented that yet, and it likely won't be implemented in the > near future. Based on that I'm not sure we could really call a > bioperl in perl6 bioperl 2.0, more like bioperl6 1.0, as it would be > a complete refactor. > > As for perl5, it has a nice OO set of modules (Moose) that could be > used for refactoring. It implements roles and a few other perl6-ish > bits (along with MooseX modules). perl 5.10 also has a few things > backported from p6, say(), given/when, state vars, etc. We could > require Modern::Perl (perl5.10 with strict/warnings pragmas on) and > Moose. I have played around with both and find them quite nice, so > I suggest if we were to start a 2.0 effort it should include Moose, > and we should push most of the interfaces into roles. We're playing around with a rewrite of go-perl using Moose: http://geneontology.svn.sourceforge.net/viewvc/geneontology/go-moose/OBO/ This is early enough that parts could be scrapped or rewritten. Compatibility with bioperl is important. Speed was an initial concern but apparently there are some moose tricks to speed things up DBIx::Class compatibility is also important. Not sure if there is specific support for this yet > > Anyway, I grabbed the git repos for bioperl6 and biomoose (bioperl > implemented in Moose) on github. 
We can set up something there > using those namespaces if needed. > >> I'm not trying to advocate one over the other here; rather, I'd >> like to help push on that front that is best able to capture the >> energy of volunteers, as that's what it takes in the end. >> >> -hilmar > > Depends on where everyone wants to place their efforts. May be less > work to port the most important core classes over to Moose, and a > simple test implementation will give us an idea on what works Role- > wise and what doesn't. From there we could work on p6 variants; > that would have to be a separate project altogether. We could also > include a few other MooseX modules if it makes life easier. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From sidd.basu at gmail.com Tue May 5 16:51:07 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Tue, 5 May 2009 15:51:07 -0500 Subject: [Bioperl-l] Re: Moose [was Re:Other object oddities] In-Reply-To: <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> References: <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> <6B76016F-60E8-4FE5-B083-E64762D79039@illinois.edu> <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> Message-ID: <20090505205105.GD422@Macintosh-47.local> On Tue, 05 May 2009, Chris Mungall wrote: > > On May 5, 2009, at 7:31 AM, Chris Fields wrote: > > > On May 5, 2009, at 7:31 AM, Hilmar Lapp wrote: > > > >> > >> On May 4, 2009, at 3:01 PM, Mark A. Jensen wrote: > >> > >>> Maybe this should be an element of > >>> the "Align refactor" that perhaps should be an overall > >>> "Seq refactor". > >> > >> Possibly. Most importantly, it'd be great if someone would volunteer to > >> summarize what's been said here so it won't get lost. > > > > Looks like mark's done it. 
> > > >>> Are you saying that the trunk is fair game for api additions > >>> for this issue? > >> > >> There's been talk some (a long, actually) time ago about BioPerl 2.0 that > >> would start on a clean slate and not be bothered by backwards > >> compatibility demands. That effort never really took off, but maybe this > >> is also a good time to ask the question again whether it's better to > >> introduce the API changes we desire in add/deprecate/remove cycles, or in > >> a more radical fashion starting on a clean slate. > > > > That's what I'm thinking. > > > >> The obvious advantage of the former is that we get API improvements > >> sooner, but making them is possibly more dreadful, discouraging, or not > >> even doable due to compatibility constraints. The disadvantage of the > >> latter is that it really needs a committed crew of people to see it > >> through or otherwise all the nice changes die in some grand but > >> half-finished 2.0 construction site. I think Chris also had plans to > >> branch off a Perl6 version of BioPerl - maybe those could be the same > >> efforts? > > > > I have been toying around with perl6 for a bit now (Rakudo on Parrot > > implementation). It's possible an alpha for perl6 will be available by > > christmas this year; Rakudo is now passing over 11000 spec tests. > > > > Just to note, Perl6 is another beast altogether from Perl5. Yes, there is > > supposed to be a backwards compatibility mode, but no one has implemented > > that yet, and it likely won't be implemented in the near future. Based on > > that I'm not sure we could really call a bioperl in perl6 bioperl 2.0, > > more like bioperl6 1.0, as it would be a complete refactor. > > > > As for perl5, it has a nice OO set of modules (Moose) that could be used > > for refactoring. It implements roles and a few other perl6-ish bits > > (along with MooseX modules). perl 5.10 also has a few things backported > > from p6, say(), given/when, state vars, etc. 
We could require
> > Modern::Perl (perl5.10 with strict/warnings pragmas on) and Moose. I have
> > played around with both and find them quite nice, so I suggest if we were
> > to start a 2.0 effort it should include Moose, and we should push most of
> > the interfaces into roles.
>
> We're playing around with a rewrite of go-perl using Moose:
> http://geneontology.svn.sourceforge.net/viewvc/geneontology/go-moose/OBO/
>
> This is early enough that parts could be scrapped or rewritten.
> Compatibility with bioperl is important.
>
> Speed was an initial concern but apparently there are some moose tricks to
> speed things up
>
> DBIx::Class compatibility is also important. Not sure if there is specific
> support for this yet
>
> >
> >
> > Anyway, I grabbed the git repos for bioperl6 and biomoose (bioperl
> > implemented in Moose) on github. We can set up something there using
> > those namespaces if needed.
> >
> >> I'm not trying to advocate one over the other here; rather, I'd like to
> >> help push on that front that is best able to capture the energy of
> >> volunteers, as that's what it takes in the end.

I would definitely like to volunteer for the 'biomoose' project, as much
as my skills will permit. I wrote a 'homologene' parser in the early Moose
days (0.3) and have since been quite interested in working on a
Moose-based project. Hopefully I will be able to help as the project takes
shape.

Though it is quite early, two MooseX extensions are worth a look:

MooseX::Declare
http://search.cpan.org/~flora/MooseX-Declare-0.21/lib/MooseX/Declare.pm

MooseX::MultiMethods
http://search.cpan.org/~flora/MooseX-MultiMethods-0.02/lib/MooseX/MultiMethods.pm

thanks,
-siddhartha

> >>
> >> -hilmar
> >
> > Depends on where everyone wants to place their efforts. May be less work
> > to port the most important core classes over to Moose, and a simple test
> > implementation will give us an idea on what works Role-wise and what
> > doesn't.
From there we could work on p6 variants; that would have to be a
> > separate project altogether. We could also include a few other MooseX
> > modules if it makes life easier.
> >
> > chris
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hartzell at alerce.com Tue May 5 17:42:19 2009
From: hartzell at alerce.com (George Hartzell)
Date: Tue, 5 May 2009 14:42:19 -0700
Subject: [Bioperl-l] question about in-between overlapping exact location
Message-ID: <18944.45755.94431.882844@already.local>

I was surprised to see that:

  $ins = Bio::Location::Simple->new(-start => 2,
                                    -end => 3,
                                    -location_type => 'IN-BETWEEN',
                                   );
  $start = Bio::Location::Simple->new(-start => 3,
                                      -end => 5);

  print "Wow!\n" if $start->overlaps($ins);

To my mind they would only overlap if the insertion were 3^4 or 4^5.

Is my mental model of in-between's overlapping exact's wrong, or could
the code be improved (I'm happy to make a change, but...)?

g.

From jason at bioperl.org Tue May 5 18:06:50 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 5 May 2009 15:06:50 -0700
Subject: [Bioperl-l] question about in-between overlapping exact location
In-Reply-To: <18944.45755.94431.882844@already.local>
References: <18944.45755.94431.882844@already.local>
Message-ID: <1E9CA287-58C3-48B1-B9AD-3AC9541984C3@bioperl.org>

George -
I don't think the location type is taken into account in the overlap
testing code. Would you expect 2..3 and 3..5 to overlap?
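
[Editorial sketch] The point raised here can be illustrated without BioPerl. The helper below is hypothetical (ranges_overlap is not a BioPerl function); it assumes, as suggested in the message above, that the overlap test only compares start and end coordinates and never looks at the location type. Under that assumption the in-between location 2^3, stored as start 2 and end 3, is indistinguishable from the exact range 2..3 and so "overlaps" 3..5.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT BioPerl code: a closed-interval overlap test
# on plain start/end coordinates that ignores any location type.
sub ranges_overlap {
    my ($x_start, $x_end, $y_start, $y_end) = @_;
    return ($x_start <= $y_end && $y_start <= $x_end) ? 1 : 0;
}

# The in-between location 2^3 carries start => 2, end => 3, so a purely
# coordinate-based test treats it exactly like the range 2..3.
my %ins   = (start => 2, end => 3);
my %codon = (start => 3, end => 5);

print "2..3 vs 3..5: ",
    ranges_overlap(@ins{qw(start end)}, @codon{qw(start end)}), "\n";
print "1..2 vs 3..5: ",
    ranges_overlap(1, 2, @codon{qw(start end)}), "\n";
```

The first call reports an overlap because position 3 is shared by both coordinate pairs; whether that is the right answer for an insertion between 2 and 3 is exactly what the thread goes on to discuss.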
-jason On May 5, 2009, at 2:42 PM, George Hartzell wrote: > > I was surprised to see that: > > $ins = Bio::Location::Simple->new(-start => 2, > -end => 3, > -location_type => 'IN-BETWEEN', > ); > $start = Bio::Location::Simple->new(-start => 3, > -end => 5); > > print "Wow!\n" if $start->overlaps($ins); > > To my mind they would only overlap if the insertion were 3^4 or 4^5. > > Is my mental model of in-between's overlapping exact's wrong, or could > the code be improved (I'm happy to make a change, but...)? > > g. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From hartzell at alerce.com Wed May 6 00:17:49 2009 From: hartzell at alerce.com (George Hartzell) Date: Tue, 5 May 2009 21:17:49 -0700 Subject: [Bioperl-l] question about in-between overlapping exact location In-Reply-To: <1E9CA287-58C3-48B1-B9AD-3AC9541984C3@bioperl.org> References: <18944.45755.94431.882844@already.local> <1E9CA287-58C3-48B1-B9AD-3AC9541984C3@bioperl.org> Message-ID: <18945.3949.852961.763626@already.local> Jason Stajich writes: > George - > I don't think the location type is taken into account in the overlap > code testing code. Would you expect 2..3 and 3..5 to overlap? > > -jason > On May 5, 2009, at 2:42 PM, George Hartzell wrote: > > > > > I was surprised to see that: > > > > $ins = Bio::Location::Simple->new(-start => 2, > > -end => 3, > > -location_type => 'IN-BETWEEN', > > ); > > $start = Bio::Location::Simple->new(-start => 3, > > -end => 5); > > > > print "Wow!\n" if $start->overlaps($ins); > > > > To my mind they would only overlap if the insertion were 3^4 or 4^5. > > > > Is my mental model of in-between's overlapping exact's wrong, or could > > the code be improved (I'm happy to make a change, but...)? Yep, I'd expect them to overlap. 1 2 3 4 5 A T T A A I'm trying to ask a question like the following. 
Given a location that describes an e.g. start codon (3..5) and a description of a mutation, does the mutation cause a change in the ATG. Substitutions are described with exact locations (change bases 3..4 from AT to TA) and insertions are modeled as in-between locations (insert G at 3^4). 1 2 3 4 5 6 A T G T G C C Given 3..5, I can just ask if 3..4 overlaps it (yes), if 3 overlaps it (yes) and if 3^4 overlaps it (yes). For things to work out this easily, 2^3 shouldn't overlap (an insertion there wouldn't change the codon). I can get the in-between to work by using RangeI->contains, but then I end up with 1..4 not "causing a change". I've ended up with a two part if() that checks the location_type and uses ->overlap() or ->contains() so that it works out. g. From webb.daniel at yahoo.com Wed May 6 07:21:47 2009 From: webb.daniel at yahoo.com (Daniel Webb) Date: Wed, 6 May 2009 04:21:47 -0700 (PDT) Subject: [Bioperl-l] retrieving gene sequence given protein id Message-ID: <277937.66024.qm@web45507.mail.sp1.yahoo.com> Hi all, is there a script or a module with which I could, given the list of protein gi or accessions, retrieve corresponding genes from Entrez Gene/GenBank? What I would like is sequence of the whole gene in fasta format, with all the introns and UTRs. I would be grateful for any help Dan From hlapp at gmx.net Wed May 6 07:54:14 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 6 May 2009 07:54:14 -0400 Subject: [Bioperl-l] question about in-between overlapping exact location In-Reply-To: <18945.3949.852961.763626@already.local> References: <18944.45755.94431.882844@already.local> <1E9CA287-58C3-48B1-B9AD-3AC9541984C3@bioperl.org> <18945.3949.852961.763626@already.local> Message-ID: This sounds like a bug to me - the location type should be taken into account, shouldn't it? Would you mind submitting this (and a patch if you have one :) to bugzilla? 
-hilmar On May 6, 2009, at 12:17 AM, George Hartzell wrote: > > Jason Stajich writes: >> George - >> I don't think the location type is taken into account in the overlap >> code testing code. Would you expect 2..3 and 3..5 to overlap? >> >> -jason >> On May 5, 2009, at 2:42 PM, George Hartzell wrote: >> >>> >>> I was surprised to see that: >>> >>> $ins = Bio::Location::Simple->new(-start => 2, >>> -end => 3, >>> -location_type => 'IN-BETWEEN', >>> ); >>> $start = Bio::Location::Simple->new(-start => 3, >>> -end => 5); >>> >>> print "Wow!\n" if $start->overlaps($ins); >>> >>> To my mind they would only overlap if the insertion were 3^4 or 4^5. >>> >>> Is my mental model of in-between's overlapping exact's wrong, or >>> could >>> the code be improved (I'm happy to make a change, but...)? > > Yep, I'd expect them to overlap. > > 1 2 3 4 5 > A T > T A A > > I'm trying to ask a question like the following. Given a location > that describes an e.g. start codon (3..5) and a description of a > mutation, does the mutation cause a change in the ATG. Substitutions > are described with exact locations (change bases 3..4 from AT to TA) > and insertions are modeled as in-between locations (insert G at 3^4). > > 1 2 3 4 5 6 > A T G > T G C C > > Given 3..5, I can just ask if 3..4 overlaps it (yes), if 3 overlaps it > (yes) and if 3^4 overlaps it (yes). For things to work out this > easily, 2^3 shouldn't overlap (an insertion there wouldn't change the > codon). > > I can get the in-between to work by using RangeI->contains, but then I > end up with 1..4 not "causing a change". > > I've ended up with a two part if() that checks the location_type and > uses ->overlap() or ->contains() so that it works out. > > g. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Wed May 6 10:26:35 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 6 May 2009 09:26:35 -0500 Subject: [Bioperl-l] question about in-between overlapping exact location In-Reply-To: References: <18944.45755.94431.882844@already.local> <1E9CA287-58C3-48B1-B9AD-3AC9541984C3@bioperl.org> <18945.3949.852961.763626@already.local> Message-ID: <0FF287BC-6EFE-498E-81BD-3D1E8DF37353@illinois.edu> We should definitely come up with some test cases and expected results for this; e.g. whether 2^3 should overlap with 1..2 or 3..5, etc (I would guess, in the latter example, they shouldn't). Also, as these are LocationI-specific, I'm not sure we should make changes to RangeI methods. Maybe fix LocationI-specific bits within LocationI and delegate to RangeI::overlaps/etc in simple cases? chris On May 6, 2009, at 6:54 AM, Hilmar Lapp wrote: > This sounds like a bug to me - the location type should be taken > into account, shouldn't it? > > Would you mind submitting this (and a patch if you have one :) to > bugzilla? > > -hilmar > > On May 6, 2009, at 12:17 AM, George Hartzell wrote: > >> >> Jason Stajich writes: >>> George - >>> I don't think the location type is taken into account in the overlap >>> code testing code. Would you expect 2..3 and 3..5 to overlap? 
>>> >>> -jason >>> On May 5, 2009, at 2:42 PM, George Hartzell wrote: >>> >>>> >>>> I was surprised to see that: >>>> >>>> $ins = Bio::Location::Simple->new(-start => 2, >>>> -end => 3, >>>> -location_type => 'IN-BETWEEN', >>>> ); >>>> $start = Bio::Location::Simple->new(-start => 3, >>>> -end => 5); >>>> >>>> print "Wow!\n" if $start->overlaps($ins); >>>> >>>> To my mind they would only overlap if the insertion were 3^4 or >>>> 4^5. >>>> >>>> Is my mental model of in-between's overlapping exact's wrong, or >>>> could >>>> the code be improved (I'm happy to make a change, but...)? >> >> Yep, I'd expect them to overlap. >> >> 1 2 3 4 5 >> A T >> T A A >> >> I'm trying to ask a question like the following. Given a location >> that describes an e.g. start codon (3..5) and a description of a >> mutation, does the mutation cause a change in the ATG. Substitutions >> are described with exact locations (change bases 3..4 from AT to TA) >> and insertions are modeled as in-between locations (insert G at 3^4). >> >> 1 2 3 4 5 6 >> A T G >> T G C C >> >> Given 3..5, I can just ask if 3..4 overlaps it (yes), if 3 overlaps >> it >> (yes) and if 3^4 overlaps it (yes). For things to work out this >> easily, 2^3 shouldn't overlap (an insertion there wouldn't change the >> codon). >> >> I can get the in-between to work by using RangeI->contains, but >> then I >> end up with 1..4 not "causing a change". >> >> I've ended up with a two part if() that checks the location_type and >> uses ->overlap() or ->contains() so that it works out. >> >> g. 
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed May 6 10:32:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 6 May 2009 09:32:51 -0500 Subject: [Bioperl-l] Moose [was Re: Other object oddities] In-Reply-To: <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu><1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu><4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> <6B76016F-60E8-4FE5-B083-E64762D79039@illinois.edu> <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> Message-ID: <17AF855A-AC55-4322-BE15-050F5EE3E802@illinois.edu> On May 5, 2009, at 1:28 PM, Chris Mungall wrote: > > On May 5, 2009, at 7:31 AM, Chris Fields wrote: > >> On May 5, 2009, at 7:31 AM, Hilmar Lapp wrote: >> >>> >>> On May 4, 2009, at 3:01 PM, Mark A. Jensen wrote: >>> >>>> Maybe this should be an element of >>>> the "Align refactor" that perhaps should be an overall >>>> "Seq refactor". >>> >>> Possibly. Most importantly, it'd be great if someone would >>> volunteer to summarize what's been said here so it won't get lost. >> >> Looks like mark's done it. >> >>>> Are you saying that the trunk is fair game for api additions >>>> for this issue? 
>>> >>> There's been talk some (a long, actually) time ago about BioPerl >>> 2.0 that would start on a clean slate and not be bothered by >>> backwards compatibility demands. That effort never really took >>> off, but maybe this is also a good time to ask the question again >>> whether it's better to introduce the API changes we desire in add/ >>> deprecate/remove cycles, or in a more radical fashion starting on >>> a clean slate. >> >> That's what I'm thinking. >> >>> The obvious advantage of the former is that we get API >>> improvements sooner, but making them is possibly more dreadful, >>> discouraging, or not even doable due to compatibility constraints. >>> The disadvantage of the latter is that it really needs a committed >>> crew of people to see it through or otherwise all the nice changes >>> die in some grand but half-finished 2.0 construction site. I think >>> Chris also had plans to branch off a Perl6 version of BioPerl - >>> maybe those could be the same efforts? >> >> I have been toying around with perl6 for a bit now (Rakudo on >> Parrot implementation). It's possible an alpha for perl6 will be >> available by christmas this year; Rakudo is now passing over 11000 >> spec tests. >> >> Just to note, Perl6 is another beast altogether from Perl5. Yes, >> there is supposed to be a backwards compatibility mode, but no one >> has implemented that yet, and it likely won't be implemented in the >> near future. Based on that I'm not sure we could really call a >> bioperl in perl6 bioperl 2.0, more like bioperl6 1.0, as it would >> be a complete refactor. >> >> As for perl5, it has a nice OO set of modules (Moose) that could be >> used for refactoring. It implements roles and a few other perl6- >> ish bits (along with MooseX modules). perl 5.10 also has a few >> things backported from p6, say(), given/when, state vars, etc. We >> could require Modern::Perl (perl5.10 with strict/warnings pragmas >> on) and Moose. 
I have played around with both and find them quite
>> nice, so I suggest if we were to start a 2.0 effort it should
>> include Moose, and we should push most of the interfaces into roles.
>
> We're playing around with a rewrite of go-perl using Moose:
> http://geneontology.svn.sourceforge.net/viewvc/geneontology/go-moose/OBO/
>
> This is early enough that parts could be scrapped or rewritten.
> Compatibility with bioperl is important.

I don't think it needs to be scrapped. A stable Moose-based BioPerl is
probably still a ways off from production use (I would like to test out
a bit of interface->role conversion).

> Speed was an initial concern but apparently there are some moose
> tricks to speed things up
>
> DBIx::Class compatibility is also important. Not sure if there is
> specific support for this yet

I'm not sure about DBIx::Class, but I know Moose sometimes doesn't play
well with Error.pm and its exported methods (I think there is a
conflict). I believe there have been some musings in the past over
changing Bio::Root::Exceptions to use Exception::Class or similar, so
maybe this'll be the push to do so.

Startup speed is an issue with Moose but as you noted there are ways to
optimize things. And, truthfully, if we can get around the interface
issues using roles it might actually help a bit.

chris

From Michael.Stubbington at hpa.org.uk Wed May 6 10:39:27 2009
From: Michael.Stubbington at hpa.org.uk (Michael Stubbington)
Date: Wed, 6 May 2009 15:39:27 +0100
Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters
Message-ID: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk>

Dear all,

I am using the Bio::Tools::Run::Cap3 wrapper to the Cap3 assembly
program. I have some reads that will only assemble if cap3 is used with
the '-y 150' option. This is fine from the command line but I can't work
out how to pass this option to the Cap3 factory object in my script.
If I do the following my $params = "y 150" ; my $cap3Factory = Bio::Tools::Run::Cap3->new($params); my $assembly = $cap3Factory->run($file); Then I get an exception as follows: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Unallowed parameter: y ! STACK: Error::throw STACK: Bio::Root::Root::throw /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Root/Root.pm:357 STACK: Bio::Tools::Run::Cap3::AUTOLOAD /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:116 STACK: Bio::Tools::Run::Cap3::new /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:101 STACK: /Users/mike/perlScripts/QGenotype.pl:150 If I don't try to pass any parameters to Cap3 it runs fine but just fails to assemble the reads that need the -y 150 flag. I'd very much appreciate any help with this. I'm pretty new to bioperl, hope I haven't missed anything obvious! Thanks in advance, Mike ------------------------------------------------------------------------ ---- Mike Stubbington Novel and Dangerous Pathogens Health Protection Agency Centre for Emergency Preparedness and Response Porton Down Salisbury SP4 0JG Tel: +44 1980 619812 ----------------------------------------- ************************************************************************** The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses, but please re-sweep any attachments before opening or saving. 
HTTP://www.HPA.org.uk ************************************************************************** From Michael.Stubbington at hpa.org.uk Wed May 6 11:27:39 2009 From: Michael.Stubbington at hpa.org.uk (Michael Stubbington) Date: Wed, 6 May 2009 16:27:39 +0100 Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters In-Reply-To: <3AE5C5E1-8551-4F44-92B3-C8DD40752A56@verizon.net> References: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk> <3AE5C5E1-8551-4F44-92B3-C8DD40752A56@verizon.net> Message-ID: <335635A922FA2B43B35B6ADD7929CC59017B1317@porhpaexc001.HPA.org.uk> Hi Brian, Thanks for your reply. If I do that it doesn't throw the exception any more but it also doesn't successfully assemble the reads that need the -y 150 flag. M ________________________________ From: Brian Osborne [mailto:bosborne11 at verizon.net] Sent: 06 May 2009 16:09 To: Michael Stubbington Cc: bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters Michael, In Bio/Tools/Run/CAP3.pm you see this at the top: BEGIN { @PARAMS = qw(a b c d e f g m n o p s u v x); $PROGRAMDIR = '/usr/local/bin'; # Authorize attribute fields foreach my $attr (@PARAMS) { $OK_FIELD{$attr}++; } } If you add the letter y to @PARAMS does it work? Brian O. On May 6, 2009, at 10:39 AM, Michael Stubbington wrote: Dear all, I am using the Bio::Tools::Run::Cap3 wrapper to the Cap3 assembly program. I have some reads that will only assemble if cap3 is used with the '-y 150' option. This is fine from the command line but I can't work out how to pass this option to the Cap3 factory object in my script. If I do the following my $params = "y 150" ; my $cap3Factory = Bio::Tools::Run::Cap3->new($params); my $assembly = $cap3Factory->run($file); Then I get an exception as follows: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Unallowed parameter: y ! 
STACK: Error::throw STACK: Bio::Root::Root::throw /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Root/Root.pm:357 STACK: Bio::Tools::Run::Cap3::AUTOLOAD /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:116 STACK: Bio::Tools::Run::Cap3::new /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:101 STACK: /Users/mike/perlScripts/QGenotype.pl:150 If I don't try to pass any parameters to Cap3 it runs fine but just fails to assemble the reads that need the -y 150 flag. I'd very much appreciate any help with this. I'm pretty new to bioperl, hope I haven't missed anything obvious! Thanks in advance, Mike ------------------------------------------------------------------------ ---- Mike Stubbington Novel and Dangerous Pathogens Health Protection Agency Centre for Emergency Preparedness and Response Porton Down Salisbury SP4 0JG Tel: +44 1980 619812 ----------------------------------------- ************************************************************************ ** The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses, but please re-sweep any attachments before opening or saving. 
HTTP://www.HPA.org.uk ************************************************************************ ** _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Wed May 6 11:23:30 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 6 May 2009 08:23:30 -0700 Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters In-Reply-To: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk> References: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk> Message-ID: <1A4207F8295607498283FE9E93B775B405F7F7C0@EX02.asurite.ad.asu.edu> BEGIN { @PARAMS = qw(a b c d e f g m n o p s u v x); $PROGRAMDIR = '/usr/local/bin'; # Authorize attribute fields foreach my $attr (@PARAMS) { $OK_FIELD{$attr}++; } That is the list of params that Cap3 will accept in the BioPerl module. I'm guessing if you add the y to that list that it might work. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Michael Stubbington > Sent: Wednesday, May 06, 2009 7:39 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters > > Dear all, > > > > I am using the Bio::Tools::Run::Cap3 wrapper to the Cap3 assembly > program. I have some reads that will only assemble if cap3 is > used with > the '-y 150' option. This is fine from the command line but I > can't work > out how to pass this option to the Cap3 factory object in my script. > > > > If I do the following > > > > my $params = "y 150" ; > > my $cap3Factory = Bio::Tools::Run::Cap3->new($params); > > my $assembly = $cap3Factory->run($file); > > > > Then I get an exception as follows: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Unallowed parameter: y ! 
> > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Root/Root.pm:357 > > STACK: Bio::Tools::Run::Cap3::AUTOLOAD > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:116 > > STACK: Bio::Tools::Run::Cap3::new > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:101 > > STACK: /Users/mike/perlScripts/QGenotype.pl:150 > > > > If I don't try to pass any parameters to Cap3 it runs fine but just > fails to assemble the reads that need the -y 150 flag. > > > > I'd very much appreciate any help with this. I'm pretty new > to bioperl, > hope I haven't missed anything obvious! > > > > Thanks in advance, > > > > Mike > > > > -------------------------------------------------------------- > ---------- > ---- > > Mike Stubbington > > Novel and Dangerous Pathogens > > Health Protection Agency > > Centre for Emergency Preparedness and Response > > Porton Down > > Salisbury > > SP4 0JG > > > > Tel: +44 1980 619812 > > > > > > ----------------------------------------- > ************************************************************** > ************ > The information contained in the EMail and any attachments is > confidential and intended solely and for the attention and use of > the named addressee(s). It may not be disclosed to any other person > without the express authority of the HPA, or the intended > recipient, or both. If you are not the intended recipient, you must > not disclose, copy, distribute or retain this message or any part > of it. This footnote also confirms that this EMail has been swept > for computer viruses, but please re-sweep any attachments before > opening or saving. 
HTTP://www.HPA.org.uk
> **************************************************************
> ************
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From hartzell at alerce.com Wed May 6 11:31:59 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 6 May 2009 08:31:59 -0700
Subject: [Bioperl-l] question about in-between overlapping exact location
In-Reply-To:
References: <18944.45755.94431.882844@already.local> <1E9CA287-58C3-48B1-B9AD-3AC9541984C3@bioperl.org> <18945.3949.852961.763626@already.local>
Message-ID: <18945.44399.403097.640951@already.local>

Hilmar Lapp writes:
> This sounds like a bug to me - the location type should be taken into
> account, shouldn't it?
>
> Would you mind submitting this (and a patch if you have one :) to
> bugzilla?

Will do. I can just commit a fix if you'd like, if the behaviour I
expected makes sense to people.

g.

From jonathancrabtree at gmail.com Wed May 6 11:45:32 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Wed, 6 May 2009 11:45:32 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters
In-Reply-To: <1A4207F8295607498283FE9E93B775B405F7F7C0@EX02.asurite.ad.asu.edu>
References: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk> <1A4207F8295607498283FE9E93B775B405F7F7C0@EX02.asurite.ad.asu.edu>
Message-ID: <8e5b8bf80905060845q59e91a2l8b84f4839de5065f@mail.gmail.com>

The new() method of Cap3 expects a list of arguments, not a single
string. So I think you need to do this:

my $cap3Factory = Bio::Tools::Run::Cap3->new('y', '150');

rather than this:

my $cap3Factory = Bio::Tools::Run::Cap3->new('y 150');

Otherwise it will silently ignore the parameter. There are also several
problems with the Cap3 module itself, at least the version shown here:

http://cpansearch.perl.org/src/CJFIELDS/BioPerl-run-1.6.1/Bio/Tools/Run/Cap3.pm

Those problems are:

1. "y" is not in the PARAMS array, as Brian and Kevin have noted

2. $PROGRAMDIR appears to be hard-coded to /usr/local/bin (OK if that's
where your cap3 is installed)

3. The run() method does this:

my $commandstring = $exe . $param_string . " $infilename1";

but at least for the version of cap3 I'm using, you need to put the
$param_string _after_ the $infilename1 for it to work.

Once all of these things were corrected it worked for me and correctly
passed the -y 150 to cap3 when new() was called as shown above.

Jonathan

On Wed, May 6, 2009 at 11:23 AM, Kevin Brown wrote:
> BEGIN {
>
> @PARAMS = qw(a b c d e f g m n o p s u v x);
> $PROGRAMDIR = '/usr/local/bin';
>
> # Authorize attribute fields
> foreach my $attr (@PARAMS) { $OK_FIELD{$attr}++;
>
> }
>
> That is the list of params that Cap3 will accept in the BioPerl module.
> I'm guessing if you add the y to that list that it might work.
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > Michael Stubbington
> > Sent: Wednesday, May 06, 2009 7:39 AM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters
> >
> > Dear all,
> >
> >
> >
> > I am using the Bio::Tools::Run::Cap3 wrapper to the Cap3 assembly
> > program. I have some reads that will only assemble if cap3 is
> > used with
> > the '-y 150' option. This is fine from the command line but I
> > can't work
> > out how to pass this option to the Cap3 factory object in my script.
> >
> >
> >
> > If I do the following
> >
> >
> >
> > my $params = "y 150" ;
> >
> > my $cap3Factory = Bio::Tools::Run::Cap3->new($params);
> >
> > my $assembly = $cap3Factory->run($file);
> >
> >
> >
> > Then I get an exception as follows:
> >
> >
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> >
> > MSG: Unallowed parameter: y !
> > > > STACK: Error::throw > > > > STACK: Bio::Root::Root::throw > > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Root/Root.pm:357 > > > > STACK: Bio::Tools::Run::Cap3::AUTOLOAD > > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:116 > > > > STACK: Bio::Tools::Run::Cap3::new > > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:101 > > > > STACK: /Users/mike/perlScripts/QGenotype.pl:150 > > > > > > > > If I don't try to pass any parameters to Cap3 it runs fine but just > > fails to assemble the reads that need the -y 150 flag. > > > > > > > > I'd very much appreciate any help with this. I'm pretty new > > to bioperl, > > hope I haven't missed anything obvious! > > > > > > > > Thanks in advance, > > > > > > > > Mike > > > > > > > > -------------------------------------------------------------- > > ---------- > > ---- > > > > Mike Stubbington > > > > Novel and Dangerous Pathogens > > > > Health Protection Agency > > > > Centre for Emergency Preparedness and Response > > > > Porton Down > > > > Salisbury > > > > SP4 0JG > > > > > > > > Tel: +44 1980 619812 > > > > > > > > > > > > ----------------------------------------- > > ************************************************************** > > ************ > > The information contained in the EMail and any attachments is > > confidential and intended solely and for the attention and use of > > the named addressee(s). It may not be disclosed to any other person > > without the express authority of the HPA, or the intended > > recipient, or both. If you are not the intended recipient, you must > > not disclose, copy, distribute or retain this message or any part > > of it. This footnote also confirms that this EMail has been swept > > for computer viruses, but please re-sweep any attachments before > > opening or saving. 
HTTP://www.HPA.org.uk > > ************************************************************** > > ************ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hartzell at alerce.com Wed May 6 11:48:14 2009 From: hartzell at alerce.com (George Hartzell) Date: Wed, 6 May 2009 08:48:14 -0700 Subject: [Bioperl-l] question about in-between overlapping exact location In-Reply-To: <0FF287BC-6EFE-498E-81BD-3D1E8DF37353@illinois.edu> References: <18944.45755.94431.882844@already.local> <1E9CA287-58C3-48B1-B9AD-3AC9541984C3@bioperl.org> <18945.3949.852961.763626@already.local> <0FF287BC-6EFE-498E-81BD-3D1E8DF37353@illinois.edu> Message-ID: <18945.45374.875448.871575@already.local> Chris Fields writes: > We should definitely come up with some test cases and expected results > for this; e.g. whether 2^3 should overlap with 1..2 or 3..5, etc (I > would guess, in the latter example, they shouldn't). My expectations agree with your guess. > Also, as these are LocationI-specific, I'm not sure we should make > changes to RangeI methods. Maybe fix LocationI-specific bits within > LocationI and delegate to RangeI::overlaps/etc in simple cases? I think that LocationI would my intended victim. I'll build up some test cases w/ expected output and a patch and see what people think before I commit it. g. 
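
For readers following the in-between question, here is a minimal sketch of one possible set of expected results, written in plain Perl rather than against BioPerl. Everything in it is an assumption for illustration: overlaps_typed is a hypothetical helper (not a LocationI or RangeI method), locations are plain hashes, and the rule that 2^3 overlaps an exact range only when the range covers both flanking bases is just one reading of the guess quoted above, not the behaviour of any released module.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl. Locations are plain hashes:
# type is 'EXACT' (start..end) or 'IN-BETWEEN' (start^end, end == start + 1).
sub overlaps_typed {
    my ($x, $y) = @_;
    my $x_between = $x->{type} eq 'IN-BETWEEN';
    my $y_between = $y->{type} eq 'IN-BETWEEN';

    # Two insertion sites overlap only if they are the same site.
    if ($x_between && $y_between) {
        return $x->{start} == $y->{start} ? 1 : 0;
    }

    # Assumed rule: an insertion site lies inside a range only when the
    # range covers both bases flanking the site.
    if ($x_between || $y_between) {
        my ($site, $range) = $x_between ? ($x, $y) : ($y, $x);
        return ($range->{start} <= $site->{start}
             && $range->{end}   >= $site->{end}) ? 1 : 0;
    }

    # Plain closed-interval overlap for two exact ranges.
    return ($x->{start} <= $y->{end} && $y->{start} <= $x->{end}) ? 1 : 0;
}

my $site = { start => 2, end => 3, type => 'IN-BETWEEN' };   # 2^3
for my $r ([1, 2], [3, 5], [2, 3], [1, 5]) {
    my $range = { start => $r->[0], end => $r->[1], type => 'EXACT' };
    printf "2^3 vs %d..%d: %s\n", @$r,
        overlaps_typed($site, $range) ? 'overlap' : 'no overlap';
}
```

Under this assumed rule, 1..2 and 3..5 do not overlap 2^3, while 2..3 and 1..5 do. A different rule may well be preferred once real test cases are written.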
From cjfields at illinois.edu Wed May 6 12:07:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 6 May 2009 11:07:00 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters
In-Reply-To: <8e5b8bf80905060845q59e91a2l8b84f4839de5065f@mail.gmail.com>
References: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk> <1A4207F8295607498283FE9E93B775B405F7F7C0@EX02.asurite.ad.asu.edu> <8e5b8bf80905060845q59e91a2l8b84f4839de5065f@mail.gmail.com>
Message-ID:

Jonathan,

Have a diff file? We can fix that on main trunk for the next release.

chris

From bosborne11 at verizon.net Wed May 6 11:09:27 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Wed, 06 May 2009 11:09:27 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters
In-Reply-To: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk>
References: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk>
Message-ID: <3AE5C5E1-8551-4F44-92B3-C8DD40752A56@verizon.net>

Michael,

In Bio/Tools/Run/Cap3.pm you see this at the top:

    BEGIN {
        @PARAMS = qw(a b c d e f g m n o p s u v x);
        $PROGRAMDIR = '/usr/local/bin';

        # Authorize attribute fields
        foreach my $attr (@PARAMS) { $OK_FIELD{$attr}++; }
    }

If you add the letter y to @PARAMS does it work?

Brian O.

From cjfields at illinois.edu Wed May 6 12:49:09 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 6 May 2009 11:49:09 -0500
Subject: [Bioperl-l] Moose [was Re: Other object oddities]
In-Reply-To: <18945.47362.738611.609881@already.local>
References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu> <4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> <6B76016F-60E8-4FE5-B083-E64762D79039@illinois.edu> <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> <17AF855A-AC55-4322-BE15-050F5EE3E802@illinois.edu> <18945.47362.738611.609881@already.local>
Message-ID: <5A04CB9C-6B21-4DF6-868F-7B5F9A45C679@illinois.edu>

On May 6, 2009, at 11:21 AM, George Hartzell wrote:

> Chris Fields writes:
>> [...]
>> Startup speed is an issue with Moose but as you noted there are ways
>> to optimize things. And, truthfully, if we can get around the
>> interface issues using roles it might actually help a bit.
>
> Can anyone point to a thread/presentation/paper about Moose best
> practices and/or common workarounds?
>
> Thanks,
>
> g.

Best place is the actual module docs for Moose (including the cookbook
and manual).

http://search.cpan.org/~drolsky/Moose-0.77/lib/Moose/Cookbook.pod
http://search.cpan.org/~drolsky/Moose-0.77/lib/Moose/Manual.pod

For Moose extensions:

http://search.cpan.org/~stevan/Task-Moose-0.01/lib/Task/Moose.pm

Main Moose page:

http://www.iinteractive.com/moose/

I have added these to:

http://www.bioperl.org/wiki/BioMoose

chris

From hartzell at alerce.com Wed May 6 12:21:22 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 6 May 2009 09:21:22 -0700
Subject: [Bioperl-l] Moose [was Re: Other object oddities]
In-Reply-To: <17AF855A-AC55-4322-BE15-050F5EE3E802@illinois.edu>
References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu> <4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> <6B76016F-60E8-4FE5-B083-E64762D79039@illinois.edu> <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> <17AF855A-AC55-4322-BE15-050F5EE3E802@illinois.edu>
Message-ID: <18945.47362.738611.609881@already.local>

Chris Fields writes:
> [...]
> Startup speed is an issue with Moose but as you noted there are ways
> to optimize things. And, truthfully, if we can get around the
> interface issues using roles it might actually help a bit.

Can anyone point to a thread/presentation/paper about Moose best
practices and/or common workarounds?

Thanks,

g.

From maj at fortinbras.us Wed May 6 13:56:03 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 6 May 2009 13:56:03 -0400
Subject: [Bioperl-l] Moose [was Re: Other object oddities]
In-Reply-To: <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org>
References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu> <1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu> <4D0732D667FD4A26B6161660107920E5@NewLife> <31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu> <02EEF4C7F37247C7BBA8EC1068069FC3@NewLife> <38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net> <6B76016F-60E8-4FE5-B083-E64762D79039@illinois.edu> <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org>
Message-ID:

Great discussion-- I have redacted the moose portions to

http://www.bioperl.org/wiki/Talk:BioMoose

and encourage all interested folks to log comments there as well.

cheers
Mark

----- Original Message -----
From: "Chris Mungall"
To: "Chris Fields"
Cc: "BioPerl List" ; "Mark A. Jensen" ; "Kevin Brown"
Sent: Tuesday, May 05, 2009 2:28 PM
Subject: [Bioperl-l] Moose [was Re: Other object oddities]

> On May 5, 2009, at 7:31 AM, Chris Fields wrote:
>
>> On May 5, 2009, at 7:31 AM, Hilmar Lapp wrote:
>>
>>> On May 4, 2009, at 3:01 PM, Mark A. Jensen wrote:
>>>
>>>> Maybe this should be an element of
>>>> the "Align refactor" that perhaps should be an overall
>>>> "Seq refactor".
>>>
>>> Possibly. Most importantly, it'd be great if someone would volunteer to
>>> summarize what's been said here so it won't get lost.
>>
>> Looks like mark's done it.
>>
>>>> Are you saying that the trunk is fair game for api additions
>>>> for this issue?
>>>
>>> There's been talk some (a long, actually) time ago about BioPerl 2.0 that
>>> would start on a clean slate and not be bothered by backwards compatibility
>>> demands. That effort never really took off, but maybe this is also a good
>>> time to ask the question again whether it's better to introduce the API
>>> changes we desire in add/deprecate/remove cycles, or in a more radical
>>> fashion starting on a clean slate.
>>
>> That's what I'm thinking.
>>
>>> The obvious advantage of the former is that we get API improvements sooner,
>>> but making them is possibly more dreadful, discouraging, or not even doable
>>> due to compatibility constraints. The disadvantage of the latter is that it
>>> really needs a committed crew of people to see it through or otherwise all
>>> the nice changes die in some grand but half-finished 2.0 construction site.
>>> I think Chris also had plans to branch off a Perl6 version of BioPerl -
>>> maybe those could be the same efforts?
>>
>> I have been toying around with perl6 for a bit now (Rakudo on Parrot
>> implementation). It's possible an alpha for perl6 will be available by
>> christmas this year; Rakudo is now passing over 11000 spec tests.
>>
>> Just to note, Perl6 is another beast altogether from Perl5. Yes, there is
>> supposed to be a backwards compatibility mode, but no one has implemented
>> that yet, and it likely won't be implemented in the near future. Based on
>> that I'm not sure we could really call a bioperl in perl6 bioperl 2.0, more
>> like bioperl6 1.0, as it would be a complete refactor.
>>
>> As for perl5, it has a nice OO set of modules (Moose) that could be used for
>> refactoring. It implements roles and a few other perl6-ish bits (along with
>> MooseX modules). perl 5.10 also has a few things backported from p6, say(),
>> given/when, state vars, etc. We could require Modern::Perl (perl5.10 with
>> strict/warnings pragmas on) and Moose. I have played around with both and
>> find them quite nice, so I suggest if we were to start a 2.0 effort it
>> should include Moose, and we should push most of the interfaces into roles.
>
> We're playing around with a rewrite of go-perl using Moose:
> http://geneontology.svn.sourceforge.net/viewvc/geneontology/go-moose/OBO/
>
> This is early enough that parts could be scrapped or rewritten. Compatibility
> with bioperl is important.
>
> Speed was an initial concern but apparently there are some moose tricks to
> speed things up
>
> DBIx::Class compatibility is also important. Not sure if there is specific
> support for this yet
>
>> Anyway, I grabbed the git repos for bioperl6 and biomoose (bioperl
>> implemented in Moose) on github. We can set up something there using those
>> namespaces if needed.
>>
>>> I'm not trying to advocate one over the other here; rather, I'd like to
>>> help push on that front that is best able to capture the energy of
>>> volunteers, as that's what it takes in the end.
>>>
>>> -hilmar
>>
>> Depends on where everyone wants to place their efforts. May be less work to
>> port the most important core classes over to Moose, and a simple test
>> implementation will give us an idea on what works Role-wise and what
>> doesn't. From there we could work on p6 variants; that would have to be a
>> separate project altogether. We could also include a few other MooseX
>> modules if it makes life easier.
>>
>> chris

From hlapp at gmx.net Wed May 6 14:40:55 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 6 May 2009 14:40:55 -0400
Subject: [Bioperl-l] question about in-between overlapping exact location
In-Reply-To: <18945.45374.875448.871575@already.local>
References: <18944.45755.94431.882844@already.local> <1E9CA287-58C3-48B1-B9AD-3AC9541984C3@bioperl.org> <18945.3949.852961.763626@already.local> <0FF287BC-6EFE-498E-81BD-3D1E8DF37353@illinois.edu> <18945.45374.875448.871575@already.local>
Message-ID:

On May 6, 2009, at 11:48 AM, George Hartzell wrote:

> I think that LocationI would be my intended victim. I'll build up some
> test cases w/ expected output and a patch and see what people think
> before I commit it.

That'd be great - forgot that you can commit away already!

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From scott at scottcain.net Wed May 6 15:37:52 2009
From: scott at scottcain.net (Scott Cain)
Date: Wed, 6 May 2009 15:37:52 -0400
Subject: [Bioperl-l] Blasting 100kb against dbEST?
Message-ID: <4536f7700905061237r50bd7c7cqe64d9eecfebab035@mail.gmail.com>

Hi all,

I'm working on a project that needs to BLAST 100kb genomic fragments
against large DBs like dbEST. Now, 100kb is a big query, and I was
hoping that there might be a standard way to break this apart,
parallelize the BLAST and then reassemble/collate the results. Is
there a standard way to do that? That is, the first two things are
easy to do, but putting it all back together seems fraught with traps.
It's those traps I'm looking for.

Thanks,
Scott

--
------------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From jonathancrabtree at gmail.com Wed May 6 15:34:07 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Wed, 6 May 2009 15:34:07 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters
In-Reply-To:
References: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk> <1A4207F8295607498283FE9E93B775B405F7F7C0@EX02.asurite.ad.asu.edu> <8e5b8bf80905060845q59e91a2l8b84f4839de5065f@mail.gmail.com>
Message-ID: <8e5b8bf80905061234v2c980235hb2d8f9ac38edb02d@mail.gmail.com>

Chris-

It looks like Brian already added the 'y' option and fixed one of the typos
in the SYNOPSIS, so here's a suggested diff based on the SVN head as of a
few minutes ago. It includes the following changes:

1. Removed reference to BLAST in the comments.
2. Modified SYNOPSIS to make intended new() usage clearer.
3. Added the following @PARAMS: h i j k r t w z (to match those in the
   12/21/07 version of cap3)
4. Calling $factory->program_dir('/some/path') now changes the default
   PROGRAMDIR (/usr/local/bin)
5. Bug fix, at least for post-2005 cap3 versions: changed run() method to
   pass CAP3 options _after_ the filename.
6. Throw an (informative) exception if the executable couldn't be found by
   WrapperBase::executable.
7. Changed comments to use "CAP3", not "Cap3" or "cap3" as the name of the
   software package. The Perl module is still "Cap3".

Item 4 is probably the only change that might be controversial. It seems
that most of the Bio::Tools::Run wrappers determine the directory in which
the program executable resides by checking an environment variable. Cap3.pm
stands out by hard-coding it. program_dir() is a class method but I've
changed it to allow it to be called _either_ as a class method (in which
case it returns the default $PROGRAMDIR) _or_ as an object method (in which
case it returns or sets an internal copy of the program directory, as
illustrated in the new SYNOPSIS.) If you want to change the class default
you have to modify $PROGRAMDIR directly.

I also noticed that if cap3 _isn't_ in the default $PROGRAMDIR the error
message is completely unhelpful, so I've added a new throw() statement for
this case.

Finally, it doesn't seem that there's a test file for Cap3. If I have some
free time later I'll look into adding one.

Jonathan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Cap3.pm.diff
Type: text/x-diff
Size: 1724 bytes
Desc: not available
URL:

From len.zaifman at sickkids.ca Wed May 6 15:59:19 2009
From: len.zaifman at sickkids.ca (len.zaifman at sickkids.ca)
Date: Wed, 6 May 2009 15:59:19 -0400
Subject: [Bioperl-l] Installing bioperl without ftp
Message-ID:

Due to institutional requirements we cannot use ftp (port 21) to obtain the
pre-requisite packages using the easy build method. Is there a way to do
this using https or at least http? Or better yet scp/sftp?

Thanks.
Sent by a BlackBerry device From scott at scottcain.net Wed May 6 16:29:08 2009 From: scott at scottcain.net (Scott Cain) Date: Wed, 6 May 2009 16:29:08 -0400 Subject: [Bioperl-l] Installing bioperl without ftp In-Reply-To: References: Message-ID: <536f21b00905061329y7464db27o685bb61433d3c29e@mail.gmail.com> Hi Len, When you configure cpan, you can specify to use http urls instead of ftp urls. If you've already configured cpan, and need to redo it, enter the cpan shell and type "o conf init" and look for http urls for mirrors. If there aren't any in Canada, look in the US--there are about 10. Scott On Wed, May 6, 2009 at 3:59 PM, wrote: > > Due to institutional requirements we cannot use ftp (port 21) to obtain the > pre-requisite packages using the easy build method. Is there a way to do > this using https or at least http? Or better yet scp/sftp? > > Thanks. > Sent by a BlackBerry device > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Wed May 6 16:22:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 6 May 2009 15:22:47 -0500 Subject: [Bioperl-l] Installing bioperl without ftp In-Reply-To: References: Message-ID: <0C6EC7DB-15BD-44B7-8330-177551749033@illinois.edu> Do you mean via CPAN or from the bioperl.org site? If the latter, you can use http: http://bioperl.org/DIST/ chris On May 6, 2009, at 2:59 PM, len.zaifman at sickkids.ca wrote: > Due to institutional requirements we cannot use ftp (port 21) to > obtain the > pre-requisite packages using the easy build method. Is there a way > to do > this using https or at least http? Or better yet scp/sftp? > > Thanks. 
> Sent by a BlackBerry device > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki.lehvaslaiho at gmail.com Wed May 6 16:47:56 2009 From: heikki.lehvaslaiho at gmail.com (Heikki Lehvaslaiho) Date: Wed, 6 May 2009 22:47:56 +0200 Subject: [Bioperl-l] Installing bioperl without ftp In-Reply-To: References: Message-ID: Len, Install bioperl-live and other repositories directly from the SVN repository. http://www.bioperl.org/wiki/Using_Subversion SVN uses ssh, so that should work for you. -Heikki 2009/5/6 : > > Due to institutional requirements we cannot use ftp (port 21) to obtain the > pre-requisite packages using the easy build method. Is there a way to do > this using https or at least http? Or better yet scp/sftp? > > Thanks. > Sent by a BlackBerry device > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- -Heikki Heikki Lehvaslaiho - skype:heikki_lehvaslaiho cell: +27 (0)714328090 Sent from Claremont, WC, South Africa From cjfields at illinois.edu Wed May 6 16:49:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 6 May 2009 15:49:46 -0500 Subject: [Bioperl-l] Blasting 100kb against dbEST? In-Reply-To: <4536f7700905061237r50bd7c7cqe64d9eecfebab035@mail.gmail.com> References: <4536f7700905061237r50bd7c7cqe64d9eecfebab035@mail.gmail.com> Message-ID: <408B0EA1-5B3A-49CA-B3D7-459DD7F7BF8D@illinois.edu> I have locally run ~100kb fragments before (BLASTN) w/o problems off my first-gen MacBook, but this was against a small database. If you need to iterate through the sequence in chunks you can specify start/ stop with -L, so the hits/HSPs will be mapped accordingly (instead of starting from 1). 
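If you do split the query yourself, the bookkeeping might look roughly like the sketch below. This is only an illustration under stated assumptions: the 20 kb window and 2 kb overlap are arbitrary example values, not recommendations from this thread, and the helper names chunk_coords and to_query_coord are made up. Each chunk remembers its 1-based start on the full query, so a position reported relative to a chunk can be shifted back onto the original coordinates.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, end] pairs (1-based, inclusive)
# covering a query of $length bases with windows of $window bases,
# each overlapping its neighbour by $overlap bases.
sub chunk_coords {
    my ($length, $window, $overlap) = @_;
    die "overlap must be smaller than window\n" if $overlap >= $window;
    my @chunks;
    my $start = 1;
    while (1) {
        my $end = $start + $window - 1;
        $end = $length if $end > $length;    # clip the last window
        push @chunks, [ $start, $end ];
        last if $end == $length;
        $start = $end - $overlap + 1;        # step back by the overlap
    }
    return @chunks;
}

# Hypothetical helper: shift a 1-based position reported relative to a
# chunk back onto the full query.
sub to_query_coord {
    my ($chunk_start, $pos_in_chunk) = @_;
    return $chunk_start + $pos_in_chunk - 1;
}

# Example values only: a 100 kb query, 20 kb windows, 2 kb overlap.
my @chunks = chunk_coords(100_000, 20_000, 2_000);
printf "%d..%d\n", @$_ for @chunks;
```

Note that a hit lying inside an overlap is reported by both neighbouring chunks and needs de-duplicating, and a hit longer than the overlap can still be cut in two at a window boundary.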
Also, mpiBLAST appears to segment queries: http://www.mpiblast.org/ chris On May 6, 2009, at 2:37 PM, Scott Cain wrote: > Hi all, > > I'm working on a project that needs to BLAST 100kb genomic fragments > against large DBs like dbEST. Now, 100kb is a big query, and I was > hoping that there might be a standard way to break this apart, > parallelize the BLAST and then reassembly/collate the results. Is > there a standard way to do that? That is, the first two things are > easy to do, but putting it all back together seems fraught with traps. > Its those traps I'm looking for. > > Thanks, > Scott > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at > scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Wed May 6 17:05:06 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Wed, 6 May 2009 23:05:06 +0200 Subject: [Bioperl-l] Installing bioperl without ftp In-Reply-To: <536f21b00905061329y7464db27o685bb61433d3c29e@mail.gmail.com> References: <536f21b00905061329y7464db27o685bb61433d3c29e@mail.gmail.com> Message-ID: <628aabb70905061405w357a80ela3231c14aae973b3@mail.gmail.com> Hey Len, In addition to what Scott said, it's possible to get BioPerl via http directly off the website. See http://www.bioperl.org/wiki/Getting_BioPerl for the URLs and other details. 
Dave From cjfields at illinois.edu Wed May 6 18:41:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 6 May 2009 17:41:48 -0500 Subject: [Bioperl-l] Moose [was Re: Other object oddities] In-Reply-To: References: <1A4207F8295607498283FE9E93B775B405F1257B@EX02.asurite.ad.asu.edu><1A4207F8295607498283FE9E93B775B405F1286C@EX02.asurite.ad.asu.edu><4D0732D667FD4A26B6161660107920E5@NewLife><31FC08BB-1AF2-4064-8F7F-273517ECBE81@illinois.edu><02EEF4C7F37247C7BBA8EC1068069FC3@NewLife><38483E75-E05A-4A3D-B057-28B7C928ADC6@gmx.net><6B76016F-60E8-4FE5-B083-E64762D79039@illinois.edu> <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> Message-ID: As a final bit: if we go the Moose route, we should be very careful about which MooseX modules we want. I don't think we want to expand the dependency tree. For instance, I am attempting to install one possible module (MooseX::Declare) and the dependency tree was ginormous and included modules only needed for installation. chris On May 6, 2009, at 12:56 PM, Mark A. Jensen wrote: > Great discussion-- I have redacted the moose portions to http://www.bioperl.org/wiki/Talk:BioMoose > and encourage all interested folks to log comments there as well. > cheers Mark > ----- Original Message ----- From: "Chris Mungall" > > To: "Chris Fields" > Cc: "BioPerl List" ; "Mark A. Jensen" >; "Kevin Brown" > Sent: Tuesday, May 05, 2009 2:28 PM > Subject: [Bioperl-l] Moose [was Re: Other object oddities] > > >> >> On May 5, 2009, at 7:31 AM, Chris Fields wrote: >> >>> On May 5, 2009, at 7:31 AM, Hilmar Lapp wrote: >>> >>>> >>>> On May 4, 2009, at 3:01 PM, Mark A. Jensen wrote: >>>> >>>>> Maybe this should be an element of >>>>> the "Align refactor" that perhaps should be an overall >>>>> "Seq refactor". >>>> >>>> Possibly. Most importantly, it'd be great if someone would >>>> volunteer to summarize what's been said here so it won't get lost. >>> >>> Looks like mark's done it. 
>>> >>>>> Are you saying that the trunk is fair game for api additions >>>>> for this issue? >>>> >>>> There's been talk some (a long, actually) time ago about BioPerl >>>> 2.0 that would start on a clean slate and not be bothered by >>>> backwards compatibility demands. That effort never really took >>>> off, but maybe this is also a good time to ask the question >>>> again whether it's better to introduce the API changes we desire >>>> in add/ deprecate/remove cycles, or in a more radical fashion >>>> starting on a clean slate. >>> >>> That's what I'm thinking. >>> >>>> The obvious advantage of the former is that we get API >>>> improvements sooner, but making them is possibly more dreadful, >>>> discouraging, or not even doable due to compatibility >>>> constraints. The disadvantage of the latter is that it really >>>> needs a committed crew of people to see it through or otherwise >>>> all the nice changes die in some grand but half-finished 2.0 >>>> construction site. I think Chris also had plans to branch off a >>>> Perl6 version of BioPerl - maybe those could be the same efforts? >>> >>> I have been toying around with perl6 for a bit now (Rakudo on >>> Parrot implementation). It's possible an alpha for perl6 will be >>> available by christmas this year; Rakudo is now passing over >>> 11000 spec tests. >>> >>> Just to note, Perl6 is another beast altogether from Perl5. Yes, >>> there is supposed to be a backwards compatibility mode, but no >>> one has implemented that yet, and it likely won't be implemented >>> in the near future. Based on that I'm not sure we could really >>> call a bioperl in perl6 bioperl 2.0, more like bioperl6 1.0, as >>> it would be a complete refactor. >>> >>> As for perl5, it has a nice OO set of modules (Moose) that could >>> be used for refactoring. It implements roles and a few other >>> perl6-ish bits (along with MooseX modules). perl 5.10 also has a >>> few things backported from p6, say(), given/when, state vars, >>> etc. 
We could require Modern::Perl (perl5.10 with strict/ >>> warnings pragmas on) and Moose. I have played around with both >>> and find them quite nice, so I suggest if we were to start a 2.0 >>> effort it should include Moose, and we should push most of the >>> interfaces into roles. >> >> We're playing around with a rewrite of go-perl using Moose: >> http://geneontology.svn.sourceforge.net/viewvc/geneontology/go-moose/OBO/ >> >> This is early enough that parts could be scrapped or rewritten. >> Compatibility with bioperl is important. >> >> Speed was an initial concern but apparently there are some moose >> tricks to speed things up >> >> DBIx::Class compatibility is also important. Not sure if there is >> specific support for this yet >> >> >>> >>> Anyway, I grabbed the git repos for bioperl6 and biomoose (bioperl >>> implemented in Moose) on github. We can set up something there >>> using those namespaces if needed. >>> >>>> I'm not trying to advocate one over the other here; rather, I'd >>>> like to help push on that front that is best able to capture the >>>> energy of volunteers, as that's what it takes in the end. >>>> >>>> -hilmar >>> >>> Depends on where everyone wants to place their efforts. May be >>> less work to port the most important core classes over to Moose, >>> and a simple test implementation will give us an idea on what >>> works Role- wise and what doesn't. From there we could work on p6 >>> variants; that would have to be a separate project altogether. >>> We could also include a few other MooseX modules if it makes life >>> easier. 
>>> >>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From alden.huang at gmail.com Wed May 6 19:30:37 2009 From: alden.huang at gmail.com (Alden Huang) Date: Wed, 6 May 2009 16:30:37 -0700 Subject: [Bioperl-l] retrieving gene sequence given protein id In-Reply-To: <277937.66024.qm@web45507.mail.sp1.yahoo.com> References: <277937.66024.qm@web45507.mail.sp1.yahoo.com> Message-ID: <9e408d720905061630o7f9348f8o6e3285eb53e6a09e@mail.gmail.com> I am pretty sure you can just do that on the NCBI website through "Batch Entrez." Just select like gene or nucleotide for your database. If I am wrong, sorry. On Wed, May 6, 2009 at 4:21 AM, Daniel Webb wrote: > > Hi all, > > is there a script or a module with which I could, given the list of protein gi or accessions, retrieve corresponding genes from Entrez Gene/GenBank? What I would like is sequence of the whole gene in fasta format, with all the introns and UTRs. 
> I would be grateful for any help > > Dan > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Wed May 6 19:37:30 2009 From: jason at bioperl.org (Jason Stajich) Date: Wed, 6 May 2009 16:37:30 -0700 Subject: [Bioperl-l] retrieving gene sequence given protein id In-Reply-To: <9e408d720905061630o7f9348f8o6e3285eb53e6a09e@mail.gmail.com> References: <277937.66024.qm@web45507.mail.sp1.yahoo.com> <9e408d720905061630o7f9348f8o6e3285eb53e6a09e@mail.gmail.com> Message-ID: <07585EBF-561D-47EF-864B-CD8982EE716C@bioperl.org> or the bioperl modules and the related Entrez query module: Bio::DB::GenBank Bio::DB::GenPept covered in the HOWTOs -jason On May 6, 2009, at 4:30 PM, Alden Huang wrote: > I am pretty sure you can just do that on the NCBI website through > "Batch Entrez." Just select like gene or nucleotide for your database. > If I am wrong, sorry. > > On Wed, May 6, 2009 at 4:21 AM, Daniel Webb > wrote: >> >> Hi all, >> >> is there a script or a module with which I could, given the list of >> protein gi or accessions, retrieve corresponding genes from Entrez >> Gene/GenBank? What I would like is sequence of the whole gene in >> fasta format, with all the introns and UTRs. 
>> I would be grateful for any help >> >> Dan >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From Russell.Smithies at agresearch.co.nz Wed May 6 19:56:59 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 7 May 2009 11:56:59 +1200 Subject: [Bioperl-l] retrieving gene sequence given protein id In-Reply-To: <277937.66024.qm@web45507.mail.sp1.yahoo.com> References: <277937.66024.qm@web45507.mail.sp1.yahoo.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493CE8D07@exchsth.agresearch.co.nz> Hi Daniel, You should be able to do it with Bio::DB::EUtilities: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook Or use wget and manually link from protein (gi2088631) to gene: http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Link&LinkName=protein_gene&from_uid=2088631 then link gene to nucleotide: http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Link&LinkName=gene_nuccore&from_uid=282375 Or use NCBI eUtils: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=coursework&part=eutils Or build a pipeline: http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/epipe.html NCBI is like Perl, there's always more than one way to do it :-) --Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Daniel Webb > Sent: Wednesday, 6 May 2009 11:22 p.m.
> To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] retrieving gene sequence given protein id > > > Hi all, > > is there a script or a module with which I could, given the list of protein gi > or accessions, retrieve corresponding genes from Entrez Gene/GenBank? What I > would like is sequence of the whole gene in fasta format, with all the introns > and UTRs. > I would be grateful for any help > > Dan > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From brianli.cas at gmail.com Wed May 6 21:10:46 2009 From: brianli.cas at gmail.com (brian li) Date: Thu, 7 May 2009 09:10:46 +0800 Subject: [Bioperl-l] Asking for advice on full EMBL extraction Message-ID: Dear all, Recently I have to extract all EMBL entries and put them into relational database so that our report generation tool can access the data. I used Bio::SeqIO::embl to get entries one by one, but can not move on when dealing with big million-line entries. Segmentation Fault popped. And as currently SeqBuilder is not integrated into Bio::SeqIO::embl, SeqBuilder->add_unwanted_slot can't help (http://bugzilla.open-bio.org/show_bug.cgi?id=2823). Is there another way to get entires one by one with BioPerl? 
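One workaround that does not depend on Bio::SeqIO at all is to split the flat file on the EMBL record terminator (a line containing only //) and handle one raw entry at a time. A minimal sketch, assuming every entry ends with such a line; next_embl_record is a made-up helper name, the two demo entries are invented, and the ID-line regex is illustrative only. It does not explain or fix the segfault, but it can show which entry triggers it, and each raw record could then be parsed on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read the next raw EMBL entry from an open
# filehandle. Entries are assumed to end with a line holding only "//".
sub next_embl_record {
    my ($fh) = @_;
    local $/ = "\n//\n";          # input record separator = terminator line
    my $record = <$fh>;
    return unless defined $record;
    return if $record !~ /\S/;    # ignore trailing blank lines
    return $record;
}

# Demo on an in-memory file holding two tiny, made-up entries.
my $demo = "ID   X1; SV 1; linear\nSQ   Sequence 4 BP;\n     acgt\n//\n"
         . "ID   X2; SV 1; linear\nSQ   Sequence 4 BP;\n     ttga\n//\n";
open my $fh, '<', \$demo or die "cannot open in-memory file: $!";

my $index = 0;
while (my $record = next_embl_record($fh)) {
    $index++;
    my ($id) = $record =~ /^ID\s+(\w+)/m;    # first word after "ID"
    printf "entry %d: %s (%d bytes)\n", $index, $id, length $record;
}
close $fh;
```

Because only one record is held in memory at a time, the size of the release file itself should not matter; a single very large entry still has to fit in memory.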
Brian From Russell.Smithies at agresearch.co.nz Wed May 6 23:32:32 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 7 May 2009 15:32:32 +1200 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: References: Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> Hi Brian, I hate to say it but it worked OK for me using rel_ann_mus_01_r99.dat.gz and the simple example Bio::SeqIO code from bugzilla. It's not using more than 1GB memory on our server and doesn't segfault. Send me your example code and I'll give it a go if you like. Russell Smithies Bioinformatics Applications Developer T +64 3 489 9085 E russell.smithies at agresearch.co.nz Invermay Research Centre Puddle Alley, Mosgiel, New Zealand T +64 3 489 3809 F +64 3 489 9174 www.agresearch.co.nz > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of brian li > Sent: Thursday, 7 May 2009 1:11 p.m. > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Asking for advice on full EMBL extraction > > Dear all, > > Recently I have to extract all EMBL entries and put them into > relational database so that our report generation tool can access the > data. > > I used Bio::SeqIO::embl to get entries one by one, but can not > move on when dealing with big million-line entries. Segmentation Fault > popped. And as currently SeqBuilder is not integrated into > Bio::SeqIO::embl, SeqBuilder->add_unwanted_slot can't help > (http://bugzilla.open-bio.org/show_bug.cgi?id=2823). > > Is there another way to get entires one by one with BioPerl?
> > Brian > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From brianli.cas at gmail.com Thu May 7 00:50:26 2009 From: brianli.cas at gmail.com (brian li) Date: Thu, 7 May 2009 12:50:26 +0800 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> Message-ID: Dear Russell, My example code is as following. I omit the parse process and these lines give me "Segmentation Fault" too. # Start of code my $seqio = Bio::SeqIO->new(-file => 'rel_ann_mus_01_r99.dat', -format => 'EMBL'); my $index = 1; while (my $seq = $seqio->next_seq) { print "Dealing with entry: $index\n"; $index++; } # End The platform I run this code on: BioPerl 1.6.0 Perl 5.8.8 Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server) I have monitored the memory usage when I run the code above. There is always around 20GB free memory (buffer size counted in) left. So I suppose the segfault can't be explained just by memory shortage. 
Brian On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell wrote: > Hi Brian, > I hate to say it but it worked OK for me using rel_ann_mus_01_r99.dat.gz and simple example Bio::SeqIO code from bugzilla > It's not using more than 1GB memory on our server and doesn't segfault. > > Send me your example code and I'll give it a go if you like. > > > Russell Smithies > > Bioinformatics Applications Developer > T +64 3 489 9085 > E russell.smithies at agresearch.co.nz > > Invermay Research Centre > Puddle Alley, > Mosgiel, > New Zealand > T +64 3 489 3809 > F +64 3 489 9174 > www.agresearch.co.nz > > From Russell.Smithies at agresearch.co.nz Thu May 7 01:01:13 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 7 May 2009 17:01:13 +1200 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> Sadly, that's the same code as I ran, but I had a Data::Dump in the middle. Versions of Perl and BioPerl are the same. We're running RHEL 5 (kernel 2.6.18-92.1.18.el5) with 16GB RAM. If you get a full script running on a smaller dataset, I could probably run it on the bigger stuff and give you back tab-separated (or is that tab\tseparated ?) data for loading into your db. --Russell > -----Original Message----- > From: brian li [mailto:brianli.cas at gmail.com] > Sent: Thursday, 7 May 2009 4:50 p.m. > To: Smithies, Russell > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction > > Dear Russell, > > My example code is as following. I omit the parse process and these > lines give me "Segmentation Fault" too.
> > # Start of code > my $seqio = Bio::SeqIO->new(-file => 'rel_ann_mus_01_r99.dat', > -format => 'EMBL'); > my $index = 1; > while (my $seq = $seqio->next_seq) > { > print "Dealing with entry: $index\n"; > $index++; > } > # End > > The platform I run this code on: > BioPerl 1.6.0 > Perl 5.8.8 > Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server) > > I have monitored the memory usage when I run the code above. There is > always around 20GB free memory (buffer size counted in) left. So I > suppose the segfault can't be explained just by memory shortage. > > Brian > > > On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell > wrote: > > Hi Brian, > > I hate to say it but it worked OK for me using rel_ann_mus_01_r99.dat.gz and > simple example Bio::SeqIO code from bugzilla > > It's not using more than 1GB memory on our server and doesn't segfault. > > > > Send me your example code and I'll give it a go if you like. > > > > > > Russell Smithies > > > > Bioinformatics Applications Developer > > T +64 3 489 9085 > > E russell.smithies at agresearch.co.nz > > > > Invermay Research Centre > > Puddle Alley, > > Mosgiel, > > New Zealand > > T +64 3 489 3809 > > F +64 3 489 9174 > > www.agresearch.co.nz > > > > ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
======================================================================= From brianli.cas at gmail.com Thu May 7 01:32:56 2009 From: brianli.cas at gmail.com (brian li) Date: Thu, 7 May 2009 13:32:56 +0800 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> Message-ID: Thank you very much for your offer. The director of our lab wants me to do the extraction every time a new release of EMBL is published. I can't push the task to you every time. I can offer more information about the server I run my script on if needed. -Brian On Thu, May 7, 2009 at 1:01 PM, Smithies, Russell wrote: > Sadly, that's the same code as I ran but I had a Data::Dump in the middle. > Versions of Perl and BioPerl are the same. > We're running RHEL 5 (kernel 2.6.18-92.1.18.el5) with 16GB RAM > > If you get a full script running on a smaller dataset, I could probably run it on the bigger stuff and give you back tab-separated (or is that tab\tseparated ?) data for loading into your db. > > --Russell > >> -----Original Message----- >> From: brian li [mailto:brianli.cas at gmail.com] >> Sent: Thursday, 7 May 2009 4:50 p.m. >> To: Smithies, Russell >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction >> >> Dear Russell, >> >> My example code is as following. I omit the parse process and these >> lines give me "Segmentation Fault" too. >> >> # Start of code >> my $seqio = Bio::SeqIO->new(-file => 'rel_ann_mus_01_r99.dat', >> -format => 'EMBL'); >> my $index = 1; >> while (my $seq = $seqio->next_seq) >> { >> print "Dealing with entry: $index\n"; >>
$index++; >> } >> # End >> >> The platform I run this code on: >> BioPerl 1.6.0 >> Perl 5.8.8 >> Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server) >> >> I have monitored the memory usage when I run the code above. There is >> always around 20GB free memory (buffer size counted in) left. So I >> suppose the segfault can't be explained just by memory shortage. >> >> Brian >> >> >> On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell >> wrote: >> > Hi Brian, >> > I hate to say it but it worked OK for me using rel_ann_mus_01_r99.dat.gz and >> simple example Bio::SeqIO code from bugzilla >> > It's not using more than 1GB memory on our server and doesn't segfault. >> > >> > Send me your example code and I'll give it a go if you like. >> > >> > >> > Russell Smithies >> > >> > Bioinformatics Applications Developer >> > T +64 3 489 9085 >> > E russell.smithies at agresearch.co.nz >> > >> > Invermay Research Centre >> > Puddle Alley, >> > Mosgiel, >> > New Zealand >> > T +64 3 489 3809 >> > F +64 3 489 9174 >> > www.agresearch.co.nz >> > >> > > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately.
> ======================================================================= > From Michael.Stubbington at hpa.org.uk Thu May 7 03:53:29 2009 From: Michael.Stubbington at hpa.org.uk (Michael Stubbington) Date: Thu, 7 May 2009 08:53:29 +0100 Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters In-Reply-To: <8e5b8bf80905060845q59e91a2l8b84f4839de5065f@mail.gmail.com> References: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk> <1A4207F8295607498283FE9E93B775B405F7F7C0@EX02.asurite.ad.asu.edu> <8e5b8bf80905060845q59e91a2l8b84f4839de5065f@mail.gmail.com> Message-ID: <335635A922FA2B43B35B6ADD7929CC59017B138B@porhpaexc001.HPA.org.uk> Jonathan, Thanks a lot for this advice. It now all works for me. Strangely, my cap3 installation is not in /usr/local/bin, but everything works fine without my having to change $PROGRAMDIR in Cap3.pm. Thanks to everyone else involved in this thread for their efforts in improving Bio::Tools::Run::Cap3. Best wishes, Mike ________________________________ From: Jonathan Crabtree [mailto:jonathancrabtree at gmail.com] Sent: 06 May 2009 16:46 To: Kevin Brown Cc: Michael Stubbington; bioperl-l at lists.open-bio.org Subject: Re: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters The "new" method of Cap3 expects a list of arguments, not a single string. So I think you need to do this: my $cap3Factory = Bio::Tools::Run::Cap3->new('y', '150'); rather than this: my $cap3Factory = Bio::Tools::Run::Cap3->new('y 150'); Otherwise it will silently ignore the parameter. There are also several problems with the Cap3 module itself, at least the version shown here: http://cpansearch.perl.org/src/CJFIELDS/BioPerl-run-1.6.1/Bio/Tools/Run/Cap3.pm Those problems are: 1. "y" is not in the PARAMS array, as Brian and Kevin have noted 2. $PROGRAMDIR appears to be hard-coded to /usr/local/bin (OK if that's where your cap3 is installed) 3. The run() method does this: my $commandstring = $exe . $param_string .
" $infilename1"; but at least for the version of cap3 I'm using, you need to put the $param_string _after_ the $infilename1 for it to work. Once all these things are corrected it worked for me and correctly passed the -y 150 to cap3 when new() was called as shown above. Jonathan On Wed, May 6, 2009 at 11:23 AM, Kevin Brown wrote: BEGIN { @PARAMS = qw(a b c d e f g m n o p s u v x); $PROGRAMDIR = '/usr/local/bin'; # Authorize attribute fields foreach my $attr (@PARAMS) { $OK_FIELD{$attr}++; } That is the list of params that Cap3 will accept in the BioPerl module. I'm guessing if you add the y to that list that it might work. > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Michael Stubbington > Sent: Wednesday, May 06, 2009 7:39 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters > > Dear all, > > > > I am using the Bio::Tools::Run::Cap3 wrapper to the Cap3 assembly > program. I have some reads that will only assemble if cap3 is > used with > the '-y 150' option. This is fine from the command line but I > can't work > out how to pass this option to the Cap3 factory object in my script. > > > > If I do the following > > > > my $params = "y 150" ; > > my $cap3Factory = Bio::Tools::Run::Cap3->new($params); > > my $assembly = $cap3Factory->run($file); > > > > Then I get an exception as follows: > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Unallowed parameter: y ! 
> > STACK: Error::throw > > STACK: Bio::Root::Root::throw > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Root/Root.pm:357 > > STACK: Bio::Tools::Run::Cap3::AUTOLOAD > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:116 > > STACK: Bio::Tools::Run::Cap3::new > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:101 > > STACK: /Users/mike/perlScripts/QGenotype.pl:150 > > > > If I don't try to pass any parameters to Cap3 it runs fine but just > fails to assemble the reads that need the -y 150 flag. > > > > I'd very much appreciate any help with this. I'm pretty new > to bioperl, > hope I haven't missed anything obvious! > > > > Thanks in advance, > > > > Mike > > > > -------------------------------------------------------------- > ---------- > ---- > > Mike Stubbington > > Novel and Dangerous Pathogens > > Health Protection Agency > > Centre for Emergency Preparedness and Response > > Porton Down > > Salisbury > > SP4 0JG > > > > Tel: +44 1980 619812 > > > > > > ----------------------------------------- > ************************************************************** > ************ > The information contained in the EMail and any attachments is > confidential and intended solely and for the attention and use of > the named addressee(s). It may not be disclosed to any other person > without the express authority of the HPA, or the intended > recipient, or both. If you are not the intended recipient, you must > not disclose, copy, distribute or retain this message or any part > of it. This footnote also confirms that this EMail has been swept > for computer viruses, but please re-sweep any attachments before > opening or saving. 
HTTP://www.HPA.org.uk > ************************************************************** > ************ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------- ************************************************************************** The information contained in the EMail and any attachments is confidential and intended solely and for the attention and use of the named addressee(s). It may not be disclosed to any other person without the express authority of the HPA, or the intended recipient, or both. If you are not the intended recipient, you must not disclose, copy, distribute or retain this message or any part of it. This footnote also confirms that this EMail has been swept for computer viruses, but please re-sweep any attachments before opening or saving. HTTP://www.HPA.org.uk ************************************************************************** From webb.daniel at yahoo.com Thu May 7 04:28:44 2009 From: webb.daniel at yahoo.com (Daniel Webb) Date: Thu, 7 May 2009 01:28:44 -0700 (PDT) Subject: [Bioperl-l] retrieving gene sequence given protein id Message-ID: <36212.82053.qm@web45511.mail.sp1.yahoo.com> Awesome! 
Thank you all for replying :)

--- On Wed, 5/6/09, Smithies, Russell wrote:

From: Smithies, Russell
Subject: RE: [Bioperl-l] retrieving gene sequence given protein id
To: "'Daniel Webb'" , "'bioperl-l at lists.open-bio.org'"
Date: Wednesday, May 6, 2009, 11:56 PM

Hi Daniel,

You should be able to do it with Bio::DB::EUtilities
http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

Or use wget and manually link from protein (gi2088631) to gene:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Link&LinkName=protein_gene&from_uid=2088631
then link gene to nucleotide:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Link&LinkName=gene_nuccore&from_uid=282375

Or use NCBI eUtils
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=coursework&part=eutils

Or build a pipeline:
http://www.ncbi.nlm.nih.gov/Class/wheeler/eutils/epipe.html

NCBI is like Perl, there's always more than one way to do it :-)

--Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
T +64 3 489 3809   F +64 3 489 9174   www.agresearch.co.nz

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Daniel Webb
> Sent: Wednesday, 6 May 2009 11:22 p.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] retrieving gene sequence given protein id
>
>
> Hi all,
>
> is there a script or a module with which I could, given the list of protein gi
> or accessions, retrieve corresponding genes from Entrez Gene/GenBank? What I
> would like is sequence of the whole gene in fasta format, with all the introns
> and UTRs.
> I would be grateful for any help > > Dan > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Thu May 7 08:07:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 7 May 2009 07:07:54 -0500 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> Message-ID: <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> I noticed that Russell has 16GB RAM on his setup. Was yours equivalent? chris On May 7, 2009, at 12:32 AM, brian li wrote: > Thank you very much for your offer. > > The director of our lab wants me to do the extraction every time a new > release of EMBL is published. I can't push the task to you every time. > > I can offer more information of the server I run my script on if > needed. > > -Brian > > On Thu, May 7, 2009 at 1:01 PM, Smithies, Russell > wrote: >> Sadly, that's the same code as I ran but I had a Data::Dump in the >> middle. >> Versions of Perl and BioPerl are the same. 
>> We're running RHEL 5 (kernel 2.6.18-92.1.18.el5) with 16GB RAM >> >> If you get a full script running on a smaller dataset, I could >> probably run it on the bigger stuff and give you back tab-separated >> (or is that tab\tseparated ?) data for loading into your db. >> >> --Russell >> >>> -----Original Message----- >>> From: brian li [mailto:brianli.cas at gmail.com] >>> Sent: Thursday, 7 May 2009 4:50 p.m. >>> To: Smithies, Russell >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction >>> >>> Dear Russell, >>> >>> My example code is as following. I omit the parse process and these >>> lines give me "Segmentation Fault" too. >>> >>> # Start of code >>> my $seqio = Bio::SeqIO->new(-file => 'rel_ann_mus_01_r99.dat', >>> -format => 'EMBL'); >>> my $index = 1; >>> while (my $seq = $seqio->next_seq) >>> { >>> print "Dealing with entry: $index\n"; >>> $index++; >>> } >>> # End >>> >>> The platform I run this code on: >>> BioPerl 1.6.0 >>> Perl 5.8.8 >>> Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server) >>> >>> I have monitored the memory usage when I run the code above. There >>> is >>> always around 20GB free memory (buffer size counted in) left. So I >>> suppose the segfault can't be explained just by memory shortage. >>> >>> Brian >>> >>> >>> On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell >>> wrote: >>>> Hi Brian, >>>> I hate to say it but it worked OK for me using >>>> rel_ann_mus_01_r99.dat.gz and >>> simple example Bio::SeqIO code from bugzilla >>>> It's not using more than 1GB memory on our server and doesn't >>>> segfault. >>>> >>>> Send me your example code and I'll give it a go if you like. 
>>>> >>>> >>>> Russell Smithies >>>> >>>> Bioinformatics Applications Developer >>>> T +64 3 489 9085 >>>> E russell.smithies at agresearch.co.nz >>>> >>>> Invermay Research Centre >>>> Puddle Alley, >>>> Mosgiel, >>>> New Zealand >>>> T +64 3 489 3809 >>>> F +64 3 489 9174 >>>> www.agresearch.co.nz >>>> >>>> >> = >> = >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material. Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. >> = >> = >> ===================================================================== >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From brianli.cas at gmail.com Thu May 7 08:59:59 2009 From: brianli.cas at gmail.com (brian li) Date: Thu, 7 May 2009 20:59:59 +0800 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> Message-ID: My server has 32 GB RAM. The os of my server is 64-bit version of Ubuntu Server Edition 8.04 LTS. And I have run my example code on another server with 32-bit version of Ubuntu Server Edition 8.04 and 4 GB RAM. Segfault again. 
-Brian

On Thu, May 7, 2009 at 8:07 PM, Chris Fields wrote:
> I noticed that Russell has 16GB RAM on his setup. Was yours equivalent?
>
> chris
>
> On May 7, 2009, at 12:32 AM, brian li wrote:
>
>> Thank you very much for your offer.
>>
>> The director of our lab wants me to do the extraction every time a new
>> release of EMBL is published. I can't push the task to you every time.
>>
>> I can offer more information of the server I run my script on if needed.
>>
>> -Brian
>>
>> On Thu, May 7, 2009 at 1:01 PM, Smithies, Russell wrote:
>>>
>>> Sadly, that's the same code as I ran but I had a Data::Dump in the
>>> middle.
>>> Versions of Perl and BioPerl are the same.
>>> We're running RHEL 5 (kernel 2.6.18-92.1.18.el5) with 16GB RAM
>>>
>>> If you get a full script running on a smaller dataset, I could probably
>>> run it on the bigger stuff and give you back tab-separated (or is that
>>> tab\tseparated ?) data for loading into your db.
>>>
>>> --Russell
>>>
>>>> -----Original Message-----
>>>> From: brian li [mailto:brianli.cas at gmail.com]
>>>> Sent: Thursday, 7 May 2009 4:50 p.m.
>>>> To: Smithies, Russell
>>>> Cc: bioperl-l at lists.open-bio.org
>>>> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
>>>>
>>>> Dear Russell,
>>>>
>>>> My example code is as following. I omit the parse process and these
>>>> lines give me "Segmentation Fault" too.
>>>>
>>>> # Start of code
>>>> my $seqio = Bio::SeqIO->new(-file => 'rel_ann_mus_01_r99.dat',
>>>>                             -format => 'EMBL');
>>>> my $index = 1;
>>>> while (my $seq = $seqio->next_seq)
>>>> {
>>>>     print "Dealing with entry: $index\n";
>>>>     $index++;
>>>> }
>>>> # End
>>>>
>>>> The platform I run this code on:
>>>> BioPerl 1.6.0
>>>> Perl 5.8.8
>>>> Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server)
>>>>
>>>> I have monitored the memory usage when I run the code above. There is
>>>> always around 20GB free memory (buffer size counted in) left. So I
>>>> suppose the segfault can't be explained just by memory shortage.
>>>>
>>>> Brian
>>>>
>>>>
>>>> On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell wrote:
>>>>>
>>>>> Hi Brian,
>>>>> I hate to say it but it worked OK for me using
>>>>> rel_ann_mus_01_r99.dat.gz and simple example Bio::SeqIO code from bugzilla
>>>>>
>>>>> It's not using more than 1GB memory on our server and doesn't segfault.
>>>>>
>>>>> Send me your example code and I'll give it a go if you like.
>>>>>
>>>>>
>>>>> Russell Smithies
>>>>>
>>>>> Bioinformatics Applications Developer
>>>>> T +64 3 489 9085
>>>>> E russell.smithies at agresearch.co.nz
>>>>>
>>>>> Invermay Research Centre
>>>>> Puddle Alley,
>>>>> Mosgiel,
>>>>> New Zealand
>>>>> T +64 3 489 3809
>>>>> F +64 3 489 9174
>>>>> www.agresearch.co.nz
>>>>>
>>>>>
>>> =======================================================================
>>> Attention: The information contained in this message and/or attachments
>>> from AgResearch Limited is intended only for the persons or entities
>>> to which it is addressed and may contain confidential and/or privileged
>>> material. Any review, retransmission, dissemination or other use of, or
>>> taking of any action in reliance upon, this information by persons or
>>> entities other than the intended recipients is prohibited by AgResearch
>>> Limited. If you have received this message in error, please notify the
>>> sender immediately.
>>> =======================================================================
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From jonathancrabtree at gmail.com Thu May 7 10:20:08 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Thu, 7 May 2009 10:20:08 -0400
Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters
In-Reply-To: <335635A922FA2B43B35B6ADD7929CC59017B138B@porhpaexc001.HPA.org.uk>
References: <335635A922FA2B43B35B6ADD7929CC59017B12E8@porhpaexc001.HPA.org.uk> <1A4207F8295607498283FE9E93B775B405F7F7C0@EX02.asurite.ad.asu.edu> <8e5b8bf80905060845q59e91a2l8b84f4839de5065f@mail.gmail.com> <335635A922FA2B43B35B6ADD7929CC59017B138B@porhpaexc001.HPA.org.uk>
Message-ID: <8e5b8bf80905070720j837b842x32a8b0b3d924c544@mail.gmail.com>

No problem. With respect to the location of cap3, I was a bit quick to pass judgment and didn't look at exactly what WrapperBase does: it first tries the hard-coded directory location from the Cap3 module, and then falls back to Bio::Root::IO::exists_exe, which searches your PATH for the executable, provided that File::Spec can be loaded.

Jonathan

On Thu, May 7, 2009 at 3:53 AM, Michael Stubbington <Michael.Stubbington at hpa.org.uk> wrote:

> Jonathan,
>
> Thanks a lot for this advice. It now all works for me.
>
> Strangely my cap3 installation is not in /usr/local/bin but everything
> works fine without me having to change $PROGRAMDIR in cap3.pm
>
> Thanks to everyone else involved in this thread for their efforts in
> improving Bio::Tools::Run::Cap3.
>
> Best wishes,
>
> Mike
>
> ------------------------------
>
> *From:* Jonathan Crabtree [mailto:jonathancrabtree at gmail.com]
> *Sent:* 06 May 2009 16:46
> *To:* Kevin Brown
> *Cc:* Michael Stubbington; bioperl-l at lists.open-bio.org
> *Subject:* Re: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters
>
> The "new" argument to Cap3 expects an array, not a string. So I think you
> need to do this:
>
> my $cap3Factory = Bio::Tools::Run::Cap3->new('y', '150');
>
> rather than this:
>
> my $cap3Factory = Bio::Tools::Run::Cap3->new('y 150');
>
> Otherwise it will silently ignore the parameter. There are also several
> problems with the Cap3 module itself, at least the version shown here:
>
> http://cpansearch.perl.org/src/CJFIELDS/BioPerl-run-1.6.1/Bio/Tools/Run/Cap3.pm
>
> Those problems are:
>
> 1. "y" is not in the PARAMS array, as Brian and Kevin have noted
> 2. $PROGRAMDIR appears to be hard-coded to /usr/local/bin (OK if that's
> where your cap3 is installed)
> 3. The run() method does this:
>
> my $commandstring = $exe . $param_string . " $infilename1";
>
> but at least for the version of cap3 I'm using, you need to put the
> $param_string _after_ the $infilename1 for it to work. Once all these
> things are corrected it worked for me and correctly passed the -y 150 to
> cap3 when new() was called as shown above.
>
> Jonathan
>
> On Wed, May 6, 2009 at 11:23 AM, Kevin Brown wrote:
>
> BEGIN {
>     @PARAMS = qw(a b c d e f g m n o p s u v x);
>     $PROGRAMDIR = '/usr/local/bin';
>
>     # Authorize attribute fields
>     foreach my $attr (@PARAMS) { $OK_FIELD{$attr}++; }
> }
>
> That is the list of params that Cap3 will accept in the BioPerl module.
> I'm guessing if you add the y to that list that it might work.
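
The parameter discussion above can be made concrete with a small, self-contained sketch of the allow-list idea. This is a hypothetical model and not the actual Bio::Tools::Run::Cap3 source: the helper name build_param_string, the input file name reads.fasta and the command layout are assumptions for illustration only. It shows that a single string such as 'y 150' is not a valid key, that the pair ('y', '150') is accepted once y is in the list, and that the switches can be placed after the input file, as Jonathan reports is needed for his cap3 build.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the allow-list quoted above. This is NOT the
# real Bio::Tools::Run::Cap3 code, only a model of the same idea.
my @PARAMS   = qw(a b c d e f g m n o p s u v x y);   # 'y' added to the list
my %OK_FIELD = map { $_ => 1 } @PARAMS;

# Takes a flat key/value list such as ('y', '150') and returns a string of
# command-line switches. Dies on any key that is not in the allow-list.
sub build_param_string {
    my @args = @_;
    die "Odd number of arguments; expected key/value pairs\n" if @args % 2;
    my @switches;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        $key =~ s/^-//;                               # accept 'y' or '-y'
        die "Unallowed parameter: $key !\n" unless $OK_FIELD{$key};
        push @switches, "-$key $value";
    }
    return join ' ', @switches;
}

my $param_string = build_param_string('y', '150');
my $infile       = 'reads.fasta';                     # hypothetical file name

# Switches placed after the input file, following point 3 in the list above.
my $command = "cap3 $infile $param_string";
print "$command\n";
```

With the real module, the equivalent step would be to add y to @PARAMS in a local copy of Cap3.pm and call new('y', '150'), as described in the quoted message.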
> > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > Michael Stubbington > > Sent: Wednesday, May 06, 2009 7:39 AM > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bio::Tools::Run::Cap3 - Parameters > > > > Dear all, > > > > > > > > I am using the Bio::Tools::Run::Cap3 wrapper to the Cap3 assembly > > program. I have some reads that will only assemble if cap3 is > > used with > > the '-y 150' option. This is fine from the command line but I > > can't work > > out how to pass this option to the Cap3 factory object in my script. > > > > > > > > If I do the following > > > > > > > > my $params = "y 150" ; > > > > my $cap3Factory = Bio::Tools::Run::Cap3->new($params); > > > > my $assembly = $cap3Factory->run($file); > > > > > > > > Then I get an exception as follows: > > > > > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > > > MSG: Unallowed parameter: y ! > > > > STACK: Error::throw > > > > STACK: Bio::Root::Root::throw > > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Root/Root.pm:357 > > > > STACK: Bio::Tools::Run::Cap3::AUTOLOAD > > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:116 > > > > STACK: Bio::Tools::Run::Cap3::new > > /Users/mike/lib/perl5/site_perl/5.8.9/Bio/Tools/Run/Cap3.pm:101 > > > > STACK: /Users/mike/perlScripts/QGenotype.pl:150 > > > > > > > > If I don't try to pass any parameters to Cap3 it runs fine but just > > fails to assemble the reads that need the -y 150 flag. > > > > > > > > I'd very much appreciate any help with this. I'm pretty new > > to bioperl, > > hope I haven't missed anything obvious! 
> > > > > > > > Thanks in advance, > > > > > > > > Mike > > > > > > > > -------------------------------------------------------------- > > ---------- > > ---- > > > > Mike Stubbington > > > > Novel and Dangerous Pathogens > > > > Health Protection Agency > > > > Centre for Emergency Preparedness and Response > > > > Porton Down > > > > Salisbury > > > > SP4 0JG > > > > > > > > Tel: +44 1980 619812 > > > > > > > > > > > > ----------------------------------------- > > ************************************************************** > > ************ > > The information contained in the EMail and any attachments is > > confidential and intended solely and for the attention and use of > > the named addressee(s). It may not be disclosed to any other person > > without the express authority of the HPA, or the intended > > recipient, or both. If you are not the intended recipient, you must > > not disclose, copy, distribute or retain this message or any part > > of it. This footnote also confirms that this EMail has been swept > > for computer viruses, but please re-sweep any attachments before > > opening or saving. HTTP://www.HPA.org.uk > > ************************************************************** > > ************ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From wgallin at ualberta.ca Thu May 7 16:01:58 2009 From: wgallin at ualberta.ca (Warren Gallin) Date: Thu, 7 May 2009 14:01:58 -0600 Subject: [Bioperl-l] Appending efetch results to a file Message-ID: <9325A632-C366-456B-AF94-604E77A1F9AF@ualberta.ca> Hi, I am having trouble with a script that was working a few months ago, but has started giving unexpected results. 
I need to request hundreds of records, and to avoid stressing the Entrez server I do my fetching inside a loop that increments the -retstart parameter in the factory. This should append the fetched records to the file that I am using to collect all the records, but instead it is replacing the file.

How can I make get_Response append to an existing file instead of overwriting it?

Warren Gallin

From Russell.Smithies at agresearch.co.nz Thu May 7 17:24:53 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 8 May 2009 09:24:53 +1200
Subject: [Bioperl-l] Asking for advice on full EMBL extraction
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz>

I'm not sure if this will help with your problem or how it deals with memory management but using "ordinary" Perl to split the large EMBL file might work.
Give this a go:

============================
#!perl -w

use Bio::SeqIO;
use IO::String;

use constant SEP => "//\n";

open($fh, "gunzip -c rel_ann_mus_01_r99.dat.gz |") or die;

my $index = 1;

while(my $stringfh = new IO::String(get_next_record($fh))){

    my $seqio = Bio::SeqIO->new( -fh => $stringfh, -format => "EMBL" ) or die $!;

    while ( my $seq_object = $seqio->next_seq ) {
        print "Dealing with entry: ".$index++."\t".$seq_object->id."\n";

        # show the features
        for my $feat_object ($seq_object->get_SeqFeatures) {
            print "primary tag: ", $feat_object->primary_tag, "\n";
            for my $tag ($feat_object->get_all_tags) {
                print " tag: ", $tag, "\n";
                for my $value ($feat_object->get_tag_values($tag)) {
                    print " value: ", $value, "\n";
                }
            }
        }
    }

}


sub get_next_record{
    my($fh) = @_;
    (my $old_sep,$/) = ($/,SEP);
    my $record = <$fh>;
    $/ = $old_sep;
    return $record;
}
========================================

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of brian li
> Sent: Friday, 8 May 2009 1:00 a.m.
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
>
> My server has 32 GB RAM.
>
> The os of my server is 64-bit version of Ubuntu Server Edition 8.04
> LTS. And I have run my example code on another server with 32-bit
> version of Ubuntu Server Edition 8.04 and 4 GB RAM. Segfault again.
>
> -Brian
>
> On Thu, May 7, 2009 at 8:07 PM, Chris Fields wrote:
> > I noticed that Russell has 16GB RAM on his setup. Was yours equivalent?
> >
> > chris
> >
> > On May 7, 2009, at 12:32 AM, brian li wrote:
> >
> >> Thank you very much for your offer.
> >>
> >> The director of our lab wants me to do the extraction every time a new
> >> release of EMBL is published. I can't push the task to you every time.
> >>
> >> I can offer more information of the server I run my script on if needed.
> >>
> >> -Brian
> >>
> >> On Thu, May 7, 2009 at 1:01 PM, Smithies, Russell wrote:
> >>>
> >>> Sadly, that's the same code as I ran but I had a Data::Dump in the
> >>> middle.
> >>> Versions of Perl and BioPerl are the same.
> >>> We're running RHEL 5 (kernel 2.6.18-92.1.18.el5) with 16GB RAM
> >>>
> >>> If you get a full script running on a smaller dataset, I could probably
> >>> run it on the bigger stuff and give you back tab-separated (or is that
> >>> tab\tseparated ?) data for loading into your db.
> >>>
> >>> --Russell
> >>>
> >>>> -----Original Message-----
> >>>> From: brian li [mailto:brianli.cas at gmail.com]
> >>>> Sent: Thursday, 7 May 2009 4:50 p.m.
> >>>> To: Smithies, Russell
> >>>> Cc: bioperl-l at lists.open-bio.org
> >>>> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
> >>>>
> >>>> Dear Russell,
> >>>>
> >>>> My example code is as following. I omit the parse process and these
> >>>> lines give me "Segmentation Fault" too.
> >>>>
> >>>> # Start of code
> >>>> my $seqio = Bio::SeqIO->new(-file => 'rel_ann_mus_01_r99.dat',
> >>>>                             -format => 'EMBL');
> >>>> my $index = 1;
> >>>> while (my $seq = $seqio->next_seq)
> >>>> {
> >>>>     print "Dealing with entry: $index\n";
> >>>>     $index++;
> >>>> }
> >>>> # End
> >>>>
> >>>> The platform I run this code on:
> >>>> BioPerl 1.6.0
> >>>> Perl 5.8.8
> >>>> Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server)
> >>>>
> >>>> I have monitored the memory usage when I run the code above. There is
> >>>> always around 20GB free memory (buffer size counted in) left. So I
> >>>> suppose the segfault can't be explained just by memory shortage.
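
The record-at-a-time idea in Russell's script earlier in this message can be sketched without BioPerl. The following is a minimal illustration under stated assumptions: the two-entry in-memory data and the ID-line regex are invented for the example, and a real run would read from the gunzip pipe instead. The read is tested with defined() so the loop stops at end of file, and each entry ID is printed before any parsing happens, which may help narrow down which entry a crash is associated with.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reads one "//"-terminated record from an open filehandle.
# Returns undef at end of file so the caller's loop can stop cleanly.
sub get_next_record {
    my ($fh) = @_;
    local $/ = "//\n";        # record separator; restored when the sub returns
    my $record = <$fh>;
    return $record;
}

# Tiny in-memory stand-in for an EMBL flat file (hypothetical data).
my $data = "ID   ENTRY_ONE\nSQ   acgt\n//\nID   ENTRY_TWO\nSQ   ttga\n//\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $index = 0;
my @ids;
while (defined(my $record = get_next_record($fh))) {
    $index++;
    # Pull the entry name from the ID line before doing anything heavier.
    my ($id) = $record =~ /^ID\s+(\S+)/m;
    push @ids, $id;
    print "Dealing with entry: $index\t$id\n";
    # A fuller script could pass $record to a parser at this point.
}
close $fh;
```

For the real file, the in-memory open could be replaced by a pipe from gunzip -c, as in the script above.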
> >>>>
> >>>> Brian
> >>>>
> >>>>
> >>>> On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell wrote:
> >>>>>
> >>>>> Hi Brian,
> >>>>> I hate to say it but it worked OK for me using
> >>>>> rel_ann_mus_01_r99.dat.gz and simple example Bio::SeqIO code from bugzilla
> >>>>>
> >>>>> It's not using more than 1GB memory on our server and doesn't segfault.
> >>>>>
> >>>>> Send me your example code and I'll give it a go if you like.
> >>>>>
> >>>>>
> >>>>> Russell Smithies
> >>>>>
> >>>>> Bioinformatics Applications Developer
> >>>>> T +64 3 489 9085
> >>>>> E russell.smithies at agresearch.co.nz
> >>>>>
> >>>>> Invermay Research Centre
> >>>>> Puddle Alley,
> >>>>> Mosgiel,
> >>>>> New Zealand
> >>>>> T +64 3 489 3809
> >>>>> F +64 3 489 9174
> >>>>> www.agresearch.co.nz
> >>>>>
> >>>>>
> >>> =======================================================================
> >>> Attention: The information contained in this message and/or attachments
> >>> from AgResearch Limited is intended only for the persons or entities
> >>> to which it is addressed and may contain confidential and/or privileged
> >>> material. Any review, retransmission, dissemination or other use of, or
> >>> taking of any action in reliance upon, this information by persons or
> >>> entities other than the intended recipients is prohibited by AgResearch
> >>> Limited. If you have received this message in error, please notify the
> >>> sender immediately.
> >>> ======================================================================= > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Thu May 7 17:54:39 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 7 May 2009 14:54:39 -0700 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> Message-ID: <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org> Russell - I am not sure how that will help as only 1 sequence is parsed at a time by SeqIO parsers and they use the "//" delimiter. If the equivalent data exists in genbank format at NCBI I think _that_ module (Bio::SeqIO::genbank) has the ability to ignore annotations/features. Really we have to re-work the whole thing to be more lightweight and lazy-parse. -jason On May 7, 2009, at 2:24 PM, Smithies, Russell wrote: > I'm not sure if this will help with your problem or how it deals > with memory management but using "ordinary" Perl to split the large > EMBL file might work. 
> Give this a go: > > ============================ > #!perl -w > > use Bio::SeqIO; > use IO::String; > > use constant SEP => "//\n"; > > open($fh, "gunzip -c rel_ann_mus_01_r99.dat.gz |") or die; > > my $index = 1; > > while(my $stringfh = new IO::String(get_next_record($fh))){ > > my $seqio = Bio::SeqIO->new( -fh => $stringfh,-format => > "EMBL" ) or die $!; > > while ( my $seq_object = $seqio->next_seq ) { > print "Dealing with entry: ".$index++."\t".$seq_object->id."\n"; > > # show the features > for my $feat_object ($seq_object->get_SeqFeatures) { > print "primary tag: ", $feat_object->primary_tag, "\n"; > for my $tag ($feat_object->get_all_tags) { > print " tag: ", $tag, "\n"; > for my $value ($feat_object->get_tag_values($tag)) { > print " value: ", $value, "\n"; > } > } > } > } > > } > > > sub get_next_record{ > my($fh) = @_; > (my $old_sep,$/) = ($/,SEP); > my $record = <$fh>; > $/ = $old_sep; > return $record; > } > ======================================== > > > --Russell > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of brian li >> Sent: Friday, 8 May 2009 1:00 a.m. >> To: Chris Fields >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction >> >> My server has 32 GB RAM. >> >> The os of my server is 64-bit version of Ubuntu Server Edition 8.04 >> LTS. And I have run my example code on another server with 32-bit >> version of Ubuntu Server Edition 8.04 and 4 GB RAM. Segfault again. >> >> -Brian >> >> On Thu, May 7, 2009 at 8:07 PM, Chris Fields >> wrote: >>> I noticed that Russell has 16GB RAM on his setup. Was yours >>> equivalent? >>> >>> chris >>> >>> On May 7, 2009, at 12:32 AM, brian li wrote: >>> >>>> Thank you very much for your offer. >>>> >>>> The director of our lab wants me to do the extraction every time >>>> a new >>>> release of EMBL is published. I can't push the task to you every >>>> time. 
>>>> >>>> I can offer more information of the server I run my script on if >>>> needed. >>>> >>>> -Brian >>>> >>>> On Thu, May 7, 2009 at 1:01 PM, Smithies, Russell >>>> wrote: >>>>> >>>>> Sadly, that's the same code as I ran but I had a Data::Dump in the >>>>> middle. >>>>> Versions of Perl and BioPerl are the same. >>>>> We're running RHEL 5 (kernel 2.6.18-92.1.18.el5) with 16GB RAM >>>>> >>>>> If you get a full script running on a smaller dataset, I could >>>>> probably >>>>> run it on the bigger stuff and give you back tab-separated (or >>>>> is that >>>>> tab\tseparated ?) data for loading into your db. >>>>> >>>>> --Russell >>>>> >>>>>> -----Original Message----- >>>>>> From: brian li [mailto:brianli.cas at gmail.com] >>>>>> Sent: Thursday, 7 May 2009 4:50 p.m. >>>>>> To: Smithies, Russell >>>>>> Cc: bioperl-l at lists.open-bio.org >>>>>> Subject: Re: [Bioperl-l] Asking for advice on full EMBL >>>>>> extraction >>>>>> >>>>>> Dear Russell, >>>>>> >>>>>> My example code is as following. I omit the parse process and >>>>>> these >>>>>> lines give me "Segmentation Fault" too. >>>>>> >>>>>> # Start of code >>>>>> my $seqio = Bio::SeqIO->new(-file => 'rel_ann_mus_01_r99.dat', >>>>>> -format => 'EMBL'); >>>>>> my $index = 1; >>>>>> while (my $seq = $seqio->next_seq) >>>>>> { >>>>>> print "Dealing with entry: $index\n"; >>>>>> $index++; >>>>>> } >>>>>> # End >>>>>> >>>>>> The platform I run this code on: >>>>>> BioPerl 1.6.0 >>>>>> Perl 5.8.8 >>>>>> Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server) >>>>>> >>>>>> I have monitored the memory usage when I run the code above. >>>>>> There is >>>>>> always around 20GB free memory (buffer size counted in) left. >>>>>> So I >>>>>> suppose the segfault can't be explained just by memory shortage. 
>>>>>> >>>>>> Brian >>>>>> >>>>>> >>>>>> On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell >>>>>> wrote: >>>>>>> >>>>>>> Hi Brian, >>>>>>> I hate to say it but it worked OK for me using >>>>>>> rel_ann_mus_01_r99.dat.gz and >>>>>> >>>>>> simple example Bio::SeqIO code from bugzilla >>>>>>> >>>>>>> It's not using more than 1GB memory on our server and doesn't >>>>>>> segfault. >>>>>>> >>>>>>> Send me your example code and I'll give it a go if you like. >>>>>>> >>>>>>> >>>>>>> Russell Smithies >>>>>>> >>>>>>> Bioinformatics Applications Developer >>>>>>> T +64 3 489 9085 >>>>>>> E russell.smithies at agresearch.co.nz >>>>>>> >>>>>>> Invermay Research Centre >>>>>>> Puddle Alley, >>>>>>> Mosgiel, >>>>>>> New Zealand >>>>>>> T +64 3 489 3809 >>>>>>> F +64 3 489 9174 >>>>>>> www.agresearch.co.nz >>>>>>> >>>>>>> >>>>> = >>>>> = >>>>> = >>>>> = >>>>> = >>>>> ================================================================== >>>>> Attention: The information contained in this message and/or >>>>> attachments >>>>> from AgResearch Limited is intended only for the persons or >>>>> entities >>>>> to which it is addressed and may contain confidential and/or >>>>> privileged >>>>> material. Any review, retransmission, dissemination or other use >>>>> of, or >>>>> taking of any action in reliance upon, this information by >>>>> persons or >>>>> entities other than the intended recipients is prohibited by >>>>> AgResearch >>>>> Limited. If you have received this message in error, please >>>>> notify the >>>>> sender immediately. 
>>>>> =======================================================================
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org

From Russell.Smithies at agresearch.co.nz Thu May 7 18:05:55 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 8 May 2009 10:05:55 +1200
Subject: [Bioperl-l] Asking for advice on full EMBL extraction
In-Reply-To: <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org>
References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493CE904C@exchsth.agresearch.co.nz>

OK, I misunderstood. I thought the entire file was loaded into memory first and then each sequence was extracted from there. I hoped splitting into 588 individual sequences might help.

--Russell

From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich
Sent: Friday, 8 May 2009 9:55 a.m.
To: Smithies, Russell
Cc: 'brian li'; 'Chris Fields'; 'bioperl-l at lists.open-bio.org'
Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction

Russell - I am not sure how that will help as only 1 sequence is parsed at a time by SeqIO parsers and they use the "//" delimiter.
If the equivalent data exists in genbank format at NCBI I think _that_ module (Bio::SeqIO::genbank) has the ability to ignore annotations/features. Really we have to re-work the whole thing to be more lightweight and lazy-parse.

-jason

On May 7, 2009, at 2:24 PM, Smithies, Russell wrote:

I'm not sure if this will help with your problem or how it deals with memory management but using "ordinary" Perl to split the large EMBL file might work. Give this a go:

============================
#!perl -w
use strict;
use Bio::SeqIO;
use IO::String;

use constant SEP => "//\n";    # EMBL record terminator

open(my $fh, "gunzip -c rel_ann_mus_01_r99.dat.gz |") or die "can't start gunzip: $!";

my $index = 1;

# Test the record itself for end of input: IO::String->new always
# returns a true object, even when it is handed undef.
while (defined(my $record = get_next_record($fh))) {
    my $stringfh = IO::String->new($record);
    my $seqio = Bio::SeqIO->new(-fh => $stringfh, -format => "EMBL") or die $!;

    while (my $seq_object = $seqio->next_seq) {
        print "Dealing with entry: " . $index++ . "\t" . $seq_object->id . "\n";

        # show the features
        for my $feat_object ($seq_object->get_SeqFeatures) {
            print "primary tag: ", $feat_object->primary_tag, "\n";
            for my $tag ($feat_object->get_all_tags) {
                print " tag: ", $tag, "\n";
                for my $value ($feat_object->get_tag_values($tag)) {
                    print " value: ", $value, "\n";
                }
            }
        }
    }
}
close($fh);

# read one "//"-terminated record from the handle
sub get_next_record {
    my ($fh) = @_;
    local $/ = SEP;    # restored automatically when the sub returns
    my $record = <$fh>;
    return $record;
}
========================================

--Russell

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of brian li
Sent: Friday, 8 May 2009 1:00 a.m.
To: Chris Fields
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction

My server has 32 GB RAM.

The OS of my server is the 64-bit version of Ubuntu Server Edition 8.04 LTS. And I have run my example code on another server with the 32-bit version of Ubuntu Server Edition 8.04 and 4 GB RAM. Segfault again.

-Brian

On Thu, May 7, 2009 at 8:07 PM, Chris Fields wrote:

I noticed that Russell has 16GB RAM on his setup.
Was yours equivalent?

chris

On May 7, 2009, at 12:32 AM, brian li wrote:

Thank you very much for your offer.

The director of our lab wants me to do the extraction every time a new release of EMBL is published. I can't push the task to you every time.

I can offer more information about the server I run my script on if needed.

-Brian

On Thu, May 7, 2009 at 1:01 PM, Smithies, Russell wrote:

Sadly, that's the same code as I ran but I had a Data::Dump in the middle. Versions of Perl and BioPerl are the same. We're running RHEL 5 (kernel 2.6.18-92.1.18.el5) with 16GB RAM.

If you get a full script running on a smaller dataset, I could probably run it on the bigger stuff and give you back tab-separated (or is that tab\tseparated ?) data for loading into your db.

--Russell

-----Original Message-----
From: brian li [mailto:brianli.cas at gmail.com]
Sent: Thursday, 7 May 2009 4:50 p.m.
To: Smithies, Russell
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction

Dear Russell,

My example code is as follows. I omit the parsing step, and these lines alone give me "Segmentation Fault" too.

# Start of code
use Bio::SeqIO;

my $seqio = Bio::SeqIO->new(-file   => 'rel_ann_mus_01_r99.dat',
                            -format => 'EMBL');
my $index = 1;
while (my $seq = $seqio->next_seq)
{
    print "Dealing with entry: $index\n";
    $index++;
}
# End

The platform I run this code on:
BioPerl 1.6.0
Perl 5.8.8
Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server)

I have monitored the memory usage when I run the code above. There is always around 20GB free memory (buffer size counted in) left. So I suppose the segfault can't be explained just by memory shortage.

Brian

On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell wrote:

Hi Brian,
I hate to say it but it worked OK for me using rel_ann_mus_01_r99.dat.gz and simple example Bio::SeqIO code from bugzilla.

It's not using more than 1GB memory on our server and doesn't segfault.

Send me your example code and I'll give it a go if you like.

Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From wgallin at ualberta.ca Thu May 7 19:00:45 2009
From: wgallin at ualberta.ca (Warren Gallin)
Date: Thu, 7 May 2009 17:00:45 -0600
Subject: [Bioperl-l] More on Eutilities get_Response problem
Message-ID:

Hi,

I am using the get_Response method inside a loop, so I want to iteratively append the retrieved material to a file.

If I pass temp_hold.gb as the file parameter a file called temp_hold.gb is created and that file is successively overwritten as I cycle through the loop.

If I pass >temp_hold.gb as the file parameter a file called temp_hold.gb is created and that file is successively overwritten as I cycle through the loop.

If I pass >>temp_hold.gb as the file parameter a file called >temp_hold.gb (yes, the > is part of the file name) is created and that file is successively overwritten as I cycle through the loop.

Could it be that the way the file parameter is passed in has been slightly broken so it is no longer reading the >> as an indicator to append?

Warren Gallin

From Russell.Smithies at agresearch.co.nz Thu May 7 19:04:52 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 8 May 2009 11:04:52 +1200
Subject: [Bioperl-l] Asking for advice on full EMBL extraction
In-Reply-To: <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org>
References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493CE90A4@exchsth.agresearch.co.nz>

I guess Tie::File is going to do the same thing? (this works on my 32-bit Windows pc with 2GB RAM but is slow)

--Russell

=====================
#!perl -w
use strict;
use Bio::SeqIO;
use IO::String;
use Tie::File;

# each array element is one "//"-separated EMBL record, fetched on demand
tie my @array, 'Tie::File', "rel_ann_mus_01_r99.dat", recsep => "//\n"
    or die $!;

# scalar(@array) is the record count; $#array is only the last index
print "loaded " . scalar(@array) . " records\n";

for (my $i = 0; $i < @array; $i++) {
    print "$i\n";
    my $seqio = Bio::SeqIO->new(-fh     => IO::String->new($array[$i]),
                                -format => "EMBL") or die $!;
    # should only be one seq
    my $seq_object = $seqio->next_seq or next;
    print "Dealing with entry: $i\t" . $seq_object->id . "\n";
}
=====================

From jason at bioperl.org Thu May 7 19:25:16 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 7 May 2009 16:25:16 -0700
Subject: [Bioperl-l] Asking for advice on full EMBL extraction
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493CE904C@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE904C@exchsth.agresearch.co.nz>
Message-ID: <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org>

It parses from a stream or file, one sequence at a time, so it only reads a single sequence out at a time, but it does have to parse that whole sequence record, which is where feature-rich sequences might be causing problems. I think, per your other mention of Tie::File, the whole file is not going into memory, so that is not the problem; it is the creation of many objects as it parses the sequence that is likely the problem.
It will read up to the first "//" from that Tie::File anyway, and that becomes an entire string which is then parsed to pull out the relevant features, so you don't gain anything with Tie::File. The way to solve it would be for the objects to be created and reside in a DB on disk rather than in memory. I'd really enjoy seeing more indexed and hashed data to objects stored on disk when memory requirements call for it, so that very large datasets can be handled more nimbly.

I think there have been several attempts to simplify, but it basically means a dedicated developer to really overhaul or map to a new system. What we've tried to build is a decent API so a new implementation can be done without affecting the 'next_seq' and 'write_seq' API.

Notwithstanding the apparent API confusion caused by _ancient_ decisions to give Bio::SeqFeatureI and Bio::PrimarySeq 'seq' methods that return different types (don't forget that Lincoln's Bio::DB::Fasta uses the 'seq' method to return a sequence as a string as well), major API changes here will in all likelihood create a big split between the branches, so that any new BioPerl would not match up well with existing scripts or libraries that use it. Hence the reason for no "great realigning" to a completely well-planned API rather than the organically grown whims of several generations of devs.

I say this in jest a bit - I do want to see changes, but I think it really will have to be called something other than BioPerl to avoid confusion, given that a lot of things that depend on the current APIs will break. BioPerl2, or something indicating a Perl6 association.

-jason

Jason Stajich
jason at bioperl.org

From Russell.Smithies at agresearch.co.nz Thu May 7 20:03:58 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 8 May 2009 12:03:58 +1200
Subject: [Bioperl-l] Asking for advice on full EMBL extraction
In-Reply-To: <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org>
References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE904C@exchsth.agresearch.co.nz> <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493CE9104@exchsth.agresearch.co.nz>

I think the problem here though is the size of the sequences rather than too many features. If one was inclined to bodge/hack and didn't care about sequence, I guess you could filter them out with awk so Bio::SeqIO doesn't have to create the Bio::PrimarySeq :) Probably breaks the EMBL file spec ... Eg.

open( $fh, "gunzip -c rel_ann_mus_01_r99.dat.gz | awk '!/^SQ|^ /{print}' |" ) or die;

--Russell

From valiente at lsi.upc.edu Fri May 8 08:49:22 2009
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Fri, 8 May 2009 21:49:22 +0900
Subject: [Bioperl-l] The Power of R (Chris Fields)
In-Reply-To:
References:
Message-ID: <8F435B66-33CF-467D-8D86-AA8EF2309E98@lsi.upc.edu>

>>> While we're on the topic, can anyone recommend a good book or
>>> resource from which to learn R, to supplement the official docs?

Well, my new book

G. Valiente. Combinatorial Pattern Matching Algorithms in Computational Biology using Perl and R.
Taylor & Francis/CRC Press (2009)
http://www.crcpress.com/product/isbn/9781420063677

is already available. I hope it will also be of much use to BioPerl developers and users.

Gabriel

From maj at fortinbras.us Fri May 8 08:43:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 8 May 2009 08:43:16 -0400
Subject: [Bioperl-l] More on Eutilities get_Response problem
In-Reply-To:
References:
Message-ID: <2C9B353790344B11A3A5F0ACB1F95830@NewLife>

Hi Warren,

The get_Response function is really a wrapper for LWP::UserAgent::get; as such, the -file parameter works differently from the usual BioPerl -file. I agree that this is a bug; it's just not a BioPerl bug. If the behavior of your script really did change, maybe it did so after an update of LWP::UserAgent. Anyway, one way to work around this is to use the callback instead of the file parameter; something like

my $global_file = 'eutil-dump.txt';
...
$thing->get_Response( -cb => \&_append_file );
...
sub _append_file {
    my ($data, $response_obj, $protocol_obj) = @_;
    open my $fh, ">>$global_file" or die "can't open dump file: $!";
    print $fh $data;
    return;
}

See http://search.cpan.org/~gaas/libwww-perl-5.826/lib/LWP/UserAgent.pm, and 'perldoc Bio::DB::EUtilities'.

cheers,
Mark

----- Original Message -----
From: "Warren Gallin"
To: "BioPerl List"
Sent: Thursday, May 07, 2009 7:00 PM
Subject: [Bioperl-l] More on Eutilities get_Response problem

> Hi,
>
> I am using the get_response method inside a loop, so I want to iteratively
> append the retrieved material to a file.
>
> If I pass temp_hold.gb as the file parameter a file called temp_hold.gb is
> created and that file is successively overwritten as I cycle through the
> loop.
>
> If I pass >temp_hold.gb as the file parameter a file called temp_hold.gb is
> created and that file is successively overwritten as I cycle through the
> loop.
>
> If I pass >>temp_hold.gb as the file parameter a file called
> >temp_hold.gb (yes, the > is part of the file name) is created and
> that file is successively overwritten as I cycle through the loop.
>
> Could it be that the way the file parameter is passed in has been slightly
> broken so it is no longer reading the >> as an indicator to append?
>
> Warren Gallin

From cjfields at illinois.edu Fri May 8 08:27:09 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 8 May 2009 07:27:09 -0500
Subject: [Bioperl-l] More on Eutilities get_Response problem
In-Reply-To:
References:
Message-ID:

Append was working at some point (on Mac OS X). Just curious, what OS are you using?

Regardless, there is some hacky code in get_Response to deal with possible filename issues, but I'll change that to delegate to a Bio::Root::IO for consistency, just in case. Also, so it's available as an internal workaround, I'll add in a -fh (filehandle) option.

As a current workaround, you could open a file handle on your own and just print to it:

open(my $fh, '>>', 'mydata.gb') or die "can't open mydata.gb: $!";

# later in a loop
while (...) {
    # get_Response returns an HTTP::Response object; print its content
    print $fh $eutil->get_Response->content;
}

chris

On May 7, 2009, at 6:00 PM, Warren Gallin wrote:

> Hi,
>
> I am using the get_response method inside a loop, so I want to
> iteratively append the retrieved material to a file.
>
> If I pass temp_hold.gb as the file parameter a file called
> temp_hold.gb is created and that file is successively overwritten as
> I cycle through the loop.
>
> If I pass >temp_hold.gb as the file parameter a file called
> temp_hold.gb is created and that file is successively overwritten as
> I cycle through the loop.
> > If I pass >>temp_hold.gb as the file parameter a file called > >temp_hold.gb (yes, the > is part of the file name) is created and > that file is successively overwritten as I cycle through the loop. > > Could it be that the way the file parameter is passed in has been > slightly broken so it is no loner reading the >> as an indicator to > append? > > > Warren Gallin > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Fri May 8 09:42:59 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 8 May 2009 09:42:59 -0400 Subject: [Bioperl-l] Appending efetch results to a file In-Reply-To: <9325A632-C366-456B-AF94-604E77A1F9AF@ualberta.ca> References: <9325A632-C366-456B-AF94-604E77A1F9AF@ualberta.ca> Message-ID: > I need to request 100's of records, and to avoid stress the Entrez > server I do my fetching inside a loop that increments the -retstart > parameter in the factory. This raises a question in my mind: should EUtilities use Bio::WebAgent rather than LWP::UserAgent directly, and doesn't Bio::WebAgent have magical properties that ease the server burden without having to build it into the user code directly? ----- Original Message ----- From: "Warren Gallin" To: "BioPerl List" Sent: Thursday, May 07, 2009 4:01 PM Subject: [Bioperl-l] Appending efetch results to a file > Hi, > > I am having trouble with a script that was working a few months ago, > but has started giving unexpected results. > > I need to request 100's of records, and to avoid stress the Entrez > server I do my fetching inside a loop that increments the -retstart > parameter in the factory. This should append the fetched records to > the file that I am using to collect all the records, but instead it is > replacing the file. How can I make the get_Response append to an > existing file instead of overwriting it? 
>
> Warren Gallin

From cjfields at illinois.edu Fri May 8 10:22:31 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 8 May 2009 09:22:31 -0500
Subject: [Bioperl-l] Appending efetch results to a file
In-Reply-To:
References: <9325A632-C366-456B-AF94-604E77A1F9AF@ualberta.ca>
Message-ID:

On May 8, 2009, at 8:42 AM, Mark A. Jensen wrote:

>> I need to request 100's of records, and to avoid stress the Entrez
>> server I do my fetching inside a loop that increments the
>> -retstart parameter in the factory.
>
> This raises a question in my mind: should EUtilities use
> Bio::WebAgent rather than LWP::UserAgent directly, and doesn't
> Bio::WebAgent have magical properties that ease the server burden
> without having to build it into the user code directly?

I thought about that originally, but there is a significant difference between the two agent implementations. Bio::WebAgent is-a LWP::UserAgent subclass, whereas Bio::DB::GenericWebAgent and its ilk contain a user agent instance (has-a). I chose the latter course b/c I favor composition over inheritance, and LWP::UserAgent uses different named parameter handling than BioPerl (no '-'); Bio::WebAgent code works around that in the constructor. I would rather use composition than risk running into odd parameter issues down the road. Not to mention, I may genericize it more in the future to be capable of using SOAP-based methods, so switching out the ua made more sense in the long run (still a lot to do on that end).

I haven't discussed this extensively on the list before, but when I redesigned EUtilities I wanted to separate out the various tasks, e.g. ua, parser, parameter handling, etc. So, for the specific eutil tools, parser = Bio::Tools::EUtilities, parameter = Bio::Tools::EUtilities::EUtilParameters, ua = LWP::UserAgent.
For other DBs one could switch out the relevant bits for DB-specific implementations. Then, Bio::DB::EUtilities basically decorates all three, acts as the traffic cop to get the various bits playing well together, delegates as needed, etc. This'll allow additional components to be added in at later points if needed, and the basic tool can be used for retrieving raw data or as a souped-up agent for retrieving remote data in a new set of modules (Bio::Entrez::*, maybe).

There are some experimental bits in there still (repeated requests with the exact same params do not spam eutils, for instance, and there is some 'lazy' code in the parser), but it seems to largely work, and those bits can be removed fairly easily if they prove problematic.

chris

From brianli.cas at gmail.com Fri May 8 10:48:32 2009
From: brianli.cas at gmail.com (brian li)
Date: Fri, 8 May 2009 22:48:32 +0800
Subject: [Bioperl-l] Asking for advice on full EMBL extraction
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493CE9104@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE904C@exchsth.agresearch.co.nz> <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE9104@exchsth.agresearch.co.nz>
Message-ID:

This works:

open $fh, "gunzip -c rel_ann_mus_01_r99.dat.gz | awk '!/^FT|^CO/{print}' |"

This segfaults:

open $fh, "gunzip -c rel_ann_mus_01_r99.dat.gz | awk '!/^SQ|^ /{print}' |"

So it seems the features are causing the problem. Although I still don't know why that makes my OS raise a segfault, my extraction can move on again. Maybe I can find a clue once I know more about my OS's memory management strategy.

I really appreciate all your help.
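
The awk filter in the message above can also be written in plain Perl, which avoids depending on an external awk. What follows is only a sketch: the helper name `filter_embl_lines`, the sample record, and the choice of line codes are assumptions made for illustration and are not part of Brian's script.

```perl
use strict;
use warnings;

# Hypothetical helper: copy EMBL lines from $in to $out, dropping every line
# whose two-letter line code is listed in @skip (FT = feature table,
# CO = contig/construct lines).
sub filter_embl_lines {
    my ($in, $out, @skip) = @_;
    my %skip = map { $_ => 1 } @skip;
    while (my $line = <$in>) {
        my $code = substr($line, 0, 2);
        next if $skip{$code};
        print {$out} $line;
    }
    return;
}

# A tiny made-up record, read from an in-memory filehandle.
my $record = join "\n",
    'ID   X00001; SV 1; linear; mRNA; STD; MUS; 12 BP.',
    'FT   source          1..12',
    'CO   join(AB000001.1:1..12)',
    'SQ   Sequence 12 BP;',
    '     acgtacgtac gt        12',
    '//', '';

open my $in, '<', \$record or die "cannot read record: $!";
my $filtered = '';
open my $out, '>', \$filtered or die "cannot open buffer: $!";
filter_embl_lines($in, $out, qw(FT CO));
close $out;

print $filtered;
```

In a real run the input handle would presumably be the `gunzip -c ... |` pipe from the message, and the filtered text could then be handed to Bio::SeqIO through its -fh option.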

-Brian

On Fri, May 8, 2009 at 8:03 AM, Smithies, Russell wrote:
> I think the problem here though is the size of the sequences rather than too
> many features.
>
> If one was inclined to bodge/hack and didn't care about sequence, I guess
> you could filter them out with awk so Bio::SeqIO doesn't have to create the
> Bio::PrimarySeq :)
>
> Probably breaks the EMBL file spec ?
>
> Eg.
>
> open( $fh, "gunzip -c rel_ann_mus_01_r99.dat.gz | awk '!/^SQ|^ /{print}' |"
> ) or die;
>
> --Russell
>
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
> Stajich
> Sent: Friday, 8 May 2009 11:25 a.m.
> To: Smithies, Russell
> Cc: 'brian li'; 'Chris Fields'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
>
> It parses from a stream or file, one sequence at a time so it only reads a
> single sequence out at a time, but it does have to parse that whole sequence
> record which is where feature rich sequences might be causing problems.
>
> I think per your other mention of Tie::File - the whole file is not going
> into memory so that is not the problem, it is the creation of many objects
> that it does as it parses the sequence that is likely the problem. It will
> read up to the first "//" from that Tie::File anyways, that becomes an
> entire string which is then parsed to pull out the relevant features so you
> don't gain anything with Tie::File -- what would be the way to solve it is
> if the objects could be created and reside in a DB on disk rather than
> in-memory. I'd really enjoy seeing more indexed and hashed data to objects
> stored on disk when mem requirements are such so that very large datasets
> can be handled more nimbly.
>
> I think there have been several attempts to simplify, but it basically means
> a dedicated developer to really overhaul or map to a new system.
> What we've
> tried to build is a decent API so a new implementation can be done without
> affecting the 'next_seq' and 'write_seq' API.
>
> Non-withstanding the seemed API confusion caused by _ancient_ decisions on
> giving function names of Bio::SeqFeatureI 'seq' and Bio::PrimarySeq 'seq'
> which return different types -- don't forget that Lincoln's Bio::DB::Fasta
> uses the 'seq' method to return a sequence as a string as well so major API
> changes in general here will create in all likelihood a big split between
> the branches that will make any new Bioperl not match up well with existing
> scripts or libraries that use it - hence the reason for no "great
> realigning" to a completely well-planned out API rather than the organically
> grown whims of several generations of devs. I say this in jest a bit - I do
> want to see changes, but I think it really will have to be called something
> else besides BioPerl to avoid confusion and the fact that a lot of things
> will break that depend on the current APIs. BioPerl2 or something
> indicating a Perl6 association.
>
> -jason
>
> On May 7, 2009, at 3:05 PM, Smithies, Russell wrote:
>
> OK, I misunderstood, I thought the entire file loaded was loaded into memory
> first then each sequence was extracted from there.
> I hoped splitting into 588 individual sequences might help.
>
> --Russell
>
> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason
> Stajich
> Sent: Friday, 8 May 2009 9:55 a.m.
> To: Smithies, Russell
> Cc: 'brian li'; 'Chris Fields'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
>
> Russell -
>
> I am not sure how that will help as only 1 sequence is parsed at a time by
> SeqIO parsers and they use the "//" delimiter.
>
> If the equivalent data exists in genbank format at NCBI I think _that_
> module (Bio::SeqIO::genbank) has the ability to ignore
> annotations/features.
> Really we have to re-work the whole thing to be more
> lightweight and lazy-parse.
>
> -jason
> On May 7, 2009, at 2:24 PM, Smithies, Russell wrote:
>
> I'm not sure if this will help with your problem or how it deals with memory
> management but using "ordinary" Perl to split the large EMBL file might
> work.
> Give this a go:
>
> ============================
> #!perl -w
>
> use Bio::SeqIO;
> use IO::String;
>
> use constant SEP => "//\n";
>
> open($fh, "gunzip -c rel_ann_mus_01_r99.dat.gz |") or die;
>
> my $index = 1;
>
> while(my $stringfh = new IO::String(get_next_record($fh))){
>
>     my $seqio = Bio::SeqIO->new( -fh => $stringfh, -format => "EMBL" ) or die $!;
>
>     while ( my $seq_object = $seqio->next_seq ) {
>         print "Dealing with entry: ".$index++."\t".$seq_object->id."\n";
>
>         # show the features
>         for my $feat_object ($seq_object->get_SeqFeatures) {
>             print "primary tag: ", $feat_object->primary_tag, "\n";
>             for my $tag ($feat_object->get_all_tags) {
>                 print "  tag: ", $tag, "\n";
>                 for my $value ($feat_object->get_tag_values($tag)) {
>                     print "    value: ", $value, "\n";
>                 }
>             }
>         }
>     }
>
> }
>
> sub get_next_record{
>     my($fh) = @_;
>     (my $old_sep,$/) = ($/,SEP);
>     my $record = <$fh>;
>     $/ = $old_sep;
>     return $record;
> }
> ========================================
>
> --Russell
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of brian li
> Sent: Friday, 8 May 2009 1:00 a.m.
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
>
> My server has 32 GB RAM.
>
> The os of my server is 64-bit version of Ubuntu Server Edition 8.04
> LTS. And I have run my example code on another server with 32-bit
> version of Ubuntu Server Edition 8.04 and 4 GB RAM. Segfault again.
>
> -Brian
>
> On Thu, May 7, 2009 at 8:07 PM, Chris Fields wrote:
> I noticed that Russell has 16GB RAM on his setup. Was yours equivalent?
>
> chris
>
> On May 7, 2009, at 12:32 AM, brian li wrote:
>
> Thank you very much for your offer.
>
> The director of our lab wants me to do the extraction every time a new
> release of EMBL is published. I can't push the task to you every time.
>
> I can offer more information of the server I run my script on if needed.
>
> -Brian
>
> On Thu, May 7, 2009 at 1:01 PM, Smithies, Russell wrote:
>
> Sadly, that's the same code as I ran but I had a Data::Dump in the
> middle.
> Versions of Perl and BioPerl are the same.
> We're running RHEL 5 (kernel 2.6.18-92.1.18.el5) with 16GB RAM
>
> If you get a full script running on a smaller dataset, I could probably
> run it on the bigger stuff and give you back tab-separated (or is that
> tab\tseparated ?) data for loading into your db.
>
> --Russell
>
> -----Original Message-----
> From: brian li [mailto:brianli.cas at gmail.com]
> Sent: Thursday, 7 May 2009 4:50 p.m.
> To: Smithies, Russell
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
>
> Dear Russell,
>
> My example code is as following. I omit the parse process and these
> lines give me "Segmentation Fault" too.
>
> # Start of code
> my $seqio = Bio::SeqIO->new(-file   => 'rel_ann_mus_01_r99.dat',
>                             -format => 'EMBL');
> my $index = 1;
> while (my $seq = $seqio->next_seq)
> {
>   print "Dealing with entry: $index\n";
>   $index++;
> }
> # End
>
> The platform I run this code on:
> BioPerl 1.6.0
> Perl 5.8.8
> Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server)
>
> I have monitored the memory usage when I run the code above. There is
> always around 20GB free memory (buffer size counted in) left. So I
> suppose the segfault can't be explained just by memory shortage.
>
> Brian
>
> On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell wrote:
>
> Hi Brian,
> I hate to say it but it worked OK for me using
> rel_ann_mus_01_r99.dat.gz and
> simple example Bio::SeqIO code from bugzilla
> It's not using more than 1GB memory on our server and doesn't segfault.
> Send me your example code and I'll give it a go if you like.
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
>
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> Jason Stajich
> jason at bioperl.org

From avilella at gmail.com Fri May 8 12:43:55 2009
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 8 May 2009 17:43:55 +0100
Subject: [Bioperl-l] parsing paml output
In-Reply-To: <1252ab5f0905080844r5664f3d2la7eede658602bcb5@mail.gmail.com>
References: <1252ab5f0905080844r5664f3d2la7eede658602bcb5@mail.gmail.com>
Message-ID: <358f4d650905080943i69016199t4df58bbe4853cff2@mail.gmail.com>

If I remember correctly, there was a way to just parse the output that is decoupled of running PAML through bioperl.
I am ccing the bioperl mailing list as I've seen people having parsing issues with some newer versions of PAML.

What version are you trying to parse? Can you attach a small example?

On Fri, May 8, 2009 at 4:44 PM, Irene Newton wrote:

> Hello!
>
> I first want to thank you for contributing your module to the bioperl
> community. Every time I think, "hey, wouldn't it be great if someone coded
> this tool?" It's there! It's much appreciated.
>
> I've been trying to implement it but am confused about one thing: what is
> the main codeml output file that the parser expects?
> I usually work with
> the *.out files or the rst files but when I try either of those as input,
> the parser throws an error:
>
> ------------- EXCEPTION: Bio::Root::NotImplemented -------------
> MSG: Unknown format of PAML output
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Phylo::PAML::_parse_summary
> /usr/local/share/perl/5.8.8/Bio/Tools/Phylo/PAML.pm:359
> STACK: Bio::Tools::Phylo::PAML::next_result
> /usr/local/share/perl/5.8.8/Bio/Tools/Phylo/PAML.pm:224
> STACK: ./paml_parser.pl:14
> ----------------------------------------------------------------
>
> Any thoughts? Warm regards,
> Irene
>
> --
> Irene L.G. Newton
> Postdoctoral Fellow
> Tufts University - Microbiology Department
> Jaharis 424
> 136 Harrison Ave.
> Boston, MA 02111
>

From jason at bioperl.org Fri May 8 12:57:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 8 May 2009 09:57:54 -0700
Subject: [Bioperl-l] parsing paml output
In-Reply-To: <358f4d650905080943i69016199t4df58bbe4853cff2@mail.gmail.com>
References: <1252ab5f0905080844r5664f3d2la7eede658602bcb5@mail.gmail.com> <358f4d650905080943i69016199t4df58bbe4853cff2@mail.gmail.com>
Message-ID: <00798FDA-02D6-4527-A8F6-904D9C88F617@bioperl.org>

The parsing is decoupled; that is Bio::Tools::Phylo::PAML, and I'm pretty sure that is where the errors are coming from.

I think a sample report and sample script as a bug report is a good first step in case this is a simple problem to diagnose.

However, we need programmers who want to work on this problem to step up and help update the module to deal with new variants in the PAML output file format. I think we've failed to recruit any new developers to work on supporting the latest PAML, so things work(ed) for 3.15, but I think 4 has new variations in the format that cause it to fall over.
-jason On May 8, 2009, at 9:43 AM, Albert Vilella wrote: > If I remember correctly, there was a way to just parse the output > that is > decoupled of running PAML through bioperl. > I am ccing the bioperl mailing list as I've seen people having parsing > issues with some newer versions of PAML. > > What version are you trying to parse? Can you attach a small example? > > On Fri, May 8, 2009 at 4:44 PM, Irene Newton > wrote: > >> Hello! >> >> I first want to thank you for contributing your module to the bioperl >> community. Every time I think, "hey, wouldn't it be great if >> someone coded >> this tool?" It's there! It's much appreciated. >> >> I've been trying to implement it but am confused about one thing: >> what is >> the main codeml output file that the parser expects? I usually >> work with >> the *.out files or the rst files but when I try either of those as >> input, >> the parser throws an error: >> >> ------------- EXCEPTION: Bio::Root::NotImplemented ------------- >> MSG: Unknown format of PAML output >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/local/share/perl/5.8.8/Bio/Root/Root.pm:328 >> STACK: Bio::Tools::Phylo::PAML::_parse_summary >> /usr/local/share/perl/5.8.8/Bio/Tools/Phylo/PAML.pm:359 >> STACK: Bio::Tools::Phylo::PAML::next_result >> /usr/local/share/perl/5.8.8/Bio/Tools/Phylo/PAML.pm:224 >> STACK: ./paml_parser.pl:14 >> ---------------------------------------------------------------- >> >> Any thoughts? Warm regards, >> Irene >> >> -- >> Irene L.G. Newton >> Postdoctoral Fellow >> Tufts University - Microbiology Department >> Jaharis 424 >> 136 Harrison Ave. 
>> Boston, MA 02111
>>

Jason Stajich
jason at bioperl.org

From cjfields at illinois.edu Fri May 8 13:32:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 8 May 2009 12:32:46 -0500
Subject: [Bioperl-l] More on Eutilities get_Response problem
In-Reply-To: <2C9B353790344B11A3A5F0ACB1F95830@NewLife>
References: <2C9B353790344B11A3A5F0ACB1F95830@NewLife>
Message-ID: <39D1D481-EF7F-4948-98FF-3A60DEF29DE5@illinois.edu>

Yes, whoops, forgot about that (slept since then). Yes, the filename is passed on to LWP::UserAgent,

http://search.cpan.org/~gaas/libwww-perl-5.826/lib/LWP/UserAgent.pm#REQUEST_METHODS

IIRC, there was an odd issue that popped up when passing the filename onto get(), but I do recall append working correctly at one point. Let me see what I can work out.

chris

On May 8, 2009, at 7:43 AM, Mark A. Jensen wrote:

> Hi Warren,
>
> The get_Response function is really a wrapper for LWP::UserAgent::get;
> as such, the -file parameter works differently from the usual
> BioPerl -file.
> I agree that this is a bug; it's just not a BioPerl bug. If the
> behavior of your
> script really did change, maybe it did so after an update of
> LWP::UserAgent.
> Anyway, one way to work around this is to use the callback instead
> of the
> file parameter; something like
>
> my $global_file = 'eutil-dump.txt';
> ...
> $thing->get_Response( -cb => \&_append_file );
> ...
> sub _append_file {
>     my ($data, $response_obj, $protocol_obj) = @_;
>     open my $fh, ">>$global_file" or die "can't open dump file: $!";
>     print $fh $data;
>     return;
> }
>
> See http://search.cpan.org/~gaas/libwww-perl-5.826/lib/LWP/UserAgent.pm,
> and 'perldoc Bio::DB::EUtilities'.
> > cheers, > Mark > > > > ----- Original Message ----- From: "Warren Gallin" > > To: "BioPerl List" > Sent: Thursday, May 07, 2009 7:00 PM > Subject: [Bioperl-l] More on Eutilities get_Response problem > > >> Hi, >> >> I am using the get_response method inside a loop, so I want to >> iteratively append the retrieved material to a file. >> >> If I pass temp_hold.gb as the file parameter a file called >> temp_hold.gb is created and that file is successively overwritten >> as I cycle through the loop. >> >> If I pass >temp_hold.gb as the file parameter a file called >> temp_hold.gb is created and that file is successively overwritten >> as I cycle through the loop. >> >> If I pass >>temp_hold.gb as the file parameter a file called >> >temp_hold.gb (yes, the > is part of the file name) is created and >> that file is successively overwritten as I cycle through the loop. >> >> Could it be that the way the file parameter is passed in has been >> slightly broken so it is no loner reading the >> as an indicator >> to append? 
>> >> >> Warren Gallin >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri May 8 13:45:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 8 May 2009 12:45:09 -0500 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org> References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE904C@exchsth.agresearch.co.nz> <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org> Message-ID: On May 7, 2009, at 6:25 PM, Jason Stajich wrote: > It parses from a stream or file, one sequence at a time so it only > reads a single sequence out at a time, but it does have to parse > that whole sequence record which is where feature rich sequences > might be causing problems. > > I think per your other mention of Tie::File - the whole file is not > going into memory so that is not the problem, it is the creation of > many objects that it does as it parses the sequence that is likely > the problem. It will read up to the first "//" from that Tie::File > anyways, that becomes an entire string which is then parsed to pull > out the relevant features so you don't gain anything with Tie::File > -- what would be the way to solve it is if the objects could be > created and reside in a DB on disk rather than in-memory. 
I'd > really enjoy seeing more indexed and hashed data to objects stored > on disk when mem requirements are such so that very large datasets > can be handled more nimbly. Or maybe implement some lazy iterator-based methods. We have brought up the subject of the SwissKnife modules here before... > I think there have been several attempts to simplify, but it > basically means a dedicated developer to really overhaul or map to a > new system. What we've tried to build is a decent API so a new > implementation can be done without affecting the 'next_seq' and > 'write_seq' API. > > Non-withstanding the seemed API confusion caused by _ancient_ > decisions on giving function names of Bio::SeqFeatureI 'seq' and > Bio::PrimarySeq 'seq' which return different types -- don't forget > that Lincoln's Bio::DB::Fasta uses the 'seq' method to return a > sequence as a string as well so major API changes in general here > will create in all likelihood a big split between the branches that > will make any new Bioperl not match up well with existing scripts or > libraries that use it - hence the reason for no "great realigning" > to a completely well-planned out API rather than the organically > grown whims of several generations of devs. I say this in jest a > bit - I do want to see changes, but I think it really will have to > be called something else besides BioPerl to avoid confusion and the > fact that a lot of things will break that depend on the current > APIs. BioPerl2 or something indicating a Perl6 association. > > -jason Just thought of this: doesn't the feature iterator in Bio::DB::SeqFeature::Store use next_seq for features? Yikes... Anyway, I think if we set a decent enough deprecation schedule, users would adjust, but that's generally for small changes. Dramatic large-scale changes (such as Moose integration and conversion of interfaces to roles) should be done in a separate project. 
Similarly, as mentioned before, perl6 is a different (yet related) beast to perl5, and so a bioperl-related project using perl6 shouldn't be called BioPerl 2.0. The nice aspect of this: we can take what we like from BioPerl now and refactor it for either project, along the way making sure only the most critical modules get in.

chris

From raulmendez at cbm.uam.es Fri May 8 12:19:40 2009
From: raulmendez at cbm.uam.es (Raul Mendez Giraldez)
Date: Fri, 08 May 2009 18:19:40 +0200
Subject: [Bioperl-l] How to get coil prediction out of Bio::Tools::Run::Coil modules
Message-ID: <1241799580.6963.165.camel@pepa.cbm.uam.es>

Hi,

I'm trying to get coiled-coil predictions for protein sequences using Bob Russell's program ncoils, through the bioperl interface Bio::Tools::Run::Coil, but the only thing I can get from any element of the features list is the sequence name and a few other, not so useful attributes. I'm running the following script:

#!/home/rmendez/bin/perl -w

use strict;
use FileHandle;
use Data::Dumper;
use Bio::Tools::Run::Coil;

my $seqin = 'filein.fasta';
my $factory = Bio::Tools::Run::Coil->new('-c');
my @features = $factory->run($seqin);
print "Printing content of features[0]\n";
print Dumper $features[0];

----

And the output (the content of the first element of the features array) is:

'_gsf_tag_hash' => {
    'percent_id' => [ 'NULL' ],
    'hid' => [ 'ncoils' ],
    'evalue' => [ 0 ]
},
'_location' => bless( {
    '_location_type' => 'EXACT',
    '_start' => 138,
    '_end' => 172
}, 'Bio::Location::Simple' ),
'_gsf_seq_id' => 'ENSDARP00000084927',
'_parse_h' => {},
'_root_cleanup_methods' => [ sub { "DUMMY" } ],
'_source_tag' => 'Coils',
'_primary_tag' => 'ncoils',
'_root_verbose' => 0
}, 'Bio::SeqFeature::Generic' );

Then how could I get the sequence itself with the coil annotation 'xxx'?
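
One way to read the dump above: the feature carries coordinates ('_start' => 138, '_end' => 172) rather than an annotated sequence, so an 'xxx'-style string would have to be built from the original sequence. The following is a minimal plain-Perl sketch of that step, assuming 1-based inclusive coordinates. The sequence and coordinates are made up, and `mask_region` is a hypothetical helper, not a BioPerl method. With real objects the coordinates would presumably come from the feature's start and end methods.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $seq with positions $start..$end
# (1-based, inclusive) replaced by $mask_char.
sub mask_region {
    my ($seq, $start, $end, $mask_char) = @_;
    die "bad range $start..$end\n"
        if $start < 1 || $end > length($seq) || $start > $end;
    my $len = $end - $start + 1;
    substr($seq, $start - 1, $len) = $mask_char x $len;
    return $seq;
}

my $protein = 'MKTAYIAKQRQISFVKSHFSRQ';   # made-up sequence, 22 residues
my ($start, $end) = (5, 10);              # made-up coil coordinates

# the full sequence with the predicted region masked
print mask_region($protein, $start, $end, 'x'), "\n";

# the predicted region on its own
print substr($protein, $start - 1, $end - $start + 1), "\n";
```

Because the helper works on a copy, the original sequence string stays intact and several predicted regions could be masked one after another.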
Thanks, Raul From cjfields at illinois.edu Fri May 8 13:45:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 8 May 2009 12:45:09 -0500 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org> References: <18DF7D20DFEC044098A1062202F5FFF32493CE8E5C@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE904C@exchsth.agresearch.co.nz> <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org> Message-ID: On May 7, 2009, at 6:25 PM, Jason Stajich wrote: > It parses from a stream or file, one sequence at a time so it only > reads a single sequence out at a time, but it does have to parse > that whole sequence record which is where feature rich sequences > might be causing problems. > > I think per your other mention of Tie::File - the whole file is not > going into memory so that is not the problem, it is the creation of > many objects that it does as it parses the sequence that is likely > the problem. It will read up to the first "//" from that Tie::File > anyways, that becomes an entire string which is then parsed to pull > out the relevant features so you don't gain anything with Tie::File > -- what would be the way to solve it is if the objects could be > created and reside in a DB on disk rather than in-memory. I'd > really enjoy seeing more indexed and hashed data to objects stored > on disk when mem requirements are such so that very large datasets > can be handled more nimbly. Or maybe implement some lazy iterator-based methods. We have brought up the subject of the SwissKnife modules here before... 
> I think there have been several attempts to simplify, but it > basically means a dedicated developer to really overhaul or map to a > new system. What we've tried to build is a decent API so a new > implementation can be done without affecting the 'next_seq' and > 'write_seq' API. > > Non-withstanding the seemed API confusion caused by _ancient_ > decisions on giving function names of Bio::SeqFeatureI 'seq' and > Bio::PrimarySeq 'seq' which return different types -- don't forget > that Lincoln's Bio::DB::Fasta uses the 'seq' method to return a > sequence as a string as well so major API changes in general here > will create in all likelihood a big split between the branches that > will make any new Bioperl not match up well with existing scripts or > libraries that use it - hence the reason for no "great realigning" > to a completely well-planned out API rather than the organically > grown whims of several generations of devs. I say this in jest a > bit - I do want to see changes, but I think it really will have to > be called something else besides BioPerl to avoid confusion and the > fact that a lot of things will break that depend on the current > APIs. BioPerl2 or something indicating a Perl6 association. > > -jason Just thought of this: doesn't the feature iterator in Bio::DB::SeqFeature::Store use next_seq for features? Yikes... Anyway, I think if we set a decent enough deprecation schedule, users would adjust, but that's generally for small changes. Dramatic large-scale changes (such as Moose integration and conversion of interfaces to roles) should be done in a separate project. Similarly, as mentioned before, perl6 is a different (yet related) beast to perl5, and so a bioperl-related project using perl6 shouldn't be called BioPerl 2.0. The nice aspect of this: we can take what we like from BioPerl now and refactor it for either project, along the way making sure only the most critical modules get in. 
chris From sidd.basu at gmail.com Fri May 8 14:30:19 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Fri, 8 May 2009 13:30:19 -0500 Subject: [Bioperl-l] Re: Moose [was Re:Other object oddities] In-Reply-To: References: <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> Message-ID: <4a047a3e.23bb720a.3b09.ffff9430@mx.google.com> On Wed, 06 May 2009, Chris Fields wrote: > As a final bit: if we go the Moose route, we should be very careful about > which MooseX modules we want. I don't think we want to expand the > dependency tree. For instance, I am attempting to install one possible > module (MooseX::Declare) and the dependency tree was ginormous and included > modules only needed for installation. > > chris Since we are on the topic of Moose dependencies, here is a nice article about it. http://chris.prather.org/perl/moose-dependencies-a-lurid-tale/ -siddhartha > > On May 6, 2009, at 12:56 PM, Mark A. Jensen wrote: > > > Great discussion-- I have redacted the moose portions to > > http://www.bioperl.org/wiki/Talk:BioMoose and encourage all interested > > folks to log comments there as well. cheers Mark > > ----- Original Message ----- From: "Chris Mungall" > > To: "Chris Fields" > > Cc: "BioPerl List" ; "Mark A. Jensen" > > ; "Kevin Brown" > > Sent: Tuesday, May 05, 2009 2:28 PM > > Subject: [Bioperl-l] Moose [was Re: Other object oddities] > > > > > >> > >> On May 5, 2009, at 7:31 AM, Chris Fields wrote: > >> > >>> On May 5, 2009, at 7:31 AM, Hilmar Lapp wrote: > >>> > >>>> > >>>> On May 4, 2009, at 3:01 PM, Mark A. Jensen wrote: > >>>> > >>>>> Maybe this should be an element of > >>>>> the "Align refactor" that perhaps should be an overall > >>>>> "Seq refactor". > >>>> > >>>> Possibly. Most importantly, it'd be great if someone would volunteer > >>>> to summarize what's been said here so it won't get lost. > >>> > >>> Looks like mark's done it. > >>> > >>>>> Are you saying that the trunk is fair game for api additions > >>>>> for this issue? 
> >>>> > >>>> There's been talk some (a long, actually) time ago about BioPerl 2.0 > >>>> that would start on a clean slate and not be bothered by backwards > >>>> compatibility demands. That effort never really took off, but maybe > >>>> this is also a good time to ask the question again whether it's better > >>>> to introduce the API changes we desire in add/ deprecate/remove cycles, > >>>> or in a more radical fashion starting on a clean slate. > >>> > >>> That's what I'm thinking. > >>> > >>>> The obvious advantage of the former is that we get API improvements > >>>> sooner, but making them is possibly more dreadful, discouraging, or > >>>> not even doable due to compatibility constraints. The disadvantage of > >>>> the latter is that it really needs a committed crew of people to see > >>>> it through or otherwise all the nice changes die in some grand but > >>>> half-finished 2.0 construction site. I think Chris also had plans to > >>>> branch off a Perl6 version of BioPerl - maybe those could be the same > >>>> efforts? > >>> > >>> I have been toying around with perl6 for a bit now (Rakudo on Parrot > >>> implementation). It's possible an alpha for perl6 will be available by > >>> christmas this year; Rakudo is now passing over 11000 spec tests. > >>> > >>> Just to note, Perl6 is another beast altogether from Perl5. Yes, there > >>> is supposed to be a backwards compatibility mode, but no one has > >>> implemented that yet, and it likely won't be implemented in the near > >>> future. Based on that I'm not sure we could really call a bioperl in > >>> perl6 bioperl 2.0, more like bioperl6 1.0, as it would be a complete > >>> refactor. > >>> > >>> As for perl5, it has a nice OO set of modules (Moose) that could be > >>> used for refactoring. It implements roles and a few other perl6-ish > >>> bits (along with MooseX modules). perl 5.10 also has a few things > >>> backported from p6, say(), given/when, state vars, etc. 
We could > >>> require Modern::Perl (perl5.10 with strict/warnings pragmas on) and > >>> Moose. I have played around with both and find them quite nice, so I > >>> suggest if we were to start a 2.0 effort it should include Moose, and > >>> we should push most of the interfaces into roles. > >> > >> We're playing around with a rewrite of go-perl using Moose: > >> http://geneontology.svn.sourceforge.net/viewvc/geneontology/go-moose/OBO/ > >> > >> This is early enough that parts could be scrapped or rewritten. > >> Compatibility with bioperl is important. > >> > >> Speed was an initial concern but apparently there are some moose tricks > >> to speed things up > >> > >> DBIx::Class compatibility is also important. Not sure if there is > >> specific support for this yet > >> > >> > >>> > >>> Anyway, I grabbed the git repos for bioperl6 and biomoose (bioperl > >>> implemented in Moose) on github. We can set up something there using > >>> those namespaces if needed. > >>> > >>>> I'm not trying to advocate one over the other here; rather, I'd like > >>>> to help push on that front that is best able to capture the energy of > >>>> volunteers, as that's what it takes in the end. > >>>> > >>>> -hilmar > >>> > >>> Depends on where everyone wants to place their efforts. May be less > >>> work to port the most important core classes over to Moose, and a > >>> simple test implementation will give us an idea on what works Role- wise > >>> and what doesn't. From there we could work on p6 variants; that would > >>> have to be a separate project altogether. We could also include a few > >>> other MooseX modules if it makes life easier. 
> >>> > >>> chris > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roychu at gmail.com Fri May 8 14:48:15 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 8 May 2009 11:48:15 -0700 Subject: [Bioperl-l] The Power of R (Chris Fields) Message-ID: <4d7f3e450905081148i28579baai3b9889ec7c8e8b1e@mail.gmail.com> Gabriel's book looks like a promising reference that I think I'll want to check out. I just recently came across this use R series by Springer: http://www.springer.com/series/6991 Another one I don't see listed, but probably more relevant is the Springer book: Statistical Methods in Computational Biology. Roy Date: Fri, 8 May 2009 21:49:22 +0900 From: Gabriel Valiente Subject: Re: [Bioperl-l] The Power of R (Chris Fields) To: bioperl-l at lists.open-bio.org Message-ID: <8F435B66-33CF-467D-8D86-AA8EF2309E98 at lsi.upc.edu> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed >>> While we're on the topic, can anyone recommend a good book or >>> resource from which to learn R, to supplement the official docs? Well, my new book G. Valiente. Combinatorial Pattern Matching Algorithms in Computational Biology using Perl and R. Taylor & Francis/CRC Press (2009) http://www.crcpress.com/product/isbn/9781420063677 is already available. 
I hope it will also be of much use to BioPerl developers and users.

Gabriel

From mmuratet at hudsonalpha.org Fri May 8 15:29:38 2009
From: mmuratet at hudsonalpha.org (Michael Muratet)
Date: Fri, 8 May 2009 14:29:38 -0500
Subject: [Bioperl-l] fastq parsing problem
Message-ID:

Greetings

I've got a problem parsing fastq output from the maq aligner. The parser is throwing an exception for the following record:

@HWI-EAS146:3:1:2:177#0/1
CTCCGCTNNCTTCTCAGCTTTCTTGTAGGCGATAGACTTCCCGAGCCTANCCAGAGCAACGAGCNTNNNGNNNNTN
+
@,AB=>-&&:5).;+*=<*8?%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%

I looked up the line in fastq.pm that does the parsing:

my ($top,$sequence,$top2,$qualsequence) = $entry =~ /^
                                        \@?(.+?)\n
                                        ([^\@]*?)\n
                                        \+?(.+?)\n
                                        (.*)\n
                                      /xs

I don't consider myself a regex-pert, but I would interpret the above as "put everything after one or zero @ characters on the first line in $top; then put anything that is not @ on the second line in $sequence; then everything after one or zero + characters on the third line in $top2; then everything on the fourth line in $qualsequence; and don't be greedy".

It seems like the fastq record above should parse with these rules. I note that the @ character is escaped in the regex and appears in several of the problem records, but not all. Has anyone come across this before? I don't see this exact problem in the list archives.

Thanks

Mike

From jason at bioperl.org Fri May 8 16:04:06 2009
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 8 May 2009 13:04:06 -0700
Subject: [Bioperl-l] How to get coil prediction out of Bio::Tools::Run::Coil modules
In-Reply-To: <1241799580.6963.165.camel@pepa.cbm.uam.es>
References: <1241799580.6963.165.camel@pepa.cbm.uam.es>
Message-ID:

The sequence isn't part of the report - or at least isn't parsed, but you can just do this (pseudo-y-code here).

# assumes $seqin is a Bio::Seq object for the query protein, not a file name
my $seqout = Bio::SeqIO->new(-format => 'fasta');
for my $feature ( @features ) {
    # trunc() returns a sequence object that write_seq() can take;
    # subseq() would return only a plain string
    my $featseq = $seqin->trunc($feature->start, $feature->end);
    $seqout->write_seq($featseq);
}

On May 8, 2009, at 9:19 AM, Raul Mendez Giraldez wrote:

> Hi,
>
> I'm trying to get coiled-coil predictions for protein sequences using
> Bob Russell's program ncoils, through the BioPerl interface
> Bio::Tools::Run::Coil, but the only things I can get from any element
> of the features list are the sequence name and a few other, not so
> useful, attributes.
>
> I'm running the following script:
>
>
> #!/home/rmendez/bin/perl -w
>
> use strict;
> use FileHandle;
> use Data::Dumper;
>
> use Bio::Tools::Run::Coil;
>
> my $seqin = 'filein.fasta';
> my $factory = Bio::Tools::Run::Coil->new('-c');
> my @features = $factory->run($seqin);
>
> print "Printing content of features[0]\n";
> print Dumper $features[0];
>
> ----
>
> And the output (the content of the first element of the features
> array) is:
> '_gsf_tag_hash' => {
>     'percent_id' => [
>         'NULL'
>     ],
>     'hid' => [
>         'ncoils'
>     ],
>     'evalue' => [
>         0
>     ]
> },
> '_location' => bless( {
>     '_location_type' => 'EXACT',
>     '_start' => 138,
>     '_end' => 172
> }, 'Bio::Location::Simple' ),
> '_gsf_seq_id' => 'ENSDARP00000084927',
> '_parse_h' => {},
> '_root_cleanup_methods' => [
>     sub { "DUMMY" }
> ],
> '_source_tag' => 'Coils',
> '_primary_tag' => 'ncoils',
> '_root_verbose' => 0
> }, 'Bio::SeqFeature::Generic' );
>
> Then how could I get the sequence itself with the coil annotation
> 'xxx'?
> > Thanks, > > Raul > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From SMarkel at accelrys.com Fri May 8 16:05:29 2009 From: SMarkel at accelrys.com (Scott Markel) Date: Fri, 8 May 2009 16:05:29 -0400 Subject: [Bioperl-l] The Power of R (Chris Fields) In-Reply-To: <8F435B66-33CF-467D-8D86-AA8EF2309E98@lsi.upc.edu> References: <8F435B66-33CF-467D-8D86-AA8EF2309E98@lsi.upc.edu> Message-ID: <1F1240778FB0AF46B4E5A72C44D2C7472A29BE7E@exch1-hi.accelrys.net> Gabriel, I just finished looking through my copy of your book last night. You cover a nice combination of pattern matching tasks and include background information for both Perl and R. I like the fact that the Perl examples use BioPerl where appropriate. Too bad that most other bioinformatics books don't do the same. A quick personal comment - Thank you for referencing the "Using BioPerl" book that Jason Stajich, Ewan Birney, and I are writing. Now we'll have to finish it. :) Scott Scott Markel, Ph.D. 
Principal Bioinformatics Architect email: smarkel at accelrys.com Accelrys (SciTegic R&D) mobile: +1 858 205 3653 10188 Telesis Court, Suite 100 voice: +1 858 799 5603 San Diego, CA 92121 fax: +1 858 799 5222 USA web: http://www.accelrys.com http://www.linkedin.com/in/smarkel Vice President, Board of Directors: International Society for Computational Biology Co-chair: ISCB Publications Committee Associate Editor: PLoS Computational Biology Editorial Board: Briefings in Bioinformatics > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Gabriel Valiente > Sent: Friday, 08 May 2009 5:49 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] The Power of R (Chris Fields) > > >>> While we're on the topic, can anyone recommend a good book or > >>> resource from which to learn R, to supplement the official docs? > > Well, my new book > > G. Valiente. Combinatorial Pattern Matching Algorithms in > Computational Biology using Perl and R. Taylor & Francis/CRC Press > (2009) > > http://www.crcpress.com/product/isbn/9781420063677 > > is already available. I hope it will also be of much use to BioPerl > developers and users. > > Gabriel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roychu at gmail.com Fri May 8 16:46:48 2009 From: roychu at gmail.com (Chu, Roy) Date: Fri, 8 May 2009 13:46:48 -0700 Subject: [Bioperl-l] The Power of R (Chris Fields) In-Reply-To: <4d7f3e450905081148i28579baai3b9889ec7c8e8b1e@mail.gmail.com> References: <4d7f3e450905081148i28579baai3b9889ec7c8e8b1e@mail.gmail.com> Message-ID: <4d7f3e450905081346j14aa9fbdv9712f7e4b0cc8781@mail.gmail.com> My mistake, Statistical Methods in Bioinformatics. On Fri, May 8, 2009 at 11:48 AM, Chu, Roy wrote: > Gabriel's book looks like a promising reference that I think I'll want > to check out. 
> I just recently came across this use R series by Springer: > http://www.springer.com/series/6991 > > Another one I don't see listed, but probably more relevant is the > Springer book: Statistical Methods in Computational Biology. > > Roy > > > Date: Fri, 8 May 2009 21:49:22 +0900 > From: Gabriel Valiente > Subject: Re: [Bioperl-l] The Power of R (Chris Fields) > To: bioperl-l at lists.open-bio.org > Message-ID: <8F435B66-33CF-467D-8D86-AA8EF2309E98 at lsi.upc.edu> > Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed > >>>> While we're on the topic, can anyone recommend a good book or >>>> resource from which to learn R, to supplement the official docs? > > Well, my new book > > G. Valiente. Combinatorial Pattern Matching Algorithms in > Computational Biology using Perl and R. Taylor & Francis/CRC Press > (2009) > > http://www.crcpress.com/product/isbn/9781420063677 > > is already available. I hope it will also be of much use to BioPerl > developers and users. > > Gabriel > From maj at fortinbras.us Fri May 8 21:33:58 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 8 May 2009 21:33:58 -0400 Subject: [Bioperl-l] Moose [was Re:Other object oddities] In-Reply-To: <4a047a3e.23bb720a.3b09.ffff9430@mx.google.com> References: <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> <4a047a3e.23bb720a.3b09.ffff9430@mx.google.com> Message-ID: <4076AEE2CB9F45138C6FE76DD807D0BA@NewLife> thanks Siddhartha- very informative [but he misquotes Eliot in his header!] cheers MAJ ----- Original Message ----- From: "Siddhartha Basu" To: Sent: Friday, May 08, 2009 2:30 PM Subject: [Bioperl-l] Re: Moose [was Re:Other object oddities] > On Wed, 06 May 2009, Chris Fields wrote: > >> As a final bit: if we go the Moose route, we should be very careful about >> which MooseX modules we want. I don't think we want to expand the >> dependency tree. 
For instance, I am attempting to install one possible >> module (MooseX::Declare) and the dependency tree was ginormous and included >> modules only needed for installation. >> >> chris > > Since we are on the topic of Moose dependencies, here is a nice article > about it. > http://chris.prather.org/perl/moose-dependencies-a-lurid-tale/ > > -siddhartha > >> >> On May 6, 2009, at 12:56 PM, Mark A. Jensen wrote: >> >> > Great discussion-- I have redacted the moose portions to >> > http://www.bioperl.org/wiki/Talk:BioMoose and encourage all interested >> > folks to log comments there as well. cheers Mark >> > ----- Original Message ----- From: "Chris Mungall" >> > To: "Chris Fields" >> > Cc: "BioPerl List" ; "Mark A. Jensen" >> > ; "Kevin Brown" >> > Sent: Tuesday, May 05, 2009 2:28 PM >> > Subject: [Bioperl-l] Moose [was Re: Other object oddities] >> > >> > >> >> >> >> On May 5, 2009, at 7:31 AM, Chris Fields wrote: >> >> >> >>> On May 5, 2009, at 7:31 AM, Hilmar Lapp wrote: >> >>> >> >>>> >> >>>> On May 4, 2009, at 3:01 PM, Mark A. Jensen wrote: >> >>>> >> >>>>> Maybe this should be an element of >> >>>>> the "Align refactor" that perhaps should be an overall >> >>>>> "Seq refactor". >> >>>> >> >>>> Possibly. Most importantly, it'd be great if someone would volunteer >> >>>> to summarize what's been said here so it won't get lost. >> >>> >> >>> Looks like mark's done it. >> >>> >> >>>>> Are you saying that the trunk is fair game for api additions >> >>>>> for this issue? >> >>>> >> >>>> There's been talk some (a long, actually) time ago about BioPerl 2.0 >> >>>> that would start on a clean slate and not be bothered by backwards >> >>>> compatibility demands. That effort never really took off, but maybe >> >>>> this is also a good time to ask the question again whether it's better >> >>>> to introduce the API changes we desire in add/ deprecate/remove cycles, >> >>>> or in a more radical fashion starting on a clean slate. >> >>> >> >>> That's what I'm thinking. 
>> >>> >> >>>> The obvious advantage of the former is that we get API improvements >> >>>> sooner, but making them is possibly more dreadful, discouraging, or >> >>>> not even doable due to compatibility constraints. The disadvantage of >> >>>> the latter is that it really needs a committed crew of people to see >> >>>> it through or otherwise all the nice changes die in some grand but >> >>>> half-finished 2.0 construction site. I think Chris also had plans to >> >>>> branch off a Perl6 version of BioPerl - maybe those could be the same >> >>>> efforts? >> >>> >> >>> I have been toying around with perl6 for a bit now (Rakudo on Parrot >> >>> implementation). It's possible an alpha for perl6 will be available by >> >>> christmas this year; Rakudo is now passing over 11000 spec tests. >> >>> >> >>> Just to note, Perl6 is another beast altogether from Perl5. Yes, there >> >>> is supposed to be a backwards compatibility mode, but no one has >> >>> implemented that yet, and it likely won't be implemented in the near >> >>> future. Based on that I'm not sure we could really call a bioperl in >> >>> perl6 bioperl 2.0, more like bioperl6 1.0, as it would be a complete >> >>> refactor. >> >>> >> >>> As for perl5, it has a nice OO set of modules (Moose) that could be >> >>> used for refactoring. It implements roles and a few other perl6-ish >> >>> bits (along with MooseX modules). perl 5.10 also has a few things >> >>> backported from p6, say(), given/when, state vars, etc. We could >> >>> require Modern::Perl (perl5.10 with strict/warnings pragmas on) and >> >>> Moose. I have played around with both and find them quite nice, so I >> >>> suggest if we were to start a 2.0 effort it should include Moose, and >> >>> we should push most of the interfaces into roles. 
>> >> >> >> We're playing around with a rewrite of go-perl using Moose: >> >> http://geneontology.svn.sourceforge.net/viewvc/geneontology/go-moose/OBO/ >> >> >> >> This is early enough that parts could be scrapped or rewritten. >> >> Compatibility with bioperl is important. >> >> >> >> Speed was an initial concern but apparently there are some moose tricks >> >> to speed things up >> >> >> >> DBIx::Class compatibility is also important. Not sure if there is >> >> specific support for this yet >> >> >> >> >> >>> >> >>> Anyway, I grabbed the git repos for bioperl6 and biomoose (bioperl >> >>> implemented in Moose) on github. We can set up something there using >> >>> those namespaces if needed. >> >>> >> >>>> I'm not trying to advocate one over the other here; rather, I'd like >> >>>> to help push on that front that is best able to capture the energy of >> >>>> volunteers, as that's what it takes in the end. >> >>>> >> >>>> -hilmar >> >>> >> >>> Depends on where everyone wants to place their efforts. May be less >> >>> work to port the most important core classes over to Moose, and a >> >>> simple test implementation will give us an idea on what works Role- wise >> >>> and what doesn't. From there we could work on p6 variants; that would >> >>> have to be a separate project altogether. We could also include a few >> >>> other MooseX modules if it makes life easier. 
>> >>> >> >>> chris >> >>> _______________________________________________ >> >>> Bioperl-l mailing list >> >>> Bioperl-l at lists.open-bio.org >> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>> >> >> >> >> _______________________________________________ >> >> Bioperl-l mailing list >> >> Bioperl-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Fri May 8 21:45:18 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 8 May 2009 21:45:18 -0400 Subject: [Bioperl-l] fastq parsing problem In-Reply-To: References: Message-ID: <0CECA54FA78F46839114FAB96CD53F39@NewLife> Hi Michael-- Can you send along the exception? The line you send seems to parse as advertised in the debugger (as long as the last newline that breaks up the string of %'s is not really there). thanks, Mark ----- Original Message ----- From: "Michael Muratet" To: ; Sent: Friday, May 08, 2009 3:29 PM Subject: [Bioperl-l] fastq parsing problem > Greetings > > I've got a problem parsing fastq output from the maq aligner. The > parser is throwing an exception for the following record: > > @HWI-EAS146:3:1:2:177#0/1 > CTCCGCTNNCTTCTCAGCTTTCTTGTAGGCGATAGACTTCCCGAGCCTANCCAGAGCAACGAGCNTNNNGNNNNTN > + > @,AB=>-&&:5).;+*=<*8?%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% > %%%%% > > I looked up the line in fastq.pm that does the parsing: > > 116 my ($top,$sequence,$top2,$qualsequence) = $entry =~ /^ > 117 \@?(. 
> +?)\n > 118 ([^ > \@]*?)\n > 119 \+?(. > +?)\n > 120 (.*)\n > 121 /xs > > I don't consider myself a regex-pert, but I would interpret the above > as "put everything after one or zero @ characters on the first line in > $top; then put anything that is not @ on the second line in $sequence; > then everything after one or zero + characters on the third line in > $top2; then everything on the fourth line in $qualsequence; and don't > be greedy". > > It seems like the fastq record above should parse with these rules. I > note that the @ character is escaped in the regex and appears in > several of the problem records, but not all. Has anyone come across > this before? I don't see this exact problem in the list archives. > > Thanks > > Mike > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Sat May 9 11:26:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 9 May 2009 10:26:42 -0500 Subject: [Bioperl-l] Moose [was Re:Other object oddities] In-Reply-To: <4076AEE2CB9F45138C6FE76DD807D0BA@NewLife> References: <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> <4a047a3e.23bb720a.3b09.ffff9430@mx.google.com> <4076AEE2CB9F45138C6FE76DD807D0BA@NewLife> Message-ID: <003BA940-D974-44A8-9634-55963C2E8341@illinois.edu> Decent article, but it is slightly misleading. These are dependencies for Moose itself, which I don't have a problem with (off the subject, but I personally would like to add in a requirement for Modern::Perl!). What I am worried about are lots of additional dependencies introduced using some of the 'syntactic sugar' in various MooseX modules. 
For instance, MooseX::Declare, and MooseX::Method::Signatures (two popular ones): http://deps.cpantesters.org/?module=MooseX%3A%3ADeclare&perl=any+version&os=any+OS http://deps.cpantesters.org/?module=MooseX%3A%3AMethod%3A%3ASignatures&perl=any+version&os=any+OS chris On May 8, 2009, at 8:33 PM, Mark A. Jensen wrote: > thanks Siddhartha- very informative [but he misquotes Eliot in his > header!] > cheers MAJ > ----- Original Message ----- From: "Siddhartha Basu" > > To: > Sent: Friday, May 08, 2009 2:30 PM > Subject: [Bioperl-l] Re: Moose [was Re:Other object oddities] > > >> On Wed, 06 May 2009, Chris Fields wrote: >> >>> As a final bit: if we go the Moose route, we should be very >>> careful about >>> which MooseX modules we want. I don't think we want to expand the >>> dependency tree. For instance, I am attempting to install one >>> possible >>> module (MooseX::Declare) and the dependency tree was ginormous and >>> included >>> modules only needed for installation. >>> >>> chris >> >> Since we are on the topic of Moose dependencies, here is a nice >> article >> about it. >> http://chris.prather.org/perl/moose-dependencies-a-lurid-tale/ >> >> -siddhartha >> >>> >>> On May 6, 2009, at 12:56 PM, Mark A. Jensen wrote: >>> >>> > Great discussion-- I have redacted the moose portions to >>> > http://www.bioperl.org/wiki/Talk:BioMoose and encourage all >>> interested >>> > folks to log comments there as well. cheers Mark >>> > ----- Original Message ----- From: "Chris Mungall" >> > >>> > To: "Chris Fields" >>> > Cc: "BioPerl List" ; "Mark A. >>> Jensen" >>> > ; "Kevin Brown" >>> > Sent: Tuesday, May 05, 2009 2:28 PM >>> > Subject: [Bioperl-l] Moose [was Re: Other object oddities] >>> > >>> > >>> >> >>> >> On May 5, 2009, at 7:31 AM, Chris Fields wrote: >>> >> >>> >>> On May 5, 2009, at 7:31 AM, Hilmar Lapp wrote: >>> >>> >>> >>>> >>> >>>> On May 4, 2009, at 3:01 PM, Mark A. 
Jensen wrote: >>> >>>> >>> >>>>> Maybe this should be an element of >>> >>>>> the "Align refactor" that perhaps should be an overall >>> >>>>> "Seq refactor". >>> >>>> >>> >>>> Possibly. Most importantly, it'd be great if someone would >>> volunteer >>> >>>> to summarize what's been said here so it won't get lost. >>> >>> >>> >>> Looks like mark's done it. >>> >>> >>> >>>>> Are you saying that the trunk is fair game for api additions >>> >>>>> for this issue? >>> >>>> >>> >>>> There's been talk some (a long, actually) time ago about >>> BioPerl 2.0 >>> >>>> that would start on a clean slate and not be bothered by >>> backwards >>> >>>> compatibility demands. That effort never really took off, >>> but maybe >>> >>>> this is also a good time to ask the question again whether >>> it's better >>> >>>> to introduce the API changes we desire in add/ deprecate/ >>> remove cycles, >>> >>>> or in a more radical fashion starting on a clean slate. >>> >>> >>> >>> That's what I'm thinking. >>> >>> >>> >>>> The obvious advantage of the former is that we get API >>> improvements >>> >>>> sooner, but making them is possibly more dreadful, >>> discouraging, or >>> >>>> not even doable due to compatibility constraints. The >>> disadvantage of >>> >>>> the latter is that it really needs a committed crew of people >>> to see >>> >>>> it through or otherwise all the nice changes die in some >>> grand but >>> >>>> half-finished 2.0 construction site. I think Chris also had >>> plans to >>> >>>> branch off a Perl6 version of BioPerl - maybe those could be >>> the same >>> >>>> efforts? >>> >>> >>> >>> I have been toying around with perl6 for a bit now (Rakudo on >>> Parrot >>> >>> implementation). It's possible an alpha for perl6 will be >>> available by >>> >>> christmas this year; Rakudo is now passing over 11000 spec >>> tests. >>> >>> >>> >>> Just to note, Perl6 is another beast altogether from Perl5. 
>>> Yes, there >>> >>> is supposed to be a backwards compatibility mode, but no one >>> has >>> >>> implemented that yet, and it likely won't be implemented in >>> the near >>> >>> future. Based on that I'm not sure we could really call a >>> bioperl in >>> >>> perl6 bioperl 2.0, more like bioperl6 1.0, as it would be a >>> complete >>> >>> refactor. >>> >>> >>> >>> As for perl5, it has a nice OO set of modules (Moose) that >>> could be >>> >>> used for refactoring. It implements roles and a few other >>> perl6-ish >>> >>> bits (along with MooseX modules). perl 5.10 also has a few >>> things >>> >>> backported from p6, say(), given/when, state vars, etc. We >>> could >>> >>> require Modern::Perl (perl5.10 with strict/warnings pragmas >>> on) and >>> >>> Moose. I have played around with both and find them quite >>> nice, so I >>> >>> suggest if we were to start a 2.0 effort it should include >>> Moose, and >>> >>> we should push most of the interfaces into roles. >>> >> >>> >> We're playing around with a rewrite of go-perl using Moose: >>> >> http://geneontology.svn.sourceforge.net/viewvc/geneontology/go-moose/OBO/ >>> >> >>> >> This is early enough that parts could be scrapped or rewritten. >>> >> Compatibility with bioperl is important. >>> >> >>> >> Speed was an initial concern but apparently there are some >>> moose tricks >>> >> to speed things up >>> >> >>> >> DBIx::Class compatibility is also important. Not sure if there is >>> >> specific support for this yet >>> >> >>> >> >>> >>> >>> >>> Anyway, I grabbed the git repos for bioperl6 and biomoose >>> (bioperl >>> >>> implemented in Moose) on github. We can set up something >>> there using >>> >>> those namespaces if needed. >>> >>> >>> >>>> I'm not trying to advocate one over the other here; rather, >>> I'd like >>> >>>> to help push on that front that is best able to capture the >>> energy of >>> >>>> volunteers, as that's what it takes in the end. 
>>> >>>> >>> >>>> -hilmar >>> >>> >>> >>> Depends on where everyone wants to place their efforts. May >>> be less >>> >>> work to port the most important core classes over to Moose, >>> and a >>> >>> simple test implementation will give us an idea on what works >>> Role- wise >>> >>> and what doesn't. From there we could work on p6 variants; >>> that would >>> >>> have to be a separate project altogether. We could also >>> include a few >>> >>> other MooseX modules if it makes life easier. >>> >>> >>> >>> chris >>> >>> _______________________________________________ >>> >>> Bioperl-l mailing list >>> >>> Bioperl-l at lists.open-bio.org >>> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> >>> >> _______________________________________________ >>> >> Bioperl-l mailing list >>> >> Bioperl-l at lists.open-bio.org >>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >>> > >>> > _______________________________________________ >>> > Bioperl-l mailing list >>> > Bioperl-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Sun May 10 16:49:56 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Mon, 11 May 2009 08:49:56 +1200 Subject: [Bioperl-l] Asking for advice on full EMBL extraction In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32493CE8F24@exchsth.agresearch.co.nz> <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> 
<18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE904C@exchsth.agresearch.co.nz> <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE9104@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493CE92E3@exchsth.agresearch.co.nz>

How about splitting the big file into smaller chunks and processing one sequence at a time?
It could be one specific feature line that's causing the segfault and nothing to do with file size.
You should be able to split the file with awk as well (I like awk :-)

zcat rel_ann_mus_01_r99.dat.gz | awk 'BEGIN{RS="//";OFS="\n"}{$1=$1; print > "chunk"NR}'

--Russell

> -----Original Message-----
> From: brian li [mailto:brianli.cas at gmail.com]
> Sent: Saturday, 9 May 2009 2:49 a.m.
> To: Smithies, Russell
> Cc: bioperl-l at lists.open-bio.org; Jason Stajich; Chris Fields
> Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
>
> open $fh, "gunzip -c rel_ann_mus_01_r99.dat.gz | awk '!/^FT|^CO/{print}' |" works.
> open $fh, "gunzip -c rel_ann_mus_01_r99.dat.gz | awk '!/^SQ|^ /{print}' |" segfaults.
>
> So it seems the features are causing problems. Although I still don't know how that hurts my os to pop a segfault, my extraction can move on again. Maybe I can find a clue when I know more about my os's memory management strategy.
>
> Really appreciate all your help.
>
> -Brian
>
> On Fri, May 8, 2009 at 8:03 AM, Smithies, Russell wrote:
> > I think the problem here though is the size of the sequences rather than too many features.
> >
> > If one was inclined to bodge/hack and didn't care about sequence, I guess you could filter them out with awk so Bio::SeqIO doesn't have to create the Bio::PrimarySeq :-)
> >
> > Probably breaks the EMBL file spec.
> >
> > Eg.
> >
> > open( $fh, "gunzip -c rel_ann_mus_01_r99.dat.gz | awk '!/^SQ|^ /{print}' |" ) or die;
> >
> > --Russell
> >
> > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich
> > Sent: Friday, 8 May 2009 11:25 a.m.
> > To: Smithies, Russell
> > Cc: 'brian li'; 'Chris Fields'; 'bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
> >
> > It parses from a stream or file, one sequence at a time so it only reads a single sequence out at a time, but it does have to parse that whole sequence record which is where feature rich sequences might be causing problems.
> >
> > I think per your other mention of Tie::File - the whole file is not going into memory so that is not the problem, it is the creation of many objects that it does as it parses the sequence that is likely the problem. It will read up to the first "//" from that Tie::File anyways, that becomes an entire string which is then parsed to pull out the relevant features so you don't gain anything with Tie::File -- what would be the way to solve it is if the objects could be created and reside in a DB on disk rather than in-memory. I'd really enjoy seeing more indexed and hashed data to objects stored on disk when mem requirements are such so that very large datasets can be handled more nimbly.
> >
> > I think there have been several attempts to simplify, but it basically means a dedicated developer to really overhaul or map to a new system. What we've tried to build is a decent API so a new implementation can be done without affecting the 'next_seq' and 'write_seq' API.
> >
> > Non-withstanding the seemed API confusion caused by _ancient_ decisions on giving function names of Bio::SeqFeatureI 'seq' and Bio::PrimarySeq 'seq' which return different types -- don't forget that Lincoln's Bio::DB::Fasta uses the 'seq' method to return a sequence as a string as well so major API changes in general here will create in all likelihood a big split between the branches that will make any new Bioperl not match up well with existing scripts or libraries that use it - hence the reason for no "great realigning" to a completely well-planned out API rather than the organically grown whims of several generations of devs. I say this in jest a bit - I do want to see changes, but I think it really will have to be called something else besides BioPerl to avoid confusion and the fact that a lot of things will break that depend on the current APIs. BioPerl2 or something indicating a Perl6 association.
> >
> > -jason
> >
> > On May 7, 2009, at 3:05 PM, Smithies, Russell wrote:
> >
> > OK, I misunderstood, I thought the entire file loaded was loaded into memory first then each sequence was extracted from there.
> > I hoped splitting into 588 individual sequences might help.
> >
> > --Russell
> >
> > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich
> > Sent: Friday, 8 May 2009 9:55 a.m.
> > To: Smithies, Russell
> > Cc: 'brian li'; 'Chris Fields'; 'bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
> >
> > Russell -
> >
> > I am not sure how that will help as only 1 sequence is parsed at a time by SeqIO parsers and they use the "//" delimiter.
> >
> > If the equivalent data exists in genbank format at NCBI I think _that_ module (Bio::SeqIO::genbank) has the ability to ignore annotations/features. Really we have to re-work the whole thing to be more lightweight and lazy-parse.
> >
> > -jason
> > On May 7, 2009, at 2:24 PM, Smithies, Russell wrote:
> >
> > I'm not sure if this will help with your problem or how it deals with memory management but using "ordinary" Perl to split the large EMBL file might work.
> > Give this a go:
> >
> > ============================
> > #!perl -w
> >
> > use Bio::SeqIO;
> > use IO::String;
> >
> > # EMBL entries end with a line containing only "//"
> > use constant SEP => "//\n";
> >
> > open($fh, "gunzip -c rel_ann_mus_01_r99.dat.gz |") or die;
> >
> > my $index = 1;
> >
> > # Test the record itself: IO::String->new returns an object even for
> > # undef input, so looping on the object would never stop at end of file.
> > while ( defined( my $record = get_next_record($fh) ) ) {
> >
> >     my $stringfh = IO::String->new($record);
> >     my $seqio = Bio::SeqIO->new( -fh => $stringfh, -format => "EMBL" ) or die $!;
> >
> >     while ( my $seq_object = $seqio->next_seq ) {
> >         print "Dealing with entry: ".$index++."\t".$seq_object->id."\n";
> >
> >         # show the features
> >         for my $feat_object ($seq_object->get_SeqFeatures) {
> >             print "primary tag: ", $feat_object->primary_tag, "\n";
> >             for my $tag ($feat_object->get_all_tags) {
> >                 print "  tag: ", $tag, "\n";
> >                 for my $value ($feat_object->get_tag_values($tag)) {
> >                     print "    value: ", $value, "\n";
> >                 }
> >             }
> >         }
> >     }
> >
> > }
> >
> > # Read one entry, up to and including the "//" terminator line
> > sub get_next_record{
> >     my($fh) = @_;
> >     (my $old_sep,$/) = ($/,SEP);
> >     my $record = <$fh>;
> >     $/ = $old_sep;
> >     return $record;
> > }
> > ========================================
> >
> > --Russell
> >
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of brian li
> > Sent: Friday, 8 May 2009 1:00 a.m.
> > To: Chris Fields
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
> >
> > My server has 32 GB RAM.
> >
> > The os of my server is 64-bit version of Ubuntu Server Edition 8.04 LTS. And I have run my example code on another server with 32-bit version of Ubuntu Server Edition 8.04 and 4 GB RAM. Segfault again.
> >
> > -Brian
> >
> > On Thu, May 7, 2009 at 8:07 PM, Chris Fields wrote:
> > I noticed that Russell has 16GB RAM on his setup. Was yours equivalent?
> >
> > chris
> >
> > On May 7, 2009, at 12:32 AM, brian li wrote:
> >
> > Thank you very much for your offer.
> >
> > The director of our lab wants me to do the extraction every time a new release of EMBL is published. I can't push the task to you every time.
> >
> > I can offer more information of the server I run my script on if needed.
> >
> > -Brian
> >
> > On Thu, May 7, 2009 at 1:01 PM, Smithies, Russell wrote:
> >
> > Sadly, that's the same code as I ran but I had a Data::Dump in the middle.
> > Versions of Perl and BioPerl are the same.
> > We're running RHEL 5 (kernel 2.6.18-92.1.18.el5) with 16GB RAM
> >
> > If you get a full script running on a smaller dataset, I could probably run it on the bigger stuff and give you back tab-separated (or is that tab\tseparated ?) data for loading into your db.
> >
> > --Russell
> >
> > -----Original Message-----
> > From: brian li [mailto:brianli.cas at gmail.com]
> > Sent: Thursday, 7 May 2009 4:50 p.m.
> > To: Smithies, Russell
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Asking for advice on full EMBL extraction
> >
> > Dear Russell,
> >
> > My example code is as following. I omit the parse process and these lines give me "Segmentation Fault" too.
> >
> > # Start of code
> > my $seqio = Bio::SeqIO->new(-file   => 'rel_ann_mus_01_r99.dat',
> >                             -format => 'EMBL');
> > my $index = 1;
> > while (my $seq = $seqio->next_seq)
> > {
> >   print "Dealing with entry: $index\n";
> >   $index++;
> > }
> > # End
> >
> > The platform I run this code on:
> > BioPerl 1.6.0
> > Perl 5.8.8
> > Ubuntu 8.04 LTS Server 64-bit version (Linux 2.6.24-23-server)
> >
> > I have monitored the memory usage when I run the code above. There is always around 20GB free memory (buffer size counted in) left. So I suppose the segfault can't be explained just by memory shortage.
> >
> > Brian
> >
> > On Thu, May 7, 2009 at 11:32 AM, Smithies, Russell wrote:
> >
> > Hi Brian,
> > I hate to say it but it worked OK for me using rel_ann_mus_01_r99.dat.gz and simple example Bio::SeqIO code from bugzilla
> >
> > It's not using more than 1GB memory on our server and doesn't segfault.
> >
> > Send me your example code and I'll give it a go if you like.
> >
> > Russell Smithies
> >
> > Bioinformatics Applications Developer
> > T +64 3 489 9085
> > E russell.smithies at agresearch.co.nz
> >
> > Invermay Research Centre
> > Puddle Alley,
> > Mosgiel,
> > New Zealand
> > T +64 3 489 3809
> > F +64 3 489 9174
> > www.agresearch.co.nz
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Jason Stajich
> > jason at bioperl.org

From brianli.cas at gmail.com Sun May 10 22:43:48 2009
From: brianli.cas at gmail.com (brian li)
Date: Mon, 11 May 2009 10:43:48 +0800
Subject: [Bioperl-l] Asking for advice on full EMBL extraction
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493CE92E3@exchsth.agresearch.co.nz>
References: <3070BEFE-CC10-44CC-9FB9-79B7BB0E53E0@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32493CE8FE3@exchsth.agresearch.co.nz> <6C1564CE-EC1E-446B-BD11-A0C1E627B14B@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE904C@exchsth.agresearch.co.nz> <82AAC49D-458A-4E79-90EA-A793A053314F@bioperl.org> <18DF7D20DFEC044098A1062202F5FFF32493CE9104@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493CE92E3@exchsth.agresearch.co.nz>
Message-ID:

Thanks for your advice. I agree with you that some feature lines are causing the segfault, but I don't know which ones. I am afraid splitting the big files by sequence would not help, as I don't know which chunk has a mean feature :) I can't try some code and run it on every file.

I think for now I have to just skip the features and make the extraction run first.
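
One way to track down the troublesome entry, as a rough sketch rather than a tested recipe: split the file on lines that consist only of `//` (the awk one-liner suggested earlier in the thread sets `RS="//"`, which in awks that accept a multi-character RS would also split on a `//` inside a line, a URL for instance), then run the parser on each entry in its own process and note the ones that fail. The function names `split_embl`, `strip_features` and `find_bad_entries`, and the `entry_NNNNNN.embl` file naming, are made up for this illustration; none of them comes from BioPerl.

```shell
#!/bin/sh

# Split an EMBL flat file into one file per entry.
# An entry ends at a line consisting solely of "//".
# Usage: split_embl INPUT OUTDIR   (INPUT may be "-" for stdin)
split_embl() {
    mkdir -p "$2" || return 1
    awk -v dir="$2" '
        BEGIN { n = 1 }
        {
            # every line goes to the file of the current entry
            file = sprintf("%s/entry_%06d.embl", dir, n)
            print > file
        }
        # terminator line: close this entry and move on to the next
        /^\/\/[ \t\r]*$/ { close(file); n++ }
    ' "$1"
}

# Filter that drops feature table (FT) and contig (CO) lines,
# the same idea as the awk filter that worked in this thread.
strip_features() {
    awk '!/^(FT|CO)/'
}

# Run a command once per entry file and print the files it fails on.
# Usage: find_bad_entries DIR COMMAND [ARGS...]
find_bad_entries() {
    dir=$1
    shift
    for f in "$dir"/entry_*.embl; do
        [ -e "$f" ] || continue            # directory had no entries
        "$@" "$f" >/dev/null 2>&1 || echo "$f"
    done
}
```

For example, `gunzip -c rel_ann_mus_01_r99.dat.gz | split_embl - entries` followed by `find_bad_entries entries perl parse_one.pl`, where `parse_one.pl` is a hypothetical script that runs the Bio::SeqIO loop on the file named in its first argument. A child process that segfaults exits with a non-zero status, so the entry that triggers the crash should show up in the output.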
--Brian

On Mon, May 11, 2009 at 4:49 AM, Smithies, Russell wrote:
> How about splitting the big file into smaller chunks and processing one sequence at a time?
> It could be one specific feature line that's causing the segfault and nothing to do with file size.
> You should be able to split the file with awk as well (I like awk :-)
>
> zcat rel_ann_mus_01_r99.dat.gz | awk 'BEGIN{RS="//";OFS="\n"}{$1=$1; print > "chunk"NR}'
>
> --Russell

From dan.bolser at gmail.com Mon May 11 09:58:01 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 11 May 2009 14:58:01 +0100
Subject: [Bioperl-l] machine learnings
In-Reply-To: <704392.20390.qm@web8402.mail.in.yahoo.com>
References: <704392.20390.qm@web8402.mail.in.yahoo.com>
Message-ID: <2c8757af0905110658q593c3684h3de8d7e0c294c0@mail.gmail.com>

2009/5/4 punit kumar :
> hello
>
> i am punit kumar , i want to know that is the artificial neural network, and other machine learnings techniques modules are availabe
> in bio perl or not,

I don't think they are available in BioPerl.

> if available pls give suggestion that how i can utilise them.

You could try looking in "R" or here:

http://smw.referata.com/wiki/Emergent_Neural_Network_Simulation_System

Good luck!
Dan.

> punit kumar kadimi.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From dan.bolser at gmail.com Mon May 11 10:34:22 2009
From: dan.bolser at gmail.com (Dan Bolser)
Date: Mon, 11 May 2009 15:34:22 +0100
Subject: [Bioperl-l] Getting 'features' from SearchIO?
Message-ID: <2c8757af0905110734i26f7102k69b615dc413e6be9@mail.gmail.com>

Hi,

I am parsing a blasttable and extracting Bio::Search::HSP::GenericHSP objects as a result. I read somewhere that HSP objects inherit Feature objects... How can I get a 'standard' representation of the HSP as a feature? Basically I'd like to simply load the blast results into a feature database...

When I call feature methods on the HSP objects I just get blank or undef results... I think this is because I'm trying to get at the sequences existing (non existent) features, rather than get the HSP object as a feature... If that makes sense... How can I confirm that I have a feature object containing the details of the HSP?

I thought of trying to just pass the HSP object to the Bio::DB::SeqFeature::Store, but I need to get that up and running first (I'm looking into it). In the mean time I thought I'd ask if this sounds like the right thing to do.

More generally I want to have features attached to sequences that are themselves annotations of larger sequences (but with unknown position). Is Bio::DB::SeqFeature::Store a way to go? I need to manage various different bits of information coming from a sequencing project, and I need a solution to the whole 'assembly life cycle management' problem.

Thanks for any help,
Dan.

From maj at fortinbras.us Mon May 11 10:31:35 2009
From: maj at fortinbras.us (Mark A.
Jensen) Date: Mon, 11 May 2009 10:31:35 -0400 Subject: [Bioperl-l] Google Summer of Code student Chase Miller Message-ID: Hello all, With great pleasure, I want to introduce Chase Miller, my Google Summer of Code student from George Washington University, to the community. Chase will be working with me and Rutger Vos on a BioPerl wrapper for Rutger's Bio::Phylo package, with a particular emphasis on creating a BioPerl-native way to import and export the NeXML (http://nexml.org) phylogenetic data format. He wrote a great proposal, available here: https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit. We will be working throughout the summer on the project, and will of course come to you for sage advice. I know you will welcome him warmly, as you did me. Cheers, Mark From dan.bolser at gmail.com Mon May 11 11:07:47 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Mon, 11 May 2009 16:07:47 +0100 Subject: [Bioperl-l] Getting 'features' from SearchIO? In-Reply-To: <2c8757af0905110734i26f7102k69b615dc413e6be9@mail.gmail.com> References: <2c8757af0905110734i26f7102k69b615dc413e6be9@mail.gmail.com> Message-ID: <2c8757af0905110807g557efe35laf33a95f256dbf10@mail.gmail.com> 2009/5/11 Dan Bolser : > Hi, > > I am parsing a blasttable and extracting Bio::Search::HSP::GenericHSP > objects as a result. I read somewhere that HSP objects inherit Feature > objects... How can I get a 'standard' representation of the HSP as a > feature? Basically I'd like to simply load the blast results into a > feature database... > > When I call feature methods on the HSP objects I just get blank or > undef results... I think this is because I'm trying to get at the > sequences existing (non existent) features, rather than get the HSP > object as a feature... If that makes sense... How can I confirm that I > have a feature object containing the details of the HSP? 
> > I thought of trying to just pass the HSP object to the > Bio::DB::SeqFeature::Store, but I need to get that up and running > first (I'm looking into it). In the mean time I thought I'd ask if > this sounds like the right thing to do. Well it works... I am seeing things fill into the database as I call $db->store($p) or die "Couldn't store!"; (I needed to upgrade bioperl to get Bio::DB::SeqFeature working). Here is my code; while(my $r = $s->next_result ){ print $r->query_name, "\n"; while(my $h = $r->next_hit){ print "\t", $h->name, "\n"; while(my $p = $h->next_hsp){ $db->store($p) or die "Couldn't store!"; } } } How can I visualize the resulting set of HSPs? i.e. If I point gbrowse at this location, will it automatically pick up the entry points and their features from the database? Or how much manual configuration will I need? Is there some boilerplate config I can use to visualize this? Cheers, Dan. > More generally I want to have features attached to sequences that are > themselves annotations of larger sequences (but with unknown > position). Is Bio::DB::SeqFeature::Store a way to go? I need to manage > various different bits of information coming from a sequencing > project, and I need a solution to the whole 'assembly life cycle > management' problem. > > Thanks for any help, > Dan. > From jason at bioperl.org Mon May 11 11:38:14 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 11 May 2009 08:38:14 -0700 Subject: [Bioperl-l] Getting 'features' from SearchIO? In-Reply-To: <2c8757af0905110807g557efe35laf33a95f256dbf10@mail.gmail.com> References: <2c8757af0905110734i26f7102k69b615dc413e6be9@mail.gmail.com> <2c8757af0905110807g557efe35laf33a95f256dbf10@mail.gmail.com> Message-ID: Dan - There is nice documentation on the gmod website covering the gbrowse tutorial on the expected format of alignment features. 
That is what you should probably be generating and loading with the bp_seqfeature_load script -- otherwise you need to be converting the HSPs into seqfeatures with the right associated information (i.e. the tag/value pairs that are in the 9th column) in order to have well structured data in the database. There are boilerplate examples of how to visualize alignments on the Gbrowse tutorial website as well so I commend that as great starting place for GFF, data, conf files, and what kind of visualization you can obtain with the browser. There is also some helper scripts, that do this for you like bp_search2gff. Just dumping the feature will take the query ( i believe) of the feature pair that is the HSP by default, so you will need to make some choices about what information you want. You can get the individual features from the feature pair with $hsp->query or $hsp->hit which can also be passed to a GFF writer (or call $hsp->hit- >gff_string). Note that since the data storage is not structured in a GFF3 like-way this won't immediately produce well formed GFF3 for the 9th column. Here's a script I use for some DNA to genome alignments, from FASTA output for example - it assumes 1 HSP per Hit as per what you get from SSEARCH but is a reasonable jumping off place. http://bit.ly/fasta2gff There is also a wublast to gff converting script in that repository as well. -jason On May 11, 2009, at 8:07 AM, Dan Bolser wrote: > 2009/5/11 Dan Bolser : >> Hi, >> >> I am parsing a blasttable and extracting Bio::Search::HSP::GenericHSP >> objects as a result. I read somewhere that HSP objects inherit >> Feature >> objects... How can I get a 'standard' representation of the HSP as a >> feature? Basically I'd like to simply load the blast results into a >> feature database... >> >> When I call feature methods on the HSP objects I just get blank or >> undef results... 
I think this is because I'm trying to get at the >> sequences existing (non existent) features, rather than get the HSP >> object as a feature... If that makes sense... How can I confirm >> that I >> have a feature object containing the details of the HSP? >> >> I thought of trying to just pass the HSP object to the >> Bio::DB::SeqFeature::Store, but I need to get that up and running >> first (I'm looking into it). In the mean time I thought I'd ask if >> this sounds like the right thing to do. > > Well it works... I am seeing things fill into the database as I call > > $db->store($p) > or die "Couldn't store!"; > > (I needed to upgrade bioperl to get Bio::DB::SeqFeature working). > > > Here is my code; > > while(my $r = $s->next_result ){ > print $r->query_name, "\n"; > while(my $h = $r->next_hit){ > print "\t", $h->name, "\n"; > while(my $p = $h->next_hsp){ > $db->store($p) > or die "Couldn't store!"; > } > } > } > > > How can I visualize the resulting set of HSPs? i.e. If I point > gbrowse at this location, will it automatically pick up the entry > points and their features from the database? Or how much manual > configuration will I need? Is there some boilerplate config I can use > to visualize this? > > Cheers, > Dan. > > >> More generally I want to have features attached to sequences that are >> themselves annotations of larger sequences (but with unknown >> position). Is Bio::DB::SeqFeature::Store a way to go? I need to >> manage >> various different bits of information coming from a sequencing >> project, and I need a solution to the whole 'assembly life cycle >> management' problem. >> >> Thanks for any help, >> Dan. 
>> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From cjfields at illinois.edu Mon May 11 11:39:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 May 2009 10:39:54 -0500 Subject: [Bioperl-l] Getting 'features' from SearchIO? In-Reply-To: <2c8757af0905110734i26f7102k69b615dc413e6be9@mail.gmail.com> References: <2c8757af0905110734i26f7102k69b615dc413e6be9@mail.gmail.com> Message-ID: On May 11, 2009, at 9:34 AM, Dan Bolser wrote: > Hi, > > I am parsing a blasttable and extracting Bio::Search::HSP::GenericHSP > objects as a result. I read somewhere that HSP objects inherit Feature > objects... How can I get a 'standard' representation of the HSP as a > feature? Basically I'd like to simply load the blast results into a > feature database... They are Bio::SeqFeature::SimilarityPair (all Bio::Search::HSP::HSPI are). > When I call feature methods on the HSP objects I just get blank or > undef results... I think this is because I'm trying to get at the > sequences existing (non existent) features, rather than get the HSP > object as a feature... If that makes sense... How can I confirm that I > have a feature object containing the details of the HSP? These are decorated feature pairs (they map to one another), so you would need to do something like $hsp->hit to get at the actual SeqFeature data for the hit, and similarly $hsp->query for the query SF. They technically have the SeqFeatureI methods but I believe they delegate to one specific feature (the query) unless you explicitly specify which feature to grab info from ('query', 'hit/subject'). I have added some tests for t/SearchIO//blasttable for this. > I thought of trying to just pass the HSP object to the > Bio::DB::SeqFeature::Store, but I need to get that up and running > first (I'm looking into it). 
In the mean time I thought I'd ask if > this sounds like the right thing to do. Worth a try to see what happens, but I'm not sure it would work as you expect, seeing as the methods by default delegate to the query (and I don't know if support for feature pairs is built in to Bio::DB::SeqFeature::Store). Also, last I recall, SF::Store stores everything based on a specified SF class, not the interface, so mixing SF classes in the same database (such as Bio::DB::SeqFeature, Bio::SeqFeature::Generic, and HSPs) may not be the wisest thing. I haven't used it in a little while, though, so that may have changed. Just to note, this problem has been 'solved' to some degree in the past. I think there are a few blast2gff scripts floating around, and there is a Bio::SearchIO::Writer::GbrowseGFF module, though it isn't maintained. The main problem is the mapping is subjective based on what your reference sequence is within the BLAST run (e.g. whether it is the query or the hit), and is something that can't be automatically discerned. I ended up rolling my own with SeqFeature::Store (just mapped the relevant data to Bio::DB::SeqFeatures), but I have long wanted to fix up the relevant scripts to integrate my changes in, just haven't had the time (though that may change soon :) > More generally I want to have features attached to sequences that are > themselves annotations of larger sequences (but with unknown > position). Did you mean 'features of larger sequences'? At the very least, you can define a region a feature falls within; if it falls within a region that has gaps on both sides: gap1 gap2 ----------xxxxxxxx--------xxxxxxx------------ |---| you can still assign coordinates to the feature for that release based on the estimated length of the gaps. Therefore it may change in a future release if the gaps are filled in. Otherwise I would assume it's simpler to designate it as a feature in a singleton sequence (on its own) that hasn't been mapped.
> Is Bio::DB::SeqFeature::Store a way to go? I need to manage > various different bits of information coming from a sequencing > project, and I need a solution to the whole 'assembly life cycle > management' problem. It's a good start, but it's not the only solution (by far). If you want to integrate in more information you could look into Chado (Apollo has a plugin for Chado). > Thanks for any help, > Dan. np. chris From hlapp at gmx.net Mon May 11 12:09:20 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 11 May 2009 12:09:20 -0400 Subject: [Bioperl-l] Google Summer of Code student Chase Miller In-Reply-To: References: Message-ID: Welcome to the fold, Chase, and looking forward to the project! :-) -hilmar On May 11, 2009, at 10:31 AM, Mark A. Jensen wrote: > Hello all, > With great pleasure, I want to introduce Chase Miller, my Google > Summer of Code student from George Washington University, to the > community. Chase will be working with me and Rutger Vos on a BioPerl > wrapper for Rutger's Bio::Phylo package, with a particular emphasis > on creating a BioPerl-native way to import and export the NeXML (http://nexml.org > ) phylogenetic data format. He wrote a great proposal, available > here: https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit > . > We will be working throughout the summer on the project, and will of > course come to you for sage advice. I know you will welcome him > warmly, as you did me. 
> Cheers, > Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jason at bioperl.org Mon May 11 12:24:06 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 11 May 2009 09:24:06 -0700 Subject: [Bioperl-l] Google Summer of Code student Chase Miller In-Reply-To: References: Message-ID: <59B4ABC0-7C98-4CD6-9629-50B2503F040E@bioperl.org> Welcome Chase. Look forward to the project and helping where needed. -jason On May 11, 2009, at 7:31 AM, Mark A. Jensen wrote: > Hello all, > With great pleasure, I want to introduce Chase Miller, my Google > Summer of Code student from George Washington University, to the > community. Chase will be working with me and Rutger Vos on a BioPerl > wrapper for Rutger's Bio::Phylo package, with a particular emphasis > on creating a BioPerl-native way to import and export the NeXML (http://nexml.org > ) phylogenetic data format. He wrote a great proposal, available > here: https://www.nescent.org/wg_phyloinformatics/PhyloSoC:BioPerl_integration_of_the_NeXML_exchange_standard_and_Bio::Phylo_toolkit > . > We will be working throughout the summer on the project, and will of > course come to you for sage advice. I know you will welcome him > warmly, as you did me. 
> Cheers, > Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From rmb32 at cornell.edu Mon May 11 12:43:42 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 11 May 2009 09:43:42 -0700 Subject: [Bioperl-l] Moose [was Re:Other object oddities] In-Reply-To: <003BA940-D974-44A8-9634-55963C2E8341@illinois.edu> References: <79D2E471-A9D1-4759-BC1F-4FEE9A812788@berkeleybop.org> <4a047a3e.23bb720a.3b09.ffff9430@mx.google.com> <4076AEE2CB9F45138C6FE76DD807D0BA@NewLife> <003BA940-D974-44A8-9634-55963C2E8341@illinois.edu> Message-ID: <4A0855BE.80509@cornell.edu> Anybody going to YAPC::NA? There are some talks about managing dependencies and using CPAN, could be quite valuable for figuring out what to do about using modern perl techniques in BioPerl. http://yapc10.org/yn2009/talk/1985 http://yapc10.org/yn2009/talk/1975 There are probably more. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu Chris Fields wrote: > Decent article, but it is slightly misleading. These are dependencies > for Moose itself, which I don't have a problem with (off the subject, > but I personally would like to add in a requirement for Modern::Perl!). > > What I am worried about are lots of additional dependencies introduced > using some of the 'syntactic sugar' in various MooseX modules. For > instance, MooseX::Declare, and MooseX::Method::Signatures (two popular > ones): > > http://deps.cpantesters.org/?module=MooseX%3A%3ADeclare&perl=any+version&os=any+OS > > http://deps.cpantesters.org/?module=MooseX%3A%3AMethod%3A%3ASignatures&perl=any+version&os=any+OS > > > chris > > On May 8, 2009, at 8:33 PM, Mark A. 
Jensen wrote: From jm18 at sanger.ac.uk Sat May 9 06:55:29 2009 From: jm18 at sanger.ac.uk (John Marshall) Date: Sat, 9 May 2009 11:55:29 +0100 (BST) Subject: [Bioperl-l] fastq parsing problem In-Reply-To: References: Message-ID: Michael Muratet wrote: > I've got a problem parsing fastq output from the maq aligner. The > parser is throwing an exception for the following record: > > @HWI-EAS146:3:1:2:177#0/1 > CTCCGCTNNCTTCTCAG[...] > + > @,AB=>-&&:5).;+*=[...] > > I looked up the line in fastq.pm that does the parsing: > > 116 my ($top,$sequence,$top2,$qualsequence) = [...] This is the fastq parser from 1.5.2 or thereabouts, which had a bug (the $/ definition just above this code) that prevented it from parsing a record with a quality line starting with "@". This was probably not recognised as a bug for a long time due to the enduring myth that fastq quality lines always start with "!". The fastq next_seq() was rewritten for 1.6.0 and parses this successfully. (Unfortunately the documentation at the top of fastq.pm was not updated and still reflects the now-unused false belief about an initial "!" quality.) You may be able to just drop 1.6.0's Bio/SeqIO/fastq.pm in front of your existing Bioperl installation, if you're a little crazy and don't want to update the installation properly. If you do that, or if you update, you'll find that the new parser emits the following pedantic warning for your fastq sequences: MSG: Seq/Qual descriptions don't match; using sequence description In practice, lots of people (probably even most!) don't bother putting the sequence id on the "+" line, as it is entirely pointless duplication, instead leaving the "+" line otherwise empty. So I hope the maintainers agree that this warning should be relaxed, such as in the attached patch. Or even removed -- there was no equivalent warning in the previous code. 
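A minimal sketch of why a quality line starting with "@" need not be a problem: if records are read as groups of four lines, rather than split on a newline followed by "@", the first character of the quality string does not matter. This is not the Bio::SeqIO::fastq code. It assumes unwrapped FASTQ (one sequence line and one quality line per record), and next_fastq_record is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): read one 4-line FASTQ record
# from a filehandle. Returns a hash reference, or undef at end of file.
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return undef unless defined $header;
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);

    die "Bad header line: $header\n" unless $header =~ /^\@(\S+)/;
    my $id = $1;
    # The "+" line may repeat the id or be otherwise empty; both are accepted.
    die "Bad separator line: $plus\n" unless $plus =~ /^\+/;
    die "Sequence/quality length mismatch for $id\n"
        unless length($seq) == length($qual);

    return { id => $id, seq => $seq, qual => $qual };
}

# Toy input: the first record has a quality line that starts with "@".
my $data = join "\n",
    '@read1', 'CTCCGCTNN', '+',      '@,AB=>-&&',
    '@read2', 'ACGT',      '+read2', 'IIII', '';
open my $fh, '<', \$data or die $!;
while (my $rec = next_fastq_record($fh)) {
    print "$rec->{id}\t$rec->{seq}\t$rec->{qual}\n";
}
```

Multi-line (wrapped) FASTQ would need a different approach, for example reading sequence lines until the "+" line and then quality lines until the lengths match.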
Cheers, John -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. -------------- next part -------------- A non-text attachment was scrubbed... Name: qualdesc.diff Type: application/octet-stream Size: 580 bytes Desc: not available URL: From Joao.Fadista at agrsci.dk Mon May 11 05:31:43 2009 From: Joao.Fadista at agrsci.dk (fadista) Date: Mon, 11 May 2009 02:31:43 -0700 (PDT) Subject: [Bioperl-l] alignable portion of a genome Message-ID: <23480025.post@talk.nabble.com> Hi, I would like to know of a good and fast way that could help me calculate the alignable portion of a genome (not human), given a reference sequence. When I say alignable portion I mean that I want to know all the positions of the genome that can be covered uniquely by reads of 36 bp and up to 2 mismatches. Some have advised me to work with Perl using the following strategy but I am not a Perl user so if someone has already a script for this function, it would be nice: "you could approach it by walking along the genome in a sliding window of 36 nt, and hash the frequency of each 36 nt sequence that you encounter. Then count how many of the 36 nt sequences had a frequency of exactly one. Divide this by the total number of 36nt windows visited. This should be do-able in about 20 lines of Perl." Best regards and thanks in advance -- View this message in context: http://www.nabble.com/alignable-portion-of-a-genome-tp23480025p23480025.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
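The quoted sliding-window strategy can be sketched roughly as follows. This covers exact matches only (it does not handle the "up to 2 mismatches" part of the question), looks at the forward strand only, and holds every distinct window in memory, so it is a starting point for small sequences rather than a whole-genome solution. The function name unique_window_fraction and the toy sequence are illustrative assumptions.

```perl
use strict;
use warnings;

# Fraction of k-length windows whose sequence occurs exactly once
# in $seq (exact matches, forward strand only).
sub unique_window_fraction {
    my ($seq, $k) = @_;
    $seq = uc $seq;
    my $windows = length($seq) - $k + 1;
    return 0 if $windows < 1;

    # Count how often each k-mer is seen while sliding one base at a time.
    my %count;
    for my $i (0 .. $windows - 1) {
        $count{ substr($seq, $i, $k) }++;
    }

    # Number of distinct k-mers seen exactly once, over all windows visited.
    my $unique = grep { $_ == 1 } values %count;
    return $unique / $windows;
}

# Toy example with k = 4; for the question above k would be 36.
my $toy = "ACGTACGTTT";
printf "%.3f\n", unique_window_fraction($toy, 4);    # prints 0.714
```

For a real genome the sequence would be read chromosome by chromosome (for example with Bio::SeqIO), and reverse-complement matches would also need to be counted before a window is called unique.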
From raulmendez at cbm.uam.es Mon May 11 09:16:44 2009 From: raulmendez at cbm.uam.es (Raul Mendez Giraldez) Date: Mon, 11 May 2009 15:16:44 +0200 Subject: [Bioperl-l] How to get coil prediction out of Bio::Tools::Run::Coil modules In-Reply-To: References: <1241799580.6963.165.camel@pepa.cbm.uam.es> Message-ID: <1242047804.6963.192.camel@pepa.cbm.uam.es> Hi Jason, Thank you so much for your suggestion, although it was my $featseq = $seqin->trunc($feature->start, $feature->end); since the subseq method just gives you a string with the sequence, whereas trunc returns a sequence object, which is what needs to be passed to write_seq. Cheers, Raul On Fri, 08-05-2009 at 13:04 -0700, Jason Stajich wrote: > The sequence isn't part of the report - or at least isn't parsed but > you can just do this (pseudo-y-code here). > my $seqout =Bio::SeqIO->new(-format => 'fasta'); > > > > > for my $feature ( @features ) > my $featseq = $seqin->subseq($feature->start, $feature->end); > $seqout->write_seq($featseq); > } > > > > On May 8, 2009, at 9:19 AM, Raul Mendez Giraldez wrote: > > > Hi, > > > > I'm trying to get coiled-coiled prediction in protein sequences > > using > > Bob Russell's program ncoils, through the bioperl interface > > Bio::Tools::Run::Coil, but the only thing I can get from any element > > on > > the features list is just the sequence name, and few more not so > > useful > > attributes.
> > > > I'm running the following script: > > > > > > #!/home/rmendez/bin/perl -w > > > > use strict; > > use FileHandle; > > use Data::Dumper; > > > > use Bio::Tools::Run::Coil; > > > > my $seqin=filein.fasta > > my $factory=Bio::Tools::Run::Coil->new('-c'); > > my @features=$factory->run($seqin); > > > > print "Printing content of features[0]\n"; > > print Dumper $features[0]; > > > > ---- > > > > And the output is (the content of the first element of the features > > array) is : > > '_gsf_tag_hash' => { > > 'percent_id' => [ > > 'NULL' > > ], > > 'hid' => [ > > 'ncoils' > > ], > > 'evalue' => [ > > 0 > > ] > > }, > > '_location' => bless( { > > '_location_type' => 'EXACT', > > '_start' => 138, > > '_end' => 172 > > }, 'Bio::Location::Simple' ), > > '_gsf_seq_id' => 'ENSDARP00000084927', > > '_parse_h' => {}, > > '_root_cleanup_methods' => [ > > sub { "DUMMY" } > > ], > > '_source_tag' => 'Coils', > > '_primary_tag' => 'ncoils', > > '_root_verbose' => 0 > > }, 'Bio::SeqFeature::Generic' ); > > > > Then how could I get the sequence itself with the coil annotation > > 'xxx'? 
> > > > Thanks, > > > > Raul > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Jason Stajich > jason at bioperl.org > > > > > > > > From wgallin at ualberta.ca Mon May 11 21:35:58 2009 From: wgallin at ualberta.ca (Warren Gallin) Date: Mon, 11 May 2009 19:35:58 -0600 Subject: [Bioperl-l] Eutilities epost/efetch problem Message-ID: <2977DDC5-D26F-4643-AA4C-5A11EC323B94@ualberta.ca> Hi folks, Something started failing for me this morning that had been working reliably for the last week, I post an array of gi numbers, a history is successfully returned, but when I try to use efetch to get the records, it fails with the error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Response Error Not Found STACK: Error::throw STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::DB::GenericWebAgent::get_Response /Library/Perl/5.8.8/Bio/ DB/GenericWebAgent.pm:215 STACK: 090507_Stable_gb_update.pl:238 ----------------------------------------------------------- I'm running the efetch inside an eval and letting it try a total of 6 times with a 5 sedond sleep in between, but the error is consistent. So I consider two possibilities: 1) Has something changed on the Entrez server recently? Has anyone else started having this kind of problem? 2) Have I inserted some subtle flaw into my code that would lead to a failure of efetch. I am attaching two text files, one with the code chunklet that is doing this and the other the output from the script. Any help or suggestions are profoundly appreciated. Warren Gallin -------------- next part -------------- A non-text attachment was scrubbed... Name: Fetch_Fail Type: application/octet-stream Size: 2659 bytes Desc: not available URL: -------------- next part -------------- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: Fetch_Fail_Output Type: application/octet-stream Size: 2685 bytes Desc: not available URL: -------------- next part -------------- From cjfields at illinois.edu Mon May 11 23:07:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 11 May 2009 22:07:56 -0500 Subject: [Bioperl-l] fastq parsing problem In-Reply-To: References: Message-ID: <0A77B262-B808-4A02-82CB-16970EBF4C2C@illinois.edu> On May 9, 2009, at 5:55 AM, John Marshall wrote: > Michael Muratet wrote: >> I've got a problem parsing fastq output from the maq aligner. The >> parser is throwing an exception for the following record: >> >> @HWI-EAS146:3:1:2:177#0/1 >> CTCCGCTNNCTTCTCAG[...] >> + >> @,AB=>-&&:5).;+*=[...] >> >> I looked up the line in fastq.pm that does the parsing: >> >> 116 my ($top,$sequence,$top2,$qualsequence) = [...] > > This is the fastq parser from 1.5.2 or thereabouts, which had a bug > (the > $/ definition just above this code) that prevented it from parsing a > record with a quality line starting with "@". This was probably not > recognised as a bug for a long time due to the enduring myth that > fastq > quality lines always start with "!". > > The fastq next_seq() was rewritten for 1.6.0 and parses this > successfully. > (Unfortunately the documentation at the top of fastq.pm was not > updated > and still reflects the now-unused false belief about an initial "!" > quality.) > > You may be able to just drop 1.6.0's Bio/SeqIO/fastq.pm in front of > your > existing Bioperl installation, if you're a little crazy and don't > want to > update the installation properly. If you do that, or if you update, > you'll find that the new parser emits the following pedantic warning > for > your fastq sequences: > > MSG: Seq/Qual descriptions don't match; using sequence description > > In practice, lots of people (probably even most!) don't bother > putting the > sequence id on the "+" line, as it is entirely pointless duplication, > instead leaving the "+" line otherwise empty. 
So I hope the > maintainers > agree that this warning should be relaxed, such as in the attached > patch. > Or even removed -- there was no equivalent warning in the previous > code. > > Cheers, > > John Okay, patch committed (also removed the blurb about '!'). Thanks! chris From Russell.Smithies at agresearch.co.nz Mon May 11 23:55:39 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 12 May 2009 15:55:39 +1200 Subject: [Bioperl-l] alignable portion of a genome In-Reply-To: <23480025.post@talk.nabble.com> References: <23480025.post@talk.nabble.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493DA5373@exchsth.agresearch.co.nz> Perfect matches are easy: my $seq = "atcgacgatcgaacgatcga"; my ($h, $singles) = (0, 0); my %hash; foreach ($seq =~ /(?=(\w{5}))/g) { $h++; $hash{$_}++ } foreach (keys %hash) { $singles++ if $hash{$_} == 1 } print $singles/$h; Could probably be done with map as well. Counting the mismatches might take a bit more thinking.... Any ideas MAJ? --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of fadista > Sent: Monday, 11 May 2009 9:32 p.m. > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] alignable portion of a genome > > > Hi, > > I would like to know of a good and fast way that could help me calculate the > alignable portion of a genome (not human), given a reference sequence. > When I say alignable portion I mean that I want to know all the positions of > the genome that can be covered uniquely by reads of 36 bp and up to 2 > mismatches. > > Some have advised me to work with Perl using the following strategy but I am > not a Perl user so if someone has already a script for this function, it > would be nice: > > "you could approach it by walking along the genome in a sliding window of > 36 nt, and hash the frequency of each 36 nt sequence that you encounter. > Then count how many of the 36 nt sequences had a frequency of exactly > one.
Divide this by the total number of 36nt windows visited. This > should be do-able in about 20 lines of Perl." > > > Best regards and thanks in advance > > -- > View this message in context: http://www.nabble.com/alignable-portion-of-a- > genome-tp23480025p23480025.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From dan.bolser at gmail.com Tue May 12 05:10:59 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 12 May 2009 10:10:59 +0100 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features' from SearchIO?) Message-ID: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com> Thanks for the info guys, I think I was naively hoping that the feature would know how to cast itself as a 'SeqFeature' (GFF). I think I understand the problem better now, so I'll try to summarise: There is no standard way to encode a HSP as a feature (not least because there are two choices about which sequence (query or the hit) it should be attached to). BioPerl will try, but the result will not be "well structured" SeqFeatures or "well formed" GFF. 
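To make the "two choices" concrete, here is a rough sketch that turns one row of tabular BLAST output into a GFF3 line, with the caller stating whether the query or the hit is the reference sequence. It deliberately avoids the BioPerl objects and is not what bp_search2gff.pl does. It assumes the usual 12-column tabular layout (query, subject, percent identity, length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); the function name blast_row_to_gff3, the source "blast", the type "match" and the percent_id attribute are illustrative choices, not a standard mapping.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one 12-column tabular BLAST row to a GFF3
# line. $reference is 'query' or 'hit' and decides which sequence the
# feature is placed on; the other sequence goes into the Target attribute.
sub blast_row_to_gff3 {
    my ($line, $reference) = @_;
    chomp $line;
    my ($q, $s, $pid, $len, $mm, $gap, $qs, $qe, $ss, $se, $evalue, $bits)
        = split /\t/, $line;

    # Minus strand if exactly one of the two coordinate ranges is reversed.
    my $strand = (($qs > $qe) xor ($ss > $se)) ? '-' : '+';

    my ($ref_id, $rs, $re, $tgt_id, $ts, $te) =
        $reference eq 'hit' ? ($s, $ss, $se, $q, $qs, $qe)
                            : ($q, $qs, $qe, $s, $ss, $se);

    # GFF3 requires start <= end; orientation is carried by the strand column.
    ($rs, $re) = ($re, $rs) if $rs > $re;
    ($ts, $te) = ($te, $ts) if $ts > $te;

    my $attrs = "Target=$tgt_id $ts $te;percent_id=$pid";
    return join "\t", $ref_id, 'blast', 'match', $rs, $re,
                      $evalue, $strand, '.', $attrs;
}

# Toy row: a contig hitting chr1 on the minus strand.
my $row = join "\t",
    qw(contig1 chr1 98.50 200 3 0 1 200 5200 5001 1e-100 380);
print blast_row_to_gff3($row, 'hit'),   "\n";
print blast_row_to_gff3($row, 'query'), "\n";
```

With 'hit' the feature lands on chr1 and points at contig1 through Target; with 'query' the roles are swapped. Which one is right depends on which sequence is displayed as the reference in the browser.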
>From what I read I guess it should be possible to standardize this mapping (based on something in one of the examples or the 'search2gff' script), assuming you specify weather you want features put on the query or on the hit. At some point last year I was trying out the bp_search2gff.pl and my own code to write a GFF file for loading and viewing by Gbrowse. At that time I gave up, as nothing seemed to be working. I was hoping that doing this at a lower level (i.e. never writing any GFF myself) it would stand a better chance of working. Also I was thinking that Gbrowse, if given a SeqFeature::Store, could autoconfigure its interface to some degree. I guess its back to the docs ;-) I'll keep trying and see if I can get anywhere. Thanks again, Dan. References for the above: 2009/5/11 Jason Stajich : > otherwise you need to be converting the HSPs into seqfeatures with the right associated information (i.e. the tag/value pairs that are in the 9th column) in order to have well structured data in the database. > You can get the individual features from the feature pair with $hsp->query or $hsp->hit which can also be passed to a GFF writer (or call $hsp->hit->gff_string). Note that since the data storage is not structured in a GFF3 like-way this won't immediately produce well formed GFF3 for the 9th column. 2009/5/11 Chris Fields : > The main problem is the mapping is subjective based on what your reference sequence is within the BLAST run (e.g. whether it is the query or the hit), and is something that can't be automatically discerned. 
I ended up rolling my own with SeqFeature::Store (just mapped the relevant data to Bio::DB::SeqFeatures), but I have long wanted to fix up the relevant scripts to integrate my changes in, just haven't had the time From miguel.pignatelli at uv.es Tue May 12 04:45:46 2009 From: miguel.pignatelli at uv.es (Miguel Pignatelli) Date: Tue, 12 May 2009 10:45:46 +0200 Subject: [Bioperl-l] alignable portion of a genome In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493DA5373@exchsth.agresearch.co.nz> References: <23480025.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF32493DA5373@exchsth.agresearch.co.nz> Message-ID: For mismatches, take a look at the CPAN module Text::LevenshteinXS which calculates the Levenshtein distance (edit distance) of two strings. For more information about Levenshtein distance: http://en.wikipedia.org/wiki/Levenshtein_distance M; On 12/05/2009, at 5:55, Smithies, Russell wrote: > Perfect matches is easy: > > $seq = "atcgacgatcgaacgatcga"; > > foreach ($seq =~ /(?=(\w{5}))/g){$h++; $hash{$_}++} > foreach (keys %hash){ $singles++ if($hash{$_} eq 1)} > print $singles/$h; > > Could probably be done with map as well. > Counting the miss-matches might take a bit more thinking.... > Any ideas MAJ? > > --Russell > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of fadista >> Sent: Monday, 11 May 2009 9:32 p.m. >> To: Bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] alignable portion of a genome >> >> >> Hi, >> >> I would like to know of a good and fast way that could help me >> calculate the >> alignable portion of a genome (not human), given a reference >> sequence. >> When I say alignable portion I mean that I want to know all the >> positions of >> the genome that can be covered uniquely by reads of 36 bp and up to 2 >> mismatches.
>> >> Some have advised me to work with Perl using the following strategy >> but I am >> not a Perl user so if someone has already a script for this >> function, it >> would be nice: >> >> "you could approach it by walking along the genome in a sliding >> window of >> 36 nt, and hash the frequency of each 36 nt sequence that you >> encounter. >> Then count how many of the 36 nt sequences had a frequency of exactly >> one. Divide this by the total number of 36nt windows visited. This >> should be do-able in about 20 lines of Perl." >> >> >> Best regards and thanks in advance >> >> -- >> View this message in context: http://www.nabble.com/alignable-portion-of-a- >> genome-tp23480025p23480025.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From dan.bolser at gmail.com Tue May 12 06:11:39 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 12 May 2009 11:11:39 +0100 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features' from SearchIO?) In-Reply-To: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com> References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com> Message-ID: <2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com> Unfortunately bp_search2gff.pl is giving me errors: bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered -f blasttable -o BlastResults/blast_table_filtered.gff -t hit --match --target --component --------------------- WARNING --------------------- MSG: Removing score value(s) --------------------------------------------------- Can't locate object method "remove_tags" via package "Bio::SeqFeature::Similarity" at /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm line 393, line 5. Anyone seen this before? Cheers, Dan. 2009/5/12 Dan Bolser : > Thanks for the info guys, I think I was naively hoping that the > feature would know how to cast itself as a 'SeqFeature' (GFF). > > I think I understand the problem better now, so I'll try to summarise: > > There is no standard way to encode a HSP as a feature (not least > because there are two choices about which sequence (query or the hit) > it should be attached to). BioPerl will try, but the result will not > be "well structured" SeqFeatures or "well formed" GFF. > > > From what I read I guess it should be possible to standardize this > mapping (based on something in one of the examples or the 'search2gff' > script), assuming you specify weather you want features put on the > query or on the hit. 
> > At some point last year I was trying out the bp_search2gff.pl and my > own code to write a GFF file for loading and viewing by Gbrowse. At > that time I gave up, as nothing seemed to be working. I was hoping > that doing this at a lower level (i.e. never writing any GFF myself) > it would stand a better chance of working. > > Also I was thinking that Gbrowse, if given a SeqFeature::Store, could > autoconfigure its interface to some degree. I guess its back to the > docs ;-) > > > > I'll keep trying and see if I can get anywhere. > > Thanks again, > Dan. > > > > References for the above: > > 2009/5/11 Jason Stajich : > >> otherwise you need to be converting the HSPs into seqfeatures with the right associated information (i.e. the tag/value pairs that are in the 9th column) in order to have well structured data in the database. > >> You can get the individual features from the feature pair with $hsp->query or $hsp->hit which can also be passed to a GFF writer (or call $hsp->hit->gff_string). Note that since the data storage is not structured in a GFF3 like-way this won't immediately produce well formed GFF3 for the 9th column. > > > 2009/5/11 Chris Fields : > >> The main problem is the mapping is subjective based on what your reference sequence is within the BLAST run (e.g. whether it is the query or the hit), and is something that can't be automatically discerned. I ended up rolling my own with SeqFeature::Store (just mapped the relevant data to Bio::DB::SeqFeatures), but I have long wanted to fix up the relevant scripts to integrate my changes in, just haven't had the time > From dan.bolser at gmail.com Tue May 12 06:55:34 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 12 May 2009 11:55:34 +0100 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features' from SearchIO?)
In-Reply-To: <2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com> References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com> <2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com> Message-ID: <2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com> 2009/5/12 Dan Bolser : > Unfortunately bp_search2gff.pl is giving me errors: > > bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered -f > blasttable -o BlastResults/blast_table_filtered.gff -t hit > --match --target --component > > --------------------- WARNING --------------------- > MSG: Removing score value(s) > --------------------------------------------------- > Can't locate object method "remove_tags" via package > "Bio::SeqFeature::Similarity" at > /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm line > 393, line 5. I'm just learning the ropes... --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11 15:25:55.000000000 +0100 +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12 11:52:41.000000000 +0100 @@ -390,7 +390,7 @@ } if ($self->has_tag('score')) { $self->warn("Removing score value(s)"); - $self->remove_tags('score'); + $self->remove_tag('score'); } $self->add_tag_value('score',$value); } > Anyone seen this before? > > Cheers, > Dan. > > > > 2009/5/12 Dan Bolser : >> Thanks for the info guys, I think I was naively hoping that the >> feature would know how to cast itself as a 'SeqFeature' (GFF). >> >> I think I understand the problem better now, so I'll try to summarise: >> >> There is no standard way to encode a HSP as a feature (not least >> because there are two choices about which sequence (query or the hit) >> it should be attached to). BioPerl will try, but the result will not >> be "well structured" SeqFeatures or "well formed" GFF.
>> >> >> From what I read I guess it should be possible to standardize this >> mapping (based on something in one of the examples or the 'search2gff' >> script), assuming you specify weather you want features put on the >> query or on the hit. >> >> At some point last year I was trying out the bp_search2gff.pl and my >> own code to write a GFF file for loading and viewing by Gbrowse. At >> that time I gave up, as nothing seemed to be working. I was hoping >> that doing this at a lower level (i.e. never writing any GFF myself) >> it would stand a better chance of working. >> >> Also I was thinking that Gbrowse, if given a SeqFeature::Store, could >> autoconfigure its interface to some degree. I guess its back to the >> docs ;-) >> >> >> >> I'll keep trying and see if I can get anywhere. >> >> Thanks again, >> Dan. >> >> >> >> References for the above: >> >> 2009/5/11 Jason Stajich : >> >>> otherwise you need to be converting the HSPs into seqfeatures with the right associated information (i.e. the tag/value pairs that are in the 9th column) in order to have well structured data in the database. >> >>> You can get the individual features from the feature pair with $hsp->query or $hsp->hit which can also be passed to a GFF writer (or call $hsp->hit->gff_string). Note that since the data storage is not structured in a GFF3 like-way this won't immediately produce well formed GFF3 for the 9th column. >> >> >> 2009/5/11 Chris Fields : >> >>> The main problem is the mapping is subjective based on what your reference sequence is within the BLAST run (e.g. whether it is the query or the hit), and is something that can't be automatically discerned.
I ended up rolling my own with SeqFeature::Store (just mapped the relevant data to Bio::DB::SeqFeatures), but I have long wanted to fix up the relevant scripts to integrate my changes in, just haven't had the time >> > From ajmackey at gmail.com Tue May 12 08:18:25 2009 From: ajmackey at gmail.com (Aaron Mackey) Date: Tue, 12 May 2009 08:18:25 -0400 Subject: [Bioperl-l] alignable portion of a genome In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493DA5373@exchsth.agresearch.co.nz> References: <23480025.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF32493DA5373@exchsth.agresearch.co.nz> Message-ID: <24c96eca0905120518v4a5aae30r364986ef1211afaf@mail.gmail.com> A better idea than using Perl to count the mismatches is to actually generate all the (unique) 36-mers from the reference genome as an artificial set of "reads", and then use a program like Mosaik or maq to align them back to the reference genome. Both tools have means of then reporting the coverage along the genome of uniquely aligned reads. That way you can also change the Mosaik/maq parameters to reflect your true read alignment strategy. -Aaron On Mon, May 11, 2009 at 11:55 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > Perfect matches is easy: > > $seq = "atcgacgatcgaacgatcga"; > > foreach ($seq =~ /(?=(\w{5}))/g){$h++; $hash{$_}++} > foreach (keys %hash){ $singles++ if($hash{$_} eq 1)} > print $singles/$h; > > Could probably be done with map as well. > Counting the miss-matches might take a bit more thinking.... > Any ideas MAJ? > > --Russell > > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of fadista > > Sent: Monday, 11 May 2009 9:32 p.m.
> > To: Bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] alignable portion of a genome > > > > > > Hi, > > > > I would like to know of a good and fast way that could help me calculate > the > > alignable portion of a genome (not human), given a reference sequence. > > When I say alignable portion I mean that I want to know all the positions > of > > the genome that can be covered uniquely by reads of 36 bp and up to 2 > > mismatches. > > > > Some have advised me to work with Perl using the following strategy but I > am > > not a Perl user so if someone has already a script for this function, it > > would be nice: > > > > "you could approach it by walking along the genome in a sliding window of > > 36 nt, and hash the frequency of each 36 nt sequence that you encounter. > > Then count how many of the 36 nt sequences had a frequency of exactly > > one. Divide this by the total number of 36nt windows visited. This > > should be do-able in about 20 lines of Perl." > > > > > > Best regards and thanks in advance > > > > -- > > View this message in context: > http://www.nabble.com/alignable-portion-of-a- > > genome-tp23480025p23480025.html > > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue May 12 08:23:35 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 May 2009 07:23:35 -0500 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features' from SearchIO?) In-Reply-To: <2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com> References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com> <2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com> <2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com> Message-ID: <9ED2DE76-C331-41D2-B303-7BD97B5CAF10@illinois.edu> Fixed that in svn. We're all still learning the ropes... chris On May 12, 2009, at 5:55 AM, Dan Bolser wrote: > 2009/5/12 Dan Bolser : >> Unfortunately bp_search2gff.pl is giving me errors: >> >> bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered >> -f >> blasttable -o BlastResults/blast_table_filtered.gff -t hit >> --match --target --component >> >> --------------------- WARNING --------------------- >> MSG: Removing score value(s) >> --------------------------------------------------- >> Can't locate object method "remove_tags" via package >> "Bio::SeqFeature::Similarity" at >> /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm line >> 393, line 5. > > > I'm just learning the ropes... 
> > --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11 > 15:25:55.000000000 +0100 > +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12 > 11:52:41.000000000 +0100 > @@ -390,7 +390,7 @@ > } > if ($self->has_tag('score')) { > $self->warn("Removing score value(s)"); > - $self->remove_tags('score'); > + $self->remove_tag('score'); > } > $self->add_tag_value('score',$value); > } > > > > > >> Anyone seen this before? >> >> Cheers, >> Dan. >> >> >> >> 2009/5/12 Dan Bolser : >>> Thanks for the info guys, I think I was naively hoping that the >>> feature would know how to cast itself as a 'SeqFeature' (GFF). >>> >>> I think I understand the problem better now, so I'll try to >>> summarise: >>> >>> There is no standard way to encode a HSP as a feature (not least >>> because there are two choices about which sequence (query or the >>> hit) >>> it should be attached to). BioPerl will try, but the result will not >>> be "well structured" SeqFeatures or "well formed" GFF. >>> >>> >>> From what I read I guess it should be possible to standardize this >>> mapping (based on something in one of the examples or the >>> 'search2gff' >>> script), assuming you specify weather you want features put on the >>> query or on the hit. >>> >>> At some point last year I was trying out the bp_search2gff.pl and my >>> own code to write a GFF file for loading and viewing by Gbrowse. At >>> that time I gave up, as nothing seemed to be working. I was hoping >>> that doing this at a lower level (i.e. never writing any GFF myself) >>> it would stand a better chance of working. >>> >>> Also I was thinking that Gbrowse, if given a SeqFeature::Store, >>> could >>> autoconfigure its interface to some degree. I guess its back to the >>> docs ;-) >>> >>> >>> >>> I'll keep trying and see if I can get anywhere. >>> >>> Thanks again, >>> Dan. 
>>> >>> >>> >>> References for the above: >>> >>> 2009/5/11 Jason Stajich : >>> >>>> otherwise you need to be converting the HSPs into seqfeatures >>>> with the right associated information (i.e. the tag/value pairs >>>> that are in the 9th column) in order to have well structured data >>>> in the database. >>> >>>> You can get the individual features from the feature pair with >>>> $hsp->query or $hsp->hit which can also be passed to a GFF >>>> writer (or call $hsp->hit->gff_string). Note that since the >>>> data storage is not structured in a GFF3 like-way this won't >>>> immediately produce well formed GFF3 for the 9th column. >>> >>> >>> 2009/5/11 Chris Fields : >>> >>>> The main problem is the mapping is subjective based on what your >>>> reference sequence is within the BLAST run (e.g. whether it is >>>> the query or the hit), and is something that can't be >>>> automatically discerned. I ended up rolling my own with >>>> SeqFeature::Store (just mapped the relevant data to >>>> Bio::DB::SeqFeatures), but I have long wanted to fix up the >>>> relevant scripts to integrate my changes in, just haven't had the >>>> time >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.bolser at gmail.com Tue May 12 09:17:56 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 12 May 2009 14:17:56 +0100 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features' from SearchIO?) In-Reply-To: <9ED2DE76-C331-41D2-B303-7BD97B5CAF10@illinois.edu> References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com> <2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com> <2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com> <9ED2DE76-C331-41D2-B303-7BD97B5CAF10@illinois.edu> Message-ID: <2c8757af0905120617v37aa7b8udf94ca558ed3415@mail.gmail.com> 2009/5/12 Chris Fields : > Fixed that in svn. We're all still learning the ropes...
In that case, I'm seeing multiple instances of... Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 256 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 412 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 429 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 465 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 473 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 494 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 502 Hmm... I was about to go on to complain about the weird GFF that I was seeing, but suddenly it looks OK. My bioperl install must think your standing over my shoulder and is therefore behaving itself! Thanks again for all the help, Dan. > chris > > On May 12, 2009, at 5:55 AM, Dan Bolser wrote: > >> 2009/5/12 Dan Bolser : >>> >>> Unfortunately bp_search2gff.pl is giving me errors: >>> >>> bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered -f >>> blasttable -o BlastResults/blast_table_filtered.gff -t hit >>> --match --target --component >>> >>> --------------------- WARNING --------------------- >>> MSG: Removing score value(s) >>> --------------------------------------------------- >>> Can't locate object method "remove_tags" via package >>> "Bio::SeqFeature::Similarity" at >>> /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm line >>> 393, line 5. >> >> >> I'm just learning the ropes... >> >> --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11 >> 15:25:55.000000000 +0100 >> +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12 >> 11:52:41.000000000 +0100 >> @@ -390,7 +390,7 @@ >> } >> if ($self->has_tag('score')) { >> $self->warn("Removing score value(s)"); >> - $self->remove_tags('score'); >> + $self->remove_tag('score'); >> } >>
$self->add_tag_value('score',$value); >> } >> >> >> >> >> >>> Anyone seen this before? >>> >>> Cheers, >>> Dan. >>> >>> >>> >>> 2009/5/12 Dan Bolser : >>>> >>>> Thanks for the info guys, I think I was naively hoping that the >>>> feature would know how to cast itself as a 'SeqFeature' (GFF). >>>> >>>> I think I understand the problem better now, so I'll try to summarise: >>>> >>>> There is no standard way to encode a HSP as a feature (not least >>>> because there are two choices about which sequence (query or the hit) >>>> it should be attached to). BioPerl will try, but the result will not >>>> be "well structured" SeqFeatures or "well formed" GFF. >>>> >>>> >>>> From what I read I guess it should be possible to standardize this >>>> mapping (based on something in one of the examples or the 'search2gff' >>>> script), assuming you specify weather you want features put on the >>>> query or on the hit. >>>> >>>> At some point last year I was trying out the bp_search2gff.pl and my >>>> own code to write a GFF file for loading and viewing by Gbrowse. At >>>> that time I gave up, as nothing seemed to be working. I was hoping >>>> that doing this at a lower level (i.e. never writing any GFF myself) >>>> it would stand a better chance of working. >>>> >>>> Also I was thinking that Gbrowse, if given a SeqFeature::Store, could >>>> autoconfigure its interface to some degree. I guess its back to the >>>> docs ;-) >>>> >>>> >>>> >>>> I'll keep trying and see if I can get anywhere. >>>> >>>> Thanks again, >>>> Dan. >>>> >>>> >>>> >>>> References for the above: >>>> >>>> 2009/5/11 Jason Stajich : >>>> >>>>> otherwise you need to be converting the HSPs into seqfeatures with the >>>>> right associated information (i.e. the tag/value pairs that are in the 9th >>>>> column) in order to have well structured data in the database.
>>>> >>>>> You can get the individual features from the feature pair with >>>>> $hsp->query or $hsp->hit which can also be passed to a GFF writer (or call >>>>> $hsp->hit->gff_string). Note that since the data storage is not structured >>>>> in a GFF3 like-way this won't immediately produce well formed GFF3 for the >>>>> 9th column. >>>> >>>> >>>> 2009/5/11 Chris Fields : >>>> >>>>> The main problem is the mapping is subjective based on what your >>>>> reference sequence is within the BLAST run (e.g. whether it is the query or >>>>> the hit), and is something that can't be automatically discerned. I ended >>>>> up rolling my own with SeqFeature::Store (just mapped the relevant data to >>>>> Bio::DB::SeqFeatures), but I have long wanted to fix up the relevant scripts >>>>> to integrate my changes in, just haven't had the time >>>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Tue May 12 09:29:32 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 May 2009 09:29:32 -0400 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features' fromSearchIO?) In-Reply-To: <2c8757af0905120617v37aa7b8udf94ca558ed3415@mail.gmail.com> References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com><2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com><2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com><9ED2DE76-C331-41D2-B303-7BD97B5CAF10@illinois.edu> <2c8757af0905120617v37aa7b8udf94ca558ed3415@mail.gmail.com> Message-ID: This sounds like a $sum = eval join( '+', @a); problem, which can be fixed with $sum = eval join('+', map { $_ || () } @a) ; MAJ ----- Original Message ----- From: "Dan Bolser" To: "Chris Fields" Cc: "BioPerl List" Sent: Tuesday, May 12, 2009 9:17 AM Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features' fromSearchIO?)
2009/5/12 Chris Fields : > Fixed that in svn. We're all still learning the ropes... In that case, I'm seeing multiple instances of... Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 256 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 412 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 429 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 465 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 473 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 494 Argument "" isn't numeric in addition (+) at Bio/Search/SearchUtils.pm line 502 Hmm... I was about to go on to complain about the weird GFF that I was seeing, but suddenly it looks OK. My bioperl install must think your standing over my shoulder and is therefore behaving itself! Thanks again for all the help, Dan. > chris > > On May 12, 2009, at 5:55 AM, Dan Bolser wrote: > >> 2009/5/12 Dan Bolser : >>> >>> Unfortunately bp_search2gff.pl is giving me errors: >>> >>> bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered -f >>> blasttable -o BlastResults/blast_table_filtered.gff -t hit >>> --match --target --component >>> >>> --------------------- WARNING --------------------- >>> MSG: Removing score value(s) >>> --------------------------------------------------- >>> Can't locate object method "remove_tags" via package >>> "Bio::SeqFeature::Similarity" at >>> /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm line >>> 393, line 5. >> >> >> I'm just learning the ropes... 
>> >> --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11 >> 15:25:55.000000000 +0100 >> +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12 >> 11:52:41.000000000 +0100 >> @@ -390,7 +390,7 @@ >> } >> if ($self->has_tag('score')) { >> $self->warn("Removing score value(s)"); >> - $self->remove_tags('score'); >> + $self->remove_tag('score'); >> } >> $self->add_tag_value('score',$value); >> } >> >> >> >> >> >>> Anyone seen this before? >>> >>> Cheers, >>> Dan. >>> >>> >>> >>> 2009/5/12 Dan Bolser : >>>> >>>> Thanks for the info guys, I think I was naively hoping that the >>>> feature would know how to cast itself as a 'SeqFeature' (GFF). >>>> >>>> I think I understand the problem better now, so I'll try to summarise: >>>> >>>> There is no standard way to encode a HSP as a feature (not least >>>> because there are two choices about which sequence (query or the hit) >>>> it should be attached to). BioPerl will try, but the result will not >>>> be "well structured" SeqFeatures or "well formed" GFF. >>>> >>>> >>>> From what I read I guess it should be possible to standardize this >>>> mapping (based on something in one of the examples or the 'search2gff' >>>> script), assuming you specify weather you want features put on the >>>> query or on the hit. >>>> >>>> At some point last year I was trying out the bp_search2gff.pl and my >>>> own code to write a GFF file for loading and viewing by Gbrowse. At >>>> that time I gave up, as nothing seemed to be working. I was hoping >>>> that doing this at a lower level (i.e. never writing any GFF myself) >>>> it would stand a better chance of working. >>>> >>>> Also I was thinking that Gbrowse, if given a SeqFeature::Store, could >>>> autoconfigure its interface to some degree. I guess its back to the >>>> docs ;-) >>>> >>>> >>>> >>>> I'll keep trying and see if I can get anywhere. >>>> >>>> Thanks again, >>>> Dan. 
>>>> >>>> >>>> >>>> References for the above: >>>> >>>> 2009/5/11 Jason Stajich : >>>> >>>>> otherwise you need to be converting the HSPs into seqfeatures with the >>>>> right associated information (i.e. the tag/value pairs that are in the 9th >>>>> column) in order to have well structured data in the database. >>>> >>>>> You can get the individual features from the feature pair with >>>>> $hsp->query or $hsp->hit which can also be passed to a GFF writer (or call >>>>> $hsp->hit->gff_string). Note that since the data storage is not structured >>>>> in a GFF3 like-way this won't immediately produce well formed GFF3 for the >>>>> 9th column. >>>> >>>> >>>> 2009/5/11 Chris Fields : >>>> >>>>> The main problem is the mapping is subjective based on what your >>>>> reference sequence is within the BLAST run (e.g. whether it is the query >>>>> or >>>>> the hit), and is something that can't be automatically discerned. I ended >>>>> up rolling my own with SeqFeature::Store (just mapped the relevant data to >>>>> Bio::DB::SeqFeatures), but I have long wanted to fix up the relevant >>>>> scripts >>>>> to integrate my changes in, just haven't had the time >>>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue May 12 10:04:26 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 May 2009 09:04:26 -0500 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features' fromSearchIO?) 
In-Reply-To: References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com><2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com><2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com><9ED2DE76-C331-41D2-B303-7BD97B5CAF10@illinois.edu> <2c8757af0905120617v37aa7b8udf94ca558ed3415@mail.gmail.com> Message-ID: More complicated than that, I'm afraid. We should try to fix that at the source of the problem. This appears to stem from SearchUtils HSP tiling, which in turn utilizes HSPI::matches(), which in turn checks num_identical/ num_conserved. My guess is, since this is blasttable format, one of these isn't set and thus is returning the wrong value. I'll attempt to track it down today, but it may take some time. chris On May 12, 2009, at 8:29 AM, Mark A. Jensen wrote: > This sounds like a > > $sum = eval join( '+', @a); > > problem, which can be fixed with > > $sum = eval join('+', map { $_ || () } @a) ; > > MAJ > ----- Original Message ----- From: "Dan Bolser" > To: "Chris Fields" > Cc: "BioPerl List" > Sent: Tuesday, May 12, 2009 9:17 AM > Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features' > fromSearchIO?) > > > 2009/5/12 Chris Fields : >> Fixed that in svn. We're all still learning the ropes... > > In that case, I'm seeing multiple instances of... > > Argument "" isn't numeric in addition (+) at Bio/Search/ > SearchUtils.pm line 256 > Argument "" isn't numeric in addition (+) at Bio/Search/ > SearchUtils.pm line 412 > Argument "" isn't numeric in addition (+) at Bio/Search/ > SearchUtils.pm line 429 > Argument "" isn't numeric in addition (+) at Bio/Search/ > SearchUtils.pm line 465 > Argument "" isn't numeric in addition (+) at Bio/Search/ > SearchUtils.pm line 473 > Argument "" isn't numeric in addition (+) at Bio/Search/ > SearchUtils.pm line 494 > Argument "" isn't numeric in addition (+) at Bio/Search/ > SearchUtils.pm line 502 > > > Hmm... 
I was about to go on to complain about the weird GFF that I was > seeing, but suddenly it looks OK. My bioperl install must think your > standing over my shoulder and is therefore behaving itself! > > > Thanks again for all the help, > Dan. > > > > >> chris >> >> On May 12, 2009, at 5:55 AM, Dan Bolser wrote: >> >>> 2009/5/12 Dan Bolser : >>>> >>>> Unfortunately bp_search2gff.pl is giving me errors: >>>> >>>> bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered >>>> -f >>>> blasttable -o BlastResults/blast_table_filtered.gff -t hit >>>> --match --target --component >>>> >>>> --------------------- WARNING --------------------- >>>> MSG: Removing score value(s) >>>> --------------------------------------------------- >>>> Can't locate object method "remove_tags" via package >>>> "Bio::SeqFeature::Similarity" at >>>> /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm >>>> line >>>> 393, line 5. >>> >>> >>> I'm just learning the ropes... >>> >>> --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11 >>> 15:25:55.000000000 +0100 >>> +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12 >>> 11:52:41.000000000 +0100 >>> @@ -390,7 +390,7 @@ >>> } >>> if ($self->has_tag('score')) { >>> $self->warn("Removing score value(s)"); >>> - $self->remove_tags('score'); >>> + $self->remove_tag('score'); >>> } >>> $self->add_tag_value('score',$value); >>> } >>> >>> >>> >>> >>> >>>> Anyone seen this before? >>>> >>>> Cheers, >>>> Dan. >>>> >>>> >>>> >>>> 2009/5/12 Dan Bolser : >>>>> >>>>> Thanks for the info guys, I think I was naively hoping that the >>>>> feature would know how to cast itself as a 'SeqFeature' (GFF). >>>>> >>>>> I think I understand the problem better now, so I'll try to >>>>> summarise: >>>>> >>>>> There is no standard way to encode a HSP as a feature (not least >>>>> because there are two choices about which sequence (query or the >>>>> hit) >>>>> it should be attached to). 
BioPerl will try, but the result will >>>>> not >>>>> be "well structured" SeqFeatures or "well formed" GFF. >>>>> >>>>> >>>>> From what I read I guess it should be possible to standardize this >>>>> mapping (based on something in one of the examples or the >>>>> 'search2gff' >>>>> script), assuming you specify weather you want features put on the >>>>> query or on the hit. >>>>> >>>>> At some point last year I was trying out the bp_search2gff.pl >>>>> and my >>>>> own code to write a GFF file for loading and viewing by Gbrowse. >>>>> At >>>>> that time I gave up, as nothing seemed to be working. I was hoping >>>>> that doing this at a lower level (i.e. never writing any GFF >>>>> myself) >>>>> it would stand a better chance of working. >>>>> >>>>> Also I was thinking that Gbrowse, if given a SeqFeature::Store, >>>>> could >>>>> autoconfigure its interface to some degree. I guess its back to >>>>> the >>>>> docs ;-) >>>>> >>>>> >>>>> >>>>> I'll keep trying and see if I can get anywhere. >>>>> >>>>> Thanks again, >>>>> Dan. >>>>> >>>>> >>>>> >>>>> References for the above: >>>>> >>>>> 2009/5/11 Jason Stajich : >>>>> >>>>>> otherwise you need to be converting the HSPs into seqfeatures >>>>>> with the >>>>>> right associated information (i.e. the tag/value pairs that are >>>>>> in the 9th >>>>>> column) in order to have well structured data in the database. >>>>> >>>>>> You can get the individual features from the feature pair with >>>>>> $hsp->query or $hsp->hit which can also be passed to a GFF >>>>>> writer (or call >>>>>> $hsp->hit->gff_string). Note that since the data storage is not >>>>>> structured >>>>>> in a GFF3 like-way this won't immediately produce well formed >>>>>> GFF3 for the >>>>>> 9th column. >>>>> >>>>> >>>>> 2009/5/11 Chris Fields : >>>>> >>>>>> The main problem is the mapping is subjective based on what your >>>>>> reference sequence is within the BLAST run (e.g. 
whether it is >>>>>> the query or >>>>>> the hit), and is something that can't be automatically >>>>>> discerned. I ended >>>>>> up rolling my own with SeqFeature::Store (just mapped the >>>>>> relevant data to >>>>>> Bio::DB::SeqFeatures), but I have long wanted to fix up the >>>>>> relevant scripts >>>>>> to integrate my changes in, just haven't had the time >>>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From mmuratet at hudsonalpha.org Tue May 12 10:31:21 2009 From: mmuratet at hudsonalpha.org (Michael Muratet) Date: Tue, 12 May 2009 09:31:21 -0500 Subject: [Bioperl-l] fastq parsing problem In-Reply-To: References: Message-ID: On May 9, 2009, at 5:55 AM, John Marshall wrote: > Michael Muratet wrote: >> I've got a problem parsing fastq output from the maq aligner. The >> parser is throwing an exception for the following record: >> >> @HWI-EAS146:3:1:2:177#0/1 >> CTCCGCTNNCTTCTCAG[...] >> + >> @,AB=>-&&:5).;+*=[...] >> >> I looked up the line in fastq.pm that does the parsing: >> >> 116 my ($top,$sequence,$top2,$qualsequence) = [...] > > This is the fastq parser from 1.5.2 or thereabouts, which had a bug > (the > $/ definition just above this code) that prevented it from parsing a > record with a quality line starting with "@". This was probably not > recognised as a bug for a long time due to the enduring myth that > fastq > quality lines always start with "!". > > The fastq next_seq() was rewritten for 1.6.0 and parses this > successfully. > (Unfortunately the documentation at the top of fastq.pm was not > updated > and still reflects the now-unused false belief about an initial "!" > quality.) 
> > You may be able to just drop 1.6.0's Bio/SeqIO/fastq.pm in front of > your > existing Bioperl installation, if you're a little crazy and don't > want to > update the installation properly. If you do that, or if you update, > you'll find that the new parser emits the following pedantic warning > for > your fastq sequences: > John I did install 1.6.0 (which is very smooth, my compliments to the chefs) and it solved the problem except for the warning you note which Chris Fields fixed this morning. Thanks for the help. Mike > MSG: Seq/Qual descriptions don't match; using sequence description > > In practice, lots of people (probably even most!) don't bother > putting the > sequence id on the "+" line, as it is entirely pointless duplication, > instead leaving the "+" line otherwise empty. So I hope the > maintainers > agree that this warning should be relaxed, such as in the attached > patch. > Or even removed -- there was no equivalent warning in the previous > code. > > Cheers, > > John > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > From KBriedis at accelrys.com Tue May 12 13:19:39 2009 From: KBriedis at accelrys.com (Kristine Briedis) Date: Tue, 12 May 2009 13:19:39 -0400 Subject: [Bioperl-l] Eutilities epost/efetch problem In-Reply-To: <2977DDC5-D26F-4643-AA4C-5A11EC323B94@ualberta.ca> References: <2977DDC5-D26F-4643-AA4C-5A11EC323B94@ualberta.ca> Message-ID: Hi Warren, We've noticed the same EFetch error. I emailed NCBI and will let you know what they say. Cheers, Kristine =============================== Kristine Briedis, Ph.D. Bioinformatics Software Engineer Accelrys, Inc. 
10188 Telesis Court, Suite 100 San Diego, CA 92121 USA kbriedis at accelrys.com -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Warren Gallin Sent: Monday, May 11, 2009 6:36 PM To: BioPerl List Subject: [Bioperl-l] Eutilities epost/efetch problem Hi folks, Something started failing for me this morning that had been working reliably for the last week. I post an array of gi numbers, a history is successfully returned, but when I try to use efetch to get the records, it fails with the error: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Response Error Not Found STACK: Error::throw STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:368 STACK: Bio::DB::GenericWebAgent::get_Response /Library/Perl/5.8.8/Bio/DB/GenericWebAgent.pm:215 STACK: 090507_Stable_gb_update.pl:238 ----------------------------------------------------------- I'm running the efetch inside an eval and letting it try a total of 6 times with a 5 second sleep in between, but the error is consistent. So I consider two possibilities: 1) Has something changed on the Entrez server recently? Has anyone else started having this kind of problem? 2) Have I inserted some subtle flaw into my code that would lead to a failure of efetch? I am attaching two text files, one with the code chunklet that is doing this and the other the output from the script. Any help or suggestions are profoundly appreciated. Warren Gallin
From bix at sendu.me.uk Tue May 12 14:11:44 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 12 May 2009 19:11:44 +0100 Subject: [Bioperl-l] fastq parsing problem In-Reply-To: References: Message-ID: <4A09BBE0.7010000@sendu.me.uk> John Marshall wrote: > Michael Muratet wrote: >> I've got a problem parsing fastq output from the maq aligner. The >> parser is throwing an exception for the following record: >> >> @HWI-EAS146:3:1:2:177#0/1 >> CTCCGCTNNCTTCTCAG[...]
>> + >> @,AB=>-&&:5).;+*=[...] >> >> I looked up the line in fastq.pm that does the parsing: >> >> 116 my ($top,$sequence,$top2,$qualsequence) = [...] > > This is the fastq parser from 1.5.2 or thereabouts, which had a bug (the > $/ definition just above this code) that prevented it from parsing a > record with a quality line starting with "@". This was probably not > recognised as a bug for a long time due to the enduring myth that fastq > quality lines always start with "!". I see you talked about it in the discussion page, but I think it might be time to change the wiki page as well: http://www.bioperl.org/wiki/FASTQ_sequence_format That caught me out as well. *sigh*
From gmodhelp at googlemail.com Tue May 12 13:36:27 2009 From: gmodhelp at googlemail.com (Dave Clements, GMOD Help Desk) Date: Tue, 12 May 2009 10:36:27 -0700 Subject: [Bioperl-l] [Gmod-gbrowse] Non-root installation: problem with Module::Builder versions In-Reply-To: <4A03986D.7080007@gmail.com> References: <4A03986D.7080007@gmail.com> Message-ID: <71ee57c70905121036w7fba1ac6x7e68db41f3035ee3@mail.gmail.com> Hi Neil, I'm cross-posting your question to the BioPerl list as 1) it is more of a perl question than a GBrowse question, and 2) I don't know the answer. Dave C. GMOD Help Desk Was this helpful? Let us know at http://gmod.org/wiki/Help_Desk_Feedback Learn more about GMOD at SMBE & Arthropod Genomics: http://ccg.biology.uiowa.edu/smbe/symposia.php?action=view&sym_ID=27 http://www.k-state.edu/agc/symp2009/seminar.html On Thu, May 7, 2009 at 7:26 PM, Neil Saunders wrote: > I'm trying to install the latest Gbrowse (1.99) on a machine where I do > not have root access (Ubuntu/dapper). > > I have set up non-root CPAN and installed all of the prerequisites, no > problems, in ~/lib/perl5. However, when I try to install Gbrowse either > via CPAN or using the latest CVS Build script, I run into this problem: > > Global symbol "$VAR1" requires explicit package name at (eval 28) line > 1088, line 1. > ...propagated at /usr/local/share/perl/5.8.7/Module/Build/Base.pm line > 1002, line 1. > make: *** [all] Error 255 > LDS/GBrowse-1.99.tar.gz > /usr/bin/make -- NOT OK > > > It seems that there are 2 versions of Module::Build on the machine. I > have installed a version from CPAN which is found in > ~/lib/perl5/site_perl/Module/. However, from the above error it looks > as though the install is trying to use a system-wide version of > Module::Build in /usr/local/share/perl/5.8.7. > > Can anyone shed any light on either the error message, or a way to force > usage of my $HOME module, not the system one? > > thanks, > Neil Saunders > -- > Statistical Bioinformatics - Health > CSIRO Mathematical and Information Sciences > Locked Bag 17, North Ryde, NSW 1670, Australia > > http://friendfeed.com/neilfws > > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >
From cjfields at illinois.edu Tue May 12 14:36:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 May 2009 13:36:25 -0500 Subject: [Bioperl-l] Eutilities epost/efetch problem In-Reply-To: References: <2977DDC5-D26F-4643-AA4C-5A11EC323B94@ualberta.ca> Message-ID: <2F9A2E1F-877B-4571-B10C-4F89FB22488D@illinois.edu> Not showing up in tests, so this may be something very specific that changed. I'll try to reproduce it.
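
For anyone hitting the same intermittent failure, here is a minimal sketch of the retry approach Warren describes in this thread (the fetch wrapped in an eval, up to six attempts, a pause between them). The helper name with_retries and the simulated flaky action are assumptions made for illustration; they are not part of Bio::DB::EUtilities, and Warren's actual code was sent as an attachment that is not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): call $action up to $max_tries
# times, sleeping $delay seconds between failed attempts. Returns the
# action's scalar result, or dies with a descriptive message.
sub with_retries {
    my ($action, $max_tries, $delay) = @_;
    my $last_error = 'no attempts were made';
    for my $attempt (1 .. $max_tries) {
        my $result;
        # The trailing 1 lets us tell "died" apart from "returned false".
        my $ok = eval { $result = $action->(); 1 };
        return $result if $ok;
        my $err = "$@";    # stringify, in case an exception object was thrown
        $last_error = length $err ? $err : 'unknown error';
        chomp $last_error;
        warn "Attempt $attempt of $max_tries failed: $last_error\n";
        sleep $delay if $delay && $attempt < $max_tries;
    }
    die "Giving up after $max_tries attempts (the remote service may be "
      . "temporarily unavailable). Last error: $last_error\n";
}

# Simulated flaky request: fails twice, then succeeds. In a real script the
# closure would wrap the fetch itself, for example
#   sub { $factory->get_Response(-file => '>test.gb') }
# where $factory is an assumed, already configured Bio::DB::EUtilities object.
my $calls   = 0;
my $outcome = with_retries(
    sub { $calls++; die "Response Error\nNot Found\n" if $calls < 3; 'fetched' },
    6,    # total attempts, as in Warren's description
    0,    # delay in seconds; the thread used 5, 0 keeps this demo fast
);
print "Result after $calls calls: $outcome\n";
```

Because the final die carries the last underlying error, a caller can still see the server's own message ("Not Found" in this thread) while also being told that retries were exhausted.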
chris On May 12, 2009, at 12:19 PM, Kristine Briedis wrote: > Hi Warren, > > We've noticed the same EFetch error. I emailed NCBI and will let > you know what they say. > > Cheers, > Kristine > > > =============================== > Kristine Briedis, Ph.D. > Bioinformatics Software Engineer > Accelrys, Inc. > 10188 Telesis Court, Suite 100 > San Diego, CA 92121 USA > kbriedis at accelrys.com > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org > ] On Behalf Of Warren Gallin > Sent: Monday, May 11, 2009 6:36 PM > To: BioPerl List > Subject: [Bioperl-l] Eutilities epost/efetch problem > > Hi folks, > > Something started failing for me this morning that had been working > reliably for the last week, > > I post an array of gi numbers, a history is successfully returned, > but when I try to use efetch to get the records, it fails with the > error: > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Response Error > Not Found > STACK: Error::throw > STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::GenericWebAgent::get_Response /Library/Perl/5.8.8/Bio/ > DB/GenericWebAgent.pm:215 > STACK: 090507_Stable_gb_update.pl:238 > ----------------------------------------------------------- > > > I'm running the efetch inside an eval and letting it try a total of 6 > times with a 5 sedond sleep in between, but the error is consistent. > > So I consider two possibilities: > 1) Has something changed on the Entrez server recently? Has anyone > else started having this kind of problem? > > 2) Have I inserted some subtle flaw into my code that would lead to a > failure of efetch. > > I am attaching two text files, one with the code chunklet that is > doing this and the other the output from the script. > > Any help or suggestions are profoundly appreciated. 
> > Warren Gallin > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue May 12 14:36:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 May 2009 13:36:40 -0500 Subject: [Bioperl-l] fastq parsing problem In-Reply-To: <4A09BBE0.7010000@sendu.me.uk> References: <4A09BBE0.7010000@sendu.me.uk> Message-ID: On May 12, 2009, at 1:11 PM, Sendu Bala wrote: > John Marshall wrote: >> Michael Muratet wrote: >>> I've got a problem parsing fastq output from the maq aligner. The >>> parser is throwing an exception for the following record: >>> >>> @HWI-EAS146:3:1:2:177#0/1 >>> CTCCGCTNNCTTCTCAG[...] >>> + >>> @,AB=>-&&:5).;+*=[...] >>> >>> I looked up the line in fastq.pm that does the parsing: >>> >>> 116 my ($top,$sequence,$top2,$qualsequence) = [...] >> This is the fastq parser from 1.5.2 or thereabouts, which had a bug >> (the >> $/ definition just above this code) that prevented it from parsing a >> record with a quality line starting with "@". This was probably not >> recognised as a bug for a long time due to the enduring myth that >> fastq >> quality lines always start with "!". > > I see you talked about it in the discussion page, but I think it > might be time to change the wiki page as well: > http://www.bioperl.org/wiki/FASTQ_sequence_format > > That caught me out as well. *sigh* Updated, along with links to the MAQ FASTQ page and Wikipedia. I'll update the module docs as well. 
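
To illustrate why a quality line beginning with "@" need not be ambiguous, here is a minimal sketch that reads records four lines at a time rather than splitting the input on "@". It assumes every record occupies exactly four lines (no wrapped sequence or quality lines), which FASTQ in general does not guarantee. The name next_fastq_record and the sample record are made up for illustration; this is not how Bio::SeqIO::fastq is implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reader (not the Bio::SeqIO::fastq implementation): consume
# exactly four lines per record instead of splitting the input on "\n@".
sub next_fastq_record {
    my ($fh) = @_;
    my $header = <$fh>;
    return unless defined $header;    # clean end of input
    my $seq  = <$fh>;
    my $plus = <$fh>;
    my $qual = <$fh>;
    die "Truncated FASTQ record\n" unless defined $qual;
    chomp($header, $seq, $plus, $qual);
    die "Header must start with '\@': $header\n" unless $header =~ /^\@/;
    die "Third line must start with '+': $plus\n" unless $plus =~ /^\+/;
    die "Sequence/quality length mismatch for $header\n"
        unless length($seq) == length($qual);
    return { id => substr($header, 1), seq => $seq, qual => $qual };
}

# Made-up record whose quality line begins with "@"; the "+" line is left
# otherwise empty, as discussed in the thread.
my $data = "\@read1\nACGTN\n+\n\@,AB=\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
while (my $rec = next_fastq_record($fh)) {
    print "$rec->{id}: $rec->{seq} / $rec->{qual}\n";
}
close $fh;
```

Counting lines sidesteps the record-separator problem entirely, at the cost of rejecting multi-line records; a general-purpose parser has to track sequence length instead.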
chris From KBriedis at accelrys.com Tue May 12 14:42:33 2009 From: KBriedis at accelrys.com (Kristine Briedis) Date: Tue, 12 May 2009 14:42:33 -0400 Subject: [Bioperl-l] Eutilities epost/efetch problem In-Reply-To: <2F9A2E1F-877B-4571-B10C-4F89FB22488D@illinois.edu> References: <2977DDC5-D26F-4643-AA4C-5A11EC323B94@ualberta.ca> <2F9A2E1F-877B-4571-B10C-4F89FB22488D@illinois.edu> Message-ID: Hi Chris, I'm not getting the error anymore. NCBI must have fixed something. Cheers, Kristine -----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Tuesday, May 12, 2009 11:36 AM To: Kristine Briedis Cc: Warren Gallin; BioPerl List Subject: Re: [Bioperl-l] Eutilities epost/efetch problem Not showing up in tests, so this may be something very specific that changed. I'll try to reproduce it. chris On May 12, 2009, at 12:19 PM, Kristine Briedis wrote: > Hi Warren, > > We've noticed the same EFetch error. I emailed NCBI and will let > you know what they say. > > Cheers, > Kristine > > > =============================== > Kristine Briedis, Ph.D. > Bioinformatics Software Engineer > Accelrys, Inc. 
> 10188 Telesis Court, Suite 100 > San Diego, CA 92121 USA > kbriedis at accelrys.com > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org > ] On Behalf Of Warren Gallin > Sent: Monday, May 11, 2009 6:36 PM > To: BioPerl List > Subject: [Bioperl-l] Eutilities epost/efetch problem > > Hi folks, > > Something started failing for me this morning that had been working > reliably for the last week, > > I post an array of gi numbers, a history is successfully returned, > but when I try to use efetch to get the records, it fails with the > error: > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Response Error > Not Found > STACK: Error::throw > STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:368 > STACK: Bio::DB::GenericWebAgent::get_Response /Library/Perl/5.8.8/Bio/ > DB/GenericWebAgent.pm:215 > STACK: 090507_Stable_gb_update.pl:238 > ----------------------------------------------------------- > > > I'm running the efetch inside an eval and letting it try a total of 6 > times with a 5 sedond sleep in between, but the error is consistent. > > So I consider two possibilities: > 1) Has something changed on the Entrez server recently? Has anyone > else started having this kind of problem? > > 2) Have I inserted some subtle flaw into my code that would lead to a > failure of efetch. > > I am attaching two text files, one with the code chunklet that is > doing this and the other the output from the script. > > Any help or suggestions are profoundly appreciated. 
> > Warren Gallin
From cjfields at illinois.edu Tue May 12 14:57:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 May 2009 13:57:32 -0500 Subject: [Bioperl-l] Eutilities epost/efetch problem In-Reply-To: References: <2977DDC5-D26F-4643-AA4C-5A11EC323B94@ualberta.ca> <2F9A2E1F-877B-4571-B10C-4F89FB22488D@illinois.edu> Message-ID:
Same here (no error). Just ran the below.

chris

#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::DB::EUtilities;

my @gi_number = qw(
    41395563
    31618162
    81831839
    54038971
);

my $gpeptfactory = Bio::DB::EUtilities->new(
    -eutil          => 'epost',
    -db             => 'protein',
    -rettype        => 'gp',
    -retmode        => 'text',
    -tool           => 'VKCDB_Update',
    -email          => 'wgallin at ualberta.ca',
    -id             => \@gi_number,
    -keep_histories => 1);

my $hist = $gpeptfactory->next_cookie || die "Arghh!";

$gpeptfactory->set_parameters(-eutil   => 'efetch',
                              -history => $hist);

$gpeptfactory->get_Response(-file => '>test.gb');

On May 12, 2009, at 1:42 PM, Kristine Briedis wrote: > Hi Chris, > > I'm not getting the error anymore. NCBI must have fixed something. > > Cheers, > Kristine > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Tuesday, May 12, 2009 11:36 AM > To: Kristine Briedis > Cc: Warren Gallin; BioPerl List > Subject: Re: [Bioperl-l] Eutilities epost/efetch problem > > Not showing up in tests, so this may be something very specific that > changed. I'll try to reproduce it. > > chris > > On May 12, 2009, at 12:19 PM, Kristine Briedis wrote: > >> Hi Warren, >> >> We've noticed the same EFetch error. I emailed NCBI and will let >> you know what they say. >> >> Cheers, >> Kristine >> >> >> =============================== >> Kristine Briedis, Ph.D. >> Bioinformatics Software Engineer >> Accelrys, Inc.
>> 10188 Telesis Court, Suite 100 >> San Diego, CA 92121 USA >> kbriedis at accelrys.com >> >> >> >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org >> ] On Behalf Of Warren Gallin >> Sent: Monday, May 11, 2009 6:36 PM >> To: BioPerl List >> Subject: [Bioperl-l] Eutilities epost/efetch problem >> >> Hi folks, >> >> Something started failing for me this morning that had been working >> reliably for the last week, >> >> I post an array of gi numbers, a history is successfully returned, >> but when I try to use efetch to get the records, it fails with the >> error: >> >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Response Error >> Not Found >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm: >> 368 >> STACK: Bio::DB::GenericWebAgent::get_Response /Library/Perl/5.8.8/ >> Bio/ >> DB/GenericWebAgent.pm:215 >> STACK: 090507_Stable_gb_update.pl:238 >> ----------------------------------------------------------- >> >> >> I'm running the efetch inside an eval and letting it try a total >> of 6 >> times with a 5 sedond sleep in between, but the error is consistent. >> >> So I consider two possibilities: >> 1) Has something changed on the Entrez server recently? Has anyone >> else started having this kind of problem? >> >> 2) Have I inserted some subtle flaw into my code that would lead >> to a >> failure of efetch. >> >> I am attaching two text files, one with the code chunklet that is >> doing this and the other the output from the script. >> >> Any help or suggestions are profoundly appreciated. 
>> >> Warren Gallin >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Tue May 12 15:19:03 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 12 May 2009 12:19:03 -0700 Subject: [Bioperl-l] Eutilities epost/efetch problem In-Reply-To: References: <2977DDC5-D26F-4643-AA4C-5A11EC323B94@ualberta.ca> <2F9A2E1F-877B-4571-B10C-4F89FB22488D@illinois.edu> Message-ID: <4A09CBA7.7010206@cornell.edu> I don't think this is terribly unusual to have Efetch go down. I have an automated pipeline that uses efetch to cross-check some stuff, and it goes down every once in a while, sometimes for up to a day or so. Might consider having a little nicer error message for this case? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu Chris Fields wrote: > Same here (no error). Just ran the below. > > chris > > #!/usr/bin/perl -w > > use strict; > use warnings; > use Bio::DB::EUtilities; > > my @gi_number = qw( > 41395563 > 31618162 > 81831839 > 54038971 > ); > > my $gpeptfactory = Bio::DB::EUtilities->new( > -eutil => 'epost', > -db => 'protein', > -rettype => 'gp', > -retmode => 'text', > -tool => 'VKCDB_Update', > -email => 'wgallin at ualberta.ca', > -id => \@gi_number, > -keep_histories => 1); > > my $hist = $gpeptfactory->next_cookie || die "Arghh!"; > > $gpeptfactory->set_parameters(-eutil => 'efetch', > -history => $hist); > > $gpeptfactory->get_Response(-file => '>test.gb'); > > On May 12, 2009, at 1:42 PM, Kristine Briedis wrote: > >> Hi Chris, >> >> I'm not getting the error anymore. NCBI must have fixed something. 
>> >> Cheers, >> Kristine >> >> >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Tuesday, May 12, 2009 11:36 AM >> To: Kristine Briedis >> Cc: Warren Gallin; BioPerl List >> Subject: Re: [Bioperl-l] Eutilities epost/efetch problem >> >> Not showing up in tests, so this may be something very specific that >> changed. I'll try to reproduce it. >> >> chris >> >> On May 12, 2009, at 12:19 PM, Kristine Briedis wrote: >> >>> Hi Warren, >>> >>> We've noticed the same EFetch error. I emailed NCBI and will let >>> you know what they say. >>> >>> Cheers, >>> Kristine >>> >>> >>> =============================== >>> Kristine Briedis, Ph.D. >>> Bioinformatics Software Engineer >>> Accelrys, Inc. >>> 10188 Telesis Court, Suite 100 >>> San Diego, CA 92121 USA >>> kbriedis at accelrys.com >>> >>> >>> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org >>> ] On Behalf Of Warren Gallin >>> Sent: Monday, May 11, 2009 6:36 PM >>> To: BioPerl List >>> Subject: [Bioperl-l] Eutilities epost/efetch problem >>> >>> Hi folks, >>> >>> Something started failing for me this morning that had been working >>> reliably for the last week, >>> >>> I post an array of gi numbers, a history is successfully returned, >>> but when I try to use efetch to get the records, it fails with the >>> error: >>> >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: Response Error >>> Not Found >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:368 >>> STACK: Bio::DB::GenericWebAgent::get_Response /Library/Perl/5.8.8/Bio/ >>> DB/GenericWebAgent.pm:215 >>> STACK: 090507_Stable_gb_update.pl:238 >>> ----------------------------------------------------------- >>> >>> >>> I'm running the efetch inside an eval and letting it try a total >>> of 6 >>> times with a 5 sedond sleep in between, but the error is consistent. 
>>> >>> So I consider two possibilities: >>> 1) Has something changed on the Entrez server recently? Has anyone >>> else started having this kind of problem? >>> >>> 2) Have I inserted some subtle flaw into my code that would lead >>> to a >>> failure of efetch. >>> >>> I am attaching two text files, one with the code chunklet that is >>> doing this and the other the output from the script. >>> >>> Any help or suggestions are profoundly appreciated. >>> >>> Warren Gallin >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue May 12 15:36:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 May 2009 14:36:00 -0500 Subject: [Bioperl-l] Eutilities epost/efetch problem In-Reply-To: <4A09CBA7.7010206@cornell.edu> References: <2977DDC5-D26F-4643-AA4C-5A11EC323B94@ualberta.ca> <2F9A2E1F-877B-4571-B10C-4F89FB22488D@illinois.edu> <4A09CBA7.7010206@cornell.edu> Message-ID: Rob, The error message is generated on their side or from LWP; from GenericWebAgent: if ($response->is_error) { $self->throw("Response Error\n".$response->message); } We could change that, but I try to leave it as generic as possible. chris On May 12, 2009, at 2:19 PM, Robert Buels wrote: > I don't think this is terribly unusual to have Efetch go down. I > have an automated pipeline that uses efetch to cross-check some > stuff, and it goes down every once in a while, sometimes for up to a > day or so. > > Might consider having a little nicer error message for this case? 
> > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > > > > Chris Fields wrote: >> Same here (no error). Just ran the below. >> >> chris >> >> #!/usr/bin/perl -w >> >> use strict; >> use warnings; >> use Bio::DB::EUtilities; >> >> my @gi_number = qw( >> 41395563 >> 31618162 >> 81831839 >> 54038971 >> ); >> >> my $gpeptfactory = Bio::DB::EUtilities->new( >> -eutil => 'epost', >> -db => 'protein', >> -rettype => 'gp', >> -retmode => 'text', >> -tool => 'VKCDB_Update', >> -email => 'wgallin at ualberta.ca', >> -id => \@gi_number, >> -keep_histories => 1); >> >> my $hist = $gpeptfactory->next_cookie || die "Arghh!"; >> >> $gpeptfactory->set_parameters(-eutil => 'efetch', >> -history => $hist); >> >> $gpeptfactory->get_Response(-file => '>test.gb'); >> >> On May 12, 2009, at 1:42 PM, Kristine Briedis wrote: >> >>> Hi Chris, >>> >>> I'm not getting the error anymore. NCBI must have fixed something. >>> >>> Cheers, >>> Kristine >>> >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Tuesday, May 12, 2009 11:36 AM >>> To: Kristine Briedis >>> Cc: Warren Gallin; BioPerl List >>> Subject: Re: [Bioperl-l] Eutilities epost/efetch problem >>> >>> Not showing up in tests, so this may be something very specific that >>> changed. I'll try to reproduce it. >>> >>> chris >>> >>> On May 12, 2009, at 12:19 PM, Kristine Briedis wrote: >>> >>>> Hi Warren, >>>> >>>> We've noticed the same EFetch error. I emailed NCBI and will let >>>> you know what they say. >>>> >>>> Cheers, >>>> Kristine >>>> >>>> >>>> =============================== >>>> Kristine Briedis, Ph.D. >>>> Bioinformatics Software Engineer >>>> Accelrys, Inc. 
>>>> 10188 Telesis Court, Suite 100 >>>> San Diego, CA 92121 USA >>>> kbriedis at accelrys.com >>>> >>>> >>>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org >>>> ] On Behalf Of Warren Gallin >>>> Sent: Monday, May 11, 2009 6:36 PM >>>> To: BioPerl List >>>> Subject: [Bioperl-l] Eutilities epost/efetch problem >>>> >>>> Hi folks, >>>> >>>> Something started failing for me this morning that had been >>>> working >>>> reliably for the last week, >>>> >>>> I post an array of gi numbers, a history is successfully >>>> returned, >>>> but when I try to use efetch to get the records, it fails with the >>>> error: >>>> >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: Response Error >>>> Not Found >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/ >>>> Root.pm:368 >>>> STACK: Bio::DB::GenericWebAgent::get_Response /Library/Perl/5.8.8/ >>>> Bio/ >>>> DB/GenericWebAgent.pm:215 >>>> STACK: 090507_Stable_gb_update.pl:238 >>>> ----------------------------------------------------------- >>>> >>>> >>>> I'm running the efetch inside an eval and letting it try a >>>> total of 6 >>>> times with a 5 sedond sleep in between, but the error is >>>> consistent. >>>> >>>> So I consider two possibilities: >>>> 1) Has something changed on the Entrez server recently? Has >>>> anyone >>>> else started having this kind of problem? >>>> >>>> 2) Have I inserted some subtle flaw into my code that would >>>> lead to a >>>> failure of efetch. >>>> >>>> I am attaching two text files, one with the code chunklet that >>>> is >>>> doing this and the other the output from the script. >>>> >>>> Any help or suggestions are profoundly appreciated. 
>>>> >>>> Warren Gallin
From bosborne11 at verizon.net Tue May 12 14:50:40 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Tue, 12 May 2009 14:50:40 -0400 Subject: [Bioperl-l] [Gmod-gbrowse] Non-root installation: problem with Module::Builder versions In-Reply-To: <71ee57c70905121036w7fba1ac6x7e68db41f3035ee3@mail.gmail.com> References: <4A03986D.7080007@gmail.com> <71ee57c70905121036w7fba1ac6x7e68db41f3035ee3@mail.gmail.com> Message-ID: <2A93D0F6-3A2A-4638-A095-26CFC595E8A1@verizon.net> Neil, Try setting the environment variable PERL5LIB: PERL5LIB A colon-separated list of directories in which to look for Perl library files before looking in the standard library and the current directory. If PERL5LIB is not defined, PERLLIB is used. When running taint checks (because the script was running setuid or setgid, or the -T switch was used), neither variable is used. The script should instead say use lib "/my/directory"; Brian O. On May 12, 2009, at 1:36 PM, Dave Clements, GMOD Help Desk wrote: > Hi Neil, > > I'm cross-posting your question to the BioPerl list as 1) it is more > of a perl question than a GBrowse question, and 2) I don't know the > answer. > > Dave C. > GMOD Help Desk > > Was this helpful?
Let us know at http://gmod.org/wiki/Help_Desk_Feedback > > Learn more about GMOD at SMBE & Arthropod Genomics: > http://ccg.biology.uiowa.edu/smbe/symposia.php?action=view&sym_ID=27 > http://www.k-state.edu/agc/symp2009/seminar.html > > > > On Thu, May 7, 2009 at 7:26 PM, Neil Saunders > wrote: >> I'm trying to install the latest Gbrowse (1.99) on a machine where >> I do >> not have root access (Ubuntu/dapper). >> >> I have set up non-root CPAN and installed all of the prerequisites, >> no >> problems, in ~/lib/perl5. However, when I try to install Gbrowse >> either >> via CPAN or using the latest CVS Build script, I run into this >> problem: >> >> Global symbol "$VAR1" requires explicit package name at (eval 28) >> line >> 1088, line 1. >> ...propagated at /usr/local/share/perl/5.8.7/Module/Build/ >> Base.pm line >> 1002, line 1. >> make: *** [all] Error 255 >> LDS/GBrowse-1.99.tar.gz >> /usr/bin/make -- NOT OK >> >> >> It seems that there are 2 versions of Module::Builder on the >> machine. I >> have installed a version from CPAN which is found in >> ~/lib/perl5/site_perl/Module/. However, from the above error it >> looks >> as though the install is trying to use a system-wide version of >> Module::Build in /usr/local/share/perl/5.8.7. >> >> Can anyone shed any light on either the error message, or a way to >> force >> usage of my $HOME module, not the system one? >> >> thanks, >> Neil Saunders >> -- >> Statistical Bioinformatics - Health >> CSIRO Mathematical and Information Sciences >> Locked Bag 17, North Ryde, NSW 1670, Australia >> >> http://friendfeed.com/neilfws >> >> ------------------------------------------------------------------------------ >> The NEW KODAK i700 Series Scanners deliver under ANY circumstances! >> Your >> production scanning environment may not be a perfect world - but >> thanks to >> Kodak, there's a perfect scanner to get the job done! 
With the NEW >> KODAK i700 >> Series Scanner you'll get full speed at 300 dpi even with all image >> processing features enabled. http://p.sf.net/sfu/kodak-com >> _______________________________________________ >> Gmod-gbrowse mailing list >> Gmod-gbrowse at lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dan.bolser at gmail.com Tue May 12 16:27:40 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 12 May 2009 21:27:40 +0100 Subject: [Bioperl-l] The Power of R (Chris Fields) In-Reply-To: <1F1240778FB0AF46B4E5A72C44D2C7472A29BE7E@exch1-hi.accelrys.net> References: <8F435B66-33CF-467D-8D86-AA8EF2309E98@lsi.upc.edu> <1F1240778FB0AF46B4E5A72C44D2C7472A29BE7E@exch1-hi.accelrys.net> Message-ID: <2c8757af0905121327i681b13c3q4892ea9751c4adad@mail.gmail.com> 2009/5/8 Scott Markel : > Gabriel, > > A quick personal comment - Thank you for referencing the "Using > BioPerl" book that Jason Stajich, Ewan Birney, and I are writing. > Now we'll have to finish it. :) Please hurry! ;-) > Scott > From maj at fortinbras.us Tue May 12 16:06:56 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 May 2009 16:06:56 -0400 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?) In-Reply-To: References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com><2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com><2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com><9ED2DE76-C331-41D2-B303-7BD97B5CAF10@illinois.edu><2c8757af0905120617v37aa7b8udf94ca558ed3415@mail.gmail.com> Message-ID: <88072CDBECA446D0A7C48E74237F59FE@NewLife> Patch below (to SearchUtils.pm) fixes the non-numeric warnings on Dan's data, but something deeper may be going on. 
Also get many of the following warnings, haven't looked at it closely:

--------------------- WARNING ---------------------
MSG: Removing score value(s)
---------------------------------------------------

PATCH:

Index: SearchUtils.pm
===================================================================
--- SearchUtils.pm (revision 15674)
+++ SearchUtils.pm (working copy)
@@ -252,8 +252,8 @@
     }
 
     $qctg_dat{ "$frame$strand" }->{'length_aln_query'} += $_->{'stop'} - $_->{'start'} + 1;
-    $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_->{'iden'};
-    $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_->{'cons'};
+    $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_->{'iden'} || 0;
+    $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_->{'cons'} || 0;
     $qctg_dat{ "$frame$strand" }->{'qstrand'} = $strand;
     }
 
@@ -407,9 +407,12 @@
         };
         if($@) { warn "\a\n$@\n"; }
         else {
+            # make sure numerical
+            $_->{'iden'} ||= 0;
+            $_->{'cons'} ||= 0;
             $_->{'start'} = $start; # Assign a new start coordinate to the contig
-            $_->{'iden'} += $numID; # and add new data to #identical, #conserved.
-            $_->{'cons'} += $numCons;
+            $_->{'iden'} += ($numID||0); # and add new data to #identical, #conserved.
+            $_->{'cons'} += ($numCons||0);
             push(@{$_->{hsps}}, $hsp);
             $overlap = 1;
         }
@@ -424,9 +427,13 @@
         };
         if($@) { warn "\a\n$@\n"; }
         else {
+            # make sure numerical
+            $_->{'iden'} ||= 0;
+            $_->{'cons'} ||= 0;
+
             $_->{'stop'} = $stop; # Assign a new stop coordinate to the contig
-            $_->{'iden'} += $numID; # and add new data to #identical, #conserved.
-            $_->{'cons'} += $numCons;
+            $_->{'iden'} += ($numID||0); # and add new data to #identical, #conserved.
+            $_->{'cons'} += ($numCons||0);
             push(@{$_->{hsps}}, $hsp);
             $overlap = 1;
         }
@@ -461,8 +468,8 @@
         };
         if($@) { warn "\a\n$@\n"; }
         else {
-            $ids += $these_ids;
-            $cons += $these_cons;
+            $ids += ($these_ids||0);
+            $cons += ($these_cons||0);
         }
 
         last if $hsp_start == $u_start;
@@ -490,8 +497,8 @@
         };
         if($@) { warn "\a\n$@\n"; }
         else {
-            $ids += $these_ids;
-            $cons += $these_cons;
+            $ids += ($these_ids||0);
+            $cons += ($these_cons||0);
         }
 
         last if $hsp_end == $u_stop;

----- Original Message -----
From: "Chris Fields"
To: "Mark A. Jensen"
Cc: "BioPerl List" ; "Dan Bolser"
Sent: Tuesday, May 12, 2009 10:04 AM
Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?)

> More complicated than that, I'm afraid. We should try to fix that at the
> source of the problem.
>
> This appears to stem from SearchUtils HSP tiling, which in turn utilizes
> HSPI::matches(), which in turn checks num_identical/num_conserved. My guess
> is, since this is blasttable format, one of these isn't set and thus is
> returning the wrong value. I'll attempt to track it down today, but it may
> take some time.
>
> chris
>
> On May 12, 2009, at 8:29 AM, Mark A. Jensen wrote:
>
>> This sounds like a
>>
>> $sum = eval join( '+', @a);
>>
>> problem, which can be fixed with
>>
>> $sum = eval join('+', map { $_ || () } @a) ;
>>
>> MAJ
>> ----- Original Message ----- From: "Dan Bolser"
>> To: "Chris Fields"
>> Cc: "BioPerl List"
>> Sent: Tuesday, May 12, 2009 9:17 AM
>> Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features'
>> fromSearchIO?)
>>
>>
>> 2009/5/12 Chris Fields :
>>> Fixed that in svn. We're all still learning the ropes...
>>
>> In that case, I'm seeing multiple instances of...
>> >> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line >> 256 >> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line >> 412 >> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line >> 429 >> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line >> 465 >> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line >> 473 >> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line >> 494 >> Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line >> 502 >> >> >> Hmm... I was about to go on to complain about the weird GFF that I was >> seeing, but suddenly it looks OK. My bioperl install must think your >> standing over my shoulder and is therefore behaving itself! >> >> >> Thanks again for all the help, >> Dan. >> >> >> >> >>> chris >>> >>> On May 12, 2009, at 5:55 AM, Dan Bolser wrote: >>> >>>> 2009/5/12 Dan Bolser : >>>>> >>>>> Unfortunately bp_search2gff.pl is giving me errors: >>>>> >>>>> bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered -f >>>>> blasttable -o BlastResults/blast_table_filtered.gff -t hit >>>>> --match --target --component >>>>> >>>>> --------------------- WARNING --------------------- >>>>> MSG: Removing score value(s) >>>>> --------------------------------------------------- >>>>> Can't locate object method "remove_tags" via package >>>>> "Bio::SeqFeature::Similarity" at >>>>> /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm line >>>>> 393, line 5. >>>> >>>> >>>> I'm just learning the ropes... 
>>>> >>>> --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11 >>>> 15:25:55.000000000 +0100 >>>> +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12 >>>> 11:52:41.000000000 +0100 >>>> @@ -390,7 +390,7 @@ >>>> } >>>> if ($self->has_tag('score')) { >>>> $self->warn("Removing score value(s)"); >>>> - $self->remove_tags('score'); >>>> + $self->remove_tag('score'); >>>> } >>>> $self->add_tag_value('score',$value); >>>> } >>>> >>>> >>>> >>>> >>>> >>>>> Anyone seen this before? >>>>> >>>>> Cheers, >>>>> Dan. >>>>> >>>>> >>>>> >>>>> 2009/5/12 Dan Bolser : >>>>>> >>>>>> Thanks for the info guys, I think I was naively hoping that the >>>>>> feature would know how to cast itself as a 'SeqFeature' (GFF). >>>>>> >>>>>> I think I understand the problem better now, so I'll try to summarise: >>>>>> >>>>>> There is no standard way to encode a HSP as a feature (not least >>>>>> because there are two choices about which sequence (query or the hit) >>>>>> it should be attached to). BioPerl will try, but the result will not >>>>>> be "well structured" SeqFeatures or "well formed" GFF. >>>>>> >>>>>> >>>>>> From what I read I guess it should be possible to standardize this >>>>>> mapping (based on something in one of the examples or the 'search2gff' >>>>>> script), assuming you specify weather you want features put on the >>>>>> query or on the hit. >>>>>> >>>>>> At some point last year I was trying out the bp_search2gff.pl and my >>>>>> own code to write a GFF file for loading and viewing by Gbrowse. At >>>>>> that time I gave up, as nothing seemed to be working. I was hoping >>>>>> that doing this at a lower level (i.e. never writing any GFF myself) >>>>>> it would stand a better chance of working. >>>>>> >>>>>> Also I was thinking that Gbrowse, if given a SeqFeature::Store, could >>>>>> autoconfigure its interface to some degree. I guess its back to the >>>>>> docs ;-) >>>>>> >>>>>> >>>>>> >>>>>> I'll keep trying and see if I can get anywhere. 
>>>>>> >>>>>> Thanks again, >>>>>> Dan. >>>>>> >>>>>> >>>>>> >>>>>> References for the above: >>>>>> >>>>>> 2009/5/11 Jason Stajich : >>>>>> >>>>>>> otherwise you need to be converting the HSPs into seqfeatures with the >>>>>>> right associated information (i.e. the tag/value pairs that are in the >>>>>>> 9th >>>>>>> column) in order to have well structured data in the database. >>>>>> >>>>>>> You can get the individual features from the feature pair with >>>>>>> $hsp->query or $hsp->hit which can also be passed to a GFF writer (or >>>>>>> call >>>>>>> $hsp->hit->gff_string). Note that since the data storage is not >>>>>>> structured >>>>>>> in a GFF3 like-way this won't immediately produce well formed GFF3 for >>>>>>> the >>>>>>> 9th column. >>>>>> >>>>>> >>>>>> 2009/5/11 Chris Fields : >>>>>> >>>>>>> The main problem is the mapping is subjective based on what your >>>>>>> reference sequence is within the BLAST run (e.g. whether it is the >>>>>>> query or >>>>>>> the hit), and is something that can't be automatically discerned. 
I >>>>>>> ended >>>>>>> up rolling my own with SeqFeature::Store (just mapped the relevant data >>>>>>> to >>>>>>> Bio::DB::SeqFeatures), but I have long wanted to fix up the relevant >>>>>>> scripts >>>>>>> to integrate my changes in, just haven't had the time >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Tue May 12 17:27:57 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 13 May 2009 09:27:57 +1200 Subject: [Bioperl-l] alignable portion of a genome In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493DA5373@exchsth.agresearch.co.nz> References: <23480025.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF32493DA5373@exchsth.agresearch.co.nz> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32493DA54D2@exchsth.agresearch.co.nz> Adding the mutations is a little hacky (and probably slow) but I think it works correctly. 
The stats should work out OK but it's too early and I haven't had a coffee yet so can't be sure :-)

--Russell

============================
#!perl -w

my $seq = "atcgacgatcgaacgatcga";
my $debug = 0;


foreach ($seq =~ /(?=(\w{5}))/g){
    $h++;
    # add all the exact words to the hash
    $hash{$_}++;
    print "$_\n" if $debug;
    # mutate words and add to hash
    my @rr = mutate($_);
    foreach (@rr){
        print "$_\n" if $debug;
        $h++;
        $hash{$_}++;
    }
}


# print out the hash counts & stats
foreach (keys %hash){
    print "$_\t$hash{$_}\n" if $debug;
    $singles++ if($hash{$_} eq 1);
}
print $singles/$h,"\n";


# return every variant of the word with one or two positions masked by 'X'
sub mutate{
    my @array = split '',shift;
    my @res = ();
    my $rep = 'X';
    for(my$i = 0; $i <= $#array; $i++){
        my $old1 = $array[$i];
        splice @array, $i, 1, $rep;
        push @res, (join '', @array);
        for(my$j = $i+1; $j <= $#array; $j++){
            my $old2 = $array[$j];
            splice @array, $j, 1, $rep;
            push @res, (join '', @array);
            splice @array, $j, 1, $old2;
        }
        splice @array, $i, 1, $old1;
    }
    return @res;
}

================================

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Tuesday, 12 May 2009 3:56 p.m.
> To: 'fadista'; 'Bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] alignable portion of a genome
>
> Perfect matches is easy:
>
> $seq = "atcgacgatcgaacgatcga";
>
> foreach ($seq =~ /(?=(\w{5}))/g){$h++; $hash{$_}++}
> foreach (keys %hash){ $singles++ if($hash{$_} eq 1)}
> print $singles/$h;
>
> Could probably be done with map as well.
> Counting the mismatches might take a bit more thinking....
> Any ideas MAJ?
>
> --Russell
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of fadista
> > Sent: Monday, 11 May 2009 9:32 p.m.
> > To: Bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] alignable portion of a genome > > > > > > Hi, > > > > I would like to know of a good and fast way that could help me calculate the > > alignable portion of a genome (not human), given a reference sequence. > > When I say alignable portion I mean that I want to know all the positions of > > the genome that can be covered uniquely by reads of 36 bp and up to 2 > > mismatches. > > > > Some have advised me to work with Perl using the following strategy but I am > > not a Perl user so if someone has already a script for this function, it > > would be nice: > > > > "you could approach it by walking along the genome in a sliding window of > > 36 nt, and hash the frequency of each 36 nt sequence that you encounter. > > Then count how many of the 36 nt sequences had a frequency of exactly > > one. Divide this by the total number of 36nt windows visited. This > > should be do-able in about 20 lines of Perl." > > > > > > Best regards and thanks in advance > > > > -- > > View this message in context: http://www.nabble.com/alignable-portion-of-a- > > genome-tp23480025p23480025.html > > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the > sender immediately. > ======================================================================= > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Tue May 12 19:04:35 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 12 May 2009 16:04:35 -0700 Subject: [Bioperl-l] [Gmod-gbrowse] Non-root installation: problem with Module::Builder versions In-Reply-To: <4A09FBE2.8040000@gmail.com> References: <4A03986D.7080007@gmail.com> <71ee57c70905121036w7fba1ac6x7e68db41f3035ee3@mail.gmail.com> <2A93D0F6-3A2A-4638-A095-26CFC595E8A1@verizon.net> <4A09FBE2.8040000@gmail.com> Message-ID: So this doesn't work? ./Build --install_base ~/ install On May 12, 2009, at 3:44 PM, Neil Saunders wrote: >> Try setting the environmental variable PERL5LIB: > > Thanks for the tip - however, PERL5LIB is set (to ~/lib/perl5). > > The Module::Build docs state that Config.pm is used by Module::Build. > So far as I can tell, the initial Build is using the system-wide perl > installation (/usr/local, /etc/CPAN) and the 'Build install' to $HOME > uses my personal Module::Build. Problems arise because these are > different versions (0.28 v 0.32). > > I assume that I can edit the Build scripts in some way to use only my > personal installation - will keep working on this. > > Neil > -- > Statistical Bioinformatics - Health > CSIRO Mathematical and Information Sciences > Locked Bag 17, North Ryde, NSW 1670, Australia > > http://friendfeed.com/neilfws > > ------------------------------------------------------------------------------ > The NEW KODAK i700 Series Scanners deliver under ANY circumstances! > Your > production scanning environment may not be a perfect world - but > thanks to > Kodak, there's a perfect scanner to get the job done! 
With the NEW > KODAK i700 > Series Scanner you'll get full speed at 300 dpi even with all image > processing features enabled. http://p.sf.net/sfu/kodak-com > _______________________________________________ > Gmod-gbrowse mailing list > Gmod-gbrowse at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse Jason Stajich jason at bioperl.org From jason at bioperl.org Tue May 12 19:07:21 2009 From: jason at bioperl.org (Jason Stajich) Date: Tue, 12 May 2009 16:07:21 -0700 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?) In-Reply-To: <88072CDBECA446D0A7C48E74237F59FE@NewLife> References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com><2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com><2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com><9ED2DE76-C331-41D2-B303-7BD97B5CAF10@illinois.edu><2c8757af0905120617v37aa7b8udf94ca558ed3415@mail.gmail.com> <88072CDBECA446D0A7C48E74237F59FE@NewLife> Message-ID: I really don't think tile_hsps should be used on BLAST data folks, it is a pretty blind approach. If you really want the right answer you need to do -links with WU- BLAST or FASTA. Been discussed a few times on the mailing list. Good to fix the code bug I guess to avoid the warnings, but unless you are going to walk through all the HSPs and extract the consistent paths wrt query I think you'll have loops, etc in there which will make Hit->percent_id non-accurate. -jason On May 12, 2009, at 1:06 PM, Mark A. Jensen wrote: > Patch below (to SearchUtils.pm) fixes the non-numeric warnings on > Dan's data, but something deeper may be going on. 
> Also get many of the following warnings, haven't looked at it closely: > > --------------------- WARNING --------------------- > MSG: Removing score value(s) > --------------------------------------------------- > > PATCH: > > Index: SearchUtils.pm > =================================================================== > --- SearchUtils.pm (revision 15674) > +++ SearchUtils.pm (working copy) > @@ -252,8 +252,8 @@ > } > > $qctg_dat{ "$frame$strand" }->{'length_aln_query'} += $_- > >{'stop'} - $_->{'start'} + 1; > - $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_- > >{'iden'}; > - $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_- > >{'cons'}; > + $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_- > >{'iden'} || 0; > + $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_- > >{'cons'} || 0; > $qctg_dat{ "$frame$strand" }->{'qstrand'} = $strand; > } > > @@ -407,9 +407,12 @@ > }; > if($@) { warn "\a\n$@\n"; } > else { > + # make sure numerical > + $_->{'iden'} ||= 0; > + $_->{'cons'} ||= 0; > $_->{'start'} = $start; # Assign a new start > coordinate to the contig > - $_->{'iden'} += $numID; # and add new data to > #identical, #conserved. > - $_->{'cons'} += $numCons; > + $_->{'iden'} += ($numID||0); # and add new data to > #identical, #conserved. > + $_->{'cons'} += ($numCons||0); > push(@{$_->{hsps}}, $hsp); > $overlap = 1; > } > @@ -424,9 +427,13 @@ > }; > if($@) { warn "\a\n$@\n"; } > else { > + # make sure numerical > + $_->{'iden'} ||= 0; > + $_->{'cons'} ||= 0; > + > $_->{'stop'} = $stop; # Assign a new stop coordinate > to the contig > - $_->{'iden'} += $numID; # and add new data to > #identical, #conserved. > - $_->{'cons'} += $numCons; > + $_->{'iden'} += ($numID||0); # and add new data to > #identical, #conserved. 
> + $_->{'cons'} += ($numCons||0); > push(@{$_->{hsps}}, $hsp); > $overlap = 1; > } > @@ -461,8 +468,8 @@ > }; > if($@) { warn "\a\n$@\n"; } > else { > - $ids += $these_ids; > - $cons += $these_cons; > + $ids += ($these_ids||0); > + $cons += ($these_cons||0); > } > > last if $hsp_start == $u_start; > @@ -490,8 +497,8 @@ > }; > if($@) { warn "\a\n$@\n"; } > else { > - $ids += $these_ids; > - $cons += $these_cons; > + $ids += ($these_ids||0); > + $cons += ($these_cons||0); > } > > last if $hsp_end == $u_stop; > > ----- Original Message ----- From: "Chris Fields" > > To: "Mark A. Jensen" > Cc: "BioPerl List" ; "Dan Bolser" > > Sent: Tuesday, May 12, 2009 10:04 AM > Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting > 'features'fromSearchIO?) > > >> More complicated than that, I'm afraid. We should try to fix that >> at the source of the problem. >> >> This appears to stem from SearchUtils HSP tiling, which in turn >> utilizes HSPI::matches(), which in turn checks num_identical/ >> num_conserved. My guess is, since this is blasttable format, one >> of these isn't set and thus is returning the wrong value. I'll >> attempt to track it down today, but it may take some time. >> >> chris >> >> On May 12, 2009, at 8:29 AM, Mark A. Jensen wrote: >> >>> This sounds like a >>> >>> $sum = eval join( '+', @a); >>> >>> problem, which can be fixed with >>> >>> $sum = eval join('+', map { $_ || () } @a) ; >>> >>> MAJ >>> ----- Original Message ----- From: "Dan Bolser" >> > >>> To: "Chris Fields" >>> Cc: "BioPerl List" >>> Sent: Tuesday, May 12, 2009 9:17 AM >>> Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features' >>> fromSearchIO?) >>> >>> >>> 2009/5/12 Chris Fields : >>>> Fixed that in svn. We're all still learning the ropes... >>> >>> In that case, I'm seeing multiple instances of... 
>>> >>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>> SearchUtils.pm line 256 >>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>> SearchUtils.pm line 412 >>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>> SearchUtils.pm line 429 >>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>> SearchUtils.pm line 465 >>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>> SearchUtils.pm line 473 >>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>> SearchUtils.pm line 494 >>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>> SearchUtils.pm line 502 >>> >>> >>> Hmm... I was about to go on to complain about the weird GFF that I >>> was >>> seeing, but suddenly it looks OK. My bioperl install must think your >>> standing over my shoulder and is therefore behaving itself! >>> >>> >>> Thanks again for all the help, >>> Dan. >>> >>> >>> >>> >>>> chris >>>> >>>> On May 12, 2009, at 5:55 AM, Dan Bolser wrote: >>>> >>>>> 2009/5/12 Dan Bolser : >>>>>> >>>>>> Unfortunately bp_search2gff.pl is giving me errors: >>>>>> >>>>>> bp_search2gff.pl --version 3 -i BlastResults/ >>>>>> blast_table_filtered -f >>>>>> blasttable -o BlastResults/blast_table_filtered.gff -t hit >>>>>> --match --target --component >>>>>> >>>>>> --------------------- WARNING --------------------- >>>>>> MSG: Removing score value(s) >>>>>> --------------------------------------------------- >>>>>> Can't locate object method "remove_tags" via package >>>>>> "Bio::SeqFeature::Similarity" at >>>>>> /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/ >>>>>> Generic.pm line >>>>>> 393, line 5. >>>>> >>>>> >>>>> I'm just learning the ropes... 
>>>>> >>>>> --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11 >>>>> 15:25:55.000000000 +0100 >>>>> +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12 >>>>> 11:52:41.000000000 +0100 >>>>> @@ -390,7 +390,7 @@ >>>>> } >>>>> if ($self->has_tag('score')) { >>>>> $self->warn("Removing score value(s)"); >>>>> - $self->remove_tags('score'); >>>>> + $self->remove_tag('score'); >>>>> } >>>>> $self->add_tag_value('score',$value); >>>>> } >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> Anyone seen this before? >>>>>> >>>>>> Cheers, >>>>>> Dan. >>>>>> >>>>>> >>>>>> >>>>>> 2009/5/12 Dan Bolser : >>>>>>> >>>>>>> Thanks for the info guys, I think I was naively hoping that the >>>>>>> feature would know how to cast itself as a 'SeqFeature' (GFF). >>>>>>> >>>>>>> I think I understand the problem better now, so I'll try to >>>>>>> summarise: >>>>>>> >>>>>>> There is no standard way to encode a HSP as a feature (not least >>>>>>> because there are two choices about which sequence (query or >>>>>>> the hit) >>>>>>> it should be attached to). BioPerl will try, but the result >>>>>>> will not >>>>>>> be "well structured" SeqFeatures or "well formed" GFF. >>>>>>> >>>>>>> >>>>>>> From what I read I guess it should be possible to standardize >>>>>>> this >>>>>>> mapping (based on something in one of the examples or the >>>>>>> 'search2gff' >>>>>>> script), assuming you specify weather you want features put on >>>>>>> the >>>>>>> query or on the hit. >>>>>>> >>>>>>> At some point last year I was trying out the bp_search2gff.pl >>>>>>> and my >>>>>>> own code to write a GFF file for loading and viewing by >>>>>>> Gbrowse. At >>>>>>> that time I gave up, as nothing seemed to be working. I was >>>>>>> hoping >>>>>>> that doing this at a lower level (i.e. never writing any GFF >>>>>>> myself) >>>>>>> it would stand a better chance of working. >>>>>>> >>>>>>> Also I was thinking that Gbrowse, if given a >>>>>>> SeqFeature::Store, could >>>>>>> autoconfigure its interface to some degree. 
I guess its back >>>>>>> to the >>>>>>> docs ;-) >>>>>>> >>>>>>> >>>>>>> >>>>>>> I'll keep trying and see if I can get anywhere. >>>>>>> >>>>>>> Thanks again, >>>>>>> Dan. >>>>>>> >>>>>>> >>>>>>> >>>>>>> References for the above: >>>>>>> >>>>>>> 2009/5/11 Jason Stajich : >>>>>>> >>>>>>>> otherwise you need to be converting the HSPs into >>>>>>>> seqfeatures with the >>>>>>>> right associated information (i.e. the tag/value pairs that >>>>>>>> are in the 9th >>>>>>>> column) in order to have well structured data in the database. >>>>>>> >>>>>>>> You can get the individual features from the feature pair with >>>>>>>> $hsp->query or $hsp->hit which can also be passed to a GFF >>>>>>>> writer (or call >>>>>>>> $hsp->hit->gff_string). Note that since the data storage is >>>>>>>> not structured >>>>>>>> in a GFF3 like-way this won't immediately produce well >>>>>>>> formed GFF3 for the >>>>>>>> 9th column. >>>>>>> >>>>>>> >>>>>>> 2009/5/11 Chris Fields : >>>>>>> >>>>>>>> The main problem is the mapping is subjective based on what >>>>>>>> your >>>>>>>> reference sequence is within the BLAST run (e.g. whether it >>>>>>>> is the query or >>>>>>>> the hit), and is something that can't be automatically >>>>>>>> discerned. 
I ended >>>>>>>> up rolling my own with SeqFeature::Store (just mapped the >>>>>>>> relevant data to >>>>>>>> Bio::DB::SeqFeatures), but I have long wanted to fix up the >>>>>>>> relevant scripts >>>>>>>> to integrate my changes in, just haven't had the time >>>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason at bioperl.org From j_martin at lbl.gov Tue May 12 19:34:08 2009 From: j_martin at lbl.gov (Joel Martin) Date: Tue, 12 May 2009 16:34:08 -0700 Subject: [Bioperl-l] alignable portion of a genome In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32493DA54D2@exchsth.agresearch.co.nz> References: <23480025.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF32493DA5373@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32493DA54D2@exchsth.agresearch.co.nz> Message-ID: <20090512233408.GB17765@eniac.jgi-psf.org> Hello, Doing this with hashes ends up being a little inefficient for larger kmers like 35 in larger genomes. A suffix array tool like 'tallymer' will tell you the unique/non-unique kmer counts quickly, and as Aaron suggested generating fake reads based on a reference and introduce errors into them so you can evaluate how well they map back is a good strategy. maq has a command to do that built in. 
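
A minimal sketch of the hash-based sliding-window count discussed in this thread, for exact matches only. The subroutine name unique_kmer_fraction and the toy sequence are made up for illustration and are not part of BioPerl. As noted above, hashing every 36-mer is unlikely to scale to a large genome, where a suffix-array tool is probably the better fit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): fraction of length-$k windows
# whose sequence occurs exactly once in $seq. Exact matches only; the
# reverse strand and mismatches are ignored.
sub unique_kmer_fraction {
    my ($seq, $k) = @_;
    my $windows = length($seq) - $k + 1;
    return 0 if $windows < 1;

    # Count how often each k-mer is seen while sliding one base at a time.
    my %count;
    $count{ substr($seq, $_, $k) }++ for 0 .. $windows - 1;

    # A k-mer with count 1 corresponds to exactly one window.
    my $unique = grep { $_ == 1 } values %count;
    return $unique / $windows;
}

my $seq = 'atcgacgatcgaacgatcga';
printf "%.3f\n", unique_kmer_fraction($seq, 5);
```

For real data the sequence would come from a FASTA file and $k would be 36; allowing up to 2 mismatches needs a different approach, such as mapping simulated reads back to the reference.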
Joel

On Wed, May 13, 2009 at 09:27:57AM +1200, Smithies, Russell wrote:
> Adding the mutations is a little hacky (and probably slow) but I think it works correctly.
> The stats should work out OK but it's too early and I haven't had a coffee yet so can't be sure :-)
>
> --Russell
>
> ============================
> #!perl -w
>
> my $seq = "atcgacgatcgaacgatcga";
> my $debug = 0;
>
>
> foreach ($seq =~ /(?=(\w{5}))/g){
>     $h++;
>     # add all the exact words to the hash
>     $hash{$_}++;
>     print "$_\n" if $debug;
>     # mutate words and add to hash
>     my @rr = mutate($_);
>     foreach (@rr){
>         print "$_\n" if $debug;
>         $h++;
>         $hash{$_}++;
>     }
> }
>
>
> # print out the hash counts & stats
> foreach (keys %hash){
>     print "$_\t$hash{$_}\n" if $debug;
>     $singles++ if($hash{$_} eq 1);
> }
> print $singles/$h,"\n";
>
>
> sub mutate{
>     my @array = split '',shift;
>     my @res = ();
>     my $rep = 'X';
>     for(my$i = 0; $i <= $#array; $i++){
>         my $old1 = $array[$i];
>         splice @array, $i, 1, $rep;
>         push @res, (join '', @array);
>         for(my$j = $i+1; $j <= $#array; $j++){
>             my $old2 = $array[$j];
>             splice @array, $j, 1, $rep;
>             push @res, (join '', @array);
>             splice @array, $j, 1, $old2;
>         }
>         splice @array, $i, 1, $old1;
>     }
>     return @res;
> }
>
> ================================
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > Sent: Tuesday, 12 May 2009 3:56 p.m.
> > To: 'fadista'; 'Bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] alignable portion of a genome
> >
> > Perfect matches is easy:
> >
> > $seq = "atcgacgatcgaacgatcga";
> >
> > foreach ($seq =~ /(?=(\w{5}))/g){$h++; $hash{$_}++}
> > foreach (keys %hash){ $singles++ if($hash{$_} eq 1)}
> > print $singles/$h;
> >
> > Could probably be done with map as well.
> > Counting the mismatches might take a bit more thinking....
> > Any ideas MAJ?
> > > > --Russell > > > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of fadista > > > Sent: Monday, 11 May 2009 9:32 p.m. > > > To: Bioperl-l at lists.open-bio.org > > > Subject: [Bioperl-l] alignable portion of a genome > > > > > > > > > Hi, > > > > > > I would like to know of a good and fast way that could help me calculate the > > > alignable portion of a genome (not human), given a reference sequence. > > > When I say alignable portion I mean that I want to know all the positions of > > > the genome that can be covered uniquely by reads of 36 bp and up to 2 > > > mismatches. > > > > > > Some have advised me to work with Perl using the following strategy but I am > > > not a Perl user so if someone has already a script for this function, it > > > would be nice: > > > > > > "you could approach it by walking along the genome in a sliding window of > > > 36 nt, and hash the frequency of each 36 nt sequence that you encounter. > > > Then count how many of the 36 nt sequences had a frequency of exactly > > > one. Divide this by the total number of 36nt windows visited. This > > > should be do-able in about 20 lines of Perl." > > > > > > > > > Best regards and thanks in advance > > > > > > -- > > > View this message in context: http://www.nabble.com/alignable-portion-of-a- > > > genome-tp23480025p23480025.html > > > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From maj at fortinbras.us Tue May 12 20:31:53 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 May 2009 20:31:53 -0400 Subject: [Bioperl-l] [Gmod-gbrowse] Non-root installation: problem withModule::Builder versions In-Reply-To: <71ee57c70905121036w7fba1ac6x7e68db41f3035ee3@mail.gmail.com> References: <4A03986D.7080007@gmail.com> <71ee57c70905121036w7fba1ac6x7e68db41f3035ee3@mail.gmail.com> Message-ID: <35C29690B62445D4AAE15050823A6514@NewLife> Is this really an install problem? The error begins in Module::Build::Base in site_perl, no problem there. The error says $VAR1 has got scoping problems; that doesn't sound like a permissions problem. 
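For the question quoted below (which copy of a module perl picks up, and how to make a copy under $HOME win), a minimal sketch follows. It is not from the thread: first_on_path is a hypothetical helper, and the private directory is only assumed from Neil's description (~/lib/perl5/site_perl). It inspects the search path and nothing more; it does not explain the $VAR1 error itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Module::Build or GBrowse): return the
# first file on the given search path that would satisfy "require $module".
sub first_on_path {
    my ($module, @path) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Module::Build -> Module/Build.pm
    for my $dir (@path) {
        next if ref $dir;                     # skip @INC hooks (code refs etc.)
        return "$dir/$file" if -e "$dir/$file";
    }
    return;                                   # not found anywhere on the path
}

# Assumed private library root, taken from the message quoted below.
my $home    = defined $ENV{HOME} ? $ENV{HOME} : '.';
my $private = "$home/lib/perl5/site_perl";

# "use lib" and "perl -I" prepend to @INC, so a private copy is found first
# when one exists; model that by searching the private directory first.
my @search = ($private, @INC);

my $hit = first_on_path('Module::Build', @search);
print defined $hit
    ? "Module::Build would load from $hit\n"
    : "Module::Build not found on the search path\n";
```

Setting PERL5LIB to the private directory before running the build, or passing it with perl -I, puts that directory ahead of the system-wide ones in the same way.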
----- Original Message ----- From: "Dave Clements, GMOD Help Desk" To: "Neil Saunders" ; "BioPerl List" Cc: Sent: Tuesday, May 12, 2009 1:36 PM Subject: Re: [Bioperl-l] [Gmod-gbrowse] Non-root installation: problem withModule::Builder versions Hi Neil, I'm cross-posting your question to the BioPerl list as 1) it is more of a perl question than a GBrowse question, and 2) I don't know the answer. Dave C. GMOD Help Desk Was this helpful? Let us know at http://gmod.org/wiki/Help_Desk_Feedback Learn more about GMOD at SMBE & Arthropod Genomics: http://ccg.biology.uiowa.edu/smbe/symposia.php?action=view&sym_ID=27 http://www.k-state.edu/agc/symp2009/seminar.html On Thu, May 7, 2009 at 7:26 PM, Neil Saunders wrote: > I'm trying to install the latest Gbrowse (1.99) on a machine where I do > not have root access (Ubuntu/dapper). > > I have set up non-root CPAN and installed all of the prerequisites, no > problems, in ~/lib/perl5. However, when I try to install Gbrowse either > via CPAN or using the latest CVS Build script, I run into this problem: > > Global symbol "$VAR1" requires explicit package name at (eval 28) line > 1088, line 1. > ...propagated at /usr/local/share/perl/5.8.7/Module/Build/Base.pm line > 1002, line 1. > make: *** [all] Error 255 > LDS/GBrowse-1.99.tar.gz > /usr/bin/make -- NOT OK > > > It seems that there are 2 versions of Module::Builder on the machine. I > have installed a version from CPAN which is found in > ~/lib/perl5/site_perl/Module/. However, from the above error it looks > as though the install is trying to use a system-wide version of > Module::Build in /usr/local/share/perl/5.8.7. > > Can anyone shed any light on either the error message, or a way to force > usage of my $HOME module, not the system one? 
> > thanks, > Neil Saunders > -- > Statistical Bioinformatics - Health > CSIRO Mathematical and Information Sciences > Locked Bag 17, North Ryde, NSW 1670, Australia > > http://friendfeed.com/neilfws > From cjfields at illinois.edu Tue May 12 20:46:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 12 May 2009 19:46:39 -0500 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?) In-Reply-To: References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com><2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com><2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com><9ED2DE76-C331-41D2-B303-7BD97B5CAF10@illinois.edu><2c8757af0905120617v37aa7b8udf94ca558ed3415@mail.gmail.com> <88072CDBECA446D0A7C48E74237F59FE@NewLife> Message-ID: We should probably indicate this in the BLAST docs (and possibly deprecate using tile_hsps and its ilk in the long run). Worries me seeing modification of the score where none is apparent, so it may be worth tracking that down. chris On May 12, 2009, at 6:07 PM, Jason Stajich wrote: > I really don't think tile_hsps should be used on BLAST data folks, > it is a pretty blind approach. 
> If you really want the right answer you need to do -links with WU- > BLAST or FASTA. > Been discussed a few times on the mailing list. > > Good to fix the code bug I guess to avoid the warnings, but unless > you are going to walk through all the HSPs and extract the > consistent paths wrt query I think you'll have loops, etc in there > which will make Hit->percent_id non-accurate. > > -jason > > On May 12, 2009, at 1:06 PM, Mark A. Jensen wrote: > >> Patch below (to SearchUtils.pm) fixes the non-numeric warnings on >> Dan's data, but something deeper may be going on. >> Also get many of the following warnings, haven't looked at it >> closely: >> >> --------------------- WARNING --------------------- >> MSG: Removing score value(s) >> --------------------------------------------------- >> >> PATCH: >> >> Index: SearchUtils.pm >> =================================================================== >> --- SearchUtils.pm (revision 15674) >> +++ SearchUtils.pm (working copy) >> @@ -252,8 +252,8 @@ >> } >> >> $qctg_dat{ "$frame$strand" }->{'length_aln_query'} += $_- >> >{'stop'} - $_->{'start'} + 1; >> - $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_- >> >{'iden'}; >> - $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_- >> >{'cons'}; >> + $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_- >> >{'iden'} || 0; >> + $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_- >> >{'cons'} || 0; >> $qctg_dat{ "$frame$strand" }->{'qstrand'} = $strand; >> } >> >> @@ -407,9 +407,12 @@ >> }; >> if($@) { warn "\a\n$@\n"; } >> else { >> + # make sure numerical >> + $_->{'iden'} ||= 0; >> + $_->{'cons'} ||= 0; >> $_->{'start'} = $start; # Assign a new start >> coordinate to the contig >> - $_->{'iden'} += $numID; # and add new data to >> #identical, #conserved. >> - $_->{'cons'} += $numCons; >> + $_->{'iden'} += ($numID||0); # and add new data to >> #identical, #conserved. 
>> + $_->{'cons'} += ($numCons||0); >> push(@{$_->{hsps}}, $hsp); >> $overlap = 1; >> } >> @@ -424,9 +427,13 @@ >> }; >> if($@) { warn "\a\n$@\n"; } >> else { >> + # make sure numerical >> + $_->{'iden'} ||= 0; >> + $_->{'cons'} ||= 0; >> + >> $_->{'stop'} = $stop; # Assign a new stop coordinate >> to the contig >> - $_->{'iden'} += $numID; # and add new data to >> #identical, #conserved. >> - $_->{'cons'} += $numCons; >> + $_->{'iden'} += ($numID||0); # and add new data to >> #identical, #conserved. >> + $_->{'cons'} += ($numCons||0); >> push(@{$_->{hsps}}, $hsp); >> $overlap = 1; >> } >> @@ -461,8 +468,8 @@ >> }; >> if($@) { warn "\a\n$@\n"; } >> else { >> - $ids += $these_ids; >> - $cons += $these_cons; >> + $ids += ($these_ids||0); >> + $cons += ($these_cons||0); >> } >> >> last if $hsp_start == $u_start; >> @@ -490,8 +497,8 @@ >> }; >> if($@) { warn "\a\n$@\n"; } >> else { >> - $ids += $these_ids; >> - $cons += $these_cons; >> + $ids += ($these_ids||0); >> + $cons += ($these_cons||0); >> } >> >> last if $hsp_end == $u_stop; >> >> ----- Original Message ----- From: "Chris Fields" > > >> To: "Mark A. Jensen" >> Cc: "BioPerl List" ; "Dan Bolser" > > >> Sent: Tuesday, May 12, 2009 10:04 AM >> Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting >> 'features'fromSearchIO?) >> >> >>> More complicated than that, I'm afraid. We should try to fix that >>> at the source of the problem. >>> >>> This appears to stem from SearchUtils HSP tiling, which in turn >>> utilizes HSPI::matches(), which in turn checks num_identical/ >>> num_conserved. My guess is, since this is blasttable format, one >>> of these isn't set and thus is returning the wrong value. I'll >>> attempt to track it down today, but it may take some time. >>> >>> chris >>> >>> On May 12, 2009, at 8:29 AM, Mark A. 
Jensen wrote: >>> >>>> This sounds like a >>>> >>>> $sum = eval join( '+', @a); >>>> >>>> problem, which can be fixed with >>>> >>>> $sum = eval join('+', map { $_ || () } @a) ; >>>> >>>> MAJ >>>> ----- Original Message ----- From: "Dan Bolser" >>> > >>>> To: "Chris Fields" >>>> Cc: "BioPerl List" >>>> Sent: Tuesday, May 12, 2009 9:17 AM >>>> Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features' >>>> fromSearchIO?) >>>> >>>> >>>> 2009/5/12 Chris Fields : >>>>> Fixed that in svn. We're all still learning the ropes... >>>> >>>> In that case, I'm seeing multiple instances of... >>>> >>>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>>> SearchUtils.pm line 256 >>>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>>> SearchUtils.pm line 412 >>>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>>> SearchUtils.pm line 429 >>>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>>> SearchUtils.pm line 465 >>>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>>> SearchUtils.pm line 473 >>>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>>> SearchUtils.pm line 494 >>>> Argument "" isn't numeric in addition (+) at Bio/Search/ >>>> SearchUtils.pm line 502 >>>> >>>> >>>> Hmm... I was about to go on to complain about the weird GFF that >>>> I was >>>> seeing, but suddenly it looks OK. My bioperl install must think >>>> you're >>>> standing over my shoulder and is therefore behaving itself! >>>> >>>> >>>> Thanks again for all the help, >>>> Dan. 
>>>> >>>> >>>> >>>> >>>>> chris >>>>> >>>>> On May 12, 2009, at 5:55 AM, Dan Bolser wrote: >>>>> >>>>>> 2009/5/12 Dan Bolser : >>>>>>> >>>>>>> Unfortunately bp_search2gff.pl is giving me errors: >>>>>>> >>>>>>> bp_search2gff.pl --version 3 -i BlastResults/ >>>>>>> blast_table_filtered -f >>>>>>> blasttable -o BlastResults/blast_table_filtered.gff -t hit >>>>>>> --match --target --component >>>>>>> >>>>>>> --------------------- WARNING --------------------- >>>>>>> MSG: Removing score value(s) >>>>>>> --------------------------------------------------- >>>>>>> Can't locate object method "remove_tags" via package >>>>>>> "Bio::SeqFeature::Similarity" at >>>>>>> /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/ >>>>>>> Generic.pm line >>>>>>> 393, line 5. >>>>>> >>>>>> >>>>>> I'm just learning the ropes... >>>>>> >>>>>> --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11 >>>>>> 15:25:55.000000000 +0100 >>>>>> +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12 >>>>>> 11:52:41.000000000 +0100 >>>>>> @@ -390,7 +390,7 @@ >>>>>> } >>>>>> if ($self->has_tag('score')) { >>>>>> $self->warn("Removing score value(s)"); >>>>>> - $self->remove_tags('score'); >>>>>> + $self->remove_tag('score'); >>>>>> } >>>>>> $self->add_tag_value('score',$value); >>>>>> } >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Anyone seen this before? >>>>>>> >>>>>>> Cheers, >>>>>>> Dan. >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2009/5/12 Dan Bolser : >>>>>>>> >>>>>>>> Thanks for the info guys, I think I was naively hoping that the >>>>>>>> feature would know how to cast itself as a 'SeqFeature' (GFF). >>>>>>>> >>>>>>>> I think I understand the problem better now, so I'll try to >>>>>>>> summarise: >>>>>>>> >>>>>>>> There is no standard way to encode a HSP as a feature (not >>>>>>>> least >>>>>>>> because there are two choices about which sequence (query or >>>>>>>> the hit) >>>>>>>> it should be attached to). 
BioPerl will try, but the result >>>>>>>> will not >>>>>>>> be "well structured" SeqFeatures or "well formed" GFF. >>>>>>>> >>>>>>>> >>>>>>>> From what I read I guess it should be possible to standardize >>>>>>>> this >>>>>>>> mapping (based on something in one of the examples or the >>>>>>>> 'search2gff' >>>>>>>> script), assuming you specify whether you want features put >>>>>>>> on the >>>>>>>> query or on the hit. >>>>>>>> >>>>>>>> At some point last year I was trying out the >>>>>>>> bp_search2gff.pl and my >>>>>>>> own code to write a GFF file for loading and viewing by >>>>>>>> Gbrowse. At >>>>>>>> that time I gave up, as nothing seemed to be working. I was >>>>>>>> hoping >>>>>>>> that doing this at a lower level (i.e. never writing any GFF >>>>>>>> myself) >>>>>>>> it would stand a better chance of working. >>>>>>>> >>>>>>>> Also I was thinking that Gbrowse, if given a >>>>>>>> SeqFeature::Store, could >>>>>>>> autoconfigure its interface to some degree. I guess its back >>>>>>>> to the >>>>>>>> docs ;-) >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> I'll keep trying and see if I can get anywhere. >>>>>>>> >>>>>>>> Thanks again, >>>>>>>> Dan. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> References for the above: >>>>>>>> >>>>>>>> 2009/5/11 Jason Stajich : >>>>>>>> >>>>>>>>> otherwise you need to be converting the HSPs into >>>>>>>>> seqfeatures with the >>>>>>>>> right associated information (i.e. the tag/value pairs that >>>>>>>>> are in the 9th >>>>>>>>> column) in order to have well structured data in the database. >>>>>>>> >>>>>>>>> You can get the individual features from the feature pair with >>>>>>>>> $hsp->query or $hsp->hit which can also be passed to a GFF >>>>>>>>> writer (or call >>>>>>>>> $hsp->hit->gff_string). Note that since the data storage is >>>>>>>>> not structured >>>>>>>>> in a GFF3 like-way this won't immediately produce well >>>>>>>>> formed GFF3 for the >>>>>>>>> 9th column. 
>>>>>>>> >>>>>>>> >>>>>>>> 2009/5/11 Chris Fields : >>>>>>>> >>>>>>>>> The main problem is the mapping is subjective based on what >>>>>>>>> your >>>>>>>>> reference sequence is within the BLAST run (e.g. whether it >>>>>>>>> is the query or >>>>>>>>> the hit), and is something that can't be automatically >>>>>>>>> discerned. I ended >>>>>>>>> up rolling my own with SeqFeature::Store (just mapped the >>>>>>>>> relevant data to >>>>>>>>> Bio::DB::SeqFeatures), but I have long wanted to fix up the >>>>>>>>> relevant scripts >>>>>>>>> to integrate my changes in, just haven't had the time >>>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Jason Stajich > jason at bioperl.org > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue May 12 22:00:41 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 12 May 2009 22:00:41 -0400 Subject: [Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?) 
In-Reply-To: References: <2c8757af0905120210i607dfb90sad0d426e6e6b4a4e@mail.gmail.com><2c8757af0905120311y3075c96cs12b0bfb1ad9c0d52@mail.gmail.com><2c8757af0905120355i576ef21o3f1ed8774c00d01c@mail.gmail.com><9ED2DE76-C331-41D2-B303-7BD97B5CAF10@illinois.edu><2c8757af0905120617v37aa7b8udf94ca558ed3415@mail.gmail.com> <88072CDBECA446D0A7C48E74237F59FE@NewLife> Message-ID: I dislike my patch, because it doesn't get to the bottom of why data members associated with numbers of conserved sites return from eval's undefined; it seems clear from the code that this is unexpected. Please excuse my naivete--why would this happen only in the blasttable format, and why hasn't this thing clucked before? ----- Original Message ----- From: Jason Stajich To: Mark A. Jensen Cc: Chris Fields ; BioPerl List ; Dan Bolser Sent: Tuesday, May 12, 2009 7:07 PM Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?) I really don't think tile_hsps should be used on BLAST data folks, it is a pretty blind approach. If you really want the right answer you need to do -links with WU-BLAST or FASTA. Been discussed a few times on the mailing list. Good to fix the code bug I guess to avoid the warnings, but unless you are going to walk through all the HSPs and extract the consistent paths wrt query I think you'll have loops, etc in there which will make Hit->percent_id non-accurate. -jason On May 12, 2009, at 1:06 PM, Mark A. Jensen wrote: Patch below (to SearchUtils.pm) fixes the non-numeric warnings on Dan's data, but something deeper may be going on. 
Also get many of the following warnings, haven't looked at it closely: --------------------- WARNING --------------------- MSG: Removing score value(s) --------------------------------------------------- PATCH: Index: SearchUtils.pm =================================================================== --- SearchUtils.pm (revision 15674) +++ SearchUtils.pm (working copy) @@ -252,8 +252,8 @@ } $qctg_dat{ "$frame$strand" }->{'length_aln_query'} += $_->{'stop'} - $_->{'start'} + 1; - $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_->{'iden'}; - $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_->{'cons'}; + $qctg_dat{ "$frame$strand" }->{'totalIdentical'} += $_->{'iden'} || 0; + $qctg_dat{ "$frame$strand" }->{'totalConserved'} += $_->{'cons'} || 0; $qctg_dat{ "$frame$strand" }->{'qstrand'} = $strand; } @@ -407,9 +407,12 @@ }; if($@) { warn "\a\n$@\n"; } else { + # make sure numerical + $_->{'iden'} ||= 0; + $_->{'cons'} ||= 0; $_->{'start'} = $start; # Assign a new start coordinate to the contig - $_->{'iden'} += $numID; # and add new data to #identical, #conserved. - $_->{'cons'} += $numCons; + $_->{'iden'} += ($numID||0); # and add new data to #identical, #conserved. + $_->{'cons'} += ($numCons||0); push(@{$_->{hsps}}, $hsp); $overlap = 1; } @@ -424,9 +427,13 @@ }; if($@) { warn "\a\n$@\n"; } else { + # make sure numerical + $_->{'iden'} ||= 0; + $_->{'cons'} ||= 0; + $_->{'stop'} = $stop; # Assign a new stop coordinate to the contig - $_->{'iden'} += $numID; # and add new data to #identical, #conserved. - $_->{'cons'} += $numCons; + $_->{'iden'} += ($numID||0); # and add new data to #identical, #conserved. 
+ $_->{'cons'} += ($numCons||0); push(@{$_->{hsps}}, $hsp); $overlap = 1; } @@ -461,8 +468,8 @@ }; if($@) { warn "\a\n$@\n"; } else { - $ids += $these_ids; - $cons += $these_cons; + $ids += ($these_ids||0); + $cons += ($these_cons||0); } last if $hsp_start == $u_start; @@ -490,8 +497,8 @@ }; if($@) { warn "\a\n$@\n"; } else { - $ids += $these_ids; - $cons += $these_cons; + $ids += ($these_ids||0); + $cons += ($these_cons||0); } last if $hsp_end == $u_stop; ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" ; "Dan Bolser" Sent: Tuesday, May 12, 2009 10:04 AM Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features'fromSearchIO?) More complicated than that, I'm afraid. We should try to fix that at the source of the problem. This appears to stem from SearchUtils HSP tiling, which in turn utilizes HSPI::matches(), which in turn checks num_identical/ num_conserved. My guess is, since this is blasttable format, one of these isn't set and thus is returning the wrong value. I'll attempt to track it down today, but it may take some time. chris On May 12, 2009, at 8:29 AM, Mark A. Jensen wrote: This sounds like a $sum = eval join( '+', @a); problem, which can be fixed with $sum = eval join('+', map { $_ || () } @a) ; MAJ ----- Original Message ----- From: "Dan Bolser" To: "Chris Fields" Cc: "BioPerl List" Sent: Tuesday, May 12, 2009 9:17 AM Subject: Re: [Bioperl-l] SearchIO to GFF (was: Getting 'features' fromSearchIO?) 2009/5/12 Chris Fields : Fixed that in svn. We're all still learning the ropes... In that case, I'm seeing multiple instances of... 
Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 256 Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 412 Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 429 Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 465 Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 473 Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 494 Argument "" isn't numeric in addition (+) at Bio/Search/ SearchUtils.pm line 502 Hmm... I was about to go on to complain about the weird GFF that I was seeing, but suddenly it looks OK. My bioperl install must think you're standing over my shoulder and is therefore behaving itself! Thanks again for all the help, Dan. chris On May 12, 2009, at 5:55 AM, Dan Bolser wrote: 2009/5/12 Dan Bolser : Unfortunately bp_search2gff.pl is giving me errors: bp_search2gff.pl --version 3 -i BlastResults/blast_table_filtered -f blasttable -o BlastResults/blast_table_filtered.gff -t hit --match --target --component --------------------- WARNING --------------------- MSG: Removing score value(s) --------------------------------------------------- Can't locate object method "remove_tags" via package "Bio::SeqFeature::Similarity" at /local/Scratch/dbolser/perl5/lib/perl5/Bio/SeqFeature/Generic.pm line 393, line 5. I'm just learning the ropes... --- ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm~ 2009-05-11 15:25:55.000000000 +0100 +++ ~/perl5/lib/perl5/Bio/SeqFeature/Generic.pm 2009-05-12 11:52:41.000000000 +0100 @@ -390,7 +390,7 @@ } if ($self->has_tag('score')) { $self->warn("Removing score value(s)"); - $self->remove_tags('score'); + $self->remov