From p.j.a.cock at googlemail.com  Wed Jul  1 03:44:12 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 1 Jul 2009 08:44:12 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
Message-ID: <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>

Hi all (BioPerl and Biopython),

This is a continuation of a long thread on the BioPerl mailing
list, which I have now CC'd to the Biopython mailing list. See:
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030265.html

On this thread we have been discussing next gen sequencing
tools and coordinating things like consistent file format
naming between Biopython, BioPerl and EMBOSS. I've been
chatting to Peter Rice (EMBOSS) while at BOSC/ISMB 2009,
and he will look into setting up a cross project mailing list for
this kind of discussion in future.

In the meantime, my replies to Giles below cover both BioPerl
and Biopython (and EMBOSS). Giles' original email is here:
http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030398.html

Peter

On 6/30/09, Giles Weaver wrote:
>
> I'm developing a transcriptomics database for use with next-gen data, and
> have found processing the raw data to be a big hurdle.
>
> I'm a bit late in responding to this thread, so most issues have already
> been discussed. One thing that hasn't been mentioned is removal of adapters
> from raw Illumina sequence. This is a PITA, and I'm not aware of any well
> developed and documented open source software for removal of adapters
> (and poor quality sequence) from Illumina reads.
>
> My current Illumina sequence processing pipeline is an unholy mix of
> biopython, bioperl, pure perl, emboss and bowtie.
> Biopython for converting
> the Illumina fastq to Sanger fastq, bioperl to read the quality values,
> pure perl to trim the poor quality sequence from each read, and bioperl
> with emboss to remove the adapter sequence. I'm aware that the pipeline
> contains bugs and would like to simplify it, but at least it does work...
>
> Ideally I'd like to replace as much of the pipeline as possible with
> bioperl/bioperl-run, but this isn't currently possible due to both a lack
> of features and poor performance. I'm sure the features will come with
> time, but the performance is more of a concern to me. ..

I gather you would rather work with (Bio)Perl, but since you are
already using Biopython to do the FASTQ conversion, you could
also use it for more of your pipeline. Our tutorial includes examples
of simple FASTQ quality filtering, and trimming of primer sequences
(something like this might be helpful for removing adaptors). See:
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf

Alternatively, with the new release of EMBOSS this July, you will
also be able to do the Illumina FASTQ to Sanger standard FASTQ
conversion with EMBOSS, and I'm sure BioPerl will offer this soon too.

> Regarding trimming bad quality bases (see comments from
> Tristan Lefebure) from Solexa/Illumina reads, I did find a mixed
> pure/bioperl solution to be much faster than a primarily bioperl
> based implementation. I found Bio::Seq->subseq(a,b) and
> Bio::Seq->subqual(a,b) to be far too slow. My current code trims
> ~1300 sequences/second, including unzipping the raw data and
> converting it to sanger fastq with biopython. Processing an entire
> sequencing run with the whole pipeline takes in the region of 6-12h.

There are several ways of doing quality trimming, and it would
make an excellent cookbook example (both for BioPerl and
Biopython).

Could you go into a bit more detail about your trimming
algorithm? e.g.
Do you just trim any bases on the right below
a certain threshold, perhaps with a minimum length to retain
the trimmed read afterwards?

> Hope this looooong post was of interest to someone!

I was interested at least ;)

Peter

From cjfields at illinois.edu  Wed Jul  1 08:35:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 1 Jul 2009 07:35:14 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
	<320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
Message-ID: <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>

Peter,

I just committed a fix to FASTQ parsing last night to support read/write
for Sanger/Solexa/Illumina following the biopython convention; the only
thing needed is more extensive testing for the quality scores. There are
a few other oddities with it I intend to address soon, but it appears to
be working.

The Seq instance iterator actually calls a raw data iterator (hash refs
of named arguments to the class constructor). That should act as a
decent filtering step if needed.

We have automated EMBOSS wrapping but I'm not sure how intuitive it is;
we can probably reconfigure some of that.

chris

On Jul 1, 2009, at 2:44 AM, Peter Cock wrote:

> Hi all (BioPerl and Biopython),
>
> This is a continuation of a long thread on the BioPerl mailing
> list, which I have now CC'd to the Biopython mailing list. See:
> http://lists.open-bio.org/pipermail/bioperl-l/2009-June/030265.html
>
> On this thread we have been discussing next gen sequencing
> tools and coordinating things like consistent file format
> naming between Biopython, BioPerl and EMBOSS.
> [...]
> > I was interested at least ;) > > Peter > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Jonathan_Epstein at nih.gov Wed Jul 1 09:20:50 2009 From: Jonathan_Epstein at nih.gov (Jonathan Epstein) Date: Wed, 01 Jul 2009 09:20:50 -0400 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> Message-ID: <4A4B62B2.3090502@nih.gov> I too am interested in these topics. In particular, I would like to learn more about "sequencing adapter removal," i.e. what these adapters look like, and what strategies you've employed for finding and removing them. Jonathan Giles Weaver wrote: > I'm developing a transcriptomics database for use with next-gen data, and > have found processing the raw data to be a big hurdle. > > I'm a bit late in responding to this thread, so most issues have already > been discussed. One thing that hasn't been mentioned is removal of adapters > from raw Illumina sequence. This is a PITA, and I'm not aware of any well > developed and documented open source software for removal of adapters (and > poor quality sequence) from Illumina reads. > > My current Illumina sequence processing pipeline is an unholy mix of > biopython, bioperl, pure perl, emboss and bowtie. Biopython for converting > the Illumina fastq to Sanger fastq, bioperl to read the quality values, pure > perl to trim the poor quality sequence from each read, and bioperl with > emboss to remove the adapter sequence. I'm aware that the pipeline contains > bugs and would like to simplify it, but at least it does work... 
> > Ideally I'd like to replace as much of the pipeline as possible with > bioperl/bioperl-run, but this isn't currently possible due to both a lack of > features and poor performance. I'm sure the features will come with time, > but the performance is more of a concern to me. I wonder if Bio::Moose might > be used to alleviate some of the performance issues? Might next-gen modules > be an ideal guinea pig for Bio::Moose? > > For my purposes the tools that would love to see supported in > bioperl/bioperl-run are: > > - next-gen sequence quality parsing (to output phred scores) > - sequence quality based trimming > - sequencing adapter removal > - filtering based on sequence complexity (repeats, entropy etc) > - bioperl-run modules for bowtie etc. > > Obviously all of these need to be fast! > I'd love to muck in, but I doubt I'll contribute much before > Bio::Moose/bioperl6, as the (bio)perl object system gives me nightmares! > > Regarding trimming bad quality bases (see comments from Tristan Lefebure) > from Solexa/Illumina reads, I did find a mixed pure/bioperl solution to be > much faster than a primarily bioperl based implementation. I found > Bio::Seq->subseq(a,b) and Bio::Seq->subqual(a,b) to be far too slow. My > current code trims ~1300 sequences/second, including unzipping the raw data > and converting it to sanger fastq with biopython. Processing an entire > sequencing run with the whole pipeline takes in the region of 6-12h. > > Hope this looooong post was of interest to someone! > > Giles > > 2009/6/17 Tristan Lefebure > > >> Hello, >> Regarding next-gen sequences and bioperl, following my >> experience, another issue is bioperl speed. For example, if >> you want to trim bad quality bases at ends of 1E6 Solexa >> reads using Bio::SeqIO::fastq and some methods in >> Bio::Seq::Quality, well, you've got to be patient (but may >> be I missed some shortcuts...). >> >> A pure perl solution will be between 100 to 1000x faster... 
>> Would it be possible to have an ultra-light quality object >> with few simple methods for next-gen reads? >> >> I can contribute some tests if that sounds like an important >> point. >> >> -Tristan >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed Jul 1 09:42:23 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 1 Jul 2009 09:42:23 -0400 Subject: [Bioperl-l] Random nucleotide string generator? In-Reply-To: References: Message-ID: <0CBBA963E35D4D218D9512B4C4671BE1@NewLife> You guys earned your scrap: http://www.bioperl.org/wiki/Random_sequence_generation cheers and thanks! MAJ ----- Original Message ----- From: "Roger Hall" To: Sent: Friday, June 26, 2009 2:28 AM Subject: [Bioperl-l] Random nucleotide string generator? > All, > > Is there a random generator for creating nucleotides (of length l with > composition frequencies a, c, g, and t) in there somewhere? > > I noticed a thread about it from 2000 and nothing since (searching for "random > sequence"). > > If not - what should the namespace be for such a module should it be undone > and desirable? > > TIA! > > Roger > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From ocarnorsk138 at gmail.com Wed Jul 1 10:30:47 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Wed, 1 Jul 2009 10:30:47 -0400 Subject: [Bioperl-l] Random nucleotide string generator? In-Reply-To: <0CBBA963E35D4D218D9512B4C4671BE1@NewLife> References: <0CBBA963E35D4D218D9512B4C4671BE1@NewLife> Message-ID: Thanks for the add to the wiki Mark. Cheers. O'car Campos C. Bioinformatics Engineering Student. University of Talca. 2009/7/1 Mark A. Jensen > You guys earned your scrap: > http://www.bioperl.org/wiki/Random_sequence_generation > > cheers and thanks! 
> MAJ
> ----- Original Message ----- From: "Roger Hall"
> To:
> Sent: Friday, June 26, 2009 2:28 AM
> Subject: [Bioperl-l] Random nucleotide string generator?
>
>> All,
>>
>> Is there a random generator for creating nucleotides (of length l with
>> composition frequencies a, c, g, and t) in there somewhere?
>>
>> I noticed a thread about it from 2000 and nothing since (searching for
>> "random sequence").
>>
>> If not - what should the namespace be for such a module should it be
>> undone and desirable?
>>
>> TIA!
>>
>> Roger

From giles.weaver at googlemail.com  Wed Jul  1 12:27:22 2009
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Wed, 1 Jul 2009 17:27:22 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
	<320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
	<30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>
Message-ID: <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>

Peter, the trimming algorithm I use employs a sliding window, as follows:

   - For each sequence position calculate the mean phred quality score for a
     window around that position.
   - Record whether the mean score is above or below a threshold as an array
     of zeros and ones.
   - Use a regular expression on the joined array to find the start and end
     of the good quality sequence(s).
   - Extract the quality sequence(s) and replace any bases below the quality
     threshold with N.
   - Trim any Ns from the ends.

A refinement would be to weight the scores from positions in the window, but
this could give a performance hit, and the method seems to work well enough
as is.

Chris, thanks for committing the fix, I'll give bioperl illumina fastq
parsing a workout soon. Peter, as much as I'd love to help out with
biopython, I'm under too much time pressure right now!

Jonathan, some of the Illumina sequencing adapters are listed at
http://intron.ccam.uchc.edu/groups/tgcore/wiki/013c0/Solexa_Library_Primer_Sequences.html
and http://seqanswers.com/forums/showthread.php?t=198
Adapter sequence typically appears towards the end of the read, though the
latter part of it is often misread as the sequencing quality drops off.
I abuse needle (EMBOSS) into aligning the adapter sequence with each read. I
then use Bio::AlignIO, Bio::Range and a custom scoring scheme to identify
real alignments and trim the sequence. This is not the ideal way of doing
things, but it's fast enough, and does seem to work. The adapter sequence
shouldn't be gapped, so I'm sure there is a lot of scope for optimising the
adapter removal.

I'll happily share some code once I've got it to the stage where I'm not
embarrassed by it!

Giles

2009/7/1 Chris Fields

> Peter,
>
> I just committed a fix to FASTQ parsing last night to support read/write
> for Sanger/Solexa/Illumina following the biopython convention; the only
> thing needed is more extensive testing for the quality scores. There are a
> few other oddities with it I intend to address soon, but it appears to be
> working.
>
> The Seq instance iterator actually calls a raw data iterator (hash refs of
> named arguments to the class constructor). That should act as a decent
> filtering step if needed.
>
> We have automated EMBOSS wrapping but I'm not sure how intuitive it is; we
> can probably reconfigure some of that.
> [...]
>>> >> >> I was interested at least ;) >> >> Peter >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > From cjfields at illinois.edu Wed Jul 1 12:46:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 1 Jul 2009 11:46:49 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com> <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu> <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com> Message-ID: <6CAF4023-7D04-4B56-839F-E587A00DEEEA@illinois.edu> On Jul 1, 2009, at 11:27 AM, Giles Weaver wrote: ... > Peter, the trimming algorithm I use employs a sliding window, as > follows: > > - For each sequence position calculate the mean phred quality > score for a > window around that position. > - Record whether the mean score is above or below a threshold as > an array > of zeros and ones. > - Use a regular expression on the joined array to find the start > and end > of the good quality sequence(s). > - Extract the quality sequence(s) and replace any bases below the > quality > threshold with N. > - Trim any Ns from the ends. > > A refinement would be to weight the scores from positions in the > window, but > this could give a performance hit, and the method seems to work well > enough > as is. > > Chris, thanks for committing the fix, I'll give bioperl illumina fastq > parsing a workout soon. Peter, as much as I'd love to help out with > biopython, I'm under too much time pressure right now! Just let me know if the qual values match up with what is expected. 
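
For reference, the sliding-window trimming Giles describes in the quoted
list above could be sketched in pure Perl roughly as follows. This is only
an illustration under stated assumptions, not Giles' code: window_trim and
its options (half_window, threshold, min_length) are made-up names, the
default values are arbitrary, and returning every good region as a separate
sub-read is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl. $seq is the base string and
# $quals an array ref of phred scores, one per base.
sub window_trim {
    my ($seq, $quals, %opt) = @_;
    my $half      = $opt{half_window} // 2;   # window is 2*half+1 wide (assumed default)
    my $threshold = $opt{threshold}   // 20;  # assumed phred cutoff
    my $min_len   = $opt{min_length}  // 1;   # shortest region worth keeping

    my $n = length $seq;
    return () unless $n && $n == @$quals;

    # Steps 1-2: mean quality of the window around each position,
    # recorded as 1 (good) or 0 (bad). Windows are clipped at the ends.
    my @flags;
    for my $i (0 .. $n - 1) {
        my $lo = $i - $half < 0      ? 0      : $i - $half;
        my $hi = $i + $half > $n - 1 ? $n - 1 : $i + $half;
        my $sum = 0;
        $sum += $quals->[$_] for $lo .. $hi;
        push @flags, ($sum / ($hi - $lo + 1) >= $threshold) ? 1 : 0;
    }

    # Step 3: a regex over the joined flags finds each run of good windows.
    my $mask = join '', @flags;
    my @kept;
    while ($mask =~ /(1+)/g) {
        my ($start, $len) = ($-[1], $+[1] - $-[1]);
        my @bases = split //, substr($seq, $start, $len);

        # Step 4: mask individual bases that fall below the threshold.
        for my $j (0 .. $#bases) {
            $bases[$j] = 'N' if $quals->[$start + $j] < $threshold;
        }

        # Step 5: strip Ns from both ends of the region.
        (my $region = join '', @bases) =~ s/^N+|N+$//g;
        push @kept, $region if length($region) >= $min_len;
    }
    return @kept;
}

my @q = (30, 30, 30, 30, 10, 30, 30, 30, 5, 5, 5, 5);
print join(',', window_trim('ACGTACGTACGT', \@q)), "\n";   # prints ACGTNCGT
```

Working on a plain string and an array of numbers avoids creating an object
per read, which is presumably where the pure Perl speed-up reported in this
thread comes from.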
You can also iterate through the data with hashrefs using next_dataset
(faster than objects). This is from the fastq tests in core:

-----------------------------------------
$in_qual = Bio::SeqIO->new(-file    => test_input_file('fastq','test3_illumina.fastq'),
                           -variant => 'illumina',
                           -format  => 'fastq');

$qual = $in_qual->next_dataset();

isa_ok($qual, 'HASH');
is($qual->{-seq}, 'GTTAGCTCCCACCTTAAGATGTTTA');
is($qual->{-raw_quality}, 'SXXTXXXXXXXXXTTSUXSSXKTMQ');
is($qual->{-id}, 'FC12044_91407_8_200_406_24');
is($qual->{-desc}, '');
is($qual->{-descriptor}, 'FC12044_91407_8_200_406_24');
is(join(',',@{$qual->{-qual}}[0..10]), '19,24,24,20,24,24,24,24,24,24,24');
-----------------------------------------

So one could check those values directly and then filter them through as
needed directly into Bio::Seq::Quality if necessary (note some of the
key values are constructor args):

my $qualobj = Bio::Seq::Quality->new(%$qual);

chris

From gmodhelp at googlemail.com  Wed Jul  1 13:38:14 2009
From: gmodhelp at googlemail.com (Dave Clements, GMOD Help Desk)
Date: Wed, 1 Jul 2009 10:38:14 -0700
Subject: [Bioperl-l] August 2009 GMOD Meeting
In-Reply-To: <71ee57c70907011037o574666f9k8af120c04b2ea54c@mail.gmail.com>
References: <71ee57c70907011032k25daa9cche0f4778e1c2c0093@mail.gmail.com>
	<71ee57c70907011036w49b9c144qbe04fcd8d8d1d7d0@mail.gmail.com>
	<71ee57c70907011037o574666f9k8af120c04b2ea54c@mail.gmail.com>
Message-ID: <71ee57c70907011038u7bf75f00x7e486cb1b8a00e35@mail.gmail.com>

Hello all,

The next GMOD meeting will be held 6-7 August, at the University of
Oxford, in Oxford, United Kingdom. Registration is now open. Space is
available on a first come, first served basis and there is room for 55
attendees. The meeting cost is £50. See
http://gmod.org/wiki/August_2009_GMOD_Meeting to register.

As with previous GMOD meetings, this meeting will have a mixture of
project, component, and user talks.
The agenda is driven by attendee
suggestions, and you are encouraged to add your suggestions now (see
http://gmod.org/wiki/August_2009_GMOD_Meeting#Agenda_Suggestions).

For examples of what happens at a GMOD meeting, see the writeups of the
January 2009, July 2008, or any other previous meeting (see
http://gmod.org/wiki/Meetings).

GMOD meetings are an excellent way to meet other GMOD developers and
users and to learn (and affect) what's coming in the project.

Please join us in Oxford this August,

Dave Clements
GMOD Help Desk

Note: Unless you have applied to and been admitted to the Summer School,
don't you dare register for it. The registration web site will let you
do this, but bureaucratic hellishness will ensue.

--
* Learn more about GMOD at:
  ISMB/ECCB: http://www.iscb.org/ismbeccb2009/ (BioMart, Chado, Galaxy, InterMine)
* Please keep responses on the list!
* Was this helpful? Let us know at http://gmod.org/wiki/Help_Desk_Feedback

From p.j.a.cock at googlemail.com  Thu Jul  2 03:20:07 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 2 Jul 2009 08:20:07 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
	<320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
	<30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>
	<1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>
Message-ID: <320fb6e00907020020o2fa686d2yab6f185785ad8a08@mail.gmail.com>

On 7/1/09, Giles Weaver wrote:
> Peter, the trimming algorithm I use employs a sliding window, as follows:
>
> - For each sequence position calculate the mean phred quality score for a
>   window around that position.
> - Record whether the mean score is above or below a threshold as an array
>   of zeros and ones.
> - Use a regular expression on the joined array to find the start and end
>   of the good quality sequence(s).
> - Extract the quality sequence(s) and replace any bases below the quality
>   threshold with N.
> - Trim any Ns from the ends.
>
> A refinement would be to weight the scores from positions in the window, but
> this could give a performance hit, and the method seems to work well enough
> as is.

Thanks for the details - that is a bit more complex than what I had
been thinking. Do you have any favoured window size and quality
threshold, or does this really depend on the data itself?

Also, if you find a sequence read that goes "good - poor - good"
for example, do you extract the two good regions as two sub
reads (presumably with a minimum length)? This may be silly
for Illumina where the reads are very short, but might make
sense for Roche 454.

> Chris, thanks for committing the fix, I'll give bioperl illumina fastq
> parsing a workout soon. Peter, as much as I'd love to help out with
> biopython, I'm under too much time pressure right now!

Even use cases are useful - so thank you.

> Jonathan, some of the Illumina sequencing adapters are listed at
> http://intron.ccam.uchc.edu/groups/tgcore/wiki/013c0/Solexa_Library_Primer_Sequences.html
> and http://seqanswers.com/forums/showthread.php?t=198
> Adapter sequence typically appears towards the end of the read, though the
> latter part of it is often misread as the sequencing quality drops off.
> I abuse needle (EMBOSS) into aligning the adapter sequence with each read. I
> then use Bio::AlignIO, Bio::Range and a custom scoring scheme to identify
> real alignments and trim the sequence. This is not the ideal way of doing
> things, but it's fast enough, and does seem to work. The adapter sequence
> shouldn't be gapped, so I'm sure there is a lot of scope for optimising the
> adapter removal.
>
> I'll happily share some code once I've got it to the stage where I'm not
> embarrassed by it!
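
Since the adapter should not be gapped, one possible simplification of the
adapter removal quoted above is a plain ungapped comparison at each read
offset, tolerating some mismatches. The following is a rough pure Perl
sketch under assumptions, not the needle-based approach Giles uses:
trim_adapter, min_overlap and max_mismatch are hypothetical names with
arbitrary defaults, and the adapter string in the example is an arbitrary
test sequence rather than a verified Illumina adapter.

```perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl or EMBOSS: find the leftmost offset at
# which the start of $adapter matches the rest of the read without gaps,
# and return the read cut at that offset.
sub trim_adapter {
    my ($read, $adapter, %opt) = @_;
    my $min_overlap  = $opt{min_overlap}  // 5;    # assumed; shorter hits are ignored
    my $max_mismatch = $opt{max_mismatch} // 0.2;  # assumed tolerated mismatch fraction

    my $rlen = length $read;
    for my $offset (0 .. $rlen - $min_overlap) {
        # The adapter may run off the end of the read, so compare only
        # the part that overlaps.
        my $overlap = $rlen - $offset;
        $overlap = length $adapter if $overlap > length $adapter;
        last if $overlap < $min_overlap;

        # XOR of two equal-length strings gives "\0" wherever they agree.
        my $diff = substr($read, $offset, $overlap) ^ substr($adapter, 0, $overlap);
        my $mismatches = $overlap - ($diff =~ tr/\0//);

        return substr($read, 0, $offset)
            if $mismatches <= $max_mismatch * $overlap;
    }
    return $read;   # no convincing adapter match, leave the read alone
}

# Eleven bases of insert followed by nine adapter bases, one of them misread.
print trim_adapter('TTGACCATGCAGATCGGTAG', 'GATCGGAAGAGC'), "\n";   # prints TTGACCATGCA
```

A fixed mismatch fraction is the crudest possible scoring scheme; weighting
mismatches by base quality would fit the observation that the tail of the
adapter is often misread.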
>
> Giles

Cheers,

Peter

From florian.mittag at uni-tuebingen.de  Thu Jul  2 05:28:21 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 2 Jul 2009 11:28:21 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
Message-ID: <200907021128.21239.florian.mittag@uni-tuebingen.de>

Hi!

I previously posted a message on the BioSQL mailing list regarding a BioSQL
schema for DB2 and we are several steps closer to completion now.

We were able to adapt the "load_ncbi_taxonomy.pl" script from BioSQL to fill
our DB2 database with taxonomy data, but loading the gene ontology with
BioPerl's "load_ontology.pl" is somewhat harder.

We created the package Bio::DB::BioSQL::DB2 and copy-pasted the contents of
the Oracle package into it. Then we changed (what we thought were) the
appropriate methods whenever we encountered an error, but now we are a bit
frustrated.

We execute the command:
perl load_ontology.pl --driver DB2 --dbname bioseqdb --dbuser user
     --dbpass passwd --namespace "Gene Ontology"
     --format obo --debug gene_ontology.1_2.obo

It first ran a few minutes processing the file and then died after the
following SQL command was prepared and executed:

"SELECT term.term_id, term.identifier, term.name, term.definition,
term.is_obsolete, NULL, term.ontology_id FROM term WHERE identifier = ?"

I don't know if the "NULL" column is supposed to be there, but DB2 doesn't
like it. After hours of digging into the code, I gave up and simply commented
out the line that added the NULL column in
Bio::DB::BioSQL::BaseDriver::_build_select_list

...
if((! $attr) || (! $entitymap->{$tbl}) ||
   $dont_select_attrs->{$tbl .".". $attr}) {
#        push(@attrs, "NULL");
} else {
...

The script completed with a few warnings, like:
"no adaptor found for class Bio::Annotation::TypeManager"
or
"-------------------- WARNING ---------------------
MSG: PMID:15012271 exists in the dblink of _default"

so we don't know if it really worked.
Since removing this one line will probably break compatibility with other
databases, it is not a real solution and we would appreciate any hints
pointing us to the real cause.

We would really like to contribute to the BioPerl project by adding DB2
support, but we need some help here, since none of us has experience with
either Perl or BioPerl ;-)

Keep up the good work!

Regards,
Florian

--
Dipl. Inf. Florian Mittag
Universität Tuebingen
WSI-RA, Sand 1
72076 Tuebingen, Germany
Phone: +49 7071 / 29 78985 Fax: +49 7071 / 29 5091

From jonathancrabtree at gmail.com Thu Jul 2 09:23:54 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Thu, 2 Jul 2009 09:23:54 -0400
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907021128.21239.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de>
Message-ID: <8e5b8bf80907020623p4f13e218pf824ba7b4a55bb19@mail.gmail.com>

Hi Florian,

Just based on what's in your e-mail, it looks as though BioSQL *wants* a
NULL value to come back as the 6th column of every row in the result of that
query. So by removing it you run the risk that BioSQL is going to retrieve
the wrong values from the query result, at least for those columns after the
6th (and assuming all the columns are retrieved by position--it's not
entirely clear if this is the case.)

I'd be inclined to throw in a test here to see if the backend is DB2 and, if
so, substitute the appropriate syntax instead of "NULL". I'm not sure what
that syntax is, but a bit of web searching suggests that you might be able to
select the value from a dummy table (this might be more difficult because it
would require non-local code changes -- this method is only for the select
list) or use a function called "nullif" with appropriately-chosen arguments.
Another comment I saw suggested that using "NULL" was OK but it has to be
coerced/typecast into the right type.
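
The per-backend substitution suggested above could look roughly like the following minimal sketch. It is plain Perl and does not touch BioSQL: build_select_list() and %null_placeholder are hypothetical names, not the real Bio::DB::BioSQL::BaseDriver code, and the VARCHAR(1) cast type is only a guess, since the thread leaves the correct type open. The point it illustrates is that the placeholder keeps its slot, so a reader that fetches columns by position still finds ontology_id in the 7th column.

```perl
use strict;
use warnings;

# Hypothetical map from driver name to the SQL text used for an empty
# slot in the SELECT list. The DB2 entry assumes that an untyped NULL is
# rejected there and that a typed one is accepted; the type is a guess.
my %null_placeholder = (
    DB2 => 'CAST(NULL AS VARCHAR(1))',
);

# Build a SELECT list for one table. An undef attribute marks a slot
# that has no column; it is filled with a placeholder rather than being
# dropped, so the positions of the remaining columns do not shift.
sub build_select_list {
    my ($driver, $table, @attrs) = @_;
    my $null = $null_placeholder{$driver} // 'NULL';
    return join ', ', map { defined $_ ? "$table.$_" : $null } @attrs;
}

# The column layout from the failing query, rendered for two drivers.
my @slots = (qw(term_id identifier name definition is_obsolete),
             undef, 'ontology_id');
print build_select_list($_, 'term', @slots), "\n" for qw(mysql DB2);
```

Keeping the choice of placeholder in one table means other backends keep the plain NULL they already accept.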

Jonathan

From florian.mittag at uni-tuebingen.de Thu Jul 2 10:52:27 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Thu, 2 Jul 2009 16:52:27 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <8e5b8bf80907020623p4f13e218pf824ba7b4a55bb19@mail.gmail.com>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de>
	<8e5b8bf80907020623p4f13e218pf824ba7b4a55bb19@mail.gmail.com>
Message-ID: <200907021652.27472.florian.mittag@uni-tuebingen.de>

Hi Jonathan,

thanks for your quick answer.

On Thursday 02 July 2009 15:23, Jonathan Crabtree wrote:
> Just based on what's in your e-mail, it looks as though BioSQL *wants* a
> NULL value to come back as the 6th column of every row in the result of
> that query. So by removing it you run the risk that BioSQL is going to
> retrieve the wrong values from the query result, at least for those columns
> after the 6th (and assuming all the columns are retrieved by position--it's
> not entirely clear if this is the case.)

Well, what made me suspicious was that the returned columns were exactly the
ones from the term table plus the NULL column. One way to verify this would
be to look whether the same thing happens with other tables as well.

> I'd be inclined to throw in a
> test here to see if the backend is DB2 and, if so, substitute the
> appropriate syntax instead of "NULL". I'm not sure what that syntax is,
> but a bit of web searching suggests that you might be able to select the
> value from a dummy table (this might be more difficult because it would
> require non-local code changes -- this method is only for the select list)
> or use a function called "nullif" with appropriately-chosen arguments.
> Another comment I saw suggested that using "NULL" was OK but it has to be
> coerced/typecast into the right type.

Yeah, this was what I've found, too, but I couldn't figure out what was the
right type to cast to.

Unfortunately, now that the database is filled (hopefully correctly), the
script gives me a different error message and I don't know if it is because
of a change I made or because the database is not empty.

Originally, I was struggling with Hibernate and I'm back to it again (damn
CLOBs...), so I am happy to have a seemingly correct database to work with.

I'm pretty confident that I can write a working DB2 driver for BioPerl, but
for that I should start from scratch instead of copying the MySQL one and
modifying it until all error messages disappear. And this would take far too
much time if I'm doing this by trial and error. Is there any developer's
guide that would help to find out what methods I have to override to
implement database-specific stuff?

Thanks,
Florian

From jonathancrabtree at gmail.com Thu Jul 2 11:39:32 2009
From: jonathancrabtree at gmail.com (Jonathan Crabtree)
Date: Thu, 2 Jul 2009 11:39:32 -0400
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907021652.27472.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de>
	<8e5b8bf80907020623p4f13e218pf824ba7b4a55bb19@mail.gmail.com>
	<200907021652.27472.florian.mittag@uni-tuebingen.de>
Message-ID: <8e5b8bf80907020839y15e97aedldd0ebadd1fddae69@mail.gmail.com>

Hi Florian,

On Thu, Jul 2, 2009 at 10:52 AM, Florian Mittag wrote:
> Well, what made me suspicious was that the returned columns were exactly the
> ones from the term table plus the NULL column. One way to verify this would
> be to look whether the same thing happens with other tables as well.

Others may disagree, but I think it's fairly clear (from just looking at the
subroutine you mentioned) that the inclusion of the NULL value is most
definitely deliberate; note that it is only done if $entitymap doesn't have a
value for the table in question, and $entitymap is described as follows:

    A reference to a hash table mapping entity names to aliases (if
    omitted, aliases will not be used, and SELECT columns can only be
    from one table)

So I suspect what we're seeing here is a select in which aliases _aren't_
being used and therefore the order of the returned values is significant, and
the NULL value is needed to keep everything in the right order for whatever
piece of code is reading the result. But never having worked much with BioSQL
I don't know where you'd go to find the type information needed to determine
what type the NULL value needs to be coerced into...

Jonathan

From gummyduk at gmail.com Thu Jul 2 14:50:29 2009
From: gummyduk at gmail.com (John Tyree)
Date: Thu, 2 Jul 2009 14:50:29 -0400
Subject: [Bioperl-l] Bio::DB::GenBank batch mode usage
In-Reply-To: <459dd5330907011236y31fea4fey8dc20e5274e94d1a@mail.gmail.com>
References: <459dd5330907011236y31fea4fey8dc20e5274e94d1a@mail.gmail.com>
Message-ID: <459dd5330907021150xaf9caabvd160cbd781cf904e@mail.gmail.com>

I'm trying to use Bio::DB::GenBank to download a large number of files by
accession number. The docs say not to do this in normal mode to reduce server
load. There is some kind of helper function associated with this.

    %params = Bio::DB::GenBank->get_params('batch');

But I don't understand how to use it.
If you pass the hash using:

    Bio::DB::GenBank->new(%params);

it raises the following and dies:

--------------------- WARNING ---------------------
MSG: invalid retrieval type tool must be one of (pipeline,io_string,tempfile
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: seq_start() must be integer value if set
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib64/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::DB::NCBIHelper::seq_start /usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:416
STACK: Bio::DB::NCBIHelper::new /usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:117
STACK: Find_Patient_By_AccNo.pl:93

There is a deprecated method called get_Stream_by_batch() but how does one
achieve batch mode using the proper get_Stream_by_id() ?

Thanks,
John

From cjfields at illinois.edu Thu Jul 2 15:29:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 2 Jul 2009 14:29:29 -0500
Subject: [Bioperl-l] Bio::DB::GenBank batch mode usage
In-Reply-To: <459dd5330907021150xaf9caabvd160cbd781cf904e@mail.gmail.com>
References: <459dd5330907011236y31fea4fey8dc20e5274e94d1a@mail.gmail.com>
	<459dd5330907021150xaf9caabvd160cbd781cf904e@mail.gmail.com>
Message-ID: <49458034-D329-4953-883B-298355513D35@illinois.edu>

If you are just downloading the records to a file it might be better to
retrieve the raw records using EUtilities, provided you have either the
accession number or the GI. If downloading files via Bio::DB::GenBank, it
requires a preparse and write to file via Bio::SeqIO.

---------------------------
use Bio::DB::EUtilities;
use Bio::SeqIO;

my @ids = (); # your GI/acc here

my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'genbank',
    -id      => \@ids);

$factory->get_Response(-file => "records.gb");
---------------------------

If you have a long list of IDs you can use epost first, then efetch using the
search history.
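
A simpler, client-side complement to the epost/efetch route is to split a long ID list into fixed-size chunks and issue one request per chunk. The sketch below shows only that chunking step in plain Perl; it is not the server-side search history mechanism mentioned above. for_each_batch() is a hypothetical helper, the accession-like strings are made up, and the batch size is an arbitrary illustration. The callback is where a real fetch would go.

```perl
use strict;
use warnings;

# Split a list of IDs into chunks of at most $size and pass each chunk,
# with its 1-based batch number, to a caller-supplied callback.
# Returns the number of batches processed.
sub for_each_batch {
    my ($ids, $size, $callback) = @_;
    die "batch size must be positive\n" unless $size > 0;
    my @queue = @$ids;    # copy, so the caller's list is left untouched
    my $n = 0;
    while (my @batch = splice(@queue, 0, $size)) {
        $callback->(\@batch, ++$n);
    }
    return $n;
}

# Illustration with made-up IDs; a real callback would fetch the batch.
my @demo_ids = map { "ACC$_" } 1 .. 7;
for_each_batch(\@demo_ids, 3, sub {
    my ($batch, $number) = @_;
    print "batch $number: @$batch\n";
});
```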

This page has a few recipe scripts:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

chris

From wrp at virginia.edu Thu Jul 2 19:56:07 2009
From: wrp at virginia.edu (William Pearson)
Date: Thu, 2 Jul 2009 19:56:07 -0400
Subject: [Bioperl-l] Course Announcement: 2009 CSHL Computational and Comparative Genomics Deadline
Message-ID: <610369A4-28CA-476C-A641-654A63D1FBEE@virginia.edu>

Course announcement - Application deadline, July 15, 2009

Cold Spring Harbor
COMPUTATIONAL & COMPARATIVE GENOMICS
November 4 - 10, 2009
Application Deadline: July 15, 2009

INSTRUCTORS:
Pearson, William, Ph.D., University of Virginia, Charlottesville, VA
Smith, Randall, Ph.D., SmithKline Beecham Pharmaceuticals, King of Prussia, PA
Lisa Stubbs, Ph.D., University of Illinois, Urbana, IL

Beyond BLAST and FASTA - Alignment: from proteins to genomes -

This course presents a comprehensive overview of the theory and practice of
computational methods for extracting the maximum amount of information from
protein and DNA sequence similarity through sequence database searches,
statistical analysis, multiple sequence alignment, and genome scale
alignment. Additional topics include identifying signals in unaligned
sequences and integration of genetic and sequence information in biological
databases. This year, there will be a special focus on metagenomics and
functional prediction.

The course combines lectures with hands-on exercises; students are encouraged
to pose challenging sequence analysis problems using their own data. The
course makes extensive use of local WWW pages to present problem sets and the
computing tools to solve them. Students use Windows and Mac workstations
attached to a UNIX server.
The course is designed for biologists seeking advanced training in biological
sequence analysis, computational biology core resource directors and staff,
and for scientists in other disciplines, such as computer science, who wish
to survey current research problems in biological sequence analysis and
comparative genomics.

The primary focus of the Computational and Comparative Genomics Course is the
theory and practice of algorithms used in computational biology, with the
goal of using current methods more effectively and developing new algorithms.
Cold Spring Harbor also offers a "Programming for Biology" course, which
focuses more on software development.

For additional information and the lecture schedule and problem sets for the
2008 course, see:
http://fasta.bioch.virginia.edu/cshl/

To apply to the course, fill out and send in the form at:
http://meetings.cshl.edu/course/courseapp_instr.shtml

Bill Pearson
wrp at virginia.edu

From giles.weaver at googlemail.com Fri Jul 3 11:35:00 2009
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Fri, 3 Jul 2009 16:35:00 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00907020020o2fa686d2yab6f185785ad8a08@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
	<320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
	<30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>
	<1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>
	<320fb6e00907020020o2fa686d2yab6f185785ad8a08@mail.gmail.com>
Message-ID: <1d06cd5d0907030835w14407249l5b47db8893820816@mail.gmail.com>

Regarding the trimming algorithm, I've been using a window size of 5, a
minimum score of 20 and a minimum length of 15 with the Illumina data.
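
The sliding-window steps described in this thread, with the parameters just quoted (window 5, minimum score 20, minimum length 15), can be sketched in plain Perl roughly as follows. This is not the script discussed in the thread, which has not been posted; decode_quality() and trim_read() are hypothetical names, the window is simply truncated at the read ends (an assumption, since edge handling is not described), and the offset of 64 is chosen because it reproduces the 'illumina' test values quoted in the thread ('S' decodes to 19). Every good region is returned, not just the first.

```perl
use strict;
use warnings;

# Decode a FASTQ quality string into integer scores by subtracting the
# ASCII offset (64 here for the Illumina-style data, 33 for Sanger).
sub decode_quality {
    my ($raw, $offset) = @_;
    return map { ord($_) - $offset } split //, $raw;
}

# Sliding-window trim. Returns a list of [start, sequence] pairs, one
# per good region, where start is 0-based in the original read.
sub trim_read {
    my ($seq, $quals, %opt) = @_;
    my $window  = $opt{window}     // 5;
    my $min_q   = $opt{min_score}  // 20;
    my $min_len = $opt{min_length} // 15;
    my $half    = int($window / 2);
    my $last    = $#{$quals};

    # 1. One flag per position: '1' if the mean score of the window
    #    centred there reaches the threshold, '0' otherwise.
    my $flags = '';
    for my $i (0 .. $last) {
        my $lo = $i - $half < 0     ? 0     : $i - $half;
        my $hi = $i + $half > $last ? $last : $i + $half;
        my $sum = 0;
        $sum += $quals->[$_] for $lo .. $hi;
        $flags .= ($sum / ($hi - $lo + 1) >= $min_q) ? '1' : '0';
    }

    # 2. Each run of 1s in the joined flags is a candidate good region.
    my @regions;
    while ($flags =~ /(1+)/g) {
        my ($start, $len) = ($-[1], length $1);
        my @bases = split //, substr($seq, $start, $len);

        # 3. Mask individual bases below the threshold with N.
        for my $j (0 .. $#bases) {
            $bases[$j] = 'N' if $quals->[$start + $j] < $min_q;
        }
        my $sub = join '', @bases;

        # 4. Trim Ns from both ends, keeping the start coordinate right.
        if ($sub =~ s/^(N+)//) { $start += length $1 }
        $sub =~ s/N+$//;

        push @regions, [$start, $sub] if length($sub) >= $min_len;
    }
    return @regions;
}

# The read and quality string from the fastq test quoted in this thread.
my $read = 'GTTAGCTCCCACCTTAAGATGTTTA';
my @qual = decode_quality('SXXTXXXXXXXXXTTSUXSSXKTMQ', 64);
for my $region (trim_read($read, \@qual)) {
    printf "start=%d seq=%s\n", @$region;
}
```

Weighting the positions inside the window, the refinement mentioned earlier in the thread, would only change how the mean in step 1 is computed.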
In the past I have used a similar algorithm with a larger window size and
much longer minimum length with sequence from ABI 3XXX machines. I imagine
that the ideal parameters for ABI SOLiD and Roche 454 would likely be similar
to those for Illumina and Sanger sequencing respectively. Window size doesn't
appear to affect performance much, if at all.

For sequences with multiple good regions, I do extract all good regions. Even
with the Illumina data there are sometimes two good regions, but usually the
second is adapter or junk and gets filtered out later. I haven't seen quality
data from a 454 machine recently, and would be interested to know if multiple
good regions are commonplace in 454 data. Can anyone with access to 454 data
comment on this?

Giles

From giles.weaver at googlemail.com Fri Jul 3 11:35:20 2009
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Fri, 3 Jul 2009 16:35:20 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <6CAF4023-7D04-4B56-839F-E587A00DEEEA@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
	<320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com>
	<30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu>
	<1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com>
	<6CAF4023-7D04-4B56-839F-E587A00DEEEA@illinois.edu>
Message-ID: <1d06cd5d0907030835h351b96ccif14b192b2e0b132c@mail.gmail.com>

Chris, I've just tested your Illumina/Solexa fastq parsing code and am
pleased to report that I haven't encountered any issues thus far.

To give an idea of the processing overhead of object instantiation, fastq
parsing performance on a lowly 3GHz Core 2 Duo (using one core) is as
follows:

Illumina fastq with next_dataset: ~1 million sequences/minute
Solexa fastq with next_dataset: ~500000 sequences/minute
Illumina fastq with next_seq: ~215000 sequences/minute
Solexa fastq with next_seq: ~175000 sequences/minute

My quality trimming script does about 300000 sequences/minute with
next_dataset, up from ~130000 sequences/minute with next_seq, so it shaves
hours off the run time, thanks!

Giles

2009/7/1 Chris Fields
> On Jul 1, 2009, at 11:27 AM, Giles Weaver wrote:
>
> ...
>
>> Peter, the trimming algorithm I use employs a sliding window, as follows:
>>
>> - For each sequence position calculate the mean phred quality score for a
>> window around that position.
>> - Record whether the mean score is above or below a threshold as an array
>> of zeros and ones.
>> - Use a regular expression on the joined array to find the start and end
>> of the good quality sequence(s).
>> - Extract the quality sequence(s) and replace any bases below the quality
>> threshold with N.
>> - Trim any Ns from the ends.
>>
>> A refinement would be to weight the scores from positions in the window,
>> but this could give a performance hit, and the method seems to work well
>> enough as is.
>>
>> Chris, thanks for committing the fix, I'll give bioperl illumina fastq
>> parsing a workout soon. Peter, as much as I'd love to help out with
>> biopython, I'm under too much time pressure right now!
>
> Just let me know if the qual values match up with what is expected. You
> can also iterate through the data with hashrefs using next_dataset (faster
> than objects). This is from the fastq tests in core:
>
> -----------------------------------------
> $in_qual = Bio::SeqIO->new(-file => test_input_file('fastq','test3_illumina.fastq'),
>                            -variant => 'illumina',
>                            -format => 'fastq');
>
> $qual = $in_qual->next_dataset();
>
> isa_ok($qual, 'HASH');
> is($qual->{-seq}, 'GTTAGCTCCCACCTTAAGATGTTTA');
> is($qual->{-raw_quality}, 'SXXTXXXXXXXXXTTSUXSSXKTMQ');
> is($qual->{-id}, 'FC12044_91407_8_200_406_24');
> is($qual->{-desc}, '');
> is($qual->{-descriptor}, 'FC12044_91407_8_200_406_24');
> is(join(',',@{$qual->{-qual}}[0..10]), '19,24,24,20,24,24,24,24,24,24,24');
> -----------------------------------------
>
> So one could check those values directly and then filter them through as
> needed directly into Bio::Seq::Quality if necessary (note some of the key
> values are constructor args):
>
> my $qualobj = Bio::Seq::Quality->new(%$qual);
>
> chris

From Xianjun.Dong at bccs.uib.no Fri Jul 3 12:22:01 2009
From: Xianjun.Dong at bccs.uib.no (Xianjun Dong)
Date: Fri, 03 Jul 2009 18:22:01 +0200
Subject: [Bioperl-l] [Bio::Graphics::Panel] code reference cannot pass to -link, why?
Message-ID: <4A4E3029.4020109@ii.uib.no>

Hi,

I have a problem while using the -link option in Bio::Graphics (version 1.96).

As the POD of Bio::Graphics describes
(http://search.cpan.org/~lds/Bio-Graphics-1.96/lib/Bio/Graphics/Panel.pm#Creating_Imagemaps),
a link format like:

    -link => 'http://www.google.com/search?q=$description'

works well in my code, but a format like

    -link => sub {
        my ($feature,$panel) = @_;
        my $type = $feature->primary_tag;
        my $name = $feature->display_name;
        if ($type eq 'clone') {
            return "http://www.google.com/search?q=$name";
        } else {
            return "http://www.yahoo.com/search?p=$name";
        }
    }

does not output the image map as expected.

Here I attach a simple script as an example for anyone who is willing to
test it for me:

#!/usr/bin/perl
use strict;
use Bio::Graphics;
use Bio::Graphics::Feature;

my $ftr = 'Bio::Graphics::Feature';

# processed_transcript
my $trans1 = $ftr->new(-start=>50,-end=>10,-display_name=>'ZK154.1',-type=>'UTR');
my $trans2 = $ftr->new(-start=>100,-end=>50,-display_name=>'ZK154.2',-type=>'CDS');
my $trans3 = $ftr->new(-start=>350,-end=>225,-display_name=>'ZK154.3',-type=>'CDS',
                       -source=>'a');
my $trans4 = $ftr->new(-start=>700,-end=>650,-display_name=>'ZK154.4',-type=>'UTR');
my @trans = ($trans1,$trans2,$trans3,$trans4);

my $panel = Bio::Graphics::Panel->new(-start =>0,-length=>1050);

$panel->add_track(\@trans,
    -glyph => 'transcript2',
    # This works well!
    #-link => 'http://www.google.com/search?q=$name',
    # while the following code does not work as expected.
    -link => sub {
        my ($feature,$panel) = @_;
        my $type = $feature->primary_tag;
        my $name = $feature->display_name;
        if ($type eq 'CDS') {
            return "http://www.google.com/search?q=$name";
        } else {
            return "http://www.yahoo.com/search?p=$name";
        }
    }
);

my $map = $panel->create_web_map("mapname");
print $map;
$panel->finished();

In my test (Bioperl 1.6.0), its output is:

It seems $feature->primary_tag returns 'track' (I don't know where this
comes from...) rather than the type of the features.
Anyone has clue for this problem? Thanks -- ========================================== Xianjun Dong PhD student, Lenhard group Computational Biology Unit Bergen Center for Computational Science University of Bergen Hoyteknologisenteret, Thormohlensgate 55 N-5008 Bergen, Norway E-mail: xianjun.dong at bccs.uib.no Tel.: +47 555 84022 Fax : +47 555 84295 ========================================== From cjfields at illinois.edu Fri Jul 3 13:34:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 3 Jul 2009 12:34:25 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0907030835h351b96ccif14b192b2e0b132c@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <320fb6e00907010044v38480030hd5cf89ad149cf738@mail.gmail.com> <30B8D613-EDD6-4F2F-9B29-C34B8F60CB2E@illinois.edu> <1d06cd5d0907010927p4aad2a7re7ce1e65245e67de@mail.gmail.com> <6CAF4023-7D04-4B56-839F-E587A00DEEEA@illinois.edu> <1d06cd5d0907030835h351b96ccif14b192b2e0b132c@mail.gmail.com> Message-ID: <8A756A09-8E26-4B6A-9390-151533EDB48A@illinois.edu> No problem. Scary to see that creating an instance is 2-4x slower than a simple hash ref. Not sure there is an easy way around that; maybe we need a direct_new? The next step is to ensure this works cross-platform and get indexing (via Bio::Index::Fastq) optimized. Would be nice to get output working with the hash refs as well. chris On Jul 3, 2009, at 10:35 AM, Giles Weaver wrote: > Chris, I've just tested your Illumina/Solexa fastq parsing code and > am pleased to report that I haven't encountered any issues thus far. 
> > To give an idea of the processing overhead of object instantiation, > fastq parsing performance on a lowly 3GHz Core 2 Duo (using one > core) is as follows: > Illumina fastq with next_dataset: ~1 million sequences/minute > Solexa fastq with next_dataset: ~500000 sequences/minute > Illumina fastq with next_seq: ~215000 sequences/minute > Solexa fastq with next_seq: ~175000 sequences/minute > > My quality trimming script does about 300000 sequences/minute with > next_dataset, up from ~130000 sequences/minute with next_seq, so it > shaves hours off the run time, thanks! > > Giles > > 2009/7/1 Chris Fields > On Jul 1, 2009, at 11:27 AM, Giles Weaver wrote: > > ... > > > Peter, the trimming algorithm I use employs a sliding window, as > follows: > > - For each sequence position calculate the mean phred quality score > for a > window around that position. > - Record whether the mean score is above or below a threshold as an > array > of zeros and ones. > - Use a regular expression on the joined array to find the start > and end > of the good quality sequence(s). > - Extract the quality sequence(s) and replace any bases below the > quality > threshold with N. > - Trim any Ns from the ends. > > A refinement would be to weight the scores from positions in the > window, but > this could give a performance hit, and the method seems to work well > enough > as is. > > Chris, thanks for committing the fix, I'll give bioperl illumina fastq > parsing a workout soon. Peter, as much as I'd love to help out with > biopython, I'm under too much time pressure right now! > > Just let me know if the qual values match up with what is expected. > You can also iterate through the data with hashrefs using > next_dataset (faster than objects). 
> The next_dataset example below is from the fastq tests in core:
>
> -----------------------------------------
> $in_qual = Bio::SeqIO->new(-file    => test_input_file('fastq','test3_illumina.fastq'),
>                            -variant => 'illumina',
>                            -format  => 'fastq');
>
> $qual = $in_qual->next_dataset();
>
> isa_ok($qual, 'HASH');
> is($qual->{-seq}, 'GTTAGCTCCCACCTTAAGATGTTTA');
> is($qual->{-raw_quality}, 'SXXTXXXXXXXXXTTSUXSSXKTMQ');
> is($qual->{-id}, 'FC12044_91407_8_200_406_24');
> is($qual->{-desc}, '');
> is($qual->{-descriptor}, 'FC12044_91407_8_200_406_24');
> is(join(',',@{$qual->{-qual}}[0..10]),
>    '19,24,24,20,24,24,24,24,24,24,24');
> -----------------------------------------
>
> So one could check those values directly and then filter them
> through as needed directly into Bio::Seq::Quality if necessary (note
> some of the key values are constructor args):
>
> my $qualobj = Bio::Seq::Quality->new(%$qual);
>
> chris
>

From lskatz at gatech.edu Fri Jul 3 18:08:43 2009
From: lskatz at gatech.edu (Lee Katz)
Date: Fri, 3 Jul 2009 22:08:43 +0000 (UTC)
Subject: [Bioperl-l] chromatogram
References: <7925de940711131727q8f05370h92dc60db4bae782f@mail.gmail.com>
	<473B5ED8.1090201@mail.nih.gov> <473B62D9.8010004@mail.nih.gov>
	<7925de940711150524p167bb266xb7fc78693f0848ed@mail.gmail.com>
	<1195136486.2785.12.camel@localhost.localdomain>
Message-ID: 

Thank you Scott. I know that this message is really late, but I got
really sidetracked and want to follow through with this.

My code so far is a mutt between everything I found online. It only
produces a generic track though; however, I want to produce a
chromatogram image, as shown on the gbrowse tutorial I found at
http://wheat.pw.usda.gov/gbrowse/tutorial/tutorial.html
(section 15: Displaying Trace Data, where a semantic zoom is shown).

Can you guys help me finish it off? Thanks.

use Bio::Graphics;
use Bio::Seq;
use Bio::SeqFeature::Generic;

my @scfFile = qw(1.scf 2.scf);
my $bsg     = 'Bio::SeqFeature::Generic';
my $seq     = Bio::Seq->new(-length => 900);
my $whole   = $bsg->new(-display_name => 'Clone82',
                        -start        => 1,
                        -end          => $seq->length);
my $trace1  = $bsg->new(-start        => 1,
                        -end          => 500,
                        -display_name => 'Trace',
                        -tag          => { trace => "$scfFile[0]" });
my $panel   = Bio::Graphics::Panel->new(-length    => $seq->length,
                                        -width     => 800,
                                        -truecolor => 1,
                                        -key_style => 'between',
                                        -pad_left  => 10,
                                        -pad_right => 10,
                                       );
$panel->add_track($whole,
                  -glyph  => 'arrow',
                  -double => 1,
                  -tick   => 2,
                  -label  => 1,
                 );
$panel->add_track([$trace1],
                  -feature      => 'read',
                  -strand_arrow => 1,
                  -glyph        => 'trace',
                  -a_color      => 'green',
                  -c_color      => 'blue',
                  -g_color      => 'black',
                  -t_color      => 'red',
                  -trace_height => 80,
                  -description  => 1,
                  -label        => 1,
                  -key          => 'Traces');
binmode STDOUT;
print $panel->png;

From hlapp at gmx.net Sat Jul 4 06:39:37 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 4 Jul 2009 12:39:37 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907021128.21239.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de>
Message-ID: <5EC3CB83-22AD-4C79-9F6C-047ED58B7962@gmx.net>

Hi Florian:

On Jul 2, 2009, at 11:28 AM, Florian Mittag wrote:

> Hi!
>
> I previously posted a message on the BioSQL mailinglist regarding a
> BioSQL schema for DB2 and we are several steps closer to completion
> now.

Good to hear!

> We were able to adapt the "load_ncbi_taxonomy.pl" script from BioSQL
> to fill our DB2 database with taxonomy data

Would you mind posting to the BioSQL list which changes you had to
make to make the script work with DB2? More generally, is there some
kind of comprehensive documentation on what is different in DB2 from
standard SQL92? The load_ncbi_taxonomy.pl script should in principle
work with any SQL92-compliant RDBMS ... Have you found that not to be
the case (which would be a bug), or is DB2 in some ways not
SQL92-compliant?

> , but loading the gene ontology with BioPerl's "load_ontology.pl" is
> somewhat harder.

The ontology as well as the sequence loader are really just front-ends
to the Bioperl-db object-relational mappers (ORMs). So I would start
there, rather than looking at errors the script does or does not throw
(you don't want to run all combinations of command line parameters
that would exercise each and every feature of the script).

In order to create DB2 driver support in Bioperl-db, you need to add
two things. First, you need to create a module Bio/DB/DBI/DB2.pm that
overrides the methods from base.pm according to DB2. The fact that you
didn't report any errors about that module not having been found
suggests that you've done this already.

The second step is as you say to create a package Bio/DB/BioSQL/DB2
with at least BasePersistenceAdaptorDriver.pm as module in it, and
starting with a copy of the existing ones is indeed the best way to
get started on this. Unless you also created the DB2 database DDL
scripts from the Oracle ones, I wouldn't necessarily copy from Oracle
though, but maybe rather from Pg. And rather than looking for errors
of one of the scripts, I'd just go systematically through the files
and make sure the SQL in there is DB2 compliant.

> [...]
> It first ran a few minutes processing the file and then died after the
> following SQL-command was prepared and executed:
>
> "SELECT term.term_id, term.identifier, term.name, term.definition,
> term.is_obsolete, NULL, term.ontology_id FROM term WHERE identifier
> = ?"

Could you post the full error message? It is rather difficult to
diagnose what's going on w/o the error message and stack trace.

I'd be surprised BTW if DB2 were indeed offended by the NULL in the
above statement - I'm pretty sure that "SELECT NULL FROM sometable"
(or "SELECT 1 FROM sometable") is standard SQL. Are you sure that if
you execute such a statement at a SQL prompt it results in an error?

Since I can hardly believe that DB2 doesn't support selecting
constants (NULL is as much a constant as 1 is), maybe what it wants
though is aliasing the column. So if

    SELECT NULL FROM bioentry;

yields an error, does

    SELECT NULL AS colAlias FROM bioentry;

work fine?

> I don't know if the "NULL" column is supposed to be there

It is. The code in BaseDriver.pm that you were looking at should not
need to be modified. (Rather, DB2/BasePersistenceAdaptorDriver.pm is
supposed to override any method that needs to be adapted to DB2.)

The way the ORM works is by trying to map all properties of a BioPerl
object that are persistent to a column of a table in the database. If
it can't map a property (for whatever reason) its value is simply
always undef (or NULL in SQL). I.e., NULL columns are the placeholder
for a column that failed to be mapped to a property. You can't simply
remove them or all subsequent columns are shifted.

Hth,

	-hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Jul 4 08:02:33 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 4 Jul 2009 08:02:33 -0400
Subject: [Bioperl-l] Can I load ontologies into BioSQL?
In-Reply-To: <59386.10.2.4.168.1245679938.squirrel@webmail.istge.it>
References: <59386.10.2.4.168.1245679938.squirrel@webmail.istge.it>
Message-ID: 

Hi Achille,

according to Chris Mungall from the GO Consortium, the .ontology files
have been deprecated by GO. You should use the .obo files instead, and
BioPerl has a parser for that (and load_ontology.pl supports all
formats that BioPerl supports).

There has been a near identical issue report earlier (April 20 - I
don't have the thread from the archives at hand).
According to Chris, the BioPerl parser for the .ontology files appears
to fail to deal with the new relations in GO, and so with the
obsoletion of the .ontology format we have scheduled the respective
parser for deprecation.

	-hilmar

On Jun 22, 2009, at 10:12 AM, Achille Zappa wrote:

> Hi guys
>
> I'm working with biosql and I try to figure out how to load ontologies
> into biosql.
>
> I've tried to load the flat files gene ontologies :
>
> load_ontology.pl --driver mysql --dbuser xxx --dbpass xxx --host
> localhost --dbname biosql --namespace "Gene Ontology" --format goflat
> --fmtargs "-defs_file,GO.defs" function.ontology process.ontology
> component.ontology
>
> as in the script info but I have an error,
>
> a lot of ------------ WARNING ---------------------
> MSG: DBLink exists in the dblink of _default
> ---------------------------------------------------
> and at the end
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: format error (file /home/user/Download/process.ontology)
> offending line:
> -negative regulation of angiogenesis ; GO:0016525 ; synonym:down
> regulation of angiogenesis ; synonym:down\-regulation of angiogenesis
> ; synonym:downregulation of angiogenesis ; synonym:inhibition of
> angiogenesis % negative regulation of developmental process ;
> GO:0051093 % regulation of angiogenesis ; GO:0045765
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/lib/perl5/vendor_perl/5.10.0/Bio/Root/Root.pm:357
> STACK: Bio::OntologyIO::dagflat::_parse_flat_file
> /usr/lib/perl5/vendor_perl/5.10.0/Bio/OntologyIO/dagflat.pm:627
> STACK: Bio::OntologyIO::dagflat::parse
> /usr/lib/perl5/vendor_perl/5.10.0/Bio/OntologyIO/dagflat.pm:284
> STACK: Bio::OntologyIO::dagflat::next_ontology
> /usr/lib/perl5/vendor_perl/5.10.0/Bio/OntologyIO/dagflat.pm:317
> STACK: load_ontology.pl:604
> -----------------------------------------------------------
>
> could you help me?
> is it possible to use the OBO format with the loader?
> those GO flat files are deprecated by the Gene Ontology site
> is there a list of format to use with the biosql perl scripts?
>
> thank you
> regards
> Achille
>
> --
> Achille Zappa
> -Bioinformatics
> National Cancer Research Institute (IST)
> Largo Benzi 10
> 16132 Genova - ITALY
> Tel. 010 5737288
> -IEIIT - Sezione di Genova
> National Research Council (CNR)
> via De Marini, 6
> 16149 Genova - ITALY
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Sat Jul 4 09:49:39 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 4 Jul 2009 09:49:39 -0400
Subject: [Bioperl-l] FASTQ output
In-Reply-To: 
References: 
Message-ID: 

On Jul 1, 2009, at 5:48 AM, Chris Fields wrote:

> I am working on FASTQ output and noticed a real oddity. Apparently,
> there are three write_* methods for this module, with the odd choice
> of write_seq for Bio::SeqIO::fastq writing FASTA, not FASTQ.
> write_qual() writes Qual format:

Maybe the motivating thought was that a SeqIO module ought to write
sequences when write_seq() is called. I agree though that a writer for
a format ought to write that format and not something else.

> [...] is there a reason for duplicating output code for qual and
> FASTA output within Bio::SeqIO::fastq

Hopefully not.

> [...]
> Anyone have problems with me changing that up a bit?

Go ahead.

	-hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================

From yewcoccus at gmail.com Sat Jul 4 22:39:23 2009
From: yewcoccus at gmail.com (yewcoccus)
Date: Sun, 5 Jul 2009 10:39:23 +0800
Subject: [Bioperl-l] Bio::SeqIO::swiss.pm module help
Message-ID: <200907051039202349328@gmail.com>

Hi all,

I want to parse uniprot_sprot.dat and get each of the features, but I
find it hard to understand how to use the Bio::SeqIO::swiss.pm module.
I would appreciate it if anyone could help.

ID   002R_IIV3               Reviewed;         458 AA.
AC   Q197F8;
DT   16-JUN-2009, integrated into UniProtKB/Swiss-Prot.
DT   11-JUL-2006, sequence version 1.
DT   16-JUN-2009, entry version 10.
DE   RecName: Full=Uncharacterized protein 002R;
GN   ORFNames=IIV3-002R;
OS   Invertebrate iridescent virus 3 (IIV-3) (Mosquito iridescent virus).
OC   Viruses; dsDNA viruses, no RNA stage; Iridoviridae; Chloriridovirus.
OX   NCBI_TaxID=345201;
OH   NCBI_TaxID=7163; Aedes vexans (Inland floodwater mosquito) (Culex vexans).
OH   NCBI_TaxID=42431; Culex territans.
OH   NCBI_TaxID=332058; Culiseta annulata.
OH   NCBI_TaxID=310513; Ochlerotatus sollicitans (eastern saltmarsh mosquito).
OH   NCBI_TaxID=329105; Ochlerotatus taeniorhynchus.
OH   NCBI_TaxID=7183; Psorophora ferox.
RN   [1]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RX   PubMed=16912294; DOI=10.1128/JVI.00464-06;
RA   Delhon G., Tulman E.R., Afonso C.L., Lu Z., Becnel J.J., Moser B.A.,
RA   Kutish G.F., Rock D.L.;
RT   "Genome of invertebrate iridescent virus type 3 (mosquito iridescent
RT   virus).";
RL   J. Virol. 80:8439-8449(2006).
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; DQ643392; ABF82032.1; -; Genomic_DNA.
DR   RefSeq; YP_654574.1; -.
DR   GeneID; 4156251; -.
PE   4: Predicted;
FT   CHAIN         1    458       Uncharacterized protein 002R.
FT                                /FTId=PRO_0000377938.
SQ   SEQUENCE   458 AA;  53921 MW;  E46E5C85D7ACA139 CRC64;
     MASNTVSAQG GSNRPVRDFS NIQDVAQFLL FDPIWNEQPG SIVPWKMNRE QALAERYPEL
     QTSEPSEDYS GPVESLELLP LEIKLDIMQY LSWEQISWCK HPWLWTRWYK DNVVRVSAIT
     FEDFQREYAF PEKIQEIHFT DTRAEEIKAI LETTPNVTRL VIRRIDDMNY NTHGDLGLDD
     LEFLTHLMVE DACGFTDFWA PSLTHLTIKN LDMHPRWFGP VMDGIKSMQS TLKYLYIFET
     YGVNKPFVQW CTDNIETFYC TNSYRYENVP RPIYVWVLFQ EDEWHGYRVE DNKFHRRYMY
     STILHKRDTD WVENNPLKTP AQVEMYKFLL RISQLNRDGT GYESDSDPEN EHFDDESFSS
     GEEDSSDEDD PTWAPDSDDS DWETETEEEP SVAARILEKG KLTITNLMKS LGFKPKPKKI
     QSIDRYFCSL DSNYNSEDED FEYDSDSEDD DSDSEDDC
//

2009-07-05

yewcoccus

From bosborne11 at verizon.net Sat Jul 4 22:50:40 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Sat, 04 Jul 2009 22:50:40 -0400
Subject: [Bioperl-l] Bio::SeqIO::swiss.pm module help
In-Reply-To: <200907051039202349328@gmail.com>
References: <200907051039202349328@gmail.com>
Message-ID: <3B3CB9A4-4C89-4730-B0E7-52D62DFB1BF0@verizon.net>

yewcoccus,

You took a look at the Feature-Annotation HOWTO?

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Brian O.

On Jul 4, 2009, at 10:39 PM, yewcoccus wrote:

> Hi all,
>
> I want to parse uniprot_sprot.dat, get each of the features.
> but I found it is hard to understand how to use the
> Bio::SeqIO::swiss.pm module. I will be appreciate if there is
> anyone who can help.
>
> ID   002R_IIV3               Reviewed;         458 AA.
> AC   Q197F8;
> DT   16-JUN-2009, integrated into UniProtKB/Swiss-Prot.
> DT   11-JUL-2006, sequence version 1.
> DT   16-JUN-2009, entry version 10.
> [...]
>
> 2009-07-05
>
> yewcoccus
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Jonathan.Moore at warwick.ac.uk Wed Jul 1 06:04:24 2009
From: Jonathan.Moore at warwick.ac.uk (Moore, Jonathan)
Date: Wed, 1 Jul 2009 11:04:24 +0100
Subject: [Bioperl-l] Getting errors parsing TIGR XML in SeqIO
References: <7BEB494D4E69964C8292CE4EDA2B9811BBB946@LAUREL.ads.warwick.ac.uk>
Message-ID: <7BEB494D4E69964C8292CE4EDA2B9811BBB95E@LAUREL.ads.warwick.ac.uk>

Thanks for the suggestion Jason.

There is a bit of a gulf between the tigrxml test file and the TAIR9
Arabidopsis release in TIGR XML format. BP's tigrxml test file's
top-level object is ASSEMBLY, whereas in the TAIR file ASSEMBLY is
already two levels deep in the object hierarchy inside TIGR and
PSEUDOCHROMOSOME. In addition, the two main objects within the TAIR
ASSEMBLY object, GENE_LIST and ASSEMBLY_SEQUENCE, don't get a mention
in our test file. Looks like a bit of work would be needed to map
this.

Jay

>There are several flavors of TIGR XML for rice and Arabidopsis, and
>other projects etc, I don't know which is tracked with the current
>tigrxml version unfortunately but one can compare the test files in
>t/data to the versions downloaded to see what is currently supported.
>Usually the gbk will be more consistently parseable but we can try and
>work it out if it is a sensible transformation.
>
>
>> I'm trying to parse the TAIR9 Arabidopsis release from the TIGR XML
>> files at the TAIR FTP site.
>>
>> I've tried SeqIO with both tigr and tigrxml formats but both are
>> giving errors in 1.6.0. Has anyone advice on whether it's likely to
>> be doable, or should I wait til the .gb files are available?
>>
>> Jay Moore
>
>
>--
>Jason Stajich
>jason at bioperl.org

From johntyree at gmail.com Wed Jul 1 15:36:33 2009
From: johntyree at gmail.com (John Tyree)
Date: Wed, 1 Jul 2009 15:36:33 -0400
Subject: [Bioperl-l] Bio::DB::GenBank batch mode usage
Message-ID: <459dd5330907011236y31fea4fey8dc20e5274e94d1a@mail.gmail.com>

I'm trying to use Bio::DB::GenBank to download a large number of files
by accession number. The docs say not to do this in normal mode to
reduce server load. There is some kind of helper function associated
with this.

    %params = Bio::DB::GenBank->get_params('batch');

But I don't understand how to use it. If you pass the hash using:

    Bio::DB::GenBank->new(%params);

it raises the following and dies:

--------------------- WARNING ---------------------
MSG: invalid retrieval type tool must be one of (pipeline,io_string,tempfile
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: seq_start() must be integer value if set
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib64/perl5/site_perl/5.10.0/Bio/Root/Root.pm:357
STACK: Bio::DB::NCBIHelper::seq_start /usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:416
STACK: Bio::DB::NCBIHelper::new /usr/lib64/perl5/site_perl/5.10.0/Bio/DB/NCBIHelper.pm:117
STACK: Find_Patient_By_AccNo.pl:93

There is a deprecated method called get_Stream_by_batch() but how does
one achieve batch mode using the proper get_Stream_by_id() ?
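
For reference, the usual documented pattern is to hand an array
reference of identifiers to one of the get_Stream_by_* methods, rather
than passing anything from get_params() to new(). The warning above
suggests that hash is not meant as constructor arguments, though that
reading is an assumption. A minimal sketch, with placeholder accessions
and a made-up output file name records.gb:

```perl
use strict;
use warnings;
use Bio::DB::GenBank;
use Bio::SeqIO;

# Placeholder accessions for illustration; substitute the real list.
my @accessions = qw(AC013798 AC021953);

# 'tempfile' is one of the retrieval types named in the warning above;
# it spools the download to a temporary file before parsing.
my $gb = Bio::DB::GenBank->new(-retrievaltype => 'tempfile');

# An array reference requests all listed records as a single stream.
# get_Stream_by_id() takes the same kind of list for GI numbers.
my $stream = $gb->get_Stream_by_acc(\@accessions);

# Write each record out in GenBank format (output name is hypothetical).
my $out = Bio::SeqIO->new(-file => '>records.gb', -format => 'genbank');
while (my $seq = $stream->next_seq) {
    $out->write_seq($seq);
}
```

This needs network access to NCBI, so it is untested here. For very
large lists it may be kinder to the server to split the accessions into
smaller groups and pause between requests.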

Thanks,
John

From lacava at gmail.com Fri Jul 3 14:07:50 2009
From: lacava at gmail.com (John LaCava)
Date: Fri, 3 Jul 2009 14:07:50 -0400
Subject: [Bioperl-l] Question regarding BioPerl / BioSQL - InterPro Optional IDs
Message-ID: <48FCB39E-5CA8-4BE9-825D-49CFB14FDBB7@gmail.com>

Hi all,

I am trying to use the BioPerl-db script load_seqdatabase.pl to parse
a SwissProt.dat file (Yeast.dat, this is the yeast proteome with
annotations etc.). The particular entry I am interested in is the
InterPro optional ID, which is the domain name.

I have put a short stub up which displays the 4 pieces of info I want
to parse into my database. That can be found here:

http://github.com/johnraekwon/BioPerl---BioSQL---InterPro-Optional-IDs/tree/master

You can see that near the bottom, we get the optional ID:

    $protein_ids->{interpro_domain} = $dblink->{optional_id};

I do not think the bioperl script load_seqdatabase.pl retrieves this
information. At least, I cannot find it in the db built from parsing a
test .dat file. I would like some help figuring out:

1) WHY doesn't it retrieve this information, since it seems to be
   parsing "all" annotations...

2) HOW might I edit the script to include this particular annotation
   of interest in the info it passes to my db (biosql)

I am a bit out of my depth on this, and so, any help is appreciated.

Cheers,
John

From Russell.Smithies at agresearch.co.nz Sun Jul 5 17:00:16 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 6 Jul 2009 09:00:16 +1200
Subject: [Bioperl-l] different results with remote-blast skript
In-Reply-To: 
References: 
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>

I'd guess it's a difference in the parameters used.
Interesting that both have the number of letters in the db as
"-1,125,070,205", I assume that's a bug :-)

Stats from your remote_blast:

'stats' => {
    'S1' => '42',
    'S1_bits' => '20.8',
    'lambda' => '0.309',
    'entropy' => '0.345',
    'kappa_gapped' => '0.0410',
    'T' => '11',
    'kappa' => '0.122',
    'X3_bits' => '24.7',
    'X1' => '16',
    'lambda_gapped' => '0.267',
    'X2' => '38',
    'S2' => '74',
    'seqs_better_than_cutoff' => '0',
    'posted_date' => 'Jul 4, 2009 4:41 AM',
    'Hits_to_DB' => '60102303',
    'dbletters' => '-1125070205',
    'A' => '40',
    'num_successful_extensions' => '2004',
    'num_extensions' => '1436892',
    'X1_bits' => '7.1',
    'X3' => '64',
    'entropy_gapped' => '0.140',
    'dbentries' => '9252258',
    'X2_bits' => '14.6',
    'S2_bits' => '33.1'
}

Stats from a blast done on the NCBI webpage:

Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
Posted date: Jul 4, 2009 4:41 AM
Number of letters in database: -1,125,070,205
Number of sequences in database: 9,252,258

Lambda     K        H
0.309      0.124    0.340
Gapped
Lambda     K        H
0.267      0.0410   0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 9252258
Number of Hits to DB: 86493230
Number of extensions: 3101413
Number of successful extensions: 9001
Number of sequences better than 100: 65
Number of HSP's better than 100 without gapping: 0
Number of HSP's gapped: 9000
Number of HSP's successfully gapped: 66
Length of query: 150
Length of database: 3169897087
Length adjustment: 113
Effective length of query: 37
Effective length of database: 2124391933
Effective search space: 78602501521
Effective search space used: 78602501521
T: 11
A: 40
X1: 16 (7.1 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (20.8 bits)
S2: 65 (29.6 bits)

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
> Sent: Sunday, 28 June 2009 10:15 p.m.
> To: BioPerl List
> Subject: [Bioperl-l] different results with remote-blast skript
>
> Hi again :)
> please, I only have this little question:
> why do I get different results with my remote::blast perl script than
> on the ncbi blast homepage?
> I am using blastp, the query is an amino-sequence (different results
> with any sequence, differences not only in number of hits but even in
> e-values, scores etc...), the database is 'nr'.
> PLEASE help me,
> thank you in advance,
> Jonas
>
> ps: my script:
> ################################################################################
> use Bio::Seq::SeqFactory;
> use Bio::Tools::Run::RemoteBlast;
> use strict;
> my @blast_report;
> my $prog = 'blastp';
> my $db = 'nr';
> my $e_val= '1e-10';
> #my $e_val= '10';
> my @params = ( '-prog' => $prog,
>                '-data' => $db,
>                '-expect' => $e_val,
>                '-readmethod' => 'SearchIO' );
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
>
> my $blast_seq = 'MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR'
>               . 'SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD'
>               . 'PDDEYE';
> #$v is just to turn on and off the messages
> my $v = 1;
> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
> my $seq = $seqbuilder->create(-seq => $blast_seq, -display_id => "$blast_seq");
> my $filename='temp2.out';
> my $r = $factory->submit_blast($seq);
> print STDERR "waiting..." if( $v > 0 );
> while ( my @rids = $factory->each_rid )
> {
>     foreach my $rid ( @rids )
>     {
>         my $rc = $factory->retrieve_blast($rid);
>         if( !ref($rc) )
>         {
>             if( $rc < 0 )
>             {
>                 $factory->remove_rid($rid);
>             }
>             print STDERR "."
>                 if ( $v > 0 );
>         }
>         else
>         {
>             my $result = $rc->next_result();
>             $factory->save_output($filename);
>             $factory->remove_rid($rid);
>             print "\nQuery Name: ", $result->query_name(), "\n";
>             while ( my $hit = $result->next_hit )
>             {
>                 next unless ( $v > 0);
>                 print "\thit name is ", $hit->name, "\n";
>                 while( my $hsp = $hit->next_hsp )
>                 {
>                     print "\t\tscore is ", $hsp->score, "\n";
>                 }
>             }
>         }
>     }
>
>
> }
> @blast_report = get_file_data ($filename);
> return @blast_report;
> ################################################################################

From jason at bioperl.org Sun Jul 5 17:40:41 2009
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 5 Jul 2009 14:40:41 -0700
Subject: [Bioperl-l] different results with remote-blast skript
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
Message-ID: 

integer overflow in blast....

On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:

> I'd guess it's a difference in the parameters used.
> Interesting that both have the number of letters in the db as
> "-1,125,070,205", I assume that's a bug :-)
>
> [...]
> = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjfields at illinois.edu Sun Jul 5 18:51:39 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 5 Jul 2009 17:51:39 -0500 Subject: [Bioperl-l] different results with remote-blast skript In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> Message-ID: That inspires confidence ;> chris On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > integer overflow in blast.... > > On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >> I'd guess it's a difference in the parameters used. >> Interesting that both have the number of letters in the db as >> "-1,125,070,205", I assume that's a bug :-) >> >> Stats from your remote_blast: >> >> 'stats' => { >> 'S1' => '42', >> 'S1_bits' => '20.8', >> 'lambda' => '0.309', >> 'entropy' => '0.345', >> 'kappa_gapped' => '0.0410', >> 'T' => '11', >> 'kappa' => '0.122', >> 'X3_bits' => '24.7', >> 'X1' => '16', >> 'lambda_gapped' => '0.267', >> 'X2' => '38', >> 'S2' => '74', >> 'seqs_better_than_cutoff' => '0', >> 'posted_date' => 'Jul 4, 2009 4:41 AM', >> 'Hits_to_DB' => '60102303', >> 'dbletters' => '-1125070205', >> 'A' => '40', >> 'num_successful_extensions' => '2004', >> 'num_extensions' => '1436892', >> 'X1_bits' => '7.1', >> 'X3' => '64', >> 'entropy_gapped' => '0.140', >> 'dbentries' => '9252258', >> 'X2_bits' => '14.6', >> 'S2_bits' => '33.1' >> } >> >> >> Stats from a blast done on the NCBI webpage: >> >> Database: All non-redundant GenBank CDS translations+PDB+SwissProt >> +PIR+PRF >> excluding environmental samples from WGS projects >> Posted date: Jul 4, 2009 4:41 AM >> Number of letters in database: -1,125,070,205 >> Number of sequences in database: 9,252,258 >> >> Lambda K H >> 0.309 0.124 
0.340 >> Gapped >> Lambda K H >> 0.267 0.0410 0.140 >> Matrix: BLOSUM62 >> Gap Penalties: Existence: 11, Extension: 1 >> Number of Sequences: 9252258 >> Number of Hits to DB: 86493230 >> Number of extensions: 3101413 >> Number of successful extensions: 9001 >> Number of sequences better than 100: 65 >> Number of HSP's better than 100 without gapping: 0 >> Number of HSP's gapped: 9000 >> Number of HSP's successfully gapped: 66 >> Length of query: 150 >> Length of database: 3169897087 >> Length adjustment: 113 >> Effective length of query: 37 >> Effective length of database: 2124391933 >> Effective search space: 78602501521 >> Effective search space used: 78602501521 >> T: 11 >> A: 40 >> X1: 16 (7.1 bits) >> X2: 38 (14.6 bits) >> X3: 64 (24.7 bits) >> S1: 42 (20.8 bits) >> S2: 65 (29.6 bits) >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>> Sent: Sunday, 28 June 2009 10:15 p.m. >>> To: BioPerl List >>> Subject: [Bioperl-l] different results with remote-blast skript >>> >>> Hi again :) >>> please, I only have this little question: >>> why do I get different results with my remote::blast perl skript >>> then on the >>> ncbi blast homepage? >>> I am using blastp, the query is an amino-sequence (different >>> results with any >>> sequence, differences not only in number of hits but even in e- >>> values, scores >>> etc...), the database is 'nr'. 
>> = >> = >> ===================================================================== >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Sun Jul 5 18:00:42 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 5 Jul 2009 18:00:42 -0400 Subject: [Bioperl-l] Syntax for load_interpro.pl? In-Reply-To: References: Message-ID: Hi John, I presume you mean the scripts/biosql/load_interpro.pl script in Bioperl-db. It has indeed been obsoleted for a long time and I guess I should remove it because the functionality is now in load_ontology.pl. This is b/c InterPro for the purposes of BioPerl is an ontology. Have you found it not to work with load_ontology.pl? -hilmar On Jul 5, 2009, at 4:30 PM, John LaCava wrote: > Greetings, > > I am attempting to use this script, but I don't seem to be able to > determine the appropriate syntax. > Documentation on this script seems minimal. Moreover, I am not yet > terribly experienced with > these endeavors. > > Could someone possibly provide me with an example syntax? > > e.g. > load_interpro.pl ... > > then what? > > I must specify -db -file -version ? > > I tried a couple of ways, but I get the similar errors each time: > > e.g. > /usr/local/bin/bp_load_interpro.pl: line 29: syntax error near > unexpected token `$file,' > /usr/local/bin/bp_load_interpro.pl: line 29: `my ($file, $version);' > > Also, from reading the comments, it appears this is supposed to be > made obsolete or superseded by > the script load_ontologies.pl. Why is this? > > Sorry to bother, and thanks in advance. 
> > John > _______________________________________________ > BioSQL-l mailing list > BioSQL-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biosql-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From lacava at gmail.com Sun Jul 5 23:12:18 2009 From: lacava at gmail.com (John LaCava) Date: Sun, 5 Jul 2009 23:12:18 -0400 Subject: [Bioperl-l] Syntax for load_interpro.pl? In-Reply-To: References: Message-ID: <24A78389-1707-4D86-90E8-9C7F6B33AC0B@gmail.com> Hi again, Thanks for the response. Actually, I felt I did not need the entire functions of the ontology script, and thought I might see what the interpro script would generate, at the potential benefit of lower complexity to me, the novice programmer. But I was having trouble getting the interpro script to run, since I couldnt land the syntax. Anyway, I have since started writing my own script and will not pursue this matter further. Best wishes, John On Jul 5, 2009, at 6:00 PM, Hilmar Lapp wrote: > Hi John, > > I presume you mean the scripts/biosql/load_interpro.pl script in > Bioperl-db. It has indeed been obsoleted for a long time and I guess > I should remove it because the functionality is now in > load_ontology.pl. This is b/c InterPro for the purposes of BioPerl > is an ontology. > > Have you found it not to work with load_ontology.pl? > > -hilmar > > On Jul 5, 2009, at 4:30 PM, John LaCava wrote: > >> Greetings, >> >> I am attempting to use this script, but I don't seem to be able to >> determine the appropriate syntax. >> Documentation on this script seems minimal. Moreover, I am not yet >> terribly experienced with >> these endeavors. >> >> Could someone possibly provide me with an example syntax? >> >> e.g. > load_interpro.pl ... >> >> then what? >> >> I must specify -db -file -version ? 
>> >> I tried a couple of ways, but I get the similar errors each time: >> >> e.g. >> /usr/local/bin/bp_load_interpro.pl: line 29: syntax error near >> unexpected token `$file,' >> /usr/local/bin/bp_load_interpro.pl: line 29: `my ($file, $version);' >> >> Also, from reading the comments, it appears this is supposed to be >> made obsolete or superseded by >> the script load_ontologies.pl. Why is this? >> >> Sorry to bother, and thanks in advance. >> >> John >> _______________________________________________ >> BioSQL-l mailing list >> BioSQL-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biosql-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From hlapp at gmx.net Sun Jul 5 23:30:34 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 5 Jul 2009 23:30:34 -0400 Subject: [Bioperl-l] Syntax for load_interpro.pl? In-Reply-To: <24A78389-1707-4D86-90E8-9C7F6B33AC0B@gmail.com> References: <24A78389-1707-4D86-90E8-9C7F6B33AC0B@gmail.com> Message-ID: <7C8AB20D-FDFE-41B4-BB96-C93A1013B35D@gmx.net> Hi John - the load_ontology.pl script is oriented towards the end-user (it runs off-the-bat), indeed not the novice programmer. Was there something that the script doesn't do that you wanted to program into it? -hilmar On Jul 5, 2009, at 11:12 PM, John LaCava wrote: > Hi again, > > Thanks for the response. > > Actually, I felt I did not need the entire functions of the ontology > script, and thought I might see what > the interpro script would generate, at the potential benefit of > lower complexity to me, the novice programmer. > But I was having trouble getting the interpro script to run, since I > couldnt land the syntax. > > Anyway, I have since started writing my own script and will not > pursue this matter further. 
> > Best wishes, > John > > > On Jul 5, 2009, at 6:00 PM, Hilmar Lapp wrote: > >> Hi John, >> >> I presume you mean the scripts/biosql/load_interpro.pl script in >> Bioperl-db. It has indeed been obsoleted for a long time and I >> guess I should remove it because the functionality is now in >> load_ontology.pl. This is b/c InterPro for the purposes of BioPerl >> is an ontology. >> >> Have you found it not to work with load_ontology.pl? >> >> -hilmar >> >> On Jul 5, 2009, at 4:30 PM, John LaCava wrote: >> >>> Greetings, >>> >>> I am attempting to use this script, but I don't seem to be able to >>> determine the appropriate syntax. >>> Documentation on this script seems minimal. Moreover, I am not >>> yet terribly experienced with >>> these endeavors. >>> >>> Could someone possibly provide me with an example syntax? >>> >>> e.g. > load_interpro.pl ... >>> >>> then what? >>> >>> I must specify -db -file -version ? >>> >>> I tried a couple of ways, but I get the similar errors each time: >>> >>> e.g. >>> /usr/local/bin/bp_load_interpro.pl: line 29: syntax error near >>> unexpected token `$file,' >>> /usr/local/bin/bp_load_interpro.pl: line 29: `my ($file, $version);' >>> >>> Also, from reading the comments, it appears this is supposed to be >>> made obsolete or superseded by >>> the script load_ontologies.pl. Why is this? >>> >>> Sorry to bother, and thanks in advance. 
>>>
>>> John
>>> _______________________________________________
>>> BioSQL-l mailing list
>>> BioSQL-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biosql-l
>>
>> --
>> ===========================================================
>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From lacava at gmail.com Mon Jul 6 00:10:01 2009
From: lacava at gmail.com (John LaCava)
Date: Mon, 6 Jul 2009 00:10:01 -0400
Subject: [Bioperl-l] Syntax for load_interpro.pl?
In-Reply-To: <7C8AB20D-FDFE-41B4-BB96-C93A1013B35D@gmx.net>
References: <24A78389-1707-4D86-90E8-9C7F6B33AC0B@gmail.com> <7C8AB20D-FDFE-41B4-BB96-C93A1013B35D@gmx.net>
Message-ID: <0258CC4C-1CED-41DC-A637-025E37C7D913@gmail.com>

Hi again,

No, mainly I was scanning the different scripts to see if any of them
were friendly towards capturing the InterPro domain-type info that we
discussed in relation to load_seqdatabase.pl. I plan to explore the
options you mentioned in your other reply to that thread, so that I can
take full advantage of this script and the BioSQL schema...

However, in the meantime I have authored a small and somewhat crappy
script that will capture the SwissProt protein ID and accession number
as well as the InterPro accession number and domain type, and pass these
into the bioentry and dbxref tables of the BioSQL schema. I had to add
an additional column to the dbxref table that would accept the InterPro
domain type (the optional ID), since I couldn't get it into the
dbxref_qualifier_value table without upsetting the mysql foreign key
settings.
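
A minimal sketch of the dbxref_qualifier_value route, under stated
assumptions: the table definitions are a cut-down SQLite stand-in for
the BioSQL tables (only the columns needed here), the ontology and term
names are made up for illustration, and DBI with DBD::SQLite is assumed
to be installed. One possible cause of the foreign key complaint is that
dbxref_qualifier_value.term_id has to point at an existing row in term,
which in turn needs a row in ontology, so the parent rows go in first:

```perl
use strict;
use warnings;
use DBI;

# In-memory SQLite database standing in for a real BioSQL instance
# (assumption: DBD::SQLite is available; MySQL would use a dbi:mysql: DSN).
my $dbh = DBI->connect( 'dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, PrintError => 0 } );
$dbh->do('PRAGMA foreign_keys = ON');    # SQLite enforces FKs only when asked

# Cut-down versions of four BioSQL tables; the real schema has more columns.
$dbh->do($_) for (
    'CREATE TABLE ontology (ontology_id INTEGER PRIMARY KEY, name TEXT NOT NULL)',
    'CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
        ontology_id INTEGER NOT NULL REFERENCES ontology (ontology_id))',
    'CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, dbname TEXT NOT NULL,
        accession TEXT NOT NULL, version INTEGER NOT NULL DEFAULT 0)',
    'CREATE TABLE dbxref_qualifier_value (
        dbxref_id INTEGER NOT NULL REFERENCES dbxref (dbxref_id),
        term_id   INTEGER NOT NULL REFERENCES term (term_id),
        "rank"    INTEGER NOT NULL DEFAULT 0,
        value     TEXT,
        PRIMARY KEY (dbxref_id, term_id, "rank"))',
);

# The cross-reference itself (example accession only).
$dbh->do( 'INSERT INTO dbxref (dbxref_id, dbname, accession) VALUES (1, ?, ?)',
    undef, 'InterPro', 'IPR000001' );

my $insert_qualifier = 'INSERT INTO dbxref_qualifier_value
    (dbxref_id, term_id, value) VALUES (1, 1, ?)';

# With no matching term row yet, the foreign key rejects this insert.
my $rejected = !eval { $dbh->do( $insert_qualifier, undef, 'Domain' ); 1 };

# Create the parent rows first (both names are hypothetical), then retry.
$dbh->do( 'INSERT INTO ontology (ontology_id, name) VALUES (1, ?)',
    undef, 'Annotation Tags' );
$dbh->do( 'INSERT INTO term (term_id, name, ontology_id) VALUES (1, ?, 1)',
    undef, 'interpro_entry_type' );
$dbh->do( $insert_qualifier, undef, 'Domain' );

my ($stored) = $dbh->selectrow_array(
    'SELECT value FROM dbxref_qualifier_value WHERE dbxref_id = 1 AND term_id = 1'
);
print "first attempt rejected: ", ( $rejected ? 'yes' : 'no' ), "\n";
print "stored qualifier value: $stored\n";
```

Going this way would leave the dbxref table unmodified, at the cost of
one extra ontology row and one term row for the qualifier name.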
Cheers again, John On Jul 5, 2009, at 11:30 PM, Hilmar Lapp wrote: > Hi John - the load_ontology.pl script is oriented towards the end- > user (it runs off-the-bat), indeed not the novice programmer. Was > there something that the script doesn't do that you wanted to > program into it? > > -hilmar > > On Jul 5, 2009, at 11:12 PM, John LaCava wrote: > >> Hi again, >> >> Thanks for the response. >> >> Actually, I felt I did not need the entire functions of the >> ontology script, and thought I might see what >> the interpro script would generate, at the potential benefit of >> lower complexity to me, the novice programmer. >> But I was having trouble getting the interpro script to run, since >> I couldnt land the syntax. >> >> Anyway, I have since started writing my own script and will not >> pursue this matter further. >> >> Best wishes, >> John >> >> >> On Jul 5, 2009, at 6:00 PM, Hilmar Lapp wrote: >> >>> Hi John, >>> >>> I presume you mean the scripts/biosql/load_interpro.pl script in >>> Bioperl-db. It has indeed been obsoleted for a long time and I >>> guess I should remove it because the functionality is now in >>> load_ontology.pl. This is b/c InterPro for the purposes of BioPerl >>> is an ontology. >>> >>> Have you found it not to work with load_ontology.pl? >>> >>> -hilmar >>> >>> On Jul 5, 2009, at 4:30 PM, John LaCava wrote: >>> >>>> Greetings, >>>> >>>> I am attempting to use this script, but I don't seem to be able >>>> to determine the appropriate syntax. >>>> Documentation on this script seems minimal. Moreover, I am not >>>> yet terribly experienced with >>>> these endeavors. >>>> >>>> Could someone possibly provide me with an example syntax? >>>> >>>> e.g. > load_interpro.pl ... >>>> >>>> then what? >>>> >>>> I must specify -db -file -version ? >>>> >>>> I tried a couple of ways, but I get the similar errors each time: >>>> >>>> e.g. 
>>>> /usr/local/bin/bp_load_interpro.pl: line 29: syntax error near >>>> unexpected token `$file,' >>>> /usr/local/bin/bp_load_interpro.pl: line 29: `my ($file, >>>> $version);' >>>> >>>> Also, from reading the comments, it appears this is supposed to >>>> be made obsolete or superseded by >>>> the script load_ontologies.pl. Why is this? >>>> >>>> Sorry to bother, and thanks in advance. >>>> >>>> John >>>> _______________________________________________ >>>> BioSQL-l mailing list >>>> BioSQL-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biosql-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From pmr at ebi.ac.uk Mon Jul 6 10:09:21 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 06 Jul 2009 15:09:21 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> Message-ID: <4A520591.3070407@ebi.ac.uk> Giles Weaver wrote: > I'm developing a transcriptomics database for use with next-gen data, and > have found processing the raw data to be a big hurdle. > > I'm a bit late in responding to this thread, so most issues have already > been discussed. One thing that hasn't been mentioned is removal of adapters > from raw Illumina sequence. 
> This is a PITA, and I'm not aware of any well
> developed and documented open source software for removal of adapters (and
> poor quality sequence) from Illumina reads.

We would like to add this to EMBOSS. Can you describe the method you
would like to use? (I see you currently use a combination of bioperl and
emboss for this.)

> For my purposes the tools that I would love to see supported in
> bioperl/bioperl-run are:
>
>    - next-gen sequence quality parsing (to output phred scores)
>    - sequence quality based trimming
>    - sequencing adapter removal
>    - filtering based on sequence complexity (repeats, entropy etc)
>    - bioperl-run modules for bowtie etc.

We would like to see these supported in all the Open-Bio Projects and
they are a priority for EMBOSS.

Can you suggest quality filters, trimming methods, adaptor removal
methods, sequence filters and any other applications we could provide in
EMBOSS?

We hope to keep in line with what the other projects do so that EMBOSS,
bioperl, biopython etc. can be used interchangeably in pipelines.

> Obviously all of these need to be fast! .... My
> current code trims ~1300 sequences/second, including unzipping the raw data
> and converting it to sanger fastq with biopython. Processing an entire
> sequencing run with the whole pipeline takes in the region of 6-12h.

OK, we will see what speed we can reach.

> Hope this looooong post was of interest to someone!

Very interesting!

regards,

Peter Rice

From Jonas_Schaer at gmx.de Sun Jul 5 11:46:52 2009
From: Jonas_Schaer at gmx.de (Jonas Schaer)
Date: Sun, 5 Jul 2009 17:46:52 +0200
Subject: [Bioperl-l] bioperl 1.6??
Message-ID: <51AF33DA19004A7B891743D1B094F4B1@jonas>

What is the difference between BioPerl 1.6 and 1.5.2?
Which one should I use?
thx, jonas

From Brotelzwieb at gmx.de Mon Jul 6 08:14:18 2009
From: Brotelzwieb at gmx.de (Jonas Schaer)
Date: Mon, 6 Jul 2009 14:14:18 +0200
Subject: [Bioperl-l] different results with remote-blast skript
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
Message-ID: <46A05E0132144D73A0F805953B580B2F@jonas>

Hi guys, thanks for your answers so far.

@jason: integer overflow in blast....
Sorry, but what do you mean by that? How can I fix it?

Since I never really changed any parameters, I assumed they were all at
their defaults. Anyway, I tried to get "better" results with my program
by changing these:

$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
$Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
$Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';

with no effect... I guess these were the default values anyway.
So could you please tell me which other parameters I can change from my
Perl script, AND how to do that?
Unfortunately both Perl and the BLAST algorithm are pretty much new to
me; maybe that's why I just cannot find out how to do this on my own... :/

Here is the output I get with my remote-blast script:
#################################################################################################################
Query Name: MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL L
	hit name is ref|XP_001702807.1|
		score is 442
BLASTP 2.2.21+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

Reference for composition-based statistics: Alejandro A.
Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L.
Spouge, Yuri I. Wolf, Eugene V. Koonin, and Stephen F.
Altschul (2001), "Improving the accuracy of PSI-BLAST protein database
searches with composition-based statistics and other refinements",
Nucleic Acids Res. 29:2994-3005.

RID: 53STX5G2013

Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
from WGS projects
           9,252,587 sequences; 3,169,972,781 total letters

Query= MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL
DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM
ATGPDPDDEYE
Length=150

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XP_001702807.1|  ClpS-like protein [Chlamydomonas reinhard...   174    2e-42

ALIGNMENTS
>ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii]
 gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii]
Length=303

 Score = 174 bits (442),  Expect = 2e-42, Method: Composition-based stats.
 Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%)

Query  1    MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds  60
            MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS
Sbjct  154  MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS  213

Query  61   dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR  120
            DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR
Sbjct  214  DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR  273

Query  121  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  150
            AWHERDDNAFRQAHQNTAMATGPDPDDEYE
Sbjct  274  AWHERDDNAFRQAHQNTAMATGPDPDDEYE  303

  Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
  excluding environmental samples from WGS projects
    Posted date:  Jul 5, 2009  4:41 AM
  Number of letters in database: -1,124,994,511
  Number of sequences in database:  9,252,587

Lambda      K        H
   0.309    0.122    0.345
Gapped
Lambda      K        H
   0.267   0.0410    0.140
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 9252587
Number of Hits to DB: 60273703
Number of extensions: 1448367
Number of successful
extensions: 2103
Number of sequences better than 10: 0
Number of HSP's better than 10 without gapping: 0
Number of HSP's gapped: 2113
Number of HSP's successfully gapped: 0
Length of query: 150
Length of database: 3169972781
Length adjustment: 113
Effective length of query: 37
Effective length of database: 2124430450
Effective search space: 78603926650
Effective search space used: 78603926650
T: 11
A: 40
X1: 16 (7.1 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 42 (20.8 bits)
S2: 74 (33.1 bits)
#################################################################################################################

And here are the hits (?) from BLAST on the NCBI homepage, with the same
query of course:

ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard...    300   3e-80
ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho...   36.2   1.1
ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia...   35.4   1.8
ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil...   34.3   4.2
ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n...   33.5   6.0
ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba...   33.5   7.0
#################################################################################################################

At least the first hit is the same, but even there the score and e-value
are different.

Thanks so much for any help :)
regards,
jonas

----- Original Message -----
From: "Chris Fields"
To: "Jason Stajich"
Cc: "Smithies, Russell" ; "'BioPerl List'" ; "'Jonas Schaer'"
Sent: Monday, July 06, 2009 12:51 AM
Subject: Re: [Bioperl-l] different results with remote-blast skript

> That inspires confidence ;>
>
> chris
>
> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>
>> integer overflow in blast....
>>
>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>
>>> I'd guess it's a difference in the parameters used.
>>> =
>>> =
>>> =====================================================================
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From s_oheigeartaigh at yahoo.co.uk Mon Jul 6 11:01:04 2009
From: s_oheigeartaigh at yahoo.co.uk (Sean ohEigeartaigh)
Date: Mon, 6 Jul 2009 15:01:04 +0000 (GMT)
Subject: [Bioperl-l] bioperl BLAST question
Message-ID: <626933.96171.qm@web27405.mail.ukl.yahoo.com>

Hi,

I'm trying to use bioperl to limit the number of BLAST results. However, when I use the following bit of code, it limits to less than the cutoff number, and excludes BLAST results that are halfway up the BLAST results page (without the limit) which it shouldn't exclude.

$blast = Bio::Tools::Run::StandAloneBlast
    ->new(program => 'tblastn', database =>$blastdb, b =>100, v =>100, F=>$fil, outfile=>$out)
    ->blastall($seq1);
}

Using this bit of code, I get 60 results for my query (out of 173 with no hit limit and an e-value cutoff of e=10). If I use b =>150, v=>150, I get 85 results, and some BLAST results appear halfway up the results page.
In other words, the limit seems to be removing results at random throughout the file, and is also not giving me enough results.

Am I using the b and v parameters (to limit blast results and blast one-line summaries) incorrectly?

Thanks very much for your help,
Seán Ó hÉigeartaigh

From cjfields at illinois.edu Mon Jul 6 11:24:39 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 6 Jul 2009 10:24:39 -0500
Subject: [Bioperl-l] bioperl 1.6??
In-Reply-To: <51AF33DA19004A7B891743D1B094F4B1@jonas>
References: <51AF33DA19004A7B891743D1B094F4B1@jonas>
Message-ID: <17AE2E44-66F3-49D5-AECE-D016FE66BD66@illinois.edu>

The latest one (1.6).

chris

On Jul 5, 2009, at 10:46 AM, Jonas Schaer wrote:

> what is the difference between bioperl 1.6 and 1.5.2???
> which one should i use???
> thx, jonas
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From florian.mittag at uni-tuebingen.de Mon Jul 6 12:08:18 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Mon, 6 Jul 2009 18:08:18 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <5EC3CB83-22AD-4C79-9F6C-047ED58B7962@gmx.net>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <5EC3CB83-22AD-4C79-9F6C-047ED58B7962@gmx.net>
Message-ID: <200907061808.18651.florian.mittag@uni-tuebingen.de>

Hi!

On Saturday 04 July 2009 12:39, Hilmar Lapp wrote:
> On Jul 2, 2009, at 11:28 AM, Florian Mittag wrote:
> > We were able to adapt the "load_ncbi_taxonomy.pl" script from BioSQL
> > to fill
> > our DB2 database with taxonomy data
>
> Would you mind posting to the BioSQL list which changes you had to
> make to make the script work with DB2?

No problem, I will post the diff sometime this week, since a few changes are no longer necessary, e.g., the new DB2 Express-C version 9.7 supports the "TRUNCATE TABLE" command, which it previously didn't.
> More generally, is there some kind of comprehensive documentation on > what is different in DB2 from standard SQL92? The > load_ncbi_taxonomy.pl script should in principle work with any SQL92- > compliant RDBMS ... Have you found that not to be the case (which > would be a bug), or is DB2 in some ways not SQL92-compliant? I don't know, I haven't looked for this kind of documentation, but the two things that annoyed me most were: 1) DB2 doesn't support UNIQUE on columns that allow for NULL values. Solution: create triggers that ensure UNIQUEness and create an INDEX. 2) Columns of type CLOB do not allow to be compared through "=", but only through "LIKE", which leads to problems with BioJava's Hibernate queries. Solution: currently none I want to discuss these problems in more detail on the other mailinglists, since they do not really belong here. > > , but loading the gene ontology with BioPerl's "load_ontology.pl" is > > somewhat harder. > > The ontology as well as the sequence loader are really just front-ends > to the Bioperl-db object-relational mappers (ORMs). So I would start > there, rather than looking at errors the script does or does not throw > (you don't want to run all combinations of command line parameters > that would exercise each and every feature of the script). > > In order to create DB2 driver support in Bioperl-db, you need to add > two things. First, you need to create a module Bio/DB/DBI/DB2.pm that > overrides the methods from base.pm according to DB2. The fact that you > didn't report any errors about that module not having been found > suggests that you've done this already. Correct ;-) > The second step is as you say to create a package Bio/DB/BioSQL/DB2 > with at least BasePersistenceAdaptorDriver.pm as module in it, and > starting with a copy of the existing ones is indeed the best way to > get started on this. 
Unless you also created the DB2 database DDL > scripts from the Oracle ones, I wouldn't necessarily copy from Oracle > though, but maybe rather from Pg. And rather than looking for errors > of one of the scripts, I'd just go systematically through the files > and make sure the SQL in there is DB2 compliant. Okay, I'll do that, but that will take some time and I'll probably turn to this mailings for further assistance with more specific questions. > > [...] > > It first ran a few minutes processing the file and then died after the > > following SQL-command was prepared and executed: > > > > "SELECT term.term_id, term.identifier, term.name, term.definition, > > term.is_obsolete, NULL, term.ontology_id FROM term WHERE identifier > > = ?" > > Could you post the full error message? It is rather difficult to > diagnose what's going on w/o the error message and stack trace. Right now, unfortunately not, because this error message won't appear again. I'm not sure is this is because of the database now containing data or because of some other changes I've made, but I will see this in the process of rewriting the DDL scripts. > I'd be surprised BTW if DB2 were indeed offended by the NULL in the > above statement - I'm pretty sure that "SELECT NULL FROM > sometable" (or "SELECT 1 FROM sometable") is standard SQL. Are you > sure that if you execute such a statement at a SQL prompt it results > in an error? > > Since I can hardly believe that DB2 doesn't support selecting > constants (NULL is as much a constant as 1 is), maybe what it wants > though is aliasing the column. So if > > SELECT NULL FROM bioentry; > > yields an error, does > > SELECT NULL AS colAlias FROM bioentry; > > work fine? Well, it is like this with version 9.5 of DB2 Express-C: SELECT NULL FROM bioentry; yields: SQL0206N "NULL" is not valid in the context where it is used. SQLSTATE=42703 SQLCODE=-206 But if I do: SELECT cast(NULL AS VARCHAR(255)) FROM bioentry; it returns the correct result without error. 
The new version 9.7 claims to have changed this behavior, so that the first query would run fine, but I haven't had time to test the new version yet.
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.wn.doc/doc/i0054263.html

>
> > I don't know if the "NULL" column is supposed to be there
>
> It is. The code in BaseDriver.pm that you were looking at should not
> need to be modified. (Rather, DB2/BasePersistenceAdaptorDriver.pm is
> supposed to override any method that needs to be adapted to DB2.) The
> way the ORM works is by trying to map all properties of a BioPerl
> object that are persistent to a column of a table in the database. If
> it can't map a property (for whatever reason) its value is simply
> always undef (or NULL in SQL). I.e., NULL columns are the placeholder
> for a column that failed to be mapped to a property. You can't simply
> remove them or all subsequent columns are shifted.

It ran fine without the NULL column, but that isn't necessarily a sign of correctness. My problem was that (as stated above) the old version of DB2 requires you to cast the NULL value to a data type, which I wasn't able to determine from the code. With the new version, it should work, so I'll have to rerun my tests and see if the problem is still there.

I will keep you updated on the Perl issues and hope to have some useful results by the end of the week.
And I hope you excuse me for posting things here that are hardly related to BioPerl, but some of the problems are a complex entanglement of issues with BioSQL, BioPerl and BioJava, so it's hard to decide where to post it ;-)

Regards,
Florian

From biopython at maubp.freeserve.co.uk Mon Jul 6 12:19:54 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 6 Jul 2009 17:19:54 +0100
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907061808.18651.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <5EC3CB83-22AD-4C79-9F6C-047ED58B7962@gmx.net> <200907061808.18651.florian.mittag@uni-tuebingen.de>
Message-ID: <320fb6e00907060919w1ce69284r30fede63ec05adbb@mail.gmail.com>

On BioPerl-l, July 2009, Florian Mittag wrote:
> ...
> Okay, I'll do that, but that will take some time and I'll probably turn to
> this mailings for further assistance with more specific questions.
> ...
> I will keep you updated on the Perl issues and hope to have some useful
> results by the end of the week. And I hope you excuse me for posting things
> here that are hardly related to BioPerl, but the some problems are a complex
> entanglement of issues with BioSQL, BioPerl and BioJava, so it's hard to
> decide where to post it ;-)

You may want to cross post some things (e.g. the hibernate issue to BioSQL and BioJava lists). I've CC'd this reply to BioSQL-l for example. I think some guidance from Hilmar on this etiquette would help ;)

I would not expect BioJava people to follow BioPerl-l for example. (Although here I am as a Biopython person keeping an eye on BioPerl-l sometimes). I assume (hope?) that people from all the Bio* projects with BioSQL bindings will be following the BioSQL-l mailing list - so for anything clearly cross project like the schemas themselves, at the very least please CC the BioSQL-l mailing list.
Peter (Biopython)

From cjfields at illinois.edu Mon Jul 6 12:42:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 6 Jul 2009 11:42:10 -0500
Subject: [Bioperl-l] bioperl BLAST question
In-Reply-To: <626933.96171.qm@web27405.mail.ukl.yahoo.com>
References: <626933.96171.qm@web27405.mail.ukl.yahoo.com>
Message-ID: <9441E227-2833-4B3D-AA15-5C5797D7F88F@illinois.edu>

On Jul 6, 2009, at 10:01 AM, Sean ohEigeartaigh wrote:

>
> Hi,
>
> I'm trying to use bioperl to limit the number of BLAST results.
> However, when I use the following bit of code, it limits to less
> than the cutoff number, and excludes BLAST results that are halfway
> up the BLAST results page (without the limit) which it shouldn't
> exclude.
>
> $blast = Bio::Tools::Run::StandAloneBlast
> ->new(program => 'tblastn', database =>$blastdb, b =>100, v
> =>100, F=>$fil, outfile=>$out)
> ->blastall($seq1);
> }
>
> Using this bit of code, I get 60 results for my query (out of 173
> with no hit limit and an e-value cutoff of e=10). If I use b =>150,
> v=>150, I get 85 results, and some BLAST results appear halfway up
> the results page. In other words, the limit seems to be removing
> results at random throughout the file, and is also not giving me
> enough results.

The problem is that we can't adequately diagnose this from the script segment alone, without an example report and a description of what you expect. The best way to handle this is to file a bug report so we can look things over:

http://www.bioperl.org/wiki/Bugs

> Am I using the b and v parameters (to limit blast results and blast
> one-line summaries) incorrectly?
> Thanks very much for your help,
> Seán Ó hÉigeartaigh

chris

From stevey_mac2k2 at hotmail.com Mon Jul 6 14:31:26 2009
From: stevey_mac2k2 at hotmail.com (stephenmcgowan1)
Date: Mon, 6 Jul 2009 11:31:26 -0700 (PDT)
Subject: [Bioperl-l] Bioperl Installation
Message-ID: <24360594.post@talk.nabble.com>

Hi,

I seem to be having trouble with Installing Bioperl 1.6 in CPAN.
I have attached a log of the install, i just can't see why it seems to be falling over. Thanks, Stephen http://www.nabble.com/file/p24360594/BioPerl%2BInstall.rtf BioPerl+Install.rtf http://www.nabble.com/file/p24360594/BioPerl%2BInstall.doc BioPerl+Install.doc -- View this message in context: http://www.nabble.com/Bioperl-Installation-tp24360594p24360594.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From David.Messina at sbc.su.se Mon Jul 6 15:38:09 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 21:38:09 +0200 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: <24360594.post@talk.nabble.com> References: <24360594.post@talk.nabble.com> Message-ID: <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> Hi Stephen, This is on a Mac, correct? You need to install the developer tools first. The key line in your log is: Can't test without successful make Admittedly, that's cryptic. What it means is that it needs the program called make. That program is installed when you install the developer tools. Go to developer.apple.com and create an account if you don't already have one. Go to the Mac Dev Center, and click on "Xcode 3". This should be the right link: Xcode 3 You'll need to login to get to it, and then you'll get to the download page for the massive 986 MB Xcode 3.1.3 download. After you run the Xcode installer, you can check in Terminal that you've got 'make' installed by typing: which make on the command line. It should give you the answer make is /usr/bin/make If it does, then you're good to try again with the bioperl install. 
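
A minimal shell sketch of the check described above, assuming a POSIX shell. The helper name `have_tool` is hypothetical and is not part of CPAN, Xcode, or BioPerl; it only reports whether a command such as make can be found on the PATH before the install is retried.

```shell
#!/bin/sh
# Hypothetical helper (not part of CPAN or BioPerl): report whether a
# build tool is available on PATH before starting a CPAN install.
have_tool() {
    # command -v is the POSIX way to look up a command; it prints the
    # resolved path and returns non-zero when the tool is missing.
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
        return 0
    fi
    echo "$1 not found; install the developer tools first" >&2
    return 1
}

# Usage: check for make, and say so if it is absent.
have_tool make || echo "make is missing"
```

`command -v` is used here rather than `which` because it is specified by POSIX and its exit status can be relied on in scripts.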
Dave From Kevin.M.Brown at asu.edu Mon Jul 6 15:28:48 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Mon, 6 Jul 2009 12:28:48 -0700 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: <24360594.post@talk.nabble.com> References: <24360594.post@talk.nabble.com> Message-ID: <1A4207F8295607498283FE9E93B775B406130F1C@EX02.asurite.ad.asu.edu> Well, without the error messages that should have been printed out not sure how much help we can be. No idea what OS you're running, perl version, etc... > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > stephenmcgowan1 > Sent: Monday, July 06, 2009 11:31 AM > To: Bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bioperl Installation > > > Hi, > > I seem to be having trouble with Installing Bioperl 1.6 in CPAN. > > I have attached a log of the install, i just can't see why it > seems to be > falling over. > > Thanks, > > Stephen > > http://www.nabble.com/file/p24360594/BioPerl%2BInstall.rtf > BioPerl+Install.rtf > http://www.nabble.com/file/p24360594/BioPerl%2BInstall.doc > BioPerl+Install.doc > -- > View this message in context: > http://www.nabble.com/Bioperl-Installation-tp24360594p24360594.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cain.cshl at gmail.com Mon Jul 6 15:48:21 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Mon, 6 Jul 2009 15:48:21 -0400 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> Message-ID: <72D5229E-A0A8-480A-AC87-6CEE0F1067B7@gmail.com> After you get make installed, you may need to reconfigure cpan so it knows where to find it. Do this: sudo cpan (Assuming you want the libraries installed in the system paths) cpan> o conf init You can probably answer yes to the "do you want me to automatically configure" question, and it should sense that make is now present. If not, do it again and answer "no" and accept all of the defaults until it gets to the part about where make is. Scott On Jul 6, 2009, at 3:38 PM, Dave Messina wrote: > Hi Stephen, > This is on a Mac, correct? > > You need to install the developer tools first. The key line in your > log is: > > Can't test without successful make > > > Admittedly, that's cryptic. What it means is that it needs the program > called make. That program is installed when you install the > developer tools. > > > Go to developer.apple.com and create an account if you don't already > have > one. > > > Go to the Mac Dev Center, and click on "Xcode 3". > > > This should be the right link: > > Xcode 3 > > > You'll need to login to get to it, and then you'll get to the > download page > for the massive 986 MB Xcode 3.1.3 download. > > After you run the Xcode installer, you can check in Terminal that > you've got > 'make' installed by typing: > > which make > > on the command line. 
It should give you the answer > make is /usr/bin/make > > If it does, then you're good to try again with the bioperl install. > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From scott at scottcain.net Mon Jul 6 16:16:49 2009 From: scott at scottcain.net (Scott Cain) Date: Mon, 6 Jul 2009 16:16:49 -0400 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <72D5229E-A0A8-480A-AC87-6CEE0F1067B7@gmail.com> Message-ID: <724E02A7-2A73-422D-9A33-FF945547A777@scottcain.net> Always, always, always reply to the list, as the original author of the email that you are replying to doesn't always know the answer, like now. I don't recall how I installed libxslt. In fact, I don't remember doing it. Have you searched your hard drive? I think it gets installed with the developers tools. Scott On Jul 6, 2009, at 4:01 PM, Steven McGowan wrote: > Hi Scott, > > I removed my previous install with rm -rf ~/.cpan/build/* > > I've tried re-installing (install C/CJ/CJFIELDS/BioPerl- > db-1.6.0.tar.gz), and upon installing have noticed the error: > > Checking prerequisites... > - ERROR: Data::Stag is not installed > > so i have then quit out of the install, and entered "install > Data::Stag" in>CPAN > > but receive the following error messages: > > External Module XML::LibXSLT, XSLT, > is not installed on this computer. > Data::Stag::XSLTHandler in Data::Stag needs it for XSLT > Transformations > > External Module XML::Parser::PerlSAX, SAX Handler, > is not installed on this computer. 
> Data::Stag::XMLParser in Data::Stag needs it for parsing XML > > External Module GD, Graphical Drawing Toolkit, > is not installed on this computer. > stag-drawtree.pl in Data::Stag needs it for drawing trees > > External Module Graph::Directed, Generic Graph data stucture and > algorithms, > is not installed on this computer. > Data::Stag::GraphHandler in Data::Stag needs it for transforming > stag trees to graphs > > External Module Tk, Tk, > is not installed on this computer. > stag-view.pl in Data::Stag needs it for tree viewer > > ok so for the C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz install, it's > lacking Data::Stag who's install is lacking the list above. How > would i go about installing the above list? is there an easier way > or something i'm doing wrong? > > Thanks, > > Stephen > > > From: cain.cshl at gmail.com > > To: David.Messina at sbc.su.se > > Subject: Re: [Bioperl-l] Bioperl Installation > > Date: Mon, 6 Jul 2009 15:48:21 -0400 > > CC: stevey_mac2k2 at hotmail.com; Bioperl-l at lists.open-bio.org > > > > After you get make installed, you may need to reconfigure cpan so it > > knows where to find it. Do this: > > > > sudo cpan > > > > (Assuming you want the libraries installed in the system paths) > > > > cpan> o conf init > > > > You can probably answer yes to the "do you want me to automatically > > configure" question, and it should sense that make is now present. > If > > not, do it again and answer "no" and accept all of the defaults > until > > it gets to the part about where make is. > > > > Scott > > > > On Jul 6, 2009, at 3:38 PM, Dave Messina wrote: > > > >> Hi Stephen, > >> This is on a Mac, correct? > >> > >> You need to install the developer tools first. The key line in your > >> log is: > >> > >> Can't test without successful make > >> > >> > >> Admittedly, that's cryptic. What it means is that it needs the > program > >> called make. That program is installed when you install the > >> developer tools. 
> >> > >> > >> Go to developer.apple.com and create an account if you don't > already > >> have > >> one. > >> > >> > >> Go to the Mac Dev Center, and click on "Xcode 3". > >> > >> > >> This should be the right link: > >> > >> Xcode 3 >>> > >> > >> You'll need to login to get to it, and then you'll get to the > >> download page > >> for the massive 986 MB Xcode 3.1.3 download. > >> > >> After you run the Xcode installer, you can check in Terminal that > >> you've got > >> 'make' installed by typing: > >> > >> which make > >> > >> on the command line. It should give you the answer > >> make is /usr/bin/make > >> > >> If it does, then you're good to try again with the bioperl install. > >> > >> Dave > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > ----------------------------------------------------------------------- > > Scott Cain, Ph. D. scott at scottcain dot net > > GMOD Coordinator (http://gmod.org/) 216-392-3087 > > Ontario Institute for Cancer Research

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From David.Messina at sbc.su.se Mon Jul 6 16:26:55 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 6 Jul 2009 22:26:55 +0200
Subject: [Bioperl-l] Fwd: Bioperl Installation
In-Reply-To:
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com>
Message-ID: <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>

Hi Steven,
Forwarding this to the list so that everyone can follow along...please keep the list on any replies.
Don't quit out of the install -- cpan can automatically detect required dependencies and will try to install them first. Amidst all of the kerfuffle in your previous install there was this bit: ---- Unsatisfied dependencies detected during [C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz] ----- Test::Harness Data::Stag CPAN Shall I follow them and prepend them to the queue of modules we are processing right now? [yes] So if you go back into cpan, try the Bioperl-1.6 install again, you should be prompted again about those missing dependencies. A note to the Bioperl core-devs: Data::Stag seems to have a couple of tricky dependencies of its own, namely GD and Tk, and it looks like they're for a couple of included scripts which I'm guessing Bioperl doesn't use. Perhaps we should send a request to the Data::Stag author to make GD and Tk optional instead of required? Dave ---------- Forwarded message ---------- From: Steven McGowan Date: Mon, Jul 6, 2009 at 22:02 Subject: RE: [Bioperl-l] Bioperl Installation To: david.messina at sbc.su.se Hi Dave, I managed to sort it and have had a go at installing: (install C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz), but upon installing have noticed the error: Checking prerequisites... - ERROR: Data::Stag is not installed so i have then quit out of the install, and entered "install Data::Stag" in>CPAN but receive the following error messages: External Module XML::LibXSLT, XSLT, is not installed on this computer. Data::Stag::XSLTHandler in Data::Stag needs it for XSLT Transformations External Module XML::Parser::PerlSAX, SAX Handler, is not installed on this computer. Data::Stag::XMLParser in Data::Stag needs it for parsing XML External Module GD, Graphical Drawing Toolkit, is not installed on this computer. stag-drawtree.pl in Data::Stag needs it for drawing trees External Module Graph::Directed, Generic Graph data stucture and algorithms, is not installed on this computer. 
Data::Stag::GraphHandler in Data::Stag needs it for transforming stag trees to graphs

External Module Tk, Tk,
is not installed on this computer.
stag-view.pl in Data::Stag needs it for tree viewer

ok so for the C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz install, it's lacking Data::Stag who's install is lacking the list above. How would i go about installing the above list? is there an easier way or something i'm doing wrong?

Thanks,

Stephen

------------------------------
From: David.Messina at sbc.su.se
Date: Mon, 6 Jul 2009 21:38:09 +0200
Subject: Re: [Bioperl-l] Bioperl Installation
To: stevey_mac2k2 at hotmail.com
CC: Bioperl-l at lists.open-bio.org

Hi Stephen,
This is on a Mac, correct?

You need to install the developer tools first. The key line in your log is:

Can't test without successful make

Admittedly, that's cryptic. What it means is that it needs the program called make. That program is installed when you install the developer tools.

Go to developer.apple.com and create an account if you don't already have one.

Go to the Mac Dev Center, and click on "Xcode 3".

This should be the right link:

Xcode 3

You'll need to login to get to it, and then you'll get to the download page for the massive 986 MB Xcode 3.1.3 download.

After you run the Xcode installer, you can check in Terminal that you've got 'make' installed by typing:

which make

on the command line. It should give you the answer
make is /usr/bin/make

If it does, then you're good to try again with the bioperl install.

Dave
From stevey_mac2k2 at hotmail.com Mon Jul 6 16:19:37 2009
From: stevey_mac2k2 at hotmail.com (Steven McGowan)
Date: Mon, 6 Jul 2009 20:19:37 +0000
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <72D5229E-A0A8-480A-AC87-6CEE0F1067B7@gmail.com>
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <72D5229E-A0A8-480A-AC87-6CEE0F1067B7@gmail.com>
Message-ID:

I managed to sort it and have had a go at installing: (install C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz), but upon installing have noticed the error:

Checking prerequisites...
 - ERROR: Data::Stag is not installed

so i have then quit out of the install, and entered "install Data::Stag" in>CPAN

but receive the following error messages:

External Module XML::LibXSLT, XSLT,
 is not installed on this computer.
 Data::Stag::XSLTHandler in Data::Stag needs it for XSLT Transformations

External Module XML::Parser::PerlSAX, SAX Handler,
 is not installed on this computer.
 Data::Stag::XMLParser in Data::Stag needs it for parsing XML

External Module GD, Graphical Drawing Toolkit,
 is not installed on this computer.
 stag-drawtree.pl in Data::Stag needs it for drawing trees

External Module Graph::Directed, Generic Graph data stucture and algorithms,
 is not installed on this computer.
 Data::Stag::GraphHandler in Data::Stag needs it for transforming stag trees to graphs

External Module Tk, Tk,
 is not installed on this computer.
 stag-view.pl in Data::Stag needs it for tree viewer

ok so for the C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz install, it's lacking Data::Stag who's install is lacking the list above. How would i go about installing the above list? is there an easier way or something i'm doing wrong?

Thanks,

Stephen

From David.Messina at sbc.su.se Mon Jul 6 16:47:06 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 6 Jul 2009 22:47:06 +0200
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To:
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
Message-ID: <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com>

> do i now want to Install [a]ll optional external modules, [n]one, or choose
> [i]nteractively? [n]
>
>

Data::Stag, Test::Harness, and CPAN are required, not optional. So I think they'll be installed even if you answer n to the question about the optional external modules.

Dave

From scott at scottcain.net Mon Jul 6 16:50:41 2009
From: scott at scottcain.net (Scott Cain)
Date: Mon, 6 Jul 2009 16:50:41 -0400
Subject: [Bioperl-l] Fwd: Bioperl Installation
In-Reply-To: <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
Message-ID: <536f21b00907061350l1e6882a4v597629ebc1aafc5b@mail.gmail.com>

Hi Dave,

I think you are confused about the prereqs for Data::Stag: I have it installed and working and don't have Tk. cpantesters.org also thinks that IO::String is the only dependency:

http://deps.cpantesters.org/?module=Data::Stag;perl=latest

Scott

On Mon, Jul 6, 2009 at 4:26 PM, Dave Messina wrote:
> Hi Steven,
> Forwarding this to the list so that everyone can follow along...please keep
> the list on any replies.
>
> Don't quit out of the install -- cpan can automatically detect required
> dependencies and will try to install them first.
>
> Amidst all of the kerfuffle in your previous install there was this bit:
>
> ---- Unsatisfied dependencies detected during
> [C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz] -----
>
>     Test::Harness
>
>     Data::Stag
>
>     CPAN
>
> Shall I follow them and prepend them to the queue
>
> of modules we are processing right now? [yes]
>
>
> So if you go back into cpan, try the Bioperl-1.6 install again, you should
> be prompted again about those missing dependencies.
>
>
> A note to the Bioperl core-devs:
>
> Data::Stag seems to have a couple of tricky dependencies of its own, namely
> GD and Tk, and it looks like they're for a couple of included scripts which
> I'm guessing Bioperl doesn't use.
>
> Perhaps we should send a request to the Data::Stag author to make GD and Tk
> optional instead of required?
>
>
> Dave
>
>
> ---------- Forwarded message ----------
> From: Steven McGowan
> Date: Mon, Jul 6, 2009 at 22:02
> Subject: RE: [Bioperl-l] Bioperl Installation
> To: david.messina at sbc.su.se
>
>
> Hi Dave,
>
> I managed to sort it and have had a go at
> installing: (install C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz), but upon
> installing have noticed the error:
>
> Checking prerequisites...
>  - ERROR: Data::Stag is not installed
>
> so i have then quit out of the install, and entered "install Data::Stag"
> in>CPAN
>
> but receive the following error messages:
>
> External Module XML::LibXSLT, XSLT,
>  is not installed on this computer.
>  Data::Stag::XSLTHandler in Data::Stag needs it for XSLT Transformations
>
> External Module XML::Parser::PerlSAX, SAX Handler,
>  is not installed on this computer.
>  Data::Stag::XMLParser in Data::Stag needs it for parsing XML
>
> External Module GD, Graphical Drawing Toolkit,
>  is not installed on this computer.
>  stag-drawtree.pl in Data::Stag needs it for drawing trees
>
> External Module Graph::Directed, Generic Graph data stucture and algorithms,
>  is not installed on this computer.
>  Data::Stag::GraphHandler in Data::Stag needs it for transforming stag
> trees to graphs
>
> External Module Tk, Tk,
>  is not installed on this computer.
>  stag-view.pl in Data::Stag needs it for tree viewer
>
> ok so for the C/CJ/CJFIELDS/BioPerl-db-1.6.0.tar.gz install, it's lacking
> Data::Stag who's install is lacking the list above. How would i go about
> installing the above list? is there an easier way or something i'm doing
> wrong?
>
> Thanks,
>
> Stephen
>
> ------------------------------
> From: David.Messina at sbc.su.se
> Date: Mon, 6 Jul 2009 21:38:09 +0200
> Subject: Re: [Bioperl-l] Bioperl Installation
> To: stevey_mac2k2 at hotmail.com
> CC: Bioperl-l at lists.open-bio.org
>
>
> Hi Stephen,
> This is on a Mac, correct?
>
> You need to install the developer tools first. The key line in your log is:
>
>  Can't test without successful make
>
>
> Admittedly, that's cryptic. What it means is that it needs the program
> called make. That program is installed when you install the developer tools.
>
>
> Go to developer.apple.com and create an account if you don't already have
> one.
>
>
> Go to the Mac Dev Center, and click on "Xcode 3".
>
>
> This should be the right link:
>
> Xcode 3
>
> You'll need to login to get to it, and then you'll get to the download page
> for the massive 986 MB Xcode 3.1.3 download.
>
> After you run the Xcode installer, you can check in Terminal that you've got
> 'make' installed by typing:
>
> which make
>
> on the command line. It should give you the answer
> make is /usr/bin/make
>
> If it does, then you're good to try again with the bioperl install.
>
> Dave
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
------------------------------------------------------------------------
Scott Cain, Ph. D.
scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research

From Russell.Smithies at agresearch.co.nz Mon Jul 6 16:56:41 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 7 Jul 2009 08:56:41 +1200
Subject: [Bioperl-l] different results with remote-blast skript
In-Reply-To: <46A05E0132144D73A0F805953B580B2F@jonas>
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz>

Hi Jonas,

You can't just play with the BLAST parameters and hope for a "better" result.
I'd suggest that if you aren't sure what they do, you should leave them alone, as small changes can make huge differences in the output - it's quite possible to miss what you're looking for by using the wrong parameters.

If all else fails, read the blast manual:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html
http://www.ncbi.nlm.nih.gov/blast/tutorial/

Or read Ian Korf's excellent book:
http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3

Don't worry about the integer overflow bug, as there's nothing you can do about it. (It is why the report shows a negative "Number of letters in database": the real count, a little over 3.1 billion, is more than a signed 32-bit integer can hold.) If you're interested, Google and Wikipedia are your friends:
http://en.wikipedia.org/wiki/Integer_overflow

Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
> Sent: Tuesday, 7 July 2009 12:14 a.m.
> To: BioPerl List; Chris Fields
> Subject: Re: [Bioperl-l] different results with remote-blast skript
>
> Hi guys, thanks for your answers so far.
> @jason: integer overflow in blast.... sorry, but what do you mean by that?
> how can I fix it...?
>
> Since I never really changed any parameters I thought them all to be default.
> whatever, I tried to get "better" results with my prog by changing > these: > $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1'; > with no effect...I guess these were default values anyway. > > So please maybe you can tell me all the other parameters I can change with my > perl-skript AND how to do that? > Unfortunately both, perl and the blast-algorithm are pretty much new to me, > maybe thats why I just cannot find out how to do that on my own... :/ > > Here is the output I get with my remote-blast skript: > ############################################################################## > ################################### > Query Name: > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > L > hit name is ref|XP_001702807.1| > score is 442 > BLASTP 2.2.21+ > Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped > BLAST and PSI-BLAST: a new generation of protein database search programs", > Nucleic Acids Res. 25:3389-3402. > > > Reference for composition-based statistics: Alejandro A. > Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri > I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the > accuracy of PSI-BLAST protein database searches with composition-based > statistics and other refinements", Nucleic Acids Res. 29:2994-3005. 
> > > RID: 53STX5G2013 > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > 9,252,587 sequences; 3,169,972,781 total letters Query= > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM > ATGPDPDDEYE > Length=150 > > > Score > E > Sequences producing significant alignments: (Bits) > Value > > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 174 > 2e-42 > > > ALIGNMENTS > >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] > gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > Length=303 > > Score = 174 bits (442), Expect = 2e-42, Method: Composition-based stats. > Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 (0%) > > Query 1 MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds 60 > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > Sbjct 154 MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > 213 > > Query 61 dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > 120 > DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > Sbjct 214 DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > 273 > > Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > AWHERDDNAFRQAHQNTAMATGPDPDDEYE > Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > > > > Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF > excluding environmental samples from WGS projects > Posted date: Jul 5, 2009 4:41 AM > Number of letters in database: -1,124,994,511 > Number of sequences in database: 9,252,587 > > Lambda K H > 0.309 0.122 0.345 > Gapped > Lambda K H > 0.267 0.0410 0.140 > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Sequences: 9252587 > Number of Hits to DB: 60273703 > Number of extensions: 1448367 > Number of successful extensions: 2103 > Number of sequences 
better than 10: 0 > Number of HSP's better than 10 without gapping: 0 > Number of HSP's gapped: 2113 > Number of HSP's successfully gapped: 0 > Length of query: 150 > Length of database: 3169972781 > Length adjustment: 113 > Effective length of query: 37 > Effective length of database: 2124430450 > Effective search space: 78603926650 > Effective search space used: 78603926650 > T: 11 > A: 40 > X1: 16 (7.1 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 42 (20.8 bits) > S2: 74 (33.1 bits) > > ############################################################################## > ################################### > and here are the hits (?) of the blast-algorithm on the ncbi-homepage with > the same query of course: > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 300 > 3e-80 > ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... 36.2 > 1.1 > ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... 35.4 > 1.8 > ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... 34.3 > 4.2 > ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... 33.5 > 6.0 > ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... 33.5 > 7.0 > ############################################################################## > ###################################at > least the first hit is the same, but even there there is a different score > and e-value. > > thanks so much for any help :) > regards, jonas > > > ----- Original Message ----- > From: "Chris Fields" > To: "Jason Stajich" > Cc: "Smithies, Russell" ; "'BioPerl > List'" ; "'Jonas Schaer'" > > Sent: Monday, July 06, 2009 12:51 AM > Subject: Re: [Bioperl-l] different results with remote-blast skript > > > > That inspires confidence ;> > > > > chris > > > > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > > > >> integer overflow in blast.... > >> > >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >> > >>> I'd guess it's a difference in the parameters used. 
> >>> Interesting that both have the number of letters in the db as > >>> "-1,125,070,205", I assume that's a bug :-) > >>> > >>> Stats from your remote_blast: > >>> > >>> 'stats' => { > >>> 'S1' => '42', > >>> 'S1_bits' => '20.8', > >>> 'lambda' => '0.309', > >>> 'entropy' => '0.345', > >>> 'kappa_gapped' => '0.0410', > >>> 'T' => '11', > >>> 'kappa' => '0.122', > >>> 'X3_bits' => '24.7', > >>> 'X1' => '16', > >>> 'lambda_gapped' => '0.267', > >>> 'X2' => '38', > >>> 'S2' => '74', > >>> 'seqs_better_than_cutoff' => '0', > >>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>> 'Hits_to_DB' => '60102303', > >>> 'dbletters' => '-1125070205', > >>> 'A' => '40', > >>> 'num_successful_extensions' => '2004', > >>> 'num_extensions' => '1436892', > >>> 'X1_bits' => '7.1', > >>> 'X3' => '64', > >>> 'entropy_gapped' => '0.140', > >>> 'dbentries' => '9252258', > >>> 'X2_bits' => '14.6', > >>> 'S2_bits' => '33.1' > >>> } > >>> > >>> > >>> Stats from a blast done on the NCBI webpage: > >>> > >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt > >>> +PIR+PRF > >>> excluding environmental samples from WGS projects > >>> Posted date: Jul 4, 2009 4:41 AM > >>> Number of letters in database: -1,125,070,205 > >>> Number of sequences in database: 9,252,258 > >>> > >>> Lambda K H > >>> 0.309 0.124 0.340 > >>> Gapped > >>> Lambda K H > >>> 0.267 0.0410 0.140 > >>> Matrix: BLOSUM62 > >>> Gap Penalties: Existence: 11, Extension: 1 > >>> Number of Sequences: 9252258 > >>> Number of Hits to DB: 86493230 > >>> Number of extensions: 3101413 > >>> Number of successful extensions: 9001 > >>> Number of sequences better than 100: 65 > >>> Number of HSP's better than 100 without gapping: 0 > >>> Number of HSP's gapped: 9000 > >>> Number of HSP's successfully gapped: 66 > >>> Length of query: 150 > >>> Length of database: 3169897087 > >>> Length adjustment: 113 > >>> Effective length of query: 37 > >>> Effective length of database: 2124391933 > >>> Effective search space: 78602501521 
> >>> Effective search space used: 78602501521
> >>> T: 11
> >>> A: 40
> >>> X1: 16 (7.1 bits)
> >>> X2: 38 (14.6 bits)
> >>> X3: 64 (24.7 bits)
> >>> S1: 42 (20.8 bits)
> >>> S2: 65 (29.6 bits)
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
> >>>> Sent: Sunday, 28 June 2009 10:15 p.m.
> >>>> To: BioPerl List
> >>>> Subject: [Bioperl-l] different results with remote-blast skript
> >>>>
> >>>> Hi again :)
> >>>> please, I only have this little question:
> >>>> why do I get different results with my remote::blast perl skript then on the
> >>>> ncbi blast homepage?
> >>>> I am using blastp, the query is an amino-sequence (different results with any
> >>>> sequence, differences not only in number of hits but even in e-values, scores
> >>>> etc...), the database is 'nr'.
> >>>> PLEASE help me,
> >>>> thank you in advance,
> >>>> Jonas
> >>>>
> >>>> ps: my skript:
> >>>> ################################################################################
> >>>> use Bio::Seq::SeqFactory;
> >>>> use Bio::Tools::Run::RemoteBlast;
> >>>> use strict;
> >>>> my @blast_report;
> >>>> my $prog = 'blastp';
> >>>> my $db = 'nr';
> >>>> my $e_val= '1e-10';
> >>>> #my $e_val= '10';
> >>>> my @params = ( '-prog' => $prog,
> >>>>                '-data' => $db,
> >>>>                '-expect' => $e_val,
> >>>>                '-readmethod' => 'SearchIO' );
> >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
> >>>>
> >>>> my $blast_seq ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
> >>>> #$v is just to turn on and off the messages
> >>>> my $v = 1;
> >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
> >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
> >>>> my $filename='temp2.out';
> >>>> my $r = $factory->submit_blast($seq);
> >>>> print STDERR "waiting..." if( $v > 0 );
> >>>> while ( my @rids = $factory->each_rid )
> >>>> {
> >>>>     foreach my $rid ( @rids )
> >>>>     {
> >>>>         my $rc = $factory->retrieve_blast($rid);
> >>>>         if( !ref($rc) )
> >>>>         {
> >>>>             if( $rc < 0 )
> >>>>             {
> >>>>                 $factory->remove_rid($rid);
> >>>>             }
> >>>>             print STDERR "." if ( $v > 0 );
> >>>>         }
> >>>>         else
> >>>>         {
> >>>>             my $result = $rc->next_result();
> >>>>             $factory->save_output($filename);
> >>>>             $factory->remove_rid($rid);
> >>>>             print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>             while ( my $hit = $result->next_hit )
> >>>>             {
> >>>>                 next unless ( $v > 0);
> >>>>                 print "\thit name is ", $hit->name, "\n";
> >>>>                 while( my $hsp = $hit->next_hsp )
> >>>>                 {
> >>>>                     print "\t\tscore is ", $hsp->score, "\n";
> >>>>                 }
> >>>>             }
> >>>>         }
> >>>>     }
> >>>>
> >>>>
> >>>> }
> >>>> @blast_report = get_file_data ($filename);
> >>>> return @blast_report;
> >>>> ################################################################################
> >>
> >> --
> >> Jason Stajich
> >> jason at bioperl.org

From stevey_mac2k2 at hotmail.com Mon Jul 6 16:39:08 2009
From: stevey_mac2k2 at hotmail.com (Steven McGowan)
Date: Mon, 6 Jul 2009 20:39:08 +0000
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
Message-ID:

I have since initialised the install and receive the message:

Checking prerequisites...
 - ERROR: Data::Stag is not installed
   (I think I'm being run by CPAN, so will rely on CPAN to handle prerequisite installation)
   I'll get CPAN to prepend the installation of this
 - ERROR: Test::Harness (2.56) is installed, but we need version>= 2.62
   I'll get CPAN to prepend the installation of this
 - ERROR: CPAN (1.7602) is installed, but we need version>= 1.81
   I'll get CPAN to prepend the installation of this

Install [a]ll optional external modules, [n]one, or choose [i]nteractively? [n]

do i now want to Install [a]ll optional external modules, [n]one, or choose [i]nteractively? [n]
i'm guessing installing all external modules will include Data::Stag?

Stephen

From stevey_mac2k2 at hotmail.com Mon Jul 6 16:52:46 2009
From: stevey_mac2k2 at hotmail.com (Steven McGowan)
Date: Mon, 6 Jul 2009 20:52:46 +0000
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com>
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com>
Message-ID:

ok i've hit [n]

the next bypasses a list of optional prerequisites...apart from:

 * XML::SAX (0.14) is installed, but we prefer to have 0.15
   (wanted for parsing xml, used by Bio::SearchIO::blastxml, Bio::SeqIO::tigrxml and Bio::SeqIO::bsml_sax)

this does not seem to be an optional prerequisite but seems to be bypassed?

and then i receive:

ERRORS/WARNINGS FOUND IN PREREQUISITES. You may wish to install the versions
of the modules indicated above before proceeding with this installation

Checking features:
Network..................enabled
BioDBSeqFeature_mysql....enabled
BioDBGFF.................enabled
BioDBSeqFeature_BDB......enabled

Do you want to run the Bio::DB::GFF or Bio::DB::SeqFeature::Store live database tests?
y/n [n] n
 - will not run the BioDBGFF or BioDBSeqFeature live database tests

Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively? [a]

From cjfields at illinois.edu Mon Jul 6 17:04:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 6 Jul 2009 16:04:29 -0500
Subject: [Bioperl-l] Fwd: Bioperl Installation
In-Reply-To: <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com>
Message-ID:

(cc'ing Chris M about this)

The Tk and GD dependencies should probably be optional if they are only required for the Data::Stag scripts. As for the libxslt, I'm not sure but I believe that's available with the dev kit; if not it's available via fink/macports.

The inclusion of those as a requirement is a bit troubling for me, but this is the first time I've seen issues with it pop up.

chris

On Jul 6, 2009, at 3:26 PM, Dave Messina wrote:

> Hi Steven,
> Forwarding this to the list so that everyone can follow along...please keep
> the list on any replies.
>
> Don't quit out of the install -- cpan can automatically detect required
> dependencies and will try to install them first.
From cjfields at illinois.edu Mon Jul 6 17:06:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 6 Jul 2009 16:06:56 -0500
Subject: [Bioperl-l] Fwd: Bioperl Installation
In-Reply-To: <536f21b00907061350l1e6882a4v597629ebc1aafc5b@mail.gmail.com>
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <536f21b00907061350l1e6882a4v597629ebc1aafc5b@mail.gmail.com>
Message-ID:

Okay, that makes more sense to me (and also makes sense looking at the Data::Stag makefile). None of those additional modules are required for bioperl core functionality.

chris

On Jul 6, 2009, at 3:50 PM, Scott Cain wrote:

> Hi Dave,
>
> I think you are confused about the prereqs for Data::Stag: I have it
> installed and working and don't have Tk. cpantesters.org also thinks
> that IO::String is the only dependency:
>
> http://deps.cpantesters.org/?module=Data::Stag;perl=latest
>
> Scott
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.
scott at > scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Mon Jul 6 17:09:05 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 23:09:05 +0200 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com> Message-ID: <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com> > > * XML::SAX (0.14) is installed, but we prefer to have 0.15 > (wanted for parsing xml, used by Bio::SearchIO::blastxml, > Bio::SeqIO::tigrxml and Bio::SeqIO::bsml_sax) > > [snip] > > ERRORS/WARNINGS FOUND IN PREREQUISITES. You may wish to install the > versions > of the modules indicated above before proceeding with this installation > This "error/warning" refers to XML::SAX. I'm pretty sure that's optional. Not exactly sure why it's getting called out specifically here, but I think you can safely ignore it. Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively? [a] > > You went ahead and answered this question, right? The installation should have started at this point. 
D From David.Messina at sbc.su.se Mon Jul 6 17:16:00 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 23:16:00 +0200 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com> <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com> Message-ID: <628aabb70907061416y1f8eb5d6j2da372d115456ffa@mail.gmail.com> > > I'm just hanging on the question at the moment.. not sure whether to > install all [a] or [n]one. I'm probably going to go with [a]ll > Yes, you'll probably want all the scripts. From David.Messina at sbc.su.se Mon Jul 6 17:29:21 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 6 Jul 2009 23:29:21 +0200 Subject: [Bioperl-l] Bioperl Installation In-Reply-To: References: <24360594.post@talk.nabble.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com> <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com> <628aabb70907061416y1f8eb5d6j2da372d115456ffa@mail.gmail.com> Message-ID: <628aabb70907061429h1d10623dxdca6239ecfe66c29@mail.gmail.com> Did you confirm that make is available to cpan before you started, by following Scott's earlier instructions? 
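The question above, whether make is actually available before cpan is started, can be checked up front. What follows is only a minimal sketch, assuming a POSIX shell; `check_tool` is a hypothetical helper name and is not part of cpan or BioPerl.

```shell
#!/bin/sh
# Report whether a build tool is on the PATH before starting a CPAN install.
# check_tool is a hypothetical helper name, not part of cpan or BioPerl.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 at $(command -v "$1")"
        return 0
    fi
    echo "missing: $1 (on a Mac, install the developer tools first)"
    return 1
}

# make is the tool the failing log lines complain about.
check_tool make || echo "Install make before retrying the BioPerl install."
```

If the tool is reported missing, the "Can't test without successful make" lines in the cpan log are the expected symptom, and retrying the install will not help until make is present.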
From stevey_mac2k2 at hotmail.com Mon Jul 6 17:11:14 2009
From: stevey_mac2k2 at hotmail.com (Steven McGowan)
Date: Mon, 6 Jul 2009 21:11:14 +0000
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com>
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com> <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com>
Message-ID:

I'm just hanging on the question at the moment.. not sure whether to
install all [a] or [n]one. I'm probably going to go with [a]ll
From stevey_mac2k2 at hotmail.com Mon Jul 6 17:23:41 2009
From: stevey_mac2k2 at hotmail.com (Steven McGowan)
Date: Mon, 6 Jul 2009 21:23:41 +0000
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <628aabb70907061416y1f8eb5d6j2da372d115456ffa@mail.gmail.com>
References: <24360594.post@talk.nabble.com> <628aabb70907061238k4ff8cb97o921bab05290575c8@mail.gmail.com> <628aabb70907061326l46acee4fp65232f0765476159@mail.gmail.com> <628aabb70907061347p7df38e13yb71a9c66b8c20318@mail.gmail.com> <628aabb70907061409h709d7a12hf2fc06274fb87686@mail.gmail.com> <628aabb70907061416y1f8eb5d6j2da372d115456ffa@mail.gmail.com>
Message-ID:

ok... after choosing to install all scripts, i receive:

Creating new 'Build' script for 'BioPerl' version '1.006000'
Warning: PREREQ_PM mentions Test::Harness more than once, last mention wins at /System/Library/Perl/5.8.8/CPAN.pm line 4689, line 1.
Warning: PREREQ_PM mentions CPAN more than once, last mention wins at /System/Library/Perl/5.8.8/CPAN.pm line 4689, line 1.
---- Unsatisfied dependencies detected during [C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz] -----
    Test::Harness
    Data::Stag
    CPAN
Shall I follow them and prepend them to the queue
of modules we are processing right now? [yes] y

-------------TEST HARNESS-------------

Running install for module Test::Harness
CPAN.pm: Going to build A/AN/ANDYA/Test-Harness-3.17.tar.gz
Checking if your kit is complete...
Looks good
Writing Makefile for Test::Harness
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

---------DATA::STAG----------

Running install for module Data::Stag
Running make for C/CM/CMUNGALL/Data-Stag-0.11.tar.gz
CPAN.pm: Going to build C/CM/CMUNGALL/Data-Stag-0.11.tar.gz

External Module XML::LibXSLT, XSLT,
is not installed on this computer.
  Data::Stag::XSLTHandler in Data::Stag needs it for XSLT Transformations

External Module XML::Parser::PerlSAX, SAX Handler,
is not installed on this computer.
  Data::Stag::XMLParser in Data::Stag needs it for parsing XML

External Module GD, Graphical Drawing Toolkit,
is not installed on this computer.
  stag-drawtree.pl in Data::Stag needs it for drawing trees

External Module Graph::Directed, Generic Graph data stucture and algorithms,
is not installed on this computer.
  Data::Stag::GraphHandler in Data::Stag needs it for transforming stag trees to graphs

External Module Tk, Tk,
is not installed on this computer.
  stag-view.pl in Data::Stag needs it for tree viewer

There are some external packages and perl modules, listed above, which
stag uses. This only effects the functionality which is listed above:
the rest of stag will work fine, which includes nearly all of the core
functionality.

Enjoy the rest of stag, which you can use after going 'make install'

Checking if your kit is complete...
Looks good
Writing Makefile for Data
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

-------------CPAN--------------

CPAN.pm: Going to build A/AN/ANDK/CPAN-1.9402.tar.gz
Checking if your kit is complete...
Looks good
Warning: prerequisite File::HomeDir 0.69 not found.
Warning: prerequisite Test::Harness 2.62 not found. We have 2.56.
Writing Makefile for CPAN
---- Unsatisfied dependencies detected during [A/AN/ANDK/CPAN-1.9402.tar.gz] -----
    Test::Harness
    File::HomeDir
Shall I follow them and prepend them to the queue
of modules we are processing right now? [yes] y
Running make test
  Delayed until after prerequisites
Running make install
  Delayed until after prerequisites
Running install for module Test::Harness
Running make for A/AN/ANDYA/Test-Harness-3.17.tar.gz
  Is already unwrapped into directory /Users/stevey_mac2k2/.cpan/build/Test-Harness-3.17
  Has already been processed within this session
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible

-------Further On------->

CPAN.pm: Going to build A/AD/ADAMK/File-HomeDir-0.86.tar.gz
Checking if your kit is complete...
Looks good
Writing Makefile for File::HomeDir
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible
Running make for A/AN/ANDK/CPAN-1.9402.tar.gz
  Is already unwrapped into directory /Users/stevey_mac2k2/.cpan/build/CPAN-1.9402
CPAN.pm: Going to build A/AN/ANDK/CPAN-1.9402.tar.gz
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible
Running make for C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz
  Is already unwrapped into directory /Users/stevey_mac2k2/.cpan/build/BioPerl-1.6.0
CPAN.pm: Going to build C/CJ/CJFIELDS/BioPerl-1.6.0.tar.gz
 -- NOT OK
Running make test
  Can't test without successful make
Running make install
  make had returned bad status, install seems impossible
From rmb32 at cornell.edu Mon Jul 6 14:59:38 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 06 Jul 2009 11:59:38 -0700
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To: <24360594.post@talk.nabble.com>
References: <24360594.post@talk.nabble.com>
Message-ID: <4A52499A.7010208@cornell.edu>

Hi Stephen,

It looks to me like your CPAN installation has gotten a bit confused,
possibly from it getting stopped in the middle of doing something.

Try doing

rm -rf ~/.cpan/build/*

and trying the installation again.

Also, it's usually best to just cut and paste logs like this into the
body of an email, but try to paste only the most relevant parts.

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

stephenmcgowan1 wrote:
> Hi,
>
> I seem to be having trouble with Installing Bioperl 1.6 in CPAN.
>
> I have attached a log of the install, i just can't see why it seems to be
> falling over.
>
> Thanks,
>
> Stephen
>
> http://www.nabble.com/file/p24360594/BioPerl%2BInstall.rtf
> BioPerl+Install.rtf
> http://www.nabble.com/file/p24360594/BioPerl%2BInstall.doc
> BioPerl+Install.doc

From manchunjohn-ma at uiowa.edu Mon Jul 6 18:10:17 2009
From: manchunjohn-ma at uiowa.edu (John M.C. Ma)
Date: Mon, 6 Jul 2009 17:10:17 -0500
Subject: [Bioperl-l] RepeatMasker still did not act upon Bug 2138: Any workarounds?
Message-ID: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com>

We told the guys at RepeatMasker in November 2006 that RM-3.1.6 has a
problem causing Bio::Tools::RepeatMasker to crash (Bug 2138).
As of today, they are at 3.2.8, and the problem is still not fixed.
I don't want my project to be stalled -- any tips for a workaround?
From David.Messina at sbc.su.se Mon Jul 6 18:32:55 2009
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 7 Jul 2009 00:32:55 +0200
Subject: [Bioperl-l] Fwd: Bioperl Installation
Message-ID: <628aabb70907061532t3805e1bar3a02e6328f1f5b6c@mail.gmail.com>

---------- Forwarded message ----------
From: Steven McGowan
Date: Tue, Jul 7, 2009 at 00:01
Subject: RE:
To: david.messina at sbc.su.se

> I think it's done the trick! although if i want to be 100% sure it's
> installed ok is there a command i can type to make sure it's installed ok?

Yep, try this on the command line:

perl -e 'use Bio::SeqIO; print "Success!\n";'

If you see

Success!

then you're good to go.

Now that bioperl is installed, i will now install the bioperl-db-.

Okay.

I'm going to bed now. :)

Thanks for all your time and help Dave.

You're welcome!

Dave

From koenvanderdrift at gmail.com Mon Jul 6 18:41:38 2009
From: koenvanderdrift at gmail.com (Koen van der Drift)
Date: Mon, 6 Jul 2009 18:41:38 -0400
Subject: [Bioperl-l] Bioperl Installation
Message-ID:

Hi,

Installation problems on a Mac seem to be a recurring question on this
mailing list. Just as a reminder, besides CPAN, one of the easiest
ways to install bioperl on a Mac is through fink. The instructions are
available on the bioperl website here:
http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink ,
but are rather hidden.

Maybe http://www.bioperl.org/wiki/Installing_BioPerl can be edited to
state Installing Bioperl for Unix (including Mac OS X)? I don't seem
to have privileges to edit that page, so I'll leave that up to the team.

Also, the file PACKAGES contains a link about installation on Mac OS X
that is *very* outdated. Can this be removed from the package? I think
it only creates confusion.

Cheers,

- Koen.

From bix at sendu.me.uk Mon Jul 6 19:43:23 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 07 Jul 2009 00:43:23 +0100
Subject: [Bioperl-l] RepeatMasker still did not act upon Bug 2138: Any workarounds?
In-Reply-To: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com> References: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com> Message-ID: <4A528C1B.2030506@sendu.me.uk> John M.C. Ma wrote: > We have told the guys at RepeatMasker that RM-3.1.6 have a problem > causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug 2138). > And as of today, they are now at 3.2.8, and the problem is not fixed. > And I don't want my project to be stalled-- any tips for a workaround? Here's my mail to some RepeatMasker devs that they never replied to: ----- Hi, Perhaps you already know about this, but in RepeatMasker 3.1.6 -noint cannot be used because of error 'Unknown option: noint-species'. This is caused by line 1131 having no space after the "-noint". Likewise, -lcambig on 1128 would probably suffer a similar problem. Will this be fixed in the next version, and how often do you release new versions? ----- If it really is the same bug, it should be easy to fix the latest version in the same way yourself. From Russell.Smithies at agresearch.co.nz Mon Jul 6 20:06:54 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Tue, 7 Jul 2009 12:06:54 +1200 Subject: [Bioperl-l] RepeatMasker still did not act upon Bug 2138: Any workarounds? In-Reply-To: <4A528C1B.2030506@sendu.me.uk> References: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com> <4A528C1B.2030506@sendu.me.uk> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32A1B8697D0@exchsth.agresearch.co.nz> Is it the "-noint" bug causing the crash? We had major problems (with version 3.2.8) where it would stack-dump which I worked around by running it with the "-no_is" option so it doesn't check for bacterial insertion elements. We've never had a crash after that :-) Also, it is open-source so you could fix your own copy if you know what the bugs are. 
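The 'Unknown option: noint-species' error described above is the kind of thing a missing space produces when a command line is assembled by string concatenation. The sketch below is an illustration only and is not RepeatMasker's code; `build_buggy` and `build_fixed` are hypothetical names, and `human` is just an example value.

```shell
#!/bin/sh
# Illustration only: this is NOT RepeatMasker's source. It shows how building
# an option string by concatenation fuses two flags when a space is missing.

# Buggy: nothing separates "-noint" from the flag appended after it.
build_buggy() {
    opts="-noint"
    opts="${opts}-species $1"
    echo "$opts"
}

# Fixed: every flag is separated from the previous one by a space.
build_fixed() {
    opts="-noint"
    opts="${opts} -species $1"
    echo "$opts"
}

build_buggy human   # prints: -noint-species human
build_fixed human   # prints: -noint -species human
```

An option parser that receives the first string sees a single unknown flag named noint-species, which matches the wording of the reported error. Whether this is also what triggers Bug 2138 is not established in the thread.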
--Russell

From rmb32 at cornell.edu Mon Jul 6 20:13:06 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 06 Jul 2009 17:13:06 -0700
Subject: [Bioperl-l] RepeatMasker still did not act upon Bug 2138: Any workarounds?
In-Reply-To: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com>
References: <5486b2980907061510ke518009l7d86a92da86975bc@mail.gmail.com>
Message-ID: <4A529312.9040905@cornell.edu>

John M.C. Ma wrote:
> And as of today, they are now at 3.2.8, and the problem is not fixed.
> And I don't want my project to be stalled-- any tips for a workaround?

FORK!

Just kidding. Mostly.

Actually, svn vendor branches or something similar can be a good option
for unpleasant things like this, see
http://svnbook.red-bean.com/en/1.5/svn.advanced.vendorbr.html

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From rhubley at systemsbiology.org Tue Jul 7 09:58:01 2009
From: rhubley at systemsbiology.org (Robert Hubley)
Date: Tue, 07 Jul 2009 06:58:01 -0700
Subject: [Bioperl-l] RepeatMasker
Message-ID: <4A535469.4060603@systemsbiology.org>

This list email as forwarded to us by a colleague. I fixed this bug
awhile back and I just double checked 3.2.8 and don't see any problems
with the options -noint or -lcambig. Could someone help us determine
how this is breaking bio-perl?

Thanks,

-Robert

|We have told the guys at RepeatMasker that RM-3.1.6 have a problem
|causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug 2138).
|And as of today, they are now at 3.2.8, and the problem is not fixed.
|And I don't want my project to be stalled-- any tips for a workaround?
||
||Hi,
||
||Perhaps you already know about this, but in RepeatMasker 3.1.6 -noint
||cannot be used because of error 'Unknown option: noint-species'.
||This is caused by line 1131 having no space after the "-noint".
||Likewise, -lcambig on 1128 would probably suffer a similar problem.
||
||Will this be fixed in the next version, and how often do you release new
||versions?

From manchunjohn-ma at uiowa.edu Tue Jul 7 13:17:40 2009
From: manchunjohn-ma at uiowa.edu (John M.C. Ma)
Date: Tue, 7 Jul 2009 12:17:40 -0500
Subject: [Bioperl-l] RepeatMasker Re: Bioperl-l Digest, Vol 75, Issue 10
Message-ID: <5486b2980907071017o24a6c186paefdef0bcbfe6ecc@mail.gmail.com>

Hi,

Sorry that I thought it was the same as 2138, as I never used -noint.
I used -species and -noisy-- but it does not matter any more. I tried
to run it without parameters and it crashed in the same way as 2138.

John Ma

From cjfields at illinois.edu Tue Jul 7 13:30:23 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Jul 2009 12:30:23 -0500
Subject: [Bioperl-l] YAPC::NA hackathon
Message-ID: <37D8DDC8-358F-4E56-9C49-C21281735A3A@illinois.edu>

On behalf of the bioperl core devs I want to thank the participants of
the YAPC::NA 2009 BioPerl hackathon. Robert Buels, Jay Hannah, and
Bruno Vecchi managed to squash several bugs in the process; Robert
recently merged these back to trunk. Great work!

chris

From cjfields at illinois.edu Tue Jul 7 13:23:56 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 7 Jul 2009 12:23:56 -0500
Subject: [Bioperl-l] RepeatMasker
In-Reply-To: <4A535469.4060603@systemsbiology.org>
References: <4A535469.4060603@systemsbiology.org>
Message-ID: <870B41F0-A31B-44FF-B44F-2957ECDB6F9E@illinois.edu>

Robert, the best way to handle this is to file a bug report indicating
all the specifics as well as some example code demonstrating the
problem.

http://www.bioperl.org/wiki/Bugs

chris

On Jul 7, 2009, at 8:58 AM, Robert Hubley wrote:

> This list email as forwarded to us by a colleague. I fixed this bug
> awhile back and I just double checked 3.2.8 and don't see any
> problems with the options -noint or -lcambig. Could someone help us
> determine how this is breaking bio-perl?
>
> Thanks,
>
> -Robert
>
> |We have told the guys at RepeatMasker that RM-3.1.6 have a problem
> |causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug
> 2138).
> |And as of today, they are now at 3.2.8, and the problem is not fixed.
> |And I don't want my project to be stalled-- any tips for a
> workaround?
> || > ||Hi, > || > ||Perhaps you already know about this, but in RepeatMasker 3.1.6 - > noint ||cannot be used because of error 'Unknown option: noint- > species'. > ||This is caused by line 1131 having no space after the "-noint". || > Likewise, -lcambig on 1128 would probably suffer a similar problem. > || > ||Will this be fixed in the next version, and how often do you > release new ||versions? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Jul 7 13:52:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 7 Jul 2009 12:52:44 -0500 Subject: [Bioperl-l] RepeatMasker In-Reply-To: <4A535469.4060603@systemsbiology.org> References: <4A535469.4060603@systemsbiology.org> Message-ID: <3E4C0788-8B44-4408-BB26-FA9F48133948@illinois.edu> Robert, Sorry about that last post, thought you were reporting a problem not inquiring about one. Here's what we have: http://bugzilla.open-bio.org/show_bug.cgi?id=2138 Not sure but from the last few reports this is still a problem with RepeatMasker and bioperl. I'll try looking into it from our end. chris On Jul 7, 2009, at 8:58 AM, Robert Hubley wrote: > This list email as forwarded to us by a colleague. I fixed this bug > awhile back and I just double checked 3.2.8 and don't see any > problems with the options -noint or -lcambig. Could someone help us > determine how this is breaking bio-perl? > > Thanks, > > -Robert > > |We have told the guys at RepeatMasker that RM-3.1.6 have a problem > |causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug > 2138). > |And as of today, they are now at 3.2.8, and the problem is not fixed. > |And I don't want my project to be stalled-- any tips for a > workaround? > || > ||Hi, > || > ||Perhaps you already know about this, but in RepeatMasker 3.1.6 - > noint ||cannot be used because of error 'Unknown option: noint- > species'. 
> ||This is caused by line 1131 having no space after the "-noint". || > Likewise, -lcambig on 1128 would probably suffer a similar problem. > || > ||Will this be fixed in the next version, and how often do you > release new ||versions? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gowthaman.ramasamy at sbri.org Tue Jul 7 13:59:41 2009 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Tue, 7 Jul 2009 10:59:41 -0700 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... Message-ID: Hi All, I am trying to use bp_genbank2gff.pl script to convert a locally downloaded genbank file. It is throwing stack errors. But, the script works beautifully when I use --accession option to download and convert. Any suggestions? Thanks very much for checking this. the command i use: perl bp_genbank2gff.pl --stdout --file NC_004329.nb and i am getting the following exception message: ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: EMBL stream with no ID. Not embl in my book STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/Root/Root.pm:359 STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/Bio/SeqIO/embl.pm:189 STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/biofetch.pm:163 STACK: bp_genbank2gff.pl:274 ----------------------------------------------------------- Many thanks in advance, Gowtham From cain.cshl at gmail.com Tue Jul 7 15:18:50 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Tue, 7 Jul 2009 15:18:50 -0400 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... 
In-Reply-To:
References:
Message-ID: <26CA9DCC-ABA6-46A9-AE8D-DBD116CFB055@gmail.com>

Hi Gowtham,

I was going to send you an email telling you to complain to the author,
until I realized that it was me :-)

It has been quite a while since I looked at the code for this script,
as the one I typically use these days is bp_genbank2gff3.pl, but I
think I have a "fix". Try changing the name of the file to
NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic
that the script uses to determine if a file is genbank formatted versus
embl (note that the error message says it's not an embl file--that's
why). If that doesn't do it, let me (and the mailing list) know.

Scott

PS: I wasn't really going to tell you to complain to the author
directly; that was just me trying to be funny.

PPS: As another side note, it is fairly funny to me that the code that
this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in the
documentation that it is proof-of-principle and should not be used in
production.

On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote:

>
> Hi All,
> I am trying to use bp_genbank2gff.pl script to convert a locally
> downloaded genbank file. It is throwing stack errors. But, the
> script works beautifully when I use --accession option to download
> and convert.
>
> Any suggestions? Thanks very much for checking this.
>
> the command i use:
> perl bp_genbank2gff.pl --stdout --file NC_004329.nb
>
> and i am getting the following exception message:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: EMBL stream with no ID.
Not embl in my book > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ > Root/Root.pm:359 > STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/Bio/ > SeqIO/embl.pm:189 > STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ > perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ > biofetch.pm:163 > STACK: bp_genbank2gff.pl:274 > ----------------------------------------------------------- > > > Many thanks in advance, > Gowtham > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From gowthaman.ramasamy at sbri.org Tue Jul 7 16:06:16 2009 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Tue, 07 Jul 2009 13:06:16 -0700 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... In-Reply-To: <26CA9DCC-ABA6-46A9-AE8D-DBD116CFB055@gmail.com> Message-ID: Hi Scott, Thanks for the mail. Its funny. Knowing you (at GMOD meeting) I wouldn't mistake it in any other ways. I was rushing back to email the list to tell I got it working. There is NOTHING wrong with script. Its perfectly good. (not many scripts stands the time. This one does). Its the genbank record. Some of the genbank records I tried did produce that error, while others did not. I'll dig into those files to see if anything is obvious (to my eyes) that causes this error. PS1: I tried changing the name, and that did NOT solve the problem. 
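For anyone digging into records that trip this error: instead of relying on the file name, the first non-blank line of the file can be inspected, since GenBank flat files typically open with a LOCUS line and EMBL files with an ID line. The following is only a minimal sketch in plain Perl and assumes nothing about the internals of bp_genbank2gff.pl; guess_flatfile_format is a hypothetical helper, not part of BioPerl, and the sample records are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): guess the flat-file format
# from the first non-blank line of the text rather than from the file name.
sub guess_flatfile_format {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*$/;                  # skip leading blank lines
        return 'genbank' if $line =~ /^LOCUS\s/;   # GenBank records start with LOCUS
        return 'embl'    if $line =~ /^ID\s/;      # EMBL records start with ID
        last;                                      # first real line is neither
    }
    return 'unknown';
}

# Made-up one-line records, for illustration only.
print guess_flatfile_format("LOCUS       EXAMPLE01   100 bp    DNA\n"), "\n";
print guess_flatfile_format("ID   EXAMPLE01; SV 1; linear; genomic DNA\n"), "\n";
```

If the guess is 'genbank' or 'embl', the same string could be passed as the -format argument to Bio::SeqIO->new to see whether the record parses at all outside the script.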
Thanks once again, Gowtham On 7/7/09 12:18 PM, "Scott Cain" wrote: > Hi Gotham, > > I was going to send you an email to complain to the author, until I > realized that it was me :-) > > It has been quite a while since I looked at the code for this script, > as the one I typically use these days is bp_genbank2gff3.pl, but I > think I have a "fix". Try changing the name of the file to > NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic > that the script uses to determine if a file is genbank formated versus > embl (note that the error message says it's not an embl file--that's > why). If that doesn't do it, let me (and the mailing list) know. > > Scott > > PS: I wasn't really going to say to complain to the author directly-- > that was just me trying to be funny. > > PPS: As another side note, it is fairly funny to me that the code that > this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in the > documentation that it is proof-of-principle and should not be used in > production. > > > On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote: > >> >> Hi All, >> I am trying to use bp_genbank2gff.pl script to convert a locally >> downloaded genbank file. It is throwing stack errors. But, the >> script works beautifully when I use --accession option to download >> and convert. >> >> Any suggestions? Thanks very much for checking this. >> >> the command i use: >> perl bp_genbank2gff.pl --stdout --file NC_004329.nb >> >> and i am getting the following exception message: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: EMBL stream with no ID. 
Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ >> Root/Root.pm:359 >> STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/Bio/ >> SeqIO/embl.pm:189 >> STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ >> perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ >> biofetch.pm:163 >> STACK: bp_genbank2gff.pl:274 >> ----------------------------------------------------------- >> >> >> Many thanks in advance, >> Gowtham >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > From gowthaman.ramasamy at sbri.org Tue Jul 7 16:12:51 2009 From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy) Date: Tue, 07 Jul 2009 13:12:51 -0700 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... In-Reply-To: <26CA9DCC-ABA6-46A9-AE8D-DBD116CFB055@gmail.com> Message-ID: And bp_genbank2gff3.pl script handled them very well....... Thanks again, gowtham On 7/7/09 12:18 PM, "Scott Cain" wrote: > Hi Gotham, > > I was going to send you an email to complain to the author, until I > realized that it was me :-) > > It has been quite a while since I looked at the code for this script, > as the one I typically use these days is bp_genbank2gff3.pl, but I > think I have a "fix". Try changing the name of the file to > NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic > that the script uses to determine if a file is genbank formated versus > embl (note that the error message says it's not an embl file--that's > why). If that doesn't do it, let me (and the mailing list) know. 
> > Scott > > PS: I wasn't really going to say to complain to the author directly-- > that was just me trying to be funny. > > PPS: As another side note, it is fairly funny to me that the code that > this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in the > documentation that it is proof-of-principle and should not be used in > production. > > > On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote: > >> >> Hi All, >> I am trying to use bp_genbank2gff.pl script to convert a locally >> downloaded genbank file. It is throwing stack errors. But, the >> script works beautifully when I use --accession option to download >> and convert. >> >> Any suggestions? Thanks very much for checking this. >> >> the command i use: >> perl bp_genbank2gff.pl --stdout --file NC_004329.nb >> >> and i am getting the following exception message: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: EMBL stream with no ID. Not embl in my book >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ >> Root/Root.pm:359 >> STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/Bio/ >> SeqIO/embl.pm:189 >> STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ >> perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ >> biofetch.pm:163 >> STACK: bp_genbank2gff.pl:274 >> ----------------------------------------------------------- >> >> >> Many thanks in advance, >> Gowtham >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. 
scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > From scott at scottcain.net Tue Jul 7 16:17:58 2009 From: scott at scottcain.net (Scott Cain) Date: Tue, 7 Jul 2009 16:17:58 -0400 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... In-Reply-To: References: Message-ID: <12B256A9-A227-4234-B1D3-2ED4216CB745@scottcain.net> Hi Gowthaman, I thought I knew you but wasn't sure; hi again. About the problematic genbank files: was it something that the genbank parser should have handled but did not? If so, we should get that fixed anyway. Scott On Jul 7, 2009, at 4:06 PM, Gowthaman Ramasamy wrote: > Hi Scott, > Thanks for the mail. Its funny. Knowing you (at GMOD meeting) I > wouldn't > mistake it in any other ways. > > I was rushing back to email the list to tell I got it working. There > is > NOTHING wrong with script. Its perfectly good. (not many scripts > stands the > time. This one does). > > Its the genbank record. Some of the genbank records I tried did > produce that > error, while others did not. I'll dig into those files to see if > anything is > obvious (to my eyes) that causes this error. > > PS1: I tried changing the name, and that did NOT solve the problem. > > Thanks once again, > Gowtham > > > On 7/7/09 12:18 PM, "Scott Cain" wrote: > >> Hi Gotham, >> >> I was going to send you an email to complain to the author, until I >> realized that it was me :-) >> >> It has been quite a while since I looked at the code for this script, >> as the one I typically use these days is bp_genbank2gff3.pl, but I >> think I have a "fix". Try changing the name of the file to >> NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic >> that the script uses to determine if a file is genbank formated >> versus >> embl (note that the error message says it's not an embl file--that's >> why). If that doesn't do it, let me (and the mailing list) know. 
>> >> Scott >> >> PS: I wasn't really going to say to complain to the author directly-- >> that was just me trying to be funny. >> >> PPS: As another side note, it is fairly funny to me that the code >> that >> this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in >> the >> documentation that it is proof-of-principle and should not be used in >> production. >> >> >> On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote: >> >>> >>> Hi All, >>> I am trying to use bp_genbank2gff.pl script to convert a locally >>> downloaded genbank file. It is throwing stack errors. But, the >>> script works beautifully when I use --accession option to download >>> and convert. >>> >>> Any suggestions? Thanks very much for checking this. >>> >>> the command i use: >>> perl bp_genbank2gff.pl --stdout --file NC_004329.nb >>> >>> and i am getting the following exception message: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: EMBL stream with no ID. Not embl in my book >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ >>> Root/Root.pm:359 >>> STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/ >>> Bio/ >>> SeqIO/embl.pm:189 >>> STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ >>> perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ >>> biofetch.pm:163 >>> STACK: bp_genbank2gff.pl:274 >>> ----------------------------------------------------------- >>> >>> >>> Many thanks in advance, >>> Gowtham >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> ----------------------------------------------------------------------- >> Scott Cain, Ph. D. 
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> > ----------------------------------------------------------------------- Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Tue Jul 7 18:30:28 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 7 Jul 2009 17:30:28 -0500 Subject: [Bioperl-l] bp_genbank2gff.pl giving errors when using file.... In-Reply-To: References: Message-ID: I may take a look at that one as well (it would be nice to know if this is something that's popping up in newer records). chris On Jul 7, 2009, at 3:06 PM, Gowthaman Ramasamy wrote: > Hi Scott, > Thanks for the mail. Its funny. Knowing you (at GMOD meeting) I > wouldn't > mistake it in any other ways. > > I was rushing back to email the list to tell I got it working. There > is > NOTHING wrong with script. Its perfectly good. (not many scripts > stands the > time. This one does). > > Its the genbank record. Some of the genbank records I tried did > produce that > error, while others did not. I'll dig into those files to see if > anything is > obvious (to my eyes) that causes this error. > > PS1: I tried changing the name, and that did NOT solve the problem. > > Thanks once again, > Gowtham > > > On 7/7/09 12:18 PM, "Scott Cain" wrote: > >> Hi Gotham, >> >> I was going to send you an email to complain to the author, until I >> realized that it was me :-) >> >> It has been quite a while since I looked at the code for this script, >> as the one I typically use these days is bp_genbank2gff3.pl, but I >> think I have a "fix". Try changing the name of the file to >> NC_004329.gb or .gbk or .genbank. That is the (very weak) heuristic >> that the script uses to determine if a file is genbank formated >> versus >> embl (note that the error message says it's not an embl file--that's >> why). 
If that doesn't do it, let me (and the mailing list) know. >> >> Scott >> >> PS: I wasn't really going to say to complain to the author directly-- >> that was just me trying to be funny. >> >> PPS: As another side note, it is fairly funny to me that the code >> that >> this script depends upon, Bio::DB::GFF::Adaptor::biofetch, says in >> the >> documentation that it is proof-of-principle and should not be used in >> production. >> >> >> On Jul 7, 2009, at 1:59 PM, Gowthaman Ramasamy wrote: >> >>> >>> Hi All, >>> I am trying to use bp_genbank2gff.pl script to convert a locally >>> downloaded genbank file. It is throwing stack errors. But, the >>> script works beautifully when I use --accession option to download >>> and convert. >>> >>> Any suggestions? Thanks very much for checking this. >>> >>> the command i use: >>> perl bp_genbank2gff.pl --stdout --file NC_004329.nb >>> >>> and i am getting the following exception message: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: EMBL stream with no ID. Not embl in my book >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.5/Bio/ >>> Root/Root.pm:359 >>> STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.5/ >>> Bio/ >>> SeqIO/embl.pm:189 >>> STACK: Bio::DB::GFF::Adaptor::biofetch::load_from_file /usr/lib64/ >>> perl5/site_perl/5.8.5/x86_64-linux-thread-multi/Bio/DB/GFF/Adaptor/ >>> biofetch.pm:163 >>> STACK: bp_genbank2gff.pl:274 >>> ----------------------------------------------------------- >>> >>> >>> Many thanks in advance, >>> Gowtham >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> ----------------------------------------------------------------------- >> Scott Cain, Ph. D. 
scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From abhishek.vit at gmail.com Wed Jul 8 10:24:05 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Wed, 8 Jul 2009 10:24:05 -0400 Subject: [Bioperl-l] Classifying SNPs Message-ID: Hi All This might seem to be an old track question. However I was not able to find a good answer in the many diff mailing list archives. For all our SNP predictions we would like to know whether they are synonymous / non-synonymous. If Non-synonymous/Exonic? then find the position on the gene where amino acid is getting changed and to what ?...Also info about indels will help. I am not sure if something like this already exists. If not even some pointers on how to move forward will help. Thanks, -Abhi From Xianjun.Dong at bccs.uib.no Wed Jul 8 11:04:15 2009 From: Xianjun.Dong at bccs.uib.no (Xianjun Dong) Date: Wed, 08 Jul 2009 17:04:15 +0200 Subject: [Bioperl-l] [Bio::Graphics::Panel] code reference cannot pass to -link, why? In-Reply-To: <4A4E3029.4020109@ii.uib.no> References: <4A4E3029.4020109@ii.uib.no> Message-ID: <4A54B56F.6050204@ii.uib.no> Hi, Scott Thanks for your help to my previous question about background layer. It works well! Now, I have another question regarding the -link function in imagemap. I post to Bioperl mailist. It seems to detail to get much attention. I followed the code in the Bio::Graphics POD, but it does not work. Could you pls take a look? 
Thanks again Xianjun Xianjun Dong wrote: > Hi, > > I have a problem while using the -link in Bio::Graphics (version 1.96): > > As the POD of Bio::Graphics described > (http://search.cpan.org/~lds/Bio-Graphics-1.96/lib/Bio/Graphics/Panel.pm#Creating_Imagemaps), > > > link format like: > > -link => 'http://www.google.com/search?q=$description' > > > works well in my code, but the format like > > -link => sub { > my ($feature,$panel) = @_; > my $type = $feature->primary_tag; > my $name = $feature->display_name; > if ($primary_tag eq 'clone') { > return "http://www.google.com/search?q=$name"; > } else { > return "http://www.yahoo.com/search?p=$name"; > } > > > does not output image map as expected. > > Here I attached a simple code as example for anyone who is willing to > test for me: > > #!/usr/bin/perl > use strict; > use Bio::Graphics; > use Bio::Graphics::Feature; > my $ftr= 'Bio::Graphics::Feature'; > # processed_transcript > my $trans1 = > > $ftr->new(-start=>50,-end=>10,-display_name=>'ZK154.1',-type=>'UTR'); > my $trans2 = > > $ftr->new(-start=>100,-end=>50,-display_name=>'ZK154.2',-type=>'CDS'); > my $trans3 = > > $ftr->new(-start=>350,-end=>225,-display_name=>'ZK154.3',-type=>'CDS', > -source=>'a'); > my $trans4 = > > $ftr->new(-start=>700,-end=>650,-display_name=>'ZK154.4',-type=>'UTR'); > my @trans = ($trans1,$trans2,$trans3,$trans4); > > my $panel= Bio::Graphics::Panel->new(-start =>0,-length=>1050); > > $panel->add_track(\@trans, > -glyph => 'transcript2', > # This works well! > #-link => > 'http://www.google.com/search?q=$name', > # while, the following code does not work as > expected. 
> -link => sub { > my ($feature,$panel) = @_; > my $type = $feature->primary_tag; > my $name = $feature->display_name; > if ($type eq 'CDS') { > return > "http://www.google.com/search?q=$name"; > } else { > return > "http://www.yahoo.com/search?p=$name"; > } > } > ); > my $map = $panel->create_web_map("mapname"); > print $map; > $panel->finished(); > > In my test (Bioperl 1.6.0), its output is: > > > href="http://www.yahoo.com/search?p=" /> > href="http://www.yahoo.com/search?p=" /> > href="http://www.yahoo.com/search?p=" /> > href="http://www.yahoo.com/search?p=" /> > > > > It seems $feature->primary_tag returns 'track' (I don't know where > this come from...), but not the type of features. Anyone has clue for > this problem? > > Thanks > -- ========================================== Xianjun Dong PhD student, Lenhard group Computational Biology Unit Bergen Center for Computational Science University of Bergen Hoyteknologisenteret, Thormohlensgate 55 N-5008 Bergen, Norway E-mail: xianjun.dong at bccs.uib.no Tel.: +47 555 84022 Fax : +47 555 84295 ========================================== From giles.weaver at googlemail.com Wed Jul 8 11:26:54 2009 From: giles.weaver at googlemail.com (Giles Weaver) Date: Wed, 8 Jul 2009 16:26:54 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A520591.3070407@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> Message-ID: <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> I've just added a sequence adapter removal implementation to the bioperl scrapbook at http://www.bioperl.org/wiki/Removing_sequencing_adapters. I think the basic method is sound, but the implementation is ugly. Performance wise, it currently takes around 80 minutes to remove adapters from a ~3.2 million read Illumina run. 
This includes quality trimming and grouping the sequences to reduce processing time. The quality trimming (described earlier in this thread) takes about 15 minutes, so adapter removal is definitely the bottleneck. I'm confident that some relatively simple developments in Bioperl and/or EMBOSS will yield some big performance improvements - if you see my sample code in the scrapbook you'll understand why!

I've also been experimenting with sequence entropy calculations for filtering out junk sequence. I used Mark Jensen's code at http://www.bioperl.org/wiki/Site_entropy_in_an_alignment for inspiration. Here is my current entropy calculation code:

use List::Util qw(sum);    # sum() is not a Perl builtin

sub entropy {
    my ($seq_str, $word_size) = @_;
    my %res_counts;
    # count every overlapping word of length $word_size, skipping words with N
    for (my $i = 0; $i <= ((length $seq_str) - $word_size); $i ++) {
        my $word = substr $seq_str, $i, $word_size;
        if ($word !~ /N/) {
            $res_counts{$word} ++;
        }
    }
    #~ print STDERR join (" ", keys %res_counts), "\n";
    #~ print STDERR join (" ", values %res_counts), "\n";
    my @counts = values %res_counts;
    my $word_count = sum @counts;
    # turn counts into frequencies, then sum -p*log2(p)
    map {$_ /= $word_count} @counts;
    return sum map {-$_*log2($_)} @counts;
}

sub log2 {
    my $n = shift;
    return log($n)/log(2);
}

I don't know if this does "the right thing", and have yet to determine a suitable word size and entropy threshold for sequence filtering, so feel free to comment/test away.

Giles

2009/7/6 Peter Rice

> Giles Weaver wrote:
> > I'm developing a transcriptomics database for use with next-gen data, and
> > have found processing the raw data to be a big hurdle.
> >
> > I'm a bit late in responding to this thread, so most issues have already
> > been discussed. One thing that hasn't been mentioned is removal of adapters
> > from raw Illumina sequence. This is a PITA, and I'm not aware of any well
> > developed and documented open source software for removal of adapters (and
> > poor quality sequence) from Illumina reads.
>
> We would like to add this to EMBOSS.
Can you describe the method you > would like to use (I see you currently use a combination of bioperl and > emboss for this). > > > For my purposes the tools that would love to see supported in > > bioperl/bioperl-run are: > > > > - next-gen sequence quality parsing (to output phred scores) > > - sequence quality based trimming > > - sequencing adapter removal > > - filtering based on sequence complexity (repeats, entropy etc) > > - bioperl-run modules for bowtie etc. > > We would like to see these supported in all the Open-Bio Projects and > they are a priority for EMBOSS. > > Can you suggest quality filters, trimming methods, adaptor removal > methods, sequence filters and any other applications we could provide in > EMBOSS. > > We hope to keep in line with what the other projects do so that EMBOSS, > bioperl, biopython etc. can be used interchangeably in pipelines. > > > Obviously all of these need to be fast! .... My > > current code trims ~1300 sequences/second, including unzipping the raw > data > > and converting it to sanger fastq with biopython. Processing an entire > > sequencing run with the whole pipeline takes in the region of 6-12h. > > OK, we will see what speed we can reach. > > > Hope this looooong post was of interest to someone! > > Very interesting! > > regards, > > Peter Rice > From maj at fortinbras.us Wed Jul 8 11:23:54 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Jul 2009 11:23:54 -0400 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: References: Message-ID: <6269F0005AD041A69233C82E9BE1E776@NewLife> Hey Abhishek- You might root around in Bio::PopGen. Here's a script to get stuff from raw fasta data--see comments within. 
cheers
Mark

use Bio::AlignIO;
use Bio::PopGen::Utilities;

$file = "your_raw_file.fas";
my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;

# get the alignment into a Bio::PopGen::Population format, with codons
# as the marker sites
my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
                                                    -site_model=>'cod');
# here are your variable codons...
my @cdnpos = $pop->get_marker_names;
# here are your individuals represented in the alignment
my @inds = $pop->get_Individuals;

# which have names like "Codon-3-9", "Codon-4-12", etc
foreach my $cdn (@cdnpos) {
    # calculate the unique codons represented at this codon position
    my (%ucdns, @ucdns);
    @genos = $pop->get_Genotypes(-marker=>$cdn);
    $ucdns{$_->get_Alleles}++ for @genos;
    @ucdns = sort keys %ucdns;
    #
    # here, use translate or something faster to identify syn/non-syn
    # check out code in Bio::Align::DNAStatistics for various methods
}

# relate back to individuals with this
foreach my $ind (@inds) {
    print "Individual ".$ind->unique_id."\n";
    print "Site\tAllele\n";
    foreach my $cdn (@cdnpos) {
        print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
    }
}
1;

----- Original Message -----
From: "Abhishek Pratap"
To:
Sent: Wednesday, July 08, 2009 10:24 AM
Subject: [Bioperl-l] Classifying SNPs

Hi All

This might seem to be an old track question. However I was not able to find a good answer in the many diff mailing list archives.

For all our SNP predictions we would like to know whether they are synonymous / non-synonymous. If Non-synonymous/Exonic then find the position on the gene where amino acid is getting changed and to what ...Also info about indels will help.

I am not sure if something like this already exists. If not even some pointers on how to move forward will help.
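The script in this message leaves the syn/non-syn step as a comment. One way that step might look, as a minimal sketch in plain Perl: translate the reference and variant codons with the standard genetic code and compare the amino acids. The helper classify_codon_change and its return strings are hypothetical and are not taken from Bio::PopGen or Bio::Align::DNAStatistics; codons containing gaps or ambiguity codes simply come back as 'unknown'.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
my @bases = qw(T C A G);
my $aas   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon2aa;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon2aa{"$b1$b2$b3"} = substr $aas, $i++, 1;
        }
    }
}

# Hypothetical helper: compare a reference codon with a variant codon.
sub classify_codon_change {
    my ($ref, $alt) = map { uc $_ } @_;
    my ($ref_aa, $alt_aa) = @codon2aa{$ref, $alt};
    # gaps, Ns and other ambiguity codes are not in the table
    return 'unknown' unless defined $ref_aa && defined $alt_aa;
    return 'identical' if $ref eq $alt;
    return 'synonymous' if $ref_aa eq $alt_aa;
    return sprintf 'non-synonymous (%s to %s)', $ref_aa, $alt_aa;
}

print classify_codon_change('GAA', 'GAG'), "\n";   # both codons encode E
print classify_codon_change('GAA', 'GAT'), "\n";   # E versus D
```

In the loop over @cdnpos, each pair of distinct codons in @ucdns could be passed through such a helper; indels would need separate handling, since they shift the reading frame rather than change a single codon.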
Thanks, -Abhi _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From pmr at ebi.ac.uk Wed Jul 8 11:57:47 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Wed, 08 Jul 2009 16:57:47 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> Message-ID: <4A54C1FB.8050708@ebi.ac.uk> Giles Weaver wrote: > I've just added a sequence adapter removal implementation to the bioperl > scrapbook at http://www.bioperl.org/wiki/Removing_sequencing_adapters. I > think the basic method is sound, but the implementation is ugly. Ugly perhaps, but I'll look anyway :-) I see you don't use needle because it creates gapped alignments, but that can be fixed with a sufficiently high gap penalty (just to see if it works - it won't be fast). We also have word-based matching methods in EMBOSS but they would not allow mismatches. I will play with alternatives and see what works best. Some word-based seed should allow for a faster solution. 
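For comparison with the ungapped matching discussed here, a minimal sketch in plain Perl of trimming a 3' adapter by sliding it along the read and counting mismatches at each offset. This is neither the scrapbook implementation nor anything from EMBOSS: trim_adapter, its parameters and the sequences are made up for illustration, and it is a brute-force scan rather than a word-seeded search.

```perl
use strict;
use warnings;

# Hypothetical helper: return the read with a 3' adapter removed.
# The adapter may run off the end of the read (partial match); at least
# $min_overlap bases must overlap and at most $max_mismatch may differ.
sub trim_adapter {
    my ($read, $adapter, $max_mismatch, $min_overlap) = @_;
    my $rlen = length $read;
    for my $pos (0 .. $rlen - $min_overlap) {
        my $overlap = $rlen - $pos;
        $overlap = length $adapter if $overlap > length $adapter;
        my $r_part = substr $read,    $pos, $overlap;
        my $a_part = substr $adapter, 0,    $overlap;
        # XOR of equal-length strings gives "\0" wherever the bases agree,
        # so counting the non-"\0" bytes gives the number of mismatches
        my $mismatches = (($r_part ^ $a_part) =~ tr/\0//c);
        return substr($read, 0, $pos) if $mismatches <= $max_mismatch;
    }
    return $read;    # no acceptable match found
}

my $adapter = 'GATCGGAAGAGC';
# read ends with the first 10 adapter bases, one of them miscalled
print trim_adapter('ACGTACGTACGATCGCAAGA', $adapter, 1, 5), "\n";
```

The leftmost acceptable offset wins, so a small $min_overlap combined with a generous $max_mismatch will trim some reads by chance; test data with known adapter positions would be needed to choose sensible values.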
The provisional EMBOSS name for a quality filter and adaptor removal application is "quaffle" regards, Peter Rice From cjfields at illinois.edu Wed Jul 8 12:24:27 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 8 Jul 2009 11:24:27 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <4A54C1FB.8050708@ebi.ac.uk> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> Message-ID: On Jul 8, 2009, at 10:57 AM, Peter Rice wrote: > Giles Weaver wrote: >> I've just added a sequence adapter removal implementation to the >> bioperl >> scrapbook at http://www.bioperl.org/wiki/ >> Removing_sequencing_adapters. I >> think the basic method is sound, but the implementation is ugly. > > Ugly perhaps, but I'll look anyway :-) > > I see you don't use needle because it creates gapped alignments, but > that can be fixed with a sufficiently high gap penalty (just to see if > it works - it won't be fast). > > We also have word-based matching methods in EMBOSS but they would not > allow mismatches. I will play with alternatives and see what works > best. > Some word-based seed should allow for a faster solution. > > The provisional EMBOSS name for a quality filter and adaptor removal > application is "quaffle" > > regards, > > Peter Rice In the meantime, we can probably add this in to Bio::SeqUtils for general use as an exported method. It would be nice to get some regression tests going for this to make sure it does what we expect, so maybe some test data and expected results? 
chris From IRytsareva at dow.com Wed Jul 8 15:42:54 2009 From: IRytsareva at dow.com (Rytsareva, Inna (I)) Date: Wed, 8 Jul 2009 15:42:54 -0400 Subject: [Bioperl-l] While loop - SearchIO for BioPerl Message-ID: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> Hello, I have a follow script to parse the BLAST report: my $in = Bio::SearchIO->new ( -file =>$out_file, -format =>'blast') or die $!; while (my $result = $in->next_result) { while (my $hit = $result->next_hit) { while (my $hsp = $hit->next_hsp) { $qhit = $hit->name; $start = $hsp->hit->start; $end = $hsp->hit->end; } } print "Hit= ", $qhit, ",Start = ", $start, ",End = ", $end,"\n"; } Usually, the report has a number of the same hsp for each hit. Using "print" command it gives me a hit name, start and end positions for each hit, except last on. For last one it prints all the hsps. Something like this: Hit= gnl|DAS|22386,Start = 7578,End = 7601 Hit= gnl|DAS|25627,Start = 2824,End = 2863 Hit= gnl|DAS|25328,Start = 8864,End = 8887 Hit= gnl|DAS|4890,Start = 1896,End = 1919 Hit= gnl|DAS|12191,Start = 1898,End = 1921 Hit= gnl|DAS|4276,Start = 557,End = 580 Hit= gnl|DAS|12959,Start = 801,End = 824 Hit= gnl|DAS|4092,Start = 2266,End = 2304 Hit= gnl|DAS|19740,Start = 13572,End = 13610 Hit= gnl|DAS|12393,Start = 3901,End = 3924 Hit= gnl|DAS|25687,Start = 10415,End = 10438 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 
7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Hit= gnl|DAS|12277,Start = 7410,End = 7433 Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. I don't need these duplicates. How can I fix that? Thanks, Inna Rytsareva Discovery Information Management Dow AgroSciences Indianapolis, IN 317-337-4716 From rmb32 at cornell.edu Wed Jul 8 18:45:09 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 08 Jul 2009 15:45:09 -0700 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> Message-ID: <4A552175.70009@cornell.edu> Giles Weaver wrote: > takes about 15 minutes, so adapter removal is definitely the bottleneck. I'm > confident that some relatively simple developments in Bioperl and/or EMBOSS > will yield some big performance improvements - if you see my sample code in Apropos this kind of thing, have you guys already discussed using lazy object creation for objects returned from bioperl parsers? Not really relevant in the short term, but it could be a useful avenue to pursue for addressing some performance concerns people (like ebi) have. 
In very vague terms, one would probably implement this by defining a
very light-weight role/class called something like Bio::LazyInflator,
that would provide only an `inflate` method. Parsers would parse into
lightweight structures (probably arrayrefs) that implement LazyInflator
and users could choose between grabbing data out of the uninflated
arrayref directly, or they could call inflate() on it to transform it
into a real object (like a Bio::Annotation or Bio::Seq or something).

The exact implementation of this would vary depending on whether Moose
is being used.

This could potentially also be compatible with having some of the tight
parsing loops be implemented in XS.

Rob

From torsten.seemann at infotech.monash.edu.au Wed Jul 8 20:25:34 2009
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 9 Jul 2009 10:25:34 +1000
Subject: [Bioperl-l] While loop - SearchIO for BioPerl
In-Reply-To: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com>
References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com>
Message-ID:

Inna,

> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
> I don't need these duplicates.
> How can I fix that?

>                        $start = $hsp->hit->start;
>                        $end = $hsp->hit->end;

Are you sure you mean $hsp->hit->start ?
Perhaps you mean $hsp->start() or $hsp->start('hit') ?

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash University, AUSTRALIA

From jason at bioperl.org Wed Jul 8 20:50:54 2009
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 8 Jul 2009 17:50:54 -0700
Subject: [Bioperl-l] While loop - SearchIO for BioPerl
In-Reply-To:
References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com>
Message-ID: <465D31E0-BBFE-4C73-A5E8-2CA9C0DF6DE9@bioperl.org>

both work...TMTOWTDI

$hsp->query->start and $hsp->start('query') are equivalent.
as are $hsp->hit->start and $hsp->start('hit') .
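
One possible reading of the script that started this thread, offered as a sketch rather than a confirmed diagnosis: the print statement appears to sit outside the hit and HSP loops, so it runs once per result and reports whatever $qhit, $start and $end last held. Building the output inside the HSP loop reports each HSP exactly once, and a result without hits produces no line at all. The sketch uses $hsp->start('hit'), which is equivalent to $hsp->hit->start as noted above. The helper name hsp_lines and the extra Query= field are illustrative additions, not part of the original script.

```perl
use strict;
use warnings;

# Collect one line per HSP. $in can be any object that behaves like a
# Bio::SearchIO stream (next_result / next_hit / next_hsp).
sub hsp_lines {
    my ($in) = @_;
    my @lines;
    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                # Everything is read and formatted inside the innermost
                # loop, so no value can carry over from an earlier hit.
                push @lines, sprintf(
                    "Query= %s,Hit= %s,Start = %d,End = %d",
                    $result->query_name, $hit->name,
                    $hsp->start('hit'),  $hsp->end('hit'),
                );
            }
        }
    }
    return @lines;
}

# Command-line use (needs BioPerl): perl script.pl report.blast
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -file => $ARGV[0], -format => 'blast' );
    print "$_\n" for hsp_lines($in);
}
```

Including the query name in each line also makes it easier to tell whether repeated hit names come from different queries hitting the same subject.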
On Jul 8, 2009, at 5:25 PM, Torsten Seemann wrote:

> Inna,
>
>> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
>> I don't need these duplicates.
>> How can I fix that?
>
>> $start = $hsp->hit->start;
>> $end = $hsp->hit->end;
>
> Are you sure you mean $hsp->hit->start ?
> Perhaps you mean $hsp->start() or $hsp->start('hit') ?
>
>
> --Torsten Seemann
> --Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
> University, AUSTRALIA
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org

From maj at fortinbras.us Wed Jul 8 21:00:19 2009
From: maj at fortinbras.us (Mark A.
Jensen) Date: Wed, 8 Jul 2009 21:00:19 -0400 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> Message-ID: <165113A382BE4F20AAF10157EDF8E3FE@NewLife> My guess would be you have multiple query sequences (27, to be exact) that hit the same subject, viz. 12277 MAJ ----- Original Message ----- From: "Rytsareva, Inna (I)" To: Sent: Wednesday, July 08, 2009 3:42 PM Subject: [Bioperl-l] While loop - SearchIO for BioPerl > Hello, > > I have a follow script to parse the BLAST report: > > my $in = Bio::SearchIO->new ( -file =>$out_file, > -format =>'blast') or die $!; > > while (my $result = $in->next_result) { > while (my $hit = $result->next_hit) > { > while (my $hsp = $hit->next_hsp) { > $qhit = $hit->name; > $start = $hsp->hit->start; > $end = $hsp->hit->end; > } > > > } print "Hit= ", $qhit, > ",Start = ", $start, > ",End = ", $end,"\n"; > } > > Usually, the report has a number of the same hsp for each hit. > Using "print" command it gives me a hit name, start and end positions > for each hit, except last on. For last one it prints all the hsps. 
> Something like this: > > Hit= gnl|DAS|22386,Start = 7578,End = 7601 > Hit= gnl|DAS|25627,Start = 2824,End = 2863 > Hit= gnl|DAS|25328,Start = 8864,End = 8887 > Hit= gnl|DAS|4890,Start = 1896,End = 1919 > Hit= gnl|DAS|12191,Start = 1898,End = 1921 > Hit= gnl|DAS|4276,Start = 557,End = 580 > Hit= gnl|DAS|12959,Start = 801,End = 824 > Hit= gnl|DAS|4092,Start = 2266,End = 2304 > Hit= gnl|DAS|19740,Start = 13572,End = 13610 > Hit= gnl|DAS|12393,Start = 3901,End = 3924 > Hit= gnl|DAS|25687,Start = 10415,End = 10438 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > > Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. > I don't need these duplicates. > How can I fix that? 
> > Thanks, > Inna Rytsareva > Discovery Information Management > Dow AgroSciences > Indianapolis, IN > 317-337-4716 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Wed Jul 8 21:08:33 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 8 Jul 2009 20:08:33 -0500 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> Message-ID: <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> I'm curious as to what this report looks like. The example report you posted to the gbrowse list had serious issues (different problem, 'No midline' error which I replicated); mainly there were no blank lines making it pretty much invalid, so the parser had issues with it. Example lines from one HSP: > gnl|DAS|24699 pDAB101580 Length = 12942 Score = 50.1 bits (25), Expect = 5e-06 Identities = 37/41 (90%) Strand = Plus / Plus Query: 10 ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50 ||||||||||||||| ||| |||||||| ||||| |||||| Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 Score = 46.1 bits (23), Expect = 8e-05 Identities = 35/39 (89%) Strand = Plus / Plus Query: 13 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51 ||||||||||||| ||| |||||||| ||||| |||||| Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 Score = 46.1 bits (23), Expect = 8e-05 Identities = 35/39 (89%) Strand = Plus / Plus Query: 14 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52 ||||||||||||| ||| |||||||| ||||| |||||| Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 ... 
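
When one hit carries several overlapping HSPs, as in the excerpt above, one option is to keep a single representative HSP per hit. The following is a minimal sketch under stated assumptions: "best" is taken to mean the lowest E-value, with the bit score as tie-breaker, and best_hsp is a hypothetical helper name. It relies only on the evalue and bits accessors that BioPerl HSP objects provide.

```perl
use strict;
use warnings;

# Return the HSP with the lowest E-value; ties go to the higher bit score.
# Works on any objects offering evalue() and bits(), such as the HSPs
# returned by $hit->next_hsp. Returns undef for an empty list.
sub best_hsp {
    my @hsps = @_;
    my ($best) = sort {
        $a->evalue <=> $b->evalue
            or
        $b->bits <=> $a->bits
    } @hsps;
    return $best;
}

# Usage inside a SearchIO loop (needs BioPerl):
#   my @hsps;
#   while ( my $hsp = $hit->next_hsp ) { push @hsps, $hsp }
#   my $best = best_hsp(@hsps);
#   print $hit->name, " ", $best->start('hit'), "\n" if $best;
```

Whether E-value, bit score or some other criterion is the right ranking depends on what the hits are used for, so the sort block is the part to adapt.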
chris On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote: > Hello, > > I have a follow script to parse the BLAST report: > > my $in = Bio::SearchIO->new ( -file =>$out_file, > -format =>'blast') or die $!; > > while (my $result = $in->next_result) { > while (my $hit = $result->next_hit) > { > while (my $hsp = $hit->next_hsp) { > $qhit = $hit->name; > $start = $hsp->hit->start; > $end = $hsp->hit->end; > } > > > } print "Hit= ", $qhit, > ",Start = ", $start, > ",End = ", $end,"\n"; > } > > Usually, the report has a number of the same hsp for each hit. > Using "print" command it gives me a hit name, start and end positions > for each hit, except last on. For last one it prints all the hsps. > Something like this: > > Hit= gnl|DAS|22386,Start = 7578,End = 7601 > Hit= gnl|DAS|25627,Start = 2824,End = 2863 > Hit= gnl|DAS|25328,Start = 8864,End = 8887 > Hit= gnl|DAS|4890,Start = 1896,End = 1919 > Hit= gnl|DAS|12191,Start = 1898,End = 1921 > Hit= gnl|DAS|4276,Start = 557,End = 580 > Hit= gnl|DAS|12959,Start = 801,End = 824 > Hit= gnl|DAS|4092,Start = 2266,End = 2304 > Hit= gnl|DAS|19740,Start = 13572,End = 13610 > Hit= gnl|DAS|12393,Start = 3901,End = 3924 > Hit= gnl|DAS|25687,Start = 10415,End = 10438 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= 
gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > Hit= gnl|DAS|12277,Start = 7410,End = 7433 > > Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. > I don't need these duplicates. > How can I fix that? > > Thanks, > Inna Rytsareva > Discovery Information Management > Dow AgroSciences > Indianapolis, IN > 317-337-4716 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jul 8 21:41:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 8 Jul 2009 20:41:01 -0500 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> Message-ID: Yep, that's what I was thinking. The fragment in question is fairly short. Inna, if you want the best HSP you could just grab the one that best fits what you expect (best eval, score, whatever). chris On Jul 8, 2009, at 8:31 PM, Mark A. Jensen wrote: > A lack of low-complexity filtering (as seems apparent from this > report snippet, if > I understand that concept correctly) could explain the multiple > query hits on a > short (24bp) region of the same subject... > ----- Original Message ----- From: "Chris Fields" > > To: "Rytsareva, Inna (I)" > Cc: > Sent: Wednesday, July 08, 2009 9:08 PM > Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl > > >> I'm curious as to what this report looks like. 
The example report >> you posted to the gbrowse list had serious issues (different >> problem, 'No midline' error which I replicated); mainly there were >> no blank lines making it pretty much invalid, so the parser had >> issues with it. Example lines from one HSP: >> >> > gnl|DAS|24699 pDAB101580 >> Length = 12942 >> Score = 50.1 bits (25), Expect = 5e-06 >> Identities = 37/41 (90%) >> Strand = Plus / Plus >> Query: 10 ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50 >> ||||||||||||||| ||| |||||||| ||||| |||||| >> Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >> Score = 46.1 bits (23), Expect = 8e-05 >> Identities = 35/39 (89%) >> Strand = Plus / Plus >> Query: 13 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51 >> ||||||||||||| ||| |||||||| ||||| |||||| >> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >> Score = 46.1 bits (23), Expect = 8e-05 >> Identities = 35/39 (89%) >> Strand = Plus / Plus >> Query: 14 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52 >> ||||||||||||| ||| |||||||| ||||| |||||| >> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >> >> ... >> >> chris >> >> >> >> >> On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote: >> >>> Hello, >>> >>> I have a follow script to parse the BLAST report: >>> >>> my $in = Bio::SearchIO->new ( -file =>$out_file, >>> -format =>'blast') or die $!; >>> >>> while (my $result = $in->next_result) { >>> while (my $hit = $result->next_hit) >>> { >>> while (my $hsp = $hit->next_hsp) { >>> $qhit = $hit->name; >>> $start = $hsp->hit->start; >>> $end = $hsp->hit->end; >>> } >>> >>> >>> } print "Hit= ", $qhit, >>> ",Start = ", $start, >>> ",End = ", $end,"\n"; } >>> >>> Usually, the report has a number of the same hsp for each hit. >>> Using "print" command it gives me a hit name, start and end >>> positions >>> for each hit, except last on. For last one it prints all the hsps. 
>>> Something like this: >>> >>> Hit= gnl|DAS|22386,Start = 7578,End = 7601 >>> Hit= gnl|DAS|25627,Start = 2824,End = 2863 >>> Hit= gnl|DAS|25328,Start = 8864,End = 8887 >>> Hit= gnl|DAS|4890,Start = 1896,End = 1919 >>> Hit= gnl|DAS|12191,Start = 1898,End = 1921 >>> Hit= gnl|DAS|4276,Start = 557,End = 580 >>> Hit= gnl|DAS|12959,Start = 801,End = 824 >>> Hit= gnl|DAS|4092,Start = 2266,End = 2304 >>> Hit= gnl|DAS|19740,Start = 13572,End = 13610 >>> Hit= gnl|DAS|12393,Start = 3901,End = 3924 >>> Hit= gnl|DAS|25687,Start = 10415,End = 10438 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >>> >>> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. >>> I don't need these duplicates. >>> How can I fix that? 
>>> >>> Thanks, >>> Inna Rytsareva >>> Discovery Information Management >>> Dow AgroSciences >>> Indianapolis, IN >>> 317-337-4716 >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From maj at fortinbras.us Wed Jul 8 21:31:27 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Jul 2009 21:31:27 -0400 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> Message-ID: A lack of low-complexity filtering (as seems apparent from this report snippet, if I understand that concept correctly) could explain the multiple query hits on a short (24bp) region of the same subject... ----- Original Message ----- From: "Chris Fields" To: "Rytsareva, Inna (I)" Cc: Sent: Wednesday, July 08, 2009 9:08 PM Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl > I'm curious as to what this report looks like. The example report you posted > to the gbrowse list had serious issues (different problem, 'No midline' error > which I replicated); mainly there were no blank lines making it pretty much > invalid, so the parser had issues with it. 
Example lines from one HSP: > > > gnl|DAS|24699 pDAB101580 > Length = 12942 > Score = 50.1 bits (25), Expect = 5e-06 > Identities = 37/41 (90%) > Strand = Plus / Plus > Query: 10 ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50 > ||||||||||||||| ||| |||||||| ||||| |||||| > Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 > Score = 46.1 bits (23), Expect = 8e-05 > Identities = 35/39 (89%) > Strand = Plus / Plus > Query: 13 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51 > ||||||||||||| ||| |||||||| ||||| |||||| > Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 > Score = 46.1 bits (23), Expect = 8e-05 > Identities = 35/39 (89%) > Strand = Plus / Plus > Query: 14 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52 > ||||||||||||| ||| |||||||| ||||| |||||| > Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 > > ... > > chris > > > > > On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote: > >> Hello, >> >> I have a follow script to parse the BLAST report: >> >> my $in = Bio::SearchIO->new ( -file =>$out_file, >> -format =>'blast') or die $!; >> >> while (my $result = $in->next_result) { >> while (my $hit = $result->next_hit) >> { >> while (my $hsp = $hit->next_hsp) { >> $qhit = $hit->name; >> $start = $hsp->hit->start; >> $end = $hsp->hit->end; >> } >> >> >> } print "Hit= ", $qhit, >> ",Start = ", $start, >> ",End = ", $end,"\n"; } >> >> Usually, the report has a number of the same hsp for each hit. >> Using "print" command it gives me a hit name, start and end positions >> for each hit, except last on. For last one it prints all the hsps. 
>> Something like this: >> >> Hit= gnl|DAS|22386,Start = 7578,End = 7601 >> Hit= gnl|DAS|25627,Start = 2824,End = 2863 >> Hit= gnl|DAS|25328,Start = 8864,End = 8887 >> Hit= gnl|DAS|4890,Start = 1896,End = 1919 >> Hit= gnl|DAS|12191,Start = 1898,End = 1921 >> Hit= gnl|DAS|4276,Start = 557,End = 580 >> Hit= gnl|DAS|12959,Start = 801,End = 824 >> Hit= gnl|DAS|4092,Start = 2266,End = 2304 >> Hit= gnl|DAS|19740,Start = 13572,End = 13610 >> Hit= gnl|DAS|12393,Start = 3901,End = 3924 >> Hit= gnl|DAS|25687,Start = 10415,End = 10438 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> Hit= gnl|DAS|12277,Start = 7410,End = 7433 >> >> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one. >> I don't need these duplicates. >> How can I fix that? 
>>
>> Thanks,
>> Inna Rytsareva
>> Discovery Information Management
>> Dow AgroSciences
>> Indianapolis, IN
>> 317-337-4716
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From cjfields at illinois.edu Wed Jul 8 21:54:16 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 8 Jul 2009 20:54:16 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <4A552175.70009@cornell.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk>
	<92C15E3391F64BAF801754E924122540@NewLife>
	<200906170927.13273.tristan.lefebure@gmail.com>
	<1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
	<4A520591.3070407@ebi.ac.uk>
	<1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com>
	<4A552175.70009@cornell.edu>
Message-ID:

On Jul 8, 2009, at 5:45 PM, Robert Buels wrote:

> Giles Weaver wrote:
>> takes about 15 minutes, so adapter removal is definitely the
>> bottleneck. I'm confident that some relatively simple developments
>> in Bioperl and/or EMBOSS will yield some big performance
>> improvements - if you see my sample code in
>
> Apropos this kind of thing, have you guys already discussed using
> lazy object creation for objects returned from bioperl parsers? Not
> really relevant in the short term, but it could be a useful avenue
> to pursue for addressing some performance concerns people (like ebi)
> have.

There are some lazy parsers for SearchIO, but each of those has
specific classes geared towards the SearchIO format, an issue I worry
about. I'm not sure about going down the path of having a
Bio::Search::Result::FooResult, Bio::Search::Hit::FooHit, and
Bio::Search::HSP::FooHSP for each 'Foo' format. The same thing could
occur with SeqIO, TreeIO, etc.
A possible maintenance nightmare.

What I would like to see are generic lazy implementations for some of
the various classes, primarily Seq, AnnotationCollection,
FeatureHolder/Collection, etc, with parsers passing in just the
necessary data (lazy implies file points or stream points). This may
not be terribly hard to do if using iterators, but (as you may have
seen) many of the current methods are greedily defined, so new
interface methods would need to be drawn up (and older ones refactored
to work with newer ones).

> In very vague terms, one would probably implement this by defining a
> very light-weight role/class called something like
> Bio::LazyInflator, that would provide only an `inflate` method.
> Parsers would parse into lightweight structures (probably arrayrefs)
> that implement LazyInflator and users could choose between grabbing
> data out of the uninflated arrayref directly, or they could call
> inflate() on it to transform it into a real object (like a
> Bio::Annotation or Bio::Seq or something).

I would go one step further and reimplement the various
AnnotationCollection/FeatureHolder methods in terms of a completely
lazy implementation (i.e. one that parses the file or stream into a
lazy Seq). See SwissKnife for instance.

> The exact implementation of this would vary depending on whether
> Moose is being used.

This may be an area where optimization via Moose may not matter as
much. It would be best to attempt some of this initially in bioperl,
then port to Moose/Bio::Moose.

> This could potentially also be compatible with having some of the
> tight parsing loops be implemented in XS.
>
> Rob

That's where it'll get a little trickier; you would probably need a
decent grammar to get everything out the way you want it, or at least
parse everything event-based, and other grammars would have to have
similarly named rules/tokens so the same action could be tied to the
data being parsed.
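
The lazy-inflation idea discussed above can be sketched in plain Perl. This is only an illustration of the shape of the proposal, not an existing BioPerl API: every package name below (Sketch::LazyInflator, Sketch::LazySeqRecord, Sketch::FullSeq) is hypothetical, and Sketch::FullSeq is a stand-in for a heavier object such as Bio::Seq. The parser side would hand out cheap blessed arrayrefs, and callers would either index into them directly or call inflate() when they need a full object.

```perl
use strict;
use warnings;

# Hypothetical base role: the only contract is an inflate() method.
package Sketch::LazyInflator;
sub inflate { die ref( $_[0] ) . " must implement inflate()\n" }

# Stand-in for a "real" object (NOT the BioPerl Bio::Seq class).
package Sketch::FullSeq;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
sub id  { $_[0]{id} }
sub seq { $_[0]{seq} }

# Lightweight record: a blessed arrayref, as a parser might emit it.
package Sketch::LazySeqRecord;
our @ISA = ('Sketch::LazyInflator');
use constant { ID => 0, DESC => 1, SEQ => 2 };

sub new { my ( $class, @fields ) = @_; return bless [@fields], $class }

# Build the full object only when somebody asks for it.
sub inflate {
    my ($self) = @_;
    return Sketch::FullSeq->new(
        id   => $self->[ID],
        desc => $self->[DESC],
        seq  => $self->[SEQ],
    );
}

package main;

my $rec = Sketch::LazySeqRecord->new( 'seq1', 'toy record', 'ACGTACGT' );

# Cheap path: read a field straight out of the uninflated arrayref.
print "raw sequence: ", $rec->[ Sketch::LazySeqRecord::SEQ() ], "\n";

# Full path: pay for object construction only here.
my $seq = $rec->inflate;
print "inflated: ", $seq->id, " length ", length( $seq->seq ), "\n";
```

The trade-off is that code reading the arrayref directly depends on the field order, which is why the sketch names the indices with constants.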
I had a first go at generic parsing in the
gbdriver/embldriver/swissdriver modules, which just pass data chunks to
the handler object (which could do anything it wants with the data).
The only thing not passed in yet are file points. That needs to be
fleshed out more when I have the tuits, but you are more than welcome
to look.

Also, just to note (and something to think about): Perl6 has this
'solved' to a large degree with grammar/action combinations, where you
define a grammar for a particular format and attach an Action class to
process everything:

    my $action = MyActionClass.new();
    while Bio::Grammar::Fasta.parse($filehandle, :action($action)) {
        # do interesting things with data from $action
    }

In this case the Action class could create a Seq out of all the data,
or possibly create something much more lightweight and lazily evaluated
(for instance, use the file points instead of the actual text). The
grammar in this case would essentially be C- or PIR-based I believe.

Note the quotes above with 'solved'; with Rakudo you can almost do this
now, however some of the Perl 6 specification needs to be fleshed out
re: Grammars, and the grammar engine for Parrot (PGE) needs to be
properly set up for iteration through a stream. There is enough
interest that I think things could be worked out fairly quickly (e.g.
months, not years).

chris

From maj at fortinbras.us Wed Jul 8 21:48:39 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 8 Jul 2009 21:48:39 -0400
Subject: [Bioperl-l] Bioperl Installation
In-Reply-To:
References:
Message-ID: <5EFB3F2EAA9E486E8B0C248180EA0E34@NewLife>

Hi Koen-

I've put the link on Installing BioPerl (tho it seems bizarre that you
weren't able to make that mod).
Thanks!
MAJ

----- Original Message -----
From: "Koen van der Drift"
To: "BioPerl List"
Cc:
Sent: Monday, July 06, 2009 6:41 PM
Subject: Re: [Bioperl-l] Bioperl Installation

> Hi,
>
> Installation problems on a Mac seems to be a recurring question on this
> mailing list.
Just as a reminder, besides CPAN, one of the easiest ways to > install bioperl on a Mac is through fink. The instructions are available on > the bioperl website here: > http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink , but are > rather hidden. > > Maybe http://www.bioperl.org/wiki/Installing_BioPerl can be edited to state > Installing Bioperl for Unix (including Mac OS X)? I don't seem to have > privileges to edit that page, so I'll leave that up to the team. > > Also, the file PACKAGES contains a link about installation on Mac OS X that > is *very* outdated. Can this be removed from the package, I think it only > creates confusion? > > Cheers, > > - Koen. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From maj at fortinbras.us Wed Jul 8 21:51:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 8 Jul 2009 21:51:40 -0400 Subject: [Bioperl-l] While loop - SearchIO for BioPerl In-Reply-To: References: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDDBE7@USMDLMDOWX028.dow.com> <81487F67-8861-4847-A932-79AE2AB50BB5@illinois.edu> Message-ID: <3E96958FF18E4BA5A6446AC690DEC0C3@NewLife> Allow me to shamelessly plug the following: http://www.bioperl.org/wiki/HOWTO:Tiling#Quick_and_Dirty_.22Tiling.22 MAJ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Rytsareva, Inna (I)" ; Sent: Wednesday, July 08, 2009 9:41 PM Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl > Yep, that's what I was thinking. The fragment in question is fairly > short. > > Inna, if you want the best HSP you could just grab the one that best > fits what you expect (best eval, score, whatever). > > chris > > On Jul 8, 2009, at 8:31 PM, Mark A. 
Jensen wrote: > >> A lack of low-complexity filtering (as seems apparent from this >> report snippet, if >> I understand that concept correctly) could explain the multiple >> query hits on a >> short (24bp) region of the same subject... >> ----- Original Message ----- From: "Chris Fields" > > >> To: "Rytsareva, Inna (I)" >> Cc: >> Sent: Wednesday, July 08, 2009 9:08 PM >> Subject: Re: [Bioperl-l] While loop - SearchIO for BioPerl >> >> >>> I'm curious as to what this report looks like. The example report >>> you posted to the gbrowse list had serious issues (different >>> problem, 'No midline' error which I replicated); mainly there were >>> no blank lines making it pretty much invalid, so the parser had >>> issues with it. Example lines from one HSP: >>> >>> > gnl|DAS|24699 pDAB101580 >>> Length = 12942 >>> Score = 50.1 bits (25), Expect = 5e-06 >>> Identities = 37/41 (90%) >>> Strand = Plus / Plus >>> Query: 10 ccaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 50 >>> ||||||||||||||| ||| |||||||| ||||| |||||| >>> Sbjct: 4619 ccaaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >>> Score = 46.1 bits (23), Expect = 8e-05 >>> Identities = 35/39 (89%) >>> Strand = Plus / Plus >>> Query: 13 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 51 >>> ||||||||||||| ||| |||||||| ||||| |||||| >>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >>> Score = 46.1 bits (23), Expect = 8e-05 >>> Identities = 35/39 (89%) >>> Strand = Plus / Plus >>> Query: 14 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 52 >>> ||||||||||||| ||| |||||||| ||||| |||||| >>> Sbjct: 4621 aaaaaaaaaaaaagaaagaaaaaaaagaaaaagaaaaaa 4659 >>> >>> ... 
>>>
>>> chris
>>>
>>> On Jul 8, 2009, at 2:42 PM, Rytsareva, Inna (I) wrote:
>>>
>>>> Hello,
>>>>
>>>> I have the following script to parse the BLAST report:
>>>>
>>>> my $in = Bio::SearchIO->new ( -file   => $out_file,
>>>>                               -format => 'blast') or die $!;
>>>>
>>>> while (my $result = $in->next_result) {
>>>>     while (my $hit = $result->next_hit) {
>>>>         while (my $hsp = $hit->next_hsp) {
>>>>             $qhit  = $hit->name;
>>>>             $start = $hsp->hit->start;
>>>>             $end   = $hsp->hit->end;
>>>>         }
>>>>     }
>>>>     print "Hit= ", $qhit,
>>>>           ",Start = ", $start,
>>>>           ",End = ", $end, "\n";
>>>> }
>>>>
>>>> Usually, the report has a number of the same HSPs for each hit.
>>>> Using the "print" command it gives me a hit name, start and end
>>>> positions for each hit, except the last one. For the last one it
>>>> prints all the HSPs. Something like this:
>>>>
>>>> Hit= gnl|DAS|22386,Start = 7578,End = 7601
>>>> Hit= gnl|DAS|25627,Start = 2824,End = 2863
>>>> Hit= gnl|DAS|25328,Start = 8864,End = 8887
>>>> Hit= gnl|DAS|4890,Start = 1896,End = 1919
>>>> Hit= gnl|DAS|12191,Start = 1898,End = 1921
>>>> Hit= gnl|DAS|4276,Start = 557,End = 580
>>>> Hit= gnl|DAS|12959,Start = 801,End = 824
>>>> Hit= gnl|DAS|4092,Start = 2266,End = 2304
>>>> Hit= gnl|DAS|19740,Start = 13572,End = 13610
>>>> Hit= gnl|DAS|12393,Start = 3901,End = 3924
>>>> Hit= gnl|DAS|25687,Start = 10415,End = 10438
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>> Hit= gnl|DAS|12277,Start = 7410,End = 7433
>>>>
>>>> Where Hit= gnl|DAS|12277,Start = 7410,End = 7433 is the last one.
>>>> I don't need these duplicates.
>>>> How can I fix that?
>>>>
>>>> Thanks,
>>>> Inna Rytsareva
>>>> Discovery Information Management
>>>> Dow AgroSciences
>>>> Indianapolis, IN
>>>> 317-337-4716
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From Brotelzwieb at gmx.de Thu Jul 9 06:16:06 2009
From: Brotelzwieb at gmx.de (Jonas Schaer)
Date: Thu, 9 Jul 2009 12:16:06 +0200
Subject: [Bioperl-l] cdd-search with remoteblast?
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
 <46A05E0132144D73A0F805953B580B2F@jonas>
 <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz>
Message-ID: <426C1893A5AD499DB4DBFEEBD257B254@jonas>

Hi guys,
Thank you all so much for your help and patience :). Of course you were
right, and I finally found the right put-parameter to get exactly the same
hits as on the homepage.
I do have another question though :)...
I now want to include a search for conserved domains, but when I try to use
the CDD_SEARCH parameter
(http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub:CDD_SEARCH)
like the other put-parameters, the way Chris once told me (which works fine
with the other params):

my %put = (
    WORD_SIZE    => 3,
    HITLIST_SIZE => 100,
    THRESHOLD    => 11,
    FILTER       => 'R',
    GENETIC_CODE => 1,
    CDD_SEARCH   => 'on'    ### I tried it with 'true' and '1', too.
);

for my $putName (keys %put) {
    $factory->submit_parameter($putName, $put{$putName});
}

...an exception is thrown:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: CDD_SEARCH is not a valid PUT parameter.
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:325
STACK: main::blast_a_sequence firsteval0.8.pm:383
STACK: main::blast_it firsteval0.8.pm:288
STACK: firsteval0.8.pm:35
-----------------------------------------------------------

I guess somehow this could be the solution to my problem:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID-for-Simultaneous
but unfortunately I don't understand what to do.
I'm so sorry to bother you with this, but please help me once more... :)

Best regards and thanks in advance,
Jonas

----- Original Message -----
From: "Smithies, Russell"
To: "'Jonas Schaer'"
Cc: "'Chris Fields'" ; "'BioPerl List'"
Sent: Monday, July 06, 2009 10:56 PM
Subject: RE: [Bioperl-l] different results with remote-blast skript

Hi Jonas,
You can't just play with the BLAST parameters and hope for a "better" result.
I'd suggest that if you aren't sure what they do, you should leave them alone,
as small changes can make huge differences in the output - it's quite possible
to miss finding what you're looking for by using the wrong parameters.
If all else fails, read the blast manual: http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html http://www.ncbi.nlm.nih.gov/blast/tutorial/ Or Read Ian Korfs' excellent book: http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 Don't worry about the integer overflow bug as there's nothing you can do about it. If you're interested, Google and Wikipedia are your friends: http://en.wikipedia.org/wiki/Integer_overflow Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > Sent: Tuesday, 7 July 2009 12:14 a.m. > To: BioPerl List; Chris Fields > Subject: Re: [Bioperl-l] different results with remote-blast skript > > Hi guys, thanks for your answers so far. > @jason: integer overflow in blast.... sorry, but what do you mean by that? > how can I fix it...? > > Since I never really changed any parameters I thought them all to be > default. > whatever, I tried to get "better" results with my prog by changing > these: > $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = > '1'; > with no effect...I guess these were default values anyway. > > So please maybe you can tell me all the other parameters I can change with > my > perl-skript AND how to do that? > Unfortunately both, perl and the blast-algorithm are pretty much new to > me, > maybe thats why I just cannot find out how to do that on my own... 
:/ > > Here is the output I get with my remote-blast skript: > ############################################################################## > ################################### > Query Name: > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > L > hit name is ref|XP_001702807.1| > score is 442 > BLASTP 2.2.21+ > Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped > BLAST and PSI-BLAST: a new generation of protein database search > programs", > Nucleic Acids Res. 25:3389-3402. > > > Reference for composition-based statistics: Alejandro A. > Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, > Yuri > I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), "Improving the > accuracy of PSI-BLAST protein database searches with composition-based > statistics and other refinements", Nucleic Acids Res. 29:2994-3005. > > > RID: 53STX5G2013 > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF excluding environmental samples > from WGS projects > 9,252,587 sequences; 3,169,972,781 total letters Query= > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM > ATGPDPDDEYE > Length=150 > > > Score > E > Sequences producing significant alignments: (Bits) > Value > > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 174 > 2e-42 > > > ALIGNMENTS > >ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] > gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > Length=303 > > Score = 174 bits (442), Expect = 2e-42, Method: Composition-based > stats. 
> Identities = 150/150 (100%), Positives = 150/150 (100%), Gaps = 0/150 > (0%) > > Query 1 MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds > 60 > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > Sbjct 154 MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > 213 > > Query 61 dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > 120 > DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > Sbjct 214 DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > 273 > > Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > AWHERDDNAFRQAHQNTAMATGPDPDDEYE > Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > > > > Database: All non-redundant GenBank CDS > translations+PDB+SwissProt+PIR+PRF > excluding environmental samples from WGS projects > Posted date: Jul 5, 2009 4:41 AM > Number of letters in database: -1,124,994,511 > Number of sequences in database: 9,252,587 > > Lambda K H > 0.309 0.122 0.345 > Gapped > Lambda K H > 0.267 0.0410 0.140 > Matrix: BLOSUM62 > Gap Penalties: Existence: 11, Extension: 1 > Number of Sequences: 9252587 > Number of Hits to DB: 60273703 > Number of extensions: 1448367 > Number of successful extensions: 2103 > Number of sequences better than 10: 0 > Number of HSP's better than 10 without gapping: 0 > Number of HSP's gapped: 2113 > Number of HSP's successfully gapped: 0 > Length of query: 150 > Length of database: 3169972781 > Length adjustment: 113 > Effective length of query: 37 > Effective length of database: 2124430450 > Effective search space: 78603926650 > Effective search space used: 78603926650 > T: 11 > A: 40 > X1: 16 (7.1 bits) > X2: 38 (14.6 bits) > X3: 64 (24.7 bits) > S1: 42 (20.8 bits) > S2: 74 (33.1 bits) > > ############################################################################## > ################################### > and here are the hits (?) 
of the blast-algorithm on the ncbi-homepage with > the same query of course: > ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard... 300 > 3e-80 > ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho... 36.2 > 1.1 > ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia... 35.4 > 1.8 > ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil... 34.3 > 4.2 > ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n... 33.5 > 6.0 > ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba... 33.5 > 7.0 > ############################################################################## > ###################################at > least the first hit is the same, but even there there is a different score > and e-value. > > thanks so much for any help :) > regards, jonas > > > ----- Original Message ----- > From: "Chris Fields" > To: "Jason Stajich" > Cc: "Smithies, Russell" ; "'BioPerl > List'" ; "'Jonas Schaer'" > > Sent: Monday, July 06, 2009 12:51 AM > Subject: Re: [Bioperl-l] different results with remote-blast skript > > > > That inspires confidence ;> > > > > chris > > > > On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > > > >> integer overflow in blast.... > >> > >> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >> > >>> I'd guess it's a difference in the parameters used. 
> >>> Interesting that both have the number of letters in the db as > >>> "-1,125,070,205", I assume that's a bug :-) > >>> > >>> Stats from your remote_blast: > >>> > >>> 'stats' => { > >>> 'S1' => '42', > >>> 'S1_bits' => '20.8', > >>> 'lambda' => '0.309', > >>> 'entropy' => '0.345', > >>> 'kappa_gapped' => '0.0410', > >>> 'T' => '11', > >>> 'kappa' => '0.122', > >>> 'X3_bits' => '24.7', > >>> 'X1' => '16', > >>> 'lambda_gapped' => '0.267', > >>> 'X2' => '38', > >>> 'S2' => '74', > >>> 'seqs_better_than_cutoff' => '0', > >>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>> 'Hits_to_DB' => '60102303', > >>> 'dbletters' => '-1125070205', > >>> 'A' => '40', > >>> 'num_successful_extensions' => '2004', > >>> 'num_extensions' => '1436892', > >>> 'X1_bits' => '7.1', > >>> 'X3' => '64', > >>> 'entropy_gapped' => '0.140', > >>> 'dbentries' => '9252258', > >>> 'X2_bits' => '14.6', > >>> 'S2_bits' => '33.1' > >>> } > >>> > >>> > >>> Stats from a blast done on the NCBI webpage: > >>> > >>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt > >>> +PIR+PRF > >>> excluding environmental samples from WGS projects > >>> Posted date: Jul 4, 2009 4:41 AM > >>> Number of letters in database: -1,125,070,205 > >>> Number of sequences in database: 9,252,258 > >>> > >>> Lambda K H > >>> 0.309 0.124 0.340 > >>> Gapped > >>> Lambda K H > >>> 0.267 0.0410 0.140 > >>> Matrix: BLOSUM62 > >>> Gap Penalties: Existence: 11, Extension: 1 > >>> Number of Sequences: 9252258 > >>> Number of Hits to DB: 86493230 > >>> Number of extensions: 3101413 > >>> Number of successful extensions: 9001 > >>> Number of sequences better than 100: 65 > >>> Number of HSP's better than 100 without gapping: 0 > >>> Number of HSP's gapped: 9000 > >>> Number of HSP's successfully gapped: 66 > >>> Length of query: 150 > >>> Length of database: 3169897087 > >>> Length adjustment: 113 > >>> Effective length of query: 37 > >>> Effective length of database: 2124391933 > >>> Effective search space: 78602501521 
> >>> Effective search space used: 78602501521 > >>> T: 11 > >>> A: 40 > >>> X1: 16 (7.1 bits) > >>> X2: 38 (14.6 bits) > >>> X3: 64 (24.7 bits) > >>> S1: 42 (20.8 bits) > >>> S2: 65 (29.6 bits) > >>> > >>> > >>>> -----Original Message----- > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > >>>> Sent: Sunday, 28 June 2009 10:15 p.m. > >>>> To: BioPerl List > >>>> Subject: [Bioperl-l] different results with remote-blast skript > >>>> > >>>> Hi again :) > >>>> please, I only have this little question: > >>>> why do I get different results with my remote::blast perl skript > >>>> then on the > >>>> ncbi blast homepage? > >>>> I am using blastp, the query is an amino-sequence (different > >>>> results with any > >>>> sequence, differences not only in number of hits but even in e- > >>>> values, scores > >>>> etc...), the database is 'nr'. > >>>> PLEASE help me, > >>>> thank you in advance, > >>>> Jonas > >>>> > >>>> ps: my skript: > >>>> > ############################################################################## > >>>> ## > >>>> use Bio::Seq::SeqFactory; > >>>> use Bio::Tools::Run::RemoteBlast; > >>>> use strict; > >>>> my @blast_report; > >>>> my $prog = 'blastp'; > >>>> my $db = 'nr'; > >>>> my $e_val= '1e-10'; > >>>> #my $e_val= '10'; > >>>> my @params = ( '-prog' => $prog, > >>>> '-data' => $db, > >>>> '-expect' => $e_val, > >>>> '-readmethod' => 'SearchIO' ); > >>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>> $ > >>>> Bio > >>>> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} > >>>> = '1'; > >>>> > >>>> my > >>>> $ > >>>> blast_seq > >>>> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > >>>> > 
SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPD > >>>> PDDEYE'; > >>>> #$v is just to turn on and off the messages > >>>> my $v = 1; > >>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > >>>> 'Bio::PrimarySeq'); > >>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => > >>>> "$blast_seq"); > >>>> my $filename='temp2.out'; > >>>> my $r = $factory->submit_blast($seq); > >>>> print STDERR "waiting..." if( $v > 0 ); > >>>> while ( my @rids = $factory->each_rid ) > >>>> { > >>>> foreach my $rid ( @rids ) > >>>> { > >>>> my $rc = $factory->retrieve_blast($rid); > >>>> if( !ref($rc) ) > >>>> { > >>>> if( $rc < 0 ) > >>>> { > >>>> $factory->remove_rid($rid); > >>>> } > >>>> print STDERR "." if ( $v > 0 ); > >>>> } > >>>> else > >>>> { > >>>> my $result = $rc->next_result(); > >>>> $factory->save_output($filename); > >>>> $factory->remove_rid($rid); > >>>> print "\nQuery Name: ", $result->query_name(), > >>>> "\n"; > >>>> while ( my $hit = $result->next_hit ) > >>>> { > >>>> next unless ( $v > 0); > >>>> print "\thit name is ", $hit->name, "\n"; > >>>> while( my $hsp = $hit->next_hsp ) > >>>> { > >>>> print "\t\tscore is ", $hsp->score, "\n"; > >>>> } > >>>> } > >>>> } > >>>> } > >>>> > >>>> > >>>> } > >>>> @blast_report = get_file_data ($filename); > >>>> return @blast_report; > >>>> > ############################################################################## > >>>> #### > >>>> _______________________________________________ > >>>> Bioperl-l mailing list > >>>> Bioperl-l at lists.open-bio.org > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> = > >>> = > >>> ===================================================================== > >>> Attention: The information contained in this message and/or > >>> attachments > >>> from AgResearch Limited is intended only for the persons or entities > >>> to which it is addressed and may contain confidential and/or > >>> privileged > >>> material. 
> >>
> >> --
> >> Jason Stajich
> >> jason at bioperl.org

From cjfields at illinois.edu Thu Jul 9 11:08:53 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 9 Jul 2009 10:08:53 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To: <426C1893A5AD499DB4DBFEEBD257B254@jonas>
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
 <46A05E0132144D73A0F805953B580B2F@jonas>
 <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz>
 <426C1893A5AD499DB4DBFEEBD257B254@jonas>
Message-ID:

I'm not sure, but I think adding this in will take a little work (we'll
need to catch the RID returned, which I'm fairly sure will require some
modifications to checking the returned output). I would also have to
look at the RemoteBlast API to see how this would fit in (I'm assuming
we could either lump it in with other returned RIDs or create a new
method for that).

You are more than welcome to add this in as an enhancement request to
bugzilla for BioPerl:

http://bugzilla.open-bio.org/

chris

On Jul 9, 2009, at 5:16 AM, Jonas Schaer wrote:

> Hi guys,
> Thank you all so much for your help and patience :). Of course you
> were right, and I finally found the right put-parameter to get exactly
> the same hits as on the homepage.
> I do have another question though :)...
> I now want to include a search for conserved domains, but when I try
> to use the CDD_SEARCH parameter
> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub:CDD_SEARCH)
> like the other put-parameters, the way Chris once told me (which works
> fine with the other params):
>
> my %put = (
>     WORD_SIZE    => 3,
>     HITLIST_SIZE => 100,
>     THRESHOLD    => 11,
>     FILTER       => 'R',
>     GENETIC_CODE => 1,
>     CDD_SEARCH   => 'on'    ### I tried it with 'true' and '1', too.
> );
>
> for my $putName (keys %put) {
>     $factory->submit_parameter($putName, $put{$putName});
> }
>
> ...an exception is thrown:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: CDD_SEARCH is not a valid PUT parameter.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:325
> STACK: main::blast_a_sequence firsteval0.8.pm:383
> STACK: main::blast_it firsteval0.8.pm:288
> STACK: firsteval0.8.pm:35
> -----------------------------------------------------------
>
> I guess somehow this could be the solution to my problem:
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID-for-Simultaneous
> but unfortunately I don't understand what to do.
> I'm so sorry to bother you with this, but please help me once more... :)
>
> Best regards and thanks in advance,
> Jonas
From tristan.lefebure at gmail.com Thu Jul 9 11:50:20 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 9 Jul 2009 11:50:20 -0400
Subject: [Bioperl-l] Bootstrap, root, reroot...
Message-ID: <200907091150.20729.tristan.lefebure@gmail.com>

Hello,

I have been bumping into problems while rerooting trees that contained bootstrap scores. Basically, after re-rooting the tree, some scores end-up at the wrong place (i.e. node) and some nodes lose their score.
I found this thread from Bank Beszteri, back in 2007, that explains exactly the same problems:

http://lists.open-bio.org/pipermail/bioperl-l/2007-May/025599.html

I attach a script that reproduces the bug and implements the fix that Bank described (at least this is my understanding, and it works on this example):

#! /usr/bin/perl

use strict;
use warnings;
use Bio::TreeIO;

my $in = Bio::TreeIO->new(-format => 'newick',
                          -fh => \*DATA,
                          -internal_node_id => 'bootstrap');

my $out = Bio::TreeIO->new(-format => 'newick', -file => ">out.tree");

while( my $t = $in->next_tree ){
    my $old_root = $t->get_root_node();
    my ($b) = $t->find_node(-id =>"B");
    my $b_anc = $b->ancestor;
    $out->write_tree($t);

    #reroot with B -> wrong, and the tree is kind of weird
    $t->reroot($b);
    $out->write_tree($t);

    #reroot with B ancestor -> wrong
    $t->reroot($b_anc);
    $out->write_tree($t);

    #a fix, following Bank Beszteri description
    my $node = $old_root;
    while (my $anc_node = $node->ancestor) {
        $node->bootstrap($anc_node->bootstrap());
        $anc_node->bootstrap('');
        $node = $anc_node;
    }
    $out->write_tree($t); #->good this time
}

__DATA__
(A:52,(B:46,C:50)68:11,D:70);

Here is the output:

(A:52,(B:46,C:50)68:11,D:70);
((C:50,(A:52,D:70):11)68:46)B;
(B:46,C:50,(A:52,D:70):11)68;
(B:46,C:50,(A:52,D:70)68:11);

Trees #2 and #3 have the score 68 moved to the wrong node, while tree #4 is OK. (BTW, tree #2 is really weird: unless B is the real ancestor (a fossil?), it does not make much sense to me.)

My understanding here is that the problem is linked to the well-known difficulty of telling node labels from branch labels in newick trees. Bootstrap scores are branch attributes, not node attributes, but since Bio::TreeI has no branch/edge/bipartition object they are attached to a node, and in fact reflect the bootstrap score of the ancestral branch leading to that node.
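Because each node carries the attributes of the branch leading to its parent, rerooting has to hand those attributes over along the path between the old and the new root. Below is a minimal sketch of that idea in plain Perl, without BioPerl. The hash layout, the helper names reroot() and to_newick(), and the labels X and R for the two unnamed internal nodes are all made up for illustration; this is not how Bio::Tree stores trees, only a toy model of the same newick example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy model of (A:52,(B:46,C:50)68:11,D:70);
# Each node stores the attributes of the branch to its parent.
# X and R are hypothetical names for the unnamed internal nodes.
my @order = qw(A X B C D R);    # fixes the output order of children
my %node  = (
    R => { parent => undef },
    A => { parent => 'R', length => 52 },
    X => { parent => 'R', length => 11, support => 68 },
    B => { parent => 'X', length => 46 },
    C => { parent => 'X', length => 50 },
    D => { parent => 'R', length => 70 },
);

# Reroot on an existing node, shifting branch attributes with the branches.
sub reroot {
    my ($new_root) = @_;

    # Path from the new root up to the old root.
    my @path = ($new_root);
    while (defined(my $p = $node{ $path[-1] }{parent})) {
        push @path, $p;
    }

    # Start at the old root so that every child is read before it is changed.
    for my $i (reverse 1 .. $#path) {
        my ($old_parent, $old_child) = @path[$i, $i - 1];
        $node{$old_parent}{parent} = $old_child;
        # The branch is the same one, it is just owned by the other end now.
        $node{$old_parent}{$_} = $node{$old_child}{$_} for qw(length support);
    }

    # The new root has no parent branch any more.
    $node{$new_root}{$_} = undef for qw(parent length support);
    return $new_root;
}

# Serialise the subtree below $name as newick (without the trailing ';').
sub to_newick {
    my ($name) = @_;
    my $n    = $node{$name};
    my @kids = grep { defined $node{$_}{parent} && $node{$_}{parent} eq $name }
               @order;
    my $label = @kids
        ? '(' . join(',', map { to_newick($_) } @kids) . ')'
        : $name;
    $label .= $n->{support} if @kids && defined $n->{support};
    $label .= ':' . $n->{length} if defined $n->{length};
    return $label;
}

print to_newick('R'), ";\n";    # (A:52,(B:46,C:50)68:11,D:70);
reroot('X');
print to_newick('X'), ";\n";    # (B:46,C:50,(A:52,D:70)68:11);
```

In this toy model the score 68 ends up on the (A,D) clade together with the branch length 11, which is the placement reported as correct for tree #4 above.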
Trouble naturally comes when you are dealing with an unrooted tree or when you reroot a tree: a child can become an ancestor and, if the bootstrap score is not moved from the old child to the new child, it will end up attached at the wrong place (i.e. the wrong node).

I see several possible fixes:

1- Incorporate Bank's fix into the root() method, i.e. if there are bootstrap scores, after re-rooting, the ones on the path from the old root to the new one should be moved to the right nodes.

2- Modify the way trees are stored in bioperl to incorporate a branch/edge/bipartition object, and move the bootstrap scores to it. That won't be easy and will break many things...

What do you think?

--Tristan

From MEC at stowers.org Thu Jul 9 11:56:25 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Thu, 9 Jul 2009 10:56:25 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To: <426C1893A5AD499DB4DBFEEBD257B254@jonas>
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas>
Message-ID:

Jonas,

If you want to continue to use the bioperl remoteblast interface, probably what you should do is simply call it twice.

First, call it as you already know how to do, which will return results without the CDD hits.

Second, to get the CDD results, call remoteblast again, this time using

-database => 'CDD'
-program => 'rpsblast'

However, the wrapper may object to the 'rpsblast' program. It is not listed in the POD (http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm).

If so, my guess is that changing the perl wrapper to allow rpsblast will "just work" (tm). I've cc:ed cjfields at bioperl.org for his opinion on this.

Also, you might want to perform the CDD search first, especially if you are streaming results to someone who might like something to look at while the second (presumably longer) search is running.
Cheers,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Jonas Schaer
> Sent: Thursday, July 09, 2009 5:16 AM
> To: BioPerl List; Smithies, Russell
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Hi guys,
> Thank you all so much for your help and patience :). Of
> course you were right and I finaly found the right
> put-parameter to get exactly the same hits as on the homepage.
> I do have an other question though :)...
> I now want to include a search for conserved domains, but
> when I try to use the CDD_SEARCH-parameter
> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub:CDD_SEARCH)
> like the other put-parameters the way chris once told
> me(works fine with the other params):
>
> my %put = (
>     WORD_SIZE => 3,
>     HITLIST_SIZE => 100,
>     THRESHOLD => 11,
>     FILTER => 'R',
>     GENETIC_CODE => 1,
>     CDD_SEARCH => 'on'    ###I tried it with 'true' and '1', too.
> );
>
> for my $putName (keys %put) {
>     $factory->submit_parameter($putName,$put{$putName});
> }
>
> ...an exception is thrown:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: CDD_SEARCH is not a valid PUT parameter.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:325
> STACK: main::blast_a_sequence firsteval0.8.pm:383
> STACK: main::blast_it firsteval0.8.pm:288
> STACK: firsteval0.8.pm:35
> -----------------------------------------------------------
>
> I guess somehow this could be the solution to my problem:
> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID-for-Simultaneous
> , but unfortunately I don't understand what to do.
> I'm so sorry to bother you with this but please help me once more...:)
>
> Best regards and thanks in advance,
> Jonas

From maj at fortinbras.us Thu Jul 9 14:02:01 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 9 Jul 2009 14:02:01 -0400
Subject: [Bioperl-l] Bootstrap, root, reroot...
In-Reply-To: <200907091150.20729.tristan.lefebure@gmail.com>
References: <200907091150.20729.tristan.lefebure@gmail.com>
Message-ID:

Hi Tristan--
Would you enter this in bugzilla? I did an overhaul of the root/reroot a while back, and maybe you're running into some stuff I need to check out. Thanks a lot-
Mark
From tristan.lefebure at gmail.com Thu Jul 9 14:30:57 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 9 Jul 2009 14:30:57 -0400
Subject: [Bioperl-l] Bootstrap, root, reroot...
In-Reply-To:
References: <200907091150.20729.tristan.lefebure@gmail.com>
Message-ID: <200907091430.58284.tristan.lefebure@gmail.com>

Done. bug #2877.
-Tristan

On Thursday 09 July 2009 14:02:01 Mark A. Jensen wrote:
> Hi Tristan--
> Would you enter this in bugzilla? I did an overhaul of
> the root/reroot a while back, and maybe you're running
> into some stuff I need to check out. Thanks a lot-
> Mark
From tristan.lefebure at gmail.com Thu Jul 9 15:18:39 2009
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 9 Jul 2009 15:18:39 -0400
Subject: [Bioperl-l] Bootstrap, root, reroot...
In-Reply-To: <200907091430.58284.tristan.lefebure@gmail.com>
References: <200907091150.20729.tristan.lefebure@gmail.com> <200907091430.58284.tristan.lefebure@gmail.com>
Message-ID:

I just add a quick look at the reroot() function of TreeFunctionsI, and it looks like that what should be done for the bootstrap scores is what is already done for the branch lengths.
See this loop starting line 954: # reverse the ancestor & children pointers my $former_anc = $tmp_node->ancestor; my @path_from_oldroot = ($self->get_lineage_nodes($tmp_node), $tmp_node); for (my $i = 0; $i < @path_from_oldroot - 1; $i++) { my $current = $path_from_oldroot[$i]; my $next = $path_from_oldroot[$i + 1]; $current->remove_Descendent($next); $current->branch_length($next->branch_length); $next->add_Descendent($current); } It makes sense to me to treat bootstrap and branch lenght in a similar way: the branch lengths are stored inside the node object, but as the bootstrap, they really are branch attributes... Nope? -Tristan On Thu, Jul 9, 2009 at 2:30 PM, Tristan Lefebure wrote: > Done. bug #2877. > -Tristan > > On Thursday 09 July 2009 14:02:01 Mark A. Jensen wrote: > > Hi Tristan-- > > Would you enter this in bugzilla? I did an overhaul of > > the root/reroot a while back, and maybe you're running > > into some stuff I need to check out. Thanks a lot- > > Mark > > ----- Original Message ----- > > From: "Tristan Lefebure" > > To: "BioPerl List" > > Sent: Thursday, July 09, 2009 11:50 AM > > Subject: [Bioperl-l] Bootstrap, root, reroot... > > > > > Hello, > > > > > > I have been bumping into problems while rerooting trees > > > that contained bootstrap scores. Basically, after > > > re-rooting the tree, some scores end-up at the wrong > > > place (i.e. node) and some nodes lose their score. I > > > found this thread from Bank Beszter, back in 2007, that > > > exactly explains the same problems: > > > > > > http://lists.open-bio.org/pipermail/bioperl-l/2007- > > > May/025599.html > > > > > > I attach a script that reproduces the bug and > > > implements the fix that Bank described (at least this > > > is my understanding, and it works on this example): > > > > > > > > > #! 
/usr/bin/perl > > > > > > use strict; > > > use warnings; > > > use Bio::TreeIO; > > > > > > > > > my $in = Bio::TreeIO->new(-format => 'newick', > > > -fh => \*DATA, > > > -internal_node_id => 'bootstrap'); > > > > > > my $out = Bio::TreeIO->new(-format => 'newick', -file > > > => ">out.tree"); > > > > > > while( my $t = $in->next_tree ){ > > > my $old_root = $t->get_root_node(); > > > my ($b) = $t->find_node(-id =>"B"); > > > my $b_anc = $b->ancestor; > > > $out->write_tree($t); > > > > > > #reroot with B -> wrong, and the tree is kind of weird > > > $t->reroot($b); > > > $out->write_tree($t); > > > > > > #reroot with B ancestor -> wrong > > > $t->reroot($b_anc); > > > $out->write_tree($t); > > > > > > #a fix, following Bank Beszteri description > > > my $node = $old_root; > > > while (my $anc_node = $node->ancestor) { > > > $node->bootstrap($anc_node->bootstrap()); > > > $anc_node->bootstrap(''); > > > $node = $anc_node; > > > } > > > $out->write_tree($t); #->good this time > > > } > > > > > > > > > __DATA__ > > > (A:52,(B:46,C:50)68:11,D:70); > > > > > > > > > Here is the output: > > > > > > (A:52,(B:46,C:50)68:11,D:70); > > > ((C:50,(A:52,D:70):11)68:46)B; > > > (B:46,C:50,(A:52,D:70):11)68; > > > (B:46,C:50,(A:52,D:70)68:11); > > > > > > > > > Tree #2 and #3 have the score 68 moved to the wrong > > > node, while tree #4 is OK. (BTW tree #2 is really > > > weird, except if B, is the real ancestor (a fossil ?), > > > it really does not make much sense to me). > > > > > > My understanding here is that the problem is linked to > > > the well-known difficulty to differentiate node from > > > branch labels in newick trees. Bootstrap scores are > > > branch attributes not node attributes, but since > > > Bio::TreeI has no branch/edge/bipartition object they > > > are attached to a node, and in fact reflects the > > > bootstrap score of the ancestral branch leading to that > > > node. 
Troubles naturally come when you are dealing with > > > an unrooted tree or reroot a tree: a child can become > > > an ancestor, and, if the bootstrap scores is not moved > > > from the old child to the new child, it will end up > > > attached at the wrong place (i.e. wrong node). > > > > > > I see several fix to that: > > > > > > 1- incorporate Bank's fix into the root() method. I.e. > > > if there is bootstrap score, after re-rooting, the one > > > on the old to new ancestor path, should be moved to the > > > right node. > > > > > > 2- Modify the way trees are stored in bioperl to > > > incorporate branch/edge/bipartition object, and move > > > the bootstrap scores to them. That won't be easy and > > > will break many things... > > > > > > > > > What do you think? > > > > > > --Tristan > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields1 at gmail.com Thu Jul 9 15:19:15 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Thu, 9 Jul 2009 14:19:15 -0500 Subject: [Bioperl-l] cdd-search with remoteblast? In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas> Message-ID: <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com> I've scheduled this tentatively for the 1.6 release series (just not sure when yet). It may work as is, but I haven't tried it out yet (and am hazarding to guess it only retrieves the single main RID at the moment). chris On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: > Jonas, > > If you want to continue to use the bioperl remoteblast interface, > probably what you should do is simply call it twice. 
> > Once, as you already know how to do, which will return without CDD > results. > > Secondly, to get the CDD results, call remoteblast a second time. > This time, using > -database => 'CDD' > -program => 'rpsblast' > > However, the wrapper may object to the 'rpsblast' program. It is > not listed in the POD - http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm) > If so, my guess is that changing the perl wrapper to allow > rpsblast will "just work" (tm). I've cc:ed cjfields at bioperl.org for > his opinion on this. > > Also, you might want to perform the CDD search first, especially if > you are streaming results to eyeball that might like something to > look at while the second (presumably longer) search is running. > > Cheers, > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >> Jonas Schaer >> Sent: Thursday, July 09, 2009 5:16 AM >> To: BioPerl List; Smithies, Russell >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >> >> Hi guys, >> Thank you all so much for your help and patience :). Of >> course you were right and I finaly found the right >> put-parameter to get exactly the same hits as on the homepage. >> I do have an other question though :)... >> I now want to include a search for conserved domains, but >> when I try to use the CDD_SEARCH-parameter >> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >> sub:CDD_SEARCH) >> like the other put-parameters the way chris once told >> me(works fine with the other params): >> >> my %put = ( >> WORD_SIZE => 3, >> HITLIST_SIZE => 100, >> THRESHOLD => 11, >> FILTER => 'R', >> GENETIC_CODE => 1, >> CDD_SEARCH => 'on' >> ###I tried it >> with 'true' and '1', too. 
>> >> ); >> >> for my $putName (keys %put) { >> $factory->submit_parameter($putName,$put{$putName}); >> } >> >> >> ...an exception is thrown: >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: CDD_SEARCH is not a valid PUT parameter. >> STACK: Error::throw >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >> C:/Perl/site/lib/Bio/Tools >> /Run/RemoteBlast.pm:325 >> STACK: main::blast_a_sequence firsteval0.8.pm:383 >> STACK: main::blast_it firsteval0.8.pm:288 >> STACK: firsteval0.8.pm:35 >> ----------------------------------------------------------- . >> I guess somehow this could be the solution to my problem: >> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >> ub:RID-for-Simultaneous >> , but unfortunately I don't understand what to do. >> I'm so sorry to bother you with this but please help me once >> more...:) >> >> Best regards and thanks in advance, >> Jonas >> >> ----- Original Message ----- >> From: "Smithies, Russell" >> To: "'Jonas Schaer'" >> Cc: "'Chris Fields'" ; "'BioPerl List'" >> >> Sent: Monday, July 06, 2009 10:56 PM >> Subject: RE: [Bioperl-l] different results with remote-blast skript >> >> >> Hi Jonas, >> You can't just play with the BLAST parameters and hope for a "better" >> result. >> I'd suggest that if you aren't sure what they do, you should >> leave them >> alone as small changes can make huge differences in the >> output - it's quite >> possible to miss finding what you're looking for by using the wrong >> parameters. 
>> If all else fails, read the blast manual: >> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >> _all.html >> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >> Or Read Ian Korfs' excellent book: >> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp > fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >> >> Don't worry about the integer overflow bug as there's nothing >> you can do >> about it. If you're interested, Google and Wikipedia are your >> friends: >> http://en.wikipedia.org/wiki/Integer_overflow >> >> >> Russell >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>> To: BioPerl List; Chris Fields >>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>> >>> Hi guys, thanks for your answers so far. >>> @jason: integer overflow in blast.... sorry, but what do >> you mean by that? >>> how can I fix it...? >>> >>> Since I never really changed any parameters I thought them >> all to be >>> default. >>> whatever, I tried to get "better" results with my prog by changing >>> these: >>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>> >> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >> STICS'} = >>> '1'; >>> with no effect...I guess these were default values anyway. >>> >>> So please maybe you can tell me all the other parameters I >> can change with >>> my >>> perl-skript AND how to do that? >>> Unfortunately both, perl and the blast-algorithm are pretty >> much new to >>> me, >>> maybe thats why I just cannot find out how to do that on my >> own... 
:/ >>> >>> Here is the output I get with my remote-blast skript: >>> >> ############################################################## >> ################ >>> ################################### >>> Query Name: >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>> L >>> hit name is ref|XP_001702807.1| >>> score is 442 >>> BLASTP 2.2.21+ >>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >> A. Schaffer, >>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >> Lipman (1997), >>> "Gapped >>> BLAST and PSI-BLAST: a new generation of protein database search >>> programs", >>> Nucleic Acids Res. 25:3389-3402. >>> >>> >>> Reference for composition-based statistics: Alejandro A. >>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >> John L. Spouge, >>> Yuri >>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >> "Improving the >>> accuracy of PSI-BLAST protein database searches with >> composition-based >>> statistics and other refinements", Nucleic Acids Res. 29:2994-3005. >>> >>> >>> RID: 53STX5G2013 >>> >>> >>> Database: All non-redundant GenBank CDS >>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>> from WGS projects >>> 9,252,587 sequences; 3,169,972,781 total letters Query= >>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >>> ATGPDPDDEYE >>> Length=150 >>> >>> >>> >> Score >>> E >>> Sequences producing significant alignments: >> (Bits) >>> Value >>> >>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >> reinhard... 174 >>> 2e-42 >>> >>> >>> ALIGNMENTS >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] >>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>> Length=303 >>> >>> Score = 174 bits (442), Expect = 2e-42, Method: >> Composition-based >>> stats. 
>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >> Gaps = 0/150 >>> (0%) >>> >>> Query 1 >> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>> 60 >>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>> Sbjct 154 >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>> 213 >>> >>> Query 61 >> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>> 120 >>> >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>> Sbjct 214 >> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>> 273 >>> >>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>> >>> >>> >>> Database: All non-redundant GenBank CDS >>> translations+PDB+SwissProt+PIR+PRF >>> excluding environmental samples from WGS projects >>> Posted date: Jul 5, 2009 4:41 AM >>> Number of letters in database: -1,124,994,511 >>> Number of sequences in database: 9,252,587 >>> >>> Lambda K H >>> 0.309 0.122 0.345 >>> Gapped >>> Lambda K H >>> 0.267 0.0410 0.140 >>> Matrix: BLOSUM62 >>> Gap Penalties: Existence: 11, Extension: 1 >>> Number of Sequences: 9252587 >>> Number of Hits to DB: 60273703 >>> Number of extensions: 1448367 >>> Number of successful extensions: 2103 >>> Number of sequences better than 10: 0 >>> Number of HSP's better than 10 without gapping: 0 >>> Number of HSP's gapped: 2113 >>> Number of HSP's successfully gapped: 0 >>> Length of query: 150 >>> Length of database: 3169972781 >>> Length adjustment: 113 >>> Effective length of query: 37 >>> Effective length of database: 2124430450 >>> Effective search space: 78603926650 >>> Effective search space used: 78603926650 >>> T: 11 >>> A: 40 >>> X1: 16 (7.1 bits) >>> X2: 38 (14.6 bits) >>> X3: 64 (24.7 bits) >>> S1: 42 (20.8 bits) >>> S2: 74 (33.1 bits) >>> >>> >> ############################################################## >> ################ >>> ################################### >>> and here are 
the hits (?) of the blast-algorithm on the >> ncbi-homepage with >>> the same query of course: >>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >> reinhard... 300 >>> 3e-80 >>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >> [Acyrtho... 36.2 >>> 1.1 >>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >> [Blautia... 35.4 >>> 1.8 >>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >> brazil... 34.3 >>> 4.2 >>> ref|XP_680841.1| hypothetical protein AN7572.2 >> [Aspergillus n... 33.5 >>> 6.0 >>> ref|YP_001768110.1| hypothetical protein M446_1150 >> [Methyloba... 33.5 >>> 7.0 >>> >> ############################################################## >> ################ >>> ###################################at >>> least the first hit is the same, but even there there is a >> different score >>> and e-value. >>> >>> thanks so much for any help :) >>> regards, jonas >>> >>> >>> ----- Original Message ----- >>> From: "Chris Fields" >>> To: "Jason Stajich" >>> Cc: "Smithies, Russell" >> ; "'BioPerl >>> List'" ; "'Jonas Schaer'" >>> >>> Sent: Monday, July 06, 2009 12:51 AM >>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>> >>> >>>> That inspires confidence ;> >>>> >>>> chris >>>> >>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>> >>>>> integer overflow in blast.... >>>>> >>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>> >>>>>> I'd guess it's a difference in the parameters used. 
>>>>>> Interesting that both have the number of letters in the db as >>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>> >>>>>> Stats from your remote_blast: >>>>>> >>>>>> 'stats' => { >>>>>> 'S1' => '42', >>>>>> 'S1_bits' => '20.8', >>>>>> 'lambda' => '0.309', >>>>>> 'entropy' => '0.345', >>>>>> 'kappa_gapped' => '0.0410', >>>>>> 'T' => '11', >>>>>> 'kappa' => '0.122', >>>>>> 'X3_bits' => '24.7', >>>>>> 'X1' => '16', >>>>>> 'lambda_gapped' => '0.267', >>>>>> 'X2' => '38', >>>>>> 'S2' => '74', >>>>>> 'seqs_better_than_cutoff' => '0', >>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>> 'Hits_to_DB' => '60102303', >>>>>> 'dbletters' => '-1125070205', >>>>>> 'A' => '40', >>>>>> 'num_successful_extensions' => '2004', >>>>>> 'num_extensions' => '1436892', >>>>>> 'X1_bits' => '7.1', >>>>>> 'X3' => '64', >>>>>> 'entropy_gapped' => '0.140', >>>>>> 'dbentries' => '9252258', >>>>>> 'X2_bits' => '14.6', >>>>>> 'S2_bits' => '33.1' >>>>>> } >>>>>> >>>>>> >>>>>> Stats from a blast done on the NCBI webpage: >>>>>> >>>>>> Database: All non-redundant GenBank CDS >> translations+PDB+SwissProt >>>>>> +PIR+PRF >>>>>> excluding environmental samples from WGS projects >>>>>> Posted date: Jul 4, 2009 4:41 AM >>>>>> Number of letters in database: -1,125,070,205 >>>>>> Number of sequences in database: 9,252,258 >>>>>> >>>>>> Lambda K H >>>>>> 0.309 0.124 0.340 >>>>>> Gapped >>>>>> Lambda K H >>>>>> 0.267 0.0410 0.140 >>>>>> Matrix: BLOSUM62 >>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>> Number of Sequences: 9252258 >>>>>> Number of Hits to DB: 86493230 >>>>>> Number of extensions: 3101413 >>>>>> Number of successful extensions: 9001 >>>>>> Number of sequences better than 100: 65 >>>>>> Number of HSP's better than 100 without gapping: 0 >>>>>> Number of HSP's gapped: 9000 >>>>>> Number of HSP's successfully gapped: 66 >>>>>> Length of query: 150 >>>>>> Length of database: 3169897087 >>>>>> Length adjustment: 113 >>>>>> Effective length of query: 37 >>>>>> Effective length 
of database: 2124391933 >>>>>> Effective search space: 78602501521 >>>>>> Effective search space used: 78602501521 >>>>>> T: 11 >>>>>> A: 40 >>>>>> X1: 16 (7.1 bits) >>>>>> X2: 38 (14.6 bits) >>>>>> X3: 64 (24.7 bits) >>>>>> S1: 42 (20.8 bits) >>>>>> S2: 65 (29.6 bits) >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>> To: BioPerl List >>>>>>> Subject: [Bioperl-l] different results with remote-blast skript >>>>>>> >>>>>>> Hi again :) >>>>>>> please, I only have this little question: >>>>>>> why do I get different results with my remote::blast >> perl skript >>>>>>> then on the >>>>>>> ncbi blast homepage? >>>>>>> I am using blastp, the query is an amino-sequence (different >>>>>>> results with any >>>>>>> sequence, differences not only in number of hits but even in e- >>>>>>> values, scores >>>>>>> etc...), the database is 'nr'. 
>>>>>>> PLEASE help me, >>>>>>> thank you in advance, >>>>>>> Jonas >>>>>>> >>>>>>> ps: my skript: >>>>>>> >>> >> ############################################################## >> ################ >>>>>>> ## >>>>>>> use Bio::Seq::SeqFactory; >>>>>>> use Bio::Tools::Run::RemoteBlast; >>>>>>> use strict; >>>>>>> my @blast_report; >>>>>>> my $prog = 'blastp'; >>>>>>> my $db = 'nr'; >>>>>>> my $e_val= '1e-10'; >>>>>>> #my $e_val= '10'; >>>>>>> my @params = ( '-prog' => $prog, >>>>>>> '-data' => $db, >>>>>>> '-expect' => $e_val, >>>>>>> '-readmethod' => 'SearchIO' ); >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>>> $ >>>>>>> Bio >>>>>>> >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>>>>> = '1'; >>>>>>> >>>>>>> my >>>>>>> $ >>>>>>> blast_seq >>>>>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>>>>> >>> >> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN >> AFRQAHQNTAMATGPD >>>>>>> PDDEYE'; >>>>>>> #$v is just to turn on and off the messages >>>>>>> my $v = 1; >>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>>>>> 'Bio::PrimarySeq'); >>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => >>>>>>> "$blast_seq"); >>>>>>> my $filename='temp2.out'; >>>>>>> my $r = $factory->submit_blast($seq); >>>>>>> print STDERR "waiting..." if( $v > 0 ); >>>>>>> while ( my @rids = $factory->each_rid ) >>>>>>> { >>>>>>> foreach my $rid ( @rids ) >>>>>>> { >>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>> if( !ref($rc) ) >>>>>>> { >>>>>>> if( $rc < 0 ) >>>>>>> { >>>>>>> $factory->remove_rid($rid); >>>>>>> } >>>>>>> print STDERR "." 
if ( $v > 0 ); >>>>>>> } >>>>>>> else >>>>>>> { >>>>>>> my $result = $rc->next_result(); >>>>>>> $factory->save_output($filename); >>>>>>> $factory->remove_rid($rid); >>>>>>> print "\nQuery Name: ", >> $result->query_name(), >>>>>>> "\n"; >>>>>>> while ( my $hit = $result->next_hit ) >>>>>>> { >>>>>>> next unless ( $v > 0); >>>>>>> print "\thit name is ", $hit->name, "\n"; >>>>>>> while( my $hsp = $hit->next_hsp ) >>>>>>> { >>>>>>> print "\t\tscore is ", >> $hsp->score, "\n"; >>>>>>> } >>>>>>> } >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> >>>>>>> } >>>>>>> @blast_report = get_file_data ($filename); >>>>>>> return @blast_report; >>>>>>> >>> >> ############################################################## >> ################ >>>>>>> #### >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> = >>>>>> = >>>>>> >> ===================================================================== >>>>>> Attention: The information contained in this message and/or >>>>>> attachments >>>>>> from AgResearch Limited is intended only for the >> persons or entities >>>>>> to which it is addressed and may contain confidential and/or >>>>>> privileged >>>>>> material. Any review, retransmission, dissemination or other use >>>>>> of, or >>>>>> taking of any action in reliance upon, this information >> by persons or >>>>>> entities other than the intended recipients is prohibited by >>>>>> AgResearch >>>>>> Limited. If you have received this message in error, >> please notify >>>>>> the >>>>>> sender immediately. 
>>>>>> = >>>>>> = >>>>>> >> ===================================================================== >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> -- >>>>> Jason Stajich >>>>> jason at bioperl.org >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> -------------------------------------------------------------- >> ---------------- >>> -- >>> >>> >>> >>> No virus found in this incoming message. >>> Checked by AVG - www.avg.com >>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release >> Date: 07/05/09 >>> 05:53:00 >> >> >> -------------------------------------------------------------- >> ------------------ >> >> >> >> No virus found in this incoming message. >> Checked by AVG - www.avg.com >> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release >> Date: 07/05/09 >> 17:54:00 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From jay at jays.net Thu Jul 9 15:47:02 2009 From: jay at jays.net (Jay Hannah) Date: Thu, 09 Jul 2009 14:47:02 -0500 Subject: [Bioperl-l] [patch] Bio/TreeIO.pm POD patch Message-ID: <4A564936.2070909@jays.net> Hello, $tree->size throws this error: Can't locate object method "size" via package "Bio::Tree::Tree" at conv.pl line 17, line 1. Below, a POD patch to Bio::TreeIO to fix (sidestep) that problem and make podchecker happier. 
Thanks,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah

Index: Bio/TreeIO.pm
===================================================================
--- Bio/TreeIO.pm (revision 15841)
+++ Bio/TreeIO.pm (working copy)
@@ -18,13 +18,11 @@

 =head1 SYNOPSIS

- {
- use Bio::TreeIO;
- my $treeio = Bio::TreeIO->new('-format' => 'newick',
- '-file' => 'globin.dnd');
- while( my $tree = $treeio->next_tree ) {
- print "Tree is ", $tree->size, "\n";
- }
+ use Bio::TreeIO;
+ my $treeio = Bio::TreeIO->new('-format' => 'newick',
+ '-file' => 'globin.dnd');
+ while( my $tree = $treeio->next_tree ) {
+ print "Tree has ", $tree->number_nodes, " nodes.\n";
 }

 =head1 DESCRIPTION
@@ -45,11 +43,11 @@
 http://bioperl.org/wiki/Mailing_lists - About the mailing lists

 =head2 Support
-
+
 Please direct usage questions or support issues to the mailing list:
-
+
 L
-
+
 rather than to the module maintainer directly. Many experienced and
 reponsive experts will be able look at the problem and quickly
 address it. Please include a thorough description of the problem

From vaughn at cshl.edu  Thu Jul  9 16:42:53 2009
From: vaughn at cshl.edu (Matthew Vaughn)
Date: Thu, 9 Jul 2009 16:42:53 -0400
Subject: [Bioperl-l] Next-gen modules
Message-ID: <1051DB29-0A4D-4A5A-A163-B698AFB97FFA@cshl.edu>

A lot of what is being discussed is handled very elegantly by Assaf
Gordon's FASTX toolkit. I spent a lot of time trying to roll my own
solutions for basic Illumina processing, and I've found his utilities to
work much more reliably and much faster (almost real-time) than anything
I could design in Perl. They are also the basis for Illumina handling in
Galaxy, which is a second vote of confidence. They've got clean CLI
interfaces and should be very easy to wrap in Bio::SeqUtils or Bio::Run
packages.

Matt

--
Matthew W. Vaughn, Ph.D.
Research Assistant Professor
Cold Spring Harbor Laboratory
1 Bungtown Road
Williams #5
Cold Spring Harbor, NY 11724 USA
tel: (516) 367-8808
cell: (516) 353-7055
google-talk: matt.vaughn at gmail.com

From IRytsareva at dow.com  Thu Jul  9 16:33:01 2009
From: IRytsareva at dow.com (Rytsareva, Inna (I))
Date: Thu, 9 Jul 2009 16:33:01 -0400
Subject: [Bioperl-l] Modify wwwBLAST html report
Message-ID: <3C9BDF0E91897443AD3C8B34CA8BDCA801FDE46C@USMDLMDOWX028.dow.com>

Hello.
Thanks so much for your help!!

I need some ideas on how to do the following.
There is a wwwBLAST HTML report from blast.real, and I have links
(maybe I'll place them in an array of strings). Each string is a link
to GBrowse for one HSP, like:
HSP 188396 189355
So I need to modify this HTML page and "push" a link between Sbjct and Score, or between tags and
 for each HSP.
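
One possible way to do the insertion is sketched below. It is only a sketch under stated assumptions: that every HSP block in the report starts with a line beginning with "Score =", and that the links are held in the same order in which the HSPs appear in the report. The helper name insert_hsp_links, the sample report lines, and the example.org URLs are all made up for illustration; they are not part of wwwBLAST, GBrowse, or BioPerl.

```perl
use strict;
use warnings;

# Return a copy of the report lines with one link line placed in front of
# each HSP header line (assumed to start with optional whitespace and "Score =").
# $report_lines : array ref of report lines, in report order
# $links        : array ref of link strings, one per HSP, in report order
sub insert_hsp_links {
    my ($report_lines, $links) = @_;
    my @queue = @{$links};    # work on a copy so the caller's list is untouched
    my @out;
    for my $line (@{$report_lines}) {
        if ($line =~ /^\s*Score\s*=/ && @queue) {
            push @out, shift(@queue) . "\n";
        }
        push @out, $line;
    }
    return @out;
}

# Hypothetical data, only to show the call.
my @report = (
    " Score = 100 bits (50), Expect = 1e-20\n",
    "Sbjct: 188396 ACGT 188399\n",
    " Score = 80 bits (40), Expect = 1e-10\n",
    "Sbjct: 189352 ACGT 189355\n",
);
my @links = (
    '<a href="http://example.org/gbrowse?start=188396;stop=188399">HSP 188396 188399</a>',
    '<a href="http://example.org/gbrowse?start=189352;stop=189355">HSP 189352 189355</a>',
);
print insert_hsp_links(\@report, \@links);
```

With this approach the report is printed once, after the links have been collected, instead of printing the report first and the links afterwards.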
 
For now my script is:
########################################################################
############################
#!/usr/bin/perl

#
# $Id: blast.cgi,v 1.1 2002/08/06 19:03:51 dondosha Exp $
#

$|=1;
use CGI::Pretty qw (:standard);
use CGI::Carp qw (fatalsToBrowser);
use CGI;


use HTML::Strip;

use IO::String;
use List::Util qw (min max);
use Switch;
use File::Temp qw/ tempfile tempdir /;


use Data::Dumper;

use Bio::SearchIO;
use Bio::SearchIO::blast;


print "Content-type: text/html \n\n";

$ENV{DEBUG_COMMAND_LINE} = TRUE;
$ENV{BLASTDB} = "db";

open (BLAST,"cat $blast_form_data |./blast.REAL|");
@blast = <BLAST>;
my $hs = new HTML::Strip;

my ($o_f,$out_file) = tempfile();
open (OUTFILE,">$out_file");

foreach $blast (@blast)
{
print $blast; # printing BLAST 
my $text=$hs->parse($blast);
print OUTFILE $text;
}
close OUTFILE;
$hs->eof;

my $q = new CGI;

my $in = Bio::SearchIO->new (  	-file 	=>$out_file,
				-format =>'blast') or die $!;

while (my $result = $in->next_result) 
{
	while (my $hit = $result->next_hit)
	{
		while (my $hsp = $hit->next_hsp) 	
		{
			$qname = $hit->name;
			$qstart = $hsp->hit->start;
			$qend = $hsp->hit->end;

			print " HSP $qname $qstart $qend\n";
		}
	}
}
unlink $out_file;	# remove the temporary file once, after parsing is finished
########################################################################
########################

It prints the BLAST report and then the links.

Thanks,

Inna Rytsareva
Discovery Information Management
Dow AgroSciences
Indianapolis, IN
317-337-4716

From cjfields at illinois.edu  Thu Jul  9 16:47:07 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 9 Jul 2009 15:47:07 -0500
Subject: [Bioperl-l] [patch] Bio/TreeIO.pm POD patch
In-Reply-To: <4A564936.2070909@jays.net>
References: <4A564936.2070909@jays.net>
Message-ID:

committed in r15842. thanks!

chris

On Jul 9, 2009, at 2:47 PM, Jay Hannah wrote:

> Hello,
>
> $tree->size throws this error:
>
> Can't locate object method "size" via package "Bio::Tree::Tree" at
> conv.pl line 17, line 1.
>
> Below, a POD patch to Bio::TreeIO to fix (sidestep) that problem and
> make podchecker happier.
>
> Thanks,
>
> j
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
>
>
>
>
> Index: Bio/TreeIO.pm
> ===================================================================
> --- Bio/TreeIO.pm (revision 15841)
> +++ Bio/TreeIO.pm (working copy)
> @@ -18,13 +18,11 @@
>
>  =head1 SYNOPSIS
>
> - {
> - use Bio::TreeIO;
> - my $treeio = Bio::TreeIO->new('-format' => 'newick',
> - '-file' => 'globin.dnd');
> - while( my $tree = $treeio->next_tree ) {
> - print "Tree is ", $tree->size, "\n";
> - }
> + use Bio::TreeIO;
> + my $treeio = Bio::TreeIO->new('-format' => 'newick',
> + '-file' => 'globin.dnd');
> + while( my $tree = $treeio->next_tree ) {
> + print "Tree has ", $tree->number_nodes, " nodes.\n";
>  }
>
>  =head1 DESCRIPTION
> @@ -45,11 +43,11 @@
>  http://bioperl.org/wiki/Mailing_lists - About the mailing lists
>
>  =head2 Support
> -
> +
>  Please direct usage questions or support issues to the mailing list:
> -
> +
>  L
> -
> +
>  rather than to the module maintainer directly. Many experienced and
>  reponsive experts will be able look at the problem and quickly
>  address it.
Please include a thorough description of the problem
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From jay at jays.net Thu Jul 9 17:03:52 2009
From: jay at jays.net (Jay Hannah)
Date: Thu, 09 Jul 2009 16:03:52 -0500
Subject: [Bioperl-l] X-Greylist: Delayed
Message-ID: <4A565B38.1090408@jays.net>

(Thanks for committing r15842 Chris!!)

I noticed this header in my last post (the copy MailMan sent me):

X-Greylist: Delayed for 00:29:57 by milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]);

My post was, indeed, delayed by ~30 minutes.

Is that intentional? And/or is there something I can do differently?

Full headers of that email: http://scsys.co.uk:8001/30919

Thanks,

j

From maj at fortinbras.us Thu Jul 9 17:55:23 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 9 Jul 2009 17:55:23 -0400
Subject: [Bioperl-l] Fw: Bootstrap, root, reroot...
Message-ID: <999D8B7079824883AF63627CAF819614@NewLife>

up to the list, too--

----- Original Message -----
From: "Mark A. Jensen"
To: "Tristan Lefebure"
Sent: Thursday, July 09, 2009 3:37 PM
Subject: Re: [Bioperl-l] Bootstrap, root, reroot...

> I'll bet you're right-- I'll put this in as a comment-- thanks!
> ----- Original Message -----
> From: "Tristan Lefebure"
> To: "BioPerl List"
> Sent: Thursday, July 09, 2009 3:18 PM
> Subject: Re: [Bioperl-l] Bootstrap, root, reroot...
>
>
>>I just had a quick look at the reroot() function of TreeFunctionsI, and it
>> looks like what should be done for the bootstrap scores is what is
>> already done for the branch lengths. See this loop starting line 954:
>>
>>     # reverse the ancestor & children pointers
>>     my $former_anc = $tmp_node->ancestor;
>>     my @path_from_oldroot = ($self->get_lineage_nodes($tmp_node),
>>                              $tmp_node);
>>     for (my $i = 0; $i < @path_from_oldroot - 1; $i++) {
>>         my $current = $path_from_oldroot[$i];
>>         my $next = $path_from_oldroot[$i + 1];
>>         $current->remove_Descendent($next);
>>         $current->branch_length($next->branch_length);
>>         $next->add_Descendent($current);
>>     }
>>
>> It makes sense to me to treat bootstrap and branch length in a similar way:
>> the branch lengths are stored inside the node object but, like the bootstrap,
>> they really are branch attributes... No?
>>
>> -Tristan
>>
>> On Thu, Jul 9, 2009 at 2:30 PM, Tristan Lefebure wrote:
>>
>>> Done. bug #2877.
>>> -Tristan
>>>
>>> On Thursday 09 July 2009 14:02:01 Mark A. Jensen wrote:
>>> > Hi Tristan--
>>> > Would you enter this in bugzilla? I did an overhaul of
>>> > the root/reroot a while back, and maybe you're running
>>> > into some stuff I need to check out. Thanks a lot-
>>> > Mark
>>> > ----- Original Message -----
>>> > From: "Tristan Lefebure"
>>> > To: "BioPerl List"
>>> > Sent: Thursday, July 09, 2009 11:50 AM
>>> > Subject: [Bioperl-l] Bootstrap, root, reroot...
>>> >
>>> > > Hello,
>>> > >
>>> > > I have been bumping into problems while rerooting trees
>>> > > that contained bootstrap scores. Basically, after
>>> > > re-rooting the tree, some scores end up at the wrong
>>> > > place (i.e. node) and some nodes lose their score. I
>>> > > found this thread from Bank Beszteri, back in 2007, that
>>> > > describes exactly the same problem:
>>> > >
>>> > > http://lists.open-bio.org/pipermail/bioperl-l/2007-May/025599.html
>>> > >
>>> > > I attach a script that reproduces the bug and
>>> > > implements the fix that Bank described (at least this
>>> > > is my understanding, and it works on this example):
>>> > >
>>> > >
>>> > > #!/usr/bin/perl
>>> > >
>>> > > use strict;
>>> > > use warnings;
>>> > > use Bio::TreeIO;
>>> > >
>>> > >
>>> > > my $in = Bio::TreeIO->new(-format => 'newick',
>>> > >                           -fh => \*DATA,
>>> > >                           -internal_node_id => 'bootstrap');
>>> > >
>>> > > my $out = Bio::TreeIO->new(-format => 'newick', -file => ">out.tree");
>>> > >
>>> > > while( my $t = $in->next_tree ){
>>> > >     my $old_root = $t->get_root_node();
>>> > >     my ($b) = $t->find_node(-id =>"B");
>>> > >     my $b_anc = $b->ancestor;
>>> > >     $out->write_tree($t);
>>> > >
>>> > >     #reroot with B -> wrong, and the tree is kind of weird
>>> > >     $t->reroot($b);
>>> > >     $out->write_tree($t);
>>> > >
>>> > >     #reroot with B ancestor -> wrong
>>> > >     $t->reroot($b_anc);
>>> > >     $out->write_tree($t);
>>> > >
>>> > >     #a fix, following Bank Beszteri description
>>> > >     my $node = $old_root;
>>> > >     while (my $anc_node = $node->ancestor) {
>>> > >         $node->bootstrap($anc_node->bootstrap());
>>> > >         $anc_node->bootstrap('');
>>> > >         $node = $anc_node;
>>> > >     }
>>> > >     $out->write_tree($t); #->good this time
>>> > > }
>>> > >
>>> > >
>>> > > __DATA__
>>> > > (A:52,(B:46,C:50)68:11,D:70);
>>> > >
>>> > >
>>> > > Here is the output:
>>> > >
>>> > > (A:52,(B:46,C:50)68:11,D:70);
>>> > > ((C:50,(A:52,D:70):11)68:46)B;
>>> > > (B:46,C:50,(A:52,D:70):11)68;
>>> > > (B:46,C:50,(A:52,D:70)68:11);
>>> > >
>>> > >
>>> > > Trees #2 and #3 have the score 68 moved to the wrong
>>> > > node, while tree #4 is OK. (BTW tree #2 is really
>>> > > weird: unless B is the real ancestor (a fossil?),
>>> > > it really does not make much sense to me).
>>> > >
>>> > > My understanding here is that the problem is linked to
>>> > > the well-known difficulty of differentiating node from
>>> > > branch labels in newick trees. Bootstrap scores are
>>> > > branch attributes, not node attributes, but since
>>> > > Bio::TreeI has no branch/edge/bipartition object they
>>> > > are attached to a node, and in fact reflect the
>>> > > bootstrap score of the ancestral branch leading to that
>>> > > node. Trouble naturally comes when you are dealing with
>>> > > an unrooted tree or reroot a tree: a child can become
>>> > > an ancestor, and, if the bootstrap score is not moved
>>> > > from the old child to the new child, it will end up
>>> > > attached at the wrong place (i.e. wrong node).
>>> > >
>>> > > I see several fixes for that:
>>> > >
>>> > > 1- Incorporate Bank's fix into the root() method. I.e.
>>> > > if there are bootstrap scores, after re-rooting, the ones
>>> > > on the path from the old to the new ancestor should be
>>> > > moved to the right node.
>>> > >
>>> > > 2- Modify the way trees are stored in bioperl to
>>> > > incorporate a branch/edge/bipartition object, and move
>>> > > the bootstrap scores to it. That won't be easy and
>>> > > will break many things...
>>> > >
>>> > >
>>> > > What do you think?
>>> > >
>>> > > --Tristan
>>> > >
>>

From cjfields at illinois.edu Thu Jul 9 18:48:13 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 9 Jul 2009 17:48:13 -0500
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <1051DB29-0A4D-4A5A-A163-B698AFB97FFA@cshl.edu>
References: <1051DB29-0A4D-4A5A-A163-B698AFB97FFA@cshl.edu>
Message-ID:

Looks very promising. Do you know if it's capable of reporting back indices, e.g. for building flat-file databases?
chris

On Jul 9, 2009, at 3:42 PM, Matthew Vaughn wrote:

> A lot of what is being discussed is handled very elegantly by Assaf
> Gordon's FASTX toolkit. I
> spent a lot of time trying to roll my own solutions for basic
> Illumina processing and I've found his utilities to work much more
> reliably and much faster (almost real-time) than anything I could
> design in Perl. They are also the basis for Illumina handling in
> Galaxy, which is a second vote of confidence.
>
> They've got clean CLI interfaces and should be very easy to wrap in
> Bio::SeqUtils or Bio::Run packages.
>
> Matt
>
> --
> Matthew W. Vaughn, Ph.D.
> Research Assistant Professor
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Williams #5
> Cold Spring Harbor, NY 11724 USA
>
> tel: (516) 367-8808
> cell: (516) 353-7055
> google-talk: matt.vaughn at gmail.com
>

From cjfields1 at gmail.com Thu Jul 9 21:44:12 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Thu, 9 Jul 2009 20:44:12 -0500
Subject: [Bioperl-l] update PLATFORMS file
In-Reply-To: <4A569AC4.5080200@cornell.edu>
References: <4A569AC4.5080200@cornell.edu>
Message-ID: <08A95736-7BD5-48A9-9786-D3D1A3520EDE@gmail.com>

Beat me to it! So, what does everyone think?

chris

On Jul 9, 2009, at 8:35 PM, Robert Buels wrote:

> Taking this to bioperl-l:
>
> koenvanderdrift at gmail.com said:
>
> The PLATFORMS document contains a *very* outdated link on how to install
> bioperl on Macs. Please remove this link: "Steve Cannon has made available
> Bioperl OS X installation directions and notes online at the
> following URL: http://www.tc.umn.edu/~cann0010/Bioperl_OSX_install.html"
>
> ------- Comment #1 from cjfields at bioperl.org 2009-07-09 21:18 EST -------
> I think we could actually remove this file completely. It hasn't been updated
> in quite a while and any information it contains would probably serve a better
> purpose elsewhere.
>
>
> So, remove the PLATFORMS file? Is all of the stuff in there on the wiki?
>
> Rob
>
> --
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY 14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>

From rbuels at gmail.com Thu Jul 9 21:35:00 2009
From: rbuels at gmail.com (Robert Buels)
Date: Thu, 09 Jul 2009 18:35:00 -0700
Subject: [Bioperl-l] update PLATFORMS file
Message-ID: <4A569AC4.5080200@cornell.edu>

Taking this to bioperl-l:

koenvanderdrift at gmail.com said:

The PLATFORMS document contains a *very* outdated link on how to install
bioperl on Macs. Please remove this link: "Steve Cannon has made available
Bioperl OS X installation directions and notes online at the
following URL: http://www.tc.umn.edu/~cann0010/Bioperl_OSX_install.html"

------- Comment #1 from cjfields at bioperl.org 2009-07-09 21:18 EST -------
I think we could actually remove this file completely. It hasn't been updated
in quite a while and any information it contains would probably serve a better
purpose elsewhere.


So, remove the PLATFORMS file? Is all of the stuff in there on the wiki?

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From maj.fortinbras at gmail.com Thu Jul 9 23:28:52 2009
From: maj.fortinbras at gmail.com (Mark Jensen)
Date: Thu, 9 Jul 2009 23:28:52 -0400
Subject: [Bioperl-l] X-Greylist: Delayed
Message-ID: <4239c0bb0907092028w1a321724jadd3fe6e4960b47a@mail.gmail.com>

This is a test.
MAJ

From maj at fortinbras.us Thu Jul 9 23:38:58 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 9 Jul 2009 23:38:58 -0400
Subject: [Bioperl-l] X-Greylist: Delayed
In-Reply-To: <4A565B38.1090408@jays.net>
References: <4A565B38.1090408@jays.net>
Message-ID:

Good eye, Jay. Poking around, I find that some DNS names are
more equal than others. My test post from my gmail account
maj.fortinbras -at- gmail -dot- com had header

X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 23:29:00 -0400 (EDT)
X-Greylist: Sender DNS name whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]); Thu, 09 Jul 2009 23:28:53 -0400 (EDT)

while the domain with less cachet,
maj -at- fortinbras -dot- us,
has

X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 14:25:45 -0400 (EDT)
X-Greylist: Delayed for 00:16:28 by milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]); Thu, 09 Jul 2009 14:18:37 -0400 (EDT)

and has forever; this explains the infinite waiting time I typically
also experience.

Some fortunate posters even obtain the coveted

X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Tue, 07 Jul 2009 13:30:29 -0400 (EDT)
X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]);

This may be even more stupendous than a commit bit.

cheers,
Mark

----- Original Message -----
From: "Jay Hannah"
To:
Sent: Thursday, July 09, 2009 5:03 PM
Subject: [Bioperl-l] X-Greylist: Delayed

> (Thanks for committing r15842 Chris!!)
>
>
> I noticed this header in my last post (the copy MailMan sent me):
>
> X-Greylist: Delayed for 00:29:57 by milter-greylist-2.0.2
> (portal.open-bio.org [207.154.17.70]);
>
> My post was, indeed, delayed by ~30 minutes.
>
>
> Is that intentional? And/or is there something I can do differently?
>
> Full headers of that email: http://scsys.co.uk:8001/30919
>
> Thanks,
>
> j
>

From jason at bioperl.org Fri Jul 10 01:25:27 2009
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 9 Jul 2009 22:25:27 -0700
Subject: [Bioperl-l] X-Greylist: Delayed
In-Reply-To:
References: <4A565B38.1090408@jays.net>
Message-ID: <0D6E20B7-47A8-4E64-B973-35618300D246@bioperl.org>

The IP your mail comes from is initially greylisted (hence the 30 min delay, which requires the host to resend). After that it is whitelisted, so a frequent poster's originating IP will end up cached. So it depends on whether your IP is dynamic, how often you are emailing the list, etc.

All this was discussed at least once a while ago.
http://portal.open-bio.org/pipermail/bioperl-l/2006-April/021340.html

Mailing list problems should probably go to root-l at open-bio.org if you want specific help too.

-jason

On Jul 9, 2009, at 8:38 PM, Mark A. Jensen wrote:

> Good eye, Jay. Poking around, I find that some DNS names are
> more equal than others.
My test post from my gmail account > maj.fortinbras -at- gmail -dot- com had header > > X-Greylist: Sender IP whitelisted, not delayed by milter- > greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 > 23:29:00 -0400 (EDT) > X-Greylist: Sender DNS name whitelisted, not delayed by milter- > greylist-2.0.2 > (portal.open-bio.org [207.154.17.70]); > Thu, 09 Jul 2009 23:28:53 -0400 (EDT) > > while the domain with less cachet, > maj -at- fortinbras -dot- us, > has > > X-Greylist: Sender IP whitelisted, not delayed by milter- > greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 > 14:25:45 -0400 (EDT) > X-Greylist: Delayed for 00:16:28 by milter-greylist-2.0.2 > (portal.open-bio.org > [207.154.17.70]); Thu, 09 Jul 2009 14:18:37 -0400 (EDT) > > and has forever; this explains the infinite waiting time I typically > also experience. > > Some fortunate posters even obtain the coveted > > X-Greylist: Sender IP whitelisted, not delayed by milter- > greylist-2.0.2 (portal.open-bio.org [127.0.0.1]); Tue, 07 Jul 2009 > 13:30:29 -0400 (EDT) > X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by > milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]); > > This may be even more stupendous than a commit bit. > > cheers, > Mark > ----- Original Message ----- From: "Jay Hannah" > To: > Sent: Thursday, July 09, 2009 5:03 PM > Subject: [Bioperl-l] X-Greylist: Delayed > > >> (Thanks for committing r15842 Chris!!) >> >> >> I noticed this header in my last post (the copy MailMan sent me): >> >> X-Greylist: Delayed for 00:29:57 by milter-greylist-2.0.2 >> (portal.open-bio.org [207.154.17.70]); >> >> My post was, indeed, delayed by ~30 minutes. >> >> >> Is that intentional? And/or is there something I can do differently? 
>> >> Full headers of that email: http://scsys.co.uk:8001/30919 >> >> Thanks, >> >> j >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From maj at fortinbras.us Fri Jul 10 08:43:04 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 10 Jul 2009 08:43:04 -0400 Subject: [Bioperl-l] X-Greylist: Delayed In-Reply-To: <0D6E20B7-47A8-4E64-B973-35618300D246@bioperl.org> References: <4A565B38.1090408@jays.net> <0D6E20B7-47A8-4E64-B973-35618300D246@bioperl.org> Message-ID: <48C03241903F4C14A9CB6676B82362B7@NewLife> The problem doesn't seem to be the IP, which is whitelisted for Jay and me, but the DNS name, which is evidently not added to the whitelist automatically for frequent posters. It would be great if this could be automatically handled as well. ----- Original Message ----- From: "Jason Stajich" To: "Jay Hannah" Cc: "BioPerl List" ; "Mark A. Jensen" Sent: Friday, July 10, 2009 1:25 AM Subject: Re: [Bioperl-l] X-Greylist: Delayed > The IP your mail comes from is initially greylisted (hence the 30 min delay > which requires the host to resend) and then after it is whitelisted so > frequent posters's originating IP is will end up and be cached. So it depends > on if your IP is dynamic, how often you are emailing the list, etc. > > All this was discussed at least once a while ago. > http://portal.open-bio.org/pipermail/bioperl-l/2006-April/021340.html > > Mailing list problems should probably go to root-l at open-bio.org if you want > specific help too. > > -jason > On Jul 9, 2009, at 8:38 PM, Mark A. Jensen wrote: > >> Good eye, Jay. Poking around, I find that some DNS names are >> more equal than others. 
My test post from my gmail account >> maj.fortinbras -at- gmail -dot- com had header >> >> X-Greylist: Sender IP whitelisted, not delayed by milter- greylist-2.0.2 >> (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 23:29:00 -0400 (EDT) >> X-Greylist: Sender DNS name whitelisted, not delayed by milter- >> greylist-2.0.2 >> (portal.open-bio.org [207.154.17.70]); >> Thu, 09 Jul 2009 23:28:53 -0400 (EDT) >> >> while the domain with less cachet, >> maj -at- fortinbras -dot- us, >> has >> >> X-Greylist: Sender IP whitelisted, not delayed by milter- greylist-2.0.2 >> (portal.open-bio.org [127.0.0.1]); Thu, 09 Jul 2009 14:25:45 -0400 (EDT) >> X-Greylist: Delayed for 00:16:28 by milter-greylist-2.0.2 >> (portal.open-bio.org >> [207.154.17.70]); Thu, 09 Jul 2009 14:18:37 -0400 (EDT) >> >> and has forever; this explains the infinite waiting time I typically >> also experience. >> >> Some fortunate posters even obtain the coveted >> >> X-Greylist: Sender IP whitelisted, not delayed by milter- greylist-2.0.2 >> (portal.open-bio.org [127.0.0.1]); Tue, 07 Jul 2009 13:30:29 -0400 (EDT) >> X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by >> milter-greylist-2.0.2 (portal.open-bio.org [207.154.17.70]); >> >> This may be even more stupendous than a commit bit. >> >> cheers, >> Mark >> ----- Original Message ----- From: "Jay Hannah" >> To: >> Sent: Thursday, July 09, 2009 5:03 PM >> Subject: [Bioperl-l] X-Greylist: Delayed >> >> >>> (Thanks for committing r15842 Chris!!) >>> >>> >>> I noticed this header in my last post (the copy MailMan sent me): >>> >>> X-Greylist: Delayed for 00:29:57 by milter-greylist-2.0.2 >>> (portal.open-bio.org [207.154.17.70]); >>> >>> My post was, indeed, delayed by ~30 minutes. >>> >>> >>> Is that intentional? And/or is there something I can do differently? 
>>>
>
>>> Full headers of that email: http://scsys.co.uk:8001/30919
>>>
>>> Thanks,
>>>
>>> j
>>>
>
> --
> Jason Stajich
> jason at bioperl.org
>
>
>

From Brotelzwieb at gmx.de Fri Jul 10 05:18:12 2009
From: Brotelzwieb at gmx.de (Jonas Schaer)
Date: Fri, 10 Jul 2009 11:18:12 +0200
Subject: [Bioperl-l] cdd-search with remoteblast?
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas> <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com>
Message-ID:

Hi,
I tried to do what Malcolm proposed (my $prog = 'rpsblast'; my $db = 'CDD';), but that didn't work:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Value rpsblast for PUT parameter PROGRAM does not match expression t?blast[pnx]. Rejecting.
STACK: Error::throw
STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:329
STACK: Bio::Tools::Run::RemoteBlast::new C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:257
STACK: blast_a_seq2.pm:14
-----------------------------------------------------------

So I should try to "change the wrapper to allow 'rpsblast'", right? Could you tell me how to do that, please? Sorry, but I have no idea yet... :)
If that doesn't work, is there any other way to run CDD searches with Perl?
Thank you so much!
Regards, Jonas ----- Original Message ----- From: "Chris Fields" To: "Cook, Malcolm" Cc: "'Jonas Schaer'" ; "'BioPerl List'" ; "'Smithies, Russell'" ; Sent: Thursday, July 09, 2009 9:19 PM Subject: Re: [Bioperl-l] cdd-search with remoteblast? > I've scheduled this tentatively for the 1.6 release series (just not > sure when yet). It may work as is, but I haven't tried it out yet > (and am hazarding to guess it only retrieves the single main RID at > the moment). > > chris > > On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: > >> Jonas, >> >> If you want to continue to use the bioperl remoteblast interface, >> probably what you should do is simply call it twice. >> >> Once, as you already know how to do, which will return without CDD >> results. >> >> Secondly, to get the CDD results, call remoteblast a second time. >> This time, using >> -database => 'CDD' >> -program => 'rpsblast' >> >> However, the wrapper may object to the 'rpsblast' program. It is >> not listed in the POD - >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm) >> If so, my guess is that changing the perl wrapper to allow >> rpsblast will "just work" (tm). I've cc:ed cjfields at bioperl.org for >> his opinion on this. >> >> Also, you might want to perform the CDD search first, especially if >> you are streaming results to eyeball that might like something to >> look at while the second (presumably longer) search is running. >> >> Cheers, >> >> Malcolm Cook >> Stowers Institute for Medical Research - Kansas City, Missouri >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Jonas Schaer >>> Sent: Thursday, July 09, 2009 5:16 AM >>> To: BioPerl List; Smithies, Russell >>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>> >>> Hi guys, >>> Thank you all so much for your help and patience :). 
Of >>> course you were right and I finaly found the right >>> put-parameter to get exactly the same hits as on the homepage. >>> I do have an other question though :)... >>> I now want to include a search for conserved domains, but >>> when I try to use the CDD_SEARCH-parameter >>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >>> sub:CDD_SEARCH) >>> like the other put-parameters the way chris once told >>> me(works fine with the other params): >>> >>> my %put = ( >>> WORD_SIZE => 3, >>> HITLIST_SIZE => 100, >>> THRESHOLD => 11, >>> FILTER => 'R', >>> GENETIC_CODE => 1, >>> CDD_SEARCH => 'on' >>> ###I tried it >>> with 'true' and '1', too. >>> >>> ); >>> >>> for my $putName (keys %put) { >>> $factory->submit_parameter($putName,$put{$putName}); >>> } >>> >>> >>> ...an exception is thrown: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: CDD_SEARCH is not a valid PUT parameter. >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>> C:/Perl/site/lib/Bio/Tools >>> /Run/RemoteBlast.pm:325 >>> STACK: main::blast_a_sequence firsteval0.8.pm:383 >>> STACK: main::blast_it firsteval0.8.pm:288 >>> STACK: firsteval0.8.pm:35 >>> ----------------------------------------------------------- . >>> I guess somehow this could be the solution to my problem: >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >>> ub:RID-for-Simultaneous >>> , but unfortunately I don't understand what to do. 
>>> I'm so sorry to bother you with this but please help me once >>> more...:) >>> >>> Best regards and thanks in advance, >>> Jonas >>> >>> ----- Original Message ----- >>> From: "Smithies, Russell" >>> To: "'Jonas Schaer'" >>> Cc: "'Chris Fields'" ; "'BioPerl List'" >>> >>> Sent: Monday, July 06, 2009 10:56 PM >>> Subject: RE: [Bioperl-l] different results with remote-blast skript >>> >>> >>> Hi Jonas, >>> You can't just play with the BLAST parameters and hope for a "better" >>> result. >>> I'd suggest that if you aren't sure what they do, you should >>> leave them >>> alone as small changes can make huge differences in the >>> output - it's quite >>> possible to miss finding what you're looking for by using the wrong >>> parameters. >>> If all else fails, read the blast manual: >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >>> _all.html >>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >>> Or Read Ian Korfs' excellent book: >>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp >> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >>> >>> Don't worry about the integer overflow bug as there's nothing >>> you can do >>> about it. If you're interested, Google and Wikipedia are your >>> friends: >>> http://en.wikipedia.org/wiki/Integer_overflow >>> >>> >>> Russell >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>>> To: BioPerl List; Chris Fields >>>> Subject: Re: [Bioperl-l] different results with remote-blast skript >>>> >>>> Hi guys, thanks for your answers so far. >>>> @jason: integer overflow in blast.... sorry, but what do >>> you mean by that? >>>> how can I fix it...? >>>> >>>> Since I never really changed any parameters I thought them >>> all to be >>>> default. 
>>>> whatever, I tried to get "better" results with my prog by changing >>>> these: >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>> >>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>> STICS'} = >>>> '1'; >>>> with no effect...I guess these were default values anyway. >>>> >>>> So please maybe you can tell me all the other parameters I >>> can change with >>>> my >>>> perl-skript AND how to do that? >>>> Unfortunately both, perl and the blast-algorithm are pretty >>> much new to >>>> me, >>>> maybe thats why I just cannot find out how to do that on my >>> own... :/ >>>> >>>> Here is the output I get with my remote-blast skript: >>>> >>> ############################################################## >>> ################ >>>> ################################### >>>> Query Name: >>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>> L >>>> hit name is ref|XP_001702807.1| >>>> score is 442 >>>> BLASTP 2.2.21+ >>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>> A. Schaffer, >>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>> Lipman (1997), >>>> "Gapped >>>> BLAST and PSI-BLAST: a new generation of protein database search >>>> programs", >>>> Nucleic Acids Res. 25:3389-3402. >>>> >>>> >>>> Reference for composition-based statistics: Alejandro A. >>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>> John L. Spouge, >>>> Yuri >>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >>> "Improving the >>>> accuracy of PSI-BLAST protein database searches with >>> composition-based >>>> statistics and other refinements", Nucleic Acids Res. 29:2994-3005. 
>>>> >>>> >>>> RID: 53STX5G2013 >>>> >>>> >>>> Database: All non-redundant GenBank CDS >>>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>> from WGS projects >>>> 9,252,587 sequences; 3,169,972,781 total letters Query= >>>> >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>> >>> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >>>> ATGPDPDDEYE >>>> Length=150 >>>> >>>> >>>> >>> Score >>>> E >>>> Sequences producing significant alignments: >>> (Bits) >>>> Value >>>> >>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>> reinhard... 174 >>>> 2e-42 >>>> >>>> >>>> ALIGNMENTS >>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>> Length=303 >>>> >>>> Score = 174 bits (442), Expect = 2e-42, Method: >>> Composition-based >>>> stats. >>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>> Gaps = 0/150 >>>> (0%) >>>> >>>> Query 1 >>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>> 60 >>>> >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>> Sbjct 154 >>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>> 213 >>>> >>>> Query 61 >>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>> 120 >>>> >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>> Sbjct 214 >>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>> 273 >>>> >>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >>>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>> >>>> >>>> >>>> Database: All non-redundant GenBank CDS >>>> translations+PDB+SwissProt+PIR+PRF >>>> excluding environmental samples from WGS projects >>>> Posted date: Jul 5, 2009 4:41 AM >>>> Number of letters in database: -1,124,994,511 >>>> Number of sequences in database: 9,252,587 >>>> >>>> Lambda K H >>>> 0.309 0.122 0.345 >>>> Gapped >>>> Lambda K H 
>>>> 0.267 0.0410 0.140
>>>> Matrix: BLOSUM62
>>>> Gap Penalties: Existence: 11, Extension: 1
>>>> Number of Sequences: 9252587
>>>> Number of Hits to DB: 60273703
>>>> Number of extensions: 1448367
>>>> Number of successful extensions: 2103
>>>> Number of sequences better than 10: 0
>>>> Number of HSP's better than 10 without gapping: 0
>>>> Number of HSP's gapped: 2113
>>>> Number of HSP's successfully gapped: 0
>>>> Length of query: 150
>>>> Length of database: 3169972781
>>>> Length adjustment: 113
>>>> Effective length of query: 37
>>>> Effective length of database: 2124430450
>>>> Effective search space: 78603926650
>>>> Effective search space used: 78603926650
>>>> T: 11
>>>> A: 40
>>>> X1: 16 (7.1 bits)
>>>> X2: 38 (14.6 bits)
>>>> X3: 64 (24.7 bits)
>>>> S1: 42 (20.8 bits)
>>>> S2: 74 (33.1 bits)
>>>>
>>>> ##############################################################################
>>>> and here are the hits (?) of the blast-algorithm on the ncbi-homepage with
>>>> the same query of course:
>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas reinhard...  300   3e-80
>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA [Acyrtho...  36.2  1.1
>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 [Blautia...  35.4  1.8
>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania brazil...  34.3  4.2
>>>> ref|XP_680841.1| hypothetical protein AN7572.2 [Aspergillus n...  33.5  6.0
>>>> ref|YP_001768110.1| hypothetical protein M446_1150 [Methyloba...  33.5  7.0
>>>> ##############################################################################
>>>> at least the first hit is the same, but even there there is a different score
>>>> and e-value.
>>>>
>>>> thanks so much for any help :)
>>>> regards, jonas
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: "Chris Fields"
>>>> To: "Jason Stajich"
>>>> Cc: "Smithies, Russell" ; "'BioPerl List'" ; "'Jonas Schaer'"
>>>> Sent: Monday, July 06, 2009 12:51 AM
>>>> Subject: Re: [Bioperl-l] different results with remote-blast skript
>>>>
>>>>
>>>>> That inspires confidence ;>
>>>>>
>>>>> chris
>>>>>
>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote:
>>>>>
>>>>>> integer overflow in blast....
>>>>>>
>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote:
>>>>>>
>>>>>>> I'd guess it's a difference in the parameters used.
>>>>>>> Interesting that both have the number of letters in the db as
>>>>>>> "-1,125,070,205", I assume that's a bug :-)
>>>>>>>
>>>>>>> Stats from your remote_blast:
>>>>>>>
>>>>>>> 'stats' => {
>>>>>>>     'S1' => '42',
>>>>>>>     'S1_bits' => '20.8',
>>>>>>>     'lambda' => '0.309',
>>>>>>>     'entropy' => '0.345',
>>>>>>>     'kappa_gapped' => '0.0410',
>>>>>>>     'T' => '11',
>>>>>>>     'kappa' => '0.122',
>>>>>>>     'X3_bits' => '24.7',
>>>>>>>     'X1' => '16',
>>>>>>>     'lambda_gapped' => '0.267',
>>>>>>>     'X2' => '38',
>>>>>>>     'S2' => '74',
>>>>>>>     'seqs_better_than_cutoff' => '0',
>>>>>>>     'posted_date' => 'Jul 4, 2009 4:41 AM',
>>>>>>>     'Hits_to_DB' => '60102303',
>>>>>>>     'dbletters' => '-1125070205',
>>>>>>>     'A' => '40',
>>>>>>>     'num_successful_extensions' => '2004',
>>>>>>>     'num_extensions' => '1436892',
>>>>>>>     'X1_bits' => '7.1',
>>>>>>>     'X3' => '64',
>>>>>>>     'entropy_gapped' => '0.140',
>>>>>>>     'dbentries' => '9252258',
>>>>>>>     'X2_bits' => '14.6',
>>>>>>>     'S2_bits' => '33.1'
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> Stats from a blast done on the NCBI webpage:
>>>>>>>
>>>>>>> Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
>>>>>>> excluding environmental samples from WGS projects
>>>>>>> Posted date: Jul 4, 2009 4:41 AM
>>>>>>> Number of letters in database: -1,125,070,205
>>>>>>> Number of sequences in database: 9,252,258
>>>>>>>
>>>>>>> Lambda K H
>>>>>>> 0.309 0.124 0.340
>>>>>>> Gapped
>>>>>>> Lambda K H
>>>>>>> 0.267 0.0410 0.140
>>>>>>> Matrix: BLOSUM62
>>>>>>> Gap Penalties: Existence: 11, Extension: 1
>>>>>>> Number of Sequences: 9252258
>>>>>>> Number of Hits to DB: 86493230
>>>>>>> Number of extensions: 3101413
>>>>>>> Number of successful extensions: 9001
>>>>>>> Number of sequences better than 100: 65
>>>>>>> Number of HSP's better than 100 without gapping: 0
>>>>>>> Number of HSP's gapped: 9000
>>>>>>> Number of HSP's successfully gapped: 66
>>>>>>> Length of query: 150
>>>>>>> Length of database: 3169897087
>>>>>>> Length adjustment: 113
>>>>>>> Effective length of query: 37
>>>>>>> Effective length of database: 2124391933
>>>>>>> Effective search space: 78602501521
>>>>>>> Effective search space used: 78602501521
>>>>>>> T: 11
>>>>>>> A: 40
>>>>>>> X1: 16 (7.1 bits)
>>>>>>> X2: 38 (14.6 bits)
>>>>>>> X3: 64 (24.7 bits)
>>>>>>> S1: 42 (20.8 bits)
>>>>>>> S2: 65 (29.6 bits)
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m.
>>>>>>>> To: BioPerl List
>>>>>>>> Subject: [Bioperl-l] different results with remote-blast skript
>>>>>>>>
>>>>>>>> Hi again :)
>>>>>>>> please, I only have this little question:
>>>>>>>> why do I get different results with my remote::blast perl skript then on the
>>>>>>>> ncbi blast homepage?
>>>>>>>> I am using blastp, the query is an amino-sequence (different results with any
>>>>>>>> sequence, differences not only in number of hits but even in e-values, scores
>>>>>>>> etc...), the database is 'nr'.
>>>>>>>> PLEASE help me,
>>>>>>>> thank you in advance,
>>>>>>>> Jonas
>>>>>>>>
>>>>>>>> ps: my skript:
>>>>>>>> ##############################################################################
>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use strict;
>>>>>>>> my @blast_report;
>>>>>>>> my $prog = 'blastp';
>>>>>>>> my $db = 'nr';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> #my $e_val= '10';
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>                '-data' => $db,
>>>>>>>>                '-expect' => $e_val,
>>>>>>>>                '-readmethod' => 'SearchIO' );
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
>>>>>>>>
>>>>>>>> my $blast_seq ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
>>>>>>>> #$v is just to turn on and off the messages
>>>>>>>> my $v = 1;
>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
>>>>>>>> my $filename='temp2.out';
>>>>>>>> my $r = $factory->submit_blast($seq);
>>>>>>>> print STDERR "waiting..." if( $v > 0 );
>>>>>>>> while ( my @rids = $factory->each_rid )
>>>>>>>> {
>>>>>>>>     foreach my $rid ( @rids )
>>>>>>>>     {
>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
>>>>>>>>         if( !ref($rc) )
>>>>>>>>         {
>>>>>>>>             if( $rc < 0 )
>>>>>>>>             {
>>>>>>>>                 $factory->remove_rid($rid);
>>>>>>>>             }
>>>>>>>>             print STDERR "." if ( $v > 0 );
>>>>>>>>         }
>>>>>>>>         else
>>>>>>>>         {
>>>>>>>>             my $result = $rc->next_result();
>>>>>>>>             $factory->save_output($filename);
>>>>>>>>             $factory->remove_rid($rid);
>>>>>>>>             print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>             while ( my $hit = $result->next_hit )
>>>>>>>>             {
>>>>>>>>                 next unless ( $v > 0);
>>>>>>>>                 print "\thit name is ", $hit->name, "\n";
>>>>>>>>                 while( my $hsp = $hit->next_hsp )
>>>>>>>>                 {
>>>>>>>>                     print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>                 }
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>     }
>>>>>>>>
>>>>>>>>
>>>>>>>> }
>>>>>>>> @blast_report = get_file_data ($filename);
>>>>>>>> return @blast_report;
>>>>>>>> ##############################################################################
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>> =====================================================================
>>>>>>> Attention: The information contained in this message and/or attachments
>>>>>>> from AgResearch Limited is intended only for the persons or entities
>>>>>>> to which it is addressed and may contain confidential and/or privileged
>>>>>>> material. Any review, retransmission, dissemination or other use of, or
>>>>>>> taking of any action in reliance upon, this information by persons or
>>>>>>> entities other than the intended recipients is prohibited by AgResearch
>>>>>>> Limited. If you have received this message in error, please notify the
>>>>>>> sender immediately.
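
A note on the "integer overflow in blast...." remark quoted above: the negative "Number of letters in database" values are what a letter count larger than 2**31 - 1 looks like once it is stored in a signed 32-bit integer. The following is a minimal Perl sketch of that wrap-around, not BLAST's actual code. as_int32 is a hypothetical helper, and the sketch assumes the "Length of database" figures in the two reports are the true letter counts. The wrapped values come out 4 lower than the reported -1,125,070,205 and -1,124,994,511; nothing in the thread explains that small difference.

```perl
use strict;
use warnings;

# Reinterpret an unsigned 32-bit value as a signed 32-bit integer.
# This mimics storing a count above 2**31 - 1 in a signed 32-bit field.
sub as_int32 {
    my ($n) = @_;
    return unpack( 'l', pack( 'L', $n ) );
}

# "Length of database" values taken from the two reports in the thread.
for my $letters ( 3_169_897_087, 3_169_972_781 ) {
    printf "%d letters -> %d as signed 32-bit\n", $letters, as_int32($letters);
}
# prints:
# 3169897087 letters -> -1125070209 as signed 32-bit
# 3169972781 letters -> -1124994515 as signed 32-bit
```

As Russell says in the thread, there is nothing to fix on the client side; the wrapped figure only affects the printed letter count, while the "Length of database" line in the same report carries the positive value.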

From bosborne11 at verizon.net Fri Jul 10 08:58:40 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 10 Jul 2009 08:58:40 -0400
Subject: [Bioperl-l] update PLATFORMS file
In-Reply-To: <4A569AC4.5080200@cornell.edu>
References: <4A569AC4.5080200@cornell.edu>
Message-ID:

Robert,

This file can be removed, certainly.

BIO

On Jul 9, 2009, at 9:35 PM, Robert Buels wrote:

> Taking this to bioperl-l:
>
> koenvanderdrift at gmail.com said:
>
> The PLATFORMS document contains a *very* outdated link on how to install
> bioperl on Macs. Please remove this link: "Steve Cannon has made available
> Bioperl OS X installation directions and notes online at the
> following URL: http://www.tc.umn.edu/~cann0010/Bioperl_OSX_install.html"
>
> ------- Comment #1 from cjfields at bioperl.org 2009-07-09 21:18 EST -------
> I think we could actually remove this file completely. It hasn't been updated
> in quite a while and any information it contains would probably serve a better
> purpose elsewhere.
>
>
> So, remove the PLATFORMS file? Is all of the stuff in there on the wiki?
>
> Rob
>
> --
> Robert Buels
> Bioinformatics Analyst, Sol Genomics Network
> Boyce Thompson Institute for Plant Research
> Tower Rd
> Ithaca, NY 14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From MEC at stowers.org Fri Jul 10 11:45:13 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Fri, 10 Jul 2009 10:45:13 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
 <46A05E0132144D73A0F805953B580B2F@jonas>
 <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz>
 <426C1893A5AD499DB4DBFEEBD257B254@jonas>
 <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com>
Message-ID:

Chris, I've added a test to bioperl RemoteBlast.t that demonstrates the
following. Is it appropriate to submit it?

Jonas, OK, I was a little quick on the gun... but I've got it now.

You don't need to change the wrapper. Here is what you need to do:

# 1) set your database like this:

    -database => 'cdsearch/cdd',
    # c.f. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
    # for other cdd database options

# 2) add this line before submitting the job:

    $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';

You're in - No other changes needed.

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de]
> Sent: Friday, July 10, 2009 4:18 AM
> To: BioPerl List; Cook, Malcolm; Chris Fields
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Hi,
> I tried to do what Malcom proposed my ($prog = 'rpsblast'; my $db = 'CDD';)
> but that didn't work.
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Value rpsblast for PUT parameter PROGRAM does not match
> expression t?blast[ pnx]. Rejecting.
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:329
> STACK: Bio::Tools::Run::RemoteBlast::new C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:257
> STACK: blast_a_seq2.pm:14
> -----------------------------------------------------------
> So I should try to "change the wrapper to allow 'rpsblast'", right?
> Could You tell me how to do that, please? So sorry but I have no idea
> yet...:) If that doesn't work, is there any other way to run
> cdd-searches with perl?
> Thank you so much!
> Regards, Jonas
>
> ----- Original Message -----
> From: "Chris Fields"
> To: "Cook, Malcolm"
> Cc: "'Jonas Schaer'" ; "'BioPerl List'" ; "'Smithies, Russell'" ;
> Sent: Thursday, July 09, 2009 9:19 PM
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
>
> > I've scheduled this tentatively for the 1.6 release series (just not
> > sure when yet). It may work as is, but I haven't tried it out yet
> > (and am hazarding to guess it only retrieves the single main RID at
> > the moment).
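
Malcolm's two-step recipe above, folded into a complete script, might look like the sketch below. It is untested here and assumes the NCBI service still behaves as described in the thread. cdd_params and run_cdd_search are hypothetical names and the -expect value is an arbitrary choice. The database is passed as -data, the constructor argument Jonas's script uses (Malcolm's message spells it -database). -prog is left at 'blastp' on the reading that "no other changes needed" means the program setting stays as in Jonas's script. The polling loop follows that script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Constructor arguments for a CDD search through RemoteBlast.
# -prog stays 'blastp' (assumption: the wrapper rejects 'rpsblast' as a
# PROGRAM value, as the exception in the thread shows); only the database
# changes. The -expect value is an illustrative choice, not from the thread.
sub cdd_params {
    return (
        -prog       => 'blastp',
        -data       => 'cdsearch/cdd',
        -expect     => '10',
        -readmethod => 'SearchIO',
    );
}

sub run_cdd_search {
    my ($aa_seq) = @_;

    # Loaded lazily so cdd_params() can be inspected without BioPerl installed.
    require Bio::PrimarySeq;
    require Bio::Tools::Run::RemoteBlast;

    my $factory = Bio::Tools::Run::RemoteBlast->new( cdd_params() );

    # Step 2 of the recipe: set the SERVICE header before submitting the job.
    $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';

    my $seq = Bio::PrimarySeq->new( -seq => $aa_seq, -display_id => 'query' );
    $factory->submit_blast($seq);

    while ( my @rids = $factory->each_rid ) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( !ref $rc ) {
                # Negative return: the request failed, stop polling this RID.
                $factory->remove_rid($rid) if $rc < 0;
                sleep 5;    # otherwise still running: wait before polling again
                next;
            }
            $factory->remove_rid($rid);
            my $result = $rc->next_result or next;
            while ( my $hit = $result->next_hit ) {
                while ( my $hsp = $hit->next_hsp ) {
                    print $hit->name, "\t", $hsp->score, "\n";
                }
            }
        }
    }
    return;
}

# Contacts NCBI only when a protein sequence is given on the command line.
run_cdd_search( $ARGV[0] ) if @ARGV;
```

Run it as `perl cdd_search.pl MGSSSVGTYHLL...` with the amino-acid query as the only argument (the file name is arbitrary); without an argument the script loads and exits without contacting NCBI.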

From clarsen at vecna.com Fri Jul 10 12:41:37 2009
From: clarsen at vecna.com (Chris Larsen)
Date: Fri, 10 Jul 2009 12:41:37 -0400
Subject: [Bioperl-l] Mac platform instructions
Message-ID:

Brian,

I too am on a Mac now. However, the 'getting bioperl' MacOS link on:

"http://www.bioperl.org/wiki/Getting_BioPerl"

which loads the URL:

"http://www.bioperl.org/wiki/Getting_BioPerl#MacOS_X_using_fink"

does nothing but reload the same page... it took a bit to figure out how
to begin the install, scroll around etc., since it doesn't behave as the
other platforms' links do (Firefox 3.0.11, OS X 10.5.7). Think I have it now.

The rest of the install instructions seem straightforward and should
behave as well as the Fedora tarball did, thanks for that documentation.

Cheers

Chris

--
Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525

From cjfields1 at gmail.com Fri Jul 10 14:04:43 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Fri, 10 Jul 2009 13:04:43 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
 <46A05E0132144D73A0F805953B580B2F@jonas>
 <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz>
 <426C1893A5AD499DB4DBFEEBD257B254@jonas>
 <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com>
Message-ID: <7BBF64FF-F531-4F7C-8A31-BD04FCE1BF1A@gmail.com>

Malcolm,

Nice! Go ahead and add the test in; we can look at trying to get
CDD_SEARCH working at some point, but this is a nice workaround.

chris

On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:

> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
> the following. Is it appropriate to submit it?
>
> Jonas, OK, I was a little quick on the gun... but I've got it now.
>
> You don't need to change the wrapper. Here is what you need to do:
>
> # 1) set your database like this:
>
> -database => 'cdsearch/cdd', # c.f. http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html for other cdd database options
>
> # 2) add this line before submitting the job:
> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';
>
> You're in - no other changes needed.
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de]
>> Sent: Friday, July 10, 2009 4:18 AM
>> To: BioPerl List; Cook, Malcolm; Chris Fields
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>> Hi,
>> I tried to do what Malcolm proposed (my $prog = 'rpsblast';
>> my $db = 'CDD';) but that didn't work.
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: Value rpsblast for PUT parameter PROGRAM does not match expression t?blast[ pnx]. Rejecting.
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:329
>> STACK: Bio::Tools::Run::RemoteBlast::new C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:257
>> STACK: blast_a_seq2.pm:14
>> -----------------------------------------------------------
>> So I should try to "change the wrapper to allow 'rpsblast'",
>> right? Could you tell me how to do that, please? So sorry, but
>> I have no idea yet...:) If that doesn't work, is there any
>> other way to run cdd-searches with perl?
>> Thank you so much!
>> Regards, Jonas
>>
>> ----- Original Message -----
>> From: "Chris Fields"
>> To: "Cook, Malcolm"
>> Cc: "'Jonas Schaer'" ; "'BioPerl List'" ; "'Smithies, Russell'" ;
>> Sent: Thursday, July 09, 2009 9:19 PM
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>>
>>> I've scheduled this tentatively for the 1.6 release series (just not
>>> sure when yet). It may work as is, but I haven't tried it out yet
>>> (and am hazarding a guess that it only retrieves the single main RID at
>>> the moment).
>>>
>>> chris
>>>
>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote:
>>>
>>>> Jonas,
>>>>
>>>> If you want to continue to use the bioperl remoteblast interface,
>>>> probably what you should do is simply call it twice.
>>>>
>>>> Once, as you already know how to do, which will return without CDD
>>>> results.
>>>>
>>>> Secondly, to get the CDD results, call remoteblast a second time.
>>>> This time, using
>>>> -database => 'CDD'
>>>> -program => 'rpsblast'
>>>>
>>>> However, the wrapper may object to the 'rpsblast' program. It is
>>>> not listed in the POD
>>>> (http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/RemoteBlast.pm).
>>>> If so, my guess is that changing the perl wrapper to allow
>>>> rpsblast will "just work" (tm). I've cc:ed cjfields at bioperl.org for
>>>> his opinion on this.
>>>>
>>>> Also, you might want to perform the CDD search first, especially if
>>>> you are streaming results to eyeball that might like something to
>>>> look at while the second (presumably longer) search is running.
>>>>
>>>> Cheers,
>>>>
>>>> Malcolm Cook
>>>> Stowers Institute for Medical Research - Kansas City, Missouri
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Jonas Schaer
>>>>> Sent: Thursday, July 09, 2009 5:16 AM
>>>>> To: BioPerl List; Smithies, Russell
>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>>>>
>>>>> Hi guys,
>>>>> Thank you all so much for your help and patience :). Of
>>>>> course you were right and I finally found the right
>>>>> put-parameter to get exactly the same hits as on the homepage.
>>>>> I do have another question though :)...
>>>>> I now want to include a search for conserved domains, but
>>>>> when I try to use the CDD_SEARCH-parameter
>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html#sub:CDD_SEARCH)
>>>>> like the other put-parameters the way chris once told
>>>>> me (works fine with the other params):
>>>>>
>>>>> my %put = (
>>>>>     WORD_SIZE    => 3,
>>>>>     HITLIST_SIZE => 100,
>>>>>     THRESHOLD    => 11,
>>>>>     FILTER       => 'R',
>>>>>     GENETIC_CODE => 1,
>>>>>     CDD_SEARCH   => 'on'    ### I tried it with 'true' and '1', too.
>>>>> );
>>>>>
>>>>> for my $putName (keys %put) {
>>>>>     $factory->submit_parameter($putName,$put{$putName});
>>>>> }
>>>>>
>>>>>
>>>>> ...an exception is thrown:
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: CDD_SEARCH is not a valid PUT parameter.
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359
>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:/Perl/site/lib/Bio/Tools/Run/RemoteBlast.pm:325
>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383
>>>>> STACK: main::blast_it firsteval0.8.pm:288
>>>>> STACK: firsteval0.8.pm:35
>>>>> -----------------------------------------------------------
>>>>> I guess somehow this could be the solution to my problem:
>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#sub:RID-for-Simultaneous
>>>>> , but unfortunately I don't understand what to do.
>>>>> I'm so sorry to bother you with this but please help me once
>>>>> more...:)
>>>>>
>>>>> Best regards and thanks in advance,
>>>>> Jonas
>>>>>
>>>>> ----- Original Message -----
>>>>> From: "Smithies, Russell"
>>>>> To: "'Jonas Schaer'"
>>>>> Cc: "'Chris Fields'" ; "'BioPerl List'"
>>>>> Sent: Monday, July 06, 2009 10:56 PM
>>>>> Subject: RE: [Bioperl-l] different results with remote-blast skript
>>>>>
>>>>>
>>>>> Hi Jonas,
>>>>> You can't just play with the BLAST parameters and hope for a "better"
>>>>> result.
>>>>> I'd suggest that if you aren't sure what they do, you should leave them
>>>>> alone, as small changes can make huge differences in the output - it's
>>>>> quite possible to miss finding what you're looking for by using the wrong
>>>>> parameters.
>>>>> If all else fails, read the blast manual:
>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall_all.html
>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/
>>>>> Or read Ian Korf's excellent book:
>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJpfuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3
>>>>>
>>>>> Don't worry about the integer overflow bug, as there's nothing you can do
>>>>> about it. If you're interested, Google and Wikipedia are your friends:
>>>>> http://en.wikipedia.org/wiki/Integer_overflow
>>>>>
>>>>>
>>>>> Russell
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer
>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m.
>>>>>> To: BioPerl List; Chris Fields
>>>>>> Subject: Re: [Bioperl-l] different results with remote-blast skript
>>>>>>
>>>>>> Hi guys, thanks for your answers so far.
>>>>>> @jason: integer overflow in blast.... sorry, but what do you mean by that?
>>>>>> how can I fix it...?
>>>>>> >>>>>> Since I never really changed any parameters I thought them >>>>> all to be >>>>>> default. >>>>>> whatever, I tried to get "better" results with my prog >> by changing >>>>>> these: >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>> >>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>>>> STICS'} = >>>>>> '1'; >>>>>> with no effect...I guess these were default values anyway. >>>>>> >>>>>> So please maybe you can tell me all the other parameters I >>>>> can change with >>>>>> my >>>>>> perl-skript AND how to do that? >>>>>> Unfortunately both, perl and the blast-algorithm are pretty >>>>> much new to >>>>>> me, >>>>>> maybe thats why I just cannot find out how to do that on my >>>>> own... :/ >>>>>> >>>>>> Here is the output I get with my remote-blast skript: >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>> ################################### >>>>>> Query Name: >>>>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>>>> L >>>>>> hit name is ref|XP_001702807.1| >>>>>> score is 442 >>>>>> BLASTP 2.2.21+ >>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>>>> A. Schaffer, >>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>> Lipman (1997), >>>>>> "Gapped >>>>>> BLAST and PSI-BLAST: a new generation of protein database search >>>>>> programs", >>>>>> Nucleic Acids Res. 25:3389-3402. >>>>>> >>>>>> >>>>>> Reference for composition-based statistics: Alejandro A. >>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>>>> John L. Spouge, >>>>>> Yuri >>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. 
Altschul (2001), >>>>> "Improving the >>>>>> accuracy of PSI-BLAST protein database searches with >>>>> composition-based >>>>>> statistics and other refinements", Nucleic Acids Res. >> 29:2994-3005. >>>>>> >>>>>> >>>>>> RID: 53STX5G2013 >>>>>> >>>>>> >>>>>> Database: All non-redundant GenBank CDS >>>>>> translations+PDB+SwissProt+PIR+PRF excluding >> environmental samples >>>>>> from WGS projects >>>>>> 9,252,587 sequences; 3,169,972,781 total letters Query= >>>>>> >>>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>>>> >>>>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM >>>>>> ATGPDPDDEYE >>>>>> Length=150 >>>>>> >>>>>> >>>>>> >>>>> Score >>>>>> E >>>>>> Sequences producing significant alignments: >>>>> (Bits) >>>>>> Value >>>>>> >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>> reinhard... 174 >>>>>> 2e-42 >>>>>> >>>>>> >>>>>> ALIGNMENTS >>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >> reinhardtii] >>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>>>> Length=303 >>>>>> >>>>>> Score = 174 bits (442), Expect = 2e-42, Method: >>>>> Composition-based >>>>>> stats. 
>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>>>> Gaps = 0/150 >>>>>> (0%) >>>>>> >>>>>> Query 1 >>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>>>> 60 >>>>>> >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>> Sbjct 154 >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>> 213 >>>>>> >>>>>> Query 61 >>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>> 120 >>>>>> >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>> Sbjct 214 >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>> 273 >>>>>> >>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE >>>>>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>>>> >>>>>> >>>>>> >>>>>> Database: All non-redundant GenBank CDS >>>>>> translations+PDB+SwissProt+PIR+PRF >>>>>> excluding environmental samples from WGS projects >>>>>> Posted date: Jul 5, 2009 4:41 AM >>>>>> Number of letters in database: -1,124,994,511 >>>>>> Number of sequences in database: 9,252,587 >>>>>> >>>>>> Lambda K H >>>>>> 0.309 0.122 0.345 >>>>>> Gapped >>>>>> Lambda K H >>>>>> 0.267 0.0410 0.140 >>>>>> Matrix: BLOSUM62 >>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>> Number of Sequences: 9252587 >>>>>> Number of Hits to DB: 60273703 >>>>>> Number of extensions: 1448367 >>>>>> Number of successful extensions: 2103 >>>>>> Number of sequences better than 10: 0 >>>>>> Number of HSP's better than 10 without gapping: 0 >>>>>> Number of HSP's gapped: 2113 >>>>>> Number of HSP's successfully gapped: 0 >>>>>> Length of query: 150 >>>>>> Length of database: 3169972781 >>>>>> Length adjustment: 113 >>>>>> Effective length of query: 37 >>>>>> Effective length of database: 2124430450 >>>>>> Effective search space: 78603926650 >>>>>> Effective search space used: 78603926650 >>>>>> T: 11 >>>>>> A: 40 >>>>>> X1: 16 (7.1 bits) >>>>>> X2: 38 (14.6 bits) >>>>>> X3: 64 (24.7 bits) >>>>>> S1: 
42 (20.8 bits) >>>>>> S2: 74 (33.1 bits) >>>>>> >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>> ################################### >>>>>> and here are the hits (?) of the blast-algorithm on the >>>>> ncbi-homepage with >>>>>> the same query of course: >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>> reinhard... 300 >>>>>> 3e-80 >>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >>>>> [Acyrtho... 36.2 >>>>>> 1.1 >>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >>>>> [Blautia... 35.4 >>>>>> 1.8 >>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >>>>> brazil... 34.3 >>>>>> 4.2 >>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 >>>>> [Aspergillus n... 33.5 >>>>>> 6.0 >>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 >>>>> [Methyloba... 33.5 >>>>>> 7.0 >>>>>> >>>>> ############################################################## >>>>> ################ >>>>>> ###################################at >>>>>> least the first hit is the same, but even there there is a >>>>> different score >>>>>> and e-value. >>>>>> >>>>>> thanks so much for any help :) >>>>>> regards, jonas >>>>>> >>>>>> >>>>>> ----- Original Message ----- >>>>>> From: "Chris Fields" >>>>>> To: "Jason Stajich" >>>>>> Cc: "Smithies, Russell" >>>>> ; "'BioPerl >>>>>> List'" ; "'Jonas Schaer'" >>>>>> >>>>>> Sent: Monday, July 06, 2009 12:51 AM >>>>>> Subject: Re: [Bioperl-l] different results with >> remote-blast skript >>>>>> >>>>>> >>>>>>> That inspires confidence ;> >>>>>>> >>>>>>> chris >>>>>>> >>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>>>>> >>>>>>>> integer overflow in blast.... >>>>>>>> >>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>>>>> >>>>>>>>> I'd guess it's a difference in the parameters used. 
>>>>>>>>> Interesting that both have the number of letters in the db as >>>>>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>>>>> >>>>>>>>> Stats from your remote_blast: >>>>>>>>> >>>>>>>>> 'stats' => { >>>>>>>>> 'S1' => '42', >>>>>>>>> 'S1_bits' => '20.8', >>>>>>>>> 'lambda' => '0.309', >>>>>>>>> 'entropy' => '0.345', >>>>>>>>> 'kappa_gapped' => '0.0410', >>>>>>>>> 'T' => '11', >>>>>>>>> 'kappa' => '0.122', >>>>>>>>> 'X3_bits' => '24.7', >>>>>>>>> 'X1' => '16', >>>>>>>>> 'lambda_gapped' => '0.267', >>>>>>>>> 'X2' => '38', >>>>>>>>> 'S2' => '74', >>>>>>>>> 'seqs_better_than_cutoff' => '0', >>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>>>>> 'Hits_to_DB' => '60102303', >>>>>>>>> 'dbletters' => '-1125070205', >>>>>>>>> 'A' => '40', >>>>>>>>> 'num_successful_extensions' => '2004', >>>>>>>>> 'num_extensions' => '1436892', >>>>>>>>> 'X1_bits' => '7.1', >>>>>>>>> 'X3' => '64', >>>>>>>>> 'entropy_gapped' => '0.140', >>>>>>>>> 'dbentries' => '9252258', >>>>>>>>> 'X2_bits' => '14.6', >>>>>>>>> 'S2_bits' => '33.1' >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> Stats from a blast done on the NCBI webpage: >>>>>>>>> >>>>>>>>> Database: All non-redundant GenBank CDS >>>>> translations+PDB+SwissProt >>>>>>>>> +PIR+PRF >>>>>>>>> excluding environmental samples from WGS projects >>>>>>>>> Posted date: Jul 4, 2009 4:41 AM >>>>>>>>> Number of letters in database: -1,125,070,205 >>>>>>>>> Number of sequences in database: 9,252,258 >>>>>>>>> >>>>>>>>> Lambda K H >>>>>>>>> 0.309 0.124 0.340 >>>>>>>>> Gapped >>>>>>>>> Lambda K H >>>>>>>>> 0.267 0.0410 0.140 >>>>>>>>> Matrix: BLOSUM62 >>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 >>>>>>>>> Number of Sequences: 9252258 >>>>>>>>> Number of Hits to DB: 86493230 >>>>>>>>> Number of extensions: 3101413 >>>>>>>>> Number of successful extensions: 9001 >>>>>>>>> Number of sequences better than 100: 65 >>>>>>>>> Number of HSP's better than 100 without gapping: 0 >>>>>>>>> Number of HSP's gapped: 9000 >>>>>>>>> Number of HSP's 
successfully gapped: 66 >>>>>>>>> Length of query: 150 >>>>>>>>> Length of database: 3169897087 >>>>>>>>> Length adjustment: 113 >>>>>>>>> Effective length of query: 37 >>>>>>>>> Effective length of database: 2124391933 >>>>>>>>> Effective search space: 78602501521 >>>>>>>>> Effective search space used: 78602501521 >>>>>>>>> T: 11 >>>>>>>>> A: 40 >>>>>>>>> X1: 16 (7.1 bits) >>>>>>>>> X2: 38 (14.6 bits) >>>>>>>>> X3: 64 (24.7 bits) >>>>>>>>> S1: 42 (20.8 bits) >>>>>>>>> S2: 65 (29.6 bits) >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>>>>> To: BioPerl List >>>>>>>>>> Subject: [Bioperl-l] different results with >> remote-blast skript >>>>>>>>>> >>>>>>>>>> Hi again :) >>>>>>>>>> please, I only have this little question: >>>>>>>>>> why do I get different results with my remote::blast >>>>> perl skript >>>>>>>>>> then on the >>>>>>>>>> ncbi blast homepage? >>>>>>>>>> I am using blastp, the query is an amino-sequence (different >>>>>>>>>> results with any >>>>>>>>>> sequence, differences not only in number of hits but >> even in e- >>>>>>>>>> values, scores >>>>>>>>>> etc...), the database is 'nr'. 
>>>>>>>>>> PLEASE help me,
>>>>>>>>>> thank you in advance,
>>>>>>>>>> Jonas
>>>>>>>>>>
>>>>>>>>>> ps: my script:
>>>>>>>>>> ################################################################################
>>>>>>>>>> use Bio::Seq::SeqFactory;
>>>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>>>> use strict;
>>>>>>>>>> my @blast_report;
>>>>>>>>>> my $prog = 'blastp';
>>>>>>>>>> my $db = 'nr';
>>>>>>>>>> my $e_val= '1e-10';
>>>>>>>>>> #my $e_val= '10';
>>>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>>>                '-data' => $db,
>>>>>>>>>>                '-expect' => $e_val,
>>>>>>>>>>                '-readmethod' => 'SearchIO' );
>>>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
>>>>>>>>>>
>>>>>>>>>> my $blast_seq ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
>>>>>>>>>> #$v is just to turn on and off the messages
>>>>>>>>>> my $v = 1;
>>>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
>>>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
>>>>>>>>>> my $filename='temp2.out';
>>>>>>>>>> my $r = $factory->submit_blast($seq);
>>>>>>>>>> print STDERR "waiting..." if( $v > 0 );
>>>>>>>>>> while ( my @rids = $factory->each_rid )
>>>>>>>>>> {
>>>>>>>>>>     foreach my $rid ( @rids )
>>>>>>>>>>     {
>>>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>         if( !ref($rc) )
>>>>>>>>>>         {
>>>>>>>>>>             if( $rc < 0 )
>>>>>>>>>>             {
>>>>>>>>>>                 $factory->remove_rid($rid);
>>>>>>>>>>             }
>>>>>>>>>>             print STDERR "." if ( $v > 0 );
>>>>>>>>>>         }
>>>>>>>>>>         else
>>>>>>>>>>         {
>>>>>>>>>>             my $result = $rc->next_result();
>>>>>>>>>>             $factory->save_output($filename);
>>>>>>>>>>             $factory->remove_rid($rid);
>>>>>>>>>>             print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>>>             while ( my $hit = $result->next_hit )
>>>>>>>>>>             {
>>>>>>>>>>                 next unless ( $v > 0);
>>>>>>>>>>                 print "\thit name is ", $hit->name, "\n";
>>>>>>>>>>                 while( my $hsp = $hit->next_hsp )
>>>>>>>>>>                 {
>>>>>>>>>>                     print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>>>                 }
>>>>>>>>>>             }
>>>>>>>>>>         }
>>>>>>>>>>     }
>>>>>>>>>> }
>>>>>>>>>> @blast_report = get_file_data ($filename);
>>>>>>>>>> return @blast_report;
>>>>>>>>>> ################################################################################
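[Editorial sketch, not part of the original message.] The workaround discussed in this thread (point the database at 'cdsearch/cdd' and set the SERVICE header to 'rpsblast') can be sketched roughly as below. This is only a sketch under stated assumptions: the program is left as 'blastp' (the thread does not spell this out, but the wrapper rejects 'rpsblast' as a PROGRAM value), the database is passed with '-data' as in Jonas's script rather than '-database', and the sequence id 'query1' and the 5-second pause are illustrative choices. It needs BioPerl installed and network access to NCBI.

```perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Tools::Run::RemoteBlast;

# Assumption: PROGRAM stays 'blastp'; only the database and SERVICE change.
my $factory = Bio::Tools::Run::RemoteBlast->new(
    '-prog'       => 'blastp',
    '-data'       => 'cdsearch/cdd',   # CDD database name given in the thread
    '-readmethod' => 'SearchIO',
);

# Route the request to the RPS-BLAST service, as suggested in the thread.
$Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast';

# Illustrative query: the protein sequence used earlier in the thread.
my $seq = Bio::PrimarySeq->new(
    -id  => 'query1',                  # hypothetical identifier
    -seq => 'MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR'
          . 'SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN'
          . 'AFRQAHQNTAMATGPDPDDEYE',
);

$factory->submit_blast($seq);

# Poll until every request ID has either failed or returned a report.
while ( my @rids = $factory->each_rid ) {
    foreach my $rid (@rids) {
        my $rc = $factory->retrieve_blast($rid);
        if ( !ref $rc ) {
            # A negative return code signals an error; otherwise keep waiting.
            $factory->remove_rid($rid) if $rc < 0;
            sleep 5;
            next;
        }
        $factory->remove_rid($rid);
        my $result = $rc->next_result;
        while ( my $hit = $result->next_hit ) {
            print $hit->name, "\t", $hit->description, "\n";
        }
    }
}
```

Running the ordinary 'nr' search and this CDD search as two separate factories, as suggested earlier in the thread, keeps the two result sets apart.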
From maj at fortinbras.us Fri Jul 10 15:12:35 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 10 Jul 2009 15:12:35 -0400
Subject: [Bioperl-l] Mac platform instructions
In-Reply-To:
References:
Message-ID:

should be http://www.bioperl.org/wiki/Getting_BioPerl#Mac_OS_X_using_fink --

----- Original Message -----
From: "Chris Larsen"
To:
Sent: Friday, July 10, 2009 12:41 PM
Subject: Re: [Bioperl-l] Mac platform instructions

> Brian,
>
> I too am on a Mac now. However the 'getting bioperl' MacOs link on:
> "http://www.bioperl.org/wiki/Getting_BioPerl"
>
> which loads the URL:
> "http://www.bioperl.org/wiki/Getting_BioPerl#MacOS_X_using_fink"
>
> does nothing but reload the same page...it took a bit to figure out
> how to begin install, scroll around etc. since it doesn't behave as do
> the other platforms links. (Firefox 3.0.11, OS X 10.5.7). Think I have
> it now.
>
> The rest of the install instructions seem straightforward and should
> behave as well as the Fedora tarball did, thanks for that documentation.
>
> Cheers
>
> Chris
>
>
> --
>
> Christopher Larsen, Ph.D.
> Sr. Scientist / Grants Manager
> Vecna Technologies
> 6404 Ivy Lane #500
> Greenbelt, MD 20770
> Phone: (240) 965-4525
> Fax: (240) 547-6133
> 240-737-4525

From plantboy at gmail.com Fri Jul 10 19:33:25 2009
From: plantboy at gmail.com (cody h)
Date: Fri, 10 Jul 2009 16:33:25 -0700
Subject: [Bioperl-l] Trouble installing bioperl-db on MacOS X... Help?
Message-ID: <320708320907101633w69be0a18vd533727bf3e2b4bb@mail.gmail.com>

Hi,

I'm trying to install bioperl-db 1.5.2 on an Intel Mac running OS X 10.5.7.
The Build.PL file executes fine, but the test suite fails dramatically,
returning the error "No database selected" for many of the tests. All the
error calls seem to originate from line 852 in BasePersistenceAdaptor.pm.
I took a look at the code but could not figure out why it wasn't working.

I have bioperl 1.5.2 installed and the biosql schema loaded into my mysql
server. The dependencies all seem to be working, but I haven't used them
enough to completely verify this, so that could be part of the problem. I
don't know which ones to check, though. Does anyone have any idea why I might
be getting these "No database selected" errors? Here is a sample of the
error messages given by the ./Build test command (note: this same error is
generated by 15 of the 16 test files).

I am new to Perl and would really appreciate any help or guidance at all!
Thanks!

Cody

t/12ontology.t .... 1/738
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: error while executing statement in Bio::DB::BioSQL::OntologyAdaptor::find_by_unique_key: No database selected
STACK: Error::throw
STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
STACK: t/12ontology.t:44
-----------------------------------------------------------
t/12ontology.t .... Dubious, test returned 255 (wstat 65280, 0xff00)

From hlapp at gmx.net Sat Jul 11 07:32:11 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 11 Jul 2009 07:32:11 -0400
Subject: [Bioperl-l] Trouble installing bioperl-db on MacOS X... Help?
In-Reply-To: <320708320907101633w69be0a18vd533727bf3e2b4bb@mail.gmail.com>
References: <320708320907101633w69be0a18vd533727bf3e2b4bb@mail.gmail.com>
Message-ID: <7F2442F5-2224-405C-92A0-97E34FDFC2F9@gmx.net>

Hi Cody,

it seems like bioperl-db fails to connect to your database, or it
connects but doesn't have a database selected (in MySQL, connecting
without setting the database is legitimate), and so as soon as it wants
to execute a statement it fails.

Have you set your connection parameters in t/DBTestHarness.conf?

-hilmar

On Jul 10, 2009, at 7:33 PM, cody h wrote:

> Hi,
>
> I'm trying to install bioperl-db 1.5.2 on an intel mac running os 10.5.7.
> The Build.PL file executes fine, but the test suite fails dramatically,
> returning the error "No database selected" for many of the tests. All the
> error calls seem to be originating from line 852 in
> BasePersistenceAdaptor.pm. I took a look at the code but I could not figure
> out why it wasn't working.
>
> I have bioperl 1.5.2 installed and the biosql schema loaded into my mysql
> server. The dependencies all seem to be working, but I haven't used them
> enough to completely verify this, so that could be part of the problem. I
> don't know which ones to check though. Does anyone have any idea why I might
> be getting these "No database selected" errors? Here is a sample of the
> error messages given by the ./Build test command (note, this same error is
> generated by 15/16 test files)
>
> I am new to Perl and would really appreciate any help or guidance at all!
> Thanks!
>
> Cody
>
> t/12ontology.t .... 1/738
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: error while executing statement in Bio::DB::BioSQL::OntologyAdaptor::find_by_unique_key: No database selected
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /Library/Perl/5.8.8/Bio/Root/Root.pm:359
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:948
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:852
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:182
> STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
> STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
> STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:169
> STACK: Bio::DB::Persistent::PersistentObject::create /Users/cody/Desktop/bioperl-db-1.5.2_100/blib/lib/Bio/DB/Persistent/PersistentObject.pm:244
> STACK: t/12ontology.t:44
> -----------------------------------------------------------
> t/12ontology.t ....
Dubious, test returned 255 (wstat 65280, 0xff00)

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From abhishek.vit at gmail.com Mon Jul 13 11:10:04 2009
From: abhishek.vit at gmail.com (Abhishek Pratap)
Date: Mon, 13 Jul 2009 11:10:04 -0400
Subject: [Bioperl-l] Classifying SNPs
In-Reply-To: <6269F0005AD041A69233C82E9BE1E776@NewLife>
References: <6269F0005AD041A69233C82E9BE1E776@NewLife>
Message-ID:

Dear Mark

Sorry I was not able to reply earlier. Many thanks for your detailed
explanation. However, this is not exactly what I am looking for. Maybe my
initial mail was not well articulated, or I am not able to follow your reply
fully. My bad.

As input, what we have is just the genomic coordinates for SNPs
predicted by Illumina's proprietary software CASAVA. What we would like to do
is to further classify these predicted SNPs: if they fall into a coding
region, then whether they are synonymous or non-synonymous SNPs.

So I guess something which
1. translates the SNP genomic coordinate into an mRNA offset
2. then identifies the ORF and target codon and checks whether the SNP
substitution will be syn/non-syn.

Thanks,
-Abhi

On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen wrote:

> Hey Abhishek-
> You might root around in Bio::PopGen. Here's a script to get stuff from
> raw fasta data--see comments within.
> cheers > Mark > > use Bio::AlignIO; > use Bio::PopGen::Utilities; > > $file = "your_raw_file.fas"; > > > my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln; > # get the alignment into a Bio::PopGen::Population format, with codons > # as the marker sites > my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln, > -site_model=>'cod'); > # here are your variable codons... > my @cdnpos = $pop->get_marker_names; > # here are your individuals represented in the alignment > my @inds = $pop->get_Individuals; > # which have names like "Codon-3-9", "Codon-4-12", etc > foreach my $cdn (@cdnpos) { > # calculate the unique codons represented at this codon position > my (%ucdns, @ucdns); > @genos = $pop->get_Genotypes(-marker=>$cdn); > $ucdns{$_->get_Alleles}++ for @genos; > @ucdns = sort keys %ucdns; > # > # here, use translate or something faster to identify syn/non-syn > # check out code in Bio::Align::DNAStatistics for various methods > > } > # relate back to individuals with this > foreach my $ind (@inds) { > print "Individual ".$ind->unique_id."\n"; > print "Site\tAllele\n"; > foreach my $cdn (@cdnpos) { > print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n"; > } > } > > > 1; > > ----- Original Message ----- From: "Abhishek Pratap" < > abhishek.vit at gmail.com> > To: > Sent: Wednesday, July 08, 2009 10:24 AM > Subject: [Bioperl-l] Classifying SNPs > > > > Hi All > > This might seem to be an old track question. However I was not able to > find a good answer in the many diff mailing list archives. > > For all our SNP predictions we would like to know whether they are > synonymous / non-synonymous. If Non-synonymous/Exonic then find the > position on the gene where amino acid is getting changed and to what > ...Also info about indels will help. > > I am not sure if something like this already exists. If not even some > pointers on how to move forward will help. 
> > Thanks, > -Abhi > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From maj at fortinbras.us Mon Jul 13 11:43:13 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 13 Jul 2009 11:43:13 -0400 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: References: <6269F0005AD041A69233C82E9BE1E776@NewLife> Message-ID: <6CF6461D65CD4625B4A48B4DFE7174F5@NewLife> Thanks Abhi-- I had a feeling there was more (or "less") to it-- this would be a nice feature to have, don't think it exists. Will think about it-- cheers ----- Original Message ----- From: "Abhishek Pratap" To: "Mark A. Jensen" Cc: Sent: Monday, July 13, 2009 11:10 AM Subject: Re: [Bioperl-l] Classifying SNPs > Dear Mark > Sorry I was not able to reply earlier. Many Thanks for your detailed > explanation. However this is not exactly what I am looking for. May be my > initial mail was not well articulated or I am not able to infer your reply > fully. My bad. > > Well as an input what we have is the just the genomic coordinates for SNP's > predicted by Illumina propriety software CASAVA. What we would like to do is > to further classify these predicted SNP's . If they fall into Coding region > then whether they are synonymous/non-syn SNPs. > > So I guess something which translates > 1. SNP genomic coordinate into mRNA offset > 2. Then identify the ORF and target codon and check whether the SNP > substitution will be syn/non-syn. > > Thanks, > -Abhi > > On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen wrote: > >> Hey Abhishek- >> You might root around in Bio::PopGen. Here's a script to get stuff from >> raw fasta data--see comments within. 
>> >> Thanks, >> -Abhi >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Mon Jul 13 12:33:36 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 13 Jul 2009 11:33:36 -0500 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: <6CF6461D65CD4625B4A48B4DFE7174F5@NewLife> References: <6269F0005AD041A69233C82E9BE1E776@NewLife> <6CF6461D65CD4625B4A48B4DFE7174F5@NewLife> Message-ID: Bio::Coordinate might help with coordinate conversion. However, much of this sounds very Ensembl-like. Have you looked at the Ensembl perl API? It can do #1 (coordinate conversion), and I'm sure something could be written up to do the second. chris On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote: > Thanks Abhi-- I had a feeling there was more (or "less") to it-- > this would be a nice feature to have, don't think it exists. Will > think about it-- cheers > ----- Original Message ----- From: "Abhishek Pratap" > > To: "Mark A. Jensen" > Cc: > Sent: Monday, July 13, 2009 11:10 AM > Subject: Re: [Bioperl-l] Classifying SNPs > > >> Dear Mark >> Sorry I was not able to reply earlier. Many Thanks for your detailed >> explanation. However this is not exactly what I am looking for. May >> be my >> initial mail was not well articulated or I am not able to infer >> your reply >> fully. My bad. >> >> Well as an input what we have is the just the genomic coordinates >> for SNP's >> predicted by Illumina propriety software CASAVA. What we would like >> to do is >> to further classify these predicted SNP's . If they fall into >> Coding region >> then whether they are synonymous/non-syn SNPs. >> >> So I guess something which translates >> 1. 
SNP genomic coordinate into mRNA offset >> 2. Then identify the ORF and target codon and check whether the SNP >> substitution will be syn/non-syn. >> >> Thanks, >> -Abhi >> >> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen >> wrote: >> >>> Hey Abhishek- >>> You might root around in Bio::PopGen. Here's a script to get stuff >>> from >>> raw fasta data--see comments within. >>> cheers >>> Mark >>> >>> use Bio::AlignIO; >>> use Bio::PopGen::Utilities; >>> >>> $file = "your_raw_file.fas"; >>> >>> >>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)- >>> >next_aln; >>> # get the alignment into a Bio::PopGen::Population format, with >>> codons >>> # as the marker sites >>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=> >>> $aln, >>> -site_model=>'cod'); >>> # here are your variable codons... >>> my @cdnpos = $pop->get_marker_names; >>> # here are your individuals represented in the alignment >>> my @inds = $pop->get_Individuals; >>> # which have names like "Codon-3-9", "Codon-4-12", etc >>> foreach my $cdn (@cdnpos) { >>> # calculate the unique codons represented at this codon position >>> my (%ucdns, @ucdns); >>> @genos = $pop->get_Genotypes(-marker=>$cdn); >>> $ucdns{$_->get_Alleles}++ for @genos; >>> @ucdns = sort keys %ucdns; >>> # >>> # here, use translate or something faster to identify syn/non-syn >>> # check out code in Bio::Align::DNAStatistics for various methods >>> >>> } >>> # relate back to individuals with this >>> foreach my $ind (@inds) { >>> print "Individual ".$ind->unique_id."\n"; >>> print "Site\tAllele\n"; >>> foreach my $cdn (@cdnpos) { >>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n"; >>> } >>> } >>> >>> >>> 1; >>> >>> ----- Original Message ----- From: "Abhishek Pratap" < >>> abhishek.vit at gmail.com> >>> To: >>> Sent: Wednesday, July 08, 2009 10:24 AM >>> Subject: [Bioperl-l] Classifying SNPs >>> >>> >>> >>> Hi All >>> >>> This might seem to be an old track question. 
However I was not >>> able to >>> find a good answer in the many diff mailing list archives. >>> >>> For all our SNP predictions we would like to know whether they are >>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the >>> position on the gene where amino acid is getting changed and to what >>> ...Also info about indels will help. >>> >>> I am not sure if something like this already exists. If not even >>> some >>> pointers on how to move forward will help. >>> >>> Thanks, >>> -Abhi >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Mon Jul 13 12:54:00 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 13 Jul 2009 09:54:00 -0700 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: References: <6269F0005AD041A69233C82E9BE1E776@NewLife> <6CF6461D65CD4625B4A48B4DFE7174F5@NewLife> Message-ID: Ensembl would be the best place to go if you are working with human SNPs, but for those who aren't so data lucky... Aspects of this also relate to the dn/dS code in Bio::Align::DNAStatistics -- though it does the classification and comparison all at once, so you'd have to dig the code out.
There is also the mcdonald_kreitman code in Bio::PopGen::Statistics, which classifies a change as synonymous or nonsynonymous via a lookup table stored in Bio::MolEvol::CodonModel. The table is keyed on the edit path, which is encoded as the two codons concatenated together -- e.g. use Bio::MolEvol::CodonModel; my $codon_path = Bio::MolEvol::CodonModel->codon_path; my ($ns, $syn) = @{ $codon_path->{'AATAAC'} }; print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n"; It all kind of depends on how you have the data organized. If it is just SNPs and you are trying to figure out if they are syn or non-syn, then you kind of need a good database to do this, since you'll have to know what gene they are in, the CDS of the gene, etc. It is possible to do with something as basic as GFF3 for your genome and the SNP locations and Bio::DB::SeqFeature::Store. While I can think of a way to code it up from those bare bones, maybe you should report back on whether you can just use the Ensembl classification of the SNPs? -jason On Jul 13, 2009, at 9:33 AM, Chris Fields wrote: > Bio::Coordinate might help with coordinate conversion. However, > much of this sounds very Ensembl-like. Have you looked at the > Ensembl perl API? It can do #1 (coordinate conversion), and I'm > sure something could be written up to do the second. > > chris > > On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote: > >> Thanks Abhi-- I had a feeling there was more (or "less") to it-- >> this would be a nice feature to have, don't think it exists. Will >> think about it-- cheers >> ----- Original Message ----- From: "Abhishek Pratap" > > >> To: "Mark A. Jensen" >> Cc: >> Sent: Monday, July 13, 2009 11:10 AM >> Subject: Re: [Bioperl-l] Classifying SNPs >> >> >>> Dear Mark >>> Sorry I was not able to reply earlier. Many Thanks for your detailed >>> explanation. However this is not exactly what I am looking for. >>> May be my >>> initial mail was not well articulated or I am not able to infer >>> your reply >>> fully. My bad.
>>>> >>>> Thanks, >>>> -Abhi >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjfields at illinois.edu Mon Jul 13 13:02:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 13 Jul 2009 12:02:43 -0500 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: References: <6269F0005AD041A69233C82E9BE1E776@NewLife> <6CF6461D65CD4625B4A48B4DFE7174F5@NewLife> Message-ID: On Jul 13, 2009, at 11:54 AM, Jason Stajich wrote: > Ensembl would be best place to go if you are working with human SNPs > but for those who aren't so data lucky... My mouse bias is showing ;> chris From abhishek.vit at gmail.com Mon Jul 13 13:13:08 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Mon, 13 Jul 2009 13:13:08 -0400 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: References: <6269F0005AD041A69233C82E9BE1E776@NewLife> <6CF6461D65CD4625B4A48B4DFE7174F5@NewLife> Message-ID: Hi Jason, Thanks for the detailed insight. I will definitely go the Ensembl way first and see if it can do exactly what we want. Whether it does or doesn't, I will report back on this same thread. I think having something like this in BioPerl will help the NGS community.
A lot of people are predicting SNPs from NGS (next-generation sequencing) data and looking for ways to better annotate/classify their predictions. Thanks, guys. It is a pleasure to interact with you all; I am overwhelmed by the responses. best, -Abhi On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich wrote: > Ensembl would be best place to go if you are working with human SNPs but > for those who aren't so data lucky... > > Aspects of this also relates to the dn/dS code in the > Bio::Align::DNAStatistics -- thought it does the classification and > comparison all at once so you'd have to dig code out. > > And the mcdonald_kreitman code in Bio::PopGen::Statistics which computes a > synonymous or nonsynonymous via lookup table that is stored in > Bio::MolEvol::CodonModel which compares the edit path which is encoded as > the two codons concatenated together -- i.ee > > use Bio::MolEvol::CodonModel; > my $codon_path = Bio::MolEvol::CodonModel->codon_path; > my ($ns, $syn) = $codon_path->{'AATAAC'}; > print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n"; > > > It all kind of depends on how you have the data organized, if it is just > SNPs and you are trying to figure out if they are syn or non-syn then you > kind of need a good database to do this since you'll have to know what gene > they are in, CDS of the gene, etc. It is possible to do with something as > basic as GFF3 for your genome and the SNP locations and > Bio::DB::SeqFeature::Store. While I can think of a way to code it up from > those bare-bones - maybe you should report back if you can just use the > Ensembl classification of the SNPs? > > -jason > > > On Jul 13, 2009, at 9:33 AM, Chris Fields wrote: > > Bio::Coordinate might help with coordinate conversion. However, much of >> this sounds very Ensembl-like. Have you looked at the Ensembl perl API? It >> can do #1 (coordinate conversion), and I'm sure something could be written >> up to do the second.
>>>>> >>>>> Thanks, >>>>> -Abhi >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > jason at bioperl.org > > From maj at fortinbras.us Mon Jul 13 13:10:47 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Mon, 13 Jul 2009 13:10:47 -0400 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: References: <6269F0005AD041A69233C82E9BE1E776@NewLife><6CF6461D65CD4625B4A48B4DFE7174F5@NewLife> Message-ID: I hate meeces to pieces. ----- Original Message ----- From: "Chris Fields" To: "Jason Stajich" Cc: "BioPerl List" ; "Abhishek Pratap" Sent: Monday, July 13, 2009 1:02 PM Subject: Re: [Bioperl-l] Classifying SNPs > On Jul 13, 2009, at 11:54 AM, Jason Stajich wrote: > >> Ensembl would be best place to go if you are working with human SNPs but for >> those who aren't so data lucky... 
> > My mouse bias is showing ;> > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jason at bioperl.org Mon Jul 13 20:02:39 2009 From: jason at bioperl.org (Jason Stajich) Date: Mon, 13 Jul 2009 17:02:39 -0700 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: References: <6269F0005AD041A69233C82E9BE1E776@NewLife> <6CF6461D65CD4625B4A48B4DFE7174F5@NewLife> Message-ID: Having a lightweight system for this could be helpful if it isn't replicated somewhere else. I don't think the NGS is really changing per se -- really it is just that more people have SNPs called in more research systems. It would be a simple little project I think coding up some basics if you think you'd actually take it over and run with it and/or push it into a semi-organized set of scripts? -jason On Jul 13, 2009, at 10:13 AM, Abhishek Pratap wrote: > Hi Jason > > Thanks for a detailed insight. I would definitely go the ensembl way > first > and try to see if it can do exactly what we want. > > In case it does/'nt I will report back on this same thread. I think > having > something like this in the Bioperl will help the NGS community. Lot of > people are predicting SNPs from NGS.oops(next generation > sequencing ) data > and looking for ways to better annotate/classify their predictions. > > Thanks guys .. It is a pleasure to interact with you all. Just > overwhelmed > to see the responses. > > best, > -Abhi > > > > On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich > wrote: > >> Ensembl would be best place to go if you are working with human >> SNPs but >> for those who aren't so data lucky... >> >> Aspects of this also relates to the dn/dS code in the >> Bio::Align::DNAStatistics -- thought it does the classification and >> comparison all at once so you'd have to dig code out. 
>>>>>>
>>>>>> Thanks,
>>>>>> -Abhi
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>>

--
Jason Stajich
jason at bioperl.org

From maj at fortinbras.us Mon Jul 13 22:15:30 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 13 Jul 2009 22:15:30 -0400
Subject: [Bioperl-l] perly suffix trees--
Message-ID: <3F34863C45914120A62B84BF973FAB76@NewLife>

Hi All-

Russell sent me an almost magical Perl algorithm for creating a suffix
tree or something like one. It was cool enough to make a scrap out of it--
http://www.bioperl.org/wiki/Suffix_trees_from_thin_air
Have a look; might be diverting-
cheers
Mark

From rmb32 at cornell.edu Tue Jul 14 00:24:04 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 13 Jul 2009 21:24:04 -0700
Subject: [Bioperl-l] perly suffix trees--
In-Reply-To: <3F34863C45914120A62B84BF973FAB76@NewLife>
References: <3F34863C45914120A62B84BF973FAB76@NewLife>
Message-ID: <4A5C0864.5070706@cornell.edu>

Heh, "suffix trees--" I thought for a second you were decrementing the
karma of suffix trees, meaning they were bad.

Come hang around in the #bioperl channel on freenode. All the cool kids
are doing it.
;-)

Rob

From gmodhelp at googlemail.com Tue Jul 14 02:36:50 2009
From: gmodhelp at googlemail.com (Dave Clements, GMOD Help Desk)
Date: Tue, 14 Jul 2009 02:36:50 -0400
Subject: [Bioperl-l] August 2009 GMOD Meeting
In-Reply-To: <71ee57c70907011038u7bf75f00x7e486cb1b8a00e35@mail.gmail.com>
References: <71ee57c70907011032k25daa9cche0f4778e1c2c0093@mail.gmail.com>
 <71ee57c70907011036w49b9c144qbe04fcd8d8d1d7d0@mail.gmail.com>
 <71ee57c70907011037o574666f9k8af120c04b2ea54c@mail.gmail.com>
 <71ee57c70907011038u7bf75f00x7e486cb1b8a00e35@mail.gmail.com>
Message-ID: <71ee57c70907132336v4837b5fbp55fb24e40a8b7374@mail.gmail.com>

Hello all,

Just a reminder that the August 2009 GMOD Meeting is being held 6-7
August, in Oxford, UK. With a little over 3 weeks to go, the meeting is
now 40% full. All remaining space is available on a first come, first
served basis, and I encourage you to register now, before all open slots
are taken. (The January meeting was completely full.) You can register
at http://gmod.org/wiki/August_2009_GMOD_Meeting.

Please let me know if you have any questions.

Cheers,

Dave C.

On Wed, Jul 1, 2009 at 1:38 PM, Dave Clements, GMOD Help Desk wrote:
> Hello all,
>
> The next GMOD meeting will be held 6-7 August, at the University of
> Oxford, in Oxford, United Kingdom. Registration is now open. Space is
> available on a first come, first served basis and there is room for 55
> attendees. The meeting cost is £50. See
> http://gmod.org/wiki/August_2009_GMOD_Meeting to register.
>
> As with previous GMOD meetings, this meeting will have a mixture of
> project, component, and user talks. The agenda is driven by attendee
> suggestions, and you are encouraged to add your suggestions now (see
> http://gmod.org/wiki/August_2009_GMOD_Meeting#Agenda_Suggestions).
>
> For examples of what happens at a GMOD meeting, see the writeups of
> the January 2009, July 2008, or any other previous meeting (see
> http://gmod.org/wiki/Meetings). GMOD meetings are an excellent way to
> meet other GMOD developers and users and to learn (and affect) what's
> coming in the project.
>
> Please join us in Oxford this August,
>
> Dave Clements
> GMOD Help Desk
>
> Note: Unless you have applied to and been admitted to the Summer
> School, don't you dare register for it. The registration web site will
> let you do this, but bureaucratic hellishness will ensue.
>
> --
> * Learn more about GMOD at: ISMB/ECCB: http://www.iscb.org/ismbeccb2009/
>   (BioMart, Chado, Galaxy, InterMine)
> * Please keep responses on the list!
> * Was this helpful? Let us know at http://gmod.org/wiki/Help_Desk_Feedback
>

--
* Register now for the August GMOD Meeting: http://gmod.org/wiki/August_2009_GMOD_Meeting
* Please keep responses on the list!
* Was this helpful? Let us know at http://gmod.org/wiki/Help_Desk_Feedback

From hlapp at gmx.net Tue Jul 14 04:23:45 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 14 Jul 2009 09:23:45 +0100
Subject: [Bioperl-l] perly suffix trees--
In-Reply-To: <4A5C0864.5070706@cornell.edu>
References: <3F34863C45914120A62B84BF973FAB76@NewLife>
 <4A5C0864.5070706@cornell.edu>
Message-ID: <3B6AF1BF-B51B-4FD1-BD2F-0A3E72978F45@gmx.net>

On Jul 14, 2009, at 5:24 AM, Robert Buels wrote:

> Come hang around in the #bioperl channel on freenode. All the cool
> kids are doing it. ;-)

Which proves I'm not a cool kid. I've suspected that for a while ... ;)

Seriously though, I think it's great that some people are volunteering
their time to create that presence and populate it.
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From scott at scottcain.net Tue Jul 14 11:10:44 2009
From: scott at scottcain.net (Scott Cain)
Date: Tue, 14 Jul 2009 11:10:44 -0400
Subject: [Bioperl-l] Windows and ppm:package.xml
Message-ID: <498AA22B-42C1-4FF3-A22D-46C8F293DFBB@scottcain.net>

Hello,

Is there a reason that http://bioperl.org/DIST/package.xml hasn't been
freshened to include the 1.6 release? It appears that the ppm build of
the release is there, but when I try to use ppm to install it
(Activestate 5.8 build 826), it installs 1.5.2, since that is the only
thing mentioned in the package.xml file.

Thanks,
Scott

-----------------------------------------------------------------------
Scott Cain, Ph. D.                          scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                       216-392-3087
Ontario Institute for Cancer Research

From jason at bioperl.org Tue Jul 14 13:20:35 2009
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 14 Jul 2009 10:20:35 -0700
Subject: [Bioperl-l] Windows and ppm:package.xml
In-Reply-To: <498AA22B-42C1-4FF3-A22D-46C8F293DFBB@scottcain.net>
References: <498AA22B-42C1-4FF3-A22D-46C8F293DFBB@scottcain.net>
Message-ID:

It ought to be fixed, I am sure just a reflection of forgetting to do
it. If you can provide the XML patch I can paste it in or chris might
have time to add this in.

-jason

On Jul 14, 2009, at 8:10 AM, Scott Cain wrote:

> Hello,
>
> Is there a reason that http://bioperl.org/DIST/package.xml hasn't
> been freshened to include the 1.6 release? It appears that the ppm
> build of the release is there, but when I try to use ppm to install
> it (Activestate 5.8 build 826), it installs 1.5.2, since that is the
> only thing mentioned in the package.xml file.
>
> Thanks,
> Scott
>
> -----------------------------------------------------------------------
> Scott Cain, Ph. D.                          scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                       216-392-3087
> Ontario Institute for Cancer Research

--
Jason Stajich
jason at bioperl.org

From cjfields at illinois.edu Tue Jul 14 14:19:03 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 14 Jul 2009 13:19:03 -0500
Subject: [Bioperl-l] Windows and ppm:package.xml
In-Reply-To: <498AA22B-42C1-4FF3-A22D-46C8F293DFBB@scottcain.net>
References: <498AA22B-42C1-4FF3-A22D-46C8F293DFBB@scottcain.net>
Message-ID: <7B1F9B94-D8A1-4092-86F1-66D6AF0B9F11@illinois.edu>

Scott,

It's filed as a bug:

http://bugzilla.open-bio.org/show_bug.cgi?id=2794

The problem with PPM at the time of the 1.6.0 release was there were
several modules recommended that weren't available (GraphViz was one,
can't remember the other). This will be fixed for the next release,
which I can probably get an alpha for in the next week. The alphas will
help us in testing this out.

chris

On Jul 14, 2009, at 10:10 AM, Scott Cain wrote:

> Hello,
>
> Is there a reason that http://bioperl.org/DIST/package.xml hasn't
> been freshened to include the 1.6 release? It appears that the ppm
> build of the release is there, but when I try to use ppm to install
> it (Activestate 5.8 build 826), it installs 1.5.2, since that is the
> only thing mentioned in the package.xml file.
>
> Thanks,
> Scott
>
> -----------------------------------------------------------------------
> Scott Cain, Ph. D.                          scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                       216-392-3087
> Ontario Institute for Cancer Research

From MEC at stowers.org Tue Jul 14 14:33:46 2009
From: MEC at stowers.org (Cook, Malcolm)
Date: Tue, 14 Jul 2009 13:33:46 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To: <8FA7F2A099D84685B3B3351916AE5AB0@jonas>
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
 <46A05E0132144D73A0F805953B580B2F@jonas>
 <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz>
 <426C1893A5AD499DB4DBFEEBD257B254@jonas>
 <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com>
 <7BBF64FF-F531-4F7C-8A31-BD04FCE1BF1A@gmail.com>
 <8FA7F2A099D84685B3B3351916AE5AB0@jonas>
Message-ID:

Jonas,

I'm glad to hear it works for you too.

I noted the recipe for success here:
http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast#Status

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri

> -----Original Message-----
> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de]
> Sent: Tuesday, July 14, 2009 12:05 PM
> To: Cook, Malcolm; Chris Fields
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> Hi chris and malcom,
> seems to work here, too. thank u both a lot. great job :)
> best regards, jonas
> ----- Original Message -----
> From: "Chris Fields"
> To: "Cook, Malcolm"
> Cc: "'Jonas Schaer'" ; "'BioPerl List'"
> Sent: Friday, July 10, 2009 8:04 PM
> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>
> > Malcolm,
> >
> > Nice! Go ahead and add the test in; we can look at trying to get
> > CDD_SEARCH working at some point but this is a nice workaround.
> >
> > chris
> >
> > On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:
> >
> >> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
> >> the following.
Is it appropriate to submit it? > >> > >> Jonas, OK, I was a little quick on the gun... but I've got it now. > >> > >> You don't need to change the wrapper. Here is what you need to do: > >> > >> # 1) set your database like this: > >> > >> -database => 'cdsearch/cdd', # c.f. > >> > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html > >> for other cdd database options > >> > >> # 2) add this line before submitting the job: > >> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast'; > >> > >> You're in - No other changes needed. > >> > >> Malcolm Cook > >> Stowers Institute for Medical Research - Kansas City, Missouri > >> > >> > >>> -----Original Message----- > >>> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de] > >>> Sent: Friday, July 10, 2009 4:18 AM > >>> To: BioPerl List; Cook, Malcolm; Chris Fields > >>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >>> > >>> Hi, > >>> I tried to do what Malcom proposed my ($prog = 'rpsblast'; > >>> my $db = > >>> 'CDD';) but that didn't work. > >>> > >>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>> MSG: Value rpsblast for PUT parameter PROGRAM does not match > >>> expression t?blast[ pnx]. Rejecting. > >>> STACK: Error::throw > >>> STACK: Bio::Root::Root::throw > C:/Perl/site/lib/Bio/Root/Root.pm:359 > >>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > >>> C:/Perl/site/lib/Bio/Tools > >>> /Run/RemoteBlast.pm:329 > >>> STACK: Bio::Tools::Run::RemoteBlast::new > >>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl > >>> ast.pm:257 > >>> STACK: blast_a_seq2.pm:14 > >>> ----------------------------------------------------------- > >>> So I should try to "change the wrapper to allow 'rpsblast'", > >>> right? Could You tell me how to do that, please? So sorry but > >>> I have no idea yet...:) If that doesn't work, is there any > >>> other way to run cdd-searches with perl? > >>> Thank you so much! 
> >>> Regards, Jonas > >>> > >>> ----- Original Message ----- > >>> From: "Chris Fields" > >>> To: "Cook, Malcolm" > >>> Cc: "'Jonas Schaer'" ; "'BioPerl List'" > >>> ; "'Smithies, Russell'" > >>> ; > >>> Sent: Thursday, July 09, 2009 9:19 PM > >>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >>> > >>> > >>>> I've scheduled this tentatively for the 1.6 release > series (just not > >>>> sure when yet). It may work as is, but I haven't tried > it out yet > >>>> (and am hazarding to guess it only retrieves the single > main RID at > >>>> the moment). > >>>> > >>>> chris > >>>> > >>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: > >>>> > >>>>> Jonas, > >>>>> > >>>>> If you want to continue to use the bioperl remoteblast > interface, > >>>>> probably what you should do is simply call it twice. > >>>>> > >>>>> Once, as you already know how to do, which will return > without CDD > >>>>> results. > >>>>> > >>>>> Secondly, to get the CDD results, call remoteblast a > second time. > >>>>> This time, using > >>>>> -database => 'CDD' > >>>>> -program => 'rpsblast' > >>>>> > >>>>> However, the wrapper may object to the 'rpsblast' > program. It is > >>>>> not listed in the POD - > >>>>> > >>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R > >>> emoteBlast.pm) > >>>>> If so, my guess is that changing the perl wrapper to allow > >>>>> rpsblast will "just work" (tm). I've cc:ed > >>> cjfields at bioperl.org for > >>>>> his opinion on this. > >>>>> > >>>>> Also, you might want to perform the CDD search first, > especially if > >>>>> you are streaming results to eyeball that might like > something to > >>>>> look at while the second (presumably longer) search is running. 
> >>>>> > >>>>> Cheers, > >>>>> > >>>>> Malcolm Cook > >>>>> Stowers Institute for Medical Research - Kansas City, Missouri > >>>>> > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: bioperl-l-bounces at lists.open-bio.org > >>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > >>>>>> Jonas Schaer > >>>>>> Sent: Thursday, July 09, 2009 5:16 AM > >>>>>> To: BioPerl List; Smithies, Russell > >>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >>>>>> > >>>>>> Hi guys, > >>>>>> Thank you all so much for your help and patience :). Of > >>>>>> course you were right and I finaly found the right > >>>>>> put-parameter to get exactly the same hits as on the homepage. > >>>>>> I do have an other question though :)... > >>>>>> I now want to include a search for conserved domains, but > >>>>>> when I try to use the CDD_SEARCH-parameter > >>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# > >>>>>> sub:CDD_SEARCH) > >>>>>> like the other put-parameters the way chris once told > >>>>>> me(works fine with the other params): > >>>>>> > >>>>>> my %put = ( > >>>>>> WORD_SIZE => 3, > >>>>>> HITLIST_SIZE => 100, > >>>>>> THRESHOLD => 11, > >>>>>> FILTER => 'R', > >>>>>> GENETIC_CODE => 1, > >>>>>> CDD_SEARCH => 'on' > >>>>>> ###I tried it > >>>>>> with 'true' and '1', too. > >>>>>> > >>>>>> ); > >>>>>> > >>>>>> for my $putName (keys %put) { > >>>>>> $factory->submit_parameter($putName,$put{$putName}); > >>>>>> } > >>>>>> > >>>>>> > >>>>>> ...an exception is thrown: > >>>>>> > >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>>>>> MSG: CDD_SEARCH is not a valid PUT parameter. 
> >>>>>> STACK: Error::throw > >>>>>> STACK: Bio::Root::Root::throw > >>> C:/Perl/site/lib/Bio/Root/Root.pm:359 > >>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > >>>>>> C:/Perl/site/lib/Bio/Tools > >>>>>> /Run/RemoteBlast.pm:325 > >>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383 > >>>>>> STACK: main::blast_it firsteval0.8.pm:288 > >>>>>> STACK: firsteval0.8.pm:35 > >>>>>> ----------------------------------------------------------- . > >>>>>> I guess somehow this could be the solution to my problem: > >>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s > >>>>>> ub:RID-for-Simultaneous > >>>>>> , but unfortunately I don't understand what to do. > >>>>>> I'm so sorry to bother you with this but please help me once > >>>>>> more...:) > >>>>>> > >>>>>> Best regards and thanks in advance, > >>>>>> Jonas > >>>>>> > >>>>>> ----- Original Message ----- > >>>>>> From: "Smithies, Russell" > >>>>>> To: "'Jonas Schaer'" > >>>>>> Cc: "'Chris Fields'" ; "'BioPerl List'" > >>>>>> > >>>>>> Sent: Monday, July 06, 2009 10:56 PM > >>>>>> Subject: RE: [Bioperl-l] different results with > >>> remote-blast skript > >>>>>> > >>>>>> > >>>>>> Hi Jonas, > >>>>>> You can't just play with the BLAST parameters and hope > >>> for a "better" > >>>>>> result. > >>>>>> I'd suggest that if you aren't sure what they do, you should > >>>>>> leave them > >>>>>> alone as small changes can make huge differences in the > >>>>>> output - it's quite > >>>>>> possible to miss finding what you're looking for by using > >>> the wrong > >>>>>> parameters. 
> >>>>>> If all else fails, read the blast manual: > >>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall > >>>>>> _all.html > >>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ > >>>>>> Or Read Ian Korfs' excellent book: > >>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp > >>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 > >>>>>> > >>>>>> Don't worry about the integer overflow bug as there's nothing > >>>>>> you can do > >>>>>> about it. If you're interested, Google and Wikipedia are your > >>>>>> friends: > >>>>>> http://en.wikipedia.org/wiki/Integer_overflow > >>>>>> > >>>>>> > >>>>>> Russell > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > >>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m. > >>>>>>> To: BioPerl List; Chris Fields > >>>>>>> Subject: Re: [Bioperl-l] different results with > >>> remote-blast skript > >>>>>>> > >>>>>>> Hi guys, thanks for your answers so far. > >>>>>>> @jason: integer overflow in blast.... sorry, but what do > >>>>>> you mean by that? > >>>>>>> how can I fix it...? > >>>>>>> > >>>>>>> Since I never really changed any parameters I thought them > >>>>>> all to be > >>>>>>> default. > >>>>>>> whatever, I tried to get "better" results with my prog > >>> by changing > >>>>>>> these: > >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>>>>> > >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI > >>>>>> STICS'} = > >>>>>>> '1'; > >>>>>>> with no effect...I guess these were default values anyway. > >>>>>>> > >>>>>>> So please maybe you can tell me all the other parameters I > >>>>>> can change with > >>>>>>> my > >>>>>>> perl-skript AND how to do that? 
> >>>>>>> Unfortunately both, perl and the blast-algorithm are pretty > >>>>>> much new to > >>>>>>> me, > >>>>>>> maybe thats why I just cannot find out how to do that on my > >>>>>> own... :/ > >>>>>>> > >>>>>>> Here is the output I get with my remote-blast skript: > >>>>>>> > >>>>>> ############################################################## > >>>>>> ################ > >>>>>>> ################################### > >>>>>>> Query Name: > >>>>>>> > >>> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > >>>>>>> L > >>>>>>> hit name is ref|XP_001702807.1| > >>>>>>> score is 442 > >>>>>>> BLASTP 2.2.21+ > >>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro > >>>>>> A. Schaffer, > >>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. > >>>>>> Lipman (1997), > >>>>>>> "Gapped > >>>>>>> BLAST and PSI-BLAST: a new generation of protein > database search > >>>>>>> programs", > >>>>>>> Nucleic Acids Res. 25:3389-3402. > >>>>>>> > >>>>>>> > >>>>>>> Reference for composition-based statistics: Alejandro A. > >>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, > >>>>>> John L. Spouge, > >>>>>>> Yuri > >>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), > >>>>>> "Improving the > >>>>>>> accuracy of PSI-BLAST protein database searches with > >>>>>> composition-based > >>>>>>> statistics and other refinements", Nucleic Acids Res. > >>> 29:2994-3005. 
> >>>>>>> > >>>>>>> > >>>>>>> RID: 53STX5G2013 > >>>>>>> > >>>>>>> > >>>>>>> Database: All non-redundant GenBank CDS > >>>>>>> translations+PDB+SwissProt+PIR+PRF excluding > >>> environmental samples > >>>>>>> from WGS projects > >>>>>>> 9,252,587 sequences; 3,169,972,781 total > letters Query= > >>>>>>> > >>>>>> > >>> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > >>>>>>> > >>>>>> > >>> > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAM > >>>>>>> ATGPDPDDEYE > >>>>>>> Length=150 > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> Score > >>>>>>> E > >>>>>>> Sequences producing significant alignments: > >>>>>> (Bits) > >>>>>>> Value > >>>>>>> > >>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>>>>> reinhard... 174 > >>>>>>> 2e-42 > >>>>>>> > >>>>>>> > >>>>>>> ALIGNMENTS > >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>> reinhardtii] > >>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > >>>>>>> Length=303 > >>>>>>> > >>>>>>> Score = 174 bits (442), Expect = 2e-42, Method: > >>>>>> Composition-based > >>>>>>> stats. 
> >>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), > >>>>>> Gaps = 0/150 > >>>>>>> (0%) > >>>>>>> > >>>>>>> Query 1 > >>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds > >>>>>>> 60 > >>>>>>> > >>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>>>>> Sbjct 154 > >>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>>>>> 213 > >>>>>>> > >>>>>>> Query 61 > >>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>>>>> 120 > >>>>>>> > >>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>>>>> Sbjct 214 > >>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>>>>> 273 > >>>>>>> > >>>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > >>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE > >>>>>>> Sbjct 274 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Database: All non-redundant GenBank CDS > >>>>>>> translations+PDB+SwissProt+PIR+PRF > >>>>>>> excluding environmental samples from WGS projects > >>>>>>> Posted date: Jul 5, 2009 4:41 AM > >>>>>>> Number of letters in database: -1,124,994,511 > >>>>>>> Number of sequences in database: 9,252,587 > >>>>>>> > >>>>>>> Lambda K H > >>>>>>> 0.309 0.122 0.345 > >>>>>>> Gapped > >>>>>>> Lambda K H > >>>>>>> 0.267 0.0410 0.140 > >>>>>>> Matrix: BLOSUM62 > >>>>>>> Gap Penalties: Existence: 11, Extension: 1 > >>>>>>> Number of Sequences: 9252587 > >>>>>>> Number of Hits to DB: 60273703 > >>>>>>> Number of extensions: 1448367 > >>>>>>> Number of successful extensions: 2103 > >>>>>>> Number of sequences better than 10: 0 > >>>>>>> Number of HSP's better than 10 without gapping: 0 > >>>>>>> Number of HSP's gapped: 2113 > >>>>>>> Number of HSP's successfully gapped: 0 > >>>>>>> Length of query: 150 > >>>>>>> Length of database: 3169972781 > >>>>>>> Length adjustment: 113 > >>>>>>> Effective length of query: 37 > >>>>>>> Effective length of database: 2124430450 > >>>>>>> Effective search space: 
78603926650 > >>>>>>> Effective search space used: 78603926650 > >>>>>>> T: 11 > >>>>>>> A: 40 > >>>>>>> X1: 16 (7.1 bits) > >>>>>>> X2: 38 (14.6 bits) > >>>>>>> X3: 64 (24.7 bits) > >>>>>>> S1: 42 (20.8 bits) > >>>>>>> S2: 74 (33.1 bits) > >>>>>>> > >>>>>>> > >>>>>> ############################################################## > >>>>>> ################ > >>>>>>> ################################### > >>>>>>> and here are the hits (?) of the blast-algorithm on the > >>>>>> ncbi-homepage with > >>>>>>> the same query of course: > >>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>>>>> reinhard... 300 > >>>>>>> 3e-80 > >>>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA > >>>>>> [Acyrtho... 36.2 > >>>>>>> 1.1 > >>>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 > >>>>>> [Blautia... 35.4 > >>>>>>> 1.8 > >>>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania > >>>>>> brazil... 34.3 > >>>>>>> 4.2 > >>>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 > >>>>>> [Aspergillus n... 33.5 > >>>>>>> 6.0 > >>>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 > >>>>>> [Methyloba... 33.5 > >>>>>>> 7.0 > >>>>>>> > >>>>>> ############################################################## > >>>>>> ################ > >>>>>>> ###################################at > >>>>>>> least the first hit is the same, but even there there is a > >>>>>> different score > >>>>>>> and e-value. 
> >>>>>>> > >>>>>>> thanks so much for any help :) > >>>>>>> regards, jonas > >>>>>>> > >>>>>>> > >>>>>>> ----- Original Message ----- > >>>>>>> From: "Chris Fields" > >>>>>>> To: "Jason Stajich" > >>>>>>> Cc: "Smithies, Russell" > >>>>>> ; "'BioPerl > >>>>>>> List'" ; "'Jonas Schaer'" > >>>>>>> > >>>>>>> Sent: Monday, July 06, 2009 12:51 AM > >>>>>>> Subject: Re: [Bioperl-l] different results with > >>> remote-blast skript > >>>>>>> > >>>>>>> > >>>>>>>> That inspires confidence ;> > >>>>>>>> > >>>>>>>> chris > >>>>>>>> > >>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > >>>>>>>> > >>>>>>>>> integer overflow in blast.... > >>>>>>>>> > >>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >>>>>>>>> > >>>>>>>>>> I'd guess it's a difference in the parameters used. > >>>>>>>>>> Interesting that both have the number of letters > in the db as > >>>>>>>>>> "-1,125,070,205", I assume that's a bug :-) > >>>>>>>>>> > >>>>>>>>>> Stats from your remote_blast: > >>>>>>>>>> > >>>>>>>>>> 'stats' => { > >>>>>>>>>> 'S1' => '42', > >>>>>>>>>> 'S1_bits' => '20.8', > >>>>>>>>>> 'lambda' => '0.309', > >>>>>>>>>> 'entropy' => '0.345', > >>>>>>>>>> 'kappa_gapped' => '0.0410', > >>>>>>>>>> 'T' => '11', > >>>>>>>>>> 'kappa' => '0.122', > >>>>>>>>>> 'X3_bits' => '24.7', > >>>>>>>>>> 'X1' => '16', > >>>>>>>>>> 'lambda_gapped' => '0.267', > >>>>>>>>>> 'X2' => '38', > >>>>>>>>>> 'S2' => '74', > >>>>>>>>>> 'seqs_better_than_cutoff' => '0', > >>>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>>>>>>>>> 'Hits_to_DB' => '60102303', > >>>>>>>>>> 'dbletters' => '-1125070205', > >>>>>>>>>> 'A' => '40', > >>>>>>>>>> 'num_successful_extensions' => '2004', > >>>>>>>>>> 'num_extensions' => '1436892', > >>>>>>>>>> 'X1_bits' => '7.1', > >>>>>>>>>> 'X3' => '64', > >>>>>>>>>> 'entropy_gapped' => '0.140', > >>>>>>>>>> 'dbentries' => '9252258', > >>>>>>>>>> 'X2_bits' => '14.6', > >>>>>>>>>> 'S2_bits' => '33.1' > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Stats from a blast done 
on the NCBI webpage: > >>>>>>>>>> > >>>>>>>>>> Database: All non-redundant GenBank CDS > >>>>>> translations+PDB+SwissProt > >>>>>>>>>> +PIR+PRF > >>>>>>>>>> excluding environmental samples from WGS projects > >>>>>>>>>> Posted date: Jul 4, 2009 4:41 AM > >>>>>>>>>> Number of letters in database: -1,125,070,205 > >>>>>>>>>> Number of sequences in database: 9,252,258 > >>>>>>>>>> > >>>>>>>>>> Lambda K H > >>>>>>>>>> 0.309 0.124 0.340 > >>>>>>>>>> Gapped > >>>>>>>>>> Lambda K H > >>>>>>>>>> 0.267 0.0410 0.140 > >>>>>>>>>> Matrix: BLOSUM62 > >>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 > >>>>>>>>>> Number of Sequences: 9252258 > >>>>>>>>>> Number of Hits to DB: 86493230 > >>>>>>>>>> Number of extensions: 3101413 > >>>>>>>>>> Number of successful extensions: 9001 > >>>>>>>>>> Number of sequences better than 100: 65 > >>>>>>>>>> Number of HSP's better than 100 without gapping: 0 > >>>>>>>>>> Number of HSP's gapped: 9000 > >>>>>>>>>> Number of HSP's successfully gapped: 66 > >>>>>>>>>> Length of query: 150 > >>>>>>>>>> Length of database: 3169897087 > >>>>>>>>>> Length adjustment: 113 > >>>>>>>>>> Effective length of query: 37 > >>>>>>>>>> Effective length of database: 2124391933 > >>>>>>>>>> Effective search space: 78602501521 > >>>>>>>>>> Effective search space used: 78602501521 > >>>>>>>>>> T: 11 > >>>>>>>>>> A: 40 > >>>>>>>>>> X1: 16 (7.1 bits) > >>>>>>>>>> X2: 38 (14.6 bits) > >>>>>>>>>> X3: 64 (24.7 bits) > >>>>>>>>>> S1: 42 (20.8 bits) > >>>>>>>>>> S2: 65 (29.6 bits) > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> -----Original Message----- > >>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l- > >>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > >>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. 
> >>>>>>>>>>> To: BioPerl List
> >>>>>>>>>>> Subject: [Bioperl-l] different results with remote-blast skript
> >>>>>>>>>>>
> >>>>>>>>>>> Hi again :)
> >>>>>>>>>>> please, I only have this little question:
> >>>>>>>>>>> why do I get different results with my remote::blast perl skript
> >>>>>>>>>>> then on the ncbi blast homepage?
> >>>>>>>>>>> I am using blastp, the query is an amino-sequence (different
> >>>>>>>>>>> results with any sequence, differences not only in number of hits
> >>>>>>>>>>> but even in e-values, scores etc...), the database is 'nr'.
> >>>>>>>>>>> PLEASE help me,
> >>>>>>>>>>> thank you in advance,
> >>>>>>>>>>> Jonas
> >>>>>>>>>>>
> >>>>>>>>>>> ps: my skript:
> >>>>>>>>>>>
> >>>>>>>>>>> ################################################################################
> >>>>>>>>>>> use Bio::Seq::SeqFactory;
> >>>>>>>>>>> use Bio::Tools::Run::RemoteBlast;
> >>>>>>>>>>> use strict;
> >>>>>>>>>>> my @blast_report;
> >>>>>>>>>>> my $prog = 'blastp';
> >>>>>>>>>>> my $db = 'nr';
> >>>>>>>>>>> my $e_val= '1e-10';
> >>>>>>>>>>> #my $e_val= '10';
> >>>>>>>>>>> my @params = ( '-prog' => $prog,
> >>>>>>>>>>>                '-data' => $db,
> >>>>>>>>>>>                '-expect' => $e_val,
> >>>>>>>>>>>                '-readmethod' => 'SearchIO' );
> >>>>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1';
> >>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100';
> >>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10';
> >>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
> >>>>>>>>>>>
> >>>>>>>>>>> my $blast_seq ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTAMATGPDPDDEYE';
> >>>>>>>>>>> #$v is just to turn on and off the messages
> >>>>>>>>>>> my $v = 1;
> >>>>>>>>>>> my $seqbuilder = Bio::Seq::SeqFactory->new('-type' => 'Bio::PrimarySeq');
> >>>>>>>>>>> my $seq = $seqbuilder->create(-seq =>$blast_seq, -display_id => "$blast_seq");
> >>>>>>>>>>> my $filename='temp2.out';
> >>>>>>>>>>> my $r = $factory->submit_blast($seq);
> >>>>>>>>>>> print STDERR "waiting..." if( $v > 0 );
> >>>>>>>>>>> while ( my @rids = $factory->each_rid )
> >>>>>>>>>>> {
> >>>>>>>>>>>   foreach my $rid ( @rids )
> >>>>>>>>>>>   {
> >>>>>>>>>>>     my $rc = $factory->retrieve_blast($rid);
> >>>>>>>>>>>     if( !ref($rc) )
> >>>>>>>>>>>     {
> >>>>>>>>>>>       if( $rc < 0 )
> >>>>>>>>>>>       {
> >>>>>>>>>>>         $factory->remove_rid($rid);
> >>>>>>>>>>>       }
> >>>>>>>>>>>       print STDERR "." if ( $v > 0 );
> >>>>>>>>>>>     }
> >>>>>>>>>>>     else
> >>>>>>>>>>>     {
> >>>>>>>>>>>       my $result = $rc->next_result();
> >>>>>>>>>>>       $factory->save_output($filename);
> >>>>>>>>>>>       $factory->remove_rid($rid);
> >>>>>>>>>>>       print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>>>>>>>>       while ( my $hit = $result->next_hit )
> >>>>>>>>>>>       {
> >>>>>>>>>>>         next unless ( $v > 0);
> >>>>>>>>>>>         print "\thit name is ", $hit->name, "\n";
> >>>>>>>>>>>         while( my $hsp = $hit->next_hsp )
> >>>>>>>>>>>         {
> >>>>>>>>>>>           print "\t\tscore is ", $hsp->score, "\n";
> >>>>>>>>>>>         }
> >>>>>>>>>>>       }
> >>>>>>>>>>>     }
> >>>>>>>>>>>   }
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> }
> >>>>>>>>>>> @blast_report = get_file_data ($filename);
> >>>>>>>>>>> return @blast_report;
> >>>>>>>>>>>
> >>>>>>>>>>> ################################################################################

From karthik085 at gmail.com Tue Jul 14 18:23:21 2009
From: karthik085 at gmail.com (Rajasekar Karthik)
Date: Tue, 14 Jul 2009 18:23:21 -0400
Subject: [Bioperl-l] Bioperl Entrez Esearch
Message-ID:

Hi,
I am new to Bioperl. How can I do an Entrez Esearch using Bioperl?

For example, I want to do an exact title search in pubmed
Title: Guidelines for quantitative rt-PCR

Using HTTP Get, I would do something like this
URL:
http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&field=titl&term=Guidelines%20for%20quantitative%20rt-PCR
to get the response XML.

How can I use Bioperl to do the above action?

Thanks.

--
Best Regards,
Rajasekar Karthik
karthik085 at gmail.com

From Russell.Smithies at agresearch.co.nz Tue Jul 14 18:33:55 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 15 Jul 2009 10:33:55 +1200
Subject: [Bioperl-l] Bioperl Entrez Esearch
In-Reply-To:
References:
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz>

You sure can.
Take a look at http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

--Russell
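
Following the cookbook pointer, the title search from the question might look
roughly like this with Bio::DB::EUtilities. This is a minimal sketch and is not
taken from the cookbook: esearch_args is a hypothetical helper, the [TITL]
field tag is assumed to behave like field=titl in the URL above, the -retmax
value is arbitrary, and the -email address is a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for an ESearch restricted to the title field.
# Assumption: the [TITL] tag stands in for field=titl from the HTTP GET URL.
sub esearch_args {
    my ($title) = @_;
    return (
        -eutil  => 'esearch',
        -db     => 'pubmed',
        -term   => qq{"$title"[TITL]},
        -retmax => 20,                    # arbitrary cap on returned IDs
    );
}

# Only contact NCBI when a title is given on the command line, so the
# helper can be loaded and inspected without BioPerl or network access.
if (@ARGV) {
    require Bio::DB::EUtilities;
    my $factory = Bio::DB::EUtilities->new(
        esearch_args("@ARGV"),
        -email => 'you@example.org',      # placeholder address
    );
    print "$_\n" for $factory->get_ids;   # one PubMed ID per line
}
```

Run as, for example, `perl esearch_title.pl Guidelines for quantitative rt-PCR`
(the script name is made up); the IDs can then be passed on to efetch or
esummary.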

From thinkalldifferently at gmail.com Wed Jul 15 08:02:15 2009
From: thinkalldifferently at gmail.com (tkd)
Date: Wed, 15 Jul 2009 14:02:15 +0200
Subject: [Bioperl-l] verify format
Message-ID: <4A5DC547.8030301@gmail.com>

Hi,

I'm a computer scientist and I'm new to the bioinformatics world, and to
bioperl ...

I have a small problem:

I have a file which has to be in FASTA format, but it could be wrong.
So, I need:
- to verify that my file is well-formed FASTA
- to verify whether the content of the FASTA file is protein OR nucleotides

I think there surely exists something in bioperl to do that, but I didn't
find anything.

Thanks for your help

tkd

From florian.mittag at uni-tuebingen.de Wed Jul 15 09:00:21 2009
From: florian.mittag at uni-tuebingen.de (Florian Mittag)
Date: Wed, 15 Jul 2009 15:00:21 +0200
Subject: [Bioperl-l] DB2 driver for BioPerl
In-Reply-To: <200907061808.18651.florian.mittag@uni-tuebingen.de>
References: <200907021128.21239.florian.mittag@uni-tuebingen.de>
	<5EC3CB83-22AD-4C79-9F6C-047ED58B7962@gmx.net>
	<200907061808.18651.florian.mittag@uni-tuebingen.de>
Message-ID: <200907151500.21947.florian.mittag@uni-tuebingen.de>

Hi!

So, I have finally installed the new version 9.7 of DB2, but I am a bit
disappointed. First of all, the TRUNCATE TABLE command only works when it is
the first command in a transaction, so I will skip this one for now.
Secondly, they only changed the way "NULL" is interpreted in a SELECT
statement when there is a column named "NULL". It still is not allowed
to have untyped NULLs in a select statement.

On Monday 06 July 2009 18:08, Florian Mittag wrote:
> On Saturday 04 July 2009 12:39, Hilmar Lapp wrote:
> > I'd be surprised BTW if DB2 were indeed offended by the NULL in the
> > above statement - I'm pretty sure that "SELECT NULL FROM
> > sometable" (or "SELECT 1 FROM sometable") is standard SQL. Are you
> > sure that if you execute such a statement at a SQL prompt it results
> > in an error?
> >
> > Since I can hardly believe that DB2 doesn't support selecting
> > constants (NULL is as much a constant as 1 is), maybe what it wants
> > though is aliasing the column. So if
> >
> > SELECT NULL FROM bioentry;
> >
> > yields an error, does
> >
> > SELECT NULL AS colAlias FROM bioentry;
> >
> > work fine?
>
> Well, it is like this with version 9.5 of DB2 Express-C:
>
> SELECT NULL FROM bioentry;
>
> yields:
> SQL0206N "NULL" is not valid in the context where it is used.
> SQLSTATE=42703 SQLCODE=-206
>
> But if I do:
>
> SELECT cast(NULL AS VARCHAR(255)) FROM bioentry;
>
> [...]
>
> It ran fine without the NULL column, but that isn't necessarily a sign of
> correctness. My problem was that (as stated above) the old version of DB2
> requires you to cast the NULL value to a data type, which I wasn't able to
> determine from the code. With the new version, it should work, so I'll have
> to rerun my tests again and see if the problem is still there.

You convinced me that the NULL column is supposed to be there, so I found
another workaround around line 1273 in BaseDriver.pm:

    if((! $attr) || (! $entitymap->{$tbl}) ||
       $dont_select_attrs->{$tbl .".". $attr}) {
        #push(@attrs, "NULL");
        push(@attrs, "cast(NULL as VARCHAR(255))");
    } else {

Since I don't know how to determine the datatype of the column that is set to
NULL, I simply chose VARCHAR and tested it. And it worked!
(BTW: The column set to NULL is named "rank" in the case below.)

But as before, it gives me a bunch of warnings. The other messages between the
warnings are debug messages I inserted myself and they show which SQL commands
are to be executed. The following output is only the end of a nearly endless
stream of warnings similar to those.

-------------------- WARNING ---------------------
MSG: GOC:mah exists in the dblink of _default
---------------------------------------------------
SELECT UK Bio::DB::BioSQL::TermAdaptoridentifier
identifier : GO:0034679
SELECT term.term_id, term.identifier, term.name, term.definition,
term.is_obsolete, cast(NULL as VARCHAR(255)), term.ontology_id FROM term WHERE
identifier = ?

-------------------- WARNING ---------------------
MSG: PMID:12297042 exists in the dblink of _default
---------------------------------------------------
SELECT UK Bio::DB::BioSQL::TermAdaptoridentifier
identifier : GO:0070505
SELECT term.term_id, term.identifier, term.name, term.definition,
term.is_obsolete, cast(NULL as VARCHAR(255)), term.ontology_id FROM term WHERE
identifier = ?

-------------------- WARNING ---------------------
MSG: GOC:mah exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: GOC:rph exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:12930826 exists in the dblink of _default
---------------------------------------------------

-------------------- WARNING ---------------------
MSG: PMID:15012271 exists in the dblink of _default
---------------------------------------------------

Should I be worried?

For now, I'll continue with my actual work on our program, so it is possible
that some problems will turn up later.

Regards,
Florian

From bartomas at gmail.com Wed Jul 15 09:40:28 2009
From: bartomas at gmail.com (bar tomas)
Date: Wed, 15 Jul 2009 14:40:28 +0100
Subject: [Bioperl-l] Finding all bioactive substances through EUtils or PUG_SOAP
Message-ID:

Hi,

Could you give me a hint on how to query Entrez databases to find all
substances that have been found to be bioactive through a bioassay
screening?
I've looked at the wsdl file for querying pubchem
(http://pubchem.ncbi.nlm.nih.gov/pug_soap/pug_soap.cgi?wsdl) but have found
no service for retrieving substance ids.
Is there a way to do this with EUtils or an HTTP query with parameters?
Thanks a lot.
Tomas B.

From SMarkel at accelrys.com Wed Jul 15 11:23:33 2009
From: SMarkel at accelrys.com (Scott Markel)
Date: Wed, 15 Jul 2009 11:23:33 -0400
Subject: [Bioperl-l] verify format
In-Reply-To: <4A5DC547.8030301@gmail.com>
References: <4A5DC547.8030301@gmail.com>
Message-ID: <1F1240778FB0AF46B4E5A72C44D2C74732F496E6@exch1-hi.accelrys.net>

tkd,

See http://www.bioperl.org/wiki/HOWTO:Beginners#Retrieving_a_sequence_from_a_file
for examples of how to read files using BioPerl. FASTA is one of the
supported formats.
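
For the first half of the question (is the file well-formed FASTA at all?), a
structural check can be run before the file is handed to a parser. The sketch
below is plain Perl and not part of BioPerl: fasta_problems is a hypothetical
helper, and the set of accepted sequence characters (letters, '*' and '-') is
an assumption that may need tightening for a given pipeline.

```perl
use strict;
use warnings;

# Return a list of problems found in FASTA text (an empty list means the
# text looks structurally fine). Line numbers are 1-based.
sub fasta_problems {
    my ($text) = @_;
    my @problems;
    my ($seen_header, $residues, $n) = (0, 0, 0);

    for my $line (split /\r?\n/, $text) {
        $n++;
        next if $line =~ /^\s*$/;                  # tolerate blank lines
        if ($line =~ /^>/) {
            push @problems, "line $n: header without an identifier"
                unless $line =~ /^>\S/;
            push @problems, "line $n: previous record has no sequence"
                if $seen_header && !$residues;
            ($seen_header, $residues) = (1, 0);    # start a new record
        }
        elsif (!$seen_header) {
            push @problems, "line $n: sequence data before the first '>' header";
        }
        else {
            push @problems, "line $n: unexpected character in sequence"
                if $line =~ /[^A-Za-z*\-\s]/;
            $residues += ($line =~ tr/A-Za-z//);   # count letters only
        }
    }
    push @problems, "no records found" unless $seen_header;
    push @problems, "last record has no sequence" if $seen_header && !$residues;
    return @problems;
}
```

A file that passes this check can then be read record by record, and each
record's type checked separately, since one file may mix sequence types.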

Once you have a sequence object, e.g., after a line like

    $seq_obj = $seqio_obj->next_seq;

you can use

    $seq_obj->alphabet

to see if the sequence is "dna", "rna", or "protein". Note that this check
is per sequence. There's no guarantee that all sequences in a FASTA file
are of a single type.

See http://www.bioperl.org/wiki/HOWTO:Beginners#The_Sequence_Object for
other methods that can be called on a sequence object.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (SciTegic R&D)             mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Co-chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From cjfields at illinois.edu Wed Jul 15 14:11:47 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 15 Jul 2009 13:11:47 -0500
Subject: [Bioperl-l] Finding all bioactive substances through EUtils or PUG_SOAP
In-Reply-To:
References:
Message-ID:

Tomas B.,

Just so you know, this isn't really a bioperl-specific question, though
you may be able to use bioperl tools to do what you want. I'll run
with the latter assumption.

I'm not too familiar with pubchem and related, but using einfo you can
get relevant information on the databases. The available databases are:

pcassay
pccompound
pcsubstance

Lots of filters available, summarized here:

http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_index

My guess is you would have to query the database pcassay with esearch
and the appropriate filter to find the IDs active for a particular
assay, then use elink from pcassay to either pccompound or pcsubstance
to get what you want.

Using Bio::DB::EUtilities (below) this worked to get the compound IDs,
you could probably get more information using esummary (not sure if
you can retrieve all info on them).

chris

==========================================
#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::DB::EUtilities;

my $term = '"Luciferase Profiling Assay"';

my $factory = Bio::DB::EUtilities->new(-eutil   => 'esearch',
                                       -db      => 'pcassay',
                                       -term    => $term,
                                       -verbose => 1,
                                       -retmax  => 100);

my @ids = $factory->get_ids;

# note the linkname, can use same for pcsubstance
$factory->reset_parameters(-eutil    => 'elink',
                           -db       => 'pccompound',
                           -dbfrom   => 'pcassay',
                           -linkname => 'pcassay_pccompound_active',
                           -id       => \@ids);

$factory->print_all;
==========================================

chris

From cjfields at illinois.edu Wed Jul 15 14:37:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 15 Jul 2009 13:37:45 -0500
Subject: [Bioperl-l] Finding all bioactive substances through EUtils or PUG_SOAP
In-Reply-To:
References:
Message-ID:

Posted a modified example of this to the EUtilities cookbook:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#How_do_I_find_all_active_compounds.2Fsubstances_for_a_particular_bioassay.3F

chris

From asjo at koldfront.dk Wed Jul 15 16:25:13 2009
From: asjo at koldfront.dk (Adam Sjøgren)
Date: Wed, 15 Jul 2009 22:25:13 +0200
Subject: [Bioperl-l] Tiny pod patch for Bio::SeqIO::scf - write_seq()
Message-ID: <87y6qpk82e.fsf@topper.koldfront.dk>

  Hi.

Here is a tiny pod patch for Bio::SeqIO::scf that makes the
documentation match the code a little closer.

Hope it's correct :-)

  Best regards,

    Adam

---
 Bio/SeqIO/scf.pm |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Bio/SeqIO/scf.pm b/Bio/SeqIO/scf.pm
index e7576ea..8c8f77c 100644
--- a/Bio/SeqIO/scf.pm
+++ b/Bio/SeqIO/scf.pm
@@ -602,7 +602,7 @@ sub _dump_traces_incoming_deprecated_use_the_sequencetrace_object {
  c) peak indices
  d) traces
  - You _can_ write an scf with just a and b by passing in a
- SequenceWithQuality object- false traces will be synthesized
+ Bio::Seq::Quality object- false traces will be synthesized
  for you.

 =cut
--
1.6.0.4

From asjo at koldfront.dk Wed Jul 15 16:37:37 2009
From: asjo at koldfront.dk (Adam Sjøgren)
Date: Wed, 15 Jul 2009 22:37:37 +0200
Subject: [Bioperl-l] Tiny pod patch for Bio::SeqIO::scf - write_seq()
In-Reply-To: <87y6qpk82e.fsf@topper.koldfront.dk> ("Adam Sjøgren"'s message of
	"Wed, 15 Jul 2009 22:25:13 +0200")
References: <87y6qpk82e.fsf@topper.koldfront.dk>
Message-ID: <87tz1dk7hq.fsf@topper.koldfront.dk>

Here is another tiny update.

  Best regards,

    Adam

---
 Bio/SeqIO/scf.pm |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Bio/SeqIO/scf.pm b/Bio/SeqIO/scf.pm
index 8c8f77c..efc543c 100644
--- a/Bio/SeqIO/scf.pm
+++ b/Bio/SeqIO/scf.pm
@@ -579,7 +579,7 @@ sub _dump_traces_incoming_deprecated_use_the_sequencetrace_object {

 =head2 write_seq

- Title : write_seq(-Quality => $swq, )
+ Title : write_seq(-target => $swq, )
  Usage : $obj->write_seq( -target => $swq,
                           -version => 2,
--
1.6.0.4

From budd at embl-heidelberg.de Sat Jul 11 03:52:41 2009
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Sat, 11 Jul 2009 09:52:41 +0200 (CEST)
Subject: [Bioperl-l] Bootstrap, root, reroot...
In-Reply-To: <200907091150.20729.tristan.lefebure@gmail.com>
Message-ID:

On Thu, 9 Jul 2009, Tristan Lefebure wrote:
> ...
>
> My understanding here is that the problem is linked to the
> well-known difficulty to differentiate node from branch
> labels in newick trees. Bootstrap scores are branch
> attributes not node attributes, but since Bio::TreeI has no
> branch/edge/bipartition object they are attached to a node,
> and in fact reflects the bootstrap score of the ancestral
> branch leading to that node. Troubles naturally come when
> you are dealing with an unrooted tree or reroot a tree: a
> child can become an ancestor, and, if the bootstrap scores
> is not moved from the old child to the new child, it will
> end up attached at the wrong place (i.e. wrong node).
>
> I see several fix to that:
>
> 1- incorporate Bank's fix into the root() method. I.e. if
> there is bootstrap score, after re-rooting, the one on the
> old to new ancestor path, should be moved to the right node.
>
> 2- Modify the way trees are stored in bioperl to incorporate
> branch/edge/bipartition object, and move the bootstrap
> scores to them. That won't be easy and will break many
> things...

Just wanted to add that, from my point of view, it would be great if it
were possible to add edge/branch objects as part of the bioperl trees.
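
The re-rooting fix Tristan describes (option 1 in the quoted text) can be
sketched without any BioPerl objects. This is a minimal sketch under stated
assumptions: the tree is a plain child => parent hash, each support value is
stored under the child end of its branch, and reroot and both hashes are
hypothetical names that are not part of Bio::TreeI.

```perl
use strict;
use warnings;

# Re-root a tree stored as child => parent, moving each support value so
# that it stays with the same branch. %$support is keyed by the node at
# the child end of a branch. Returns new (parent, support) hash refs.
sub reroot {
    my ($parent, $support, $new_root) = @_;

    # Walk from the new root up to the old root.
    my @path = ($new_root);
    push @path, $parent->{ $path[-1] } while defined $parent->{ $path[-1] };

    my %new_parent  = %$parent;
    my %new_support = %$support;
    delete $new_parent{$new_root};     # the new root has no parent ...
    delete $new_support{$new_root};    # ... and no ancestral branch

    # Reverse every branch on the path. The old parent becomes the child,
    # so the support value moves from the old child to the old parent.
    for my $i (0 .. $#path - 1) {
        my ($old_child, $old_parent) = @path[ $i, $i + 1 ];
        $new_parent{$old_parent} = $old_child;
        if (defined $support->{$old_child}) {
            $new_support{$old_parent} = $support->{$old_child};
        }
        else {
            delete $new_support{$old_parent};
        }
    }
    return (\%new_parent, \%new_support);
}
```

With the tree ((A,B)X,C)R and a support of 95 on the branch between X and R,
re-rooting at A leaves that value on R, which is now the child end of the same
branch. Storing the value on an edge object instead would make this
bookkeeping unnecessary.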
Perhaps so that the previous set of methods still behaved as before, but
with some new methods on the trees such as get_splits() or get_branches()
along with associated split/branch/etc. objects...?

Being a bioperl user but keeping well away from coding objects in perl,
the lack of such methods/objects meant I chose, in the end, not to use a
bioperl solution to work with my trees (going instead for a homemade
clunky python solution, where I'm happier with the OO stuff)

No idea how difficult/problematic this would be to implement, though -
just my 2 cents worth...

> What do you think?
>
> --Tristan

--
----------------------------------------------------------------------
Aidan Budd                                    tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany
http://www.embl-heidelberg.de/~budd/
http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html

From pg4 at sanger.ac.uk Tue Jul 14 18:59:51 2009
From: pg4 at sanger.ac.uk (Pablo Marin-Garcia)
Date: Tue, 14 Jul 2009 23:59:51 +0100 (BST)
Subject: [Bioperl-l] Classifying SNPs
In-Reply-To:
References:
Message-ID:

Hello Abhishek

Ensembl has a module for calculating SNP consequences in a transcript.

The script that they use to create their consequences is located in:

ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl

The important bit is to convert your SNP coordinates and the
variation_allele into a ConsequenceType object

  $consequence_type =
    Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles);

and pass this and a transcript to the type_variation method exported by
Bio::EnsEMBL::Utils::TranscriptAlleles

  $consequences = type_variation($tr, "", $consequence_type);

in the module

  ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm

The other important bit in this script is that the functional_genomics
consequences are now calculated in this script instead of in
type_variation().

The only drawback is that it returns only the Ensembl classes of
consequences, but you can extend that later if you need more specific
consequences (I have done that in the past for different projects).

This Ensembl approach will save you a lot of problems with the mapping from
gene to protein and with multiple SNPs in a codon.

If you have experience with Ensembl then it is easy to follow the code. If
not, you can always ask for help on the ensembl-dev mailing list
(ensembl-dev at ebi.ac.uk)

If you want to read the code without checking out the whole API:

http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup
http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log

hope this helps

- Pablo

Abhishek Pratap gmail.com> writes:

>
> Hi Jason
>
> Thanks for a detailed insight. I would definitely go the ensembl way first
> and try to see if it can do exactly what we want.
>
> In case it does/'nt I will report back on this same thread. I think having
> something like this in the Bioperl will help the NGS community. Lot of
> people are predicting SNPs from NGS.oops(next generation sequencing ) data
> and looking for ways to better annotate/classify their predictions.
>
> Thanks guys .. It is a pleasure to interact with you all. Just overwhelmed
> to see the responses.
>
> best,
> -Abhi
>
> On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich bioperl.org> wrote:
>
> > Ensembl would be best place to go if you are working with human SNPs but
> > for those who aren't so data lucky...
> >
> > Aspects of this also relates to the dn/dS code in the
> > Bio::Align::DNAStatistics -- thought it does the classification and
> > comparison all at once so you'd have to dig code out.
> >
> > And the mcdonald_kreitman code in Bio::PopGen::Statistics which computes a
> > synonymous or nonsynonymous via lookup table that is stored in
> > Bio::MolEvol::CodonModel which compares the edit path which is encoded as
> > the two codons concatenated together -- i.ee
> >
> > use Bio::MolEvol::CodonModel;
> > my $codon_path = Bio::MolEvol::CodonModel->codon_path;
> > my ($ns, $syn) = $codon_path->{'AATAAC'};
> > print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n";
> >
> >
> > It all kind of depends on how you have the data organized, if it is just
> > SNPs and you are trying to figure out if they are syn or non-syn then you
> > kind of need a good database to do this since you'll have to know what gene
> > they are in, CDS of the gene, etc. It is possible to do with something as
> > basic as GFF3 for your genome and the SNP locations and
> > Bio::DB::SeqFeature::Store. While I can think of a way to code it up from
> > those bare-bones - maybe you should report back if you can just use the
> > Ensembl classification of the SNPs?
> >
> > -jason
> >
> >
> > On Jul 13, 2009, at 9:33 AM, Chris Fields wrote:
> >
> > Bio::Coordinate might help with coordinate conversion. However, much of
> >> this sounds very Ensembl-like. Have you looked at the Ensembl perl API? It
> >> can do #1 (coordinate conversion), and I'm sure something could be written
> >> up to do the second.
> >>
> >> chris
> >>
> >> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:
> >>
> >> Thanks Abhi-- I had a feeling there was more (or "less") to it-- this
> >>> would be a nice feature to have, don't think it exists. Will think about
> >>> it-- cheers
> >>> ----- Original Message ----- From: "Abhishek Pratap" <abhishek.vit gmail.com>
> >>> To: "Mark A. Jensen" fortinbras.us>
> >>> Cc: lists.open-bio.org>
> >>> Sent: Monday, July 13, 2009 11:10 AM
> >>> Subject: Re: [Bioperl-l] Classifying SNPs
> >>>
> >>>
> >>> Dear Mark
> >>>> Sorry I was not able to reply earlier. Many Thanks for your detailed
> >>>> explanation. However this is not exactly what I am looking for. May be
> >>>> my initial mail was not well articulated or I am not able to infer your
> >>>> reply fully. My bad.
> >>>>
> >>>> Well as an input what we have is the just the genomic coordinates for
> >>>> SNP's predicted by Illumina propriety software CASAVA. What we would
> >>>> like to do is to further classify these predicted SNP's . If they fall
> >>>> into Coding region then whether they are synonymous/non-syn SNPs.
> >>>>
> >>>> So I guess something which translates
> >>>> 1. SNP genomic coordinate into mRNA offset
> >>>> 2. Then identify the ORF and target codon and check whether the SNP
> >>>> substitution will be syn/non-syn.
> >>>>
> >>>> Thanks,
> >>>> -Abhi
> >>>>
> >>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen fortinbras.us>
> >>>> wrote:
> >>>>
> >>>> Hey Abhishek-
> >>>>> You might root around in Bio::PopGen. Here's a script to get stuff from
> >>>>> raw fasta data--see comments within.
> >>>>> cheers
> >>>>> Mark
> >>>>>
> >>>>> use Bio::AlignIO;
> >>>>> use Bio::PopGen::Utilities;
> >>>>>
> >>>>> $file = "your_raw_file.fas";
> >>>>>
> >>>>>
> >>>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;
> >>>>> # get the alignment into a Bio::PopGen::Population format, with codons
> >>>>> # as the marker sites
> >>>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
> >>>>>                                                     -site_model=>'cod');
> >>>>> # here are your variable codons...
> >>>>> my @cdnpos = $pop->get_marker_names;
> >>>>> # here are your individuals represented in the alignment
> >>>>> my @inds = $pop->get_Individuals;
> >>>>> # which have names like "Codon-3-9", "Codon-4-12", etc
> >>>>> foreach my $cdn (@cdnpos) {
> >>>>>   # calculate the unique codons represented at this codon position
> >>>>>   my (%ucdns, @ucdns);
> >>>>>   @genos = $pop->get_Genotypes(-marker=>$cdn);
> >>>>>   $ucdns{$_->get_Alleles}++ for @genos;
> >>>>>   @ucdns = sort keys %ucdns;
> >>>>>   #
> >>>>>   # here, use translate or something faster to identify syn/non-syn
> >>>>>   # check out code in Bio::Align::DNAStatistics for various methods
> >>>>>
> >>>>> }
> >>>>> # relate back to individuals with this
> >>>>> foreach my $ind (@inds) {
> >>>>>   print "Individual ".$ind->unique_id."\n";
> >>>>>   print "Site\tAllele\n";
> >>>>>   foreach my $cdn (@cdnpos) {
> >>>>>     print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
> >>>>>   }
> >>>>> }
> >>>>>
> >>>>>
> >>>>> 1;
> >>>>>
> >>>>> ----- Original Message ----- From: "Abhishek Pratap" <abhishek.vit gmail.com>
> >>>>> To: lists.open-bio.org>
> >>>>> Sent: Wednesday, July 08, 2009 10:24 AM
> >>>>> Subject: [Bioperl-l] Classifying SNPs
> >>>>>
> >>>>>
> >>>>>
> >>>>> Hi All
> >>>>>
> >>>>> This might seem to be an old track question. However I was not able to
> >>>>> find a good answer in the many diff mailing list archives.
> >>>>>
> >>>>> For all our SNP predictions we would like to know whether they are
> >>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the
> >>>>> position on the gene where amino acid is getting changed and to what
> >>>>> ...Also info about indels will help.
> >>>>>
> >>>>> I am not sure if something like this already exists. If not even some
> >>>>> pointers on how to move forward will help.
> >>>>>
> >>>>> Thanks,
> >>>>> -Abhi
> >
> > --
> > Jason Stajich
> > jason bioperl.org

=====================================================================
Pablo Marin-Garcia, PhD              \\//         (Argiope bruennichi
                                \/\/`(||>O:'\/\/   with stabilimentum)
                                     //\\
Sanger Institute              | PostDoc / Computer Biologist
Wellcome Trust Genome Campus  | team : 128/108 (Human Genetics)
Hinxton, Cambridge CB10 1HH   | room : N333
United Kingdom                | email: pablo.marin at sanger.ac.uk
====================================================================

--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.
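
Step 2 of the plan quoted in this thread (find the target codon and decide
whether the substitution is synonymous) can be sketched in plain Perl once the
SNP has been mapped to a position in the CDS. This is only an illustration
under stated assumptions: classify_snp is a hypothetical helper, the CDS is
complete and in frame, the position is 1-based, and the standard genetic code
(NCBI table 1) applies. It does not handle several SNPs in one codon, which is
one of the cases the Ensembl code is said to take care of.

```perl
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
my %AA_FOR;
{
    my @bases = qw(T C A G);
    my @aas   = split //,
        'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $AA_FOR{"$b1$b2$b3"} = shift @aas;
            }
        }
    }
}

# Classify a single-base substitution at a 1-based CDS position.
# Returns (class, ref_codon, alt_codon, ref_aa, alt_aa, aa_position).
sub classify_snp {
    my ($cds, $cds_pos, $alt_base) = @_;
    my $offset      = ($cds_pos - 1) % 3;       # position within the codon
    my $codon_start = $cds_pos - 1 - $offset;   # 0-based start of the codon
    my $ref_codon   = uc substr($cds, $codon_start, 3);
    my $alt_codon   = $ref_codon;
    substr($alt_codon, $offset, 1) = uc $alt_base;

    my ($ref_aa, $alt_aa) = @AA_FOR{ $ref_codon, $alt_codon };
    my $class =
          !defined $ref_aa || !defined $alt_aa ? 'unknown'   # e.g. N in codon
        : $ref_aa eq $alt_aa                   ? 'synonymous'
        :                                        'non-synonymous';
    return ($class, $ref_codon, $alt_codon, $ref_aa, $alt_aa,
            int(($cds_pos - 1) / 3) + 1);
}

# The AAT -> AAC change used as an example earlier in the thread:
my ($class, $ref, $alt, $ref_aa, $alt_aa, $aa_pos) =
    classify_snp('ATGAATTAA', 6, 'C');
print "$ref -> $alt: $class ($ref_aa$aa_pos$alt_aa)\n";
# prints "AAT -> AAC: synonymous (N2N)"
```

Bio::Tools::CodonTable could replace the hand-built table if other genetic
codes are needed.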
From pg4 at sanger.ac.uk Tue Jul 14 19:57:52 2009
From: pg4 at sanger.ac.uk (Pablo Marin-Garcia)
Date: Wed, 15 Jul 2009 00:57:52 +0100 (BST)
Subject: [Bioperl-l] Classifying SNPs
In-Reply-To:
References:
Message-ID:

Fixing a typo and explaining a gotcha.

On Tue, 14 Jul 2009, Pablo Marin-Garcia wrote:

>
> Hello Abhishek
>
> Ensembl has a module for calculating SNP consequences in a transcript.
>
> The script that they use to create their consequences is located in:
>
> ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl
>
> The important bit is to convert your SNP coordinates and the
> variation_allele into a ConsequenceType object
>
> $consequence_type =
> Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles);
>

Fixing the typo (instead of $chr it should be a $variation_id):

Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$var_id,$var_start,$var_end,$var_strand,\@alleles);

Warning:

The transcript_id and the variation_id are not important if you are not
building an Ensembl database. BUT the gotcha is that the start and end of
the variation must be relative to the same slice start as the transcript
used in the next step (type_variation). Be careful, because depending on
how you select the gene or slice to retrieve your transcripts, your
transcript start and end will be either chromosome coordinates or a
start/end relative to the slice start. You should work with chromosome
positions for both the variations and the transcripts (where start/end ==
seq_region_start/seq_region_end) to avoid problems.
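
A minimal pure-Perl sketch of the coordinate gotcha above, assuming a forward-strand slice and 1-based inclusive coordinates. The helper name slice_to_chr and the numbers are made up for illustration; nothing here calls the Ensembl API.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of the Ensembl API): convert a position that
# is relative to a forward-strand slice into a chromosome position.
# Both coordinate systems are assumed to be 1-based and inclusive.
sub slice_to_chr {
    my ($slice_start, $rel_pos) = @_;
    return $slice_start + $rel_pos - 1;
}

# Made-up example: a slice starting at chromosome position 1_000_001 and a
# SNP at position 150 within that slice.
my $slice_start = 1_000_001;
my ($snp_rel_start, $snp_rel_end) = (150, 150);

my $snp_chr_start = slice_to_chr($slice_start, $snp_rel_start);
my $snp_chr_end   = slice_to_chr($slice_start, $snp_rel_end);

print "SNP in chromosome coordinates: $snp_chr_start-$snp_chr_end\n";
```

If the transcript was fetched in chromosome coordinates, the variation start/end would need this kind of conversion before both are compared; if both already use seq_region_start/seq_region_end, no conversion is needed.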
> and pass this and a transcript to the type_variation
> Bio::EnsEMBL::Utils::TranscriptAlleles exported method
>
> $consequences = type_variation($tr, $gene, $consequence_type);
>

The $gene is optional.

> in the module
>
> ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm
>
> The other important bit in this script is that the functional_genomics
> consequences are now calculated in this script instead of in
> type_variation()
>
> The only drawback is that it returns only the Ensembl classes of
> consequences, but you can extend that later if you need more specific
> consequences (I have done that in the past for different projects).
>
> This Ensembl approach will save you a lot of problems with the mapping from
> gene to protein and with multiple SNPs in a codon.
>
> If you have experience with Ensembl then it is easy to follow the code. If
> not you can always ask for help in the ensembl-dev mailing list
> (ensembl-dev at ebi.ac.uk)
>
>
> If you want to read the code without checking out the whole API:
>
>
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup
> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log
>
>
> hope this helps
>
>
> - Pablo
>

From rhubley at systemsbiology.org Tue Jul 14 15:42:37 2009
From: rhubley at systemsbiology.org (Robert Hubley)
Date: Tue, 14 Jul 2009 12:42:37 -0700
Subject: [Bioperl-l] RepeatMasker
In-Reply-To: <3E4C0788-8B44-4408-BB26-FA9F48133948@illinois.edu>
References: <4A535469.4060603@systemsbiology.org> <3E4C0788-8B44-4408-BB26-FA9F48133948@illinois.edu>
Message-ID: <4A5CDFAD.6040809@systemsbiology.org>

Hi Chris,

Just got back from a conference.

So the original problem reported ( "Exception ( no such file or
directory )" ) was caused by:

"Cause: mysequence.masked file (which holds the masked sequence) not found
when no repeats are found in the supplied sequence. This file is not created
anymore when no repeats are found."

This is still the case, with RepeatMasker at least. We no longer create a
*.masked file when no repeats are located in the input file. This can be
checked by looking at the *.out file, which in these cases will contain only
one line:

"There were no repetitive sequences detected in ......"

In comment #1 of this bug report a user writes:

"This may be related to a bug with RepeatMasker and is known to be an issue
with BioPerl:

http://www.bioperl.org/wiki/Release_1.5.2#Notes

The RepeatMasker authors have been notified about this and hopefully will
have a fix available soon. The question now is, should RepeatMasker.pm check
for no returned results?"

This refers to a bug in the option processing in RM Version 3.1.6
( '-noint' etc. ). This was fixed in 3.1.7 and has not returned. So this
should no longer be impacting any version of bioperl.

-R

Chris Fields wrote:
> Robert,
>
> Sorry about that last post, thought you were reporting a problem not
> inquiring about one.
>
> Here's what we have:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2138
>
> Not sure but from the last few reports this is still a problem with
> RepeatMasker and bioperl. I'll try looking into it from our end.
> > chris > > On Jul 7, 2009, at 8:58 AM, Robert Hubley wrote: > >> This list email as forwarded to us by a colleague. I fixed this bug >> awhile back and I just double checked 3.2.8 and don't see any >> problems with the options -noint or -lcambig. Could someone help us >> determine how this is breaking bio-perl? >> >> Thanks, >> >> -Robert >> >> |We have told the guys at RepeatMasker that RM-3.1.6 have a problem >> |causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug 2138). >> |And as of today, they are now at 3.2.8, and the problem is not fixed. >> |And I don't want my project to be stalled-- any tips for a workaround? >> || >> ||Hi, >> || >> ||Perhaps you already know about this, but in RepeatMasker 3.1.6 >> -noint ||cannot be used because of error 'Unknown option: >> noint-species'. >> ||This is caused by line 1131 having no space after the "-noint". >> ||Likewise, -lcambig on 1128 would probably suffer a similar problem. >> || >> ||Will this be fixed in the next version, and how often do you >> release new ||versions? >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From karthik085 at gmail.com Wed Jul 15 17:34:21 2009 From: karthik085 at gmail.com (Rajasekar Karthik) Date: Wed, 15 Jul 2009 17:34:21 -0400 Subject: [Bioperl-l] Bioperl Entrez Esearch In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz> References: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz> Message-ID: that helps - thanks!!! On Tue, Jul 14, 2009 at 6:33 PM, Smithies, Russell < Russell.Smithies at agresearch.co.nz> wrote: > You sure can. 
> Take a look at http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > > --Russell > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Rajasekar Karthik > > Sent: Wednesday, 15 July 2009 10:23 a.m. > > To: bioperl-l at lists.open-bio.org > > Subject: [Bioperl-l] Bioperl Entrez Esearch > > > > Hi, > > I an new to Bioperl. How can I do an Entrez Esearch using Bioperl? > > > > For example, I want to do an exact title search in pubmed > > Title: Guidelines for quantitative rt-PCR > > > > Using HTTP Get, I would do something like this > > URL: > > > http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&field=titl&te > > rm=Guidelines%20for%20quantitative%20rt-PCR > > to get the response XML. > > > > How can I use Bioperl to do the above action? > > > > Thanks. > > > > -- > > Best Regards, > > Rajasekar Karthik > > karthik085 at gmail.com > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. 
> =======================================================================
>

--
Best Regards,
Rajasekar Karthik
karthik085 at gmail.com

From cjfields at illinois.edu Wed Jul 15 17:54:29 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 15 Jul 2009 16:54:29 -0500
Subject: [Bioperl-l] Tiny pod patch for Bio::SeqIO::scf - write_seq()
In-Reply-To: <87tz1dk7hq.fsf@topper.koldfront.dk>
References: <87y6qpk82e.fsf@topper.koldfront.dk> <87tz1dk7hq.fsf@topper.koldfront.dk>
Message-ID: <7FA9D01D-FD3F-4B61-8B35-7A3364C2A3B5@illinois.edu>

Okay, added in r15859 to core. thanks!

chris

On Jul 15, 2009, at 3:37 PM, Adam Sjøgren wrote:

> Here is another tiny update.
>
>
> Best regards,
>
> Adam
>
> ---
> Bio/SeqIO/scf.pm | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/Bio/SeqIO/scf.pm b/Bio/SeqIO/scf.pm
> index 8c8f77c..efc543c 100644
> --- a/Bio/SeqIO/scf.pm
> +++ b/Bio/SeqIO/scf.pm
> @@ -579,7 +579,7 @@ sub _dump_traces_incoming_deprecated_use_the_sequencetrace_object {
>
>  =head2 write_seq
>
> - Title : write_seq(-Quality => $swq, )
> + Title : write_seq(-target => $swq, )
>   Usage : $obj->write_seq(
>             -target => $swq,
>             -version => 2,
> --
> 1.6.0.4
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Jul 15 18:11:44 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 15 Jul 2009 17:11:44 -0500
Subject: [Bioperl-l] Tree refactor? was Re: Bootstrap, root, reroot...
In-Reply-To:
References:
Message-ID: <2EF555C5-B709-46F4-BB59-90D1DECCEBCA@illinois.edu>

On Jul 11, 2009, at 2:52 AM, Aidan Budd wrote:

> On Thu, 9 Jul 2009, Tristan Lefebure wrote:
>
>> ...
>>
>> My understanding here is that the problem is linked to the
>> well-known difficulty to differentiate node from branch
>> labels in newick trees.
Bootstrap scores are branch >> attributes not node attributes, but since Bio::TreeI has no >> branch/edge/bipartition object they are attached to a node, >> and in fact reflects the bootstrap score of the ancestral >> branch leading to that node. Troubles naturally come when >> you are dealing with an unrooted tree or reroot a tree: a >> child can become an ancestor, and, if the bootstrap scores >> is not moved from the old child to the new child, it will >> end up attached at the wrong place (i.e. wrong node). >> >> I see several fix to that: >> >> 1- incorporate Bank's fix into the root() method. I.e. if >> there is bootstrap score, after re-rooting, the one on the >> old to new ancestor path, should be moved to the right node. >> >> 2- Modify the way trees are stored in bioperl to incorporate >> branch/edge/bipartition object, and move the bootstrap >> scores to them. That won't be easy and will break many >> things... > > Just wanted to add that, from my point of view, it would be great if > it > were possible to add edge/branch objects as part of the bioperl trees. > Perhaps so that the previous set of methods still behaved as before, > but > with some new methods on the trees such as get_splits() or > get_branches() along with associated split/branch/etc. objects...? > > Being a bioperl user but keeping well away from coding objects in > perl, > the lack of such methods/objects meant I chose, in the end, not to > use a > bioperl solution to work with my trees (going instead for a homemade > clunky python solution, where I'm happier with the OO stuff) > > No idea how difficult/problematic this would be to implement, though - > just my 2 cents worth... Mark and Tristan have both indicated some of the problems that lie here, so it's worth discussing this on the list. I think the best way to approach this is to suggest what a proposed refactoring of Bio::Tree-related classes would look like (i.e. 
how it would be done, what is expected of said classes interface-wise, etc), and then come up with data and cases where the current classes don't DTRT, preferably as tests we can incorporate into the test suite. Note this will affect some of the key core classes we now have (seq classes specifically, so memory management will be important). I'll have my hands full with a few other refactors, so anyone out there willing to take the reins on this one? chris From cjfields at illinois.edu Wed Jul 15 18:12:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 15 Jul 2009 17:12:32 -0500 Subject: [Bioperl-l] RepeatMasker In-Reply-To: <4A5CDFAD.6040809@systemsbiology.org> References: <4A535469.4060603@systemsbiology.org> <3E4C0788-8B44-4408-BB26-FA9F48133948@illinois.edu> <4A5CDFAD.6040809@systemsbiology.org> Message-ID: <69B434A9-ACA8-45C6-8499-55598B536EE5@illinois.edu> Thanks for the update Robert! chris On Jul 14, 2009, at 2:42 PM, Robert Hubley wrote: > Hi Chris, > > Just got back from a conference. > So the original problem reported ( "Exception ( no such file or > directory )" ) caused by: > > "Cause: mysequence.masked file (which holds the masked sequence) not > found when > no repeats are found in the supplied sequence. This file is not > created anymore > when no repeats are found." > > This is still the case, with RepeatMasker at least. We no longer > create a *.masked file when no repeats are located in the input > file. This can be checked by looking at the *.out file which in > these cases will contain only one line: > > "There were no repetitive sequences detected in ......" > > > In comment #1 of this bugreport a user writes: > > "This may be related to a bug with RepeatMasker and is known to be > an issue with > BioPerl: > > http://www.bioperl.org/wiki/Release_1.5.2#Notes > > The RepeatMasker authors have been notified about this and > hopefully will have > a fix available soon. 
The question now is, should RepeatMasker.pm > check for no > returned results?" > > This refers to a bug in the option processing in RM Version 3.1.6 > ( '-noint' etc. ). This was fixed > in 3.1.7 and has not returned. So this should no longer be > impacting any version of bioperl. > > -R > > > > > Chris Fields wrote: >> Robert, >> >> Sorry about that last post, thought you were reporting a problem >> not inquiring about one. >> >> Here's what we have: >> >> http://bugzilla.open-bio.org/show_bug.cgi?id=2138 >> >> Not sure but from the last few reports this is still a problem with >> RepeatMasker and bioperl. I'll try looking into it from our end. >> >> chris >> >> On Jul 7, 2009, at 8:58 AM, Robert Hubley wrote: >> >>> This list email as forwarded to us by a colleague. I fixed this >>> bug awhile back and I just double checked 3.2.8 and don't see any >>> problems with the options -noint or -lcambig. Could someone help >>> us determine how this is breaking bio-perl? >>> >>> Thanks, >>> >>> -Robert >>> >>> |We have told the guys at RepeatMasker that RM-3.1.6 have a problem >>> |causing Bio::Tools::RepeatMasker to crash in November 2006 (Bug >>> 2138). >>> |And as of today, they are now at 3.2.8, and the problem is not >>> fixed. >>> |And I don't want my project to be stalled-- any tips for a >>> workaround? >>> || >>> ||Hi, >>> || >>> ||Perhaps you already know about this, but in RepeatMasker 3.1.6 - >>> noint ||cannot be used because of error 'Unknown option: noint- >>> species'. >>> ||This is caused by line 1131 having no space after the "-noint". >>> ||Likewise, -lcambig on 1128 would probably suffer a similar >>> problem. >>> || >>> ||Will this be fixed in the next version, and how often do you >>> release new ||versions? 
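
The check Robert describes in the message quoted above (no *.masked file, plus a one-line notice in the *.out file) could be sketched in plain Perl roughly as follows. The helper name repeatmasker_outcome is made up, and the assumption that the output files are named by appending ".masked" and ".out" to the input file name is mine; this is not what Bio::Tools::Run::RepeatMasker actually does.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a finished RepeatMasker run from its files.
# ASSUMPTION: outputs are named "<input>.masked" and "<input>.out".
# Returns 'masked'     if the masked sequence file exists,
#         'no_repeats' if the .out file carries the "no repeats" notice,
#         'unknown'    otherwise (e.g. the run failed or has not happened).
sub repeatmasker_outcome {
    my ($input) = @_;

    return 'masked' if -e "$input.masked";

    if (open my $fh, '<', "$input.out") {
        while (my $line = <$fh>) {
            if ($line =~ /There were no repetitive sequences detected/) {
                close $fh;
                return 'no_repeats';
            }
        }
        close $fh;
    }
    return 'unknown';
}

# Example call with a made-up file name.
print "outcome: ", repeatmasker_outcome('mysequence.fa'), "\n";
```

A wrapper could treat 'no_repeats' as "return the input sequence unchanged" rather than throwing the "no such file or directory" exception.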
>>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From tristan.lefebure at gmail.com Wed Jul 15 18:21:52 2009 From: tristan.lefebure at gmail.com (Tristan Lefebure) Date: Wed, 15 Jul 2009 18:21:52 -0400 Subject: [Bioperl-l] Bootstrap, root, reroot... In-Reply-To: References: Message-ID: <200907151821.52558.tristan.lefebure@gmail.com> This would indeed be very good. It seems that the current implementation might always be suboptimal. Patch after patch, I am still harassing Mark with bugs... (see bug #2877 !). I don't have the necessary OO skills to help building the core object, but I would be happy to help with some functions (like get_bipartition_table(), is_edge_conflicting(), ...). Not sure this could be helpful, but I find the ape phylo object of R to be both very simple and quite flexible. See http://ape.mpl.ird.fr/misc/FormatTreeR_28July2008.pdf for details. --Tristan On Wednesday 15 July 2009 17:54:25 Mark A. Jensen wrote: > After fooling around with bug 2877, > I'm thinking seriously about starting the edge-branch > project in bioperl-dev, building out an implementation > off the interfaces B:T:TreeI and B:T:NodeI. It would > give the opp'y for some code rationalization too. > > Anyone out there have a problem with that? > cheers MAJ > ----- Original Message ----- > From: "Aidan Budd" > To: "BioPerl List" > Sent: Saturday, July 11, 2009 3:52 AM > Subject: Re: [Bioperl-l] Bootstrap, root, reroot... > > > On Thu, 9 Jul 2009, Tristan Lefebure wrote: > >> ... > >> > >> My understanding here is that the problem is linked to > >> the well-known difficulty to differentiate node from > >> branch labels in newick trees. 
Bootstrap scores are > >> branch attributes not node attributes, but since > >> Bio::TreeI has no branch/edge/bipartition object they > >> are attached to a node, and in fact reflects the > >> bootstrap score of the ancestral branch leading to > >> that node. Troubles naturally come when you are > >> dealing with an unrooted tree or reroot a tree: a > >> child can become an ancestor, and, if the bootstrap > >> scores is not moved from the old child to the new > >> child, it will end up attached at the wrong place > >> (i.e. wrong node). > >> > >> I see several fix to that: > >> > >> 1- incorporate Bank's fix into the root() method. I.e. > >> if there is bootstrap score, after re-rooting, the one > >> on the old to new ancestor path, should be moved to > >> the right node. > >> > >> 2- Modify the way trees are stored in bioperl to > >> incorporate branch/edge/bipartition object, and move > >> the bootstrap scores to them. That won't be easy and > >> will break many things... > > > > Just wanted to add that, from my point of view, it > > would be great if it were possible to add edge/branch > > objects as part of the bioperl trees. Perhaps so that > > the previous set of methods still behaved as before, > > but with some new methods on the trees such as > > get_splits() or get_branches() along with associated > > split/branch/etc. objects...? > > > > Being a bioperl user but keeping well away from coding > > objects in perl, the lack of such methods/objects meant > > I chose, in the end, not to use a bioperl solution to > > work with my trees (going instead for a homemade clunky > > python solution, where I'm happier with the OO stuff) > > > > No idea how difficult/problematic this would be to > > implement, though - just my 2 cents worth... > > > >> What do you think? 
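
To make the problem and fix 1 above concrete, here is a toy sketch in plain Perl. It does not use Bio::Tree at all: the tree is a made-up child => parent hash, the support values are invented, and reroot() is a hypothetical helper, not necessarily how the fix mentioned above works. Each support value is stored on a node but really describes the branch above that node, so when the branches on the path from the new root to the old root flip direction, the values have to move to the new child end.

```perl
use strict;
use warnings;

# Toy rooted tree (made-up data): child => parent. 'root' has no entry.
my %parent = (
    A => 'root', B  => 'root', N1 => 'root',
    C => 'N1',   N2 => 'N1',
    D => 'N2',   E  => 'N2',
);

# Bootstrap support of the branch ABOVE each internal node.
my %support = ( N1 => 90, N2 => 75 );

# Hypothetical helper: make $new_root the root and move each support value
# so that it stays with the branch it was computed for.
sub reroot {
    my ($parent, $support, $new_root) = @_;

    # Path from the new root up to the old root.
    my @path = ($new_root);
    while (defined(my $p = $parent->{ $path[-1] })) {
        push @path, $p;
    }

    # $branch[$i] is the value of the branch between $path[$i] and
    # $path[$i+1]; read them all before anything is overwritten.
    my @branch = map { $support->{ $path[$_] } } 0 .. $#path - 1;

    for my $i (0 .. $#path - 1) {
        $parent->{ $path[$i + 1] }  = $path[$i];     # flip the branch
        $support->{ $path[$i + 1] } = $branch[$i];   # value follows it
    }
    delete $parent->{$new_root};    # the new root has no branch above it
    delete $support->{$new_root};
    return;
}

reroot(\%parent, \%support, 'N2');

for my $node (sort keys %support) {
    print "$node: branch to $parent{$node} has support $support{$node}\n";
}
```

Without the move, N1 would keep its old value of 90 although the branch above it is now the N1/N2 branch, whose support is 75. Storing the values on edge objects would avoid the bookkeeping altogether.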
> >> > >> --Tristan > >> > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > ------------------------------------------------------- > >--------------- Aidan Budd > > tel:+49 (0)6221 387 8530 EMBL - European Molecular > > Biology Laboratory fax:+49 (0)6221 387 8517 > > Meyerhofstr. 1, 69117 Heidelberg, Germany > > > > http://www.embl-heidelberg.de/~budd/ > > http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Jul 15 17:54:25 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 15 Jul 2009 17:54:25 -0400 Subject: [Bioperl-l] Bootstrap, root, reroot... In-Reply-To: References: Message-ID: After fooling around with bug 2877, I'm thinking seriously about starting the edge-branch project in bioperl-dev, building out an implementation off the interfaces B:T:TreeI and B:T:NodeI. It would give the opp'y for some code rationalization too. Anyone out there have a problem with that? cheers MAJ ----- Original Message ----- From: "Aidan Budd" To: "BioPerl List" Sent: Saturday, July 11, 2009 3:52 AM Subject: Re: [Bioperl-l] Bootstrap, root, reroot... > On Thu, 9 Jul 2009, Tristan Lefebure wrote: > >> ... >> >> My understanding here is that the problem is linked to the >> well-known difficulty to differentiate node from branch >> labels in newick trees. Bootstrap scores are branch >> attributes not node attributes, but since Bio::TreeI has no >> branch/edge/bipartition object they are attached to a node, >> and in fact reflects the bootstrap score of the ancestral >> branch leading to that node. 
Troubles naturally come when >> you are dealing with an unrooted tree or reroot a tree: a >> child can become an ancestor, and, if the bootstrap scores >> is not moved from the old child to the new child, it will >> end up attached at the wrong place (i.e. wrong node). >> >> I see several fix to that: >> >> 1- incorporate Bank's fix into the root() method. I.e. if >> there is bootstrap score, after re-rooting, the one on the >> old to new ancestor path, should be moved to the right node. >> >> 2- Modify the way trees are stored in bioperl to incorporate >> branch/edge/bipartition object, and move the bootstrap >> scores to them. That won't be easy and will break many >> things... > > Just wanted to add that, from my point of view, it would be great if it > were possible to add edge/branch objects as part of the bioperl trees. > Perhaps so that the previous set of methods still behaved as before, but > with some new methods on the trees such as get_splits() or > get_branches() along with associated split/branch/etc. objects...? > > Being a bioperl user but keeping well away from coding objects in perl, > the lack of such methods/objects meant I chose, in the end, not to use a > bioperl solution to work with my trees (going instead for a homemade > clunky python solution, where I'm happier with the OO stuff) > > No idea how difficult/problematic this would be to implement, though - > just my 2 cents worth... > >> What do you think? >> >> --Tristan >> >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > ---------------------------------------------------------------------- > Aidan Budd tel:+49 (0)6221 387 8530 > EMBL - European Molecular Biology Laboratory fax:+49 (0)6221 387 8517 > Meyerhofstr. 
1, 69117 Heidelberg, Germany > > http://www.embl-heidelberg.de/~budd/ > http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From bill at genenformics.com Wed Jul 15 19:09:51 2009 From: bill at genenformics.com (bill at genenformics.com) Date: Wed, 15 Jul 2009 16:09:51 -0700 Subject: [Bioperl-l] verify format In-Reply-To: <4A5DC547.8030301@gmail.com> References: <4A5DC547.8030301@gmail.com> Message-ID: <1dabd42d9c4f79c4758380a4194f455e.squirrel@mail.dreamhost.com> Hopefully this is helpful! http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/util/format_guess.cpp#L838 > Hi, > > I'm computer scientist and I'm new in the bioinformatics world, and > bioperl ... > > I have a small problem : > > I have a file, which has to be in FASTA format, but it could be wrong. > So, I need : > - to verify if my file is well-written in fasta format > - to verify is the content of the fasta file is protein OR nucloetides > > I think there surely exists someting in bioperl to that but I didn't > find anything. > > Thank for your help > > tkd > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj.fortinbras at gmail.com Wed Jul 15 19:25:43 2009 From: maj.fortinbras at gmail.com (Mark Jensen) Date: Wed, 15 Jul 2009 19:25:43 -0400 Subject: [Bioperl-l] Tree refactor? was Re: Bootstrap, root, reroot... In-Reply-To: References: Message-ID: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> Hey all- I'm willing to spearhead this. I was thinking of a bioperl-dev module that concretizes the B:T:TreeI and B:T:NodeI interfaces, to get started. 
I don't think we have to spring a edge-based tree object on the unsuspecting masses all at once, but write a Tree class that has all the capabilities defined by the interface, and then some extras, as Tristan suggests in his post. We can squeak it back into the core with some node-based->edge-based conversion utilities, and possibly put the current implementation into a deprecation cycle (but I'm thinking that's a bit drastic). MAJ I have some unformed thoughts about this.... > ----- Original Message ----- From: "Chris Fields" > To: "Aidan Budd" > Cc: "BioPerl List" > Sent: Wednesday, July 15, 2009 6:11 PM > Subject: [Bioperl-l] Tree refactor? was Re: Bootstrap, root, reroot... > > > >> On Jul 11, 2009, at 2:52 AM, Aidan Budd wrote: >> >> On Thu, 9 Jul 2009, Tristan Lefebure wrote: >>> >>> ... >>>> >>>> My understanding here is that the problem is linked to the >>>> well-known difficulty to differentiate node from branch >>>> labels in newick trees. Bootstrap scores are branch >>>> attributes not node attributes, but since Bio::TreeI has no >>>> branch/edge/bipartition object they are attached to a node, >>>> and in fact reflects the bootstrap score of the ancestral >>>> branch leading to that node. Troubles naturally come when >>>> you are dealing with an unrooted tree or reroot a tree: a >>>> child can become an ancestor, and, if the bootstrap scores >>>> is not moved from the old child to the new child, it will >>>> end up attached at the wrong place (i.e. wrong node). >>>> >>>> I see several fix to that: >>>> >>>> 1- incorporate Bank's fix into the root() method. I.e. if >>>> there is bootstrap score, after re-rooting, the one on the >>>> old to new ancestor path, should be moved to the right node. >>>> >>>> 2- Modify the way trees are stored in bioperl to incorporate >>>> branch/edge/bipartition object, and move the bootstrap >>>> scores to them. That won't be easy and will break many >>>> things... 
>>>> >>> >>> Just wanted to add that, from my point of view, it would be great if it >>> were possible to add edge/branch objects as part of the bioperl trees. >>> Perhaps so that the previous set of methods still behaved as before, but >>> with some new methods on the trees such as get_splits() or >>> get_branches() along with associated split/branch/etc. objects...? >>> >>> Being a bioperl user but keeping well away from coding objects in perl, >>> the lack of such methods/objects meant I chose, in the end, not to use a >>> bioperl solution to work with my trees (going instead for a homemade >>> clunky python solution, where I'm happier with the OO stuff) >>> >>> No idea how difficult/problematic this would be to implement, though - >>> just my 2 cents worth... >>> >> >> Mark and Tristan have both indicated some of the problems that lie here, >> so it's worth discussing this on the list. I think the best way to >> approach this is to suggest what a proposed refactoring of >> Bio::Tree-related classes would look like (i.e. how it would be done, what >> is expected of said classes interface-wise, etc), and then come up with >> data and cases where the current classes don't DTRT, preferably as tests we >> can incorporate into the test suite. >> >> Note this will affect some of the key core classes we now have (seq >> classes specifically, so memory management will be important). I'll have >> my hands full with a few other refactors, so anyone out there willing to >> take the reins on this one? >> >> chris >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> From rmb32 at cornell.edu Wed Jul 15 21:05:43 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 15 Jul 2009 18:05:43 -0700 Subject: [Bioperl-l] Tree refactor? was Re: Bootstrap, root, reroot... 
In-Reply-To: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com>
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com>
Message-ID: <4A5E7CE7.4040908@cornell.edu>

Rather than putting this in bioperl-dev, perhaps this would be a nice
opportunity to make a new distribution called something standard like
"Bio-Tree", with a standard directory structure, and a sane number of
modules in it.

I hadn't planned to start an actual battle about this yet, but I would
just like to get it out there that the current 'huge monolithic
distributions' model of BioPerl is completely insane.

Talking to people about BioPerl at YAPC::NA last month, I saw that this is
quite puzzling to the wider Perl community. I was going to say it was a
laughingstock, but that's not actually the case. They are mostly puzzled
and strongly suspect that it's not right. Well, the diplomatic ones do,
anyway. Matt Trout (of DBIx::Class and Catalyst fame) would probably yell
and curse about it in a very entertaining way.

If things were in smaller distributions, making and testing releases would
be a lot easier, because the pieces of code you're testing and releasing
are smaller, and the dependencies among the pieces are characterized,
codified, and enforced via the Build.PL files of each distribution.

There, I said it.

But aside from my inflammatory remarks above, this sort of thing need not
happen all at once. The "Bio-Tree" distribution is a nice example of how
things could be extracted from or begun outside the bioperl-*
distributions, with the bioperl-* monolithic balls of mud getting smaller
as things are moved from them into their own distributions. This needs to
be done carefully, though, so things like this should probably be done
only with major releases, and with lots of notifications and release notes
and things like that.

OK, now that I've said "this sucks and needs to change", I now go on to
volunteer to do work to make it happen.
I will take and execute orders from you core developers saying things like "make a branch, take this list of modules, copy them into a new distribution, move their tests over, and write a Build.PL with the correct dependencies", and later "merge the moved_thing_somewhere" branch into the some_other_branch and test it". I bet somebody whose name rhymes with "Jay Hannah" would probably do grunt work to help with this also, but of course he would have to volunteer first. ;-) I also volunteer to help teach others how to do this, but they have to figure out how to use IRC. Oh, and I also volunteer to keep writing inflammatory emails. Rob From cjfields at illinois.edu Wed Jul 15 23:29:13 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 15 Jul 2009 22:29:13 -0500 Subject: [Bioperl-l] Bootstrap, root, reroot... In-Reply-To: References: Message-ID: <7474C443-028A-409C-B50D-98A83852E554@illinois.edu> I don't; code away. With the edge/branch objects, I'm wondering whether those can be created lazily (only when needed); it might lighten up the tree a bit. Also, don't forget to look at Rutger's Bio::Phylo project, though I think his modules use inside-out objects (might not be easy to work into core unless they are wrapped). chris On Jul 15, 2009, at 4:54 PM, Mark A. Jensen wrote: > After fooling around with bug 2877, I'm thinking seriously about > starting the edge-branch > project in bioperl-dev, building out an implementation off the > interfaces B:T:TreeI and B:T:NodeI. It would > give the opp'y for some code rationalization too. > Anyone out there have a problem with that? > cheers MAJ > ----- Original Message ----- From: "Aidan Budd" > > To: "BioPerl List" > Sent: Saturday, July 11, 2009 3:52 AM > Subject: Re: [Bioperl-l] Bootstrap, root, reroot... > > >> On Thu, 9 Jul 2009, Tristan Lefebure wrote: >>> ... >>> My understanding here is that the problem is linked to the well- >>> known difficulty to differentiate node from branch labels in >>> newick trees. 
Bootstrap scores are branch attributes not node >>> attributes, but since Bio::TreeI has no branch/edge/bipartition >>> object they are attached to a node, and in fact reflects the >>> bootstrap score of the ancestral branch leading to that node. >>> Troubles naturally come when you are dealing with an unrooted tree >>> or reroot a tree: a child can become an ancestor, and, if the >>> bootstrap scores is not moved from the old child to the new child, >>> it will end up attached at the wrong place (i.e. wrong node). I >>> see several fix to that: >>> 1- incorporate Bank's fix into the root() method. I.e. if there is >>> bootstrap score, after re-rooting, the one on the old to new >>> ancestor path, should be moved to the right node. 2- Modify the >>> way trees are stored in bioperl to incorporate branch/edge/ >>> bipartition object, and move the bootstrap scores to them. That >>> won't be easy and will break many things... >> Just wanted to add that, from my point of view, it would be great >> if it >> were possible to add edge/branch objects as part of the bioperl >> trees. Perhaps so that the previous set of methods still behaved >> as before, but >> with some new methods on the trees such as get_splits() or >> get_branches() along with associated split/branch/etc. objects...? >> Being a bioperl user but keeping well away from coding objects in >> perl, >> the lack of such methods/objects meant I chose, in the end, not to >> use a >> bioperl solution to work with my trees (going instead for a homemade >> clunky python solution, where I'm happier with the OO stuff) >> No idea how difficult/problematic this would be to implement, >> though - just my 2 cents worth... >>> What do you think? 
>>> --Tristan >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> ---------------------------------------------------------------------- >> Aidan Budd tel:+49 (0)6221 387 >> 8530 >> EMBL - European Molecular Biology Laboratory fax:+49 (0)6221 387 >> 8517 >> Meyerhofstr. 1, 69117 Heidelberg, Germany >> http://www.embl-heidelberg.de/~budd/ >> http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Wed Jul 15 23:43:03 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 15 Jul 2009 23:43:03 -0400 Subject: [Bioperl-l] Bootstrap, root, reroot... In-Reply-To: <7474C443-028A-409C-B50D-98A83852E554@illinois.edu> References: <7474C443-028A-409C-B50D-98A83852E554@illinois.edu> Message-ID: <60B160081EEB4C0DB3C2D1DE756E1A7C@NewLife> To examine rvos' Bio::Phylo was my plan exactly-- Lazy edges can be done I believe, although it seems that one of the main reasons to have edges is to attach lengths, bootstrap values, etc to them; so we may ultimately avoid edge creation only when we construct tree topology only--prob rare in practice? ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "Aidan Budd" ; "BioPerl List" ; Sent: Wednesday, July 15, 2009 11:29 PM Subject: Re: [Bioperl-l] Bootstrap, root, reroot... >I don't; code away. > > With the edge/branch objects, I'm wondering whether those can be created > lazily (only when needed); it might lighten up the tree a bit. 
Also, don't > forget to look at Rutger's Bio::Phylo project, though I think his modules use > inside-out objects (might not be easy to work into core unless they are > wrapped). > > chris > > On Jul 15, 2009, at 4:54 PM, Mark A. Jensen wrote: > >> After fooling around with bug 2877, I'm thinking seriously about starting >> the edge-branch >> project in bioperl-dev, building out an implementation off the interfaces >> B:T:TreeI and B:T:NodeI. It would >> give the opp'y for some code rationalization too. >> Anyone out there have a problem with that? >> cheers MAJ >> ----- Original Message ----- From: "Aidan Budd" > > >> To: "BioPerl List" >> Sent: Saturday, July 11, 2009 3:52 AM >> Subject: Re: [Bioperl-l] Bootstrap, root, reroot... >> >> >>> On Thu, 9 Jul 2009, Tristan Lefebure wrote: >>>> ... >>>> My understanding here is that the problem is linked to the well- known >>>> difficulty to differentiate node from branch labels in newick trees. >>>> Bootstrap scores are branch attributes not node attributes, but since >>>> Bio::TreeI has no branch/edge/bipartition object they are attached to a >>>> node, and in fact reflects the bootstrap score of the ancestral branch >>>> leading to that node. Troubles naturally come when you are dealing with an >>>> unrooted tree or reroot a tree: a child can become an ancestor, and, if >>>> the bootstrap scores is not moved from the old child to the new child, it >>>> will end up attached at the wrong place (i.e. wrong node). I see several >>>> fix to that: >>>> 1- incorporate Bank's fix into the root() method. I.e. if there is >>>> bootstrap score, after re-rooting, the one on the old to new ancestor >>>> path, should be moved to the right node. 2- Modify the way trees are >>>> stored in bioperl to incorporate branch/edge/ bipartition object, and move >>>> the bootstrap scores to them. That won't be easy and will break many >>>> things... 
>>> Just wanted to add that, from my point of view, it would be great if it >>> were possible to add edge/branch objects as part of the bioperl trees. >>> Perhaps so that the previous set of methods still behaved as before, but >>> with some new methods on the trees such as get_splits() or get_branches() >>> along with associated split/branch/etc. objects...? >>> Being a bioperl user but keeping well away from coding objects in perl, >>> the lack of such methods/objects meant I chose, in the end, not to use a >>> bioperl solution to work with my trees (going instead for a homemade >>> clunky python solution, where I'm happier with the OO stuff) >>> No idea how difficult/problematic this would be to implement, though - just >>> my 2 cents worth... >>>> What do you think? >>>> --Tristan >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> -- >>> ---------------------------------------------------------------------- >>> Aidan Budd tel:+49 (0)6221 387 8530 >>> EMBL - European Molecular Biology Laboratory fax:+49 (0)6221 387 8517 >>> Meyerhofstr. 1, 69117 Heidelberg, Germany >>> http://www.embl-heidelberg.de/~budd/ >>> http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at illinois.edu Thu Jul 16 00:55:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 15 Jul 2009 23:55:29 -0500 Subject: [Bioperl-l] Bootstrap, root, reroot... 
In-Reply-To: <60B160081EEB4C0DB3C2D1DE756E1A7C@NewLife> References: <7474C443-028A-409C-B50D-98A83852E554@illinois.edu> <60B160081EEB4C0DB3C2D1DE756E1A7C@NewLife> Message-ID: Well, what I was thinking that Nodes sharing an edge could share the same hash ref containing the edge information, with said hash ref magically becoming an object when absolutely needed (keys being named parameters, values being args for constructor). Just a thought. chris On Jul 15, 2009, at 10:43 PM, Mark A. Jensen wrote: > To examine rvos' Bio::Phylo was my plan exactly-- Lazy edges can be > done I believe, although it seems > that one of the main reasons to have edges is to > attach lengths, bootstrap values, etc to them; so we may > ultimately avoid edge creation only when we construct tree > topology only--prob rare in practice? > > ----- Original Message ----- From: "Chris Fields" > > To: "Mark A. Jensen" > Cc: "Aidan Budd" ; "BioPerl List" >; > Sent: Wednesday, July 15, 2009 11:29 PM > Subject: Re: [Bioperl-l] Bootstrap, root, reroot... > > >> I don't; code away. >> >> With the edge/branch objects, I'm wondering whether those can be >> created lazily (only when needed); it might lighten up the tree a >> bit. Also, don't forget to look at Rutger's Bio::Phylo project, >> though I think his modules use inside-out objects (might not be >> easy to work into core unless they are wrapped). >> >> chris >> >> On Jul 15, 2009, at 4:54 PM, Mark A. Jensen wrote: >> >>> After fooling around with bug 2877, I'm thinking seriously about >>> starting the edge-branch >>> project in bioperl-dev, building out an implementation off the >>> interfaces B:T:TreeI and B:T:NodeI. It would >>> give the opp'y for some code rationalization too. >>> Anyone out there have a problem with that? >>> cheers MAJ >>> ----- Original Message ----- From: "Aidan Budd" >> > >>> To: "BioPerl List" >>> Sent: Saturday, July 11, 2009 3:52 AM >>> Subject: Re: [Bioperl-l] Bootstrap, root, reroot... 
>>> >>> >>>> On Thu, 9 Jul 2009, Tristan Lefebure wrote: >>>>> ... >>>>> My understanding here is that the problem is linked to the well- >>>>> known difficulty to differentiate node from branch labels in >>>>> newick trees. Bootstrap scores are branch attributes not node >>>>> attributes, but since Bio::TreeI has no branch/edge/bipartition >>>>> object they are attached to a node, and in fact reflects the >>>>> bootstrap score of the ancestral branch leading to that node. >>>>> Troubles naturally come when you are dealing with an unrooted >>>>> tree or reroot a tree: a child can become an ancestor, and, if >>>>> the bootstrap scores is not moved from the old child to the new >>>>> child, it will end up attached at the wrong place (i.e. wrong >>>>> node). I see several fix to that: >>>>> 1- incorporate Bank's fix into the root() method. I.e. if there >>>>> is bootstrap score, after re-rooting, the one on the old to new >>>>> ancestor path, should be moved to the right node. 2- Modify the >>>>> way trees are stored in bioperl to incorporate branch/edge/ >>>>> bipartition object, and move the bootstrap scores to them. That >>>>> won't be easy and will break many things... >>>> Just wanted to add that, from my point of view, it would be >>>> great if it >>>> were possible to add edge/branch objects as part of the bioperl >>>> trees. Perhaps so that the previous set of methods still behaved >>>> as before, but >>>> with some new methods on the trees such as get_splits() or >>>> get_branches() along with associated split/branch/etc. objects...? >>>> Being a bioperl user but keeping well away from coding objects >>>> in perl, >>>> the lack of such methods/objects meant I chose, in the end, not >>>> to use a >>>> bioperl solution to work with my trees (going instead for a >>>> homemade >>>> clunky python solution, where I'm happier with the OO stuff) >>>> No idea how difficult/problematic this would be to implement, >>>> though - just my 2 cents worth... 
>>>>> What do you think? >>>>> --Tristan >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> -- >>>> ---------------------------------------------------------------------- >>>> Aidan Budd tel:+49 (0)6221 >>>> 387 8530 >>>> EMBL - European Molecular Biology Laboratory fax:+49 (0)6221 >>>> 387 8517 >>>> Meyerhofstr. 1, 69117 Heidelberg, Germany >>>> http://www.embl-heidelberg.de/~budd/ >>>> http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From cjfields at illinois.edu Thu Jul 16 01:44:40 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 16 Jul 2009 00:44:40 -0500 Subject: [Bioperl-l] Tree refactor? was Re: Bootstrap, root, reroot... In-Reply-To: <4A5E7CE7.4040908@cornell.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> Message-ID: (Warning, longish response, I'll probably add to my blog at some point) Robert, I agree with you, but we've had this discussion before. Repeatedly, actually. I have a page in the wiki dedicated to it, having first raised the issue myself: http://www.bioperl.org/wiki/Proposed_core_modules_changes It also has a mention on the Core page: http://www.bioperl.org/wiki/Core_package In fact, I was planning on writing up a blog entry this week on this very thing to get the ball rolling again, but it probably should go here first anyway... 
First: the problem we have consistently run into is exactly how to deliver a core set of modules in a way that works both for users and for release managers. We have settled on one of the original proposals noted above, starting by roughly splitting up the current 'core' into something based on similar functions and level of development/support. bioperl-dev was part of that, for instance, and represents code we consider 'developer-only' or experimental. The true 'core' would be a base set of modules with minimal additional dependencies (see below for how nebulous this becomes). If you haven't already noticed, prior to 1.6.0 Bio::Graphics basically started the process (it's now an independent release on CPAN) and we already have a bioperl-dev. As you've noted we can't split everything up right from the beginning, but we have started down that path. Second: Bio::Tree seems independent of the other modules, but that's highly misleading. Bio::Species and Bio::Taxon (and thus anything that will use said objects, like Bio::Seqs, which are very much core) are now completely dependent on Bio::Tree code. Both are-a Bio::Tree::NodeI, I believe since 1.5.2. If we split that code off, it creates a circular dependency (Bio::Species, in core, requires Bio::Tree in the bio-tree package, which in turn requires Bio::Root::Root in the core package). Bio::Tree code also has a Bio::DB::Taxonomy, thus expanding core a little bit more. Similarly, Bio::Ontology classes are used by several key modules (Bio::Annotation::OntologyTerm comes to mind, but also Bio::Annotation::OntologyTerm). In other words, there are some parts of core that can't easily be split off w/o repercussions (and thus probably won't be). Third, and the largest issue in my opinion: what really constitutes 'core', not just to us but to current bioperl users. To me, the idea of a true 'core' is the bare essentials (Seq, Features, Annotations, and some basic IO modules, the most common interfaces).
Should 'core' include SearchIO, or AlignIO? Remote and/or local DB functionality? Bio::Tools? All of those are feasibly independent sets of modules, and I would definitely support those being in their own subdistributions, which would make it easier to fix bugs and release updates, but I may be in the minority as they are extremely popular, and many users still consider them 'core'. We need a workaround for that. Finally (a wrap-up of bits and pieces): a) how are the various bio-* packages to be maintained? Would there be several release pumpkins, one for each release? b) How do we sort out versioning? For instance, would bio-foo have a separate version (like Bio::Graphics now does) and require a specific core version? c) I'm sure I have forgotten a few things, but I've rambled on enough already. Now, my suggestions. We have settled on a general layout, so... * Each subdistribution would have a separate version and require a specific core (Bio::Root::Root) version. Note that Bio::Graphics is using a different versioning scheme than BioPerl, but we may want to stick to a similar tripartite numbering scheme as for core. Whatever happens, this must be decided on first, as there will be no turning back. * We repurpose Bundle::BioPerl (or a similar Bundle::* package) or make the BioPerl distribution itself a bundle-like installation. This would be for packaging up an old-style 'everything and the kitchen sink' core package from the various distributions. Anytime we split off something into its own distribution we release a newly trimmed-down core and add the new distribution to the bundle or BioPerl. Refer everyone to install the bundle if they want the old-style installation. * Other current subdistributions (run, db, network, etc) follow the same pattern as the above. Releases for non-core distributions do not have to be tied together with core except where needed. * Avoid any circular dependencies (Bio::ASN1::EntrezGene, I'm staring at you).
* As you mention, work these out on branches to test things out. And finally, and I am saying this with the utmost respect and sincerest thanks for everything Sendu is doing and has done for BioPerl, but I'm not convinced we should keep using Bio::Root::Build. It does make some things convenient, but at the cost of additional bugs (2-3 at last count), some API breakage (some methods conflict with Module::Build), and a bit of a chicken-and-egg dilemma that particularly impacts subdistributions (attempting to fall back to Module::Build doesn't work due to API issues). I can elaborate on that more if asked, but I think this post is already long enough, so I'll leave that to later. chris On Jul 15, 2009, at 8:05 PM, Robert Buels wrote: > Rather than putting this in bioperl-dev, perhaps this would be a > nice opportunity to make a new distribution called something > standard like "Bio-Tree", with a standard directory structure, and a > sane number of modules in it. > > I hadn't planned to start an actual battle about this yet, but I > would just like to get it out there that the current 'huge > monolithic distributions' model of BioPerl is completely insane. > Talking to people about BioPerl at YAPC::NA last month, I saw that > this is quite puzzling to the wider Perl community. I was going to > say it was a laughingstock, but that's not actually the case. They > are mostly puzzled and strongly suspect that it's not right. Well, > the diplomatic ones do, anyway. Matt Trout (of DBIx::Class and > Catalyst fame) would probably yell and curse about it in a very > entertaining way. > > If things were in smaller distributions, making and testing releases > would be a lot easier, because the pieces of code you're testing and > releasing are smaller, and the dependencies among the pieces are > characterized, codified, and enforced via the Build.PL files of each > distribution. > > There, I said it. 
> > But aside from my inflammatory remarks above, this sort of thing > need not happen all at once. The "Bio-Tree" distribution is a nice > example of how things could be extracted from or begun outside the > bioperl-* distributions, with the bioperl-* monolithic balls of mud > getting smaller as things are moved from them into their own > distributions. This needs to be done carefully, but so things like > this should probably be done only with major releases, and with lots > of notifications and release notes and things like that. > > OK, now that I've said "this sucks and needs to change", I now go on > to volunteer to do work to make it happen. I will take and execute > orders from you core developers saying things like "make a branch, > take this list of modules, copy them into a new distribution, move > their tests over, and write a Build.PL with the correct > dependencies", and later "merge the moved_thing_somewhere" branch > into the some_other_branch and test it". I bet somebody whose name > rhymes with "Jay Hannah" would probably do grunt work to help with > this also, but of course he would have to volunteer first. ;-) I > also volunteer to help teach others how to do this, but they have to > figure out how to use IRC. > > Oh, and I also volunteer to keep writing inflammatory emails. > > Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Jul 16 01:45:55 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 16 Jul 2009 00:45:55 -0500 Subject: [Bioperl-l] Windows and ppm:package.xml In-Reply-To: References: <498AA22B-42C1-4FF3-A22D-46C8F293DFBB@scottcain.net> Message-ID: Might want to wait a week more. I would like to start syncing the 1.6 branch with trunk to get a point release ready (I want to push an alpha sometime next week, we can test PPM with that). 
chris On Jul 14, 2009, at 12:20 PM, Jason Stajich wrote: > It ought to be fixed, I am sure just a reflection of forgetting to > do it. > If you can provide the XML patch I can paste it in or chris might > have time to add this in. > > -jason > On Jul 14, 2009, at 8:10 AM, Scott Cain wrote: > >> Hello, >> >> Is there a reason that http://bioperl.org/DIST/package.xml hasn't >> been freshened to include the 1.6 release? It appears that the ppm >> build of the release is there, but when I try to use ppm to install >> it (Activestate 5.8 build 826), it installs 1.5.2, since that is >> the only thing mentioned in the package.xml file. >> >> Thanks, >> Scott >> >> ----------------------------------------------------------------------- >> Scott Cain, Ph. D. scott at scottcain dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason at bioperl.org > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Thu Jul 16 03:22:00 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 16 Jul 2009 00:22:00 -0700 Subject: [Bioperl-l] bioperl reorganization (was Re: Tree refactor? was Re: Bootstrap, root, reroot...) In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> Message-ID: <4A5ED518.7010504@cornell.edu> Renaming thread to bioperl reorganization. Chris Fields wrote: > I agree with you, but we've had this discussion before. Repeatedly, > actually. I have a page in the wiki dedicated to it, having first raised > the issue myself: > > http://www.bioperl.org/wiki/Proposed_core_modules_changes Ah good. 
It's good that there's been some discussion of this already. This is a major issue. I took a look at the proposed changes page, and it's a fundamentally unsound idea. If we're having problems maintaining a big, monolithic distribution of modules, the solution is not "let's keep doing monolithic distributions, but just slightly smaller ones". It's just pushing the problem back a bit; we'll still have the same problems down the road. The proven, scalable, maintainable way to maintain and distribute Perl modules is small, focused distributions tied together with dependencies. (For those of you following along at home, a distribution is just a tar.gz that the cpan installer downloads and installs behind the scenes.) For users, this is fine, it has to be fine, or they could not use anything else that's on the CPAN, because it is all like this (except for BioPerl). And if a user doesn't know how to use the CPAN and refuses to learn, they are missing out, and that's just how it is. It is not trivial, but it is not that hard, and if they are going to be using bioperl to write their own Perl programs, they need to learn it. It is, in 2009, an integral part of writing Perl. For developers, this system works very well for reasons already covered. And without developers, there is no code. > First: the problem we have consistently run into is exactly how to > deliver a core set of modules in a way that works both for users and for > release managers. We have settled on one of the original proposals > noted above, starting by roughly splitting up the current 'core' into > something based on similar functions and level of development/support. > bioperl-dev was part of that, for instance, and represents code we > consider 'developer-only' or experimental. The true 'core' would be a > base set of modules with minimal additional dependencies (see below for > how nebulous this becomes).
> > If you haven't already noticed, prior to 1.6.0 Bio::Graphics basically > started the process (it's now an independent release on CPAN) and we > already have a bioperl-dev. As you've noted we can't split everything > up right from the beginning, but we have started down that path. The Bio::Graphics split is definitely a step in the right direction. There it is on the CPAN, (http://search.cpan.org/~lds/Bio-Graphics-1.97/). Beautiful. > Second: Bio::Tree seems independent of the other modules, but that's > highly misleading. Bio::Species and Bio::Taxon (and thus anything that > will use said objects, like Bio::Seqs, which are very much core) are now > completely dependent on Bio::Tree code. Both are-a Bio::Tree::NodeI, I > believe since 1.5.2. If we split that code off it then creates a > circular dependency (Bio::Species, in core, requires Bio::Tree in the > bio-tree package, which in turn requires Bio::Root::Root in the core > package). Bio::Tree code also has a Bio::DB::Taxonomy, thus expanding > core a little bit more. Similarly, Bio::Ontology classes are used by > several key modules (Bio::Annotation::OntologyTerm comes to mind, but > also Bio::Annotation::OntologyTerm). In other words, there are some > parts of core that can't easily be split off w/o repercussions (and thus > probably won't be). OK, Bio::Tree is definitely not the place to start then. You have to start chipping away and extracting leaf nodes in the dependency tree, and that's what was done with Bio::Graphics. > > Third: the largest issue in my opinion, that being what really > constitutes 'core', not just to us but to current bioperl users. To me, > the idea or a true 'core' is the bare essentials (Seq, Features, > Annotations, and some basic IO modules, the most common interfaces). > Should 'core' include SearchIO, or AlignIO? Remote and/or local DB > functionality? Bio::Tools? 
All of those are feasibly independent sets > of modules, and I would definitely support those being in their own > subdistributions and would be easier to fix bugs and release updates, > but I may be in the minority as they are extremely popular, and many > users still consider them 'core'. We need need a workaround for that. There is no workaround needed. The user types at their cpan prompt: "install Bio::SeqIO" and says 'yes' to follow dependencies. There should be no core. Only dependencies. If we want to give users a convenient abstraction of "BioPerl", the way to do that would be to revisit Bundle::BioPerl (as you say below), or do a Task::BioPerl. Really, the whole idea of having a "core" is bogus. Somebody doing phylogenetics will say the Tree stuff should be core, because, you know, whatever else would you use BioPerl for anyway, but me, who runs genome annotation pipelines and data handling, does not give a hoot about trees. At least not right now. So you can go round and round arguing about what should and should not be in core, and you will never come to a set of modules that satisfies even "most researchers'" needs unless you have a huge, unmaintainable monolithic distribution, which, as has been demonstrated, is not a good idea. > Finally (a wrap-up of bits and pieces): a) how are the various bio-* > packages to be maintained? Would there be several release pumpkins, one > for each release? > b) How do we sort out versioning? For instance, > would bio-foo have a separate version (like Bio::Graphics now does) and > require a specific core version? c) I'm sure I have forgotten a few > things, but I've rambled on enough already. Each distribution would be versioned and released independently. Perhaps they could all start out at version 1.6. 
If there is a change in one module that breaks something in another distro (which of course should not be done lightly) it's the responsibility of the other distro's maintainer to fix it or code around it or pin it down with a specific version number dependency in its Build.PL, or whatever. Finding and characterizing these interactions is what automated testing is for, and why it's built into CPAN. > Grrr!!! No breather! (just kidding) > Now, my suggestions. We have settled on a general layout, so... > * Each subdistribution would have a separate version and require a > specific core (Bio::Root::Root) version. Note that Bio::Graphics is > using a different versioning scheme than BioPerl, but we may want to > stick to a similar tripartite numbering scheme as for core. Whatever > happens, this must be decided on first, as there will be no turning back. > * We repurpose Bundle::BioPerl (or a similar Bundle::* package) or make > the BioPerl distribution itself a bundle-like installation. This would > be for packaging up an old-style 'everything and the kitchen sink' core > package from the various distributions. Anytime we split off something > into it's own distribution we release a newly trimmed-down core and add > the new distribution to the bundle or BioPerl. Refer everyone to > install the bundle if they want the old-style installation. > * Other current subdistributions (run, db, network, etc) follow the same > pattern as the above. Releases for non-core distributions do not have > to be tied together with core except where needed. > * Avoid any circular dependencies (Bio::ASN1::EntrezGene, I'm staring at > you). (Is there any point in staring into its dead sunken eye sockets? It was last released in 2005. Need to remove this dependency, rewriting the module in question if necessary.) > * As you mention, work these out on branches to test things out. The above is all exactly right. The proposed layout of the distributions is the only thing that's wrong. 
They need to be much smaller, more focused, and thus more maintainable. > > And finally, and I am saying this with the utmost respect and sincerest > thanks for everything Sendu is doing and has done for BioPerl, but I'm > not convinced we should keep using Bio::Root::Build. It does make some > things convenient, but at the cost of additional bugs (2-3 at last > count), some API breakage (some methods conflict with Module::Build), > and a bit of a chicken-and-egg dilemma that particularly impacts > subdistributions (attempting to fall back to Module::Build doesn't work > due to API issues). I can elaborate on that more if asked, but I think > this post is already long enough, so I'll leave that to later. Yes, please elaborate on that more. I want to know. Such progress. Seems like now we just need to get everyone to agree that distributions need to be small and focused. Right? Rob From dan.bolser at gmail.com Thu Jul 16 04:57:21 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Thu, 16 Jul 2009 09:57:21 +0100 Subject: [Bioperl-l] bioperl reorganization (was Re: Tree refactor? was Re: Bootstrap, root, reroot...) In-Reply-To: <4A5ED518.7010504@cornell.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> Message-ID: <2c8757af0907160157x73c32467qe526373da14cf00f@mail.gmail.com> 2009/7/16 Robert Buels : > Renaming thread to bioperl reorganization. > > Chris Fields wrote: >> ... > Such progress. > > Seems like now we just need to get everyone to agree that distributions need > to be small and focused. > > Right? I agree! 
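For anyone who wants to see the circular-dependency worry from earlier in this thread in concrete terms, here is a minimal sketch. It is in Python rather than Perl, purely to keep it short and standard-library only. The distribution names and both layouts are hypothetical illustrations, not an agreed plan: the first layout models "Bio-Tree leaves core while Bio::Root stays in core", the second assumes Bio::Root is also given its own distribution.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(deps):
    """Return distributions ordered so each comes after what it requires.

    deps maps a distribution name to the set of distributions it depends on.
    Returns None when the dependencies are circular and no order exists.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Hypothetical layout 1: Bio-Tree is split out, Bio::Root stays in core.
coarse_split = {
    "bioperl-core": {"Bio-Tree"},  # Bio::Species is-a Bio::Tree::NodeI
    "Bio-Tree": {"bioperl-core"},  # Bio::Tree needs Bio::Root::Root
}

# Hypothetical layout 2: Bio::Root also becomes its own small distribution.
focused_split = {
    "Bio-Root": set(),
    "Bio-Tree": {"Bio-Root"},
    "bioperl-core": {"Bio-Root", "Bio-Tree"},
}

if __name__ == "__main__":
    print(install_order(coarse_split))
    print(install_order(focused_split))
```

Under these assumptions the first layout has no valid install order, while the second installs Bio-Root, then Bio-Tree, then the remaining core. Whether the real module graph untangles this cleanly is exactly what would have to be checked on a branch.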
> Rob > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From florian.mittag at uni-tuebingen.de Thu Jul 16 05:36:22 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 16 Jul 2009 11:36:22 +0200 Subject: [Bioperl-l] Can I load ontologies into BioSQL? In-Reply-To: References: <59386.10.2.4.168.1245679938.squirrel@webmail.istge.it> Message-ID: <200907161136.22499.florian.mittag@uni-tuebingen.de> Hi! I wrote this under the subject of the DB2 driver, too, but this particular problem seems to also fit here. I don't get the error message, but the same warnings as Achille, although I'm using the new OBO format. As you can see, the problems start right after the script gets to "relationships", before this there are no warnings or errors. (The lines between the warnings are debug information I added myself for a different problem.) --- snip --- [...] ... relationships SELECT UK Bio::DB::BioSQL::TermAdaptoridentifier identifier : GO:0060058 SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, cast(NULL as VARCHAR(255)), term.ontology_id FROM term WHERE identifier = ? -------------------- WARNING --------------------- MSG: GOC:dph exists in the dblink of _default --------------------------------------------------- -------------------- WARNING --------------------- MSG: PMID:15282149 exists in the dblink of _default --------------------------------------------------- SELECT UK Bio::DB::BioSQL::OntologyAdaptorname name : gene_ontology SELECT ontology.ontology_id, ontology.name, ontology.definition FROM ontology WHERE name = ? SELECT UK Bio::DB::BioSQL::TermAdaptorname;ontology ontology : 2 name : IS_A SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, cast(NULL as VARCHAR(255)), term.ontology_id FROM term WHERE ontology_id = ? AND name = ? 
SELECT UK Bio::DB::BioSQL::TermAdaptoridentifier identifier : GO:0043065 SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, cast(NULL as VARCHAR(255)), term.ontology_id FROM term WHERE identifier = ? -------------------- WARNING --------------------- MSG: GOC:jl exists in the dblink of _default --------------------------------------------------- SELECT UK Bio::DB::BioSQL::TermAdaptoridentifier identifier : GO:0007429 SELECT term.term_id, term.identifier, term.name, term.definition, term.is_obsolete, cast(NULL as VARCHAR(255)), term.ontology_id FROM term WHERE identifier = ? -------------------- WARNING --------------------- MSG: GOC:mtg_sensu exists in the dblink of _default --------------------------------------------------- [...] --- snap --- The command is: perl load_ontology.pl --driver DB2 --dbname bioseqdb --dbuser --dbpass --namespace "Gene Ontology" --format obo /tmp/gene_ontology.1_2.obo Any ideas? Regards, Florian On Saturday 04 July 2009 14:02, Hilmar Lapp wrote: > according to Chris Mungall from the GO Consortium, the .ontology files > have been deprecated by GO. You should use the .obo files instead, and > BioPerl has a parser for that (and load_ontology.pl supports all > formats that BioPerl supports). > > There has been a near identical issue report earlier (April 20 - I > don't have the thread from the archives at hand). According to Chris, > the BioPerl parser for the .ontology files appears to fail to deal > with the new relations in GO, and so with the obsoletion of > the .ontology format we have scheduled the respective parser for > deprecation. > > -hilmar > > On Jun 22, 2009, at 10:12 AM, Achille Zappa wrote: > > Hi guys > > > > I'm working with biosql and I try to figure out how to load ontologies > > into biosql. 
> > > > I've tried to load the flat files gene ontologies : > > > > load_ontology.pl --driver mysql --dbuser xxx --dbpass xxx --host > > localhost --dbname biosql --namespace "Gene Ontology" --format goflat > > --fmtargs "-defs_file,GO.defs" function.ontology process.ontology > > component.ontology > > > > as in the script info but I have an error, > > > > a lot of ------------ WARNING --------------------- > > MSG: DBLink exists in the dblink of _default > > --------------------------------------------------- > > and at the end > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: format error (file /home/user/Download/process.ontology) > > offending line: > > -negative regulation of angiogenesis ; GO:0016525 ; synonym:down > > regulation of angiogenesis ; synonym:down\-regulation of angiogenesis > > ; synonym:downregulation of angiogenesis ; synonym:inhibition of > > angiogenesis % negative regulation of developmental process ; > > GO:0051093 % regulation of angiogenesis ; GO:0045765 > > > > STACK: Error::throw > > STACK: Bio::Root::Root::throw > > /usr/lib/perl5/vendor_perl/5.10.0/Bio/Root/Root.pm:357 > > STACK: Bio::OntologyIO::dagflat::_parse_flat_file > > /usr/lib/perl5/vendor_perl/5.10.0/Bio/OntologyIO/dagflat.pm:627 > > STACK: Bio::OntologyIO::dagflat::parse > > /usr/lib/perl5/vendor_perl/5.10.0/Bio/OntologyIO/dagflat.pm:284 > > STACK: Bio::OntologyIO::dagflat::next_ontology > > /usr/lib/perl5/vendor_perl/5.10.0/Bio/OntologyIO/dagflat.pm:317 > > STACK: load_ontology.pl:604 > > ----------------------------------------------------------- > > > > could you help me? > > is it possible to use the OBO format with the loader? > > those GO flat files are deprecated by the Gene Ontology site > > is there a list of format to use with the biosql perl scripts? 
> > > > thank you > > regards > > Achille From paolo.pavan at gmail.com Thu Jul 16 06:17:12 2009 From: paolo.pavan at gmail.com (Paolo Pavan) Date: Thu, 16 Jul 2009 12:17:12 +0200 Subject: [Bioperl-l] Bio::SimpleAlign constructor? Message-ID: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com> Hi, I have a brief question: I would like to know if there is a method to obtain a validly formatted and flush Bio::SimpleAlign object (i.e. properly filled with gaps on the right and on the left side of each sequence) given a bunch of Bio::LocatableSeq objects in which I have specified the -start and -end properties. Can anyone help me? Thank you very much, Paolo From cjfields at illinois.edu Thu Jul 16 08:04:53 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 16 Jul 2009 07:04:53 -0500 Subject: [Bioperl-l] bioperl reorganization (was Re: Tree refactor? was Re: Bootstrap, root, reroot...) In-Reply-To: <2c8757af0907160157x73c32467qe526373da14cf00f@mail.gmail.com> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <2c8757af0907160157x73c32467qe526373da14cf00f@mail.gmail.com> Message-ID: <20A78572-D4A7-4742-8990-341B777511FC@illinois.edu> On Jul 16, 2009, at 3:57 AM, Dan Bolser wrote: > 2009/7/16 Robert Buels : >> Renaming thread to bioperl reorganization. >> >> Chris Fields wrote: >>> > > ... > >> Such progress. >> >> Seems like now we just need to get everyone to agree that >> distributions need >> to be small and focused. >> >> Right? > > I agree! They must also be maintainable. I agree that splitting these up should make them more so, but this shouldn't be approached lightly. chris From asjo at koldfront.dk Thu Jul 16 08:47:44 2009 From: asjo at koldfront.dk (Adam Sjøgren) Date: Thu, 16 Jul 2009 14:47:44 +0200 Subject: [Bioperl-l] Bio::SeqIO::abi - [PATCH] Update pod and make get_trace_data() return the current value.
Message-ID: <87eisg6bgv.fsf@topper.koldfront.dk>

The pod references the option -read_graph_data and the method read_graph_data(), but neither are handled by the code; the code uses "get_trace_data".

The method get_trace_data() is used as an accessor in the code: called without an argument to read the value - but the method overwrites the current value with 0 if called without any arguments; so calling get_trace_data() without arguments returns 0 always, making it impossible to reach the read_trace_with_graph() call.
---
 Bio/SeqIO/abi.pm |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Bio/SeqIO/abi.pm b/Bio/SeqIO/abi.pm
index d6bc2da..2638b38 100644
--- a/Bio/SeqIO/abi.pm
+++ b/Bio/SeqIO/abi.pm
@@ -24,7 +24,7 @@ Do not use this module directly. Use it via the Bio::SeqIO class.
 This object can transform Bio::Seq objects to and from abi trace
 files. To optionally read the trace graph data (which can be used
 to draw chromatographs, for instance), set the optional
-'-read_graph_data' flag or the read_graph_data method to a value
+'-get_trace_data' flag or the get_trace_data method to a value
 evaluating to TRUE.
 
 =head1 FEEDBACK
@@ -182,7 +182,7 @@ sub write_seq {
 
 sub get_trace_data {
     my ($self, $val) = @_;
-    $self->{_get_trace_data} = $val ? 1 : 0;
+    $self->{_get_trace_data} = $val ? 1 : 0 if (defined $val);
     $self->{_get_trace_data};
 }
 
--
1.6.0.4

From cjfields at illinois.edu Thu Jul 16 09:04:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 16 Jul 2009 08:04:10 -0500 Subject: [Bioperl-l] Bio::SeqIO::abi - [PATCH] Update pod and make get_trace_data() return the current value.
In-Reply-To: <87eisg6bgv.fsf@topper.koldfront.dk> References: <87eisg6bgv.fsf@topper.koldfront.dk> Message-ID: Adam, I'll patch these, however the best way to send patches is by following this: http://www.bioperl.org/wiki/HOWTO:SubmitPatch chris On Jul 16, 2009, at 7:47 AM, Adam Sjøgren wrote: > The pod references the option -read_graph_data and the method > read_graph_data(), but neither are handled by the code; the code > uses "get_trace_data". > > The method get_trace_data() is used as an accessor in the code: > called without an argument to read the value - but the method > overwrites the current value with 0 if called without any arguments; > so calling get_trace_data() without arguments returns 0 always, > making it impossible to reach the read_trace_with_graph() call. > --- > Bio/SeqIO/abi.pm | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/Bio/SeqIO/abi.pm b/Bio/SeqIO/abi.pm > index d6bc2da..2638b38 100644 > --- a/Bio/SeqIO/abi.pm > +++ b/Bio/SeqIO/abi.pm > @@ -24,7 +24,7 @@ Do not use this module directly. Use it via the > Bio::SeqIO class. > This object can transform Bio::Seq objects to and from abi trace > files. To optionally read the trace graph data (which can be used > to draw chromatographs, for instance), set the optional > -'-read_graph_data' flag or the read_graph_data method to a value > +'-get_trace_data' flag or the get_trace_data method to a value > evaluating to TRUE. > > =head1 FEEDBACK > @@ -182,7 +182,7 @@ sub write_seq { > > sub get_trace_data { > my ($self, $val) = @_; > - $self->{_get_trace_data} = $val ? 1 : 0; > + $self->{_get_trace_data} = $val ?
1 : 0 if (defined $val); > $self->{_get_trace_data}; > } > > -- > 1.6.0.4 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From giles.weaver at googlemail.com Thu Jul 16 09:13:51 2009 From: giles.weaver at googlemail.com (Giles Weaver) Date: Thu, 16 Jul 2009 14:13:51 +0100 Subject: [Bioperl-l] Bootstrap, root, reroot... In-Reply-To: References: <7474C443-028A-409C-B50D-98A83852E554@illinois.edu> <60B160081EEB4C0DB3C2D1DE756E1A7C@NewLife> Message-ID: <1d06cd5d0907160613l1864c8fat36b05f5003d04157@mail.gmail.com> All this talk of nodes and edges makes me think of Cytoscape. Cytoscape was developed to visualise molecular interaction networks, but can be used to display all kinds of things, including trees. Has anyone considered a generic Bio::Network to describe nodes and edges? A Bio::Tree could be implemented as a (relatively) simple Bio::Network. A more complicated Bio::Network might be something like a Bio::Model, containing a mathematical model such as those found in the biomodels database. This could be very useful for systems biology. Giles 2009/7/16 Chris Fields > Well, what I was thinking was that Nodes sharing an edge could share the same > hash ref containing the edge information, with said hash ref magically > becoming an object when absolutely needed (keys being named parameters, > values being args for constructor). Just a thought. > > chris > > > On Jul 15, 2009, at 10:43 PM, Mark A. Jensen wrote: > > To examine rvos' Bio::Phylo was my plan exactly-- Lazy edges can be done I >> believe, although it seems >> that one of the main reasons to have edges is to >> attach lengths, bootstrap values, etc to them; so we may >> ultimately avoid edge creation only when we construct tree >> topology only--prob rare in practice? >> >> ----- Original Message ----- From: "Chris Fields" >> To: "Mark A.
Jensen" >> Cc: "Aidan Budd" ; "BioPerl List" < >> bioperl-l at lists.open-bio.org>; >> Sent: Wednesday, July 15, 2009 11:29 PM >> Subject: Re: [Bioperl-l] Bootstrap, root, reroot... >> >> >> I don't; code away. >>> >>> With the edge/branch objects, I'm wondering whether those can be created >>> lazily (only when needed); it might lighten up the tree a bit. Also, don't >>> forget to look at Rutger's Bio::Phylo project, though I think his modules >>> use inside-out objects (might not be easy to work into core unless they are >>> wrapped). >>> >>> chris >>> >>> On Jul 15, 2009, at 4:54 PM, Mark A. Jensen wrote: >>> >>> After fooling around with bug 2877, I'm thinking seriously about >>>> starting the edge-branch >>>> project in bioperl-dev, building out an implementation off the >>>> interfaces B:T:TreeI and B:T:NodeI. It would >>>> give the opp'y for some code rationalization too. >>>> Anyone out there have a problem with that? >>>> cheers MAJ >>>> ----- Original Message ----- From: "Aidan Budd" < >>>> budd at embl-heidelberg.de >>>> > >>>> To: "BioPerl List" >>>> Sent: Saturday, July 11, 2009 3:52 AM >>>> Subject: Re: [Bioperl-l] Bootstrap, root, reroot... >>>> >>>> >>>> On Thu, 9 Jul 2009, Tristan Lefebure wrote: >>>>> >>>>>> ... >>>>>> My understanding here is that the problem is linked to the well- known >>>>>> difficulty to differentiate node from branch labels in newick trees. >>>>>> Bootstrap scores are branch attributes not node attributes, but since >>>>>> Bio::TreeI has no branch/edge/bipartition object they are attached to a >>>>>> node, and in fact reflects the bootstrap score of the ancestral branch >>>>>> leading to that node. Troubles naturally come when you are dealing with an >>>>>> unrooted tree or reroot a tree: a child can become an ancestor, and, if the >>>>>> bootstrap scores is not moved from the old child to the new child, it will >>>>>> end up attached at the wrong place (i.e. wrong node). 
I see several fix to >>>>>> that: >>>>>> 1- incorporate Bank's fix into the root() method. I.e. if there is >>>>>> bootstrap score, after re-rooting, the one on the old to new ancestor path, >>>>>> should be moved to the right node. 2- Modify the way trees are stored in >>>>>> bioperl to incorporate branch/edge/ bipartition object, and move the >>>>>> bootstrap scores to them. That won't be easy and will break many things... >>>>>> >>>>> Just wanted to add that, from my point of view, it would be great if >>>>> it >>>>> were possible to add edge/branch objects as part of the bioperl trees. >>>>> Perhaps so that the previous set of methods still behaved as before, but >>>>> with some new methods on the trees such as get_splits() or >>>>> get_branches() along with associated split/branch/etc. objects...? >>>>> Being a bioperl user but keeping well away from coding objects in >>>>> perl, >>>>> the lack of such methods/objects meant I chose, in the end, not to use >>>>> a >>>>> bioperl solution to work with my trees (going instead for a homemade >>>>> clunky python solution, where I'm happier with the OO stuff) >>>>> No idea how difficult/problematic this would be to implement, though - >>>>> just my 2 cents worth... >>>>> >>>>>> What do you think? >>>>>> --Tristan >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>> -- >>>>> ---------------------------------------------------------------------- >>>>> Aidan Budd tel:+49 (0)6221 387 8530 >>>>> EMBL - European Molecular Biology Laboratory fax:+49 (0)6221 387 8517 >>>>> Meyerhofstr. 
1, 69117 Heidelberg, Germany >>>>> http://www.embl-heidelberg.de/~budd/ >>>>> http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> >>> >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From abhishek.vit at gmail.com Thu Jul 16 09:21:11 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Thu, 16 Jul 2009 09:21:11 -0400 Subject: [Bioperl-l] Classifying SNPs In-Reply-To: References: Message-ID: Hi Pablo Many thanks for for your reply. I am currently attending a workshop so might not get time to check out your suggestions. Once I am back I will get back to you in case I have any questions. Thanks, -Abhi On Tue, Jul 14, 2009 at 7:57 PM, Pablo Marin-Garcia wrote: > > fixing a typo and explaining a gotcha > > On Tue, 14 Jul 2009, Pablo Marin-Garcia wrote: > > >> Hello Abhishek >> >> Ensembl has a module for calculate SNP consequences in a transcript. 
>> >> The script that they use to create their consequences is located in: >> >> >> ensembl-55/ensembl-variation/scripts/import/parallel_transcript_variation.pl >> >> The important bit is to convert your snp coordenates and the >> variation_allele into a ConsequenceType object >> >> $consequence_type = >> Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$chr,$start,$end,$strand,\@alleles); >> >> > fixing typo: (instead $chr it would be a $variation_id) > > > Bio::EnsEMBL::Variation::ConsequenceType->new($tr->dbID,$var_id,$var_start,$var_end,$var_strand,\@alleles); > > warning: > > The transcript_id and the variation_id are not important if you are not > building a ensembl database. > > BUT the gotcha part is that the start and end of the variation should refer > to the same slice start than the transcript used in the next step > (type_variation). Be careful because depending how you select the gene or > slice to retrieve your transcripts your transcript start and end would be > the chromosome coordinates or a relative start/end from the slice start. > > You should work with chr positions for the variations and the transcripts > (where start/end == seq_region_start/seq_region_end) to avoid problems. > > and pass this and a transcript to the type_variation >> Bio::EnsEMBL::Utils::TranscriptAlleles exported method >> >> $consequences = type_variation($tr, $gene, $consequence_type); >> >> > The $gene is optional > > > in the module >> >> ensembl-55/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm >> >> The other important bit in this script is that now the functional_genomics >> consequences are calculated in this script instead in the type_variation() >> >> The only drawback is that it return only the ensembl classes of >> consequences , but you can extend that later if you need more specific >> consequences (I have done that in the past for different projects). 
>> >> This ensembl aproach will save you a lot of problems with the mapping from >> gene to protein and with multiple snps in a codon. >> >> If you have experience with ensembl then is easy to follow the code. If >> not you can always ask for help in the ensembl-dev mailing list ( >> ensembl-dev at ebi.ac.uk) >> >> >> If you want to read the code without checking out the whole api: >> >> >> >> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl-variation/scripts/import/parallel_transcript_variation.pl?revision=1.27&root=ensembl&view=markup >> >> http://cvs.sanger.ac.uk/cgi-bin/viewvc.cgi/ensembl/modules/Bio/EnsEMBL/Utils/TranscriptAlleles.pm?root=ensembl&view=log >> >> >> hope this helps >> >> >> - Pablo >> >> >> >> >> > ===================================================================== > Pablo Marin-Garcia, PhD > > \\// (Argiope bruennichi > \/\/`(||>O:'\/\/ with stabilimentum) > //\\ > > Sanger Institute | PostDoc / Computer Biologist > Wellcome Trust Genome Campus | team : 128/108 (Human Genetics) > Hinxton, Cambridge CB10 1HH | room : N333 > United Kingdom | email: pablo.marin at sanger.ac.uk > ==================================================================== > > > > > > > > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, > a charity registered in England with number 1021457 and a company registered > in England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE. _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From maj at fortinbras.us Thu Jul 16 08:48:16 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 16 Jul 2009 08:48:16 -0400 Subject: [Bioperl-l] Fw: perly suffix trees-- Message-ID: <641BC261614D4DEDB901AD110A39013D@NewLife> To the list with you, I have no secrets-- maybe Ian will chime in. 
cheers, MAJ ----- Original Message ----- From: Aaron Mackey To: Mark A. Jensen Sent: Thursday, July 16, 2009 8:41 AM Subject: Re: [Bioperl-l] perly suffix trees-- The code on that wiki page looks suspiciously incomplete. For example, you declare $i = 0 in readDictionary, then never use it again. It also looks like you only ever create entries for the entire word, and never any suffices (which presumably was what the $i was going to be for, to offset into each word). Further, it looks like the readDictionary loop will clobber already-seen fragments by reassigning "1" (when they might already be the prefix to some other suffix). Perhaps the missing $i loop would reveal to me how this would be avoided. It also seems like testing for a hashref to equal 1 during the search is asking for type mismatch trouble; perhaps better to directly test the ref() status to determine rightmost/inner status? But that's just style, not substance. Best wishes, -Aaron On Mon, Jul 13, 2009 at 10:15 PM, Mark A. Jensen wrote: Hi All- Russell sent me an almost magical Perl algorithm for creating a suffix tree or something like one. It was cool enough to make a scrap out of it-- http://www.bioperl.org/wiki/Suffix_trees_from_thin_air Have a look; might be diverting- cheers Mark _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Jul 16 09:36:45 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 16 Jul 2009 09:36:45 -0400 Subject: [Bioperl-l] perly suffix trees-- In-Reply-To: <24c96eca0907160541u5807a6d3nf479be3dcbca48fa@mail.gmail.com> References: <3F34863C45914120A62B84BF973FAB76@NewLife> <24c96eca0907160541u5807a6d3nf479be3dcbca48fa@mail.gmail.com> Message-ID: <564CEA6E5A6B4111941A9A82B43CA6B4@NewLife> Modifications to http://www.bioperl.org/wiki/Suffix_trees_from_thin_air per these comments. Add/correct at will! 
cheers MAJ ----- Original Message ----- From: Aaron Mackey To: Mark A. Jensen Sent: Thursday, July 16, 2009 8:41 AM Subject: Re: [Bioperl-l] perly suffix trees-- The code on that wiki page looks suspiciously incomplete. For example, you declare $i = 0 in readDictionary, then never use it again. It also looks like you only ever create entries for the entire word, and never any suffices (which presumably was what the $i was going to be for, to offset into each word). Further, it looks like the readDictionary loop will clobber already-seen fragments by reassigning "1" (when they might already be the prefix to some other suffix). Perhaps the missing $i loop would reveal to me how this would be avoided. It also seems like testing for a hashref to equal 1 during the search is asking for type mismatch trouble; perhaps better to directly test the ref() status to determine rightmost/inner status? But that's just style, not substance. Best wishes, -Aaron On Mon, Jul 13, 2009 at 10:15 PM, Mark A. Jensen wrote: Hi All- Russell sent me an almost magical Perl algorithm for creating a suffix tree or something like one. It was cool enough to make a scrap out of it-- http://www.bioperl.org/wiki/Suffix_trees_from_thin_air Have a look; might be diverting- cheers Mark _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From asjo at koldfront.dk Thu Jul 16 09:49:17 2009 From: asjo at koldfront.dk (Adam Sjøgren) Date: Thu, 16 Jul 2009 15:49:17 +0200 Subject: [Bioperl-l] Bio::SeqIO::abi - Update pod and make get_trace_data() return the current value. In-Reply-To: (Chris Fields's message of "Thu, 16 Jul 2009 08:04:10 -0500") References: <87eisg6bgv.fsf@topper.koldfront.dk> Message-ID: <87ab3468ma.fsf@topper.koldfront.dk> On Thu, 16 Jul 2009 08:04:10 -0500, Chris wrote: > I'll patch these, Thanks!
> however the best way to send patches is by following this: > http://www.bioperl.org/wiki/HOWTO:SubmitPatch Ah, I missed that, sorry. I'll go through bugzilla from now on. Best regards, Adam -- "Perl 5 is a velociraptor, but we need an acceloraptor now." Adam Sjøgren asjo at koldfront.dk From ajmackey at gmail.com Thu Jul 16 09:55:17 2009 From: ajmackey at gmail.com (Aaron Mackey) Date: Thu, 16 Jul 2009 09:55:17 -0400 Subject: [Bioperl-l] Fw: perly suffix trees-- In-Reply-To: <641BC261614D4DEDB901AD110A39013D@NewLife> References: <641BC261614D4DEDB901AD110A39013D@NewLife> Message-ID: <24c96eca0907160655n7402e130wb4de6424918c431a@mail.gmail.com> Fair enough, thanks! FYI, I've used one of Ian Korf's other clever tricks in my own code before: using six different intron states (combined with 3 exon states that correspond to the previous intron's phase) in a gene-model HMM to ensure that spliced codons don't encode stop codons. I'm not aware of any other gene finder that implements that structure directly (my impression is that these cases are disallowed through some *post hoc* checking). -Aaron On Thu, Jul 16, 2009 at 8:48 AM, Mark A. Jensen wrote: > To the list with you, I have no secrets-- maybe Ian will chime in. > cheers, > MAJ > ----- Original Message ----- > From: Aaron Mackey > To: Mark A. Jensen > Sent: Thursday, July 16, 2009 8:41 AM > Subject: Re: [Bioperl-l] perly suffix trees-- > > > The code on that wiki page looks suspiciously incomplete. For example, you > declare $i = 0 in readDictionary, then never use it again. It also looks > like you only ever create entries for the entire word, and never any > suffices (which presumably was what the $i was going to be for, to offset > into each word). > > Further, it looks like the readDictionary loop will clobber already-seen > fragments by reassigning "1" (when they might already be the prefix to some > other suffix). Perhaps the missing $i loop would reveal to me how this > would be avoided.
> > It also seems like testing for a hashref to equal 1 during the search is > asking for type mismatch trouble; perhaps better to directly test the ref() > status to determine rightmost/inner status? But that's just style, not > substance. > > Best wishes, > > -Aaron > > > On Mon, Jul 13, 2009 at 10:15 PM, Mark A. Jensen > wrote: > > Hi All- > Russell sent me an almost magical Perl algorithm for creating a suffix > tree > or something like one. It was cool enough to make a scrap out of it-- > http://www.bioperl.org/wiki/Suffix_trees_from_thin_air > Have a look; might be diverting- > cheers > Mark > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Jul 16 12:31:20 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 16 Jul 2009 11:31:20 -0500 Subject: [Bioperl-l] Fw: perly suffix trees-- In-Reply-To: <24c96eca0907160655n7402e130wb4de6424918c431a@mail.gmail.com> References: <641BC261614D4DEDB901AD110A39013D@NewLife> <24c96eca0907160655n7402e130wb4de6424918c431a@mail.gmail.com> Message-ID: <87FA3D73-EE46-4FD8-858B-294EE9A3D836@illinois.edu> I still like masak's perl6 simple oneliner for translating a simple sequence: my $dna = "ttaagg"; sub translate($dna) { "FFLLSSSSYY!!CC! WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG".comb[map { : 4($_) }, $dna.trans("tcag" => "0123").comb(/.../)] }; say translate($dna) Scary, ain't it? It works quite well with rakudo. See the following for the explanation: http://use.perl.org/~masak/journal/39238 chris On Jul 16, 2009, at 8:55 AM, Aaron Mackey wrote: > Fair enough, thanks! 
> > FYI, I've used one of Ian Korf's other clever tricks in my own code > before: > using six different intron states (combined with 3 exon states that > correspond to the previous intron's phase) in a gene-model HMM to > ensure > that spliced codons don't encode stop codons. I'm not aware of any > other > gene finder that implements that structure directly (my impression > is that > these cases are disallowed through some *post hoc* checking). > > -Aaron > > On Thu, Jul 16, 2009 at 8:48 AM, Mark A. Jensen > wrote: > >> To the list with you, I have no secrets-- maybe Ian will chime in. >> cheers, >> MAJ >> ----- Original Message ----- >> From: Aaron Mackey >> To: Mark A. Jensen >> Sent: Thursday, July 16, 2009 8:41 AM >> Subject: Re: [Bioperl-l] perly suffix trees-- >> >> >> The code on that wiki page looks suspiciously incomplete. For >> example, you >> declare $i = 0 in readDictionary, then never use it again. It also >> looks >> like you only ever create entries for the entire word, and never any >> suffices (which presumably was what the $i was going to be for, to >> offset >> into each word). >> >> Further, it looks like the readDictionary loop will clobber already- >> seen >> fragments by reassigning "1" (when they might already be the prefix >> to some >> other suffix). Perhaps the missing $i loop would reveal to me how >> this >> would be avoided. >> >> It also seems like testing for a hashref to equal 1 during the >> search is >> asking for type mismatch trouble; perhaps better to directly test >> the ref() >> status to determine rightmost/inner status? But that's just style, >> not >> substance. >> >> Best wishes, >> >> -Aaron >> >> >> On Mon, Jul 13, 2009 at 10:15 PM, Mark A. Jensen >> wrote: >> >> Hi All- >> Russell sent me an almost magical Perl algorithm for creating a >> suffix >> tree >> or something like one. 
It was cool enough to make a scrap out of it-- >> http://www.bioperl.org/wiki/Suffix_trees_from_thin_air >> Have a look; might be diverting- >> cheers >> Mark >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Jul 16 12:48:53 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 16 Jul 2009 11:48:53 -0500 Subject: [Bioperl-l] bioperl reorganization (was Re: Tree refactor? was Re: Bootstrap, root, reroot...) In-Reply-To: <4A5ED518.7010504@cornell.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> Message-ID: On Jul 16, 2009, at 2:22 AM, Robert Buels wrote: > Renaming thread to bioperl reorganization. > > Chris Fields wrote: >> I agree with you, but we've had this discussion before. Repeatedly, >> actually. I have a page in the wiki dedicated to it, having first >> raised the issue myself: >> >> http://www.bioperl.org/wiki/Proposed_core_modules_changes > > Ah good. It's good that there's been some discussion of this > already. This is a major issue. I took at the proposed changes > page, and it's a fundamentally unsound idea. If we're having > problems maintaining a big, monolithic distribution of modules, the > solution is not "let's keep doing monolithic distributions, but just > slightly smaller ones". It's just pushing the problem back a bit, > we'll still have the same problems down the road. > ... > > For developers, this system works very well for reasons already > covered. 
> And without developers, there is no code.

Robert, it helps to read the older mail threads the page links to for some historical context. This has been argued fairly extensively, with the proposed split written up on the page being the *initial* one, with more likely to occur along the way. BTW, this was originally planned for 1.6 but we were trying to cram too much into the release, so I bit the bullet and pushed it back until we had 1.6.0 out. As you have seen we did manage to have Bio::Graphics migrate out successfully.

>> First: the problem we have consistently run into is exactly how to deliver a core set of modules in a way that works both for users and for release managers. We have settled on one of the original proposals noted above, starting by roughly splitting up the current 'core' into something based on similar functions and level of development/support. bioperl-dev was part of that, for instance, and represents code we consider 'developer-only' or experimental. The true 'core' would be a base set of modules with minimal additional dependencies (see below for how nebulous this becomes). If you haven't already noticed, prior to 1.6.0 Bio::Graphics basically started the process (it's now an independent release on CPAN) and we already have a bioperl-dev. As you've noted we can't split everything up right from the beginning, but we have started down that path.
>
> The Bio::Graphics split is definitely a step in the right direction. There it is on the CPAN, (http://search.cpan.org/~lds/Bio-Graphics-1.97/). Beautiful.
>
>> Second: Bio::Tree seems independent of the other modules, but that's highly misleading. Bio::Species and Bio::Taxon (and thus anything that will use said objects, like Bio::Seqs, which are very much core) are now completely dependent on Bio::Tree code. Both are-a Bio::Tree::NodeI, I believe since 1.5.2.
>> If we split that code off it then creates a circular dependency (Bio::Species, in core, requires Bio::Tree in the bio-tree package, which in turn requires Bio::Root::Root in the core package). Bio::Tree code also has a Bio::DB::Taxonomy, thus expanding core a little bit more. Similarly, Bio::Ontology classes are used by several key modules (Bio::Annotation::OntologyTerm comes to mind, but also Bio::Annotation::OntologyTerm). In other words, there are some parts of core that can't easily be split off w/o repercussions (and thus probably won't be).
>
> OK, Bio::Tree is definitely not the place to start then. You have to start chipping away and extracting leaf nodes in the dependency tree, and that's what was done with Bio::Graphics.

That will be the issue (and one of the reasons I brought up SearchIO, AlignIO, Tools, etc).

>> Third: the largest issue in my opinion, that being what really constitutes 'core', not just to us but to current bioperl users. To me, the idea of a true 'core' is the bare essentials (Seq, Features, Annotations, and some basic IO modules, the most common interfaces).
>> Should 'core' include SearchIO, or AlignIO? Remote and/or local DB functionality? Bio::Tools? All of those are feasibly independent sets of modules, and I would definitely support those being in their own subdistributions, where it would be easier to fix bugs and release updates, but I may be in the minority as they are extremely popular, and many users still consider them 'core'. We need a workaround for that.
>
> There is no workaround needed. The user types at their cpan prompt: "install Bio::SeqIO" and says 'yes' to follow dependencies. There should be no core. Only dependencies. If we want to give users a convenient abstraction of "BioPerl", the way to do that would be to revisit Bundle::BioPerl (as you say below), or do a Task::BioPerl.
Well, a Task::BioPerl or Bundle::BioPerl would essentially be a workaround. I consider anything to appease long-time users who expect an old-style core a 'workaround', though one might use 'solution' there as well.

> Really, the whole idea of having a "core" is bogus. Somebody doing phylogenetics will say the Tree stuff should be core, because, you know, whatever else would you use BioPerl for anyway, but me, who runs genome annotation pipelines and data handling, does not give a hoot about trees. At least not right now. So you can go round and round arguing about what should and should not be in core, and you will never come to a set of modules that satisfies even "most researchers'" needs unless you have a huge, unmaintainable monolithic distribution, which, as has been demonstrated, is not a good idea.

I don't agree. I do think there is a 'core' set of modules (Bio::Root, if you want to take the most extreme point of view, would represent the purest core set of modules). Most larger projects define a core set. Perl itself. Moose as well; they have had recent discussions about adapting AttributeHelpers to Moose core:

http://thread.gmane.org/gmane.comp.lang.perl.moose/890

The difference between Moose and BioPerl is Moose has effectively preempted the large distribution issue with MooseX::*, which goes with its own versioning. However, for the tons of MooseX::*, there will always be one Moose 'core' set of modules.

Conversely (and my point with the question), bioperl's core was never truly defined as 'this is the base set of modules, everything else is a separate distribution', and therefore it has grown to an almost unmaintainable proportion. Essentially we're the reverse of Moose, having to deal with splitting up a very large core into more maintainable bits. I think it's possible, but it won't be easy w/o having some way of bundling the whole lot together.
>> Finally (a wrap-up of bits and pieces):
>> a) how are the various bio-* packages to be maintained? Would there be several release pumpkins, one for each release?
>> b) How do we sort out versioning? For instance, would bio-foo have a separate version (like Bio::Graphics now does) and require a specific core version?
>> c) I'm sure I have forgotten a few things, but I've rambled on enough already.
>
> Each distribution would be versioned and released independently. Perhaps they could all start out at version 1.6.

That's what I'm thinking as well, at least for the modules split out of core. Anything else could have its own (hopefully sane) versioning. That would be left up to the developer.

> If there is a change in one module that breaks something in another distro (which of course should not be done lightly) it's the responsibility of the other distro's maintainer to fix it or code around it or pin it down with a specific version number dependency in its Build.PL, or whatever. Finding and characterizing these interactions is what automated testing is for, and why it's built into CPAN.

Yes.

>>
>
> Grrr!!! No breather! (just kidding)
>
>> Now, my suggestions. We have settled on a general layout, so...
>> * Each subdistribution would have a separate version and require a specific core (Bio::Root::Root) version. Note that Bio::Graphics is using a different versioning scheme than BioPerl, but we may want to stick to a similar tripartite numbering scheme as for core. Whatever happens, this must be decided on first, as there will be no turning back.
>> * We repurpose Bundle::BioPerl (or a similar Bundle::* package) or make the BioPerl distribution itself a bundle-like installation. This would be for packaging up an old-style 'everything and the kitchen sink' core package from the various distributions.
>> Anytime we split off something into its own distribution we release a newly trimmed-down core and add the new distribution to the bundle or BioPerl. Refer everyone to install the bundle if they want the old-style installation.
>> * Other current subdistributions (run, db, network, etc) follow the same pattern as the above. Releases for non-core distributions do not have to be tied together with core except where needed.
>> * Avoid any circular dependencies (Bio::ASN1::EntrezGene, I'm staring at you).
> (Is there any point in staring into its dead sunken eye sockets? It was last released in 2005. Need to remove this dependency, rewriting the module in question if necessary.)
>> * As you mention, work these out on branches to test things out.
>
> The above is all exactly right. The proposed layout of the distributions is the only thing that's wrong. They need to be much smaller, more focused, and thus more maintainable.

Yes, I agree. However a large set of modules in bioperl were effectively donated by the author, so they will fall to the core devs to maintain by sheer property of legacy.

>> And finally, and I am saying this with the utmost respect and sincerest thanks for everything Sendu is doing and has done for BioPerl, but I'm not convinced we should keep using Bio::Root::Build. It does make some things convenient, but at the cost of additional bugs (2-3 at last count), some API breakage (some methods conflict with Module::Build), and a bit of a chicken-and-egg dilemma that particularly impacts subdistributions (attempting to fall back to Module::Build doesn't work due to API issues). I can elaborate on that more if asked, but I think this post is already long enough, so I'll leave that to later.
>
> Yes, please elaborate on that more. I want to know.
On bugs:

http://bugzilla.open-bio.org/show_bug.cgi?id=2792
http://bugzilla.open-bio.org/show_bug.cgi?id=2831
http://bugzilla.open-bio.org/show_bug.cgi?id=2859
http://bugzilla.open-bio.org/show_bug.cgi?id=2832 (this one is more a TODO)

Note that the author of Bio::Root::Build hasn't touched these, so my inclination is to convert over to plain ol' Module::Build.

On API and the 'chicken-or-egg' issue:

Several methods within Bio::Root::Build override Module::Build methods but break API, in that they accept, generate, or process different (sometimes bioperl-specific) data than what the same Module::Build methods expect. I think 'requires' and 'recommends' fall into this category, as well as some meta data generation, such as META.yaml and PPM. Other bits are more akin to syntactic sugar (automated installation via CPAN, network checking, etc). This may cause bugs as noted above, which goes to demonstrate that too much 'sugar' can send you into a coma ;>

It also causes a bit of a 'chicken-or-egg' issue with subdistributions wanting to use Bio::Root::Build, in that one has to check for the presence of Bio::Root::Build first and then completely bail if it isn't present. One can't fall back to Module::Build due to the API difference. I have run into this when releasing bioperl-run and the others.

What I would like is to have the various breakaway Bio::* either fall back to Module::Build if Bio::Root::Build isn't present, or just use Module::Build. My suggestion is to just use Module::Build directly, but we could scale down Bio::Root::Build to respect the Module::Build API (thus allowing it as a fallback).

> Such progress.
>
> Seems like now we just need to get everyone to agree that distributions need to be small and focused.
>
> Right?
>
> Rob

I think most devs are on board with this, as long as we have some way of *easily* collecting the various bits into a larger whole.
We do get a ton of first-time programmers on this list, probably more similar to what is seen with the perl users list as opposed to the moose list. Anyway, bundling should solve this.

chris

From rmb32 at cornell.edu Fri Jul 17 05:08:18 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 17 Jul 2009 02:08:18 -0700
Subject: [Bioperl-l] bioperl reorganization
In-Reply-To:
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu>
Message-ID: <4A603F82.9020202@cornell.edu>

Chris Fields wrote:
> Yes, I agree. However a large set of modules in bioperl were effectively donated by the author, so they will fall to the core devs to maintain by sheer property of legacy.

This is a very sticky point. The only way I can think of would be to have each distro have a "principal maintainer", that is the go-to guy for issues related to keeping it running, but can beg and cajole others to help. At least there will be fewer problems per distribution, since they would be smaller. If a maintainer has to stop, he has to find somebody else to do it, or the package sits there and bit rot sets in. That's just how it goes. If it's important enough (like if it's depended on by a dist that IS maintained), somebody will pick it up.

> On bugs:

> On API and the 'chicken-or-egg' issue:

> What I would like is have the various breakaway Bio::* either fall back to Module::Build if Bio::Root::Build isn't present, or just use Module::Build. My suggestion is to just use Module::Build directly, but we could scale down Bio::Root::Build to respect the Module::Build API (thus allowing it as a fallback).

I'm not sure about this, I'm not an expert on the ins and outs of subclassing Module::Build.

One idea I do have, however, is that we might think about using an xt/ directory for intensive and network-based tests that are not meant to be run by automated installers, which could help simplify the test and build code.
I've heard that this is a pretty common practice in other projects.

=====================

Anyway, let's develop some concrete plans. I would say that the plan at http://www.bioperl.org/wiki/Proposed_core_modules_changes is a half-measure, in light of the successful (painless?) Bio::Graphics extraction.

Here's a new proposal:

1.) renew/construct the Bundle/Task::Bioperl, get it pulling in all the current Bioperl modules as dependencies (or however it works)

2.) start repeating the same extraction procedure used with Bio::Graphics:
* identify a candidate set of modules in bioperl-live to be extracted into their own distribution, propose the extraction on the mailing list, get some kind of agreement
* make a new component in the svn repository (alongside the bioperl-live and other dirs) named something like Bio-Something-Something, with trunk/, branches/, and tags/ subdirs.
* svn cp modules into the new trunk/lib/, tests into trunk/t, scripts into trunk/scripts, and write a Build.PL just like the one Lincoln wrote for Bio::Graphics.
* when the extracted copy looks good, use svn merge to port any changes that happened in trunk to the new extracted modules if necessary and test.
* delete the old copy from bioperl-live/trunk.
* identify a new candidate set of modules, propose on the mailing list, and repeat

2.5) continue releasing 1.6.X bugfix releases while this is going on.

3.) when bioperl-live is down to a truly reasonable core set, (fewer than 10 modules might be a good target), rename it to Bio-Perl-Core, go through a round of testing, and push them all to CPAN at once. Task::BioPerl will have dependencies on the module names, I think, so it will continue to install the same from users' perspectives, it will just be downloading different dists.

4.) repeat steps 1-3 with bioperl-run, and maybe others.

Thoughts? If people like it, I or somebody else could put it on the wiki.

And of course, I volunteer to put in a lot of work on this.
I'll try to see if I can identify some other likely extraction candidates as a preliminary step and report back to the list.

Also we need some more people besides just me and Chris talking and thinking about this, these are large reshufflings being proposed.

Rob

From e.osimo at gmail.com Fri Jul 17 08:49:36 2009
From: e.osimo at gmail.com (Emanuele Osimo)
Date: Fri, 17 Jul 2009 14:49:36 +0200
Subject: [Bioperl-l] Getting genomic coordinates for a list of genes
Message-ID: <2ac05d0f0907170549td482271ra7ea77bdfe43ee27@mail.gmail.com>

Hello everyone,
I'm new to programming, I'm a biologist, so please forgive my ignorance, but I've been trying this for 2 weeks, now I have to ask you.
I'm trying the script I found at
http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates
because I need to have some variables (like $from and $to) assigned to the start and end of a gene.
The script works fine, but gives me the wrong coordinates: for example if I try it with the gene 842 (CASP9), it prints:

NT_004610.19 2498878 2530877

I found out that in Entrez, for each gene (for CASP9, for example, at
http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq
) under "Genome Reference Consortium Human Build 37 (GRCh37), Primary_Assembly" there are two different sets of coordinates. The first is called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37), Primary_Assembly", and is the one I need, and the second one is called just "NT_004610.19" and it's the one that the script prints. This is valid for all the genes I tried.
Do you know how to make the script print the "right" coordinates (at least, the one I need)?
Thanks a lot in advance,
Emanuele

From jason at bioperl.org Fri Jul 17 13:01:14 2009
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 17 Jul 2009 10:01:14 -0700
Subject: [Bioperl-l] bioperl reorganization
In-Reply-To: <4A603F82.9020202@cornell.edu>
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu>
Message-ID:

Will try to weigh in more, a little bit of stream of consciousness to let you know I'm thinking about it. Tough summer to focus much on this.

It's too bad we are apparently the laughing stock of Perl gurus, but it would be great to see how to modernize aspects of the development.

I'm curious how it will work that we'll have dozens of separate distros that we'll have a hard time keeping track of what directory things are in? Will there have to be a master list of what version and what modules are in what distro now?

When I do a SVN (or git) checkout do I need to checkout each of these in its own directory? Or will there be a master packaging script that makes the necessary zip files for CPAN submission?

If they are in separate directories are we organizing by conceptual topic (phylogenetics, alignment, database search) or by namespace of the modules? Do all the 'database' modules live together - probably not - so do we name bioperl-db-remote bioperl-db-local-index, bioperl-db-local-sql, etc? really bioperl-db is somewhat focused on sequences and features, but what about things that integrate multiple data types - like biosql?

If they are in separate directories, what about all the test data that might be shared, is this replicated among all the sub-directories - how do we do a good job keeping that up to date, could we have a test-data distro instead with symlinks within SVN?

For some other obvious modules that can be split off and self-contained, each of these could be a package.
I would estimate more than 20 packages depending on how Bio::Tools are carved up.

- I think Bio::DB::SeqFeature needs to be split off for sure this is a nice logical peeling off. Could be another test case since it is a Gbrowse dependency.
- Bio::DB::GFF as well for the same reasons.
- Bio::PopGen - self contained for the most part, but depends on Bio::Tree and Bio::Align objects
- Bio::Variation
- Bio::Map and Bio::MapIO
- Bio::Cluster and Bio::ClusterIO
- Bio::Assembly
- Bio::Coordinate

My nightmare is that we're going to have to manage a lot of 'use XX 1.01' enforcing version requiring when dealing with the dependencies on the interface classes and having to keep these all up to date? The version was implicit when they are all part of the same big distro.

Also the splits need not only include one namespace if need be I guess but we have generally grouped things by namespace.

What do you want to do about the bioperl-run. Do we make a set of parallel splits from all of these? I think at the outset we need to coordinate the applications supported here in some sort of loose ontology - the namespaces were not consistently applied so we have some alignment tools in different directories, etc. So the namespace sort of classifies them but it could be better. One of the challenges of multiple developers without a totally shared vision on how it should be done.

I'm not convinced that the Bio::Graphics splitoff has been painless so we should take stock of how that is working.

It seems like this split off would be a way to better streamline things in bioperl so that modern versions of bioperl might be able to better interface with things like Ensembl again too.

How much of this effort is worth triaging on the current code versus the efforts we want to make on a cleaner, simpler bioperl system that appears to scare so many users (and potential developers) off.

Okay I rambled, hope that was helpful.
-jason -- Jason Stajich jason at bioperl.org On Jul 17, 2009, at 2:08 AM, Robert Buels wrote: > Chris Fields wrote: >> Yes, I agree. However a large set of modules in bioperl were >> effectively donated by the author, so they will fall to the core >> devs to maintain by sheer property of legacy. > > This is a very sticky point. The only way I can think of would be > to have each distro have a "principal maintainer", that is the go-to > guy for issues related to keeping it running, but can beg and cajole > others to help. At least there will be fewer problems per > distribution, since they would be smaller. If a maintainer has to > stop, he has to find somebody else to do it, or the package sits > there and bit rot sets in. That's just how it goes. If it's > important enough (like if it's depended on by a dist that IS > maintained), somebody will pick it up. > >> On bugs: > >> On API and the 'chicken-or-egg' issue: > >> What I would like is have the various breakaway Bio::* either fall >> back to Module::Build if Bio::Root::Build isn't present, or just >> use Module::Build. My suggestion is to just use Module::Build >> directly, but we could scale down Bio::Root::Build to respect the >> Module::Build API (thus allowing it as a fallback). > I'm not sure about this, I'm not an expert on the ins and outs of > subclassing Module::Build. > > One idea I do have, however, is that we might think about using an > xt/ directory for intensive and network-based tests that are not > meant to be run by automated installers, which could help simplify > the test and build code. I've heard that this is a pretty common > practice in other projects. > > ===================== > > Anyway, let's develop some concrete plans. I would say that the plan > at http://www.bioperl.org/wiki/Proposed_core_modules_changes is a > half-measure, in light of the successful (painless?) Bio::Graphics > extraction. > > Here's a new proposal: > > 1.) 
renew/construct the Bundle/Task::Bioperl, get it pulling in all > the current Bioperl modules as dependencies (or however it works) > > 2.) start repeating the same extraction procedure used with > Bio::Graphics: > * identify a candidate set of modules in bioperl-live to be > extracted into their own distribution, propose the extraction on the > mailing list, get some kind of agreement > * make a new component in the svn repository (alongside the bioperl- > live and other dirs) named something like Bio-Something-Something, > with trunk/, branches/, and tags/ subdirs. > * svn cp modules into the new trunk/lib/, tests into trunk/t, > scripts into trunk/scripts, and write a Build.PL just like the one > Lincoln wrote for Bio::Graphics. > * when the extracted copy looks good, use svn merge to port any > changes that happened in trunk to the new extracted modules if > necessary and test. > * delete the old copy from bioperl-live/trunk. > * identify a new candidate set of modules, propose on the mailing > list, and repeat > > 2.5) continue releasing 1.6.X bugfix releases while this is going on. > > 3.) when bioperl-live is down to a truly reasonable core set, (fewer > than 10 modules might be a good target), rename it to Bio-Perl-Core, > go through a round of testing, and push them all to CPAN at once. > Task::BioPerl will have dependencies on the module names, I think, > so it will continue to install the same from users' perspectives, it > will just be downloading different dists. > > 4.) repeat steps 1-3 with bioperl-run, and maybe others. > > Thoughts? If people like it, I or somebody else could put it on the > wiki. > > And of course, I volunteer to put in a lot of work on this. I'll > try to see if I can identify some other likely extraction candidates > as a preliminary step and report back to the list. > > Also we need some more people besides just me and Chris talking and > thinking about this, these are large reshufflings being proposed. 
>
> Rob
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org

From bix at sendu.me.uk Fri Jul 17 12:54:30 2009
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 17 Jul 2009 17:54:30 +0100
Subject: [Bioperl-l] bioperl reorganization
In-Reply-To:
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu>
Message-ID: <4A60ACC6.6020003@sendu.me.uk>

Chris Fields wrote:
> On Jul 16, 2009, at 2:22 AM, Robert Buels wrote:
>>> And finally, and I am saying this with the utmost respect and sincerest thanks for everything Sendu is doing and has done for BioPerl, but I'm not convinced we should keep using Bio::Root::Build. It does make some things convenient, but at the cost of additional bugs (2-3 at last count), some API breakage (some methods conflict with Module::Build), and a bit of a chicken-and-egg dilemma that particularly impacts subdistributions (attempting to fall back to Module::Build doesn't work due to API issues).
[snip]
> http://bugzilla.open-bio.org/show_bug.cgi?id=2792
> http://bugzilla.open-bio.org/show_bug.cgi?id=2831
> http://bugzilla.open-bio.org/show_bug.cgi?id=2859
> http://bugzilla.open-bio.org/show_bug.cgi?id=2832 (this one is more a TODO)
>
> Note that the author of Bio::Root::Build hasn't touched these, so my inclination is to convert over to plain ol' Module::Build.

Well, it hardly had to be me that had to add CPANPLUS support. And they're all P2 normal and minor bugs. And you never (iirc) encouraged me to solve them to help out with your release. I did offer to help (generally), but you never took me up on that offer. But...
> On API and the 'chicken-or-egg' issue:
>
> Several methods within Bio::Root::Build override Module::Build methods but break API, in that they accept, generate, or process different (sometimes bioperl-specific) data than what the same Module::Build methods expect. I think 'requires' and 'recommends' fall into this category, as well as some meta data generation, such as META.yaml and PPM. Other bits are more akin to syntactic sugar (automated installation via CPAN, network checking, etc). This may cause bugs as noted above, which goes to demonstrate that too much 'sugar' can send you into a coma ;>

... You're right, it's a bit of a mess. For 1.5.2 I felt all the extra stuff that made it easier to install was absolutely required. And the ultimate purpose of Bio::Root::Build (as it's called now) was to make installation easier for everyone. If it makes it harder, and/or if the current maintainer thinks they can deal with any support requests that arise from just using Module::Build directly, then go ahead and do away with it.

But while BioPerl is still monolithic, how will people be able to choose which external dependencies they want to install? That's the question that must be resolved before getting rid of Bio::Root::Build. You'd also need to resolve the network tests issue. And, well, I guess all the other issues that Bio::Root::Build solves.

> It also causes a bit of a 'chicken-or-egg' issue with subdistributions wanting to use Bio::Root::Build, in that one has to check for the presence of Bio::Root::Build first and then completely bail if it isn't present. One can't fall back to Module::Build due to the API difference.

For small sub-distributions that have no optional external dependencies (all of the BioPerl subdists?), they can be changed to just using pure Module::Build, while core retains Bio::Root::Build as long as core is monolithic. (For 1.5.2, the subdists each came bundled with ModuleBuildBioPerl, so I didn't have this issue.)
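
For reference, the "pure Module::Build" route for a small subdistribution could look roughly like the Build.PL sketched below. This is only an illustration under stated assumptions: the distribution name Bio::Something, its file path, and every version number are placeholders that nobody in this thread has proposed. Only the Module::Build constructor arguments themselves are standard.

```perl
#!/usr/bin/perl
# Build.PL for a hypothetical split-out distribution "Bio-Something".
# All names and version numbers here are placeholders for illustration.
use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    module_name       => 'Bio::Something',          # hypothetical module
    dist_version_from => 'lib/Bio/Something.pm',    # $VERSION is read from here
    license           => 'perl',
    requires          => {
        # Pin the core dependency; the minimum version is an assumption.
        'Bio::Root::Root' => '1.006',
    },
    build_requires    => {
        'Test::More' => 0,
    },
);

# Writes the ./Build script that the user then runs.
$build->create_build_script;
```

Installation would then be the usual Module::Build sequence (perl Build.PL, ./Build, ./Build test, ./Build install), with no check for Bio::Root::Build needed before the build starts.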
From cjfields at illinois.edu Fri Jul 17 15:35:33 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 17 Jul 2009 14:35:33 -0500
Subject: [Bioperl-l] [ANNOUNCEMENT] New core developer Mark Jensen
Message-ID: <59E50DA0-E4ED-46D9-BE34-98A2ED5DC06B@illinois.edu>

All,

I am pleased to announce Mark Jensen is joining us as a core developer! Mark has contributed significant code enhancements (Bio::DB::HIV among them), and has made several critical bug fixes, among them refactors to Bio::Restriction and Bio::Tree. Mark is currently mentoring Chase Miller as part of the Google Summer of Code with NESCent, integrating NeXML parsing into BioPerl.

On behalf of the bioperl developer team, Mark, welcome!

chris

From jay at jays.net Fri Jul 17 15:55:38 2009
From: jay at jays.net (Jay Hannah)
Date: Fri, 17 Jul 2009 14:55:38 -0500
Subject: [Bioperl-l] bioperl reorganization
In-Reply-To:
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu>
Message-ID: <4A60D73A.8030706@jays.net>

Jason Stajich wrote:
> I'm curious how it will work that we'll have dozens of separate distros that we'll have a hard time keeping track of what directory things are in? Will there have to be a master list of what version and what modules are in what distro now?
>
> When I do a SVN (or git) checkout do I need to checkout each of these in its own directory? Or will there be a master packaging script that makes the necessary zip files for CPAN submission?

Perhaps my Catalyst experience would be a useful addition to this discussion. Catalyst is a popular web framework composed of dozens of CPAN distributions.

http://www.catalystframework.org/

Users install Catalyst (cpan Catalyst), which is everything a user needs to build a basic website.
The list of classes the user just installed is here:

http://search.cpan.org/~flora/Catalyst-Runtime-5.80007/

Which lives in SVN here:

http://dev.catalyst.perl.org/repos/Catalyst/Catalyst-Runtime/

As each user finds additional shiny things relevant to them on CPAN (Catalyst::* e.g. Catalyst::Plugin::FillInForm), they install those, individually (cpan Catalyst::Plugin::FillInForm).

All Catalyst::* distributions live in the same SVN repository, as entirely independent, ready-to-ship CPAN distributions:

http://dev.catalyst.perl.org/repos/Catalyst/
http://dev.catalyst.perl.org/repos/Catalyst/trunk/

So, as a new or veteran developer, when I find a bug in Catalyst::Plugin::FillInForm I patch it in SVN

http://dev.catalyst.perl.org/repos/Catalyst/Catalyst-Plugin-FillInForm/trunk/

and then, like any other CPAN distribution, I prep and push that distribution to PAUSE.

-make my code changes-
-vi Changes-
-vi lib/Catalyst/Plugin/FillInForm.pm, increment VERSION-
svn diff
svn commit
perl Makefile.PL
make
make test
make manifest
make dist
make disttest
ftp Catalyst-Plugin-FillInForm-0.11.tar.gz to pause.cpan.org:/incoming

That's it. I just upgraded Catalyst::Plugin::FillInForm from 0.10 to 0.11.

There is no "master list of what version and what modules are in what distro now". CPAN itself is that resource.

Bottom line, small parts of Catalyst are pushed out to CPAN *every day*. Very cool. Shocking when compared to the BioPerl release history on CPAN.

(Catalyst::Plugin::FillInForm happens to use Module::Install. But another author may prefer ExtUtils::MakeMaker, or Dist::Zilla, or Module::Build, or whatever. Each* Catalyst:: is an independent distribution that is free to shift slowly, or quickly, over time as developer interest dictates.)

(* Each meaning "tiny, highly inter-relevant groups of classes.")

Large, seismic shifts in Catalyst itself (Catalyst-Runtime) are a new branch in SVN, that can take a few months.
Like this year's total reworking of Catalyst to use Moose internally (the move from the 5.70 branch to the 5.80 branch). But "total reworkings" of Catalyst can and do continue to happen because the "Catalyst" distribution (Catalyst-Runtime) is independent from the dozens of other great Catalyst:: packages available on CPAN. So Catalyst:: is a loose federation of cooperative modules on CPAN tied together by namespace and the API of Catalyst-Runtime. > If they are in separate directories are we organizing by conceptual > topic (phylogenetics, alignment, database search) or by namespace of > the modules? Do all the 'database' modules live together - probably > not - so do we name bioperl-db-remote bioperl-db-local-index, > bioperl-db-local-sql, etc? really bioperl-db is somewhat focused on > sequences and features, but what about things that integrate multiple > data types - like biosql? In the Catalyst development model CPAN namespace (package name), the SVN path, and distribution name are all the same. (Hopefully namespaces somewhat match conceptual topics. -grin-) > If they are in separate directories, what about all the test data that > might be shared, is this replicated among all the sub-directories - > how do we do a good job keeping that up to date, could we have a > test-data distro instead with symlinks within SVN? I don't believe Catalyst packages ever share test data. Is there lots of re-use of large amounts of test data by what should be separate distributions in BioPerl? I'm not familiar with SVN symlinks. I don't think Catalyst SVN has any. ( 14:33 <@t0m> jhannah: you mean svn:externals, and yes, it's used by a load of the engines to steal the TestApp from -Runtime 14:34 <@t0m> I'd be more tempted to make the test data it's own dist if that's sane. 
) > My nightmare is that we're going to have to manage a lot of 'use XX > 1.01' enforcing version requiring when dealing with the dependencies > on the interface classes and having to keep these all up to date? The > version was implicit when they are all part of the same big distro. Catalyst::Plugin::FillInForm has this in its Makefile.PL: requires 'Catalyst' => '5.7012'; CPAN then enforces and auto-installs dependencies for the users. Like the rest of CPAN, Catalyst lets CPAN enforce dependencies. Doesn't that render most 'use XX 1.01' statements obsolete? > Also the splits need not only include one namespace if need be I guess > but we have generally grouped things by namespace. I believe all Catalyst distributions are *very* cleanly split on namespace. I imagine not doing so would be a nightmare. > I'm not convinced that the Bio::Graphics splitoff has been painless so > we should take stock of how that is working. I'd like to hear about any pain so I could compare to Catalyst... > How much of this effort is worth triaging on the current code versus > the efforts we want to make on a cleaner, simpler bioperl system that > appears to scare so many users (and potential developers) off. One of the amazing things that happen in Catalyst and Moose frequently is that random people wander into irc.perl.org #catalyst or #moose and say "this is broke". If they seem to be clued then they get an SVN (or git) commit bit (on the specific directory of that distribution) after submitting a single patch, and then become CPAN co-maintainers of that package after the second patch. Soon they're improving that part of CPAN on their own. The risk to the community is mitigated by the fact that even if jhannah breaks Catalyst::Plugin::FillInForm 0.11, most Catalyst users don't use that specific Plugin anyway. Also, CPAN has copies of the 4 previous versions of C::P::F sitting around all over the world for people to fall back to.
This leaves the hard-core Catalyst developers free to improve the central engine, rather than forcing them to focus on POD patches to the 500 peripheral bits all the time. Catalyst and Moose are amazingly delegated. It's hard not to end up with commit bits and CPAN co-maint of their small distributions when you express good ideas. The small/rare bits get fixed by the few people that care. The wizards focus on the big picture changes. ... I know all this is already happening in BioPerl SVN. It'd just be great if it happened on CPAN too. Since I've been using bioperl-live directly for years I really haven't cared about CPAN release schedules. But if you're not a BioPerl developer, you probably pull CPAN only. > Okay I rambled, hope that was helpful. Ditto!! Only worse rambling, and probably less helpful. :) Jay Hannah http://bioperl.org/wiki/User:Jhannah From rmb32 at cornell.edu Fri Jul 17 17:23:01 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 17 Jul 2009 14:23:01 -0700 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> Message-ID: <4A60EBB5.4010004@cornell.edu> I was going to write a longer post, but Jay wrote everything I was going to write, plus more, and did a better job. Jason Stajich wrote: > For some other obvious modules that can be split off and self-contained, > each of these could be a package. I would estimate more than 20 > packages depending on how Bio::Tools are carved up. > - I think Bio::DB::SeqFeature needs to be split off for sure this is a > nice logical peeling off. Could be another test case since it is a > Gbrowse dependancy. > - Bio::DB::GFF as well for the same reasons. 
> - Bio::PopGen - self contained for the most part, but depends on > Bio::Tree and Bio::Align objects > - Bio::Variation > - Bio::Map and Bio::MapIO > - Bio::Cluster and Bio::ClusterIO > - Bio::Assembly > - Bio::Coordinate Oh, this is a nice list. > What do you want to do about the bioperl-run. Do we make a set of > parallel splits from all of these? I think at the outset we need to > coordinate the applications supported here in some sort of loose > ontology - the namespaces were not consistently applied so we have some > alignment tools in different directories, etc. So the namespace sort of > classifies them but it could be better. One of the challenges of > multiple developers without a totally shared vision on how it should be > done. I would say that all alignment tools (for example) should probably not all go into the same distribution. For example if Alice wrote some alignment thing and Bob wrote some other thing, but they're not really related beyond the fact that they do similar things and possibly depend on similar things, they should go in separate distributions. > I'm not convinced that the Bio::Graphics splitoff has been painless so > we should take stock of how that is working. Yes, lets. I would like to hear more about that. > It seems like this split off would be a way to better streamline things > in bioperl so that modern versions of bioperl might be able to better > interface with things like Ensembl again too. Once things are less monolithic, developing and releasing *should* be a LOT easier. As Jay also mentioned a bit, it's more like on Tuesday Charlie notices a bug in Bio::Foo::Bar, fixes it. Pushes it to CPAN (with a small version bump) immediately afterward. Users pick it up via Task::BioPerl. That's it. Or, how about a slightly longer case study: Say on Wednesday Charlie notices that the design of Bio::Foo::Bar sucks and it really needs some work. 
He codes furiously for however long it takes, makes Bio::Fooer::Bar or something like that, in a new distribution, and pushes it to CPAN. Initially, no other modules are going to be using it, but then say Jason, the maintainer of Bio::SeqIO::fasta, notices that hey, Bio::Fooer::Bar is a lot better than Bio::Foo::Bar. Then he can just use it, test his new Bio::SeqIO::fasta with it, put it in his dist's Build.PL as a dependency, and push to CPAN. Now it's getting pulled in with Task::BioPerl and *USERS* now have been given that improvement, probably in only a matter of days. There are automated tests at every step of the process to ensure quality throughout. Or for larger changes, coordination among several distros may be necessary, but the nice thing is, exactly which ones those are is codified in all their Build.PL files! Much less guessing and worrying about unintended consequences. Things are abstracted into smaller chunks, which are much easier for developers to wrap their minds around, which means developing is easier, which leads to more contributors and accelerated development. > How much of this effort is worth triaging on the current code versus the > efforts we want to make on a cleaner, simpler bioperl system that > appears to scare so many users (and potential developers) off. If there were not so many person-years of development time already in BioPerl, I would probably be pushing for ground-up rewrite to simplify things. But as chromatic frequently says (he's fantastic, look him up), ground-up rewrites of large projects almost never work. You lose a year (or multiple years) of person time rewriting instead of adding features, or if you also add features to the old version in parallel, you have to also port those features to the new version (over a really long time period). It's theoretically possible to do, but in practice it almost never works, he says. I don't know, I've never been involved in an attempt like that from start to finish. 
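Going back to the Bio::Fooer::Bar case study above, here is a rough sketch of what "put it in his dist's Build.PL as a dependency" could look like with plain Module::Build. Everything specific in it is a hypothetical placeholder taken from the example: there is no real Bio-SeqIO-fasta distribution or Bio::Fooer::Bar module, and the version numbers and author address are made up. Only the Module::Build calls themselves (new, create_build_script) are the real API.

```perl
#!/usr/bin/perl
# Build.PL for a hypothetical split-off distribution (sketch only).
use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    # Hypothetical distribution and module names from the case study.
    module_name   => 'Bio::SeqIO::fasta',
    dist_name     => 'Bio-SeqIO-fasta',
    dist_version  => '0.02',                  # made-up version
    dist_abstract => 'FASTA sequence I/O (illustrative placeholder)',
    dist_author   => 'A. Maintainer <maintainer@example.org>',
    license       => 'perl',

    # Runtime dependencies. CPAN clients read these from the generated
    # metadata and install them before this distribution.
    requires => {
        'perl'            => '5.6.1',
        'Bio::Fooer::Bar' => '0.01',          # hypothetical replacement module
    },

    # Needed only to build and test, not at runtime.
    build_requires => {
        'Test::More' => 0,
    },
);

# Writes the ./Build script used for "./Build", "./Build test", "./Build dist".
$build->create_build_script;
```

Running `perl Build.PL` with this file should only warn that Bio::Fooer::Bar is missing (since it does not exist) and still write the Build script. The point is that the minimum version of each dependency lives in one place per distribution, rather than in scattered 'use XX 1.01' lines.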
> Okay I rambled, hope that was helpful. Quite helpful! Please keep it up if you can! Rob From jay at jays.net Fri Jul 17 18:49:28 2009 From: jay at jays.net (Jay Hannah) Date: Fri, 17 Jul 2009 17:49:28 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <4A60EBB5.4010004@cornell.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> <4A60EBB5.4010004@cornell.edu> Message-ID: <4A60FFF8.3030302@jays.net> Robert Buels wrote: > Once things are less monolithic, developing and releasing *should* be > a LOT easier. As Jay also mentioned a bit, it's more like on Tuesday > Charlie notices a bug in Bio::Foo::Bar, fixes it. Pushes it to CPAN > (with a small version bump) immediately afterward. Users pick it up > via Task::BioPerl. That's it. Hmm... In the Catalyst model, users never get a new copy of Bio::Foo::Bar unless they explicitly install it. Typically, a user is perfectly happy with their pretty-out-of-date copy of Bio::Foo::Bar sitting on their server. It works for them, so they don't care. The big difference for the typical user, I think, is that when they go to install a new server, grabbing the list of things they care about from CPAN, what they're getting is current up to TODAY, instead of months/years old. Like I said, I'm a bioperl-live addict, so haven't cared about CPAN being current. But I'm blindly guessing that 95% of our customers install whatever is sitting on CPAN right now. (That's certainly how the rest of the Perl universe works.) A shame that our customers continually don't benefit from all the recent hard work. > Or, how about a slightly longer case study: > Say on Wednesday Charlie notices that the design of Bio::Foo::Bar > sucks and it really needs some work. 
He codes furiously for however > long it takes, makes Bio::Fooer::Bar or something like that, in a new > distribution, and pushes it to CPAN. Initially, no other modules are > going to be using it, but then say Jason, the maintainer of > Bio::SeqIO::fasta, notices that hey, Bio::Fooer::Bar is a lot better > than Bio::Foo::Bar. Then he can just use it, test his new > Bio::SeqIO::fasta with it, put it in his dist's Build.PL as a > dependency, and push to CPAN. Now it's getting pulled in with > Task::BioPerl and *USERS* now have been given that improvement, > probably in only a matter of days. There are automated tests at every > step of the process to ensure quality throughout. Yup. Every dist can declare its dependency stack with every release. If Bio::Foo::Bar is abandoned by all distributions, a new copy of that dist is flagged DEPRECATED ("in favor of Bio::Fooer::Bar"), and pushed to CPAN. That clues everyone in that development has stopped and where they should go instead. For example: http://search.cpan.org/~mramberg/Catalyst-Plugin-FormValidator-0.03/ > Or for larger changes, coordination among several distros may be > necessary, but the nice thing is, exactly which ones those are is > codified in all their Build.PL files! Much less guessing and worrying > about unintended consequences. Things are abstracted into smaller > chunks, which are much easier for developers to wrap their minds > around, which means developing is easier, which leads to more > contributors and accelerated development. Ya. Two years ago there's no way I would have dared to change Catalyst. But changing Catalyst::Foo::Bar::Baz was far less intimidating and I was happy to submit a patch. That's how they hooked me, and they've had me ever since. Then Moose got me, the exact same way. -laugh- -sigh- -grin- > ground-up rewrites of large projects almost never work. Ya, I wouldn't recommend a big bang approach. (Until BioPerl6?)
The whole idea is to turn the whole thing into lots of little bangs. :) Jason's list of targets is exciting! (Where's the Bio::Graphics SVN repo?) Anyhoo, I'll stop preaching to the choir now. Jay Hannah http://bioperl.org/wiki/User:Jhannah From rmb32 at cornell.edu Fri Jul 17 20:36:49 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 17 Jul 2009 17:36:49 -0700 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <4A60FFF8.3030302@jays.net> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> <4A60EBB5.4010004@cornell.edu> <4A60FFF8.3030302@jays.net> Message-ID: <4A611921.60100@cornell.edu> Jay Hannah wrote: > Jason's list of targets is exciting! (Where's the Bio::Graphics SVN repo?) lol. Funny you should ask. http://sourceforge.net/mailarchive/forum.php?thread_name=4A3BC600.7060304%40cornell.edu&forum_name=gmod-devel Bio-Graphics is in the GMOD CVS repo. http://gmod.cvs.sourceforge.net/gmod/ Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Fri Jul 17 21:31:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 17 Jul 2009 20:31:29 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <4A603F82.9020202@cornell.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> Message-ID: <757C3DF2-B6CD-4120-9775-B66B0701351E@illinois.edu> On Jul 17, 2009, at 4:08 AM, Robert Buels wrote: > Chris Fields wrote: >> Yes, I agree. However a large set of modules in bioperl were >> effectively donated by the author, so they will fall to the core >> devs to maintain by sheer property of legacy. 
> > This is a very sticky point. The only way I can think of would be > to have each distro have a "principal maintainer", that is the go-to > guy for issues related to keeping it running, but can beg and cajole > others to help. At least there will be fewer problems per > distribution, since they would be smaller. If a maintainer has to > stop, he has to find somebody else to do it, or the package sits > there and bit rot sets in. That's just how it goes. If it's > important enough (like if it's depended on by a dist that IS > maintained), somebody will pick it up. Just so this isn't misunderstood, much of that code is fairly stable so I don't think it will be a significant problem, and it can be addressed at a later point. I think if we trim off enough of the current distribution the issue won't matter in the long term. I do think any legacy code will have to fall to the core devs for the primary reason that if bit rot does set in (and no one is maintaining critical modules) we can easily switch maintainers, fix bugs, and drop a CPAN release. We have a bioperl-specific account on CPAN that makes it easy. All of the code is currently under that name anyway so it might as well stay there for the time being. >> On bugs: > >> On API and the 'chicken-or-egg' issue: > >> What I would like is have the various breakaway Bio::* either fall >> back to Module::Build if Bio::Root::Build isn't present, or just >> use Module::Build. My suggestion is to just use Module::Build >> directly, but we could scale down Bio::Root::Build to respect the >> Module::Build API (thus allowing it as a fallback). > I'm not sure about this, I'm not an expert on the ins and outs of > subclassing Module::Build. > > One idea I do have, however, is that we might think about using an > xt/ directory for intensive and network-based tests that are not > meant to be run by automated installers, which could help simplify > the test and build code. 
I've heard that this is a pretty common > practice in other projects. That's a possibility. I have already started towards a few of those bug fixes, but I would rather they be *Module::Build* bugs, not bioperl ones (i.e. if we go with their API, it should be their bug ;) > ===================== > > Anyway, let's develop some concrete plans. I would say that the plan > at http://www.bioperl.org/wiki/Proposed_core_modules_changes is a > half-measure, in light of the successful (painless?) Bio::Graphics > extraction. > > Here's a new proposal: > > 1.) renew/construct the Bundle/Task::Bioperl, get it pulling in all > the current Bioperl modules as dependencies (or however it works) > > 2.) start repeating the same extraction procedure used with > Bio::Graphics: > * identify a candidate set of modules in bioperl-live to be > extracted into their own distribution, propose the extraction on the > mailing list, get some kind of agreement > * make a new component in the svn repository (alongside the bioperl- > live and other dirs) named something like Bio-Something-Something, > with trunk/, branches/, and tags/ subdirs. > * svn cp modules into the new trunk/lib/, tests into trunk/t, > scripts into trunk/scripts, and write a Build.PL just like the one > Lincoln wrote for Bio::Graphics. > * when the extracted copy looks good, use svn merge to port any > changes that happened in trunk to the new extracted modules if > necessary and test. > * delete the old copy from bioperl-live/trunk. > * identify a new candidate set of modules, propose on the mailing > list, and repeat We may have to think a bit outside of just namespaces alone. Some (like EUtilities) are present in more than one. These would also have to be in line with what others want (so now's the time to chime in). 
If going strictly on namespace, these may be easiest:
* Assembly
* Biblio
* Cluster/ClusterIO
* Coordinate
* Draw (modules in the Graphics namespace that weren't related to Bio::Graphics)
* Expression
* Map
* Matrix
* Microarray
* Restriction
* MolEvol
* Phenotype
* PhyloNetwork
* PopGen
* SeqEvolution
* Structure
* Symbol (may be deprecated)
* Taxonomy (is deprecated, so don't bother)
* Variation
These are probably a little trickier:
* Search/SearchIO
* Align/AlignIO
* Index/DB/Das (General)
* Tools (very tricky, as there are several outside requirements)
It's possible (and probably best) that these be grouped by function. MolEvol, Phenotype, PopGen, PhyloNetwork, SeqEvolution, for instance could go into a general evol package. > 2.5) continue releasing 1.6.X bugfix releases while this is going on. Speaking of, I want to push an alpha out in the next week or two for 1.6.x (may be 1.6.2 in order to sync run with the others). > 3.) when bioperl-live is down to a truly reasonable core set, (fewer > than 10 modules might be a good target), rename it to Bio-Perl-Core, > go through a round of testing, and push them all to CPAN at once. > Task::BioPerl will have dependencies on the module names, I think, > so it will continue to install the same from users' perspectives, it > will just be downloading different dists. I don't think it's possible to get it to 10 as there are too many interrelated modules. That is, unless you subscribe to the more extreme core==Root, and we whittle core down to those root-based modules. > 4.) repeat steps 1-3 with bioperl-run, and maybe others. bioperl-db will probably need to stay largely intact, with the possible exception of the DB-specific modules for mysql, pg, oracle, etc. bioperl-network is pretty self-contained as well; it doesn't make much sense to split it up completely. > Thoughts? If people like it, I or somebody else could put it on the > wiki. > > And of course, I volunteer to put in a lot of work on this.
I'll > try to see if I can identify some other likely extraction candidates > as a preliminary step and report back to the list. > > Also we need some more people besides just me and Chris talking and > thinking about this, these are large reshufflings being proposed. > > Rob As mentioned before, I think most are on board. It comes down to exactly how we package these smaller distributions. I don't think we can simply dump individual modules into CPAN; IIRC Sendu corresponded with Andreas König about this and was strongly dissuaded from doing it (focused packages were promoted instead, but maybe Sendu can elaborate more on that). chris From cjfields at illinois.edu Fri Jul 17 21:40:28 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 17 Jul 2009 20:40:28 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <4A60ACC6.6020003@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> Message-ID: On Jul 17, 2009, at 11:54 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Jul 16, 2009, at 2:22 AM, Robert Buels wrote: >>>> And finally, and I am saying this with the utmost respect and >>>> sincerest thanks for everything Sendu is doing and has done for >>>> BioPerl, but I'm not convinced we should keep using >>>> Bio::Root::Build. It does make some things convenient, but at the >>>> cost of additional bugs (2-3 at last count), some API breakage >>>> (some methods conflict with Module::Build), and a bit of a >>>> chicken-and-egg dilemma that particularly impacts >>>> subdistributions (attempting to fall back to Module::Build >>>> doesn't work due to API issues).
> [snip] >> http://bugzilla.open-bio.org/show_bug.cgi?id=2792 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2831 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2859 >> http://bugzilla.open-bio.org/show_bug.cgi?id=2832 (this one is more >> a TODO) >> Note that the author of Bio::Root::Build hasn't touched these, so >> my inclination is to convert over to plain ol' Module::Build. > > Well, it hardly had to be me that had to add CPANPLUS support. And > they're all P2 normal and minor bugs. And you never (iirc) > encouraged me to solve them to help out with your release. I did > offer to help (generally), but you never took me up on that offer. > But... Just to note, most of the issues with Bio::Root::Build popped up after the final core 1.6.0 was released and prior to the release of run/db/ network (how I found out about the inability to fall back to M::B). Regardless... I'm not sure why I am the one that needs to approve fixing something in bioperl-live; I'm the 1.6.x release pumpkin, not the bioperl-live pumpkin. Core developers keep an eye on bioperl-live and svn code. This is just as much a bioperl-live bug as a 1.6.x bug (the bugs were filed fairly soon after the releases, IIRC). If it seriously breaks bioperl's API or can feasibly be merged into 1.6.x, I can cherry-pick around it. I don't see fixes to make Bio::Root::Build compliant with M::B falling into the 'cherry-picking' category, but who knows? Furthermore, again respectfully, but one shouldn't need encouragement to fix this. Bio::Root::Build has very useful methods, but I didn't write that particular code, so I'm really not the best one to debug it. I can certainly try but I can't guarantee how long that will take for obvious reasons. It's a nice way to assign priority but don't hang too much on the 'P2' designation, seeing as Bugzilla automatically defaults new bugs to P2 status (it has to be changed to something else manually).
These are not blockers per se, but they are fairly serious bugs if they cause any installation issues. The CPANPLUS ones are the most troubling (lots of CPAN Testers give an UNKNOWN status due to CPANPLUS issues, possibly due to bad META.yml). >> On API and the 'chicken-or-egg' issue: >> Several methods within Bio::Root::Build override Module::Build >> methods but break API, in that they accept, generate, or process >> different (sometimes bioperl-specific) data than what the same >> Module::Build methods expect. I think 'requires' and 'recommends' >> fall into this category, as well as some meta data generation, such >> as META.yml and PPM. Other bits are more akin to syntactic sugar >> (automated installation via CPAN, network checking, etc). This may >> cause bugs as noted above, which goes to demonstrate that too much >> 'sugar' can send you into a coma ;> > > ... You're right, it's a bit of a mess. For 1.5.2 I felt all the > extra stuff that made it easier to install was absolutely required. > And the ultimate purpose of Bio::Root::Build (as it's called now) > was to make installation easier for everyone. If it makes it harder, > and/or if the current maintainer thinks they can deal with any > support requests that arise from just using Module::Build directly, > then go ahead and do away with it. > > But while BioPerl is still monolithic, how will people be able to > choose which external dependencies they want to install? That's the > question that must be resolved before getting rid of > Bio::Root::Build. You'd also need to resolve the network tests > issue. And, well, I guess all the other issues that Bio::Root::Build > solves. I mentioned two options. The first was to revert back to Module::Build. The second was to have Bio::Root::Build methods comply with the Module::Build API.
It's pretty obvious which one I favor, but I also mentioned we can certainly follow the second option IF sub-distributions have a way to fall back to Module::Build and install core, either if it's not present or isn't at the correct version. Doing so requires Module::Build API compliance. Fixing Bio::Root::Build towards that end would solve the network issue. Robert suggested another option as well (separate out those tests into another directory tree), though thinking about it more I don't think that's immediately tenable with all the network tests mixed in. As for the external dependencies, we're falling into the trap of thinking general users need to install bioperl-live (and thus are using a tarball and 'perl Build.PL'). Everyday users should use CPAN (or PPM when we have that running); devs and advanced users can use bioperl-live. A standard CPAN install should take care of most required dependencies; we should be able to push additional 'dependencies' onto the required queue if the user wants them. If needed we could keep the CPAN sugar to make it a bit easier, but I think it needs to be turned off by default and prompted for, and not run at all when in CPAN/CPANPLUS shell. Those should be easy enough to fix; the latter is a matter of looking for the env. vars PERL5_CPAN_IS_RUNNING and/or PERL5_CPANPLUS_IS_RUNNING (I believe those are correct). >> It also causes a bit of a 'chicken-or-egg' issue with >> subdistributions wanting to use Bio::Root::Build, in that one has >> to check for the presence of Bio::Root::Build first and then >> completely bail if it isn't present. One can't fall back to >> Module::Build due to the API difference. > > For small sub-distributions that have no optional external > dependencies (all of the BioPerl subdists?), they can be changed to > just using pure Module::Build, while core retains Bio::Root::Build > as long as core is monolithic.
> > (For 1.5.2, the subdists each came bundled with ModuleBuildBioPerl, > so I didn't have this issue.) The bugs mainly pertain to bp/M::B API conflicts. We could use either/or if the API was the same, but it would be nice to have some consistency and not have to choose between one or the other (or worse, change from one to the other if a dependency is added). chris From cjfields at illinois.edu Fri Jul 17 23:14:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 17 Jul 2009 22:14:49 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> Message-ID: <9C987542-1B90-4462-9DE9-F88007579ACA@illinois.edu> My 2c... On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote: > Will try to weigh in more, a little bit of stream of consciousness > to let you know I'm thinking about it. Tough summer to focus much > on this. Yes, for me as well. That will change soon (approx two weeks) ;> > It's too bad we are apparently the laughing stock of Perl gurus, but > it would be great to see how to modernize aspects of the development. > > I'm curious how it will work that we'll have dozens of separate > distros that we'll have a hard time keeping track of what directory > things are in? Will there have to be a master list of what version > and what modules are in what distro now? I don't think we're a laughingstock as much as we haven't had the time to dedicate towards this (and much of this occurred at a point early on, with that whole 'Cathedral and Bazaar' esr-based thingy). BTW, those same gurus shouldn't speak: perl core is just as bad and riddled with worse bugs, though rgs and co. wouldn't admit it.
In fact, base.pm itself has a nasty one; I'm surprised no one in the bioperl community has noticed it yet (it's listed as a bug on RT I think):

  pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print $Bio::SeqIO::VERSION."\n"'
  1.0069
  pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print $Bio::Root::IO::VERSION."\n"'
  -1, set by base.pm

Modules loaded via base.pm that don't set their own $VERSION get this placeholder instead. This hasn't become an issue in bioperl yet (it's really an edge case), but several devs have run into this. And really, why set VERSION to a string like '-1, set by base.pm'? Anyway, re: versioning, the way I think about it, if we have a small very stable core with version X, and a focused very stable module group with version Y, other distributions would have a separate version and require subgroup version Y (which would in turn require core version X). CPAN would take care of it. This isn't much different than what occurs everyday on CPAN anyway (Jay's Catalyst, Moose and MooseX, and so on). In fact, several Moose-requiring distributions don't require the latest Moose. > When I do a SVN (or git) checkout do I need to checkout each of > these in its own directory? Or will there be a master packaging > script that makes the necessary zip files for CPAN submission? Not sure; that would be up to us I suppose. I think it would be easier to maintain and release if they were separate or packaged up as Jay suggests. > If they are in separate directories are we organizing by conceptual > topic (phylogenetics, alignment, database search) or by namespace of > the modules? By topic, retaining namespaces. We have a basic Bio::* directory structure already in place for various generic terms (Tools, DB, etc), so I see this crossing simple namespaces very easily. And as I pointed out to Robert, several of those could possibly go together.
> Do all the 'database' modules live together - probably not - so do > we name bioperl-db-remote, bioperl-db-local-index, > bioperl-db-local-sql, etc? Really bioperl-db is somewhat focused on sequences and > features, but what about things that integrate multiple data types - > like biosql? I don't see bioperl-db (BioSQL) being split up. I think it's too intrinsically linked and cohesive (it's almost a separate core unto itself), so it would be counterproductive to do so. Maybe have bioperl-db become bioperl-biosql. Web-based = bioperl-remotedb. Local = bioperl-localdb. OBDA = bioperl-obda. > If they are in separate directories, what about all the test data > that might be shared, is this replicated among all the > subdirectories - how do we do a good job keeping that up to date, could > we have a test-data distro instead with symlinks within SVN? We have to see how much is actually shared and proceed from there. I would like to eventually resurrect the idea of a separate biodata repo that we could just ftp the data from as needed. That would cut down on the package size quite a bit, but I'm not sure how feasible that is from the testing point of view (would we have to skip all tests if there were no network access?). > For some other obvious modules that can be split off and > self-contained, each of these could be a package. I would estimate more > than 20 packages depending on how Bio::Tools are carved up. > - I think Bio::DB::SeqFeature needs to be split off for sure; this is > a nice logical peeling off. Could be another test case since it is > a GBrowse dependency > - Bio::DB::GFF as well for the same reasons. Completely agree (and I think Lincoln would like this as well). > - Bio::PopGen - self contained for the most part, but depends on > Bio::Tree and Bio::Align objects Could list those as a required dependency.
> - Bio::Variation > - Bio::Map and Bio::MapIO > - Bio::Cluster and Bio::ClusterIO > - Bio::Assembly > - Bio::Coordinate > > My nightmare is that we're going to have to manage a lot of 'use XX > 1.01' statements to enforce version requirements when dealing with the dependencies > on the interface classes, and have to keep these all up to date. > The version was implicit when they were all part of the same big > distro. Right. But it also becomes a maintenance problem when serious bugs in one module impede the needed release of others to CPAN. > Also the splits need not only include one namespace if need be I > guess but we have generally grouped things by namespace. > > What do you want to do about bioperl-run? Do we make a set of > parallel splits from all of these? I think at the outset we need to > coordinate the applications supported here in some sort of loose > ontology - the namespaces were not consistently applied so we have > some alignment tools in different directories, etc. So the > namespace sort of classifies them but it could be better. One of > the challenges of multiple developers without a totally shared > vision on how it should be done. We could split bp-run and Tools, pairing the wrappers with the relevant parser modules. Not sure if this can be done with SearchIO as well but it could be tested to see how feasible that would be. > I'm not convinced that the Bio::Graphics splitoff has been painless > so we should take stock of how that is working. Really? Lincoln has made several fixes lately on CPAN, so I thought everything was going well. If anything I would think the lack of additional 1.6.x bioperl releases has probably held GBrowse 2.0 up more due to Bio::DB::SeqFeature (my fault, but as you know life and job take precedence sometimes). > It seems like this split off would be a way to better streamline > things in bioperl so that modern versions of bioperl might be able > to better interface with things like Ensembl again too.
> > How much of this effort is worth triaging on the current code versus > the efforts we want to make on a cleaner, simpler bioperl system > that appears to scare so many users (and potential developers) off. I say triage away on a branch, but we need to indicate which ones to whittle out first. The reason I believe we went for a larger split initially (as indicated on the wiki page) was to push something forward and not get too bogged down in the details. But we may as well go full throttle and do this right away. > Okay I rambled, hope that was helpful. > > -jason > -- > Jason Stajich > jason at bioperl.org Very, very helpful. Now I need a beer. chris From cjfields at illinois.edu Fri Jul 17 23:26:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 17 Jul 2009 22:26:09 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <4A60FFF8.3030302@jays.net> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> <4A60EBB5.4010004@cornell.edu> <4A60FFF8.3030302@jays.net> Message-ID: <66FDE248-4CF8-4F68-91D5-16D0AE30B36E@illinois.edu> (Jay and Robert pretty much sum up what I think as well, so I won't attempt answering all of these)... On Jul 17, 2009, at 5:49 PM, Jay Hannah wrote: > Robert Buels wrote: >> Once things are less monolithic, developing and releasing *should* >> be a LOT easier. As Jay also mentioned a bit, it's more like on >> Tuesday Charlie notices a bug in Bio::Foo::Bar, fixes it. Pushes >> it to CPAN (with a small version bump) immediately afterward. >> Users pick it up via Task::BioPerl. That's it. > > Hmm... In the Catalyst model, users never get a new copy of > Bio::Foo::Bar unless they explicitly install it. > > Typically, a user is perfectly happy with their pretty-out-of-date > copy of Bio::Foo::Bar sitting on their server. It works for them, so > they don't care. 
> > The big difference for the typical user, I think, is that when they > go to install a new server, grabbing the list of things they care > about from CPAN, what they're getting is current up to TODAY, > instead of months/years old. > > Like I said, I'm a bioperl-live addict, so haven't cared about CPAN > being current. But I'm blindly guessing that 95% of our customers > install whatever is sitting on CPAN right now. (That's certainly how > the rest of the Perl universe works.) A shame that our customers > continually don't benefit from all the recent hard work. Actually, I think the problem is, either we have a large set of users using bioperl-live as if it were stable code, or we have users using very old code (1.2.3 due to Ensembl, and 1.4, yes, even after 1.6 was released). Part of this can be blamed (I'm sorry to say) on Ensembl's continued insistence that they are only compatible with bioperl 1.2.3. I would really like a concrete reason for exactly WHY bioperl after 1.2.3 supposedly doesn't work with new Bio::EnsEMBL* code. I have some current scripts that beg to differ, so it can't be that broken, and if it is, we can certainly work together to fix it. But requiring that a certain part of our user community install a version more than 6 years old, with a ton of bugs, does not make me happy. >> Or, how about a slightly longer case study: >> Say on Wednesday Charlie notices that the design of Bio::Foo::Bar >> sucks and it really needs some work. He codes furiously for >> however long it takes, makes Bio::Fooer::Bar or something like >> that, in a new distribution, and pushes it to CPAN. Initially, no >> other modules are going to be using it, but then say Jason, the >> maintainer of Bio::SeqIO::fasta, notices that hey, Bio::Fooer::Bar >> is a lot better than Bio::Foo::Bar. Then he can just use it, test >> his new Bio::SeqIO::fasta with it, put it in his dist's Build.PL as >> a dependency, and push to CPAN.
Now it's getting pulled in with >> Task::BioPerl and *USERS* now have been given that improvement, >> probably in only a matter of days. There are automated tests at >> every step of the process to ensure quality throughout. > > Yup. Every dist can declare its dependency stack with every > release. If Bio::Foo::Bar is abandoned by all distributions, a new > copy of that dist is flagged DEPRECATED ("in favor of > Bio::Fooer::Bar"), and pushed to CPAN. That clues everyone in that > development has stopped and where they should go instead. For example: > > http://search.cpan.org/~mramberg/Catalyst-Plugin-FormValidator-0.03/ Okay, but seems kinda crufty. I do think there is some talk of removing such modules from the active CPAN, as they would always be available as part of BackPAN, but I haven't seen movement along those lines. >> Or for larger changes, coordination among several distros may be >> necessary, but the nice thing is, exactly which ones those are is >> codified in all their Build.PL files! Much less guessing and >> worrying about unintended consequences. Things are abstracted into >> smaller chunks, which are much easier for developers to wrap their >> minds around, which means developing is easier, which leads to more >> contributors and accelerated development. > > Ya. Two years ago there's no way I would have dared to change > Catalyst. But changing Catalyst::Foo::Bar::Baz was far less > intimidating and I was happy to submit a patch. That's how they > hooked me, and they've had me ever since. Then Moose got me, the > exact same way. -laugh- -sigh- -grin- Yes, I have to say it has been very nice with Moose, though I wish MooseX::Declare and MooseX::Method::Signatures would move out of alpha (probably will happen around the first stable release of perl6). >> ground-up rewrites of large projects almost never work. > > Ya, I wouldn't recommend a big bang approach. (Until BioPerl6?) The > whole idea is to turn the whole thing into lots of little bangs.
:) I think bioperl6 will follow suit with a smaller bioperl-specific core (probably simple metaclass, exceptions, etc) and separate smaller focused distributions. Bio::Moose similarly. > Jason's list of targets is exciting! (Where's the Bio::Graphics SVN > repo?) Rob's answered that one: GMOD. > Anyhoo, I'll stop preaching to the choir now. > > Jay Hannah > http://bioperl.org/wiki/User:Jhannah chris From cjfields at illinois.edu Fri Jul 17 23:31:33 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 17 Jul 2009 22:31:33 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <4A60EBB5.4010004@cornell.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> <4A60EBB5.4010004@cornell.edu> Message-ID: <3E2E79A0-D463-489C-8F49-99BF68000683@illinois.edu> On Jul 17, 2009, at 4:23 PM, Robert Buels wrote: > I was going to write a longer post, but Jay wrote everything I was > going to write, plus more, and did a better job. I think both of you made very good arguments. Will have to nickname you guys the IRC Mob. > ... > If there were not so many person-years of development time already > in BioPerl, I would probably be pushing for ground-up rewrite to > simplify things. But as chromatic frequently says (he's fantastic, > look him up), ground-up rewrites of large projects almost never > work. You lose a year (or multiple years) of person time rewriting > instead of adding features, or if you also add features to the old > version in parallel, you have to also port those features to the new > version (over a really long time period). It's theoretically > possible to do, but in practice it almost never works, he says. I > don't know, I've never been involved in an attempt like that from > start to finish. I agree. 
The Bio::Moose stuff is an initial attempt to see if it's worth porting code to Moose (I think it will be, but we'll see). If anything it'll be a port and will simplify the code. bioperl6 is similar in scope, using some concepts we would learn first from Bio::Moose, but with the additional fun of grammar-based parsing. >> Okay I rambled, hope that was helpful. > > Quite helpful! Please keep it up if you can! > > Rob Just don't waste too much time talkin' and not spend enough time codin' chris From cain.cshl at gmail.com Sat Jul 18 08:23:50 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Sat, 18 Jul 2009 08:23:50 -0400 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <9C987542-1B90-4462-9DE9-F88007579ACA@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <9C987542-1B90-4462-9DE9-F88007579ACA@illinois.edu> Message-ID: <2182D83B-D855-48B5-B57B-52F1D0FC78B6@gmail.com> Hi All, I don't want to wade in too deeply, but I like the idea of splitting things up. I think the Bio::Graphics split has gone well and has made life easier in the GBrowse world. I could see Bio::DB::SeqFeature and Bio::DB::GFF being split and either being kept together or going their separate ways (though I have a nagging suspicion that SeqFeature code depends on GFF code in a few places, so it may make sense to just keep them together). And Chris, if it makes you feel any better, I don't think anything you've done or not done has held up GBrowse2. Scott On Jul 17, 2009, at 11:14 PM, Chris Fields wrote: > My 2c... > > On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote: > >> Will try to weigh in more, a little bit of stream of consciousness >> to let you know I'm thinking about it. Tough summer to focus much >> on this. > > Yes, for me as well.
> [...] ----------------------------------------------------------------------- Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From cjfields at illinois.edu Sat Jul 18 09:48:54 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 18 Jul 2009 08:48:54 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <2182D83B-D855-48B5-B57B-52F1D0FC78B6@gmail.com> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <9C987542-1B90-4462-9DE9-F88007579ACA@illinois.edu> <2182D83B-D855-48B5-B57B-52F1D0FC78B6@gmail.com> Message-ID: <63F5CAB0-2903-4E15-BC28-08B1091165DC@illinois.edu> Scott, I think keeping the two together is a good idea unless Bio::DB::GFF is essentially end-of-life and will no longer be maintained. In that case, maybe it's a good idea to port all needed methods to Bio::DB::SeqFeature and release the code separately, then call it a day on Bio::DB::GFF maintenance-wise? Just a thought. Nice to hear my tardiness on 1.6.whatever has not held up GBrowse2. Thanks! Will be setting up my own local instance of GBrowse2 here soon. chris On Jul 18, 2009, at 7:23 AM, Scott Cain wrote: > Hi All, > > I don't want to wade in too deeply, but I like the idea of splitting > things up. I think the Bio::Graphics split has gone well and has > made life easier in the GBrowse world. I could see Bio::DB::SeqFeature > and Bio::DB::GFF being split and either being kept together or going > their separate ways (though I have a nagging suspicion that > SeqFeature code depends on GFF code in a few places, so it may make > sense to just keep them together). > > And Chris, if it makes you feel any better, I don't think anything > you've done or not done has held up GBrowse2. > > Scott > > > On Jul 17, 2009, at 11:14 PM, Chris Fields wrote: > >> My 2c...
>> [...]
>> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Tejashwari.Meerupati at mbioekol.lu.se Fri Jul 17 02:44:05 2009 From: Tejashwari.Meerupati at mbioekol.lu.se (Tejashwari Meerupati) Date: Fri, 17 Jul 2009 08:44:05 +0200 Subject: [Bioperl-l] Bioperl installation problem Message-ID: <3B5A974EEEE97E41A13F8345215F8AD8BD9C69665D@UWEXMBX01.uw.lu.se> Hi, I want to install BioPerl (http://bioperl.open-bio.org/wiki/Installing_Bioperl_for_Unix). I followed the instructions on that page and got this error message: ************************************************************ Checking installed modules... XML::Parser is installed, it will be used by the test suite Checking if your kit is complete... Looks good Writing Makefile for XML::Simple make: *** No rule to make target 'make'. Stop. /usr/bin/make make -- not OK Running make test Can't test without successful make Running make install make had returned bad status, install seems impossible I cannot install BioPerl on my computer. Is there an alternative way to get it installed? Thanks in advance!! 
Regards, Tejashwari From jason at bioperl.org Sat Jul 18 11:57:49 2009 From: jason at bioperl.org (Jason Stajich) Date: Sat, 18 Jul 2009 08:57:49 -0700 Subject: [Bioperl-l] Bioperl installation problem In-Reply-To: <3B5A974EEEE97E41A13F8345215F8AD8BD9C69665D@UWEXMBX01.uw.lu.se> References: <3B5A974EEEE97E41A13F8345215F8AD8BD9C69665D@UWEXMBX01.uw.lu.se> Message-ID: <10F4DCD4-870B-45CF-B575-E5DC990BE5E9@bioperl.org> I presume this is an osx machine? You need to have installed the developer tools so that make is installed. Without at least make,and for some other CPAN modules a C compiler, you can't install modules. -jason On Jul 16, 2009, at 11:44 PM, Tejashwari Meerupati wrote: > Hi, > I want to install the Bioperl (http://bioperl.open-bio.org/wiki/Installing_Bioperl_for_Unix > ) , I followed the instructions as mentioned in the above said link. > I got an error message as : > ************************************************************ > Checking installed modules? > XML::Parser is installed , it will be used by the test suite > Checking if your kit is complete ? > Looks good > Writing MakeFile for XML::Simple > Make :*** No rule to make target ?make?. Stop. > /usr/bin/make make ? not OK > Running make test > Can?t test without successful make > Runnings make install > Make bad returned bad status, install seems impossible > I cannot install the bioperl on my computer, is there any other > alternative way to get it installed?? > > Thanks in advance!! > > Regards, > Tejashwari > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From maj at fortinbras.us Sat Jul 18 12:10:23 2009 From: maj at fortinbras.us (Mark A. 
Jensen)
Date: Sat, 18 Jul 2009 12:10:23 -0400
Subject: [Bioperl-l] bioperl reorganization
In-Reply-To: <4A5E7CE7.4040908@cornell.edu>
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com>
 <4A5E7CE7.4040908@cornell.edu>
Message-ID: <8008912B85694235B57D38B161FFA9CB@NewLife>

Hi All,

After carefully reading this thread, weighing the pros and cons of the
different positions, and searching out both the scientific and personal
content of the responses, I am compelled as a new core developer to say
emphatically:

"So long, and thanks for all the fish."

No. I actually thought that a little analysis might tell us what
BioPerl thinks its core is. So I invite you to check out
http://www.bioperl.org/wiki/Module_Connectivity
and come back swinging.

Cheers, MAJ

----- Original Message -----
From: "Robert Buels"
To: "Mark Jensen" ; "BioPerl List"
Sent: Wednesday, July 15, 2009 9:05 PM
Subject: Re: [Bioperl-l] Tree refactor? was Re: Bootstrap, root, reroot...

> Rather than putting this in bioperl-dev, perhaps this would be a nice
> opportunity to make a new distribution called something standard like
> "Bio-Tree", with a standard directory structure, and a sane number of modules
> in it.
...

From cjfields at illinois.edu  Sat Jul 18 13:11:10 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 18 Jul 2009 12:11:10 -0500
Subject: [Bioperl-l] bioperl reorganization
In-Reply-To: <8008912B85694235B57D38B161FFA9CB@NewLife>
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com>
 <4A5E7CE7.4040908@cornell.edu>
 <8008912B85694235B57D38B161FFA9CB@NewLife>
Message-ID: <0CC2A550-8EE0-4F45-8D01-019ED63579AE@illinois.edu>

Nice graph! We should probably attempt something similar to
Graph::Dependency to actually visualize some of the clusters.

BTW, as long as one follows the rules, doing this is fairly easy via
Moose using the metaclass (can check both class and instance
attributes). Not so much here.

One other thing: dependencies.pl isn't completely foolproof, though it
strives to be. For instance, it currently doesn't catch modules loaded
dynamically via _load_module(), and understandably borks out on
interpolated evals (something like 'eval "require $module; return
1;"'). Catching inheritance hierarchies via 'use base' alleviates that
to a certain degree (IO modules for instance), but it's not perfect.

pyrimidine1:bioperl-live cjfields$ ack _load_module

Bio/AlignIO/Handler/GenericAlignHandler.pm
316: $self->_load_module($class);
323: $self->_load_module('Bio::Seq::Meta');

Bio/AlignIO.pm
407: $ok = $self->_load_module($module);

Bio/Annotation/AnnotationFactory.pm
162: $self->_load_module($type);
196: $self->_load_module($type);

Bio/Assembly/IO.pm
192: $ok = $self->_load_module($module);

Bio/ClusterIO.pm
270: $ok = $self->_load_module($module);

Bio/DB/Expression.pm
198: eval { $ok = $self->_load_module($module) };

Bio/DB/SeqVersion.pm
168: eval { $ok = $self->_load_module($module) };

Bio/DB/Taxonomy.pm
264: eval { $ok = $self->_load_module($module) };

Bio/DB/TFBS.pm
151: eval { $ok = $self->_load_module($module) };

Bio/Factory/DriverFactory.pm
167:=head2 _load_module
169: Title : _load_module
170: Usage : $self->_load_module("Bio::Tools::Genscan");
178:sub _load_module {

Bio/Factory/ObjectFactory.pm
184: $self->_load_module($type);

Bio/Factory/SeqAnalysisParserFactory.pm
161: $self->_load_module($module); # throws an exception on failure to load

Bio/FeatureIO.pm
420: $ok = $self->_load_module($module);

Bio/Location/Atomic.pm
97: Bio::Root::Root->_load_module($class);

Bio/MapIO.pm
209: $ok = $self->_load_module($module);

Bio/Matrix/IO.pm
215: $ok = $self->_load_module($module);

Bio/Matrix/PSM/IO.pm
195: $ok = $self->_load_module($module);

Bio/OntologyIO.pm
273: $ok = $self->_load_module($module);

Bio/PopGen/IO.pm
272: $ok = $self->_load_module($module);

Bio/Restriction/IO.pm
169: $ok = $class->_load_module($module);

Bio/Root/Root.pm
409:=head2 _load_module
411: Title : _load_module
412: Usage : $self->_load_module("Bio::SeqIO::genbank");
420:sub _load_module {

Bio/Search/Hit/HitFactory.pm
123: eval { $self->_load_module($type) };
143: # redundancy with the create method which also calls _load_module
146: eval {$self->_load_module($type) };

Bio/Search/HSP/HSPFactory.pm
124: eval { $self->_load_module($type) };
143: # redundancy with the create method which also calls _load_module
146: eval {$self->_load_module($type) };

Bio/Search/Result/ResultFactory.pm
123: eval { $self->_load_module($type) };
143: # redundancy with the create method which also calls _load_module
146: eval {$self->_load_module($type) };

Bio/SearchIO/blastxml.pm
313: $ok = $self->_load_module($VALID_TYPE{$value});

Bio/SearchIO.pm
180: $class->_load_module($output_module);
446: $ok = $self->_load_module($module);

Bio/SeqEvolution/Factory.pm
158: $ok = $self->_load_module($module);
188: $self->_load_module($self->{'_type'});
295: $self->_load_module($self->{'_seq_type'});

Bio/SeqIO.pm
568: $ok = $self->_load_module($module);

Bio/Tools/EUtilities.pm
1589: $ok = $self->_load_module($module);

Bio/Tools/Run/StandAloneBlast.pm
381: Bio::Root::Root->_load_module($module);

Bio/Tree/Tree.pm
381: $self->_load_module($iomod);

Bio/TreeIO/TreeEventBuilder.pm
103: $self->_load_module($treetype);
104: $self->_load_module($nodetype);

Bio/TreeIO.pm
243: $ok = $self->_load_module($module);

Bio/Variation/IO.pm
279: $ok = $class->_load_module($module);

-c

On Jul 18, 2009, at 11:10 AM, Mark A. Jensen wrote:

> Hi All,
>
> After carefully reading this thread, weighing the pros and cons of
> the different positions, and searching out both the scientific and
> personal content of the responses, I am compelled as a new core
> developer to say emphatically:
>
> "So long, and thanks for all the fish."
>
> No. I actually thought that a little analysis might tell us what
> BioPerl thinks its core is.
So I invite you to check out
> http://www.bioperl.org/wiki/Module_Connectivity
> and come back swinging.
>
> Cheers, MAJ
>
> ----- Original Message ----- From: "Robert Buels"
> To: "Mark Jensen" ; "BioPerl List"
> Sent: Wednesday, July 15, 2009 9:05 PM
> Subject: Re: [Bioperl-l] Tree refactor? was Re: Bootstrap, root,
> reroot...
>
>> Rather than putting this in bioperl-dev, perhaps this would be a
>> nice opportunity to make a new distribution called something
>> standard like "Bio-Tree", with a standard directory structure, and
>> a sane number of modules in it.
> ...

From bartomas at gmail.com  Sat Jul 18 14:38:11 2009
From: bartomas at gmail.com (bar tomas)
Date: Sat, 18 Jul 2009 20:38:11 +0200
Subject: [Bioperl-l] Finding all bioactive substances through EUtils or
 PUG_SOAP
In-Reply-To:
References:
Message-ID:

Dear Chris,

Thank you again for your helpful reply and your code.

I've been trying to find a way to extend your BioPerl code so that it
can retrieve the NCBI Taxonomy db IDs of the species in which the
bioactive compounds are found. (The query that I'm interested in is to
find bioactive compounds found in natural organisms. I'd like to
identify the species where the natural compounds are found.)

I've looked at the web page you mention in your mail
(http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_index) and have found
a linking filter, pcassay_taxonomy, for the bioassay database, but I
think(?) that this does not refer to the taxonomy of the species in
which the active screened compound is found.

Do you know if it is possible to retrieve the link between a natural
compound and the species in which the compound can be found?

Thanks very much for any help or hints.

(Sorry if the email is a bit misplaced in this discussion list, as it is
not really specific to bioperl, although I'm trying to implement it
using Bioperl tools.
I have not been able to find a general discussion list about querying
Entrez databases that is not specific to a particular programming
language.)

Thanks again
Tomas Bar

On 7/15/09, Chris Fields wrote:
>
> Tomas B.,
>
> Just so you know, this isn't really a bioperl-specific question, though you
> may be able to use bioperl tools to do what you want. I'll run with the
> latter assumption.
>
> I'm not too familiar with pubchem and related, but using einfo you can get
> relevant information on the databases. The available databases are:
>
> pcassay
> pccompound
> pcsubstance
>
> Lots of filters available, summarized here:
>
> http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_index
>
> My guess is you would have to query the database pcassay with esearch and
> the appropriate filter to find the IDs active for a particular assay, then
> use elink from pcassay to either pccompound or pcsubstance to get what you
> want.
>
> Using Bio::DB::EUtilities (below) this worked to get the compound IDs, you
> could probably get more information using esummary (not sure if you can
> retrieve all info on them).
>
> chris
>
> ==========================================
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use Bio::DB::EUtilities;
>
> my $term = '"Luciferase Profiling Assay"';
>
> my $factory = Bio::DB::EUtilities->new(-eutil   => 'esearch',
>                                        -db      => 'pcassay',
>                                        -term    => $term,
>                                        -verbose => 1,
>                                        -retmax  => 100);
>
> my @ids = $factory->get_ids;
>
> # note the linkname, can use same for pcsubstance
> $factory->reset_parameters(-eutil    => 'elink',
>                            -db       => 'pccompound',
>                            -dbfrom   => 'pcassay',
>                            -linkname => 'pcassay_pccompound_active',
>                            -id       => \@ids);
>
> $factory->print_all;
> ==========================================
>
> chris
>
> On Jul 15, 2009, at 8:40 AM, bar tomas wrote:
>
> Hi,
>>
>> Could you give me a hint on how to query Entrez databases to find all
>> substances that have been found to be bioactive through a bioassay
>> screening.
>> I've looked at the wsdl file for querying pubchem
>> (http://pubchem.ncbi.nlm.nih.gov/pug_soap/pug_soap.cgi?wsdl) but have
>> found no service for retrieving substance ids.
>> Is there a way to do this with EUtils or a http query with parameters ?
>> Thanks a lot.
>> Tomas B.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

From asjo at koldfront.dk  Sat Jul 18 16:00:25 2009
From: asjo at koldfront.dk (Adam Sjøgren)
Date: Sat, 18 Jul 2009 22:00:25 +0200
Subject: [Bioperl-l] Writing .scf files
Message-ID: <87zlb1ep7q.fsf@topper.koldfront.dk>

  Hi.

I'm trying to write an .scf file from just a sequence (to create
test-input).

I can't seem to figure out how to do it correctly.

This is what I do:

  #!/usr/bin/perl

  use strict;
  use warnings;

  use Bio::Seq::Quality;
  use Bio::SeqIO;

  print "BioPerl version " . $Bio::Root::Version::VERSION . "\n";

  print "Creating Bio::Seq::Quality object\n";
  my $seq=Bio::Seq::Quality->new(
                                 -qual=>'65 66 67 68 69 70 71 72 73 74 75 76',
                                 -seq =>'atcgatcgatcg',
                                );

  print "Writing .scf\n";
  my $out=Bio::SeqIO->new(-file=>'>write.scf', -format=>'scf');
  $out->write_seq(-target=>$seq);

  print "Reading .scf\n";
  my $in=Bio::SeqIO->new(-file=>'write.scf', -format=>'scf');
  my $in_seq=$in->next_seq;

  print "Done\n";

Basically I am following what the pod of Bio::Seq::Quality and
Bio::SeqIO::scf tells me to do.

Reading the scf gives me an exception; this is the output of the
script:

  Creating Bio::Seq::Quality object
  Writing .scf
  Reading .scf

  --------------------- WARNING ---------------------
  MSG: seq doesn't validate with [0-9A-Za-z\*\-\.=~\\/\?], mismatch is ,,
  ---------------------------------------------------

  ------------- EXCEPTION: Bio::Root::Exception -------------
  MSG: Attempting to set the sequence to [AEI] which does not look healthy
  STACK: Error::throw
  STACK: Bio::Root::Root::throw /home/adsj/work/bioperl/bioperl-live/Bio/Root/Root.pm:368
  STACK: Bio::PrimarySeq::seq /home/adsj/work/bioperl/bioperl-live/Bio/PrimarySeq.pm:283
  STACK: Bio::PrimarySeq::new /home/adsj/work/bioperl/bioperl-live/Bio/PrimarySeq.pm:234
  STACK: Bio::LocatableSeq::new /home/adsj/work/bioperl/bioperl-live/Bio/LocatableSeq.pm:122
  STACK: Bio::Seq::Meta::Array::new /home/adsj/work/bioperl/bioperl-live/Bio/Seq/Meta/Array.pm:180
  STACK: Bio::Seq::Quality::new /home/adsj/work/bioperl/bioperl-live/Bio/Seq/Quality.pm:206
  STACK: Bio::SeqIO::scf::next_seq /home/adsj/work/bioperl/bioperl-live/Bio/SeqIO/scf.pm:245
  STACK: ./t/nzdb/write_scf:23
  -----------------------------------------------------------

If I obtain an .scf file from an ab1-file (using Staden's convert_trace
to convert it), I can read the resulting .scf fine, and I can write it
anew and read the file I wrote too.

So I guess I'm doing something wrong when creating my test-file, but I
must have stared myself blind at the code; I can't seem to see it.

I am using a fresh BioPerl from svn (r15859).

Any ideas?


  Best regards,

    Adam

-- 
 "I said to Hank Williams: How lonely does it get?        Adam Sjøgren
  Hank Williams hasn't answered yet"                 asjo at koldfront.dk

From asjo at koldfront.dk  Sat Jul 18 16:26:10 2009
From: asjo at koldfront.dk (Adam Sjøgren)
Date: Sat, 18 Jul 2009 22:26:10 +0200
Subject: [Bioperl-l] Writing .scf files
In-Reply-To: <87zlb1ep7q.fsf@topper.koldfront.dk> ("Adam Sjøgren"'s
 message of "Sat, 18 Jul 2009 22:00:25 +0200")
References: <87zlb1ep7q.fsf@topper.koldfront.dk>
Message-ID: <87vdlpeo0t.fsf@topper.koldfront.dk>

On Sat, 18 Jul 2009 22:00:25 +0200, Adam wrote:

> MSG: Attempting to set the sequence to [AEI] which does not look healthy

Ok, I think I have found out why, by checking write_seq against
next_seq and looking at the offsets and lengths.

It turns out that when synthesizing traces, the peak_indices array gets
exactly twice the length it should have.

This seems to be because Bio::SeqIO::scf::new() calls
_synthesize_traces() itself on the Bio::Seq::SequenceTrace it created,
but Bio::Seq::SequenceTrace::new does so as well, when no traces are
supplied.

So peak_indices ends up being twice as long, and the data written after
peak_indices in the .scf file is mixed up when read back.

I have filed bugzilla bug #2881 with a patch attached that simply
removes the call to _synthesize_traces() (and set_accuracies(); same
deal) from Bio::SeqIO::scf::write_seq(), which fixes the problem.


  Best regards,

    Adam

-- 
 "I said to Hank Williams: How lonely does it get?
Adam Sj?gren Hank Williams hasn't answered yet" asjo at koldfront.dk From bix at sendu.me.uk Sun Jul 19 11:28:49 2009 From: bix at sendu.me.uk (bix at sendu.me.uk) Date: Sun, 19 Jul 2009 16:28:49 +0100 (BST) Subject: [Bioperl-l] bioperl reorganization In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> Message-ID: <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> > On Jul 17, 2009, at 11:54 AM, Sendu Bala wrote: >> Chris Fields wrote: >> But while BioPerl is still monolithic, how will people be able to >> choose which external dependencies they want to install? That's the >> question that must be resolved before getting rid of >> Bio::Root::Build. You'd also need to resolve the network tests >> issue. And, well, I guess all the other issues that Bio::Root:Build >> solves. > > I mentioned two options. The first was to revert back to > Module::Build. The second was to have Bio::Root::Build methods comply > with the Module::Build API. I'm not sure I follow. How does reverting back to Module::Build help core installers choose what they want to install? > As for the external dependencies, we're falling into the trap of > thinking general users need to install bioperl-live (and thus are > using a tarball and 'perl Build.PL'). Everyday users should use CPAN > (or PPM when we have that running); devs and advanced users can use > bioperl-live. A standard CPAN install should take care of most > required dependencies; No. B::R::Build's fancy stuff exists primarily for CPAN users. A standard CPAN installation using standard Module::Build would force all users to install all external dependencies for all BioPerl modules, even if they only wanted to use 5 BioPerl modules that had no external deps of their own. This is the main issue that is making it desirable for us to break Core up into smaller parts. 
> we should be able to push additional > 'dependencies' onto the required queue if the user wants them. I'm aware of no such functionality outside of B::R::Build. Elaborate? >>> It also causes a bit of a 'chicken-or-egg' issue with >>> subdistributions wanting to use Bio::Root::Build, in that one has >>> to check for the presence of Bio::Root::Build first and then >>> completely bail if it isn't present. One can't fall back to >>> Module::Build due to the API difference. >> >> For small sub-distributions that have no optional external >> dependencies (all of the BioPerl subdists?), they can be changed to >> just using pure Module::Build, while core retains Bio::Root::Build >> as long as core is monolithic. > > The bugs mainly pertain to bp/M::DB API conflicts. We could use > either/or if the API was the same, but it would be nice to have some > consistency and not have to choose between one or the other (or worse, > change from one to the other if a dependency is added). I Don't follow. A BioPerl subdist should never have optional external deps. The whole point of splitting Core into smaller bits is so that people can install only what they want. An optional dep means that they're installing something they don't want along with something they do. Do you see a problem with my suggestion? I was actually thinking of just going ahead and doing it: converting all the non-core dists to use pure Module::Build. That will instantly solve all the problems except for people wanting to use CPANPLUS to install BioPerl core. And while that spoils the CPAN automated testing, we've never had a single real user complain to us, have we? 
From bix at sendu.me.uk Sun Jul 19 11:42:09 2009 From: bix at sendu.me.uk (bix at sendu.me.uk) Date: Sun, 19 Jul 2009 16:42:09 +0100 (BST) Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <0CC2A550-8EE0-4F45-8D01-019ED63579AE@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <8008912B85694235B57D38B161FFA9CB@NewLife> <0CC2A550-8EE0-4F45-8D01-019ED63579AE@illinois.edu> Message-ID: <9c43908662eb60c3beffc6ab6128960b.squirrel@sendu.me.uk> > Nice graph! We should probably attempt something similar to > Graph::Dependency to actually visualize some of the clusters. We've done visualization long ago: http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/maintenance/module_usage.pl (I can't find my post to the list about it right now) We never really decided on splits based on that, beyond the obvious Bio::Graphics. There's a wiki page somewhere that was keeping track of what we thought the best splits might be. Was in mentioned in this thread already? If not, someone please post it. From cjfields at illinois.edu Sun Jul 19 15:35:11 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 19 Jul 2009 14:35:11 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <9c43908662eb60c3beffc6ab6128960b.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <8008912B85694235B57D38B161FFA9CB@NewLife> <0CC2A550-8EE0-4F45-8D01-019ED63579AE@illinois.edu> <9c43908662eb60c3beffc6ab6128960b.squirrel@sendu.me.uk> Message-ID: <273B95AB-D403-4155-900F-67C20E2012EF@illinois.edu> On Jul 19, 2009, at 10:42 AM, bix at sendu.me.uk wrote: >> Nice graph! We should probably attempt something similar to >> Graph::Dependency to actually visualize some of the clusters. 
>
> We've done visualization long ago:
> http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/maintenance/module_usage.pl
> (I can't find my post to the list about it right now)

Yes, I recall the same. Actually, I found Tim Bunce's
Module::Dependency can do this very easily (I've put a PNG or SVG up on
the wiki for Bio::SeqIO):

http://www.bioperl.org/wiki/Talk:Proposed_BioPerl_changes

It appears to be partly broken, though; I'm not getting any external
deps, just blank nodes. Should be easy to fix and patch, but Tim's no
longer maintaining it, though he's open to a co-maintainer (any
volunteers?)

> We never really decided on splits based on that, beyond the obvious
> Bio::Graphics.

Well, we had a very general idea but never really acted on it, partly
due to time but primarily for the same reason I've been pointing out:
we have never specifically defined what is core and what isn't (hard to
take initiative when faced with that). I'm veering towards a very small
core, myself, Bio::Root* only, using bundling to create larger
distributions (something we raised in the past for Bundle::Bioperl and
that Robert brought up via a Task::BioPerl).

Anyway, the plan on the wiki was along the lines of wanting to break a
deadlock between 'core is everything' and 'core is very small', and
initiate the process by splitting the package into a core-main-dev
hierarchy (see the Talk page for that).

> There's a wiki page somewhere that was keeping track of what we thought
> the best splits might be. Was it mentioned in this thread already? If not,
> someone please post it.

Already mentioned, though easily lost in the discussion. As it's moving
a bit beyond core I've renamed it:

http://www.bioperl.org/wiki/Proposed_BioPerl_changes

I've also added a direct link to the main wiki page.
I can plan a new 1.6.x release around all this, but my feeling is that past that future releases in 1.6.x will likely be dead to focus on this (we can go through an extensive alpha release). Viva la 1.7? chris From cjfields at illinois.edu Sun Jul 19 17:26:59 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 19 Jul 2009 16:26:59 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization In-Reply-To: <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> Message-ID: <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> I don't want to distract away from the reorganization discussion, and email lends itself only so much to the discussion (interspering comments doesn't lend itself to easy reading), so I've posted my response to my blog: http://cjfields.wordpress.com/ I'm hoping that answers most of your questions and clarifies matters. chris PS : I don't think we're really that far apart on our thinking here, both on Module::Build/Bio::Root::Build and on restructuring. We just need to actually get the work done and stop writing about it ;> On Jul 19, 2009, at 10:28 AM, bix at sendu.me.uk wrote: >> On Jul 17, 2009, at 11:54 AM, Sendu Bala wrote: >>> Chris Fields wrote: >>> But while BioPerl is still monolithic, how will people be able to >>> choose which external dependencies they want to install? That's the >>> question that must be resolved before getting rid of >>> Bio::Root::Build. You'd also need to resolve the network tests >>> issue. And, well, I guess all the other issues that Bio::Root:Build >>> solves. >> >> I mentioned two options. The first was to revert back to >> Module::Build. The second was to have Bio::Root::Build methods comply >> with the Module::Build API. > > I'm not sure I follow. 
How does reverting back to Module::Build help > core > installers choose what they want to install? > > >> As for the external dependencies, we're falling into the trap of >> thinking general users need to install bioperl-live (and thus are >> using a tarball and 'perl Build.PL'). Everyday users should use CPAN >> (or PPM when we have that running); devs and advanced users can use >> bioperl-live. A standard CPAN install should take care of most >> required dependencies; > > No. B::R::Build's fancy stuff exists primarily for CPAN users. A > standard > CPAN installation using standard Module::Build would force all users > to > install all external dependencies for all BioPerl modules, even if > they > only wanted to use 5 BioPerl modules that had no external deps of > their > own. This is the main issue that is making it desirable for us to > break > Core up into smaller parts. > > >> we should be able to push additional >> 'dependencies' onto the required queue if the user wants them. > > I'm aware of no such functionality outside of B::R::Build. Elaborate? > > >>>> It also causes a bit of a 'chicken-or-egg' issue with >>>> subdistributions wanting to use Bio::Root::Build, in that one has >>>> to check for the presence of Bio::Root::Build first and then >>>> completely bail if it isn't present. One can't fall back to >>>> Module::Build due to the API difference. >>> >>> For small sub-distributions that have no optional external >>> dependencies (all of the BioPerl subdists?), they can be changed to >>> just using pure Module::Build, while core retains Bio::Root::Build >>> as long as core is monolithic. >> >> The bugs mainly pertain to bp/M::DB API conflicts. We could use >> either/or if the API was the same, but it would be nice to have some >> consistency and not have to choose between one or the other (or >> worse, >> change from one to the other if a dependency is added). > > I Don't follow. A BioPerl subdist should never have optional external > deps. 
The whole point of splitting Core into smaller bits is so that > people can install only what they want. An optional dep means that > they're > installing something they don't want along with something they do. > > Do you see a problem with my suggestion? I was actually thinking of > just > going ahead and doing it: converting all the non-core dists to use > pure > Module::Build. That will instantly solve all the problems except for > people wanting to use CPANPLUS to install BioPerl core. > > And while that spoils the CPAN automated testing, we've never had a > single > real user complain to us, have we? > From bix at sendu.me.uk Sun Jul 19 21:29:19 2009 From: bix at sendu.me.uk (bix at sendu.me.uk) Date: Mon, 20 Jul 2009 02:29:19 +0100 (BST) Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization In-Reply-To: <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> Message-ID: <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> > I don't want to distract away from the reorganization discussion, and > email lends itself only so much to the discussion (interspering > comments doesn't lend itself to easy reading), so I've posted my > response to my blog: > > http://cjfields.wordpress.com/ > > I'm hoping that answers most of your questions and clarifies matters. Unfortunately, it doesn't. You've only reiterated what you've said previously in this thread, but didn't yet address my questions. In any case, you suggest a number of possible solutions, but which one are we actually going to go for? 
From sidd.basu at gmail.com Sun Jul 19 22:23:42 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Sun, 19 Jul 2009 21:23:42 -0500 Subject: [Bioperl-l] Contributing to biomoose In-Reply-To: References: Message-ID: <20090720022341.GA10399@Macintosh-74.local> Hi chris, I have forked your 'biomoose' git project and recently added bunch of roles in my master in addition to the one (Annotation) you have already merged. So far, these were mostly interface centric module and were quite easy to implement. Here is my github link .... http://github.com/cybersiddhu/biomoose/tree/master Anyway, i want to contribute with other module but wondering which other modules would be a good target particularly leaving those out which you/others are working or have plans to work on. Any suggestions/ideas/roadmaps will be really helpful. thanks, -siddhartha From cjfields at illinois.edu Sun Jul 19 23:43:20 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 19 Jul 2009 22:43:20 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization In-Reply-To: <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> Message-ID: On Jul 19, 2009, at 8:29 PM, bix at sendu.me.uk wrote: >> I don't want to distract away from the reorganization discussion, and >> email lends itself only so much to the discussion (interspering >> comments doesn't lend itself to easy reading), so I've posted my >> response to my blog: >> >> http://cjfields.wordpress.com/ >> >> I'm hoping that answers most of your questions and clarifies matters. > > Unfortunately, it doesn't. 
You've only reiterated what you've said > previously in this thread, but didn't yet address my questions. Which ones? These? I have answered them, either in the post or in > I'm not sure I follow. How does reverting back to Module::Build help > core > installers choose what they want to install? Prior to Module::Build the Makefile.PL we just looked for the dependencies and reported back if they were missing; installation of those modules was left up to the user. I don't necessarily think it's our *responsibility* to make the job easier for the user to choose and install modules other than BioPerl. We just need to indicate what they may need to run certain modules (the warnings about missing recommended dependencies). If we take on the additional responsibility to allow users to both choose the modules and call CPAN and have them installed during the Build.PL script, it has to work under conditions we may not be able to completely control. The infinite loop bug, which I believe may be a combination of Gbrowse net install and a bad Module::Build, is an example of that. I am merely suggesting we only allow the CPAN installation option under certain circumstances (i.e. only during 'perl Build.PL', not within the CPAN or CPANPLUS shell, or when recursing). > I'm aware of no such functionality outside of B::R::Build. > Elaborate? (re: recommend/require queue) Determining what is recommended/required (and checking for them) is handled within Bio::Root::Build, is that correct? We could make those decisions prior to creating the instance, or take care of this internally (rearrange 'recommends'/'requires' based on what the user wants). When in CPAN/CPANPLUS shell push the installation of those to allow the currently running shell to do the installation; don't spawn an additional shell. That's all. This would allow CPAN/CPANPLUS (and possibly future implementations) to take care of installing the modules for us. 
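
The 'rearrange recommends/requires' idea can be sketched roughly as below. This is only an illustration, not Bio::Root::Build code: the two dependency tables, the module names and versions in them, and the helper promote_wanted() are all hypothetical, and the user's choices are hard-coded where a real Build.PL would prompt for them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical dependency tables; names and versions are examples only,
# not BioPerl's actual lists.
my %requires   = ( 'IO::String'  => 0, 'Data::Stag' => '0.11' );
my %recommends = ( 'XML::Parser' => 0, 'GD' => '1.3', 'DBI' => 0 );

# Hypothetical helper: move each optional module the user asked for from
# 'recommends' to 'requires'. Names not listed in 'recommends' are ignored.
sub promote_wanted {
    my ($req, $rec, @wanted) = @_;
    for my $module (@wanted) {
        next unless exists $rec->{$module};
        $req->{$module} = delete $rec->{$module};
    }
    return;
}

# Pretend the user said "yes" to GD and DBI when asked.
promote_wanted(\%requires, \%recommends, 'GD', 'DBI');

print "requires:   ", join(', ', sort keys %requires),   "\n";
print "recommends: ", join(', ', sort keys %recommends), "\n";
```

The two hashes would then be passed as the requires and recommends arguments when the builder object is created (Module::Build->new takes both as hash references), so that whichever shell is already running sees the promoted modules as ordinary prerequisites and no second shell has to be spawned.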
> And while that spoils the CPAN automated testing, we've never had a > single > real user complain to us, have we? I believe bug reports count as complaints. Granted, a couple are from me (indicating problems I found during 1.6.0), but I assure you they are legitimate. A third report from me was actually from a user on the list (links in the report), another is an independent report indicating CPANPLUS installation problems re: META.yml. As CPANPLUS is being adopted by a lot of devs and it is used quite extensively in CPAN testers, I would really like to see it working; by far the largest proportion of UNKNOWN reports are due to CPANPLUS issues. > In any case, you suggest a number of possible solutions, but which > one are > we actually going to go for? If it's easier to fix Bio::Root::Build in a way that addresses the bugs reported (the ones I indicate), then I'm okay with that. I trust whatever it is you want to do. The three critical issues (as I've pointed out before) are: 1) Getting CPANPLUS installation working, which may be just META.yml, or it may be shell-related. I would like it for CPAN Testers, if for nothing else. That's at least 2 bug reports, maybe more. 2) Bio::Root::Build converted towards a Module::Build-compliant API, or we'll need to convert run/db/network to Module::Build. 1 bug report. 3) Avoid potential infinite looping. This may be Gbrowse-related via the net install script, but if Build.PL is being called in some way that potentially causes recursion we need to be aware of it. This one appears rarely, but I did manage to replicate it using an old Module::Build (I can't recall if I used the net install script or not). 1 bug report. 
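For issue 3, one possible guard against re-entry could look like the sketch below. It is only an illustration under stated assumptions: the environment variable name BIOPERL_BUILD_DEPTH and the function enter_build are hypothetical, and nothing like this is known to exist in Bio::Root::Build. The idea is that any child process that ends up calling Build.PL again inherits the counter through the environment and stops instead of looping.

```perl
use strict;
use warnings;

# Hypothetical variable name, chosen only for this sketch.
my $GUARD = 'BIOPERL_BUILD_DEPTH';

# Record one more level of Build.PL nesting in the given environment
# hash and die if the allowed depth is exceeded. When $env is \%ENV the
# counter is inherited by every child process started afterwards.
sub enter_build {
    my ($env, $max_depth) = @_;
    my $depth = ($env->{$GUARD} || 0) + 1;
    die "Build.PL re-entered (depth $depth); refusing to recurse further\n"
        if $depth > $max_depth;
    $env->{$GUARD} = $depth;
    return $depth;
}

# At the top of a build script: allow exactly one level.
my $depth = enter_build(\%ENV, 1);
print "build depth: $depth\n";
```

A depth limit of 1 is the strictest choice; a higher limit would tolerate a legitimate nested invocation while still breaking an infinite loop.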
chris From cjfields at illinois.edu Mon Jul 20 00:02:55 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 19 Jul 2009 23:02:55 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> Message-ID: <2D664559-6948-4A0D-AA1C-C7A53987AEE4@illinois.edu> On Jul 19, 2009, at 10:43 PM, Chris Fields wrote: > On Jul 19, 2009, at 8:29 PM, bix at sendu.me.uk wrote: > >>> I don't want to distract away from the reorganization discussion, >>> and >>> email lends itself only so much to the discussion (interspering >>> comments doesn't lend itself to easy reading), so I've posted my >>> response to my blog: >>> >>> http://cjfields.wordpress.com/ >>> >>> I'm hoping that answers most of your questions and clarifies >>> matters. >> >> Unfortunately, it doesn't. You've only reiterated what you've said >> previously in this thread, but didn't yet address my questions. > > Which ones? These? I have answered them, either in the post or in Sorry about that, halfway deleted that last sentence seeing as I hadn't answered all your questions, but hit send. Oops! chr From cjfields at illinois.edu Mon Jul 20 00:22:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 19 Jul 2009 23:22:41 -0500 Subject: [Bioperl-l] Contributing to biomoose In-Reply-To: <20090720022341.GA10399@Macintosh-74.local> References: <20090720022341.GA10399@Macintosh-74.local> Message-ID: <608C0716-BD0B-4113-84B6-2843E2226CA4@illinois.edu> I'm actually working on Bio::Moose::Annotation along with tests (I have SimpleValue, Reference, Comment, Target, and DBLink working). 
I'll try committing that tonight, but first I'll take a look at your code and try to make sure ours merge properly. I don't see any conflicts thus far. A ROADMAP is a good idea; I'll definitely work towards that. As for specific modules, there are lots of places to start: Bio::Tree? Bio::Location? Bio::Align* and Bio::SimpleAlign (just the basic functionality)? SeqFeature? The IO parsers should fall into place rather quickly once the basic objects are in place, I wouldn't worry about them for the moment; we need basic implementations and tests. Bio::Annotation::Collection is where I'm going next. Just make sure to 'use Bio::Moose' or 'use Bio::Moose::Role', and have the namespace in Bio::Moose so it doesn't conflict with regular BioPerl, e.g. 'Bio::Moose::Tree', roles maybe as 'Bio::Moose::Roles::Tree'. We will need to test these against their BioPerl counterparts sometime. I'll also add an AUTHORS file, so add your name in when you can. chris PS: Based on our recent discussions on-list about breaking up BioPerl, I'm wondering whether we'll need an eventual Bio::MooseX. May be something to think about. On Jul 19, 2009, at 9:23 PM, Siddhartha Basu wrote: > Hi chris, > I have forked your 'biomoose' git project and recently added bunch of > roles in my master in addition to the one (Annotation) you have > already > merged. So far, these were mostly interface centric module and were > quite easy > to implement. Here is my github link .... > http://github.com/cybersiddhu/biomoose/tree/master > > Anyway, i want to contribute with other module but wondering which > other > modules would be a good target particularly leaving those out which > you/others are working or have plans to work on. Any > suggestions/ideas/roadmaps will be really helpful. 
> > > thanks, > -siddhartha > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hartzell at alerce.com Mon Jul 20 00:15:45 2009 From: hartzell at alerce.com (George Hartzell) Date: Sun, 19 Jul 2009 21:15:45 -0700 Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> Message-ID: <19043.61297.80141.781810@already.local> Chris Fields writes: > [...] > Prior to Module::Build the Makefile.PL we just looked for the > dependencies and reported back if they were missing; installation of > those modules was left up to the user. [...] Chiming here a bit late to say that I really *like* it when we leave installing the modules to the user. I'd often rather install them via e.g. the FreeBSD ports system instead of system, but how/why would BioPerl ever know that? g. From cjfields at illinois.edu Mon Jul 20 12:31:03 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 20 Jul 2009 11:31:03 -0500 Subject: [Bioperl-l] Finding all bioactive substances through EUtils or PUG_SOAP In-Reply-To: References: Message-ID: <5B1494D7-E5E4-4B68-BCF7-A3CE9EC72B91@illinois.edu> On Jul 18, 2009, at 1:38 PM, bar tomas wrote: > Dear Chris, > Thank you again for you helpful reply and your code. > I've been trying to find a way to extend your BioPerl code to be > able to retrieve the NCBI Taxonomy db IDs of the species in which > the bioactive compounds are found. > (The query that I'm interested in, is to find bioactive compounds > found in natural organisms. 
I'd like to identify the species where > the natural compounds are found). > I've looked in the web page you mention in your mail (http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_index) > and have found a linking filter pcassay_taxonomy for the bioassay > database, but I think(?) that this does not refer to the taxonomy of > the species in which the active screened compound is found. I think this represents a legit link to taxonomy, either the species the assay is performed on or the species of the protein target. > Do you know if it is possible to retrieve the link between a natural > compound and the species in which the compound can be found? I would think this is achievable through 'pcassay_taxonomy' or 'pccompound_taxonomy'. This appears to be assay/compound/substance-dependent, though, and a lot of them don't have links (it doesn't look like they are reported, or maybe the assay is generic). You may have to do some digging, unfortunately. chris > Thanks very much for any help or hints. > > (sorry if the email is a bit misplaced in this discussion list as > it is not really specific to bioperl, although I'm trying to > implement it using Bioperl tools. I have not been able to find a > general discussion list about querying Entrez databases, unspecific > to any particular programming language).
> > Thanks again > > Tomas Bar From cjfields at illinois.edu Mon Jul 20 15:02:33 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 20 Jul 2009 14:02:33 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization In-Reply-To: <19043.61297.80141.781810@already.local> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <19043.61297.80141.781810@already.local> Message-ID: On Jul 19, 2009, at 11:15 PM, George Hartzell wrote: > Chris Fields writes: >> [...] >> Prior to Module::Build the Makefile.PL we just looked for the >> dependencies and reported back if they were missing; installation of >> those modules was left up to the user. [...] > > Chiming here a bit late to say that I really *like* it when we leave > installing the modules to the user. I'd often rather install them via > e.g. the FreeBSD ports system instead of system, but how/why would > BioPerl ever know that? > > g. That's a good point. Leaving it up to the user does make things a lot simpler. The only downside is the onslaught of users who don't know why a specific module doesn't work. May be the reason this was added in? 
chris From sidd.basu at gmail.com Mon Jul 20 21:56:55 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Mon, 20 Jul 2009 20:56:55 -0500 Subject: [Bioperl-l] Re: Contributing to biomoose In-Reply-To: <608C0716-BD0B-4113-84B6-2843E2226CA4@illinois.edu> References: <20090720022341.GA10399@Macintosh-74.local> <608C0716-BD0B-4113-84B6-2843E2226CA4@illinois.edu> Message-ID: <20090721015654.GA5651@siddhartha-basus-computer.local> On Sun, 19 Jul 2009, Chris Fields wrote: > I'm actually working on Bio::Moose::Annotation along with tests (I have > SimpleValue, Reference, Comment, Target, and DBLink working). I'll try > committing that tonight, but first I'll take a look at your code and try > to make sure ours merge properly. I don't see any conflicts thus far. > > A ROADMAP is a good idea; I'll definitely work towards that. As for > specific modules, there are lots of places to start: > > Bio::Tree? > Bio::Location? > Bio::Align* and Bio::SimpleAlign (just the basic functionality)? > SeqFeature? Will start with Bio::Location then. > > The IO parsers should fall into place rather quickly once the basic > objects are in place, I wouldn't worry about them for the moment; we > need basic implementations and tests. Bio::Annotation::Collection is > where I'm going next. > > Just make sure to 'use Bio::Moose' or 'use Bio::Moose::Role', and have > the namespace in Bio::Moose so it doesn't conflict with regular BioPerl, > e.g. 'Bio::Moose::Tree', roles maybe as 'Bio::Moose::Roles::Tree'. We > will need to test these against their BioPerl counterparts sometime. I will also try to put things in the wiki as i go along, particularly my understanding of the basic moose role/base classes that you have added. > > I'll also add an AUTHORS file, so add your name in when you can. > > chris > > PS: Based on our recent discussions on-list about breaking up BioPerl, > I'm wondering whether we'll need an eventual Bio::MooseX. May be > something to think about. 
Of course, if there is a core biomoose distribution, then that namespace makes a lot of sense for non-core modules. The philosophy also goes nicely with the organization of current MooseX modules. And if something in Bio::MooseX (really futuristic) becomes really important, it can be integrated into the core Bio::Moose namespace. The same thing is also happening with the MooseX::Attribute module. -siddhartha > > On Jul 19, 2009, at 9:23 PM, Siddhartha Basu wrote: > > > Hi chris, > > I have forked your 'biomoose' git project and recently added bunch of > > roles in my master in addition to the one (Annotation) you have > > already > > merged. So far, these were mostly interface centric module and were > > quite easy > > to implement. Here is my github link .... > > http://github.com/cybersiddhu/biomoose/tree/master > > > > Anyway, i want to contribute with other module but wondering which > > other > > modules would be a good target particularly leaving those out which > > you/others are working or have plans to work on. Any > > suggestions/ideas/roadmaps will be really helpful. > > > > > > thanks, > > -siddhartha > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Tue Jul 21 00:03:26 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 20 Jul 2009 21:03:26 -0700 Subject: [Bioperl-l] Contributing to biomoose In-Reply-To: <20090721015654.GA5651@siddhartha-basus-computer.local> References: <20090720022341.GA10399@Macintosh-74.local> <608C0716-BD0B-4113-84B6-2843E2226CA4@illinois.edu> <20090721015654.GA5651@siddhartha-basus-computer.local> Message-ID: <4A653E0E.4040605@cornell.edu> Siddhartha Basu wrote: > Of course, if there is a core biomoose distribution, then that namespace > makes a lot of sense for non-core modules. The philosophy also goes > nicely with the organization of current MooseX modules.
And if something > in Bio::MooseX(really futuristic) becomes heavily important it can be > integrated into the core Bio::Moose namespace. The same thing is also > happening with MooseX::Attribute module. Bio::Moose isn't a good namespace for the long term. For experimenting around with Moosey implementation techniques it's fine, but before you guys go putting TOO much code into it, consider what its future is going to be. Moose is an implementation technology, and modules should be named for what they do, not how they're implemented. We already know Moose is far superior for organizing and expressing designs, so what I would be shooting for here would be some deep, focused implementations of certain aspects. It's starting to sound like it's getting a little bigger than just playing around. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From bartomas at gmail.com Tue Jul 21 10:43:12 2009 From: bartomas at gmail.com (bar tomas) Date: Tue, 21 Jul 2009 15:43:12 +0100 Subject: [Bioperl-l] Finding all bioactive substances through EUtils or PUG_SOAP In-Reply-To: <5B1494D7-E5E4-4B68-BCF7-A3CE9EC72B91@illinois.edu> References: <5B1494D7-E5E4-4B68-BCF7-A3CE9EC72B91@illinois.edu> Message-ID: Thanks a lot for your help and your ideas. I had the impression that 'pcassay_taxonomy' refers to the taxonomy of the screening substance not the screened substance which is what I'm interested in. But I'll have a better look. Thanks again Tomas Bar On Mon, Jul 20, 2009 at 5:31 PM, Chris Fields wrote: > On Jul 18, 2009, at 1:38 PM, bar tomas wrote: > > Dear Chris, >> Thank you again for you helpful reply and your code. >> I've been trying to find a way to extend your BioPerl code to be able to >> retrieve the NCBI Taxonomy db IDs of the species in which the bioactive >> compounds are found. 
>> (The query that I'm interested in, is to find bioactive compounds found in >> natural organisms. I'd like to identify the species where the nautral >> compounds are found). >> I've looked in the web page you mention in your mail ( >> http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_index) >> and have found a linking filter pcassay_taxonomy for the bioassay >> database, but I think(?) that this does not refer to the taxonomy of the >> species in which the active screened compound is found. >> > > I think this represents a legit link to taxonomy, either the species the > assay is performed on the species of the protein target. > > Do you know if it is possible to retrieve the link between a natural >> compound and the species in which the compound can be found? >> > > I would think this is achievable through 'pcassay_taxonomy' or > 'pccompound_taxonomy'. This appears to be > assay/compound/substance-dependent, though, and a lot of them don't have > links (it doesn't look like they are reported, or maybe the assay is > generic). You may have to do some digging, unfortunately. > > chris > > > Thanks very much for any help or hints. >> >> (sorrys if the email is a bit misplaced in this discussion list as it is >> not really specific to bioperl, although I'm trying to implement it using >> Bioperl tools. I have not been able to find a general discussion list about >> querying Entrez databases, unspecific to any particular proramming >> language). >> >> Thanks again >> >> Tomas Bar >> > > > From MEC at stowers.org Tue Jul 21 12:59:15 2009 From: MEC at stowers.org (Cook, Malcolm) Date: Tue, 21 Jul 2009 11:59:15 -0500 Subject: [Bioperl-l] cdd-search with remoteblast? 
In-Reply-To: <7BBF64FF-F531-4F7C-8A31-BD04FCE1BF1A@gmail.com> References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas> <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com> <7BBF64FF-F531-4F7C-8A31-BD04FCE1BF1A@gmail.com> Message-ID: Chris, I wound up adding a new test # $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $ with the comment : # malcolm_cook at stowers.org: this test is in a separate file from # RemoteBlast.t (on which it is modelled) since there is some sort of # side-effecting between the multiple remote blasts that is causing # this test to fail, if it comes last, or the other test to fail, if # this one comes first. THIS IS A BUG EITHER IN REMOTE BLAST OR MY # UNDERSTANDING, i.e. of how to initialize it. In any case, the test passes and demos rpsblast usage. Cheers, Malcolm Cook Stowers Institute for Medical Research - Kansas City, Missouri > -----Original Message----- > From: Chris Fields [mailto:cjfields1 at gmail.com] > Sent: Friday, July 10, 2009 1:05 PM > To: Cook, Malcolm > Cc: 'Jonas Schaer'; 'BioPerl List' > Subject: Re: [Bioperl-l] cdd-search with remoteblast? > > Malcolm, > > Nice! Go ahead and add the test in; we can look at trying to > get CDD_SEARCH working at some point but this is a nice workaround. > > chris > > On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote: > > > Chris, I've added a test to bioperl RemoteBlast.t that demonstrates > > the following. Is it appropriate to submit it? > > > > Jonas, OK, I was a little quick on the gun... but I've got it now. > > > > You don't need to change the wrapper. Here is what you need to do: > > > > # 1) set your database like this: > > > > -database => 'cdsearch/cdd', # c.f. 
> > http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html > > for other cdd database options > > > > # 2) add this line before submitting the job: > > $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast'; > > > > You're in - No other changes needed. > > > > Malcolm Cook > > Stowers Institute for Medical Research - Kansas City, Missouri > > > > > >> -----Original Message----- > >> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de] > >> Sent: Friday, July 10, 2009 4:18 AM > >> To: BioPerl List; Cook, Malcolm; Chris Fields > >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >> > >> Hi, > >> I tried to do what Malcom proposed my ($prog = 'rpsblast'; > >> my $db = > >> 'CDD';) but that didn't work. > >> > >> ------------- EXCEPTION: Bio::Root::Exception ------------- > >> MSG: Value rpsblast for PUT parameter PROGRAM does not match > >> expression t?blast[ pnx]. Rejecting. > >> STACK: Error::throw > >> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 > >> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > >> C:/Perl/site/lib/Bio/Tools > >> /Run/RemoteBlast.pm:329 > >> STACK: Bio::Tools::Run::RemoteBlast::new > >> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl > >> ast.pm:257 > >> STACK: blast_a_seq2.pm:14 > >> ----------------------------------------------------------- > >> So I should try to "change the wrapper to allow > 'rpsblast'", right? > >> Could You tell me how to do that, please? So sorry but I > have no idea > >> yet...:) If that doesn't work, is there any other way to run > >> cdd-searches with perl? > >> Thank you so much! > >> Regards, Jonas > >> > >> ----- Original Message ----- > >> From: "Chris Fields" > >> To: "Cook, Malcolm" > >> Cc: "'Jonas Schaer'" ; "'BioPerl List'" > >> ; "'Smithies, Russell'" > >> ; > >> Sent: Thursday, July 09, 2009 9:19 PM > >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? 
> >> > >> > >>> I've scheduled this tentatively for the 1.6 release > series (just not > >>> sure when yet). It may work as is, but I haven't tried > it out yet > >>> (and am hazarding to guess it only retrieves the single > main RID at > >>> the moment). > >>> > >>> chris > >>> > >>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: > >>> > >>>> Jonas, > >>>> > >>>> If you want to continue to use the bioperl remoteblast > interface, > >>>> probably what you should do is simply call it twice. > >>>> > >>>> Once, as you already know how to do, which will return > without CDD > >>>> results. > >>>> > >>>> Secondly, to get the CDD results, call remoteblast a second time. > >>>> This time, using > >>>> -database => 'CDD' > >>>> -program => 'rpsblast' > >>>> > >>>> However, the wrapper may object to the 'rpsblast' > program. It is > >>>> not listed in the POD - > >>>> > >> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R > >> emoteBlast.pm) > >>>> If so, my guess is that changing the perl wrapper to allow > >>>> rpsblast will "just work" (tm). I've cc:ed > >> cjfields at bioperl.org for > >>>> his opinion on this. > >>>> > >>>> Also, you might want to perform the CDD search first, > especially if > >>>> you are streaming results to eyeball that might like > something to > >>>> look at while the second (presumably longer) search is running. > >>>> > >>>> Cheers, > >>>> > >>>> Malcolm Cook > >>>> Stowers Institute for Medical Research - Kansas City, Missouri > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: bioperl-l-bounces at lists.open-bio.org > >>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf > Of Jonas > >>>>> Schaer > >>>>> Sent: Thursday, July 09, 2009 5:16 AM > >>>>> To: BioPerl List; Smithies, Russell > >>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? > >>>>> > >>>>> Hi guys, > >>>>> Thank you all so much for your help and patience :). 
Of > course you > >>>>> were right and I finaly found the right put-parameter to get > >>>>> exactly the same hits as on the homepage. > >>>>> I do have an other question though :)... > >>>>> I now want to include a search for conserved domains, > but when I > >>>>> try to use the CDD_SEARCH-parameter > >>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# > >>>>> sub:CDD_SEARCH) > >>>>> like the other put-parameters the way chris once told me(works > >>>>> fine with the other params): > >>>>> > >>>>> my %put = ( > >>>>> WORD_SIZE => 3, > >>>>> HITLIST_SIZE => 100, > >>>>> THRESHOLD => 11, > >>>>> FILTER => 'R', > >>>>> GENETIC_CODE => 1, > >>>>> CDD_SEARCH => 'on' > >>>>> ###I tried it > >>>>> with 'true' and '1', too. > >>>>> > >>>>> ); > >>>>> > >>>>> for my $putName (keys %put) { > >>>>> $factory->submit_parameter($putName,$put{$putName}); > >>>>> } > >>>>> > >>>>> > >>>>> ...an exception is thrown: > >>>>> > >>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- > >>>>> MSG: CDD_SEARCH is not a valid PUT parameter. > >>>>> STACK: Error::throw > >>>>> STACK: Bio::Root::Root::throw > >> C:/Perl/site/lib/Bio/Root/Root.pm:359 > >>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > >>>>> C:/Perl/site/lib/Bio/Tools > >>>>> /Run/RemoteBlast.pm:325 > >>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383 > >>>>> STACK: main::blast_it firsteval0.8.pm:288 > >>>>> STACK: firsteval0.8.pm:35 > >>>>> ----------------------------------------------------------- . > >>>>> I guess somehow this could be the solution to my problem: > >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s > >>>>> ub:RID-for-Simultaneous > >>>>> , but unfortunately I don't understand what to do. 
> >>>>> I'm so sorry to bother you with this but please help me once > >>>>> more...:) > >>>>> > >>>>> Best regards and thanks in advance, Jonas > >>>>> > >>>>> ----- Original Message ----- > >>>>> From: "Smithies, Russell" > >>>>> To: "'Jonas Schaer'" > >>>>> Cc: "'Chris Fields'" ; "'BioPerl List'" > >>>>> > >>>>> Sent: Monday, July 06, 2009 10:56 PM > >>>>> Subject: RE: [Bioperl-l] different results with > >> remote-blast skript > >>>>> > >>>>> > >>>>> Hi Jonas, > >>>>> You can't just play with the BLAST parameters and hope > >> for a "better" > >>>>> result. > >>>>> I'd suggest that if you aren't sure what they do, you > should leave > >>>>> them alone as small changes can make huge differences in the > >>>>> output - it's quite possible to miss finding what > you're looking > >>>>> for by using > >> the wrong > >>>>> parameters. > >>>>> If all else fails, read the blast manual: > >>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall > >>>>> _all.html > >>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ > >>>>> Or Read Ian Korfs' excellent book: > >>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp > >>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 > >>>>> > >>>>> Don't worry about the integer overflow bug as there's > nothing you > >>>>> can do about it. If you're interested, Google and Wikipedia are > >>>>> your > >>>>> friends: > >>>>> http://en.wikipedia.org/wiki/Integer_overflow > >>>>> > >>>>> > >>>>> Russell > >>>>> > >>>>>> -----Original Message----- > >>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > >>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m. > >>>>>> To: BioPerl List; Chris Fields > >>>>>> Subject: Re: [Bioperl-l] different results with > >> remote-blast skript > >>>>>> > >>>>>> Hi guys, thanks for your answers so far. > >>>>>> @jason: integer overflow in blast.... sorry, but what do > >>>>> you mean by that? 
> >>>>>> how can I fix it...? > >>>>>> > >>>>>> Since I never really changed any parameters I thought them > >>>>> all to be > >>>>>> default. > >>>>>> whatever, I tried to get "better" results with my prog > >> by changing > >>>>>> these: > >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; > >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; > >>>>>> > >>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI > >>>>> STICS'} = > >>>>>> '1'; > >>>>>> with no effect...I guess these were default values anyway. > >>>>>> > >>>>>> So please maybe you can tell me all the other parameters I > >>>>> can change with > >>>>>> my > >>>>>> perl-skript AND how to do that? > >>>>>> Unfortunately both, perl and the blast-algorithm are pretty > >>>>> much new to > >>>>>> me, > >>>>>> maybe thats why I just cannot find out how to do that on my > >>>>> own... :/ > >>>>>> > >>>>>> Here is the output I get with my remote-blast skript: > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>> ################################### > >>>>>> Query Name: > >>>>>> > >> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL > >>>>>> L > >>>>>> hit name is ref|XP_001702807.1| > >>>>>> score is 442 > >>>>>> BLASTP 2.2.21+ > >>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro > >>>>> A. Schaffer, > >>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. > >>>>> Lipman (1997), > >>>>>> "Gapped > >>>>>> BLAST and PSI-BLAST: a new generation of protein > database search > >>>>>> programs", Nucleic Acids Res. 25:3389-3402. > >>>>>> > >>>>>> > >>>>>> Reference for composition-based statistics: Alejandro A. > >>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, > >>>>> John L. Spouge, > >>>>>> Yuri > >>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. 
Altschul (2001), > >>>>> "Improving the > >>>>>> accuracy of PSI-BLAST protein database searches with > >>>>> composition-based > >>>>>> statistics and other refinements", Nucleic Acids Res. > >> 29:2994-3005. > >>>>>> > >>>>>> > >>>>>> RID: 53STX5G2013 > >>>>>> > >>>>>> > >>>>>> Database: All non-redundant GenBank CDS > >>>>>> translations+PDB+SwissProt+PIR+PRF excluding > >> environmental samples > >>>>>> from WGS projects > >>>>>> 9,252,587 sequences; 3,169,972,781 total > letters Query= > >>>>>> > >>>>> > >> > MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL > >>>>>> > >>>>> > >> > DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA > >> M > >>>>>> ATGPDPDDEYE > >>>>>> Length=150 > >>>>>> > >>>>>> > >>>>>> > >>>>> Score > >>>>>> E > >>>>>> Sequences producing significant alignments: > >>>>> (Bits) > >>>>>> Value > >>>>>> > >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>>>> reinhard... 174 > >>>>>> 2e-42 > >>>>>> > >>>>>> > >>>>>> ALIGNMENTS > >>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >> reinhardtii] > >>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] > >>>>>> Length=303 > >>>>>> > >>>>>> Score = 174 bits (442), Expect = 2e-42, Method: > >>>>> Composition-based > >>>>>> stats. 
> >>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), > >>>>> Gaps = 0/150 > >>>>>> (0%) > >>>>>> > >>>>>> Query 1 > >>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds > >>>>>> 60 > >>>>>> > >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>>>> Sbjct 154 > >>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS > >>>>>> 213 > >>>>>> > >>>>>> Query 61 > >>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>>>> 120 > >>>>>> > >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>>>> Sbjct 214 > >>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR > >>>>>> 273 > >>>>>> > >>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 > >>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct 274 > >>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 > >>>>>> > >>>>>> > >>>>>> > >>>>>> Database: All non-redundant GenBank CDS > >>>>>> translations+PDB+SwissProt+PIR+PRF > >>>>>> excluding environmental samples from WGS projects > >>>>>> Posted date: Jul 5, 2009 4:41 AM Number of letters in > >>>>>> database: -1,124,994,511 Number of sequences in database: > >>>>>> 9,252,587 > >>>>>> > >>>>>> Lambda K H > >>>>>> 0.309 0.122 0.345 > >>>>>> Gapped > >>>>>> Lambda K H > >>>>>> 0.267 0.0410 0.140 > >>>>>> Matrix: BLOSUM62 > >>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of > Sequences: > >>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions: > >>>>>> 1448367 Number of successful extensions: 2103 Number > of sequences > >>>>>> better than 10: 0 Number of HSP's better than 10 > without gapping: > >>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully > >>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781 > >>>>>> Length adjustment: 113 Effective length of query: 37 Effective > >>>>>> length of database: 2124430450 Effective search space: > >>>>>> 78603926650 Effective search space used: 78603926650 > >>>>>> T: 11 > >>>>>> A: 40 > >>>>>> X1: 16 (7.1 bits) 
> >>>>>> X2: 38 (14.6 bits) > >>>>>> X3: 64 (24.7 bits) > >>>>>> S1: 42 (20.8 bits) > >>>>>> S2: 74 (33.1 bits) > >>>>>> > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>> ################################### > >>>>>> and here are the hits (?) of the blast-algorithm on the > >>>>> ncbi-homepage with > >>>>>> the same query of course: > >>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas > >>>>> reinhard... 300 > >>>>>> 3e-80 > >>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA > >>>>> [Acyrtho... 36.2 > >>>>>> 1.1 > >>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 > >>>>> [Blautia... 35.4 > >>>>>> 1.8 > >>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania > >>>>> brazil... 34.3 > >>>>>> 4.2 > >>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 > >>>>> [Aspergillus n... 33.5 > >>>>>> 6.0 > >>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 > >>>>> [Methyloba... 33.5 > >>>>>> 7.0 > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>> ###################################at > >>>>>> least the first hit is the same, but even there there is a > >>>>> different score > >>>>>> and e-value. > >>>>>> > >>>>>> thanks so much for any help :) > >>>>>> regards, jonas > >>>>>> > >>>>>> > >>>>>> ----- Original Message ----- > >>>>>> From: "Chris Fields" > >>>>>> To: "Jason Stajich" > >>>>>> Cc: "Smithies, Russell" > >>>>> ; "'BioPerl > >>>>>> List'" ; "'Jonas Schaer'" > >>>>>> > >>>>>> Sent: Monday, July 06, 2009 12:51 AM > >>>>>> Subject: Re: [Bioperl-l] different results with > >> remote-blast skript > >>>>>> > >>>>>> > >>>>>>> That inspires confidence ;> > >>>>>>> > >>>>>>> chris > >>>>>>> > >>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: > >>>>>>> > >>>>>>>> integer overflow in blast.... 
> >>>>>>>> > >>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: > >>>>>>>> > >>>>>>>>> I'd guess it's a difference in the parameters used. > >>>>>>>>> Interesting that both have the number of letters in > the db as > >>>>>>>>> "-1,125,070,205", I assume that's a bug :-) > >>>>>>>>> > >>>>>>>>> Stats from your remote_blast: > >>>>>>>>> > >>>>>>>>> 'stats' => { > >>>>>>>>> 'S1' => '42', > >>>>>>>>> 'S1_bits' => '20.8', > >>>>>>>>> 'lambda' => '0.309', > >>>>>>>>> 'entropy' => '0.345', > >>>>>>>>> 'kappa_gapped' => '0.0410', > >>>>>>>>> 'T' => '11', > >>>>>>>>> 'kappa' => '0.122', > >>>>>>>>> 'X3_bits' => '24.7', > >>>>>>>>> 'X1' => '16', > >>>>>>>>> 'lambda_gapped' => '0.267', > >>>>>>>>> 'X2' => '38', > >>>>>>>>> 'S2' => '74', > >>>>>>>>> 'seqs_better_than_cutoff' => '0', > >>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', > >>>>>>>>> 'Hits_to_DB' => '60102303', > >>>>>>>>> 'dbletters' => '-1125070205', > >>>>>>>>> 'A' => '40', > >>>>>>>>> 'num_successful_extensions' => '2004', > >>>>>>>>> 'num_extensions' => '1436892', > >>>>>>>>> 'X1_bits' => '7.1', > >>>>>>>>> 'X3' => '64', > >>>>>>>>> 'entropy_gapped' => '0.140', > >>>>>>>>> 'dbentries' => '9252258', > >>>>>>>>> 'X2_bits' => '14.6', > >>>>>>>>> 'S2_bits' => '33.1' > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Stats from a blast done on the NCBI webpage: > >>>>>>>>> > >>>>>>>>> Database: All non-redundant GenBank CDS > >>>>> translations+PDB+SwissProt > >>>>>>>>> +PIR+PRF > >>>>>>>>> excluding environmental samples from WGS projects > Posted date: > >>>>>>>>> Jul 4, 2009 4:41 AM Number of letters in database: > >>>>>>>>> -1,125,070,205 Number of sequences in database: 9,252,258 > >>>>>>>>> > >>>>>>>>> Lambda K H > >>>>>>>>> 0.309 0.124 0.340 > >>>>>>>>> Gapped > >>>>>>>>> Lambda K H > >>>>>>>>> 0.267 0.0410 0.140 > >>>>>>>>> Matrix: BLOSUM62 > >>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of > >>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of > >>>>>>>>> 
extensions: 3101413 Number of successful extensions: 9001 > >>>>>>>>> Number of sequences better than 100: 65 Number of > HSP's better > >>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000 > >>>>>>>>> Number of HSP's successfully gapped: 66 Length of > query: 150 > >>>>>>>>> Length of database: 3169897087 Length adjustment: 113 > >>>>>>>>> Effective length of query: 37 Effective length of database: > >>>>>>>>> 2124391933 Effective search space: 78602501521 Effective > >>>>>>>>> search space used: 78602501521 > >>>>>>>>> T: 11 > >>>>>>>>> A: 40 > >>>>>>>>> X1: 16 (7.1 bits) > >>>>>>>>> X2: 38 (14.6 bits) > >>>>>>>>> X3: 64 (24.7 bits) > >>>>>>>>> S1: 42 (20.8 bits) > >>>>>>>>> S2: 65 (29.6 bits) > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> -----Original Message----- > >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l- > >>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer > >>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. > >>>>>>>>>> To: BioPerl List > >>>>>>>>>> Subject: [Bioperl-l] different results with > >> remote-blast skript > >>>>>>>>>> > >>>>>>>>>> Hi again :) > >>>>>>>>>> please, I only have this little question: > >>>>>>>>>> why do I get different results with my remote::blast > >>>>> perl skript > >>>>>>>>>> then on the > >>>>>>>>>> ncbi blast homepage? > >>>>>>>>>> I am using blastp, the query is an amino-sequence > (different > >>>>>>>>>> results with any sequence, differences not only in > number of > >>>>>>>>>> hits but > >> even in e- > >>>>>>>>>> values, scores > >>>>>>>>>> etc...), the database is 'nr'. 
> >>>>>>>>>> PLEASE help me, > >>>>>>>>>> thank you in advance, > >>>>>>>>>> Jonas > >>>>>>>>>> > >>>>>>>>>> ps: my skript: > >>>>>>>>>> > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>>>>>> ## > >>>>>>>>>> use Bio::Seq::SeqFactory; > >>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my > >>>>>>>>>> @blast_report; my $prog = 'blastp'; > >>>>>>>>>> my $db = 'nr'; > >>>>>>>>>> my $e_val= '1e-10'; > >>>>>>>>>> #my $e_val= '10'; > >>>>>>>>>> my @params = ( '-prog' => $prog, > >>>>>>>>>> '-data' => $db, > >>>>>>>>>> '-expect' => $e_val, > >>>>>>>>>> '-readmethod' => 'SearchIO' ); my $factory = > >>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params); > >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} > = '11 1'; > >>>>>>>>>> > $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; > >>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = > '10'; $ Bio > >>>>>>>>>> > >>>>> > ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} > >>>>>>>>>> = '1'; > >>>>>>>>>> > >>>>>>>>>> my > >>>>>>>>>> $ > >>>>>>>>>> blast_seq > >>>>>>>>>> > >>>>> > >> > ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR > >>>>>>>>>> > >>>>>> > >>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN > >>>>> AFRQAHQNTAMATGPD > >>>>>>>>>> PDDEYE'; > >>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my > >>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' => > >>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq > >>>>>>>>>> =>$blast_seq, > >> -display_id => > >>>>>>>>>> "$blast_seq"); > >>>>>>>>>> my $filename='temp2.out'; > >>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR > >>>>>>>>>> "waiting..." 
if( $v > 0 ); while ( my @rids = > >>>>>>>>>> $factory->each_rid ) { > >>>>>>>>>> foreach my $rid ( @rids ) > >>>>>>>>>> { > >>>>>>>>>> my $rc = $factory->retrieve_blast($rid); > >>>>>>>>>> if( !ref($rc) ) > >>>>>>>>>> { > >>>>>>>>>> if( $rc < 0 ) > >>>>>>>>>> { > >>>>>>>>>> $factory->remove_rid($rid); > >>>>>>>>>> } > >>>>>>>>>> print STDERR "." if ( $v > 0 ); > >>>>>>>>>> } > >>>>>>>>>> else > >>>>>>>>>> { > >>>>>>>>>> my $result = $rc->next_result(); > >>>>>>>>>> $factory->save_output($filename); > >>>>>>>>>> $factory->remove_rid($rid); > >>>>>>>>>> print "\nQuery Name: ", > >>>>> $result->query_name(), > >>>>>>>>>> "\n"; > >>>>>>>>>> while ( my $hit = $result->next_hit ) > >>>>>>>>>> { > >>>>>>>>>> next unless ( $v > 0); > >>>>>>>>>> print "\thit name is ", > >> $hit->name, "\n"; > >>>>>>>>>> while( my $hsp = $hit->next_hsp ) > >>>>>>>>>> { > >>>>>>>>>> print "\t\tscore is ", > >>>>> $hsp->score, "\n"; > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>> } > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> } > >>>>>>>>>> @blast_report = get_file_data ($filename); return > >>>>>>>>>> @blast_report; > >>>>>>>>>> > >>>>>> > >>>>> ############################################################## > >>>>> ################ > >>>>>>>>>> #### > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> Bioperl-l mailing list > >>>>>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>>> = > >>>>>>>>> = > >>>>>>>>> > >>>>> > >> > ===================================================================== > >>>>>>>>> Attention: The information contained in this message and/or > >>>>>>>>> attachments from AgResearch Limited is intended only for the > >>>>> persons or entities > >>>>>>>>> to which it is addressed and may contain > confidential and/or > >>>>>>>>> privileged material. 
> >>>>> Checked by AVG - www.avg.com > >>>>> Version: 8.5.375 / Virus Database: 270.13.5/2220 - Release > >>>>> Date: 07/05/09 > >>>>> 17:54:00 > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >> > >> > >> -------------------------------------------------------------- > >> ------------------ > >> > >> > >> > >> No virus found in this incoming message. > >> Checked by AVG - www.avg.com > >> Version: 8.5.375 / Virus Database: 270.13.8/2227 - Release > >> Date: 07/09/09 > >> 05:55:00 > >> > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at illinois.edu Tue Jul 21 14:00:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 21 Jul 2009 13:00:09 -0500 Subject: [Bioperl-l] Contributing to biomoose In-Reply-To: <4A653E0E.4040605@cornell.edu> References: <20090720022341.GA10399@Macintosh-74.local> <608C0716-BD0B-4113-84B6-2843E2226CA4@illinois.edu> <20090721015654.GA5651@siddhartha-basus-computer.local> <4A653E0E.4040605@cornell.edu> Message-ID: <9E6DA369-462A-4354-9120-D8075AF65689@illinois.edu> On Jul 20, 2009, at 11:03 PM, Robert Buels wrote: > Siddhartha Basu wrote: >> Of course it there is a core biomoose distribution, then that >> namespace >> makes a lot of sense for non-core modules. The philosophy also goes >> nicely with the organization of current MooseX modules. And if >> something >> in Bio::MooseX(really futuristic) becomes heavily important it can be >> integrated into the core Bio::Moose namespace. The same thing is also >> happening with MooseX::Attribute module. > > Bio::Moose isn't a good namespace for the long term. 
> For experimenting around with Moosey implementation techniques it's
> fine, but before you guys go putting TOO much code into it, consider
> what its future is going to be. Moose is an implementation
> technology, and modules should be named for what they do, not how
> they're implemented.

True, but in the short term this will have to do, primarily to stay out of the way of the similarly named BioPerl namespaces (for the time being it enables us to run a few initial benchmarks comparing implementations between the two). Not too many other namespaces like Bio work. If we want something shorter we could move it to something else once we have a decent enough name. Maybe Alces (genus name for moose)? Seems kinda lame...

> We already know Moose is far superior for organizing and expressing
> designs, so what I would be shooting for here would be some deep,
> focused implementations of certain aspects. It's starting to sound
> like it's getting a little bigger than just playing around.
>
> Rob

Not really. We have to test out a few initial implementations in order to see how they compare to the BioPerl versions (memory- and speed-wise). PrimarySeq is essentially done except for small things, so I may start working on some preliminary benchmarks.
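
A speed comparison of the kind described could be set up with the core Benchmark module. The sketch below is only an illustration under assumptions: My::HashSeq and My::ArraySeq are hypothetical stand-ins (neither is a BioPerl nor a Bio::Moose class), and it measures speed only, not memory.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-in #1: a hash-based sequence object.
package My::HashSeq;
sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq}, id => $args{id} }, $class;
}
sub seq        { $_[0]{seq} }
sub seq_length { length $_[0]{seq} }

# Hypothetical stand-in #2: the same interface backed by an array.
package My::ArraySeq;
sub new {
    my ($class, %args) = @_;
    return bless [ $args{seq}, $args{id} ], $class;
}
sub seq        { $_[0][0] }
sub seq_length { length $_[0][0] }

package main;

my $dna = 'ACGT' x 250;

# A negative count asks Benchmark to run each candidate for at
# least that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    hash_based  => sub { My::HashSeq->new(seq => $dna, id => 's1')->seq_length },
    array_based => sub { My::ArraySeq->new(seq => $dna, id => 's1')->seq_length },
});
```

Swapping the two stand-ins for the real constructors under test would be the obvious next step; the rates printed depend entirely on the machine, so no expected output is shown.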
chris

From Russell.Smithies at agresearch.co.nz Tue Jul 21 16:34:31 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 22 Jul 2009 08:34:31 +1200
Subject: [Bioperl-l] Bioperl Entrez Esearch
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32A80443089@exchsth.agresearch.co.nz>

Someone mentioned using try-catch a while ago for catching errors; it might work in this case:

#!perl -w
use Bio::SeqIO;
use Error qw(:try);

my $seqio;
try {
    # -file is passed as a named argument (fat comma), not assigned
    $seqio = Bio::SeqIO->new(-file => 'my.fas');
}
catch Error with {
    my $e = shift;
    # $e->text will contain the message
};

Or you could redirect STDERR to a file:

open(STDERR, ">", $logfile) or die "Failed to re-direct STDERR to '$logfile': $!";

Or you could try using the "no warnings" pragma:
http://search.cpan.org/~nwclark/perl-5.8.9/lib/warnings.pm

Hopefully, one of these will work :)

--Russell

From: Rajasekar Karthik [mailto:karthik085 at gmail.com]
Sent: Wednesday, 22 July 2009 3:07 a.m.
To: Smithies, Russell
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bioperl Entrez Esearch

Russell / Others,
The utilities keep printing out warnings and errors. Is there any way to
a) either not print at all
b) or send them to some other log file other than apache's error.log

Thanks.

On Wed, Jul 15, 2009 at 5:34 PM, Rajasekar Karthik wrote:
that helps - thanks!!!

On Tue, Jul 14, 2009 at 6:33 PM, Smithies, Russell wrote:
You sure can.
Take a look at http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rajasekar Karthik
> Sent: Wednesday, 15 July 2009 10:23 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bioperl Entrez Esearch
>
> Hi,
> I am new to Bioperl. How can I do an Entrez Esearch using Bioperl?
>
> For example, I want to do an exact title search in pubmed
> Title: Guidelines for quantitative rt-PCR
>
> Using HTTP Get, I would do something like this
> URL:
> http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&field=titl&term=Guidelines%20for%20quantitative%20rt-PCR
> to get the response XML.
>
> How can I use Bioperl to do the above action?
>
> Thanks.
>
> --
> Best Regards,
> Rajasekar Karthik
> karthik085 at gmail.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Best Regards,
Rajasekar Karthik
karthik085 at gmail.com

From cjfields at illinois.edu Tue Jul 21 17:21:16 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 21 Jul 2009 16:21:16 -0500
Subject: [Bioperl-l] Bioperl Entrez Esearch
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32A80443089@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32A80443089@exchsth.agresearch.co.nz>
Message-ID: <23A21D80-6DBD-4E4E-B590-E0D650B65943@illinois.edu>

Hopefully most of those warnings are server-side issues and not my bugs!
;> If there are warnings you can turn them off ($obj->verbose(-1)). You can also use ultrastrict verbose(2), which converts everything to an exception, and then a simple eval block or try/catch (latter if you have Error.pm) should work (I don't think try/catch works with warnings, but I know an eval block won't). chris On Jul 21, 2009, at 3:34 PM, Smithies, Russell wrote: > Someone mentioned using try-catch a while ago for catching errors, > it might work in this case: > > #!perl -w > use Error qw(:try); > > try { > $seqio = Bio::SeqIO->new(-file='my.fas'); > } > catch Error with { > my $e = shift; > # $e->test will contain the message > }; > > > Or you could redirect STDERR to a file: > > open(STDERR, ">", "$logfile") or die "Failed to re-direct > STDERR to '$logfile': $!"; > > Or you could try using the "no warnings" pragma > http://search.cpan.org/~nwclark/perl-5.8.9/lib/warnings.pm > > > Hopefully, one of these will work :) > > --Russell > > > > From: Rajasekar Karthik [mailto:karthik085 at gmail.com] > Sent: Wednesday, 22 July 2009 3:07 a.m. > To: Smithies, Russell > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bioperl Entrez Esearch > > Russell / Others, > The utilities keep printing out warnings and errors. Is there any > way to > a) either not print at all > b) or send them to some other log file other than apache's error.log > > Thanks. > On Wed, Jul 15, 2009 at 5:34 PM, Rajasekar Karthik > wrote: > that helps - thanks!!! > > On Tue, Jul 14, 2009 at 6:33 PM, Smithies, Russell > wrote: > You sure can. > Take a look at http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook > > > --Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org> > [mailto:bioperl-l- >> bounces at lists.open-bio.org] On >> Behalf Of Rajasekar Karthik >> Sent: Wednesday, 15 July 2009 10:23 a.m. >> To: bioperl-l at lists.open-bio.org >> Subject: [Bioperl-l] Bioperl Entrez Esearch >> >> Hi, >> I an new to Bioperl. 
How can I do an Entrez Esearch using Bioperl? >> >> For example, I want to do an exact title search in pubmed >> Title: Guidelines for quantitative rt-PCR >> >> Using HTTP Get, I would do something like this >> URL: >> http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&field=titl&te >> rm=Guidelines%20for%20quantitative%20rt-PCR >> to get the response XML. >> >> How can I use Bioperl to do the above action? >> >> Thanks. >> >> -- >> Best Regards, >> Rajasekar Karthik >> karthik085 at gmail.com >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > > -- > Best Regards, > Rajasekar Karthik > karthik085 at gmail.com > > > > -- > Best Regards, > Rajasekar Karthik > karthik085 at gmail.com > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Tue Jul 21 19:14:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 22 Jul 2009 11:14:30 +1200 Subject: [Bioperl-l] GMAP IO? 
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32A80443089@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32A80443089@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32A804431BB@exchsth.agresearch.co.nz>

Is there an IO module for GMAP output?
I looked in the obvious places (Bio::AlignIO, Google etc.).
I know GMAP will do gff or psl output and there are parsers for those (they're SeqIO, not AlignIO?), but I was hoping there might be a better way as I've found GMAP's gff needs a bit of post-processing to make it usable.

It's a fairly large job (mammalian refseqs vs. a mammalian genome) so I want to do it as efficiently as possible.

Thanx,

Russell Smithies

Bioinformatics Applications Developer
T +64 3 489 9085
E russell.smithies at agresearch.co.nz

Invermay Research Centre
Puddle Alley,
Mosgiel,
New Zealand
T +64 3 489 3809
F +64 3 489 9174
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From jason.stajich at gmail.com Tue Jul 21 19:26:56 2009
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 21 Jul 2009 16:26:56 -0700
Subject: [Bioperl-l] GMAP IO?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32A804431BB@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32A80443089@exchsth.agresearch.co.nz> <18DF7D20DFEC044098A1062202F5FFF32A804431BB@exchsth.agresearch.co.nz>
Message-ID: <8273f6c20907211626o2d8171f5hc4353db6b71573e2@mail.gmail.com>

There hasn't been one yet that I know of. PSL can be parsed with the Bio::SearchIO parser.

The "non-standard" GFF flavors are one of the annoyances, but I just end up writing very simple GFF converters in simple scripts rather than a master BioPerl module that can handle all the different nuances that are not part of a standard. If you post the problematic alignment GFF I can give you a better sense of what the code would be, but I suspect simple Perl like this can standardize some things. This is my basic skeleton for that:

while (<>) {
    chomp;
    # pass comment/pragma lines and blank lines through untouched
    if (/^#/ || !/\S/) { print "$_\n"; next; }
    my @row = split(/\t/, $_);
    # assuming a gff3 file, map 9th column to hash, assume single
    # key=value pairs not key=value1,value2
    my %last = map { split(/=/, $_, 2) } split(/;/, pop @row);
    # some more code to fix the 9th column fields...
    #
    # now print out the line, re-joining the kept attributes with ';'
    # so they stay in a single 9th column
    print join("\t", @row,
               join(";", map  { sprintf("%s=%s", $_, $last{$_}) }
                         grep { exists $last{$_} } qw(ID Parent Note))), "\n";
}

-jason

Jason Stajich
jason at bioperl.org
jason.stajich at gmail.com
http://bioperl.org/wiki/User:Jason

On Tue, Jul 21, 2009 at 4:14 PM, Smithies, Russell wrote:
>
> Is there an IO module for GMAP output?
> I looked in the obvious places (Bio::AlignIO, Google etc..)
> I know GMAP will do gff or psl output and there are parsers for those (they're SeqIO, not AlignIO?), but I was hoping there might be a better way as I've found GMAP's gff needs a bit of post-processing to make it usable.
>
> It's a fairly large job (mammalian refseqs vs. a mammalian genome) so I want to do it as efficiently as possible.
>
> Thanx,
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E russell.smithies at agresearch.co.nz
>
> Invermay Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T +64 3 489 3809
> F +64 3 489 9174
> www.agresearch.co.nz
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From maj at fortinbras.us Tue Jul 21 20:29:18 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 21 Jul 2009 20:29:18 -0400
Subject: [Bioperl-l] Getting genomic coordinates for a list of genes
In-Reply-To: <2ac05d0f0907170549td482271ra7ea77bdfe43ee27@mail.gmail.com>
References: <2ac05d0f0907170549td482271ra7ea77bdfe43ee27@mail.gmail.com>
Message-ID: <931BE2F557CF4D7B9272D675EA69CCA8@NewLife>

Hi Emanuele--

Well, that script looks a bit hackish to me. Here's another hack, maybe less hackish, that seems to work for me. Gory details are at the end of the post.
Here's the output:

Possibilities:
chr  start     end       src   info
1    15691382  15723971  UCSC  CASP9 (uc001awq.1) at chr1:15691382-15723971
1    15691382  15723377  UCSC  CASP9 (uc001awn.1) at chr1:15691382-15723377
1    15691382  15723377  Ref   NM_001229 at chr1:15691382-15723377

Here's the hack:

# 842 is the Entrez Gene ID for CASP9
my @info = genome_coords(842, $db, $ua);

print "Possibilities:\n";
print join("\t", qw( chr start end src info )), "\n";
foreach (@info) {
    print join("\t", @{$_}{qw( chr start end src info )}), "\n";
}

# work done here...
sub genome_coords {
    my ($id, $db, $ua) = @_;
    my $seq = $db->get_Seq_by_id($id);
    my $ac  = $seq->annotation;
    for my $ann ($ac->get_Annotations('dblink')) {
        if ($ann->database eq "UCSC") {
            # follow the UCSC link and scrape the coordinates from the page
            my $resp = $ua->get($ann->url);
            my @a = $resp->content =~
                m{position=chr([0-9]+):([0-9]+)-([0-9]+)\&.*\&(known|ref)Gene.*?\">(.*?)}g;
            my @ret;
            # each match contributes five captures, in this order
            while (@a) {
                push @ret, {
                    'chr'   => shift @a,
                    'start' => shift @a,
                    'end'   => shift @a,
                    'src'   => shift(@a) eq 'known' ? 'UCSC' : 'Ref',
                    'info'  => shift @a
                };
            }
            return unless @ret;
            return @ret;
        }
    }
    return;    # parse error, no UCSC link on page
}

Here are the details:

The script on the wiki page gets coordinates by looking at a URL under a link on the page: the database "ModelMaker" link, whose URL is (after the 842 query):

'http://www.ncbi.nlm.nih.gov/mapview/modelmaker.cgi?taxid=9606&contig=NT_004610.19&from=2498878&to=2530877&gene=CASP9&lid=842'

The script reads the 'from' and 'to' values directly from the text of this URL to deliver the coordinates. This is somewhat hacky, since the assumption is that the coordinates ModelMaker wants (were you to actually visit the link, which the script doesn't do) are the ones you want. The hack above is slightly better, in that it finds a database URL link and visits it, then parses the page that the link returns -- the UCSC page for geneid 842, more likely to have what you want. It's still hacky, in that the format of that page may change, and that may break the regexp.
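
For comparison, the wiki script's approach of reading 'from' and 'to' straight out of the link text can be sketched in plain Perl. This is only an illustration under assumptions: query_params is a hypothetical helper (it is not part of BioPerl or of the wiki script), the URL is the ModelMaker link quoted above, and no URL-decoding of values is attempted.

```perl
use strict;
use warnings;

# Hypothetical helper: split the query string of a URL into a hash
# reference of key => value pairs (values are left undecoded).
sub query_params {
    my ($url) = @_;
    # take everything after the first '?' up to an optional '#fragment'
    my ($query) = $url =~ /\?([^#]*)/;
    return {} unless defined $query;
    my %param;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key;
        $param{$key} = defined $value ? $value : '';
    }
    return \%param;
}

my $url = 'http://www.ncbi.nlm.nih.gov/mapview/modelmaker.cgi'
        . '?taxid=9606&contig=NT_004610.19&from=2498878&to=2530877'
        . '&gene=CASP9&lid=842';

my $p = query_params($url);

# prints the contig accession and its from/to values, tab-separated
print join("\t", @{$p}{qw(contig from to)}), "\n";
```

For this URL the three fields are NT_004610.19, 2498878 and 2530877, i.e. the contig-based coordinates reported in the original question rather than the chromosome-based ones.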
But by then, you'll be able to hack yourself out of that situation!

cheers,
Mark

----- Original Message -----
From: "Emanuele Osimo"
To: "perl bioperl ml"
Sent: Friday, July 17, 2009 8:49 AM
Subject: [Bioperl-l] Getting genomic coordinates for a list of genes

> Hello everyone,
> I'm new to programming, I'm a biologist, so please forgive my ignorance, but
> I've been trying this for 2 weeks, now I have to ask you.
> I'm trying the script I found at
> http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates
> because I need to have some variables (like $from and $to) assigned to the
> start and end of a gene.
> The script works fine, but gives me the wrong coordinates: for example if I
> try it with the gene 842 (CASP9), it prints:
> NT_004610.19 2498878 2530877
>
> I found out that in Entrez, for each gene (for CASP9, for example, at
> http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq
> ) under "Genome Reference Consortium Human Build 37 (GRCh37),
> Primary_Assembly" there are two different sets of coordinates. The first is
> called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37),
> Primary_Assembly", and is the one I need, and the second one is called just
> "NT_004610.19" and it's the one that the script prints.
> This is valid for all the genes I tried.
>
> Do you know how to make the script print the "right" coordinates (at least,
> the one I need)?
> Thanks a lot in advance,
> Emanuele

From maj at fortinbras.us Tue Jul 21 20:41:16 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 21 Jul 2009 20:41:16 -0400
Subject: [Bioperl-l] Getting genomic coordinates for a list of genes
In-Reply-To: <931BE2F557CF4D7B9272D675EA69CCA8@NewLife>
References: <2ac05d0f0907170549td482271ra7ea77bdfe43ee27@mail.gmail.com>
 <931BE2F557CF4D7B9272D675EA69CCA8@NewLife>
Message-ID: <77C86ED7D5094F54A733C683FB3E6CE4@NewLife>

Yikes! Left off the includes: Put these at the beginning of the script...

use strict;
use Bio::DB::EntrezGene;
use Bio::WebAgent;

my $ua = Bio::WebAgent->new();
my $db = Bio::DB::EntrezGene->new();

Ugh.
MAJ

----- Original Message -----
From: "Mark A. Jensen"
To: "Emanuele Osimo" ; "perl bioperl ml"
Sent: Tuesday, July 21, 2009 8:29 PM
Subject: Re: [Bioperl-l] Getting genomic coordinates for a list of genes

> Hi Emanuele--
>
> Well, that script looks a bit hackish to me. Here's another hack, maybe less
> hackish, that seems to work for me. Gory details are at the end of the post.
>
> Here's the output:
>
> Possibilities:
> chr  start     end       src   info
> 1    15691382  15723971  UCSC  CASP9 (uc001awq.1) at chr1:15691382-15723971
> 1    15691382  15723377  UCSC  CASP9 (uc001awn.1) at chr1:15691382-15723377
> 1    15691382  15723377  Ref   NM_001229 at chr1:15691382-15723377
>
> Here's the hack:
>
> my @info = genome_coords(1, $db, $ua);
>
> print "Possibilities:\n";
> print join("\t", qw( chr start end src info )), "\n";
> foreach (@info) {
>     print join("\t", @{$_}{qw( chr start end src info )}), "\n";
> }
>
> # work done here...
> sub genome_coords {
>     my ($id, $db, $ua) = @_;
>     my $seq = $db->get_Seq_by_id($id);
>     my $ac = $seq->annotation;
>     for my $ann ($ac->get_Annotations('dblink')) {
>         if ($ann->database eq "UCSC") {
>             my $resp = $ua->get($ann->url);
>             my @a = $resp->content =~
>                 m{position=chr([0-9]+):([0-9]+)-([0-9]+)\&.*\&(known|ref)Gene.*?\">(.*?)}g;
>             my @ret;
>             while (@a) {
>                 push @ret, {
>                     'chr'   => shift @a,
>                     'start' => shift @a,
>                     'end'   => shift @a,
>                     'src'   => shift(@a) eq 'known' ? 'UCSC' : 'Ref',
>                     'info'  => shift @a
>                 };
>             }
>             return unless @ret;
>             return @ret;
>         }
>     }
>     return; # parse error, no UCSC link on page
> }
>
> Here are the details:
> The script on the wiki page gets coordinates by looking at a url
> under a link on the page: the database "ModelMaker" link, whose
> url is (after the 842 query):
>
> 'http://www.ncbi.nlm.nih.gov/mapview/modelmaker.cgi?taxid=9606&contig=NT_004610.19&from=2498878&to=2530877&gene=CASP9&lid=842'
>
> The script reads the 'from' and 'to' values directly from the text
> of this url to deliver the coordinates. This is somewhat hacky,
> since the assumption is the coordinates that ModelMaker wants
> (were you to actually visit the link, which the script doesn't do)
> are the ones you want. The hack above is slightly better, in that
> it finds a database url link and visits it, then parses the page that the
> link returns -- the UCSC page for geneid 842, more likely to
> have what you want. It's still hacky, in that the format of that page
> may change, and that may break the regexp. But by then, you'll
> be able to hack yourself out of that situation!
>
> cheers,
> Mark
>
> ----- Original Message -----
> From: "Emanuele Osimo"
> To: "perl bioperl ml"
> Sent: Friday, July 17, 2009 8:49 AM
> Subject: [Bioperl-l] Getting genomic coordinates for a list of genes
>
>> Hello everyone,
>> I'm new to programming, I'm a biologist, so please forgive my ignorance, but
>> I've been trying this for 2 weeks, now I have to ask you.
>> I'm trying the script I found at
>> http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates
>> because I need to have some variables (like $from and $to) assigned to the
>> start and end of a gene.
>> The script works fine, but gives me the wrong coordinates: for example if I
>> try it with the gene 842 (CASP9), it prints:
>> NT_004610.19 2498878 2530877
>>
>> I found out that in Entrez, for each gene (for CASP9, for example, at
>> http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq
>> ) under "Genome Reference Consortium Human Build 37 (GRCh37),
>> Primary_Assembly" there are two different sets of coordinates. The first is
>> called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37),
>> Primary_Assembly", and is the one I need, and the second one is called just
>> "NT_004610.19" and it's the one that the script prints.
>> This is valid for all the genes I tried.
>>
>> Do you know how to make the script print the "right" coordinates (at least,
>> the one I need)?
>> Thanks a lot in advance,
>> Emanuele

From maj at fortinbras.us Tue Jul 21 21:17:36 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 21 Jul 2009 21:17:36 -0400
Subject: [Bioperl-l] Bio::SimpleAlign constructor?
In-Reply-To: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com>
References: <56be91b60907160317r237a54c8v71d87e1ee4f4190b@mail.gmail.com>
Message-ID: <990CEF10B1AD4BD5BE9977FD62DB3437@NewLife>

Hi Paolo,

I think I see what you want to do; however, it doesn't quite work this way.
I'm supposing you want to specify something like

s1/3-6   attc
s2/7-10  gaag

and obtain output like

s1  --attc----
s2  ------gaag

But (and this is why LocatableSeqs are "locatable"), the alignment
described by the former data is always going to be

s1  attc
s2  gaag

so that I can query the alignment *column* number 1 and obtain the
residue coordinates of the original sequences in that column:

$loc = $aln->get_seq_by_pos(1)->location_from_column(1); # 3

or vice-versa

$col = $aln->column_from_residue_number( 's1', 3); # 1

As far as I know, you have to fill in the gaps yourself; a good exercise,
since you already have all the information you need, in having set up
the start and end coordinates (which are really the column coordinates
in this model).

If this wasn't what you had in mind, I apologize.

cheers,
Mark

----- Original Message -----
From: "Paolo Pavan"
To:
Sent: Thursday, July 16, 2009 6:17 AM
Subject: [Bioperl-l] Bio::SimpleAlign constructor?

> Hi,
> I have a brief question: I would like to know if there is a method to
> obtain a valid formatted and flush Bio::SimpleAlign object (i.e.
> properly filled with gaps on the right and on the left side of each
> sequence) given a bunch of Bio::LocatableSeq objects in which I have
> specified the -start and -end properties.
> Can anyone help me? Thank you very much,
>
> Paolo

From hartzell at alerce.com Wed Jul 22 11:51:26 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 22 Jul 2009 08:51:26 -0700
Subject: [Bioperl-l] GMAP IO?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32A804431BB@exchsth.agresearch.co.nz>
References: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32A80443089@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32A804431BB@exchsth.agresearch.co.nz>
Message-ID: <19047.13694.549947.411284@already.local>

Smithies, Russell writes:
> Is there an IO module for GMAP output?
> I looked in the obvious places (Bio::AlignIO, Google etc..)
> I know GMAP will do gff or psl output and there are parsers for
> those (they're SeqIO, not AlignIO?), but I was hoping there might be
> a better way as I've found GMAP's gff needs a bit of post-processing
> to make it usable.
>
> It's a fairly large job (mammalian refseqs vs. a mammalian genome)
> so I want to do it as efficiently as possible.
> [...]

What output format are you using?

I have a parser for the '-f 9' style of output, with tests, etc...,
but I built it at my Day Job and while I know that I can get it pushed
back to the repository I haven't dealt with it yet. If it'd be useful
I'll figure out what I need to do.

g.

From kellert at ohsu.edu Wed Jul 22 13:17:03 2009
From: kellert at ohsu.edu (Thomas Keller)
Date: Wed, 22 Jul 2009 10:17:03 -0700
Subject: [Bioperl-l] genbank (blast) alignments
Message-ID: <06C35F8D-1EE5-4882-8BF4-111311FBEEC4@ohsu.edu>

Greetings,

Blast 2.2.21 has a multi-sequence alignment feature that is really
handy: put in the accession number of the refseq in one sequence field
and a concatenated fasta file of the Sanger reads to align in the
second box and it does the alignments. Unfortunately, the output is a
series of alignments rather than the more useful msf format with all
reads aligned with the reference. Is there a bioperl module that reads
the blast alignments and converts them to an msf alignment?
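A possible starting point for the question above (an illustrative sketch, not part of the original post): Bio::SearchIO can parse a plain-text BLAST report, each HSP can hand back its query/hit pair as a Bio::SimpleAlign via get_aln(), and Bio::AlignIO can write that in msf format. Whether the 'blast' parser accepts the web multi-sequence report unchanged is an assumption here, and the sub name blast_report_to_msf and the file names are made up for the example. Note that this yields one pairwise MSF block per HSP, not a single merged alignment of all reads against the reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: write every HSP of a BLAST text report as its own
# pairwise alignment in MSF format. Sub and file names are hypothetical.
sub blast_report_to_msf {
    my ($report_file, $out_fh) = @_;

    # Loaded at run time so that the sub can be defined (and the file
    # compiled) even on a machine where BioPerl is not installed.
    require Bio::SearchIO;
    require Bio::AlignIO;

    my $in  = Bio::SearchIO->new(-format => 'blast', -file => $report_file);
    my $out = Bio::AlignIO->new(-format => 'msf', -fh => $out_fh);

    my $written = 0;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # get_aln() returns the query/hit pair as a Bio::SimpleAlign
                $out->write_aln($hsp->get_aln);
                $written++;
            }
        }
    }
    return $written;
}

# Usage (file names are placeholders):
#   perl blast2msf.pl report.txt > pairs.msf
if (@ARGV) {
    my $n = blast_report_to_msf($ARGV[0], \*STDOUT);
    warn "wrote $n pairwise alignments\n";
}
```

Stitching the pairwise blocks into one reference-anchored alignment would still have to be done by hand, using the query coordinates of each HSP.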
thanks,
Tom
kellert at ohsu.edu
503-494-2442

From hartzell at alerce.com Wed Jul 22 16:14:40 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 22 Jul 2009 13:14:40 -0700
Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization
In-Reply-To:
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com>
 <4A5E7CE7.4040908@cornell.edu>
 <4A5ED518.7010504@cornell.edu>
 <4A60ACC6.6020003@sendu.me.uk>
 <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk>
 <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu>
 <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk>
 <19043.61297.80141.781810@already.local>
Message-ID: <19047.29488.841282.578782@already.local>

Chris Fields writes:
> On Jul 19, 2009, at 11:15 PM, George Hartzell wrote:
>
> > Chris Fields writes:
> >> [...]
> >> Prior to Module::Build the Makefile.PL we just looked for the
> >> dependencies and reported back if they were missing; installation of
> >> those modules was left up to the user. [...]
> >
> > Chiming here a bit late to say that I really *like* it when we leave
> > installing the modules to the user. I'd often rather install them via
> > e.g. the FreeBSD ports system instead of system, but how/why would
> > BioPerl ever know that?
> >
> > g.
>
> That's a good point. Leaving it up to the user does make things a lot
> simpler.
>
> The only downside is the onslaught of users who don't know why a
> specific module doesn't work. May be the reason this was added in?
>

If we keep our dependencies current and write use_ok() style tests for
our modules so that

  ./Build test

fails when a dependency is missing I think that we've done our part of
the job. We might be able to pick up some automated way to check
dependencies (stolen from the autodepend Dist::Zilla plugin or
something) and increase our odds of staying on top of it.

Perl programmers need to know how to install dependencies using some
toolset (cpan, ports, packages, apt-get, etc...)
and understand how the pieces fit together. I'd *much* rather see us
do the standard CPAN best practice dependency thing and then spend our
time/effort building better tools and/or doing better science.

Even if we do Magic for them to make BioPerl appear to work they're
going to be stuck as soon as they try to use any other CPAN module
(and they *SHOULD* be using CPAN modules, but that's a different high
horse altogether...) and we've just ended up creating fragile code
that someone needs to support.

I'm working on moving my current project to use local::lib so that I
depend on a well defined set of installed stuff and while I think
local::lib sets up enough of an environment so that any automated cpan
installs would do the right thing I'd rather not have to trust it.

g.

From Russell.Smithies at agresearch.co.nz Wed Jul 22 16:40:08 2009
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 23 Jul 2009 08:40:08 +1200
Subject: [Bioperl-l] GMAP IO?
In-Reply-To: <19047.13694.549947.411284@already.local>
References: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32A80443089@exchsth.agresearch.co.nz>
 <18DF7D20DFEC044098A1062202F5FFF32A804431BB@exchsth.agresearch.co.nz>
 <19047.13694.549947.411284@already.local>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB0A0B5F@exchsth.agresearch.co.nz>

Hi George,

I'll have to admit, I've never used the "-f 9" table format but if you
have a parser you're willing to share, I'll give it a go.
Fortunately, GMAP is fairly quick once you have the indices built so I
can do a bit of experimenting to find a good compromise.

Thanx for your help,

--Russell

> -----Original Message-----
> From: George Hartzell [mailto:hartzell at alerce.com]
> Sent: Thursday, 23 July 2009 3:51 a.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] GMAP IO?
>
> Smithies, Russell writes:
> > Is there an IO module for GMAP output?
> > I looked in the obvious places (Bio::AlignIO, Google etc..)
> > I know GMAP will do gff or psl output and there are parsers for
> > those (they're SeqIO, not AlignIO?), but I was hoping there might be
> > a better way as I've found GMAP's gff needs a bit of post-processing
> > to make it usable.
> >
> > It's a fairly large job (mammalian refseqs vs. a mammalian genome)
> > so I want to do it as efficiently as possible.
> > [...]
>
> What output format are you using?
>
> I have a parser for the '-f 9' style of output, with tests, etc...,
> but I built it at my Day Job and while I know that I can get it pushed
> back to the repository I haven't dealt with it yet. If it'd be useful
> I'll figure out what I need to do.
>
> g.

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From cjfields at illinois.edu Wed Jul 22 19:30:11 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 22 Jul 2009 18:30:11 -0500
Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization
In-Reply-To: <19047.29488.841282.578782@already.local>
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com>
 <4A5E7CE7.4040908@cornell.edu>
 <4A5ED518.7010504@cornell.edu>
 <4A60ACC6.6020003@sendu.me.uk>
 <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk>
 <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu>
 <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk>
 <19043.61297.80141.781810@already.local>
 <19047.29488.841282.578782@already.local>
Message-ID: <901A5E0C-67C9-4286-B8CC-2BA811543D96@illinois.edu>

On Jul 22, 2009, at 3:14 PM, George Hartzell wrote:

> Chris Fields writes:
>> On Jul 19, 2009, at 11:15 PM, George Hartzell wrote:
>>
>>> Chris Fields writes:
>>>> [...]
>>>> Prior to Module::Build the Makefile.PL we just looked for the
>>>> dependencies and reported back if they were missing; installation of
>>>> those modules was left up to the user. [...]
>>>
>>> Chiming here a bit late to say that I really *like* it when we leave
>>> installing the modules to the user. I'd often rather install them via
>>> e.g. the FreeBSD ports system instead of system, but how/why would
>>> BioPerl ever know that?
>>>
>>> g.
>>
>> That's a good point. Leaving it up to the user does make things a lot
>> simpler.
>>
>> The only downside is the onslaught of users who don't know why a
>> specific module doesn't work. May be the reason this was added in?
>>
>
> If we keep our dependencies current and write use_ok() style tests for
> our modules so that
>
>   ./Build test
>
> fails when a dependency is missing I think that we've done our part of
> the job.
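To make the quoted suggestion concrete (an illustrative sketch, not part of the original exchange): a use_ok() style test simply tries to load each prerequisite and records a failure for any that is missing, so the test run reports the gap instead of the code dying later. The dependency list below is hypothetical; core modules stand in for a distribution's real prerequisites, and the file name t/00-deps.t is likewise only an example.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical dependency list: a real distribution would name its own
# prerequisites here. Core modules are used purely for illustration.
my @dependencies = qw(File::Spec List::Util Scalar::Util);

plan tests => scalar @dependencies;

# use_ok() records a failing test when a module cannot be loaded,
# which in turn makes the overall test run report a failure.
use_ok($_) for @dependencies;
```

Saved as, say, t/00-deps.t, a file like this would be picked up by the usual test run alongside the rest of the suite.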
> We might be able to pick up some automated way to check
> dependencies (stolen from the autodepend Dist::Zilla plugin or
> something) and increase our odds of staying on top of it.

We have added some bits to the test suite (largely thanks to Sendu)
for checking these things for us, so tests requiring a specific module
are not run and a warning is issued.

> Perl programmers need to know how to install dependencies using some
> toolset (cpan, ports, packages, apt-get, etc...) and understand how
> the pieces fit together. I'd *much* rather see us do the standard
> CPAN best practice dependency thing and then spend our time/effort
> building better tools and/or doing better science.

I agree.

> Even if we do Magic for them to make BioPerl appear to work they're
> going to be stuck as soon as they try to use any other CPAN module
> (and they *SHOULD* be using CPAN modules, but that's a different high
> horse altogether...) and we've just ended up creating fragile code
> that someone needs to support.

Exactly. As I mentioned before, I would rather any bugs be CPAN or
Module::Build problems, not BioPerl bugs.

> I'm working on moving my current project to use local::lib so that I
> depend on a well defined set of installed stuff and while I think
> local::lib sets up enough of an environment so that any automated cpan
> installs would do the right thing I'd rather not have to trust it.
>
> g.

I have thought about doing something along those lines for testing
purposes, just haven't had time yet to set it up.

chris

From cjfields1 at gmail.com Wed Jul 22 19:32:24 2009
From: cjfields1 at gmail.com (Chris Fields)
Date: Wed, 22 Jul 2009 18:32:24 -0500
Subject: [Bioperl-l] cdd-search with remoteblast?
In-Reply-To:
References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz>
 <46A05E0132144D73A0F805953B580B2F@jonas>
 <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz>
 <426C1893A5AD499DB4DBFEEBD257B254@jonas>
 <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com>
 <7BBF64FF-F531-4F7C-8A31-BD04FCE1BF1A@gmail.com>
Message-ID: <50CEDEF1-0BFD-42FB-9820-5AE21AA05C6F@gmail.com>

Malcolm, it's probably not you. Looks like the get/put parameters are
set as globals, so there may be cross-contamination of instances
(worth checking JIC). You can probably work around that to an extent
by encompassing any calls in blocks to localize changes.

chris

On Jul 21, 2009, at 11:59 AM, Cook, Malcolm wrote:

> Chris,
>
> I wound up adding a new test
>
> # $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $
>
> with the comment :
>
> # malcolm_cook at stowers.org: this test is in a separate file from
> # RemoteBlast.t (on which it is modelled) since there is some sort of
> # side-effecting between the multiple remote blasts that is causing
> # this test to fail, if it comes last, or the other test to fail, if
> # this one comes first. THIS IS A BUG EITHER IN REMOTE BLAST OR MY
> # UNDERSTANDING, i.e. of how to initialize it.
>
> In any case, the test passes and demos rpsblast usage.
>
> Cheers,
>
> Malcolm Cook
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields1 at gmail.com]
>> Sent: Friday, July 10, 2009 1:05 PM
>> To: Cook, Malcolm
>> Cc: 'Jonas Schaer'; 'BioPerl List'
>> Subject: Re: [Bioperl-l] cdd-search with remoteblast?
>>
>> Malcolm,
>>
>> Nice! Go ahead and add the test in; we can look at trying to
>> get CDD_SEARCH working at some point but this is a nice workaround.
>>
>> chris
>>
>> On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote:
>>
>>> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates
>>> the following.
Is it appropriate to submit it? >>> >>> Jonas, OK, I was a little quick on the gun... but I've got it now. >>> >>> You don't need to change the wrapper. Here is what you need to do: >>> >>> # 1) set your database like this: >>> >>> -database => 'cdsearch/cdd', # c.f. >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html >>> for other cdd database options >>> >>> # 2) add this line before submitting the job: >>> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast'; >>> >>> You're in - No other changes needed. >>> >>> Malcolm Cook >>> Stowers Institute for Medical Research - Kansas City, Missouri >>> >>> >>>> -----Original Message----- >>>> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de] >>>> Sent: Friday, July 10, 2009 4:18 AM >>>> To: BioPerl List; Cook, Malcolm; Chris Fields >>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>> >>>> Hi, >>>> I tried to do what Malcom proposed my ($prog = 'rpsblast'; >>>> my $db = >>>> 'CDD';) but that didn't work. >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: Value rpsblast for PUT parameter PROGRAM does not match >>>> expression t?blast[ pnx]. Rejecting. >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>> C:/Perl/site/lib/Bio/Tools >>>> /Run/RemoteBlast.pm:329 >>>> STACK: Bio::Tools::Run::RemoteBlast::new >>>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl >>>> ast.pm:257 >>>> STACK: blast_a_seq2.pm:14 >>>> ----------------------------------------------------------- >>>> So I should try to "change the wrapper to allow >> 'rpsblast'", right? >>>> Could You tell me how to do that, please? So sorry but I >> have no idea >>>> yet...:) If that doesn't work, is there any other way to run >>>> cdd-searches with perl? >>>> Thank you so much! 
>>>> Regards, Jonas >>>> >>>> ----- Original Message ----- >>>> From: "Chris Fields" >>>> To: "Cook, Malcolm" >>>> Cc: "'Jonas Schaer'" ; "'BioPerl List'" >>>> ; "'Smithies, Russell'" >>>> ; >>>> Sent: Thursday, July 09, 2009 9:19 PM >>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>> >>>> >>>>> I've scheduled this tentatively for the 1.6 release >> series (just not >>>>> sure when yet). It may work as is, but I haven't tried >> it out yet >>>>> (and am hazarding to guess it only retrieves the single >> main RID at >>>>> the moment). >>>>> >>>>> chris >>>>> >>>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: >>>>> >>>>>> Jonas, >>>>>> >>>>>> If you want to continue to use the bioperl remoteblast >> interface, >>>>>> probably what you should do is simply call it twice. >>>>>> >>>>>> Once, as you already know how to do, which will return >> without CDD >>>>>> results. >>>>>> >>>>>> Secondly, to get the CDD results, call remoteblast a second time. >>>>>> This time, using >>>>>> -database => 'CDD' >>>>>> -program => 'rpsblast' >>>>>> >>>>>> However, the wrapper may object to the 'rpsblast' >> program. It is >>>>>> not listed in the POD - >>>>>> >>>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R >>>> emoteBlast.pm) >>>>>> If so, my guess is that changing the perl wrapper to allow >>>>>> rpsblast will "just work" (tm). I've cc:ed >>>> cjfields at bioperl.org for >>>>>> his opinion on this. >>>>>> >>>>>> Also, you might want to perform the CDD search first, >> especially if >>>>>> you are streaming results to eyeball that might like >> something to >>>>>> look at while the second (presumably longer) search is running. 
>>>>>> >>>>>> Cheers, >>>>>> >>>>>> Malcolm Cook >>>>>> Stowers Institute for Medical Research - Kansas City, Missouri >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >> Of Jonas >>>>>>> Schaer >>>>>>> Sent: Thursday, July 09, 2009 5:16 AM >>>>>>> To: BioPerl List; Smithies, Russell >>>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>>>>> >>>>>>> Hi guys, >>>>>>> Thank you all so much for your help and patience :). Of >> course you >>>>>>> were right and I finaly found the right put-parameter to get >>>>>>> exactly the same hits as on the homepage. >>>>>>> I do have an other question though :)... >>>>>>> I now want to include a search for conserved domains, >> but when I >>>>>>> try to use the CDD_SEARCH-parameter >>>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >>>>>>> sub:CDD_SEARCH) >>>>>>> like the other put-parameters the way chris once told me(works >>>>>>> fine with the other params): >>>>>>> >>>>>>> my %put = ( >>>>>>> WORD_SIZE => 3, >>>>>>> HITLIST_SIZE => 100, >>>>>>> THRESHOLD => 11, >>>>>>> FILTER => 'R', >>>>>>> GENETIC_CODE => 1, >>>>>>> CDD_SEARCH => 'on' >>>>>>> ###I tried it >>>>>>> with 'true' and '1', too. >>>>>>> >>>>>>> ); >>>>>>> >>>>>>> for my $putName (keys %put) { >>>>>>> $factory->submit_parameter($putName,$put{$putName}); >>>>>>> } >>>>>>> >>>>>>> >>>>>>> ...an exception is thrown: >>>>>>> >>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>> MSG: CDD_SEARCH is not a valid PUT parameter. 
>>>>>>> STACK: Error::throw >>>>>>> STACK: Bio::Root::Root::throw >>>> C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>>>>> C:/Perl/site/lib/Bio/Tools >>>>>>> /Run/RemoteBlast.pm:325 >>>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383 >>>>>>> STACK: main::blast_it firsteval0.8.pm:288 >>>>>>> STACK: firsteval0.8.pm:35 >>>>>>> ----------------------------------------------------------- . >>>>>>> I guess somehow this could be the solution to my problem: >>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >>>>>>> ub:RID-for-Simultaneous >>>>>>> , but unfortunately I don't understand what to do. >>>>>>> I'm so sorry to bother you with this but please help me once >>>>>>> more...:) >>>>>>> >>>>>>> Best regards and thanks in advance, Jonas >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>> From: "Smithies, Russell" >>>>>>> To: "'Jonas Schaer'" >>>>>>> Cc: "'Chris Fields'" ; "'BioPerl List'" >>>>>>> >>>>>>> Sent: Monday, July 06, 2009 10:56 PM >>>>>>> Subject: RE: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>> >>>>>>> >>>>>>> Hi Jonas, >>>>>>> You can't just play with the BLAST parameters and hope >>>> for a "better" >>>>>>> result. >>>>>>> I'd suggest that if you aren't sure what they do, you >> should leave >>>>>>> them alone as small changes can make huge differences in the >>>>>>> output - it's quite possible to miss finding what >> you're looking >>>>>>> for by using >>>> the wrong >>>>>>> parameters. >>>>>>> If all else fails, read the blast manual: >>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >>>>>>> _all.html >>>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >>>>>>> Or Read Ian Korfs' excellent book: >>>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp >>>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >>>>>>> >>>>>>> Don't worry about the integer overflow bug as there's >> nothing you >>>>>>> can do about it. 
If you're interested, Google and Wikipedia are >>>>>>> your >>>>>>> friends: >>>>>>> http://en.wikipedia.org/wiki/Integer_overflow >>>>>>> >>>>>>> >>>>>>> Russell >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>>>>>>> To: BioPerl List; Chris Fields >>>>>>>> Subject: Re: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>> >>>>>>>> Hi guys, thanks for your answers so far. >>>>>>>> @jason: integer overflow in blast.... sorry, but what do >>>>>>> you mean by that? >>>>>>>> how can I fix it...? >>>>>>>> >>>>>>>> Since I never really changed any parameters I thought them >>>>>>> all to be >>>>>>>> default. >>>>>>>> whatever, I tried to get "better" results with my prog >>>> by changing >>>>>>>> these: >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>>>> >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>>>>>> STICS'} = >>>>>>>> '1'; >>>>>>>> with no effect...I guess these were default values anyway. >>>>>>>> >>>>>>>> So please maybe you can tell me all the other parameters I >>>>>>> can change with >>>>>>>> my >>>>>>>> perl-skript AND how to do that? >>>>>>>> Unfortunately both, perl and the blast-algorithm are pretty >>>>>>> much new to >>>>>>>> me, >>>>>>>> maybe thats why I just cannot find out how to do that on my >>>>>>> own... 
:/ >>>>>>>> >>>>>>>> Here is the output I get with my remote-blast skript: >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ################################### >>>>>>>> Query Name: >>>>>>>> >>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>>>>>> L >>>>>>>> hit name is ref|XP_001702807.1| >>>>>>>> score is 442 >>>>>>>> BLASTP 2.2.21+ >>>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>>>>>> A. Schaffer, >>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>>> Lipman (1997), >>>>>>>> "Gapped >>>>>>>> BLAST and PSI-BLAST: a new generation of protein >> database search >>>>>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>>>>> >>>>>>>> >>>>>>>> Reference for composition-based statistics: Alejandro A. >>>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>>>>>> John L. Spouge, >>>>>>>> Yuri >>>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001), >>>>>>> "Improving the >>>>>>>> accuracy of PSI-BLAST protein database searches with >>>>>>> composition-based >>>>>>>> statistics and other refinements", Nucleic Acids Res. >>>> 29:2994-3005. >>>>>>>> >>>>>>>> >>>>>>>> RID: 53STX5G2013 >>>>>>>> >>>>>>>> >>>>>>>> Database: All non-redundant GenBank CDS >>>>>>>> translations+PDB+SwissProt+PIR+PRF excluding >>>> environmental samples >>>>>>>> from WGS projects >>>>>>>> 9,252,587 sequences; 3,169,972,781 total >> letters Query= >>>>>>>> >>>>>>> >>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>>>>>> >>>>>>> >>>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA >>>> M >>>>>>>> ATGPDPDDEYE >>>>>>>> Length=150 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> Score >>>>>>>> E >>>>>>>> Sequences producing significant alignments: >>>>>>> (Bits) >>>>>>>> Value >>>>>>>> >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>>>> reinhard... 
174 >>>>>>>> 2e-42 >>>>>>>> >>>>>>>> >>>>>>>> ALIGNMENTS >>>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>> reinhardtii] >>>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>>>>>> Length=303 >>>>>>>> >>>>>>>> Score = 174 bits (442), Expect = 2e-42, Method: >>>>>>> Composition-based >>>>>>>> stats. >>>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>>>>>> Gaps = 0/150 >>>>>>>> (0%) >>>>>>>> >>>>>>>> Query 1 >>>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>>>>>> 60 >>>>>>>> >>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>>>> Sbjct 154 >>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>>>> 213 >>>>>>>> >>>>>>>> Query 61 >>>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> 120 >>>>>>>> >>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> Sbjct 214 >>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> 273 >>>>>>>> >>>>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct 274 >>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Database: All non-redundant GenBank CDS >>>>>>>> translations+PDB+SwissProt+PIR+PRF >>>>>>>> excluding environmental samples from WGS projects >>>>>>>> Posted date: Jul 5, 2009 4:41 AM Number of letters in >>>>>>>> database: -1,124,994,511 Number of sequences in database: >>>>>>>> 9,252,587 >>>>>>>> >>>>>>>> Lambda K H >>>>>>>> 0.309 0.122 0.345 >>>>>>>> Gapped >>>>>>>> Lambda K H >>>>>>>> 0.267 0.0410 0.140 >>>>>>>> Matrix: BLOSUM62 >>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of >> Sequences: >>>>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions: >>>>>>>> 1448367 Number of successful extensions: 2103 Number >> of sequences >>>>>>>> better than 10: 0 Number of HSP's better than 10 >> without gapping: >>>>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's 
successfully >>>>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781 >>>>>>>> Length adjustment: 113 Effective length of query: 37 Effective >>>>>>>> length of database: 2124430450 Effective search space: >>>>>>>> 78603926650 Effective search space used: 78603926650 >>>>>>>> T: 11 >>>>>>>> A: 40 >>>>>>>> X1: 16 (7.1 bits) >>>>>>>> X2: 38 (14.6 bits) >>>>>>>> X3: 64 (24.7 bits) >>>>>>>> S1: 42 (20.8 bits) >>>>>>>> S2: 74 (33.1 bits) >>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ################################### >>>>>>>> and here are the hits (?) of the blast-algorithm on the >>>>>>> ncbi-homepage with >>>>>>>> the same query of course: >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>>>> reinhard... 300 >>>>>>>> 3e-80 >>>>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >>>>>>> [Acyrtho... 36.2 >>>>>>>> 1.1 >>>>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >>>>>>> [Blautia... 35.4 >>>>>>>> 1.8 >>>>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >>>>>>> brazil... 34.3 >>>>>>>> 4.2 >>>>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 >>>>>>> [Aspergillus n... 33.5 >>>>>>>> 6.0 >>>>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 >>>>>>> [Methyloba... 33.5 >>>>>>>> 7.0 >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ###################################at >>>>>>>> least the first hit is the same, but even there there is a >>>>>>> different score >>>>>>>> and e-value. 
>>>>>>>> >>>>>>>> thanks so much for any help :) >>>>>>>> regards, jonas >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>> From: "Chris Fields" >>>>>>>> To: "Jason Stajich" >>>>>>>> Cc: "Smithies, Russell" >>>>>>> ; "'BioPerl >>>>>>>> List'" ; "'Jonas Schaer'" >>>>>>>> >>>>>>>> Sent: Monday, July 06, 2009 12:51 AM >>>>>>>> Subject: Re: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>> >>>>>>>> >>>>>>>>> That inspires confidence ;> >>>>>>>>> >>>>>>>>> chris >>>>>>>>> >>>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>>>>>>> >>>>>>>>>> integer overflow in blast.... >>>>>>>>>> >>>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>>>>>>> >>>>>>>>>>> I'd guess it's a difference in the parameters used. >>>>>>>>>>> Interesting that both have the number of letters in >> the db as >>>>>>>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>>>>>>> >>>>>>>>>>> Stats from your remote_blast: >>>>>>>>>>> >>>>>>>>>>> 'stats' => { >>>>>>>>>>> 'S1' => '42', >>>>>>>>>>> 'S1_bits' => '20.8', >>>>>>>>>>> 'lambda' => '0.309', >>>>>>>>>>> 'entropy' => '0.345', >>>>>>>>>>> 'kappa_gapped' => '0.0410', >>>>>>>>>>> 'T' => '11', >>>>>>>>>>> 'kappa' => '0.122', >>>>>>>>>>> 'X3_bits' => '24.7', >>>>>>>>>>> 'X1' => '16', >>>>>>>>>>> 'lambda_gapped' => '0.267', >>>>>>>>>>> 'X2' => '38', >>>>>>>>>>> 'S2' => '74', >>>>>>>>>>> 'seqs_better_than_cutoff' => '0', >>>>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>>>>>>> 'Hits_to_DB' => '60102303', >>>>>>>>>>> 'dbletters' => '-1125070205', >>>>>>>>>>> 'A' => '40', >>>>>>>>>>> 'num_successful_extensions' => '2004', >>>>>>>>>>> 'num_extensions' => '1436892', >>>>>>>>>>> 'X1_bits' => '7.1', >>>>>>>>>>> 'X3' => '64', >>>>>>>>>>> 'entropy_gapped' => '0.140', >>>>>>>>>>> 'dbentries' => '9252258', >>>>>>>>>>> 'X2_bits' => '14.6', >>>>>>>>>>> 'S2_bits' => '33.1' >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Stats from a blast done on the NCBI webpage: >>>>>>>>>>> >>>>>>>>>>> Database: All 
non-redundant GenBank CDS >>>>>>> translations+PDB+SwissProt >>>>>>>>>>> +PIR+PRF >>>>>>>>>>> excluding environmental samples from WGS projects >> Posted date: >>>>>>>>>>> Jul 4, 2009 4:41 AM Number of letters in database: >>>>>>>>>>> -1,125,070,205 Number of sequences in database: 9,252,258 >>>>>>>>>>> >>>>>>>>>>> Lambda K H >>>>>>>>>>> 0.309 0.124 0.340 >>>>>>>>>>> Gapped >>>>>>>>>>> Lambda K H >>>>>>>>>>> 0.267 0.0410 0.140 >>>>>>>>>>> Matrix: BLOSUM62 >>>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of >>>>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of >>>>>>>>>>> extensions: 3101413 Number of successful extensions: 9001 >>>>>>>>>>> Number of sequences better than 100: 65 Number of >> HSP's better >>>>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000 >>>>>>>>>>> Number of HSP's successfully gapped: 66 Length of >> query: 150 >>>>>>>>>>> Length of database: 3169897087 Length adjustment: 113 >>>>>>>>>>> Effective length of query: 37 Effective length of database: >>>>>>>>>>> 2124391933 Effective search space: 78602501521 Effective >>>>>>>>>>> search space used: 78602501521 >>>>>>>>>>> T: 11 >>>>>>>>>>> A: 40 >>>>>>>>>>> X1: 16 (7.1 bits) >>>>>>>>>>> X2: 38 (14.6 bits) >>>>>>>>>>> X3: 64 (24.7 bits) >>>>>>>>>>> S1: 42 (20.8 bits) >>>>>>>>>>> S2: 65 (29.6 bits) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l- >>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>>>>>>> To: BioPerl List >>>>>>>>>>>> Subject: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>>>>>> >>>>>>>>>>>> Hi again :) >>>>>>>>>>>> please, I only have this little question: >>>>>>>>>>>> why do I get different results with my remote::blast >>>>>>> perl skript >>>>>>>>>>>> then on the >>>>>>>>>>>> ncbi blast homepage? 
>>>>>>>>>>>> I am using blastp, the query is an amino-sequence >> (different >>>>>>>>>>>> results with any sequence, differences not only in >> number of >>>>>>>>>>>> hits but >>>> even in e- >>>>>>>>>>>> values, scores >>>>>>>>>>>> etc...), the database is 'nr'. >>>>>>>>>>>> PLEASE help me, >>>>>>>>>>>> thank you in advance, >>>>>>>>>>>> Jonas >>>>>>>>>>>> >>>>>>>>>>>> ps: my skript: >>>>>>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>>>>>> ## >>>>>>>>>>>> use Bio::Seq::SeqFactory; >>>>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my >>>>>>>>>>>> @blast_report; my $prog = 'blastp'; >>>>>>>>>>>> my $db = 'nr'; >>>>>>>>>>>> my $e_val= '1e-10'; >>>>>>>>>>>> #my $e_val= '10'; >>>>>>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>>>>>> '-data' => $db, >>>>>>>>>>>> '-expect' => $e_val, >>>>>>>>>>>> '-readmethod' => 'SearchIO' ); my $factory = >>>>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} >> = '11 1'; >>>>>>>>>>>> >> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = >> '10'; $ Bio >>>>>>>>>>>> >>>>>>> >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>>>>>>>>>> = '1'; >>>>>>>>>>>> >>>>>>>>>>>> my >>>>>>>>>>>> $ >>>>>>>>>>>> blast_seq >>>>>>>>>>>> >>>>>>> >>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>>>>>>>>>> >>>>>>>> >>>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN >>>>>>> AFRQAHQNTAMATGPD >>>>>>>>>>>> PDDEYE'; >>>>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my >>>>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq >>>>>>>>>>>> =>$blast_seq, >>>> -display_id => >>>>>>>>>>>> "$blast_seq"); >>>>>>>>>>>> my $filename='temp2.out'; >>>>>>>>>>>> my $r = 
$factory->submit_blast($seq); print STDERR >>>>>>>>>>>> "waiting..." if( $v > 0 ); while ( my @rids = >>>>>>>>>>>> $factory->each_rid ) { >>>>>>>>>>>> foreach my $rid ( @rids ) >>>>>>>>>>>> { >>>>>>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>>>>>> if( !ref($rc) ) >>>>>>>>>>>> { >>>>>>>>>>>> if( $rc < 0 ) >>>>>>>>>>>> { >>>>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>>>> } >>>>>>>>>>>> print STDERR "." if ( $v > 0 ); >>>>>>>>>>>> } >>>>>>>>>>>> else >>>>>>>>>>>> { >>>>>>>>>>>> my $result = $rc->next_result(); >>>>>>>>>>>> $factory->save_output($filename); >>>>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>>>> print "\nQuery Name: ", >>>>>>> $result->query_name(), >>>>>>>>>>>> "\n"; >>>>>>>>>>>> while ( my $hit = $result->next_hit ) >>>>>>>>>>>> { >>>>>>>>>>>> next unless ( $v > 0); >>>>>>>>>>>> print "\thit name is ", >>>> $hit->name, "\n"; >>>>>>>>>>>> while( my $hsp = $hit->next_hsp ) >>>>>>>>>>>> { >>>>>>>>>>>> print "\t\tscore is ", >>>>>>> $hsp->score, "\n"; >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> } >>>>>>>>>>>> @blast_report = get_file_data ($filename); return >>>>>>>>>>>> @blast_report; >>>>>>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>>>>>> #### >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>>> = >>>>>>>>>>> = >>>>>>>>>>> >>>>>>> >>>> >> ===================================================================== >>>>>>>>>>> Attention: The information contained in this message and/or >>>>>>>>>>> attachments from AgResearch Limited is intended only for the >>>>>>> persons or entities >>>>>>>>>>> to which it is addressed and may contain >> confidential and/or >>>>>>>>>>> privileged material. 
Any review, retransmission, >> dissemination >>>> or other use >>>>>>>>>>> of, or >>>>>>>>>>> taking of any action in reliance upon, this information >>>>>>> by persons or >>>>>>>>>>> entities other than the intended recipients is >> prohibited by >>>>>>>>>>> AgResearch Limited. If you have received this message in >>>>>>>>>>> error, >>>>>>> please notify >>>>>>>>>>> the >>>>>>>>>>> sender immediately. >>>>>>>>>>> = >>>>>>>>>>> = >>>>>>>>>>> >>>>>>> >>>> >> ===================================================================== >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Jason Stajich >>>>>>>>>> jason at bioperl.org >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> -------------------------------------------------------------- >>>>>>> ---------------- >>>>>>>> -- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> No virus found in this incoming message. >>>>>>>> Checked by AVG - www.avg.com >>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release >>>>>>> Date: 07/05/09 >>>>>>>> 05:53:00 >>>>>>> >>>>>>> >>>>>>> -------------------------------------------------------------- >>>>>>> ------------------ >>>>>>> >>>>>>> >>>>>>> >>>>>>> No virus found in this incoming message. 
From gmodhelp at googlemail.com Wed Jul 22 19:58:17 2009
From: gmodhelp at googlemail.com (Dave Clements, GMOD Help Desk)
Date: Wed, 22 Jul 2009 19:58:17 -0400
Subject: [Bioperl-l] [Gmod-gbrowse] plugin Aligner.pm and MySQL syntax
In-Reply-To: <4A6442C8.5030209@itb.cnr.it>
References: <4A6442C8.5030209@itb.cnr.it>
Message-ID: <71ee57c70907221658w10ca9c09h3e4c451c81bda11d@mail.gmail.com>

Hi Davide,

On my version of SeqFeature/Store/DBI/mysql.pm, the problem query is:

  # if no other criteria are specified, then
  # only fetch indexed (i.e. top level objects)
  @where = 'indexed=1' unless @where;

  my $from = join ', ', @from;
  my $where = join ' AND ',map {"($_)"} @where;
  my $group = join ', ', @group;
  $group = "GROUP BY $group" if @group;

  my $query = <_print_query($query, @args) if DEBUG || $self->debug;

  my $sth = $self->_prepare($query);
  $sth->execute(@args) or $self->throw($sth->errstr);

Which looks OK to me.
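For illustration only: the snippet below is a standalone sketch, not code from mysql.pm. The table names and the empty condition are made up, and nothing in this thread confirms that an empty fragment is the actual cause. It just shows how the join pattern quoted above can yield SQL that fails near ')' if one of the collected fragments is empty.

```perl
use strict;
use warnings;

# Hypothetical fragments standing in for whatever upstream code collects.
# The empty string in @where is the assumed fault, not a confirmed one.
my @from  = ('feature AS f', 'name AS n');
my @where = ('f.id=n.id', '');

my $from  = join ', ', @from;
my $where = join ' AND ', map {"($_)"} @where;

# The second condition is rendered as "()", which MySQL rejects.
print "SELECT f.id\n  FROM $from\n WHERE $where\n";
```

An empty element in @from or @group would leave a stray comma in the same way, so printing the final query string (the module's debug option appears to do that via _print_query) would be a reasonable first check.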
However, $from, $where and $group are set in many places upstream, and
that is where the extra comma (or missing field name) is, I think.

I don't see how this can be a user-caused problem. My guess is that this
is a problem in SeqFeature, rather than in GBrowse per se, so I'm going to
also add BioPerl to this thread (and maybe someone there will send it back
:-).

Dave C.

On Mon, Jul 20, 2009 at 6:11 AM, Davide Rambaldi CNR <
davide.rambaldi at itb.cnr.it> wrote:

> Hi, I have a server with an instance of gbrowse version 1.69 (with MySQL
> DBI adaptor)
>
> I have decided to add the plugin Aligner.pm and I have as result an
> empty page and this EXCEPTION:
>
> -------------------- EXCEPTION --------------------, referer:
> http://155.253.41.215/cgi-bin/gbrowse/charcot/
> MSG: You have an error in your SQL syntax; check the manual that
> corresponds to your MySQL server version for the right syntax to use
> near '), referer: http://155.253.41.215/cgi-bin/gbrowse/charcot/
>
> )' at line 12, referer: http://155.253.41.215/cgi-bin/gbrowse/charcot/
>
> STACK Bio::DB::SeqFeature::Store::DBI::mysql::_features
> /usr/local/share/perl/5.10.0/Bio/DB/SeqFeature/Store/DBI/mysql.pm:851,
> referer: http://155.253.41.215/cgi-bin/gbrowse/charcot/
> STACK Bio::DB::SeqFeature::Store::features
> /usr/local/share/perl/5.10.0/Bio/DB/SeqFeature/Store.pm:1067, referer:
> http://155.253.41.215/cgi-bin/gbrowse/charcot/
> STACK Bio::DB::SeqFeature::Segment::features
> /usr/local/share/perl/5.10.0/Bio/DB/SeqFeature/Segment.pm:201, referer:
> http://155.253.41.215/cgi-bin/gbrowse/charcot/
> STACK Bio::Graphics::Browser::Plugin::Aligner::dump
> /etc/apache2/gbrowse.conf/plugins/Aligner.pm:159, referer:
> http://155.253.41.215/cgi-bin/gbrowse/charcot/
> STACK main::do_plugin_dump /usr/lib/cgi-bin/gbrowse:3439, referer:
> http://155.253.41.215/cgi-bin/gbrowse/charcot/
> STACK toplevel /usr/lib/cgi-bin/gbrowse:237, referer:
> http://155.253.41.215/cgi-bin/gbrowse/charcot/
>
-------------------------------------------, referer:
> http://155.253.41.215/cgi-bin/gbrowse/charcot/
>
>
> Seems that the query to MySQL has a syntax error... any solution
> available for this bug?
>
> Thanks
>
> Davide R
>
> _______________________________________________
> Gmod-gbrowse mailing list
> Gmod-gbrowse at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>

--
* Register now for the August GMOD Meeting:
  http://gmod.org/wiki/August_2009_GMOD_Meeting
* Please keep responses on the list!
* Was this helpful? Let us know at http://gmod.org/wiki/Help_Desk_Feedback

From hlapp at gmx.net Wed Jul 22 20:03:11 2009
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 22 Jul 2009 20:03:11 -0400
Subject: [Bioperl-l] ubuntu installation
Message-ID: <2EDCC4BB-5653-444A-BB6F-6CB415509652@gmx.net>

Apparently when you say "apt-get install bioperl" on Ubuntu you get the
1.5.2 release. I don't know anything about BioPerl packaging on Ubuntu
but thought I'd check whether someone who knows (or who maintains that
package?) is here and can offer some advice or comment?

See
http://phylr-gsoc.blogspot.com/2009/07/setting-up-biosql-101-on-postgresql-83.html

Dazhi is one of our Google Summer of Code students.

-hilmar

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Wed Jul 22 20:23:09 2009
From: maj at fortinbras.us (Mark A.
Jensen) Date: Wed, 22 Jul 2009 20:23:09 -0400 Subject: [Bioperl-l] ubuntu installation In-Reply-To: <2EDCC4BB-5653-444A-BB6F-6CB415509652@gmx.net> References: <2EDCC4BB-5653-444A-BB6F-6CB415509652@gmx.net> Message-ID: <18C32CE6BDA043FDB253288D48B48751@NewLife> Bioperl-max ( my public VM on Amazon ) is Ubuntu Hardy-- I set up everything (live, db, tools, biosql) from an svn checkout of the various trunks, and built by 'perl Build.PL; ./Build ; ./Build test; ./Build install'. (Also needed to apt-get svn; no problem.) Worked fine. If Dazhi wants a particular release, he could check out a release tag instead of trunk. More info at http://www.bioperl.org/wiki/Using_Subversion MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "BioPerl List" Cc: "Dazhi Jiao" Sent: Wednesday, July 22, 2009 8:03 PM Subject: [Bioperl-l] ubuntu installation > Apparently when you say "apt-get install bioperl" on Ubuntu you get the 1.5.2 > release. I don't know anything about BioPerl packaging on Ubuntu but thought > I'd check whether someone who knows (or who maintains that package?) is here > and can offer some advice or comment? > > See > http://phylr-gsoc.blogspot.com/2009/07/setting-up-biosql-101-on-postgresql-83.html > Dazhi is one of our Google Summer of Code students. 
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>

From rmb32 at cornell.edu Wed Jul 22 20:41:15 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 22 Jul 2009 17:41:15 -0700
Subject: [Bioperl-l] ubuntu installation
In-Reply-To: <2EDCC4BB-5653-444A-BB6F-6CB415509652@gmx.net>
References: <2EDCC4BB-5653-444A-BB6F-6CB415509652@gmx.net>
Message-ID: <4A67B1AB.4060005@cornell.edu>

According to `apt-cache show bioperl` on Ubuntu, the Debian/Ubuntu
bioperl package is maintained by the Debian-Med Packaging Team, meaning
they are the ones who update and maintain it. I wonder who 'they' are?
Are any of you Debian Med packagers on this list?

Looking at
http://packages.debian.org/search?suite=all&searchon=names&keywords=bioperl
it looks like 1.6 has made it into Debian testing and unstable, so I
would guess that it will be in the next Ubuntu release.

However, looking at the Ubuntu packages site,
http://packages.ubuntu.com/search?keywords=bioperl&searchon=names&suite=all&section=all
, it looks like Karmic, the next release, still has 1.5.2 in it. This
could be a problem. Perhaps we should email the Ubuntu MOTU Developers,
who are maintaining the Ubuntu version of the package, and ask them to
make sure that 1.6 gets into Karmic?

Rob

Hilmar Lapp wrote:
> Apparently when you say "apt-get install bioperl" on Ubuntu you get the
> 1.5.2 release. I don't know anything about BioPerl packaging on Ubuntu
> but thought I'd check whether someone who knows (or who maintains that
> package?) is here and can offer some advice or comment?
>
> See
> http://phylr-gsoc.blogspot.com/2009/07/setting-up-biosql-101-on-postgresql-83.html
>
> Dazhi is one of our Google Summer of Code students.
>
> -hilmar
>

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From charles-listes+bioperl at plessy.org Wed Jul 22 20:46:29 2009
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Thu, 23 Jul 2009 09:46:29 +0900
Subject: [Bioperl-l] ubuntu installation
In-Reply-To: <2EDCC4BB-5653-444A-BB6F-6CB415509652@gmx.net>
References: <2EDCC4BB-5653-444A-BB6F-6CB415509652@gmx.net>
Message-ID: <20090723004629.GC30276@kunpuu.plessy.org>

On Wed, Jul 22, 2009 at 08:03:11PM -0400, Hilmar Lapp wrote:
> Apparently when you say "apt-get install bioperl" on Ubuntu you get the
> 1.5.2 release. I don't know anything about BioPerl packaging on Ubuntu
> but thought I'd check whether someone who knows (or who maintains that
> package?) is here and can offer some advice or comment?
>
> See http://phylr-gsoc.blogspot.com/2009/07/setting-up-biosql-101-on-postgresql-83.html
> Dazhi is one of our Google Summer of Code students.

Dear Hilmar,

BioPerl 1.6 is only available in the development version of Ubuntu. You
can refer to the following page:

https://launchpad.net/ubuntu/+source/bioperl

That said, I think that the current stable release contains everything
needed for the installation of the latest package.
Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

From charles-listes+bioperl at plessy.org Wed Jul 22 21:06:08 2009
From: charles-listes+bioperl at plessy.org (Charles Plessy)
Date: Thu, 23 Jul 2009 10:06:08 +0900
Subject: [Bioperl-l] ubuntu installation
In-Reply-To: <4A67B1AB.4060005@cornell.edu>
References: <2EDCC4BB-5653-444A-BB6F-6CB415509652@gmx.net> <4A67B1AB.4060005@cornell.edu>
Message-ID: <20090723010608.GD30276@kunpuu.plessy.org>

On Wed, Jul 22, 2009 at 05:41:15PM -0700, Robert Buels wrote:
> According to `apt-cache show bioperl` on ubuntu, the Debian/Ubuntu
> bioperl packages is maintained by the Debian-Med Packaging Team
> , meaning they are the
> ones who update and maintain it. I wonder who 'they' are? Are any of
> you Debian Med packagers on this list?
>
> Looking at
> http://packages.debian.org/search?suite=all&searchon=names&keywords=bioperl
> it looks like 1.6 has made it into Debian testing and unstable, so I
> would guess that it would probably be in the next ubuntu release, I
> would think.
>
> However, looking at the ubuntu packages site,
> http://packages.ubuntu.com/search?keywords=bioperl&searchon=names&suite=all&section=all
> , it looks like karmic, the next release still has 1.5.2 in it. This
> could be a problem. Perhaps we should email the Ubuntu MOTU Developers
> , who are maintaining the ubuntu version
> of the package and ask them to make sure that 1.6 gets into Karmic?

Hello Robert,

there is a strange discrepancy between packages.ubuntu.com and
launchpad.net, and it seems that bioperl 1.6 is already in Karmic. Indeed,
the source download links in packages.ubuntu.com point to that version. I
just notified the contact person for packages.u.c of this problem.
http://packages.ubuntu.com/karmic/bioperl
https://launchpad.net/ubuntu/+source/bioperl

The Ubuntu packages are directly synchronised from Debian and there is no
specific Ubuntu maintainer for them, but it seems that requests through
Launchpad bugs are answered, as in the case of the latest request for
synchronisation
(https://bugs.launchpad.net/ubuntu/+source/bioperl/+bug/324001).

I am not able to do tests on Ubuntu as I am only using Debian, so I
recommend reporting problems to the distribution you use first. I monitor
both and this mailing list anyway.

By the way, feel free to make any suggestion or comment on our mailing
list (debian-med at lists.debian.org) if you think that some key
bioinformatics programs are missing on your Debian or Ubuntu systems.
Also, we welcome new team members, so when you meet people who wonder how
to contribute to open source bioinformatics without writing software, do
not hesitate to suggest that they join our packaging team :)

Have a nice day,

--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan

From hartzell at alerce.com Wed Jul 22 22:03:09 2009
From: hartzell at alerce.com (George Hartzell)
Date: Wed, 22 Jul 2009 19:03:09 -0700
Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization
In-Reply-To: <901A5E0C-67C9-4286-B8CC-2BA811543D96@illinois.edu>
References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <19043.61297.80141.781810@already.local> <19047.29488.841282.578782@already.local> <901A5E0C-67C9-4286-B8CC-2BA811543D96@illinois.edu>
Message-ID: <19047.50397.694196.227661@already.local>

Chris Fields writes:
> On Jul 22, 2009, at 3:14 PM, George Hartzell wrote:
>
> > Chris Fields
writes:
> >> On Jul 19, 2009, at 11:15 PM, George Hartzell wrote:
> >>
> >>> Chris Fields writes:
> >>>> [...]
> >>>> Prior to Module::Build the Makefile.PL we just looked for the
> >>>> dependencies and reported back if they were missing; installation
> >>>> of
> >>>> those modules was left up to the user. [...]
> >>>
> >>> Chiming here a bit late to say that I really *like* it when we leave
> >>> installing the modules to the user. I'd often rather install them
> >>> via
> >>> e.g. the FreeBSD ports system instead of system, but how/why would
> >>> BioPerl ever know that?
> >>>
> >>> g.
> >>
> >> That's a good point. Leaving it up to the user does make things a
> >> lot
> >> simpler.
> >>
> >> The only downside is the onslaught of users who don't know why a
> >> specific module doesn't work. May be the reason this was added in?
> >>
> >
> > If we keep our dependencies current and write use_ok() style tests for
> > our modules so that
> >
> >   ./Build test
> >
> > fails when a dependency is missing I think that we've done our part of
> > the job. We might be able to pick up some automated way to check
> > dependencies (stolen from the autodepend Dist::Zilla plugin or
> > something) and increase our odds of staying on top of it.
>
> We have added some bits to the test suite (largely thanks to Sendu)
> for checking these things for us, so tests requiring a specific module
> are not run and a warning is issued.
> [...]

What I was describing is a layer simpler than what Sendu et al. have
done. In a module testing Foo, Bar::Bah, instead of

  use Foo;
  use Bar::Bah;

use

  BEGIN {
    use_ok('Foo');
    use_ok('Bar::Bah');
  }

and that way if there's some missing dependency that wasn't properly
specified/dealt with then it's handled as part of the framework,
"rather than just vomiting if its load fails" (in the words of the
Test::More author).

g.

From maj at fortinbras.us Wed Jul 22 22:11:25 2009
From: maj at fortinbras.us (Mark A.
Jensen) Date: Wed, 22 Jul 2009 22:11:25 -0400 Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization In-Reply-To: <19047.50397.694196.227661@already.local> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com><4A5E7CE7.4040908@cornell.edu><4A5ED518.7010504@cornell.edu><4A60ACC6.6020003@sendu.me.uk><4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk><1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu><934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk><19043.61297.80141.781810@already.local><19047.29488.841282.578782@already.local><901A5E0C-67C9-4286-B8CC-2BA811543D96@illinois.edu> <19047.50397.694196.227661@already.local> Message-ID: chiming in to +1 on this; seems very 'natural' (i.e. plugs into to the Perl Common Sense, approximated by perl core modules + CPAN) ----- Original Message ----- From: "George Hartzell" To: "Chris Fields" Cc: "Robert Buels" ; "BioPerl List" ; "George Hartzell" ; "Mark Jensen" Sent: Wednesday, July 22, 2009 10:03 PM Subject: Re: [Bioperl-l] Regarding Bio::Root::Build,was Re: bioperl reorganization > Chris Fields writes: > > On Jul 22, 2009, at 3:14 PM, George Hartzell wrote: > > > > > Chris Fields writes: > > >> On Jul 19, 2009, at 11:15 PM, George Hartzell wrote: > > >> > > >>> Chris Fields writes: > > >>>> [...] > > >>>> Prior to Module::Build the Makefile.PL we just looked for the > > >>>> dependencies and reported back if they were missing; installation > > >>>> of > > >>>> those modules was left up to the user. [...] > > >>> > > >>> Chiming here a bit late to say that I really *like* it when we leave > > >>> installing the modules to the user. I'd often rather install them > > >>> via > > >>> e.g. the FreeBSD ports system instead of system, but how/why would > > >>> BioPerl ever know that? > > >>> > > >>> g. > > >> > > >> That's a good point. Leaving it up to the user does make things a > > >> lot > > >> simpler. 
> > >> > > >> The only downside is the onslaught of users who don't know why a > > >> specific module doesn't work. May be the reason this was added in? > > >> > > > > > > If we keep our dependencies current and write use_ok() style tests for > > > our modules so that > > > > > > ./Build test > > > > > > fails when a dependency is missing I think that we've done our part of > > > the job. We might be able to pick up some automated way to check > > > dependencies (stolen from the autodepend Dist::Zilla plugin or > > > something) and increase our odds of staying on top of it. > > > > We have added some bits to the test suite (largely thanks to Sendu) > > for checking these things for us, so tests requiring a specific module > > are not run and a warning is issued. > > [...] > > What I was describing is a layer simpler than what Sendu et al. have > done. In a module testing Foo, Bar::Bah, instead of > > use Foo; > use Bar::Bah; > > use > > BEGIN { > use_ok('Foo'); > use_ok('Bar::Bah'); > } > > and that way if there's some missing dependency that wasn't properly > specified/dealt with then it's handled as part of the framework, > "rather than just vomiting if its load fails" (in the words of the > Test::More author). > > g. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Marc.Perry at oicr.on.ca Thu Jul 23 01:07:45 2009 From: Marc.Perry at oicr.on.ca (Marc Perry) Date: Thu, 23 Jul 2009 01:07:45 -0400 Subject: [Bioperl-l] QUERY: Correct use of Bio::SeqIO::largefasta? Message-ID: Hi, I would like to use position weight matrices to scan chromosome sized sequences to identify variants of some of my favorite DNA elements. I was able to install the Bioperl compliant TFBS_0.5 modules and use the documentation to create different objects (e.g., PFM, PWM, ICM) and all of the methods I tested seemed to work as expected. 
However, when I tried to create sequence objects to search I seem to have
hit a wall of sorts. Below is my script; almost all of it is cut and
pasted from the TFBS::Matrix::PFM POD, or the PODs I cite in the comments.
It works fine for a fasta file containing ~ 175,000 nt of worm DNA, but
balks on fasta files of 1.7 Mb (and larger) [Output and Error messages
included below]. Feeling rather new to Object oriented programming, I also
ran the script through the Perl debugger (step-by-step) and convinced
myself that Bio::SeqIO::largefasta was indeed opening filehandles in /tmp
sub-directories.

My reading of the PODs for LargeSeq, LargePrimarySeq, and LargeSeqI makes
it sound like I should be manually chopping the chromosomes up into
smaller, digestible chunks (probably with a loop?), but thus far I haven't
been able to find any good examples that people have commonly used (and
examples I have found don't seem to "chunk").

Further details:
Bioperl version 1.0069
Perl version 5.10.0
Linux Ubuntu 8.10 with 512 Mb of RAM (because I am running it as a virtual
machine using VM-ware running on a Windows XP host).

Thanks in advance for your feedback,

--Marc Perry
Ontario Institute for Cancer Research

Code:

  #!/usr/bin/perl
  use warnings;
  use strict;
  use TFBS::Matrix::PFM;
  use Bio::Seq;
  use Bio::SeqIO;

  my $matrixref = [ [ 5, 5, 5, 5, 5, 5, 85, 5, 5, 5, 5, 85],
                    [ 5, 5, 5, 5, 5, 85, 5, 5, 5, 5, 85, 5],
                    [ 5, 5, 85, 85, 5, 5, 5, 85, 5, 85, 5, 5],
                    [ 85, 85, 5, 5, 85, 5, 5, 5, 85, 5, 5, 5]
                  ];

  my $pfm = TFBS::Matrix::PFM->new(-matrix => $matrixref,
                                   -name => "CeRep_matrix_1",
                                   -ID => "M1000"
                                  );

  my $pwm = $pfm->to_PWM(); # convert to position weight matrix

  my $stream = Bio::SeqIO->new(-format => 'largefasta',
                               -fh => \*ARGV); # from Bio::SeqIO POD

  my $seq = $stream->next_seq(); # from Bio::SeqIO::largefasta POD

  my $siteset = $pwm->search_seq(-seqobj => $seq,
                                 -threshold => "75%");

  print $siteset->GFF();

  exit;

Output from 1.7 Megabase DNA fragment:

GET_SEQUENCE: Sequence too long.
LOOP_ON_SEQS: get_sequence failed.
MAIN: loop_on_seqs failed.

Output from 175,000 bp DNA fragment:

CHROMOSOME_X TFBS TF binding site 33912 33923 12.950 - 0 TF CeRep_matrix_1 ; class Unknown ; score "12.950" ; sequence ttgggcagagca
CHROMOSOME_X TFBS TF binding site 33988 33999 16.494 - 0 TF CeRep_matrix_1 ; class Unknown ; score "16.494" ; sequence ttggtcagtgta
CHROMOSOME_X TFBS TF binding site 74439 74450 12.950 + 0 TF CeRep_matrix_1 ; class Unknown ; score "12.950" ; sequence ttaatcagtgca
CHROMOSOME_X TFBS TF binding site 74470 74481 16.494 + 0 TF CeRep_matrix_1 ; class Unknown ; score "16.494" ; sequence ttggtcagtgcg
CHROMOSOME_X TFBS TF binding site 74535 74546 12.950 + 0 TF CeRep_matrix_1 ; class Unknown ; score "12.950" ; sequence ttggacagtgaa
CHROMOSOME_X TFBS TF binding site 74567 74578 12.950 + 0 TF CeRep_matrix_1 ; class Unknown ; score "12.950" ; sequence atggtcagggca
CHROMOSOME_X TFBS TF binding site 103365 103376 12.950 + 0 TF CeRep_matrix_1 ; class Unknown ; score "12.950" ; sequence tcggttagtgca
CHROMOSOME_X TFBS TF binding site 175608 175619 12.950 - 0 TF CeRep_matrix_1 ; class Unknown ; score "12.950" ; sequence ttggtctgttca

Data (truncated to conserve space):

>CHROMOSOME_X
ctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcct
aagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa
gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagc
ctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcct
aagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa
gcctaagcctaatctgtgctccaaagccttcgaactgacggacttgtgtc

From cjfields at illinois.edu Thu Jul 23 01:51:36 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 23 Jul 2009 00:51:36 -0500
Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization
In-Reply-To:
References:
<4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com><4A5E7CE7.4040908@cornell.edu><4A5ED518.7010504@cornell.edu><4A60ACC6.6020003@sendu.me.uk><4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk><1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu><934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk><19043.61297.80141.781810@already.local><19047.29488.841282.578782@already.local><901A5E0C-67C9-4286-B8CC-2BA811543D96@illinois.edu> <19047.50397.694196.227661@already.local>
Message-ID: <0038B92D-F3FC-4C85-85BB-7422A6557CBA@illinois.edu>

Unless I'm misreading you I think that's how we're currently running things, for instance in Annotation.t:

BEGIN {
    use lib '.';
    use Bio::Root::Test;

    test_begin(-tests => 158);

    use_ok('Bio::Annotation::Collection');
    use_ok('Bio::Annotation::DBLink');
    use_ok('Bio::Annotation::Comment');
    use_ok('Bio::Annotation::Reference');
    use_ok('Bio::Annotation::SimpleValue');
    use_ok('Bio::Annotation::Target');
    use_ok('Bio::Annotation::AnnotationFactory');
    use_ok('Bio::Annotation::StructuredValue');
    use_ok('Bio::Annotation::TagTree');
    use_ok('Bio::Annotation::Tree');
    use_ok('Bio::Seq');
    use_ok('Bio::SimpleAlign');
    use_ok('Bio::Cluster::UniGene');
}

The critical difference is we check (and skip tests for) bioperl modules which have a 'recommends' module dependency based on Build.PL, with requisite messages. Bio::Root::Test has methods built in that check for modules required for a specific set of tests.

We're going by the Module::Build definition of 'requires'/'recommends':

1) requires - Items that are necessary for basic functioning.

2) recommends - Items that are recommended for enhanced functionality, but there are ways to use this distribution without having them installed. You might also think of this as "can use" or "is aware of" or "changes behavior in the presence of".
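To make the skip-on-missing-'recommends' idea concrete, here is a minimal sketch using only Test::More. It is not the Bio::Root::Test implementation; the helper module_available() and the module name Some::Optional::Module are made up for illustration.

```perl
use strict;
use warnings;
use Test::More;

# Returns 1 if the named module can be loaded, 0 otherwise.
# (Hypothetical helper, not part of Bio::Root::Test.)
sub module_available {
    my ($module) = @_;
    return eval "require $module; 1" ? 1 : 0;
}

# Hypothetical 'recommends' dependency, used only for illustration.
my $optional = 'Some::Optional::Module';

SKIP: {
    # Skip the two tests in this block, with a message, when the
    # optional module is missing, instead of letting them die.
    skip "$optional not installed; skipping its tests", 2
        unless module_available($optional);

    ok( $optional->can('new'), "$optional has a constructor" );
    ok( 1, 'further tests that need the optional module' );
}

# Tests that rely only on 'requires' modules always run.
ok( module_available('strict'), 'a required (core) module loads' );

done_testing();
```

With the optional module absent, the two tests inside the SKIP block are reported as skipped with the message, and the rest of the file still runs.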
The distribution is so large it's hard to require the user to install all modules, particularly those used by one or two 'non-essential' modules, so we deem those as 'recommends'. By non-essential, we mean they don't crash anything beyond their own tests (another reason I pushed for, and we eventually decided to, split up tests prior to the initial 1.6 release). This is also one key reason to split bioperl into more manageable bits; splitting up tests also makes that easier (speaking of, my example above, Annotation.t needs to be split up based on each Annotation type). If anything, our 'required' modules should be a good starting point for what we consider 'core', with everything else requiring additional 'recommended' dependencies split off. chris On Jul 22, 2009, at 9:11 PM, Mark A. Jensen wrote: > chiming in to +1 on this; seems very 'natural' (i.e. plugs into to > the Perl Common Sense, approximated by perl core modules + CPAN) > ----- Original Message ----- From: "George Hartzell" > > To: "Chris Fields" > Cc: "Robert Buels" ; "BioPerl List" >; "George Hartzell" ; "Mark Jensen" > > Sent: Wednesday, July 22, 2009 10:03 PM > Subject: Re: [Bioperl-l] Regarding Bio::Root::Build,was Re: bioperl > reorganization > > >> Chris Fields writes: >> > On Jul 22, 2009, at 3:14 PM, George Hartzell wrote: >> > >> > > Chris Fields writes: >> > >> On Jul 19, 2009, at 11:15 PM, George Hartzell wrote: >> > >> >> > >>> Chris Fields writes: >> > >>>> [...] >> > >>>> Prior to Module::Build the Makefile.PL we just looked for the >> > >>>> dependencies and reported back if they were missing; >> installation >> > >>>> of >> > >>>> those modules was left up to the user. [...] >> > >>> >> > >>> Chiming here a bit late to say that I really *like* it when >> we leave >> > >>> installing the modules to the user. I'd often rather install >> them >> > >>> via >> > >>> e.g. the FreeBSD ports system instead of system, but how/why >> would >> > >>> BioPerl ever know that? >> > >>> >> > >>> g. 
>> > >> >> > >> That's a good point. Leaving it up to the user does make >> things a >> > >> lot >> > >> simpler. >> > >> >> > >> The only downside is the onslaught of users who don't know why a >> > >> specific module doesn't work. May be the reason this was >> added in? >> > >> >> > > >> > > If we keep our dependencies current and write use_ok() style >> tests for >> > > our modules so that >> > > >> > > ./Build test >> > > >> > > fails when a dependency is missing I think that we've done our >> part of >> > > the job. We might be able to pick up some automated way to check >> > > dependencies (stolen from the autodepend Dist::Zilla plugin or >> > > something) and increase our odds of staying on top of it. >> > >> > We have added some bits to the test suite (largely thanks to Sendu) >> > for checking these things for us, so tests requiring a specific >> module >> > are not run and a warning is issued. >> > [...] >> >> What I was describing is a layer simpler than what Sendu et al. have >> done. In a module testing Foo, Bar::Bah, instead of >> >> use Foo; >> use Bar::Bah; >> >> use >> >> BEGIN { >> use_ok('Foo'); >> use_ok('Bar::Bah'); >> } >> >> and that way if there's some missing dependency that wasn't properly >> specified/dealt with then it's handled as part of the framework, >> "rather than just vomiting if its load fails" (in the words of the >> Test::More author). >> >> g. 
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Thu Jul 23 07:31:13 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 Jul 2009 12:31:13 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> Message-ID: <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com> On Wed, Jul 8, 2009 at 5:24 PM, Chris Fields wrote: > > It would be nice to get some regression tests going for this to make sure it > does what we expect, so maybe some test data and expected results? > Regression tests for BioPerl's FASTQ support would of course be sensible. In terms of sample data and expected results... I've got some test files put together for Biopython, and I have been cross checking Biopython's FASTQ support against EMBOSS 6.1.0 which has turned up a few issues: http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000577.html ------------------------------------------------------------------------------ I'd like to get comparisons against BioPerl's new FASTQ support going too. To do this I'd need to know which (branch?) of BioPerl I should install, and I'd also like a trivial sample BioPerl script to do piped FASTQ conversion. i.e. read a FASTQ file from stdin (say as "fastq-solexa"), and output it to stdout (say as "fastq" meaning the Sanger Standard FASTQ). i.e. 
Something like this four line Biopython script would be perfect:
http://biopython.org/wiki/Reading_from_unix_pipes

------------------------------------------------------------------------------

Peter Rice and I have also been talking about line wrapping when writing FASTQ output, and if this is a good idea or not:
http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000593.html

Thanks!

Peter C.
(@Biopython)

From ajmackey at gmail.com Thu Jul 23 07:43:32 2009
From: ajmackey at gmail.com (Aaron Mackey)
Date: Thu, 23 Jul 2009 07:43:32 -0400
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com>
Message-ID: <24c96eca0907230443l6e7f2c8cq7a7fbc4493130f9b@mail.gmail.com>

On Thu, Jul 23, 2009 at 7:31 AM, Peter Cock wrote:

> Peter Rice and I have also been talking about line wrapping when
> writing FASTQ output, and if this is a good idea or not:
> http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000593.html
>

From the perspective of many user scripts that expect (irrationally or otherwise) that FASTQ data not be line-wrapped, I'd argue against line-wrapping. The days of "human readable and editable" formats must be coming to an end soon ...
Just another 2 cents on the matter, -Aaron From jay at jays.net Thu Jul 23 10:11:26 2009 From: jay at jays.net (Jay Hannah) Date: Thu, 23 Jul 2009 09:11:26 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <66FDE248-4CF8-4F68-91D5-16D0AE30B36E@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> <4A60EBB5.4010004@cornell.edu> <4A60FFF8.3030302@jays.net> <66FDE248-4CF8-4F68-91D5-16D0AE30B36E@illinois.edu> Message-ID: <91389D4D-B46C-49BA-9D5D-04DD82014B1C@jays.net> On Jul 17, 2009, at 10:26 PM, Chris Fields wrote: >> If Bio::Foo::Bar is abandoned by all distributions, a new copy of >> that dist is flagged DEPRECATED ("in favor of Bio::Fooer::Bar"), >> and pushed to CPAN. That clues everyone in that development has >> stopped and where they should go instead. For example: >> >> http://search.cpan.org/~mramberg/Catalyst-Plugin-FormValidator-0.03/ > > Okay, but seems kinda crufty. I do think there is some talk of > removing such modules from the active CPAN, as they would always be > available as part of BackPAN, but I haven't seen movement along > those lines. DEPRECATED modules can be removed from PAUSE after 6 months or 1 year or 50 years or whatever. Better to have it explicitly flagged and sitting out there than not flagged, misleading new users seeking solutions on CPAN. Eventually completely gone. > Yes, I have to say it has been very nice with Moose, though I wish > MooseX::Declare and MooseX::Method::Signatures would move out of > alpha (probably will happen around the first stable release of perl6). Indeed. A current CPAN is no magic bullet for every development dilemma. It's just better than a stagnant one. :) There's still plenty of tactical argument and jockeying in Catalyst and Moose. Like any healthy and active open source project populated by energetic people. 
On Jul 17, 2009, at 10:31 PM, Chris Fields wrote: > I think both of you made very good arguments. Will have to > nickname you guys the IRC Mob. Oooo... I like it! I'll sketch up some tattoo designs. :) On Jul 18, 2009, at 11:10 AM, Mark A. Jensen wrote: > http://www.bioperl.org/wiki/Module_Connectivity Wow. That's awesome. :) > My guess is that the NumDependencies values (which move fastest in > the Degree sum and create the sawtooth pattern) reflect > dependencies among the modules within the clusters. Wouldn't that > be cool? I don't think this works, but the data could certainly be > cajoled into giving us this. Can the data be cajoled into proving that we've already split up BioPerl, so we can avoid all that SVN drudgery? :) BioPerl became self aware on August 17, 2009. j From maj at fortinbras.us Thu Jul 23 10:16:07 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 23 Jul 2009 10:16:07 -0400 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <91389D4D-B46C-49BA-9D5D-04DD82014B1C@jays.net> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org><4A60EBB5.4010004@cornell.edu> <4A60FFF8.3030302@jays.net><66FDE248-4CF8-4F68-91D5-16D0AE30B36E@illinois.edu> <91389D4D-B46C-49BA-9D5D-04DD82014B1C@jays.net> Message-ID: <025907D4D2344FDC90E915E605B7FEB8@NewLife> Open the pod bay doors, BioPerl. ----- Original Message ----- From: "Jay Hannah" To: Sent: Thursday, July 23, 2009 10:11 AM Subject: Re: [Bioperl-l] bioperl reorganization > On Jul 17, 2009, at 10:26 PM, Chris Fields wrote: >>> If Bio::Foo::Bar is abandoned by all distributions, a new copy of >>> that dist is flagged DEPRECATED ("in favor of Bio::Fooer::Bar"), >>> and pushed to CPAN. That clues everyone in that development has >>> stopped and where they should go instead. 
For example: >>> >>> http://search.cpan.org/~mramberg/Catalyst-Plugin-FormValidator-0.03/ >> >> Okay, but seems kinda crufty. I do think there is some talk of >> removing such modules from the active CPAN, as they would always be >> available as part of BackPAN, but I haven't seen movement along >> those lines. > > DEPRECATED modules can be removed from PAUSE after 6 months or 1 year > or 50 years or whatever. Better to have it explicitly flagged and > sitting out there than not flagged, misleading new users seeking > solutions on CPAN. Eventually completely gone. > >> Yes, I have to say it has been very nice with Moose, though I wish >> MooseX::Declare and MooseX::Method::Signatures would move out of >> alpha (probably will happen around the first stable release of perl6). > > Indeed. A current CPAN is no magic bullet for every development > dilemma. It's just better than a stagnant one. :) > > There's still plenty of tactical argument and jockeying in Catalyst > and Moose. Like any healthy and active open source project populated > by energetic people. > > On Jul 17, 2009, at 10:31 PM, Chris Fields wrote: >> I think both of you made very good arguments. Will have to >> nickname you guys the IRC Mob. > > Oooo... I like it! I'll sketch up some tattoo designs. :) > > On Jul 18, 2009, at 11:10 AM, Mark A. Jensen wrote: >> http://www.bioperl.org/wiki/Module_Connectivity > > Wow. That's awesome. :) > >> My guess is that the NumDependencies values (which move fastest in >> the Degree sum and create the sawtooth pattern) reflect >> dependencies among the modules within the clusters. Wouldn't that >> be cool? I don't think this works, but the data could certainly be >> cajoled into giving us this. > > > Can the data be cajoled into proving that we've already split up > BioPerl, so we can avoid all that SVN drudgery? :) > > BioPerl became self aware on August 17, 2009. 
> > j > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From MEC at stowers.org Thu Jul 23 10:53:55 2009 From: MEC at stowers.org (Cook, Malcolm) Date: Thu, 23 Jul 2009 09:53:55 -0500 Subject: [Bioperl-l] cdd-search with remoteblast? In-Reply-To: <50CEDEF1-0BFD-42FB-9820-5AE21AA05C6F@gmail.com> References: <18DF7D20DFEC044098A1062202F5FFF32A1B86932C@exchsth.agresearch.co.nz> <46A05E0132144D73A0F805953B580B2F@jonas> <18DF7D20DFEC044098A1062202F5FFF32A1B8696AA@exchsth.agresearch.co.nz> <426C1893A5AD499DB4DBFEEBD257B254@jonas> <98C9DC3C-80ED-49EF-A6BC-C233336AFEC6@gmail.com> <7BBF64FF-F531-4F7C-8A31-BD04FCE1BF1A@gmail.com> , <50CEDEF1-0BFD-42FB-9820-5AE21AA05C6F@gmail.com> Message-ID: Chris, I figured it was something like that. Isolating the test in a file worked for me to decouple whatever cross-contamination was going on. So, I'll just leave it at that for now unless any objections. I suppose the correct follow on would be to file a bug.... I'm poised to go on vacation.... perhaps when I get back.... I don't use RemoteBlast myself at all in fact.... just wrote the rps thing to make sure I knew how the ncbi service worked... Cheers, Malcolm ________________________________________ From: Chris Fields [cjfields1 at gmail.com] Sent: Wednesday, July 22, 2009 6:32 PM To: Cook, Malcolm Cc: 'BioPerl List'; 'Jonas Schaer' Subject: Re: [Bioperl-l] cdd-search with remoteblast? Malcolm, it's probably not you. Looks like the get/put parameters are set as globals, so there may be cross-contamination of instances (worth checking JIC). You can probably work around that to an extent by encompassing any calls in blocks to localize changes. 
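A minimal sketch of that kind of workaround, assuming the parameters really do live in a package-global hash. The My::RemoteTool package and its %HEADER contents below are hypothetical stand-ins, not the actual Bio::Tools::Run::RemoteBlast internals; the point is only how Perl's local confines a change to one block.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module that keeps request parameters in a
# package-global hash (NOT the real RemoteBlast code).
package My::RemoteTool;
our %HEADER = ( SERVICE => 'plain', DATABASE => 'nr' );

# Report the current global settings as a sorted key=value string.
sub describe {
    return join ',', map { "$_=$HEADER{$_}" } sort keys %HEADER;
}

package main;

sub run_with_service {
    my ($service) = @_;
    # 'local' saves the current value and restores it automatically when
    # this sub returns, so the change cannot leak into later calls.
    local $My::RemoteTool::HEADER{SERVICE} = $service;
    return My::RemoteTool::describe();
}

print run_with_service('rpsblast'), "\n";   # DATABASE=nr,SERVICE=rpsblast
print My::RemoteTool::describe(), "\n";     # DATABASE=nr,SERVICE=plain
```

The second print shows the global back at its original value once the localized block has exited, which is what keeps one instance's settings from contaminating the next.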
chris On Jul 21, 2009, at 11:59 AM, Cook, Malcolm wrote: > Chris, > > I wound up adding a new test > > # $Id: RemoteBlast_rpsblast.t 15874 2009-07-21 16:57:54Z mcook $ > > with the comment : > > # malcolm_cook at stowers.org: this test is in a separate file from > # RemoteBlast.t (on which it is modelled) since there is some sort of > # side-effecting between the multiple remote blasts that is causing > # this test to fail, if it comes last, or the other test to fail, if > # this one comes first. THIS IS A BUG EITHER IN REMOTE BLAST OR MY > # UNDERSTANDING, i.e. of how to initialize it. > > In any case, the test passes and demos rpsblast usage. > > Cheers, > > > Malcolm Cook > Stowers Institute for Medical Research - Kansas City, Missouri > > >> -----Original Message----- >> From: Chris Fields [mailto:cjfields1 at gmail.com] >> Sent: Friday, July 10, 2009 1:05 PM >> To: Cook, Malcolm >> Cc: 'Jonas Schaer'; 'BioPerl List' >> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >> >> Malcolm, >> >> Nice! Go ahead and add the test in; we can look at trying to >> get CDD_SEARCH working at some point but this is a nice workaround. >> >> chris >> >> On Jul 10, 2009, at 10:45 AM, Cook, Malcolm wrote: >> >>> Chris, I've added a test to bioperl RemoteBlast.t that demonstrates >>> the following. Is it appropriate to submit it? >>> >>> Jonas, OK, I was a little quick on the gun... but I've got it now. >>> >>> You don't need to change the wrapper. Here is what you need to do: >>> >>> # 1) set your database like this: >>> >>> -database => 'cdsearch/cdd', # c.f. >>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html >>> for other cdd database options >>> >>> # 2) add this line before submitting the job: >>> $Bio::Tools::Run::RemoteBlast::HEADER{'SERVICE'} = 'rpsblast'; >>> >>> You're in - No other changes needed. 
>>> >>> Malcolm Cook >>> Stowers Institute for Medical Research - Kansas City, Missouri >>> >>> >>>> -----Original Message----- >>>> From: Jonas Schaer [mailto:Brotelzwieb at gmx.de] >>>> Sent: Friday, July 10, 2009 4:18 AM >>>> To: BioPerl List; Cook, Malcolm; Chris Fields >>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>> >>>> Hi, >>>> I tried to do what Malcom proposed my ($prog = 'rpsblast'; >>>> my $db = >>>> 'CDD';) but that didn't work. >>>> >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: Value rpsblast for PUT parameter PROGRAM does not match >>>> expression t?blast[ pnx]. Rejecting. >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>> C:/Perl/site/lib/Bio/Tools >>>> /Run/RemoteBlast.pm:329 >>>> STACK: Bio::Tools::Run::RemoteBlast::new >>>> C:/Perl/site/lib/Bio/Tools/Run/RemoteBl >>>> ast.pm:257 >>>> STACK: blast_a_seq2.pm:14 >>>> ----------------------------------------------------------- >>>> So I should try to "change the wrapper to allow >> 'rpsblast'", right? >>>> Could You tell me how to do that, please? So sorry but I >> have no idea >>>> yet...:) If that doesn't work, is there any other way to run >>>> cdd-searches with perl? >>>> Thank you so much! >>>> Regards, Jonas >>>> >>>> ----- Original Message ----- >>>> From: "Chris Fields" >>>> To: "Cook, Malcolm" >>>> Cc: "'Jonas Schaer'" ; "'BioPerl List'" >>>> ; "'Smithies, Russell'" >>>> ; >>>> Sent: Thursday, July 09, 2009 9:19 PM >>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>> >>>> >>>>> I've scheduled this tentatively for the 1.6 release >> series (just not >>>>> sure when yet). It may work as is, but I haven't tried >> it out yet >>>>> (and am hazarding to guess it only retrieves the single >> main RID at >>>>> the moment). 
>>>>> >>>>> chris >>>>> >>>>> On Jul 9, 2009, at 10:56 AM, Cook, Malcolm wrote: >>>>> >>>>>> Jonas, >>>>>> >>>>>> If you want to continue to use the bioperl remoteblast >> interface, >>>>>> probably what you should do is simply call it twice. >>>>>> >>>>>> Once, as you already know how to do, which will return >> without CDD >>>>>> results. >>>>>> >>>>>> Secondly, to get the CDD results, call remoteblast a second time. >>>>>> This time, using >>>>>> -database => 'CDD' >>>>>> -program => 'rpsblast' >>>>>> >>>>>> However, the wrapper may object to the 'rpsblast' >> program. It is >>>>>> not listed in the POD - >>>>>> >>>> http://search.cpan.org/~cjfields/BioPerl-1.6.0/Bio/Tools/Run/R >>>> emoteBlast.pm) >>>>>> If so, my guess is that changing the perl wrapper to allow >>>>>> rpsblast will "just work" (tm). I've cc:ed >>>> cjfields at bioperl.org for >>>>>> his opinion on this. >>>>>> >>>>>> Also, you might want to perform the CDD search first, >> especially if >>>>>> you are streaming results to eyeball that might like >> something to >>>>>> look at while the second (presumably longer) search is running. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Malcolm Cook >>>>>> Stowers Institute for Medical Research - Kansas City, Missouri >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >> Of Jonas >>>>>>> Schaer >>>>>>> Sent: Thursday, July 09, 2009 5:16 AM >>>>>>> To: BioPerl List; Smithies, Russell >>>>>>> Subject: Re: [Bioperl-l] cdd-search with remoteblast? >>>>>>> >>>>>>> Hi guys, >>>>>>> Thank you all so much for your help and patience :). Of >> course you >>>>>>> were right and I finaly found the right put-parameter to get >>>>>>> exactly the same hits as on the homepage. >>>>>>> I do have an other question though :)... 
>>>>>>> I now want to include a search for conserved domains, >> but when I >>>>>>> try to use the CDD_SEARCH-parameter >>>>>>> (http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node16.html# >>>>>>> sub:CDD_SEARCH) >>>>>>> like the other put-parameters the way chris once told me(works >>>>>>> fine with the other params): >>>>>>> >>>>>>> my %put = ( >>>>>>> WORD_SIZE => 3, >>>>>>> HITLIST_SIZE => 100, >>>>>>> THRESHOLD => 11, >>>>>>> FILTER => 'R', >>>>>>> GENETIC_CODE => 1, >>>>>>> CDD_SEARCH => 'on' >>>>>>> ###I tried it >>>>>>> with 'true' and '1', too. >>>>>>> >>>>>>> ); >>>>>>> >>>>>>> for my $putName (keys %put) { >>>>>>> $factory->submit_parameter($putName,$put{$putName}); >>>>>>> } >>>>>>> >>>>>>> >>>>>>> ...an exception is thrown: >>>>>>> >>>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>>>>> MSG: CDD_SEARCH is not a valid PUT parameter. >>>>>>> STACK: Error::throw >>>>>>> STACK: Bio::Root::Root::throw >>>> C:/Perl/site/lib/Bio/Root/Root.pm:359 >>>>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter >>>>>>> C:/Perl/site/lib/Bio/Tools >>>>>>> /Run/RemoteBlast.pm:325 >>>>>>> STACK: main::blast_a_sequence firsteval0.8.pm:383 >>>>>>> STACK: main::blast_it firsteval0.8.pm:288 >>>>>>> STACK: firsteval0.8.pm:35 >>>>>>> ----------------------------------------------------------- . >>>>>>> I guess somehow this could be the solution to my problem: >>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/new/node78.html#s >>>>>>> ub:RID-for-Simultaneous >>>>>>> , but unfortunately I don't understand what to do. 
>>>>>>> I'm so sorry to bother you with this but please help me once >>>>>>> more...:) >>>>>>> >>>>>>> Best regards and thanks in advance, Jonas >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>> From: "Smithies, Russell" >>>>>>> To: "'Jonas Schaer'" >>>>>>> Cc: "'Chris Fields'" ; "'BioPerl List'" >>>>>>> >>>>>>> Sent: Monday, July 06, 2009 10:56 PM >>>>>>> Subject: RE: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>> >>>>>>> >>>>>>> Hi Jonas, >>>>>>> You can't just play with the BLAST parameters and hope >>>> for a "better" >>>>>>> result. >>>>>>> I'd suggest that if you aren't sure what they do, you >> should leave >>>>>>> them alone as small changes can make huge differences in the >>>>>>> output - it's quite possible to miss finding what >> you're looking >>>>>>> for by using >>>> the wrong >>>>>>> parameters. >>>>>>> If all else fails, read the blast manual: >>>>>>> http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastall/blastall >>>>>>> _all.html >>>>>>> http://www.ncbi.nlm.nih.gov/blast/tutorial/ >>>>>>> Or Read Ian Korfs' excellent book: >>>>>>> http://books.google.com/books?id=xvcnhDG9fNUC&lpg=PR17&ots=WJp >>>>>> fuHF6Hn&dq=ian%20korf%20%20blast%20book&pg=PA3 >>>>>>> >>>>>>> Don't worry about the integer overflow bug as there's >> nothing you >>>>>>> can do about it. If you're interested, Google and Wikipedia are >>>>>>> your >>>>>>> friends: >>>>>>> http://en.wikipedia.org/wiki/Integer_overflow >>>>>>> >>>>>>> >>>>>>> Russell >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>>>> Sent: Tuesday, 7 July 2009 12:14 a.m. >>>>>>>> To: BioPerl List; Chris Fields >>>>>>>> Subject: Re: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>> >>>>>>>> Hi guys, thanks for your answers so far. >>>>>>>> @jason: integer overflow in blast.... sorry, but what do >>>>>>> you mean by that? 
>>>>>>>> how can I fix it...? >>>>>>>> >>>>>>>> Since I never really changed any parameters I thought them >>>>>>> all to be >>>>>>>> default. >>>>>>>> whatever, I tried to get "better" results with my prog >>>> by changing >>>>>>>> these: >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = '11 1'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = '10'; >>>>>>>> >>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATI >>>>>>> STICS'} = >>>>>>>> '1'; >>>>>>>> with no effect...I guess these were default values anyway. >>>>>>>> >>>>>>>> So please maybe you can tell me all the other parameters I >>>>>>> can change with >>>>>>>> my >>>>>>>> perl-skript AND how to do that? >>>>>>>> Unfortunately both, perl and the blast-algorithm are pretty >>>>>>> much new to >>>>>>>> me, >>>>>>>> maybe thats why I just cannot find out how to do that on my >>>>>>> own... :/ >>>>>>>> >>>>>>>> Here is the output I get with my remote-blast skript: >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ################################### >>>>>>>> Query Name: >>>>>>>> >>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSL >>>>>>>> L >>>>>>>> hit name is ref|XP_001702807.1| >>>>>>>> score is 442 >>>>>>>> BLASTP 2.2.21+ >>>>>>>> Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro >>>>>>> A. Schaffer, >>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>>> Lipman (1997), >>>>>>>> "Gapped >>>>>>>> BLAST and PSI-BLAST: a new generation of protein >> database search >>>>>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>>>>> >>>>>>>> >>>>>>>> Reference for composition-based statistics: Alejandro A. >>>>>>>> Schaffer, L. Aravind, Thomas L. Madden, Sergei Shavirin, >>>>>>> John L. Spouge, >>>>>>>> Yuri >>>>>>>> I. Wolf, Eugene V. Koonin, and Stephen F. 
Altschul (2001), >>>>>>> "Improving the >>>>>>>> accuracy of PSI-BLAST protein database searches with >>>>>>> composition-based >>>>>>>> statistics and other refinements", Nucleic Acids Res. >>>> 29:2994-3005. >>>>>>>> >>>>>>>> >>>>>>>> RID: 53STX5G2013 >>>>>>>> >>>>>>>> >>>>>>>> Database: All non-redundant GenBank CDS >>>>>>>> translations+PDB+SwissProt+PIR+PRF excluding >>>> environmental samples >>>>>>>> from WGS projects >>>>>>>> 9,252,587 sequences; 3,169,972,781 total >> letters Query= >>>>>>>> >>>>>>> >>>> >> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLRSLL >>>>>>>> >>>>>>> >>>> >> DVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDNAFRQAHQNTA >>>> M >>>>>>>> ATGPDPDDEYE >>>>>>>> Length=150 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> Score >>>>>>>> E >>>>>>>> Sequences producing significant alignments: >>>>>>> (Bits) >>>>>>>> Value >>>>>>>> >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>>>> reinhard... 174 >>>>>>>> 2e-42 >>>>>>>> >>>>>>>> >>>>>>>> ALIGNMENTS >>>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>> reinhardtii] >>>>>>>> gb|EDP06586.1| ClpS-like protein [Chlamydomonas reinhardtii] >>>>>>>> Length=303 >>>>>>>> >>>>>>>> Score = 174 bits (442), Expect = 2e-42, Method: >>>>>>> Composition-based >>>>>>>> stats. 
>>>>>>>> Identities = 150/150 (100%), Positives = 150/150 (100%), >>>>>>> Gaps = 0/150 >>>>>>>> (0%) >>>>>>>> >>>>>>>> Query 1 >>>>>>> MGSSSVGTYHLLLVLMgaggeqqavqagaevaSTEQVDGSGMAANSRGSTSGSEQPPrds >>>>>>>> 60 >>>>>>>> >>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>>>> Sbjct 154 >>>>>>> MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDS >>>>>>>> 213 >>>>>>>> >>>>>>>> Query 61 >>>>>>> dlgllrslldVAGVDRTalevkllalaeagaeMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> 120 >>>>>>>> >>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> Sbjct 214 >>>>>>> DLGLLRSLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVAR >>>>>>>> 273 >>>>>>>> >>>>>>>> Query 121 AWHERDDNAFRQAHQNTAMATGPDPDDEYE 150 >>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE Sbjct 274 >>>>>>>> AWHERDDNAFRQAHQNTAMATGPDPDDEYE 303 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Database: All non-redundant GenBank CDS >>>>>>>> translations+PDB+SwissProt+PIR+PRF >>>>>>>> excluding environmental samples from WGS projects >>>>>>>> Posted date: Jul 5, 2009 4:41 AM Number of letters in >>>>>>>> database: -1,124,994,511 Number of sequences in database: >>>>>>>> 9,252,587 >>>>>>>> >>>>>>>> Lambda K H >>>>>>>> 0.309 0.122 0.345 >>>>>>>> Gapped >>>>>>>> Lambda K H >>>>>>>> 0.267 0.0410 0.140 >>>>>>>> Matrix: BLOSUM62 >>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of >> Sequences: >>>>>>>> 9252587 Number of Hits to DB: 60273703 Number of extensions: >>>>>>>> 1448367 Number of successful extensions: 2103 Number >> of sequences >>>>>>>> better than 10: 0 Number of HSP's better than 10 >> without gapping: >>>>>>>> 0 Number of HSP's gapped: 2113 Number of HSP's successfully >>>>>>>> gapped: 0 Length of query: 150 Length of database: 3169972781 >>>>>>>> Length adjustment: 113 Effective length of query: 37 Effective >>>>>>>> length of database: 2124430450 Effective search space: >>>>>>>> 78603926650 Effective search space used: 78603926650 >>>>>>>> T: 11 >>>>>>>> A: 40 >>>>>>>> X1: 16 (7.1 
bits) >>>>>>>> X2: 38 (14.6 bits) >>>>>>>> X3: 64 (24.7 bits) >>>>>>>> S1: 42 (20.8 bits) >>>>>>>> S2: 74 (33.1 bits) >>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ################################### >>>>>>>> and here are the hits (?) of the blast-algorithm on the >>>>>>> ncbi-homepage with >>>>>>>> the same query of course: >>>>>>>> ref|XP_001702807.1| ClpS-like protein [Chlamydomonas >>>>>>> reinhard... 300 >>>>>>>> 3e-80 >>>>>>>> ref|XP_001942719.1| PREDICTED: similar to GA16705-PA >>>>>>> [Acyrtho... 36.2 >>>>>>>> 1.1 >>>>>>>> ref|ZP_03781446.1| hypothetical protein RUMHYD_00880 >>>>>>> [Blautia... 35.4 >>>>>>>> 1.8 >>>>>>>> ref|XP_001563232.1| leucyl-tRNA synthetase [Leishmania >>>>>>> brazil... 34.3 >>>>>>>> 4.2 >>>>>>>> ref|XP_680841.1| hypothetical protein AN7572.2 >>>>>>> [Aspergillus n... 33.5 >>>>>>>> 6.0 >>>>>>>> ref|YP_001768110.1| hypothetical protein M446_1150 >>>>>>> [Methyloba... 33.5 >>>>>>>> 7.0 >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>> ###################################at >>>>>>>> least the first hit is the same, but even there there is a >>>>>>> different score >>>>>>>> and e-value. >>>>>>>> >>>>>>>> thanks so much for any help :) >>>>>>>> regards, jonas >>>>>>>> >>>>>>>> >>>>>>>> ----- Original Message ----- >>>>>>>> From: "Chris Fields" >>>>>>>> To: "Jason Stajich" >>>>>>>> Cc: "Smithies, Russell" >>>>>>> ; "'BioPerl >>>>>>>> List'" ; "'Jonas Schaer'" >>>>>>>> >>>>>>>> Sent: Monday, July 06, 2009 12:51 AM >>>>>>>> Subject: Re: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>> >>>>>>>> >>>>>>>>> That inspires confidence ;> >>>>>>>>> >>>>>>>>> chris >>>>>>>>> >>>>>>>>> On Jul 5, 2009, at 4:40 PM, Jason Stajich wrote: >>>>>>>>> >>>>>>>>>> integer overflow in blast.... 
>>>>>>>>>> >>>>>>>>>> On Jul 5, 2009, at 2:00 PM, Smithies, Russell wrote: >>>>>>>>>> >>>>>>>>>>> I'd guess it's a difference in the parameters used. >>>>>>>>>>> Interesting that both have the number of letters in >> the db as >>>>>>>>>>> "-1,125,070,205", I assume that's a bug :-) >>>>>>>>>>> >>>>>>>>>>> Stats from your remote_blast: >>>>>>>>>>> >>>>>>>>>>> 'stats' => { >>>>>>>>>>> 'S1' => '42', >>>>>>>>>>> 'S1_bits' => '20.8', >>>>>>>>>>> 'lambda' => '0.309', >>>>>>>>>>> 'entropy' => '0.345', >>>>>>>>>>> 'kappa_gapped' => '0.0410', >>>>>>>>>>> 'T' => '11', >>>>>>>>>>> 'kappa' => '0.122', >>>>>>>>>>> 'X3_bits' => '24.7', >>>>>>>>>>> 'X1' => '16', >>>>>>>>>>> 'lambda_gapped' => '0.267', >>>>>>>>>>> 'X2' => '38', >>>>>>>>>>> 'S2' => '74', >>>>>>>>>>> 'seqs_better_than_cutoff' => '0', >>>>>>>>>>> 'posted_date' => 'Jul 4, 2009 4:41 AM', >>>>>>>>>>> 'Hits_to_DB' => '60102303', >>>>>>>>>>> 'dbletters' => '-1125070205', >>>>>>>>>>> 'A' => '40', >>>>>>>>>>> 'num_successful_extensions' => '2004', >>>>>>>>>>> 'num_extensions' => '1436892', >>>>>>>>>>> 'X1_bits' => '7.1', >>>>>>>>>>> 'X3' => '64', >>>>>>>>>>> 'entropy_gapped' => '0.140', >>>>>>>>>>> 'dbentries' => '9252258', >>>>>>>>>>> 'X2_bits' => '14.6', >>>>>>>>>>> 'S2_bits' => '33.1' >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Stats from a blast done on the NCBI webpage: >>>>>>>>>>> >>>>>>>>>>> Database: All non-redundant GenBank CDS >>>>>>> translations+PDB+SwissProt >>>>>>>>>>> +PIR+PRF >>>>>>>>>>> excluding environmental samples from WGS projects >> Posted date: >>>>>>>>>>> Jul 4, 2009 4:41 AM Number of letters in database: >>>>>>>>>>> -1,125,070,205 Number of sequences in database: 9,252,258 >>>>>>>>>>> >>>>>>>>>>> Lambda K H >>>>>>>>>>> 0.309 0.124 0.340 >>>>>>>>>>> Gapped >>>>>>>>>>> Lambda K H >>>>>>>>>>> 0.267 0.0410 0.140 >>>>>>>>>>> Matrix: BLOSUM62 >>>>>>>>>>> Gap Penalties: Existence: 11, Extension: 1 Number of >>>>>>>>>>> Sequences: 9252258 Number of Hits to DB: 86493230 Number of >>>>>>>>>>> 
extensions: 3101413 Number of successful extensions: 9001 >>>>>>>>>>> Number of sequences better than 100: 65 Number of >> HSP's better >>>>>>>>>>> than 100 without gapping: 0 Number of HSP's gapped: 9000 >>>>>>>>>>> Number of HSP's successfully gapped: 66 Length of >> query: 150 >>>>>>>>>>> Length of database: 3169897087 Length adjustment: 113 >>>>>>>>>>> Effective length of query: 37 Effective length of database: >>>>>>>>>>> 2124391933 Effective search space: 78602501521 Effective >>>>>>>>>>> search space used: 78602501521 >>>>>>>>>>> T: 11 >>>>>>>>>>> A: 40 >>>>>>>>>>> X1: 16 (7.1 bits) >>>>>>>>>>> X2: 38 (14.6 bits) >>>>>>>>>>> X3: 64 (24.7 bits) >>>>>>>>>>> S1: 42 (20.8 bits) >>>>>>>>>>> S2: 65 (29.6 bits) >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l- >>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Jonas Schaer >>>>>>>>>>>> Sent: Sunday, 28 June 2009 10:15 p.m. >>>>>>>>>>>> To: BioPerl List >>>>>>>>>>>> Subject: [Bioperl-l] different results with >>>> remote-blast skript >>>>>>>>>>>> >>>>>>>>>>>> Hi again :) >>>>>>>>>>>> please, I only have this little question: >>>>>>>>>>>> why do I get different results with my remote::blast >>>>>>> perl skript >>>>>>>>>>>> then on the >>>>>>>>>>>> ncbi blast homepage? >>>>>>>>>>>> I am using blastp, the query is an amino-sequence >> (different >>>>>>>>>>>> results with any sequence, differences not only in >> number of >>>>>>>>>>>> hits but >>>> even in e- >>>>>>>>>>>> values, scores >>>>>>>>>>>> etc...), the database is 'nr'. 
>>>>>>>>>>>> PLEASE help me, >>>>>>>>>>>> thank you in advance, >>>>>>>>>>>> Jonas >>>>>>>>>>>> >>>>>>>>>>>> ps: my skript: >>>>>>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>>>>>> ## >>>>>>>>>>>> use Bio::Seq::SeqFactory; >>>>>>>>>>>> use Bio::Tools::Run::RemoteBlast; use strict; my >>>>>>>>>>>> @blast_report; my $prog = 'blastp'; >>>>>>>>>>>> my $db = 'nr'; >>>>>>>>>>>> my $e_val= '1e-10'; >>>>>>>>>>>> #my $e_val= '10'; >>>>>>>>>>>> my @params = ( '-prog' => $prog, >>>>>>>>>>>> '-data' => $db, >>>>>>>>>>>> '-expect' => $e_val, >>>>>>>>>>>> '-readmethod' => 'SearchIO' ); my $factory = >>>>>>>>>>>> Bio::Tools::Run::RemoteBlast->new(@params); >>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} >> = '11 1'; >>>>>>>>>>>> >> $Bio::Tools::Run::RemoteBlast::HEADER{'MAX_NUM_SEQ'} = '100'; >>>>>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'EXPECT'} = >> '10'; $ Bio >>>>>>>>>>>> >>>>>>> >> ::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} >>>>>>>>>>>> = '1'; >>>>>>>>>>>> >>>>>>>>>>>> my >>>>>>>>>>>> $ >>>>>>>>>>>> blast_seq >>>>>>>>>>>> >>>>>>> >>>> >> ='MGSSSVGTYHLLLVLMGAGGEQQAVQAGAEVASTEQVDGSGMAANSRGSTSGSEQPPRDSDLGLLR >>>>>>>>>>>> >>>>>>>> >>>>>>> SLLDVAGVDRTALEVKLLALAEAGAEMPPAQDSQATAAGVVATLTSVYRQQVARAWHERDDN >>>>>>> AFRQAHQNTAMATGPD >>>>>>>>>>>> PDDEYE'; >>>>>>>>>>>> #$v is just to turn on and off the messages my $v = 1; my >>>>>>>>>>>> $seqbuilder = Bio::Seq::SeqFactory->new('-type' => >>>>>>>>>>>> 'Bio::PrimarySeq'); my $seq = $seqbuilder->create(-seq >>>>>>>>>>>> =>$blast_seq, >>>> -display_id => >>>>>>>>>>>> "$blast_seq"); >>>>>>>>>>>> my $filename='temp2.out'; >>>>>>>>>>>> my $r = $factory->submit_blast($seq); print STDERR >>>>>>>>>>>> "waiting..." 
if( $v > 0 ); while ( my @rids = >>>>>>>>>>>> $factory->each_rid ) { >>>>>>>>>>>> foreach my $rid ( @rids ) >>>>>>>>>>>> { >>>>>>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>>>>>> if( !ref($rc) ) >>>>>>>>>>>> { >>>>>>>>>>>> if( $rc < 0 ) >>>>>>>>>>>> { >>>>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>>>> } >>>>>>>>>>>> print STDERR "." if ( $v > 0 ); >>>>>>>>>>>> } >>>>>>>>>>>> else >>>>>>>>>>>> { >>>>>>>>>>>> my $result = $rc->next_result(); >>>>>>>>>>>> $factory->save_output($filename); >>>>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>>>> print "\nQuery Name: ", >>>>>>> $result->query_name(), >>>>>>>>>>>> "\n"; >>>>>>>>>>>> while ( my $hit = $result->next_hit ) >>>>>>>>>>>> { >>>>>>>>>>>> next unless ( $v > 0); >>>>>>>>>>>> print "\thit name is ", >>>> $hit->name, "\n"; >>>>>>>>>>>> while( my $hsp = $hit->next_hsp ) >>>>>>>>>>>> { >>>>>>>>>>>> print "\t\tscore is ", >>>>>>> $hsp->score, "\n"; >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> } >>>>>>>>>>>> @blast_report = get_file_data ($filename); return >>>>>>>>>>>> @blast_report; >>>>>>>>>>>> >>>>>>>> >>>>>>> ############################################################## >>>>>>> ################ >>>>>>>>>>>> #### >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>>> = >>>>>>>>>>> = >>>>>>>>>>> >>>>>>> >>>> >> ===================================================================== >>>>>>>>>>> Attention: The information contained in this message and/or >>>>>>>>>>> attachments from AgResearch Limited is intended only for the >>>>>>> persons or entities >>>>>>>>>>> to which it is addressed and may contain >> confidential and/or >>>>>>>>>>> privileged material. 
Any review, retransmission, >> dissemination >>>> or other use >>>>>>>>>>> of, or >>>>>>>>>>> taking of any action in reliance upon, this information >>>>>>> by persons or >>>>>>>>>>> entities other than the intended recipients is >> prohibited by >>>>>>>>>>> AgResearch Limited. If you have received this message in >>>>>>>>>>> error, >>>>>>> please notify >>>>>>>>>>> the >>>>>>>>>>> sender immediately. >>>>>>>>>>> = >>>>>>>>>>> = >>>>>>>>>>> >>>>>>> >>>> >> ===================================================================== >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Jason Stajich >>>>>>>>>> jason at bioperl.org >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> -------------------------------------------------------------- >>>>>>> ---------------- >>>>>>>> -- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> No virus found in this incoming message. >>>>>>>> Checked by AVG - www.avg.com >>>>>>>> Version: 8.5.375 / Virus Database: 270.13.5/2219 - Release >>>>>>> Date: 07/05/09 >>>>>>>> 05:53:00 >>>>>>> >>>>>>> >>>>>>> -------------------------------------------------------------- >>>>>>> ------------------ >>>>>>> >>>>>>> >>>>>>> >>>>>>> No virus found in this incoming message. 
From hartzell at alerce.com Thu Jul 23 11:33:11 2009 From: hartzell at alerce.com (George Hartzell) Date: Thu, 23 Jul 2009 08:33:11 -0700 Subject: [Bioperl-l] Regarding Bio::Root::Build, was Re: bioperl reorganization In-Reply-To: <0038B92D-F3FC-4C85-85BB-7422A6557CBA@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <19043.61297.80141.781810@already.local> <19047.29488.841282.578782@already.local> <901A5E0C-67C9-4286-B8CC-2BA811543D96@illinois.edu> <19047.50397.694196.227661@already.local> <0038B92D-F3FC-4C85-85BB-7422A6557CBA@illinois.edu> Message-ID: <19048.33463.731104.531095@already.local> Chris Fields writes: > Unless I'm misreading you I think that's how we're currently running > things,
for instance in Annotation.t: > > BEGIN { > use lib '.'; > use Bio::Root::Test; > > test_begin(-tests => 158); > > use_ok('Bio::Annotation::Collection'); > [...] You're spot on. That's exactly what I meant. g. From bix at sendu.me.uk Thu Jul 23 14:00:21 2009 From: bix at sendu.me.uk (bix at sendu.me.uk) Date: Thu, 23 Jul 2009 19:00:21 +0100 (BST) Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> Message-ID: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> > On Jul 19, 2009, at 8:29 PM, bix at sendu.me.uk wrote: >> I'm not sure I follow. How does reverting back to Module::Build help >> core installers choose what they want to install? > > Prior to Module::Build the Makefile.PL we just looked for the > dependencies and reported back if they were missing; installation of > those modules was left up to the user. I don't necessarily think it's > our *responsibility* to make the job easier for the user to choose and > install modules other than BioPerl. We just need to indicate what > they may need to run certain modules (the warnings about missing > recommended dependencies). OK, and given what the others have said, perhaps we shouldn't take this on as our responsibility. So, just say that everything currently 'recommended' is 'required'? Is that what we really want to do? (The opposite, to say that nothing is required, would be really very broken behavior for CPAN and other packing systems) >> I'm aware of no such functionality outside of B::R::Build. >> Elaborate? (re: recommend/require queue) > > Determining what is recommended/required (and checking for them) is > handled within Bio::Root::Build, is that correct? 
We could make those > decisions prior to creating the instance, or take care of this > internally (rearrange 'recommends'/'requires' based on what the user > wants). When in CPAN/CPANPLUS shell push the installation of those to > allow the currently running shell to do the installation; don't spawn > an additional shell. That's all. IIRC what B:R:Build is supposed to do is exactly that: not spawn a new shell but simply to make CPAN think that the user's desired recommended modules are actually required prerequisite modules. Then CPAN handles those in the normal way. If this isn't happening, then something is broken. CPAN should only be loaded when B:R:Build detects that CPAN isn't currently running. > The three critical issues (as I've pointed out before) are: > > 1) Getting CPANPLUS installation working, which may be just META.yml, > or it may be shell-related. I would like it for CPAN Testers, if for > nothing else. That's at least 2 bug reports, maybe more. > 2) Bio::Root::Build converted towards a Module::Build-compliant API, > or we'll need to convert run/db/network to Module::Build. 1 bug report. > 3) Avoid potential infinite looping. This may be Gbrowse-related via > the net install script, but if Build.PL is being called in some way > that potentially causes recursion we need to be aware of it. This one > appears rarely, but I did manage to replicate it using an old > Module::Build (I can't recall if I used the net install script or > not). 1 bug report. OK, I propose to look into these. Almost certainly I'll be doing "convert run/db/network to Module::Build". I'll try to resolve the bugs you've mentioned. It might be a week or so before I get started since I'm currently on holiday away from a usable computer. 
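To make the "treat the chosen recommended modules as required" idea above concrete, here is a minimal sketch in plain Perl. It is not the Bio::Root::Build code; the function name, module names and version numbers are hypothetical and only illustrate the shuffling of prerequisite hashes that a CPAN client would then resolve in the normal way.

```perl
use strict;
use warnings;

# Move the modules the user asked for from 'recommends' to 'requires'.
# Works on copies, so the caller's hashes are left untouched.
sub promote_recommends {
    my ($requires, $recommends, @wanted) = @_;
    my %req = %$requires;
    my %rec = %$recommends;
    for my $module (@wanted) {
        next unless exists $rec{$module};    # ignore unknown names
        $req{$module} = delete $rec{$module};
    }
    return (\%req, \%rec);
}

# Hypothetical prerequisite lists, for illustration only.
my ($req, $rec) = promote_recommends(
    { 'IO::String' => 0 },
    { 'XML::SAX' => '0.15', 'GD' => '1.3' },
    'XML::SAX',
);
print "requires:   ", join(', ', sort keys %$req), "\n";
print "recommends: ", join(', ', sort keys %$rec), "\n";
```

The two returned hashes could then be handed to whatever builds the distribution metadata, so no extra shell needs to be spawned.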
From cjfields at illinois.edu Thu Jul 23 15:40:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 23 Jul 2009 14:40:25 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <60071D76-9D3E-492B-A270-D09A2A881992@illinois.edu>
On Jul 23, 2009, at 1:00 PM, bix at sendu.me.uk wrote: >> On Jul 19, 2009, at 8:29 PM, bix at sendu.me.uk wrote: >>> I'm not sure I follow. How does reverting back to Module::Build help >>> core installers choose what they want to install? >> >> Prior to Module::Build the Makefile.PL we just looked for the >> dependencies and reported back if they were missing; installation of >> those modules was left up to the user. I don't necessarily think >> it's >> our *responsibility* to make the job easier for the user to choose >> and >> install modules other than BioPerl. We just need to indicate what >> they may need to run certain modules (the warnings about missing >> recommended dependencies). > > OK, and given what the others have said, perhaps we shouldn't take > this on > as our responsibility. So, just say that everything currently > 'recommended' is 'required'? Is that what we really want to do? > > (The opposite, to say that nothing is required, would be really very > broken behavior for CPAN and other packing systems)
Actually, the answer's both 'yes' and 'no'. We should leave them as 'recommends' until we split off packages that would end up requiring them (and in those packages, set them as 'requires').
>>> I'm aware of no such functionality outside of B::R::Build.
>>> Elaborate? (re: recommend/require queue) >> >> Determining what is recommended/required (and checking for them) is >> handled within Bio::Root::Build, is that correct? We could make >> those >> decisions prior to creating the instance, or take care of this >> internally (rearrange 'recommends'/'requires' based on what the user >> wants). When in CPAN/CPANPLUS shell push the installation of those >> to >> allow the currently running shell to do the installation; don't spawn >> an additional shell. That's all. > > IIRC what B:R:Build is supposed to do is exactly that: not spawn a new > shell but simply to make CPAN think that the user's desired > recommended > modules are actually required prerequisite modules. Then CPAN handles > those in the normal way. > > If this isn't happening, then something is broken. CPAN should only be > loaded when B:R:Build detects that CPAN isn't currently running.
If that is what is going on already then we're okay on that point. It'll become less and less necessary to worry about that as we break away modules with those prerequisites. The method used to check for the shell process isn't foolproof and doesn't catch all cases (for instance, if you are running the CPANPLUS shell instead). What devs do now is look for the env vars CPAN_IS_RUNNING/CPANPLUS_IS_RUNNING; if both are set, you are running CPAN, if only the latter, CPANPLUS. I would just look for either/or and turn it off.
Also, we are attempting to load the proper version of Module::Build prior to actually running (in a BEGIN block). I think that's where we were running into the weird looping issue; if Module::Build doesn't install correctly for whatever reason, we don't have the correct version, so it tries over and over. Should that just be a 'build_requires'? I'll see if I can come up with the conditions to replicate that.
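The "look for either/or" check could be sketched roughly as follows. This is only an illustration, not what Bio::Root::Build does: the function name is made up, and the variable names are written with a PERL5_ prefix (PERL5_CPAN_IS_RUNNING, PERL5_CPANPLUS_IS_RUNNING) on the assumption that those are the full names behind the shorthand used above.

```perl
use strict;
use warnings;

# Return 1 if either CPAN client appears to be driving this process.
# Takes a hash reference rather than reading %ENV directly so the check
# can be exercised without changing the real environment.
sub cpan_client_running {
    my ($env) = @_;
    for my $var (qw(PERL5_CPAN_IS_RUNNING PERL5_CPANPLUS_IS_RUNNING)) {
        return 1 if $env->{$var};
    }
    return 0;
}

if (cpan_client_running(\%ENV)) {
    print "A CPAN client is already running; leave prerequisites to it.\n";
}
else {
    print "No CPAN client detected.\n";
}
```

Checking for either variable avoids having to tell the two clients apart, which is all that is needed to decide against spawning another shell.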
>> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug >> report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > OK, I propose to look into these. Almost certainly I'll be doing > "convert > run/db/network to Module::Build". I'll try to resolve the bugs you've > mentioned. > > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer.
That works for me. I can spend more time on getting a new point release ready, and we can merge over changes when they are made. chris
From rmb32 at cornell.edu Thu Jul 23 15:38:30 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 23 Jul 2009 12:38:30 -0700 Subject: [Bioperl-l] genbank (blast) alignments In-Reply-To: <06C35F8D-1EE5-4882-8BF4-111311FBEEC4@ohsu.edu> References: <06C35F8D-1EE5-4882-8BF4-111311FBEEC4@ohsu.edu> Message-ID: <4A68BC36.7080006@cornell.edu>
Wow, that silence is deafening. I can't believe somebody who knows what they're talking about hasn't written you back yet. Perhaps you could do some kind of transformation where you read in the BLAST report with Bio::SearchIO, and then write to MSF with Bio::AlignIO::msf? You would probably need to do some fiddling to create the proper objects and relationships that Bio::AlignIO::msf would want.
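As a rough idea of the "fiddling" involved, the sketch below stacks pairwise alignments onto a shared reference in plain Perl, without BioPerl. It assumes the gapped strings and the reference start of each pairwise alignment have already been pulled out of the report (with Bio::SearchIO these would presumably come from each HSP, for example via query_string and hit_string). The function name and hash keys are hypothetical, and read insertions relative to the reference are simply dropped, which a real converter might not want.

```perl
use strict;
use warnings;

# Stack pairwise alignments onto one reference so that every read shares
# the reference's coordinates. Columns where the reference has a gap
# (insertions in the read) are dropped, so the reference row stays ungapped.
# Each pairwise alignment is a hash with these (hypothetical) keys:
#   id        - read name
#   ref_start - 1-based reference position of the first aligned column
#   ref_aln   - gapped reference string
#   read_aln  - gapped read string, same length as ref_aln
sub anchor_to_reference {
    my ($ref_seq, @pairs) = @_;
    my %rows = (reference => $ref_seq);
    for my $p (@pairs) {
        die "alignment strings differ in length for $p->{id}\n"
            unless length($p->{ref_aln}) == length($p->{read_aln});
        my @row = ('-') x length($ref_seq);
        my $pos = $p->{ref_start} - 1;    # 0-based reference index
        for my $i (0 .. length($p->{ref_aln}) - 1) {
            next if substr($p->{ref_aln}, $i, 1) eq '-';    # read insertion
            $row[$pos++] = substr($p->{read_aln}, $i, 1);
        }
        $rows{ $p->{id} } = join '', @row;
    }
    return \%rows;
}

my $rows = anchor_to_reference(
    'ACGTACGTAC',
    { id => 'read1', ref_start => 2, ref_aln => 'CGTA',  read_aln => 'CGAA'  },
    { id => 'read2', ref_start => 5, ref_aln => 'AC-GT', read_aln => 'ACTG-' },
);
printf "%-10s %s\n", $_, $rows->{$_} for sort keys %$rows;
```

Rows of equal length like these are the shape a multiple-alignment writer expects, so they could then be loaded into alignment objects and written out as MSF.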
But this reply probably isn't helpful, because you probably already knew that much. I'm mostly just trying to add to this thread so that people who actually know a lot about BioPerl's functions in this area will see it and hopefully be of more help. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu Thomas Keller wrote: > Greetings, > Blast 2.2.21 has a multi-sequence alignment feature that is really > handy: put in the accession number of the refseq in one sequence field > and a concatenated fasta file of the Sanger reads to align in the second > box and it does the alignments. Unfortunately, the output is a series of > alignments rather than the more useful msf format with all reads aligned > with the reference. > > Is there a bioperl module that reads the blast alignments and converts > it to an msf alignment? > > thanks, > > > Tom > kellert at ohsu.edu > 503-494-2442 > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason at bioperl.org Thu Jul 23 16:00:51 2009 From: jason at bioperl.org (Jason Stajich) Date: Thu, 23 Jul 2009 14:00:51 -0600 Subject: [Bioperl-l] genbank (blast) alignments In-Reply-To: <4A68BC36.7080006@cornell.edu> References: <06C35F8D-1EE5-4882-8BF4-111311FBEEC4@ohsu.edu> <4A68BC36.7080006@cornell.edu> Message-ID: <902CCCFE-8E3D-4A73-AE6F-2E3207E6EC0B@bioperl.org> HSP->get_aln will give you the pairwise alignment as a multiple sequence aln object but the short answer is there is not a parser for the multi-sequence alignment format from BLAST that I know of -- you might want to post the example format so we can more easily figure out how different it is from the supported parsers in Bio::AlignIO -jason On Jul 23, 2009, at 1:38 PM, Robert Buels wrote: > Wow, that 
silence is deafening. I can't believe somebody who knows > what they're talking about hasn't written you back yet. > > Perhaps you could do some kind of transformation where you read in > the BLAST report with Bio::SearchIO, and then write to MSF with > Bio::AlignIO::msf? You would probably need to do some fiddling to > create the proper objects and relationships that Bio::AlignIO::msf > would want. > > But this reply probably isn't helpful, because you probably already > knew that much. I'm mostly just trying to add to this thread so > that people who actually know a lot about BioPerl's functions in > this area will see it and hopefully be of more help. > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > > > Thomas Keller wrote: >> Greetings, >> Blast 2.2.21 has a multi-sequence alignment feature that is really >> handy: put in the accession number of the refseq in one sequence >> field and a concatenated fasta file of the Sanger reads to align in >> the second box and it does the alignments. Unfortunately, the >> output is a series of alignments rather than the more useful msf >> format with all reads aligned with the reference. >> Is there a bioperl module that reads the blast alignments and >> converts it to an msf alignment? 
>> thanks, >> Tom >> kellert at ohsu.edu >> 503-494-2442 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason at bioperl.org From cjm at berkeleybop.org Thu Jul 23 15:54:13 2009 From: cjm at berkeleybop.org (Chris Mungall) Date: Thu, 23 Jul 2009 15:54:13 -0400 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <025907D4D2344FDC90E915E605B7FEB8@NewLife> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org><4A60EBB5.4010004@cornell.edu> <4A60FFF8.3030302@jays.net><66FDE248-4CF8-4F68-91D5-16D0AE30B36E@illinois.edu> <91389D4D-B46C-49BA-9D5D-04DD82014B1C@jays.net> <025907D4D2344FDC90E915E605B7FEB8@NewLife> Message-ID: <2379556E-937B-4BAC-9BA4-6C0092AD804B@berkeleybop.org> OMG it's full of *s. (and %s. and $s. and @s. and #s...) On Jul 23, 2009, at 10:16 AM, Mark A. Jensen wrote: > Open the pod bay doors, BioPerl. > > ----- Original Message ----- From: "Jay Hannah" > To: > Sent: Thursday, July 23, 2009 10:11 AM > Subject: Re: [Bioperl-l] bioperl reorganization > > >> On Jul 17, 2009, at 10:26 PM, Chris Fields wrote: >>>> If Bio::Foo::Bar is abandoned by all distributions, a new copy >>>> of that dist is flagged DEPRECATED ("in favor of >>>> Bio::Fooer::Bar"), and pushed to CPAN. That clues everyone in >>>> that development has stopped and where they should go instead. >>>> For example: >>>> >>>> http://search.cpan.org/~mramberg/Catalyst-Plugin- >>>> FormValidator-0.03/ >>> >>> Okay, but seems kinda crufty. 
I do think there is some talk of >>> removing such modules from the active CPAN, as they would always >>> be available as part of BackPAN, but I haven't seen movement >>> along those lines. >> DEPRECATED modules can be removed from PAUSE after 6 months or 1 >> year or 50 years or whatever. Better to have it explicitly flagged >> and sitting out there than not flagged, misleading new users >> seeking solutions on CPAN. Eventually completely gone. >>> Yes, I have to say it has been very nice with Moose, though I >>> wish MooseX::Declare and MooseX::Method::Signatures would move >>> out of alpha (probably will happen around the first stable >>> release of perl6). >> Indeed. A current CPAN is no magic bullet for every development >> dilemma. It's just better than a stagnant one. :) >> There's still plenty of tactical argument and jockeying in >> Catalyst and Moose. Like any healthy and active open source >> project populated by energetic people. >> On Jul 17, 2009, at 10:31 PM, Chris Fields wrote: >>> I think both of you made very good arguments. Will have to >>> nickname you guys the IRC Mob. >> Oooo... I like it! I'll sketch up some tattoo designs. :) >> On Jul 18, 2009, at 11:10 AM, Mark A. Jensen wrote: >>> http://www.bioperl.org/wiki/Module_Connectivity >> Wow. That's awesome. :) >>> My guess is that the NumDependencies values (which move fastest >>> in the Degree sum and create the sawtooth pattern) reflect >>> dependencies among the modules within the clusters. Wouldn't that >>> be cool? I don't think this works, but the data could certainly >>> be cajoled into giving us this. >> Can the data be cajoled into proving that we've already split up >> BioPerl, so we can avoid all that SVN drudgery? :) >> BioPerl became self aware on August 17, 2009. 
>> j >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From rmb32 at cornell.edu Thu Jul 23 17:26:05 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 23 Jul 2009 14:26:05 -0700 Subject: [Bioperl-l] genbank (blast) alignments In-Reply-To: <9E710B34E44A4057AF7DFD0B7EDC8017@NewLife> References: <06C35F8D-1EE5-4882-8BF4-111311FBEEC4@ohsu.edu> <4A68BC36.7080006@cornell.edu> <9E710B34E44A4057AF7DFD0B7EDC8017@NewLife> Message-ID: <4A68D56D.7070103@cornell.edu> Mark A. Jensen wrote: > encourage others to do the same, and they do. One thing I tend > not to do is to lay the guilt trip on already busy people. I find this > helps me in later interactions, presently unimagined, with those busy > and influential persons. I certainly wasn't trying to give any "busy and influential persons" any guilt, I was just innocently saying "I'm surprised nobody has gotten to you yet". Maybe I should have been a little more careful with my wording. I apologize if I caused offense. Rob From maj at fortinbras.us Thu Jul 23 17:20:32 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 23 Jul 2009 17:20:32 -0400 Subject: [Bioperl-l] genbank (blast) alignments In-Reply-To: <4A68BC36.7080006@cornell.edu> References: <06C35F8D-1EE5-4882-8BF4-111311FBEEC4@ohsu.edu> <4A68BC36.7080006@cornell.edu> Message-ID: <9E710B34E44A4057AF7DFD0B7EDC8017@NewLife> In my experience on the list, everyone gets helped as soon as it is possible for the extremely busy people who do the helping make time for it. For example, I trolled the list in my limited free time the other night and answered two outstanding questions. Of course, I encourage others to do the same, and they do. 
One thing I tend not to do is to lay the guilt trip on already busy people. I find this helps me in later interactions, presently unimagined, with those busy and influential persons. The culture may be different for other open source projects. I have found a good strategy here to be: provide the help with minimal editorial comment (either on others' response, or your own), and let the help you provide speak for itself and accumulate over time. cheers, MAJ ----- Original Message ----- From: "Robert Buels" To: "Thomas Keller" Cc: "BioPerl-List" Sent: Thursday, July 23, 2009 3:38 PM Subject: Re: [Bioperl-l] genbank (blast) alignments > Wow, that silence is deafening. I can't believe somebody who knows what > they're talking about hasn't written you back yet. > > Perhaps you could do some kind of transformation where you read in the > BLAST report with Bio::SearchIO, and then write to MSF with > Bio::AlignIO::msf? You would probably need to do some fiddling to > create the proper objects and relationships that Bio::AlignIO::msf would > want. > > But this reply probably isn't helpful, because you probably already knew > that much. I'm mostly just trying to add to this thread so that people > who actually know a lot about BioPerl's functions in this area will see > it and hopefully be of more help. > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > > > Thomas Keller wrote: >> Greetings, >> Blast 2.2.21 has a multi-sequence alignment feature that is really >> handy: put in the accession number of the refseq in one sequence field >> and a concatenated fasta file of the Sanger reads to align in the second >> box and it does the alignments. Unfortunately, the output is a series of >> alignments rather than the more useful msf format with all reads aligned >> with the reference. 
>> >> Is there a bioperl module that reads the blast alignments and converts >> it to an msf alignment? >> >> thanks, >> >> >> Tom >> kellert at ohsu.edu >> 503-494-2442 >> >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From hlapp at gmx.net Thu Jul 23 18:15:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 23 Jul 2009 18:15:06 -0400 Subject: [Bioperl-l] genbank (blast) alignments In-Reply-To: <4A68BC36.7080006@cornell.edu> References: <06C35F8D-1EE5-4882-8BF4-111311FBEEC4@ohsu.edu> <4A68BC36.7080006@cornell.edu> Message-ID: On Jul 23, 2009, at 3:38 PM, Robert Buels wrote: > Wow, that silence is deafening. On Jul 23, 2009, at 5:20 PM, Mark A. Jensen wrote: > One thing I tend not to do is to lay the guilt trip on already busy > people. I'd like to turn this into the positive message that a) people watching and helping that things aren't falling through the cracks on the list is a Good Thing, much appreciated, and I don't think makes anyone feel guilty (not me at least), and b) this should serve as a reminder to all lurkers and casual users out there that you can help out already simply by answering questions on the list rather than waiting for one of the core devs to come along and do so, whether they are extremely busy or not. Everyone can and should do the helping. So, in that sense, thanks to both Rob and Mark for your contributions - they're valued. As for the influential persons, I don't think there are any here worth paying much attention to other than those who by the nature of volunteering to do some stuff influence what gets done and what doesn't :-) My $0.02. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Thu Jul 23 18:53:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 23 Jul 2009 17:53:29 -0500 Subject: [Bioperl-l] genbank (blast) alignments In-Reply-To: <4A68BC36.7080006@cornell.edu> References: <06C35F8D-1EE5-4882-8BF4-111311FBEEC4@ohsu.edu> <4A68BC36.7080006@cornell.edu> Message-ID: <4B791FED-2A45-4D05-BBEA-2DFFB96F54E2@illinois.edu> Lots of emails to answer, so little time. Doesn't help when my VPN goes out either ;> What you want appears to be generating a multiple alignment from pairwise alignment. The answer is 'very likely not'. However, the local BLAST executable does have several options for generating alignments from HSP data (assuming that's what you mean):

  -m : alignment view options:
       0 = pairwise,
       1 = query-anchored showing identities,
       2 = query-anchored no identities,
       3 = flat query-anchored, show identities,
       4 = flat query-anchored, no identities,
       5 = query-anchored no identities and blunt ends,
       6 = flat query-anchored, no identities and blunt ends,
       7 = XML Blast output,
       8 = tabular,
       9 tabular with comment lines
       [Integer] default = 0

You can set this by reformatting on the BLAST web site (here's a chunk of the output, note the query):

Query          61  PVTVGEIDITLYRDDLS-KKTSND-E--PLVKGADIPVDIT-------DQKVILVDDVLY  109
NP_389430      61  PVTVGEIDITLYRDDLS-KKTSND-E--PLVKGADIPVDIT-------DQKVILVDDVLY  109
YP_001421124   61  PVTVGEIDITLYRDDLT-KKTSNE-E--PLVKGADIPADIT-------DQKVIVVDDVLY  109
YP_078940      63  KVTVGELDITLYRDDLS-KKTSNK-E--PLVKGADIPADIT-------DQKVILVDDVLY  111
ZP_03053294    61  PVIVGELDITLYRDDLT-KKTENQ-D--PLVKGADIPADIN-------DKTLIVVDDVLF  109
YP_001486689   61  PVIVGELDITLYRDDLT-KKTDNQ-D--PLVKGADIPADIN-------DKTLIVVDDVLF  109
YP_002949168   60  AVPVGELDITLYRDDLT-VKTIDH-E--PLVKGTDVPFDVT-------NKKVILVDDVLF  108
ZP_01860800    61  KMPVGEIDITLYRDDLT-VKTANE-E--PEVKGSDLPVDVT-------DKKVILIDDVLF  109
ZP_04121773    61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
ZP_04218628    61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
YP_002316154   66  SIPVGELDITLYRDDLT-VKTDDR-E--PLVKGTDVPFSVT-------NQKVILVDDVLF  114
ZP_00240953    61  EMEVGELDITLYRDDLT-LQSKNE-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
YP_037953      61  EIEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
ZP_04193166    61  KMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
NP_833611      61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
ZP_03018932    61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
...

We do not have a parser for that format, BTW, but it wouldn't be too hard to get something working quickly based on one of the current parsers. Probably could go AlignIO or SearchIO (or both). chris On Jul 23, 2009, at 2:38 PM, Robert Buels wrote: > Wow, that silence is deafening. I can't believe somebody who knows > what they're talking about hasn't written you back yet. > > Perhaps you could do some kind of transformation where you read in > the BLAST report with Bio::SearchIO, and then write to MSF with > Bio::AlignIO::msf? You would probably need to do some fiddling to > create the proper objects and relationships that Bio::AlignIO::msf > would want. > > But this reply probably isn't helpful, because you probably already > knew that much. I'm mostly just trying to add to this thread so > that people who actually know a lot about BioPerl's functions in > this area will see it and hopefully be of more help.
> > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > > > Thomas Keller wrote: >> Greetings, >> Blast 2.2.21 has a multi-sequence alignment feature that is really >> handy: put in the accession number of the refseq in one sequence >> field and a concatenated fasta file of the Sanger reads to align in >> the second box and it does the alignments. Unfortunately, the >> output is a series of alignments rather than the more useful msf >> format with all reads aligned with the reference. >> Is there a bioperl module that reads the blast alignments and >> converts it to an msf alignment? >> thanks, >> Tom >> kellert at ohsu.edu >> 503-494-2442 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Jul 23 18:58:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 23 Jul 2009 17:58:01 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com> Message-ID: <5128A289-377E-4EC3-9030-E0E91B463EA1@illinois.edu> On Jul 23, 2009, at 6:31 AM, Peter Cock wrote: > On Wed, Jul 8, 2009 at 5:24 PM, Chris Fields > wrote: >> >> It would be nice to get 
some regression tests going for this to >> make sure it >> does what we expect, so maybe some test data and expected results? >> > > Regression tests for BioPerl's FASTQ support would of course > be sensible. In terms of sample data and expected results... > > I've got some test files put together for Biopython, and I have > been cross checking Biopython's FASTQ support against > EMBOSS 6.1.0 which has turned up a few issues: > http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000577.html > > ------------------------------------------------------------------------------ > > I'd like to get comparisons against BioPerl's new FASTQ support > going too. To do this I'd need to know which (branch?) of BioPerl I > should install, and I'd also like a trivial sample BioPerl script to > do > piped FASTQ conversion. i.e. read a FASTQ file from stdin (say > as "fastq-solexa"), and output it to stdout (say as "fastq" meaning > the Sanger Standard FASTQ). You would have to install from svn (bioperl-live) if you want the refactored fastq. That commit was within the last month. > i.e. Something like this four line Biopython script would be perfect: > http://biopython.org/wiki/Reading_from_unix_pipes We use named parameters so it's a little more verbose.

use Bio::SeqIO;

# Read Sanger FASTQ from STDIN.
my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'fastq-sanger');
# No -file or -fh is given, so records are written to STDOUT, here as Solexa FASTQ.
# Swap the two -format values to convert Solexa to Sanger instead.
my $out = Bio::SeqIO->new(-format => 'fastq-solexa');

while (my $seq = $in->next_seq) {
    $out->write_seq($seq);
}

Don't be surprised if there are still bugs lurking about, just let me know and I'll fix 'em. > ------------------------------------------------------------------------------ > > Peter Rice and I have also been talking about line wrapping when > writing FASTQ output, and if this is a good idea or not: > http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000593.html > > Thanks! > > Peter C. (@Biopython) BTW, I think the bioperl parser does handle line-wrapped FASTQ now. Anyway, I tend to agree with Aaron on that point.
Too many exceptions to the rule make it harder to write parsers for human-readable format. chris From maj at fortinbras.us Thu Jul 23 19:16:18 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 23 Jul 2009 19:16:18 -0400 Subject: [Bioperl-l] Getting genomic coordinates for a list of genes In-Reply-To: <2ac05d0f0907170549td482271ra7ea77bdfe43ee27@mail.gmail.com> References: <2ac05d0f0907170549td482271ra7ea77bdfe43ee27@mail.gmail.com> Message-ID: Sorry, went off-list for a couple cycles. The final product will get the correct chromosomal coordinates and then return the sequence from the current build, based on a geneID input. See http://www.bioperl.org/wiki/Human_genomic_coordinates_and_sequence for the results. cheers MAJ ----- Original Message ----- From: "Emanuele Osimo" To: "perl bioperl ml" Sent: Friday, July 17, 2009 8:49 AM Subject: [Bioperl-l] Getting genomic coordinates for a list of genes > Hello everyone, > I'm new to programming, I'm a biologist, so please forgive my ignorance, but > I've been trying this for 2 weeks, now I have to ask you. > I'm trying the script I found at > http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates > because I need to have some variables (like $from and $to) assigned to the > start and end of a gene. > The script works fine, but gives me the wrong coordinates: for example if I > try it with the gene 842 (CASP9), it prints: > NT_004610.19 2498878 2530877 > > I found out that in Entrez, for each gene (for CASP9, for example, at > http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq > ) under "Genome Reference Consortium Human Build 37 (GRCh37), > Primary_Assembly" there are two different sets of coordinates. 
The first is > called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37), > Primary_Assembly", and is the one I need, and the second one is called just > "NT_004610.19" and it's the one that the script prints. > This is valid for all the genes I tried. > > DO you know how to make the script print the "right" coordinates (at least, > the one I need)? > Thanks a lot in advance, > Emanuele > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From e.osimo at gmail.com Thu Jul 23 19:24:24 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Thu, 23 Jul 2009 16:24:24 -0700 Subject: [Bioperl-l] Getting genomic coordinates for a list of genes In-Reply-To: References: <2ac05d0f0907170549td482271ra7ea77bdfe43ee27@mail.gmail.com> Message-ID: <2ac05d0f0907231624j3b6f6a86yc14833a3d0fc1181@mail.gmail.com> Hello everyone. Today I discovered that the coupling of the two subs that Mark posted doesn't get the right results. I think this is because one gets the coordinates with RefSeq build 36.3, the other with build 37. I found that coupling the first sub, genome_coords, with the Bio::EnsEMBL::Registry fetch by region API is a lot better, and it actually generates sequences that contain the genes. Bye Emanuele P.S. Thanks a lot to Mark!! On Thu, Jul 23, 2009 at 16:16, Mark A. Jensen wrote: > Sorry, went off-list for a couple cycles. The final product will get the > correct chromosomal coordinates and then return the sequence from > the current build, based on a geneID input. See > http://www.bioperl.org/wiki/Human_genomic_coordinates_and_sequence > for the results. 
> cheers MAJ > ----- Original Message ----- From: "Emanuele Osimo" > To: "perl bioperl ml" > Sent: Friday, July 17, 2009 8:49 AM > Subject: [Bioperl-l] Getting genomic coordinates for a list of genes > > > Hello everyone, >> I'm new to programming, I'm a biologist, so please forgive my ignorance, >> but >> I've been trying this for 2 weeks, now I have to ask you. >> I'm trying the script I found at >> >> http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates >> because I need to have some variables (like $from and $to) assigned to the >> start and end of a gene. >> The script works fine, but gives me the wrong coordinates: for example if >> I >> try it with the gene 842 (CASP9), it prints: >> NT_004610.19 2498878 2530877 >> >> I found out that in Entrez, for each gene (for CASP9, for example, at >> >> http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq >> ) under "Genome Reference Consortium Human Build 37 (GRCh37), >> Primary_Assembly" there are two different sets of coordinates. The first >> is >> called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37), >> Primary_Assembly", and is the one I need, and the second one is called >> just >> "NT_004610.19" and it's the one that the script prints. >> This is valid for all the genes I tried. >> >> DO you know how to make the script print the "right" coordinates (at >> least, >> the one I need)? >> Thanks a lot in advance, >> Emanuele >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > From maj at fortinbras.us Thu Jul 23 19:33:42 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Thu, 23 Jul 2009 19:33:42 -0400 Subject: [Bioperl-l] Getting genomic coordinates for a list of genes In-Reply-To: <2ac05d0f0907231624j3b6f6a86yc14833a3d0fc1181@mail.gmail.com> References: <2ac05d0f0907170549td482271ra7ea77bdfe43ee27@mail.gmail.com> <2ac05d0f0907231624j3b6f6a86yc14833a3d0fc1181@mail.gmail.com> Message-ID: <6B9FC91D4C4A470A830E9ED7B067EF90@NewLife> Excellent, Emanuele-- would you post your fix to the list? thanks--MAJ ----- Original Message ----- From: Emanuele Osimo To: Mark A. Jensen Cc: perl bioperl ml Sent: Thursday, July 23, 2009 7:24 PM Subject: Re: [Bioperl-l] Getting genomic coordinates for a list of genes Hello everyone. Today I discovered that the coupling of the two subs that Mark posted doesn't get the right results. I think this is because one gets the coordinates with RefSeq build 36.3, the other with build 37. I found that coupling the first sub, genome_coords, with the Bio::EnsEMBL::Registry fetch by region API is a lot better, and it actually generates sequences that contain the genes. Bye Emanuele P.S. Thanks a lot to Mark!! On Thu, Jul 23, 2009 at 16:16, Mark A. Jensen wrote: Sorry, went off-list for a couple cycles. The final product will get the correct chromosomal coordinates and then return the sequence from the current build, based on a geneID input. See http://www.bioperl.org/wiki/Human_genomic_coordinates_and_sequence for the results. cheers MAJ ----- Original Message ----- From: "Emanuele Osimo" To: "perl bioperl ml" Sent: Friday, July 17, 2009 8:49 AM Subject: [Bioperl-l] Getting genomic coordinates for a list of genes Hello everyone, I'm new to programming, I'm a biologist, so please forgive my ignorance, but I've been trying this for 2 weeks, now I have to ask you. I'm trying the script I found at http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates because I need to have some variables (like $from and $to) assigned to the start and end of a gene. 
The script works fine, but gives me the wrong coordinates: for example if I try it with the gene 842 (CASP9), it prints: NT_004610.19 2498878 2530877 I found out that in Entrez, for each gene (for CASP9, for example, at http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq ) under "Genome Reference Consortium Human Build 37 (GRCh37), Primary_Assembly" there are two different sets of coordinates. The first is called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37), Primary_Assembly", and is the one I need, and the second one is called just "NT_004610.19" and it's the one that the script prints. This is valid for all the genes I tried. DO you know how to make the script print the "right" coordinates (at least, the one I need)? Thanks a lot in advance, Emanuele _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Jul 23 19:36:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 23 Jul 2009 19:36:01 -0400 Subject: [Bioperl-l] completely off topic Message-ID: <89CD26B05CCA4378882B63E8BBCDC016@NewLife> All, http://www.google.com/search?q=recursion :-) MAJ From cjfields at illinois.edu Thu Jul 23 19:57:03 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 23 Jul 2009 18:57:03 -0500 Subject: [Bioperl-l] completely off topic In-Reply-To: <89CD26B05CCA4378882B63E8BBCDC016@NewLife> References: <89CD26B05CCA4378882B63E8BBCDC016@NewLife> Message-ID: <963BE27F-ED1A-4AD9-9FA4-5A498CA6EB7A@illinois.edu> Ha! Wonder how many people fell for that one. -c On Jul 23, 2009, at 6:36 PM, Mark A. 
Jensen wrote: > All, > http://www.google.com/search?q=recursion > :-) > MAJ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Thu Jul 23 20:06:07 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 23 Jul 2009 20:06:07 -0400 Subject: [Bioperl-l] completely off topic In-Reply-To: <963BE27F-ED1A-4AD9-9FA4-5A498CA6EB7A@illinois.edu> References: <89CD26B05CCA4378882B63E8BBCDC016@NewLife> <963BE27F-ED1A-4AD9-9FA4-5A498CA6EB7A@illinois.edu> Message-ID: <6219CF943E114BF28FF77083F407F960@NewLife> Good question-- I saw it under "Stupid Google Tricks" in http://www.publish2.com/newsgroups/nyt-technology-journalists/rss/ ----- Original Message ----- From: "Chris Fields" To: "Mark A. Jensen" Cc: "BioPerl List" Sent: Thursday, July 23, 2009 7:57 PM Subject: Re: [Bioperl-l] completely off topic > Ha! Wonder how many people fell for that one. > > -c > > On Jul 23, 2009, at 6:36 PM, Mark A. Jensen wrote: > >> All, >> http://www.google.com/search?q=recursion >> :-) >> MAJ >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From rmb32 at cornell.edu Thu Jul 23 20:27:25 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 23 Jul 2009 17:27:25 -0700 Subject: [Bioperl-l] completely off topic In-Reply-To: <963BE27F-ED1A-4AD9-9FA4-5A498CA6EB7A@illinois.edu> References: <89CD26B05CCA4378882B63E8BBCDC016@NewLife> <963BE27F-ED1A-4AD9-9FA4-5A498CA6EB7A@illinois.edu> Message-ID: <4A68FFED.8010102@cornell.edu> That is damn funny. 
From cjfields at illinois.edu Thu Jul 23 20:36:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 23 Jul 2009 19:36:45 -0500 Subject: [Bioperl-l] completely off topic In-Reply-To: <6219CF943E114BF28FF77083F407F960@NewLife> References: <89CD26B05CCA4378882B63E8BBCDC016@NewLife> <963BE27F-ED1A-4AD9-9FA4-5A498CA6EB7A@illinois.edu> <6219CF943E114BF28FF77083F407F960@NewLife> Message-ID: <25C6629B-5A70-47E3-838C-2AF829F8C202@illinois.edu> Ah, Spaced is on Hulu now! oh, sorry... -c On Jul 23, 2009, at 7:06 PM, Mark A. Jensen wrote: > Good question-- I saw it under "Stupid Google Tricks" > in http://www.publish2.com/newsgroups/nyt-technology-journalists/rss/ > > ----- Original Message ----- From: "Chris Fields" > > To: "Mark A. Jensen" > Cc: "BioPerl List" > Sent: Thursday, July 23, 2009 7:57 PM > Subject: Re: [Bioperl-l] completely off topic > > >> Ha! Wonder how many people fell for that one. >> -c >> On Jul 23, 2009, at 6:36 PM, Mark A. Jensen wrote: >>> All, >>> http://www.google.com/search?q=recursion >>> :-) >>> MAJ >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From e.osimo at gmail.com Thu Jul 23 20:48:26 2009 From: e.osimo at gmail.com (Emanuele Osimo) Date: Thu, 23 Jul 2009 17:48:26 -0700 Subject: [Bioperl-l] Getting genomic coordinates for a list of genes AND WUBlast Message-ID: <2ac05d0f0907231748l6b0fd53cl6c9c435688b89b73@mail.gmail.com> Hello, this is the fix:

use Bio::DB::EntrezGene;
use Bio::EnsEMBL::Slice;
use Bio::EnsEMBL::Registry;

my $db = Bio::DB::EntrezGene->new;

# Connect to the public Ensembl database server.
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous'
);
my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' );

# $chr, $start and $end are the coordinates obtained from genome_coords.
my $slice = $slice_adaptor->fetch_by_region( 'chromosome', $chr, $start, $end );
print $slice->seq;

To be used after getting the coordinates with sub genome_coords. I have another question for you: I need to use the software WUBlast, but I noticed that it is no longer available on the website. They just say that if you have it, you can use it. I don't have it, but I urgently need it. If anyone has it, could you please send it to me? Thanks Emanuele On Thu, Jul 23, 2009 at 16:33, Mark A. Jensen wrote: > Excellent, Emanuele-- would you post your fix to the list? > thanks--MAJ > > ----- Original Message ----- > From: Emanuele Osimo > To: Mark A. Jensen > Cc: perl bioperl ml > Sent: Thursday, July 23, 2009 7:24 PM > Subject: Re: [Bioperl-l] Getting genomic coordinates for a list of genes > Hello everyone. > Today I discovered that the coupling of the two subs that Mark posted > doesn't get the right results. I think this is because one gets the > coordinates with RefSeq build 36.3, the other with build 37. > I found that coupling the first sub, genome_coords, with the > Bio::EnsEMBL::Registry fetch by region API is a lot better, and it actually > generates sequences that contain the genes. > Bye > Emanuele > > P.S. > Thanks a lot to Mark!! > > > On Thu, Jul 23, 2009 at 16:16, Mark A. Jensen wrote: >> >> Sorry, went off-list for a couple cycles. The final product will get the >> correct chromosomal coordinates and then return the sequence from >> the current build, based on a geneID input. See >> http://www.bioperl.org/wiki/Human_genomic_coordinates_and_sequence >> for the results. >> cheers MAJ >> ----- Original Message ----- From: "Emanuele Osimo" >> To: "perl bioperl ml" >> Sent: Friday, July 17, 2009 8:49 AM >> Subject: [Bioperl-l] Getting genomic coordinates for a list of genes >> >> >>> Hello everyone, >>> I'm new to programming, I'm a biologist, so please forgive my ignorance, >>> but >>> I've been trying this for 2 weeks, now I have to ask you.
>>> I'm trying the script I found at >>> >>> http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates >>> because I need to have some variables (like $from and $to) assigned to >>> the >>> start and end of a gene. >>> The script works fine, but gives me the wrong coordinates: for example if >>> I >>> try it with the gene 842 (CASP9), it prints: >>> NT_004610.19 2498878 2530877 >>> >>> I found out that in Entrez, for each gene (for CASP9, for example, at >>> >>> http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq >>> ) under "Genome Reference Consortium Human Build 37 (GRCh37), >>> Primary_Assembly" there are two different sets of coordinates. The first >>> is >>> called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37), >>> Primary_Assembly", and is the one I need, and the second one is called >>> just >>> "NT_004610.19" and it's the one that the script prints. >>> This is valid for all the genes I tried. >>> >>> DO you know how to make the script print the "right" coordinates (at >>> least, >>> the one I need)? >>> Thanks a lot in advance, >>> Emanuele >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > > From maj at fortinbras.us Thu Jul 23 21:19:21 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 23 Jul 2009 21:19:21 -0400 Subject: [Bioperl-l] Getting genomic coordinates for a list of genes ANDWUBlast In-Reply-To: <2ac05d0f0907231748l6b0fd53cl6c9c435688b89b73@mail.gmail.com> References: <2ac05d0f0907231748l6b0fd53cl6c9c435688b89b73@mail.gmail.com> Message-ID: <42BE527274484A98AE6CDC08BFFF45FF@NewLife> the Scrapbook page is now updated- MAJ ----- Original Message ----- From: "Emanuele Osimo" To: "Mark A.
Jensen" Cc: "perl bioperl ml" Sent: Thursday, July 23, 2009 8:48 PM Subject: Re: [Bioperl-l] Getting genomic coordinates for a list of genes ANDWUBlast Hello, this is the fix: use Bio::EnsEMBL::Slice; use Bio::EnsEMBL::Registry; my $db = new Bio::DB::EntrezGene; my $registry = 'Bio::EnsEMBL::Registry'; $registry->load_registry_from_db( -host => 'ensembldb.ensembl.org', -user => 'anonymous' ); my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' ); my $slice = $slice_adaptor->fetch_by_region( 'chromosome', $chr, $start, $end ); print $slice->seq ; To be used after getting the coordinates with sub genome_coords . I have another question for you: I need to use the software WUBlast, but I noticed that it is no more available on the website. They just say that if you have it, you can use it. I don't have it, but I urgently need it, if anyone has it, could you please send it to me? Thanks Emanuele On Thu, Jul 23, 2009 at 16:33, Mark A. Jensen wrote: > Excellent, Emanuele-- would you post your fix to the list? > thanks--MAJ > > ----- Original Message ----- > From: Emanuele Osimo > To: Mark A. Jensen > Cc: perl bioperl ml > Sent: Thursday, July 23, 2009 7:24 PM > Subject: Re: [Bioperl-l] Getting genomic coordinates for a list of genes > Hello everyone. > Today I discovered that the coupling of the two subs that Mark posted > doesn't get the right results. I think this is because one gets the > coordinates with RefSeq build 36.3, the other with build 37. > I found that coupling the first sub, genome_coords, with the > Bio::EnsEMBL::Registry fetch by region API is a lot better, and it actually > generates sequences that contain the genes. > Bye > Emanuele > > P.S. > Thanks a lot to Mark!! > > > On Thu, Jul 23, 2009 at 16:16, Mark A. Jensen wrote: >> >> Sorry, went off-list for a couple cycles. The final product will get the >> correct chromosomal coordinates and then return the sequence from >> the current build, based on a geneID input. 
See >> http://www.bioperl.org/wiki/Human_genomic_coordinates_and_sequence >> for the results. >> cheers MAJ >> ----- Original Message ----- From: "Emanuele Osimo" >> To: "perl bioperl ml" >> Sent: Friday, July 17, 2009 8:49 AM >> Subject: [Bioperl-l] Getting genomic coordinates for a list of genes >> >> >>> Hello everyone, >>> I'm new to programming, I'm a biologist, so please forgive my ignorance, >>> but >>> I've been trying this for 2 weeks, now I have to ask you. >>> I'm trying the script I found at >>> >>> http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates >>> because I need to have some variables (like $from and $to) assigned to >>> the >>> start and end of a gene. >>> The script works fine, but gives me the wrong coordinates: for example if >>> I >>> try it with the gene 842 (CASP9), it prints: >>> NT_004610.19 2498878 2530877 >>> >>> I found out that in Entrez, for each gene (for CASP9, for example, at >>> >>> http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq >>> ) under "Genome Reference Consortium Human Build 37 (GRCh37), >>> Primary_Assembly" there are two different sets of coordinates. The first >>> is >>> called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37), >>> Primary_Assembly", and is the one I need, and the second one is called >>> just >>> "NT_004610.19" and it's the one that the script prints. >>> This is valid for all the genes I tried. >>> >>> DO you know how to make the script print the "right" coordinates (at >>> least, >>> the one I need)? 
>>> Thanks a lot in advance, >>> Emanuele >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Thu Jul 23 22:31:06 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 24 Jul 2009 14:31:06 +1200 Subject: [Bioperl-l] Getting genomic coordinates for a list of genes AND WUBlast In-Reply-To: <2ac05d0f0907231748l6b0fd53cl6c9c435688b89b73@mail.gmail.com> References: <2ac05d0f0907231748l6b0fd53cl6c9c435688b89b73@mail.gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB0A1144@exchsth.agresearch.co.nz> It's still available on the new site but only as an old version - v2.0a19 (but it's now free) http://www.advbiocomp.com/blast/obsolete/ --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Emanuele Osimo > Sent: Friday, 24 July 2009 12:48 p.m. > To: Mark A. Jensen > Cc: perl bioperl ml > Subject: Re: [Bioperl-l] Getting genomic coordinates for a list of genes AND > WUBlast > > Hello, > this is the fix: > > use Bio::EnsEMBL::Slice; > use Bio::EnsEMBL::Registry; > > my $db = new Bio::DB::EntrezGene; > > my $registry = 'Bio::EnsEMBL::Registry'; > $registry->load_registry_from_db( > -host => 'ensembldb.ensembl.org', > -user => 'anonymous' > ); > my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' ); > > my $slice = $slice_adaptor->fetch_by_region( 'chromosome', $chr, $start, $end > ); > print $slice->seq ; > > To be used after getting the coordinates with sub genome_coords . 
> > > I have another question for you: I need to use the software WUBlast, > but I noticed that it is no more available on the website. They just > say that if you have it, you can use it. I don't have it, but I > urgently need it, if anyone has it, could you please send it to me? > > Thanks > Emanuele > > > On Thu, Jul 23, 2009 at 16:33, Mark A. Jensen wrote: > > Excellent, Emanuele-- would you post your fix to the list? > > thanks--MAJ > > > > ----- Original Message ----- > > From: Emanuele Osimo > > To: Mark A. Jensen > > Cc: perl bioperl ml > > Sent: Thursday, July 23, 2009 7:24 PM > > Subject: Re: [Bioperl-l] Getting genomic coordinates for a list of genes > > Hello everyone. > > Today I discovered that the coupling of the two subs that Mark posted > > doesn't get the right results. I think this is because one gets the > > coordinates with RefSeq build 36.3, the other with build 37. > > I found that coupling the first sub, genome_coords, with the > > Bio::EnsEMBL::Registry fetch by region API is a lot better, and it actually > > generates sequences that contain the genes. > > Bye > > Emanuele > > > > P.S. > > Thanks a lot to Mark!! > > > > > > On Thu, Jul 23, 2009 at 16:16, Mark A. Jensen wrote: > >> > >> Sorry, went off-list for a couple cycles. The final product will get the > >> correct chromosomal coordinates and then return the sequence from > >> the current build, based on a geneID input. See > >> http://www.bioperl.org/wiki/Human_genomic_coordinates_and_sequence > >> for the results. > >> cheers MAJ > >> ----- Original Message ----- From: "Emanuele Osimo" > >> To: "perl bioperl ml" > >> Sent: Friday, July 17, 2009 8:49 AM > >> Subject: [Bioperl-l] Getting genomic coordinates for a list of genes > >> > >> > >>> Hello everyone, > >>> I'm new to programming, I'm a biologist, so please forgive my ignorance, > >>> but > >>> I've been trying this for 2 weeks, now I have to ask you. 
> >>> I'm trying the script I found at > >>> > >>> http://bio.perl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::EntrezGene_to_get_genomic_coordinates > >>> because I need to have some variables (like $from and $to) assigned to > >>> the > >>> start and end of a gene. > >>> The script works fine, but gives me the wrong coordinates: for example if > >>> I > >>> try it with the gene 842 (CASP9), it prints: > >>> NT_004610.19 2498878 2530877 > >>> > >>> I found out that in Entrez, for each gene (for CASP9, for example, at > >>> > >>> http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez.Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq > >>> ) under "Genome Reference Consortium Human Build 37 (GRCh37), > >>> Primary_Assembly" there are two different sets of coordinates. The first > >>> is > >>> called "NC_000001.10 Genome Reference Consortium Human Build 37 (GRCh37), > >>> Primary_Assembly", and is the one I need, and the second one is called > >>> just > >>> "NT_004610.19" and it's the one that the script prints. > >>> This is valid for all the genes I tried. > >>> > >>> DO you know how to make the script print the "right" coordinates (at > >>> least, > >>> the one I need)? > >>> Thanks a lot in advance, > >>> Emanuele > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material.
Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. ======================================================================= From cjfields at illinois.edu Thu Jul 23 23:04:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 23 Jul 2009 22:04:38 -0500 Subject: [Bioperl-l] Getting genomic coordinates for a list of genes AND WUBlast In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB0A1144@exchsth.agresearch.co.nz> References: <2ac05d0f0907231748l6b0fd53cl6c9c435688b89b73@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB0A1144@exchsth.agresearch.co.nz> Message-ID: <36042A85-6219-465C-8850-A8FB89029504@illinois.edu> Unfortunately most users are bound by the previous licensing terms, b/ c I know a few squirrels who have this lying around. I think the newer version will be free for academic use if it ever makes the light of day. chris On Jul 23, 2009, at 9:31 PM, Smithies, Russell wrote: > It's still available on the new site but only as an old version - > v2.0a19 (but it's now free) > http://www.advbiocomp.com/blast/obsolete/ > > > --Russell > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Emanuele Osimo >> Sent: Friday, 24 July 2009 12:48 p.m. >> To: Mark A. 
Jensen >> Cc: perl bioperl ml >> Subject: Re: [Bioperl-l] Getting genomic coordinates for a list of >> genes AND >> WUBlast >> >> Hello, >> this is the fix: >> >> use Bio::EnsEMBL::Slice; >> use Bio::EnsEMBL::Registry; >> >> my $db = new Bio::DB::EntrezGene; >> >> my $registry = 'Bio::EnsEMBL::Registry'; >> $registry->load_registry_from_db( >> -host => 'ensembldb.ensembl.org', >> -user => 'anonymous' >> ); >> my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', >> 'Slice' ); >> >> my $slice = $slice_adaptor->fetch_by_region( 'chromosome', $chr, >> $start, $end >> ); >> print $slice->seq ; >> >> To be used after getting the coordinates with sub genome_coords . >> >> >> I have another question for you: I need to use the software WUBlast, >> but I noticed that it is no more available on the website. They just >> say that if you have it, you can use it. I don't have it, but I >> urgently need it, if anyone has it, could you please send it to me? >> >> Thanks >> Emanuele >> >> >> On Thu, Jul 23, 2009 at 16:33, Mark A. Jensen >> wrote: >>> Excellent, Emanuele-- would you post your fix to the list? >>> thanks--MAJ >>> >>> ----- Original Message ----- >>> From: Emanuele Osimo >>> To: Mark A. Jensen >>> Cc: perl bioperl ml >>> Sent: Thursday, July 23, 2009 7:24 PM >>> Subject: Re: [Bioperl-l] Getting genomic coordinates for a list of >>> genes >>> Hello everyone. >>> Today I discovered that the coupling of the two subs that Mark >>> posted >>> doesn't get the right results. I think this is because one gets the >>> coordinates with RefSeq build 36.3, the other with build 37. >>> I found that coupling the first sub, genome_coords, with the >>> Bio::EnsEMBL::Registry fetch by region API is a lot better, and it >>> actually >>> generates sequences that contain the genes. >>> Bye >>> Emanuele >>> >>> P.S. >>> Thanks a lot to Mark!! >>> >>> >>> On Thu, Jul 23, 2009 at 16:16, Mark A. Jensen >>> wrote: >>>> >>>> Sorry, went off-list for a couple cycles. 
The final product will >>>> get the >>>> correct chromosomal coordinates and then return the sequence from >>>> the current build, based on a geneID input. See >>>> http://www.bioperl.org/wiki/Human_genomic_coordinates_and_sequence >>>> for the results. >>>> cheers MAJ >>>> ----- Original Message ----- From: "Emanuele Osimo" >>> > >>>> To: "perl bioperl ml" >>>> Sent: Friday, July 17, 2009 8:49 AM >>>> Subject: [Bioperl-l] Getting genomic coordinates for a list of >>>> genes >>>> >>>> >>>>> Hello everyone, >>>>> I'm new to programming, I'm a biologist, so please forgive my >>>>> ignorance, >>>>> but >>>>> I've been trying this for 2 weeks, now I have to ask you. >>>>> I'm trying the script I found at >>>>> >>>>> >> http://bio.perl.org/wiki/ >> HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::Entrez >> Gene_to_get_genomic_coordinates >>>>> because I need to have some variables (like $from and $to) >>>>> assigned to >>>>> the >>>>> start and end of a gene. >>>>> The script works fine, but gives me the wrong coordinates: for >>>>> example if >>>>> I >>>>> try it with the gene 842 (CASP9), it prints: >>>>> NT_004610.19 2498878 2530877 >>>>> >>>>> I found out that in Entrez, for each gene (for CASP9, for >>>>> example, at >>>>> >>>>> >> http://www.ncbi.nlm.nih.gov/gene/842?ordinalpos=1&itool=EntrezSystem2.PEntrez >> . >> Gene.Gene_ResultsPanel.Gene_RVDocSum#refseq >>>>> ) under "Genome Reference Consortium Human Build 37 (GRCh37), >>>>> Primary_Assembly" there are two different sets of coordinates. >>>>> The first >>>>> is >>>>> called "NC_000001.10 Genome Reference Consortium Human Build 37 >>>>> (GRCh37), >>>>> Primary_Assembly", and is the one I need, and the second one is >>>>> called >>>>> just >>>>> "NT_004610.19" and it's the one that the script prints. >>>>> This is valid for all the genes I tried. >>>>> >>>>> DO you know how to make the script print the "right" coordinates >>>>> (at >>>>> least, >>>>> the one I need)? 
>>>>> Thanks a lot in advance, >>>>> Emanuele >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > = > ====================================================================== > Attention: The information contained in this message and/or > attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or > privileged > material. Any review, retransmission, dissemination or other use of, > or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by > AgResearch > Limited. If you have received this message in error, please notify the > sender immediately. > = > ====================================================================== > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Russell.Smithies at agresearch.co.nz Thu Jul 23 23:15:09 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 24 Jul 2009 15:15:09 +1200 Subject: [Bioperl-l] Getting genomic coordinates for a list of genes AND WUBlast In-Reply-To: <36042A85-6219-465C-8850-A8FB89029504@illinois.edu> References: <2ac05d0f0907231748l6b0fd53cl6c9c435688b89b73@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB0A1144@exchsth.agresearch.co.nz> <36042A85-6219-465C-8850-A8FB89029504@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB0A1197@exchsth.agresearch.co.nz> Who's behind AdvBioComp? 
I saw Warren Gish's name on a copyright notice at the bottom of one of
their pages but I'm not sure if it's just a left-over WU-Blast page.

--Russell

From maj at fortinbras.us Thu Jul 23 23:20:09 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 23 Jul 2009 23:20:09 -0400
Subject: [Bioperl-l] Getting genomic coordinates for a list of genes ANDWUBlast
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB0A1197@exchsth.agresearch.co.nz>
References: <2ac05d0f0907231748l6b0fd53cl6c9c435688b89b73@mail.gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB0A1144@exchsth.agresearch.co.nz> <36042A85-6219-465C-8850-A8FB89029504@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF32AAB0A1197@exchsth.agresearch.co.nz>
Message-ID: <8275C6C4D955406D95DB6C0AB54C92EF@NewLife>

Himself, I believe- some previous discussion at
http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030075.html

From biopython at maubp.freeserve.co.uk Fri Jul 24 05:28:44 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 24 Jul 2009 10:28:44 +0100
Subject: [Bioperl-l] Next-gen modules
In-Reply-To: <5128A289-377E-4EC3-9030-E0E91B463EA1@illinois.edu>
References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com> <5128A289-377E-4EC3-9030-E0E91B463EA1@illinois.edu>
Message-ID: <320fb6e00907240228u797c316ds94297f349d8c097c@mail.gmail.com>

On Thu, Jul 23, 2009 at 11:58 PM, Chris Fields wrote:
>> i.e. Something like this four line Biopython script would be perfect:
>> http://biopython.org/wiki/Reading_from_unix_pipes
>
> We use named parameters so it's a little more verbose.
>
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'fastq-sanger');
> my $out = Bio::SeqIO->new(-format => 'fastq-solexa');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) }

Thanks. So that implicitly uses STDOUT for the output?

> Don't be surprised if there are still bugs lurking about, just let me know
> and I'll fix 'em.
Have you guys (BioPerl) also gone for "fastq-sanger" instead of just
"fastq" for the Sanger Standard version of FASTQ (like EMBOSS)?
Does BioPerl use just "fastq" to mean anything?

If BioPerl and EMBOSS are using "fastq-sanger", I think Biopython will
have to support that as an alias too:
http://lists.open-bio.org/pipermail/biopython-dev/2009-July/006416.html

Thanks,

Peter

From bernd.web at gmail.com Fri Jul 24 08:12:56 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 24 Jul 2009 14:12:56 +0200
Subject: [Bioperl-l] genbank (blast) alignments
In-Reply-To: <4B791FED-2A45-4D05-BBEA-2DFFB96F54E2@illinois.edu>
References: <06C35F8D-1EE5-4882-8BF4-111311FBEEC4@ohsu.edu> <4A68BC36.7080006@cornell.edu> <4B791FED-2A45-4D05-BBEA-2DFFB96F54E2@illinois.edu>
Message-ID: <716af09c0907240512v10e49cecxb7197c53469bb21@mail.gmail.com>

Hi,

Although this does not refer to the original query/new alignment format
for blast 2.2.21, the BLAST -m 6 format (query-anchored) is relatively
easily transformed to a format that Bio::AlignIO can read, as Chris
suggests: this alignment format can be parsed as a Clustal alignment by
prepending a CLUSTAL header and removing the start positions from the
sequences (or as a SELEX alignment).

Bernd

On Fri, Jul 24, 2009 at 12:53 AM, Chris Fields wrote:
> Lots of emails to answer, so little time. Doesn't help when my VPN goes out
> either ;>
>
> What you want appears to be generating a multiple alignment from pairwise
> alignment. The answer is 'very likely not'. However, the local BLAST
> executable does have several options for generating alignments from HSP data
> (assuming that's what you mean):
>
> -m : alignment view options:
>   0 = pairwise,
>   1 = query-anchored showing identities,
>   2 = query-anchored no identities,
>   3 = flat query-anchored, show identities,
>   4 = flat query-anchored, no identities,
>   5 = query-anchored no identities and blunt ends,
>   6 = flat query-anchored, no identities and blunt ends,
>   7 = XML Blast output,
>   8 = tabular,
>   9 tabular with comment lines [Integer]
>   default = 0
>
> You can set this by reformatting on the BLAST web site (here's a chunk of
> the output, note the query):
>
> Query         61  PVTVGEIDITLYRDDLS-KKTSND-E--PLVKGADIPVDIT-------DQKVILVDDVLY  109
> NP_389430     61  PVTVGEIDITLYRDDLS-KKTSND-E--PLVKGADIPVDIT-------DQKVILVDDVLY  109
> YP_001421124  61  PVTVGEIDITLYRDDLT-KKTSNE-E--PLVKGADIPADIT-------DQKVIVVDDVLY  109
> YP_078940     63  KVTVGELDITLYRDDLS-KKTSNK-E--PLVKGADIPADIT-------DQKVILVDDVLY  111
> ZP_03053294   61  PVIVGELDITLYRDDLT-KKTENQ-D--PLVKGADIPADIN-------DKTLIVVDDVLF  109
> YP_001486689  61  PVIVGELDITLYRDDLT-KKTDNQ-D--PLVKGADIPADIN-------DKTLIVVDDVLF  109
> YP_002949168  60  AVPVGELDITLYRDDLT-VKTIDH-E--PLVKGTDVPFDVT-------NKKVILVDDVLF  108
> ZP_01860800   61  KMPVGEIDITLYRDDLT-VKTANE-E--PEVKGSDLPVDVT-------DKKVILIDDVLF  109
> ZP_04121773   61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ZP_04218628   61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> YP_002316154  66  SIPVGELDITLYRDDLT-VKTDDR-E--PLVKGTDVPFSVT-------NQKVILVDDVLF  114
> ZP_00240953   61  EMEVGELDITLYRDDLT-LQSKNE-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> YP_037953     61  EIEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ZP_04193166   61  KMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> NP_833611     61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ZP_03018932   61  EMEVGELDITLYRDDLT-LQSKNK-E--PLVKGSDIPVDIT-------KKKVILVDDVLY  109
> ...
>
> We do not have a parser for that format, BTW, but it wouldn't be too hard to
> get something working quickly based on one of the current parsers. Probably
> could go AlignIO or SearchIO (or both).
>
> chris
>
> On Jul 23, 2009, at 2:38 PM, Robert Buels wrote:
>
>> Wow, that silence is deafening. I can't believe somebody who knows what
>> they're talking about hasn't written you back yet.
>>
>> Perhaps you could do some kind of transformation where you read in the
>> BLAST report with Bio::SearchIO, and then write to MSF with
>> Bio::AlignIO::msf? You would probably need to do some fiddling to create
>> the proper objects and relationships that Bio::AlignIO::msf would want.
>>
>> But this reply probably isn't helpful, because you probably already knew
>> that much. I'm mostly just trying to add to this thread so that people who
>> actually know a lot about BioPerl's functions in this area will see it and
>> hopefully be of more help.
>>
>> Rob
>>
>> --
>> Robert Buels
>> Bioinformatics Analyst, Sol Genomics Network
>> Boyce Thompson Institute for Plant Research
>> Tower Rd
>> Ithaca, NY 14853
>> Tel: 503-889-8539
>> rmb32 at cornell.edu
>> http://www.sgn.cornell.edu
>>
>> Thomas Keller wrote:
>>>
>>> Greetings,
>>> Blast 2.2.21 has a multi-sequence alignment feature that is really handy:
>>> put in the accession number of the refseq in one sequence field and a
>>> concatenated fasta file of the Sanger reads to align in the second box and
>>> it does the alignments. Unfortunately, the output is a series of alignments
>>> rather than the more useful msf format with all reads aligned with the
>>> reference.
>>> Is there a bioperl module that reads the blast alignments and converts it
>>> to an msf alignment?
>>> thanks,
>>> Tom
>>> kellert at ohsu.edu
>>> 503-494-2442

From bernd.web at gmail.com Fri Jul 24 08:19:00 2009
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 24 Jul 2009 14:19:00 +0200
Subject: [Bioperl-l] bioperl reorganization
In-Reply-To: <2379556E-937B-4BAC-9BA4-6C0092AD804B@berkeleybop.org>
References: <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> <4A60EBB5.4010004@cornell.edu> <4A60FFF8.3030302@jays.net> <66FDE248-4CF8-4F68-91D5-16D0AE30B36E@illinois.edu> <91389D4D-B46C-49BA-9D5D-04DD82014B1C@jays.net> <025907D4D2344FDC90E915E605B7FEB8@NewLife> <2379556E-937B-4BAC-9BA4-6C0092AD804B@berkeleybop.org>
Message-ID: <716af09c0907240519tdba21fcjaddcdceeeef91bc2@mail.gmail.com>

Hi,

I realize splitting BioPerl into smaller packages may be nice (or not)
depending on where/how you use/develop bioperl.

Sendu wrote:
"But while BioPerl is still monolithic, how will people be able to
choose which external dependencies they want to install? That's the
question that must be resolved before getting rid of Bio::Root::Build.
You'd also need to resolve the network tests issue. And, well, I guess
all the other issues that Bio::Root::Build solves."

Actually, I and many students I worked with really like the monolithic
form of BioPerl. No fuss in choosing what you want and finding out
later you need more. Simply install everything (which was possibly
slightly easier with the old perl Make files).
Students that worked on several computers even just saved the entire bioperl distro on their USB stick, and easily could use and program with BioPerl on any PC. >From the user's perpective, we really liked to have just one package. I realize that this may be not as nice for the core developers. Regards, Bernd From cjfields at illinois.edu Fri Jul 24 08:19:42 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Jul 2009 07:19:42 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00907240228u797c316ds94297f349d8c097c@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <92C15E3391F64BAF801754E924122540@NewLife> <200906170927.13273.tristan.lefebure@gmail.com> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com> <5128A289-377E-4EC3-9030-E0E91B463EA1@illinois.edu> <320fb6e00907240228u797c316ds94297f349d8c097c@mail.gmail.com> Message-ID: <222FBEA4-37CD-4619-9BBD-AB502CF85AD5@illinois.edu> On Jul 24, 2009, at 4:28 AM, Peter wrote: > On Thu, Jul 23, 2009 at 11:58 PM, Chris > Fields wrote: >>> i.e. Something like this four line Biopython script would be >>> perfect: >>> http://biopython.org/wiki/Reading_from_unix_pipes >> >> We use named parameters so it's a little more verbose. >> >> use Bio::SeqIO; >> my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'fastq-sanger'); >> my $out = Bio::SeqIO->new(-format => 'fastq-solexa'); >> while (my $seq = $in->next_seq) { $out->write_seq($seq) } > > Thanks. So that implicitly uses STDOUT for the output? Yes. >> Don't be surprised if there are still bugs lurking about, just let >> me know >> and I'll fix 'em. > > Have you guys (BioPerl) have also gone for "fastq-sanger" instead of > just "fastq" for the Sanger Standard version of FASTQ (like EMBOSS)? 
> Does BioPerl use just "fastq" to mean anything? Short answer: yes, and yes. Slightly longer answer: I've set up SeqIO so it converts "new(-format => 'foo-bar')" to new(-format => 'foo', -variant => 'bar'). In the fastq constructor, if the variant is expected but isn't defined (i.e. for 'fastq') it defaults to sanger. Makes it a bit easier maintenance-wise if a new variant pops up. > If BioPerl and EMBOSS are using "fastq-sanger", I think Biopython will > have to support that as an alias too: > http://lists.open-bio.org/pipermail/biopython-dev/2009-July/006416.html > > Thanks, > > Peter It's consistent with the 'format-variant' usage, but 'fastq' for us is backwards-compatible, so we'll likely support both. chris From biopython at maubp.freeserve.co.uk Fri Jul 24 09:00:23 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 24 Jul 2009 14:00:23 +0100 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <222FBEA4-37CD-4619-9BBD-AB502CF85AD5@illinois.edu> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com> <5128A289-377E-4EC3-9030-E0E91B463EA1@illinois.edu> <320fb6e00907240228u797c316ds94297f349d8c097c@mail.gmail.com> <222FBEA4-37CD-4619-9BBD-AB502CF85AD5@illinois.edu> Message-ID: <320fb6e00907240600p7cc41b37wc7c0f748160f109@mail.gmail.com> Hi all, On Fri, Jul 24, 2009 at 1:19 PM, Chris Fields wrote: >> >> Have you guys (BioPerl) also gone for "fastq-sanger" instead of >> just "fastq" for the Sanger Standard version of FASTQ (like EMBOSS)? >> Does BioPerl use just "fastq" to mean anything? > > Short answer: yes, and yes. > > Slightly longer answer: I've set up SeqIO so it converts "new(-format => > 'foo-bar')" to new(-format => 'foo', -variant => 'bar').
In the fastq > constructor, if the variant is expected but isn't defined (i.e. for 'fastq') > it defaults to sanger. Makes it a bit easier maintenance-wise if a new > variant pops up. Right, so BioPerl understands "fastq" and "fastq-sanger" to mean the Sanger standard FASTQ files. I've just updated Biopython to also allow "fastq-sanger" as an alias for "fastq", so we are consistent here: http://lists.open-bio.org/pipermail/biopython-dev/2009-July/006466.html Biopython, BioPerl and EMBOSS now all agree on the format names: * "fastq-sanger" - PHRED scores, ASCII offset 33 * "fastq-solexa" - Solexa scores, ASCII offset 64 * "fastq-illumina" - PHRED scores, ASCII offset 64 And Biopython and BioPerl also agree on the meaning of "fastq" as an alias for "fastq-sanger". Unfortunately EMBOSS differs here, see: http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000599.html Do BioJava or BioRuby have a SeqIO equivalent where they need to give different sequence formats unique names? If so, we should talk to them soon... Peter From IRytsareva at dow.com Mon Jul 20 11:02:19 2009 From: IRytsareva at dow.com (Rytsareva, Inna (I)) Date: Mon, 20 Jul 2009 11:02:19 -0400 Subject: [Bioperl-l] Adaptor Message-ID: <3C9BDF0E91897443AD3C8B34CA8BDCA8020CFE25@USMDLMDOWX028.dow.com> Hello, To get GBrowse to use my custom database (MySQL, not GFF, not Chado), I would need to write my own adaptor. I started by looking at existing adaptors: the Chado and BioSQL adaptors. And I'm looking for any doc on how to write a GBrowse adaptor. Does anybody have that experience? How can I get started?
Thanks, Inna Rytsareva Discovery Information Management Dow AgroSciences Indianapolis, IN 317-337-4716 From karthik085 at gmail.com Tue Jul 21 11:07:17 2009 From: karthik085 at gmail.com (Rajasekar Karthik) Date: Tue, 21 Jul 2009 11:07:17 -0400 Subject: [Bioperl-l] Bioperl Entrez Esearch In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz> Message-ID: Russell / Others, The utilities keep printing out warnings and errors. Is there any way to a) either not print at all b) or send them to some other log file other than apache's error.log Thanks. On Wed, Jul 15, 2009 at 5:34 PM, Rajasekar Karthik wrote: > that helps - thanks!!! > > > On Tue, Jul 14, 2009 at 6:33 PM, Smithies, Russell < > Russell.Smithies at agresearch.co.nz> wrote: > >> You sure can. >> Take a look at http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook >> >> >> --Russell >> >> > -----Original Message----- >> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> > bounces at lists.open-bio.org] On Behalf Of Rajasekar Karthik >> > Sent: Wednesday, 15 July 2009 10:23 a.m. >> > To: bioperl-l at lists.open-bio.org >> > Subject: [Bioperl-l] Bioperl Entrez Esearch >> > >> > Hi, >> > I an new to Bioperl. How can I do an Entrez Esearch using Bioperl? >> > >> > For example, I want to do an exact title search in pubmed >> > Title: Guidelines for quantitative rt-PCR >> > >> > Using HTTP Get, I would do something like this >> > URL: >> > >> http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&field=titl&te >> > rm=Guidelines%20for%20quantitative%20rt-PCR >> > to get the response XML. >> > >> > How can I use Bioperl to do the above action? >> > >> > Thanks. 
>> > >> > -- >> > Best Regards, >> > Rajasekar Karthik >> > karthik085 at gmail.com >> > > > -- > Best Regards, > Rajasekar Karthik > karthik085 at gmail.com > -- Best Regards, Rajasekar Karthik karthik085 at gmail.com From gmodhelp at googlemail.com Thu Jul 23 12:58:54 2009 From: gmodhelp at googlemail.com (Dave Clements, GMOD Help Desk) Date: Thu, 23 Jul 2009 09:58:54 -0700 Subject: [Bioperl-l] Fwd: [Gmod-help] Problem on using berkeleydb In-Reply-To: <4A5ED366.6010308@kuicr.kyoto-u.ac.jp> References: <4A5ED366.6010308@kuicr.kyoto-u.ac.jp> Message-ID: <71ee57c70907230958h1ad7dbf7s35352d67aae6c78@mail.gmail.com> Hi Kazushi, I can't find this bug reported anywhere else. Are you using GFF2 (Bio::DB::GFF) or GFF3 (Bio::DB::SeqFeature::Store)? The Berkeleydb adaptor is part of BioPerl. I have forwarded your question to the bioperl list (and to the GBrowse list as well), Thanks, Dave C. 2009/7/16 Kazushi Hiranuka > Hi, > > I'm currently setting up Gbrowse for mapping and viewing full-length > cDNA data, and most of the features work fine.
> One problem is that I tried to use "draw_target" configuration as in > 2.8.1 of Tutorial but Gbrowse didn't show the multiple alignment at high > magnification. Without any other changes, however, just switching > database adaptor setting from berkeleydb to on-memory solved the problem > and displayed sequence alignments correctly as well as "show_mismatch" > feature. > > Is there any solution reported on this problem? Will it work correctly > in mySQL as well? > Since I don't like non-root restriction of mySQL in the server and don't > want to use on-memory adaptor, it will be nice for me to keep using > berkeleydb. > > Thank you for your time, > > Kazushi Hiranuka > -- * Register now for the August GMOD Meeting: http://gmod.org/wiki/August_2009_GMOD_Meeting * Please keep responses on the list! * Was this helpful? Let us know at http://gmod.org/wiki/Help_Desk_Feedback From lincoln.stein at gmail.com Fri Jul 24 09:28:15 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Fri, 24 Jul 2009 09:28:15 -0400 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> Message-ID: <6dce9a0b0907240628k5d60542bi81157801478cfaa9@mail.gmail.com> Sorry I'm joining this thread so late, but I've been taking a break from development work. Since the Bio::Graphics split is being used as an exemplar, I'd like to share with you how the process went. Overall, it was pretty painless. The main issue that I encountered was that there was a high-performance Bio::SeqFeatureI object called Bio::Graphics::Feature that was used both by Bio::Graphics and by Bio::DB::SeqFeature. This caused a cross-dependency between Bio::Graphics and BioPerl.
When I realized this problem, I proposed to the mailing list to create a replacement for Bio::Graphics::Feature called Bio::SeqFeature::Lite that would live in the BioPerl distribution. When this ideas was OK'd, I replaced the Bio::Graphics::Feature module in Bio::Graphics with a shell class that inherited from Bio::SeqFeature::Lite. With this dependency removed, I was able to lift Bio/Graphics out of BioPerl and put it in its own repository. Creating the Build.PL and regression tests were then very straightforward. I think the whole process took about six hours, spread across two days. It made my life a whole lot easier to be able to release new versions of Bio::Graphics independent of the BioPerl distribution. I think many (but not all) of the BioPerl modules could be handled this way, and that the easiest way to deal with this is to schedule to extract them singly in a careful, step-by-step fashion rather than to try to reorganize everything all at once. Lincoln On Fri, Jul 17, 2009 at 1:01 PM, Jason Stajich wrote: > Will try to weigh in more, a little bit of stream of consciousness to let > you know I'm thinking about it. Tough summer to focus much on this. > > It's too bad we are apparently the laughing stock of Perl gurus, but it > would be great to see how to modernize aspects of the development. > > I'm curious how it will work that we'll have dozens of separate distros > that we'll have a hard time keeping track of what directory things are in? > Will there have to be a master list of what version and what modules are in > what distro now? > > When I do a SVN (or git) checkout do I need to checkout each of these in > its own directory? Or will there be a master packaging script that makes > the necessary zip files for CPAN submission? If they are in separate > directories are we organizing by conceptual topic (phylogenetics, alignment, > database search) or by namespace of the modules? 
Do all the 'database' > modules live together - probably not - so do we name bioperl-db-remote > bioperl-db-local-index, bioperl-db-local-sql, etc? really bioperl-db is > somewhat focused on sequences and features, but what about things that > integrate multiple data types - like biosql? > > If they are in separate directories, what about all the test data that > might be shared, is this replicated among all the sub-directories - how do > we do a good job keeping that up to date, could we have a test-data distro > instead with symlinks within SVN? > > For some other obvious modules that can be split off and self-contained, > each of these could be a package. I would estimate more than 20 packages > depending on how Bio::Tools are carved up. > - I think Bio::DB::SeqFeature needs to be split off for sure this is a nice > logical peeling off. Could be another test case since it is a Gbrowse > dependancy. > - Bio::DB::GFF as well for the same reasons. > - Bio::PopGen - self contained for the most part, but depends on Bio::Tree > and Bio::Align objects > - Bio::Variation > - Bio::Map and Bio::MapIO > - Bio::Cluster and Bio::ClusterIO > - Bio::Assembly > - Bio::Coordinate > > My nightmare is that we're going to have to manage a lot of 'use XX 1.01' > enforcing version requiring when dealing with the dependancies on the > interface classes and having to keep these all up to date? The version was > implicit when they are all part of the same big distro. > > Also the splits need not only include one namespace if need be I guess but > we have generally grouped things by namespace. > > What do you want to do about the bioperl-run. Do we make a set of parallel > splits from all of these? I think at the outset we need to coordinate the > applications supported here in some sort of loose ontology - the namespaces > were not consistently applied so we have some alignment tools in different > directories, etc. So the namespace sort of classifies them but it could be > better. 
One of the challenges of multiple developers without a totally > shared vision on how it should be done. > > I'm not convinced that the Bio::Graphics splitoff has been painless so we > should take stock of how that is working. > > It seems like this split off would be a way to better streamline things in > bioperl so that modern versions of bioperl might be able to better interface > with things like Ensembl again too. > > How much of this effort is worth triaging on the current code versus the > efforts we want to make on a cleaner, simpler bioperl system that appears to > scare so many users (and potential developers) off. > > Okay I rambled, hope that was helpful. > > -jason > -- > Jason Stajich > jason at bioperl.org > > On Jul 17, 2009, at 2:08 AM, Robert Buels wrote: > > Chris Fields wrote: >> >>> Yes, I agree. However a large set of modules in bioperl were effectively >>> donated by the author, so they will fall to the core devs to maintain by >>> sheer property of legacy. >>> >> >> This is a very sticky point. The only way I can think of would be to have >> each distro have a "principal maintainer", that is the go-to guy for issues >> related to keeping it running, but can beg and cajole others to help. At >> least there will be fewer problems per distribution, since they would be >> smaller. If a maintainer has to stop, he has to find somebody else to do >> it, or the package sits there and bit rot sets in. That's just how it goes. >> If it's important enough (like if it's depended on by a dist that IS >> maintained), somebody will pick it up. >> >> On bugs: >>> >> >> >>> On API and the 'chicken-or-egg' issue: >>> >> >> >>> What I would like is have the various breakaway Bio::* either fall back >>> to Module::Build if Bio::Root::Build isn't present, or just use >>> Module::Build. My suggestion is to just use Module::Build directly, but we >>> could scale down Bio::Root::Build to respect the Module::Build API (thus >>> allowing it as a fallback). 
>>> >> I'm not sure about this, I'm not an expert on the ins and outs of >> subclassing Module::Build. >> >> One idea I do have, however, is that we might think about using an xt/ >> directory for intensive and network-based tests that are not meant to be run >> by automated installers, which could help simplify the test and build code. >> I've heard that this is a pretty common practice in other projects. >> >> ===================== >> >> Anyway, let's develop some concrete plans. I would say that the plan at >> http://www.bioperl.org/wiki/Proposed_core_modules_changes is a >> half-measure, in light of the successful (painless?) Bio::Graphics >> extraction. >> >> Here's a new proposal: >> >> 1.) renew/construct the Bundle/Task::Bioperl, get it pulling in all the >> current Bioperl modules as dependencies (or however it works) >> >> 2.) start repeating the same extraction procedure used with Bio::Graphics: >> * identify a candidate set of modules in bioperl-live to be extracted >> into their own distribution, propose the extraction on the mailing list, get >> some kind of agreement >> * make a new component in the svn repository (alongside the bioperl-live >> and other dirs) named something like Bio-Something-Something, with trunk/, >> branches/, and tags/ subdirs. >> * svn cp modules into the new trunk/lib/, tests into trunk/t, scripts >> into trunk/scripts, and write a Build.PL just like the one Lincoln wrote for >> Bio::Graphics. >> * when the extracted copy looks good, use svn merge to port any changes >> that happened in trunk to the new extracted modules if necessary and test. >> * delete the old copy from bioperl-live/trunk. >> * identify a new candidate set of modules, propose on the mailing list, >> and repeat >> >> 2.5) continue releasing 1.6.X bugfix releases while this is going on. >> >> 3.) 
when bioperl-live is down to a truly reasonable core set, (fewer than >> 10 modules might be a good target), rename it to Bio-Perl-Core, go through a >> round of testing, and push them all to CPAN at once. Task::BioPerl will have >> dependencies on the module names, I think, so it will continue to install >> the same from users' perspectives, it will just be downloading different >> dists. >> >> 4.) repeat steps 1-3 with bioperl-run, and maybe others. >> >> Thoughts? If people like it, I or somebody else could put it on the wiki. >> >> And of course, I volunteer to put in a lot of work on this. I'll try to >> see if I can identify some other likely extraction candidates as a >> preliminary step and report back to the list. >> >> Also we need some more people besides just me and Chris talking and >> thinking about this, these are large reshufflings being proposed. >> >> Rob >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > Jason Stajich > jason at bioperl.org > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Lincoln D. 
Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From lincoln.stein at gmail.com Fri Jul 24 09:31:11 2009 From: lincoln.stein at gmail.com (Lincoln Stein) Date: Fri, 24 Jul 2009 09:31:11 -0400 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <2182D83B-D855-48B5-B57B-52F1D0FC78B6@gmail.com> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <9C987542-1B90-4462-9DE9-F88007579ACA@illinois.edu> <2182D83B-D855-48B5-B57B-52F1D0FC78B6@gmail.com> Message-ID: <6dce9a0b0907240631n48ce47bela1c3446a8c8c3d93@mail.gmail.com> My preference would be to split both Bio::DB::SeqFeature and Bio::DB::GFF into their own module. I don't think they depend on each other, but I'm not 100% sure! Lincoln On Sat, Jul 18, 2009 at 8:23 AM, Scott Cain wrote: > Hi All, > > I don't want to wade in too deeply, but I like the idea of splitting things > up. I think the Bio::Graphics split has gone well and has made life easier > in GBrowse world. I could see Bio::DB::SeqFeature and Bio::DB::GFF being > split and either being kept together or going there separate ways (though I > have a nagging suspicion that SeqFeature code depends on GFF code in a few > places, so it may make sense to just keep them together. > > And Chris, if it makes you feel any better, I don't think anything you've > done or not done has held up GBrowse2. > > Scott > > > > On Jul 17, 2009, at 11:14 PM, Chris Fields wrote: > > My 2c... >> >> On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote: >> >> Will try to weigh in more, a little bit of stream of consciousness to let >>> you know I'm thinking about it. Tough summer to focus much on this. >>> >> >> Yes, for me as well. 
That will change soon (approx two weeks) ;> >> >> It's too bad we are apparently the laughing stock of Perl gurus, but it >>> would be great to see how to modernize aspects of the development. >>> >>> I'm curious how it will work that we'll have dozens of separate distros >>> that we'll have a hard time keeping track of what directory things are in? >>> Will there have to be a master list of what version and what modules are in >>> what distro now? >>> >> >> I don't think we're a laughingstock as much as we haven't had the time to >> dedicate towards this (and much of this occurred at a point early on, with >> that whole 'Cathedral and Bazaar' esr-based thingy). BTW,, those same gurus >> shouldn't speak: perl core is just as bad and riddled with worse bugs, >> though rgs and co. wouldn't admit it. >> >> In fact, base.pm itself has a nasty one; I'm surprised no one in the >> bioperl community has noticed it yet (it's listed as a bug on RT I think): >> >> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print >> $Bio::SeqIO::VERSION."\n"' >> 1.0069 >> pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print >> $Bio::Root::IO::VERSION."\n"' >> -1, set by base.pm >> >> Imported modules do not have VERSION set correctly when it is exported. >> This hasn't become an issue in bioperl yet (it's really an edge case), but >> several devs have run into this. And really, why set VERSION to a string >> like '-1, set by base.pm'? >> >> Anyway, re: versioning, the way I think about it, if we have a small very >> stable core with version X, and a focused very stable module group with >> version Y, other distributions would have a separate version and require >> subgroup version Y (which would in turn require core version X). CPAN would >> take care of it. This isn't much different than what occurs everyday on >> CPAN anyway (Jay's Catalyst, Moose and MooseX, and so on). In fact, several >> Moose-requiring distributions don't require the latest Moose. 
>> >> When I do a SVN (or git) checkout do I need to checkout each of these in >>> its own directory? Or will there be a master packaging script that makes >>> the necessary zip files for CPAN submission? >>> >> >> Not sure; that would be up to us I suppose. I think it would be easier to >> maintain and release if they were separate or packaged up as Jay suggests. >> >> If they are in separate directories are we organizing by conceptual topic >>> (phylogenetics, alignment, database search) or by namespace of the modules? >>> >> >> By topic, retaining namespaces. We have a basic Bio::* directory >> structure already in place for various generic terms (Tools, DB, etc), so I >> see this crossing simple namespaces very easily. And as I pointed out to >> Robert, several of those could possibly go together. >> >> Do all the 'database' modules live together - probably not - so do we >>> name bioperl-db-remote bioperl-db-local-index, bioperl-db-local-sql, etc? >>> really bioperl-db is somewhat focused on sequences and features, but what >>> about things that integrate multiple data types - like biosql? >>> >> >> I don't see bioperl-db (BioSQL) being split up. I think it's too >> intrinsically linked and cohesive (it's almost a separate core unto itself), >> so it would be counterproductive to do so. >> >> Maybe have bioperl-db become bioperl-biosql. Web-based = >> bioperl-remotedb. Local = bioperl-localdb. OBDA = bioperl-obda. >> >> If they are in separate directories, what about all the test data that >>> might be shared, is this replicated among all the sub-directories - how do >>> we do a good job keeping that up to date, could we have a test-data distro >>> instead with symlinks within SVN? >>> >> >> We have to see how much is actually shared and proceed from there. I >> would like to eventually resurrect the idea of a separate biodata repo that >> we could just ftp the data from as needed. 
That would cut down on the >> package size quite a bit, but I'm not sure how feasible that is from the >> testing point of view (would we have to skip all tests if there were no >> network access)? >> >> For some other obvious modules that can be split off and self-contained, >>> each of these could be a package. I would estimate more than 20 packages >>> depending on how Bio::Tools are carved up. >>> - I think Bio::DB::SeqFeature needs to be split off for sure this is a >>> nice logical peeling off. Could be another test case since it is a Gbrowse >>> dependancy >>> - Bio::DB::GFF as well for the same reasons. >>> >> >> Completely agree (and I think Lincoln would like this as well). >> >> - Bio::PopGen - self contained for the most part, but depends on >>> Bio::Tree and Bio::Align objects >>> >> >> Could list those as a required dependency. >> >> - Bio::Variation >>> - Bio::Map and Bio::MapIO >>> - Bio::Cluster and Bio::ClusterIO >>> - Bio::Assembly >>> - Bio::Coordinate >>> >>> My nightmare is that we're going to have to manage a lot of 'use XX 1.01' >>> enforcing version requiring when dealing with the dependancies on the >>> interface classes and having to keep these all up to date? The version was >>> implicit when they are all part of the same big distro. >>> >> >> Right. But it also becomes a maintenance problem when serious bugs in one >> module impede the needed release of others to CPAN. >> >> Also the splits need not only include one namespace if need be I guess >>> but we have generally grouped things by namespace. >>> >>> What do you want to do about the bioperl-run. Do we make a set of >>> parallel splits from all of these? I think at the outset we need to >>> coordinate the applications supported here in some sort of loose ontology - >>> the namespaces were not consistently applied so we have some alignment tools >>> in different directories, etc. So the namespace sort of classifies them but >>> it could be better. 
One of the challenges of multiple developers without a >>> totally shared vision on how it should be done. >>> >> >> We could split bp-run and Tools, pairing the wrappers with the relevant >> parsers modules. Not sure if this can be done with SearchIO as well but it >> could be tested to see how feasible that would be. >> >> I'm not convinced that the Bio::Graphics splitoff has been painless so we >>> should take stock of how that is working. >>> >> >> Really? Lincoln has made several fixes lately on CPAN, so I thought >> everything was going well. If anything I would think the lack of additional >> 1.6.x bioperl releases has probably held Gbrowse 2.0 up more due to >> Bio::DB::SeqFeature (my fault, but as you know life and job take precedence >> sometimes). >> >> It seems like this split off would be a way to better streamline things >>> in bioperl so that modern versions of bioperl might be able to better >>> interface with things like Ensembl again too. >>> >>> How much of this effort is worth triaging on the current code versus the >>> efforts we want to make on a cleaner, simpler bioperl system that appears to >>> scare so many users (and potential developers) off. >>> >> >> I say triage away on a branch, but we need to indicate which ones to >> whittle out first. The reason I believe we went for a larger split >> initially (as indicated on the wiki page) was to push something forward and >> not get too bogged down in the details. But we may as well go full throttle >> and do this right away. >> >> Okay I rambled, hope that was helpful. >>> >>> -jason >>> -- >>> Jason Stajich >>> jason at bioperl.org >>> >> >> Very, very helpful. Now I need a beer. >> >> chris >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. 
scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > -- Lincoln D. Stein Director, Informatics and Biocomputing Platform Ontario Institute for Cancer Research 101 College St., Suite 800 Toronto, ON, Canada M5G0A3 416 673-8514 Assistant: Renata Musa From biopython at maubp.freeserve.co.uk Fri Jul 24 09:32:49 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 24 Jul 2009 14:32:49 +0100 Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS Message-ID: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> Hi all, Peter Rice kindly said he will look into an OBF cross project mailing list, but in the meantime this has been cross posted to the Biopython, BioPerl, and EMBOSS development lists. On Thu, Jul 23, 2009 at 11:58 PM, Chris Fields wrote: >> I'd like to get comparisons against BioPerl's new FASTQ support >> going too. To do this I'd need to know which (branch?) of BioPerl I >> should install, and I'd also like a trivial sample BioPerl script to do >> piped FASTQ conversion. i.e. read a FASTQ file from stdin (say >> as "fastq-solexa"), and output it to stdout (say as "fastq" meaning >> the Sanger Standard FASTQ). > > You would have to install svn (bioperl-live) if you want the refactored > fastq. That commit was within the last month. I've got SVN bioperl-live installed and apparently working :) >> i.e. Something like this four line Biopython script would be perfect: >> http://biopython.org/wiki/Reading_from_unix_pipes > > We use named parameters so it's a little more verbose.
> > use Bio::SeqIO; > my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'fastq-sanger'); > my $out = Bio::SeqIO->new(-format => 'fastq-solexa'); > while (my $seq = $in->next_seq) { $out->write_seq($seq) } > > Don't be surprised if there are still bugs lurking about, just let me know > and I'll fix 'em. I've got a bug report coming up in a second email, but the basics work :) e.g. Using this Sanger style FASTQ file, and converting it to Solexa style http://biopython.org/SRC/biopython/Tests/Quality/example.fastq $ more example.fastq @EAS54_6_R1_2_1_413_324 CCCTTCTTGTCTTCAGCGTTTCTCC + ;;3;;;;;;;;;;;;7;;;;;;;88 @EAS54_6_R1_2_1_540_792 TTGGCAGGCCAAGGCCGATGGATCA + ;;;;;;;;;;;7;;;;;-;;;3;83 @EAS54_6_R1_2_1_443_348 GTTGCTTCTGGCGTGGGTGGGGGGG + ;;;;;;;;;;;9;7;;.7;393333 This is a simple three-record FASTQ file (in the Sanger format). Using EMBOSS 6.1.0: $ seqret -filter -sformat fastq-sanger -osformat fastq-solexa < example.fastq @EAS54_6_R1_2_1_413_324 CCCTTCTTGTCTTCAGCGTTTCTCC +EAS54_6_R1_2_1_413_324 ZZRZZZZZZZZZZZZVZZZZZZZWW @EAS54_6_R1_2_1_540_792 TTGGCAGGCCAAGGCCGATGGATCA +EAS54_6_R1_2_1_540_792 ZZZZZZZZZZZVZZZZZLZZZRZWR @EAS54_6_R1_2_1_443_348 GTTGCTTCTGGCGTGGGTGGGGGGG +EAS54_6_R1_2_1_443_348 ZZZZZZZZZZZXZVZZMVZRXRRRR Using BioPerl: $ perl bioperl_sanger2solexa.pl < example.fastq @EAS54_6_R1_2_1_413_324 CCCTTCTTGTCTTCAGCGTTTCTCC +EAS54_6_R1_2_1_413_324 ZZRZZZZZZZZZZZZVZZZZZZZWW @EAS54_6_R1_2_1_540_792 TTGGCAGGCCAAGGCCGATGGATCA +EAS54_6_R1_2_1_540_792 ZZZZZZZZZZZVZZZZZLZZZRZWR @EAS54_6_R1_2_1_443_348 GTTGCTTCTGGCGTGGGTGGGGGGG +EAS54_6_R1_2_1_443_348 ZZZZZZZZZZZXZVZZMVZRXRRRR Using Biopython: $ python biopython_sanger2solexa.py < example.fastq @EAS54_6_R1_2_1_413_324 CCCTTCTTGTCTTCAGCGTTTCTCC + ZZRZZZZZZZZZZZZVZZZZZZZWW @EAS54_6_R1_2_1_540_792 TTGGCAGGCCAAGGCCGATGGATCA + ZZZZZZZZZZZVZZZZZLZZZRZWR @EAS54_6_R1_2_1_443_348 GTTGCTTCTGGCGTGGGTGGGGGGG + ZZZZZZZZZZZXZVZZMVZRXRRRR They all agree, except that Biopython has followed the MAQ convention of omitting the (optional)
repeat of the captions on the plus lines. This is something I'd already asked Peter Rice about for EMBOSS (but I think we got sidetracked): http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000577.html Peter From biopython at maubp.freeserve.co.uk Fri Jul 24 09:53:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 24 Jul 2009 14:53:40 +0100 Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS In-Reply-To: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> Message-ID: <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com> On Fri, Jul 24, 2009 at 2:32 PM, Peter wrote: >> >> Don't be surprised if there are still bugs lurking about, just let me >> know and I'll fix 'em. > > I've got a bug report coming up in a second email, but the basics work :) I think I have found a bug in BioPerl's conversion from fastq-solexa to fastq-sanger concerning lower quality scores. Here is an artificial Solexa file using the Solexa scores from 40 down to -5 (which I believe to be the full range expected from an instrument). $ more solexa_faked.fastq @slxa_0001_1_0001_01 ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN +slxa_0001_1_0001_01 hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<; A Solexa quality of 40 maps to ASCII 40+64 = 104, "h" A Solexa quality of -5 maps to ASCII -5+64 = 59, ";" You should find this example has Solexa scores 40, 39, .., -4, -5. This file is in the Biopython repository under biopython/Tests/Quality Here is the conversion using MAQ (with the chomp fix from Tim Yu to remove an extra "!" 
character, see the maq-help mailing list for 10 July 2009):
http://sourceforge.net/mailarchive/forum.php?thread_name=320fb6e00906170708lb2ce4f7qbc5dfa43543189a2%40mail.gmail.com&forum_name=maq-help

$ perl fq_all2std.pl sol2std < solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+
IHGFEDCBA@?>=<;:9876543210/.-,++*)('&&%%$$##""

Here is the Biopython conversion, which is identical:

$ python biopython_solexa2sanger.py < solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+
IHGFEDCBA@?>=<;:9876543210/.-,++*)('&&%%$$##""

EMBOSS 6.1.0 has a rounding issue with negative Solexa scores, and the
last six qualities are up by one - Peter Rice is aware of this, and has
a fix:
http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000596.html

$ seqret -filter -sformat fastq-solexa -osformat fastq-sanger < solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+slxa_0001_1_0001_01
IHGFEDCBA@?>=<;:9876543210/.-,+*)(''&%%$$##"""

Now we come to BioPerl,

$ perl bioperl_solexa2sanger.pl < solexa_faked.fastq
@slxa_0001_1_0001_01
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTNNNNNN
+slxa_0001_1_0001_01
IHGFEDCBA@?>=<;:9876543210/.-,+++*)(''&&&&%%%%

You look fine for the higher qualities, but there is something really
wrong for the lower scores (not just the negative ones).
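
For reference, the Solexa to PHRED mapping behind these conversions can be sketched in plain Python 3. This is only a minimal sketch, assuming the commonly used relation Q_PHRED = 10 * log10(10^(Q_Solexa/10) + 1) rounded to the nearest integer; the function names are hypothetical and this is not the actual Biopython, MAQ, EMBOSS or BioPerl code.

```python
import math

SANGER_OFFSET = 33   # "fastq-sanger": PHRED scores, ASCII offset 33
SOLEXA_OFFSET = 64   # "fastq-solexa": Solexa scores, ASCII offset 64


def solexa_to_phred(solexa_score):
    """Convert one Solexa score to the nearest integer PHRED score.

    Assumes Q_PHRED = 10 * log10(10 ** (Q_Solexa / 10) + 1).
    """
    return int(round(10 * math.log10(10 ** (solexa_score / 10.0) + 1)))


def solexa_quality_to_sanger(quality_string):
    """Re-encode a fastq-solexa quality string as fastq-sanger."""
    return "".join(
        chr(solexa_to_phred(ord(letter) - SOLEXA_OFFSET) + SANGER_OFFSET)
        for letter in quality_string
    )


# Solexa scores 40 down to -5, i.e. ASCII 104 ("h") down to 59 (";"),
# built from the code points rather than typed by hand.
solexa_faked = "".join(chr(q + SOLEXA_OFFSET) for q in range(40, -6, -1))
print(solexa_quality_to_sanger(solexa_faked))
```

Under that assumption the low Solexa scores 10, 9, ..., -5 come out as PHRED 10, 10, 9, 8, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, which is the tail reported for MAQ and Biopython.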
I'll leave you to double check the details, but here are the Sanger
PHRED qualities decoded into integers (using Biopython to convert
from "fastq-sanger" to "qual" output):

$ perl bioperl_solexa2sanger.pl < solexa_faked.fastq | python biopython_sanger2qual.py
>slxa_0001_1_0001_01
40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21
20 19 18 17 16 15 14 13 12 11 10 10 10 9 8 7 6 6 5 5 5 5 4 4 4 4

$ perl fq_all2std.pl sol2std < solexa_faked.fastq | python biopython_sanger2qual.py
>slxa_0001_1_0001_01
40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21
20 19 18 17 16 15 14 13 12 11 10 10 9 8 7 6 5 5 4 4 3 3 2 2 1 1

Peter C.

P.S. This is the BioPerl script I am using here:

$ more bioperl_solexa2sanger.pl
use Bio::SeqIO;
my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'fastq-solexa');
my $out = Bio::SeqIO->new(-format => 'fastq-sanger');
while (my $seq = $in->next_seq) { $out->write_seq($seq) };

From biopython at maubp.freeserve.co.uk Fri Jul 24 11:12:57 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 24 Jul 2009 16:12:57 +0100
Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS
In-Reply-To: <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com>
References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com>
 <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com>
Message-ID: <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com>

On Fri, Jul 24, 2009 at 2:53 PM, Peter wrote:
> On Fri, Jul 24, 2009 at 2:32 PM, Peter wrote:
>>>
>>> Don't be surprised if there are still bugs lurking about, just let me
>>> know and I'll fix 'em.
>>
>> I've got a bug report coming up in a second email, but the basics work :)
>
> I think I have found a bug in BioPerl's conversion from fastq-solexa
> to fastq-sanger concerning lower quality scores.

Next up is an issue with BioPerl converting from Sanger to Illumina.
In principle this is simple - the quality strings both use PHRED scores
just with different offsets. With lower PHRED scores, everything is fine:

$ more sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!

Again, this is an example constructed by hand to cover a broad range
of valid scores, and can be found in the Biopython repository under
biopython/Tests/Quality

$ perl bioperl_sanger2illumina.pl < sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+Test PHRED qualities from 40 to 0 inclusive
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@

$ python biopython_sanger2illumina.py < sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@

So, BioPerl and Biopython (and EMBOSS) agree - apart from the repeating
second title on the plus line. I understand that EMBOSS will in future
omit the repeated title on the plus line:
http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000598.html

Now, here comes the problem. I believe FASTQ files directly from an
Illumina 1.3+ pipeline will have PHRED scores in the range 0 to 40
(as in this example). However, much higher PHRED scores are possible
during assembly / contig'ing and read mapping. For example, the tool
MAQ will output Sanger style FASTQ files with PHRED scores in the
range 0 to 93 inclusive.

Now, in the Sanger FASTQ format, PHRED scores of 0 to 93 map onto
ASCII values of 33 to 126 (! to ~). There is a reason for stopping at
126, since ASCII 127 is "delete".

However, in the Illumina 1.3+ FASTQ format, PHRED scores of 0 to 93
would map to ASCII values of 64 to 157, which includes a lot of non
printing characters. Working with such files at the command line or
in an editor is a big problem. Clearly, Illumina never intended to
include such high scores in their FASTQ files!
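
The ASCII ranges quoted above follow directly from the two offsets, and can be checked with a few lines of Python 3. This is a minimal sketch only; the helper names are hypothetical, and "printable" is assumed to mean ASCII 33 ("!") to 126 ("~").

```python
SANGER_OFFSET = 33    # "fastq-sanger": PHRED + 33
ILLUMINA_OFFSET = 64  # "fastq-illumina": PHRED + 64


def ascii_range(offset, min_phred=0, max_phred=93):
    """Return the (lowest, highest) ASCII codes used for a PHRED range."""
    return offset + min_phred, offset + max_phred


def is_printable_ascii(code):
    """True for the printable ASCII characters "!" (33) to "~" (126)."""
    return 33 <= code <= 126


for name, offset in [("fastq-sanger", SANGER_OFFSET),
                     ("fastq-illumina", ILLUMINA_OFFSET)]:
    low, high = ascii_range(offset)
    # Count the codes in this range that fall outside printable ASCII.
    unprintable = [c for c in range(low, high + 1) if not is_printable_ascii(c)]
    print(name, low, high, len(unprintable))
```

With offset 64 the highest PHRED score that still lands on a printable character is 126 - 64 = 62, so any score from 63 to 93 needs a byte of 127 or above.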
Nevertheless, it is possible to write a FASTQ format following the
Illumina 1.3+ encoding with these values. Biopython and EMBOSS attempt
to do this - although I would regard throwing an error as equally
acceptable.

So, here is another hand constructed example of a Sanger style FASTQ
file using the full quality range:

$ more sanger_93.fastq
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+
~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!

Again, this example is in the Biopython repository under
biopython/Tests/Quality

Just to check:

$ python biopython_sanger2qual.py < sanger_93.fastq
>Test PHRED qualities from 93 to 0 inclusive
93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74
73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54
53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34
33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14
13 12 11 10 9 8 7 6 5 4 3 2 1 0

So, here we go - apologies for the expected line mangling:

$ seqret -filter -sformat fastq-sanger -osformat fastq-illumina < sanger_93.fastq | hexdump -C -v
00000000 40 54 65 73 74 20 50 48 52 45 44 20 71 75 61 6c |@Test PHRED qual|
00000010 69 74 69 65 73 20 66 72 6f 6d 20 39 33 20 74 6f |ities from 93 to|
00000020 20 30 20 69 6e 63 6c 75 73 69 76 65 0a 41 43 54 | 0 inclusive.ACT|
00000030 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 |GACTGACTGACTGACT|
00000040 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 |GACTGACTGACTGACT|
00000050 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 |GACTGACTGACTGACT|
00000060 47 41 43 54 47 41 43 54 47 0a 41 43 54 47 41 43 |GACTGACTG.ACTGAC|
00000070 54 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 |TGACTGACTGACTGAC|
00000080 54 47 41 43 54 47 41 43 54 47 41 4e 0a 2b 54 65 |TGACTGACTGAN.+Te|
00000090 73 74 0a 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 |st..............|
000000a0 90 8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 |................|
000000b0 80 7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 |..~}|{zyxwvutsrq|
000000c0 70 6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 0a |ponmlkjihgfedcb.|
000000d0 61 60 5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 |a`_^]\[ZYXWVUTSR|
000000e0 51 50 4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 |QPONMLKJIHGFEDCB|
000000f0 41 40 0a |A@.|
000000f3

$ python biopython_sanger2illumina.py < sanger_93.fastq | hexdump -C -v
00000000 40 54 65 73 74 20 50 48 52 45 44 20 71 75 61 6c |@Test PHRED qual|
00000010 69 74 69 65 73 20 66 72 6f 6d 20 39 33 20 74 6f |ities from 93 to|
00000020 20 30 20 69 6e 63 6c 75 73 69 76 65 0a 41 43 54 | 0 inclusive.ACT|
00000030 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 |GACTGACTGACTGACT|
00000040 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 |GACTGACTGACTGACT|
00000050 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 |GACTGACTGACTGACT|
00000060 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 |GACTGACTGACTGACT|
00000070 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 |GACTGACTGACTGACT|
00000080 47 41 43 54 47 41 43 54 47 41 4e 0a 2b 0a 9d 9c |GACTGACTGAN.+...|
00000090 9b 9a 99 98 97 96 95 94 93 92 91 90 8f 8e 8d 8c |................|
000000a0 8b 8a 89 88 87 86 85 84 83 82 81 80 7f 7e 7d 7c |.............~}||
000000b0 7b 7a 79 78 77 76 75 74 73 72 71 70 6f 6e 6d 6c |{zyxwvutsrqponml|
000000c0 6b 6a 69 68 67 66 65 64 63 62 61 60 5f 5e 5d 5c |kjihgfedcba`_^]\|
000000d0 5b 5a 59 58 57 56 55 54 53 52 51 50 4f 4e 4d 4c |[ZYXWVUTSRQPONML|
000000e0 4b 4a 49 48 47 46 45 44 43 42 41 40 0a |KJIHGFEDCBA@.|
000000ed

Biopython and EMBOSS 6.1.0 differ regarding the plus line, but agree on
the quality string which runs from 0x9d to 0x40 (in hex), or 157 to 64
in decimal, which after subtracting the Illumina offset of 64, gives
PHRED scores of 93 to 0 as desired.
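
The arithmetic in that last check, and the "throw an error" alternative mentioned earlier, can be sketched in Python 3 as follows. This is an illustration under stated assumptions rather than any library's behaviour: `decode_illumina`, `encode_illumina` and the `strict` flag are hypothetical names, and the quality bytes are rebuilt from the range 0x9d to 0x40 instead of being read from a file.

```python
ILLUMINA_OFFSET = 64
MAX_PRINTABLE = 126  # "~", the last printable ASCII character


def decode_illumina(quality_bytes):
    """Decode raw fastq-illumina quality bytes into PHRED scores."""
    return [byte - ILLUMINA_OFFSET for byte in quality_bytes]


def encode_illumina(phred_scores, strict=False):
    """Encode PHRED scores as fastq-illumina quality bytes.

    With strict=True, scores that would need non-printing characters
    raise ValueError instead of being written out.
    """
    codes = [score + ILLUMINA_OFFSET for score in phred_scores]
    if strict and any(code > MAX_PRINTABLE for code in codes):
        raise ValueError("PHRED score too high for printable fastq-illumina")
    return bytes(codes)


# The quality bytes seen in the hexdumps: 0x9d down to 0x40.
quality = bytes(range(0x9D, 0x3F, -1))
scores = decode_illumina(quality)
print(scores[0], scores[-1], len(scores))
```

Calling `encode_illumina(scores)` reproduces the raw bytes, while `encode_illumina(scores, strict=True)` raises `ValueError` because the scores above 62 fall outside printable ASCII.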
Now to BioPerl,

$ perl bioperl_sanger2illumina.pl < sanger_93.fastq
@Test PHRED qualities from 93 to 0 inclusive
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN
+Test PHRED qualities from 93 to 0 inclusive
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@

$ perl bioperl_sanger2illumina.pl < sanger_93.fastq | hexdump -C -v
...

BioPerl has output an invalid FASTQ file - it seems to omit the quality
scores for the top scoring nucleotides at the start. The BioPerl quality
string runs from just "h" to "@", or 0x68 to 0x40 (in hex), giving
104 to 64 in decimal, giving PHRED values of 40 to 0.

I think BioPerl should either throw an error, or output the non printing
characters as done by Biopython and EMBOSS.

Regards,

Peter C.
(@Biopython)

From j_martin at lbl.gov Fri Jul 24 12:22:02 2009
From: j_martin at lbl.gov (Joel Martin)
Date: Fri, 24 Jul 2009 09:22:02 -0700
Subject: [Bioperl-l] how to stop prerequisite modules auto-installing
In-Reply-To: <320fb6e00907240600p7cc41b37wc7c0f748160f109@mail.gmail.com>
References: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com>
 <4A520591.3070407@ebi.ac.uk>
 <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com>
 <4A54C1FB.8050708@ebi.ac.uk>
 <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com>
 <5128A289-377E-4EC3-9030-E0E91B463EA1@illinois.edu>
 <320fb6e00907240228u797c316ds94297f349d8c097c@mail.gmail.com>
 <222FBEA4-37CD-4619-9BBD-AB502CF85AD5@illinois.edu>
 <320fb6e00907240600p7cc41b37wc7c0f748160f109@mail.gmail.com>
Message-ID: <20090724162202.GA1512@eniac.jgi-psf.org>

Hello,

I went to test bioperl-live and Build.PL started updating modules
in my perl install w/o prompting me, frightening! I really need to test
modules and other groups here need to test them before they're updated
so we don't break anything when some module's api changes.
I did svn co of bioperl-live then perl Build.PL PREFIX=/scratch/bioperl-live saw some output including "I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand" then it went to cpan and tried updating Data::Stag in my perl install. How can I ask it to prompt me before updating modules ( so I can put the updated versions somewhere for it to find that isn't the live perl install )? Should I be running Build.PL indirectly? Joel From cjfields at illinois.edu Fri Jul 24 14:26:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Jul 2009 13:26:29 -0500 Subject: [Bioperl-l] Bioperl Entrez Esearch In-Reply-To: References: <18DF7D20DFEC044098A1062202F5FFF32A7FFF3CEF@exchsth.agresearch.co.nz> Message-ID: <9BD34010-4386-43B3-868B-521674800242@illinois.edu> Hard to determine what to do w/o seeing any code or knowing what these warnings/errors are. BTW, these exceptions/warnings are there for a good reason (either a server side issue or a bug in the code). chris On Jul 21, 2009, at 10:07 AM, Rajasekar Karthik wrote: > Russell / Others, > The utilities keep printing out warnings and errors. Is there any > way to > a) either not print at all > b) or send them to some other log file other than apache's error.log > > Thanks. > > On Wed, Jul 15, 2009 at 5:34 PM, Rajasekar Karthik >wrote: > >> that helps - thanks!!! >> >> >> On Tue, Jul 14, 2009 at 6:33 PM, Smithies, Russell < >> Russell.Smithies at agresearch.co.nz> wrote: >> >>> You sure can. >>> Take a look at http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook >>> >>> >>> --Russell >>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>> bounces at lists.open-bio.org] On Behalf Of Rajasekar Karthik >>>> Sent: Wednesday, 15 July 2009 10:23 a.m. >>>> To: bioperl-l at lists.open-bio.org >>>> Subject: [Bioperl-l] Bioperl Entrez Esearch >>>> >>>> Hi, >>>> I an new to Bioperl. How can I do an Entrez Esearch using Bioperl? 
>>>> >>>> For example, I want to do an exact title search in pubmed >>>> Title: Guidelines for quantitative rt-PCR >>>> >>>> Using HTTP Get, I would do something like this >>>> URL: >>>> >>> http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&field=titl&te >>>> rm=Guidelines%20for%20quantitative%20rt-PCR >>>> to get the response XML. >>>> >>>> How can I use Bioperl to do the above action? >>>> >>>> Thanks. >>>> >>>> -- >>>> Best Regards, >>>> Rajasekar Karthik >>>> karthik085 at gmail.com >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> = >>> = >>> = >>> ==================================================================== >>> Attention: The information contained in this message and/or >>> attachments >>> from AgResearch Limited is intended only for the persons or entities >>> to which it is addressed and may contain confidential and/or >>> privileged >>> material. Any review, retransmission, dissemination or other use >>> of, or >>> taking of any action in reliance upon, this information by persons >>> or >>> entities other than the intended recipients is prohibited by >>> AgResearch >>> Limited. If you have received this message in error, please notify >>> the >>> sender immediately. 
>>> = >>> = >>> = >>> ==================================================================== >>> >> >> >> >> -- >> Best Regards, >> Rajasekar Karthik >> karthik085 at gmail.com >> > > > > -- > Best Regards, > Rajasekar Karthik > karthik085 at gmail.com > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Jul 24 14:27:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Jul 2009 13:27:32 -0500 Subject: [Bioperl-l] Next-gen modules In-Reply-To: <320fb6e00907240600p7cc41b37wc7c0f748160f109@mail.gmail.com> References: <6FA80489-D779-4247-B9EE-BB08ECEA0F8A@ucl.ac.uk> <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com> <5128A289-377E-4EC3-9030-E0E91B463EA1@illinois.edu> <320fb6e00907240228u797c316ds94297f349d8c097c@mail.gmail.com> <222FBEA4-37CD-4619-9BBD-AB502CF85AD5@illinois.edu> <320fb6e00907240600p7cc41b37wc7c0f748160f109@mail.gmail.com> Message-ID: <8E25EF23-8D53-420E-B564-94DAE6784162@illinois.edu> On Jul 24, 2009, at 8:00 AM, Peter wrote: > Hi all, > > On Fri, Jul 24, 2009 at 1:19 PM, Chris Fields > wrote: >>> >>> Have you guys (BioPerl) have also gone for "fastq-sanger" instead of >>> just "fastq" for the Sanger Standard version of FASTQ (like EMBOSS)? >>> Does BioPerl use just "fastq" to mean anything? >> >> Short answer: yes, and yes. >> >> Slightly longer answer: I've set up SeqIO so it converts "new(- >> format => >> 'foo-bar')" to new(-format => 'foo, -variant => 'bar'). In the fastq >> constructor, if the variant is expected but isn't defined (i.e. for >> 'fastq') >> it defaults to sanger. Makes it a bit easier maintenance-wise if a >> new >> variant pops up. 
> > Right, so BioPerl understands "fastq" and "fastq-sanger" to mean the > Sanger standard FASTQ files. Yes. > I've just updated Biopython to also allow "fastq-sanger" as an alias > for > "fastq", so we are consistent here: > http://lists.open-bio.org/pipermail/biopython-dev/2009-July/ > 006466.html > > Biopython, BioPerl and EMBOSS now all agree on the format names: > * "fastq-sanger" - PHRED scores offset 33 > * "fastq-solexa" - Solexa scores offset 64 > * "fastq-illumina" - PHRED scores offset 64 > > And Biopython and BioPerl also agree on the meaning of "fastq" as > an alias for "fastq-sanger". Unfortunately EMBOSS differs here, see: > http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000599.html > > Does BioJava or BioRuby have a SeqIO equivalent where they need > to give different sequence formats unique names? If so, we should > talk to them soon... > > Peter Not sure, but it would be nice to have consistency there, yes. chris From cjfields at illinois.edu Fri Jul 24 14:32:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Jul 2009 13:32:48 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <6dce9a0b0907240631n48ce47bela1c3446a8c8c3d93@mail.gmail.com> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <9C987542-1B90-4462-9DE9-F88007579ACA@illinois.edu> <2182D83B-D855-48B5-B57B-52F1D0FC78B6@gmail.com> <6dce9a0b0907240631n48ce47bela1c3446a8c8c3d93@mail.gmail.com> Message-ID: <9545B34D-6B89-40E7-B313-7144C5792BBD@illinois.edu> Lincoln, I recall seeing somewhere in the Bio::DB::SeqFeature code a reliance on some of the Bio::DB::GFF Utility stuff (rearrange and binning come to mind). Thinking about it, these are pretty commonly used. Maybe we could move some of these to Bio::Root::Utilities and just export/import code as needed. This way both GFF and SeqFeature::Store could use it. 
chris On Jul 24, 2009, at 8:31 AM, Lincoln Stein wrote: > My preference would be to split both Bio::DB::SeqFeature and > Bio::DB::GFF into their own module. I don't think they depend on > each other, but I'm not 100% sure! > > Lincoln > > On Sat, Jul 18, 2009 at 8:23 AM, Scott Cain > wrote: > Hi All, > > I don't want to wade in too deeply, but I like the idea of splitting > things up. I think the Bio::Graphics split has gone well and has > made life easier in GBrowse world. I could see Bio::DB::SeqFeature > and Bio::DB::GFF being split and either being kept together or going > there separate ways (though I have a nagging suspicion that > SeqFeature code depends on GFF code in a few places, so it may make > sense to just keep them together. > > And Chris, if it makes you feel any better, I don't think anything > you've done or not done has held up GBrowse2. > > Scott > > > > On Jul 17, 2009, at 11:14 PM, Chris Fields wrote: > > My 2c... > > On Jul 17, 2009, at 12:01 PM, Jason Stajich wrote: > > Will try to weigh in more, a little bit of stream of consciousness > to let you know I'm thinking about it. Tough summer to focus much > on this. > > Yes, for me as well. That will change soon (approx two weeks) ;> > > It's too bad we are apparently the laughing stock of Perl gurus, but > it would be great to see how to modernize aspects of the development. > > I'm curious how it will work that we'll have dozens of separate > distros that we'll have a hard time keeping track of what directory > things are in? Will there have to be a master list of what version > and what modules are in what distro now? > > I don't think we're a laughingstock as much as we haven't had the > time to dedicate towards this (and much of this occurred at a point > early on, with that whole 'Cathedral and Bazaar' esr-based thingy). > BTW,, those same gurus shouldn't speak: perl core is just as bad and > riddled with worse bugs, though rgs and co. wouldn't admit it. 
> > In fact, base.pm itself has a nasty one; I'm surprised no one in the > bioperl community has noticed it yet (it's listed as a bug on RT I > think): > > pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print > $Bio::SeqIO::VERSION."\n"' > 1.0069 > pyrimidine1:biomoose cjfields$ perl -MBio::SeqIO -e 'print > $Bio::Root::IO::VERSION."\n"' > -1, set by base.pm > > Imported modules do not have VERSION set correctly when it is > exported. This hasn't become an issue in bioperl yet (it's really > an edge case), but several devs have run into this. And really, why > set VERSION to a string like '-1, set by base.pm'? > > Anyway, re: versioning, the way I think about it, if we have a small > very stable core with version X, and a focused very stable module > group with version Y, other distributions would have a separate > version and require subgroup version Y (which would in turn require > core version X). CPAN would take care of it. This isn't much > different than what occurs everyday on CPAN anyway (Jay's Catalyst, > Moose and MooseX, and so on). In fact, several Moose-requiring > distributions don't require the latest Moose. > > When I do a SVN (or git) checkout do I need to checkout each of > these in its own directory? Or will there be a master packaging > script that makes the necessary zip files for CPAN submission? > > Not sure; that would be up to us I suppose. I think it would be > easier to maintain and release if they were separate or packaged up > as Jay suggests. > > If they are in separate directories are we organizing by conceptual > topic (phylogenetics, alignment, database search) or by namespace of > the modules? > > By topic, retaining namespaces. We have a basic Bio::* directory > structure already in place for various generic terms (Tools, DB, > etc), so I see this crossing simple namespaces very easily. And as > I pointed out to Robert, several of those could possibly go together. 
> > Do all the 'database' modules live together - probably not - so do > we name bioperl-db-remote bioperl-db-local-index, bioperl-db-local- > sql, etc? really bioperl-db is somewhat focused on sequences and > features, but what about things that integrate multiple data types - > like biosql? > > I don't see bioperl-db (BioSQL) being split up. I think it's too > intrinsically linked and cohesive (it's almost a separate core unto > itself), so it would be counterproductive to do so. > > Maybe have bioperl-db become bioperl-biosql. Web-based = bioperl- > remotedb. Local = bioperl-localdb. OBDA = bioperl-obda. > > If they are in separate directories, what about all the test data > that might be shared, is this replicated among all the sub- > directories - how do we do a good job keeping that up to date, could > we have a test-data distro instead with symlinks within SVN? > > We have to see how much is actually shared and proceed from there. > I would like to eventually resurrect the idea of a separate biodata > repo that we could just ftp the data from as needed. That would cut > down on the package size quite a bit, but I'm not sure how feasible > that is from the testing point of view (would we have to skip all > tests if there were no network access)? > > For some other obvious modules that can be split off and self- > contained, each of these could be a package. I would estimate more > than 20 packages depending on how Bio::Tools are carved up. > - I think Bio::DB::SeqFeature needs to be split off for sure this is > a nice logical peeling off. Could be another test case since it is > a Gbrowse dependancy > - Bio::DB::GFF as well for the same reasons. > > Completely agree (and I think Lincoln would like this as well). > > - Bio::PopGen - self contained for the most part, but depends on > Bio::Tree and Bio::Align objects > > Could list those as a required dependency. 
> > - Bio::Variation > - Bio::Map and Bio::MapIO > - Bio::Cluster and Bio::ClusterIO > - Bio::Assembly > - Bio::Coordinate > > My nightmare is that we're going to have to manage a lot of 'use XX > 1.01' enforcing version requiring when dealing with the dependancies > on the interface classes and having to keep these all up to date? > The version was implicit when they are all part of the same big > distro. > > Right. But it also becomes a maintenance problem when serious bugs > in one module impede the needed release of others to CPAN. > > Also the splits need not only include one namespace if need be I > guess but we have generally grouped things by namespace. > > What do you want to do about the bioperl-run. Do we make a set of > parallel splits from all of these? I think at the outset we need to > coordinate the applications supported here in some sort of loose > ontology - the namespaces were not consistently applied so we have > some alignment tools in different directories, etc. So the > namespace sort of classifies them but it could be better. One of > the challenges of multiple developers without a totally shared > vision on how it should be done. > > We could split bp-run and Tools, pairing the wrappers with the > relevant parsers modules. Not sure if this can be done with > SearchIO as well but it could be tested to see how feasible that > would be. > > I'm not convinced that the Bio::Graphics splitoff has been painless > so we should take stock of how that is working. > > Really? Lincoln has made several fixes lately on CPAN, so I thought > everything was going well. If anything I would think the lack of > additional 1.6.x bioperl releases has probably held Gbrowse 2.0 up > more due to Bio::DB::SeqFeature (my fault, but as you know life and > job take precedence sometimes). 
> > It seems like this split off would be a way to better streamline > things in bioperl so that modern versions of bioperl might be able > to better interface with things like Ensembl again too. > > How much of this effort is worth triaging on the current code versus > the efforts we want to make on a cleaner, simpler bioperl system > that appears to scare so many users (and potential developers) off. > > I say triage away on a branch, but we need to indicate which ones to > whittle out first. The reason I believe we went for a larger split > initially (as indicated on the wiki page) was to push something > forward and not get too bogged down in the details. But we may as > well go full throttle and do this right away. > > Okay I rambled, hope that was helpful. > > -jason > -- > Jason Stajich > jason at bioperl.org > > Very, very helpful. Now I need a beer. > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > ----------------------------------------------------------------------- > Scott Cain, Ph. D. scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Lincoln D. 
Stein > Director, Informatics and Biocomputing Platform > Ontario Institute for Cancer Research > 101 College St., Suite 800 > Toronto, ON, Canada M5G0A3 > 416 673-8514 > Assistant: Renata Musa From cjfields at illinois.edu Fri Jul 24 14:38:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Jul 2009 13:38:51 -0500 Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS In-Reply-To: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> Message-ID: On Jul 24, 2009, at 8:32 AM, Peter wrote: > Hi all, > > Peter Rice kindly said he will look into an OBF cross project mailing > list, but in the meantime this has been cross posted to the Biopython, > BioPerl, and EMBOSS development lists. That's a great idea! Would help cut down on the cross-posting (I'm getting this directly and via bioperl and biopython). > On Thu, Jul 23, 2009 at 11:58 PM, Chris > Fields wrote: >>> I'd like to get comparisons against BioPerl's new FASTQ support >>> going too. To do this I'd need to know which (branch?) of BioPerl I >>> should install, and I'd also like a trivial sample BioPerl script >>> to do >>> piped FASTQ conversion. i.e. read a FASTQ file from stdin (say >>> as "fastq-solexa"), and output it to stdout (say as "fastq" meaning >>> the Sanger Standard FASTQ). >> >> You would have to install svn (bioperl-live) if you want the >> refactored >> fastq. That commit was within the last month. > > I've got SVN bioperl-live installed and apparently working :) > >>> i.e. Something like this four line Biopython script would be >>> perfect: >>> http://biopython.org/wiki/Reading_from_unix_pipes >> >> We use named parameters so it's a little more verbose. 
>> >> use Bio::SeqIO; >> my $in = Bio::SeqIO->new(-fh => \*STDIN, -format => 'fastq-sanger'); >> my $out = Bio::SeqIO->new(-format => 'fastq-solexa'); >> while (my $seq = $in->next_seq) { $out->write_seq($seq) } >> >> Don't be surprised if there are still bugs lurking about, just let >> me know >> and I'll fix 'em. > > I've got a bug report coming up in a second email, but the basics > work :) > > e.g. Using this Sanger style FASTQ file, and converting it to Solexa > style > http://biopython.org/SRC/biopython/Tests/Quality/example.fastq > > $ more example.fastq > @EAS54_6_R1_2_1_413_324 > CCCTTCTTGTCTTCAGCGTTTCTCC > + > ;;3;;;;;;;;;;;;7;;;;;;;88 > @EAS54_6_R1_2_1_540_792 > TTGGCAGGCCAAGGCCGATGGATCA > + > ;;;;;;;;;;;7;;;;;-;;;3;83 > @EAS54_6_R1_2_1_443_348 > GTTGCTTCTGGCGTGGGTGGGGGGG > + > ;;;;;;;;;;;9;7;;.7;393333 > > This is simple three record FASTQ file (in the Sanger format). > > Using EMBOSS 6.1.0: > > $ seqret -filter -sformat fastq-sanger -osformat fastq-solexa < > example.fastq > @EAS54_6_R1_2_1_413_324 > CCCTTCTTGTCTTCAGCGTTTCTCC > +EAS54_6_R1_2_1_413_324 > ZZRZZZZZZZZZZZZVZZZZZZZWW > @EAS54_6_R1_2_1_540_792 > TTGGCAGGCCAAGGCCGATGGATCA > +EAS54_6_R1_2_1_540_792 > ZZZZZZZZZZZVZZZZZLZZZRZWR > @EAS54_6_R1_2_1_443_348 > GTTGCTTCTGGCGTGGGTGGGGGGG > +EAS54_6_R1_2_1_443_348 > ZZZZZZZZZZZXZVZZMVZRXRRRR > > Using BioPerl: > > $ perl bioperl_sanger2solexa.pl < example.fastq > @EAS54_6_R1_2_1_413_324 > CCCTTCTTGTCTTCAGCGTTTCTCC > +EAS54_6_R1_2_1_413_324 > ZZRZZZZZZZZZZZZVZZZZZZZWW > @EAS54_6_R1_2_1_540_792 > TTGGCAGGCCAAGGCCGATGGATCA > +EAS54_6_R1_2_1_540_792 > ZZZZZZZZZZZVZZZZZLZZZRZWR > @EAS54_6_R1_2_1_443_348 > GTTGCTTCTGGCGTGGGTGGGGGGG > +EAS54_6_R1_2_1_443_348 > ZZZZZZZZZZZXZVZZMVZRXRRRR > > Using Biopython: > > $ python biopython_sanger2solexa.py < example.fastq > @EAS54_6_R1_2_1_413_324 > CCCTTCTTGTCTTCAGCGTTTCTCC > + > ZZRZZZZZZZZZZZZVZZZZZZZWW > @EAS54_6_R1_2_1_540_792 > TTGGCAGGCCAAGGCCGATGGATCA > + > ZZZZZZZZZZZVZZZZZLZZZRZWR > @EAS54_6_R1_2_1_443_348 > 
GTTGCTTCTGGCGTGGGTGGGGGGG > + > ZZZZZZZZZZZXZVZZMVZRXRRRR > > They all agree, except that Biopython has followed the MAQ > convention of omitting the (optional) repeat of the captions > on the plus lines. This is something I'd already asked Peter > Rice about for EMBOSS (but I think we got sidetracked): > http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000577.html > > Peter Good to know the conversion is working, I was basically writing some code in the dark on that (and keeping my fingers crossed ;) As for the optional header, we could add a flag to allow the user the option of printing it or not. Would be easy enough; we can follow your lead as to what the default behavior is. I'll take a look at the bug and try to get it into the next point release, hopefully not be anything too hard to fix. chris From hlapp at gmx.net Fri Jul 24 15:55:54 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 24 Jul 2009 12:55:54 -0700 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <716af09c0907240519tdba21fcjaddcdceeeef91bc2@mail.gmail.com> References: <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> <4A60EBB5.4010004@cornell.edu> <4A60FFF8.3030302@jays.net> <66FDE248-4CF8-4F68-91D5-16D0AE30B36E@illinois.edu> <91389D4D-B46C-49BA-9D5D-04DD82014B1C@jays.net> <025907D4D2344FDC90E915E605B7FEB8@NewLife> <2379556E-937B-4BAC-9BA4-6C0092AD804B@berkeleybop.org> <716af09c0907240519tdba21fcjaddcdceeeef91bc2@mail.gmail.com> Message-ID: <4B94CA67-129D-487D-8A77-ABF31B5121CA@gmx.net> On Jul 24, 2009, at 5:19 AM, Bernd Web wrote: > Actually, I and many student I worked with really likes the monolithic > form of BioPerl. No fuss in choosing what you want and finding out > later you need more. I have to agree with this. This is what have done for the courses that we run; anything else I think would be rather painful and not worth the time. 
Of course that doesn't mean that there can't be simply a - physical or virtual - bundle that will download all the sub-packages and install them. Just realize that there is quite a few out there too who simply need a painless way to install "everything, no questions asked". -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Fri Jul 24 16:57:03 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 24 Jul 2009 15:57:03 -0500 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <4B94CA67-129D-487D-8A77-ABF31B5121CA@gmx.net> References: <4A603F82.9020202@cornell.edu> <0F76BD98-C8B7-49F7-8A3C-46AA619C023D@bioperl.org> <4A60EBB5.4010004@cornell.edu> <4A60FFF8.3030302@jays.net> <66FDE248-4CF8-4F68-91D5-16D0AE30B36E@illinois.edu> <91389D4D-B46C-49BA-9D5D-04DD82014B1C@jays.net> <025907D4D2344FDC90E915E605B7FEB8@NewLife> <2379556E-937B-4BAC-9BA4-6C0092AD804B@berkeleybop.org> <716af09c0907240519tdba21fcjaddcdceeeef91bc2@mail.gmail.com> <4B94CA67-129D-487D-8A77-ABF31B5121CA@gmx.net> Message-ID: <81A89B04-F3A2-4DCC-8DC0-1ABD6B38E1D6@illinois.edu> On Jul 24, 2009, at 2:55 PM, Hilmar Lapp wrote: > On Jul 24, 2009, at 5:19 AM, Bernd Web wrote: > >> Actually, I and many student I worked with really likes the >> monolithic >> form of BioPerl. No fuss in choosing what you want and finding out >> later you need more. > > > I have to agree with this. This is what have done for the courses > that we run; anything else I think would be rather painful and not > worth the time. > > Of course that doesn't mean that there can't be simply a - physical > or virtual - bundle that will download all the sub-packages and > install them. Just realize that there is quite a few out there too > who simply need a painless way to install "everything, no questions > asked". 
> > -hilmar > -- That's in the plan, at least the virtual bundling part. We would either reutilize Bundle::BioPerl for this purpose, or have a Task::BioPerl that would install the requested modules using a Module::Install (it comes with the bundler). I'm not sure about physically bundling it. Not sure how to go about setting that up beyond faking a local installation, zipping it up, and shipping it (and that would be w/o tests, Build.PL, etc). chris From cjfields at illinois.edu Sat Jul 25 15:50:13 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 25 Jul 2009 14:50:13 -0500 Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS In-Reply-To: <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com> References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com> <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com> Message-ID: <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu> On Jul 24, 2009, at 10:12 AM, Peter wrote: > On Fri, Jul 24, 2009 at 2:53 PM, > Peter wrote: >> On Fri, Jul 24, 2009 at 2:32 PM, Peter> > wrote: >>>> >>>> Don't be surprised if there are still bugs lurking about, just >>>> let me >>>> know and I'll fix 'em. >>> >>> I've got a bug report coming up in a second email, but the basics >>> work :) >> >> I think I have found a bug in BioPerl's conversion from fastq-solexa >> to fastq-sanger concerning lower quality scores. > > Next up is an issue with BioPerl converting from Sanger to Illumina. > In principle this is simple - the quality strings both use PHRED > scores > just with different offsets. With lower PHRED scores, everything is > fine: > > $ more sanger_faked.fastq > @Test PHRED qualities from 40 to 0 inclusive > ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN > + > IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"! 
> > Again, this is an example constructed by hand to cover a broad > range of valid scores, and can be found in the Biopython > repository under biopython/Tests/Quality > > $ perl bioperl_sanger2illumina.pl < sanger_faked.fastq @Test PHRED > qualities from 40 to 0 inclusive > ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN > +Test PHRED qualities from 40 to 0 inclusive > hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@ > > $ python biopython_sanger2illumina.py < sanger_faked.fastq > @Test PHRED qualities from 40 to 0 inclusive > ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN > + > hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@ > > So, BioPerl and Biopython (and EMBOSS) agree - apart from > the repeating second title on the plus line. I understand that > EMBOSS will in future omit the repeated title on the plus line: > http://lists.open-bio.org/pipermail/emboss-dev/2009-July/000598.html We'll make it optional and default to no header. > Now, here comes the problem. I believe FASTQ files directly > from an Illumina 1.3+ pipeline will have PHRED scores in the > range 0 to 40 (as in this example). However, much higher > PHRED scores are possible during assembly / contig'ing > and read mapping. For example, the tool MAQ will output > Sanger style FASTQ files with PHRED scores in the range > 0 to 93 inclusive. Is this behavior documented anywhere, specifically by Illumina (that values can exceed 40)? If Illumina 1.3 is specified as being PHRED 0-40, and another (non-Illumina) software package pushes that limit above the specified range of Illumina values, I would consider that unfortunately yet another variant. We can support it as Illumina 1.3, but my point is this may getting into a grey area and may be something that Illumina doesn't/wouldn't support. Reminds me a little of the multiple GFF2 variations (one of the main reasons for a GFF3). > Now, in the Sanger FASTQ format, PHRED scores of 0 to > 93 map onto ASCII values of 33 to 126 (! to ~). 
There is a > reason for stopping at 126, since ASCII 127 is "delete". > > However, in the Illumina 1.3+ FASTQ format, PHRED > scores of 0 to 93 would map to ASCII values of 64 to > 157, which includes a lot of non printing characters. > Working with such files at the command line or in an > editor is a big problem. Clearly, Illumina never intended > to include such high scores in their FASTQ files! Exactly. > Nevertheless, it is possible to write a FASTQ format > following the Illumina 1.3+ encoding with these values. > Biopython and EMBOSS attempt to do this - although I > would regard throwing an error as equally acceptable. > > So, here is another hand constructed example of a > Sanger style FASTQ file using the full quality range: > > $ more sanger_93.fastq > @Test PHRED qualities from 93 to 0 inclusive > ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN > + > ~}|{zyxwvutsrqponmlkjihgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@?>=<;: > 9876543210/.-,+*)('&%$#"! 
> > Again, this example is in the Biopython repository under > biopython/Tests/Quality > > Just to check: > > $ python biopython_sanger2qual.py < sanger_93.fastq >> Test PHRED qualities from 93 to 0 inclusive > 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 > 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 > 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 > 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 > 13 12 11 10 9 8 7 6 5 4 3 2 1 0 > > So, here we go - apologies for the expected line mangling: > > $ seqret -filter -sformat fastq-sanger -osformat fastq-illumina < > sanger_93.fastq | hexdump -C -v > 00000000 40 54 65 73 74 20 50 48 52 45 44 20 71 75 61 6c |@Test > PHRED qual| > 00000010 69 74 69 65 73 20 66 72 6f 6d 20 39 33 20 74 6f |ities > from 93 to| > 00000020 20 30 20 69 6e 63 6c 75 73 69 76 65 0a 41 43 54 | 0 > inclusive.ACT| > 00000030 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 | > GACTGACTGACTGACT| > 00000040 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 | > GACTGACTGACTGACT| > 00000050 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 | > GACTGACTGACTGACT| > 00000060 47 41 43 54 47 41 43 54 47 0a 41 43 54 47 41 43 | > GACTGACTG.ACTGAC| > 00000070 54 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 | > TGACTGACTGACTGAC| > 00000080 54 47 41 43 54 47 41 43 54 47 41 4e 0a 2b 54 65 | > TGACTGACTGAN.+Te| > 00000090 73 74 0a 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 | > st..............| > 000000a0 90 8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 > |................| > 000000b0 80 7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 |..~}| > {zyxwvutsrq| > 000000c0 70 6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 0a | > ponmlkjihgfedcb.| > 000000d0 61 60 5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 |a`_^]\ > [ZYXWVUTSR| > 000000e0 51 50 4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 | > QPONMLKJIHGFEDCB| > 000000f0 41 40 0a |A at .| > 000000f3 > > $ python biopython_sanger2illumina.py < sanger_93.fastq | hexdump -C > -v00000000 40 
54 65 73 74 20 50 48 52 45 44 20 71 75 61 6c |@Test > PHRED qual| > 00000010 69 74 69 65 73 20 66 72 6f 6d 20 39 33 20 74 6f |ities > from 93 to| > 00000020 20 30 20 69 6e 63 6c 75 73 69 76 65 0a 41 43 54 | 0 > inclusive.ACT| > 00000030 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 | > GACTGACTGACTGACT| > 00000040 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 | > GACTGACTGACTGACT| > 00000050 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 | > GACTGACTGACTGACT| > 00000060 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 | > GACTGACTGACTGACT| > 00000070 47 41 43 54 47 41 43 54 47 41 43 54 47 41 43 54 | > GACTGACTGACTGACT| > 00000080 47 41 43 54 47 41 43 54 47 41 4e 0a 2b 0a 9d 9c | > GACTGACTGAN.+...| > 00000090 9b 9a 99 98 97 96 95 94 93 92 91 90 8f 8e 8d 8c > |................| > 000000a0 8b 8a 89 88 87 86 85 84 83 82 81 80 7f 7e 7d 7c > |.............~}|| > 000000b0 7b 7a 79 78 77 76 75 74 73 72 71 70 6f 6e 6d 6c | > {zyxwvutsrqponml| > 000000c0 6b 6a 69 68 67 66 65 64 63 62 61 60 5f 5e 5d 5c | > kjihgfedcba`_^]\| > 000000d0 5b 5a 59 58 57 56 55 54 53 52 51 50 4f 4e 4d 4c | > [ZYXWVUTSRQPONML| > 000000e0 4b 4a 49 48 47 46 45 44 43 42 41 40 0a | > KJIHGFEDCBA at .| > 000000ed > > Biopython and EMBOSS 6.1.0 differ regarding the plus line, but agree > on the quality string which runs from 0x9d to 0x40 (in hex), or 157 to > 64 in decimal, which after subtracting the Illumina offset of 64, > gives > PHRED scores of 93 to 0 as desired. > > Now to BioPerl, > > $ perl bioperl_sanger2illumina.pl < sanger_93.fastq > @Test PHRED qualities from 93 to 0 inclusive > ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAN > +Test PHRED qualities from 93 to 0 inclusive > hgfedcba`_^]\[ZYXWVUTSRQPONMLKJIHGFEDCBA@ > > $ perl bioperl_sanger2illumina.pl < sanger_93.fastq | hexdump -C -v > ... > > BioPerl has output an invalid FASTQ file - it seems to omit the > quality scores for the top scoring nucleotides at the start. 
The > BioPerl quality string runs from just "h" to "@", or 0x68 to 0x40 > (in hex), giving 104 to 64 in decimal, giving PHRED values of > 40 to 0. I think BioPerl should either throw an error, or output > the non printing characters as done by Biopython and EMBOSS. > > Regards, > > Peter C. > (@Biopython) If this is accepted as common practice between BioPython and EMBOSS we will follow similarly. I do think it's worth at least a warning for the reasons outlined above (e.g. it likely isn't Illumina's intent to support qual values outside the specified range). Might be worth checking into. From this it could be summarized that converting to sanger format is least problematic, as possible issues may be encountered when converting to the other variants. We'll need to fix the solexa quality calculations in the BioPerl parser as noted in your previous post; I'll work on that. chris From biopython at maubp.freeserve.co.uk Sat Jul 25 17:12:26 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 25 Jul 2009 22:12:26 +0100 Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS In-Reply-To: <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu> References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com> <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com> <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu> Message-ID: <320fb6e00907251412u5f53b24eiea618906e607a0e1@mail.gmail.com> On Sat, Jul 25, 2009 at 8:50 PM, Chris Fields wrote: > >> Now, here comes the problem. I believe FASTQ files directly >> from an Illumina 1.3+ pipeline will have PHRED scores in the >> range 0 to 40 (as in this example). However, much higher >> PHRED scores are possible during assembly / contig'ing >> and read mapping. For example, the tool MAQ will output >> Sanger style FASTQ files with PHRED scores in the range >> 0 to 93 inclusive. 
> > Is this behavior documented anywhere, specifically by Illumina (that values > can exceed 40)? If Illumina 1.3 is specified as being PHRED 0-40, and > another (non-Illumina) software package pushes that limit above the > specified range of Illumina values, I would consider that unfortunately yet > another variant. > > We can support it as Illumina 1.3, but my point is this may be getting into a > grey area and may be something that Illumina doesn't/wouldn't support. > Reminds me a little of the multiple GFF2 variations (one of the main > reasons for a GFF3). I agree this is a grey area (high scores in Solexa/Illumina FASTQ files). >> Now, in the Sanger FASTQ format, PHRED scores of 0 to >> 93 map onto ASCII values of 33 to 126 (! to ~). There is a >> reason for stopping at 126, since ASCII 127 is "delete". >> >> However, in the Illumina 1.3+ FASTQ format, PHRED >> scores of 0 to 93 would map to ASCII values of 64 to >> 157, which includes a lot of non printing characters. >> Working with such files at the command line or in an >> editor is a big problem. Clearly, Illumina never intended >> to include such high scores in their FASTQ files! > > Exactly. > >> Nevertheless, it is possible to write a FASTQ format >> following the Illumina 1.3+ encoding with these values. >> Biopython and EMBOSS attempt to do this - although I >> would regard throwing an error as equally acceptable. >> >> So, here is another hand constructed example of a >> Sanger style FASTQ file using the full quality range: >> >> ... >> >> Biopython and EMBOSS 6.1.0 differ regarding the plus line, but agree >> on the quality string which runs from 0x9d to 0x40 (in hex), or 157 to >> 64 in decimal, which after subtracting the Illumina offset of 64, gives >> PHRED scores of 93 to 0 as desired. >> >> Now to BioPerl, >> >> $ perl bioperl_sanger2illumina.pl < sanger_93.fastq >> ... >> >> $ perl bioperl_sanger2illumina.pl < sanger_93.fastq | hexdump -C -v >> ... 
>> >> BioPerl has output an invalid FASTQ file - it seems to omit the >> quality scores for the top scoring nucleotides at the start. The >> BioPerl quality string runs from just "h" to "@", or 0x68 to 0x40 >> (in hex), giving 104 to 64 in decimal, giving PHRED values of >> 40 to 0. I think BioPerl should either throw an error, or output >> the non printing characters as done by Biopython and EMBOSS. > > If this is accepted as common practice between BioPython and EMBOSS > we will follow similarly. I do think it's worth at least a warning for the > reasons outlined above (e.g. it likely isn't Illumina's intent to support qual > values outside the specified range). Might be worth checking into. True. I think what EMBOSS and Biopython are doing is reasonable (although a warning in this situation makes sense). Equally, an error is a valid option. However, one question is when would you issue the warning/error? For a PHRED score above 40? (Assuming we have a definitive reference for Illumina using just 0 to 40). How about if a problem character would result? Since ASCII 64+63=127, the first problem character would be for PHRED score 63. i.e. An Illumina FASTQ format file can hold PHRED scores in the range 0 to 62 without using problem characters. And likewise for a Solexa FASTQ file (Solexa scores up to 62). > From this it could be summarized that converting to sanger format is least > problematic, as possible issues may be encountered when converting to the > other variants. Yes. The Sanger FASTQ format will hold PHRED scores from 0 to 93 while using nice ASCII characters - this means it is suitable for both raw reads and processed data from assemblies or read mappings. In my personal experience, Solexa/Illumina FASTQ files tend to get converted into the Sanger FASTQ format for downstream analysis (e.g. the MAQ tool, or the NCBI short read archive). i.e. Writing high quality reads (i.e. above PHRED 40) to Solexa or Illumina FASTQ files is unlikely. 
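To make the offset arithmetic above concrete, here is a minimal sketch in plain Python. It is not the code used by Biopython, BioPerl or EMBOSS; the function names are made up for illustration, and the "warn once per call" policy is just one of the options discussed in this thread, assumed here for the sake of the example.

```python
import warnings

SANGER_OFFSET = 33    # Sanger FASTQ stores PHRED + 33
ILLUMINA_OFFSET = 64  # Illumina 1.3+ FASTQ stores PHRED + 64
LAST_PRINTABLE = 126  # "~"; ASCII 127 is "delete"


def max_printable_phred(offset):
    """Largest PHRED score that still maps to a printable ASCII character."""
    return LAST_PRINTABLE - offset


def sanger_to_phred(quality_string):
    """Decode a Sanger FASTQ quality string into a list of PHRED scores."""
    return [ord(char) - SANGER_OFFSET for char in quality_string]


def phred_to_illumina(scores):
    """Encode PHRED scores using the Illumina 1.3+ offset.

    Hypothetical policy: still write the characters, but warn once per
    call if any score would need a non-printing character.
    """
    limit = max_printable_phred(ILLUMINA_OFFSET)
    too_high = sorted(set(s for s in scores if s > limit))
    if too_high:
        warnings.warn(
            "PHRED scores above %i need non-printing characters: %r"
            % (limit, too_high)
        )
    return "".join(chr(s + ILLUMINA_OFFSET) for s in scores)


if __name__ == "__main__":
    print(max_printable_phred(SANGER_OFFSET))    # Sanger limit
    print(max_printable_phred(ILLUMINA_OFFSET))  # Illumina 1.3+ limit
    print(phred_to_illumina(sanger_to_phred("IHGFEDCBA@")))
```

With these offsets the printable limits come out as 93 for Sanger and 62 for Illumina 1.3+, matching the figures above. Whether to warn, throw, or silently write the high bytes is a policy choice; the sketch only shows where such a check would sit.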
> We'll need to fix the solexa quality calculations in the BioPerl > parser as noted in your previous post; I'll work on that. Great. Peter From cjfields at illinois.edu Sat Jul 25 17:28:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 25 Jul 2009 16:28:41 -0500 Subject: [Bioperl-l] A new name for Bio::Moose? Message-ID: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> All, Pushed by a recent suggestion by Robert, I am considering changing the name of the Bio::Moose project to something simpler. I would like to steer away from naming this directly after the implementation and have something simpler namespace-wise. I have thought of 'Alces' (the genus name for moose), which indicates both the Bio aspect and the implementation in an more indirect way (and is a bit shorter). However, I would like to solicit suggestions for alternatives. The shorter the better, and the 'winner' will receive a free beverage or so (on me) should we meet! chris From cjfields at illinois.edu Sat Jul 25 17:47:11 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 25 Jul 2009 16:47:11 -0500 Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS In-Reply-To: <320fb6e00907251412u5f53b24eiea618906e607a0e1@mail.gmail.com> References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com> <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com> <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu> <320fb6e00907251412u5f53b24eiea618906e607a0e1@mail.gmail.com> Message-ID: <312CA078-FA72-4DB1-A447-88A83BEF7D9A@illinois.edu> On Jul 25, 2009, at 4:12 PM, Peter wrote: > On Sat, Jul 25, 2009 at 8:50 PM, Chris Fields > wrote: >> >> If this is accepted as common practice between BioPython and EMBOSS >> we will follow similarly. I do think it's worth at least a warning >> for the >> reasons outlined above (e.g. 
it likely isn't Illumina's intent to >> support qual >> values outside the specified range). Might be worth checking into. > > True. I think what EMBOSS and Biopython are doing is reasonable > (although a warning in this situation makes sense). Equally, an > error is a valid option. However, one question is when would you > issue the warning/error? For a PHRED score above 40? (Assuming > we have a definitive reference for Illumina using just 0 to 40). > How about if a problem character would result? Since ASCII > 64+63=127, the first problem character would be for PHRED > score 63. > > i.e. An Illumina FASTQ format file can hold PHRED scores in the > range 0 to 62 without using problem characters. And likewise > for a Solexa FASTQ file (Solexa scores up to 62). I don't think there is a middle ground: we either indicate that the score falls outside the specified range (and warn/throw), or we allow it completely and just run the conversion w/o warnings, regardless of output. The former would at least let the user know what the problem is when they look at their output. If we issue a warning it would pop up only if the bounds are passed. I will probably set this up to warn only once (if needed I could cache the out-of-range quals and print them). >> From this it could be summarized that converting to sanger format >> is least >> problematic, as possible issues may be encountered when converting >> to the >> other variants. > > Yes. The Sanger FASTQ format will hold PHRED scores from 0 to 93 > while using nice ASCII characters - this means it is suitable for both > raw reads and processed data from assemblies or read mappings. > > In my personal experience, Solexa/Illumina FASTQ files tend to get > converted into the Sanger FASTQ format for downstream analysis > (e.g. the MAQ tool, or the NCBI short read archive). > > i.e. Writing high quality reads (i.e. above PHRED 40) to Solexa or > Illumina FASTQ files is unlikely. 
Yes, though we can unfortunately never rule it out, just try to account for the possibility in some way. >> We'll need to fix the solexa quality calculations in the BioPerl >> parser as noted in your previous post; I'll work on that. > > Great. > > Peter chris From j_martin at lbl.gov Sat Jul 25 20:23:49 2009 From: j_martin at lbl.gov (Joel Martin) Date: Sat, 25 Jul 2009 17:23:49 -0700 Subject: [Bioperl-l] A new name for Bio::Moose? In-Reply-To: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> References: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> Message-ID: <20090726002349.GE17078@eniac.jgi-psf.org> Boose; [ bio object oriented script enhancement ] [ bio object ornamenting sequence engine ] or just because it rhymes with goose. Joel On Sat, Jul 25, 2009 at 04:28:41PM -0500, Chris Fields wrote: > All, > > Pushed by a recent suggestion by Robert, I am considering changing the name > of the Bio::Moose project to something simpler. I would like to steer away > from naming this directly after the implementation and have something > simpler namespace-wise. > > I have thought of 'Alces' (the genus name for moose), which indicates both > the Bio aspect and the implementation in an more indirect way (and is a bit > shorter). However, I would like to solicit suggestions for alternatives. > The shorter the better, and the 'winner' will receive a free beverage or so > (on me) should we meet! > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Sun Jul 26 08:33:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 26 Jul 2009 08:33:10 -0400 Subject: [Bioperl-l] A new name for Bio::Moose? 
In-Reply-To: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> References: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> Message-ID: <12400DDA970840B59F973D0B890BB239@NewLife> maybe 'Biome' (Bioperl with Metaobject Extensions) cheers ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Cc: "Siddhartha Basu" Sent: Saturday, July 25, 2009 5:28 PM Subject: [Bioperl-l] A new name for Bio::Moose? > All, > > Pushed by a recent suggestion by Robert, I am considering changing the > name of the Bio::Moose project to something simpler. I would like to > steer away from naming this directly after the implementation and have > something simpler namespace-wise. > > I have thought of 'Alces' (the genus name for moose), which indicates > both the Bio aspect and the implementation in an more indirect way > (and is a bit shorter). However, I would like to solicit suggestions > for alternatives. The shorter the better, and the 'winner' will > receive a free beverage or so (on me) should we meet! > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From vecchi.b at gmail.com Sun Jul 26 12:42:38 2009 From: vecchi.b at gmail.com (Bruno Vecchi) Date: Sun, 26 Jul 2009 09:42:38 -0700 Subject: [Bioperl-l] A new name for Bio::Moose? In-Reply-To: <12400DDA970840B59F973D0B890BB239@NewLife> References: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> <12400DDA970840B59F973D0B890BB239@NewLife> Message-ID: <1a0c1b750907260942m4a09e6d0vd71603c07357850@mail.gmail.com> BioM? Extensions could go to BioMX, and BioX would be free to use for extensions of standard BioPerl. 2009/7/26 Mark A. Jensen > maybe > > 'Biome' (Bioperl with Metaobject Extensions) > > cheers > ----- Original Message ----- From: "Chris Fields" > To: "BioPerl List" > Cc: "Siddhartha Basu" > Sent: Saturday, July 25, 2009 5:28 PM > Subject: [Bioperl-l] A new name for Bio::Moose? 
> > > > All, >> >> Pushed by a recent suggestion by Robert, I am considering changing the >> name of the Bio::Moose project to something simpler. I would like to >> steer away from naming this directly after the implementation and have >> something simpler namespace-wise. >> >> I have thought of 'Alces' (the genus name for moose), which indicates >> both the Bio aspect and the implementation in an more indirect way (and is >> a bit shorter). However, I would like to solicit suggestions for >> alternatives. The shorter the better, and the 'winner' will receive a free >> beverage or so (on me) should we meet! >> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Sun Jul 26 15:33:33 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 26 Jul 2009 20:33:33 +0100 Subject: [Bioperl-l] how to stop prerequisite modules auto-installing In-Reply-To: <20090724162202.GA1512@eniac.jgi-psf.org> References: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com> <5128A289-377E-4EC3-9030-E0E91B463EA1@illinois.edu> <320fb6e00907240228u797c316ds94297f349d8c097c@mail.gmail.com> <222FBEA4-37CD-4619-9BBD-AB502CF85AD5@illinois.edu> <320fb6e00907240600p7cc41b37wc7c0f748160f109@mail.gmail.com> <20090724162202.GA1512@eniac.jgi-psf.org> Message-ID: <4A6CAF8D.3090403@sendu.me.uk> Joel Martin wrote: > Hello, > I went to test bioperl-live and Build.PL started updating > modules in my perl install w/o prompting me, frightening! 
I > really need to test modules and other groups here need to > test them before they're updated so we don't break anything > when some module's api changes. > > I did svn co of bioperl-live then > > perl Build.PL PREFIX=/scratch/bioperl-live > > saw some output including > "I think you ran Build.PL directly, so will use CPAN to install prerequisites on demand" > > then it went to cpan and tried updating Data::Stag in > my perl install. > > How can I ask it to prompt me before updating modules ( so > I can put the updated versions somewhere for it to find > that isn't the live perl install )? Unfortunately, for those few (5) modules that are currently "absolutely required", it doesn't ask, it just updates. Data::Stag actually has a comment next to it suggesting it isn't even "absolutely required". So you could just comment that line out completely in Build.PL as a temporary fix. Otherwise, since it will be using CPAN to install modules, you can just arrange beforehand for CPAN to install to your desired location using the CPAN configuration system. 
From cjfields at illinois.edu Sun Jul 26 16:21:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 26 Jul 2009 15:21:23 -0500 Subject: [Bioperl-l] how to stop prerequisite modules auto-installing In-Reply-To: <4A6CAF8D.3090403@sendu.me.uk> References: <1d06cd5d0906300428x59c004f1h200bfe3c23ed769@mail.gmail.com> <4A520591.3070407@ebi.ac.uk> <1d06cd5d0907080826g35534843l665350ef9ecc0c50@mail.gmail.com> <4A54C1FB.8050708@ebi.ac.uk> <320fb6e00907230431y33190228ic6b0d01adede3243@mail.gmail.com> <5128A289-377E-4EC3-9030-E0E91B463EA1@illinois.edu> <320fb6e00907240228u797c316ds94297f349d8c097c@mail.gmail.com> <222FBEA4-37CD-4619-9BBD-AB502CF85AD5@illinois.edu> <320fb6e00907240600p7cc41b37wc7c0f748160f109@mail.gmail.com> <20090724162202.GA1512@eniac.jgi-psf.org> <4A6CAF8D.3090403@sendu.me.uk> Message-ID: <80CBC451-90B1-4224-8E13-E78C9F771149@illinois.edu> On Jul 26, 2009, at 2:33 PM, Sendu Bala wrote: > Joel Martin wrote: >> Hello, >> I went to test bioperl-live and Build.PL started updating >> modules in my perl install w/o prompting me, frightening! I >> really need to test modules and other groups here need to >> test them before they're updated so we don't break anything >> when some module's api changes. >> I did svn co of bioperl-live then >> perl Build.PL PREFIX=/scratch/bioperl-live >> saw some output including >> "I think you ran Build.PL directly, so will use CPAN to install >> prerequisites on demand" >> then it went to cpan and tried updating Data::Stag in my perl >> install. >> How can I ask it to prompt me before updating modules ( so >> I can put the updated versions somewhere for it to find >> that isn't the live perl install )? > > Unfortunately, for those few (5) modules that are currently > "absolutely required", it doesn't ask, it just updates. This is bad; we should never assume the intent of the user. I would rather prompt for these, then bail if the answer is 'no' for any 'requires' modules. 
Leave it up to the user to make the decision to update; if they want they will install the latest required modules, otherwise there isn't much we can do. > Data::Stag actually has a comment next to it suggesting it isn't > even "absolutely required". So you could just comment that line out > completely in Build.PL as a temporary fix. If you do any work with UniProt, then Data::Stag *is* required. It is used by Bio::Annotation::TagTree, the replacement for Bio::Annotation::StructuredValue (this has been the case for a couple years now I believe). The Data::Stag update is small but required as well, BTW, unless you want warnings popping up. > Otherwise, since it will be using CPAN to install modules, you can > just arrange beforehand for CPAN to install to your desired location > using the CPAN configuration system. chris From pmr at ebi.ac.uk Mon Jul 27 04:55:43 2009 From: pmr at ebi.ac.uk (Peter Rice) Date: Mon, 27 Jul 2009 09:55:43 +0100 Subject: [Bioperl-l] Open-bio cross-project issues In-Reply-To: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com> Message-ID: <4A6D6B8F.9060108@ebi.ac.uk> Peter C. wrote (to bioperl-l, biopython-l, emboss-dev): > Hi all, > > Peter Rice kindly said he will look into an OBF cross project mailing > list, but in the meantime this has been cross posted to the Biopython, > BioPerl, and EMBOSS development lists. There is a list already for this purpose - open-bio-l I think we will also need a cross-project wiki space on the OBF site. Is there something already used by other projects or should we set something up? I am cross-posting this to other OBF project lists to encourage developers interested in combining efforts to address common problems. This started with FASTQ short read formats, and open-bio-l (a low volume list) has also seen discussion of common test data sets. 
Please sign up to open-bio-l (if you are not there already) and post suggestions for cross-project issues there. The list subscription page is: http://lists.open-bio.org/mailman/listinfo/open-bio-l Please feel free to forward this to any other projects I may have missed (I picked the obvious addresses from the list.open-bio-org server) regards, Peter Rice From David.Messina at sbc.su.se Mon Jul 27 06:13:27 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Mon, 27 Jul 2009 12:13:27 +0200 Subject: [Bioperl-l] QUERY: Correct use of Bio::SeqIO::largefasta? In-Reply-To: References: Message-ID: <628aabb70907270313r1e93cad5vd6c70f5ec808c813@mail.gmail.com> Hi Mark, I'm not familiar with the TFBS code, but these errors you're getting: Output from 1.7 Megabase DNA fragment: > > GET_SEQUENCE: Sequence too long. > LOOP_ON_SEQS: get_sequence failed. > MAIN: loop_on_seqs failed. are coming from the C code that's part of the TFBS package. So I think your inclination to chop up the longer sequences is correct. I modified your code to do that (see below). Also, you might contact the author of the TFBS package (Boris.Lenhard at bccs.uib.no) and see if he has any suggestions for handling longer sequences with his code. 
Dave

------- CUT HERE ------

#!/usr/bin/perl

use strict;
use warnings;

use TFBS::Matrix::PFM;
use Bio::Seq;
use Bio::SeqIO;

my $matrixref = [
    [ 5, 5, 5, 5, 5, 5, 85, 5, 5, 5, 5, 85 ],
    [ 5, 5, 5, 5, 5, 85, 5, 5, 5, 5, 85, 5 ],
    [ 5, 5, 85, 85, 5, 5, 5, 85, 5, 85, 5, 5 ],
    [ 85, 85, 5, 5, 85, 5, 5, 5, 85, 5, 5, 5 ]
];

my $chunklength = 175000;    # set this to whatever you want your chunk size to be

my $pfm = TFBS::Matrix::PFM->new(
    -matrix => $matrixref,
    -name   => "CeRep_matrix_1",
    -ID     => "M1000"
);

my $pwm = $pfm->to_PWM();    # convert to position weight matrix

my $stream = Bio::SeqIO->new( -format => 'fasta', -fh => \*ARGV );

while ( my $seq = $stream->next_seq() ) {
    my $seqlength = $seq->length();

    for ( my $i = 1 ; $i <= $seqlength ; $i += $chunklength ) {
        my $start = $i;
        my $end   = $start + $chunklength - 1;
        if ( $end > $seqlength ) { $end = $seqlength; }

        my $subseq     = $seq->subseq( $start, $end );
        my $display_id = $seq->display_id . '/' . $start . '-' . $end;

        my $chunk = Bio::Seq->new(
            -seq        => $subseq,
            -display_id => $display_id,
            -alphabet   => $seq->alphabet,
        );

        my $siteset = $pwm->search_seq(
            -seqobj    => $chunk,
            -threshold => " 75 %",
        );

        print $siteset->GFF();
    }
}

From biopython at maubp.freeserve.co.uk Mon Jul 27 07:51:13 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 27 Jul 2009 12:51:13 +0100
Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS
In-Reply-To: <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu>
References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com>
    <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com>
    <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com>
    <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu>
Message-ID: <320fb6e00907270451i3d40b4ffq607360cfcb6f6282@mail.gmail.com>

On Sat, Jul 25, 2009 at 8:50 PM, Chris Fields wrote:
>
> From this it could be summarized that converting to sanger format is least
> problematic, as possible issues may be encountered when converting to the
> other variants. We'll need to fix the solexa quality calculations in the
> BioPerl parser as noted in your previous post; I'll work on that.
>

BioPerl SVN (revision 15887, just updated on the off chance you
have committed any fixes recently) also has a problem going the
other way (from FASTQ Sanger to FASTQ Solexa),

$ more sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+
IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!

$ perl bioperl_sanger2solexa.pl < sanger_faked.fastq
@Test PHRED qualities from 40 to 0 inclusive
ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
+Test PHRED qualities from 40 to 0 inclusive
hgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><

Depending on your email viewer this may not be obvious, but
the sequence line is length 41 but the quality line is only 40
characters. And again, I also suspect a problem in the mapping
itself.

Peter

From sirisha at mycib.ac.uk Mon Jul 27 08:32:23 2009
From: sirisha at mycib.ac.uk (Sirisha Gollapudi)
Date: Mon, 27 Jul 2009 13:32:23 +0100
Subject: [Bioperl-l] BioPerl-network problem?
Message-ID: <4A6D9E57.4060205@mycib.ac.uk>

Hi everyone,

I'm trying to use Bio::Network (version 1.6.0) to parse the file
"HPRD_SINGLE_PSIMI_070609.xml" which I've downloaded from the HPRD
website (http://www.hprd.org/download). This is the code I'm using:

#!/usr/bin/perl -w
use lib "/opt/bioperl_1.6.0/lib/perl5/";
use Bio::Network::IO;

use Getopt::Std;
my %opts;
getopts('i:', \%opts);

my $input_file = $opts{i};

my $io = Bio::Network::IO->new(-file => $input_file,
                               -format => 'psi25',
                               -verbose => 1 );

my $network = $io->next_network;

Which gives the following:

No fullName for interactor Aldehyde dehydrogenase 1
Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/perl5//Bio/Network/IO/psi25.pm line 376.
Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/perl5//Bio/Network/IO/psi25.pm line 376.
Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/perl5//Bio/Network/IO/psi25.pm line 376. Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/perl5//Bio/Network/IO/psi25.pm line 376. Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/perl5//Bio/Network/IO/psi25.pm line 376. Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/perl5//Bio/Network/IO/psi25.pm line 376. Segmentation fault I've tried with numerous other PSI-MI v2.5 files, and the only ones that "work" are those from MINT - all the others give the same "Use of uninitialized value" error as above. I've tried DIP, BioGRID and MPact. Any advice on the errors would be great! Best wishes, Sirisha This message has been checked for viruses but the contents of an attachment may still contain software viruses, which could damage your computer system: you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation. From cjfields at illinois.edu Mon Jul 27 09:05:00 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 27 Jul 2009 08:05:00 -0500 Subject: [Bioperl-l] BioPerl-network problem? In-Reply-To: <4A6D9E57.4060205@mycib.ac.uk> References: <4A6D9E57.4060205@mycib.ac.uk> Message-ID: <38E5E546-B9FD-423A-AD97-EE7EA4BEC291@illinois.edu> On Jul 27, 2009, at 7:32 AM, Sirisha Gollapudi wrote: > Hi everyone, > > I'm trying to use Bio::Network (version 1.6.0) to parse the file > "HPRD_SINGLE_PSIMI_070609.xml" which I've downloaded from the HPRD > website (http://www.hprd.org/download). 
This is the code I'm using: > > #!/usr/bin/perl -w > use lib "/opt/bioperl_1.6.0/lib/perl5/"; > use Bio::Network::IO; > > use Getopt::Std; > my %opts; > getopts('i:', \%opts); > > my $input_file = $opts{i}; > > my $io = Bio::Network::IO->new(-file => $input_file, > -format => 'psi25', > -verbose => 1 ); > > my $network = $io->next_network; > > > Which gives the following: > > No fullName for interactor Aldehyde dehydrogenase 1 > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Segmentation fault > > I've tried with numerous other PSI-MI v2.5 files, and the only ones > that "work" are those from MINT - all the others give the same "Use > of uninitialized value" error as above. I've tried DIP, BioGRID and > MPact. Any advice on the errors would be great! > > Best wishes, > > Sirisha The segfault makes me wonder if it could have something to do with the XML parser. Can you file this as a bug report so we can track it? 
http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

chris

From cjfields at illinois.edu Mon Jul 27 09:06:58 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 27 Jul 2009 08:06:58 -0500
Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS
In-Reply-To: <320fb6e00907270451i3d40b4ffq607360cfcb6f6282@mail.gmail.com>
References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com>
    <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com>
    <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com>
    <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu>
    <320fb6e00907270451i3d40b4ffq607360cfcb6f6282@mail.gmail.com>
Message-ID:

On Jul 27, 2009, at 6:51 AM, Peter wrote:

> On Sat, Jul 25, 2009 at 8:50 PM, Chris Fields
> wrote:
>>
>> From this it could be summarized that converting to sanger format
>> is least
>> problematic, as possible issues may be encountered when converting
>> to the
>> other variants. We'll need to fix the solexa quality calculations
>> in the
>> BioPerl parser as noted in your previous post; I'll work on that.
>>
>
> BioPerl SVN (revision 15887, just updated on the off chance you
> have committed any fixes recently) also has a problem going the
> other way (from FASTQ Sanger to FASTQ Solexa),
>
> $ more sanger_faked.fastq
> @Test PHRED qualities from 40 to 0 inclusive
> ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
> +
> IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!
>
> $ perl bioperl_sanger2solexa.pl < sanger_faked.fastq
> @Test PHRED qualities from 40 to 0 inclusive
> ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTN
> +Test PHRED qualities from 40 to 0 inclusive
> hgfedcba`_^]\[ZYXWVUTSRQPONMLKJHGFEDB@><
>
> Depending on your email viewer this may not be obvious, but
> the sequence line is length 41 but the quality line is only 40
> characters. And again, I also suspect a problem in the mapping
> itself.
>
> Peter

I added this (and the others) to our ticket tracking this.
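
For reference, the Sanger-to-Solexa direction discussed above can be sketched in plain Perl. This is only an illustration, not the BioPerl parser code: the sub names `phred_to_solexa` and `sanger_to_solexa_string` are made up for this example, and the choice to floor Solexa scores at -5 (including for PHRED 0, where the logarithm is undefined) is an assumption based on the usual Solexa score range. The point it shows is that every input character yields exactly one output character, so a 41-character quality line should stay 41 characters long.

```perl
use strict;
use warnings;

# Convert one PHRED score to the nearest integer Solexa score using
# Q_solexa = 10 * log10(10**(Q_phred/10) - 1).
# Assumption: scores are floored at -5, which also covers PHRED 0.
sub phred_to_solexa {
    my ($phred) = @_;
    return -5 if $phred <= 0;
    my $solexa = 10 * log( 10**( $phred / 10 ) - 1 ) / log(10);
    $solexa = -5 if $solexa < -5;
    # round to nearest integer (int() truncates toward zero)
    return int( $solexa + ( $solexa < 0 ? -0.5 : 0.5 ) );
}

# Convert a Sanger quality string (ASCII offset 33) to a
# Solexa quality string (ASCII offset 64), one character per base.
sub sanger_to_solexa_string {
    my ($qual) = @_;
    return join '',
        map { chr( 64 + phred_to_solexa( ord($_) - 33 ) ) } split //, $qual;
}

# The quality line from sanger_faked.fastq (PHRED 40 down to 0).
my $sanger = q{IHGFEDCBA@?>=<;:9876543210/.-,+*)('&%$#"!};
my $solexa = sanger_to_solexa_string($sanger);
printf "%d -> %d characters\n", length($sanger), length($solexa);
```

Because the mapping is per character, any length mismatch between sequence and quality lines points at the writer rather than at the formula.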
Looks like solexa conversion either way is borked, which is very likely
an issue with conversion.

chris

From biopython at maubp.freeserve.co.uk Mon Jul 27 09:15:39 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 27 Jul 2009 14:15:39 +0100
Subject: [Bioperl-l] FASTQ support in Biopython, BioPerl, and EMBOSS
In-Reply-To:
References: <320fb6e00907240632h53600e73s63590a8deb4e8ffe@mail.gmail.com>
    <320fb6e00907240653y1d7e7861j98ce45a12f02d9df@mail.gmail.com>
    <320fb6e00907240812l25cd222dxf72fee0e3093f7b3@mail.gmail.com>
    <32BA007E-949A-4BF2-9F73-8FE0F98807CC@illinois.edu>
    <320fb6e00907270451i3d40b4ffq607360cfcb6f6282@mail.gmail.com>
Message-ID: <320fb6e00907270615m438b4230wbaed5895d5ed35d1@mail.gmail.com>

On Mon, Jul 27, 2009 at 2:06 PM, Chris Fields wrote:
>
>>
>> Depending on your email viewer this may not be obvious, but
>> the sequence line is length 41 but the quality line is only 40
>> characters. And again, I also suspect a problem in the mapping
>> itself.
>>
>> Peter
>
> I added this (and the others) to our ticket tracking this. Looks like
> solexa conversion either way is borked, which is very likely an issue with
> conversion.
>
> chris

I'm afraid so. I'll keep an eye on that then (Bug 2857)
http://bugzilla.open-bio.org/show_bug.cgi?id=2857

Peter

From bosborne11 at verizon.net Mon Jul 27 09:12:30 2009
From: bosborne11 at verizon.net (Brian Osborne)
Date: Mon, 27 Jul 2009 09:12:30 -0400
Subject: [Bioperl-l] BioPerl-network problem?
In-Reply-To: <4A6D9E57.4060205@mycib.ac.uk>
References: <4A6D9E57.4060205@mycib.ac.uk>
Message-ID:

Sirisha,

Yes, I've noticed this too and talk about it in the Wiki:

http://www.bioperl.org/wiki/Module:Bio::Graph::IO::psi_xml

Unfortunately there are different "flavors" of PSI MI 2.5, and I
elected not to try to parse a flavor that either didn't match Bioperl's
data model or that didn't precisely adhere to PSI MI. I've found that
another reliable source of PSI 2.5 files is IntAct, try those files.
Just a detail, the real error you're seeing is this: > No fullName for interactor Aldehyde dehydrogenase 1 That tells you what the problem is although, as the Wiki page says, there may be other problems with files from HPRD. Brian O. On Jul 27, 2009, at 8:32 AM, Sirisha Gollapudi wrote: > Hi everyone, > > I'm trying to use Bio::Network (version 1.6.0) to parse the file > "HPRD_SINGLE_PSIMI_070609.xml" which I've downloaded from the HPRD > website (http://www.hprd.org/download). This is the code I'm using: > > #!/usr/bin/perl -w > use lib "/opt/bioperl_1.6.0/lib/perl5/"; > use Bio::Network::IO; > > use Getopt::Std; > my %opts; > getopts('i:', \%opts); > > my $input_file = $opts{i}; > > my $io = Bio::Network::IO->new(-file => $input_file, > -format => 'psi25', > -verbose => 1 ); > > my $network = $io->next_network; > > > Which gives the following: > > No fullName for interactor Aldehyde dehydrogenase 1 > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Use of uninitialized value in string eq at /opt/bioperl_1.6.0/lib/ > perl5//Bio/Network/IO/psi25.pm line 376. > Segmentation fault > > > > I've tried with numerous other PSI-MI v2.5 files, and the only ones > that "work" are those from MINT - all the others give the same "Use > of uninitialized value" error as above. I've tried DIP, BioGRID and > MPact. Any advice on the errors would be great! 
> > Best wishes, > > Sirisha > > This message has been checked for viruses but the contents of an > attachment > may still contain software viruses, which could damage your computer > system: > you are advised to perform your own checks. Email communications > with the > University of Nottingham may be monitored as permitted by UK > legislation. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From rmb32 at cornell.edu Mon Jul 27 11:25:43 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 27 Jul 2009 08:25:43 -0700 Subject: [Bioperl-l] A new name for Bio::Moose? In-Reply-To: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> References: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> Message-ID: <4A6DC6F7.1050209@cornell.edu> Bioperl 2. If the Moose business doesn't work out, throw away the moose code and do something different. But I think it would be good to christen it as the embryonic bioperl 2 to get some momentum around it. Rob Chris Fields wrote: > All, > > Pushed by a recent suggestion by Robert, I am considering changing the > name of the Bio::Moose project to something simpler. I would like to > steer away from naming this directly after the implementation and have > something simpler namespace-wise. > > I have thought of 'Alces' (the genus name for moose), which indicates > both the Bio aspect and the implementation in an more indirect way (and > is a bit shorter). However, I would like to solicit suggestions for > alternatives. The shorter the better, and the 'winner' will receive a > free beverage or so (on me) should we meet! 
> > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Mon Jul 27 12:23:03 2009 From: cjfields at illinois.edu (Chris Fields) Date: Mon, 27 Jul 2009 11:23:03 -0500 Subject: [Bioperl-l] A new name for Bio::Moose? In-Reply-To: <4A6DC6F7.1050209@cornell.edu> References: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> <4A6DC6F7.1050209@cornell.edu> Message-ID: Robert, Herein lies the namespace problem. What namespace would we inhabit with Bioperl 2? Bio2::*? We can't place anything in Bio w/o running into potential problems re: namespace collisions, both locally and on CPAN (yes, I do intend on releasing to CPAN at some point). Bio* primary maintainer is BIOPERLML, so releasing a Bio::Annotation::Foo on CPAN would potentially collide with any BioPerl Bio::Annotation::Foo as an unauthorized release. If all code is sequestered in Bio::Moose::Annotation::Foo, we don't have a problem beyond it requiring more typing, hence the request for a name change (preferrably something short) ;> Anyway, for all intents and purposes, if everything works out it will very likely become bioperl 2.0, and the old Bio::Moose (or whatever) will be deprecated on CPAN in favor of this. Until then, it is simply a side project to explore the feasibility of moving to a Moose-based framework. We can then work on reimplementing the various split-off Bio::* we have been discussing elsewhere. One main side benefit of doing this: it's incredibly freeing. I'm finding numerous places where we could optimize things in BioPerl, even in the case should this not pan out. For instance, memoize and clone locations instead of creating everything de novo. 
Annotations and AnnotationCollections could be lightened considerably if the proper framework were provided, something like Data::Stag or XPath. Also, I have found other instances that will be beneficial to a perl6- based implementation (roles vs interfaces primarily). I'll be posting here and blogging about these along the way. chris On Jul 27, 2009, at 10:25 AM, Robert Buels wrote: > Bioperl 2. If the Moose business doesn't work out, throw away the > moose code and do something different. But I think it would be good > to christen it as the embryonic bioperl 2 to get some momentum > around it. > > Rob > > Chris Fields wrote: >> All, >> Pushed by a recent suggestion by Robert, I am considering changing >> the name of the Bio::Moose project to something simpler. I would >> like to steer away from naming this directly after the >> implementation and have something simpler namespace-wise. >> I have thought of 'Alces' (the genus name for moose), which >> indicates both the Bio aspect and the implementation in an more >> indirect way (and is a bit shorter). However, I would like to >> solicit suggestions for alternatives. The shorter the better, and >> the 'winner' will receive a free beverage or so (on me) should we >> meet! >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu From cjm at berkeleybop.org Mon Jul 27 21:53:09 2009 From: cjm at berkeleybop.org (Chris Mungall) Date: Mon, 27 Jul 2009 18:53:09 -0700 Subject: [Bioperl-l] A new name for Bio::Moose? 
In-Reply-To: References: <45056703-1979-4C8D-AFC7-BF14551D10E4@illinois.edu> <4A6DC6F7.1050209@cornell.edu> Message-ID: On Jul 27, 2009, at 9:23 AM, Chris Fields wrote: > Robert, > > Herein lies the namespace problem. What namespace would we inhabit > with Bioperl 2? Bio2::*? > > We can't place anything in Bio w/o running into potential problems > re: namespace collisions, both locally and on CPAN (yes, I do intend > on releasing to CPAN at some point). Bio* primary maintainer is > BIOPERLML, so releasing a Bio::Annotation::Foo on CPAN would > potentially collide with any BioPerl Bio::Annotation::Foo as an > unauthorized release. If all code is sequestered in > Bio::Moose::Annotation::Foo, we don't have a problem beyond it > requiring more typing, hence the request for a name change > (preferrably something short) ;> > > Anyway, for all intents and purposes, if everything works out it > will very likely become bioperl 2.0, and the old Bio::Moose (or > whatever) will be deprecated on CPAN in favor of this. Until then, > it is simply a side project to explore the feasibility of moving to > a Moose-based framework. We can then work on reimplementing the > various split-off Bio::* we have been discussing elsewhere. > > One main side benefit of doing this: it's incredibly freeing. I'm > finding numerous places where we could optimize things in BioPerl, > even in the case should this not pan out. For instance, memoize and > clone locations instead of creating everything de novo. Annotations > and AnnotationCollections could be lightened considerably if the > proper framework were provided, something like Data::Stag or XPath. As the author of Data::Stag and one-time proponent of less object-y more xml-y modeling I'd say instead go for something a bit more moosey. 
I have a feeling that it is possible to use the meta-level features of Moose to come up with a solution for annotations that would turn out to be (a) flexible, extensible and intuitive, with the right benefits from both strong and weak typing (b) mind-bogglingly complex. Not sure which. But if Bio::Moose is just for experimentation just now might be worth trying. > Also, I have found other instances that will be beneficial to a > perl6-based implementation (roles vs interfaces primarily). I'll be > posting here and blogging about these along the way. > > chris > > On Jul 27, 2009, at 10:25 AM, Robert Buels wrote: > >> Bioperl 2. If the Moose business doesn't work out, throw away the >> moose code and do something different. But I think it would be >> good to christen it as the embryonic bioperl 2 to get some momentum >> around it. >> >> Rob >> >> Chris Fields wrote: >>> All, >>> Pushed by a recent suggestion by Robert, I am considering changing >>> the name of the Bio::Moose project to something simpler. I would >>> like to steer away from naming this directly after the >>> implementation and have something simpler namespace-wise. >>> I have thought of 'Alces' (the genus name for moose), which >>> indicates both the Bio aspect and the implementation in an more >>> indirect way (and is a bit shorter). However, I would like to >>> solicit suggestions for alternatives. The shorter the better, and >>> the 'winner' will receive a free beverage or so (on me) should we >>> meet! 
>>> chris >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue Jul 28 10:59:11 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 28 Jul 2009 09:59:11 -0500 Subject: [Bioperl-l] Fwd: [blast-announce] BLAST 2.2.21 now available References: <715433AE-B519-4B7F-AA43-1853D48F4156@ncbi.nlm.nih.gov> Message-ID: These *should* work, but we'll need to test these just in case to make sure we're catching everything we expect. chris Begin forwarded message: > From: mcginnis > Date: July 28, 2009 8:32:52 AM CDT > To: blast-announce at ncbi.nlm.nih.gov > Subject: [blast-announce] BLAST 2.2.21 now available > > > BLAST 2.2.21 released. > ********************* > > The 2.2.21 version of BLAST has been released. With this release > the new BLAST+ command-line applications are being promoted. The > BLAST+ > applications have a number of advantages over the older applications > that include working more robustly with long sequences and a new > type of > masking (database masking). For details see ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/user_manual.pdf > . > > The new applications can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST > These > applications have been built with the NCBI C++ toolkit. Changes from > the last release are listed below. 
> > The older C toolkit applications (e.g., blastall) are still > available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.21/ > Changes from the last release are listed below. > > Please send questions or comments to blast-help at ncbi.nlm.nih.gov > > c toolkit binary changes: > * corrected a bug in xml output (SB-217) > * corrected a bug with query concatenation in ungapped searches > (SB-263) > * tabular output header for "-m 8" now printed even if there are no > results. (sb-290) > > C++ toolkit binary improvements: > * best hit algorithm, see section 4.5.12 in ftp://ftp.ncbi.nih.gov/blast/executables/blast+/LATEST/user_manual.pdf > * improve culling option performance > * fix mutex problems in BLAST database reader. > * improve performance of database masking option. > > C++ binary changes: > * database masking enabled, see details in ftp://ftp.ncbi.nih.gov/blast/executables/blast+/LATEST/user_manual.pdf > * makeblastdb user-interface improvements > * blastdbcmd can now emit masked fasta for a masked database > > > From shalabh.sharma7 at gmail.com Tue Jul 28 11:46:02 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 28 Jul 2009 11:46:02 -0400 Subject: [Bioperl-l] Percentage Similarity Message-ID: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> Hi All, I have some protein sequences (around 100) i need to find overall percentage similarity between them. How i can do that? Thanks Shalabh From flancer85 at gmail.com Wed Jul 29 14:12:57 2009 From: flancer85 at gmail.com (Lance Ferguson) Date: Wed, 29 Jul 2009 13:12:57 -0500 Subject: [Bioperl-l] BioPerl objects Message-ID: Howdy All, I'm a graduate student who is new to programming. I was wondering if there is any bioperl method that will compare two bioperl objects? As an example comparing two sequence objects by walking along them and outputing differences. Thank you very much for your assistance. 
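
One possible approach to the sequence example, sketched in plain Perl rather than taken from any BioPerl module: walk two equal-length sequence strings and report the positions where they differ. The sub name `seq_differences` is hypothetical, and the sketch assumes the two sequences are already aligned and of equal length. With BioPerl sequence objects, the strings would come from each object's `seq` method.

```perl
use strict;
use warnings;

# Walk two equal-length sequence strings and collect the positions
# (1-based, as in BioPerl coordinates) where the residues differ.
# Comparison is case-insensitive. Returns a list of [pos, a, b] triples.
sub seq_differences {
    my ( $first, $second ) = @_;
    die "sequences differ in length\n"
        unless length($first) == length($second);

    my @diffs;
    for my $i ( 0 .. length($first) - 1 ) {
        my $x = substr( $first,  $i, 1 );
        my $y = substr( $second, $i, 1 );
        push @diffs, [ $i + 1, $x, $y ] if uc $x ne uc $y;
    }
    return @diffs;
}

# Example: report each mismatch between two short DNA strings.
for my $d ( seq_differences( 'ACGTACGT', 'ACGAACTT' ) ) {
    printf "position %d: %s -> %s\n", @$d;
}
```

For sequences of unequal length or with insertions and deletions, the two would need to be aligned first; this sketch does not attempt that.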
Lance Ferguson From cjfields at illinois.edu Wed Jul 29 14:35:50 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 29 Jul 2009 13:35:50 -0500 Subject: [Bioperl-l] Bio::Moose is now.... Message-ID: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> Biome! This makes the most sense to me; as Mark points out the name works as an appropriate acronym (BioPerl with Metaclass Extensions), as well as a biome being (per wikipedia): "a climatically and geographically defined areas of ecologically similar climatic conditions such as communities of plants, animals, and soil organisms ... often referred to as ecosystems". Seems a fitting name for a open-source project. I'll be moving the namespace over to Biome over the next couple of days on github. Now I owe Mark some beer... Now, for extensions, should I assume this will eventually be BioPerl2 (and thus use BioX::*)? Or stick with BiomeX::*? chris PS: Just a quick benchmark for the current Bio::Moose::PrimarySeq implementation (we don't have SeqIO working as of yet, so the benchmark script does the heavy lifting): http://gist.github.com/158317 From sidd.basu at gmail.com Wed Jul 29 16:12:26 2009 From: sidd.basu at gmail.com (Siddhartha Basu) Date: Wed, 29 Jul 2009 15:12:26 -0500 Subject: [Bioperl-l] Re: Bio::Moose is now.... In-Reply-To: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> References: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> Message-ID: <4a70ad2e.06d6720a.102c.ffff8577@mx.google.com> On Wed, 29 Jul 2009, Chris Fields wrote: > Biome! This makes the most sense to me; as Mark points out the name works > as an appropriate acronym (BioPerl with Metaclass Extensions), as well as a > biome being (per wikipedia): > > "a climatically and geographically defined areas of ecologically similar > climatic conditions such as communities of plants, animals, and soil > organisms ... often referred to as ecosystems". > > Seems a fitting name for a open-source project. 
I'll be moving the > namespace over to Biome over the next couple of days on github. Now I owe > Mark some beer... +1 for that name, have to update by repository now. So, how the namespace would be now, everything Bio::Moose => Biome. And Bio::Moose::Role becomes Biome::Role, Bio::Moose::Location becomes Biome::Location. > > Now, for extensions, should I assume this will eventually be BioPerl2 (and > thus use BioX::*)? Or stick with BiomeX::*? I would stay with BiomeX -siddhartha > > chris > > PS: Just a quick benchmark for the current Bio::Moose::PrimarySeq > implementation (we don't have SeqIO working as of yet, so the benchmark > script does the heavy lifting): > > http://gist.github.com/158317 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Jul 29 18:08:21 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 29 Jul 2009 17:08:21 -0500 Subject: [Bioperl-l] Bio::Moose is now.... In-Reply-To: <4a70ad2e.06d6720a.102c.ffff8577@mx.google.com> References: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> <4a70ad2e.06d6720a.102c.ffff8577@mx.google.com> Message-ID: <0B4DD03D-A94E-4065-B883-1CF2F7A3E984@illinois.edu> On Jul 29, 2009, at 3:12 PM, Siddhartha Basu wrote: > On Wed, 29 Jul 2009, Chris Fields wrote: > >> Biome! This makes the most sense to me; as Mark points out the >> name works >> as an appropriate acronym (BioPerl with Metaclass Extensions), as >> well as a >> biome being (per wikipedia): >> >> "a climatically and geographically defined areas of ecologically >> similar >> climatic conditions such as communities of plants, animals, and soil >> organisms ... often referred to as ecosystems". >> >> Seems a fitting name for a open-source project. I'll be moving the >> namespace over to Biome over the next couple of days on github. >> Now I owe >> Mark some beer... 
> > +1 for that name, have to update by repository now. > So, how the namespace would be now, everything Bio::Moose => Biome. > And Bio::Moose::Role becomes Biome::Role, Bio::Moose::Location > becomes > Biome::Location. Yes. I just did a full Bio::Moose->Biome subst in all the files and that seemed to catch all tests. Should be updated in github. Note that everything is in /lib/Biome now (instead of plain /Biome) for Module::Build. >> Now, for extensions, should I assume this will eventually be >> BioPerl2 (and >> thus use BioX::*)? Or stick with BiomeX::*? > I would stay with BiomeX > > -siddhartha Okay; we'll stick with that for now. chris From maj at fortinbras.us Wed Jul 29 19:35:07 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Wed, 29 Jul 2009 19:35:07 -0400 Subject: [Bioperl-l] Bio::Moose is now.... In-Reply-To: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> References: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> Message-ID: <0211794B9987496885106C036D0A72E4@NewLife> Excellent: more opensource beer! I like the south-of-the-border feel of BiomeX. cheers all MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Wednesday, July 29, 2009 2:35 PM Subject: [Bioperl-l] Bio::Moose is now.... > Biome! This makes the most sense to me; as Mark points out the name > works as an appropriate acronym (BioPerl with Metaclass Extensions), > as well as a biome being (per wikipedia): > > "a climatically and geographically defined areas of ecologically > similar climatic conditions such as communities of plants, animals, > and soil organisms ... often referred to as ecosystems". > > Seems a fitting name for a open-source project. I'll be moving the > namespace over to Biome over the next couple of days on github. Now I > owe Mark some beer... > > Now, for extensions, should I assume this will eventually be BioPerl2 > (and thus use BioX::*)? Or stick with BiomeX::*? 
>
> chris
>
> PS: Just a quick benchmark for the current Bio::Moose::PrimarySeq
> implementation (we don't have SeqIO working as of yet, so the
> benchmark script does the heavy lifting):
>
> http://gist.github.com/158317
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From jncline at gmail.com Wed Jul 29 22:06:38 2009
From: jncline at gmail.com (Jonathan Cline)
Date: Wed, 29 Jul 2009 21:06:38 -0500
Subject: [Bioperl-l] Bio::Robotics namespace discussion
Message-ID: <4A71002E.6060507@gmail.com>

I am writing a module for communication with biology robotics, as
discussed recently on #bioperl, and I invite your comments.

Currently this module talks to a Tecan genesis workstation robot
( http://images.google.com/images?q=tecan genesis ). Other vendors are
Beckman Biomek, Agilent, etc. No such modules exist anywhere on the
'net with the exception of some visual basic and labview scripts which I
have found. There are some computational biologists who program for
robots via high level s/w, but these scripts are not distributed as OSS.

With Tecan, there is a datapipe interface for hardware communication, as
an added $$ option from the vendor. I haven't checked other vendors to
see if they likewise have an open communication path for third party
software. Once third-party communication is allowed, the natural next
step is to create a socket client-server; especially as the robot vendor
only supports MS Win and using the local machine has typical Microsoft
issues (like losing real time communication with the hardware due to GUI
animation, bad operating system stability, no unix except cygwin, etc).

On Namespace: I have chosen Bio::Robotics and Bio::Robotics::Tecan.
There are many s/w modules already called 'robots' (web spider robots, chat bots, www automate, etc) so I chose the longer name "robotics" to differentiate this module as manipulating real hardware. Bio::Robotics is the abstraction for generic robotics and Bio::Robotics::(vendor) is the manufacturer-specific implementation. Robot control is made more complex due to the very configurable nature of the work table (placement of equipment, type of equipment, type of attached arm, etc). The abstraction has to be careful not to generalize or assume too much. In some cases, the Bio::Robotics modules may expand to arbitrary equipment such as thermocyclers, tray holders, imagers, etc - that could be a future roadmap plan. Here is some theoretical example usage below, subject to change. At this time I am deciding how much state to keep within the Perl module. By keeping state, some robot programming might be simplified (avoiding deadlock or tracking tip state). In general I am aiming for a more "protocol friendly" method implementation. To use this software with locally-connected robotics hardware: use Bio::Robotics; my $tecan = Bio::Robotics->new("Tecan") || die; $tecan->attach() || die; $tecan->home(); $tecan->pipette(tips => "1", from => "rack1"); $tecan->pipette(aspirate => "1", dispense => "1", from => "sampleTray", to => "DNATray"); ... 
To use this software with remote robotics hardware over the network: # On the local machine, run: use Bio::Robotics; my @connected_hardware = Bio::Robotics->query(); my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in @connected_hardware\n"; $tecan->attach() || die; $tecan->configure("my work table configuration file") || die; # Run the server and process commands while (1) { $error = $tecan->server(passwordplaintext => "0xd290"); if ($tecan->lastClientCommand() =~ /^shutdown/) { last; } } $tecan->detach(); exit(0); # On the remote machine (the client), run: use Bio::Robotics; my $server = "heavybio.dyndns.org:8080"; my $password = "0xd290"; my $tecan = Bio::Robotics->new("Tecan"); $tecan->connect($server, $password) || die; $tecan->home(); $tecan->pipette(tips => "1", from => "rack200"); $tecan->pipette(aspirate => "1", dispense => "1", from => "sampleTray A1", to => "DNATray A2", volume => "45", liquid => "Buffer"); $tecan->pipette(drop => "1"); ... $tecan->disconnect(); exit(0); -- ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## From cjfields at illinois.edu Wed Jul 29 22:15:27 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 29 Jul 2009 21:15:27 -0500 Subject: [Bioperl-l] Bio::Moose is now.... In-Reply-To: <0211794B9987496885106C036D0A72E4@NewLife> References: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> <0211794B9987496885106C036D0A72E4@NewLife> Message-ID: <11067F04-F22C-4613-B3A0-EED463F04DAD@illinois.edu> Well, I am from Texas... chris On Jul 29, 2009, at 6:35 PM, Mark A. Jensen wrote: > Excellent: more opensource beer! I like the south-of-the-border feel > of BiomeX. cheers all > MAJ > ----- Original Message ----- From: "Chris Fields" > > To: "BioPerl List" > Sent: Wednesday, July 29, 2009 2:35 PM > Subject: [Bioperl-l] Bio::Moose is now.... > > >> Biome!
This makes the most sense to me; as Mark points out the >> name works as an appropriate acronym (BioPerl with Metaclass >> Extensions), as well as a biome being (per wikipedia): >> "a climatically and geographically defined areas of ecologically >> similar climatic conditions such as communities of plants, >> animals, and soil organisms ... often referred to as ecosystems". >> Seems a fitting name for a open-source project. I'll be moving >> the namespace over to Biome over the next couple of days on >> github. Now I owe Mark some beer... >> Now, for extensions, should I assume this will eventually be >> BioPerl2 (and thus use BioX::*)? Or stick with BiomeX::*? >> chris >> PS: Just a quick benchmark for the current Bio::Moose::PrimarySeq >> implementation (we don't have SeqIO working as of yet, so the >> benchmark script does the heavy lifting): >> http://gist.github.com/158317 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From Russell.Smithies at agresearch.co.nz Wed Jul 29 22:44:02 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Thu, 30 Jul 2009 14:44:02 +1200 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A71002E.6060507@gmail.com> References: <4A71002E.6060507@gmail.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> I "acquired" an old Biomek 1000 that I'm thinking of modernising. It was originally controlled by a monstrously large but slow pc (IBM Value Point Model 466DX2 computer with Microsoft Windows* Version 3.1) My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) and use software like mach3 www.machsupport.com along with G-code to control it. I come from an engineering background so it seemed like the easy way to me :-) Now I just need a bit of free time to get it working... 
--Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jonathan Cline > Sent: Thursday, 30 July 2009 2:07 p.m. > To: bioperl-l at lists.open-bio.org > Cc: Jonathan Cline > Subject: [Bioperl-l] Bio::Robotics namespace discussion > > I am writing a module for communication with biology robotics, as > discussed recently on #bioperl, and I invite your comments. > > Currently this mode talks to a Tecan genesis workstation robot ( > http://images.google.com/images?q=tecan genesis ). Other vendors are > Beckman Biomek, Agilent, etc. No such modules exist anywhere on the > 'net with the exception of some visual basic and labview scripts which I > have found. There are some computational biologists who program for > robots via high level s/w, but these scripts are not distributed as OSS. > > With Tecan, there is a datapipe interface for hardware communication, as > an added $$ option from the vendor. I haven't checked other vendors to > see if they likewise have an open communication path for third party > software. By allowing third-party communication, then naturally the > next step is to create a socket client-server; especially as the robot > vendor only support MS Win and using the local machine has typical > Microsoft issues (like losing real time communication with the hardware > due to GUI animation, bad operating system stability, no unix except > cygwin, etc). > > > On Namespace: > > I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many > s/w modules already called 'robots' (web spider robots, chat bots, www > automate, etc) so I chose the longer name "robotics" to differentiate > this module as manipulating real hardware. Bio::Robotics is the > abstraction for generic robotics and Bio::Robotics::(vendor) is the > manufacturer-specific implementation. 
Robot control is made more > complex due to the very configurable nature of the work table (placement > of equipment, type of equipment, type of attached arm, etc). The > abstraction has to be careful not to generalize or assume too much. In > some cases, the Bio::Robotics modules may expand to arbitrary equipment > such as thermocyclers, tray holders, imagers, etc - that could be a > future roadmap plan. > > Here is some theoretical example usage below, subject to change. At > this time I am deciding how much state to keep within the Perl module. > By keeping state, some robot programming might be simplified (avoiding > deadlock or tracking tip state). In general I am aiming for a more > "protocol friendly" method implementation. > > > To use this software with locally-connected robotics hardware: > > use Bio::Robotics; > > my $tecan = Bio::Robotics->new("Tecan") || die; > $tecan->attach() || die; > $tecan->home(); > $tecan->pipette(tips => "1", from => "rack1"); > $tecan->pipette(aspirate => "1", dispense => "1", from => "sampleTray", to > => "DNATray"); > ... 
> > To use this software with remote robotics hardware over the network: > > # On the local machine, run: > use Bio::Robotics; > > my @connected_hardware = Bio::Robotics->query(); > my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in > @connected_hardware\n"; > $tecan->attach() || die; > $tecan->configure("my work table configuration file") || die; > # Run the server and process commands > while (1) { > $error = $tecan->server(passwordplaintext => "0xd290"); > if ($tecan->lastClientCommand() =~ /^shutdown/) { > last; > } > } > $tecan->detach(); > exit(0); > > # On the remote machine (the client), run: > use Bio::Robotics; > > my $server = "heavybio.dyndns.org:8080"; > my $password = "0xd290"; > my $tecan = Bio::Robotics->new("Tecan"); > $tecan->connect($server, $mypassword) || die; > $tecan->home(); > $tecan->pipette(tips => "1", from => "rack200"); > $tecan->pipette(aspirate => "1", dispense => "1", > from => "sampleTray A1", to => "DNATray A2", > volume => "45", liquid => "Buffer"); > $tecan->pipette(drop => "1"); > ... > $tecan->disconnect(); > exit(0); > > > > -- > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From cjfields1 at gmail.com Thu Jul 30 09:27:57 2009 From: cjfields1 at gmail.com (Chris Fields) Date: Thu, 30 Jul 2009 08:27:57 -0500 Subject: [Bioperl-l] Perlmonks hacked Message-ID: All, In case there are a few users who haven't been notified, PerlMonks has been hacked rather severely: http://perlmonks.org/ The site was insecure; all passwords were (astonishingly) stored as plain text, are out in the open, and can be easily found (I did, and no, I will not point them out). If anyone has decided to use a common password for, say, PerlMonks and PAUSE (or Amazon, or CitiBank, or...), make sure to change both. Also realize that PerlMonks is NOT https, and that they have NOT patched the security hole yet, so any changed password may be further compromised (don't use a common password). In fact, your PAUSE account may be frozen already due to this: http://use.perl.org/~Alias/journal/39372 It's hard to overstate the intense irony of all this. For some reaction: http://perlhacks.com/2009/07/perl-monks-passwords.php http://blog.afoolishmanifesto.com/archives/1028 Good luck! chris From maj at fortinbras.us Thu Jul 30 11:33:06 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Thu, 30 Jul 2009 11:33:06 -0400 Subject: [Bioperl-l] Perlmonks hacked In-Reply-To: References: Message-ID: <51D9BD1FFCD74247A4CB6EB041F34AE0@NewLife> Men of the cloth do tend to live in their own little world. ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Thursday, July 30, 2009 9:27 AM Subject: [Bioperl-l] Perlmonks hacked > All, > > In case there are a few users who haven't been notified, PerlMonks has > been hacked rather severely: > > http://perlmonks.org/ > > The site was insecure; all passwords were (astonishingly) stored as > plain text, are out in the open, and can be easily found (I did, and no, I > will not point them out).
If anyone has decided to use a common > password for, say Perlmonks and PAUSE (or Amazon, or CitiBank, or...), > make sure to change both. Also realize that PerlMonks is NOT https, > and that they have NOT patched the security hole yet, so any changed > password may be further compromised (don't use a common password). > > In fact, your PAUSE account may be frozen already due to this: > > http://use.perl.org/~Alias/journal/39372 > > It's hard to overstate the intense irony of all this. For some reaction: > > http://perlhacks.com/2009/07/perl-monks-passwords.php > http://blog.afoolishmanifesto.com/archives/1028 > > > > Good luck! > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From ocarnorsk138 at gmail.com Thu Jul 30 11:51:52 2009 From: ocarnorsk138 at gmail.com (Ocar Campos) Date: Thu, 30 Jul 2009 11:51:52 -0400 Subject: [Bioperl-l] Perlmonks hacked In-Reply-To: References: <51D9BD1FFCD74247A4CB6EB041F34AE0@NewLife> Message-ID: [smack]smacking my hand against my head in frustration.... hang on... I didn't have a PerlMonks account...[/smack] It amazes me how a developers' site can have such poor security... anyway, thanks for the info.... Cheers. O'car Campos C. Bioinformatics Engineering Student. University of Talca. Chile.
From jay at jays.net Fri Jul 31 03:29:14 2009 From: jay at jays.net (jay at jays.net) Date: Fri, 31 Jul 2009 03:29:14 -0400 Subject: [Bioperl-l] bioperl reorganization In-Reply-To: <4A60D73A.8030706@jays.net> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A603F82.9020202@cornell.edu> <4A60D73A.8030706@jays.net> Message-ID: <64439d16511f28ba7c28dcde8c8a0458@jays.net> On Fri, 17 Jul 2009 14:55:38 -0500, Jay Hannah wrote: > All Catalyst::* distributions live in the same SVN repository, as > entirely independent, ready-to-ship CPAN distributions: > > http://dev.catalyst.perl.org/repos/Catalyst/ > http://dev.catalyst.perl.org/repos/Catalyst/trunk/ Ah, progress(?). Catalyst has begun the migration to git: http://git.shadowcat.co.uk/gitweb/gitweb.cgi git clone git://git.shadowcat.co.uk/catagits/Catalyst-Action-REST.git Jay Hannah http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah From dan.bolser at gmail.com Fri Jul 31 08:13:45 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 31 Jul 2009 13:13:45 +0100 Subject: [Bioperl-l] problem with t/LocalDB/SeqFeature.t when host ne localhost Message-ID: <2c8757af0907310513q24bec4b0k7bec06b09e069b07@mail.gmail.com> Hi, Whenever I try to do the Bio::DB::GFF or Bio::DB::SeqFeature::Store live database tests: - will run tests with database driver 'mysql' and these settings: Database test Host our.mysql.host DSN dbi:mysql:database=test;host=our.mysql.host User me Password secret I get the following error: DBI connect('database=test','',...) failed: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2) at Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212 sh: -user: command not found The clue is the sh error that follows. 
The contents of the t/LocalDB/SeqFeature_mysql.t file looks like this: system '/usr/bin/perl t/LocalDB/SeqFeature.t -adaptor DBI::mysql -create 1 -temp 1 -dsn dbi:mysql:database=test;host=our.mysql.host -user me -password secret'; I tried the following diff to 'work around' the problem created by the ';' character in the dsn: diff -u t/LocalDB/SeqFeature.t~ t/LocalDB/SeqFeature.t --- t/LocalDB/SeqFeature.t~ 2009-05-11 15:22:07.000000000 +0100 +++ t/LocalDB/SeqFeature.t 2009-07-31 12:56:53.554227455 +0100 @@ -25,7 +25,7 @@ @args = (-adaptor => 'memory') unless @args; SKIP: { -my $db = eval { Bio::DB::SeqFeature::Store->new(@args) }; +my $db = eval { Bio::DB::SeqFeature::Store->new(-adaptor => "DBI::mysql", -create => 1, -temp => 1, -dsn => "dbi:mysql:database=test;host=our.mysql.host", -user => "me", -password => "secret") }; skip "DB load failed? Skipping all! $@", (TEST_COUNT - 2) if $@; ok($db); However, running the above script creates the following error: DBD::mysql::db do failed: BLOB/TEXT column 'tag' used in key specification without a key length at Bio/DB/SeqFeature/Store/DBI/mysql.pm line 450. ok 3 # skip DB load failed? Skipping all! # -------------------- EXCEPTION -------------------- # MSG: BLOB/TEXT column 'tag' used in key specification without a key length # STACK Bio::DB::SeqFeature::Store::DBI::mysql::_create_table Bio/DB/SeqFeature/Store/DBI/mysql.pm:450 # STACK Bio::DB::SeqFeature::Store::DBI::mysql::init_tmp_database Bio/DB/SeqFeature/Store/DBI/mysql.pm:439 # STACK Bio::DB::SeqFeature::Store::DBI::mysql::init Bio/DB/SeqFeature/Store/DBI/mysql.pm:223 # STACK Bio::DB::SeqFeature::Store::new Bio/DB/SeqFeature/Store.pm:360 # STACK (eval) t/LocalDB/SeqFeature.t:28 # STACK toplevel t/LocalDB/SeqFeature.t:28 # ------------------------------------------- # I'm not sure how to proceed from here. Thanks for any hints, Dan. 
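A note on the "sh: -user: command not found" line in the report above: Perl's system() hands a single command string that contains shell metacharacters to sh -c, and sh reads the unquoted ';' inside the DSN as a command separator. The sketch below reproduces only that splitting, with printf standing in for the test script (a stand-in chosen for illustration; nothing here touches MySQL or BioPerl). Quoting the DSN in the generated command, or calling system() with a list of arguments so that no shell is involved, should keep the DSN in one piece.

```shell
#!/bin/sh
# DSN copied from the report; it contains ';', which the shell treats
# as a command separator unless it is quoted.
dsn='dbi:mysql:database=test;host=our.mysql.host'

# Unquoted: the command string is split at ';'. The first command sees a
# truncated DSN; the remainder becomes "host=... -user me", i.e. an
# attempt to run a program called "-user" (error output discarded here).
unquoted=$(sh -c "printf '%s\n' -dsn $dsn -user me" 2>/dev/null) || true

# Quoted: the whole DSN reaches the program as a single argument.
quoted=$(sh -c "printf '%s\n' -dsn '$dsn' -user me")

printf 'unquoted arguments:\n%s\n\nquoted arguments:\n%s\n' "$unquoted" "$quoted"
```

In the unquoted case the program only ever receives "-dsn" and "dbi:mysql:database=test", which matches the connection attempt to the local socket with an empty user name shown in the DBI error.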
From rmb32 at cornell.edu Fri Jul 31 10:53:08 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 31 Jul 2009 07:53:08 -0700 Subject: [Bioperl-l] Bio::Moose is now.... In-Reply-To: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> References: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> Message-ID: <4A730554.704@cornell.edu> I think this sounds great. GREAT news about the Biome::PrimarySeq performance. Rob Chris Fields wrote: > Biome! This makes the most sense to me; as Mark points out the name > works as an appropriate acronym (BioPerl with Metaclass Extensions), as > well as a biome being (per wikipedia): > > "a climatically and geographically defined areas of ecologically similar > climatic conditions such as communities of plants, animals, and soil > organisms ... often referred to as ecosystems". > > Seems a fitting name for a open-source project. I'll be moving the > namespace over to Biome over the next couple of days on github. Now I > owe Mark some beer... > > Now, for extensions, should I assume this will eventually be BioPerl2 > (and thus use BioX::*)? Or stick with BiomeX::*? 
> > chris > > PS: Just a quick benchmark for the current Bio::Moose::PrimarySeq > implementation (we don't have SeqIO working as of yet, so the benchmark > script does the heavy lifting): > > http://gist.github.com/158317 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at illinois.edu Fri Jul 31 11:09:32 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 31 Jul 2009 10:09:32 -0500 Subject: [Bioperl-l] problem with t/LocalDB/SeqFeature.t when host ne localhost In-Reply-To: <2c8757af0907310513q24bec4b0k7bec06b09e069b07@mail.gmail.com> References: <2c8757af0907310513q24bec4b0k7bec06b09e069b07@mail.gmail.com> Message-ID: Dan, Can you file this as a BioPerl bug? I'm planning on driving towards releasing 1.6.1 alpha1 soon (next few weeks) and I would like to get this one fixed. chris On Jul 31, 2009, at 7:13 AM, Dan Bolser wrote: > Hi, > > Whenever I try to do the Bio::DB::GFF or Bio::DB::SeqFeature::Store > live database tests: > > - will run tests with database driver 'mysql' and these settings: > Database test > Host our.mysql.host > DSN dbi:mysql:database=test;host=our.mysql.host > User me > Password secret > > > I get the following error: > > DBI connect('database=test','',...) failed: Can't connect to local > MySQL server through socket '/var/lib/mysql/mysql.sock' (2) at > Bio/DB/SeqFeature/Store/DBI/mysql.pm line 212 > sh: -user: command not found > > > The clue is the sh error that follows. 
The contents of the > t/LocalDB/SeqFeature_mysql.t file looks like this: > > system '/usr/bin/perl t/LocalDB/SeqFeature.t -adaptor DBI::mysql > -create 1 -temp 1 -dsn dbi:mysql:database=test;host=our.mysql.host > -user me -password secret'; > > > I tried the following diff to 'work around' the problem created by the > ';' character in the dsn: > > diff -u t/LocalDB/SeqFeature.t~ t/LocalDB/SeqFeature.t > --- t/LocalDB/SeqFeature.t~ 2009-05-11 15:22:07.000000000 +0100 > +++ t/LocalDB/SeqFeature.t 2009-07-31 12:56:53.554227455 +0100 > @@ -25,7 +25,7 @@ > @args = (-adaptor => 'memory') unless @args; > > SKIP: { > -my $db = eval { Bio::DB::SeqFeature::Store->new(@args) }; > +my $db = eval { Bio::DB::SeqFeature::Store->new(-adaptor => > "DBI::mysql", -create => 1, -temp => 1, -dsn => > "dbi:mysql:database=test;host=our.mysql.host", -user => "me", > -password => "secret") }; > skip "DB load failed? Skipping all! $@", (TEST_COUNT - 2) if $@; > ok($db); > > > However, running the above script creates the following error: > > DBD::mysql::db do failed: BLOB/TEXT column 'tag' used in key > specification without a key length at > Bio/DB/SeqFeature/Store/DBI/mysql.pm line 450. > ok 3 # skip DB load failed? Skipping all! > # -------------------- EXCEPTION -------------------- > # MSG: BLOB/TEXT column 'tag' used in key specification without a > key length > # STACK Bio::DB::SeqFeature::Store::DBI::mysql::_create_table > Bio/DB/SeqFeature/Store/DBI/mysql.pm:450 > # STACK Bio::DB::SeqFeature::Store::DBI::mysql::init_tmp_database > Bio/DB/SeqFeature/Store/DBI/mysql.pm:439 > # STACK Bio::DB::SeqFeature::Store::DBI::mysql::init > Bio/DB/SeqFeature/Store/DBI/mysql.pm:223 > # STACK Bio::DB::SeqFeature::Store::new Bio/DB/SeqFeature/Store.pm:360 > # STACK (eval) t/LocalDB/SeqFeature.t:28 > # STACK toplevel t/LocalDB/SeqFeature.t:28 > # ------------------------------------------- > # > > > I'm not sure how to proceed from here. > > Thanks for any hints, > > Dan. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Jul 31 22:22:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 31 Jul 2009 21:22:17 -0500 Subject: [Bioperl-l] Bio::Moose is now.... In-Reply-To: <4A730554.704@cornell.edu> References: <526BD1BD-5887-4035-B3EB-ED2B426ED727@illinois.edu> <4A730554.704@cornell.edu> Message-ID: <0FB9B117-185B-4995-A63F-1BA14313DED0@illinois.edu> I think, before any CPAN release, I want to nip the monolith in the bud. Just have Meta/Root and simple interfaces (roles) describing classes in Biome, actual implementations or other additions going into BiomeX::*. The current Biome::Location/Annotation/etc would eventually be moved into their own BiomeX repos. Bundle with Task::Biome (maybe add some automated bundling options). Sound familiar? I'll try to get a ROADMAP up next week. chris On Jul 31, 2009, at 9:53 AM, Robert Buels wrote: > I think this sounds great. GREAT news about the Biome::PrimarySeq > performance. > > Rob > > Chris Fields wrote: >> Biome! This makes the most sense to me; as Mark points out the >> name works as an appropriate acronym (BioPerl with Metaclass >> Extensions), as well as a biome being (per wikipedia): >> "a climatically and geographically defined areas of ecologically >> similar climatic conditions such as communities of plants, animals, >> and soil organisms ... often referred to as ecosystems". >> Seems a fitting name for a open-source project. I'll be moving the >> namespace over to Biome over the next couple of days on github. >> Now I owe Mark some beer... >> Now, for extensions, should I assume this will eventually be >> BioPerl2 (and thus use BioX::*)? Or stick with BiomeX::*? 
>> chris >> PS: Just a quick benchmark for the current Bio::Moose::PrimarySeq >> implementation (we don't have SeqIO working as of yet, so the >> benchmark script does the heavy lifting): >> http://gist.github.com/158317 >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jncline at gmail.com Fri Jul 31 23:24:56 2009 From: jncline at gmail.com (Jonathan Cline) Date: Fri, 31 Jul 2009 22:24:56 -0500 Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl Message-ID: I recently mentioned working on Bio::Robotics for Tecan. Vendors being MS-Win specific, the vendor software allows third-party software communication through a named pipe (the literal filename is "\\\\.\\pipe\\gemini" where the multiple backslashes are MS specific and this pseudo-pipe is opened with sysopen() ). This is broken under cygwin-perl due to cygwin's method of handling paths -- the sysopen fails. However, it works under ActiveState Perl, and communication through the named pipe (to the robot hardware) is OK. The standard workaround is usually to use cygwin bash, and force the PATH to use ActiveState perl. (Typical MS Windows incompatibility problem.) The issue is: Perl module libraries for CPAN work under cygwin-perl (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN module use, or "make test", result in a bad list of incompatibility problems.
Yet ActiveState Perl is required for communicating to the vendor application (unless there is some workaround to raw filesystem access in cygwin-perl that I haven't found in 2 days of working this). The stand-alone scripts I have work fine to access the named pipe (using ActiveState Perl) since the standalone scripts have no module INC dependencies, no CPAN module test harness, etc etc. This isn't specifically a Bio:: issue, though if anyone has suggestions please email. I could try msys and see if it handles the named-pipe-special-file better, if msys has an msys-perl distribution. -- ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## From maj at fortinbras.us Fri Jul 31 23:50:24 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 31 Jul 2009 23:50:24 -0400 Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl In-Reply-To: References: Message-ID: Jonathan- I have an utter kludge for this very problem, if I understand it correctly. The kludge works for me a majority of the time. Be warned that this is in no way optimized or clever; there is no warranty expressed or implied... Two scripts are below; one runs the other. Together they convert a makefile generated by ActiveState into one suitable for a cygwin make. When the cygwin make is run after conversion, the installation occurs in the ActiveState locations. A demo session follows (note that 'asperl' is an alias, defined as alias asperl=/cygdrive/c/Perl/bin/perl ) cygwin session: $ wget http://search.cpan.org/CPAN/authors/id/N/NI/NI-S/Devel-Leak-0.03.tar.gz $ tar -xzf Devel-Leak-0.03.tar.gz $ cd Devel-Leak-0.03 $ asperl Makefile.PL $ as2cyg.sh $ make $ make test $ make install This is how I constantly install CPAN modules "by hand" into my ActiveState instance. I really hope this helps. The scripts are below. 
cheers and good luck- Mark cygwin paths...note these are both in $PATH /usr/local/bin/as2cyg.sh : #!/usr/bin/bash TF=$(uuidgen) conv-ASmake.sh Makefile > $TF mv $TF Makefile #end of as2cyg.sh /usr/local/bin/conv-ASMake.sh : (note this is a sed script) #!/usr/bin/sed -f #converting an ActiveState PERL Makefile to run under cygwin make: s/^DIRFILESEP = ^\\/DIRFILESEP = \// s/^NOOP = rem/NOOP = :/ # -or- NOOP = echo -n # byebye volume s/C:/\/cygdrive\/c/ # sed to convert directory \ to / s/\([\)0-9a-zA-Z.]\)\\\([\(0-9a-zA-Z]\)/\1\/\2/g # convert full perl s/\/usr\/bin\/perl/\/cygdrive\/c\/Perl\/bin\/perl/ # a key conversion for DOC_INSTALL action /^DESTINSTALLVENDORHTMLDIR/ a\ DECYGDESTINSTALLARCHLIB = $(subst /cygdrive/c,c:,$(DESTINSTALLARCHLIB)) # --- MakeMaker tools_other section: # let cygwin do native linux commands /^MAKE/ c\ MAKE = make /^CHMOD/ c\ CHMOD = chmod /^CP/ c\ CP = cp /^MV/ c\ #end of conv-ASMake.sh ----- Original Message ----- From: "Jonathan Cline" To: Cc: Sent: Friday, July 31, 2009 11:24 PM Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl >I recently mentioned working on Bio::Robotics for Tecan. Vendors > being MS-Win specific, the vendor software allows third-party software > communication through a named pipe (the literal filename is > "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific > and this pseudo-pipe is opened with sysopen() ). This is broken under > cygwin-perl due to cygwin's method of handling paths -- the sysopen > fails. However it works under ActiveState Perl and communication > through the named pipe (to the robot hardware) is OK. The standard > workaround is usually to use cygwin bash, and force the PATH to use > ActiveState perl. (Typical MS Windows incompatibility problem.) The > issue is: Perl module libraries for CPAN work under cygwin-perl > (only?). 
Attempts to run "activestate-perl Makefile.PL" for CPAN > module use, or "make test", result in a bad list of incompatibility > problems. Yet ActiveState Perl is required for communicating to the > vendor application (unless there is some workaround to raw filesystem > access in cygwin-perl that I haven't found in 2 days of working this). > The stand-alone scripts I have work fine to access the named pipe > (using ActiveState Perl) since the standalone scripts have no module > INC dependencies, no CPAN module test harness, etc etc. > > This isn't specifically a Bio:: issue, though if anyone has > suggestions please email. I could try msys and see if it handles the > named-pipe-special-file better, if msys has an msys-perl distribution. > > -- > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >
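The two path rewrites in conv-ASMake.sh can be tried on a single line before pointing the script at a real Makefile. This is a minimal sketch, assuming a made-up sample line (the PERL variable name is only illustrative; C:\Perl\bin\perl is the ActiveState location from the asperl alias, written Windows-style). Only the "byebye volume" and backslash-to-slash expressions are copied from the script.

```shell
#!/bin/sh
# Sample Makefile-style line, chosen for illustration.
sample='PERL = C:\Perl\bin\perl'

# Both substitutions are copied from conv-ASMake.sh:
#   1. replace the drive letter with its cygwin mount point
#   2. turn a backslash sitting between two path characters into a slash
converted=$(printf '%s\n' "$sample" | sed \
    -e 's/C:/\/cygdrive\/c/' \
    -e 's/\([\)0-9a-zA-Z.]\)\\\([\(0-9a-zA-Z]\)/\1\/\2/g')

printf '%s\n' "$converted"
```

For this input the result is "PERL = /cygdrive/c/Perl/bin/perl". Because each match consumes one character on either side of the backslash, a path with a one-character component (a\b\c) would keep its second backslash, which may be one reason the kludge works only most of the time.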