Core 1.4.0 1.5.0 delta

These are detailed notes on changes made between bioperl-release-1-4-0 and bioperl-release-1-5-0.

Bio::Align::DNAStatistics
Change API to return Bio::Matrix::MatrixI rather than arrayrefs; this is a step closer to having a pure-Perl replacement for dnadist (and eventually protdist)
Fixed TajimaNei, added Uncorrected
Code cleanup - use substr rather than arrays for speed. Do some indenting cleanup
Update some docs
Some more doc updates
Chris Dwan's fixes
Support for calculating Kimura variance per Wu and Li 1995 and Kimura 1980
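For orientation, the Kimura (1980) two-parameter distance and its variance named in the entry above can be sketched in plain Perl. This is the textbook formulation, not the code in Bio::Align::DNAStatistics; the function name k2p_distance and the way non-ACGT columns are skipped are assumptions made for illustration.

```perl
use strict;
use warnings;

# Kimura two-parameter distance and its sampling variance for two aligned
# DNA strings. Columns containing anything other than A, C, G or T are skipped.
sub k2p_distance {
    my ($s1, $s2) = @_;
    die "sequences differ in length\n" unless length($s1) == length($s2);

    my %is_purine = (A => 1, G => 1, C => 0, T => 0);
    my ($n, $ts, $tv) = (0, 0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $x = uc substr($s1, $i, 1);
        my $y = uc substr($s2, $i, 1);
        next unless exists $is_purine{$x} && exists $is_purine{$y};
        $n++;
        next if $x eq $y;
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if ($is_purine{$x} == $is_purine{$y}) { $ts++ } else { $tv++ }
    }
    die "no comparable columns\n" unless $n;

    my ($P, $Q) = ($ts / $n, $tv / $n);          # transition / transversion proportions
    my ($w1, $w2) = (1 - 2 * $P - $Q, 1 - 2 * $Q);
    # Bail out when the sequences are too divergent for the logarithms.
    die "distance is undefined for these sequences\n" unless $w1 > 0 && $w2 > 0;

    my $d = -0.5 * log($w1) - 0.25 * log($w2);
    my ($c1, $c2) = (1 / $w1, 1 / $w2);
    my $c3 = ($c1 + $c2) / 2;
    my $var = ($c1**2 * $P + $c3**2 * $Q - ($c1 * $P + $c3 * $Q)**2) / $n;
    return ($d, $var);
}
```

Called as my ($d, $var) = k2p_distance($seq1, $seq2); the function dies rather than returning a value when the distance cannot be computed.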
Bio::Align::PairwiseStatistics
Quiet undef warnings
Documentation update
Made the SYNOPSIS section compile, helped by "cd maintenance; ./modules.pl --synopsis"
Simpler routine for calculating number of gapped columns -- fixes Issue #1676
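A gapped-column count of the kind mentioned above can be sketched as follows. This is an illustration only, not the routine in Bio::Align::PairwiseStatistics; the name count_gapped_columns and the choice to treat both '-' and '.' as gaps are assumptions.

```perl
use strict;
use warnings;

# Count alignment columns in which at least one sequence has a gap.
# Both '-' and '.' are treated as gap characters here (an assumption).
sub count_gapped_columns {
    my @seqs = @_;
    my %gapped;
    for my $seq (@seqs) {
        while ($seq =~ /[-.]/g) {
            $gapped{ pos($seq) - 1 } = 1;   # zero-based column of this gap
        }
    }
    return scalar keys %gapped;
}
```

The hash keyed by column index means a column gapped in several sequences is still counted once.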
Bio::Align::ProteinStatistics
Perl implementation of protein distance
Bail when distance is going to be too far to calculate
Bio::Align::Utilities
98% speedup for aa_to_dna_aln, operate on strings instead of sequences, and just walk down each sequence rather than going through the alignment structure. Old implementation is left intact and renamed OLD_aa_to_dna_aln
Faster reverse translate
Add bootstrap_replicates method which will generate replicates for non-parametric bootstrap. Also remove old aa->codon alignment implementation, new one is good enough.
Make sure Id gets set for new randomized sequence in bootstrap
Some more doc updates
Tiny bug: only the gap character '-' was considered; '.' can occur as well
Transform gaps
If we've gone beyond the end we're in same state as a gap
Deal with missing trailing gaps
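The non-parametric bootstrap behind bootstrap_replicates resamples alignment columns with replacement. A minimal sketch, assuming the alignment is a plain hash of id => aligned string rather than a Bio::SimpleAlign object; bootstrap_replicate and its optional $pick argument are hypothetical names, not the Bio::Align::Utilities API.

```perl
use strict;
use warnings;

# Build one bootstrap replicate of an alignment given as { id => aligned_string }.
# Columns are drawn with replacement, and the same draw is applied to every row.
# $pick is an optional code ref returning an integer in 0 .. $n-1 (defaults to rand).
sub bootstrap_replicate {
    my ($aln, $pick) = @_;
    $pick ||= sub { int(rand($_[0])) };
    my @ids = sort keys %$aln;
    die "empty alignment\n" unless @ids;
    my $len  = length $aln->{ $ids[0] };
    my @cols = map { $pick->($len) } 1 .. $len;

    my %replicate;
    for my $id (@ids) {
        $replicate{$id} = join '', map { substr($aln->{$id}, $_, 1) } @cols;
    }
    return \%replicate;
}
```

Passing the column picker in as a code ref keeps the sketch testable with a fixed sequence of draws; calling the function repeatedly yields as many replicates as needed.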
Bio::AlignIO
Added a flag that forces set_displayname_flat to be set on initialization
Remove repetitive, correct examples, clarify
Made the SYNOPSIS section compile, helped by "cd maintenance; ./modules.pl --synopsis"
Bio::AlignIO::bl2seq
Some documentation update to record the -report_type information
Various HEAD changes committed to the branch
Bail if no HSP - how we got here I have no idea
Bio::AlignIO::clustalw
Modified to accommodate concatenated alignments
Various HEAD changes committed to the branch
Match SQUID clustalw 1.5 output
Better differentiation from SQUID's Clustal 1.5 and real Clustalw
Use the new flag which will allow one to specify that displayname should be flat (and used in bp_sreformat with the --special option)
Change regular expression to address Issue #1637
Grab the right number of bases off of the matchline as well. Allow clustalw version to be customized (global variable for now, can be tweaked later on if we want this)
Bio::AlignIO::emboss
Push score into the SimpleAlign object as well
Bio::AlignIO::fasta
Started adding sequence alignments that store sequences in a file. Coding by Albert Vilella
Various HEAD changes committed to the branch
Force displayname flat
Bio::AlignIO::largemultifasta
Started adding sequence alignments that store sequences in a file. Coding by Albert Vilella
Remove unnecessary length() calls, more docs
Minor doc and cosmetic changes
Various HEAD changes committed to the branch
Bio::AlignIO::maf
Added strand designation to the Bio::LocatableSeq object as per Issue #1721.
Brad F's maf Issue #1772
Fixed maf.pm so that co-ordinates are one-based rather than zero-based.
Bio::AlignIO::mega
Nathan Haigh's suggestion for making next_aln call unmatch on an alignment before returning it
Bio::AlignIO::meme
So far so good, but I don't know if the meme output file used in test has all the conditions we need
Capture strand column when it's there
Clean up - there were some warnings in 'make test'
Bio::AlignIO::msf
Wes Barris's suggested fix to make msf have stackpack like location information
Various HEAD changes committed to the branch
Format
Bio::AlignIO::nexus
Add documentation for _initialize/new, add a feature to disable writing of 'symbols' part of nexus header in order to support Mr.Bayes nexus flavor
Parse treebase nexus files a little better
Various HEAD changes committed to the branch
Better nexus parsing
Avoid dealing with undefs if we hit the end of the file first
Nathan's patch
More from Nathan
Nathan's additions
Nathan's fix
Dos2unix
Bio::AlignIO::phylip
Started adding sequence alignments that store sequences in a file. Coding by Albert Vilella
Various HEAD changes committed to the branch
Proper parse of non-interleaved files now
Adding tag_length and flag_SI, and taking care of line_length as parameters to pretty write_aln to phylip
Not zero for tag_length - nonsense
Bio::AlignIO::po
Matthew Bettis's po alignment parser/writer
Matthew's patch
Prints 'VERSION=bioperl' if the alignment source field is blank (poa chokes on blank 'VERSION=' information).
Bio::AlignIO::selex
Support '*' in the alignment as produced by matches to model in HMMER. Also retool loops to be simpler and capture the description (DE) from Stockholm
Bio::AlignIO::stockholm
Support '*' in the alignment as produced by matches to model in HMMER. Also retool loops to be simpler and capture the description (DE) from Stockholm
Bio::AnalysisI
Fix Revision string problems
Bio::AnnotatableI
Bio::SeqFeatureI inherits Bio::AnnotatableI, NOT Bio::AnnotationCollectionI. The *_tag_* methods are in Bio::AnnotatableI, and internally defer to Bio::AnnotatableI->annotation->some_analagous_mapped_function(). Deprecation warnings are commented out until 1.6. Bio::AnnotatableI now keeps a tag->annotation_type registry to allow new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* methods map directly onto Bio::AnnotationI's Bio::AnnotationCollectionI instance. Unflattener and Unflattener2 tests pass with no changes.
<rant> No no no no no. What the hell! This was supposed to be code working towards a stable branch! Don't add a dependency that makes people have to install Graph::Directed to parse sequence files or use SearchIO. Bio::SeqFeature::Generic is a CORE part of the toolkit -- all changes should at least preserve its functionality and all the things it depends on. </rant> Points to the guts-l folks and anyone who actually is reading log commit messages in the future.
FTHelper uses SeqFeatureI calls rather than SeqFeature::Generic-specific calls.
Bio::Annotation::Comment
Fixed two minor bugs as reported by Peter v. Heusden.
Bio::Annotation::DBLink
Included version and possibly optional ID into the as_text() method.
Bio::Annotation::Reference
Allow initialization of all the components in new and rely on inheritance for the tagname field
Added recognition and capability of dealing with RG line in swissprot format.
Bio::Annotation::SimpleValue
Fixed two minor bugs as reported by Peter v. Heusden.
Bio::SeqFeatureI inherits Bio::AnnotatableI, NOT Bio::AnnotationCollectionI. The *_tag_* methods are in Bio::AnnotatableI, and internally defer to Bio::AnnotatableI->annotation->some_analagous_mapped_function(). Deprecation warnings are commented out until 1.6. Bio::AnnotatableI now keeps a tag->annotation_type registry to allow new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* methods map directly onto Bio::AnnotationI's Bio::AnnotationCollectionI instance. Unflattener and Unflattener2 tests pass with no changes.
Bio::Annotation::Target
Much better support for Targets in Bio::FeatureIO::gff via a new Bio::Annotation::Target object.
Allen was right--Bio::Annotation::Target should inherit from Bio::Range
Changes to work with Scott's test gff. Target processing seems broken.
Fixed minor bugs in Target.pm and Target related bugs in gff.pm
Made the SYNOPSIS section compile, helped by "cd maintenance; ./modules.pl --synopsis"
Bio::AnnotationCollectionI
Doc fix
Updated Bio::AnnotationCollection to implement *_tag_* methods with deprecation warning. These were taken from Bio::SeqFeatureI and Bio::SeqFeature::Generic. *_tag_* methods in Bio::SeqFeature::Annotated are now implemented by explicit pass-through to the contained Bio::Annotation::Collection instance.
Bio::SeqFeatureI inherits Bio::AnnotatableI, NOT Bio::AnnotationCollectionI. The *_tag_* methods are in Bio::AnnotatableI, and internally defer to Bio::AnnotatableI->annotation->some_analagous_mapped_function(). Deprecation warnings are commented out until 1.6. Bio::AnnotatableI now keeps a tag->annotation_type registry to allow new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* methods map directly onto Bio::AnnotationI's Bio::AnnotationCollectionI instance. Unflattener and Unflattener2 tests pass with no changes.
Bio::AnnotationI
Added an ontology document registry. The ontology store is now usable outside the Bio::OntologyIO subsystem as a class to create ontology structures on demand: just ask the store for what you want by name, e.g. my $store = Bio::Ontology::OntologyStore->get_instance(); my $so = $store->get_ontology('Sequence Ontology'); File fetching happens behind the scenes with the help of recent modifications to Bio::Root::IO to support URLs, and Bio::Ontology::DocumentRegistry (just committed).
Bio::Assembly::Contig
Continuing to make Singlets work with the Assembly framework.
Bio::Assembly::IO::ace
Added a routine to read singlets in the next_seq method. Added (well, uncommented really) a routine for reading from the BS line to get chromat file information from read information.
Creating the Singlet object. I'm not sure what has changed in sequencetrace.t but I'll look into that right away.
Singlet now works as expected. See t/singlet.t for examples of what I am trying to accomplish.
Continuing to make Singlets work with the Assembly framework.
Bio::Assembly::Scaffold
Added a routine to read singlets in the next_seq method. Added (well, uncommented really) a routine for reading from the BS line to get chromat file information from read information.
Creating the Singlet object. I'm not sure what has changed in sequencetrace.t but I'll look into that right away.
Bio::Assembly::Singlet
This is a module to model a Singlet. I proposed creating this module on the mailing list in April 2004 and there were no objections so here it is.
Creating the Singlet object. I'm not sure what has changed in sequencetrace.t but I'll look into that right away.
Singlet now works as expected. See t/singlet.t for examples of what I am trying to accomplish.
Continuing to make Singlets work with the Assembly framework.
Bio::Biblio
Access pubmed via eutils (see http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html). It would be nice to make Bio::DB::Biblio return Bio::Biblio::Ref objects with Bio::DB::MeSH attached to them, but the Bio::Biblio::IO::pubmedxml parser doesn't currently handle enough of the pubmed XML elements to do this.
Edits
Fix Revision string problems
Bio::Biblio::IO::medline2ref
Fix Revision string problems
Bio::Biblio::IO::medlinexml
Fix Revision string problems
Bio::Biblio::IO::pubmed2ref
Fix Revision string problems
Bio::Biblio::IO::pubmedxml
Fix Revision string problems
Bio::Cluster::UniGene
Updated species.
Adding species update to branch.
Added parsing of HOMOL tag, added new species, tidied up tests
Merely whitespace formatting.
Bio::Cluster::UniGeneI
Added parsing of HOMOL tag, added new species, tidied up tests
Bio::ClusterIO::unigene
Added parsing of HOMOL tag, added new species, tidied up tests
Fixed ClusterIO::unigene not to set version unless it could parse one out.
Bio::CodonUsage::Table
probable_codons() added - returns codons with frequency above a specified threshold
Bio::Coordinate::Collection
Some documentation fixes Cood -> Coord
Docu and warning update from main trunk
Bio::Coordinate::GeneMapper
Pass verbose flag through to creation of locations as well - helps when debugging
Also match on stop codon, not just aa/nt - this removes error where coordinate::pairs were not mapping correctly from a pairwise alignment
Push verbosity status down to child objects
Bio::Coordinate::Graph
Old docu update
Bio::Coordinate::Pair
Also match on stop codon, not just aa/nt - this removes error where coordinate::pairs were not mapping correctly from a pairwise alignment
Prettiness update
Bio::Coordinate::Utils
Cleanup for other types of gap chars
Don't match '.' - that is special code here - only match '-'
Because we detect gaps via '-' translate gaps to '-' before getting gap line - some MSAs have this
Also match on stop codon, not just aa/nt - this removes error where coordinate::pairs were not mapping correctly from a pairwise alignment
Test with warning message
New code to map from sequence to position in alignment rather than position in the other sequence for a pair
New method to generate coordinate mapper from sequence to the alignment space- merged from main trunk
Fix method name in docu
Bio::DB::Biblio::biofetch
Fix Revision string problems
Bio::DB::Biblio::eutils
Access pubmed via eutils (see http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html). It would be nice to make Bio::DB::Biblio return Bio::Biblio::Ref objects with Bio::DB::MeSH attached to them, but the Bio::Biblio::IO::pubmedxml parser doesn't currently handle enough of the pubmed XML elements to do this.
Rather use offset of zero for cursor
Fixed a bug in get_next(). Made sure you get an exception if you try this again.
Added ability to search other dbs, such as pmc, in addition to pubmed, which was the only option before.
More docs
Bio::DB::Biblio::soap
Fix Revision string problems
Bio::DB::BiblioI
Fix Revision string problems
Bio::DB::BioFetch
Little code cleanup after the testing
Bio::DB::EMBL
Nathan's additions
Bio::DB::Failover
Fixed Bio::DB::Failover to properly pass get_seq_by_version() method, and fixed Bio::DB::Flat::BDB to properly implement it
Bio::DB::Fasta
Workaround for Win32 glob() failures due to long file names that contain whitespace
Lincoln's Win32 glob changes merged onto branch
Close and reopen index file to possibly avoid corruption issues on Windows platforms
Quenched uninitialized variable warning
Speculative changes to the search_notes() feature which will increase sensitivity but cause hits to non-full words as well
Removing cvs conflict messages (where there didn't appear to be a conflict anyway)
Killed uninit variable warning when reindexing a database for the first time
Bio::DB::FileCache
Issue #1628 - don't try and store things we didn't find
Bio::DB::Flat
Issue #1642 patch
Rearrange format
Remove warnings
Odd spacing
Bio::DB::Flat::BDB
Fixed Bio::DB::Failover to properly pass get_seq_by_version() method, and fixed Bio::DB::Flat::BDB to properly implement it
Bio::DB::Flat::BDB::embl
Fixed Bio::DB::Failover to properly pass get_seq_by_version() method, and fixed Bio::DB::Flat::BDB to properly implement it
Bio::DB::Flat::BDB::genbank
Fixed Bio::DB::Failover to properly pass get_seq_by_version() method, and fixed Bio::DB::Flat::BDB to properly implement it
Bio::DB::Flat::BDB::swiss
Fixed Bio::DB::Failover to properly pass get_seq_by_version() method, and fixed Bio::DB::Flat::BDB to properly implement it
Bio::DB::Flat::BDB::swissprot
Fixed transcript glyph so that nonstranded features are not automatically treated as + strand
Bio::DB::Flat::BinarySearch
Documentation errors
Fixed Bio::DB::Flat::BinarySearch to respect get_seq_by_version()
Pass format
Eliminate warnings by providing a regexp
Bio::DB::GFF
Patch problems that occur on win32 platforms with long names that contain spaces
Lincoln's Win32 glob changes merged onto branch
Fixed awful error which turns 0 attributes into undefined ones
Applied patch for precedence in GFF2 group assignment
Added a workable mechanism to change the grouping behavior in the ninth column of GFF2
Fixed bad merge; could someone check that the grep() solution is any faster than the original loop?
Fixed documentation errors in synopsis section
Fixed nasty bug in Bio::DB::GFF->new() routine which will cause all subsequent new() attempts to fail if the first one fails. This is a lesson not to rely on $@ being undef after a success!
Performance improvements in way that group field is handled during GFF2 parsing
Modified GFF3 compatibility so that order of ID preference is "Target", "Parent" and "ID"
Quenched a bug that prevented Bio::DB::GFF from loading attributes whose values are zero.
Changed behavior of features() method so that aggregation occurs even if the type is not specified
Made Bio::DB::GFF more RandomAccessI compliant (apparently this hadn't been tested!)
Can now specify the "require_whole_object" behavior of aggregators in constructor
Moved parse_types() method from GFF.pm to DasI.pm; it's a utility method, and unassuming enough to be in the interface class.
Fixed bad bug in GFF3 group handling
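The $@ lesson from the new() fix above can be illustrated with a small sketch. The actual Bio::DB::GFF code is not shown in this log, so everything here (load_adaptor, fragile_new, robust_new, the 'memory' adaptor name as the only valid one) is a hypothetical reconstruction of the general pitfall: $@ keeps its value until the next eval, so testing it on a path where no eval ran picks up an error from an earlier failure.

```perl
use strict;
use warnings;

my %loaded;

# Hypothetical loader: only the 'memory' adaptor exists in this sketch.
sub load_adaptor {
    my ($name) = @_;
    die "no such adaptor: $name\n" unless $name eq 'memory';
    return 1;
}

# Fragile: tests $@ even on the path where no eval ran, so an error left
# behind by an earlier failed call makes later, valid calls fail too.
sub fragile_new {
    my ($name) = @_;
    eval { load_adaptor($name); $loaded{$name} = 1 } unless $loaded{$name};
    return if $@;
    return { adaptor => $name };
}

# Safer: judge success by the value of the eval itself, never by stale $@.
sub robust_new {
    my ($name) = @_;
    unless ($loaded{$name}) {
        my $ok = eval { load_adaptor($name); $loaded{$name} = 1; 1 };
        return unless $ok;
    }
    return { adaptor => $name };
}
```

With this sketch, a successful fragile_new('memory') followed by a failed fragile_new('bogus') leaves $@ set, and the next fragile_new('memory') fails even though nothing is wrong; robust_new is unaffected.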
Bio::DB::GFF::Adaptor::dbi
Keep from croaking with invalid SQL when unknown feature types are requested
Squashed bug which prevented features added via add_feature() from appearing during the render() call
Speculative changes to the search_notes() feature which will increase sensitivity but cause hits to non-full words as well
Folded in Aaron Mackey patch for wildcard matching on source and method
Bio::DB::GFF::Adaptor::dbi::mysql
Modified GFF3 compatibility so that order of ID preference is "Target", "Parent" and "ID"
Speculative changes to the search_notes() feature which will increase sensitivity but cause hits to non-full words as well
Added warnings regarding the maxbin value
Bio::DB::GFF::Adaptor::dbi::pg
Speculative changes to the search_notes() feature which will increase sensitivity but cause hits to non-full words as well
Added a few indexes to the postgres GFF adaptor that were missing
Bio::DB::GFF::Adaptor::memory
Added simple bumping and horizontal whitespace control per user request
Close and reopen index file to possibly avoid corruption issues on Windows platforms
Squelched a non-numeric argument warning
Added get_feature_by_name() and search_notes() methods
Squashed bug which prevented features added via add_feature() from appearing during the render() call
Made the description returned by search_notes() a little more specific
Fixed a bug in memory adapter that caused segments found by wildcard searching to be (sometimes) flipped in wrong orientation
Bio::DB::GFF::Aggregator
Squelched a non-numeric argument warning
Fixed error which caused method() to return undef on custom-created aggregators
Old docu update
Fixed a problem in which features with same groups but different sources got aggregated together even when user requests sources explicitly
Can now specify the "require_whole_object" behavior of aggregators in constructor
Bio::DB::GFF::Aggregator::alignment
Updated for arrival of new methods in elegans release WS121
Bio::DB::GFF::Aggregator::clone
Updated for arrival of new methods in elegans release WS121
Keys on the left or right side now adjust padding automatically
Bio::DB::GFF::Aggregator::coding
Suppress uninit variable warnings in the make_link() function
Bio::DB::GFF::Aggregator::processed_transcript
Squelched a non-numeric argument warning
Bio::DB::GFF::Feature
Remove leftover debugging
Use the variable we computed the score into; this handles undef problems for us
Fixed missing strand when start==stop
Fixed problem with gff3 dumping in which ID and Parent attributes were missing from features that had Target attributes
Fixed longstanding apparent bug in the merged_segments() call
Removed extraneous debugging code
Setting refseq() of top-level feature should change refseq of subfeatures as well, no?
Folded in Aaron Mackey patch to shrink parent features if subfeatures don't fill it
Bio::DB::GFF::RelSegment
Fixed biographics side key so as not to be cut off for short tracks
Quashed uninit variable warnings
Added utility routines to help gbrowse
Bio::DB::InMemoryCache
Issue #1628 - don't try and store things we didn't find
Add Id line
Bio::DB::NCBIHelper
Worked on the genbank query fetcher and got it to return more than 500 records, but maybe it is because NCBI decided to cooperate and not because of anything I did :-(
Added gbwithparts key to %FORMATMAP to deal with some genbank nucleotide records that by default don't contain all CDS features, such as L42023. To force the Bio::DB object to get all the features, you can do the following: my $gb = new Bio::DB::GenBank; $gb->request_format('gbwithparts');
Bio::DB::Query::GenBank
Unset retmax - Brad Chapman's patch.
Fixed date format that is now expected by NCBI. Also, the query produces a different result now. Fixed the expected lengths as a temporary hack.
Actually make efetch work for >500 seqs?
Fixed the featurefile class to allow for link generation when building imagemaps
Worked on the genbank query fetcher and got it to return more than 500 records, but maybe it is because NCBI decided to cooperate and not because of anything I did :-(
Bio::DB::Query::WebQuery
Fixed date format that is now expected by NCBI. Also, the query produces a different result now. Fixed the expected lengths as a temporary hack.
Bio::DB::RefSeq
Nathan's additions
Bio::DB::Registry
Need to eval{} getpwuid() call since some Perls don't implement it.
Don't rely on HOME
Little code cleanup after the testing
Bio::DB::SwissProt
Nathan's additions
Bio::DB::Taxonomy
Add API for supporting retrieval of child nodes
Bio::DB::Taxonomy::entrez
Fix so that case doesn't matter and apply Andreas Kahari's fix Issue #1583
Taxonomy::Node can now masquerade as a Bio::Species
Move dependency check for XML::Twig to where it is actually needed
Bio::DB::Taxonomy::flatfile
Add API for supporting retrieval of child nodes
Make sure we remove the handle before untie-ing
Get species name correct when building classification
Deal with uninit and unknown taxa requests with some grace
Bio::DB::WebDBSeqI
Worked on the genbank query fetcher and got it to return more than 500 records, but maybe it is because NCBI decided to cooperate and not because of anything I did :-(
The retry code seems to be working now
Chomp the right variable
Fixed bug in _stream_request() that was causing Bio::DB::BioFetch to fail when attempting to read FASTA format.
Nathan's patch for Windows
Nathan's additions
Format
Bio::Das::FeatureTypeI
Implementing a Bio::Das::FeatureTypeI class; this required touching several files during debug. I think the only changes made are formatting and switchable debug warnings.
Bio::Das::SegmentI
The contained_in() and contains() methods were handling list arguments incorrectly
Bio::DasI
Moved parse_types() method from GFF.pm to DasI.pm; it's a utility method, and unassuming enough to be in the interface class.
Bio::Expression::FeatureSet::FeatureSetMas50
Dos2unix
Bio::Factory::AnalysisI
Fix Revision string problems
Bio::Factory::FTLocationFactory
Avoid warnings when location specified with start > end
Deal with non-numeric start/end separately
Deal with ? as start or end as well, okay this is getting a little ridiculous, just to handle reversed start/end...
Yuck. lookaheads and balanced parens. But this is a problem that has been around for a while, glad to finally fix it. Issue #1674 describes the behavior. Couldn't previously handle nested join(join()) properly
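The nested join(join()) problem above comes down to splitting on commas only at the outermost parenthesis level. The sketch below uses a simple depth counter instead of the lookahead regex the commit mentions, and it is not the Bio::Factory::FTLocationFactory code; split_top_level, parse_location and the returned hash layout are hypothetical.

```perl
use strict;
use warnings;

# Split an argument string on commas that are not inside parentheses.
sub split_top_level {
    my ($args) = @_;
    my ($depth, $current, @parts) = (0, '');
    for my $ch (split //, $args) {
        if ($ch eq '(') {
            $depth++;
        }
        elsif ($ch eq ')') {
            die "unbalanced ')'\n" if --$depth < 0;
        }
        elsif ($ch eq ',' && $depth == 0) {
            push @parts, $current;
            $current = '';
            next;
        }
        $current .= $ch;
    }
    die "unbalanced '('\n" if $depth > 0;
    push @parts, $current;
    return @parts;
}

# Parse a location string into a nested structure; leaves stay as plain strings.
sub parse_location {
    my ($loc) = @_;
    if ($loc =~ /^(join|order|complement)\((.*)\)$/) {
        my ($op, $inner) = ($1, $2);
        return { op => $op, parts => [ map { parse_location($_) } split_top_level($inner) ] };
    }
    return $loc;
}
```

Because each operator's arguments are split before recursing, an inner join(...) is handed to parse_location as a single part and never confused with its parent's commas.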
Bio::Factory::LocationFactoryI
Some documentation for the interface
Update email documentation
Bio::FeatureHolderI
See the bioperl mailing list for the full description. The script seq/unflatten_seq will now generate GFF3: the unflattener module is used to build the 'feature graph' connecting genes, transcripts, exons and CDSs together. This means we can have GFF3 for anything in GenBank! As far as I'm aware, the only other sensible output formats to use here (i.e. formats that support feature graphs/containment hierarchies) are chado, chaos, and the write-only asciitree. This feature graph is written out in the GFF3 using the ID and Parent tags. To do this there is an extra intermediate step: the bioperl FeatureHolderI hierarchy is traversed and ID/Parent tags are generated. Here is a description of the changes I have made (unless you're a bioperl hacker you don't really need to read the rest of this). You can get the context of what I'm on about from this thread: http://bioperl.org/pipermail/bioperl-l/2003-December/014150.html New public methods: FeatureHolderI->set_ParentIDs_from_hierarchy sets both ID and ParentID from the FeatureHolder hierarchy; SeqFeatureI->generate_unique_persistent_id is required by the above method (Lincoln wanted this to be private, but I think it has to be called from outside); FeatureHolderI->create_hierarchy_from_ParentIDs is the inverse of set_ParentIDs_from_hierarchy.
Moved ID methods to new class IDHandler; see thread 'new GFF3 support methods' http://bioperl.org/pipermail/bioperl-l/2004-March/thread.html
On rare occasions, subfeatures might appear more than once in the FeatureHolder hierarchy. E.g. an exon might be a subfeature of two transcripts, which both are subfeatures of the same gene. Fixed 'get_all_SeqFeatures' method to take this into account.
Bringing SeqFeature::Annotated up to be Bio::SeqFeatureI compliant. added add_SeqFeature() and remove_SeqFeature() methods to Bio::FeatureHolderI
'EXPAND' option in 'add_SeqFeature' now documented.
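The shared-subfeature case fixed in get_all_SeqFeatures (an exon reachable through two transcripts) is a traversal that must remember which objects it has already visited. A minimal sketch, assuming features are plain hashrefs with children under a 'subfeatures' key; that layout and the name all_subfeatures are hypothetical, not the Bio::FeatureHolderI implementation.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Collect every descendant of $holder exactly once, even when a feature is
# reachable through more than one parent. Features are plain hashrefs here,
# with children listed under the (hypothetical) key 'subfeatures'.
sub all_subfeatures {
    my ($holder, $seen) = @_;
    $seen ||= {};
    my @found;
    for my $feat (@{ $holder->{subfeatures} || [] }) {
        next if $seen->{ refaddr($feat) }++;   # already returned via another parent
        push @found, $feat, all_subfeatures($feat, $seen);
    }
    return @found;
}
```

Keying the seen-set on refaddr identifies features by object identity, so two distinct features with identical contents are both kept.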
Bio::FeatureIO
Initial commit of FeatureIO subsystem
Doc fix
Adding BED module. write-only, and not yet tested. http://www.genome.ucsc.edu/goldenPath/help/customTrack.html#BED
Cvs commit
Interpro sets 'source' of feature to value of match 'evidence' attribute, which is the program name that made the match. also adds friendly name of interpro accession as a 'comment' annotation. gff modified to support writing of 'Note' attributes.
Doc update, also _guess_format() tweak to remove Bio::Tools dependency
Bio::FeatureIO::bed
Adding BED module. write-only, and not yet tested. http://www.genome.ucsc.edu/goldenPath/help/customTrack.html#BED
Added header functionality to BED writer.
Bio::FeatureIO::gff
Initial commit of FeatureIO subsystem
Basic support for gff i/o
Gff parser wasn't initializing IO properly
Added support for more canonical gff attributes. these are represented using Bio::Annotation::SimpleValue objects
Added handling of more gff3 attributes. added test gff3 file to t/data. will add unit tests soon.
Adding BED module. write-only, and not yet tested. http://www.genome.ucsc.edu/goldenPath/help/customTrack.html#BED
Now writes GFF v2.5 (GTF)
Added a source() method to Bio::SeqFeature::Annotated. updated warning messages to state that tag* methods are deprecated
Two things: (1) adding SOFA as an available ontology to DocumentRegistry.pm; (2) modifying FeatureIO::gff to use SOFA to validate, and to parse Ontology_term
Modifications to allow url fetch of gene ontology. this was not easy b/c of the multiple .ontology files for each aspect. Bio::SeqFeature::Annotated objects now instantiate Ontology_term tags as Bio::Annotation::OntologyTerm objects, not Bio::Annotation::SimpleValue objects (Scott!)
Patch from Steffen Grossmann to parse fasta (with no result)
Trying to add Target handling to FeatureIO::gff--it doesn't work yet, but is no more broken than before. Also added a phase method to Bio::SeqFeature::Annotated and fixed confusion over frame vs phase.
Fixed some bugs in the usage of the 'source' entry. Implementation now is based on the 'source' method of Bio::SeqFeature::Annotated.
Added the ability to handle non-reserved word tags; still on the list of things to do is handling of Target tags and sequence-region and fasta directives.
Added stuff to support fasta and target processing. The question remains what to do with this data once you have it--particularly the fasta data. Should there be (or is there) a next_sequence() method?
Cleanup to use annotation shortcut methods
Much better support for Targets in Bio::FeatureIO::gff via a new Bio::Annotation::Target object.
Allen was right--Bio::Annotation::Target should inherit from Bio::Range
Changes to work with Scott's test gff. Target processing seems broken.
Added a 'next_feature_group' method as a starter for retrieving more than flat arrays of features from GFF3 files. Moved some of the functionality from 'next_feature' into the new method '_next_feature_or_directive'. Existing functionality is not changed by this.
'gff-version' directive is now placed on top of a file which is opened for output. There was a strange behaviour when I tried to use 'mode' to detect whether the file is writeable. I got some extra output which must have been related to the fact that 'mode' tries to access the file. Therefore I did it differently (certainly not optimal). Any suggestions/comments?
'source' in Bio::SeqFeature::Annotated gives back a Bio::Annotation::SimpleValue object. Make sure to print its value.
Bio::SeqFeature::Annotated: Unified the implementation of attribute accessor methods. 'seqid', 'type', 'source', 'score', 'frame' and 'phase' now all use the AnnotationCollection directly and give back Bio::AnnotationI implementing objects. Scalar values as well as appropriate Bio::AnnotationI implementing objects can be used on set. Default values ('.') are returned when called without previous setting. 'seqid' and 'source' uri_escape their values on set with scalars. 'seqid' itself is new and should replace 'id' for better compatibility with gff2 specs, but I left 'id' untouched. Bio::FeatureIO::gff: Adapted to the changes in Bio::SeqFeature::Annotated. Especially switched back to the use of 'seqid' instead of 'id'.
I'm trying to 'fix' the constructor of Annotated.pm and the write_feature method of gff.pm so that I can create an arbitrary Annotated sequence that I can use to generate a line of gff. I'm sure I did several things along the way.
Interpro makes features returned of type 'region'. gff3 produced now correctly uri-escapes attribute column, and additionally displays Ontology_term and Dbxref attributes.
Interpro sets 'source' of feature to value of match 'evidence' attribute, which is the program name that made the match. also adds friendly name of interpro accession as a 'comment' annotation. gff modified to support writing of 'Note' attributes.
Consolidated seq_id/seqid/id to seq_id.
Interpro SO accession did not match name. gff writer now supports 'Target' annotations.
Patched in validation code from Rob Edwards. Added code to parse ##sequence-region (based on Rob Edwards' patch); sequences given as ##sequence-region directives are buffered onto the next_feature() stream. Added a next_seq() method for reading the FASTA section (if any) at the bottom of the file; this will only ever return objects after all features have been read, and it introduces a dependency on Bio::SeqIO. Refactored next_feature_group() to use recursion rather than the clunky _next_feature_or_directive() method (removed).
Bio::FeatureIO::gtf
Now writes GFF v2.5 (GTF)
Bio::FeatureIO::interpro
Moved over from Bio::SeqIO::interpro
Interpro makes features returned of type 'region'. gff3 produced now correctly uri-escapes attribute column, and additionally displays Ontology_term and Dbxref attributes.
Interpro sets 'source' of feature to value of match 'evidence' attribute, which is the program name that made the match. also adds friendly name of interpro accession as a 'comment' annotation. gff modified to support writing of 'Note' attributes.
Consolidated seq_id/seqid/id to seq_id.
Interpro SO accession did not match name. gff writer now supports 'Target' annotations.
Bio::Graph::Edge
Initial commit
Made the SYNOPSIS section compile, helped by "cd maintenance; ./modules.pl --synopsis"
Bio::Graph::IO
Initial commit
Debug
Threshold option
Threshold parameter specifiable
Bio::Graph::IO::dip
Initial commits
write_network() method added
Debug for test
Debug
Threshold option
Debug
Bio::Graph::IO::psi_xml
Initial commits
Debug
Bio::Graph::ProteinGraph
Initial commit of protein network module
Debug for test
3 new methods added: remove_nodes(), clustering_coefficient(), remove_dup_edges()
New method - unconnected_nodes()
New method - articulation_points()
Algorithm improvement to articulation_points
Revised articulation_points() now faster
Bugfix
New methods edge_count(), node_count(), debugged union()
New method, neighbour_count()
Improved articulation_point() now independent of implementation of SimpleGraphTraversal
Made SYNOPSIS section to compile, helped by "cd maintenance; ./modules.pl --synopsis"
Get rid of 'redeclare' warning
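For readers unfamiliar with articulation_points(): an articulation point is a node whose removal disconnects the graph. The sketch below uses the standard depth-first "low-link" approach in Python. It is an illustration only and is not necessarily the algorithm used in Bio::Graph::ProteinGraph; the adjacency-dict representation and the function name are assumptions.

```python
def articulation_points(adj):
    """Return the set of nodes whose removal disconnects an undirected graph.

    adj maps each node to an iterable of its neighbours (assumed symmetric,
    with no self-loops or parallel edges).
    """
    disc = {}       # discovery order of each node
    low = {}        # lowest discovery order reachable from the node's subtree
    result = set()
    counter = 0

    for root in adj:
        if root in disc:
            continue
        disc[root] = low[root] = counter
        counter += 1
        root_children = 0
        # Iterative DFS; each frame is (node, parent, iterator over neighbours).
        stack = [(root, None, iter(adj[root]))]
        while stack:
            node, parent, neighbours = stack[-1]
            descended = False
            for nxt in neighbours:
                if nxt == parent:
                    continue
                if nxt in disc:
                    # Back edge: node can reach an earlier-discovered node.
                    low[node] = min(low[node], disc[nxt])
                else:
                    disc[nxt] = low[nxt] = counter
                    counter += 1
                    stack.append((nxt, node, iter(adj[nxt])))
                    descended = True
                    break
            if descended:
                continue
            stack.pop()
            if parent is not None:
                low[parent] = min(low[parent], low[node])
                if parent == root:
                    root_children += 1
                elif low[node] >= disc[parent]:
                    # Nothing below node reaches above parent.
                    result.add(parent)
        # The root is an articulation point only if it has 2+ DFS children.
        if root_children > 1:
            result.add(root)
    return result
```

The iterative form avoids recursion limits on large interaction networks.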
Bio::Graph::SimpleGraph
Commit of Nat Goodman's new SimpleGraph module
Debug for test
Small fixes to the names of the modules (Bio::Graph::SimpleGraph vs SimpleGraph)
Tidy up
Made SYNOPSIS section to compile, helped by "cd maintenance; ./modules.pl --synopsis"
Bio::Graph::SimpleGraph::Traversal
Initial commit of Nat Goodman's graph traversal module
Debug for test
Algorithm improvement to articulation_points
Tidied up
Test that get_all deals with non-hash based nodes
Dependencies on node being a hashref removed
Bio::Graphics::Feature
Backed out Jason's very sensible paranoia because it breaks ability to deliberately make an undef setting
Fixed the cds and translation glyphs so that they honor the -flip argument correctly
Added get_feature_by_name() and search_notes() methods
Keys on the left or right side now adjust padding automatically
Added utility routines to help gbrowse
Bio::Graphics::FeatureFile
Paranoia to avoid dereferencing undefined values
Backed out Jason's very sensible paranoia because it breaks ability to deliberately make an undef setting
Added code for generating clickable imagemaps in CGI scripts
Quashed uninit variable warnings
Hopefully fixed all the linkrule problems
Fixed more confusion in the make_link() method (how I wish I'd never started this!)
Squelched a non-numeric argument warning
Removed trailing whitespace from each line before parsing. I hope nobody wants/needs those.
Suppress uninit variable warnings in the make_link() function
Added get_feature_by_name() and search_notes() methods
Auto-label attribute should now work correctly, at cost of some memory bloat
Speculative changes to improve performance of uploaded third-party features in genome browser
Performance improvements in way that group field is handled during GFF2 parsing
Fixed the MAX_REMAP variable so as to be a more rational value
Squashed bug which prevented features added via add_feature() from appearing during the render() call
Fixed the featurefile class to allow for link generation when building imagemaps
Fixed handling of reverse strands in the super-short version of the featureFile renderer
Fixed a bug in memory adapter that caused segments found by wildcard searching to be (sometimes) flipped in wrong orientation
Change to allow a prototyped coderef to be passed in from a configuration (i.e., to allow something like 'sort_order = sub ($$) {...}')
Bio::Graphics::Glyph
Fixed a problem that occurs when an unrecognized font is passed to the glyph
Conditional use of ellipse() and filledEllipse() for backwards support of gd 1.8.4. NB: this might fail if user has mismatched gd/GD versions as it checks for can() of above methods.
Merged conditional use of gd2-specific methods, enabling backward compatibility with gd 1.8.4
Paranoia to avoid dereferencing undefined values
Added simple bumping and horizontal whitespace control per user request
Added code for generating clickable imagemaps in CGI scripts
Quashed uninit variable warnings
Squelched a non-numeric argument warning
Keys on the left or right side now adjust padding automatically
Resolved (at least some) clipping bugs for features that extend over the end of the panel
Added recognition of the 'maxdepth' option. If declared for a track and > 0, the glyph only attempts to render subfeatures down to level 'maxdepth'.
Improved alignment of bases when segments are showing a multiple alignment; still problems involving ragged ends and dynamic realignment
Fixed highmag segments glyph crash (uncommitted changes from yesterday)
Implementing a Bio::Das::FeatureTypeI class; this required touching several files during debug. I think the only changes made are formatting and switchable debug warnings.
Bio::Graphics::Glyph::Factory
Resolved (at least some) clipping bugs for features that extend over the end of the panel
Added the EMBL/Genbank entry renderer to the scripts directory, since it is probably of general interest
Implementing a Bio::Das::FeatureTypeI class; this required touching several files during debug. I think the only changes made are formatting and switchable debug warnings.
Bio::Graphics::Glyph::anchored_arrow
Keys on the left or right side now adjust padding automatically
Issue #1641 - documentation fix
Bio::Graphics::Glyph::arrow
Patched in Aaron's flipping fixes
Issue #1641 - documentation fix
Bio::Graphics::Glyph::broken_line
Added new glyphs
Bio::Graphics::Glyph::cds
Fixed the cds and translation glyphs so that they honor the -flip argument correctly
Fixed color problems in reverse-strand-flipped 6-frame translation glyph
Bio::Graphics::Glyph::christmas_arrow
Added new glyphs
Bio::Graphics::Glyph::dashed_line
Glyphs requested by KEGG, courtesy Simon
Added new glyphs
Bio::Graphics::Glyph::diamond
Fixed track clipping problems
Bio::Graphics::Glyph::dna
Fixed the cds and translation glyphs so that they honor the -flip argument correctly
Fixed color problems in reverse-strand-flipped 6-frame translation glyph
Bio::Graphics::Glyph::dot
Conditional use of ellipse() and filledEllipse() for backwards support of gd 1.8.4. NB: this might fail if user has mismatched gd/GD versions as it checks for can() of above methods.
Merged conditional use of gd2-specific methods, enabling backward compatibility with gd 1.8.4
Bio::Graphics::Glyph::dumbbell
Glyphs requested by KEGG, courtesy Simon
Fixes to the dumbbell glyph from Simon
Added new glyphs
Bio::Graphics::Glyph::flag
Glyphs requested by KEGG, courtesy Simon
Bio::Graphics::Glyph::generic
Fixed problem of right-side of glyph labels colliding with right-side key
Fixed a crash that occurs when drawing right & left keys and no default background specified for panel; added magic code to display tags of type "note" or "description" as default descriptions
Bio::Graphics::Glyph::lightning
Added a fairly ridiculous new lightning glyph. April showers bring May flowers...
Bio::Graphics::Glyph::pentagram
Added new glyphs
Bio::Graphics::Glyph::ragged_ends
Initial import
Bio::Graphics::Glyph::repeating_shape
Glyphs requested by KEGG, courtesy Simon
Bio::Graphics::Glyph::saw_teeth
Glyphs requested by KEGG, courtesy Simon
Bio::Graphics::Glyph::segments
Improved alignment of bases when segments are showing a multiple alignment; still problems involving ragged ends and dynamic realignment
Rolled back changes
Rewrote the way multiple alignments are displayed from scratch
Improved display of inserted bases
Fixed crash in segments glyph when drawing multiple alignments at high mag
Still problems in boundary conditions when no segment is in current display window
Fixed problem of phony DNA appearing when zoomed into a gap at high mag
Fixed highmag segments glyph crash (uncommitted changes from yesterday)
Bio::Graphics::Glyph::text_in_box
Added new glyphs
Bio::Graphics::Glyph::three_letters
Glyphs requested by KEGG, courtesy Simon
Bio::Graphics::Glyph::tic_tac_toe
Added new glyphs
Bio::Graphics::Glyph::track
Fixed problem of right-side of glyph labels colliding with right-side key
Fixed track clipping problems
Bio::Graphics::Glyph::transcript
Performance improvements in way that group field is handled during GFF2 parsing
Fixed transcript glyph so that nonstranded features are not automatically treated as + strand
Bio::Graphics::Glyph::transcript2
Performance improvements in way that group field is handled during GFF2 parsing
Resolved (at least some) clipping bugs for features that extend over the end of the panel
Bio::Graphics::Glyph::translation
Workaround for the dreaded difference between a Bio::SeqFeature and a Bio::Seq
Fixed the cds and translation glyphs so that they honor the -flip argument correctly
Added the EMBL/Genbank entry renderer to the scripts directory, since it is probably of general interest
Fixed problem of translation glyph not aligning to flipped coordinates; issue of colorization of flipped six-frame translation still pending
Fixed colors of translation glyph so that when the image is flipped, the colors of the frames flip appropriately
Another attempted fix of the padding problem
Fixed color problems in reverse-strand-flipped 6-frame translation glyph
Removing a redundant line to eliminate a warning
Bio::Graphics::Glyph::two_bolts
Added new glyphs
Bio::Graphics::Glyph::wave
Added new glyphs
Bio::Graphics::Glyph::weighted_arrow
Added new glyphs
Bio::Graphics::Glyph::xyplot
Added defaults to xyplot graph type
Some people are using -graphtype option rather than -graph_type, and if you can't beat 'em join 'em
Formatting fixes for xyplot, including drawing labels which were previously neglected
Removed a debugging statement
Resolved (at least some) clipping bugs for features that extend over the end of the panel
Fixed bad bug in GFF3 group handling
Bio::Graphics::Panel
Paranoia to avoid dereferencing undefined values
Added simple bumping and horizontal whitespace control per user request
Fixed biographics side key so as not to be cut off for short tracks
Added code for generating clickable imagemaps in CGI scripts
Hopefully fixed all the linkrule problems
Fixed more confusion in the make_link() method (how I wish I'd never started this!)
Keys on the left or right side now adjust padding automatically
Improved documentation of boxes() result
Squashed bug which prevented features added via add_feature() from appearing during the render() call
Resolved (at least some) clipping bugs for features that extend over the end of the panel
Fixed problem of right-side of glyph labels colliding with right-side key
Fixed track clipping problems
Fixed a crash that occurs when drawing right & left keys and no default background specified for panel; added magic code to display tags of type "note" or "description" as default descriptions
Fixed mis-registration of grid with tickmarks when viewing flipped regions; problem of translation glyph not aligning is still pending
Fixed scrambled "bottom" key when viewing flipped regions; problem of translation glyph not aligning is still pending
Small fix brought out by recent changes to GBrowse; user may now specify image class simply using SVG instead of fully qualified GD::SVG package
Bio::Graphics::Pictogram
Need to include SeqIO
Bio::Index::Abstract
Add warning if overwriting an existing value for sequence id -- to turn off the warning set verbose < 0
Format
Bio::Index::AbstractSeq
Nathan's additions
Bio::Index::Blast
Note on id_parser
Bio::Index::Fasta
Format
Note on id_parser
Bio::Index::GenBank
Format
Bio::Index::Hmmer
Adding hmmer index
Probably want to put this after we keep track of headers - per Marc Logghe individually reported bug
Made SYNOPSIS section to compile, helped by "cd maintenance; ./modules.pl --synopsis"
Note on id_parser
Bio::Index::Qual
Added Mark's Bio::Index::Qual module
Note on id_parser
Bio::Index::Swissprot
Add id_parser method
Suppress a warning
Note on id_parser
Bio::LiveSeq::IO::Loader
Jouni's patch
Bio::LocatableSeq
Escape the special characters out of paranoia
Also match on stop codon, not just aa/nt - this removes error where coordinate::pairs were not mapping correctly from a pairwise alignment
More fixes to handle stop codons in alignments and remapping positions
Bio::Location::Atomic
Documentation fix, Issue #1731 fix
Bio::Location::AvWithinCoordPolicy
Documentation fix, Issue #1731 fix
Bio::Location::Fuzzy
Documentation fix, Issue #1731 fix
More cleanup for Issue #1731
Bio::Location::FuzzyLocationI
Documentation fix, Issue #1731 fix
Bio::Location::NarrowestCoordPolicy
Documentation fix, Issue #1731 fix
Bio::Location::Simple
Documentation fix, Issue #1731 fix
Bio::Location::Split
Documentation fix, Issue #1731 fix
Bio::Location::SplitLocationI
Documentation fix, Issue #1731 fix
Bio::Location::WidestCoordPolicy
Documentation fix, Issue #1731 fix
Bio::LocationI
Documentation fix, Issue #1731 fix
Bio::Map::Clone
Jamie Hatfield et al's Bio::Map stuff for FPC Marker data
Bio::Map::Contig
Jamie Hatfield et al's Bio::Map stuff for FPC Marker data
Bio::Map::FPCMarker
Jamie Hatfield et al's Bio::Map stuff for FPC Marker data
Bio::Map::Physical
Jamie Hatfield et al's Bio::Map stuff for FPC Marker data
Bio::MapIO::fpc
Jamie Hatfield et al's Bio::Map stuff for FPC Marker data
Bio::Matrix::PSM::IO
Added psiblast to list of allowable file formats in IO.pm.
Fixed a problem in IO.pm: moved throw to $class rather than $self, as $self isn't defined yet.
Fix problems with ISA inheritance
Dos2unix
Adding a matrix FASTA ('masta') file format. Only DNA support for now.
Bio::Matrix::PSM::IO::mast
Spelling
Bio::Matrix::PSM::IO::masta
Adding a matrix FASTA ('masta') file format. Only DNA support for now.
Bio::Matrix::PSM::IO::meme
Now parses the log-odds
Bug fixed: supports the -revcomp option now and gives info on the strand of the instance. The parser will now throw an exception if it can't find the right number of columns.
Get rev strand properly
Critical bug fixed: when the instance sequence is at the end of the input sequence
+ to 1/- to 0 for the strand
Gap char . added to the alphabet
Bio::Matrix::PSM::IO::psiblast
Initial commit of ProtMatrix.pm, a module for handling position-specific matrices with protein alphabets by implementing SiteMatrixI.pm (basically a protein analogue to SiteMatrix.pm). Not yet fully functional, but very close. Initial commit of psiblast.pm, a parser for ASCII-formatted PSI-BLAST matrices. Returns a Bio::Matrix::PSM::ProtMatrix object. See "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs" by Altschul et al. (1997) for more information.
Changed documentation in psiblast.pm to reflect use of ProtPsm rather than Psm module. Added alphabet subroutine to ProtMatrix.pm.
Cleaned up whitespace in psiblast formatter, removed unnecessary break condition in file parsing loop.
Added PsmHeader and Bio::Root::Root to @ISA.
Fix problems with ISA inheritance
Bio::Matrix::PSM::IO::transfac
Parses any reference data, stores it into the Psm object that is returned
Bio::Matrix::PSM::InstanceSiteI
Dos2unix
Bio::Matrix::PSM::ProtMatrix
Initial commit of ProtMatrix.pm, a module for handling position-specific matrices with protein alphabets by implementing SiteMatrixI.pm (basically a protein analogue to SiteMatrix.pm). Not yet fully functional, but very close. Initial commit of psiblast.pm, a parser for ASCII-formatted PSI-BLAST matrices. Returns a Bio::Matrix::PSM::ProtMatrix object. See "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs" by Altschul et al. (1997) for more information.
Added the IUPAC method in ProtMatrix.pm to comply with SiteMatrix.pm interface. Added my name to the AUTHORS file. :)
Made regexp methods in ProtMatrix.pm accept a threshold. Fixed sequence_match_weight method in ProtMatrix.pm. Added testing for regexp and sequence_match_weight methods in ProtMatrix.t.
Changed documentation in psiblast.pm to reflect use of ProtPsm rather than Psm module. Added alphabet subroutine to ProtMatrix.pm.
Fixed documentation in ProtMatrix. Added another column to psiblast header in PsmHeader.pm.
Cleaned up docs in ProtMatrix.
Removed unimplemented methods get_compressed_freq and get_compressed_logs. Fixed PWM->PSM in Perldoc.
Bio::Matrix::PSM::ProtPsm
Initial commit of ProtPsm.pm, a module for holding ProtMatrix and SiteMatrix information on protein-specific PSM's.
Need to 'use' a module before declaring inheritance with ISA
Bio::Matrix::PSM::Psm
Now isa Bio::Annotation object
Psm has to initialize as Annotation::Collection, done
When calling matrix method log-odds were lost
Unnecessary Bio::Root::Root in ISA
Bio::Matrix::PSM::PsmHeader
Added psiblast information to PsmHeader.pm.
Fixed documentation in ProtMatrix. Added another column to psiblast header in PsmHeader.pm.
Bio::Matrix::PSM::PsmHeaderI
Dos2unix
Bio::Matrix::PSM::SiteMatrix
Regexp was incorrectly created
Compression of frequencies and log-odds: one position, one base is a single char
Added checks and documentation
Simple consensus can be calculated now using threshold value
New method- get the score for a sequence match, calculate from the log-odds
Method added: get_all_vectors, all possible sequences that satisfy the PFM under a given threshold
Aaron's suggestions
Uninitialized variable fixed
next_pos will return weights as well now
Method added to convert to PSM(PWM) from PFM
sequence_match_weight will produce more info to locate a possible problem
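The PFM-to-PWM conversion and the log-odds sequence score mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SiteMatrix implementation: the uniform background of 0.25, the pseudocount, the base-2 logarithm, and the function names `pfm_to_log_odds` and `sequence_score` are all choices made for this example.

```python
import math


def pfm_to_log_odds(pfm, background=0.25, pseudocount=0.01):
    """Convert a position frequency matrix to a log-odds (weight) matrix.

    pfm is a list of dicts, one per position, mapping base -> frequency.
    A small pseudocount (an assumption of this sketch) avoids log(0).
    """
    pwm = []
    for column in pfm:
        total = sum(column.values()) + pseudocount * len(column)
        pwm.append({
            base: math.log2(((freq + pseudocount) / total) / background)
            for base, freq in column.items()
        })
    return pwm


def sequence_score(pwm, seq):
    """Sum the log-odds of each base of seq at its position in the matrix."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal matrix width")
    return sum(column[base] for column, base in zip(pwm, seq.upper()))
```

A positive total means the sequence is more likely under the matrix than under the background; a negative total means the opposite.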
Bio::Matrix::PSM::SiteMatrixI
Added checks and documentation
Simple consensus can be calculated now using threshold value
New method- get the score for a sequence match, calculate from the log-odds
Method added: get_all_vectors, all possible sequences that satisfy the PFM under a given threshold
Aaron's suggestions
Doc update
Method added to convert to PSM(PWM) from PFM, documentation added for calc_weight
Bio::Matrix::PhylipDist
Shorten code, debug warnings when missing a value
Bio::Ontology::DocumentRegistry
Added an ontology document registry. The ontology store is now usable outside the Bio::OntologyIO subsystem as a class to create ontology structures on demand. Just ask the store for what you want by name, e.g. my $store = Bio::Ontology::OntologyStore->get_instance(); my $so = $store->get_ontology('Sequence Ontology'); File fetching happens behind the scenes with the help of recent modifications to Bio::Root::IO to support URLs, and Bio::Ontology::DocumentRegistry (just committed).
Fixed duplicate definition of get_instance().
Two things: added SOFA as an available ontology to DocumentRegistry.pm, and modified FeatureIO::gff to use SOFA to validate and to parse Ontology_term.
Modifications to allow URL fetch of the Gene Ontology. This was not easy because of the multiple .ontology files for each aspect. Bio::SeqFeature::Annotated objects now instantiate Ontology_term tags as Bio::Annotation::OntologyTerm objects, not Bio::Annotation::SimpleValue objects (Scott!)
Experimental changes for caching ontology files
Bio::Ontology::Ontology
Removed useless line of code, possibly a left-over from somebody testing.
Implementing a Bio::Das::FeatureTypeI class; this required touching several files during debug. I think the only changes made are formatting and switchable debug warnings.
Experimental changes for caching ontology files
Bio::Ontology::OntologyStore
Added an ontology document registry. The ontology store is now usable outside the Bio::OntologyIO subsystem as a class to create ontology structures on demand. Just ask the store for what you want by name, e.g. my $store = Bio::Ontology::OntologyStore->get_instance(); my $so = $store->get_ontology('Sequence Ontology'); File fetching happens behind the scenes with the help of recent modifications to Bio::Root::IO to support URLs, and Bio::Ontology::DocumentRegistry (just committed).
Cache downloaded ontology
Doc fix
Modifications to allow URL fetch of the Gene Ontology. This was not easy because of the multiple .ontology files for each aspect. Bio::SeqFeature::Annotated objects now instantiate Ontology_term tags as Bio::Annotation::OntologyTerm objects, not Bio::Annotation::SimpleValue objects (Scott!)
Experimental changes for caching ontology files
Oops, keep as url on head
Experimental ontology caching
A little protection
Bio::Ontology::RelationshipType
Added support for '~' (related_to) relationships. Ran into this one in the current Cell Ontology. '~' relationships are described in the format spec: http://www.geneontology.org/GO.format.html Note: there is still a lack of support for '<' (opposite of is_a) and '!=' (opposite of '=' or is_a in the identical sense). Will add as needed.
Bio::Ontology::SimpleGOEngine
Added support for '~' (related_to) relationships. Ran into this one in the current Cell Ontology. '~' relationships are described in the format spec: http://www.geneontology.org/GO.format.html Note: there is still a lack of support for '<' (opposite of is_a) and '!=' (opposite of '=' or is_a in the identical sense). Will add as needed.
Temporary workaround -- this won't necessarily work forever....
Bio::Ontology::SimpleOntologyEngine
Issue #2004 richard at uwc dot net Files: Bio/Ontology/SimpleOntologyEngine.pm Bio/Ontology/Term.pm Bio/OntologyIO/InterProParser.pm Bio/OntologyIO/Handlers/InterProHandler.pm t/InterProParser.t
Fixed bug in erroneously expecting a scalar instead of an array.
Bio::Ontology::Term
Dblinks come with context
Issue #2004 richard at uwc dot net Files: Bio/Ontology/SimpleOntologyEngine.pm Bio/Ontology/Term.pm Bio/OntologyIO/InterProParser.pm Bio/OntologyIO/Handlers/InterProHandler.pm t/InterProParser.t
You can't just cut the definition to an arbitrary length - only because your schema or whatever imposes that limit. Reverting this change. BTW it leads to lots of uninitialized value warnings being thrown.
Polished up the InterPro XML SAX handler and added persistence handler properties in order for it to be useable under load_ontology.pl (which wants to install its own persistence handler since they are highly configurable). Aliased the handler as interprosax in OntologyIO.
Experimental changes for caching ontology files
Bio::OntologyIO
Polished up the InterPro XML SAX handler and added persistence handler properties in order for it to be useable under load_ontology.pl (which wants to install its own persistence handler since they are highly configurable). Aliased the handler as interprosax in OntologyIO.
Bio::OntologyIO::Handlers::BaseSAXHandler
Polished up the InterPro XML SAX handler and added persistence handler properties in order for it to be useable under load_ontology.pl (which wants to install its own persistence handler since they are highly configurable). Aliased the handler as interprosax in OntologyIO.
Bio::OntologyIO::Handlers::InterProHandler
Issue #2004 richard at uwc dot net Files: Bio/Ontology/SimpleOntologyEngine.pm Bio/Ontology/Term.pm Bio/OntologyIO/InterProParser.pm Bio/OntologyIO/Handlers/InterProHandler.pm t/InterProParser.t
Bio::OntologyIO::Handlers::InterPro_BioSQL_Handler
Polished up the InterPro XML SAX handler and added persistence handler properties in order for it to be useable under load_ontology.pl (which wants to install its own persistence handler since they are highly configurable). Aliased the handler as interprosax in OntologyIO.
Bio::OntologyIO::InterProParser
Issue #2004 richard at uwc dot net Files: Bio/Ontology/SimpleOntologyEngine.pm Bio/Ontology/Term.pm Bio/OntologyIO/InterProParser.pm Bio/OntologyIO/Handlers/InterProHandler.pm t/InterProParser.t
Bio::OntologyIO::dagflat
Fixed the dagflat ontology parser. Yes you do suffer if your regexps become too promiscuous. Also added tests so that this doesn't happen anymore.
Migrated dagflat parser fix to the stable branch.
Added support for '~' (related_to) relationships. Ran into this one in the current Cell Ontology. '~' relationships are described in the format spec: http://www.geneontology.org/GO.format.html Note: there is still a lack of support for '<' (opposite of is_a) and '!=' (opposite of '=' or is_a in the identical sense). Will add as needed.
Bug in definitions file parser caused a def node and a term node to be created (2x the number of objects needed). Further, the defs were not associated with the terms. :\
Modifications to allow URL fetch of the Gene Ontology. This was not easy because of the multiple .ontology files for each aspect. Bio::SeqFeature::Annotated objects now instantiate Ontology_term tags as Bio::Annotation::OntologyTerm objects, not Bio::Annotation::SimpleValue objects (Scott!)
Bio::OntologyIO::simplehierarchy
Little improvements, mostly to accommodate the latest eVoc formats.
Removed cruft that was lying around commented out. Improved the meaningfulness of an error message.
Fixed omitted related_to relationship from setting the ontology etc.
Bio::Perl
A little more detail
Bio::Phenotype::OMIM::OMIMentry
Perl 5.8 doesn't like "return my @array = ()" and calls it bizarre. Well, it's certainly not the most efficient way to program perl ...
Merged fixes from main trunk to make perl 5.8.4 happy.
Add get_clinical_symptom_organs and query_clinical_symptoms
Bio::Phenotype::OMIM::OMIMparser
Make parser a little more robust to badly formatted data
Pitch in _finer_parse_symptoms method
Bio::Phenotype::Phenotype
Perl 5.8 doesn't like "return my @array = ()" and calls it bizarre. Well, it's certainly not the most efficient way to program perl ...
Merged fixes from main trunk to make perl 5.8.4 happy.
Bio::PopGen::Genotype
Fix some documentation
Change regexp
Bio::PopGen::HtSNP
Adding Tag and HtSNP modules
Made SYNOPSIS section to compile, helped by "cd maintenance; ./modules.pl --synopsis"
Bio::PopGen::IO::csv
Initial contribution by Rich, cleaned up by Jason
Bio::PopGen::IO::hapmap
Initial contribution by Rich, cleaned up by Jason
Update Rich's email addr
Bio::PopGen::IO::phase
Initial contribution by Rich, cleaned up by Jason
Update Rich's email addr
Bio::PopGen::IO::prettybase
Warn if did not parse line correctly
Bio::PopGen::Individual
Documentation update
Fix some documentation
Force individual id to be the same for a genotype when it is added to an individual
Bio::PopGen::PopStats
Documentation update
Bio::PopGen::Population
Convert a diploid (or any-ploid really) population to haploid individuals
Documentation update
Fix some documentation
Bio::PopGen::PopulationI
Fix doc for interface
Bio::PopGen::Statistics
Fix Fu and Li's F, warn that F* is still broken, allow for methods to directly calculate the statistics from the raw counts rather than inferring from a population. These are named XX_counts methods
Some docu fix and force haploid population calculation
Fu and Li's F* now working properly. Thanks to Charla Lambert for pointing out the corrections in Simonsen et al. (1995). Also checked values against other available libs. K. Thornton and D. Ardell
Warn if the full complement of markers is not seen for every individual (probably need to adjust and remove these markers from consideration eventually)
Hanging use Data::Dumper deleted
Docu update
Promised example in the SYNOPSIS
Added citation to MBE paper using Bio::PopGen modules
Bio::PopGen::TagHaplotype
Adding Tag and HtSNP modules
Bio::PopGen::Utilities
Documentation update
Convert alignments into population
Missing needed use statements
Bio::PrimarySeq
Fixing docs to indicate that these objects have get/set on the alphabet call
Allow unsetting of the primary ID.
Nathan says correct regexp
Fix Issue #1670 on the branch
Add X to class of ambiguous characters, _guess_alphabet
Bio::Range
Adding new constructor, unions(), that takes a list of Bio::Range as input and returns a list of non-overlapping Bio::Range as output. This method is strand agnostic.
Commented warning
Get rid of warnings
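The behaviour described for unions() (a list of ranges in, a list of non-overlapping ranges out, ignoring strand) can be sketched in Python as follows. This is a minimal illustration and not the Bio::Range code: ranges are plain (start, end) tuples with inclusive coordinates, and the sketch assumes that only overlapping ranges are merged while merely adjacent ones are left separate.

```python
def unions(ranges):
    """Merge overlapping (start, end) pairs into non-overlapping ranges.

    Coordinates are inclusive. Strand is ignored, mirroring the
    'strand agnostic' note in the changelog entry.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Sorting by start first means each range only ever needs to be compared with the most recently emitted one.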
Bio::Restriction::Analysis
Method needs to be cut(), not cuts(). Issue #1596
Yes, Type I and Type III enzymes cut outside their recognition sequences, there should be no exception
Fixed according to Andrew Nunberg's comments.
Bio::Restriction::IO::withrefm
Remove some warnings
Bio::Root::AccessorMaker
A BioPerl customized accessor maker. Working now but still under construction for more features
This module is removed from BioPerl and will be used as an external module
Bio::Root::IO
Dave Howorth's fix for dealing with mac/unix/dos linefeeds when reading from files produced on different platforms
Added code which, if a file to read from doesn't exist and the name looks like a web URL, attempts to download the file and use a local tempfile of it instead. This allows Some::IO->new(-file=>'http://server.my/test.gff') to pull down a GFF file, cache it locally, and read from it. It may be better to have a special -url option or something to keep this code separated out, but this commit is just to get it out there.
Network fetch functionality in _initialize_io() is now activated with the -url argument (as opposed to overloading -file). Removed the LWP::Simple requirement; if it is not present, fail over to Bio::Root::HTTPget.
Three minor fixes: switch off $VERBOSE by default; require LWP::Simple instead of use; return the return value of print from the _print method.
More intelligent _pushback and _readline
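The mac/unix/dos linefeed handling mentioned above amounts to treating `\r\n`, `\r` and `\n` as equivalent line terminators. A minimal Python sketch of that idea follows; it is not Dave Howorth's fix or the Bio::Root::IO code, and the function name `split_lines` is hypothetical.

```python
def split_lines(raw: bytes) -> list:
    """Split raw file content into lines, accepting DOS, Mac and Unix endings."""
    # Replace \r\n first so a DOS ending is not counted as two terminators.
    text = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    lines = text.split(b"\n")
    if lines and lines[-1] == b"":
        # Drop the empty piece left after a trailing terminator.
        lines.pop()
    return lines
```

The order of the two replacements matters: converting a bare `\r` first would turn every DOS line ending into a blank line.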
Bio::Root::Root
Rich's fix for untainting
Bio::Root::RootI
Unify not_implemented message
Correct a fib, caused by my miscounting for caller method
Bio::Root::Storable
Will Spooner's fixes
Bio::Root::Version
Increased the version number to be different from the release
Bio::Search::BlastStatistics
I forgot to commit this yesterday.
Bio::Search::GenericStatistics
Created wrappers for the statistics and parameters in a generic result. I geared this specifically for Blast in the name of Java interoperability. Interoperability was spoiled by the use of a hash to represent statistics and parameters. This efficiently and elegantly fixes the problem.
Bio::Search::HSP::GenericHSP
new(): Modified the frac_identical/frac_conserved setting calls to use the logical length of the aligned region where appropriate via SearchUtils::logical_length(). Also, the percent_identity() set call relies on frac_identical() (calculate something once, why calculate it again? ;-).
Migrating revision 1.60.2.1 to the main trunk. Fixes Issue #1597
Store Group field
Argument order and data initialized was flip-flopped
Bio::Search::HSP::HSPI
Don't try and calculate match overlap when there is no seq string stored (like PSL)
Bio::Search::Hit::GenericHit
Added method _warn_about_no_hsps() which is now called by all methods requiring HSP data (most of which also call tile_hsps()).
frac_identical(), frac_conserved(): Added comments describing when the "logical" length of the aligned portion of query/hit sequences is used. length_aln(): Moved the setter code before the attempt to tile HSPs. Also migrating mods from revision 1.30 to branch-1-4.
Refactored logical_length() to delegate to Bio::Search::SearchUtils::logical_length()
Migrating revisions 1.29.2.1 and 1.29.2.2 to the main trunk.
Fixed check for undef omissions. Also, "$acc.$version" doesn't look very good if $version is undef or an empty string (fixed now).
Moved the _warn_about_no_hsps() method from GenericHit.pm into SearchUtils.pm. Refactored SearchUtils::tile_hsps() to use it. Added more descriptive text to the warning message in _warn_about_no_hsps() explaining the most likely cause of it.
Migrating revision 1.29.2.3 to the main trunk.
Issue #1714 - there was inconsistent expectation of the state of _hsps arrayref
Bio::Search::Iteration::IterationI
Issue #1611 - add_hit to a BlastResult object
Bio::Search::Result::BlastResult
Created wrappers for the statistics and parameters in a generic result. I geared this specifically for Blast in the name of Java interoperability. Interoperability was spoiled by the use of a hash to represent statistics and parameters. This efficiently and elegantly fixes the problem.
I forgot to commit this yesterday.
Issue #1611 - add_hit to a BlastResult object
Bio::Search::Result::GenericResult
Created wrappers for the statistics and parameters in a generic result. I geared this specifically for Blast in the name of Java interoperability. Interoperability was spoiled by the use of a hash to represent statistics and parameters. This efficiently and elegantly fixes the problem.
Bio::Search::Result::ResultI
Added sort_hits() method, which allows sorting of hits by a user-supplied coderef. Added _default_sort_hits method, the default coderef for sort_hits(), which sorts by descending score.
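The sort_hits() pattern described above (a caller-supplied comparison routine, with descending score as the fallback) can be sketched in Python as follows. This is an illustration only: hits are plain dicts with hypothetical "name" and "score" keys, and the Python function names merely echo the Perl ones.

```python
import functools


def default_sort_hits(a, b):
    """Default comparator: order hits by descending score."""
    return (b["score"] > a["score"]) - (b["score"] < a["score"])


def sort_hits(hits, compare=None):
    """Return hits sorted with a two-argument comparator.

    compare plays the role of the user-supplied coderef; when it is
    omitted, the default descending-score comparator is used.
    """
    if compare is None:
        compare = default_sort_hits
    return sorted(hits, key=functools.cmp_to_key(compare))
```

A caller who wants alphabetical order, for example, could pass `lambda a, b: (a["name"] > b["name"]) - (a["name"] < b["name"])`.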
Bio::Search::SearchUtils
Added warn call to tile_hsps() in the event of no HSPs in the supplied hit object. tile_hsps() also returns a boolean value: true if it was able to tile, false otherwise. Not currently used.
* Migrating 1.11 modification to branch-1.4. * Added logical_length() to compute the length of the aligned sequence based on the algorithm type (e.g., blastx) and sequence type (e.g., query vs hit). This refactors code used by GenericHit and GenericHSP. * Fixes Issue #1597
Migrating revision 1.10.2.1 to the main trunk.
* Moved the _warn_about_no_hsps() method from GenericHit.pm into SearchUtils.pm. Refactored SearchUtils::tile_hsps() to use it. * Added more descriptive text to the warning message in _warn_about_no_hsps() explaining the most likely cause of it.
Migrating revision 1.10.2.2 to the main trunk.
No FASTA algorithm is as abominable as TBLASTX; similarly, there's no TFASTN (and no one should ever use TFASTA, so we don't support it)
Bio::Search::StatisticsI
Created wrappers for the statistics and parameters in a generic result. I geared this specifically for Blast in the name of Java interoperability. Interoperability was spoiled by the use of a hash to represent statistics and parameters. This efficiently and elegantly fixes the problem.
Bio::SearchIO::IteratedSearchResultEventBuilder
Issue #1653 get rank back for hits
Bio::SearchIO::SearchResultEventBuilder
Handle cases when query_length or hit_length should be derived from the parsed data rather than counted from the alignment bases
Issue #1653 get rank back for hits
Bio::SearchIO::Writer::GbrowseGFF
Squashed a bug in the GFF3 formatter. The GbrowseGFF formatter is now updated to be GFF3 compliant. It also uses GFF3 by default now.
Resolved a GFF3/2.5 formatting issue
Fixed some syntax errors(??)
Bio::SearchIO::Writer::HTMLResultWriter
Deal with HMMER output okay
Igor Dolgalev reported unbalanced </tr> in statistics table; also make HMMPFAM writing quieter by reusing bit/score value when one is missing
Bio::SearchIO::Writer::TextResultWriter
Deal with HMMER output okay
Marc Logghe's commit
Bio::SearchIO::blast
* Fixed Issue #1126 (504 letters) * Commented out some debug calls
Fixed syntax error created by previous commit.
Fixed check for undef omissions. Also, "$acc.$version" doesn't look very good if $version is undef or an empty string (fixed now).
Handle NCBI weirdness/bug on OS X blast 2.2.8 where numbers aren't really numbers...
Deal with part of Issue #1598, parsing on OSX
Merge necessary main trunk fixes to the branch in anticipation of a 1.4.1 release one day
Treat empty and undef strings as empty
Handle extra value in the Expect E-value number (only present in megablast output, I think; seen in bl2seq -m T output at any rate)
Parse -noseqs wublast output
Parse Group field
Handle extra space introduced by HTML blast or HTML::Strip
Move the test for whitespace only string later in the characters subroutine. Also fix hsp_group parsing to properly capture the data (missing the ',' in the parse). Initialization parameter is -hsp_group now not -group
Silence uninit warnings
Issue #1703 fixes
Bio::SearchIO::blasttable
Apply Steve's small caching speedup to blasttable
Bio::SearchIO::fasta
More fixes for fasta parsing <sigh>
Match changes in new FASTA output: "ungapped" has become "similar", so match both; slight code indenting
Set positive count properly too
Bio::SearchIO::hmmer
Better regexp to distinguish between RF lines and those with the characters RF in the alignment
Change HMMER parser to report a single HSP per Hit so domains with multiple alignments get separate Hits (more FASTA like) since they aren't really HSPs
Bio::SearchIO::sim4
Major fix: 'HSPs' that were preceded by another one for which the definition finished at the very end of an alignment block did not get initialized correctly.
Add comment and escape comment in regexp
Old changes; some cleaning up
Bio::Seq
Fixing docs to indicate that these objects have get/set on the alphabet call
Correction
Allow unsetting of the primary ID.
Made primary_id() to delegate. Yes, always. Also, added a Seq test to make sure glaring inconsistencies don't happen again.
Bio::Seq::LargeLocatableSeq
Started adding sequence alignments that store sequences in a file. coding by Albert Vilella
Minor doc and cosmetic changes
Bio::Seq::LargePrimarySeq
Started adding sequence alignments that store sequences in a file. coding by Albert Vilella
Bio::Seq::LargeSeq
Started adding sequence alignments that store sequences in a file. coding by Albert Vilella
Bio::Seq::LargeSeqI
Started adding sequence alignments that store sequences in a file. coding by Albert Vilella
Remove unnecessary length() calls, more docs
Bio::Seq::PrimaryQual
Fixed a problem detected by Luke McCarthy in which leading spaces in quality lines threw off the parser.
Added the ability to specify a header when calling write_seq on a qual
Bio::Seq::QualI
Phillip's patch
Bio::Seq::RichSeqI
Fixed transcript glyph so that nonstranded features are not automatically treated as + strand
Bio::Seq::SeqFastaSpeedFactory
Allow alphabet to be passed to create() so we don't have to guess it if it's known
Bio::Seq::SeqWithQuality
Continuing to make Singlets work with the Assembly framework.
Bio::Seq::SequenceTrace
Need to pass arguments when delegating
Fixed case bug + commented out apparently unused variables
Bio::SeqFeature::Annotated
This class tries to be clean of *_tag_* calls, opting to use Bio::AnnotationCollectionI as a replacement. We should be able to map most of the *_tag_* calls to Bio::AnnotationCollectionI calls, but I haven't done it yet. Currently throwing an error on *_tag_* calls.
Added a source() method to Bio::SeqFeature::Annotated. updated warning messages to state that tag* methods are deprecated
Added a shortcut get_Annotations() method to feature to bypass the ->annotation-> middleman call. Also added nice list->scalar autoconversion
Two things: * adding SOFA as an available ontology to DocumentRegistry.pm * modifying FeatureIO::gff to use SOFA to validate, and to parse Ontology_term
Trying to add Target handling to FeatureIO::gff--it doesn't work yet, but is no more broken than before. Also added a phase method to Bio::SeqFeature::Annotated and fixed confusion over frame vs phase.
Methods for targets
'source' method now is really based on the Annotation::Collection. FeatureIO::gff profits from this.
Bringing SeqFeature::Annotated up to be Bio::SeqFeatureI compliant. added add_SeqFeature() and remove_SeqFeature() methods to Bio::FeatureHolderI
Fixed syntax errors
Dropped duplicated methods.
Cleanup to use annotation shortcut methods
Updated Bio::AnnotationCollection to implement *_tag_* methods with deprecation warning. These were taken from Bio::SeqFeatureI and Bio::SeqFeature::Generic. *_tag_* methods in Bio::SeqFeature::Annotated are now implemented by explicit pass-through to the contained Bio::Annotation::Collection instance.
Changes to work with Scott's test gff. Target processing seems broken.
Removed the 'my $self = shift;' lines in 'has_tag', 'add_tag_value', 'get_tag_values', 'get_all_tags' and 'remove_tag' because they are not needed here.
Bio::SeqFeatureI inherits Bio::AnnotatableI, NOT Bio::AnnotationCollectionI. *_tag_* methods are in Bio::AnnotatableI, and internally defer to Bio::AnnotatableI->annotation->some_analagous_mapped_function(). Deprecation warnings commented out until 1.6. Bio::AnnotatableI now keeps a tag->annotation_type registry to allow new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* methods map directly onto Bio::AnnotationI's Bio::AnnotationCollectionI instance. Unflattener and Unflattener2 tests pass with no changes.
Added support for the 'EXPAND' option in 'add_SeqFeature'.
'source' now gives back its value instead of a Bio::Annotation::SimpleValue object. (This is consistent with 'type').
'source' now gives back a Bio::Annotation::SimpleValue object, but we make sure that we really always get something, so that we can call $feat->source->value without bothering about the existence of a source annotation. I think this behaviour should consistently be implemented for all parts of the annotation.
Bio::SeqFeature::Annotated: Unified the implementation of attribute accessor methods. 'seqid', 'type', 'source', 'score', 'frame' and 'phase' now all use the AnnotationCollection directly and give back Bio::AnnotationI implementing objects. Scalar values as well as appropriate Bio::AnnotationI implementing objects can be used on set. Default values ('.') are returned when called without previous setting. 'seqid' and 'source' uri_escape their values on set with scalars. 'seqid' itself is new and should replace 'id' for better compatibility with gff2 specs, but I left 'id' untouched. Bio::FeatureIO::gff: Adapted to the changes in Bio::SeqFeature::Annotated. Especially switched back to the use of 'seqid' instead of 'id'.
If we are setting undef scores to '.', we have to allow that as a valid value!
I'm trying to 'fix' the constructor of Annotated.pm and the write_feature method of gff.pm so that I can create an arbitrary Annotated sequence that I can use to generate a line of gff. I'm sure I did several things along the way.
Consolidated seq_id/seqid/id to seq_id.
FTHelper uses SeqFeatureI calls rather than SeqFeature::Generic-specific calls.
Bio::SeqFeature::AnnotationAdaptor
Bio::SeqFeatureI inherits Bio::AnnotatableI, NOT Bio::AnnotationCollectionI. *_tag_* methods are in Bio::AnnotatableI, and internally defer to Bio::AnnotatableI->annotation->some_analagous_mapped_function(). Deprecation warnings commented out until 1.6. Bio::AnnotatableI now keeps a tag->annotation_type registry to allow new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* methods map directly onto Bio::AnnotationI's Bio::AnnotationCollectionI instance. Unflattener and Unflattener2 tests pass with no changes.
Bio::SeqFeature::Collection
Ensure that feature is-a Generic feature before getting FTstring in debug
Bio::SeqFeature::Gene::Intron
Acceptor and donor splice_site methods now accept 0 as a legitimate value for splice_site
Bio::SeqFeature::Generic
Protect in the event that one calls add_SeqFeature with an empty value
See the bioperl mailing list for a full description. The script seq/unflatten_seq will now generate GFF3: the unflattener module is used to build the 'feature graph' connecting genes, transcripts, exons and CDSs together. This means we can have GFF3 for anything in genbank! As far as I'm aware, the only other sensible output formats to use here (i.e. formats that support feature graphs/containment hierarchies) are chado, chaos, and the write-only asciitree. This feature graph is written out in the GFF3 using the ID and Parent tags. To do this there is an extra intermediate step: the bioperl FeatureHolderI hierarchy is traversed and ID/Parent tags are generated. Here is a description of the changes I have made [unless you're a bioperl hacker you don't really need to read the rest of this]. You can get the context of what I'm on about from this thread: http://bioperl.org/pipermail/bioperl-l/2003-December/014150.html Two new public methods: FeatureHolderI->set_ParentIDs_from_hierarchy (sets both ID and ParentID from the FeatureHolder hierarchy) and SeqFeatureI->generate_unique_persistent_id (required by the above method; Lincoln wanted this to be private, but I think it has to be called from outside). Also FeatureHolderI->create_hierarchy_from_ParentIDs, the inverse of set_ParentIDs_from_hierarchy.
Support specifying multiple values per tag (already supported by add_tag_value)
Bio::SeqFeatureI inherits Bio::AnnotatableI, NOT Bio::AnnotationCollectionI. *_tag_* methods are in Bio::AnnotatableI, and internally defer to Bio::AnnotatableI->annotation->some_analagous_mapped_function(). Deprecation warnings commented out until 1.6. Bio::AnnotatableI now keeps a tag->annotation_type registry to allow new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* methods map directly onto Bio::AnnotationI's Bio::AnnotationCollectionI instance. Unflattener and Unflattener2 tests pass with no changes.
Bio::SeqFeature::Primer
Updated Tm calculation to reflect suggestions by Barry Moore. See bioperl-l for discussion of why.
Bio::SeqFeature::Similarity
Will's fix
Bio::SeqFeature::Tools::FeatureNamer
New module for granting names to features
Bio::SeqFeature::Tools::IDHandler
Moved ID methods to new class IDHandler; see thread 'new GFF3 support methods': http://bioperl.org/pipermail/bioperl-l/2004-March/thread.html
Tidied up ID handling
Steffen Grossman's patch; test case upcoming
In ID generation, transcript_id or protein_id from genbank file are used if available
Bio::SeqFeature::Tools::TypeMapper
Added get_relationship_type_by_parent_child
Fixed docs
Modified TypeMapper.pm to translate 'Protein' to 'protein' (perhaps there ought to be a general 'lower' and then deal with exceptions like CDS). Also modified genbank2gff3.PLS to produce a full reference sequence line in addition to the ##sequence-region directive.
Bio::SeqFeature::Tools::Unflattener
Malcolm Cook's patch to populate exon accessors
See the bioperl mailing list for a full description. The script seq/unflatten_seq will now generate GFF3: the unflattener module is used to build the 'feature graph' connecting genes, transcripts, exons and CDSs together. This means we can have GFF3 for anything in genbank! As far as I'm aware, the only other sensible output formats to use here (i.e. formats that support feature graphs/containment hierarchies) are chado, chaos, and the write-only asciitree. This feature graph is written out in the GFF3 using the ID and Parent tags. To do this there is an extra intermediate step: the bioperl FeatureHolderI hierarchy is traversed and ID/Parent tags are generated. Here is a description of the changes I have made [unless you're a bioperl hacker you don't really need to read the rest of this]. You can get the context of what I'm on about from this thread: http://bioperl.org/pipermail/bioperl-l/2003-December/014150.html Two new public methods: FeatureHolderI->set_ParentIDs_from_hierarchy (sets both ID and ParentID from the FeatureHolder hierarchy) and SeqFeatureI->generate_unique_persistent_id (required by the above method; Lincoln wanted this to be private, but I think it has to be called from outside). Also FeatureHolderI->create_hierarchy_from_ParentIDs, the inverse of set_ParentIDs_from_hierarchy.
Allowed subfeatures to EXPAND the container feature; this is necessary to deal with how genbank represents dicistronic genes (see ZFP91 and CNTF in build 34.3). This is not the ideal way of dealing with dicistronics but it should suffice for now. Expansion causes a level=1 problem to be thrown.
Now deals with pseudogenes
Rationalised ID generation code in chaosxml write adapter; changed verbose reporting in unflattener to use stderr
Turn verbosity down
Verbose option now consistently writes to STDERR; more liberal in paranoid checking of subfeature ordering; now handles unusual ensembl style genbank revcomp splitlocs
Be quiet when verbosity is <=0
Make it clean when verbosity < 0
Bio::SeqFeatureI
See the bioperl mailing list for a full description. The script seq/unflatten_seq will now generate GFF3: the unflattener module is used to build the 'feature graph' connecting genes, transcripts, exons and CDSs together. This means we can have GFF3 for anything in genbank! As far as I'm aware, the only other sensible output formats to use here (i.e. formats that support feature graphs/containment hierarchies) are chado, chaos, and the write-only asciitree. This feature graph is written out in the GFF3 using the ID and Parent tags. To do this there is an extra intermediate step: the bioperl FeatureHolderI hierarchy is traversed and ID/Parent tags are generated. Here is a description of the changes I have made [unless you're a bioperl hacker you don't really need to read the rest of this]. You can get the context of what I'm on about from this thread: http://bioperl.org/pipermail/bioperl-l/2003-December/014150.html Two new public methods: FeatureHolderI->set_ParentIDs_from_hierarchy (sets both ID and ParentID from the FeatureHolder hierarchy) and SeqFeatureI->generate_unique_persistent_id (required by the above method; Lincoln wanted this to be private, but I think it has to be called from outside). Also FeatureHolderI->create_hierarchy_from_ParentIDs, the inverse of set_ParentIDs_from_hierarchy.
Source is part of unique key
Moved ID methods to new class IDHandler; see thread 'new GFF3 support methods': http://bioperl.org/pipermail/bioperl-l/2004-March/thread.html
Bug fixes for James Thompson's reported bug in spliced_seq, changed the API so it supports an optional 2nd argument which if true, will not sort the sub-locations in a split-location before splicing together the pieces. Also fixed an error in how revcomplemented pieces were stitched together
Updated Bio::AnnotationCollection to implement *_tag_* methods with deprecation warning. These were taken from Bio::SeqFeatureI and Bio::SeqFeature::Generic. *_tag_* methods in Bio::SeqFeature::Annotated are now implemented by explicit pass-through to the contained Bio::Annotation::Collection instance.
Changes to Bio::AnnotationCollectionI and Bio::SeqFeatureI. * Bio::SeqFeatureI now ISA Bio::AnnotationCollectionI * All Bio::SeqFeatureI *_tag_* methods have been moved to Bio::AnnotationCollectionI, marked as deprecated, and mapped to their analogous and mostly pre-existing Bio::AnnotationCollectionI methods. Methods which were not in Bio::AnnotationCollectionI, but were in Bio::Annotation::Collection and were necessary for *_tag_* method remapping, were created in Bio::AnnotationCollectionI. * Bio::RangeI and Bio::AnnotationCollectionI method documentation removed from Bio::SeqFeatureI, and replaced with a link to the interface class inherited from. This reduces documentation maintenance overhead.
Bio::SeqFeatureI inherits Bio::AnnotatableI, NOT Bio::AnnotationCollectionI. *_tag_* methods are in Bio::AnnotatableI, and internally defer to Bio::AnnotatableI->annotation->some_analagous_mapped_function(). Deprecation warnings commented out until 1.6. Bio::AnnotatableI now keeps a tag->annotation_type registry to allow new tags to be created (see Bio::SeqFeature::AnnotationAdaptor). Bio::SeqFeature::AnnotationAdaptor is now not very useful, as *_tag_* methods map directly onto Bio::AnnotationI's Bio::AnnotationCollectionI instance. Unflattener and Unflattener2 tests pass with no changes.
Only do the prepending when nosort is specified
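The intermediate step mentioned in the unflattener notes for this module (walking the feature hierarchy and generating ID/Parent tags so the graph can be written as GFF3) can be sketched in Python. This is only an illustration under stated assumptions: the Feature class, the function name and the "type-N" ID scheme are all hypothetical, and none of it is the BioPerl implementation.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Feature:
    """Hypothetical feature node: a type, child features and a tag dictionary."""
    type: str
    children: List["Feature"] = field(default_factory=list)
    tags: Dict[str, str] = field(default_factory=dict)


def set_parent_ids_from_hierarchy(roots: List[Feature]) -> None:
    """Depth-first walk that gives every feature an ID and every child a Parent."""
    counter = count(1)

    def visit(feat: Feature, parent_id: Optional[str]) -> None:
        # Keep an existing ID; otherwise invent one (assumed naming scheme).
        if "ID" not in feat.tags:
            feat.tags["ID"] = f"{feat.type}-{next(counter)}"
        if parent_id is not None:
            feat.tags["Parent"] = parent_id
        for child in feat.children:
            visit(child, feat.tags["ID"])

    for root in roots:
        visit(root, None)


exon = Feature("exon")
cds = Feature("CDS")
mrna = Feature("mRNA", children=[exon, cds])
gene = Feature("gene", children=[mrna])
set_parent_ids_from_hierarchy([gene])
```

After the call, the containment hierarchy is fully recoverable from the tags alone, which is what allows the inverse operation (rebuilding the hierarchy from Parent IDs) to exist.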
Bio::SeqIO
Add mention of new formats
Add tinyseq
Fix documentation since the default doesn't work anymore without -fh => \*ARGV
Doc fix
Check to see if -file or -fh arguments have been passed in with undefined values. If that's the case, don't fall back to $ARGV[0], rather throw an exception.
List of formats in 1 file, not 3
Bio::SeqIO::FTHelper
Seqid needs to be set for the feature object as well
Update doc
FTHelper uses SeqFeatureI calls rather than SeqFeature::Generic-specific calls.
Bio::SeqIO::agave
Initial version of AGAVE XML parser
Bio::SeqIO::bsml_sax
BSML parsing via SAX parser
Add Id line
Bio::SeqIO::chadoxml
See the bioperl mailing list for a full description. The script seq/unflatten_seq will now generate GFF3: the unflattener module is used to build the 'feature graph' connecting genes, transcripts, exons and CDSs together. This means we can have GFF3 for anything in genbank! As far as I'm aware, the only other sensible output formats to use here (i.e. formats that support feature graphs/containment hierarchies) are chado, chaos, and the write-only asciitree. This feature graph is written out in the GFF3 using the ID and Parent tags. To do this there is an extra intermediate step: the bioperl FeatureHolderI hierarchy is traversed and ID/Parent tags are generated. Here is a description of the changes I have made [unless you're a bioperl hacker you don't really need to read the rest of this]. You can get the context of what I'm on about from this thread: http://bioperl.org/pipermail/bioperl-l/2003-December/014150.html Two new public methods: FeatureHolderI->set_ParentIDs_from_hierarchy (sets both ID and ParentID from the FeatureHolder hierarchy) and SeqFeatureI->generate_unique_persistent_id (required by the above method; Lincoln wanted this to be private, but I think it has to be called from outside). Also FeatureHolderI->create_hierarchy_from_ParentIDs, the inverse of set_ParentIDs_from_hierarchy.
Moved ID methods to new class IDHandler; see thread 'new GFF3 support methods': http://bioperl.org/pipermail/bioperl-l/2004-March/thread.html
Add following input to write_seq method: is_analysis flag: for setting the is_analysis flag in chado feature table; data_source: 'GenBank' or 'GFF' to handle GenBank and GFF data differently. change pub.miniref to pub.uniquename to conform to chado schema change.
Edits
Bio::SeqIO::chaos
Modules for producing chaos-xml
Tidied up ID handling
Rationalised ID generation code in chaosxml write adapter; changed verbose reporting in unflattener to use stderr
Bio::SeqIO::chaosxml
Modules for producing chaos-xml
Bio::SeqIO::embl
Handle Feature Table-less records and set unknown id to something without spaces
Remove debugging
Parse sequences with no FT
Added extraction of NCBI taxon ID to the embl parser.
James Abbott's patch
Merged J. Abbott's patch and the recognition of the NCBI taxon ID from the main trunk.
All Bio::DB sequence retrieval modules only warn not throw on missing seq
Space required in ID line between accession and 'standard'
Indent so that table looks good
Consolidate this into a single test
Issue #1662. copied code from genbank to embl to avoid truncation of long feature values; word wrapping added to warning
Previous fix on Issue #1662 highlighted a spurious warning in code. fixed
Added return values to all writing statements, so that write_seq() now returns 1 on success and undef on failure. (The documentation said that it did already, but the docs were incorrect)
Apply Simon's patch
Nathan Haigh's suggestions to call return instead of last when short circuiting, also tightened up the code a little, need to protect calls to regexps
Resolve Issue #1618 for long, over 10 character, ids
Bio::SeqIO::fasta
Quash errors when display_id is undef
Protect for empty value possibility (to get rid of warnings)
Avoid undef warnings when no id for a fasta sequence (which is legal)
Allow alphabet to be set for SeqIO::fasta, thus determining the alphabet of all its sequences - doesn't then rely on _guess_alphabet, but limited to objects that contain only one type of alphabet if this is to be used
Bio::SeqIO::game::featHandler
Made changes to fix broken protein ID handling, various cosmetic changes
Various patches applied to get round-tripping with the Generic Genome Browser working
Bio::SeqIO::game::gameHandler
Various patches applied to get round-tripping with the Generic Genome Browser working
Bio::SeqIO::game::gameSubs
Made changes to fix broken protein ID handling, various cosmetic changes
Various patches applied to get round-tripping with the Generic Genome Browser working
Bio::SeqIO::game::gameWriter
Made changes to fix broken protein ID handling, various cosmetic changes
Various patches applied to get round-tripping with the Generic Genome Browser working
Bio::SeqIO::game::seqHandler
Made changes to fix broken protein ID handling, various cosmetic changes
Various patches applied to get round-tripping with the Generic Genome Browser working
Bio::SeqIO::genbank
Fix missing 6 trailing spaces after ORIGIN when writing GenBank
Merge fix for trailing space onto branch
Documentation fixes.
Issue #1588 -- Fix parsing problem when there is PUBMED id but not MEDLINE id in the REFERENCE block
Added extraction of NCBI taxon ID to the embl parser.
Now accepts unconventional Organism names. Have given examples in head2. Will sort out embl.pm and others that need doing as well as updating tests soon.
I seem to have altered the contributors list, @ to at, unintentionally
Updated old version previously. this one is correct.
Accepts another instance of strange Organism naming.
Some valid, if unconventional, species labels had been split into subspecies. Not anymore.
Exit the loop after the taxon was found. Purely for efficiency.
Bug fix: stray unmatched parenthesis in SOURCE line, messed up a regex.
Tentative fix for Issue #1650 - comment lines truncated in genbank writing
Indent so that table looks good
Issue #1662. copied code from genbank to embl to avoid truncation of long feature values; word wrapping added to warning
Previous fix on Issue #1662 highlighted a spurious warning in code. fixed
Fix Issue #1673 -definition line parsing not quite proper before heading to REFERENCE parsing
Skip writing ORIGIN and BASE count when dealing with a CONTIG record. Issue #1314 should be fixed now.
Parse WGS records
Bio::SeqIO::interpro
Adding support for parsing IPRscan output XML files.
Added documentation to Bio::SeqIO::interpro.pm
Dos2unix
Made SYNOPSIS section to compile, helped by "cd maintenance; ./modules.pl --synopsis"
Bio::SeqIO::kegg
Keggification -- parse pathway name now
Update docs wrt pathway
Cosmetics -- Albert Vilella
Entry_id now can cope with .N (versioning) - changed some parameters to be optional, as it seems some kegg files, like ftp://ftp.genome.ad.jp/pub/kegg/genomes/genes/L.major.ent, don't have, for example, NAME or other fields -- Albert Vilella
Bio::SeqIO::metafasta
Edits
Bio::SeqIO::qual
Added the ability to specify a header when calling write_seq on a qual
Write_seq can be passed either of 2 different objects, but only 1 of these has a header() method. Let's check to see what we have before we call header().
I like this syntax better...
Rearrange HEADER, call header() before id()
Bio::SeqIO::scf
Because of a 'feature' discovered by Anthony Underwood (Hi Anthony!) I made a change to scf.pm. The original specification for scf as found here (http://staden.sourceforge.net/manual/formats_unix_8.html) allows the program writing scf to place base information before sample information in the file. I did not detect that behavior in any of my programs so I read the samples and then the bases sequentially. I changed scf.pm to 'seek' around the file looking for information based on what is contained in the header rather than assuming a given program behaves logically. I wasn't able to find a 'seek' in any of the bioperl IO modules so I used a perl call. I hope that doesn't run into any platform-specific seek issues.
Remove warnings - make them debug stmts
Make obj dump a debug stmt
Fixed version issue in writer subroutine
Fixed missing last base bug.
Bio::SeqIO::swiss
Fixed Issue #1584
Fixed Issue #1584 on branch
All Bio::DB sequence retrieval modules only warn not throw on missing seq
Now allows unconventional OS labels.
Indent so that table looks good
Major overhaul - SeqIO code needs an audit! Parsing of multi-lined RP lines and better usage of the _pushback function in Bio::Root::IO. Create all the references at once as well
Added capability to parse new gene name (GN) line format.
Added recognition and capability of dealing with RG line in swissprot format.
Fixed reference parsing if the reference lacking an RA line is not the last one. Fixed species parsing to not generate a warning if genus is undefined for environmental sample seqs.
Fix problem when species name contains regexp special characters
Fixed parsing of references.
Partial fix for Issue #1734
Bio::SeqIO::tab
Check for tabs in id before writing out
Bio::SeqIO::tigr
Fixed a bug in the KEYWORD regex. Added the _process_tiling_path function
Fixed my e-mail address.
Fix for newly used IS_PRIMARY
Fixed GO regex
Bio::SeqIO::tigrxml
Maybe not perfect version of tigrxml - I might move this somewhere else later
Grab the description from TIGRXML
Cleanup slightly
Get the ASMBL ID properly, although not really necessary since we grab accession later on
Get the fname properly
Bio::SeqIO::tinyseq
Initial check-in of modules for parsing NCBI TinySeq xml sequences
Added method to Bio::SeqIO::tinyseq to write taxid/organism info if available. Added method to Bio::SeqIO::tinyseq::tinyseqHandler.pm to reject docs w/o tinyseq DTD
Bio::SeqIO::tinyseq::tinyseqHandler
Initial check-in of modules for parsing NCBI TinySeq xml sequences
Added method to Bio::SeqIO::tinyseq to write taxid/organism info if available. Added method to Bio::SeqIO::tinyseq::tinyseqHandler.pm to reject docs w/o tinyseq DTD
Bio::SimpleAlign
Started adding sequence alignments that store sequences in a file. coding by Albert Vilella
Applying Dmitry Samborsky's patch
Dmitry's patch
Fixing indentation
Dmitri's patch
Accept score as an init option
Handle a couple of odd cases I invented, use the deprecated method
Let 0 be a valid state, so look for undefined chars
Brad F's suggestion to propagate strand down to newly created seqs in a slice. Also a third option to the slice() function allows slice to return columns in the slice which are all gaps
Bio::SimpleAnalysisI
Fix Revision string problems
Bio::Species
Wes' fix
Bio::Structure::IO::pdb
Dave's patch
Bio::Structure::SecStr::DSSP::Res
Updated DSSP parser to successfully deal with DSSP output generated from ATOM-line only input files.
Bio::Taxonomy::Node
Taxonomy::Node can now masquerade as a Bio::Species
Add API for supporting retrieval of child nodes
Bio::Tools::Alignment::Consed
Fixed a fault when a singlets file does not exist.
Bio::Tools::Analysis::Protein::GOR4
Debug for 1.5 release
Bio::Tools::Analysis::Protein::Scansite
Improved docs and argument processing to handle server errors
Bio::Tools::BPbl2seq
Attempt to cleanup memory cycles- Issue #1581
Bio::Tools::BPlite
Localize $_, use /o on some regexps for potential speed, and apply patches for Issue #1668 reported by Michael Cariaso; parsing support for MPIblast output
Bio::Tools::BPlite::Sbjct
Applied Frederic Pecqueur's fix for subset of databases
Localize $_, use /o on some regexps for potential speed, and apply patches for Issue #1668 reported by Michael Cariaso; parsing support for MPIblast output
Bio::Tools::Blat
Neil's patch
Bio::Tools::CodonTable
Test translate gaps - condense 3 gaps to a single gap for translate to work instead of 'X'
In translate, triplet gaps are made into a single gap in the resultant protein
New method, reverse_translate_all(), which reverse translates the whole aa sequence to an IUPAC nucleotide string
Enhanced reverse_translate_all() now takes a codon usage table as an argument
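The gap handling described for translate (a codon made up entirely of gap characters becomes a single gap in the protein, rather than 'X') can be sketched in Python. This is an illustration, not the CodonTable code: the function name is hypothetical, the codon table is deliberately tiny, and treating both '-' and '.' as gap characters is an assumption.

```python
# Deliberately tiny codon table: an assumption for illustration, not the full genetic code.
CODONS = {"ATG": "M", "AAA": "K", "GGC": "G", "TAA": "*"}

# Assumed gap characters.
GAP_CHARS = "-."


def translate_with_gaps(dna: str) -> str:
    """Translate codon by codon; an all-gap triplet becomes one '-' instead of 'X'."""
    protein = []
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = dna[i:i + 3].upper()
        if all(c in GAP_CHARS for c in codon):
            protein.append("-")
        else:
            # Unknown or partially gapped codons remain ambiguous.
            protein.append(CODONS.get(codon, "X"))
    return "".join(protein)


aligned_protein = translate_with_gaps("ATG---AAAGGC")
partially_gapped = translate_with_gaps("ATGA-GTAA")
```

Collapsing whole-codon gaps this way keeps a translated alignment in register with its nucleotide alignment, while a codon that is only partly gapped still yields 'X'.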
Bio::Tools::EPCR
Apply Malcolm Cook's patch to make the module more flexible. Also capitalize the 'Note' field and add supporting get/set methods for source,primary,groupclass
Last bit of Malcolm Cook's changes I forgot
Bio::Tools::Fgenesh
Added Christopher Dwan's Fgenesh.pm
Parse with or without leading whitespace
Dos2unix
Bio::Tools::GFF
Make parse header not display warnings when opening a writer stream as it always calls _parse_header - not a good way to prevent this so turn off the warnings
Risking another stint in purgatory, I bravely change the GFF formatter. The changes are: 1) new format, GFF2.5, is available. The only difference is in the way that it formats Target tag/values in column 9 using the tstart and tend tags. 2) when calling gff_string on a SeqFeature::FeaturePair it will now use the 'hit' sequence as its Reference (column 1), rather than the 'query' sequence; this is the behaviour that Gbrowse expects, and it just makes more sense that way IMO. GFF3 format now forces lowercase on the first character of non-reserved words, since the spec reserves all ucfirst tags for reserved words in the future. At the moment the list of reserved words is hard-coded, so this will need to be looked at in the future. GFF3 format now produces the Target tag with proper format, and uses information from the 'query' sequence to get its information, rather than the 'hit' sequence. If this breaks anyone, I'm sure I will hear the screams.
Squashed a bug in the GFF3 formatter. The GbrowseGFF formatter is now updated to be GFF3 compliant. It also uses GFF3 by default now.
Oops. GFF3 is super-sensitive to spaces in column 9. All tests pass now.
See the bioperl mailing list for the full description. The script seq/unflatten_seq will now generate GFF3: the unflattener module is used to build the 'feature graph' connecting genes, transcripts, exons and CDSs together. This means we can have GFF3 for anything in GenBank! As far as I'm aware, the only other sensible output formats to use here (i.e. formats that support feature graphs/containment hierarchies) are chado, chaos, and the write-only asciitree. This feature graph is written out in the GFF3 using the ID and Parent tags. To do this there is an extra intermediate step: the bioperl FeatureHolderI hierarchy is traversed and ID/Parent tags are generated. Here is a description of the changes I have made (unless you're a bioperl hacker you don't really need to read the rest of this). You can get the context of what I'm on about from this thread: http://bioperl.org/pipermail/bioperl-l/2003-December/014150.html New public methods: FeatureHolderI->set_ParentIDs_from_hierarchy sets both ID and ParentID from the FeatureHolder hierarchy; SeqFeatureI->generate_unique_persistent_id is required by the above method (Lincoln wanted this to be private, but I think it has to be called from outside); FeatureHolderI->create_hierarchy_from_ParentIDs is the inverse of set_ParentIDs_from_hierarchy
Protect for empty value possibility (to get rid of warnings)
Added ability to parse sequence data in GFF3 - see NOTES section & email to bioperl list for details
Adopted Aaron's suggestions for method names
Altering _from_string for GFF3 to specify tab delimited and thus allow unescaped spaces--the GFF3 spec changed in this regard
Fix Issue #1690 and fix some code formatting
Non-existent 'source_tag' entries should give a '.' in the gff string. Fixed this.
Disallowing double quotes in GFF3
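The intermediate step described above, where a containment hierarchy is traversed and ID/Parent tags are generated for GFF3 output, can be sketched roughly as follows. This is an illustrative Python sketch rather than the BioPerl code: the `Feature` class, the `gff3_lines` function, and the ID scheme (feature type plus a running counter) are all assumptions made for the example. A missing source is written as '.', as in the entry above.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Feature:
    """Hypothetical stand-in for a feature that can hold sub-features."""
    seqid: str
    type: str
    start: int
    end: int
    strand: str = "+"
    children: list = field(default_factory=list)


def gff3_lines(roots, source="."):
    """Walk the hierarchy depth-first and emit GFF3 lines with ID/Parent."""
    counter = count(1)
    lines = []

    def visit(feat, parent_id):
        feat_id = f"{feat.type}{next(counter)}"   # assumed ID scheme
        attrs = [f"ID={feat_id}"]
        if parent_id is not None:
            attrs.append(f"Parent={parent_id}")
        cols = [feat.seqid, source, feat.type, str(feat.start),
                str(feat.end), ".", feat.strand, ".", ";".join(attrs)]
        lines.append("\t".join(cols))             # GFF3 is tab delimited
        for child in feat.children:
            visit(child, feat_id)

    for root in roots:
        visit(root, None)
    return lines


gene = Feature("chr1", "gene", 100, 900, children=[
    Feature("chr1", "mRNA", 100, 900, children=[
        Feature("chr1", "exon", 100, 300),
        Feature("chr1", "exon", 600, 900),
    ]),
])
example_lines = gff3_lines([gene])
```

Each child line carries a Parent tag pointing at the ID generated for its container, which is what lets a reader rebuild the hierarchy from the flat file.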
Bio::Tools::Genewise
One step these pattern matches
Bio::Tools::IUPAC
New hash added to map nucleotide combinations -> IUPAC code
Prevent warning of GT deprecation
Requested feature
Bio::Tools::Phylo::PAML
Parse kappa properly for codeml reports
Merge changes for parsing kappa to branch
Add kappa info to the docs
Parse omega when it is fixed as well as when it is estimated - undo problem introduced when applying kappa fix
Added why PAML short circuited - need to add check to warn specifically in the case of stop codons
Strip whitespace so nodenames are proper
BASEML parsing (partial - pairwise only for now)
Parse tree data too, not just pairwise
Grab rate matrix
Bio::Tools::Phylo::PAML::Result
Avoid falling over when there are no nssite results
Merge fix for not failing when no NSSites were run
Update doc - it is a Bio::Matrix::PhylipDist object now
BASEML parsing (partial - pairwise only for now)
Bio::Tools::Phylo::Phylip::ProtDist
Shorten code
Bio::Tools::Primer::AssessorI
Fix a bug
Bio::Tools::Primer3
Michael's patch applied
Bio::Tools::Run::GenericParameters
Created wrappers for the statistics and parameters in a generic result. I geared this specifically for Blast in the name of Java interoperability. Interoperability was spoiled by the use of a hash to represent statistics and parameters. This efficiently and elegantly fixes the problem.
Bio::Tools::Run::ParametersI
Created wrappers for the statistics and parameters in a generic result. I geared this specifically for Blast in the name of Java interoperability. Interoperability was spoiled by the use of a hash to represent statistics and parameters. This efficiently and elegantly fixes the problem.
Bio::Tools::Run::RemoteBlast
Modified RemoteBlast.pm to understand the entire QBlast PUT/GET API.
Made independent of tempfiles - results held in memory
Explicit file handle close stops tempfile errors
Add RETRIEVALHEADER example
Bio::Tools::Run::StandAloneBlast
Modified _setparams so queries against multiple databases can use dbs in default directory ($BLASTDATADIR) or be specified by full path.
Issue #1599
Issue #1599
Correct some documentation
New versions of bl2seq require a program to be provided; they no longer default to blastp
Old doc commits
Documentation fix
Quiet and q parameter were getting stored in the same slot... Another reason I HATE AUTOLOAD
Added -R option for PSI-TBLASTN per Issue #1137
Back out duplicate
Handle files which have been segmented
Update allowable blastpgp params
Bio::Tools::Run::WrapperBase
Argument get/set for passing commandline parameters to exec command
Docu update
Bio::Tools::SeqStats
0.01 dalton adjustments to amino acid molweights to match those in SWISS-PROT.
Add 1 significant digit to molecular weight result
Bio::Tools::SeqWords
Fix some documentation
Bio::Tools::SiRNA
Modified to allow oligos in 3prime UTR as an option. Fixed calls to $target->start / $target->end. Modified to use any Bio::SeqI compliant object as a target.
Moved Don's fixes from head
Extensive revision to support multiple rulesets as Bio::Tools::SiRNA::ruleset subclasses
Bio::Tools::SiRNA::Ruleset::saigo
Initial check-in of bioperl object for designing siRNA reagents using the ruleset published by Ui-Tei et al.
Bio::Tools::SiRNA::Ruleset::tuschl
Initial check in of bioperl object for designing siRNA reagents using rules developed by the Tuschl group (note - these used to be part of Bio::Tools::SiRNA).
Bio::Tools::Sim4::Results
Issue #1644 fixed - distinguish between EOF and no exons found
I'm paranoid - let's not really force the line to start with this
Issue #1644 fixed on branch - distinguish between EOF and 'no-exons-found'
Bio::Tools::Spidey::Exon
Add Ryan's modules
Bio::Tools::Spidey::Results
Add Ryan's modules
Bio::Tools::WebBlat
Added a module that runs a blat at ucsc using the standard web form cgi
Doc update
Bio::Tools::dpAlign
Implemented global alignment using dynamic programming.
Now supports custom substitution matrix for protein
Fixed the write_pretty_str_align call
Fixed line 376 with call to Bio::Ext
Fully qualify the 'Align' package
Bio::Tools::tRNAscanSE
tRNAscan-SE parser of tRNA location predictions in genome; not parsing structure at this time
Bio::Tree::DistanceFactory
Distance Factory for generating phylogenetic trees from pairwise distance matrices - UPGMA implemented
Neighbor-Joining implemented, WooHoo
Removed left-over that perl would complain about.
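For readers unfamiliar with the method, building a tree from a pairwise distance matrix with UPGMA can be sketched as below. This is a generic textbook-style Python sketch, not the DistanceFactory code: the `upgma` function, its list-of-lists input, and the Newick string output are assumptions chosen for the example.

```python
def upgma(names, matrix):
    """Cluster taxa by UPGMA and return a Newick string.

    names  -- list of taxon labels
    matrix -- symmetric list-of-lists of pairwise distances
    """
    # Active clusters: id -> (newick subtree, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)

    while len(clusters) > 1:
        # Join the closest pair of clusters.
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])
        sub_a, size_a, h_a = clusters.pop(a)
        sub_b, size_b, h_b = clusters.pop(b)
        height = d_ab / 2.0
        subtree = f"({sub_a}:{height - h_a:g},{sub_b}:{height - h_b:g})"

        # Distance to every remaining cluster is the size-weighted average.
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_id)] = ((size_a * d_ac + size_b * d_bc)
                                      / (size_a + size_b))
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)

        clusters[next_id] = (subtree, size_a + size_b, height)
        next_id += 1

    (tree, _, _), = clusters.values()
    return tree + ";"


example_tree = upgma(["A", "B", "C"],
                     [[0, 2, 4],
                      [2, 0, 4],
                      [4, 4, 0]])
```

UPGMA assumes a constant rate of change along all lineages; Neighbor-Joining, mentioned in the next entry, relaxes that assumption and is not shown here.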
Bio::Tree::Draw::Cladogram
A Cladogram drawing module
A Cladogram and Tanglegram drawing module
Bio::Tree::Node
Support non-numeric values for bl for PAML labels - although they should probably go on bootstraps
Support new method for auto-quoting ids which have normally unallowed values in node ids and labels
Leaf nodes have height 0
Bio::Tree::NodeI
Support new method for auto-quoting ids which have normally unallowed values in node ids and labels
Depth function to get 'how far we are from the root'. Height is 'how far we are from the bottom (0)'. Small change to get_all_Descendents so that DFS should work properly now
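The distinction between depth and height drawn above can be made concrete with a small Python sketch. The `Node` class here is hypothetical and only mirrors the definitions in these notes: depth sums branch lengths up to the root, height is the longest path down to a leaf, and leaf nodes have height 0.

```python
class Node:
    """Minimal tree node with a branch length to its parent."""

    def __init__(self, name, branch_length=0.0):
        self.name = name
        self.branch_length = branch_length
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def depth(self):
        """Sum of branch lengths from this node up to the root."""
        total, node = 0.0, self
        while node.parent is not None:
            total += node.branch_length
            node = node.parent
        return total

    def height(self):
        """Longest path from this node down to a leaf; 0 for leaves."""
        if not self.children:
            return 0.0
        return max(c.branch_length + c.height() for c in self.children)


root = Node("root")
inner = root.add_child(Node("inner", 1.0))
leaf_a = inner.add_child(Node("A", 2.0))
leaf_b = root.add_child(Node("B", 0.5))
```

In this example `leaf_a` has depth 3.0 and height 0, while `root` has depth 0 and height 3.0.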
Bio::Tree::Statistics
A simplistic (and probably still flawed) bootstrap counting method - need to switch to consensus tree building instead I expect
Bio::Tree::TreeFunctionsI
Document the function
Doc inconsistent with method name
Doc fix
Correction to how reroot works when requested for a 'leaf' node and how internal 'fake' root nodes are removed during the re-root process. Thanks to John Calley for illustrating the bug
Bio::TreeIO::TreeEventBuilder
Support labeling root nodes
Bio::TreeIO::cluster
Guillaume Rousse's code for turning Algorithm::Cluster::treecluster into a tree
Bio::TreeIO::lintree
Fix Issue #1614 - was returning empty trees instead of undef when got to end of file
Valentin's fix for long taxaids; Issue #1625
Valentin's fix; Issue #1616
Merge on branch Valentin's fix; Issue #1616
Bio::TreeIO::newick
Print out a tree count at the beginning of the tree output which is needed for things like protml and codeml
Special case for PAML branch labels
Rename doc to 'new' to avoid confusion
Support labeled root label, fix problem with single node clades and their ancestors getting written as sister nodes. Added new function id_output which auto quotes ids which contain spaces, parens, semicolons, or commas
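The auto-quoting rule described for id_output can be sketched as follows. This Python version is illustrative only: it reuses the name `id_output` for readability, and the choice of a double quote as the quoting character is an assumption, not something stated in these notes.

```python
import re

# Characters that would be ambiguous inside a Newick string:
# whitespace, parentheses, semicolons and commas.
_NEEDS_QUOTING = re.compile(r"[\s();,]")


def id_output(node_id, quote='"'):
    """Return the id, wrapped in quotes if it contains special characters."""
    if node_id is None or node_id == "":
        return None
    if _NEEDS_QUOTING.search(node_id):
        return f"{quote}{node_id}{quote}"
    return node_id
```

An id such as `Homo sapiens` is wrapped in quotes, while a plain id such as `taxon1` is returned unchanged.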
Bio::TreeIO::nexus
Support MrBayes produced trees
Some cleanup- wasn't matching all nexus formats
Handle nexus trees a little better I hope - more testing needed
Issue #1656
Back out attempted MrBayes parsing patch - shouldn't be needed; #1619
Support writing NEXUS format trees - still need some more documentation
Bio::TreeIO::pag
Added pagel format output
Documentation added
Based on Mark/Daniel's code more closely - still needs work to fully support the phylip2pag module
Print trait count based on the number of traits seen (we assume that traits have been consistently assigned for all tip nodes; better code to check this in the future)
Bio::TreeIO::svggraph
Parameterize values in SVG writing - from Guillaume Rousse
Bio::WebAgent
Fix Revision string problems
FAQ
Added a FAQ about the project design and links to mailing list
Merge FAQ changes onto branch
Where is Factory::EMBOSS?
Oops - Brian accidentally committed html not txt here
Regen faq with updates
Info about bioperl-ext
Remove 7
Add back
Added citation to MBE paper using Bio::PopGen modules
Makefile.PL
Added SVG::Graph dep
Add module names that use SVG or SVG::Graph
Add Clone and AutoClass
Mention XML::SAX dependency
biodatabases.pod
Paragraph about OBDA
Add over/back
Correct bioperl-db module names, some editing
bioperl.pod
Sentence on HOWTOs
Edits. Remove repetitive, rearrange
bioscripts.pod
Extra line
Add mention of nexus2nh.pl
Removed script
Add mention of contig_draw.PLS
Add mention of search2table.PLS
Added a script to turn fasta -m9 into NCBI-like m9 output
Added pagel format output
Add mention of bp_embl2picture.PLS
Remove this mention, Tools::RestrictionEnzyme is no longer supported
Describe genbank2gff3.PLS
bptutorial.pl
Change version
Add maf to AlignIO list
Add tinyseq
A little more detail
Add mention of dpAlign
Fix links, add links
Fix Issue #1598
Old link
Add mention of AlignIO po format
Add KEGG to list of formats
Require XML::Writer so that tutorial.t fails more gracefully
Add Jurgen's example
Add AGAVE
Add mention of interpro
List of formats in 1 file, not 3
Edits
s/desc/description/; suppress empty string warnings on print
doc/makedoc.PL
Dos2unix
examples/biblio/biblio_examples.pl
Useful example
Show ultra-abbreviated version as well
Vocabulary, title
Note that there are 3 repositories whose contents aren't necessarily the same!
Dos2unix
examples/contributed/rebase2list.pl
Dos2unix
examples/generate_random_seq.pl
Dos2unix
examples/longorf.pl
Dos2unix
examples/revcom_dir.pl
Dos2unix
examples/searchio/blast_example.pl
Format
examples/searchio/custom_writer.pl
Added -fh=>\*ARGV to SearchIO input constructor calls to enable STDIN or @ARGV reading, given change in Root::IO::_readline() as of release 1.303.
examples/searchio/hitwriter.pl
Added -fh => \*ARGV to the input SearchIO constructor call. Added note about which cols require HSP alignment data. Outputting query length and number of hits for each report.
Exercising the interfaces a little more.
Migrating revision 1.2.2.1 to the main trunk.
examples/searchio/hspwriter.pl
Added -fh=>\*ARGV to SearchIO input constructor calls to enable STDIN or @ARGV reading, given change in Root::IO::_readline() as of release 1.303.
examples/searchio/htmlwriter.pl
Added -fh=>\*ARGV to SearchIO input constructor calls to enable STDIN or @ARGV reading, given change in Root::IO::_readline() as of release 1.303.
examples/searchio/psiblast_features.pl
Added -fh=>\*ARGV to SearchIO input constructor calls to enable STDIN or @ARGV reading, given change in Root::IO::_readline() as of release 1.303.
examples/searchio/psiblast_iterations.pl
Added -fh=>\*ARGV to SearchIO input constructor calls to enable STDIN or @ARGV reading, given change in Root::IO::_readline() as of release 1.303.
examples/searchio/rawwriter.pl
Added -fh=>\*ARGV to SearchIO input constructor calls to enable STDIN or @ARGV reading, given change in Root::IO::_readline() as of release 1.303.
examples/searchio/resultwriter.pl
Added -fh=>\*ARGV to SearchIO input constructor calls to enable STDIN or @ARGV reading, given change in Root::IO::_readline() as of release 1.303.
examples/searchio/waba2gff.pl
Added -fh=>\*ARGV to SearchIO input constructor calls to enable STDIN or @ARGV reading, given change in Root::IO::_readline() as of release 1.303.
examples/searchio/waba2gff3.pl
Unroll WABA states into suitable GFF3
examples/structure/structure-io.pl
Structure example
Dos2unix
examples/tools/restriction.pl
Remove this, Tools::RestrictionEnzyme is no longer supported
examples/tools/run_genscan.pl
Better, more Bioperl-ish
Correction
examples/tools/standaloneblast.pl
Cleanup per Issue #1598 due to migration to SearchIO by default instead of BPlite
Small api change fix as well
Fix per Issue #1598 on the branch
Dos2unix
maintenance/authors.pl
Add Id line
maintenance/modules.pl
Added definitions of class categories as written down by Albert
Option --dir was not documented
Add Id line
maintenance/pod.pl
Add Id line
maintenance/version.pl
Add the version declaration for each module
Add Id line
models/README
Add mention that these are version 1.0 diagrams
scripts/Bio-DB-GFF/bp_genbank2gff.PLS
Updating docs
scripts/Bio-DB-GFF/bulk_load_gff.PLS
Added a workable mechanism to change the grouping behavior in the ninth column of GFF2
Added some timing diagnostics to the gff load scripts; you will need Time::HiRes to see this feature
Quenched a bug that prevented Bio::DB::GFF from loading attributes whose values are zero.
Removed debugging code inadvertently left in the bulk loader
Changed streaming behavior to allow multiple embedded sequences; added option to process very large numbers of GFF files (> kernel limit for command line arguments)
Added warnings regarding the maxbin value
Fixed a mess of errors introduced when I tried to add bin overflow checking to the GFF loaders
Give a hint as to what the size is so we can more easily adjust MAX_BIN
Brought fast_load_gff into sync with bulk_load_gff with respect to 1) support for multiple embedded sequences and 2) support for large numbers of files (> kernel limit for command line args); removed STDIN dependency in both scripts for standalone --fasta loading
Merged Lincoln's maxfeature changes back into fast_load_gff; fixed regex that was not sorting compressed fasta and gff files in @ARGV properly
scripts/Bio-DB-GFF/fast_load_gff.PLS
Added a workable mechanism to change the grouping behavior in the ninth column of GFF2
Added some timing diagnostics to the gff load scripts; you will need Time::HiRes to see this feature
Added warnings regarding the maxbin value
Fixed a mess of errors introduced when I tried to add bin overflow checking to the GFF loaders
Brought fast_load_gff into sync with bulk_load_gff with respect to 1) support for multiple embedded sequences and 2) support for large numbers of files (> kernel limit for command line args); removed STDIN dependency in both scripts for standalone --fasta loading
Merged Lincoln's maxfeature changes back into fast_load_gff; fixed regex that was not sorting compressed fasta and gff files in @ARGV properly
scripts/Bio-DB-GFF/genbank2gff3.PLS
A script to convert GenBank flatfiles into GFF3 suitable for gbrowse on a Bio::DB::GFF backend. It uses Chris Mungall's Unflattener libraries and borrows elements from Scott Cain's GenBank unflattener. This script handles gene-related feature IDs differently to avoid loss of alternative splice variants that have common three prime and five prime ends. It has been tested on the refseq genbank-format files for the mouse and human genome builds as well as a few 3rd party annotated genbank accessions.
Modified TypeMapper.pm to translate 'Protein' to 'protein' (perhaps there ought to be a general 'lower' and then deal with exceptions like CDS). Also modified genbank2gff3.PLS to produce a full reference sequence line in addition to the ##sequence-region directive.
Fixing a bug where '0' would print for a gff line if problems happened
Added --ethresh option for unflattener; setting this high raises the threshold at which errors are showstoppers
scripts/Bio-DB-GFF/load_gff.PLS
Added a workable mechanism to change the grouping behavior in the ninth column of GFF2
Added warnings regarding the maxbin value
Fixed a mess of errors introduced when I tried to add bin overflow checking to the GFF loaders
scripts/Bio-DB-GFF/meta_gff.PLS
Added simple Bio::DB::GFF meta-data getter/setter
scripts/Bio-DB-GFF/pg_bulk_load_gff.PLS
Mirroring the group preference stuff Lincoln added to the other loaders
scripts/Bio-DB-GFF/process_ncbi_human.PLS
Script out of date and being removed until a replacement script can be produced
scripts/DB/bioflat_index.PLS
Fixed transcript glyph so that nonstranded features are not automatically treated as + strand
scripts/DB/biogetseq.PLS
Fixed Bio::DB::Failover to properly pass get_seq_by_version() method, and fixed Bio::DB::Flat::BDB to properly implement it
scripts/biographics/bp_embl2picture.PLS
Added the EMBL/Genbank entry renderer to the scripts directory, since it is probably of general interest
scripts/biographics/bp_glyphs2-demo.PLS
Added new glyphs
scripts/graphics/contig_draw.PLS
This is the start of another demo that draws contigs.
scripts/graphics/frend.PLS
Fixed handling of reverse strands in the super-short version of the featureFile renderer
scripts/index/bp_fetch.PLS
Allow ":" in the sequence id name
scripts/index/bp_index.PLS
Case insensitive search for the module to use
scripts/searchio/README
Added a script to turn fasta -m9 into NCBI-like m9 output
scripts/searchio/fastam9_to_table.PLS
Added a script to turn fasta -m9 into NCBI-like m9 output
Oops - fix so that we match lines where number is > 999
Deal with possibility of regexp in the sequence name
scripts/searchio/search2table.PLS
This looks like a *PLS script to me!
scripts/searchio/search2table.pl
SearchIO reports into a tabular format like NCBI's -m 9 format
This looks like a *PLS script to me!
scripts/seq/seqconvert.PLS
Fixed it so that it recognises chadoxml
scripts/seq/split_seq.PLS
Support cmdline and STDIN file with transparent ARGV handle
scripts/seq/translate_seq.PLS
Lincoln's changes to initialize_io mean that magic <> will not pick up cmdline stuff - either pass in on cmdline
Default to fasta format
scripts/seq/unflatten_seq.PLS
See the bioperl mailing list for the full description. The script seq/unflatten_seq will now generate GFF3: the unflattener module is used to build the 'feature graph' connecting genes, transcripts, exons and CDSs together. This means we can have GFF3 for anything in GenBank! As far as I'm aware, the only other sensible output formats to use here (i.e. formats that support feature graphs/containment hierarchies) are chado, chaos, and the write-only asciitree. This feature graph is written out in the GFF3 using the ID and Parent tags. To do this there is an extra intermediate step: the bioperl FeatureHolderI hierarchy is traversed and ID/Parent tags are generated. Here is a description of the changes I have made (unless you're a bioperl hacker you don't really need to read the rest of this). You can get the context of what I'm on about from this thread: http://bioperl.org/pipermail/bioperl-l/2003-December/014150.html New public methods: FeatureHolderI->set_ParentIDs_from_hierarchy sets both ID and ParentID from the FeatureHolder hierarchy; SeqFeatureI->generate_unique_persistent_id is required by the above method (Lincoln wanted this to be private, but I think it has to be called from outside); FeatureHolderI->create_hierarchy_from_ParentIDs is the inverse of set_ParentIDs_from_hierarchy
Moved ID methods to new class IDHandler see thread 'new GFF3 support methods' http://bioperl.org/pipermail/bioperl-l/2004-March/thread.html
Rationalised ID generation code in chaosxml write adapter; changed verbose reporting in unflattener to use stderr
scripts/taxa/local_taxonomydb_query.PLS
Some slightly more interesting examples
scripts/tree/TAG
Added pagel format output
scripts/tree/nexus2nh.PLS
Convert to proper PLS
scripts/tree/nexus2nh.pl
Convert nexus trees to New Hampshire (but maintain long taxon names)
Convert to proper PLS
scripts/tree/tree2pag.PLS
Added pagel format output
scripts/utilities/bp_sreformat.PLS
Autodetect MSA formats, also die with different message when file cannot be opened
Support same default option of grabbing file from cmdline
Accept cmd-line stuff again
Allow cmdline setting of displayname to flat
scripts/utilities/dbsplit.PLS
Some defaults
scripts/utilities/mutate.PLS
Little bug
Dos2unix
scripts/utilities/pairwise_kaks.PLS
More verbose warnings
Use uc for uppercase
scripts/utilities/search2gff.PLS
search2gff does the right thing now: strand is handled properly
More cleanup for GFF3 support
Simplify code - get min/max for match from HSP data rather than computing from tile overlap - get match container score from hit->bits
Quiet if wanted
scripts/utilities/search2tribe.PLS
Support old behavior