Talk:Bioperl scripts

From BioPerl

About bp_bulk_load_gff.pl

The description says "This script loads a mySQL Bio::DB::GFF database...". What does that mean? I think it means "This script creates a MySQL database that can function as a back end for the Bio::DB::GFF sequence feature annotation object...".

What confuses me is that there are several sequence feature annotation databases, such as BioSQL and Chado, which are both relational schemas (BioSQL supports MySQL, while Chado primarily targets PostgreSQL). One question could be:

  • How does the database created by bp_bulk_load_gff relate to the BioSQL or Chado databases?

Another related question could be:

  • Can BioSQL or Chado databases function as a back end for Bio::DB::GFF?


A more basic question is: what is Bio::DB::GFF, both technically and practically?

A good description that illuminates some of this discussion is given here (surprisingly): [1]


What I understand is this: GFF is a flat-file 'sequence feature annotation' format. Flat files are problematic because they are not indexed and can become very large. Bio::DB::GFF is an 'interface' to sequence feature annotations that can read and write annotations in a variety of formats, including a relational implementation of the flat file. The problem here is that the word 'format' is used in two different senses. Perhaps it's better to say that GFF is a sequence feature annotation schema, and that this schema can be stored in several formats (flat file or RDB). Other sequence feature annotation schemas exist, such as BioSQL and Chado. Does that seem reasonable?

Thanks for help --DanBolser 11:02, 19 January 2009 (UTC)
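
To make the flat file versus relational distinction above concrete, here is a minimal Python sketch (not BioPerl, and not the actual Bio::DB::GFF schema). It assumes the standard nine tab-separated GFF columns; the sample data, the table layout, and the names `parse_gff`, `load_features`, and `features_overlapping` are all hypothetical and chosen only for illustration. It parses GFF lines and loads them into an indexed SQLite table, so that a region query no longer needs to scan the whole file.

```python
import sqlite3

# Hypothetical sample data: three GFF-style lines (nine tab-separated columns).
GFF_TEXT = (
    "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=geneA\n"
    "chr1\tsrc\texon\t100\t200\t.\t+\t0\tParent=geneA\n"
    "chr2\tsrc\tgene\t50\t900\t.\t-\t.\tID=geneB\n"
)


def parse_gff(text):
    """Yield one tuple per feature line, skipping blanks and '#' comments."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        seqid, source, ftype, start, end, _score, strand, _phase, attrs = line.split("\t")
        yield (seqid, source, ftype, int(start), int(end), strand, attrs)


def load_features(conn, rows):
    """Create an illustrative feature table (an assumed layout) and index it by region."""
    conn.execute(
        "CREATE TABLE feature ("
        "seqid TEXT, source TEXT, ftype TEXT, "
        "fstart INTEGER, fend INTEGER, strand TEXT, attributes TEXT)"
    )
    # The index is what the flat file lacks: lookups by sequence and position.
    conn.execute("CREATE INDEX feature_region ON feature (seqid, fstart, fend)")
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def features_overlapping(conn, seqid, start, end):
    """Return features on seqid that overlap the closed interval [start, end]."""
    cur = conn.execute(
        "SELECT ftype, fstart, fend, attributes FROM feature "
        "WHERE seqid = ? AND fstart <= ? AND fend >= ? "
        "ORDER BY fstart, fend",
        (seqid, end, start),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load_features(conn, parse_gff(GFF_TEXT))
hits = features_overlapping(conn, "chr1", 150, 300)
for ftype, fstart, fend, attrs in hits:
    print(ftype, fstart, fend, attrs)
```

In this sketch the same nine-column records exist in two storage forms, text and table, which is roughly the sense in which one annotation schema can have both a flat-file and a relational representation.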
