Biopipe FAQ ----------- v. 1.0 This FAQ maintained by: * Shawn Hoon --------------------------------------------------------------------------- Contents --------------------------------------------------------------------------- 0. About this FAQ Q0.1: What is this FAQ? Q0.2: How is it maintained? 1. Biopipe questions Q1.1: How do I start? Q1.2: What are the methods that I have to insert in the datahandler table? Q1.3: What should be inserted in the field 'name' in the argument table? Q1.4: When I create the database for the pipeline should I manually create the tables? Q1.5: In the input table, there is a field called 'name'. What's that? Q1.6: There are 2 dbnames, one in the analysis table and the other in the dbadaptor table. Are they the same? Q1.7: The pipeline ran well the first time but didn't the second time. What could have gone wrong? Q1.8: The pipeline is not running. All the tables are populated correctly. What went wrong? Q1.9: There are many tables that I left empty. Don't we need to put something in them? Q1.10: What about the output of the pipeline? --------------------------------------------------------------------------- 0. About this FAQ --------------------------------------------------------------------------- Q0.1: What is this FAQ? A: It is the list of Frequently Asked Questions about Biopipe. Q0.2: How is it maintained? A: This FAQ was generated using a Perl script and an XML file. All the files are in the Bioperl distribution directory doc/faq. So do not edit this file! Edit file faq.xml and run: % faq.pl -text faq.xml The XML structure was originally used by the Perl XML project. Their website seems to have vanished, though. The XML and modifying scripts were copied from Michael Rodriguez's web site http://www.xmltwig.com/xmltwig/XML-Twig-FAQ.html and modified to our needs. --------------------------------------------------------------------------- 1. Biopipe questions --------------------------------------------------------------------------- Q1.1: How do I start? A: First install the components, as described in http://www.biopipe.org/bioperl-pipeline-install.html. Next steps: - Create a DB for the pipeline, e.g genscan_pipe (follow the schema of course) - Update the DB in the PipeConf.pm - Populate the DB Tables (see Populating the Tables in INSTALL for more Info) Q1.2: What are the methods that I have to insert in the datahandler table? A: The methods in the datahandler table are used to get the input for the runnable and store the output of the runnable. If the input object is in a database, we will be instantiating a new dbadaptor and use its method to retrieve the data from the database. The data will be the input to the runnable. The output of the runnable(an object) has to be stored in the database. Hence the methods that do this has to be specified. Example: +----------------+--------------+---------------------------+----- -+ | datahandler_id | iohandler_id | method | rank | +----------------+--------------+---------------------------+----- -+ | 1 | 1 | get_Contig_by_internal_id | 1 | | 2 | 1 | perl_primary_seq | 2 | | 3 | 2 | get_ScoreAdaptor | 1 | | 4 | 2 | store_by_PID | 2 | +----------------+--------------+---------------------------+----- -+ The method get_Contig_by_internal_id gets the contig from the database from its internal id. Then the perl_primary_seq is called upon the contig to obtain a Seq object from the contig. The output from the runnable (genscan) a seqfeature object will have to be stored back to the database (or other database). Hence the adaptor for the database will obtain the scoreAdaptor via get_ScoreAdaptor method and the score Adaptor will store the output in the database by the store_by_PID method. Q1.3: What should be inserted in the field 'name' in the argument table? A: The argument table is for the arguments required by the methods in the datahandler table. Example: +-------------+----------------+--------+------+--------+ | argument_id | datahandler_id | name | rank | type | +-------------+----------------+--------+------+--------+ | 1 | 1 | INPUT | 1 | SCALAR | | 2 | 4 | OUTPUT | 1 | SCALAR | +-------------+----------------+--------+------+--------+ Since only two methods take arguments (in this case) there are only 2 entries in the arguments table. In the field 'name', there entries 'INPUT' and 'OUTPUT'. The method with corresponding datahandler_id's 1 and 4 are the methods that retrieve the input and store the output respectively. Q1.4: When I create the database for the pipeline should I manually create the tables? A: No. Users can copy the biopipelinedb-mysql.sql scheme when they create a new database. Hence they don't have to manually create tables. Steps:- go to the directory where the schema file is then type these commands (assuming the dbname is testdb): mysql>use testdb; Database changed mysql> source biopipelinedb-mysql.sql; Q1.5: In the input table, there is a field called 'name'. What's that? A: When we fetch an input from the database based on certain fields, that field will be the name in the input directory. If the method get_Contig_by_internal_id is used, then the internal_id will be the name in this case. The name of every single entry has to be populated into the input table before running the pipeline. Q1.6: There are 2 dbnames, one in the analysis table and the other in the dbadaptor table. Are they the same? A: No. The db in the analysis table is for analysis like Blast where it requires a database to blast against. The dbname in the dbadaptor table is the database(s) where the input is retrieved or stored. Q1.7: The pipeline ran well the first time but didn't the second time. What could have gone wrong? A: There could be many reasons for this. The most probable reason could be that you failed to empty the output and job tables after running the pipeline the first time. You might get a message like this: Tests Completed. Starting Pipeline Fetched 0 jobs Waking up and run again! The two tables have to be reset everytime the pipeline is run. Q1.8: The pipeline is not running. All the tables are populated correctly. What went wrong? A: Again as in (7) there could be more than 1 reason for this. One possibility is that you might be getting messages like this: Tests Completed. Starting Pipeline Fetched 2 jobs opening bsub command line: bsub -o /data0/tmp//2/genscan_pipe.job_1.Genscan.1025852231.749.out -e /data0/tmp//2/genscan_pipe.job_1.Genscan.1025852231.749.err -q /usr/users/savikalpa/src/bioperl-pipeline//Bio/Pipeline/runner.pl 1 2 couldn't submit jobs 1 2 to LSF. Waking up and run again! This could be due to some problem in sending the job to LSF. Try running it locally (i.e. perl PipelineManager.pl -l) if it runs then it confirms this. Q1.9: There are many tables that I left empty. Don't we need to put something in them? A: Some of the tables like the job and output tables will be automatically populated as the pipeline runs. Q1.10: What about the output of the pipeline? A: The output of the pipeline (i.e. gene object etc.) can be stored in any database of choice (need the dbadaptor of course). --------------------------------------------------------------------------- Copyright (c)2002-2003 Open Bioinformatics Foundation. You may distribute this FAQ under the same terms as perl itself.