## $Id: bioperl.pod,v 1.23.2.2 2002-04-21 14:32:03 jason Exp $
## Should contain general info about the distribution including
## links to many modules.
##
## 'cookbook' type examples are probably better off being placed
## in the local embedded module PODs. This will make it easier for
## authors to update and maintain.
=head1 NAME
Bioperl - Coordinated OOP-Perl Modules for Biology
=head1 SYNOPSIS
Read on...
=head1 DESCRIPTION
Bioperl contains a number of Perl objects which are useful in biology.
Examples include Sequence objects, Alignment objects and database
searching objects. These objects not only do what they are advertised
to do in the documentation, but they also interact - Alignment
objects are made from the Sequence objects, Sequence objects have access
to Annotation and SeqFeature objects and databases, Blast objects can be
converted to Alignment objects, and so on. This means that the objects
provide a coordinated and extensible framework to do computational biology.
Bioperl development is focused on the Perl modules or objects themselves.
There are scripts provided in the scripts/ and examples/ directories, but
scripts are not the focus of the Bioperl developers. Of course, as the
objects do most of the hard work for you, all you have to do is combine a
number of objects together sensibly to make useful scripts.
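As a minimal sketch of what combining objects looks like, the following builds one sequence object and derives two new objects from it. It assumes Bioperl is installed; the sequence string and the id 'example' are made up for illustration.

    use strict;
    use warnings;
    use Bio::Seq;

    # Hypothetical input: a short, made-up coding sequence.
    my $dna = Bio::Seq->new(
        -id       => 'example',
        -seq      => 'ATGGCCAAATGA',
        -alphabet => 'dna',
    );

    # Each call returns a new sequence object, not a plain string.
    my $revcom  = $dna->revcom;
    my $protein = $dna->translate;

    print "forward: ", $dna->seq,     "\n";
    print "revcom:  ", $revcom->seq,  "\n";
    print "protein: ", $protein->seq, "\n";

Because revcom() and translate() return objects, their results can be handed directly to other Bioperl modules.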
The intent of the Bioperl development effort is to make reusable tools
that aid people in creating their own sites or job-specific applications.
The bioperl.org website at http://bioperl.org also attempts to maintain
links and archives of standalone bio-related Perl tools that are not
affiliated or related to the core Bioperl effort. Check the site for
useful code ideas and contribute your own if possible.
=head1 DOCUMENTATION
We have a cookbook tutorial in bptutorial.pl which has embedded
documentation. Start there if learning-by-example suits you best, or
examine the Bioperl online course at
http://www.pasteur.fr/recherche/unites/sis/formation/bioperl. Make sure
to check the documentation in the modules as well - there are almost 200
modules in Bioperl, and counting, and there's detail in the modules'
documentation that will not appear in the general documentation.
=head1 INSTALLATION
The Bioperl modules are distributed as a tar file that expands into a
standard perl CPAN distribution. Detailed installation directions
can be found in the distribution INSTALL file.
The Bioperl modules can now interact with local flat file and relational
databases. To learn how to set this up, look at the biodatabases.pod
documentation ('perldoc biodatabases.pod' should work once Bioperl has
been installed).
The bioperl-db, bioperl-gui, corba-server, and corba-client packages are
installed separately from Bioperl. Please refer to their respective
documentation for more information.
=head1 GETTING STARTED
A good place to start is by reading and running the cookbook script,
bptutorial.pl.
The distribution I directory has fully working, industrial
strength scripts for use with Bioperl. These are documented, and the
command 'perldoc I' will work. This area only started in the
0.05 distribution, and so not that many scripts have been written - you
are more than welcome to contribute!
The example scripts in the distribution I directory and sub
directories therein give you an idea of how to use some of the modules
and driver code.
If you have installed Bioperl in the standard way, as detailed in the
README in the distribution, these examples should work by just running
them. If you have not installed it in a standard way you will
have to change the 'use lib' to point to your installation (see INSTALL
for details).
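For a non-standard installation, the adjustment might look like the sketch below. The path is a hypothetical placeholder; substitute the directory on your system that contains the Bio/ directory.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical path: replace with the directory that holds Bio/.
    use lib '/home/username/bioperl';

    use Bio::Seq;    # now found through the path added above

Using an absolute path here means the script can be run from any working directory.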
=head1 Examples/ Directory
There are many scripts included in the distribution.
Here are brief descriptions of the scripts in the I directory:
I - examples using EMBOSS, pSW, Clustalw,
TCoffee, and Blast to align sequences
I - a script that shows how to query bibliographic
databases, such as Medline, using ids, keywords, and other fields. See
L for details
I - connect to and test a SOAP server using a
Bio::Biblio object
I - a set of scripts showing how to use Blast.pm.
Please see L for more information
I - a script showing how to use LiveSeq::Mutator
and LiveSeq::Mutation. Please see L and
L for more information
I - a demonstration of the various uses of
Alignment::Clustalw. See L for more information
I - retrieving Genbank entries over the Web using
DB::GenBank. See L for more information
I - create a Protein Sequence Control Panel GUI with Gtk
I - create a GUI for displaying Blast results using
Tk::HitDisplay. Please see L for more information
I - example code for using the XS extensions for a protein
Smith-Waterman comparison
I - this script executes remote Blast using
RemoteBlast. See L for more information
I - example code for using the RestrictionEnzyme
module. See L for more information
I - examples using Bio::Seq.pm for reversing
and translating sequences. See L for more information
I - example code for using Object.pm. Please see
L for more information
I - run GENSCAN on multiple sequences and create
output sequence files using Tools::Genscan. Please see L
for more information
I - a number of scripts illustrating the use of
Bio::SearchIO for parsing Blast and PSI-Blast results. See
L for more information.
I - example code for working with multiple sequence files,
including formatting and filtering based on length or description or ID
I - a script that shows how to use sequences as
regular expressions using Tools::SeqPattern. Please see
L for more information
I - a script that demonstrates some uses of
AlignIO. Please see L for more information
I - a demonstration of some of the uses of
StandAloneBlast.pm. See L for details
I - a demonstration of how to create a
state machine using StateMachine::AbstractStateMachine. Please see
L for more information
I - scripts that show how to examine
details of the 3D structure of a protein by parsing a PDB file. See
L for more information.
I - script for testing and demonstrating Genscan.pm
I - scripts that demonstrate how to throw and
catch Error.pm objects.
I - script to test Bio::Root::Vector.pm
I - scripts that demonstrate uses of Bio::Root modules.
I - script that shows how to use Bio::DB::Registry,
part of Bioperl's integration with OBDA, the Open Bio Database Access registry
scheme. See L for more information.
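Several of these examples revolve around reading sequences, filtering them, and writing them back out. A minimal sketch of that pattern with Bio::SeqIO follows. It assumes Bioperl is installed; the FASTA records and the 10-residue cutoff are made up for illustration, and it is not taken from any of the scripts listed above.

    use strict;
    use warnings;
    use Bio::SeqIO;

    # Made-up FASTA data, read through an in-memory filehandle so the
    # sketch does not depend on any file on disk.
    my $fasta = ">short\nACGT\n>long\nACGTACGTACGTACGT\n";
    open my $in_fh, '<', \$fasta
        or die "cannot open in-memory data: $!";

    my $in  = Bio::SeqIO->new(-fh => $in_fh,   -format => 'fasta');
    my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'fasta');

    my $min_length = 10;    # arbitrary cutoff for this sketch
    my @kept;
    while (my $seq = $in->next_seq) {
        next if $seq->length < $min_length;
        push @kept, $seq->display_id;
        $out->write_seq($seq);
    }

To read from or write to a real file, the -fh argument can be replaced by -file with a file name; changing -format on the output stream converts between sequence formats.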
=head1 Scripts/ Directory
Here are brief descriptions of the scripts in the I directory:
I - aligns nucleotide sequences based on
codons in a specified reading frame
I - scripts that reformat sequence to GFF and load
GFF format files into an indexed database - see L for
more information
I - a Bioperl shell!
I - parse a Blast results file for ids and
extract pertinent sequences from a local, indexed database using
Tools::BPlite and Index::Fasta. See L and
L for more information
I - parse a Blast result and fetch sequences from
Genbank or Genpept over the network using Tools::Blast and Bio::DB*. See
L, L, and L
I - fetch sequences from local indexed database or over
the network and reformat using Bio::Index* and Bio::DB*
I - indexes local databases, partners with bpfetch.pl
I - return reverse complement sequences of
all sequences in the current directory and save them in the same directory,
using the same names with extension changed from "seq" to "rev"
I - a set of scripts for analysis
of expression data: discriminative gene selection, leave-one-out
cross-validation, relevance network of gene expression
I - sets up a minimal DAS annotation server, requires
Apache::DBI and Bio::DB::GFF. See L for details
I - creates a Web page to query a local SRS server and
fetch sequences
I - fetch EST sequences from local files or
Genbank filtered by tissue using Bio::DB* or Bio::Index*
I - fetch a sequence, find the sequences flanking
a variant or SNP in the sequence given its position
I - extracts top-level sequence features from Genbank-
formatted sequence files using Tools::GFF. See L
I - writes random RNA, DNA, or protein
sequence of given length
I - fetches and formats sequences from GenBank, EMBL,
or SwissProt over the network using Bio::DB*
I - takes an input file in GFF format and draws its genes
and features as Postscript using Tools::GFF. See L
I - a script that uses Bio::DB::Registry to retrieve
sequences from EMBL, reformat them, and print them. See L
I - translate a cDNA or ORF to protein using
Bio::Seq's translate() method
I - design PCR primers given a sequence and the
positions of the start and stop codons in the sequence's ORF
I - convert Prosite motifs to Perl regular
expressions
I - this script fetches a sequence from a remote
database, extracts its features (CDS, gene, tRNA), and creates a graphic
representation of the sequence in PNG or GIF format. See L
and L
I - calculate amino acid composition of a
protein using Tools::CodonTable and Tools::IUPAC. See L
and L for more information
I - produce a PNG or JPEG chaos plot given a
DNA sequence using GD.pm
I - calculate %GC given a DNA sequence using
Tools::SeqStats. See L for more information
I - calculates oligomer frequencies given
an oligomer length and a sequence
I - extracts individual conformers
from an NMR-derived PDB file
I - CGI script to fetch a sequence from Genbank
and extract a subsequence using DB::GenBank. See L
I - convert a PAUP tree block to Phylip format
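To make one of the simpler tasks in this list concrete, here is a rough sketch of turning a Prosite-style motif into a Perl regular expression. It is not the code of the script listed above: prosite_to_regex is a hypothetical helper that handles only the common syntax elements, and both the pattern and the protein string are sample input chosen for illustration.

    use strict;
    use warnings;

    # Hypothetical helper, not part of Bioperl.
    sub prosite_to_regex {
        my ($pattern) = @_;
        my $re = $pattern;
        $re =~ s/\.$//;                        # trailing period ends a pattern
        $re =~ s/-//g;                         # '-' only separates elements
        $re =~ s/\{([A-Z]+)\}/[^$1]/g;         # {ED}  becomes [^ED]
        $re =~ s/\((\d+(?:,\d+)?)\)/{$1}/g;    # (2,4) becomes {2,4}
        $re =~ tr/x/./;                        # x matches any residue
        $re =~ s/</^/g;                        # '<' anchors the N-terminus
        $re =~ s/>/\$/g;                       # '>' anchors the C-terminus
        return $re;
    }

    my $regex =
        prosite_to_regex('C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.');
    print "$regex\n";

    my $protein = 'MKCAACGKAFSRSDELTRHIRIHTGEK';    # made-up sequence
    print "motif found\n" if $protein =~ /$regex/;

The exclusion braces are rewritten before the repeat counts so that the braces produced for the counts are not mistaken for Prosite exclusions.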
=head1 GETTING INVOLVED
Bioperl is a completely open community of developers. We are not
funded and we don't have a mission statement. We encourage
collaborative code, in particular in Perl. You can help us in many
different ways, from a simple statement about how you have used
Bioperl to do something interesting to contributing a whole new object
hierarchy. See http://bioperl.org for more information. Here are
some ways of helping us:
=head2 Asking questions and telling us you used it
We are very interested to hear about your experience using Bioperl. Did
it install cleanly? Did you understand the documentation? Could you
get the objects to do what you wanted them to do? If Bioperl was useless
we want to know why, and if it was great - that too. Post a message to
bioperl-l@bioperl.org, the Bioperl mailing list, where all the developers
are.
Only by getting people's feedback do we know whether we are providing
anything useful.
=head2 Writing a script that uses it
By writing a good script that uses Bioperl you both show that Bioperl
is useful and probably save someone else from writing it. If you
contribute it to the 'script central' at http://bioperl.org then other
people can view and use it. Don't be nervous if you've never done this
sort of work; advice is freely given and all are welcome!
=head2 Find bugs!
We know that there are bugs in there. If you find something which you are
pretty sure is a problem, post a note to bioperl-bugs@bioperl.org and
we will get on it as soon as possible. You can also access the bug
system through the web pages.
=head2 Suggest new functionality
You can suggest areas where the objects are not ideally written and
could be done better. The best way is to find the main developer
of the module (each module was written principally by one person,
except for Seq.pm). Talk to him or her and suggest changes.
=head2 Make your own objects
If you can make a useful object we will happily include it in the
core. Probably you will want to read a lot of the documentation
in the L, talk to people on the Bioperl mailing list,
bioperl-l@bioperl.org, and read biodesign.pod. biodesign.pod provides
documentation on the conventions and ideas used in Bioperl; it is definitely
worth a read if you would like to be a Bioperl developer.
=head1 ACKNOWLEDGEMENTS
Bioperl owes its early organizational support to its association with
the award-winning VSNS-BCD BioComputing Courses; some students of the
1996 course (Chris Dagdigian, Richard Resnick, Lew Gramer, Alessandro
Guffanti, and others) have contributed code and commentary. Georg
Fuellen, the VSNS-BCD chief organizer, was one of the early driving forces
behind Bioperl. Steven Brenner, an early adopter of Perl for
bioinformatics, provided some of the early work on Bioperl. Lincoln Stein
has long provided guidance and code.
Bioperl was then taken up by people developing code at the large
genome centres, in particular Steve Chervitz at Stanford, Ian Korf at
the Genome Sequencing Centre (St. Louis) and Ewan Birney at the Sanger
Centre (Cambridge UK). All of the C code XS extensions were
provided by Ewan Birney. Bioperl is used in anger at these sites,
indicating both that it is useful and that it works.
Jason Stajich and Hilmar Lapp joined Bioperl for the drive towards a
0.7 release over 2000 and the first part of 2001, which includes a
revised feature location model, richer feature objects (in particular
genes) and more and better tools. Peter Schattner and Lorenz Pollak
contributed substantial amounts of code: the AlignIO and bptutorial
scripts and the BPLite port to Bioperl, respectively. At this time
Bioperl was being used in absolute earnest by the Ensembl group, which
shook out a number of problems in the code base. Additional
compatibility with the Sequence Workbench (Bioperl-gui, Mark
Wilkinson and David Block) and Biocorba (Jason Stajich, Brad Chapman
and Alan Robinson) and finally Game-XML (Brad Marshall) provided more
interoperability.
Current server hardware for bioperl.org (and other open-bio.org hosted
projects) was provided by Compaq Computer Corporation. The donation
was facilitated by both the Pharmaceutical Sales and High Performance
Technical Computing (HPTC) groups.
The Bioperl servers reside in Cambridge, Massachusetts USA with
colocation facilities and Internet bandwidth donated by Genetics
Institute. In particular Dr. Steven Howes, Kenny Grant &
Rich DiNunno have made significant efforts on our behalf.
=head1 COPYRIGHT
Copyright (c) 1996-2000 Georg Fuellen, Richard Resnick, Steven E. Brenner,
Chris Dagdigian, Steve Chervitz, Ewan Birney, James Gilbert, Elia Stupka,
and others. All Rights Reserved. This module is free software;
you can redistribute it and/or modify it under the same terms as Perl itself.
=cut