Bioperl Best Practices

From BioPerl
Jump to: navigation, search


This page is a list of best practices for anyone contributing to the Bioperl project. For more detail see Advanced BioPerl. The biodesign document describes some basics of how modules are designed.

Contents

Style

  • Use spaces instead of tabs for indenting
  • Prefix protected/private subroutines/fields with an underscore
  • Interface modules end with a capital "I", e.g. Bio::LocationI
  • Driver modules which are loaded dynamically from a "deployer" module are all lower case, e.g. Bio::SeqIO::genbank
  • Use blessed hashes for class fields
  • Use a combined getter/setter accessor function for each class field
  • Parse potentially large input files "on-demand" rather than reading all in at once into memory (eg. make use of Bio::PullParserI)

Coding

General coding practices

  • Always use strict;
  • Use return; instead of return undef;
  • Use our $x; instead of use vars ($x); BEGIN { $x=... };
  • Use generalized quotes instead of escaping, e.g. qq{error in "$file"} not "error in \"$file\""
  • Use clearer uc($s),lc($s),quotemeta($s) rather than "\U$s","\L$s","\Q$s"

Error handling and debugging

  • Use $self->throw() instead of die() / confess()
  • Use $self->warn() instead of warn() / carp() / cluck()
  • Use $self->debug() instead of print STDERR "...."

I/O and cross-platform

  • Build file paths with Bio::Root::IO->catfile(@dir) or File::Spec->catfile() instead of join('/',@dir)
  • Use File::Spec functions for portability across platforms
  • Use the 3-argument form of open, e.g. open my $FH, '<', 'filename.txt'
  • Use lexical auto-vivified file handles rather than globs, e.g. open my $OUT, '>', 'output.txt'
  • Pre-declare file handles so they don't mask earlier declarations in the same scope (specially when switching from read to write open() modes and vice-versa):
{
  my $FH; # 1st and unique declaration
  open $FH, "<", $file or $self->throw("Cannot open $file: $!");
  my @data = <$FH>;
  # do something with @data...
  open $FH, ">", $file or $self->throw("Cannot write to $file: $!");
  print $FH @data;
  close $FH;
}
 
# NOT
{
  open my $FH, "<", $file or $self->throw("Cannot open $file: $!"); # 1st declaration
  my @data = <$FH>;
  # do something with @data...
  open my $FH, ">", $file or $self->throw("Cannot write to $file: $!"); # 2nd declaration
  print $FH @data;
  close $FH;
}

BioPerl Object-oriented programming and modules

  • Use use base qw(Bio::Class); instead of use vars qw(@ISA); @ISA=qw(Bio::Class);
  • Use Bio::Class->new() instead of new Bio::Class()
    • Indirect object syntax can lead to subtle errors which are best avoided.
    • Never use method Bio::Class(@args): this simply doesn't work on some systems.
  • Modules must end by returning true: have 1; as the last line

Methods

  • For easier code maintenance, unload @_ into named variables. If there are more that two arguments present, use named parameters and Bio::Root::RootI->_rearrange(). In general, always use Bio::Root::RootI->_rearrange() for maintainability unless there is a demonstrable and significant performance issue.
    • The method _rearrange() takes two arguments. The first argument is an array reference containing the name of the parameters in upper-case letters. The second argument is the array of parameter-value pairs.
# unloading method arguments, two args
sub foobar {
   my ($self, $start, $end) = @_;
   ...
}
 
# unloading method arguments, more than two args
sub barfoo {
   my ($self, @args) = @_;
   my ($start, $end, $score, $strand) = $self->_rearrange(
            [qw(START END SCORE STRAND)], @args);
   ...
}
  • The use of AUTOLOAD is controversial for most core BioPerl developers but has been used for bioperl-run
    • See the following links (here and here) for the mail list threads concerning the use of AUTOLOAD in BioPerl.
    • In short, it is highly recommended not to use AUTOLOAD in the core modules unless absolutely necessary, primarily for performance reasons but also because the UNIVERSAL method $self->can() will not work for AUTOLOAD'ed methods.
    • As an alternative, especially for Run wrappers, the use of _set_from_args() is recommended, most likely in combination with _setparams:
sub new {
  my($class, @args) = @_;
  my $self = $class->SUPER::new(@args);
 
  $self->_set_from_args(\@args, -methods => [@allowed_methods],
                                -create => 1);
 
  return $self;
}
 
sub _setparams {
  ...
 
  my $param_string = $self->SUPER::_setparams(
      -params => [@settable_methods], -dash => 1);
  return $param_string;
}

Regular Expressions

  • Don't use the slow special regexp variables $` $& $' $- $+
  • Avoid regexps where possible: string eq > index() > =~
  • Use generalized quotes instead of escaping, e.g. m{//} not /\/\//
  • Avoid using the o (compile-once) modifier when combining regular expressions with interpolated variables and loops, which will result in subtle errors. The following compiles the regex to only find 'start', so here the regex will always match, even with 'foobar':
...
my @strings = qw(hello goodbye start end flag score);
 
while (my $string = shift @strings) {
    for my $flag (qw(start end hello foobar)) {
        if ($string =~ m{^$flag}o) {
            print "Got $flag!\n";
        }
    }
}
  • Use qr/.../ rather than strings to pre-store regexps as they provide compile-time syntax checking
  • Use capture parentheses only for capturing, otherwise use (?:)
  • For easier code maintenance, unload regex capture variables like $1 into named variables (similar to what is done for methods, above):
if (my ($start, $end, $strand, $score) = $line =~ m{^(\d+)\s+(\d+)\s+(\d)\s+(\d+)}xms) {
   ...
}
 
# alternatively (same as above)
if ($line =~ m{^(\d+)\s+(\d+)\s+(\d)\s+(\d+)}xms) {
   my ($start, $end, $strand, $score) = ($1, $2, $3, $4);
   ...
}

Sorting

  • Never directly return from a sort (for background see reference) :
sub foo {
  # ...
  @sorted = sort @unsorted;
  return @sorted;
}
 
# NOT
sub bar {
  # ...
  return sort @unsorted;
}
 
# The latter form has undefined behaviour if bar() is
# called in scalar context
  • When sorting objects by their method values, use a Schwartzian transformation:
@sorted = map { $_->[1] }
          sort { $a->[0] <=> $b->[0] }
          map { [$_->method(), $_] }
          @unsorted;
 
# NOT
@sorted = sort { $a->method() <=> $b->method() } @unsorted;
 
# The latter form is inefficient and can cause subtle bugs
# if method() (indirectly) calls its own sort subroutine

Testing

  • Every module must have tests
  • Test scripts should be named t/Module.t
  • Test data files go in t/data/ in the version control repository
  • Use Bio::Root::Test to write your test script. See the How To for details.
  • Before committing changes to the version control repository, make sure that the relevant test script passes:
# Do this once, answering 'no' to script installation
perl Build.PL
 
# Then do this every time you want to run a test script where test.t is the name of the script
./Build test --test_files t/test.t --verbose
 
# Note that 'perl -I. -w t/test.t' is NOT good enough, since it won't catch all problems
 
# When you're happy the script passes on its own, run the entire test suite
./Build test
 
# If everything passes, commit

POD

  • Ensure your POD has a =head1 NAME section with the fully qualified module name and a description e.g.
=head1 NAME

Bio::Tools::MyTool - parse MyTool gene predictions

=head1 SYNOPSIS

# Synopsis code demonstrating the module goes here

=head1 DESCRIPTION

A description about this module.

=cut 
  • Tests will be included that check there is POD for each public method in a module. Although these tests will not enforce POD for private methods (those starting with an underscore: '_'), it is also advisable to include POD for these methods as it helps other developers to identify what the method is supposed to be for. POD for methods should be in a form such as
=head2 method_name

Title    : method_name
Usage    : Some small examples of method usage
Function : Some description about what the method does
Returns  : What the method does
Args     : What arguments the method takes

=cut
  • It is preferable that you also include the following boilerplate in the POD (with the author section filled in appropriately)


=head1 FEEDBACK

=head2 Mailing Lists

User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to the
Bioperl mailing list. Your participation is much appreciated. 

  bioperl-l@bioperl.org                  - General discussion
  http://bioperl.org/wiki/Mailing_lists  - About the mailing lists

=head2 Reporting Bugs 

Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:

  http://bugzilla.open-bio.org/  

=head1 AUTHOR - NAME OF AUTHOR

The author(s) and contact details should be included here (this insures you get credit for creating the module.  
Lesser contributions can be documented in a separate CONTRIBUTORS section if you prefer. 

=cut
  • All the general documentation about a module should be placed before any code, and each method should have its own documentation just before the method code.
  • Use podchecker to check your POD syntax
  • If using Emacs, use the bioperl.lisp macros - there is a standard boilerplate you can follow.
Personal tools
Namespaces
Variants
Actions
Main Links
documentation
community
development
Toolbox