Installing BioPerl on Unix

Installing BioPerl is actually really easy (despite this document). For installing on other platforms or specific distributions, see Installing BioPerl.

System requirements

Preparing to install

This is optional, but regardless of your subsequent choice of installation method, it will help to carry out the following steps. They will increase the likelihood of installation success (especially of optional dependencies).

  • Upgrade CPAN:
>perl -MCPAN -e shell
cpan>install Bundle::CPAN
cpan>q
  • Install/upgrade Module::Build, and make it your preferred installer:
>cpan
cpan>install Module::Build
cpan>o conf prefer_installer MB
cpan>o conf commit
cpan>q
  • Install the expat library by whatever method is appropriate for your system.

Note: This library is essential if you will be working with XML files. If you need to build it from source, you might do something like this:

>wget -O expat-2.0.1.tar.gz http://sourceforge.net/projects/expat/files/expat/2.0.1/expat-2.0.1.tar.gz/download
>tar xvfz expat-2.0.1.tar.gz && cd expat-2.0.1
>./configure --prefix=/home/mydir/bin/
>make && make install

If you install expat with your distribution's package manager, remember that you also need the library headers, so you will need to install the development package:

[ubuntu]> sudo aptitude install libexpat-dev
 
  • If your expat library is installed in a non-standard location, tell CPAN about it:
>cpan
cpan>o conf makepl_arg "EXPATLIBPATH=/non-standard/lib EXPATINCPATH=/non-standard/include"
cpan>o conf commit
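
Before setting EXPATINCPATH it can help to confirm which directory actually holds expat.h. The sketch below is only an illustration: find_expat_include is a hypothetical helper, not part of BioPerl or CPAN, and the candidate directories in the example call are assumptions to adjust for your system.

```shell
# Hypothetical helper: print the first directory among the arguments
# that contains expat.h; return 1 if none of them does.
find_expat_include() {
    for dir in "$@"; do
        if [ -f "$dir/expat.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example call; the candidate paths are assumptions.
find_expat_include /usr/include /usr/local/include /non-standard/include \
    || echo "expat.h not found in the listed directories"
```

The directory it prints is the value to use for EXPATINCPATH; the matching lib directory still has to be checked separately.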


Installing using Build.PL

The advantage of this approach is that it's stepwise, so it's easy to stop and analyze any problem that comes up.

Download, then unpack the appropriate package (note: these packages are at least 4 years old; use CPAN for the latest version). For example:

>tar xvfz BioPerl-1.6.1.tar.gz
>cd BioPerl-1.6.1

Now issue the build commands:

>perl Build.PL
>./Build test

If you've installed everything perfectly and all the network connections are working then you may pass all the tests run in the './Build test' phase. It's also possible that you may fail some tests. Possible explanations: problems with the local Perl installation, network problems, a previously undetected bug in Bioperl, a flawed test script, problems with a CGI script used for sequence retrieval at a public database, and so on. Remember that there are over 800 modules in Bioperl and the test suite runs more than 12000 individual tests, so a few failed tests may not affect your usage of Bioperl.

If you decide that the failed tests will not affect how you intend to use Bioperl and you'd like to install anyway, or if all tests were fine, do:

>./Build install

However, if you're concerned about a failed test and need assistance or advice then contact bioperl-l@bioperl.org.

To './Build install' you need write permission in the perl5/site_perl/source area (or similar, depending on your environment). Usually this will require you to become root, so you will want to talk to your systems manager if you don't have the necessary privileges.
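
Whether root is needed can be checked before running './Build install'. The following is a rough sketch under stated assumptions: can_write_to is a hypothetical helper, and the example asks perl for its installsitelib setting through the core Config module, which may not be the only directory the installer writes to.

```shell
# Hypothetical helper: succeed if DIR, or its nearest existing parent,
# is writable by the current user. Returns 2 if no directory is given.
can_write_to() {
    target=$1
    [ -n "$target" ] || return 2
    # Walk up until we reach a path that exists.
    while [ ! -e "$target" ] && [ "$target" != "/" ] && [ "$target" != "." ]; do
        target=$(dirname "$target")
    done
    [ -w "$target" ]
}

# Example: ask perl where site modules go, then test that location.
if command -v perl >/dev/null 2>&1; then
    site_lib=$(perl -MConfig -e 'print $Config{installsitelib}')
    if can_write_to "$site_lib"; then
        echo "writable: $site_lib"
    else
        echo "not writable: $site_lib (use --install_base or ask your administrator)"
    fi
fi
```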


INSTALLING IN A PERSONAL MODULE AREA

If you lack permission to install perl modules into the standard site_perl/ system area you can configure BioPerl to install itself anywhere you choose. This could be a personal perl directory or a standard place where you plan to put all your 'local' or personal perl modules.

Simply pass a parameter to Build.PL as it generates your system-specific Build script.

Example:

>perl Build.PL --install_base /home/users/dag
>./Build test
>./Build install

This tells perl to install all the various parts of bioperl in the desired place, e.g. creating:

  /home/users/dag/lib/perl5/Bio/

Then in your Bioperl script you would write:

use lib "/home/users/dag/lib/perl5/";
use Bio::Perl;

Installing using CPAN

You can use the CPAN shell to install Bioperl. For example:

>perl -MCPAN -e shell

Or you might have the cpan alias installed:

>cpan

Then find the name of the most recent Bioperl version:

cpan>d /bioperl/
Reading '/home/francisco/.cpan/Metadata'
  Database was generated on Wed, 19 Mar 2014 13:17:02 GMT
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CJFIELDS/BioPerl-1.6.923.tar.gz
Distribution    CJFIELDS/BioPerl-DB-1.006900.tar.gz
Distribution    CJFIELDS/BioPerl-Network-1.006902.tar.gz
Distribution    CJFIELDS/BioPerl-Run-1.006900.tar.gz
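
The core distribution can also be picked out of such a listing mechanically. This is a sketch only: pick_bioperl_dist is a hypothetical helper that reads lines in the 'Distribution AUTHOR/Name-version.tar.gz' form shown above and prints the entries whose file name starts with 'BioPerl-' followed by a digit, which excludes BioPerl-DB, BioPerl-Run and unrelated packages.

```shell
# Hypothetical helper: filter a CPAN 'd /bioperl/' listing on stdin down to
# the core BioPerl distribution name(s).
pick_bioperl_dist() {
    awk '$1 == "Distribution" && $2 ~ /\/BioPerl-[0-9]/ { print $2 }'
}

# Sample input copied from the listing above.
printf '%s\n' \
    'Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz' \
    'Distribution    CJFIELDS/BioPerl-1.6.923.tar.gz' \
    'Distribution    CJFIELDS/BioPerl-DB-1.006900.tar.gz' |
    pick_bioperl_dist
# prints: CJFIELDS/BioPerl-1.6.923.tar.gz
```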

Now install:

cpan>install CJFIELDS/BioPerl-1.6.923.tar.gz

If you've installed everything perfectly and all the network connections are working then you may pass all the tests run in the './Build test' phase. It's also possible that you may fail some tests. Possible explanations: problems with local Perl installation, network problems, previously undetected bug in Bioperl, flawed test script, problems with CGI script used for sequence retrieval at public database, and so on.

Remember that there are over 800 modules in Bioperl and the test suite runs more than 12000 individual tests, so a few failed tests may not affect your usage of Bioperl.

If you decide that the failed tests will not affect how you intend to use Bioperl and you'd like to install anyway do:

cpan>force install CJFIELDS/BioPerl-1.6.923.tar.gz

However, if you're concerned about a failed test and need assistance or advice then contact bioperl-l@bioperl.org.


INSTALLING IN A PERSONAL MODULE AREA

You can also use CPAN to install modules in your local directory. First enter the CPAN shell, then set the arguments for the commands "perl Makefile.PL" and "perl Build.PL", like this:

>perl -MCPAN -e shell
cpan>o conf makepl_arg PREFIX=/home/users/dag/My_Local_Perl_Modules
cpan>o conf mbuildpl_arg "--prefix /home/users/dag/My_Local_Perl_Modules"
cpan>o conf commit

If you have problems starting a CPAN shell, you likely need to set up a local MyConfig.pm file. This article describes how to do this, though you will need to modify the PERL5LIB for your perl version and configuration (32- vs 64-bit, for instance). Further notes can also be found in the 'Configuration' section of the CPAN POD.

Alternatively, you might like to use local::lib (it is becoming a standard best practice), which allows you to use CPAN as a non-root user and takes care of everything for you. Take a look at local::lib and the bootstrapping technique for your local installs:

https://metacpan.org/module/local::lib#The-bootstrapping-technique
http://perl.jonallen.info/writing/articles/install-perl-modules-without-root


WHERE ARE THE MAN PAGES?

When using Makefile.PL (no longer covered in this documentation), we had to disable the automatic creation of man pages because this step was triggering a "line too long" error on some OSs due to shell constraints. If you want man pages installed use the Build.PL installation process discussed above.


External programs

Bioperl can interface with some external programs for executing analyses. These include clustalw and t_coffee for Multiple Sequence Alignment (Bio::Tools::Run::Alignment::Clustalw and Bio::Tools::Run::Alignment::TCoffee), blastall, blastpgp, and bl2seq for BLAST analyses (Bio::Tools::Run::StandAloneBlast), and all the programs in the EMBOSS suite (Bio::Factory::EMBOSS).


Environment variables

Some modules which run external programs need certain environment variables set. If you do not have a local copy of the specific executable you do not need to set these variables. Additionally the modules will attempt to locate the specific applications in your runtime PATH variable. You may also need to set an environment variable to tell BioPerl about your network configuration if your site uses a firewall.

Setting environment variables on unix means adding lines like the following to your shell *rc file.

For bash or sh:

export BLASTDIR=/data1/blast

For csh or tcsh:

setenv BLASTDIR /data1/blast

Some environment variables include:

  • BLASTDIR: Specifies where the NCBI blastall, blastpgp, bl2seq, etc. are located. A 'data' directory can also be present in this directory, and you can put your blastable databases there.
  • BLASTDATADIR or BLASTDB: If you do not want to keep the data directory inside the directory that BLASTDIR points to, set BLASTDATADIR or BLASTDB to the directory where the BLAST database indexes are located.
  • BLASTMAT: The directory containing the substitution matrices such as BLOSUM62.
  • CLUSTALDIR: The directory where the clustalw executable is located.
  • TCOFFEEDIR: The directory where the t_coffee executable is located.
  • http_proxy: If you access the internet via a proxy server, you can tell the Bioperl modules that require network access about it with the http_proxy environment variable. The value includes the proxy address and port, with an optional username/password for authenticating proxies, e.g. http://USERNAME:PASSWORD@proxy.example.com:8080
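
A quick way to see which of the directory variables are set, and whether they point at real directories, is sketched below. check_tool_dirs is a hypothetical helper written for this illustration; it only inspects the environment and does not verify that the executables themselves are present.

```shell
# Hypothetical helper: for each variable name given, report whether it is
# unset, points at a missing directory, or points at an existing directory.
check_tool_dirs() {
    for name in "$@"; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name: not set"
        elif [ -d "$value" ]; then
            echo "$name: ok ($value)"
        else
            echo "$name: directory not found ($value)"
        fi
    done
}

check_tool_dirs BLASTDIR BLASTDB BLASTMAT CLUSTALDIR TCOFFEEDIR
```

An unset variable is not necessarily a problem, since the modules also search your PATH for the programs.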


Installing Bioperl scripts

Bioperl comes with a set of production-quality scripts that are kept in the scripts/ directory. You can install these scripts if you'd like: simply answer the questions during 'perl Build.PL'. The installation directory can be specified by

perl Build.PL
./Build install --install_path script=/foo/scripts

By default they install to /usr/bin or similar, depending on platform.
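
If the scripts go to a custom directory, that directory has to be on your PATH before the scripts can be called by name. A minimal check, assuming a Bourne-style shell, might look like this; dir_on_path is a hypothetical helper, and /foo/scripts is the example directory used above.

```shell
# Hypothetical helper: succeed if DIR is one of the entries in $PATH.
# A trailing slash on the argument is ignored; PATH entries are compared as written.
dir_on_path() {
    case ":$PATH:" in
        *":${1%/}:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if dir_on_path /foo/scripts; then
    echo "/foo/scripts is on PATH"
else
    echo "add it with: export PATH=/foo/scripts:\$PATH"
fi
```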


INSTALLING BIOPERL MODULES THE HARD WAY

As a last resort, you can simply copy all files in Bio/ to any directory in which you have write privileges. This is generally NOT recommended since some modules may require special configuration (currently none do, but don't rely on this).

You will need to set "use lib '/path/to/my/bioperl/modules';" in your perl scripts so that you can access these modules if they are not installed in the standard site_perl/ location. See above for an example.

To get manpage documentation to work correctly you will have to configure man so that it looks in the proper directory. On most systems this will just involve adding an additional directory to your $MANPATH environment variable.

The installation of the Compile directory can be similarly redirected, but execute the make commands from the Compile/SW directory.

If all else fails or you are unable to access the perl distribution directories, ask your system administrator to place the files there for you. You can also run perl scripts from the directory that contains the modules (the one holding Bio/ in the distribution): perl versions older than 5.26 check the current working directory when looking for modules, while newer versions need an explicit "use lib '.';" or the -I. option.


USING MODULES NOT INSTALLED IN THE STANDARD LOCATION

You can explicitly tell perl where to look for modules by using the lib module which comes standard with Perl.

Example:

#!/usr/bin/perl
use lib "/home/users/dag/My_Local_Perl_Modules/";
use Bio::Perl;
# <...insert whizzy perl code here...>

Or, you can set the environment variable PERL5LIB:

csh or tcsh:

setenv PERL5LIB /home/users/dag/My_Local_Perl_Modules/

bash or sh:

export PERL5LIB=/home/users/dag/My_Local_Perl_Modules/
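
When the export line lives in a shell rc file that may be sourced more than once, or PERL5LIB already has other entries, it is safer to prepend than to overwrite. The sketch below covers bash or sh only; add_perl5lib is a hypothetical helper name.

```shell
# Hypothetical helper: put DIR at the front of PERL5LIB unless it is
# already listed, so repeated sourcing does not add duplicates.
add_perl5lib() {
    case ":${PERL5LIB:-}:" in
        *":$1:"*) ;;  # already present, nothing to do
        *) PERL5LIB="$1${PERL5LIB:+:$PERL5LIB}" ;;
    esac
    export PERL5LIB
}

add_perl5lib /home/users/dag/My_Local_Perl_Modules
echo "$PERL5LIB"
```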


Testing your install

The Bioperl test system is located in the t/ directory and is automatically run whenever you execute the './Build test' command (having previously run 'perl Build.PL'; if you have already installed Bioperl, answer 'no' to script installation to get nicer test output later).

Alternatively if you want to investigate the behavior of a specific test such as the Seq test you would type:

>./Build test --test_files t/Seq.t --verbose

The ./ ensures you are using the Build script in the current directory, so that you are testing the modules in this directory and not ones installed elsewhere. The --test_files argument can be used multiple times to try a set of test scripts in one go. The --verbose argument outputs the detailed test results, instead of just the summary you see during './Build test'.
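
Repeating --test_files by hand gets tedious for more than a couple of scripts. As an illustration only, the hypothetical helper below prints the command line for a list of test scripts; t/Seq.t comes from the example above, while t/SeqIO.t is an assumed second file name that may not exist in your distribution.

```shell
# Hypothetical helper: print a './Build test' command line that passes each
# argument as its own --test_files option, followed by --verbose.
build_test_command() {
    cmd="./Build test"
    for t in "$@"; do
        cmd="$cmd --test_files $t"
    done
    printf '%s --verbose\n' "$cmd"
}

build_test_command t/Seq.t t/SeqIO.t
# prints: ./Build test --test_files t/Seq.t --test_files t/SeqIO.t --verbose
```

The helper only prints the command; run the printed line from the unpacked BioPerl directory once it looks right.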

If you are trying to learn how to use a module, the test suite is often a good place to look. All good extreme programmers try to write a test BEFORE they write the module to ensure that their module behaves the way they expect. You'll notice some 'ok' and 'skip' commands in a test. These are part of the Perl test system: a passed test is reported as 'ok N', where N is the test number, and 'skip' tells Perl to skip tests. Skipping is useful when, for example, your test detects that the network is not present and thus should skip, not fail, any tests that require a network connection.

[Optional] building bioperl-ext

The bioperl-ext package contains C code and XS extensions for various alignment and trace file modules (Bio::Tools::pSW for DNA Smith-Waterman, Bio::Tools::dpAlign for protein Smith-Waterman, Bio::SearchDist for fitting extreme value distributions (EVD), Bio::SeqIO::staden).

This installation works out of the box on all platforms except BSD and Solaris boxes. On other platforms, skip the next section.

Configuring for BSD and Solaris boxes

You should add the flag -fPIC to the CFLAGS line in Compile/SW/libs/makefile. This makes the compiler generate position-independent code, which is required for these architectures. In addition, on some Solaris boxes, the generated Makefile does not use the correct case of the -fPIC/-fpic flag for the C compiler that is used. This requires manually editing the generated Makefile to switch case. Try it out once, and if you get errors, try editing the -fpic line.
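
The CFLAGS edit can be scripted with sed instead of done by hand. This is a sketch under assumptions: add_fpic is a hypothetical helper, and it assumes the makefile has a line beginning with 'CFLAGS =' (check your copy of Compile/SW/libs/makefile first). The demonstration uses a throwaway file with made-up contents.

```shell
# Hypothetical helper: print MAKEFILE with ' -fPIC' appended to every line that
# starts with 'CFLAGS =' and does not already mention -fPIC.
add_fpic() {
    sed '/^CFLAGS[[:space:]]*=/{
/-fPIC/!s/$/ -fPIC/
}' "$1"
}

# Demonstration on a temporary file; the CFLAGS value here is made up.
demo=$(mktemp)
printf 'CC = cc\nCFLAGS = -c -O\n' > "$demo"
add_fpic "$demo"
rm -f "$demo"
```

The helper writes to standard output and leaves the original untouched, so you can inspect the result before replacing the file.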


Installing bioperl-ext

The Ext package is available as a separate package released from ftp://bioperl.org/pub/bioperl/DIST. Move to the directory bioperl-ext, which is where the C code and XS extension for the bp_sw module are held, and execute these commands (possibly after making the change for BSD and Solaris, as detailed above):

perl Makefile.PL   # makes the system-specific makefile
make               # builds all the libraries
make test          # runs a short test
make install       # installs the package

This should install the compiled extension. The Bio::Tools::pSW module will work cleanly now.

Feedback

What did you think of this guide?

  • General comments, suggestions and installation war stories should be sent to the project mailing list, bioperl-l.
  • Installation bug reports should be submitted via our web-based reporting system at http://bugzilla.bioperl.org/.