BioPerl scripts
Introduction
These scripts have been contributed by the developers and users of BioPerl. They are organized into directories roughly mirroring those in the BioPerl Bio/ directory. There are two directories for these scripts, scripts/ and examples/. The scripts in scripts/ are production-quality scripts that have POD documentation and accept command-line arguments, and all of them have the .PLS suffix. The scripts in examples/ are useful examples of BioPerl code but have been written more casually.
You can install the scripts in the scripts/ directory if you'd like; simply follow the instructions on make install. The installation directory is specified by the INSTALLSCRIPT variable in the Makefile; the default directory is /usr/bin. Installation copies the scripts to the specified directory, changes the .PLS suffix to .pl, and prepends bp_ to the script name if it isn't so named already.
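The renaming rule described above can be sketched as a small function. This is an illustration in Python, not the code of install_bioperl_scripts.pl; the function name installed_name is hypothetical, and the sketch covers only the name change, not the copy into INSTALLSCRIPT.

```python
from pathlib import PurePosixPath


def installed_name(script_path: str) -> str:
    """Return the file name a script would get after installation.

    Assumed rule (taken from the paragraph above): a .PLS suffix becomes
    .pl, and bp_ is prepended unless the name already starts with it.
    """
    name = PurePosixPath(script_path).name
    if name.endswith(".PLS"):
        name = name[: -len(".PLS")] + ".pl"
    if not name.startswith("bp_"):
        name = "bp_" + name
    return name


if __name__ == "__main__":
    for path in ("scripts/seq/split_seq.PLS", "scripts/index/bp_fetch.PLS"):
        print(path, "->", installed_name(path))
```

Under this rule scripts/seq/split_seq.PLS would be installed as bp_split_seq.pl, and scripts/index/bp_fetch.PLS as bp_fetch.pl.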
Please contact the BioPerl mailing list if you are interested in contributing your own script.
Production Scripts
Installation
- scripts/install_bioperl_scripts.pl
- This script installs scripts from the scripts/ directory upon make install.
Bio::DB::SeqFeature::Store
- scripts/Bio-SeqFeature-Store/bp_seqfeature_gff3.PLS
- Dumps selected database features as GFF3.
- scripts/Bio-SeqFeature-Store/bp_seqfeature_load.PLS
- This script loads a MySQL Bio::DB::SeqFeature::Store database with the features contained in a list of GFF files.
Bio::DB::GFF
- scripts/Bio-DB-GFF/bulk_load_gff.PLS
- This script loads a MySQL Bio::DB::GFF database with the features contained in a list of GFF files. It cannot do incremental loads.
- scripts/Bio-DB-GFF/genbank2gff.PLS
- This script loads a Bio::DB::GFF database with the features contained in either a local GenBank file or an accession that is fetched from GenBank.
- scripts/Bio-DB-GFF/fast_load_gff.PLS
- This script does a rapid load of a MySQL Bio::DB::GFF database using files as source. Probably only works in Unix as it relies on pipes.
- scripts/Bio-DB-GFF/genbank2gff3.PLS
- This script uses Bio::SeqFeature::Tools::Unflattener to convert GenBank flatfiles to GFF3 with gene hierarchies mapped for optimal display in GBrowse.
- scripts/Bio-DB-GFF/generate_histogram.PLS
- Create a GFF-formatted histogram of the density of the indicated set of feature types.
- scripts/Bio-DB-GFF/load_gff.PLS
- This script loads a Bio::DB::GFF database with the features contained in a list of GFF files. It works with all database adaptors supported by Bio::DB::GFF, namely MySQL, Oracle, and PostgreSQL.
- scripts/Bio-DB-GFF/pg_bulk_load_gff.PLS
- Bulk-load a PostgreSQL Bio::DB::GFF database from GFF files.
- scripts/Bio-DB-GFF/process_gadfly.PLS
- Transforms Gadfly GFF files into correct format.
- scripts/Bio-DB-GFF/process_sgd.PLS
- Transform SGD format annotations into GFF format.
- scripts/Bio-DB-GFF/process_wormbase.PLS
- Transforms WormBase's GFF files into correct format. Requires Ace.
Bio::Biblio
- scripts/biblio/biblio.PLS
- A fully-featured script that uses Bio::Biblio, a module for accessing and querying bibliographic repositories like MEDLINE.
DB
- scripts/DB/bioflat_index.pl
- Create or update a biological sequence database indexed with the Bio::DB::Flat indexing scheme.
- scripts/DB/flanks.PLS
- Fetch a sequence, find the sequences flanking a variant or SNP in the sequence given its position.
- scripts/DB/biofetch_genbank_proxy.PLS
- A CGI script that queries NCBI's eutils to provide database access according to the BioFetch protocol. Requires Cache::FileCache.
- scripts/DB/biogetseq.PLS
- Sequence retrieval using the OBDA registry.
DB-HIV
- scripts/DB-HIV/hivq.PLS
- A command-line interface to the Los Alamos HIV sequence database, based on Bio::DB::HIV and Bio::DB::Query::HIVQuery.
Index
- scripts/index/bp_fetch.PLS
- Fetch sequences from local indexed database or over the network and reformat using Bio::Index::* and Bio::DB::*.
- scripts/index/bp_index.PLS
- Indexes local databases, partners with bp_fetch.pl.
- scripts/index/bp_seqret.PLS
- Index a local FASTA database and fetch sequences using the same syntax as the EMBOSS seqret tool. Does not support the full EMBOSS db config files; it is designed to support fast fetching of sequences from a FASTA database.
PopGen
- scripts/popgen/composite_LD.PLS
- An easy way to calculate composite linkage disequilibrium (LD).
- scripts/popgen/heterogeneity_test.PLS
- A test for distinguishing between selection and population expansion.
SearchIO
- scripts/searchio/filter_search.PLS
- Simple script to filter by Bio::SearchIO criteria and print.
- scripts/searchio/search2table.PLS
- Turn Bio::SearchIO reports into a tabular format like blastall's "-m 9" output.
- scripts/searchio/fastam9_to_table.PLS
- Turn FASTA -m9 reports into a tabular format like blastall's "-m 9" output. Does not actually use Bio::SearchIO so is very fast.
- scripts/searchio/hmmer_to_table.PLS
- Turn HMMER reports into a tabular format. Does not actually use Bio::SearchIO so is very fast.
- scripts/searchio/parse_hmmsearch.PLS
- Parse single or multiple HMMER hmmsearch results file(s) with different output options.
Seq
- scripts/seq/extract_feature_seq.PLS
- Extract the sequence for a specified feature type.
- scripts/seq/make_mrna_protein.PLS
- Translate a DNA or RNA sequence to protein using Bio::Seq's translate() method.
- scripts/seq/seqconvert.PLS
- BioPerl sequence format converter.
- scripts/seq/split_seq.PLS
- Split a sequence in a file into chunks of equal size with an optional overlapping range.
- scripts/seq/translate_seq.PLS
- A simple BioPerl translator.
- scripts/seq/unflatten_seq.PLS
- Unflatten a GenBank or GenBank-style feature file into a nested Bio::SeqFeatureI hierarchy. Uses Bio::SeqFeature::Tools::Unflattener.
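The idea behind split_seq.PLS, cutting a sequence into fixed-size chunks with an optional overlap, can be illustrated in a few lines of Python. This is a sketch under assumptions, not the script's implementation: the function name split_with_overlap is hypothetical, and the choice to let the final chunk be shorter than the others is an assumption about edge handling.

```python
def split_with_overlap(seq: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Cut seq into windows of chunk_size that share `overlap` residues.

    Assumption: the last window may be shorter when the sequence length
    is not an exact fit.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # distance between window starts
    chunks = []
    for start in range(0, len(seq), step):
        chunks.append(seq[start:start + chunk_size])
        if start + chunk_size >= len(seq):
            break  # this window already reached the end of the sequence
    return chunks


if __name__ == "__main__":
    print(split_with_overlap("ACGTACGTAC", 4, overlap=1))
```

With a chunk size of 4 and an overlap of 1, each chunk starts with the last residue of the previous one.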
SeqStats
- scripts/seqstats/aacomp.PLS
- Prints out the count of amino acids over all protein sequences in the input file.
- scripts/seqstats/chaos_plot.PLS
- Produce a PNG or JPEG chaos plot given a DNA sequence using GD.
- scripts/seqstats/gccalc.PLS
- Prints out the GC content for every nucleotide sequence in the input file.
- scripts/seqstats/oligo_count.PLS
- Use this script to determine what primers would be useful for frequent priming of nucleic acid for random labeling.
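As an illustration of what a per-sequence statistic such as the one reported by gccalc.PLS involves, the following Python sketch reads FASTA-formatted lines and computes the GC fraction of each record. It is not the script's code: read_fasta and gc_content are hypothetical names, the parser handles only plain FASTA, and counting only G and C (ignoring ambiguity codes) is an assumption.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(parts)
            header = line[1:].split()
            name, parts = (header[0] if header else ""), []
        elif name is not None:
            parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def gc_content(seq: str) -> float:
    """Fraction of G and C residues; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    example = [">seq1 demo", "ATGC", "GGCC", ">seq2", "AAAA"]
    for seq_id, seq in read_fasta(example):
        print(f"{seq_id}\t{gc_content(seq):.2f}")
```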
Taxonomy
- scripts/taxa/local_taxonomydb_query.PLS
- Script that accesses a local taxonomy database and retrieves species or TaxonIDs.
- scripts/taxa/query_entrez_taxa.PLS
- Demonstrate how to retrieve the NCBI TaxonID for a given species. Also retrieve TaxonID for a given accession number.
- scripts/taxa/taxid4species.PLS
- Retrieve the NCBI TaxonID for a given species.
- scripts/taxa/classify_hits_kingdom.PLS
- Classify hits on the taxonomy hierarchy from an -m9/-m8 BLAST tab-delimited report using a local copy of the taxonomy database downloaded from NCBI.
Tree
- scripts/tree/blast2tree.PLS
- Builds a phylogenetic tree based on a sequence search (FASTA, BLAST, HMMER).
- scripts/tree/nexus2nh.PLS
- Convert Nexus tree format trees to New Hampshire tree format, but maintain long taxon names.
- scripts/tree/tree2pag.PLS
- Convert Bio::TreeIO parseable trees to Pagel tree format.
Utilities
- scripts/utilities/bp_mrtrans.PLS
- Perl implementation of Bill Pearson's mrtrans to project protein alignment back into cDNA coordinates.
- scripts/utilities/bp_nrdb.PLS
- Make a non-redundant database based on sequence, not id. Requires Digest::MD5.
- scripts/utilities/bp_sreformat.PLS
- Perl implementation of Sean Eddy's sreformat, a sequence and alignment converter.
- scripts/utilities/dbsplit.PLS
- Splits one or more sequence files into subfiles with specified numbers of sequences, any sequence format.
- scripts/utilities/download_query_genbank.PLS
- Use Bio::DB::Query::GenBank to download files from NCBI.
- scripts/utilities/mask_by_search.PLS
- Masks parts of a sequence based on significant matches to that sequence as contained in a Bio::SearchIO-compatible report file.
- scripts/utilities/mutate.PLS
- Randomly mutagenize a single protein or DNA sequence. Specify percentage mutated and number of resulting mutagenized sequences.
- scripts/utilities/pairwise_kaks.PLS
- Takes DNA sequences as input, aligns them as proteins, projects the alignment back into DNA and estimates the Ka (non-synonymous) and Ks (synonymous) substitutions.
- scripts/utilities/remote_blast.PLS
- This script executes a remote BLAST search using Bio::Tools::Run::RemoteBlast.
- scripts/utilities/revtrans_motif.PLS
- Reverse translate a Profam-like protein motif
- scripts/utilities/search2BSML.PLS
- Turns SearchIO-compatible reports into a BSML report.
- scripts/utilities/search2alnblocks.PLS
- Turns SearchIO-compatible reports into alignments in formats supported by Bio::AlignIO.
- scripts/utilities/search2tribe.PLS
- This script will turn a protein SearchIO-compatible report (BLAST's blastp, FASTA's FASTP and SSEARCH) into a Markov Matrix for TribeMCL clustering.
- scripts/utilities/search2gff.PLS
- Turn SearchIO parseable report(s) into a GFF report.
- scripts/utilities/seq_length.PLS
- Reports the total number of residues and total number of individual sequences in a specified sequence database file.
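The description of bp_nrdb.PLS mentions redundancy "based on sequence, not id" and a dependency on Digest::MD5. One plausible reading is that sequences are keyed by an MD5 digest of their residues, so records with identical sequences collapse into one entry. The Python sketch below shows that idea only; non_redundant is a hypothetical name, and treating upper and lower case as identical is an assumption.

```python
import hashlib
from typing import Iterable


def non_redundant(records: Iterable[tuple[str, str]]) -> list[tuple[list[str], str]]:
    """Collapse (id, sequence) records that share the same sequence.

    Returns one (ids, sequence) entry per distinct sequence, in order of
    first appearance. Assumption: comparison is case-insensitive.
    """
    seen: dict[str, tuple[str, list[str]]] = {}
    for seq_id, seq in records:
        digest = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        # Keep the first spelling of the sequence, collect every id for it.
        seen.setdefault(digest, (seq, []))[1].append(seq_id)
    return [(ids, seq) for seq, ids in seen.values()]


if __name__ == "__main__":
    data = [("a", "ACGT"), ("b", "acgt"), ("c", "TTTT")]
    for ids, seq in non_redundant(data):
        print(",".join(ids), seq)
```

Keying on a digest rather than on the full sequence keeps the lookup table small when the sequences are long.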
Example Scripts
Alignment
- examples/align/align_on_codons.pl
- Aligns nucleotide sequences based on codons in a specified reading frame.
- examples/align/aligntutorial.pl
- Examples using EMBOSS, Bio::Tools::pSW, Clustalw, TCoffee, and BLAST to align sequences.
- examples/align/clustalw.pl
- A demonstration of the various uses of Bio::Tools::Run::Alignment::Clustalw.
- examples/align/simplealign.pl
- A script that demonstrates some uses of Bio::AlignIO.
Bio::Biblio
- examples/biblio/biblio.pl
- A script that shows how to query bibliographic databases (such as MEDLINE) using ids, keywords, and other fields using the Bio::Biblio module.
- examples/biblio/biblio_soap.pl
- Connect to and test a SOAP server using a Bio::Biblio object.
DB
- examples/db/dbfetch
- Creates a Web page to query a local SRS server and fetch sequences.
- examples/db/est_tissue_query.pl
- Fetch EST sequences from local files or GenBank filtered by tissue using Bio::DB::* or Bio::Index::*.
- examples/db/gb2features.pl
- Shows how to extract all the features from a GenBank file using Bio::Seq.
- examples/db/getGenBank.pl
- Retrieving GenBank entries over the Web using Bio::DB::GenBank.
- examples/db/get_seqs.pl
- Fetches and formats sequences from GenBank, EMBL, or SwissProt over the network using Bio::DB::*.
- examples/db/rfetch.pl
- A script that uses Bio::DB::Registry to retrieve sequences from EMBL, reformat them, and print them.
- examples/db/use_registry.pl
- Script that shows how to use Bio::DB::Registry, part of BioPerl's integration with OBDA, the Open Bio Database Access registry scheme. See Bio::DB::Registry for more information.
- examples/db/gff/
- Scripts that reformat sequence to GFF and load GFF format files into an indexed database using Bio::DB::GFF.
SearchIO
- examples/searchio/blast_example.pl
- Print out all parsed values from a BLAST report.
- examples/searchio/custom_writer.pl
- Demonstrates how to extract data from BLAST reports and output as tab-delimited data.
- examples/searchio/hitwriter.pl
- Demonstrates how to extract data from BLAST reports and output as tab-delimited data.
- examples/searchio/hspwriter.pl
- Demonstrates how to extract data from BLAST reports and output as tab-delimited data.
- examples/searchio/htmlwriter.pl
- Demonstrates how to extract data from BLAST reports and output as HTML.
- examples/searchio/psiblast_features.pl
- Illustrates how to grab a set of SeqFeatures from a PSI-BLAST (blastpgp) report.
- examples/searchio/psiblast_iterations.pl
- Demonstrates the use of a SearchIO parser for processing the iterations within a PSI-BLAST report.
- examples/searchio/rawwriter.pl
- Shows how to print out raw BLAST alignment data for each HSP.
- examples/searchio/resultwriter.pl
- Demonstrates how to extract data from BLAST reports and output as tab-delimited data.
- examples/searchio/waba2gff.pl
- Convert raw WABA output to one type of GFF.
Tools
- examples/tools/extract_genes.pl
- Simple solution to the problem of extracting genomic regions corresponding to genes; uses files from NCBI and Bio::DB::Fasta.
- examples/tools/gb_to_gff.pl
- Extracts top-level sequence features from GenBank-formatted sequence files using Bio::Tools::GFF.
- examples/tools/reverse-translate.pl
- Reverse-translates a protein sequence into a nucleotide sequence using the most frequent codons; uses Bio::CodonUsage::Table and Bio::Tools::CodonTable.
- examples/tools/gff2ps.pl
- Takes an input file in GFF format and draws its genes and features as Postscript using Bio::Tools::GFF.
- examples/tools/parse_codeml.pl
- Script that parses output from codeml, one of the PAML programs, using Bio::Tools::Phylo::PAML.
- examples/tools/psw.pl
- Example code for using the Ext package for comparing proteins using Smith-Waterman.
- examples/tools/run_genscan.pl
- Run GENSCAN on multiple sequences and create output sequence files using Bio::Tools::Genscan.
- examples/tools/seq_pattern.pl
- A script that shows how to use sequences as regular expressions using Bio::Tools::SeqPattern.
- examples/tools/standaloneblast.pl
- The many uses of Bio::Tools::Run::StandAloneBlast, including BLAST and PSI-BLAST.
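To give a sense of what "using sequences as regular expressions" (seq_pattern.pl above) can mean, the following Python sketch expands IUPAC nucleotide ambiguity codes into regular-expression character classes. It does not reproduce the Bio::Tools::SeqPattern API; iupac_to_regex is a hypothetical name, and only the standard IUPAC DNA codes are handled.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex fragments.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(pattern: str) -> str:
    """Translate an IUPAC DNA pattern into a regular-expression string."""
    try:
        return "".join(IUPAC[base] for base in pattern.upper())
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]!r}") from None


if __name__ == "__main__":
    regex = re.compile(iupac_to_regex("GAYNC"))
    match = regex.search("TTGATGCAA")
    print(regex.pattern, match.group(0) if match else None)
```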
Bio::Root
- examples/root/exceptions1.pl
- A simple tester script for demonstrating how to throw and catch Error objects.
- examples/root/exceptions2.pl
- This shows how Error.pm-based objects can be thrown by Bio::Root::Root::throw() when Error.pm is available.
- examples/root/exceptions3.pl
- This shows that Error objects can be subclassed into more specialized types.
- examples/root/exceptions4.pl
- This shows how the examples work when Error.pm isn't installed.
Other
- examples/bioperl.pl
- A BioPerl shell!
- examples/cluster/dbsnp.pl
- How to parse a dbsnp XML file. See Bio::ClusterIO for details.
- examples/contributed/nmrpdb_parse.pl
- Extracts individual conformers from an NMR-derived PDB file.
- examples/contributed/prosite2perl.pl
- Convert Prosite motifs to Perl regular expressions.
- examples/contributed/rebase2list.pl
- Script to convert rebase file to format compatible with Bio::Tools::RestrictionEnzyme.
- examples/generate_random_seq.pl
- Writes random RNA, DNA, or protein sequence of given length.
- examples/liveseq/change_gene.pl
- A script showing how to use Bio::LiveSeq::Mutator and Bio::LiveSeq::Mutation.
- examples/longorf.pl
- A script that finds the longest ORF in one or more nucleotide sequences.
- examples/make_primers.pl
- Design PCR primers given a sequence and the positions of the start and stop codons in the sequence's ORF.
- examples/popgen/parse_calc_stats.pl
- Shows how to read data from a Bio::PopGen::IO object.
- examples/rev_and_trans.pl
- Examples using Bio::Seq for reversing and translating sequences.
- examples/revcom_dir.pl
- Return reverse complement sequences of all sequences in the current directory and save them in the same directory.
- examples/sirna/rnai_finder.cgi
- CGI script for RNAi reagent design. See Bio::Tools::SiRNA for more information.
- examples/seq/extract_cds.pl
- Extract the CDS features from a GenBank file.
- examples/seqstats/aacomp.pl
- Calculate amino acid composition of a protein using Bio::Tools::IUPAC and Bio::Tools::CodonTable.
- examples/structure/struct-io.ps
- How to examine details of the 3D structure of a protein by parsing a PDB using Bio::Structure::IO.
- examples/subsequence.cgi
- CGI script to fetch a sequence from GenBank and extract a subsequence using Bio::DB::GenBank.
- examples/tk/gsequence.pl
- Create a Protein Sequence Control Panel GUI with Gtk.
- examples/tk/hitdisplay.pl
- Create a GUI for displaying BLAST results using Bio::Tk::HitDisplay from the GUI package.
- examples/tree/paup2phylip.pl
- Convert a PAUP tree block to PHYLIP format.
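Several of the examples above (rev_and_trans.pl, revcom_dir.pl) revolve around reverse-complementing nucleotide sequences. As a language-neutral illustration of the operation itself, here is a minimal Python sketch; it is not taken from those scripts, reverse_complement is a hypothetical name, and leaving characters other than A, C, G, and T unchanged is an assumption.

```python
# Translation table for the four unambiguous DNA bases, both cases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Characters outside A, C, G, T (for example N) are kept as they are.
    """
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

Applying the function twice returns the original sequence, which makes a convenient sanity check.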