NOTE: THIS IS A STUB FOR A WORK IN PROGRESS. This will not be added to the official HOWTO until it is mostly complete.
Entrez Utilities (EUtilities)
EUtilities are a group of methods with an unusual API that give access to most of the information in the Entrez databases. There are different types of EUtilities (detailed below), each with its own function.
For some quick recipes with EUtilities, read the cookbook.
The efetch EUtility fetches sequences from different databases in different formats. See the efetch documentation on the NCBI site.
Note: a sequence file doesn't necessarily contain the actual DNA/RNA/protein sequence. It may contain only the Annotations and SeqFeatures.
use Bio::DB::EUtilities;

my $fetcher = Bio::DB::EUtilities->new(
    -eutil     => 'efetch',
    -db        => 'nucleotide',  # database to search (gene/nucleotide/protein/etc)
    -rettype   => 'gb',          # output type
    -retmode   => 'text',        # output format
    -id        => $acc,          # the accession, such as NM_002105.2
    -seq_start => $start,        # first position of the range to fetch
    -seq_stop  => $stop,         # last position of the range to fetch
    -strand    => $strand,       # 2 for complement or minus strand, 1 otherwise
);
The db option selects the database to search.
rettype and retmode
These options control the output type and format. The possible values depend on the database the data is retrieved from.
rettype is the output type, such as fasta, gb or native.
retmode is the output format, such as text, xml, html or asn1. To feed the output of this EUtility to Bio::SeqIO without pre-parsing, use 'text'.
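As a rough illustration of the 'text' advice above, the sketch below saves an efetch response to a file and reads it back with Bio::SeqIO. The helper names (seqio_format_for, fetch_seqs), the small rettype-to-format table and the e-mail placeholder are assumptions made for this example, not part of the EUtilities API. Only the rettypes listed in the table are handled.

```perl
use strict;
use warnings;

# Assumed mapping from an efetch rettype to a Bio::SeqIO format name.
# Only a few common flat-file types are listed; extend as needed.
my %SEQIO_FORMAT = (
    gb    => 'genbank',
    gp    => 'genbank',   # GenPept records are read by the GenBank parser
    fasta => 'fasta',
);

# Return the Bio::SeqIO format for a rettype, or undef if it is not listed.
sub seqio_format_for {
    my ($rettype) = @_;
    return $SEQIO_FORMAT{ lc $rettype };
}

# Hypothetical helper: fetch one record as text, write it to $file and
# return the Bio::Seq objects parsed from that file.
sub fetch_seqs {
    my ($db, $acc, $rettype, $file) = @_;
    my $format = seqio_format_for($rettype)
        or die "No Bio::SeqIO format known for rettype '$rettype'\n";

    # Loaded lazily so the mapping above works without BioPerl installed.
    require Bio::DB::EUtilities;
    require Bio::SeqIO;

    my $fetcher = Bio::DB::EUtilities->new(
        -eutil   => 'efetch',
        -db      => $db,
        -rettype => $rettype,
        -retmode => 'text',             # plain text, so no pre-parsing is needed
        -email   => 'you@example.org',  # placeholder address
        -id      => $acc,
    );
    $fetcher->get_Response(-file => $file);   # write the raw response to disk

    my $in = Bio::SeqIO->new(-file => $file, -format => $format);
    my @seqs;
    while (my $seq = $in->next_seq) {
        push @seqs, $seq;
    }
    return @seqs;
}
```

A call such as fetch_seqs('nucleotide', 'NM_002105.2', 'gb', 'NM_002105.gb') needs BioPerl installed and network access.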
seq_start and seq_stop
When retrieving a sequence with efetch, these options set the first and last positions of the region to retrieve. If they are not specified, the whole sequence is downloaded (which can be HUGE in the case of contigs).
When retrieving a sequence with efetch, the strand option selects which strand is returned. It can be 1 or 2. A value of 1 is the plus strand, while a value of 2 is the minus (complement) strand.
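BioPerl locations record strand as 1 or -1, whereas efetch expects 1 or 2, so a small conversion step can help when the coordinates come from a feature. The sketch below is one way to do that; efetch_strand and efetch_range_args are hypothetical helper names, not BioPerl methods, and treating anything other than -1 as the plus strand is an assumption of this example.

```perl
use strict;
use warnings;

# Translate a BioPerl-style strand (1 or -1) into the efetch value (1 or 2).
# Assumption: anything other than -1 (including 0 or undef) means plus strand.
sub efetch_strand {
    my ($strand) = @_;
    return 2 if defined $strand && $strand == -1;
    return 1;
}

# Build the range-related efetch options from a start, an end and a
# BioPerl-style strand, ready to be added to the constructor arguments.
sub efetch_range_args {
    my ($start, $end, $strand) = @_;
    return (
        -seq_start => $start,
        -seq_stop  => $end,
        -strand    => efetch_strand($strand),
    );
}

# Example: a feature on the minus strand covering positions 1 to 1200.
my %args = efetch_range_args(1, 1200, -1);
print join(', ', map { "$_ => $args{$_}" } sort keys %args), "\n";
```

The returned list can be appended to the other options passed to Bio::DB::EUtilities->new.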