Species names from accession numbers
(see thread)
- A Bio::DB::EUtilities scrap, showing how you can profitably combine "elink" and "esummary" --Ed.
Bhakti Dwivedi wonders:
Does anyone know how to retrieve the "Source" or the "Species name" given the accession number?
The following scrap (with portions suspiciously reminiscent of HOWTO:EUtilities) demonstrates how you might do this:
use Bio::DB::EUtilities;

my (%taxa, @taxa);
my (%names, %idmap);

# these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide',
# (probably)
my @ids = qw(1621261 89318838 68536103 20807972 730439);

my $factory = Bio::DB::EUtilities->new(-eutil          => 'elink',
                                       -db             => 'taxonomy',
                                       -dbfrom         => 'protein',
                                       -correspondence => 1,
                                       -id             => \@ids);

# iterate through the LinkSet objects
while (my $ds = $factory->next_LinkSet) {
    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
}

@taxa = @taxa{@ids};

$factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
                                    -db    => 'taxonomy',
                                    -id    => \@taxa );

while (local $_ = $factory->next_DocSum) {
    $names{($_->get_contents_by_name('TaxId'))[0]} =
        ($_->get_contents_by_name('ScientificName'))[0];
}

foreach (@ids) {
    $idmap{$_} = $names{$taxa{$_}};
}

# %idmap is
# 1621261 => 'Mycobacterium tuberculosis H37Rv'
# 20807972 => 'Thermoanaerobacter tengcongensis MB4'
# 68536103 => 'Corynebacterium jeikeium K411'
# 730439 => 'Bacillus caldolyticus'
# 89318838 => undef (this record has been removed from the db)

1;
--maj
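The scrap works by joining two lookups: submitted id to taxonomy id (from elink), then taxonomy id to scientific name (from esummary). The sketch below isolates that join so it can be run without BioPerl or a network connection. It is only an illustration under stated assumptions: the two hashes are hand-filled stand-ins for the query results, the taxonomy ids (tax_A, tax_B) are made-up placeholders rather than real NCBI TaxIds, and the names @linked and @query are hypothetical. It also shows one way you might drop the undef left behind by a removed record before building the second query, which the scrap itself does not do.

```perl
use strict;
use warnings;

# Stand-in for the elink result: submitted id => taxonomy id.
# tax_A and tax_B are placeholders, NOT real NCBI TaxIds.
my %taxa = (
    1621261  => 'tax_A',
    68536103 => 'tax_B',
);

# Stand-in for the esummary result: taxonomy id => scientific name.
my %names = (
    tax_A => 'Mycobacterium tuberculosis H37Rv',
    tax_B => 'Corynebacterium jeikeium K411',
);

# 89318838 has no entry in %taxa, mimicking a removed record.
my @ids = qw(1621261 89318838 68536103);

# Hash slice: one taxonomy id per submitted id, undef where no link exists.
my @linked = @taxa{@ids};

# Keep only defined, unique taxonomy ids for the second query.
my %seen;
my @query = grep { defined($_) && !$seen{$_}++ } @linked;

# Join the two lookups; ids without a link map to undef.
my %idmap = map {
    my $tax = $taxa{$_};
    ( $_ => defined($tax) ? $names{$tax} : undef )
} @ids;

for my $id (@ids) {
    printf "%s => %s\n", $id, $idmap{$id} // 'undef';
}
```

In the real scrap, a list like @query is what you would pass as -id to the esummary factory; checking defined($tax) before the final lookup also avoids an "uninitialized value" warning when warnings are enabled.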