Here is an example of how to use the code to 'lift over' features aligned to a contig into a chromosomal coordinate space (i.e. using an AGP file).
#!/usr/bin/perl -w use strict; use Bio::Location::Simple; use Bio::Coordinate::Pair; # Define the contig my $ctg = Bio::Location::Simple-> new( -seq_id => 'ctg', -start => 1, -end => 999, -strand => +1 ); # Define the contig's position in the chromosome my $ctg_on_chr = Bio::Location::Simple-> new( -seq_id => 'chr', -start => 2001, -end => 2999, -strand => -1 ); # Define the mapping between the two coordinate systems (as an object) my $agp = Bio::Coordinate::Pair-> new( -in => $ctg, -out => $ctg_on_chr ); # Create an example 'match' that has contig coordinates my $match_on_ctg = Bio::Location::Simple-> new( -seq_id => 'hit', -start => 25, -end => 125, -strand => +1 ); # Convert the match into chromosomal coordinates my $match_on_chr = $agp->map( $match_on_ctg ); # (and print it) print join("\t", $match_on_chr->seq_id, $match_on_chr->start, $match_on_chr->end, $match_on_chr->strand ), "\n";
chr 2875 2975 -1
See the test script (and its results) on the talk page. Here is some commentary on that script:
What we see is that, that part of the feature to 'map' that extends beyond the range of the '-in' sequence is (arbitrarily?) mapped to a 'gap' (a Bio::Coordinate::Result::Gap) on *that* sequence in the result (a Bio::Coordinate::Result). This gap has the seq_id and strand of *that* sequence, and that's that.
i.e. The Bio::Coordinate::Pair may define the position of (part of) a contig in a scaffold, and we may wish to 'map' some features that are in 'contig coordinates' onto the scaffold to get the same features in 'scaffold coordinates'. Any feature falling fully within the contig (or fully within that part of the contig that we specify when creating the Bio::Coordinate::Pair) is 'mapped' as a single 'Match' (Bio::Coordinate::Result::Match) within the result.
If, however, feature (in 'contig coordinates') lies outside or across that part of the contig specified when creating the Bio::Coordinate::Pair, then that (part of) the feature is 'mapped' as a single 'Gap' (Bio::Coordinate::Result::Gap) in the result. The gap is given in terms of the contig (i.e. it has the contig seq_id and the contig strand).
I'm really not sure of the semantics of this... It makes sense that, if we put half of one contig into one scaffold, for example, that we don't want features from the other half of the contig to be mapped onto the scaffold, but I don't know why we would want those features to turn up as gaps, but there you have it.