# Regular expressions and Repeats

From BioPerl

## How do I find an iteration of any sequence of a specific length?

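`/(QA)+/` will match one or more consecutive copies of `QA`, but what if you want to match a repeat of *any* two-residue unit? Use a backreference:

`/(..)\1+/`

After a successful match, `$1` holds the repeated unit and `length($&)/2` gives the number of copies of it. This is always at least 2, because `\1+` requires one or more further copies after the first.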
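The same idea can be extended to scan a whole sequence for every tandem repeat of a chosen unit length. What follows is a minimal sketch, not a BioPerl API. The helper name `find_tandem_repeats`, the unit-length parameter `$k` and the sample sequence are made up for illustration. The sketch computes the match length from the position arrays `@-` and `@+` rather than from `$&`, because on Perl versions before 5.20 mentioning `$&` anywhere slows down every regex in the program.

```perl
use strict;
use warnings;

# Hypothetical helper: find every tandem repeat whose unit is exactly $k
# residues long and occurs at least twice in a row.
sub find_tandem_repeats {
    my ($seq, $k) = @_;
    my @hits;
    # (.{$k}) captures a unit of $k residues; \1+ requires one or more
    # further copies of that same unit immediately after it.
    while ($seq =~ /(.{$k})\1+/g) {
        my $len = $+[0] - $-[0];    # length of the whole match, without $&
        push @hits, {
            start  => $-[0],        # 0-based offset of the first copy
            unit   => $1,
            copies => $len / $k,
        };
    }
    return @hits;
}

# Usage: report 2-residue tandem repeats in a made-up sequence.
for my $hit (find_tandem_repeats("MKQAQAQAQLLSTSTG", 2)) {
    printf "%s x %d at %d\n", $hit->{unit}, $hit->{copies}, $hit->{start};
}
# Prints:
# QA x 3 at 2
# ST x 2 at 11
```

Matching is leftmost-first, so `QAQAQAQ` is reported as `QA` x 3 rather than `AQ` x 3. Because `.` matches any residue, a homopolymer such as `LLLL` is also reported as `LL` x 2.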

## How do I find some sequence flanked by homopolymers of a given length?

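For example, to find **FAFCRCFCFAFAFCRF** flanked on each side by a run of at least *n* Qs, as in:

`AGTWRWDFDQQQQQQQQFAFCRCFCFAFAFCRFQQQQQQQQQQQQQ`

the regular expression would be something like

`/(Q{$n,})([^Q]{$x,})(Q{$n,})/`

where `$n` is the minimum length of each Q run and `$x` is the minimum length of the non-Q stretch between them.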

Example:

```
perl -e '$n=5; $x=9; $_= "AGTWRWDFDQQQQQQQQFAFCRCFCFAFAFCRFQQQQQQQQQQQQQ"; print "$1|$2|$3\n" if /(Q{$n,})([^Q]{$x,})(Q{$n,})/;'
```

which prints

```
QQQQQQQQ|FAFCRCFCFAFAFCRF|QQQQQQQQQQQQQ
```
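The one-liner can be wrapped in a reusable subroutine that takes the flanking residue as a parameter and also reports where the match starts. This is a minimal sketch, not an existing BioPerl function. The name `find_flanked_core` is hypothetical, and the sketch assumes the flanking residue `$res` is a single letter, because it is used inside a character class.

```perl
use strict;
use warnings;

# Hypothetical helper generalising the one-liner: look for a stretch of at
# least $x residues other than $res, flanked on each side by a run of at
# least $n copies of $res. Assumes $res is a single residue letter.
# Returns (left flank, core, right flank, 0-based start) or an empty list.
sub find_flanked_core {
    my ($seq, $res, $n, $x) = @_;
    my $r = quotemeta $res;    # match the residue literally
    return unless $seq =~ /((?:$r){$n,})([^$r]{$x,})((?:$r){$n,})/;
    return ($1, $2, $3, $-[0]);
}

# Usage: the example sequence, built from its parts (8 Qs before, 13 after).
my $seq = "AGTWRWDFD" . ("Q" x 8) . "FAFCRCFCFAFAFCRF" . ("Q" x 13);
my ($left, $core, $right, $start) = find_flanked_core($seq, "Q", 5, 9);
print "$left|$core|$right at $start\n" if defined $core;
```

The flanking quantifiers are greedy, so each flank captures its whole run of Qs, exactly as in the one-liner.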

## How do I find any homopolymer flanked on both sides by the same amino acid?

For example, **HTTTTTTTTTTH** or **TGGGGGGGGGGGT**.

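Capture the flanking residue, then a different residue, then any further copies of that residue, then the flank again:

`/(.)(?!\1)(.)\2*\1/`

Here `(?!\1)` ensures that the run residue differs from the flank, and `\2*` forces the rest of the run to repeat it. A backreference cannot be used inside a character class: `[^\1]` is read as "any character except the octal escape `\1`" (that is, `\x01`), so it neither excludes the flank nor makes the run uniform.

In action (the outer group captures the whole match, so the inner backreferences become `\2` and `\3`):

```
perl -e '$_ = "HTTH"; print "|$1|\n" if /((.)(?!\2)(.)\3*\2)/;'
```

which prints `|HTTH|`. Note that the "homopolymer" could have a length of **1**! Replace `*` with `+` to require a run of at least two residues.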
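To list every such bracketed homopolymer in a sequence, the corrected pattern can be run in a `while (//g)` loop. The sketch below is an illustration only. The subroutine name `find_bracketed_homopolymers`, its `$min` run-length parameter and the sample sequence are hypothetical. The whole pattern is placed inside a lookahead, so each match is zero-width and hits that share a flanking residue (as in `HTTHAAH`) are all reported.

```perl
use strict;
use warnings;

# Hypothetical helper: find every run of at least $min identical residues
# that is bracketed on both sides by the same, different residue
# (e.g. HTTTH). $min defaults to 1, like the regex in the text.
sub find_bracketed_homopolymers {
    my ($seq, $min) = @_;
    $min = 1 unless defined $min;
    my $extra = $min - 1;    # copies required after the first run residue
    my @hits;
    # Group 2 is the flank, group 3 the run residue; (?!\2) forces them to
    # differ. The surrounding lookahead makes each match zero-width, so the
    # /g loop advances one position at a time and finds overlapping hits.
    while ($seq =~ /(?=((.)(?!\2)(.)(?:\3){$extra,}\2))/g) {
        push @hits, { start => $-[1], match => $1, flank => $2, run => $3 };
    }
    return @hits;
}

for my $hit (find_bracketed_homopolymers("HTTHAAHPGGGP")) {
    print "$hit->{match} at $hit->{start}\n";
}
# Prints:
# HTTH at 0
# HAAH at 3
# PGGGP at 7
```

Passing a larger `$min` (for example `find_bracketed_homopolymers($seq, 3)`) filters out short runs such as the single-residue "homopolymers" noted above.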