Alignment-free biological sequence comparison

Biologists frequently need to assess how closely related DNA or amino acid sequences are, for example when searching for homologous genes between species or inferring phylogenetic trees. Conventional methods compare sequences through long sequence alignments, which assume that sequences differ by point substitutions, insertions and deletions. This assumption may not be appropriate when genes are closely related but differ by the insertion, deletion or repetition of extended stretches of sequence. This project looks at alternative, alignment-free measures of sequence comparison based on counting the number of short-word matches between sequences.
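
As a rough illustration of what counting short-word matches can mean, the sketch below counts every pair of positions, one in each sequence, at which the words of length k are identical. The function names and the choice of k are assumptions made for this example, not the project's own implementation, and no normalisation for sequence length or composition is attempted.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_match_count(a, b, k):
    """Count position pairs (i in a, j in b) whose k-words are identical.

    Hypothetical helper for illustration: a word occurring n times in a
    and m times in b contributes n * m matches.
    """
    counts_a = kmer_counts(a, k)
    counts_b = kmer_counts(b, k)
    # Counter returns 0 for words absent from b, so unmatched words add nothing.
    return sum(n * counts_b[word] for word, n in counts_a.items())


if __name__ == "__main__":
    # Two toy sequences built from the same two blocks in swapped order.
    original = "ACGTTTGA"
    rearranged = "TTGAACGT"
    print(word_match_count(original, rearranged, k=3))
```

Because the count depends only on which words occur and not on where, the swapped blocks in the toy sequences still contribute matches, which is the kind of rearrangement an alignment handles poorly.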