# Applications of mathematics and statistics to bioinformatics: information guide

## Lecturer

Conrad Burden (ANU)

## Synopsis

Bioinformatics is a rapidly growing interdisciplinary field concerned with the use of computational methods to solve biological problems related to DNA and amino acid sequence information. Typical problems addressed by bioinformaticians are identifying functionally different parts of a genome, searching DNA or protein databases to find sequences which are functionally similar to a given query sequence, or inferring the relatedness of different species by measuring the similarity of their genomes. The course will cover the mathematical theory behind some of the algorithms commonly used by biologists and also give examples of current research.

The course covers the following topics:

- A crash course in probability and statistics.
- Analysis of a single DNA sequence: Shotgun sequencing, word occurrences in random sequences.
- Comparison of DNA or protein sequences: Aligning and scoring DNA and protein sequences, significance measures of an alignment, the BLAST algorithm.
- High throughput sequencing: Detecting and quantifying differential gene expression.

## Contact hours

7 hours of lectures per week, with consultation as requested/required. For information on timetabling please visit the timetable web page.

## Prerequisites

Some exposure to probability and statistics would be very helpful. I will start from scratch, but the scratch may be very scratchy if you have not previously done any probability theory. Little or no knowledge of biology will be assumed, and I will devote one lecture to the small amount of biology needed.

## Assessment

For those students taking the course for credit, there are two assignments worth 25% each, to be handed in at the end of the second and fourth weeks, and one 3-hour exam held at your own institution during the first week of February.

I will be happy to mark assignments for not-for-credit students provided they are handed in on time.

## Resources

### Lecture notes

Lecture notes for the entire course will be available as a PDF file on the course web page.

### Textbook

Much of the material for the first three sections listed in the synopsis is from the graduate textbook:

- Statistical Methods in Bioinformatics: An Introduction by W.J. Ewens and G.R. Grant (2nd Ed., Springer, New York, NY).

### Software

Some of the assignment questions require programming with the open-source statistical software package R, which can be downloaded from http://cran.r-project.org/.

I will provide a do-it-yourself introductory tutorial sheet covering the basics of R, and will arrange a tutorial if there is enough demand.

## About Conrad Burden

Conrad Burden has been a bioinformatician in the Mathematical Sciences Institute since 2003. His areas of research include the analysis of high throughput sequencing data, the statistics of word counts in random sequences with applications to DNA and protein sequences, and physico-chemical modelling of microarrays. Prior to making the transition to bioinformatics he was a theoretical physicist working in subatomic particle physics.

## Contacts

For further information about this course, please contact Dr Burden directly.

Dr Conrad Burden

Mathematical Sciences Institute

Building 27

Australian National University

Canberra A.C.T. 0200

**Phone:** 61-2-61250730

**E-mail:** conrad.burden@anu.edu.au

**Web page:** http://wwwmaths.anu.edu.au/~burden/Personal/index.html