Reading genomes, bit by bit

Because of rapid advances in genome sequencing technology, we can finally see the source code for life: the complete genomic DNA sequences that specify development, regulation, and function of organisms. We still don't really understand how to read this trove of encoded information, nor do we understand in any satisfying detail how it evolved.

Analogies to reading, source code, and encoding quickly break down. Genomes are digital information, and can be analyzed with tools that have parallels in digital signal processing fields, but genomes work very differently from human digital technologies. Using computational analysis to figure out how genomes work and how they evolved is a field of its own: computational genomics.

Our laboratory develops computational methods for genome sequence analysis. We are particularly interested in methods for detecting remote evolutionary relationships among protein and RNA sequences.

We're a Howard Hughes Medical Institute laboratory at Harvard University, based in the Department of Molecular & Cellular Biology and in Applied Mathematics in the School of Engineering and Applied Sciences. We're also affiliated with the Center for Brain Science.

How to reach us

Department of Molecular & Cellular Biology
Biological Laboratories 1008A
16 Divinity Avenue
Harvard University
Cambridge MA 02138, USA


Hidden Markov models for sequence profile analysis.


RNA structure analysis using covariance models.


Database of protein family alignments and hidden Markov models.


The Rfam database of RNA alignments, consensus secondary structures, and profile SCFGs.


The Dfam database of repetitive DNA sequence elements.