A tutorial introduction to hidden Markov models and other probabilistic modelling approaches in computational sequence analysis.
Richard Durbin, Sean Eddy, Anders Krogh, and Graeme Mitchison.
Cambridge University Press, 1998.
ISBN 0-521-62041-4 (hardback)
ISBN 0-521-62971-3 (paperback)
An up-to-date list of errata is here.
If you discover an error that isn't already on the list, please
The face of biology has been changed by the emergence of modern molecular genetics. Among the most exciting advances are large-scale genome sequencing efforts, such as the Human Genome Project, which are producing an immense amount of data. The need to understand the data is becoming ever more pressing. Demands for sophisticated analyses of biological sequences are driving forward the newly created and explosively expanding research area of computational molecular biology, or bioinformatics.
Many of the most powerful sequence analysis methods are now based on principles of probabilistic modeling. Examples of such methods include the use of probabilistically derived score matrices to determine the significance of sequence alignments, the use of hidden Markov models as the basis for profile searches to identify distant members of sequence families, and the inference of phylogenetic trees using maximum likelihood approaches.
This book provides the first unified, up to date, and tutorial level overview of sequence analysis methods, with particular emphasis on probabilistic modelling. Pairwise alignment, hidden Markov models, multiple alignment, profile searches, RNA secondary structure analysis, and phylogenetic inference are treated at length.
Written by an interdisciplinary team of authors, the book is accessible to molecular biologists, computer scientists, and mathematicians with no formal knowledge of each others' fields. It presents the state of the art in this important, new and rapidly developing discipline.
|Sequence similarity, homology, and alignment||2|
|Overview of the book||2|
|Probabilities and probabilistic models||4|
|The scoring model||13|
|Dynamic programming with more complex models||28|
|Heuristic alignment algorithms||32|
|Linear space alignments||34|
|Significance of scores||36|
|Deriving score parameters from alignment data||41|
|Markov chains and hidden Markov models||46|
|Hidden Markov models||51|
|Parameter estimation for HMMs||62|
|HMM model structure||68|
|More complex Markov chains||72|
|Numerical stability of HMM algorithms||77|
|Pairwise alignment using HMMs||80|
|The full probability of x and y, summing over all paths||87|
|The posterior probability that xi is aligned to yj||91|
|Pair HMMs versus FSAs for searching||95|
|Profile HMMs for sequence families||100|
|Ungapped score matrices||102|
|Adding insert and delete states to obtain profile HMMs||102|
|Deriving profile HMMs from multiple alignments||105|
|Searching with profile HMMs||108|
|Profile HMM variants for non-global alignments||113|
|More on estimation of probabilities||115|
|Optimal model construction||122|
|Weighting training sequences||124|
|Multiple sequence alignment methods||134|
|What a multiple alignment means||135|
|Scoring a multiple alignment||137|
|Multidimensional dynamic programming||141|
|Progressive alignment methods||143|
|Multiple alignment by profile HMM training||149|
|Building phylogenetic trees||160|
|The tree of life||160|
|Background on trees||161|
|Making a tree from pairwise distances||165|
|Assessing the trees: the bootstrap||179|
|Simultaneous alignment and phylogeny||180|
|Appendix: proof of neighbour-joining theorem||190|
|Probabilistic approaches to phylogeny||192|
|Probabilistic models of evolution||193|
|Calculating the likelihood for ungapped alignments||197|
|Using the likelihood for inference||205|
|Towards more realistic evolutionary models||215|
|Comparison of probabilistic and non-probabilistic methods||224|
|Stochastic context-free grammars for sequence modelling||252|
|RNA structure analysis||260|
|RNA secondary structure prediction||267|
|Covariance models: SCFG-based RNA profiles||277|
|Background on probability||299|
|Estimation of probabilities from counts||319|
|The EM algorithm||323|