Yes, there are errors. Unlike Donald Knuth, we don't dare offer you $1 for each one you find. But we would still appreciate knowing about them!

Please report errata to

The book is now past its 10th printing, and in its third revision.

- p.19, a previous fix (see below) had added: "Assume that gapped sequence alignments do not allow gaps in the second sequence after a gap in the first; that is, allow alignments of form ABC/A-C and A-CD/AB-D but not AB-D/A-CD. (This is a natural restriction, because a region between aligned pairs can be aligned in a large number of uninteresting ways.)" This contains a typo: change "after" to "before", so that it reads "do not allow gaps in the second sequence before a gap in the first". Alternatively, "do not allow inserted residues in the second sequence after an inserted residue in the first". [Thanks to Jeffrey Xing, February 2016.]
- p.30, we say, "we assume that a deletion will not be followed
directly by an insertion. This will be true for the optimal path if
-d-e is less than the lowest mismatch score." This is incorrect.
A counterexample: suppose d=10, e=1, lowest mismatch score = -10,
and the alignment:
ccc--- ---aaa

It should instead probably read something like, "This will be true for an optimal path if -2e <= the lowest mismatch score and d>=e." [Thanks to Martin Frith, CBRC, Japan, Jun 2012]. - p.39, eqn 2.18, the equation for the extreme value distribution is missing a negative sign in the innermost exponential, in front of \lambda (x - \mu). [Thanks to Larry Ruzzo, UW Seattle, Apr 2011].

- p.52, "normalised by dividing by their length, i.e. as an average number of bits per molecule" should be "per residue". [Thanks to Michael Hoffman, EBI, June 2008].
- p.78 (in my 1999 1st rev copy), "The b variables have to be scaled with the same numbers, so the recursion step in (3.3) becomes...": two corrections here. First, (3.3) should be (3.13). Second, change "have to be scaled" to "can be scaled". Using the same scaling factors is convenient because it leads to the simplifications in exercises 3.13 and 3.14, but is not necessary, as Kevin Murphy has pointed out (p.8 of https://www.cs.ubc.ca/~murphyk/Papers/segment.pdf). [Thanks to Jason Ernst, UCLA, Sept 2022]

- p.88, the pseudocode for the recursion doesn't correctly handle the border cases i=0 or j=0. [Thanks to Lee Newberg, Wadsworth Institute, Oct 2007]

- p.196 in "... namely purine to purine or pyr-miding to pyrimidine", an 'i' is missing from the first pyrimidine, right at the line break. [Thanks to Tommy Kaplan, Dec 2015]

- pp.270-273. We should make clear that by "Nussinov algorithm" we mean the general class of base-pair scoring algorithms, as opposed to "Zuker algorithms" that are stacking- (also said to be loop-)dependent algorithms. There are many different DP algorithms (in detail) for maximizing the number of base pairs in an RNA sequence, corresponding to different choices of grammar. The actual algorithm in Nussinov (1978) corresponds essentially to grammar G5 in Dowell and Eddy, BMC Bioinf 5:71 (2004), whereas the algorithm in the book corresponds to grammar G1. G5 is an unambiguous grammar, but is a little tricky to understand if it's the first RNA folding algorithm one sees. G1 is more intuitive, which is why we use it here, while still calling it a "Nussinov" algorithm because it's maximizing the number of base pairs. [Thanks to Georg Sauthoff, Bielefeld University, Germany, Dec 2008.]
- p.299: Two references are missing and appear as ?. This was a formatting error
erroneously introduced into the third revision. The missing references are:
Margalit, H., Shapiro, B. A., Oppenheim, A. B. and Maizel, J. V. 1989. Detection of common motifs in RNA secondary structures. Nucleic Acids Research 17:4829-4845.

Shapiro, B. A. and Zhang, K. 1990. Comparing multiple RNA secondary structures using tree comparisons. Computer Applications in the Biosciences 6:309-318.

[Thanks to Yiran Guo, Beijing Institute of Genomics; April 2008]

- p.317: The description of rejection sampling for the Dirichlet is not correct.
The condition g(x,\alpha ,1) \leq f(x) means that f(x) cannot be a normalized
probability distribution, since g(x,\alpha ,1) is.
Actually f(x) = C \rho (x), where \rho(x) is a probability (see Exercise 11.8): \rho (x) = \frac{\lambda \alpha^\lambda x^{\lambda -1}} {(\alpha^\lambda +x^\lambda )^2}, and the constant C = \frac {4e^{-\alpha}\alpha^\alpha}{\lambda\Gamma (\alpha )}.

Then 'sample from f' should be 'sample from \rho'; and finally \rho(x) P({\rm rand}[0,1]< g(x)/f(x)) \propto g(x).

[Thanks to Dr. Wei-Mou Zheng, Institute of Theoretical Physics, Academica Sinica, Beijing; April 2008]

- pg.17, exercises 2.3 and 2.4. A careful reader may be unduly confused by the fact that d=12,e=2 is used for gap penalties in both exercises, but the score matrices (BLOSUM62 and BLOSUM50) are scaled by different factors: BLOSUM 62 is in half bits, and BLOSUM50 is in third bits. We are not trying to confuse you here: the identity of the two sets of gap penalties is not significant to the questions. Once you see that gap penalties imply probability distributions over gap opening frequency and gap extension frequency, you'd think (correctly) that gap scores should be scaled by the same constant as the scoring matrix you use. In practice, people really don't always rescale their gap penalties when they shift to a matrix that has a different scale -- an illuminating consequence of it not being widely appreciated that gap scores have a probabilistic interpretation. [Thanks to Chun Li, U.Mich.]
- pg.19, exercises 2.5-2.7. This line of argument relies on pairwise
alignments being of the form:
A-C or ABC or A-BC but not AB-C ABC A-C AD-C A-DC

That is, for any region of unaligned residues between two aligned columns, you have to put the unaligned residues in y (the second sequence) before the unaligned residues in x (the first sequence). This is sufficient to make sure there is only one possible representation (and intercalation) per alignment, where we say that any two alignments that specify the same set of aligned residues are the same alignment. (This is related to ambiguity issues in formal grammars.) Thus, in the third printing, we've added: "Assume that gapped sequence alignments do not allow gaps in the second sequence after a gap in the first; that is, allow alignments of form ABC/A-C and A-CD/AB-D but not AB-D/A-CD. (This is a natural restriction, because a region between aligned pairs can be aligned in a large number of uninteresting ways.)"

[thanks to Bob Dobrow, Carleton College, 9 Apr 2002].

[corrected again and hopefully definitively by AK, 12 Dec 2005]

- pg. 21, figure 2.5. The 7th cell over and 2nd cell down should read "41" instead of "42". Also, in cases where degenerate traceback pointers are possible, only one traceback pointer is shown.... a note in the Figure legend could be added to clarify that not all traceback pointer possibilities have been shown. [thanks to R. Gottlieb, U. Colorado]
- p.27, figure 2.8 contains some errors. cell A2,E2 should read -3 not -2. E5,A5 should be -7, not 7. A6,W6 should be -3 not 3. E7,G7 should be -4 not 4. [thanks to Rodger Staden, MRC-LMB] And A4,H6 should be -6, not 6 [thanks to DM Meyer, Nov 2004]. Figure 2.8 needs to be recalculated; looks like multiple errors crept in.
- p.27, the words "right" and "bottom" should be swapped in the sentence "We set Fmax to be the maximum value on the right border (i,m),i = 1,...,n, and the bottom border (n,j),j = 1,...,m." [thanks to Paul Stothard, University of Alberta, 16 June 2003]
- pg. 28, in the DP recursion. F(i-1,n) should read F(i-1,m). [thanks to David Gestel, Utrecht University, 4 January 2001]
- pg. 35, 2nd pp from bottom: should say "set c(i,j) = j if i = u", not j'. Also, (since we initialize c(i,j) on row u) it should say "For i \geq u let us define", not i > u. [Thanks to danber, Xi'an Jiaotong University, 4 April 2003]
- p. 43 top: It should say "the expected frequency of substitutions", not "the expected number of substitutions" in an average protein, and the equation should be \sum_(a,b, a \neq b) q_a q_b B_{ab} because it needs to sum over nonidentical residue pairs (a != b), not all residue pairs. [thanks to Carlos Gerardo Arroyo at SLAC, 12 Dec 2000]

- p55. At the top, the example of calculating a probability for
the sequence CGCG has two typos in it. It should be:
a0,c+ * 1 * ac+,g- * 1 * ag-,c- * 1 * ac-,g+ * 1 * ag+,0

[thanks to Bill Nagy, IBM] - p65. At the top, it says that Viterbi training "maximises the contribution to the likelihood P(x1..xn | \theta, \pi*(x1)..\pi*(xn))." That should read P(x1..xn, \pi*(x1)..\pi*(xn)): e.g. Viterbi finds an optimal path by maximizing P(path, seq | model) over possible paths. [thanks to Shlomo Moran, Technion, Haifa]
- pp 70-71. The comparison of the numbers of transitions between the "fully downstream connected" model and the "silent state" model may be confusing. It's less confusing if we change 600 to 800 for the number of transitions in the silent state model for 200 match states; the figure at the top of p. 71 clearly shows 4 transitions per match state. The reason it says 600 was that we weren't counting the 200 match->match "main line" transitions in the comparison (since both models have to have them), only the extra transitions having to do with further downstream connectivity. [thanks to Chun Li, U.Mich.]

- p 83. The initialisation of the pair-HMM Viterbi algorithm isn't right. It should be v^M(0,0)=1, v^X(0,0)= v^Y(0,0)=0, and v^*(-1,j)=v^*(i,-1)=0. The recursion should be over i=0,..,n and j=0,..,m except for (0,0). As it is, there can be no deletes in the beginning of any of the sequences. It would be nicer to avoid the -1 row/column and explicitly initialize v^X(0,j) and v^X(i,0) and similarly for v^Y [and initialize v^M(0,j)=v^M(i,0)=0, except for v^M(0,0)=1]. Then the recusion would start at 1 in i and j, and things would start to look more like chap 3. [ak, 7 Sept 99]
- p.84 eqn 4.2. In two places, q_x_j should be q_y_j. [Thanks to Michael Hoffman, UT Austin, 6 Aug 03.]
- p.84. The equation for e should have 1-\eta in the denominator, not 1-\tau. [Thanks to J. Burton, 27 Feb 02.]
- p.85 top. The initialization of V^M(0,0) should be -2 \log \eta, not 2 \log \eta. (And I'm a little suspicious still: I think this algorithm doesn't work out right for some special cases, like length 0 alignment, or a length 1 insert-only alignment.) [sre, 2 Jan 03].
- p.85. Where it says "If the exit transition probabilities from the gap states are set to (1-epsilon)tau/(1-tau) then c will be zero", that should read "...set to (1-epsilon)tau/(1-2delta)...". [Thanks to Youyi Fong, 21 Dec 02]
- p.90. In the example global alignments, two pairs - (1,1),(1,3); (2,3), (3,1) are identical, but the intention was to show different alternative global alignments. [Thanks to P. Tung-Ming, 28 Apr 02.]
- p.99. The Zuker suboptimal alignment algorithm (which does a forward/backward algorithm to calculate the scores of conditionally optimal alignments that pass though each DP cell, samples a suboptimal DP cell, then traces the optimal alignment that passes through that sampled cell) is not really similar to the Waterman/Eggert algorithm (which recalculates the DP matrix to erase the influence of the optimal alignment, then finds the second best independent alignment, and repeats this process for additional independent suboptimals), except in the very trivial sense that they are both "suboptimal alignment" algorithms. [Thanks to M. Zuker.]

- p 117: There's a minor typo in the second equation on the page. The prime should be on the k instead of the p (p_{k'}). (ak, 31/5/99)
- p 117: There's an error in the equation for P(c_j | k)
for the mixture Dirichlet. The middle ratio is flipped
over, and should instead read:
\prod_a \Gamma (c_ja + \alpha^k_a) ------------------------------------ \Gamma ( \sum_a (c_ja + \alpha^k_a))

(sre, 31/5/99) - p. 122: end of second pp, "mode" should be "model". [thanks to Youyi Fang, 31 Dec 02]
- p 123: at the very bottom, t_xy in the text should be a_xy to agree with notation in the displayed equation (and elsewhere). [thanks to Youyi Fong, Dec 2002]
- p 126: The Gaussian pdf at the bottom of the page has a t^t term in it that should read just plain t, instead. [thanks to Carlos Gerardo Arroyo at SLAC, 7 Dec 2000]
- p 129 bottom: the equation

\frac{\partial \log y}{\partial x} = \frac{1}{K + e^x} = K(1 - y).

should instead read

\frac{\partial \log y}{\partial x} = \frac{K}{K + e^x} = (1 - y).

[thanks to Carlos Gerardo Arroyo at SLAC, 12 Dec 2000] - p 131 third pp: inline equation for Hi(w.) should have negative sign; it's an entropy. [thanks to Youyi Fong, 31 Dec 02]
- p 131 last pp: "another way to view to", delete the second "to". [thanks to Youyi Fong, 31 Dec 02]

- p.156 bottom, \hat{a}_{i,\pi_{i}}} should be changed to \hat{a}_{k,\pi_{i}}} (that is, index on a_i should be a_k) [AK, 3 Jun 2005]

- p.172, question 7.10: should say the bridge length, not twice the bridge length. [thanks to Itai Sharon, 2 Feb 03].
- p.178-179. Last sentence of pp at top of p.179 should read "Therefore, when we reach a situation with a row of 0s on the right, we have to advance the leftmost 0 to 1 to make the next step." Figure 7.11 (ii) needs to be changed accordingly as well. [Thanks to Bernhard Sonderegger, 14 Jun 2004]
- p.189-191, the Studier/Keppler proof of NJ. Equation 7.10 should read -2 d_lk, not -d_lk (note that d_ij contains a d_lk segment). This affects subsequent algebra. However, Graeme believes the proof has other problems. This section needs to be rethought carefully.
- p.196, around the line "Rescaling does not change the diagonal form of the matrix", the operation should be "normalization"; and it is not clear that this leaves the matrix diagonalizable - if that's what this sentence is supposed to mean. GJM wants to revise here. [GJM, 7 Jun 04]

- p. 228, in the first equation on the page, there's a missing
summation over a^8; both the first and second part of the equation
are summations over a^8, not just the first.
The LaTeX should read:
\eqas \sum_{a^8} P(a^1 | a^8, t_1+t_6)P(a^3 | a^8, t_7+t_3)q_{a^8} &=& \sum_{a^8} P(a^1 | a^8, t_1+t_6)P(a^8 | a^3,t_3+t_7)q_{a^3} \\ &=& P(a^1 | a^3, t_1+t_6+t_3+t_7)q_{a^3} \\ \enas

[thanks to Carlos Gerardo Arroyo at SLAC, 13 Dec 2000]

- p.238 last line: delete first string GCGCTG. It is not matched by the FSA in fig 9.2, nor the grammar. [AK, 3 Jun 2005].
- p.239. W_4 \rightarrow g W_6 production should read W_4 \rightarrow g W_5 instead. [Thanks to Marcela Davila, Chalmers Univ., Sweden, 10 Dec 04]
- p.254, Inside algorithm. Loop iteration condition is incorrect. for i = 1 to L-1 should read for i = L-1 down to 1. (Thanks to Piero Fariselli, U. of Bologna, 13 Nov 02)
- p.255, Outside algorithm. Loop iteration condition recalculates
the i=1, j=1 case that was initialized. This either can be special
cased (don't apply iteration eq to i=1, j=1) or to introduce
an aux variable for subsequence length, s = j-i+1:
for s = L-1 to 1 for j = s to L i = j-s+1

(Thanks to Piero Fariselli, U. of Bologna, 13 Nov 02)

(And thanks for a meta-erratum to Steven Van Vaerenbergh, Ghent University, 9 May 03). - p.257, CYK algorithm. Same loop iteration condition error as was made for Inside. for i = 1 to L-1 should read for i = L-1 down to 1. (Thanks to Piero Fariselli, U. of Bologna, 13 Nov 02)
- There's several serious misstatements in the section on
NP problems and intractability. Thanks to Mark Pleszkoch
at IBM for sending several corrections. The entire section
should be revised. Basically we're using "NP" incorrectly
to mean "problems which have no known polynomial time
algorithm that solves them"; this isn't what NP means.
The basic issues are the following:
- NP is a superset of P, not exclusive of P.
- Context sensitive grammar parsing has not been proven to be NP, although there are no known polynomial time algorithms to solve the CSG parsing problem.
- p.248 in "a subclass of NP problems, including both context-free grammar parsing and the famous travelling salesman problem..", "context-free grammar" is a typo for "context-sensitive grammar"... but even that's wrong.

- p 272, simple SCFG for RNA folding: needs one more production S -> \epsilon, to terminate derivations. (Thanks to R. Giegerich, 10/11/99)
- p 271-272. Algorithm on p. 271 does not give the structure in figure 10.9, though the structure in fig 10.9 is indeed a possible optimal structure. Clarify in exercise 10.2: point this out, and ask for modifications to the traceback to give the fig 10.9 structure. (Discrepancy caught by B. Sonderegger, 6/26/02)
- p 273, CYK recursion: for i = 1 to L-1 should read for i = L-1 down to 1. (Thanks to R. Giegerich, 10/11/99)
- Last paragraph on p 273 (or at some point): We need to discuss grammar ambiguity. Obviously any useful RNA structure prediction grammar must be formally ambiguous: there are many possible structures for a given sequence, and we're going to be interested in the maximum likelihood structure. It's important to avoid a second level of ambiguity: we want each structure to have an unambiguous parse tree. If this is not true, we cannot say that the maximum likelihood parse tree (which is what CYK is going to give us) corresponds to the maximum likelihood structure. The grammar in (10.2) is ambiguous in this way... which is another reason why it's not a good example of an RNA folding grammar. (Thanks to R. Giegerich, 10/11/99)
- p 274, footnote: The URL's gone out of date (surprise surprise). A working URL is here. But we should replace it with a real reference. [Thanks to Michael Hoffman, UT Austin, 7 Sept 2003.]
- p 280, fig 10.12: On the left branch of the parse tree the two last L productions before E should be C and G rather than U and C.

- p.304: "...the Poisson, f(n)=e^-p p^n/n,...", should be n! in the denominator. (This is a TeX source typo; "\!" does nothing.) (Thanks to Li Liao at Dupont, 8/29/01.)
- p.312 line 6: "we we" should be "we". (Thanks to Chun Li, U.Mich., 4/20/00).
- p.315 exercise 11.9: "from (x,y) to (u,w)" should be "from (u,w) to (x,y)".(Chun Li, 4/20/00).
- p.315, exercise 11.10 line 2: The expression at the end should be $\cos(2\pi x) \sqrt{\log(1/y^2)}$. (The square root was left out.) (Chun Li, 4/20/00).
- p.317, Ex.11.12, line 1. The inequality at the end should be 1<=\lambda instead of 1>=\lambda. (Chun Li, 4/20/00).

- p.342: Stormo and Haussler generalized DP reference is 1994, not 1996; 2nd ISMB, not 4th. (sre to ak - doublecheck the bibtex crossrefs for ismb, make sure this didn't happen elsewhere) [Thanks to Gary Stormo, 15 Oct 2003.]

- pg. 8. In the last equation on the page, after "the denominator is now an integral rather than a sum", the left hand side P(\theta) should instead be P(D).
- pg. 8. Exercise 1.4, "Why?" should more specifically say "Using Bayes' theorem, give a plausible reason for making this decision." In real life, the decision to take or not take a genetic test would also depend on the risk associated with erroneously not treating the disease in true positives and the risk associated with treating the disease in false positives.

- p. 18, eq. 2.7: Remove the "2" under the sqrt: "\frac{ 2^{2n}}{\sqrt{2 \pi n}}" --> "\frac{ 2^{2n}}{\sqrt{\pi n}}"
- p. 24: H(p||q) should be H(qxq||p) -- rd; and I (sre) think there's incorrect text too; should read something like "where H(qxq || p) is the relative entropy of the expected distribution qxq with respect to the observed distribution p..."

- p. 51. The equation for S(x) isn't specific enough about what log base is used to produce the actual example numbers in the table for \Beta. The numbers in the example table are in log base two (bits). This appears in the footnote, but it's easy to miss.
- p. 71, point (ii): "add" --> "set"
- p. 76, fig. 3.12, y-axis on lower plot: "Bits per nucleotide" --> "Bits"

- In Exercise 4.4, "Figure 4.8" should read "Figure 4.7".

- p. 118. line 1 (prime is on sum instead of the a) "$f_{ja} = c_{ja}/\sum_a' c_{ja'}$" --> "$f_{ja} = c_{ja}/\sum_{a'} c_{ja'}$"
- p. 118, eq. 5.4. The definition of f is incompatible with the exercises, and I suggest changing " = 1" to "- 1 = 0".
- pg. 121. Equation 5.8 should read $\sum_C \sum_{\mbox{samples } s} \sum_a C_a \log e_s(a)$.
- pg. 123. In figure 5.7, arrows for I-> D transitions were omitted.
- pg. 123 (bottom of page). The probability $t_{xy}$ should be $a_{xy}$.

- p. 140, equation 6.5: When we define the SP score (eqn 6.5), we don't assign a cost to $s(-,-)$, though of course this can arise in multiple alignments. I presume it should be zero. Additional note (gjm): actually we do define $s(-,-)$ to be zero, but it is on p. 146, and in fact we repeat what has been said on p 140 two lines down "For simple linear gap costs..". We really ought to say s(-,-)=0 on page 140 and then refer back to this whole passage on page 146.

- p. 160, Zuckerkandl is misspelled; also in refs and in index (pp. 344, 349) -> change in bibtex db.
- p. 162, caption to fig. 7.1, ln. 7: "the alphabetically the" --> "alphabetically the"
- p. 165, Exercise 7.2 should read: "The trees with three and four leaves in Figure 7.3 all have the same unlabelled branching pattern. For both rooted and unrooted trees, how many leaves do there have to be to obtain more than one unlabelled branching pattern? Find a recurrence relation for the number of rooted trees. (Hint: Consider the trees formed by joining two trees at their roots.)"
- pg. 169. In Figure 7.5, the tree on the right is incorrect. It should be (1(4(23))), not ((14)(23)).
- pg. 171. in the neighbour joining algorithm iteration, "d_jk = d_ij - d_jk" should read "d_jk = d_ij - d_ik".
- pg. 176. A typo; "applying the nparsimony algorithm" should read "applying the parsimony algorithm".
- pg. 180. In Fig 7.12 the residue on the small tree should be $x^2^j$, not $x^2_{j-1}$.
- pg. 188. The short paragraph beginning "Note that in this example..." doesn't make sense. The topology, qua unrooted tree, is of course the same if GTT and GT are grouped together. What is true is that different rootings of the tree give different results for Hein's algorithm, which is worrying. Actually, the whole section needs rethinking, because someone has just shown that the use of "dummy edges" in Hein's algorithm is incorrect.

- p. 213, line -2: should read "assuming a flat prior on $\exp(-\alpha t)$" -- I think. (But there may be something nastier wrong with this.) [GJM]
- p. 214: should be a 3 in the denominator of eq 8.23.

- pg. 301, Equation 11.4. The denominator of the last bit should read (\sum_k n_k)!. The factorial was inadvertently dropped.
- p. 302 eqn. 11.5: slight misformatting, parentheses should be bigger for \delta term
- p. 306 beginning of ln. 6: "is the" --> "is then"
- p. 309 line -2: "neigbouring" --> "neighboring"
- p. 312 exercise 11.6: stray blemish on d of $\sigma^2/(Nd^2)$
- p. 319, second line before equations: "Using equations (11.3)" --> "Using equation (11.3)"
- p. 324 line 4, "equatilty" --> "equality"