This page lists errors and typos appearing in the second printing of the first edition of the book Foundations of Machine Learning, along with their corrections. We are grateful to all readers who kindly brought them to our attention.
• Page 19, paragraph following equation (2.10): "the bound guarantees 90% accuracy" (not 99% accuracy).

• Page 44, Example 3.3: "figure 3.2(a)" should read "figure 3.3(a)" and "figure 3.2(b)" should read "figure 3.3(b)".

• Page 80, proof of Theorem 4.4: there is no need to resort to $\Phi_\rho - 1$, the proof holds directly with $\Phi_\rho$.

• Page 95, first line of the proof: $\Phi(x) \colon {\cal X} \to \Rset$ should read $\Phi(x) \colon {\cal X} \to \Rset^{\cal X}$.

• Page 121, definition 6.1: "$\epsilon > 0$ and" should be removed.

• Page 168, pseudocode of Kernel Perceptron algorithm: $\alpha_{t + 1}$ should be replaced by $\alpha_t$.

• Page 170, in the inequality for $\Phi_{t + 1} - \Phi_{t}$, the following two intermediate lines should be inserted just before the last inequality for more explanation:
& = \log \E_{i \sim \w_t}\big[ \exp(\eta y_t x_{t, i} - \eta y_t \w_t \cdot x_{t} + \eta y_t \w_t \cdot x_{t}) \big] - \eta \rho_\infty\\
& \leq \log \big[ \exp(\eta^2 (2 r_\infty)^2/8) \big] + \underbrace{\eta y_t (\w_t \cdot x_{t})}_{\leq 0} - \eta \rho_\infty\\[-.35cm]

• Page 181, exercise 7.10, second paragraph: the definition of $m_i$ in the first sentence of that paragraph is given in the special case of the zero-one loss. For the general case, the sentence should be replaced by: "Let $m_i$ be the cumulative loss of hypothesis $h_i$ on the points $(x_i, \ldots, x_T)$, that is $m_i = \sum_{t = i}^T L(h_i(x_t), y_t)$".

• Page 181, exercise 7.10, in the text following the inline equation: $i^* = \operatorname{argmin}_i m_i / (T - i)$ should be replaced by $i^* = \operatorname{argmin}_i m_i / (T - i + 1)$.

• Page 189, third paragraph: $W=(w_1^\top, \ldots, w_k^\top)^\top$ should read $W=(w_1, \ldots, w_k)^\top$.

• Page 190, line 5: the empirical Rademacher complexity symbol should be replaced by that of Rademacher complexity.

• Page 191, equation 8.12: the factor $4k^2$ should be $2k^2$.

• Page 191, optimization problem: the constraints $\xi_i \geq 0$ should be added.

• Page 192, section 8.3.2 first paragraph: "exercise 9.5" should read "exercise 8.4".

• Page 207, exercise 8.4: "family of base hypothesis" should read "family of base hypotheses".

• Page 217, line 3 from bottom: the expression should be replaced by $\sqrt{1 - \frac{(\e^+_t - \e^-_t)^2}{(1 - \e^0_t)}}$, which holds by the concavity of the square-root function.

• Page 283, inline after (12.2): $U^\top X X^\top U$ should read $\operatorname{Tr}[U^\top X X^\top U]$.

• Page 370, line 5: $\phi'(t) \leq \frac{(b-a)^2}{4}$ should read $\phi''(t) \leq\frac{(b-a)^2}{4}$.

• Page 371, last line of lemma D.2: $\E[e^{sV} | Z ]$ should read $\E[e^{tV} | Z ]$.

• Page 381, first line of section D.1: extra space before the comma should be removed.

• Page 364, last line: it should read $(1 - 2t)^{-1/2}$ and not $(1 - 2t)^{1/2}$.

• Page 365, first displayed equation: it should read $(1 - 2t)^{-k/2}$ and not $(1 - 2t)^{k/2}$.

• Page 370, lines 3 and 4 of proof of Theorem D.1: the factor $\exp(-t \epsilon)$ should be outside the product sign.
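
The sign corrections for pages 364 and 365 can be sanity-checked numerically. The sketch below assumes, since the errata entries do not say so explicitly, that the two expressions are the moment-generating functions of $Z^2$ for a standard normal $Z$ and of a chi-squared variable with $k$ degrees of freedom. The function names `mgf_squared_normal` and `closed_form` are hypothetical, introduced only for this illustration.

```python
import math


def mgf_squared_normal(t, half_width=12.0, steps=20_000):
    """Approximate E[exp(t * Z**2)] for Z ~ N(0, 1) with the midpoint rule.

    Assumption: this is the quantity the corrected expression describes.
    The expectation is finite only for t < 1/2.
    """
    if t >= 0.5:
        raise ValueError("the expectation is finite only for t < 1/2")
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps):
        z = -half_width + (j + 0.5) * h
        # exp(t z^2) times the unnormalized Gaussian density exp(-z^2 / 2)
        total += math.exp((t - 0.5) * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def closed_form(t, k=1):
    """Corrected expression (1 - 2t)^(-k/2)."""
    return (1.0 - 2.0 * t) ** (-k / 2.0)


if __name__ == "__main__":
    t, k = 0.25, 3
    single = mgf_squared_normal(t)
    # A chi-squared variable with k degrees of freedom is a sum of k
    # independent squared standard normals, so its MGF is the k-th power.
    print(single, closed_form(t))
    print(single ** k, closed_form(t, k))
```

Under these assumptions the numerical integral matches the negative exponent and not the positive one, which is consistent with the two corrections listed above.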