- Page 34, Definition 3.1: it should be noted that "Z" is an arbitrary
input space.
- Page 36, first inline equation: all instances of "$g \in H$"
(which the supremum is taken over) should be "$g \in G$".
- Page 43, proof of Theorem 3.4: The definition of $I_2$ should
have the condition $\beta_i \leq 0$ (instead of $\beta_i < 0$).
- Page 44, Example 3.3: "figure 3.2(a)" should read "figure 3.3(a)" and "figure 3.2(b)" should read "figure 3.3(b)".
- Page 46, proof of Theorem 3.5, second paragraph: "$H$ includes by
restriction to $S'$" should read "$H$ induces by restriction to
$S'$".
- Page 49, proof of Theorem 3.6: The line above eq. (3.34) should
read "with very high probability $(1 - 8 \epsilon)$" (instead of $(1 -
\epsilon)$).
- Page 50, second to last paragraph: "...PAC-learning in the
non-realizable case is not possible..." should read "...PAC-learning in
the realizable case is not possible...".
- Page 57, Exercise 3.12: The hint should read "show that $\{2^{-i}
: i < m\}$ can be fully shattered for any $m \in \mathbb{N}$".
- Page 66, first paragraph: "$\nabla_w (F)$" should instead be
written "$\nabla F(w)$" and "its Hessian the" should read "its Hessian
is the".
- Page 78, Lemma 4.2: The word "function" should appear at the end
of the first sentence.
- Page 79, proof of Lemma 4.2, third inline equation: in the first
equality, parentheses are missing around the two terms that are
preceded by the factor 1/2.
- Page 80, proof of Theorem 4.4: there is no need to resort to
$\Phi_\rho - 1$, the proof holds directly with $\Phi_\rho$.
- Page 94, second inline equation: the "$\sigma^n$" in
the denominator of the expression should instead be "$\sigma^{2n}$".
- Page 95, first line of the proof: $\Phi(x) \colon {\cal X} \to \Rset$
should read $\Phi(x) \colon {\cal X} \to \Rset^{\cal X}$.
- Page 95, proof of Theorem 5.2, following third inline equation:
$g$ should be defined in terms of $x_j'$ (instead of $x_j$).
- Page 99, second to last line: "$(x_1, x_1', x_2, x_2') \mapsto
K'(y_1, y_2)$" should read "$(x_1, x_1', x_2, x_2') \mapsto K'(x_1',
x_2')$".
- Page 100, last two sentences in proof of Theorem 5.3: "$K_n$"
should instead be "$K^n$" (in two places).
- Page 102, last line of Theorem 5.5: "$w \cdot \Phi(x)$" should be
"$\langle w, \Phi(x) \rangle$" (for consistency of notation).
- Page 121, definition 6.1: "$\epsilon > 0$ and" should be removed.
- Page 121, definition 6.1: "$poly(1/\epsilon, 1/\delta, n, size(c))$"
should instead read "$poly(1/\delta, n, size(c))$".
- Page 121, definition 6.1: it should be added that "$O(n)$ is
a bound on the cost of representing $x \in \cal X$".
- Page 125, last sentence of proof of Theorem 6.1: "follows from
the identity" should read "follows from the inequality".
- Page 135, Theorem 6.3: Instead of "$a_t > 0$" it should read
"$\alpha_t > 0$".
- Page 167, pseudocode of Dual Perceptron algorithm: $\alpha_{t +
1}$ should be replaced by $\alpha_t$.
- Page 168, pseudocode of Kernel Perceptron algorithm: $\alpha_{t +
1}$ should be replaced by $\alpha_t$.
- Page 170, in the inequality for $\Phi_{t + 1} - \Phi_{t}$, the
following two intermediate lines should be inserted just before the
last inequality for more explanation:

& = \log \E_{i \sim \w_t}\big[ \exp(\eta y_t x_{t, i} - \eta y_t \w_t \cdot x_{t} + \eta y_t \w_t \cdot x_{t}) \big] - \eta \rho_\infty\\

& \leq \log \big[ \exp(\eta^2 (2 r_\infty)^2/8) \big] + \underbrace{\eta y_t (\w_t \cdot x_{t})}_{\leq 0} - \eta \rho_\infty\\[-.35cm]

- Page 181, exercise 7.10, second paragraph: the definition of
$m_i$ in the first sentence of that paragraph is given in the special
case of the zero-one loss. For the general case, the sentence should
be replaced by: "Let $m_i$ be the cumulative loss of hypothesis $h_i$
on the points $(x_i, \ldots, x_T)$, that is $m_i = \sum_{t = i}^T
L(h_i(x_t), y_t)$".
- Page 181, exercise 7.10, in the text following the inline equation: $i^* = argmin_i m_i / (T - i)$ should be replaced by $i^* = argmin_i m_i / (T - i + 1)$.
- Page 189, third paragraph: $W=(w_1^\top, \ldots, w_k^\top)^\top$ should read $W=(w_1, \ldots, w_k)^\top$.
- Page 190, line 5: the empirical Rademacher complexity symbol should be replaced by that of Rademacher complexity.
- Page 191, equation 8.12: the factor $4k^2$ should be $2k^2$ instead.
- Page 191, optimization problem: the constraints $\xi_i \geq 0$ should be added.
- Page 192, section 8.3.2 first paragraph: "exercise 9.5" should read "exercise 8.4".
- Page 207, exercise 8.4: "family of base hypothesis" should read "family of base hypotheses".
- Page 283, inline after (12.2): $U^\top X X^\top U$ should read $Tr[U^\top X X^\top U]$.
- Page 354, paragraph preceding Definition B.7: all "$x$" should be
bolded.
- Page 357, equation (B.13): "$g(\bar x_i)$" should instead be
"$g_i(\bar x)$".
- Page 357, proof of Theorem B.8: "$\bar x$" should be bolded
throughout.
- Page 357, proof of Theorem B.8, inline equation: the second and
fourth inequalities should be equalities. Also, the final inequality
holds due to the "(third condition and $g(x) \leq 0$)".
- Page 369, first line of section D.1: extra space before the comma should be removed.
- Page 370, line 5: $\phi'(t) \leq \frac{(b-a)^2}{4}$ should read
$\phi''(t) \leq\frac{(b-a)^2}{4}$.
- Page 371, last line of lemma D.2: $\E[e^{sV} | Z ]$ should read
$\E[e^{tV} | Z ]$.
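
The Page 370 item above can be sanity-checked numerically. The sketch below assumes that $\phi$ there is the log-moment generating function appearing in the standard proof of Hoeffding's lemma, specialized to the zero-mean two-point distribution on $\{a, b\}$ with $a < 0 < b$; the function names `phi` and `phi_second_derivative` are illustrative and do not come from the book. Under that assumption, $\phi''(t)$ is the variance of the exponentially tilted two-point distribution, which is at most $\frac{(b-a)^2}{4}$, and together with $\phi(0) = \phi'(0) = 0$ this gives $\phi(t) \leq \frac{t^2 (b-a)^2}{8}$.

```python
import math


def phi(a, b, t):
    """Log-MGF of the zero-mean two-point distribution on {a, b}.

    Assumes a < 0 < b; the weights are chosen so that E[X] = 0.
    """
    p_a = b / (b - a)    # P[X = a]
    p_b = -a / (b - a)   # P[X = b]
    return math.log(p_a * math.exp(t * a) + p_b * math.exp(t * b))


def phi_second_derivative(a, b, t):
    """phi''(t), computed as the variance of the exponentially tilted law."""
    p_a = b / (b - a)
    p_b = -a / (b - a)
    # Subtract the larger exponent for numerical stability.
    m = max(t * a, t * b)
    w_a = p_a * math.exp(t * a - m)
    w_b = p_b * math.exp(t * b - m)
    q_b = w_b / (w_a + w_b)  # tilted probability of the point b
    # Variance of a two-point variable on {a, b} with P[b] = q_b.
    return q_b * (1.0 - q_b) * (b - a) ** 2


if __name__ == "__main__":
    a, b = -1.0, 3.0
    bound = (b - a) ** 2 / 4
    grid = [k / 10 for k in range(-50, 51)]
    # Check the corrected second-derivative bound on a grid of t values.
    print(all(phi_second_derivative(a, b, t) <= bound + 1e-12 for t in grid))
    # Check the resulting bound phi(t) <= t^2 (b - a)^2 / 8.
    print(all(phi(a, b, t) <= t * t * (b - a) ** 2 / 8 + 1e-12 for t in grid))
```

The bound with $\phi'$ in place of $\phi''$ would not support the Taylor argument, since the second-order remainder is controlled by the second derivative.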