Based on my handwritten notes, this post records solutions to Exercises 1.1 through 1.13 of PRML Chapter 1.
1.1
This problem is about minimizing the error function $E(\vec{w})$. \(E(\vec{w}) = \frac{1}{2} \sum_{n=1}^N \{y(x_n, \vec{w}) - t_n\}^2\) Here, the polynomial is $y(x, \vec{w}) = \sum_{j=0}^M w_j x^j$. We find the point where the partial derivative with respect to the weight $w_i$ becomes 0. \(\frac{\partial E}{\partial w_i} = \sum_{n=1}^N \{y(x_n, \vec{w}) - t_n\} x_n^i = 0\) \(\sum_{n=1}^N \left( \sum_{j=0}^M w_j x_n^j \right) x_n^i - \sum_{n=1}^N t_n x_n^i = 0\) \(\sum_{j=0}^M \left( \sum_{n=1}^N x_n^{i+j} \right) w_j = \sum_{n=1}^N t_n x_n^i\)
Letting $A_{ij} = \sum_{n=1}^N x_n^{i+j}$ and $T_i = \sum_{n=1}^N t_n x_n^i$, we obtain the following linear equation: \(\sum_{j=0}^M A_{ij}w_j = T_i\)
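As a numerical sanity check, the linear system $\sum_j A_{ij} w_j = T_i$ can be built and solved directly; the data below (a noisy sinusoid, echoing PRML's running example) and the choice $M = 3$ are illustrative assumptions, not from the exercise:

```python
import numpy as np

# Illustrative data: noisy samples of sin(2*pi*x), with M = 3.
rng = np.random.default_rng(0)
M, N = 3, 20
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

# A_ij = sum_n x_n^(i+j) and T_i = sum_n t_n x_n^i, as derived above.
i = np.arange(M + 1)
A = (x[None, None, :] ** (i[:, None, None] + i[None, :, None])).sum(axis=2)
T = (t * x[None, :] ** i[:, None]).sum(axis=1)
w = np.linalg.solve(A, T)

# The same minimizer should come out of an ordinary least-squares
# polynomial fit, since A w = T are exactly its normal equations.
w_lstsq = np.polynomial.polynomial.polyfit(x, t, M)
```

Note that $A = V^\top V$ and $T = V^\top \vec{t}$ for the Vandermonde matrix $V$ with $V_{nj} = x_n^j$, which is why the two solutions agree.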
1.2
Minimizing the error function $\tilde{E}(\vec{w})$ with an added regularization term. \(\tilde{E}(\vec{w}) = \frac{1}{2} \sum_{n=1}^N \{y(x_n, \vec{w}) - t_n\}^2 + \frac{\lambda}{2} \|\vec{w}\|^2\) Similarly, we take the partial derivative with respect to $w_i$. \(\frac{\partial \tilde{E}}{\partial w_i} = \sum_{n=1}^N \{y(x_n, \vec{w}) - t_n\} x_n^i + \lambda w_i = 0\) \(\sum_{j=0}^M \left( \sum_{n=1}^N x_n^{i+j} \right) w_j + \lambda w_i = \sum_{n=1}^N t_n x_n^i\) Using the Kronecker delta $\delta_{ij}$ to express $\lambda w_i = \sum_{j=0}^M \lambda \delta_{ij} w_j$, it can be rearranged as follows: \(\sum_{j=0}^M (A_{ij} + \lambda \delta_{ij})w_j = T_i\)
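The regularized system can be checked the same way: at the solution of $(A + \lambda I)\vec{w} = \vec{T}$, the gradient of $\tilde{E}$ vanishes. The data and the value of $\lambda$ below are illustrative choices:

```python
import numpy as np

# Same illustrative data as in 1.1, plus a small regularization constant.
rng = np.random.default_rng(0)
M, N, lam = 3, 20, 1e-3
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

i = np.arange(M + 1)
A = (x[None, None, :] ** (i[:, None, None] + i[None, :, None])).sum(axis=2)
T = (t * x[None, :] ** i[:, None]).sum(axis=1)

# Solve (A + lambda * I) w = T; the gradient A w + lambda w - T
# of the regularized error should be (numerically) zero at w_reg.
w_reg = np.linalg.solve(A + lam * np.eye(M + 1), T)
```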
1.3
A probability problem about picking a fruit from boxes (r: 3 apples, 4 oranges, 3 limes; b: 1 apple, 1 orange; g: 3 apples, 3 oranges, 4 limes). Given box probabilities $P(r) = 0.2, P(b) = 0.2, P(g) = 0.6$, the total probability of choosing an apple follows from the sum rule: \(P(\text{apple}) = 0.2 \cdot \frac{3}{10} + 0.2 \cdot \frac{1}{2} + 0.6 \cdot \frac{3}{10} = 0.06 + 0.1 + 0.18 = 0.34\) Using Bayes’ theorem, we find the conditional probability that it was box $g$ given that an orange was picked. \(P(g|\text{orange}) = \frac{0.6 \cdot \frac{3}{10}}{0.2 \cdot \frac{4}{10} + 0.2 \cdot \frac{1}{2} + 0.6 \cdot \frac{3}{10}} = \frac{0.18}{0.08 + 0.1 + 0.18} = \frac{0.18}{0.36} = \frac{1}{2}\)
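The sum and product rules here are mechanical enough to script; a minimal sketch, assuming the fruit counts stated in the exercise:

```python
# Box priors and fruit counts from PRML Exercise 1.3.
priors = {"r": 0.2, "b": 0.2, "g": 0.6}
fruit = {"r": {"apple": 3, "orange": 4, "lime": 3},
         "b": {"apple": 1, "orange": 1, "lime": 0},
         "g": {"apple": 3, "orange": 3, "lime": 4}}

def p_fruit(name):
    # Sum rule: P(fruit) = sum_box P(box) * P(fruit | box)
    return sum(priors[b] * fruit[b][name] / sum(fruit[b].values())
               for b in priors)

def p_box_given(box, name):
    # Bayes: P(box | fruit) = P(box) * P(fruit | box) / P(fruit)
    return (priors[box] * fruit[box][name] / sum(fruit[box].values())
            / p_fruit(name))

apple = p_fruit("apple")            # ≈ 0.34
g_given_orange = p_box_given("g", "orange")  # ≈ 0.5
```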
1.4
Finding the position of the maximum value of a probability density function under a change of variables $x = g(y)$. \(p_y(y) = p_x(x) \left| \frac{dx}{dy} \right| = p_x(g(y)) |g'(y)|\) To find the maximum, we differentiate and find the point $\hat{y}$ where the derivative vanishes. \(\frac{d p_y(y)}{dy} \bigg|_{\hat{y}} = 0\) Writing $|g'(y)| = s\,g'(y)$ with $s \in \{-1, +1\}$, the product rule yields: \(s \left\{ p_x'(g(\hat{y}))\, g'(\hat{y})^2 + p_x(g(\hat{y}))\, g''(\hat{y}) \right\} = 0\) For a linear transformation ($g'' = 0$) the second term vanishes and $\hat{x} = g(\hat{y})$; for a nonlinear $g$, the mode does not in general transform with the variable.
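This conclusion is easy to see numerically. As an illustration (my own choice, not from the book), take an unnormalized Gaussian $p_x$ with mean 3 and the nonlinear map $g(y) = e^y$: the mode of $p_y$ lands away from $g^{-1}(\hat{x}) = \ln 3$.

```python
import numpy as np

# Unnormalized Gaussian p_x with mode at x = 3 (normalization does
# not affect the location of the maximum).
mu, sigma = 3.0, 1.0
px = lambda x: np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Change of variables x = g(y) = exp(y): p_y(y) = p_x(g(y)) |g'(y)|.
y = np.linspace(-2, 3, 200001)
py = px(np.exp(y)) * np.exp(y)
y_hat = y[np.argmax(py)]   # mode of p_y, found on a fine grid

# Compare with log(3) ~ 1.0986, the naive image of the mode of p_x.
naive = np.log(mu)
```

Setting the derivative to zero analytically gives $e^{2y} - 3e^y - 1 = 0$, i.e. $\hat{y} = \ln\frac{3 + \sqrt{13}}{2} \approx 1.195$, which the grid search reproduces.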
1.5
Proof of the basic property of variance. \(Var[f] = \mathbb{E}[(f(x) - \mathbb{E}[f(x)])^2]\) \(= \mathbb{E}[f^2 - 2f\mathbb{E}[f] + \mathbb{E}[f]^2]\) Rearranging by the linearity of expectation, \(= \mathbb{E}[f^2] - 2(\mathbb{E}[f])^2 + \mathbb{E}[f]^2 = \mathbb{E}[f^2] - \mathbb{E}[f]^2\)
1.6
Showing that the covariance is 0 when two variables $x$ and $y$ are independent ($x \perp y$). \(Cov(x,y) = \mathbb{E}[(x - \mathbb{E}[x])(y - \mathbb{E}[y])]\) \(= \mathbb{E}[xy - x\mathbb{E}[y] - y\mathbb{E}[x] + \mathbb{E}[x]\mathbb{E}[y]]\) \(= \mathbb{E}[xy] - 2\mathbb{E}[x]\mathbb{E}[y] + \mathbb{E}[x]\mathbb{E}[y]\) Since they are independent, $\mathbb{E}[xy] = \mathbb{E}[x]\mathbb{E}[y]$ holds. \(= \mathbb{E}[x]\mathbb{E}[y] - \mathbb{E}[x]\mathbb{E}[y] = 0\)
1.7
Solving the Gaussian integral using the polar coordinate system. Substituting $x = r\cos\theta, y = r\sin\theta$, the Jacobian determinant is $|J| = r$. \(I^2 = \int_0^{2\pi} \int_0^\infty \exp\left(-\frac{1}{2\sigma^2}r^2\right) r\, dr\, d\theta\) Substituting $u = \frac{r^2}{2\sigma^2}$ gives $du = \frac{r}{\sigma^2} dr$, so the inner integral evaluates to $\sigma^2$. Hence $I^2 = 2\pi\sigma^2$, and we get $I = \sigma\sqrt{2\pi}$.
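A quick numerical check of $I = \sigma\sqrt{2\pi}$, using a plain trapezoidal sum on a wide grid (the value of $\sigma$ is arbitrary):

```python
import numpy as np

# Integrate exp(-x^2 / (2 sigma^2)) over a grid wide enough that the
# truncated tails are negligible (here about +/- 23 sigma).
sigma = 1.7
x = np.linspace(-40, 40, 400001)
f = np.exp(-x ** 2 / (2 * sigma ** 2))

# Trapezoidal rule by hand, to avoid depending on a specific numpy API.
dx = x[1] - x[0]
I = dx * (f[:-1] + f[1:]).sum() / 2
```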
1.8
Proof of the mean and variance of the Gaussian distribution. \(\mathcal{N}(x|\mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp \left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\}\) The mean $\mathbb{E}[x]$ can be shown to be $\mu$ through the substitution $t = x - \mu$. By differentiating the equation that the total integral is 1 with respect to $\sigma^2$, we derive $\mathbb{E}[x^2] = \mu^2 + \sigma^2$. Consequently, $Var(x) = \mathbb{E}[x^2] - \mathbb{E}[x]^2 = \mu^2 + \sigma^2 - \mu^2 = \sigma^2$.
1.9
The differentiation process to find the mode of a multivariate Gaussian distribution. We take the partial derivative of the exponent with respect to $x_k$ (summation over the repeated indices $i, j$ implied) and find where it equals 0. \(\frac{\partial}{\partial x_k} \left( -\frac{1}{2} (x_i - \mu_i)(\Sigma^{-1})_{ij}(x_j - \mu_j) \right) = 0\) Rearranging using the product rule and the symmetry of $\Sigma^{-1}$, \((\Sigma^{-1})_{ki}(x_i - \mu_i) = 0\) Since $\Sigma^{-1}$ is invertible, this forces $x_k = \mu_k$ for every $k$, so the maximum is at $\mathbf{x} = \boldsymbol{\mu}$.
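As a small check that the exponent really peaks at $\boldsymbol{\mu}$, evaluate it at $\boldsymbol{\mu}$ and at random nearby points; the mean and covariance below are arbitrary illustrative choices:

```python
import numpy as np

# Arbitrary 2-D example: the quadratic form -(1/2) d^T Sigma^{-1} d
# is 0 at d = 0 and strictly negative elsewhere when Sigma is
# positive definite.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
P = np.linalg.inv(Sigma)

def exponent(x):
    d = x - mu
    return -0.5 * d @ P @ d

at_mu = exponent(mu)
# Largest exponent seen over 100 random points around mu.
nearby = max(exponent(mu + v) for v in rng.standard_normal((100, 2)))
```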
1.10
Proof of the addition rule for expectation and variance. Linearity of expectation: \(\mathbb{E}[x+z] = \int \int (x+z)p(x,z)dxdz = \int x p(x)dx + \int z p(z)dz = \mathbb{E}[x] + \mathbb{E}[z]\) Variance for independent variables: \(Var[x+z] = \int \int ((x-\mu_x) + (z-\mu_z))^2 p(x,z) dxdz\) The cross-term disappears due to the independence condition ($Cov(x,z)=0$), resulting in $Var(x) + Var(z)$.
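The additivity of variance under independence can be confirmed by simulation; the two distributions below are arbitrary illustrative choices:

```python
import numpy as np

# Independent samples from two unrelated distributions.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(0.0, 1.5, size=n)       # Var[x] = 2.25
z = rng.exponential(2.0, size=n)       # Var[z] = 4.0

# For independent x and z the cross-term vanishes, so the sample
# variance of x + z should match the sum of the sample variances.
lhs = (x + z).var()
rhs = x.var() + z.var()
```

The residual `lhs - rhs` equals twice the sample covariance of `x` and `z`, which is near zero for independent draws.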
1.11
Maximum likelihood estimation (MLE) for a 1-dimensional Gaussian distribution. Differentiate the log-likelihood $\ln p$ with respect to $\mu$ and $\sigma^2$, respectively. \(\frac{\partial \ln p}{\partial \mu} = \sum_{n=1}^N \frac{x_n - \mu}{\sigma^2} = 0 \implies \mu_{ML} = \frac{1}{N}\sum_{n=1}^N x_n\) \(\frac{\partial \ln p}{\partial \sigma^2} = 0 \implies \sigma^2_{ML} = \frac{1}{N}\sum_{n=1}^N (x_n - \mu_{ML})^2\)
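These closed forms are just the sample mean and the (biased) sample variance, which numpy computes directly; the synthetic data below is an illustrative assumption:

```python
import numpy as np

# Synthetic data from a known Gaussian.
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=1000)

# The ML estimates written out explicitly...
mu_ml = x.sum() / len(x)
var_ml = ((x - mu_ml) ** 2).sum() / len(x)
```

`x.var()` with its default `ddof=0` divides by $N$, so it is exactly $\sigma^2_{ML}$ rather than the unbiased ($N-1$) estimator.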
1.12
Proof of the bias of the MLE variance. $\mathbb{E}[x_n x_m]$ is $\mu^2 + \sigma^2$ when $n=m$, and $\mu^2$ otherwise. \(\mathbb{E}[\mu_{ML}] = \mu\) \(\mathbb{E}[\sigma^2_{ML}] = \mathbb{E} \left[ \frac{1}{N}\sum_{n=1}^N (x_n - \mu_{ML})^2 \right]\) By expanding this and applying the expectation term by term, we ultimately obtain the following result, confirming it is a biased estimator. \(\mathbb{E}[\sigma^2_{ML}] = \frac{N-1}{N}\sigma^2\)
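The factor $\frac{N-1}{N}$ shows up clearly in simulation: averaging $\sigma^2_{ML}$ over many small datasets undershoots the true variance. The parameters below are illustrative choices.

```python
import numpy as np

# Many datasets of size N = 5 from N(0, 4); for each, compute the
# ML variance estimate (ddof=0 divides by N).
rng = np.random.default_rng(0)
mu, sigma, N, trials = 0.0, 2.0, 5, 200_000
x = rng.normal(mu, sigma, size=(trials, N))
var_ml = x.var(axis=1)

# Monte Carlo estimate of E[sigma^2_ML]; theory predicts
# (N-1)/N * sigma^2 = 0.8 * 4 = 3.2, not 4.
mean_est = var_ml.mean()
```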
1.13
The expectation of the variance estimator when the population mean is known. \(\mathbb{E} \left[ \frac{1}{N}\sum_{n=1}^N (x_n - \mu)^2 \right] = \frac{1}{N}\sum_{n=1}^N (\mathbb{E}[x_n^2] - 2\mu \mathbb{E}[x_n] + \mu^2)\) \(= \mathbb{E}[x_n^2] - 2\mu^2 + \mu^2 = (\mu^2 + \sigma^2) - \mu^2 = \sigma^2\) Hence, when the true mean $\mu$ is known, this estimator is unbiased.
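Repeating the simulation from 1.12 but centering on the true $\mu$ rather than $\mu_{ML}$ shows the bias disappear (same illustrative parameters):

```python
import numpy as np

# Many datasets of size N = 5 from N(1, 4); center each on the TRUE
# mean mu instead of the sample mean.
rng = np.random.default_rng(0)
mu, sigma, N, trials = 1.0, 2.0, 5, 200_000
x = rng.normal(mu, sigma, size=(trials, N))
est = ((x - mu) ** 2).mean(axis=1)

# Monte Carlo estimate of the expectation; theory says sigma^2 = 4.
mean_est = est.mean()
```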