Re-sampling Pt.1: Cross-Validation

Today and over the next three posts we will talk about re-sampling methods, which are a family of approaches to synthesizing multiple data samples from a single original data set.

There are several reasons why you might want to do this and several ways to go about it. Re-sampling methods differ in how they generate new samples and, consequently, in their computational complexity and bias-variance trade-off. Because of this, each method is better suited to some purposes than to others.

Re-sampling Pt.2: Jackknife and Bootstrap

Suppose we have a sample of $n$ data points $x_{1}, \ldots, x_{n}$ and we want to estimate some parameter $\theta$. We come up with $\hat{\theta}$, an estimator of $\theta$. What do we know about $\hat{\theta}$? How good an estimator of $\theta$ is it? Is it biased? How efficient is it?
We could answer these questions if we knew the distribution of the population from which $x_{1}, \ldots, x_{n}$ came. More often than not, however, we don’t know anything about the distribution of the underlying population; all we have is a sample, and we want to use it to figure out things about the population.
This is where re-sampling methods such as the jackknife and the bootstrap come into play.
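
To make the idea concrete, here is a minimal sketch of how both methods might estimate the bias and standard error of an estimator from a single sample. The function names `jackknife` and `bootstrap`, the synthetic exponential sample, and the choice of the plug-in variance as $\hat{\theta}$ are all assumptions made for illustration; they are not taken from any particular library.

```python
import numpy as np

def jackknife(x, estimator):
    """Leave-one-out jackknife estimates of bias and standard error."""
    x = np.asarray(x)
    n = len(x)
    theta_hat = estimator(x)
    # Recompute the estimator n times, leaving out one observation each time.
    loo = np.array([estimator(np.delete(x, i)) for i in range(n)])
    bias = (n - 1) * (loo.mean() - theta_hat)
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return bias, se

def bootstrap(x, estimator, n_boot=2000, seed=0):
    """Nonparametric bootstrap estimates of bias and standard error."""
    x = np.asarray(x)
    n = len(x)
    rng = np.random.default_rng(seed)
    theta_hat = estimator(x)
    # Each row holds n indices drawn with replacement from 0..n-1.
    idx = rng.integers(0, n, size=(n_boot, n))
    reps = np.array([estimator(x[row]) for row in idx])
    bias = reps.mean() - theta_hat
    se = reps.std(ddof=1)
    return bias, se

# Synthetic sample (assumption): in practice x would be the observed data.
x = np.random.default_rng(42).exponential(scale=2.0, size=50)

# np.var defaults to ddof=0, i.e. the plug-in variance, which is biased downward.
jack_bias, jack_se = jackknife(x, np.var)
boot_bias, boot_se = bootstrap(x, np.var)
print(f"jackknife: bias={jack_bias:.4f}, se={jack_se:.4f}")
print(f"bootstrap: bias={boot_bias:.4f}, se={boot_se:.4f}")
```

Neither function needs to know the population distribution; both work only with the sample and a callable that computes $\hat{\theta}$. The jackknife recomputes the estimator exactly $n$ times, while the bootstrap cost is set by `n_boot`, which is one place where the methods differ in computational complexity.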

$\text{MSE}(\hat{\theta}) = \left[\text{Bias}(\hat{\theta})\right]^{2} + \text{Var}(\hat{\theta})$
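
The decomposition above can be checked numerically. The sketch below assumes, purely for illustration, that the population is known (normal with variance 4), so that many independent samples can be drawn and the plug-in variance computed on each. This is exactly the luxury we do not have in practice. The helper `mse_decomposition` is a hypothetical name.

```python
import numpy as np

def mse_decomposition(estimates, theta):
    """Empirical MSE, bias and variance of a set of estimates of theta."""
    estimates = np.asarray(estimates, dtype=float)
    mse = np.mean((estimates - theta) ** 2)
    bias = estimates.mean() - theta
    var = estimates.var()  # ddof=0, so the identity holds exactly
    return mse, bias, var

rng = np.random.default_rng(0)
theta = 4.0           # true variance of the assumed N(0, 2^2) population
n, n_sims = 20, 10_000

# One row per simulated sample of size n.
samples = rng.normal(loc=0.0, scale=2.0, size=(n_sims, n))
estimates = samples.var(axis=1)  # plug-in variance (ddof=0) of each sample

mse, bias, var = mse_decomposition(estimates, theta)
print(f"MSE          = {mse:.4f}")
print(f"Bias^2 + Var = {bias**2 + var:.4f}")
print(f"Bias         = {bias:.4f} (theory: {-theta / n:.4f})")
```

The two printed quantities agree up to floating-point error, because the decomposition holds for empirical moments as well as for expectations. The bias comes out negative, as expected for the plug-in variance.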