# On the Definition of Uniform Convergence Property and Representativeness

This post reviews the definition of the uniform convergence (UC) property of a hypothesis class, which is covered in the book
"Shalev-Shwartz, S. and Ben-David, S. (2014), *Understanding Machine Learning: From Theory to Algorithms*, Cambridge University Press".
It also revisits the idea of the representativeness of a training sample and its relation to UC, also discussed in the book.

## Representativeness of a Training Sample

**Definition of an $\epsilon$-representative sample.**
A training set $S$ is called $\epsilon$-representative (w.r.t. domain $Z$, hypothesis class $\mathcal{H}$, loss function $\ell$, and distribution $\mathcal{D}$) if
$$
\forall h\in \mathcal{H},\quad |L_S(h)-L_\mathcal{D}(h)|\le \epsilon.
$$

Based on this definition, one can easily show that the following lemma is true:

### Lemma

Assume that a training set $S$ is $\epsilon/2$-representative (w.r.t. domain $Z$, hypothesis class $\mathcal{H}$, loss function $\ell$, and distribution $\mathcal{D}$). Then, any output of $ERM_\mathcal{H}(S)$, namely, any $h_S \in \underset{h\in \mathcal{H}}{\operatorname{argmin}}\, L_S(h)$, satisfies

$$ L_\mathcal{D}(h_S) \le \min_{h\in\mathcal{H}}L_\mathcal{D}(h) + \epsilon. $$

*Proof.* Let $h^\star \in \underset{h\in\mathcal{H}}{\operatorname{argmin}}\, L_\mathcal{D}(h)$. Since $S$ is $\epsilon/2$-representative and $h_S$ minimizes $L_S$ over $\mathcal{H}$,
$$
L_\mathcal{D}(h_S) \le L_S(h_S) + \frac{\epsilon}{2} \le L_S(h^\star) + \frac{\epsilon}{2} \le L_\mathcal{D}(h^\star) + \frac{\epsilon}{2} + \frac{\epsilon}{2} = \min_{h\in\mathcal{H}}L_\mathcal{D}(h) + \epsilon. \qquad\blacksquare
$$

The above lemma implies that if a sample of size at least $m_\mathcal{H}^{UC}(\epsilon, \delta)$ (for some function $m_\mathcal{H}^{UC}$ of $\epsilon$ and $\delta$) is $\frac{\epsilon}{2}$-representative with probability at least $1-\delta$, then the hypothesis class is agnostic PAC learnable, with the ERM rule as a successful learner. In symbols:

$$ \mathbb{P}\big[\big\lbrace S: |S| \ge m_\mathcal{H}^{UC}(\epsilon, \delta) \wedge S\text{ is }\tfrac{\epsilon}{2}\text{-representative}\big\rbrace\big] \ge 1-\delta \;\Rightarrow\; \text{the ERM rule is a successful agnostic PAC learner}. $$ The reason is that if a training sample is $\frac{\epsilon}{2}$-representative with probability not less than $1 - \delta$, then by the lemma
$$ \mathbb{P}\big[L_\mathcal{D}(h_S) \le \min_{h\in\mathcal{H}}L_\mathcal{D}(h) + \epsilon\big] \ge 1- \delta $$
for every training sample $S$ with $|S|\ge m_\mathcal{H}^{UC}(\epsilon, \delta)$.

The above observation motivates the definition of uniform convergence property.

## Uniform Convergence

**Definition of uniform convergence.**
We say that a hypothesis class $\mathcal{H}$ has the uniform convergence property
(w.r.t. domain $Z$ and loss function $\ell$)
if there exists a function $m_\mathcal{H}^{UC} : (0,1)^2\to\mathbb{N}$ such that for every $\epsilon, \delta \in(0,1)$ and for every probability distribution $\mathcal{D}$ over $Z$, if $S$ is a sample of $m \ge m_\mathcal{H}^{UC}(\epsilon, \delta)$ examples drawn i.i.d. according to $\mathcal{D}$, then, with probability of at least $1-\delta$, $S$ is $\epsilon$-representative.