Noise-tolerant Learning Near the Information-theoretic Bound
Abstract. We investigate the sample size necessary for PAC
learning in the presence of malicious noise, a type of adversarial noise
model introduced by Kearns and Li. We prove the first nontrivial sample
complexity lower bound in this model by showing that order of \varepsilon
/ (\Delta)2 + d/ \Delta examples are necessary for PAC
learning any target class of {0,1}-valued functions of VC dimension d,
where \varepsilon the desired accuracy and \eta = \varepsilon / (1+\varepsilon)
-\Delta is the malicious noise rate (it is well known that any nontrivial
target class cannot be PAC learned with accuracy \varepsilon and malicious
noise \eta >= \varepsilon / (1+\varepsilon), this irrespective to sample
complexity.)
This result cannot be significantly improved in general. In fact, we
show a learning algorithm for the same noise model that, for each d,
learns the class of all subsets of d elements using a number of
examples of the same order as that proven necessary by our lower bound
(disregarding logarithmic factors). In contrast, we show that to learn
any target class of VC-dimension d in the presence of a high malicious
noise rate \eta (i.e. \Delta = o(\varepsilon), where \Delta = \varepsilon
/ (1+\varepsilon) - \eta), the popular strategy choosing any hypothesis
that minimizes disagreements on the sample needs at least d\varepsilon
/ (\Delta)2 examples. This implies that, for high noise rate
and for any target class of VC-dimension large enough, there are distributions
over the target domain where the minimum disagreement strategy is outperformed
by our algorithm.