Randomized Hypotheses and Minimum Disagreement Hypotheses for Learning with Noise
Abstract. In this paper we prove various results about PAC
learning in the presence of malicious and random classification noise.
Our main theme is the use of randomized hypotheses for learning with small
sample sizes and high malicious noise rates. We show an algorithm that
PAC learns any target class of VC-dimension $d$ using randomized hypotheses
and order of $d/\varepsilon$ training examples (up to logarithmic factors),
while tolerating malicious noise rates even slightly larger than the
information-theoretic bound $\varepsilon/(1+\varepsilon)$ for deterministic
hypotheses. Combined with previous results, this implies that the known lower
bound of order $d/\Delta + \varepsilon/\Delta^2$ on the sample size, where
$\eta = \varepsilon/(1+\varepsilon) - \Delta$ is the malicious noise rate,
applies only when deterministic hypotheses are used. We then show that the
information-theoretic upper bound on the noise rate for deterministic
hypotheses can be replaced by $2\varepsilon/(1+2\varepsilon)$ if randomized
hypotheses are used. Investigating the use of randomized hypotheses further,
we give a strategy for learning the powerset of $d$ elements using an optimal
sample size of order $d\varepsilon/\Delta^2$ (up to logarithmic factors)
while tolerating a noise rate of $\eta = 2\varepsilon/(1+2\varepsilon) - \Delta$.
We complement this result by proving that this sample size is also necessary
for any class $C$ of VC-dimension $d$.
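In shorthand (using $\tilde{O}$, $\Omega$, and $\tilde{\Theta}$ to hide
logarithmic factors; the display below merely restates the bounds above,
and the subscripted sample sizes are our notation), the malicious-noise
picture is:
\[
m_{\mathrm{rand}} = \tilde{O}\!\left(\frac{d}{\varepsilon}\right)
\text{ for } \eta \text{ slightly above } \frac{\varepsilon}{1+\varepsilon},
\qquad
m_{\mathrm{det}} = \Omega\!\left(\frac{d}{\Delta} + \frac{\varepsilon}{\Delta^2}\right)
\text{ for } \eta = \frac{\varepsilon}{1+\varepsilon} - \Delta,
\qquad
m_{\mathrm{powerset}} = \tilde{\Theta}\!\left(\frac{d\varepsilon}{\Delta^2}\right)
\text{ for } \eta = \frac{2\varepsilon}{1+2\varepsilon} - \Delta.
\]
Note that for small $\varepsilon$ randomization nearly doubles the tolerable
noise rate: for $\varepsilon = 0.1$, the bound rises from
$\varepsilon/(1+\varepsilon) \approx 0.091$ to
$2\varepsilon/(1+2\varepsilon) \approx 0.167$.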
We then discuss the performance of the minimum disagreement strategy
under both malicious and random classification noise models. For malicious
noise we show an algorithm that, using deterministic hypotheses, learns
unions of $d$ intervals on the continuous domain $[0,1)$ using a sample
size significantly smaller than that needed by the minimum disagreement
strategy. For classification noise we show, generalizing a result by Laird,
that order of $d/(\varepsilon\Delta^2)$ training examples suffice (up to
logarithmic factors) to learn, by minimizing disagreements, any target class
of VC-dimension $d$ while tolerating a random classification noise rate of
$\eta = 1/2 - \Delta$. Using a lower bound by Simon, we also prove
that this sample size bound cannot be significantly improved.
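In the same shorthand, the classification-noise result restated in display
form reads
\[
m = \tilde{O}\!\left(\frac{d}{\varepsilon\,\Delta^2}\right)
\text{ for } \eta = \frac{1}{2} - \Delta,
\]
which, by the lower bound just mentioned, is tight up to logarithmic factors.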