From Noise-Free to Noise-Tolerant and from On-line to Batch Learning
Abstract. A simple method is presented which, loosely speaking,
virtually removes noise or misfit from data, and thereby converts a
"noise-free" algorithm A, which on-line learns linear functions
from data without noise or misfit, into a "noise-tolerant" algorithm Ant
which learns linear functions from data containing noise or misfit. Given
some technical conditions, this conversion preserves optimally. For instance,
the optimal noise-free algorithm B of Bernstein from [3] is converted
into an optimal noise-tolerant algorithm Bnt. The conversion
also works properly for all function classes which are closed under addition
and contain linear functions as a subclass.
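For orientation, the following Python sketch shows the on-line protocol for linear functions assumed throughout: in each trial the learner predicts, is charged a loss (absolute loss in this sketch), and may then update its hypothesis; running the same learner on noise-free and on noisy trials illustrates how misfit enters the total loss. The learner gradient_update, its learning rate eta, and the uniform noise model are illustrative assumptions only and are not the algorithms A, A_{nt}, or B of the paper.

import numpy as np

def online_run(learner_update, trials, dim):
    # Feed the trials (x_t, y_t) one by one; charge the absolute loss of the
    # prediction made before the label is revealed, then let the learner update.
    w = np.zeros(dim)                 # current hypothesis x -> w . x
    total_loss = 0.0
    for x, y in trials:
        y_hat = float(w @ x)          # prediction for this trial
        total_loss += abs(y - y_hat)  # absolute loss of the trial
        w = learner_update(w, x, y, y_hat)
    return w, total_loss

def gradient_update(w, x, y, y_hat, eta=0.05):
    # Illustrative stand-in for a concrete on-line learner's update rule.
    return w + eta * (y - y_hat) * x

rng = np.random.default_rng(0)
u = rng.normal(size=5)                                   # target linear function
xs = rng.normal(size=(200, 5))
trials_clean = [(x, float(u @ x)) for x in xs]           # labels without misfit
trials_noisy = [(x, y + rng.uniform(-0.1, 0.1)) for x, y in trials_clean]

_, loss_clean = online_run(gradient_update, trials_clean, dim=5)
_, loss_noisy = online_run(gradient_update, trials_noisy, dim=5)
print("total absolute loss on noise-free data:", round(loss_clean, 2))
print("total absolute loss on noisy data:     ", round(loss_noisy, 2))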
In the second part of the paper, we show that Bernstein's on-line learning
algorithm B can be converted into a batch learning algorithm \hat{B}
which consumes an (almost) minimal number of random training
examples. This is true for a whole class of "PAC-style" batch learning
models (including learning with an (\varepsilon,\gamma)-good model and
learning with an \varepsilon-bounded expected r-loss). The upper
bound on the sample size is shown by applying a method of Littlestone from
[9] which converts an on-line learning algorithm A into a batch
learning algorithm \hat{A} and relates the sample size of \hat{A}
to the total loss of A. For this purpose, Littlestone's
method is extended from \{0,1\}-valued functions to real-valued functions.
The lower bound on the sample size is shown by generalizing and applying
a method of Simon from [11].
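To illustrate the general shape of such a conversion, the following Python sketch implements a simplified cross-validation-style on-line-to-batch conversion in the spirit of [9]: the on-line algorithm is run over part of a random sample, the intermediate hypotheses are recorded, and the hypothesis with the smallest empirical loss on the held-out remainder is output. The function names, the even split, and the selection by empirical absolute loss are illustrative assumptions; this is not the exact construction of \hat{A} or \hat{B} and it ignores the sample-size bookkeeping behind the bounds.

import numpy as np

def online_hypotheses(learner_update, trials, dim):
    # Run the on-line learner over the trials and record every intermediate
    # hypothesis it produces.
    w = np.zeros(dim)
    hypotheses = [w.copy()]
    for x, y in trials:
        y_hat = float(w @ x)
        w = learner_update(w, x, y, y_hat)
        hypotheses.append(w.copy())
    return hypotheses

def online_to_batch(learner_update, sample, dim):
    # Spend half of the random sample to generate candidate hypotheses and the
    # other half to estimate their expected absolute loss; output the candidate
    # with the smallest empirical loss on the held-out part.
    half = len(sample) // 2
    run_part, holdout = sample[:half], sample[half:]
    candidates = online_hypotheses(learner_update, run_part, dim)
    def empirical_loss(w):
        return sum(abs(y - float(w @ x)) for x, y in holdout) / len(holdout)
    return min(candidates, key=empirical_loss)

Applied to the gradient-style learner of the previous sketch, online_to_batch(gradient_update, trials_noisy, dim=5) returns a single fixed hypothesis, as required in the batch setting.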
References:
[3] Ethan J. Bernstein. Absolute error bounds for learning linear functions online. In "Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory", pages 160-164, New York, 1992. ACM Press.
[9] Nick Littlestone. From on-line to batch learning. In "Proceedings of the 2nd Annual Workshop on Computational Learning Theory", pages 269-284, Palo Alto, California, 1989. Morgan Kaufmann.
[11] Hans Ulrich Simon. Bounds on the number of examples needed for learning functions. In "Proceedings of the 1st European Conference on Computational Learning Theory", pages 83-95. Oxford University Press, 1993.