The complexity of densest region detection
Abstract. We investigate the computational complexity of
the task of detecting dense regions of an unknown distribution from
unlabeled samples of this distribution. We introduce a formal learning
model for this task that uses a hypothesis class as its
"anti-overfitting" mechanism.
The learning task in our model can be reduced to a combinatorial
optimization problem. Since we can show that for some constants,
depending on the hypothesis class, these problems are NP hard to
approximate, we introduce relaxations of the basic learning task. The
relaxations augment our otherwise agnostic model with some
regularization assumptions concerning the sample-generating
distributions.
Quite surprisingly, we discover that for each of the two hypothesis
classes that we investigate, there is a "critical value" of
the relaxation parameter.
For any value below the critical value the problems are NP hard to
approximate, while, once this value is exceeded, the problems become
poly-time approximable.