Lehrstuhl Mathematik & Informatik

The complexity of densest region detection

Abstract. We investigate the computational complexity of the task of detecting dense regions of an unknown distribution from unlabeled samples of this distribution. We introduce a formal learning model for this task that uses a hypothesis class as its "anti-overfitting" mechanism.
The learning task in our model can be reduced to a combinatorial optimization problem. Since we can show that for some constants, depending on the hypothesis class, these problems are NP hard to approximate, we introduce relaxations of the basic learning task. The relaxations augment our otherwise agnostic model with some regularization assumptions concerning the sample-generating distributions.
Quite surprisingly, we discover that for each of the two hypothesis classes that we investigate, there is a "critical value" of the relaxation parameter.
For any value below the critical value the problems are NP hard to approximate, while, once this value is exceeded, the problems become poly-time approximable.

Fakultät für Mathematik

The complexity of densest region detection