How to Achieve Minimax Expected Kullback-Leibler Distance from an Unknown Finite Distribution
Abstract. We consider a problem that is related to the ``Universal Encoding Problem'' from information theory. The basic goal is to find rules that map ``partial information'' about a distribution X over an m-letter alphabet into a guess X' for X such that the Kullback-Leibler divergence between X and X' is as small as possible. The cost associated with a rule is the maximal (over X) expected Kullback-Leibler divergence between X and X'. First, we show that the cost associated with the well-known add-one rule, applied to a sample of size n, equals ln(1+(m-1)/(n+1)), thereby extending a result of Forster and Warmuth to m >= 3. Second, we derive an absolute (as opposed to merely asymptotic) lower bound on the smallest possible cost. Technically, this is done by determining (almost exactly) the Bayes error of the add-one rule under a uniform prior, whose asymptotic behavior for n --> infinity was known before. Third, we point to tools from approximation theory and support the conjecture that there exists a rule whose cost asymptotically matches the theoretical barrier given by the lower bound.
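For concreteness, here is a short LaTeX sketch of the quantities involved, assuming the standard Laplace form of the add-one rule and letter counts n_1, ..., n_m (with n_1 + ... + n_m = n) observed in the sample; these definitions are assumptions on our part, as the abstract itself does not spell them out:

% Sketch only: the Laplace form of the add-one rule is an assumption,
% not taken from the abstract itself.
\[
  \mathrm{KL}(X \,\|\, X') \;=\; \sum_{i=1}^{m} X_i \,\ln\frac{X_i}{X'_i},
  \qquad
  X'_i \;=\; \frac{n_i + 1}{n + m} \quad \text{(add-one rule)},
\]
\[
  \text{cost of a rule} \;=\; \max_{X}\;\mathrm{E}\bigl[\mathrm{KL}(X \,\|\, X')\bigr],
  \qquad
  \text{cost of the add-one rule} \;=\; \ln\!\Bigl(1 + \frac{m-1}{n+1}\Bigr).
\]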