Nearly optimal classification for semimetrics

Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch

Research output: Contribution to conference › Paper › peer-review

Abstract

We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the density dimension, dens, and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates. Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d = (R/γ)^dens points, and further that finding a significantly better compression is algorithmically intractable unless P=NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.
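The bound above compresses a labeled sample to roughly d = (R/γ)^dens points; for instance, with R = 1, γ = 0.1, and dens = 2, this gives about (1/0.1)^2 = 100 retained points. The following is a minimal illustrative sketch, not the paper's actual algorithm: it builds a greedy γ-packing of a labeled sample under an arbitrary semimetric and predicts with the nearest retained point. The function names (greedy_gamma_net, predict_nearest) and the overall construction are assumptions made for illustration.

```python
def greedy_gamma_net(points, labels, dist, gamma):
    """Greedily retain a gamma-separated subset of a labeled sample.

    `dist` may be any semimetric: non-negative and symmetric, with no
    triangle inequality assumed. A point is kept only if it lies at
    distance >= gamma from every point already kept, so the retained
    set is a gamma-packing of the sample.
    """
    kept = []  # indices of retained (compressed) sample points
    for i, p in enumerate(points):
        if all(dist(p, points[k]) >= gamma for k in kept):
            kept.append(i)
    return [(points[k], labels[k]) for k in kept]


def predict_nearest(query, compressed, dist):
    """1-nearest-neighbor prediction over the compressed sample."""
    point, label = min(compressed, key=lambda pl: dist(query, pl[0]))
    return label


# Toy usage with a genuine semimetric: squared distance on the line
# (non-negative and symmetric, but violates the triangle inequality).
if __name__ == "__main__":
    sq = lambda x, y: (x - y) ** 2
    xs = [0.0, 0.1, 0.2, 1.0, 1.1, 2.0]
    ys = [-1, -1, -1, 1, 1, 1]
    compressed = greedy_gamma_net(xs, ys, sq, gamma=0.5)
    print(len(compressed), "points kept")        # 3 points kept
    print(predict_nearest(0.9, compressed, sq))  # 1
```

In this sketch the packing size is governed by how many γ-separated points fit inside a ball of radius R, which is exactly the quantity the density dimension controls; the paper's algorithms and guarantees may differ in detail.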

Original language: English
Pages: 379-388
Number of pages: 10
State: Published - 2016
Event: 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016 - Cadiz, Spain
Duration: 9 May 2016 – 11 May 2016

Conference

Conference: 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016
Country/Territory: Spain
City: Cadiz
Period: 9/05/16 – 11/05/16
