TY - JOUR

T1 - Large-width bounds for learning half-spaces on distance spaces

AU - Anthony, Martin

AU - Ratsaby, Joel

N1 - Publisher Copyright:
© 2018 Elsevier B.V.

PY - 2018/7/10

Y1 - 2018/7/10

N2 - A half-space over a distance space is a generalization of a half-space in a vector space. An important advantage of a distance space over a metric space is that the triangle inequality need not be satisfied, which makes our results potentially very useful in practice. Given two points in a set, the half-space they define is the set of all points closer to the first point than to the second. In this paper we consider the problem of learning half-spaces in any finite distance space, that is, any finite set equipped with a distance function. We make use of a notion of ‘width’ of a half-space at a given point: this is defined as the difference between the distances of the point to the two points that define the half-space. We obtain probabilistic bounds on the generalization error when learning half-spaces from samples. These bounds depend on the empirical error (the fraction of sample points on which the half-space does not achieve a large width) and on the VC-dimension of the effective class of half-spaces that have a large sample width. Unlike some previous work on learning classification over metric spaces, the bound does not involve the covering number of the space, and can therefore be tighter.

AB - A half-space over a distance space is a generalization of a half-space in a vector space. An important advantage of a distance space over a metric space is that the triangle inequality need not be satisfied, which makes our results potentially very useful in practice. Given two points in a set, the half-space they define is the set of all points closer to the first point than to the second. In this paper we consider the problem of learning half-spaces in any finite distance space, that is, any finite set equipped with a distance function. We make use of a notion of ‘width’ of a half-space at a given point: this is defined as the difference between the distances of the point to the two points that define the half-space. We obtain probabilistic bounds on the generalization error when learning half-spaces from samples. These bounds depend on the empirical error (the fraction of sample points on which the half-space does not achieve a large width) and on the VC-dimension of the effective class of half-spaces that have a large sample width. Unlike some previous work on learning classification over metric spaces, the bound does not involve the covering number of the space, and can therefore be tighter.

KW - Distance and metric spaces

KW - Half spaces

KW - Large width learning

KW - Margin

KW - Pseudo rank

UR - http://www.scopus.com/inward/record.url?scp=85044334901&partnerID=8YFLogxK

U2 - 10.1016/j.dam.2018.02.004

DO - 10.1016/j.dam.2018.02.004

M3 - Article

AN - SCOPUS:85044334901

SN - 0166-218X

VL - 243

SP - 73

EP - 89

JO - Discrete Applied Mathematics

JF - Discrete Applied Mathematics

ER -