TY - JOUR
T1 - Large-width bounds for learning half-spaces on distance spaces
AU - Anthony, Martin
AU - Ratsaby, Joel
N1 - Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2018/7/10
Y1 - 2018/7/10
N2 - A half-space over a distance space generalizes a half-space in a vector space: given two points of the set, the half-space they define is the set of all points closer to the first point than to the second. An important advantage of a distance space over a metric space is that the distance function need not satisfy the triangle inequality, which makes our results potentially very useful in practice. In this paper we consider the problem of learning half-spaces in any finite distance space, that is, any finite set equipped with a distance function. We make use of a notion of ‘width’ of a half-space at a given point, defined as the difference between the distances from that point to the two points that define the half-space. We obtain probabilistic bounds on the generalization error when learning half-spaces from samples. These bounds depend on the empirical error (the fraction of sample points on which the half-space does not achieve a large width) and on the VC-dimension of the effective class of half-spaces that have a large width on the sample. Unlike some previous work on learning classifiers over metric spaces, the bounds do not involve the covering number of the space and can therefore be tighter.
KW - Distance and metric spaces
KW - Half spaces
KW - Large width learning
KW - Margin
KW - Pseudo rank
UR - http://www.scopus.com/inward/record.url?scp=85044334901&partnerID=8YFLogxK
U2 - 10.1016/j.dam.2018.02.004
DO - 10.1016/j.dam.2018.02.004
M3 - Article
AN - SCOPUS:85044334901
SN - 0166-218X
VL - 243
SP - 73
EP - 89
JO - Discrete Applied Mathematics
JF - Discrete Applied Mathematics
ER -
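
For quick reference, here is a minimal sketch (not from the paper; the names halfspace_label, width, and the toy distance table are my own) of the abstract's definitions of a half-space and its width over a finite distance space:

def halfspace_label(d, a, b, x):
    # x belongs to the half-space defined by (a, b) iff it is closer to a than to b.
    return d(x, a) < d(x, b)

def width(d, a, b, x):
    # Width of the (a, b) half-space at x: difference of the distances to b and to a.
    return d(x, b) - d(x, a)

# Toy finite distance space: d is symmetric here but need not satisfy the
# triangle inequality (d(p, r) = 5.0 > d(p, q) + d(q, r) = 4.0).
D = {("p", "q"): 1.0, ("p", "r"): 5.0, ("q", "r"): 3.0}

def d(u, v):
    return 0.0 if u == v else D.get((u, v), D.get((v, u)))

print(halfspace_label(d, "p", "r", "q"))  # True: q is closer to p than to r
print(width(d, "p", "r", "q"))            # 2.0 = d(q, r) - d(q, p)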