TY - JOUR
T1 - One-Class SVMs for Document Classification
AU - Manevitz, Larry M.
AU - Yousef, Malik
N1 - Publisher Copyright:
© 2001 Larry M. Manevitz and Malik Yousef.
PY - 2001
Y1 - 2001
AB - We implemented versions of the SVM appropriate for one-class classification in the context of information retrieval. The experiments were conducted on the standard Reuters data set. For the SVM implementation we used both a version due to Schölkopf et al. and a somewhat different version of the one-class SVM based on identifying “outlier” data as representative of the second class. We report on experiments with different kernels for both of these implementations and with different representations of the data, including binary vectors, the tf-idf representation, and a modification called the “Hadamard” representation. We then compared them with one-class versions of the prototype (Rocchio), nearest-neighbor, and naive Bayes algorithms, and finally with a natural one-class neural network classification method based on “bottleneck” compression-generated filters. The SVM approach as represented by Schölkopf was superior to all the methods except the neural network one, to which it was essentially comparable, although occasionally worse. However, the SVM methods turned out to be quite sensitive to the choice of representation and kernel in ways that are not well understood, leaving the neural network approach, for the time being, as the most robust.
KW - Compression Neural Network
KW - Neural Network
KW - Positive Information
KW - Support Vector Machine
KW - SVM
KW - Text Retrieval
UR - http://www.scopus.com/inward/record.url?scp=85096855936&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85096855936
SN - 1532-4435
VL - 2
SP - 139
EP - 154
JO - Journal of Machine Learning Research
JF - Journal of Machine Learning Research
ER -