TY - GEN
T1 - Efficient error setting for subspace miners
AU - Shaham, Eran
AU - Sarne, David
AU - Ben-Moshe, Boaz
PY - 2014
Y1 - 2014
N2 - A typical mining problem is the extraction of patterns from subspaces of multidimensional data. Such patterns, known as biclusters, comprise subsets of objects that behave similarly across subsets of attributes, and may overlap each other, i.e., objects/attributes may belong to several patterns, or to none. For many miners, a key input parameter is the maximum allowed error, which greatly affects the quality, quantity, and coherency of the mined clusters. As the error is dataset-dependent, setting it demands either domain knowledge or trial and error. The paper presents a new method for automatically setting the error to the value that maximizes the number of clusters mined. This error value is strongly correlated with the value for which performance scores are maximized. The correlation is extensively evaluated using six datasets, two mining algorithms, and seven prevailing performance measures, and the method is compared with five methods from the prior literature, demonstrating a substantial improvement in the mining score.
AB - A typical mining problem is the extraction of patterns from subspaces of multidimensional data. Such patterns, known as biclusters, comprise subsets of objects that behave similarly across subsets of attributes, and may overlap each other, i.e., objects/attributes may belong to several patterns, or to none. For many miners, a key input parameter is the maximum allowed error, which greatly affects the quality, quantity, and coherency of the mined clusters. As the error is dataset-dependent, setting it demands either domain knowledge or trial and error. The paper presents a new method for automatically setting the error to the value that maximizes the number of clusters mined. This error value is strongly correlated with the value for which performance scores are maximized. The correlation is extensively evaluated using six datasets, two mining algorithms, and seven prevailing performance measures, and the method is compared with five methods from the prior literature, demonstrating a substantial improvement in the mining score.
KW - Biclustering
KW - Error Setting
KW - Subspace Mining
UR - http://www.scopus.com/inward/record.url?scp=84958532588&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-08979-9_1
DO - 10.1007/978-3-319-08979-9_1
M3 - Conference contribution
AN - SCOPUS:84958532588
SN - 9783319089782
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 1
EP - 15
BT - Machine Learning and Data Mining in Pattern Recognition - 10th International Conference, MLDM 2014, Proceedings
T2 - 10th International Conference on Machine Learning and Data Mining in Pattern Recognition, MLDM 2014
Y2 - 21 July 2014 through 24 July 2014
ER -