TY - GEN
T1 - Applying Compression to Hierarchical Clustering
AU - Baruch, Gilad
AU - Klein, Shmuel Tomi
AU - Shapira, Dana
N1 - Publisher Copyright: © 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
AB - Hierarchical Clustering is widely used in Machine Learning and Data Mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest a data compression application of hierarchical clustering with a double usage of the xoring operations defining the Hamming distance used in the clustering process, extending it also to be used to transform the vector in one node into a more compressible form, as a function of the vector in the parent node. Compression is then achieved by run-length encoding, followed by optional Huffman coding, and we show how the compressed file may be processed directly, without decompression.
UR - http://www.scopus.com/inward/record.url?scp=85055083865&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-02224-2_12
DO - 10.1007/978-3-030-02224-2_12
M3 - Conference contribution
AN - SCOPUS:85055083865
SN - 9783030022235
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 151
EP - 162
BT - Similarity Search and Applications - 11th International Conference, SISAP 2018, Proceedings
A2 - Marchand-Maillet, Stéphane
A2 - Silva, Yasin N.
A2 - Chávez, Edgar
T2 - 11th International Conference on Similarity Search and Applications, SISAP 2018
Y2 - 7 October 2018 through 9 October 2018
ER -