TY - GEN
T1 - Compressed hierarchical clustering
AU - Baruch, Gilad
AU - Shapira, Dana
AU - Klein, Shmuel T.
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/7/19
Y1 - 2018/7/19
N2 - Hierarchical Clustering is widely used in Machine Learning and Data Mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest a double usage of the XORing operations defining the Hamming distance used in the clustering process, extending it also to be used to transform the vector in one node into a more compressible form, as a function of the vector in the parent node. Compression is then achieved by run-length encoding, followed by optional Huffman coding, and we show how the compressed file may be processed directly, without decompression.
AB - Hierarchical Clustering is widely used in Machine Learning and Data Mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest a double usage of the XORing operations defining the Hamming distance used in the clustering process, extending it also to be used to transform the vector in one node into a more compressible form, as a function of the vector in the parent node. Compression is then achieved by run-length encoding, followed by optional Huffman coding, and we show how the compressed file may be processed directly, without decompression.
KW - Hamming distance
KW - Hierarchical Clustering
KW - Run length encoding
UR - http://www.scopus.com/inward/record.url?scp=85050972435&partnerID=8YFLogxK
U2 - 10.1109/DCC.2018.00052
DO - 10.1109/DCC.2018.00052
M3 - Conference contribution
AN - SCOPUS:85050972435
T3 - Data Compression Conference Proceedings
SP - 399
BT - Proceedings - DCC 2018
A2 - Bilgin, Ali
A2 - Storer, James A.
A2 - Serra-Sagrista, Joan
A2 - Marcellin, Michael W.
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 Data Compression Conference, DCC 2018
Y2 - 27 March 2018 through 30 March 2018
ER -