Applying Compression to Hierarchical Clustering

Gilad Baruch, Shmuel Tomi Klein, Dana Shapira

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

3 Scopus citations


Hierarchical Clustering is widely used in Machine Learning and Data Mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest a data compression application of hierarchical clustering that makes double use of the XOR operations defining the Hamming distance in the clustering process: besides measuring distances, they also transform the vector in each node into a more compressible form, as a function of the vector in its parent node. Compression is then achieved by run-length encoding, optionally followed by Huffman coding, and we show how the compressed file may be processed directly, without decompression.
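The abstract's core idea, XOR-ing a node's bit-vector with its parent's so that similar vectors yield a sparse, run-length-friendly result, can be illustrated with a small sketch. It assumes bit-vectors are plain Python lists of 0/1 and uses one particular run-length variant (lengths of the 0-runs between 1-bits, plus the trailing 0-run); the function names `xor_bits`, `hamming`, `rle_encode`, `rle_decode`, `hamming_from_runs` and `huffman_code` are hypothetical, and this is not necessarily the exact encoding used in the paper.

```python
import heapq
from collections import Counter

# Hypothetical sketch: bit-vectors are equal-length lists of 0/1 ints.

def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit-vectors."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have equal length")
    return [x ^ y for x, y in zip(a, b)]

def hamming(a, b):
    """Hamming distance: the number of 1-bits in a XOR b."""
    return sum(xor_bits(a, b))

def rle_encode(bits):
    """Assumed RLE variant: length of each 0-run preceding a 1-bit,
    followed by the length of the trailing 0-run."""
    runs, zeros = [], 0
    for b in bits:
        if b:
            runs.append(zeros)
            zeros = 0
        else:
            zeros += 1
    runs.append(zeros)
    return runs

def rle_decode(runs):
    """Inverse of rle_encode: rebuild the bit-vector from its run lengths."""
    bits = []
    for r in runs[:-1]:
        bits.extend([0] * r)
        bits.append(1)
    bits.extend([0] * runs[-1])
    return bits

def hamming_from_runs(runs):
    """Distance between a node and its parent, read directly from the
    compressed form: one run per 1-bit, plus the trailing run."""
    return len(runs) - 1

def huffman_code(symbols):
    """Optional second stage: a Huffman code over the run lengths."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # The unique integer tie-breaker keeps heapq from ever comparing dicts.
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

if __name__ == "__main__":
    parent = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1]
    child = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1]
    # Transform the child relative to its parent: similar vectors give a sparse XOR.
    runs = rle_encode(xor_bits(child, parent))
    code = huffman_code(runs)
    encoded = "".join(code[r] for r in runs)
    # Decompression: undo the RLE, then XOR with the parent again.
    assert xor_bits(parent, rle_decode(runs)) == child
    print(runs, hamming_from_runs(runs), encoded)
```

In this toy example the child differs from its parent in two positions, so the XOR vector has only two 1-bits and shrinks to the run list `[3, 7, 4]`; the Hamming distance (2) is recovered from the number of runs without decoding anything. The Huffman stage only pays off when many nodes contribute run lengths with a skewed distribution, which is why the abstract treats it as optional.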

Original language: English
Title of host publication: Similarity Search and Applications - 11th International Conference, SISAP 2018, Proceedings
Editors: Stéphane Marchand-Maillet, Yasin N. Silva, Edgar Chávez
Number of pages: 12
State: Published - 2018
Event: 11th International Conference on Similarity Search and Applications, SISAP 2018 - Lima, Peru
Duration: 7 Oct 2018 to 9 Oct 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11223 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 11th International Conference on Similarity Search and Applications, SISAP 2018


