Compressed hierarchical clustering

Gilad Baruch, Dana Shapira, Shmuel T. Klein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Hierarchical clustering is widely used in machine learning and data mining. It stores bit-vectors in the nodes of a k-ary tree, usually without trying to compress them. We suggest a double usage of the XOR operations that define the Hamming distance used in the clustering process: besides measuring distance, they are also used to transform the vector in a node into a more compressible form, as a function of the vector in its parent node. Compression is then achieved by run-length encoding, optionally followed by Huffman coding, and we show how the compressed file may be processed directly, without decompression.
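
The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the 8-bit example vectors, and the (bit, run length) output format are assumptions chosen for illustration, and the optional Huffman stage is omitted. The sketch XORs a child vector with its parent vector, run-length encodes the result, and then reads the child-to-parent Hamming distance directly from the encoded runs.

```python
from itertools import groupby


def xor_bits(a, b):
    """Bitwise XOR of two equal-length bit lists."""
    return [x ^ y for x, y in zip(a, b)]


def hamming(a, b):
    """Hamming distance: the number of 1-bits in the XOR of a and b."""
    return sum(xor_bits(a, b))


def rle(bits):
    """Run-length encode a bit list as (bit, run_length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


def unrle(runs):
    """Invert rle(): expand (bit, run_length) pairs back into a bit list."""
    return [bit for bit, length in runs for _ in range(length)]


# Hypothetical parent and child vectors; similar vectors differ in few bits.
parent = [1, 0, 1, 1, 0, 0, 1, 0]
child = [1, 0, 1, 1, 0, 1, 1, 0]

# The XOR with the parent is mostly zeros, so it has long runs.
delta = xor_bits(child, parent)
encoded = rle(delta)

# The child is recovered by decoding the runs and XORing with the parent again.
restored = xor_bits(unrle(encoded), parent)

# The distance to the parent is the total length of the 1-runs,
# so it can be read from the encoded form without expanding it.
dist_from_runs = sum(length for bit, length in encoded if bit == 1)
```

Because clustered vectors tend to be close to their parent in Hamming distance, the XORed vector is sparse, which is what makes the run-length stage effective in this sketch.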

Original language: English
Title of host publication: Proceedings - DCC 2018
Subtitle of host publication: 2018 Data Compression Conference
Editors: Ali Bilgin, James A. Storer, Joan Serra-Sagrista, Michael W. Marcellin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 1
ISBN (Electronic): 9781538648834
State: Published - 19 Jul 2018
Event: 2018 Data Compression Conference, DCC 2018 - Snowbird, United States
Duration: 27 Mar 2018 to 30 Mar 2018

Publication series

Name: Data Compression Conference Proceedings
ISSN (Print): 1068-0314


Conference: 2018 Data Compression Conference, DCC 2018
Country/Territory: United States


  • Hamming distance
  • Hierarchical Clustering
  • Run length encoding

