Controlling the chunk-size in deduplication systems

Michael Hirsch, Shmuel T. Klein, Dana Shapira, Yair Toaff

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution (peer-reviewed)


Abstract

A special case of data compression in which repeated chunks of data are stored only once is known as deduplication. The input data is cut into chunks and a cryptographically strong hash value of each (distinct) chunk is stored. To restrict the influence of small inserts and deletes to local perturbations, the chunk boundaries are usually defined in a data-dependent way, which implies that the chunks are of variable length. In practice, the chunk sizes may spread over a large range, which can have a negative impact on storage performance. This is commonly dealt with by imposing artificial lower and upper bounds on the chunk size. This paper suggests an alternative by which the chunk-size distribution is controlled in a natural way. Some analytical and experimental results are given.
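
The abstract describes the standard content-defined chunking baseline that the paper sets out to improve: a boundary is declared wherever a hash computed over a small sliding window of the input satisfies a fixed condition, and artificial lower and upper bounds are imposed on the resulting chunk size. The Python sketch below illustrates only that baseline, not the paper's proposed alternative; the window width, mask, size bounds, and helper names (window_hash, chunks) are all illustrative assumptions.

```python
import hashlib

# Illustrative parameters, not taken from the paper:
W = 48                 # sliding-window width in bytes
MASK = (1 << 13) - 1   # cut when hash & MASK == 0, i.e. on average every 8 KiB
MIN_SIZE = 2 * 1024    # artificial lower bound on the chunk size
MAX_SIZE = 64 * 1024   # artificial upper bound on the chunk size

def window_hash(window: bytes) -> int:
    # Stand-in for a true rolling hash (e.g. a Rabin fingerprint);
    # recomputed from scratch at every position for clarity, not speed.
    h = 0
    for b in window:
        h = (h * 31 + b) & 0xFFFFFFFF
    return h

def chunks(data: bytes):
    """Yield (chunk, digest) pairs with data-dependent boundaries."""
    start, n = 0, len(data)
    while start < n:
        end = min(start + MAX_SIZE, n)  # the upper bound forces a cut here
        cut = end
        # Never cut before the lower bound; past it, take the first
        # position whose windowed hash meets the boundary condition.
        for pos in range(start + max(MIN_SIZE, W), end):
            if window_hash(data[pos - W:pos]) & MASK == 0:
                cut = pos
                break
        piece = data[start:cut]
        # A cryptographically strong hash identifies repeated chunks,
        # which a deduplicating store keeps only once.
        yield piece, hashlib.sha256(piece).hexdigest()
        start = cut
```

A deduplicating store built on top of this would keep a map from digest to chunk and write each distinct chunk only once, e.g. store.setdefault(digest, piece). Because boundaries depend only on the local window contents, a small insert or delete re-synchronizes the cut points after a few chunks, which keeps the perturbation local as the abstract notes.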

Original language: English
Title of host publication: Proceedings of the Prague Stringology Conference 2015, PSC 2015
Editors: Jan Žďárek, Jan Holub
Pages: 78-89
Number of pages: 12
ISBN (Electronic): 9788001057872
State: Published - 2015
Event: 19th Prague Stringology Conference, PSC 2015 - Prague, Czech Republic
Duration: 24 Aug 2015 - 26 Aug 2015

Publication series

Name: Proceedings of the Prague Stringology Conference 2015, PSC 2015

Conference

Conference: 19th Prague Stringology Conference, PSC 2015
Country/Territory: Czech Republic
City: Prague
Period: 24/08/15 - 26/08/15
