Direct merging of delta encoded files

Research output: Contribution to journal › Article › peer-review

Abstract

Delta encoding represents a target file with respect to a source file by replacing substrings common to both files with pointer references into the source. This paper introduces and investigates two similar, yet different, models: the Compressed Transitive Delta Encoding (CTDE) and the Compressed Source Delta Encoding (CSDE) paradigms. In both models, two delta files are given, and the goal is to construct a third delta file by working directly on the given compressed forms.

Formally, let Δ(X,Y) denote the delta file of a target file Y with respect to a source file X. In the CTDE problem, we are given a source file S and two differencing files Δ(S,T) and Δ(T,R), and the objective is to be able to attain R. The traditional way uses S to decompress Δ(S,T), obtaining T, and then applies Δ(T,R) to T to obtain R. CTDE instead constructs a delta file Δ(S,R) working directly on the two given delta files Δ(S,T) and Δ(T,R), without any decompression and without using the base file S, thereby avoiding the storage of the redundant intermediate file T. An algorithm for solving CTDE is proposed, and its compression performance is compared to that of the traditional “double delta decompression”. Not only does the algorithm use constant space, as opposed to the linear memory used by the traditional method, but experiments also show that the compression efficiency of the constructed delta file Δ(S,R) is usually better than that of both Δ(S,T) and Δ(T,R).

The CSDE problem deals with a source file S and two differencing files Δ(S,T) and Δ(S,R), and the goal is again to be able to attain R. Although it is not always possible to construct the target file R by processing only the two input delta files, experiments on typical real-life data show that the proposed CSDE algorithm can usually construct about 99% of the file.
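To make the CTDE idea concrete, here is a minimal Python sketch built on assumptions that do not come from the paper: a delta file is modeled as a hypothetical list of `("copy", offset, length)` and `("add", data)` commands, and `apply_delta`, `compose`, and `_target_starts` are illustrative names. `compose` resolves each copy in Δ(T,R), which points into T, through the commands of Δ(S,T), so every output command either points into S or carries literal bytes; neither S nor T is read or materialized. This is the generic delta-composition technique and should not be taken as the authors' algorithm. In particular, it keeps a table of command start offsets, so its memory grows with the number of commands in Δ(S,T) rather than being constant.

```python
from bisect import bisect_right
from typing import List, Tuple, Union

# Hypothetical command format (an assumption, not the paper's file format):
#   ("copy", offset, length): copy `length` bytes starting at `offset` in the source
#   ("add", data):            insert the literal bytes `data`
# Commands are assumed to be non-empty (length > 0, data != b"").
Command = Union[Tuple[str, int, int], Tuple[str, bytes]]
Delta = List[Command]


def apply_delta(source: bytes, delta: Delta) -> bytes:
    """Reconstruct the target from its source (the traditional decoding step)."""
    out = bytearray()
    for cmd in delta:
        if cmd[0] == "copy":
            _, off, length = cmd
            out += source[off:off + length]
        else:
            out += cmd[1]
    return bytes(out)


def _target_starts(delta: Delta) -> List[int]:
    """Offset in the target at which each command's output begins."""
    starts, pos = [], 0
    for cmd in delta:
        starts.append(pos)
        pos += cmd[2] if cmd[0] == "copy" else len(cmd[1])
    return starts


def compose(d_st: Delta, d_tr: Delta) -> Delta:
    """Build Delta(S,R) from Delta(S,T) and Delta(T,R) without touching S or T."""
    starts = _target_starts(d_st)
    d_sr: Delta = []
    for cmd in d_tr:
        if cmd[0] == "add":
            d_sr.append(cmd)  # literal bytes of R stay literal
            continue
        _, off, length = cmd  # a range of T; resolve it through Delta(S,T)
        i = bisect_right(starts, off) - 1  # last command of Delta(S,T) starting at or before off
        while length > 0:
            piece, skip = d_st[i], off - starts[i]
            if piece[0] == "copy":
                take = min(piece[2] - skip, length)
                d_sr.append(("copy", piece[1] + skip, take))  # now points into S
            else:
                take = min(len(piece[1]) - skip, length)
                d_sr.append(("add", piece[1][skip:skip + take]))  # literal of T reused
            off, length, i = off + take, length - take, i + 1
    return d_sr


if __name__ == "__main__":
    S = b"the quick brown fox"
    d_st = [("copy", 4, 12), ("add", b"dog!")]  # T = b"quick brown dog!"
    d_tr = [("copy", 6, 9), ("add", b", "), ("copy", 0, 5), ("copy", 15, 1)]
    d_sr = compose(d_st, d_tr)
    print(d_sr)  # [('copy', 10, 6), ('add', b'dog'), ('add', b', '), ('copy', 4, 5), ('add', b'!')]
    print(apply_delta(S, d_sr))  # b'brown dog, quick!'
```

The sketch does not coalesce adjacent commands, so the merged delta can contain more commands than necessary; merging neighbouring copies with contiguous source ranges, or neighbouring literals, would be a natural refinement.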

Original language: English
Pages (from-to): 130-140
Number of pages: 11
Journal: Discrete Applied Mathematics
Volume: 274
State: Published - 15 Mar 2020

Keywords

  • Data compression
  • Delta encoding
  • Lempel–Ziv 1977 encoding
