TY - JOUR

T1 - Direct merging of delta encoded files

AU - Shapira, Dana

N1 - Publisher Copyright:
© 2018 Elsevier B.V.

PY - 2020/3/15

Y1 - 2020/3/15

N2 - Delta encoding represents a target file making use of a source file by replacing common substrings by pointer references. Two similar, yet different, models are introduced and investigated in this paper: the Compressed Transitive Delta Encoding (CTDE) and the Compressed Source Delta Encoding (CSDE) paradigms. In these models we are given two delta files and the goal is to construct a third delta file working directly on the given compressed forms. Formally, given a source file S and two differencing files Δ(S,T) and Δ(T,R), where Δ(X,Y) is used to denote the delta file of the target file Y with respect to the source file X, the objective of the CTDE problem is to be able to attain R. Unlike the traditional way which uses S to decompress Δ(S,T), in order to attain T, and then applies Δ(T,R) on T to obtain R, CTDE constructs a delta file Δ(S,R) working directly on the two given delta files Δ(S,T) and Δ(T,R), without any decompression or the use of the base file S. Thus, avoiding the storage of the redundant intermediate file T. An algorithm for solving CTDE is proposed and its compression performance is compared to the traditional “double delta decompression”. Not only does it use constant space, as opposed to linear memory storage used by the traditional method, experiments show that the compression efficiency of the constructed delta file Δ(S,R) is usually better than both Δ(S,T) and Δ(T,R). The CSDE problem deals with a source file S and two differencing files Δ(S,T) and Δ(S,R), and the goal is still to be able to attain R. Although it is not always possible to construct the target file R by processing only the two input delta files, empirical experiments show that on typical real life data, usually about 99% of the file can be constructed using the proposed algorithm for the CSDE problem.

AB - Delta encoding represents a target file making use of a source file by replacing common substrings by pointer references. Two similar, yet different, models are introduced and investigated in this paper: the Compressed Transitive Delta Encoding (CTDE) and the Compressed Source Delta Encoding (CSDE) paradigms. In these models we are given two delta files and the goal is to construct a third delta file working directly on the given compressed forms. Formally, given a source file S and two differencing files Δ(S,T) and Δ(T,R), where Δ(X,Y) is used to denote the delta file of the target file Y with respect to the source file X, the objective of the CTDE problem is to be able to attain R. Unlike the traditional way which uses S to decompress Δ(S,T), in order to attain T, and then applies Δ(T,R) on T to obtain R, CTDE constructs a delta file Δ(S,R) working directly on the two given delta files Δ(S,T) and Δ(T,R), without any decompression or the use of the base file S. Thus, avoiding the storage of the redundant intermediate file T. An algorithm for solving CTDE is proposed and its compression performance is compared to the traditional “double delta decompression”. Not only does it use constant space, as opposed to linear memory storage used by the traditional method, experiments show that the compression efficiency of the constructed delta file Δ(S,R) is usually better than both Δ(S,T) and Δ(T,R). The CSDE problem deals with a source file S and two differencing files Δ(S,T) and Δ(S,R), and the goal is still to be able to attain R. Although it is not always possible to construct the target file R by processing only the two input delta files, empirical experiments show that on typical real life data, usually about 99% of the file can be constructed using the proposed algorithm for the CSDE problem.

KW - Data compression

KW - Delta encoding

KW - Lempel–Ziv 1977 encoding

UR - http://www.scopus.com/inward/record.url?scp=85051115580&partnerID=8YFLogxK

U2 - 10.1016/j.dam.2018.07.011

DO - 10.1016/j.dam.2018.07.011

M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???

AN - SCOPUS:85051115580

SN - 0166-218X

VL - 274

SP - 130

EP - 140

JO - Discrete Applied Mathematics

JF - Discrete Applied Mathematics

ER -