TY - JOUR
T1 - Direct merging of delta encoded files
AU - Shapira, Dana
N1 - Publisher Copyright:
© 2018 Elsevier B.V.
PY - 2020/3/15
Y1 - 2020/3/15
N2 - Delta encoding represents a target file making use of a source file by replacing common substrings by pointer references. Two similar, yet different, models are introduced and investigated in this paper: the Compressed Transitive Delta Encoding (CTDE) and the Compressed Source Delta Encoding (CSDE) paradigms. In these models we are given two delta files and the goal is to construct a third delta file working directly on the given compressed forms. Formally, given a source file S and two differencing files Δ(S,T) and Δ(T,R), where Δ(X,Y) is used to denote the delta file of the target file Y with respect to the source file X, the objective of the CTDE problem is to be able to attain R. Unlike the traditional way which uses S to decompress Δ(S,T), in order to attain T, and then applies Δ(T,R) on T to obtain R, CTDE constructs a delta file Δ(S,R) working directly on the two given delta files Δ(S,T) and Δ(T,R), without any decompression or the use of the base file S. Thus, avoiding the storage of the redundant intermediate file T. An algorithm for solving CTDE is proposed and its compression performance is compared to the traditional “double delta decompression”. Not only does it use constant space, as opposed to linear memory storage used by the traditional method, experiments show that the compression efficiency of the constructed delta file Δ(S,R) is usually better than both Δ(S,T) and Δ(T,R). The CSDE problem deals with a source file S and two differencing files Δ(S,T) and Δ(S,R), and the goal is still to be able to attain R. Although it is not always possible to construct the target file R by processing only the two input delta files, empirical experiments show that on typical real life data, usually about 99% of the file can be constructed using the proposed algorithm for the CSDE problem.
AB - Delta encoding represents a target file making use of a source file by replacing common substrings by pointer references. Two similar, yet different, models are introduced and investigated in this paper: the Compressed Transitive Delta Encoding (CTDE) and the Compressed Source Delta Encoding (CSDE) paradigms. In these models we are given two delta files and the goal is to construct a third delta file working directly on the given compressed forms. Formally, given a source file S and two differencing files Δ(S,T) and Δ(T,R), where Δ(X,Y) is used to denote the delta file of the target file Y with respect to the source file X, the objective of the CTDE problem is to be able to attain R. Unlike the traditional way which uses S to decompress Δ(S,T), in order to attain T, and then applies Δ(T,R) on T to obtain R, CTDE constructs a delta file Δ(S,R) working directly on the two given delta files Δ(S,T) and Δ(T,R), without any decompression or the use of the base file S. Thus, avoiding the storage of the redundant intermediate file T. An algorithm for solving CTDE is proposed and its compression performance is compared to the traditional “double delta decompression”. Not only does it use constant space, as opposed to linear memory storage used by the traditional method, experiments show that the compression efficiency of the constructed delta file Δ(S,R) is usually better than both Δ(S,T) and Δ(T,R). The CSDE problem deals with a source file S and two differencing files Δ(S,T) and Δ(S,R), and the goal is still to be able to attain R. Although it is not always possible to construct the target file R by processing only the two input delta files, empirical experiments show that on typical real life data, usually about 99% of the file can be constructed using the proposed algorithm for the CSDE problem.
KW - Data compression
KW - Delta encoding
KW - Lempel–Ziv 1977 encoding
UR - http://www.scopus.com/inward/record.url?scp=85051115580&partnerID=8YFLogxK
U2 - 10.1016/j.dam.2018.07.011
DO - 10.1016/j.dam.2018.07.011
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
AN - SCOPUS:85051115580
SN - 0166-218X
VL - 274
SP - 130
EP - 140
JO - Discrete Applied Mathematics
JF - Discrete Applied Mathematics
ER -