TY - JOUR
T1 - In place differential file compression
AU - Shapira, Dana
AU - Storer, James A.
PY - 2005/11
Y1 - 2005/11
N2 - We present algorithms for in-place differential file compression, where a target file T of size n is compressed with respect to a source file S of size m using no additional space in addition to the that used to replace S by T; that is, it is possible to encode using m + n + O(1) space and decode using MAX(m, n) + O(1) space (so that when decoding the source file is overwritten by the decompressed target file). From a theoretical point of view, an optimal solution (best possible compression) to this problem is known to be NP-hard, and in previous work we have presented a factor of 4 approximation (not in-place) algorithm based on a sliding window approach. Here we consider practical in-place algorithms based on sliding window compression where our focus is on decoding; that is, although in-place encoding is possible, we will allow O(m + n) space for the encoder so as to improve its speed and present very fast decoding with only MAX(m, n) + O(1) space. Although NP-hardness implies that these algorithms cannot always be optimal, the asymptotic optimality of sliding window methods along with their ability for constant-factor approximation is evidence that they should work well for this problem in practice. We introduce the IPSW algorithm (in-place sliding window) and present experiments that indicate that it compares favorably with traditional practical approaches, even those that do not decode in-place, while at the same time having low encoding complexity and extremely low decoding complexity. IPSW is most effective when S and T are reasonably well aligned (most large common substrings occur in approximately the same order). We also present a preprocessing step for string alignment that can be employed when the encoder determines significant gains will be achieved.
AB - We present algorithms for in-place differential file compression, where a target file T of size n is compressed with respect to a source file S of size m using no additional space in addition to the that used to replace S by T; that is, it is possible to encode using m + n + O(1) space and decode using MAX(m, n) + O(1) space (so that when decoding the source file is overwritten by the decompressed target file). From a theoretical point of view, an optimal solution (best possible compression) to this problem is known to be NP-hard, and in previous work we have presented a factor of 4 approximation (not in-place) algorithm based on a sliding window approach. Here we consider practical in-place algorithms based on sliding window compression where our focus is on decoding; that is, although in-place encoding is possible, we will allow O(m + n) space for the encoder so as to improve its speed and present very fast decoding with only MAX(m, n) + O(1) space. Although NP-hardness implies that these algorithms cannot always be optimal, the asymptotic optimality of sliding window methods along with their ability for constant-factor approximation is evidence that they should work well for this problem in practice. We introduce the IPSW algorithm (in-place sliding window) and present experiments that indicate that it compares favorably with traditional practical approaches, even those that do not decode in-place, while at the same time having low encoding complexity and extremely low decoding complexity. IPSW is most effective when S and T are reasonably well aligned (most large common substrings occur in approximately the same order). We also present a preprocessing step for string alignment that can be employed when the encoder determines significant gains will be achieved.
KW - Algorithms
KW - Approximation theory
KW - Computational complexity
KW - Decoding
KW - Encoding (symbols)
KW - Optimization
KW - Constant factor approximation
KW - Differential file compression
KW - Sliding window compression
KW - Data compression
UR - http://www.scopus.com/inward/record.url?scp=27844459078&partnerID=8YFLogxK
U2 - 10.1093/comjnl/bxh128
DO - 10.1093/comjnl/bxh128
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.systematicreview???
AN - SCOPUS:27844459078
SN - 0010-4620
VL - 48
SP - 677
EP - 691
JO - Computer Journal
JF - Computer Journal
IS - 6
ER -