TY - JOUR
T1 - Weighted forward looking adaptive coding
AU - Fruchtman, Aharon
AU - Gross, Yoav
AU - Klein, Shmuel T.
AU - Shapira, Dana
N1 - Publisher Copyright: © 2022 Elsevier B.V.
PY - 2022/9/21
Y1 - 2022/9/21
N2 - Huffman coding is known to be optimal under certain constraints, yet its dynamic version, which constantly alters the Huffman tree as a function of the already processed characters, may be even more efficient in practice. A new forward-looking variant of Huffman compression, proposed recently, provably always performs better than static Huffman coding by at least m−1 bits, where m denotes the size of the alphabet, and has a better worst-case size than standard dynamic Huffman coding. This paper introduces a new generic coding method that extends the known static and dynamic variants and includes them as special cases. In fact, the generalization is applicable to all statistical methods, including arithmetic coding. This then leads to the formalization of a new double-pass coding method that is adaptive in the sense that it uses changing statistics depending on the current position within the processed file, yet behaves like static coding, as it assumes knowledge of the distribution in the entire file; this is contrary to online variants, which rely only on the text seen so far and adapt the model dynamically. We call the new method positional coding; its compression performance, using global statistics, is provably always at least as good as that of the best dynamic variants known to date. Moreover, we present empirical results showing improvements by positional coding and its extensions over static and dynamic Huffman and arithmetic coding, even when the encoded file includes the model description.
KW - Arithmetic coding
KW - Huffman coding
KW - Lossless data compression
KW - Static and dynamic coding
UR - http://www.scopus.com/inward/record.url?scp=85134807890&partnerID=8YFLogxK
U2 - 10.1016/j.tcs.2022.07.013
DO - 10.1016/j.tcs.2022.07.013
M3 - Article
AN - SCOPUS:85134807890
SN - 0304-3975
VL - 930
SP - 86
EP - 99
JO - Theoretical Computer Science
JF - Theoretical Computer Science
ER -