TY - JOUR
T1 - Locally Consistent Parsing for Text Indexing in Small Space
AU - Birenzwige, Or
AU - Golan, Shay
AU - Porat, Ely
N1 - Publisher Copyright: © 2025 Society for Industrial and Applied Mathematics.
PY - 2025
Y1 - 2025
N2 - We consider two closely related problems of text indexing in a sublinear working space. The first problem is the sparse suffix tree construction, where a text $S$ is given in read-only memory, along with a set of suffixes $B$, and the goal is to construct the compressed trie of all these suffixes ordered lexicographically, using only $O(|B|)$ words of space. The second problem is the longest common extension problem, where again a text $S$ of length $n$ is given in read-only memory with some trade-off parameter $1 \leq \tau \leq n$, and the goal is to construct a data structure that uses $O(n/\tau)$ words of space and can compute for any pair of suffixes their longest common prefix length as fast as possible as a function of $\tau$ ($O(\tau)$ time for a randomized Las Vegas data structure or $O(\tau\sqrt{\log^{*} n})$ time for a deterministic data structure). We show how to use ideas based on the locally consistent parsing technique, which were introduced by Sahinalp and Vishkin [Proceedings of the 26th Annual ACM Symposium on Theory of Computing, 1994, pp. 300-309], in some nontrivial ways in order to improve the known results for the above problems under the space constraints. We introduce the first almost-linear, $O(n \cdot \mathrm{polylog}\, n)$, deterministic construction for both problems, where all previous algorithms take at least $\Omega(\min\{n|B|, n^{2}/|B|\})$ time. We also introduce the first linear-time Las Vegas algorithms for both problems, achieving $O(n)$ construction time with high probability. This is an improvement over the last result of Gawrychowski and Kociumaka [Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms, 2017, pp. 425-439], which obtained $O(n)$ time for the Monte Carlo algorithm and $O(n\sqrt{\log |B|})$ time with high probability for the Las Vegas algorithm.
AB - We consider two closely related problems of text indexing in a sublinear working space. The first problem is the sparse suffix tree construction, where a text $S$ is given in read-only memory, along with a set of suffixes $B$, and the goal is to construct the compressed trie of all these suffixes ordered lexicographically, using only $O(|B|)$ words of space. The second problem is the longest common extension problem, where again a text $S$ of length $n$ is given in read-only memory with some trade-off parameter $1 \leq \tau \leq n$, and the goal is to construct a data structure that uses $O(n/\tau)$ words of space and can compute for any pair of suffixes their longest common prefix length as fast as possible as a function of $\tau$ ($O(\tau)$ time for a randomized Las Vegas data structure or $O(\tau\sqrt{\log^{*} n})$ time for a deterministic data structure). We show how to use ideas based on the locally consistent parsing technique, which were introduced by Sahinalp and Vishkin [Proceedings of the 26th Annual ACM Symposium on Theory of Computing, 1994, pp. 300-309], in some nontrivial ways in order to improve the known results for the above problems under the space constraints. We introduce the first almost-linear, $O(n \cdot \mathrm{polylog}\, n)$, deterministic construction for both problems, where all previous algorithms take at least $\Omega(\min\{n|B|, n^{2}/|B|\})$ time. We also introduce the first linear-time Las Vegas algorithms for both problems, achieving $O(n)$ construction time with high probability. This is an improvement over the last result of Gawrychowski and Kociumaka [Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms, 2017, pp. 425-439], which obtained $O(n)$ time for the Monte Carlo algorithm and $O(n\sqrt{\log |B|})$ time with high probability for the Las Vegas algorithm.
KW - locally consistent parsing
KW - longest common extension
KW - partitioning set
KW - suffix tree
KW - text indexing
UR - https://www.scopus.com/pages/publications/105013789438
U2 - 10.1137/21M1465706
DO - 10.1137/21M1465706
M3 - Article
AN - SCOPUS:105013789438
SN - 0097-5397
VL - 54
SP - 916
EP - 963
JO - SIAM Journal on Computing
JF - SIAM Journal on Computing
IS - 4
ER -