LOCALLY CONSISTENT PARSING FOR TEXT INDEXING IN SMALL SPACE

Or Birenzwige, Shay Golan, Ely Porat

Research output: Contribution to journalArticlepeer-review

Abstract

We consider two closely related problems of text indexing in a sublinear working space. The first problem is the sparse suffix tree construction, where a text S is given in read-only memory, along with a set of suffixes B, and the goal is to construct the compressed trie of all these suffixes ordered lexicographically, using only \scrO(|B|) words of space. The second problem is the longest common extension problem, where again a text S of length n is given in read-only memory with some trade-off parameter 1 \leq \tau \leq n, and the goal is to construct a data structure that uses \scrO(n\tau ) words of space and can compute for any pair of suffixes their longest common prefix length as fast as possible as a function of \tau (O(\tau ) time for a randomized Las Vegas data structure or O(\tau \sqrt{}log\ast n) time for a deterministic data structure). We show how to use ideas based on the locally consistent parsing technique, that were introduced by Sahinalp and Vishkin [Proceedings of the 26th Annual ACM Symposium on Theory of Computing, 1994, pp. 300-309], in some nontrivial ways in order to improve the known results for the above problems under the space constraints. We introduce the first almost-linear, \scrO(n \cdot polylog n), deterministic construction for both problems, where all previous algorithms take at least \Omega(min\{n|B|, |nB2| \}) time. We also introduce the first linear-time Las Vegas algorithms for both problems, achieving \scrO(n) construction time with high probability. This is an improvement over the last result of Gawrychowski and Kociumaka [Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms, 2017, pp. 425-439], which obtained \scrO(n) time for the Monte Carlo algorithm and \scrO(n\sqrt{}log |B|) time with high probability for the Las Vegas algorithm.

Original languageEnglish
Pages (from-to)916-963
Number of pages48
JournalSIAM Journal on Computing
Volume54
Issue number4
DOIs
StatePublished - 2025
Externally publishedYes

Keywords

  • locally consistent parsing
  • longest common extension
  • partitioning set
  • suffix tree
  • text indexing

Fingerprint

Dive into the research topics of 'LOCALLY CONSISTENT PARSING FOR TEXT INDEXING IN SMALL SPACE'. Together they form a unique fingerprint.

Cite this