TY - GEN
T1 - Time-Space Tradeoffs for Finding a Long Common Substring
AU - Ben-Nun, Stav
AU - Golan, Shay
AU - Kociumaka, Tomasz
AU - Kraus, Matan
N1 - Publisher Copyright:
© 2020 Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing. All rights reserved.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $\Theta(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n$, the LCS problem can be solved in $O(s)$ space and $\tilde{O}(n^2/s)$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \le s \le n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \le s \le n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(n^2/(L \cdot s) + n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
AB - We consider the problem of finding, given two documents of total length $n$, a longest string occurring as a substring of both documents. This problem, known as the Longest Common Substring (LCS) problem, has a classic $O(n)$-time solution dating back to the discovery of suffix trees (Weiner, 1973) and their efficient construction for integer alphabets (Farach-Colton, 1997). However, these solutions require $\Theta(n)$ space, which is prohibitive in many applications. To address this issue, Starikovskaya and Vildhøj (CPM 2013) showed that for $n^{2/3} \le s \le n$, the LCS problem can be solved in $O(s)$ space and $\tilde{O}(n^2/s)$ time. Kociumaka et al. (ESA 2014) generalized this tradeoff to $1 \le s \le n$, thus providing a smooth time-space tradeoff from constant to linear space. In this paper, we obtain a significant speed-up for instances where the length $L$ of the sought LCS is large. For $1 \le s \le n$, we show that the LCS problem can be solved in $O(s)$ space and $\tilde{O}(n^2/(L \cdot s) + n)$ time. The result is based on techniques originating from the LCS with Mismatches problem (Flouri et al., 2015; Charalampopoulos et al., CPM 2018), on space-efficient locally consistent parsing (Birenzwige et al., SODA 2020), and on the structure of maximal repetitions (runs) in the input documents.
KW - Local consistency
KW - Longest common substring
KW - Periodicity
KW - Time-space tradeoff
UR - https://www.scopus.com/pages/publications/85088397666
U2 - 10.4230/LIPIcs.CPM.2020.5
DO - 10.4230/LIPIcs.CPM.2020.5
M3 - Conference contribution
AN - SCOPUS:85088397666
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
A2 - Gørtz, Inge Li
A2 - Weimann, Oren
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 31st Annual Symposium on Combinatorial Pattern Matching, CPM 2020
Y2 - 17 June 2020 through 19 June 2020
ER -