TY - GEN
T1 - Towards optimal approximate streaming pattern matching by matching multiple patterns in multiple streams
AU - Golan, Shay
AU - Kopelowitz, Tsvi
AU - Porat, Ely
N1 - Publisher Copyright:
© Shay Golan, Tsvi Kopelowitz, and Ely Porat;.
PY - 2018/7/1
Y1 - 2018/7/1
N2 - Recently, there has been a growing focus in solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately. In the dictionary matching problem in the multi-stream model the goal is to preprocess a dictionary D = (P1, P2, . . ., Pd) of d = |D| patterns (strings with maximum length m over alphabet) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear. We design two e cient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m + log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model. We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use di erent techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.
AB - Recently, there has been a growing focus in solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately. In the dictionary matching problem in the multi-stream model the goal is to preprocess a dictionary D = (P1, P2, . . ., Pd) of d = |D| patterns (strings with maximum length m over alphabet) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear. We design two e cient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m + log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model. We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use di erent techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.
KW - Dictionary matching
KW - Streaming approximate pattern matching
UR - https://www.scopus.com/pages/publications/85049805482
U2 - 10.4230/LIPIcs.ICALP.2018.65
DO - 10.4230/LIPIcs.ICALP.2018.65
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85049805482
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018
A2 - Kaklamanis, Christos
A2 - Marx, Daniel
A2 - Chatzigiannakis, Ioannis
A2 - Sannella, Donald
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018
Y2 - 9 July 2018 through 13 July 2018
ER -