TY - GEN
T1 - Real-Time streaming multi-pattern search for constant alphabet
AU - Golan, Shay
AU - Porat, Ely
PY - 2017/9/1
Y1 - 2017/9/1
N2 - In the streaming multi-pattern search problem, also known as the streaming dictionary matching problem, a set D = {P1, P2, ..., Pd} of d patterns (strings over an alphabet Σ), called the dictionary, is given to be preprocessed. Then a text T arrives one character at a time, and the goal is to report, before the next character arrives, the longest pattern in the dictionary that is a current suffix of T. We prove that for a constant-size alphabet, there exists a randomized Monte-Carlo algorithm for the streaming dictionary matching problem that takes constant time per character and uses O(d log m) words of space, where m is the length of the longest pattern in the dictionary. In the case where the alphabet size is not constant, we introduce two new randomized Monte-Carlo algorithms with the following complexities: (1) O(log log |Σ|) time per character in the worst case and O(d log m) words of space; (2) O(1/ϵ) time per character in the worst case and O(d|Σ|^ϵ log m/ϵ) words of space for any 0 < ϵ ≤ 1. These results improve upon the algorithm of Clifford et al. [12], which uses O(d log m) words of space and takes O(log log(m + d)) time per character.
AB - In the streaming multi-pattern search problem, also known as the streaming dictionary matching problem, a set D = {P1, P2, ..., Pd} of d patterns (strings over an alphabet Σ), called the dictionary, is given to be preprocessed. Then a text T arrives one character at a time, and the goal is to report, before the next character arrives, the longest pattern in the dictionary that is a current suffix of T. We prove that for a constant-size alphabet, there exists a randomized Monte-Carlo algorithm for the streaming dictionary matching problem that takes constant time per character and uses O(d log m) words of space, where m is the length of the longest pattern in the dictionary. In the case where the alphabet size is not constant, we introduce two new randomized Monte-Carlo algorithms with the following complexities: (1) O(log log |Σ|) time per character in the worst case and O(d log m) words of space; (2) O(1/ϵ) time per character in the worst case and O(d|Σ|^ϵ log m/ϵ) words of space for any 0 < ϵ ≤ 1. These results improve upon the algorithm of Clifford et al. [12], which uses O(d log m) words of space and takes O(log log(m + d)) time per character.
KW - Dictionary
KW - Fingerprints
KW - Multi-pattern
KW - Streaming pattern matching
UR - https://www.scopus.com/pages/publications/85030526376
U2 - 10.4230/LIPIcs.ESA.2017.41
DO - 10.4230/LIPIcs.ESA.2017.41
M3 - Conference contribution
AN - SCOPUS:85030526376
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 25th European Symposium on Algorithms, ESA 2017
A2 - Sohler, Christian
A2 - Pruhs, Kirk
PB - Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
T2 - 25th European Symposium on Algorithms, ESA 2017
Y2 - 4 September 2017 through 6 September 2017
ER -