Abstract
The String-to-Dictionary Matching Problem is defined, in which a string is searched for in all possible concatenations of the elements of a given dictionary. The problem has applications to compressed matching in variable-to-fixed-length encodings, such as Tunstall's. Two algorithms based on suffix trees are suggested: one focusing on the dictionary, the other on the pattern to be searched for. The problem is then extended to handle patterns that include gaps. Experiments on natural language text suggest that, for long enough patterns, compressed search may require fewer comparisons, in spite of a potentially large number of encodings.
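To make the problem statement concrete, the following is a minimal sketch of the decision version: does a pattern occur as a substring of some concatenation of (possibly repeated) dictionary words? It is a naive dynamic-programming baseline written for illustration, not either of the suffix-tree algorithms described in the paper. The function name `occurs_in_concatenation` is hypothetical. The idea: an occurrence either lies inside a single word, or it starts with a suffix of some word, continues with zero or more whole words, and ends with a prefix of some word.

```python
from typing import Iterable


def occurs_in_concatenation(pattern: str, dictionary: Iterable[str]) -> bool:
    """Return True if `pattern` is a substring of some concatenation
    w1 w2 ... wk of (possibly repeated) words from `dictionary`.

    Hypothetical naive baseline for illustration only; it is not the
    suffix-tree algorithms of the paper.
    """
    words = [w for w in set(dictionary) if w]  # ignore empty words
    n = len(pattern)
    if n == 0:
        return True  # the empty string occurs in any concatenation

    # Case 1: the pattern lies entirely inside a single dictionary word.
    if any(pattern in w for w in words):
        return True

    # boundary[j] is True if a word boundary can fall just before
    # pattern[j] in some alignment of the pattern against a concatenation.
    boundary = [False] * (n + 1)
    boundary[0] = True  # the pattern may start exactly at a word start
    for j in range(1, n + 1):
        # pattern[:j] is a suffix of some word, so a boundary follows it.
        if any(w.endswith(pattern[:j]) for w in words):
            boundary[j] = True

    # Scan boundaries left to right; extensions only move to larger j.
    for j in range(n + 1):
        if not boundary[j]:
            continue
        rest = pattern[j:]
        # Case 2: the remainder is a (possibly empty) prefix of some word.
        if rest == "" or any(w.startswith(rest) for w in words):
            return True
        # Otherwise, consume one complete dictionary word starting at j.
        for w in words:
            if pattern.startswith(w, j):
                boundary[j + len(w)] = True
    return False
```

For example, with the dictionary `["ab", "c"]`, `occurs_in_concatenation("bca", ["ab", "c"])` returns `True`, because "bca" occurs in "ab" + "c" + "ab" = "abcab", while `occurs_in_concatenation("ac", ["ab", "c"])` returns `False`, since every "a" in such a concatenation is immediately followed by "b". The running time grows with the pattern length times the dictionary size and word length, which is one reason index structures such as suffix trees are attractive here.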
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1347-1356 |
Number of pages | 10 |
Journal | Computer Journal |
Volume | 55 |
Issue number | 11 |
DOIs | |
State | Published - Nov 2012 |
Externally published | Yes |
Keywords
- compressed matching
- suffix trees
- Tunstall