TY - JOUR
T1 - Edit distance with move operations
AU - Shapira, Dana
AU - Storer, James A.
N1 - Funding Information:
The work is part of the CHEC (Combustion and Harmful Emission Control) research program at DTU Chemical Engineering. YS wishes to acknowledge funding from CSC (China Scholarship Council). PM thanks the R.A. Welch Foundation (Grant B-1174) for support.
PY - 2007/6
Y1 - 2007/6
N2 - The traditional edit-distance problem is to find the minimum number of insert-character and delete-character (and sometimes change character) operations required to transform one string into another. Here we consider the more general problem of a string represented by a singly linked list (one character per node) and being able to apply these operations to the pointer associated with a vertex as well as the character associated with the vertex. That is, in O (1) time, not only can characters be inserted or deleted, but substrings can be moved or deleted. We limit our attention to the ability to move substrings and leave substring deletions for future research. Note that O (1) time substring move operation implies O (1) substring exchange operation as well, a form of transformation that has been of interest in molecular biology. We show that this problem is NP-complete, and that a "recursive" sequence of moves can be simulated with at most a constant factor increase by a non-recursive sequence. Although a greedy algorithm is known to have poor (a polynomial factor) worst case performance, we present a polynomial time greedy algorithm for non-recursive moves which on a subclass of instances of a problem of size n achieves an approximation factor to optimal of at most O (log n). The development of this greedy algorithm shows how to reduce moves of substrings to moves of characters, and how to convert moves of characters to only inserts and deletes of characters.
AB - The traditional edit-distance problem is to find the minimum number of insert-character and delete-character (and sometimes change character) operations required to transform one string into another. Here we consider the more general problem of a string represented by a singly linked list (one character per node) and being able to apply these operations to the pointer associated with a vertex as well as the character associated with the vertex. That is, in O (1) time, not only can characters be inserted or deleted, but substrings can be moved or deleted. We limit our attention to the ability to move substrings and leave substring deletions for future research. Note that O (1) time substring move operation implies O (1) substring exchange operation as well, a form of transformation that has been of interest in molecular biology. We show that this problem is NP-complete, and that a "recursive" sequence of moves can be simulated with at most a constant factor increase by a non-recursive sequence. Although a greedy algorithm is known to have poor (a polynomial factor) worst case performance, we present a polynomial time greedy algorithm for non-recursive moves which on a subclass of instances of a problem of size n achieves an approximation factor to optimal of at most O (log n). The development of this greedy algorithm shows how to reduce moves of substrings to moves of characters, and how to convert moves of characters to only inserts and deletes of characters.
KW - Bin packing problem
KW - Edit-distance
KW - Longest common substring (LCS)
KW - NP-complete
UR - http://www.scopus.com/inward/record.url?scp=34247124656&partnerID=8YFLogxK
U2 - 10.1016/j.jda.2005.01.010
DO - 10.1016/j.jda.2005.01.010
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
AN - SCOPUS:34247124656
SN - 1570-8667
VL - 5
SP - 380
EP - 392
JO - Journal of Discrete Algorithms
JF - Journal of Discrete Algorithms
IS - 2 SPEC. ISS.
ER -