TY - GEN
T1 - Accelerating the LZ-complexity algorithm
AU - Ratsaby, Joel
AU - Timashkov, Alexander
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - The Lempel Ziv complexity of a string has recently been used in pattern recognition and classification as part of a string distance function. Its main advantage is that it can measure dissimilarity between a pair of strings of different lengths. This is very useful for machine learning on unstructured data since such data is not restricted to a fixed input dimensionality. The standard computation of LZ-complexity is inherently serial and is not suitable for processing large unstructured data. Hence, we propose a parallel algorithm that computes the LZ-complexity of strings whose length is limited only by the amount of memory, typically in the tens of gigabytes. The algorithm is implemented in CUDA on a GPU. Its speed-up factor is approximately n2/3 for strings of length n, for at least up to n = 2Mb. For instance, on 2Mb strings, the speed-up is 150. We compare the execution times of kernel variants with shared and global memory. The more efficient variant obtains approximately 90% GPU utilization.
AB - The Lempel Ziv complexity of a string has recently been used in pattern recognition and classification as part of a string distance function. Its main advantage is that it can measure dissimilarity between a pair of strings of different lengths. This is very useful for machine learning on unstructured data since such data is not restricted to a fixed input dimensionality. The standard computation of LZ-complexity is inherently serial and is not suitable for processing large unstructured data. Hence, we propose a parallel algorithm that computes the LZ-complexity of strings whose length is limited only by the amount of memory, typically in the tens of gigabytes. The algorithm is implemented in CUDA on a GPU. Its speed-up factor is approximately n2/3 for strings of length n, for at least up to n = 2Mb. For instance, on 2Mb strings, the speed-up is 150. We compare the execution times of kernel variants with shared and global memory. The more efficient variant obtains approximately 90% GPU utilization.
KW - CUDA
KW - GPU
KW - LZ-complexity
KW - UID distance
KW - string distance
UR - http://www.scopus.com/inward/record.url?scp=85190242648&partnerID=8YFLogxK
U2 - 10.1109/ICPADS60453.2023.00038
DO - 10.1109/ICPADS60453.2023.00038
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85190242648
T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
SP - 200
EP - 207
BT - Proceedings - 2023 IEEE 29th International Conference on Parallel and Distributed Systems, ICPADS 2023
PB - IEEE Computer Society
T2 - 29th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2023
Y2 - 17 December 2023 through 21 December 2023
ER -