TY - JOUR
T1 - Co-clustering of fuzzy lagged data
AU - Shaham, Eran
AU - Sarne, David
AU - Ben-Moshe, Boaz
N1 - Publisher Copyright:
© 2014, Springer-Verlag London.
PY - 2015/7/17
Y1 - 2015/7/17
AB - The paper focuses on mining patterns characterized by a fuzzy lagged relationship between the data objects that form them. Such a regulatory mechanism is common in real-life settings and appears in a variety of fields; finance, gene expression, neuroscience, and crowds and collective movements are just a few examples. Mining such patterns not only helps in understanding the relationships between objects in the domain but also assists in forecasting their future behavior. For most interesting variants of this problem, finding an optimal fuzzy lagged co-cluster is NP-complete. We present a polynomial-time Monte Carlo approximation algorithm for mining fuzzy lagged co-clusters. We prove that, for any data matrix, the algorithm mines with fixed probability a fuzzy lagged co-cluster that encompasses the optimal fuzzy lagged co-cluster with a column overhead ratio of at most 2 and no row overhead. Moreover, the algorithm handles noise, anti-correlations, missing values, and overlapping patterns. The algorithm was extensively evaluated on both artificial and real-life datasets. The results not only corroborate the algorithm's ability to efficiently mine relevant and accurate fuzzy lagged co-clusters but also illustrate the importance of including fuzziness in the lagged-pattern model.
KW - Biclustering
KW - Data mining
KW - Fuzzy lagged data clustering
KW - Spatio-temporal patterns
KW - Time lagged
UR - http://www.scopus.com/inward/record.url?scp=84931004794&partnerID=8YFLogxK
U2 - 10.1007/s10115-014-0758-7
DO - 10.1007/s10115-014-0758-7
M3 - Article
AN - SCOPUS:84931004794
SN - 0219-1377
VL - 44
SP - 217
EP - 252
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
IS - 1
ER -