TY - JOUR
T1 - Joint Cluster Analysis of Attribute Data and Relationship Data
T2 - The Connected k-Center Problem, Algorithms and Applications
AU - Ge, Rong
AU - Ester, Martin
AU - Gao, Byron J.
AU - Hu, Zengjian
AU - Bhattacharya, Binay
AU - Ben-Moshe, Boaz
PY - 2008/7/1
Y1 - 2008/7/1
N2 - Attribute data and relationship data are two principal types of data, representing the intrinsic and extrinsic properties of entities. While attribute data have been the main source of data for cluster analysis, relationship data such as social networks or metabolic networks are becoming increasingly available. It is also common to observe both data types carry complementary information such as in market segmentation and community identification, which calls for a joint cluster analysis of both data types so as to achieve better results. In this article, we introduce the novel Connected k-Center (CkC) problem, a clustering model taking into account attribute data as well as relationship data. We analyze the complexity of the problem and prove its NP-hardness. Therefore, we analyze the approximability of the problem and also present a constant factor approximation algorithm. For the special case of the CkC problem where the relationship data form a tree structure, we propose a dynamic programming method giving an optimal solution in polynomial time. We further present NetScan, a heuristic algorithm that is efficient and effective for large real databases. Our extensive experimental evaluation on real datasets demonstrates the meaningfulness and accuracy of the NetScan results.
KW - Approximation algorithms
KW - Attribute data
KW - Community identification
KW - Document clustering
KW - Joint cluster analysis
KW - Market segmentation
KW - NP-hardness
KW - Relationship data
UR - http://www.scopus.com/inward/record.url?scp=84859109801&partnerID=8YFLogxK
U2 - 10.1145/1376815.1376816
DO - 10.1145/1376815.1376816
M3 - Article
AN - SCOPUS:49149121323
SN - 1556-4681
VL - 2
SP - 1
EP - 35
JO - ACM Transactions on Knowledge Discovery from Data
JF - ACM Transactions on Knowledge Discovery from Data
IS - 2
ER -