TY - GEN
T1 - Joint cluster analysis of attribute data and relationship data
T2 - Sixth SIAM International Conference on Data Mining
AU - Ester, Martin
AU - Ge, Rong
AU - Gao, Byron J.
AU - Hu, Zengjian
AU - Ben-Moshe, Boaz
PY - 2006
Y1 - 2006
N2 - Attribute data and relationship data are two principal types of data, representing the intrinsic and extrinsic properties of entities. While attribute data has been the main source of data for cluster analysis, relationship data such as social networks or metabolic networks are becoming increasingly available. It is also common to observe that both data types carry orthogonal information, such as in market segmentation and community identification, which calls for a joint cluster analysis of both data types so as to achieve more accurate results. For this purpose, we introduce the novel Connected k-Center problem, taking into account attribute data as well as relationship data. We analyze the complexity of this problem and prove its NP-completeness. We also present a constant-factor approximation algorithm, based on which we further design NetScan, a heuristic algorithm that is efficient for large, real databases. Our experimental evaluation demonstrates the meaningfulness and accuracy of the NetScan results.
AB - Attribute data and relationship data are two principal types of data, representing the intrinsic and extrinsic properties of entities. While attribute data has been the main source of data for cluster analysis, relationship data such as social networks or metabolic networks are becoming increasingly available. It is also common to observe that both data types carry orthogonal information, such as in market segmentation and community identification, which calls for a joint cluster analysis of both data types so as to achieve more accurate results. For this purpose, we introduce the novel Connected k-Center problem, taking into account attribute data as well as relationship data. We analyze the complexity of this problem and prove its NP-completeness. We also present a constant-factor approximation algorithm, based on which we further design NetScan, a heuristic algorithm that is efficient for large, real databases. Our experimental evaluation demonstrates the meaningfulness and accuracy of the NetScan results.
UR - http://www.scopus.com/inward/record.url?scp=33745484604&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972764.22
DO - 10.1137/1.9781611972764.22
M3 - Conference contribution
AN - SCOPUS:33745484604
SN - 089871611X
SN - 9780898716115
T3 - Proceedings of the Sixth SIAM International Conference on Data Mining
SP - 246
EP - 257
BT - Proceedings of the Sixth SIAM International Conference on Data Mining
Y2 - 20 April 2006 through 22 April 2006
ER -