TY - JOUR
T1 - Counting stars and other small subgraphs in sublinear-time
AU - Gonen, Mira
AU - Ron, Dana
AU - Shavitt, Yuval
PY - 2011
Y1 - 2011
N2 - Detecting and counting the number of copies of certain subgraphs (also known as network motifs or graphlets) is motivated by applications in a variety of areas ranging from biology to the study of the World Wide Web. Several polynomial-time algorithms have been suggested for counting or detecting the number of occurrences of certain network motifs. However, a need for more efficient algorithms arises when the input graph is very large, as is indeed the case in many applications of motif counting. In this paper we design sublinear-time algorithms for approximating the number of copies of certain constant-size subgraphs in a graph G. That is, our algorithms do not read the whole graph, but rather query parts of the graph. Specifically, we consider algorithms that may query the degree of any vertex of their choice and may ask for any neighbor of any vertex of their choice. The main focus of this work is on the basic problem of counting the number of length-2 paths and more generally on counting the number of stars of a certain size. Specifically, we design an algorithm that, given an approximation parameter 0 < ε < 1 and query access to a graph G, outputs an estimate ν̂s such that with high constant probability, (1 - ε)νs(G) < ν̂s < (1 + ε)νs(G), where νs(G) denotes the number of stars of size s + 1 in the graph. The expected query complexity and running time of the algorithm are O (n/(νs(G))1/s+1 + min {n1-1/s, n s-1/s/(νs(G))1-1/s}) · poly(log n, 1/ε). We also prove lower bounds showing that this algorithm is tight up to polylogarithmic factors in n and the dependence on ε. Our work extends the work of Feige [SIAM J. Comput., 35 (2006), pp. 964-984] and Goldreich and Ron [Random Structures Algorithms, 32 (2008), pp. 473-493] on approximating the number of edges (or average degree) in a graph. Combined with these results, our result can be used to obtain an estimate on the variance of the degrees in the graph and corresponding higher moments. In addition, we give some (negative) results on approximating the number of triangles and on approximating the number of length-3 paths in sublinear-time.
AB - Detecting and counting the number of copies of certain subgraphs (also known as network motifs or graphlets) is motivated by applications in a variety of areas ranging from biology to the study of the World Wide Web. Several polynomial-time algorithms have been suggested for counting or detecting the number of occurrences of certain network motifs. However, a need for more efficient algorithms arises when the input graph is very large, as is indeed the case in many applications of motif counting. In this paper we design sublinear-time algorithms for approximating the number of copies of certain constant-size subgraphs in a graph G. That is, our algorithms do not read the whole graph, but rather query parts of the graph. Specifically, we consider algorithms that may query the degree of any vertex of their choice and may ask for any neighbor of any vertex of their choice. The main focus of this work is on the basic problem of counting the number of length-2 paths and more generally on counting the number of stars of a certain size. Specifically, we design an algorithm that, given an approximation parameter 0 < ε < 1 and query access to a graph G, outputs an estimate ν̂s such that with high constant probability, (1 - ε)νs(G) < ν̂s < (1 + ε)νs(G), where νs(G) denotes the number of stars of size s + 1 in the graph. The expected query complexity and running time of the algorithm are O (n/(νs(G))1/s+1 + min {n1-1/s, n s-1/s/(νs(G))1-1/s}) · poly(log n, 1/ε). We also prove lower bounds showing that this algorithm is tight up to polylogarithmic factors in n and the dependence on ε. Our work extends the work of Feige [SIAM J. Comput., 35 (2006), pp. 964-984] and Goldreich and Ron [Random Structures Algorithms, 32 (2008), pp. 473-493] on approximating the number of edges (or average degree) in a graph. Combined with these results, our result can be used to obtain an estimate on the variance of the degrees in the graph and corresponding higher moments. In addition, we give some (negative) results on approximating the number of triangles and on approximating the number of length-3 paths in sublinear-time.
KW - Approximate counting
KW - Subgraphs
KW - Sublinear-time algorithms
UR - http://www.scopus.com/inward/record.url?scp=82955197378&partnerID=8YFLogxK
U2 - 10.1137/100783066
DO - 10.1137/100783066
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
AN - SCOPUS:82955197378
SN - 0895-4801
VL - 25
SP - 1365
EP - 1411
JO - SIAM Journal on Discrete Mathematics
JF - SIAM Journal on Discrete Mathematics
IS - 3
ER -