TY - GEN
T1 - Optimal communication structures for big data aggregation
AU - Culhane, William
AU - Kogan, Kirill
AU - Jayalath, Chamikara
AU - Eugster, Patrick
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015/8/21
Y1 - 2015/8/21
N2 - Aggregation of computed sets of results fundamentally underlies the distillation of information in many of today's big data applications. To this end there are many systems which have been introduced which allow users to obtain aggregate results by aggregating along communication structures such as trees, but they do not focus on optimizing performance by optimizing the underlying structure to perform the aggregation. We consider two cases of the problem - aggregation of (1) single blocks of data, and of (2) streaming input. For each case we determine which metric of 'fast' completion is the most relevant and mathematically model resulting systems based on aggregation trees to optimize that metric. Our assumptions and model are laid out in depth. From our model we determine how to create a provably ideal aggregation tree (i.e., with optimal fanin) using only limited information about the aggregation function being applied. Experiments in the Amazon Elastic Compute Cloud (EC2) confirm the validity of our models in practice.
AB - Aggregation of computed sets of results fundamentally underlies the distillation of information in many of today's big data applications. To this end there are many systems which have been introduced which allow users to obtain aggregate results by aggregating along communication structures such as trees, but they do not focus on optimizing performance by optimizing the underlying structure to perform the aggregation. We consider two cases of the problem - aggregation of (1) single blocks of data, and of (2) streaming input. For each case we determine which metric of 'fast' completion is the most relevant and mathematically model resulting systems based on aggregation trees to optimize that metric. Our assumptions and model are laid out in depth. From our model we determine how to create a provably ideal aggregation tree (i.e., with optimal fanin) using only limited information about the aggregation function being applied. Experiments in the Amazon Elastic Compute Cloud (EC2) confirm the validity of our models in practice.
UR - http://www.scopus.com/inward/record.url?scp=84954206416&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM.2015.7218544
DO - 10.1109/INFOCOM.2015.7218544
M3 - Conference contribution
AN - SCOPUS:84954206416
T3 - Proceedings - IEEE INFOCOM
SP - 1643
EP - 1651
BT - 2015 IEEE Conference on Computer Communications, IEEE INFOCOM 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 34th IEEE Annual Conference on Computer Communications and Networks, IEEE INFOCOM 2015
Y2 - 26 April 2015 through 1 May 2015
ER -