Optimal communication structures for big data aggregation

William Culhane, Kirill Kogan, Chamikara Jayalath, Patrick Eugster

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

15 Scopus citations

Abstract

Aggregation of computed sets of results fundamentally underlies the distillation of information in many of today's big data applications. Many systems have been introduced that let users obtain aggregate results by aggregating along communication structures such as trees, but they do not focus on optimizing performance by optimizing the underlying structure used to perform the aggregation. We consider two cases of the problem: aggregation of (1) single blocks of data and (2) streaming input. For each case we determine which metric of 'fast' completion is the most relevant and mathematically model the resulting systems based on aggregation trees to optimize that metric. Our assumptions and model are laid out in depth. From our model we determine how to create a provably ideal aggregation tree (i.e., with optimal fanin) using only limited information about the aggregation function being applied. Experiments in the Amazon Elastic Compute Cloud (EC2) confirm the validity of our models in practice.

Original language: English
Title of host publication: 2015 IEEE Conference on Computer Communications, IEEE INFOCOM 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1643-1651
Number of pages: 9
ISBN (Electronic): 9781479983810
State: Published - 21 Aug 2015
Externally published: Yes
Event: 34th IEEE Annual Conference on Computer Communications and Networks, IEEE INFOCOM 2015 - Hong Kong, Hong Kong
Duration: 26 Apr 2015 to 1 May 2015

Publication series

Name: Proceedings - IEEE INFOCOM
Volume: 26
ISSN (Print): 0743-166X

Conference

Conference: 34th IEEE Annual Conference on Computer Communications and Networks, IEEE INFOCOM 2015
Country/Territory: Hong Kong
City: Hong Kong
Period: 26/04/15 to 1/05/15
