LOOM: Optimal aggregation overlays for in-memory big data processing

William Culhane, Kirill Kogan, Chamikara Jayalath, Patrick Eugster

Research output: Contribution to conference › Paper › Peer-reviewed

8 Citations (Scopus)

Abstract

Aggregation underlies the distillation of information from big data. Many well-known basic operations, including top-k matching and word count, hinge on fast aggregation across large data sets. Common frameworks, including MapReduce, support aggregation but do not explicitly consider or optimize it. Optimizing aggregation, however, becomes even more relevant in recent “online” approaches to expressive big data analysis, which store data in main memory across nodes. This shifts the bottlenecks from disk I/O to distributed computation and network communication, and significantly increases the impact of aggregation time on total job completion time. This paper presents LOOM, a (sub)system for efficient big data aggregation for use within big data analysis frameworks. LOOM efficiently supports two-phased (sub)computations consisting of a first phase performed on individual data sub-sets (e.g., word count, top-k matching), followed by a second aggregation phase that consolidates the individual results of the first phase (e.g., count sum, top-k). Using characteristics of an aggregation function, LOOM constructs a specifically configured aggregation overlay to minimize aggregation costs. We present optimality heuristics and experimentally demonstrate the benefits of optimizing aggregation overlays in this way, using microbenchmarks and real-world examples.
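
As a rough illustration of the two-phase pattern the abstract describes, the sketch below computes word counts on individual data sub-sets and then consolidates them through a tree-shaped overlay. The names `local_word_count`, `merge_counts`, and `tree_aggregate`, and the `fan_in` parameter, are hypothetical and chosen only for this example. This is not LOOM's API, and the fixed fan-in here stands in for whatever overlay configuration LOOM would actually derive.

```python
from collections import Counter
from typing import Callable, List, TypeVar

T = TypeVar("T")


def local_word_count(subset: List[str]) -> Counter:
    """Phase 1: count words within one data sub-set (one node's share)."""
    counts = Counter()
    for line in subset:
        counts.update(line.split())
    return counts


def merge_counts(a: Counter, b: Counter) -> Counter:
    """Aggregation function: sum two partial word counts."""
    return a + b


def tree_aggregate(partials: List[T], merge: Callable[[T, T], T], fan_in: int) -> T:
    """Phase 2: merge partial results level by level.

    Each parent in the (assumed) tree overlay merges up to `fan_in` children.
    """
    if not partials:
        raise ValueError("need at least one partial result")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            acc = group[0]
            for item in group[1:]:
                acc = merge(acc, item)
            next_level.append(acc)
        level = next_level
    return level[0]


# Four sub-sets, as if held in memory on four nodes (illustrative data).
subsets = [
    ["a b a", "c"],
    ["b b"],
    ["a c c"],
    ["d"],
]
partials = [local_word_count(s) for s in subsets]
total = tree_aggregate(partials, merge_counts, fan_in=2)
print(total["a"], total["b"], total["c"], total["d"])
```

Changing `fan_in` trades tree depth against the merge work done at each parent. According to the abstract, LOOM selects its overlay configuration from characteristics of the aggregation function; this sketch does not attempt to model that selection.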

Original language: English
Publication status: Published - 2014
Externally published: Yes
Event: 6th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2014 - Philadelphia, United States
Duration: 17 June 2014 to 18 June 2014

Conference

Conference: 6th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2014
Country/Territory: United States
City: Philadelphia
Period: 17/06/14 to 18/06/14
