Abstract
Aggregation underlies the distillation of information from big data. Many well-known basic operations, including top-k matching and word count, hinge on fast aggregation across large datasets. Common frameworks, including MapReduce, support aggregation but do not explicitly consider or optimize it. Optimizing aggregation, however, becomes even more relevant in recent “online” approaches to expressive big data analysis, which store data in main memory across nodes. This shifts the bottlenecks from disk I/O to distributed computation and network communication, and significantly increases the impact of aggregation time on total job completion time. This paper presents LOOM, a (sub)system for efficient big data aggregation for use within big data analysis frameworks. LOOM efficiently supports two-phase (sub)computations consisting of a first phase performed on individual data subsets (e.g., word count, top-k matching), followed by a second aggregation phase that consolidates the individual results of the first phase (e.g., count sum, top-k). Using characteristics of an aggregation function, LOOM constructs a specifically configured aggregation overlay to minimize aggregation costs. We present optimality heuristics and experimentally demonstrate the benefits of thus optimizing aggregation overlays using microbenchmarks and real-world examples.
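The two-phase pattern described in the abstract can be illustrated with a small sketch. This is not LOOM's implementation: the function names (`local_word_count`, `merge_counts`, `tree_aggregate`) and the fixed `fan_in` parameter are hypothetical, chosen only for illustration. The abstract says LOOM derives its overlay configuration from characteristics of the aggregation function, whereas here the fan-in is simply passed in by the caller.

```python
from collections import Counter
from functools import reduce


def local_word_count(chunk: str) -> Counter:
    """Phase 1: compute a partial result on one data subset."""
    return Counter(chunk.split())


def merge_counts(a: Counter, b: Counter) -> Counter:
    """Phase 2 aggregation function: sum the counts of two partial results."""
    return a + b


def tree_aggregate(partials, merge, fan_in=2):
    """Combine partial results level by level, as in a tree-shaped overlay.

    Each pass merges groups of up to `fan_in` results, mimicking one level
    of interior nodes. `fan_in` is an illustrative knob, not a LOOM setting.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    if not level:
        raise ValueError("need at least one partial result")
    while len(level) > 1:
        level = [
            reduce(merge, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


chunks = ["a b a", "b c", "a c c", "d"]
partials = [local_word_count(c) for c in chunks]
total = tree_aggregate(partials, merge_counts, fan_in=2)
```

Because summing counts is associative and commutative, any fan-in yields the same final result; only the number of levels and the work per node change, which is the kind of trade-off an aggregation overlay can tune.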
Original language | English |
---|---|
Publication status | Published - 2014 |
Externally published | Yes |
Event | 6th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2014 - Philadelphia, United States Duration: 17 Jun 2014 → 18 Jun 2014 |
Conference
Conference | 6th USENIX Workshop on Hot Topics in Cloud Computing, HotCloud 2014 |
---|---|
Country/Territory | United States |
City | Philadelphia |
Period | 17/06/14 → 18/06/14 |