TY - JOUR
T1 - High Throughput VMs Placement with Constrained Communication Overhead and Provable Guarantees
AU - Cohen, Itamar
AU - Einziger, Gil
AU - Goldstein, Maayan
AU - Sa'ar, Yaniv
AU - Scalosub, Gabriel
AU - Waisbard, Erez
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2023/9/1
Y1 - 2023/9/1
AB - Placement of VMs in the cloud is one of the most fundamental problems in systems research. Traditionally, placement algorithms assume that the schedulers have complete information about the currently available resources at each host. However, this assumption is often unrealistic, as gathering fresh status information from each of the thousands of hosts in a large data center incurs excessive communication overhead, which results in long queueing delays. Efforts to resolve this problem by employing several parallel schedulers typically suffer from collisions when multiple schedulers simultaneously try to place VMs on the same host. Our work analyzes the performance of various placement algorithms and provides empirical evidence that using multiple randomized schedulers achieves high throughput while significantly decreasing both the communication overhead and the number of collisions between schedulers. We therefore introduce Adaptive Partial State Random (APSR), an efficient parallel random resource management algorithm that samples only a small number of hosts and dynamically adjusts the degree of parallelism to provide provable guarantees on the probability of collisions between distinct schedulers. We formally analyze APSR, evaluate it on real workloads, and integrate it into the popular OpenStack cloud management platform. Our evaluation shows that APSR matches the throughput of other parallel schedulers while achieving up to a 13x lower decline ratio and a reduction of over 85% in communication overhead.
KW - Cloud computing services
KW - cloud computing and cloud storage
KW - distributed management
KW - performance management
UR - http://www.scopus.com/inward/record.url?scp=85147286763&partnerID=8YFLogxK
U2 - 10.1109/TNSM.2023.3238644
DO - 10.1109/TNSM.2023.3238644
M3 - Article
AN - SCOPUS:85147286763
SN - 1932-4537
VL - 20
SP - 3148
EP - 3161
JO - IEEE Transactions on Network and Service Management
JF - IEEE Transactions on Network and Service Management
IS - 3
ER -