Data-intensive jobs must process huge amounts of data, which makes them well suited to cloud environments. A cloud environment provides both computing and storage resources by connecting thousands of servers together. Typically, a data-intensive job splits its dataset into multiple data chunks, distributes the chunks across machines, and then launches tasks to process them. Multiple jobs may run simultaneously in a cloud computing system, and schedulers allocate resources among these jobs but typically do not take job deadlines into account. When users submit jobs to a cloud environment, they expect the cloud service provider to meet their Service Level Agreement (SLA), e.g., to meet their jobs’ deadlines. Although a number of task-scheduling strategies have been developed to improve system response time and overall cloud performance, none provides a quantitative framework that optimizes speculative execution while offering guaranteed SLAs to meet application deadlines. The dissertation, advised by Dr. Suresh Subramaniam, presents novel frameworks and approaches that meet job deadlines via job and task scheduling algorithms.
Maotong Xu’s dissertation, titled “Deadline-aware Job and Task Scheduling in Cloud Environment,” centers on a range of exciting new algorithms that optimize data placement, task scheduling, and recovery from task execution failures in cloud environments, while meeting jobs’ deadlines under different constraints.
The first part of the dissertation considers cloud right-sizing under execution deadlines and data locality constraints. Xu presents a novel framework, CRED, which harnesses workload-aware chunk placement to partition data chunks based on their workload and schedules jobs to efficiently utilize both storage space and computing resources on active nodes, thereby minimizing the number of nodes required to process all jobs while meeting users’ SLA requirements.
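CRED’s exact formulation is not reproduced in this summary; the sketch below is only a hypothetical illustration of the underlying idea of workload-aware chunk placement, packing chunks onto as few active nodes as possible subject to per-node storage and compute limits. All class names, parameters, and numbers are illustrative assumptions, not CRED’s actual model.

```python
# Hypothetical greedy illustration of workload-aware chunk placement;
# not the CRED algorithm itself.
from dataclasses import dataclass, field

@dataclass
class Node:
    storage_capacity: int        # how many chunks the node can store
    compute_capacity: float      # total chunk workload the node can process in time
    chunks: list = field(default_factory=list)

def place_chunks(chunks, storage_capacity, compute_capacity):
    """Greedy first-fit placement: sort chunks by estimated workload (heaviest
    first) and pack each onto the first active node with spare storage and
    compute; open a new node only when no active node fits. Fewer returned
    nodes corresponds to a smaller right-sized cluster."""
    nodes = []
    for chunk_id, workload in sorted(chunks, key=lambda c: c[1], reverse=True):
        for node in nodes:
            used = sum(w for _, w in node.chunks)
            if len(node.chunks) < node.storage_capacity and used + workload <= node.compute_capacity:
                node.chunks.append((chunk_id, workload))
                break
        else:
            node = Node(storage_capacity, compute_capacity)
            node.chunks.append((chunk_id, workload))
            nodes.append(node)
    return nodes

if __name__ == "__main__":
    # (chunk_id, estimated workload) pairs -- illustrative numbers only
    demo_chunks = [("c1", 4.0), ("c2", 1.5), ("c3", 3.0), ("c4", 2.0), ("c5", 0.5)]
    active = place_chunks(demo_chunks, storage_capacity=3, compute_capacity=6.0)
    print(f"{len(active)} active nodes needed")
```

A first-fit heuristic like this only approximates the right-sizing objective; CRED itself jointly considers chunk placement and job scheduling under deadline and data locality constraints.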
The second part of the dissertation extends this work to offer guaranteed SLAs for application deadlines. Xu proposes Chronos, a unifying optimization framework that provides probabilistic guarantees for deadline-critical MapReduce jobs. Chronos brings together several speculative scheduling strategies under a single optimization formulation and defines a new metric, Probability of Completion before Deadline (PoCD), to quantify the probability that MapReduce jobs meet their desired deadlines.
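To make the PoCD metric concrete, the following is a minimal sketch under a strong simplifying assumption, namely that each task copy’s completion time is an independent exponential random variable; Chronos’s actual execution-time model and the way it chooses how many speculative copies to launch are not reproduced here.

```python
# Toy PoCD calculation: probability that a job with N parallel tasks finishes
# before deadline D when each task is launched with r speculative copies.
# The Exponential(rate) task-time model is an assumption for illustration only.
import math

def pocd(num_tasks: int, copies: int, rate: float, deadline: float) -> float:
    """A task finishes as soon as its fastest copy does, so it misses the
    deadline only if all of its copies exceed D; the job meets the deadline
    only if every task finishes in time."""
    p_task_late = math.exp(-copies * rate * deadline)   # all copies exceed D
    return (1.0 - p_task_late) ** num_tasks             # every task must finish

if __name__ == "__main__":
    for r in (1, 2, 3):
        p = pocd(num_tasks=100, copies=r, rate=0.5, deadline=10.0)
        print(f"copies={r}: PoCD = {p:.4f}")
```

Even this toy model exposes the trade-off an optimization framework like Chronos must navigate: launching more speculative copies raises PoCD but consumes additional cluster resources.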
The third part of the dissertation presents LASER, a deep learning approach for speculative execution and replication of deadline-critical jobs. Machine learning has been used successfully to solve a large variety of classification and prediction problems; in particular, deep neural networks (DNNs) can provide more accurate regression (prediction) than traditional machine learning algorithms, and LASER builds on this capability. The final part of the dissertation presents a new scheduling framework for training machine learning (ML) and deep learning (DL) models, which combines Bayesian Optimization (BO) with Reinforcement Learning (RL).
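As a rough illustration of the kind of DNN regression LASER relies on, the sketch below trains a small multilayer perceptron to predict a task’s remaining execution time from runtime features; the feature set, network architecture, and synthetic data are assumptions made for this example and are not taken from the dissertation.

```python
# Hypothetical DNN-based regression for task completion-time prediction,
# in the spirit of (but not identical to) LASER's predictor.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic training data: [input size (GB), progress fraction, node load] -> remaining time (s)
X = rng.uniform([1.0, 0.1, 0.0], [100.0, 1.0, 1.0], size=(2000, 3))
y = X[:, 0] * (1 - X[:, 1]) * (1 + 2 * X[:, 2]) + rng.normal(0, 1, 2000)

# A small two-hidden-layer network is enough for this toy regression task.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
model.fit(X, y)

# Predict remaining time for a 50 GB task at 30% progress on a half-loaded node;
# a scheduler could use such estimates to decide which tasks to speculate on.
print(model.predict([[50.0, 0.3, 0.5]]))
```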
Dr. Maotong Xu went on to join Facebook HQ as a Research Scientist. His current research focuses on optimization for News Feed ranking and delivery.