Reliability, security and stability of cloud services without sacrificing too much resources have become a desired feature in the area of workload management in clouds. The paper proposes and evaluates a lightweight framework for scheduling a workload which part could be unreliable. This unreliability could be caused by various types of failures or attacks. Our framework for robust workload scheduling efficiently combines classic fault-tolerant and security tools, such as packet/job scanning, with workload scheduling, and it does not use any heavy resource-consuming tools, e.g., cryptography or non-linear optimization. More specifically, the framework uses a novel objective function to allocate jobs to servers and constantly decides which job to scan based on a formula associated with the objective function. We show how to set up the objective function and the corresponding scanning procedure to make the system provably stable, provided it satisfies a specific stability condition. As a result, we show that our framework assures cloud stability even if naive scanning-all and scanning-none strategies are not stable. We extend the framework to decentralized scheduling and evaluate it under several popular routing procedures.
View on arXiv