Implicit Coscheduling: From Simulation to Implementation and Back Again

Andrea C. Arpaci-Dusseau


Building fault-tolerant, scalable services in a distributed system has typically required complex implementations. The use of implicit information can greatly simplify the construction of services that require coordination. With implicit information, cooperating distributed clients observe naturally occurring events to infer information about one another, rather than communicating explicitly.

In this work, we focus on the use of implicit information for scheduling communicating jobs in a cluster of workstations. The traditional time-sharing approach for communicating jobs has been explicit coscheduling, where processes of a parallel job are explicitly run at the same time across processors. We propose an alternative, implicit coscheduling, where local schedulers in the system dynamically coordinate scheduling by observing only implicit information, such as round-trip time and arrival rate of messages.
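To make the idea concrete, the following is a minimal sketch of the kind of purely local decision a process might make while waiting for a reply, using only the two implicit signals named above. It is an illustration under stated assumptions, not the implementation described in the talk: the function name `wait_action`, the `slack` factor, and the specific thresholds are all hypothetical.

```python
def wait_action(elapsed_us: float,
                expected_rtt_us: float,
                recent_arrivals: int,
                slack: float = 2.0) -> str:
    """Decide whether a waiting process should keep spinning or block.

    Hypothetical sketch: every parameter and threshold is an assumption
    made for illustration, not a value taken from the talk.

    elapsed_us       -- time spent waiting for the reply so far
    expected_rtt_us  -- round-trip time observed when the remote
                        process is known to be running
    recent_arrivals  -- messages received from other processes during
                        the most recent spin interval
    slack            -- tolerance multiplier on the expected round trip
    """
    # A reply within roughly one round trip implies the remote process
    # is currently scheduled, so it pays to stay on the processor.
    if elapsed_us <= slack * expected_rtt_us:
        return "spin"
    # Incoming messages suggest that other processes of the job are
    # running right now, so remaining scheduled is still worthwhile.
    if recent_arrivals > 0:
        return "spin"
    # No reply and no traffic: the remote side is probably descheduled,
    # so give up the processor instead of wasting cycles.
    return "block"
```

For example, under these assumed thresholds `wait_action(100, 20, 0)` chooses to block, while `wait_action(100, 20, 3)` keeps spinning because messages are still arriving. No scheduler ever exchanges an explicit "I am running" message; each one acts only on what it observes locally.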

In this talk, we describe our experience with developing implicit coscheduling, from our initial simulations, to our prototype implementation, and back to more simulations. In addition to showing the performance of implicit coscheduling on a variety of workloads, we highlight the strengths and weaknesses of using simulation and implementation for evaluation.

