Implicit Coscheduling Simulations Abstract
Effective Distributed Scheduling of Parallel Workloads
We present a distributed algorithm for time-sharing parallel workloads
that is competitive with coscheduling. Implicit scheduling allows each
local scheduler in the system to make independent decisions that
dynamically coordinate the scheduling of cooperating processes across
processors. Of particular importance is the blocking algorithm which
decides the action of a process waiting for a communication or
synchronization event to complete. Through simulation of
bulk-synchronous parallel applications, we find that a simple two-phase
fixed-spin blocking algorithm performs well; a two-phase adaptive
algorithm that gathers run-time data on barrier wait-times performs
slightly better. Our results hold for a range of machine parameters and
parallel program characteristics. These findings are in direct contrast
to the literature that states explicit coscheduling is necessary for
fine-grained programs. We show that the choice of the local scheduler
is crucial, with a priority-based scheduler performing two to three
times better than a round-robin scheduler. Overall, we find that the
performance of implicit scheduling is near that of coscheduling
(+/- 35%), without the requirement of explicit, global coordination.