Workshop on Scheduling for Parallel Computing

Summary of discussion

After the session on Tuesday (15th of September), we discussed the future and the challenges of scheduling for parallel computing. The following issues were raised:

There are apparent differences in scheduling between the levels of abstraction in a parallel computing system. For example, the internals of an application are known to the programmer and the compiler, so the use of the underlying hardware platform can be tuned by application-level scheduling. An operating system must quickly schedule threads on CPUs (cores) without knowing much about the threads themselves. Resource managers schedule processors, memory, and bandwidth to guarantee fairness, meet deadlines, etc. Grid schedulers integrate heterogeneous parallel systems belonging to different organizations in order to minimize mean job waiting time, meet deadlines, maximize resource utilization, etc. All these levels of scheduling use different abstractions of the resources, have different information about the jobs, have different instrumentation to implement the schedules, and operate on different time scales. Hence, the levels of scheduling could be viewed as a stack of scheduling abstractions (and algorithms), analogous to the OSI/ISO communication stack model. The question is how to define such a scheduling stack (i.e., how to delegate scheduling responsibilities between the levels and define their interfaces) so that it suits current and future architectures and applications.
The discussion also addressed the future of computing hardware and its implications for scheduling. For example, it was speculated that the number of cores (or GPU processors) in a standard desktop computer (or laptop) will grow. If the number of processors reaches hundreds or thousands, will scheduling on such resources pose a problem for operating system developers? On the one hand, this should not be a problem, because processing power will be in excess and a typical user does not care much about the (in)efficiency of his or her computer. On the other hand, if a given computing platform (e.g., a PC) is used by millions of users, then the same scheduling inefficiency is repeated millions of times. Hence, optimizing resource use (energy, performance-to-cost ratio) can benefit hardware and OS producers, computer owners, and society as a whole (through a smaller environmental footprint). Ultimately, can we expect a grid on a laptop, with the scheduling problems typical of contemporary grids?
It was pointed out that as the level of abstraction in scheduling parallel systems grows, the efficiency of the schedules decreases. In other words, schedules constructed at high levels of abstraction gradually lose opportunities for efficient use of the hardware. On the other hand, it was noted that many users accept this loss of performance in exchange for ease of use and functionality. It was also noted that scheduling theory is excessively focused on processor utilization, while the available computing power is already large and can be expected to increase further in the future.

The above summary was written by M. Drozdowski.
Minutes of what was said are given here (by J. Berlińska).
If any details of the discussion have been omitted, or if you would like to add a comment, please write an email to Maciej Drozdowski.

Things to ponder

A questionnaire was distributed among the participants of the discussion. The questions and the answers are given below.

What are the biggest challenges of scheduling for parallel computing (SPC)?
Answers:
- Architecture dependency.
- Availability of advanced reservations.
- "Optimal/improved/good" schedules (optimality is usually unattainable in practice because of NP-hardness).
- New hardware architectures including parallel many-core systems, GPGPU, etc.
- A lack of dynamic scheduling techniques at the application and operating-system levels to deal with hierarchical and heterogeneous computing.
- Energy-aware scheduling algorithms.
- Multicriteria scheduling methods.
- Good results from [scheduling] algorithms despite poor knowledge of processing times.
- Complexity of real problems (in modeling, not computational complexity).
- Decomposition of real problems into parts that are simple to represent and solve.
What are the practical scheduling problems that should be solved?
Answers:
- Self-tuning algorithms that provide (provably) close-to-optimal results for a wide range of applications.
- New scheduling modules implemented in operating systems and HPC queuing systems, taking into account new application requirements, new hardware architectures, and their constraints.
- Scheduling of applications on many-core and hybrid systems.
- Scheduling in large-scale heterogeneous environments, taking into account data management.
- Optimization of costs (e.g., energy, cost of ownership) on platforms with large numbers of machines.
- Diversity of models, lack of uniform approaches.
What are the obstacles in applying theoretical scheduling results in practice?
Answers:
- Limited availability of libraries/reference implementations.
- Unavailability of advanced reservations.
- Insufficient or unreliable information about resources/services.
- A lack of simulations and emulations based on reference benchmarks and real workloads.
- Difficulty accessing the sources of practical applications and systems.
- Lack of fine-grain parallelization in the majority of applications.
- Lack of good estimates of job execution times.
- Optimization time [is] often too long; a decision is needed immediately.
- Lack of input data for scheduling algorithms, lack of instrumentation to implement the schedules.
What is the most obsolete subject in SPC?
Answers:
- Focus on theoretical models for relatively small problem instances.
- Load balancing in grids and clusters.
- Taking into account low-level communication hardware or communication network structure.
What is the most promising subject in SPC?
Answers:
- Multiobjective scheduling of workflows.
- Hierarchical and many-level scheduling in parallel and distributed systems.
- Application-level dynamic scheduling for parallel threads and processes.
- Probably fine-grain scheduling of applications on new architectures (many-cores, accelerators, hybrid systems), due to the large range of applications (very high impact).
What question should we have asked?
Answers:
- What to do "not to reinvent the wheel", and how to implement more applicable modules and scheduling approaches in practice.