In this section we present a new scheduling model applicable to many parallel architectures. We analyze the scheduling of divisible tasks, i.e. tasks that can be divided into parts of arbitrary size. These parts can be processed in parallel and independently of each other. In other words, divisible tasks are parallel applications which include no data dependencies (precedence constraints) and whose granularity of parallelism is fine.
Before going into details we motivate the model and introduce basic concepts. Consider, as an example, searching for a record in a huge database with thousands (or more) of records. This can be done by cooperating processors. The database file can be divided into parts with a granularity of one record. Note that the granularity of the division is fine compared to the total size of the database. The search can be conducted in each part independently of the other parts. Thus, the two assumptions of the divisible task model are fulfilled. Finally, the results are reported to a master processor. The same approach can be applied when searching for a pattern in a text, graphics, audio, etc. file. A similar situation takes place when sorting a database file in a distributed way. Further examples of divisible tasks are related to data parallelism: processing big measurement data files [CR88], image and signal processing [BGMR96], simulations of molecular dynamics [ADKMY93], some problems of linear algebra on big matrices [BT90], solving partial differential equations by the finite element method [Wi91], and many other engineering and scientific problems. Observe that similar assumptions on the divisibility of the work are made in the loop scheduling [LSL94,ML94] and load balancing [G91,GKR92,LM92,XH93] models of parallel applications.
Now, we outline the process of data dissemination and processing.
A parallel computer consists of $m$ processing elements (PEs), each of which comprises a processor and local memory, and is capable of communicating over the interconnection network (either via an independent network processor, or by software run on the processor itself). The notions of a processor from scheduling theory and a processing element are equivalent here. Only when a PE has a network processor is it capable of computing and communicating simultaneously.
Initially, the whole volume $V$ of data (or work) to be processed resides in one processor called the originator. The originator processes $\alpha_1$ data units locally and sends the rest (i.e. $V-\alpha_1$) to its idle neighbors. Each processing element intercepts some data from the received volume for local computing and sends the rest to its idle neighbors. Thus, PE number $i$ (denoted by $P_i$) intercepts and processes locally $\alpha_i$ data units, which lasts $\alpha_i A_i$ units of time. $A_i$ represents the processing rate, i.e. the reciprocal of the speed, of $P_i$, for $i=1,\ldots,m$. The transmission time of $x$ data units over communication link $i$ joining two processors is $S_i+xC_i$, where $S_i$ is the startup time that is spent to initiate the communication, and $C_i$ is the transmission rate. Our goal is to find such a distribution of task parts (i.e. the $\alpha_i$'s) that the communications and computations are finished in the shortest time.
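Under the linear cost model just described, the shortest-schedule distribution can be computed numerically. The following sketch is our illustration (not taken from the cited works): it assumes a chain of $m$ PEs with network processors, so each PE starts computing its share while simultaneously forwarding the remainder, and all PEs finish at the same moment. Symbols follow the section: `alpha` (load shares), `A` (processing rates), `C`, `S` (link transmission rates and startups), `V` (total load).

```python
# Hedged sketch: simultaneous-finish load distribution on a chain of PEs.
# Assumed model: P_1 (index 0 here) is the originator; P_i keeps alpha_i and
# forwards the rest over link i, which costs S_i + x*C_i for x data units.

def chain_distribution(V, A, C, S):
    """Find alpha_0..alpha_{m-1} such that all PEs finish simultaneously.

    Equal-finish condition for neighbours i and i+1:
        alpha[i]*A[i] = S[i] + C[i]*rest[i+1] + alpha[i+1]*A[i+1],
    where rest[i+1] = alpha[i+1] + ... + alpha[m-1].
    Each alpha[i] is expressed as an affine function p[i]*alpha_last + q[i]
    of the last share; then sum(alpha) = V fixes alpha_last.
    """
    m = len(A)
    p = [0.0] * m
    q = [0.0] * m
    p[m - 1], q[m - 1] = 1.0, 0.0
    sp, sq = 1.0, 0.0  # affine representation of rest[i+1]
    for i in range(m - 2, -1, -1):
        p[i] = (C[i] * sp + A[i + 1] * p[i + 1]) / A[i]
        q[i] = (S[i] + C[i] * sq + A[i + 1] * q[i + 1]) / A[i]
        sp += p[i]
        sq += q[i]
    alpha_last = (V - sq) / sp  # enforce sum(alpha) == V
    return [p[i] * alpha_last + q[i] for i in range(m)]
```

For a homogeneous three-PE chain ($A_i=1$, $C_i=0.1$, $S_i=0$, $V=30$) the originator keeps the largest share and the last PE the smallest, reflecting the communication delay accumulated along the chain.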
For simplicity of the presentation we assume that the processing time depends
linearly on the volume of processed data.
It will be demonstrated that this assumption can be relaxed.
The above description still leaves room for details, e.g. a communication algorithm tailored to the interconnection network.
In the following sections we present solutions for different
network topologies.
Observe that when no results are returned to the originator, all the processors must stop working at the same moment of time. Intuitively, if some processor $P_i$ finished earlier, it would be possible to off-load the other PEs by moving some part of the load to $P_i$, and thus the whole schedule would be shortened.
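The simultaneous-finish argument can be checked directly on the smallest instance. As a worked example (our illustration, in the notation of this section), take the originator $P_1$ with processing rate $A_1$ and a single neighbor $P_2$ with rate $A_2$, joined by a link with startup $S_1$ and transmission rate $C_1$:

```latex
% Equal finish times: P_1 computes \alpha_1 A_1; P_2 first receives its
% share (startup S_1, rate C_1), then computes \alpha_2 A_2.
\alpha_1 A_1 = S_1 + \alpha_2 C_1 + \alpha_2 A_2, \qquad \alpha_1 + \alpha_2 = V
% Substituting \alpha_1 = V - \alpha_2 and solving for \alpha_2:
\alpha_2 = \frac{V A_1 - S_1}{A_1 + A_2 + C_1}, \qquad \alpha_1 = V - \alpha_2
```

Decreasing $\alpha_2$ below this value leaves $P_2$ idle while $P_1$ still computes; increasing it delays $P_2$'s finish. Either way the schedule length grows, which is the intuition behind the observation.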
This observation has been proved both for particular
interconnections [CR88,SR93] and for a general type of
interconnection [BD97].
Assuming that the results are not returned
simplifies the presentation.
However, we show in the following sections that the case where results are returned to the originator can also be included in our model.
Unless stated otherwise we assume that the number
of divisible tasks is equal to one.
The process of distributing the workload, i.e. a distribution of information in which each PE receives different data, is called scattering.
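A minimal sketch of this definition (our illustration): scattering hands every PE a different piece of the data, in contrast to a broadcast, where every PE receives the same data.

```python
# Illustrative sketch of scattering: PE i receives shares[i] consecutive
# data units, so no two PEs receive the same piece.

def scatter(data, shares):
    """Split 'data' into one distinct piece per PE, of the given sizes."""
    pieces, pos = [], 0
    for s in shares:
        pieces.append(data[pos:pos + s])
        pos += s
    return pieces
```

For example, `scatter(list(range(6)), [3, 2, 1])` yields `[[0, 1, 2], [3, 4], [5]]`: three PEs, three different pieces.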
This part of the book is organized as follows. In Section "Scheduling Divisible Task in Processor Networks" we present the results of applying the divisible task idea in various distributed systems. We concentrate on basic models, and then show how to expand them to cover more complex cases. Performance evaluation is one of the applications of the divisible task model. Examples of such an evaluation are presented in Section "Performance Prediction Based on the Divisible Task Model - example". Section "Validation of the Model" describes some results of verifying the divisible task model experimentally.