Meshes

In a regular non-toroidal

-dimensional mesh each PE is directly coupled up to

neighbors (cf. Fig.5). In the toroidal mesh PEs at the mesh opposite 'sides' are connected, and each PE is connected with exactly

neighbors. We assume here that communication links and processors are identical, PEs are equipped with network processors, results are not returned to the originator located in the center of the network. Each PE is capable of simultaneous communication over $p\in\{1,\dots,2d\}$ links (ports). We will call by a layer the set of PEs activated in the same stage of data distribution.

Two-dimensional rectangular mesh networks with store-and-forward communication mode were considered in [BD96]. A scattering method based on routing messages only between the nearest neighbors was presented. Thus, the layers of PEs form squares around the originator. In [BDGT99] a more efficient method based on circuit-switched routing was proposed for a two-dimensional mesh. Scattering methods were further extended in [D97] to the case of a 3-dimensional mesh with $p=1,\dots,6$ . Due to space limitations we describe only the case of

in 3-dimensional mesh (the following discussion and formulae apply also in the case of

). The methods of scattering in 3-dimensional mesh for PEs using

ports simultaneously are depicted in Fig.5.

**Figure 5:** Scattering in 3-dimensional regular mesh, for $p\!=\!1\!,2\!,3\!.$
$\begin{figure}\begin{picture}(300,240)% dimension in pt=1/72' \put(10,0){\psfig{file=hbk3dmsh.eps,width=11cm}} \end{picture}\end{figure}$

The algorithms are based on recursive division of the mesh into sub-meshes of smaller size. In each step of scattering

PEs are activated. These PEs become originators for scattering in the next step. Thus, in each step the initial mesh is divided into

submeshes of

times smaller size. Each of the steps consists, in turn, of three moves. The same moves are performed simultaneously by all the active PEs. Each move increases the number of active PEs

times, which is maximum because only

ports can be used simultaneously. Hence, after

moves the number of working PEs is

. For

and

, each move of a step activates

PEs neighboring along a different dimension. In a 3-port system the originator activates three PEs located in the same two-dimensional cross-section of the initial mesh (say, along the plane

). Next, the four active PEs send data along the third dimension (

dimension). In the third move each active processor sends data to the neighbors along the hull of the initial mesh and to one neighbor inside the mesh.

Let us denote by $\alpha_i$ the amount of data to be processed by a PE activated in move

, and by $k=\log_{p+1}m$ the number of moves. The diagram of communication and computation is given in Fig.2, but here the number of ports involved simultaneously can be greater. Similarly, computing on the sending layer of PEs lasts as long as communicating to and computing on the PEs of the receiving layer. The distribution of data can be found from the following set of equations ( $p\in\{1,\dots,6\}$ ):

$\displaystyle A\alpha_{k-i}$	$\textstyle =$	$\displaystyle S+C(\alpha_{k-i+1}+%% p\sum_{j=2}^i\alpha_{k-i+j}(p+1)^{j-2})+A\alpha_{k-i+1}$
		$\displaystyle {\rm\hspace{50mm} for \hspace{1mm}}i=1,\dots,k$	(10)
$\displaystyle V$	$\textstyle =\!$	$\displaystyle \alpha_0+p\sum_{i=1}^k(p+1)^{i-1}\alpha_i$

A broadcasting method for

-dimensional toroidal mesh with

-port PEs and edge size

( $k\in Z^+$ ) has been proposed in [PC96]. This method activates $(2d+1)^{kd}$ PEs in

moves. Thus, the methods proposed here can be extended to toroidal meshes of arbitrary dimension when

. Observe that also scattering methods for

and

can be applied in meshes of higher dimension. The proposed scattering algorithms are optimal in the sense that the number of activated processors in the allowed number of steps is the biggest possible. Equations (10) can be applied in any architecture in which PEs can be activated in the same way as above, i.e. each active PE in each move starts

new processors. One of such architectures can be multistage interconnection [D97].