|Library for 2D pencil decomposition and distributed Fast Fourier Transform|
While 2DECOMP&FFT implements a general-purpose 2D pencil decomposition library, 1D slab decomposition remains an attractive option for certain applications.
Fig. 1 shows an arbitrary 3D domain partitioned using a 2D processor grid of P_row=4 by P_col=3. Clearly, 1D decomposition is just a special case of 2D decomposition with either P_row=1 or P_col=1. In both cases, the communication algorithms can be simplified significantly.
If P_row=1, states (a) and (b) are identical, as shown in Fig. 2 (left); similarly, if P_col=1, states (b) and (c) are identical, as shown in Fig. 2 (right). The 1D decomposition can therefore be defined as slabs in either Y and Z or X and Y. The former is often preferred because always keeping the X direction in local memory may give better cache efficiency.
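The identities between states can be checked with a short calculation of local array extents. The following is a minimal sketch, not library code: it assumes that states (a), (b) and (c) are the X-, Y- and Z-pencil orientations, that state (a) distributes Y over P_row and Z over P_col, state (b) distributes X over P_row and Z over P_col, and state (c) distributes X over P_row and Y over P_col. The helpers `split` and `pencil_shapes` are hypothetical names, and the way remainders are shared among ranks is illustrative only.

```python
def split(n, p):
    """Share n grid points among p ranks as evenly as possible (illustrative rule)."""
    base, extra = divmod(n, p)
    return [base + (1 if i < extra else 0) for i in range(p)]


def pencil_shapes(nx, ny, nz, p_row, p_col, row=0, col=0):
    """Local (x, y, z) extents on rank (row, col) for the three states.

    Assumed mapping: (a) X-pencil, (b) Y-pencil, (c) Z-pencil.
    """
    return {
        "a": (nx, split(ny, p_row)[row], split(nz, p_col)[col]),
        "b": (split(nx, p_row)[row], ny, split(nz, p_col)[col]),
        "c": (split(nx, p_row)[row], split(ny, p_col)[col], nz),
    }


# P_row = 1: states (a) and (b) coincide, and X is always fully local.
slab_yz = pencil_shapes(8, 8, 8, p_row=1, p_col=4)
print(slab_yz)  # {'a': (8, 8, 2), 'b': (8, 8, 2), 'c': (8, 2, 8)}

# P_col = 1: states (b) and (c) coincide.
slab_xy = pencil_shapes(8, 8, 8, p_row=4, p_col=1)
print(slab_xy)  # {'a': (8, 2, 8), 'b': (2, 8, 8), 'c': (2, 8, 8)}
```

Under these assumptions, the P_row=1 case never splits the X direction, which is consistent with the cache-efficiency argument above.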
When using the FFT library with 1D decomposition, half of the global transpositions can be dropped, resulting in more efficient code. This optimisation was introduced in version 1.1.x of 2DECOMP&FFT.
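To illustrate why half of the transpositions disappear, the sketch below lists which state changes actually move data in a forward 3D FFT that passes through states (a), (b) and (c) in order. It assumes, as a simplification, that the change from (a) to (b) redistributes data only across P_row and the change from (b) to (c) only across P_col; `required_transposes` is a hypothetical helper, not part of the library interface.

```python
def required_transposes(p_row, p_col):
    """Return the state changes that need communication (assumed model).

    A change is skipped when the processor-grid dimension it
    redistributes over has size 1, because the two states then
    describe the same data layout.
    """
    steps = []
    if p_row > 1:
        steps.append("a->b")  # assumed to exchange data across P_row
    if p_col > 1:
        steps.append("b->c")  # assumed to exchange data across P_col
    return steps


print(required_transposes(4, 3))  # ['a->b', 'b->c']  (2D pencil decomposition)
print(required_transposes(1, 4))  # ['b->c']          (1D slabs, P_row=1)
print(required_transposes(4, 1))  # ['a->b']          (1D slabs, P_col=1)
```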
Finally, note that this arrangement can also be used to perform large distributed 2D simulations. For example, one option is to define the 2D data sets in an X-Y plane by setting nz=1 and P_col=1 (arrays must still be declared as 3D to satisfy the programming interface of the library).
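As a rough illustration of the 2D case, the sketch below computes the local 3D array shapes a rank would hold when nz=1 and P_col=1, so that all processes lie along P_row. It assumes the same layout convention as state (a) splitting Y and states (b) and (c) splitting X; `local_shapes_2d` is a hypothetical helper and the remainder handling is illustrative.

```python
def local_shapes_2d(nx, ny, nprocs, rank):
    """Local shapes for a 2D X-Y problem stored as 3D arrays (nz = 1).

    Returns (shape with X local, shape with Y local). The trailing
    dimension of size 1 keeps the arrays three-dimensional.
    """
    def share(n):
        # Illustrative even split of n points over nprocs ranks.
        base, extra = divmod(n, nprocs)
        return base + (1 if rank < extra else 0)

    x_local = (nx, share(ny), 1)  # X fully local, Y split across ranks
    y_local = (share(nx), ny, 1)  # Y fully local, X split across ranks
    return x_local, y_local


print(local_shapes_2d(16, 12, nprocs=4, rank=0))  # ((16, 3, 1), (4, 12, 1))
```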