Next release: the following features are planned and are under active development.
- Flexible data layout for input and output data.
- New IO routines that wrap MPI's Split Collective Data Access Routines to help overlap large I/O operations with computation.
- Support for the Chebyshev transform.
2012-10-08: version 1.5.847 (Download)
- 2DECOMP&FFT now has a stringent test suite for quality assurance. Bugs were discovered and fixed in the halo-cell support code and in the memory management code; in rare cases these could crash the library.
- Introduced a padded-alltoall optimisation to the communication code. This could improve performance for small messages on some Cray systems.
- Introduced utility routines that make allocating 3D distributed arrays easier.
- Fixed a bug in the FFTPACK5 engine (FFTPACK 5.0 can produce incorrect results for certain problem sizes; this is fixed in version 5.1).
- Added support for the T3PIO library to improve I/O performance on the Lustre file system.
- Added an experimental API to allow integration with the Global Arrays Toolkit. More details here.
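The general idea behind a padded all-to-all can be illustrated with a small simulation. When processes exchange blocks of different sizes, every block can be padded to the largest size so that a uniform exchange (MPI_ALLTOALL, one count for everyone) replaces the variable-size MPI_ALLTOALLV; receivers then strip the padding using sizes they already know from the decomposition. The sketch below is a hypothetical pure-Python model of that technique, not code from 2DECOMP&FFT; the names `padded_alltoall` and `PAD` are illustrative only.

```python
from typing import List

PAD = 0.0  # hypothetical filler value used for padding

Block = List[float]


def simulated_alltoall(send: List[List[Block]]) -> List[List[Block]]:
    """Model a uniform all-to-all: process j receives send[i][j] from every process i."""
    nprocs = len(send)
    return [[send[i][j] for i in range(nprocs)] for j in range(nprocs)]


def padded_alltoall(send: List[List[Block]]) -> List[List[Block]]:
    """Exchange variable-size blocks through a fixed-size all-to-all.

    send[i][j] is the block process i sends to process j.
    Returns recv where recv[j][i] == send[i][j].
    """
    nprocs = len(send)
    # A uniform all-to-all needs one block size on every process,
    # so pad to the largest block across all processes.
    width = max(len(block) for row in send for block in row)
    padded = [[block + [PAD] * (width - len(block)) for block in row] for row in send]
    exchanged = simulated_alltoall(padded)
    # Strip padding. In this simulation the true sizes are read from `send`;
    # in a real code each receiver would compute them from the decomposition.
    return [
        [exchanged[j][i][: len(send[i][j])] for i in range(nprocs)]
        for j in range(nprocs)
    ]


if __name__ == "__main__":
    send = [[[1.0], [2.0, 3.0]], [[4.0, 5.0, 6.0], [7.0]]]
    print(padded_alltoall(send))
```

The trade-off is extra bytes on the wire in exchange for a simpler, uniform collective, which is why such a scheme tends to pay off mainly for small messages.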
2011-10-12: version 1.4.682 (Download)
- The previous release was wrongly marked as version 1.3.319 on some websites promoting open-source software, so this new version starts from 1.4.x to avoid any confusion.
- MAJOR NEW FEATURE: a new experimental API to support the overlap of communication and computation in applications. More details here.
- Much improved IO library with refactored code, new functions and bug fixes. Please note that there are minor adjustments to the parameter lists of several I/O routines. Refer to the IO API page for more details.
- Much improved halo-cell communication code, now supporting arbitrary global data sizes, data structures defined using global coordinates, and periodic boundary conditions.
- More sample applications.
- Many bug fixes and minor improvements.
2011-07-08: version 1.1.319 (Download)
- Reintroduced the IBM ESSL implementation of the FFT library, to be used on IBM hardware such as Blue Gene systems and other PowerPC-based machines where ESSL is available.
- A new FFTW implementation using the Fortran 2003 interface provided by the latest FFTW 3.3-beta1. The old implementation using the legacy Fortran interface remains. The main benefit of the new Fortran 2003 interface is guaranteed memory alignment, which may offer a performance improvement on certain hardware/compiler combinations, although this has not been observed on my test hardware.
- A new sample application that cross-checks the parallel FFT results against P3DFFT.
2011-06-12: version 1.1.273 (Download)
- Better handling of the MPI buffers using global data structures in the 2D decomposition library, resulting in a significant speed-up (more than 20% on a Cray XE6) for all communication code.
- A special code branch that optimises FFT performance when a 1D decomposition is in use as a special case of the 2D decomposition, again resulting in a significant speed-up. Read this note if you want to parallelise applications using the 1D decomposition.
- Optimisation of several FFT engines by using in-place transforms when computing the underlying 1D FFTs.
- Introduced an option that allows overwriting the FFT input. This can reduce the memory footprint of the library and may improve performance.
2011-05-04: version 1.0.246 (Download)
- Initial public release.
- General-purpose 2D pencil decomposition module.
- Distributed 3D Fast Fourier Transform.
- Halo-cell support allowing explicit message passing between neighbouring blocks.
- Parallel I/O module using MPI-IO to handle the input/output of data sets.
- System V IPC shared-memory optimisation of the communication code.
- Interfaces to the most popular external FFT libraries: FFTW, ACML, MKL, FFTE and FFTPACK.
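To give a concrete feel for what a 2D pencil decomposition does, the sketch below splits an nx × ny × nz global grid over a p_row × p_col process grid for an X-pencil: every process holds the full x extent, while y is divided over p_row and z over p_col. This is an illustrative Python model, not code from the library. The rank-to-grid mapping and the rule that leftover points go one each to the last ranks are assumptions, and `distribute` and `x_pencil` are hypothetical names.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (0-based start, size) along one axis


def distribute(n: int, nprocs: int) -> List[Extent]:
    """Split n points over nprocs ranks as evenly as possible.

    Assumption: the remainder is given one point each to the last ranks.
    """
    base, rem = divmod(n, nprocs)
    parts: List[Extent] = []
    start = 0
    for r in range(nprocs):
        size = base + (1 if r >= nprocs - rem else 0)
        parts.append((start, size))
        start += size
    return parts


def x_pencil(nx: int, ny: int, nz: int,
             p_row: int, p_col: int, rank: int) -> Tuple[Extent, Extent, Extent]:
    """Local x, y and z extents of an X-pencil owned by `rank`.

    Hypothetical mapping: row = rank // p_col, col = rank % p_col.
    """
    row, col = divmod(rank, p_col)
    y_ext = distribute(ny, p_row)[row]
    z_ext = distribute(nz, p_col)[col]
    return (0, nx), y_ext, z_ext


if __name__ == "__main__":
    # 8 x 6 x 5 grid on a 2 x 2 process grid
    for rank in range(4):
        print(rank, x_pencil(8, 6, 5, 2, 2, rank))
```

Y-pencils and Z-pencils follow the same pattern with a different axis kept whole; transposing between pencil orientations is what the library's communication code performs.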