Application of parallel version two-dimensional fast Fourier transform algorithm, analog of the Cooley-Tukey algorithm, for digital image processing of satellite data

In modern systems of remote sensing two-dimensional fast Fourier transform (FFT) has been widely used for digital processing of satellite images and subsequent image filtering. This article provides a parallel version two-dimensional fast Fourier transform algorithm, analog of the Cooley-Tukey algorithm and its implementation for processing the satellite image of Krasnoyarsk and its suburban areas.


Introduction
At present, Earth remote sensing is closely related to digital image processing, as the aerospace images that are rich in detail are commonly represented in digital form of a raster type. Adjustment of contrast and brightness, spatial filtering, Fourier transform and subsequent frequency filtering are often used to improve image quality [1]. The traditionally applied algorithm for computing two-dimensional fast Fourier transform (FFT) is the sequential application of a one-dimensional FFT, first for all rows, then for all columns. The article describes a version two-dimensional fast Fourier transform algorithm, analog of the Cooley-Tukey algorithm, with the reduced number of complex operations compared to the traditionally used algorithm. A variant of the algorithm parallelization to accelerate calculations is shown in the article. The use of the algorithms to perform image processing on digital images is considered.
The filtering procedure is implemented in several stages: • Data reading and swapping • Parallel computation of the Cooley-Tukey algorithm • Filtering the obtained Fourier transform • Reverse FFT Calculation • Image acquisition Detailed description of these stages can be found in [1]. A two-dimensional analogue of the Cooley-Tukey algorithm is considered in [2], its multidimensional variant in [3] and a more detailed description of the algorithm is provided in [4]. In this paper we present a parallel two-dimensional version of the Cooley-Tukey algorithm. First, we consider two dimensional analog of the Cooley-Tukey algorithm for computing the fast Fourier transform.

The Cooley-Tukey algorithm
Given a function f(x,y) of two-dimensional periodic signal with values in the complex space, where x,y = 0, 1, …, 2 s -1, s is a positive integer. Grayscale image that is 2 s pixels in height and width can be taken as an example of such a function. The value of the function f(x,y) is a complex number, which real component is equivalent to the brightness values of the corresponding pixel with coordinates (x,y) in the range 0-255, and the imaginary component of the number is zero. Then the Fourier transform F(u,v) of this signal can be represented as follows: where N=2 s , x,y = 0,1,…,N-1.
We convert formula (1) as follows: the coordinates x and y are decomposed into even and odd parts. Then, the initial sum can be divided into four equations:  (2) is a FFT signal with the elements for Then, in the next step , we divide the coordinates u and v into two equal-sized subsets 2u1, 2v1 and 2u1+N/2, 2v1+N/2, respectively, u1,v1 = 0,1,…,N/2-1. Then for the factors and from the second subset we get: . , We plug (3) into the formula (2) to get: Formula (4) describes the two-dimensional fast Fourier transform batterfly in analog of Cooley-Tukey algorithm. It is schematically depicted in Figure 1. With this butterfly we can split the source signal and the Fourier transform with elements into four sub-signals each with the number of elements [4]. This reduces the number of  Decomposition (4) can be applied to each sub-signal g(u,v) several times until we get sub-signals of 4 elements, which FFTs are being computed directly. Then for the signal f(x,y) of the elements the total number of multiplications required to compute N N  complex numbers will be , and the number of additions - [3].

The parallel Cooley-Tukey algorithm
To test the running time of an algorithm for calculating two-dimensional fast Fourier transform algorithm, analog of the Cooley-Tukey algorithm, a simulation program was written in the C++ programming language [4]. Two principal methods of algorithm parallelization were used: OpenMP, targeted toward use on shared memory systems, and MPI for distributed memory systems. Testing was carried out on the cluster node of the SFU supercomputer with IBM HS21 XM Xeon Quad core E5450 3.0 GHz, 64 Gb RAM [5]. The result of testing for a system with distributed memory on a single cluster node is presented in Table 1.
The two-dimensional analogue of the Cooley-Tukey algorithm processes the signal with the number of samples in s iterations. In the first iteration, due to the preliminary s s 2 2  permutation of the elements, the signal data is divided into quads of related (vertically and horizontally) elements spaced by one element from each other; in the second iteration, into quads of elements separated by two elements from each other; in the third iteration -by the 2 2 element; in the last s-iteration by-2 s-1 element. For such implementation of the algorithm, in shared memory systems parallel data structures are designed by splitting multiple data in each iteration into arrays of interconnected sets of the elements for each separate thread. In a distributed memory systems, a similar partitioning into sets of related data occurs at each iteration with subsequent transfer of data between the processes for independent computations. The data are presented graphically in Figure 2, where the FFT algorithm by rows and columns is denoted by FFT RC, and the analogue of the Cooley-Tukey algorithm by FFT CT.

Image filtering
LandSat-8 image of Krasnoyarsk and its surrounding areas dated April 7, 2016 [6], which is shown in Figure 1 on the left, was used as a test signal. The original image resolution is 8081 * 8171 pixels, it was converted to the nearest power of two: and then scaled for the powers 10-15. On the right side of the figure 3 the result from high-pass filtering is shown. In this case, the contours are more vivid: the river line, the boundaries of the rocky areas. The original image is shown on the left side of the figure 4. The result from low-pass filtering is on the right side. In this case, small sharp changes in mountainous terrain are not so noticeable on the general background, that is, small details have been removed.