A regression-based lossless compression algorithm for hyperspectral aerospace images

In this work, we propose a lossless compression algorithm for hyperspectral aerospace images whose distinguishing feature is a channel-difference linear regression transformation, which significantly reduces the dynamic range of the data and increases the compression ratio. The main idea of the proposed transformation is to form a set of pairs of correlated channels and then, using regression analysis, to create the transformed blocks without loss. This analysis makes it possible to reduce the size of the channels of the aerospace image and to convert them before compression: the regressed channel is transformed using the values predicted by the constructed regression equation model. An important final step is coding with an adapted Huffman algorithm. The comparison results obtained on transformed hyperspectral AI confirm the effectiveness of the regression transformation and multi-threaded processing stages, showing good results in comparison with other compression algorithms.


Introduction
Hyperspectral aerospace images (AI) are images obtained from Earth remote sensing (ERS) spacecraft and are intended for solving applied research problems. Studies on compressing hyperspectral AI are of particular interest, as evidenced by a large number of publications. In a hyperspectral AI, for each pixel the hyperspectral camera records light intensities for a large number of adjacent spectral ranges, reaching several hundred. Owing to their rich information content, hyperspectral AI are effectively used in tasks of automated processing of remote sensing images. The stages of hyperspectral AI processing can include: internal representation of images, image conversion, geometric correction of scenes, and image preprocessing. Compression efficiency can be achieved through the following steps: changing the internal representation of images, converting it to a compact form, and using preprocessing. Compression amounts to preprocessing and removal of redundancy in images. There are two types of redundancy in hyperspectral AI, spatial and spectral, and exploiting both makes it possible to develop efficient compression algorithms. Based on research on hyperspectral AI compression [1-20] presented in the works of scientists from Russia, China, the USA, India, and other countries, it can be assumed that the existing lossless compression methods and algorithms for hyperspectral AI can be improved by reducing their computational cost and increasing the compression ratio through modification of the preprocessing steps using mathematical methods. In addition, new compression preprocessing steps can be proposed that effectively increase the compression ratio and reduce the compression time.
The latest and best results in solving the compression problem, as in many other remote sensing problems, were obtained using well-known algorithms in various combinations and groups; however, the achieved compression ratios remain low, averaging 3.85. Based on the analysis of lossless compression methods and algorithms for hyperspectral AI, it can be concluded that the most effective ways to solve the compression problem are:
- taking into account spectral correlation, which gives certain advantages based on a calculated correlation matrix;
- applying a new method of ordering the channels of the hyperspectral AI;
- using interpolation based on mathematical methods;
- arithmetic coding and the Huffman algorithm, which are the best among statistical methods;
- using and organizing parallel compression processing to reduce the cost of computing resources.
In the authors' previous study of a multi-stage compression algorithm [10], the main drawback remained its low computational efficiency: the large computational costs make it difficult to apply the algorithm to practical hyperspectral AI compression problems.
Therefore, in this work a modification of the correlation-based algorithm [10] is proposed, in which an attempt is made to eliminate this drawback.
Modification of the algorithm with regression analysis. The essence of the modification is to calculate linear regression coefficients between the values of a generating channel (master) and the regressed channels (slave, compressed) of the hyperspectral AI, forming arrays of differences between master and slave. On PCs with different computing capabilities, the modified algorithm uses a different number of threads, corresponding to the number of available processor cores. The proposed modification of the lossless compression algorithm, taking into account inter-band correlation and regression analysis, increases the compression ratio by more than a factor of two compared with universal archivers. The proposed algorithm for finding the best channel groups for a given correlation value increases the effectiveness of the channel subtraction (difference transformation).

Description of the approach to the problem of compression of hyperspectral AI
The sequence of processing steps in the compression algorithm is as follows:
1) calculation of the correlation between all pairs of AI channels and determination of the channel coding and decoding sequence;
2) the regression transformation algorithm;
3) obtaining channel differences and their block conversion;
4) compression by a statistical algorithm.
We describe each step of the algorithm.
Step 1. We calculate the values of the correlation matrix between all pairs of AI channels, revealing the most correlated groups of channel pairs. Based on the matrix, we form and determine the sequence of transformation (coding) and inverse transformation by constructing a highly branching tree.
Step 2. Regression analysis based on Step 1. We calculate the linear regression coefficients between the values of the generating and regressed channels of the hyperspectral AI, creating optimal values for generating the arrays of differences between master and slave.

Description of the compression algorithm

Compression Sequencing
At the first stage, we transform the data structures of the initial hyperspectral AI, using mathematical models to find the correlation between all pairs of channels, and we determine the coding sequence. Knowing that the channels of a hyperspectral AI are obtained over adjacent parts of the spectrum, we assume that a certain degree of dependence can be found between pairs of channels. To quantify this dependence, we use the formula for the Pearson correlation coefficient (also known as the linear correlation coefficient). The formula requires two data sequences, so we first extract two sequences of samples from a pair of channels. This is done as follows: we represent the two-dimensional data matrix of one channel as a linear array (traversing the matrix row by row, left to right and top to bottom), then select a certain number of samples in it (we denote this number by m) by dividing the array into approximately equal segments. From the second channel, we extract the sequence of samples located at the same positions in the matrix as the samples from the first channel. We denote the obtained sequences by x and y, with individual values x_i and y_i (i from 1 to m inclusive). It is also necessary to calculate the arithmetic means of both sample sequences.
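The sample extraction and correlation computation described above can be sketched as follows (a minimal illustration; the function name, the evenly spaced choice of the m positions, and the use of NumPy are our assumptions, not the authors' implementation):

```python
import numpy as np

def channel_correlation(ch_a, ch_b, m=1000):
    """Pearson correlation between two channels, estimated from m samples
    taken at identical positions (row-major order) in both channels."""
    flat_a = ch_a.ravel()          # matrix traversed row by row, top to bottom
    flat_b = ch_b.ravel()
    # choose m approximately evenly spaced sample positions
    idx = np.linspace(0, flat_a.size - 1, num=m, dtype=int)
    x = flat_a[idx].astype(float)
    y = flat_b[idx].astype(float)
    xm, ym = x.mean(), y.mean()    # arithmetic means of both sequences
    num = np.sum((x - xm) * (y - ym))
    den = np.sqrt(np.sum((x - xm) ** 2) * np.sum((y - ym) ** 2))
    return num / den
```

Channels that depend on each other almost linearly yield a coefficient near 1, which is exactly what the subsequent pair-ordering step exploits.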
Computing the coefficients for all possible pairs of channels of the same image, we build the coding sequence, naturally proceeding from larger correlation values to smaller ones.
1. Construction of the correlation matrix of the AI channels.
2. Construction of a highly branching tree to determine the channel transformation sequence depending on the correlation value, using the Pearson correlation formula.
3. Formation of the hyperspectral AI processing sequence from steps 1-2, necessary for AI recovery.
Example. For a 12-channel hyperspectral AI, the index tree contains 11 vertices; we build them from the correlation table (Table 1). Table 1 shows that the inter-channel correlation increases starting from channel 5. This suggests that the channels whose indices lie in the range [4-11] have a high correlation dependence, with the maximum correlation value between channels 10 and 11.
3.1. Based on the constructed correlation matrix, we construct the tree L of indices; these are ordered pairs of channels (for example, channels numbered 10 and 11). Let there be 11 pairs and 11 tree vertices. Channel 12 under index [11] is converted through the first vertex of the tree [#0] (the numbering of vertices and indices starts from zero), channel 10 under index [9] through the second vertex [#1], channel 9 under index [8] through the third vertex [#2], and so on (Fig. 1).
3.2. Formation of the sequence of pairs:
Step 1. Find the maximum element among the available pairs in the correlation matrix and form the first pair of channels.
Step 2. We put the new pair of channels into the set of ordered pairs (SOP).
Step 3. We remove from the correlation matrix table all pairs whose elements are already defined in the SOP.
Step 4. While not all elements are in the SOP, we return to Step 1.
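Steps 1-4 can be sketched as follows (one possible reading of the procedure, assuming that after the first pair every new pair links an already-placed master channel to a channel not yet in the SOP; the function and variable names are ours):

```python
import numpy as np

def order_pairs(corr):
    """Greedy formation of the set of ordered pairs (SOP): take the strongest
    remaining correlation between a channel already in the SOP and one still
    outside it; the first pair is the global maximum of the matrix."""
    n = corr.shape[0]
    c = np.abs(np.asarray(corr, dtype=float))
    np.fill_diagonal(c, -1.0)            # ignore trivial self-correlation
    # Step 1: the first pair is the maximum element of the matrix
    i, j = divmod(int(c.argmax()), n)
    pairs = [(i, j)]                     # (master, slave) in coding order
    in_sop = {i, j}                      # Step 2: channels placed in the SOP
    while len(in_sop) < n:               # Step 4: loop until all are placed
        best, bi, bj = -2.0, -1, -1
        for a in in_sop:                 # Step 3: only pairs with a new
            for b in range(n):           # (not yet placed) channel remain
                if b not in in_sop and c[a, b] > best:
                    best, bi, bj = c[a, b], a, b
        pairs.append((bi, bj))
        in_sop.add(bj)
    return pairs
```

For an n-channel image this yields n - 1 pairs, matching the 11 tree vertices of the 12-channel example above.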

Regression transformation
The essence of the transformation is to associate with each encoded pair of channels a structure that would: 1) allow one of the source channels to be unambiguously restored from the data of the other channel, and 2) occupy as little disk space as possible. The subtracted channel is converted using a linear regression equation (LRE). The general form of the transformation is as follows: for each sample y of the subtracted channel, the predicted value Y is found from the LRE and the difference is stored. In the example given, Y = 1233 and the difference is d = y - Y = 79; the value 79 < 394 requires fewer bits to store. We apply this generalization of the LRE to hyperspectral AI: the coefficients of the constructed regression model Y_i = a*x_i + b are determined with respect to the generating channel.
The idea of linear regression in our case is to find real values a and b such that the matrix of differences d_ij = y_ij - (a*x_ij + b), formed from the data of the encoded pair, has the smallest possible residuals. For each slave channel we store the values a and b in Double format (8 bytes, about 15 significant decimal digits). By encoding the single main generating channel (the generating channel of the first pair) independently of the others, we can subsequently restore it first and then gradually restore all other compressed channels (as mentioned earlier, through the coefficients a and b).
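The difference transformation and its exact inverse can be sketched as follows (a minimal sketch; `np.polyfit` stands in for whatever least-squares routine the authors use, and the rounding conventions follow the text's description of lossless recovery):

```python
import numpy as np

def regression_encode(master, slave):
    """Encode the slave channel as rounded residuals of a linear regression
    on the master channel; a and b are kept as 8-byte doubles."""
    x = master.astype(np.float64).ravel()
    y = slave.astype(np.float64).ravel()
    a, b = np.polyfit(x, y, 1)          # least-squares fit of y ~ a*x + b
    w = a * x + b                       # model prediction Y
    d = y - w                           # residual with small dynamic range
    # round to nearest integer; a fractional part of exactly 0.5 goes down
    d_round = np.ceil(d - 0.5).astype(np.int64)
    return a, b, d_round

def regression_decode(master, a, b, d_round):
    """Recompute the prediction from the stored doubles (identical floats
    as in the encoder) and round half-up: the rounded residual is within
    0.5 of the true one, so the original integer samples are restored."""
    x = master.astype(np.float64).ravel()
    w = a * x + b
    return np.floor(d_round + w + 0.5).astype(np.int64)
```

The asymmetric rounding (halves down at the encoder, halves up at the decoder) is what guarantees exact recovery even when a residual's fractional part equals 0.5.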
The absence of losses during the conversion is ensured as follows. After obtaining the matrix d and before writing it to the file, we round its values to the nearest integer (a fractional part equal to exactly 0.5 is rounded to the smaller integer). This does not prevent lossless recovery; let us explain why. Consider the formula d_ij = y_ij - w_ij, where w_ij = a*x_ij + b. The matrices x and y are integer-valued, so the fractional part of d_ij before rounding (we denote it q; the integer part of a number is denoted by square brackets, the fractional part by curly brackets) is determined entirely by the fractional part of w_ij. After rounding, the rounded values are written to the difference file. During recovery, w_ij is recomputed without change, since a and b were previously stored in Double format (that is, exactly) and the values of x are integers that undergo no loss. The rounded value of d_ij differs from the exact one by no more than 0.5, so the sum of the rounded d_ij and w_ij differs from the integer y_ij by no more than 0.5, and rounding that sum to the nearest integer (with halves rounded upward) restores the original value y_ij exactly.
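The rounding argument can be summarized compactly (our reconstruction of the garbled derivation, keeping the text's notation w_ij = a*x_ij + b):

```latex
% Encoder: residual and its rounding (halves rounded down)
d_{ij} = y_{ij} - w_{ij}, \qquad
\tilde{d}_{ij} = \operatorname{round}_{\downarrow}(d_{ij}), \qquad
\tilde{d}_{ij} - d_{ij} \in \left[-\tfrac{1}{2},\, \tfrac{1}{2}\right).
% Decoder: w_{ij} is recomputed exactly from the stored a, b and integer x,
% so the reconstructed sum lies within half a unit of the integer y_{ij}:
\tilde{d}_{ij} + w_{ij} \;=\; y_{ij} + (\tilde{d}_{ij} - d_{ij})
\;\in\; \left[\, y_{ij} - \tfrac{1}{2},\; y_{ij} + \tfrac{1}{2}\,\right),
% hence rounding this sum with halves upward returns y_{ij} exactly.
```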
Thus, the regression transformation incurs no losses. To decode the hyperspectral AI, we perform the following steps:
Step 1. Decoding of the difference arrays with the Huffman algorithm.
Step 2. Formation of the regression transformation arrays by adding the decoded differences to the values predicted from the generating channel.
Step 3. Formation of the initial arrays from the available coefficients of the regression model, obtaining the original hyperspectral AI data.

Experimental research
To determine the effectiveness of the proposed algorithm in terms of compression ratio, as well as the limits of its applicability, a number of experiments were carried out on hyperspectral AI from the AVIRIS remote sensing system; the results are presented in Table 2.