Information content of statistical texture features in the problem of recognition and mapping of natural and man-made objects from space images

Statistical texture features are frequently used for the thematic processing of very high spatial resolution satellite images. The information content of 1st and 2nd order statistics is assessed by processing WorldView-2 images of test areas in the Savvatyevskoe forestry together with the corresponding ground-based data. A comparison of the accuracy and computational efficiency of traditional and ensemble classifiers for the recognition of various natural and man-made objects reveals the high performance of the error-correcting output codes (ECOC) method. The estimates obtained in this study demonstrate the advantage of ensemble classification and of 2nd order statistical texture features.


Introduction
An important direction in the development of methods for the thematic processing of Earth remote sensing data is the joint use of spectral and texture characteristics of ground objects. The advantages of spectral-texture classification have been demonstrated in a number of studies. For example, the method for combined processing of multispectral and panchromatic high spatial resolution satellite images proposed in [1] proved effective for classifying soil and vegetation cover in comparison with traditional approaches based on spectral features alone [2].
Similar conclusions can be found in other studies. For example, in the remote sensing identification of the species composition of tropical forests, combining the spectral features of stands with statistical texture analysis of panchromatic images was found to raise the classification accuracy to acceptable values [3]. At the same time, such a result could not be achieved using only the characteristic seasonal changes of spectral features.
High spatial resolution becomes necessary when assessing structural changes of vegetation cover. The structural parameters of forest areas are primarily determined by the forest inventory parameters of relative stocking and species composition. Estimates of these parameters obtained from remote sensing data are further used in statistical models for assessing the biological productivity of forest stands. A good example is the processing of QuickBird and Pléiades satellite images of the Landes de Gascogne natural park in south-western France, presented in [4]. In addition to stand density, which is related to relative stocking, the proposed method made it possible to estimate crown size, tree height and tree diameter.
Many different methods for extracting texture features currently exist; however, the question of their information content in the thematic processing of aerospace images remains open. In this study, we present the results of evaluating the effectiveness of 1st and 2nd order statistical features for the texture classification of very high spatial resolution satellite images, using algorithms of varying complexity.

Texture classification method
The statistical approach to extracting texture features from panchromatic images involves estimating the gray level probability distribution of the pixels within a moving window and then calculating certain statistical parameters of that distribution [5]. Statistical texture characteristics of the 1st and 2nd orders are currently used for thematic satellite image processing. To extract the 1st order characteristics, the probability mass function of the gray level distribution must be estimated, i.e., the relative frequency of each possible gray level in the analyzed image fragment must be found. In this study, we consider five 1st order statistical characteristics: mean, mean square, entropy, energy and variance. The corresponding formulas are given in [5].
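The 1st order features can be illustrated with a short NumPy sketch. It assumes integer gray levels in [0, levels) and a square moving window of odd side that lies fully inside the image; `window_at` and `first_order_features` are hypothetical helper names, "mean square" is taken here as the second raw moment Σ g² p(g), and entropy uses base-2 logarithms. The exact formulas used in this study are those of [5] and may differ in such details.

```python
import numpy as np


def window_at(image: np.ndarray, row: int, col: int, size: int) -> np.ndarray:
    """Square moving-window patch of odd side `size` centred on (row, col).

    Assumes the window lies fully inside the image (no border handling).
    """
    half = size // 2
    return image[row - half:row + half + 1, col - half:col + half + 1]


def first_order_features(window: np.ndarray, levels: int = 256) -> dict:
    """1st order statistics of the gray-level histogram of one window.

    Assumes integer gray levels in [0, levels).
    """
    g = np.arange(levels)
    counts = np.bincount(window.ravel(), minlength=levels)
    p = counts / counts.sum()              # probability mass function p(g)
    mean = np.sum(g * p)
    mean_square = np.sum(g ** 2 * p)       # second raw moment (assumed definition)
    variance = np.sum((g - mean) ** 2 * p)
    nz = p[p > 0]                          # empty bins contribute 0 to entropy
    entropy = -np.sum(nz * np.log2(nz))
    energy = np.sum(p ** 2)
    return {"mean": mean, "mean_square": mean_square, "entropy": entropy,
            "energy": energy, "variance": variance}
```

Sliding `window_at` over every pixel (or over a regular grid of pixels) and stacking the resulting dictionaries yields one feature vector per position, which is what a pixel-wise texture classifier consumes.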
The calculation of the 2nd order characteristics is based on the construction of the probability mass function for the joint occurrence of gray levels (i.e. the normalized gray level co-occurrence matrices, GLCM) for the given parameters of distance and direction of adjacency [6]. For the calculations, we use 19 different characteristics: autocorrelation, cluster prominence, cluster shade, contrast, correlation, difference entropy, difference variance, dissimilarity, energy, entropy, homogeneity, homogeneity2, information measure of correlation 1, information measure of correlation 2, maximum probability, sum average, sum entropy, sum squares, sum variance. The calculation formulas for these parameters are presented in [1].
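A minimal NumPy sketch of GLCM construction for one offset (distance and direction expressed as a row/column shift) and of a few of the listed characteristics follows. Assumptions: the window is already quantized to `levels` integer gray levels, the offset is smaller than the window in each dimension, pairs are counted symmetrically, energy is the sum of squared probabilities (some libraries report its square root), homogeneity uses 1/(1+|i-j|), and entropy uses base-2 logarithms. The function names are hypothetical; the formulas actually used are those of [1].

```python
import numpy as np


def glcm(window: np.ndarray, dr: int, dc: int, levels: int, symmetric: bool = True) -> np.ndarray:
    """Normalized GLCM for the pixel offset (dr, dc), e.g. (0, 1) = distance 1 at 0 degrees.

    Assumes |dr| < rows and |dc| < cols of the window.
    """
    rows, cols = window.shape
    # keep only positions where both pixels of the pair fall inside the window
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    first = window[r0:r1, c0:c1]
    second = window[r0 + dr:r1 + dr, c0 + dc:c1 + dc]
    m = np.zeros((levels, levels))
    np.add.at(m, (first.ravel(), second.ravel()), 1.0)   # count co-occurrences (i, j)
    if symmetric:
        m = m + m.T                                      # count (j, i) as well
    return m / m.sum()


def glcm_features(p: np.ndarray) -> dict:
    """A few of the 2nd order characteristics computed from a normalized GLCM p."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    denom = sd_i * sd_j
    nz = p[p > 0]
    return {
        "contrast": np.sum((i - j) ** 2 * p),
        "dissimilarity": np.sum(np.abs(i - j) * p),
        "energy": np.sum(p ** 2),
        "entropy": -np.sum(nz * np.log2(nz)),
        "homogeneity": np.sum(p / (1.0 + np.abs(i - j))),
        # correlation is undefined for a constant window (zero standard deviation)
        "correlation": np.sum((i - mu_i) * (j - mu_j) * p) / denom if denom > 0 else float("nan"),
        "maximum_probability": p.max(),
    }
```

Features for several directions (for example offsets (0, 1), (-1, 1), (-1, 0), (-1, -1)) are often computed separately and then averaged or concatenated; which option was used here is not stated.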
We used several supervised classification methods of varying complexity to recognize natural and man-made objects by their texture characteristics. Detailed descriptions of these methods can be found in [7] and [8]. The traditional metric classifiers are the Nearest Centroid (NC) and K Nearest Neighbors (KNN) with optimized kd-tree search. Linear (LDA) and quadratic (QDA) discriminant analysis were employed as statistical classifiers.
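As an illustration of the simplest metric classifier, the sketch below re-implements nearest centroid with Euclidean distance in NumPy. The class name `NearestCentroidSketch` is hypothetical and the distance metric is an assumption; ready-made versions of all four classifiers exist in scikit-learn (`NearestCentroid`, `KNeighborsClassifier(algorithm="kd_tree")`, `LinearDiscriminantAnalysis`, `QuadraticDiscriminantAnalysis`).

```python
import numpy as np


class NearestCentroidSketch:
    """Minimal nearest-centroid (NC) classifier with Euclidean distance."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "NearestCentroidSketch":
        self.classes_ = np.unique(y)
        # one centroid (mean feature vector) per class
        self.centroids_ = np.stack([X[y == k].mean(axis=0) for k in self.classes_])
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # squared Euclidean distance from each sample to each centroid: (n_samples, n_classes)
        d2 = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[np.argmin(d2, axis=1)]
```

Because prediction compares each pixel only with one centroid per class, its cost does not grow with the number of training samples, whereas KNN must search among all stored training samples.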
For texture analysis, we also used ensemble classification methods: random forest (RF) and error-correcting output codes (ECOC). The random forest approach can significantly improve the performance of traditional classification algorithms based on decision trees. The first variant considered, bagging (RFB), uses bootstrap resampling to randomize the training set and build an ensemble of solutions. The second variant, the adaptive boosting algorithm (RFAB), is a greedy method for constructing a composition of classifiers, each of which strives to compensate for the shortcomings of all the previous ones.
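The bagging idea can be sketched in NumPy with decision stumps (depth-1 trees) as base learners: each member is trained on a bootstrap sample drawn with replacement, and predictions are combined by majority vote. This is a simplified illustration under stated assumptions, not the RFB configuration used in the study: a real random forest grows deeper trees and also randomizes the features tried at each split, and boosting (RFAB) is not shown. All function names are hypothetical; scikit-learn offers `BaggingClassifier`, `RandomForestClassifier` and `AdaBoostClassifier` for practical use.

```python
import numpy as np


def majority(labels: np.ndarray):
    """Most frequent label (ties resolved towards the smallest label)."""
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]


def fit_stump(X: np.ndarray, y: np.ndarray):
    """Depth-1 decision tree: the single-feature threshold with the fewest training errors."""
    best_err, best = np.inf, None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            if left.all():
                continue                      # both sides of the split must be non-empty
            lo, hi = majority(y[left]), majority(y[~left])
            err = np.sum(y[left] != lo) + np.sum(y[~left] != hi)
            if err < best_err:
                best_err, best = err, (f, t, lo, hi)
    if best is None:                          # no valid split: always predict the majority label
        m = majority(y)
        best = (0, np.inf, m, m)
    return best


def predict_stump(stump, X: np.ndarray) -> np.ndarray:
    f, t, lo, hi = stump
    return np.where(X[:, f] <= t, lo, hi)


def bagging_fit(X: np.ndarray, y: np.ndarray, n_estimators: int = 25, seed: int = 0) -> list:
    rng = np.random.default_rng(seed)
    n = len(y)
    ensemble = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)      # bootstrap sample: n draws with replacement
        ensemble.append(fit_stump(X[idx], y[idx]))
    return ensemble


def bagging_predict(ensemble: list, X: np.ndarray) -> np.ndarray:
    votes = np.stack([predict_stump(s, X) for s in ensemble])   # (n_estimators, n_samples)
    return np.array([majority(column) for column in votes.T])   # majority vote per sample
```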
The ECOC method is a universal ensemble classifier that combines a set of binary classifiers into a multi-class algorithm. In this study, a support vector machine (SVM) with a Gaussian kernel is used as the base binary learner for ECOC; the resulting method is abbreviated ECOC SVM. The effectiveness of ECOC, and of ECOC SVM in particular, was investigated in [9].
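The combination step of ECOC can be sketched independently of the binary learner. The example below assumes a one-vs-one coding design (the coding matrix used in this study is not stated) and loss-based decoding with the hinge loss max(0, 1 - m·s)/2 averaged over the learners that use a given class, which is one common choice. The scores would come from the binary SVMs (for instance their signed decision values, with signs aligned to the coding matrix); the SVMs themselves are not shown, and the function names are hypothetical.

```python
import itertools

import numpy as np


def one_vs_one_coding(n_classes: int) -> np.ndarray:
    """Coding matrix M (n_classes x n_learners) with entries in {-1, 0, +1}.

    Learner l separates class a (+1) from class b (-1); 0 means that class
    is not used to train learner l.
    """
    pairs = list(itertools.combinations(range(n_classes), 2))
    M = np.zeros((n_classes, len(pairs)), dtype=int)
    for l, (a, b) in enumerate(pairs):
        M[a, l], M[b, l] = 1, -1
    return M


def ecoc_decode(M: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Pick, for each sample, the class whose codeword best agrees with the learner scores.

    scores: (n_samples, n_learners) signed outputs of the binary learners.
    """
    m = M[None, :, :]                              # (1, n_classes, n_learners)
    s = scores[:, None, :]                         # (n_samples, 1, n_learners)
    loss = np.maximum(0.0, 1.0 - m * s) / 2.0      # hinge loss per learner
    active = m != 0                                # ignore learners that skip the class
    avg = (loss * active).sum(axis=2) / active.sum(axis=2)
    return np.argmin(avg, axis=1)                  # class index with the smallest mean loss
```

Loss-based decoding uses the magnitude of the SVM outputs rather than only their signs, so a learner that is barely confident contributes less than one that is far from its decision boundary.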
Since some of the methods considered are sensitive to class imbalance, the number of training samples was equalized across classes so that all classes have equal a priori probabilities. We also used the regularized sequential feature selection method proposed in [10] to reduce the dimension of the feature space effectively.
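Equalizing class sizes can be done by drawing the same number of training pixels from every class, with all remaining labelled pixels held out for error estimation. The NumPy sketch below assumes random sampling without replacement; the function name and the `per_class` parameter are hypothetical, and the regularized feature selection of [10] is not reproduced here.

```python
import numpy as np


def balanced_split(labels: np.ndarray, per_class: int, seed: int = 0):
    """Training indices with `per_class` samples of every class; all other indices form the test set."""
    rng = np.random.default_rng(seed)
    train = []
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        if len(members) < per_class:
            raise ValueError(f"class {k} has only {len(members)} samples")
        train.append(rng.choice(members, size=per_class, replace=False))   # no repeats
    train = np.concatenate(train)
    test = np.setdiff1d(np.arange(len(labels)), train)                     # held-out pixels
    return train, test
```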

Results and discussion
Panchromatic satellite images of very high spatial resolution (0.46 m) acquired by the WorldView-2 sensor were used for the texture analysis. The survey was carried out over the territory of the Savvatyevskoe forestry (Tver region, Russia) at noon on June 25, 2016, at a sun elevation of 55.7°. The survey area, consisting of two stitched images with different viewing angles, is indicated by a black frame in Fig. 1.
Numerical calculations to assess the information content of statistical texture features were performed for two test regions, Konstantinovsky and Tverskoy Posad, whose locations are shown in Fig. 1. The first region is located in the eastern part of the Krasnogorsky sandpit near the village of Domnikovo. It contains several large zones (Fig. 2b) corresponding to five types of natural and man-made objects with different textures: (1) buildings, (2) plantations, (3) pine forest, (4) peat bog and (5) water surface. The moving window size was set to 91 pixels, since this value gave the minimum errors for all classifiers considered. The results of ECOC SVM classification based on the 1st and 2nd order characteristics are shown in Fig. 2c, d, e, f.
The test region Tverskoy Posad is located in the mixed forest zone of the Savvatyevskoe forestry (pine and birch are the dominant species) and contains stands of various structures: dense canopy, sparse canopy and undergrowth. The contours of the areas corresponding to these objects are shown in Fig. 3b. The results of ECOC SVM classification based on the 1st and 2nd order texture characteristics are shown in Fig. 3c, d, e, f.
The following quantities were used as the main characteristics of classification errors: TE, the total probability of error; TOE, the mean omission error; and TCE, the mean commission error.
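The three error measures can be computed from a confusion matrix. The sketch below assumes the usual remote sensing definitions, which the text does not spell out: omission error of a class is the share of its reference pixels assigned elsewhere, commission error is the share of pixels assigned to it that belong elsewhere, and TOE and TCE are their unweighted means over classes. TE is computed here over all test pixels; if classes were instead weighted by equal a priori probabilities, TE would coincide with TOE. The function name is hypothetical.

```python
import numpy as np


def classification_errors(confusion: np.ndarray) -> dict:
    """TE, TOE and TCE from a confusion matrix.

    confusion[i, j] = number of test pixels of reference class i assigned to class j.
    A class that receives no pixels at all gives an undefined (nan) commission error.
    """
    c = confusion.astype(float)
    correct = np.diag(c)
    te = 1.0 - correct.sum() / c.sum()            # total probability of error
    omission = 1.0 - correct / c.sum(axis=1)      # per class: reference pixels missed
    commission = 1.0 - correct / c.sum(axis=0)    # per class: wrongly included pixels
    return {"TE": te, "TOE": omission.mean(), "TCE": commission.mean()}
```

For example, with two classes and the confusion matrix [[8, 2], [1, 9]], the omission errors are 0.2 and 0.1, the commission errors are 1/9 and 2/11, and TE = 3/20 = 0.15.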
The classification accuracy obtained with the various methods for the two test regions is presented in Table 1. The number of training samples (NTS) was set to 1000 in all calculations, and the remaining pixels, which far outnumber the training samples, were used for independent estimation of the classification errors. Estimates of the per-pixel classification time for different NTS values are presented in Table 2. The analysis of the results shows that the objects considered can be classified with acceptable accuracy in both test regions. The NC method has the lowest accuracy of all the methods considered, but it is also the fastest: its processing time practically does not depend on NTS and is 1-2 orders of magnitude shorter than that of the ensemble classifiers. The LDA method has almost the same accuracy as NC but is several times slower. The traditional nonlinear classifiers QDA and KNN give a significant increase in accuracy, but KNN is an order of magnitude slower. The highest accuracy is achieved with ensemble classifiers, and ECOC SVM showed the best results for both test regions. The RFB and RFAB methods have a significantly lower processing speed; it should be noted, however, that the processing speed of ECOC SVM, like that of KNN, decreases significantly as NTS increases. A comparison of the results presented in Figs. 2 and 3 shows that 1st order statistical characteristics are less informative for classifying the object types considered. Revealing more subtle texture differences requires 2nd order characteristics, which yield a recognition accuracy of more than 92% for all classes.
The thematic map of natural and man-made objects recognized by automated texture classification of the full image of the Savvatyevskoe forestry is shown in Fig. 4; the ECOC SVM method was used for the calculations. Validation of the results by visual analysis and comparison with ground-based thematic maps shows good quality of the texture-based thematic processing.

Conclusion
Texture processing of WorldView-2 panchromatic images of the Savvatyevskoe forestry (Tver region) was carried out using traditional and ensemble supervised classification algorithms. It is shown that the use of 2nd order statistical texture features significantly increases the classification accuracy and makes it possible to solve the recognition problem for a wider class of objects. Ensemble algorithms reduce the classification error by a factor of 2-3 in comparison with traditional ones. ECOC SVM shows the best results and can be considered a promising method: it classifies natural and man-made objects with an error of less than 3% and the structural features of forest stands with an error of about 6%.