Neural network model for detection of changes in forest environment using multispectral images

Abstract. The article addresses the task of detecting changes in the forest environment using multispectral images for sustainable forest management and monitoring of the regional forest ecosystem. A semantic segmentation method, which assigns each pixel a corresponding class label, is used for recognition, object classification, and analysis of multispectral images. By studying the spectral characteristics of pixels, the neural network automatically extracts and memorizes features from the data and uses them for classification and change detection. Experimental results obtained with the developed software prototype confirm the reliability of the theoretical aspects of the change detection model for the forest environment. Moreover, the neural network model for detecting changes in the forest environment using multispectral images has practical significance and can be applied to real tasks of the regional forest ecosystem of the Republic of Buryatia.


Introduction
An important task in the field of ecology and forestry is the detection and monitoring of changes occurring in the forest environment. A review of the literature revealed a variety of studies related to various types of monitoring and the development of forest-image interpretation, such as the works of E.A. Lupyan, S.A. Bartalev, V.A. Malinnikov and others, including [1, 2]. The foundations of the theory of image use in forest-related tasks were laid by E.L. Krinov, A.S. Isaev, I.M. Danilin and others [3, 4]. The application of satellite and multispectral imagery in forest monitoring is the focus of many foreign works by Coppin, Bauer, Gholz, Bertrand, Wilson, Sader and others. Studies of vegetation cover and its individual properties based on remote sensing data are reflected in the works of S.A. Bartalev, I.V. Balashova and E.A. Lupyan. Multispectral images have gained a solid position in the study of forest stands, for example in the works of McRoberts, Banskota and others [5, 6]. Research on convolutional neural networks for image classification tasks is presented in [7, 8].
However, it should be noted that the scalability of neural networks with different architectures, depths, and performance capabilities, needed to process both small forest areas and large satellite images of entire forest cover, together with the accumulated experimental data in this direction, remains insufficiently studied for a specific regional system for detecting changes in vegetation and soil cover. Such changes are directly related to deforestation, including illegal logging, and consequently to alteration of the carbon cycle and the ecosystem as a whole. In this regard, in our opinion, the task of improving methods and models for detecting changes in the forest environment using multispectral images is particularly relevant.

Semantic Segmentation Method for Forest Environment Objects
The study employs a remote sensing method for semantic segmentation of images, which combines an object-oriented approach to satellite image analysis with an index-based classification algorithm. Thematic processing of composite images is conducted using Sentinel-2 data on diffuse reflectance, which has proven effective for measuring the density and area of forested territories and for capturing various changes within them.
To validate the remote sensing method, a forested area in the Kabansky District of the Republic of Buryatia was chosen. Sentinel-2 satellite images with the highest spatial resolution and a range of spectral channels from 32 to 39 were investigated. Cloud-free Sentinel-2 scenes with a spatial resolution of 10 meters, covering five summer seasons, were used as input data. The applied satellite monitoring technology allows observation of changes in the vegetation index obtained through spectral analysis of high-resolution satellite imagery.
The spectral processing of images can be accomplished based on the Normalized Difference Vegetation Index (NDVI), as we believe this approach is effective in expressing the vegetation state and its quantitative characteristics:

NDVI = (Nb - Ro) / (Nb + Ro), (1)

where Nb is the reflectance in the near-infrared region of the spectrum and Ro is the reflectance in the red region of the spectrum. Observing that the Sentinel-2 sensors have the necessary bands, with reflectance in the near-infrared region (Nb) and in the red region (Ro), we conclude that an image processed with NDVI is easier to segment, because on the basis of the index values vegetation is quite clearly separated from other natural objects. Assigning an index value and a color gradient to each pixel produces the processed image.
The values and corresponding classes of the NDVI index are given in Table 1. Thus, semantic segmentation allows automatic detection of objects and their delineation in images by dividing an image into multiple segments, each corresponding to an object or a part of an object.
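As an illustration, NDVI per formula (1) can be computed pixel-wise directly from the near-infrared and red bands (for Sentinel-2 these are bands B8 and B4). The following NumPy sketch uses hypothetical toy reflectance values; the function name and the zero-denominator guard are our assumptions, not part of the original method description:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """NDVI = (Nb - Ro) / (Nb + Ro); values fall in [-1, 1]."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    # Guard against division by zero where both bands are 0
    return np.where(denom == 0, 0.0, (nir - red) / denom)

# Toy 2x2 patch: high NIR reflectance over vegetation, low over bare soil
nir = np.array([[0.50, 0.45], [0.10, 0.08]])
red = np.array([[0.05, 0.06], [0.08, 0.07]])
print(ndvi(nir, red))
```

Pixels with dense vegetation yield values close to 1, which is what makes the index-based separation of vegetation from other natural objects straightforward.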

Convolutional neural network model of the U-net architecture
Semantic image segmentation is performed using a convolutional neural network (CNN) model called U-Net. The input to the network is an image, and the output is a single-channel map representing the probable area of specific classes of forest objects [10].
When the object map is constructed, the entire image is scanned by a sliding window whose responses are stored at reserved positions of the object map. This construction is equivalent to a convolution operation followed by an additive bias term and a sigmoid function:

y^d = σ(W^d x^(d-1) + b^d), (2)

where d is the depth (index) of the convolutional layer, W is the weight matrix, and b is the bias. The use of a sparse weight matrix reduces the number of trainable network parameters and thus increases the network's ability to generalize. Multiplying W by the layer input forms a convolution of the input with w, which can be considered a trainable filter. If the input to convolutional layer d (the output of layer d-1) has dimension N×N, and the receptive field of units in a particular plane of layer d has dimension m×m, then the constructed feature map is a matrix of dimension (N - m + 1)×(N - m + 1). In particular, the feature map element at position (i, j) can be defined as

z_ij = b + Σ_{k=1..m} Σ_{l=1..m} w_kl x_{i+k-1, j+l-1}, (3)

y_ij = σ(z_ij), (4)

where b is a scalar. Applying formulas (3) and (4) sequentially to all (i, j) input positions constructs the feature map for the corresponding plane.
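The feature-map construction described by formulas (2)-(4) can be illustrated with a naive NumPy sketch (a valid convolution with a scalar bias and sigmoid; the function name and toy sizes are illustrative assumptions, and, as in most CNN frameworks, the filter is applied without flipping):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_valid(x, w, b):
    """Valid 2-D convolution, additive bias, then sigmoid, as in (2)-(4)."""
    n, m = x.shape[0], w.shape[0]
    out = np.empty((n - m + 1, n - m + 1))
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            # Receptive field of size m x m at position (i, j)
            out[i, j] = np.sum(w * x[i:i + m, j:j + m]) + b
    return sigmoid(out)

x = np.random.rand(6, 6)   # N = 6
w = np.random.rand(3, 3)   # m = 3
print(conv2d_valid(x, w, 0.1).shape)  # (4, 4), i.e. (N - m + 1) x (N - m + 1)
```

The output shape confirms the (N - m + 1)×(N - m + 1) feature-map dimension stated above.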
The neural network model consists of four levels of blocks, each containing two convolutional layers with batch normalization and a ReLU activation function. During training of the neural network for semantic segmentation, the following hyperparameters are used:
- batch size: the number of images used simultaneously to update the network weights during training, ranging from 4 to 100;
- hidden dimension: the number of convolutional filters per level, namely 10, 20, 40, and 80; the bottleneck of the model has 160 convolutional filters, and the encoding layers have skip connections to the corresponding layers of the decoding part;
- number of layers: 9 convolutional layers are used for extracting image features;
- kernel size: the convolution kernel on each convolutional layer is 3×3;
- stride: the convolution kernel moves 2 pixels at each step;
- input image size: 256×256 for both training and testing;
- number of epochs: the model passes through the entire training dataset 50 times.
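A hedged Keras sketch of a U-Net with the hyperparameters listed above (filter counts 10/20/40/80, a 160-filter bottleneck, 3×3 kernels, 256×256×3 input, single-channel sigmoid output) might look as follows. Downsampling is shown here with 2×2 max pooling, which is one common reading of the stride-2 setting; the exact layer arrangement is our assumption, not necessarily the authors' implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3x3 convolutions with batch normalization and ReLU, as described
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def build_unet(input_shape=(256, 256, 3)):
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    for f in (10, 20, 40, 80):          # four encoder levels
        x = conv_block(x, f)
        skips.append(x)                 # saved for the skip connections
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 160)              # bottleneck
    for f, skip in zip((80, 40, 20, 10), reversed(skips)):  # decoder
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, f)
    # Single-channel probability map, thresholded later into a binary mask
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)

model = build_unet()
print(model.output_shape)  # (None, 256, 256, 1)
```

The single-channel output matches the probability map described in the next paragraph, which is converted to a binary segmentation mask by thresholding.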
In addition, 20% of the total dataset is allocated for testing. The input data consist of a three-channel image of a forest area, and the output is a single-channel map of the probable area of a specific feature, which is then transformed into a binary segmentation mask using a threshold. Figure 1 shows the architecture of the constructed neural network based on the U-Net model [9]. To evaluate the performance of the neural network model, the cross-entropy loss function is used, which operates on predicted probability values between 0 and 1:

H(P, Q) = -Σ_x P(x) log Q(x), (7)

where P is the distribution of the true image and Q is the probability distribution of the model forecast.
Following formula (7), cross entropy is calculated by comparing the actual labels with the model's probabilistic predictions. The greater the difference between the actual label distribution and the predicted distribution, the higher the cross-entropy value. In our case, this sensitive metric equals 0.012.
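For illustration, cross entropy per formula (7) can be computed for a single pixel as follows (the toy distributions, function name, and the epsilon guard against log(0) are our assumptions):

```python
import numpy as np

def cross_entropy(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """H(P, Q) = -sum_x P(x) * log Q(x); eps guards against log(0)."""
    q = np.clip(q, eps, 1.0)
    return float(-np.sum(p * np.log(q)))

# Pixel whose true class is "forest" (one-hot) and a confident correct prediction
p = np.array([0.0, 1.0, 0.0])
q = np.array([0.01, 0.98, 0.01])
print(cross_entropy(p, q))  # -log(0.98), approximately 0.0202
```

A less confident or wrong prediction raises the value sharply, which is exactly the sensitivity described above.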
To train the neural network, a dataset with images and masks is first initialized. The dataset is then divided into training and validation samples for neural network training and data prediction. A generalized neural network model is shown in Figure 2. As mentioned earlier, a forested area in the Kabansky District of the Republic of Buryatia was selected for satellite image segmentation. The area covers approximately 180 square kilometers and is located on relatively flat terrain, predominantly consisting of coniferous trees of various ages. The forest is classified as managed, and planned forest management activities, such as selective cutting, have been carried out in the area for several years. Figure 3 shows a snapshot of a forest fragment before and after processing by the neural network. In the figure, deforested areas are shown in blue, the forested area in green, and burnt forest in yellow. The algorithm for recognizing the changed area includes the following key steps:
- two images are input to the algorithm;
- the resolutions of the images are compared and, if they differ, the images are rescaled to a common resolution;
- the images are converted to grayscale for analysis;
- the absolute difference between the images is computed;
- a threshold value is applied to highlight the changed area;
- the contour of the changed area is determined;
- a mask of the changed area is created;
- the changed area is extracted (cropped);
- an image with a transparent background is generated;
- the changed area is saved as a JPG file for further processing by the neural network.

Results
The neural network model was implemented in the Python programming language using an Integrated Development Environment (IDE). Several libraries were utilized, including NumPy, Matplotlib, OpenCV, Pandas, TensorFlow, and Keras. Experiments on training the neural network were conducted using input data from the Kabansky District forestry in the Republic of Buryatia, covering the period from 2019 to May 2023. These details are presented in Table 2. The analysis of the effectiveness of activation functions in a neural network is an important step in designing and training a model.
The results of evaluating the efficiency of the activation functions include investigating and comparing their impact on the training process and model accuracy, expressed as percentages, as shown in Tables 3-6.

Fig. 1. Neural network architecture.

At the output of the neural network there is a non-linear Softmax activation function, which takes the same linear combination as input:

softmax(z_i) = exp(z_i) / Σ_j exp(z_j).

The main stages of neural network training are: initialization of the satellite image dataset; data normalization; initialization of the CNN architecture of the U-Net model; compilation of the network; the training cycle; output of the results; evaluation of the results.
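The Softmax output stage can be sketched numerically as follows (the toy per-pixel class scores are illustrative; the max-subtraction is a standard numerical-stability trick, not part of the paper's formulation):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # linear-combination scores for one pixel
probs = softmax(logits)
print(probs, probs.sum())            # class probabilities summing to 1
```

The resulting per-class probabilities are what the cross-entropy loss of formula (7) compares against the true labels.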

Fig. 3. Forest fragment before and after neural network processing.

Table 1. NDVI index values and classes.

Table 2. Neural network training results.

Table 3. ReLU activation function; data for 50 epochs. Results for the Sigmoid activation function are presented in Table 4.