Research on GF-6 Data Selection Technology for Fine Classification of Forest Land

The GF-6 satellite is a high-resolution satellite launched by China in recent years. Its sensors are multispectral and offer a wide field of view, high spatial resolution and high-frequency imaging. To support fine identification of forest types, this paper proposes a method that improves data screening efficiency and the data availability rate in the GF-6 satellite data selection stage. The paper describes the selection process and the key technical methods for GF-6 satellite data, and presents a verification program. The program meets its design objectives: faced with massive data and large-area business applications, it quickly screens out the required data, which increases the degree of automation and reduces the workload of manual visual selection.


Introduction
Remote sensing data query is an important first step in remote sensing information processing software. It is also the first step in large-scale forest remote sensing mapping and forest land product production. So far, the conventional practice is that the application unit searches the web page or retrieval program offered by the remote sensing data provider for the data it needs and places an order to obtain them. Querying data that meet the needs of large-scale forest land mapping, for example at national or provincial scale, is often time-consuming and laborious. The operator has to manually overlay the quick views that fall within the required time phase, spatial extent and cloud cover range in order to select the smallest set of images with the best quality and coverage.
In recent years, more and more high-resolution remote sensing images of good quality and low price have become available. As a result, the workload of data selection is increasing, and more automated tools are needed to help operators select remote sensing data efficiently. Gaofen (GF) is a series of Chinese civilian remote sensing satellites.
GF-6 data have great application potential in forest land mapping, forest type classification and other fields. However, the data volume of GF-6 is huge because of its wide swath and multiple bands. For the fine classification of forest land, the total amount of remote sensing data is initially estimated to cover more than 5 times China's land area, and the demand for data increases by more than 50% every year. Every year, business departments need to sort out, in a short time, the data with the best imaging quality that meet the temporal and spatial requirements and provide at least one repeat coverage. This is a challenging task in an era when massive remote sensing data are available.
Traditional remote sensing data query methods, such as the one provided by the China Centre for Resources Satellite Data and Application [1] or those described in some studies [2], require a lot of manual work when querying massive data. This paper proposes a method to improve data selection efficiency in the GF-6 satellite data selection stage.

Data
GF-6 is an optical satellite similar to the Gaofen 1 satellite, but it uses a different instrument suite, consisting of a 2/8 m resolution panchromatic/multispectral camera with an image swath of more than 90 km and a 16 m resolution wide-angle camera with an 800 km image swath. Both cameras use a three-mirror anastigmat telescope, and both cover the visible to NIR bands (wavelength ~450-900 nm).
The experimental data in this paper are GF-6 fast view images of Wangyedian National Forest Park. The park is located in Wangyedian Town, in the southwest of Harqin Banner, Chifeng City, Inner Mongolia Autonomous Region. It lies in the junction zone between the Daxing'anling and Yanshan Mountains, and its vegetation belongs to the largest plant community of northeast and north China. The forest area is 348,000 mu, with more than 20 tree species such as pine, cypress, birch and linden.

Overall query strategy and method
GF-6 data retrieval provides the following retrieval conditions: satellite (drop-down selection), payload (drop-down selection), data level (drop-down selection), acquisition time interval (time control selection) and cloud coverage (manual entry). There are two spatial retrieval methods for Gaofen data: spatial retrieval based on coordinates and spatial retrieval based on administrative divisions. The data selection strategy implemented in this paper proceeds in the following sequence.
The first step selects the image sequence numbers that satisfy the time, region and metadata cloud cover conditions. This step is the traditional remote sensing data query; regions may be given as polygons or as administrative divisions. Because the selected images are intended for the subsequent fine classification of forest land, we also compute a simple NDVI index based on the fast view RGB images in this step and mark the vegetation regions. The second step uses the cloud cover and image quality indexes proposed in this paper to remove further images. The third step uses GIS spatial geometry calculation to choose the image sequence numbers with the minimum overlap that still achieve the maximum area coverage.
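The third step can be illustrated with a greedy coverage selection. This is only a sketch under stated assumptions, not the GIS geometry calculation used in the prototype: scene footprints are simplified to sets of grid cell ids, and the names `select_min_overlap_cover`, `target_cells` and `footprints` are hypothetical. A greedy choice does not guarantee the true minimum number of scenes.

```python
def select_min_overlap_cover(target_cells, footprints):
    """Greedily pick scenes until the target area is covered.

    target_cells: iterable of grid cell ids that make up the region.
    footprints:   dict mapping scene id -> set of cell ids the scene covers.
    Returns (chosen scene ids in pick order, cells left uncovered).
    """
    target = set(target_cells)
    uncovered = set(target)
    covered = set()
    chosen = []
    while uncovered:
        best_id, best_key = None, None
        for scene_id, cells in footprints.items():
            if scene_id in chosen:
                continue
            gain = len(cells & uncovered)   # newly covered cells
            if gain == 0:
                continue
            overlap = len(cells & covered)  # cells already covered by earlier picks
            key = (gain, -overlap)          # most gain first, then least overlap
            if best_key is None or key > best_key:
                best_id, best_key = scene_id, key
        if best_id is None:
            break  # no remaining scene covers the rest of the region
        chosen.append(best_id)
        covered |= footprints[best_id] & target
        uncovered -= footprints[best_id]
    return chosen, uncovered


# Toy example: a region of 10 cells and four candidate scenes.
footprints = {
    "A": {0, 1, 2, 3, 4},
    "B": {3, 4, 5, 6},
    "C": {5, 6, 7, 8, 9},
    "D": {0, 1},
}
chosen, uncovered = select_min_overlap_cover(range(10), footprints)
print(chosen, uncovered)
```

In this toy case scenes A and C together cover the region, so B and D are never selected.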

Image quality ranking
For image quality ranking, a remote sensing image quality evaluation algorithm based on multi-index synthesis for thumbnails is adopted. The indexes considered include sharpness, noise, invalid pixels, etc. Sharpness refers to the clarity of each detail in the image and of its boundaries, and it is an important factor reflecting the quality of remote sensing images. The average gradient method is adopted: the gradient value of each pixel in the image is calculated, and the gradients of all pixels are then averaged to obtain the average gradient of the image. In general, the larger the average gradient, the sharper the image.
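A minimal sketch of the average gradient computation follows. It assumes a grayscale thumbnail stored as a list of rows and uses one common definition, the mean over pixels of sqrt((dx^2 + dy^2) / 2) with forward differences; the exact gradient operator used in the paper is not stated, and the function name `average_gradient` is illustrative.

```python
import math


def average_gradient(gray):
    """Mean gradient magnitude of a 2-D grayscale image (list of rows)."""
    rows, cols = len(gray), len(gray[0])
    total = 0.0
    for i in range(rows - 1):
        for j in range(cols - 1):
            dx = gray[i][j + 1] - gray[i][j]  # horizontal forward difference
            dy = gray[i + 1][j] - gray[i][j]  # vertical forward difference
            total += math.sqrt((dx * dx + dy * dy) / 2.0)
    return total / ((rows - 1) * (cols - 1))


flat = [[5, 5, 5, 5] for _ in range(4)]     # featureless image
edge = [[0, 0, 10, 10] for _ in range(4)]   # image with one vertical edge
print(average_gradient(flat), average_gradient(edge))
```

The featureless image scores zero, while the image containing an edge scores higher, which matches the rule that a larger average gradient indicates a sharper image.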
To evaluate the noise in an image, it is generally necessary to select a uniform area of a certain size in the image and calculate an evaluation index on it. A uniform area is chosen to eliminate the influence of detailed information such as edges and textures. The index usually adopted is the Inverse Coefficient of Variation (ICV), that is, the ratio of the mean to the standard deviation of the regional image. Several uniform regions can be selected and evaluated separately, and the ICV values of the regions are then averaged to give the noise evaluation result of the whole image [3,6,7]. If the proportion of invalid pixels in the image is too large, the availability of the image is low. The simplest way to detect invalid pixels is to compare pixel values point by point: if a pixel has the same value as the four corners of the image, it is considered an invalid pixel.
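The two indexes above can be sketched as follows. The sketch assumes ICV is the mean divided by the population standard deviation, that the uniform regions have already been chosen, and that the four corners share a single fill value; if the corners differ, the sketch reports no invalid pixels, which is a design choice of this example rather than a rule from the paper. All function names are illustrative.

```python
import statistics


def inverse_coefficient_of_variation(region):
    """ICV of one uniform region: mean / standard deviation."""
    values = [v for row in region for v in row]
    std = statistics.pstdev(values)
    if std == 0:
        return float("inf")  # perfectly uniform region, no measurable noise
    return statistics.fmean(values) / std


def mean_icv(regions):
    """Average ICV over several uniform regions of the same image."""
    scores = [inverse_coefficient_of_variation(r) for r in regions]
    return sum(scores) / len(scores)


def invalid_pixel_fraction(image):
    """Share of pixels equal to the common value of the four corners."""
    corners = {image[0][0], image[0][-1], image[-1][0], image[-1][-1]}
    if len(corners) != 1:
        return 0.0  # assumption: corners disagree, so no fill value is present
    fill = corners.pop()
    pixels = [p for row in image for p in row]
    return sum(1 for p in pixels if p == fill) / len(pixels)


quiet = [[100, 102], [98, 100]]
noisy = [[100, 120], [80, 100]]
print(inverse_coefficient_of_variation(quiet) > inverse_coefficient_of_variation(noisy))
print(invalid_pixel_fraction([[0, 5, 0], [5, 5, 5], [0, 5, 0]]))
```

A higher ICV indicates a less noisy region, and a higher invalid pixel fraction indicates a less usable scene.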

Cloud Recognition Based on Quicklook View
For cloud amount judgment, a remote sensing image cloud amount judgment algorithm based on fuzzy clustering for thumbnails is adopted.
The cloud amount judgment process is divided into four parts: feature extraction from the remote sensing image, data matrix composition, fuzzy clustering, and cloud area identification and segmentation. For feature extraction, the most basic and direct feature is selected, namely the grayscale feature: the gray values of the R, G and B channels of the thumbnail are taken as three features that directly form a feature vector. The processing is therefore carried out pixel by pixel (point processing). After the data matrix is formed, the most basic fuzzy clustering algorithm, fuzzy C-means (FCM), is used for clustering. The class of each pixel is judged from the membership matrix, and the cloud area is identified and segmented according to the clustering results.
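The four parts can be sketched with a plain fuzzy C-means implementation on RGB pixel vectors. This is a minimal illustration, not the paper's code: it assumes two clusters, a fuzzifier m = 2, a deterministic start from the darkest and brightest pixels, and that the cluster with the brightest centre is cloud. The names `fuzzy_c_means` and `cloud_fraction` are hypothetical.

```python
import math


def _memberships(points, centers, m, eps=1e-9):
    """Membership matrix: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    result = []
    for p in points:
        dists = [max(math.dist(p, c), eps) for c in centers]
        result.append([1.0 / sum((dk / dj) ** power for dj in dists) for dk in dists])
    return result


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=50):
    """Basic FCM; returns (cluster centres, membership matrix)."""
    # Deterministic start: spread initial centres over the brightness range.
    ordered = sorted(points, key=sum)
    step = (len(ordered) - 1) / max(n_clusters - 1, 1)
    centers = [list(ordered[round(k * step)]) for k in range(n_clusters)]
    dims = len(points[0])
    for _ in range(n_iter):
        u = _memberships(points, centers, m)
        for k in range(n_clusters):
            # Centre update: mean of the points weighted by u_ik^m.
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            centers[k] = [sum(w * p[d] for w, p in zip(weights, points)) / total
                          for d in range(dims)]
    return centers, _memberships(points, centers, m)


def cloud_fraction(pixels_rgb):
    """Share of pixels whose strongest membership is the brightest cluster."""
    centers, u = fuzzy_c_means(pixels_rgb, n_clusters=2)
    cloud = max(range(len(centers)), key=lambda k: sum(centers[k]))
    hits = sum(1 for row in u if max(range(len(row)), key=row.__getitem__) == cloud)
    return hits / len(pixels_rgb)


pixels = [(240, 240, 245), (250, 250, 250), (235, 238, 240),
          (30, 60, 25), (35, 70, 30), (28, 55, 20), (40, 80, 35),
          (32, 65, 28), (25, 50, 22), (38, 75, 33)]
print(cloud_fraction(pixels))
```

Each pixel is assigned to the cluster in which it has the largest membership, and the cloud amount is the share of pixels assigned to the cloud cluster.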
For feature selection, considering the importance of color features of multispectral remote sensing images, RGB space is converted to HSV color space for clustering. Experiments show that most data can be effectively identified by taking S and V as features.
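A short sketch of the feature conversion, using the standard-library `colorsys` module: each 8-bit RGB pixel is scaled to [0, 1], converted to HSV, and only S and V are kept as the two clustering features. The function name `sv_features` is illustrative.

```python
import colorsys


def sv_features(pixels_rgb):
    """Convert 8-bit RGB pixels to (saturation, value) feature pairs."""
    features = []
    for r, g, b in pixels_rgb:
        _, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        features.append((s, v))
    return features


# A white, cloud-like pixel and a dark green, vegetation-like pixel.
print(sv_features([(255, 255, 255), (0, 128, 0)]))
```

Cloud pixels tend to have low saturation and high value, while vegetation is more saturated and darker, so the two groups separate well in the S-V plane.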
The advantage of the basic fuzzy clustering algorithm is that it is relatively simple, but it also has shortcomings: the number of clusters must be specified manually, and the algorithm is especially sensitive to outliers (noise). Improved fuzzy C-means clustering in feature space (IKFCM) is able to discover non-convex cluster structures. There are generally two approaches. The first writes the cluster center as a linear combination of the points mapped into the feature space (IKFCM1), which is similar to the approach used in the support vector machine classifier. The second eliminates the center as an intermediate variable (IKFCM2). The improved fuzzy C-means clustering based on kernel distance (IKD-FCM) uses a kernel function to replace the distance function of the original algorithm when defining the objective function, which to a certain extent overcomes the original algorithm's problems of too many parameters and sensitivity to parameter settings. This paper adopts the IKD-FCM algorithm [4,5].
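The idea of replacing the Euclidean distance with a kernel distance can be sketched as follows. The paper does not state which kernel is used; the Gaussian kernel and the parameter `sigma` here are assumptions, and the sketch shows only the distance term, not the full IKD-FCM update rules.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """K(x, v) = exp(-||x - v||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance in the kernel-induced feature space.

    ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v), which reduces to
    2 * (1 - K(x, v)) for the Gaussian kernel because K(x, x) = 1.
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))


center = (0.1, 0.9)
print(kernel_distance_sq((0.1, 0.9), center))   # point at the centre
print(kernel_distance_sq((0.3, 0.7), center))   # nearby point
print(kernel_distance_sq((50.0, 50.0), center)) # extreme outlier
```

With this kernel the distance is bounded above by 2, so an extreme outlier cannot dominate the objective function the way it can with the squared Euclidean distance.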

GF-6 Fast View Similarity Judgment Based on Deep Learning
FaceNet [8,9,10] is used to measure the similarity of remote sensing images. In the actual implementation, the network compares thumbnail blocks rather than whole thumbnails; in this paper, image blocks of 400 to 500 pixels are used for similarity measurement. Triplet loss is used to directly verify, identify and cluster the desired remote sensing images. The squared distance between Gaofen remote sensing images of the same area is small, and the squared distance between Gaofen remote sensing images of different areas is large. The end of the model is trained directly with triplet loss. Traditional loss functions tend to map images that share a class of features to the same space, whereas triplet loss tries to separate one remote sensing image from the others. A triplet consists of three examples: an anchor (benchmark), a positive example (similar) and a negative example (different), whose relationship is judged by distance.
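The triplet criterion can be written out as a small function on embedding vectors. The sketch assumes L2-normalised embeddings and a margin of 0.2, the value used in the original FaceNet work; the margin used for the GF-6 model is not stated in the paper, and the toy vectors below are made up for illustration.

```python
import math


def l2_normalize(vec):
    """Scale an embedding to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(||a - p||^2 - ||a - n||^2 + margin, 0) on normalised embeddings."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(squared_distance(a, p) - squared_distance(a, n) + margin, 0.0)


anchor = [1.0, 0.0]    # block from the reference scene
same = [0.9, 0.1]      # block of the same area from another scene
other = [0.0, 1.0]     # block of a different area
print(triplet_loss(anchor, same, other))   # triplet already well separated
print(triplet_loss(anchor, other, same))   # roles swapped: large penalty
```

The loss is zero once the anchor is closer to the positive than to the negative by at least the margin, so training only pushes on triplets that violate this ordering.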
In the training process, 18 pairs of large-format GF-6 panchromatic remote sensing images were used. Each large-format image is first cut into 16 pairs of 480*480 small-format images, which are placed under separate directories. Each directory holds a pair of similar remote sensing images and three different images, which together form triplets. The training data set includes 256 directories and 768 training images. Random cropping, flipping, rotation, chromaticity change, brightness change, noise interference and other operations are then applied synchronously, and the results are input into the network to train the model. The base network of FaceNet is the Inception ResNet v1 model, trained with the triplet loss function. The network parameters are adjusted to the characteristics of Gaofen remote sensing images, the Adam optimizer is selected, and training runs for a total of 50,000 iterations, with the best classification result saved.
To verify the feasibility of the proposed GF-6 data selection method, a prototype program is designed in this paper. The program includes five modules. The GF-6 metadata storage module realizes batch storage of GF-6 image metadata and fast views. The visual data selection module supports selection of the time and space range through human-computer interaction and screens out the image data that fall within that range. The optimal data screening module for forest land classification produces a data screening scheme according to the method in section 3. The optimal data list generation and output module generates the screening result scheme as a data list and outputs it as an XML file. The data screening task management module manages the one-click data selection process and creates task files and logs for each screening run, so that data screening tasks can be merged, split and resumed from the task files.
The program outputs a list of candidate data product numbers/scene numbers grouped by satellite sensor. The candidate data file first records the selection conditions (sensor, time, four-corner coordinates of the region, regional coverage, cloud cover, image quality value), followed by a list of the selected product numbers separated by commas. At the same time, the candidate data are copied to a directory. Fig. 2 shows the interface of the GF-6 remote sensing data query software; the special data selection conditions of this paper are set in the parameter settings. Fig. 3 shows the results of data selection.
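The structure of such an output file can be sketched with the standard-library `xml.etree.ElementTree` module. The element names (`SelectionResult`, `Conditions`, `SelectedProducts`), the condition keys and the scene ids below are all hypothetical, since the paper does not give the actual XML schema of the prototype.

```python
import xml.etree.ElementTree as ET


def build_selection_xml(conditions, product_ids):
    """Serialise the selection conditions and chosen product numbers."""
    root = ET.Element("SelectionResult")
    cond = ET.SubElement(root, "Conditions")
    for name, value in conditions.items():
        ET.SubElement(cond, name).text = str(value)
    # Selected product numbers are stored as one comma-separated list.
    ET.SubElement(root, "SelectedProducts").text = ",".join(product_ids)
    return ET.tostring(root, encoding="unicode")


xml_text = build_selection_xml(
    {"Sensor": "WFV", "Period": "2019-01-01/2019-09-10", "CloudCoverMax": 20},
    ["SCENE_001", "SCENE_002"],  # placeholder scene ids
)
print(xml_text)
```

Keeping the conditions next to the product list lets a later run reproduce or audit the selection from the file alone.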

Verification test
We queried GF-6 multispectral data covering Wangyedian National Forest Park (Harqin Banner, Chifeng City) from January 1, 2019 to September 10, 2019 with cloud cover below 20%, and obtained 34 fast views with the traditional retrieval method. The image quality judgment and cloud estimation algorithms of this paper were then applied, leaving about 10 usable images. Next, the minimum repeated coverage calculation proposed in this paper showed that 5 of these images can cover the whole area of Wangyedian National Forest Park. The NDVI values calculated for all 5 images are greater than 0.5, so they fall within the range of forest coverage. This completes the optimal data selection for the subsequent classification of forest land. Fig. 4 shows quick views with heavy cloud cover and poor image quality that appear in the results of the common data query. The algorithm in this paper automatically excludes such data from the final selection, because data of this quality cannot be used for subsequent forest land type identification. Fig. 5 shows the quick views of the finally selected data overlaid on the selected area.

Conclusions
At present, satellite data screening mainly relies on manual visual inspection of fast views, which is time-consuming and laborious. Based on metadata and fast views, this paper proposes a fast automatic optimization and screening technology for GF-6 satellite data that meets the data quality requirements of subsequent fine identification of forest land types. Its main purpose is to distinguish similar and duplicate data, which reduces the workload of manual selection and improves work efficiency.
The algorithm in this paper also supports joint data optimization across the domestic Gaofen satellites GF-6 and GF-1/-2. It offers a variety of scheme formulation and scheme adjustment functions, including cloud judgment, selection over national regions and automatic data selection under multiple time period settings. The optimal coverage can be adjusted to guarantee at least one repeated coverage. The program provides a variety of region visualization methods and can output results in various forms, such as by sensor, quality, batch product number and scene serial number, according to the scheme.
The method in this paper is less powerful than the method proposed by Datcu [11], but it is also less complicated and easier to implement. Up to now, only a few scientific research experiments have applied Datcu's method. Compared with existing satellite data selection methods, the proposed method is innovative and has practical value: in the era of massive satellite data, it can greatly reduce the work intensity of manual data selection. The method still needs extensive data testing and wider application. Only through continuous use can stable expectations of its performance be established.