CLASSIFICATION OF A QUICKBIRD SATELLITE IMAGE BY MACHINE LEARNING TECHNIQUES: MAPPING AN URBAN ENVIRONMENT BY THE DECISION TREE METHOD

Classification is a crucial stage in satellite image processing that considerably influences the quality of the result. A variety of methods have been proposed in the literature for image classification. They differ in their basic principles, and therefore in the quality of the results they produce, so a comparative study of classification methods is essential. With conventional methods, satellite images can be classified in several ways using different algorithms, which fall into two main categories: supervised and unsupervised. The decision tree, by contrast, is a machine learning tool: a simple model that is easy to understand and interpret. This work aims to classify a high-resolution Quickbird image of an urban area with the decision tree method and to compare it with conventional classification algorithms in order to evaluate its efficiency. The methodology consists of two main stages: classification and evaluation of the results. The evaluation is based on statistical indices derived from the confusion matrix: the kappa coefficient and the overall accuracy.


Introduction
For efficient and sustainable management of urban areas, there is an urgent need for effective monitoring of physical changes over time [1,2,3]. Satellite images can be decisive in managing the growth of cities and infrastructure [4,5]. They are considered one of the most important data sources for land mapping because of their extensive geographical coverage at a reasonable cost and the irreplaceable information they provide on the Earth's surface [6,7,8]. However, the accuracy of the resulting maps depends considerably on the choice and accuracy of the classification method [9].
Choosing the right classification method is not an easy task. The wide range of methods and algorithms available for classifying satellite images gives the analyst a broad choice, but it also complicates the task, since the underlying principles differ completely from one method to another. According to Lu and Weng [10], land cover mapping results depend not only on the suitability of the imagery but also on the choice of classification method. In the literature, classification methods range from unsupervised algorithms (e.g., ISODATA or K-means) [11] to supervised algorithms (e.g., maximum likelihood) and machine learning algorithms such as k-nearest neighbors (KNN), decision trees (DT), support vector machines (SVM) and random forests (RF) [12].
* Corresponding author: o.ameslek@usms.ma
A decision tree consists of a hierarchical series of decisions that determine the correct class. It is made up of a number of decision nodes, each of which assigns a pixel to a class or to a group of classes [13]. One advantage of decision trees is that different data sources and different types of attributes can be integrated at each level of decision [12]. Choosing relevant decisions at each node is essential for an accurate classification; otherwise, errors may accumulate and subcategories of the same class may be difficult to discriminate [14]. In this paper, we use the decision tree method for land cover classification of an urban area covered by a high-resolution Quickbird satellite image.
The main purpose is to implement this machine learning algorithm and evaluate its performance in a relatively complex environment (urban zone) in order to optimize its usage and help researchers better choose between different image classifiers.

Study area and satellite image
Our study area is the city of Rabat (Figure 1), the administrative capital of Morocco. It is located on the Atlantic coast at 33° 1' 31" N, 6° 53' 10" W. This area was chosen because images of it were available and because of the diversity of its features: buildings of different configurations, a dense road network and vegetation cover.
In this study, we used a Quickbird image (Figure 2) from 2007, already corrected and in merged panchromatic and multispectral mode. The image has four spectral bands, whose wavelengths are given in Table 1.
In panchromatic mode the spatial resolution is 0.6 m, and in multispectral mode it is 2.4 m. For this study, we use ENVI (Environment for Visualizing Images), a professional software package from EXELIS for processing optical and radar remote sensing images. It provides the image processing methods needed for geometric and radiometric correction, classification and cartographic layout, as well as tools for visualizing and modelling topographic data.
ENVI is written in IDL (Interactive Data Language) and therefore offers advanced programming capabilities. It is more specialized in multispectral image processing than in the cartographic functions of GIS software, and it integrates raster and vector data easily. It offers several classification algorithms, can easily be extended with additional modules, and its algorithms can be called from external programs in various ways.

The decision tree workflow
The decision tree workflow consists of the following six steps (Figure 3):

Fig. 3. Steps in constructing the decision tree.

Identification of thematic classes
Determining these classes requires a very good knowledge of the actual land use in the image area. Land cover exhibits different spectral signatures at the time of image acquisition, so as many spectral classes must be defined as there are distinct situations within each land use.
The classification algorithm can thus process each spectral signature independently. In this study, six thematic classes were identified on the image (Table 2): roads, buildings, bare ground, trees, pastures and shadow.

Roads
Built of materials that appear dark, with an elongated shape.

Bare ground
Bare land, which appears yellowish.

Trees
Characterized by dense vegetation and a saturated green color, which appears red when the image is viewed in false color.

Pastures
Ground covered with less dense vegetation, which appears lighter red.

Shadow
The darkest parts of the image, usually black.

Identification of attributes and indices
We mainly used the spectral attributes based on the spectral values of the pixels in the different bands of the satellite image. These spectral values are used either directly or to calculate other indices that will facilitate the classification.
• Spectral response in the bands: in our case, it is used to characterize shadow pixels, since shadow has low radiometric values in all four bands of the image.
• Normalized Difference Vegetation Index (NDVI): NDVI is computed from the red (R) and near-infrared (NIR) channels as NDVI = (NIR - R) / (NIR + R), and thus highlights the difference between the visible red and near-infrared responses. It is widely used to discriminate vegetation objects and constitutes a reference value for assessing plant cover. Here, it was chosen to spectrally characterize vegetation and bare soil pixels.

• Road extraction ratio: this ratio is calculated from the blue (B) and near-infrared (NIR) channels of the image and is used to discriminate between the road and building classes. Roads have low values of this ratio, which facilitates their classification.
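The NDVI and the shadow test described above can be sketched per pixel as follows. This is a minimal illustration in Python with NumPy, not the ENVI implementation used in the paper; the shadow threshold `max_value` is a hypothetical parameter, and the bands are assumed to be stacked in a NumPy array of shape (bands, rows, cols). The road extraction ratio is left out because its exact formula is not given in the text.

```python
import numpy as np


def ndvi(red, nir, eps=1e-12):
    """Per-pixel NDVI = (NIR - R) / (NIR + R)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    # Pixels where NIR + R is (almost) zero keep NDVI = 0 instead of dividing by zero.
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=np.abs(denom) > eps)
    return out


def shadow_mask(bands, max_value):
    """True where every band is below max_value (hypothetical threshold).

    bands: array of shape (n_bands, rows, cols), e.g. blue, green, red, NIR.
    """
    stack = np.asarray(bands, dtype=float)
    return np.all(stack < max_value, axis=0)
```

Setting NDVI to 0 where both bands are zero is a convention of this sketch, chosen so that no-data pixels do not raise division warnings.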

Analysis of the discriminating power of attributes
This analysis studies the discriminating power of the different quantifiable attributes. It seeks to associate each attribute with a threshold relative to which the attribute characterizes the property it describes. To do so, the analysis first studies the mathematical formulation and the variability of each attribute [14]. In this study, the thresholds were set based on values used in the literature and on testing. For the NDVI, the thresholds were set from the values reported in [15]; for the other indices and spectral values, they were set after several tests, until satisfactory results were obtained.
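As an illustration of how the testing step could be made systematic, the hypothetical helper below scans candidate thresholds for one attribute and keeps the one that best separates labelled samples of two classes (class A expected at or above the threshold, class B below it). It is a sketch under these assumptions, not the procedure used in the paper, and the NDVI sample values in the usage line are invented.

```python
import numpy as np


def best_threshold(values_above, values_below):
    """Return (threshold, accuracy) that best separates two sample sets.

    values_above: attribute values of class A (expected >= threshold)
    values_below: attribute values of class B (expected < threshold)
    """
    a = np.asarray(values_above, dtype=float)
    b = np.asarray(values_below, dtype=float)
    n = a.size + b.size
    # Every observed value is a candidate threshold (np.unique sorts them).
    candidates = np.unique(np.concatenate([a, b]))
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = np.count_nonzero(a >= t) + np.count_nonzero(b < t)
        acc = correct / n
        if acc > best_acc:  # keep the lowest threshold among ties
            best_t, best_acc = float(t), acc
    return best_t, best_acc


if __name__ == "__main__":
    # Invented NDVI samples: vegetation vs bare soil.
    print(best_threshold([0.4, 0.5, 0.6], [0.05, 0.1, 0.2]))  # → (0.4, 1.0)
```

Scoring by the fraction of correctly separated samples is only one possible criterion; an analyst could equally keep the visual, trial-and-error approach the paper describes.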

Construction of the tree and classification
The decision tree is built node by node: each node tests an index or a spectral value against the threshold judged most discriminating between the two resulting classes or groups of classes. The decision tree was built with the following architecture (Figure 4).

Fig. 4. The decision tree built in ENVI.
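Because the architecture of Figure 4 is not reproduced in the text, the sketch below shows only one plausible shape for such a tree, applied pixel by pixel with NumPy: shadow is tested first on all bands, NDVI then separates vegetation (split into trees and pastures) from the rest, bare ground is separated from built surfaces, and the road extraction ratio finally splits roads from buildings. The node order, class codes and every threshold value are hypothetical, and `road_ratio` is assumed to be precomputed with the paper's formula.

```python
import numpy as np

# Hypothetical class codes; class names follow Table 2.
SHADOW, TREES, PASTURES, BARE_GROUND, ROADS, BUILDINGS = range(6)

# Hypothetical thresholds for illustration only (not the paper's values).
T_SHADOW = 60.0   # shadow: every band below this digital number
T_VEG = 0.3       # vegetation: NDVI above this value
T_TREES = 0.5     # dense vegetation (trees): NDVI above this value
T_BARE = 0.1      # bare ground: non-vegetation with NDVI above this value
T_ROAD = 0.8      # roads: road extraction ratio below this value


def classify(bands, ndvi, road_ratio):
    """Apply a fixed binary decision tree to every pixel.

    bands: array of shape (n_bands, rows, cols)
    ndvi, road_ratio: arrays of shape (rows, cols)
    Returns an array of class codes of shape (rows, cols).
    """
    bands = np.asarray(bands, dtype=float)
    ndvi = np.asarray(ndvi, dtype=float)
    road_ratio = np.asarray(road_ratio, dtype=float)

    # Node 1: shadow is dark in all bands.
    is_shadow = np.all(bands < T_SHADOW, axis=0)
    # Node 2: among non-shadow pixels, NDVI isolates vegetation.
    is_veg = ~is_shadow & (ndvi > T_VEG)
    # Node 3: among the rest, a moderate NDVI marks bare ground.
    is_bare = ~is_shadow & ~is_veg & (ndvi > T_BARE)
    # Remaining pixels are built surfaces (roads or buildings).
    is_built = ~is_shadow & ~is_veg & ~is_bare

    out = np.full(ndvi.shape, BUILDINGS, dtype=np.uint8)
    out[is_shadow] = SHADOW
    # Node 4: vegetation split into dense (trees) and sparse (pastures).
    out[is_veg & (ndvi > T_TREES)] = TREES
    out[is_veg & (ndvi <= T_TREES)] = PASTURES
    out[is_bare] = BARE_GROUND
    # Node 5: built surfaces with a low road extraction ratio are roads.
    out[is_built & (road_ratio < T_ROAD)] = ROADS
    return out
```

Expressing each node as a boolean mask keeps the tree readable while still processing the whole image at once; the masks are mutually exclusive, so each pixel receives exactly one class.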

Accuracy assessment
No result, even if quantitatively proven, will convince if it does not satisfy the human eye [16]. The result is therefore first evaluated visually, by measuring its similarity to the ground truth. Since the image covers an urban area, the comparison is based essentially on the shape and delineation of buildings, roads, trees and pastures; bare ground and shadow are also taken into account. The quantitative evaluation then uses the error report, which contains the confusion matrix and the indices derived from it, namely the overall accuracy and the kappa coefficient. The kappa coefficient generally ranges from 0 to 1 (negative values, indicating agreement worse than chance, are possible), and Table 4 presents the evaluation of the classification result according to its value.

Table 4. Evaluation of the classification result according to the Kappa coefficient.
Kappa        Classification result
0-0.20       Very weak
0.21-0.40    Weak
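Both indices can be computed directly from the confusion matrix. Overall accuracy p_o is the trace divided by the total number of samples; Cohen's kappa corrects it for the chance agreement p_e, estimated from the row and column totals, as kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal NumPy version of these standard definitions, not ENVI's code.

```python
import numpy as np


def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()


def cohen_kappa(cm):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) from a square confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    p_e = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n**2
    return (p_o - p_e) / (1.0 - p_e)
```

For a made-up 2x2 matrix [[45, 5], [5, 45]], the overall accuracy is 0.9 and the chance agreement is 0.5, so kappa = (0.9 - 0.5) / (1 - 0.5) = 0.8; kappa is lower than overall accuracy because it discounts agreement expected by chance.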

Results and discussion
Running the decision tree built in ENVI gives the result shown in Figure 5. Visual interpretation shows that the classified image is very satisfactory and faithful to the reality on the ground: all classes are present and the geometric shapes are well preserved. The quantitative parameters evaluating the decision tree classification are presented in Table 5. The decision tree classification gave an overall accuracy of 90.22% and a kappa coefficient of 88.36%, so the result is considered very good. Compared with the results of several traditional algorithms applied to the same Quickbird image, the decision tree gave a better overall accuracy than K-means, ISODATA, minimum distance and Mahalanobis distance, as shown in Figure 6, and its kappa coefficient also exceeded those of the other algorithms. Moreover, products based on decision trees are easy to use and very visual, and their implementation is very intuitive. A decision tree is an understandable, easily interpretable white-box model thanks to its meaningful and informative graphical representation [18]. It can also handle and combine data from different sources and of different types.

Conclusion
In this paper, the decision tree technique is applied to classify a very high resolution satellite image in order to produce a land cover map. The main objective is to test a machine learning technique and evaluate its performance, and also to help researchers choose among the available image classification techniques. The decision tree proved efficient, with an overall accuracy of 90.22% and a kappa coefficient of 88.36%, outperforming all the traditional classification algorithms it was compared with. These findings provide insights into the selection of classifiers and highlight the value of the decision tree as a machine learning classification tool: a clear model, simple to understand and interpret, that has proven its efficiency in classifying high-resolution satellite images. Finally, for a similar future study, it is recommended to:
• Include texture in the classification process.
• Use other indices and try other thresholds to build the decision tree.
• Use an object-based approach in classification.