Automated multi-classifier recognition of atmospheric turbulent structures obtained by Doppler lidar

We present algorithms and results of the automated processing of lidar measurements obtained during the VEGILOT measurement campaign in Paris in autumn 2014 to study horizontal turbulent atmospheric regimes at urban scales. To process images obtained by horizontal atmospheric scanning with a Doppler lidar, we propose a method based on texture analysis and classification using supervised machine learning algorithms. The results of parallel classification by several classifiers were combined using a majority voting strategy. The resulting accuracy estimates demonstrate the efficiency of the proposed method for remote sensing of regional-scale turbulent patterns in the atmosphere.


Introduction
Atmospheric turbulence is a key meteorological characteristic, responsible for the dispersion of air pollution and for cloud formation. When wind speed measurements are available with sufficient spatial and temporal resolution, turbulence parameters can be estimated. For example, the ground-based Doppler wind lidar Leosphere WLS100 allows remote measurement of radial wind speed profiles with a temporal resolution of about one profile per second and a spatial resolution of 50 meters.
The measurement database used in this study is based on data from the VEGILOT measurement campaign [1], which was held in Paris in September-October 2014. This campaign was aimed at studying urban atmospheric dynamics, air pollution and turbulent regimes from lidar measurements. More information about the VEGILOT campaign can be found in [1].
Horizontal wind turbulent patterns were calculated from horizontal radial wind scans (see [2,3]). Typical examples of these images are presented in Fig. 1. Three classes of local atmospheric patterns were introduced: Thermals, Rolls and Streaks. An additional fourth class, 'Others', contains patterns that could not be assigned to any of these three turbulence types.
Each type of turbulence structure forms a specific cloud pattern and can be observed in satellite images.

Description of Algorithms
Several thousand images were obtained during the two-month campaign. The following supervised machine learning (SML) algorithms were applied to classify local atmospheric patterns:
• Parzen-Rosenblatt window (PRW);
• K-nearest neighbors (KNN), tested with several numbers of neighbors: K=1, K=3, and K=5;
• Error-correcting output codes with support vector machines (SVM);
• Quadratic discriminant analysis (QDA).
An expert classified 150 patterns to construct the training set [2]. In-situ meteorological and satellite data were used in addition to the lidar measurements. Since SML algorithms require numerical values characterizing the patterns, Haralick texture features were used for image classification [4]. Below we briefly describe how the features were calculated; see [2] for more details.
The following four statistics were calculated: Contrast, Homogeneity, Correlation and Energy. Each of these statistics was calculated at different distances and directions of adjacency. Considering neighbors from the first to the 30th for each statistic yields 30*4 = 120 angular functions [2,4]. Each of these functions was characterized by three properties: amplitude (maximum value minus minimum value), integral, and symmetry.
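As an illustration of the four statistics, the sketch below computes them from a grey-level co-occurrence matrix for a single adjacency offset. It is a minimal example and not the processing chain of [2]: the symmetric normalization, the particular formulas for Contrast, Homogeneity, Correlation and Energy (one common convention), the function names glcm and texture_stats, and the 4x4 toy image are all assumptions made for the example.

```python
from math import sqrt


def glcm(image, dx, dy, levels):
    """Normalized symmetric co-occurrence matrix for the pixel offset (dx, dy).

    `image` is a list of rows of integer grey levels in [0, levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # symmetric: count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_stats(p):
    """Contrast, Homogeneity, Correlation and Energy of a normalized matrix p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    var_i = sum(v * (i - mu_i) ** 2 for i, j, v in cells)
    var_j = sum(v * (j - mu_j) ** 2 for i, j, v in cells)
    cov = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells)
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        # Correlation is undefined for a constant image; NaN is returned then.
        "correlation": cov / sqrt(var_i * var_j) if var_i * var_j > 0 else float("nan"),
        "energy": sum(v * v for i, j, v in cells),
    }


# Toy image with 4 grey levels (illustrative only, not lidar data).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
stats = texture_stats(glcm(image, dx=1, dy=0, levels=4))
print(stats)
```

Repeating the call for several directions at a fixed distance would give one sample of an angular function per direction, from which an amplitude (maximum minus minimum) could then be taken.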
Three additional features were added to those 30*4*3 = 360 textural features, namely the time of the scan, the average wind speed and the cosine fit error [2,3]. Thus, 363 features (predictors) were calculated for each turbulent image.
Since the dimension of the feature vector greatly exceeds the number of patterns in the training set, stepwise forward feature selection with cross-validation was implemented to avoid the curse of dimensionality. The number of optimal features selected [2,5] varies between 2 and 20.
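The idea of forward selection driven by a cross-validated score can be sketched as follows. This is only a schematic illustration under stated assumptions: a 1-nearest-neighbour classifier with leave-one-out cross-validation stands in for the classifiers and cross-validation scheme actually used in [2,5], the stopping rule (stop when the score no longer improves) is our choice for the example, and the data and the names loo_accuracy and forward_selection are hypothetical.

```python
def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the feature indices in `features`."""
    correct = 0
    for i in range(len(X)):
        best_d, best_label = None, None
        for j in range(len(X)):
            if j == i:
                continue  # the held-out sample cannot be its own neighbour
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_d is None or d < best_d:
                best_d, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def forward_selection(X, y, max_features):
    """Greedily add the feature that most improves the cross-validated
    accuracy; stop when no candidate improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy data: only feature 1 separates the two classes (illustrative values).
X = [
    [0.9, 0.1, 0.5],
    [0.1, 0.2, 0.4],
    [0.5, 0.0, 0.9],
    [0.3, 1.0, 0.6],
    [0.8, 1.1, 0.1],
    [0.2, 0.9, 0.5],
]
y = ["A", "A", "A", "B", "B", "B"]

selected, best_score = forward_selection(X, y, max_features=20)
print(selected, best_score)
```

On this toy set the procedure keeps the single informative feature and discards the two noisy ones, which is the behaviour expected when the candidate features far outnumber the training patterns.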
The SML classifiers were combined into a single multiple classification (MC) algorithm using the majority voting strategy [6].
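A minimal sketch of majority voting over the outputs of several classifiers is given below. The predicted labels are invented for the example, and the tie-breaking rule (the tied label proposed by the earliest classifier in the list wins) is an assumption, since ties are not discussed here; which classifiers, and which KNN variants, actually vote in the MC algorithm is described in [2,6].

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among `votes`.

    Ties are broken in favour of the tied label that appears first in
    `votes` (an assumption made for this sketch).
    """
    counts = Counter(votes)
    top = max(counts.values())
    for label in votes:
        if counts[label] == top:
            return label


def combine(predictions):
    """Combine per-classifier label lists into one label per image.

    `predictions` maps a classifier name to its list of predicted labels;
    all lists must have the same length and image order.
    """
    return [majority_vote(votes) for votes in zip(*predictions.values())]


# Hypothetical predictions of four classifiers for three images.
predictions = {
    "PRW": ["Rolls", "Streaks", "Others"],
    "KNN": ["Rolls", "Thermals", "Thermals"],
    "SVM": ["Thermals", "Streaks", "Rolls"],
    "QDA": ["Rolls", "Streaks", "Streaks"],
}
combined = combine(predictions)
print(combined)
```

The third image shows the tie case: all four classifiers disagree, so the result depends entirely on the chosen tie-breaking rule.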

Results and Discussion
The total overall accuracy (TA) score for each classification technique is presented in Table 1. It gives the percentage of correctly classified images based on cross-validation.
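For reference, the overall accuracy of a classifier is the sum of the diagonal of its confusion matrix divided by the total number of classified images. The short sketch below illustrates this; the matrix values are invented for the example and are not results of this study, and the function name overall_accuracy is hypothetical.

```python
def overall_accuracy(confusion):
    """Fraction of correctly classified samples.

    `confusion[i][j]` is the number of samples of true class i that were
    assigned to class j, so correct decisions lie on the diagonal.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Illustrative 4-class matrix (rows/columns: Thermals, Rolls, Streaks, Others).
confusion = [
    [8, 1, 1, 0],
    [1, 7, 1, 1],
    [0, 2, 8, 0],
    [1, 0, 1, 8],
]
ta = overall_accuracy(confusion)
print(f"TA = {100 * ta:.1f}%")
```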
Among the standard SML algorithms applied, QDA has the best TA, followed by SVM and KNN. The PRW algorithm has the lowest TA. This suggests that SML algorithms that produce more complex decision boundaries are less accurate for this classification problem. We also note a significant TA improvement for the proposed MC algorithm. A confusion matrix for MC is presented in Fig. 2A. All types of turbulent patterns are identified with good accuracy. After the learning step, the MC algorithm was applied in the classification step to the complete lidar dataset (test set) of 4557 patterns. Streaks were detected in 23% of cases, Rolls in 10% and Thermals in 17%.
Fig. 2B shows the distribution of turbulence types by time of day (UTC). As expected, Streaks are generally observed during the nighttime, while Thermals and Rolls are detected in the daytime (see [2] for more details).

Conclusions and Perspectives
Based on the results of processing a large lidar dataset, we conclude that the proposed method efficiently solves the problem of classifying turbulent regimes. The comparison of SML algorithms shows that the relatively simple QDA is more accurate than the other SML algorithms, which construct more complicated decision boundaries (PRW, KNN and SVM). A substantial increase in total accuracy can be achieved by combining several SML algorithms in an MC system. This technique was successfully applied to the study of atmospheric turbulence in the Paris region in autumn.
In future studies, we plan to optimize the procedure for combining classifiers and to increase the number of Haralick texture features. The proposed algorithm will be applied to characterize atmospheric turbulence in coastal areas.