Prediction of local particle pollution level based on artificial neural network

Citizens are eager to know the local pollution level so that they can protect themselves from air pollution. Because real-time measurement at every location is very expensive, a statistical model based on an artificial neural network (ANN) is applied in this research. The model estimates the particle pollution level from several influencing factors, including background pollution level, weather conditions, urban morphology and local pollution sources. Monitoring data from regulatory monitoring sites are taken as the background level. Field measurements at 20 locations were conducted to provide the target values for the output layer of the ANN model. The average relative error of prediction compared with measurement is 9.24% for PM10 and 18.90% for PM2.5.


Introduction
In recent years, the air pollution issue has drawn widespread public attention. Citizens are eager to know the local pollution level, i.e. the concentrations of the main air pollutants, which can help them decide how to protect themselves during outdoor activities [1] or whether to use natural ventilation for an energy-efficient and comfortable indoor environment [2]. These data are also very important for the risk assessment of some environment-related diseases [3].
There are two main approaches to acquiring particle concentrations: measurement and prediction. Measurement is the most accurate method because, system errors aside, it directly reflects the true value at the sampling point, but its cost, including equipment, maintenance and labour, is much higher. Alternatively, many models, broadly divided into numerical models and statistical models, have been developed to estimate the dispersion and concentration of particulate matter. Among statistical models, multiple linear regression (MLR) and the artificial neural network (ANN) are the two mainstream approaches.
This research develops a fast prediction model of particle concentrations at any location in an urban area. Variables describing the outdoor weather status, the local pollution sources and the urban features are input into this model.

Artificial neural networks (ANN)
Artificial neural networks (ANN) are computing systems, quite different from conventional algorithms, that are vaguely inspired by the biological neural networks that constitute human brains [4]. A fully connected feedforward neural network consists of the input layer, the hidden layers and the output layer (Fig. 1). The activation $a_j^l$ of the $j$th neuron in the $l$th layer is related to the neurons in the $(l-1)$th layer by the equation:
$$a_j^l = \sigma\left(\sum_{k=1}^{n_{l-1}} w_{jk}^l \, a_k^{l-1} + b_j^l\right)$$
where $a_k^{l-1}$ is the activation of the $k$th neuron in the $(l-1)$th layer, $n_{l-1}$ is the total number of neurons in the $(l-1)$th layer, $w_{jk}^l$ is the weight of the connection from the $k$th neuron in the $(l-1)$th layer to the $j$th neuron in the $l$th layer, $b_j^l$ is the bias of the $j$th neuron in the $l$th layer, and $\sigma(\cdot)$ is the activation function, which determines the nonlinear properties of the network.
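The layer-by-layer relation described above can be illustrated with a minimal forward-pass sketch. The network size, the weights and the choice of tanh as activation function are assumptions made only for illustration; the paper does not state which activation function or architecture was used.

```python
import math


def forward(x, weights, biases, activation=math.tanh):
    """Propagate an input vector through a fully connected feedforward network.

    weights[l][j][k] is the weight from neuron k of the previous layer to
    neuron j of layer l; biases[l][j] is the bias of neuron j in layer l.
    The same activation is applied in every layer, as in the relation above.
    """
    a = list(x)
    for layer_weights, layer_biases in zip(weights, biases):
        a = [
            activation(sum(w * a_k for w, a_k in zip(row, a)) + b)
            for row, b in zip(layer_weights, layer_biases)
        ]
    return a


# Hypothetical 2-3-1 network with made-up parameters (not from the paper).
weights = [
    [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]],  # hidden layer: 3 neurons, 2 inputs
    [[0.7, -0.5, 0.2]],                      # output layer: 1 neuron, 3 inputs
]
biases = [[0.0, 0.1, -0.1], [0.05]]
output = forward([1.0, 2.0], weights, biases)
```

In a regression setting such as concentration prediction, the output layer is often left linear; that variant would only require passing a different activation for the last layer.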

Measurement of real-time pollutant concentrations
To obtain real-time pollutant concentration data, field measurements were carried out in the densely built central urban area of Chongqing from July 2015 to January 2016 (covering summer, autumn and winter). In total, 20 dwellings (i.e. locations) were selected across five central districts, namely Yuzhong District, Jiangbei District, Shapingba District, Yubei District and Jiulongpo District. Monitoring data were collected continuously for 4 to 5 days at each location in succession (84 days in total).
The measured variables include temperature, relative humidity and the concentrations of particulate matter (PM10 and PM2.5). All the monitoring equipment was set up to log data at 1-min intervals, so the collected data can readily be processed for specific purposes.
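One example of such processing is aggregating the 1-min logs to hourly means so that they align with hourly regulatory and weather data. The sketch below assumes simple arithmetic averaging and a hypothetical record layout of (timestamp, value) pairs; the paper does not describe its aggregation procedure.

```python
from collections import defaultdict
from datetime import datetime


def hourly_means(records):
    """Average 1-min (timestamp, value) records into hourly means.

    Returns a dict keyed by the start of each hour. Missing readings
    (value is None) are skipped rather than treated as zero.
    """
    buckets = defaultdict(list)
    for timestamp, value in records:
        if value is None:
            continue
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}


# Hypothetical PM2.5 readings in ug/m3 (illustrative values only).
records = [
    (datetime(2015, 7, 1, 10, 0), 40.0),
    (datetime(2015, 7, 1, 10, 1), 44.0),
    (datetime(2015, 7, 1, 10, 2), None),
    (datetime(2015, 7, 1, 11, 0), 50.0),
]
hourly = hourly_means(records)
```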

Background pollution level
Local emission, dispersion and deposition contribute to the overall air pollution level at the macro scale; conversely, the local air pollution level can be regarded as the overall air pollution level modified by the features that influence the production and movement of pollutants. The background pollution level is related to factors such as socio-economic development, which are not the focus of this study. Therefore, particle pollution monitoring data from regulatory sites are used to reflect the background pollution level. The hourly PM10 and PM2.5 data are obtained from the National Air Quality Real-time Release Platform (http://106.37.208.233:20035/) [5] of the China National Environmental Monitoring Centre. Six sites are selected for the case study. Rather than statistics of those sites (such as the average, maximum or minimum), the data from all the selected sites were entered directly into the predicting dataset. This helps the model learn the spatial associations, so that this group of variables plays the role of spatial interpolation while the other variables capture the local features.
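As a small illustration of entering every site directly instead of summary statistics, the sketch below flattens the hourly readings of six sites into one input row. The site identifiers, the ordering and the values are hypothetical placeholders, not the actual sites used in the study.

```python
# Hypothetical identifiers for the six regulatory sites (not the real names).
SITE_ORDER = ["site_1", "site_2", "site_3", "site_4", "site_5", "site_6"]


def background_features(readings, site_order=SITE_ORDER):
    """Flatten per-site (PM10, PM2.5) readings into one input vector.

    Each site keeps its own pair of columns, in a fixed order, so the model
    can learn how each site relates spatially to the target location.
    """
    features = []
    for site in site_order:
        pm10, pm25 = readings[site]
        features.extend([pm10, pm25])
    return features


# Illustrative hourly readings in ug/m3 for one time step.
readings = {f"site_{i}": (60.0 + i, 35.0 + i) for i in range(1, 7)}
row = background_features(readings)
```

Keeping a fixed site order matters here: the network can only associate a column with a location if that column always comes from the same site.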

Meteorological conditions
Daily and hourly observations are obtained from the China Meteorological Administration (http://data.cma.cn/) [6]. The observation site chosen is Shapingba (57516), which is located in the urban area and is the closest station to the field measurement points. The daily temperature, relative humidity, wind speed, sunshine hours and precipitation are analysed to capture some characteristics of the case-study area. Additionally, the hourly temperature and relative humidity were measured in the measurement campaign mentioned above. The measurement data and official data are compared, and the hourly temperature and relative humidity from the field measurement enter the predicting dataset. Because wind speed and precipitation were not measured on-site, the weather station data are used for these variables.

Urban morphology
Building coverage ratio (BCR) is the percentage of the total area of a target land that is covered by buildings. It indicates the horizontal compactness of the infrastructure and is one of the most commonly used indices for quantifying building density at the land-lot scale [7]:
$$\mathrm{BCR} = \frac{\sum_{i=1}^{n} A_i}{S} \times 100\%$$
where $S$ is the total area of the target land, $A_i$ is the coverage area of building $i$, and $n$ is the total number of buildings in the target land. The building coverage ratio is calculated at different heights to express the building density of the urban form and its variation in the vertical direction, so that a set of values depicts the three-dimensional morphological characteristics of the urban area in more detail.
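A minimal sketch of the building coverage ratio and its vertical profile follows. It assumes that a building contributes its full footprint at every height up to its roof, which is a simplification not stated in the paper; the building list, land area and height levels are hypothetical.

```python
def building_coverage_ratio(footprints_m2, land_area_m2):
    """Percentage of the target land covered by the given building footprints."""
    return 100.0 * sum(footprints_m2) / land_area_m2


def bcr_profile(buildings, land_area_m2, heights_m):
    """BCR at several heights.

    buildings is a list of (footprint_m2, height_m). Assumption: a building
    counts at level h only if its roof is at or above h.
    """
    return {
        h: building_coverage_ratio(
            [area for area, top in buildings if top >= h], land_area_m2
        )
        for h in heights_m
    }


# Hypothetical plot with three buildings (values for illustration only).
buildings = [(2000.0, 10.0), (3000.0, 30.0), (5000.0, 60.0)]
profile = bcr_profile(buildings, land_area_m2=250000.0, heights_m=[0.0, 20.0, 50.0])
```

For this made-up plot the ratio falls from 4.0% at ground level to 2.0% at 50 m, which is the kind of vertical change the set of BCR values is meant to capture.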

Local pollution sources
Roads are one of the sources of pollutant emission in an urban area. Statistics on transportation facilities and information from the real-time road condition release platform are used to represent the pollutant emission level of the local area and its surroundings. The transportation facilities are identified from the satellite image provided by Google Earth Pro (version 7.3.2) taken during the field measurement period (21st Oct. 2015). The length of each road within the 500 m × 500 m buffer area centred on the sampling point is measured, and the number of lanes of each road is counted.
A large amount of dust generated at a construction site can be carried over a wide area for a long period of time. Construction sites within 500 m of the sampling point are also identified from the satellite image provided by Google Earth Pro. The area of the construction sites and their distance from the sampling point are input into the model as estimators of local dust emissions. If no construction site appears in the surrounding area, the area of construction sites is set to 0 m2 and the distance is set to 10 km.
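The encoding of the construction-site inputs, including the default values used when no site is present, can be sketched as follows. How several sites around one sampling point are combined is not stated in the paper; summing the areas and taking the nearest distance is an assumption made here for illustration.

```python
NO_SITE_AREA_M2 = 0.0          # area used when no construction site is nearby
NO_SITE_DISTANCE_M = 10_000.0  # 10 km, distance used when no site is nearby


def construction_features(sites, radius_m=500.0):
    """Return (total area in m2, nearest distance in m) for nearby sites.

    sites is a list of (area_m2, distance_m) pairs. Sites farther than
    radius_m from the sampling point are ignored. Aggregating by total area
    and nearest distance is an illustrative assumption.
    """
    nearby = [(area, dist) for area, dist in sites if dist <= radius_m]
    if not nearby:
        return NO_SITE_AREA_M2, NO_SITE_DISTANCE_M
    total_area = sum(area for area, _ in nearby)
    nearest = min(dist for _, dist in nearby)
    return total_area, nearest


# Hypothetical surroundings of two sampling points.
with_sites = construction_features([(1200.0, 320.0), (800.0, 150.0), (5000.0, 900.0)])
without_sites = construction_features([])
```

Using a large finite distance instead of a missing value keeps the input numeric while signalling that any construction dust source is far away.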

Model evaluation
The effectiveness of the prediction can be evaluated by statistics that measure how well the observed outcomes are replicated by the model. The root mean square error (RMSE) and the mean absolute error (MAE) are the most common indicators used with prediction models. RMSE uses the square root of the second sample moment of the differences between predicted values and measured values to represent the overall accuracy, while MAE is the average of the absolute differences:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - M_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|P_i - M_i\right|$$
where $P_i$ is the $i$th predicted value, $M_i$ is the $i$th measured value, and $n$ is the size of the datasets being compared. The Pearson correlation coefficient ($r$), a value between -1 and +1, is a measure of the linear correlation between predicted values and measured values:
$$r = \frac{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)\left(P_i - \bar{P}\right)}{\sqrt{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}}$$
where $\bar{M}$ is the average of the measured values, and $\bar{P}$ is the average of the predicted values.
In total, 40 predicting variables, covering time periodicity, background pollution level, weather conditions, urban morphology and local pollution sources, are considered in this model. With all of them included (scheme SC0), it is a spatial interpolation model that comprehensively considers the local divergence, including meteorological conditions, urban morphologies and emission sources. Three other input variable schemes are put forward to discuss the impact of the inputs on the prediction performance. SC1 omits only the background pollution level; it can be used when the real-time pollutant concentrations at surrounding locations are not known. SC2, which considers the meteorological conditions and local pollution sources, is the most common input variable scheme in previous studies. SC3 uses only the officially released data measured at the 6 regulatory sites, which amounts to applying the ANN to spatial interpolation alone.
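The evaluation statistics named in this section (RMSE, MAE and the Pearson correlation coefficient) can be computed as in the sketch below. It uses the standard textbook definitions of these statistics; the sample values are invented for illustration and are not results from the study.

```python
import math


def rmse(predicted, measured):
    """Root mean square error between predicted and measured values."""
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)) / n)


def mae(predicted, measured):
    """Mean absolute error between predicted and measured values."""
    n = len(measured)
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / n


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between predicted and measured values."""
    n = len(measured)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((m - mean_m) * (p - mean_p) for p, m in zip(predicted, measured))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Illustrative concentrations in ug/m3 (not data from the paper).
predicted = [2.0, 4.0, 6.0]
measured = [1.0, 5.0, 6.0]
scores = (rmse(predicted, measured), mae(predicted, measured), pearson_r(predicted, measured))
```

RMSE penalises large errors more strongly than MAE, so reporting both gives a sense of whether the error is dominated by a few poorly predicted hours.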

Results
The predictions of the ANN model with background pollution level, weather conditions, urban morphology and local pollution sources as inputs represent the measured data well (Fig. 2). The root mean square error is 13.12 μg m⁻³ for PM10 and 13.44 μg m⁻³ for PM2.5. There is a linear relationship between predicted values and measured values, with a Pearson coefficient of 0.937 for PM10 and 0.948 for PM2.5. The bias is very small for PM10, but there is a certain negative bias for PM2.5 (Table 1 and Table 2); positive errors appear at higher concentrations, while the negative bias arises mainly at lower concentrations.
The prediction performance is also compared with that of the other three input variable schemes (Table 1, Table 2 and Fig. 3). SC1, which omits the background pollution level, also performs very well, with a Pearson coefficient of 0.948 for PM10 and 0.927 for PM2.5. This input scheme can be used to predict the pollution level when the real-time pollutant concentrations at surrounding locations are not known. SC2 considers the meteorological conditions and local pollution sources; without the urban morphology information, its prediction performance is lower. The worst performance appears with SC3, which uses only the officially released data measured at the 6 regulatory sites: the root mean square error reaches around 20 μg m⁻³, and the Pearson coefficient is only 0.851 for PM10 and 0.708 for PM2.5. This result indicates that applying the ANN to spatial interpolation alone is relatively limited; local information relevant to the generation and dispersion of air pollutants must also be considered. In the distribution of the relative error of the predicted values compared with the measured values for PM10 and PM2.5, the relative error is most concentrated around 0 for SC0 and most scattered for SC3.
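The relative error discussed here can be computed per sample as in the following sketch. The paper does not give its exact formula, so the definition used below (signed difference divided by the measured value, in percent, with the average taken over absolute values) is an assumption; the numbers are invented for illustration.

```python
def relative_errors(predicted, measured):
    """Signed relative error of each prediction, in percent of the measurement."""
    return [100.0 * (p - m) / m for p, m in zip(predicted, measured)]


def mean_absolute_relative_error(predicted, measured):
    """Average magnitude of the relative error, in percent."""
    errors = relative_errors(predicted, measured)
    return sum(abs(e) for e in errors) / len(errors)


# Illustrative concentrations in ug/m3 (not data from the paper).
predicted = [55.0, 90.0, 120.0]
measured = [50.0, 100.0, 120.0]
errors = relative_errors(predicted, measured)
average_error = mean_absolute_relative_error(predicted, measured)
```

A histogram of the signed errors for each input scheme would show the concentration around 0 for a well-performing scheme and the wider scatter for a poorer one.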

Conclusion
A fast estimation model of particle pollution is presented that considers several influencing factors, including background pollution level, weather conditions, urban morphology and local pollution sources. It is a statistical model based on an artificial neural network.
The background pollution level from regulatory monitoring sites stands in for most of the macro-scale development factors that influence the overall pollution level. Local pollution sources directly indicate how much particle pollution is generated locally. The weather conditions may influence the whole process of particle contamination: generation, dispersion, transformation and deposition. Urban morphology is described using generic indices of the urban texture, which affects the dispersion and deposition of particulate matter. All these factors are input into the ANN model, and the estimation performance is validated with a testing dataset. The model can be used for spatial interpolation of particle concentrations. With suitable adaptations, it can also be used as an operational tool for air quality forecasting in other dense urban areas.