Prediction of PM10 Level During High Particulate Event in Malaysia Using Modified Model

Particulate matter (PM10) is one of the key indicators of the Air Pollutant Index (API) during high particulate events (HPE). PM10 can cause adverse effects on human health and the environment; hence, it is important to develop a reliable and accurate predictive model that can serve as a forecasting tool to alert citizens, especially during HPE. This study aims to develop a modified Quantile Regression (QR) model to forecast the PM10 concentration during HPE in Malaysia. The performances of three predictive models were compared: Multiple Linear Regression (MLR), Quantile Regression (QR) and a modified QR model, i.e. a combination of QR with a Relief-based algorithm. The hourly dataset of PM10 concentration, together with other gaseous pollutants and weather parameters at Klang, for the years with severe haze events in Malaysia (1997, 2005, 2013 and 2015) was obtained from the Department of Environment (DOE) Malaysia. Three performance measures, namely Mean Absolute Error (MAE), Normalised Absolute Error (NAE) and Root Mean Squared Error (RMSE), were calculated to evaluate the accuracy of the predictive models. This study found that the Relief-QR model showed the best performance compared to the MLR and QR models. The prediction of future PM10 concentration is important because it can help local authorities implement precautionary measures to limit the impact of air pollution.


Introduction
Malaysia has experienced air pollution problems for over a decade as a result of high particulate events (HPE) originating from its neighbouring country, Indonesia. The occurrence of HPE is not exceptional in Malaysia: it was first recorded in 1982, when regional haze from biomass burning disrupted daily life in Malaysia [1]. Since then, several episodes of HPE have been reported in which the concentration of particulate matter (PM) with an aerodynamic diameter of less than 10 μm (PM10) greatly exceeded the Malaysian Air Quality Guideline for PM10 (150 µg m⁻³ for a 24-hour average) at one or more locations across Malaysia. In most years, Malaysian air quality was influenced by the occurrence of dense HPE episodes. A study of air quality in Kuala Lumpur [2] found that the smoke haze was associated with high levels of suspended micro particulate matter, but with relatively low levels of gaseous pollutants such as carbon monoxide, nitrogen dioxide, sulphur dioxide and ozone. A series of HPEs was recorded in peninsular Malaysia, Sabah and Sarawak in 1991, in 1994, and during September and October of 1997, resulting from the significant amounts of particulate matter transported by south-westerly winds from the neighbouring country due to uncontrolled biomass burning activities. Such burning is common at some poorly managed disposal sites and results in smoke and fly ash problems. Large-scale forest and plantation fires, mainly in southern Sumatra and central Kalimantan in neighbouring Indonesia, contributed to the 1997 occurrence. The Department of Environment (DOE) Malaysia reported further HPE episodes in Malaysia, with severe incidents recorded in 2005, 2013 and 2015 [3]. The crisis affected not just Malaysia but also other neighbouring countries such as Singapore and Brunei. Health problems such as respiratory and cardiovascular diseases and an increased mortality rate have long been linked to long-term exposure to PM10 [4], [5]. The prediction of PM10 can provide good insight, allowing the government and authorities to plan appropriate mitigation actions to minimize the health issues arising from exposure to PM10.
Over the past decade, several studies have been conducted to predict air quality. However, the majority of these studies were restricted to utilizing a statistical approach. For example, the studies in [6]-[8] forecasted PM10 levels in the East Coast of peninsular Malaysia during various monsoons, analysing their variation under usual ambient atmospheric conditions by developing multiple linear regression (MLR) models based on various site backgrounds. A study on the distribution of ozone in Athens via quantile regression (QR) was conducted by [9]. The results of that study showed that the influence of the independent variables varies over the quantile distribution of ozone, and that the nonlinear relationship between ozone and the independent variables can be delineated by using QR. The number of inputs was not optimized in most of these studies. Therefore, this study aims to compare different approaches for selecting significant input variables before selecting the best one to predict the PM10 concentration.
Various studies have implemented models to forecast air pollutants under usual conditions, but there is a lack of studies that predict air pollutants, especially PM10, specifically during HPE. This study focuses on developing a hybrid model to forecast PM10 concentrations specifically during HPE occurrences in Malaysia, by combining the QR approach with a Relief-based method. The development of single MLR and QR models, along with a hybrid model combining QR and a Relief-based operator, was aimed at exploring different methodologies to forecast PM10 concentrations during haze episodes. However, the models exhibited a degree of bias due to the variation in weighting strategies and model complexity. The observed bias underscores the importance of acknowledging potential divergences in modelling approaches. The model developed will help local authorities take precautionary measures so that the public can avoid or minimize exposure to unhealthy PM10 levels, and introduce necessary actions aimed at improving air quality.

Dataset
The hourly dataset for Klang, located in the west coast region of peninsular Malaysia, was obtained from the Department of Environment (DOE) Malaysia. The dataset consisted of PM10 concentration and gaseous pollutants, namely nitrogen oxides (NOx), sulphur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3) and carbon monoxide (CO). Meteorological parameters such as wind speed, temperature and humidity were also included in the dataset. The hourly data were taken from the years 1997, 2005, 2013 and 2015, when severe haze was recorded in Malaysia.

Feature Selection
In this study, feature selection, which reduces the number of input variables during the development of a predictive model, was carried out using the filter method. It picks and retains only the most significant features from the dataset. The Relief-Based Algorithm (RBA) was utilized in this study. RBA is a group of algorithms that select the most informative features from high-dimensional datasets based on their ability to distinguish between different classes [10]. The primary principle of "Relief" is to assess the quality of features by evaluating how effectively their values distinguish between cases of the same and different classes that are in close proximity to each other. Relief assesses the relevance of a feature by sampling examples and comparing the feature's value for the current example with its value for the nearest example of the same class and the nearest example of a different class. Relevant parameters were selected using the RBA approach prior to the modelling of PM10. The datasets were evaluated with the weight by Relief operator in RapidMiner software, which computes an attribute weight for each parameter involved. The computed weights are normalized into the interval between 0 and 1 if the normalize weights parameter is set to true.
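As a rough illustration of the nearest-hit and nearest-miss principle described above, the following minimal Python sketch implements the basic Relief weighting for numeric features and class labels. It is not the RapidMiner operator used in this study: the function name relief_weights, the range-scaled Manhattan distance, the min-max normalization and the toy data are all assumptions made for illustration. Because PM10 is a continuous target, a regression variant of Relief would be needed in practice.

```python
def relief_weights(X, y, normalize=True):
    """Basic Relief weights (illustrative sketch, hypothetical helper).

    X: list of rows of numeric feature values, y: list of class labels.
    Every instance is visited once instead of being sampled at random.
    """
    n, d = len(X), len(X[0])

    # Scale feature differences by the feature range so they are comparable.
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))     # nearest same-class case
        miss = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class case
        for j in range(d):
            # Reward features that differ on the miss and agree on the hit.
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n

    if normalize:
        # Assumed min-max scaling of the weights into [0, 1].
        lo, hi = min(weights), max(weights)
        weights = [(w - lo) / (hi - lo) if hi > lo else 0.0 for w in weights]
    return weights


# Toy data: feature 0 separates the two classes, feature 1 does not.
X_demo = [[0.10, 0.50], [0.20, 0.90], [0.15, 0.10],
          [0.90, 0.45], [0.80, 0.95], [0.85, 0.05]]
y_demo = [0, 0, 0, 1, 1, 1]
print(relief_weights(X_demo, y_demo))
```

In this toy case the class-separating feature receives the larger weight, which is the behaviour a filter method relies on when ranking inputs.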

Prediction Model
In this study, PM10 concentrations for the next day (PM10+24), the next two days (PM10+48) and the next three days (PM10+72) were forecasted. The hourly data of the PM10 concentrations, gaseous pollutants and weather parameters were divided into training and testing datasets. The training dataset was used to develop the prediction model, while the testing dataset was used in the model validation process. The training dataset consists of 80 percent of the data, while the remaining 20 percent was used for validation purposes. Three predictive models were developed: Multiple Linear Regression (MLR), Quantile Regression (QR) and a hybrid model (Relief-QR).
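The data preparation described above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the lead targets are formed by pairing the predictors at hour t with the PM10 value 24, 48 or 72 hours later, and the 80/20 split is taken in time order. The source does not state whether the split was chronological or random, and the helper names make_lead_dataset and split_train_test, as well as the synthetic series, are hypothetical.

```python
def make_lead_dataset(pm10, features, lead_hours):
    """Pair the predictors at hour t with PM10 at hour t + lead_hours."""
    n = len(pm10) - lead_hours
    X = [features[t] for t in range(n)]
    y = [pm10[t + lead_hours] for t in range(n)]
    return X, y


def split_train_test(X, y, train_fraction=0.8):
    """Assumed chronological split: first 80 percent for training."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


# Synthetic hourly series standing in for the real monitoring data.
pm10_series = [float(v) for v in range(100)]
feature_rows = [[v] for v in pm10_series]

for lead in (24, 48, 72):
    X, y = make_lead_dataset(pm10_series, feature_rows, lead)
    X_train, y_train, X_test, y_test = split_train_test(X, y)
    print(lead, len(X_train), len(X_test))
```

A time-ordered split keeps future observations out of the training set, which may matter for hourly series; a random split would be an equally valid reading of the text.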
MLR is a widely used forecasting approach that predicts the outcome of a dependent variable by fitting a linear equation to observed data, considering the values of two or more independent variables. It is among the most commonly employed methods for making predictions in various fields.
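For reference, the general form of an MLR model for this forecasting task can be written as below. The notation is an assumption introduced here, not taken from the source: x_{i,t} denotes the i-th predictor at hour t, k the number of predictors, h the lead time in hours, and the coefficients β are typically estimated by least squares.

```latex
\mathrm{PM}_{10,\,t+h} = \beta_0 + \sum_{i=1}^{k} \beta_i \, x_{i,t} + \varepsilon_t ,
\qquad h \in \{24, 48, 72\}
```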
QR was used to develop a model to predict the PM10 concentration at each study area. It is an extension of median regression in which the parameter vector β is estimated, from the range of acceptable vectors, as the one that minimizes the mean loss function. Quantile regression models the relation between a set of independent variables and specific percentiles of a dependent variable. A series of coefficients and equations at several quantiles was produced using this approach. Consequently, it gives a clear picture of how the predictors affect PM10 concentrations at each quantile. This study adopted 9 quantiles (0.1 to 0.9 with an increment of 0.1), and thus 9 equations were generated. The quantile that exhibited the best performance was selected to develop the hybrid model.
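The loss minimized in quantile regression is commonly written as the check (pinball) loss, which weights under-predictions by p and over-predictions by 1 - p for a quantile p. The Python sketch below shows this loss and the nine-quantile grid; the function names and the toy data are assumptions for illustration, and the constant-only fit stands in for a full regression on the predictors.

```python
def pinball_loss(y_true, y_pred, p):
    """Mean check loss for quantile p (0 < p < 1)."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        u = y - f
        # Under-prediction (u >= 0) costs p * u, over-prediction costs (1 - p) * |u|.
        total += p * u if u >= 0 else (p - 1.0) * u
    return total / len(y_true)


# The nine quantiles adopted in this study: 0.1, 0.2, ..., 0.9.
QUANTILES = [round(0.1 * k, 1) for k in range(1, 10)]


def best_constant(y, p):
    """Observed value that minimizes the check loss when used as a constant forecast."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), p))


toy = [float(v) for v in range(1, 12)]  # 1.0, 2.0, ..., 11.0
print(QUANTILES)
print(best_constant(toy, 0.4))
```

Replacing the constant forecast with a linear function of the predictors and minimizing the same loss over its coefficients gives one regression equation per quantile.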
The hybrid model was developed by combining two models. QR models were combined with the Relief-based algorithm to forecast PM10+24, PM10+48 and PM10+72. The hybrid model is expected to improve the accuracy and reduce the error of the prediction model. Fig. 1 illustrates the procedures involved in obtaining the best prediction model.

Performance Indicator
Performance indicators (PI) based on the model's error, namely Root Mean Squared Error (RMSE), Normalized Absolute Error (NAE) and Mean Absolute Error (MAE), were used to evaluate the prediction model for the PM10 concentration at each study location. The best method for forecasting PM10 concentration was chosen based on the lowest error value for each PI.
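The three indicators can be sketched in Python as follows. The source does not give the formulas, so commonly used definitions are assumed here: MAE as the mean absolute difference, NAE as the sum of absolute differences divided by the sum of observed values, and RMSE with a divisor of n (some air quality studies use n - 1 instead). The observed and predicted values are made-up numbers for illustration only.

```python
import math


def mae(observed, predicted):
    """Mean Absolute Error."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def nae(observed, predicted):
    """Normalized Absolute Error (assumed: absolute error sum over observed sum)."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / sum(observed)


def rmse(observed, predicted):
    """Root Mean Squared Error (assumed divisor n)."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Made-up PM10 values, not results from this study.
obs = [100.0, 150.0, 200.0]
pred = [110.0, 140.0, 230.0]
print(mae(obs, pred), nae(obs, pred), rmse(obs, pred))
```

For all three indicators, values closer to zero indicate a more accurate model, so the model with the lowest values is preferred.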

Results and Discussion
Table 1 shows the performance measures for PM10+24, PM10+48 and PM10+72 in Klang, Malaysia. The Relief-QR prediction model performs well in predicting the PM10 level for three consecutive days during HPE. The model was compared with the MLR and QR models. The proposed model achieved the lowest error compared to the MLR and QR models in terms of MAE, NAE and RMSE. For the Relief-QR model, the numbers 1 to 8 denote the parameters selected in Klang by the weight by Relief method, as shown in Table 2. It was found that only CO, O3 and SO2 were the significant parameters in developing the best predictive model in Klang.

Conclusion
The hourly air quality parameters in Klang, situated on the west coast of peninsular Malaysia, during the severe haze events in 1997, 2005, 2013 and 2015 were investigated. The goal of this study was to develop a modified Quantile Regression (QR) model to forecast the PM10 concentration during HPE in Malaysia. The performance of the Relief-QR model in predicting the next-day (PM10+24), next-two-day (PM10+48) and next-three-day (PM10+72) PM10 levels was assessed. Significant parameters were chosen to develop the PM10 predictive model using feature selection, i.e. the Relief-based method. These models were compared with QR and MLR. MAE, RMSE and NAE were used to evaluate the performances of the models. The Relief-QR model at p = 0.4, with CO, O3 and SO2 selected as the significant parameters, showed the best performance for the prediction of the PM10 level in Klang for the next day and the next two days. Meanwhile, Relief-QR at p = 0.4, with CO and O3 selected as the significant parameters, was chosen as the best model in Klang for the next-three-day PM10 prediction. Thus, the results indicate that the Relief-QR method can be a reliable method for predicting air quality specifically during HPE.

Fig. 2 to Fig. 4 illustrate the accuracy of all the prediction models for the next-day, next-two-day and next-three-day PM10 levels in Klang. The proposed hybrid method reduced the calculated error for the prediction of PM10 concentration for the three consecutive days compared to MLR and QR at p = 0.4. Hence, the proposed method can be considered the most accurate of the three predictive models for estimating the PM10 level during HPE or haze events.

Table 1. Performance measures of prediction models in Klang.

Table 2. Parameters selected from weight by relief approach in Klang.