Long-term forecasting of climatic parameters using parametric and non-parametric stochastic modelling

Climatic parameters fluctuate dynamically, and their fluctuations become more significant as the influence of climate change increases. A robust model that can factor in recent climate change when forecasting climatic parameters over the long term is desirable for the strategic planning of future anthropogenic activities. In this study, two stochastic time series models, namely the seasonal auto-regressive integrated moving average (SARIMA) model and the artificial neural network (ANN) model, are used to predict monthly mean temperature (Tmean), relative humidity (RH), wind speed (u) and pan evaporation (Epan) up to 12 months ahead. The study uses data collected from three meteorological stations in the northern region of Peninsular Malaysia. The stochastic models forecasted Tmean with the highest accuracy, followed by RH, u and Epan. Moreover, the accuracy of the models remains consistent as the forecast horizon increases from 1 to 12 months. However, both models are susceptible to the occurrence of extreme climatic conditions. In general, the SARIMA model performs better than the ANN model, probably because it explicitly accounts for the seasonality of the climatic data rather than depending solely on black-box computation.


Introduction
The effect of climate change has become increasingly prominent over the last few decades. Its consequences are witnessed in many aspects of the natural systems, including the weather, agriculture, ecosystems and hydrology [1]. The Intergovernmental Panel on Climate Change (IPCC) observed that from the year 2002 to 2017, the global mean temperature experienced an increase of 0.5 °C, with the increase in land surface temperature exceeding this mean value by another 0.5 °C [2]. In the same report, the IPCC stressed that the degree of climate change is highly correlated with anthropogenic activities through the emission of greenhouse gases (GHG) as well as land use. It is now clear that in the near future, human activities have to be planned carefully and strategically in an effort to curb the rate of climate change. Numerous models and simulations known as global climate models (GCMs) have been proposed to project the future climate as a cautionary note against aggressive development. These models are widely used in forecasting dry spells, precipitation and temperature [3][4][5].
Although the GCMs have wide spatial applications, experts have noted that such modelling and simulation work still carries unavoidable uncertainties. These uncertainties arise from different sources, including the downscaling, natural variability and the model itself [6]. Besides, the use of the GCMs requires the users to arbitrarily assume a representative concentration pathway (RCP), which describes the trajectory of greenhouse gases up to the year 2100 [7]. While predicting the true pathway is not possible, simulating GCMs over all the pathways could be computationally expensive and impractical, not to mention the variety of GCMs that perform differently under different conditions. A robust and simple approach is needed.
In this study, parametric and non-parametric stochastic models are used to perform long-term forecasting of climatic parameters in Peninsular Malaysia. The use of stochastic models does not require an RCP as input; the models produce future projections based solely on a historical time series. The auto-regressive integrated moving average (ARIMA) is a traditional time series model that combines auto-regression with a moving average. It includes a differencing term to transform a non-stationary time series into a stationary one before performing subsequent analysis [8]. The parametric ARIMA model weights the historical time series differently, whereby newer data are given higher weightage. Nonetheless, the seasonal ARIMA (SARIMA) is reported to be more effective in forecasting climatic parameters because it is able to deal with seasonal fluctuations [9]. The SARIMA model has proved its suitability in many applications, including the forecasting of temperature and wind speed [10,11].
In contrast, the representative non-parametric stochastic model is the artificial neural network (ANN). The ANN is classified as a non-parametric model because of its black-box operation, in which the weights given to the historical data are initialised randomly and then learned during training. The non-linearity in the ANN allows it to better adapt to complex processes and problems such as the climatic parameters [12]. Time series modelling of climatic processes using ANN is common [13]. The originality of this research work lies in comparing the performances of parametric (represented by SARIMA) and non-parametric (represented by ANN) stochastic modelling in forecasting multiple climatic parameters using univariate time series data in a region with a tropical climate. The output of this research work could provide decision makers with a robust and simple forecasting strategy that eliminates the need to simulate the complex and uncertain GCMs.

Study Area and Data
Three meteorological stations in the northern region of Peninsular Malaysia are selected for this study. The three stations share a similar characteristic in that all of them are located in coastal areas. The three stations are Station 48600 (Pulau Langkawi), Station 48601 (Bayan Lepas) and Station 48615 (Kota Bharu). The details of the stations are provided in Table 1, whereas the exact locations of the stations are shown in Fig. 1. Four types of monthly climatic data, namely the mean temperature (Tmean), relative humidity (RH), wind speed (u) and pan evaporation (Epan), are obtained from the Malaysia Meteorological Department for the period from 2002 to 2017. In order to assess the long-term forecasting ability of the SARIMA and ANN models, data from 2002 to 2016 are used for training and modelling, whereas the data for 2017 are used for comparison with the forecasted climatic data. In other words, the SARIMA and ANN models developed in this study should be able to predict the values of the climatic parameters up to 12 months ahead of the current time.
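The split described above can be sketched as follows. This is a minimal illustration only: the helper names `monthly_index` and `split_by_year` are hypothetical, and the values are placeholders rather than station data.

```python
from datetime import date


def monthly_index(start_year, end_year):
    """Return the first day of every month from start_year to end_year inclusive."""
    return [date(y, m, 1) for y in range(start_year, end_year + 1) for m in range(1, 13)]


def split_by_year(months, values, test_year):
    """Split a monthly series into training data (before test_year) and hold-out data (test_year)."""
    train = [v for d, v in zip(months, values) if d.year < test_year]
    test = [v for d, v in zip(months, values) if d.year == test_year]
    return train, test


# Placeholder values only (NOT station data): one number per month, 2002 to 2017.
months = monthly_index(2002, 2017)
values = [float(i) for i in range(len(months))]

train, test = split_by_year(months, values, test_year=2017)
print(len(train), len(test))  # 180 12
```

With 16 years of monthly records, 180 months are available for model building and the final 12 months are held out for comparison with the forecasts.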

Seasonal Auto-Regressive Integrated Moving Average
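The SARIMA model can be written in the general multiplicative form of Equation 1.

\[ \Phi_P(B^{\omega})\,\phi_p(B)\,\nabla_{\omega}^{D}\,\nabla^{d} X_t = \Theta_Q(B^{\omega})\,\theta_q(B)\,\varepsilon_t \qquad (1) \]

where X_t is the stochastic climatic parameter, ε_t is the normal random variable, B is the backshift operator, ω is the seasonal period, Φ is the seasonal autoregressive operator, ϕ is the non-seasonal autoregressive operator, ∇_ω^D is the seasonal differencing operator, ∇^d is the non-seasonal differencing operator, Θ is the seasonal moving average operator and θ is the non-seasonal moving average operator. The subscripts (p, d, q) and (P, D, Q) denote the orders of the non-seasonal and seasonal components, respectively. The parameters of the SARIMA model are tuned using the trial and error method.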

Artificial Neural Network
The non-parametric ANN, in the form of a multilayer perceptron (MLP), works on the principle that a different weightage is given to every input into the network. The weighted input is then transformed using the tangent-sigmoid activation function to determine the strength of the output of the hidden neurons. The final output of the ANN is the summation of the values output by the hidden neurons. In this study, the ANN is trained using the Levenberg-Marquardt algorithm, and the optimum number of hidden neurons is determined by the grid search method with the minimisation of the prediction error. Time lags of six months are used to ensure that a meaningful time series is fed into the model. The mathematical expression of the ANN is shown in Equation 2.

\[ y = f\left( \sum_{i=1}^{n} w_i x_i + b \right) \qquad (2) \]

where f is the activation function, w is the weight term, b is the bias term, n is the number of inputs, x is the input and y is the output. The general structure of the developed ANN models is described in the literature [12].
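The neuron computation and the six-month lag structure can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the weights are arbitrary illustrative numbers rather than trained values, the Levenberg-Marquardt training step is not shown, and the tangent-sigmoid function is taken to be the hyperbolic tangent, to which it is mathematically equivalent.

```python
import math


def make_lagged_samples(series, n_lags=6):
    """Turn a series into (inputs, target) pairs: n_lags past values -> next value."""
    samples = []
    for t in range(n_lags, len(series)):
        samples.append((series[t - n_lags:t], series[t]))
    return samples


def neuron(x, w, b, f):
    """One neuron: y = f(sum_i w_i * x_i + b)."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer with tanh activation, then a linear weighted sum at the output."""
    hidden = [neuron(x, w, b, math.tanh) for w, b in zip(hidden_weights, hidden_biases)]
    return neuron(hidden, output_weights, output_bias, lambda z: z)


# Placeholder series (NOT station data); eight values give two samples with six lags.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
samples = make_lagged_samples(series, n_lags=6)

# Illustrative, untrained weights for two hidden neurons (assumed values).
hidden_weights = [[0.1] * 6, [-0.05] * 6]
hidden_biases = [0.0, 0.1]
output_weights = [1.0, 1.0]
output_bias = 0.0

x, target = samples[0]
prediction = mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias)
print(len(samples), target)
```

In practice the climatic inputs would normally be scaled before being passed through the tanh units, since the activation saturates for large arguments.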

Performance Evaluation
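The performances of the models are evaluated using the mean absolute percentage error (MAPE), the root mean square error (RMSE) and the coefficient of determination (R²). In the definitions of these metrics, N is the number of observations, y_p is the predicted value, y_a is the actual value and ȳ is the mean of the actual values.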
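For reference, the three metrics can be computed as in the sketch below. The formulas are the commonly used definitions and are assumed here, since the exact expressions are not reproduced in the text; in particular, R² is taken as one minus the ratio of the residual sum of squares to the total sum of squares about the mean of the actual values. The numbers are placeholders, not results from this study.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent: (100 / N) * sum(|y_a - y_p| / |y_a|)."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def rmse(actual, predicted):
    """Root mean square error: sqrt((1 / N) * sum((y_a - y_p)^2))."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Assumed form: 1 - sum((y_a - y_p)^2) / sum((y_a - mean(y_a))^2)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Placeholder numbers (NOT results from the study).
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]

print(round(mape(actual, predicted), 2))       # 12.5
print(round(rmse(actual, predicted), 2))       # 2.87
print(round(r_squared(actual, predicted), 3))  # 0.934

# Forecasting the mean of the actual values for every month gives R² = 0.
print(r_squared(actual, [25.0] * 4))           # 0.0
```

Under this definition, a forecast that performs worse than the mean of the actual values yields a negative R².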

Tuning of SARIMA Models
The parameters of the SARIMA models are determined for different climatic parameters at different meteorological stations. The tuned SARIMA models and their performances are compiled in Table 2. As shown in Table 2, most of the parameters for the ARIMA and seasonal components are (2,0,0) and (0,1,1), respectively. In other words, the time series of the climatic parameters at different stations is auto-regressive of the second order (it only requires the two preceding data points). There are two exceptions for the parameters of the ARIMA component, namely (2,0,2) and (2,1,2). The former is used for u at Station 48601 (Bayan Lepas), whereas the latter is used for RH at Station 48601 (Bayan Lepas) and Tmean at Station 48615 (Kota Bharu). This means that, for the latter two series, first-order non-seasonal differencing is required in order to ensure stationarity in the time series.
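As an illustration, the most common configuration in Table 2, (2,0,0)(0,1,1) with a seasonal period of 12, can be expanded from the operator form of the SARIMA model. The expansion below is a sketch that assumes the Box-Jenkins sign convention (minus signs on both the auto-regressive and the moving average coefficients); the coefficient names ϕ1, ϕ2 and Θ1 are introduced here for illustration, and their fitted values are not given in the text.

```latex
\begin{align*}
(1 - \phi_1 B - \phi_2 B^{2})(1 - B^{12})\, X_t &= (1 - \Theta_1 B^{12})\, \varepsilon_t \\
X_t &= X_{t-12} + \phi_1 \left( X_{t-1} - X_{t-13} \right) + \phi_2 \left( X_{t-2} - X_{t-14} \right) + \varepsilon_t - \Theta_1 \varepsilon_{t-12}
\end{align*}
```

Under this form, the forecast for a given month starts from the value observed in the same month of the previous year and corrects it using the two most recent year-on-year changes.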
When the time series trends of the climatic parameters are analysed with a seasonal return period of 12 time steps, a seasonal component of (0,1,1) is obtained. At all the studied stations, the seasonal trends of the climatic parameters require first-order differencing to become stationary. On the other hand, the seasonal trends are not auto-regressive except for the Tmean at Station 48601 (Bayan Lepas). By combining the seasonal and ARIMA components, it is discovered that none of the time series of the investigated climatic parameters is stationary and all of them exhibit seasonality. The analysed time series can be used for forecasting purposes.
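First-order seasonal differencing with a period of 12 can be sketched as follows. The series is synthetic (a repeating 12-month pattern plus a step of one unit per year) and is used only to show the mechanics; the function names are hypothetical.

```python
def difference(series, lag=1):
    """Apply one differencing pass: y_t = x_t - x_(t-lag)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undo_seasonal_difference(history, diffs, lag=12):
    """Rebuild levels from seasonal differences: x_t = y_t + x_(t-lag)."""
    levels = list(history)
    for y in diffs:
        levels.append(y + levels[-lag])
    return levels[len(history):]


# Synthetic monthly series (NOT station data): seasonal pattern plus 1 unit per year.
pattern = [3, 4, 6, 8, 9, 9, 8, 8, 7, 6, 4, 3]
series = [pattern[t % 12] + t // 12 for t in range(36)]

seasonal_diff = difference(series, lag=12)
print(seasonal_diff)  # 24 values, all equal to 1: the seasonal pattern is removed

# Forecasts made on the differenced scale are mapped back to the original scale.
rebuilt = undo_seasonal_difference(series[:12], seasonal_diff, lag=12)
print(rebuilt == series[12:])  # True
```

After differencing, only the year-on-year change remains, which is the quantity that the remaining components of the model describe.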
Referring to Table 2, it is observed that the SARIMA model forecasted the Tmean of the year 2017 with the highest accuracy in terms of MAPE and RMSE, followed by RH, u and Epan. This finding is reasonable, as the temperature fluctuation is a simple process that can be easily predicted by univariate analysis. In contrast, moving from RH to u and then to Epan, the processes become increasingly complex and involve other environmental conditions. For instance, the magnitude of Epan is dictated by multiple factors such as temperature, humidity and surface moisture.

Optimum ANN Models
The performances of the ANN models in forecasting climatic parameters are shown in Table 3. The forecasting plots for the ANN models are shown in Fig. 3. For different climatic parameters at different stations, the optimum number of hidden neurons varies, but generally the minimum number of hidden neurons is five. Similar to the SARIMA models, the ANN models forecast Tmean most accurately, followed by RH, u and Epan. However, compared to the SARIMA models, the accuracy of the ANN models is slightly lower, as shown by the MAPE and RMSE values. There are some results with high errors with high values of R². This means that the forecast time series is less accurate than the mean value, and naively using the mean value as the forecasted value would be better.
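The selection of the number of hidden neurons by grid search can be sketched as follows. This is only an outline of the search loop: `toy_validation_error` is a hypothetical stand-in for training an ANN with a given number of hidden neurons and returning its prediction error, and the candidate range of 5 to 20 is an assumption, not a setting reported in this study.

```python
def grid_search(candidates, error_fn):
    """Return the candidate with the smallest error, together with that error."""
    errors = {c: error_fn(c) for c in candidates}
    best = min(errors, key=errors.get)
    return best, errors[best]


def toy_validation_error(n_hidden):
    """Hypothetical stand-in for 'train an ANN with n_hidden neurons and return its error'."""
    return (n_hidden - 7) ** 2 + 1.0


best_n, best_error = grid_search(range(5, 21), toy_validation_error)
print(best_n, best_error)  # 7 1.0
```

In an actual application, the error function would train the network on the 2002 to 2016 data and report the prediction error being minimised.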
The forecast time series of the SARIMA and the ANN models at Station 48600 (Pulau Langkawi) are compared in Fig. 2. The figure shows that both models are unable to adapt to extreme values. When the values of the climatic parameter are extremely high or extremely low, the models tend to underestimate or overestimate them, respectively. Besides, the distances between the forecast plots and the actual plots are smaller for the SARIMA models than for the ANN models, suggesting that the SARIMA models are the more suitable approach for forecasting climatic parameters in Peninsular Malaysia, which has a tropical climate.
Moreover, the time series plots show that the accuracies of the model predictions are not affected by the length of the time window. That is to say, whether the forecast is one month or twelve months ahead, the accuracies of the models are similar. This means that, theoretically, the SARIMA and the ANN models can forecast the climatic parameters for prediction horizons of more than twelve months. With consistent updating of the model structure, the SARIMA and the ANN models can be deemed robust in the temporal context.

Conclusion
The SARIMA and ANN models are used to forecast Tmean, RH, u and Epan for the northern region of Peninsular Malaysia up to 12 months ahead. It is concluded that the SARIMA models are more suitable for this task due to the lower MAPE and RMSE achieved. This is attributed to the nature of the SARIMA model in considering the non-stationarity and seasonal trend of the climatic parameters. Both the SARIMA and ANN models could predict Tmean with the highest accuracy, followed by RH, u and Epan, due to the difference in the complexity of the processes. However, the SARIMA and ANN models could not predict extreme values accurately, and further study is needed to mitigate this issue. The predictions of the models are not affected by the size of the forecasting window, suggesting that the models can forecast the climatic parameters further ahead. Although the performance of the non-