Particle Swarm Optimized–Support Vector Regression Hybrid Model for Daily Horizon Electricity Demand Forecasting Using Climate Dataset

This paper adopts six daily climate variables for eleven major, heavily populated locations in Queensland, Australia, obtained from Scientific Information for Land Owners (SILO), to forecast the daily electricity demand (G) obtained from the Australian Energy Market Operator (AEMO). A data-driven technique based on a support vector regression (SVR) model was applied to forecast G, with the model's parameters selected using a particle swarm optimization (PSO) algorithm. The performance of PSO–SVR was compared with that of multivariate adaptive regression splines (MARS) and the traditional SVR model. The results showed that the PSO–SVR model outperformed MARS and SVR.


Introduction
Electricity demand (G) forecasting is a fundamental yet challenging task for improving the business efficiency of the electricity industry. A relationship between G and temperature is clearly evident in winter and summer [1]. Hence, it is worthwhile to develop a forecasting model that employs both G and related climate input datasets.
In recent years, support vector regression (SVR), the PSO algorithm, and multivariate adaptive regression splines (MARS) have been widely adopted in energy demand forecasting [1]. These methods have been used to forecast G in [1,2]; however, the influence of climate datasets has not yet been incorporated.
The main contribution of this paper is to improve G forecasting accuracy by incorporating climate datasets and integrating the merits of the PSO algorithm with the SVR model. To evaluate the PSO-SVR model, traditional SVR and MARS models are also developed.

Support vector regression
A nonlinear regression problem can be solved by an SVR model, a machine learning method pioneered by [3]:

$$f(X) = \omega \cdot \phi(X) + b \qquad (1)$$

where $X = \{x_i\}_{i=1}^{i=n} \in \mathbb{R}^{N}$ and $y = \{y_i\}_{i=1}^{i=N} \in \mathbb{R}$ are the predictors and target variables, respectively, $b$ is a constant, $\omega$ is the weight vector, and $\phi(X)$ represents the mapping function employed in the feature space. A minimisation technique is used to estimate the coefficients $\omega$ and $b$ as follows [3]:

$$\text{minimise} \quad \frac{1}{2}\lVert\omega\rVert^{2} + C\sum_{i=1}^{N}\left(\xi_i + \xi_i^{*}\right) \qquad (2)$$

$$\text{subject to} \quad \begin{cases} y_i - \omega\cdot\phi(x_i) - b \le \varepsilon + \xi_i \\ \omega\cdot\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i,\ \xi_i^{*} \ge 0 \end{cases} \qquad (3)$$

where the smoothness of the function is determined by $\frac{1}{2}\lVert\omega\rVert^{2}$, $C$ and $\varepsilon$ are the model's parameters, and the nonnegative slack variables ($\xi_i$ and $\xi_i^{*}$) measure the distance between the actual values and the corresponding boundary values of the function approximation. A nonlinear regression function can be expressed by Eq. 4 after applying Lagrangian multipliers and the optimality conditions [3]:

$$f(X) = \sum_{i=1}^{N}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x_j) + b \qquad (4)$$

where $x_i$ and $x_j \in X$, the term $K(x_i, x_j)$ denotes the kernel function, and $\alpha_i$ and $\alpha_i^{*}$ are Lagrangian multipliers [3]. In this study, the radial basis function (RBF) was used as the kernel of the SVR model [4]:

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right) \qquad (5)$$

where the kernel width and inputs are represented by $\sigma$ and $x_i$, $x_j$, respectively. The critical task in developing an accurate SVR model is to determine three parameters during the training period: the kernel width ($\sigma$), the loss-function parameter ($\varepsilon$) and the regularisation parameter ($C$) [5]. This is achieved through a hybrid method based on particle swarm optimization (PSO), described in section 2.3 below.
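The kernel expansion described above can be illustrated with a minimal Python sketch. It assumes the common Gaussian form of the RBF kernel, exp(-||x_i - x_j||^2 / (2σ^2)), and takes the differences of the Lagrangian multipliers as already known; the function names and the numerical values are hypothetical and are not taken from the study, which used the LIBSVM toolbox.

```python
import math


def rbf_kernel(x_i, x_j, sigma):
    """Gaussian RBF kernel (assumed form): exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, dual_coefs, b, sigma):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coefs[i] holds the difference (alpha_i - alpha_i*) for support vector i.
    """
    return b + sum(
        coef * rbf_kernel(sv, x, sigma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors smaller than epsilon cost nothing; larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, b=1.0, sigma=1.0)
```

A smaller `sigma` makes each support vector influence only nearby inputs, which is one reason the kernel width has to be tuned together with C and ε.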

Multivariate adaptive regression splines
The relationship between $X$ and $y$ is represented by the MARS model [6] as the sum of a constant term $a_0$ and weighted basis functions, where $X$ and $y$ are defined as in Eq. 1. The optimal MARS model is selected using the generalised cross-validation criterion ($GCV$), in which $v$ is a penalty factor with a characteristic value of $v = 3$ and $C(M)$ is the number of parameters being fitted. The model with the lowest $GCV$ on the training dataset is taken as the optimal MARS model.
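As a rough illustration of model selection by GCV, the sketch below assumes the form commonly attributed to Friedman, GCV = MSE / (1 - (C(M) + v·M) / N)^2, where M is the number of basis functions and N the number of training samples. This form, the function name and the candidate values are assumptions for illustration; the exact expression used in the study is not legible in this text.

```python
def gcv(mse, n_samples, n_params, n_basis, penalty=3.0):
    """Generalised cross-validation score (assumed Friedman-style form).

    mse       : training mean square error of the candidate model
    n_samples : number of training samples N
    n_params  : number of fitted parameters C(M)
    n_basis   : number of basis functions M
    penalty   : penalty factor v (3 by default)
    """
    effective = n_params + penalty * n_basis
    if effective >= n_samples:
        raise ValueError("effective number of parameters must be below N")
    return mse / (1.0 - effective / n_samples) ** 2


# Hypothetical candidates: (label, training MSE, C(M), M), with N = 100.
candidates = [
    ("small", 4.0, 3, 2),
    ("medium", 2.0, 6, 5),
    ("large", 1.9, 15, 14),
]
scores = {name: gcv(mse, 100, c_m, m) for name, mse, c_m, m in candidates}
best_model = min(scores, key=scores.get)
```

In this toy case the largest model has the lowest training MSE but the highest GCV, because the penalty term grows with the number of basis functions.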
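The two values $c_1$ and $c_2$ are usually within [2, 2.05], whereas the inertia weight $\omega$ can be defined as follows [9,10]:

$$\omega = \omega_{max} - \left(\omega_{max} - \omega_{min}\right)\frac{T}{T_{max}}$$

where $\omega_{min}$ and $\omega_{max}$ usually equal 0.4 and 0.9, and $T$ and $T_{max}$ are the current and maximum iteration numbers, respectively [9].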
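A minimal PSO sketch with a linearly decreasing inertia weight is given below. The velocity and position updates follow the standard textbook form, which is an assumption here, as are the velocity clamping, the swarm size and the toy objective. The objective is a hypothetical stand-in for the validation MSE of an SVR as a function of (C, σ, ε); no SVR is trained in this example.

```python
import random


def inertia_weight(t, t_max, w_min=0.4, w_max=0.9):
    """Linearly decrease the inertia weight from w_max to w_min."""
    return w_max - (w_max - w_min) * t / t_max


def pso_minimise(objective, bounds, n_particles=30, t_max=200,
                 c1=2.0, c2=2.0, seed=0):
    """Minimise objective(params) inside box bounds with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Clamp velocities to 20% of each parameter range (a design assumption).
    v_max = [0.2 * (hi - lo) for lo, hi in bounds]
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(1, t_max + 1):
        w = inertia_weight(t, t_max)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max[d], min(v_max[d], v))
                # Keep the particle inside the search box.
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                # The global best is updated as soon as a better point is found.
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_mse(params):
    """Hypothetical smooth surrogate with its minimum at C=10, sigma=0.5, eps=0.1."""
    c, sigma, eps = params
    return ((c - 10.0) / 10.0) ** 2 + (sigma - 0.5) ** 2 + (eps - 0.1) ** 2


bounds = [(0.1, 100.0), (0.01, 5.0), (0.001, 1.0)]  # assumed ranges for C, sigma, epsilon
best_params, best_val = pso_minimise(toy_validation_mse, bounds)
```

In a PSO-SVR setting, the toy objective would be replaced by a function that trains an SVR with the candidate parameters and returns its validation MSE, which is why each PSO run can be time-consuming.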

Electricity demand data (G)
In this study, the G data were recorded half-hourly (48 times per day) in megawatts (MW) for the state of Queensland. These data were acquired from the Australian Energy Market Operator (AEMO) [12] for the period 01-01-2015 to 31-12-2016 (dd-mm-yyyy). The 30-minute data were converted to daily values by summing the 48 readings of each day. Figure 1 shows the plots of the actual G data series.
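The conversion from half-hourly readings to daily totals can be sketched as follows. The sketch assumes each timestamp marks the start of its 30-minute interval and drops days with fewer than 48 readings; both choices, the function name and the synthetic values are assumptions, not details given in the study.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_totals(readings, readings_per_day=48):
    """Sum half-hourly demand into daily totals.

    readings : iterable of (timestamp, demand_mw) pairs
    Returns a dict {date: total}, keeping only complete days.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, demand_mw in readings:
        day = timestamp.date()
        sums[day] += demand_mw
        counts[day] += 1
    return {day: sums[day] for day in sorted(sums) if counts[day] == readings_per_day}


# Synthetic data: two complete days at a constant 6000 MW, then a partial day.
start = datetime(2015, 1, 1)
readings = [(start + timedelta(minutes=30 * k), 6000.0) for k in range(96)]
readings += [(datetime(2015, 1, 3) + timedelta(minutes=30 * k), 6000.0) for k in range(10)]
totals = daily_totals(readings)
```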

Climate dataset
Historical climate datasets for the same period as the G data were obtained from Scientific Information for Land Owners (SILO) [13]. The data were collected for the eleven main stations, which cover the majority of the population of Queensland, as shown in Fig. 2 and Table 1. The population numbers were obtained from the Australian Bureau of Statistics [14]; the total population of the eleven stations in Table 1 is very close to the population of the whole of Queensland (4,883,739).
The input data comprised the time series of maximum and minimum air temperature ($T_{max}$ and $T_{min}$), rainfall (Rain), evaporation (Evap), solar radiation (Radn) and vapour pressure (VP). The datasets for the whole of Queensland were obtained by averaging $T_{max}$, $T_{min}$, Radn and VP, and by summing Rain and Evap, over the eleven stations. These were used as the model inputs. Figure 3 shows the plots of the actual time series.
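The state-wide aggregation described above (averages for temperature, radiation and vapour pressure; totals for rainfall and evaporation) can be sketched as follows. The dictionary keys, the function name and the two example stations are hypothetical; the study itself used eleven stations.

```python
MEAN_VARS = ("Tmax", "Tmin", "Radn", "VP")  # averaged across stations
SUM_VARS = ("Rain", "Evap")                 # summed across stations


def aggregate_stations(station_records):
    """Combine one day's station records into a single state-wide record."""
    n = len(station_records)
    if n == 0:
        raise ValueError("at least one station record is required")
    combined = {}
    for var in MEAN_VARS:
        combined[var] = sum(rec[var] for rec in station_records) / n
    for var in SUM_VARS:
        combined[var] = sum(rec[var] for rec in station_records)
    return combined


# Two hypothetical stations on the same day, for illustration only.
day_records = [
    {"Tmax": 30.0, "Tmin": 20.0, "Radn": 24.0, "VP": 22.0, "Rain": 1.5, "Evap": 6.0},
    {"Tmax": 32.0, "Tmin": 18.0, "Radn": 26.0, "VP": 20.0, "Rain": 2.5, "Evap": 8.0},
]
state_record = aggregate_stations(day_records)
```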

Forecast model development and validation
The climate variables in section 3.2 above were used to forecast the G data by developing three models: PSO-SVR, SVR and MARS. As there is no single method for splitting data into training, validation and testing subsets [5], the data were divided into 70% for training, 15% for validation and 15% for testing.
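A 70/15/15 split can be sketched as below. The sketch assumes the daily records are kept in chronological order and split without shuffling, which is a common choice for time series but is not stated in the text; the rounding rule and the function name are also assumptions.

```python
def split_70_15_15(samples):
    """Split an ordered sequence into training, validation and testing parts."""
    n = len(samples)
    n_train = int(n * 0.70)
    n_valid = int(n * 0.15)
    train = samples[:n_train]
    valid = samples[n_train:n_train + n_valid]
    test = samples[n_train + n_valid:]  # the remainder, roughly 15%
    return train, valid, test


# 01-01-2015 to 31-12-2016 spans 731 days (2016 is a leap year).
days = list(range(731))
train, valid, test = split_70_15_15(days)
```

Under these assumptions the 731 daily records yield 511 training, 109 validation and 111 testing days.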
The MATLAB-based LIBSVM toolbox (version 3.22), developed by Chang and Lin [15], was used to build the SVR model. To develop the hybrid SVR model, the PSO algorithm (section 2.3) was used to select the optimal parameters based on the smallest value of the mean square error (MSE). For comparison with the SVR model, the MARS model was developed using a software package (version 1.13.0) [16].
The models were validated in Table 2 using the root-mean-square error (RMSE, Eq. 11). The PSO-SVR model yielded the lowest RMSE, which indicated the best accuracy compared to the other models.

Fig. 4. Scatterplot of the G-forecasted vs. G-observed electricity demand data in the testing period using the three models. The equation of the linear regression line and the coefficient of determination are incorporated.

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(G_{FOR,i} - G_{OBS,i}\right)^{2}} \qquad (11)$$

where $G_{FOR,i}$ and $G_{OBS,i}$ are the $i$th forecasted and observed values of G in the testing period, respectively; $n$ is the total number of $G_{FOR,i}$ or $G_{OBS,i}$ values; and $\overline{G_{FOR}}$ and $\overline{G_{OBS}}$ are the means of the forecasted and observed values, respectively.
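The error measures used for model evaluation can be sketched in a few lines of Python. RMSE follows its usual definition; the expressions for the mean absolute error, Willmott's index, the Nash–Sutcliffe coefficient and the Legates–McCabe index are the standard textbook forms and are assumed here, since the corresponding equations are not legible in this text. The sample values are hypothetical.

```python
import math


def rmse(forecast, observed):
    """Root-mean-square error."""
    n = len(observed)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / n)


def mae(forecast, observed):
    """Mean absolute error."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def nash_sutcliffe(forecast, observed):
    """E_NS = 1 - sum (O - F)^2 / sum (O - mean(O))^2."""
    o_mean = sum(observed) / len(observed)
    num = sum((o - f) ** 2 for f, o in zip(forecast, observed))
    den = sum((o - o_mean) ** 2 for o in observed)
    return 1.0 - num / den


def legates_mccabe(forecast, observed):
    """E_LM = 1 - sum |O - F| / sum |O - mean(O)|."""
    o_mean = sum(observed) / len(observed)
    num = sum(abs(o - f) for f, o in zip(forecast, observed))
    den = sum(abs(o - o_mean) for o in observed)
    return 1.0 - num / den


def willmott_index(forecast, observed):
    """WI = 1 - sum (F - O)^2 / sum (|F - mean(O)| + |O - mean(O)|)^2."""
    o_mean = sum(observed) / len(observed)
    num = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    den = sum((abs(f - o_mean) + abs(o - o_mean)) ** 2
              for f, o in zip(forecast, observed))
    return 1.0 - num / den


# Hypothetical daily demand values, for illustration only.
observed = [100.0, 110.0, 120.0, 130.0]
forecast = [102.0, 108.0, 121.0, 129.0]
```

Lower RMSE and MAE, and values of WI, E_NS and E_LM closer to 1, indicate a better forecast.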

Results and Discussions
The performance of the PSO-SVR model for the daily forecast horizon was compared with that of the traditional SVR and MARS models in the testing period. The comparison indicated that PSO-SVR performed better than the SVR and MARS models, with the lowest RMSE and mean absolute error (MAE) and the largest Willmott's index (WI), Nash–Sutcliffe coefficient ($E_{NS}$) and Legates–McCabe index ($E_{LM}$). These values are summarized in Table 3. The scatterplots of $G_{FOR}$ vs. $G_{OBS}$ and the absolute forecasting errors, $|FE| = |G_{FOR,i} - G_{OBS,i}|$, in the testing period for the three models are shown in Figs. 4 and 5, respectively. The PSO-SVR model showed the lowest forecasting errors ($|FE|$) in this study (Fig. 5). In addition, the highest coefficient of determination ($R^{2}$) was achieved by the PSO-SVR model (Fig. 4). Overall, the PSO-SVR model attained significantly greater accuracy than the other models.

Concluding Remark
In this paper, a hybrid PSO-SVR model was proposed for daily-horizon G forecasting in Queensland, Australia, using data from the Australian Energy Market Operator (AEMO) and Scientific Information for Land Owners (SILO). MARS and the traditional SVR method were also used in this study to evaluate the main model. The results showed that PSO-SVR outperformed the MARS and SVR models. As a result, the data-driven tool constructed with the PSO-SVR model is a powerful forecasting framework that can support the National Electricity Market (e.g., AEMO). Although the PSO-SVR model performed well in this paper, some challenges may arise in model development. As the PSO algorithm needs a longer time to produce the SVR parameters, alternative methods, such as multi-swarm PSO and the sine cosine algorithm, may need to be used. In addition, the model could be improved using ensemble-based uncertainty testing with a bootstrapping technique. These issues should be addressed in future studies.

Acknowledgments
Firstly, we would like to offer our thanks to the Australian Energy Market Operator (AEMO), Australian Bureau of Statistics, and Scientific Information for Land Owners (SILO) which provided all the required data. We are very grateful to the Ministry of Higher Education and Scientific Research in the Government of Iraq for funding the first author's PhD project.