Short-term wind power prediction based on GPR-BSO model

Wind power forecasting is crucial for the safe and stable operation of wind power integration. Wind power is influenced by factors such as wind speed, wind direction, and atmospheric pressure, which introduce randomness and volatility and make it less predictable. However, very few studies describe the uncertainty of wind power. Therefore, to provide additional information on uncertainty and volatility, this paper proposes a kernel-based Gaussian Process Regression (GPR) model combined with an intelligent hyper-parameter optimization method. First, the solution of the GPR hyper-parameters is formulated as a constrained nonlinear optimization problem. Then, an intelligent algorithm named Brainstorming Optimization (BSO) is adopted to obtain the optimal hyper-parameters of GPR. The performance is then examined on short-term wind power data. Most importantly, GPR combined with BSO helps prevent the hyper-parameters from becoming trapped in a local optimum.


1 Introduction
The uncertain and stochastic nature of wind power has brought significant challenges to power system operation and planning [1]. Wind power forecasting can mitigate the negative impact on the grid and is critical to the security and economics of power system operation [2]. Currently, wind power forecasting methods fall into two main categories: statistical and physical methods. Physical methods take surface information and meteorological data as initial boundary conditions and use mathematical models to solve for the forecast [3]-[5]. Statistical methods build models from the characteristics of historical data to predict wind power at future times, and mainly include time series methods and artificial intelligence methods [6]-[8]. Because each method has its own sources of error, weighting several forecasting methods, or fusing the advantages of different methods into a combined forecasting model, is also a research direction aimed at reducing the upper limit of the error of a single method.
This paper focuses on statistical methods for wind power forecasting, such as multivariate linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) [9], artificial neural networks (ANN), support vector regression (SVR) [10], and neuro-fuzzy systems (NFS). One hybrid approach first clusters the data and then selects the best-matching data as the training set for an SVM, so that the model fits different situations. Compared with the forecast of the New York Independent System Operator (ISO) and with a single SVM, this hybrid network achieves better MAE and MAPE.
The above methods can be summarized as linear and nonlinear methods. Simple linear regression and multivariable linear regression have been used to forecast load. Reddy [11] predicted the short-term load with a linear regression technique and estimated its coefficients. Amral [12] used a multiple linear regression method for short-term load forecasting, but the model was very sensitive to temperature and was also affected by other weather factors; it therefore requires a very accurate temperature forecast and error coefficients to increase forecasting accuracy.
In addition, a nonparametric machine learning method named Gaussian Process Regression (GPR) has been successfully applied in power load forecasting [13]. The advantage of GPR comes from its use of Bayesian inference, which yields confidence intervals and indicates where the model is unreliable. However, determining the hyper-parameters requires solving a non-convex optimization problem, which is easily trapped in a local optimum [14]. Xiao et al. optimized the hyper-parameters of GPR using Particle Swarm Optimization (PSO) [15]. However, PSO suffers from slow convergence and premature convergence, which affect the training speed and accuracy of the PSO-GPR algorithm. The contribution of this paper is a new methodology for short-term wind power forecasting using GPR with Brainstorming Optimization (BSO), together with its demonstration in a real case study.
This paper is organized as follows. In Section 2, the theories of GPR and BSO are introduced. In Section 3, the constrained nonlinear optimization is formulated for the hyper-parameters of GPR, and the BSO approach used to obtain the optimal solution is described. In Section 4, a real case study is used to test the proposed methodology.

2.1 Gaussian Process Regression
From the perspective of probability theory, the Gaussian process is also called a normal stochastic process. The main idea is to convert point predictions into predictions of a posterior mean and variance. Gaussian process regression models a function directly as a nonparametric model and is regarded as a supervised learning algorithm that has been widely used in many fields over the past decades.
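Given an input set $X = \{x_1, x_2, \dots, x_n\}$ and the corresponding output vector $Y = [y_1, y_2, \dots, y_n]^T$, the latent function is assumed to follow a Gaussian process:

$$f(X) \sim \mathcal{GP}\big(m(X),\, K(X, X)\big)$$

where $m(X)$ is the mean function and $K(X, X)$ is the covariance function. Generally, $m(X)$ is set to the zero mean function. The output vector $Y$ is determined by

$$Y = f(X) + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma_n^2 I)$$

Assume that there is a new query point $X_*$. The joint distribution of the training output $Y$ and the query output $Y_*$ is formulated as:

$$\begin{bmatrix} Y \\ Y_* \end{bmatrix} \sim \mathcal{N}\left(0,\; \begin{bmatrix} K(X, X) + \sigma_n^2 I & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix}\right)$$

According to Bayes' rule, the posterior distribution of $Y_*$ can be calculated as follows:

$$Y_* \mid X, Y, X_* \sim \mathcal{N}\big(\bar{Y}_*,\, \operatorname{cov}(Y_*)\big)$$

$$\bar{Y}_* = K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\, Y$$

$$\operatorname{cov}(Y_*) = K(X_*, X_*) - K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\, K(X, X_*)$$

The covariance function $K(X, X)$, commonly called the kernel function, describes the correlation between the elements of $X$. The kernel function maps the nonlinear relationship between samples into a high-dimensional feature space where it becomes linear, which simplifies the process. The squared exponential covariance function is widely used and is formulated as follows:

$$k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2 l^2}\right)$$

where $\sigma_f$ is the signal standard deviation and $l$ is the length scale. The hyper-parameters $\theta = \{\sigma_f, l, \sigma_n\}$ are obtained by minimizing the negative log marginal likelihood:

$$-\log p(Y \mid X, \theta) = \frac{1}{2} Y^T K_y^{-1} Y + \frac{1}{2} \log \lvert K_y \rvert + \frac{n}{2} \log 2\pi, \qquad K_y = K(X, X) + \sigma_n^2 I$$

where the first term is the data-fit term, the second term is a complexity penalty, the last term is a normalizing constant, and $n$ is the number of training samples.

The common way to solve such non-convex optimization problems is a gradient-based method (e.g. gradient descent or Newton's method). However, such methods are easily trapped in a local optimum and are sensitive to the initial value. To solve this problem, the Brainstorming Optimization algorithm is used to obtain the optimal values of the hyper-parameters.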
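As a minimal sketch of the computation described above, the following NumPy code evaluates a squared exponential kernel, the posterior mean and variance at a query point, and the negative log marginal likelihood. The function names, the Cholesky-based solve, the toy sine data, and the hyper-parameter values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def se_kernel(A, B, signal_std, length_scale):
    """Squared exponential covariance between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_std**2 * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gpr_fit_predict(X, y, X_star, signal_std, length_scale, noise_std):
    """Return posterior mean, posterior variance, and negative log marginal likelihood."""
    n = X.shape[0]
    K_y = se_kernel(X, X, signal_std, length_scale) + noise_std**2 * np.eye(n)
    L = np.linalg.cholesky(K_y)                          # K_y = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K_y^{-1} y
    K_s = se_kernel(X, X_star, signal_std, length_scale)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Diagonal of the posterior covariance of the latent function at X_star.
    var = signal_std**2 - np.sum(v**2, axis=0)
    # Data-fit term + complexity penalty + normalizing constant.
    nll = 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2.0 * np.pi)
    return mean, var, nll


# Toy data (hypothetical): a noisy sine curve, queried at x = 2.5.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(20)
X_star = np.array([[2.5]])
mean, var, nll = gpr_fit_predict(X, y, X_star, 1.0, 1.0, 0.05)
print(mean[0], var[0], nll)
```

The square root of the posterior variance gives the width of the confidence band around the mean, which is the extra information GPR offers over a point forecast.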

2.2 Brainstorming Optimization Algorithm
Unlike traditional optimization algorithms inspired by animal behavior or natural phenomena (such as the ant colony algorithm, the simulated annealing algorithm, and the genetic algorithm), BSO simulates the human thinking process [16]. In a brainstorming process, a group of people is divided into several teams to solve a problem. To generate as many solutions as possible, the individuals in each team are asked not to comment on others' solutions until the session is over. To encourage more collisions of ideas, the individuals are encouraged to think divergently about others' solutions. The best of these solutions are then chosen. BSO can be summarized as three main processes:
(1) Clustering process: Cluster the N individuals into M clusters and choose a center from each cluster according to the value of the fitness function.
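(2) Replacing process: Randomly choose an individual to replace the center and change its value using eq. (13):

$$x_{new} = x_{old} + \xi \cdot N(\mu, \sigma), \qquad \xi = \operatorname{logsig}\left(\frac{0.5 \cdot Max\_Iter - Cur\_Iter}{s}\right) \cdot rand() \tag{13}$$

where $x_{old}$ is the selected individual, $x_{new}$ is the updated individual, $N(\mu, \sigma)$ is a Gaussian random value, $rand()$ is a uniform random value in (0, 1), $Max\_Iter$ is the maximum number of iterations, $Cur\_Iter$ is the current iteration, and $s$ is the slope-changing value of the logsig() function.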
(3) Generating process: The new individual is characterized by dominance and flexibility. Dominance means that the current individual with good performance leads the next generation, that is, the cluster center participates in generating the new individual with a high probability. Flexibility means that the new individual can also be a combination of different individuals from two clusters, as in eq. (17).
The better individuals are preserved and enter the next iteration. In this way, the algorithm keeps searching for the optimal solution until the maximum number of iterations is reached.
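The three processes above can be sketched in a few dozen lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: clusters are formed by random grouping instead of a clustering algorithm such as k-means, the probabilities `p_replace`, `p_one`, and `p_center` and the default `slope` are hypothetical settings, and the sphere function stands in for a real fitness function.

```python
import math
import random


def logsig(a):
    """Logistic sigmoid used to shrink the step size over the iterations."""
    return 1.0 / (1.0 + math.exp(-a))


def bso_minimize(fitness, bounds, n_ind=30, n_clusters=3, max_iter=200,
                 p_replace=0.2, p_one=0.8, p_center=0.4, slope=20.0, seed=0):
    rng = random.Random(seed)

    def rand_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def pick(cluster, center):
        # Dominance: the cluster center is chosen with probability p_center.
        return center if rng.random() < p_center else rng.choice(cluster)

    pop = [rand_point() for _ in range(n_ind)]
    fit = [fitness(x) for x in pop]
    best = min(range(n_ind), key=lambda i: fit[i])
    best_x, best_f = pop[best][:], fit[best]

    for cur_iter in range(max_iter):
        # (1) Clustering (simplified assumption): random groups, and the
        # fittest member of each group acts as its center.
        idx = list(range(n_ind))
        rng.shuffle(idx)
        clusters = [idx[c::n_clusters] for c in range(n_clusters)]
        centers = [min(c, key=lambda i: fit[i]) for c in clusters]

        # (2) Replacing: occasionally overwrite one center with a random point.
        if rng.random() < p_replace:
            j = centers[rng.randrange(n_clusters)]
            pop[j] = rand_point()
            fit[j] = fitness(pop[j])

        # (3) Generating: perturb one individual or a blend of two clusters.
        scale = logsig((0.5 * max_iter - cur_iter) / slope)
        for i in range(n_ind):
            if rng.random() < p_one:
                c = rng.randrange(n_clusters)
                base = pop[pick(clusters[c], centers[c])][:]
            else:
                # Flexibility: combine individuals from two different clusters.
                c1, c2 = rng.sample(range(n_clusters), 2)
                w = rng.random()
                a = pop[pick(clusters[c1], centers[c1])]
                b = pop[pick(clusters[c2], centers[c2])]
                base = [w * u + (1.0 - w) * v for u, v in zip(a, b)]
            xi = scale * rng.random()
            cand = [min(max(v + xi * rng.gauss(0.0, 1.0), lo), hi)
                    for v, (lo, hi) in zip(base, bounds)]
            f_cand = fitness(cand)
            if f_cand < fit[i]:  # keep the better individual
                pop[i], fit[i] = cand, f_cand
            if f_cand < best_f:
                best_x, best_f = cand[:], f_cand
    return best_x, best_f


def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = bso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_x, best_f)
```

Clipping each candidate to `bounds` is one simple way to respect box constraints on the search variables; other constraint-handling schemes are equally possible.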

2.3 The BSO-GPR Framework
To obtain the best hyper-parameters of the GPR kernel function, BSO is embedded into the nonlinear optimization, and the fitness function is defined as the prediction error between $y_i^*$ and $y_i$, where $y_i^*$ is the prediction and $y_i$ is the actual value.
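One way to read this framework is that each candidate hyper-parameter vector is scored by the prediction error of the resulting GPR model. The sketch below assumes a root-mean-square error on a held-out validation set and a squared exponential kernel with three hyper-parameters; the error measure, the `make_fitness` helper, the box bounds, and the toy data are assumptions for illustration, since the exact fitness expression is not reproduced here.

```python
import numpy as np


def se_kernel(A, B, signal_std, length_scale):
    """Squared exponential covariance between the rows of A and the rows of B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return signal_std**2 * np.exp(-0.5 * d2 / length_scale**2)


def make_fitness(X_tr, y_tr, X_val, y_val, bounds):
    """Build a fitness function over theta = (signal_std, length_scale, noise_std)."""
    def fitness(theta):
        # Constraint handling (assumed): infeasible candidates get infinite error.
        for value, (lo, hi) in zip(theta, bounds):
            if not lo <= value <= hi:
                return float("inf")
        signal_std, length_scale, noise_std = theta
        K_y = se_kernel(X_tr, X_tr, signal_std, length_scale)
        K_y = K_y + noise_std**2 * np.eye(len(y_tr))
        K_s = se_kernel(X_tr, X_val, signal_std, length_scale)
        y_pred = K_s.T @ np.linalg.solve(K_y, y_tr)  # GPR posterior mean
        return float(np.sqrt(np.mean((y_pred - y_val) ** 2)))
    return fitness


# Toy data (hypothetical): a sine curve split into training and validation points.
X_tr = np.linspace(0.0, 6.0, 25)[:, None]
y_tr = np.sin(X_tr[:, 0])
X_val = np.linspace(0.1, 5.9, 10)[:, None]
y_val = np.sin(X_val[:, 0])
bounds = [(1e-2, 10.0), (1e-3, 10.0), (1e-3, 1.0)]

fitness = make_fitness(X_tr, y_tr, X_val, y_val, bounds)
good = fitness((1.0, 1.0, 0.1))    # sensible length scale
bad = fitness((1.0, 0.01, 0.1))    # length scale far too short
print(good, bad)
```

A function of this shape can be passed to any population-based optimizer, BSO or PSO alike, which then only needs to propose candidate vectors and compare the returned errors.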

3 Case Study
To evaluate the performance of the proposed methodology, a wind power dataset obtained from the National Weather Service (NWS) is used in this section. It contains wind speed, wind direction, wind power, and other meteorological data at a 15-minute resolution. After normalization, the data from July 1, 2017 to August 30, 2017 are used for training, and the data from August 31, 2017 are used for forecasting. The input factors include 15 variables such as wind speed, wind direction, and atmospheric pressure. The model uses the squared exponential as the covariance function, BSO to optimize the hyper-parameters, and the prediction error as the optimization objective function. The convergence of BSO and PSO is compared in Figure 1. The prediction curves are shown in Figure 3 and Figure 4. Additionally, support vector regression (SVR) and Random Forest (RF) are taken as comparisons, which are shown in Figure 2.

According to Figure 1, BSO converges to 3.13 at iteration 143, whereas PSO converges to 4.24 at iteration 297. BSO thus converges faster and has a better global search ability than PSO. From the results in Figure 3 and Figure 4, the predictive probability density distribution of GPR-BSO covers more of the actual values than that of GPR-PSO, which suggests that BSO is more likely than PSO to find good hyper-parameters under the same conditions. From the results in Figure 2 and Table 1, the average errors over multiple experiments indicate that GPR-BSO outperforms the other methods in all evaluation indexes. Therefore, BSO has a better optimization ability than PSO, and GPR can provide a reliable distribution for wind power prediction.
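The comparison above rests on two kinds of quantities: point-forecast errors and how many actual values fall inside the predicted distribution. The following sketch shows one way to compute them. The choice of MAE and RMSE as evaluation indexes, the 95% band (z = 1.96), and all numbers are hypothetical and are not results from this study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def interval_coverage(actual, mean, std, z=1.96):
    """Fraction of actual values inside the band mean +/- z * std."""
    inside = sum(1 for a, m, s in zip(actual, mean, std)
                 if m - z * s <= a <= m + z * s)
    return inside / len(actual)


# Hypothetical values for four time steps (not data from the paper).
actual = [10.0, 12.0, 9.0, 15.0]
pred_mean = [11.0, 12.5, 9.0, 12.0]
pred_std = [1.0, 1.0, 0.5, 1.0]

print(mae(actual, pred_mean))                          # 1.125
print(round(rmse(actual, pred_mean), 4))               # 1.6008
print(interval_coverage(actual, pred_mean, pred_std))  # 0.75
```

A higher coverage at the same band width means the predictive distribution contains more of the actual values, which is the sense in which two hyper-parameter settings can be compared beyond their point errors.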

4 Conclusion
A GPR model for short-term wind power forecasting is presented in this paper. Wind power exhibits uncertainty, nonlinearity, and complexity, which makes it difficult to forecast with a linear model. Models such as neural networks have too many parameters, whereas GPR, as a nonparametric model, can describe this mapping relationship easily. In addition, the GPR model can predict the posterior probability density distribution, which provides more information. The hyper-parameters of GPR are obtained by brainstorming optimization, and the accuracy of the GPR model is improved.