A Short-term Wind Power Prediction Framework based on Two-layer Decomposition and the Combination of Ensemble Model and Deep Network

The time series of wind power is influenced by many external factors and shows strong volatility and randomness. To address the low prediction accuracy of wind power time series, this paper proposes a short-term wind power prediction framework based on two-layer decomposition and the combination of an ensemble model and a deep network. The framework is composed of complete ensemble empirical mode decomposition (CEEMD), sample entropy (SE), stacking ensemble, linear regression (LR), variational mode decomposition (VMD), long short-term memory (LSTM) and multi-layer perceptron (MLP). Firstly, CEEMD is used to decompose the wind power time series into different modes, and SE is then used for reconstruction. Secondly, different models are applied to predict the reconstructed components and the optimal model is selected. Subsequently, VMD is used to further decompose the most volatile reconstructed component, and a combined prediction model of stacking ensemble and LSTM is established. Finally, to further improve the prediction accuracy, MLP is applied to correct the error, and the corrected error is superimposed on the prediction results and the other reconstructed components to obtain the final predicted value. The simulation results show that the accuracy and effectiveness of this model are superior to those of the traditional models and that the prediction accuracy of short-term wind power time series is effectively improved.


Introduction
With the continuous development of society, the demand for energy keeps increasing, and traditional fossil fuels are far from meeting this demand. New energy sources can not only relieve the energy shortage but also effectively alleviate environmental problems such as the greenhouse effect and air pollution. Wind energy is one of the most widely used new energy sources and has broad application prospects, so it is of great significance to use wind energy reasonably for power generation. However, due to the uncertainty and volatility of wind, wind power generation poses a severe challenge to the safe and stable operation of the power grid and to the safety and reliability of grid connection. To reduce the impact of wind power generation on the power system, accurate prediction of the wind power of wind farms is of great significance [1][2][3][4]. At present, researchers mainly divide wind power prediction methods into physical methods [5][6][7][8], statistical methods [9][10][11][12][13] and combined prediction methods [14][15][16]. The physical method predicts wind power from wind speed, wind direction, longitude, temperature, air pressure and other impact factors; because the number of impact factors is large, feature selection is usually necessary. The statistical method is based on the historical wind power time series and uses methods such as data mining and deep feature analysis to forecast wind power. The combined forecasting method is currently the most popular; it is often an ensemble of multiple models, such as the combination of a shallow model and a deep model, and improves the prediction accuracy to a certain extent.
Currently, support vector machines [17][18], neural networks [19][20][21] and deep networks [22][23][24][25][26] are often used in wind power prediction. Reference [27] proposes an ultra-short-term wind power forecasting method combining a multi-cluster algorithm and a hierarchical clustering algorithm, which clusters the historical power sequences and historical meteorological sequences and establishes particle swarm optimization-back propagation (PSO-BP) neural network prediction models separately to obtain the final prediction result. Reference [28] reports an LSTM prediction model based on multiple meteorological features, with error correction applied to the prediction result to obtain the final prediction value. In reference [29], Ying-Yi Hong proposes deep feature extraction of wind power time series by a convolutional neural network (CNN) and then establishes an LSTM prediction model to realize the prediction of wind power. Reference [30] proposes a new short-term wind power prediction method consisting of partial direct prediction and iterative prediction through the combination of chaotic time series analysis and singular spectrum analysis (SSA).

COMPLETE ENSEMBLE EMPIRICAL MODE DECOMPOSITION (CEEMD)
Complete ensemble empirical mode decomposition (CEEMD) is an improvement of empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD). EMD separates the fluctuations and the trend of the original signal so that the resulting components tend to be stationary. CEEMD adds pairs of positive and negative white noise to the original signal, which not only suppresses the mode aliasing problem of EMD but also keeps the residual noise at a very small scale. The main steps are as follows:
Step 1: Add n pairs of positive and negative white noise to the original signal x(t) to form the composite signals P(t) and N(t).
Step 2: Apply EMD to the composite signals P(t) and N(t) to obtain their intrinsic mode function (IMF) components.
Step 3: Calculate the average value of all IMF components to obtain the decomposition result of CEEMD, as shown in Equation 1.
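The pairing and averaging steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `emd_fn` is a hypothetical user-supplied EMD routine that is assumed to return the same number of IMFs on every call, and `n_pairs`, `noise_std` and `seed` are illustrative parameters that do not come from the paper.

```python
import random
from typing import Callable, List, Sequence

Signal = List[float]


def ceemd(x: Sequence[float],
          emd_fn: Callable[[Signal], List[Signal]],
          n_pairs: int = 50,
          noise_std: float = 0.2,
          seed: int = 0) -> List[Signal]:
    """Average the IMFs obtained from n_pairs of complementary noisy signals.

    emd_fn is a hypothetical EMD routine (assumption): it takes a signal and
    returns a list of IMFs, each with the same length as the signal.
    """
    rng = random.Random(seed)
    total: List[Signal] = []
    for _ in range(n_pairs):
        noise = [rng.gauss(0.0, noise_std) for _ in x]
        p = [xi + ni for xi, ni in zip(x, noise)]  # P(t) = x(t) + noise
        n = [xi - ni for xi, ni in zip(x, noise)]  # N(t) = x(t) - noise
        for composite in (p, n):
            imfs = emd_fn(composite)
            if not total:
                # Allocate one accumulator per IMF on the first call.
                total = [[0.0] * len(x) for _ in imfs]
            for acc, imf in zip(total, imfs):
                for t, value in enumerate(imf):
                    acc[t] += value
    count = 2 * n_pairs  # every pair contributes two decompositions
    return [[value / count for value in acc] for acc in total]
```

Because each noise realization enters once with a positive and once with a negative sign, the noise cancels in the average, which is why the residual noise stays small.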

SAMPLE ENTROPY (SE)
Sample entropy (SE) is a measure of the complexity of a time series. The larger the entropy value, the higher the complexity of the time series. SE does not depend on the data length and has better consistency. The calculation of the sample entropy value is shown in Equation 2.
$$\mathrm{SampEn}(k, r, L) = -\ln\left[\frac{B^{k+1}(r)}{B^{k}(r)}\right] \tag{2}$$
where k is the dimension, r is the threshold, L is the length, and $B^{k}(r)$ is the probability of matching k points when the threshold is r.
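A minimal sketch of the sample entropy calculation is given below. It follows the common definition (Chebyshev distance between templates, self-matches excluded) and is not taken from the paper; the default values `k=2` and `r=0.2` are assumptions, and `r` is treated as an absolute tolerance, whereas in practice it is often set relative to the standard deviation of the series.

```python
import math
from typing import Sequence


def sample_entropy(series: Sequence[float], k: int = 2, r: float = 0.2) -> float:
    """Return -ln(A / B), where B counts template matches of length k and
    A counts template matches of length k + 1 (self-matches excluded)."""
    length = len(series)

    def count_matches(dim: int) -> int:
        # Use the same number of templates (length - k) for both dimensions.
        templates = [series[i:i + dim] for i in range(length - k)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                distance = max(abs(a - b)
                               for a, b in zip(templates[i], templates[j]))
                if distance <= r:
                    matches += 1
        return matches

    b = count_matches(k)
    a = count_matches(k + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)
```

A perfectly regular series gives a value of zero, while an irregular series gives a larger value, which matches the interpretation of SE as a complexity measure.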

VARIATIONAL MODE DECOMPOSITION (VMD)
In VMD, each mode $u_k$ is concentrated around a center frequency $\omega_k$, and the method mainly consists of constructing and solving a variational problem. To decompose the original time series into a series of modal functions with limited bandwidth, the main process is as follows:
Step 1: Apply the Hilbert transform to each mode $u_k$ to obtain its single-sided spectrum.
Step 2: For each mode $u_k$, shift the spectrum to the baseband according to the estimated center frequency.
Step 3: Estimate the bandwidth of each mode from the Gaussian smoothness of the demodulated signal; the constrained variational problem is expressed as Equation 3.
Step 4: Introduce the quadratic penalty factor C and the Lagrange multiplier θ to transform the constrained optimization problem into an unconstrained optimization problem, as shown in Equation 4.
Step 5: Solve the problem using the alternating direction method of multipliers (ADMM) to obtain the update formulas of $u_k$ and $\omega_k$. The specific expressions are shown in Equation 5 and Equation 6.
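For reference, the steps above correspond to the standard VMD formulation, sketched here with the paper's symbols C (penalty factor) and θ (Lagrange multiplier). The input signal f(t), the number of modes K, the iteration index n and the hat notation for Fourier transforms are assumptions introduced for this sketch; the exact form used by the authors may differ.

```latex
% Constrained variational problem (Steps 1 to 3)
\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K}
  \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j\omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)

% Unconstrained augmented Lagrangian (Step 4)
\mathcal{L}(\{u_k\},\{\omega_k\},\theta) =
  C \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j\omega_k t} \right\|_2^2
  + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2
  + \left\langle \theta(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle

% ADMM updates in the frequency domain (Step 5)
\hat{u}_k^{n+1}(\omega) =
  \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\theta}(\omega)/2}
       {1 + 2C(\omega - \omega_k)^2}
\qquad
\omega_k^{n+1} =
  \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}
       {\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega}
```

The mode update acts as a filter centered at $\omega_k$ whose bandwidth shrinks as C grows, and the center frequency update is the power-weighted mean frequency of the mode.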

STACKING ENSEMBLE
The stacking ensemble model is one of the most widely used models in data mining competitions. It performs classification or regression by fusing multiple models. The key lies in the selection of the base models and the number of cross-validation folds. The base models should have similar performance and low correlation with each other. The main flow of the stacking model is as follows:
Step 1: Divide the dataset into a training set and a test set.
Step 2: Cross-validate each base model on the training set to obtain its predictions on the validation folds and on the test set.
Step 3: Calculate the weighted average of the test set predictions over the folds. The same operation is repeated for each base model until a new training set and a new test set are constructed.
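The flow above can be sketched as follows. This is an illustrative example only: the two toy learners (`fit_mean`, `fit_line`) stand in for the real base models, the folds are contiguous blocks, and the test predictions are averaged with equal weights; all of these choices are assumptions rather than the paper's settings.

```python
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]
Learner = Callable[[Sequence[float], Sequence[float]], Predictor]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Toy base learner: always predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Toy base learner: one-dimensional least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def stacking_features(learners: Sequence[Learner],
                      x_train: Sequence[float],
                      y_train: Sequence[float],
                      x_test: Sequence[float],
                      n_folds: int = 5) -> Tuple[List[List[float]], List[List[float]]]:
    """Build the new training and test sets (one column per base model)."""
    n = len(x_train)
    train_meta = [[0.0] * len(learners) for _ in range(n)]
    test_meta = [[0.0] * len(learners) for _ in x_test]
    for m, learner in enumerate(learners):
        for f in range(n_folds):
            start, stop = f * n // n_folds, (f + 1) * n // n_folds
            fit_idx = [i for i in range(n) if i < start or i >= stop]
            model = learner([x_train[i] for i in fit_idx],
                            [y_train[i] for i in fit_idx])
            # Out-of-fold predictions become the new training feature.
            for i in range(start, stop):
                train_meta[i][m] = model(x_train[i])
            # Test predictions are averaged over the folds (equal weights).
            for j, x in enumerate(x_test):
                test_meta[j][m] += model(x) / n_folds
    return train_meta, test_meta
```

The returned `train_meta` and `test_meta` play the role of the new training set and test set that are passed to the second-level model.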

LONG SHORT-TERM MEMORY (LSTM)
Long short-term memory (LSTM) is an improved recurrent neural network (RNN), proposed to solve the vanishing and exploding gradient problems of the RNN. The LSTM consists of an input gate, a forget gate, and an output gate, as shown in Figure 1.
Among the three gates, the forget gate determines which information will be discarded, as shown in Equation 7.
The input gate determines which information is retained, receives the new input, and updates the control parameter $C_t$, as shown in Equations 8, 9 and 10.
The output gate generates the output of the LSTM at the current time according to the control parameter $C_t$, as shown in Equations 11 and 12.
Here $x_t$ is the input, $h_t$ is the output, $h_{t-1}$ is the output at the previous time step, $C_{t-1}$ is the control parameter at the previous time step, $C_t$ is the latest control parameter, and $\tilde{C}_t$ is the information to be saved.
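For reference, the standard LSTM gate equations that match the description and numbering above are sketched below. The weight matrices $W_f, W_i, W_C, W_o$, the biases $b_f, b_i, b_C, b_o$, the gate activations $f_t, i_t, o_t$ and the sigmoid $\sigma$ are notation assumed for this sketch and may differ from the authors' symbols.

```latex
\begin{align}
f_t &= \sigma\left(W_f \, [h_{t-1}, x_t] + b_f\right) \tag{7} \\
i_t &= \sigma\left(W_i \, [h_{t-1}, x_t] + b_i\right) \tag{8} \\
\tilde{C}_t &= \tanh\left(W_C \, [h_{t-1}, x_t] + b_C\right) \tag{9} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{10} \\
o_t &= \sigma\left(W_o \, [h_{t-1}, x_t] + b_o\right) \tag{11} \\
h_t &= o_t \odot \tanh\left(C_t\right) \tag{12}
\end{align}
```

Equation 10 shows how the forget gate scales the previous control parameter $C_{t-1}$ while the input gate scales the new candidate information $\tilde{C}_t$.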

MULTILAYER PERCEPTRON (MLP)
Multi-layer perceptron (MLP) is a simple multi-layer neural network whose layers form a directed graph. Each layer is fully connected to the next layer, and the output of each neuron in one layer is an input to the neurons in the next layer. The key to MLP lies in the number of hidden layers, the number of neurons in each hidden layer, and the connection weights and offsets, which are often adjusted by grid search and other methods.
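The fully connected structure described above can be illustrated with a small forward pass. This is a sketch only: the ReLU activation on hidden layers and the linear output layer are assumptions, since the paper does not state which activation functions are used.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]  # (weights, offsets) of one layer


def dense(inputs: Sequence[float], weights: Matrix,
          offsets: Sequence[float]) -> List[float]:
    """One fully connected layer: every output neuron sees every input."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, offsets)]


def mlp_forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Feed x through all layers; hidden layers use ReLU (assumption)."""
    out = list(x)
    for idx, (weights, offsets) in enumerate(layers):
        out = dense(out, weights, offsets)
        if idx < len(layers) - 1:
            out = [max(0.0, v) for v in out]
    return out
```

For example, a network with one hidden layer of two neurons and a single output neuron is described by two `(weights, offsets)` pairs, and the output of the hidden layer becomes the input of the output layer.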

DATA PREPROCESSING
To eliminate the influence of differing scales in the data, the data are normalized. After normalization, all values lie on the same scale, which greatly improves the operating efficiency of the model. The equation for data normalization is shown in Equation 13:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{13}$$
where $x'$ is the normalized value, $x$ is the original value, $x_{\max}$ is the maximum value of the sample data, and $x_{\min}$ is the minimum value of the sample data.
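A minimal sketch of min-max normalization, together with the inverse mapping needed to convert predictions back to the original scale, is shown below. The function names are illustrative, and the handling of a constant series is an assumption added to avoid division by zero.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale values to [0, 1] and return the minimum and maximum used."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Constant series: map everything to 0 (assumption).
        return [0.0 for _ in values], x_min, x_max
    return [(v - x_min) / span for v in values], x_min, x_max


def min_max_restore(scaled: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Map normalized values back to the original scale."""
    return [s * (x_max - x_min) + x_min for s in scaled]
```

In a forecasting setting the minimum and maximum would normally be computed on the training set only and then reused for the test set, so that no information from the test period leaks into training.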

ONE-LAYER DECOMPOSITION
Due to the large volatility of wind power time series, signal decomposition techniques such as EMD, EEMD, wavelet transform (WT) and VMD are widely used in wind power prediction to improve the prediction accuracy. In the one-layer decomposition, this paper uses complete ensemble empirical mode decomposition (CEEMD), which effectively suppresses the mode aliasing problem of EMD and keeps the residual component at a very small level, so that the original wind power data can be decomposed completely and without loss. The sample entropy (SE) of each IMF is then calculated, and the IMFs whose SE values are close are combined to form a new reconstructed component. The process of one-layer decomposition is shown in Figure 2. After the reconstructed components are obtained, they are analyzed, different models are selected for prediction according to the characteristics of each component, and finally the optimal model is selected.
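The SE-based reconstruction can be sketched as below. The grouping rule (consecutive IMFs join a group while their SE stays within a tolerance `tol` of the first member of the group) and the tolerance value are assumptions made for illustration; the paper does not state how "close" SE values are defined.

```python
from typing import List, Sequence


def group_by_entropy(se_values: Sequence[float], tol: float) -> List[List[int]]:
    """Group consecutive IMF indices whose SE is within tol of the group's
    first member (hypothetical rule, not taken from the paper)."""
    groups: List[List[int]] = []
    for idx, se in enumerate(se_values):
        if groups and abs(se - se_values[groups[-1][0]]) <= tol:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return groups


def reconstruct(imfs: Sequence[Sequence[float]],
                groups: Sequence[Sequence[int]]) -> List[List[float]]:
    """Sum the IMFs of each group point by point to form the components."""
    length = len(imfs[0])
    return [[sum(imfs[i][t] for i in group) for t in range(length)]
            for group in groups]
```

With SE values such as `[0.9, 0.85, 0.4, 0.35, 0.05]` and `tol=0.1`, this rule yields three groups, in the same spirit as the three reconstructed components used in this paper.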

TWO-LAYER DECOMPOSITION
When the volatility of one of the reconstructed components is large, a two-layer decomposition is performed on the basis of the one-layer decomposition. The two-layer decomposition in this paper adopts variational mode decomposition (VMD). After the two-layer decomposition, the reconstructed component yields k modes, and the mode number k is determined by the center frequency $\omega_k$ of each mode. For each mode, the combined model of stacking and LSTM is used to make predictions, and the prediction results of all modes are accumulated to obtain the wind power prediction result. The base models of the stacking model are three ensemble models: random forest (RF), XGBoost and LightGBM. The stacking model makes full use of the characteristics of the three base models and constructs new features as the input of the LSTM model to predict wind power, as shown in Figure 3. All prediction results are superimposed to obtain the final prediction value. Since certain prediction errors will occur, the prediction errors are corrected by the MLP model to further improve the prediction accuracy. The specific process of the two-layer decomposition of the reconstructed components is shown in Figure 4.
The main steps of the short-term wind power prediction model based on two-layer decomposition and the combination of ensemble model and deep network are:
Step 1: Perform CEEMD decomposition and SE reconstruction on the original wind power time series to form reconstructed component 1, reconstructed component 2 and reconstructed component 3. The residual component can be ignored because of its small magnitude.
Step 2: For reconstructed component 2 and reconstructed component 3, analyze their data characteristics and compare the predictions of different models; the linear regression (LR) model is selected for both components.
Step 3: Because the volatility of reconstructed component 1 is relatively large, VMD is used for the two-layer decomposition, the combined prediction model of stacking and LSTM is applied to each mode, and error correction is performed using MLP to obtain the final predicted value of reconstructed component 1.
Step 4: Accumulate the predicted results of the reconstructed components 1, 2 and 3 to obtain the final predicted value of wind power.
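The accumulation in Steps 3 and 4 can be sketched as below. The function and argument names are hypothetical; the sketch assumes that the per-mode predictions, the MLP-predicted error and the LR predictions of components 2 and 3 are already available as equally long lists.

```python
from typing import List, Sequence


def combine_forecast(mode_preds: Sequence[Sequence[float]],
                     error_pred: Sequence[float],
                     comp2_pred: Sequence[float],
                     comp3_pred: Sequence[float]) -> List[float]:
    """Sum the VMD mode predictions, add the predicted error to obtain
    component 1, then add the predictions of components 2 and 3."""
    comp1 = [sum(values) + err
             for values, err in zip(zip(*mode_preds), error_pred)]
    return [c1 + c2 + c3
            for c1, c2, c3 in zip(comp1, comp2_pred, comp3_pred)]
```

Each entry of `mode_preds` holds the stacking-LSTM prediction of one VMD mode, so the inner sum reproduces the accumulation over modes before the error correction is added.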

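EVALUATION FUNCTION
The evaluation functions of the prediction model selected in this paper are:
a. Root mean square error:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$
b. Mean absolute error:
$$\mathrm{MAE} = \frac{1}{m}\sum_{i=1}^{m}\left|y_i - \hat{y}_i\right|$$
c. R-Square:
$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}$$
where $y_i$ is the actual value of the wind power at the i-th moment, $\hat{y}_i$ is the predicted value of the wind power at the i-th moment, and $\bar{y}$ is the average value of the m wind power values.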
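The three evaluation indexes used in the case studies (RMSE, MAE and R-Square) can be computed with a few lines of code. This is a minimal sketch using the standard definitions; the function names are illustrative.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y))


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)


def r_square(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination; assumes y is not constant."""
    mean = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

RMSE and MAE are expressed in the unit of the data (kW here), while R-Square is dimensionless and approaches 1 as the fit improves.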

Case studies
In this paper, we use data from a wind farm in the UK for short-term wind power prediction. To demonstrate the validity of the model, a total of 4,490 wind power data points recorded at the wind farm from 1:00 on March 1, 2018 to 8:00 on April 14, 2018 are selected.
In the experiment, the sampling interval between data points is 10 minutes, and the resulting wind power time series is shown in Figure 6. The first 3995 data points are used as the training set and the last 495 data points are used as the test set. The previous 5 wind power values are used to predict the wind power at the next moment, which realizes rolling prediction. It can be seen from Figure 6 that the wind power time series fluctuates markedly, so CEEMD decomposition is performed on it; the decomposition results are shown in Figure 7. Table 1. Figure 8. It can be seen from Figure 8 that the time series of wind power still exhibits a certain degree of non-stationarity, so VMD decomposition is performed on the reconstructed component 1 and the center frequency method is used to determine the number of modes k. The general value range of k is 3 to 8. The center frequencies for different values of k are shown in Table 2. It can be seen from Table 2 that when k = 7, center frequencies of 25.636 and 54.285 appear; these two center frequencies are relatively close, so the number of modes of the VMD decomposition is determined to be 6. Compared with the six modes before this decomposition, shown in Figure 7, the wind power after the second decomposition fluctuates less and is more stable. The results of VMD decomposition are shown in Figure 9. For training and predicting each mode, we build a combined prediction model of stacking and LSTM. The three base models are random forest (RF), XGBoost and LightGBM. The parameters of each model are determined by grid search: the max depth of the three base models is 4 and the learning rate is 0.1; the LSTM is a 4-layer network with 50 neurons in each layer, the dropout is 0.1, the optimizer is Adam and the number of iterations is 50.
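The input construction described above (the previous 5 values predict the next one, with a 3995/495 split) can be sketched as follows. The helper name `make_windows` is illustrative, and building the windows separately inside the training part is an assumption about a detail the paper does not specify.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float],
                 window: int = 5) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (inputs, targets): each input holds `window`
    consecutive values and the target is the value that follows them."""
    inputs = [list(series[i:i + window]) for i in range(len(series) - window)]
    targets = [series[i + window] for i in range(len(series) - window)]
    return inputs, targets


# Usage with a placeholder series of the same length as the dataset
# (the real wind power values are not reproduced here).
series = [float(i) for i in range(4490)]
train, test = series[:3995], series[3995:]
x_train, y_train = make_windows(train, window=5)
```

To obtain one prediction for every test point, the test windows can start 5 points before the split, for example `make_windows(series[3995 - 5:], window=5)`, so that the first test target is preceded by observed values.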
When MLP is used for error correction, the MLP has 4 network layers with 100, 100, 100 and 50 neurons respectively. Table 3 lists the evaluation indexes of four different models, and Figure 10 shows the prediction curves of the four models. According to the predicted results in Figure 10 and Table 3, we can conclude that, for the reconstructed component 1, all four models can predict the wind power at the next moment. Compared with the one-layer decomposition, the two-layer decomposition reduces the MSE by 91.07 kW and the MAE by 53.41 kW. In terms of prediction accuracy, VMD-stacking-LSTM-MLP has the highest accuracy and its R-Square reaches 0.998, indicating that this model has better fitting ability and prediction performance than the other models. For the reconstructed component 2 and the reconstructed component 3, the graphs of the two reconstructed components are shown in Figure 11 and Figure 12. As can be seen from Figure 11 and Figure 12, the curves of the reconstructed component 2 and the reconstructed component 3 exhibit a certain linear relationship; because of this data characteristic, a linear regression (LR) model is selected for prediction and gives the best predicted effect, as shown in Figure 13 and Figure 14. Table 4 shows the evaluation indexes of the 7 models, and the final prediction results of the 7 models are shown in Figure 15. Analyzing Table 4 and Figure 15, we can conclude:
(1) Compared with the SVR model, the RMSE and MAE of the LSTM model are reduced by 2.56 kW and 9.15 kW respectively.
(2) After the one-layer decomposition with CEEMD, RMSE and MAE are reduced by 4.51 kW and 15.02 kW respectively and R-Square is increased to 0.773.
(3) After the one-layer decomposition with CEEMD, using the combined prediction of stacking ensemble and LSTM reduces both RMSE and MAE, which indicates that the combined prediction model has a stronger fitting ability than the single model.
(4) The second decomposition with VMD, based on the first decomposition with CEEMD, greatly improves the prediction accuracy of the model. RMSE and MAE are reduced by 91.08 kW and 55.46 kW respectively and R-Square is also greatly improved to 0.966.
(5) After MLP correction of the error, RMSE and MAE are reduced to 13.15 kW and 9.44 kW respectively and R-Square is increased to 0.998, which indicates that the correction of the error can improve the prediction accuracy of the model.
Therefore, compared with other models, the prediction model proposed in this paper can better predict short-term wind power changes and its prediction accuracy is greatly improved, which verifies the feasibility and effectiveness of the proposed method.

Conclusion
This paper proposes a short-term wind power prediction framework based on two-layer decomposition and the combination of ensemble model and deep network, and draws the following conclusions:
a. Signal decomposition techniques such as CEEMD and VMD can effectively improve the prediction accuracy of the model.
b. The results obtained by the combined prediction model are significantly better than those of the single prediction model.
c. Performing feature analysis on the data to find the most suitable model for prediction not only gives better prediction results but also improves the operating efficiency of the model.
d. When the prediction error shows a certain regularity, an appropriate model can be trained to correct the error, which further improves the prediction accuracy of the model.
At present, the extensive use of ensemble models and combined forecasting models, the combined use of shallow and deep models, and the correction of errors are the main development directions of forecasting models. Future research can focus on these three areas.