Online Bayesian Learning with Natural Sequential Prior Distribution Used for Wind Speed Prediction

Predicting wind speed is one of the most important and critical tasks in a wind farm. All approaches that directly describe the stochastic dynamics of meteorological data face problems related to the non-Gaussian nature of its statistics and the presence of seasonal effects. In this paper, online Bayesian learning is applied to the online training of three-layer perceptrons used for wind speed prediction. First, a conventional transition model based on the squared norm of the difference between the current parameter vector and the previous parameter vector is used. We noticed that this transition model does not adequately consider the difference between the current and the previous wind speed measurement. To adequately consider this difference, we use a natural sequential prior. The proposed transition model uses a Fisher information matrix to consider the difference between the observation models more naturally. The results show good agreement between the measured and predicted series. The mean relative error over the whole data set does not exceed 5%.


Introduction
Due to the high penetration of wind power in the electricity system, the forecast accuracy of wind power prediction systems is becoming increasingly important. In recent years, wind power prediction has been studied extensively. Forecast accuracy has improved constantly, and intense research and development efforts are under way. Many studies have been carried out to predict wind behaviour. However, wind is still one of the most difficult quantities to forecast [1], owing to its stochastic nature. The current state of the art includes five main families of methods: the persistence method [2], physical methods [3], spatial correlation models [5], artificial intelligence methods [6] and hybrid methods [7]. Nevertheless, there will always be an inherent and irreducible uncertainty in every prediction.
Bossanyi [8] used a Kalman filter with the last 6 values as input and obtained up to 10% improvement in the RMS error over persistence for 1-min averaged data when predicting the next time step. This improvement decreased for longer averages and disappeared completely for 1-hourly averages. Dambrosio and Fortunato [9] used one-step-ahead adaptive control by means of a recursive least squares algorithm for the electrical part of the turbine. They show a fast and reliable response to a step in the wind. Nogaret et al. [10] reported that, for the control system of a medium-size island system, persistence forecasting is best with an average of the last 2 or 3 values, i.e., 20-30 minutes. Tantareanu [11] found that ARMA models can perform up to 30% better than persistence for 3-10 steps ahead in 4-s averages of data sampled at 2.5 Hz. Dutton et al. [12] used a linear autoregressive model and an adaptive fuzzy logic based model for the cases of Crete and Shetland. They found minor improvements over persistence for a forecasting horizon of 2 hours, but up to 20% improvement in RMS error for an 8-hour horizon. However, for longer horizons, the 95% confidence band contained most of the likely wind speed values, and therefore a meteorological-based approach was deemed more promising on this time scale. In the same team, Kariniotakis et al. [13,14] tested various forecasting methods for the Greek island of Crete. These included adaptive linear models, adaptive fuzzy logic models and wavelet based models. Adaptive fuzzy logic based models were installed for on-line operation in the frame of the Joule II project CARE (JOR3-CT96-0119). Fukuda et al. [15] worked on an autoregressive model for blade angle optimisation. Using data mining, they found that the use of additional variables was helpful only in December, but not in June. Hunt and Nason [16] used an analysis of principal components of wavelets derived from wind speed time series for a measure-correlate-predict technique.
In [17], a comprehensive comparison study on the application of different artificial neural networks to 1-h-ahead wind speed forecasting is presented. Three typical types of neural networks, namely adaptive linear element, back propagation, and radial basis function, are investigated. The wind data used are hourly mean wind speeds. The results show that, even for the same wind dataset, no single neural network model outperforms the others universally in terms of all evaluation metrics. Moreover, the type of neural network giving the best performance also depends on the data source. Different network structures, learning rates, and inputs are believed to result in different forecast accuracies. Among the optimal models obtained, the relative difference in terms of one particular evaluation metric can be as much as 20%. This indicates the need to generate a single robust and reliable forecast by applying a post-processing method.
For this purpose, in this paper the variation in wind data is formulated as a system identification problem, where the input of the system is the past values (z(t−1), z(t−2), z(t−3), …) of a time series and its desired output z(t) is the next value. When data are generated sequentially from an unknown time-varying system, the problem of approximating the target system by a parameterized probabilistic model via the data sequence is called an online learning problem. A reasonable approach to such problems is the online Bayesian approach, which uses an observation model P(zt|θt) parameterized by a time-varying parameter θt (∈ ℝ^k) to describe the behavior of the observation data zt.
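The input-output formulation above can be illustrated with a short sketch that turns an hourly series into pairs of past values and the next value. The function name `make_lagged_dataset` and the parameter `n_lags` are hypothetical; the paper does not state how many past values are used.

```python
import numpy as np

def make_lagged_dataset(z, n_lags):
    """Return (X, y) where row X[i] = (z(t-1), ..., z(t-n_lags)) and y[i] = z(t).

    n_lags is an assumed, user-chosen number of past values.
    """
    z = np.asarray(z, dtype=float)
    if n_lags < 1 or len(z) <= n_lags:
        raise ValueError("need 1 <= n_lags < len(z)")
    n = len(z)
    # Column k holds the value observed k+1 steps before the target.
    X = np.stack([z[n_lags - k - 1 : n - k - 1] for k in range(n_lags)], axis=1)
    y = z[n_lags:]
    return X, y

if __name__ == "__main__":
    X, y = make_lagged_dataset([1.0, 2.0, 3.0, 4.0, 5.0], n_lags=2)
    print(X)  # most recent past value in the first column
    print(y)
```

In an online setting each pair (X[i], y[i]) would be presented to the learner one at a time, in time order.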

Used Data
The dataset was recorded in Algeria; the collected experimental data are hourly average wind speeds. The data were collected during 2013 at a height of 60 m. Figure 1 shows the evolution of the wind speed time series during April 2013. The objective of this work is to predict these data based on past observations.

Results
There is considerable scope in meteorological time series prediction, and what is needed is forecast accuracy. First, the performance of a three-layer perceptron combined with online Bayesian learning was investigated for forecasting the next hourly average wind speed. A three-layer perceptron is used as the function f(·), where the integer nh (> 0) denotes the number of hidden units and the integer nx (> 0) is the dimension of the input variable xt. The parameter a0,t (∈ ℝ) is the output bias, the parameter ai,t (∈ ℝ) (i = 1, ..., nh) is the output weight of the i-th hidden unit, and the parameter vector bi,t (∈ ℝ^(nx+1)) (i = 1, ..., nh) is the weight vector connecting the inputs xt and the input bias to the i-th hidden unit.
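As an illustration of this parameterization, the following sketch evaluates a three-layer perceptron with an output bias, nh output weights and nh hidden weight vectors of length nx+1. The tanh activation and the names `mlp_output` and `n_params` are assumptions made for this example; the text does not specify the hidden activation.

```python
import numpy as np

def n_params(nx, nh):
    """Total number of parameters k: output bias + output weights + hidden weights."""
    return 1 + nh + nh * (nx + 1)

def mlp_output(x, a0, a, B):
    """Evaluate f(x) = a0 + sum_i a[i] * tanh(B[i] . [x, 1]).

    a0 : output bias (scalar)
    a  : output weights, shape (nh,)
    B  : hidden weight vectors incl. input bias, shape (nh, nx + 1)
    tanh is an assumed activation, not taken from the paper.
    """
    x_tilde = np.append(np.asarray(x, dtype=float), 1.0)  # input plus constant bias term
    hidden = np.tanh(B @ x_tilde)
    return float(a0 + a @ hidden)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nx, nh = 3, 4
    a0, a, B = 0.1, rng.normal(size=nh), rng.normal(size=(nh, nx + 1))
    print(n_params(nx, nh), mlp_output(rng.normal(size=nx), a0, a, B))
```

Stacking a0, a and the rows of B into one vector gives the parameter θt ∈ ℝ^k with k = 1 + nh + nh(nx + 1).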
The observation model P(zt|θt), parameterized by the time-varying parameter θt (∈ ℝ^k), describes the behavior of the observation data zt at time t, and a transition model (prior distribution) P(θt|θt−1) describes the behavior of the time-varying parameter θt.
Transition models P(θt|θt−1) based on the squared norm of the difference between the current parameter θt and the previous parameter θt−1, referred to below as transition model (1), have been considered [18]-[20]. In these models, the variable γt is often called the hyperparameter and controls the scale of the transition. However, the squared norm ||θt−θt−1||² does not adequately relate to the difference between the current observation model P(zt|θt) and the previous observation model P(zt−1|θt−1), and the changes between the observation models P(zt|θt) and P(zt−1|θt−1) are highly dependent on the values of θt−1.
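For orientation, a common form of a squared-norm transition model consistent with this description is an isotropic Gaussian random walk on the parameters. The expression below is an assumed illustration; the exact form used in [18]-[20] may differ in normalization or parameterization.

```latex
% Assumed isotropic Gaussian random-walk prior with precision \gamma_t
P(\theta_t \mid \theta_{t-1}) \;\propto\;
\exp\!\left( -\frac{\gamma_t}{2}\, \lVert \theta_t - \theta_{t-1} \rVert^{2} \right)
```

Under this form a larger γt penalizes parameter changes more strongly, and every direction in parameter space is penalized equally regardless of how much it changes the observation model.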
In Figure 1, the neural network cannot explore or predict speeds below 1.5 m/s or up to 11 m/s because, with the transition model (1), the change between the observation models is extremely small at θt−1 = θa, whereas it is large at θt−1 = θb. This can happen when the eigenvalues of the Fisher information matrix are small at θt−1 = θa and large at θt−1 = θb. This behavior of the transition model (1) affects the tracking of the system or the data zt. To overcome this limitation, we use the same artificial neural network structure with online Bayesian learning, where the observation data zt are generated by an input-output system. In the observation model P(zt|θt, βt) describing the observation data zt, the variables yt (∈ ℝ) and xt (∈ ℝ^nx) stand for the output variable and the input variable at time t, respectively, and the observation data zt is the set of these variables, i.e., zt = (yt, xt). The function P(xt) describes the probability distribution of the input variable xt. The online Bayesian approach uses the natural sequential prior transition model [21] to adequately consider the difference between the observation models P(zt|θt) and P(zt−1|θt−1), and to avoid adverse influences on the learning and/or tracking of the target system. Here, Δθt = θt − θt−1, and T stands for the transpose operator of vectors and matrices. The natural sequential prior uses a Fisher information matrix

$$F_{t-1} = \int P(z \mid \theta_{t-1}) \, \frac{\partial \log P(z \mid \theta_{t-1})}{\partial \theta_{t-1}} \left( \frac{\partial \log P(z \mid \theta_{t-1})}{\partial \theta_{t-1}} \right)^{T} dz$$

derived from the observation model P(zt−1|θt−1), in order to consider the difference between the observation models more naturally. In information geometry approaches, this Fisher information matrix Ft−1 (∈ ℝ^(k×k)) is defined as a metric matrix of a model manifold corresponding to the observation model P(zt−1|θt−1) [11]. The natural sequential prior is based on the norm ΔθtT Ft−1 Δθt, which takes this metric of the model manifold into account. Given the observation model in (19), the Fisher information matrix Ft can be written explicitly by assuming that the variable yt contains Gaussian noise. The variable βt ∈ (0,∞) stands for a hyperparameter describing the inverse variance of the Gaussian noise at time t. In this work we performed experiments with a time step of one hour. The network input is a sequence of past hourly mean values. The target output consists of future values. The root mean square error (RMSE) between observed and estimated values was used to evaluate the performance of the models. The RMSE was computed by

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( y_t - \hat{y}_t \right)^{2}} \qquad (9)$$

where y_t is the original time series, ŷ_t is the computed series, and N is the number of samples.
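To make the role of the Fisher information matrix concrete, the sketch below compares the squared Euclidean norm of a parameter change with the Fisher-weighted norm used by a natural sequential prior. It assumes a Gaussian observation model P(z|θ, β) = P(x) N(y; f(x; θ), 1/β), for which the Fisher matrix reduces to β E_x[∇f ∇fᵀ], and it approximates the expectation over P(x) by an average over sample inputs. The tanh perceptron and all function names are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np

def _split(theta, nx, nh):
    """Unpack theta into (a0, a, B) for a perceptron with nh hidden units."""
    a0 = theta[0]
    a = theta[1:1 + nh]
    B = theta[1 + nh:].reshape(nh, nx + 1)
    return a0, a, B

def mlp(theta, x, nh):
    """Assumed model: f(x) = a0 + sum_i a_i tanh(b_i . [x, 1])."""
    a0, a, B = _split(theta, len(x), nh)
    return a0 + a @ np.tanh(B @ np.append(x, 1.0))

def mlp_grad(theta, x, nh):
    """Gradient of f(x; theta) with respect to theta (same ordering as theta)."""
    _, a, B = _split(theta, len(x), nh)
    x_tilde = np.append(x, 1.0)
    h = np.tanh(B @ x_tilde)
    d_B = (a * (1.0 - h ** 2))[:, None] * x_tilde[None, :]
    return np.concatenate([[1.0], h, d_B.ravel()])

def empirical_fisher(theta, X, nh, beta):
    """F ~= beta * mean_x grad f grad f^T, valid under the Gaussian-noise assumption."""
    G = np.stack([mlp_grad(theta, x, nh) for x in X])
    return beta * (G.T @ G) / len(X)

def natural_sq_norm(theta_new, theta_old, F):
    """Fisher-weighted squared norm of the parameter change."""
    d = theta_new - theta_old
    return float(d @ F @ d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nx, nh = 2, 3
    theta_old = rng.normal(size=1 + nh + nh * (nx + 1))
    theta_new = theta_old + 0.1 * rng.normal(size=theta_old.size)
    X = rng.normal(size=(50, nx))
    F = empirical_fisher(theta_old, X, nh, beta=4.0)
    print("Euclidean:", float(np.sum((theta_new - theta_old) ** 2)))
    print("Fisher-weighted:", natural_sq_norm(theta_new, theta_old, F))
```

The two norms can differ substantially for the same parameter change, which is the effect the natural sequential prior is meant to account for: directions that barely alter the observation model are penalized weakly, and directions that alter it strongly are penalized heavily.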
The RMS error and the mean relative error were calculated for the different wind speed estimates and compared with the measured wind speed data. The results are given in Table 1. According to this table, the MAE improves from 21% to 3%, the RMSE improves from 1.6102 to 0.1892, and the correlation coefficient (r) improves to 0.996. As a result, the proposed model considers the difference between the observation models more naturally, and the results confirm its superiority in terms of errors. One of the most important aspects of future work will be to generalize and/or extend the proposed model to other renewable energy data such as solar irradiation and air temperature.
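The error measures discussed in this section (RMSE, mean relative error and correlation coefficient r) could be computed along the following lines. The definition of the mean relative error used here, the mean of |y − ŷ| / |y|, is an assumption, since the text does not give its formula; it is undefined when an observed value is zero.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between observed y and predicted y_hat."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mean_relative_error(y, y_hat):
    """Assumed definition: mean of |y - y_hat| / |y| (requires nonzero y)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    if np.any(y == 0):
        raise ValueError("relative error is undefined for zero observations")
    return float(np.mean(np.abs(y - y_hat) / np.abs(y)))

def correlation(y, y_hat):
    """Pearson correlation coefficient r between the two series."""
    return float(np.corrcoef(y, y_hat)[0, 1])

if __name__ == "__main__":
    # Illustrative numbers only, not data from the paper.
    observed = np.array([2.0, 4.0, 6.0])
    predicted = observed + 1.0
    print(rmse(observed, predicted))
    print(mean_relative_error(observed, predicted))
    print(correlation(observed, predicted))
```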

Fig. 1: Evolution of the wind speed time series used in this simulation.

Figures 2 and 3 present the simulation results for generating a single-step (one hour) prediction with natural sequential prior transition model.

Fig. 2. Actual and predicted wind speed data with classical learning.

Fig. 3. Actual and predicted wind speed data with our approach.

Table 1: Comparison between observed and predicted meteorological data

Conclusion
In this paper, a natural sequential prior which uses a Fisher information matrix was applied to the online learning of a three-layer perceptron used for wind speed prediction.