Wind Speed Prediction Based on Seasonal ARIMA model

Heavy dependence on fossil energy resources and the emission of greenhouse gases are widespread problems with very harmful impacts on human communities. Renewable energy resources such as wind power have therefore become a strong alternative. Nevertheless, because wind energy is intermittent and unpredictable, accurate wind speed forecasting remains a very challenging research subject. This paper addresses short-term wind speed forecasting based on the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. The forecasting performance of the model was evaluated on the same dataset using the Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) metrics. The results obtained indicate that the model achieves good forecasting accuracy.


Introduction
Renewable energies (RE) provide sustainable solutions to the energy challenges of the 21st century: climate change, air pollution, depletion of resources, and rapid demographic growth. More than 90% of the world's new renewable energy production capacity comes from solar and wind [1], and wind energy is among the most promising sources [2]. Coupled with interconnected grid networks, this source could, on the one hand, meet energy needs and, on the other hand, prevent the excessive depletion of fossil sources and reduce harmful gas emissions. The development of wind farm projects relies on climate data. Short-term forecasts, ranging from a few minutes to several days, are used to optimize electricity market transactions and to size and manage electricity networks, including reserves, at different spatial and temporal scales. Wind speed is the main factor in a wind power generation system. Accurate wind speed forecasting is therefore critical for system operators to ensure a reliable electricity supply and to operate the system securely in terms of system stability, auxiliary services and power quality [3]. Because of its nonlinear and non-stationary characteristics, as well as its complex interactions with several meteorological factors, wind speed is difficult to forecast precisely.
The first works on wind speed forecasting for the next hours and days appeared in the early 1980s. These methods were presented by the authors of [4], who proposed to give information on the uncertainty of forecasts. Today, the literature on wind forecasting and wind energy production is expanding very rapidly. A major part of the developments in forecasting methods is devoted to time series models (AR, ARMA, ARIMA, etc.). The authors of [5] used an Autoregressive Moving Average (ARMA) model to predict the hourly average wind speed, compared the results with the persistence model, and concluded that the errors of the ARMA model were smaller than those of the persistence model. In [6], the authors compared the Autoregressive Integrated Moving Average (ARIMA) and Artificial Neural Network (ANN) methods applied to a time series of wind speed measurements. They found that the ARIMA models showed better sensitivity to wind speed adjustment and prediction. In [7], the authors proposed a new wind speed prediction model based on ARIMA and an improved Kalman filter algorithm.
*E-mail: ilham.tyass-etu@etu.univh2c.ma
The primary purpose of this paper is to analyze the wind speed time series in order to extract significant statistics and other data features (trend, seasonality, noise, etc.). Additionally, we propose a wind speed prediction model based on the seasonal ARIMA (SARIMA) model. Finally, we use three different periods of the year to investigate the forecasting. Different evaluation metrics, namely the root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE), are used to evaluate the prediction performance.

Background information on the study area and data
The meteorological data used in this study were collected at the "Abdelkhalak Torres" site in Koudia Al Baida, Tetouen (latitude: 35° 45' 35.1'', longitude: -5° 41' 19.9''). This wind farm, commissioned by the National Office of Electricity and Drinking Water (ONEE) in 2000, has a capacity of 50.4 MW. It is the first wind farm built in Morocco and on the African continent, and a repowering and extension program is under development. Currently, more than 206 GWh/year of wind energy is generated by 91 wind turbines of 500 kW and 600 kW. This contributes to a reduction of greenhouse gas emissions equivalent to 140,000 tons of CO2 per year. The wind speed data used in the present study cover three different months of 2018: March, October and July, which correspond respectively to the windiest, moderately windy and least windy months. These data are recorded daily at 10-minute intervals at a height of 100 m [8]. The average wind speed at this site is about 10.3 m/s. Figure 1 shows the distribution of wind speed and direction at the studied location. We observe a predominance of wind in the East-South-East (ESE) direction.

Data mining
To prepare the data for processing, the original data were carefully reviewed for errors: unused columns were removed, missing and null values were replaced by the average of the five previous values, and the data were then aggregated by computing the average wind speed for each hour. Each month thus totals 744 data samples (31 days × 24 hours), whose evolution is depicted in Figure 2.
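The cleaning and hourly aggregation steps described above could be sketched as follows. This is a minimal illustration with pandas, not the authors' code: the function names `fill_with_previous_mean` and `to_hourly`, the sample values, and the assumption that the 10-minute readings are held in a `pandas.Series` indexed by timestamp are all hypothetical.

```python
import numpy as np
import pandas as pd


def fill_with_previous_mean(series: pd.Series, window: int = 5) -> pd.Series:
    """Replace each missing value by the mean of the `window` values before it."""
    values = series.to_numpy(dtype=float).copy()
    for i in range(len(values)):
        if np.isnan(values[i]):
            # Earlier gaps are already filled, so they count as previous values.
            previous = values[max(0, i - window):i]
            previous = previous[~np.isnan(previous)]
            if previous.size > 0:
                values[i] = previous.mean()
    return pd.Series(values, index=series.index, name=series.name)


def to_hourly(series: pd.Series) -> pd.Series:
    """Average the 10-minute readings within each hour."""
    return series.resample("60min").mean()


# Hypothetical example: two hours of 10-minute readings with one gap.
index = pd.date_range("2018-03-01 00:00", periods=12, freq="10min")
raw = pd.Series([0, 1, 2, 3, 4, np.nan, 6, 7, 8, 9, 10, 11],
                index=index, dtype=float)
hourly = to_hourly(fill_with_previous_mean(raw))
```

A gap at the very start of the series has no previous values and is left missing in this sketch; how such cases were handled in the study is not stated.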

Time Series Wind Data analysis
Time series are sequences of data points measured over successive time intervals. Compared with the most common domains of machine learning, their main specificities are their time dependency and the seasonal behaviors that appear in their evolution [9]. The first step in analyzing a time series is to decompose it. Decomposition provides a useful abstract model for better understanding problems during time series analysis and forecasting. A time series ($Y_t$) is commonly decomposed into:

Trend ($T_t$):
A trend corresponds to a long-term evolution of the series.

Seasonality ($S_t$):
This is the property of a time series displaying periodic behaviors repeating at a constant frequency.

Noise ($\varepsilon_t$):
Statistical noise consists of irregular, random and unexplained fluctuations.

This decomposition can be additive, $Y_t = T_t + S_t + \varepsilon_t$, or multiplicative, $Y_t = T_t \times S_t \times \varepsilon_t$. It is also possible to combine the two: $Y_t = T_t \times S_t + \varepsilon_t$. Sometimes another component is added, the cycle $C_t$, which corresponds to a regular repetitive (thus predictable) phenomenon of unknown or changing period. Figure 3 shows separately the three components obtained from the decomposition of the series; the variance of the series was stabilized in order to obtain a periodic component that is stable over time. Figure 3 clearly shows that the wind speed over the three months studied is unstable, with some seasonality. The residuals are also significant, which means that there are important random fluctuations in the data. According to the trend line, the series has no long-term evolution.
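As an illustration of the additive decomposition, the sketch below implements a classical moving-average decomposition with NumPy. The paper does not state which decomposition procedure was used, so this is an assumption, as are the function name `additive_decompose`, the 24-hour period and the synthetic series. The variance stabilization mentioned in the text is not shown.

```python
import numpy as np


def additive_decompose(y, period=24):
    """Classical additive decomposition: y = trend + seasonal + residual."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Centered moving average; an even period needs half weights at both ends.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # undefined near the edges of the series
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    # Seasonal component: mean detrended value at each position in the period.
    detrended = y - trend
    pattern = np.array([np.nanmean(detrended[k::period]) for k in range(period)])
    pattern -= pattern.mean()  # force the seasonal effects to sum to zero
    seasonal = np.tile(pattern, n // period + 1)[:n]
    resid = y - trend - seasonal
    return trend, seasonal, resid


# Hypothetical noise-free series: constant level plus a daily cycle.
hours = np.arange(24 * 10)
wind = 5.0 + 2.0 * np.sin(2 * np.pi * hours / 24)
trend, seasonal, resid = additive_decompose(wind, period=24)
```

On this noise-free example the trend is flat, the seasonal component reproduces the daily cycle and the residual is numerically zero. On measured wind speeds the residual would carry the random fluctuations discussed above.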

Autoregressive forecasting models for time series
Time series prediction models differ from classical models because they use the historical values of the series to estimate future ones, through lagged variables. In other words, to make a prediction at time t, the past values up to time (t-1) are assumed to be known. The first models developed for time series prediction are univariate models based on the principle of autoregression. The most popular models of this type are AR (Autoregressive), MA (Moving Average), ARMA (Autoregressive Moving Average) and ARIMA (Autoregressive Integrated Moving Average) [10].
AR model:
Most current models are inspired by the autoregressive principle. In an autoregressive model, the variable of interest is forecast using a linear combination of its past values. The term autoregression indicates that it is a regression of the variable against itself. Thus, an autoregressive model of order p can be written as:

$$Y_t = c + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \dots + \alpha_p Y_{t-p} + \varepsilon_t$$

where $c$ is a constant and $\varepsilon_t$ is the error term. Changing the parameters $\alpha_1, \dots, \alpha_p$ yields different time series models. The variance of the error term $\varepsilon_t$ only changes the scale of the series, not the patterns.
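The remark about the error variance can be checked numerically. The sketch below simulates an AR(p) process with NumPy, under illustrative assumptions: the function name `simulate_ar`, the coefficient values, a zero constant term and zero initial conditions are all hypothetical choices.

```python
import numpy as np


def simulate_ar(alphas, n, sigma=1.0, c=0.0, seed=0):
    """Simulate n steps of an AR(p) process driven by Gaussian noise."""
    rng = np.random.default_rng(seed)
    eps = sigma * rng.standard_normal(n)  # error term with std deviation sigma
    y = np.zeros(n)
    for t in range(n):
        y[t] = c + eps[t]
        # Add the weighted past values that exist at time t.
        for i, alpha in enumerate(alphas, start=1):
            if t - i >= 0:
                y[t] += alpha * y[t - i]
    return y


# Same coefficients and same random draws, different noise scale.
unit_noise = simulate_ar([0.6, -0.2], n=200, sigma=1.0, seed=1)
large_noise = simulate_ar([0.6, -0.2], n=200, sigma=3.0, seed=1)
```

With a zero constant, `large_noise` is exactly three times `unit_noise`: the pattern of the series is unchanged and only its scale differs.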

MA model:
The moving average model has the same structure as the AR model, but it uses the past error terms instead of the previous values of the series. The MA(q) model can be defined as follows:

$$Y_t = c + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \dots + \beta_q \varepsilon_{t-q}$$

where $\beta_1, \dots, \beta_q$ denote the moving average coefficients.

ARMA model (p, q):
Combines both the AR(p) and MA(q) processes by considering both the error terms and the previous values of the series:

$$Y_t = c + \alpha_1 Y_{t-1} + \dots + \alpha_p Y_{t-p} + \varepsilon_t + \beta_1 \varepsilon_{t-1} + \dots + \beta_q \varepsilon_{t-q}$$

ARIMA model:
For non-stationary time series, the ARIMA(p, d, q) model is more appropriate. It consists in applying the ARMA model to the series transformed by differencing of order d, that is, by computing d times the differences between consecutive observations. However, ARIMA does not support seasonality.

SARIMA model:
The Seasonal Autoregressive Integrated Moving Average (SARIMA) model is an extension of ARIMA that explicitly supports the seasonal component. Seasonality is itself treated as an ARIMA process, with terms similar to the non-seasonal components of the model: the parameter S is the period, and the parameters P, D and Q are the seasonal counterparts of p, d and q, applied to the series at lags that are multiples of S [11]. The parameters p, d, q, S, P, D and Q are determined by minimizing the Akaike information criterion (AIC). This criterion measures the trade-off between the complexity of a model and its goodness of fit [12].

There are also neural network and deep learning models that use lagged predictors, especially in the Recurrent Neural Network (RNN) family, such as Long Short-Term Memory (LSTM) [13] and Gated Recurrent Unit (GRU) networks [14].
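The differencing step behind the d and D parameters of ARIMA and SARIMA can be illustrated with a few lines of NumPy. This is a sketch under assumptions: the helper name `difference`, the period S = 24 (hourly data with a daily cycle) and the synthetic series are hypothetical.

```python
import numpy as np


def difference(y, lag=1, order=1):
    """Apply `order` rounds of differencing at the given lag."""
    y = np.asarray(y, dtype=float)
    for _ in range(order):
        y = y[lag:] - y[:-lag]  # each round shortens the series by `lag` points
    return y


# Hypothetical hourly series: linear trend plus a daily cycle.
t = np.arange(96)
series = 0.1 * t + 2.0 * np.sin(2 * np.pi * t / 24)

# Seasonal differencing (D = 1, S = 24) removes the daily cycle;
# the trend leaves a constant offset of 0.1 * 24 = 2.4.
seasonal_diff = difference(series, lag=24)

# Regular differencing (d = 1) then removes that remaining constant.
stationary = difference(seasonal_diff, lag=1)
```

After both steps the synthetic series is reduced to zeros, which is the stationary input an ARMA model expects. Real data would retain the noise component.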

Parameter selection
The aim of this phase is to find the set of parameters that achieves the best performance for the regression model. First, the data are differenced in order to make them stationary. The combinations of SARIMA parameters retained for each month are those that give the lowest Akaike information criterion (AIC). Figure 4 illustrates the model diagnostics obtained for each month. The diagnostics reveal that the model residuals are normally distributed. The KDE (kernel density estimate) line follows the N(0, 1) line (the standard notation for a normal distribution with mean 0 and standard deviation 1), which is another indication that the residuals are normally distributed. The quantile-quantile (Q-Q) plot, which assesses whether the data come from a given theoretical distribution, shows that the ordered distribution of residuals follows the linear trend in the middle of the graph but curves at the ends, which means that the data contain several extreme values. The correlogram shows that the time series residuals have low correlation with lagged versions of themselves. Consequently, the model produces a satisfactory fit that helps to understand the time series data and forecast future values.

Forecasting results
We split the dataset into a training set and a test set: the training set is used to fit the model, which then generates a prediction for each element of the test set. The forecasts start on 24-03-2018, 24-07-2018 and 24-10-2018, respectively, and run to the end of the data. A rolling forecast is used, given the dependence on the observations of the previous time steps. We produce one-step-ahead forecasts, meaning that the forecast at each point is generated using the full history up to that point. Figure 5 shows the observed values compared to the forecast values. On the whole, the forecasts align well with the actual values.
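The rolling one-step-ahead scheme can be sketched independently of the model. In the sketch below, the function names and sample values are hypothetical, and the forecaster is the simple persistence model mentioned in the Introduction, used as a stand-in. In the paper's setting the callable would wrap the fitted SARIMA model, which is not shown here.

```python
def rolling_one_step_forecast(train, test, forecaster):
    """Forecast each test point from the full history available before it."""
    history = list(train)
    forecasts = []
    for actual in test:
        forecasts.append(forecaster(history))  # uses data up to time t-1 only
        history.append(actual)                 # then reveal the observed value
    return forecasts


def persistence(history):
    """Stand-in model: the next value equals the last observed value."""
    return history[-1]


# Hypothetical hourly wind speeds in m/s.
train = [6.0, 7.0, 8.0]
test = [9.0, 7.5]
forecasts = rolling_one_step_forecast(train, test, persistence)
```

Keeping the loop separate from the model makes it explicit that no forecast uses the value it is trying to predict.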
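The prediction errors are measured with the RMSE, MAE and MAPE, defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

where $n$ is the number of forecasts, $\hat{y}_i$ is the forecast wind speed and $y_i$ is the corresponding actual value.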
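The three error metrics used in this paper (RMSE, MAE and MAPE) could be computed as in the following NumPy sketch, which uses their standard definitions. The function names and the sample values are hypothetical.

```python
import numpy as np


def rmse(actual, forecast):
    """Root mean square error, in the unit of the data (m/s here)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))


def mae(actual, forecast):
    """Mean absolute error, in the unit of the data (m/s here)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs(actual - forecast)))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent; undefined if any actual is 0."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(100.0 * np.mean(np.abs((actual - forecast) / actual)))


# Hypothetical actual and forecast wind speeds in m/s.
actual = [10.0, 8.0, 12.0, 5.0]
forecast = [9.0, 10.0, 12.0, 4.0]
scores = {
    "RMSE": rmse(actual, forecast),
    "MAE": mae(actual, forecast),
    "MAPE": mape(actual, forecast),
}
```

For these values the errors are 1, -2, 0 and 1 m/s, giving an MAE of 1.0 m/s, an RMSE of about 1.22 m/s and a MAPE of 13.75%.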

Results analysis
The forecasting performance was evaluated over the three different periods. It should be noted, however, that MAPE is not the best criterion for wind speed analysis, because some wind speed values are equal or close to 0 m/s. It is therefore better to use RMSE and MAE instead of MAPE to verify whether the model is suitable for wind speed prediction. From the October errors, we conclude that the quality of the SARIMA forecast depends on how stable the data are: when there are sudden changes in the analyzed data, higher errors occur. The reason is that the SARIMA prediction relies on previous wind speed values.
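The sensitivity of MAPE to values close to 0 m/s can be seen in a short worked example. The numbers below are hypothetical and chosen only for illustration. They are not taken from the dataset.

```latex
% Hypothetical values: the same absolute error of 0.5 m/s at two wind speeds.
\begin{align*}
y = 8.0,\ \hat{y} = 8.5 &: \quad |y - \hat{y}| = 0.5, \quad
  \left|\frac{y - \hat{y}}{y}\right| = \frac{0.5}{8.0} = 6.25\% \\
y = 0.1,\ \hat{y} = 0.6 &: \quad |y - \hat{y}| = 0.5, \quad
  \left|\frac{y - \hat{y}}{y}\right| = \frac{0.5}{0.1} = 500\%
\end{align*}
```

Both forecasts contribute equally to MAE and RMSE, but the near-calm observation dominates MAPE, and an observation of exactly 0 m/s makes MAPE undefined.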

Conclusion
Improving wind speed prediction accuracy is of great significance for wind energy conversion. In this paper, SARIMA-based models are developed to predict the wind speed. The model is verified on three example wind speed data sets. The prediction error for the one-hour-ahead forecast is less than 16%, which is a good result considering the short computation time, which does not exceed 5 minutes. To improve the accuracy of power forecasting, it would be useful to develop a model that integrates the wind direction, which will be investigated in future work. It is also possible to build a hybrid forecasting method based on SARIMA in order to capture more seasonal and trend changes and to account for both the high-frequency and low-frequency variations in wind speed.