PV power prediction based on AO-VMD-RF-Informer

Due to the strong volatility of PV power, grid-connected PV generation can affect the safe and stable operation of the power system, so accurate PV power prediction is of great significance for power system operation and maintenance. To improve prediction accuracy, an ultra-short-term PV power prediction method is studied that combines the Aquila Optimizer (AO), Variational Mode Decomposition (VMD), Random Forest (RF), and the Informer prediction model. First, the VMD parameters are optimized by AO to reduce the adverse effect of manually set parameters on prediction accuracy, and the optimized VMD decomposes the original PV power series into multiple sub-sequences, reducing the volatility and complexity of the original series. Then, RF feature selection screens out the strongly relevant meteorological features for each sub-sequence. This further reduces the feature dimension and model runtime while ensuring that the input features are effective. Finally, the Informer model mines the latent temporal features of each sub-sequence for prediction, and the predicted sub-sequences are superimposed to reconstruct the final prediction. Simulation results show that the proposed method achieves high prediction accuracy: compared with the original Informer, MAE is reduced by 49.14% and RMSE by 47.64%.


Introduction
With the worsening global energy crisis, countries are paying increasing attention to developing and utilizing renewable energy [1]. As a clean, non-polluting source, solar energy occupies an important position among renewables. However, photovoltaic (PV) generation is affected by multiple factors, such as weather conditions, equipment efficiency, season, and irradiance. These factors make its output unstable and seriously affect its efficiency and stability when connected to the grid [2]. Accurate PV power prediction therefore provides important decision support for grid scheduling and stable operation. It also helps improve the economic efficiency of solar generation and energy utilization, and it provides a reference for energy policy formulation and power market operation [3].
Existing prediction methods are mostly either physical methods [4] or machine learning methods. Physical methods predict output from the physical characteristics and laws of PV generation systems, and they typically include photothermal dynamic models and circuit theory models [5]. These models require detailed component parameters, such as the physical characteristics of the solar cells, and environmental parameters such as temperature and light intensity [6]. Literature [7] establishes the electrical relationship between weather factors and solar panels. It predicts output power with a diode model of the PV cell that accounts for inverter efficiency. Its prediction error is between 9% and 15%, but the accuracy depends on accurate numerical weather forecasts. Literature [8] builds a PV module power prediction model by calculating the tilted-plane irradiance of the panels and extracting PV cell parameters, achieving prediction at variable time scales. These methods can give relatively accurate results. In practice, however, environmental conditions and device parameters vary, so physical methods usually demand substantial expertise and complex calculations, which limits their application. Machine learning methods, by contrast, predict PV power by learning patterns from large amounts of historical data. They include linear regression [9], support vector machines [10], artificial neural networks [11], and deep learning [12]. They can model complex nonlinear relationships and place lower demands on the accuracy of environmental and equipment parameters, so they have become the mainstream prediction approach.
To address the strong volatility of PV power, literature [13] combines Ensemble Empirical Mode Decomposition (EEMD) with a Long Short-Term Memory (LSTM) network. This reduces PV power volatility and improves average prediction accuracy. However, the input features are neither screened nor reduced in dimension. In addition, EEMD, like EMD, suffers from mode mixing during decomposition, so the results may be unclear or fail to separate different modal components effectively. Literature [14] uses Variational Mode Decomposition (VMD) to decompose the PV power series into several modal components, which overcomes the mode mixing of EMD while reducing the volatility of PV power. LSTM then predicts each component, and the predicted components are superimposed, improving prediction accuracy to some extent. However, VMD requires some parameters to be chosen manually, such as the number of modes and the penalty (regularization) parameter. These choices strongly affect VMD's performance and results, yet the optimal configuration can be hard to determine in practice. Literature [15] compares the Informer model with a Recurrent Neural Network (RNN), LSTM, and Transformer, and shows that Informer offers high prediction accuracy and computational efficiency.
Building on the above studies, this paper investigates an ultra-short-term PV power prediction method that combines the Aquila Optimizer (AO), VMD, Random Forest (RF), and Informer. First, AO optimizes the VMD parameters, and the optimized VMD decomposes the historical power series into multiple modal components. This initially reduces the volatility of PV power and avoids the unsatisfactory decompositions that empirical parameter tuning can produce. Next, RF ranks feature importance for each component, and the meteorological factors with the greatest influence on each component are selected to form that component's input feature vector. Finally, an Informer prediction model is built for each component, and the component predictions are superimposed to obtain the final PV power prediction.

Algorithm theory
2.1. Variational mode decomposition
VMD is an adaptive signal analysis method that decomposes a complex signal into several intrinsic mode functions (IMFs) [16]. It decomposes nonlinear, nonstationary data into a set of band-limited IMFs, which mitigates the mode-mixing problem of EMD. Its mathematical model is:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$

where $u_k$ is the $k$-th sub-sequence mode of the power series, $\omega_k$ is the center frequency of that mode, $\delta(t)$ is the Dirac delta function, $K$ is the number of modes, $f(t)$ is the original power series, and $*$ denotes convolution.
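To solve this constrained problem, a Lagrange multiplier and a quadratic penalty factor are introduced, giving the augmented Lagrangian:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$

where $\lambda$ is the Lagrange multiplier and $\alpha$ is the quadratic penalty factor. The optimal $u_k$ and $\omega_k$ are found by updating them alternately:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \frac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}$$

where $n$ is the iteration number, $\hat{u}_k^{n+1}(\omega)$ is the Wiener-filtered estimate of each modal component, and the hat ($\hat{\cdot}$) denotes the Fourier transform.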
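The update rules above can be illustrated with a compact NumPy sketch. It is not the authors' implementation. It works on the one-sided spectrum from `np.fft.rfft` and skips the boundary mirroring used in the original VMD paper. It also initializes the centre frequencies uniformly and exposes a dual step `tau`; with `tau=0` the reconstruction constraint is only met approximately. The function name `vmd` and all defaults are assumptions. Frequencies are in cycles per sample (0 to 0.5).

```python
import numpy as np


def vmd(signal, alpha=2000.0, K=3, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch: alternating updates of modes u_k and centre frequencies omega_k.

    Returns the K modes (time domain) and their centre frequencies, sorted by frequency.
    """
    f = np.asarray(signal, dtype=float)
    T = f.size
    freqs = np.fft.rfftfreq(T)                     # non-negative frequency axis
    f_hat = np.fft.rfft(f)                         # one-sided spectrum of the signal
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 * np.arange(K) / K                 # uniform initial centre frequencies
    lam_hat = np.zeros(freqs.size, dtype=complex)  # Lagrange multiplier spectrum

    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # f_hat minus all other modes (modes i < k are already updated)
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k]
            # Wiener-filter update centred on omega_k
            u_hat[k] = (residual + lam_hat / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # centre of gravity of the mode's power spectrum
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # dual ascent on the constraint sum_k u_k = f
        lam_hat = lam_hat + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=T, axis=1)       # back to the time domain
    order = np.argsort(omega)
    return modes[order], omega[order]
```

For a test signal made of two sinusoids at 0.02 and 0.2 cycles/sample, `vmd(x, K=2)` should place the two centre frequencies near those values. In the paper's pipeline, `alpha` and `K` are the two values that AO searches over.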

Aquila optimization algorithm
The Aquila Optimizer (AO) is a metaheuristic inspired by the hunting behavior of the Aquila eagle, and it simulates four strategies the eagle uses to catch prey [17]. AO optimizes the VMD parameters as follows. After the AO parameters are initialized, the position of each individual is evaluated and updated in every iteration. If an individual's position exceeds the search boundary, it is clipped back inside the boundary. VMD is then run with each individual's parameters to decompose the signal and compute that individual's fitness. The individual with the best fitness among all current individuals is retained.
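The position update of an individual falls into four cases: expanded exploration, narrowed exploration, expanded exploitation, and narrowed exploitation. When $t \le \frac{2}{3}T$ and $R \le 0.5$, the algorithm is in the expanded exploration phase:

$$X_1(t+1) = X_{best}(t)\left(1 - \frac{t}{T}\right) + \left(X_M(t) - X_{best}(t)\cdot rand\right)$$

where $t$ is the current iteration number, $T$ is the maximum number of iterations, $X_{best}(t)$ is the best position at iteration $t$, $X_M(t)$ is the mean position of the current population at iteration $t$, and $R$ and $rand$ are random numbers in $[0, 1]$. When $t \le \frac{2}{3}T$ and $R > 0.5$, it is the narrowed exploration phase:

$$X_2(t+1) = X_{best}(t)\cdot Levy(D) + X_R(t) + (y - x)\cdot rand$$

where $Levy(D)$ is the Lévy flight step in dimension $D$, $X_R(t)$ is a randomly chosen individual, and $x$ and $y$ describe the spiral flight shape. When $t > \frac{2}{3}T$ and $R \le 0.5$, it is the expanded exploitation phase:

$$X_3(t+1) = \left(X_{best}(t) - X_M(t)\right)\alpha - rand + \left((UB - LB)\cdot rand + LB\right)\delta$$

where $UB$ and $LB$ are the upper and lower bounds and $\alpha$ and $\delta$ are exploitation adjustment parameters (this $\alpha$ is distinct from the VMD penalty factor). When $t > \frac{2}{3}T$ and $R > 0.5$, it is the narrowed exploitation phase:

$$X_4(t+1) = QF\cdot X_{best}(t) - G_1\cdot X(t)\cdot rand - G_2\cdot Levy(D) + rand\cdot G_1$$

$$QF(t) = t^{\frac{2\cdot rand - 1}{(1 - T)^2}}, \qquad G_1 = 2\cdot rand - 1, \qquad G_2 = 2\left(1 - \frac{t}{T}\right)$$

where $QF$ is the quality function that balances the search strategy, $G_1$ represents the various motions the AO uses to track the prey, and $G_2$ is the flight slope of the AO from the initial to the final position, decreasing from 2 to 0.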
When the maximum number of iterations is reached, the algorithm ends and returns the optimal position as well as the optimal fitness.
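The four-phase update can be sketched in NumPy as a minimizer over box bounds. This is an illustrative reading of the standard AO [17], not the authors' code. The following values come from the usual AO formulation and should be treated as assumptions:

- the spiral constants (r1 = 10, U = 0.00565, ω = 0.005, θ1 = 3π/2);
- α = δ = 0.1;
- the Lévy parameters (β = 1.5, s = 0.01);
- the greedy replacement rule.

The function names are hypothetical. In the paper's setting, `fitness` would run VMD with a candidate (alpha, K), with K rounded to an integer, and return a decomposition-quality score. The text does not specify that fitness function.

```python
import math

import numpy as np


def levy_flight(dim, rng, beta=1.5, s=0.01):
    """Mantegna-style Levy step (assumed AO default parameters)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.standard_normal(dim) * sigma
    v = rng.standard_normal(dim)
    return s * u / np.abs(v) ** (1 / beta)


def aquila_optimizer(fitness, lb, ub, n_pop=10, max_iter=10, seed=0):
    """Minimise `fitness` over the box [lb, ub] using the four AO phases."""
    if max_iter < 2:
        raise ValueError("max_iter must be >= 2 (QF divides by (1 - T)^2)")
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lb.size
    X = lb + rng.random((n_pop, dim)) * (ub - lb)
    fit = np.array([fitness(x) for x in X])
    best_x, best_f = X[np.argmin(fit)].copy(), float(fit.min())
    alpha = delta = 0.1
    # spiral shape used in the narrowed exploration phase
    d1 = np.arange(1, dim + 1)
    r = 10 + 0.00565 * d1
    theta = -0.005 * d1 + 3 * np.pi / 2
    y_sp, x_sp = r * np.cos(theta), r * np.sin(theta)

    for t in range(1, max_iter + 1):
        x_mean = X.mean(axis=0)
        qf = t ** ((2 * rng.random() - 1) / (1 - max_iter) ** 2)
        g2 = 2 * (1 - t / max_iter)
        for i in range(n_pop):
            if t <= (2 / 3) * max_iter:
                if rng.random() <= 0.5:   # expanded exploration
                    new = best_x * (1 - t / max_iter) + (x_mean - best_x * rng.random())
                else:                     # narrowed exploration
                    x_r = X[rng.integers(n_pop)]
                    new = best_x * levy_flight(dim, rng) + x_r + (y_sp - x_sp) * rng.random()
            else:
                if rng.random() <= 0.5:   # expanded exploitation
                    new = ((best_x - x_mean) * alpha - rng.random()
                           + ((ub - lb) * rng.random() + lb) * delta)
                else:                     # narrowed exploitation
                    g1 = 2 * rng.random() - 1
                    new = (qf * best_x - g1 * X[i] * rng.random()
                           - g2 * levy_flight(dim, rng) + rng.random() * g1)
            new = np.clip(new, lb, ub)    # confine the individual within the bounds
            f_new = fitness(new)
            if f_new < fit[i]:            # greedy replacement
                X[i], fit[i] = new, f_new
                if f_new < best_f:
                    best_x, best_f = new.copy(), float(f_new)
    return best_x, best_f
```

As a smoke test, minimizing the sphere function `lambda x: float(np.sum(x ** 2))` on [-5, 5]² should return a point close to the origin.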

Random forest
As an ensemble learning method, random forest can also be used for feature selection. Out-of-bag (OOB) samples and Mean Decrease Accuracy (MDA) are commonly used in random forests to select features and evaluate feature importance [18]. When building each tree, a random forest draws a bootstrap sample from the original data, and the OOB samples are those not drawn in this process. MDA assesses feature importance as follows:

- The baseline performance of the model is computed on the OOB samples.
- The values of one feature in the OOB samples are randomly permuted, and the random forest predicts these permuted samples to obtain a new performance metric.
- The difference between the new metric and the baseline is the MDA score of that feature. The larger the difference, the more important the feature is to the model.

The features that contribute most to predicting each component, according to their MDA scores, form the final input features of that modal component's prediction model.
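The MDA procedure amounts to permutation importance. The sketch below accepts any fitted predictor and a held-out sample. In a real random forest, each tree has its own OOB set and the scores are averaged over trees, which this simplified version does not reproduce. The sketch uses mean squared error as the metric, so a larger increase means a more important feature. That choice, the names `mda_scores` and `select_top_n`, and the toy linear predictor in the usage note are assumptions.

```python
import numpy as np


def mda_scores(predict, X_oob, y_oob, n_repeats=5, seed=0):
    """Permutation importance: mean increase in MSE when one feature column is shuffled."""
    rng = np.random.default_rng(seed)
    X_oob = np.asarray(X_oob, dtype=float)
    y_oob = np.asarray(y_oob, dtype=float)
    base_mse = np.mean((predict(X_oob) - y_oob) ** 2)     # baseline performance
    scores = np.zeros(X_oob.shape[1])
    for j in range(X_oob.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X_oob.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature j's link to y
            perm_mse = np.mean((predict(X_perm) - y_oob) ** 2)
            increases.append(perm_mse - base_mse)         # new metric minus baseline
        scores[j] = np.mean(increases)
    return scores


def select_top_n(scores, names, n):
    """Return the n feature names with the largest MDA scores, best first."""
    order = np.argsort(scores)[::-1][:n]
    return [names[i] for i in order]
```

With a predictor that uses only the first two of three features, the unused feature scores exactly zero. The top two names are then returned in order of importance.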

Informer model
Informer is a deep learning model based on the Transformer architecture, designed mainly for time series prediction, particularly high-dimensional and long-sequence forecasting tasks [19]. Building on the Transformer, it optimizes the encoder and decoder with a new temporal encoding method and generative-style decoding. These changes improve the model's ability to capture temporal patterns as well as its computational efficiency. Its structure is shown in Fig. The model introduces a probabilistic sparse (ProbSparse) self-attention mechanism that handles long sequences efficiently and reduces computational complexity by selecting the key time steps of a long sequence for attention. The Transformer's self-attention mechanism takes three inputs, the query ($Q$), the key ($K$), and the value ($V$), and computes a scaled dot product. Informer's ProbSparse self-attention lets each key attend only to the more important queries, extracting the main feature information:

$$A(Q, K, V) = \mathrm{Softmax}\left(\frac{\bar{Q}K^{T}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the three matrices obtained by linear transformations of the input PV sequence data, $\bar{Q}$ is $Q$ after the probabilistic sparse operation, and $d_k$ is the input dimension.
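The query-selection idea behind $\bar{Q}$ can be sketched for a single head in NumPy. Following the Informer paper, each query's sparsity is measured as the maximum minus the mean of its scaled scores. The top $u = \lceil c \ln L_Q \rceil$ queries receive full attention, and the remaining "lazy" queries receive the mean of $V$. This sketch makes several simplifying assumptions. Informer estimates the measure from a random subset of keys, whereas this sketch computes every score, so it shows the selection logic but not the complexity saving. It also has no masking or multiple heads, and it takes c = 5 as an assumed default.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def probsparse_attention(Q, K, V, c=5):
    """Single-head ProbSparse self-attention sketch (all scores computed, no masking)."""
    L_q, d_k = Q.shape
    scores = Q @ K.T / np.sqrt(d_k)                  # scaled dot products
    # sparsity measurement per query: max minus mean of its scores
    M = scores.max(axis=1) - scores.mean(axis=1)
    u = min(L_q, int(np.ceil(c * np.log(L_q))))      # number of active queries
    top = np.argsort(M)[::-1][:u]
    # lazy queries get mean(V); active queries get ordinary attention
    out = np.tile(V.mean(axis=0), (L_q, 1))
    out[top] = softmax(scores[top], axis=1) @ V
    return out, top
```

For a 96-step input with c = 5, only 23 of the 96 queries receive full attention.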
In addition, Informer uses a sequence distillation method called "Distilling the Past to the Future", which extracts the useful information in past data and uses it to predict the future. The process is:

$$X_{j+1}^{t} = \mathrm{MaxPool}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left(\left[X_j^{t}\right]_{AB}\right)\right)\right)$$

where $[X_j^t]_{AB}$ denotes the output of the multi-head ProbSparse self-attention block of the $j$-th layer, $\mathrm{Conv1d}$ is a one-dimensional convolution, and $\mathrm{ELU}$ is the activation function. Each distilling step halves the sequence length passed on to the next layer, which reduces the computation and runtime of the model.
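One distilling step can be sketched with plain NumPy: a kernel-3 convolution, ELU, and max pooling with kernel 3, stride 2, and padding 1. The pooling stride is what halves the sequence length. This sketch uses zero padding and omits the batch normalization and circular padding of the public Informer implementation. The weight shapes and the helper names (`conv1d_same`, `max_pool1d`, `distil`) are assumptions.

```python
import numpy as np


def elu(x, alpha=1.0):
    """ELU activation; np.minimum avoids overflow in exp for large positive x."""
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0)) - 1))


def conv1d_same(x, w, b):
    """x: (L, d_in), w: (k, d_in, d_out) with odd k, b: (d_out,); zero 'same' padding."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.einsum("kd,kde->e", xp[t:t + k], w) for t in range(x.shape[0])]) + b


def max_pool1d(x, kernel=3, stride=2, pad=1):
    """Max pooling along the time axis; padded positions never win the max."""
    xp = np.pad(x, ((pad, pad), (0, 0)), constant_values=-np.inf)
    n_out = (x.shape[0] + 2 * pad - kernel) // stride + 1
    return np.stack([xp[i * stride:i * stride + kernel].max(axis=0) for i in range(n_out)])


def distil(x_att, w, b):
    """One distilling step: Conv1d -> ELU -> MaxPool(stride 2), halving the length."""
    return max_pool1d(elu(conv1d_same(x_att, w, b)))
```

Applied to a (96, 8) attention output, `distil` returns a (48, 8) array.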
The decoder then receives a segment of known PV sequence data together with a segment of zeros whose length equals the prediction horizon. The zeros serve as placeholders for the values to be predicted. Finally, a fully connected layer adjusts the output dimension to produce the prediction.
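The decoder input described above (known context followed by zero placeholders) can be built as follows. The length of the known segment (`label_len`) is not given in the text. With the paper's setting of 4 input steps and 1 output step, `label_len=2` below is a hypothetical choice, and `build_decoder_input` is an illustrative name.

```python
import numpy as np


def build_decoder_input(enc_input, label_len, pred_len):
    """Concatenate the last `label_len` known steps with `pred_len` zero placeholders.

    enc_input: (seq_len, n_features) encoder input window.
    """
    enc_input = np.asarray(enc_input, dtype=float)
    # explicit start index so label_len=0 yields an empty context rather than the whole array
    start_token = enc_input[enc_input.shape[0] - label_len:]
    placeholder = np.zeros((pred_len,) + enc_input.shape[1:])  # values to be predicted
    return np.concatenate([start_token, placeholder], axis=0)
```

For a 4-step, 2-feature window, `build_decoder_input(x, label_len=2, pred_len=1)` returns the last two known rows followed by one row of zeros.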

PV power prediction based on AO-VMD-RF-Informer
To address the strong volatility of PV power series and the drawbacks of setting VMD parameters manually, the PV power series is decomposed with AO-optimized VMD into IMFs in multiple frequency bands. This initially reduces the volatility of the original series. RF then screens the input features of each IMF, reducing the feature dimension while keeping the input features valid, in order to improve prediction accuracy. Finally, an Informer model is built to predict each sub-sequence, and the predictions are superimposed to reconstruct the final result. The flow of the prediction model is shown in Fig. 2.
(1) After the historical dataset of the PV power plant is obtained, the data are preprocessed. They are sampled at 15-minute intervals and checked for missing values and outliers, which are filled by mean interpolation.
(2) Initialize the population size and the maximum number of iterations of AO. AO then optimizes the VMD parameters, namely the penalty factor (alpha) and the number of modes (K), according to the fitness function. In each iteration the population is updated with the AO rules described above, until the maximum number of iterations is reached. Finally, the PV power signal is decomposed by VMD with the optimized parameters.
(3) With all feature sequences X and modal functions Y obtained, RF calculates the MDA score of each feature for each mode. The features are sorted by importance score, and the top n features are selected. These features are then extracted from the original feature data X to form a new feature set for each Y.
(4) After a suitable feature set is chosen for each sub-sequence, an Informer model is built to predict it. The mean squared error (MSE) is used as the training loss, together with a validation set and early stopping. When the validation MSE fails to decrease for three consecutive epochs, training stops and the best model is saved for testing. Finally, the predictions of all IMFs are superimposed to reconstruct the final prediction.
(5) Multiple error evaluation metrics are used to analyze the prediction results and verify the effectiveness of the modeling.
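The early-stopping rule and the final superposition in step (4) can be sketched as two small helpers. The names are hypothetical, and the model training itself is omitted: `early_stopping_epoch` only replays a sequence of validation losses to show which epoch's model would be kept.

```python
import numpy as np


def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the best epoch, stopping after `patience` epochs without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0  # the model would be saved here
        else:
            waited += 1
            if waited >= patience:
                break                                       # stop training
    return best_epoch


def reconstruct(imf_predictions):
    """Superimpose per-IMF forecasts (n_imfs x n_steps) into the PV power forecast."""
    return np.sum(np.asarray(imf_predictions, dtype=float), axis=0)
```

For validation losses [1.0, 0.8, 0.85, 0.9, 0.95, 0.5], training stops after the third non-improving epoch, so epoch 1 is kept and the later 0.5 is never reached.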

Case study
4.1. Data sources and related settings
This paper uses measured data from a kilowatt-scale PV power plant in Alice Springs, Australia, from 2014 to 2015 for experimental validation. The raw data are recorded every 5 min. Because the raw sampling density is high and there is no sunlight at night, nighttime data are deleted. Only data from 7:00 to 19:00 each day are retained, sampled once every 15 minutes, giving 35,667 samples in total. Considering the characteristics of the input time series, the input sequence length is set to 4 and the output length to 1; that is, the power at the next time step is predicted. The dataset is divided into training, validation, and test sets in a 6:2:2 ratio. Finally, prediction results for typical sunny days and typical fluctuating days randomly selected from the test set are presented and analyzed.
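The 4-in, 1-out setting and the 6:2:2 split can be sketched as sliding-window sample construction followed by an index split. The text does not say whether the split is chronological or shuffled. This sketch assumes a chronological split, which avoids leaking future values into training. The function names are hypothetical.

```python
import numpy as np


def make_windows(series, n_in=4, n_out=1):
    """Slide over the series: n_in past values -> next n_out values."""
    series = np.asarray(series, dtype=float)
    n = series.size - n_in - n_out + 1
    if n <= 0:
        raise ValueError("series is shorter than n_in + n_out")
    X = np.stack([series[i:i + n_in] for i in range(n)])
    y = np.stack([series[i + n_in:i + n_in + n_out] for i in range(n)])
    return X, y


def chronological_split(n_samples, ratios=(0.6, 0.2, 0.2)):
    """Train/validation/test index arrays in time order (6:2:2 by default)."""
    n_train = int(round(n_samples * ratios[0]))
    n_val = int(round(n_samples * ratios[1]))
    idx = np.arange(n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

For example, a 10-point series yields 6 windows, the first mapping [0, 1, 2, 3] to 4.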

Results for AO-VMD
Fig. 3 shows the original PV power data and the decomposition results; only the first 3000 samples are shown. The AO population size is N = 10, the maximum number of iterations is T = 10, and the number of modes after decomposition is K = 7. IMF1 is not strongly volatile and shows the overall trend of PV power over this period. IMF2 and IMF3 vary slowly and reveal the latent periodicity of PV power. IMF4 to IMF7 vary relatively fast and, to some extent, reflect the strong volatility of PV power. Because the modes produced by AO-VMD have different characteristics, the original signal can be analyzed from different perspectives, which reveals some of its hidden characteristics. At the same time, the decomposition reduces the strong volatility of the original PV data, which helps improve prediction accuracy.

RF feature screening results
The original PV dataset contains several meteorological series: global horizontal radiation (GHR), diffuse horizontal radiation (DHR), temperature (T), humidity (H), wind speed (WS), wind direction (WD), and rainfall (R). Once the PV power series is decomposed into multiple sub-sequences, each sub-sequence has a different physical meaning, and the correlation between the meteorological factors and each sub-sequence changes accordingly. Features must therefore be selected separately for each IMF.
RF feature selection is used to calculate the MDA scores between each sub-sequence and the meteorological features. For each sub-sequence, the few features with high, similar scores are selected as inputs, and the final selection is shown in Table 1. Table 1 shows that GHR, DHR, H, and T have the greatest influence on the overall trend of PV power, while the periodic components of the power series are most affected by GHR and DHR. The other meteorological factors are key to the fluctuating components, and DHR and GHR contribute the most across all sub-sequences.

Analysis of forecast results
Prediction accuracy is evaluated with the mean absolute error (MAE) and root mean squared error (RMSE); smaller values indicate better predictions:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $n$ is the number of samples, $y_i$ is the measured value, and $\hat{y}_i$ is the predicted value.

As Figure 4 shows, PV power on sunny days varies regularly and the curve is relatively smooth, so every model fits the measured values well. The PV power in Figure 4 also undergoes an abrupt change, and the proposed model fits the change point better than the other models. The PV power in Figure 5 changes abruptly and repeatedly; the proposed model follows these changes better than the other models and gives better predictions. The error metrics of each model for the three typical days are shown in Table 2. Table 2 shows more directly that, whatever the type of power variation, the proposed method has lower MAE and RMSE than the other models and therefore higher prediction accuracy.
This paper also evaluates the error metrics of all models on the whole test set, as shown in Table 3. Table 3 shows that AO, VMD, and RF each improve the average accuracy of PV power prediction. Adding VMD to the Informer model reduces MAE by 23.69% and RMSE by 26.32%, which verifies the effectiveness of VMD. Adding RF feature selection as well reduces MAE by 29.52% and RMSE by 34.34% relative to Informer, which shows that feature screening has a positive effect.
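The two metrics and the percentage reductions quoted in the comparison can be computed as follows. The text does not state its reduction formula; this sketch assumes the usual relative reduction, (baseline − new) / baseline × 100. The function names are illustrative, and no values from Table 3 are reproduced here.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def reduction_pct(baseline, improved):
    """Relative reduction of an error metric, in percent (assumed definition)."""
    return 100.0 * (baseline - improved) / baseline
```

For example, if a baseline model has MAE 2.0 and an improved model has MAE 1.0, `reduction_pct(2.0, 1.0)` gives 50.0.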

Conclusion
PV power is affected by multiple factors and shows significant volatility, which may cause a series of problems during grid connection. To address this, the paper investigates an ultra-short-term PV power prediction method based on AO-VMD-RF-Informer. The model is validated by simulation with measured data from a kilowatt-scale PV power plant in Alice Springs, Australia. The main conclusions are as follows:
(1) Optimizing the VMD parameters with AO alleviates the poor prediction accuracy caused by manually set parameters. The experimental results show that, after AO optimization, VMD decomposes the series into more stable sub-sequences. This effectively reduces the complexity of the original series and helps improve prediction accuracy.
(2) RF feature selection effectively filters out the input features most strongly correlated with each sub-sequence, which reduces the feature dimension, the model runtime, and the prediction error.
(3) The Informer model can deeply mine the latent features of the time series. Comparative experiments confirm that the proposed model outperforms the other models on all error metrics on sunny days, days with sudden power changes, and days with power fluctuations. They also show that it predicts better than the original model.
Compared with the original Informer, the proposed method reduces MAE by 49.14% and RMSE by 47.64%. This shows that the sub-sequences obtained with AO-optimized VMD, combined with RF feature selection, better reflect the characteristics of the data and help Informer predict accurately.

Discussion
The proposed method achieves high prediction accuracy under various weather conditions, and adding AO-VMD and RF to the Informer model effectively reduces the prediction error, which demonstrates the method's effectiveness. This study improves prediction accuracy when PV power fluctuates strongly. This supports the stable operation of power systems with grid-connected PV plants and, to some extent, promotes the development of clean energy.

Table 1. Feature selection results.

Table 2. Error evaluation metrics for the three typical-day forecasts.

Table 3. Error evaluation metrics for the test set forecasts.