Analysis of hydropower ratio from total energy production in Romania

It is important to know the structure of a country's electricity production and to establish the place of each source within the total energy mix. The aim of this paper is to analyze statistically the share of hydropower in the total electrical energy produced in Romania. A time series analysis was carried out to determine the trends in the average and standard deviation of the hydro energy data for Romania over an eleven-year period (2006-2016).


Introduction
Generally, recorded data from any process can be used to create models based on statistical analysis and machine learning. In order to be able to process such series, it is important to determine the main features that allow further predictions. By analyzing the known data, the tendency of their variation can be shown by identifying the nature of the represented phenomenon.
Numerous activities that present some uncertainties can be defined through time series of observations that are time-dependent, such as: monthly energy production, the temperatures recorded for a given region in the last 10 years, the evolution of the exchange rate, the annual evolution of a country's population, etc.
Based on the obtained model with statistical analysis of the temporal series, it is possible to predict the future evolution of the process from the last recorded values of the variable of interest.
Paper [7] presents the main characteristics of energy data and the existing models for energy research. In general, time series analysis assumes that four components can be identified: tendency, periodicity, cyclicality and a stochastic component. For example, a comparative analysis of energy consumption in different countries was made using an integrated data characteristic approach [7]. Hence, formulating a forecasting model based on time series analysis requires the determination of these four components [8].
Various authors propose improving forecasts based on time series analysis (in particular for nonstationary series) by using artificial intelligence [9][10][11].
For Romania, the evolution over time of electricity consumption and of hydropower plant production since 1884 was presented in [12]. The Romanian hydropower sector recorded an important increase in hydroelectric power between 1950 and 1990, when the largest and most important hydropower developments (HPDs) were built (e.g., Izvorul Muntelui Bicaz HPD, Vidraru HPD, Lotru HPD, Iron Gates HPD). From 1 MW of installed capacity in 1884, Romania reached an installed capacity of 6761 MW on January 1st, 2018. In 2017, the electricity sector in Romania was dominated by coal and hydrocarbons (25.1%), hydro (14.5%), nuclear (10.6%), and renewable energy sources (wind, solar and biomass; 9.6%) [14].
The analysis of hydropower production variation was also related to the construction and modernization of hydropower plants. Furthermore, the economy of a country can influence the variation in time of produced and consumed energy [15].

Data analyzed as time series
With the implementation of European Directives, over the past ten years, an increase in electricity production from renewable energy sources can be noticed. For example, wind energy accounted for approximately 0.02% of the total electricity production in Romania in 2008, reaching 10.1% in 2016.
In this context, it becomes necessary to observe the evolution over time of the percentage of energy from hydropower in the total energy production in Romania by performing a statistical analysis.
Chronological records of the monthly hydropower production and the monthly total electricity production in Romania between 2006 and 2016 define the time series analyzed in this paper. We mention that the data for the first two years of the series were approximated from existing graphs, so their precision may be limited. The rest of the values used in the analysis are given as numerical values in the annual reports of Hidroelectrica, available online [16].
For the coefficient defined as the hydropower production divided by the total electricity production in Romania, hereafter referred to as the random variable X, the analyzed values are recorded with yearly time-step $p = 1, 2, \ldots, N$ and monthly time-step $\tau = 1, 2, \ldots, \omega$ (here $N = 11$ years and $\omega = 12$ months), and the notation is $X_{p,\tau}$. Since the analysis period is not long enough, only the tendency, periodicity and stochastic components were determined; the cyclicality component could not be identified.
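The construction of the variable can be illustrated with a minimal sketch, assuming the two monthly series are available as plain lists of equal length covering whole years. The function name `hydro_ratio` and the sample numbers are hypothetical and are not taken from the Hidroelectrica reports.

```python
def hydro_ratio(hydro, total, months_per_year=12):
    """Return X[p][tau]: hydro share of total production in percent, one row per year."""
    if len(hydro) != len(total):
        raise ValueError("hydro and total must have the same length")
    if len(hydro) % months_per_year != 0:
        raise ValueError("the series must cover whole years")
    ratio = [100.0 * h / t for h, t in zip(hydro, total)]
    # Group the flat monthly series into rows of one year each.
    return [ratio[i:i + months_per_year]
            for i in range(0, len(ratio), months_per_year)]


# Made-up monthly outputs for two years (illustrative only, not data from the paper).
hydro = [1.0, 2.0] * 12
total = [4.0, 5.0] * 12
X = hydro_ratio(hydro, total)
print(len(X), len(X[0]), X[0][0], X[0][1])
```

Storing the series as one row per year makes the yearly index p and the monthly index τ explicit, which simplifies the annual and monthly statistics used in the following steps.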

Determination of time series characteristics
The time variation of the random variable $X_{p,\tau}$ is analyzed through its components. If the tendency, periodicity and stochastic components are identified and removed from the original random variable, what remains is a variable that behaves as a dependent stationary process.

Modeling the tendency component on average
To analyze the tendency of the monthly average values, a linear regression equation can be determined using the average values of the variable for each year:
$$T_m(X_p) = A_m + B_m t$$
where $A_m$, $B_m$ are the coefficients of the polynomial function and $t$ is the year index.
To change the time from years to months, the year index is expressed in terms of the monthly time-step, so that a regression equation $T_m(X_{p,\tau})$ can be found for each step in the original series. Removing the average trend from the original variable generates a new variable, $Y_{p,\tau}$. If the tendency in average is important, this new variable will be different from the original one.
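The trend-in-average step can be sketched as follows. This is only an illustration under stated assumptions: the conversion from years to months is taken here as t = (p - 1) + (τ - 0.5)/ω, with annual means placed at mid-year, and the overall mean is added back after subtracting the fitted line so that the detrended variable stays on the scale of the original one. Both conventions, and the helper names `fit_line` and `remove_mean_trend`, are assumptions rather than the relations used in the paper.

```python
def fit_line(t, y):
    """Ordinary least-squares fit y ~ a + b*t; returns (a, b)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def remove_mean_trend(X):
    """Fit a line to the annual means of X[p][tau] and detrend the monthly values."""
    n_years, omega = len(X), len(X[0])
    # Assumption: year p spans t in [p-1, p]; its mean is placed at mid-year.
    t_year = [p - 0.5 for p in range(1, n_years + 1)]
    annual_means = [sum(row) / omega for row in X]
    a_m, b_m = fit_line(t_year, annual_means)
    overall_mean = sum(annual_means) / n_years
    Y = []
    for p in range(1, n_years + 1):
        row = []
        for tau in range(1, omega + 1):
            t = (p - 1) + (tau - 0.5) / omega  # assumed monthly time axis
            # Assumption: subtract the trend line, then restore the overall level.
            row.append(X[p - 1][tau - 1] - (a_m + b_m * t) + overall_mean)
        Y.append(row)
    return (a_m, b_m), Y


# Synthetic series with an exact linear trend (illustrative values only).
X_demo = [[20.0 + 2.0 * ((p - 1) + (tau - 0.5) / 12) for tau in range(1, 13)]
          for p in range(1, 4)]
(a_m, b_m), Y_demo = remove_mean_trend(X_demo)
print(round(a_m, 6), round(b_m, 6))
```

For a series that is exactly linear in time, the detrended values collapse to the overall mean, which is a quick sanity check of the procedure.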

Modeling the tendency component on variation
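The series of annual averages of the variable $Y_{p,\tau}$ is determined using the relation:
$$\bar{Y}_p = \frac{1}{\omega}\sum_{\tau=1}^{\omega} Y_{p,\tau}$$
where $\omega = 12$ is the number of months in a year, and then the standard deviation $s_p$ of the monthly values about $\bar{Y}_p$ can be computed for each year. For this variable, a linear regression equation can be determined, of the form:
$$T_s(Y_p) = A_s + B_s t$$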
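A minimal sketch of this step, assuming the detrended values are stored as one row per year. The sample standard deviation (denominator ω - 1) is an assumption, since the normalization is not stated here, and the names `annual_stats`, `fit_line`, `a_s` and `b_s` are hypothetical.

```python
import statistics


def annual_stats(Y):
    """Annual mean and standard deviation of each row Y[p]."""
    means = [statistics.fmean(row) for row in Y]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    stds = [statistics.stdev(row) for row in Y]
    return means, stds


def fit_line(t, y):
    """Ordinary least-squares fit y ~ a + b*t; returns (a, b)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    return y_mean - b * t_mean, b


# Synthetic data whose spread grows linearly with the year (illustrative only).
base = list(range(1, 13))
Y_demo = [[p * v for v in base] for p in (1, 2, 3)]
means, stds = annual_stats(Y_demo)
a_s, b_s = fit_line([1, 2, 3], stds)
print(means, [round(s, 4) for s in stds], round(a_s, 6), round(b_s, 6))
```

A slope close to zero in this regression would indicate that the annual spread of the variable is stable over the analyzed period.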

Determination of the periodic component
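The periodicity of this series is determined using the autocorrelation method.
To determine whether the time series has a periodic component, the Anderson test [17] is used, according to which the boundaries of the confidence interval for the process correlation are given by the equation:
$$r_k = \frac{-1 \pm Z_C\sqrt{N-k-1}}{N-k} \qquad (8)$$
where $N$ is the length of the data series, $k$ is the order of the correlation and $Z_C$ is the standard normal distribution quantile corresponding to the chosen confidence level. If the correlogram exceeds the limits specified by (8), then the process is not purely stochastic, but also has a periodic component.
In this case, the periodic mean $\mu_\tau$ and the periodic standard deviation $\sigma_\tau$ of variable Z can be expressed in the form:
$$\mu_\tau = m_Z + \sum_{j=1}^{n}\left(A_j\cos\frac{2\pi j\tau}{\omega} + B_j\sin\frac{2\pi j\tau}{\omega}\right), \qquad \sigma_\tau = s_Z + \sum_{j=1}^{n}\left(A'_j\cos\frac{2\pi j\tau}{\omega} + B'_j\sin\frac{2\pi j\tau}{\omega}\right)$$
where $m_Z$ and $s_Z$ are the average and the standard deviation of Z; $n$ is the number of significant harmonics taken into account, and $A_j$, $B_j$, $A'_j$, $B'_j$ are the Fourier coefficients. The averages and standard deviations in these relationships are computed for each month $\tau$ from the multi-year values of that month. According to the mean $\mu_\tau$ and the standard deviation $\sigma_\tau$, variable Z can be written:
$$Z_{p,\tau} = \mu_\tau + \sigma_\tau\,\varepsilon_{p,\tau} \qquad (14)$$
where $\varepsilon_{p,\tau}$ is the independent stochastic component.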
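The check against the Anderson limits can be sketched as below. The autocorrelation estimator shown is one common choice and is an assumption here, as is the 95% confidence level; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def autocorr(series, k):
    """Lag-k sample autocorrelation coefficient (one common estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[i] - mean) * (series[i + k] - mean) for i in range(n - k))
    return num / denom


def anderson_limits(n, k, confidence=0.95):
    """Two-sided Anderson limits for r_k of an independent series of length n."""
    z_c = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z_c * math.sqrt(n - k - 1)
    return (-1.0 - half_width) / (n - k), (-1.0 + half_width) / (n - k)


# Synthetic 11-year monthly series with a pure 12-month cycle (illustrative only).
series = [math.sin(2 * math.pi * i / 12) for i in range(132)]
r12 = autocorr(series, 12)
lower, upper = anderson_limits(len(series), 12)
print(round(r12, 4), round(lower, 4), round(upper, 4), r12 > upper)
```

For a series with a strong annual cycle, the lag-12 coefficient falls well outside the limits, which is the situation the test is designed to detect.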
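A Fourier representation of twelve monthly statistics can be sketched as follows. The coefficient formulas are the standard discrete Fourier ones and are assumed here, not quoted from the paper; in particular, the half weight applied to the harmonic j = ω/2 is a choice of this sketch. The same routine would apply to monthly averages and to monthly standard deviations.

```python
import math


def fit_harmonics(monthly_values, n_harmonics):
    """Return (mean, [(A_j, B_j), ...]) for one value per month tau = 1..omega."""
    omega = len(monthly_values)
    mean = sum(monthly_values) / omega
    coeffs = []
    for j in range(1, n_harmonics + 1):
        # The harmonic at j = omega/2 needs weight 1/omega instead of 2/omega.
        weight = (1.0 if 2 * j == omega else 2.0) / omega
        a_j = weight * sum((v - mean) * math.cos(2 * math.pi * j * tau / omega)
                           for tau, v in enumerate(monthly_values, start=1))
        b_j = weight * sum((v - mean) * math.sin(2 * math.pi * j * tau / omega)
                           for tau, v in enumerate(monthly_values, start=1))
        coeffs.append((a_j, b_j))
    return mean, coeffs


def eval_harmonics(mean, coeffs, tau, omega):
    """Evaluate the fitted periodic function at month tau."""
    return mean + sum(a * math.cos(2 * math.pi * j * tau / omega)
                      + b * math.sin(2 * math.pi * j * tau / omega)
                      for j, (a, b) in enumerate(coeffs, start=1))


# Synthetic monthly means built from two known harmonics (illustrative only).
m_demo = [4.0 + 1.5 * math.cos(2 * math.pi * tau / 12)
          + 0.5 * math.sin(4 * math.pi * tau / 12) for tau in range(1, 13)]
mean, coeffs = fit_harmonics(m_demo, 2)
print(round(mean, 6), [(round(a, 6), round(b, 6)) for a, b in coeffs])
```

With ω = 12, six harmonics are the maximum that twelve monthly values can support; with the weighting used in this sketch they reproduce the monthly values exactly, and fewer harmonics give a smoother approximation.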

Independent and stationary stochastic component
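By relation (14), the independent and stationary stochastic component is determined as:
$$\varepsilon_{p,\tau} = \frac{Z_{p,\tau} - \mu_\tau}{\sigma_\tau} \qquad (16)$$
with $\mu_\tau$ and $\sigma_\tau$ the periodic mean and standard deviation of Z. For the variable $\varepsilon$ one can choose a normal probability density function of the form:
$$f(\varepsilon) = \frac{1}{\sigma_\varepsilon\sqrt{2\pi}}\exp\left[-\frac{(\varepsilon-\mu_\varepsilon)^2}{2\sigma_\varepsilon^2}\right]$$
Parameters $\mu_\varepsilon$ and $\sigma_\varepsilon$ of the proposed function can be determined by the maximum likelihood method or the least squares method. Its fit to the values of the data set obtained by relation (16) is tested with the chi-square test. The statistic can be computed with the relation:
$$\chi^2 = \sum_{j=1}^{k}\frac{(n_j - n_j^*)^2}{n_j^*}$$
where $n_j$ is the number of observations in class $j$, $n_j^*$ is the theoretical number of observations in class $j$ corresponding to the function being tested, and $k$ is the number of classes. If the value of this statistic is lower than the chi-square distribution quantile for which the distribution function is $(1-\alpha)$, the normal function is accepted as adequate for the data sample being analyzed.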
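The standardization and the goodness-of-fit statistic can be sketched as below. The class layout (equal-width classes between the sample minimum and maximum, with the two outer classes extended to infinity for the theoretical counts) and the moment-based parameter estimates are assumptions of this sketch, and the function names are hypothetical. The critical value is not computed here; for 8 classes and two estimated parameters there are 5 degrees of freedom, for which the tabulated 0.95 chi-square quantile is 11.07.

```python
from statistics import NormalDist, fmean, pstdev


def standardize(Z, mu, sigma):
    """Stochastic component: (Z[p][tau] - mu[tau]) / sigma[tau] for each entry."""
    return [[(z - m) / s for z, m, s in zip(row, mu, sigma)] for row in Z]


def chi_square_normal(sample, k=8):
    """Pearson chi-square statistic of a normal fit using k equal-width classes."""
    n = len(sample)
    dist = NormalDist(fmean(sample), pstdev(sample))  # moment estimates (assumed)
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / k
    observed = [0] * k
    for v in sample:
        observed[min(int((v - lo) / width), k - 1)] += 1
    # Outer classes extend to -inf / +inf so the expected counts sum to n.
    cdf = [0.0] + [dist.cdf(lo + i * width) for i in range(1, k)] + [1.0]
    expected = [n * (cdf[i + 1] - cdf[i]) for i in range(k)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, observed, expected


# Synthetic, nearly normal sample of 132 values (illustrative only).
sample = [NormalDist().inv_cdf((i + 0.5) / 132) for i in range(132)]
chi2, observed, expected = chi_square_normal(sample, k=8)
print(round(chi2, 4), observed)
```

In practice the sample would be the flattened output of `standardize`, and the statistic would be compared with the chi-square quantile for the chosen significance level.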

Results
The array of 132 analyzed values has the average 27.11% and the standard deviation 3.29%.
The coefficients of the regression equation, specified by a first-order polynomial in which t denotes the year, were obtained by the least squares method. Transforming time from years into months and eliminating this trend from the original variable gives the variable Y, which has the same order of magnitude as the original variable and a curve of the same shape (figure 1), because the average trend is insignificant.
The regression equation for the trend in the variance was obtained from the values of the standard deviation for each year; here too, t denotes the year and had to be converted into months. By removing this trend from the intermediate variable Y, a new statistical process Z is obtained, which contains only periodic and stochastic components. This variable is represented graphically in figure 2. While the original variable varies from 12 to 45%, figure 2 illustrates that the variable obtained after the elimination of both tendencies, in average and in variance, lies between 2 and 6.5%. Its average is 3.88% and its standard deviation is 1.07%.
The correlogram for variable Z is shown in figure 3. The limits defined by the Anderson test are clearly exceeded, which shows that variable Z has a periodic component. This was to be expected, given the dependence between the hydro energy produced and the variation of river flows, which are also periodic with period $T = 1$ year. For periodicity modeling, the multi-year values of each month for the average and the variance are used to calculate the coefficients A, B, A' and B' of the 6 harmonics considered. The periodic averages and standard deviations calculated with the Fourier series (6 harmonics) and those computed directly from the Z array for every month are very close, as can be seen in Table 2. The averages obtained by approximation with the Fourier series show a maximum deviation of 0.0106 compared with the annual average of the sample, and for the standard deviation the maximum difference is 0.0068. The independent stochastic component is obtained by relation (16). For this variable, a classic statistical analysis can be performed [17].
The frequency histogram for classification in $k = 8$ classes is shown in figure 5. The graph is visibly symmetric, which suggests choosing a normal density function. The normal function can be accepted as appropriate for the variable representing the independent stochastic component.

Conclusions
The coefficient defined as the ratio of hydro energy to the total energy produced in Romania shows, over a period of 11 years, an insignificant tendency in its average. It closely follows the seasonal variations of river flows, of the inflows into the reservoirs of the hydropower developments and, consequently, of the turbine flow in hydropower plants. The period of the analyzed time series is 1 year.
The random variable Z obtained after elimination of the trend components can be used in a future forecasting program by generating values for the stochastic variable according to a normal probability law, which has been found to fit the considered variable very well.
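As a hedged illustration of how such a forecasting program might generate values, the sketch below draws the stochastic component from a standard normal law and recombines it with periodic monthly means and standard deviations. The function name `generate_Z`, the use of a standard normal law, and the sample parameters are assumptions; converting the generated Z back to the original ratio would additionally require restoring the trends removed earlier.

```python
import random


def generate_Z(mu, sigma, n_years, seed=None):
    """Generate synthetic Z[p][tau] = mu[tau] + sigma[tau] * eps, eps ~ N(0, 1)."""
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
            for _ in range(n_years)]


# Hypothetical monthly parameters (illustrative only, not results of the paper).
mu_demo = [4.0] * 12
sigma_demo = [1.0] * 12
Z_synth = generate_Z(mu_demo, sigma_demo, n_years=3, seed=42)
print(len(Z_synth), len(Z_synth[0]))
```

Repeating the generation with different seeds would give an ensemble of possible future series sharing the same periodic structure.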