Possibility of the modelling of electricity production from hydropower

In hydropower plants, benefits depend on the available flow. The paper presents a hybrid model for forecasting the operation of a hydropower plant, including its electricity production. Mathematical modelling was used to show the connection between hydrological conditions observed in the past (available flow) and the energy delivered in the future. Periods in which the available flow is insufficient to start the turbines were forecast with a logistic regression model. For the opposite situation, when the flow is sufficient for the turbines to produce energy, regression models were used (support vector machines, SVM; random forest, RF; k nearest neighbours, k-NN). Results from the hybrid model were compared with those of the chosen data-mining methods applied alone. The ability to forecast the length of the periods in which the hydropower plant will be working could be very useful, as it provides a prognosis of the amount of energy the plant could produce. From the investor's point of view, the economic justification of a project based on future energy production could be a main criterion in deciding whether to build, buy or sell a hydropower plant. Of secondary importance is the possibility of planning inspection and maintenance work. Knowledge of the plant's working periods could serve as a basis for assessing its potential production.


Introduction
With the vast changes that have taken place in the world over the past years, there is now a great demand for renewable energy, and countries are required to generate and utilize a specific amount of renewable energy by the year 2020. In recent years, dynamic growth of expenditure on environmental protection has been observed in European Union (EU) countries [1]. Access to EU funds has allowed some countries to accumulate significant resources for financing investment and ecological activities [2]. By virtue of Directive 2009/28/EC of the European Parliament and of the Council of 23 April 2009 on the promotion of the use of energy from renewable sources [3], investments in the renewable power sector are recognised as important sources of energy and are supported out of concern for the environment. The benefits of such investments are therefore highly valued on both the regional and the global scale, especially in view of climate change and the progressing pollution of the environment. To confirm the priority of energy sources recognised as renewable and environmentally friendly, the directive imposes an obligation to generate and manage energy from renewable sources and defines targets for the member countries; for Poland the target is 15% of the gross final consumption of energy by 2020 [4]. A large amount of investment has been made in recent years, and the advancement of technology has enabled countries to produce renewable energy more cost-effectively [5]. An energy production forecast, which states how much energy a given power station will effectively produce in a given period, can be useful for optimising the marketing of renewable energy [6,7]. Such regional hydroelectric energy projections can then be used to support energy resource planning and to evaluate the climate-related risk for long-term power marketing activities in further investigations [8].
Granting such facilities the status of a renewable source also gives access to preferential funding and to certificates of origin, which greatly improves the economic efficiency of a project [4]. In run-of-the-river power plants, benefits depend strictly on the available river flow. The operating principle of hydroelectric power plants is based on utilizing the potential energy of water, which constitutes the basic component of the plant. In other words, water is crucially important for energy production in hydroelectric power plants [9]. On the other hand, hydropower plants may degrade aquatic habitats, which is reflected in reduced abundance and diversity of fish species and other aquatic organisms [10].
Forecasts of electricity production in hydropower plants most often refer to large pumped-storage power stations, i.e. those with two water reservoirs (an upper and a lower one) and the possibility of water management [11]. The literature review shows that models based on artificial neural networks [12,13,14] and their modifications [15,16] were used for the calculations. These models required the implementation of complex mathematical algorithms.
However, from a practical point of view, information solely about whether the analysed facility is functioning or not is insufficient. A prospective investor or the owner of a hydropower plant is interested in the specific value of electricity production and in the period in which the excess of produced energy could be used rationally. Therefore, for modelling the operation of run-of-the-river power plants it appears advisable to develop so-called hybrid models, which combine a classification model (forecasting the periods in which the analysed power plant does not generate energy) with a regression model (simulating discrete values of electricity production).
The usefulness and universality of the methods used is demonstrated by the fact that they are commonly applied in the water and sanitation sector to develop short-term water usage forecasts for waterworks and sewerage systems and to optimise sewage treatment plants [17].

Example database
The proposed hydropower plant is located at the existing Dillon Dam, Ohio, United States of America. The dam was built on the Licking River near the town of Zanesville. The Dillon Dam near Dillon Falls water-level indicator is located directly below the barrier closing Dillon Lake (Fig. 1). The daily flow observations available in the database for this water-level indicator cover the years since 1939. The Dillon Dam impounds a reservoir, but the proposed hydropower plant will be operated in a strict run-of-the-river mode.
For the analyses described, series of daily flows in the Licking River (United States) from USGS Water Data were used as an example database.
A sample turbine solution has been chosen, with the production costs of the turbine sets reduced by using four identical units. The proposed equipment comprises 4 turbines, each with a nominal discharge of 20 m³/s and an installed capacity of 1,700 kW. The simulation conducted in this article is an example and does not discuss alternative solutions, if any, that could be economically more justified.
For the Dillon Dam near Dillon Falls water-level indicator, daily observations of the flow are available for the period from 01.10.1939 to 30.09.1991 [18]. The Dillon Reservoir was completed in 1961 for flood control; for this reason the initial database for further calculations was limited to this year, so that the developed forecast model could be applied to a power plant operated in the run-of-river system without the possibility of managing water in the reservoir.
The head was assumed, in accordance with the characteristics of the existing earth dam, to be equal to 10.36 m. In reality, the value of the head is strictly dependent on the flow. The run-of-river hydropower plant will not be operated under flood conditions because free passage of flood waters must be maintained.
The hydropower plant starts to produce electricity when the flow in the Licking River reaches 8 m³/s, which activates the first turbine. The maximum discharge of a single turbine is 24 m³/s. When the river flow reaches 32 m³/s, the second turbine is activated, and the third and fourth turbines are activated at flows of 56 m³/s and 80 m³/s, respectively. At flows below 8 m³/s and above 140 m³/s the power plant will not be operated (Fig. 2). Because the analysed power plant is a concept design, the instantaneous power of the turbine P [kW] is calculated from the formula:
$$P = 9.81 \cdot Q \cdot H \cdot \eta \quad (1)$$
where Q is the flow through the turbines [m³/s], H is the head [m] and η is the efficiency. The 24-h production of energy was determined from the formula:
$$E = 24 \cdot P \quad (2)$$
On the basis of equations (1) and (2), the curve E = f(Q, H, η) was determined. Moreover, using relations (1) and (2), the theoretical 1-14-day production of electricity was calculated.
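The operating rule and the energy calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the usual hydropower relation P = 9.81·Q·H·η with E = 24·P, an efficiency of 0.85 (not given in the text; with this value one unit at the nominal 20 m³/s yields roughly the stated 1,700 kW), and it assumes that flow exceeding the combined capacity of the running turbines is not used. All function and constant names are hypothetical.

```python
G_FACTOR = 9.81                        # rho * g / 1000, gives power in kW
HEAD_M = 10.36                         # head assumed in the text [m]
Q_START = 8.0                          # flow that starts the first turbine [m^3/s]
Q_STOP = 140.0                         # above this flow the plant is shut down
UNIT_MAX_Q = 24.0                      # maximum discharge of one turbine [m^3/s]
THRESHOLDS = (8.0, 32.0, 56.0, 80.0)   # flows activating turbines 1..4
ETA = 0.85                             # ASSUMED efficiency, not stated in the text


def active_turbines(q):
    """Number of running turbines for river flow q [m^3/s]."""
    if q < Q_START or q > Q_STOP:
        return 0
    return sum(1 for threshold in THRESHOLDS if q >= threshold)


def power_kw(q, head=HEAD_M, eta=ETA):
    """Instantaneous power P = 9.81 * Q * H * eta [kW]."""
    n = active_turbines(q)
    q_used = min(q, n * UNIT_MAX_Q)    # ASSUMPTION: surplus flow is not used
    return G_FACTOR * q_used * head * eta


def daily_energy_kwh(q):
    """24-h energy production E = 24 * P [kWh]."""
    return 24.0 * power_kw(q)
```

For example, `daily_energy_kwh(5.0)` and `daily_energy_kwh(150.0)` both return 0.0, since both flows lie outside the operating range.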

Long-term electricity forecast
This publication presents a hybrid statistical model for forecasting the operation of a hydropower plant, including its production of electricity. The results of the simulation were compared with forecasts from typical regression models. Because the analysed power plant is a concept design and no long-term measurement series of energy production were available, the dependence between flow intensity and the parameters of the initially selected turbines was used to determine the theoretical quantity of energy. The developed model includes the possibility of forecasting, with logistic regression, the periods (1-14 days) during which the power plant is not functioning. To forecast discrete values of energy, selected data mining methods were used (support vector machines, k nearest neighbours, random forests). The logit model is a simple and clear regression dependence; it is one of the classification models commonly used in economics and medicine and is implemented in many statistical software packages (R, STATISTICA, SPSS, etc.).
The paper uses two methods of calculating the electricity produced by the hydropower plant. In the first case, long-term forecasts of the theoretical energy production E(t = 1-14 days) were made on the basis of the measured flow rates Q(t-i) and the determined theoretical energy production. In the second case, the hybrid model was used to forecast electricity. In the first stage of the analyses, a classification model was planned that identifies the periods in which the planned power plant will not be operated; logistic regression was considered for this purpose. When the logistic regression model indicated that the turbines will produce energy in the consecutive days (t = 1-14), long-term forecasts were made with regression models. The prediction of electricity E(t) > 0 was based on the flow values from the preceding days Q(t-i).
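A minimal sketch of how the input and output data for such forecasts could be arranged is given here. It assumes that the target E(t = 1-14 days) is the energy summed over the forecast horizon, which the text does not state explicitly; the function name and arguments are hypothetical.

```python
def build_lagged_dataset(flows, energy, n_lags=4, horizon=1):
    """Pair lagged flows with the energy produced over the forecast horizon.

    flows  -- daily flows Q, oldest first
    energy -- theoretical daily energy E, aligned with flows
    Returns (X, y) where X[k] = [Q(t-1), ..., Q(t-n_lags)] and
    y[k] = E(t) + ... + E(t+horizon-1)  (ASSUMED definition of the target).
    """
    X, y = [], []
    for t in range(n_lags, len(flows) - horizon + 1):
        X.append([flows[t - i] for i in range(1, n_lags + 1)])
        y.append(sum(energy[t:t + horizon]))
    return X, y
```

With `n_lags=4` the inputs correspond to Q(t-1), Q(t-2), Q(t-3) and Q(t-4), and `horizon` can be varied from 1 to 14 days.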
In the logistic regression model, the probability (p) of the occurrence of an event, here the production of electricity by the turbines or the lack of it, may be expressed in general with the dependence:
$$p = \frac{\exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_j X_j)}{1 + \exp(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_j X_j)} \quad (3)$$
where: β0 is the absolute term, β1, β2, …, βj are the regression coefficients determined with the maximum likelihood method, and Xj are the independent variables, which here are the daily inflows to the power plant in the previous days, i.e. Q(t-i).
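As an illustration, the probability p can be evaluated as follows, assuming the standard logit form p = exp(z) / (1 + exp(z)) with z = β0 + β1·X1 + … + βj·Xj. The function name is hypothetical, and any coefficient values passed to it are placeholders, not the fitted values from the paper.

```python
import math


def operating_probability(beta0, betas, lagged_flows):
    """Probability that the plant produces energy, from the standard logit form.

    beta0        -- absolute term
    betas        -- regression coefficients, one per lagged flow
    lagged_flows -- independent variables Q(t-1), Q(t-2), ...
    """
    z = beta0 + sum(b * q for b, q in zip(betas, lagged_flows))
    # Evaluate the logistic function in a numerically stable way.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```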
The prediction capacity of the logistic regression model (accuracy of forecasts) was assessed on the basis of sensitivity (SENS), specificity (SPEC) and calculation error (Rz²). McFadden's R² and Cox-Snell R² coefficients and the Akaike information criterion were also used for this assessment.
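The measures named above can be computed as in the following sketch. Two assumptions are made: Rz² is interpreted as the share of correctly classified cases (count R²), which the text does not define, and McFadden's R² is taken in its usual form 1 - LL(model)/LL(null). The function names are hypothetical, and both classes are assumed to be present in the data.

```python
import math


def classification_scores(y_true, p_pred, threshold=0.5):
    """Sensitivity, specificity and share of correct classifications."""
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, p_pred):
        predicted = 1 if p > threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return {
        "SENS": tp / (tp + fn),            # correctly predicted operating periods
        "SPEC": tn / (tn + fp),            # correctly predicted idle periods
        "ACC": (tp + tn) / len(y_true),    # ASSUMED meaning of Rz^2
    }


def log_likelihood(y_true, probs, eps=1e-12):
    """Bernoulli log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def mcfadden_r2(y_true, p_pred):
    """McFadden's pseudo R^2 relative to an intercept-only model."""
    base_rate = sum(y_true) / len(y_true)
    ll_model = log_likelihood(y_true, p_pred)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1.0 - ll_model / ll_null
```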
The result of the logistic regression model is the probability of operation of the hydropower plant (p). Thus, when the value of p determined with formula (3) for the adopted independent variables is smaller than 0.5, the production of electricity is 0. When the value of p determined with formula (3) is larger than 0.5, E > 0 and the regression model is then used.
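The two-stage decision rule can be sketched as follows. The classifier and the regressor are passed in as arbitrary callables, so the sketch does not depend on any particular model; the function name is hypothetical. The text does not say how p exactly equal to 0.5 is handled, so treating it as "no production" is an assumption.

```python
def hybrid_forecast(lagged_flows, classifier, regressor, threshold=0.5):
    """Hybrid forecast: classify first, regress only if the plant is running.

    classifier -- callable returning the probability p of operation
    regressor  -- callable returning the forecast energy E
    """
    p = classifier(lagged_flows)
    if p > threshold:
        return regressor(lagged_flows)   # plant expected to run, E > 0
    return 0.0                           # plant expected to be idle (p <= 0.5 ASSUMED idle)
```

For instance, `hybrid_forecast([12.0], lambda x: 0.2, lambda x: 99.0)` returns 0.0 because the classifier predicts an idle period, so the regression model is never called.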
To forecast the production of energy, three data mining methods were used: support vector machines, k nearest neighbours and random forests.
To train the models appropriately and then to assess their performance properly, the data were partitioned into a training set (75%) and a validation and testing set (25%). Before the mathematical models were built, the input and output data were normalized by means of normal form transformation. Support Vector Machines (SVM) cover a group of methods developed by Vapnik [19], at first exclusively for classification purposes and later extended to regression problems (SVR). The dependence between the model output and the input variables can be non-linear; therefore the method applies a non-linear transformation of the N-dimensional input space into a K-dimensional feature space of much larger dimension. In this study, support vector regression with a radial kernel function was applied to predict electric energy.
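The data preparation step can be illustrated with the standard library alone. Two assumptions are made that the text does not confirm: the split is random rather than chronological, and the normalization is a z-score standardization fitted on the training data. The function names are hypothetical.

```python
import random
import statistics


def train_test_split(rows, train_share=0.75, seed=0):
    """Randomly split rows into a training part and a validation/testing part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)          # ASSUMPTION: random split
    cut = int(round(train_share * len(rows)))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


def fit_standardizer(values):
    """Mean and standard deviation estimated on the training data only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, (std if std > 0 else 1.0)        # avoid division by zero


def standardize(values, mean, std):
    """z-score transformation (ASSUMED form of the normalization)."""
    return [(v - mean) / std for v in values]
```

Fitting the mean and standard deviation on the training set and reusing them for the test set keeps information from the test data out of the training process.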
The k-nearest neighbour method (k-NN) is one of the simplest non-parametric methods and, like those already mentioned, can be applied to classification and regression problems [20,21]. In this case, the prediction of the dependent variable is expressed by the formula:
$$\hat{y}_j = \frac{1}{K} \sum_{i=1}^{N} y_i \, J(x_i, x_j) \quad (4)$$
where: xi is one of the K nearest neighbours of xj when the distance d(xi, xj) belongs to the K smallest distances between xj and the observations from the set ZN = {(x1, y1), …, (xN, yN)} ⊂ R^(m+1); xi = (x1,i, …, xm,i) is the i-th vector of independent variables with m coordinates, yi is the i-th value of the dependent variable, N is the number of observations, and J(xi, xj) is a function of the form:
$$J(x_i, x_j) = \begin{cases} 1, & \text{if } x_i \text{ is one of the } K \text{ nearest neighbours of } x_j \\ 0, & \text{otherwise} \end{cases} \quad (5)$$
Euclidean (employed here) and Mahalanobis distances are most frequently used in these computations. The number of neighbours (K) was established by trial and error, seeking the value of K for which the model showed the best predictive abilities.
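A minimal sketch of k-NN regression with the Euclidean distance follows. It assumes the usual unweighted form, in which the prediction is the arithmetic mean of the dependent variable over the K nearest training observations; the function name is hypothetical.

```python
import math


def knn_predict(x_query, X_train, y_train, k=3):
    """Predict y for x_query as the mean y of its k nearest training points."""
    # Euclidean distance from the query to every training observation.
    distances = sorted(
        (math.dist(x_query, x), y) for x, y in zip(X_train, y_train)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)
```

The trial-and-error choice of K mentioned in the text would amount to calling this function with several values of `k` and keeping the one with the smallest error on the validation data.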
The random forests algorithm was proposed by [20] and is a development of the bootstrap method. In the first stage, the n-element training set is sampled k times with replacement, and regression trees are then built on the resulting sets. Compared with the classic algorithm, the construction of the trees is modified so that the best split in each node is chosen not from all attributes but from a random subset of them (the explanatory variables). In this way, k regression trees are obtained that make up the forest, and the forecast of the entire model is the arithmetic mean of the forecasts of the individual trees.
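The idea can be illustrated with a deliberately simplified forest in which every tree has a single split (depth 1). This is a sketch of the principle (bootstrap sampling, a random subset of attributes, averaging of tree forecasts), not the full algorithm, which grows deeper trees and draws a new random subset at every node. All names are hypothetical.

```python
import random


def fit_stump(X, y, feature_ids):
    """Best single split (least squared error) over the given attributes."""
    best = None
    for f in feature_ids:
        # Candidate thresholds: every observed value except the largest,
        # so that both sides of the split are non-empty.
        for threshold in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[f] > threshold]
            mean_l = sum(left) / len(left)
            mean_r = sum(right) / len(right)
            sse = sum((v - mean_l) ** 2 for v in left)
            sse += sum((v - mean_r) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, mean_l, mean_r)
    if best is None:                     # all chosen attributes are constant
        mean = sum(y) / len(y)
        return (feature_ids[0], float("inf"), mean, mean)
    return best[1:]


def fit_forest(X, y, n_trees=50, n_features=1, seed=0):
    """Fit n_trees one-split trees on bootstrap samples of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]        # with repetitions
        features = rng.sample(range(len(X[0])), n_features)  # random attributes
        forest.append(
            fit_stump([X[i] for i in sample], [y[i] for i in sample], features)
        )
    return forest


def predict_forest(forest, x):
    """Forecast of the forest: arithmetic mean of the single-tree forecasts."""
    forecasts = [ml if x[f] <= t else mr for f, t, ml, mr in forest]
    return sum(forecasts) / len(forecasts)
```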
To assess the prediction capacity of the developed regression models, the following measures were used: mean absolute error (MAE), mean absolute percentage error (MAPE) and correlation coefficient (R).
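These three measures can be computed as in the sketch below, which assumes their usual definitions and takes R to be Pearson's correlation coefficient. Observations equal to zero are skipped in MAPE because the percentage error is undefined for them; this handling is an assumption, not something stated in the text. The function names are hypothetical.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error [%], skipping zero observations (ASSUMED)."""
    terms = [abs((o - p) / o) for o, p in zip(observed, predicted) if o != 0]
    return 100.0 * sum(terms) / len(terms)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and forecasts."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)
```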

Results
On the basis of the historical measurement series of flows, one may conclude that the flow rate in the analysed water-level section varied from 1.25 m³/s to 1019 m³/s, with an average value of 21.66 m³/s. The analysis of the data showed that for a considerable part of the year, i.e. 104 days, the analysed hydropower plant will not produce energy, either because the flows in the river are too low (below the so-called turbine start) or because of flood flows that force lowering of the impoundment and shutdown of the power plant.
With the historical data and the knowledge that the power plant operates in the flow range Q ∈ ⟨8; 140⟩ m³/s, the input data Q(t-i), where i = 1, 2, 3, 4, 5, …, n and n = 14, and the output data for the logit model were identified. When the determined value of energy production for the period of 1-7 days was larger than 0, the output value of the logistic regression model was set to 1 (power plant running); otherwise it was set to 0. The calculations showed (Table 2) that the logit models developed on the basis of Q(t-1), Q(t-2), Q(t-3) and Q(t-4) for E(t = 1-14 days) have satisfactory predictive capacity, with parameters that are statistically significant at the adopted significance level p = 0.05. The values of SENS, SPEC and Rz² in these models vary in the ranges 92.18-95.15%, 82.95-92.47% and 86.24-92.14%, respectively, which indicates a satisfactory match between the results of the calculations and the measurements.
In the next stage of the calculations, based on the discrete values of 1- to 14-day electricity production and the input data Q(t-1), Q(t-2), Q(t-3), Q(t-4), mathematical models were developed for the prediction of E with support vector machines, k nearest neighbours and random forests; the results are presented in Table 3. Table 4 presents the results of the calculations of the theoretical energy production of the hydropower plant obtained with the traditional support vector, random forests and k nearest neighbours methods. These calculations also included the values in the time series for which energy production during the analysed period was zero, i.e. the power plant did not produce electricity. The analyses showed that the error in the forecast of electricity generated by the power plant decreases as the forecast period is extended. Tables 3 and 6 show that the lowest forecast errors were obtained with the support vector method, in both the classic and the hybrid models. In turn, the largest forecast errors were obtained with the models in which the k nearest neighbours method was used. It follows from the analyses (Tables 3 and 4) that smaller errors in the forecast of electricity were obtained with the hybrid model than with the classic models. These differences result from the fact that in the developed model the days or periods in which the power plant is not functioning are forecast with satisfactory accuracy by logistic regression (Table 2). In the regression models in which only the SVM, RF or k-NN method was used, the errors in the forecast of energy production during the periods when no energy is generated are considerably larger than those obtained with the hybrid model.

Conclusions
It follows from the analyses that the hybrid model presented in this publication may be useful in modelling the production of electricity in a hydropower plant operated on the run-of-river principle, i.e. in a non-continuous way resulting from periods of low water and from flood flows. The errors in the forecast of electricity produced by the hydropower plant obtained with the support vector, random forests and k nearest neighbours methods alone are higher than those obtained with the proposed hybrid model. Among the analysed methods (classic and the proposed hybrid model), the lowest forecast errors were obtained with the support vector method. On the basis of the calculations, logistic regression was found to be applicable to forecasting the periods during which a power plant is not operated. The ability to forecast the length of idle periods in the operation of a hydropower plant is very important because it makes it possible to plan inspection and maintenance work on the individual components of the plant, which the models developed so far include only to a limited degree. The small hydropower plant analysed in this paper was a concept design, and the data and their analysis were intended to show that the presented method can be applied to forecasting electricity production and the length of the periods in which the power plant is not functioning. Verification of the developed methodology on data from a functioning facility is therefore recommended.