Forecasting the Nysa Kłodzka flow rate in order to predict the available flow for a run-off-river (ROR) power plant

Hydroelectricity is generally perceived as a stable and predictable power source. However, the energy output of a ROR power plant without a reservoir is driven mainly by the changing flow rate. This study applies artificial neural networks to create flow rate forecasts with a one-hour lead time. Forecasting models were built for the Nysa Kłodzka catchment, which has significant potential for the development of new hydropower plants and is also prone to frequent floods. The best of the obtained models gives satisfactory results in terms of both the root mean square error (0.6379 m³/s) and the Nash-Sutcliffe performance indicator (0.9978). The obtained results were compared with those of the currently used forecasting models and proved to be superior.


Introduction
Development of the so-called renewable energy sources (RES) in Poland is driven mainly by European Union regulations and directives. To a much lesser degree, the increasing installed capacity of RES can be attributed to their superior economic performance and to social awareness of environmental protection. However, in future it is economics that may play the crucial role, owing to dwindling non-renewable resources and/or their rising prices. A reduction in RES costs and an increase in their efficiency are also foreseeable, which will undoubtedly lead to their greater role in national energy markets. Unfortunately, the three most important representatives of RES, namely solar, wind and hydropower, in some circumstances (e.g. hydroelectricity dependent on the current flow rate, photovoltaics and wind energy without energy storage facilities) introduce undesirable variability into the operation of the energy system (some authors denote them as VRE, Variable Renewable Energy). Their non-dispatchable nature results from their dependence on naturally varying phenomena such as wind speed, irradiation and flow rate. Therefore there is a need to cope with fluctuations not only on the demand side of the energy market but also on its supply side. Hence the increasing research interest in areas such as: energy yield forecasting for solar [1,2] and wind [3] power sources; assessment of the temporal and spatial complementarity of certain RES [4,5]; energy storage [6]; hybrid RES power sources [7]; and RES integration into national power systems [8].
The Directive 2009/28/EC of the European Parliament on the promotion of the use of energy from renewable sources required Poland to achieve by 2020 a 15% share of renewables in the total of primary generated energy. The concerns of the national authorities indicate that the goals of the Directive are unlikely to be fulfilled. These fears were confirmed by the computed share of renewable energy (RE) in the national electricity consumption over the last several years. In 2009 the share of RE was 5.7%, while in 2011 it rose to 8%, which is still not enough [9,10].
According to the Polish Energy Regulatory Office (http://www.ure.gov.pl/), as of 30.06.2016 the installed capacity of RES amounted to almost 8.25 GW (excluding biomass co-firing in commercial power plants). This capacity was, however, unevenly distributed among wind power (68.7%), biomass power plants (15.4%), hydroelectricity (12%), biogas power plants (2.8%) and photovoltaics (slightly above 1%). As can be seen, hydroelectricity amounts to almost 1 GW of the installed capacity of Polish RES. The biggest contributions to this capacity are made by pumped-storage hydro plants located on rivers (382.7 MW) and run-off-river power plants with a capacity greater than 10 MW (311 MW). However, Polish hydroelectricity is dominated by relatively small run-off-river units. Of the total of 752 hydropower plants, 98.7% have a nominal power smaller than 10 MW, of which 76.7% are not greater than 300 kW. According to the RESTOR HYDRO project (http://www.restor-hydro.eu/pl/), there is an opportunity to develop over 8 500 pico, micro and even bigger hydro installations in Poland. In order to minimize the environmental impact and financial investment, those hydro generators may be built on existing weirs, dams and other lateral structures and become run-off-river power plants. Therefore, in future not only may the role of small-scale hydroelectricity increase in Poland, but in some cases it will play a crucial role in smoothing and balancing the inherently variable energy generation coming from photovoltaics and wind turbines [11]. Hence it will be necessary for a power grid operator to possess knowledge of future VRE energy generation capabilities.
The situation in the Polish renewable energy market described above and the current problems in the area of RES lead to the following goal of this study: "Assessment of the possibility of applying artificial neural networks (ANN) to create a flow rate forecast with a one-hour lead time for the Nysa Kłodzka river, and comparison of the obtained results with those coming from the current models used by HydroProg" [12].

Artificial neural networks
The concept of artificial neural networks has been inspired by the biological neural networks that operate in human and animal brains; ANNs are used to estimate or approximate functions that depend on usually large sets of often unknown input variables. A typical ANN consists of three layers, namely input, hidden and output, each of which performs its own specific task. Additionally, the process of ANN model creation can be divided into three phases, namely teaching, validation and testing, each of which is performed on a separate, previously designated subset of the data. Thanks to their structure and applicability to non-linear systems, ANNs have been widely applied in many areas of research, such as medicine [13], energy demand [14], waterworks operation/water demand [15] and drought forecasting [16]. A comprehensive description of ANNs and their performance as forecasting tools can be found, for example, in [17].
Due to the rapid development of the computing capabilities of modern computers, the process of ANN creation has been largely automated, and the analysis of various ANN architectures tends not to be a big problem even when the data set is very large. This study applied the Statistica software developed by StatSoft to create multilayer perceptron (MLP) ANNs. Statistica has the Broyden-Fletcher-Goldfarb-Shanno (BFGS) teaching algorithm built in, along with a tool dividing data into teaching, validation and testing subsets, which correspond to 70%, 15% and 15% of the initial set, respectively. However, the area which has not been automated so far is the selection of appropriate explanatory variables, a problem usually encountered when building forecasting models. According to [18], the following approaches to selecting the most appropriate subsets of explanatory variables may be implemented in water resources modelling:
- selection based on expertise and prior knowledge of the investigated system,
- correlation analysis between dependent and independent variables,
- heuristic approaches based on creating various ANN models and adding or removing subsequent explanatory variables,
- extracting data from a newly created ANN and using it to perform a sensitivity analysis,
- a combination of tools used in the various methods.
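The 70%/15%/15% division into teaching, validation and testing subsets mentioned above can be illustrated with a minimal Python sketch. It is not the algorithm built into Statistica, whose internals are not described in this paper; the uniform random shuffling, the function name `split_cases` and the fixed seed are assumptions made purely for illustration.

```python
import random


def split_cases(n_cases, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly assign case indices to teaching, validation and testing subsets.

    Assumption: cases are shuffled uniformly at random; the procedure
    actually used by Statistica may differ.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_teach = round(fractions[0] * n_cases)
    n_valid = round(fractions[1] * n_cases)
    teaching = indices[:n_teach]
    validation = indices[n_teach:n_teach + n_valid]
    testing = indices[n_teach + n_valid:]  # the remainder goes to testing
    return teaching, validation, testing


# Hypothetical set of 1000 cases, used only to show the call
teaching, validation, testing = split_cases(1000)
print(len(teaching), len(validation), len(testing))
```

Every case lands in exactly one subset, so the testing subset stays unseen while a network is being taught and validated.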
On the basis of previous experience with the creation of ANN models [14,15,19], it has been assumed that in this study the subset of explanatory variables will be selected based on correlation analysis combined with expertise and reports from previous studies on the properties of the Nysa Kłodzka catchment. The obtained results were then compared with those available from the HydroProg hydrological forecasting system (free access: http://www.klodzko.hydroprog.uni.wroc.pl/), which is a well-known and established source of forecasts for the Nysa Kłodzka catchment [12,20,21]. Because the forecasting performance of the models available there is assessed based on two criteria, namely the root mean squared error (RMSE) and the Nash-Sutcliffe [25] model efficiency coefficient (E), these criteria were also applied in this study. They were calculated based on formulas (1) and (2). In the case of the RMSE criterion, the smaller its value, the better the forecasting model; it is therefore a useful tool for comparing various models tested on the same data sets. The values of the second criterion can range from -∞ to 1, where E = 1 indicates a perfect match between modelled and observed values, whereas E = 0 indicates that the model is only as accurate as using the mean of the set as a forecast. Nash-Sutcliffe values of less than 0 show that the created model can be outperformed simply by using the mean of the set to predict the behaviour of the system.
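$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^{2}} \qquad (1)$$
where: $n$ - number of observations in the testing subset, $e_t$ - forecast error at time $t$.
$$E = 1 - \frac{\sum_{t=1}^{n}\left(Q_o^{t} - Q_m^{t}\right)^{2}}{\sum_{t=1}^{n}\left(Q_o^{t} - \overline{Q_o}\right)^{2}} \qquad (2)$$
where: $\overline{Q_o}$ - mean value of observed discharges, $Q_m^{t}$ - modelled discharge at time $t$, $Q_o^{t}$ - observed discharge at time $t$.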
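As a rough illustration of the two criteria, the sketch below computes RMSE and the Nash-Sutcliffe coefficient in plain Python using their standard definitions. The function names and the sample discharges are made up for the example and do not come from the study's data.

```python
import math


def rmse(observed, modelled):
    """Root mean square error between observed and modelled discharges."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(m - o) ** 2 for o, m in zip(observed, modelled)]
    return math.sqrt(sum(squared) / len(squared))


def nash_sutcliffe(observed, modelled):
    """Nash-Sutcliffe efficiency: 1 minus residual sum of squares divided by
    the sum of squared deviations of the observations from their mean."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


# Made-up discharges in m^3/s, for illustration only
observed = [10.0, 12.0, 15.0, 11.0]
mean_only = [sum(observed) / len(observed)] * len(observed)

print(rmse(observed, observed), nash_sutcliffe(observed, observed))
print(nash_sutcliffe(observed, mean_only))
```

A forecast identical to the observations gives E = 1, while always forecasting the mean of the observations gives E = 0, which matches the interpretation of E given above.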

Case study -Nysa Kłodzka catchment
The Nysa Kłodzka River is a left tributary of the Odra River, into which it flows at km 181.3 in the Opolskie voivodeship in south-western Poland. The river's total length is 181.7 km, while the catchment area reaches 4565.7 km². The Nysa Kłodzka is a mountain river located in the Kłodzko Valley. It has an average gradient of 9.05‰ and starts in the Lower Silesia voivodeship, through which it flows for nearly 50% of its total length (89.4 km). The highest elevation within the basin is 1425 m a.s.l. The river supplies several reservoirs important for flood protection purposes: the Topola, Kozielno, Otmuchów and Nysa.
Hydrometeorological data were derived from the national hydrological and meteorological measurement and observation network maintained by the Institute of Meteorology and Water Management - National Research Institute (IMWM-NRI). The monitoring network in the Nysa Kłodzka catchment comprises 12 water level gauges, including the hydrological station in Bardo, and about 12 meteorological stations, including the synoptic station in Kłodzko (Fig. 1). The water level is measured at all of the mentioned stations, and the flow rate is computed from relationships based on flow measurements made by IMWM-NRI. Data from all stations in the Nysa Kłodzka river basin are gathered automatically. The data series used in this study come from the standard measurement system, which can be viewed at http://www.pogodynka.pl. The basic time step of water level measurement is 10 minutes.
Because of the very frequent flooding in the Kłodzko Valley, the local authorities decided to build an early warning system based on 22 automatic hydrological gauging stations, where the water level is measured every 15 minutes, and 18 automatic meteorological stations (with various sets of measured parameters), which record observations every 15 minutes. This hydrometeorological system is owned by the Crisis Management Center in Kłodzko and is called LSOP (Lokalny System Osłony Przeciwpowodziowej in Polish, free access at http://lsop.powiat.klodzko.pl). Besides the measuring intervals, the main difference between LSOP and IMWM-NRI monitoring is that, in the case of hydrological data, LSOP provides only the water level.
In recent years the HydroProg forecasting system has been run on the LSOP hydrological data [12,20,21].
Energy and Fuels

Forecasting models -results
The ANN forecasting models have been created to predict the flow rate in Bardo (located in the upper part of the scheme depicted in Fig. 1). Following the approach described in the second section, eight sites with water gauges were selected, and the data sets collected there were used as explanatory variables. Additionally, historical hourly flow rates measured in Bardo were also considered as input variables. Expert knowledge indicates that, in the case of the Nysa Kłodzka catchment, lag times ranging from 1 to 12 hours are sufficient (a lag is denoted in this paper as, e.g., t-1). It has been assumed that the forecast will be made with a one-hour lead time. Table 1 contains the calculated correlation coefficients, all of which were significant (p-value < 0.05); therefore all those subsets were used as explanatory variables. According to the literature [22,23], using meteorological parameters such as precipitation, temperature or humidity in hydrological forecasting models based on the hydrodynamic transformation of flow between neighbouring gauges does not lead to an improvement in accuracy.
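The idea of pairing lagged gauge readings with the target flow rate and ranking them by correlation can be sketched as follows. This is only an illustration in plain Python: the two short series are invented, only lags of 1 to 4 hours are scanned to keep the example small (the study considers 1 to 12 hours), and the significance test is omitted.

```python
import math


def lagged_pairs(series, target, lag):
    """Pair series[t - lag] with target[t] for every t where both exist."""
    return [(series[t - lag], target[t]) for t in range(lag, len(target))]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Invented hourly flow rates (m^3/s); the downstream series repeats the
# upstream one with a two-hour delay, so lag 2 should stand out.
upstream = [5.0, 6.0, 9.0, 14.0, 11.0, 8.0, 7.0, 6.0, 5.5, 5.0]
downstream = [5.0, 5.0] + upstream[:-2]

correlations = {
    lag: pearson(lagged_pairs(upstream, downstream, lag))
    for lag in range(1, 5)
}
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 4))
```

In the study itself, each gauge and each lag from t-1 to t-12 would give one such coefficient with respect to the flow rate in Bardo at time t, which is what Table 1 reports.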
The so-called "Automatic network search" functionality built into Statistica was used to construct the ANN models. Although by default it suggests investigating only 20 ANN structures, it was calculated that exactly 500 different MLP ANN architectures should be tested. This number results from the following assumptions:
- the minimal and maximal number of neurons in the so-called hidden layer is a problematic and often controversial issue. Therefore the approach presented in [24] has been applied. It states that the minimal number of neurons in the hidden layer should be equal to the square root of the product of the numbers of neurons in the input and output layers, whereas the maximal number should not be greater than three times the minimal number of neurons in the hidden layer plus one,
- five activation functions (namely: sine, linear, logistic, hyperbolic tangent and exponential) have been considered in the hidden and output layers.
Therefore, assuming 20 different numbers of neurons in the hidden layer and 5 different activation functions in each of the hidden and output layers, it is possible to create as many as 20 × 5 × 5 = 500 unique ANN architectures. The created models have been assessed based on the previously mentioned criteria (Eq. 1 and 2), but only with reference to the testing subset, which does not take part in the process of ANN creation. Table 2 summarizes the most important characteristics of the five selected ANNs, which outperformed the remaining ones in terms of forecasting accuracy measured as the correlation between the observed and modelled flow rate values within the testing subset. Those five models were investigated further.
The original dataset was a 26294-hour-long time series. However, some flow rate values were missing due to equipment failures or malfunctions. Therefore this set has been reduced to a 15856-hour-long subset in order to make sure that each forecast is created based on a complete set of explanatory variables. The obtained complete set was further divided into three smaller ones, namely:
- teaching subset: 11000 cases,
- validation subset: 2378 cases,
- testing subset: 2378 cases.
Cases were assigned to the individual subsets by an algorithm built into the Statistica software. However, in order to avoid type III errors [26], a k-fold cross-validation with k = 10 was conducted. This means that the procedure of dividing the original set into three subsets was carried out ten times and the obtained values of the performance indicators (RMSE and E) were averaged.
Table 3 summarises the RMSE and E values for each of the five ANN models. The observed values are quite similar and can be told apart at two decimal places. In terms of the Nash-Sutcliffe criterion all models exhibited equal performance, and the values of E suggest that the created forecasting models explain the behaviour of the modelled phenomenon almost completely. Forecasts generated by models with a one-hour lead time, such as those used by HydroProg, exhibited E values equal to 0.93, which means that all five models proposed in this paper proved to be superior. However, it is clear that model ANN2, which had only nine neurons in the hidden layer and used the hyperbolic tangent and linear functions as activation functions in the hidden and output layers respectively, outperformed the remaining ones in terms of the RMSE criterion. Although the greatest observed absolute error in the case of the ANN2 model was 12.48 m³/s, the mean of all absolute errors was 0.2836 m³/s. The majority (95%) of absolute errors were smaller than 1 m³/s and almost 86% of them were not greater than 0.5 m³/s. Figure 3 depicts the flow rates observed and forecasted by the ANN2 model for 30 subsequent observations.
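The count of 500 candidate architectures described earlier in this section can be reproduced with a short Python sketch. The paper does not state the number of input neurons, so the value of 81 used below is a hypothetical choice, picked only because the rule from [24] then yields 20 hidden-layer sizes (9 to 28); rounding the square root to the nearest integer is likewise an assumption, and the function names are illustrative.

```python
import math
from itertools import product

# The five activation functions listed in the text
ACTIVATIONS = ("sine", "linear", "logistic", "tanh", "exponential")


def hidden_layer_sizes(n_inputs, n_outputs):
    """Candidate hidden-layer sizes following the rule attributed to [24]:
    minimum = sqrt(n_inputs * n_outputs), maximum = 3 * minimum + 1.
    Assumption: the square root is rounded to the nearest integer.
    """
    smallest = round(math.sqrt(n_inputs * n_outputs))
    largest = 3 * smallest + 1
    return range(smallest, largest + 1)


def enumerate_architectures(n_inputs, n_outputs=1):
    """All (hidden size, hidden activation, output activation) combinations."""
    sizes = hidden_layer_sizes(n_inputs, n_outputs)
    return list(product(sizes, ACTIVATIONS, ACTIVATIONS))


# Hypothetical input count (not given in the paper), one output neuron
architectures = enumerate_architectures(n_inputs=81)
print(len(hidden_layer_sizes(81, 1)), len(architectures))
```

Under these assumptions the grid contains an entry with nine hidden neurons, a hyperbolic tangent hidden activation and a linear output activation, which corresponds to the configuration reported for ANN2.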

Conclusions
The results and observations coming from this study may be applied in future analyses of the operation of run-off-river power plants with or without a reservoir. Information about the flow rate that will be available an hour ahead may give the operator/owner of a ROR hydropower plant an opportunity not only to perform better (financially) on the energy market but also to adjust the energy yield to current market needs. It is important to note that flow rate forecasts are currently of great importance for civilian safety services, but in future variable renewable energy may start to play such a large role in the energy system that it will be necessary to put into service all possible measures that mitigate its inherent stochastic nature.
Several interesting directions for future studies emerge, which should focus on:
- investigating the impact of smaller subsets of explanatory variables and longer lag times on the forecast accuracy;
- building hybrid forecasting models based on other machine learning tools and data pre-processing techniques such as the wavelet transform;
- scrutinizing the impact of forecast accuracy on the real energy output of ROR hydropower plants;
- creating a simulation and optimization model (incorporating data from flow rate forecasts) of a ROR power plant with a reservoir operating on the energy market.

Fig. 1. The Nysa Kłodzka catchment along with the location of the weather stations and water level gauges from which data were obtained for the purpose of this study.
Over the considered time period (2010-2013) the hourly values of the Nysa Kłodzka flow rate exhibited the following statistical properties: mean 20.87 m³/s; standard deviation 19.75 m³/s; coefficient of variation 94.65%; maximal value 240.78 m³/s; minimal value 5.22 m³/s. An example of flow rate variation over a twelve-month period is given in Fig. 2; note the rapidly occurring flood peaks and how much time it takes for them to subside. Random swells of water may not only lead to loss of human life and destruction of infrastructure but also affect the operation of hydroelectric plants.
The main tributaries of the river are: the Bystrzyca Kłodzka, Biała Lądecka, Bystrzyca Dusznicka, Ścinawka and Buszówka. The Nysa Kłodzka catchment has been intensively developed with hydropower facilities in recent years. There are 24 hydroelectric power plants (of either the run-of-the-river (ROR) or the conventional and dam (CON) type) with capacities from 0.002 to 5 MW (detailed table in [9]). The Otmuchów and Nysa-Głębinów power plants have the highest capacities (4.8 MW and 3.28 MW, respectively), while the lowest capacities (several kW each) belong to Nowa Morawa on the Biała Lądecka, Kłodzko on the Bystrzyca Dusznicka and Rynarcice on the Ścinawka Niemodlińska. Most of the operating power plants belong to local or national authorities, but there are also commercial companies such as Hydroelektrownie Dolnego Slask, TAURON Ekoenergia and Hydroenergia, as well as private investors. The total capacity of all existing hydropower plants on the Nysa Kłodzka River amounts to nearly 17.6 MW.

Table 1. Matrix of correlation coefficients between the set of explanatory variables and the dependent variable (flow rate in Bardo at time t).

Table 2. Architecture of the five ANN models selected for further investigation (BFGS stands for the Broyden-Fletcher-Goldfarb-Shanno teaching algorithm).

Table 3. RMSE and Nash-Sutcliffe forecasting metric values for the five ANN models.

Fig. 3. Relation between modelled (ANN2) and observed flow rate for 30 subsequent observations for the Nysa Kłodzka.