Attempts to establish a regional probabilistic model of intense rainfall for the Upper and Middle Oder River basin

The main aim of this study was to examine whether a regional model of heavy rainfall can be developed using a probabilistic approach. Following the methods and conclusions of the research team's earlier scientific and technical work in the Upper and Middle Oder basin, carried out on rainfall data from 1961 to 2010, it was found that the generalized exponential distribution enables a satisfactory probabilistic description of heavy rainfall, and that models built on it yield results in an accessible and time-efficient way. The regional model presented in the article describes the empirical heavy rainfall at the eight analyzed meteorological stations sufficiently well, especially in the range of the statistically most frequent rainfall amounts, i.e. for a probability of exceedance p from 0.3 to 1.0.


Introduction
The safe design of urban drainage systems requires reliable information on rainfall conditions in a specific region, derived statistically from long-term meteorological measurements and observations [1][2][3][4]. The determination of probable rainfall is also necessary to identify areas exposed to floods, regardless of their nature (flash floods or urban floods) [5][6][7][8][9]. In Poland, scientific work on the probabilistic modelling of intense rainfall is still scarce; this article is an attempt to reduce that deficit. The paper briefly describes a direct continuation of the research undertaken by the authors in [8,9]. For the description of the study area and the methodological details we therefore refer to those previously published works and indicate only the necessary differences or modifications here. The starting point for this work was the conclusions of previous studies [1,2,8-11]. This paper examines whether a regional rainfall model can be developed for data series with different precipitation characteristics at locations with different elevations. To this end, the quantiles of the probability distribution best fitted to the random variables representing the eight individual stations are compared with the quantiles of an averaged time series variable that theoretically represents the largest precipitation heights for the whole Upper and Middle Oder basin. The purpose of this simplification is to answer the question of whether generalized rainfall data, the quality of which is often the subject of scientific discussion, can give satisfactory results from the viewpoint of engineering applications when used as the input to the rainfall model.

Study area and methods
Following [9], the basic study material consisted of one-minute rainfall series from meteorological stations located in the Upper and Middle (also called Central) Oder basin, covering the years 1961 to 2010 and provided by the Polish Institute of Meteorology and Water Management - National Research Institute (IMWM-NRI). Figure 1 (adopted from [9]) shows the eight stations located in the Oder basin, with elevation given in meters a.s.l. The Upper and Middle Oder was delimited by the water gauge profiles in Chałupki, located directly on the Polish-Czech border (downstream of Bohumín), and Słubice, located near the Polish-German border. Under these research assumptions, only the Polish part of the Upper Oder basin was taken into account. As in [9], some candidate meteorological stations, such as Cieszyn, Leszno, Grabik and Słubice (not shown in Fig. 1), had to be excluded from further analyses because the homogeneity of their data series was violated over the analyzed long-term period.

Fig. 1. The eight analyzed meteorological stations in the Oder basin (adopted from [9], based on the PUWG-92 coordinate system).

Probabilistic modelling of intense rainfall followed the methodology developed and continually improved by the research team of the Wrocław University of Science and Technology (Faculty of Environmental Engineering) and the IMWM-NRI, specifically for rainfall data from Wrocław (2009-2015) [1,2,10] and then from Legnica [11]. The simplified procedure for developing a probabilistic model comprises the following tasks:
1. separation of the maximum interval precipitation heights for rainfall durations from 5 minutes to 3 days, based on the available precipitation events, for the 16 assumed durations (5, 10, 15, 30, 45, 60, 90, 120, 180, 360, 720, 1080, 1440, 2160, 2880 and 4320 minutes); this method is sometimes called the Partial Duration Series (PDS) approach,
2. selection of representative random samples of the largest precipitation values for the analyzed multi-year period, in this case 50 years of measurements,
   Important: For further analyses, the highest rainfall values can be selected directly by taking only the 50 largest values from the non-increasing sequence of all peaks found. In such a case, however, the independence of rainfall data above the accepted cut-off threshold (POT) must be maintained. Otherwise, the annual maximum precipitation (AMP) method should be used.
3. selection of the probability distribution that best fits the measurement data (through numerical estimation of the distribution parameters followed by an assessment of goodness-of-fit criteria),
4. determination of the dependence of the selected probability distribution's parameters using the rainfall duration and the probability of exceedance.

Following the methods and results given in [1,2,8-11] and the theoretical basis in [12], four probability distributions were fitted in this research to the data series from the eight stations: the Gumbel, Weibull, Log-Normal and Generalized Exponential Distribution (GED). The goodness of fit was assessed with the Bayesian information criterion (BIC) and the relative root mean square error (rRMSE). The results are presented in figures comparing the regional model with the maximum empirical data series of the eight stations for selected durations (the space limits of this paper did not allow all 16 durations to be presented for the eight stations; moreover, a comparison of all durations for all analyzed stations would be unclear). All numerical calculations were performed using R statistical packages ('FAdistr', 'fExtremes', 'MASS' and 'bbmle') called from our own scripts.
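Tasks 1 and 2 of the procedure can be illustrated with a minimal Python sketch. It assumes that independent rainfall events have already been separated from the one-minute series (the separation rule is not given here), so that each event contributes one peak per duration. The function names and the sample data are hypothetical and do not reproduce the authors' R scripts.

```python
# The 16 rainfall durations [min] listed in task 1.
DURATIONS_MIN = [5, 10, 15, 30, 45, 60, 90, 120, 180,
                 360, 720, 1080, 1440, 2160, 2880, 4320]


def max_interval_depth(event, duration):
    """Largest rainfall depth [mm] in any `duration`-minute window of one event.

    `event` is a list of one-minute depths. If the event is shorter than the
    window, the whole event total is returned (assumption: dry before/after).
    """
    if not event:
        return 0.0
    window = min(duration, len(event))
    current = sum(event[:window])
    best = current
    # Slide the window one minute at a time, updating the running sum.
    for i in range(window, len(event)):
        current += event[i] - event[i - window]
        best = max(best, current)
    return best


def largest_peaks(events, duration, n_largest=50):
    """Non-increasing sequence of the n_largest event peaks for one duration."""
    peaks = sorted((max_interval_depth(e, duration) for e in events),
                   reverse=True)
    return peaks[:n_largest]


# Hypothetical one-minute depths [mm] for three independent events.
events = [[0.2, 1.1, 0.4], [2.0, 0.0, 0.5, 3.1], [0.1]]
print(largest_peaks(events, duration=5, n_largest=2))
```

Taking one peak per event is one simple way to respect the independence requirement stated in the note to task 2; overlapping windows from the same event would otherwise enter the sample several times.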

Results
Using the research methods described in Section 2, maximum rainfall data series were derived from the long-term database for the eight analyzed stations. This data separation yielded eight data frames, each holding the 50 maximum values for each of the 16 rainfall durations (from 5 to 4320 minutes). In the next step, the values for each rainfall duration were averaged over all eight stations to obtain a regional data frame containing maximum values for the Oder River basin. The procedure for choosing the best-fitting probability distribution was then applied to the maximum rainfall data prepared in this way. The BIC and rRMSE values for each rainfall duration are summarized in Table 1. The smallest values (in bold) indicate the best agreement between the theoretical distribution and the regional average obtained from the eight measured data series. In general, for rainfall durations between 5 and 2160 minutes both criteria indicated GED as the best-fitting probability distribution. These results agree with the conclusions of the work for Wrocław [1,2,10], Legnica [11] and several stations in the Lusatian Neisse River basin [8]. Despite major differences for the 2880- and 4320-minute rainfall durations, GED was adopted for further analysis as the best-fitting distribution. In fact, for the purposes of designing and operating drainage systems in urbanized areas, rainfall durations of up to one day are sufficient [1,2]. The GED goodness of fit is shown in Fig. 2. It can be seen in Fig. 2 that, for the averaged measurements, GED fitted best for rainfall durations t ranging from 5 to 1440 minutes. For t ∈ [2160; 4320] minutes, the differences generally ranged from 11% to 23%, reaching almost 31% in the extreme case of the highest precipitation for the duration t = 4320 minutes.
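The averaging and model-selection steps described above can be sketched as follows. This is an illustration under stated assumptions: the station samples are averaged rank by rank after sorting in non-increasing order, the empirical probability of exceedance uses the plotting position i/(n+1), rRMSE is taken as the root mean square of the relative quantile errors, and a Gumbel distribution fitted by the method of moments stands in for the numerically estimated candidates. None of these choices is confirmed by the source, and all names and numbers are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def regional_average(station_maxima):
    """Rank-wise mean of the sorted station samples for one rainfall duration."""
    ranked = [sorted(v, reverse=True) for v in station_maxima.values()]
    n = min(len(r) for r in ranked)
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n)]


def gumbel_fit_moments(sample):
    """Method-of-moments Gumbel estimates (location mu, scale beta)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    return mean - EULER_GAMMA * beta, beta


def gumbel_loglik(sample, mu, beta):
    """Log-likelihood of the sample under a Gumbel(mu, beta) distribution."""
    total = 0.0
    for x in sample:
        z = (x - mu) / beta
        total += -math.log(beta) - z - math.exp(-z)
    return total


def gumbel_quantile(p_exceed, mu, beta):
    """Rainfall height exceeded with probability p_exceed (0 < p_exceed < 1)."""
    return mu - beta * math.log(-math.log(1.0 - p_exceed))


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * loglik


def rrmse(observed, predicted):
    """Root mean square of relative errors (assumed rRMSE definition)."""
    n = len(observed)
    return math.sqrt(sum(((o - p) / o) ** 2
                         for o, p in zip(observed, predicted)) / n)


# Hypothetical maxima [mm] for one duration at two stations.
stations = {"station_A": [12.1, 9.8, 15.3, 8.4, 10.9],
            "station_B": [11.0, 14.2, 9.1, 7.9, 10.2]}
regional = regional_average(stations)
n = len(regional)
mu, beta = gumbel_fit_moments(regional)
# Largest value gets the smallest exceedance probability.
p_exceed = [(i + 1) / (n + 1) for i in range(n)]
predicted = [gumbel_quantile(p, mu, beta) for p in p_exceed]
print(bic(gumbel_loglik(regional, mu, beta), 2, n), rrmse(regional, predicted))
```

Repeating the last step for each candidate distribution and each duration, and keeping the candidate with the smallest criteria values, would give a comparison of the kind summarized in Table 1.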
The final step of this research was to compare the GED quantile determined based on the regional-average data set with the quantiles corresponding to the empirical data for the eight analyzed stations. Figure 3 shows the correlation level between the GED theoretical regional-average rainfall and the empirical maximum rainfall for the eight meteorological stations.
Fig. 3. Correlation between the GED regional-average rainfall height (h [mm]) and the empirical maximum rainfall height (h [mm]) (coloured dots and linear trend lines with R² described in text) for the eight meteorological stations of the IMWM-NRI for the rainfall duration t = 5 minutes.
The coefficient of determination R² from simple linear regression indicated a high level of conformity between the model and the measured data. For the rainfall duration t = 5 minutes, the R² values were, in decreasing order: Zgorzelec 98.95%, Zielona Góra 98.91%, Legnica 98.42%, Opole 98.35%, Wrocław 98.18%, Jelenia Góra 97.08%, Racibórz 94.70%, Kłodzko 90.64%. A strong correlation in the range of heavy rainfall for durations such as t = 5 minutes is very important for drainage design purposes. Figure 4 shows that the differences between the theoretical and measured data were primarily due to the least frequent rainfall, with a probability of exceedance up to p = 0.1. The largest differences in precipitation values were 5.5 mm and 6.2 mm for Kłodzko, 2.9 mm for Jelenia Góra and 2.4 mm for Zgorzelec. The remaining differences did not exceed 2.0 mm, and in the majority of cases 1.0 mm, which can be considered a very good result. The space limits of this paper do not allow the results for all analyzed rainfall durations to be presented in a similar manner. Therefore, Fig. 5 presents one of the negative examples, for t = 1440 minutes. A significantly poorer fit of the GED theoretical data to the measurement data for precipitation durations from a few to several hours has been reported in the literature and confirmed in previous studies conducted for Wrocław [1,2,10], Legnica [11], several Lusatian Neisse basin stations [8] and 12 meteorological stations in the Upper and Middle Oder basin [9]. The example in Fig. 5 shows that the data from the Racibórz, Opole and Jelenia Góra stations differ significantly not only from the proposed regional model but also from the other measurement stations, which in turn show satisfactory agreement with each other and with the regional model.
Distinct differences also occur for the least frequent precipitation, corresponding to a probability of exceedance p between 0.01 and 0.3. The differences here range from 22% to 39%, which corresponds to a rainfall difference of up to 45.6 mm. The topographic location of these three stations is significant here: it is a submontane area prone to intense rainfall that is hard to predict. Floods harmful to the technical infrastructure, human property and the natural environment are also very common in this area [7]. It is worth mentioning that extremely high rainfall values such as those from Racibórz (i.e. h = 130.3 mm for t = 1440 minutes or h = 227.4 mm for t = 4320 minutes) pose considerable difficulties for describing the precipitation distribution, because most quality tests flag such values as outliers. The statistical description of precipitation data in mountain areas is difficult but necessary for further research on various tasks, especially for water supply and sewerage purposes.
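The station-by-station comparison reported above can be outlined with a short Python sketch. It assumes that R² is the squared Pearson correlation of a simple linear regression (with intercept) between regional-model and station rainfall heights paired by probability of exceedance, and that percentage differences are expressed relative to the measured value. The source states neither convention, and the numbers below are hypothetical.

```python
def r_squared(model, measured):
    """Coefficient of determination of a simple linear regression."""
    n = len(model)
    mean_x = sum(model) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in model)
    syy = sum((y - mean_y) ** 2 for y in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(model, measured))
    return sxy * sxy / (sxx * syy)


def max_abs_difference(model, measured):
    """Largest absolute difference [mm] between paired rainfall heights."""
    return max(abs(x - y) for x, y in zip(model, measured))


def max_relative_difference_percent(model, measured):
    """Largest difference as a percentage of the measured value (assumption)."""
    return max(100.0 * abs(x - y) / y for x, y in zip(model, measured))


# Hypothetical rainfall heights [mm], ordered by increasing probability of exceedance.
regional_model = [14.8, 12.9, 11.5, 10.2, 9.0]
station_data = [16.1, 12.5, 11.9, 10.0, 8.7]

print(r_squared(regional_model, station_data))
print(max_abs_difference(regional_model, station_data))
print(max_relative_difference_percent(regional_model, station_data))
```

Computing these three measures per station and per duration would give figures comparable in kind to the R² values and the millimetre and percentage differences quoted in the text.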

Conclusions
Eight meteorological stations and four probability distributions were used to describe the long-term rainfall data in the Oder River basin. Averaging the maximum rainfall values for each duration made it possible to create a regional model based on the GED. The evaluation led to the following conclusions:
• general conclusion: well-fitting (to measured data) local probabilistic rainfall models can be provided, but it is much harder to develop a regional model for terrain with varying elevation (in the analyzed Oder basin area from ~100 m to nearly 400 m a.s.l., excluding the higher parts of the Sudetes),
• the best-fitting probability distribution was GED,
• the differences between model and measured data were larger mainly in the range of extreme maximum values (probability of exceedance p between 0.02 and 0.3),
• for Legnica, Kłodzko, Zielona Góra, Jelenia Góra and for most rainfall durations at Opole and Wrocław, the regional model described the measured data to a satisfactory degree,
• the largest differences between the data from the regional model and those from the eight stations were usually between 20% and 30%, and in extreme cases they did not exceed 40%,
• further and more detailed analyses (including the use of other probability distributions and a spatial analysis) should be performed in the future for the Oder River basin for flash flood and urban flood protection purposes,
• if possible, similar analyses should also be done in the Czech parts of the Oder catchment and, especially, in its mountainous parts in both Czechia and Poland, and possibly across the whole of Poland (in order to find regional patterns and similarities in the probability distribution parameters).