Hydrograph estimation at upstream ungauged sections on the Secchia River (Italy) by means of a parallel Bayesian inverse methodology

. In this work, we present a reverse flow routing procedure, which allows estimating discharge hydrographs at upstream ungauged stations by means of information available at downstream monitored sites. The reverse routing problem is solved adopting a Bayesian Geostatistical Approach (BGA). In order to capture the complex hydrodynamic field typical of many real cases of rivers including large floodable areas, meanwhile overcoming the computational time limitations, we adopted as forward model a self-developed 2D-SWE parallel numerical model (PARFLOOD) that allows achieving ratio of physical to computational time of about 500-1000. To exploit the computational capabilities of modern GPU cluster, a parallel procedure to estimate the Jacobian matrix required by the BGA approach has been implemented. The inflow hydrograph in a river reach with several meanders and floodplains has been estimated in “only” 13 hours using a HPC cluster with 10 P100 Nvidia GPUs.


Introduction
The knowledge of discharge hydrographs in natural river sections is fundamental for water resource and risk management, design of new structures, etc. Furthermore, due to the different scopes, discharge information may potentially be required in any section along the river reach. Particularly challenging is the discharge reconstruction at the entrance of a river reach, where no observations (about discharge or water levels) are available neither in the section of interest nor in any upstream ones.
One of the approaches to solve this problem consists in the reverse flow routing process, which is a numerical procedure that allows evaluating the upstream flood wave starting from the knowledge of a downstream recorded hydrograph, together with the hydraulic characteristics of the river reach. [1] estimated discharge hydrographs in ungauged sections of natural channels by means of a Genetic algorithm coupled with a simplified hydrological routing model. However, the assumption of a Pearson type III distributed hydrograph limits the application of the procedure in reconstructing real flood waves which often have irregular shapes. [2] coupled a Genetic Algorithm with a 1D forward hydraulic model under the kinematic approximation, in order to identify the upstream hydrograph, given the downstream one, of a river reach. However, the estimations are limited to a rectangular prismatic channel and no errors were added to the downstream observations. Conversely, [3] solved the reverse flow routing problem for a real river and an irregular discharge hydrograph. They adopted a Bayesian Geostatistical Approach (BGA), introduced by [4] for groundwater problems, which considers a flow hydrograph as a statistical continuous random function that presents autocorrelation and accounts for uncertainties. In the framework of 1D numerical schemes, the authors showed the capability of the BGA methodology to estimate the inflow in an upstream ungauged section having discharge information (also corrupted) only in a downstream one. [5] further extended the method in order to adopt, as downstream observations, stage hydrographs instead of discharge ones. However, in many real cases of rivers including large floodable areas, where the 1D schematization does not hold, it is necessary to adopt a 2D Shallow Water model to capture the complex hydrodynamic field, even if this poses the drawback of the high computation cost, with respect to a 1D scheme.
This work extends the BGA methodology for reverse flow routing to 2D test cases, in order to model natural rivers with complex geometry, including flood plains and meanders. The transition of the application of the inverse procedure from 1D to 2D forward models is not straightforward, due to both the choice of the 2D Shallow Water scheme and the optimization techniques necessary for reducing the otherwise unfeasible computational time. Therefore, the stable, accurate and fast PARFLOOD 2D GPU finite volume model [6,7] is adopted as forward model and it is efficiently coupled with the Bayesian methodology. In this way, the inverse procedure allows estimating the discharge hydrograph in an upstream ungauged river section, having water level information only in a downstream observation site ( Figure 1) also for 2D applications.

The Bayesian Geostatistical Approach
The bgaPEST software ( [8]) solves inverse problems, in which the unknown parameters are correlated one another in space or time, by maximizing the posterior probability density function, which according to the Bayes' theorem reads: where s is the vector of the unknown parameters (the time values of the inflow hydrograph, in this case), y is the vector of the measured data (the downstream water level observations), p(s|y) the posterior probability density function (pdf) of s given y, L(y|s) the likelihood function and p(s) the prior probability density function of s.
The prior pdf provides a soft knowledge about the structure of the unknown parameters, making use of temporal autocorrelation functions. Meanwhile the likelihood function quantifies how a given set of parameters succeeds in reproducing the observations y, by calculating the Jacobian matrix ( [5]). Therefore, the evaluation of the likelihood function requires a forward model of the considered river reach that is able to describe the hydraulic routing process, and that provides for a given inflow hydrograph (set of parameters), the modelled water levels extracted in the same place and time of the available observations y. (1)) consist in the Np values that discretize the upstream discharge hydrograph, and represent the main final result of the estimation process. Additionally, so-called structural parameters (also known as hyperparameters), associated with the prior pdf and the likelihood function, are part of the inverse procedure. These parameters controls the shape or structure of the unknowns characterizing the covariance of both the prior and the uncertainty related to the observations ( [8]). The structural parameters are also estimated by the inverse algorithm.

The unknown parameters (s vector in Equation
The bgaPEST software solves the inverse problem by means of the following steps: 1. The vector of the unknown parameters s is initialized by assuming starting values (even constant) coherent with the considered river. Additionally, the structural parameters are set as to guarantee a flat solution (complexity is introduced during the optimization process only if supported by the data); 2. The hydraulic forward model propagates the discharge hydrograph defined at point 1 as upstream boundary condition. At the end of the simulation, the resulted water levels in the observation site are extracted; 3. A new set of parameters s is estimated after having computed the Jacobian matrix, which requires, in a finite differential approximation, as many runs of the forward model as the number of parameters Np. The water levels extracted from simulation i are the result of the routing of the discharge hydrograph, where just the parameter i has been varied of a known quantity, with respect to the value at point 2 ( Figure 2); 4. The parameters resulted at point 3 are adopted as discharge hydrograph and the operations at points 2-3 are repeated until convergence or the maximum number of iteration Ni (usually Ni = 4-5) is reached. Thus, a new set of Np parameters is estimated; 5. The structural parameters are estimated until convergence or the maximum number of iteration No (usually No=4-5) is reached. Since the problem is non-linear, the inverse procedure requires Nt run of the forward model, according to the following relation ( [8]): Most of the Nt runs in Equation (2) derive from the Np ones needed for the Jacobian matrix computation. However, since each run checks the sensitivity of the observations to the variation of a single parameter (run i tests the sensitivity on parameter i), the solution of a test does not have effects on the solution of other tests: as a consequence, the Np runs are independent one another and can be potentially performed in parallel. Due to the fact that the adopted PARFLOOD ( [6,7]) forward model solves the 2D-SWE on a finite volume scheme, and it is parallelized on GPU, the computational burden can be reduced by taking advantage of High Performance Computing (HPC) clusters equipped with several GPUs. For this reason, the original bgaPEST software has been modified as to run parallel simulations on any cluster. In the serial version, a main do loop over the parameters manages the operations that, for each parameter i, write the input file of the forward model, execute the numerical forward model and read the results. Conversely, the parallel version here implemented allows running all the Np independent simulations at the same time, before computing the Jacobian matrix and then estimating a new set of parameters s. The parallelism introduced in the bgaPEST procedure is highlighted in the flow chart in Figure 3. The parallel version of the bgaPEST is executed on the CPU of a classical PC, and it performs, for each of the Nt simulations, the following operations ( Figure 4): 1. The input files for the PARFLOOD model are organized. All the Np simulations present the same bathymetrical and roughness maps, initial condition, but a different upstream boundary condition; 2. All the simulations are sent to the server by means of the Secure Shell network protocol (SSH). In the HPC cluster the access node schedules all the submitted jobs and assigns each one to a GPU node where the run will be executed. At the end, the resulted water levels are extracted; 3. All the results are moved from the remote server to the local PC, where a new parameter estimation is done. Since the variation of parameter i causes effects only after time ti, each of the Np simulations can be performed from time ti, further reducing the computational times. The decrease of the theoretical computational time can be evaluated as the difference between the run time for simulations performed from tstart until tend and those run from ti to tend, as follows: where ∆t represents the time interval between parameters i and i+1. Fig. 4. Schematization of the data transfer from the client to the server assuming three parameters and thus three simulations to be parallel performed.

Test cases
The method was tested by estimating upstream discharge hydrographs in the reach of the Secchia River (Northern Italy) connecting the flood control reservoir of Rubiera (point A in Figure 5) to the gauging station of Ponte Bacchello (point C in Figure 5). In this reach the river is characterized by wide flood plains and floodable areas. The water levels at the intermediate gauging station of Ponte Alto (point B in Figure 5) were adopted as observations. Historical events recorded at those stations allowed the model calibration ( [7]). The overall studied domain was discretized by means of a non-uniform grid ( [6]), in a result of 77•10 3 computational cells.
The procedure proposed by [9] was adopted in order to obtain a reference solution for the method validation. Firstly, the selected inflow discharge was routed from the upstream section A until a section located far downstream C, and the resulted water level hydrographs were extracted in sites B and C. Then, in the inverse procedure, the water levels in sites B and C (resulted from the previous step) were assumed as observations and downstream boundary condition, respectively; the inverse methodology estimated the inflow discharge assuming that no information was available on the discharge/stage time series in site A. The availability of a reference solution allows providing quantitative information about the accuracy of the inverse methodology, by calculating two indicators. Firstly, the root mean square error, RMSE was evaluated as follows: where N is the total number of parameters (data of the discharge hydrograph), and are the i-th actual and estimated inflow values, respectively.
Secondly, the error in the peak discharge Ep was assessed as: where and denote the peak discharge value of the estimated and actual hydrographs, respectively.

Syntethic Gamma-distributed hydrograph
The first test concerns the estimation of an inflow hydrograph assumed with Gamma distribution ( [5]): where t denotes the time, A the base flow (150 m 3 /s), B the volume above the base flow (75•10 6 m 3 ), Γ(b) the gamma function defined through the parameters b (8.5) and k (9000 s) that denote the shape and the scale parameter, respectively. The flow hydrograph is characterized by a peak value of about 1350 m 3 /s, its duration was limited to 72 hours and it was discretized using 2 hours steps, resulting in 37 parameters to be estimated (Np=37). The observations, which were discretized every 0.5 hours, were assumed in a first estimation as free of errors, whereas in a second estimation they were corrupted with random errors uniformly distributed with variance 10 3 m 2 . Figure 5 shows the map of the water depth at the arrival of the flood wave pick: most of the areas near the upstream section are flooded, as well as the flood plains along the river.  Beside this qualitative assessment of the inverse methodology, quantitative information is provided by means of the RMSE and Ep indicators ( Table 1). The errors in the peak values Ep are almost negligible in both cases and the RMSE is less than 1 m 3 /s with free of errors observations and about 9 m 3 /s with corrupted observations: these values confirm the accuracy of the procedure in estimating the overall shape and peak of the inflow hydrographs. Each inflow hydrograph estimation required 609 runs of the PARFLOOD forward model (Np = 37, Ni = 4, No = 4) and 13 hours of computational time. The simulations were performed taking advantage of the HPC cluster of the University of Parma, equipped with 10 NVIDIA ® Tesla ® P100 GPU. Conversely, the adoption of the serial bgaPEST would have required about 4 computing days, by retaining the same accelerated forward model. Finally, the adoption of both a serial bgaPEST procedure and a 2D-CPU forward code, and assuming a speed-up of two order of magnitude between PARFLOOD and serial CPU codes, would have required about 400 computing days, meaning the practical impossibility to resolve the inverse problem.

Syntethic hydrograph with two peaks
The second test focuses on the estimation of a real flood wave with a bimodal shape, occurred in December 2009 on the Secchia River ( [7]).
The discharge hydrograph (Figure 7) is characterized by a total duration of 150 hours and presents a first peak value of about 380 m 3 /s and a second one of about 550 m 3 /s 45 hours later. As for the previous test, the observations were firstly assumed free of errors and then corrupted by random errors uniformly distributed with variance 9.7•10 4 m 2 .
The inflow hydrograph was discretized using 3 hours steps (Np = 51), whereas the observation stage hydrograph was discretized every 0.5 hours. Figure 7 shows the comparison between the actual and the estimated inflow hydrographs with free of error (Figure 7-a) and corrupted (Figure 7-b) observations. Again, the estimated hydrographs capture well the actual ones.
(a) (b) Fig. 7. Two peaks inflow: comparison between actual and estimated hydrographs with free of error (a) and corrupted (b) observations. The differences between actual and estimated values (error) are also reported.
In both estimations, the error in the peak flow Ep is around 1% and the RMSE about 4 m 3 /s: these values show the capability of the procedure in reconstructing also flood waves with more than one peak.

Conclusions
In the present work a Bayesian methodology, that allows the estimation of discharge hydrographs in ungauged sections located at the upstream section of a river reach, has been extended to 2D cases by adopting a 2D-SWE parallel forward routing model. The validation on a natural river has shown that the procedure well estimates unknown inflow hydrographs with different shapes and also in presence of corrupted observations. The parallelization of the Jacobian matrix computation and the possibility of taking advantage of any HPC cluster with GPUs, allowed achieving the highest reduction in the computational costs: the more GPUs are available, the less time is required for computation. For a considered case, this parallel procedure reduced the computational time of a factor 8 against running the 2D-SWE code on a single GPU. Furthermore, the analysis of the runtimes has highlighted that the adoption of a serial code would lead to inadmissible computational times.