Towards reliability management for debris flow risk assessment

Abstract. Recent progress in data-integrated simulation methods has advanced our understanding of debris flows, including their triggering mechanisms and dynamic run-out behavior. Research groups and geohazard practitioners worldwide successfully integrate advanced simulations into workflows for hazard mapping. However, many challenges remain in applying such tools predictively for accepted decision support. One reason is our lack of a systematic approach to managing the simulations' reliability. In this contribution, we present results of an investigation into the extent to which the choice of data used for calibration influences the simulation's reliability. We start by introducing the building blocks of a modular and extendible data-integrated debris flow simulation tool chain developed by our group. Next, we introduce reliability as one quality measure of a holistic debris flow simulation and discuss how it can be assessed. Based on a synthetic example, we then show how different types of observed calibration data, such as impact area, deposit volume or localized velocity measurements, impact the subsequent forward simulation's posterior probability distribution, and hence the simulation's reliability. We conclude by discussing how linking a debris flow simulation's reliability to the type, scope and resolution of the calibration data could offer a novel pathway towards reliability management for debris flow risk assessment.


1 Introduction
Today, we are able to conduct advanced high-performance simulations of cascading natural disasters involving debris flow activity. High-performance simulations of such geohazards open a revolutionary opportunity for customized and dynamically adaptable hazard maps that support decision makers and allow for the model-based design of effective mitigation measures to save lives and prevent financial loss. Once a calibrated computational geohazard simulation tool chain is available, its quality can be assessed in different ways: We can assess the plausibility of the underlying process model given the observations that are available to us. While this has often been done in an ad hoc manner in the past, there has recently been a push towards more formalized Bayesian model selection techniques. If there is strong reason to believe that the underlying process model is highly appropriate, then we can assess the data-integrated simulation model's potential for transfer: Can the calibrated model be used to predict in a different context, hence can it be transferred to an event at a different location? A third option is to assess the simulation model's reliability: To which extent can we trust the data-integrated simulation results, hence how large is the model's uncertainty?
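In Bayesian terms, the calibration and reliability questions can be stated compactly. The following is a minimal formulation, with notation chosen here for illustration rather than taken from a specific reference: observations d constrain the model parameters \theta through Bayes' theorem, and the reliability of a predicted quantity of interest q (e.g., impact area) is reflected in the spread of its posterior predictive distribution:

    p(\theta \mid d) \propto p(d \mid \theta)\, p(\theta),
    \qquad
    p(q \mid d) = \int p(q \mid \theta)\, p(\theta \mid d)\, \mathrm{d}\theta .

A narrow p(q \mid d) corresponds to a reliable prediction; a broad p(q \mid d) signals low reliability.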
This study addresses the third question, hence reliability in mass flow simulations. Various techniques have been proposed to assess uncertainty and reliability, including point estimate methods, first-order second-moment and reliability methods, and Monte Carlo simulations; see [1] for an overview. While these studies are designed to propagate parameter uncertainty through a mass flow simulation, they typically do not address how reliability can be managed. Reliability management is the central motivation for this work. In section 2, we introduce the modular mass flow simulation tool chain used for this study. In section 3, we present a case study based on which we discuss how different types of observation data impact the simulation's reliability. In section 4, we conclude and provide an outlook on how these findings could offer new possibilities for reliability management in debris flow risk assessment.

2 Simulation tool chain for reliability assessment
Assessing reliability in debris flow simulations requires combining probabilistic model parameter calibration based on observations with a subsequent goal-oriented forward uncertainty quantification [2]. The ability to effectively manage reliability imposes further requirements on the modularity, computational feasibility and usability of such simulation tool chains. In our research group, we develop a simulation tool chain capable of both computationally feasible backward probabilistic parameter calibration and forward uncertainty quantification in mass movements. It follows a modular setup to facilitate a selective choice of the central building blocks defining a modern mass movement simulation tool chain (see Fig. 1).

Fig. 1. The schematic shows the generic structure of a modular debris flow simulation tool chain. Yellow indicates the process model, blue indicates necessary computational techniques and green indicates data integrated into the simulation tool chain. Different aspects of the calibrated tool chain can be quality assessed, including its potential for being transferred to another site as well as its plausibility and reliability. Boxes bounded by red dashed lines indicate high-throughput tasks that are computationally expensive and benefit from modern surrogate modelling techniques.
The chosen physical process model (a) is at the heart, along with its numerical solution scheme (b), e.g., an idealized mass point model, r.avaflow [3], or RAMMS [4]. The choice of the underlying computational model defines the model parameters to be calibrated (c), such as Coulomb friction, as well as the simulation's output (d), e.g., the flow's spatio-temporal height and velocity distribution. The purpose of the simulation study determines the relevant diagnostic variable of interest post-processed from the output (e), e.g., impact area, maximum run-out distance or factor of safety. Computational techniques are needed not only for numerically solving the process model (b), but also to (potentially) introduce a surrogate for speed-up and to facilitate high-throughput tasks (f), here based on Gaussian process emulation [6], and to efficiently calibrate model parameters (g), here based on Bayesian active learning [5]. Finally, specific simulation and calibration scenarios rely on static input data (h), such as topography and vegetation maps, and on observables that comprise the training data set used for calibration (i). The automated integration of surrogate modelling techniques is particularly powerful, as it facilitates previously unfeasible computational tasks, such as global sensitivity analysis and uncertainty quantification [5,7].
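To make building block (f) concrete, the following minimal sketch shows how a Gaussian process emulator can be trained on a handful of expensive simulator runs and then queried cheaply in high-throughput tasks. It is a toy illustration in Python using scikit-learn; the library choice, the stand-in run_simulator function and all numerical values are assumptions made for this sketch and do not represent the tool chain's actual implementation.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    rng = np.random.default_rng(0)

    def run_simulator(theta):
        # toy stand-in for the expensive process model (b): impact area [m^2]
        # as a smooth function of dry-Coulomb friction theta[:, 0] and
        # turbulent friction theta[:, 1]
        return 4.0e6 - 5.0e6 * theta[:, 0] + 4.0e2 * theta[:, 1]

    # small design of experiments across the two-dimensional parameter space
    X_train = np.column_stack([
        rng.uniform(0.02, 0.3, 30),       # dry-Coulomb friction [-]
        rng.uniform(100.0, 2200.0, 30),   # turbulent friction [m/s^2]
    ])
    y_train = run_simulator(X_train)

    # anisotropic RBF kernel: one length scale per parameter
    kernel = ConstantKernel(1.0) * RBF(length_scale=[0.1, 500.0])
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X_train, y_train)

    # the emulator now returns a mean prediction and a predictive standard
    # deviation at unseen parameter combinations
    y_mean, y_std = gp.predict(np.array([[0.15, 800.0]]), return_std=True)

Once trained, the emulator replaces the expensive solver in calibration and sensitivity loops, with an approximation error that is itself quantified by the returned predictive standard deviation.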

3 How observation data impacts prediction reliability
The goal of this study is to investigate the impact of different types of observed data on the reliability of a subsequent probabilistic forward simulation. For the sake of transparency and interpretability, we revisit previously published simulation results and interpret them in light of potential reliability management. The approach itself will be generalized to other mass flow scenarios in the future.

Fig. 2. Posterior distribution resulting from Bayesian active learning of dry-Coulomb friction and turbulent friction parameters, given synthetic observations extracted from a simulated landslide close to Bondo, Switzerland. See [6] for the simulation scenario and the approach to Bayesian parameter estimation.

Fig. 3. Posterior distribution resulting from joint Bayesian active learning of dry-Coulomb friction and turbulent friction parameters, given a combination of two types of synthetic observations, namely deposit volume and maximum velocity at location L1. See [6] for details about the simulation scenario.
In particular, we follow [6] in synthesizing observations, including maximum flow height and velocity as well as deposit volume and impact area, from a landslide simulation close to Bondo, Switzerland. Posterior probability densities, as shown in Fig. 2, are computationally inferred using grid approximation based on a 100 × 100 grid over a discretized parameter space following [6], namely [0.02, 0.3] for the dry-Coulomb friction coefficient and [100, 2200] m/s² for the turbulent friction coefficient. A joint Bayesian parameter estimation based on a smart combination of observables yields a more concentrated posterior distribution than those based on a single observable, as demonstrated by [6]; see Fig. 3.

So far, however, it is hard to judge how the estimated posterior distribution impacts the uncertainty of a subsequent forward simulation, hence the reliability of the subsequent prediction. To investigate this, the posterior distribution is first converted into probability values associated with the full grid discretization of the two-dimensional parameter space. A Gaussian process emulator has then been trained to predict the impact area, following [5]. The trained surrogate model has been used to forward propagate the previously identified probabilities in parameter space, see Figs. 2 and 3. The resulting values for the predicted impact area were finally used to populate a histogram of 160 equally sized bins between 2 × 10⁶ and 6 × 10⁶ m². The value of each bin corresponds to the accumulated probabilities of its generating simulation runs, see Fig. 4.

Highly reliable predictions are characterized by a very narrow distribution, whereas predictions of low reliability are characterized by a broad probability distribution. Quite interestingly, calibration based on integral values such as deposit volume and impact area results in a predicted impact area of low reliability, whereas a prediction based on a simulation that was calibrated via localized deposit heights is much more reliable. In particular, we find that if a prediction of the mass flow's impact area is sought, it is a better strategy to calibrate based on, e.g., a combination of point-based measurements than to calibrate based on the impact area itself.
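The computational pattern described above, a grid-based posterior followed by surrogate-based forward propagation into a histogram, can be sketched in a few lines. The following Python snippet is a toy illustration: the two surrogate stand-in functions, the synthetic observation y_obs and the noise level sigma are invented for this sketch and do not reproduce the study's trained emulators or results.

    import numpy as np

    # 100 x 100 grid over the two friction parameters, as in the study
    mu = np.linspace(0.02, 0.3, 100)       # dry-Coulomb friction [-]
    xi = np.linspace(100.0, 2200.0, 100)   # turbulent friction [m/s^2]
    MU, XI = np.meshgrid(mu, xi, indexing="ij")
    theta = np.column_stack([MU.ravel(), XI.ravel()])

    def surrogate_deposit_volume(t):
        # toy stand-in for a trained emulator of the calibration observable
        return 2.0e5 + 6.0e5 * t[:, 0] + 50.0 * t[:, 1]

    def surrogate_impact_area(t):
        # toy stand-in for the trained impact-area emulator
        return 4.0e6 - 5.0e6 * t[:, 0] + 4.0e2 * t[:, 1]

    y_obs = 3.5e5           # synthetic observed deposit volume [m^3]
    sigma = 0.05 * y_obs    # assumed Gaussian observation noise

    # grid approximation of the posterior under a uniform prior
    log_like = -0.5 * ((surrogate_deposit_volume(theta) - y_obs) / sigma) ** 2
    posterior = np.exp(log_like - log_like.max())
    posterior /= posterior.sum()

    # forward propagation: posterior-weighted histogram of predicted
    # impact area over 160 equally sized bins between 2e6 and 6e6 m^2
    area = surrogate_impact_area(theta)
    bins = np.linspace(2.0e6, 6.0e6, 161)
    hist, _ = np.histogram(area, bins=bins, weights=posterior)
    # a narrow histogram indicates a reliable impact-area prediction,
    # a broad histogram a prediction of low reliability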

4 Conclusions and outlook
We showed that calibrating mass flow simulation models with different types of observable data may significantly influence the reliability of a subsequent prediction. While this result is highly promising, it is important to emphasize the limitations of this study: it is based on synthetic data and assumes that the underlying Voellmy-Salm type shallow flow process model has already proven to be highly plausible. Due to the carefully designed synthetic nature of our study, however, we have been able to demonstrate the large potential of intelligent data selection for improving a data-integrated simulation tool chain's reliability. In the future, the type, scope and quality of the data can be systematically chosen to yield highly reliable predictions. Optimizing data acquisition for reliability management will be an important step towards increasing the integrity and acceptance of model-based policy making in geohazards engineering.