Flood risk vulnerability assessment: hierarchization of the main factors at a regional scale

Recent studies have shown that flood risk exposure is high in France: almost one fourth of the total population and a third of jobs are potentially located in flood-prone areas. In this context, a global vulnerability assessment methodology is currently being developed and evaluated in France to provide adequate tools for flood risk management. This study raises the question of the quantification, the qualification and the choice of vulnerability indicators for a given territory. It proposes a new methodology for the classification, hierarchization and selection of a set of six vulnerability indicators by means of a statistical analysis, including PCA and ANOVA, according to their relative impact on, and correlation with, the risk level estimated on the territory of Chalon-sur-Saône.


Introduction
This paper investigates whether multiple relevant indicators of exposure to flood risk can be statistically ranked, classified and selected. Recent flood risk evaluations carried out in application of the 2007 European Directive on the Assessment and Management of Floods (Directive 2007/60/EC) have shown that national flood risk exposure is high in France. Estimates indicate that one fourth of the total population and a third of jobs are potentially located in risk areas [8]. In this context, a new vulnerability assessment methodology is currently being developed in France to provide suitable tools for evaluating most measurable components of vulnerability. One of these tools is a list of about 110 vulnerability indicators, adaptable to spatial scales and local specificities, for evaluating vulnerability to flood risk [10]. This paper presents experimental work that raises the question of the quantification, the qualification and the choice of these vulnerability indicators for a given territory. It aims to propose a classification of six vulnerability indicators according to their relative impact on the risk level estimated on the territory of Chalon-sur-Saône, which is crossed by the Saône river. The goal is to better define the nature of exposed areas and of the indicators of vulnerability to flood risk, and to reduce the number of indicators to be calculated by pinpointing the most appropriate ones. This will in turn help to design an appropriate policy.
Today, despite an abundant literature on the subject [6; 3; 13], vulnerability remains a highly complex phenomenon, with both biophysical and socio-economic factors affecting exposure, sensitivity and adaptive capacity to flood hazard.
The 2007 European Floods Directive underlined the need to also integrate economic evaluations of structural flood protection measures, which increases the number of tangible indicators. Consequently, in France, from 2009 onwards, each flood prevention program (PAPI) had to include a cost-benefit analysis (CBA) estimating the economic efficiency of the planned program. To also consider non-monetary damage (human health, employment, environment, agriculture), the French government [9] developed a new methodology called multi-criteria analysis (MCA). It allows the calculation of new, non-monetary indicators such as the population exposed to floods or the number of jobs the program is expected to protect. It thus introduces new indicators dedicated to measuring vulnerability in terms of human health and security, employment and the environment.
Therefore, the number of numerical vulnerability indicators has increased significantly during the last decade, in connection with the development of multiple national and international initiatives driven by the climate change agenda [11]. Furthermore, in 2011 the French Ministry of Ecology published a national strategy for flood risk management [9] driven by three major objectives: (1) to increase the security of vulnerable populations; (2) to stabilize in the short term, and reduce in the medium term, flood damage costs; (3) to sharply reduce the recovery time (the time needed to regain an acceptable level of functioning) of a given territory. Hence, a global vulnerability assessment methodology is currently being developed at the French national level to support this flood prevention strategy. This vulnerability assessment consists of a list of 110 quantified vulnerability indicators connected with the objectives of the national strategy mentioned above. They will be organized in a toolbox that includes a methodology to calculate or estimate each individual indicator (flood impacts on population, on jobs, damage costs, etc.).
In this context, the study also raises the question of intangible damage indicators for a given territory. There is currently no methodology for selecting relevant indicators related to flood risk. Such a methodology would also be of great interest for producing a spatially explicit vulnerability index obtained by combining selected indicators. This is especially the case in the field of socio-ecological and spatial vulnerability assessments. Dedicated maps encompass a wide range of biophysical and socio-economic aspects by relying on an aggregated vulnerability index (AVI), which is an aggregation of multiple biophysical and socio-ecological indicators of vulnerability. Such maps are attracting growing interest in the research community because they are strong visual tools for environmental policy formulation and communication [12]. The AVI is usually obtained by holistic mapping methods [4; 2].
The AVI has several other advantages that deserve to be underlined. It reduces the amount and complexity of the information that must be communicated to the population or used in policy formulation. It also gives an indication of the interaction of multiple, spatially homogeneous indicators through a single index. On the other hand, one of the main criticisms of the AVI is that it is a dimensionless aggregation of several indicators of the related phenomena. Generating a single composite vulnerability index may be problematic because potentially important information about the relations between the original variables is hidden in the resulting index [2]. Finally, the proposed AVI methodologies are mostly deterministic: they do not include any statistical selection process and do not consider the possibility of excluding any of the calculated indicators.
By contrast, our approach focuses mainly on a better definition of the vulnerability components that affect the territory at both micro and macro scales. It addresses two main questions: (1) how to classify and select the relevant vulnerability indicators according to their link with the hazard level; (2) which ones to choose among a wide range of vulnerability and exposure indicators.
In application of the 2007 European Directive on the Assessment and Management of Floods (Directive 2007/60/EC), France has identified 122 regional areas potentially affected by a significant flood risk. The municipalities located in these flood areas are required to draw up a flood risk management strategy by the end of 2016. Within this framework, this work aims to propose a classification, a hierarchy and a selection of the main vulnerability factors at a regional scale. To do so, we examined the relationship and the correlation between each vulnerability indicator, calculated independently, and a risk indicator built from data generated by hydrologic and hydraulic numerical models and from land use characteristics. The classification, the hierarchization and the selection of the tested vulnerability indicators are then obtained by using, respectively, a simple Pearson correlation, a PCA (Principal Component Analysis) and an ANOVA (ANalysis Of VAriance). The ANOVA is the only one of these methods that can select the most relevant indicators (and, symmetrically, statistically exclude the less relevant ones) according to their relative correlation weight and impact on the risk level.

Context and data
Chalon is crossed by the Saône and several of its tributaries (the Thalie, the Corne and the Grosne), which are responsible for frequent flooding. These rivers have slow flood dynamics: floods take several days to reach their peak and the recession may take two to three weeks, which leads to particularly long immersion times for flooded territories. The damage caused by these floods is often substantial, but they generally do not cause casualties. The Saône reference flood is the flood of November 1840 (Fig. 2). This flood was generated by major precipitation across the Saône basin during the month of September, coupled with torrential rains on the southern basin and the melting of early snow on the Jura relief. In the twentieth century, the three floods of 1981, 1982 and 1983 (Fig. 2), each with a return period of between 20 and 50 years, left a strong impression and caused significant damage to cities and to industrial and agricultural activities. The last significant flood of the Saône is the 2001 flood episode (Fig. 2), with a return period of between 20 and 30 years. The overall economic cost of this flood was evaluated at 280 M€.
Data collection consisted of seven flood scenarios (T2, T5, T10, T20, T50, T100, T1000) covering the area of Chalon-sur-Saône (Fig. 2), including 7 rural cities. The flood scenarios were modelled by the Saône-Doubs watershed institute. Each scenario consisted of a simulation of the flooded area and of water depths given on a 1 m² grid (1 m x 1 m cell size). Two databases were used for the topography: a LiDAR survey (1 m horizontal resolution) covering only the corridor along the Saône river, and a DEM (BD Alti from the IGN) with a 25 m horizontal resolution covering the whole study area. Finally, we used different land cover data (population, houses, agriculture, natural areas…) from the BD Topo (at building scale), Corine Land Cover (at land plot scale) and the INSEE database to estimate the population and goods affected by floods.

Implemented methods
In order to perform our statistical analysis of the spatial distribution of flood impacts, the results of the numerically simulated flood scenarios were represented on a grid of 100 m cells, which is a good compromise given the size of the total area, leading to a total of 7 612 cells (Fig. 3). The methodology applied to the territory of Chalon-sur-Saône can be divided into two distinct phases:
1. In a first step, we calculated a synthetic variable (the FHI: flood hazard index) grouping all the hazard data together (Fig. 3) at the grid cell level; in parallel, we calculated all the land use composition variables and assigned them to the grid cells (Fig. 6: population, urban areas, natural areas, etc.).
2. Secondly, we used and compared three different statistical methods to classify and select the most relevant vulnerability and exposure indicators. This is made possible by statistically crossing our FHI matrices with our vulnerability indicators derived from land use and demographic variables.
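The gridding step can be illustrated with a minimal sketch. The paper does not state how the 1 m water depths were aggregated into 100 m cells, so the block average used below is an assumption, and the function name `aggregate_to_cells` and the toy raster are hypothetical.

```python
def aggregate_to_cells(depth, block):
    """Average a fine raster (list of rows) over square blocks of `block` pixels.

    Assumption: the coarse cell value is the mean of the fine pixels it covers.
    """
    n_rows, n_cols = len(depth), len(depth[0])
    if n_rows % block or n_cols % block:
        raise ValueError("raster size must be a multiple of the block size")
    cells = []
    for r0 in range(0, n_rows, block):
        row = []
        for c0 in range(0, n_cols, block):
            # Collect every fine pixel that falls inside the coarse cell.
            values = [depth[r][c]
                      for r in range(r0, r0 + block)
                      for c in range(c0, c0 + block)]
            row.append(sum(values) / len(values))
        cells.append(row)
    return cells


# Toy 4 x 4 raster of water depths (m), aggregated into 2 x 2 blocks.
fine = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 4, 4],
        [2, 2, 4, 4]]
coarse = aggregate_to_cells(fine, 2)
```

With the data of the study, `block` would be 100 for a 1 m raster. Other aggregation rules (maximum depth, flooded fraction) would fit in the same loop.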

Calculation of a Flood Hazard Impact (FHI)
The Flood Hazard Impact (FHI) synthesizes in a single aggregated variable all the flood hazard data (probability of occurrence, water levels) for the four flood scenarios (Fig. 3) available for Chalon-sur-Saône. The FHI is determined by the equation:

FHI = [water level × flood probability]    (1)

In order to perform a statistical analysis of the spatial distribution of floods on the territory, we calculated the flood impact on a grid of 100 m cells (Fig. 4). The FHI variable characterizes only the natural hazard data and does not include any information about land use (population, urban areas, natural areas, etc.). As expected, the FHI increases with proximity to the riverbed (Fig. 4). As illustrated in Figure 5, the closer a location is to the riverbed, the more likely it is to be affected by a frequent flood scenario and the higher the water levels.
First, a simple Pearson correlation coefficient was calculated for each indicator, giving a first indication of the existing links between the indicators and the FHI. Then, a principal component analysis (PCA) was applied to classify the indicators into homogeneous groups according to the sign (positive or negative) of the link between the FHI and each indicator. These results were useful to identify the most relevant vulnerability indicators as a function of their flood exposure. These statistical analyses aim to highlight the relationship between a variable describing the exposure level (hydrologic impact: water levels and flow velocity) and spatial vulnerability indicators for each of our 7 612 cells.
A PCA also provided information on flood impact by capturing the correlation between the FHI and altitude.
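Equation (1) can be sketched for a single grid cell as follows. Two points are assumptions rather than statements of the paper: the flood probability of a scenario is taken as the inverse of its return period (1/T), and the contributions of the scenarios affecting a cell are summed, as suggested by the caption of Figure 5. The function name and the water levels are hypothetical.

```python
def flood_hazard_impact(water_levels, return_periods):
    """FHI of one cell: sum over scenarios of water level x annual probability.

    Assumption: the probability of a scenario with return period T is 1 / T.
    """
    if len(water_levels) != len(return_periods):
        raise ValueError("one water level is required per scenario")
    return sum(h * (1.0 / t) for h, t in zip(water_levels, return_periods))


# Two fictive points, in the spirit of Figure 5 (water levels in metres).
RETURN_PERIODS = [20, 50, 100]                                      # T20, T50, T100
point_1 = flood_hazard_impact([0.5, 1.0, 1.5], RETURN_PERIODS)     # hit by all three
point_2 = flood_hazard_impact([0.0, 0.0, 0.4], RETURN_PERIODS)     # hit by T100 only
```

Under these assumptions, point 1 obtains a higher FHI than point 2 because it is flooded both more often and more deeply.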

Studied variables
Various vulnerability and exposure factors have been developed as a rapid and consistent method to characterize the relative vulnerability at the grid cell level. Our approach consisted of an assessment of the physical vulnerability of the area. Hence, we calculated and studied six indicators on the territory (Fig. 6).

Statistical methods
To compare variables (vulnerability indices) as a function of the water levels and of the flood hazard impact (FHI), we used the three statistical methods described previously: (1) calculation of the correlation coefficient, (2) PCA and (3) ANOVA.

Correlation coefficient
The Pearson correlation coefficient measures the linear correlation between the FHI and each vulnerability indicator. This coefficient takes values in the range [−1; +1]. A value of +1 corresponds to a total positive correlation, −1 to a total negative correlation and 0 to a total absence of linear correlation.
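As a minimal sketch of this first step, the coefficient can be computed directly from its definition. The function name `pearson` and the sample values are illustrative only and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Co-deviation and squared deviations around the means.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return s_xy / math.sqrt(s_xx * s_yy)


# Illustrative values for four cells: an FHI column against three indicators.
fhi = [1, 2, 3, 4]
r_positive = pearson(fhi, [2, 4, 6, 8])    # indicator increasing with the FHI
r_negative = pearson(fhi, [8, 6, 4, 2])    # indicator decreasing with the FHI
r_none = pearson(fhi, [1, -1, -1, 1])      # no linear link with the FHI
```

In the study, each indicator column would hold one value per grid cell (7 612 values) and would be compared with the FHI column in the same way.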

Principal component analysis (PCA)
PCA is a statistical procedure that uses an orthogonal transformation to convert a number of potentially correlated variables into a set of uncorrelated variables, called principal components, that capture the variability in the underlying data. The number of principal components is less than or equal to the number of original variables. PCA is a non-parametric procedure and is therefore independent of any hypothesis on the probability distribution of the data [1]. The orthogonal linear transformation is defined in such a way that the first principal component has the largest possible variance. The total variability within the data is the sum of the variances of the observed variables, once each variable has been transformed so that it has a mean of zero and a variance of one [7].
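The procedure can be sketched with NumPy as an eigen-decomposition of the correlation matrix of the standardized variables. This is one common way to compute a PCA and not necessarily the software route used by the authors; the function name `pca` and the sample matrix are hypothetical.

```python
import numpy as np


def pca(data):
    """PCA on standardized columns.

    Returns the eigenvalues (component variances, in decreasing order),
    the matching eigenvectors as columns, and the explained-variance ratios.
    """
    x = np.asarray(data, dtype=float)
    # Standardize: zero mean and unit variance for each variable.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = (z.T @ z) / (x.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, eigvecs, eigvals / eigvals.sum()


# Illustrative sample: 4 cells x 3 variables, the first two perfectly correlated.
sample = [[1, 2, 5],
          [2, 4, 3],
          [3, 6, 8],
          [4, 8, 1]]
eigvals, components, ratio = pca(sample)
```

Because each standardized variable has unit variance, the eigenvalues sum to the number of variables, and `ratio` gives the share of total variance carried by each component, which is the quantity reported in the variance percentages of the PCA figures.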

ANOVA
The ANOVA allows both a hierarchization and a selection of relevant indicators. The selection of variables is performed with a stepwise procedure that assesses the contribution of each explanatory variable to the F-statistic (the Fisher statistic gives an indication of the global goodness of fit of the model) as variables are added to or removed from the model. If, at a given step of the stepwise method, any variable of the model is not significant at the 5% level, then the least significant variable is removed from the model and the algorithm proceeds to the next step (removing the next variable if it is not significant). The statistic used for fitting the model is the Schwarz Bayesian information Criterion (SBC), known for being quite restrictive for variable selection. The SBC statistic is given by the following formula:

SBC = n ln(SSR / n) + p ln(n)

With: n = number of observations, p = number of parameters including the intercept, SSR = sum of squared residuals.
In addition to the ANOVA, a simple linear regression was fitted by ordinary least squares. In this regression, the FHI is the dependent variable and all the indicators selected by the ANOVA are the explanatory variables. This regression is computed to estimate the coefficients as well as the sign of the correlation (positive or negative), which are not given by the ANOVA.
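The selection logic can be sketched as follows. This is a simplified backward elimination driven only by the SBC, not the authors' exact stepwise procedure with its 5% significance test. The usual form SBC = n ln(SSR / n) + p ln(n) is assumed, and the function names, variable names and synthetic data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with intercept: coefficients and sum of squared residuals."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    return beta, float(residuals @ residuals)


def sbc(ssr, n, p):
    """Schwarz Bayesian criterion (assumed form): n ln(SSR / n) + p ln(n)."""
    return n * np.log(ssr / n) + p * np.log(n)


def backward_select(X, y, names):
    """Drop one variable at a time for as long as the SBC decreases."""
    keep, n = list(range(X.shape[1])), len(y)
    best = sbc(fit_ols(X[:, keep], y)[1], n, len(keep) + 1)
    while len(keep) > 1:
        trials = []
        for j in keep:
            subset = [k for k in keep if k != j]
            trials.append((sbc(fit_ols(X[:, subset], y)[1], n, len(subset) + 1), j))
        score, drop = min(trials)
        if score >= best:          # no removal improves the criterion
            break
        keep.remove(drop)
        best = score
    beta, _ = fit_ols(X[:, keep], y)   # final OLS gives the coefficient signs
    return [names[k] for k in keep], beta


# Synthetic, mutually orthogonal columns (not data from the study).
h1 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
h2 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
h3 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
noise = 0.1 * h2 * h3
X = np.column_stack([h1, h2, h3])
y = 2.0 - 3.0 * h1 + 1.5 * h2 + noise      # h3 carries no information on y
selected, beta = backward_select(X, y, ["urban_area", "natural_area", "transformers"])
```

In this toy case the uninformative third column is removed, and the final regression returns one negative and one positive coefficient, which mirrors the role of the OLS step in providing the sign of each link.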

Comparison of variables as a function of the water levels
Results (Figure 5) indicate a threshold between the T20 and T50 scenarios in terms of urban impacts induced by flood risk. The PCA results discriminate clearly between two categories of flood scenarios: the most frequent ones on the one hand (from T2 to T20), and the moderate and extreme scenarios on the other hand (from T50 to T1000). Nearly 95% of the total variance is explained by the first two principal components (Fig. 7b). This result is mainly explained by the territorial mitigation strategy, in which part of the northern natural area is devoted to protecting the territory against a T20 flood scenario. Regarding exposure to potential flood depth, natural areas are subject to a high flood hazard impact from the first scenario onwards, which is T2.
This could provide very useful information to spatial planners and policy managers for implementing the right strategy, which could consist in restricting building in some areas, stimulating the deployment of adaptive measures or developing only the most suitable areas.

Correlation coefficient
The Pearson correlation is the first intuitive statistical tool able to estimate the link between the risk level and each territorial component taken separately. Depending on the nature of the territorial component, we found three categories of indicators (Table 1):
(1) Components that are positively correlated with the risk level: in our case study, "natural areas" was the only indicator in this category. As previously explained, natural areas play an important role in the mitigation strategy of the Grand-Chalon territory.
(2) Components that are negatively correlated: fortunately, this is the case for most of the variables, namely population, urban areas, houses and companies (see Table 1).
(3) Components for which we did not find any direct correlation with the FHI: this is the case for electric transformers (see Table 1).

Principal component analysis (PCA)
We then used Principal Component Analysis (PCA) as a means of classifying vulnerability indices (population, urban areas, houses, etc.) across broad spatial scales. The PCA is used here as a descriptive statistical approach to data transformation, as a means of overcoming variable incommensurability (Fig. 8). It confirmed the observations made with the simple Pearson correlation calculation by discriminating three types of exposed indicators (see the correlation coefficient results above). We found that around 50% of the total variance is explained by the first two principal components.

ANOVA and linear regression (OLS)
We performed here a linear regression using ordinary least squares (OLS). The endogenous variable (Y) was the FHI. The exogenous variables (X) were selected among the vulnerability indicators. The main idea was to obtain the relative contribution of each variable to the FHI. Moreover, as we wished to obtain a hierarchy of variables according to their impact on the FHI, we computed an ANOVA on our five vulnerability indicators. This step was helpful for two reasons: (1) it allowed us to compare the influence of each vulnerability indicator relative to all other indicators and to know whether it is an important factor to be included and kept in the model; (2) it introduced a statistical rule to discard non-significant variables.
The selection of variables was performed with a stepwise procedure that assessed the contribution of each explanatory variable to the F-statistic as variables were added to or removed from the model.
The results are presented in two distinct parts. Table 2 gives the hierarchy (1 to 5) of factors and the selected indicators, which indicates the strength, in absolute value, of the link with the flood risk indicator (FHI). Table 3 gives the associated parameter estimates, that is, the relative contribution of each variable to the global risk. We then derived a hierarchy of variables (Table 4) according to their exposure and their vulnerability, although we could not provide a clear interpretation of the estimated regression coefficients.
Looking at the results for the entire set of variables (Table 2), we can see that the indicator of electric transformers (shaded in grey) is not significant in the model and was consequently removed from it. Our five other indicators were acceptable according to the ANOVA and were ranked in the model in the following order of importance: (1) urbanized area (m²), (2) natural area (m²), (3) number of houses, (4) population and (5) companies and services.
Moreover, these results suggest that the location of electric transformers brings no additional information on the global biophysical vulnerability in comparison with the significant variables included in the final model (from 1 to 6). This could be explained by the fact that we only have information about the exposure of the electric transformers; no information was provided about the real impact on the power supply and on the power grid.
The parameter estimates give information about the net average correlation with the FHI. The final classification of vulnerability indicators is given according to their importance in the model (ANOVA classification) and to the sign of the correlation. In first position is the most vulnerable indicator, the surface of natural areas, which is the only indicator positively correlated with the flood hazard. In last position is the least vulnerable indicator: companies and services are on average located in higher-altitude zones. The prioritization of vulnerability indicators on Chalon-sur-Saône and the identification of the variables that are statistically located in the most exposed areas are then:
• Natural areas: positively correlated with the flood hazard impact, natural areas are mobilized for mitigation to absorb the impact of the most frequent floods (see Figure 7).
• Population, urbanized areas, houses and companies: negatively correlated with the risk impact, but with different levels of correlation (1: population, 2: houses, 3: urban areas, 4: companies). This result underlines that policy measures should focus on reducing the vulnerability of the populations and houses located in flood hazard areas.
• Electric transformers: no correlation found with exposed areas and risk level. As already explained, no information was provided on the impact of any flood event on the power grid of Chalon, and hence only the exposure of electric transformers was taken into account.
The statistical analysis of flood event data (PCA results) showed a threshold between the T20 and T50 scenarios in terms of urban impacts induced by flood risk, which sheds light on the classification of the natural areas indicator. A previous study on Chalon-sur-Saône has already shown the gap in flood impact between these two flood scenarios, T20 and T50 [5] (Fig. 9). This result emphasizes the importance of the relation between urban planning and flood impact on the territory, and the need for adequate vulnerability reduction policy measures targeting the T50 flood scenario.

Conclusion
There are likely to be multiple types of vulnerability occurring simultaneously over a complex territory. In response to this complexity, it is often argued that, in order to provide policy-relevant research, one should quantify vulnerability in relation to a single, clearly identified issue. This presupposes that the most important sources of vulnerability within the system are already known.
Our methodological approach appeared helpful for the hierarchization and the classification of relevant vulnerability indicators. The statistical analysis helped to specify which components of vulnerability were affecting the territory of Chalon-sur-Saône. Compared with the simple Pearson correlation calculation, or even with the PCA (Principal Component Analysis), the ANOVA (ANalysis Of VAriance) allowed us to select the most relevant indicators (and, symmetrically, to statistically exclude the less relevant ones) according to their relative correlation weight and impact on the risk level. In our case study the ANOVA was able to:
1. give a classification of vulnerability indicators according to their importance in the model (ANOVA classification) and to the sign of the correlation given by a simple OLS regression, which also takes into account cross-correlation effects between indicators, unlike the Pearson correlation coefficient.
2. exclude the indicator of electric transformer location. We have to underline that no information was provided about the real impact on the power supply and on the power grid. Therefore, the lack of data on the vulnerability of the power grid could have contributed to the exclusion of this variable from the global model.
The presented methodology has the advantage of ranking, classifying and selecting relevant and usable vulnerability indicators by means of statistical tools. Nevertheless, some further improvements of the method will be considered in the future: (1) the inclusion of additional indicators of vulnerability and exposure to flood (damages, agriculture, networks, energy supply…); (2) testing the method on more experimental sites with different socio-economic contexts; and (3) the computation of a partial least squares (PLS) regression to obtain predictions of vulnerability indicators for unmodelled flood scenarios, which would be of great interest for territories that lack modelled flood scenarios.

Figure 1. Location of the studied area for flood risk evaluation.

Figure 4. Method of calculation of the flood hazard impact in a 100 m grid cell.

Figure 5. Illustration of the method of calculation of the flood hazard impact on 2 fictive points (houses). Point 1 is impacted by 3 flood scenarios (T20, T50, T100) and point 2 is impacted by only 1 flood (T100); as a consequence, the FHI of point 1 is higher than that of point 2. In the 2 equations, H is the water level for the different flood scenarios.
Finally, an ANOVA (ANalysis Of VAriance) was computed to select indicators: it excludes indicators from the final set on the basis of statistical rules and ranks the remaining variables according to the strength of their relation with the FHI. The ANOVA is helpful to eliminate non-significant indicators from the model and to provide a common scale of comparison for the vulnerability indicators, in order to know which ones should be included in or excluded from the model.

Figure 6. Spatial distribution of 5 variables on the studied area: a) natural areas, b) houses, c) urban areas, d) population, e) firms, companies and services.

Figure 7. PCA analysis: altitude impacted by flood scenarios. a) Variables factor map; b) Percentage of variance.

Figure 9. Relation between urban planning and flood impact on the territory between T20 and T50.

Table 1. Pearson correlation coefficient.

Table 2. ANOVA: selection of variables performed with a stepwise selection. SSR (sum of squared residuals) should be minimal for a good fit. BIC is the objective function of the algorithm and has to be minimized.

Table 3. Linear regression of FHI and vulnerability indicators.
FLOODrisk 2016 - 3rd European Conference on Flood Risk Management

Table 4. Final ranking of vulnerability indicators.