A brief description of domestic water-use data in the city of Rabat-Salé (Morocco)

A small-scale experiment is conducted to estimate the domestic water-use rate in the city of Rabat-Salé. Water consumption in 171 households was closely monitored over a 24-month period. During this survey, both the net monthly volume of tap water effectively used in each household and the exact number of people living in that household were observed. This resulted in a series of reliable realizations of the daily per-capita water-use rate (denoted q). A usual procedure for summarizing the data is presented: the quantity q, interpreted as a random variable, is found to be satisfactorily represented by the log-normal distribution. An estimate of the expected value of q, along with the corresponding 95% Cox confidence interval, is given. Such estimates are essential, for instance, to future water system extension.


Introduction
Morocco is a semi-arid country prone to long periods of drought [1]. Its fresh water resources are scarce and exposed to a high risk of contamination, mainly because of (i) massive and irrational use of fertilizers and pesticides, and (ii) tardy investment in wastewater treatment. For instance, the Sebou River has for decades typified the serious problem of surface water pollution in Morocco, e.g. [2]. However, since 2005 a substantial effort has been made to narrow the gap by investing more than 4 billion US dollars as part of the national program for wastewater treatment, which aims at protecting water resources against contamination and upgrading sanitary and hygiene standards for about 10 million inhabitants [3]. Yet large amounts of raw domestic and industrial wastewater are still discharged every day directly into fresh water reservoirs. In parallel, dams, which represent one of the main Moroccan assets in terms of water resources control, have been known for decades to be endangered by silting [4] and by eutrophication [5,6,7,8]. Unless some non-conventional source of energy is envisaged, the high cost of energy in the Moroccan context means that seawater desalination is not a first-choice option, despite the very long national coastline. Morocco is nowadays considered one of the vertices of what is alarmingly referred to as the "thirst triangle", which includes North Africa and the Middle East [9]. Under these serious and structural constraints, the Moroccan water authority is now (and very likely will remain) facing the formidable task of reconciling a depleting and vulnerable national water reserve on the one hand with a steadily growing and pressing demand from the population on the other.
Consequently, among the several measures to consider, accurate estimation of influential parameters is no longer a mere formality; it has become one of the key steps that the Moroccan decision-maker must take seriously in order to genuinely rationalize water management policy.
This work starts by addressing one of these influential parameters, namely the rate of domestic water-use (herein denoted generically by q) for the urban population.
Generally speaking, q can be defined as the daily average volume of water required by an individual to cover their basic needs. This essential quantity underlies the estimation of flows in water systems, and therefore helps determine the proper pipe sizes.
In light of this, this paper attempts to go a step beyond the 'rule of thumb' frequently adopted to get a rough guess of the value of the parameter q. It suggests a small-scale experiment to show how a well-educated guess of q can be developed in a straightforward manner instead. The next section presents the survey and the methods. Section three is devoted to processing the gathered data following the mainstream steps. Finally, conclusions are given.

Survey and data
Parameter q is routinely approximated as the ratio between the billed volume of water at the scale of a city and the nominal population P of the city that is believed or assumed to have consumed this amount of water. The problem with this approach is that P is often ill-defined and inaccurately known, which results in a rough point estimate of q.
The present work proceeds within a statistical framework by interpreting q as a random variable governed by some probability density function, whose identification is part of the question. The whole procedure is exemplified by a small-scale survey of 171 households scattered over the city of Rabat-Salé. This survey covered a period of 24 months and consisted in (i) monitoring the net volume $V_{ij}$ of tap water effectively used in household number i during month number j; leaks in the water distribution system are bypassed. $V_{ij}$ is taken from REDAL®, the company in charge of the water and electricity distribution systems in Rabat-Salé. (ii) Monitoring the exact number $P_{ij}$ of people who lived in household i during month j, that is, the effective consumers of the billed volume of water $V_{ij}$. It is in this respect that this survey differs from conventional consumer surveys. The $P_{ij}$ values are reliable data delivered by the families who agreed to take part in this project.
E3S Web of Conferences 234, 00077 (2021) https://doi.org/10.1051/e3sconf/202123400077 ICIES 2020
This approach has two advantages: (i) it is based on reliable data coming from an accurately and closely monitored population sample (of course, the sample and the observation period could be enlarged in accordance with the available funds); and (ii) it offers more flexibility thanks to the randomness in q, which helps better explain the large variations in its observed realizations.
For the time being, q is treated as a global indicator characterizing the whole population, abstracting from explanatory factors such as income, household size, consumer behavior and so on. These regressors are documented, for instance, in [10,11,12].
It is noteworthy that the meaning given herein to domestic water-use covers the following basic needs: drinking, cooking, bathing, toilet flushing, washing (utensils/clothes) and cleaning. Special activities, like filling a swimming pool, are not included. As a de facto practice, such water-demanding activities are usually satisfied via private and hidden (i.e. illicit) wells. A methodology for detecting massive illegal pumping is suggested in [13,14].
The daily per-capita domestic water-use rate q, interpreted as a random variable, has a realization $q_{ij}$ based on household i during month j:

$$q_{ij} = \frac{V_{ij}}{P_{ij}\, D_j} \qquad (1)$$

where $D_j$ denotes the effective number of days covered by the bill corresponding to month number j. The other quantities have been defined above. Now, for a given household i, the sequence of realizations $q_{i1}, q_{i2}, \ldots, q_{i,24}$ is condensed into a single value $q_i$.
Finally, the numerical value $q_i$ is taken as an outcome of the random variable q, and the observed dataset is the sequence $q_1, q_2, \ldots$. At first glance, it is clear that the $q_i$ values span a substantially large range, going from 28 up to 314 liters per capita per day (l/cap d). So proceeding with q as a random variable seems reasonable, and it would be rather simplistic to boil it down to a nominal constant value, as is conventionally done.
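The computation of the per-capita rate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records below are hypothetical (not survey data), the billed volume is assumed to be expressed in cubic metres, and the per-household value $q_i$ is assumed here to be the arithmetic mean of the monthly realizations, which the text above does not confirm.

```python
from statistics import mean


def daily_per_capita_rate(volume_m3, people, days):
    """Return q_ij in l/cap d from a billed volume (assumed in m^3),
    the number of occupants and the number of days covered by the bill."""
    if people <= 0 or days <= 0:
        raise ValueError("people and days must be positive")
    return 1000.0 * volume_m3 / (people * days)  # 1 m^3 = 1000 litres


# Hypothetical monthly records for one household: (V_ij in m^3, P_ij, D_j)
household = [(12.0, 4, 30), (13.5, 5, 31), (9.8, 4, 28)]

rates = [daily_per_capita_rate(v, p, d) for v, p, d in household]

# Assumption: q_i is taken as the arithmetic mean of the monthly rates.
q_i = mean(rates)

print([round(r, 1) for r in rates], round(q_i, 1))
```

Dividing by the actual number of occupants each month, rather than by a nominal household size, is what distinguishes this calculation from the conventional city-wide ratio.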

Results and discussion
The boxplot in Figure 1a shows that the sample data are positively skewed. To alleviate the asymmetry, a new dataset $x_1, x_2, \ldots$ is formed by taking the natural logarithm of $q_1, q_2, \ldots$. This transformation maps the random variable q, which is bounded below by zero, into a variable X = log(q) whose tails stretch to infinity in both directions; the symbol 'log' stands for the natural logarithm. It is one of the most commonly used transforms in preprocessing water resources data. It is remarkably helpful in modeling transmissivity fields in hydrogeology [15], and it has been successfully applied to other natural phenomena, as discussed in [16] (to cite a few). Firstly, it appears from Figure 1b that both the box and the whiskers corresponding to the log-transformed dataset are approximately symmetrical. A summary of the main descriptive sample statistics is compiled in Table 1. It appears that (i) the sample mean and the median take close values, (ii) the interquartile range IQR is approximately equal to 1.30 times the sample standard deviation $s_x$, and (iii) the data range covers more than five times $s_x$. These distinctive features are in good agreement with the conventional characteristics of a normal distribution.
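The two rules of thumb quoted above can be checked against the theoretical normal distribution with a short Python sketch. It uses only the standard library; the extreme rates (27.9 and 313.8 l/cap d) and the sample variance (0.2268) are the values quoted elsewhere in this paper, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Theoretical IQR of a normal distribution, in units of its standard deviation.
iqr_over_sigma = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)

# Observed range of X = log(q) in units of s_x, from the extreme rates
# and the sample variance of the log-transformed data quoted in the paper.
s_x = math.sqrt(0.2268)
range_over_s = (math.log(313.8) - math.log(27.9)) / s_x

print(f"normal IQR / sigma   = {iqr_over_sigma:.3f}")
print(f"observed range / s_x = {range_over_s:.2f}")
```

For any normal distribution the first ratio is about 1.35, so the observed value of 1.30 is close to what normality predicts.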
Secondly, a histogram for the sample $x_1, x_2, \ldots$, reported in Figure 2, recalls the well-known bell-shaped appearance of a normal distribution. For 171 data points, eight intervals have been used, as recommended in [17]. The normal density function with parameters $\bar{x} = 4.4598$ and $s_x^2 = 0.2268$ is then plotted (dashed line) against the aforementioned histogram. It looks like a free-hand sketch of what an idealization of the probability density function of X appears to be.

Quantitatively, this intuition translates into the fact that 119 out of the 171 observations (i.e. 69.6%) lie within $\bar{x} \pm s_x$, and 160 observations (i.e. 93.6%) lie within $\bar{x} \pm 2 s_x$, which is again in good agreement with what is conventionally expected from a normal distribution, namely that 68% (95%) of the observations should be within one (two) standard deviation(s) of the mean. This indicates that the distribution of the log-transformed data points is not overly peaked around the mean and that its tails decay at a rate consistent with that of a normal distribution. Thirdly, the scatter plot in Figure 3a depicts the observed z-scores for the data points $x_1, x_2, \ldots$, computed as $(x_i - \bar{x})\, s_x^{-1}$, against the expected z-scores found by using the table of probabilities of the standard normal distribution. In this paper, the Blom plotting position formula, widely accepted as the best for comparing data quantiles against a normal distribution, is used. It appears that the paired observed/expected z-scores fall approximately along the 45-degree line passing through (0,0), with a satisfactory correlation coefficient of 0.998, which is further evidence of the normality of the log-transformed data; this is to be contrasted with the banana-shaped plot of Figure 3b associated with the untransformed data.
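The coverage check and the normal-score comparison described above can be sketched as follows. This is a minimal illustration, not the authors' code: the sample is synthetic (drawn from a normal distribution with the mean and variance quoted in this paper, since the survey data are not reproduced here), and Blom's plotting position is taken in its usual form $p_i = (i - 3/8)/(n + 1/4)$.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def blom_expected_z(n):
    """Expected standard-normal scores for ranks 1..n (Blom plotting position)."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson_r(a, b):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic stand-in for the 171 log-transformed observations (NOT survey data).
random.seed(0)
x = [random.gauss(4.4598, math.sqrt(0.2268)) for _ in range(171)]

x_bar, s_x = mean(x), stdev(x)
observed_z = sorted((xi - x_bar) / s_x for xi in x)
expected_z = blom_expected_z(len(x))

# Share of observations within one and two standard deviations of the mean.
within_1s = sum(abs(z) <= 1 for z in observed_z) / len(x)
within_2s = sum(abs(z) <= 2 for z in observed_z) / len(x)

# Correlation between observed and expected z-scores (normal probability plot).
r = pearson_r(observed_z, expected_z)

print(f"within 1 s: {within_1s:.3f}, within 2 s: {within_2s:.3f}, r = {r:.4f}")
```

With real data, `x` would simply be replaced by the logarithms of the observed rates; a correlation close to 1 and coverage shares near 68% and 95% are what a normal sample is expected to produce.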
The foregoing threefold visual inspection of the data provides grounds for admitting the normality of the random variable X, but it remains open to some subjective interpretation. Conducting a normality test is therefore the next step to take. The null hypothesis is $H_0$: the studied sample $x_1, x_2, \ldots$ comes from a normal distribution; it is tested against the alternative hypothesis $H_a$: the population distribution is non-normal.
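One common way to test these hypotheses is the Jarque-Bera statistic, sketched below in plain Python. The sketch assumes the textbook moment-based definitions of skewness and excess kurtosis, which may differ slightly from small-sample expressions; the function name and the toy sample are illustrative. It relies on the fact that the chi-square distribution with 2 degrees of freedom has survival function $\exp(-x/2)$, so the critical value at level $\alpha$ is $-2\log\alpha$.

```python
import math


def jarque_bera(sample):
    """Return (JB, p_value) using moment-based skewness and excess kurtosis."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((v - m) ** 2 for v in sample) / n
    m3 = sum((v - m) ** 3 for v in sample) / n
    m4 = sum((v - m) ** 4 for v in sample) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Chi-square with 2 degrees of freedom: P(X > x) = exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


alpha = 0.05
critical_value = -2.0 * math.log(alpha)  # chi-square(2) quantile at 1 - alpha

# Toy sample, only to exercise the function (not survey data).
jb_toy, p_toy = jarque_bera([1.0, 2.0, 3.0, 4.0, 5.0])

# Decision for a reported statistic, e.g. JB = 0.68.
jb_reported = 0.68
reject_h0 = jb_reported > critical_value

print(round(critical_value, 2), round(jb_toy, 4), reject_h0)
```

The null hypothesis is rejected only when the statistic exceeds the critical value, so a small JB leaves normality unchallenged rather than proving it.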
The Jarque-Bera test is applied by considering the following statistic [18]:

$$JB = \frac{n}{6}\left(S^2 + \frac{E^2}{4}\right) \qquad (3)$$

where S is the sample skewness, E is the sample excess kurtosis, and n = 171 stands for the sample size. S measures the asymmetry in the observed dataset $(x_i)$. The excess kurtosis E assesses how thick-tailed the observed distribution is. S and E are computed from expressions that suit small samples as well. For a normal distribution both S and E are equal to zero; their respective sample values are reported in Table 1. The Jarque-Bera statistic JB condenses the effects of skewness and excess kurtosis. The test consists in comparing the value of this statistic with the critical value of the chi-square distribution with 2 degrees of freedom. For the studied sample, JB = 0.68 is less than $\chi^2_{0.05,\,2} = 5.99$, the critical value corresponding to the preselected significance level $\alpha = 0.05$. So the null hypothesis is consistent with the data and there is, so far, no reason to reject $H_0$. In other words, the distribution of the daily per-capita domestic water-use rate q for the city of Rabat-Salé seems to be well fitted by the log-normal probability density function.

An estimate (denoted $\hat{\mu}_q$) of the expected value of q is not recovered simply by taking the antilog of the sample mean $\bar{x}$; the sample variance $s_x^2$ should also be accounted for, according to the following well-established relationship [19,20]:

$$\hat{\mu}_q = \exp\left(\bar{x} + \frac{s_x^2}{2}\right) \qquad (6)$$

Equation (6), combined with data from Table 1, yields for the city of Rabat-Salé:

$$\hat{\mu}_q = \exp\left(4.4598 + \frac{0.2268}{2}\right) \approx 96.9\ \text{l/cap d} \qquad (7)$$

Various methods dedicated to computing confidence intervals for the mean of a log-normal distribution are compared in the literature, e.g. [21,22,19]. In this study, we use the standard Cox method, whose performance remains satisfactory even when the sample size is not large. The procedure is carried out in two steps. First, the lower and upper bounds, $L_{cox}$ and $U_{cox}$, of the confidence interval for $\log(\mu_q)$ are evaluated:

$$L_{cox} = \bar{x} + \frac{s_x^2}{2} - z\sqrt{\frac{s_x^2}{n} + \frac{s_x^4}{2(n+1)}} \qquad (8)$$

$$U_{cox} = \bar{x} + \frac{s_x^2}{2} + z\sqrt{\frac{s_x^2}{n} + \frac{s_x^4}{2(n+1)}} \qquad (9)$$

where z is the percentage point of the standard normal distribution corresponding to the desired level of confidence; z = 1.96 for a 95% confidence level.
In [22], use of the t-statistic is proposed instead of z. The term under the square-root symbol in equations (8) and (9) represents an estimate of the variance of the estimator of $\log(\mu_q)$, namely $\bar{x} + s_x^2/2$; [19] handled it with $(n-1)$ in the denominator instead of $(n+1)$. The amendment reported in equations (8) and (9) was made by Chami et al. [21], in their paper's subsection 'Cox method', in order to render this variance estimate unbiased. For large samples this makes no practical difference. According to Table 1, $L_{cox} = 4.4971$ and $U_{cox} = 4.6485$.
Second, the antilogarithms of $L_{cox}$ and $U_{cox}$ are taken in order to recover the confidence interval for the mean value of the domestic water-use rate q. That is, one can be 95% confident that the expected value of q lies between $\exp(L_{cox})$ and $\exp(U_{cox})$, i.e. between 89.8 and 104.4 liters per capita per day (l/cap d).
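The two-step Cox procedure can be reproduced with a short Python sketch. It assumes the variance term with $(n+1)$ in the denominator discussed above and takes as inputs the rounded summary statistics quoted in this paper ($\bar{x} = 4.4598$, $s_x^2 = 0.2268$, n = 171); the function name is illustrative.

```python
import math
from statistics import NormalDist


def cox_interval(x_bar, s2, n, confidence=0.95):
    """Point estimate and Cox confidence interval for a log-normal mean.

    x_bar and s2 are the sample mean and variance of log(q); the variance
    term uses (n + 1) in the denominator of its second part.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    log_mean = x_bar + s2 / 2.0
    half_width = z * math.sqrt(s2 / n + s2 ** 2 / (2.0 * (n + 1)))
    lower = math.exp(log_mean - half_width)   # exp(L_cox)
    upper = math.exp(log_mean + half_width)   # exp(U_cox)
    return math.exp(log_mean), lower, upper


mean_q, lower_q, upper_q = cox_interval(4.4598, 0.2268, 171)
print(f"{mean_q:.1f} l/cap d, 95% CI [{lower_q:.1f}, {upper_q:.1f}]")
```

With these rounded inputs the sketch gives an interval of roughly 89.8 to 104.4 l/cap d; small differences in the log-scale bounds relative to Table 1 are to be expected from rounding of the summary statistics.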

Conclusions
This work outlines a preliminary investigation of the daily per-capita domestic water-use rate q in the city of Rabat-Salé (Morocco). A follow-up is to come in the near future. The dataset upon which this study is based was obtained by surveying 171 households over a 24-month period. Conventionally, q is simply approximated as the ratio between the billed volume of water and some nominal population. The latter is ill-defined and entails a good deal of arbitrariness. In contrast, this study suggests interpreting q as a random variable whose realizations are based on a well-defined and closely monitored population sample.
The assumption that q is a random variable is well supported by the survey data: the observed values of q range from 27.9 up to 313.8 liters per capita per day. A statistical approach therefore seems to be the adequate framework for handling a dataset of this kind. It is observed that the data are well fitted by the log-normal distribution. Therefore, a rationally defined estimate (i.e. one that goes beyond the so-called rule of thumb) of the mean value of q (equation (7)), along with the associated confidence interval (equations (8) and (9)), is given for the city of Rabat-Salé. Such statistics are essential for planning future water mains in Moroccan cities.