The National Survey on Population and Family Health (NSPFH), Morocco-2018: a Data Quality Assessment

The National Survey on Population and Family Health (NSPFH) is an important source of data in Morocco. Its objective is to assess, periodically, the state of population health and the impact of the policies and programs put in place, by updating the main demographic and health indicators. The report of the sixth NSPFH, conducted from October 2017 to January 2018, presents updated socio-demographic data and the new prevalence of several diseases. As the NSPFH results are a reference for decision-makers, researchers and professionals, it seems necessary to promote them and give them more credibility. The objective of this paper is to verify the reliability of the NSPFH data and results using demographic data quality assessment methods (graphical methods: age-gender pyramid and distribution by gender and area of residency; statistical methods: non-response rate and age accuracy indexes) to determine whether they are of good quality. This study showed that the NSPFH-2018 data were of good quality. Indeed, the non-response rate did not exceed 1.1% for all questionnaires (household 1.1%, woman 0.5% and elderly 1.1%). The age-gender pyramid confirmed the demographic transition towards ageing and the downward trend of fertility in Morocco. The distribution by area of residency confirmed the trend towards urbanization of the country (61% urban and 39% rural). The Whipple (1.05), Myers (4.73), Bachi (2.31) indexes and the United Nations Combined Index (31.21) were all within the standards for a good quality of age declaration. In addition, the results were consistent with each other and in line with the national and international health context.


Introduction
For more than thirty years, the Ministry of Health in Morocco has been carrying out demographic and health surveys according to an internationally approved standard methodology. These large-scale surveys are one of the main sources of population and health data at the national level and a major component of the national health information system [1]. The National Survey on Population and Family Health (NSPFH-2018) is the sixth edition of surveys of this type. It aims to assess the health status of the population and the impact of the policies and programs put in place, by updating the main demographic and health indicators. It also aims to provide the Government, civil society and the country's development partners with recent statistics for a better appreciation of the extent of health problems and an assessment of progress in implementing Morocco's international commitments, particularly in the context of the Sustainable Development Goals (SDGs) [2]. Thus, the NSPFH is an indispensable source for professionals at all levels. It is also the best basis for debate on the specific needs of the population. Our review of the literature on the subject found that the data quality of this survey had never been assessed. Furthermore, only two studies included data from Morocco. In the first, conducted in 2007, Spoorenberg applied the modified Whipple Index to the 2004 Morocco census data [3]. In the second, in 2014, Fajardo-González et al. evaluated the quality of age reporting in the censuses of several countries, including Morocco [4]. However, the data from the NSPFH, like those from any collection operation, cannot be free of errors. It was therefore worthwhile to carry out an analysis providing qualitative and quantitative indicators of the confidence that can be placed in the NSPFH results. In this paper, we assessed the data from the NSPFH-2018 to determine whether they are of good quality.
To do this, we applied demographic methods of data quality assessment and analyzed the consistency of the survey results. The objective was also to give more credibility to the results of the NSPFH-2018. Furthermore, it was necessary to verify the reliability of the data from this crucial source of health information, on which decision-makers rely when developing health strategies, policies, plans and programs.

Data Sources
The data used in this paper come from the results and tables presented in the final reports of the SPFH-2003-04, NSPFH-2011 and NSPFH-2018, which are available for download from the websites of the Ministry of Health [5] and the DHS Program [6], and from the population and household projections produced by the High Commissioner for Planning (HCP) [7].

Data Quality Assessment
To assess the data quality of the NSPFH-2018, we analyzed: 1) the internal consistency of the NSPFH-2018 results. It involved studying the data quality of the survey and its results using some graphical methods (age-gender pyramid and distribution of the population by age, gender and area of residency) and statistical methods (non-response rate and age accuracy indexes: Whipple, Myers, Bachi and United Nations Combined Index). In addition, we tried to identify several relationships and concordances between the NSPFH-2018 results; 2) the external consistency of the survey's results. To ensure that the results from the NSPFH-2018 were aligned with expected trends and health policies, plans and programs, we compared them with those of previous surveys, including the SPFH-2003-04, NSPFH-2011 and the demographic projections made by the HCP.

Non-response Rate
The non-response rate is the most commonly used indicator of data quality and is examined before any other method (graphical or statistical), given the considerable bias that non-response can introduce into the analysis of the data. A non-response rate below 10% is generally considered acceptable [8].

Age-gender Pyramid
According to the National Institute of Statistics and Economic Studies (INSEE) definition, the age-gender pyramid represents the breakdown of the population by gender and age at a given point in time. It consists of two histograms, one for each gender (by convention, men on the left and women on the right) where the numbers are shown horizontally and the ages vertically. The numbers by gender and by age depend on interactions between fertility, mortality and migrations. The shape of the pyramid and its variations over the years depend above all on the variations in fertility [9].
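As an illustration only, the layout described above (two horizontal histograms, men on the left, women on the right, ages on the vertical axis) can be sketched in plain Python without a plotting library. The function name `pyramid_rows`, the three age groups and their counts are hypothetical and are not NSPFH-2018 figures.

```python
def pyramid_rows(age_groups, men, women, width=30):
    """Return text rows of an age-gender pyramid (men left, women right).

    Bar lengths are proportional to each group's share of the total
    population; the oldest age group is placed at the top.
    """
    total = sum(men) + sum(women)
    largest = max(max(men), max(women)) / total  # share of the longest bar
    rows = []
    for label, m, w in reversed(list(zip(age_groups, men, women))):
        left = "#" * round(width * (m / total) / largest)
        right = "#" * round(width * (w / total) / largest)
        rows.append(f"{left:>{width}} | {label:^5} | {right}")
    return rows


# Hypothetical counts, for illustration only (not survey data).
groups = ["0-4", "5-9", "10-14"]
men_counts = [540, 520, 500]
women_counts = [500, 510, 495]
for row in pyramid_rows(groups, men_counts, women_counts):
    print(row)
```

A narrowing of the bottom rows relative to the rows above them is the visual signature of declining fertility that the pyramid is meant to reveal.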

Age Reporting Accuracy Indexes
The errors in the reporting of age have probably been examined more intensively than the reporting errors for any other question in the census/survey [10]. The quality of the distributions by age and gender was estimated by the most commonly used indexes [11]: Whipple [12,13], Myers [14], Bachi [15] and the United Nations Combined Index [11,12].

Whipple Index (W)
This index is the simplest one. It measures the attraction or repulsion of ages ending in 0 or 5. Its construction is based on the hypothesis that numbers evolve linearly within five-year age groups. The young ages (0-22 years) and the older ages (63 and over) are excluded because the linearity hypothesis is implausible in these age ranges. The advantage of this index is its simplicity. Its disadvantage is that it measures only the preference for the terminal digits 0 and 5.
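A minimal sketch of the calculation, assuming the usual formulation over ages 23-62 and single-year counts held in a dictionary, is given below. The function name `whipple_index` and the input layout are assumptions made for illustration. The result is expressed as a ratio, so that 1 means no heaping, which matches the scale of the value reported in this paper; many references multiply the same ratio by 100.

```python
def whipple_index(pop_by_age):
    """Whipple index as a ratio (1.0 = no heaping on ages ending in 0 or 5).

    pop_by_age maps single years of age to population counts.
    """
    # Numerator: ages 25, 30, ..., 60 (terminal digit 0 or 5 within 23-62).
    heaped = sum(pop_by_age.get(age, 0) for age in range(25, 61, 5))
    # Denominator: one fifth of the whole population aged 23-62.
    total = sum(pop_by_age.get(age, 0) for age in range(23, 63))
    return heaped / (total / 5)


# Hypothetical populations, for illustration only.
uniform = {age: 100 for age in range(100)}
fully_heaped = {age: (100 if age % 5 == 0 else 0) for age in range(100)}
print(whipple_index(uniform))       # no preference for 0 or 5
print(whipple_index(fully_heaped))  # every reported age ends in 0 or 5
```

With these two extreme inputs the ratio takes its reference value of 1 and its maximum of 5, respectively.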

Myers Index (M)
The Myers Index (M) measures the attraction or repulsion of ages ending in each digit from 0 to 9, not just those ending in 0 or 5. The total numbers of individuals reporting ages ending in each of the digits from 0 to 9 cannot be compared directly because numbers normally decline with age. Therefore, Myers proposed to calculate, for each of these digits, an "adjusted number" which, if there were no attraction or repulsion, would be equal to 10% of the total adjusted number. M is the sum of the absolute differences between the percentage share of each adjusted number and the theoretical 10%. If the age declarations are correct, all the adjusted numbers are almost equal and M is close to zero. The higher the value of M, the greater the preferences or repulsions for ages ending in particular digits. Its maximum value, 180, is reached when all reported ages end in the same digit. This index also gives information on the attraction of round ages (ending in 0 or 5). One of the disadvantages of the Myers index is that it is impossible to define precisely the theoretical conditions under which it takes, respectively, values 1 and 0 [11, 12, 14, 17].
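The "adjusted numbers" can be sketched as follows, assuming the standard blended formulation in which each terminal digit is summed over two age ranges shifted by ten years and weighted so that a linearly declining population yields equal shares. The function name `myers_index`, the starting age of 10 and the seven ten-year steps (covering ages 10-89) are assumptions, since the age range used for the NSPFH-2018 calculation is not stated here.

```python
def myers_index(pop_by_age, start=10, decades=7):
    """Myers blended index: sum of |share - 10| over the ten terminal digits.

    pop_by_age maps single years of age to population counts.
    """
    blended = [0.0] * 10
    for j in range(10):
        first_age = start + j
        # Ages first_age, first_age + 10, ... (range beginning at `start`).
        s1 = sum(pop_by_age.get(first_age + 10 * i, 0) for i in range(decades))
        # The same ages shifted by ten years (range beginning at `start + 10`).
        s2 = sum(pop_by_age.get(first_age + 10 * (i + 1), 0) for i in range(decades))
        # Weights (j + 1) and (9 - j) add up to 10 for every digit.
        blended[first_age % 10] = (j + 1) * s1 + (9 - j) * s2
    total = sum(blended)
    shares = [100.0 * b / total for b in blended]
    return sum(abs(share - 10.0) for share in shares)


# Hypothetical populations, for illustration only.
declining = {age: 1000 - 5 * age for age in range(100)}
single_digit = {age: (100 if age % 10 == 0 else 0) for age in range(100)}
print(myers_index(declining))     # linear decline with age, no digit preference
print(myers_index(single_digit))  # every reported age ends in 0
```

The blending is what lets a population that simply shrinks with age score close to zero, while concentrating all ages on one digit gives the maximum of 180 mentioned above.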

Bachi Index (B)
One of the disadvantages of the Whipple and Myers indexes is the impossibility of defining precisely the theoretical conditions under which they take, respectively, values 1 and 0. To address this, Bachi developed an index that does not have this disadvantage. Its calculation does not consider young and older ages. Working with the 23-72 age group, Bachi found that, for populations with well-declared ages, the share of the 23-72 age group reporting an age ending in a given digit varies roughly linearly with that digit, from 0 to 9. Moreover, he showed that the slope of this line does not vary much if the age limits are changed slightly. The Bachi index can vary between 0 (no preference or repulsion) and 90 (all reported ages end in the same digit) [11, 12, 15, 17].
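The bounds of 0 and 90 can be illustrated with a deliberately simplified score: the share of each terminal digit among persons aged 23-72, summed over the positive deviations from 10%. This is not Bachi's full procedure, which adjusts the age range for each digit; the function name `simple_digit_preference` and the fixed 23-72 range are assumptions made only to show how the scale behaves.

```python
def simple_digit_preference(pop_by_age, low=23, high=72):
    """Sum of positive deviations from 10% of terminal-digit shares.

    Simplified illustration only: Bachi's index additionally adjusts the
    age range used for each digit, which is not reproduced here.
    """
    counts = [0.0] * 10
    for age in range(low, high + 1):  # 50 ages, five per terminal digit
        counts[age % 10] += pop_by_age.get(age, 0)
    total = sum(counts)
    shares = [100.0 * c / total for c in counts]
    return sum(max(share - 10.0, 0.0) for share in shares)


# Hypothetical populations, for illustration only.
even = {age: 100 for age in range(100)}
one_digit = {age: (100 if age % 10 == 0 else 0) for age in range(100)}
print(simple_digit_preference(even))       # no preference or repulsion
print(simple_digit_preference(one_digit))  # all reported ages end in 0
```

Because only the excess above 10% is counted, a single digit holding 100% of the ages contributes 90 and the other nine digits contribute nothing.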

United Nations Combined Index (UNCI)
The UNCI is different from the three indexes described above. Indeed, it is calculated from the distributions by age group rather than by single year of age, and it attempts to measure the regularity of the distributions by gender and age. Compared to the methods of Whipple, Myers and Bachi, this method has the advantage that the calculated index reflects changes in the number of omissions according to age group, intentionally inaccurate age declarations and preferences for ages ending in a given digit. Therefore, this index better reflects the overall accuracy of the statistics by age. However, when the population size is small, the age distribution is largely subject to random fluctuations and the value of the index is affected. If UNCI < 20, the data are of good quality; if 20 ≤ UNCI < 40, the data are of relatively good quality and can be adjusted; if UNCI ≥ 40, the data are of very poor quality [11,12,17].
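A sketch of the usual United Nations formulation is given below, assuming that the index is three times the mean absolute difference between successive sex ratios plus the mean absolute deviation from 100 of the male and of the female age ratios, where an age ratio compares a five-year group with the average of its two neighbours. The function names `unci` and `classify_unci`, the input layout and the example counts are hypothetical; the thresholds are those quoted above.

```python
def unci(men, women):
    """United Nations Combined Index from five-year age-group counts.

    men, women: counts for consecutive five-year groups, youngest first.
    """
    sex_ratios = [100.0 * m / w for m, w in zip(men, women)]
    sex_diffs = [abs(b - a) for a, b in zip(sex_ratios, sex_ratios[1:])]
    sex_score = sum(sex_diffs) / len(sex_diffs)

    def age_ratio_score(counts):
        # Each interior group relative to the mean of its two neighbours.
        ratios = [
            100.0 * counts[i] / ((counts[i - 1] + counts[i + 1]) / 2)
            for i in range(1, len(counts) - 1)
        ]
        return sum(abs(r - 100.0) for r in ratios) / len(ratios)

    return 3 * sex_score + age_ratio_score(men) + age_ratio_score(women)


def classify_unci(value):
    """Apply the thresholds quoted in the text."""
    if value < 20:
        return "good"
    if value < 40:
        return "relatively good, can be adjusted"
    return "very poor"


# Hypothetical counts, for illustration only (not survey data).
regular = unci([100, 90, 80, 70, 60], [100, 90, 80, 70, 60])
irregular = unci([100, 100, 100], [100, 50, 100])
print(regular, classify_unci(regular))
print(irregular, classify_unci(irregular))
```

Weighting the sex-ratio component by three is what makes irregularities between men and women dominate the score, as in the second example.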

Non-response rate
The NSPFH-2018 is a large-scale survey in which 15,022 households were successfully surveyed (Table 1.), with a non-response rate of 1.1% (1.6% in urban and 0.3% in rural areas). The number of women aged 15 to 49 who took part in the survey was 9,969. The non-response rate for women was around 0.5% (0.7% in urban and 0.2% in rural areas). Similarly, of the 3,575 people aged 60 and above, 3,534 were successfully surveyed, representing a non-response rate of 1.1% (1.5% in urban and 0.7% in rural areas).
The non-response rate was well below 10% for the three NSPFH-2018 questionnaires (household 1.1%, women 0.5% and elderly 1.1%). This confirms that interviews were carried out with almost all of the selected statistical units and is an indicator of good data quality [8]. The low non-response rate is explained by the benefits of interviewer-assisted survey methods, where the interview is personalized, questions and concepts can be explained, and the interviewer can increase the response rate and overall data quality [18]. However, the response rate is only an indicator of the possibility of non-response bias. Even when the response rate is high, there may be a non-response bias. Therefore, a non-response bias analysis is always recommended [19].
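The rate for the elderly questionnaire can be re-derived from the counts reported above (3,575 eligible people, 3,534 interviewed). The sketch below assumes the simple definition of the rate as the share of eligible units without a completed interview; the function and constant names are illustrative.

```python
def non_response_rate(selected, completed):
    """Percentage of selected units for which no interview was completed."""
    return 100.0 * (selected - completed) / selected


# Counts reported for people aged 60 and above in the NSPFH-2018.
elderly_rate = non_response_rate(selected=3575, completed=3534)

# Acceptability threshold cited in the text [8].
ACCEPTABLE_THRESHOLD = 10.0

print(f"{elderly_rate:.1f}%")               # 1.1%
print(elderly_rate < ACCEPTABLE_THRESHOLD)  # True
```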

Population Structure According to Gender, Age and Area of Residency
The results of the NSPFH-2018 showed that the population was evenly split between men and women. Indeed, of the 67,795 individuals surveyed, 33,926 were women (50%) and 33,869 were men (50%). The sex ratio was almost equal to 100 men per 100 women. Similarly, the sex ratio in urban areas was 99 men per 100 women (20664/20871=99.0), while it was just over 101 men per 100 women in rural areas (13206/13054=101.2). For the age group 0-4 years (Fig. 1.), the sex ratio was 108 boys per 100 girls. It decreased with age, as the mortality rate of boys is generally higher than that of girls. From the age of 20, women were in the majority. From the age of 50, the sex ratio showed some irregularities. These variations are observed not just in Morocco but also in several African countries [17]. Furthermore, it should be noted that, generally speaking, the sex ratio decreases at advanced ages, given the longer life expectancy of women. It is, therefore, largely dependent on the age structure of the population [20]. The analysis of the age-gender pyramid of the surveyed population (Fig. 2.) revealed that the general trends of the Moroccan population structure (including fertility, mortality and ageing) were reflected in the NSPFH-2018. Indeed, a gradual demographic transition towards ageing was observed. The shape of the age-gender pyramid also showed that boys were slightly more numerous than girls in the 0-4 and 5-9 age groups. A slight shrinkage in the 0-4 age group was also observed. According to Hardy (2002), it is explained by declining fertility rates [21]. The results showed that there were 2.38 births for every woman aged 15 to 45 in 2018 compared to 2.6 in 2011 [22]. Overall, the population pyramid was symmetrical between men and women. However, an increase in the number of women aged 50-54 at the expense of the younger age group was noted.
It was an anomaly, which can be attributed to interviewers who, to avoid extra work during the individual survey, tend to transfer some women from the 45-49 age group to the 50-54 age group during the household survey, because women aged 50 and over were no longer eligible for the individual survey [23]. It was therefore an observation error. Such errors arise from the risks associated with the observation itself and depend on the level of supervision, the quality and training of the interviewers, the clarity of the questionnaire, etc. [24]. It can also reflect inaccurate age declarations among women, particularly in a population where 39.5% of women have no formal education [2] and a good number of individuals are not registered in the civil registry [23,25]. In addition, the decline in malnutrition rates may be a potential explanation, as infant protein malnutrition syndrome was and is (in the poorest economies) a limiting factor in an adult's cognitive abilities (which may lead to errors in age reporting) [26].
7%) (Fig. 3.). The trend towards urbanization of the Moroccan population was reflected by the high proportion of the urban population at the expense of that of the rural area [27,28].
NSPFH-2018 was of relatively good quality and can be adjusted. It confirmed the irregularities in the age-gender pyramid and the sex ratio observed previously, which were due to observation errors. Similarly, it should be noted that the UNCI does not examine the problem of age attraction as the previous indexes do. In addition, the choice of age groups affects this method, as it applies to data classified by five-year or ten-year age groups [10,11,12]. Therefore, it is advised to limit the calculation of age group ratios and sex ratios to age groups up to 70 years. Above this age, the series shows significant variations [17].

Consistency of Results
The results of the NSPFH-2018 showed several concordances. Indeed, the low proportion of pregnancies among women aged 15-49 (7.1%) is due to the high use of contraception (70.8%), the unmet need for family planning (11.3%) and the high average age at first marriage (31.9 years for men and 25.5 years for women). It explains the downward trend in the fertility rate [30]. It declined by 1.

External Consistency
Comparing the population pyramid of Morocco in 2018 [7] (Fig. 4.) with that of the NSPFH-2018 [2] (Fig. 2.) revealed that the general trends in the population structure (including fertility, mortality and the demographic transition towards ageing) were reflected in the NSPFH-2018.

Conclusions
In this paper, we assessed the data quality of the NSPFH-2018. We applied several demographic data quality assessment indexes (Whipple, Myers, Bachi and the United Nations Combined Index) to examine the quality of age reporting. We also analyzed the results of this survey to ensure that they were consistent with each other and aligned with the national and international context. The results of our analysis revealed that the NSPFH-2018 data were of good quality. The non-response rate was very low and did not exceed 1.1% for all questionnaires. The age-gender pyramid confirmed the demographic transition of the Moroccan population towards ageing and the downward trend in fertility and mortality. The distribution by area of residency confirmed the trend towards urbanization in Morocco. The Whipple, Myers, Bachi and UNCI indexes indicated good quality of age reporting. However, irregularities in the population structure by five-year age group and sex were observed. Moreover, the survey results were consistent and aligned with general trends and the national and international health context. As a result, the use of the NSPFH-2018 results is recommended. It would also be useful to prepare a data quality report for future surveys. We also recommend sharing the survey database with the scientific community to allow researchers to conduct in-depth analyses of health problems. In terms of limitations, we note that we used the weighted data presented in the final reports of the surveys studied. However, to assess the quality of age reporting, it is preferable to use the raw data before weighting [10,11,12]. In addition, analysis of the response rate for each variable would be more meaningful than analysis of the overall response rate [19].