Model-Based Diagnosis of Telecommunication Cooling Systems Malfunctioning

A model is developed to simulate the most likely failures occurring in free-cooling (FC) systems of telecommunication (TLC) switching rooms. The main aim is to provide an effective diagnosis method that can be implemented online, so as to fulfil the threefold function of safeguarding electronic equipment, ensuring the desired air quality when personnel are present, and reducing malfunction-related waste of energy. Specifically, this work investigates two faults: obstruction (reduction of the volumetric flow of air introduced into the room) and loss of efficiency (degradation of the fan). Two black-box sub-models were developed to simulate these faulty operating modes of the free-coolers. Subsequently, the fault signature matrix was developed, through which the “symptoms”, calculated as residuals between the “faulty” and “non-faulty” values of the monitored variables, are associated with the corresponding faults. A peculiarity of the telecommunication sector, where data acquisition and monitoring platforms are increasingly deployed to track the most significant energy consumptions, including cooling loads, proved essential in guaranteeing effective isolation of the different faults. The simulation results highlight the reliability of the developed diagnostic tool, which is expected to be versatile and simple enough to be extended to air-handling unit diagnosis, as well as to other industrial sectors.


Introduction
The Information and Communications Technology (ICT) sector is recognized as one of the most relevant contributors to global warming, as a consequence of the increasing demand for internet, cloud-sharing and internet-of-things services. The development of innovative energy-saving strategies is itself expected to rely strongly on the availability of powerful internet services, so as to support the current transition towards a more data-based management and maintenance of advanced technological equipment in the Industry 4.0 framework. Therefore, ICT companies are expected not only to support the achievement of energy sustainability worldwide through their services, but also to develop advanced energy intelligence protocols of their own, so as to exploit the large amount of highly informative energy consumption data available to them. The energy consumption of a generic COF can be preliminarily classified into two main categories: direct and indirect consumption, the former corresponding to the telecommunication equipment demand, the latter consisting mainly of the energy absorbed by cooling systems [1]. In some cases, the indirect energy demand can reach up to 50% of the total energy supplied by the grid [2]. Therefore, the most prominent telecommunication companies are focusing on improving monitoring, control and diagnosis of key ancillary devices, such as air handling units, chillers and, most of all, free-coolers. The latter are indeed the most strategic in terms of cooling efficiency, especially in cold climatic zones and, over a wider geographical area, during mild seasons. Moreover, the spread of free-coolers in central offices and data centers entails careful and continuous monitoring of their correct functioning in terms of met cooling load, so as to guarantee proper and long-lasting operation of TLC equipment.
* Corresponding author: msorrentino@unisa.it
Besides the above-recalled technological needs, the quality of the air supplied to industrial rooms that may host personnel, such as maintenance staff and other TLC operators, is becoming increasingly relevant. The recent COVID-19-related issues particularly suggest exploiting data-analytics methodologies to detect free-cooler malfunctions (e.g., air filter obstructions) that might be associated with poor-quality air circulation within industrial environments [3].
To address the above-described needs, a model-based diagnostic procedure is proposed in this paper. The main aim is to carry out a simulation analysis that proves, on the one hand, the suitability of the selected methodology for FC fault diagnosis and, on the other, its versatility for subsequent extension to other cooling system faults, as well as to other industrial sectors.

Diagnostic methods
The diagnosis of a system's faults mainly involves three activities: fault detection, isolation and identification. For the sake of completeness, the reader is referred to [4] for a detailed glossary of the main diagnostic terms.
There are different ways of classifying approaches to the problem of diagnosing a system, for example depending on whether it is done while a system is running or not, or whether it is based on deterministic information (for example, obtained from a model) or stochastic information (e.g. historical data, statistical data).
In general, diagnostic methods are classified into three categories:
• Signal-based
• Model-based
• Knowledge-based
Signal-based techniques rely on comparing the state of the system under test with other known events. As long as the behavior of the system under test remains similar to that of an already known system considered healthy, the system is deemed free of defects. When the measured behavior differs from the reference one, a fault is detected and can also be isolated and identified. This method has the advantage of being simple to use, at the expense, however, of requiring several sensors to monitor the operating conditions.
Model-based techniques describe the systems to be analyzed through mathematical models and the physical laws that govern their behavior. Such an approach is generally more effective at handling new or unexpected situations, since its mathematical models can reproduce a wider range of behaviors, even those not previously observed in other real systems.
In this case, given a model capable of simulating the behavior of the real system, faults can be detected by comparing the model outputs in normal operating conditions with the same outputs obtained in the presence of faults (the latter can be obtained by modifying the model parameters with respect to normal operating conditions).
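As a minimal sketch of this comparison, the Python snippet below computes residuals between a fault-free and a faulty model run and flags a fault when any residual exceeds a threshold. The function names, the sample temperature values and the threshold are illustrative assumptions, not taken from the model used in this work.

```python
def residuals(nominal, faulty):
    """Point-wise difference between faulty and fault-free model outputs."""
    if len(nominal) != len(faulty):
        raise ValueError("both trajectories must have the same length")
    return [f - n for n, f in zip(nominal, faulty)]


def fault_detected(res, threshold):
    """A fault is flagged when any residual exceeds the threshold in magnitude."""
    return any(abs(r) > threshold for r in res)


# Hypothetical room temperatures [°C] from a fault-free and a faulty run.
t_nominal = [24.0, 24.5, 25.0, 24.5]
t_faulty = [24.0, 25.5, 26.5, 26.0]

res = residuals(t_nominal, t_faulty)
print(res, fault_detected(res, threshold=0.5))
```

In practice the threshold would have to be tuned against model and measurement uncertainty, which this sketch does not address.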
The analyses carried out via the model-based approach can overcome some of the drawbacks of the signal-based method but, as noted above, they need an accurate and complete model of the system under study. Such a model is often unavailable, or only vague and incomplete knowledge is in place, thus requiring an alternative that does not rely on complete knowledge of the mathematical model of the system.
With the knowledge-based approach, in addition to the symptoms generated from quantifiable information, the feedback provided by human operators can also be exploited to generate heuristic symptoms, obtained for example through human observation and interpretation of particular noises, odors, vibrations, etc. Such an approach can fruitfully complement the above-described model-based techniques, so as to improve both the effectiveness and the reliability of the fault diagnosis process.
Considering the physical phenomena dealt with hereinafter, the model-based approach was deemed suitable and was thus selected.

Model-based approach to the diagnosis of FC systems
As already mentioned, the objective of this activity was to develop a model-based procedure for fault detection and isolation of FC systems used in TLC switching rooms. To accomplish this, a room thermal model (RTM) previously developed by the authors [5] was used, which is capable of accurately simulating both the thermal dynamics of the room under consideration and the power absorbed by the cooling system.
The main contribution was to integrate the original fault-free model with fault sub-models, thus enabling the original RTM to simulate both thermal dynamics and cooling load trajectories associated to faulty functioning of FCs.
The following paragraphs detail the adopted fault modeling approach. Fig. 1 shows the TLC room model considered in the current study. The cooling system, in this case, consists of two free-coolers and two air handling units (AHU). Such cooling devices are managed via thermostatic control; therefore, nominal operating conditions (in terms of supplied air flow and absorbed electric power) were set by carrying out specific experiments. For the sake of simplicity, the analysis was conducted on a single FC (i.e., FC_2). Knowing, for this FC, the values of the air volumetric flow introduced into the room, the total absorbed power and the efficiency, it is possible to evaluate the properties of the air entering the room (neglecting the effect of humidity). In particular, the formula of the power absorbed by the FC (i.e., eq. 1, derived from [6] by safely assuming a compressibility ratio of 1 [7]) allows estimating the fan total pressure (FTP) [8], as follows:

Nominal functioning
where ηt and ηel represent the total and electric motor (i.e., the FC prime mover) efficiency, respectively, whereas VFC [m³·s⁻¹] is the FC volumetric flow. As for the determination of ηt, starting from verified information, namely that the measured FC volumetric flow (i.e., VFC,n = 3.1 m³·s⁻¹) corresponds to 70% of the maximum value, it was possible to derive the needed value (i.e., ηt,n = 0.75) for the adopted fan type (i.e., radial blade centrifugal fan [6]). Moreover, the nominal electrical power absorbed by the FC was measured (PFC,n = 2.59 kW). As for the fan prime-mover efficiency, a literature-derived assumption was made, i.e., ηel = 0.95 [7].
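As a worked example, the nominal FTP can be estimated from the data above. The sketch assumes that eq. 1 takes the usual fan power form, PFC = VFC·FTP/(ηt·ηel), with unit compressibility ratio; this form and the function name are assumptions made here for illustration.

```python
def fan_total_pressure(p_el_w, eta_t, eta_el, v_flow_m3s):
    """Fan total pressure [Pa], assuming P_el = V * FTP / (eta_t * eta_el)."""
    if v_flow_m3s <= 0:
        raise ValueError("volumetric flow must be positive")
    return p_el_w * eta_t * eta_el / v_flow_m3s


# Nominal values quoted in the text.
P_FC_N = 2590.0   # absorbed electric power [W]
ETA_T_N = 0.75    # total fan efficiency [-]
ETA_EL = 0.95     # electric motor efficiency [-]
V_FC_N = 3.1      # volumetric flow [m^3/s]

ftp_n = fan_total_pressure(P_FC_N, ETA_T_N, ETA_EL, V_FC_N)
print(f"Nominal FTP: {ftp_n:.1f} Pa")
```

Under the assumed form of eq. 1, the nominal FTP comes out at roughly 595 Pa.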

Sub-model of air flow obstruction
To simulate the obstruction fault (e.g., clogging of a filter), it was necessary to approximate the characteristic curve of the free-cooler under investigation; to this end, the reference curve of a centrifugal fan with radial blades was used. Starting from the generalized characteristic curve mentioned in the previous paragraph, four pairs of flow rate and pressure increase values were selected from the reference normalized curve available in [6]. The resulting experimental data-set was then curve-fitted to obtain a polynomial regression (see eq. 2 and Fig. 2) that evaluates VFC as a function of FTP.
The obstruction to the passage of air through the fan is caused by the deposit of solid particles, such as dust, on the air filter. Such a phenomenon can be neither controlled nor predicted; it reduces the cross-section available to the air flow, thus altering the functional characteristics of the fan. This fault type is simulated by modifying the nominal operating point, that is, by considering a lower volumetric air flow [9]. Fig. 2 shows the fault case: the obstruction was simulated considering a 30% decrease in air volume flow, with the consequent increase in pressure shown in the figure and quantified by eq. 2. Such a fault intensity was assumed taking into account the selected FC specifications.
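The procedure can be sketched in Python as follows. The four normalized (FTP, flow) pairs are placeholders chosen only to give a decreasing fan curve; they are not the values read from [6], and the coefficients of eq. 2 are not reproduced here. With four points, a cubic regression coincides with the interpolating cubic, which is what the sketch builds; the polynomial order used in the paper may differ.

```python
# Hypothetical normalized (FTP, flow) pairs; NOT the values read from [6].
POINTS = [(0.55, 0.85), (0.75, 0.70), (0.88, 0.55), (0.95, 0.40)]


def flow_from_ftp(ftp, points=POINTS):
    """Normalized flow vs. normalized FTP: cubic through the four points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (ftp - xj) / (xi - xj)
        total += term
    return total


def ftp_for_flow(target_flow, lo, hi, tol=1e-9):
    """Bisection on [lo, hi]; assumes the target flow is bracketed there."""
    f_lo = flow_from_ftp(lo) - target_flow
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = flow_from_ftp(mid) - target_flow
        if abs(f_mid) < tol:
            return mid
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


ftp_nominal = 0.75                   # placeholder nominal point on the curve
flow_nominal = flow_from_ftp(ftp_nominal)
flow_faulty = 0.7 * flow_nominal     # 30% flow reduction, as in the text
ftp_faulty = ftp_for_flow(flow_faulty, lo=0.88, hi=0.95)
print(ftp_nominal, round(ftp_faulty, 3))
```

The faulty operating point sits at a lower flow and a higher pressure than the nominal one, which is the qualitative behavior described for Fig. 2.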

Sub-model of efficiency loss fault
Efficiency loss is a phenomenon that inevitably occurs in every type of machine or device, so a diagnostic tool that detects it before harmful consequences arise is useful. In this case, it was necessary to approximate the efficiency vs. volumetric flow relationship (see eq. 3), valid at the rpm at which the FC under investigation is operated, using the same procedure adopted in the previous paragraph for the pressure-increase vs. volumetric flow relationship (see eq. 2).
The graphical representation of the efficiency function at fixed rpm is shown in Fig. 3. The approximation was obtained with a rational function (see eq. 3): as can be seen from the above-mentioned graph, the total efficiency sub-model guarantees an excellent level of accuracy in estimating performance.
This type of fault was simulated by multiplying the above-introduced total efficiency function by a positive real number smaller than one, which compresses the modeled curve shown in Fig. 3 along the vertical axis. In the following, a 20% efficiency penalization was assumed to simulate such a faulty condition. Such a penalization can be representative of a number of causes, ranging from prime-mover-dependent losses (in which case the penalization should rather be applied to ηel in eq. 1) to specific fan losses, such as aerodynamic, acoustic and bearing losses [10].
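The effect of this penalization on the absorbed power can be illustrated with the nominal data given earlier. The sketch assumes, as a working hypothesis, that eq. 1 has the usual fan power form PFC = VFC·FTP/(ηt·ηel) and that the operating point (flow and pressure) is unchanged by the fault; the function and variable names are illustrative.

```python
def fc_power(v_flow_m3s, ftp_pa, eta_t, eta_el):
    """Electric power [W], assuming P = V * FTP / (eta_t * eta_el)."""
    return v_flow_m3s * ftp_pa / (eta_t * eta_el)


# Nominal values quoted in the text.
V_N, ETA_T_N, ETA_EL, P_N = 3.1, 0.75, 0.95, 2590.0
FTP_N = P_N * ETA_T_N * ETA_EL / V_N   # nominal FTP from the assumed eq. 1

K_FAULT = 0.8  # 20% efficiency penalization applied to eta_t

p_nominal = fc_power(V_N, FTP_N, ETA_T_N, ETA_EL)
p_faulty = fc_power(V_N, FTP_N, K_FAULT * ETA_T_N, ETA_EL)
increase_pct = 100.0 * (p_faulty - p_nominal) / p_nominal
print(f"{p_nominal:.1f} W -> {p_faulty:.1f} W (+{increase_pct:.0f}%)")
```

Because efficiency appears in the denominator, a 20% efficiency penalization corresponds to a 25% increase in absorbed power at the same flow and pressure (from 2590 W to about 3237.5 W).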

Fault signature matrix
The fault detection and isolation process was carried out using an appropriate fault signature matrix (FSM, see Table 1 [11]), developed by applying the fault tree analysis approach adopted in [12]: it reports the errors estimated via eqs. 4 and 5.
As can be seen in Table 1, in the case of obstruction, both errors must be higher than zero, whereas in the case of efficiency loss the error in terms of TLC room temperature (T [°C]) is expected to be negligible. It is worth mentioning here that the variable Pcooling represents the overall absorption of cooling systems to meet current cooling demand.
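The isolation logic described above can be sketched as a lookup against the two signatures. The percent-error definition, the thresholds and all names below are assumptions made for illustration; the exact form of eqs. 4 and 5 and the entries of Table 1 are not reproduced here.

```python
# Signature: (temperature error significant?, cooling power error significant?)
SIGNATURES = {
    "F1 (obstruction)": (True, True),
    "F2 (efficiency loss)": (False, True),
}


def percent_error(faulty, nominal):
    """Relative deviation of the faulty value from the nominal one, in percent."""
    return 100.0 * (faulty - nominal) / nominal


def isolate(err_temp_pct, err_power_pct, thr_temp=1.0, thr_power=1.0):
    """Map the two residuals to a fault label (thresholds are hypothetical)."""
    symptom = (abs(err_temp_pct) > thr_temp, abs(err_power_pct) > thr_power)
    if symptom == (False, False):
        return "no fault"
    for name, signature in SIGNATURES.items():
        if signature == symptom:
            return name
    return "unknown"


# Hypothetical readings: (T [°C], Pcooling [kW]) in nominal and faulty runs.
print(isolate(percent_error(26.0, 25.0), percent_error(2.0, 2.59)))
print(isolate(percent_error(25.1, 25.0), percent_error(3.2375, 2.59)))
```

A symptom pattern that matches no row, such as a temperature deviation without a power deviation, is reported as unknown rather than forced onto one of the two faults.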

Results
The diagnosis phase was based on the graphs in Fig. 4 and Fig. 5, produced by simulations in the absence of faults (i.e., time interval 0-100 hours) and in their presence (i.e., from 100 hours on). The trends simulated in the event of a fault (red curves) for the room temperature (Fig. 4.a and Fig. 5.a) and the absorbed power (Fig. 4.b and Fig. 5.b) are compared to those simulated in normal operating conditions (blue curves). The graphs in Fig. 4.c/Fig. 5.c and Fig. 4.d/Fig. 5.d show the resulting differences in temperature and absorbed power for the two simulated faulty operations. As for the obstruction (i.e., F1 in Table 1), both errors (see eqs. 4 and 5) were higher than zero: a decrease in volumetric flow implies in this case a lower absorbed power (because of the higher ηt and lower VFC induced by the obstruction, see Fig. 2), which might seem a positive factor. Nevertheless, such a malfunction lowers the cooling capacity delivered to the TLC room since, as a consequence of the adopted open-loop thermostatic control, a reduction in VFC inevitably causes T to increase, which could damage the electronic components. Moreover, the different cold air mass flow processed by the FC changes the FC on-off sequence with respect to the unfaulty case, which in turn causes the cooling power trajectory to vary as well. Such a different trajectory (see Fig. 4.b) explains why the percent errors of Fig. 4.d and Fig. 5.d differ significantly from each other: the former exhibits much higher peaks, due to the different schedule induced by the adopted thermostatic control logic, as compared to the unfaulty simulation. The trends shown in Fig. 4 for F1 therefore comply with the physical considerations that led to the fault signature matrix of Table 1.
Moving to F2 (i.e., efficiency loss in Table 1), such a malfunction essentially increases the power absorbed by the fan when processing the same volumetric flow (i.e., VFC,n, see section 3.1) as in nominal conditions. Therefore, only the error relating to the cooling power (E2 in eq. 4) drifts from zero. Fig. 5 shows the graphs obtained by simulating the FC fan efficiency loss. In this case, as mentioned above, the temperature error is close to zero, while the error on the simulated faulty Pcooling trajectory differs significantly from zero, thus confirming the validity of the proposed fault signature matrix (FSM, see Table 1) to detect and isolate different fan-related faults.
Going into the details, it is worth remarking that the loss of efficiency, caused for example by greater mechanical friction in the bearings, increases the degradation of mechanical energy into thermal energy, which is partially transferred to the fluid, thus producing an increase in temperature. However, the comparison between the FC supply temperature in the two cases (with and without fault) showed a substantially negligible difference (i.e., 0.17 °C), thus fully supporting the choice of considering the temperature error (see eq. 5) close to zero in case of F2. The above-mentioned average temperature difference was evaluated by means of well-known thermodynamic principles and properties applied to centrifugal fans [7].
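The order of magnitude of this figure can be checked with a simple energy balance. The sketch assumes that eq. 1 has the usual fan power form (so that a 20% efficiency penalization at constant flow and pressure raises the absorbed power by a factor 1/0.8), that all of the additional power is released as heat into the air stream, and that dry air has a density of about 1.2 kg·m⁻³ and a specific heat of about 1005 J·kg⁻¹·K⁻¹. None of these assumptions is stated in the source, so the check is indicative only and is not necessarily the calculation performed by the authors.

```python
RHO_AIR = 1.2     # assumed air density [kg/m^3]
CP_AIR = 1005.0   # assumed specific heat of dry air [J/(kg K)]

V_N = 3.1         # nominal volumetric flow from the text [m^3/s]
P_N = 2590.0      # nominal absorbed power from the text [W]
K_FAULT = 0.8     # 20% efficiency penalization


def supply_temperature_rise(extra_power_w, v_flow_m3s,
                            rho=RHO_AIR, cp=CP_AIR):
    """Temperature rise [K] if extra_power_w is fully dissipated in the air."""
    return extra_power_w / (rho * v_flow_m3s * cp)


extra_power = P_N / K_FAULT - P_N            # additional absorbed power [W]
delta_t = supply_temperature_rise(extra_power, V_N)
print(f"Extra power: {extra_power:.1f} W, temperature rise: {delta_t:.2f} K")
```

Under these assumptions the estimated rise is about 0.17 K, of the same order as the difference reported above.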

Conclusions
A model-based diagnostic procedure was developed, aiming in particular at meeting both free-cooler-related energy efficiency targets and personnel and equipment safety targets in TLC sites. The procedure was developed by suitably combining physical knowledge of the process under study with experimental data and information provided by telecom operators, as well as with an available dynamic model of TLC room thermal dynamics. The latter tool was exploited to validate the proposed faults-to-symptoms correlations through realistic scenario analyses. The ability to distinguish between faults of the device (i.e., the fan) and of its ancillaries (e.g., air filters) proves the efficacy of the proposed technique in improving, on the basis of easy-to-monitor data (i.e., temperatures and power absorptions), current energy performance and air quality in industrial environments.