Prognostic techniques for aeroengine health assessment and Remaining Useful Life estimation

Predictive maintenance is the latest frontier in the management and maintenance of many industrial assets, including aeroengines. Made possible by advances in monitoring equipment and machine learning algorithms over recent decades, it permits individual-based maintenance schedules built on performance monitoring and on estimates from diagnostic and prognostic techniques, applied either on the ground or in real time. Compared with traditional maintenance strategies, which may suffer from unanticipated failures or unnecessary maintenance and therefore higher operational costs, predictive maintenance reduces operational costs and optimizes asset usage. In this study, Remaining Useful Life (RUL) is estimated for different turbofan engines, based on historical individual and fleet data made available by the Prognostics Center of Excellence at NASA. The design of Prognostics and Health Management (PHM) algorithms first requires an analysis of the available data to identify which parameters are effectively related to equipment degradation and hence could be useful in determining future system evolution and predicting failure. In particular, RUL prediction for test engines suffering from a high pressure compressor fault with an exponential degradation trend has been carried out with both regression and Artificial Neural Networks (ANNs). Different regression models and neural network architectures have been compared, namely tree regression with different levels of tree depth, Gaussian Process Regression (GPR) with different kernel functions and Multilayer Perceptron (MLP) with one to three hidden layers and a varying number of nodes. The objective is to demonstrate the capability of such machine learning algorithms to predict engine failure, and thus their importance in supporting predictive maintenance planning, and to evaluate the quality of results in relation to the algorithm structure.
Results show comparable performance in terms of Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) of predicted with respect to actual RUL; in particular, the predictions obtained with the multilayer perceptron prove to be the most accurate, with an RMSE of 17.38 and an MAE of 12.50.


INTRODUCTION
Reliability, availability, safety and maintenance cost effectiveness have been an important concern in many industries.
For this reason, predictive maintenance is one of the key elements of Industry 4.0, since it guarantees high equipment availability and reduced downtime, allowing for optimal exploitation of the item itself and of its maintenance process, and thus reduced operational costs.
Predictive maintenance relies on Prognostics and Health Management (PHM) systems, which provide the overall health state of machines or complex systems and assist in making correct decisions on machine maintenance.
The main duties of PHM technology are monitoring of key features, assessing engine health, identifying and isolating potential faults and establishing degradation trends to be used to predict the engine remaining useful life (RUL).
Starting from either a fault detection or a degradation trend, prognostic models can be developed to predict future evolution scenarios of the engine state until a predefined threshold of acceptability.
This can be intended as the point at which the impact of the potential fault on the system performance is no longer tolerable or as the remaining useful life of life limited parts.
In fact, health monitoring plays the preliminary role of monitoring system performance and identifying possible faults or abnormal behavior, so that a future perspective can be delineated by means of prognostic algorithms. This in turn permits a different maintenance approach, in which a maintenance action is taken when necessary but always before failure occurs: in this way, the system can be exploited to the maximum and the maintenance operation can be planned in an optimal manner.
In the present paper, fleet data of several turbofan units available at NASA PCoE Data Repository are analyzed and employed to implement different predictive models by means of three distinct machine learning algorithms, namely Tree Regression, Gaussian Process Regression and Neural Networks, whose scope is the prediction of Remaining Useful Life of test engines.
Algorithm performance, in terms of RMSE and MAE of predicted with respect to actual RUL values, is reported for varying algorithm parameters.
The most common practice in engine prognostics algorithms is to fuse the data collected from the engines into a single health index and use it as the target of the models to be trained. Following this approach, Kang et al. [1] have already demonstrated the high potential of MLP in predicting failure.
A similarity-based approach has been employed in [2] and [3]; several neural network solutions have been used for prognostic purposes, such as Recurrent Neural Networks in [4], Deep Convolutional Neural Networks in [5] and Long Short-Term Memory in [6]. However, such algorithms have a much higher computational cost than MLP. Thus, this study aims to develop an MLP for RUL prediction trained with separate engine data and to compare algorithm performance in relation to its architectural parameters.
As for regression-based prognostics, Taha et al. [7] compare different regression methodologies, ranging from linear regression to random forest, while Tree Regression is used in [8]. In the present study, the potential of Tree Regression and the effect of tree development on results are analyzed.
Finally, Gaussian Process Regression appears to be a novelty in engine prognostics, since its previous applications mainly concern different equipment ([9], [10]).

DATASET AND ANALYSIS
The present work focuses on data processing and on training a model for RUL prediction of several individuals from a fleet of similar turbofan engines, on the basis of data made available at the NASA PCoE Repository.
Data is extracted from datasets FD001 and FD002, each of which contains a training dataset and a test dataset; the actual RUL values of the test units are also given.
The training dataset is made of degradation trajectories of different units until failure occurrence, while the test dataset consists of degradation trajectories of other units interrupted at an unknown time prior to failure occurrence.
For FD001, both training and testing datasets contain 100 degradation trajectories, while for FD002 260 training engines and 259 testing engines are given.
Degradation trajectories are obtained through simulations of a turbofan engine model in C-MAPSS (Commercial Modular Aero Propulsion System Simulation). When building the model to be simulated, the user inserts a degradation law that can be applied to each of the rotating components (fan, LPC, HPC, HPT, LPT), to simulate performance decay during engine life.
Each of the modules can be characterized by 3 types of degradation: efficiency loss, flow capacity loss and pressure ratio loss.
In particular, the data in datasets FD001 and FD002 were obtained from simulations run with different exponential degradation trajectories affecting only the HPC (this form was chosen since it reflects common degradation trends experienced in practice). The non-dimensional degradation d is expressed as an exponential function of time t with randomly chosen coefficients a and b (0.001 ≤ a ≤ 0.003 and 1.4 ≤ b ≤ 1.6). Moreover, an initial deterioration not higher than 0.01 is added, to account for the initial wear ascribable to manufacturing inefficiencies that is commonly observed in real systems.
The output of the simulations carried out with the above-described deteriorated models is given as a time series of the parameters listed in Table 1, corrupted with a certain amount of random measurement noise [11]. Since the datasets are very large and noisy, they should be properly pre-processed before being used to build an accurate predictive model.
In fact, not all available data are necessarily useful for prognostic purposes. Hence, at first parameters assuming constant or nearly constant values throughout the engines' lives were discarded from training datasets, since they carry no information about performance decay.
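The screening step described above can be sketched as follows. This is a minimal illustration, assuming a standard-deviation threshold as the criterion for "nearly constant"; the function names, the tolerance `tol` and the sample readings are hypothetical and not taken from the study.

```python
from statistics import pstdev

def informative_columns(rows, tol=1e-6):
    """Return the indices of columns whose standard deviation exceeds tol.

    The threshold tol is an assumed criterion for "nearly constant".
    """
    columns = list(zip(*rows))  # transpose: one tuple per sensor
    return [i for i, col in enumerate(columns) if pstdev(col) > tol]

def keep_columns(rows, keep):
    """Keep only the selected column indices in every row."""
    return [[row[i] for i in keep] for row in rows]

# Illustrative readings (made up): the first sensor never changes.
readings = [
    [518.67, 641.82, 1589.70],
    [518.67, 642.15, 1591.82],
    [518.67, 642.35, 1587.99],
]
keep = informative_columns(readings)
filtered = keep_columns(readings, keep)
```

In practice the threshold would have to be tuned to the scale of each sensor, since a fixed tolerance treats all units alike.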
As for FD001, the degradation of all engines in FD002 is ascribable to HPC failure. The main difference is that the FD002 engines do not operate at a unique combination of settings: 6 different operating conditions are present. It is therefore necessary to first identify the operating condition clusters and to associate each instance with its cluster, so that all measurements can be normalized with respect to the mean and standard deviation of the corresponding cluster. In this manner, it becomes possible to compare sensed parameters independently of the operational settings.
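A minimal sketch of this per-condition normalization is given below. It assumes that operating conditions can be identified simply by rounding the operational settings, which stands in for whatever clustering method was actually used; the function name, the `decimals` parameter and the sample values are illustrative.

```python
from collections import defaultdict
from statistics import mean, pstdev

def normalize_by_regime(settings, values, decimals=0):
    """Standardize each value with the mean and std of its operating regime.

    Assumption: a regime is identified by rounding the operating settings,
    used here as a stand-in for a proper clustering step.
    """
    keys = [tuple(round(s, decimals) for s in row) for row in settings]
    groups = defaultdict(list)
    for key, v in zip(keys, values):
        groups[key].append(v)
    stats = {k: (mean(v), pstdev(v)) for k, v in groups.items()}
    normalized = []
    for key, v in zip(keys, values):
        mu, sigma = stats[key]
        # A regime with zero spread carries no information: map it to 0.
        normalized.append((v - mu) / sigma if sigma > 0 else 0.0)
    return normalized

# Two made-up regimes with very different raw sensor levels.
settings = [(0.0, 100.0), (42.0, 100.0), (0.0, 100.0), (42.0, 100.0)]
sensor = [10.0, 50.0, 12.0, 54.0]
z = normalize_by_regime(settings, sensor)
```

After normalization, the two regimes share a common scale, so a trend within one regime can be compared with a trend within the other.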
All significant parameters retained for further analysis show a visible degradation trajectory towards end of life, with a common trend for all sensors except sensor 9 and sensor 14 (Fig. 1).

Proposed Methodology
After parameter selection, the remaining parameters need to be filtered to reduce the impact of noise on the subsequently implemented algorithms. A moving mean filter has been employed for this purpose.
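As an illustration, a moving mean filter can be sketched as below. The source does not state the window length or whether the window is centered, so this sketch assumes a trailing window of `window` samples that shrinks at the start of the series; both choices are assumptions.

```python
def moving_mean(signal, window=5):
    """Trailing moving mean; the first samples average what is available.

    The trailing window and its length are assumptions, not values
    reported in the study.
    """
    smoothed = []
    running = 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= window:
            running -= signal[i - window]  # drop the sample leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed

smoothed = moving_mean([1, 2, 3, 4, 5], window=3)
```

A trailing window uses only past samples, which keeps the filter applicable to test trajectories that end at the current cycle.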
Filtered sensor measurements retained after preliminary analysis will be used as inputs to train a model capable of predicting desired responses for a set of different data.
Of course, the response of interest (i.e. the output of the models) is the remaining useful life, at first simply defined as the progressive time to failure, or

$$\mathrm{RUL}(t) = t_f - t$$

where t is the current cycle and t_f the cycle at which failure occurs. However, performance decay is not appreciable in the early stage of an engine's life, and becomes much more evident and steep after some degree of usage is reached.
This would result in very low accuracy when predicting RUL for test engines which have run only a few cycles, since their degradation path is not noticeable yet.
This issue can be dealt with by modeling a piecewise linear RUL function, which shows a constant output at the beginning of the equipment's service followed by a linear segment preceding failure, reflecting real behavior.
This means that, once the point at which the RUL knee appears has been identified, a constant value is attributed to RUL for each cycle before the knee, as supported by many literature sources [4, 12, 13].
In the present work, it has been assumed that degradation becomes manifest in the last 125 cycles before failure, resulting in a RUL function shaped as in the picture below (Fig.2).

FIGURE 2. Piecewise linear RUL for each engine in training dataset.
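The piecewise linear target can be written compactly as a capped countdown. The sketch below assumes cycles are numbered from 1 and that RUL is 0 at the last recorded cycle of a training engine; the function name and this indexing convention are illustrative, while the cap of 125 cycles is the value stated above.

```python
def piecewise_rul(n_cycles, cap=125):
    """RUL labels for cycles 1..n_cycles of an engine failing at n_cycles.

    The label is the linear time to failure, clipped at `cap` so that it
    stays constant before the knee.
    """
    return [min(n_cycles - t, cap) for t in range(1, n_cycles + 1)]

labels = piecewise_rul(200)
```

For an engine that runs 200 cycles, the label stays at 125 for the first 75 cycles and then decreases by one per cycle until it reaches 0 at failure.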
The first approach consists of training several regression models in Matlab Regression Learner, with a 25% holdout validation method. This means that only 75% of the training data fed into the algorithm is used to train the model, while the remaining 25% serves to test and thus validate it.
The most suitable regression models for the problem at hand, in terms of performance in predicting validation sets, appear to be either Tree Regression or Gaussian Process Regression.
Tree based models split the data multiple times according to certain cutoff values in the features. Through splitting, different subsets of the dataset are created, with each instance belonging to one subset.
The final subsets are called terminal or leaf nodes and the intermediate subsets are called internal nodes or split nodes [14].
To predict the outcome in each leaf node, the average outcome of the training data in this node is used:

$$\hat{y} = \hat{f}(x) = \sum_{m=1}^{M} c_m \, I\{x \in R_m\} \qquad (3)$$

with M subsets, c_m a constant for each subset and I{x∈R_m} the indicator function that returns 1 if x is in the subset R_m and 0 otherwise. Each instance falls into exactly one leaf node (that is, one R_m). If an instance falls into a leaf node R_l, the predicted outcome is c_l, where c_l is the average of all training instances in leaf node R_l.
Splitting points are chosen so as to minimize squared error of predictions in the two subsets identified with the split.
A medium tree with a minimum leaf size of 12, coarse trees with minimum leaf sizes of 36 and 50, and an optimizable tree have been trained. Although a finer tree could fit the training data better, trees with too small a leaf size may overfit and perform worse when applied to a test dataset.
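The split criterion and the role of the minimum leaf size can be illustrated on a single feature. The following is a simplified sketch of one split of a regression tree, not the Matlab implementation; the function names and the toy data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y, min_leaf=1):
    """Return (threshold, left_mean, right_mean) minimizing squared error.

    Candidate splits leaving fewer than min_leaf samples on a side are
    skipped. Returns None when no admissible split exists.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return None if best is None else best[1:]

def predict(x_new, split):
    """Predict with the mean of the leaf the new instance falls into."""
    threshold, left_mean, right_mean = split
    return left_mean if x_new <= threshold else right_mean

# Toy data: one feature against a capped RUL-like response.
x = [1, 2, 3, 10, 11, 12]
y = [125, 125, 125, 20, 10, 0]
split = best_split(x, y, min_leaf=2)
```

A full tree applies this search recursively to each resulting subset; a larger `min_leaf` stops the recursion earlier and yields a coarser tree.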
Then, a Gaussian Process Regression with both exponential and squared exponential kernel functions is modeled. Gaussian Process Regression is a non-parametric regression which incorporates a Bayesian approach in the creation of the model. Given the model

$$y_i = f(x_i) + \varepsilon$$

with ε additive Gaussian noise with zero mean and variance σ², the distribution of the latent functions f(x_i) is assumed to be a Gaussian process, which means that P(f | x_1,…,x_n) = N(0, K). K is the covariance matrix obtained from the covariance function (kernel) k. A common shape for the kernel function is the squared exponential, expressed by the following equation:

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left[-\frac{1}{2}\,\frac{(x_i - x_j)^T (x_i - x_j)}{\sigma_l^2}\right]$$

where σ_f² is the prior variance and σ_l a length scale parameter (θ defines the vector of kernel parameters). Another possible assumption for the kernel function is the exponential kernel, defined as

$$k(x_i, x_j \mid \theta) = \sigma_f^2 \exp\left(-\frac{r}{\sigma_l}\right)$$

where r is the Euclidean distance between x_i and x_j. The model is trained when its hyperparameters maximize the likelihood of the training dataset, P(Y | X, θ). Once a mean function and the kernel function of the prior distribution have been assumed, acquiring new data X* allows the posterior distribution of possible fitting functions to be updated in a Bayesian perspective, that is:

$$\begin{bmatrix} Y \\ f^{*} \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K(X, X) + \sigma^2 I & K(X, X^{*}) \\ K(X^{*}, X) & K(X^{*}, X^{*}) \end{bmatrix}\right)$$

which represents the joint distribution over the observed Y and f*, and consequently allows the conditional probability P(f* | Y, X, X*) to be calculated.

The second method employed to predict the RUL of test engines resorts to ANNs. Different network architectures were built and trained for datasets FD001 and FD002: in particular, four multilayer perceptrons with one hidden layer and a varying number of neurons and a cascade-forward network were applied to dataset FD001, while FD002 was tested with four different MLPs with one hidden layer, three MLPs with two hidden layers and an MLP with three hidden layers.
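Regarding the Gaussian Process Regression model, the conditional distribution of the predictions has a well-known closed form. Under the zero-mean prior and the Gaussian noise of variance σ² described above, conditioning the joint Gaussian on the observed responses Y gives the following standard result, reported here as a reference sketch in the notation K(·,·) for kernel matrices:

```latex
f^{*} \mid Y, X, X^{*} \sim \mathcal{N}\left(\mu^{*}, \Sigma^{*}\right)

\mu^{*} = K(X^{*}, X)\,\left[K(X, X) + \sigma^{2} I\right]^{-1} Y

\Sigma^{*} = K(X^{*}, X^{*}) - K(X^{*}, X)\,\left[K(X, X) + \sigma^{2} I\right]^{-1} K(X, X^{*})
```

The mean μ* serves as the point prediction, while the diagonal of Σ* quantifies the uncertainty attached to each prediction.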
Data fed as input is randomly split into a training, a validation and a testing portion. The default values for the training, validation and testing fractions are 0.7, 0.15 and 0.15.
All previously mentioned networks gave an MSE lower than 300 and an R-squared higher than 0.91 on the validation subset.
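A random split with these fractions can be sketched as follows. The function name, the fixed seed and the rounding rule for the subset sizes are assumptions made for illustration; they are not the toolbox's internal procedure.

```python
import random

def split_indices(n, fractions=(0.7, 0.15, 0.15), seed=0):
    """Randomly partition range(n) into train, validation and test indices.

    The test subset receives whatever remains after rounding the first two.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded for reproducibility
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(100)
```

Note that a row-wise random split places cycles of the same engine in different subsets; whether that is acceptable depends on the purpose of the validation.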
The recourse to higher complexity networks required an increment in training time, not accompanied by an equivalent improvement in performance: in fact, this consideration is supported by the common assumption that MLP with single hidden layer is a potential universal approximator for continuous mapping functions from one finite space to another [16].
The net is trained by backpropagation with the Levenberg-Marquardt algorithm, an iterative method used to solve nonlinear least squares problems: in order to minimize a given loss function (generally MSE), it updates the network parameters according to the following equation:

$$w_{k+1} = w_k - \left[J^T J + \mu I\right]^{-1} J^T e$$

where w_k is the vector of network parameters at iteration k, J is the Jacobian of the errors with respect to the parameters, the Hessian has been approximated with J^T J, e is the vector of the errors e_i associated with the i-th instance, and µ is the damping factor, whose value determines whether the algorithm behaves more like steepest descent (large µ) or like the Gauss-Newton method (small µ): µ is decreased after each successful step (reduction in the performance function) and is increased only when a tentative step would increase the performance function. In this way, the performance function is always reduced at each iteration of the algorithm [17]. Training stops when the minimum gradient magnitude is reached, when the maximum number of epochs (in the sense of iterations) is reached, or when µ exceeds its maximum value. Figures 3A, 3B and 3C depict flowcharts for the three reported algorithms.
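The damping logic can be illustrated on a problem far smaller than a neural network. The sketch below fits the single parameter a of the hypothetical model y = exp(a·t), so that J^T J and J^T e reduce to scalars; the model, the data and all numeric settings (initial µ, its increase and decrease factors, the stopping thresholds) are assumptions chosen for illustration.

```python
import math

def lm_fit(t, y, a=0.0, mu=1e-3, mu_dec=0.1, mu_inc=10.0, mu_max=1e10,
           min_grad=1e-10, max_epochs=100):
    """Levenberg-Marquardt fit of y = exp(a * t) for a single parameter a."""
    def errors(a_):
        return [math.exp(a_ * ti) - yi for ti, yi in zip(t, y)]

    def sse(e):
        return sum(ei * ei for ei in e)

    e = errors(a)
    for _ in range(max_epochs):
        jac = [ti * math.exp(a * ti) for ti in t]      # d e_i / d a
        grad = sum(j * ei for j, ei in zip(jac, e))    # J^T e
        if abs(grad) < min_grad:
            break                                      # gradient small enough
        jtj = sum(j * j for j in jac)                  # J^T J (scalar here)
        while mu <= mu_max:
            a_try = a - grad / (jtj + mu)              # damped update
            e_try = errors(a_try)
            if sse(e_try) < sse(e):
                a, e = a_try, e_try                    # successful step
                mu *= mu_dec
                break
            mu *= mu_inc                               # rejected: damp more
        else:
            break                                      # mu exceeded mu_max
    return a

# Noise-free synthetic data generated with a = 0.8 (made up).
t = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [math.exp(0.8 * ti) for ti in t]
a_hat = lm_fit(t, y)
```

Because a step is accepted only when it lowers the sum of squared errors, the loss never increases between accepted iterations, which is the property described above.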
In general, the error grows with increasing tree depth and with the number of hidden neurons in the networks, as can be seen from the plots in Figs. 4-7 and Tables 2-3; this confirms the reduced generalization capability of overly fine trees and of NNs with too many hidden nodes, as a probable consequence of overfitting during training.
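For reference, the two error metrics used to compare the models can be computed as in the following sketch. The sample RUL values are made up for illustration and are not results from the study.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean Absolute Error between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual and predicted RUL values for four test engines.
actual_rul = [112, 98, 69, 82]
predicted_rul = [100, 98, 75, 90]
error_rmse = rmse(actual_rul, predicted_rul)
error_mae = mae(actual_rul, predicted_rul)
```

RMSE penalizes large individual errors more heavily than MAE, which is why the RMSE is never smaller than the MAE on the same predictions.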

CONCLUSIONS
Within the framework of Industry 4.0, the implementation of accurate prognostic algorithms for failure prediction is a key factor in the transition from a time-based maintenance policy to condition-based maintenance, with the associated benefits of reduced maintenance costs, fewer equipment failures and unscheduled maintenance occurrences, and optimized exploitation of the equipment life cycle.