Prognostic Methods on Accelerator's Anode Voltage Regulator

This study investigated adaptive control, fault diagnostics and prognostics of the anode voltage regulator system at an ion implantation accelerator. The system was modeled as a 4th order AutoRegressive with eXogenous input (ARX) model, controlled by a Fuzzy Logic Controller (FLC). This model was then used as a basis for constructing and updating a fault diagnosis module and a failure prognostics module. To maintain the system's performance, the controller's response was continuously re-adjusted through an optimization scheme. A Failure Mode and Effect Analysis (FMEA) was conducted, resulting in five failure modes of the regulator system. Fault data were generated in MATLAB simulation to train a random forest fault classification engine. The optimal random forest classifier used 20 decision trees and achieved a fault diagnostics accuracy of 98.06%. A Hidden Markov Model (HMM) was constructed as the system's fault progression model based on the interaction between environmental conditions and controller actions. Particle filter and Bayesian inference methods were then employed to continuously update the HMM and predict the system's Remaining Useful Lifetime (RUL). The proposed methodology integrated adaptive fuzzy logic control, prognosis and failure diagnosis, allowing continual satisfactory performance of the voltage regulator system throughout its lifetime.


Introduction
An ion implantation accelerator is a device commonly used to fabricate semiconductor wafers by accelerating charged particles and impacting them onto a target material. The implantation process is highly affected by several critical factors, namely the ion beam's energy and intensity. These two variables are directly regulated by the electrical voltage applied to the accelerator's anode. As the implantation process typically takes a long time, it requires a control system which can adapt to external variations induced by voltage instabilities and component wear-out.
Various methods are available to maintain the system's performance by adjusting the control law according to changes in the system. Adaptive control [1] and Model Predictive Control (MPC) [2] are typical examples of these approaches. The adjustment flexibility is, however, typically designed to handle finite variations or nonlinearities encountered under normal operation. When the variation is due to degradations within the system, there exists a limit beyond which the controller cannot sufficiently adapt to maintain the desired performance requirements. It is, therefore, of strategic interest to characterize the degradation's evolution and predict when to perform corrective maintenance to ensure continuous satisfactory performance of the system.
Studies of prognostics give insight into the future fault evolution characteristics. An et al. [3] provided a review of the existing prognostic methods. Data-driven methods such as Neural Network (NN) and Gaussian Process (GP) regression use information from collected data to identify the characteristics of the damage state and predict its future state without using any particular physical model. Although simple and fast to implement, these methods do not give useful insight into the physical cause-effect relationship. Physics-based models, such as the Paris-Erdogan fatigue crack law [4], on the contrary, start from well-established physical laws governing the fault's causality. These models consequently provide intuitive prediction results, yet may be computationally expensive, especially for high-fidelity models.
Byington et al. [5] identified failure modes of an electromechanical flight control actuator and monitored the associated key parameters. They utilized a smoothing algorithm to extrapolate the associated parameters to the point of failure. This approach, however, did not consider the effect of parameter uncertainties on the Remaining Useful Life (RUL) estimate. Shin et al. [6] proposed a combined strategy of online monitoring and surveillance tests to predict the unavailability evolution of standby equipment. The method offers an integrated prediction approach applicable to both new and aged components. However, it is constrained specifically to devices in their standby mode, and no particular examination of a controlled system was done.
Few studies have addressed the integration of fault growth prognostics with adaptive control schemes. Bole et al. [7] addressed this issue by defining the fault growth physics with a Markov chain model as a function of environmental stresses and control loads. They further demonstrated the use of a dynamic programming technique to select control actions which minimize the risk that a continuous control load poses to the degrading component while minimizing deviation from the system's desired performance. While the approach was not validated experimentally on a real system, it provided valuable understanding of how exogenous and control loads, and their uncertainties, affect the evolution of faults.
E3S Web of Conferences 43, 01020 (2018), ASTECHNOVA 2017, https://doi.org/10.1051/e3sconf/20184301020
This study aims to create an adaptive controller, fault diagnostics, and prognostics scheme for the anode voltage control regulator of the ion implantation accelerator. The result is expected to ensure a continual satisfactory performance of the system throughout its lifetime. Since the adaptive control concept is already well established, in-depth analysis is instead given to identifying failure modes and modeling the fault growth propagation driven by environmental load variations and control actions.

Model structure
A schematic of the anode voltage control regulator is shown in Fig 1. When an operator inputs a specific setpoint through the PC interface, a Direct Current (DC) motor drives a variable transformer to scale the Alternating Current (AC) voltage from the grid. The resulting voltage is then fed to a motor generator which generates a high voltage of up to 2.5 kV. The transformer's output is likewise measured and scaled down as a feedback signal.
The first step in developing an adaptive controller is defining the model. Preliminary results showed that the system is linear throughout its operating range. It is convenient to model linear systems in the form of a transfer function derived from physical equations, which gives intuitive insight into the dynamic variations taking place inside the system. Therefore, the system can be simplified as in Fig 2, with differential equations describing the input-output relationship summarized in Table 1 [8].
After performing substitutions and Laplace transformations, the system was rearranged into blocks of first order transfer functions shown in Fig 3. Hence the system's output $\omega_L$ can be expressed as a function of the input variables as Eq. (1). By applying a low-pass filtration with a transfer function $D(s)$, equation (1) then becomes Eq. (2), where
$B'(s) = K_m G_1(s) G_2(s) G_3(s) G_4(s) D(s)$
and
$T'_d = T_d C(s) D(s) / A(s)$,
a noise spectrum with reduced correlation to the process function. It can be considered as an output or measurement noise spectrum.
Because the system is operated through a digital interface, the system represented by Eq. (2) can then be written as a discrete-time autoregressive model with exogenous input (ARX), Eq. (3).
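As a minimal sketch of what an ARX model computes, the block below simulates the generic textbook form $A(q^{-1})\,y(t) = B(q^{-1})\,u(t)$ without the noise term. The exact layout of Eq. (3) is not reproduced in the text, so the sign convention and the absence of an explicit delay term are assumptions. The coefficients are hypothetical second-order values chosen for brevity and are not the identified parameters of Table 3.

```python
def arx_predict(a, b, y_hist, u_hist):
    """One-step ARX prediction.

    y(t) = -a1*y(t-1) - ... - a_na*y(t-na) + b1*u(t-1) + ... + b_nb*u(t-nb)
    y_hist and u_hist hold the most recent samples first.
    """
    autoregressive = -sum(ai * yi for ai, yi in zip(a, y_hist))
    exogenous = sum(bi * ui for bi, ui in zip(b, u_hist))
    return autoregressive + exogenous


def simulate_arx(a, b, u):
    """Simulate the noise-free ARX response to the input sequence u."""
    y = []
    for t in range(len(u)):
        # Samples before t = 0 are taken as zero (system initially at rest).
        y_hist = [y[t - k] if t - k >= 0 else 0.0 for k in range(1, len(a) + 1)]
        u_hist = [u[t - k] if t - k >= 0 else 0.0 for k in range(1, len(b) + 1)]
        y.append(arx_predict(a, b, y_hist, u_hist))
    return y


# Hypothetical stable second-order coefficients (not the values of Table 3).
a = [-1.5, 0.7]
b = [0.1, 0.1]
step = [1.0] * 200
y = simulate_arx(a, b, step)

# Steady-state gain B(1)/A(1) that the step response should settle to.
dc_gain = sum(b) / (1.0 + sum(a))
```

The same recursion applies to a 4th order model by passing four coefficients in `a` and `b`.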

Model parameter estimation
A system identification methodology was used to obtain the ARX model's order and parameters. To find the model's order, a residual analysis was performed using two statistical criteria, namely the Akaike Information Criterion (AIC) and the Final Prediction Error (FPE). Both criteria were used because the former may at times overestimate the model's order while the latter may underestimate it [9]. When the model's order is correct, it minimizes the values of AIC and FPE calculated using Eq. (4) and (5).
where
$\sigma^2$ = variance of the residual signal
$p = n_a + n_b$, the number of model parameters
$N$ = number of data

[Table 1. Columns: Subsystem, Relations, Variables and units.]

Various recursive algorithms are available to estimate the ARX model's parameters once its order is found. Recursive Least Squares (RLS) is one of the widely used algorithms, where
$\hat{\theta} = [\hat{a}_1, \ldots, \hat{a}_{n_a}, \hat{b}_1, \ldots, \hat{b}_{n_b}]$, estimates of model parameters
$\lambda_1$ and $\lambda_2$ = forgetting factor variables
$u(t)$, $y(t)$ = input and output signal respectively
Because the system's degradations are slowly time-varying compared to other dynamics found in normal operating conditions, $\lambda_2$ was set to 1 while $\lambda_1$ was set between 0.95 and 0.99. These parameters introduced a bias in the covariance matrix F which enables a selective re-convergence of parameter estimates when the system slowly degrades while tolerating high-frequency variations such as noise. Other algorithms applicable to an ARX model are Instrumental Variable with Auxiliary Model (IVAM), Output Error with Fixed Compensator (OEFC) and Output Error with Adjustable Compensator (OEAC). These algorithms are not explained in detail in this paper due to their similarity with RLS. Interested readers are referred to reference [9] instead. All of these algorithms were used to generate a pool of models from which the one with the best validation results was selected.
Lastly, the identified model needs to be validated. Provided that the model is correctly identified, the residual signal, which is the difference between the system's and the model's outputs, is expected to be purely a real-world random noise spectrum having a zero mean with low correlation to the model's output. A significant correlation could indicate that the experiment is poorly designed, which causes divergence and random fluctuations of the model's parameters. Based on the aforementioned hypotheses, the tests were expressed as Eq. (9) and (10).
where M is the total count of events in which the residual signal crosses zero, ZC is the zero count criterion and NCC is the normalized cross-correlation spectrum. The acceptance criterion for the tests was taken at a 3% significance level under Gaussian hypothesis testing.
System identification and process control were done online through 8-bit analog-to-digital (ADC) and digital-to-analog (DAC) converters. A preliminary test indicated that the system's cutoff frequency is 5 Hz; hence, to comfortably satisfy the Nyquist criterion, the sampling frequency was set to 200 Hz.
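Returning to the order-selection step at the start of this section, the sketch below evaluates AIC and FPE in their common textbook forms, $\mathrm{AIC} = N \ln \sigma^2 + 2p$ and $\mathrm{FPE} = \sigma^2 (N + p)/(N - p)$. Eq. (4) and (5) are not reproduced in the text and may be normalized differently, so these forms are an assumption. The residual variances are hypothetical and only illustrate how the minimum is located.

```python
import math


def aic(residual_var, p, n):
    """Akaike Information Criterion, common form: N*ln(sigma^2) + 2p."""
    return n * math.log(residual_var) + 2 * p


def fpe(residual_var, p, n):
    """Final Prediction Error, common form: sigma^2 * (N + p) / (N - p)."""
    return residual_var * (n + p) / (n - p)


# Hypothetical residual variances for ARX orders 1..6 with n_a = n_b = order.
residual_vars = {1: 0.90, 2: 0.40, 3: 0.20, 4: 0.10, 5: 0.0999, 6: 0.0998}
N = 1000  # hypothetical number of data points

scores = {
    order: (aic(var, 2 * order, N), fpe(var, 2 * order, N))
    for order, var in residual_vars.items()
}
best_aic = min(scores, key=lambda k: scores[k][0])
best_fpe = min(scores, key=lambda k: scores[k][1])
```

Both criteria penalize the parameter count p, so an order is only preferred when it reduces the residual variance by more than the penalty it adds.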

Adaptive FLC
Based on the identified model, an adaptive FLC was constructed to meet the performance requirements, i.e., good corrective response to voltage instability, low maximum overshoot, low steady-state error, and small rise and settling times. Because the actuator is a Single Input Single Output (SISO) type system, the fuzzy inference approach was chosen to be Mamdani. To get a responsive controller action with a fast rise time, the Middle of Maximum (MOM) defuzzification method was selected. The bell-shaped curve was chosen as the fuzzification membership function, as can be seen in Fig. 4a. Due to the linearity of the system, the defuzzification membership function was also taken to be linear, as shown in Fig 4b.
Fig. 4. (a) Bell-shaped input membership function. (b) Control signal membership function.
Given that the controller's input is the error e(t) = y(t) - setpoint, the control rules were constructed as follows:
1. If e(t) is positive large, rotor is to rotate clockwise quickly
2. If e(t) is positive small, rotor is to rotate clockwise slowly
3. If e(t) is negative large, rotor is to rotate counterclockwise quickly
4. If e(t) is negative small, rotor is to rotate counterclockwise slowly
5. If e(t) is zero, rotor stops rotating
The input domain between $x_1$ and $x_2$ in Fig 4a is the range of input values in which the rotor's rotation is controlled carefully. When receiving input values in this region, the rotor rotates slower in proportion to the fuzzy input to meet the given setpoint. It is therefore adjusted to cover the range of expected exogenous disturbances during steady-state operations. This disturbance comes from the fluctuation in the grid's voltage, whose magnitude is between -10% and +5% of 220 V. Therefore $x_1$ was set to -26, which corresponds to -22 V, while $x_2$ was set to 13, which corresponds to 11 V.
The control actions are adapted to compensate for the actuator's degradations in order to maintain the system's performance. Thus the values are $1 \le y_1 < 100$ and $155 < y_2 \le 254$. Control values between 100 and 155 were not used since our preliminary test revealed that these values could not actuate the motor. As the system degrades, $y_1$ is reduced while $y_2$ is increased by a factor of y. The selection of y was made to minimize the differences between the desired system's performance and the ARX model's performance as in Eq. (12), where s(j) is the desired transient response performance while m(j) is the ARX model's predicted performance. Four performance characteristics were considered in this study, namely rise time, settling time, maximum overshoot and steady-state error.
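The five rules above can be sketched as a small Mamdani controller with bell-shaped input sets, min-clipping, max-aggregation and MOM defuzzification over the 8-bit control codes. Only $x_1 = -26$, $x_2 = 13$ and the unusable code range 100 to 155 come from the text. The membership widths and centres, the linear output ramps, the stop code 127, and the assumption that codes above $y_2$ turn the rotor clockwise are all hypothetical choices made for illustration.

```python
X1, X2 = -26.0, 13.0  # careful-control band limits from the text


def gbell(x, a, b, c):
    """Generalized bell membership: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def strengths(e):
    """Firing strength of each rule antecedent (hypothetical set parameters)."""
    return {
        "neg_large": 1.0 if e <= 2 * X1 else gbell(e, abs(X1), 2, 2 * X1),
        "neg_small": gbell(e, abs(X1) / 2, 2, X1 / 2),
        "zero": gbell(e, 3.0, 2, 0.0),
        "pos_small": gbell(e, X2 / 2, 2, X2 / 2),
        "pos_large": 1.0 if e >= 2 * X2 else gbell(e, X2, 2, 2 * X2),
    }


def ramp(u, lo, hi, rising):
    """Linear output membership on [lo, hi], zero elsewhere."""
    if not lo <= u <= hi:
        return 0.0
    if hi == lo:
        return 1.0
    frac = (u - lo) / (hi - lo)
    return frac if rising else 1.0 - frac


def output_sets(u, y1, y2):
    """Consequent membership of control code u, keyed by the rule antecedent."""
    return {
        "neg_large": ramp(u, 1, y1, rising=False),   # counterclockwise, fast
        "neg_small": ramp(u, 1, y1, rising=True),    # counterclockwise, slow
        "zero": 1.0 if u == 127 else 0.0,            # stop (inside the dead band)
        "pos_small": ramp(u, y2, 254, rising=False),  # clockwise, slow
        "pos_large": ramp(u, y2, 254, rising=True),   # clockwise, fast
    }


def flc(e, y1=99, y2=156):
    """Mamdani inference with Middle of Maximum defuzzification."""
    w = strengths(e)
    aggregated = []
    for u in range(256):
        out = output_sets(u, y1, y2)
        aggregated.append(max(min(w[k], out[k]) for k in w))
    peak = max(aggregated)
    maxima = [u for u, m in enumerate(aggregated) if m >= peak - 1e-9]
    return (maxima[0] + maxima[-1]) / 2.0
```

Calling `flc(6.5)` with the default limits returns the slowest clockwise code, while `flc(6.5, y1=50, y2=205)` returns a stronger command for the same error, which mirrors how reducing $y_1$ and increasing $y_2$ compensates for degradation.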

Fault modes and diagnostics
In this study, failure was defined as operation events whose transient responses do not satisfy the performance requirements. Possible failure modes as listed in Table 2 were identified from references [5], [10], [11] and from analyzing the system's technical specification. Destructive or potentially fault-inducing tests were not done on the actuator system to preserve its condition. System failures were simulated in a MATLAB computational environment by modeling the system in Simulink. Since the actuator's performance is maintained by the FLC, failures were simulated by setting the controller's responses $y_1$ and $y_2$ to their maximum responsive level and sequentially altering the values of the key variables in Table 2 until the transient response characteristics exceeded the performance specifications. The value of each variable at that point was then noted as the failure limit for the corresponding failure mode.
Root locus analysis was used to diagnose the modes of failure and measure their severity [12]. Each failure mode is expected to produce a unique pattern in the system's natural frequency and damping factor, which is reflected through the change in the model's pole-zero pairs within the unit circle of the complex z-plane. These pole-zero patterns were observed by simulating the faults with increasing severities. The results were used to train a random forest classifier to enable real-time pattern recognition and fault diagnosis. The trained classifier was then tested against another set of failure data to check its classification performance.
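The link between a pole's position in the z-plane and the natural frequency and damping factor can be sketched with the standard mapping $z = e^{s T_s}$. The sampling period follows from the 200 Hz sampling frequency stated earlier. The example pole is hypothetical, and the feature set actually fed to the random forest is not specified in the text, so treating these two quantities as classifier features is an assumption.

```python
import cmath
import math

TS = 1.0 / 200.0  # sampling period for the 200 Hz sampling frequency


def pole_features(z, ts):
    """Map a z-plane pole to (natural frequency in rad/s, damping ratio)."""
    s = cmath.log(z) / ts  # inverse of z = exp(s * ts), principal branch
    wn = abs(s)
    # By convention here, a pole at z = 1 (s = 0) is reported with damping 1.
    zeta = -s.real / wn if wn > 0 else 1.0
    return wn, zeta


# Hypothetical pole: natural frequency 5 Hz, damping ratio 0.5.
wn_true, zeta_true = 2.0 * math.pi * 5.0, 0.5
s_true = complex(-zeta_true * wn_true, wn_true * math.sqrt(1.0 - zeta_true ** 2))
z = cmath.exp(s_true * TS)

wn, zeta = pole_features(z, TS)
```

A fault that shifts a pole toward the unit circle shows up as a lower damping ratio, which is the kind of pattern a classifier can separate by fault mode and severity.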

Fault growth modeling and prognostics
When a fault mode is diagnosed, a prediction of its evolution to estimate the Remaining Useful Life (RUL) of the actuator system is desirable in order to schedule timely maintenance. The fault growth was modeled analytically to obtain a surrogate, probabilistic model. This approach was pursued instead of detailed physics-based degradation models to reduce computational time. Fast predictions allow for real-time prognostics and a real-time condition-based maintenance program. Furthermore, this approach averts the stacking of uncertainties from an overmodeling situation when multiple physics-based degradation models are used. The underlying principle for fault growth was adopted from the Eyring model, which is based on quantum mechanics principles [12]. It states that environmental stresses generate the activation energy needed to cross an energy barrier at the quantum level to initiate a reaction which creates a physical fault such as crack, deformation or oxidation. The environmental stress v(k) is characterized by deterministic and stochastic components, following [7]: ω(k) is the stochastic environmental condition which inflicts stresses upon the actuator system, while a deterministic multiplicative factor applied to ω(k) accounts for the more stringent control actions when faults are present. The stochastic condition ω is further classified into bounded and unbounded load variations, whose probability functions are given in equation (14) and equation (15) respectively. The variable n in equation (15) is the number of discrete states of ω. Bounded ω comes from variation of the environmental conditions where the actuator system is installed, i.e., temperature, pressure and humidity. Unbounded ω originates from freely-oscillating stressors with a uniform distribution over their normalized discrete states n. Examples of stressors which fall into this category are the accelerator's operation profile, demanded beam intensity, and electronic disturbances.
The armature and bearing are insulated by design, thereby allowing only unbounded stress variations to affect fault growth in these subsystems. The shaft and transformer, on the contrary, are exposed to the atmosphere, which causes both the bounded and unbounded stress variations to play a role in fault generation. Since the faults described in Table 2 are naturally irreversible, and because they are induced stochastically, the fault evolution satisfies a monotonically increasing function as illustrated in Fig. 5. It is expected that there are aleatoric uncertainties in the predicted damage level should a load above its minimum energy barrier be imposed. A noise term in the Markov Chain model as shown in Fig. 6, which is a similar practice to the reference studies [13,14], was then introduced to incorporate this uncertainty in the model. In contrast, the damage level is expected to remain the same when the load is below the minimum energy barrier. However, in order to accommodate incipient faults and epistemic uncertainties in the fault growth physics, a relatively small probability value C was additionally introduced which may lead to an increase in the DS. For this reason, the fault evolution mechanism was then expressed mathematically.
The failure is measured in terms of the FLC's capability to maintain the system's performance. Hence, it is desirable to map the relationship between the internal degradation state and the controller's capability to compensate for it. This fault compensation capability is reflected by the term Adaptive Space (AS), which is the remaining space between 0 and $y_1$ or 255 and $y_2$ in Fig 4b. Thus AS in this digital FLC setting has an integer value of 100 when the system is healthy, and 1 when the system fails. The objective of this prognostics module is, therefore, to continuously predict when the value of AS drops to 1. However, this was not expected to be a straightforward task.
Because the system identification method used as the basis of the adaptive FLC is a statistical method, it may not produce a perfect model. Some model incompleteness over the finite observation domain remained, which was deemed negligible within the bounds of the 3% Gaussian significance level. As a consequence, the variable AS as a function of DS was expected to be noisy. This phenomenon where DS, a stochastic state satisfying the Markov property, generates AS, another stochastic variable of observational interest, can be conveniently portrayed as a Hidden Markov Model (HMM) shown in Fig. 7. The DS transition probability $p_{i,j}$ was given in equation (19).
Simulations with the MATLAB Simulink model were carried out to obtain $p_{i,j}$ and $q_{k,l}$. Several variables were not available due to the absence of destructive experiments and were assumed in a reasonable manner, namely the activation energy threshold for each of the fault modes and the density functions. Afterwards, normalized bounded and unbounded environmental conditions based on equation (14) and equation (15) were generated. The fault indicators in Table 2 were then affected according to equation (19). These indicators were normalized in a piecewise manner into 100 discrete DS states. The indicators were fed to the closed-loop system model until the system failed. This run-to-failure data was then used to estimate $p_{i,j}$ and $q_{k,l}$ by using the Baum-Welch algorithm [15].
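The structure of this hidden Markov model, a monotone hidden DS that emits a noisy AS, can be sketched by generating one synthetic run-to-failure sequence. Only the 100 discrete DS states, the AS range of 1 to 100 and the uniform unbounded stress come from the text. The barrier value, the small probability standing in for C, the control-load factor, the mapping AS = 100 - DS and the noise model are hypothetical. The sketch generates data only and does not implement the Baum-Welch estimation.

```python
import random

N_STATES = 100   # DS is discretized into 100 states
BARRIER = 0.9    # hypothetical normalized activation threshold
C_SMALL = 0.01   # hypothetical small growth probability below the barrier


def step_ds(ds, rng):
    """One monotone DS transition driven by stress and control load."""
    omega = rng.randrange(1, 11) / 10.0            # unbounded stress, uniform over n = 10 states
    load = 1.0 + 0.5 * ds / (N_STATES - 1)         # hypothetical load factor, grows with damage
    if omega * load > BARRIER:
        inc = 1 + (1 if rng.random() < 0.1 else 0)  # growth plus an aleatoric noise term
    else:
        inc = 1 if rng.random() < C_SMALL else 0    # rare growth below the barrier
    return min(ds + inc, N_STATES - 1)


def observe_as(ds, rng):
    """Noisy Adaptive Space observation; noise grows as the system degrades."""
    noise = rng.gauss(0.0, 1.0 + 3.0 * ds / (N_STATES - 1))
    return max(1, min(100, round(100 - ds + noise)))


def run_to_failure(seed=0):
    """Generate DS and AS sequences until DS reaches its last state."""
    rng = random.Random(seed)
    ds, ds_path, as_path = 0, [0], [100]
    while ds < N_STATES - 1:
        ds = step_ds(ds, rng)
        ds_path.append(ds)
        as_path.append(observe_as(ds, rng))
    return ds_path, as_path


ds_path, as_path = run_to_failure(seed=0)
```

Because the load factor grows with DS, the stress crosses the barrier more often late in life, so the simulated damage accelerates toward failure.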
After the HMM's parameters were identified, another run-to-failure data sequence was generated where the DS and failure mode were not revealed. The fault diagnosis module identified the active failure mode, from which the prognostics module continuously updated the predictions of future AS probabilities. The prognostics module used the HMM as a prior model for DS evolution and corrected it with the observed AS. This was done numerically by using a particle filter algorithm. A set of 1000 particles was employed at each time step t to estimate future degradations, the corresponding AS and the RUL. The prediction result was then compared with the true DS to investigate its accuracy.
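A minimal bootstrap particle filter illustrates the predict, correct and resample cycle with 1000 particles, followed by an RUL forecast obtained by rolling every particle forward to failure. The transition model, the Gaussian observation model with AS = 100 - DS, and the 1.5% and 98.5% quantiles used for a 97% band are hypothetical stand-ins for the trained HMM and are not the paper's identified parameters.

```python
import math
import random

N_STATES = 100  # DS is discretized into 100 states


def step_ds(ds, rng):
    """Hypothetical monotone DS transition standing in for the trained HMM."""
    p_grow = 0.1 + 0.3 * ds / (N_STATES - 1)
    return min(ds + (1 if rng.random() < p_grow else 0), N_STATES - 1)


def likelihood(as_obs, ds, sigma=3.0):
    """Gaussian likelihood of an observed AS given DS (assumes AS = 100 - DS)."""
    return math.exp(-0.5 * ((as_obs - (100 - ds)) / sigma) ** 2)


def pf_update(particles, as_obs, rng):
    """One filter step: predict with the prior, weight by the observation, resample."""
    particles = [step_ds(p, rng) for p in particles]
    weights = [likelihood(as_obs, p) + 1e-300 for p in particles]
    return rng.choices(particles, weights=weights, k=len(particles))


def predict_rul(particles, rng, horizon=5000):
    """Roll each particle to failure; return (lower, median, upper) RUL in steps."""
    ruls = []
    for p in particles:
        steps = 0
        while p < N_STATES - 1 and steps < horizon:
            p = step_ds(p, rng)
            steps += 1
        ruls.append(steps)
    ruls.sort()
    n = len(ruls)
    return ruls[round(0.015 * (n - 1))], ruls[n // 2], ruls[round(0.985 * (n - 1))]


# Synthetic truth generated from the same hypothetical model.
rng = random.Random(1)
true_ds = 0
particles = [0] * 1000
for t in range(1, 201):
    true_ds = step_ds(true_ds, rng)
    as_obs = 100 - true_ds + rng.gauss(0.0, 3.0)
    particles = pf_update(particles, as_obs, rng)

est_ds = sum(particles) / len(particles)
rul_lo, rul_med, rul_hi = predict_rul(particles, rng)
```

The spread between `rul_lo` and `rul_hi` plays the role of the uncertainty band, and it narrows as more AS observations constrain the particle cloud.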

Results
The AIC and FPE values for an ARX model with increasing model order are shown in Fig. 8. AIC shows a continually decreasing trend. However, since FPE shows a minimum at the 4th order model, and a further increase of model order does not reduce AIC significantly, the model's order was decided to be 4. The model's parameters are given in Table 3. The $B(z^{-1})$ polynomial's parameters have low values because the input signal is converted from digital to analog. The model satisfied the validation tests. The multistep response characteristics of the FLC are shown in Fig. 9. The figure shows that the FLC responds satisfactorily to changes in the AC grid voltage.
The random forest classifier's performance matrix is given in Table 4. Headings of the matrix are set in numbers representing the fault modes and severity classes. The physical interpretation of these classes is shown in Table 5. The performance matrix approached that of an ideal classifier, which has diagonal matrix values of 1. It expresses the random forest's classification accuracy of 98.06%.
Results of the fault generation simulation and the ensuing AS are visualized in Fig 10. The exponentially increasing DS was caused by the fault acceleration effect due to the increasing controller's action as the system degrades. Furthermore, it is observed that the AS becomes noisier along with the DS. This might be caused by a decreasing Signal-to-Noise Ratio (SNR). As the system degrades, its output level decreases while the measurement noise remains the same. This phenomenon creates variations in the ARX model's parameters, which were propagated to the calculated AS.
Results of prognostics are given in Fig 11. The true AS and the estimated AS are plotted together for comparison. The predictions are updated at every timestep. Fig. 11 shows the results taken at the beginning, middle and end of the system lifetime period. The significance level on the predicted AS was taken to be 3%. Predictions made at t=1 describe the HMM's forecast performance given an initial calibration only. Although it gave a good RUL estimate, approaching the true end of life at t=514, the uncertainty band was relatively wide. A hybrid prognosis combining both the HMM's prior likelihood and online monitoring information gave a more accurate RUL estimate with reduced uncertainty, as shown at t=200 and t=400. Further reduction of uncertainty was observed as more AS data became available.
Fig 12 compares the performance characteristics of the system when this methodology was applied and when it was not. The comparison was started from when degradation was first observed. Although the steady-state error of the static FLC system started to exceed its threshold at t=348, the rise time and settling time had already deviated earlier at t=1. Therefore, in this case, the static FLC system failed at t=1. Increasing the controller's action inevitably increased the system's maximum overshoot. However, it remained below the prescribed threshold of 1%. The figures showed that the adaptive FLC was successful in maintaining the required system's performance despite degradations. Its lifetime was consequently extended for an additional 514 cycles.

Conclusion
This study developed an integrated approach to control the performance of the anode voltage regulator system at an ion implantation accelerator throughout its lifetime despite degradations. The approach relied on modeling the system empirically through a linear system identification method. The obtained model was a fourth order ARX model validated with a 3% significance level Gaussian hypothesis testing. A fault diagnostics module based on the random forest algorithm was developed to classify the fault mode and severity. It has a classification accuracy of 98.06%. An adaptive FLC was created to compensate for faults and maintain the system's performance. Results show that the performance was satisfactorily maintained for an additional 514 cycles, thereby prolonging the system's useful life. The progression of faults and its relation to the FLC's adaptive capability was characterized by a Hidden Markov Model. The trained model provided a prior likelihood for the system's RUL, which was then updated with the Bayesian method and a particle filter method. Results implied that this approach was successful in predicting the system-specific RUL with a 97% confidence level.