Gearbox Fault Prediction of Wind Turbine Based on Improved NEST Model

This paper studies a fault prediction method for wind turbine gearboxes. Grey relational analysis is used to select the modeling variables, and similarity analysis is used to give the sample data good integrity and low redundancy. This yields a reduced process memory matrix, which is used to train the improved nonlinear state estimation (NEST) model. When the gearbox fails, the model residual exceeds the threshold value and the model gives an early warning. The effectiveness and accuracy of the improved model are verified with the actual operation data of a wind turbine.


Introduction
The gearbox is a key component of a wind turbine. It transfers the torque generated by wind energy to the generator, where it is converted into electrical energy, and it steps up the low speed of the main shaft to the high speed required by the generator. Gearbox maintenance is complex: it requires a lot of equipment and places high demands on the weather. Compared with the electrical system and the main control system, which have high failure rates, the gearbox causes the longest downtime and the highest economic loss. Therefore, this paper studies gearbox fault prediction. Literature [1] compiles statistics on the various faults of wind turbines, and the results show that the gearbox is the component that causes the longest downtime and the most serious failures. Literature [2][3][4] summarizes the common condition monitoring methods for wind turbines. Literature [5] uses the nonlinear state estimation model to predict gearbox temperature, but it selects the observation variables by experience and lacks a theoretical basis. In addition, the samples in its process memory matrix are not optimized, which affects the timeliness of the model.
The gearbox bearing carries the main shaft and directly reflects the meshing state of the main shaft and the gears. Because of its particular working conditions and complex loads, the bearing is a component with a high failure rate. This paper uses the NEST model to monitor the gearbox bearing temperature. To address the weakness of selecting the observation vector by experience, it uses grey relational analysis to provide a theoretical basis. To ensure data integrity with little redundancy, it uses similarity analysis to construct the process memory matrix, which optimizes the samples. When the working state of the gearbox is abnormal, the residual of the prediction model and the root mean square of the residual increase. The residuals are analyzed statistically with a moving-window method and compared with a set threshold value, which provides early fault warning. Simulation and analysis with the SCADA data of a wind turbine show that the improved NEST model can effectively predict the bearing temperature and has good timeliness.

Selection of modeling variables
This paper is based on the SCADA data of a wind turbine. The rated power of the wind turbine is 2000 kW, the rated frequency is 50 Hz, and the rated wind speed is 11.5 m/s. The SCADA system records the average value of each parameter every 10 minutes. This paper selects 5050 sets of valid data for modeling. To establish the prediction model of the gearbox bearing temperature, the modeling variables are first selected by experience: ambient temperature, gearbox oil temperature, average wind speed and active power. The feasibility of these four parameters is then analyzed theoretically.
Gearbox oil temperature: the oil and the bearing are both inside the gearbox, and the gearbox has a heater and a cooler. When the oil temperature is high, the cooler works; when it is low, the heater works. The bearing temperature is therefore affected by the heater and the cooler.
Ambient temperature: there is a clear temperature difference between day and night at the site, and the bearing temperature is strongly affected by the ambient temperature.
Active power: the higher the active power, the higher the unit load and the higher the gearbox bearing temperature.
Average wind speed: the higher the wind speed, the faster the gearbox rotates and the higher the bearing temperature.

Similarity analysis
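The observation vector of the gearbox bearing temperature, $X(i)$, is constructed as:

$$X(i) = [X_1, X_2, X_3, X_4]^T \tag{1}$$

In formula (1), $X_1$ is the ambient temperature; $X_2$ is the gearbox oil temperature; $X_3$ is the average wind speed; $X_4$ is the active power.
The process memory matrix U is constructed from the observation vectors:

$$U = [X(1), X(2), \ldots, X(m)]$$

Here m is the number of observation vectors (m = 5050) and n is the number of modeling variables (n = 4), so U is an n × m matrix.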
The SCADA system of a wind turbine contains a large amount of similar data, which increases the amount and time of calculation. Similarity analysis can effectively solve this problem. First, data recorded while the gearbox is in its normal working state are selected as observation vectors. Then the similarity between every pair of observation vectors is calculated. Finally, the similarity matrix R is obtained. Figure 1 is the schematic diagram. The closer the similarity value is to 1, the more information the two observation vectors share. For such a pair, this paper eliminates one of the two samples, which simplifies the sample set while preserving the integrity of the working-space information. The similarity function is given by formula (2). In formula (2), i, j = 1, 2, …, n; m is the dimension of the data; the normalization parameter is obtained from formula (3); and R_ij is the similarity value between the i-th and j-th groups of data.
The similarity normalization parameter is calculated by formula (3), in which D_i is the data set of parameter i. The degree of simplification of the reduced process memory matrix depends on the chosen similarity threshold. There are 5050 groups of valid historical data in this wind turbine's SCADA system. The relationship between the number of remaining samples and the threshold is shown in Figure 2. Given the requirements of high integrity and low redundancy, the interval where the slope of the curve is largest is the interval in which redundant information increases fastest. To obtain the best effect, the threshold is set to 0.97 in this paper. The 5050 training samples are optimized by the similarity function, and 3870 optimized samples are obtained.
Here s is the number of observation vectors in the reduced process memory matrix, s = 3870.
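The sample reduction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the similarity function of formula (2) is not reproduced here, so the sketch assumes a similarity of 1 minus the mean absolute difference, with each variable normalized by its range (max minus min). The function names, the greedy keep-or-drop order and the sample rows are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def variable_ranges(samples: List[Vector]) -> List[float]:
    """Range (max - min) of each variable; the assumed normalization parameter."""
    return [(max(col) - min(col)) or 1.0 for col in zip(*samples)]


def similarity(a: Vector, b: Vector, delta: Sequence[float]) -> float:
    """Assumed similarity: 1 minus the mean normalized absolute difference."""
    return 1.0 - sum(abs(x - y) / d for x, y, d in zip(a, b, delta)) / len(a)


def reduce_samples(samples: List[Vector], tau: float = 0.97) -> List[Vector]:
    """Keep a sample only if its similarity to every kept sample is below tau."""
    delta = variable_ranges(samples)
    kept: List[Vector] = []
    for sample in samples:
        if all(similarity(sample, k, delta) < tau for k in kept):
            kept.append(sample)
    return kept


# Hypothetical rows: (ambient temp, oil temp, wind speed, active power).
samples = [
    (10.0, 50.0, 5.0, 800.0),
    (10.1, 50.2, 5.1, 805.0),    # nearly identical to the first row
    (25.0, 65.0, 12.0, 1900.0),
    (-5.0, 40.0, 3.0, 200.0),
]
reduced = reduce_samples(samples)
print(len(samples), "->", len(reduced))
```

The pairwise comparison costs on the order of m² similarity evaluations, which is acceptable as a one-off preprocessing step for a few thousand samples.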

Grey correlation analysis
To reduce the information dimension and improve the prediction accuracy, this paper selects the operation parameters that are sensitive to bearing faults and removes those that are less sensitive. Grey relational analysis (GRA) is well suited to this task: it measures how consistently the trends of different factors change, and so supports variable selection. The 32 relevant parameters can be reduced to 8 parameters related to gearbox bearing temperature, including environment temperature, gearbox oil temperature, drive chain swing amplitude, average speed of generator, cabin temperature, average wind speed, active power, average speed of control generator and power generation. Using the grey correlation formula, this paper obtains the grey correlation degree of each selected variable with respect to the bearing temperature, and the results are shown in Table 2. Among them, R_1 is the grey correlation value of average wind speed; R_2 is the grey correlation value of average speed of control generator; R_3 is the grey correlation value of average speed of generator; R_4 is the grey correlation value of power generation; R_5 is the grey correlation value of active power; R_6 is the grey correlation value of ambient temperature; R_7 is the grey correlation value of gearbox oil temperature; R_8 is the grey correlation value of cabin temperature.
According to Table 2, the top four items are gearbox oil temperature, average wind speed, ambient temperature and active power. This is fully consistent with the observation vector selected by experience in the section on the selection of modeling variables. It provides a theoretical basis for the selection of modeling variables and enhances the credibility of the model.
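As a rough illustration of how a grey correlation degree can be computed, the sketch below follows the commonly used Deng formulation of grey relational analysis with a distinguishing coefficient of 0.5. The paper's own formula is not reproduced in this text, so the min-max normalization, the coefficient value, the function names and the sample series are assumptions rather than the authors' exact procedure.

```python
from typing import Dict, List, Sequence


def min_max(seq: Sequence[float]) -> List[float]:
    """Scale a series to [0, 1] so that variables with different units compare."""
    lo, hi = min(seq), max(seq)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in seq]


def grey_relational_grades(
    reference: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    rho: float = 0.5,
) -> Dict[str, float]:
    """Grey relational grade of each candidate series against the reference."""
    ref = min_max(reference)
    deltas = {
        name: [abs(r - c) for r, c in zip(ref, min_max(seq))]
        for name, seq in candidates.items()
    }
    flat = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:                      # every candidate matches the reference
        return {name: 1.0 for name in deltas}
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }


# Made-up series: bearing temperature as reference, two candidate variables.
bearing_temp = [50, 55, 60, 58, 52]
grades = grey_relational_grades(
    bearing_temp,
    {"oil_temp": [40, 45, 50, 48, 42], "unrelated": [3, 1, 2, 5, 4]},
)
ranking = sorted(grades, key=grades.get, reverse=True)
print(ranking)
```

Sorting the grades and keeping the top few entries corresponds to the selection of the four modeling variables from the ranked list.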

Nonlinear state estimation modeling
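The new observation vector of the gearbox working state, X_obs, is taken as the input of the NEST model, which outputs the corresponding prediction vector X_est.
First, for each input the NEST model generates an s-dimensional weight vector, so that the prediction is a weighted combination of the s observation vectors in the reduced process memory matrix. The weights are chosen to minimize the sum of squared residuals S(w): the partial derivative of S(w) with respect to each w_k is set to zero, which leads to formula (10). In formula (11), the nonlinear operator is the similarity calculation based on Euclidean distance, and it replaces the ordinary product in the weight solution to give the estimated output.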
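The estimation step can be sketched as follows. This is a minimal sketch assuming the widely used form of nonlinear state estimation, in which the weights solve (Dᵀ ⊗ D)·w = Dᵀ ⊗ X_obs and ⊗ is the Euclidean distance between two vectors; the paper's formulas (10) and (11) are not reproduced here, so the symbols and function names are illustrative. It uses only the standard library (with a small Gaussian elimination helper) to stay self-contained. In practice the variables would be normalized first and a numerical library would do the solve.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Nonlinear operator: Euclidean distance between two observation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: memory vectors must be distinct")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):                     # back substitution
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def nest_estimate(memory: List[Vector], x_obs: Vector) -> List[float]:
    """Estimate x_obs as a weighted sum of the stored memory vectors."""
    g = [[distance(a, b) for b in memory] for a in memory]   # D^T (x) D
    k = [distance(a, x_obs) for a in memory]                 # D^T (x) X_obs
    w = solve(g, k)
    return [sum(wi * v[j] for wi, v in zip(w, memory)) for j in range(len(x_obs))]


# Toy memory of three 2-D states; an input equal to a stored state is reproduced.
memory = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
estimate = nest_estimate(memory, (1.0, 0.0))
print(estimate)
```

Because the matrix to be solved has one row and column per stored vector, shrinking the memory matrix from 5050 to 3870 vectors directly reduces the cost of this step.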

Statistical method of residual
Because wind speed changes randomly and the operating conditions of the wind turbine vary with time, various uncertain factors affect the residual. This produces isolated abnormal points and false alarms. This paper analyzes the mean value of the residuals within a sliding window, which effectively weakens the influence of isolated points and continuously reflects the trend of the residual. It avoids false alarms and gives accurate early warning signals. The width of the sliding window is set to N, and the mean value of each N consecutive residuals is calculated. During normal operation of the equipment, the residual mean value sequence is obtained by this sliding-window calculation, and its maximum absolute value is E_x.
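The sliding-window statistic can be illustrated with a short Python sketch. The function name and the residual values are made up for the example; only the idea (the mean of each N consecutive residuals, then the largest absolute mean as E_x) comes from the text.

```python
from typing import List, Sequence


def sliding_means(residuals: Sequence[float], width: int) -> List[float]:
    """Mean of every run of `width` consecutive residuals."""
    if not 1 <= width <= len(residuals):
        raise ValueError("width must be between 1 and len(residuals)")
    return [
        sum(residuals[i:i + width]) / width
        for i in range(len(residuals) - width + 1)
    ]


# Made-up residuals from a healthy period.
residuals = [0.1, -0.1, 0.2, 0.0, -0.2, 0.1]
means = sliding_means(residuals, width=3)
e_x = max(abs(m) for m in means)   # largest absolute window mean
print(means, e_x)
```

A wider window smooths isolated spikes more strongly but delays the response to a real trend, so N is a trade-off between false alarms and warning time.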
The residual threshold E_v of the NEST model is set by formula (15), in which the coefficient K is generally determined from the experience of the operating staff. The flow chart of the NEST-based gearbox bearing temperature monitoring model is shown in Figure 3.
This paper sets the similarity threshold τ to 0.97. The reduced process memory matrix is constructed from the remaining 3870 groups of data and is used to train NEST model 1. For comparison, NEST model 2 is trained on all 5050 groups of data. 800 groups of data are selected as the test set to check the validity of the models. The prediction residual of the NEST model for the gearbox bearing temperature is the difference between $x_b$, the input temperature variable, and $\hat{x}_b$, the predicted temperature variable. The prediction results of model 1 and model 2 are shown in Figure 4 and Figure 5. The mean relative error is denoted MRE. Figure 4 shows that the MRE of model 1 is 1.42% and its operation time is 4.9 seconds. Figure 5 shows that the MRE of model 2 is 1.28% and its operation time is 9.6 seconds.
The comparison shows that the MRE of both models is very low; in other words, both models predict with high accuracy and track the trend of the gearbox bearing temperature well. The prediction accuracy of the two models is of the same order of magnitude, and the difference is small. In terms of operation time, however, model 1 takes only about half as long as model 2. Therefore, the NEST model trained on the reduced process memory matrix obtained by similarity analysis not only has high accuracy but also shortens the operation time. It also reduces the amount of computation and effectively improves the timeliness of the model.
For comparison with a different type of model, a least squares support vector machine (LSSVM) is used to train model 3, with its parameters optimized by the particle swarm optimization (PSO) algorithm. Model 3 is built on six variables: the gearbox bearing temperature at the previous time step, the gearbox bearing temperature two time steps earlier, environment temperature, gearbox oil temperature, average wind speed and active power. The same test data are used, and the predicted results are shown in Figure 6. The MRE of model 3 is 1.85%. Comparing the three models, the accuracy of the NEST models is higher than that of the LSSVM model, and the NEST models use fewer modeling variables. The NEST model is relatively simple and reliable. This verifies the accuracy and superiority of the NEST model.
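For reference, the error measure used in this comparison can be computed as below. The sketch assumes the usual definition of mean relative error (the average of |x − x̂| / |x|, expressed in percent). The text does not spell out its formula, so this definition, the function name and the temperatures are assumptions.

```python
from typing import Sequence


def mean_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |actual - predicted| / |actual|, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Made-up bearing temperatures and predictions.
measured = [50.0, 60.0, 40.0]
estimated = [49.0, 61.2, 40.0]
mre = mean_relative_error(measured, estimated)
print(f"MRE = {mre:.2f}%")
```

Because the error is divided by the measured value, this form is only meaningful when the measured temperatures stay well away from zero.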

Residual analysis
The residual threshold is determined statistically from the temperature residuals obtained by feeding the training samples into the NEST model. The mean value of the residual is 0.02, and this paper sets k1 to 1.
The threshold E_v is therefore 0.02. The sliding-window residual analysis is carried out for model 1, and the result is shown in Figure 7. Figure 7 shows that the residual values at points 23 and 409 exceed the threshold value of 0.02, so a fault warning is issued. According to the SCADA records and operation logs, the unit did have faults and underwent alarm maintenance at those times: a gearbox oil high-temperature fault occurred at 3:05 on July 8 and at 19:27 on July 11, corresponding to points 20 and 406 in the figure respectively.
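The threshold check itself is simple to express in code. The sketch below assumes the threshold is the coefficient multiplied by the largest healthy window mean (E_v = k · E_x), which is consistent with the values quoted above but is not stated explicitly in this text. The function names and the window means are hypothetical.

```python
from typing import List, Sequence


def residual_threshold(e_x: float, k: float = 1.0) -> float:
    """Assumed form of the threshold: E_v = k * E_x."""
    return k * e_x


def alarm_points(window_means: Sequence[float], e_v: float) -> List[int]:
    """Indices whose absolute residual mean exceeds the threshold."""
    return [i for i, m in enumerate(window_means) if abs(m) > e_v]


# Made-up sliding-window residual means from a test period.
window_means = [0.005, 0.010, 0.025, 0.012, -0.030]
e_v = residual_threshold(e_x=0.02, k=1.0)
alarms = alarm_points(window_means, e_v)
print(alarms)
```

A larger coefficient makes the monitor less sensitive, trading earlier warnings for fewer false alarms.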

Prediction of bearing temperature trend
To further verify the validity of the model, a temperature offset is added to the gearbox bearing temperature to simulate an abnormal bearing temperature fault. In the 800 groups of test data, a cumulative error of 0.01 ℃ is added starting from point 200. The resulting residual mean value sequence and residual standard deviation sequence are shown in Figure 8 and Figure 9. Both sequences increase gradually, exceed their corresponding thresholds and trigger a fault early warning. Therefore, the residual mean value and the standard deviation can quickly reflect the fault status of the gearbox, and the NEST fault prediction model is effective and reliable.
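The simulated fault can be reproduced in outline as follows. This sketch reads the "0.01 ℃ cumulative error" as an offset that grows by 0.01 ℃ with every sample from the start point onward. That reading, the zero-based indexing, the window width of 10, the perfectly steady synthetic data and all function names are assumptions made for illustration.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def add_cumulative_offset(values: Sequence[float], start: int, step: float = 0.01) -> List[float]:
    """Add an offset that grows by `step` per sample from index `start` onward."""
    return [v + step * (i - start + 1) if i >= start else v for i, v in enumerate(values)]


def window_stats(residuals: Sequence[float], width: int) -> List[Tuple[float, float]]:
    """(mean, standard deviation) of every window of `width` residuals."""
    return [
        (statistics.fmean(residuals[i:i + width]), statistics.pstdev(residuals[i:i + width]))
        for i in range(len(residuals) - width + 1)
    ]


def first_alarm(residuals: Sequence[float], width: int, e_v: float) -> Optional[int]:
    """Index of the last point of the first window whose mean exceeds e_v."""
    for i, (mean, _) in enumerate(window_stats(residuals, width)):
        if abs(mean) > e_v:
            return i + width - 1
    return None


measured = [60.0] * 300     # made-up, perfectly steady bearing temperature
predicted = [60.0] * 300    # a model that tracks the healthy state exactly
faulty = add_cumulative_offset(measured, start=200)
residuals = [m - p for m, p in zip(faulty, predicted)]
stats = window_stats(residuals, width=10)
alarm = first_alarm(residuals, width=10, e_v=0.02)
print(alarm)
```

With real data the healthy residuals are not zero, so the alarm point depends on the noise level as well as on the window width and threshold.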

Conclusion
This paper studies a fault prediction method for the gearbox bearing. Grey relational analysis is used to select the modeling variables, and similarity analysis is used to give the sample data good integrity and low redundancy. This yields the reduced process memory matrix, which is used to train the improved NEST model to identify fault symptoms. The effectiveness and accuracy of the improved model are verified by comparison with other models. The improved NEST model shortens the operation time, reduces the amount of computation and has high prediction accuracy. In engineering practice, it can effectively help avoid equipment damage and improve the economy and safety of wind turbines.