A Novel Vehicle Gearbox Fault Diagnosis Approach Based on Collective Anomaly Detection

Targeting the problem of gearbox fault diagnosis, we propose a novel semi-supervised approach based on collective anomaly detection. Using limited sample data, the approach detects whether a test dataset contains abnormal patterns, with the data distribution as the metric. A sequence that follows an unexpected distribution is identified as a collective anomaly, which may be generated by a fault pattern. The approach consists of three steps. First, a mixture of multivariate Gaussian distributions is used to fit the structure of the sample dataset and the test dataset. Then, based on maximum likelihood estimation, we search for the parameters that fit the data distribution best. Finally, the fixed-point iteration algorithm is used to solve the likelihood equations. Experimental results demonstrate that the proposed approach can find fault patterns of a gearbox without prior knowledge of the mechanisms that generate them.


Introduction
With the development of artificial intelligence technology, research on fault diagnosis has shifted toward building intelligent diagnosis systems based on data-driven and intelligent computing techniques. Most research on gearbox fault diagnosis is based on the analysis of vibration signals. The characteristics of vibration signals can be divided into two categories [1], time domain and frequency domain, and the features in each category are complex, changeable, and interacting. They are also easily influenced by other vibration sources while the vehicle is being driven [2].
Currently, most research on fault diagnosis focuses on semi-supervised anomaly detection [3], which detects unknown fault patterns from a limited amount of sample data. A common limitation of these existing approaches is that they can hardly detect anomalies generated against the same background as normal data [4]; that is, it remains difficult to detect real-time failures accurately and sensitively during the continuous operation of the gearbox. Although each data point that makes up such a real-time fault may look normal on its own, it is abnormal for these points to appear together as a set. If detection only checks the individual points of an unlabeled test dataset one by one, such abnormal patterns are hard to find. Conversely, it may mistakenly identify normal data that fall in a low-probability-density region as abnormal.
To solve the problems mentioned above, we propose a semi-supervised collective anomaly detection approach based on a data distribution similarity metric and apply it to the fault diagnosis of vehicle gearboxes. The algorithm consists of three parts: 1) a mixture of multivariate Gaussian distributions is used to fit the distributions of the sample dataset and the test dataset; 2) based on MLE (maximum likelihood estimation), the optimal parameters that fit the data distribution are searched for; 3) the fixed-point iteration algorithm is used to solve the likelihood equations. When the distribution of a data pattern differs significantly from that of the sample data, the pattern is identified as a collective anomaly that may be generated by a gearbox fault. This paper is organized as follows. In section 2, we introduce the construction of the semi-supervised detection model in detail and give the definitions of the mathematical theory involved. In section 3, the solution process of the model, based on multivariate distribution fitting, maximum likelihood estimation, and fixed-point iteration, is presented in detail. In section 4, we verify our detection approach on actual operation data of gearboxes from an automobile factory. Lastly, we summarize the research results in section 5.

Detection framework and related concepts
Detection Framework
The proposed detection framework is carried out in two steps. First, for a labeled normal sample dataset $S_s$, its data distribution can be denoted as Eq.1, where $f_s$ represents the data distribution function and $\theta_s$ represents the parameters of that function.

$$S_s \sim f_s(x \mid \theta_s) \tag{1}$$
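Second, for an unlabeled test dataset $S_t$, which may contain collective anomalies generated by fault patterns, its data distribution can be denoted as Eq.2, where $f_a$ is the distribution function of the anomalous pattern with parameters $\theta_a$, and $\lambda$ is the proportion of the anomalous pattern in $S_t$.

$$S_t \sim (1-\lambda)\, f_s(x \mid \theta_s) + \lambda\, f_a(x \mid \theta_a) \tag{2}$$

Based on our proposed collective anomaly detection approach, we examine whether an unlabeled test dataset contains fault patterns by measuring similarity in terms of data distribution. To achieve this goal, three parameters, $\theta_s$, $\lambda$, and $\theta_a$, need to be estimated.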
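To make the estimation problem concrete, the following Python sketch assumes a one-dimensional test set modeled as $(1-\lambda) f_s + \lambda f_a$ with a single Gaussian for each part. It holds the normal part $f_s$ fixed, as if already fitted on $S_s$, and estimates $\lambda$, $\mu_a$, and $\sigma_a$ by repeatedly applying EM-style update equations, which can be read as a fixed-point map. The helper names, the 1-D simplification, the initialization, and the synthetic data are all hypothetical; this is an illustration, not the authors' implementation.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of the 1-D Gaussian N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def estimate_anomaly_part(data, mu_s, sigma_s, lam=0.1, max_iter=500, tol=1e-8):
    """Hypothetical helper: estimate (lam, mu_a, sigma_a) with f_s = N(mu_s, sigma_s^2) held fixed."""
    # Initialise the anomaly component at the observation farthest from the normal mean.
    mu_a = max(data, key=lambda x: abs(x - mu_s))
    sigma_a = sigma_s
    for _ in range(max_iter):
        # r_i: posterior probability that x_i belongs to the anomalous part.
        r = []
        for x in data:
            pa = lam * normal_pdf(x, mu_a, sigma_a)
            ps = (1.0 - lam) * normal_pdf(x, mu_s, sigma_s)
            r.append(pa / (pa + ps))
        w = sum(r)
        new_lam = w / len(data)
        new_mu = sum(ri * x for ri, x in zip(r, data)) / w
        new_var = sum(ri * (x - new_mu) ** 2 for ri, x in zip(r, data)) / w
        new_sigma = math.sqrt(max(new_var, 1e-12))
        step = max(abs(new_lam - lam), abs(new_mu - mu_a), abs(new_sigma - sigma_a))
        lam, mu_a, sigma_a = new_lam, new_mu, new_sigma
        if step < tol:  # parameters (approximately) reproduce themselves
            break
    return lam, mu_a, sigma_a


rng = random.Random(0)
normal = [rng.gauss(0.0, 1.0) for _ in range(1900)]  # synthetic "healthy" readings
faulty = [rng.gauss(4.0, 0.5) for _ in range(100)]   # synthetic 5% anomalous pattern
lam, mu_a, sigma_a = estimate_anomaly_part(normal + faulty, mu_s=0.0, sigma_s=1.0)
print(round(lam, 3), round(mu_a, 2), round(sigma_a, 2))
```

On this synthetic data the estimated proportion should land near the injected 5% and the anomaly mean near 4. A real gearbox dataset would need multivariate Gaussian mixtures for both parts.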

Collective Anomaly
A collective anomaly is a set of related data instances. When these instances appear together in a certain pattern, their overall behavior deviates significantly from that of the whole dataset, even though the individual observations in the set may not be anomalies.
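The difference between point anomalies and collective anomalies can be seen in a small Python example. It assumes normal readings follow N(0, 1) and uses a simple z-test on the mean of a run of readings as a stand-in collective check. That mean-shift test is an illustrative substitute, not the distribution-fitting method proposed in this paper.

```python
import math
import random

rng = random.Random(1)
MU, SIGMA = 0.0, 1.0  # assumed model of normal readings: N(0, 1)

# A run of 50 readings drawn from N(1.0, 0.2^2): each value is plausible
# under N(0, 1) on its own, but the run as a whole is shifted.
segment = [rng.gauss(1.0, 0.2) for _ in range(50)]

# Point-wise test: flag a single reading if its z-score exceeds 3.
point_flags = [abs(x - MU) / SIGMA > 3.0 for x in segment]

# Collective test: z-score of the run's mean; under N(0, 1) the mean of
# n readings has standard deviation SIGMA / sqrt(n).
n = len(segment)
mean_z = (sum(segment) / n - MU) / (SIGMA / math.sqrt(n))

print("readings flagged individually:", sum(point_flags))
print("z-score of the run mean:", round(mean_z, 2))
```

The point-wise rule flags none of the readings, while the mean of the run lies several standard errors away from the normal mean.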

Fixed Point Iteration Algorithm
Fixed-point iteration is a successive approximation method that rewrites an implicit equation in the explicit form $x = T(x)$. In other words, an approximate value of the root is repeatedly corrected using this equation until it converges [5]. Fixed-point iteration is an effective method for solving highly nonlinear numerical problems [6]. Owing to its good mathematical properties and mature theoretical foundations, it has been widely used to solve equations in many fields of engineering mathematics. The main concepts of the algorithm are as follows.
Definition 1. Suppose that $X$ is a subset of $\mathbb{R}^n$. If every point $x \in X$ corresponds to a specific $f(x) \in X$, then $f$ is a self-mapping of $X$, denoted as $f: X \to X$.
Definition 2. Suppose that $X$ is a nonempty set and $f: X \to X$ is a self-mapping. If there is an $x^* \in X$ satisfying $f(x^*) = x^*$, then $x^*$ is a precise fixed point of $f$.
Definition 3. Suppose that $(X, \rho)$ is a metric space and $T: X \to X$ is a mapping. If there is an $L \in [0, 1)$ such that $\rho(T(x), T(y)) \le L\,\rho(x, y)$ for any $x, y \in X$, then $T$ is a contraction mapping on $X$.
Theorem 1 (Banach fixed-point theorem, also known as the contraction mapping theorem). Suppose that $(X, \rho)$ is a nonempty complete metric space and $T: X \to X$ is a contraction mapping. Then $T$ has a unique fixed point in $X$, and the sequence $x_{k+1} = T(x_k)$ converges to it from any starting point $x_0 \in X$. The theorem thus establishes the existence and uniqueness of the solution to the equation $T(x) = x$.
Theorem 2. Suppose that $X = [a, b]$ is a nonempty bounded closed interval, that is, $a \le x \le b$ for any $x \in X$, and let $T: X \to X$. If the following two conditions are satisfied: (1) $a \le T(x) \le b$ for any $x \in X$, and (2) there is a positive constant $L < 1$ such that $|T(x) - T(y)| \le L\,|x - y|$ for any $x, y \in X$, then $T$ has a unique fixed point $x^*$ in $X$.
Definition 4 (approximate fixed point). Let $\varepsilon$ be any positive constant, let $\|\cdot\|$ denote the norm of a vector in the $n$-dimensional Euclidean space $\mathbb{R}^n$, and let $T: X \to X$ be a contraction mapping. If a point $x^*$ satisfies $\|T(x^*) - x^*\| < \varepsilon$, then $x^*$ is an approximate fixed point.
The existence of a precise fixed point can be proven under many conditions, but finding it is often computationally too expensive. Moreover, for convenience of calculation, the precise value of a fixed point usually has to be approximated. For example, the positive solution of $x^2 - 2 = 0$ is $\sqrt{2}$, whose decimal expansion is infinite, so it must be approximated before it can be used in later calculations. Therefore, we introduce the concept of an approximate fixed point into our algorithm. If a precise fixed point has not been found when the search reaches the preset number of iterations, the approximate fixed point with the highest precision found during the iteration is taken as the result.
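The $\sqrt{2}$ example and the approximate-fixed-point fallback can be sketched in Python. The helper name `fixed_point` and the choice of map $T(x) = (x + 2/x)/2$ are illustrative assumptions. A residual below `tol` is treated as reaching the fixed point. Otherwise, the iterate with the smallest residual $|T(x) - x|$ is returned, in the spirit of Definition 4.

```python
import math


def fixed_point(T, x0, tol=1e-12, max_iter=100):
    """Iterate x_{k+1} = T(x_k) for a scalar map T.

    Returns (x, True) once |T(x) - x| < tol; otherwise, after max_iter
    evaluations, returns (best_x, False) where best_x is the iterate with the
    smallest residual |T(x) - x| seen so far (an approximate fixed point).
    """
    x = x0
    best_x, best_res = x0, math.inf
    for _ in range(max_iter):
        tx = T(x)
        res = abs(tx - x)
        if res < best_res:
            best_x, best_res = x, res
        if res < tol:
            return x, True
        x = tx
    return best_x, False


def T(x):
    """x^2 - 2 = 0 rewritten as x = T(x) = (x + 2/x) / 2."""
    return 0.5 * (x + 2.0 / x)


root, exact = fixed_point(T, 1.0)                         # converges to sqrt(2)
approx, exact_short = fixed_point(T, 1.0, max_iter=3)     # budget too small: best iterate
print(root, exact)
print(approx, exact_short)
```

This map satisfies the conditions of Theorem 2 on $[1, 2]$. It sends $[1, 2]$ into $[\sqrt{2}, 1.5]$, and $|T'(x)| = |1/2 - 1/x^2| \le 1/2$ there, so $L = 1/2$.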

Algorithm construction
Our proposed detection approach consists of three parts. First, finite mixtures of multivariate Gaussian distributions are used to represent the distributions of the labeled normal sample dataset (Eq.1) and the unlabeled test dataset (Eq.2). Then, the MLE (maximum likelihood estimation) algorithm is used to estimate the parameters of the mixture distribution functions. Finally, the fixed-point iteration algorithm is carried out to solve the maximum likelihood equations.

Mixture of Multivariate Gaussian Distributions
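A multivariate Gaussian mixture is adopted to represent the data distribution. In an $n$-dimensional Euclidean space $\mathbb{R}^n$, a mixture of $K$ multivariate Gaussian distributions is defined by Eq.3 and Eq.4.

$$\mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_k|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k)\right) \tag{3}$$

$$p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{4}$$

Here $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is the probability density function of the $k$-th Gaussian component with mean $\mu_k$. The parameter $\Sigma_k$ is its covariance matrix, which is symmetric and positive definite (so that $\Sigma_k^{-1}$ exists), and $|\Sigma_k|$ denotes its determinant. $\pi_k$ is the mixing coefficient of the $k$-th component, which satisfies $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K} \pi_k = 1$.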
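A direct evaluation of the component density and the $K$-component mixture defined above can be sketched with NumPy. The function names are hypothetical, and the mixing-coefficient check mirrors the stated constraints on $\pi_k$.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density N(x | mean, cov) of one multivariate Gaussian component."""
    x, mean, cov = (np.asarray(a, dtype=float) for a in (x, mean, cov))
    n = mean.shape[0]
    diff = x - mean
    # Solve cov @ v = diff rather than forming cov^{-1} explicitly.
    quad = diff @ np.linalg.solve(cov, diff)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)


def mixture_pdf(x, weights, means, covs):
    """Weighted sum of K Gaussian components; weights must form a distribution."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("mixing coefficients must be >= 0 and sum to 1")
    return sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))


# Example: two 2-D components with identity covariance, evaluated at the origin.
p = mixture_pdf([0.0, 0.0], [0.3, 0.7], [[0.0, 0.0], [2.0, 0.0]], [np.eye(2), np.eye(2)])
print(p)  # equals (0.3 + 0.7 * exp(-2)) / (2 * pi)
```

For the identity covariance in two dimensions the normalizing constant is $2\pi$, and the second component contributes $e^{-2}$ because its squared Mahalanobis distance from the origin is 4.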

Maximum Likelihood Estimate (MLE) algorithm
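Maximum likelihood estimation is used when the form of the data distribution function is known but its parameters are unknown. For a continuous random variable $S$, let its probability density function be

$$p(x \mid \theta) \tag{5}$$

If $S_1 = (X_1, X_2, \dots, X_n)$ is a sample of $S$, its joint probability density is

$$\prod_{i=1}^{n} p(x_i \mid \theta) \tag{6}$$

If $Y = (x_1, x_2, \dots, x_n)$ is an observed value of the sample, the probability that the random sample falls in a small neighborhood of $Y$ is approximately proportional to $\prod_{i=1}^{n} p(x_i \mid \theta)$. The likelihood function of the sequence $S$ is therefore calculated as Eq.7.

$$L(x_1, x_2, \dots, x_n; \theta) = \prod_{i=1}^{n} p(x_i \mid \theta) \tag{7}$$

MLE finds the parameter that maximizes this probability, as defined in Eq.8.

$$L(x_1, x_2, \dots, x_n; \hat{\theta}) = \max_{\theta} L(x_1, x_2, \dots, x_n; \theta) \tag{8}$$

Here $\hat{\theta} = \hat{\theta}(x_1, x_2, \dots, x_n)$ depends on the observed point $Y$ and is called the maximum likelihood estimate of the parameter $\theta$ of the probability density function $p(x \mid \theta)$. Because $\ln L$ is an increasing function of $L$, both reach their maximum at the same point. Hence, the extreme point of $\ln L$ is usually searched for instead of that of $L$. This not only converts multiplication into addition but also avoids the floating-point underflow that occurs when many small densities are multiplied. Thus, based on Eq.4 and Eq.6, the log-likelihood function of the labeled normal sample dataset $S_s = (X_1, \dots, X_N)$ is calculated as Eq.9, and that of the unlabeled test dataset $S_t = (Y_1, \dots, Y_M)$ as Eq.10. Here $\theta_s = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}$ and $\theta_t = \{\pi'_k, \mu'_k, \Sigma'_k\}_{k=1}^{K'}$ are the parameters of the two mixtures.

$$\ln L(X_1, \dots, X_N; \theta_s) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} \pi_k\, \mathcal{N}(X_i \mid \mu_k, \Sigma_k) \tag{9}$$

$$\ln L(Y_1, \dots, Y_M; \theta_t) = \sum_{j=1}^{M} \ln \sum_{k=1}^{K'} \pi'_k\, \mathcal{N}(Y_j \mid \mu'_k, \Sigma'_k) \tag{10}$$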
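Why the logarithm matters numerically can be seen in a NumPy sketch of the mixture log-likelihood. It evaluates $\sum_i \ln \sum_k \pi_k \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$ with the log-sum-exp trick and compares it with a raw product of densities. The function names and the synthetic 3-D data are assumptions for illustration.

```python
import numpy as np


def log_gaussian(X, mean, cov):
    """Row-wise ln N(x_i | mean, cov) for an (N, n) data array X."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    mean = np.asarray(mean, dtype=float)
    n = X.shape[1]
    chol = np.linalg.cholesky(cov)                 # cov = chol @ chol.T (needs positive definite cov)
    z = np.linalg.solve(chol, (X - mean).T)        # whitened residuals, shape (n, N)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))  # ln |cov|
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + np.sum(z * z, axis=0))


def mixture_log_likelihood(X, weights, means, covs):
    """ln L = sum_i ln sum_k pi_k N(x_i | mu_k, Sigma_k), using log-sum-exp per point."""
    comp = np.stack([np.log(w) + log_gaussian(X, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)  # shape (N, K)
    top = comp.max(axis=1, keepdims=True)
    per_point = top[:, 0] + np.log(np.sum(np.exp(comp - top), axis=1))
    return float(per_point.sum())


rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
weights, means, covs = [0.5, 0.5], [np.zeros(3), np.ones(3)], [np.eye(3), np.eye(3)]

ll = mixture_log_likelihood(X, weights, means, covs)
# Multiplying 5000 raw densities (each at most (2*pi)**-1.5, about 0.063) underflows float64.
naive = np.prod(np.exp(log_gaussian(X, means[0], covs[0])))
print(ll, naive)
```

The raw product collapses to `0.0`, while the log-likelihood stays a finite negative number that can still be compared and maximized.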

Fixed point iteration algorithm
For Eq.9 and Eq.10, the extreme value appears at a stationary point of the function. That is, if a point $\hat{\theta} = \hat{\theta}(x_1, x_2, \dots, x_n)$ satisfies $\partial \ln L / \partial \theta = 0$, then $\hat{\theta}$ is a candidate for the maximum likelihood estimate of the parameter $\theta$. Based on the definitions of fixed-point iteration in the previous section, the algorithm can be used to search for the maximum likelihood estimate of the parameter $\theta$ of the probability density function $p(x \mid \theta)$. The detailed steps are as follows.
(1) Construct the fixed-point iteration. The problem of finding the extreme value of the function is converted into finding the point that satisfies $\partial \ln L / \partial \theta = 0$.
(4) Before the iteration in step 3 reaches the maximum number of iterations, if there is a solution $\hat{\theta}$ satisfying $\hat{\theta} = T(\hat{\theta})$, it is treated as the precise fixed point. According to Definition 4, if the precise fixed point cannot be found, the point that best satisfies $\theta = T(\theta)$, that is, the iterate with the smallest $\|T(\theta) - \theta\|$, is taken as the approximate fixed point.
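One common way to turn $\partial \ln L / \partial \theta = 0$ for Eq.9 into a map $\theta = T(\theta)$ is the following. Define the responsibilities $\gamma_{ik} = \pi_k \mathcal{N}(x_i \mid \mu_k, \Sigma_k) / \sum_j \pi_j \mathcal{N}(x_i \mid \mu_j, \Sigma_j)$ and $N_k = \sum_i \gamma_{ik}$. The stationarity conditions (with a Lagrange multiplier for $\sum_k \pi_k = 1$) then give $\pi_k = N_k / N$, $\mu_k = \frac{1}{N_k}\sum_i \gamma_{ik} x_i$, and $\Sigma_k = \frac{1}{N_k}\sum_i \gamma_{ik}(x_i - \mu_k)(x_i - \mu_k)^{\top}$. Because $\gamma_{ik}$ itself depends on $\theta$, these equations define $T$; they coincide with the EM updates. The paper's own steps (2) and (3) are not shown here, so this NumPy sketch is an assumption about the form of $T$, not the authors' code. The helper names, the small covariance ridge `reg`, and the test data are hypothetical.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Row-wise ln N(x_i | mean, cov) via a Cholesky factor of cov."""
    n = X.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (X - mean).T)
    return -0.5 * (n * np.log(2.0 * np.pi) + 2.0 * np.log(np.diag(chol)).sum() + (z * z).sum(axis=0))


def T(theta, X, reg=1e-6):
    """One application of theta -> T(theta) built from the stationarity conditions.

    reg adds a small ridge to each covariance (an assumption) to keep it positive definite.
    """
    w, mu, cov = theta
    N, n = X.shape
    K = len(w)
    logp = np.stack([np.log(w[k]) + log_gauss(X, mu[k], cov[k]) for k in range(K)], axis=1)
    logp -= logp.max(axis=1, keepdims=True)    # stabilise before exponentiating
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)  # responsibilities gamma_ik
    Nk = gamma.sum(axis=0)
    new_w = Nk / N
    new_mu = (gamma.T @ X) / Nk[:, None]
    new_cov = np.empty((K, n, n))
    for k in range(K):
        d = X - new_mu[k]
        new_cov[k] = (gamma[:, k, None] * d).T @ d / Nk[k] + reg * np.eye(n)
    return new_w, new_mu, new_cov


def residual(a, b):
    """Max-norm distance between two parameter tuples, used as ||T(theta) - theta||."""
    return max(float(np.abs(x - y).max()) for x, y in zip(a, b))


def fit_fixed_point(X, theta0, tol=1e-8, max_iter=500):
    """Iterate theta <- T(theta); fall back to the best approximate fixed point."""
    theta, best, best_res = theta0, theta0, np.inf
    for _ in range(max_iter):
        nxt = T(theta, X)
        res = residual(nxt, theta)
        if res < best_res:
            best, best_res = theta, res
        if res < tol:
            return theta, True  # treated as the precise fixed point
        theta = nxt
    return best, False          # approximate fixed point (Definition 4)


rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(300, 2)),
               rng.normal([3.0, 3.0], 0.5, size=(200, 2))])
theta0 = (np.array([0.5, 0.5]),
          np.array([[1.0, 0.0], [2.0, 2.0]]),
          np.stack([np.eye(2), np.eye(2)]))
(w, mu, cov), converged = fit_fixed_point(X, theta0)
print(converged, np.round(w, 3), np.round(mu, 2))
```

Like any local iteration, the result depends on the starting point $\theta_0$; the two clusters here are well separated, so a rough initialization is enough.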

Experiment and analysis
The WLY·CVT25 stepless (continuously variable) gearbox newly developed by an automobile manufacturer is selected as the experimental object; detailed product information is shown in Figure 2(a). The experimental data are derived from vibration signals collected by sensors under different working conditions of the gearbox, sampled once every 5 seconds.
Figure 2(a). Detailed information of the WLY·CVT25.
Figure 2(b). The test components of the WLY·CVT25.

Experimental dataset
The experimental dataset used in this section consists of three parts: the normal sample dataset $S_s$, the abnormal dataset $S_a$, and the unknown test dataset $S_t$.
Normal sample dataset $S_s$: To avoid data fluctuations caused by running a single gearbox continuously for too long, three qualified gearboxes of the same model are selected and run continuously for 24 hours under the same load condition. The vibration data generated by the three gearboxes form the normal dataset $S_{normal}$. Data collected in 10 independent hours are randomly selected from $S_{normal}$ to form the normal sample dataset $S_s$.
Abnormal dataset $S_a$: As shown in Figure 2(b), the driving shaft, driven shaft, driving gear, and driven gear are the four most important components of the gearbox, so we select them as the targets of fault diagnosis. In the experiment, each of these four qualified parts is replaced with a cracked part, one at a time. For each cracked part, we collect 6 hours of vibration data under the same load condition as the abnormal dataset $S_a$. In addition, the vibration signal generated by worn old parts differs less obviously from the normal signal than that of cracked parts. To test the sensitivity of our algorithm, we also replace the qualified parts with worn old parts one at a time and collect 6 hours of vibration data under the same load condition. To identify these abnormal datasets clearly, we label them according to Table 1.
Unknown test dataset $S_t$: Different kinds of abnormal datasets $S_a$ are added to the normal dataset one at a time to form eight kinds of unknown test datasets $S_t$. Based on the earlier assumption that a fault pattern accounts for only a small proportion of the entire dataset, the proportion of abnormal data is kept below 5% of the normal dataset: no more than 3 hours of data are randomly selected from each kind of abnormal dataset and added to the normal dataset. Details are shown in Table 2.
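The dataset sizes and the 5% bound can be checked with a short calculation. The calculation assumes one record per sampling instant (one every 5 seconds) and measures the bound against the full normal dataset of three gearboxes running 24 hours each. Both are readings of the text above rather than stated figures.

```python
SECONDS_PER_RECORD = 5                          # "sampled once every 5 seconds"
RECORDS_PER_HOUR = 3600 // SECONDS_PER_RECORD

normal_hours = 3 * 24        # three gearboxes, 24 h each (assumed to form S_normal)
sample_hours = 10            # hours drawn into S_s
max_abnormal_hours = 3       # upper bound added to each test dataset

normal_records = normal_hours * RECORDS_PER_HOUR
sample_records = sample_hours * RECORDS_PER_HOUR
max_abnormal_records = max_abnormal_hours * RECORDS_PER_HOUR
ratio = max_abnormal_records / normal_records

print(RECORDS_PER_HOUR, normal_records, sample_records, max_abnormal_records)
print(f"{ratio:.2%}")
```

Under these assumptions, at most 2,160 abnormal records are added to 51,840 normal ones, about 4.17%, which is consistent with the stated bound of 5%.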

Experimental results
For the sample dataset $S_s$ and the abnormal datasets $S_a$, the data distributions are fitted with our proposed algorithm. The parameters of the distribution function of the sample dataset are used in the subsequent analysis. However, the proportions and distribution function parameters of the abnormal datasets serve only as ground-truth labels to be compared with the detection results; they do not participate directly in the analysis of the unknown test datasets $S_t$. The estimated parameters of the probability density function of the sample dataset are denoted $\theta_s$. The detection results for $S_{t1}$ to $S_{t8}$ are shown in Table 3, where the proportion of the collective anomaly detected by our algorithm and the parameters of its probability density function are compared with the true labels. All results are given to three decimal places. The results in Table 3 show that our algorithm reaches more than 90% agreement on all detection indexes for the fault pattern of every unknown test dataset. It also remains highly sensitive when identifying worn old parts, which demonstrates the effectiveness of the algorithm. In addition, the detection results support the following conclusions: 1) shaft faults are more evident than gear faults; 2) faults of driving components are more evident than those of driven components; 3) faults of cracked components are more evident than those of worn old components.

Conclusions
In this paper, we have presented a semi-supervised vehicle gearbox fault diagnosis approach based on collective anomaly detection. In the proposed algorithm, a Gaussian mixture distribution was first used to fit the vibration signal of the gearbox. Then, the variation in the parameters of the probability density function of the data distribution was taken as the measurement standard. Finally, based on the known normal sample dataset, maximum likelihood estimation and fixed-point iteration were used to fit the distribution of the unknown test dataset. According to the fitting results, data patterns that follow unknown or unexpected distributions are identified as collective anomalies that may be generated by faults. To verify the approach, we conducted detection experiments on eight kinds of test datasets containing different kinds of fault patterns. The experimental results show that, for each test dataset, the proposed algorithm achieves more than 90% agreement on each parameter of the fault data distribution function and remains highly sensitive when identifying worn old parts. This verifies that our proposed detection approach can find fault patterns of vehicle gearboxes without prior knowledge of the mechanisms that generate them. Given the generality of the framework, it may also find applications in other fields of science and technology.