Fault diagnosis of electric submersible pump tubing string leakage

Abstract. With the rapid development of the offshore oil industry, electric submersible pumps have become increasingly important. They are the main pumping equipment in oil well production and offer large advantages in displacement and production cost. Because electric submersible pumps have a complex structure and operate in a harsh environment, they are prone to failure. Tubing string leakage is a common failure in oilfields and reduces oil production. To reduce the resulting economic loss, this paper uses principal component analysis (PCA) and the Mahalanobis distance to diagnose tubing string leakage. The feasibility of the algorithm is verified through experiments. The results show that it can detect tubing string leakage before the recorded failure time and hence help reduce the maintenance cost of offshore oilfields.


Introduction
The electric submersible centrifugal pump (electric submersible pump) is a multi-stage centrifugal pump that works downhole. The centrifugal pump and the submersible motor are lowered into the well on the tubing. The motor drives the multi-stage centrifugal pump to rotate and generate centrifugal force, which lifts the crude oil in the well to the surface. The electric submersible pump oil production process is widely used in high water cut wells, non-flowing high-yield wells and offshore oil fields because of its simple equipment structure, large displacement, high efficiency and high degree of automation [1]. Electric submersible pumps can lift crude oil to the surface at higher temperatures and greater depths, but electric submersible pump wells are prone to failures during production, which interrupt production and cause serious economic losses [2].
Once an electric submersible pump fails, a workover operation is normally required. Workovers are expensive, place a heavy economic burden on the oilfield enterprise, and disrupt normal production. Fault diagnosis currently relies mainly on manual calculation and expert experience for after-the-fact evaluation and forecasting. This approach suffers from a heavy workload, poor timeliness and low forecast accuracy, and cannot meet the needs of modern oilfield production. It is therefore necessary to study fault diagnosis technology for electric submersible pumps in depth and to apply new methods and technologies, so that the pumps operate more reliably and create better economic benefits for enterprises.
Peng Long et al. [3] proposed a method for diagnosing broken-shaft faults of electric submersible pumps based on principal component analysis, and concluded that the principal component analysis algorithm has the potential to detect dynamic changes in data. Matthews et al. [4] used a random forest to classify and identify 7 types of oil well faults with an accuracy of 94%. Lu Li et al. [5] used wellhead electrical parameters and production parameters of electric submersible pumps, extracted the parameter characteristics of different faults through digital signal processing, analyzed 9 types of faults, and carried out data analysis for typical fault conditions. Zhang Panlong et al. [6] proposed a feature-extraction-based algorithm for diagnosing electric submersible pump system faults; their experiments show that the algorithm achieves the expected goal. This paper uses principal component analysis and the Mahalanobis distance to study the daily production data of electric submersible pumps. First, the data are preprocessed; then principal component analysis is used to extract the principal components; finally, faults in the electric submersible pump data are diagnosed using the Mahalanobis distance. Analysis of the experimental results shows that this approach can diagnose tubing string leakage failures in advance and thereby reduce economic losses.

Principal component analysis
PCA (principal component analysis) is a multivariate statistical method and one of the most commonly used dimensionality reduction methods. It transforms a set of potentially correlated variables into a set of linearly uncorrelated variables through an orthogonal transformation. The transformed variables are called principal components [7]. That is, a given set of variables $x_1, x_2, \ldots, x_n$ is transformed by a linear transformation into a set of uncorrelated variables $Y_1, Y_2, \ldots, Y_n$. The transformation keeps the total variance of the variables (the sum of their variances) unchanged. $Y_1$ has the largest variance and is called the first principal component; $Y_2$ has the second largest variance and is called the second principal component, and so on. In general, n variables yield n principal components, of which the first p (p ≤ n) that capture most of the variance of the original n variables are retained.
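First, the correlation between the variables is calculated and collected in the correlation matrix

$$R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix},$$

where $r_{ij}$ is the correlation coefficient between $x_i$ and $x_j$. Next, the eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ of $R$ are calculated, and the variance contribution rate $c_i$ of the $i$-th principal component and the cumulative variance contribution rate $C_p$ of the first $p$ principal components are obtained as

$$c_i = \frac{\lambda_i}{\sum_{k=1}^{n} \lambda_k}, \qquad C_p = \frac{\sum_{k=1}^{p} \lambda_k}{\sum_{k=1}^{n} \lambda_k}.$$

Finally, the principal component is:

$$Y_i = e_i^{\mathsf{T}} x, \qquad i = 1, 2, \ldots, p,$$

where $e_i$ is the unit eigenvector of $R$ belonging to $\lambda_i$ and $x = (x_1, x_2, \ldots, x_n)^{\mathsf{T}}$ is the vector of standardized variables.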
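The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `pca_by_correlation`, the 0.99 cumulative threshold parameter and the synthetic data are all introduced here for the example, and every column is assumed to be non-constant.

```python
import numpy as np

def pca_by_correlation(X, cum_threshold=0.99):
    """PCA on the correlation matrix of X (rows = samples, columns = variables).

    Returns eigenvalues (descending), contribution rates, cumulative
    contribution rates, the number p of retained components, and the scores.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each variable (assumes no constant column).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)          # correlation matrix R
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contrib = eigvals / eigvals.sum()         # variance contribution rate
    cum_contrib = np.cumsum(contrib)          # cumulative contribution rate
    # Smallest p whose cumulative contribution reaches the threshold.
    p = int(np.searchsorted(cum_contrib, cum_threshold)) + 1
    p = min(p, eigvals.size)
    scores = Z @ eigvecs[:, :p]               # first p principal components
    return eigvals, contrib, cum_contrib, p, scores

# Synthetic example (hypothetical data): the third variable nearly copies the first.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 1],
                     base[:, 0] + 0.01 * rng.normal(size=200)])
eigvals, contrib, cum_contrib, p, scores = pca_by_correlation(X)
print(p, np.round(cum_contrib, 4))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the contribution rates sum to one and the retained scores are mutually uncorrelated, with variances equal to the leading eigenvalues.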

Mahalanobis distance
The Mahalanobis distance was proposed by the Indian statistician P. C. Mahalanobis and measures the distance between data points while taking their covariance into account. It is an effective way to calculate the similarity of two unknown samples. Compared with the Euclidean distance, the Mahalanobis distance has the advantage of being scale-invariant: the distance between two points does not depend on the measurement units of the original data. It is also fast to compute. Moreover, the Mahalanobis distance accounts for the relationships between features and eliminates the interference caused by correlation between variables. In summary, the Mahalanobis distance conveniently measures the distance between an observed sample and a known sample set, so it is suitable for fault diagnosis [8].
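The Mahalanobis distance of a sample $x$ from a sample set $\{x_1, x_2, \ldots, x_M\}$, $x_i \in \mathbb{R}^P$, with mean $\mu$ and covariance matrix $\Sigma$ is

$$D(x) = \sqrt{(x - \mu)^{\mathsf{T}} \Sigma^{-1} (x - \mu)},$$

where P is the dimension of the sample and M is the size of the sample set.
The mean $\mu$ is calculated as

$$\mu = \frac{1}{M} \sum_{i=1}^{M} x_i,$$

and the covariance matrix is

$$\Sigma = \frac{1}{M - 1} \sum_{i=1}^{M} (x_i - \mu)(x_i - \mu)^{\mathsf{T}}.$$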
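As an illustration of the definition, the following sketch computes the Mahalanobis distance of new samples from a reference set. It is a minimal example rather than the authors' code: the function name `mahalanobis_distances` and the toy data are assumptions, the sample covariance with the 1/(M-1) normalization is assumed, and a pseudo-inverse is used so that a singular covariance matrix does not raise an error.

```python
import numpy as np

def mahalanobis_distances(X_ref, X_new):
    """Distance of each row of X_new from the reference set X_ref."""
    X_ref = np.asarray(X_ref, dtype=float)
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    mu = X_ref.mean(axis=0)                          # sample mean
    cov = np.atleast_2d(np.cov(X_ref, rowvar=False)) # 1/(M-1) normalization
    cov_inv = np.linalg.pinv(cov)                    # pseudo-inverse (assumption)
    diff = X_new - mu
    # Quadratic form diff_i^T * cov_inv * diff_i for every row i.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))

# Toy reference set: the second variable has twice the spread of the first.
X_ref = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
d = mahalanobis_distances(X_ref, [[1.0, 0.0], [0.0, 2.0]])
print(d)
```

The two query points lie at Euclidean distances 1 and 2 from the mean, yet their Mahalanobis distances coincide because each is scaled by the spread of its own variable, which is the scale invariance described above.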

Fault Diagnosis Process
The electric submersible pump data collected by the sensors contain many missing and abnormal values, so the data are first cleaned and then standardized; attributes that take only a single value are deleted. The preprocessed data are then analyzed by principal component analysis to obtain the variance contribution rate, the cumulative variance contribution rate and the principal components. Finally, the data set is divided into a training set and a test set, the Mahalanobis distance of each training sample and the maximum Mahalanobis distance of the training set are calculated, and new data are then tested to diagnose the time of failure. The fault diagnosis flow is shown in Figure 1.

The data used include:
• Daily measurement data of oil and gas wells: converted daily fluid production, converted daily oil production, converted daily water production, converted daily gas production, test fluid volume, test oil volume, test water volume, test gas volume, wellhead temperature, oil body density, gas-oil ratio, water content, separator pressure, separator temperature.
• Daily distribution data of oil and gas wells: daily fluid production, daily oil production, daily water production, daily gas production, gas-oil ratio, oil-gas ratio, water cut, water-gas ratio.
• Fault information data: the specific time at which tubing string leakage occurred in each oil well.

Data cleaning.
Based on analysis of the data, cleaning proceeds in the following steps:
• Where the daily measurement data of oil and gas wells contain several records for the same day, only one record is retained.
• The daily measurement data of oil and gas wells are not recorded every day but at irregular intervals (sometimes every two or three days, sometimes every five or six days), so the missing days are filled by linear interpolation.
• The daily status data, the daily measurement data and the daily distribution data of oil and gas wells are merged on the same time points.
• Samples whose production time is 0 are deleted, because such data lead to fault misjudgment.
• Variables with a single value, such as nozzle diameter, surface casing pressure and technical casing pressure, are deleted, because they carry no information for diagnosis.
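The cleaning steps can be sketched as follows. This is a hedged illustration using NumPy only; the function names, the integer day index, the choice to keep the first record of a day (the text does not say which record is retained) and the toy values are all assumptions made for the example.

```python
import numpy as np

def dedupe_and_interpolate(days, values):
    """Keep one record per day, then fill missing days by linear interpolation."""
    days = np.asarray(days)
    values = np.asarray(values, dtype=float)
    # np.unique returns sorted days and the index of each first occurrence.
    uniq_days, first_idx = np.unique(days, return_index=True)
    values = values[first_idx]
    full_days = np.arange(uniq_days[0], uniq_days[-1] + 1)
    filled = np.column_stack([
        np.interp(full_days, uniq_days, values[:, j])
        for j in range(values.shape[1])
    ])
    return full_days, filled

def drop_idle_and_constant(X, production_time):
    """Remove samples with zero production time and single-valued variables."""
    X = np.asarray(X, dtype=float)
    keep_rows = np.asarray(production_time) > 0
    X = X[keep_rows]
    keep_cols = X.max(axis=0) > X.min(axis=0)   # constant columns are dropped
    return X[:, keep_cols], keep_rows, keep_cols

# Toy records: day 1 is recorded twice, days 2, 4 and 5 are missing,
# and the second variable never changes.
days = [1, 1, 3, 6]
values = [[10.0, 5.0], [11.0, 5.0], [14.0, 5.0], [20.0, 5.0]]
full_days, filled = dedupe_and_interpolate(days, values)
production_time = [24, 24, 0, 24, 24, 24]       # hypothetical hours per day
cleaned, keep_rows, keep_cols = drop_idle_and_constant(filled, production_time)
print(full_days, cleaned.ravel())
```

Merging the different daily tables on a common date key is omitted here, since it depends on how the records are stored.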

Data transformation.
The cleaned electric submersible pump data are then standardized. Because the variables differ greatly in scale, the computed coefficients would differ greatly in magnitude and the optimal solution of the model could not be found quickly during optimization. This paper uses min-max normalization to map each variable to the interval [0,1], as in equation (9):

$$x'_{ij} = \frac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}, \qquad j = 1, 2, \ldots, n, \tag{9}$$

where $x_{ij}$ is the value of the $j$-th variable in the $i$-th sample and n is the feature dimension of the data.
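A minimal sketch of min-max normalization, assuming samples are stored as rows and variables as columns. The function name and the guard for constant columns are additions made for this example; in the workflow described here, single-valued variables are already removed during cleaning.

```python
import numpy as np

def min_max_normalize(X):
    """Map every column of X linearly onto the interval [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0   # guard: constant columns map to 0
    return (X - col_min) / col_range

# Two variables on very different scales (hypothetical values).
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
X_norm = min_max_normalize(X)
print(X_norm)
```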

Principal component analysis
The variables obtained from data preprocessing are input to the principal component analysis model, and the principal components are sorted in decreasing order of variance. Taking the well ID=831353047 as an example, after data preprocessing the well has 18 parameters: wellhead temperature, pump current, pump voltage, oil pressure, daily gas production, daily liquid production, daily oil production, daily water production, gas-oil ratio, oil-gas ratio, water content, water-gas ratio, converted daily fluid production, converted daily oil production, converted daily water production, test fluid volume, test oil volume and test water volume. Principal component analysis shows that the first six principal components explain more than 99% of the variance of the original input parameters, as shown in Figure 2. Among them, the first and second principal components have the highest variance and together account for 81% of the variance of the original data.

Mahalanobis distance fault diagnosis
The principal components that explain more than 99% of the variance of the original data are extracted. Taking the well ID=831353047 as an example, the first 6 principal components are extracted to construct a data set, the Mahalanobis distance model is fitted to the training data, the Mahalanobis distance of each sample is calculated, and the maximum Mahalanobis distance is used as the failure threshold. Data visualization shows that the calculated threshold for marking anomalies is about 6.74, as shown in Figure 4. The next step is to compare the Mahalanobis distances of the training set and the test set with this threshold; a sample whose distance is greater than the threshold is marked as abnormal. As shown in Figure 5, the oil well became abnormal after 2016/8/15, whereas the actual failure date was 2016/8/22, so the anomaly was flagged 7 days in advance.

Similarly, taking the well ID=861652417 as an example, the threshold for marking anomalies is about 7.96. The oil well was detected as abnormal on 2016/3/10, and the actual failure date was 2016/3/16, 6 days later, as shown in Figure 6 and Figure 7.

Table 1 lists the calculated anomaly thresholds and compares the tubing string leakage time predicted by the Mahalanobis distance model with the actual leakage time. Table 1 shows that the leakage time predicted by the Mahalanobis distance diagnostic model is earlier than the actual leakage time. Therefore, the Mahalanobis distance diagnosis model is able to predict tubing string leakage of the electric submersible pump ahead of the actual failure.
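The thresholding procedure can be sketched as below. This is an illustrative reading of the method, not the authors' code: the function names, the synthetic principal-component scores and the simulated drift are assumptions, and the thresholds of 6.74 and 7.96 reported above come from the real well data, not from this example.

```python
import numpy as np

def distances(X, mu, cov_inv):
    """Mahalanobis distance of each row of X from mu."""
    diff = np.atleast_2d(np.asarray(X, dtype=float)) - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))

def fit_threshold(train):
    """Fit mean and inverse covariance; threshold = max training distance."""
    train = np.asarray(train, dtype=float)
    mu = train.mean(axis=0)
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(train, rowvar=False)))
    threshold = distances(train, mu, cov_inv).max()
    return mu, cov_inv, threshold

def first_alarm(test, mu, cov_inv, threshold):
    """Index of the first test sample above the threshold, or None."""
    d = distances(test, mu, cov_inv)
    above = np.flatnonzero(d > threshold)
    return (int(above[0]) if above.size else None), d

# Synthetic scores standing in for the retained principal components.
rng = np.random.default_rng(0)
train = rng.normal(size=(300, 2))
normal_days = 0.1 * train[:5]          # samples close to the training mean
drifted_days = train[:3] + 50.0        # simulated departure from normal
test = np.vstack([normal_days, drifted_days])

mu, cov_inv, threshold = fit_threshold(train)
alarm_index, d_test = first_alarm(test, mu, cov_inv, threshold)
print(alarm_index, round(float(threshold), 2))
```

Using the maximum training distance as the threshold means no training sample is ever flagged, so the training period should contain only normal operation.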

Conclusions
This paper proposes a fault diagnosis model based on principal component analysis and the Mahalanobis distance to detect tubing string leakage faults of electric submersible pumps in advance. First, principal component 1 and principal component 2 can be used to distinguish normal samples from abnormal samples; then the Mahalanobis distance can be used to diagnose the time at which the fault occurs, which helps reduce the economic loss of oil well production.