Mathematical simulation of the rate of carbon dioxide corrosion at the facilities of Gazprom dobycha Urengoy LLC

At Gazprom dobycha Urengoy LLC, as at other oil and gas production enterprises, corrosion causes increased equipment wear, and CO2 corrosion plays a special role. Despite the homogeneity of the extracted fluid and the uniform chemical composition of the working medium, the nature and intensity of corrosion damage to pipelines and equipment vary over a wide range because of the different thermobaric parameters of well operation. Various methods of statistical analysis were used to determine the parameters that influence the corrosion rate. The paper presents a methodology for building a mathematical model and assessing its reliability. As a result, an equation for carbon dioxide corrosion under the conditions of the Achimov deposits of the Urengoy oil, gas and condensate field was obtained. The form of the equation follows the classical de Waard-Milliams carbon dioxide corrosion equation. The model proposed by the authors describes carbon dioxide corrosion more reliably than the de Waard-Milliams equation does. Its disadvantage is that it does not reliably describe the corrosion rate in wells where the rate significantly exceeds the average for all wells studied.


Introduction
In October 2009, a gas condensate facility was launched to develop the Achimov deposits, which were new for PJSC Gazprom. The Achimov deposits are characterized by abnormally high reservoir parameters: pressure of up to 60 MPa and temperature of up to 106°C.
After 5 years of operation, the first corrosion damage to the inner surface of wellhead equipment elements and pipelines was identified. Figure 1 shows one of the defects. The cause of these defects was carbon dioxide corrosion. The partial pressure of carbon dioxide at the wellhead exceeds 0.21 MPa. Thus, the environment is highly aggressive according to various regulatory documents (NACE SP 0106-2006, GOST R 51365-2009, STO Gazprom 9.3-011-2011) [1][2][3]. In gas collecting pipelines, the partial pressure of carbon dioxide falls to about 0.1 MPa, but the presence of free water and stratified flow conditions cause corrosion along the lower generatrix of the pipe.
At the facilities of Gazprom dobycha Urengoy LLC exposed to carbon dioxide corrosion, a corrosion monitoring system has been implemented that measures the corrosion rate in different parts of the gas collection system [4,5]. The measurements revealed the following pattern. In sections upstream of the pressure regulator, which are characterized by high temperatures and pressures, the corrosion rate is higher than in sections downstream of the pressure regulator, where the thermobaric parameters are lower. The explanation is that the rate of carbon dioxide corrosion depends on the temperature and on the partial pressure of the gas, which in turn depends on the operating pressure.
A large number of theoretical, empirical and semi-empirical models exist to describe these dependencies [6,7]. The de Waard-Milliams model is considered the most popular and is already a classic [8,9]. However, under the conditions of the Achimov deposits this model gives corrosion rates that are too high compared with the actual ones. The authors therefore set out to develop their own model describing the rate of carbon dioxide corrosion under the conditions of the Achimov deposits.

Materials and Method
To carry out a multi-factor analysis of the impact of various parameters on the corrosion rate, the following data were selected: the results of chemical analyses of formation water samples, the results of gas condensate studies, the average values of pressure, temperature and flow rate for each well, and corrosion rates obtained by the gravimetric method. All data were summarized in a matrix of 72 rows and 28 columns, where 72 is the number of observations and 28 is the number of factors that could affect the corrosion rate.
The generated data array underwent a preprocessing procedure that included:
1. Elimination of missing observations. In this step, observations in which at least one of the factors was missing were removed from the data set. For example, wells lacking data on the chemical composition of water were excluded.
2. Transformation of qualitative features into numerical (binary) ones. For example, one of the layers from which the fluid was extracted was assigned the value 0 and the other layer the value 1.
3. Application of the bootstrapping method. Because multi-factor analysis requires a large amount of data, the bootstrapping method was used to artificially increase the amount of information and improve the quality of the future model [10,11]. The essence of the method is to form a set of samples by random selection with repetitions; ultimately the entire sample assumes the normal form of distribution, which allowed further use of standard methods of mathematical statistics and data analysis.
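The three preprocessing steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' RStudio workflow: the record fields, the layer labels "A" and "B", and all numbers are hypothetical.

```python
import random

# Hypothetical toy records; field names and values are illustrative, not the authors' data.
records = [
    {"layer": "A", "T": 60.0, "P": 20.0, "v_corr": 0.12},
    {"layer": "B", "T": 55.0, "P": None, "v_corr": 0.08},   # missing pressure
    {"layer": "B", "T": 70.0, "P": 25.0, "v_corr": 0.20},
    {"layer": "A", "T": 65.0, "P": 22.0, "v_corr": 0.15},
]

def drop_incomplete(rows):
    """Step 1: keep only observations in which no factor is missing."""
    return [r for r in rows if all(v is not None for v in r.values())]

def encode_layer(rows, mapping):
    """Step 2: replace the qualitative 'layer' label with a 0/1 code."""
    return [{**r, "layer": mapping[r["layer"]]} for r in rows]

def bootstrap(rows, n_samples, rng):
    """Step 3: draw observations at random with replacement."""
    return [rng.choice(rows) for _ in range(n_samples)]

clean = encode_layer(drop_incomplete(records), {"A": 0, "B": 1})
resampled = bootstrap(clean, 10, random.Random(0))
```

Passing a seeded `random.Random` keeps the resampling reproducible, which helps when the resampled set is later used to compare models.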
After preliminary data processing, the mean value and the mean square deviation (hereinafter referred to as MSD) were found for each analyzed factor. For some factors, the MSD exceeded the mean value of the factor, which indicates a large spread in values.
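A check of this kind might look like the sketch below. It assumes that MSD corresponds to the sample standard deviation; the factor names echo labels used in the text, and the numbers are invented for illustration.

```python
import statistics

# Hypothetical factor values; the numbers are made up.
factors = {
    "T": [60.0, 62.0, 58.0, 61.0],
    "QH2O": [0.1, 5.0, 0.2, 12.0],
}

def spread_summary(values):
    """Return the mean, the MSD, and whether the MSD exceeds the mean."""
    mean = statistics.mean(values)
    msd = statistics.stdev(values)   # assumption: MSD is the sample standard deviation
    return mean, msd, msd > mean

summary = {name: spread_summary(v) for name, v in factors.items()}
```

In this toy data the temperature has a small spread, while the water content is flagged because its MSD is larger than its mean.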
The next step was a factor analysis aimed at finding the factors that most affect the corrosion rate. The principal component method was chosen for it as the most frequently used, owing to its ease of use and transparency. This method was used to exclude as many factors as possible from the model, because a large number of variables makes a mathematical model unstable, and the reliability of such a model may be in doubt.
In RStudio, a diagram was built (Figure 2) showing the share of the total variance accounted for by each principal component.
The higher the column, the more of the variation the corresponding principal component accounts for. The principal components are arranged in descending order. Thus, the first two principal components account for about 60% of the total variation. Next, the contribution (in percent) of each analyzed factor to the first two principal components was estimated, in other words, how strongly the factor affects the principal components.
After that, a correlation map was drawn in the same software (Figure 3). The correlation map shows positively correlated factors (grouped together) and negatively correlated factors (located in opposite quadrants), as well as the level of variability and significance of the factors: a location closer to the edge of the circle indicates greater variability, while a location closer to the center indicates lesser variability. The diagram shows that the corrosion rates in 2016 (Vcorr16) and in 2017 (Vcorr17) are related to the temperature (Ту) and flow rate (Qpl) and, along the first principal component, are also positively correlated with the operating pressure (Pу) and, accordingly, with the partial pressure of carbon dioxide (PСО2). Some other factors that correlate positively with the corrosion rate (water content, QH2O; content of hydrocarbon components C5+в, PС5В) have an excessively large mean square deviation and cannot be included in the model.
Thus, a preliminary hypothesis was formulated that the temperature, pressure and flow rate of the wells have the greatest influence on the corrosion rate. To test this hypothesis, a multidimensional regression model was built, again in RStudio, using all factors included in the original matrix.
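The share of variance carried by the first principal component can be illustrated with a small pure-Python sketch. It assumes the analysis is run on standardized factors (that is, on the correlation matrix) and uses power iteration instead of the RStudio routines the authors worked with; the three data columns are hypothetical.

```python
import math
import statistics

def standardize(columns):
    """Scale each column to zero mean and unit sample variance."""
    out = []
    for col in columns:
        m, s = statistics.mean(col), statistics.stdev(col)
        out.append([(x - m) / s for x in col])
    return out

def correlation_matrix(columns):
    """Correlation matrix of the given factor columns."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def first_component(matrix, iterations=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(
        vec[i] * sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue, vec

# Hypothetical columns: temperature, pressure, and a loosely related third factor.
T = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0]
P = [18.0, 20.5, 22.0, 25.0, 26.5, 30.0]
other = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]

corr = correlation_matrix([T, P, other])
eigenvalue, loadings = first_component(corr)
# The trace of a correlation matrix equals the number of factors,
# so this ratio is the share of total variance in the first component.
share = eigenvalue / len(corr)
```

The entries of `loadings` play the role of the factor contributions discussed above: factors with loadings of the same sign and similar size would appear grouped together on a correlation map.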
The result of this work was a multidimensional regression model presented as a table listing, for each factor, the regression coefficient, the MSD, the value of Student's t-test and the p-value (significance level). These criteria make it possible to estimate the statistical significance of each factor and then exclude the least significant factors from the model.
On the basis of the obtained model, it was decided to reduce it: all factors except the temperature, pressure and flow rate of the well were excluded. A model including only these factors was then built (Table 1). The next stage of the study was to assess the reliability of the model. For this purpose, three diagrams were built.
Using the Gauss-Markov theorem, an array of data was formed on which the first diagram was built (Figure 4); it makes it possible to evaluate whether the model type was chosen correctly. The resulting dependence indicates that the model type (linear relationship) was chosen correctly. If the desired dependence had a different nature, the trend line (red line) would deviate from a straight line. The diagram also clearly shows anomalous points (No. 1, 11, 22) that do not fall on the model. The greater the spread of points around the trend line, the less adequate the resulting model is.
The following diagram (Figure 5) is a scatter plot of observed and expected (standardized) values for a specified distribution. If the observed values fall on a straight line, the theoretical distribution fits the observed data well. In addition, the diagram shows a dotted line of Cook's distance (in the upper right corner). The closer an observation lies to this line, the more it shifts the predicted corrosion rate away from the actual value. According to this diagram, there are anomalous observations (No. 11, 22, 111), which coincide with those in the previous two diagrams.
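The points of a scatter plot of observed against expected standardized values can be produced along the following lines. This sketch assumes a normal reference distribution and the plotting positions (i + 0.5)/n, neither of which is stated in the text; the residuals are hypothetical.

```python
import statistics

def qq_points(residuals):
    """Pair theoretical standard-normal quantiles with sorted standardized residuals."""
    n = len(residuals)
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)
    observed = sorted((r - mean) / sd for r in residuals)
    normal = statistics.NormalDist()          # standard normal, mean 0 and sigma 1
    theoretical = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, observed))

# Hypothetical residuals, for illustration only.
points = qq_points([-0.03, 0.01, 0.02, -0.01, 0.00, 0.04, -0.02])
```

If the pairs in `points` lie close to the line y = x, the residuals are consistent with the assumed normal distribution.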
Next, the Cook's distance method [12] was used to reveal anomalous corrosion rate values that create high variability in the regression model. Cook's distance shows the difference between the calculated coefficients of the regression equation and the values that would be obtained if the corresponding observation were excluded. Roughly equal Cook's distances for all observations indicate that the model is adequate; if the distance for some observation stands out, it can be assumed that this observation shifts the estimates of the regression coefficients. Figure 7 shows the results of the algorithm in identifying anomalous corrosion rate values. The numeric labels correspond to the index of the well with an anomaly. Using this method, four anomalous observations located above the red line were excluded from the model (Table 2). Based on the evaluation of the statistical significance of each factor, it was decided to exclude the flow rate from the model, because the significance level obtained for this predictor (0.9524) is insufficient. After that, the model was rebuilt and its reliability was evaluated in the same way as for the original model. After rebuilding, the scatter of the actual values around the predicted corrosion rates decreased visibly, which indicates the adequacy of the model obtained.
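As an illustration of how Cook's distance singles out influential observations, the sketch below computes it for a straight-line fit with a single predictor. This is a simplification of the authors' multi-factor model; the data are hypothetical, and the 4/n cut-off is a common rule of thumb that is not necessarily the threshold drawn as the red line in Figure 7.

```python
import statistics

def cooks_distances(x, y):
    """Cook's distance for each point of a least-squares line y = a + b*x."""
    n = len(x)
    p = 2                                   # fitted coefficients: intercept and slope
    x_mean, y_mean = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [e * e / (p * mse) * h / (1.0 - h) ** 2
            for e, h in zip(residuals, leverage)]

# Hypothetical data: the last point is a deliberate outlier.
temperature = [50.0, 55.0, 60.0, 65.0, 70.0, 75.0, 80.0, 85.0]
rate = [0.10, 0.12, 0.13, 0.15, 0.17, 0.18, 0.20, 0.60]

distances = cooks_distances(temperature, rate)
threshold = 4.0 / len(temperature)          # rule-of-thumb cut-off (assumption)
flagged = [i for i, d in enumerate(distances) if d > threshold]
```

An observation is flagged when it combines a large residual with high leverage, which is why the outlier at the edge of the temperature range dominates here.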
The classical de Waard-Milliams equation for estimating carbon dioxide corrosion (1) contains the same variables as the model obtained by the authors. The only difference is that the de Waard-Milliams equation uses the partial pressure of carbon dioxide; however, because the CO2 content is the same for all wells, the operating pressure is directly proportional to the partial pressure of CO2.
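For orientation, one commonly quoted form of the de Waard-Milliams correlation is log10 Vcor = 5.8 - 1710/T + 0.67 log10 PCO2, with Vcor in mm/y, T in K and PCO2 in bar, which matches the units in the list of symbols. Whether this is exactly the form the authors used as equation (1) is an assumption, since the equation is not reproduced in this text. The sketch below also treats the partial pressure as the CO2 mole fraction times the operating pressure; the mole fraction and pressures are illustrative only.

```python
import math

def de_waard_milliams(temp_k, p_co2_bar):
    """Corrosion rate in mm/y from a commonly quoted de Waard-Milliams form (assumed)."""
    log_v = 5.8 - 1710.0 / temp_k + 0.67 * math.log10(p_co2_bar)
    return 10.0 ** log_v

def partial_pressure(total_pressure_bar, co2_mole_fraction):
    """Partial pressure of CO2 as mole fraction times operating pressure."""
    return co2_mole_fraction * total_pressure_bar

# Hypothetical inputs: the CO2 mole fraction and the pressures are illustrative only.
y_co2 = 0.01
v_low = de_waard_milliams(333.15, partial_pressure(100.0, y_co2))
v_high = de_waard_milliams(333.15, partial_pressure(200.0, y_co2))
ratio = v_high / v_low      # equals 2 ** 0.67 for a doubled pressure at fixed temperature
```

Because the mole fraction is constant, doubling the operating pressure doubles the partial pressure, so the predicted rate changes by the same factor whichever pressure is used as the variable; only the constant term of the equation shifts.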
Therefore, the next stage of the work was the construction of a model of the de Waard-Milliams type. The refined model parameters for the resulting equation (2) are presented in Table 3. Figures 8-9 show a graphical comparison of the actually measured corrosion rates with the corrosion rates predicted by the two models.
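Fitting a model of this type reduces to ordinary least squares once the variables are transformed. The sketch below assumes the functional form log10 V = a + b/T + c log10 P and checks the fit on synthetic data generated from made-up coefficients; it does not reproduce the authors' coefficients from Table 3.

```python
import math

def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_dwm_type(temps_k, pressures_bar, rates):
    """Least-squares fit of log10(V) = a + b/T + c*log10(P) via the normal equations."""
    # The regressor 1000/T is used instead of 1/T only to keep the system well scaled.
    rows = [[1.0, 1000.0 / t, math.log10(p)] for t, p in zip(temps_k, pressures_bar)]
    target = [math.log10(v) for v in rates]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    a, b_scaled, c = solve_linear(xtx, xty)
    return a, b_scaled * 1000.0, c

# Synthetic check: rates generated from known, made-up coefficients should be recovered.
true_a, true_b, true_c = 4.0, -1500.0, 0.5
temps = [320.0, 330.0, 340.0, 350.0, 360.0, 335.0]
press = [150.0, 120.0, 200.0, 90.0, 170.0, 250.0]
rates = [10 ** (true_a + true_b / t + true_c * math.log10(p))
         for t, p in zip(temps, press)]
a, b, c = fit_dwm_type(temps, press, rates)
```

With real field data the recovered coefficients would carry uncertainty, so their standard errors and p-values would need to be examined in the same way as for the linear model above.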

Discussion
In the model, all terms are statistically significant predictors. The coefficient of determination (R²) for the obtained model is 0.47. Given that all the initial data were obtained under field conditions, the model can be considered sufficiently adequate.
The model developed by the authors describes carbon dioxide corrosion at the facilities of the Achimov deposits with a better correlation than the de Waard-Milliams model, even though the de Waard-Milliams equation rests on a theoretical basis combined with the results of laboratory studies.
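For reference, the coefficient of determination compares the residual scatter with the total scatter of the observations. A minimal sketch with hypothetical measured and predicted rates:

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted corrosion rates, mm/y.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.15, 0.15, 0.35, 0.35]
r2 = r_squared(observed, predicted)
```

On this reading, a value of 0.47 means that the model accounts for a little under half of the variance in the measured corrosion rates.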
The resulting equation is of practical importance at the stage of commissioning new wells, where it can be used to predict corrosion rates until the corrosion rate can actually be measured.
2. To identify the main factors affecting the corrosion rate, a statistical analysis was carried out using the principal component method and multidimensional regression models. It was established that the factors most correlated with the corrosion rate under the conditions of the Achimov deposits are temperature and pressure;
3. An anomaly search algorithm based on Cook's distance was implemented. Using this algorithm, four observations were excluded from the total sample;
4. After exclusion of the anomalous wells, an updated regression model of the de Waard-Milliams type was built;
5. The proposed model describes carbon dioxide corrosion more accurately than the de Waard-Milliams equation;
6. The disadvantage of the developed model is that it does not describe the corrosion rate in wells with abnormal corrosion rate values.

List of symbols
Vcor corrosion rate, mm/y
PCO2 partial pressure, bar
T temperature, K