Prediction of centrifuge capillary pressure using machine learning techniques

In the current petroleum industry literature, machine learning has been used to predict capillary pressure only at the centrifuge data points, not the complete capillary pressure curves generated from existing correlations after analysis. This paper presents novel information that will benefit the petroleum industry, as it shows that machine learning techniques can be used to obtain the complete capillary pressure curve, which is the end goal of undertaking a SCAL centrifuge experiment. This research involves testing core samples using a centrifuge setup to produce capillary pressure data points. The collected data is then processed with commercial SCAL interpretation software to generate complete capillary pressure curves based on established literature correlations. RCAL data for the core samples is also obtained for use with the machine learning techniques. The machine learning models are then applied to the collected data to predict the capillary pressure curves, and the different techniques are optimized to improve the predictions. The results show that the machine learning techniques perform very well on the validation set after being trained on the training set. The models also provide reasonable predictions of the complete capillary pressure curves on the testing data set. Varying the parameters of each technique shows their effect on overall precision and the improvements that can be made. Further research can examine the effectiveness of machine learning techniques in predicting other SCAL properties such as relative permeability. This could greatly reduce the time needed to obtain these extremely important properties for reservoir characterization.


Introduction
Researchers continue to find new ways to reduce the time and improve the accuracy of reservoir characterization [1]. Specialized core analysis (SCAL) examines important reservoir characteristics such as capillary pressure and relative permeability to estimate incremental oil recovery during secondary water or gas flooding and tertiary enhanced oil recovery. Arguably, SCAL is more time consuming and has more uncertainty than routine core analysis and fluid characterisation. Capillary pressure (Pc) is the pressure difference between two immiscible phases in a capillary tube or, in this case, within the rock's pores [2]. Capillary pressure is measured in a lab through spontaneous imbibition, gravity drainage, and forced imbibition tests. Pc is used directly in reservoir simulators to account for unrecoverable oil, and it is further used to estimate relative permeability, that is, the relative conductivity of the rock to more than one fluid phase simultaneously. The three main test methods used to determine capillary pressure curves are centrifuge, porous plate, and mercury injection [2]. The advantages and disadvantages of these methods are summarized in Table 1. It is worth pointing out that mercury injection capillary pressure (MICP) does not respond to wettability components in the pores, since Hg is an ideal non-wetting fluid. Hence MICP reflects only the pore geometry and therefore differs slightly from other tests that use oil/brine/gas.
The oil and gas industry has always been highly competitive and somewhat unpredictable. In recent years, companies have begun using machine learning techniques to address challenges in data processing and handling across various oil and gas activities, such as reducing risk factors and the cost of maintenance [3]. In current literature, machine learning has been used to predict the experimental capillary pressure data points obtained from centrifuge and mercury injection tests, with varying results [4][5][6]. For the case that did use centrifuge data, correlations were not applied to the experimental results to obtain a complete capillary pressure curve.
This research aims to obtain complete capillary pressure curves generated from existing correlations after analysis of centrifuge data, and then to test the novelty of using machine learning techniques to predict the experimental capillary pressure curve and water saturation. Three machine learning techniques (artificial neural network, support vector regression, and random forest regression) were used to predict the capillary pressure from centrifuge test samples, with routine core analysis data (porosity, permeability, bulk density, and irreducible water saturation) used as inputs. ML-based approaches generate a multipoint capillary pressure curve as a function of fluid saturation using the provided input parameters. The ML prediction is therefore not a single value; like the experimentally measured and calculated capillary pressure values, it is recorded at discrete fluid saturation points. Although experimental techniques are broadly used to generate the capillary pressure curve of core samples, they are expensive and require core samples to be at reservoir conditions, which is very difficult to achieve. As a result, researchers ponder whether data-driven

Machine Learning Technique Types
Machine learning is the broad study of programming algorithms that can learn through experience from the use of a variety of data [7]. Machine learning techniques are categorized as either supervised or unsupervised learning, with supervised learning further divided into classification and regression, as shown in Fig. 1. The supervised learning regression techniques used in this research are briefly explained as follows.

Support Vector Regression
Support vector regression (SVR) is a popular technique in which support vector machines are adapted for regression with a quantitative response [9]. The technique provides flexibility in defining the acceptable error for the model and determines an appropriate line or hyperplane fit to the data [10]. For SVR, the objective is to minimize the L2 norm of the coefficient vector. One strength of SVR is the variety of kernel functions available to select or modify depending on the requirements of the predictions [11].
The kernel function maps data that are not linearly separable into a higher dimensional space in which they become linearly separable. In addition, because the model depends only on support vectors, not all training data is needed for predictions. The support vectors determine the decision boundaries between the data from different trends that are used to perform the regression. SVR is a supervised technique in which data are fed into the model in (input, output) pairs.
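To make the notions of acceptable error and kernel function concrete, the following is a minimal sketch of two standard SVR building blocks: the epsilon-insensitive loss, which ignores errors inside a tolerance band, and a Gaussian (RBF) kernel. The function names and the default values of `epsilon` and `gamma` are illustrative assumptions, not settings reported in this study.

```python
import math


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Return zero inside the epsilon tube and the linear excess outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors of equal length."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# An error of 0.05 falls inside the tube and is not penalized;
# an error of 0.5 is penalized only for the part beyond epsilon.
print(epsilon_insensitive_loss(1.0, 1.05))
print(epsilon_insensitive_loss(1.0, 1.5))
print(rbf_kernel([0.2, 0.5], [0.2, 0.5]))
```

Widening the tube makes the fit more tolerant of scatter in the training data, while the kernel controls how non-linear the fitted relationship can be.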

Artificial Neural Network
An artificial neural network (ANN) is another regression technique that learns from processing (input, output) data pairs supplied through the training-testing split of the input data.
The more examples and variety of inputs the model is given, the more accurately it typically predicts outputs [9]. Fig. 2 below shows the architecture of an ANN model, where the X nodes are the input layer, the Z nodes are the hidden layer, and the Y nodes are the output layer. For this study, X consists of porosity, permeability, bulk density, and irreducible water saturation, and Y consists of capillary pressure and water saturation. The number of hidden layers and epochs (the number of cycles through the training data) can be varied to see their effect on the predictions. Each hidden layer takes a linear combination of the outputs of the previous layer and passes it through a non-linear function, also called the activation function. The activation function can be hyperbolic tangent (tanh), sigmoid, Rectified Linear Unit (ReLU), linear, etc.
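The layer computation described above can be sketched in plain Python as a forward pass through one tanh hidden layer and a linear output layer. The weights, biases, layer sizes, and scaled feature values below are hypothetical and untrained; they only illustrate how four inputs map to two outputs (capillary pressure and water saturation).

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def identity(value):
    """Linear activation used for the regression outputs."""
    return value


def forward(features, hidden_w, hidden_b, out_w, out_b):
    """Input layer X -> hidden layer Z (tanh) -> output layer Y (linear)."""
    hidden = dense(features, hidden_w, hidden_b, math.tanh)
    return dense(hidden, out_w, out_b, identity)


# Hypothetical scaled inputs: porosity, permeability, bulk density, Swir.
features = [0.25, 0.60, 0.45, 0.15]
# Hypothetical, untrained parameters: 3 hidden neurons, 2 outputs.
hidden_w = [[0.5, -0.2, 0.1, 0.3],
            [-0.4, 0.6, 0.2, -0.1],
            [0.1, 0.1, -0.3, 0.2]]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[0.7, -0.5, 0.2],
         [-0.3, 0.4, 0.6]]
out_b = [0.05, 0.5]

prediction = forward(features, hidden_w, hidden_b, out_w, out_b)
print(prediction)  # [scaled capillary pressure, scaled water saturation]
```

Training adjusts the weights and biases over successive epochs; adding hidden layers amounts to chaining further `dense` calls.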

Random Forest Regression
Random forest regression (RFR) is a supervised learning technique that utilizes ensemble learning for regression analysis. The method combines predictions from multiple learners to obtain a more accurate prediction [12]. RFR is also one of the most powerful regression techniques and is useful for non-linear relationships. In a typical RFR model, multiple decision trees are constructed on the training data and the mean of the tree outputs is returned as the prediction. Each tree processes the input parameters (porosity, permeability, bulk density, and irreducible water saturation) to predict capillary pressure and water saturation. Capillary pressure and water saturation are therefore the outputs of each tree, and the final output is the average of the outputs of all trees in the forest.
However, it should be pointed out that capillary pressure curves generated by machine learning methods and by experimental methods are often not accurate and seldom match the distribution of wetting and non-wetting phases sensed by imaging techniques. An ML-based method can predict the water saturation and capillary pressure curve much faster than laboratory methods, and potentially with no more uncertainty. ML-based methods can also be generalized to different fields if enough samples are used to train the models. In this manner, the need for core samples from new fields at reservoir conditions is mitigated.
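The averaging step can be illustrated with a short sketch in which each "tree" is reduced to a hypothetical one-split stump that returns a (capillary pressure, water saturation) pair. The thresholds, leaf values, and feature names are invented for illustration; a real forest would grow deeper trees on bootstrap samples of the training data.

```python
def forest_predict(trees, sample):
    """Average the multi-output predictions of every tree in the forest."""
    outputs = [tree(sample) for tree in trees]
    n_targets = len(outputs[0])
    return [
        sum(out[j] for out in outputs) / len(outputs)
        for j in range(n_targets)
    ]


def make_stump(feature, threshold, below, above):
    """Hypothetical one-split tree returning a (Pc, Sw) pair."""
    def stump(sample):
        return above if sample[feature] > threshold else below
    return stump


# Three illustrative trees, each splitting on a different input.
trees = [
    make_stump("porosity", 0.20, (30.0, 0.60), (10.0, 0.40)),
    make_stump("permeability", 500.0, (25.0, 0.55), (20.0, 0.50)),
    make_stump("swir", 0.15, (30.0, 0.60), (35.0, 0.65)),
]
sample = {"porosity": 0.25, "permeability": 800.0,
          "bulk_density": 2.1, "swir": 0.10}

pc, sw = forest_predict(trees, sample)
print(pc, sw)
```

Because each tree is piecewise constant, the averaged prediction is also piecewise constant, which is one plausible reason a forest can produce a stepped or noisy curve.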
Literature Review

Centrifuge Capillary Pressure Correlations
The centrifuge capillary pressure problem is the challenge of determining local saturations along the length of the core from the average saturations measured during the centrifuge experiment [13]. To solve it, numerous correlations have been developed that take the general form of Eq. 1, in which three correlation-specific terms vary depending on the correlation used [14]. The main assumptions are that hydrostatic equilibrium is reached in each phase and that the capillary pressure is zero at the outflow face as a boundary condition [13].
In Eq. 1, $S$ is the local saturation and $\bar{S}$ is the average saturation measured during the centrifuge experiment. $P_{c1}$ is determined from Eq. 2:
$$P_{c1} = \tfrac{1}{2}\,\Delta\rho\,\omega^{2}\left(r_3^{2} - r_1^{2}\right) \qquad (2)$$
where $\Delta\rho$ is the difference in the fluid densities, $r_1$ and $r_3$ are the radii at the inner and outer faces of the sample in the centrifuge core holder, and $\omega$ is the rotation speed.
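As a worked illustration, the capillary pressure at the inner face can be computed from the rotation speed with the standard centrifuge relation Pc1 = 0.5 Δρ ω² (r3² − r1²), where r1 and r3 are the inner and outer radii. The sketch below assumes SI units and converts rpm to rad/s; the density difference and radii in the example are hypothetical values, not measurements from this study.

```python
import math


def inlet_capillary_pressure(delta_rho, rpm, r_inner, r_outer):
    """Capillary pressure at the inner face in Pa.

    delta_rho: fluid density difference in kg/m^3
    rpm: rotation speed in revolutions per minute
    r_inner, r_outer: radii of the inner and outer core faces in m
    """
    omega = rpm * 2.0 * math.pi / 60.0  # angular speed in rad/s
    return 0.5 * delta_rho * omega ** 2 * (r_outer ** 2 - r_inner ** 2)


# Hypothetical speed schedule from 500 to 3500 rpm in steps of 500 rpm.
for rpm in range(500, 4000, 500):
    pc1 = inlet_capillary_pressure(200.0, rpm, 0.05, 0.10)
    print(rpm, round(pc1 / 1000.0, 2), "kPa")
```

Because the pressure scales with the square of the rotation speed, equal increments in rpm give increasingly large pressure steps.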

Hassler Brunner Method
The simplest solution to the centrifuge capillary problem was developed by Hassler and Brunner; it is also the poorest solution, as it neglects both radial and gravity effects [15]. These assumptions can be satisfied for very short and narrow samples spun far from the rotation axis. In the case of the general centrifuge capillary problem, the Hassler-Brunner equation reduces to the following:
$$S(P_{c1}) = \bar{S} + P_{c1}\,\frac{d\bar{S}}{dP_{c1}} \qquad (3)$$
Forbes demonstrated that this solution is always significantly lower, in terms of saturation, than the true solution [14].
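The Hassler-Brunner solution is commonly written as S = d(Pc1 · S_avg)/dPc1, which is equivalent to S_avg + Pc1 · dS_avg/dPc1. A minimal numerical sketch, assuming tabulated average saturations at strictly increasing pressures and using simple finite differences (central in the interior, one-sided at the ends), is given below. The data are synthetic and chosen so that the exact answer is known.

```python
def hassler_brunner(pc, s_avg):
    """Local saturation from average saturation via d(Pc * S_avg)/dPc.

    pc must be strictly increasing and contain at least two points.
    """
    product = [p * s for p, s in zip(pc, s_avg)]
    last = len(pc) - 1
    local = []
    for i in range(len(pc)):
        lo = max(i - 1, 0)      # one-sided difference at the first point
        hi = min(i + 1, last)   # one-sided difference at the last point
        local.append((product[hi] - product[lo]) / (pc[hi] - pc[lo]))
    return local


# Synthetic curve: S_avg = 0.2 + 0.8 * (10 / Pc), so Pc * S_avg is linear
# in Pc and the local saturation is exactly 0.2 at every point.
pc = [10.0, 20.0, 40.0, 80.0]
s_avg = [0.2 + 0.8 * 10.0 / p for p in pc]
s_local = hassler_brunner(pc, s_avg)
print(s_local)
```

For real centrifuge data the differentiation amplifies measurement noise, which is one reason smoothed or fitted average-saturation curves are usually differentiated instead of raw points.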

Forbes First Solution
Forbes built upon previous solutions to the centrifuge capillary problem and developed his own solution, which still neglects both radial and gravity effects but incorporates the difference in the centrifuge core radii [13]. Forbes' solution takes the form shown in Eq. 4, with its supporting terms defined in Eqs. 5 and 6.

Spline Functions
Nordtvedt and Kolltveit were the first to approximate the wetting phase saturation as a function of the capillary pressure, whereby the capillary pressure curve is approximated piecewise by a polynomial [16]. This parameter estimation technique is usually related to functions which are too simplistic to describe the shape of capillary pressure curves [15]. A linear system of equations is then obtained, and the piecewise approximation polynomial takes the form shown in Eq. 7, in which each polynomial piece is indexed by its interval number, up to the total number of intervals.

Current Machine Learning Applications
Busaleh et al. used several types of machine learning techniques to predict capillary pressure in carbonate oil reservoirs from mercury injection drainage capillary pressure data [4]. Their data consisted of 202 carbonate core samples, each containing at least 70 capillary pressure versus saturation points in addition to the corresponding porosity, permeability, and grain density for the core. The inputs to their machine learning techniques were porosity, permeability, grain density, and water saturation, with capillary pressure as the output. They used a training-testing split of 70-30 of the core sample data. The variety of machine learning techniques produced mixed results. ANN-trainbr, ANN-trainlm, ANN-trainscg, and ANN-trainoss refer to neural networks using Bayesian regularization backpropagation, Levenberg-Marquardt optimization, scaled conjugate gradient, and one-step secant methods, respectively.
The decision tree algorithm performed the best on the training data; however, this is likely due to overfitting, as it performed considerably worse on the testing set. The ANN-trainbr method performed the best overall; however, there is still significant room for improvement [4].
Jamshidian et al. obtained centrifuge capillary pressure data for 15 core samples, each having 27 capillary pressure data points, for a total of 405 data points. They completed seven cases to see the effects of additional input parameters on the final predictions [5]. An ANN model with a Cuckoo optimization algorithm was used to predict capillary pressure as the only output, which was then compared against the experimental centrifuge values. The training set consisted of 8 cores, with the remaining 7 used for validation.
Table 2 above shows the different cases modelled with the addition of input parameters. Water saturation, air permeability, porosity, irreducible water saturation, water permeability, oil permeability at irreducible water saturation, and grain density were the contributing input parameters. As shown in Fig. 3, increasing the number of input parameters correlates with an increase in the correlation coefficient for both the training and testing data sets. Cases 1 through 7 correspond to the ANN modelling cases and contributing input parameters shown in Table 2. The best prediction is obtained when all the input parameters are used. Fig. 4 below highlights the accuracy of the predictions for core number 9 under the case 7 scenario.
Eq. 9 above is the Carman-Kozeny equation, which was used to determine the average pore size. The data consisted of 214 samples of mercury injection data with permeability and porosity data, and the J function versus water saturation was calculated for each curve. The workflow consisted of formulating grouping features from porosity, permeability, and pore-throat radius. The resulting grouping features can be seen in Eqs. 10 and 11. This approach was compared against the flow zone indicator (FZI) method, which is a rock typing method that groups capillary pressure data into different rock types [6]. The WCSS is the sum of squared deviations between each cluster's data points and the cluster's centroid. In this case, as shown in Fig. 5, four scenarios were considered: 6 clusters, based on the point of maximum curvature, and three higher cluster scenarios of 10, 15, and 20 clusters.
The results of the cluster scenarios based on the grouping features can be seen in Fig. 6.
Fig. 6. Clustering Scenarios Based on Grouping Features [6]
From the clusters generated in each scenario, an average J curve was calculated from the J curves of the samples identified in each cluster, as seen in Fig. 7.
Fig. 7. Average J Curves Based on Clustering Scenarios [6]
An example of the average J curves plotted against the J curves for each sample identified in a cluster is shown in Fig. 8, which shows the curves for clusters 1 and 2 in the 6 cluster scenario. Next, classification algorithms (supervised machine learning) were used to develop maps for predicting the corresponding clusters for new samples. The algorithms used were K-nearest neighbours (KNN), kernel support vector machine (KSVM), decision tree (DT), and random forest (RF), which were validated using six carbonate rocks and comparing with the respective experimental data. The best predictions were for the 6 and 10 cluster scenarios, with the accuracy of each algorithm given below in Table 3. As shown, KSVM provided the highest accuracy in the predictions. This algorithm was subsequently used to classify the validation set (6 carbonate rocks). The rock properties and grouping features for each rock can be seen in Table 4, along with the identified cluster for each rock.
It is important to note here that the predictions were completed on experimental centrifuge capillary pressure points with water saturation used as an input. Seven different cases were conducted, with each case involving an additional input parameter, which in turn increased the accuracy of the predictions. The limitation, however, was the use of the raw centrifuge experimental points. Interpretation software was not used to apply literature correlations to obtain a complete capillary pressure curve.
Kasha et al. took a different approach, first using an unsupervised algorithm (clustering) followed by classification algorithms. The limitation here is the need for an abundance of mercury injection data for many samples to be clustered together so that an accurate average J curve for those types of rocks can be developed. Other approaches could also have been taken with regard to the pore size distribution, such as the Winland r35 values.
This research aims to show the novelty of using machine learning techniques to predict complete capillary pressure curves developed from centrifuge tests followed by the use of SCAL interpretation software to apply literature correlations. This in turn can reduce the time needed to obtain a complete capillary pressure curve and help improve reservoir characterization, which is the end goal of undertaking a SCAL centrifuge experiment. Although several researchers have used ML methods to predict SCAL data, they have focused on predicting a single value of the quantity of interest (relative permeability, capillary pressure, water saturation, etc.). This study assesses the capability of three machine learning algorithms to generate capillary pressure curves that consist of both capillary pressure and water saturation. Therefore, the novelty of this paper is the simultaneous prediction of capillary pressure and water saturation values to generate the capillary pressure curve.

Centrifuge Capillary Pressure Tests
For this research, multiple centrifuge capillary pressure tests were conducted. The setup consisted of a VINCI RC 4500 capillary pressure refrigerated centrifuge along with a computer with the appropriate recording and interpretation software. The interpretation software used was CYDAR, which supports the interpretation of conventional and specialized core analysis experiments. CYDAR is also used to record the water saturation and speed versus time during the experiment. A separate VINCI software package is used to set the rotational speeds and block times for the centrifuge, as well as to set up the camera interfaces for the holding cups.
Core samples were first cleaned and then dried in an oven for approximately 1-2 days. After drying, the length and diameter of each core were recorded using a Vernier calliper. Next, the dry weight of each core was recorded. The petrophysical properties of the cores used in the machine learning can be seen in Table 5 below. Next, a synthetic brine was prepared to simulate formation water; its composition is shown in Table 6 below. The cores were then loaded into a vessel into which approximately 1 L of brine was pumped to saturate the cores for approximately one day. Due to the high permeability of the cores, one day proved sufficient to saturate them. After saturation, the wet weight of the cores was measured. This data was then used to calculate the porosity and bulk density of each sample.
The cores were then loaded into centrifuge core holders, where the top, bottom pipe, and sleeve were determined from the core length, as shown in Fig. 9a. The core holders were pressurized to 2000 psi using a hand pump and silicone oil. Small amounts of deionized water and oil were put into the holding cups to create an interface, as seen in Fig. 17b. The core holders were also weighed to ensure symmetrical weight before being loaded into the centrifuge.
Once the core holders were loaded in the centrifuge, the recorded data was entered into CYDAR and the recording software was started. Multistep speeds (from 500 rpm to either 3500 or 4500 rpm in increments of 500 rpm) and step durations were specified, and the camera was set to record the oil-water interface for the respective cups, as shown in Fig. 10. This figure shows an oil-water system, with the right arrow indicating the water side and the left arrow indicating the oil side. The arrow pointing between the two indicates the oil-water interface. Throughout the drainage experiment, as the oil is forced into the core and the water out, the interface gradually moves towards the left. The camera was also checked periodically to ensure it was still properly recording the interface.
Upon completion of the centrifuge test, the next step was to process the recorded data using the CYDAR interpretation software. The water saturation versus time data was cleaned to exclude any outliers that the camera may have recorded, and the block times and durations were recorded and entered for CYDAR to calculate the capillary pressure based on the step speeds. For each core, an analytical fit was calculated for the experimental data, and the Forbes first solution correlation [13] was applied to the experimental data. An example of a complete capillary pressure curve for a sample after processing can be seen in Fig. 11. A biexponential fit was the analytical method used in this case. As shown, both the analytical and Forbes methods lack the rigor to fit the experimental data. The Forbes first solution was applied to the experimental data for each of the 13 cores. Using CYDAR, data from the Forbes curves was discretized for use with the machine learning techniques in order to predict the Forbes curves.

Machine Learning Techniques
A spreadsheet was developed to collect all the test data from the 13 core samples for the machine learning techniques before pre-processing. The machine learning workflow followed in this case can be seen in Fig. 12. A total of 10,955 (input, output) data points were recorded and used to train and validate the ML models. The parameters were defined for each technique: for ANN, the epochs, activation function, hidden layers, and neurons; for RFR, the number of trees; and for support vector regression, the tolerance and maximum iterations. Standardization and normalization feature scaling were also implemented to see how they would affect overall performance. Eqs. 14 and 15 show the methods for standardization and normalization, respectively.
$$x' = \frac{x - \mu}{\sigma} \qquad (14)$$
where $\mu$ is the mean of the feature values and $\sigma$ is the standard deviation of the feature values.
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (15)$$
where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the feature, respectively.
Once the models were run on the training and testing data, their performance was measured using two metrics: the mean squared error (MSE) and the correlation coefficient ($R^2$). The MSE is given by
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2} \qquad (16)$$
where $n$ corresponds to the total number of data points, $y_i$ is the target value, and $\hat{y}_i$ is the predicted value.
Finally, once all models were developed on the validation data set, they were tested and evaluated on the unseen data set. All machine learning methods were implemented in the Python 3.7 programming language with the Keras library and a TensorFlow 2.3 backend.
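The two feature-scaling options and the two performance metrics can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the study's implementation: it assumes the population standard deviation for standardization and computes R² as one minus the ratio of residual to total sum of squares; the sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def normalize(values):
    """Rescale linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot (assumed definition of the R^2 metric)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical porosity values (fraction) for three cores.
porosity = [0.18, 0.22, 0.26]
print(standardize(porosity))
print(normalize(porosity))

# Hypothetical targets and predictions.
print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

In practice the scaling statistics should be computed on the training data only and then reused for the validation and unseen data, so that no information leaks from the held-out cores.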

Results and Discussion
The results from all ML algorithms on the validation data are shown in Table 7. As can be seen from Figs. 19 and 20, the biggest change in prediction accuracy comes from switching to tanh as the activation function. Varying the number of hidden layers produced minimal changes in prediction accuracy, with 2 hidden layers performing the best. Finally, when varying the number of neurons, the greatest error occurred with 16 neurons, while the best predictions were obtained with 32.
With the ML models developed, the test set of the 2 cores that was set aside was then given to the models for predictions.
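The hold-out scheme used in this work (two cores reserved as unseen data, the remaining cores split 70-30) can be sketched as follows. The record layout, the `core` key, and the fixed random seed are assumptions made for illustration, and the synthetic records below carry no measured values.

```python
import random


def split_by_core(records, unseen_cores, train_fraction=0.7, seed=0):
    """Hold out whole cores as unseen data, then split the rest 70-30."""
    unseen = [r for r in records if r["core"] in unseen_cores]
    remaining = [r for r in records if r["core"] not in unseen_cores]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(remaining)
    cut = round(train_fraction * len(remaining))
    return remaining[:cut], remaining[cut:], unseen


# Synthetic placeholder records: 13 cores with 10 curve points each.
records = [{"core": core, "point": point}
           for core in range(1, 14) for point in range(10)]

train, validation, unseen = split_by_core(records, unseen_cores={12, 13})
print(len(train), len(validation), len(unseen))
```

Holding out entire cores, instead of random points, means the unseen set tests whether a model can reproduce a complete curve for a core it has never encountered.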
The following graphs highlight the accuracy of each model, showing the results on the test set when using normalized and standardized feature scaling. Following the evaluation on the validation data, unseen data consisting of 2 cores was used to test the accuracy of the models. The results for each of the models can be seen in Table 8. As shown in Fig. 23, although RFR performed the best overall on the validation data and unseen data with regard to the performance metrics, it is very noisy when predicting unseen data. SVR performed the best overall with a smoother curve; however, the entry capillary pressure was predicted higher. As with the first core, RFR was again noisy in its predictions for the second core. The ANN and SVR curves were smoother but showed differences in the entry capillary pressure.

Conclusions
This work investigated various machine learning techniques for predicting complete drainage capillary pressure curves. Centrifuge capillary pressure tests were conducted, with SCAL interpretation software used to apply the Forbes first solution correlation. Data for 13 cores was obtained along with the respective RCAL data of porosity, permeability, bulk density, and irreducible water saturation for each core. Once all the data was collected, three machine learning models were developed using training and validation data sets, with some of the collected data set aside as a test data set. One key finding from this work is that all three techniques performed very well on the validation data set, as evident from the performance metrics used. Another takeaway is the effectiveness of conducting a sensitivity analysis, as evident in the results from the ANN model when changing the neurons, hidden layers, and activation function used.
Some recommendations for future work are to conduct centrifuge experiments on more cores of different rock types to see how the models perform across a variety of rock types, and to incorporate additional input data such as pore size distribution to see the effects. Additional cores could also be added to the unseen data for the models to increase the accuracy of the predictions. With the success of applying machine learning techniques to capillary pressure, future work can also conduct other SCAL tests, such as relative permeability, to see the effectiveness of machine learning techniques in predicting other reservoir characteristics.

Fig. 4. Core Number 9 Experimental versus Predicted Values [5]
Kasha et al. developed a new method for capillary pressure estimation based on the Leverett J-function, as shown in Eq. 8. This is a dimensionless function of water saturation that describes capillary pressure while also considering pore size and interfacial tension [6].
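For reference, the Leverett J-function is commonly written as J(Sw) = Pc / (σ cos θ) · sqrt(k / φ). The sketch below evaluates it in consistent SI units so that the result is dimensionless; the function name and the numerical inputs are hypothetical, and the exact form used in [6] may differ by a unit-conversion constant.

```python
import math


def leverett_j(pc, sigma, theta_deg, permeability, porosity):
    """Dimensionless Leverett J value.

    pc: capillary pressure in Pa
    sigma: interfacial tension in N/m
    theta_deg: contact angle in degrees
    permeability: in m^2
    porosity: as a fraction
    """
    wetting_term = sigma * math.cos(math.radians(theta_deg))
    return pc / wetting_term * math.sqrt(permeability / porosity)


# Hypothetical sample: 100 kPa, 30 mN/m, strongly wetting, ~100 mD, 25 %.
print(leverett_j(1.0e5, 0.03, 0.0, 1.0e-13, 0.25))
```

The sqrt(k / φ) term acts as a characteristic pore size, which is why curves from samples with similar pore geometry tend to collapse onto a common J curve.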
E3S Web of Conferences 367, 01004 (2023), https://doi.org/10.1051/e3sconf/202336701004, SCA 2022
An unsupervised machine learning technique (clustering) was used to group each sample based on the grouping features. A technique called the within-cluster sum of squares (WCSS) was used to determine the optimum number of clusters.
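The WCSS is the sum of squared deviations between each cluster's points and that cluster's centroid. A minimal sketch of the calculation for an already-clustered set of points is shown below; the two-dimensional points are hypothetical stand-ins for grouping-feature values.

```python
def wcss(clusters):
    """Within-cluster sum of squares for a list of clusters of points."""
    total = 0.0
    for points in clusters:
        dims = range(len(points[0]))
        centroid = [sum(p[d] for p in points) / len(points) for d in dims]
        total += sum((p[d] - centroid[d]) ** 2 for p in points for d in dims)
    return total


# The same three hypothetical points grouped as one cluster and as two.
one_cluster = [[(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]]
two_clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
print(wcss(one_cluster), wcss(two_clusters))
```

WCSS always falls as clusters are added, so the optimum is usually taken where the curve of WCSS against cluster count bends most sharply, that is, where extra clusters stop giving a meaningful reduction.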

Fig. 12. Machine Learning Workflow
The inputs for the machine learning techniques, as mentioned previously, were the routine core analysis data of porosity, permeability, bulk density, and irreducible water saturation. The outputs were water saturation and capillary pressure; consequently, multi-output regression models were developed. Data for 2 core samples were put into a separate file to be used as a test set after model development. Note that for these 2 samples, 1995 (input, output) data points were recorded. For the remaining 11 cores, a training-testing split of 70-30 was used to train the models. Using these 11 cores,

Fig. 21. MSE Results of Models for Test and Validation Data

Table 7. ML Models Results - Test Data
All models performed well, with random forest being the best with a correlation coefficient of 0.9997. Figs. 13 and 14 show the prediction versus experimental data for ANN for capillary pressure and water saturation, respectively.

Table 8. ML Models Results - Unseen Data