Data-Driven Air-Cooled Condenser Performance Assessment: Model and Input Variable Selection Comparison

This paper presents a data-driven model for estimating the performance of an air-cooled steam condenser (ACC), with the aim of developing an efficient online monitoring system in which the condenser pressure (or vacuum) serves as Key Performance Indicator. The ACC performance model was estimated from datasets of three different combined cycle power plants, each with a gross power above 380 MWe, focusing on stationary operating conditions of the steam turbine. The datasets include both boundary parameters (e.g. ambient temperature, wind speed) and operating parameters (e.g. steam mass flow rate, steam turbine power, electrical load of the ACC fans) acquired from the power plants, as well as some derived variables such as the non-condensable gas fraction, whose calculation is proposed here as an additional parameter. After a preliminary sensitivity analysis on data correlation, the paper evaluates different ACC models. The Semi-Empirical model is described through curves typically based on steam mass flow rate (or condenser load) and ambient temperature as main parameters. Since monitoring based on the ACC design curves of Semi-Empirical models gives biased, poor results, with an error of about 15%, the curve parameters were estimated from a training dataset. Two other data-driven models, based on a neural network and on multiple linear regression, were then presented and compared, first with the reduced set of inputs and then also including the other process variables in the prediction of the condenser back pressure. When only steam mass flow rate and ambient temperature are available, the Semi-Empirical model with estimated parameters gives the best prediction, with an error of 7%, thanks to the knowledge contained in the "curve shapes", compared with the linear regression (8.3%) and Neural Network (7.6%) models.
Higher accuracy can then be obtained by considering a larger number of operating parameters and exploiting more complex data-driven models. With a larger number of features, the neural network model proved more accurate than the linear regression model: the mean percentage error of the NN model (2.6%), across all plant operating conditions, is slightly lower than that of the linear regression model and much lower than the mean error of the Semi-Empirical model, thanks to the additional data-based knowledge.


Introduction
Air-cooled steam condensers (ACCs) are widely used in various technological applications (e.g. refrigeration, steam turbines). In power generation, ACCs can replace water-cooled condensers when the latter cannot be adopted due to a lack of cooling water or to the high environmental impact of water-based heat rejection [1].
ACCs used in power generation consist of one or more arrays of fan units whose function is to condense the steam of a closed steam cycle. The fans are located below a finned-tube heat exchanger and force ambient cooling air through the system; heat is rejected from the condensing steam to the environment via the finned tubes. Air condenser performance is strongly affected by the efficiency and fouling of the heat exchange surfaces. The output of an air condenser depends essentially on the ambient air temperature, which leads to a decrease in condenser performance during the summer season.
The performance of a condenser is generally judged by the vacuum maintained in it, which is the back pressure of the steam turbine (ST), a crucial parameter for fully exploiting the steam expansion. To maximize the ST power output, the vacuum should be kept at the lowest value allowed by the boundary conditions that affect its behaviour.
The state-of-the-art method for monitoring the performance of a water-cooled condenser has generally consisted of sensing its operating conditions (such as the condenser vacuum, the inlet and outlet temperatures of the cooling water, and the steam heat load). On this basis, several algorithms have been applied to calculate the vacuum and estimate condenser performance by performing heat balances [2] and deriving from them the overall heat transfer coefficient or the heat transfer rate [3]. This approach cannot be applied to ACCs, since the air discharge temperature and mass flow rate are unknown.
The main parameters usually considered to calculate ACC performance are the mass flow rate of steam to be condensed, which can also be expressed as percentage ACC load, and the ambient temperature.
The main internal causes of ACC performance reduction are i) a change in air tightness (i.e. the amount of air entering the sub-atmospheric condenser) and ii) fouling of the tubes, which modifies the heat exchange rate between steam and ambient air with respect to design conditions. Moreover, performance is also influenced by many environmental conditions (e.g. ambient temperature and wind speed) and operating factors (e.g. steam mass flow rate, steam turbine power, electrical load of the ACC fans). To understand the real performance of the condenser under different situations, it is of great importance to investigate the relationship between the back pressure of the steam turbine and the condenser-related variables [4]. The most common continuous monitoring systems rest on the assumption of no air accumulation within the condenser. However, this parameter can be estimated through an additional measurement of the intruding air by means of thermal anemometers at the vacuum pump discharge [2]. These pumps are used, during start-up and normal operation, to evacuate the non-condensable gases entering the condenser and avoid their accumulation. Air tightness can also be assessed during a dedicated test, the "vacuum decay test": an operator-driven test that requires shutting off the vacuum pump and evaluating the rate of change of the vacuum due to the accumulation of non-condensable gases. Another parameter that might affect condenser performance is the amount of gland seal steam recovered, i.e. the steam used to keep the labyrinth seals functional: the shaft steam seal system exhausts excess superheated steam to the main condenser, so excessive wear of the seals could lead to an increase in condenser pressure [1]. This effect was neglected in this work, since a direct measurement of this steam mass flow rate is usually not available.
Regarding the amount of air evacuated, the number of operating vacuum pumps was constant during the observed period. Generally, there are three approaches to analysing the performance characteristics of air-cooled condensers. The first, analytical, approach uses empirical equations for heat transfer or pressure drop calculations, together with ACC design data, to describe performance under off-design conditions [6,7,8]. However, performance predicted from empirical equations and design data can give erratic indications, because it does not account for the particularities of the ACC installation site or for performance degradation after a certain period of operation [4].
The second approach uses detailed computational fluid dynamics (CFD) simulation [9,10,11], while the third approach, exploited in this work, uses mathematical algorithms such as data mining or artificial intelligence [12,13] on large historical operating datasets to identify a reliable relationship between given inputs and the desired output. This approach was chosen by Li et al., who applied support vector regression to establish a data-driven model expressing the ACC behaviour as a non-explicit relationship derived from operating data [4]. A data-driven approach is advantageous if a dataset of a normal-behaviour period of the ACC is available. These data can be used as the baseline (or training period) of the model, in order to compute any deviation of the test data from the expected values and detect abnormal behaviour.
This work focuses on estimating the performance of air-cooled steam condensers for monitoring purposes, defining and comparing a Semi-Empirical approach (i.e. based on condenser design curves derived from empirical equations and improved by parameter estimation) with machine learning data-driven models, namely a Multi-Linear model and a Neural Network. Moreover, a simple relation to estimate the presence of non-condensable gases within the condenser is presented.
The main goal is to evaluate the effect of the number of input variables and to assess which data-driven model for performance monitoring of air condensers in combined cycle power plants can estimate vacuum values with low error and high stability.

Datasets
To define, test and validate the different models, data from three different air condensers with the same architecture (number of fans and number of ways, i.e. the number of parallel exhaust header pipes into which the steam is divided) were considered. The three air condensers belong to three different combined cycle power plants (CCGT) based on an F-class gas turbine, with a gross electrical output of 380 MWe each.
The three datasets contain environmental parameters, operating parameters and steam conditions of the power plants, with a focus on the ACC parameters. Data were acquired from the DCS (Distributed Control System) of each plant with a sampling interval of 5 seconds and cover approximately three months of commercial operation.
To select the variables that best describe the ACC condition, it is essential to understand the architecture, the mechanics and the factors that can modify the performance of an air-cooled steam condenser [14], starting from the variables available in our datasets. Among the parameters specific to ACCs, the total power absorbed by the fans is strictly correlated with the air mass flow through the finned tubes, and it has been included in our modelling to account for the air-side heat exchange contribution. When the number of operating fans or their speed is changed, an increase in absorbed power causes an increase in the heat rejected from the condensing steam and consequently a decrease in the vacuum value (i.e. an increase in air condenser performance), at constant ambient temperature and steam turbine conditions. This effect can mask the reduction in ACC performance due to air ingress or fouling.
Wind velocity usually receives less attention, even though its influence has been shown to have a non-negligible impact on power plant performance. Hotchkiss et al. [9] concluded that the efficiency of axial flow fans is reduced by crosswinds, while fan power consumption is not significantly affected. Liu et al. [10] and Yang et al. [11] found that hot air recirculation in ACCs is very sensitive to wind direction and speed, particularly at high wind speeds. In the three power plants considered, wind direction is not acquired; therefore, only wind velocity [m/s] is considered in this paper. The presence of wind worsens the vacuum, at otherwise equal plant operating conditions, due to a decrease in the volumetric effectiveness of the fans, which reduces the air flow through the air condenser. The VGB guidelines prescribe that the mean wind velocity at the upper edge of the air-cooled condenser must not exceed 3 m/s [14].
Including the effects of wind velocity, and even more of wind direction, requires a numerical characterization that cannot be generalized because it depends on the specific installation layout of each power plant, whereas a data-driven approach can introduce such information into a model more easily. Parameters not directly measured, such as the steam flow through the condenser and the non-condensable gas fraction, were calculated from the original dataset using empirical and thermodynamic relationships.
The steam flow is a function of the first-stage pressure of the low-pressure (LP) turbine and of a series of empirical parameters related to the bucket cross-sectional area. If the expansion efficiency of the steam turbine can be considered stable across the whole dataset, the steam turbine active power is strictly correlated both with the inlet steam flow rate (since steam conditions usually appear stable with respect to GT load), which is approximately equal to the condenser inlet steam flow, and with the vacuum condition. In fact, the vacuum affects the pressure drop available to the LP turbine and consequently the active power of the steam turbine.
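The paper does not report the empirical flow relation it used. As a minimal sketch, assuming the LP stage group behaves as a flow-passing (swallowing-capacity) section, the steam flow can be approximated as proportional to the first-stage pressure, optionally corrected by the square root of an inlet temperature ratio. The coefficient `k_flow` is a hypothetical stand-in for the bucket-area parameters and is calibrated here from a single reference operating point. The percentage ACC load used by the Semi-Empirical model is then the ratio to the maximum design steam flow. All function names and numbers are illustrative assumptions.

```python
import math
from typing import Optional


def calibrate_k_flow(p_ref_bar: float, m_ref_kg_s: float) -> float:
    """Flow coefficient K from one reference operating point (hypothetical calibration)."""
    return m_ref_kg_s / p_ref_bar


def lp_steam_flow(p_lp_bar: float, k_flow: float,
                  t_k: Optional[float] = None, t_ref_k: Optional[float] = None) -> float:
    """Approximate LP steam flow [kg/s], assuming m = K * p * sqrt(T_ref / T).

    Without temperatures the relation reduces to m = K * p.
    """
    m = k_flow * p_lp_bar
    if t_k is not None and t_ref_k is not None:
        m *= math.sqrt(t_ref_k / t_k)  # optional temperature correction
    return m


def acc_load_percent(m_steam_kg_s: float, m_design_kg_s: float) -> float:
    """ACC load as a percentage of the maximum design steam flow."""
    return 100.0 * m_steam_kg_s / m_design_kg_s
```

With a hypothetical reference point of 100 kg/s at 4 bar, a reading of 3 bar gives 75 kg/s, i.e. 75% load for a 100 kg/s design flow.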
The presence of non-condensable (NC) gases in a condensing vapour results in an accumulation of these gases on the vapour side of the condensate surface. The effect on condenser efficiency arises because the resulting reduction in vapour partial pressure lowers the condensation temperature at the vapour interface and thus reduces the driving temperature difference relative to pure-vapour heat transfer [15]; as a consequence, a high non-condensable content decreases condenser performance. Since, by Dalton's law, the condenser pressure is the sum of the vapour partial pressure and the partial pressure of the NC gases, the amount of NC gases was calculated from the total pressure and the vapour partial pressure as:

$$x_{NC} = \frac{p_{c} - p_{sat}}{p_{c}}$$

where $x_{NC}$ is the non-condensable gas fraction in the air condenser, $p_{sat}$ is the saturation (vapour partial) pressure calculated from the temperature of the non-condensable gases, and $p_{c}$ is the air condenser vacuum (absolute pressure). This value is calculated to derive a continuous monitoring index to be added to the analysis. It can give inaccurate results if subcooling occurs within the condensate, i.e. if the condensate temperature used to evaluate $p_{sat}$ is not at saturation.
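A minimal sketch of the NC fraction calculation, assuming pressures in mbar absolute and temperatures in °C. The saturation pressure is computed here with the classical Antoine correlation for water (roughly valid between 1 and 100 °C); this is our choice, since the paper does not state which steam-table formulation was used. Negative fractions, which can only stem from measurement errors, are clipped to zero.

```python
import numpy as np

# Antoine constants for water (T in degC, P in mmHg), roughly valid for 1-100 degC
ANTOINE_A, ANTOINE_B, ANTOINE_C = 8.07131, 1730.63, 233.426
MMHG_TO_MBAR = 1.333224


def p_sat_mbar(t_c):
    """Saturation pressure of water [mbar] at temperature t_c [degC]."""
    t = np.asarray(t_c, dtype=float)
    return MMHG_TO_MBAR * 10.0 ** (ANTOINE_A - ANTOINE_B / (ANTOINE_C + t))


def nc_fraction(p_c_mbar, t_c):
    """Non-condensable gas fraction x_NC = (p_c - p_sat) / p_c (Dalton's law).

    p_c_mbar: condenser absolute pressure; t_c: temperature used to evaluate p_sat.
    Subcooling makes t_c too low, so p_sat is underestimated and x_NC overestimated.
    """
    p_c = np.asarray(p_c_mbar, dtype=float)
    x = (p_c - p_sat_mbar(t_c)) / p_c
    return np.clip(x, 0.0, None)  # negative values can only come from sensor errors
```

For example, at a condenser pressure of 130 mbar and 50 °C the sketch gives a fraction of roughly 0.05.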

Data Preparation
One of the main issues in developing a data-driven model is selecting the model inputs and output from the whole available dataset. First, the ACC vacuum pressure, widely acknowledged as the performance indicator of air condensers, was selected as the model output. Then, the input variables were selected from the whole dataset based on the literature on data-driven condenser condition monitoring [4,12,13,14].
In this case study, all the datasets acquired from the plants were considered as normal-behaviour datasets, and the effect of condenser fouling was neglected thanks to the cleaning procedures carried out during plant shutdowns before the hot season covered by the data presented here. The efficiency of heat exchange between the finned tubes and ambient air can therefore be considered constant throughout the datasets, and ACC performance can be evaluated without considering physical time.
Because the Semi-Empirical model is defined only for stable steam turbine conditions (calculated active power gradient < 1.6 MW/min), the subsequent correlation analysis and data-driven models also consider only stable conditions. However, the data-driven models could easily be extended to all operating conditions of the condenser by skipping this filtering stage. As a first step, a correlation analysis was performed to highlight the effect of the selected variables on the condenser pressure.
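The stability filter can be sketched as follows, assuming the 5 s DCS sampling interval of the datasets and an absolute gradient threshold of 1.6 MW/min. To limit noise, the gradient is computed here over a 60 s window (`lag=12` samples); this window length is a hypothetical choice, not a value reported in the paper.

```python
import numpy as np


def steady_state_mask(power_mw, dt_s=5.0, lag=12, max_grad_mw_per_min=1.6):
    """Boolean mask of samples where |dP/dt| of the ST active power is below the threshold.

    The gradient is the power change over `lag` samples, converted to MW/min.
    The first `lag` samples cannot be assessed and are marked as not steady.
    """
    p = np.asarray(power_mw, dtype=float)
    mask = np.zeros(p.shape, dtype=bool)
    grad = (p[lag:] - p[:-lag]) / (lag * dt_s) * 60.0  # MW/min over lag * dt_s seconds
    mask[lag:] = np.abs(grad) < max_grad_mw_per_min
    return mask
```

Applying the mask to every column of the dataset keeps only the stationary samples used by all models.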

Correlation analysis
A preliminary correlation analysis between all pairs of variables was carried out to highlight dependences and validate the input/output selection for the data-driven models.
The selected variables (reported in Fig. 2) are: ambient air temperature, wind speed, NC gas content, steam flow, total fan absorbed power, steam turbine active power and ACC vacuum pressure.
Figure 2 provides a visual representation of the correlation between the variables for the dataset of Plant #2: the histograms on the diagonal show the distribution of each variable, while the off-diagonal plots show pairwise scatter plots with linear trend lines. Ambient temperature ranges between 18.3 and 41.5 °C, while wind speed is close to zero for 60% of the samples, with only 8.5% of the values exceeding 3 m/s. The correlation plot shows that wind velocity and ambient temperature, in theory unrelated, present an actual positive correlation.
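A correlation matrix like the one summarized in Fig. 2, together with the share of samples above a threshold (e.g. wind speed above 3 m/s), can be computed with a short NumPy sketch. Pearson coefficients are assumed, since the paper does not name the coefficient used; function and variable names are illustrative.

```python
import numpy as np


def correlation_matrix(columns):
    """Pearson correlation matrix for a dict {variable name: 1-D array of samples}."""
    names = list(columns)
    data = np.column_stack([np.asarray(columns[name], dtype=float) for name in names])
    return names, np.corrcoef(data, rowvar=False)  # each column is one variable


def share_above(values, threshold):
    """Fraction of samples strictly above a threshold (e.g. wind speed > 3 m/s)."""
    return float(np.mean(np.asarray(values, dtype=float) > threshold))
```

Keeping the variables in the order listed above makes entry `[i, j]` of the matrix correspond to the chart in row i, column j of a pair plot.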

Fig. 2. Correlation analysis, Plant #2
The first evident correlation between the condenser vacuum and the other variables is, as expected, with ambient temperature. Another clear correlation is between steam flow and steam turbine power output, which are obviously highly correlated with each other. Both correlations are clearly explained by physical relationships: between vacuum and heat rejection (strictly dependent on ambient air temperature), and between steam flow and steam turbine power output, which both increase the ACC load. The relations between the vacuum and the other variables are highly non-linear. It can be noticed that steam flow and air temperature, which represent the main independent variables of the process and the main inputs of the Semi-Empirical model, are almost completely uncorrelated (-0.05). This is mainly because the steam flow rate also changes as a function of a controlled parameter, the gas turbine load. However, the effect of temperature on steam flow can be seen in the correlation chart at position [4,1] of Fig. 2: the upper stripe represents the full-load gas turbine condition, in which steam production is reduced by high ambient temperatures that derate steam generation. The negative effect of a temperature increase on ST power (chart [6,1]) is enhanced (-0.29) by the concurrent increase in discharge pressure. It is interesting to notice that, due to the high ambient temperatures, the number of operating fans and their speed are always kept at the maximum value, so the observed reduction in fan consumption is due to the reduction in air density at high temperature and thus of the air mass flow rate at a given volume flow rate. Fan energy consumption therefore represents a good indication of the air mass flow rate.
Non-condensable gases appear to be correlated with ambient temperature (and thus with fan consumption) and therefore with vacuum, through a non-linear relation.
This behaviour confirms that, in order to use these data, the data-driven model must be able to approximate non-linear functions, a distinctive capability of neural network models [9].
For each plant considered, the relation between NC gas content and vacuum pressure is different; however, a general increase in NC gas content is observed in two of the three condensers at very low pressure values. Under these conditions, air ingress is promoted and tends to exceed the extraction capability of the vacuum pump. This effect is less evident for plant #3, where the NC gas content is nearly constant across the whole operating range of the air condenser, at values above 0.07, and its correlation with vacuum is consequently very low. Information from the power plant management confirmed that the ACCs of plants #1 and #2 suffer from a lack of air tightness, but proper vacuum decay tests were not available to compare this parameter among the different plants.
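As a rough worked check of the air-density argument, assume air behaves as an ideal gas at constant atmospheric pressure and the fans run at fixed speed, so that the volume flow is approximately constant. Air mass flow then scales with density and, by the fan affinity laws, so does the absorbed fan power. Across the ambient temperature range of Plant #2 (18.3 to 41.5 °C):

$$\frac{\dot m_{41.5\,^\circ\mathrm{C}}}{\dot m_{18.3\,^\circ\mathrm{C}}} \approx \frac{\rho(41.5\,^\circ\mathrm{C})}{\rho(18.3\,^\circ\mathrm{C})} = \frac{(18.3 + 273.15)\ \mathrm{K}}{(41.5 + 273.15)\ \mathrm{K}} = \frac{291.45}{314.65} \approx 0.926$$

that is, about 7% less air mass flow and fan power at the hottest conditions, consistent in direction with the observed reduction in fan consumption. This idealized estimate neglects humidity and changes in fan efficiency.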

Fig. 3. Probability density function of the non-condensable gas content for each ACC
The wind velocity correlation coefficient also differs among the three datasets, which is explained by the different positioning of the ACCs and their different exposure to wind. In particular, plants #1 and #2, installed in a windy area, show a very slight negative effect.
The correlation analysis confirms the preliminary variable selection for the data-driven models, since every selected variable is highly correlated with the vacuum (output variable) in at least one of our datasets.
The need for a data-driven approach is also confirmed by the many differences detected between the condensers despite their identical geometry. In this case, an appropriately trained data-driven model can result in a highly specialized model of each single condenser.

Modelling approaches
To estimate the performance of the air condenser, three data-driven regression models were defined and compared against each other as a function of the selected inputs: i) a Semi-Empirical model (SE), ii) a Neural Network model (NN), iii) a Multi-Linear regression model (ML). The baseline model used in this work to compare the other models' results is the SE model, based on the design curves of the air condenser, whose parameters were estimated to fit field data. Since this model is derived from balance equations, it applies only to steady-state conditions; the same choice was made for the other data-driven models to guarantee a fair comparison, even though the neural network model could easily be extended to transient conditions.

Semi-empirical model
In the Semi-Empirical model, the vacuum, according to the condenser curves in Fig. 4, is mainly a function of ambient temperature and steam flow through the condenser. Here, steam flow is expressed as percentage load of the condenser with respect to the maximum design steam flow. The Semi-Empirical model is defined only at stationary operating points, where transient effects are negligible, and links air temperature and condenser percentage load (in terms of mass flow rate) to the vacuum through a non-linear relationship. During normal ACC operation, a standard SE model estimates the vacuum with a biased average error below 15%. To increase the prediction accuracy of the SE model, its parameters were estimated by least-squares fitting on actual field training data of ambient temperature, ACC load and vacuum.
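The design curves themselves are not reproduced in the paper, so the sketch below assumes a hypothetical curve shape: the condenser saturation temperature equals the ambient temperature plus an initial temperature difference (ITD) that is a quadratic function of the percentage ACC load, and the vacuum is the corresponding saturation pressure (again via the Antoine correlation, as an assumption). Once the measured vacuum is converted into a saturation temperature, this form is linear in its parameters, so the least-squares fit reduces to `numpy.linalg.lstsq`. Note that this minimizes the error on ITD rather than directly on pressure.

```python
import numpy as np

# Antoine constants for water (T in degC, P in mmHg), roughly valid for 1-100 degC
A, B, C = 8.07131, 1730.63, 233.426
MMHG_TO_MBAR = 1.333224


def p_sat_mbar(t_c):
    """Saturation pressure [mbar] at temperature t_c [degC]."""
    return MMHG_TO_MBAR * 10.0 ** (A - B / (C + np.asarray(t_c, dtype=float)))


def t_sat_c(p_mbar):
    """Inverse of p_sat_mbar: saturation temperature [degC] at pressure p_mbar."""
    return B / (A - np.log10(np.asarray(p_mbar, dtype=float) / MMHG_TO_MBAR)) - C


def _itd_design(load_pct):
    """Hypothetical curve shape: ITD = c0 + c1 * load + c2 * load**2."""
    load = np.asarray(load_pct, dtype=float)
    return np.column_stack([np.ones_like(load), load, load ** 2])


def fit_se_curves(t_amb_c, load_pct, p_vac_mbar):
    """Least-squares estimate of the curve parameters from field training data."""
    itd = t_sat_c(p_vac_mbar) - np.asarray(t_amb_c, dtype=float)  # ITD [K]
    coef, *_ = np.linalg.lstsq(_itd_design(load_pct), itd, rcond=None)
    return coef


def predict_vacuum(coef, t_amb_c, load_pct):
    """Expected condenser vacuum [mbar] for given ambient temperature and ACC load."""
    itd = _itd_design(load_pct) @ coef
    return p_sat_mbar(np.asarray(t_amb_c, dtype=float) + itd)
```

The fitted coefficients keep the curve family fixed while adapting it to the specific condenser, which mirrors the idea of estimating design-curve parameters from training data.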

Multi linear model
The ML model is based on multiple linear regression theory, an extension of simple linear regression. In multiple linear regression there are n explanatory variables, coinciding with the input variables of the model, and one dependent (output) variable. The relationship between the dependent and explanatory variables is described by:

$$p(x_1, \dots, x_n) = k_1 x_1 + k_2 x_2 + \cdots + k_n x_n$$

The ML model developed in this work is a data-driven model with the same inputs and output as the neural network model. It was developed to compare the results of a non-linear data-driven model, such as the neural network, with another data-driven model. During training, the ML model optimizes the coefficients $k_i$ by minimizing the estimation error on the training dataset; after training, the model was tested on the remaining data to validate the reconstruction.
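A minimal NumPy sketch of the ML training step, solving the least-squares problem for the coefficients k_i. The optional intercept column is an addition not present in the equation above; setting `intercept=False` fits exactly that form. Names are illustrative.

```python
import numpy as np


def _design(X, intercept):
    """Input matrix (samples x variables), optionally with a constant column appended."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X, np.ones(len(X))]) if intercept else X


def fit_ml(X, y, intercept=True):
    """Coefficients minimizing the sum of squared residuals on the training data."""
    k, *_ = np.linalg.lstsq(_design(X, intercept), np.asarray(y, dtype=float), rcond=None)
    return k


def predict_ml(k, X, intercept=True):
    """Vacuum estimate p = k_1 x_1 + ... + k_n x_n (+ constant if intercept=True)."""
    return _design(X, intercept) @ k
```

The same input matrix (two columns for the reduced case, more for ML+) can be passed to both this model and the neural network, which keeps the comparison fair.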

Neural networks model
A neural network consists of an input layer (with the dimension of the model inputs), an output layer (with the dimension of the output) and one or more hidden layers. In this work the output layer dimension is 1, since the only output of the model is the air condenser vacuum. Training the neural network consists of optimizing its weights. Several training runs were performed to validate and test the models, always using more than 50% of the data of each dataset for training.
A neural network model can also be developed for each single condenser, leading to highly specialized models for different architectures (not the case here) and operating conditions of various air condensers. For example, the difference between plants #1 and #2 and plant #3 in terms of NC gas content could easily be modelled by a data-driven approach such as an NN. The feed-forward neural network has only one hidden layer, because networks with a single hidden layer can approximate any continuous function (in particular non-linear functions) with high accuracy over a wide operating range, given a sufficient number of hidden neurons. The NN model was compared with the Semi-Empirical model described above and with a multivariate linear regression model trained on the same dataset, to make the results comparable.
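The one-hidden-layer architecture can be illustrated with a small NumPy network (tanh hidden units, one linear output unit for the vacuum). This is only a sketch: it is trained by plain full-batch gradient descent on the mean squared error, not by the Scaled Conjugate Gradient algorithm of the MATLAB toolbox used in the paper, and the learning rate, epoch count and initialization are hypothetical. In practice, inputs and output should be standardized before training.

```python
import numpy as np


class OneHiddenLayerNet:
    """Feed-forward net: inputs -> tanh hidden layer -> single linear output."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Scaled random initialization (hypothetical choice)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_inputs), (n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, X):
        h = np.tanh(np.asarray(X, dtype=float) @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).ravel()

    def fit(self, X, y, lr: float = 0.05, epochs: int = 5000):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        n = X.shape[0]
        for _ in range(epochs):
            h = np.tanh(X @ self.W1 + self.b1)        # hidden activations
            err = h @ self.W2 + self.b2 - y           # output minus target
            # Gradients of the mean squared error (constant factor 2 absorbed in lr)
            g_W2 = h.T @ err / n
            g_b2 = err.mean(axis=0)
            d_h = (err @ self.W2.T) * (1.0 - h ** 2)  # back-propagate through tanh
            g_W1 = X.T @ d_h / n
            g_b1 = d_h.mean(axis=0)
            self.W2 -= lr * g_W2
            self.b2 -= lr * g_b2
            self.W1 -= lr * g_W1
            self.b1 -= lr * g_b1
        return self
```

For the two-input case (ambient temperature and ACC load) this would be used as `OneHiddenLayerNet(n_inputs=2, n_hidden=10).fit(X_train, y_train)`, with the hidden-layer size chosen as described in the next section.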

Network configuration and training
Once the inputs and output have been selected through the correlation analysis and the ACC operating data have been extracted from the initial dataset, the next step is to develop the NN model based on the selected parameters. The complexity of the input-output mapping that the network can represent is determined by the number of neurons in the hidden layer. The required number of neurons was established iteratively: starting with a small number of neurons and adding neurons as long as the estimation accuracy of the network improved.
In this work the NN models are trained with the Scaled Conjugate Gradient optimization algorithm [16] implemented in the MATLAB Neural Network Toolbox. This training algorithm is particularly suitable for large datasets with a high number of observations, such as those considered in this work. Once the NN model had been optimized for this problem by defining the network and training parameters, the training period was selected from the whole dataset.
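The iterative sizing procedure can be written as a greedy loop. `validation_error` is a hypothetical callable that trains a network with a given number of hidden neurons and returns its validation error; the starting size, step and minimum improvement are illustrative assumptions.

```python
from typing import Callable, Tuple


def select_hidden_neurons(validation_error: Callable[[int], float],
                          start: int = 2, step: int = 2, max_neurons: int = 40,
                          min_improvement: float = 1e-3) -> Tuple[int, float]:
    """Grow the hidden layer while the validation error keeps improving."""
    best_n, best_err = start, validation_error(start)
    n = start + step
    while n <= max_neurons:
        err = validation_error(n)
        if best_err - err < min_improvement:
            break  # extra neurons no longer improve accuracy enough: stop
        best_n, best_err = n, err
        n += step
    return best_n, best_err
```

The stopping rule is greedy: it halts at the first size that does not improve enough, which keeps the network small but may miss a later, larger improvement.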

Models' performance comparison
The aim of the tests is to compare the results of each model and identify the most effective one in evaluating air condenser performance in terms of vacuum, using the state-of-the-art inputs, i.e. steam mass flow rate and ambient temperature. Moreover, the influence of the additional inputs made available by a broader dataset is assessed using Multi-Linear regression and Neural Network models, named ML+ and NN+ respectively.
During the tests, the neural network structure (hidden layers and hidden neurons) is kept constant and equal to the values obtained from the iterative optimization process. For the ML and NN models (which require training data), the training dataset comprises more than 50% of the whole dataset (approximately June to July 2017), and the test dataset is the same for every model.
The accuracy of the regression models is compared through residual analysis and the following metrics: i) MR: mean residual value; ii) mean percentage error: the mean residual expressed as a percentage of the vacuum values; iii) SSR: sum of squared residuals; iv) RMSE and NRMSE: root mean squared error and normalized root mean squared error; v) R2: coefficient of determination; vi) Accuracy: percentage of values within an experience-based threshold.
Both the NN and ML models must be trained with data covering the entire operating range of the plant. The training data cover all the summer operating conditions of the plants, including the entire operating range of the steam turbine (from minimum to base load) and many environmental conditions. A data-driven model could easily be extended to other operating and boundary conditions (for example, winter conditions) by adding normal-behaviour periods from other seasons to the training dataset. Training of both models (ML and NN) for the three units produces similar results in terms of mean percentage error and coefficient of determination (Fig. 5).
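Since training covers the earlier part of each dataset (June to July) and testing the remainder, the split is chronological rather than random. A minimal sketch, assuming timestamps stored as NumPy `datetime64` values and a hypothetical default training fraction of 0.55.

```python
import numpy as np


def chronological_split(timestamps, train_fraction: float = 0.55):
    """Return (train_idx, test_idx): earliest samples for training, latest for testing."""
    ts = np.asarray(timestamps)
    order = np.argsort(ts, kind="stable")  # sample indices sorted by time
    n_train = int(round(train_fraction * len(ts)))
    return order[:n_train], order[n_train:]
```

A chronological split avoids leaking near-duplicate neighbouring 5 s samples into both sets, which a random split would do.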
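The metrics can be sketched in NumPy as below. Some definitions are our assumptions where the text leaves them open: the mean percentage error is taken as the mean absolute residual relative to the measured vacuum, NRMSE is normalized by the range of the measured values, and the accuracy threshold is a hypothetical 5% relative band.

```python
import numpy as np


def regression_metrics(y_true, y_pred, rel_threshold=0.05):
    """Residual-based metrics; residuals are defined as measured minus estimated."""
    y = np.asarray(y_true, dtype=float)
    r = y - np.asarray(y_pred, dtype=float)
    ssr = float(np.sum(r ** 2))
    rmse = float(np.sqrt(np.mean(r ** 2)))
    sst = float(np.sum((y - y.mean()) ** 2))
    return {
        "MR": float(np.mean(r)),
        "MPE_%": float(np.mean(np.abs(r) / y) * 100.0),
        "SSR": ssr,
        "RMSE": rmse,
        "NRMSE": rmse / float(y.max() - y.min()),
        "R2": 1.0 - ssr / sst,
        "Accuracy_%": float(np.mean(np.abs(r) / y <= rel_threshold) * 100.0),
    }
```

For measured values [100, 120, 140, 160] and estimates [90, 130, 140, 160], the residuals cancel (MR = 0) while R2 is 0.9 and half of the samples fall within the 5% band, which shows why MR alone is not sufficient.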

Test Results
To verify their generalization, the models were evaluated against test data, i.e. data not used for training. These data belong to the same original dataset of more than three months of power plant operation, with similar operating and ambient conditions; the effectiveness of the air condensers can also be considered similar to that of the training period, thanks to the cleaning operations covering the whole dataset.
For Unit #3, whose test dataset contains more than 75000 samples, reconstructions and residuals are shown in Figs. 6a-6c. The models were compared using the metrics in Tab. 3, which confirm the behaviour observed in the residual analysis. The same test was performed for Units #1 and #2, with results quite similar to the previous ones in magnitude and tendency. Despite the differences between the datasets in terms of measurement accuracy and acquisition, the NN+ model always performed better than the ML+, ML, NN and SE models. In particular, the SE model, which does not take into account the particularities of each condenser (for example air tightness), produces a higher mean error than both proposed data-driven models with the complete input set. The SE model performs well only in certain conditions, for example at certain condenser vacuum levels. In contrast, the NN+ and ML+ models, being data-driven and including a wider set of input variables, can maintain high accuracy for different units and across the whole operating range of the power plants. The most important metrics for model selection are R2 and mean percentage error, which are reported for all plants in Fig. 7. Fig. 7b confirms that all models captured the condenser performance with a mean percentage error below 12%. Using only ambient temperature and ACC load, the error averaged over all plants is lower for the Semi-Empirical model with fitted parameters (about 7%) than for the linear regression (8.3%) and Neural Network (7.6%) models.
The fitting performance increases for the NN+ and ML+ models, which consider a larger number of operating parameters, because they can capture the relationships between vacuum and the other input variables that are not taken into account by the other models. Comparing ML+ and NN+, the NN+ error is lower than the ML+ error, so the NN appears more capable of reproducing the non-linear behaviour of the air condenser than the ML model, which applies a linear model. As a result, the mean percentage error of the NN+ model across all plant operating conditions is slightly lower than that of the linear regression model.

Conclusion
This paper presents a comparison of several models for estimating the performance of an air-cooled steam condenser, with the aim of developing an efficient online monitoring model. A preliminary sensitivity analysis based on simple data correlation highlights the effect of ambient temperature on the vacuum. Ambient temperature and steam flow rate are the variables usually adopted in Semi-Empirical models, which return the expected vacuum value through curves calculated from the condenser energy balance. The parameters of the Semi-Empirical model curves were estimated from a training dataset of field measurements, and two other data-driven models were presented, based on a neural network and on multiple linear regression. The latter two models were trained i) on ambient temperature and steam flow rate only, as for the Semi-Empirical model, and ii) on a complete dataset that also includes the other process and ambient variables in the prediction of the condenser back pressure (i.e. wind speed, non-condensable gas content, power absorbed by the air fans and power produced by the steam turbine). The non-condensable gas content was determined from partial pressures, highlighting a tendency towards air accumulation at low condenser pressure values. Training data representing the healthy condition of the condensers were used to create the baseline, and test data, still under healthy conditions, were used to assess the reconstruction errors. When only steam mass flow rate and ambient temperature are available, the Semi-Empirical model gives the best prediction, with an error of 7%, thanks to the physical knowledge contained in the "curve shapes", compared with the linear regression (8.3%) and Neural Network (7.6%) models.
Higher accuracy can then be obtained by considering a larger number of operating parameters and fully exploiting the capability of data-driven models, which reduces the error to just below 3%, with the neural network model showing slightly higher accuracy than the linear regression model. This shows how beneficial the Semi-Empirical model can be in creating an air-cooled condenser model: the shape of the curve, derived from equation-based knowledge (i.e. balance and empirical equations), coupled with a proper parameter estimation strategy, provides good model accuracy based only on standard measurements. Moreover, the calculation can easily be implemented in existing control systems to develop an online monitoring system that is easy to maintain.
The fully data-driven approach should be preferred when a large number of air-cooled condenser measurements is available, since it improves the regression performance of the models. The comparison of model accuracy shows that neural networks approximate the air condenser vacuum better than the other models. The results show that the methodology is generic for different air condensers, also with different air tightness and site conditions, and could be extended to other plants more easily than empirical models, since it requires only operational data and no detailed thermodynamic information about the condenser construction. The calculation is fast and can be used for online performance monitoring, either in the control system, if it supports such algorithms, or through a dedicated platform.
On the other hand, data-driven models require data covering the entire operating range to obtain a properly trained model, with data selected with respect to major and possibly minor maintenance activities. This operation is not trivial and must be conducted by experienced personnel to derive a meaningful baseline for the subsequent monitoring activities.