Prediction of capital cost of RO-based desalination plants using a machine learning approach

This paper presents a neural network tool for predicting the capital cost of desalination plants based on reverse osmosis (RO) technology. A multi-layer feedforward neural network with the backpropagation learning method is used to model the investment cost of RO plants. The model is developed using data sets of 1806 RO plants with a capacity of at least 1000 m3/day, and its development involved training, testing and validation. The model uses six inputs that include both categorical and numerical data elements, namely: plant location, plant capacity, project award year, raw water salinity, plant type, and project financing type. The output is the capital cost of the planned RO plant. This prediction model can be used by governments, investors or other stakeholders in the desalination industry to make a reasonable estimate of the investment costs of upcoming RO plant projects.


Introduction
With the increase in water demand over the years, many water-scarce countries depend on desalination technologies for producing drinking water. Desalination can be defined as the process of removing dissolved salts from water with high salinity to produce water with low salinity that meets the quality (salinity) requirements of various purposes (1). Desalination is carried out using different technologies, which include, but are not limited to, multi-stage flash distillation (MSF), multiple effect distillation (MED), reverse osmosis (RO) and hybrids of these. RO is one of the newer desalination technologies and is on the rise all over the world. During the last few decades, the use of RO has gained wide acceptance in Middle East countries because of its lower cost and easy operation (2,3). A desalination plant is a large-scale project that is usually undertaken by governments; hence, an estimate of the project cost is important for governments, investors and stakeholders. When new RO plants are proposed, an estimate of the expected project cost is key in deciding the budget allocation. It can also help identify whether the project is cost effective or whether an alternative should be sought.
The project cost of an RO plant consists of two major components: the capital cost, which is a one-time cost, and the annual operating costs (4,5). The capital cost, also called capital expenditure (CAPEX), comprises direct and indirect costs. Direct CAPEX includes land cost, costs of major equipment, engineering cost and so on (6). The indirect capital costs include elements such as insurance, construction, and overhead (7). The operating cost, also called OPEX, is the cost required to run the desalination plant, which includes the energy cost and other costs such as labour, spare parts, chemicals, membrane replacement, and so on (6,8). Researchers have adopted different methodologies and assumptions to model the capital and production costs. This study, however, is limited to developing a predictive model to estimate the capital cost of future RO plants using a methodology that differs from past studies.
This study began with gathering information on the various methodologies available for the cost estimation of desalination plants. The literature review showed that comparative cost models, statistical models, semi-empirical cost models and parametric cost estimation are the most commonly used approaches. Comparative cost methods are old estimation methods in which estimates are based on the published cost data available. In 1976, Glueckstern and Reed (9) published a report on the product water cost in which plant sizes in the range of 1 to 200 MGD were investigated. The comparative method was adopted by Shatat and Riffat (10), and by Al-Karaghouli and Kazmerski (11), to develop a table that provided cost ranges for the major desalination technologies. One major drawback of this method is that it does not take into consideration parameters such as the year of construction and the geographical location of the plant, which are already established to have an impact on the cost. [...] (4) to forecast the nature of the unit cost of desalination in the near future. All the statistical models discussed above tried to identify the main parameters affecting the cost of desalination and to provide a tool for estimating the cost of desalination plants in terms of unit cost, capital cost, and O&M cost.
Semi-empirical methods are deployed in determining the investment and production costs of RO plants. Studies conducted by Greig and Wearmouth (15) in 1987, and Frioui and Oumeddour (8) [...] There are a few commonly available commercial cost estimating tools in the market. In the literature (16-18), the tools Desalination Economic Evaluation Program (DEEP), Water Treatment Cost Estimation Program (WTCost) and Global Water Intelligence Cost Estimator (GWI CE) are found. DEEP is available as free software that can be used for the performance and cost evaluation of desalination plants. GWI CE is a tool that can be used to estimate the capital expenditure (CAPEX), operating expenditure (OPEX), and resulting water prices of seawater reverse osmosis and other desalination techniques. These tools work on tailor-made rules or algorithms preprogrammed by their developers. They also require sufficient knowledge of the technical and design parameters of desalination, which makes them ineffective tools for use by economists.
This research focuses on developing a tool that can predict the capital cost of RO plants with reasonably good accuracy. An analysis of the expected project cost is important in making decisions on any proposed project, and the same applies to RO plants. It helps in deciding the budget allocation and approval of a project, even though these decisions also depend on other factors. Therefore, an efficient and easy-to-use cost predictor is required in the planning and implementation of any desalination project. The proposed methodology is based on artificial intelligence, which is reliable and easy to use. Self-learning techniques in artificial intelligence are already in use in many scientific and engineering applications. Hence, the artificial neural network approach using a machine learning technique is used in this study. Studies conducted by Elfahham (20) and Trefor (21) are examples of the application of artificial intelligence using neural networks (NN) in the construction industry. In this paper, a multilayer perceptron with the backpropagation learning method is deployed to predict the capital cost of RO plants. The model is developed using the data available for 1806 RO plants.
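To make the idea of a multilayer perceptron trained by backpropagation concrete, the following is a minimal sketch in plain Python. It is an illustration only and not the model developed in this paper: the study used the MATLAB neural network toolbox, whereas this sketch uses a single hidden layer, plain batch gradient descent, and synthetic toy data. The function name `train_mlp` and all parameter values are assumptions made for the example.

```python
import math
import random


def train_mlp(samples, targets, hidden=4, lr=0.05, epochs=500, seed=0):
    """Train a one-hidden-layer perceptron by batch gradient descent.

    Returns (predict, mse_history), where mse_history holds the mean
    squared error measured at the start of each epoch.
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    n = len(samples)
    # Small random initial weights; the last entry of each row is the bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
          for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(x):
        # zip() stops at the shorter sequence, so the bias is added separately.
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
             for row in w1]
        y = sum(w * v for w, v in zip(w2, h)) + w2[-1]  # linear output unit
        return h, y

    history = []
    for _ in range(epochs):
        g1 = [[0.0] * (n_in + 1) for _ in range(hidden)]
        g2 = [0.0] * (hidden + 1)
        sq_err = 0.0
        for x, t in zip(samples, targets):
            h, y = forward(x)
            err = y - t
            sq_err += err * err
            # Gradient for the output layer weights and bias.
            for j in range(hidden):
                g2[j] += err * h[j]
            g2[-1] += err
            # Backpropagate through tanh: d tanh(a)/da = 1 - tanh(a)^2.
            for j in range(hidden):
                delta = err * w2[j] * (1.0 - h[j] ** 2)
                for i in range(n_in):
                    g1[j][i] += delta * x[i]
                g1[j][-1] += delta
        history.append(sq_err / n)
        # Gradient descent step on the mean squared error.
        for j in range(hidden):
            for i in range(n_in + 1):
                w1[j][i] -= lr * 2.0 * g1[j][i] / n
        for j in range(hidden + 1):
            w2[j] -= lr * 2.0 * g2[j] / n

    return (lambda x: forward(x)[1]), history


# Synthetic toy data (NOT from the paper): two scaled inputs, one scaled target.
X = [[a / 4.0, b / 4.0] for a in range(5) for b in range(5)]
T = [0.6 * x[0] - 0.3 * x[1] + 0.2 for x in X]
predict, mse_history = train_mlp(X, T)
```

Comparing `mse_history[0]` with `mse_history[-1]` shows how far the error fell during training, and `predict([0.5, 0.5])` returns the network's estimate for a new scaled input.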

Methodology
There are many parameters that can affect the capital cost of RO plants. These can be technical or non-technical. Technical parameters include capacity, desired quality of product water, raw water quality, pre-treatment requirements, disposal of brine discharge, membrane options and so on. Non-technical parameters can be social, environmental or geographical. In this research, the preliminary tasks comprised data collection and processing. Cost data on RO plants was collected from the database available with Global Water Intelligence (GWI) (17), which keeps an inventory of all desalination plants. The data contained information on many design parameters as well as non-technical details such as location, procurement type, capital cost, year of the project, contractual period, supplier details, consultants and so on. Pre-processing of the data identified the parameters relevant to this study. The pre-processing involved choosing the potential variables that can affect the capital cost and/or operational cost. Based on a comprehensive search of the literature (4,5,13,22-25), six variables from the cost database qualified as modelling variables. They are plant capacity, raw water type, year of project award, geographical location of the plant, plant type and project financing type. Plant capacity is established to be an important parameter through a number of studies, including (4,23). Raw water quality expressed in terms of salinity level is used by (13). Several papers have reported that geographical location plays an important role in the capital cost of RO plants, as prices are found to vary across continents and countries (13,22,25). The year of the project is found to be very important, as the cost was found to vary over the years (8,14,20,21,24). The type of procurement or financing and the plant type were chosen to find out the impact of other possible non-technical parameters on the capital cost.
Both were included in the study as they were found to be statistically important. Thus, a list of readily available potential parameters was selected for the neural network training and is shown in Table 1. Neural networks are algorithms that learn from the input data to predict the output. Training is a computing process by which relationships between the input variables and the outputs are established. The training data were compiled for 1806 RO plants built between 1980 and 2018. As the input data elements comprised both categorical and numerical variables, all the categorical variables were first converted into equivalent binary values. The output of the model is the capital cost of the RO plants. Because the data span a long period, the cost values were converted to present value in 2019 US dollars to account for inflation. In this study, the neural network toolbox in MATLAB was used to develop the model. For the training process, the data were sorted into three sets: 70% for training, 15% for testing, and the remaining 15% for validation. All the major training algorithms available in the MATLAB environment were explored, namely Levenberg-Marquardt backpropagation, Bayesian regularization and scaled conjugate gradient. Multi-layer neural networks with different numbers of hidden layers were tried to examine the impact on model performance. The performance of the model was evaluated using the mean squared error (MSE), the number of epochs, the regression coefficient (r value) and the training time.
A sample NN prediction for a hypothetical case is shown in Figure 6. In this prediction, the inputs used are "capacity - 10000 m3/day", "feed water quality - saline (seawater)", "plant type - standalone", "region - Middle East Asia" and "project procurement type - EPC contract (Engineering, Procurement & Construction)". The prediction was made for the years 2020 to 2030. The capital cost shows a decreasing trend.
This could be due to the combined effect of the parameters, and it matches the current trend in which the cost of RO plants is decreasing over the years. A periodic update of the data is essential to make the predictions more accurate and to extend them over a wider span of time.
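The data preparation and evaluation steps described in this section can be sketched as follows. This is a hedged illustration in plain Python and not the MATLAB workflow used in the study. The category list, the price index values, the random shuffling before the split, and the reading of the r value as a Pearson correlation between predictions and targets are all assumptions made for the example.

```python
import math
import random


def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 flags, one per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def to_reference_year(cost, award_year, price_index, reference_year=2019):
    """Rescale a cost to reference-year money using a price index."""
    return cost * price_index[reference_year] / price_index[award_year]


def split_70_15_15(records, seed=0):
    """Shuffle, then split into training, testing and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 70 // 100
    n_test = len(shuffled) * 15 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])  # remainder goes to validation


def mse(predicted, observed):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def pearson_r(predicted, observed):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    spread_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    spread_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    return cov / (spread_p * spread_o)


# Hypothetical category list; the paper does not enumerate the plant types.
PLANT_TYPES = ["standalone", "co-located"]
encoded = one_hot("standalone", PLANT_TYPES)

# Placeholder index values for illustration, not real inflation data.
PRICE_INDEX = {2000: 100.0, 2019: 150.0}
cost_2019 = to_reference_year(2.0e6, 2000, PRICE_INDEX)

# Record identifiers stand in for the 1806 plant records.
train, test, val = split_70_15_15(range(1806))
```

With 1806 records, integer arithmetic gives 1264 training and 270 testing records, and the remaining 272 go to validation, so the shares are close to 70%, 15% and 15% without discarding any record.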

Conclusion
In this paper, the applicability of machine learning using neural networks in predicting the capital cost trends of RO plants is studied. As projects such as RO plant construction require high capital investment, there is a need to estimate the capital cost for the purpose of planning and budget allocation. An NN cost prediction model is developed to predict the capital cost of RO plants. Thus, the main contribution of this study is to provide the stakeholders of water desalination projects with an easy and reliable tool for estimating the expected investment in upcoming desalination projects.
In light of the reasonably good results, the same methodology can be adapted to other desalination types, which will help reduce the cumbersome estimation of investment cost in the pre-approval stage of any water treatment plant.