Detection of waterborne bacteria using Adaptive Neuro-Fuzzy Inference System

The detection of waterborne bacteria is crucial to preventing health risks. Current research uses soft computing techniques based on Artificial Neural Networks (ANN) to detect bacterial pollution in water. Relying only on sensor-based water quality analysis for detection can be prone to human error. Hence, there is a need to automate real-time bacterial monitoring to minimize this error. To address this issue, we implement an automated process for waterborne bacterial detection using a hybrid technique called the Adaptive Neuro-Fuzzy Inference System (ANFIS), which integrates the learning ability of an ANN with a set of fuzzy if-then rules and appropriate membership functions. The input data for the ANFIS model is obtained from an open dataset on the Government of India data platform, comprising 1992 experimental laboratory results from the years 2003-2014. We include the following water quality parameters as the significant factors in the detection and existence of bacteria: Temperature, Dissolved Oxygen (DO), pH, Electrical conductivity, and Biochemical Oxygen Demand (BOD). The membership function changes automatically with every iteration during training of the system. The goal of the study is to compare the results obtained from three ANFIS membership functions (triangular, trapezoidal, and bell-shaped) with 3^5 = 243 fuzzy set rules. The results show that ANFIS with the generalized bell-shaped membership function performs best, with an average error of 0.00619 at epoch 100.


Introduction
At present, many people struggle to obtain access to clean water because of the significant infectious risks linked with ingesting water contaminated with human or animal feces [1]. Micro-organisms or pathogens cause some of the primary diseases because they may live, reproduce, and disperse in water systems [2]. The World Health Organization (WHO) reported in 2011 approximately 1.7 billion cases of diarrhea among children under the age of five in developing countries, caused mainly by drinking contaminated water. In addition, about 525,000 children worldwide were reported to die each year as of 2018 because of poor water quality, sanitation, and hygiene conditions, mainly through infectious diarrhea. Worldwide, 1.9 billion people use water that is faecally contaminated [3]. Drinking water comes from surface and groundwater sources, but damaged sanitation and dwindling resources, together with bacterial and chemical contamination, make safe water almost inaccessible [4]. Despite rapid research on the detection of water-borne pathogens to curb diseases, drinking water quality standards adequate to prevent water-borne illnesses are still not met [5]. According to the micro-biological drinking water quality standards established by the WHO, EPA, and IS: 10500 [6], bacteria shall not be detectable in a 100 ml water sample [1,17].
An artificial neural network (ANN) mimics the biological neural network of the visual cortex in the brain. The brain consists of a densely interconnected set of information-processing units called neurons. Information is stored and processed in the brain through the involvement of every neuron, which subsequently supports human learning. Similarly, an ANN model learns through the connections between its nodes [7]. ANFIS combines the self-learning ability of neural networks with the linguistic expression capability of fuzzy inference.
ANFIS is a multilayer feed-forward network in which each node performs a particular function on its incoming signals and has its own set of parameters. Similar to an ANN, ANFIS can map unseen inputs to their respective outputs by learning rules from previously observed data [8]. An ANFIS model can adjust its parameters well for any data series and takes into consideration all the edge cases in a rule-viewer interface. An ANN model may not handle probabilistic values, but with ANFIS we can define a set of rules for them. Because ANFIS integrates both fuzzy inference systems and ANNs, it is useful for solving non-linear and complex problems within a single framework [9]. Hybrid methods such as ANN combined with fuzzy logic, or ANFIS-GA (Genetic Algorithm), can prove extremely useful in dealing with missing data [10].
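The layered computation described above can be illustrated with a minimal sketch of one forward pass through a first-order Sugeno-type ANFIS. This is an illustration under assumptions, not the implementation used in this study: the function names (`gbell`, `anfis_forward`), the two-input example, and all parameter values are hypothetical, and the grid-partition rule layout (one rule per combination of fuzzy sets) is assumed.

```python
import numpy as np

def gbell(x, a, b, c):
    """Generalized bell membership: 1 / (1 + |(x - c) / a|^(2b))."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def anfis_forward(x, premise, consequent):
    """One forward pass of a first-order Sugeno ANFIS (illustrative sketch).

    x          : sequence of n_inputs crisp input values
    premise    : premise[i][j] = (a, b, c) of fuzzy set j for input i
    consequent : array of shape (n_rules, n_inputs + 1) holding the linear
                 coefficients and bias of each rule
    """
    # Layer 1: membership degree of every input in each of its fuzzy sets.
    mu = [np.array([gbell(x[i], *p) for p in sets])
          for i, sets in enumerate(premise)]
    # Layer 2: firing strength of each rule = product of one degree per input.
    w = mu[0]
    for m in mu[1:]:
        w = np.outer(w, m).ravel()
    # Layer 3: normalize the firing strengths so they sum to 1.
    w_bar = w / w.sum()
    # Layer 4: linear output of each rule, f_r = p_r . x + bias_r.
    f = consequent @ np.append(x, 1.0)
    # Layer 5: overall output is the normalized weighted sum of rule outputs.
    return float(w_bar @ f)

# Hypothetical example: 2 inputs with 2 fuzzy sets each, hence 2^2 = 4 rules.
premise = [[(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)],
           [(1.0, 2.0, 0.0), (1.0, 2.0, 1.0)]]
consequent = np.zeros((4, 3))
consequent[:, -1] = 2.0  # every rule outputs the constant 2.0
y = anfis_forward(np.array([0.3, 0.7]), premise, consequent)
```

Because every rule in this example outputs the same constant, the normalized weighted sum returns that constant regardless of the firing strengths, which gives a quick sanity check on the normalization step. During training, the premise parameters (a, b, c) and the consequent coefficients are the quantities that get adjusted.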

ICEPP 2019
Hybrid models significantly increase the accuracy of estimation, especially in non-linear problems [11].

2. Related work
Bouharati, S. et al. [8] proposed a method for detecting micro-bacterial pollution in freshwater using ANFIS. The model produced instantaneous results from sensor measurements of physical and chemical properties. ANFIS-based methods build on fuzzy set theory, which states that a variable can partially belong to a set and can have a membership value between 0 and 1. In that paper, three parameters were selected as inputs (pH, temperature, and electrical potential), and the output was the number of bacteria. The next step was to create membership functions for the input variables, which classify them into levels such as low, middle, or high. The number of rules depicted the number of fuzzy sets created. The authors reported the use of a three-layer artificial neural network model trained and tested on the collected water samples. An ANFIS-based model was used because it combines the advantages of fuzzy systems, which offer transparent knowledge representation, with those of neural networks, which deal with implicit knowledge that can be acquired through learning. Kamali and Binesh [12] used ANN and ANFIS to study the diffusion of water through nanotubes using molecular dynamics data. They concluded that ANFIS outperformed ANN.
[13] compared the performance of ANN and ANFIS in the triage of emergency patients using various vital signs of patients as input parameters.
Chandaran, U. D. et al. [14] described the detection of sulphate-reducing bacteria (SRB) using ANFIS, which can be crucial in curbing the corrosion of iron material in a system. The authors used three parameters to train the model: voltage, temperature, and humidity. The membership functions were trapezoidal and bell-shaped. The ANFIS model took three inputs and produced an output of either 1 or 0. The predicted results were obtained from the input parameters with the number of epochs set to 20. Lastly, the model was tested with testing data for up to 250 epochs. The authors compared the results and found that the trapezoidal shape gave the best membership function.
Keshavarz, Z. et al. [10] described the application of an ANFIS-based method to determining the compressive strength of concrete. The model used 150 different concrete specimens with various mix design parameters. Five concrete mix parameters were considered: cement, water-to-cement ratio, gravel, sand, and micro-silica. Two soft computing methods, ANN and ANFIS, were selected to predict the compressive strength of concrete. The results were computed in MATLAB, with the concrete mix parameters as input variables and the compressive strength of concrete as the output parameter. To compare the ANN-based and ANFIS-based methods, the authors used measures such as the R squared coefficient of both models. A higher coefficient of determination indicates a better capability of the model to predict the studied characteristic.
Calp, M. H. [11] proposed a hybrid model for estimating the regional rainfall amount. The proposed model focused on supporting efficient management of water resources by estimating the amount of rainfall that can occur in the region. The model was created with the MATLAB package program, and regression values (R) or mean squared error (MSE) were taken into account. The values obtained for the model were 0.9920, 0.9840, and 0.0011, respectively. The author concludes that this hybrid model is an important support tool for estimating the amount of annual rainfall and ensuring the effective management of water resources.
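The comparison measures mentioned in the studies above, the coefficient of determination (R squared) and the mean squared error (MSE), can be computed as in the following sketch. The function names and the sample values are hypothetical and serve only to illustrate the formulas; they are not data from any of the cited works.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return float(1.0 - ss_res / ss_tot)

# Hypothetical observed values and two hypothetical sets of predictions.
observed = [1.0, 2.0, 3.0]
perfect = [1.0, 2.0, 3.0]   # reproduces the observations exactly
baseline = [2.0, 2.0, 2.0]  # always predicts the mean of the observations

scores = {
    "perfect": (r_squared(observed, perfect), mse(observed, perfect)),
    "baseline": (r_squared(observed, baseline), mse(observed, baseline)),
}
```

A model that reproduces the observations exactly has R squared equal to 1 and MSE equal to 0, while a model that always predicts the mean of the observations has R squared equal to 0. This is why a higher R squared and a lower MSE are both read as signs of a better fit.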

Methodology
In the first part of this section, we explain the design of a fuzzy expert system based on membership functions. Subsequently, an ANFIS model is introduced based on the fuzzy rule base. In this study, we assign three fuzzy sets to each water quality parameter, namely desirable, undesirable, and highly undesirable. An individual membership function is assigned to each parameter, as described in Table 1.
[...] functions for data, then to train the input using the ANFIS training function. Finally, the predicted result can be obtained by inputting the parameters. Figure 1 shows the structure of the ANFIS model.

3.3 Predicted output for ANFIS
The testing data is used to check the capability of the model. For output prediction, we compare the results obtained from three membership functions (triangular, trapezoidal, and bell-shaped). The error rate and parameters for every membership function are plotted. The resulting FIS is tested using the testing data for one hundred epochs. We tabulate the error analysis for each membership function type.
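The three membership function shapes compared in this study, and the way three fuzzy sets on each of five parameters lead to 3^5 = 243 rules, can be sketched as follows. This is a minimal illustration using the standard textbook definitions of the three shapes. The function names and the breakpoint parameters (a, b, c, d) are assumptions, and the actual per-parameter values from Table 1 are not reproduced here.

```python
from itertools import product

import numpy as np

def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    rising = (x - a) / (b - a)
    falling = (d - x) / (d - c)
    return np.maximum(np.minimum(np.minimum(rising, 1.0), falling), 0.0)

def generalized_bell(x, a, b, c):
    """Generalized bell membership: 1 / (1 + |(x - c) / a|^(2b)), centred at c."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

# The five water quality parameters and the three fuzzy sets named in the text.
PARAMETERS = ["Temperature", "DO", "pH", "Electrical conductivity", "BOD"]
FUZZY_SETS = ["desirable", "undesirable", "highly undesirable"]

# One rule antecedent per combination of one fuzzy set for each parameter.
rules = list(product(FUZZY_SETS, repeat=len(PARAMETERS)))
rule_count = len(rules)  # 3 ** 5 = 243
```

Each entry of `rules` is a tuple such as ("desirable", "undesirable", ...) that assigns one fuzzy set to each parameter in the order given by `PARAMETERS`. The triangular and trapezoidal shapes are piecewise linear, whereas the generalized bell is smooth everywhere, which is one difference between the shapes being compared.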

Results and discussions
a) Triangular as Membership Function
By setting the number of membership functions to three for the input data and using the triangular function, the parameters of each input's membership function are tabulated for epochs 1-50. Figure 2 and Figure 3 show the initial and final membership functions of the input data derived by training with the triangular function. Table 2 shows the labels used for the results and the membership functions they represent. The proposed ANFIS is shown in Figure 1.

c) Generalized bell-shaped as Membership Function
By setting the number of membership functions to three for the input data and using a generalized bell function, the parameters of each input's membership function are recorded and tabulated for epochs 1-100. Figure 6 and Figure 7 show the initial and final membership functions of the input data derived by training with the generalized bell function.

Error rate
The training errors of the three membership functions are tested and shown in Figure 9. For each type of membership function, the error rate after every epoch has been recorded. The rates are tabulated in Table 3.
From Table 2, it can be seen that the error rate with the bell-shaped membership function is lower from the start than with the triangular and trapezoidal functions. The difference becomes more noticeable when the number of epochs reaches 100. The error rate is only 0.006 (bell-shaped), which is lower than 0.06 (triangular) and 0.05 (trapezoidal), and the results show that the bell-shaped function always gives the least error.

Conclusions
An ANFIS model for the detection of bacteria in drinking water sources has been developed with 243 fuzzy set rules, and its predictive ability is compared across three membership functions. The membership function changes automatically with every iteration during model training. The results show that the generalized bell-shaped membership function is the most suitable for modelling bacterial detection with ANFIS. The least error, 0.00619, is obtained at epoch 100 with the bell-shaped function, as verified on the testing data. ANFIS with the bell-shaped membership function gives output that closely matches the experimental output.