The Application of Artificial Neural Network Model to Predicting the Acid Mine Drainage from Long-Term Lab Scale Kinetic Test

Acid mine drainage (AMD) is one of the common environmental problems in the coal mining industry; it forms through the oxidation of sulfide minerals in overburden or waste rock. Predicting acid generation from AMD is therefore important for overburden management and for planning post-mining land use. One method used to predict AMD is the lab-scale kinetic test, which determines the rate of acid formation over time using representative field samples. However, this test requires a long procedure and large amounts of chemical reagents, which makes it costly. On the other hand, machine learning has the potential to learn the patterns behind lab-scale kinetic test data. This study describes an approach that uses artificial neural network (ANN) modeling to predict the results of lab-scale kinetic tests. Various ANN models were built on 83 weeks of lab-scale kinetic test experiments with 100% potential acid-forming rock. The models forecast pH, ORP, conductivity, TDS, sulfate, and heavy metals (Fe and Mn). The overall Nash-Sutcliffe Efficiency (NSE) obtained in this study was 0.99 on training and validation data, indicating a strong correlation and accurate prediction compared with the actual lab-scale kinetic test data. This shows the ability of ANNs to learn patterns, trends, and seasonality from past data for accurate forecasting, thereby highlighting their contribution to solving AMD problems. This research is also expected to establish the foundation for a new approach to predicting AMD that is time-efficient, accurate, and cost-effective in future applications.


Introduction
Over the years, Artificial Neural Networks (ANNs) have emerged as a powerful tool, valued for their capability and robustness when dealing with complex datasets. ANNs, inspired by the human brain's neural structure [1], have developed rapidly, proving their capabilities across many domains and setting new benchmarks as State-of-the-Art (SOTA) models in various modalities.

arXiv:2409.02128v1 [cs.LG] 1 Sep 2024
The potential of ANNs lies in their ability to capture intricate patterns and relationships within data, often beyond the limits of conventional analytical techniques. This paper explores whether this ability can be applied in the environmental engineering domain, specifically to understanding and forecasting Acid Mine Drainage (AMD) from Long-Term Lab Scale Kinetic Tests.
AMD is strongly acidic wastewater rich in high concentrations of dissolved ferrous and non-ferrous metal sulfates and salts; if left untreated, it can contaminate ground and surface watercourses, damaging the health of plants, humans, wildlife, and aquatic species [2]. Traditionally, AMD is analyzed using the kinetic test. This process simulates mine drainage production from samples influenced by mining activities and incorporates the dynamic physical, chemical, and biological systems and processes that govern the generation of acidic or alkaline mine drainage. Kinetic tests focus primarily on the reaction rates and mechanisms leading to acidic or alkaline mine drainage; they typically require larger sample volumes and longer durations than static tests and are usually conducted in laboratory settings. These tests yield crucial information on sulfide mineral oxidation rates, acid production rates, and drainage water quality [3]. Understanding the complex kinetics of AMD generation and forecasting its behavior through kinetic tests has therefore been laborious and resource-intensive. Consequently, developing cost-effective and sustainable remediation solutions for the AMD problem has been the subject of extensive research.
Prediction and forecasting with ANNs, or machine learning (ML) in general, have been applied in various domains, including nonlinear time series, and have gained considerable attention in recent years [4]. Forecasting mining-influenced water data with various ML techniques, including tree-based methods and ANNs, has shown positive results with close-to-accurate predictions [5]. Besides ML methods, traditional models such as the autoregressive integrated moving average (ARIMA) and Box-Jenkins models have been applied to this problem, but they have drawbacks because they assume that time series are generated by linear processes [6]. This study focuses on developing ANN models with various techniques suited to the time series domain, with an emphasis on capturing the relationships between data over time.

Feedforward Neural Network (FNN)
One of the most common types of ANN architecture is the feedforward neural network (FNN). This type of model consists of several layers of linear models, each with its own number of neurons. A linear model (Eq. 1) is the basic building block of any neural network model [1]. A linear model followed by a non-linear activation function is called a neuron, and each layer of an FNN, or of an ANN in general, usually consists of a few to hundreds of neurons. Each neuron in a layer is connected to every neuron in the previous and next layers (Fig. 1); such layers are called dense or fully connected layers. An FNN usually consists of an input layer, a few hidden layers, and an output layer (Fig. 1). Information flows in one direction, from the input layer through the hidden layers to the output layer, without any loops or cycles.
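In the usual notation, the linear model of a dense layer maps an input row vector x to xW + b, where W is a weight matrix and b a bias vector, and a neuron applies a non-linear activation σ to this, giving σ(xW + b). The NumPy sketch below stacks such layers into an FNN forward pass. The layer sizes, He-style random initialization, and ReLU activations are illustrative assumptions rather than the tuned configuration of this study; the optional ReLU on the output layer mirrors the training setup described under Model Development and Evaluation.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_fnn(layer_sizes, seed=0):
    """Random (W, b) pairs for a fully connected network, e.g. [7, 32, 16, 7]."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))  # He-style scaling
        params.append((W, np.zeros(n_out)))
    return params

def fnn_forward(x, params, output_relu=True):
    """Propagate a batch x of shape (batch, n_in) through the dense layers in order."""
    h = np.asarray(x, dtype=float)
    for k, (W, b) in enumerate(params):
        z = h @ W + b                      # linear model: every unit sees every input
        is_output = k == len(params) - 1
        h = relu(z) if (not is_output or output_relu) else z
    return h

# Usage: 4 samples with 7 input features -> 4 predictions of 7 parameters.
y = fnn_forward(np.ones((4, 7)), init_fnn([7, 32, 16, 7]))
```

Information only moves forward through `params`, with no state carried between samples, which is exactly what separates an FNN from the recurrent models below.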

Multivariate Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) that differs from traditional RNNs. An LSTM cell maintains a state that represents a "memory" or "context" in addition to its inputs and outputs [7], and it learns when to remember and when to forget relevant information. As shown in Fig. 2, an LSTM contains three gates that control these dependencies: an input gate that selects the inputs, a forget gate that frees part of the memory, and an output gate that controls the output. LSTM mitigates the vanishing gradient problem that traditional RNNs face on long sequences of data when trained with Back Propagation Through Time (BPTT) over a given horizon [8]. An LSTM cell consists of these operations.
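In the widely used LSTM formulation, the gates are i_t = σ(x_t W_i + h_{t-1} U_i + b_i) (input), f_t = σ(x_t W_f + h_{t-1} U_f + b_f) (forget), and o_t = σ(x_t W_o + h_{t-1} U_o + b_o) (output); a candidate memory g_t = tanh(x_t W_g + h_{t-1} U_g + b_g) is combined as c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t, and the hidden state is h_t = o_t ⊙ tanh(c_t). The NumPy sketch below implements one step of this standard formulation and unrolls it over a window. The symbol names and the stacking of the four gate blocks into single matrices W, U, b are our assumptions; the study's own implementation may differ in detail.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step.

    x: (batch, n_in); h, c: (batch, n_hidden) previous hidden and cell state.
    W: (n_in, 4*n_hidden), U: (n_hidden, 4*n_hidden), b: (4*n_hidden,).
    Gate blocks are stacked in the order input, forget, candidate, output.
    """
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate memory content
    c_new = f * c + i * g                         # forget part of old memory, add new
    h_new = o * np.tanh(c_new)                    # output gate controls exposed state
    return h_new, c_new

def lstm_sequence(xs, W, U, b):
    """Run the cell over xs of shape (batch, time, n_in); return the final (h, c)."""
    batch, n_hidden = xs.shape[0], U.shape[0]
    h = np.zeros((batch, n_hidden))
    c = np.zeros((batch, n_hidden))
    for t in range(xs.shape[1]):
        h, c = lstm_step(xs[:, t, :], h, c, W, U, b)
    return h, c
```

The additive cell update c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t is what lets gradients flow across many steps when the forget gate stays open.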

Encoder-Decoder Architecture
The Encoder-Decoder architecture takes two different inputs: the past feature values and the known current feature values. The encoder compresses the information from the first input sequence into a vector generated from the sequence of LSTM hidden states [9]. The encoder hidden states, together with the second input, are fed into the decoder, which generates the output sequence. In addition, several dense layers were added before the output layer to improve the predicted sequence. Fig. gives a more detailed illustration of the encoder-decoder architecture. This architecture also helps the model learn the time-dependent characteristics of the sequence, giving better predictions of future values [10].
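A minimal NumPy sketch of the encoder-decoder forward pass is given below. It assumes that the decoder is an LSTM initialized with the encoder's final hidden and cell states and fed the known current features, followed by one hidden dense layer and a ReLU output layer. The layer sizes, the choice of two known features (for example, sine and cosine of time), and all function names are hypothetical. Weights are random, so the output only illustrates shapes and data flow, not a trained model.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def _lstm_step(x, h, c, p):
    """Standard LSTM step; p = (W, U, b) with gates stacked as i, f, g, o."""
    W, U, b = p
    i, f, g, o = np.split(x @ W + h @ U + b, 4, axis=1)
    c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
    h = _sigmoid(o) * np.tanh(c)
    return h, c

def make_lstm_params(n_in, n_hidden, rng):
    return (rng.normal(0, 0.1, (n_in, 4 * n_hidden)),
            rng.normal(0, 0.1, (n_hidden, 4 * n_hidden)),
            np.zeros(4 * n_hidden))

def encoder_decoder_forward(past, current, enc, dec, dense):
    """past: (batch, window, n_feat) past values.
    current: (batch, horizon, n_known) known current values.
    Returns predictions of shape (batch, horizon, n_out)."""
    batch, n_hidden = past.shape[0], enc[1].shape[0]
    h = np.zeros((batch, n_hidden))
    c = np.zeros((batch, n_hidden))
    for t in range(past.shape[1]):        # encoder: compress the past window
        h, c = _lstm_step(past[:, t, :], h, c, enc)
    W1, b1, W2, b2 = dense
    outputs = []
    for t in range(current.shape[1]):     # decoder: starts from the encoder state
        h, c = _lstm_step(current[:, t, :], h, c, dec)
        z = np.maximum(h @ W1 + b1, 0.0)  # dense layer before the output layer
        outputs.append(np.maximum(z @ W2 + b2, 0.0))  # ReLU output layer
    return np.stack(outputs, axis=1)
```

Because the decoder reuses the encoder's (h, c), the decoder must have the same hidden size as the encoder; only its input width (n_known) can differ.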

Evaluation Metrics
The models' performance on the validation dataset was assessed to determine the optimal model architecture for our research objective. The following metrics were used in this study.

Mean Squared Error (MSE)
Mean Squared Error, abbreviated as MSE, is a fundamental metric in regression analysis that calculates the average of the squared differences between predicted values and actual observations.It provides insight into the precision of a predictive model by quantifying the average squared error across all data points.Smaller MSE values are desirable as they indicate a model that predicts closer to the actual values.
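With observations y_i, predictions ŷ_i, and n data points, MSE = (1/n) Σ (y_i - ŷ_i)², and the MAE described in the next subsection is (1/n) Σ |y_i - ŷ_i|. A minimal pure-Python sketch of both metrics follows; the pH values in the example are made up for illustration.

```python
def mse(observed, predicted):
    """Mean squared error: average of the squared residuals."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def mae(observed, predicted):
    """Mean absolute error: average magnitude of the residuals."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Illustrative (made-up) pH values with a single large miss on the last point.
obs = [3.0, 3.1, 3.2, 3.3]
pred = [3.0, 3.1, 3.2, 5.3]
print(mse(obs, pred), mae(obs, pred))
```

Because errors are squared, one large miss weighs more heavily in MSE than in MAE: here a single residual of 2.0 among four points gives an MSE of about 1.0 but an MAE of about 0.5.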

Mean Absolute Error (MAE)
Mean Absolute Error, abbreviated as MAE, is another widely used metric in regression analysis that measures the average of the absolute differences between predicted values and actual observations. MAE provides a straightforward way to understand the average magnitude of the errors.

Dataset

Dataset Overview
The dataset for this study was acquired from a mining location in Indonesia. Data were collected every 7 days between 09 February 2021 and 02 September 2022. The data contain 7 parameters: pH, redox potential (ORP), conductivity, total dissolved solids (TDS), SO₄, Fe, and Mn. Fig. 4 shows the time series of the collected data.

Stationarity Test
Stationarity testing is an important step in time series analysis and forecasting. A time series is stationary when its properties do not depend on the time at which it is observed. Stationary data have no trend or seasonality, which makes them easier to analyze and forecast [11]. The Augmented Dickey-Fuller (ADF) test was therefore used to test the stationarity of each parameter at a significance level of 0.05 (Table 1). Line plots of the dataset were also drawn to help identify stationary or non-stationary behavior visually. According to the results, pH and ORP are non-stationary, as their p-values exceed the 0.05 threshold. In contrast, conductivity, TDS, SO₄, Fe, and Mn are stationary, with p-values below 0.05. This means that these parameters show consistent properties over time, without trends or seasonality, which makes them more amenable to analysis and forecasting. Overall, the ADF test indicates that most parameters in the dataset are stationary, with pH and ORP as the exceptions.
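The p-values in Table 1 would normally come from a library routine (for example, `adfuller` in statsmodels). To show what the test computes, the NumPy sketch below fits the ADF regression Δy_t = α + γ·y_{t-1} + Σ δ_i·Δy_{t-i} + ε_t with a constant and a fixed number of lagged differences, then compares the t-statistic of γ with the asymptotic 5% critical value of about -2.86 for the constant-only case. It does not compute p-values or select the lag length automatically; the default of one lag is an assumption.

```python
import numpy as np

ADF_CRIT_5PCT = -2.86  # asymptotic 5% critical value, constant and no trend

def adf_statistic(y, lags=1):
    """t-statistic of gamma in dy_t = a + gamma*y_{t-1} + sum_i d_i*dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                              # dy[j] = y[j+1] - y[j]
    n = len(dy) - lags                           # usable rows after lagging
    columns = [np.ones(n), y[lags:-1]]           # constant and lagged level y_{t-1}
    for i in range(1, lags + 1):
        columns.append(dy[lags - i:len(dy) - i])  # lagged differences dy_{t-i}
    X = np.column_stack(columns)
    target = dy[lags:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])    # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # OLS coefficient covariance
    return beta[1] / np.sqrt(cov[1, 1])

def is_stationary(y, lags=1):
    """Reject the unit-root null at the 5% level, i.e. treat y as stationary."""
    return adf_statistic(y, lags) < ADF_CRIT_5PCT
```

The null hypothesis is a unit root, so a statistic above the critical value (equivalently, a p-value above 0.05) means stationarity could not be established, which is how pH and ORP are classified above.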

Anomaly Detection
Anomalies are patterns in data whose characteristics differ from the expected conditions, and they often bias the data [12]. They can arise from several factors, e.g., contamination, device calibration, human error, or other causes. Detecting anomalies is therefore relevant and often improves performance during analysis and forecasting. Anomalies were detected with an isolation forest, a model-based approach that constructs an ensemble of trees in which anomalies are isolated close to the root, while normal points lie deeper. This approach detects anomalies effectively with a small number of trees and a small sub-sampling size, and therefore converges quickly [13]. The isolation forest was implemented with a contamination parameter of 0.2 (Fig. 5).
Using the isolation forest with the chosen contamination parameter of 0.2, anomalies were identified at indices 0, 1, 2, 3, 4, 6, 8, 9, 10, 11, 12, 13, 16, 17, 18, 25, and 82. These points deviate from the expected conditions due to several factors. Detecting them is important for improving the reliability and accuracy of the subsequent analysis and forecasting, and supports a more robust interpretation of the underlying patterns in the data.
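The text does not name the isolation forest implementation (scikit-learn's `IsolationForest` is a common choice). The NumPy sketch below reimplements the idea of [13]: random axis-aligned splits on sub-samples, the average path length normalized by c(ψ), and the highest-scoring fraction `contamination` of points flagged. The number of trees, sub-sample size, and seed are illustrative assumptions. With 83 observations, a contamination of 0.2 corresponds to about 17 flagged points (0.2 × 83 = 16.6), which matches the number of indices listed above.

```python
import numpy as np

def _c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n

def _grow(X, depth, max_depth, rng):
    """Recursively split X on a random feature at a random threshold."""
    if depth >= max_depth or len(X) <= 1:
        return ("leaf", len(X))
    f = rng.integers(X.shape[1])
    lo, hi = X[:, f].min(), X[:, f].max()
    if lo == hi:                            # a constant column cannot be split
        return ("leaf", len(X))
    s = rng.uniform(lo, hi)
    left = X[:, f] < s
    return ("node", f, s,
            _grow(X[left], depth + 1, max_depth, rng),
            _grow(X[~left], depth + 1, max_depth, rng))

def _path_length(x, node, depth=0):
    if node[0] == "leaf":
        return depth + _c(node[1])          # account for unresolved points in the leaf
    _, f, s, left, right = node
    return _path_length(x, left if x[f] < s else right, depth + 1)

def isolation_scores(X, n_trees=100, sample_size=64, seed=0):
    """Anomaly score in (0, 1]; values near 1 mean the point was isolated quickly."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    psi = min(sample_size, len(X))
    max_depth = int(np.ceil(np.log2(psi)))
    trees = [_grow(X[rng.choice(len(X), psi, replace=False)], 0, max_depth, rng)
             for _ in range(n_trees)]
    mean_path = np.array([np.mean([_path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_path / _c(psi))

def flag_anomalies(X, contamination=0.2, **kwargs):
    """Boolean mask marking the ceil(contamination * n) highest-scoring points."""
    scores = isolation_scores(X, **kwargs)
    k = int(np.ceil(contamination * len(scores)))
    mask = np.zeros(len(scores), dtype=bool)
    mask[np.argsort(scores)[::-1][:k]] = True
    return mask
```

The contamination value only sets how many of the highest-scoring points are flagged; it does not change the scores themselves.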

Data Interpolation
Training ANNs or deep learning models usually requires large amounts of data [14]. However, this dataset contains only 83 data points, collected every 7 days between 09 February 2021 and 02 September 2022. Predictive interpolation with random forest and tree-based gradient-boosting regression models was therefore used to fill in the missing values between sampling dates. For each parameter, the models were built on the time component, encoded with sine and cosine transformations. The data were split into training (80%) and testing (20%) sets. The value of each parameter was calculated as the average of the interpolation results of the top-3 best-performing models (Table 2), except for Mn, whose values were calculated from the results of random forest, XGBoost, and ExtraTrees.
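The interpolation setup can be sketched as follows, assuming that the time component is the day of year with a yearly period (the period actually used is not stated) and that "top-3" means the three models with the lowest validation error. The tree-based regressors themselves are not shown: `predictions` stands for their outputs on the daily grid, and the model names and numbers in the example are hypothetical placeholders.

```python
import math
from datetime import date, timedelta

def cyclical_time_features(day, period=365.25):
    """Encode day of year as (sin, cos) so that 31 Dec and 1 Jan end up close."""
    angle = 2.0 * math.pi * day.timetuple().tm_yday / period
    return math.sin(angle), math.cos(angle)

def daily_grid(start, end):
    """All calendar days from start to end inclusive: the points to interpolate."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

def top_k_average(predictions, errors, k=3):
    """Average the predictions of the k models with the lowest validation error."""
    best = sorted(errors, key=errors.get)[:k]
    n = len(predictions[best[0]])
    averaged = [sum(predictions[m][j] for m in best) / len(best) for j in range(n)]
    return averaged, best

# The sampling period of the study spans this daily grid.
grid = daily_grid(date(2021, 2, 9), date(2022, 9, 2))
features = [cyclical_time_features(d) for d in grid]
```

The sine and cosine pair is used instead of the raw day number so that the encoding wraps around at the end of the year.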

Data Transformation
ANN and deep learning models usually work best when their inputs are on the same scale, because an ANN is essentially a stack of linear transformations with non-linear activation functions [1]. Building forecasting models on untransformed data therefore often yields inaccurate forecasts. For this reason, the data were transformed to approximate a normal distribution. The time component was also transformed with a cosine transformation to provide the model with information about time.
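The exact transformation is not given here. As one common option, the sketch below standardizes each parameter to zero mean and unit variance using statistics from the training portion only, and inverts the transform so that forecasts can be reported in original units. Standardization puts all parameters on a comparable scale but does not by itself make a skewed variable normal; a log or power transform would be needed for that. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_scaler(train):
    """Column-wise mean and standard deviation, computed on training data only."""
    train = np.asarray(train, dtype=float)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0                  # leave constant columns unscaled
    return mean, std

def transform(data, scaler):
    mean, std = scaler
    return (np.asarray(data, dtype=float) - mean) / std

def inverse_transform(scaled, scaler):
    mean, std = scaler
    return np.asarray(scaled, dtype=float) * std + mean

# Synthetic stand-in for three parameters on very different scales.
rng = np.random.default_rng(0)
data = rng.normal(loc=[3.0, 400.0, 2000.0], scale=[0.2, 50.0, 300.0], size=(83, 3))
scaler = fit_scaler(data[:58])           # fit on the training part only
```

Fitting the scaler on training rows only keeps information from the test period out of the model.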

Model Development and Evaluation
In this study, three types of model architecture were developed: FNN, LSTM, and encoder-decoder LSTM. Each type was trained with three different window sizes, i.e., the number of past time steps the model uses to predict the current time step; the FNN was additionally trained without past data. Each combination of model type and window size was trained and tested independently, with the data split into training (70%) and testing (30%) sets. All models were trained to forecast all of the parameters. The models were evaluated on the basis of MSE, MAE, and NSE. All models were trained with a batch size of 4, a ReLU activation function in the output layer, the MAE loss, and the adaptive moment estimation (Adam) optimizer. The results are shown in Table 3. The hyperparameters of all models were tuned and optimized, and callbacks were used to retain the best-performing model.
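Training pairs for a given window size can be built with a sliding window, as in the NumPy sketch below. The study does not state whether the 70/30 split preserves time order; the sketch assumes a chronological split, which avoids training on observations that come after the test period. Function names are hypothetical, and `window=7` in the example is only one illustrative value.

```python
import numpy as np

def make_windows(series, window):
    """Turn a (time, features) array into supervised pairs.

    X[k] holds the `window` steps before step t = window + k, and y[k] is step t.
    """
    series = np.asarray(series, dtype=float)
    n_steps = series.shape[0]
    X = np.stack([series[t - window:t] for t in range(window, n_steps)])
    y = series[window:]
    return X, y

def chronological_split(X, y, train_frac=0.7):
    """Keep time order: the first 70% of samples train, the rest test."""
    cut = int(len(X) * train_frac)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])

# 100 time steps of 7 parameters, windows of 7 past steps.
data = np.arange(100 * 7, dtype=float).reshape(100, 7)
X, y = make_windows(data, window=7)
(X_train, y_train), (X_test, y_test) = chronological_split(X, y)
```

Each window shortens the usable series by `window` samples, so larger windows leave fewer training pairs from an already short record.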
Each model was also used to forecast the parameter values for the next 60 days. To determine the best model, the forecast results and plots in Fig. 6, Fig. 7, and Fig. 8 were used, together with the training and validation losses and a comparison against measured data. In general, low training and validation losses imply that the model fits both the training data and new, unseen validation data well. Besides low losses, a good fit is also characterized by the gap between the training loss and the validation loss. If the gap is small, the model is probably well fitted; if the gap is large, the model may be overfitted; and if the validation loss is lower than the training loss, the model may be underfitted. Neither overfitted nor underfitted models are desirable.
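The rule of thumb above can be written as a small helper. The relative tolerance is a hypothetical choice, and because a validation loss below the training loss can also result from regularization such as dropout (active only during training), the labels are indicative rather than conclusive. `best_epoch` mimics what a checkpoint callback keeps.

```python
def diagnose_fit(train_loss, val_loss, rel_tol=0.1):
    """Label final losses with the rule of thumb described in the text.

    rel_tol is a hypothetical tolerance on the relative train/validation gap.
    """
    gap = (val_loss - train_loss) / max(abs(train_loss), 1e-12)
    if gap > rel_tol:
        return "possible overfit"    # validation loss well above training loss
    if gap < -rel_tol:
        return "possible underfit"   # validation loss well below training loss
    return "good fit"

def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])
```

Using a relative gap rather than an absolute one keeps the rule usable whether losses are computed on scaled or original units.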

Result and Discussion
In terms of training and validation performance, the encoder-decoder LSTM performed best overall compared with the other model types and structural variations. It fits the dataset well without overfitting. Furthermore, increasing the window size generally improved performance, meaning that a longer sequence of past values gives the model more information when predicting the current value. Beyond the individual model performances, the results suggest that ANN models can be applied to AMD time series forecasting and analysis.
The models were also evaluated by forecasting all features for 60 days (Fig. 6, Fig. 7, and Fig. 8). Because the kinetic test measurements were not carried out daily, the measured data contain only 9 observations within the 60-day forecasting period, so performance and error were calculated only on the dates with available measurements. Given its training and validation performance, only the encoder-decoder LSTM was evaluated in this way, for all window sizes (Table 4). Based on this result, the encoder-decoder model with a 7-day window of past values performed best, as shown by its lower MSE and MAE compared with the other window sizes.
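Scoring a daily forecast against sparse lab measurements amounts to pairing values on the dates both contain and computing the metrics on that overlap only. The sketch below assumes forecasts and measurements are stored as date-keyed dictionaries; the start date and the pH values are hypothetical.

```python
from datetime import date, timedelta

def align_on_dates(forecast, measured):
    """Return (observed, predicted) lists for the dates present in both dicts."""
    common = sorted(set(forecast) & set(measured))
    return [measured[d] for d in common], [forecast[d] for d in common]

def mse(obs, pred):
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

# Hypothetical example: a 60-day daily forecast and weekly lab measurements.
start = date(2022, 9, 3)
forecast = {start + timedelta(days=i): 3.0 for i in range(60)}
measured = {start + timedelta(days=7 * k): 3.5 for k in range(10)}  # last is day 63
obs, pred = align_on_dates(forecast, measured)
```

The measurement on day 63 falls outside the forecast horizon and is dropped, so the errors are computed on 9 paired points, mirroring the situation described above.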

Conclusion
The forecasted data show low errors in terms of MAE and MSE and follow the historical trends and patterns. Given this, the best-performing model from the proposed methodology can be applied with confidence to forecasting AMD. The ANN approach shows that a model can learn the patterns, trends, and seasonality of past data in order to forecast future values, which makes ANN modeling a relevant contribution to solving AMD problems. The results obtained in this study indicate that ANN techniques are a powerful and important mechanism for modeling and forecasting AMD data, and nonlinear systems in general. These approaches also show better performance and accuracy than traditional time series analysis and statistical techniques. Finally, this study shows that the results of the actual lab-scale kinetic test can be predicted, allowing AMD to be forecast in a shorter time, with high accuracy and cost efficiency.

Figure 6. Forecasted concentrations of all parameters over 60 days from the Feedforward Neural Network. Historical data from 15 June 2022 to 02 September 2022 are shown for better visualization.

Figure 7. Forecasted concentrations of all parameters over 60 days from the Multivariate LSTM. Historical data from 15 June 2022 to 02 September 2022 are shown for better visualization.

Figure 8. Forecasted concentrations of all parameters over 60 days from the Encoder-Decoder LSTM. Historical data from 15 June 2022 to 02 September 2022 are shown for better visualization.
Nash-Sutcliffe Efficiency (NSE) assesses the goodness of fit between observed and simulated data. With values ranging from negative infinity to 1, NSE quantifies how well a model reproduces the observed data. An NSE of 1 represents a perfect match between model predictions and observed data, while values greater than 0 indicate that the model is better than using the mean of the observed data. Conversely, negative NSE values indicate that the model performs worse than a simple mean-based estimate.
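In the usual notation, with observations O_t, simulated values S_t, and the observed mean Ō, NSE = 1 - Σ(O_t - S_t)² / Σ(O_t - Ō)², i.e. one minus the ratio of the model's MSE to the variance of the observations. A minimal pure-Python sketch, with made-up values in the usage lines:

```python
def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 - SSE / (sum of squared deviations from the mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - sse / sst

obs = [2.0, 4.0, 6.0]
perfect = nse(obs, obs)                    # exact match
mean_only = nse(obs, [4.0, 4.0, 4.0])      # always predicts the observed mean
reversed_fit = nse(obs, [6.0, 4.0, 2.0])   # worse than the mean
```

These three cases correspond to the interpretation above: a perfect model scores 1, predicting the observed mean scores 0, and anything worse than the mean is negative (here -3.0).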

Table 3. ANN model structures and performance.