Development of Prediction Models for Bond Strength of Steel Fiber Reinforced Concrete by Computational Machine Learning

Sustainable construction has encouraged the use of recycled and waste materials as substitutes in conventional concrete. This research focuses on predicting the normalized bond strength of cement concrete in which large amounts of material are replaced by waste materials and products, while retaining strong mechanical properties and sustainability. It also emphasizes using an analytical model to predict the bond strength of this green concrete, so as to reduce construction cost, conserve energy and reduce CO2 emissions from the cement industry within reliable limits. In this paper, a machine learning approach is used to predict the normalized bond strength of green and sustainable concrete. Machine learning enables machines to learn from experience and from the data provided: the system analyses a dataset, finds patterns in it and, based on what it has learned, makes predictions. In civil engineering applications, machine learning (ML) is in high demand. An artificial neural network (ANN) is a soft computing technique that learns from previous situations and adapts to new ones. In this work, an ML model for predicting the normalized bond strength of concrete is presented. Data sets covering several concrete mix designs were taken from the technical literature and fed to the model. The model was then trained to predict bond strength from several input attributes, and the results were analysed using linear regression.


Introduction
Machine learning is an area of study that enables computers or systems to learn from experience and improve. Arthur Samuel defined machine learning as "the field of study that gives computers the ability to learn without being explicitly programmed". This definition is not a very formal one [1,3]. A more recent and more formal definition was given by Tom Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E". For example, suppose the user supplies pairs of digits and then enters one digit of a new pair, expecting the machine to return the other. The machine has to identify the rule relating the digits in each pair and use it to predict the missing value [4,7]. This process of discovering the underlying rule and learning from experience is the essence of machine learning.
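The number-pairing example can be made concrete with a minimal Python sketch. It assumes the hidden rule is linear (here a hypothetical y = 2x + 1) and learns it by ordinary least squares from the pairs alone; the function names are illustrative, and real tasks involve noisier data and richer models.

```python
# Minimal sketch: "learning" the rule that links paired numbers.
# The pairs follow a hypothetical rule y = 2x + 1 chosen for illustration;
# the learner only sees the pairs, never the rule itself.

def fit_line(pairs):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Experience E: example pairs supplied by the user.
training_pairs = [(1, 3), (2, 5), (3, 7), (4, 9)]
slope, intercept = fit_line(training_pairs)

# Task T: given the first number of a new pair, predict the second.
def predict(x):
    return slope * x + intercept

print(slope, intercept)  # 2.0 1.0
print(predict(10))       # 21.0
```

Adding more (and noisier) pairs is the "experience E"; the performance measure P could be the prediction error on pairs not seen during fitting.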
Machine learning is a technology that emphasises learning from data: the system analyses a dataset, finds patterns in it and, based on what it has learned, makes predictions [8].
There are many different approaches to machine learning; the most commonly used are supervised learning, semi-supervised learning, unsupervised learning and reinforcement learning. In this work, experimental data were collected and analysed in order to apply machine learning techniques. Each record contains both the input features and the corresponding output feature, so supervised learning was the appropriate approach.

Experimental Setup
The dataset as obtained was not in a form suitable for applying machine learning algorithms, so it was preprocessed. The dataset was first normalised; outliers were then removed and missing data were handled. After cleaning and preprocessing, the dataset contained 361 instances. This dataset was then used to train machine learning algorithms to predict the normalised bond strength (NBS) of steel [9,12].
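The three preprocessing steps can be sketched in NumPy as follows. The specific choices (min-max scaling to [0, 1], the 1.5 x IQR outlier rule and column-mean imputation) are assumptions for illustration, since the paper does not state which methods were used; NaN marks a missing value and the toy data are hypothetical.

```python
import numpy as np

def preprocess(data, iqr_factor=1.5):
    """Normalise, drop outlier rows, then impute missing values.

    Assumptions (not from the paper): min-max scaling to [0, 1],
    the 1.5 x IQR rule for outliers, and column-mean imputation.
    NaN marks a missing value.
    """
    data = np.asarray(data, dtype=float)

    # 1. Min-max normalisation per column, ignoring missing entries.
    col_min = np.nanmin(data, axis=0)
    col_max = np.nanmax(data, axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid /0
    scaled = (data - col_min) / span

    # 2. Drop rows with any value outside [Q1 - k*IQR, Q3 + k*IQR].
    q1 = np.nanpercentile(scaled, 25, axis=0)
    q3 = np.nanpercentile(scaled, 75, axis=0)
    iqr = q3 - q1
    low, high = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    # Comparisons with NaN are False, so a missing value never flags a row.
    outlier = ((scaled < low) | (scaled > high)).any(axis=1)
    kept = scaled[~outlier]

    # 3. Replace remaining missing values by the column mean.
    col_mean = np.nanmean(kept, axis=0)
    missing = np.isnan(kept)
    kept[missing] = np.take(col_mean, np.where(missing)[1])
    return kept

# Hypothetical two-column example.
raw = [
    [1.0, 10.0],
    [2.0, np.nan],   # missing value
    [3.0, 30.0],
    [4.0, 40.0],
    [100.0, 50.0],   # extreme value in column 0
]
clean = preprocess(raw)
print(clean.shape)  # (4, 2)
```

Because normalisation comes first, following the order described above, an extreme value stretches the scale of its column before it is removed; rescaling again after outlier removal is a common alternative.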
To predict the normalised bond strength of concrete, NBS was taken as the output parameter, and the following were taken as input parameters: strength grade, fc cube, fibres, volume fraction, type of aggregate, admixture, admixture content (%), specimen geometry, diameter, bond length, length to diameter, concrete cover, cover to diameter, type of bar, age at testing, maximum temperature, time at maximum temperature, cooling, type of test and temperature variation [11,13]. There were thus 21 features in total: 20 input features and 1 output feature to be predicted, as shown in Table 1.
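A small schema helps keep the 20 inputs and the single output apart when building the training matrix. The snake_case names below are hypothetical identifiers for the parameters listed above, the categorical subset is taken from the CART section of this paper, and `split_record` is an illustrative helper.

```python
# Hypothetical column names for the 20 inputs listed above.
INPUT_FEATURES = [
    "strength_grade", "fc_cube", "fibres", "volume_fraction",
    "type_of_aggregate", "admixture", "admixture_content_pct",
    "specimen_geometry", "diameter", "bond_length", "length_to_diameter",
    "concrete_cover", "cover_to_diameter", "type_of_bar", "age_at_testing",
    "max_temperature", "time_at_max_temperature", "cooling",
    "type_of_test", "temperature_variation",
]
OUTPUT_FEATURE = "nbs"  # normalised bond strength

# Categorical inputs as listed in the CART section; the rest are continuous.
CATEGORICAL = {
    "strength_grade", "fibres", "type_of_aggregate", "admixture",
    "specimen_geometry", "type_of_bar", "cooling", "type_of_test",
}
CONTINUOUS = [f for f in INPUT_FEATURES if f not in CATEGORICAL]

def split_record(record):
    """Split one data row (a dict) into its input values and its target."""
    x = [record[name] for name in INPUT_FEATURES]
    y = record[OUTPUT_FEATURE]
    return x, y

print(len(INPUT_FEATURES), len(CATEGORICAL), len(CONTINUOUS))  # 20 8 12
```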
We trained several machine learning algorithms: elastic net regression, lasso regression, ridge regression, support vector regression, random forest regression, multiple linear regression and CART regression. Multiple linear regression and CART regression gave the best results, so this paper discusses the results obtained with these two algorithms [14,16].
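The comparison of algorithms can be sketched for two of the linear methods named above. This minimal illustration assumes NumPy and synthetic data, and scores ordinary least squares (multiple linear regression, alpha = 0) and closed-form ridge regression on the same 10 cross-validation folds; the function names and the use of RMSE as the score are assumptions, not the authors' setup.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle row indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept.
    alpha = 0 gives ordinary least squares (multiple linear regression)."""
    Xc = np.column_stack([np.ones(len(X)), X])
    penalty = alpha * np.eye(Xc.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xc.T @ Xc + penalty, Xc.T @ y)

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def cv_rmse(X, y, alpha, k=10):
    """Mean RMSE over k held-out folds."""
    scores = []
    for test_idx in kfold_indices(len(y), k):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        coef = fit_ridge(X[train], y[train], alpha)
        resid = y[test_idx] - predict(coef, X[test_idx])
        scores.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(scores))

# Synthetic stand-in data (the paper's dataset is not reproduced here).
rng = np.random.default_rng(1)
X = rng.normal(size=(361, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) + rng.normal(scale=0.5, size=361)

for alpha in (0.0, 1.0, 10.0):
    print(f"alpha={alpha:5.1f}  10-fold CV RMSE={cv_rmse(X, y, alpha):.3f}")
```

Reusing the same seed keeps the folds identical across models, so differences in cross-validated RMSE reflect the models rather than the particular split.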

Multiple Linear Regression
Multiple linear regression is a technique for finding the relationship between one dependent variable and two or more independent variables. Here there is one dependent variable, also called the response variable, and 20 independent variables, called predictors. The coefficient of each predictor in the fitted equation represents its contribution to the predicted response. Using Minitab, we derived an equation relating these parameters so that the normalised bond strength of steel could be estimated. Figure 1 depicts the methodology used to evaluate the NBS. Categorical predictors were coded using (1,0) coding, and 10-fold cross-validation was used to check that the model neither overfits nor underfits the data. The dataset considered for the analysis is normally distributed; hence an analysis of variance (ANOVA) was carried out to assess how the proposed input parameters contribute to the predictions of the fitted model. The adjusted sum of squares measures the variation in the response that is explained by each term of the model. The adjusted mean square quantifies how much variation a term explains, assuming that all other terms are in the model regardless of their order of entry; for the error term, it is the variance of the observations around the fitted values. Figure 3 shows the analysis of variance, which indicates that temperature variation, type of aggregate, fc cube, admixture, strength grade, length to diameter and age at testing have a significant effect on the estimation of NBS. The trained model had an R-squared of about 87%, as shown in the model summary in Figure 4.
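The (1,0) coding and the R-squared values reported in a model summary can be illustrated with a small sketch. It assumes NumPy, uses hypothetical aggregate labels and noise-free toy data (so R-squared comes out as 1, unlike the real model), and fits the model with `numpy.linalg.lstsq`; it is not the Minitab procedure used in this work, and the helper names are illustrative.

```python
import numpy as np

def one_zero_code(values, levels):
    """(1, 0) indicator coding: one column per level except the first,
    which serves as the reference level (all zeros)."""
    return np.array([[1.0 if v == lev else 0.0 for lev in levels[1:]]
                     for v in values])

def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bp]."""
    Xc = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return coef

def r2_scores(X, y, coef):
    """Return (R^2, adjusted R^2) for a fitted linear model."""
    fitted = coef[0] + X @ coef[1:]
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n, p = X.shape
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

# Toy data: hypothetical aggregate types and temperature variations, with a
# noise-free response so the fit is exact.
aggregate = ["natural", "recycled", "natural", "slag",
             "recycled", "slag", "natural", "recycled"]
temp_var = np.array([20.0, 200.0, 400.0, 20.0, 600.0, 200.0, 800.0, 400.0])
dummies = one_zero_code(aggregate, ["natural", "recycled", "slag"])
X = np.column_stack([temp_var, dummies])
y = 1.0 - 0.001 * temp_var - 0.1 * dummies[:, 0] - 0.05 * dummies[:, 1]

coef = fit_ols(X, y)
r2, adj_r2 = r2_scores(X, y, coef)
print(np.round(coef, 4), round(r2, 4), round(adj_r2, 4))
```

Each dummy coefficient is read as the shift in predicted NBS relative to the reference level ("natural" here), holding the other predictors fixed.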
A normal probability plot of the residuals is used to verify the assumption that the residuals are normally distributed; the plot obtained for our model, shown in Figure 5, supports the adequacy of the model. The residuals versus fits plot in Figure 6 verifies the assumption that the residuals have constant variance. Figure 7 shows the histogram of residuals, which is used to assess the skewness of the data and to detect outliers.
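Two of these residual checks can be approximated numerically with the Python standard library. The plotting position (i - 0.5)/n is one common choice and an assumption here (Minitab may use a different one); the correlation of the plot points and the sample skewness are simple numeric summaries of what the normal probability plot and the histogram show, not the exact computations behind the figures.

```python
from statistics import NormalDist, mean, pstdev

def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard-normal quantile,
    using the plotting position (i - 0.5) / n (one common choice)."""
    n = len(residuals)
    ordered = sorted(residuals)
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

def straightness(points):
    """Pearson correlation of the plot points; values near 1 mean the
    residuals lie close to a straight line, i.e. look normal."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def skewness(residuals):
    """Sample skewness (population form); near 0 for a symmetric histogram."""
    m, s = mean(residuals), pstdev(residuals)
    return mean(((r - m) / s) ** 3 for r in residuals)

# Hypothetical residuals from a fitted model.
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
points = normal_plot_points(residuals)
print(round(straightness(points), 3), skewness(residuals))
```

A strongly positive skewness, or a straightness value well below 1, would point to the kind of departure from normality that the plots are meant to reveal.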

CART Regression
CART stands for Classification And Regression Trees. It refers to a decision tree model used for predictive modelling. The representation of this model is in the form of a binary tree. It can be used for both classification and regression. Since our output variable is continuous we used CART regression for predicting the normalised bond strength of steel.
To train the model, we treated strength grade, fibres, type of aggregate, admixture, specimen geometry, type of bar, cooling and type of test as categorical predictors and the remaining inputs as continuous predictors for predicting the NBS. The method used to make predictions with CART regression is shown in Figure 9. Least squared error was used as the node-splitting criterion, and 10-fold cross-validation was applied to guard against overfitting. The model was evaluated using the R-squared value and the root mean squared error (RMSE); for an ideal model, the R-squared value is 1 and the RMSE is 0.
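The least-squared-error splitting rule can be sketched for a single continuous predictor. This is a minimal illustration of the criterion, not Minitab's implementation; the temperature and NBS values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, sse_after_split) for the binary split x <= t that
    minimises the total squared error of the two child nodes.
    threshold is None when no split improves on the parent node."""
    pairs = sorted(zip(x, y))
    best = (None, sse(list(y)))  # parent node, no split
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical temperature-variation values and NBS responses.
temp = [20, 100, 200, 400, 600, 800]
nbs = [1.0, 1.0, 0.9, 0.5, 0.4, 0.3]
threshold, err = best_split(temp, nbs)
print(threshold, round(err, 4))  # 300.0 0.0267
```

A full CART model repeats this search over every predictor at every node, grows the tree, and then uses cross-validation to choose its size; splits on categorical predictors group levels instead of using a threshold, which this sketch omits.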
The tree obtained with this approach is shown in Figure 10. The model used for predicting NBS has 35 terminal nodes and achieved an R-squared of more than 86%, as shown by the R-squared and RMSE values in Figure 11. The figure also shows that all the input parameters are important for predicting the normalised bond strength of steel.
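For reference, the two evaluation measures can be written in their standard form, where $y_i$ is the observed NBS of instance $i$, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean observed NBS and $n$ the number of instances (symbols introduced here for illustration).

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
```

Both reach their ideal values, $R^2 = 1$ and $\mathrm{RMSE} = 0$, only when every prediction equals the corresponding observation.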

Results and Discussion
For the multiple linear regression model, the variables in decreasing order of importance were temperature variation, type of aggregate, fc cube, length to diameter, admixture, strength grade and age at testing; the other features contributed little to predicting NBS. For the CART regression model, the variable importance in decreasing order was temperature variation, bond length, length to diameter, maximum temperature, cover to diameter, concrete cover, fibres, type of aggregate, admixture, diameter, admixture content, strength grade, time at maximum temperature, fc cube, specimen geometry, type of bar, type of test, cooling and volume fraction.
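One common way to rank variables in a regression tree is to credit each variable with the total reduction in squared error from the splits made on it, scaled so the top variable scores 100. The sketch below is a simplified version of that idea with hypothetical split values and feature names; Minitab's variable importance may also account for surrogate splits, so its rankings can differ from this calculation.

```python
from collections import defaultdict

def relative_importance(splits):
    """Impurity-decrease importance for a regression tree.

    `splits` is a list of (feature, sse_parent, sse_left + sse_right) tuples,
    one per internal node. Each feature is credited with the total reduction
    in squared error achieved by splits on it; scores are then scaled so the
    most important feature gets 100.
    """
    gain = defaultdict(float)
    for feature, sse_parent, sse_children in splits:
        gain[feature] += sse_parent - sse_children
    top = max(gain.values())
    ranked = sorted(gain.items(), key=lambda kv: kv[1], reverse=True)
    return [(f, 100.0 * g / top) for f, g in ranked]

# Hypothetical splits from a small tree (numbers for illustration only).
splits = [
    ("temperature_variation", 10.0, 4.0),
    ("bond_length", 4.0, 2.5),
    ("temperature_variation", 2.0, 1.0),
    ("length_to_diameter", 2.5, 1.75),
]
for name, score in relative_importance(splits):
    print(f"{name:24s} {score:6.1f}")
```

A variable that is used in several splits, like the temperature variation here, accumulates credit from each of them, which is one reason tree-based and regression-based rankings can differ.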
The multiple linear regression model did not include specimen geometry and type of test in its regression equation, whereas the CART regressor considered all 20 parameters important for making predictions.

Conclusion
In this study, we developed two machine learning models to predict the normalised bond strength of steel: a multiple linear regression model and a CART regression model. The two models performed comparably, with R-squared values of 86.82 percent for the multiple linear regression model and 86.22 percent for the CART regression model. The models were developed from a reference dataset using ANN and linear regression techniques. The model's output for its respective inputs was checked against the regression equation used to forecast the bond strength. Residual plots for bond strength were constructed to evaluate the goodness of fit for the various sets of input parameters and to determine whether the ordinary least squares (OLS) assumptions are satisfied; if they are, OLS regression yields unbiased coefficient estimates with the minimum variance among linear unbiased estimators. For a regression model, the ideal value of RMSE is 0 and that of R-squared is 1. The multiple linear regression model, with an R-squared of 86.82 percent, can predict bond strength quickly and with reasonable precision. The ANN models developed will contribute to saving time, reducing waste material and decreasing the overall cost of a project.