Machine learning methods for soil moisture prediction in vineyards using digital images

In this paper, we propose to estimate the moisture of vineyard soils from digital photography using machine learning methods. Two nonlinear regression models are implemented: a multilayer perceptron (MLP) and a support vector regression (SVR) model. Pixels coded with the RGB colour model and extracted from soil digital images, along with the associated known soil moisture levels, are used to train both models so that they can predict moisture content from newly acquired images. The study is conducted on samples of six soil types collected from Chateau Kefraya terroirs in Lebanon. Both methods succeeded in predicting moisture, giving high correlation values between the measured and predicted moisture when tested on unseen data. The SVR-based method outperformed the MLP-based one, yielding Pearson correlation coefficient values ranging from 0.89 to 0.99. Moreover, it is a simple and noninvasive method that can easily be adopted to monitor vineyard soil moisture.


Introduction
Vine growth and production depend on the vine's water status, which is directly related to the functioning of the root system and to the availability of water and minerals in the soil. Soil fertility is directly related to soil humidity [1][2]. Therefore, determining soil humidity is an important tool in terroir characterization, which is itself an essential procedure for viticulture development [3][4].
By definition, soil moisture is the ratio of the water mass in a sample to its total mass, expressed as a percentage. This is how the thermo-gravimetric method measures soil moisture [5], but it is a destructive method that requires collecting the soil sample and drying it at 105 °C for 24 hours. In agriculture, different types of tools are used to detect soil moisture, such as tensiometers, Frequency Domain Reflectometry (FDR) and Time-Domain Reflectometry (TDR). Tensiometers work well in humid to semi-humid soils, but show uncertainty and even total dysfunction in dry soils, such as vineyard soils at the end of the grape-growing season [6]. TDR and FDR, for their part, are accurate but expensive [5].
These constraints prompted several researchers to implement computer-based methods as a smart agriculture tool to predict soil moisture. In the field of machine learning, Altendorf et al. claimed that a neural network based model outperforms linear regression methods in forecasting soil moisture from soil temperature data [7]. In [8], a support vector machine built with meteorological data predicted soil moisture for four to seven days ahead. Meteorological data are also used as input to a neural network that predicts the soil humidity in [9].
Based on the fact that soils get darker with increased moisture [10], several methods were proposed to predict soil moisture using image processing. A linear regression model is constructed in order to predict soil moisture from digital soil images where the predictors are the S and V values of the pixels coded with the HSV colour model [11]. Another linear regression model is proposed for each soil type to forecast soil moisture from soil images where the independent variables are chosen among the features of the RGB and HSV colour models as well as the digital number of panchromatic images [12]. A neural network based model is built to predict soil moisture of tropical soils where the network inputs are the R,G,B values of the pixels extracted from soil digital images [13].
The aim of the present study is to use machine learning methods to estimate vineyard soil moisture from digital photographs of the soil. Soil samples with different moisture contents were photographed. The colour information extracted from the photos and the measured moisture content were used to train two nonlinear regression methods. The first method is a neural network, more specifically a multilayer perceptron (MLP) with one hidden layer. The second method is a support vector machine used for nonlinear regression (SVR). Both methods succeeded in predicting soil moisture, yielding high values of the correlation coefficient R between the predicted and measured moisture when both models were tested on unseen data: R ranged between 0.84 and 0.97 for the MLP and between 0.89 and 0.99 for the SVR.

Materials and methods
The study went through several phases. First, soil samples were collected from vineyards of six different terroirs. Then, the soil samples were immersed in water for 48 hours and left to dry after drainage of excess water. The samples were photographed and weighed daily. When the mass reduction became negligible, the soil samples were dried completely in the oven and their moisture contents were calculated according to the thermo-gravimetric method. RGB colour data extracted from the photos, along with the associated measured moisture, were used to train two nonlinear regression models that predict soil moisture content.

Soil sampling
The soil samples used for data collection were collected from Chateau Kefraya terroirs in Lebanon. This area was chosen because it is an important agricultural area cultivated mainly with wine grapes. Pure soil categories from relatively wide units were selected randomly for the study. Table 1 shows the soil types, the number of samples collected for each type and the colour of each type.
Sample collection was done in two steps: first, the rocks were removed from the surface; then, around 3 kg of soil was collected at a depth of 10 cm at each point. At this depth the soil is arable, so its moisture can be investigated. Each sample was mixed and divided into two sub-samples: one was used to conduct the experiment and the other was sent to the soil analysis laboratory of the Lebanese Agricultural Research Institute. The physical characteristics of the soils obtained from the laboratory analysis are summarized by averaging over all the replicates of each soil type, as shown in Table 2.
To conduct the experiment, the soil samples were placed in plastic containers measuring 30 cm in length, 22 cm in width and 7 cm in depth. Each container is perforated at the bottom with 1 cm diameter holes to allow drainage of excess water, and the holes are covered with a fine mesh screen to prevent soil loss. Each container received an equal amount of distilled water and was then placed in a larger container full of water and soaked for 48 hours until it reached its full water capacity. Starting on day one, each container was weighed and photographed daily in a dark room with a Canon EOS 1200D digital camera of 18 Mpx resolution. The camera was fixed on a tripod at a height of 0.5 m, with its lens facing down and parallel to the plane of the container. A continuous light source illuminates the soil sample at 45°, and four white foam panels placed around the container act as reflectors so that the sample receives a continuous soft light. The custom white balance setting of the camera is used to calibrate the colours. The mass of every sample was measured daily with a tared digital balance. The mass reduction became negligible on day 52. On day 53, the soil samples were completely dried in the oven at a temperature of 105-110 °C. After complete drying, the samples were weighed and photographed following the same procedure. The daily soil moisture content, as a percentage, is calculated as the water mass divided by the total mass of the moist sample:

$$\theta_d = \frac{m_d - m_{\mathrm{dry}}}{m_d} \times 100 \qquad (1)$$

where $m_d$ is the mass of the sample on day $d$ and $m_{\mathrm{dry}}$ is its oven-dry mass. Table 3 shows the maximum and minimum moisture content per soil type.
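As a minimal sketch of Eq. (1), assuming the daily and oven-dry masses are soil masses with the container mass already removed by taring, the wet-basis moisture can be computed as follows. The function name `moisture_percent` and the example masses are hypothetical and not taken from the study.

```python
def moisture_percent(daily_mass, dry_mass):
    """Wet-basis soil moisture (%): water mass divided by the total moist mass.

    daily_mass: mass of the soil sample on a given day (same unit as dry_mass).
    dry_mass:   mass of the same sample after oven drying at 105-110 °C.
    """
    if daily_mass <= 0:
        raise ValueError("daily_mass must be positive")
    if not 0 <= dry_mass <= daily_mass:
        raise ValueError("dry_mass must lie between 0 and daily_mass")
    water_mass = daily_mass - dry_mass
    return 100.0 * water_mass / daily_mass


# Hypothetical daily masses (g) of one sample and its oven-dry mass
daily_masses = [3000.0, 2900.0, 2800.0]
dry = 2400.0
print([round(moisture_percent(m, dry), 2) for m in daily_masses])
# [20.0, 17.24, 14.29]
```

For example, a sample weighing 3000 g on a given day and 2400 g after oven drying holds 600 g of water, i.e. 20% moisture on this wet basis. Many soil-science references instead divide the water mass by the dry mass (dry basis), which would give 25% for the same sample; the sketch follows the total-mass definition given in the introduction.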

Data acquisition
Each day of the experiment yields 35 photos. In order to collect the data from the photos, the following steps are performed:
1 Each photo is coded according to the RGB colour model. It is an additive colour model in which the primary colours (red, green and blue) are combined to make 16,777,216 (256 × 256 × 256) colours. Thus, each pixel is described by its red, green and blue components, each ranging from 0 to 255.
2 A window of 500x500 pixels is extracted from the center of each photo.
3 Outlier pixels are removed from the cropped window. To do so, the sum of the red, green and blue values of each pixel is calculated, and the first quartile (Q1) and third quartile (Q3) of all the sums are computed. Pixels whose RGB sum is less than Q1 or greater than Q3 are removed [13], so that only the central half of the pixels, ranked by RGB sum, is retained.
4 The cropped window is divided into 9 sub-windows. For each sub-window, the mean of the red, green and blue components of the remaining pixels is computed.
Therefore, each photo gives nine three-dimensional data vectors. The soil moisture content measured on the day the photo was taken is associated with these nine RGB data vectors. In order to limit the size of the training dataset, data from 24 regularly spaced days among the 53 were retained for each soil type, making a total of 1296 observations per soil type. This is intended to reduce the risk of overfitting and to shorten the model training time.
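The four extraction steps can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the image is assumed to be already loaded as an `(H, W, 3)` RGB array, quartiles use NumPy's default percentile interpolation (other software may compute quartiles slightly differently), and because 500 is not divisible by 3 the sub-windows are 166 or 167 pixels wide. The function name `extract_rgb_vectors` is hypothetical.

```python
import numpy as np


def extract_rgb_vectors(image, window=500, grid=3):
    """Return grid*grid mean-RGB vectors from the central window of an RGB image.

    image: array of shape (H, W, 3), assumed to be in RGB channel order.
    """
    h, w, _ = image.shape
    if h < window or w < window:
        raise ValueError("image is smaller than the crop window")

    # Step 2: crop a window x window region from the center of the photo
    top, left = (h - window) // 2, (w - window) // 2
    crop = image[top:top + window, left:left + window].astype(np.float64)

    # Step 3: keep only pixels whose R+G+B sum lies within [Q1, Q3]
    sums = crop.sum(axis=2)
    q1, q3 = np.percentile(sums, [25, 75])
    keep = (sums >= q1) & (sums <= q3)

    # Step 4: split into grid x grid sub-windows and average the kept pixels
    edges = np.linspace(0, window, grid + 1).astype(int)
    vectors = []
    for i in range(grid):
        for j in range(grid):
            rows = slice(edges[i], edges[i + 1])
            cols = slice(edges[j], edges[j + 1])
            pixels = crop[rows, cols][keep[rows, cols]]  # shape (k, 3)
            if pixels.size == 0:
                vectors.append(np.full(3, np.nan))  # no pixel survived the filter
            else:
                vectors.append(pixels.mean(axis=0))
    return np.vstack(vectors)  # shape (grid * grid, 3)
```

Each call returns a 9 × 3 array; stacking these arrays over the photos retained for a soil type, and pairing every row with the moisture measured that day, yields the observations described above.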
Two nonlinear regression models based on machine learning are built to predict soil moisture from the soil digital photos. The first is an artificial neural network, more specifically a multilayer perceptron; the second is a support vector machine used for regression. Supervised learning is used to train the models with the data collected from the experiment, where each data sample consists of an input vector x described by the red, green and blue components, and the corresponding measured moisture y, also called the target moisture.

Multilayer perceptron method
A multilayer perceptron (MLP) [14] with one hidden layer of seven neurons is constructed to predict soil moisture content from the soil digital images (Fig.1). The number of hidden neurons is set to seven according to the Kolmogorov method, which states that the number of hidden neurons in an MLP equals 2*(number of inputs)+1 [15]; with three inputs (R, G and B), this gives 2*3+1 = 7. A higher number of hidden neurons might give better results on the training data, but it could limit the generalization capabilities of the network. The input layer is fully connected to the hidden layer, and the hidden layer is fully connected to the output neuron. The sigmoid function

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$

is the activation function of the hidden neurons, whereas a linear function is the activation function of the output neuron.
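A minimal NumPy sketch of the 3-7-1 forward pass described above follows. It assumes the logistic sigmoid of Eq. (2) in the hidden layer (toolbox defaults for the hidden transfer function may differ) and leaves out input normalization and training; the function names, weight shapes and random weights are illustrative assumptions.

```python
import numpy as np


def hidden_neurons(n_inputs):
    """Rule quoted in the text: 2 * (number of inputs) + 1."""
    return 2 * n_inputs + 1


def sigmoid(z):
    """Logistic sigmoid, Eq. (2)."""
    return 1.0 / (1.0 + np.exp(-z))


def mlp_forward(x, W1, b1, w2, b2):
    """Predicted moisture for a batch of RGB vectors.

    x:  (n, 3) inputs; W1: (3, 7) input-to-hidden weights; b1: (7,) hidden biases;
    w2: (7,) hidden-to-output weights; b2: scalar output bias.
    """
    hidden = sigmoid(x @ W1 + b1)  # fully connected layer with sigmoid activation
    return hidden @ w2 + b2        # single output neuron with linear activation


n_hidden = hidden_neurons(3)  # R, G, B inputs -> 7 hidden neurons
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, n_hidden))
b1 = rng.normal(size=n_hidden)
w2 = rng.normal(size=n_hidden)
x = rng.uniform(0.0, 1.0, size=(4, 3))  # four RGB vectors scaled to [0, 1] (assumed preprocessing)
print(mlp_forward(x, W1, b1, w2, 0.0).shape)  # (4,)
```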
The network is trained with the Levenberg-Marquardt algorithm [16], an optimization method related to the Newton method that interpolates between the Gauss-Newton method and gradient descent to find the minimum of the error function. This algorithm is fast but requires more memory because it computes the Jacobian matrix, which is not a problem here since the network is small. The data vectors collected from the experiment are divided randomly into three subsets.
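For reference, the standard Levenberg-Marquardt weight update can be written as follows, where (in our notation, not the paper's) $\mathbf{w}$ collects all weights and biases, $\mathbf{e}$ is the vector of prediction errors on the training subset, $\mathbf{J}$ is the Jacobian of $\mathbf{e}$ with respect to $\mathbf{w}$, and $\mu > 0$ is a damping factor:

$$
\Delta\mathbf{w} = -\left(\mathbf{J}^{\top}\mathbf{J} + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}^{\top}\mathbf{e}
$$

When $\mu$ is small the step approaches the Gauss-Newton step $-(\mathbf{J}^{\top}\mathbf{J})^{-1}\mathbf{J}^{\top}\mathbf{e}$; when $\mu$ is large it approaches a short gradient-descent step $-\mathbf{J}^{\top}\mathbf{e}/\mu$. In common implementations, $\mu$ is decreased after a step that reduces the error and increased otherwise. Storing $\mathbf{J}$, of size (number of training samples) × (number of weights), is the memory cost mentioned above; for the 3-7-1 network it has 3×7 + 7 + 7 + 1 = 36 columns.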
Of the three subsets, the training subset (T) consists of 70% of the data vectors. It is used to compute the gradient and update the network weights and biases. The validation subset (V) consists of 15% of the data vectors. It is used to stop the training when the validation error increases for a specific number of epochs after having decreased during training. The test subset (Tt), consisting of the remaining 15% of the data vectors, is not used during training; it is used to assess the performance of the network and to compare different networks. The supervised learning of the network consists of the following steps (Fig.2):
1 Initialize the network weights with random values.
2 Present the samples (x; y) of the training subset. The input data vector x is presented to the network through the input layer, and the value of the output neuron is the predicted moisture.
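The validation-based stopping rule described above can be sketched as follows. The patience value `max_fail` is a hypothetical parameter, since the paper does not state the number of epochs; following the text, only an increase in validation error counts as a failure, and the weights from the epoch with the lowest validation error are the ones kept.

```python
def early_stopping_epoch(val_errors, max_fail=6):
    """Return the index of the epoch whose weights are kept.

    val_errors: validation error recorded after each training epoch.
    Training stops once the validation error has increased max_fail times
    since its last improvement.
    """
    best_epoch, best_err, fails = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:    # improvement: remember it and reset the counter
            best_epoch, best_err, fails = epoch, err, 0
        elif err > best_err:  # increase: count a validation failure
            fails += 1
            if fails >= max_fail:
                break
    return best_epoch


print(early_stopping_epoch([5.0, 4.0, 3.0, 3.5, 3.6, 3.7], max_fail=3))  # 2
```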

Support vector regression method
Support Vector Machines (SVM) are a popular machine learning tool for classification and regression [17]. Training an SVM amounts to solving a convex quadratic optimization problem, which guarantees a global minimum; this gives SVMs an advantage over learning algorithms that can get stuck in local minima. In this study, we use ε-insensitive support vector regression (ε-SVR) to predict soil moisture from the RGB predictors. The goal is to find a function that deviates by at most ε from the measured target moisture for all the training data while being as flat as possible (Fig.3).
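To make the ε-insensitive criterion concrete, the sketch below computes the ε-insensitive loss and, for a linear model $f(\mathbf{x}) = \mathbf{w}\cdot\mathbf{x} + b$, the usual soft-margin primal objective $\tfrac12\lVert\mathbf{w}\rVert^2 + C\sum_i |y_i - f(\mathbf{x}_i)|_\varepsilon$, in which the norm term expresses flatness and $C$ trades it off against deviations larger than ε. The function names and the values of ε and C are illustrative; the paper uses the toolbox defaults and does not report them.

```python
import numpy as np


def eps_insensitive_loss(y_true, y_pred, eps):
    """max(0, |y - f(x)| - eps): deviations inside the eps-tube cost nothing."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - eps)


def linear_svr_objective(w, b, X, y, C, eps):
    """Soft-margin primal objective of linear eps-SVR: flatness term + C * total loss."""
    w = np.asarray(w, dtype=float)
    pred = np.asarray(X, dtype=float) @ w + b
    return 0.5 * float(w @ w) + C * float(eps_insensitive_loss(y, pred, eps).sum())


# Residuals of 0.01, 0.05 and 0.10 with eps = 0.02: only the last two are penalized
print(eps_insensitive_loss([0.20, 0.20, 0.20], [0.21, 0.25, 0.10], eps=0.02))
```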
Nonlinear regression is achieved by using a Gaussian kernel function that implicitly maps the data into a higher-dimensional space.
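In kernel form, the ε-SVR prediction for a new RGB vector $\mathbf{x}$ can be written as below, with the Gaussian kernel parameterized by a width parameter $\gamma > 0$ (toolboxes often use an equivalent kernel-scale parameter instead; this notation is our assumption):

$$
K(\mathbf{x}_i,\mathbf{x}) = \exp\!\left(-\gamma\,\lVert\mathbf{x}_i-\mathbf{x}\rVert^{2}\right),
\qquad
f(\mathbf{x}) = \sum_{i=1}^{n}\left(\alpha_i-\alpha_i^{*}\right)K(\mathbf{x}_i,\mathbf{x}) + b,
\qquad 0 \le \alpha_i,\ \alpha_i^{*} \le C
$$

Only training points with $\alpha_i - \alpha_i^{*} \neq 0$, the support vectors, contribute to the prediction; these are the points lying on or outside the ε-tube, and their share of the training data is what the percentage of support vectors reported for the SVR results measures.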
The data collected from the experiment are split into two subsets: a training subset consisting of 85% of the data, used to construct the model, and a test subset consisting of the remaining 15%, used to assess the model. The observations in this test subset are the same as those in the test subset used for the MLP.
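A minimal sketch of how one random test subset can be shared by both models is given below, assuming the 1296 observations of one soil type. The seed, the rounding of subset sizes and the function name `shared_splits` are hypothetical choices, not taken from the study.

```python
import numpy as np


def shared_splits(n, seed=0, test_frac=0.15, val_frac=0.15):
    """Random index splits with one test subset shared by MLP and SVR.

    MLP: 70% training, 15% validation, 15% test.
    SVR: 85% training (MLP training + validation), same 15% test.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(test_frac * n))
    n_val = int(round(val_frac * n))
    return {
        "test": idx[:n_test],
        "mlp_val": idx[n_test:n_test + n_val],
        "mlp_train": idx[n_test + n_val:],
        "svr_train": idx[n_test:],  # everything except the shared test subset
    }


splits = shared_splits(1296)
print({name: len(ix) for name, ix in splits.items()})
# {'test': 194, 'mlp_val': 194, 'mlp_train': 908, 'svr_train': 1102}
```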

Results
In the following, the results of the MLP and SVR models are presented. The two models are compared with each other and with methods proposed in similar works. The Pearson correlation coefficient R between the predicted moisture and the target moisture is used to evaluate the performance of a model; values of R close to one indicate a good prediction model. The mean squared error (MSE) between the predicted and measured moisture is also used to evaluate model performance and to compare different models; lower MSE values indicate a better prediction model.
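Both criteria can be computed as in the short NumPy sketch below (function names are ours). It also illustrates why the two are complementary: R measures only linear association, so predictions that are consistently shifted or scaled can still reach R = 1 while having a large MSE.

```python
import numpy as np


def pearson_r(target, predicted):
    """Pearson correlation coefficient between target and predicted moisture."""
    t = np.asarray(target, dtype=float)
    p = np.asarray(predicted, dtype=float)
    tc, pc = t - t.mean(), p - p.mean()
    return float(tc @ pc / np.sqrt((tc @ tc) * (pc @ pc)))


def mse(target, predicted):
    """Mean squared error between target and predicted moisture."""
    t = np.asarray(target, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.mean((t - p) ** 2))


target = [0.10, 0.15, 0.20, 0.25]
shifted = [0.20, 0.25, 0.30, 0.35]  # perfectly correlated but biased by +0.10
print(round(pearson_r(target, shifted), 6), round(mse(target, shifted), 6))
# 1.0 0.01
```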

Multilayer perceptron results
The Matlab Deep Learning toolbox is used to construct and train the MLP model [18]. In order to overcome the problem of local minima, the MLP is trained 50 times, and the replicate that yields the smallest error Et (Eq.(3)) is chosen. Table 4 shows R and MSE values of the test subset, as well as the number of training epochs when the MLP is trained for the six soil types individually as well as for all soil types. The best results are obtained for type-4 (R=0.972, MSE=4.0729e-4). Less promising results are obtained for type-6 and when training is done with data from all types. Fig.4 and Fig.5 show the regression plots of the target moisture and the predicted moisture on the test subset when the MLP is trained with data collected from type-4 alone and from all types respectively.
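The restart strategy can be sketched generically as below. `train_once` is a hypothetical callable that trains one network from a fresh random initialization and returns the model together with its error; here that error plays the role of Et in Eq. (3). The toy training function only stands in for a real MLP training run.

```python
import numpy as np


def best_of_n(train_once, n_replicates=50, seed=0):
    """Train n_replicates models from random initializations; keep the smallest-error one."""
    rng = np.random.default_rng(seed)
    best_model, best_err = None, float("inf")
    for _ in range(n_replicates):
        model, err = train_once(rng)
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err


# Toy stand-in for training: the "model" is a random number, its error the squared distance to 0.3
def toy_train(rng):
    w = rng.uniform()
    return w, (w - 0.3) ** 2


model, err = best_of_n(toy_train, n_replicates=50, seed=1)
print(model, err)
```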

Support vector regression results
A support vector regression model is built using the Matlab Statistics and Machine Learning Toolbox [19] to predict soil moisture from the soil digital images. The model parameters are left at the default values set by the toolbox. Table 5 shows the values of R and MSE, as well as the percentage of support vectors, when an SVR model is trained for each of the six soil types individually and for all types combined.
Similarly, as in the MLP case, the highest value of R (0.989) and lowest value of MSE (1.7818e-04) are obtained for type-4. Less promising results are obtained for type-6 and when training is done with data from all types. Fig.6 and Fig.7 show the regression plots of the target moisture and the predicted moisture on the test subset when the SVR model is constructed with data collected from type-4 alone and from all types respectively.

Results interpretation
According to Tables 4 and 5 and Fig. 4, 5, 6 and 7, the results obtained with the SVR method are better than those obtained with the MLP method. Moreover, SVR training is deterministic and reaches a global minimum, whereas MLPs may get stuck in local minima and their outputs depend on the initial connection weights. Comparing the results obtained for the different soil types (Tables 4 and 5), we notice that type-4 (Eutric Leptosols) and type-2 (Calcic Cambisols) give the highest R values and the lowest MSE values. This is probably due to their colour, which is lighter than that of the other types (Table 1), an effect also reported in [11].
When comparing our work to similar studies, we find that the proposed methods perform better than those proposed in [13] and [12] in terms of correlation coefficient. According to [13], the correlation coefficients ranged between 0.703 and 0.909. However, it should be noted that they used a camera resolution of 7.1 Mpx, which is lower than the one used in this study, and that their experiments were conducted on tropical soils with different compositions. Santos et al. built a separate linear regression model for each soil type, and the resulting correlation coefficients varied between 0.8538 and 0.9506 [12]. Persson obtained better results, with correlation coefficients ranging from 0.965 to 0.995 [11]; however, the investigated soils contained higher percentages of sand (above 40%), which increases their reflectance, unlike the soils of the present study (Table 2). Better prediction results are clearly obtained when the regression models are trained with data from individual soil types. Since soil colour may also be affected by the physical, biological and chemical properties of the soil [20][21], it would be advisable to include some soil properties in addition to colour data when training the prediction model, especially when soils of different types are involved.

Conclusion
In this paper, we implemented a multilayer perceptron and a support vector regression model to predict vineyard soil moisture from digital images. The experiments were conducted on six soil types collected from Chateau Kefraya terroirs in Lebanon. The training data consisted of RGB pixel values extracted from the soil images and the associated soil moisture measured with the thermo-gravimetric method. The SVR method predicted soil moisture better than the MLP and better than the regression methods reported in earlier studies. As future work, the models could be tested in real time by taking digital photos on site and comparing the predicted moisture with the measured moisture.
The SVR-based model succeeded in predicting soil moisture from digital images, especially when trained on individual soil types. It constitutes a simple smart tool for soil moisture prediction in vineyards that simplifies and automates the characterization of viticultural terroirs.