Investigation of the influence of geographical factors on soil suitability using a nonparametric controlled method of training and data analysis

This paper analyses a dataset using a selected data analysis tool. The study found that a decision tree was a suitable tool for analysing this dataset. Special attention was given to geographical factors, including an assessment of the presence of water bodies in the county. The analysis showed that these factors have a significant impact on soil workability. Although the model based on these factors is not fully accurate (about 14% error), it remains acceptable and is cheaper to implement. One of the main advantages of using geographical factors to predict soil workability is their availability: data on the presence of water bodies and other geographical indicators are easy to find and use in the analysis. The analysis thus confirms that a decision tree combined with geographical factors is effective for analysing datasets related to soil workability. Despite some inaccuracy, the relative simplicity and accessibility of the model make it an attractive tool for forecasting and decision-making in this area.


Introduction
Making the right decision is very difficult when there are many alternatives. Environmental engineering research deals with many different situations that involve complex problems. Balancing all components and making the right decision in a multi-level situation is therefore hard. Algorithms for multi-criteria decision analysis (MCDA) can formalise the process and break it down into its individual aspects. The selection of suitable materials is important, as are the selection and optimisation of the process. MCDA methods have also been evaluated in analytical chemistry as a way to cope with the growing complexity of decision-making processes.
Agricultural soils are the nexus of a wide range of pressures, including increasing global food demand associated with population growth, changing diets, land degradation and the associated decline in productivity, potentially exacerbated by climate change. To manage agricultural soil use effectively, decision-makers need scientifically sound, easy-to-use and cost-effective tools for assessing soil quality and function [1,2]. However, because practical assessment of soil quality requires an integrated consideration of key soil properties and their changes in space and time, this remains a challenging task [3].
Soil quality is related to the ability of soil to function within the boundaries of an ecosystem, to support biological productivity, to maintain environmental quality and to promote plant, animal and human health [4,5]. The physical quality of agricultural soil refers primarily to soil strength and the fluid transmission and storage characteristics of the crop root zone [6].
Recent studies have proposed several conceptual frameworks for soil quality monitoring. Soil characteristics are selected from a minimum data set for their suitability to assess a specific soil function [8], a specific soil ecosystem service [9], or a key soil threat [10]. However, as many soil analyses are required, monitoring all soil quality indicators at different scales and land uses remains costly, time-consuming and labour-intensive using standard procedures [11]. In this regard, the determination of relevant soil physical indicators seems useful in determining the status of soil quality, as they express the soil's ability to store and provide the water, air and nutrients required by the crop [12].
The authors of [13] studied oil field development and reviewed the main methods and technologies currently in use. They concluded that the choice of a development system, and the placement and operating mode of wells, depend significantly on the geological structure of the reservoir, its volume and the properties of the oil.
Other authors [14] modelled and comprehensively analysed the topology parameters of ventilation networks for ensuring fire safety in the development of coal and gas fields. They concluded that gas and explosion safety and the air supply of coal mines are significantly affected by diagonal connections, which have their own peculiarities and affect the distribution of air flows and methane emissions in mine workings in different ways, thereby influencing fire safety in general.

Materials and methods
The soil classification dataset used in this study contains records derived from U.S.-wide drought monitoring, created manually by experts using a wide range of data. Training and testing of the exploratory analysis model were conducted in Deductor Studio.
The main objective of the study was to identify factors affecting soil suitability for cultivation under drought conditions. Classification of data objects, a data mining and value management technique used to group similar data together, was used to conduct the study and to identify new factors influencing soil suitability for cultivation under drought conditions.
This study uses the Deductor analytics platform, a framework for creating comprehensive application solutions. Three algorithms were tested for each dataset: Kohonen maps, a decision tree and neural networks. An explanatory analysis is presented for the best models, those with the lower error rates, to explain the results obtained. It should be noted that a training version of the Deductor platform was used, which limits data customisation and thus leads to a higher error rate and incorrect data display.
A decision tree is a tree structure similar to a flowchart [15]. The algorithm automatically selects features for the nodes from the available feature set and constructs decision rules in a form that an expert can understand. A self-organising Kohonen map is a method of projecting a multidimensional space onto a two-dimensional space and consists of two layers: input and output. Kohonen maps cluster objects, with all clusters formed simultaneously [16]. The maps reflect the proximity of multidimensional feature vectors, so objects whose feature vectors are close to each other are placed in neighbouring cells or in the same cell.
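The automatic selection of a feature for a node can be illustrated with a minimal sketch. The source does not state which splitting criterion Deductor applies, so Gini impurity is used here purely as an assumption; the feature names `water_body_score` and `mean_elevation_m` and all values are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns None when no feature has two distinct values to split on.
    """
    best = None
    n = len(rows)
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best


# Hypothetical toy records (assumption): (water_body_score, mean_elevation_m).
FEATURE_NAMES = ["water_body_score", "mean_elevation_m"]
rows = [(1, 300), (2, 120), (3, 310), (7, 150), (8, 290), (9, 130)]
labels = [1, 1, 1, 5, 5, 5]  # hypothetical SQ7-style suitability classes

feature, threshold, score = best_split(rows, labels)
# The chosen split reads as a rule that an expert can inspect directly.
print(f"IF {FEATURE_NAMES[feature]} <= {threshold:g} THEN left branch ELSE right branch")
print(f"weighted Gini impurity after the split: {score:.3f}")
```

Applying `best_split` recursively to the left and right subsets would grow the full tree; only the single-node step is shown, since it is the step that produces a readable rule.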

Results and discussion
The main objective of the study was to investigate how different machine learning models perform in predicting the suitability of soil for cultivation under drought conditions. Of the three tools considered, namely Kohonen's self-organising maps, a decision tree and neural networks, the decision tree was chosen in order to obtain a cheaper and more accessible model for expert systems that uses few factors. SQ7 is a measure of the suitability of soil for growing plants (workability) and is the output factor of the model. It takes the values {0;1;2;3;4;5;6;7}, where 0 is no suitability, 1 is almost no suitability, 2 is low suitability, 3 is below-average suitability, 4 is above-average suitability, 5 is high suitability, 6 is almost full suitability and 7 is the highest suitability.
The panel consists of land assessment data for the county. Such data can also be obtained, for example, through the PLUS model [17,18], which supports high-fidelity studies of the evolution of land use areas. The PLUS model is a new and improved CA (cellular automata) model based on the FLUS model. It combines a new strategy for analysing land use expansion with a CA model based on multi-type random patch seeds. Such data can also be obtained by directly contacting organisations in a particular county that collect them for statistics on crops grown and similar purposes. Each factor in the group is detailed in Table 1. As discussed above, for practicality a model was built on the individual factors that form a separate group. The error of this model was 14.15%, and its most significant factor is the assessment of water bodies in the district. The significance of the factors for the output parameter (soil suitability score) is presented in Table 2.
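The source does not define how the model error was calculated. A common reading, assumed here, is the share of test records whose predicted SQ7 class differs from the actual class. The sketch below shows that calculation on hypothetical labels; it does not reproduce the reported 14.15%.

```python
def error_rate(actual, predicted):
    """Share of records whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical SQ7 classes for eight test records (assumption, not study data).
actual = [0, 1, 2, 3, 4, 5, 6, 7]
predicted = [0, 1, 2, 3, 4, 5, 7, 7]

rate = error_rate(actual, predicted)
print(f"Classification error: {rate:.2%}")
```

Because SQ7 is an ordinal scale, a measure that penalises a prediction of 7 for an actual 6 less than a prediction of 0 could also be considered. The plain misclassification share is used here only because it is the simplest reading of "error".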

Conclusion
The model has a fairly large error of 14.15%. Its most significant parameter is the assessment of water bodies in the district. This parameter can be used only for predicting the suitability of land under drought conditions, and with current technology its data are quite easy to collect. The model is perhaps the most practical one for real-world use: although its error is somewhat high, it can predict the suitability of soil for cultivation even from a single parameter (the assessment of water bodies in the district). This parameter is publicly available, can be collected even from maps and does not require large resource costs. A diagram illustrating the quality of the constructed model is shown in Figure 1.
A group of factors related to geographical indicators, including an estimate of the availability of water bodies in the district, may be preferable in analysing soil performance for the following reasons:
1. Geographical factors can provide a broad context and a general characterisation of the area where the soil being analysed is located. Evaluating the presence of water bodies can be useful because water is one of the primary factors affecting soil conditions. The presence or absence of water in the vicinity can have a significant impact on soil performance, as it can provide optimum moisture levels for different soil types and cultivated plants [19].
2. Weather factors such as temperature, precipitation and wind can vary significantly between climatic zones and over time. This can lead to heterogeneity and variability in weather data, making them difficult to use in analysing soil performance. Geographical indicators such as water body assessments, on the other hand, tend to change more slowly and provide more stable data that can be useful in analyses [20].
3. Mining and laboratory soil indicators provide detailed information on soil composition and physical properties. However, these indicators can be localised and may not always represent general soil characteristics over large geographic areas. In contrast, geographical factors, such as the presence of water bodies, can provide a broader context and account for the general characteristics of soil conditions in a region [21-26].
In conclusion, geographic indicators, including an assessment of the availability of water bodies in the county, are important factors that can have a significant impact on soil performance. Their use in the decision tree allows contextual and general characteristics of soil conditions to be taken into account at a broad geographical scale, thus facilitating more accurate and comprehensive analyses of these datasets.

Fig. 1. Diagram reflecting the quality of the model

Table 1. Factors used to build the model

Table 2. Significance of the factors for the output parameter