Deep reinforcement learning in agricultural IoT-based: A review

The world's food needs drive innovation in agriculture, and one such innovation is the implementation of deep reinforcement learning (DRL), a technology closely associated with the Industrial Revolution 4.0. This research discusses important issues and developments in DRL as implemented in IoT-based agriculture. The research method follows a Systematic Literature Review (SLR) approach: searching and analysing raw data sources, sorting and selecting the data relevant to the topics discussed, discussing the topic areas and their current trends, and drawing conclusions. The purpose of this study is to assess the current state of DRL implementation in IoT-based agriculture. The limitations of the study are that (1) the data sources come from Scopus-indexed journals; (2) the journal period is 2021-2023; (3) the research approach uses SLR; and (4) the focus of the discussion includes the implementation of DRL in agricultural IoT-based systems, the development of DRL technology, and the use of tools in DRL.


Introduction
Technology in agriculture is continuously being developed to meet food needs at regional, national, and global scales through food technology, genetic engineering, and applied technology such as the Internet of Things (IoT), which began trending in the early 2010s, especially with the development of 4G and 5G communication technology [1]. IoT in agriculture makes it easier to control and monitor land, plant, and environmental conditions in real time, and it enables precision farming. In the era of the Industrial Revolution 4.0, IoT in agriculture is growing through its combination with data science, machine learning, big data, and robotics [2]. Within machine learning in particular, deep reinforcement learning (DRL) has been developed, combining deep learning with reinforcement learning. In DRL, a deep neural network (DNN) serves as the basis for data processing, and an agent relies on this DNN to make decisions, which are called actions [3]. DRL is widely implemented in the IoT field, where devices perceive the environment through sensors and respond to it through control actions.

This study reviews the development of DRL implementation in IoT-based agriculture through a systematic literature review (SLR), examining DRL agricultural IoT-based systems in terms of their implementation, the techniques or tools used, and their development. The steps taken are searching and analysing raw data sources, sorting and selecting the data relevant to the topics discussed, discussing the topic areas and their current trends, and drawing conclusions. The purpose of this study is to assess the current state of DRL implementation in IoT-based agriculture. The limitations of the study are that (1) the data sources come from Scopus-indexed journals; (2) the journal period is 2021-2023; (3) the research approach uses SLR; and (4) the focus of the discussion includes the implementation of DRL in agricultural IoT-based systems, the development of DRL technology, and the use of tools in DRL.
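The sense-decide-act cycle described above can be illustrated with a minimal agent-environment loop. This is a sketch under stated assumptions, not a model from any reviewed study: the environment `ToyIrrigationEnv`, its single soil-moisture state, its moisture constants, and the reward are all hypothetical, and the `policy` function is a placeholder threshold rule standing in for the DNN that a DRL agent would query to choose an action.

```python
from dataclasses import dataclass


@dataclass
class ToyIrrigationEnv:
    """Hypothetical IoT environment whose state is one soil-moisture sensor reading."""
    moisture: float = 0.30     # assumed starting volumetric soil moisture
    target: float = 0.35       # assumed optimal moisture level
    evaporation: float = 0.02  # assumed moisture lost each step
    irrigation: float = 0.05   # assumed moisture added when the agent irrigates

    def step(self, action: int) -> tuple[float, float]:
        """Apply an action (0 = wait, 1 = irrigate); return (next_state, reward)."""
        self.moisture -= self.evaporation
        if action == 1:
            self.moisture += self.irrigation
        self.moisture = min(max(self.moisture, 0.0), 1.0)  # keep the reading in [0, 1]
        reward = -abs(self.moisture - self.target)          # closer to target is better
        return self.moisture, reward


def policy(state: float) -> int:
    """Placeholder threshold rule; a DRL agent would pick argmax_a Q(state, a) from its DNN."""
    return 1 if state < 0.35 else 0


def run_episode(env: ToyIrrigationEnv, steps: int = 10) -> list[tuple[int, float, float]]:
    """Sense -> decide -> act loop, recording (action, next_state, reward) per step."""
    state = env.moisture
    history = []
    for _ in range(steps):
        action = policy(state)            # decision made from the sensed state
        state, reward = env.step(action)  # environment responds to the action
        history.append((action, state, reward))
    return history


for action, state, reward in run_episode(ToyIrrigationEnv(), steps=5):
    print(action, round(state, 2), round(reward, 2))
```

Replacing `policy` with a trained Q-network is what turns this loop into DRL; the loop structure itself stays the same.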

Methods
The study was conducted using a systematic literature review approach, reviewing research articles published in Scopus-indexed international journals. Figure 1 shows a flowchart of the study.

Data collection
Article collection is the initial stage of the research method. At this stage, relevant articles on DRL in IoT-based agriculture are searched for. The search was carried out through random searches on the scopus.com page with the keywords "deep reinforcement learning", "agriculture DRL", "DRL agriculture IoT" and "DRL agriculture IoT-based". The article collection stage produced 51 recommended articles published between 2021 and 2023.
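The search described above was ad hoc rather than a fixed query. For reproducibility, the same four keywords could be combined into a single Scopus advanced-search string. The sketch below is a hypothetical reconstruction, not the authors' actual query; the helper name `build_scopus_query` is illustrative, and it relies only on the Scopus `TITLE-ABS-KEY` field code and the `PUBYEAR` comparison operators.

```python
def build_scopus_query(keywords: list[str], first_year: int, last_year: int) -> str:
    """Combine keywords into one Scopus advanced-search string (hypothetical reconstruction)."""
    # Double-quoted terms are matched as loose phrases in titles, abstracts and keywords.
    terms = " OR ".join(f'"{k}"' for k in keywords)
    # PUBYEAR comparisons are strict, so widen the bounds by one year on each side.
    return (f"TITLE-ABS-KEY({terms}) "
            f"AND PUBYEAR > {first_year - 1} AND PUBYEAR < {last_year + 1}")


keywords = ["deep reinforcement learning", "agriculture DRL",
            "DRL agriculture IoT", "DRL agriculture IoT-based"]
query = build_scopus_query(keywords, 2021, 2023)
print(query)
```

Because "deep reinforcement learning" on its own also matches non-agricultural work, a query like this would still need the manual selection step that follows.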

Article selection
The next stage after article collection is the selection of the articles that serve as the rationale for the research. Table 1 shows the 6 research topics related to DRL in IoT-based agriculture covered by the 51 research articles.

Article review and study
Articles were reviewed and studied using the Systematic Literature Review (SLR) method. SLR is defined as the process of identifying, evaluating, and interpreting all available research evidence in order to answer specific research questions. The SLR data analysis method serves to summarize and deepen the findings of various relevant studies. The aggregated findings are useful to decision makers as a basis for policy. The findings are grouped into 3 main discussions: (1) the implementation of DRL in agricultural IoT-based systems, (2) the development of DRL technology, and (3) the use of techniques and tools in DRL.

Documentation
Documentation is the final stage of this research, although it is carried out throughout the research process, from data collection and data analysis to the synthesis or interpretation of the findings. The documentation process is important because it provides a basis for future research and serves as a research record that demonstrates the credibility of the data and information used.

The implementation of DRL in agricultural IoT-based systems
DRL was implemented on a grape plantation in Hunan, with a set of parameters forming the agent's state: soil moisture, leaf water potential, leaf stomatal conductance, solar radiation, soil humidity, relative humidity, air temperature and wind speed. Training was carried out over a total of 1210 hours, comprising 72600 training steps. The results show that the agent can perform actions in a complete, distributed manner and is able to explore the conditions of the vineyard environment.
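The state described for the grape plantation agent can be sketched as a fixed-order feature vector, and the reported training figures imply a step rate. The field names below and the example readings are hypothetical (units are not given in the source), and soil moisture and soil humidity are kept as separate fields because the study lists them separately. The step rate assumes training steps were spread evenly over the training hours.

```python
from dataclasses import dataclass, astuple


@dataclass
class VineyardState:
    """The eight sensed quantities listed for the grape-plantation agent (units assumed)."""
    soil_moisture: float
    leaf_water_potential: float
    leaf_stomatal_conductance: float
    solar_radiation: float
    soil_humidity: float
    relative_humidity: float
    air_temperature: float
    wind_speed: float

    def as_vector(self) -> list[float]:
        """Flatten into the fixed-order input vector a DNN-based agent would consume."""
        return list(astuple(self))


# Reported training budget: 72600 steps over 1210 hours.
TRAINING_HOURS = 1210
TRAINING_STEPS = 72600
steps_per_hour = TRAINING_STEPS / TRAINING_HOURS  # 60 steps per hour
minutes_per_step = 60 / steps_per_hour            # one step per minute, if evenly spaced

# Hypothetical sensor readings, for illustration only.
s = VineyardState(0.3, -0.8, 0.2, 650.0, 0.3, 55.0, 24.0, 1.5)
print(len(s.as_vector()), steps_per_hour, minutes_per_step)
```

Under the even-spacing assumption, the agent would receive a new state and choose an action once per simulated minute.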

The development of DRL technology
Currently, DRL has been widely implemented in unmanned aerial vehicles (UAVs), as in the work of Castro et al. [4], which combines rapidly exploring random trees (RRT) and DRL to control the movement of UAVs among olive trees. The authors set several parameters as constraints of the proposed model: (1) traps are randomly distributed, with five traps on each tree; (2) the minimum distance between traps is 3 meters; and (3) the UAV senses its surroundings through a planar LiDAR with a 10-meter range and a 360-degree scanning angle. The block diagram of the agent algorithm is shown in Figure 3. The model combines RRT and a deep Q-network (DQN). RRT is a sampling-based path-planning algorithm that grows a search tree by repeatedly sampling random points and extending the nearest tree node toward them; here, the objects in the search space are the olive trees. In the DQN, a Q-network maps the state to a Q-value for each possible action. Hyperparameter tuning was carried out during DQN training with four models: the first with 1 hidden layer of 30 neurons and 1000 steps; the second with 1 hidden layer of 100 neurons and 500 steps; the third with 3 hidden layers of 100, 60 and 30 neurons and 500 steps; and the fourth with 3 hidden layers of 81, 212 and 29 neurons and 500 steps. The experimental results show that the UAV with the combined RRT+DQN model achieves a run time of 8.2 ms in passing 10 obstacles in a 300-meter area.
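To make the RRT component concrete, the following is a minimal sketch of the generic 2-D RRT algorithm, not Castro et al.'s implementation. It assumes circular obstacles as a hypothetical stand-in for olive-tree canopies, a fixed steering step, and a seeded random generator. The DQN part of the combined model is not shown. For brevity, only new nodes are collision-checked; a practical planner would also check each edge.

```python
import math
import random


def rrt(start, goal, obstacles, bounds, step=1.0, goal_tol=1.0, max_iter=5000, seed=0):
    """Minimal 2-D RRT: grow a tree from start by steering toward random samples.

    obstacles: list of (cx, cy, r) circles (hypothetical stand-ins for tree canopies).
    bounds: (xmin, xmax, ymin, ymax) of the sampling area.
    Returns the path as a list of points, or None if no path is found.
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}

    def free(p):
        # A point is free if it lies outside every circular obstacle.
        return all(math.dist(p, (cx, cy)) > r for cx, cy, r in obstacles)

    for _ in range(max_iter):
        sample = (rng.uniform(bounds[0], bounds[1]), rng.uniform(bounds[2], bounds[3]))
        # Find the existing tree node nearest to the random sample.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0:
            continue
        # Steer at most `step` units from the nearest node toward the sample.
        t = min(step, d) / d
        new = (near[0] + t * (sample[0] - near[0]), near[1] + t * (sample[1] - near[1]))
        if not free(new):
            continue  # reject nodes that fall inside an obstacle
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol:
            # Walk back through the parent links to recover the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


path = rrt((0, 0), (20, 0), obstacles=[(10, 0, 3)], bounds=(-5, 25, -10, 10))
print(None if path is None else len(path))
```

The random sampling biases growth toward unexplored regions, which is why RRT suits cluttered spaces such as orchards; a learned policy such as a DQN can then handle local decisions along or around the planned path.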

The use of technique and tools in DRL
Various DRL modelling techniques have been developed, one of which is proposed by Din et al. [6], who use a double deep Q-network (DDQN) to monitor crop farms. Fundamentally, the Q-function is an action-value function defined over states and actions. DQN employs a deep neural network to model the Q-function, which serves as the agent's knowledge for making decisions. Double DQN addresses a known weakness of DQN, namely the overestimation of Q-values when selecting actions. The idea is to use one set of parameters (the online network) to select the best action and another set (the target network) to estimate that action's Q-value. Training was carried out with the following settings: a batch size of 32, a gamma (discount factor) of 0.9, 3500 epochs, 30 steps and an epsilon decay of 6e-4. Experiments compared DDQN with other models, namely uniform coverage (UC) and behaviour-based robotic coverage (BBR), in terms of percentage of area covered, with the results presented in Table 3.
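The difference between the DQN and Double DQN targets can be shown in a few lines of NumPy. This is a minimal sketch of the standard target computations, not Din et al.'s code. The batch of two transitions, the three actions, and all Q-values are hypothetical, and only the discount factor of 0.9 mirrors the reported setting; the networks themselves, the batch size of 32, and the epsilon schedule are not modelled.

```python
import numpy as np


def dqn_targets(rewards, next_q_target, dones, gamma=0.9):
    """Standard DQN target: the target network both selects and evaluates the next action."""
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.9):
    """Double DQN target: the online network selects, the target network evaluates."""
    best_actions = next_q_online.argmax(axis=1)                            # selection
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]  # evaluation
    return rewards + gamma * (1.0 - dones) * evaluated


# Hypothetical batch: 2 transitions, 3 possible actions.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])  # the second transition ends the episode
next_q_online = np.array([[0.2, 0.9, 0.1], [0.5, 0.4, 0.3]])
next_q_target = np.array([[0.6, 0.3, 0.8], [0.7, 0.2, 0.1]])

print(dqn_targets(rewards, next_q_target, dones))
print(double_dqn_targets(rewards, next_q_online, next_q_target, dones))
```

For the first transition, the DQN target is 1 + 0.9 × 0.8 = 1.72, while the Double DQN target is 1 + 0.9 × 0.3 = 1.27. Because Double DQN evaluates one entry of each target-network row rather than the row maximum, its target can never exceed the DQN target computed from the same networks, which is how it counters overestimation.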

Fig. 3. Block diagram of the system agent algorithm.

Table 1. Various research topics related to DRL in agricultural IoT-based systems.

Of the 51 articles, those covering the same discussion, when studied further, yield 3 main topics, which are then discussed in the results and discussion chapter. The 3 main topics are (1) the implementation of DRL in agricultural IoT-based systems, (2) the development of DRL technology, and (3) the use of the techniques and tools in DRL. The selection also produced 3 reference journals, described in detail in Table 2.

Table 2. Description of the three recommended journals.

Table 3. Comparative analysis of the area coverage experiment.

Conclusion
The implementation of DRL in IoT-based agriculture contributes to producing accurate, real-time and efficient data. DRL has undergone many developments in terms of the tools and modelling techniques implemented. From the review, 51 articles have been synthesized into 3 main topics related to DRL in agricultural IoT-based systems, namely (1) the development of DRL technology; (2) the implementation of DRL in agricultural IoT-based systems; and (3) the use of techniques and tools in DRL. DRL is expected to continue developing alongside machine learning and IoT technology in the Industrial Revolution 4.0, contributing positively to other scientific fields.