Covid-19 Forecasting using Supervised Machine Learning Techniques – Survey

COVID-19 is a global pandemic that has spread to over 170 nations. In practically all of the affected countries, the numbers of infections and deaths have been rising rapidly. Forecasting approaches can support the development of more effective strategies and better-informed decisions. These approaches examine historical data in order to predict what will happen in the future, and the resulting forecasts can aid in preparing for potential risks and consequences. Reliable forecasting techniques are crucial for obtaining accurate results. This study classifies forecasting strategies into those based on big data analytics over data acquired from national databases or the World Health Organization, and those based on machine learning or data science techniques. It shows that the number of COVID-19 cases, a potential risk to mankind, can be predicted.


INTRODUCTION
Machine learning (ML) has become a popular research subject over the past decade, addressing a variety of complex and sophisticated problems. ML algorithms often learn through trial and error, in contrast to traditional algorithms, which follow computer instructions based on decision statements such as if-else. Forecasting is one of the most important applications of machine learning [1]. In this field, a variety of standard machine learning methods have been applied to guide future activities, as shown below. The researchers' primary goal was to produce studies that could be beneficial for future decision-making models. Historical data is evaluated to gain perspective during the decision-making process. However, the data available over such a short period of time is insufficient to build Artificial Intelligence (AI) models [12]. Because data is scarce during the early phases of an epidemic's spread, time-series data requires AI models that can still be trained effectively. Time series analysis can help improve forecasting efficiency.
Time series analysis is a large field that has been applied to a wide range of problems, from econometrics to earthquake and weather forecasting. A time series is a collection of measurements taken at regular intervals over time. Depending on the sampling frequency, a time series may be yearly, quarterly, monthly, or weekly [3]. A time series problem differs from a traditional regression problem in two ways. First, it is time-dependent: linear regression analysis assumes that observations are independent, whereas in a time series each observation depends on time. Second, a time series often exhibits seasonality, that is, fluctuations that are specific to a given span of time [4].
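The time dependence and seasonality described above can be made concrete with a small sketch. The following Python function is a minimal illustration and is not taken from any of the surveyed papers. It computes the sample autocorrelation of a series at a given lag. The function name `autocorrelation` and the toy daily counts are assumptions made for illustration only.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Total squared deviation from the mean (the lag-0 autocovariance times n).
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Sum of products of deviations that are `lag` steps apart.
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Made-up daily counts with a repeating 7-day pattern (illustrative only).
daily_counts = [10, 12, 15, 14, 13, 6, 5] * 4

for lag in (1, 3, 7):
    print(f"lag {lag}: {autocorrelation(daily_counts, lag):+.3f}")
```

A value far from zero at some lag indicates that observations are not independent of one another. A clearly positive value at lag 7 in daily data would be consistent with a weekly seasonal pattern.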

COVID-19 OVERVIEW
COVID-19 is a disease caused by a novel coronavirus that triggers inflammation. The disease induces a respiratory illness, with symptoms such as cold, cough, fever and, in more severe cases, difficulty breathing.
Pandemics have posed a threat to the world on many occasions throughout history. Every pandemic has had a massive impact on the entire world and has overturned established roles. COVID-19, the latest destructive outbreak, is currently sweeping the globe. Not only are economies collapsing, but so are countries' overall strength and morale. The global effect of the novel coronavirus (COVID-19) requires detailed forecasting of confirmed cases as well as analysis of death and recovery rates. Forecasting, however, needs a large amount of past data. At the same time, no prediction can be made with certainty because the future rarely repeats itself. This study details the timetable of a live forecasting exercise with significant implications for planning and decision-making, as well as objective projections of confirmed COVID-19 cases [5]. The discovery of the disease and its categorization as a pandemic by the World Health Organization are important milestones [6]. Recommended preventive measures include the following:
• Keep up to date on the COVID-19 outbreak by checking updates from WHO or your local and national public health authority.
• Perform hand hygiene regularly, either with an alcohol-based hand rub or with soap and water.
• Keep your hands away from your eyes, nose, and mouth.
• Practise respiratory hygiene by coughing or sneezing into a bent elbow or a tissue and then discarding the tissue.
• If you have breathing difficulties, put on a surgical mask and wash your hands carefully after removing it.
• People who are experiencing respiratory problems should maintain a safe distance (about 2 m) from others.
• If you have a fever, a cough, or trouble breathing, visit a doctor.

RELATED WORK
In the academic literature, machine learning (ML) methods have been proposed as alternatives to statistical approaches for time-series forecasting. However, there is little evidence about their relative performance in terms of accuracy and computational requirements. Using a subset of 1045 monthly time series from the M3-Competition, the study's purpose is to evaluate such performance across multiple forecasting horizons. Comparing the post-sample accuracy of 8 prominent ML algorithms with that of 8 classic statistical methods, the study found that the ML methods were consistently outperformed by the statistical methods across all accuracy measures and forecasting horizons. Furthermore, the ML methods had far higher computational requirements than the statistical approaches. The study describes the findings, explains why ML models are less accurate than statistical models, and suggests some possible next steps. Its empirical findings underscore the need for objective and unbiased ways of assessing the performance of forecasting methods, which can be achieved through large-scale, open forecasting competitions that allow meaningful comparisons and conclusions.
Artificial Intelligence (AI) has gained popularity in recent years thanks to high-profile applications in intelligent robotics, voice recognition, image recognition, and legal, medical and social domains, and even in defeating champions in games such as chess and cards. The success of AI depends on its use of techniques that learn by experimentation and improve over time, rather than on the conventional programming approach of coding instructions based on logic, if-then rules, and decision trees [1].
The study's purpose is to improve the interoperability of machine learning (ML) algorithms with Internet of Things (IoT) technology when engaging with the public and its surroundings, in order to reduce the spread of COVID-19. Furthermore, the research examines different solution frameworks that use machine learning techniques to generate, capture, store, and analyse data. These algorithms can detect, prevent, and trace the transmission of COVID-19 in smart cities, as well as provide a better understanding of the virus. The report also highlights case studies on the use of ML in hospitals around the world to aid in the fight against COVID-19. As shown in Fig 3 and Fig 4, the study's framework provides a complete overview of the essential components required for integrating machine learning with other AI-based solutions [9]. The information and communication technology equipment incorporated in smart cities generates several kinds of data. The first is statistical data, which typically includes daily statistics such as the number of recognised cases, positive cases, deaths, and recoveries. The second is epidemiological data, which mostly consists of clinical test results for various medications, drug trials, patients' medical histories, patients' responses to various medications, and so on. The third is real-time surveillance data created by smart city sensors and cameras. Fever is one of the first detectable symptoms of COVID-19, so people's body temperatures and other personal information are examples of data that can help stop the spread of COVID-19 [9].
The multilayer perceptron (MLP) is a fully connected, feed-forward artificial neural network (ANN) made up of layers of neuron-like processing units. MLPs can produce high-quality models while requiring less training time than more sophisticated approaches. Hyperparameters (for example, the learning rate used to train a neural network) are settings that specify the ANN model's architecture and training. Correct hyperparameter settings are critical for producing a high-quality model. In the study, an MLP is trained on a time-series dataset that has been transformed into a regression dataset. The goal of training is to create a global model of the maximum number of patients across all locations in each time unit. The MLP's hyperparameters are tuned using a grid search over a total of 5376 hyperparameter combinations. Using these combinations, 48384 ANNs are trained, and each model is evaluated using the coefficient of determination (R²) (Zlatan Car, 2020). When cross-validation is used, the scores for the confirmed, recovered, and deceased patient models drop to 0.94, 0.781, and 0.986, respectively. The deceased patient model therefore has a high level of robustness, the confirmed patient model a good level, and the recovered patient model a low level [10].

Figure 4: Modeling the spread of COVID-19 using MLP
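The workflow described for the MLP study (turning a time series into a regression dataset, searching a grid of hyperparameters, and scoring each candidate with the coefficient of determination under cross-validation) can be sketched in a few lines of Python. This is a hedged illustration and not the cited authors' implementation. To keep the block self-contained, a simple k-nearest-neighbour regressor is swapped in for the MLP. The function names, the two hyperparameters (`width` and `k`), the contiguous 3-fold split, and the toy series are all assumptions made for this example.

```python
from itertools import product


def make_windows(series, width):
    """Turn a series into (lag-window, next-value) regression pairs."""
    count = len(series) - width
    X = [series[i:i + width] for i in range(count)]
    y = [series[i + width] for i in range(count)]
    return X, y


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(X_train, y_train, x, k):
    """Average the targets of the k training windows closest to x."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    order = sorted(range(len(X_train)), key=lambda i: sq_dist(X_train[i], x))
    return sum(y_train[i] for i in order[:k]) / k


def cv_r2(X, y, k, folds=3):
    """Mean R^2 over contiguous folds; each fold is held out once."""
    n = len(X)
    scores = []
    for f in range(folds):
        lo, hi = f * n // folds, (f + 1) * n // folds
        X_tr, y_tr = X[:lo] + X[hi:], y[:lo] + y[hi:]
        preds = [knn_predict(X_tr, y_tr, x, k) for x in X[lo:hi]]
        scores.append(r2_score(y[lo:hi], preds))
    return sum(scores) / folds


def grid_search(series, grid):
    """Evaluate every combination in `grid`; return (best_score, best_params)."""
    best = None
    for width, k in product(grid["width"], grid["k"]):
        X, y = make_windows(series, width)
        score = cv_r2(X, y, k)
        if best is None or score > best[0]:
            best = (score, {"width": width, "k": k})
    return best


# Made-up, perfectly periodic toy series (illustrative only, not real data).
toy_series = [3, 5, 8, 6, 4, 2, 1] * 4
toy_grid = {"width": [2, 3], "k": [1, 2]}

best_score, best_params = grid_search(toy_series, toy_grid)
print(best_params, round(best_score, 3))
```

Because the toy series repeats exactly, the cross-validated score here is not indicative of performance on real epidemic data. The point of the sketch is only the structure: one model is fitted per hyperparameter combination and fold, which is why grid sizes such as those reported in the study lead to tens of thousands of trained models.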
The major contribution of this paper is a comparison of the accuracy of ML forecasting techniques with that of standard statistical ones, as highlighted in Table 1. Among the ML methods, the MLP achieved the highest accuracy, followed by the Bayesian neural network (BNN) and Gaussian processes (GP). The sMAPE of the remaining methods is in the double digits, indicating a significant variation in accuracy. Investigating the reasons for the differences in performance among the ML approaches, and developing guidelines for picking the most appropriate one for new kinds of forecasting applications, would be of significant research value [11].
Figure 5 shows the statistics broken down by region. The regions are the Western Pacific Region, the European Region, the South-East Asian Region, the Eastern Mediterranean Region, the American Region, and the African Region. China, France, Spain, Italy and the United States are among the most heavily affected countries.
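The sMAPE (symmetric mean absolute percentage error) used in the accuracy comparison above can be computed as in the following Python sketch. It uses the common definition, the mean of 2|F - A| / (|A| + |F|) expressed in percent. The function name `smape` and the handling of the case where both values are zero are assumptions made for this illustration, and the sample numbers are invented.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent: mean of 2|F - A| / (|A| + |F|) * 100."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        # Assumed convention: a term with a == f == 0 contributes zero error.
        total += 0.0 if denom == 0 else 2.0 * abs(f - a) / denom
    return 100.0 * total / len(actual)


# Invented example values, for illustration only.
observed = [120, 150, 170, 210]
predicted = [110, 160, 165, 230]
print(f"sMAPE = {smape(observed, predicted):.2f}%")
```

Under this definition the metric is bounded between 0% and 200%, so a value "in the double digits" corresponds to an average symmetric error of at least 10%.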

ANALYSIS
Forecasting has been carried out in the literature using a variety of forecasting methodologies and data sources. This section analyses existing forecasting models in order to understand them better.

Table 2. Analysis of COVID-19 prediction based on ML techniques or data science
Owing to their precision, ML techniques are now used for forecasting all over the world. However, their use is limited when very little data is available. Selecting the ML model and choosing its optimal parameters are two challenges involved in training a model for forecasting. Researchers made predictions based on publicly accessible datasets and used the best-performing machine learning model for each dataset [13,14,15,16,17]. To determine rates of infection in Italy and China, the study in [18] proposed a model based on the logistic equation, the Weibull equation, and the Hill equation. In this study, data analysis is conducted to determine the impact of environmental factors on the spread of COVID-19. The analysis focused on three environmental factors: relative humidity, maximum environmental temperature, and wind speed. The results showed no correlation between COVID-19 spread and humidity or wind speed. The study in [19] proposed an approach that combined a hybrid model, gradient-boosted trees, and logistic regression, applied to medical data. The results of the above models will aid management planning and the implementation of measures to reduce the spread.
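The three growth equations named above can be written down in their standard textbook forms, as in the following Python sketch. The exact parameterisations used in [18] are not given in this survey, so the forms, function names, and parameter names below (`capacity`, `rate`, `midpoint`, `scale`, `shape`, `half_time`, `exponent`) are assumptions. The parameter values in the example are invented and are not fitted to any dataset.

```python
import math


def logistic_curve(t, capacity, rate, midpoint):
    """Logistic growth: capacity / (1 + exp(-rate * (t - midpoint)))."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def weibull_curve(t, capacity, scale, shape):
    """Weibull-type cumulative curve for t >= 0:
    capacity * (1 - exp(-(t / scale) ** shape))."""
    return capacity * (1.0 - math.exp(-((t / scale) ** shape)))


def hill_curve(t, capacity, half_time, exponent):
    """Hill equation for t >= 0:
    capacity * t**n / (half_time**n + t**n), with n = exponent."""
    return capacity * t ** exponent / (half_time ** exponent + t ** exponent)


# Invented parameters, for illustration only.
for day in (0, 10, 20, 30, 40, 60):
    print(
        day,
        round(logistic_curve(day, 1000.0, 0.2, 30.0)),
        round(weibull_curve(day, 1000.0, 35.0, 2.0)),
        round(hill_curve(day, 1000.0, 30.0, 3.0)),
    )
```

All three curves rise monotonically towards the same plateau (`capacity`) but differ in how quickly they take off and saturate, which is one reason a study might compare several of them against the same cumulative case counts.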

Table 3. Analysis of COVID-19 prediction based on big data
According to the literature, researchers have produced forecasts using data from recognised national and international sources. Various methodologies, such as mathematical equations or machine learning algorithms, are used to analyse large datasets. The research in [20] presented decision-making systems based on COVID-19 data collected from Johns Hopkins University for countries and regions such as China, European countries, Japan, Korea and North America.
The research in [21] used WHO COVID-19 databases, Italian national data and Johns Hopkins data to forecast death rates. The impact of disease management actions and transportation restrictions on the spread rate was described in [22]; that study was based on a dataset obtained from the US Centers for Disease Control and Prevention (CDC). The study in [23] discussed the key role of isolation in reducing COVID-19 transmission rates. Table 3 summarises the results of the literature review.

CONCLUSION
The spread and reproduction number should be predicted using a variety of datasets. For more accurate worldwide forecasting, the models described in the literature should be evaluated internationally. Similarly, models must account for multiple peaks, not only for short-term forecasting but also for forecasting the outbreak later in the year.
We expect that analysing multiple COVID-19 forecasting models will make it possible to adapt intervention measures more effectively and, more importantly, to reduce the pandemic's alarming impact. Many of the publications analysed in this study are preprints, which means they have not been subjected to rigorous peer review. Nevertheless, given COVID-19's rapid global spread, a detailed comparative survey is urgently needed.