Novel Corona Virus Prediction and Transmission Analysis using Machine Learning Models

COVID-19, the disease caused by a novel corona virus, has spread across the world and is a major subject of health care research. A health care system is concerned with the health status of individuals and populations, and it plays a vital role in promoting the physical and mental health and wellbeing of people around the world. An efficient health care system also supports a country's economy, industrialization and development. Corona viruses are dangerous animal and human pathogens, and the novel corona virus is threatening people by spreading all over the world. Clinical studies have shown that corona virus patients mostly suffer from lung infection. We present a detailed analysis of how to predict the expected death, recovered and confirmed cases from the data available across the world using various machine learning models. In particular, we constructed a linear regression model (LRM), a support vector machine model (SVMM) and a polynomial regression model (PRM) and predicted the expected cases over the next 15 days. In modeling the transmission process, the error between the predicted curve and the official data curve is quite small. Compared to the other models, the polynomial regression model gives the best prediction of corona positive cases. Forward prediction and backward inference of the epidemic help in deciding on the necessary actions during COVID-19 propagation.


Introduction
Health care research covers a set of variables that includes health care techniques, practices, programs and policies. It analyzes the factors that affect health systems and interventions. Every individual must take care of their own health in order to achieve the best possible outcomes.
Corona viruses mainly cause disease in animals and human beings [2]. They are responsible for Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS) [1]. The corona virus produces various symptoms, and a person may have all, some or none of them. Transmission of the corona virus from animals to humans is very rare [2]. Symptoms usually take 5-6 days to appear, but this varies from person to person and can take up to 14 days. The corona infection is reported to have a pH ranging from 5.5 to 8.5, and eating alkaline foods with a higher pH has been recommended.
* Corresponding author: karunavenkatg@gmail.com

Background
The virus first originated in a food market in Wuhan, which is the seventh largest city in China. The corona virus can be detected through a swab test, a blood test and other tests [4]. In a swab test, a swab is inserted in the nose or mouth to take a sample for testing [4]. In December 2019 [5], the corona virus spread rapidly and widely in China, and many people were affected [3]. Europe then became the center of the pneumonia epidemic, and on March 11, 2020 the new pneumonia outbreak was declared a global pandemic [7]. Viruses have become a major threat to human health and safety all over the world [28], and as a result many individuals have been infected and harmed [23]. When a person coughs or sneezes, the droplets fall to the ground because they cannot remain in the air for a long time, so people should take the necessary precautions while coughing or sneezing. People should follow all the measures implemented by the government to protect themselves from this pandemic. No one should leave the house except in an emergency, and then only with precautions such as covering the face with a mask, sanitizing the hands and washing the hands frequently.
In this article, we developed a new model of corona virus transmission using linear regression [21], support vector machines (SVMs) [11,12,14] and a polynomial regression model [8]. In simulating the propagation process of COVID-19, studies found that the curves of the proposed model closely follow the official data curves for all countries [8]. The major factors affecting the spread of the virus are confirmed cases, death cases and recovered cases. From the initial cases we can infer and predict the development of the epidemic in various regions. Symptoms of the corona virus include loss of smell and taste, fever, conjunctivitis, headache, dry cough, high temperature, tiredness, diarrhea, rashes on the skin and body, and discoloration of fingers and toes [15]. Severe symptoms include difficulty breathing or breathlessness, chest pain or tenderness, and loss of speech or movement. Anyone with serious symptoms needs to seek immediate medical attention. People with mild symptoms can take their medication at home. It may take 5-6 days, or up to 14 days, for the symptoms of the virus to appear.
Viral tests and antibody tests are the two kinds of tests for the corona virus. A viral test applies if you currently have an infection, and an antibody test applies if you had an infection in the past [18]. An antibody test may not show a current infection, because the body takes 1-3 weeks to develop antibodies. Having antibodies might protect against being infected by the virus again. In a swab test [4], a swab is inserted in the nose or mouth to collect the sample. Results may take some days. Some test kits are also available. The result is either positive or negative: if it is positive, the person is infected with the virus, and if it is negative, the person is not. Tests are not always accurate, and some may miss the infection even though the person shows symptoms such as breathing difficulty. Such cases can be detected using MRI and CT scans.
The main purpose of this project is to predict future COVID-19 cases by taking the datasets and applying machine learning algorithms such as SVM, linear regression [21] and polynomial regression. If the dataset is accurate, we can predict the next 10 to 20 days. The key factors that influence the spread of COVID-19 are death cases, recovered cases and confirmed cases. Using the existing epidemic data, we can readily predict how the epidemic will evolve. We make bold predictions for the epidemic data from other countries, covering the development trend of the epidemic in different regions, the associated control time, and the earliest transmission traced in countries with different dates. The corona virus became a threat to people's safety and health because of its harmfulness and its power to spread. At present the outbreak is effectively controlled in some areas, while the virus is spreading rapidly in others.
It is hard to determine how the virus is transmitted and how long it takes to spread through different areas. The epidemic has different transmission characteristics at different stages, so we use SVM, linear regression [21] and polynomial regression models to create a virus estimation procedure. Confirmed cases, death cases and recovered cases are the major factors that affect the spread of the virus.
We predict the development and trend of the epidemic in different regions by inference from the initial cases, and we further analyze the impact of the control time on the spread of the epidemic. Applicable models and data analysis may provide a foundation and direction for epidemic prevention and control in the countries concerned. By comparing the simulation results with real data, the propagation process and its influencing factors can be analyzed. Datasets of death cases, confirmed cases and recovered cases [16] are available for different states, countries and regions. If the data are accurate and cover every region, predictions can also be made for villages and rural areas. With such predictions, the spread of the virus can be limited by taking the necessary precautions in time.

Related Work
Enhanced epidemiological models have been used for prediction and forecasting of the COVID-19 epidemic in India [24]. Drawing on SARS (Severe Acute Respiratory Syndrome), that work treats the corona virus as an infectious disease and analyzes future predictions and forecasts for India. It highlights, using a relevant mathematical formula, that insufficient tests were conducted, and it establishes relationships between the number of people infected and the death counts.
A patient prediction system based on tracking people's social interactions can predict the possibility of infection [25]. It uses a GPS and BLE (Bluetooth Low Energy) tracking system [30] based on social interactions, and a graph model is proposed in order to analyze the behavior of the proposed algorithm. Embedded sensors and smartphones have been used with AI for diagnosing the novel corona virus [26]. Using smartphones and sensors to detect the virus is less expensive than using medical kits. Depending on the available memory, wireless sensors including cameras are used. Sensors can detect the severity of the disease through prediction [27]. Flattening the curve using the Bhilwara model for the COVID-19 outbreak in India is described in [29]. The Bhilwara model uses third-degree polynomial regression and calculates the growth rate and mean using the datasets of all states in India.
E3S Web of Conferences 309, 01034 (2021), ICMED 2021, https://doi.org/10.1051/e3sconf/202130901034

System Architecture
The corona virus prediction model involves the following steps: data collection, feature extraction, data preprocessing, model selection and data visualization [19]. First, all the data are collected [6]; then the elements needed for the proposed work are identified and those particular features are extracted. In the implementation, the required libraries are loaded first, then the datasets are loaded and preprocessed, which includes comparing the data using plots and graphs such as scatter plots. Model selection follows preprocessing: the algorithm that gives the best and most accurate results is selected, and the results are visualized using various charts and graphs.

Data Collection
Data collection means gathering data through different means such as groups, surveys, observations and questionnaires, and from various organizations. Data such as confirmed cases, reported deaths and recovered cases need to be collected over a period of days or months.

Feature Extraction
Feature extraction selects and/or combines variables into features so that the large amount of data in the original dataset can be processed effectively while still describing it accurately and completely [10].

Data Preprocessing
Data preprocessing consists of data cleaning, integration, reduction and transformation. Data that are not required need to be removed [20]. Data cleaning means identifying inaccurate or incorrect records in the dataset and removing them. Data integration means combining data that reside in different types of sources into a single unified view. Integration includes steps such as cleaning, ETL mapping and transformation.
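The cleaning and reduction steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout (a `date` plus `confirmed`, `deaths` and `recovered` counts), the function name `clean_records` and the sample values are all assumptions made for the example.

```python
from datetime import date


def clean_records(records):
    """Drop incomplete, invalid and duplicate daily records, sorted by date."""
    seen_dates = set()
    cleaned = []
    # sorted() is stable, so the first record seen for a date stays first.
    for rec in sorted(records, key=lambda r: r["date"]):
        counts = (rec.get("confirmed"), rec.get("deaths"), rec.get("recovered"))
        # Data cleaning: skip records with missing or negative counts.
        if any(c is None or c < 0 for c in counts):
            continue
        # Data reduction: keep only one record per date.
        if rec["date"] in seen_dates:
            continue
        seen_dates.add(rec["date"])
        cleaned.append(rec)
    return cleaned


# Made-up sample values, used only to exercise the function.
raw = [
    {"date": date(2020, 1, 22), "confirmed": 10, "deaths": 0, "recovered": 0},
    {"date": date(2020, 1, 23), "confirmed": 15, "deaths": None, "recovered": 1},
    {"date": date(2020, 1, 22), "confirmed": 10, "deaths": 0, "recovered": 0},
    {"date": date(2020, 1, 24), "confirmed": 21, "deaths": 1, "recovered": 2},
    {"date": date(2020, 1, 25), "confirmed": -5, "deaths": 1, "recovered": 2},
]
cleaned = clean_records(raw)
print([rec["date"].isoformat() for rec in cleaned])
```

Dropping a record is only one possible policy. Interpolating a missing count from neighboring days would keep the time series unbroken, at the cost of introducing estimated values.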

Model Selection
We used linear regression, SVM and polynomial regression to compare the simulated data with the official data collected [16]. There is only a slight difference between the two curves.
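One way to quantify the difference between the official curve and each simulated curve is sketched below. The paper does not state which error measure was used, so the root-mean-square error (RMSE) here is an assumption, as are the function names and the made-up numbers.

```python
import math


def rmse(official, simulated):
    """Root-mean-square error between two equally long, non-empty curves."""
    if not official or len(official) != len(simulated):
        raise ValueError("curves must be non-empty and of equal length")
    squared = [(o - s) ** 2 for o, s in zip(official, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def select_model(official, simulated_by_model):
    """Return the name of the model closest to the official curve, plus all errors."""
    errors = {
        name: rmse(official, curve) for name, curve in simulated_by_model.items()
    }
    best = min(errors, key=errors.get)
    return best, errors


# Made-up curves, used only to illustrate the comparison.
official = [10, 20, 40, 80]
simulated = {
    "linear": [5, 28, 51, 74],
    "svm": [12, 25, 45, 70],
    "polynomial": [11, 19, 41, 79],
}
best, errors = select_model(official, simulated)
print(best, errors)
```

A single score per model makes the choice reproducible, whereas comparing plotted curves by eye does not.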

Data Visualization
Data visualization determines the form in which the output is presented, such as bar graphs, pie charts, pictures and curves. It is the graphical representation of data and information. Visualization tools [13,17] provide an accessible way to see and understand outliers and patterns in the data.

Methodology
The COVID-19 prediction model can be implemented using linear regression, support vector machine and polynomial regression models. Linear regression is a linear approach: it models the relationship between the independent and dependent variables and uses that relationship for prediction. The support vector machine aims at finding a hyperplane in N-dimensional space, where N is the number of features, that distinctly classifies the data points. Many possible hyperplanes can separate two classes of data points. The main objective is to find the hyperplane with the maximum margin, that is, the maximum distance between the data points of both classes. Maximizing the margin distance provides reinforcement so that future data points can be classified with more confidence.
Polynomial regression is a type of regression algorithm that models the relationship between the dependent variable ($y$) and the independent variable ($x_1$) as an nth degree polynomial. The equation for polynomial regression is given below:
$$y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1^3 + \dots + b_n x_1^n$$
Simple linear regression gives good results when it is applied to a linear dataset, but applying the same model without any modification to a non-linear dataset produces drastically worse output: the loss function increases, the error rate rises and the accuracy decreases. When the data points are arranged in a non-linear fashion, a polynomial regression model is needed. The difference between non-linear and linear datasets can be understood better from the comparison diagrams.

Machine Learning Models
For COVID-19 propagation analysis, three models are used for prediction: the SVM model, the linear regression model and the polynomial regression model.

Support Vector Machine (SVM) Model
The support vector machine (SVM) aims at finding a hyperplane that distinctly classifies the data points. Hyperplanes are chosen to separate the data points, and the main objective is to find the hyperplane with the maximum margin, i.e. the one for which the distance to the data points is maximum. SVM is a linear model used for both classification and regression tasks. It solves linear as well as non-linear problems and works well for practical problems. SVM creates a line or hyperplane that separates the data into classes.
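For reference, the maximum-margin objective described above is usually written in the textbook hard-margin form shown here. The formulation and its symbols are standard notation assumed for illustration and are not taken from this paper: $\mathbf{w}$ is the weight vector, $b$ the offset, $\mathbf{x}_i$ the $i$-th of $m$ data points and $y_i \in \{-1, +1\}$ its class label.

```latex
\min_{\mathbf{w},\, b} \ \frac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, m
```

The separating hyperplane is $\mathbf{w}^{\top}\mathbf{x} + b = 0$ and the margin width is $2 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin. The offset $b$ here is unrelated to the regression weights $b_0, \dots, b_n$ used elsewhere in the paper.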

Linear Regression Model
The linear regression model uses a linear approach and shows the relationship between the dependent variable and the independent variables. It is used to predict the relationship between two variables. We can use this model to predict output values for inputs that are not present in the dataset, on the assumption that those data points will fall on the fitted line.
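A minimal sketch of this idea is given below: a straight line is fitted by ordinary least squares and then evaluated at inputs that are not in the dataset. The function name `fit_line`, the day numbering and the case counts are assumptions made for illustration and do not come from the paper's dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up, perfectly linear counts for days 0..4.
days = [0, 1, 2, 3, 4]
cases = [100, 120, 140, 160, 180]
b0, b1 = fit_line(days, cases)

# Predict days 5..7, which are not present in the dataset.
forecast = [b0 + b1 * day for day in range(5, 8)]
print(b0, b1, forecast)
```

The forecast is only as good as the assumption that future points keep falling on the same line, which rarely holds for long during an epidemic.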

Polynomial Regression Model
Polynomial regression models the relationship between the dependent variable ($y$) and the independent variable ($x_1$) as an nth degree polynomial. It is a special case of linear regression, because the model is still linear in its weights. It fits a curvilinear relationship between the target and the independent variable.
A polynomial equation of degree n is represented as
$$y = b_0 + b_1 x_1 + b_2 x_1^2 + b_3 x_1^3 + \dots + b_n x_1^n$$
where $b_0$ is the bias, $b_1, b_2, \dots, b_n$ are the weights of the polynomial regression equation, and $n$ is the degree of the polynomial. Increasing the value of $n$ adds higher-order terms, so the equation becomes more complex.
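The weights of the polynomial above can be estimated by least squares. The sketch below solves the normal equations with plain Gaussian elimination so that it runs without third-party libraries. The function names, the choice of degree 2 and the made-up quadratic data are assumptions made for illustration, not the authors' code.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations also apply to the right-hand side.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least squares weights [b0, ..., bn] from the normal equations."""
    size = degree + 1
    # (X^T X)[i][j] = sum of x^(i + j); (X^T y)[i] = sum of y * x^i.
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(xtx, xty)


def predict(weights, x):
    """Evaluate b0 + b1 * x + ... + bn * x^n."""
    return sum(b * x ** k for k, b in enumerate(weights))


# Made-up counts that grow quadratically over days 0..5.
days = list(range(6))
cases = [5 + 2 * d + 3 * d ** 2 for d in days]
weights = fit_polynomial(days, cases, degree=2)
forecast = [predict(weights, d) for d in range(6, 9)]
print(weights, forecast)
```

The normal equations become ill-conditioned as the degree grows, so a numerical library is a safer choice for high degrees; this sketch also does not handle a singular system.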

Result Analysis
First we imported all the required libraries and loaded the COVID-19 death, recovered and confirmed datasets covering two months for the specific location. We preprocessed the data to remove outliers and then analyzed it by constructing linear regression, support vector machine and polynomial regression models and making predictions with them. Finally, the results are visualized using line graphs, shown in Figures 11, 12 and 13. Data collection, feature extraction, data preprocessing, model selection and data visualization are carried out step by step. The results clearly show the expected death, recovered and confirmed cases for the next 15 days from 22nd January 2020.
Fig. 11. Prediction of Corona cases using SVM
[...] accurately and can easily recognize the expected future growth in positive cases.

Conclusion
COVID-19, caused by a novel corona virus, is a dangerous disease that causes human deaths and has created a pandemic affecting everyone's day-to-day life. In this work, propagation analysis and prediction of COVID-19 are done using datasets from various organizations. After preprocessing, including handling missing values and redundancy, and visualizing the data, there is only a slight difference between the predicted and the actual curves. We make bold predictions using linear regression, SVM and polynomial regression models, and accurate predictions can be made with these models. These predictions raise awareness of the pandemic and help us take the necessary precautions to keep ourselves safe from COVID-19. The polynomial regression model predicted better than the linear regression and SVM models. As a future enhancement of this work, non-linear models may be helpful for more accurate prediction of COVID-19 cases.