ARFIMA Model for Short-Term Forecasting of New COVID-19 Death Cases

COVID-19 is an infectious disease that spreads from person to person and carries a high risk of death. The infection spreads massively and quickly, which produces extremely fluctuating data and long memory effects. One way to reduce COVID-19 deaths is to build a prediction model that can serve as a reference for countermeasures. Various prediction models exist, from regression to the Autoregressive Integrated Moving Average (ARIMA), but they show shortcomings when extreme fluctuations and long memory effects are present: the analysis of the series becomes biased, and the statistical tests used for identification lose power. Therefore, this study uses the Autoregressive Fractionally Integrated Moving Average (ARFIMA) approach, which is flexible and accurate enough to address these weaknesses. The results indicate that ARFIMA(1, 0.431, 0), with an RMSE of 2.853, is the best model for predicting the daily number of new COVID-19 deaths.


Introduction
COVID-19 is a respiratory infection that can spread from one person to another. The disease is caused by the novel coronavirus (SARS-CoV-2), which was first identified in Wuhan, China, in December 2019 and has since spread to many countries [1]. When it was first identified, sufferers of this disease had a high risk of death [2]. The World Health Organization (WHO) has designated the disease a global pandemic because of its rapid and massive transmission [3]. One way to reduce the mortality caused by COVID-19 is to build a prediction model that can serve as a reference for countermeasures.
In Indonesia, COVID-19 infected 1528 people in less than one month after the first 2 confirmed cases were reported on March 2, 2020. On March 29, 2020, the number of cases had increased to 1,285 in 30 provinces; the five provinces with the most cases were Jakarta, West Java, Banten, East Java, and Central Java [4,5].
Time series analysis is one of the most popular statistical methods for building prediction models, because it is simple yet able to handle complex problems when the case under study depends on time [6]. Several studies have used this method to predict events in climatology, agriculture, and health, including [7][8][9][10][11][12].
Time series models include decomposition models, Autoregressive Integrated Moving Average (ARIMA), Seasonal Autoregressive Integrated Moving-Average (SARIMA), Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX), Vector Autoregression Moving-Average (VARMA), Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX), Simple Exponential Smoothing (SES), Holt-Winters Exponential Smoothing (HWES), and others. However, these methods lose accuracy when extreme fluctuations occur, and they cannot capture long memory, that is, a fairly strong correlation between observations that are far apart in time. Among time series models, the Autoregressive Fractionally Integrated Moving Average (ARFIMA) is able to capture both extreme fluctuations and long memory. This happens if the case under study experiences a continual change over time [13][14][15][16][17][18][19][20].
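For reference, the ARFIMA(p, d, q) model is commonly written in the form sketched below. The notation is an assumption made here for illustration and is not taken from the cited works: $Z_t$ denotes the observed series, $a_t$ a white noise process, $B$ the backshift operator, and $\phi_p(B)$ and $\theta_q(B)$ the autoregressive and moving average polynomials.

```latex
\phi_p(B)\,(1 - B)^{d}\,Z_t = \theta_q(B)\,a_t,
\qquad
(1 - B)^{d} = \sum_{k=0}^{\infty} \binom{d}{k} (-B)^{k}
            = 1 - dB - \frac{d(1-d)}{2}B^{2} - \cdots
```

When 0 < d < 0.5 the process is stationary and its autocorrelations decay hyperbolically rather than exponentially, which is the long memory behavior described above.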

Methods
The method used in this research is a case study that applies theory to analyze data on additional deaths caused by the COVID-19 pandemic, so that preventive measures can be determined. To achieve this goal, the following steps are taken.
The first step is to describe the characteristics of the data on COVID-19 patient deaths from March 3, 2020 to June 1, 2020. The data are divided into two parts, in-sample data and out-of-sample data. The in-sample data run from March 2, 2020 to June 1, 2020, while the out-of-sample data run from June 2, 2020 to June 11, 2020.
The second step is to identify the existence of long memory through the differencing parameter d. Several estimation methods are available, including Geweke and Porter-Hudak (GPH), Nonlinear Least Squares (NLS), Exact Maximum Likelihood (EML), and Modified Profile Likelihood (MPL) [21]. This study uses the GPH estimator because it estimates d directly, without first knowing the values of the p and q parameters [22].
The next step is to build the ARFIMA model: make a time series plot; transform the data if they do not meet the assumption of homogeneous variance; make ACF and PACF plots of the transformed data; specify one or more candidate ARFIMA models in accordance with those plots; estimate the model parameters; and choose the best ARFIMA model based on the in-sample and out-of-sample data. After that, diagnostic tests are carried out on the residuals, checking the white noise assumption with the Ljung-Box test and the normality assumption with the Kolmogorov-Smirnov test.
The last step is to forecast the next 10 periods and then calculate the RMSE of the forecasts.

Results

Description of Data
The following is a description of the data on COVID-19 patients who were declared dead from March 2, 2020 to June 11, 2020. ... distancing, and this was also the initial period of the pandemic entering Indonesia. At the end of week 13, the easing of social restrictions began. This coincided with Ramadan and Eid al-Fitr, when Indonesians traditionally travel to their hometowns, so community mobility was higher.

Testing for Long Memory in the Data on Patients Who Died from COVID-19
Before creating the ARFIMA model, the first step is to make a time series plot, as in Fig. 1, to see the daily pattern of deaths from COVID-19. Fig. 2 shows that the data are not stationary in variance, which calls for a Box-Cox transformation, as shown in Fig. 3. Fig. 3 shows a value of λ = 0. After each transformation of the death data, the time series plot resembles a straight line, and the ACF and PACF plots resemble those of the original data, as shown in Fig. 4 and Fig. 5. Therefore, this study sets aside stationarity in variance, both for the data on patients recovering from COVID-19 and for the data on patients who died from COVID-19.
The next step is to check whether the data on patients who died from COVID-19 show a long memory effect (long-term dependence). This is done by observing the ACF in Fig. 4. The ACF plot suggests long memory in both the daily data on recovered COVID-19 patients and the daily data on COVID-19 deaths, because the autocorrelations decay slowly.
Besides the ACF plot, long memory can be confirmed by estimating the parameter d with the GPH estimator. The estimated d for the data on patients who died from COVID-19 is 0.488. The GPH estimates for the data on patients who recovered and who died thus lie between 0 and 1, which indicates that the data follow a long memory pattern. Given this evidence, the daily data on patients who die from COVID-19 can be modeled with the ARFIMA model.
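As a reminder of what the reported value λ = 0 means, the Box-Cox transformation is usually defined as below. The symbol $Z_t$ for the daily series is an assumption made here for illustration.

```latex
T(Z_t) =
\begin{cases}
\dfrac{Z_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
\ln Z_t, & \lambda = 0.
\end{cases}
```

A value of λ = 0 therefore corresponds to a natural logarithm transformation, which requires strictly positive observations.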
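The GPH estimate of d comes from a log-periodogram regression, which can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `gph_estimate` and the bandwidth choice m = n^0.5 (a common default) are assumptions, and the study's software may use different settings.

```python
import math


def gph_estimate(series, bandwidth_power=0.5):
    """Estimate the fractional differencing parameter d by log-periodogram regression.

    Assumption: the number of low frequencies used is m = floor(n ** bandwidth_power);
    the paper does not state which bandwidth was used.
    """
    n = len(series)
    m = int(n ** bandwidth_power)
    if m < 2:
        raise ValueError("series too short for the chosen bandwidth")

    mean = sum(series) / n
    centered = [x - mean for x in series]

    xs, ys = [], []
    for j in range(1, m + 1):
        lam = 2.0 * math.pi * j / n  # j-th Fourier frequency
        re = sum(c * math.cos(lam * t) for t, c in enumerate(centered))
        im = sum(c * math.sin(lam * t) for t, c in enumerate(centered))
        periodogram = (re * re + im * im) / (2.0 * math.pi * n)
        # Regressor chosen so that the regression slope equals d directly.
        xs.append(-math.log(4.0 * math.sin(lam / 2.0) ** 2))
        ys.append(math.log(periodogram))

    # Ordinary least squares slope of log-periodogram on the regressor.
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx
```

Calling `gph_estimate(daily_deaths)` on a list of daily counts (a hypothetical variable name) returns the slope estimate of d. For a series without long memory, such as simulated white noise, the estimate should be close to 0.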

ARFIMA Model for the Daily Data on Patients Declared Dead from COVID-19
After the descriptive statistics and long memory identification, the next analysis is ARFIMA modeling of the data on patients who died from COVID-19. If the normality assumption on the model residuals is not met, the analysis continues by reporting the kurtosis. The daily data on patients who die from COVID-19 are modeled with the ARFIMA model, giving d = 0.431 with p-value < 2e-16, which means that the d value is significant. Fig. 5 shows no significant lag (outside the $\pm 2/\sqrt{n}$ limit) after lag 2, whereas Fig. 6 shows that the ACF decays slowly. This suggests that an AR model is an appropriate candidate in this case.
After a number of trials that included the significant lags (those outside the bounds), the estimated models and the corresponding data for patients recovering daily from COVID-19 are presented in Table 1. After obtaining the best ARFIMA model, the next step is diagnostic checking: testing whether the residuals meet the white noise assumption and are normally distributed.
Table 2 shows that the p-value of the Ljung-Box test is 0.218, which is greater than 5%. This means that the residuals of the ARFIMA(1, 0.431, 0) model meet the white noise assumption. The next step is to test normality with the Kolmogorov-Smirnov test, with results presented in Table 3. According to Table 3, the residuals are not normally distributed. The kurtosis value of 1.465, which can be seen in Fig. 6, explains this departure from normality.
The ARFIMA(1, 0.431, 0) model yields forecasts for the next 10 periods, presented in Table 4.
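To illustrate what a fractional order such as d = 0.431 implies, the sketch below computes the coefficients of the expansion of (1 - B)^d and applies them to a series. The function names `frac_diff_weights` and `frac_diff` are hypothetical, and truncating the infinite expansion at the start of the sample is a simplifying assumption rather than the estimation procedure used in the study.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients pi_k of (1 - B)^d = sum_k pi_k B^k, for k = 0 .. n_weights - 1."""
    weights = [1.0]
    for k in range(1, n_weights):
        # Recursion that follows from the binomial series: pi_k = pi_{k-1} * (k - 1 - d) / k
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, truncating the expansion at the first observation."""
    weights = frac_diff_weights(d, len(series))
    return [
        sum(weights[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


# Weights for the order reported above; they shrink slowly instead of
# cutting off after one lag, which is how the model carries long memory.
weights = frac_diff_weights(0.431, 5)
```

With d = 1 the same recursion gives the weights 1, -1, 0, 0, ..., which is ordinary first differencing, so the fractional case can be read as an intermediate between no differencing and full differencing.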
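The white noise check described above rests on the Ljung-Box statistic, which can be sketched as follows. The function names are hypothetical and the code only computes the statistic Q; turning Q into a p-value such as the 0.218 reported in Table 2 requires a chi-square distribution function, which is left out of this sketch.

```python
def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    num = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean) for t in range(lag, n)
    )
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..h} rho_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
```

Under the white noise hypothesis Q is approximately chi-square distributed, with degrees of freedom usually taken as the number of lags minus the number of fitted ARMA parameters. A small Q, and hence a p-value above 5%, is consistent with white noise residuals.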

RMSE 2.853
Based on the forecast results in Table 4, the model achieves an RMSE of 2.853, which indicates that the ARFIMA(1, 0.431, 0) model is well suited to predicting the number of patients who die from COVID-19 going forward.
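The RMSE used here is the square root of the mean squared difference between the observed values and the forecasts over the out-of-sample periods. A minimal sketch follows; the function name `rmse` and the numbers in the example are hypothetical and are not the values in Table 4.

```python
import math


def rmse(actual, forecast):
    """Root mean square error between observed values and forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical values for illustration only, not the paper's data.
example_actual = [10, 12, 9]
example_forecast = [11, 10, 9]
example_rmse = rmse(example_actual, example_forecast)
```

Because the errors are squared before averaging, a few large misses on days with sharp jumps in deaths raise the RMSE more than many small misses do.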

Conclusion
The best Autoregressive Fractionally Integrated Moving Average (ARFIMA) model obtained for predicting the daily number of patients dying from COVID-19 in Indonesia is $(1 + 0.236B)(1 - B)^{0.431} Z_t = a_t$, where $B$ is the backshift operator, $Z_t$ is the daily number of deaths, and $a_t$ is the residual, with a Root Mean Square Error (RMSE) of 2.853. This is because the ARFIMA model accommodates the long memory effect well, resulting in a small bias. Estimating the model parameters is also simpler. Knowing the expected number of additional deaths from COVID-19 makes it possible to take anticipatory steps and decisions.