Modelling Generalized Poisson Regression in the Number of Dengue Hemorrhagic Fever (DHF) in East Nusa Tenggara

Regression analysis models the relationship between a dependent variable (Y) and one or more independent variables (X). When the dependent variable is a discrete (count) random variable, the Poisson regression model can be used. Poisson regression assumes equidispersion, that is, the absence of overdispersion. To deal with overdispersion, the Generalized Poisson regression (GPR) model was developed as an extension of the Poisson regression model. In this study a GPR model is applied to the number of dengue hemorrhagic fever (DHF) sufferers in East Nusa Tenggara Province in 2018. The independent variables are the percentage of poor population (X1), population density (X2), percentage of proper sanitation (X3), percentage of decent homes (X4), number of doctors (X5), percentage of access to improved drinking water (X6), average length of schooling (X7), and human development index (X8). The fitted Poisson regression model exhibits both multicollinearity and overdispersion. To overcome multicollinearity, variable selection is performed. Based on the AIC as a measure of goodness of fit, the GPR model, with an AIC of 218.5, models DHF in East Nusa Tenggara better than Poisson regression.


INTRODUCTION
Regression analysis is a method for analyzing the relationship between two kinds of variables, namely the dependent variable and the independent variables. In general, regression analysis is applied to data in which the dependent variable is a continuous random variable. However, there are also data in which the dependent variable is a discrete random variable. One regression model that can be used to analyze the relationship between a discrete dependent variable and the independent variables is the Poisson regression model.
One of the assumptions that must be met in the Poisson regression model is that the variance of the dependent variable equals its mean. This follows from the Poisson distribution, whose variance is equal to its mean. In the analysis of discrete data with Poisson regression, this assumption is sometimes violated: the variance is greater than the mean (overdispersion) or smaller than the mean (underdispersion). According to Cameron and Trivedi (1999) [1], overdispersion has the same effect as a violation of homoscedasticity in the linear regression model. Homoscedasticity is one of the assumptions of the classical linear regression model, under which the variance of the residuals is constant. When Poisson regression exhibits dispersion (overdispersion or underdispersion), the variance and the mean are related by
$$\mathrm{Var}(Y) = \phi\,\mu \tag{1}$$
where the constant $\phi$ is the dispersion (scale) parameter. Under overdispersion, the value of $\phi$ is greater than one.
According to Wang and Famoye (1997) [2], one method developed to overcome overdispersion is the Generalized Poisson regression model. This regression model is built on the generalized Poisson distribution, which can describe discrete data that exhibit either overdispersion or underdispersion. Generalized Poisson regression belongs to the family of Generalized Linear Models (GLM), which do not require the dependent variable to follow a Normal distribution and do not require homogeneity (constancy) of the variance for hypothesis testing. One study related to the Generalized Poisson regression model is Mahama et al. (2020) [3], who modelled technology adoption for increasing soybean production using Generalized Poisson regression. The dependent variable was the number of technologies adopted, and the significant independent variables included age, education, level of visits, and mass media via radio.
Figure 1 shows the number of DHF sufferers in NTT from 2015 to 2019. Based on the figure, there was a significant increase in the number of sufferers from 2017 to 2019. In addition, in 2020 the number of DHF sufferers almost doubled compared with 2019, reaching the 4000s. This is a concern for the NTT provincial government in dealing with DHF. Intervention from the provincial government alone is not enough; the people of NTT must also practice clean and healthy behavior, because dengue hemorrhagic fever is strongly influenced by environmental, climate and weather factors, as well as by individual lifestyle. Therefore, in this paper the number of DHF sufferers in NTT in 2018 is modelled with the following independent variables: percentage of poor population, population density, percentage of proper sanitation, percentage of decent housing, number of doctors, percentage of access to safe drinking water, average length of schooling, and human development index.

Poisson Regression
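Regression analysis is a statistical tool that uses the relationship between two or more quantitative variables so that one variable can be predicted from the others [4]. An appropriate model for discrete (count) data is the Poisson regression model, which is a nonlinear regression model (Cameron and Trivedi, 1998):
$$y_i \sim \mathrm{Poisson}(\mu_i), \qquad \mu_i = \exp\left(\mathbf{x}_i^{T}\boldsymbol{\beta}\right) \tag{2}$$
where $\mu_i$ is the average number of events occurring in a given time period for the $i$-th observation and $\boldsymbol{\beta}$ is the vector of Poisson regression parameters. The parameters are estimated by the Maximum Likelihood Estimation (MLE) method, that is, by maximizing the likelihood function [5]. The likelihood function for Poisson regression is
$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \frac{\exp(-\mu_i)\,\mu_i^{y_i}}{y_i!} \tag{3}$$
Taking the natural logarithm of the likelihood function in equation (3) gives
$$\ln L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left( y_i\,\mathbf{x}_i^{T}\boldsymbol{\beta} - \exp\left(\mathbf{x}_i^{T}\boldsymbol{\beta}\right) - \ln y_i! \right) \tag{4}$$
Equation (4) is then differentiated with respect to $\boldsymbol{\beta}$:
$$\frac{\partial \ln L(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n} \left( y_i - \exp\left(\mathbf{x}_i^{T}\boldsymbol{\beta}\right) \right)\mathbf{x}_i \tag{5}$$
Equation (5) is set equal to zero to obtain the parameter estimator. However, this does not give an explicit solution, so equation (5) is solved with the Newton-Raphson iteration method. The purpose of the numerical iteration is to maximize the ln-likelihood function [5].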
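The Newton-Raphson iteration described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a log link with an intercept and a single covariate so that the 2x2 linear system can be solved by hand, and the function name `fit_poisson_newton` and the example data are hypothetical.

```python
import math

def fit_poisson_newton(x, y, tol=1e-10, max_iter=50):
    """Fit ln(mu_i) = b0 + b1 * x_i by Newton-Raphson on the Poisson ln-likelihood."""
    # Start from the intercept-only fit: mu_i = mean(y), slope = 0.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: sum of (y_i - mu_i) * (1, x_i).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian): sum of mu_i * (1, x_i)(1, x_i)^T.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical counts, for illustration only.
x_demo = [0, 1, 2, 3, 4]
y_demo = [1, 3, 2, 6, 9]
print(fit_poisson_newton(x_demo, y_demo))
```

With more covariates the same update applies, with the score and information matrix built from the full design matrix and the step obtained from a general linear solver.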

Simultaneous Testing of the Poisson Regression Model Parameters
A good model is one with small residuals. In Poisson regression the deviance is used to determine the best model, and it is obtained as the first step of simultaneous (overall) testing. Simultaneous testing of the parameters of the Poisson regression model assesses the suitability of the resulting model. The hypotheses of the simultaneous test are
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1:$ at least one $\beta_j \neq 0$, $j = 1, 2, \ldots, k$
The test statistic is obtained from the Maximum Likelihood Ratio Test (MLRT) method of Myers:
$$D(\hat{\boldsymbol{\beta}}) = -2 \ln \Lambda = -2 \ln\left( \frac{L(\hat{\omega})}{L(\hat{\Omega})} \right) = 2\left( \ln L(\hat{\Omega}) - \ln L(\hat{\omega}) \right) \tag{6}$$
where $L(\hat{\omega})$ is the maximized likelihood under $H_0$ and $L(\hat{\Omega})$ is the maximized likelihood of the model containing the independent variables. For large samples the deviance approximately follows a $\chi^2$ distribution with $v$ degrees of freedom. $H_0$ is rejected at significance level $\alpha$ if $D(\hat{\boldsymbol{\beta}}) > \chi^2_{(v,\alpha)}$. The deviance becomes smaller as the number of parameters in the model increases [6]. The smaller the deviance, the smaller the error produced, so the model becomes more precise.
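As a rough illustration of the simultaneous test, the sketch below computes the likelihood ratio statistic for a Poisson model as twice the difference between the ln-likelihood of the fitted model and that of the intercept-only model. The function names and the data are hypothetical, and the critical value is passed in by the caller (for example, read from a chi-square table) rather than computed.

```python
import math

def poisson_loglik(y, mu):
    """Poisson ln-likelihood: sum of y_i * ln(mu_i) - mu_i - ln(y_i!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def likelihood_ratio_statistic(y, mu_full):
    """2 * (lnL of the fitted model - lnL of the intercept-only model)."""
    # Under H0 (all slopes zero) the MLE of every mu_i is the sample mean.
    mu_null = [sum(y) / len(y)] * len(y)
    return 2.0 * (poisson_loglik(y, mu_full) - poisson_loglik(y, mu_null))

def reject_h0(statistic, critical_value):
    """Decision rule: reject H0 when the statistic exceeds the chi-square critical value."""
    return statistic > critical_value

# Hypothetical counts and fitted means, for illustration only.
y_demo = [2, 3, 6, 7, 8]
mu_demo = [2.5, 3.5, 5.0, 6.5, 8.5]
d_stat = likelihood_ratio_statistic(y_demo, mu_demo)
print(d_stat, reject_h0(d_stat, critical_value=3.84))  # 3.84 is the chi-square(1, 0.05) value
```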

Partial Testing of the Poisson Regression Model Parameters
Not all parameters obtained from the estimation process necessarily have a significant influence on the dependent variable. Partial parameter testing is used to determine which parameters have a significant influence in the model. The hypotheses of the partial test are
$H_0: \beta_j = 0$ (not significant)
$H_1: \beta_j \neq 0$, $j = 1, 2, \ldots, k$ (significant)
The test statistic is
$$t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \tag{7}$$
where $se(\hat{\beta}_j)$ is the standard error of the parameter estimate $\hat{\beta}_j$. $H_0$ is rejected if $|t| > t_{(n-k-1,\,\alpha/2)}$.
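The partial test reduces to dividing each estimate by its standard error and comparing the absolute value with a critical value. A minimal sketch follows; the function names and the numbers in the example are hypothetical, and the critical value is supplied by the caller from a t or normal table.

```python
def wald_statistic(beta_hat, std_error):
    """Test statistic for H0: beta_j = 0, the estimate divided by its standard error."""
    return beta_hat / std_error

def is_significant(beta_hat, std_error, critical_value):
    """Reject H0 when the absolute statistic exceeds the critical value."""
    return abs(wald_statistic(beta_hat, std_error)) > critical_value

# Hypothetical estimate and standard error; 1.96 is the two-sided normal value at alpha = 0.05.
print(wald_statistic(0.42, 0.15), is_significant(0.42, 0.15, 1.96))
```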

Over-dispersion
Poisson regression assumes equidispersion, that is, the dependent variable has equal mean and variance. One violation of this assumption is overdispersion, where the variance is greater than the mean. Under overdispersion the estimators of the regression parameters remain consistent but are inefficient. If the equidispersion assumption is not met, the standard errors are underestimated, so the conclusions obtained are invalid [7].
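According to McCullagh and Nelder (1998) [6], overdispersion in Poisson regression can be detected by dividing the Pearson Chi-Square statistic by its degrees of freedom. The Pearson Chi-Square statistic is
$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i} \tag{8}$$
A value of $\chi^2$ divided by the degrees of freedom that is greater than one indicates overdispersion.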
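The dispersion check can be sketched in a few lines. This is an illustration under stated assumptions: the degrees of freedom are taken as the number of observations minus the number of estimated parameters, which is a common convention but is not stated in the text, and the function name `pearson_dispersion` and the data are hypothetical.

```python
def pearson_dispersion(y, mu, n_params):
    """Return (Pearson chi-square, chi-square / df) for a fitted Poisson model."""
    # For the Poisson model the variance function is mu_i itself.
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    df = len(y) - n_params  # assumption: df = n - number of estimated parameters
    return chi2, chi2 / df

# Hypothetical counts and fitted means; a ratio well above 1 points to overdispersion.
chi2, ratio = pearson_dispersion([1, 9, 2, 14, 0, 10], [5.0, 6.0, 6.0, 7.0, 6.0, 6.0], n_params=2)
print(chi2, ratio, ratio > 1)
```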

Generalized Poisson Regression
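Generalized Poisson regression (GPR) is a model for count data that can accommodate overdispersion or underdispersion. In addition to the mean parameter $\mu$, the GPR model has a dispersion parameter $\theta$. The Generalized Poisson distribution can be written as [8]:
$$f(y_i; \mu_i, \theta) = \left( \frac{\mu_i}{1 + \theta\mu_i} \right)^{y_i} \frac{(1 + \theta y_i)^{y_i - 1}}{y_i!} \exp\left( -\frac{\mu_i (1 + \theta y_i)}{1 + \theta\mu_i} \right), \quad y_i = 0, 1, 2, \ldots \tag{9}$$
The mean and variance of the Generalized Poisson distribution are $E(Y) = \mu$ and $\mathrm{Var}(Y) = \mu(1 + \theta\mu)^2$. If $\theta = 0$ the model reduces to Poisson regression. A value $\theta > 0$ indicates overdispersion, while $\theta < 0$ indicates underdispersion. The parameters of the GPR model are estimated by Maximum Likelihood Estimation (MLE) combined with Newton-Raphson iteration. The likelihood of the GPR model can be written as:
$$L(\boldsymbol{\beta}, \theta) = \prod_{i=1}^{n} \left( \frac{\mu_i}{1 + \theta\mu_i} \right)^{y_i} \frac{(1 + \theta y_i)^{y_i - 1}}{y_i!} \exp\left( -\frac{\mu_i (1 + \theta y_i)}{1 + \theta\mu_i} \right) \tag{10}$$
Taking the natural logarithm of the likelihood function in equation (10) gives
$$\ln L(\boldsymbol{\beta}, \theta) = \sum_{i=1}^{n} \left[ y_i \ln\left( \frac{\mu_i}{1 + \theta\mu_i} \right) + (y_i - 1)\ln(1 + \theta y_i) - \frac{\mu_i (1 + \theta y_i)}{1 + \theta\mu_i} - \ln y_i! \right]$$
with $\mu_i = \exp(\mathbf{x}_i^{T}\boldsymbol{\beta})$. To estimate $\boldsymbol{\beta}$ and $\theta$, the ln-likelihood function is differentiated with respect to $\boldsymbol{\beta}$ and $\theta$. The first derivatives are:
$$\frac{\partial \ln L}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n} \frac{y_i - \mu_i}{(1 + \theta\mu_i)^2}\,\mathbf{x}_i$$
$$\frac{\partial \ln L}{\partial \theta} = \sum_{i=1}^{n} \left[ -\frac{y_i \mu_i}{1 + \theta\mu_i} + \frac{y_i (y_i - 1)}{1 + \theta y_i} - \frac{\mu_i (y_i - \mu_i)}{(1 + \theta\mu_i)^2} \right]$$
Because setting these derivatives to zero does not give a closed-form solution, a numerical iteration method, the Newton-Raphson method, is used instead. The purpose of the numerical iteration is to maximize the ln-likelihood function. The Newton-Raphson method requires the first and second derivatives of the ln-likelihood function. Estimating $\boldsymbol{\beta}$ and $\theta$ with this method requires initial estimates of both. According to Wang and Famoye (1997), the initial value of $\theta$ can be set to zero or obtained from the chi-square statistic divided by its degrees of freedom.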
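The properties stated above can be checked numerically. The sketch below assumes the parameterization in which the mean is mu and the variance is mu(1 + theta*mu)^2, with theta as the dispersion symbol; the function name `gp_logpmf` and the values mu = 3, theta = 0.1 are illustrative choices only.

```python
import math

def gp_logpmf(y, mu, theta):
    """Log-probability of the Generalized Poisson distribution with mean mu and dispersion theta."""
    return (y * math.log(mu / (1 + theta * mu))
            + (y - 1) * math.log(1 + theta * y)
            - mu * (1 + theta * y) / (1 + theta * mu)
            - math.lgamma(y + 1))

# Sum the probabilities far enough into the tail to approximate the moments.
mu, theta = 3.0, 0.1
probs = [math.exp(gp_logpmf(y, mu, theta)) for y in range(200)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

print(sum(probs))                       # should be close to 1
print(mean, mu)                         # mean should be close to mu
print(var, mu * (1 + theta * mu) ** 2)  # variance should be close to mu * (1 + theta * mu)^2
```

Setting `theta=0` in `gp_logpmf` gives the ordinary Poisson log-probability, which mirrors the statement that the GPR model reduces to Poisson regression in that case.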

Simultaneous Testing of the Generalized Poisson Regression Model Parameters
The test statistic for parameter testing is obtained with the Maximum Likelihood Ratio Test (MLRT) method. The hypotheses of the simultaneous test are
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1:$ at least one $\beta_j \neq 0$, $j = 1, 2, \ldots, k$
The test statistic is the deviance obtained from the MLRT method:
$$D(\hat{\boldsymbol{\beta}}) = -2 \ln \Lambda = 2\left( \ln L(\hat{\Omega}) - \ln L(\hat{\omega}) \right)$$
$H_0$ is rejected if $D(\hat{\boldsymbol{\beta}}) > \chi^2_{(v,\alpha)}$.

Partial Testing of the Generalized Poisson Regression Model Parameters
Partial parameter testing is used to determine which parameters have a significant influence in the model. The hypotheses of the partial test are
$H_0: \beta_j = 0$ (not significant)
$H_1: \beta_j \neq 0$, $j = 1, 2, \ldots, k$ (significant)
The test statistic is
$$t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$$
$H_0$ is rejected if $|t| > t_{(n-k-1,\,\alpha/2)}$.

RESULTS AND DISCUSSIONS
Descriptive statistics of the data used are presented in Table 1.

Poisson Regression Model for the Number of Dengue Hemorrhagic Fever (DHF) Cases in East Nusa Tenggara
Data on the number of patients with Dengue Hemorrhagic Fever (DHF) are count data: they record events with a very small probability of occurrence over a fixed time interval, so Poisson regression modelling is an appropriate analysis.

Simultaneous Parameter Testing
The hypotheses for simultaneous testing are
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1:$ at least one $\beta_j \neq 0$, $j = 1, 2, \ldots, k$
with significance level $\alpha = 0.05$; $H_0$ is rejected if $D(\hat{\boldsymbol{\beta}}) > \chi^2_{(v,\alpha)}$. Based on the results, $D(\hat{\boldsymbol{\beta}}) = 1560.9$ is greater than $\chi^2_{(13,\,0.05)} = 22.36$, so $H_0$ is rejected. Hence at least one $\beta_j \neq 0$, which indicates that the independent variables jointly affect the number of DHF sufferers in East Nusa Tenggara in 2018.

Partial Parameter Testing
The next step is to test the parameters partially. The hypotheses of the partial test are
$H_0: \beta_j = 0$ (parameter is not significant)
$H_1: \beta_j \neq 0$ (parameter is significant), $j = 1, 2, \ldots, k$
with significance level $\alpha = 0.05$; $H_0$ is rejected if $|Z| > Z_{\alpha/2} = 1.96$. The results are as follows:
Based on Table 2, all independent variables are significant in the model. The Poisson regression model has an AIC value of 1677.2.
Next, all independent variables are tested for multicollinearity. The Variance Inflation Factor (VIF) values obtained are as follows:
Multicollinearity is present when the VIF value is greater than 10. Based on Table 3, the variables affected by multicollinearity are X1, X2, X5, X7, and X8. To handle multicollinearity, a variable selection process is carried out by removing the variable with the highest VIF value, namely X2, and refitting the Poisson regression model; this is repeated until none of the remaining variables shows multicollinearity. The variables retained are X1, X3, X4, X5 and X6, and the resulting Poisson regression model is given in Table 4. The variables X2, X7 and X8 are not included in the model because they exhibit multicollinearity. This model has an AIC value of 986.4, which is smaller than that of the model involving all variables. This shows that the variable selection improves the goodness of fit of the model. The next step is overdispersion testing for the fitted Poisson regression model.
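The VIF screening can be sketched without any statistical library by using the fact that the VIF of variable j equals the j-th diagonal element of the inverse of the correlation matrix of the predictors, which is the same as 1 / (1 - R_j^2). The function names and the small data set below are hypothetical and serve only to illustrate the computation.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length predictor columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    p = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]

def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical predictors; values above 10 would flag multicollinearity.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]   # nearly collinear with x1
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
print(vif([x1, x2, x3]))
```

A selection loop in the spirit of the text would drop the variable with the largest VIF, recompute, and stop once every remaining VIF is at most 10.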

Overdispersion testing for Poisson Regression Model
Overdispersion is tested to determine whether the Poisson regression model obtained meets the equidispersion assumption. Table 5 shows that the Pearson Chi-Square value of the Poisson regression model is 2934.6. Dividing this value by the degrees of freedom gives a dispersion value of 133.4, which is greater than 1 and therefore indicates overdispersion. It can be concluded that the data on the number of DHF sufferers in East Nusa Tenggara in 2018 are overdispersed. To overcome this, Generalized Poisson regression modelling is used.
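As a quick arithmetic check of the reported figures, the dispersion value is the Pearson Chi-Square divided by the degrees of freedom. The degrees of freedom are not stated in the text (they belong to Table 5), so the value below is only what the two reported numbers imply, and the comparison is against the reference value of one for equidispersion.

```latex
\hat{\phi} = \frac{\chi^2}{df} = 133.4
\quad\Longrightarrow\quad
df \approx \frac{2934.6}{133.4} \approx 22,
\qquad
\hat{\phi} = 133.4 \gg 1 .
```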

Generalized Poisson Regression (GPR) Model for the Number of Dengue Hemorrhagic Fever (DHF) Cases in East Nusa Tenggara
Because the number of DHF sufferers in East Nusa Tenggara in 2018 exhibits overdispersion, Generalized Poisson Regression (GPR) modelling is used. The parameters of the GPR model are estimated by the Maximum Likelihood Estimation (MLE) method with Newton-Raphson iteration.

Simultaneous Parameter Testing
The hypotheses for simultaneous testing are
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1:$ at least one $\beta_j \neq 0$, $j = 1, 2, \ldots, k$
with significance level $\alpha = 0.05$; $H_0$ is rejected if $D(\hat{\boldsymbol{\beta}}) > \chi^2_{(v,\alpha)}$. Based on the results, $D(\hat{\boldsymbol{\beta}}) = 204.5$ is greater than $\chi^2_{(13,\,0.05)} = 22.36$, so $H_0$ is rejected. Hence at least one $\beta_j \neq 0$, which indicates that under the GPR model the independent variables jointly affect the number of DHF sufferers in East Nusa Tenggara in 2018.

Partial Parameter Testing
Simultaneous testing shows that, under the GPR model, the independent variables jointly influence the number of DHF sufferers in NTT in 2018. The next step is to test the parameters partially. The hypotheses of the partial test are
$H_0: \beta_j = 0$ (parameter is not significant)
$H_1: \beta_j \neq 0$ (parameter is significant), $j = 1, 2, \ldots, k$
with significance level $\alpha = 0.05$; $H_0$ is rejected if $|t| > t_{(n-k-1,\,\alpha/2)} = 2.11$ or if the p-value is less than the significance level. The results are as follows:
Table 6 shows the partial tests for the Generalized Poisson Regression model. Based on Table 6, all variables are significant except X3. This shows that the percentage of households with proper sanitation does not have a significant effect in the Generalized Poisson regression model. However, when the Poisson regression model and the GPR model are compared using the same variables X1, X3, X4, X5 and X6, the GPR model has a smaller AIC (218.5) than the Poisson regression model (986.4).
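The model comparison can be summarized in a few lines. The AIC values are those reported in the text; the helper `aic` uses the standard definition AIC = 2k - 2 ln L and is included only for reference, since the ln-likelihood values themselves are not reported. The dictionary labels are descriptive names chosen here.

```python
def aic(loglik, n_params):
    """Akaike Information Criterion: 2 * (number of parameters) - 2 * ln-likelihood."""
    return 2 * n_params - 2 * loglik

# AIC values as reported in the text.
reported_aic = {
    "Poisson, all variables": 1677.2,
    "Poisson, X1 X3 X4 X5 X6": 986.4,
    "GPR, X1 X3 X4 X5 X6": 218.5,
}

# The preferred model is the one with the smallest AIC.
best = min(reported_aic, key=reported_aic.get)
print(best)
```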
The GPR model cannot be interpreted one coefficient at a time. For example, X5 is the number of doctors; in theory, a greater number of doctors should reduce the number of DHF sufferers in NTT, so the coefficient of X5 should be negative. However, in both the GPR model and the Poisson regression model the estimated coefficient is positive. Therefore, the GPR model or the Poisson regression model is interpreted jointly, involving the variables X1, X3, X4, X5 and X6 together. In the GPR model the X3 variable is not significant, so its coefficient can be taken to be zero. The GPR model formed is as follows: