Linear Shrinkage and Shrinkage Pretest Strategies in Partially Linear Models

Abstract. In this paper, we improve the efficiency of parameter estimation in partially linear models where subspace information is available. We propose linear shrinkage and shrinkage pretest estimation strategies and examine the asymptotic distributional risk of the proposed estimators. A Monte Carlo simulation was conducted to evaluate the risk performance of the estimators; the proposed estimators performed better than the unrestricted estimator. A real data example is used to illustrate the application of the proposed estimators.


Introduction
A partially linear model (PLM) is a form of semiparametric model that includes both parametric and nonparametric components. Compared with a completely nonparametric model, it allows easier interpretation of the effect of each variable. It is also more flexible than the standard linear model: the response variable is assumed to be linearly related to some covariates, while its relation to additional variables is characterized by nonparametric functions.
We consider the estimation of regression parameters in a PLM with many predictors. We thus have a full or unrestricted model with all predictors. However, some predictors may have no influence on the response variable, so the efficiency of full-model estimation may decrease. Subspace information can be used to identify the significant and non-significant predictors. Such information may be derived from previous studies, expert opinion, or variable selection techniques. Subspace information is usually incorporated into a model via an assumed restriction on the model parameters, resulting in submodels or restricted models. Submodel estimation improves efficiency if the submodel is true, but is less efficient when it is not.
The main point of this study is to test the validity of such information before incorporating it into the estimation. The pretest estimation strategy was suggested by Bancroft [1]: the validity of the information is tested before it is incorporated into the estimation. Subsequently, Ahmed [2] developed the shrinkage pretest estimator from the pretest estimator, which significantly improved on the ordinary pretest in terms of the size of the test. Another strategy is based on linear shrinkage, which uses a linear combination of the full-model and submodel estimators along with a shrinkage coefficient. Estimators based on these strategies have been applied to various statistical models by researchers including Yüzbaşı et al. [3] for linear regression, Lisawadi et al. [4] for logistic regression, Reangsephet et al. [5] for negative binomial regression, and Piladaeng et al. [6, 7] for nonlinear regression, among others. For partially linear regression models, Ahmed et al. [8] used pretest and shrinkage estimators to estimate the regression coefficients, where the nonparametric component is estimated using kernel smoothing. Raheem et al. [9] extended this work by using a B-spline series to approximate the nonparametric component and proposed estimators based on shrinkage strategies. In this paper, we propose linear shrinkage and shrinkage pretest strategies for parameter estimation in a PLM, where the nonparametric component is estimated using smoothing splines and subspace information is available.
The rest of this paper is organized as follows. In Sect. 2, we describe the model and estimation strategies. Sect. 3 provides the asymptotic distributional risk of the estimators. The results of Monte Carlo simulation are reported in Sect. 4. Application to a real dataset is given in Sect. 5. Conclusions are presented in Sect. 6.

Model and Estimation Strategies
The partially linear model (PLM) has the following form:

$$y_i = x_i^T \beta + g(t_i) + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$

where $y_i$ are responses, $x_i = (x_{i1}, \ldots, x_{ip})^T$ and $t_i$ are covariates, $\beta = (\beta_1, \ldots, \beta_p)^T$ is a $p$-dimensional unknown parameter vector, $g$ is an unknown smooth function, $\varepsilon_i$ are independent and identically distributed random errors with zero mean and common variance $\sigma^2$, and the superscript $T$ denotes the transpose of a vector or matrix. Equation (1) can be written in vector-matrix form as

$$y = X\beta + g + \varepsilon, \qquad (2)$$

where $y = (y_1, y_2, \ldots, y_n)^T$, $X = (x_1, x_2, \ldots, x_n)^T$, $g = (g(t_1), g(t_2), \ldots, g(t_n))^T$, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$. We estimate the nonparametric component using the smoothing splines method, in which a solution is obtained by minimizing over $\beta$ and $g$ the penalized sum of squares

$$\sum_{i=1}^{n} \left\{ y_i - x_i^T \beta - g(t_i) \right\}^2 + \lambda \int \left\{ g''(t) \right\}^2 \, dt, \qquad (3)$$

where $\lambda > 0$ is a smoothing parameter which controls the trade-off between smoothness and goodness of fit. Applying the approach of Speckman [10], with $S_\lambda$ denoting the smoother matrix, the estimators are $\hat{\beta} = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T \tilde{y}$ and $\hat{g} = S_\lambda (y - X\hat{\beta})$, where $\tilde{X} = (I - S_\lambda) X$ and $\tilde{y} = (I - S_\lambda) y$. In this study, we partition the unknown parameter as $\beta = (\beta_1^T, \beta_2^T)^T$, where $\beta_1$ and $\beta_2$ have dimensions $p_1$ and $p_2$ respectively, with $p_1 + p_2 = p$. We are interested in using the information on $\beta_2$ to estimate $\beta_1$ when $\beta_2$ is zero.
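As a concrete illustration of the profiling step, the sketch below simulates a small PLM and applies the $(I - S_\lambda)$ transformation. For simplicity it uses a Gaussian-kernel (Nadaraya-Watson) smoother matrix as a stand-in for the smoothing-spline smoother matrix $S_\lambda$; the algebra of the Speckman-type estimator is the same. All data and settings are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a small PLM: y_i = x_i' beta + g(t_i) + eps_i
n, p = 200, 4
beta_true = np.array([1.0, -0.5, 0.0, 0.0])
X = rng.normal(size=(n, p))
t = np.sort(rng.uniform(0.0, 1.0, size=n))
g_true = np.sin(2.0 * np.pi * t)
y = X @ beta_true + g_true + 0.3 * rng.normal(size=n)

# Smoother matrix: each row gives kernel weights over all observations.
# A Gaussian-kernel smoother is used here for illustration; the paper's
# smoothing-spline smoother matrix plays the same algebraic role.
def smoother_matrix(t, bandwidth):
    diffs = (t[:, None] - t[None, :]) / bandwidth
    K = np.exp(-0.5 * diffs**2)
    return K / K.sum(axis=1, keepdims=True)

S = smoother_matrix(t, bandwidth=0.05)

# Speckman-type profiling: partial the smooth component out of X and y.
I = np.eye(n)
X_tilde = (I - S) @ X          # X-tilde = (I - S_lam) X
y_tilde = (I - S) @ y          # y-tilde = (I - S_lam) y

beta_hat = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)[0]
g_hat = S @ (y - X @ beta_hat)  # g-hat = S_lam (y - X beta-hat)

print(beta_hat)
```

The profiling step removes the smooth trend in $t$ from both $X$ and $y$, so the parametric part can be estimated by ordinary least squares on the transformed data.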

Unrestricted and Restricted Estimators
The unrestricted estimator (UE) of $\beta_1$, denoted $\hat{\beta}_1^{UE}$, is the subvector of $\hat{\beta} = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T \tilde{y}$ corresponding to $\beta_1$. Under the restriction $\beta_2 = 0$, the restricted estimator (RE) of $\beta_1$ is $\hat{\beta}_1^{RE} = (\tilde{X}_1^T \tilde{X}_1)^{-1} \tilde{X}_1^T \tilde{y}$, where $\tilde{X}$ is partitioned as $\tilde{X} = (\tilde{X}_1, \tilde{X}_2)$, $\tilde{X}_1$ is composed of the first $p_1$ columns of $\tilde{X}$, and $\tilde{X}_2$ of the last $p_2$ columns.
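In code, the UE and RE differ only in which columns of the partialled-out design enter the least-squares fit. A minimal sketch with a synthetic stand-in for $(\tilde{X}, \tilde{y})$:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-ins for the partialled-out design X-tilde and response y-tilde.
n, p1, p2 = 120, 2, 3
Xt = rng.normal(size=(n, p1 + p2))
yt = Xt @ np.array([1.5, -0.7, 0.0, 0.0, 0.0]) + rng.normal(size=n)

# UE: least squares on the full partialled-out design, then take beta_1.
beta_ue = np.linalg.lstsq(Xt, yt, rcond=None)[0]
beta1_ue = beta_ue[:p1]

# RE: impose beta_2 = 0, i.e. regress on the first p1 columns only.
beta1_re = np.linalg.lstsq(Xt[:, :p1], yt, rcond=None)[0]
```

When the restriction is true, the RE uses fewer parameters and is typically more efficient; when it is false, the RE is biased.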

Linear Shrinkage Estimator
The linear shrinkage (LS) estimator $\hat{\beta}_1^{LS}$ of $\beta_1$ is a linear combination of $\hat{\beta}_1^{UE}$ and $\hat{\beta}_1^{RE}$, defined as

$$\hat{\beta}_1^{LS} = c\,\hat{\beta}_1^{RE} + (1 - c)\,\hat{\beta}_1^{UE},$$

where the constant $c \in [0, 1]$ is the shrinkage intensity. The value of $c$ may be assigned by the researcher based on a subjective assessment of the prior information at hand, or chosen to minimize the mean squared error of the estimator. If $c = 0$, the LS estimator reduces to the UE; conversely, if $c = 1$, it reduces to the RE.
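The LS estimator is just a convex combination of the two fits. A minimal sketch (the fitted values below are made up for illustration, not taken from any dataset):

```python
import numpy as np

# Hypothetical fitted values, for illustration only.
beta1_ue = np.array([1.02, -0.48, 0.11])   # unrestricted estimate of beta_1
beta1_re = np.array([0.97, -0.52, 0.05])   # restricted estimate (beta_2 = 0 imposed)

def linear_shrinkage(beta_ue, beta_re, c):
    """LS estimator: c * RE + (1 - c) * UE, with shrinkage intensity c in [0, 1]."""
    assert 0.0 <= c <= 1.0
    return c * beta_re + (1.0 - c) * beta_ue

print(linear_shrinkage(beta1_ue, beta1_re, 0.0))   # c = 0 -> reduces to UE
print(linear_shrinkage(beta1_ue, beta1_re, 1.0))   # c = 1 -> reduces to RE
print(linear_shrinkage(beta1_ue, beta1_re, 0.5))   # halfway between the two
```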

Shrinkage Pretest Estimator
To construct the shrinkage pretest estimator, we introduce the following profile likelihood ratio statistic for testing $H_0: \beta_2 = 0_{p_2}$:

$$T_n = 2\left\{ \ln L(H_1) - \ln L(H_0) \right\} = n \ln\!\left( \frac{RSS_0}{RSS_1} \right),$$

where $\ln L(H_1)$ is the maximized log-likelihood of the unrestricted model, with residual sum of squares $RSS_1 = (\tilde{y} - \tilde{X}\hat{\beta})^T (\tilde{y} - \tilde{X}\hat{\beta})$, and $\ln L(H_0)$ is the maximized log-likelihood of the restricted model, with $RSS_0 = (\tilde{y} - \tilde{X}_1 \hat{\beta}_1^{RE})^T (\tilde{y} - \tilde{X}_1 \hat{\beta}_1^{RE})$. Under the null hypothesis $H_0$, the statistic $T_n$ asymptotically follows the $\chi^2$ distribution with $p_2$ degrees of freedom.
The shrinkage pretest (SP) estimator $\hat{\beta}_1^{SP}$ of $\beta_1$ is defined as

$$\hat{\beta}_1^{SP} = \hat{\beta}_1^{UE}\, I\!\left(T_n > \chi^2_{p_2, \alpha}\right) + \hat{\beta}_1^{LS}\, I\!\left(T_n \le \chi^2_{p_2, \alpha}\right),$$

where $\chi^2_{p_2, \alpha}$ is the upper $\alpha$-quantile of the $\chi^2$ distribution with $p_2$ degrees of freedom and $I(\cdot)$ is an indicator function. The SP estimator uses the pretest strategy to choose between the UE and LS estimators: it equals the UE when $H_0$ is rejected, and takes the value of the LS estimator otherwise.
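Putting the pretest together: the sketch below computes the likelihood ratio statistic in its Gaussian profile-likelihood form $T_n = n \ln(RSS_0 / RSS_1)$ from the restricted and unrestricted residual sums of squares on synthetic data, and selects between the UE and LS accordingly. The design is a made-up stand-in for the partialled-out quantities $(\tilde{X}, \tilde{y})$.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Synthetic stand-ins for the partialled-out design and response.
n, p1, p2 = 150, 3, 4
p = p1 + p2
Xt = rng.normal(size=(n, p))
beta = np.concatenate([np.array([1.0, -1.0, 0.5]), np.zeros(p2)])  # beta_2 = 0 holds
yt = Xt @ beta + rng.normal(size=n)

# Unrestricted and restricted least-squares fits.
beta_ue = np.linalg.lstsq(Xt, yt, rcond=None)[0]
beta1_re = np.linalg.lstsq(Xt[:, :p1], yt, rcond=None)[0]

rss1 = np.sum((yt - Xt @ beta_ue) ** 2)           # unrestricted RSS
rss0 = np.sum((yt - Xt[:, :p1] @ beta1_re) ** 2)  # restricted RSS

T_n = n * np.log(rss0 / rss1)        # profile likelihood ratio statistic
crit = chi2.ppf(0.95, df=p2)         # chi-square critical value, alpha = 0.05

c = 0.5
beta1_ue = beta_ue[:p1]
beta1_ls = c * beta1_re + (1.0 - c) * beta1_ue
# SP: keep the UE if H0 is rejected, otherwise shrink toward the submodel.
beta1_sp = beta1_ue if T_n > crit else beta1_ls
```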

Asymptotic Results
In this section, we derive expressions for the asymptotic distributional risk (ADR) of the estimators of $\beta_1$. We consider local alternatives defined by

$$K_{(n)}: \beta_2 = \frac{\omega}{\sqrt{n}},$$

where $\omega = (\omega_1, \ldots, \omega_{p_2})^T \in \mathbb{R}^{p_2}$ is a fixed vector. The vector $\omega / \sqrt{n}$ measures how far the local alternatives $K_{(n)}$ deviate from the subspace restriction $\beta_2 = 0$.
Theorem 1.1 Under the local alternatives $K_{(n)}$ and the assumed regularity conditions, the ADRs of the estimators are

$$ADR(\hat{\beta}_1^{UE}) = \sigma^2 \operatorname{tr}(\Sigma_{11.2}^{-1}),$$
$$ADR(\hat{\beta}_1^{RE}) = \sigma^2 \operatorname{tr}(\Sigma_{11}^{-1}) + \delta^T \delta,$$
$$ADR(\hat{\beta}_1^{LS}) = \sigma^2 \operatorname{tr}(\Sigma_{11.2}^{-1}) - c(2 - c)\operatorname{tr}(\Phi) + c^2 \delta^T \delta,$$
$$ADR(\hat{\beta}_1^{SP}) = \sigma^2 \operatorname{tr}(\Sigma_{11.2}^{-1}) - c(2 - c)\operatorname{tr}(\Phi)\, H_{p_2+2}(\chi^2_{p_2,\alpha}; \Delta) + c\,\delta^T \delta \left\{ 2 H_{p_2+2}(\chi^2_{p_2,\alpha}; \Delta) - (2 - c) H_{p_2+4}(\chi^2_{p_2,\alpha}; \Delta) \right\},$$

where $\Delta = \omega^T \Sigma_{22.1}\, \omega / \sigma^2$, the quantities $\delta$ and $\Phi$ are defined in the Appendix, and $H_v(\cdot; \Delta)$ denotes the cumulative distribution function of the noncentral $\chi^2$ distribution with $v$ degrees of freedom and noncentrality parameter $\Delta$.
Proof. See Appendix.
From the ADR results, we conclude that when the null hypothesis is true, that is $\Delta = 0$, all estimators perform better than $\hat{\beta}_1^{UE}$, and $\hat{\beta}_1^{RE}$ has the minimum ADR. Nevertheless, when $\Delta \to \infty$, the ADRs of $\hat{\beta}_1^{RE}$ and $\hat{\beta}_1^{LS}$ become unbounded. As $\Delta$ increases, the ADR of $\hat{\beta}_1^{SP}$ reaches a maximum value, which is larger than that of $\hat{\beta}_1^{UE}$, and then decreases and approaches the ADR of $\hat{\beta}_1^{UE}$.
Monte Carlo Simulation
We used the mean squared error (MSE) criterion to investigate the performance of the proposed estimators and defined the relative mean squared efficiency (RMSE) of an estimator $\hat{\beta}_1^{*}$ with respect to $\hat{\beta}_1^{UE}$ as

$$RMSE(\hat{\beta}_1^{UE} : \hat{\beta}_1^{*}) = \frac{MSE(\hat{\beta}_1^{UE})}{MSE(\hat{\beta}_1^{*})},$$

where $\hat{\beta}_1^{*}$ is any one of the estimators $\hat{\beta}_1^{RE}$, $\hat{\beta}_1^{LS}$, or $\hat{\beta}_1^{SP}$. If $RMSE(\hat{\beta}_1^{UE} : \hat{\beta}_1^{*})$ is larger than one, then $\hat{\beta}_1^{*}$ is more efficient than $\hat{\beta}_1^{UE}$. For brevity, we report only the results for the case $c = 0.50$ in Table 1, with their graphical representation in Figure 1. Note that, in all figures, the shrinkage pretest estimators at $\alpha = 0.01$, $0.05$, and $0.1$ are denoted by SP1, SP2, and SP3, respectively. We summarize our findings as follows:
(1) The RMSEs of all estimators increased as $p_2$ increased.
(2) The RE outperformed all other estimators at or near ∆ * = 0, but its RMSE decreased sharply to zero as ∆ * increased, that is, the risk of RE increased and became unbounded.
(3) Similarly, the RMSE of the LS estimator decreased to zero as $\Delta^*$ increased, but more slowly than that of the RE.
(4) When $\Delta^*$ was zero or nearly zero, the performance of the SP estimator depended on the size of the test $\alpha$ and the shrinkage intensity $c$.
(5) As $\Delta^*$ increased, the RMSE of the SP estimator fell below one, and then increased to approach one.
(6) The SP estimator outperformed the RE and LS estimators when $\Delta^*$ was large.
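A stripped-down version of such an RMSE comparison can be sketched as follows. The data-generating design here (i.i.d. normal covariates, the restriction $\beta_2 = 0$ holding exactly, i.e. $\Delta^* = 0$, and arbitrary dimensions) is purely illustrative and is not the paper's simulation design.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
n, p1, p2 = 100, 3, 15
c, alpha, reps = 0.5, 0.05, 500
beta1_true = np.array([1.0, -1.0, 0.5])
beta = np.concatenate([beta1_true, np.zeros(p2)])  # restriction beta_2 = 0 holds
crit = chi2.ppf(1 - alpha, df=p2)

sse = {"UE": 0.0, "RE": 0.0, "LS": 0.0, "SP": 0.0}
for _ in range(reps):
    X = rng.normal(size=(n, p1 + p2))
    y = X @ beta + rng.normal(size=n)
    b_ue = np.linalg.lstsq(X, y, rcond=None)[0]
    b1_ue = b_ue[:p1]
    b1_re = np.linalg.lstsq(X[:, :p1], y, rcond=None)[0]
    rss1 = np.sum((y - X @ b_ue) ** 2)
    rss0 = np.sum((y - X[:, :p1] @ b1_re) ** 2)
    T_n = n * np.log(rss0 / rss1)                # pretest statistic
    b1_ls = c * b1_re + (1.0 - c) * b1_ue        # linear shrinkage
    b1_sp = b1_ue if T_n > crit else b1_ls       # shrinkage pretest
    for name, est in (("UE", b1_ue), ("RE", b1_re), ("LS", b1_ls), ("SP", b1_sp)):
        sse[name] += np.sum((est - beta1_true) ** 2)

# RMSE relative to the UE; values larger than one favour the competing estimator.
rmse = {name: sse["UE"] / sse[name] for name in sse}
print(rmse)
```

With the restriction true, the RE, LS, and SP estimators should all show RMSE above one, matching finding (2) and the behaviour near $\Delta^* = 0$.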

Application to Real Data
We considered the diabetes dataset of Willems et al. [11]. The data comprise 403 African American subjects who were interviewed and screened for diabetes in central Virginia. Our study focused on subjects with a small body frame and no missing measurements, producing a sample of size n = 98. We considered glycosylated hemoglobin as the response variable and the other 14 variables as predictors. These 14 variables were: cholesterol (chol), stabilized glucose (stab.glu), high density lipoprotein (hdl), cholesterol/hdl ratio (ratio), age, height, weight, first systolic blood pressure (bp.1s), first diastolic blood pressure (bp.1d), waist, hip, postprandial time when labs were drawn (time.ppn), location (1 = Buckingham, 0 = Louisa), and gender (1 = female, 0 = male). Among these 14 variables, two were categorical (location and gender) and 12 were continuous. Thus, n = 98 and p = 14.
The Bayesian information criterion (BIC) was applied to obtain a candidate submodel containing four effective predictors: stab.glu, ratio, age, and time.ppn. The stab.glu variable was chosen as the nonparametric variable. To evaluate the performance of the proposed estimators, we conducted a bootstrap with 1,000 replications and computed the relative prediction error (RPE) of $\hat{\beta}_1^{*}$ with respect to $\hat{\beta}_1^{UE}$:

$$RPE(\hat{\beta}_1^{UE} : \hat{\beta}_1^{*}) = \frac{MSPE(\hat{\beta}_1^{UE})}{MSPE(\hat{\beta}_1^{*})},$$

where MSPE denotes the mean squared prediction error. An RPE greater than one means that $\hat{\beta}_1^{*}$ dominates $\hat{\beta}_1^{UE}$. We considered $\alpha = 0.05$. The RPEs in Table 2 show that all the proposed estimators were superior to the UE. Unsurprisingly, the RE had the largest RPE, because this estimator assumes the subspace information to be true. The LS estimator outperformed the SP estimators. These results confirm the simulation results for the case where the subspace information is nearly correct.
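One simple variant of such a bootstrap RPE computation is sketched below: fit on each bootstrap resample and evaluate squared prediction error on the original sample. The design is a synthetic stand-in for the diabetes data (the restricted fit uses the first $p_1$ columns, as a BIC-selected submodel would), and only the RE is compared against the UE for brevity.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in for the (partialled-out) design and response;
# the paper uses the diabetes data with n = 98.
n, p1, p2 = 98, 3, 5
beta = np.concatenate([np.array([0.8, -0.6, 0.4]), np.zeros(p2)])
X = rng.normal(size=(n, p1 + p2))
y = X @ beta + rng.normal(size=n)

B = 1000
mspe_ue = mspe_re = 0.0
for _ in range(B):
    idx = rng.integers(0, n, size=n)         # bootstrap resample of row indices
    Xb, yb = X[idx], y[idx]
    b_ue = np.linalg.lstsq(Xb, yb, rcond=None)[0]
    b1_re = np.linalg.lstsq(Xb[:, :p1], yb, rcond=None)[0]
    # Accumulate squared prediction error on the original sample.
    mspe_ue += np.mean((y - X @ b_ue) ** 2)
    mspe_re += np.mean((y - X[:, :p1] @ b1_re) ** 2)

rpe_re = mspe_ue / mspe_re   # RPE > 1 would mean RE predicts better than UE
print(rpe_re)
```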

Conclusions
We proposed linear shrinkage and shrinkage pretest estimators for partially linear regression models when the accuracy of the subspace information is unknown. We examined the performance of the proposed estimators by deriving their asymptotic distributional risk and performing a Monte Carlo simulation, and we applied the estimators to a real dataset. We found that, as expected, the RE performed better than all other estimators when the subspace information was true or nearly true; however, as the uncertainty of the subspace information increased, its risk increased rapidly. The LS estimator was sensitive to the quality of the subspace information, though less so than the RE, and it was less efficient than the SP estimator when the restriction was incorrect. The SP estimator outperformed the other estimators as the information misspecification increased. The analysis of the real data example produced results consistent with the theoretical and simulation findings.

Appendix
Following Chen and Shiau (1994), under the regularity conditions, $\hat{\beta}$ asymptotically follows a normal distribution:

$$\sqrt{n}\left(\hat{\beta} - \beta\right) \xrightarrow{D} N\left(0, \sigma^2 \Sigma^{-1}\right),$$

where $\xrightarrow{D}$ denotes convergence in distribution and $\Sigma = \lim_{n \to \infty} n^{-1} \tilde{X}^T \tilde{X}$ is partitioned conformably with $\beta = (\beta_1^T, \beta_2^T)^T$ as

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad \Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}, \qquad \Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$

Under the local alternatives $K_{(n)}$ and the regularity conditions, as $n \to \infty$, the joint distributions satisfy

$$\sqrt{n}\left(\hat{\beta}_1^{UE} - \beta_1\right) \xrightarrow{D} N\left(0, \sigma^2 \Sigma_{11.2}^{-1}\right), \qquad \sqrt{n}\left(\hat{\beta}_1^{RE} - \beta_1\right) \xrightarrow{D} N\left(\delta, \sigma^2 \Sigma_{11}^{-1}\right),$$

where $\delta = -\Sigma_{11}^{-1}\Sigma_{12}\,\omega$ and $\Phi = \sigma^2 \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22.1}^{-1}\Sigma_{21}\Sigma_{11}^{-1}$. We next present the asymptotic mean squared error matrices (AMSEMs) of the proposed estimators, from which their ADRs follow by taking traces. Under the local alternatives $K_{(n)}$ and the regularity conditions, the AMSEMs of the estimators are

$$\Gamma(\hat{\beta}_1^{UE}) = \sigma^2 \Sigma_{11.2}^{-1},$$
$$\Gamma(\hat{\beta}_1^{RE}) = \sigma^2 \Sigma_{11}^{-1} + \delta\delta^T,$$
$$\Gamma(\hat{\beta}_1^{LS}) = \sigma^2 \Sigma_{11.2}^{-1} - c(2 - c)\Phi + c^2 \delta\delta^T,$$
$$\Gamma(\hat{\beta}_1^{SP}) = \sigma^2 \Sigma_{11.2}^{-1} - c(2 - c)\Phi\, H_{p_2+2}(\chi^2_{p_2,\alpha}; \Delta) + c\,\delta\delta^T \left\{ 2H_{p_2+2}(\chi^2_{p_2,\alpha}; \Delta) - (2 - c)H_{p_2+4}(\chi^2_{p_2,\alpha}; \Delta) \right\},$$

where $H_v(\cdot; \Delta)$ is the cumulative distribution function of the noncentral $\chi^2$ distribution with $v$ degrees of freedom and noncentrality parameter $\Delta = \omega^T \Sigma_{22.1}\, \omega / \sigma^2$.