Volatility Forecasting of China Silver Futures: the Contributions of Chinese Investor Sentiment and CBOE Gold and Silver ETF Volatility Indices

This paper examines the role of the CBOE gold ETF volatility index (GVZ), the CBOE silver ETF volatility index (VXSLV), and a constructed Chinese investor sentiment index (CnSENT) in forecasting the volatility of China silver futures over daily, weekly and monthly horizons. Several types of HAR models and ridge regression models are used in the analysis, and out-of-sample R-squared statistics and different rolling window sizes are used to check the robustness of the conclusions. The empirical results suggest that GVZ and VXSLV have explanatory power for the volatility of China silver futures, with VXSLV performing better than GVZ. However, the predictive power of CnSENT is doubtful, as some results indicate that it does not improve prediction accuracy. Additionally, the ridge regression method does not outperform the HAR-type models.


Introduction
Volatility forecasting is crucial in asset pricing and risk management and has attracted considerable research attention [1][2][3][4][5][6]. Because true volatility is unobservable, much financial research has focused on implied volatility and historical volatility [7].
Silver, as a safe-haven asset, plays a crucial role in the commodity futures market [8]. Investors usually use it for risk hedging and asset allocation optimization. During economic downturns, the demand for silver increases [9].
A growing literature focuses on forecasting the volatility of futures prices. [10] suggested that the CBOE gold ETF volatility index (GVZ) and the CBOE silver ETF volatility index (VXSLV) play a crucial role in forecasting volatility in the China gold futures market. [11] indicated that VXSLV helps to predict the volatility of the China silver futures market. However, no study to date has examined the joint contribution of GVZ and VXSLV to volatility forecasting in the China silver futures market. Furthermore, [12] found that the FEARS index, as a proxy for investor sentiment, has explanatory power for crude oil futures volatility. However, there is little evidence on the explanatory power of Chinese investor sentiment (CnSENT) for China silver futures volatility.
The heterogeneous autoregressive (HAR) model proposed by [13] has been used extensively. The standard HAR model includes three components: daily, weekly and monthly variables. It can capture the 'stylized fact' of long memory in volatility. Empirical findings suggest that HAR models based on high-frequency data can outperform GARCH models and stochastic volatility models [14]. However, one disadvantage of HAR models is that they do not address overfitting and multicollinearity [10]. The ridge regression model, a popular shrinkage method, can mitigate these issues.
Motivated by these issues, this paper analyzes the Chinese silver futures market. Different types of HAR models are used to investigate whether the CBOE gold ETF volatility index, the CBOE silver ETF volatility index and Chinese investor sentiment, i.e., GVZ, VXSLV, and CnSENT, have explanatory power for China silver futures volatility. Moreover, the ridge regression method is used to address the overfitting issue and to see whether it outperforms the other models. Additionally, out-of-sample R-squared statistics and different window sizes are used as robustness checks. This paper makes the following contributions to the existing literature. Firstly, [10] considered the joint contribution of GVZ and VXSLV in the China gold futures market, whereas this research considers the predictive power of GVZ and VXSLV for volatility in the China silver futures market. Secondly, [11] used only VXSLV to predict China silver futures volatility, whereas this paper explores the joint contribution of GVZ, VXSLV, and CnSENT. Finally, existing studies used the FEARS index as a measure of investor sentiment. However, the FEARS index may not represent investor sentiment in the context of the Chinese market. A different approach is therefore used to construct CnSENT and to examine its explanatory power in the Chinese silver futures market.
The rest of this paper is organized as follows. Section 2 relates to the data description. Section 3 presents the methodology. Section 4 discusses the results. Finally, Section 5 concludes.

The Data
Data were obtained from the Bloomberg, Wind, and Choice databases for the period from February 03, 2017, to June 24, 2020. CnSENT is available at a daily frequency. China silver futures prices, GVZ, and VXSLV are available at high frequency. Figure 1 shows the time series of the daily RV of all variables over the whole sample period. Compared with the daily RV of GVZ, VXSLV, and CnSENT, the RV of China silver futures fluctuates less. Table 1 summarizes the descriptive statistics of all variables. The Kolmogorov-Smirnov and Jarque-Bera tests indicate that none of the variables is normally distributed. The Ljung-Box Q tests indicate strong serial autocorrelation in the series. The augmented Dickey-Fuller test rejects a unit root for all variables, so no further data transformation is needed.

Notes to Table 1: The Kolmogorov-Smirnov and Jarque-Bera tests are used to test whether the sample follows a normal distribution. The Ljung-Box Q statistic is used to test for serial autocorrelation in a time series over a number of lags. The augmented Dickey-Fuller test is used to determine whether a unit root exists in a sample. * indicates significance at the 10% level. ** indicates significance at the 5% level. *** indicates significance at the 1% level.

Methodology

Investor sentiment construction
Following [15], several indicators are used to construct CnSENT, as they can better capture investors' emotions. These indicators are the stock high opening rate, stock growth rate, stock effective growth rate, stock limit up rate, stock limit up closing rate, quantity relative ratio (QRR), and stock turnover rate (STR). The stock high opening rate is calculated from the Shanghai (Securities) Composite Index, and the others are calculated from China's A-share market. Because the stock growth rate, stock limit up closing rate, quantity relative ratio and stock turnover rate are relatively sensitive, their values are based on 20-day averages. Table 2 describes the notations and definitions related to these indicators. A stock is counted as having closed limit up on day t if its price reached the daily limit and stayed at the daily limit until the market closed.
QRR can be directly obtained from the Wind database without any further calculation.
STR is the ratio of trading volume to shares outstanding, expressed as a percentage:

$$\mathrm{STR} = \frac{\text{trading volume}}{\text{shares outstanding}} \times 100\% \qquad (6)$$

As the correlation between these seven indicators is not very strong, they can all be used in the following calculation and modeling. Following [16], min-max normalization is used to process the data. The entropy weight method (EWM) is widely used in research [17]. Based on that method, the weight of each indicator is calculated, and the weighted indicators are then combined to construct CnSENT.
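The normalization and weighting steps can be sketched as follows. This is a minimal illustration of the textbook entropy weight method, not necessarily the exact procedure of [16] or [17]. The function names `entropy_weights` and `composite_index` are hypothetical, and the sketch assumes that every indicator is positively oriented and that no indicator is constant over the sample.

```python
import numpy as np

def entropy_weights(indicators):
    """Min-max normalize a (days x indicators) array and return entropy weights.

    Assumption: every column varies over the sample, so max > min.
    """
    x = np.asarray(indicators, dtype=float)
    # Min-max normalization maps each indicator to [0, 1].
    span = x.max(axis=0) - x.min(axis=0)
    z = (x - x.min(axis=0)) / span
    # Share of each day's value in its column total.
    p = z / z.sum(axis=0)
    # Shannon entropy per column scaled to [0, 1]; 0 * log(0) is treated as 0.
    n = x.shape[0]
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    entropy = -plogp.sum(axis=0) / np.log(n)
    # Lower entropy means more dispersion, hence a larger weight.
    divergence = 1.0 - entropy
    return z, divergence / divergence.sum()

def composite_index(indicators):
    """Weighted sum of the normalized indicators (a stand-in for CnSENT)."""
    z, w = entropy_weights(indicators)
    return z @ w
```

Because the weights are non-negative and sum to one, the resulting index lies between 0 and 1 by construction.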

Standard HAR model and its extensions
Following [18], the daily realized volatility (RV) is defined as the sum of squared intraday returns:

$$RV_t = \sum_{i=1}^{M} r_{t,i}^2$$

where $M$ represents the sampling frequency and $r_{t,i}$ represents the $i$th intraday return on day $t$, given by $r_{t,i} = 100 \times (\ln P_{t,i} - \ln P_{t,i-1})$. [13] indicated that the heterogeneous autoregressive model for realized volatility (HAR-RV) is an attractive model, and it has been widely studied and applied. Many researchers use it as their benchmark model. In addition, compared with other models, it forecasts volatility more accurately [19]. The standard HAR-RV model can be specified as follows:

$$RV_{t+1} = \beta_0 + \beta^{(d)} RV_t^{(d)} + \beta^{(w)} RV_t^{(w)} + \beta^{(m)} RV_t^{(m)} + \varepsilon_{t+1}$$

where $\varepsilon_{t+1}$ is an unexpected error term, $RV_t^{(d)}$ is the lagged daily RV, $RV_t^{(w)}$ is the lagged weekly RV and $RV_t^{(m)}$ is the lagged monthly RV. Following the usual HAR convention, the weekly and monthly RV are averages of the daily RV over the past 5 and 22 trading days:

$$RV_t^{(w)} = \frac{1}{5} \sum_{j=0}^{4} RV_{t-j}^{(d)}, \qquad RV_t^{(m)} = \frac{1}{22} \sum_{j=0}^{21} RV_{t-j}^{(d)}$$

The standard HAR-RV model is used as the benchmark model for comparison and analysis. [11] noted that VXSLV contains information for predicting RV in the China silver futures market. [10] indicated that GVZ and VXSLV are helpful for RV forecasting in the China gold futures market. The question of most interest here is whether, in addition to VXSLV, GVZ and CnSENT can improve RV forecasting in the China silver futures market. Thus, the HAR-RV model is extended by adding GVZ, VXSLV and CnSENT terms, separately and in combination, which gives the HAR-GVZ, HAR-VXSLV, HAR-CnSENT, HAR-GVZ-VXSLV, HAR-VXSLV-CnSENT and HAR-GVZ-VXSLV-CnSENT models.
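As an illustration of the benchmark specification, the sketch below computes daily RV from intraday prices, builds the daily, weekly and monthly regressors, and estimates the HAR-RV coefficients by OLS. The function names are hypothetical, and the 5-day and 22-day averaging windows are the usual HAR convention, assumed here rather than taken from the paper.

```python
import numpy as np

def realized_variance(intraday_prices):
    """Daily RV: sum of squared 100 x log returns within one day."""
    p = np.asarray(intraday_prices, dtype=float)
    r = 100.0 * np.diff(np.log(p))
    return float(np.sum(r ** 2))

def har_design(rv, week=5, month=22):
    """Return regressors [1, RV_d, RV_w, RV_m] at day t and the target RV at t+1.

    Assumption: weekly and monthly terms are 5-day and 22-day averages.
    """
    rv = np.asarray(rv, dtype=float)
    rows, target = [], []
    # Start once a full monthly window is available; stop one day before the end.
    for t in range(month - 1, len(rv) - 1):
        rv_w = rv[t - week + 1:t + 1].mean()
        rv_m = rv[t - month + 1:t + 1].mean()
        rows.append([1.0, rv[t], rv_w, rv_m])
        target.append(rv[t + 1])
    return np.array(rows), np.array(target)

def fit_ols(X, y):
    """Least-squares coefficients (beta_0, beta_d, beta_w, beta_m)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

The extended models would add the corresponding GVZ, VXSLV or CnSENT columns to the design matrix before calling `fit_ols`.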

Ridge regression
Overfitting refers to a model that contains more variables than are needed, which violates parsimony [20]. Ridge regression has been used extensively to address this problem [21]. It is applied here to see whether it can improve forecast performance. Based on the method proposed by [22] and used by [10], the ridge estimator is calculated as:

$$\hat{\beta}_{ridge} = (X'X + KI)^{-1} X'Y$$

where $X$ stands for the $(t-1) \times 10$ matrix of explanatory variables, $Y$ represents the vector of the dependent variable, $I$ represents the identity matrix, and $K$ represents the ridge parameter.
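A minimal sketch of the closed-form ridge estimator follows. The function name `ridge_coefficients` is hypothetical, and the choice of the ridge parameter is left to the caller because the text does not state how it is selected. The sketch penalizes every column of X, which is an assumption about how the formula is applied.

```python
import numpy as np

def ridge_coefficients(X, y, k):
    """Closed-form ridge estimate (X'X + kI)^{-1} X'y.

    Assumption: all columns of X are penalized equally.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    identity = np.eye(X.shape[1])
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + k * identity, X.T @ y)
```

With `k = 0` the estimate reduces to OLS, and larger values of `k` shrink the coefficient vector towards zero.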

Out-of-sample forecasting method
To further explore the out-of-sample predictive performance of these variables, the method of [23] is followed and the out-of-sample R-squared statistic is computed as follows:

$$\Delta R^2_{OS} = 1 - \frac{\sum_{t} (RV_{t+1} - \widehat{RV}_{t+1})^2}{\sum_{t} (RV_{t+1} - \widehat{RV}_{t+1}^{HAR})^2}$$

where $RV_{t+1}$ represents actual RV, $\widehat{RV}_{t+1}$ represents forecasted RV and $\widehat{RV}_{t+1}^{HAR}$ represents the RV predicted by the HAR-RV model on day $t+1$. The sums run over the out-of-sample period. The results of the other models are compared with the result of the benchmark model. If $\Delta R^2_{OS}$ is positive, the model has better forecasting performance than the benchmark model. Moreover, the test of [24] is used to examine the null hypothesis that the mean squared forecast error (MSFE) of the HAR-RV model is less than or equal to the MSFE of the other model. The reported p-values are based on this test.
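The two evaluation statistics can be sketched as below. The function names are hypothetical. The test statistic is written in the standard MSPE-adjusted form for nested models, with a one-sided p-value from the normal approximation and no autocorrelation correction. This is an assumption about how the test of [24] is applied here, made for one-step-ahead forecasts.

```python
import math
import numpy as np

def delta_r2_oos(actual, model_fc, bench_fc):
    """Out-of-sample R-squared of a model relative to the benchmark forecasts."""
    actual, model_fc, bench_fc = (
        np.asarray(a, dtype=float) for a in (actual, model_fc, bench_fc)
    )
    sse_model = np.sum((actual - model_fc) ** 2)
    sse_bench = np.sum((actual - bench_fc) ** 2)
    return 1.0 - sse_model / sse_bench

def mspe_adjusted_test(actual, model_fc, bench_fc):
    """MSPE-adjusted statistic and one-sided p-value.

    H0: the benchmark MSFE is less than or equal to the model MSFE.
    """
    actual, model_fc, bench_fc = (
        np.asarray(a, dtype=float) for a in (actual, model_fc, bench_fc)
    )
    # Benchmark squared error minus the adjusted squared error of the larger model.
    f = (actual - bench_fc) ** 2 - (
        (actual - model_fc) ** 2 - (bench_fc - model_fc) ** 2
    )
    t_stat = f.mean() / (f.std(ddof=1) / math.sqrt(f.size))
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))
    return t_stat, p_value
```

A positive `delta_r2_oos` together with a small p-value would indicate that the extended model forecasts better than the benchmark.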

Results
In this section, all models are first estimated in-sample. Then, the out-of-sample R-squared statistic is used to assess the out-of-sample predictive performance of these variables. Finally, a different rolling window size is used as a robustness check to further confirm the results.

A. In-sample estimations
Table 3 summarizes the day $t+1$ in-sample estimation results of all models over the whole sample period from February 03, 2017, to June 24, 2020, obtained by ordinary least squares (OLS). $\Delta R^2$ represents the increase in the adjusted $R^2$ of each model relative to the adjusted $R^2$ of the benchmark model.
The results for the HAR-RV model show that $\beta^{(d)}$ and $\beta^{(m)}$ are significant at the 1% level while $\beta^{(w)}$ is not significant, suggesting that short-term and long-term lagged RV play an important role in the prediction of China silver futures volatility. In the HAR-GVZ model, two of the GVZ coefficients are significant at the 5% level or better. However, one of them has a negative influence on the forecast of China silver futures volatility, while the other has a positive impact. In the HAR-VXSLV model, two of the VXSLV coefficients are significant at the 5% level or better. However, contrary to the findings of [11], the remaining VXSLV coefficient is not found to contribute to the prediction of China silver futures volatility. It is interesting to note that CnSENT, which is constructed from the Chinese stock market, may not improve the prediction of China silver futures volatility, as its coefficient is not significant.
With regard to overall fit, the results indicate that GVZ and VXSLV have predictive power for the future volatility of China silver futures, which is consistent with the observations in [10] for China gold futures. VXSLV can improve volatility forecasts, which is also consistent with the results in [11] for China silver futures.
The $\Delta R^2$ values for the HAR-GVZ and HAR-VXSLV models are 10.1% and 21.9%, respectively, suggesting that VXSLV contributes more to forecasting China silver futures volatility. CnSENT does not seem to improve the forecast, as the adjusted $R^2$ of the HAR-CnSENT model is lower than that of the benchmark. Judged by $R^2$ alone, HAR-GVZ-VXSLV-CnSENT and HAR-GVZ-VXSLV are the two best models. However, when both $R^2$ and $\Delta R^2$ are considered, HAR-GVZ-VXSLV is the best model, as it generates the highest $\Delta R^2$. This result can be explained by the negative influence of the investor sentiment variable. Overall, the in-sample results indicate that the U.S.-based gold and silver volatility indices are helpful for forecasting China silver futures volatility, while investor sentiment does not improve the performance of the model.

Notes to Table 3: The table shows day $t+1$ in-sample estimation results of all models over the whole sample period from February 03, 2017, to June 24, 2020. $\Delta R^2$ is the increase or decrease in the adjusted R-squared of each model relative to the adjusted R-squared of the benchmark model. * indicates significance at the 10% level. ** indicates significance at the 5% level. *** indicates significance at the 1% level.

B. Out-of-sample evaluations
According to [10], the in-sample test is used to confirm how well the model fits past patterns, while the out-of-sample test is used to check that the model can forecast future performance. From the perspective of investors, these researchers paid more attention to the out-of-sample results. In this paper, the out-of-sample R-squared statistic is used to gain a deeper understanding of the predictive power of the models for future trends. In particular, whether GVZ, VXSLV, and CnSENT contain information for forecasting the volatility of China silver futures is examined.
Figure 2 shows actual RV and the benchmark model's forecasted RV based on a 350-day rolling window. It indicates that the forecasts generated by the benchmark model capture some movements of realized volatility in the China silver futures market. Figure 3 shows actual RV and all other models' forecasted RV based on a 350-day rolling window. There are two ridge regression models: the $X_t$ in M8 contains all variables except CnSENT, and the $X_t$ in M9 contains all variables. Compared with the benchmark model, the other HAR-type models perform better, as they generate more sensitive forecasts. This means that U.S.-based information is very useful for the prediction of China silver futures volatility. Furthermore, the results show that the ridge regression models predict realized volatility more precisely than the benchmark model.
A summary of the out-of-sample results based on a 350-day rolling window is given in Table 4. $\Delta R^2_{OS}$ represents the change in the out-of-sample $R^2$ of each model relative to the out-of-sample $R^2$ of the benchmark model. The p-values are based on the test of [24].
The results for all models are significant at the 5% level or better. Incorporating GVZ into the benchmark model gives a $\Delta R^2_{OS}$ of 7.76%, suggesting that GVZ can improve forecasting performance. Including VXSLV gives a $\Delta R^2_{OS}$ of 17.43%, indicating that VXSLV contributes more to predictive accuracy than GVZ, which is consistent with the in-sample results. However, the contribution of CnSENT is still doubtful, as some models show a positive influence on volatility forecasting while the HAR-VXSLV-CnSENT model shows the opposite. The HAR-GVZ-VXSLV-CnSENT model generates the maximum $\Delta R^2_{OS}$ (21.35%). Moreover, the HAR-GVZ-VXSLV model retains very good forecasting performance, indicating that the U.S.-based GVZ and VXSLV have significant predictive power for the future volatility of China silver futures.
For the out-of-sample test, the ridge regression method is used to examine whether addressing the overfitting issue yields better predictive performance than the other models. However, the results reveal that the ridge regression models do not perform better than the other models. With CnSENT included, the ridge result is in accord with that of the HAR-GVZ-VXSLV-CnSENT model. Without CnSENT, its prediction performance is the same as that of the HAR-GVZ-VXSLV model.
Overall, the results support the usefulness of GVZ and VXSLV for forecasting the volatility of China silver futures. However, the forecasting ability of CnSENT is questionable.

Notes to Table 4: Significance is based on the p-value of the MSPE-adjusted statistic. The hypothesis test examines whether the mean squared forecast error (MSFE) of the HAR-RV model is less than or equal to the MSFE of the other models [24]. * indicates significance at the 10% level. ** indicates significance at the 5% level. *** indicates significance at the 1% level.

C. Robustness check with a different rolling window
Using a different rolling window size helps to determine whether the conclusions are robust. The choice of estimation window size has attracted the attention of many researchers, as different window sizes may lead to different empirical conclusions [25]. Thus, following [10], a 250-day rolling window is used to repeat the estimation. Table 5 summarizes the out-of-sample results based on different window sizes. The empirical results show that GVZ and VXSLV, each incorporated into the benchmark model separately, can be used to predict the volatility of China silver futures. Compared with GVZ, VXSLV has better predictive power. HAR-GVZ-VXSLV-CnSENT is the best model, and HAR-GVZ-VXSLV still maintains good prediction performance. The role of CnSENT in volatility forecasting is still doubtful. The two ridge regression models do not perform better than the other models. The results based on a 250-day rolling window are consistent with the conclusions based on a 350-day rolling window.

Notes to Table 5: Significance is based on the p-value of the MSPE-adjusted statistic. The hypothesis test examines whether the mean squared forecast error (MSFE) of the HAR-RV model is less than or equal to the MSFE of the other models [24]. * indicates significance at the 10% level. ** indicates significance at the 5% level. *** indicates significance at the 1% level.
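The rolling-window procedure can be sketched as follows, with the window length as a parameter so that the 350-day and 250-day settings use the same code. The function name `rolling_forecasts` is hypothetical, and re-estimating the model by OLS on each window is an assumption, since the text specifies only the window sizes.

```python
import numpy as np

def rolling_forecasts(X, y, window):
    """One-step-ahead forecasts from OLS re-estimated on a rolling window.

    Row i of X holds regressors known on day t, and y[i] is RV on day t+1.
    Returns the forecasts and the matching realized values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    forecasts = []
    for end in range(window, len(y)):
        # Estimate on the most recent `window` rows only; their targets
        # are all observed by the day on which X[end] becomes available.
        beta, *_ = np.linalg.lstsq(
            X[end - window:end], y[end - window:end], rcond=None
        )
        forecasts.append(X[end] @ beta)
    return np.array(forecasts), y[window:]
```

Calling the function with `window=350` and then `window=250` on the same design matrix gives the two forecast series that a robustness comparison of this kind would evaluate.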

Conclusions
This paper focuses on the volatility prediction of China's silver futures market. Specifically, the roles of the CBOE gold ETF volatility index, the CBOE silver ETF volatility index, and a Chinese investor sentiment index constructed from the Chinese stock market are investigated from both in-sample and out-of-sample perspectives.
The results reveal that GVZ and VXSLV can significantly improve the accuracy of volatility forecasts for China silver futures. VXSLV plays a more important role than GVZ. Based on different types of HAR models, the results show that GVZ at the daily horizon and VXSLV at the daily and monthly horizons have a negative impact on the volatility prediction. However, based on the in-sample and out-of-sample results, it cannot be concluded that CnSENT plays a crucial role in volatility forecasting, as the contribution of this variable is unstable. This may be due to illiquidity in the Chinese silver futures market. Furthermore, although the ridge regression method is used to deal with overfitting, its forecasting performance is not better than that of the HAR-type models.
It would be interesting to further examine whether CnSENT can assist in forecasting the volatility of China silver futures by using other models, or by using other indicators to construct a new Chinese investor sentiment index. In addition, many practitioners currently also take leverage effects and geopolitical risks into consideration when forecasting volatility. These issues are left for further study.