Big Data Industrial Agglomeration Promoting Regional Innovation: Comparison between Guangzhou and Zhaoqing in China

This paper uses data on the big data industry in China's "Guangzhou Development Zone Big Data Industrial Park" and "Zhaoqing Big Data Cloud Service Industrial Park" from 2014 to 2018, establishes an OLS model based on an improved knowledge production function, and compares the impact of MAR and Jacobs agglomeration externalities on R&D input and patent output in Guangzhou and Zhaoqing. The findings are: (1) MAR externality is not conducive to technological innovation in either city and has a stronger negative effect on innovation in Zhaoqing; Jacobs externality actively promotes innovation in both cities and has a stronger positive effect on innovation in Guangzhou. (2) In the impact of Jacobs externality on the innovation output of the two cities, R&D plays a partial intermediary role, and the effect is stronger in Guangzhou; in the impact of MAR externality on innovation output, R&D plays a partial, negative intermediary role only in Zhaoqing. The conclusions show that both MAR and Jacobs agglomeration in the big data industry play more effective roles in promoting technological innovation in economically developed cities.


Introduction
The big data industry is built on cloud computing, the internet of things and other key technologies, and represents the development direction of scientific and technological innovation in the new era. The big data industry and technological innovation complement each other. On the one hand, extensive scientific and technological innovation practice has accumulated large volumes of data, providing material for the construction of big data platforms and for big data research; on the other hand, the construction of big data platforms reduces the exploration cost of technological innovation and makes innovation more convenient.
Practice shows that the development of the big data industry helps to guide the input and output of regional technological innovation. However, the construction of the big data industry is still at an early stage, and there is little practical experience or research for reference. Therefore, this paper discusses the impact of the big data industry on the input and output of regional innovation by drawing on research on traditional industrial agglomeration and regional innovation. Studies of the influence of traditional industrial agglomeration on regional innovation leave two common problems. (1) They are generally based on the "innovation of two-stage value chain" (Guan and Chen, 2010), but the relationship between innovation input and innovation output is rarely discussed (Feldman and Audretsch, 1999; Paci and Usai, 1999); that is, the endogeneity between the two stages needs to be handled better. (2) There are regional differences in the development level of the big data industry, so it is necessary to clarify whether the impact of big data industry variables on regional innovation also differs across regions.

Data Sources
In this paper, the sample is limited to Guangdong province, China, which is a leading province in big data industry development. At present, there are 16 provincial big data industrial parks in Guangdong province. In order to find out whether the impact of the big data industry on regional innovation differs across regions, this paper makes a comparative analysis of two cities in the province with a large gap in economic strength. We choose "Guangzhou Development Zone Big Data Industrial Park" and "Zhaoqing Big Data Cloud Service Industrial Park" as the research objects. We choose these two cities because they are a first-tier city and a third-tier city respectively, and their technological innovation capabilities differ greatly. The impact of big data on technological innovation may therefore differ between the two cities, which offers an interesting comparative perspective.
The big data industry data come from the website of China's "State Intellectual Property Office" ①. An advanced search was carried out in the "patent retrieval" module to obtain the relevant data for the two industrial parks for 2014-2018. The keywords are: ("big data" or "map reduce" or "hadoop" or "data mining" or "cloud computing" or "structured" or "K-means" or "spark" or "mahout" or "slope" or "association rule" or "multi fork tree"). Other data are from the Guangzhou Statistical Yearbook, the Guangzhou Science and Technology Statistical Yearbook, the Zhaoqing Statistical Yearbook and the Zhaoqing Science and Technology Statistical Yearbook.
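The keyword filter above amounts to a boolean OR over quoted terms. A minimal sketch of assembling such a query string is given below. The function name `build_or_query` is hypothetical, and the exact query syntax accepted by the patent retrieval module is not stated in this paper, so the output format is an assumption that simply mirrors the expression as printed.

```python
# Keywords exactly as listed in the text.
KEYWORDS = [
    "big data", "map reduce", "hadoop", "data mining", "cloud computing",
    "structured", "K-means", "spark", "mahout", "slope",
    "association rule", "multi fork tree",
]


def build_or_query(keywords):
    """Quote each keyword and join with 'or' inside parentheses.

    Hypothetical helper: the real search syntax is an assumption.
    """
    quoted = ['"{}"'.format(k) for k in keywords]
    return "(" + " or ".join(quoted) + ")"


if __name__ == "__main__":
    print(build_or_query(KEYWORDS))
```

Keeping the keywords in a list makes it easy to rerun the retrieval with a different term set and compare the resulting patent counts.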

Explanatory variables
Glaeser et al. (1992) put forward "MAR externality" and "Jacobs externality". "MAR externality" emphasizes specialized knowledge spillover within the same industry, that is, intra-industry knowledge spillover is an important source of regional innovation and economic growth; "Jacobs externality" holds that knowledge spillover between different industries is the main driving force of regional innovation and economic growth. Based on this, this paper focuses on how the specialization and diversity externalities of the big data industry differ between the two cities in their effect on local technological innovation. The specialized externality variable is calculated as shown in formula (1), where $t$, $i$ and $k$ denote year, city and industry respectively, and the output term represents the total output value of industrial production. $MAR_{i,t}$ measures the difference between the specialization structure of urban industrial knowledge and the average level of the other city. The larger its value, the stronger the effect of industrial knowledge specialization. If the coefficient of $MAR_{i,t}$ is significantly positive, the specialization of local industry helps to improve the level of local innovation.
Following the measurement method for the level of industrial diversification (Duranton and Puga, 2000), the Jacobs index is calculated as shown in formula (2). $Jacobs_{i,t}$ measures the level of industrial diversification. The greater its value, the stronger the diversification and the greater the degree of decentralization. If the coefficient of $Jacobs_{i,t}$ is significantly positive, the level of local industrial diversification helps to improve the level of innovation in the region.
① http://pss-system.cnipa.gov.cn/sipopublicsearch/portal/uiIndex.shtml
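Formulas (1) and (2) are not reproduced in this text, so the following is only an illustrative sketch of two commonly used index forms and should not be read as the paper's definitions. It assumes a specialization measure equal to the ratio of an industry's output share in one city to its share in the other city, and a diversity measure equal to the inverse Herfindahl index of industry output shares. All function names and the sample figures are hypothetical.

```python
def output_shares(outputs):
    """Convert a dict of industry -> output value into output shares."""
    total = sum(outputs.values())
    return {k: v / total for k, v in outputs.items()}


def specialization_index(city_outputs, other_outputs, industry):
    """Assumed MAR-style index: share in this city / share in the other city."""
    s_city = output_shares(city_outputs)[industry]
    s_other = output_shares(other_outputs)[industry]
    return s_city / s_other


def diversity_index(city_outputs):
    """Assumed Jacobs-style index: inverse Herfindahl of output shares."""
    shares = output_shares(city_outputs)
    return 1.0 / sum(s ** 2 for s in shares.values())


if __name__ == "__main__":
    # Made-up output values for two cities, for illustration only.
    city_a = {"big_data": 30.0, "manufacturing": 50.0, "services": 20.0}
    city_b = {"big_data": 10.0, "manufacturing": 70.0, "services": 20.0}
    print(specialization_index(city_a, city_b, "big_data"))
    print(diversity_index(city_a))
    print(diversity_index(city_b))
```

Under these assumed forms, a specialization value above 1 means the industry is more concentrated in the city than in the comparison city, and the diversity value rises as output is spread more evenly across industries.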

Control variables
(1) The patent application count in big data industry (represented by L). This variable can examine the stimulating effect of big data patent applications on promoting local R&D investment.

Explained variable
We use local R&D input (R&D) and patent authorization (Lngpt) as the explained variables. Together they embody the idea of the two-stage value chain of regional innovation.

Empirical analysis
Improved Knowledge Production Model
This paper draws on the sectoral knowledge production function with constant returns to scale of Kluge and Lehmann (2013), and adapts it to the research needs as shown in formula (3). In formula (3), $L_{i,t}$ and $F_{i,t}$ respectively represent the patent application counts of the big data industry and the amount of fixed asset investment; $A_{i,t}$ is the technical level, namely regional total factor productivity (TFP); $X$ is a variable parameter; and the remaining two parameters are the corresponding output elasticities. $A_{i,t}$ in formula (3) cannot be directly observed. Following Martin et al. (2011), this paper assumes that technological progress takes the form of formula (4). Formula (4) shows that technological progress depends on the specialization $MAR_{i,t}$ and diversification $Jacobs_{i,t}$ of knowledge externality agglomeration, while $U_{i,t}$ represents regional characteristics. The function in formula (4) represents the possible functional relationship between the degrees of industrial specialization and diversification, and its logarithmic form is assumed to be formula (5). Taking the logarithm of equation (3) then gives the general form shown in equation (6). Equation (6) is similar to the Innovation Factor Model (IFM) of Zhou and Li (2020), and its derivation process is consistent.
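The equation bodies are missing from this text. One set of forms consistent with the definitions above is sketched below, numbered to mirror the references in the text. The Cobb-Douglas shape, the symbol $Y_{i,t}$ for innovation output, the elasticity symbols $\alpha$ and $\beta$, and the coefficients $\theta_1$ and $\theta_2$ are all assumptions introduced for illustration, not the paper's stated equations.

```latex
\begin{align}
Y_{i,t} &= A_{i,t}\, L_{i,t}^{\alpha}\, F_{i,t}^{\beta} \tag{3} \\
A_{i,t} &= f\!\left(MAR_{i,t},\, Jacobs_{i,t}\right) U_{i,t} \tag{4} \\
\ln f\!\left(MAR_{i,t},\, Jacobs_{i,t}\right) &= \theta_1 \ln MAR_{i,t} + \theta_2 \ln Jacobs_{i,t} \tag{5} \\
\ln Y_{i,t} &= \theta_1 \ln MAR_{i,t} + \theta_2 \ln Jacobs_{i,t}
             + \alpha \ln L_{i,t} + \beta \ln F_{i,t} + \ln U_{i,t} \tag{6}
\end{align}
```

In this sketch, substituting (4) and (5) into the logarithm of (3) gives (6) term by term, and constant returns to scale in $L_{i,t}$ and $F_{i,t}$ would correspond to $\alpha + \beta = 1$.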

Effects of Big Data Industry on Regional Innovation
OLS regression is carried out for equation (6) using EViews 7.0, and the results are shown in Table 1.
Table 1. Impact of big data industry on regional innovation
Note: *, ** and *** indicate significance at the 10%, 5% and 1% levels respectively.
Table 1 shows that: (1) MAR externality has a negative impact on the R&D input and patent output of the two cities, indicating that the more specialized the big data industry is, the lower the local innovation level. Because the industries are highly similar and lack motivation to learn from each other, R&D investment is also weak. The negative impact is more pronounced in Zhaoqing, which shows that specialized agglomeration weakens innovation more in the smaller city. Jacobs externality has a significant positive impact on the innovation input and output of the two cities, indicating that the more diversified the big data industry is, the higher the local innovation level. Diversified industries have stronger incentives to learn from each other, so R&D investment is higher, and knowledge spillover further promotes patent output.
(2) The greater the patent application counts of the big data industry, the stronger the R&D input and patent output capacity of the two cities, indicating that the innovation capacity of the big data industry raises the regional innovation level. In addition, the fixed asset input of the big data industry has a significant positive impact on the innovation input and output of the two cities, indicating either that the fixed asset input of the big data industrial parks has a positive guiding effect on regional innovation, or that fixed asset input in big data industry innovation and in regional innovation are linked.
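The estimation step can be illustrated with a small OLS sketch. It is not the paper's EViews workflow and uses no data from the paper: the regressors are synthetic, the log-linear specification (an intercept plus the logs of MAR, Jacobs, L and F) is an assumed reading of equation (6), and the "true" coefficients are made up so that the fit can be checked against them.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients for y = b0 + X b, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def simulate(n=200, seed=0):
    """Generate synthetic log-regressors and an outcome (illustrative only)."""
    rng = np.random.default_rng(seed)
    ln_mar = rng.normal(size=n)
    ln_jacobs = rng.normal(size=n)
    ln_l = rng.normal(size=n)
    ln_f = rng.normal(size=n)
    # Made-up coefficients: intercept, MAR, Jacobs, L, F.
    true = np.array([0.5, -0.3, 0.6, 0.4, 0.2])
    X = np.column_stack([ln_mar, ln_jacobs, ln_l, ln_f])
    y = true[0] + X @ true[1:] + rng.normal(scale=0.1, size=n)
    return X, y, true


if __name__ == "__main__":
    X, y, true = simulate()
    names = ["const", "ln_MAR", "ln_Jacobs", "ln_L", "ln_F"]
    for name, est, tv in zip(names, ols(X, y), true):
        print(f"{name}: estimate={est:.3f}, true={tv:.3f}")
```

A negative estimate on the MAR term and a positive one on the Jacobs term would match the qualitative pattern described for Table 1. The synthetic sample is far larger than a 2014-2018 series for two parks, so the precision here says nothing about the paper's estimates.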

Intermediary Effect of R&D
As mentioned above, R&D input in regional innovation affects patent output, so there may be endogeneity in the regression model. Therefore, this paper takes R&D investment as the intermediary variable to judge whether the two kinds of big data industry agglomeration externalities act on regional innovation output through an intermediary effect. The intermediary effect is tested in three steps: first, the basic explanatory variable has a significant impact on the intermediary variable; second, the basic explanatory variable has a significant impact on the explained variable; third, the intermediary variable is added to the explanatory variables, and both the intermediary variable and the basic explanatory variable have significant impacts on the explained variable.
The intermediary effect is judged from the coefficients and significance levels of the basic explanatory variable in the second and third steps. If the coefficient of the basic explanatory variable in the third step is smaller than that in the second step, there is a partial intermediary effect; if the basic explanatory variable in the third step is not significant, there is a complete intermediary effect (Baron and Kenny, 1986). The three steps correspond to equations (7), (8) and (9), and the regression results are summarized in Table 2.
① R&D investment is taken as the current-period variable (period t, 2017), and it affects patent output in the next period (period t+1, 2018), so there is a time sequence relationship. This setting can avoid the endogeneity problem.
Table 2 shows that: (1) For Guangzhou, the influence of MAR externality on R&D is not significant, indicating that there is no intermediary effect. In contrast, for Zhaoqing, MAR externality has a significant influence on the explained variables, the significance levels in model (8) and model (9) are the same, and the absolute value of the coefficient in model (9) is smaller than that in model (8), indicating that R&D plays a partial intermediary role in the influence of MAR externality on patent output in Zhaoqing. Note that this intermediary effect is negative: the higher the level of specialized agglomeration of the big data industry, the more strongly it inhibits R&D investment in relatively underdeveloped regions, which is not conducive to their patent output. The specialized agglomeration of the big data industry does not act on innovation output only through R&D; it also directly inhibits exchange and learning between industries in relatively backward regions, which is likewise not conducive to patent output.
(2) The Jacobs externality of the big data industry has a significant positive impact on R&D input and patent output at the level of 0.10 or 0.05 respectively. For both cities, the regression coefficient of Jacobs externality in model (9) is smaller than that in model (8), and the significance level is the same, indicating that R&D plays a partial intermediary role in the impact of Jacobs externality on patent output. That is, diversified big data industry agglomeration promotes patent output partly by stimulating the R&D input of local related industries, but patent output is not created by R&D input alone: diversified agglomeration itself also stimulates learning and exchange between local related industries, thus promoting patent output.
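The three-step procedure described above can be sketched on synthetic data as follows. This is an illustration of the Baron and Kenny (1986) logic only: the variable names, the generated data and the coefficient values are hypothetical, control variables are left out, and the significance tests that the procedure relies on are omitted, so only the coefficient comparison between the second and third steps is shown.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients for y = b0 + X b, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def mediation_steps(x, m, y):
    """Run the three regressions and return the key slope coefficients."""
    a = ols(x, m)[1]                      # step 1: x -> intermediary m
    c = ols(x, y)[1]                      # step 2: x -> outcome y (total)
    step3 = ols(np.column_stack([x, m]), y)
    c_prime, b = step3[1], step3[2]       # step 3: x and m -> y
    return {"a": a, "c": c, "c_prime": c_prime, "b": b}


def simulate_mediation(n=500, seed=1):
    """Synthetic data where m carries part of the effect of x on y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                # e.g. an externality index
    m = 0.8 * x + rng.normal(scale=0.1, size=n)            # e.g. R&D input
    y = 0.5 * x + 0.6 * m + rng.normal(scale=0.1, size=n)  # e.g. patents
    return x, m, y


if __name__ == "__main__":
    r = mediation_steps(*simulate_mediation())
    partial = abs(r["c_prime"]) < abs(r["c"])
    print(r)
    print("coefficient shrinks in step 3:", partial)
```

In linear OLS the total effect decomposes as `c = c_prime + a * b`, so a step-three coefficient smaller in absolute value than the step-two coefficient is the pattern read in the text as a partial intermediary effect.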

Conclusions and policy implications
This paper takes two big data industrial parks in Guangzhou and Zhaoqing, Guangdong province, China as the research objects, uses the improved knowledge production function to analyze the impact of the externalities of specialized and diversified industrial agglomeration on technological innovation in the two regions, and compares the results. The findings are: (1) MAR externalities are not conducive to the R&D input and patent output of the two cities, and the negative impact on Zhaoqing is stronger; on the contrary, Jacobs externalities have a significant positive impact on the innovation input and output of the two cities, and there is no significant difference between the two cities.
(2) The more patent applications in the big data industry, the greater the innovation input and output of the two cities; the more fixed asset input, the greater the innovation input and output. (3) In the impact of the two kinds of big data industry agglomeration externalities on the innovation output of the two cities, R&D investment plays an intermediary role for both. The conclusions of this study support the theory of Jacobs externality.
According to the above conclusions, the policy recommendations are as follows. First, the specialized agglomeration of the big data industry is not conducive to regional scientific and technological innovation, so excessive specialization should be avoided in developing the big data industry, especially in relatively backward regions. Second, the diversified agglomeration of the big data industry is conducive to regional innovation, especially in relatively developed regions, so the development of industrial diversity should be promoted. Third, in the process of regional innovation development, the government should reasonably guide and support R&D investment, so that big data industry agglomeration promotes regional innovation output more smoothly.
In this paper, only two cities are selected as the research objects. Future research should examine the impact of big data industry agglomeration on regional innovation and its spatial effect.