The behavior strategies between the government and power generation enterprises considering the learning mechanism based on evolutionary game

The Renewable Portfolio Standard (RPS), a policy tool for promoting renewable energy development, has been in use in China for more than ten years. To study the strategic interaction between the government and power generation enterprises during the transformation and upgrading of the energy system, a learning mechanism was introduced on the basis of a dynamic reward and punishment mechanism, and an evolutionary game model between the government and power generation enterprises was established. The results showed that the evolutionary stability strategy depends on the dynamic reward and punishment mechanism, which helps the system converge gradually to a stable state. The learning mechanism not only reduced the cost of wind power but also reduced the probability of government supervision.


Introduction
With the rapid development of the economy and society, competition among countries for fossil energy such as coal, oil and natural gas has intensified. At present, coal and other fossil energy sources supply about 70% of China's electricity. Since the reform and opening up, China's cumulative installed power generation capacity and total power generation have grown rapidly, with an average annual growth rate that has even exceeded the growth rate of energy production [1]. Although China's thermal-power-based power structure is being continuously optimized, it still differs from that of developed countries.
China is the world's largest emitter, with emissions from electricity accounting for about 50% of its total emissions [2]. Power generation enterprises (hereinafter referred to as "enterprises") play an important role in the power market and are key actors in the continuous optimization of the power structure. At the same time, enterprises are stakeholders: optimizing and upgrading the power structure means increasing the proportion of renewable energy power generation, and the additional production cost leaves enterprises without the motivation to fulfill their obligations. Therefore, the government must establish relevant laws and systems to guide and support enterprises through incentive policies. The Fixed Feed-in Tariff (FIT) and the Renewable Portfolio Standard (RPS) are the two most important incentive policies for renewable energy. In recent years, they have achieved significant results in renewable energy power generation in China. China's power market currently uses FIT and RPS together and is in a gradual transition from FIT to RPS [3]. Wind power, as a clean and efficient renewable energy source, has great potential for development in China. Although the growth of newly installed wind power capacity has slowed in recent years, cumulative installed capacity has maintained a steady upward trend. Despite these broad development prospects, the instability and uncontrollability of wind power lead to a relatively high average cost and serious wind curtailment. The government has made significant progress in promoting the renewable energy power industry but still faces many difficulties in implementing the policy.
Domestic and foreign scholars have done a great deal of research on the contradictions and conflicts among stakeholders in the electricity market. The extensive application of evolutionary game theory provides new ideas for the transformation and upgrading of energy systems. This method of analysis, which assumes bounded rationality of individuals and accounts for mutual imitation and learning within groups during strategy selection, is more representative [4][5][6]. In addition, the cost advantages that enterprises gain through technological progress and accumulated experience in daily production cannot be ignored [7]. Including this learning mechanism in the model improves its practical significance.

Model hypothesis
We make the following model hypotheses according to evolutionary game theory:
(1) Both game subjects, the government and the enterprises, are boundedly rational, and information is complete.
(2) Faced with the government's market regulations on wind power, enterprises decide whether to increase the proportion of wind power, that is, they choose between the two strategies "obey" and "disobey", while the government chooses between the two strategies "supervise" and "non-supervise".
(3) The cumulative power generation of the enterprises is $Q$, and the proportion of wind power generation added by the enterprises is $\alpha$, so the wind power generation added by the enterprises is $\alpha Q$.
(4) Because renewable energy such as wind energy is intermittent and unstable, its transmission cost is higher than that of conventional energy. The government therefore needs to establish a corresponding compensation mechanism to encourage the transmission and management of wind power in the grid.
According to these hypotheses, the payoff matrix of the evolutionary game and the main parameters are shown in Table 1 and Table 2.
Table 1. Evolutionary game payoff matrix.

Model construction
Assume that the probability that the government selects the "supervise" strategy is $x$ ($0 \le x \le 1$), and the probability that the enterprises choose the "obey" strategy is $y$ ($0 \le y \le 1$).
Thus, the replicator dynamic equations of system (I), which consists of the strategies of the government and the enterprises, can be obtained as follows:
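As a generic illustration of how replicator dynamics of this kind are formed, the sketch below builds the two-population equations for an arbitrary 2×2 game. The payoff values in `A` and `B` are hypothetical placeholders (a supervision cost of 1, a fine of 4 and a compliance cost of 2), not the entries of Table 1, and the name `replicator_rhs` is illustrative only.

```python
def replicator_rhs(x, y, A, B):
    """Right-hand side of two-population replicator dynamics for a 2x2 game.

    x: share of the government choosing "supervise"
    y: share of enterprises choosing "obey"
    A[i][j]: government payoff, B[i][j]: enterprise payoff, with
    i = 0 supervise / 1 non-supervise and j = 0 obey / 1 disobey.
    """
    # Expected government payoff of each pure strategy against the mix y.
    u_supervise = y * A[0][0] + (1 - y) * A[0][1]
    u_non_supervise = y * A[1][0] + (1 - y) * A[1][1]
    # Expected enterprise payoff of each pure strategy against the mix x.
    v_obey = x * B[0][0] + (1 - x) * B[1][0]
    v_disobey = x * B[0][1] + (1 - x) * B[1][1]
    # A strategy's share grows in proportion to its payoff advantage.
    dx = x * (1 - x) * (u_supervise - u_non_supervise)
    dy = y * (1 - y) * (v_obey - v_disobey)
    return dx, dy


# Hypothetical payoffs (assumption, not taken from the paper).
A = [[-1.0, 3.0],   # supervise: pay cost 1; collect fine 4 if disobeyed
     [0.0, 0.0]]    # non-supervise
B = [[-2.0, -4.0],  # supervised: obey costs 2, disobey is fined 4
     [-2.0, 0.0]]   # unsupervised: obey costs 2, disobey costs nothing

print(replicator_rhs(0.5, 0.5, A, B))
```

With these placeholder payoffs, both derivatives vanish at the interior point (0.5, 0.75), which is the mixed equilibrium of this toy game.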

Analysis of evolutionary stability strategy without considering learning mechanism
The existing literature shows that under a static reward and punishment system, the government and enterprises cannot reach the expected Evolutionary Stability Strategy (ESS) [8]. Therefore, the influence of a dynamic reward and punishment system on the evolutionary strategy is considered. The replicator dynamic equations of system (II) can then be obtained as follows:
According to the replicator dynamic equations, the equilibrium points of system (II) are (0,0), (0,1), (1,0), (1,1) and $(x_1, y_1)$. According to Friedman's method for testing the properties of equilibrium points [9], the equilibrium points (0,0), (0,1), (1,1), (1,0) of system (II) are saddle points, while the equilibrium point $(x_1, y_1)$ is asymptotically stable, and its characteristic roots are a pair of complex roots with negative real parts. It can be concluded that $(x_1, y_1)$ is the ESS of the system, and the evolutionary trajectory of system (II) is a curve that spirals in toward the focus $(x_1, y_1)$.
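The local stability test described above can be sketched numerically: estimate the Jacobian at an equilibrium point, then read the point type from its trace and determinant. The system in `rhs` is a hypothetical stand-in with a fine that shrinks as compliance grows, $F(y) = f(1 - y)$; its parameters `c`, `k` and `f` are assumptions, since the equations of system (II) are not reproduced here.

```python
def rhs(x, y, c=1.0, k=2.0, f=16.0):
    """Hypothetical dynamics with a dynamic fine F(y) = f * (1 - y).

    c: supervision cost, k: compliance cost, f: maximum fine
    (all values are assumptions made for illustration).
    """
    fine = f * (1.0 - y)
    dx = x * (1.0 - x) * ((1.0 - y) * fine - c)
    dy = y * (1.0 - y) * (x * fine - k)
    return dx, dy


def jacobian(fun, x, y, h=1e-6):
    """Central-difference estimate of the 2x2 Jacobian of fun at (x, y)."""
    fxp, gxp = fun(x + h, y)
    fxm, gxm = fun(x - h, y)
    fyp, gyp = fun(x, y + h)
    fym, gym = fun(x, y - h)
    return [[(fxp - fxm) / (2 * h), (fyp - fym) / (2 * h)],
            [(gxp - gxm) / (2 * h), (gyp - gym) / (2 * h)]]


def classify(J, tol=1e-6):
    """Classify an equilibrium from the trace and determinant of J."""
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    if det < -tol:
        return "saddle"
    if det > tol and tr < -tol:
        # A negative discriminant means complex roots, i.e. a spiral.
        return "stable focus" if tr * tr - 4 * det < 0 else "stable node"
    if det > tol and tr > tol:
        return "unstable"
    return "indeterminate"


# With c=1, k=2, f=16 the interior equilibrium is (0.5, 0.75).
print(classify(jacobian(rhs, 0.5, 0.75)))
print(classify(jacobian(rhs, 0.0, 0.0)))
```

For this toy system the interior point has trace -1.5 and determinant 1.5, so its characteristic roots are complex with negative real parts, the same qualitative type reported for $(x_1, y_1)$.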

Analysis of evolutionary stability strategy considering learning mechanism
This paper introduces a single-factor learning curve model, in which the unit cost of renewable energy decreases as the cumulative installed capacity increases [10]. The basic model is: $SC = a \cdot CC^{-b}$, where $SC$ is the unit capacity cost of a renewable energy source, $a$ is the initial unit capacity cost, $b$ is the yield elasticity coefficient, and $CC$ is the cumulative installed capacity.
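A minimal sketch of a single-factor learning curve follows, assuming the commonly used power-law form $SC = a \cdot CC^{-b}$ with $b > 0$. The function names and the numerical values of `a` and `b` are hypothetical and chosen only to show the shape of the curve.

```python
def unit_cost(a, b, cc):
    """Unit cost under the assumed power-law learning curve a * cc**(-b).

    a: initial unit cost (cost at cc = 1)
    b: elasticity coefficient, assumed positive
    cc: cumulative capacity, in units where the initial capacity is 1
    """
    return a * cc ** (-b)


def learning_rate(b):
    """Fractional cost reduction for each doubling of cumulative capacity."""
    return 1.0 - 2.0 ** (-b)


# Hypothetical values: a = 0.56 and b = 0.2 are illustrative only.
for cc in (1.0, 2.0, 4.0):
    print(cc, round(unit_cost(0.56, 0.2, cc), 4))
print("learning rate:", round(learning_rate(0.2), 4))
```

Under this form each doubling of cumulative capacity multiplies the unit cost by the same factor $2^{-b}$, which is why a single parameter summarizes the learning effect.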
In this paper, we simplify the analysis by replacing the cumulative installed capacity with the cumulative power generation, without considering the actual capacity utilization rate of the enterprises. Assuming that the initial wind power proportion of the enterprises is $\alpha_0$, the actual cost difference of the enterprises under the learning mechanism should be rewritten as follows:

So we can get the new replicator dynamic equations of system (III):
In the same way, five equilibrium points of system (III) can be obtained under the conditions $0 \le x \le 1$ and $0 \le y \le 1$: (0,0), (0,1), (1,0), (1,1) and $(x_2, y_2)$. In addition, we can also calculate that:
In the same way, the equilibrium points (0,0), (0,1), (1,1), (1,0) of system (III) are saddle points, while the equilibrium point $(x_2, y_2)$ is asymptotically stable. Thus $(x_2, y_2)$ is the evolutionary stability strategy (ESS) of the system, and the evolutionary trajectory of system (III) is a curve that spirals in toward the focus $(x_2, y_2)$.

Case analysis
Under the premise of satisfying the basic assumptions of the model, the initial parameters are assigned and analyzed.

The situation of China's electricity market
In 2017, China's annual power generation was $44.2 \times 10^{11}$ kWh, of which wind power generation was $3 \times 10^{11}$ kWh, accounting for about 6.8% of total power generation. In addition, the average price of wind power in China was 0.56 Yuan/kWh, while the average price of traditional thermal power in the same period was 0.37 Yuan/kWh, so the upper limit of government subsidies for wind power should be 0.19 Yuan/kWh. Because thermal power technology is mature, it is assumed that the average price of traditional thermal power equals its average production cost.
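The two derived figures in this paragraph follow directly from the quoted data, as the short check below shows. The variable names are illustrative; the numbers are the ones stated in the text.

```python
# Figures quoted in the text for 2017.
total_generation_kwh = 44.2e11
wind_generation_kwh = 3e11
wind_price = 0.56      # Yuan/kWh
thermal_price = 0.37   # Yuan/kWh

# Wind share of total generation.
wind_share = wind_generation_kwh / total_generation_kwh
# Subsidy ceiling taken as the price gap between wind and thermal power.
subsidy_cap = wind_price - thermal_price

print(f"wind share: {wind_share:.1%}")                  # wind share: 6.8%
print(f"subsidy ceiling: {subsidy_cap:.2f} Yuan/kWh")   # subsidy ceiling: 0.19 Yuan/kWh
```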

Simulation results
Fig. 1 shows the evolution process under the static reward and punishment system. The evolution track of system (I) is a closed loop, indicating that the game behaviour of the government and enterprises is cyclical and that the system does not have asymptotic stability. Fig. 2 compares the evolution process with and without the learning mechanism. On the one hand, the dynamic reward and punishment system helps the system reach a stable state gradually. On the other hand, the learning mechanism shifts the equilibrium point of the original system to the left, that is, the probability of government supervision is reduced.
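The qualitative contrast between a closed orbit and a converging spiral can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the figures: the cost values `c` and `k`, the constant fine of 4 and the dynamic fine $16(1 - y)$ are hypothetical, chosen so that both toy systems share the interior equilibrium (0.5, 0.75).

```python
import math


def make_rhs(fine, c=1.0, k=2.0):
    """Build toy replicator dynamics for a given fine schedule fine(y).

    c: supervision cost, k: compliance cost (hypothetical values).
    """
    def rhs(x, y):
        f = fine(y)
        dx = x * (1.0 - x) * ((1.0 - y) * f - c)
        dy = y * (1.0 - y) * (x * f - k)
        return dx, dy
    return rhs


def rk4_end_point(rhs, x, y, dt=0.01, steps=4000):
    """Integrate with the classical 4th-order Runge-Kutta scheme."""
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = rhs(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x, y


def distance_to_equilibrium(point, eq=(0.5, 0.75)):
    return math.hypot(point[0] - eq[0], point[1] - eq[1])


static_rhs = make_rhs(lambda y: 4.0)                 # constant fine
dynamic_rhs = make_rhs(lambda y: 16.0 * (1.0 - y))   # fine falls as y rises

start = (0.45, 0.70)
print("static: ", distance_to_equilibrium(rk4_end_point(static_rhs, *start)))
print("dynamic:", distance_to_equilibrium(rk4_end_point(dynamic_rhs, *start)))
```

In this toy setting the static system keeps circling at a roughly constant distance from the equilibrium, while the dynamic system ends up very close to it, mirroring the cyclical and spiral behaviours described above.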

Conclusions and policy implications
The main findings are as follows:
(1) A dynamic reward and punishment system is the key to equilibrium in the evolutionary game between the government and power generation enterprises: when the degree of reward and punishment is tied to the behavior of enterprises, the evolutionary trajectory of the government and enterprises converges toward the focus and eventually reaches a stable equilibrium state.
(2) The introduction of the learning mechanism moves the equilibrium point of the original system to the left, that is, the probability of government supervision is reduced. The probability that enterprises choose to comply depends on whether government subsidies and corporate social benefits can compensate for the enterprises' cost difference: if they cannot, enterprises will not comply even with the learning mechanism; if they can, the learning mechanism effectively promotes active compliance.
Based on the results, we propose the following policy implications:
(1) Improve the renewable energy quota system. As an important institutional guarantee for China's vigorous development of renewable energy, the Renewable Portfolio Standard is an effective way to address future resource shortages and environmental pollution. The quota index of renewable energy is the core of the system, and the government should set it reasonably according to actual market conditions. If the quota is too low, incentives may be insufficient; if it is too high, it may place a heavy economic burden on enterprises.
(2) Adjust the government's reward and punishment policy in a timely and appropriate manner according to the behavior of enterprises, for which the enterprise learning mechanism is an important factor that cannot be ignored. The government needs to pay close attention to the learning of enterprises and actively encourage them to improve production efficiency through learning.