Optimal control for wind turbine based on reinforcement learning

In this conference paper, an optimal control method is designed for a variable-speed wind turbine system. Because the aerodynamic torque makes the wind turbine inherently nonlinear, a linearized model is derived to handle the system's nonlinearities. An online-updated cost function is constructed from the resulting linearized model. The critic neural network weight vector is updated with the steepest descent algorithm to design an optimal control that minimizes the given cost function. To validate the effectiveness of the optimal control based on reinforcement learning, simulation results with a varying wind speed profile and different values of the learning parameters are presented.


Introduction
Several research teams are currently interested in sustainable development and renewable energies. Thus, wind turbine development represents a significant investment in technology research. Systems that generate electrical energy from the wind can provide a technological and economic alternative to the various exhaustible energy sources. Therefore, the study of wind turbine systems is increasingly important and necessary. In general, wind turbine systems fall into two categories: fixed-speed wind turbine systems and variable-speed wind turbine systems. Due to their high efficiency and the optimization of electricity production in the grid at different wind speeds, variable-speed wind turbines have become very popular compared to fixed-speed wind turbines [1,2]. As is well known, the strong nonlinear characteristics of the variable-speed wind turbine can make its operation more challenging. Thus, the linearization of this system around an operating point has been carried out in [3]-[5].
Optimal control, as is well known, drives a controlled plant to satisfy a specific optimality criterion, which is always important in contemporary complex systems. In other words, it focuses on identifying a control strategy that optimally guides the dynamical system to the equilibrium point with regard to a cost function. For nonlinear systems, the Hamilton-Jacobi-Bellman (HJB) equation is solved to obtain the optimal control. Adaptive dynamic programming (ADP) is a powerful and significant method for solving HJB equations [6]-[10]. Therefore, it has been widely used in recent years to find the optimal control law. Reinforcement learning (RL), neuro-dynamic programming, adaptive critic designs, and neural dynamic programming are all synonyms for ADP. The critic neural network, one of the intelligent techniques of ADP, has been used in [11]-[15] to estimate the iterative cost function (performance index function) and the control law. In this context, this paper investigates the performance of integrating RL into the control of wind turbine systems.
This conference paper is organized as follows. Section 2 discusses the modeling and linearization of the variable-speed wind turbine. Section 3 presents the optimal control applied to the linearized model of the variable-speed wind turbine. To validate the proposed optimal control method, Section 4 shows simulations of the rotor speed for different learning parameters. The paper ends with a conclusion.

Modelling and linearization of wind turbine system

Modelling of wind turbine
A wind turbine system consists of different interconnected subsystems: the mechanical subsystem, the generator subsystem, and the converter subsystem. In this study, we focus on the mechanical subsystem, which includes the aerodynamic subsystem and the transmission subsystem (drive train subsystem).
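The aerodynamic subsystem describes how wind speed is converted into forces acting on the blades that generate rotational motion. The power captured by the rotor is given by:

$$P_a = \frac{1}{2} \rho \pi R^2 C_p(\lambda, \beta) v^3 \tag{1}$$

where $\rho$, $v$ and $R$ are the air density, the wind speed (m/s), and the rotor radius, respectively, and $C_p$ is the power coefficient, which depends on both the pitch angle $\beta$ and the tip speed ratio $\lambda$. The tip speed ratio is given as follows:

$$\lambda = \frac{\omega_r R}{v} \tag{2}$$

where $\omega_r$ is the rotor speed (rad/s). The aerodynamic torque expression is:

$$T_a = \frac{P_a}{\omega_r} = \frac{1}{2} \rho \pi R^3 C_q(\lambda, \beta) v^2 \tag{3}$$

where $C_q(\lambda, \beta)$ is the torque coefficient.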
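The aerodynamic relations above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes the standard relation $C_q = C_p / \lambda$ between the torque and power coefficients, and every numerical value (air density, rotor radius, wind speed, rotor speed, power coefficient) is a hypothetical placeholder rather than a parameter of the studied turbine.

```python
import math


def tip_speed_ratio(omega_r, v, R):
    """Tip speed ratio: lambda = omega_r * R / v."""
    return omega_r * R / v


def aerodynamic_power(rho, R, v, cp):
    """Captured power: P_a = 0.5 * rho * pi * R^2 * C_p * v^3."""
    return 0.5 * rho * math.pi * R**2 * cp * v**3


def aerodynamic_torque(rho, R, v, cq):
    """Aerodynamic torque: T_a = 0.5 * rho * pi * R^3 * C_q * v^2."""
    return 0.5 * rho * math.pi * R**3 * cq * v**2


# Hypothetical operating values, chosen only for illustration.
rho, R, v, omega_r = 1.225, 20.0, 8.0, 3.0
cp = 0.4                                # assumed power coefficient at this point

lam = tip_speed_ratio(omega_r, v, R)
cq = cp / lam                           # assumed relation C_q = C_p / lambda
P_a = aerodynamic_power(rho, R, v, cp)
T_a = aerodynamic_torque(rho, R, v, cq)

# Power and torque must agree through P_a = T_a * omega_r.
print(lam, P_a, T_a * omega_r)
```

The last line prints the tip speed ratio followed by the power computed in two ways. The two power values should agree, which is a quick sanity check on the torque expression.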
The power coefficient $C_p$ can be approximated using a modified and expanded form of the function for the power coefficient given in [16]. The transmission subsystem transfers the mechanical power captured by the rotor to the electric machine. The transmission between the rotor and the generator is represented by the dynamic system (6).

Linearization of wind turbine system
The nonlinearity of the wind turbine model (6) comes from the aerodynamic torque $T_a$, which depends on the wind speed $v$, the pitch angle $\beta$, and the rotor speed $\omega_r$. The linearization of the aerodynamic torque $T_a$ around a given operating point is given by the following expression:

$$T_a(\omega_r, v, \beta) = T_a(\omega_{r,op}, v_{op}, \beta_{op}) + \Delta T_a \tag{7}$$

By taking the three deviation variables (each denoted by the prefix $\Delta$) as the new state variables, and using system (6) and equation (8), the linearized state-space model, system (10), is obtained.
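To make the linearization step concrete, the sketch below approximates the aerodynamic torque by a first-order expansion around an operating point, with the partial derivatives estimated by central finite differences. It is an illustration under stated assumptions: `cp_toy` is a hypothetical smooth power coefficient invented for this example (it is not the function from [16]), and the operating point and turbine constants are placeholders.

```python
import math


def cp_toy(lam, beta):
    """Hypothetical power coefficient, for illustration only."""
    return 0.4 * math.exp(-((lam - 8.0) / 4.0) ** 2) * math.exp(-0.05 * beta)


def torque(omega_r, v, beta, rho=1.225, R=20.0):
    """Nonlinear aerodynamic torque using the assumed relation C_q = C_p / lambda."""
    lam = omega_r * R / v
    cq = cp_toy(lam, beta) / lam
    return 0.5 * rho * math.pi * R**3 * cq * v**2


def partials(f, op, h=1e-5):
    """Central-difference partial derivatives of f at the point op."""
    grads = []
    for i in range(len(op)):
        hi, lo = list(op), list(op)
        step = h * max(1.0, abs(op[i]))
        hi[i] += step
        lo[i] -= step
        grads.append((f(*hi) - f(*lo)) / (2.0 * step))
    return grads


# Hypothetical operating point (omega_r_op, v_op, beta_op).
op = (3.0, 8.0, 2.0)
T0 = torque(*op)
k_omega, k_v, k_beta = partials(torque, op)


def torque_lin(omega_r, v, beta):
    """Linearized torque: T_a(op) + Delta T_a, with Delta T_a linear in the deviations."""
    return (T0
            + k_omega * (omega_r - op[0])
            + k_v * (v - op[1])
            + k_beta * (beta - op[2]))


# Compare the nonlinear and linearized torque close to the operating point.
near = (3.01, 8.02, 2.05)
print(torque(*near), torque_lin(*near))
```

The linear model is only trustworthy for small deviations from the operating point. Moving `near` further away makes the gap between the two printed values grow.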
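In this section, an optimal control law $u^*(x)$ for system (10) is designed to minimize the cost function below:

$$J(x(t)) = \int_t^{\infty} \left( x^T(\tau) Q x(\tau) + u^T(\tau) R u(\tau) \right) d\tau$$

where $Q$ and $R$ are positive definite matrices. Let $J_x = \partial J(x) / \partial x$ denote the gradient of the cost function. Define the Hamiltonian function of the cost function and the optimal control problem as:

$$H(J_x, x, u) = x^T Q x + u^T R u + J_x^T \left( f(x) + g(x) u \right) \tag{13}$$

When the cost function $J(x)$ converges to its optimal value $J^*(x)$, the following HJB equation is satisfied:

$$\min_u H(J_x^*, x, u) = 0 \tag{14}$$

If the solution $J^*(x)$ exists and is continuously differentiable, the optimal control $u^*$ can be described as:

$$u^* = -\frac{1}{2} R^{-1} g^T(x) J_x^* \tag{15}$$

It is hard to find the solution of equation (14) through dynamic programming approaches. Thus, a critic neural network is employed to estimate the cost function $J(x)$ as:

$$J(x) = W_c^T \sigma(x) + \varepsilon(x) \tag{16}$$

where $W_c \in \mathbb{R}^n$ is the weight vector of the critic neural network, $\sigma(x) \in \mathbb{R}^n$ is the activation function, $\varepsilon(x)$ is the neural network error, and $n$ is the number of neurons in the hidden layer. The gradient of equation (16) with respect to the system state $x$ is:

$$J_x = \nabla\sigma^T(x) W_c + \nabla\varepsilon(x) \tag{17}$$

where $\nabla\sigma(x) = \partial\sigma(x)/\partial x$ and $\nabla\varepsilon(x) = \partial\varepsilon(x)/\partial x$ are the gradients of the activation function and the neural network error, respectively. Therefore, the Hamiltonian (13) becomes:

$$H(W_c, x, u) = x^T Q x + u^T R u + \left( W_c^T \nabla\sigma(x) + \nabla\varepsilon^T(x) \right) \left( f(x) + g(x) u \right) \tag{18}$$

The critic neural network (16) can be approximated as follows:

$$\hat{J}(x) = \hat{W}_c^T \sigma(x) \tag{19}$$

where $\hat{W}_c$ is the estimated weight. The gradient of (19) with respect to the system state $x$ is:

$$\hat{J}_x = \nabla\sigma^T(x) \hat{W}_c \tag{20}$$

Thus, the estimated Hamiltonian can be expressed as:

$$\hat{H}(\hat{W}_c, x, u) = x^T Q x + u^T R u + \hat{W}_c^T \nabla\sigma(x) \left( f(x) + g(x) u \right) \tag{21}$$

The objective function is then minimized using the steepest descent algorithm, and the critic neural network weight vector $\hat{W}_c$ is updated according to (22), where $\theta = \nabla\sigma(x) \dot{x}$. The learning rate of the critic neural network in (22) and the additional design parameter are both strictly positive. Therefore, the optimal control law is defined as:

$$\hat{u} = -\frac{1}{2} R^{-1} g^T(x) \nabla\sigma^T(x) \hat{W}_c \tag{23}$$

The control $\hat{u}$ converges to $u^*$ when the neural network weights converge to the ideal weights.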
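The critic-based procedure described above can be sketched on a toy problem. The code below is a minimal illustration under loud assumptions, not the paper's algorithm: the plant is a hypothetical scalar system $\dot{x} = a x + b u$ with scalar weights `q` and `r`, the activation vector $\sigma(x) = [x^2, x^4]$ is chosen arbitrarily, and the weight update is a commonly used normalized steepest-descent step on half the squared Hamiltonian residual. The paper's exact update law and its design parameter are not reproduced here.

```python
import math

# Hypothetical scalar plant x_dot = a*x + b*u and cost weights (assumptions).
a, b, q, r = -1.0, 1.0, 1.0, 1.0


def sigma_grad(x):
    """Gradient of the assumed activation vector sigma(x) = [x^2, x^4]."""
    return [2.0 * x, 4.0 * x**3]


def control(w_hat, x):
    """Control law u = -(1/2) R^{-1} g^T grad_sigma^T W_hat for the scalar plant."""
    j_x = sum(g * w for g, w in zip(sigma_grad(x), w_hat))
    return -0.5 * b * j_x / r


def hamiltonian_residual(w_hat, x, u):
    """Estimated Hamiltonian x'Qx + u'Ru + W_hat' * theta, with theta = grad_sigma * x_dot."""
    x_dot = a * x + b * u
    theta = [g * x_dot for g in sigma_grad(x)]
    e_c = q * x * x + r * u * u + sum(w * t for w, t in zip(w_hat, theta))
    return e_c, theta


def critic_step(w_hat, x, u, alpha_c=0.01):
    """One normalized steepest-descent step on 0.5 * e_c^2 (assumed update form)."""
    e_c, theta = hamiltonian_residual(w_hat, x, u)
    norm = (1.0 + sum(t * t for t in theta)) ** 2
    return [w - alpha_c * e_c * t / norm for w, t in zip(w_hat, theta)]


# For this linear plant the optimal cost is J*(x) = P x^2, where P solves the
# scalar Riccati equation 2*a*P - (b^2 / r) * P^2 + q = 0, i.e. P = sqrt(2) - 1.
p_star = math.sqrt(2.0) - 1.0
x = 0.7
u_star = control([p_star, 0.0], x)
print(hamiltonian_residual([p_star, 0.0], x, u_star)[0])  # residual at the ideal weights

# Starting from zero weights, one critic step shrinks the residual magnitude.
w0 = [0.0, 0.0]
u0 = control(w0, x)
w1 = critic_step(w0, x, u0)
print(hamiltonian_residual(w0, x, u0)[0], hamiltonian_residual(w1, x, u0)[0])
```

At the ideal weights the Hamiltonian residual vanishes up to rounding, which mirrors the HJB condition. Away from them, each step reduces the residual at the visited state. Convergence of the weights over a whole trajectory additionally depends on how well the states excite the activation functions, which this sketch does not address.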

Simulation results
Numerical simulations are provided in this section to demonstrate the effectiveness of the proposed optimal control, which is implemented in MATLAB/Simulink. The characteristics of the studied wind turbine are presented below:

One may conclude from these plots that the proposed controller converges to the equilibrium point with an acceptable tracking error. After conducting several simulations, the best-performing parameters were found to be a critic learning rate of 0.01 and a design parameter of 0.5. Figure 3 illustrates that the rotor speed converges precisely to its desired value. This outcome validates the effectiveness of the proposed optimal control approach.

Conclusion
In this paper, an optimal control method for a wind turbine system was studied. First, the aerodynamic torque was linearized around an operating point. Then, adaptive dynamic programming methods were used to solve the optimal control problem. The online cost function was constructed, and subsequently a critic neural network was used to obtain the optimal cost function and the optimal control. Finally, simulation results for different values of the learning parameters were presented to test the efficacy of the proposed optimal control method. The simulation results demonstrate the performance of the proposed optimal control for the wind turbine system and the significance of selecting appropriate parameters. The main focus of this paper is on developing reinforcement-learning-based optimal control for the wind turbine, and the proposed method is evaluated based on the simulation results.