Intelligent digital twin – machine learning system for real-time wind turbine wind speed and power generation forecasting

Wind power is a key pillar in efforts to decarbonise energy production. However, variability in wind speed and resultant wind turbine power generation poses a challenge for power grid integration. Digital Twin (DT) technology provides intelligent service systems, combining real-time monitoring, predictive capabilities and communication technologies. Current DT research for wind turbine power generation has focused on providing wind speed and power generation predictions reliant on Supervisory Control and Data Acquisition (SCADA) sensors, with predictions often limited to the timeframe of datasets. This research looks to expand on this, utilising a novel framework for an intelligent DT system powered by k-Nearest Neighbour (kNN) regression models to upscale live wind speed forecasts to higher wind turbine hub-height and then forecast power generation. As there is no live link to a wind turbine, the framework is referred to as a “Simulated Digital Twin” (SimTwin). SCADA and wind speed data from 2019-2020 are used to evaluate this, demonstrating that the method provides suitable predictions. Furthermore, full deployment of the SimTwin framework is demonstrated using live wind speed forecasts. This may prove useful for operators by reducing reliance on SCADA systems and provides a research and development tool where live data is limited.


Introduction
Wind power is of particular interest in efforts to reduce greenhouse gas emissions [1] given its technological readiness, relatively low environmental footprint, and abundant availability [2]. Electricity is generated via the conversion of kinetic energy contained within the wind, governed by the wind power equation [3]:
$$P = 0.5\, C_P\, \rho A V^3 \quad (1)$$
Where $P$ is power generated, $C_P$ is a wind turbine's power coefficient, $\rho$ is air density, $A$ is the rotor wind-swept area, and $V$ is wind speed.
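As a rough illustration of the cubic dependence on wind speed in the wind power equation, the following sketch evaluates it for two wind speeds. The rotor diameter, power coefficient and air density used here are illustrative assumptions, not values taken from the turbine studied.

```python
import math

def wind_power_watts(wind_speed, rotor_diameter,
                     power_coefficient=0.4, air_density=1.225):
    """Power in watts from P = 0.5 * C_P * rho * A * V^3.

    All default values are illustrative assumptions.
    """
    swept_area = math.pi * (rotor_diameter / 2.0) ** 2  # A, in m^2
    return 0.5 * power_coefficient * air_density * swept_area * wind_speed ** 3

# Doubling the wind speed multiplies the available power by 2^3 = 8.
p_low = wind_power_watts(5.0, rotor_diameter=90.0)
p_high = wind_power_watts(10.0, rotor_diameter=90.0)
ratio = p_high / p_low
```

The factor of eight between the two results is why small errors in forecast wind speed translate into much larger errors in forecast power.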
Given the fluctuating nature of the wind [4], wind turbine power output can be highly variable. This variability and the resultant difficulty in forecasting future power generation pose several challenges in power grid integration whilst ensuring stability [5]. This has led to a number of proposed solutions including electrical interconnectors [6], energy storage systems [7] and improved demand prediction [8].
Another potential mitigating measure is predicting wind turbine power generation. This allows better wind farm [9] and grid network management, reducing the need for generation reserves [5], as well as enabling other solutions such as hydrogen energy storage [10]. Given that power generation is dependent on the wind speed cubed, wind speed is frequently seen as the most important, and therefore the most used, input parameter for calculating power generation [11]. Manufacturer power curves provide one method of doing so, detailing anticipated power output for a given wind speed. However, these can be overly optimistic and are often based on ideal conditions [12].
Supervisory Control and Data Acquisition (SCADA) is commonly used in wind energy [13], providing wind turbines, wind farms, and associated equipment the ability to report their operational status [14]. Solutions using SCADA have become increasingly popular in fault diagnosis and prediction in wind turbines [15,16,17]. SCADA often details power output, wind speed, and other associated metrics, allowing its use for power generation predictions.
SCADA data is typically recorded at 10-minute intervals and provides a number of potentially useful measurements including wind speed, wind direction, power generation, voltages and component temperatures [18], making it ideal for data-driven methods utilising machine learning. These have the benefit of not requiring domain knowledge, with the potential for model improvement over time [19]. Additionally, the use of large datasets allows many different aspects to be considered [20], providing a good way to discover complex relations between data. However, it may be difficult to predict extreme conditions due to limitations in observations, and whilst correlation may be determined, this will not give causality for what occurs [19].
Given the availability of SCADA datasets, numerous publications have undertaken wind turbine power generation-related research making use of this resource. Lin et al. [21] utilised isolation forest and deep learning techniques alongside high-frequency SCADA data to derive an improved technique for outlier detection, improving the accuracy of power generation predictions. Delago and Fahim [22] devised a Long Short-Term Memory (LSTM) powered data analysis framework, capable of predicting and visualising SCADA data and associated power generation prediction.
An important factor in the prediction of energy generation from a wind turbine is knowledge of forecasted wind speed, with a number of machine learning techniques also at the forefront of this. Lv and Wang [23] utilised deep learning within a newly proposed combined model to forecast wind speed which followed a "decomposition-optimisation-forecasting" principle. Hur [24] developed a 2-stage method for short-term wind speed prediction utilising an extended Kalman filter for estimation, with a combination of extrapolation and a double-layer perceptron (DLP) feedforward neural network for prediction.
Whilst methods such as those outlined above have seen successful academic implementation, it has been noted that these can often be overly complicated [25], increasing the difficulty of real-world deployment and use. This has inspired research using simpler methods of wind speed and power generation prediction. k-nearest neighbour (kNN) methods have been shown to be successful, producing robust wind speed and power generation predictions [9,25], with results achievable that are comparable to more intensive methods [26].
A wide range of definitions as to what constitutes a DT have arisen, spurring attempts to consolidate definitions and better specify what constitutes a DT [27,28,29]. These suggest that at its most basic form, a DT consists of a physical asset, a virtual model of the asset, and a bidirectional link between them. This link is often considered "live", providing real-time updates to the model, as well as enabling changes to the asset as a result of changes in the model. These changes vary from direct actions undertaken by the model to indirect actions resulting from operator decisions.
DTs provide increased integration between physical and virtual spaces by combining DTs with sensors, machine learning and Internet of Things (IoT)-based technologies [30]. DT technology offers several benefits including remote monitoring and operations from anywhere at any time [31], access to real-time monitoring data useful for decision-making [28], and the provision of continuously updating predictions of the future state of an asset [27]. These allow for improved decision-making and planning [27], optimisation of activities [32] and automation [31].
The use of DTs in the wind energy sector has seen increasing popularity. Numerous frameworks have been proposed, particularly for operations and maintenance purposes, for both onshore and offshore wind turbines [33,34,35].
However, there has been limited research regarding power generation-focused DTs. Fahim et al. [9] proposed a 5G DT platform powered by Microsoft Azure infrastructure. This looked to provide a framework for real-time monitoring and prediction of power generation, claiming to be the first to do so. This was tested utilising SCADA data for a turbine located at an onshore wind farm in the Yalova region of Turkey. Machine learning, in the form of a deep learning Temporal Convolution Network (TCN) and non-parametric k-nearest neighbour (kNN) regression, was also used to predict wind speed and power generation respectively.
Fahim et al. [9] were able to provide wind speed and power generation predictions rivalling other machine learning techniques. However, there was a lack of explanation as to how a DT would be deployed, either in the field or during testing, and power generation predictions were limited to 2018 (the extent of the SCADA dataset available). Additionally, the use of machine learning did not appear fully integrated with the DT. Rather, the results presented appeared to utilise the historical SCADA dataset but did not provide evidence of being used in the context of the DT framework proposed.
Kim et al. [36] proposed a physics-based DT to overcome the reliance on historical SCADA data for model training. The approach provided promising results for a test floating wind turbine, though it focused on "live" updates on power generation as opposed to longer-term predictions and required continuous sensor readings in order to provide predictions. Additionally, the proposed physics-based approach requires a thorough understanding of the wind turbine and local environment in question, with a complete change in model required depending on wind turbine model and location.
As such, this research looks to contribute a DT with live power generation forecast capabilities, as opposed to being restricted to the timeframe of a given dataset and SCADA availability. This is achieved via the development of a machine learning model to predict wind turbine hub-height wind speeds based on live weather forecast data provided via an Application Programming Interface (API). Predicted hub-height wind speed is then used with a machine learning power generation prediction model to provide a power generation forecast. By incorporating this into a fully-functioning DT framework, a fully realised, deployable, power generation forecasting DT is delivered, capable of providing continuously updating live predictions for a wind turbine.
The proposed DT does not maintain a direct link to a wind turbine as might be expected from the DT definitions highlighted, instead using live weather forecast data. It is considered that the DT proposed offers a simulation of the anticipated live and future state of the wind turbine in question and, as such, the proposed framework and its deployment are referred to as a "Simulated Digital Twin" (SimTwin) for the remainder of the paper.

Data
Wind speed and power generation data were derived from historical SCADA data [37]

Predictive models
kNN regression is a non-parametric pattern recognition method [39] that has seen popularity due to its successful use for time series forecasting whilst remaining relatively simple [40], with the technique seeing use for forecasting both wind speed and wind turbine power generation [41,42]. kNN utilises the average of nearby observations to provide an estimated value [43], with distance used to decide which observations are considered nearby and the number of neighbours given by k [44]. An optimum k value is imperative. Smaller values can lead to over-fitting, with greater values potentially resulting in worse performance [43].
kNN regression is considered a suitable initial method for developing the predictive models. Time-series SCADA and wind speed datasets were used to develop and implement the predictive capabilities of the SimTwin, with kNN highly compatible with time-series data. Additionally, the simple setup and relatively quick calculation time make it useful for continuous live updates. As previously highlighted, kNN-based research has produced robust power generation and wind speed prediction results [25,9]. Taking into consideration the low computational and user requirements, kNN-based methods have demonstrated results comparable to more intensive methods, such as Long Short-Term Memory (LSTM), Support Vector Regression (SVR) and Bagging Regression (BR) [Mehr, 2021].
kNN regression was undertaken utilising a Minkowski distance measurement, given by [45]:
$$D(u, v) = \left( \sum_{i} |u_i - v_i|^p \right)^{1/p} \quad (2)$$
Where $D$ is the Minkowski distance, $u$ and $v$ are the two input arrays being compared, and $p$ is the order of the norm of the difference $\|u - v\|_p$ [45].
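The Minkowski distance described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the study; the function name and sample vectors are assumptions.

```python
def minkowski_distance(u, v, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

# p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
d_manhattan = minkowski_distance([1.0, 2.0], [4.0, 6.0], p=1)
d_euclidean = minkowski_distance([1.0, 2.0], [4.0, 6.0], p=2)
```

For a single input feature such as wind speed, every order p reduces to the absolute difference between the two values.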
The use of kNN regression to predict hub-height wind speed ($PW_{HUB}$) from lower-level wind speed ($W_{LOW}$), and predicted power generation ($PG$) from predicted hub-height wind speed, is given by the following functions respectively:
$$PW_{HUB} = f_{kNN}(W_{LOW}) \quad (3)$$
$$PG = f_{kNN}(PW_{HUB}) \quad (4)$$
Wind speed prediction results have also been compared to alternative methods to kNN regression considered appropriate for wind speed forecasting. These have been chosen for their ability to provide relatively quick updates, as would be required by the SimTwin to give continuously updating results.
The Wind Speed Power Law (WSPL) is a physical law that can be used to estimate upscaled wind speeds utilising a reference height with a known wind speed, given by the equation [46]:
$$U_h = U_g \left( \frac{Z_h}{Z_g} \right)^{\alpha} \quad (5)$$
Where $U_h$ is the target wind speed at height $Z_h$, $U_g$ is the known wind speed at height $Z_g$ and $\alpha$ is the power law exponent. A value of 0.143 is typically adopted for $\alpha$ under neutral stability conditions.
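The Wind Speed Power Law baseline can be expressed directly in code. The sketch below assumes a 10 m reference height and an 80 m hub height purely for illustration; neither value is taken from the turbine studied.

```python
def power_law_upscale(u_ref, z_ref, z_target, alpha=0.143):
    """Estimate wind speed at z_target from a known speed u_ref at z_ref.

    alpha = 0.143 is the exponent typically adopted under neutral stability.
    """
    if z_ref <= 0 or z_target <= 0:
        raise ValueError("heights must be positive")
    return u_ref * (z_target / z_ref) ** alpha

# Hypothetical heights: 10 m reference measurement, 80 m hub.
u_hub = power_law_upscale(6.0, z_ref=10.0, z_target=80.0)
```

With these assumed heights a 6 m/s reference speed is upscaled by a factor of roughly 1.35, which gives a simple, training-free baseline against which a learned upscaling model can be compared.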
Decision Tree Regression (DTR) has also been used, which breaks data into small sub-groups [9]. Extreme Gradient Boosting (XGBoost) regression is a gradient-boosted decision tree, which combines weaker models, providing a unified stronger model [47].

SimTwin framework
Fig. 1 highlights the high-level architecture used to implement the power generation forecasting SimTwin.

Hub-Height wind speed prediction
Power generation forecasting requires the prediction of future wind speed, necessitating the use of weather forecasting services. Wind speed predictions for the wind turbine location were sourced from OpenWeather's One Call API 3.0 [48]. This provides hourly forecasts for a 48-hour period, including wind speed. However, as this is given at a different altitude than the hub-height of the wind turbine, it needed to be converted.
A hub-height wind speed prediction model was derived using hub-height wind speed from the SCADA dataset and historical wind speed data sourced from OpenWeather [49] for the same timeframe (see equation 3). Historical lower-level wind speed data from OpenWeather is given for hourly periods and as such the 10-minute SCADA dataset was averaged to give an hourly value. Historic wind speed data was used as an input to kNN regression with the higher hub-height wind speed as the desired target. The kNN regressor looks to predict the anticipated hub-height wind speed for a given low-level wind speed by utilising the average of nearby wind speeds for a given point, as identified by k. Grid search was used to calculate the optimal k value for the model, with values between 1 and 100 trialled. In doing so, a kNN model was derived capable of taking lower-height wind speed and upscaling it to hub-height.
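The upscaling step can be sketched as a one-feature kNN regressor with a search over k from 1 to 100. This is a simplified, dependency-free illustration: it selects k on a single hold-out split rather than reproducing the grid search procedure used in the study, and the synthetic data (a linear relation with added noise) stands in for the OpenWeather and SCADA series.

```python
import math
import random

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def search_k(train_x, train_y, val_x, val_y, k_values):
    """Return the k with the lowest validation RMSE, and that RMSE."""
    best_k, best_err = None, float("inf")
    for k in k_values:
        preds = [knn_predict(train_x, train_y, x, k) for x in val_x]
        err = rmse(val_y, preds)
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

# Synthetic stand-in data (assumption): low-level speed vs hub-height speed.
random.seed(0)
low = [random.uniform(0.0, 15.0) for _ in range(400)]
hub = [1.35 * w + random.gauss(0.0, 0.5) for w in low]
train_x, train_y = low[:300], hub[:300]
val_x, val_y = low[300:], hub[300:]

best_k, best_rmse = search_k(train_x, train_y, val_x, val_y, range(1, 101))
upscaled = knn_predict(train_x, train_y, 6.0, best_k)  # hub-height estimate for 6 m/s
```

Very small k values follow the noise in individual observations, while very large values average over wind speeds that are too dissimilar, which is why the search over k matters.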
Live weather forecasts were accessed using a call to the OpenWeather One Call API 3.0 web address. Upon calling, a 48-hour, hourly wind speed dataset is provided in JSON format. This dataset is then input into the kNN wind speed upscaling model and the corresponding hub-height wind speeds are generated, providing a 48-hour prediction of hub-height wind speeds. This was undertaken at frequent repeating periods, providing continuous hub-height wind speed predictions based on the latest weather forecast.
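Extracting the hourly wind speeds from the JSON response might look like the sketch below. It assumes the response carries an `hourly` list whose entries include a Unix timestamp `dt` and a `wind_speed` value; the sample payload, coordinates and function name are hypothetical and no network call is made.

```python
import json
from datetime import datetime, timezone

# Hypothetical, heavily truncated response; a real call returns 48 hourly entries.
SAMPLE_RESPONSE = json.dumps({
    "lat": 0.0,
    "lon": 0.0,
    "hourly": [
        {"dt": 1687363200, "wind_speed": 4.1},
        {"dt": 1687366800, "wind_speed": 5.3},
    ],
})

def extract_hourly_wind(response_text):
    """Return a list of (UTC datetime, wind speed in m/s) tuples."""
    payload = json.loads(response_text)
    return [
        (datetime.fromtimestamp(entry["dt"], tz=timezone.utc), float(entry["wind_speed"]))
        for entry in payload["hourly"]
    ]

forecast = extract_hourly_wind(SAMPLE_RESPONSE)
```

Each extracted wind speed can then be passed to the upscaling model, and the whole step repeated on a schedule to pick up the latest forecast.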

Power generation forecast
Wind speed at hub-height and actual power generation from the SCADA dataset were used to train a kNN regression model capable of predicting power generation for a given hub-height wind speed (see equation 4). Data was averaged to give an hourly value, reflecting the hourly values used for hub-height wind speed prediction and power generation forecasting. Grid search was also used to select k, with values between 1 and 100 trialled.
The 48-hour hub-height wind speed predictions were used in conjunction with the power generation prediction model, giving a 48-hour power generation forecast for the wind turbine. This was set to occur at frequently repeating periods, providing continuously updated power generation forecasts.
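The averaging of 10-minute SCADA records to hourly values can be illustrated as follows. The record layout (timestamp, value) and the sample values are assumptions made for the sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hourly_mean(samples):
    """Average (timestamp, value) samples into one value per clock hour."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

# Two hours of hypothetical 10-minute readings with values 0.0 to 11.0.
start = datetime(2019, 1, 1, 0, 0)
samples = [(start + timedelta(minutes=10 * i), float(i)) for i in range(12)]
hourly = hourly_mean(samples)
```

Here the six readings in each hour collapse to a single mean, matching the hourly resolution of the weather data.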
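The chaining of the two models can be sketched as below: each forecast low-level wind speed is first upscaled to hub-height, and the result is then mapped to power. The training pairs are small hypothetical stand-ins and k is fixed at 2 for brevity.

```python
def knn_predict(xs, ys, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical training pairs for the upscaling model (m/s).
low_speed = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
hub_speed = [2.8, 5.5, 8.1, 10.9, 13.4, 16.2]
# Hypothetical training pairs for the power model (m/s, kW).
scada_hub_speed = [3.0, 5.0, 8.0, 11.0, 13.0, 16.0]
scada_power_kw = [0.0, 150.0, 700.0, 1600.0, 2000.0, 2050.0]

def forecast_power(low_level_forecast, k=2):
    """Return (predicted hub-height speed, predicted power) per forecast hour."""
    results = []
    for w_low in low_level_forecast:
        pw_hub = knn_predict(low_speed, hub_speed, w_low, k)
        pg = knn_predict(scada_hub_speed, scada_power_kw, pw_hub, k)
        results.append((pw_hub, pg))
    return results

forecast = forecast_power([5.0, 9.0])
```

Because the second model consumes the output of the first, any upscaling error carries through to the power forecast.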

Exportation to Azure IoT Hub
The hub-height wind speed and power generation forecasts are converted to JSON ("UTF-8" encoding and content type "application/json") and exported to Azure IoT Hub. This is then continuously updated upon receiving power forecast data.
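One possible shape for this export step is sketched below, assuming the `azure-iot-device` Python package is used as the client; the source does not state which SDK was used. The payload field names, the turbine identifier and the connection string are hypothetical.

```python
import json

def build_forecast_payload(turbine_id, timestamps, hub_speeds, power_kw):
    """Serialise a forecast to a JSON string (field names are assumptions)."""
    rows = [
        {"time": t, "hubHeightWindSpeed": s, "powerForecast": p}
        for t, s, p in zip(timestamps, hub_speeds, power_kw)
    ]
    return json.dumps({"turbineId": turbine_id, "forecast": rows})

def send_to_iot_hub(connection_string, body):
    """Send one message to IoT Hub; requires the azure-iot-device package."""
    from azure.iot.device import IoTHubDeviceClient, Message

    client = IoTHubDeviceClient.create_from_connection_string(connection_string)
    message = Message(body)
    message.content_encoding = "utf-8"
    message.content_type = "application/json"
    client.send_message(message)
    client.shutdown()

payload = build_forecast_payload(
    "T1", ["2023-06-21T16:00:00Z"], [6.8], [425.0]
)
```

Setting the encoding and content type on the message allows downstream services to treat the body as JSON rather than as an opaque byte string.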

Azure Digital Twin model display
Azure Digital Twin requires models to be defined using Digital Twin Definition Language (DTDL). This enables the establishment of relationships to allow for grouping of different components of a DT, allowing a better understanding of how these may interact with each other [50]. In this case, official documentation [50,51] has been used to generate a simple wind farm model comprising 3 wind turbines, one of which is used to represent the wind turbine tested. A 3D model [52] has also been imported for added visualisation.
An Azure Function, adapted from official documentation [53], decodes and sends relevant data from IoT Hub to Azure Digital Twin, which is then displayed. The Azure Function continuously sends relevant data upon the arrival of new data to IoT Hub, allowing the DT model to display the most recent hub-height wind speed and power generation forecasts. Fig. 2 shows the wind turbine in Azure Digital Twin graph layout and the 3D wind turbine model. T1 refers to the wind turbine tested. It is noted that this is not intended to reflect the makeup of Kelmarsh wind farm but rather act as a general representation.
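A minimal DTDL description of such a wind farm might resemble the sketch below, built here as Python dictionaries and serialised to JSON. The model identifiers, property names and relationship name are hypothetical and are not the models used in the study; DTDL version 2 is assumed.

```python
import json

TURBINE_ID = "dtmi:example:WindTurbine;1"  # hypothetical model identifier

wind_turbine_model = {
    "@id": TURBINE_ID,
    "@type": "Interface",
    "@context": "dtmi:dtdl:context;2",
    "displayName": "Wind turbine",
    "contents": [
        {"@type": "Property", "name": "hubHeightWindSpeed", "schema": "double"},
        {"@type": "Property", "name": "powerForecast", "schema": "double"},
    ],
}

wind_farm_model = {
    "@id": "dtmi:example:WindFarm;1",
    "@type": "Interface",
    "@context": "dtmi:dtdl:context;2",
    "displayName": "Wind farm",
    "contents": [
        # The relationship lets a farm twin be linked to each turbine twin.
        {"@type": "Relationship", "name": "contains", "target": TURBINE_ID},
    ],
}

dtdl_documents = [json.dumps(m, indent=2) for m in (wind_turbine_model, wind_farm_model)]
```

Three turbine twins instantiated from the turbine model and linked to one farm twin through the relationship would reproduce the general layout described above.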

Model performance metrics
Root Mean Square Error (RMSE) was used to measure model performance, allowing comparisons with similar research. This is considered a suitable performance evaluation metric for wind power as it assigns additional weight to large differences between actual and predicted values compared to smaller differences [54]. RMSE was calculated to give the difference between predicted and actual values for wind speed and power generation. A lower RMSE value is considered better and is given by the equation [55]:
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( e_i - \bar{e}_i \right)^2} \quad (6)$$
Where $e_i$ is an actual value, $\bar{e}_i$ is the corresponding predicted value and $n$ is the number of observations.
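The weighting of large errors by RMSE can be seen in a short worked example. The values below are illustrative only: both prediction sets have the same total absolute error, but the one that concentrates the error in a single hour scores worse.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

actual = [10.0, 12.0, 14.0, 16.0]
evenly_off = [11.0, 13.0, 15.0, 17.0]   # every hour wrong by 1
one_big_miss = [10.0, 12.0, 14.0, 20.0]  # one hour wrong by 4

rmse_even = rmse(actual, evenly_off)
rmse_spiky = rmse(actual, one_big_miss)
```

The second set yields twice the RMSE of the first despite the identical total absolute error.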

Power forecast SimTwin testing parameters
All tests undertaken are highlighted in Table 2. To test the deployment of the SimTwin framework, 12 months of power generation and wind speed SCADA data, as well as 12 months of historical wind speed forecasts, were used to generate the power generation prediction model and hub-height wind speed prediction model. A wind speed forecast for the future 48-hour period of 16:00 on 21/06/2023 to 15:00 on 23/06/2023 was sourced from One Call API 3.0, upscaled to hub-height, and fed into the power generation prediction model, thus giving a power generation forecast for the wind turbine during this period.

Given that no actual wind turbine data is available for this period, the validity of the hub-height wind speed prediction and power generation prediction kNN models was calculated by producing alternative models. These are capable of making predictions during the SCADA dataset timeframe, allowing comparisons with actual output during this period. This was achieved by using historical OpenWeather wind speed data for both training the hub-height wind speed prediction model and for calculating power generation. Model training was undertaken using quarterly data, with the following week used to test the models. Q4 2019/2020 data was reduced by 1 week to allow a test period within the 2020 timeframe of the SCADA dataset.
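The quarterly training and following-week test split can be sketched as below. The function name, the record layout and the `shorten_days` parameter (used to hold back the final week of a quarter for testing) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def quarter_split(records, year, quarter, test_days=7, shorten_days=0):
    """Split (timestamp, value) records into a quarter and the week after it.

    shorten_days trims the end of the training window so that the test week
    still falls inside the available data.
    """
    start = datetime(year, 3 * (quarter - 1) + 1, 1)
    end = datetime(year + 1, 1, 1) if quarter == 4 else datetime(year, 3 * quarter + 1, 1)
    train_end = end - timedelta(days=shorten_days)
    test_end = train_end + timedelta(days=test_days)
    train = [r for r in records if start <= r[0] < train_end]
    test = [r for r in records if train_end <= r[0] < test_end]
    return train, test

# Hypothetical hourly records covering the whole of 2019.
first = datetime(2019, 1, 1)
records = [(first + timedelta(hours=h), 0.0) for h in range(365 * 24)]

q1_train, q1_test = quarter_split(records, 2019, 1)
q4_train, q4_test = quarter_split(records, 2019, 4, shorten_days=7)
```

Trimming the fourth quarter by a week keeps its test period inside the same calendar year as its training data.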

Results
Results are split into 3 sections. The first section provides 2019 weekly results for the hub-height wind speed predictions based on quarterly 2019 training data, with the actual SCADA-derived hub-height wind speed for the same period provided for comparison. The kNN regression method adopted is also compared to WSPL, DTR and XGBoost-based approaches. kNN regression-derived weekly results for 2020 based on quarterly 2019 data are also provided.
The second section highlights the 2019 and 2020 weekly power generation forecast results using the kNN regression-derived predicted hub-height wind speeds and actual SCADA-derived hub-height wind speeds as inputs.
These results have been presented alongside SCADA-derived wind turbine power generation for comparison.

Quarterly hub-height wind speed results
Table 3 outlines measured performance in the form of RMSE for the 2019 test periods for each quarter, performed for the range of methods considered to be suitable for wind speed prediction. The k value used for kNN-based predictions is also presented. Table 4 outlines the measured performance and k value used for the 2020 test periods for each quarter, performed using kNN regression for wind speed prediction.

Quarterly power generation results
Table 5 outlines measured performance and k value used for the 2019 test periods for each quarter, performed using kNN regression for power generation prediction based on predicted and SCADA-derived hub-height wind speeds. Table 6 outlines measured performance and k value used for the 2020 test periods for each quarter, performed using kNN regression for power generation prediction based on predicted and SCADA-derived hub-height wind speeds.
It has been demonstrated that the SimTwin framework proposed is deployable and capable of providing live and updating forecasts of future hub-height wind speed and power generation for a wind turbine. This may prove useful for operators by reducing reliance on SCADA systems and associated physical wind turbine sensors. Additionally, as a DT requires live data, this provides a useful tool for DT development where live and historical data may be limited.
The testing results show that the kNN hub-height wind speed model was capable of upscaling historical OpenWeather low-level wind speeds to hub-height with reasonable accuracy across all quarters tested barring Q4 2020, though it still picked up on the general trend in wind speed fluctuations in that quarter. The kNN regression approach taken produced lower RMSE results than WSPL and DTR. XGBoost results are generally considered on a similar level to those of kNN regression whilst requiring significantly greater model tuning and additional training time. As such, the kNN approach presented is considered the most appropriate for the development of an easy to deploy, responsive DT system.
The kNN power generation model showed relatively impressive results when using SCADA-derived hub-height wind speed for the week-long tests undertaken for Q1, Q2 and Q3 of 2020. Q4 proved more challenging to predict, resulting in lower accuracy; however, this is anticipated to be due to a partial shutdown of the wind turbine when no generation occurred. This demonstrates that the kNN power generation prediction model is generally capable of providing accurate power generation results given an optimum hub-height wind speed input.
When predicting power generation using predicted hub-height wind speeds, the model also demonstrated reasonably accurate results, capturing the general trend of power generation. The RMSE for this was higher than when utilising the SCADA-derived wind speeds, due to the error already introduced when upscaling the predicted hub-height wind speeds.
It was also demonstrated that the proposed framework can produce reasonable predictions for both short and long timescales, thus increasing its potential usefulness.
Table 7 highlights select results from Fahim et al. [9]. Week-long quarterly hub-height wind speed predictions were undertaken for a different wind farm than that tested in this paper; however, this is considered a useful comparative metric for predicting wind speed. Hub-height wind speed predictions utilising the kNN hub-height wind speed model produced a lower RMSE in Q1 and Q2 of both 2019 and 2020, though RMSE was higher in Q3 and Q4 of 2019 and 2020. Overall, it is considered that the ability to upscale wind speed from an easily accessible source of wind speed data, the ability to make predictions over long timescales and the simplicity of the kNN model may be worth the trade-off in the correct circumstances, such as in situations with limited data availability and where ease of model training is required.
Despite this, there are potential methods that could be utilised to improve performance. The use of alternative weather forecasting services [56] may give a more accurate representation of wind speed at the site of the wind turbine. The inclusion of other factors beyond wind speed and power generation may prove beneficial if available in alternative datasets. This includes temperature, pressure, wind direction and humidity [32]. Alternative approaches could be tested, such as deep neural networks [32] and hybrid approaches combining physics, statistics and machine learning techniques [28].
It is envisaged that the SimTwin framework demonstrated in this paper could act as an alternative to SCADA systems, reducing reliance on wind speed sensors by providing reasonable wind speed and power generation forecasts for use by wind turbine operators. The relative simplicity of the system should also help with ease of deployment. Additionally, it is anticipated that the outputs of the SimTwin would provide a useful tool in research and development environments, particularly when access to live wind speed and power generation forecasts is needed but unavailable. This includes research into the fatigue effect of wind loading, the management of power generation and the storage of additional power generation, which can be undertaken in real-time.
Future work should look to include the potential improvements highlighted and test the model for different wind turbine models in differing locations, including both onshore and offshore installations.

Conclusion
Wind power is a key pillar for the decarbonisation of electricity generation. Digital Twin (DT) technology allows increased integration between physical assets and virtual models including live monitoring, updates and predictions, allowing suitable and informed actions to be undertaken. It has been demonstrated that a power generation forecasting Simulated Digital Twin (SimTwin) is achievable, providing hub-height wind speed and power generation predictions beyond the timeframe of data availability via the use of a weather forecast API and machine learning in the form of k-nearest neighbour (kNN) regression-based models.
The use of kNN regression for hub-height wind speed prediction was seen to have comparable or better results when compared to the Wind Speed Power Law (WSPL), Decision Tree Regression (DTR) and Extreme Gradient Boosting (XGBoost) regression. Additionally, it has been demonstrated that this approach can provide reasonable predictions for short and long timescales. This reduces reliance on Supervisory Control and Data Acquisition (SCADA) systems and associated physical wind turbine sensors and provides a useful tool in DT research where dataset availability may be limited. Furthermore, by using live openly available weather data, power generation predictions can be expanded to entire wind farms, as well as for regional, national or global ranges of wind turbine power production.
Future work should look to provide improvements to hub-height wind speed predictions and power generation forecasts. This may be achievable by using different machine learning techniques, alternative weather prediction sources, hybrid approaches and more detailed historical power generation and wind speed datasets.

Fig. 3 to Fig. 6 outline the 2020 week-long wind speed predictions based on 2019 quarterly training data. Predicted wind speed is given in metres per second (m/s).

Fig. to Fig. outline the 2020 week-long power generation predictions based on 2019 quarterly training data. Predicted power generation is given in kilowatts (kW).

Table 3. 2019 wind speed prediction comparative methods

Table 5. 2019 power generation prediction performance values

Table 6. 2020 power generation prediction performance values