Improving Net Energy Metering (NEM) Actual Load Prediction Accuracy using an Adaptive Learning Rate LSTM Model for Residential Use Case

As an effort to promote renewable energy-based power generation, one of Malaysia's initiatives is the net energy metering (NEM) scheme. One shortcoming of residential photovoltaic (PV) systems under the NEM scheme is that they are equipped with smart meters only, so the actual load profiles of residential consumers remain unknown. Accurate load prediction for NEM consumers is crucial for optimizing energy consumption and effectively managing net metering credits. This study proposes a new model that combines an adaptive learning rate with Long Short-Term Memory (LSTM) to predict the solar output power, which is then used to predict the actual load of NEM residential consumers. The proposed model is trained and tested on historical time series data of projected PV power and weather conditions at the GPS location of the PV system. Its outcome is compared with other forecasting models, such as ARIMA and regression methods. The proposed model outperforms these traditional forecasting models, achieving a Root Mean Square Error (RMSE) of 0.1942.


Introduction
In line with its commitments under the Paris Agreement, which addresses environmental issues, Malaysia intends to reduce its greenhouse gas (GHG) emissions intensity by 45% by 2030. This reflects Malaysia's efforts to promote power generation from renewable resources, most commonly solar, wind, and water. To achieve a low-carbon future and address energy security, efficiency, and demand management, several key policies and incentives have been introduced [1]-[4].
One of them is the Net Energy Metering (NEM) scheme for residential solar photovoltaic (PV) systems. Introduced in 2016, the NEM scheme has emerged as a pivotal policy to incentivize the adoption of renewable energy by allowing consumers to receive credits for surplus energy exported to the grid. As shown in Fig. 1, under NEM the energy produced by the installed solar PV system is consumed first, and any excess is exported to the electric utility provider [5]-[6]. However, unlike commercial and industrial sites, which have independent meters to measure solar power parameters, a residential NEM PV system has no separate meter to monitor and measure the PV output power. The supply and load parameters are obtained via the Advanced Metering Infrastructure (AMI), i.e., smart meters, from which only the electrical power imported from and exported to the grid can be extracted. The actual load consumed by the customer is therefore unknown. Because the household load equals the PV generation plus the imported power minus the exported power, accurate and reliable forecasting of solar power is needed to recover the actual load profiles of residential solar PV consumers. This information serves as a backup and helps prepare for future demand as the penetration of residential solar power increases, or during an emergency. Accurate load prediction for NEM consumers is crucial for optimizing energy consumption and effectively managing net metering credits [7]-[8].
Before load prediction is attempted, it is essential to forecast the global horizontal irradiance (GHI) and the solar output power. These two variables play essential roles in predicting the actual load of NEM consumers more accurately, as they describe the pattern or trend of solar energy production. GHI, PV power, and load forecasting are time series problems, since these quantities vary with time. Time series forecasting means predicting future values (short- or long-term) from historical data over a defined period [9]. Time series forecasting techniques fall into two main categories: univariate and multivariate approaches [10,11]. Univariate forecasting uses only a single variable, such as the historical PV power output, to make predictions, capturing the temporal dependencies and patterns inherent in that series. Multivariate forecasting, on the other hand, incorporates additional variables, such as weather data, solar irradiance, ambient temperature, and other relevant factors, to enhance the accuracy and robustness of the predictions. By considering multiple variables, multivariate models can capture the complex relationships and dependencies between the different factors influencing PV power generation.
Several studies on load forecasting and prediction models for NEM consumers have been conducted in various countries, with traditional approaches such as ARIMA and regression-based methods widely employed. However, these methods often struggle to capture the non-linear dynamics and temporal dependencies present in energy consumption and generation. Machine learning, including deep learning, is now very popular in PV power and load forecasting due to its high prediction accuracy and its ability to handle large and complex data. Long Short-Term Memory (LSTM), a type of recurrent neural network (RNN), has shown promising results in time series forecasting, especially short-term PV power forecasting. Hence, leveraging its ability to capture temporal patterns and dependencies, this work proposes an adaptive learning rate LSTM model with an optimized window size for actual load prediction of residential NEM consumers in the Malaysian energy landscape.

Related Works
Deep learning algorithms have been widely studied for GHI, power, and load forecasting. These approaches use neural networks to capture complex patterns and relationships within the data. LSTM-based models are the most commonly used for forecasting, since they have shown promise in improving prediction accuracy. Several recent works on GHI, power, and load prediction via LSTM-based models are summarized below.
For instance, [12] applied the basic LSTM model to short-term forecasting of the actual load for demand response management in India. [13] demonstrated one-hour-ahead PV power forecasting using a hybrid LSTM-based model. [14] applied the LSTM model to predict short-term PV power output using weather forecast data. [15] employed an LSTM-based deep learning model to forecast PV power generation with high accuracy, incorporating historical PV power data and weather forecast information. [16] predicted PV power and load demand using the LSTM model with different weather parameters such as air density, temperature, cloud cover, and irradiance. [17] studied short-term solar power forecasting using a hybrid LSTM-TCN model with only historical PV power data. [18] forecasted GHI using LSTM based on historical data. [19] compared LSTM and Random Forest methods for PV power forecasting. These works suggest that LSTM predicts better than the other models they compared and handles multivariate data more effectively.
LSTM-based models have been tested for GHI and power forecasting in numerous studies. However, experiments applying LSTM-based models to load prediction remain limited, especially predictions tailored to the residential NEM scheme in Malaysia. This paper addresses this research gap by adapting and optimizing the LSTM model for NEM actual load prediction in the Malaysian context.

Weather Data Collection
For this study, the GHI and weather data were gathered from two independent fee-based databases, the National Solar Radiation Database (NSRDB) and SOLCAST [20,21]. Four years of weather data were extracted for the location of the residential PV consumers in West Melaka, Malaysia. Fig. 2 shows the geographic location of the selected residential PV consumers, highlighted in red. The main data extracted from SOLCAST are GHI, cloud opacity, air temperature, wind speed, and wind direction, while NSRDB provides other key features such as beam and diffuse irradiance.
Based on previous studies [15][16][22][23][24], including other weather data such as irradiance, ambient temperature, and other correlated features allows the model to learn the relationships between features better, leading to more accurate predictions. However, it is equally important to feed the model only related features and discard irrelevant ones to avoid overfitting. This can be done with the aid of correlation analysis, whereby only positively correlated features are chosen as inputs.

Fig. 3. Heatmap Correlation Analysis of SOLCAST and NSRDB data
To obtain the best combination of input features for the proposed model, a heatmap correlation analysis is performed on both the SOLCAST and NSRDB datasets. The heatmap shows related features in shades of red: the darker the shade, the more strongly the features are correlated. These features lie in the positive range of the map, i.e., with a correlation coefficient above 0. From the heatmap in Fig. 3, GHI is positively correlated with the features highlighted in red, and only these features are selected as inputs to the proposed model. Using only the most impactful features simplifies input feature selection and reduces the risk of overfitting, which improves accuracy and shortens computation time.
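The selection rule described above (keep a feature only if its correlation with the target is positive) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes Pearson correlation (the coefficient a standard correlation heatmap usually shows), and the function name, the optional `threshold` argument, and the toy data are hypothetical. A threshold of 0 reproduces the "above 0" rule in the text; the example uses 0.3 only to show a stricter cut-off.

```python
import numpy as np

def select_positive_features(X, y, names, threshold=0.0):
    """Keep features whose Pearson correlation with the target exceeds `threshold`.

    X: (n_samples, n_features) array of candidate weather features.
    y: (n_samples,) target series, e.g. GHI.
    names: feature names, one per column of X.
    Returns {name: correlation} for the selected features.
    """
    selected = {}
    for j, name in enumerate(names):
        # np.corrcoef returns the 2x2 correlation matrix of the two series;
        # the off-diagonal entry is the Pearson coefficient (nan for a constant column).
        r = np.corrcoef(X[:, j], y)[0, 1]
        if r > threshold:
            selected[name] = float(r)
    return selected

# Hypothetical toy data: one positively correlated, one negatively correlated,
# and one unrelated feature.
rng = np.random.default_rng(0)
ghi = rng.uniform(0, 1000, size=500)
beam = 0.8 * ghi + rng.normal(0, 50, size=500)
cloud = -0.5 * ghi + rng.normal(0, 100, size=500)
noise = rng.normal(0, 1, size=500)
X = np.column_stack([beam, cloud, noise])
print(select_positive_features(X, ghi, ["beam", "cloud_opacity", "noise"], threshold=0.3))
```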

NEM data collection
Residential NEM consumers have only smart meters connected to their PV systems. These meters record only the power imported from and exported to the grid. The import and export values are the base parameters in this study and are used to predict the actual load pattern of the selected consumers. These values were extracted from the database of Malaysia's main utility provider for the years 2020 to 2022.

Solar irradiance distribution analysis
The following analysis examines the distribution of solar irradiance (GHI) in the selected area, West Melaka, Malaysia. Fig. 4 is a boxplot of GHI by hour of day over every day of 2022. Irradiance is consistently detected from 8 a.m. to 6 p.m. and peaks between 12 p.m. and 1 p.m. every day in the selected area. This indicates a strong correlation between GHI and the time of day, which makes time a useful input to the proposed model. Based on this finding, the distribution of GHI at the peak hour (12 p.m.) of each day throughout 2022 is studied, as shown in Fig. 5. This plot reveals that peak-hour GHI shows no major variations, so its availability is again consistent throughout the year.
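A boxplot such as Fig. 4 summarizes, for each hour of the day, the quartiles of all GHI readings taken at that hour. The sketch below computes those per-hour quartiles with NumPy and picks the hour with the highest median. It is illustrative only: the synthetic "year" of data (a daylight bell curve with random cloud attenuation) is an assumption standing in for the SOLCAST/NSRDB series, and `hourly_quartiles` is a hypothetical helper.

```python
import numpy as np

def hourly_quartiles(hours, ghi):
    """Return {hour: (q1, median, q3)} of GHI for each hour of day present in the data."""
    hours = np.asarray(hours)
    ghi = np.asarray(ghi, dtype=float)
    stats = {}
    for h in range(24):
        values = ghi[hours == h]
        if values.size:
            q1, med, q3 = np.percentile(values, [25, 50, 75])
            stats[h] = (float(q1), float(med), float(q3))
    return stats

# Synthetic, hypothetical hourly data for 365 days: zero at night, a bell-shaped
# daylight profile peaking at 13:00, scaled by random cloud attenuation.
rng = np.random.default_rng(1)
hours = np.tile(np.arange(24), 365)
shape = np.clip(np.sin(np.pi * (hours - 7) / 12), 0, None)
ghi = 1000 * shape * rng.uniform(0.4, 1.0, size=hours.size)

stats = hourly_quartiles(hours, ghi)
peak_hour = max(stats, key=lambda h: stats[h][1])
print("hour with highest median GHI:", peak_hour)
```

The same helper applied to real data would reproduce the numbers behind each box in Fig. 4; filtering to a single hour (e.g. 12) across days gives the series plotted in Fig. 5.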

The Proposed Model
In this study, an adaptive learning rate LSTM model with an optimized window size is proposed for forecasting the actual load of residential NEM consumers.The integration of the adaptive learning rate aims to improve the performance and efficiency of the LSTM model in capturing the complex patterns and temporal dependencies in the selected input data.
The proposed LSTM model for NEM actual load prediction in Malaysia consists of multiple LSTM layers followed by a fully connected layer. The model takes as input the historical energy consumption, renewable energy generation, and other contextual features. The LSTM layers capture temporal patterns and dependencies, while the fully connected layer produces the load prediction. To enhance the model's performance, an adaptive learning rate mechanism is incorporated, which adjusts the effective learning rate during training based on the observed gradients.
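To make the "stacked LSTM layers followed by a fully connected layer" architecture concrete, the NumPy sketch below implements the forward pass of a standard LSTM cell (input, forget, candidate, and output gates), stacks two such layers, and feeds the last hidden state into a dense output unit. It is a sketch under stated assumptions: the layer sizes (32 and 16 units), the weight initialization, and all class names are hypothetical, the weights are untrained, and in practice such a model would be built and trained (with Adam) in a deep learning framework rather than hand-written.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTMLayer:
    """One LSTM layer (forward pass only) with input, forget, candidate and output gates."""

    def __init__(self, n_in, n_hidden, rng):
        limit = 1.0 / np.sqrt(n_hidden)
        # Weights for all four gates stacked column-wise in the order [i, f, g, o].
        self.W = rng.uniform(-limit, limit, size=(n_in + n_hidden, 4 * n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.n_hidden = n_hidden

    def forward(self, x_seq):
        """x_seq: (timesteps, n_in) -> hidden states of shape (timesteps, n_hidden)."""
        h = np.zeros(self.n_hidden)
        c = np.zeros(self.n_hidden)
        outputs = []
        for x_t in x_seq:
            z = np.concatenate([x_t, h]) @ self.W + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            g = np.tanh(g)
            c = f * c + i * g        # forget part of the old cell state, add new content
            h = o * np.tanh(c)       # expose a gated view of the cell state
            outputs.append(h)
        return np.array(outputs)

class StackedLSTMRegressor:
    """Stacked LSTM layers followed by a fully connected (dense) output layer."""

    def __init__(self, n_features, hidden_sizes, rng):
        sizes = [n_features] + list(hidden_sizes)
        self.layers = [LSTMLayer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
        self.w_out = rng.normal(0.0, 0.1, size=hidden_sizes[-1])
        self.b_out = 0.0

    def predict(self, window):
        """window: (timesteps, n_features) -> scalar one-step-ahead prediction."""
        seq = np.asarray(window, dtype=float)
        for layer in self.layers:
            seq = layer.forward(seq)
        # Only the final hidden state of the top layer feeds the dense layer.
        return float(seq[-1] @ self.w_out + self.b_out)

rng = np.random.default_rng(42)
model = StackedLSTMRegressor(n_features=5, hidden_sizes=[32, 16], rng=rng)
window = rng.normal(size=(5, 5))  # 5 hourly steps x 5 weather features (random placeholders)
print(model.predict(window))
```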
This section describes the training process of the proposed model using the collected historical data, which considers both energy consumption and renewable energy generation as input features.The training process includes optimizing the learning rate to enhance the model's performance and convergence.
To train the LSTM model, the target label 'AC System Output' (solar power generation) and the multivariate features are fed into the model during training and evaluation. 'AC System Output' serves as the ground truth that the model learns to predict.
The multivariate features are 'Beam Irradiance (W/m2)', 'Diffuse Irradiance (W/m2)', 'Cloud Opacity', 'Ambient Temperature (C)', and 'Cell Temperature (C)'. These features describe the weather conditions and solar irradiance levels, which are crucial for accurately predicting solar power generation and, in turn, the actual energy consumption.
During training, the LSTM model takes a sequence of historical values of the multivariate features and learns to predict the 'AC System Output'. The sequence is generated using a sliding window, where the window size determines the number of time steps in each input sequence.
Consider a sliding window size of 5, corresponding to 5 consecutive hourly time steps of historical data. The LSTM model takes this sequence of multivariate features as input and learns to predict the 'AC System Output' at the next time step. Repeating this over many input sequences allows the model to learn the patterns and dependencies in the data over time.
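The sliding-window construction described above can be sketched in a few lines of NumPy. The sketch assumes hourly rows with no gaps and one-step-ahead targets, i.e. each window of `window_size` feature rows is paired with the 'AC System Output' of the hour immediately after it; the function name and toy arrays are illustrative.

```python
import numpy as np

def make_windows(features, target, window_size=5):
    """Build (X, y) pairs for one-step-ahead forecasting.

    features: (T, n_features) array of hourly multivariate inputs.
    target:   (T,) array, e.g. 'AC System Output'.
    Returns X of shape (T - window_size, window_size, n_features) and y of shape
    (T - window_size,), where y[k] is the target at the hour right after window X[k].
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y = [], []
    for start in range(len(features) - window_size):
        end = start + window_size
        X.append(features[start:end])
        y.append(target[end])
    return np.array(X), np.array(y)

# Toy example: 8 hours, 2 features, window size 5 -> 3 training samples.
feats = np.arange(16).reshape(8, 2)
power = np.arange(8) * 10.0
X, y = make_windows(feats, power, window_size=5)
print(X.shape, y)   # (3, 5, 2) [50. 60. 70.]
```

Choosing the window size trades context against the number of training samples: each extra hour of history gives the LSTM more temporal context but removes one sample from the start of the series.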
During evaluation, similar input sequences are fed into the trained LSTM model, which predicts the 'AC System Output' for each time step. These predictions are compared with the actual 'AC System Output' values to assess the model's accuracy in forecasting solar power generation.
By feeding the target label ('AC System Output') and the multivariate features into the LSTM model during training and evaluation, the model learns the relationships between weather conditions and energy generation. This approach enables the LSTM model to use historical data effectively to predict renewable energy generation and, from it, energy consumption.
Finally, using the solar power generation predicted by the proposed model, the actual load is determined with the formulae shown in the pseudocode in Fig. 6. The actual load prediction has three inputs: the predicted solar generation, the import power, and the export power. Power is imported whenever the generated solar power ('AC System Output') is insufficient to supply the electricity demand of a household. Conversely, when the solar power generated exceeds the household's actual consumption, the surplus is exported back to the grid. The overall methodology of this study is summarized in Fig. 7.
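The exact formulae of Fig. 6 are not reproduced in this text, so the sketch below uses the standard power balance at the household's point of connection instead: load = PV generation + imported power − exported power. This is an assumption and may differ in detail from the paper's pseudocode. Clipping negative results to zero, which can occur when the PV prediction is too low, is also an assumption made for illustration.

```python
import numpy as np

def actual_load(pv_pred, p_import, p_export):
    """Estimate household load from predicted PV output and smart-meter flows.

    Assumed power balance: load = PV generation + power imported - power exported.
    All inputs are equal-length arrays in the same unit (e.g. kW per hour).
    """
    pv_pred = np.asarray(pv_pred, dtype=float)
    p_import = np.asarray(p_import, dtype=float)
    p_export = np.asarray(p_export, dtype=float)
    load = pv_pred + p_import - p_export
    # An under-predicted PV value can make the balance negative; a load cannot be.
    return np.clip(load, 0.0, None)

# Night (no PV, import only), midday surplus (export), and a cloudy hour (PV plus import).
print(actual_load([0.0, 3.0, 1.0], [0.8, 0.0, 0.5], [0.0, 1.8, 0.0]))
```

The three example hours mirror the cases described above: at night the load equals the imported power, at midday it equals PV output minus the exported surplus, and in a cloudy hour PV and grid import both supply the load.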

Adaptive Learning Rate
The adaptive learning rate plays a crucial role in optimizing the learning process of the LSTM model.By dynamically adjusting the learning rate based on the model's performance and convergence behaviour, the training process becomes more effective, avoiding the risk of getting stuck in suboptimal solutions or experiencing slow convergence.This adaptive mechanism enables the LSTM model to adaptively learn from the data and improve its forecasting accuracy over time.
In this study, the Adam optimizer with a learning rate step of 0.02 is employed as the adaptive learning rate algorithm for the LSTM model. Adam is widely used in training deep learning models because it adapts the learning rate based on the gradients observed during training.
The Adam optimizer combines adaptive learning rates with momentum-based optimization. It effectively maintains a separate step size for each model parameter, scaling a global learning rate by estimates of the first and second moments of that parameter's gradients. These estimates are computed as exponentially decaying moving averages of the gradient and of its square.
This adaptive nature enables Adam to adjust the effective step size of each parameter automatically based on the observed behaviour of its gradients. If the gradients of a parameter consistently have large magnitudes, its step is scaled down, preventing overshooting of the optimal solution. Conversely, if the gradients are small, its step is scaled up relative to plain gradient descent, accelerating convergence.
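The moment estimates and per-parameter scaling described above correspond to the standard Adam update rule, sketched below in NumPy on a toy quadratic. The default hyperparameters (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly used Adam defaults and are assumptions here, since the paper does not list them; `lr=0.04` echoes the best learning rate reported in the results. The toy problem is hypothetical: its two coordinates have gradients differing by a factor of 10,000, yet Adam moves both by almost exactly `lr` on the first step.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to the parameter array `theta`.

    m, v: running estimates of the first and second gradient moments;
    t: 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # correct the bias from zero initialisation
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step: large recent gradients -> larger v_hat -> smaller step.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy quadratic f(theta) = sum(scale * theta**2) with very different gradient scales.
scale = np.array([100.0, 0.01])
theta = np.array([1.0, 1.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * scale * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.04)
print(theta)
```

Because the step is divided by the root of the squared-gradient average, the update is nearly invariant to rescaling a parameter's gradient; this is the sense in which Adam "adapts" the learning rate per parameter, while the global `lr` still has to be chosen.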
The adaptive learning rates provided by Adam offer several advantages. They reduce the need for careful manual tuning of the learning rate, since the per-parameter step sizes are adjusted dynamically during training. This adaptivity facilitates faster convergence and improved stability, particularly in optimization landscapes characterized by sparse gradients or gradients of varying magnitudes.
In the proposed model, the adaptive learning rate mechanism of the Adam optimizer contributes to the overall effectiveness of training. It enables the model to learn effectively from the collected historical data, considering both energy consumption and renewable energy generation as input features. By adjusting the learning rate dynamically, the model navigates the training process more efficiently, leading to higher accuracy in forecasting energy consumption and renewable energy generation.

Model Evaluation
In this paper, the Root Mean Square Error (RMSE) is used to evaluate the performance of the actual load prediction model. The RMSE, given by (1), is a commonly used metric for the average magnitude of the errors between predicted and actual values. It is the square root of the mean of the squared differences between each predicted value and its corresponding actual value, so it reflects the overall accuracy of the forecasts while penalizing large errors quadratically. A lower RMSE indicates better accuracy and closer agreement between the predicted and actual values.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2} \qquad (1)$$

where $N$ is the number of samples, $\hat{y}_i$ is the predicted value, and $y_i$ is the actual value. The mean squared error (MSE), also reported in the results, is the quantity under the square root in (1). To evaluate the performance of the proposed model, its results are compared with those of other forecasting models used in the literature; this comparative analysis provides insight into the effectiveness and competitiveness of the proposed model.
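Equation (1) and the accompanying MSE can be computed directly; the short sketch below uses plain Python with hypothetical example values. It also makes it easy to check that the reported metrics are consistent, since the MSE must equal the RMSE squared (0.19424² ≈ 0.03773).

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (1): the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.0, 4.0]
print(mse(actual, predicted))   # 0.3125
print(rmse(actual, predicted))
```

Note how the single error of 1.0 contributes 1.0 to the squared sum while the error of 0.5 contributes only 0.25: this is the quadratic penalty on large errors mentioned above.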

Prediction of solar power generation
As illustrated in Fig. 8, the proposed LSTM model predicts the generated solar power ('AC System Output') accurately: the predicted values closely follow the actual values. The input features chosen for solar power prediction are air temperature, cloud opacity, relative humidity, wind direction, wind speed, beam irradiance, diffuse irradiance, ambient temperature, plane-of-array irradiance, and cell temperature. The output in Fig. 8 indicates that incorporating additional weather data into the prediction model helps to improve its performance, accuracy, and efficiency. The best configuration achieves an RMSE of 3.8471 and an MSE of 14.8004 with a learning rate of 0.04.

Prediction and Comparison of the Actual Load
The actual load is predicted by combining the predicted solar power generation with the import and export power distributions. Fig. 9 shows the distribution of power imported from the grid. As highlighted in green in Fig. 9, power is mostly imported during the "no sun" period (no GHI and no solar power generation), i.e., from the evening until the early morning of the next day. Fig. 10 shows the distribution of power exported to the grid when the energy generated is in excess. Export generally happens during the daily peak, i.e., around noon, as shown in green in Fig. 10. Fig. 11 shows the distribution of the predicted solar power generation ('AC System Output'), which is the output of the proposed LSTM model using all the key features from Fig. 3 as inputs.
Using these observations on the import power, export power, and predicted solar power distributions, the actual load is then predicted. The parameters used in the actual load prediction are summarized in the parameter table. The best actual load distribution obtained is shown in Fig. 12 and is compared with other models and their respective parameters in Fig. 13. Fig. 12 shows that the projected load follows the import power whenever the import power is higher than the solar generation. If the predicted value is higher than the export power, the actual load equals the predicted value minus the export power. Lastly, the projected load follows the export power if the export power is higher than the predicted value. The predicted distribution in Fig. 12 shows that the actual load is closely related to the generated solar power. This supports the earlier hypothesis that load prediction improves when solar irradiance, solar power, and other related weather data are included. As the final step, the performance of the proposed LSTM model is compared with Gradient Boosting Regression, Random Forest Regression, ARIMA, and Linear Regression, as illustrated in Fig. 13. Based on the comparison table, the adaptive learning rate LSTM with an optimized window size outperforms all the other models, with the lowest RMSE of 0.19424 and MSE of 0.03773.

The comparison table also shows the percentage improvement in prediction achieved by the proposed LSTM model over the other models used in this study. This supports the view that LSTM is among the most accurate deep learning methods for this forecasting task. As expected, integrating an adaptive learning rate mechanism into the LSTM model enhanced its ability to adapt to the data characteristics specific to the Malaysian NEM scheme. This adaptation is crucial for capturing the dynamic patterns of load consumption and renewable energy generation in Malaysia. In addition, by determining the optimal window size, the model can capture the relevant historical observations within the Malaysian NEM scheme, which is a critical factor in improving load prediction accuracy.
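The text does not state how the percentage improvement is defined. A common choice, assumed here, is the relative reduction in RMSE with respect to each baseline. In the sketch below, only 0.19424 comes from the results; the baseline RMSE of 0.25 is a hypothetical placeholder, not a value from the comparison table.

```python
def improvement_pct(rmse_baseline, rmse_proposed):
    """Relative RMSE reduction of the proposed model over a baseline, in percent."""
    if rmse_baseline <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (rmse_baseline - rmse_proposed) / rmse_baseline

# Hypothetical baseline RMSE of 0.25 versus the reported 0.19424.
print(round(improvement_pct(0.25, 0.19424), 2))
```

A negative value would indicate that the proposed model is worse than that baseline.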

Conclusion and future works
This paper has proposed an LSTM model for actual load prediction under the NEM scheme in Malaysia. Integrating an adaptive learning rate mechanism and an optimized window size allows the LSTM model to capture the complex dynamics of load consumption and renewable energy generation specific to the Malaysian context. The proposed model outperforms the traditional methods tested in this study, providing more accurate load forecasts that can contribute to the successful implementation of the NEM scheme in Malaysia.
Accurate load prediction for NEM consumers in Malaysia is crucial for optimizing energy consumption, managing net metering credits, and promoting the adoption of renewable energy. This study suggests that the proposed LSTM model can help consumers and Grid System Operators (GSO) make informed decisions about energy usage patterns, further contributing to Malaysia's sustainable development goals.
Although the LSTM model shows promising results, it may face limitations in handling extreme load variations or unforeseen events specific to the Malaysian NEM scheme. Future work should therefore incorporate additional contextual factors, such as grid constraints or policy changes, to enhance the model's predictive capabilities in the Malaysian context.

Acknowledgement
This work was supported by Tenaga Nasional Berhad (TNB) and UNITEN R&D through the TNB Seeding Fund under the project code U-TD-RD-21-21.

Fig. 8. Predicted solar generation values versus the actual generation values.

Gradient Boosting Regression
• Number of boosting stages (no. of estimators) = 100
• Learning rate for each boosting iteration = 0.1
• Maximum depth of each decision tree = 3
• Subsample fraction used for fitting the individual trees = 1.0
Random Forest Regression
• Number of decision trees in the random forest = 100
• Maximum depth of each decision tree = 10
• Minimum number of samples required to split an internal node = 2
• Minimum number of samples required at a leaf node = 1
ARIMA
• Order of the autoregressive part, p = 1
• Order of differencing, d = 0
• Order of the moving-average part, q = 1
Linear Regression
• No specific parameters are set for the linear regression model.
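For reproducibility, the baseline settings listed above can be collected in one configuration object. The paper does not name the libraries it used, so mapping these settings to scikit-learn and statsmodels argument names is an assumption. Under that mapping, the gradient-boosting values coincide with scikit-learn's defaults, and with d = 0 the ARIMA(1, 0, 1) model reduces to an ARMA(1, 1) model. `build_sklearn_models` is a hypothetical helper that only constructs the regressors when scikit-learn is installed.

```python
# Baseline settings from the list above, keyed by the scikit-learn / statsmodels
# argument names they most likely correspond to (an assumption; the paper does
# not name the libraries used).
BASELINE_PARAMS = {
    "GradientBoostingRegressor": {
        "n_estimators": 100,
        "learning_rate": 0.1,
        "max_depth": 3,
        "subsample": 1.0,
    },
    "RandomForestRegressor": {
        "n_estimators": 100,
        "max_depth": 10,
        "min_samples_split": 2,
        "min_samples_leaf": 1,
    },
    # statsmodels' ARIMA takes the (p, d, q) order as a single tuple.
    "ARIMA": {"order": (1, 0, 1)},
    "LinearRegression": {},  # library defaults
}

def build_sklearn_models(params=BASELINE_PARAMS):
    """Instantiate the regression baselines, or return None if scikit-learn is absent."""
    try:
        from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
        from sklearn.linear_model import LinearRegression
    except ImportError:
        return None
    return {
        "GradientBoostingRegressor": GradientBoostingRegressor(**params["GradientBoostingRegressor"]),
        "RandomForestRegressor": RandomForestRegressor(**params["RandomForestRegressor"]),
        "LinearRegression": LinearRegression(**params["LinearRegression"]),
    }
```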

Table. Comparison between the proposed model and other existing forecasting models