Machine learning based modeling for estimating solar power generation

The solar power plant is a rapidly growing renewable energy source with a potential role in mitigating climate change and replacing fossil fuels. Estimating the power generated by a solar power plant is required to determine the energy supply. Unfortunately, solar power generation is highly uncertain because it depends strongly on natural conditions such as solar radiation and weather, which makes the estimation very difficult. This study presents the development of a machine learning model of a solar power plant for estimating the generated power. The model is built by implementing the k-NN algorithm and is trained on a data set of power generated in a solar power plant. Simulation test results show that the model was able to estimate the generated solar power with an accuracy of 69.6%. The developed model is useful in feasibility studies to estimate the potential of solar power resources in an area.


Introduction
The world has recently faced the challenges of climate change and depleting fossil energy reserves. Searching for clean and renewable energy sources has become a global priority. Clean and renewable energy is needed to save our planet and ensure a sustainable future for the next generation. Several potential sources provide clean and renewable energy, such as solar, wind, hydropower, and geothermal. Solar energy is one of the most abundant renewable energy sources. It is produced through nuclear fusion in the Sun, where protons of hydrogen atoms collide violently and fuse to create helium atoms. The energy produced is emitted by the Sun at a rate of 3.8 × 10^23 kW as light and heat. About 1.8 × 10^14 kW of this energy goes to the earth; about 60% reaches the earth surface, while 40% is reflected to space or absorbed by the atmosphere [1]. The earth surface therefore receives about 1.08 × 10^14 kW from the Sun in the form of heat and light. According to [2], total world energy consumption in 2018 was about 14000 Mtoe, which is equivalent to 1.6282 × 10^8 GWh, or an average power of roughly 1.9 × 10^10 kW over the year. This comparison shows the large potential of solar power to cover the demand for clean and renewable energy.
Solar energy has been used since ancient times. Traditionally, solar energy is used directly in the form of heat, such as for drying food, clothing, or other materials. In this modern era, solar energy is converted into electrical energy so that it can be used for various purposes, either directly or indirectly (as stored energy). Photovoltaics are devices that convert solar irradiation directly into electricity [3]. Solar irradiation is defined as the amount of solar radiation received per unit of area, while solar radiation refers to the total amount of energy emitted by the Sun. A unit element of a photovoltaic device is known as a photovoltaic cell or a solar cell. When these cells are exposed to sunlight, photons are absorbed and an electrical current begins to flow once the circuit between the two poles is closed. To produce a sufficient amount of electricity, many photovoltaic cells are combined to build a solar panel. To capture solar energy and produce electrical energy, the solar panel is integrated with other components, such as a controller, an inverter, and a battery, to construct a solar panel system.
The performance of a solar panel system is indicated by the electrical energy produced. This performance is influenced by several factors that can be classified into two categories, internal and external. An internal factor is any factor that influences the amount of energy produced through the components used and the configuration of the system. Internal factors include the quality of each component of the system, the configuration of the system, and the installation. The solar panel system should be installed such that it receives the maximum intensity of sunlight. A solar tracker device is usually applied to direct the solar panel so that it follows the movement of the Sun [4]. Studies on the use of a solar tracker in a solar panel system show an increase in power output in the range of 31% to 82% compared to a fixed solar panel [5][6][7][8]. However, solar tracking entails an additional cost for the system, and therefore the economic factor should be considered [9]. Among the internal factors, the photovoltaic material is the most influential for energy output. Higher energy output of a solar panel can be achieved by using a higher efficiency photovoltaic material. A comprehensive report on the efficiency of different types of photovoltaic materials was presented in [10]. The perovskite single junction photovoltaic cell (PSC) and the organic photovoltaic material-based single junction photovoltaic cell (OPV) are the solar panel materials that produce the highest efficiency. These internal factors are deterministic: higher performance of the solar panels can be achieved using a better material and a better configuration. On the other hand, the external factors of solar power generation are nondeterministic. The external factors come from nature, such as weather, solar irradiation, solar spectrum, ambient temperature, wind, and humidity [11][12][13]. The external factors introduce uncertainty into solar power generation.
Estimation of the power produced in a solar system is very important to determine the power supply capacity. Estimation can be done through modeling of the solar system. Modeling is a process to develop a model that represents a relationship between input and output. Modeling the solar system is not a simple problem, as the power produced by the solar panel is influenced by many factors.
There are several methods of modeling, and one of them is data-driven modeling using machine learning. This is black-box modeling, where the model is built through a learning process using pairs of input-output data. Machine learning is a very useful and powerful method in system modeling [14]. It has been applied in the modeling of many different systems, including complex systems. The study in [15] presented a comprehensive review of energy system modeling using machine learning. Modeling of fluid mechanics using machine learning was presented in [16]. Modeling using machine learning promises the advantage of high accuracy.
The advantage of machine learning in data-driven modeling motivates this study to develop a machine learning model to estimate power generation in a solar power plant. The model is developed by implementing the k-NN algorithm. A solar power system data set that includes the generated power and the weather is utilized to train the machine. The presentation of this study is organized as follows. Section I provides an introduction that describes the background, motivation, related studies, and objective of the study. Section II describes the methods, which include a brief description of the solar panel system, system modeling, machine learning based modeling, the k-NN algorithm, the data set, and machine learning development. Section III presents the results of the study and discussion. Finally, the conclusion of this study is given in Section IV.

Solar panel system
Solar panel systems have been applied in many fields, such as agriculture [17]. Figure 1 shows a diagram of a solar panel system. The system consists of a solar panel, a charge controller, a battery, an inverter, and electric loads. The solar panel converts sunlight into electricity. The charge controller has two main functions: it charges the battery and it prevents the battery from overcharging. It also blocks any reverse current from flowing from the battery back into the solar modules at night [18]. The battery stores the electric energy, normally at 12 Volts DC. The stored energy can be used directly to power any device that works at 12 Volts DC. Alternatively, the stored energy is converted to the grid voltage, which is commonly 110 or 220 Volts AC. An inverter performs this conversion. The loads in the figure represent any devices that consume electrical power, such as lamps, computers, and refrigerators. The power generated by the solar system depends on several factors, such as the intensity of sunlight and the weather. Several models have been introduced to approximate the power, and one of the models is described below [19].
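$$I = I_{ph} - I_0 \left[ \exp\left( \frac{V}{A V_T} \right) - 1 \right] \qquad (1)$$
where $I$ is the output current of the cell, $V$ is the voltage imposed, $I_0$ is the leakage current, $I_{ph}$ is the photocurrent, $A$ is the ideality factor of the diode, and $V_T$ is the thermal voltage of the diode. The generated power follows as the product of $V$ and $I$. The thermal voltage is defined by the following equation:
$$V_T = \frac{kT}{q} \qquad (2)$$
where $q$ is the unit of electrical charge, $k$ is the Boltzmann constant, and $T$ is the temperature of the cell in kelvin.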
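As a rough numerical illustration of this model, the sketch below evaluates the thermal voltage and the cell current in Python. It assumes the standard definition of the thermal voltage (Boltzmann constant times temperature divided by the unit charge) and the ideal single-diode relation; the function names and the parameter values (photocurrent, leakage current, ideality factor, temperature) are hypothetical and are not taken from [19].

```python
import math

Q = 1.6021e-19    # unit of electrical charge in coulombs
K_B = 1.3806e-23  # Boltzmann constant in J/K (m^2 kg s^-2 K^-1)


def thermal_voltage(temp_kelvin):
    """Thermal voltage of the diode, assumed here to be V_T = k*T/q."""
    return K_B * temp_kelvin / Q


def cell_current(v, i_ph, i_0, ideality, temp_kelvin):
    """Ideal single-diode form (assumed): I = Iph - I0*(exp(V/(A*VT)) - 1)."""
    v_t = thermal_voltage(temp_kelvin)
    return i_ph - i_0 * (math.exp(v / (ideality * v_t)) - 1.0)


# Hypothetical cell parameters, chosen only for illustration.
for v in (0.0, 0.3, 0.6):
    i = cell_current(v, i_ph=5.0, i_0=1e-9, ideality=1.3, temp_kelvin=298.15)
    print(f"V = {v:.1f} V, I = {i:.4f} A, P = {v * i:.4f} W")
```

The loop prints the current and the power (voltage times current) at three voltages, which shows how the current stays close to the photocurrent at low voltage and drops as the diode term grows.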

System modeling
A system is basically a process that transforms input into output and is therefore represented by the following equation:
$$y = f(x) \qquad (3)$$
where $x$ is the input, $f(x)$ is a function representing the process, and $y$ is the output. The input and output of the system are known variables that can be obtained through measurements, and both can be scalar or vector. On the other hand, the function $f(x)$ that transforms the input into the output is unknown. System modeling deals with obtaining a model that approximates the function $f(x)$ based on the input and output data of the system. Figure 2 shows a diagram of how system modeling is carried out. It is a reconstruction of how the output was generated from the given input.

Fig. 2. System modeling based on mathematical approach.

Mathematically, the model is represented as follows:
$$\hat{y} = \hat{f}(x) \qquad (4)$$
where $\hat{y}$ is the output of the model and $\hat{f}(x)$ is the approximated function of the model. The model receives the same input data as the system and generates output based on the approximated function. The model output is compared to the system output, and the difference is known as the model error, given as follows:
$$e = y - \hat{y} \qquad (5)$$
where $e$ is the model error. The goal of system modeling is to obtain $\hat{f}(x)$ such that $e$ is minimized to an acceptable value. System modeling is an iterative process in which the error is used to refine the model until a minimum error is achieved. System modeling is conventionally done with mathematical methods, such as auto-regressive (AR), auto-regressive moving average (ARMA), and auto-regressive integrated moving average (ARIMA) models. Alternatively, system modeling can be done using artificial intelligence, including machine learning.
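The idea of approximating an unknown function from input-output data and measuring the model error can be sketched with a very small example. The following Python snippet is only an illustration: the measurement values are made up, and a straight line fitted by ordinary least squares stands in for the approximated function; it is not one of the AR, ARMA, or ARIMA methods mentioned above.

```python
# Hypothetical input-output measurements of an unknown system.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

# Fit the model y_hat = a*x + b by ordinary least squares, which
# chooses a and b so that the sum of squared errors is minimal.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
a = s_xy / s_xx
b = mean_y - a * mean_x


def model(x):
    """Approximated function of the model."""
    return a * x + b


# Model error for every measurement: system output minus model output.
errors = [y - model(x) for x, y in zip(xs, ys)]
print(f"a = {a:.3f}, b = {b:.3f}")
print("errors:", [round(e, 3) for e in errors])
```

The printed errors are the quantities that a modeling procedure tries to keep small; a more flexible model would be considered if they were unacceptably large.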

Machine learning based modeling
The development of computer technology has introduced artificial intelligence. Artificial intelligence is a computer program that mimics human intelligence. Machine learning is a type of artificial intelligence in which the intelligence is built through a learning process based on a data set. Machine learning is widely applied in many different fields, such as engineering, economics, medicine, and social science. One of the applications is system modeling. A machine learning model is a black box model that can be trained to learn the behavior of a system based on the input and output data of the system. The learning process builds the ability of the machine to mimic the behavior of the system, which makes machine learning applicable in system modeling. Figure 3 shows a diagram of system modeling using machine learning. This modeling is similar to the modeling based on mathematics. The difference is that machine learning based modeling does not result in any mathematical equations, as it is a black box model. The model is refined through the learning process by using the error. Several algorithms can be used in the learning process, such as k-nearest neighbors (k-NN), support vector machine (SVM), multilayer perceptron, decision trees, and random forest.

The k-nearest neighbors algorithm
The k-nearest neighbors (k-NN) algorithm is a simple yet powerful machine learning algorithm that can be used to solve classification and regression problems. The k-NN is a non-parametric, instance-based learning algorithm that does not require assumptions about the underlying data distribution. The algorithm works based on the k closest neighbors of the data point of concern. The k closest neighbors of a data point q are the k data points located at the shortest distance from q. Calculating the distance between data points in a vector space is therefore required in the k-NN algorithm. Several distance metrics can be used, and one of them is the Euclidean distance. The Euclidean distance between two data points, p and q, is defined as follows:
$$d(p, q) = \sqrt{(p - q)^T (p - q)} \qquad (6)$$
where $d$ is the distance and the superscript $(\cdot)^T$ is the vector transpose operator. Once the distances between the data point of concern, q, and the other data points have been calculated, the distances are sorted and only the k data points with the shortest distance are considered. Figure 4 shows an illustration of the three closest neighbors of data points q1 and q2.

In performing classification, the k-NN determines the class of the data point q based on the majority class of the k closest neighbors. The majority class can be obtained through a voting mechanism given by the following function:
$$V(y_j) = \sum_{c=1}^{k} h(y_j, y_c) \qquad (7)$$
where $V$ is the vote function for class $y_j$ by the neighbors $x_c$, and $y_c$ is the class of neighbor $x_c$. The function $h(y_j, y_c)$ is defined as follows:
$$h(y_j, y_c) = \begin{cases} 1, & y_j = y_c \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$
that is, it returns 1 if $y_j$ and $y_c$ match, indicating that they belong to the same class, and returns 0 otherwise. The data point q is assigned the class with the most votes.

For the regression task, the k-NN algorithm calculates the output value of q as the average value of the k nearest neighbors:
$$y_q = \frac{1}{k} \sum_{i=1}^{k} y_i \qquad (9)$$
where $y_q$ is the output value of the data point of concern q and $y_i$ is the output value of the data point $p_i$ that is included in the k closest neighbors.
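The steps described above (distance calculation, sorting, majority voting for classification, and averaging for regression) can be sketched in a few lines of plain Python. This is a minimal illustration with hypothetical data points and function names, not the implementation used in this study, which relies on the sklearn library.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two data points of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def k_nearest(points, q, k):
    """Indices of the k points with the shortest distance to q."""
    order = sorted(range(len(points)), key=lambda i: euclidean(points[i], q))
    return order[:k]


def knn_classify(points, labels, q, k):
    """Majority class among the k closest neighbors of q.
    Ties are resolved in favor of the class counted first."""
    votes = Counter(labels[i] for i in k_nearest(points, q, k))
    return votes.most_common(1)[0][0]


def knn_regress(points, targets, q, k):
    """Average target value of the k closest neighbors of q."""
    return sum(targets[i] for i in k_nearest(points, q, k)) / k


# Hypothetical two-dimensional data forming two well separated groups.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify(points, labels, (0.5, 0.5), k=3))
print(knn_regress(points, targets, (5.5, 5.5), k=3))
```

Because the algorithm is instance-based, there is no separate training step in this sketch: the stored data points are consulted directly each time an estimate is requested.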

Data set
A data set is a key element in the development of machine learning. It is the raw material from which the machine learning model is built. The data set is a collection of data that may consist of many rows and columns. The data in a row can be considered as a pair of input and output. The input and output data can be scalar or vector. In data science, the input and output data are known as the feature and the target, respectively. Modeling a solar system using machine learning requires a data set that consists of the input and output data of the solar panel. Basically, the main input of the solar panel is solar radiation and the main output is electrical power. However, several other factors, internal as well as external, that may influence the output of the solar system can be considered as input, such as weather, wind, and solar panel orientation.
In this study, a solar system data set that includes power production and weather conditions is used to build the machine learning model. The data set is secondary data taken from Kaggle [20]. It provides information about wind speed, sunshine, air pressure, radiation, air temperature, air humidity, and produced electrical power, measured every hour from the 1st of January until the 31st of December 2017. The data set consists of 8760 data rows. Table 1 shows sample data measured during one summer day.
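The number of rows is consistent with one measurement per hour over a non-leap year (365 days × 24 hours). The short Python sketch below reproduces this count and lists the feature and target roles; the column names are hypothetical labels for the quantities named above and are not necessarily the headers used in the Kaggle file, and the assumption that the first record falls at midnight on the 1st of January is made only for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical column names for the quantities in the data set.
FEATURES = ["wind_speed", "sunshine", "air_pressure",
            "radiation", "air_temperature", "air_humidity"]
TARGET = "power"

# Generate one timestamp per hour for the year 2017.
start = datetime(2017, 1, 1, 0)
end = datetime(2018, 1, 1, 0)  # exclusive upper bound
timestamps = []
t = start
while t < end:
    timestamps.append(t)
    t += timedelta(hours=1)

print(len(timestamps), "hourly records,", len(FEATURES), "features, target:", TARGET)
```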

Machine learning development
A machine learning model is developed based on the data set to model the production of electric power by the solar system. Wind speed, sunshine, air pressure, radiation, air temperature, and air humidity are defined as the input to the system, and the electrical power produced is the output of the system. The input and output of the system are the features and the target of the machine learning model, respectively. The development is done using the Python programming language. The flow chart of the program is shown in Figure 5. The program begins by importing the required libraries, including numpy, sklearn, pandas, and matplotlib. The data set is then loaded into the program and pre-processed by removing incomplete data. Next, the data columns that serve as the features and the target are defined. The features and target are paired data, which are then split into training data and testing data. The training data are used to build the intelligence of the machine through a learning process, while the testing data are used to validate that intelligence through an examination. The next step is to build the machine learning model using the sklearn library, where the structure and the learning algorithm are defined. The k-NN algorithm is applied as the learning algorithm. At this point, the machine does not yet have any intelligence. Intelligence is built through the training step, where the machine learns the relationship between the features and the target of the training data. The intelligence of the machine is validated by examining the machine with the test data. In the validation step, the machine is given the feature data contained in the testing data and is tasked with calculating the output, which is known as the estimated output. Machine performance is evaluated by comparing the estimated output with the actual output, which is the target of the test data. The difference between the estimated output and the actual output is defined as the error, which is used to determine the accuracy of the machine learning model.
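The workflow described above can be sketched with numpy and sklearn as follows. This is a minimal sketch under stated assumptions rather than the program of Figure 5: a synthetic array stands in for the Kaggle data set, the relationship between features and target is invented, incomplete rows are simulated, and pandas and matplotlib are omitted to keep the example short.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the data set: six feature columns, one target column.
n_rows = 500
features = rng.uniform(0.0, 1.0, size=(n_rows, 6))
target = 3.0 * features[:, 3] + 0.5 * features[:, 1] + rng.normal(0.0, 0.05, n_rows)
data = np.column_stack([features, target])
data[::50, 2] = np.nan  # simulate incomplete measurements in some rows

# Pre-processing: remove every row that contains a missing value.
complete = data[~np.isnan(data).any(axis=1)]

# Define the features (first six columns) and the target (last column).
X, y = complete[:, :6], complete[:, 6]

# Split the paired data into 80% training data and 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Build the k-NN regressor and train it on the training data.
model = KNeighborsRegressor(n_neighbors=11)
model.fit(X_train, y_train)

# Validation: estimate the output for the test features and compute the error.
y_est = model.predict(X_test)
error = y_test - y_est
print("mean absolute error:", float(np.mean(np.abs(error))))
```

The fixed `random_state` only makes the split repeatable; with the real data set, the synthetic array would be replaced by the loaded measurement table.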

Results and discussion
A machine learning model was built to model electric power production in a solar power system. The aim is to develop a model that is able to estimate the generated power in the solar power system. The machine was built by implementing the k-NN algorithm, and a data set of the solar power system adopted from [20] was used as learning data. The data set contains 8760 measurement records of the solar power system that include recorded time, wind speed, sunshine, air pressure, radiation, air temperature, air humidity, and produced electrical power. In this modeling, the produced electrical power is defined as the output, or target, that is to be estimated by the machine learning model. Wind speed, sunshine, air pressure, radiation, air temperature, and air humidity are defined as the input, or features, of the model. The data set is divided into training data and testing data with a composition of 80% and 20%, respectively. The performance of the model is indicated by its accuracy in estimating the generated power. The accuracy, Ac, is defined in terms of the actual output, $y$, and the estimated output, $\hat{y}$. The difference between the actual output and the estimated output is the error defined in (5).
The k-NN algorithm performs its computation based on the k closest neighbors, and different values of k may result in different accuracy. Figure 6 compares the accuracy of the machine learning model for different values of k in modeling the solar power system. It shows that k = 10 and k = 11 give the best accuracy, which is 69.6%. Both values can be used in this modeling because the k-NN model performs a regression task, in which there is no voting, and hence no tie to break, as in the classification task. In this work, k = 11 is chosen, following the common practice of using an odd k. Some samples of the estimation results, together with the actual data, the estimation error, and the feature data, are given in Table 2. The samples show that the model estimated the power very well in some cases but poorly in others, which resulted in a total accuracy of 69.6%.
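A comparison of this kind can be produced by training one model per value of k and recording its score on the test data. The sketch below illustrates the procedure on synthetic data; the data, the range of k, and the use of the default `score` of sklearn's `KNeighborsRegressor` (the coefficient of determination) as the performance measure are assumptions made for illustration and may differ from the accuracy measure used in this study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)

# Synthetic stand-in data: six features and a noisy nonlinear target.
X = rng.uniform(0.0, 1.0, size=(400, 6))
y = np.sin(3.0 * X[:, 0]) + X[:, 3] + rng.normal(0.0, 0.1, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Train one k-NN regressor per k and record its score on the test data.
scores = {}
for k in range(1, 21):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    scores[k] = model.score(X_test, y_test)  # coefficient of determination

best_k = max(scores, key=scores.get)
print("best k:", best_k, "score:", round(scores[best_k], 3))
```

Plotting `scores` against k would give a curve analogous to Figure 6; small k tends to follow the noise in the data, while large k smooths the estimate.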
This study developed a machine learning based model to estimate the generated solar power from natural data, namely wind speed, sunshine period, air pressure, solar radiation, air temperature, and air humidity. The model was developed by implementing the k-NN algorithm and resulted in an estimation accuracy of 69.6%. This accuracy is comparable to that of a similar study presented in [22]. The study in [22] used a different data set and applied several machine learning algorithms to develop solar power forecast models. The models developed in [22] achieved accuracies in the range from 57.9% to 70.1%, where the k-NN resulted in an accuracy of 64.9%.

Conclusions
A study on modeling solar power generation using machine learning has been presented. The machine learning model was developed by implementing the k-NN algorithm. A secondary data set of power generation in a solar power plant was applied to train and examine the model. The data set consists of 8760 measurement records of the generated power and the weather at the solar power plant during the year 2017. The model was trained using 80% of the data, and the rest was used to examine it. The best performance, achieved with k = 11, was an accuracy of 69.6% in estimating the power generation. The result showed that the developed model was able to estimate the generated power in a solar power plant. Improving the accuracy of the model is a concern in the continuation of this work.
Constants in the thermal voltage equation: q is the unit of the electrical charge (1.6021 × 10^-19 Coulombs), k is the Boltzmann constant (1.3806 × 10^-23 m^2 kg s^-2 K^-1), and T is the temperature of the cell in Kelvin.

Fig. 4. An example of three closest neighbors of data points q1 and q2.

Fig. 6. Comparison of k number of neighbors and accuracy.

Table 1. Power production of solar panel system: Sample data.

Table 2. Power production versus power estimation of solar panel system: Sample data.