Automatic solution for solar cell photo-current prediction using machine learning

In this paper, we discuss the prediction of the photo-current of a solar cell using machine learning algorithms. To select a prediction method, we explored and compared several candidates. Precision, MSE and MAE were used as evaluation criteria for model selection. Machine learning algorithms serve as the research method for developing models that predict the solar cell photo-current. We build photo-current prediction models based on linear regression, Lasso regression, K Nearest Neighbors, decision tree and random forest, and compare their precision. On this basis, we recommend a solar cell photo-current prediction model that relies on the measured shunt and series resistances. The results show that the linear regression algorithm, in terms of precision, consistently outperforms the alternative models in predicting the solar cell photo-current Iph.


Introduction
In recent years, the whole world has been moving towards renewable energy production because of the depletion of fossil energy resources. This strategy aims to avoid an energy crisis and to reduce the pollution of our planet caused by the combustion of fossil fuels; the main advantage of this kind of energy is its low impact on the environment. Solar energy is a promising energy source for industry, which is why we are interested in the solar radiation that reaches the earth's surface, in order to obtain the information needed for an optimal implementation of photovoltaic systems. The photovoltaic effect is a physical phenomenon in which light energy is converted into electrical energy in a semiconductor [1], through a device called the photovoltaic cell. The absorption of photons promotes electrons from the valence band to the conduction band, which generates electron-hole pairs (excitons). In recent years the world has turned to this technology mainly because of growing energy requirements, which is why we are interested in studying its electrical parameters [2].
In this work we focus on predicting the photo-current of the photovoltaic cell from the measured shunt and series resistances using machine learning, which plays a major role in image detection, spam recognition, voice control, product recommendation and medical diagnosis. Current machine learning algorithms help to improve safety alerts, public safety and medical care. Machine learning systems also enable better customer service and safer automotive systems.
* Corresponding author: mohammedazza81@gmail.com
The algorithms used, namely linear regression, Lasso regression, K Nearest Neighbors, decision tree and random forest, aim to predict future photo-current values. By comparing the precision of these algorithms, we identify the most reliable method for predicting the photo-current of the photovoltaic (PV) system studied. Our solar cell is modeled by the electrical circuit shown in Figure 1.

Material and method
This section provides details about the data and tools used in the study. The choice of algorithms for forecasting the solar cell photo-current took into account the nature of the data set. Linear regression, Lasso regression, KNN, decision tree and random forest are the methods applied for the regression. The parameter values used to solve this problem are presented in Table 1. These parameters are extracted from the solar cell using the least squares method and the Newton-Raphson method.

Least squares method
The electrical parameters are extracted with the Least Squares algorithm (LMS) using the one-diode model, which is sufficient to describe the electrical operation of the solar cell [4].
The principle of the least squares method consists in minimizing the objective function E, the sum of the squares of the deviations between the N experimental measurements and the corresponding values calculated from the model.
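As a sketch of the objective described above, it can be written as follows. The notation is an assumption introduced here for illustration: the parameter vector \theta, the measured and calculated currents I_i^{meas} and I^{calc}, and the commonly used form of the one-diode equation with saturation current I_0, ideality factor n and thermal voltage V_T are not taken from the paper.

```latex
% Least squares objective over the N measured points (V_i, I_i^{meas})
E(\theta) = \sum_{i=1}^{N} \left[ I_i^{\mathrm{meas}} - I^{\mathrm{calc}}(V_i; \theta) \right]^2,
\qquad \theta = (I_{ph}, I_0, n, R_s, R_{sh})

% Commonly used one-diode model (assumed form), implicit in I
I = I_{ph} - I_0 \left[ \exp\!\left( \frac{V + I R_s}{n V_T} \right) - 1 \right] - \frac{V + I R_s}{R_{sh}}
```

Because the current I appears on both sides of the one-diode equation, an iterative root-finding scheme such as Newton-Raphson is a natural companion to the least squares fit.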

The loss functions
Before running a regression model, a loss function must be chosen. Although the mean squared error (MSE) is one of the best known and most widely used regression measures [5], here we evaluated our models with MSE, RMSE and MAE. The MSE is defined as
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (2)$$
where y_i, \hat{y}_i and n are, respectively, the true value, the predicted value and the number of samples. The root mean square error (RMSE) is defined as the square root of the MSE [6].
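The mean absolute error (MAE) is another standard way to measure a model's error in predicting quantitative data. Mathematically, it is defined as follows [6]:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where y_i and \hat{y}_i are the true and predicted values and n is the number of samples.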
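The three error measures can be illustrated with a minimal sketch in plain Python. The function names and the numerical values are hypothetical and are not data from this study.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    n = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n


def rmse(y_true, y_pred):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error: average of the absolute residuals."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n


# Hypothetical true and predicted values, for illustration only.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print("MSE :", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAE :", mae(y_true, y_pred))
```

Because the MSE squares each residual, a single large error weighs more heavily on MSE and RMSE than on MAE, which is one reason to report all three.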

Simple linear regression
Simple linear regression is a statistical method that allows us to summarize and study the relationship between two continuous quantitative variables [7]:
$$Y = aX + b$$
where X is the input data, Y is our prediction, and a and b are the coefficients learned to produce the most accurate predictions.
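A minimal sketch of how the two coefficients can be learned is given below, assuming the usual closed-form least squares estimates (slope equal to the covariance of X and Y divided by the variance of X). The function name and the data are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Return (a, b) for the least squares line Y = a*X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data lying exactly on Y = 2X + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = fit_simple_linear(xs, ys)
print("a =", a, "b =", b)
```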

Multivariate regression
A more complex multivariate linear equation is used to check whether there is a statistically significant association among the sets of variables [7][8]:
$$Y(x_1, x_2, x_3) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_0 \qquad (5)$$
where the variables x_1, x_2, x_3 represent the attributes, or distinct pieces of information, of each observation, Y is our prediction, and w represents the coefficients, or weights, that our model tries to learn [6].
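The weights of a multivariate linear model can be estimated by least squares. The sketch below assumes the normal equations solved by Gaussian elimination, which is one possible approach and not necessarily the one used in this study. The helper names and the data are hypothetical, and the intercept is written as w[0].

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_multivariate(X, y):
    """Return [w_0, w_1, ..., w_p] minimising the squared error."""
    rows = [[1.0] + list(x) for x in X]  # leading 1 carries the intercept
    p = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Aty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve_linear_system(AtA, Aty)


def predict(w, x):
    """Evaluate w_0 + w_1*x_1 + ... + w_p*x_p."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Hypothetical observations generated from Y = 1 + 2*x1 - x2 + 0.5*x3.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [2, 1, 3]]
y = [3.0, 0.0, 1.5, 2.0, 0.5, 5.5]

w = fit_multivariate(X, y)
print("weights:", w)
print("prediction for [1, 1, 1]:", predict(w, [1, 1, 1]))
```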

K Nearest Neighbors Regression
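K Nearest Neighbors (KNN) regression is a simple algorithm that stores all available cases and predicts the numerical target based on a similarity measure. The distance functions used by KNN regression are the same as those used by KNN classification [9]:
Euclidean: $$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$$
Manhattan: $$d(x, y) = \sum_{i=1}^{m} |x_i - y_i|$$
Minkowski: $$d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^q \right)^{1/q}$$
where x and y are two observations with m features and q is the order of the Minkowski distance.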
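As a minimal sketch of KNN regression, the prediction for a new observation can be taken as the mean target of its k closest training cases. The function names, the default k = 3 and the data are hypothetical choices for illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, q=3):
    return sum(abs(x - y) ** q for x, y in zip(a, b)) ** (1.0 / q)


def knn_predict(X_train, y_train, x_new, k=3, distance=euclidean):
    """Average the targets of the k training cases closest to x_new."""
    ranked = sorted(zip(X_train, y_train),
                    key=lambda pair: distance(pair[0], x_new))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical training cases, for illustration only.
X_train = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0]]
y_train = [1.0, 2.0, 3.0, 10.0]

print(knn_predict(X_train, y_train, [2.0, 2.0], k=3))
print(knn_predict(X_train, y_train, [2.0, 2.0], k=3, distance=manhattan))
```

The Minkowski distance reduces to the Manhattan distance for q = 1 and to the Euclidean distance for q = 2.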

Lasso regression
We usually define the design matrix X as a matrix having n rows and p columns, representing the p variables for n instances, and β is the coefficient vector [10]. We also define the target vector y as a column vector of length n containing the corresponding values of the target variable. The lasso regression technique attempts to produce a sparse solution, in that several of the slope parameters will be set to zero. The lasso is also formulated with respect to the matrix X. In addition, the L1 penalty is applied only to the slope coefficients, so the intercept, β0, is excluded from the penalty term. Therefore, the lasso can be expressed as a constrained minimization problem [11]:
$$\min_{\beta_0, \beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij} \beta_j \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t$$
where t ≥ 0, and there is a one-to-one correspondence between t and the penalty parameter λ of the equivalent penalized form. The L1 penalty makes the solution nonlinear in the y_i. The constrained minimization above is a quadratic programming problem, the solution of which can be efficiently approximated.
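The sparsity produced by the L1 penalty can be illustrated with a minimal sketch. It does not use a quadratic programming solver; instead it assumes the equivalent penalized form, with a penalty weight written here as lam, and solves it by cyclic coordinate descent with soft-thresholding, a commonly used alternative. The function names and the data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * sum of squared residuals + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    # Centre columns and target so the unpenalised intercept drops out.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [t - y_mean for t in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0
            for i in range(n):
                # Partial residual: current fit without feature j.
                fit_without_j = sum(Xc[i][k] * beta[k]
                                    for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - fit_without_j)
            rho /= n
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    beta0 = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return beta0, beta


# Hypothetical data: the target depends strongly on the first feature
# and only weakly on the second.
X = [[-1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, -1.0]]
y = [-2.9, -3.1, 3.1, 2.9]

beta0, beta = lasso_coordinate_descent(X, y, lam=0.5)
print("intercept:", beta0)
print("slopes   :", beta)
```

In this example the weak second slope is set exactly to zero while the first slope is shrunk, which is the sparse behaviour described above; the intercept is recovered after the fit and is never penalised.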

Decision tree
The decision tree builds regression or classification models in the form of a tree structure. It decomposes a dataset into smaller and smaller subsets while gradually developing an associated decision tree. The end result is a tree with decision nodes and leaf nodes [12].
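The recursive decomposition into smaller subsets can be sketched as follows for regression. This sketch assumes that each split is chosen to minimise the sum of squared errors of the two resulting subsets and that each leaf predicts the mean of its targets. The function names, the stopping parameters and the data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Return the (feature, threshold) that lowers the SSE most, or None."""
    best, best_cost = None, sse(y)
    for j in range(len(X[0])):
        levels = sorted(set(row[j] for row in X))
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2.0
            left = [t for row, t in zip(X, y) if row[j] <= thr]
            right = [t for row, t in zip(X, y) if row[j] > thr]
            cost = sse(left) + sse(right)
            if cost < best_cost:
                best, best_cost = (j, thr), cost
    return best


def build_tree(X, y, max_depth=3, min_samples=2):
    """Grow decision nodes until no split helps, then create a leaf."""
    can_split = max_depth > 0 and len(y) >= min_samples
    split = best_split(X, y) if can_split else None
    if split is None:
        return {"leaf": sum(y) / len(y)}
    j, thr = split
    left = [(r, t) for r, t in zip(X, y) if r[j] <= thr]
    right = [(r, t) for r, t in zip(X, y) if r[j] > thr]
    return {
        "feature": j,
        "threshold": thr,
        "left": build_tree([r for r, _ in left], [t for _, t in left],
                           max_depth - 1, min_samples),
        "right": build_tree([r for r, _ in right], [t for _, t in right],
                            max_depth - 1, min_samples),
    }


def tree_predict(node, x):
    """Follow decision nodes down to a leaf and return its value."""
    while "leaf" not in node:
        go_left = x[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


# Hypothetical one-feature data with two clear groups.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]

tree = build_tree(X, y)
print(tree_predict(tree, [2.5]), tree_predict(tree, [20.0]))
```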

Random Forest
Bootstrap refers to random sampling with replacement. This is a general procedure that can be used to reduce the variance of high-variance algorithms, typically decision trees [13]. Random forest belongs to the family of bootstrap techniques. The trees in a random forest are built in parallel, and there is no interaction between them during construction [14]. It works by building a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees [15].
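The combination of bootstrap sampling and averaging can be sketched as follows. To stay short, the sketch is deliberately simplified: each tree is a depth-1 regression tree (a stump) on a single feature, so the random feature selection used by full random forests is omitted. The function names, the number of trees, the seed and the data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Depth-1 regression tree: (threshold, left_mean, right_mean)."""
    overall = sum(ys) / len(ys)
    # Fallback when no split helps: constant prediction everywhere.
    best = (float("inf"), overall, overall)
    best_cost = sum((y - overall) ** 2 for y in ys)
    levels = sorted(set(xs))
    for lo, hi in zip(levels, levels[1:]):
        thr = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        cost = (sum((y - lm) ** 2 for y in left)
                + sum((y - rm) ** 2 for y in right))
        if cost < best_cost:
            best, best_cost = (thr, lm, rm), cost
    return best


def stump_predict(stump, x):
    thr, lm, rm = stump
    return lm if x <= thr else rm


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Fit each stump independently on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def forest_predict(forest, x):
    """Regression output: mean of the individual tree predictions."""
    return sum(stump_predict(s, x) for s in forest) / len(forest)


# Hypothetical step-shaped data, for illustration only.
xs = [float(v) for v in range(1, 11)]
ys = [0.0] * 5 + [10.0] * 5

forest = fit_forest(xs, ys)
print(forest_predict(forest, 1.0), forest_predict(forest, 10.0))
```

Since no tree depends on another, the loop in fit_forest could be run in parallel, which matches the absence of interaction between trees noted above.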

Prediction of solar cell photo-current values
In data science, the analyst may be interested both in the estimated values for the objects in the training set and in predicting the responses for new objects. However, identifying the most important input variables for making good predictions is difficult because of the complex relationships between these variables [16].
This study presents the prediction of the photo-current using different algorithms, namely linear regression, Lasso regression, KNN, decision tree and random forest, based on data such as the shunt resistance and the series resistance calculated by the least squares and Newton-Raphson methods.
In Figure 3 we evaluated our model using the decision tree algorithm. In this case the algorithm is reliable for the first observations, up to node number 3, where there is a large difference between the predicted and actual values. In this graph the actual and predicted values are moderately far apart at node number 3; at the other nodes the forecast is good, as shown by the small distance between the predicted and actual values.
Figure 6 shows that the predicted values do not coincide with the actual values because of a high RMSE value, which indicates that this algorithm is unreliable in predicting the photo-current values of the solar cell studied, so the results obtained by this method are not valid. From this graph, we can see that linear regression, Lasso regression and the random forest algorithm give good precision in the prediction of the photo-current values.
Fig. 6. RMSE test given by the algorithms used.

Comparison of algorithms
To interpret these results, we first compare the RMSE of the algorithms. The KNN algorithm has a very high RMSE compared to the other algorithms, which shows that the prediction of Iph by this method is poor since it produces a large error. Linear regression gives a very low RMSE, which shows that this algorithm is reliable.
The MAE results confirm this: KNN has a high MAE, which shows that it is weak at predicting the parameters studied, whereas the error of the linear regression algorithm remains low in all the tests carried out.
The table summarizes the model precision, MSE, RMSE and MAE values, and the algorithms are ranked from the most reliable to the least reliable. For precision, linear regression comes first with a value of 0.959437, followed by Lasso regression with 0.958530, random forest with 0.848903, the decision tree with 0.617507 and finally KNN with a low value of 0.593065. RMSE, MSE and MAE give the same ranking as the model precision, so the linear regression algorithm is very reliable compared to the other algorithms studied.
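The ranking by precision can be reproduced with a short sketch that uses only the precision values reported in this section. The variable names are hypothetical.

```python
# Model precision values as reported in the comparison above.
precision = {
    "linear regression": 0.959437,
    "lasso regression": 0.958530,
    "random forest": 0.848903,
    "decision tree": 0.617507,
    "KNN": 0.593065,
}

# Sort from the most reliable (highest precision) to the least reliable.
ranking = sorted(precision, key=precision.get, reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {precision[name]:.6f}")
```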

Conclusion
In this work we used five algorithms to predict the values of Iph of the photovoltaic cell. Linear regression shows good agreement between the actual and predicted values and gives the best values of MSE, RMSE and MAE. The same holds for Lasso regression and the random forest algorithm, which also give good results for the prediction of this electrical parameter that characterizes the solar cell studied.
The other two algorithms, KNN and the decision tree, have moderately low accuracy because there is a gap between the predicted and actual values. It can therefore be concluded that the most reliable algorithm in this study is linear regression.