Research on overhead line engineering cost prediction based on PCA-LSSVM model

In recent years, investment in overhead line engineering projects has increased year by year. Establishing a scientific approach to cost prediction and optimizing the prediction method can improve the efficiency with which investment is used. Based on the actual cost data of 110 kV overhead line projects, this paper extracts the principal component factors through principal component analysis and eliminates the correlation between the original indicators. The training samples are then input into the least squares support vector machine model to build a learning network. Finally, the values predicted by the model are compared with the actual cost level. The prediction results show that the average error rate is less than 5%, indicating that the PCA-LSSVM model constructed in this paper can effectively predict overhead line engineering cost.


Introduction
Predicting the cost level of overhead line engineering is a complex, nonlinear process with multiple variables. The factors that influence cost mainly include line length, wire quantity, single wire area, tower base, tower material quantity, tower material price, altitude, earth and stone quantity, base steel quantity, base steel price, etc. [1]. Because project cost has many influencing factors, quantitative prediction methods are more widely applied to overhead line project cost prediction. The support vector machine is based on the principle of structural risk minimization and therefore has better generalization ability. Its optimal solution is based on limited sample information, which makes it more suitable for cost prediction with small samples. In addition, the training time of the support vector machine model is shorter, and the model reduces the empirical component of the prediction [2]. Therefore, this paper establishes an overhead line construction cost prediction model based on a hybrid algorithm that combines principal component analysis (PCA) with the least squares support vector machine (LSSVM), an improved form of the support vector machine (SVM). Through feature extraction and small-sample learning, the model aims to predict overhead line construction cost faster and to guide construction project cost management.

Principal Component Analysis
Principal component analysis (PCA) is a data dimensionality reduction method. Through feature extraction, it produces a smaller number of composite, mutually uncorrelated indicators while retaining as much of the original information as possible. Its advantage is that multiple linearly correlated variables are transformed into a few uncorrelated variables, which achieves dimension reduction while retaining most of the information in the original variables [3].
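The decorrelation property described above can be illustrated with a minimal sketch. It assumes that the indicators are standardized before decomposition (that is, PCA on the correlation matrix); the source does not state this. The function name `pca_scores` and the synthetic data are hypothetical and are not the project data used in this paper.

```python
import numpy as np


def pca_scores(X):
    """Return (scores, explained_ratio) for the rows of X.

    Columns are standardized first, so the eigen-decomposition is done
    on the correlation matrix (an assumption, not stated in the paper).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = (Z.T @ Z) / (Z.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(corr)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                         # principal component scores
    return scores, eigvals / eigvals.sum()


# Synthetic, illustrative data: two strongly correlated indicators plus noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(46, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(46, 1)),
    2.0 * base + 0.1 * rng.normal(size=(46, 1)),
    rng.normal(size=(46, 1)),
])

scores, ratio = pca_scores(X)
score_cov = np.cov(scores, rowvar=False)         # close to diagonal: scores are uncorrelated
```

In this sketch the off-diagonal entries of `score_cov` vanish up to rounding error, which is the sense in which the extracted components no longer carry the correlation of the original indicators.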

Least squares support vector machines
The support vector machine (SVM) is a machine learning method proposed by Vapnik et al. in 1995 on the basis of statistical learning theory. It replaces the empirical risk minimization principle of traditional statistical learning with structural risk minimization to improve the generalization ability of the learning machine, so that good prediction accuracy can be achieved even with small samples [4]. In 1999, Suykens J.A.K. introduced the least squares linear system into the support vector machine, forming the theory of the least squares support vector machine (LSSVM). LSSVM replaces the insensitive loss function of SVM with a quadratic loss function and changes the inequality constraints into equality constraints, thereby further improving the learning accuracy [5].
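The optimization problem of LSSVM can be expressed as:

$$\min_{w,\,b,\,e} \; J(w, e) = \frac{1}{2} w^{T} w + \frac{\gamma}{2} \sum_{i=1}^{n} e_i^{2}$$

$$\text{s.t.} \quad y_i = w^{T} \varphi(x_i) + b + e_i, \quad i = 1, \dots, n$$

In this formula, $w$ is the weight vector; $\gamma$ is the regularization parameter; $b$ is the deviation value; $e_i$ is the error variable of the $i$-th of the $n$ sample points; and $\varphi(\cdot)$ is the mapping into the feature space [6].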
Because the original problem is difficult to solve directly, the Lagrange function is used to transform it into a dual problem, which is then optimized. By introducing a kernel function, the samples in the original space are mapped to vectors in a high-dimensional feature space, which solves the problem of linear inseparability in the original space. Many kernel functions are available; the radial basis function (RBF) kernel is used in this article [7].
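As a rough illustration of the dual form, the sketch below trains an LSSVM regressor with an RBF kernel by solving the linear system that the Lagrangian conditions lead to in the standard LSSVM formulation: the bias b and the multipliers alpha satisfy sum(alpha) = 0 and (K + I/gamma) alpha + b = y. It is not the MATLAB toolkit code used in this paper. The function names, the kernel width convention exp(-||x - z||^2 / (2 * sigma2)), the parameter values, and the sine-curve data are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(A, B, sigma2):
    """RBF kernel matrix; the 2*sigma2 width convention is an assumption."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))


def lssvm_fit(X, y, gamma, sigma2):
    """Solve the LSSVM dual linear system and return (b, alpha)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                       # first row enforces sum(alpha) = 0
    A[1:, 0] = 1.0                       # bias column
    A[1:, 1:] = K + np.eye(n) / gamma    # kernel matrix plus ridge term 1/gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]


def lssvm_predict(X_train, b, alpha, X_new, sigma2):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b


# Toy data (illustrative only): 30 points on a sine curve.
x = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y = np.sin(x).ravel()
gamma, sigma2 = 1000.0, 0.5              # hypothetical hyperparameter values
b, alpha = lssvm_fit(x, y, gamma, sigma2)
fitted = lssvm_predict(x, b, alpha, x, sigma2)
```

Because the constraints are equalities, training needs only one linear solve rather than a quadratic program, and each training residual equals alpha_i / gamma.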

Overhead line project cost prediction model
4.1 Principal component analysis
This article takes the data of 46 110 kV overhead line projects completed in 2019 in one area as an example and collects 16 indicators from project construction to verify the established cost prediction model. In the model, the cost per unit length is used as the output (dependent) variable. A total of 15 indicators are used as the input (independent) variables: the total length of the line (folded order), the number of circuits, the amount of tower material, the price of tower material, the ratio of tensile strength, the area of a single wire, the amount of wire, the price of wire, the amount of steel, the steel price, topography distribution, geological conditions, earthwork volume, the total amount of foundation concrete, and altitude.

Cost forecast indicator data
Combined with the actual situation of overhead line engineering cost, this paper establishes a cost prediction model based on principal component analysis and the least squares support vector machine. After pre-processing the collected data samples of the 46 overhead line projects on the 15 impact factors according to 4.1, this paper uses SPSS software to extract the principal components of the impact factors, which reduces both the dimensionality and the correlation among the indicators.

Table 1. Interpretation table of total variance obtained by principal component analysis.

In actual engineering applications, the extracted principal component factors are generally considered able to replace the original index data when the cumulative contribution rate is greater than or equal to 85%. According to Table 1
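The 85% rule can be sketched as a small helper that returns the smallest number of components whose cumulative contribution rate reaches the threshold. The function name `n_components_for` is hypothetical, and the contribution rates below are invented for illustration; they are not the values of Table 1.

```python
def n_components_for(contributions, threshold=0.85):
    """Smallest k such that the first k contribution rates sum to >= threshold.

    `contributions` must be sorted in descending order and expressed as
    fractions of the total variance.
    """
    total = 0.0
    for k, rate in enumerate(contributions, start=1):
        total += rate
        if total >= threshold - 1e-12:   # small tolerance for rounding error
            return k
    raise ValueError("cumulative contribution never reaches the threshold")


# Hypothetical contribution rates (NOT the values of Table 1).
example_rates = [0.40, 0.20, 0.15, 0.08, 0.05, 0.04, 0.03, 0.03, 0.02]
k = n_components_for(example_rates)
```

With these invented rates the cumulative sums are 0.40, 0.60, 0.75, 0.83 and 0.88, so the threshold is first reached at the fifth component.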

LSSVM cost prediction model
In this paper, the LSSVM toolkit is added to the MATLAB software, and a training program is written to predict the cost. According to 4.2, the first five principal component factors are taken as the input variables and the cost per unit length is taken as the output variable. From the 46 overhead line projects, 36 are randomly selected as the training set and input into the least squares support vector machine model. After training, the remaining 10 overhead line projects are used as the test set: their costs are predicted with the trained model and compared with the actual cost per unit length to calculate the model error rate. The fit to the training samples is shown in Figure 1, and the prediction results and error rates of the test samples are shown in Table 3. According to Table 3, the project cost predicted by the model is basically consistent with the actual project cost, and the error rates of the 10 test samples are all below 5%. This meets both the 5%~10% error rate allowed in the project budget estimate and the 3%~5% error rate allowed in the project budget. Therefore, the PCA-LSSVM cost prediction model established in this paper can predict the cost of overhead line engineering with high accuracy.
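The evaluation procedure can be sketched as follows. The sketch assumes that the error rate is the absolute difference between predicted and actual cost divided by the actual cost, expressed as a percentage; the source does not give the formula. The function names, the random seed, and the cost values are hypothetical and are not the values of Table 3.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Randomly split range(n_total) into training and test index lists."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)     # seeded so the split is reproducible
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def error_rates(actual, predicted):
    """Relative error of each prediction, in percent (assumed definition)."""
    return [abs(p - a) / a * 100.0 for a, p in zip(actual, predicted)]


# 46 projects: 36 for training, the remaining 10 for testing.
train_idx, test_idx = split_indices(46, 36)

# Hypothetical unit-length costs for illustration only (not Table 3).
actual = [100.0, 80.0, 120.0]
predicted = [103.0, 78.0, 120.0]
rates = error_rates(actual, predicted)
mean_rate = sum(rates) / len(rates)
all_within_5_percent = all(r < 5.0 for r in rates)
```

Fixing the seed is a design choice for repeatability; a different seed gives a different 36/10 split and, in general, different error rates.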

Conclusion
Overhead line project cost is influenced by many factors and is difficult to predict. To address this, this paper builds a PCA-LSSVM-based overhead line project cost prediction model and uses the cost index data of 46 overhead line projects in one area to test and verify it. The results show that the error rate between the predicted and actual values is below 5% for all 10 test samples, so the prediction accuracy is high. The PCA-LSSVM cost prediction model established in this paper is therefore effective and can predict cost accurately with a small sample, which provides a reference for the actual budgeting and cost control of overhead line engineering.