Combination Forecasting Method of Ship Maintenance Cost Based on Integrated Weighting

Existing combination forecasting methods based on rough set theory may fail to assign weights to the individual forecasting models. To address this, the attribute importance in the original method is adjusted and combined with the root mean square error of each individual forecasting model to form a new attribute importance. The new attribute importance is used to determine the combination forecasting weight coefficients, which solves the problem that the original method may be unable to determine weights and brings forecasting accuracy into consideration. Weight coefficients are also determined according to the historical forecasting performance of the models, which reflects their forecasting stability. The integrated weighting method is used to fuse the two kinds of weight coefficients. On an example of maintenance cost data for a certain type of ship, the improved method is compared with commonly used combination forecasting methods and gives better results: both the accuracy and the stability of the forecasts are improved, which verifies the effectiveness of the method.


Introduction
In the forecasting of ship maintenance costs in China, the costs span a long period, the influencing factors are numerous, and the collection of relevant data started late [1][2][3], so it is difficult to describe how the costs change using traditional cost estimation methods or individual forecasting models [4]. Bates and Granger [5] proposed the concept of combination forecasting in 1969. Recognizing the difficulty of constructing a true model, they regarded the various individual forecasting models as different pieces of information and combined this information to spread the uncertainty of any single model and reduce the overall uncertainty, which gives the combination strong adaptability and better stability. Since then, combination forecasting has remained an active research topic in the field of forecasting. Hibon and Evgeniou [6] used the 3003 time series and 14 forecasting methods of the M3 competition database in an experimental study and found that the forecasting risk of choosing a combination model is lower than that of choosing an individual model, which demonstrated the general advantage of combination forecasting. The M4 competition, which ended in 2018, again affirmed the important position of combination forecasting [7].
Several studies have applied combination forecasting methods to ship maintenance costs. Literature [8] obtained the combination forecasting weights by minimizing the Mahalanobis distance; literature [9] proposed combination forecasting models based on set pair analysis and forecasting error, giving one set pair analysis combination based on error direction and another based on model performance, both of which yield accurate forecasts of ship maintenance costs; literature [10] introduced the Markov chain into combination forecasting of ship maintenance costs and verified the feasibility and effectiveness of the model for this type of problem.
However, the combination forecasting methods used in the above literature still have problems such as a complicated optimization process, a tendency to fall into local optima, and subjectivity. Literature [11][12] used rough set theory to establish a combination forecasting model and determined the weight coefficients from the attribute importance based on the degree of support in rough set theory. The calculation is simple, and its data-driven character makes it strongly objective. However, in the existing combination forecasting methods based on rough set theory, the degree of support of all individual forecasting models may be 0, which makes it impossible to determine the weights of these models. In addition, the information contained in the original attribute importance is limited: compared with all the information carried by the individual forecasting models, important aspects such as forecasting accuracy and forecasting stability are missing.
To address these problems, this article proposes a new attribute importance, which combines the root mean square error, characterizing the forecasting accuracy of an individual forecasting model, with the original attribute importance. The new attribute importance takes into account both the classification ability of condition attributes with respect to the decision attribute under rough set theory and the forecasting accuracy of the individual forecasting models. At the same time, the historical forecasting performance of the models is taken into account in the weight calculation, which reflects the consideration of stability. Finally, the integrated weight coefficients of the combination forecast are determined on this basis, and a comparative analysis with common combination forecasting methods is carried out on an example.

Rough set theory
Rough set theory is an important mathematical tool for dealing with uncertain and incomplete data, proposed by the Polish scientist Pawlak [13] in 1982. Its main idea is to derive the classification rules of concepts from the description of a given problem through knowledge reduction, under the condition that the classification ability remains unchanged.

Basic concepts of rough set theory
An information system S can be expressed as $S=(U, A, V, f)$, where $U$ is a set of objects, also called the universe of discourse; $A$ is a set of attributes; $V=\bigcup_{a\in A} V_a$, where $V_a$ is the range of attribute $a$; and $f: U\times A\to V$ is an information function that assigns an information value to each attribute of each object, namely $\forall a\in A,\ x\in U,\ f(x,a)\in V_a$. If $A$ can be divided into a condition attribute set $C$ and a decision attribute set $D$, namely $C\cup D=A$ and $C\cap D=\emptyset$, then S is called a decision system or decision table.
In the information system S, for a subset of objects $X\subseteq U$ and an equivalence relation $R$, two subsets are defined: $R_{-}(X)=\{x\in U : [x]_R\subseteq X\}$ and $R^{-}(X)=\{x\in U : [x]_R\cap X\neq\emptyset\}$, called the R lower approximation and the R upper approximation of X respectively, where $[x]_R$ is the equivalence class of $x$ under $R$.
For any attribute set $P\subseteq A$, $IND(P)=\{(x,y)\in U\times U : f(x,a)=f(y,a),\ \forall a\in P\}$, where IND denotes the indiscernibility relation and $U/IND(P)$ denotes the partition of $U$ induced by $IND(P)$.
Let P and Q be equivalence relations on the universe U. The P positive region of Q is denoted $POS_P(Q)$, that is, $POS_P(Q)=\bigcup_{X\in U/Q} P_{-}(X)$, where $P_{-}(X)$ is the P lower approximation of X. The P positive region of Q is the set of all objects in U that can be correctly classified into the equivalence classes of U/Q using the knowledge expressed by U/P.
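The definitions above can be illustrated with a minimal Python sketch. The toy decision table, the attribute names a, b, d and the helper function names are hypothetical and chosen only for illustration; they are not taken from the ship maintenance data.

```python
from collections import defaultdict

def partition(table, attrs):
    """Equivalence classes of IND(attrs): objects with equal values on attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def lower_approx(table, attrs, target):
    """Union of equivalence classes fully contained in target."""
    return set().union(*[c for c in partition(table, attrs) if c <= target])

def upper_approx(table, attrs, target):
    """Union of equivalence classes that intersect target."""
    return set().union(*[c for c in partition(table, attrs) if c & target])

def positive_region(table, cond, dec):
    """Objects classified with certainty into the classes of U/dec using cond."""
    pos = set()
    for x_class in partition(table, dec):
        pos |= lower_approx(table, cond, x_class)
    return pos

# Hypothetical toy decision table: condition attributes a, b; decision attribute d.
table = {
    1: {"a": 0, "b": 1, "d": 0},
    2: {"a": 0, "b": 1, "d": 1},
    3: {"a": 1, "b": 0, "d": 1},
    4: {"a": 1, "b": 1, "d": 0},
}
X = {2, 3}  # the objects with d = 1
lower = lower_approx(table, ["a", "b"], X)
upper = upper_approx(table, ["a", "b"], X)
pos = positive_region(table, ["a", "b"], ["d"])
```

Objects 1 and 2 share the same condition values but have different decisions, so neither belongs to the lower approximation of X or to the positive region.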

Attribute importance
In a decision system, different condition attributes support the decision attributes to different degrees. The degree of support of a condition attribute for the decision attributes is called the attribute importance. Rough set theory offers many definitions and interpretations of attribute importance, such as attribute importance based on support [12], on dependence [14], on mutual information [15], on conditional information entropy [16] and on attribute frequency [17][18]. These definitions characterize the importance of condition attributes to decision attributes from different perspectives and are an important part of rough set theory.

Problems with existing combination forecasting methods based on rough set theory
Literature [12] used the degree of support as the basis for determining the weight coefficients of the combination forecast. The set of individual forecasting models is regarded as the condition attribute set. According to rough set theory, the degree of support of the condition attribute set C for the decision attribute set D is defined as: $\gamma_C(D)=|POS_C(D)|/|U|$ (1) It expresses the degree of support, or classification ability, of the condition attributes with respect to the decision attributes. The support of a single condition attribute for the decision attributes is defined as the change in this degree of support after the attribute is removed. The formula is: $sig(a,C;D)=\gamma_C(D)-\gamma_{C-\{a\}}(D)$ (2) The larger sig(a, C; D) is, the greater the influence of the single condition attribute a in the condition attribute set C on the decision, and the more important a is; conversely, the less important a is.
However, for some decision systems composed of the forecasting values of the individual forecasting models and the actual values, the degrees of support of all individual forecasting models (condition attributes) may be 0, which makes it impossible to determine the combination forecasting weight coefficients. An example follows. We take the maintenance cost data of a certain type of ship (the data has been processed) and use the ARIMA model, the multilayer perceptron model, the RBF neural network model, the unary linear regression method and the moving average method to forecast the data and obtain the forecasting value sequences, which constitute Table 1. The 5 methods are labeled a, b, c, d and e respectively. The first 6 sets of data in Table 1 are taken as the training set, and the 7th-9th sets as the validation set. After the training set data is discretized, decision Table 2 is constructed.
From decision Table 2, the following results are obtained: sig(a, C; D) = sig(b, C; D) = sig(c, C; D) = sig(d, C; D) = sig(e, C; D) = 0. The attribute importance of every condition attribute is therefore 0, so the combination forecasting weight coefficients cannot be determined from this result. Moreover, the existing methods do not consider the forecasting accuracy and stability of the models, so the forecasting accuracy may be low, or the overall forecasting accuracy may be high while the accuracy of some individual forecasts is low and the forecasting stability is weak.
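The failure mode described above can be reproduced on a small hypothetical decision table. The sketch below assumes the standard definitions of the degree of support (the share of objects in the positive region) and of the significance of a single attribute; the rows and the attribute names a, b, c, d are invented for illustration and are not the values of Table 2.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes (as sets of row indices) induced by attrs."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())

def support(rows, cond, dec):
    """Degree of support: |POS_cond(dec)| / |U|."""
    dec_classes = partition(rows, dec)
    pos = set()
    for c in partition(rows, cond):
        if any(c <= x for x in dec_classes):
            pos |= c
    return len(pos) / len(rows)

def significance(rows, cond, dec):
    """Drop in the degree of support when each attribute is removed in turn."""
    full = support(rows, cond, dec)
    return {a: full - support(rows, [b for b in cond if b != a], dec)
            for a in cond}

# Hypothetical discretized forecasts (a, b, c) and discretized actual value (d).
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 2, "c": 1, "d": 2},
    {"a": 2, "b": 2, "c": 2, "d": 2},
]
full_support = support(rows, ["a", "b", "c"], ["d"])
sig = significance(rows, ["a", "b", "c"], ["d"])
```

In this table any two of the three condition attributes already separate all objects, so removing one attribute never shrinks the positive region and every significance is 0, leaving nothing to normalize into weights.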

Combination forecasting method based on improved attribute importance and model historical forecast performance
In response to the problems of the above methods, this article first replaces the support-based attribute importance with the attribute frequency, which also reflects the ability of condition attributes to classify decision attributes, and combines it with the root mean square error, which characterizes the forecasting accuracy of an individual forecasting model, to form a new attribute importance as the basis for determining the weight coefficients of the combination forecast. Then the historical forecasting performance of the models is considered in the calculation of the weight coefficients, so that the final combination weight coefficients are determined by both the historical forecasting performance of the models and the new attribute importance.
Attribute frequency refers to the frequency with which a single condition attribute appears in the discernibility matrix. The higher the frequency of a condition attribute in the discernibility matrix, the stronger its ability to classify decision attributes and, correspondingly, the higher its attribute importance. Literature [17][18] describes attribute importance based on attribute frequency. The discernibility matrix on which the calculation is based is constructed as follows: $m_{ij}=\{a\in C : f(x_i,a)\neq f(x_j,a)\}$ when $f(x_i,D)\neq f(x_j,D)$; $m_{ij}=\emptyset$ in other cases. (3)
According to this construction, the attribute frequency counts how often the values of a condition attribute differ between two objects whose decision attribute values differ, and thus expresses the contribution of that condition attribute to distinguishing the two objects. It is commonly used to characterize attribute importance in rough set theory. From its calculation process it can be seen that the attribute frequency of each condition attribute in the discernibility matrix will not be 0, which makes it more reasonable to determine the weight coefficients of the individual forecasting models from attribute frequency. Attribute importance expressed as attribute frequency is also easier to understand and to calculate.
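As an illustration of the construction above, the following sketch builds the discernibility matrix for a small hypothetical decision table and counts how often each condition attribute appears. Using raw counts as the attribute frequency is an assumption; the table values and the names a, b, c, d are invented.

```python
from itertools import combinations
from collections import Counter

def discernibility_matrix(rows, cond, dec):
    """Entry (i, j): condition attributes that differ between objects i and j
    when their decision values differ; empty set otherwise."""
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if rows[i][dec] != rows[j][dec]:
            matrix[(i, j)] = {a for a in cond if rows[i][a] != rows[j][a]}
        else:
            matrix[(i, j)] = set()
    return matrix

def attribute_frequency(matrix, cond):
    """Number of matrix entries in which each condition attribute appears."""
    counts = Counter(a for entry in matrix.values() for a in entry)
    return {a: counts[a] for a in cond}

# Hypothetical discretized forecasts (a, b, c) and discretized actual value (d).
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 2, "c": 1, "d": 2},
    {"a": 2, "b": 2, "c": 2, "d": 2},
]
matrix = discernibility_matrix(rows, ["a", "b", "c"], "d")
freq = attribute_frequency(matrix, ["a", "b", "c"])
```

On this table the support-based significance of every attribute is 0, while the attribute frequencies are all positive and can therefore be turned into weights.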
The root mean square error (RMSE) is used to measure the error of the forecasting values. It expresses the average deviation between the forecasting value sequence of an individual forecasting model and the actual value sequence, and can be used to describe the forecasting accuracy of the models. Its calculation formula is: $RMSE_j=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_{ij}-y_i)^2}$ (4) where $\hat{y}_{ij}$ is the forecasting value of model j in the i-th forecast, $y_i$ is the actual value in the i-th forecast, and n is the number of forecasts. The smaller the root mean square error, the lower the average deviation between the forecasting value sequence and the actual value sequence, that is, the higher their average similarity and the higher the forecasting accuracy of the individual forecasting model. Conversely, the larger the root mean square error, the higher the average deviation and the lower the forecasting accuracy. So that the metric increases together with forecasting accuracy, the reciprocal of the root mean square error is taken. This metric reflects the degree of fit between the forecasting values and the actual values better than indicators such as the correlation coefficient and the gray correlation degree, because those indicators emphasize the similarity of trends rather than the similarity of values: even when the trends are similar, the values may deviate considerably. It is therefore more reasonable to use the reciprocal of the root mean square error as the measure of forecasting accuracy. To eliminate the influence of magnitude, the reciprocal needs to be normalized. The attribute frequency and the reciprocal of the root mean square error reflect, from different angles, the importance of the forecasting values of an individual forecasting model to the actual values.
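A minimal sketch of the accuracy measure follows. The text does not state which normalization is applied to the reciprocal of the RMSE, so dividing by the sum over all models is an assumption here; the series and the model names a and b are hypothetical.

```python
import math

def rmse(forecast, actual):
    """Root mean square error between a forecast series and the actual series."""
    squared = [(f - y) ** 2 for f, y in zip(forecast, actual)]
    return math.sqrt(sum(squared) / len(actual))

def normalized_reciprocal_rmse(forecasts, actual):
    """Reciprocal of each model's RMSE, scaled so the values sum to 1.
    Assumes no model has an RMSE of exactly 0 (division by zero otherwise)."""
    inv = {name: 1.0 / rmse(series, actual) for name, series in forecasts.items()}
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}

# Hypothetical data: model a is off by 1 everywhere, model b by 2.
actual = [10.0, 12.0, 14.0]
forecasts = {"a": [11.0, 13.0, 15.0], "b": [12.0, 14.0, 16.0]}
theta = normalized_reciprocal_rmse(forecasts, actual)
```

Model a has half the RMSE of model b and therefore receives twice the normalized accuracy score.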
Therefore, the normalized reciprocal of the root mean square error and the attribute frequency are combined to form a new attribute importance. The formula is as follows: $\delta_j=\gamma_j\cdot\theta_j$ (5) where $\delta_j$ is the new attribute importance, $\gamma_j$ is the attribute frequency, used to characterize the importance of the attribute, $\theta_j$ is the normalized reciprocal of the root mean square error, used to characterize the forecasting accuracy of the individual forecasting model, and j=1,2,...,m is the serial number of the individual models. The new attribute importance includes both the attribute frequency, which characterizes attribute importance in rough set theory, and the reciprocal of the root mean square error, which characterizes the forecasting accuracy of the individual forecasting model. Merging the two measures retains the characteristics of each and makes the combination forecast more effective. Because the weight determination process now includes accuracy information, the resulting weight coefficients are more reasonable.
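Formula (5) amounts to an element-wise product, as the short sketch below shows. The input values are hypothetical, and the final normalization of the importance into weights that sum to 1 is an assumption about how the importance is turned into weight coefficients.

```python
def new_attribute_importance(frequency, accuracy):
    """delta_j = gamma_j * theta_j for each individual model j."""
    return {name: frequency[name] * accuracy[name] for name in frequency}

def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}

# Hypothetical attribute frequencies and normalized reciprocal RMSE values.
gamma = {"a": 4, "b": 5, "c": 4}
theta = {"a": 0.5, "b": 0.3, "c": 0.2}

delta = new_attribute_importance(gamma, theta)
weights = normalize(delta)
```

Model b appears most often in the discernibility matrix, yet model a ends up with the largest weight because its accuracy score is higher; the product lets either factor dominate.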
However, the attribute frequency and the forecasting accuracy contained in the new attribute importance both reflect only the overall forecasting effect of an individual model. The historical forecasting performance of each model is not expressed, so the stability of the model's forecasts is not captured and the information contained is still incomplete.
Therefore, the weight coefficients determined by a metric that characterizes the historical forecasting performance of the models are integrated with the weight coefficients determined from the new attribute importance to obtain new weight coefficients, in order to improve the stability of the combination forecast.
When describing the historical forecasting performance of an individual forecasting model, it is necessary to express how close each forecasting value is to the actual value in the historical forecasts. This closeness cannot be expressed by the root mean square error, and the relative error and other common errors may take the value 0, which makes it impossible to determine the weight coefficient, so a more appropriate measure is needed. Similarity describes how alike two variables are. It is well suited to this type of problem, and its value will not be 0. Therefore, the similarity between the forecasting value and the actual value of each individual model is calculated according to formula (6), using the forecasting value of a given model in a given forecast and the actual value for that forecast, where i=1,2,...,n is the serial number of the forecasts and j=1,2,...,m is the serial number of the individual models. For convenience of calculation, the similarity between the forecasting value of each model and the actual value in each historical forecast is normalized according to formula (7), and the result is the performance of the model in that forecast. Finally, the historical forecasting performance of each model is integrated according to formula (8). Since the new attribute importance and the historical forecasting performance of the models express the forecasting effect of the models from different angles, the integrated weighting method is used to determine the weight coefficients.
Firstly, the formula for determining the weight coefficients of the combination forecast based on the new attribute importance is: $w_j^{*}=\delta_j/\sum_{k=1}^{m}\delta_k$ (9) Then the weight coefficients based on the historical forecasting performance of the models are calculated from the integrated performance of each individual model according to formula (10). Finally, the weight coefficients based on the new attribute importance and the weight coefficients based on the historical forecasting performance are fused by the integrated weighting method according to formula (11) to give the final weight coefficients.
The steps for establishing the combination forecasting model based on the new attribute importance and the historical forecasting performance of the models are as follows:
Step 1: Discretize the data in the data table, characterize the continuous data, and build a decision table;
Step 2: Construct a discernibility matrix according to the decision table, and determine the frequency of appearance of each individual forecasting model in the discernibility matrix;
Step 3: Calculate the root mean square error of each individual forecasting model, and calculate the new attribute importance according to formula (5);
Step 4: Calculate the similarity of each forecasting value to the corresponding actual value according to formula (6), and calculate the historical forecasting performance and integrated performance of each model according to formulas (7) and (8);
Step 5: Calculate the weight coefficient $w_j^{*}$ based on the new attribute importance, the weight coefficient based on the historical forecasting performance of the model and the final weight coefficient according to equations (9)-(11);
Step 6: Combine the individual forecasting models according to the final weight coefficients.
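Steps 4 to 6 can be sketched as follows. The exact forms of formulas (6)-(8), (10) and (11) are not reproduced in the text, so every formula in this sketch is an assumption: the similarity is taken as the ratio of the smaller to the larger value (valid for positive data only), the per-forecast performance is normalized across models, the integrated performance is the product over forecasts, and the two weight vectors are fused multiplicatively. All function names and data are hypothetical.

```python
def similarity(forecast, actual):
    """Assumed similarity: smaller value over larger value, in (0, 1]."""
    return min(forecast, actual) / max(forecast, actual)

def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}

def historical_performance(forecasts, actual):
    """Assumed integration: product over forecasts of the similarity
    normalized across models at each forecast."""
    mu = {name: 1.0 for name in forecasts}
    for i, y in enumerate(actual):
        per_forecast = normalize(
            {name: similarity(series[i], y) for name, series in forecasts.items()}
        )
        for name in mu:
            mu[name] *= per_forecast[name]
    return mu

def fuse(w_importance, w_history):
    """Assumed multiplicative fusion of the two weight vectors."""
    return normalize({k: w_importance[k] * w_history[k] for k in w_importance})

def combine(next_forecasts, weights):
    """Weighted sum of the individual forecasts."""
    return sum(weights[k] * next_forecasts[k] for k in weights)

# Hypothetical history: model a is exact, model b forecasts half the actual value.
actual = [100.0, 200.0]
forecasts = {"a": [100.0, 200.0], "b": [50.0, 100.0]}

mu = historical_performance(forecasts, actual)
w_history = normalize(mu)
w_importance = {"a": 0.5, "b": 0.5}  # stand-in for the importance-based weights
w_final = fuse(w_importance, w_history)
combined = combine({"a": 300.0, "b": 150.0}, w_final)
```

A product over forecasts penalizes a model for every poor forecast in its history, which is one way to reward stability; a different integration rule would change the weights.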

Case analysis
The weight coefficients are again calculated on the maintenance cost data of a certain type of ship (the data has been processed). The detailed calculation steps are as follows:
Step 1: Use the equal-width (isometric) method to discretize the data in the table and build a decision table, as detailed in Table 2;
Step 2: Construct a discernibility matrix according to the decision table, as shown in Table 3;
Step 3: Calculate the root mean square error of each individual forecasting model and the new attribute importance according to formula (5);
Step 4: Calculate the integrated historical forecasting performance of each model according to formulas (6)-(8) as μ=[0.000071, 0.000086, 0.00014, 0.000051, 0.000012];
Step 5: Calculate the weight coefficients based on the new attribute importance, the weight coefficients based on the historical forecasting performance and the final weight coefficients from equations (9)-(11);
Step 6: Combine the individual forecasting models according to the calculated weight coefficients to obtain the forecasting value.
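The equal-width discretization used in Step 1 can be sketched as below. The number of intervals is not stated in the text, so the default of 3 bins is an assumption, as is applying the function to one column at a time; the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins=3):
    """Map each value to the index of its equal-width interval on [min, max].
    The maximum value is placed in the last interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single interval suffices.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical cost series for one column of the data table.
column = [10.0, 14.0, 25.0, 40.0]
codes = equal_width_bins(column)
```

Applying the same procedure to each forecast column and to the actual value column yields the integer-coded decision table from which the discernibility matrix is built.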
To verify the effectiveness of the improved method, the combination forecasting results of the improved method, the equal weight combination method, the RMSE reciprocal method and the SSE reciprocal method are compared on the 7th-9th groups of the data. The combination method based on the original rough set attribute importance cannot determine the weights, so it does not take part in the comparison. The comparison results are shown in Table 4. It can be seen from Table 4 that the improved combination forecasting method has advantages over the equal weight combination method, the RMSE reciprocal combination method and the SSE reciprocal combination method in both the average forecasting accuracy and the accuracy of each individual forecast. This reflects its ability to improve forecasting accuracy and forecasting stability, and verifies the effectiveness and feasibility of the method.
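For reference, the three baseline weighting schemes named above can be sketched as follows, assuming their usual definitions: equal weights, and weights proportional to the reciprocal of each model's RMSE or sum of squared errors (SSE). The data and the model names a and b are hypothetical.

```python
import math

def sse(forecast, actual):
    """Sum of squared errors of a forecast series."""
    return sum((f - y) ** 2 for f, y in zip(forecast, actual))

def normalize(scores):
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}

def baseline_weights(forecasts, actual):
    """Equal, reciprocal-RMSE and reciprocal-SSE weights for each model.
    Assumes every model has a non-zero error."""
    n = len(actual)
    equal = {name: 1.0 / len(forecasts) for name in forecasts}
    inv_rmse = normalize(
        {name: 1.0 / math.sqrt(sse(s, actual) / n) for name, s in forecasts.items()}
    )
    inv_sse = normalize({name: 1.0 / sse(s, actual) for name, s in forecasts.items()})
    return {"equal": equal, "rmse_reciprocal": inv_rmse, "sse_reciprocal": inv_sse}

# Hypothetical data: model a is off by 1 everywhere, model b by 2.
actual = [10.0, 12.0, 14.0]
forecasts = {"a": [11.0, 13.0, 15.0], "b": [12.0, 14.0, 16.0]}
baselines = baseline_weights(forecasts, actual)
```

Because the SSE grows with the square of the error, the SSE reciprocal method favors the more accurate model more strongly than the RMSE reciprocal method does.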

Conclusion
This paper analyzes the existing combination forecasting method based on the attribute importance of rough set theory and points out its problems in determining the weight coefficients. It adjusts the original attribute importance and combines it with the root mean square error, which characterizes the average forecasting accuracy of the individual forecasting models, to form a new attribute importance for determining the combination forecasting weight coefficients. This solves the problem that the weight coefficients may not be determinable in the original method and makes the information contained in the attribute importance more comprehensive. At the same time, to improve the stability of the forecasting model, weight coefficients are also determined according to the historical forecasting performance of the models, and the specific calculation steps are given. The integrated weighting method is used to fuse the two kinds of weight coefficients. The integrated weight coefficients consider the classification ability of condition attributes for decision attributes, the forecasting accuracy of the individual forecasting models, and the forecasting stability of the models. The method is an organic combination of rough set theory and combination forecasting theory. An example verifies that the method outperforms the traditional combination forecasting methods compared, is effective and feasible, and can improve the accuracy and stability of combination forecasting.
Accurate forecasting of ship maintenance costs is a practical measure for resolutely implementing Chairman Xi's important instructions on the management and use of funds. It is an effective action for practicing strict economy and an important auxiliary means of ensuring the reasonable allocation and management of ship maintenance costs, and it is of great significance for maintaining the ships' in-service rate and enhancing the combat effectiveness of the troops.