Research on the Classification of the Operational Test

The operational test, along with the evaluation based on it, is the most important decision-making basis for the Department of Defense in acquiring weapons and equipment. However, a wide range of indicators needs to be tested, which makes the planning, implementation and data processing of operational tests very complicated. To alleviate this problem, this paper studies the statistical meaning of the various operational tests, analyses their relationships, and unifies them under the framework of linear modelling.


Introduction
Operational test and evaluation (OT&E) of the Department of Defense (DoD) is the field test, under realistic combat conditions, of any item of (or key component of) weapons, equipment, or munitions for the purpose of determining the effectiveness and suitability of the weapons, equipment, or munitions for use in combat by typical military users, and the evaluation of the results of such test [1]. OT&E provides the most critical evidence for the DoD to make decisions on the acquisition of weapons and equipment, so it has important practical significance.
The operational test (OT) is therefore a process in which, by simulating actual combat, one observes how well weapons and equipment operated by professional fighters complete a combat task and how well they adapt to that task. There is a wide variety of indicators for the OT. Indicators such as operational distance, response time, implementation accuracy, maneuver speed and coverage range are derived from the combat mission and reflect the operational effectiveness [1] of the system under test (SUT) in completing the mission. Indicators such as reliability, availability, maintainability, supportability and compatibility are also derived from the combat mission and reflect the operational suitability [1] of the SUT for the mission.
From the perspective of statistics, this paper classifies these indicators into three categories and studies the relationships between them. Based on these studies, it can be concluded that all kinds of tests in the OT can be unified under the framework of the linear model, which provides a useful reference for further research on the OT.

The first class of OT: comparison with the threshold
This kind of test mainly examines whether a certain index of the SUT has reached the threshold stipulated in the contract [2]. For example: "When the ×× weapon performs the ×× combat mission in the ×× environment, is the ×× index not lower than the value ××?" Generally, the test is performed by comparing the mean value $\mu$ of the SUT index with the threshold $\mu_0$. When the sample $y = (y_1, \dots, y_n)^T$ is obtained in the OT, the sample mean $\bar{y}$ is taken as the estimate $\hat{\mu}$ of $\mu$, and $\mu$ is compared with $\mu_0$ by hypothesis testing [3]. The mathematical model is
$$y = \mathbf{1}_n \mu + \varepsilon \quad (1)$$
where all components of $\mathbf{1}_n$ are 1's, and all components of $\varepsilon$ are independent of each other and follow $N(0, \sigma^2)$.
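As a minimal sketch of this first class of test, the following Python code treats the threshold comparison as a linear model whose design matrix is a single column of ones, so that the least squares estimate of the mean is the sample mean. The function name `one_sample_t`, the measurements and the threshold 9.5 are hypothetical. The code returns only the t statistic and its degrees of freedom; the critical value would have to be looked up for the chosen significance level.

```python
import math

def one_sample_t(y, mu0):
    """t statistic for comparing the mean of y with the threshold mu0.

    Model assumed here: y = 1*mu + eps, with eps i.i.d. N(0, sigma^2).
    """
    n = len(y)
    # Least squares with a column of ones: mu_hat = (1'1)^-1 1'y = mean(y)
    mu_hat = sum(y) / n
    # Unbiased residual variance with n - 1 degrees of freedom
    s2 = sum((yi - mu_hat) ** 2 for yi in y) / (n - 1)
    # Standard error of the estimated mean
    se = math.sqrt(s2 / n)
    return mu_hat, (mu_hat - mu0) / se, n - 1

# Hypothetical measurements of one index and a hypothetical contract threshold
y = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
mu_hat, t_stat, dof = one_sample_t(y, mu0=9.5)
print(mu_hat, t_stat, dof)
```

A large positive t statistic supports the claim that the index exceeds the threshold; a one-sided test compares it with the upper quantile of the t distribution with `dof` degrees of freedom.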

The second class of OT: comparison with the baseline force(s)
This kind of test mainly examines whether the SUT and the baseline combat force(s) differ in the achievement of the combat index $y$ when they perform the same combat mission in the same combat context [4].
Here, the baseline combat force(s) is (are) the weapon(s) or equipment already in service, or the weapon(s) or equipment taking part in the bidding at the same time as the SUT [5]. For example: "In the ×× context, when the ×× weapon performs the same ×× combat mission as the baseline equipment ××, is the ×× weapon significantly superior to the baseline combat force?"
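If there is just one baseline, the test is generally performed by comparing the mean values $\mu_1$ and $\mu_2$ of the SUT and the baseline combat force. When two independent samples $y_1$ and $y_2$ are obtained in the OT, the two sample means $\bar{y}_1$ and $\bar{y}_2$ are taken as the estimates of $\mu_1$ and $\mu_2$, and $\mu_1$ and $\mu_2$ are then compared with each other by hypothesis testing. The mathematical model is
$$y_1 = \mathbf{1}_{n_1} \mu_1 + \varepsilon_1 \quad (2)$$
$$y_2 = \mathbf{1}_{n_2} \mu_2 + \varepsilon_2 \quad (3)$$
where all components of $\mathbf{1}_{n_1}$ and $\mathbf{1}_{n_2}$ are 1's, and all components of $\varepsilon_1$ and $\varepsilon_2$ are independent of each other and follow $N(0, \sigma_1^2)$ and $N(0, \sigma_2^2)$ respectively. Since the context in which the SUT is compared with the baseline needs to be as consistent as possible, the sample size of the SUT is generally equal to that of the baseline, that is, $n_1 = n_2 = n$. And since the combat mission of the SUT is the same as that of the baseline combat force, and their technological levels are generally similar, it can be further assumed that the variation of the index $y$ is roughly the same for both, i.e. $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Thus, from the above two formulas, it can be deduced that
$$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \mathbf{1}_n & \mathbf{0} \\ \mathbf{0} & \mathbf{1}_n \end{pmatrix} \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} + \varepsilon, \quad \varepsilon \sim N(\mathbf{0}, \sigma^2 I) \quad (4)$$
where $I$ is an identity matrix.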
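The single-baseline comparison can be sketched in Python as follows, assuming equal variances in the two groups as discussed above. With a block design matrix (one column of ones for the SUT runs, one for the baseline runs) the least squares estimates are the two group means, and the test of their difference is the pooled two-sample t-test. The function name `two_sample_linear_model` and all data values are hypothetical.

```python
import math

def two_sample_linear_model(y1, y2):
    """Pooled t statistic for the difference between two group means.

    The design matrix is assumed to have two indicator columns (SUT, baseline),
    so X'X = diag(n1, n2) and the least squares estimates are the group means.
    """
    n1, n2 = len(y1), len(y2)
    mu1_hat = sum(y1) / n1
    mu2_hat = sum(y2) / n2
    # Residual sum of squares of the fitted two-mean model
    rss = sum((v - mu1_hat) ** 2 for v in y1) + sum((v - mu2_hat) ** 2 for v in y2)
    dof = n1 + n2 - 2
    s2 = rss / dof  # pooled estimate of the common variance
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))
    return mu1_hat, mu2_hat, (mu1_hat - mu2_hat) / se, dof

# Hypothetical index values for the SUT and for one baseline force
y_sut = [12.1, 11.8, 12.4, 12.0]
y_base = [11.2, 11.5, 11.0, 11.3]
mu1_hat, mu2_hat, t_stat, dof = two_sample_linear_model(y_sut, y_base)
print(mu1_hat, mu2_hat, t_stat, dof)
```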
Using the least squares method [7], we can obtain the estimate $\hat{\beta} = (X^T X)^{-1} X^T y$ of the coefficient vector $\beta$ in the linear model $y = X\beta + \varepsilon$. We can then use the fitted model $\hat{y} = X\hat{\beta}$ to predict the index $y$.
By testing hypotheses about $\beta$ as a whole, or about a certain component $\beta_i$, we can find out whether the indicator $y$ of the SUT is significantly affected by the combat elements, and which combat conditions (that is, which combination of combat elements) lead to the best performance of the weapon and equipment index $y$.
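The least squares step can be illustrated with a small pure-Python sketch that solves the normal equations $X^T X \beta = X^T y$ and then predicts the index from the fitted coefficients. The helper names `solve`, `ols` and `predict`, the two coded combat elements and the data are all hypothetical. A numerical library would normally be preferred for real test data, since solving the normal equations directly can be ill-conditioned.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(X, y):
    """Least squares estimate of beta from the normal equations X'X beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(X, beta):
    """Predicted index values X beta."""
    return [sum(xi * bi for xi, bi in zip(row, beta)) for row in X]

# Hypothetical design: intercept plus two combat elements; y built as 2 + 3*x1 - x2
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [2, 5, 1, 4, 7]
beta_hat = ols(X, y)
y_hat = predict(X, beta_hat)
print(beta_hat, y_hat)
```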

Comparative analysis of the three classes of tests
It can be seen that the four models are successively more general; that is, each model is a special case of the one that follows it.
If all operational factors other than the SUT are taken together as interference factors of the inspected indicator $y$, the model on which hypothesis testing is conducted in the operational test is formula (1). After the criteria and the test sample size $n$ have been determined, the upper bound and lower bound of the test can be determined. When the sample has been obtained, if the sample mean $\bar{y}$ is greater than the upper bound or less than the lower bound, the null hypothesis (the weapons and equipment are not up to the standard) is rejected, as shown in figure 1. If we take into account that there is a real difference between the baseline combat force(s) and the SUT in the inspected indicator, and add this fact to formula (1), the model on which hypothesis testing is conducted in the OT is formula (4) or (5), and the testing process is shown in figure 2. Then, let the design matrix in formula (4) or (5) be the operational elements specified in the combat doctrine. If we want to check whether the operational elements, individually or as a whole, significantly affect the indicator $y$, the model on which the hypothesis test is based is formula (7). When the sample $y$ has been obtained and the operational element matrix $X$ has been recorded or set, and these data have been fitted by the least squares method or an algorithm derived from it, after appropriate data transformation, the result can be shown as in figure 3. If higher-order terms or interaction terms of some combat elements are appropriately added to the model, the result can be shown as in figure 4.

Conclusion
In the planning phase of the OT, the combinations of combat elements can be reasonably arranged by means of randomization, blocking, orthogonalization, etc., so that the mathematical model in formula (7) can be reduced to the mathematical model in formula (1). Therefore, in the OT, the test design can be carried out based on the mathematical model in formula (7), and, if necessary, the test data can be processed to answer the questions investigated under formula (1), so as to improve the overall benefit obtained from the operational tests.
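One way to see how an orthogonal arrangement links the general linear model back to the simple mean model is the following sketch, which assumes a two-level full factorial design with combat elements coded as -1/+1. Because every factor column is orthogonal to the column of ones, the least squares intercept equals the overall sample mean. The function names, the choice of three factors, the fixed random seed and the data are hypothetical.

```python
import itertools
import random

def full_factorial(k):
    """Return all 2^k runs: an intercept column of 1's plus k factors coded -1/+1."""
    return [[1, *levels] for levels in itertools.product((-1, 1), repeat=k)]

def columns_orthogonal(X):
    """Check that every pair of distinct columns of X has zero inner product."""
    p = len(X[0])
    return all(
        sum(row[i] * row[j] for row in X) == 0
        for i in range(p)
        for j in range(i + 1, p)
    )

def intercept_estimate(X, y):
    """Least squares intercept; this simple form holds only for orthogonal columns."""
    return sum(row[0] * yi for row, yi in zip(X, y)) / sum(row[0] ** 2 for row in X)

X = full_factorial(3)

# Randomize the order in which the 8 runs are executed (seed fixed for repeatability)
rng = random.Random(0)
run_order = list(range(len(X)))
rng.shuffle(run_order)

# Hypothetical index values, one per run, listed in the row order of X
y = [5.1, 5.9, 4.8, 6.2, 5.0, 6.1, 4.9, 6.0]
mu_hat = intercept_estimate(X, y)
print(columns_orthogonal(X), run_order, mu_hat)
```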
Since the hypothesis tests based on formula (4) and formula (5) are likewise a t-test and an ANOVA respectively, power analysis and sample size calculation for all types of OT discussed above can be studied based on formula (7).
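As a rough illustration of sample size calculation, the sketch below uses the normal approximation for a one-sided test of a mean with known standard deviation. This is a simplification: an exact calculation for a t-test or ANOVA would use noncentral distributions, which this sketch does not do. The function names and the values of the detectable difference `delta`, the standard deviation `sigma`, the significance level and the target power are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_one_sided(delta, sigma, alpha=0.05, power=0.8):
    """Smallest n for which a one-sided z-test detects a shift delta with the given power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # critical value of the test
    z_beta = z.inv_cdf(power)       # quantile matching the target power
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

def power_one_sided(n, delta, sigma, alpha=0.05):
    """Approximate power of the one-sided z-test with n observations."""
    z = NormalDist()
    return z.cdf(delta * math.sqrt(n) / sigma - z.inv_cdf(1 - alpha))

n_required = sample_size_one_sided(delta=0.5, sigma=1.0)
achieved = power_one_sided(n_required, delta=0.5, sigma=1.0)
print(n_required, achieved)
```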