Analytical method for selecting an informative set of features with limited resources in the pattern recognition problem

Feature selection is one of the most important issues in Data Mining and Pattern Recognition. A correctly selected feature or set of features largely determines the success of further work, in particular the solution of classification and forecasting problems. This work is devoted to the development and study of an analytical method for determining informative attribute sets (IAS) that takes the available resource into account, for criteria based on a scattering measure of the classified objects. The regions in which a solution exists are determined. Statements and properties are proved for the Fisher-type informativeness criterion, from which it follows that the proposed analytical method for determining an IAS guarantees optimality of the result in the sense of maximizing the selected functional. The relevance of choosing this type of informativeness criterion is substantiated, the universality of the method with respect to the type of features is shown, and an algorithm implementing the method is presented. In addition, the paper discusses the dynamics of the growth of information in the world, problems associated with big data, and the problems and tasks of data preprocessing. The relevance of reducing the dimensionality of the attribute space so that data can be processed and visualized without unnecessary difficulty is substantiated, and the shortcomings of existing methods and algorithms for choosing an informative set of attributes are shown.


Introduction
The total amount of data created in the world is forecast to grow dramatically in the coming years, reaching 175 zettabytes by 2025. The rapid development of digitalization contributes to the ever-growing global datasphere [1]. Such data (Big Data) is often created by integrating various data sources that follow different standards. In such cases, unfortunately, there is practically no opportunity to analyze, systematize and track all of these data, and the proportion of "dirty" data (inaccurate, incomplete, duplicated, contradictory, noisy or useless) tends to grow in proportion to the total amount of data.
Typically, such problems are solved using data mining (Data Mining) methods and algorithms. The practical application of Data Mining methods involves a multi-step procedure, one of the key steps of which is data preprocessing. At this stage the following are carried out [2][3][4]: data cleaning, i.e. elimination of contradictions, omissions, accidental outliers and noise; data integration, i.e. combining data from several possible sources in one repository; and data transformation, i.e. data aggregation and compression, attribute discretization, dimensionality reduction, etc.
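The three preprocessing steps named above can be sketched as follows. The record layout, field names and discretization threshold are illustrative assumptions, not taken from the paper:

```python
def clean(records):
    """Data cleaning: drop records with missing values and exact duplicates."""
    seen, out = set(), []
    for r in records:
        if any(v is None for v in r.values()):
            continue  # omission -> remove the record
        key = tuple(sorted(r.items()))
        if key in seen:
            continue  # exact duplicate -> remove the record
        seen.add(key)
        out.append(r)
    return out

def integrate(*sources):
    """Data integration: combine several record sources in one repository."""
    merged = []
    for src in sources:
        merged.extend(src)
    return merged

def discretize(records, field, threshold):
    """Data transformation: turn a continuous attribute into a binary one."""
    return [dict(r, **{field: int(r[field] >= threshold)}) for r in records]

raw_a = [{"x": 1.2, "y": 3}, {"x": None, "y": 5}]
raw_b = [{"x": 1.2, "y": 3}, {"x": 0.4, "y": 7}]
data = discretize(clean(integrate(raw_a, raw_b)), "x", 1.0)
print(data)  # omissions and duplicates removed, "x" binarized
```

Each function mirrors one of the three steps; in practice these operations are performed by specialized tooling rather than hand-written code.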
Modern data arrays, to which various Data Mining methods may be applied, are often characterized by a large number of features that form a feature space of high dimension. An urgent task is therefore to reduce the dimensionality of such a space to one that allows data processing and/or visualization without unnecessary difficulty.
To date, a variety of approaches, methods and algorithms for dimensionality reduction of the feature space have been developed and investigated.
All approaches to dimensionality reduction of the original attribute space can be divided into two large classes.
The first class transforms the attribute space itself. One of the best-known and most widely practiced approaches of this class is principal component analysis.
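As a minimal illustration of this first, transforming class, the sketch below projects two-dimensional points onto their first principal component via eigendecomposition of the 2×2 covariance matrix. This is a toy case only; real implementations rely on a linear-algebra library:

```python
import math

def pca_1d(points):
    """Project 2-D points onto their first principal component
    (toy eigendecomposition of the 2x2 covariance matrix)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # largest eigenvalue of a symmetric 2x2 matrix
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # a corresponding eigenvector (handle the diagonal case)
    v = (lam - syy, sxy) if abs(sxy) > 1e-12 else (1.0, 0.0)
    norm = math.hypot(*v)
    v = (v[0] / norm, v[1] / norm)
    # scalar projection of each centered point onto the eigenvector
    return [(p[0] - mx) * v[0] + (p[1] - my) * v[1] for p in points]

scores = pca_1d([(0, 0), (1, 1), (2, 2), (3, 3)])
```

For perfectly collinear points, as here, the one-dimensional scores retain all of the variance, which is the sense in which the transformation reduces dimensionality.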
The second class of methods consists in choosing the most informative, useful features and excluding non-informative features from consideration, without transforming the original space. Various methods and approaches are used here, including methods with random selection. Analysis shows that many existing methods and algorithms for determining an informative attribute set (IAS) amount to one or another form of partial enumeration and therefore do not guarantee an optimal result in the sense of the informativeness criterion used. Moreover, these methods do not take into account many of the costs associated with identifying attributes, other than computational costs. It is therefore advisable to develop new methods that account for other costs, rest on the well-developed apparatus of mathematical programming, and give an optimal (or near-optimal) solution.
The aim of this work is to develop an analytical method for determining the IAS that takes the available resource into account, for criteria based on a scattering measure of the classified objects.
Then the total number of training sample objects is the sum of the objects over all classes. Suppose that each object in the training sample is an N-dimensional feature vector, i.e. ∀x = (x_1, x_2, …, x_N) ∈ X and dim(X) = N. For dimensionality reduction of the initial feature space and selection of the IAS, we use the N-dimensional binary vector λ = (λ_1, λ_2, …, λ_N).

Results
Here λ_j = 0 indicates the absence and λ_j = 1 the presence of the j-th feature in this set. Definition 1 [9].
The set of all ℓ-informative vectors is denoted by Λ_ℓ:

Λ_ℓ = {λ ∈ {0, 1}^N : λ_1 + λ_2 + … + λ_N = ℓ}.   (2)

We see that the cardinality of the set Λ_ℓ equals

|Λ_ℓ| = C_N^ℓ = N! / (ℓ! (N − ℓ)!).   (3)

From (2) and (3), and taking into account that the task of determining the optimal IAS is usually associated with assessing the quality of classification, we will use the functional I(λ) as an informativeness (effectiveness) criterion to select the optimal IAS.
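The set Λ_ℓ, its cardinality, and the way a vector λ selects a feature subsystem can be checked directly for small N by enumeration (the feature values below are arbitrary examples):

```python
from itertools import combinations
from math import comb

def informative_vectors(n, ell):
    """Enumerate Lambda_ell: all binary vectors of length n with
    exactly ell ones; each such vector selects a candidate IAS."""
    for idx in combinations(range(n), ell):
        lam = [0] * n
        for j in idx:
            lam[j] = 1
        yield tuple(lam)

def select(x, lam):
    """Restrict a feature vector x to the subsystem picked by lam
    (the j-th feature is kept when lam_j = 1)."""
    return tuple(xj for xj, lj in zip(x, lam) if lj == 1)

vectors = list(informative_vectors(5, 2))
print(len(vectors), comb(5, 2))            # cardinality C(5, 2) = 10
print(select((4.2, 0.0, 7.5, 1.1, 9.9), (1, 0, 1, 0, 0)))
```

The enumeration grows as C_N^ℓ, which is precisely why exhaustive search over Λ_ℓ is impractical for realistic N.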
Definition 2. We say that a system (or IAS) X|_λ ∈ X_{Λ_ℓ} is optimal if there exists λ ∈ Λ_ℓ for which I(λ) = extr_{μ∈Λ_ℓ} I(μ) holds. The problem of determining an informative attribute set can then be reduced to an optimization problem. Suppose that identifying an object, i.e. each feature of an object, requires certain costs (technical, computational, time, etc.), and that a fixed resource is allocated for identifying the object.
If we take into account that each vector λ ∈ Λ_ℓ uniquely determines a specific IAS, then the scalar product (c, λ) shows how much of the resource is required to obtain this IAS.
Can we obtain this informative attribute set? If (c, λ) ≤ c_0, then yes. We can now formulate a general mathematical statement of the problem of choosing a set of the most informative features with limited resources as follows:

I(λ) → extr,  λ ∈ Λ_ℓ,  (c, λ) ≤ c_0,

where (∗, ∗) denotes the scalar product of vectors. If nothing specific is known about the objects and classes except the values of the objects' features, then we can assume the following. Hypothesis 1. If the training sample has the form (1), then the feature values of objects of the same class are more similar (closer) to one another than those of objects of different classes. Hypothesis 2. If Hypothesis 1 is true, then the optimal IAS brings objects of one class closer together and separates objects of different classes better than any other.
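The resource feasibility check (c, λ) ≤ c_0 formulated above can be sketched directly; the cost vector and budget below are illustrative values, not taken from the paper:

```python
def resource_cost(c, lam):
    """Scalar product (c, lam): the total cost of identifying
    the features selected by the binary vector lam."""
    return sum(cj * lj for cj, lj in zip(c, lam))

c = [2.0, 1.0, 4.0, 3.0]   # per-feature identification costs (illustrative)
c0 = 6.0                   # allocated resource
lam = (1, 1, 0, 1)         # candidate IAS indicator vector
feasible = resource_cost(c, lam) <= c0
print(feasible)
```

Only the feasible candidates, i.e. those passing this check, take part in the optimization over Λ_ℓ.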
The coefficients a_j and b_j are independent of λ and are calculated in advance.
For this functional, an IAS is considered better the larger the value of the functional. Among the main advantages of the functional, its relative simplicity should be highlighted.
On the other hand, the simplicity of the Fisher functional is also a disadvantage, since complex nonlinear properties of the analyzed classes can be "overlooked". In its favor, however, is the fact that simple quality criteria, as a rule, turn out to be more reliable: if not the most informative, then at least a fairly informative subsystem of attributes is distinguished. Conversely, complex criteria, which in most cases select more informative subsystems, often select subsystems for which it is difficult to construct a decision rule. Then (7) takes the following form:

I(λ) = (a, λ) / (b, λ) → max,  λ ∈ Λ_ℓ,  (c, λ) ≤ c_0.   (9)

Before proceeding to the solution of (9), we first determine the region in which a solution exists.
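For small N, problem (9) can be solved by exhaustive enumeration, which is useful only as a check: brute force is exponential in N, which is precisely what the paper's analytical method avoids. The coefficient values below are illustrative assumptions:

```python
from itertools import combinations

def solve_ias(a, b, c, ell, c0):
    """Exhaustive solution of problem (9): maximize
    I(lam) = (a, lam) / (b, lam) over all lam with exactly ell ones
    subject to the resource constraint (c, lam) <= c0."""
    n = len(a)
    best, best_val = None, float("-inf")
    for idx in combinations(range(n), ell):
        if sum(c[j] for j in idx) > c0:
            continue  # violates the resource constraint
        val = sum(a[j] for j in idx) / sum(b[j] for j in idx)
        if val > best_val:
            best, best_val = idx, val
    return best, best_val

# Illustrative coefficients: a_j rewards between-class scatter,
# b_j penalizes within-class scatter, c_j is the cost of feature j.
a = [5.0, 3.0, 8.0, 1.0]
b = [1.0, 1.0, 2.0, 1.0]
c = [1.0, 1.0, 10.0, 1.0]
best, val = solve_ias(a, b, c, ell=2, c0=5.0)
print(best, val)
```

Note how the expensive third feature is filtered out by the resource constraint even though it has the largest coefficient a_j.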

Discussion
Proposition 1. From the numerical sequence (11) the following are obtained: if f_1 > c_0, then (9) has no solutions; if f_1 ≤ c_0, then (9) has at least one solution; if there exists t > ℓ such that f_1 − c_{j_1} + c_{j_t} ≤ c_0 < f_1 − c_{j_1} + c_{j_{t+1}}, then the features corresponding to the indices j_{t+1}, j_{t+2}, …, j_N will not participate in the IAS and should be excluded from further consideration, i.e. removed from the attribute space.
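One reading of Proposition 1 can be sketched as follows, under the assumption that the sequence (11) lists the feature costs in ascending order c_{j_1} ≤ c_{j_2} ≤ …, so that f_1 is the cost of the ℓ cheapest features. The exclusion test here lets an expensive feature replace the costliest member of the cheapest set, which may differ in detail from the paper's exact formulation:

```python
def feasibility(c, ell, c0):
    """Sketch of Proposition 1 (one reading): decide whether (9) has a
    solution and which features can never enter a feasible IAS.

    Returns (has_solution, excluded), where `excluded` are indices of
    features too expensive to appear in any feasible ell-element IAS.
    """
    order = sorted(range(len(c)), key=lambda j: c[j])  # sequence (11)
    f1 = sum(c[j] for j in order[:ell])  # cheapest attainable set cost
    if f1 > c0:
        return False, []                 # (9) has no solutions
    # a feature j outside the cheapest set can only enter by replacing
    # one of its members; if even the cheapest such swap exceeds c0,
    # the feature is excluded from further consideration
    excluded = [j for j in order[ell:]
                if f1 - c[order[ell - 1]] + c[j] > c0]
    return True, excluded

ok, dropped = feasibility(c=[1, 2, 3, 10, 50], ell=2, c0=12)
print(ok, dropped)
```

Pre-excluding such features shrinks the search space before the main method runs, which is the practical value of determining the region of existence first.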
By analogy with [***], to solve problem (12) we introduce a vector function that indicates the direction of fastest growth of the functional I(λ) at the point λ.
We formulate the considered method in the form of an algorithm. Step 1. The input parameters are: ℓ, the required number of features; N, the total number of features; a, b and c, which are N-dimensional vectors; and c_0, the allocated resource.
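The fastest-growth idea behind the algorithm can be sketched as a greedy procedure over the same inputs (ℓ, N, the vectors a, b, c, and the resource c_0). This is not the paper's exact procedure and is not guaranteed optimal; all numeric inputs are illustrative:

```python
def greedy_ias(a, b, c, ell, c0):
    """Greedy sketch of the fastest-growth idea: at each step add the
    feature that most increases I(lam) = (a, lam) / (b, lam) while
    still fitting within the resource c0."""
    n = len(a)
    chosen, sa, sb, spent = [], 0.0, 0.0, 0.0
    for _ in range(ell):
        best_j, best_val = None, float("-inf")
        for j in range(n):
            if j in chosen or spent + c[j] > c0:
                continue  # already taken or over budget
            val = (sa + a[j]) / (sb + b[j])  # I after adding feature j
            if val > best_val:
                best_j, best_val = j, val
        if best_j is None:
            return None                      # no feasible completion
        chosen.append(best_j)
        sa, sb, spent = sa + a[best_j], sb + b[best_j], spent + c[best_j]
    return sorted(chosen), sa / sb

result = greedy_ias(a=[5.0, 3.0, 8.0, 1.0], b=[1.0, 1.0, 2.0, 1.0],
                    c=[1.0, 1.0, 10.0, 1.0], ell=2, c0=5.0)
print(result)
```

Unlike the paper's analytical method, a greedy pass of this kind can miss the optimum on adversarial inputs; it is shown only to make the direction-of-growth intuition concrete.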
The following is a method for determining the measure of similarity (proximity), or the scattering measure, for objects of the training sample.
Let the features of objects take binary (i.e. {0, 1}) and/or continuous (on the interval [α, β]) values. We consider both cases in more detail.
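A minimal per-feature similarity sketch covering these two feature types might look as follows; the paper's actual scattering measure is not reproduced here, and the specific formulas are illustrative assumptions:

```python
def similarity(x, y, kind, alpha=0.0, beta=1.0):
    """Per-feature similarity sketch for the two feature types:
    binary features match exactly; continuous features on
    [alpha, beta] are compared by normalized distance."""
    if kind == "binary":
        return 1.0 if x == y else 0.0
    # continuous feature: 1 when x == y, 0 at the interval's full width
    return 1.0 - abs(x - y) / (beta - alpha)

print(similarity(1, 1, "binary"))                    # identical -> 1.0
print(similarity(0.2, 0.7, "continuous", 0.0, 1.0))  # half the interval apart
```

Per-feature similarities of this kind can then be aggregated over a candidate IAS to compare within-class and between-class scatter, in the spirit of Hypotheses 1 and 2.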

Conclusion
The main result of this work is the solution of the scientific problem of choosing an informative attribute set, taking the available resource into account, in the problems of pattern recognition and data preprocessing. In the course of solving this problem, the following work was carried out: the main causes and types of big data problems were studied; the main tasks of data preprocessing were given; the need for dimensionality reduction of the attribute space and the shortcomings of existing methods for selecting an IAS were shown; an effective analytical method for determining an IAS taking the resource into account was developed and investigated; an approach was proposed for determining the region of existence of a solution for the problem of choosing an IAS with a limited resource; it was proved that the proposed method for determining the IAS guarantees the optimality of the results in the sense of maximizing the selected functional; the relevance of choosing this type of informativeness criterion was substantiated; the universality of the method with respect to the type of features was shown; and an algorithm implementing the method was given.