Holographic method of neural network design

The paper presents a method for modifying a fully connected feedforward neural network with a sigmoid activation function when a new class emerges, so that the network keeps its ability to recognize already known classes and can also classify objects of the new class. Such conditions usually call for a new neural network trained on examples of all classes, including the new one. This learning process takes a long time and requires the selection of several parameters. In natural neural networks, the emergence of a new class does not transform the network structure; only the strength of connections between neurons changes. The network shows stability and plasticity at the same time. The authors draw attention to the analogy between neural networks and holograms in their ability to store information and form an image of a class in response to an input signal. Following the holographic analogy, the paper proposes a model of the wave nature of neural networks, which treats the network weights as a hologram and the input signal as a wave passing through the hologram. The new network is constructed from two neural networks as a combination of two holograms. The first hologram represents the original network, and the second is a new network with a similar structure that is trained to recognize one new class. Adding the holograms of these neural networks implements the mechanisms of plasticity and stability in the model.


Introduction
Artificial neural networks have become a popular tool for solving many problems. They perform well in classification tasks on predefined sets of classes. Information systems, however, face the emergence of new classes because of external factors or internal reasons. Under such conditions a neural network cannot recognize the new class, and we have to create a new network capable of recognizing both the already known classes and the new one.
The task is then to create and train a new neural network, yet training remains a very time-consuming process that involves selecting many empirical parameters. In fully connected multilayer neural networks these parameters affect the learning time and quality, and it is impossible to be sure in advance that the expected result will be achieved in a given time.
Forming a network by constructive methods looks like a solution to the problem. They model the mechanisms that are involved in the formation of natural neural networks, providing plasticity and stability.
Existing constructive algorithms add neurons to create modified networks that can classify new images. Network training is required after each addition of neurons or neuron complexes. The disadvantage of such methods is the need to strictly control the appearance of an excessive number of neurons [1,2].
Constructive methods are suitable at the beginning of neural network formation, when each new class can generate one or more new neurons in the network structure, which at the initial stages does not significantly affect the total number of neurons and the training time. As the number of classes grows, the number of neurons and the number of connections between them increase. With an excessive number of neurons and many connections, the neural network loses its ability to generalize and reacts to noise [3], even though the existing number of neurons and network layers would be enough to recognize more classes.
Observation of natural neural networks shows that the network structure does not change often or dramatically. Only in early ontogenesis, when the neuronal network is actively forming, are neurons added to the network and neuronal connections formed and even restructured by breaking existing connections [4].
Functional plasticity appears in the mature neural network, associated with increased reactivity of neurons exposed to stimulation and functional change of connections [5]. Prolonged exposure of neurons during the learning process leads to changes in their metabolic activity and synaptic membrane permeability, actually changing the strength of connections.
So, learning of an already formed natural neural network does not lead to the addition of new neurons. There is not a quantitative, but a qualitative transformation of a neural network because of changes in connections between neurons [6,7].
According to neurophysiological observations of D. Hebb, if neurons on both sides of a synapse are activated simultaneously and regularly, the strength of synaptic communication increases [8]. An important feature of this rule is that the change in synaptic weight depends only on the activity of the neurons connected by the synapse.
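The locality of Hebb's rule can be illustrated with a minimal NumPy sketch. It assumes the simplest rate-based form of the rule, in which each weight grows in proportion to the product of the activities of the two neurons it connects; the function name `hebbian_update` and the parameter `learning_rate` are hypothetical and do not come from the cited work.

```python
import numpy as np

def hebbian_update(weights, pre, post, learning_rate=0.1):
    """Return weights after one Hebbian step (assumed rate-based form).

    weights[i, j] connects presynaptic neuron j to postsynaptic neuron i.
    Each weight changes only by the product of the activities of the two
    neurons it connects, so the update is purely local.
    """
    return weights + learning_rate * np.outer(post, pre)

# Two postsynaptic and three presynaptic neurons, initially unconnected.
weights = np.zeros((2, 3))
pre = np.array([1.0, 0.0, 1.0])   # presynaptic activity
post = np.array([1.0, 0.0])       # postsynaptic activity
updated = hebbian_update(weights, pre, post)
```

Only the connections whose two neurons are active together are strengthened; all other weights stay unchanged.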
Connections between neurons do not exist in isolation. O.V. Kuznetsov noted holographic properties of neural networks in the example of the human brain: distribution of information in the network and its preservation [17]. G. Shepherd noted that natural neural networks show stability through the replacement of damaged nerve pathways by competing for the possession of synaptic regions [18]. The systemic properties of the entire neural network compensate for the poor reliability of individual connections and neurons, ensuring stability.
The properties of plasticity and stability of neural networks lead to the analogy with holographic images. Works [9-14] put forward a hypothesis about the similarity of information processes in optical holography with information processes in the brain.
Nowadays, physics deals not only with optical holograms. The common feature of holograms is their wave nature. Any hologram is a wave pattern recorded in some medium, corresponding to the interference of coherent waves having a common source. Modern studies admit the existence of more complex holograms formed by sources of nonmonochromatic and incoherent waves.
Works [15,16] show that switching between spontaneous activity patterns with long-term stability can be achieved in neuronal cultures under external electrical stimulation. New information can thus be transferred into a neural network despite its holographic stability: without changing the structure of the network, external stimulation achieves a new behavior with new knowledge without traditional training procedures.
The problem is to add a new class to the network with minimal training, so as not to train all classes again. Our approach is to create and train a new neural network on the one new class and then combine it with the existing neural network.

Materials and methods
It follows from the holographic nature of neural networks that a change to even a single connection alters the properties of the entire network to some extent. As in holograms, only significant interventions in the connections cause qualitative changes.
We can consider a neural network (its weights and structure) as a hologram, having made some simplifications. We assume coherent waves form a hologram, and the thickness of the hologram registration medium will be negligibly small. Then the input signal of the neural network passing through it will generate an image of the class corresponding to it, like a wave restoring an image from the holographic plate.
Given two neural networks $N_1$ and $N_2$ with the same structure, which differ only in the weight coefficients of their connections and are trained to recognize different classes, we will consider them as holograms.
The first neural network is trained on examples from the set $X_1$, partitioned into a set of classes $C_1$; the second is trained on $X_2$, partitioned into classes $C_2$. In the original problem statement, the cardinality $m(C_2) = 1$.
If the two holograms are combined in space and different signals are applied to them, the classes corresponding to the signals will be reconstructed. A signal $x_1 \in X_1$ on hologram $N_1$ will recreate the corresponding $c_1 \subset C_1$, and $x_2 \in X_2$ on $N_2$ will recreate $c_2 \subset C_2$. Since the signals hit both holograms, the connections of the two neural networks interfere and distort the result, either amplifying or reducing it.
Since the conditions of hologram formation and its orientation in space are unknown, we can only suppose that affine transformations should be applied to one hologram for better alignment. Let us choose one network, $N_1$, as the base and the other, $N_2$, as the changing one.
Let us assume that the space is Euclidean and that the rotation in space is bounded. The hologram will then change: the phase difference fixed on the plate will change by $\Delta$ at each point of the plate (in general, each class will have its own $\Delta$, but since $m(C_2) = 1$, we can assume a single $\Delta$ for $N_2$).
To add the holograms, it is necessary to change from the amplitude-frequency representation to the frequency-phase representation. We use the fast Fourier transform for the discrete signal, which avoids the quadratic complexity of a direct discrete Fourier transform. For the neural network $N_1$ with weight coefficients $w^1_{ij}$ ($i = 1, \dots, n$; $j = 1, \dots, l$) we obtain the weight coefficients
$$W_1' = \mathcal{F}(w^1_{11}, \dots, w^1_{ij}, \dots, w^1_{nl}), \tag{1}$$
where $\mathcal{F}$ is the fast Fourier transform. Neural network $N_2$ after the transformation will have the weight coefficients
$$W_2' = \mathcal{F}(w^2_{11}, \dots, w^2_{ij}, \dots, w^2_{nl}) + \Delta \quad (i = 1, \dots, n;\ j = 1, \dots, l). \tag{2}$$
By adding the transformed weights of the neural networks $N_1$ and $N_2$, we get the weights of the new neural network $N_3$:
$$W_3' = W_1' + W_2'. \tag{3}$$
The weight coefficients $W_3'$ are complex numbers under the inverse Fourier transform, so we apply the Hartley transform $\mathcal{H}$ to get the real coefficients of the neural network $N_3$:
$$W_3 = \mathcal{H}(W_3'). \tag{4}$$
Let $f_1$, $f_2$ and $f_3$ denote the recognition accuracy of $N_3$ on the classes of $C_1$, $C_2$ and $C_1 \cup C_2$, respectively. The value of $\Delta$ is chosen from the condition
$$f_3 \to \max, \quad |f_1 - f_2| \to 0. \tag{5}$$
Thus, we achieve not only the maximum of $f_3$, but also guarantee the balance of the neural network in the recognition of classes from the sets $C_1$ and $C_2$. Depending on the problem, we can redefine the target function by relaxing the condition $|f_1 - f_2| \to 0$. Maximizing $f_3$ without this condition does not guarantee that $N_3$ will recognize the classes of $C_1 \cup C_2$ equally well.
It is possible, for example, to require that the network fully keeps its ability to recognize the classes of $C_1$ while recognizing the new classes only partially, but with a high percentage. Or, on the contrary, we can simulate the process of "forgetting" by reducing the requirements for $f_1$ and requiring a maximum for $f_2$.
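The addition of two weight holograms described above can be sketched in NumPy for one pair of weight matrices of the same shape. This is a minimal sketch under stated assumptions, not the authors' implementation: the weight matrix is flattened into a one-dimensional signal, the shift ∆ is read as a uniform phase rotation of the second spectrum, the Hartley step is taken as the real-minus-imaginary part of the summed spectrum followed by an inverse discrete Hartley transform, and the names `dht` and `holographic_add` are hypothetical. The optional factor `k` corresponds to the scaling factor k discussed later in the paper.

```python
import numpy as np

def dht(x):
    """Discrete Hartley transform of a real 1-D array, computed via the FFT.

    Uses the identity DHT(x) = Re(FFT(x)) - Im(FFT(x)).
    """
    spectrum = np.fft.fft(x)
    return spectrum.real - spectrum.imag

def holographic_add(w1, w2, delta, k=1.0):
    """Combine two same-shaped weight matrices in the Fourier domain.

    Assumptions of this sketch: weights are flattened to a 1-D signal and
    the shift delta acts as a phase rotation exp(i * delta) on the spectrum
    of the second (changing) network.
    """
    if w1.shape != w2.shape:
        raise ValueError("both networks must have the same structure")
    s1 = np.fft.fft(w1.ravel())
    s2 = k * np.fft.fft(w2.ravel()) * np.exp(1j * delta)
    s3 = s1 + s2                      # sum of the two transformed weight sets
    hartley = s3.real - s3.imag       # real-valued Hartley-style spectrum
    w3 = dht(hartley) / hartley.size  # the DHT is its own inverse up to 1/N
    return w3.reshape(w1.shape)
```

With `delta = 0` and `k = 1` the sketch reduces to plain addition of the two weight matrices, which is a convenient sanity check. For a multilayer network the function would be applied to each pair of corresponding layers.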
Networks trained to recognize different classes have different weights with the same structure. Networks trained on only one class have weakly expressed connections, approximately from the same range. Neural networks trained on more classes have more pronounced connections than those trained on one class (Table 1). To strengthen the connections of the neural network $N_2$, the strengths $w^2_{ij}$ of its connections can be scaled relative to the connections $w^1_{ij}$ of $N_1$ with a coefficient $k$, replacing (2) with
$$W_2' = \mathcal{F}(k w^2_{11}, \dots, k w^2_{ij}, \dots, k w^2_{nl}) + \Delta \quad (i = 1, \dots, n;\ j = 1, \dots, l). \tag{6}$$
We experimentally tested the method on the handwritten digits of the MNIST database. Neural networks were trained on different numbers of classes of handwritten digits, and then a new digit class was added to them. Both neural networks $N_1$ and $N_2$ were created with the same structure, and the number of outputs in both networks was set equal to $m(C_1 \cup C_2)$. Then $N_1$ was trained to recognize the classes of $C_1$ and $N_2$ the new class, after which we applied the holographic method to the networks.
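One way to give both networks the same output layout is to encode the targets of each training set over the union of all classes. The sketch below illustrates this under assumptions: the helper `one_hot_targets` and the example class lists are hypothetical and are not taken from the experiment.

```python
import numpy as np

def one_hot_targets(labels, all_classes):
    """Encode labels as one-hot rows over a shared, ordered class list."""
    index = {c: i for i, c in enumerate(all_classes)}
    targets = np.zeros((len(labels), len(all_classes)))
    for row, label in enumerate(labels):
        targets[row, index[label]] = 1.0
    return targets

classes_1 = [0, 1, 2]   # hypothetical digit classes known to the base network
classes_2 = [3]         # hypothetical new digit class
all_classes = sorted(set(classes_1) | set(classes_2))

# Both networks get one output per class of the union, so their
# output layers have the same size and the same class order.
targets_1 = one_hot_targets([0, 2, 1], all_classes)
targets_2 = one_hot_targets([3, 3], all_classes)
```

The base network never sees a nonzero target on the output reserved for the new class, and the changing network sees nonzero targets only on that output.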

Results and discussion
The proposed method depends on two parameters: the scaling factor $k$ and the rotation angle $\Delta$.
Different values of $f_1$ and $f_2$ can be achieved with different $\Delta$. For example, when adding networks that each recognize only one class, we get a result similar to Fig. 1.
Fig. 1. Graphs of the functions $f_1$, $f_2$, $f_3$: $f_1$, recognition of digit 1; $f_2$, recognition of digit 2; $f_3$, recognition of digits 1 and 2.
The graph shows that for the case $m(C_1) = 1$ there is a $\Delta$ at which $f_3$ reaches a maximum close to 100%. The experiment showed that if the original networks recognize their classes with an accuracy close to 100%, the resulting network also shows an accuracy close to 100%.
We note that the functions $f_1$, $f_2$, $f_3$ have a fractal form because of the wave nature of the model (Figures 1 and 2). The optimal value of $f_3$ may vary between fragments, but there is a limiting value of $f_3$ under the optimality conditions (5).
Therefore, we can search for $\Delta$ within a single fragment among the self-similar parts of the graph. The difficulty of the search is related to the fractality of the target function. With a brute-force algorithm, the step of $\Delta$ should be chosen as small as possible, although even a small step does not guarantee hitting the local maximum of $f_3$ on the fragment.
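The brute-force search can be sketched as a scan over candidate values of ∆ that keeps the best combined accuracy among the balanced candidates. This is a minimal sketch under assumptions: the function `search_delta`, the tolerance `balance_tol` that stands in for the balance condition, and the smooth `toy_score` curves are hypothetical; the toy curves are synthetic and are not data from the experiment.

```python
import numpy as np

def search_delta(score, deltas, balance_tol=0.05):
    """Scan candidate shifts and return (best_delta, best_f3).

    score(delta) must return the three accuracies (f1, f2, f3).
    Only candidates with |f1 - f2| <= balance_tol are accepted.
    Returns (None, -inf) if no candidate is balanced enough.
    """
    best_delta, best_f3 = None, -np.inf
    for delta in deltas:
        f1, f2, f3 = score(delta)
        if abs(f1 - f2) <= balance_tol and f3 > best_f3:
            best_delta, best_f3 = delta, f3
    return best_delta, best_f3

def toy_score(delta):
    """Synthetic stand-in for the accuracy curves (not measured data)."""
    f1 = 0.5 * (1.0 + np.cos(delta))
    f2 = 0.5 * (1.0 + np.cos(delta - 1.0))
    f3 = 0.5 * (f1 + f2)
    return f1, f2, f3

# Scan one period with a fixed step; a real score function would
# build the combined network for each delta and measure its accuracy.
best_delta, best_f3 = search_delta(toy_score, np.arange(0.0, 2.0 * np.pi, 0.01))
```

The toy curves are smooth, so a fixed step finds their maximum easily. For a fractal target function the same scan only samples the curve, which is why a small step still gives no guarantee.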
To get the best result, we can scale the connections of the neural network $N_2$ relative to the connections of the neural network $N_1$ with a coefficient $k$, chosen empirically.

Because of the holographic nature of the neural network, scaling within some limits does not affect the ability of the neural network to recognize the classes known to it. If all weight coefficients are scaled simultaneously by some coefficient $k$, the slope of the neuron sigmoid changes. This does not significantly affect a network with small weights, but when its weights are added to those of a network with larger coefficients, the scaling helps to strengthen their effect on the result. The scaling factor has no significant effect on the optimal value in the sense of (5); it only changes the conditions for finding this value over the set of $\Delta$ values.
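The effect of scaling on the sigmoid slope can be checked numerically for a single layer. The sketch assumes a layer without biases, so that multiplying every weight by k is the same as multiplying the argument of the sigmoid by k; the names `layer_output` and `slope_at_zero` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_output(weights, inputs, k=1.0):
    """Output of one bias-free sigmoid layer with all weights scaled by k."""
    return sigmoid((k * weights) @ inputs)

def slope_at_zero(k, eps=1e-6):
    """Central-difference slope of sigmoid(k * z) at z = 0 (equals k / 4)."""
    return (sigmoid(k * eps) - sigmoid(-k * eps)) / (2.0 * eps)

weights = np.array([[0.1, 0.2],
                    [0.3, -0.1],
                    [0.05, 0.2]])
inputs = np.array([1.0, 0.5])

plain = layer_output(weights, inputs)          # original weights
scaled = layer_output(weights, inputs, k=3.0)  # all weights multiplied by 3
```

For positive k the sigmoid stays monotonic, so the most active output neuron of this layer is the same before and after scaling, while the outputs move further away from 0.5.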
The choice of the base and the changing neural network is important. If $m(C_1) = m(C_2) = 1$, we choose the base network experimentally. An incorrectly chosen base network gives $f_1 = \mathrm{const} = 0$ or $f_2 = \mathrm{const} = 0$. In this case, we should swap the base and the changing neural networks.
The search for the optimum depends on the problem (the number of classes in $C_1$ and their peculiarities). The method demonstrates the ability of the neural network to store old knowledge and accept new knowledge at the same time.
We can apply this method to speed up the learning process by quickly training a neural network on a single class and then using it to change an existing network trained on multiple classes. The resulting network can then be additionally trained, which is faster than creating and training a new network.
Further research is required on the fractal nature of the target function, on the conditions for finding its maximum with a minimum enumeration of $\Delta$ values, and on the possibility of introducing additional parameters into the model that would increase the efficiency of the method.