Search for geophysical structures by their mathematical models and samples

When geophysical data are analyzed, the task of searching for structures by their samples and mathematical models often arises. We propose to use deep neural networks (DNN) to search for geophysical structures and to detect their forms. Both real structure samples and samples synthesized from their mathematical models serve as the training dataset. End-to-end demonstration examples are presented: highlighting reflection traces from different layers of the ionosphere in ionograms, and highlighting whistler forms in VLF spectrograms.


Introduction
In [1], the construction and functioning of question-answering sensor systems are considered. Such systems use question-answer agents to answer given types of natural language questions based on environmental monitoring data coming from sensor networks and existing data analysis systems. This paper discusses the search for geophysical structures (GS) in unstructured data. The search for GS is an important step in the task of filling the ontology with the parameters of the extracted GS. The approaches to the search for GS considered in this work are based on deep learning (DL), which is currently one of the highest-priority areas in artificial intelligence and machine learning [2].
The authors previously used DL successfully to recognize whistlers in spectrograms [3,4] and to highlight reflection traces from different layers of the ionosphere in ionograms [5]. Summarizing the known and obtained results, two categories of tasks that arise in the recognition of GS should be distinguished. The first task is to determine whether GS similar to the samples are present in previously unanalyzed data, and in which region they appear. The second task is to highlight the forms of GS similar to the samples in previously unanalyzed data. The second task is more complex, and its solution also implies the solution of the first task. The second task belongs to the class of object segmentation in images. To date, different deep neural network architectures have been developed for object segmentation in images (for example, U-Net [6], Mask R-CNN [7], Deep Watershed Transform [8], etc.). One variant of solving the segmentation problem can be divided into the following main stages: 1. to create a training dataset; 2. to define the architecture of the deep neural network to be trained; 3. to define the loss function (objective function); 4. to choose an optimizer, which determines how the deep neural network will change under the effect of the loss function; 5. to determine the learning rate; 6. to determine the number of training epochs and the stopping criterion; 7. to train the deep neural network; 8. to store the trained deep neural network (DNN) model for future use. One of the most laborious of these stages is the creation of a training dataset (TD), which can be created either by manually extracting the desired geophysical structures (DGS), or by generating DGS and overlaying them on possible background geophysical environments that do not contain DGS. The advantage of manual extraction of DGS is the quality of the resulting TD; the disadvantage is the time spent on its creation.
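The stages listed above can be illustrated with a deliberately small sketch. It is not the authors' training code: a single-layer logistic model stands in for the DNN, the toy dataset is synthetic, and all names (`make_toy_dataset`, `train`, `save_model`) and hyperparameter values are assumptions chosen for illustration. Only NumPy is used, so the loss, the optimizer step, the learning rate, the epoch budget, the stopping criterion, and the model storage are all visible in a few lines.

```python
import io
import numpy as np

def make_toy_dataset(n=200, seed=0):
    """Create a training dataset (synthetic stand-in for labeled samples)."""
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(n, 2))
    labels = (features[:, 0] + features[:, 1] > 0).astype(float)
    return features, labels

def train(features, labels, lr=0.5, max_epochs=200, tol=1e-4):
    """Train a logistic model (toy stand-in for the DNN architecture)."""
    w = np.zeros(features.shape[1])
    b = 0.0
    prev_loss = np.inf
    for epoch in range(max_epochs):              # number of training epochs
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        # Loss function: binary cross-entropy.
        loss = -np.mean(labels * np.log(p + 1e-12)
                        + (1.0 - labels) * np.log(1.0 - p + 1e-12))
        if prev_loss - loss < tol:               # stopping criterion
            break
        # Optimizer: plain gradient descent with learning rate lr.
        grad = p - labels
        w -= lr * (features.T @ grad) / len(labels)
        b -= lr * grad.mean()
        prev_loss = loss
    return w, b, loss

def save_model(w, b):
    """Store the trained parameters (in memory here; a file path also works)."""
    buf = io.BytesIO()
    np.savez(buf, w=w, b=b)
    buf.seek(0)
    return buf

features, labels = make_toy_dataset()
w, b, final_loss = train(features, labels)
stored = np.load(save_model(w, b))
accuracy = np.mean(((features @ stored["w"] + stored["b"]) > 0) == (labels > 0.5))
```

In a real setting each of these pieces would be replaced by its deep learning counterpart (for example, a U-Net model and a segmentation loss), while the order of the stages stays the same.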
The required number of training examples depends on the type of DGS, its diversity, and the diversity of the states of the geophysical environment. In general, increasing the diversity and number of training examples improves the quality of recognition of DGS forms, but it increases the training time of the DNN and can also raise the hardware requirements for training (for example, the amount of RAM and video card memory) [4]. DGS can be generated either from their mathematical models with variable parameters, or by using and modifying manually created DGS samples. Generative deep learning is one of the most interesting approaches to generating modified images from samples (Figure 1) [2]. Kaggle.com has a large number of training examples on recognizing objects in images. For example, the competition "Freesound Audio Tagging 2019" [9] asks participants to develop an algorithm that tags audio data automatically using a diverse vocabulary of 80 categories (Figure 2).

Search for whistlers by their mathematical models and templates
The mechanism of whistler formation was suggested by Storey [10]. According to this theory, an atmospheric initiated by a lightning discharge propagates in the Earth-ionosphere waveguide. However, part of the atmospheric's energy may penetrate through the ionosphere and enter the magnetosphere. In the heterogeneous and anisotropic magnetospheric plasma, the electromagnetic wave undergoes frequency dispersion. As a result, the pulse is transformed into a complex signal with an acinaciform frequency-time characteristic determined by the magnetic field line intensity.
In paper [3], algorithms for streaming whistler recognition are proposed. In this paper, we show where the algorithms for detailed analysis of whistlers and for highlighting their forms fit into the extensible functional scheme of the whistler recognition system. Based on a large number of whistlers recognized at various locations, we created a training set of images with whistlers and images without whistlers (including images mistakenly recognized as whistlers). Using this training set and a Tesla K80 video card, we trained a deep neural network based on the ResNet-50 convolutional neural network architecture. The neural network is currently used at the stage of detailed analysis of the whistler recognition results. The trained network currently shows about 98.5% classification accuracy on validation images that were not used at the training stage. A test spectrogram image of two seconds duration is recognized in 0.11 seconds on a computer without a video card, using two cores of an Intel(R) Xeon(R) CPU at 2.30 GHz. Figure 3 shows the evolution over epochs of deep neural network training for whistler recognition. To highlight whistler forms in VLF spectrograms, we used the U-Net deep neural network architecture [6], as in paper [5]. To create the training dataset, we performed the following steps:
a) manually labeled whistlers in VLF spectrograms and added them to a database of labeled whistler forms. Examples of these manually labeled whistlers are shown in Figure 4.
b) generated whistler forms using the well-known equation $t = D / \sqrt{f}$ [11], where $t$ is the arrival time of frequency $f$ and the coefficient $D$ is called the dispersion. All generated whistler forms were added to the database of labeled whistler forms. Examples of black-and-white masks of generated whistlers are shown in Figure 5. Examples of generated whistlers (with the starting and/or ending parts cut off) placed into VLF spectrograms are shown in Figure 6.
c) generated images by adding modified whistler forms, taken from the database of labeled whistler forms obtained in steps (a) and (b), to VLF spectrograms that do not contain whistlers. Whistler forms were modified by randomly performing the following operations: stretching and/or compressing the whistler in time and/or frequency, shifting it in frequency, deleting and/or changing the brightness of some of its points, cutting off the initial or final part of the whistler, drawing the shape of the nose frequency, etc.
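Steps (b) and (c) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' generator: the function names, the grid size, the frequency range, the arrival offset `t0`, and the probabilities are hypothetical, and only two of the listed modifications (cutting off one end of the trace and deleting some points) are shown.

```python
import numpy as np

def whistler_mask(D, f_min=1000.0, f_max=10000.0, duration=2.0,
                  n_freq=128, n_time=256, t0=0.1, thickness=1):
    """Binary mask of a trace t = t0 + D / sqrt(f) on a frequency-time grid.

    D is the dispersion; t0 (an assumed arrival offset), the frequency
    range in Hz and the grid size are illustrative values.
    """
    mask = np.zeros((n_freq, n_time), dtype=np.uint8)
    freqs = np.linspace(f_min, f_max, n_freq)        # row i <-> freqs[i]
    delays = t0 + D / np.sqrt(freqs)                 # arrival time, seconds
    cols = np.round(delays / duration * (n_time - 1)).astype(int)
    for row, col in enumerate(cols):
        if 0 <= col < n_time:
            lo = max(col - thickness, 0)
            hi = min(col + thickness + 1, n_time)
            mask[row, lo:hi] = 1
    return mask

def augment_and_overlay(mask, background, rng, gain=1.0, drop_prob=0.1):
    """Randomly modify a whistler mask and add it to a whistler-free image."""
    m = mask.copy()
    n_freq = m.shape[0]
    # Cut off a random part at the low- or high-frequency end of the trace.
    cut = int(rng.integers(0, n_freq // 4 + 1))
    if rng.random() < 0.5:
        m[:cut, :] = 0
    else:
        m[n_freq - cut:, :] = 0
    # Delete a random fraction of the remaining points.
    keep = rng.random(m.shape) >= drop_prob
    m = m * keep
    image = background + gain * m
    return image, m

rng = np.random.default_rng(0)
mask = whistler_mask(D=50.0)
background = rng.random(mask.shape) * 0.2   # stand-in for a real spectrogram
image, target = augment_and_overlay(mask, background, rng)
```

Each call yields one training pair: `image` is the network input and `target` is the black-and-white mask the network should reproduce. Varying `D` and the random modifications produces as many pairs as needed.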
After creating the training dataset, we trained the U-Net DNN on a Tesla K80 video card. Figure 7 illustrates examples of highlighting whistler forms in VLF spectrograms with the DNN.

Search for traces of reflections from different layers of the ionosphere in ionograms
A distinguishing feature of the new method proposed by the authors in [5] is the use of deep learning to recognize traces of reflections from different layers of the ionosphere. The deep neural network is trained on 45000 reference markings created by operators. Currently, the U-Net deep neural network architecture is used to detect the reflection traces from different ionospheric layers. To recognize the reflection traces from the ionospheric E, F1 and F2 layers, we trained a separate deep neural network (DNN) for each layer and used black-and-white masks marked by operators (Figures 8, 9). We used the dice-coefficient loss (DCL) function. Figure 10 shows examples of highlighting reflection traces from different layers of the ionosphere in ionograms. In each example, the left image is the ionogram marked by expert operators and the right image is the ionogram marked successively by three deep neural networks. The applied colors are: green for the reflection trace from the E layer; blue for the reflection trace from the F1 layer; cyan for the reflection trace from the F2 layer.
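For reference, the dice-coefficient loss is commonly written as $\mathrm{DCL} = 1 - 2|X \cap Y| / (|X| + |Y|)$, where $X$ is the predicted mask and $Y$ is the reference mask. The sketch below implements this common form in NumPy; the exact variant used in [5] is not given here, so the smoothing term `eps` and the function names are assumptions.

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice coefficient between a predicted mask and a reference mask.

    Both inputs hold values in [0, 1]; eps (an assumed smoothing term)
    avoids division by zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    return (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

def dice_loss(pred, target, eps=1e-7):
    """Dice-coefficient loss: 0 for a perfect match, close to 1 for no overlap."""
    return 1.0 - dice_coefficient(pred, target, eps)

reference = np.array([[0, 1, 1], [0, 0, 1]])
perfect = dice_loss(reference, reference)
disjoint = dice_loss(1 - reference, reference)
```

Because the loss depends on the overlap relative to the mask sizes rather than on the total number of pixels, it stays informative when the trace occupies only a small part of the ionogram.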

Conclusion
Based on the results obtained from the analysis of real geophysical data, we conclude that deep learning is effective in the tasks of searching for geophysical structures by their mathematical models and samples. We plan to develop a universal software module for searching for geophysical structures, intended for use in question-answering sensor systems.