Algorithms and results of streaming whistler recognition

Algorithms for streaming whistler recognition are proposed, and the individual stages of the algorithms are considered. The developed algorithms are used on mini-computer software and hardware complexes for monitoring very low-frequency electromagnetic radiation, and whistlers are also recognized on the basis of open access data from abelian.org stations. Our whistler recognition results allowed us to establish that, on days of strong whistler activity, there is an average positive correlation between the number of whistlers registered at Karymshino station in each minute of the day and the number of lightning strokes registered by the WWLLN global network in each minute of the day in the coordinate rectangle LAT 25S-45S, LON 140E-160E (Australia).


Introduction
Paper [1] describes the new software-hardware complex «Sensor signal analysis network» (SSAN) for building and operating sensor networks for distributed analysis of time-synchronized signals. On the basis of the SSAN complex, synchronous registration of atmospherics and whistlers began in operational mode in November 2017 at the radio physical observation stations Karymshino (Kamchatka Krai, Russia) and Oybenkel (Yakutsk, the Republic of Sakha (Yakutia), Russia) [2]. At each point, a vertical asymmetrical electric dipole is used as a capacitive sensor to register electromagnetic radiation. After amplification, the signals are fed to the input of a sound card with a sampling frequency of 192 kHz per channel, which makes frequencies of up to 96 kHz visible in the spectrograms and in turn allowed a large number of high-frequency whistlers to be registered at the Karymshino station (examples in Figure 1). In the SSAN complex, the time reference is provided by a GPS module (NMEA protocol and PPS signal). Nodes of the SSAN network allow the user to run custom signal analysis programs and to create archives of recognized events, source signals and their spectrograms [1].
With the support of the VarSITI grant, a database of atmospherics and whistlers registered in the Far East of Russia was created [3]. The database was obtained both by recognizing events in archived WAV files and by streaming event recognition with the SSAN complex in near real time.
The developed algorithms are also used by the authors for streaming whistler recognition based on the open access data of the VLF station network (abelian.org). This work describes the developed algorithms and some results obtained by applying the whistler recognition algorithms.

Existing algorithms for whistlers recognition
At present, various algorithms for whistler recognition have been developed [4-7, etc.], which rely on different properties that indicate the presence of whistlers in a spectrogram. In paper [4] the authors propose an algorithm developed to determine plasmaspheric electron density from whistler traces, based on a Virtual (Whistler) Trace Transformation that uses a 2-D fast Fourier transform.
The abelian.org website offers for download the open source program vtevent [5], which uses a combination of principal component analysis and the Hough transform to detect VLF signals that have a reasonably narrow bandwidth along with rising or falling frequency. Falling frequency signals are identified as whistlers if they conform to the expected shape of a whistler curve as defined by the low frequency Eckersley approximation. A dispersion range of 12.2 to 114 is examined by the Hough transform.
The algorithm of [6] distinguishes the following four main stages of whistler recognition:
1. Median filtering. The authors treat whistler recognition as a problem of recognizing a graphic image and suggest median filtering to suppress extended impulse noise in images.
2. Selection of significant samples for each column of the filtered spectrogram. The squares of the column elements are considered as the Schuster periodogram of the corresponding fragment of the VLF signal, and the hypothesis that this fragment of the signal is white noise is tested at the significance level α.
3. Transformation of coordinates in order to straighten the image of the whistler. For a whistler that arrived at the time τ, a change of coordinates based on the form of the whistler maps its image to a straight line in the coordinate plane (t, s).
4. Recognition of the "straightened" whistler. The whistler search problem is reduced to the problem of finding a set of points near an inclined line (or lines) in the (t, s) plane.
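The straightening idea of stage 3 can be illustrated with a short derivation. It assumes that the whistler form follows the low frequency Eckersley approximation mentioned above; the dispersion symbol D and the choice of the coordinate s are assumptions made for this sketch and are not taken from [6].

```latex
t(f) = \tau + \frac{D}{\sqrt{f}}, \qquad s = \frac{1}{\sqrt{f}}
\quad\Longrightarrow\quad t = \tau + D\,s
```

Under these assumptions every whistler becomes a straight line in the (t, s) plane, with the arrival time τ as the intercept and the dispersion D as the slope of t with respect to s.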
In paper [7], an algorithm for automatic recognition of whistlers is proposed, based on the connected component labeling (CCL) method. When searching for whistlers in a spectrogram, the authors first remove noise with a number of image processing methods (median filtering, adaptive thresholding and opening). After removing the noise, the authors apply the CCL method to detect the wave pattern in the spectrogram image of whistler waves. The CCL method scans all pixels of the spectrogram image from the upper left to the lower right. The authors call pixels connected if they share the same intensity value and are located close to each other. After all connected groups of pixels have been found in the image, the groups can be tagged or marked to distinguish one group from the others. The authors select two stages in the CCL method used: at the first stage, each pixel is assigned a label in accordance with four rules (where A(x,y) is the pixel intensity value at location (x,y) of image A) [7]:

1. If both the upper neighbor A(i-1,j) and the left neighbor A(i,j-1) of the object pixel A(i,j) have the same label X, then assign label X to A(i,j).
2. If either the upper neighbor A(i-1,j) or the left neighbor A(i,j-1) of the object pixel A(i,j) has the label X, then assign label X to A(i,j).
3. If the upper neighbor A(i-1,j) has label X and the left neighbor A(i,j-1) of the object pixel A(i,j) has a different label Y (i.e., X != Y), then assign label X to A(i,j). Enter X and Y in an equivalence table.
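The labeling rules above correspond to the classical two-pass CCL scheme. The following is a minimal sketch of it, not the implementation of [7]. It assumes a binary image (nonzero means foreground) obtained after thresholding, 4-connectivity through the upper and left neighbors, and a union-find structure in place of the equivalence table; the function name `label_components` is hypothetical.

```python
def label_components(image):
    """Two-pass connected component labeling (upper/left neighbours only).

    `image` is a list of rows; nonzero entries are foreground pixels.
    Returns (labels, count), where labels uses 1..count and 0 is background.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[k] is the union-find parent of provisional label k

    def find(x):
        # Follow parent links to the representative label, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        # Record that provisional labels x and y belong to the same component.
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[max(rx, ry)] = min(rx, ry)

    # First pass: assign provisional labels and record equivalences.
    for i in range(rows):
        for j in range(cols):
            if not image[i][j]:
                continue
            up = labels[i - 1][j] if i > 0 else 0
            left = labels[i][j - 1] if j > 0 else 0
            if up and left:
                labels[i][j] = up            # rules 1 and 3: take the upper label
                if up != left:
                    union(up, left)          # rule 3: X and Y are equivalent
            elif up or left:
                labels[i][j] = up or left    # rule 2: copy the only existing label
            else:
                parent.append(len(parent))   # no labeled neighbour: new label
                labels[i][j] = len(parent) - 1

    # Second pass: replace each label by a compact id of its equivalence class.
    remap = {}
    for i in range(rows):
        for j in range(cols):
            if labels[i][j]:
                root = find(labels[i][j])
                labels[i][j] = remap.setdefault(root, len(remap) + 1)
    return labels, len(remap)


demo = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [1, 0, 1, 1, 0],
]
demo_labels, demo_count = label_components(demo)
```

The union-find structure keeps the merging of equivalent labels cheap, which matters when a curved trace produces many provisional labels along its length.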

Functional scheme and new algorithms for whistlers recognition
To make use of the various classification characteristics of whistlers in the spectrogram and of different recognition algorithms, an extensible functional scheme of the whistler recognition system is proposed in Figure 2.
• tables TLTRD and TRTLD, which match the group size to allowable thresholds of the percentage of group points lying on the covered distance among the total number of group points.
Step 1. Create a two-dimensional array Sl and write into it the logarithmically normalized values of the signal spectrogram (Figure 3).
Step 2. Set to zero all values of the spectrogram array Sl at frequencies above the given threshold Fp (Figure 4 (step 2)).
Step 3. Calculate the average values of array Sl along the frequency axis and write them into array Mavg1. For each element Sl(i,j) such that Sl(i,j)>Kr*Mavg1(j), check the condition Sl(i,j)>Kb*Sl(i,j+k) for each k∈I for which the index (j+k) is allowable; if the condition fails for at least one k∈I, set Sl(i,j)=Mavg1(j) (Figure 4 (step 3)).
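Steps 1 to 3 can be sketched as follows. This is an illustration under several assumptions that the text does not fix: index i runs over frequency bins and j over time columns, "logarithmically normalized" is taken to mean log10(1 + value) scaled to a maximum of 1, comparisons in Step 3 use the values as they were before any replacement, and the parameter values in the example are arbitrary. The function name `preprocess` and the argument `freqs` are hypothetical.

```python
import math


def preprocess(spec, freqs, Fp, Kr, Kb, offsets):
    """Sketch of Steps 1-3 for spec[i][j] (i: frequency bin, j: time column)."""
    # Step 1: logarithmic normalization (assumed form: log10(1 + v) / max).
    logged = [[math.log10(1.0 + v) for v in row] for row in spec]
    peak = max(max(row) for row in logged) or 1.0
    Sl = [[v / peak for v in row] for row in logged]

    # Step 2: zero every row whose frequency lies above the threshold Fp.
    for i, f in enumerate(freqs):
        if f > Fp:
            Sl[i] = [0.0] * len(Sl[i])

    # Step 3: per-column average over the frequency axis.
    n_freq, n_time = len(Sl), len(Sl[0])
    Mavg1 = [sum(Sl[i][j] for i in range(n_freq)) / n_freq
             for j in range(n_time)]

    out = [row[:] for row in Sl]  # compare against Sl, write into the copy
    for i in range(n_freq):
        for j in range(n_time):
            if Sl[i][j] > Kr * Mavg1[j]:
                for k in offsets:
                    in_range = 0 <= j + k < n_time
                    if in_range and not Sl[i][j] > Kb * Sl[i][j + k]:
                        out[i][j] = Mavg1[j]  # condition failed for this k
                        break
    return out, Mavg1


# Tiny synthetic spectrogram: 3 frequency bins by 4 time columns.
spec = [[9, 9, 9, 9], [0, 99, 0, 0], [5, 5, 5, 5]]
freqs = [1000.0, 2000.0, 30000.0]
Sl_out, Mavg1 = preprocess(spec, freqs, Fp=20000.0, Kr=1.5, Kb=1.2,
                           offsets=[-1, 1])
```

In this toy input the isolated strong sample in the second row survives, while the samples of the uniform first row that exceed Kr times the column average are replaced by that average.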
Step 7. For each group ∈ G, calculate the number Kdifx of points with different values on the X-axis; if Kdifx<Pdifx, remove the group from G (Figure 4 (step 7)).
Step 8. For each group ∈ G, use the method of least squares to calculate the approximating straight line y=k*x+b. Remove from G all groups with slope coefficients k>kmax or k<kmin. Recursively join the groups, taking into account the parameters maxDifX, kAbsDif, the interval [difY1; difY2], etc. (Figure 4 (step 8)).
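The filtering part of Steps 7 and 8 can be sketched as below. The sketch assumes that a group is a list of (x, y) points and omits the recursive joining of groups, because the text does not specify its rules. The function names and the threshold values in the example are hypothetical.

```python
def distinct_x(group):
    """Kdifx: number of distinct X values among the points of a group."""
    return len({x for x, _ in group})


def fit_line(group):
    """Ordinary least squares fit y = k*x + b; needs at least 2 distinct X."""
    n = len(group)
    mean_x = sum(x for x, _ in group) / n
    mean_y = sum(y for _, y in group) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in group)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in group)
    k = sxy / sxx
    return k, mean_y - k * mean_x


def filter_groups(groups, Pdifx, kmin, kmax):
    """Keep groups with enough distinct X values and a slope in [kmin, kmax]."""
    kept = []
    for group in groups:
        # Step 7 (and a guard: a line fit needs two distinct X values).
        if distinct_x(group) < max(Pdifx, 2):
            continue
        # Step 8: slope test on the least squares line.
        k, _ = fit_line(group)
        if kmin <= k <= kmax:
            kept.append(group)
    return kept


falling = [(0, 10), (1, 8), (2, 6), (3, 4)]        # slope -2
vertical = [(5, 1), (5, 2), (5, 3)]                # a single X value
flat = [(0, 0.0), (1, 0.1), (2, 0.2), (3, 0.3)]    # slope 0.1
kept = filter_groups([falling, vertical, flat], Pdifx=3, kmin=-5.0, kmax=-0.5)
```

With these example thresholds only the falling group is kept: the vertical group fails the distinct-X test and the flat group fails the slope test.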
Step 10. Using dynamic programming, calculate the maximal sum SumLTRD of the number of group points on the covered distance, allowing movement on group points right or down from the upper left corner of the rectangle into the lower right corner of the rectangle of the group, and calculate the maximal sum SumRTLD of the number of group points on the covered distance allowing movement on group points left or down from the upper right corner of the rectangle into the lower left corner of the rectangle of the group.
Calculate percentages percentLTRD and percentRTLD of the number of group points on the covered distance among the total number of group points. Using the tables TLTRD and TRTLD, group sizes and obtained values percentLTRD and percentRTLD, filter the groups.
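The dynamic programming of Step 10 can be sketched as follows. The sketch interprets the covered distance as a monotone path through the bounding rectangle of the group that collects as many group points as possible; this interpretation, the (x, y) point representation with y growing downwards, and the function names are assumptions. Filtering against the tables TLTRD and TRTLD is not shown because their contents are not given in the text.

```python
def max_path_points(grid):
    """Maximum number of marked cells on a right/down path from the
    upper left to the lower right corner of a 0/1 grid."""
    rows, cols = len(grid), len(grid[0])
    best = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            prev = 0
            if i > 0:
                prev = best[i - 1][j]
            if j > 0:
                prev = max(prev, best[i][j - 1])
            best[i][j] = prev + (1 if grid[i][j] else 0)
    return best[-1][-1]


def path_percentages(points):
    """Return (percentLTRD, percentRTLD) for a group of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    cols, rows = max(xs) - x0 + 1, max(ys) - y0 + 1
    grid = [[0] * cols for _ in range(rows)]
    for x, y in set(points):
        grid[y - y0][x - x0] = 1
    total = sum(map(sum, grid))
    sum_ltrd = max_path_points(grid)
    # Mirroring the columns turns "left or down from the upper right corner"
    # into "right or down from the upper left corner".
    sum_rtld = max_path_points([row[::-1] for row in grid])
    return 100.0 * sum_ltrd / total, 100.0 * sum_rtld / total


diagonal = [(0, 0), (1, 1), (2, 2), (3, 3)]
percent_ltrd, percent_rtld = path_percentages(diagonal)
```

For a group that descends from the upper left to the lower right, all points lie on one right/down path, while a left/down path can collect only one of them, so the two percentages separate the two orientations.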
The remaining groups of points (logarithmic frequency scale) are labeled as whistler points (Figure 4 (step 10)) and must be checked, according to the functional scheme (Figure 2), by algorithms for detailed analysis and recognition of whistlers, as well as by algorithms that determine the entire whistler shape.
We should note the high speed of the proposed algorithm. For instance, a 15-minute WAV file with a sampling frequency of 44.1 kHz is analyzed in 7 seconds.
Based on a large number of whistlers recognized at various locations, we created a training set of whistler images and images without whistlers (including mistakenly recognized whistler images). Using this training set and a Tesla K80 video card, we trained a deep neural network based on the ResNet-50 convolutional neural network architecture. The neural network is currently used at the stage of detailed analysis of the whistler recognition results (Figure 2). The currently trained network shows about 98% classification accuracy on validation images that were not used at the learning stage. A test spectrogram image of two seconds duration is recognized in 0.11 seconds on a computer without a video card, using two cores of an Intel(R) Xeon(R) CPU 2.30GHz processor.
Further research directions on the use of deep neural networks for whistler recognition involve the selection of a suitable neural network architecture, the expansion of the training set, the use of a neural network to determine the entire whistler shape (not only its tail part), etc.

Conclusion
On the basis of the proposed algorithm, we created a historical database of whistlers registered with a magnetic East-West antenna at the Karymshino station in Kamchatka and an automatically updated database of whistlers registered on mini-computer software and hardware complexes for distributed monitoring of low-frequency electromagnetic radiation [3]. The most correct and complete source WAV files in the historical database are available for 2012-2013.

The results of the whistler recognition for 2012-2013 allowed us to establish that, on days of strong whistler activity, there is an average positive correlation between the number of whistlers registered at Karymshino station and the number of lightning strokes registered by the WWLLN global network in the coordinate rectangle LAT 25S-45S, LON 140E-160E (Australia). Table 1 and selected figures (12-13) show the results of the Spearman correlation analysis described above, by minutes and by hours of the day, for days on which more than 6,000 whistlers were registered at the Karymshino station.
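For reference, the Spearman coefficient used in this analysis is the Pearson correlation of the ranks of the two series. The sketch below computes it with average ranks for ties; the hourly counts in the example are invented for illustration and are not the data of this work, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        avg = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical counts per hour (illustration only).
whistlers_per_hour = [10, 20, 30, 40]
strokes_per_hour = [1, 3, 2, 4]
rho = spearman(whistlers_per_hour, strokes_per_hour)
```

Because only ranks enter the computation, the coefficient is insensitive to the very different scales of whistler counts and lightning stroke counts.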