Parallel adaptive sparse approximation methods for analysis of geoacoustic pulses

The article is devoted to a new approach in the analysis of geoacoustic pulses. The authors proposed a mathematical model based on a sparse representation of the signal. An adaptive matching pursuit method has been developed to identify model parameters. A parallel implementation of this algorithm is proposed on the CUDA platform. This allows real-time processing and modeling of signals.


Introduction
The passive acoustic emission method is widely used to study of the strength of materials, in geophysics and to study of the seismic process.It is based on the study of the characteristics of acoustic emission pulses that arise as a result of plastic deformation of solid media.Researches in Kamchatka shows that the acoustic emission method is effective in monitoring acoustic disturbances occurring in the sound range during activation of sedimentary rocks deformation at the final stage in the earthquake preparation.Such emission signals are called geoacoustic emission (GAE), and their disturbances are often used as operational earthquake precursors.
Typical GAE signal represents the succession of relaxation pulses of different form, amplitude and frequency.The characteristics of pulsed radiation depend on the properties of the studied plastic processes.Geoacoustic pulses are characterized by a complex internal structure, a wide variety of time-domain waveforms, large amount of natural and technogeneous noise, short duration.These features significantly complicate the estimation of characteristics and analysis of signal structure.The classical time-frequency methods used to analyze similar signals in other research fields do not give good results.A short duration leads to a poor time-frequency resolution and the redundancy of the resulting decomposition does not agree with the additive character of the pulse generation process because the pulse is the sum of elementary pulses corresponding to separate deformations [1].
It was necessary to develop a model that makes possible qualitative analysis of geoacoustic signals.The term "model of an isolated pulse" should be regarded as a linear combination of functions that satisfies the following properties: 1) functions (model components) should describe elementary pulses; 2) model must have high accuracy and do not contain a redundant number of components.Due to the complex structure of the signals the methods used to identify the model parameters have the property of adaptability and adapt to the local features of the signals and allow for real-time processing.

Sparse approximation and matching pursuit algorithm
To a first approximation the geoacoustic signal model may be described as the sum of elementary GAE pulses and interference The model identification method is based on the sparse approximation ideas.At first, a sparse representation is constructed over a large, in general case, linearly dependent set of different functions.It's substantially increases the adaptability of the methods.At second, sparse representations are compact and not redundant.Finally, the sparse approximation methods were effective in solving similar problems in related fields of science.
The sparse approximation problem is reduced to the signal representation in the linear combination form of the minimum possible number of functions for a given accuracy from some predefined set (hereinafter, the dictionary).
There is no algorithm that can solve this problem in polynomial time, but there are algorithms that can get an approximate solution.The two popular approaches to solving the sparse approximation problem are to replace the minimization of the l 0 -norm by minimization the l 1 -norm (Basis Pursuit, [2]) and replacing the minimization of the l 0 -norm by minimization the l 2 -norm (Matching Pursuit, [3]).Each of these approaches gives an estimate of the signal sparse representation on some dictionary {g Dictionary A series of experiments was performed on real geoacoustic signals (the experiment is described in) to select the most suitable algorithm.The best results for the processing time and the accuracy of the obtained sparse representations were demonstrated by the Matching Pursuit (MP) algorithm proposed by S.Mallat and Z.Zhang.
The algorithm is reduced to an iterative process of searching for dictionary elements that minimize the difference between the original function and the approximation at each step.
The classic MP algorithm allows to construct linear combinations as (1), satisfying the definition of the model, but has several disadvantages.The only way to improve the accuracy of the constructed approximations is the addition to the dictionary of new functions (atoms) that increases temporal and spatial costs (computing complexity of the MP is O(M 3 )).The need to use a dictionary of fixed size leads to "coarse" discretization in space of model parameters and reduces the adaptive property of the algorithm.
The authors proposed an Adaptive Matching Pursuit algorithm which allow under conditions of limited computing resources to obtain qualitative signal approximation and adapt to the particular features of the signals in [4].This algorithm is chosen as a method for identifying model (1) parameters.

Adaptive Matching Pursuit
The iterative refinement procedure was added to the classical MP algorithm to improve the quality of the approximation for the dictionary of fixed size.It lies in applying numerical method of alternating-variable descent type for searching atom with the highest value of the scalar product c max with the signal (hereinafter, the "best" atom).
1) to determine the scalar product of each atom and all its shifts with the signal, to remember the parameters of the "best" atom, p 0 , and the value of the scalar product 0 max с ; 2) to choose a vector λ containing the steps lengths for each of the vector parameters p 0 (it's recommended to select half of the parameter sampling interval as a step λ i for the parameter p i ); 3) k=1; 4) to calculate the scalar products of each atom obtained by a combination of parameters from vectors p k-1 -λ, p k-1 , p k-1 + λ and all its shifts with the signal, to determine the "best" atom, to remember its parameters p k and the value of the scalar product k с max ; 5) if p k = p k-1 then λ = 0.5 λ; 6) if the required accuracy is not achieved , then k=k+1 and go to step 4 else STOP.
A stopping criterion was added to the algorithm together with the refining procedure.This criterion allows to reduce the experiment temporal cost because the algorithm "decides" itself when to stop instead of calculating the maximum number of iterations for each test signal.It reduces spatial costs.Significant atoms are remembered for each decomposition instead of storing a fixed number of atoms.And also the criterion improves the adaptive property of the algorithm.The number of atoms included in the decomposition directly depends on the complexity of the signal structure.
The value 5-10% of error level was chosen empirically.The required accuracy of approximation for geoacoustic signals containing separate pulse is achieved.
The developed algorithm better adapts to local signal features and is more resistant to noise compared with the classical MP algorithm.Figure 1 shows the dependence graphs of approximation error on l 0 -norm for MP and AMP algorithms.

Choice of dictionary
The second important task in modeling geoacoustic signals is the choice of a basic dictionary for research.The authors decided to use a combined dictionary consisting of different functions.The choice of the dictionary is devoted to the works [5,6,7].
First of all, Gaussian modulated functions having the less area of Heisenberg box and widely used in audio signal description models were included in the dictionary.This function can be described in the form (5) where A is the amplitude chosen in such a way that the atom is normalized ||g(t)||=1, A>0; T end is the atom length; B = B lim • ∆ is the parameter affecting the attenuation of the envelope curve, B>0; B lim is the critical value of parameter В, calculated in respect of T end so that the pulse amplitude at the domain boundaries is not more than 5% of the maximum value f is fill frequency in Hz; ∆ is the gain of parameter B in respect of the critical value (Figure 2).The model should be composed of signals describing typical geoacoustic impulses by the above definition.Berlage pulse used in the modeling of seismic processes well describes the temporal form of the geoacoustic pulse from the different parametric equations (Fig. 2).
Berlage pulse may be described as following analytic expression where A is the amplitude chosen in such a way that the atom is normalized ||g(t)||=1, A>0; T end -is the atom length; t max is position of the envelope maximum value; n = n lim • ∆ is the parameter affecting the attenuation of the envelope curve, n>0; n lim is the critical value of parameter n, calculated in respect of T end and t max so that the pulse amplitude at the domain boundaries is not more than 5% of the maximum value f is fill frequency in Hz; ∆ is the gain of parameter n in respect of the critical value (Figure 2).The most probable ranges of values for the parameters p i (T end , t max , ∆ и f) for the proposed functions

Model of geoacoustic signal
On the basis of the above, detailed requirements regarding GAE signal model (1) were formulated by authors: 1) functions g i belong to the linear normalized space L 2 (R); 2) the system of functions is redundant; 3) g i are time shifted, modulated Gauss and Berlage pulses.Every atom is uniquely identified by and set of parameters p: frequency f; parameters influencing the pulse envelope shape: the type of function, n, B; 4) the atoms are normalized where g m are the atoms approximating the pulse, g n are the atoms approximating the parasitic component of pulse.Value N 1 характеризует characterizes the complexity of the pulse structure, N 2 is the noise level of the pulse.The difference between the original function and the approximation error R N determines the goodness of decomposition to the signal.
The coefficients α, β and the parameters of the functions p are identified by the Adaptive Matching Pursuit algorithm.Figure 3 shows the modeling of the real geoacoustic pulse.The registration system records geoacoustic signals with a frequency 48 kHz in real time and stores them in 15-minute WAVE files.A large amount of computing resources is required to analyze the anomalies of a single file.At the first stage, the researcher must detect signal sections containing pulses, but it's complicated by the short length and high noise level of pulses.Next, each detected segment is analyzed using the AMP algorithm.
The automated system for processing GAE signals was developed to automate and accelerate the analysis process.It consists of two parts 1) pulse detection and elimination of not informative signal segments; 2) parallel realization of the algorithm AMP.

Pulse detection
It was decided to apply the threshold method for automatic pulse detection in a signal.The pulse is registered when the signal exceeds some predetermined threshold value.
Most often, the detection threshold is a constant value and it is determined empirically.However, the high risk of losing informative data in this case: impulses below the specified threshold are not registered.But with a low thresholds and a strong noisy signal, a large amount of non-informative data can be extracted.An effective solution to this problem is using of an adaptive threshold, which automatically adjusts to the current noise level in the signal.The mean square deviation is used as the pulse detection parameter.Its calculation is carried out in non-intersecting windows of fixed length.
where s thk -threshold value in signal segment from k•n to k•n + n -1, x i -discrete signal value at the i sample, μ k -average signal in the window from k•(n-1) to k•n -1, n -window length, A -experimentally determined parameter.
The experiments proved that using of the adaptive threshold function makes it possible to detect single pulses in signals with different degrees of noisiness.However, when pulses are frequently followed at the signal segment, the threshold values can increase sharply, resulting in a loss of informative data (Figure 4a).
Therefore, it is suggested to not include signal sections containing a pulse in the process of calculating the threshold function (Figure 4b).

Parallel realization of adaptive matching pursuit algorithm
At the next stage, the system analyzes each detected pulse using the AMP algorithm.The most practical and cost-effective way to increase computing capability is the organization of parallel computing [8].
Let there be a dictionary consisting of M atoms with length of L g samples, and a signal with length of L samples.The scalar product between Signal and each Dictionary line is calculated and written into a C matrix of dimension M×(L+L g -1) to determine the "best" dictionary atom (Figure 5): The "best" atom is refined k times and stored in the output file.Figure 6 shows the scheme of the refinement procedure.The costliest part of the algorithm is the calculation of scalar products of dictionary atoms with the signal: matrix calculation time is more than 90% of the total algorithm execution time.
The same type processing of a large amount of information makes it possible to apply data parallelism to this part of algorithm.Each subtask calculates one element of the covariance matrix, depending on the input parameters i and j.The proposed algorithm decomposition is suitable for efficient execution in the style of the architecture SIMD (Single Instruction stream / Multiple Data stream), that allows to perform the same arithmetic operation on multiple data points simultaneously.One of the most popular technologies based on the SIMD concept is the software and hardware platform CUDA, used to organize parallel computing on graphics processors (GPU) [8].The main part of the algorithm is executed on the central processor (CPU), but the costly process of calculating the covariance matrix is sent to GPU (Figure 7).

Fig. 4 .
Fig. 4. Using an adaptive threshold to pulse detection in a signal