A martingale-based temporal analysis of pre-earthquake anomalies at Jiuzhaigou, China, in the period of 2009-2018

Many studies on the relationship between outgoing longwave radiation (OLR) data observed by National Oceanic and Atmospheric Administration Satellites (NOAA) with strong earthquakes are based on a certain zone or a couple of earthquakes. But it's hard to know if the algorithm works by looking at just one earthquake. In this paper, after algorithm analysis of OLR signals based on martingale theory during the 10 years from 2009 to 2018, a time series analysis of re-anomaly screening for the 10 years' data is proposed. The experimental results show that this method can be more effective in the statistical analysis of historical data and further improve the reliability of prediction.


Introduction
The most of the cause of earthquakes is the rupture of a faults zone [1], there are many earthquakes occurred just because of the area on the fault-melt, and the thermal omen is observed at the same time [2]. The fault zone showing rapid shear heating of fault gouge might preserve stress magnitude accommodated the thermal pressurization during the earthquake [3]. Scholars have proposed that 2017 Jiuzhaigou earthquake is delayed by the Wenchuan earthquake and the former event possibly results from the south-eastward movement of the Bayankala block, which in turn arises from the collision of Indian-Australian Plate and the Eurasian Plate [4]. The Jiuzhaigou earthquake occurred in the region among the eastern segment of the East Kunlun, Minjiang, and Huya faults.
Studies show that there are thermal anomalies and the change in electrical state [2], [5] and co-seismic slip and fault [6], [7] when an earthquake occurs. The study of the infrared data of seismic satellites in different periods around the world shows that many large areas of infrared radiation data are abnormally warming before strong earthquakes [8]- [11].
Many artificial intelligence analysis methods have found anomalies in the thermal infrared data before the earthquake, which include data mining, machine learning, and pattern recognition. The log-likelihood method can be used to detect changes in multi-dimensional data sets, Kuncheva [12] gave a log-likelihood justification for Kullback-Leibler (K-L) distance and Hotelling's Tsquare test for equal means, which use a single round of k-means clustering instead of density approximation, Song et al. [13] defined a statistical test call density test to deciding if the observed data points are sampled from the underlying distribution. Ho and Wechsler [14], [15] proposed a martingale framework for detecting changes in exchangeable data streams, which showed that it can be effective in unlabelled data streams. Severo and Gama propose to compose a regression predictor, a Kalman filter and Cumulative Sum of Recursive Residual (CUSUM) to detected changes [16]. The predictor can continuously detect the error of the regression model, whose significant increase is taken for an abnormal change.
A Bayesian approach for earthquake forecasting model was put forward by Marzocchi et al. [17].
Other methods involving analysis of thermal properties and statistical analysis of standard deviation have been proposed to find anomalies in the OLR data before earthquake [6], [18]. Chakraborty et al. studied the effects of seismic events on OLR in the M7.3 Nepal earthquake that occurred on May 12, 2015, and a previous large earthquake in on Apri25, 2015 [19]. They used the method of Eddy field calculation mean to find the anomaly and the relation with the tectonic plate boundary. Xiong et al. utilized a data mining technique based on wavelet transform to analyse the anomalies of longwave emission radiation (OLR) data; they propose that there is a certain correlation between the abnormal information found and earthquakes [20]. The OLR, from several decades, provides a significant pattern of transient anomalies preceding earthquakes [6], [7], [19], [21]- [23]. The OLR anomaly corresponds to a large area of ground coverage and coincides with the main epicentral zone [6], and primary was used to research Earth radiative budget and climate. An increase in radiation and a transient change in OLR were proposed to be related to thermodynamic processes in the atmosphere over seismically active regions [21]. The fault may create huge thermal energy which captured by NOAA and converted to OLR data. Based on the previous studies, in this paper, we take the OLR data of Jiuzhaigou from 2008 to 2019 as an example to study the abnormal results obtained by martingale theory algorithm, and conduct further analysis of time series to discover the relationship between time-series data changes and earthquake occurrence. We proposed a method which called the Martingale and SEA Algorithm (MSA) algorithm base on our previous work, which is reference Martingale theory [15], [24] and the Geometric Moving Average (GMA) method. The algorithm obtains the abnormal from OLR raw dataset.
The earthquake preparation process may start 1-30 days ago, Event. These multidisciplinary data provide a synopsis of the atmosphere/ionospheric variations related to tectonic activity. People found before an earthquake, there are some abnormal occurs, but they are instability and hard to find.
After the abnormal signal is extracted, it needs to be further analysed. Generally, the occurrence of an earthquake has a preparation process may start 1-30 days ago [19], [25]. To validate their result and to find out those characteristics of OLR in other aspects, we apply some statistical studies based on the data of Jiuzhaigou, China, 2009-2018 in this study, and we were more focus on the Ms6.5 earthquake at Aug 8th, 2017 to conduct the time series analysis.

Data Source and Pre-processor 2.1 Data Source
Outgoing Longwave Radiation (OLR) measured at the top of the atmosphere [21], which are not directly measured from the raw data using a separate algorithm [26]. In this paper, the OLR dataset used was from National Oceanic and Atmospheric Administration satellites (ftp://ftp.cpc.necp.noaa.gov/precip/noaa18_olr), which provides two data sets each day, one containing the OLR data for the "afternoon" satellite (i.e. equator crossing time 1430~0230 LST, currently "NOAA-14") and one day set for the "morning" satellite (i.e. equator time 0630~1830 currently "NOAA-15"). The data are stored in 144 ×72 arrays in ASCII, and each value represents the OLR flux on a 2.5 degrees latitude/longitude (units are W/m 2 ). As shown in Fig. 1, according to the data format, we divide the Jiuzhaigou fault zone into 9 grid areas, where each area indicated 2.5 º latitude×2.5 º longitudes. The five-point start in the middle expressed the epicentre of the earthquake on August8, 2017 in Jiuzhaigou.
We have studied OLR signals in Jiuzhaigou region from 2009 to 2018 and conducted time series analysis based on martingale theory algorithm in our previous research. The information about these earthquakes is from the website of U.S. Geological Survey, which is part of the National Earthquake Hazards Reduction Program (NEHRP) led by the National Institute of Standards and Technology (NIST).
(https://earthquake.usgs.gov/earthquakes/) The epicentre of the Jiuzhaigou earthquake was 33.193°N 103.855°E, corresponding to the grid 5th, the longitude was between 102.605°E and 105.105°E , and the dimensions were 34.443°N and 31.943°N . No. 5 in table 1 lists the information about this earthquake.  Table 1 lists 12 seismic events that occurred in grid 5 ( Fig. 1) between April and August from 2008 to 2019. This list of seismic events is filtered by the rule that if there are several earthquakes event on the same day, left the only one which is the highest magnitude and the shallowest depth. For example, on August 8 and 9, 2018, there were 5 earthquakes and 6 earthquakes respectively, and we chose one of each according to the guidelines. We chose those earthquakes which have occurred in the period from April to August annually. After processing by the martingale theory algorithm, the year-end data is vulnerable to other data during the year, and some nodes at the beginning of the year are used as the starting point for the algorithm.

Data Pre-processing
Before we start the experiment, we need to process the data first. NOAA data is stored in a file-per-day manner, which is relatively complete and does not require cleaning, but sometimes it is missing. First, the following strategies are adopted for data normalization: 1) Three days of data loss were found in the data of 10 years, they are 20080514, 20080515 and 20090509. We copied the data from the previous day to supplement the complete data. 2) About the in OLR set, the missing values are identified as "-9999". This situation is rare, usually one or a couple day, so we replaced it with the previous day's data. 3) Concerning leap years, we uniformly normalized the deletion on February 29. After the processing of the above steps, the 10-years OLR database studied is a relatively complete and regular time series with 365 data points per year. Secondly, OLR data are generally superposition of several signals: weather effect, diurnal temperature difference, artificial noises, or some underground activities. To minimize the impact of data noise, we utilized the night time dataset. The third is to grid the earthquake region. Taking the epicentre of Ms 6.5 earthquake that occurred in Jiuzhaigou area on August 8, 2017, that is 36km WSW of Yongle as the No. 5 in label 1, divided into 9 grids, which is convenient for spatial and temporal analysis. In this paper, sequence analysis is carried out with grid 5 as the core to discuss the relationship between pre-seismic anomalies of the OLR data and earthquake events.

A martingale method
The Martingale theory has been widely used in investment optimization, decision-making optimizations, and survival analysis. In [14], [24], Ho and Wechsler proposed a Martingale framework for detecting changes and found the feasibility of the martingale method for detecting changes in unlabelled data streams. In our previous work, we applied the martingale theory algorithm proposed by Ho et al. to analysed the OLR dataset, and it was effective to obtain the abnormal information from OLR data [5], [7], [27]. In probability theory, a martingale is a sequence of random variables (i.e., a stochastic process) for which, at a particular time, the conditional expectation of the next value in the sequence, given all prior values, is equal to the present value.
Basic on the martingale arithmetic, we combined the GMA/EWMA control charts to improve the results for precision. The benefit of GMA/EWMA Martingale method is that it can obtain better precision and recall than the pure Martingale method [15].
Consider an unlabelled time series example the OLR dataset, set } ,..., ,  , and if a change is found, re-initializing the calculation of the equation.

Time analysis and anomaly detection
From the previous step, the OLR data transform to a serial of CD-Value dataset, which mines valuable abnormal change information from the original OLR data. The calculating process of CD-value dataset is a Moving Window Martingale Theory (MWMA) actually, so we call it as MWMA value or CD value (Change Detected) sets. However, it is hard to get really useful information simply by analysing the earthquake of a single curve of these CD-Value sets. We tried to find the rule by studying the CD data of 10 years and using statistical methods. This paper takes grid 5 in figure 1 of the OLR data set in Jiuzhaigou region from 2009 to 2018 as the research object. The next step is to calculate the standard deviation  of this matrix. To find out a proper criterion for the anomaly definition, we check the empirical probability distribution of the 10-year MWMA value. In Figure 2, shows the contrast between the blue, red and green three curves, demonstrate the MWMA value of 2017 years in Jiuzhaigou, and mean value  and   2 + of 10 years, respectively. The conventional method to identify anomaly data from a certain data set is using the threshold. However, in Figure 2, it could be found that the energy distributions are quasi-normal or skew, and the extremely large MWMA value can seriously bias the mean and the  . Therefore, here we utilize median and interquartile range, which are more robust estimations, to define the anomaly. As a result, there are 905 anomalies if the threshold is  and 314

Experimental Results
A disturbance that cannot be recognized by the naked eye can be obtained by data mining of OLR data source. Figure 3 shows the comparison before and after data processing. In the picture above, the blue wave line is the OLR raw data; it is hard to get the regularity and abnormality. Meanwhile, the picture below is the after martingale algorithm processing, it is completely different from the original data curve. The MWMA value captured the abnormal information. The red dashed lines in Figure 3 indicated twice earthquake event in 2017, from left to right correspond to No.6 and No.5 in table 1. As the Figure 3 the MWMA value image shows, the MWMA value very small before April 10, after the April 10 earthquake, the value of MWMA fluctuated slightly, and then all the way down, until the value began to rise in July, and gradually reached the peak, and then fell to the next data climb, just happened on August 8 The magnitude 6.4 earthquake. This shows that the exception of the OLR signal was successfully acquired. is better choice relatively. Figure 4 shows the different value of  getting various. Because of the parameter  influenced the degree and amplitude of those abnormal points. Before we study a seismic region, we need to judge and choose the parameters of the area. Figure 4 shows 3 value of parameter  , they are some typical case. The red line represent the day of Ms=6.5 earthquake. If , the amount of abnormal values detected is smaller. As the parameter value increases, the magnitude of the abnormal detection begins to increase, but when it gets too large, some small earthquake trigger abnormal may be ignored. There are much other reasons may trigger the WMWA abnormal. We couldn't make an earthquake early warning as soon as the abnormal appear. Even in a non-earthquake year, there may appear a similarity abnormal curve. So, we have studied the relations of multi-years. Furthermore, to investigate the occurrence and of the MWMA anomalies of ten years, and apply the above method of identification abnormal, and compare them to other normal years. Figure 5 shows the thermodynamic diagram of the twice earthquake in Jiuzhaigou in 2017. In Figure 5, the colour indicates the magnitude of the MWMA value, the purple means the value bigger, and the cyan means smaller. The vertical is 2008-2019 years and the horizon is 30 days before the earthquake on Aug. 8, 2017, and the most right square indicates the earthquake day is the 220th day, so there are 30 squares in the coordinates per row. In Figure 5, the black arrow is referring to 2017.
The similar situation occurs on April 10, 2017. Compare to August 8, the earthquake is smaller, the purple also occurs, and have almost a pure purple square, that is the value is very big. There are 5 purple squares in Figure 5(b), one belongs to 2017, 3 belongs to 2018 years. Of course, we found in 2018 occurs the Ms=4.8 earthquake. Compared the same period with other years, purple squares are the most visible, and the quantity of the purple square is the most. And that the Ms=4.8 earthquake happened in June 2018, which was a small sign of things to come. In Figure 5, the graph on the right is the CD-Value curve corresponding to the year of the earthquake on the left, that is the year indicated by the arrow, and the red line shows the date of the earthquake, which can be compared with each other to increase the credibility of some earthquake signs.

CONCLUSIONS
During the period from April to June in 2009 to 2018, a total of 12 earthquakes of magnitude 4 or above occurred in the Jiuzhaigou area, and the Ms 6.5 earthquake occurred on August 8, 2017. Firstly, we screen the parameters of this region, and then extract the abnormal MWMA value of martingale theory algorithm from the OLR data of 10 years, and then carry out time-series analysis on this basis. The results show that the multiyear lateral comparison can help to confirm the reliability of earthquakes. We still have a lot of work to do. For example, if we first make difference another algorithm for stationary time series data of source data, we can eliminate the noise caused by seasonal temperature changes. The algorithm works well inland, so how do you tell if an earthquake is happening on an island or off the coast when the temperature is out of control. We will further study other signals such as GPS.