Some problems with old magnetic data processing

Continuous magnetic measurements at the IKIR FEB RAS observatories Magadan (MGD), Paratunka (PET), Yuzhno-Sakhalinsk (YSS), Cape Schmidt (CPS) and Khabarovsk (KHB) and at the CSIR-NGRI observatories Hyderabad (HYB) and Choutuppal (CPL) began almost as soon as the observatories were founded. A significant part of the results is available in the WDC and INTERMAGNET databases. However, a large amount of raw data remains unprocessed and unavailable to the scientific community. In the past few years, the institutes have been making efforts to process and reprocess old magnetic data. Digital images of the analog magnetograms of the Observatory Paratunka since 1967 were obtained, and the possibility of using them to calculate hourly and minute values of magnetic field elements was evaluated. Old digital data recorded during the transition from analog to digital magnetometers are being processed. The main problem in processing or reprocessing archived data is the lack of information (metadata) about the measurement conditions. First of all, this concerns the results of absolute observations, which are necessary to obtain the values of the elements of the total field vector. In this paper, some techniques are proposed that make it possible to use the data obtained by processing analog magnetograms to adjust the records of digital magnetometers. Another significant problem is missing or inaccurate information about the temperature conditions in the variation pavilion, about maintenance of magnetometers or support equipment, and about work in and near the pavilions. As experience in processing old magnetic data accumulates, a "catalog" of noise and its typical patterns is being formed. This makes identifying and removing such noise from the records more reliable and efficient.


Introduction
Working with archives is one of the important tasks of magnetic observatories that perform long-term continuous observations. First of all, it means preserving old data, both digital and paper. This is partially solved by transferring the standard processed data to large storage centers such as the World Data Centers (WDC). The attention of the scientific community to the rescue of raw information is reflected in resolutions of international organizations, for example, Resolution No. 9: Preservation of historical materials (IDC History), adopted at the 8th IAGA Scientific Assembly, Uppsala, August 1997 (http://www.iaga-aiga.org/index.php?id=res9-97), or Resolution No. 2 (2005) Data rescue (10th IAGA Scientific Assembly, Toulouse, July 2005, http://www.iaga-aiga.org/index.php?id=res2-2005), and in ongoing projects. For example, under the VarSITI grant (2014), digital images of magnetograms from the Observatory Paratunka for 1967-2006 were obtained; they are available through the WDC system (Moscow, http://www.wdcb.ru/stp/geomag/magnetogr_list.en.html, and Kyoto) [1].
The second task of working with archives is the revision of previously published data. Modern methods and computer technologies make it possible to detect problems in old data series, including gaps of various origins, and to reprocess these series using new mathematical and methodological capabilities [2].
The third task is to analyze and process raw data available at the Observatory that were not previously used. This situation is common for periods when an observatory is switching to new observational methods or when equipment is being upgraded. For example, magnetic observatories very often operated analog recording systems (quartz sensors with registration on photo paper) and digital magnetometers simultaneously. The results of the stable and well-established analog measurements were used to calculate the standard hourly values of magnetic field elements, while the digital data were mainly used in research tasks. The use of digital measurements to obtain standard series was restricted by the limited computational capabilities of the observatories, the relative instability and unreliability of the new digital systems, etc.
This paper discusses some aspects of the results of magnetic monitoring available in the archives of the observatories Magadan, Paratunka, Cape Schmidt and Khabarovsk of IKIR FEB RAS and Hyderabad and Choutuppal of CSIR-NGRI. The features and problems of using these data are analyzed in more detail for the Observatory Paratunka.

State of the archival data of the magnetic observatories of IKIR FEB RAS and CSIR-NGRI
The geophysical observatories Cape Schmidt, Magadan, Paratunka and Khabarovsk trace their history to the resolutions of the USSR government to organize a network of joint magnetic-ionospheric stations (KMIS), adopted as a result of the IGY in 1957-1958 [3][4][5]. The Observatory Yuzhno-Sakhalinsk has a longer history, which begins before the Second World War. Table 1 provides general information about these observatories, and Figure 1 shows the data status graphically. A more detailed description of the modern equipment of these observatories is presented in [6,7]. Due to various problems during perestroika in the 1990s, many magnetic observatories in the former USSR, including the observatories of IKIR FEB RAS, significantly reduced their activity. As a result, the sending of standard hourly data to the WDC effectively stopped. However, analog measurements and the processing of their results mostly continued, and series of hourly data at various levels of readiness accumulated in the internal archives of the observatories. Since the beginning of the new century, the observatories of IKIR have been gradually modernized: analog magnetometers were replaced by digital ones, partly in cooperation with institutes in Germany and Japan [8]. These digital data were also accumulated in the archives of the observatories. Initially, HYB was equipped with a La Cour variometer (analog) and a VPPM for absolute determinations of H and Z, with QHM and BMZ as secondary absolute instruments, while CPL was equipped with a tri-axial fluxgate variometer (digital) [7].
Thus, in the period between analog (IAGA standard) and digital (INTERMAGNET standard) measurements at the observatories of IKIR FEB RAS and CSIR-NGRI, an interval [...]
Notes to Table 1: "IAGA" is the IAGA observatory code; "GLat", "GLon" and "Alt" are the geographical latitude, longitude and altitude; "Hourly" and "Yearmean" indicate the availability of hourly and annual data in the WDC for Geomagnetism.

Processing of archival analog magnetic data of Observatory Paratunka (PET)
Preparation of hourly data from analog magnetograms includes two stages: checking the previously calculated hourly values of the total field components H, D and Z available in the observatory's databases, and filling the gaps using the magnetogram images. The checking included (a) comparison of the magnetic field variations at PET with the variations (hourly data) at nearby observatories, mainly Memambetsu (MMB), (b) estimation of the first differences, and (c) comparison of doubtful values directly with the original analog magnetograms. Where possible, the available digital records of variations (second or minute values) were also used, even if they were not of very high quality. An example of such a comparison is shown in Figure 2: the spike in the PET data is clearly visible against the smooth field variations at MMB and MGD. The spike has a significant amplitude and is suspicious in itself, and the suspicion is easily verified by comparison with the original magnetogram. However, for mass checking and for minor spikes or jumps, continuous viewing of magnetograms becomes very expensive. The main weakness of such a comparison is that it controls slow field variations, with periods of several hours or more, poorly. Therefore, effects such as temperature changes in the pavilions, erroneous baseline values or inaccurate scale values are not controlled. There are also significant problems with estimating the reliability of magnetic variations during storms, especially if records from storm-set magnetograms are used.
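Checks (a) and (b) can be combined in a simple screening step. The following is a minimal sketch, not the procedure actually used at the Observatory: the function names, the 5 nT threshold and the rule "a large step in followed by a large step out of opposite sign" are assumptions made for illustration. It compares the first differences of the station series with those of a reference observatory and flags isolated single-value spikes; it does not detect jumps or slow drifts, and it assumes series without gaps.

```python
def first_differences(values):
    """Return the differences between consecutive hourly values."""
    return [b - a for a, b in zip(values, values[1:])]


def flag_spikes(station, reference, threshold_nt=5.0):
    """Return indices of station values that look like isolated spikes.

    A value is flagged when the step into it and the step out of it,
    both taken relative to the reference observatory, exceed the
    threshold and have opposite signs. The threshold is a hypothetical
    value and would have to be tuned for each component and station pair.
    """
    if len(station) != len(reference):
        raise ValueError("series must have the same length")
    # Subtracting the reference differences removes the common natural variation.
    residual = [ds - dr for ds, dr in zip(first_differences(station),
                                          first_differences(reference))]
    suspects = []
    for i in range(1, len(station) - 1):
        step_in, step_out = residual[i - 1], residual[i]
        if (abs(step_in) > threshold_nt and abs(step_out) > threshold_nt
                and step_in * step_out < 0):
            suspects.append(i)
    return suspects


# Invented hourly values (nT): the fourth station value is a spike.
pet_hourly = [100.0, 101.0, 102.0, 130.0, 104.0, 105.0]
mmb_hourly = [50.0, 51.0, 52.0, 53.0, 54.0, 55.0]
suspect_hours = flag_spikes(pet_hourly, mmb_hourly)
```

Values flagged in this way would still have to be compared with the original magnetogram, as in check (c), before being corrected or removed.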
Large gaps (several hourly values or more) were filled using the original magnetograms (their digital images in TIF or JPG format) [2], provided that an analog recording of field variations existed for the time interval: 1) using the special software WFD [10], the H, D and Z tracks of Bobrov's quartz sensors, their fixed lines (baselines), the hourly markers and the scale bar were marked on the magnetogram image, and the ordinates dH, dD, dZ in mm were then calculated for each minute; 2) using the available scale coefficients and the accepted base values, the total values of the field elements H, D, Z were calculated for each minute. Where possible, temperature corrections were applied to the H and Z components; 3) the hourly mean values were then calculated from the minute data.
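Steps 2 and 3 above amount to a linear conversion followed by averaging. The sketch below illustrates this under stated assumptions: the function names, the sign convention of the temperature term and the requirement of at least 54 valid minutes per hour are hypothetical and are not taken from the WFD software or from the Observatory's procedures.

```python
def ordinate_to_field(ordinate_mm, base_value, scale_per_mm,
                      temp_coeff=0.0, temp_c=None, ref_temp_c=None):
    """Convert a magnetogram ordinate (mm) to a full field value.

    value = base value + scale coefficient * ordinate, optionally minus
    temp_coeff * (T - T_ref). The sign convention of the temperature
    term is an assumption of this sketch.
    """
    value = base_value + scale_per_mm * ordinate_mm
    if temp_c is not None and ref_temp_c is not None:
        value -= temp_coeff * (temp_c - ref_temp_c)
    return value


def hourly_means(minute_values, min_count=54):
    """Average consecutive blocks of 60 minute values.

    Missing minutes are given as None. An hour with fewer than
    min_count valid minutes yields None (the default of 54 is an
    assumed completeness requirement).
    """
    means = []
    for start in range(0, len(minute_values) - 59, 60):
        block = [v for v in minute_values[start:start + 60] if v is not None]
        means.append(sum(block) / len(block) if len(block) >= min_count else None)
    return means


# Invented numbers: a 10 mm ordinate, base value 21000 nT, scale 2.5 nT/mm.
h_minute = ordinate_to_field(10.0, 21000.0, 2.5)
```

For declination the same conversion applies with the base value and scale coefficient expressed in angular units.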
Tracks were usually digitized not only for the intervals with missing data but also for adjacent fragments of the record on the magnetogram (usually 12 hours on either side). These overlapping sections were used to fit the newly digitized data to the original ones, since the lack of accurate baseline values and measured temperature values could cause jumps at the boundaries. As an example, Figure 3 shows the calculated series for January 29-31, 1997 (full processing was done for January 1-31) and its comparison with data from the Observatory MMB. An image of the magnetogram used is shown in Figure 4.
Figure 3. Green curve: minute data obtained by digitization of the analog magnetograms; red curve: hourly data calculated from the minute values; blue curve: hourly values obtained by manual processing of magnetograms (from WDC); black curve: hourly data of the observatory MMB (from WDC, for comparison) [9].
The main problems with this processing are due to the poor quality of the original magnetograms (yellowing over time due to improper processing of the photo paper, exposure to light while the magnetologist replaced and processed the photo paper, weak tracks and fixed lines, etc.), as well as very complex tracks (intersecting and returning) during magnetic disturbances. There are also personal systematic errors of the magnetologists who digitize the magnetograms, reaching several nT during periods of magnetic storms.
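The fitting of newly digitized data to the original series over the overlapping sections can be illustrated as follows. The text does not specify how the fit was done, so this sketch assumes the simplest case, a constant offset estimated as the median difference over the hours where both series exist; the function name and the data are hypothetical.

```python
from statistics import median


def fit_to_overlap(new_values, old_values):
    """Shift newly digitized hourly values onto the level of the old series.

    Both lists cover the same hours; None marks a missing value (the gap
    in the old series). The offset is the median of (old - new) over the
    overlapping hours. A constant offset is an assumption of this sketch.
    """
    diffs = [o - n for n, o in zip(new_values, old_values)
             if n is not None and o is not None]
    if not diffs:
        raise ValueError("the series do not overlap")
    offset = median(diffs)
    adjusted = [None if n is None else n + offset for n in new_values]
    return offset, adjusted


# Invented values (nT): the old series has a two-hour gap.
digitized = [10.0, 11.0, 12.0, 13.0, 14.0]
original = [13.0, 14.0, None, None, 17.0]
offset_nt, filled = fit_to_overlap(digitized, original)
```

If the differences on the two sides of the gap disagree noticeably, a constant offset is not adequate, and the baseline or temperature data for the interval would need to be re-examined.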

Processing of digital magnetic data (standard magnetometers)
Since 2004, the fluxgate declinometer-inclinometer (DIflux) LEMI-203, obtained during the project CRENEGON (INTAS), has been used at the Observatory Paratunka for absolute measurements of D and I [11]. The total field F was measured using the proton magnetometer MMP-203M.
The magnetic variations were measured using the digital quartz station "Quartz-9", based on Bobrov quartz sensors, and the fluxgate variometer FRG-601; the specifications are presented, for example, in the review [12]. These magnetometers were almost fully compliant with INTERMAGNET standards. Since 2009, a set of magnetometers for continuous measurements, installed at the Observatory under an agreement with GFZ (Germany), has been used. The set included the FGE-DTU fluxgate variometer and the GSM-90 GEM Overhauser magnetometer (supplemented by the Magdalog recorder), which fully met the requirements for an INTERMAGNET observatory.
The retrospective processing of digital data from 2012 back to 2005 does not differ methodically from the processing of current magnetic measurements at the Observatory, and standard software was used to a large extent. As a result, by 2019, minute data corresponding to the INTERMAGNET Definitive status were prepared, and the data sets for 2006-2012 passed [...] The main difficulties during processing were related to problems in the results of absolute observations: (a) some of the observation protocols and resulting files contained errors that were almost impossible to interpret and/or correct, (b) some of the protocols were missing, (c) absolute observations were not always performed methodically correctly and uniformly, (d) absolute magnetometers had technical problems that were not fixed promptly. As a result, some of the results of absolute observations were not processed, and some of the calculated baseline values were not accepted. This led to a noticeable decrease in the reliability of the adopted baseline values. This situation is not unique and is currently quite common at many INTERMAGNET observatories: the results of absolute observations are processed only preliminarily or not at all, and minute data are promptly sent to the GIN only in the Reported status (variations). The negative consequences of this approach become visible at the stage of preparing the final annual data (Definitive).
Another significant problem is the lack of information about the conditions under which magnetic measurements, both variational and absolute, were performed. This created difficulties in interpreting the observed anomalies in field variations or in baseline values, since it was impossible to establish whether an anomaly had a natural or a man-made cause, such as changes in equipment parameters, work in or near the pavilions, changes in thermal conditions, etc. As a result, some of the noise may have been missed during processing, and some useful signal may have been deleted.
As an example, Figure 5 shows part of a record obtained on January 31, 2006 at PET using a fluxgate magnetometer with a sampling rate of 1 Hz. The left panel shows that around 02:00-02:15, two anomalies with an amplitude of up to 4 nT are observed in all components. A similar anomaly is observed in the quartz variometer data (see curve dZ (2)). There is no information in the Observatory's diary about a possible cause of this anomaly. The type of anomaly indicates that it may have been caused by a vehicle passing there and back, such as a snowmobile or a tractor clearing the road to the pavilions. At the same time, the amplitude and form of the anomaly do not exclude a natural origin. In the minute data of the nearest observatory operating in 2006, Memambetsu (MMB), such an anomaly is not visible (left panel of Figure 5), although variations with similar characteristic times and amplitudes at PET and MMB correlate well (right panel). Consideration of all these features, mostly indirect rather than direct, allowed the magnetologist to decide that the anomaly near 02 UT was of artificial origin and to exclude it from further processing and from the calculation of the Definitive minute data.

Processing of digital magnetic data (non-standard magnetometers)
The situation with magnetic data up to 2004-2005 differs significantly from that described in the previous section: 1) absolute measurements were performed with magnetometers that do not fully meet modern standards (quartz horizontal magnetometer and declinometer, analog proton magnetometer); 2) the archives with the raw results of absolute observations are in very poor condition and practically cannot be used; 3) only one digital magnetometer (fluxgate variometer FRG-601) operated at the Observatory, so there was no backup device whose data could be used to check the records of the main magnetometer or to fill gaps; 4) there are no general archives with information about the conditions during magnetic measurements.
E3S Web of Conferences 196, 02029 (2020) https://doi.org/10.1051/e3sconf/202019602029 STRPEP 2020
Thus, in fact, there are no observed absolute values of the magnetic field, and it is almost impossible to obtain baseline values for the digital variometer. At the same time, the digital data on variations are generally acceptable and close to the INTERMAGNET standard. In addition, the internal archives of the Observatory PET contain sets of hourly values of H, D and Z, which were obtained by processing analog magnetograms and the results of absolute observations using standard IAGA methods and are close to absolute in their meaning.
Therefore, it is proposed to use these hourly values to adjust the digital variations in order to avoid possible instability due to instrumental, environmental and other causes. The main question in such an approach is to what extent the hourly data obtained from magnetograms can be considered "absolute", that is, how correctly the full standard processing was performed (calibration, consideration of temperature effects, absolute observations and calculation of baselines, manual processing of magnetograms, etc.).
Next, we will make some estimates based on the results of measurements at the Observatory PET in 2001. The original magnetograms (on photo paper, the main and storm sets) are stored in the Observatory's archive. Digital images of these magnetograms with accompanying information were obtained in 2014 as part of a VarSITI grant. An example of a magnetogram is shown in Figure 4. In 2016, the PET hourly data for 2001 were checked by comparison with the hourly data of MMB. The main purpose of the checking was error detection. If errors or gaps were found in hourly values files, data fragments were replaced or filled with data obtained by digitizing of the existing images [2]. The final hourly data is shown in Figure 6.
The instrumental parameters of Bobrov's photo series (sensitivity values EH, ED and EZ) were usually determined monthly or after work on the sensors, for example, after adjusting the position of the light spots. Absolute observations were performed more frequently, and monthly mean values derived from their results were used in calculating the total field components. Figure 7 shows the values of the instrumental parameters (sensitivity and baseline values) for 2000-2002, taken from the general summary table that accompanies the database of magnetogram images [13]. As can be seen from Figure 7, as well as directly on the magnetograms, the quartz sensors were probably adjusted at 00 UT on 01.08.2001, which led to a shift in the position of the light spot on the photo paper (a change in the baseline values) and to a change in the sensitivity of the Z record.
One important issue is the effect of temperature on the measurements. Bobrov quartz sensors are stable with respect to temperature variations, and accounting for their temperature coefficients is quite simple; this is one of their advantages. The latest values of the temperature coefficients available in the archives date back to 1990: -0.4 nT/°C is accepted for the H sensor and zero for the D and Z sensors. Because the sensors were not changed in the following years, these coefficients can be taken as still valid. A certain difficulty is the measurement of the temperature variations. In accordance with the method accepted at the Observatory, the readings of the alcohol thermometer in the internal thermally insulated chamber were taken once a day, when the photo paper was replaced (at 21 UT in 2001). If the temperature stability in the variation hut is good, a single daily measurement is sufficient for adequate temperature corrections. Figure 8 shows the daily (spot) temperature values for 2001.
As can be seen from Figure 8, in extreme cases during winter, the temperature inside the variation pavilion could change by several degrees per day. Usually, such rapid changes were caused by the planned switching on or off of additional heaters or by the cooling or heating of the pavilion during long power outages. Since the temperature changes are quite smooth owing to the relative thermal insulation of the pavilion, even a simple linear interpolation of the temperature values between days made it possible to account for a significant part of these rapid changes. Rough estimates show that errors of about 1-2 °C can be expected in the temperature data.
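The linear interpolation of once-a-day temperature readings and the resulting correction can be sketched as follows. This is a minimal illustration, not the Observatory's software: the function names and the sign convention (recorded value = true value + coefficient × (T - T_ref)) are assumptions, and the default coefficient of -0.4 nT/°C is the archived value for the H sensor quoted above.

```python
def interpolate_daily_to_hourly(daily_temps_c):
    """Linearly interpolate once-a-day readings to hourly values.

    The output starts at the hour of the first reading and ends at the
    hour of the last one, so n readings give 24 * (n - 1) + 1 values.
    """
    hourly = []
    for day in range(len(daily_temps_c) - 1):
        t0, t1 = daily_temps_c[day], daily_temps_c[day + 1]
        for hour in range(24):
            hourly.append(t0 + (t1 - t0) * hour / 24)
    hourly.append(daily_temps_c[-1])
    return hourly


def temperature_correction_nt(temp_c, ref_temp_c, coeff_nt_per_c=-0.4):
    """Correction (nT) to add to a recorded value.

    Assumes recorded = true + coeff * (T - T_ref); this sign convention
    is an assumption of the sketch.
    """
    return -coeff_nt_per_c * (temp_c - ref_temp_c)


# Invented spot readings (deg C) on three consecutive days.
hourly_temps = interpolate_daily_to_hourly([10.0, 12.0, 6.0])
```

Under these assumptions, a temperature error of 2 °C corresponds to an error of 0.4 × 2 = 0.8 nT in the corrected H value.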
A more accurate assessment of the effectiveness of such spot temperature measurements can be made for 2005. In 2005, daily temperature readings were taken manually when the photo paper was changed, but digital measurements were also performed using a thermistor, with the results recorded on a computer. Figure 9 shows, for January-March 2005, the daily spot measurements with an alcohol thermometer, their interpolation to minute values by a smoothing spline, and the digital minute values of the temperature measured by the thermistor. As can be seen from Figure 9, the interpolated temperature values differ from the digital values by 2-3 °C, so they are close to the estimates made above, with the exception of some cases. Thus, we can assume that the errors of the temperature corrections in the data of the quartz sensor Z will be about 1-2 nT, and possibly up to 5 nT in extreme cases. Temperature corrections for the sensors H and D are assumed to be zero. Unfortunately, it is currently impossible to evaluate the correctness of the actual methods, algorithms and calculations used. It can only be assumed that in the years considered here, the processing of analog magnetograms at the Observatory PET continued to be performed in accordance with the IAGA requirements for magnetic hourly data. Errors in individual values are possible and unavoidable; they usually result from errors in reading the ordinates of tracks from magnetograms, especially during disturbances or storms, and from errors in manual recording on forms and in typing the data into a computer. However, these unreliable values can be removed in the same way as incorrect individual results of modern absolute observations. We can assume that the final inaccuracy of the "absolute" hourly values from analog magnetograms lies within a few nT. This makes it possible to use them by analogy with the results of standard absolute observations.
Figure 9. Comparison of spot daily temperature readings using the alcohol thermometer with digital minute values of the temperature. 10-minute mean values of the outdoor temperature from the WS2000 meteorological station are also presented.
The following steps were performed: 1) the original raw (second) data of the FRG variometer for 2001 were reviewed, and measurements affected by noise or made during equipment failures were deleted; 2) the minute values of the variations dHm, dDm, dZm were calculated from the second data using the weight coefficients recommended by INTERMAGNET (Gaussian filter); 3) the hourly mean values of the variations dHh, dDh, dZh, centered on the middle of the hour, were calculated from the minute data, so that they have practically no time shift relative to the hourly values Hb, Db, Zb obtained manually, using a pallet, from the analog magnetograms (Bobrov sensors); 4) the hourly "baseline" values H0h, D0h, Z0h for the variometer were calculated as the difference between the "absolute" values Hb, Db, Zb and the variations dHh, dDh, dZh. When calculating H0, the effect of dDh was taken into account. The "baseline" values that fell outside the general range were removed; 5) the hourly values H0h, D0h, Z0h have the meaning of "observed" baseline values. Using a smoothing spline, the minute "adopted" baseline values H0m, D0m, Z0m were obtained, and the total minute values of the components Hm, Dm, Zm were calculated. When calculating Hm, the effect of dDm was taken into account.
Figure 10 shows a fragment of the results of the calculations in steps 3-5 for the vertical component for December 2001. It can be seen that the baseline values Z0 vary strongly, with a range of up to 10-30 nT, and that this variation correlates well with the temperature in the pavilion. According to measurements in recent years, the magnetometer FRG is known to have a temperature dependence. However, this dependence is unstable, so it can be correctly taken into account only for some parts of the records. Estimates of the temperature coefficient of the Z channel in different years are in the range of 1.6-2.0 nT/°C.
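Steps 4 and 5 above can be sketched for a single component as follows. This is a simplified illustration under stated assumptions, not the processing actually performed: the function names are hypothetical, outliers are rejected by a fixed deviation from the median (30 nT is an invented tolerance), the hourly baselines are placed at minute 30 of each hour, piecewise-linear interpolation is swapped in for the smoothing spline used in the paper, and the effect of dD on H is omitted.

```python
from statistics import median


def observed_baselines(absolute_hourly, variation_hourly, max_dev_nt=30.0):
    """Hourly 'observed' baselines: absolute value minus variometer reading.

    None marks a missing value. Baselines deviating from the median by
    more than max_dev_nt (a hypothetical tolerance) are rejected.
    """
    raw = [None if a is None or v is None else a - v
           for a, v in zip(absolute_hourly, variation_hourly)]
    centre = median([b for b in raw if b is not None])
    return [b if b is not None and abs(b - centre) <= max_dev_nt else None
            for b in raw]


def adopted_baseline_minutes(hourly_baselines):
    """Minute 'adopted' baselines by linear interpolation between hours.

    Linear interpolation stands in for the smoothing spline; values
    before the first and after the last valid hour are held constant.
    """
    knots = [(60 * i + 30, b) for i, b in enumerate(hourly_baselines)
             if b is not None]
    if len(knots) < 2:
        raise ValueError("at least two valid hourly baselines are required")
    minutes = []
    k = 0
    for m in range(60 * len(hourly_baselines)):
        # Advance to the pair of knots that brackets minute m.
        while k < len(knots) - 2 and m > knots[k + 1][0]:
            k += 1
        (m0, b0), (m1, b1) = knots[k], knots[k + 1]
        if m <= knots[0][0]:
            minutes.append(knots[0][1])
        elif m >= knots[-1][0]:
            minutes.append(knots[-1][1])
        else:
            minutes.append(b0 + (b1 - b0) * (m - m0) / (m1 - m0))
    return minutes


def total_minute_values(baseline_minutes, variation_minutes):
    """Full field minute values: adopted baseline plus variation."""
    return [None if v is None else b + v
            for b, v in zip(baseline_minutes, variation_minutes)]
```

Replacing the linear interpolation with a smoothing spline, as in the text, would suppress the hour-to-hour scatter of the "observed" baselines; the degree of smoothing is a choice that the text does not specify.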
Figure 10 shows that the variation of Z0 during December 6-17, with an amplitude of about 20 nT, can be explained by the cooling of the pavilion by 10 °C. A similar picture is observed for the component H; the temperature coefficient of H is estimated as 0.5 nT/°C. The results of processing by the method described above were prepared as files in the INTERMAGNET Definitive format. The absence of standard absolute observations under unstable temperature conditions in the variation pavilion and the significant intrinsic changes of the baseline values of the magnetometer FRG do not allow us to consider the results obtained sufficiently reliable. Unfortunately, independent control of the obtained minute data cannot be performed, because the Observatory had no second digital variometer or scalar magnetometer with continuous recording in 2001. However, the obtained data can be accepted by INTERMAGNET in the Adjusted or, possibly, Quasi-definitive status. In addition, the obtained hourly data can be considered compliant with IAGA standards and can be published via the system of World Data Centers.

Conclusions
The rescue, maintenance, ordering and publication of archives is one of the most important tasks of science, including observatories as organizations that continuously accumulate experimental data. The work with the archives of magnetic data available at the observatories of IKIR FEB RAS and CSIR-NGRI has shown that a number of significant problems prevent effective use of these archived data. First of all, these are the partial or complete loss of the raw results of measurements and the lack of metadata, i.e. information about the conditions under which the measurements were performed.
Nevertheless, work with the archived magnetic data of the IKIR and CSIR-NGRI observatories is underway, including activities to rescue the existing and newly received data and to reprocess raw data in order to obtain new information and publish it. The old hourly data are being revised using analog magnetograms, and the existing digital series are being processed to prepare minute data in accordance with the INTERMAGNET standard.
In our opinion, one of the important conclusions is that the data we receive now will become archived after a while, but they will not lose their relevance. Therefore, our task is to make sure that, many years from now, the users of these data do not face the same problems that we experience when working with old archives today.