Forecast of the trend of heavy fog based on the VARIMAX model

This paper investigates the relationship between visibility and near-surface meteorological factors. The formation of heavy fog is affected both by near-surface meteorological factors and by the fog of the preceding period, so we simplify the problem to a time series problem. First, the airport AWOS observation data are reprocessed, and missing and incorrect data are supplemented and corrected. Then, distribution maps of "visibility versus near-surface meteorological factors" are drawn to give an intuitive picture of their correlation. Finally, a classic VARIMAX model is built to fit the mapping between visibility and near-surface meteorological factors. The results show that temperature has the greatest impact on the visibility index and is positively correlated with it, followed by the dew point temperature index, which is negatively correlated with it. This indicates that when the temperature is low and the humidity high, water vapor in the atmosphere is more likely to condense into fog that does not easily dissipate, reducing visibility. The indicators related to air pressure and wind speed are positively correlated with visibility, indicating that increases in air pressure and wind speed promote the dissipation of heavy fog. In general, the MOR index fits the near-surface meteorological factors better.


Introduction
According to the annual road traffic accident statistics report released by the Traffic Management Bureau of the Ministry of Public Security, nearly 30% of traffic accidents in China occur in adverse weather such as snow and fog [1]. Further statistics show that accidents at night under poor visibility account for only 10% of all accidents but for 47% of deaths [2]. With the development of the transportation industry, aviation has become one of the major modes of transportation. Despite the maturity of aviation technology, flights are still constrained by severe weather, and statistics from the Civil Aviation Administration show that the number of flights affected by weather is increasing. Among these factors, low visibility caused by fog has become an important constraint on flights and largely determines whether an airport can operate normally. Poor visibility caused by bad weather directly prevents drivers from obtaining sufficient information, prolongs their reaction time and increases the psychological burden of driving. Low visibility can also lead drivers to misjudge speed and distance: without enough reference objects, it is difficult to distinguish different moving objects accurately. Visibility prediction has therefore long been a topic of great concern to highway management departments and airlines.
Improving visibility prediction is therefore an important means of reducing accidents in severe weather. One way to measure visibility is with a laser visibility meter. The xenon flash lamp in its transmitter emits visible light, which passes through the atmosphere to the receiver. Comparing the light energy received with the light energy emitted gives the attenuation of visible light, and the background luminance is measured at the same time. These data are encoded and processed by the meter's internal data processing unit, and the runway visual range and meteorological visibility are calculated with the Koschmieder formula and the Allard formula. However, current instrument-based methods of detecting visibility have major limitations. Covering the highway network with a large number of laser visibility meters would be very expensive and hardly economical. In addition, laser visibility meters suffer from low fog detection efficiency, a small detection range and high maintenance costs.
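As a rough illustration of the Koschmieder step, the sketch below converts a transmissometer reading into an extinction coefficient (Beer-Lambert law) and then into a meteorological optical range. It assumes a single baseline length and the 5% luminous-flux threshold of the MOR definition; the baseline and transmittance values are hypothetical, not readings from the airport in this study, and the function names are illustrative only.

```python
import math

MOR_THRESHOLD = 0.05  # fraction of luminous flux remaining at the MOR (5%)


def extinction_coefficient(transmittance: float, baseline_m: float) -> float:
    """Beer-Lambert: T = exp(-sigma * b), so sigma = -ln(T) / b (units: 1/m)."""
    if not 0.0 < transmittance < 1.0:
        raise ValueError("transmittance must lie strictly between 0 and 1")
    return -math.log(transmittance) / baseline_m


def mor_from_extinction(sigma: float) -> float:
    """Koschmieder-type relation for MOR: ln(1 / 0.05) / sigma, roughly 3 / sigma."""
    return math.log(1.0 / MOR_THRESHOLD) / sigma


# Hypothetical reading: 60% of the light survives a 75 m baseline.
sigma = extinction_coefficient(0.60, 75.0)
mor = mor_from_extinction(sigma)
print(f"sigma = {sigma:.5f} 1/m, MOR = {mor:.0f} m")
```

Because the 5% threshold is fixed, MOR depends only on the extinction coefficient; a lower transmittance over the same baseline gives a larger sigma and a proportionally shorter MOR.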
In response, many researchers have in recent years proposed video-based runway visibility detection as an alternative to the laser visibility meter. This approach overcomes some of the meter's shortcomings. However, existing video-based methods all estimate visibility indirectly, so their accuracy is limited and there is still much room for improvement.

The relationship between MOR, RVR and visibility
By definition, the Meteorological Optical Range (MOR) is the length of path in the atmosphere over which a parallel beam of light from an incandescent lamp at a color temperature of 2700 K is attenuated, by absorption and scattering, to 5% of its original luminous flux. When the MOR is below 2000 m, it reflects the actual visibility well. Runway visual range (RVR) is the distance over which the pilot of an aircraft on the runway centerline can see the runway markings, the runway edge lights or the centerline lights. RVR is estimated from three factors: the extinction coefficient, the runway light intensity and the illuminance threshold [3]. For airlines, RVR is the main factor affecting aircraft operations, but MOR is closer to the visibility observed by humans. The International Civil Aviation Organization recognizes that the MOR value can be used as visibility when observations are unattended [4]. Some scholars also consider that, when the instrument is calibrated and operating normally, visibility can be replaced by the MOR value at night. This article uses the AWOS observation data of an airport in China on March 13, 2020 and December 16, 2019. The data include the air pressure (HPA), the air pressure at the highest point of the aircraft touchdown zone (QFE06), the sea-level-corrected air pressure (QNH), the temperature (TEMP), the relative humidity (RH) and the dew point temperature (DEWPOINT). They also include fog-related near-surface meteorological factors: visibility under the two definitions, RVR and MOR, together with wind direction, wind speed and vertical wind speed.
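The light-based part of an RVR estimate is commonly computed with Allard's law, E = I exp(-sigma R) / R^2, which relates the illuminance E reaching the pilot to the light intensity I, the extinction coefficient sigma and the distance R. The sketch below solves that equation for R by bisection, since E decreases monotonically with R. The light intensity, extinction coefficient and threshold are hypothetical values, and a full RVR assessment (for example, one also considering the contrast of runway markings and a background-luminance-dependent threshold) is not modelled.

```python
import math


def allard_illuminance(intensity_cd: float, sigma: float, distance_m: float) -> float:
    """Allard's law: illuminance E = I * exp(-sigma * R) / R**2 at distance R."""
    return intensity_cd * math.exp(-sigma * distance_m) / distance_m**2


def light_range(intensity_cd, sigma, threshold_lux, lo=1.0, hi=10_000.0, tol=1e-6):
    """Distance R at which a light's illuminance falls to the threshold E_t.

    E(R) decreases monotonically in R, so bisection on [lo, hi] suffices.
    """
    def excess(r):
        return allard_illuminance(intensity_cd, sigma, r) - threshold_lux

    if excess(hi) > 0:  # light still above threshold at the upper bound
        return hi
    if excess(lo) < 0:  # light below threshold even at the lower bound
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: 10,000 cd light, sigma = 0.01 1/m, E_t = 1e-4 lux.
r = light_range(10_000.0, 0.01, 1e-4)
print(f"light-based visual range ~ {r:.0f} m")
```

The extinction coefficient here is the same quantity that fixes the MOR, which is why the two visibility definitions are closely related but not identical: RVR additionally depends on the light intensity and the pilot's illuminance threshold.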
However, data stored in this form have problems such as missing values and differing sampling frequencies, so the raw data must be preprocessed. In this paper, factors recorded at different frequencies in the raw data are stored separately and then brought onto a unified sampling time, so that the processed data have a uniform frequency and no missing values. This article aims to establish a specific relationship between visibility and the near-surface meteorological indices related to fog. Scatter diagrams of the ground meteorological indices and the visibility indices observed by the airport AWOS system are therefore drawn to determine how each meteorological factor relates to visibility, and, as far as possible, indices closely related to the formation and dissipation of heavy fog are chosen.
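The paper does not state how the sampling times were unified. One plausible approach, shown here only as a sketch, is to drop missing samples and linearly interpolate each factor onto a common 1-minute time base; the function name, grid spacing and sample values are hypothetical.

```python
import numpy as np


def to_common_grid(times_s, values, grid_s):
    """Linearly interpolate one factor (sampled at times_s) onto grid_s.

    Missing samples (NaN) are dropped before interpolating. Grid points outside
    the observed time span are left as NaN rather than extrapolated.
    """
    times_s = np.asarray(times_s, dtype=float)
    values = np.asarray(values, dtype=float)
    ok = ~np.isnan(values)
    out = np.interp(grid_s, times_s[ok], values[ok])
    out[(grid_s < times_s[ok][0]) | (grid_s > times_s[ok][-1])] = np.nan
    return out


# Hypothetical samples: temperature every 15 s (one value missing), MOR every 60 s.
grid = np.arange(0, 181, 60, dtype=float)  # common 1-minute time base (seconds)
temp_t = np.arange(0, 181, 15, dtype=float)
temp = np.linspace(5.0, 6.2, temp_t.size)
temp[3] = np.nan
mor_t = np.array([0, 60, 120, 180], dtype=float)
mor = np.array([800.0, 750.0, 700.0, 650.0])

aligned = np.column_stack([to_common_grid(temp_t, temp, grid),
                           to_common_grid(mor_t, mor, grid)])
print(aligned)
```

Linear interpolation is a simple choice for smoothly varying quantities such as temperature and pressure; for wind direction, which wraps around at 360 degrees, interpolating the east and north wind components separately would be safer.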
This paper uses the visibility defined by MOR and separately draws the distribution maps of MOR visibility against each near-surface meteorological factor. The scatter plots show directly that the three pressure measures HPA, QFE06 and QNH have essentially the same relationship with MOR. Unlike the RVR case, however, the scatter plots of MOR against these pressure values clearly show a "wide at the top, narrow at the bottom" shape. This shows that when low visibility occurs, the pressure values fall within a very narrow range, indicating a clear relationship between the two.
Temperature and MOR show an obvious positive correlation. The temperature-MOR distribution shows that the lowest visibility values are concentrated at temperatures below 10 °C, and visibility increases as temperature rises. Relative humidity and MOR show a significant negative correlation. For low visibility, the scatter plots again show that the three pressure values (HPA, QFE06 and QNH) have essentially the same relationship with visibility, but the correlation between visibility and air pressure is weak overall: only when the pressure reaches 1020 hPa or more do some time points show extremely low visibility. Air pressure within its usual range therefore has no significant impact on visibility.
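The paper reads the direction of each relationship from scatter plots. As a sketch of how such a visual impression could be quantified, the example below computes Pearson correlation coefficients; the arrays are hypothetical illustrative values, not the airport's AWOS data.

```python
import numpy as np

# Hypothetical, illustrative values only (not the airport's AWOS data).
mor = np.array([300.0, 500.0, 800.0, 1200.0, 1500.0, 1900.0])  # m
temp = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])              # deg C
rh = np.array([98.0, 96.0, 93.0, 90.0, 85.0, 80.0])            # %


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1])


r_temp = pearson_r(temp, mor)
r_rh = pearson_r(rh, mor)
print(f"r(TEMP, MOR) = {r_temp:+.3f}, r(RH, MOR) = {r_rh:+.3f}")
```

A Pearson coefficient captures only linear association; shape features such as the "wide at the top, narrow at the bottom" pattern described for pressure are better judged from the scatter plots themselves.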
Under the RVR definition of visibility, the RVR-temperature distribution has a "narrow at the bottom, wide at the top" shape: the lower the visibility, the narrower the corresponding temperature range, although the overall difference is small. The airport's AWOS observation data do not reveal a clear trend between relative humidity and RVR. RVR is clearly more sensitive to dew point temperature, because under high humidity the aerosol extinction coefficient changes with humidity, and the change is very large [5]. Dew point temperature characterizes atmospheric humidity, and the low visibility caused by fog formation is closely related to atmospheric humidity. The distributions of visibility against dew point temperature on the two observation days show that dew point temperature and visibility tend to be negatively correlated on both days.
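To make the link between dew point and humidity concrete, the sketch below computes the dew point from air temperature and relative humidity with the Magnus approximation. The coefficients 17.62 and 243.12 °C are a common choice for saturation over water and are an assumption here; the AWOS system reports DEWPOINT directly, so this is for illustration only.

```python
import math

# Magnus-formula coefficients over water (a common choice; an assumption here).
A = 17.62
B = 243.12  # deg C


def dew_point_c(temp_c: float, rh_percent: float) -> float:
    """Approximate dew point (deg C) from air temperature and relative humidity."""
    if not 0.0 < rh_percent <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_percent / 100.0) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)


for rh in (100.0, 95.0, 70.0):
    td = dew_point_c(8.0, rh)
    print(f"T = 8.0 C, RH = {rh:5.1f}% -> Td = {td:5.2f} C, spread = {8.0 - td:4.2f} C")
```

At 100% relative humidity the dew point equals the air temperature; as the dew point approaches the air temperature (a small dew point spread), the air nears saturation and fog becomes more likely.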
Before examining the distribution of wind-related indicators against RVR, this paper draws the wind rose measured by the airport AWOS system, which gives the percentage frequency of each wind direction averaged over the observation points. The wind rose shows that low wind speeds help the fog persist and spread.
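A wind rose is built by binning wind directions into compass sectors and reporting the share of observations in each. The minimal sketch below assumes simple frequency counting per observation with sectors centred on north; the function name and the directions are hypothetical, and the paper's exact averaging scheme is not specified.

```python
import numpy as np


def wind_rose_percentages(directions_deg, n_sectors=16):
    """Percentage of observations falling in each compass sector.

    Sector 0 is centred on north (0 deg); each sector spans 360 / n_sectors degrees.
    """
    width = 360.0 / n_sectors
    d = np.mod(np.asarray(directions_deg, dtype=float), 360.0)
    sector = np.floor((d + width / 2.0) / width).astype(int) % n_sectors
    counts = np.bincount(sector, minlength=n_sectors)
    return 100.0 * counts / counts.sum()


# Hypothetical directions (deg) from a handful of observation points.
dirs = [350, 5, 10, 90, 95, 180, 270, 275]
pct = wind_rose_percentages(dirs, n_sectors=4)  # sectors: N, E, S, W
print(pct)
```

Shifting by half a sector width before binning keeps directions just west of north (such as 350 degrees) in the north sector rather than the west one.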
When relative humidity is high, dew point temperature is also negatively correlated with MOR: condensation of water vapor in the air promotes the formation and persistence of fog and reduces the visibility MOR. As with RVR, the wind direction index is almost unrelated to MOR; wind from all directions is observed across a wide range of visibility values. Wind speed is positively correlated with MOR to some degree, and low visibility is concentrated in the low wind speed range.
The above intuitive analysis of the distributions of visibility and near-surface meteorological factors shows that visibility under the two definitions, MOR and RVR, has a broadly similar distribution relationship most of the time, although significant differences remain.

Establishment of the visibility VARMAX model
Because the near-surface meteorological factors observed at each point in time are related to their past values, the observed visibility-related indices can be regarded as a set of time series. The dependence within these series reflects the continuity of the original data in time: on the one hand, the visibility index is affected by near-surface factors, and on the other, it follows its own pattern of change.
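First, we calculate the autocorrelation and partial autocorrelation coefficients of the original visibility observations. The k-order autocorrelation coefficient (ACF) of the series $\{y_t\}_{t=1}^{T}$ can be expressed as

$$\rho_k = \frac{\gamma_k}{\gamma_0}, \qquad \gamma_k = \operatorname{Cov}(y_t, y_{t-k}), \qquad \gamma_0 = \operatorname{Var}(y_t),$$

where $\gamma_k$ is the k-order autocovariance and $\gamma_0$ is the variance. The lag k can be any integer, positive or negative, with $\rho_{-k} = \rho_k$. The ACF of a series containing a trend does not decay exponentially but decays slowly and roughly linearly. The partial autocorrelation coefficients can be obtained by solving the Yule-Walker equations [6]:

$$\begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{pmatrix} = \begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{pmatrix},$$

where $\phi_{kk}$ is the partial autocorrelation coefficient of lag k.

Next, consider the original form of the ARIMAX model. Assume that the influencing factors are $X_1, X_2, X_3, \dots, X_k$. From regression analysis,

$$Y_t = \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + Z_t,$$

where $Y_t$ is the observed value of the variable to be predicted, $\beta_1, \beta_2, \beta_3, \dots$ are the effect coefficients of the influencing factors, and $Z_t$ is the error term. Since $Y_t$ is also driven by its own past values (the visibility data depend on past data), the relationship can be written as

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \beta_1 X_{1t} + \cdots + \beta_k X_{kt} + Z_t.$$

Since the error terms in different periods should also be interdependent, the error term is written as

$$Z_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}.$$

The term $Y_t$ to be predicted can then be written as

$$Y_t = \sum_{i=1}^{p} \phi_i Y_{t-i} + \sum_{j=1}^{k} \beta_j X_{jt} + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}.$$

Using the lag operator L (with $L Y_t = Y_{t-1}$), the model can be arranged as [7]

$$\Phi(L)\, Y_t = \sum_{j=1}^{k} \beta_j X_{jt} + \Theta(L)\, \varepsilon_t, \qquad \Phi(L) = 1 - \sum_{i=1}^{p} \phi_i L^i, \qquad \Theta(L) = 1 + \sum_{i=1}^{q} \theta_i L^i,$$

where the $\varepsilon_t$ are mutually uncorrelated with mean 0 and are often assumed to be normally distributed. When $Y_t$ must first be differenced d times to become stationary, $Y_t$ is replaced by $(1-L)^d Y_t$, and the model is denoted ARIMA(p,d,q).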
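The sample ACF and the Yule-Walker route to the PACF can be sketched directly in NumPy. The example below, run on a simulated AR(1) series rather than the paper's data, computes $\rho_k = \gamma_k / \gamma_0$ and takes $\phi_{kk}$ as the last element of each order-k Yule-Walker solution; the function names are illustrative, not from the paper.

```python
import numpy as np


def acf(y, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag, with rho_k = gamma_k / gamma_0."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    n = d.size
    gamma0 = d @ d / n
    return np.array([(d[: n - k] @ d[k:]) / n / gamma0 for k in range(max_lag + 1)])


def pacf_yule_walker(y, max_lag):
    """phi_kk for k = 1..max_lag: last coefficient of the order-k Yule-Walker solution."""
    rho = acf(y, max_lag)
    out = []
    for k in range(1, max_lag + 1):
        # Toeplitz matrix R[i, j] = rho_|i-j|; right-hand side (rho_1, ..., rho_k).
        idx = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
        phi = np.linalg.solve(rho[idx], rho[1 : k + 1])
        out.append(phi[-1])
    return np.array(out)


# Simulated AR(1) series y_t = 0.8 y_{t-1} + e_t (illustrative data only).
rng = np.random.default_rng(0)
e = rng.standard_normal(2000)
y = np.zeros_like(e)
for t in range(1, y.size):
    y[t] = 0.8 * y[t - 1] + e[t]

print("ACF :", np.round(acf(y, 5), 2))
print("PACF:", np.round(pacf_yule_walker(y, 5), 2))
```

For this AR(1) series the ACF decays geometrically while the PACF is large at lag 1 and near zero afterwards, which is the "cut-off" pattern used later to identify model orders.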
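Modeling a single visibility index on its own is insufficient; exogenous variables, namely the near-surface meteorological factors, must also be introduced. To avoid confusion, the classic form of VARMAX with exogenous variables is also considered:

$$\mathbf{y}_t = \sum_{i=1}^{p} \mathbf{A}_i \mathbf{y}_{t-i} + \mathbf{B}\mathbf{x}_t + \boldsymbol{\varepsilon}_t + \sum_{j=1}^{q} \mathbf{M}_j \boldsymbol{\varepsilon}_{t-j},$$

where $\mathbf{y}_t$ is the vector of endogenous variables, $\mathbf{A}_i$ and $\mathbf{M}_j$ are coefficient matrices, $\mathbf{x}_t = (x_{1t}, \dots, x_{rt})'$ is the r-dimensional vector of exogenous variables, which also contains deterministic terms such as trend terms, and $\mathbf{B}$ is its coefficient matrix. The components of the error vector $\boldsymbol{\varepsilon}_t$ may be contemporaneously correlated with each other, but they are uncorrelated with their own lagged values and with the variables on the right-hand side of the equation. $\boldsymbol{\varepsilon}_t$ is usually assumed to follow a multivariate normal distribution with mean zero and covariance matrix $\boldsymbol{\Sigma}$. Furthermore, the classic VARMAX model above can be arranged in matrix form as formula (12); the part of formula (12) other than a is a matrix of exogenous variables composed of the near-surface meteorological indices.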
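To show how the terms of a VARMAX model with exogenous variables fit together, the sketch below simulates a two-variable VARMAX(1,1) with exogenous inputs. All coefficient matrices and the exogenous series are hypothetical, chosen only to give a stable process; a constant column in the exogenous vector plays the role of a deterministic term, and the shocks have a non-diagonal covariance to reflect contemporaneous correlation. Estimating such a model (for example with maximum likelihood) is not shown.

```python
import numpy as np


def simulate_varmax_11(A1, M1, B, x, sigma, rng):
    """Simulate y_t = A1 y_{t-1} + B x_t + eps_t + M1 eps_{t-1}.

    A1, M1: (n, n) AR and MA coefficient matrices; B: (n, r) exogenous loadings;
    x: (T, r) exogenous series; sigma: (n, n) covariance matrix of eps_t.
    """
    T, n = x.shape[0], A1.shape[0]
    # Shocks may be correlated across equations at the same time point.
    eps = rng.multivariate_normal(np.zeros(n), sigma, size=T)
    y = np.zeros((T, n))
    for t in range(1, T):
        y[t] = A1 @ y[t - 1] + B @ x[t] + eps[t] + M1 @ eps[t - 1]
    return y


rng = np.random.default_rng(1)
T = 500
# Hypothetical exogenous inputs: a constant (deterministic term) and a smooth driver.
x = np.column_stack([np.ones(T), np.sin(np.arange(T) / 50.0)])
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])  # eigenvalues 0.5 and 0.4: stable
M1 = np.array([[0.3, 0.0], [0.0, 0.2]])
B = np.array([[1.0, 0.5], [0.2, -0.3]])
sigma = np.array([[1.0, 0.3], [0.3, 1.0]])  # non-diagonal: correlated shocks
y = simulate_varmax_11(A1, M1, B, x, sigma, rng)
print(y.shape, np.round(y[-3:], 2))
```

Each row of the loop mirrors one term of the VARMAX equation: the AR matrix acts on the previous state, the exogenous loadings on the current inputs, and the MA matrix on the previous shock.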

Model solving process
Following the modeling approach of this paper, the ACF and PACF of the visibility data are calculated separately. The ACF and PACF of the original MOR data on December 16, 2019 show that the original series is strongly autocorrelated and that its partial autocorrelation cuts off. Before applying the ADF stationarity test to the original data, note that stationarity analysis is the basic prerequisite of time series modeling: when a series is stationary, a stationary model can be applied directly. Otherwise, if the series has a unit root, it must first be made stationary (for example, by differencing) and the stationary data then modeled, or a non-stationary model must be used directly. Stationarity is of two types: strict and weak. Strict stationarity is too demanding in practice, so weak stationarity has a wider range of applications. A series $\{y_t\}_{t=1}^{T}$ is strictly stationary if its joint probability distribution is unchanged by shifts in time, that is,

$$F(y_{t_1}, y_{t_2}, \dots, y_{t_m}) = F(y_{t_1+h}, y_{t_2+h}, \dots, y_{t_m+h})$$

for all m, all time points $t_1, \dots, t_m$ and all shifts h. It follows from the definition that the statistical properties of a strictly stationary series do not change over time. A weakly stationary series has a finite, constant mean and variance, and its autocovariance depends only on the lag k, not on the time t:

$$E(y_t) = \mu, \qquad \operatorname{Var}(y_t) = \sigma^2 < \infty, \qquad \operatorname{Cov}(y_t, y_{t+k}) = \gamma_k.$$

In general, neither type of stationarity implies the other. Stationarity can be judged roughly by inspecting the time series plot: if there is no obvious trend or periodicity, the series may be considered stationary; otherwise it is likely non-stationary. A more rigorous approach is to apply a statistical unit root test to the series.
We apply the ADF test to the original MOR series and obtain a test statistic of -0.9441 > -2.57 (the 10% critical value), so the null hypothesis of a unit root cannot be rejected. The original MOR series for December 16, 2019 is therefore non-stationary and needs further processing.
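The paper does not state the lag length or deterministic terms of its ADF regressions. The sketch below assumes a constant and one lagged difference, computing the t-ratio of the coefficient on $y_{t-1}$ by OLS. The thresholds -2.57 and -3.43 quoted in the text coincide with the 10% and 1% asymptotic critical values for a regression with a constant; the 5% value is -2.86. The random-walk data are simulated, not the MOR series.

```python
import numpy as np


def adf_stat(y, lags=1):
    """ADF t-statistic for H0: unit root, with a constant and `lags` lagged differences.

    Regression: dy_t = c + g * y_{t-1} + sum_{i=1..lags} b_i * dy_{t-i} + e_t;
    the statistic is the OLS t-ratio of g.
    """
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    rows = dy.size - lags
    target = dy[lags:]
    cols = [np.ones(rows), y[lags:-1]]  # constant and y_{t-1}
    cols += [dy[lags - i : dy.size - i] for i in range(1, lags + 1)]  # lagged differences
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    s2 = resid @ resid / (rows - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(2)
walk = np.cumsum(rng.standard_normal(1000))  # has a unit root by construction
CRIT_5PCT = -2.86
for name, series in [("level", walk), ("first difference", np.diff(walk))]:
    stat = adf_stat(series, lags=1)
    verdict = "reject unit root" if stat < CRIT_5PCT else "cannot reject unit root"
    print(f"{name}: ADF = {stat:.2f} -> {verdict}")
```

In practice, `statsmodels.tsa.stattools.adfuller` performs the same test with MacKinnon p-values and automatic lag selection, which would be a more robust choice than this hand-rolled version.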
The MOR series is differenced, and the ACF and PACF of the differenced series are calculated; the results are shown in the figure. The ADF test on the differenced MOR series gives -33.6331 < -3.43 (the 1% critical value), so the null hypothesis of a unit root is rejected and the differenced series is stationary.
Next, the model is identified, including its type and order. We use the Box-Jenkins identification method, which uses the patterns of the ACF and PACF to determine the model type and order together: a PACF that cuts off after lag p combined with a tailing ACF suggests AR(p), an ACF that cuts off after lag q combined with a tailing PACF suggests MA(q), and tailing in both suggests an ARMA model.
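The Box-Jenkins reading of ACF and PACF plots can be approximated with a simple rule: a coefficient is treated as significant if it lies outside the approximate 95% band of plus or minus 1.96 divided by the square root of n, and a function "tails off" if significant coefficients persist beyond a few lags. This heuristic and its `max_cut` parameter are assumptions for illustration, not the paper's procedure, and the inputs below are illustrative values rather than the paper's estimates.

```python
import math


def cutoff_lag(coeffs, n, z=1.96):
    """Last lag (1-based) whose coefficient lies outside +/- z / sqrt(n); 0 if none."""
    bound = z / math.sqrt(n)
    last = 0
    for lag, c in enumerate(coeffs, start=1):
        if abs(c) > bound:
            last = lag
    return last


def suggest_model(acf_vals, pacf_vals, n, max_cut=3):
    """Rough Box-Jenkins reading of ACF and PACF values at lags 1..K.

    PACF cuts off after p while ACF tails off -> AR(p);
    ACF cuts off after q while PACF tails off -> MA(q);
    otherwise -> ARMA, with the order settled by AIC or log-likelihood.
    """
    p = cutoff_lag(pacf_vals, n)
    q = cutoff_lag(acf_vals, n)
    if p <= max_cut < q:
        return f"AR({p})"
    if q <= max_cut < p:
        return f"MA({q})"
    return "ARMA(p, q): compare candidate orders by AIC"


# Illustrative inputs: theoretical AR(1) pattern with coefficient 0.8, n = 200.
ar_acf = [0.8**k for k in range(1, 7)]
ar_pacf = [0.8, 0.0, 0.0, 0.0, 0.0, 0.0]
print(suggest_model(ar_acf, ar_pacf, n=200))
```

Because sample ACF and PACF patterns are often ambiguous, the rule only narrows the candidates; the final order is then chosen by comparing log-likelihood and AIC, as the paper does.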
Then, the insignificant near-surface meteorological factors are removed, and the log-likelihood value and the AIC criterion are used as aids in determining the model order. Finally, ARIMA(1,1,1) was used to fit the differenced MOR data of December 16, 2019. The fitted result shows that the visibility index defined by MOR on December 16, 2019 is positively correlated with temperature and wind speed. The R² of the fit is 0.9938, indicating a good fit.
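The order-selection step can be illustrated with a simplified model. The sketch below fits autoregressive models with exogenous regressors (ARX, without MA terms) by ordinary least squares on an already-differenced series, computes the Gaussian log-likelihood, AIC and R² for each candidate order on a common sample, and picks the lowest AIC. This simplification is an assumption: the paper's ARIMA models include MA terms, which would normally be estimated by maximum likelihood (for example with the statsmodels `SARIMAX` class and its `exog` argument). The data are simulated, with a hypothetical exogenous regressor standing in for a differenced meteorological factor.

```python
import numpy as np


def fit_arx(y, X, p, start):
    """OLS fit of y_t = c + sum_{i=1..p} a_i y_{t-i} + X_t b + e_t for t >= start.

    Using the same `start` for every p keeps the samples, and hence the AIC
    values, comparable. Returns (coefficients, log-likelihood, AIC, R^2).
    """
    T = y.size
    target = y[start:]
    cols = [np.ones(T - start)]
    cols += [y[start - i : T - i] for i in range(1, p + 1)]  # lagged y
    design = np.column_stack(cols + [X[start:]])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    n = target.size
    sigma2 = resid @ resid / n  # maximum-likelihood variance estimate
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    k = design.shape[1] + 1  # regression coefficients plus the variance
    aic = 2 * k - 2 * loglik
    r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return coef, loglik, aic, r2


# Simulated differenced series driven by one hypothetical exogenous factor.
rng = np.random.default_rng(3)
T = 400
X = rng.standard_normal((T, 1))
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + 2.0 * X[t, 0] + rng.standard_normal()

results = {p: fit_arx(y, X, p, start=3) for p in range(4)}
best_p = min(results, key=lambda p: results[p][2])
for p, (_, ll, aic, r2) in results.items():
    print(f"p={p}: logL={ll:.1f} AIC={aic:.1f} R2={r2:.3f}")
print("lowest AIC at p =", best_p)
```

Note that an R² computed on a differenced series and one computed on the original levels are not directly comparable, so the level at which a reported R² is measured matters when judging fit quality.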
In the same way, the ACF and PACF are calculated for the original MOR series of March 13, 2020; the results are shown in the figure. The ADF statistic of the original MOR data on March 13, 2020 is -1.7924 > -2.57, so the series is not stationary and must be differenced. The ACF and PACF of the differenced series are then calculated; the results are shown in the figure. The ADF statistic of the differenced MOR data on March 13, 2020 is -24.8669 < -3.43, which means that after differencing the series no longer contains a unit root and is stationary. Similarly, we identify the model, remove the insignificant near-surface meteorological factors, and use the log-likelihood value and the AIC criterion as aids in determining the model order. Finally, ARIMA(3,1,1) was used to fit the differenced MOR data of March 13, 2020. The fitted result shows that the visibility index defined by MOR on March 13, 2020 is positively correlated with temperature, humidity and wind speed, and negatively correlated with dew point temperature. Among these, temperature has the greatest impact on that day's visibility, followed by dew point temperature. These two results mean that the higher the temperature, the higher the visibility, and the higher the dew point temperature, the lower the visibility. Therefore, when only the indicators of fog formation and dissipation are considered, rising temperature favors the dissipation of fog, while the dew point temperature is a measure of atmospheric humidity: the higher the humidity, the more easily the moisture in the atmosphere condenses into fog, reducing visibility.
Air pressure and wind speed are also positively correlated with visibility, indicating that the higher the air pressure and the greater the wind speed, the higher the visibility. This is consistent with the earlier analysis that calm or light winds help fog persist and spread, while strong winds accelerate its dissipation. The R² of the fit is 0.9967, indicating a good fit.
The RVR data observed on December 16, 2019 and March 13, 2020 are processed in the same way, and the original series for both days are found to be non-stationary. The ACF and PACF of the original and differenced RVR series are calculated separately; the results are shown in the figure. We identify the model, remove the insignificant near-surface meteorological factors, and use the log-likelihood value and the AIC criterion as aids in determining the model order. Finally, ARIMA(1,1,1) is used to fit the differenced RVR data of December 16, 2019. The fitted result shows that the visibility index defined by RVR leads to a conclusion similar to that for MOR: visibility at the observation point is most sensitive to temperature and is positively correlated with it.
At the same time, it is negatively correlated with the dew point temperature, indicating that the increase in humidity in the atmosphere is conducive to fog condensation, thereby reducing visibility.
Wind speed is positively correlated with RVR, meaning that when the wind speed is high, the fog is easily blown away. In the same way, we identify the model for March 13, 2020 and obtain the fit of an ARIMAX(2,1,1) model relating the differenced RVR data to the near-surface meteorological factors [8]. The results show that the differenced RVR data on March 13, 2020 are mainly affected by temperature and dew point temperature. This trend agrees with the previous analysis, supporting the validity of the time series model used.

Conclusion
In summary, we established ARIMAX models to fit the MOR and RVR data of different dates against near-surface meteorological indicators, choosing the model order according to the characteristics of each date. The relationship between visibility and near-surface meteorological factors can be summarized as follows. The near-surface meteorological indicators fit visibility well, with R² above 0.98 for all models. The factor with the greatest influence on visibility is temperature, and the two are positively correlated. Second, the dew point temperature is negatively correlated with visibility. This shows that when the temperature is lower and the humidity higher, water vapor in the atmosphere is more likely to condense into fog that does not readily dissipate, reducing visibility. The indicators related to air pressure and wind speed are positively correlated with visibility, indicating that increases in air pressure and wind speed promote the dissipation of heavy fog.