Correlation analysis of atmospheric pollutants and meteorological factors using statistical tools in Pune, Maharashtra

Air pollution has worsened with accelerating urbanisation and industrialisation, and the outlook for pollution control is not promising. Climate change is a significant worldwide challenge that humanity currently faces, and India has proposed carbon peaking and carbon neutrality as ways to combat it. The intricate relationship between atmospheric pollutants and the meteorological variables that affect air quality, however, still needs to be elucidated. This work uses Pune's 2017-2021 high-resolution air pollution reanalysis open data set together with the Pearson Correlation Coefficient (PCC) to compute and visualise relationships in environmental monitoring big data. The PCC is easy to use, directly shows how pollutants and meteorological conditions relate to one another in time and space, and is convenient for environmental management agencies. The experimental results show that all pollutants are positively correlated with one another, with the exception of ozone, which is negatively correlated. Meteorological factors have a considerable influence on the pollutants, and temperature is positively associated with them. Because of its strong negative relationship with the five pollutants, wind speed has a greater effect on their dispersion.


Introduction
Reducing urban air pollution is a crucial aspect of protecting the environment. As the country's urbanisation has accelerated, urban environmental protection has become increasingly important to national environmental protection as a whole. Air pollution control and prevention are critical for human survival and are also an essential first step in economic development. Air pollution and its mitigation have been studied for decades, yet they remain major global issues. The National Air Quality Index was introduced in 2015 by IIT Kanpur and the Indian government. Governments in populous and developing nations view the regulation of air quality as a crucial task.

Studies and Methodologies
Francis Chizoruo Ibe et al. [5] used statistical methods to examine the presence, concentrations, and potential emission sources of PM10, NO2, SO2, CO, VOC, and H2S in selected areas of Imo State, southeastern Nigeria. The study also sought to determine whether the regional variation and dispersion of pollutants were connected to anthropogenic and industrial activity in the study area. ANOVA and box-and-whisker plots showed that pollutant concentrations varied noticeably across the study area, and the box-and-whisker plots also clarified how concentrations varied with time of day. Hierarchical Cluster Analysis (HCA) classified the pollutants into one large cluster, and Principal Component Analysis (PCA) grouped them into a single coherent component, indicating a comparable source of emission. The observed positive correlation between NO2, SO2, CO, and PM10 likewise suggests that their emission sources are similar, which may benefit environmental planning and the development of measures to reduce air pollution. Knowing where air pollutants come from is crucial for controlling and preventing atmospheric emissions, especially those from point sources. Wind rose diagrams were used to identify the predominant wind direction and speed responsible for the dispersion of air pollutants in the study area. These results highlight the importance of ongoing air quality monitoring and atmospheric analysis in cities and towns; it is therefore imperative to monitor regularly and continuously the levels of criteria air pollutants such as CO, NO2, SO2, and PM10.
Pornpun Watcharavitoon et al. [14] examined Bangkok's severe air pollution. The weather and air quality data used in the study were gathered by the Pollution Control Department of the Ministry of Natural Resources and Environment, Thailand, from 1996 to 2009. Hourly air quality and meteorological data were measured at ten residential and seven roadside locations. According to Pearson's chi-square cross-tabulation statistics, the 24-hour mean PM10 concentrations at both roadside and residential locations were significantly higher than the Thai National Ambient Air Quality Standards (NAAQS) and the World Health Organisation (WHO) recommendations. The daily maximum 1-hour O3 concentration (O3-1hr) at both site types exceeded the Thai NAAQS. The 8-hour time-weighted average of CO (CO8hr) at both site types was lower than the Thai NAAQS. The daily 1-hour mean concentration of NO2 and the 24-hour average SO2 concentration both exceeded the WHO recommendations, although both site types met the Thai NAAQS. A stepwise multiple linear regression model was used to analyse the important variables affecting PM10, CO8hr, O3-1hr, NO2, and SO2 levels at both site types. The results showed a decreasing correlation with meteorological parameters and an increasing association with region and season. O3-1hr levels, in contrast, showed a diminished association with the region under investigation. The study found that traffic emissions are the main factor behind the spatial disparity in airborne pollutants in Bangkok, whereas meteorological conditions may be the main drivers of temporal changes. The study demonstrates the spatiotemporal facets of air pollution in Bangkok and offers practical approaches to address the problem.
David Núñez-Alonso et al. [4] presented data on the pollution distribution in the city and province of Madrid from 22 monitoring stations between 2010 and 2017. The air pollution data were interpreted and modelled using statistical methods. The data are yearly average concentrations of nitrogen oxides, ozone, and particulate matter (PM10) gathered in Madrid and its surrounding areas; Madrid is one of the biggest cities in Europe, yet its air quality has received little research attention. A distribution map was created to show how these pollutants relate to one another and to the local population. Multivariate analysis, comprising correlation analysis, principal component analysis (PCA), and cluster analysis (CA), established associations between several pollutants. The findings allowed the monitoring stations to be classified according to each of the four pollutants, revealing information about their mechanisms and average annual limits, illustrating their geographic distribution, and tracking their levels. The multivariate analysis indicated that NO2 levels in most parts of Madrid province exceeded the annual limit, a finding corroborated by contour maps produced with the geostatistical technique of ordinary kriging.
Chen Chao et al. [3] used China's 2013-2018 high-resolution air pollution reanalysis open data set together with the Pearson Correlation Coefficient (PCC) to compute and visualise relationships in environmental monitoring big data. This approach is simple to apply, clearly illustrates the relationship between meteorological variables and pollutants in time and space, and is convenient for environmental management departments to use. The experimental results show that all pollutants are positively correlated, with the exception of ozone, which is negatively correlated. Meteorological factors have a considerable influence on the pollutants; temperature, air pressure, and humidity are all positively associated with them.

Data Collected
The following data were collected:

Pearson's Correlation Coefficient
Pearson's correlation coefficient is the statistic that measures the association, or relationship, between two continuous variables. Because it is based on covariance, it is widely regarded as the best method for measuring the linear association between variables of interest. It conveys both the magnitude of the correlation and the direction of the relationship. Coefficient values range from -1 to +1, with +1 denoting a perfect positive relationship, -1 a perfect negative relationship, and 0 the absence of any linear relationship.
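The covariance-based definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not the study's own code: the function name `pearson_r` is hypothetical and the sample values are made up for the example.

```python
import math

def pearson_r(x, y):
    """Pearson's r = cov(x, y) / (std(x) * std(y)) for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations; the 1/n (or 1/(n-1)) factors cancel in the ratio.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Illustrative, made-up values (not measurements from the study).
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]
pm25 = [50.0, 40.0, 30.0, 20.0, 10.0]
r = pearson_r(wind_speed, pm25)  # perfectly linear and decreasing, so r is -1
```

Because the same normalising factor appears in the numerator and the denominator, the result is the same whether population or sample covariance is used.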
Degree of correlation:
1. Perfect: If the value is close to ±1, the correlation is considered perfect, meaning that as one variable rises, the other tends to rise as well (if the coefficient is positive) or to fall (if it is negative).
2. High: A correlation is considered strong if the magnitude of the coefficient lies between 0.50 and 1.
3. Moderate: A correlation is considered to be of medium degree if its magnitude falls between 0.30 and 0.49.
4. Low: A correlation is considered to be of low degree when its magnitude is below 0.29.
5. No correlation: When the value is zero, there is no association.
The above matrix (Fig. 7) shows the relationships for all 6 stations considered together for the years 2017-2021. The matrix shows that PM2.5 and PM10 are very highly correlated, with a coefficient of 0.82, but both are only weakly correlated with O3, moderately correlated with NO2, and almost uncorrelated with CO.
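The degree-of-correlation bands listed above can be written as a small helper. This is a sketch under stated assumptions: the function name `correlation_degree` is hypothetical, the bands are applied to the magnitude of r, the gaps between the quoted bands (for example 0.49 to 0.50) are closed by using the lower bound of the next band, and "perfect" is taken to mean a magnitude of 1 within a tiny floating-point tolerance.

```python
import math

def correlation_degree(r: float, perfect_tol: float = 1e-9) -> str:
    """Map a Pearson coefficient to the qualitative bands described in the text."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie in [-1, 1]")
    magnitude = abs(r)
    # Assumption: "perfect" means |r| equals 1 up to floating-point tolerance.
    if math.isclose(magnitude, 1.0, abs_tol=perfect_tol):
        return "perfect"
    if magnitude >= 0.50:
        return "high"
    if magnitude >= 0.30:
        return "moderate"
    if magnitude > 0.0:
        return "low"
    return "none"

# The PM2.5-PM10 coefficient of 0.82 reported for Fig. 7 falls in the "high" band.
pm_band = correlation_degree(0.82)
```

The sign of r is reported separately from the band, so a coefficient of -0.35 is a moderate negative correlation.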

Monthly Correlation
To examine the effect of time and weather, a month-wise correlation for the years 2017-2021 was computed for all 6 stations using Pearson's correlation coefficient. The result is shown in the table below. It can be observed that pollutant levels come down in the months of July to September.

Pearson's Correlation for Pollutants & Meteorological Parameters
To show the relationship between the meteorological parameters (temperature, rainfall, wind speed, and UV index) and the pollutants considered, the following graph of Pearson's correlation coefficients is given. The figure covers all 6 regions together, using all pollutant and meteorological values for the years 2017-2021.
Using the 2017-2021 data for Pune city at monthly granularity, the daily mean values of the five pollutants (PM2.5, PM10, NO2, CO, O3) are calculated, and the Pearson correlation coefficients between the pollutants and the meteorological factors (TEMP, RAIN, UV, WIND) are examined statistically. The findings show that the pollutants are positively correlated with one another, with the exception of ozone, which is negatively correlated; that meteorological variables have a considerable influence on the pollutants; and that the pollutants are related to temperature. The five pollutants have a strong negative association with wind speed, which therefore has a greater effect on their dispersion.