Dependence of the sample estimates on the sample size

The paper provides data on the dependence of the sample estimates of the arithmetic mean, variance, asymmetry and excess on the sample size for the length of the needles of European spruce (Picea rubra) and European larch (Larix decidua) and for the average length of a pair of needles of Scots pine (Pinus sylvestris) and Calabrian pine (Pinus brutia). The sample sizes at which the estimates stabilize around their general (population) values have been determined. The data confirm that the distribution of the length of the needles of coniferous plants differs from the normal one. The possibility of using graphs of the dependence of sample indicators on sample size for the examination of scientific data is discussed.


Introduction
In everyday practice, researchers most often have to calculate the following sample indicators: the arithmetic mean, variance, asymmetry (skewness) and excess (excess kurtosis). Despite its routine nature and apparent simplicity, the estimation of sample indicators is fraught with a number of pitfalls. Some of them can be eliminated by the correct organization of data acquisition (for example, the "observer effect" [1]), while others require processing of the data already obtained. An important point is the dependence of the sample estimate on the sample size [2]. Well-known methods make it possible to obtain a reliable interval estimate [3] containing the desired value for small samples. However, when obtaining large samples is not associated with technical difficulties or ethical obstacles (such as the use of animals [4]), primarily in ecological and botanical studies [5] or in the processing of medical statistics [6], the accuracy of the results can be increased by studying the dependence of the sample indicator on the sample size. In this work, an attempt is made to estimate experimentally the sample size required to assess the arithmetic mean, variance, and asymmetry and excess coefficients for conifers.

Materials and methods
Herbarium material was collected in ecologically clean locations in Sergievka Park (St. Petersburg, Russia) in 2019 and at a resort in Turkey (36°40'58" N, 30°34'11" E) in 2020. Fallen needles were collected. Needle length was measured with a ruler to an accuracy of 1 mm. For European spruce (Picea rubra) and European larch (Larix decidua), the length of individual needles was measured. For Scots pine (Pinus sylvestris) and Calabrian pine (Pinus brutia), the lengths of the two needles in a pair were measured and their average was taken. Herbarium collection and measurements were carried out by different researchers to comply with the "blind test" rule. The arithmetic mean, variance, and asymmetry and excess coefficients were calculated in Excel according to the well-known formulas (1-4), where n is the sample size and the remaining symbols denote the variance, the mean square deviation, the arithmetic mean and the individual variants. Each sample estimate was calculated for every sample size from 10 variants up to the full sample size, with a step of 1. The dependence of each sample characteristic on the sample size was then plotted, and the sample size sufficient for the calculated indicators to stabilize near their general values was estimated.
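The procedure of recalculating each estimate for sample sizes from 10 upwards can be sketched in Python. This is only an illustration under stated assumptions, not the Excel workbook used in the study: the function name `running_estimates` is hypothetical, the variance uses the n - 1 denominator, and asymmetry and excess are taken as the plain moment ratios m3 / m2^1.5 and m4 / m2^2 - 3. Excel's SKEW and KURT functions apply additional small-sample corrections, so values may differ slightly from those in the paper. The data in the usage example are synthetic, not the measured needle lengths.

```python
import random


def running_estimates(values, start=10):
    """Return (n, mean, variance, asymmetry, excess) for every prefix of
    `values` containing n >= start observations (step of 1)."""
    rows = []
    s1 = s2 = s3 = s4 = 0.0  # running sums of x, x^2, x^3, x^4
    for n, x in enumerate(values, start=1):
        s1 += x
        s2 += x ** 2
        s3 += x ** 3
        s4 += x ** 4
        if n < max(start, 2):
            continue
        a1, a2, a3, a4 = s1 / n, s2 / n, s3 / n, s4 / n
        # Central moments expressed through the raw moments a1..a4.
        m2 = a2 - a1 ** 2
        m3 = a3 - 3 * a1 * a2 + 2 * a1 ** 3
        m4 = a4 - 4 * a1 * a3 + 6 * a1 ** 2 * a2 - 3 * a1 ** 4
        variance = m2 * n / (n - 1)  # assumption: n - 1 denominator
        if m2 > 0:
            asymmetry = m3 / m2 ** 1.5
            excess = m4 / m2 ** 2 - 3.0
        else:
            # All observations identical: the ratios are undefined.
            asymmetry = excess = float("nan")
        rows.append((n, a1, variance, asymmetry, excess))
    return rows


if __name__ == "__main__":
    random.seed(0)
    # Synthetic "needle lengths" rounded to 1 mm, for illustration only.
    lengths = [round(random.gauss(15, 3)) for _ in range(2000)]
    for n, mean, var, asym, exc in running_estimates(lengths)[::400]:
        print(f"n={n:5d} mean={mean:6.2f} var={var:6.2f} "
              f"asym={asym:6.3f} excess={exc:6.3f}")
```

Plotting any column of the result against n gives a graph of the kind described above. Running sums keep the cost linear in the sample size, which is convenient when every prefix has to be evaluated.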

Results and discussion
The results are shown in Figures 1-16. All the graphs show significant fluctuations in the sample indicators for sample sizes up to 500. The estimates of the mean value, variance and asymmetry of the needle length of European spruce stabilize near their general values starting from a sample size of 1800-2000 variants, and the excess from 1200 variants. It should be noted that the asymmetry exceeds the critical value for p = 0.01 at a sample size of 2000, which allows the hypothesis of a normal distribution to be rejected. The excess does not exceed the critical value. However, it stabilizes at about 0.5 at a sample size of 1200 and does not tend to zero as the sample size increases further.
The mean value and asymmetry of the distribution of the length of European larch needles stabilize around their general values at a sample size of 1000 variants, and the variance and excess at 800 variants. The asymmetry exceeds the critical value for p = 0.01 at a sample size of 1400; the excess does not exceed the critical value [7], but it stabilizes around a nonzero value and does not tend to zero as the sample size increases.
The stabilization of the sample estimates of the mean value, variance and asymmetry of the average length of a pair of Scots pine needles occurs when the sample size [...]
The mean value and variance for Calabrian pine stabilize at a sample size of 1100, the asymmetry begins to fluctuate around zero starting from 900 variants, and the excess stabilizes around 0.4 at a sample size of slightly more than 1000 variants. The values of asymmetry and excess do not exceed the critical ones, but the excess stabilizes around a nonzero value.
Thus, although in a number of cases the hypothesis of a normal distribution cannot be strictly rejected, the excess in all cases took a nonzero value that does not tend to zero as the sample size increases, while the asymmetry stabilized around zero only in the case of the Calabrian pine growing in the resort area. A normal distribution implies that asymmetry and excess are zero (or tend to zero as the sample size grows) [3].
When processing experimental data, one should take into account the possibility of obtaining false positive or false negative results with small sample sizes. To avoid this, the dependence of the estimated value on the sample size should be plotted.
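In addition to visual inspection of such a graph, the point of stabilization can be located numerically. The sketch below is one possible criterion and an assumption of this example, not the procedure used in the study: the last estimate is treated as a stand-in for the general value, and the hypothetical function `stabilization_size` returns the smallest sample size from which all later estimates stay within an absolute tolerance `tol` of it.

```python
def stabilization_size(sizes, estimates, tol):
    """Smallest sample size from which every later estimate stays within
    +/- tol of the final estimate (assumed to approximate the general value)."""
    if not sizes or len(sizes) != len(estimates):
        raise ValueError("sizes and estimates must be non-empty and of equal length")
    final = estimates[-1]
    stable_from = sizes[-1]
    # Walk backwards from the largest sample until the band is first left.
    for n, value in zip(reversed(sizes), reversed(estimates)):
        if abs(value - final) > tol:
            break
        stable_from = n
    return stable_from


if __name__ == "__main__":
    sizes = list(range(10, 16))
    estimates = [5.0, 3.0, 1.2, 0.95, 1.05, 1.0]  # made-up values
    print(stabilization_size(sizes, estimates, tol=0.1))
```

The choice of `tol` is a judgment call: too narrow a band pushes the stabilization point towards the end of the sample, while too wide a band hides the early fluctuations.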

Conclusions
1. Neglecting to substantiate the sample size can lead to false positive or false negative results.
2. When examining experimental data, the possibility of deliberate adjustment of the results by changing the sample size should be taken into account. This method of adjusting the results is almost flawless and works even for publications with open data, since checking the calculations will reproduce the stated results. Such cases can be revealed by plotting the dependence of the value on the sample size: if the graph goes steeply down or up and does not fluctuate around a certain value, the researcher took an extreme value precisely in order to fine-tune the result.
3. The sample sizes for obtaining reliable estimates of the arithmetic mean, variance, and indicators of asymmetry and excess for needles of spruce, Scots pine and Calabrian pine are at least 800 variants. A sample size of more than 2000 variants is impractical.
4. The distributions of the length of the needles of European spruce and European larch and of the average length of a pair of needles of Scots pine have nonzero values of asymmetry and excess; the distribution of the average length of a pair of needles of Calabrian pine is characterized by zero asymmetry but nonzero excess.
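The check described in conclusion 2 (a graph that goes steeply up or down instead of fluctuating around a value) can be roughly quantified. The sketch below is a hypothetical heuristic, not a method from the paper: the function name `drift_ratio`, the `tail_fraction` parameter and the interpretation of the result are all assumptions. It compares the net change of the estimate over the last part of the graph with the total distance travelled there.

```python
def drift_ratio(estimates, tail_fraction=0.25):
    """Net change divided by total path length over the tail of the graph.

    Values near 1 mean the estimate moves steadily in one direction;
    values near 0 mean it fluctuates around a level."""
    k = max(2, int(len(estimates) * tail_fraction))
    tail = estimates[-k:]
    # Sum of absolute step-to-step changes along the tail.
    path = sum(abs(b - a) for a, b in zip(tail, tail[1:]))
    if path == 0:
        return 0.0  # constant tail (or fewer than two points): no drift
    return abs(tail[-1] - tail[0]) / path


if __name__ == "__main__":
    print(drift_ratio([1, 2, 3, 4, 5, 6, 7, 8], tail_fraction=0.5))
    print(drift_ratio([1, 2, 1, 2, 1, 2, 1, 2, 1], tail_fraction=1.0))
```

A ratio close to 1 at the reported sample size would be a reason to look at the full graph more closely; on its own it does not prove that a result was adjusted.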