Methods for analysing performance in distributed systems

The purpose of this article is to analyze various methods for measuring central tendency in statistics, including the arithmetic mean, the median, the winsorized mean, outlier-exclusion methods, the Hodges-Lehmann estimator, and quantile estimation. The advantages and disadvantages of each method are discussed, as well as their practical applications to performance analysis in distributed systems. In particular, we focus on the importance of selecting a measure of central tendency that is robust to outliers and accurately reflects the distribution of the data. We also provide examples of how these methods can be applied to real-world datasets to gain insight into the underlying patterns and trends. Overall, this article provides a comprehensive overview of techniques for measuring central tendency and offers practical guidance for researchers and analysts looking to make informed decisions about performance analysis.


Introduction
Distributed systems have become increasingly important in modern computing due to their ability to provide high levels of scalability, fault-tolerance, and performance [20]. However, evaluating the performance of distributed systems is a complex task, as it involves measuring not only the individual components but also the interactions and dependencies between them. The ability to accurately evaluate the performance of distributed systems is crucial for ensuring that they meet their design goals and for identifying and addressing potential issues.
In the course of our investigation into various forms of performance testing, such as benchmarking, stress testing, and load testing, we delve into the application of monitoring tools and techniques for the real-time evaluation of distributed systems. It is evident that the methods of mathematical statistics serve as a means of descriptive analysis in this field of research.
For example, consider the distribution of requests per second (RPS), assuming that it follows a normal distribution [1] (Fig. 1), also known as the Gaussian distribution (1). RPS is a performance metric used to measure the throughput of a system. It represents the number of requests that a system can process in one second and is often used as a key performance indicator for web servers, APIs, and other distributed systems that handle a high volume of requests. Measuring RPS [8] is important for ensuring that a distributed system can handle a large number of concurrent requests without experiencing performance issues or downtime. The problem is that RPS encompasses many factors: the duration of data retrieval from the hard disk in the event of an input/output (I/O) bottleneck, the responsiveness of the random-access memory (RAM), and the degree of request-queue activity, among other elements. In essence, this metric is composite in nature and comprises a multitude of variables.
The normal distribution is characterized by two parameters: the expected value μ and the standard deviation σ. However, it should be noted that an arbitrary probability distribution cannot in general be fully specified by two parameters, because they do not capture the shape of the distribution. This issue is addressed in a scholarly article [2] that explores different shapes (Fig. 2) of the normal distribution.

As a means of validation, let us consider an API function that facilitates file creation within a distributed web application. We implemented this function with the widely adopted Java Spring framework and, using Docker Compose, locally deployed several instances of this service in kernel mode. Following this, a load-testing experiment was conducted with randomized routing and asynchronous request processing. The instrumentation included the Apache JMeter application, and the testing platform comprised an Intel Core i9-10900KF central processing unit, 32 gigabytes of random-access memory, and a solid-state drive with a data-transfer rate of 5500 megabytes per second. The request flow and intensity remained constant at 1 RPS throughout the entire experiment of 1000 iterations. The raw data were processed with the Python programming language and the NumPy library of mathematical functions, and a histogram, depicted in Figure 3, was generated from the processed data. In the plot in Figure 3 it is observable that the principal modes remain confined to the range of 0 to 100 milliseconds, whereas minor outliers manifest beyond the 100-millisecond mark. It can therefore be hypothesized that the number of spikes and the heterogeneity of the distributions will increase when analyzing real production systems.
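The processing step described above can be sketched as follows. The latency array below is a synthetic, hypothetical stand-in for the measured JMeter data (a main mode below 100 ms plus a small heavy tail), not the actual experimental values:

```python
# Sketch of the NumPy post-processing step: load response-time samples
# (milliseconds) into an array and bin them into a histogram.
import numpy as np

rng = np.random.default_rng(42)
# Illustrative stand-in for the 1000 measured response times:
latencies_ms = np.concatenate([
    rng.normal(60, 15, 950),   # principal mode, mostly 0-100 ms
    rng.normal(250, 40, 50),   # minor outliers past the 100 ms mark
])

counts, bin_edges = np.histogram(latencies_ms, bins=50)
print(f"samples: {latencies_ms.size}, bins: {counts.size}")
print(f"share above 100 ms: {np.mean(latencies_ms > 100):.3f}")
```

Plotting `counts` against `bin_edges` with any charting library yields a histogram of the kind shown in Figure 3.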
Albeit lacking full clarity, this observation substantiates the conjecture that in practical settings, where intricate functions govern system interactions, input/output operations, and complex computations, the manifestation of a normal distribution cannot be relied upon.
In summary, the major issues associated with analyzing performance [9] are as follows:
- Large variability in values: measurements are not localized in a single vicinity and may span a wide discrete range along the numerical axis;
- Heavy tails and extreme outliers;
- Multimodality;
- Discretization effects, wherein continuous distributions acquire a discrete nature;
- Asymmetry.
Henceforth, within the confines of this manuscript, we expound upon alternative techniques for appraising performance distributions, drawing on non-standard areas of mathematical statistics, namely central tendency and quantile estimation.

Central tendency
Central tendency [10] allows for the compression of an intricate performance distribution into a single numerical value. One of the simplest ways to implement this approach is the arithmetic mean.

Arithmetic mean
Let us consider a sample of n numerical values x_1, x_2, ..., x_n (2); in such an instance the arithmetic mean [11] can be computed as (3): x̄ = (x_1 + x_2 + ... + x_n) / n.
However, if an outlier is included in our sample, as in the case of (4), then the arithmetic mean loses its credibility as an indicator of central tendency.
{1, 2, 3, 4, 5, 6, 273}; x̄ = 42 (4)
In general, the arithmetic mean is characterized by low robustness [12], indicating its vulnerability to the influence of extreme values, or outliers. A single extremely large value can significantly distort the result, which prompts the consideration of an alternative measure, such as the median. While the median for samples (5) and (6) may yield identical values, applying the Gaussian efficiency estimate (7) reveals discrepancies between the respective measures.
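The contrast between the two estimates is easy to reproduce with NumPy on the sample above:

```python
import numpy as np

sample = np.array([1, 2, 3, 4, 5, 6, 273])

mean = np.mean(sample)      # pulled far to the right by the single outlier
median = np.median(sample)  # the middle order statistic, unaffected by it

print(mean)    # 42.0
print(median)  # 4.0
```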
The Gaussian efficiency of an estimator can be defined as the ratio of the variance of the arithmetic-mean estimate to the variance of the estimator in question, computed on samples drawn from a normal distribution. After generating a set of samples from a normal distribution, we computed the arithmetic mean and the median of each. The resulting density estimates are displayed in Figure 4. Based on the obtained results, we constructed Table 2 with the Gaussian efficiency of the arithmetic mean and the median.

Table 2. Gaussian efficiency of the arithmetic mean and the median.

                     Arithmetic mean   Median
Gaussian efficiency  100%              64%

Based on the outcomes presented in Table 2, it can be contended that substituting the arithmetic mean with the median leads to considerable differences in the experimental results.
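The figures in Table 2 can be verified with a Monte Carlo simulation: draw many samples from a standard normal distribution and compare the variance of the sample mean with the variance of the sample median. The sample sizes below are arbitrary illustrative choices:

```python
# Monte Carlo estimate of the Gaussian efficiency of the median.
import numpy as np

rng = np.random.default_rng(0)
n_samples, sample_size = 20_000, 100

data = rng.standard_normal((n_samples, sample_size))
means = data.mean(axis=1)
medians = np.median(data, axis=1)

# Efficiency = Var(mean estimate) / Var(median estimate).
efficiency = means.var() / medians.var()
print(f"Gaussian efficiency of the median: {efficiency:.2f}")
```

The printed value lands close to the asymptotic figure 2/π ≈ 0.64 from Table 2.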

Trimmed and winsorized mean
Let us contemplate a sample of eight sorted values, denoted as (8).
{x_(1), x_(2), ..., x_(8)}; In order to determine the median, we disregard all elements other than x_(4) and x_(5), which occupy the central positions. While such an approach enhances robustness, it reduces Gaussian efficiency in this scenario. Instead, we can eliminate only the highest and lowest values and compute the trimmed arithmetic mean (9)-(10) [13].
Compared to the conventional arithmetic mean, the trimmed mean offers greater robustness, while retaining higher Gaussian efficiency than the median. An alternative methodology is the winsorized mean [14], wherein the minimum and maximum values are substituted with their nearest remaining neighbors (11)-(12).
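Both estimators are straightforward to implement over a sorted sample. The sketch below trims or winsorizes k elements on each side; the sample is an arbitrary illustration:

```python
import numpy as np

def trimmed_mean(x, k=1):
    """Drop the k smallest and k largest values, then average the rest."""
    s = np.sort(np.asarray(x, dtype=float))
    return s[k:len(s) - k].mean()

def winsorized_mean(x, k=1):
    """Replace the k smallest values with the (k+1)-th smallest and the
    k largest with the (k+1)-th largest, then average all n values."""
    s = np.sort(np.asarray(x, dtype=float))
    s[:k] = s[k]
    s[-k:] = s[-k - 1]
    return s.mean()

sample = [1, 2, 2, 4, 5, 6, 7, 100]
print(trimmed_mean(sample))     # (2+2+4+5+6+7)/6
print(winsorized_mean(sample))  # (2+2+2+4+5+6+7+7)/8
```

Both values sit near the bulk of the data, while the plain mean of this sample (15.875) is dominated by the single large value.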
Although both of these methods provide a predictable degree of robustness, they are not always optimal: the predetermined number of elements to trim or winsorize may be either too small or too large for a given sample.

Exclusion of outliers
As an alternative scheme for evaluating central tendency, it may be advantageous to contemplate the prospect of excluding outliers. One of the most frequently employed methods is Tukey's criterion (13) [15].
This technique approximates a particular interval using quantiles and discards outliers that fall outside of said interval. Whilst this technique may be useful for academic evaluations of data, in practice, it is possible to mistakenly exclude regular elements that have been mislabeled as outliers, thus leading to a reduction in Gaussian efficiency. If we relax the criteria for the interval, we may begin to overlook significant outliers, which could have a detrimental impact on the robustness of the data analysis. There exist numerous methods for excluding outliers, which are discussed in detail in [3]. These techniques offer different levels of efficiency in addressing the problem.
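Tukey's criterion can be sketched as follows: compute the first and third quartiles, and discard everything outside the fences [Q1 − k·IQR, Q3 + k·IQR] with the conventional k = 1.5:

```python
import numpy as np

def tukey_filter(x, k=1.5):
    """Keep only the values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.quantile(x, [0.25, 0.75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return x[(x >= lo) & (x <= hi)]

sample = np.array([1, 2, 3, 4, 5, 6, 273])
print(tukey_filter(sample))  # the value 273 falls outside the fences
```

Relaxing k widens the interval and keeps more extreme values, illustrating the trade-off between robustness and Gaussian efficiency discussed above.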
However, defining outliers poses several challenges:
- Outlier-detection methods frequently lack stability;
- Assessing the true performance of such methods and constructing a reliable analysis model can be challenging;
- Outliers are an essential component of the distribution and contain valuable information for analysis.
Suppose that our distribution can be characterized as a Pareto distribution (Figure 5). For this type of distribution, the arithmetic mean (14) diverges for shape parameters α ≤ 1 and thus provides no useful information about the central tendency. Therefore, it can be assumed that approaches based on evaluating the mean or the median will not always yield meaningful insights.
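The divergence is a textbook property of the Pareto distribution: with scale x_m and shape α, the mean equals α·x_m/(α − 1) when α > 1 and is infinite otherwise. A minimal helper makes the case analysis explicit:

```python
import math

def pareto_mean(alpha: float, x_m: float = 1.0) -> float:
    """Mean of a Pareto(x_m, alpha) distribution; infinite for alpha <= 1."""
    if alpha <= 1.0:
        return math.inf  # the integral for the expectation diverges
    return alpha * x_m / (alpha - 1.0)

print(pareto_mean(2.0))  # 2.0
print(pareto_mean(1.0))  # inf
```

For α ≤ 1, sample means computed from such data never stabilize as the sample grows, which is why mean-based summaries fail here.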

Hodges-Lehmann evaluation
The formula for calculating the Hodges-Lehmann estimator is given by equation (15):

HL = median over all pairs i <= j of (x_i + x_j) / 2

The method can be described as follows: evaluate all pairs of sample values, calculate the arithmetic mean of each pair, and then take the median of the resulting pairwise means [4].
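A direct implementation of this definition enumerates the pairwise means (the Walsh averages, including each element paired with itself) and takes their median:

```python
import itertools
import numpy as np

def hodges_lehmann(x):
    """Median of the pairwise means (Walsh averages) over all pairs i <= j."""
    walsh = [(a + b) / 2 for a, b in
             itertools.combinations_with_replacement(x, 2)]
    return float(np.median(walsh))

print(hodges_lehmann([1, 2, 3]))                # 2.0
print(hodges_lehmann([1, 2, 3, 4, 5, 6, 273]))  # the outlier barely moves it
```

The naive enumeration costs O(n²) time; that is acceptable for the sample sizes used here, though faster algorithms exist for large n.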
The Hodges-Lehmann method is noted for its asymptotic Gaussian efficiency of 96%, which is practically equivalent to that of the arithmetic mean, as shown in Table 2. To evaluate the robustness, the breakdown point [18] method can be utilized, which characterizes the percentage of the sample that can be replaced with extremely large values without affecting the estimate itself. Robustness estimates according to the breakdown point for all the aforementioned methods are presented in Table 3. Table 3. Robustness for methods of estimating the central tendency according to the breakdown point.

                 Arithmetic mean   Median   Hodges-Lehmann estimator
Breakdown point  0%                50%      29%

The arithmetic mean has a breakdown point of 0% because a single value can significantly distort the entire estimate. For the median, the breakdown point is 50%, since up to half of the values in the sample can be modified arbitrarily without affecting the estimate. For the Hodges-Lehmann method, the breakdown point is 29%, which is considered a reasonably good indicator of robustness. If 29% of a sample consists of outliers, they should no longer be treated as outliers but rather as a distinct group that requires separate analysis and description. In summary, the Hodges-Lehmann method is a suitable replacement for both the arithmetic mean and the median when assessing the central tendency of normal distributions and distributions of arbitrary shape.
Moreover, other measures can be used to evaluate the central tendency; relevant use cases exist for each of the metrics presented. To select a method for assessing the central tendency, it is of utmost importance to analyze the statistical properties of the sample at hand. By taking into account all explicit and implicit business requirements, an informed decision can be made regarding the optimal method to employ.

Quantile estimates
Previously, we examined one example of a quantile estimate, namely the median. This metric bisects the probability density function of the distribution under investigation: if a value is randomly sampled from the distribution, there is a 50% likelihood of it falling within either of the two partitions. The next prominent measures in quantile estimation are the quartiles, a set of three values that demarcate the distribution into four partitions. Deciles, in turn, partition the distribution into ten equal portions, and percentiles divide it into one hundred equal parts [17] (Figure 6). Quantiles serve as a broad generalization of all the preceding definitions: a quantile of order p is a value such that a randomly sampled value from the distribution has probability p of being less than the p-quantile and probability 1 − p of exceeding it. It is worth noting that a distinction should be made between the actual quantiles of a distribution and the quantile estimates derived from a sample thereof.
Let us contemplate a sample extracted from a normal distribution (20), and subsequently compute the quartiles via a standardized approach - Table 4.
In addition to the conventional three quartiles, which coincide with the 25th, 50th, and 75th percentiles, the sample's minimum and maximum values can be included by appending the 0th and 100th percentiles, respectively. Formally, such notation is not admissible, though it is frequently adopted nonetheless. Let us compare the computed values of these quantities with the actual quartiles of the underlying distribution and take note of any discrepancies. It should be emphasized that the quartiles of a distribution are fixed values and represent a characteristic feature thereof, while the quantiles derived from a sample vary across experiments; the complete designation of these values is therefore not merely quantiles but quantile estimates. While quantile estimates approximate the genuine quantiles of a distribution based on the available sample, they are intrinsically limited in precision, since a finite sample size is generally insufficient to determine the true values exactly. At the same time, a significant number of diverse approaches can be devised to estimate the true value from a given sample. Broadly speaking, any function that maps a sample to a numerical value can be deemed a quantile estimate, and these functions are not fixed: they differ across statistical-analysis packages. A comprehensive list of such functions for mapping sample values to a numerical estimate is presented in reference [5]. All the formulas presented there are based on either a single order statistic or a linear combination of two consecutive order statistics, where the i-th order statistic denotes the i-th element of the sorted sample.
The principal benefit of these formulas is their simplicity and efficiency: in the average case, computing a quantile estimate with them has O(1) time complexity once the sample is sorted. In most contemporary statistical software, the prevalent choice is formula number 7 (Figure 8). Ultimately, a mathematical expectation is constructed over all order statistics, thereby providing the final estimation of quantiles. Frequently, these methodologies are effective for estimating the median; however, they may not always be effective in the extreme tails of distributions.
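NumPy exposes this family of estimators through the `method` argument of `np.quantile` (available in NumPy 1.22 and later); its default, "linear", corresponds to the widespread formula number 7, which interpolates between two consecutive order statistics at position h = (n − 1)·p:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# Formula 7 (the default): h = (n - 1) * p = 0.75, so interpolate
# three quarters of the way from x_(1) toward x_(2).
q25 = np.quantile(x, 0.25)

# A single-order-statistic estimator for comparison.
q25_low = np.quantile(x, 0.25, method="inverted_cdf")

print(q25, q25_low)  # 1.75 1.0
```

The gap between the two answers on the same sample illustrates why the choice of estimator matters, especially in the tails.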

Moments of quantile estimates
Formula (24) can be utilized to calculate confidence intervals for quantile estimates; this approach is commonly referred to as the Maritz-Jarrett method.
Upon comparison with the Harrell-Davis estimate, it can be inferred that the latter is an estimate of the first moment (24). The Maritz-Jarrett method, in turn, shows that the standard error of a quantile can be derived as the square root of the difference between the second moment and the square of the first moment, with confidence intervals computed from this standard error (25).
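A minimal sketch of the Harrell-Davis estimator referenced above: the p-quantile is estimated as a weighted sum of all order statistics, with weights given by increments of the Beta(p·(n+1), (1−p)·(n+1)) CDF. The sketch assumes SciPy is available for the Beta CDF:

```python
import numpy as np
from scipy.stats import beta

def harrell_davis(x, p):
    """Harrell-Davis estimate of the p-quantile of sample x."""
    s = np.sort(np.asarray(x, dtype=float))
    n = len(s)
    a, b = p * (n + 1), (1 - p) * (n + 1)
    grid = np.arange(n + 1) / n
    # Weight of the i-th order statistic: Beta CDF mass on [(i-1)/n, i/n].
    weights = beta.cdf(grid[1:], a, b) - beta.cdf(grid[:-1], a, b)
    return float(np.dot(weights, s))

print(harrell_davis([1, 2, 3, 4, 5], 0.5))  # 3.0 by symmetry
```

Because every order statistic contributes, the estimate is smoother than single-order-statistic formulas, at the cost of lower robustness in heavy-tailed samples.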
Nonetheless, it is pertinent to note that selecting an appropriate quantile estimator necessitates a thorough analysis of the efficiency and robustness of the various estimation techniques, as well as their suitability for specific targets such as the median, tail quantiles, or exponential smoothing. This aspect has not been explored in depth in this paper; however, if a suitable mathematical model is chosen based on the business requirements, sliding quantile estimates may be investigated in future work.