Statistical research and modeling of network traffic

The self-similarity properties of the traffic under consideration were checked on different time scales derived from the available daily traffic data. The tail heaviness of the self-similar traffic distribution was estimated by fitting a regression line to the complementary distribution function on a logarithmic scale. The value of the self-similarity parameter, determined from the heaviness of the distribution “tail”, confirmed the assumption of traffic self-similarity. Models that simulate real network traffic with a self-similar structure were reviewed, and tools for generating artificial traffic in accordance with these models were implemented. The artificial traffic generators were compared using a least squares criterion that measures how closely the point values of the artificial traffic follow the approximating function of the real traffic. Qualitative assessments of the generators, in the form of software implementation complexity, were also taken into account, although such assessments can be subjective. The comparative characteristics make it possible to choose the generators that most faithfully simulate real network traffic. The proposed sequence of methods for studying network traffic properties is necessary to understand its nature and to develop appropriate models that simulate real network traffic.


Introduction
Models for estimating the service characteristics of network traffic remain a relevant scientific problem. Reliable network traffic estimates are necessary when planning the development of telecommunication networks and when choosing differentiated service policies and computing resource characteristics that guarantee the required quality of service under the corresponding network load [1,2].
The heightened interest in the nature of network traffic is explained by studies showing the presence of long-term dependencies in the traffic, that is, its self-similarity. These changes in the traffic structure are associated with the implementation of the single multiservice network concept, which involves the integration of voice, data and multimedia [3,4].
To date, the theory of self-similar stochastic processes is not as well developed as the theory of Poisson processes. Given the known conclusions about network traffic self-similarity, the relevant tasks are the methods for studying it and the development of tools for generating artificial traffic that adequately reflects real heterogeneous network traffic [5].

Properties and Characteristics of Self-Similar Processes
Self-similarity describes a phenomenon in which certain statistical characteristics of a process are preserved when time is scaled. When a self-similar process is averaged over the time scale, there is no rapid "smoothing"; that is, the tendency to bursts persists.
The properties that characterize the self-similarity of a process are slowly decaying variance, long-term dependence, and a distribution with heavy "tails" [6].
The property of slowly decaying variance is that the variance of the sample mean decays more slowly than the inverse of the sample size, that is,
$$\mathrm{Var}(\bar{X}_n) \sim \sigma^2 n^{2H-2}, \quad n \to \infty,$$
where $\sigma^2$ is the variance of the process X(t); n is the sample size; H is the Hurst parameter (self-similarity parameter), 0.5 < H < 1. Note that for traditional random processes, the variance of the sample mean decreases inversely with the sample size:
$$\mathrm{Var}(\bar{X}_n) = \sigma^2 n^{-1}.$$
Long-term dependence means that the self-similar process has a hyperbolically decaying correlation function:
$$r(k) \sim k^{2H-2} L(k), \quad k \to \infty,$$
where L(k) is a slowly varying function at infinity, for which $\lim_{t \to \infty} L(tx)/L(t) = 1$ for all x > 0.
The property of having a distribution with a heavy "tail" is that the random variable X satisfies
$$P[X > x] \sim c\,x^{-\alpha}, \quad x \to \infty,$$
where 0 < α < 2 is the shape parameter of the distribution and c is a positive constant. The Pareto distribution is determined by the distribution function in the form
$$F(x) = 1 - \left(\frac{b}{x}\right)^{\alpha}, \quad x \ge b,$$
where b > 0 denotes the smallest possible value of X.
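As a rough numerical illustration of the slowly decaying variance property, the variance of the sample mean for n = 100 can be compared with the classical case. This is only a sketch: it assumes that the asymptotic relation $\mathrm{Var}(\bar{X}_n) \sim \sigma^2 n^{2H-2}$ holds exactly, and it borrows the value H = 0.855 that is estimated later in the text.

```latex
\frac{\mathrm{Var}(\bar{X}_{100})}{\sigma^2} \approx 100^{\,2 \cdot 0.855 - 2} = 100^{-0.29} \approx 0.26,
\qquad
\text{classical case: } \frac{\mathrm{Var}(\bar{X}_{100})}{\sigma^2} = 100^{-1} = 0.01 .
```

Under these assumptions, averaging 100 observations of the self-similar process leaves about a quarter of the original variance, compared with one hundredth for an uncorrelated process, which is consistent with bursts persisting after averaging.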

Methods for Investigating the Self-Similar Process
There are a number of techniques for verifying the self-similarity property of the process under investigation.
The self-similarity effect can be observed on graphs that illustrate a change in the time scale: the structure of the series obtained by averaging groups of elements remains the same as that of the original series. This observation is a prerequisite for assuming that the process under consideration has a self-similar structure, and it is the basis for further analysis.
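The change of time scale described above can be sketched in Python as block averaging. This is a minimal illustration, not the authors' tooling: the function names `aggregate` and `variance_by_scale` and the choice of block sizes are assumptions introduced here.

```python
import numpy as np


def aggregate(series, m):
    """Average consecutive non-overlapping groups of m elements."""
    x = np.asarray(series, dtype=float)
    n_blocks = x.size // m
    if n_blocks == 0:
        raise ValueError("block size m exceeds the series length")
    # Drop the incomplete trailing group, then average each group of m values.
    return x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)


def variance_by_scale(series, scales):
    """Variance of the aggregated series for each block size in scales."""
    return {m: float(aggregate(series, m).var()) for m in scales}
```

Plotting `aggregate(traffic, m)` for several values of m gives the graphs on different time scales. For a self-similar series the values returned by `variance_by_scale` would be expected to fall more slowly than 1/m.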
Next, the heaviness of the distribution "tail", the parameter α, must be estimated. To estimate α, the complementary distribution function $\bar{F}(x) = 1 - F(x) = P[X > x]$ is plotted on a log-log scale. The tangent of the inclination angle of the regression line for $\bar{F}(x)$ relative to the horizontal axis, taken in absolute value, is the value of the parameter α.
The properties of heavy-tailed distributions are as follows:
─ If α ≤ 2, then the distribution has infinite variance;
─ If α ≤ 1, then the distribution has an infinite mean;
─ As α decreases, an arbitrarily large portion of the density can be concentrated in the tail of the distribution.
In fact, a heavy tail means the presence of infinite variance; in other words, a random variable can take very large values, but with a very small probability.
The derived regression equation y = -1.29x + 0.5067 shows that α takes a value equal to 1.29 and α ∈ (0, 2), from which it follows that the traffic data distribution has the heavy tail property.
Knowing α, the self-similarity parameter can be found as H = (3 - α)/2. The calculated value H = (3 - 1.29)/2 = 0.855 also confirms the self-similarity of the process under consideration, since H ∈ (0.5, 1).
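The estimation procedure described above (a regression line through the complementary distribution function on a log-log scale, followed by H = (3 - α)/2) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `tail_index` and `hurst_from_alpha` and the parameter `tail_fraction` (the share of the largest observations used in the fit) are hypothetical.

```python
import numpy as np


def tail_index(samples, tail_fraction=0.1):
    """Estimate alpha as minus the slope of the log-log empirical CCDF."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    # Empirical complementary distribution function P[X > x_(i)] = 1 - i/n.
    ccdf = 1.0 - np.arange(1, n + 1) / n
    # Logarithms need positive values; the largest point has CCDF equal to 0.
    keep = (x > 0) & (ccdf > 0)
    x, ccdf = x[keep], ccdf[keep]
    start = int((1.0 - tail_fraction) * x.size)
    if x.size - start < 2:
        raise ValueError("not enough tail points for a regression")
    slope, intercept = np.polyfit(np.log10(x[start:]), np.log10(ccdf[start:]), 1)
    return -slope, intercept


def hurst_from_alpha(alpha):
    """Self-similarity parameter from the tail index, H = (3 - alpha) / 2."""
    return (3.0 - alpha) / 2.0
```

The result depends on which part of the tail is fitted, so `tail_fraction` would normally be varied to check that the estimate of α is stable.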
It is known that the Hurst parameter is a measure of persistence, that is, the tendency of a process to follow trends. The example of a tank with incoming and outgoing flows shows how the Hurst parameter was originally calculated. For the water level in the tank to remain stationary, the output flow must equal the average input flow, so that the tank never empties or overflows (Fig. 9).
Here R is the difference between the maximum and minimum values of S over N time units. Thus, R is the value that best describes the variability of x. The parameter H is related to the normalized range R∕S, where R is the range of traffic over the entire time series and S is the standard deviation.
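The link between H and the normalized range R∕S can be illustrated with a classical rescaled range estimate. The sketch below assumes the common textbook form of the method, which may differ in detail from the authors' calculation: R is the range of the cumulative deviations from the block mean, S is the standard deviation of the block, and H is the slope of log(R∕S) against log(block size). The names `rescaled_range`, `hurst_rs` and `min_block` are hypothetical.

```python
import numpy as np


def rescaled_range(block):
    """R/S statistic of one block of observations."""
    x = np.asarray(block, dtype=float)
    deviations = np.cumsum(x - x.mean())  # accumulated departure from the mean
    r = deviations.max() - deviations.min()
    s = x.std()
    return r / s if s > 0 else np.nan


def hurst_rs(series, min_block=8):
    """Slope of log(mean R/S) versus log(block size) over doubling block sizes."""
    x = np.asarray(series, dtype=float)
    sizes, rs_means = [], []
    size = min_block
    while size <= x.size // 2:
        n_blocks = x.size // size
        blocks = x[: n_blocks * size].reshape(n_blocks, size)
        rs = np.array([rescaled_range(b) for b in blocks])
        rs = rs[np.isfinite(rs)]
        if rs.size:
            sizes.append(size)
            rs_means.append(rs.mean())
        size *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return slope
```

For short series this estimator is known to be biased upward, so it is better treated as a cross-check of the value obtained from the tail index than as an exact measurement.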

Simulations of Self-Similar Traffic
The traditional analysis of telecommunication systems, which is based on the assumption of a Poisson flow, cannot accurately estimate the amount of computing resources and the system performance under bursty traffic [7]. Tools are therefore needed for generating artificial traffic that matches the properties of real network traffic and can be used when modeling the transmission, storage and processing of network traffic.
There are only a few models designed to simulate self-similar traffic. This work implements tools for generating artificial traffic based on the models listed in Table 2 [8]. The comparative characteristics make it possible to choose the generators that mimic real network traffic as plausibly as possible [9]. The comparison uses the least squares criterion Y, which measures how closely the point values of the artificial traffic are approximated by the approximating function of the real traffic:
$$Y = \sum_{i=1}^{n} \left( F(x_i) - y_i \right)^2,$$
where F(x_i) are the values of the approximating function at the points x_i of artificial traffic; y_i is the specified array of source traffic at the points x_i. Every 60th minute is taken as a point, which gives n = 24 points over a total of 24 hours.
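The comparison criterion can be sketched as a sum of squared differences over the 24 hourly points. This assumes that Y is the plain, unnormalized sum of squares; the function name `least_squares_criterion` and its argument names are hypothetical.

```python
import numpy as np


def least_squares_criterion(approx_values, traffic_values):
    """Y = sum over i of (F(x_i) - y_i)^2 for two equally sized arrays."""
    f = np.asarray(approx_values, dtype=float)
    y = np.asarray(traffic_values, dtype=float)
    if f.shape != y.shape:
        raise ValueError("both arrays must contain the same number of points")
    return float(np.sum((f - y) ** 2))
```

With one point per hour, both arguments would hold 24 values, and a smaller Y indicates a generator whose output lies closer to the approximating function.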
For the fractional Brownian motion model, the process is defined as
$$B_H(t) = \frac{1}{\Gamma(H + 1/2)} \int_{-\infty}^{t} K(t - t')\, dB(t'),$$
where Γ is the gamma function; H is the Hurst parameter; dB(t') are the independent random displacements of the Brownian particle at time t'; K(t - t') is the memory function of the system:
$$K(t - t') = \begin{cases} (t - t')^{H - 1/2}, & 0 \le t' \le t, \\ (t - t')^{H - 1/2} - (-t')^{H - 1/2}, & t' < 0. \end{cases}$$
In addition to the quantitative assessment, Table 2 also provides qualitative assessments in the form of the effort required to implement a software generator (the number of tunable parameters or the need for training). This assessment is subjective and difficult to quantify, for example as the time spent on programming the generator or as the complexity of the algorithm, since everything depends on the size of the package, the time spent setting up one parameter and a set of parameters, programming knowledge, and other factors. For example, although the neural network model showed the best result according to the Y criterion, most of the time was spent on choosing the neural network architecture and then tuning the model (3 days), whereas the fractal Gaussian noise model was implemented in 40 min, but its Y criterion is 17.5 times greater than that of the neural network model. Moreover, to simulate traffic with a different Hurst parameter, the procedure for choosing a neural network architecture and training it will be required again.
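To give a sense of why a fractal Gaussian noise generator can be implemented quickly, the sketch below draws a sample with a given Hurst parameter from the standard autocovariance of fractional Gaussian noise using a Cholesky factorization. This is one well-known exact method chosen here for illustration; the source does not state which algorithm the authors used. The names `fgn_autocovariance` and `generate_fgn` are hypothetical.

```python
import numpy as np


def fgn_autocovariance(k, hurst):
    """Unit-variance fGn autocovariance: 0.5(|k+1|^2H - 2|k|^2H + |k-1|^2H)."""
    k = np.abs(np.asarray(k, dtype=float))
    p = 2.0 * hurst
    return 0.5 * ((k + 1) ** p - 2 * k ** p + np.abs(k - 1) ** p)


def generate_fgn(n, hurst, seed=None):
    """Draw n points of fractional Gaussian noise (O(n^3), for small n only)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n)
    lags = np.abs(np.subtract.outer(idx, idx))  # matrix of |i - j|
    cov = fgn_autocovariance(lags, hurst)
    chol = np.linalg.cholesky(cov)  # cov = chol @ chol.T
    # Multiplying white noise by the Cholesky factor imposes the correlation.
    return chol @ rng.standard_normal(n)
```

A call such as `generate_fgn(512, 0.855, seed=0)` yields a zero-mean series that would still have to be shifted and scaled to the level of the measured traffic before it is compared by the Y criterion.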
Analysis of the above models makes it possible to concentrate on the last three presented in Table 2 and to use them in modeling telecommunication systems and networks and in the resulting global problems: planning the development of telecommunication networks, implementing differentiated services, and evaluating the characteristics of computing resources that guarantee the required quality of service for the corresponding traffic [10][11][12].

Conclusion
The article presents the results of a traffic study aimed at identifying its self-similarity property. The assumption that the traffic has a self-similar structure is based on examining the available data on different time scales. By constructing the complementary distribution function on a logarithmic scale, the heaviness of the distribution tail and the self-similarity parameter were estimated. The results made it possible to verify the self-similarity properties of the traffic in question according to the definition and thus to confirm the assumption of traffic self-similarity.
Such studies are necessary for understanding network traffic behavior and for developing models that simulate the arrival of real traffic in the network. A review of existing simulations of self-similar traffic was performed. It is assumed that a model can be tuned according to the Hurst parameter if recorded traces of real network traffic are available.