Characteristics of Network Traffic

What is self-similar traffic?
What is long-range dependance?
Estimating value of H
C++ code to measure Hurst parameter
Q & A


What is self-similar traffic?

[ Top ]
Consider a cumulative process Y(t) with stationary increments, and let Xt be its corresponding incremental process:
Xt = Y(t) - Y(t-1)(1)
For example Y(t) can represent the number of bytes arriving up to time t, and Xt can represent the number of bytes arriving in one unit of time.
The process Xs(m) is an aggregated process of Xt if:
Xs(m) = [Xsm-m+1 + Xsm-m+2 + ... + Xsm]/m(2)
Process Xt is said to be self-similar if Xt is indistinguishable from Xs(m). This is a very restrictive definition, especially considering the stochastic nature of the network traffic. Usually, second-order self-similarity is considered for the purposes of traffic description: auto-covariance functions of the original and aggregated processes should have the same values.
Let
γ(k) = E[(Xt - μ)(Xt+k - μ)](3)
and
γ(m)(k) = E[(Xs(m) - μ)(Xs+k(m) - μ)](4)
Then, the process Xt is exactly second-order self-similar if
γ(m)(k) = γ(k)(5)
and asymptotically second-order self-similar if
limm→∞ γ(m)(k) = γ(k)(6)
A convenient measure of a process′s distributional self-similarity is its Hurst parameter H. A process is self-similar with parameter H (0 < H < 1) if:
Y(t) = k-H Y(kt), ∀ k > 0, t ≥ 0(7)
i.e., the original and normalized aggregated processes should have the same distribution. Translating it to the corresponding stationary increment process, we get
Xs(m) = [Xsm-m+1 + Xsm-m+2 + ... + Xsm]/m = [Y(sm) - Y(sm-m)]/m = [Y(m) - Y(0)]/m(8)
From Eq.(7), it follows that Y(m) = mHY(1). Thus,
X(m) = [mH/m]*[Y(1) - Y(0)] = mH-1 X(9)
The self-similarity can be viewed as an ability of an aggregated process to "preserve" the burstiness of the original process, viz., the property of slowly decaying variance:
var(X(m)) ~ m2H-2(10)
In the context of network traffic, this means that aggregating traffic over large time intervals reduces the burstiness very slowly (compared to non-self-similar traffic).

What is long-range dependance?

[ Top ]
The property of long-range dependence refers to a non-summable auto-correlation function ρ(k):
ρ(k) = γ(k) / σ2 = E[(Xt - μ)(Xt+k - μ)] / E[(Xt - μ)2](11)
For 0 < H < 0.5 and 0.5 < H < 1
ρ(k) ~ H(2H-1)k2H-2, k → ∞(12)
It is clear, that if 0.5 < H < 1, then
ρ(k) = ∞(13)
i.e., the process is long-range dependant.
The long-range dependence results from a heavy-tailed distribution of the corresponding stochastic process. Heavy-tailness refers to the rate of tail decay of the complementary distribution function. In a heavy-tailed distribution, the decay obeys the power law:
P[X > x] ~ cx,    x → ∞, 1 < α < 2;(14)
As a result, the probability of an extremely large observation in the LRD process is non-negligible. In the context of network traffic, this means that extremely large bursts of data (packet trains) and extremely long periods of silence (inter-arrival times) will occur from time to time. This is one of the reasons why analytic models employing traditional negative exponential distribution often provide overly optimistic estimates for the delays and queue sizes – the probability of an extreme event is negligible.


Estimating value of H

[ Top ]
From Eq. (9) we get
var(X(m)) = m2H-2var(x)(15)
or
log[var(X(m)) / var(x)] = (2H-2) log(m)(16)
Eq. (16) suggests that log-log plot of variance vs. aggregation level m should have a linear slope of value 2H-2. The following figure shows a variance-time log-log plot which can be used to estimate the Husrt parameter of a stochastic process. In a log-log plot, the x-axis represents the logarithm of the aggregation parameter m and the y-axis represents the logarithm of the normalized variance. The LRD traffic plot shows a linear dependency (except tail region) with slope s value close to −0.4. From Eq. (16), we expect the log-log plot to have a slope s = 2H − 2. Thus, the Hurst parameter is estimated to be H = 1 + s/2 = 0.8.

The second plot called SRD traffic was obtained on the same generator, but using sub-streams with exponentially-distributed on/off periods. This distribution possesses no long-range dependence and its variance-time log-log plot is expected to have a slope of −1 (var(X(m)) ~ m-1).



C++ code to measure Hurst parameter

[ Top ]
List of files:

  1. _rand_MT.h
  2. _types.h
  3. _util.h
  4. avltree.h.h
  5. broadcom_pdf.h
  6. MersenneTwister.h
  7. stats.h
  8. PolyFit.h
  9. trf_gen_v3.h
  10. var_time.h
  11. vartime.cpp
This C++ program measures synthetic traffic produced by Self-Similar traffic Generator v.3. It builds the var-time log-log plot and estimates the Hurst parameter by finding linear least-squares fit to the measured graph and calculating its slope. The following table shows the output of the program.
POINT #WINDOWLOG WINDOW (X-coordinate)LRD VARLRD NORM VARLRD LOG NORM VAR (Y-coordinate)LRD LINEAR FIT (Y-coordinate)SRD VARSRD NORM VARSRD LOG NORM VAR (Y-coordinate)SRD LINEAR FIT (Y-coordinate)
00.000100.0628126100.007465510.0346508100.0377225
10.0001623780.2105260.05269570.838936-0.076271-0.07438660.02279690.657904-0.181838-0.170089
20.0002636650.4210530.04400540.700582-0.154541-0.1562390.01460550.421507-0.375196-0.3779
30.0004281330.6315790.03662070.583016-0.234319-0.2380910.009201520.26555-0.575854-0.585712
40.0006951930.8421050.03031680.482655-0.316363-0.3199430.005743940.165766-0.780503-0.793523
50.001128841.052630.02508270.399326-0.398673-0.4017950.003573640.103133-0.986603-1.00133
60.001832981.263160.02069730.329509-0.482133-0.4836470.002205380.063646-1.19623-1.20915
70.002976351.473680.01708620.272018-0.565402-0.5654990.00136420.0393699-1.40484-1.41696
80.004832931.684210.01417490.22567-0.646526-0.6473510.0008394180.0242251-1.61574-1.62477
90.00784761.894740.01172620.186685-0.72889-0.7292030.0005164350.014904-1.8267-1.83258
100.01274272.105260.009690330.154274-0.811708-0.8110560.0003171550.0091529-2.03844-2.04039
110.02069142.315790.008022010.127713-0.893763-0.8929080.0001947840.00562135-2.25016-2.2482
120.03359822.526320.006564470.104509-0.980847-0.974760.0001210960.00349476-2.45658-2.45601
130.05455592.736840.005499380.0875522-1.05773-1.056617.34E-050.00211776-2.67412-2.66383
140.08858672.947370.004625360.0736376-1.1329-1.138464.60E-050.00132829-2.87671-2.87164
150.1438453.157890.003761080.0598778-1.22273-1.220322.79E-050.000804806-3.09431-3.07945
160.2335723.368420.003115080.0495932-1.30458-1.302171.71E-050.000493271-3.30691-3.28726
170.3792693.578950.002685540.0427548-1.36902-1.384021.10E-050.000317988-3.49759-3.49507
180.6158483.789470.002157870.0343541-1.46402-1.465876.81E-060.000196651-3.7063-3.70288
19140.001850910.0294672-1.53066-1.547724.38E-060.000126402-3.89825-3.9107

LRD test: packet count2.46E+08
LRD test: traffic load0.26099
LRD test: expected Hurst parameter0.8
LRD test: measured Hurst parameter0.805601

SRD test: packet count2.36E+08
SRD test: traffic load0.249803
SRD test: expected Hurst parameter0.5
SRD test: measured Hurst parameter0.506448
[ Top ]