of a patient with epileptic seizures, the abundance of a species in an ecosystem, the temperature of
the earth over the centuries, the price of a commodity in the stock market, the number of visits to
a website, or any other time series. Whatever the time series A* is, it consists of N observations
collected with sampling interval Δt: A* = {A_i}, i = 1, 2, …, N. The time series is transformed into a matrix
whose elements are a time-lagged version of the original data:
$$\mathbf{A} = \begin{bmatrix}
A_1 & A_{1+\tau} & \cdots & A_{1+(m-1)\tau} \\
A_2 & A_{2+\tau} & \cdots & A_{2+(m-1)\tau} \\
\vdots & \vdots & \ddots & \vdots \\
A_{N-(m-1)\tau} & A_{N-(m-2)\tau} & \cdots & A_N
\end{bmatrix} \quad [10.49]$$
Each row vector of the matrix [10.49] is a single point in the phase space. The integer time delay τ
is a dimensionless parameter defined as the actual time delay, τ_R, divided by the sampling interval,
Δt. The parameter m is the embedding dimension of the phase space. The number of observations, N, must be much larger than τ and m.
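As an illustrative sketch (not part of the original text), the matrix [10.49] can be built in a few lines of Python with NumPy; the function and variable names below are our own choices:

```python
import numpy as np

def delay_embedding(A, m, tau):
    """Build the time-delay embedding matrix of equation [10.49].

    A   : 1-D array of N observations sampled every Delta t
    m   : embedding dimension (number of columns)
    tau : integer time delay, in units of the sampling interval
    Returns an (N - (m - 1) * tau) x m matrix whose i-th row is the
    phase-space point (A_i, A_{i+tau}, ..., A_{i+(m-1)tau}).
    """
    N = len(A)
    n_rows = N - (m - 1) * tau
    return np.column_stack([A[k * tau : k * tau + n_rows] for k in range(m)])

# Example: ten observations embedded with m = 3 and tau = 2
A = np.arange(1.0, 11.0)       # A_1, ..., A_10
X = delay_embedding(A, 3, 2)   # first row (1, 3, 5), last row (6, 8, 10)
```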
10.8.1.1 Time Delay τ
First, we need to determine the time delay τ. Its calculation must be carried out carefully. If we
choose a τ that is too small, adjacent components A_i and A_{i+τ} are too highly correlated to
serve as independent coordinates. If τ is too large, neighboring components are nearly
uncorrelated. We can use two methods for the determination of τ: one based on the calculation of
the autocorrelation and the other based on the calculation of the mutual information. The
autocorrelation of the matrix A is determined by using the following algorithm:
$$c(\tau) = \frac{1}{N}\sum_{i=1}^{N} A_i A_{i+\tau} \quad [10.50]$$
A good choice for the time delay is the smallest τ value for which (Lai and Ye 2003)

$$\frac{c(\tau)}{c(0)} \leq \frac{1}{e} \quad [10.51]$$
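A minimal sketch of this criterion (the function name is ours, and the sum in [10.50] is evaluated over the N − τ available pairs):

```python
import numpy as np

def delay_from_autocorrelation(A, tau_max):
    """Return the smallest tau satisfying c(tau)/c(0) <= 1/e, as in [10.51]."""
    A = np.asarray(A, dtype=float)
    N = len(A)
    c0 = np.dot(A, A) / N                          # c(0) from [10.50]
    for tau in range(1, tau_max + 1):
        c_tau = np.dot(A[: N - tau], A[tau:]) / N  # c(tau) from [10.50]
        if c_tau / c0 <= 1.0 / np.e:
            return tau
    return None  # the criterion is not met within tau_max
```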
The mutual information for a partition of real numbers in a certain number of boxes is given by:
$$I(\tau) = \sum_{ij} p_{ij}(\tau)\,\ln\frac{p_{ij}(\tau)}{p_i\,p_j} \quad [10.52]$$
where:
p_i is the probability of finding a time series value in the i-th interval,
p_j is the probability of finding a value in the j-th interval,
p_ij(τ) is the joint probability that an observation falls in the i-th interval and the observation a time τ later falls into the j-th.
The calculation is repeated for different values of τ. The best choice for the time delay is the τ value for which the mutual information has the first marked minimum (Fraser and Swinney 1986).
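The following sketch estimates [10.52] with a simple two-dimensional histogram; the number of boxes and the function name are our assumptions:

```python
import numpy as np

def mutual_information(A, tau, n_boxes=16):
    """Histogram estimate of the mutual information I(tau) in [10.52] (nats)."""
    A = np.asarray(A, dtype=float)
    x, y = A[:-tau], A[tau:]
    p_ij, _, _ = np.histogram2d(x, y, bins=n_boxes)
    p_ij /= p_ij.sum()                  # joint probabilities p_ij(tau)
    p_i = p_ij.sum(axis=1)              # marginal probabilities p_i
    p_j = p_ij.sum(axis=0)              # marginal probabilities p_j
    mask = p_ij > 0                     # skip empty boxes to avoid log(0)
    return np.sum(p_ij[mask] * np.log(p_ij[mask] / np.outer(p_i, p_j)[mask]))

# Scan tau and pick the first marked minimum of I(tau):
# I_vals = [mutual_information(A, t) for t in range(1, 51)]
```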
10.8.1.2 Embedding Dimension m
For the determination of the embedding dimension m, we can use the false nearest neighbor method
(Kennel et al. 1992). For each point A_i in the m-dimensional phase space, the method looks for its
nearest neighbor A_j and calculates the square of the Euclidean distance:
$$R_{m,i}^2 = \sum_{k=0}^{m-1}\left(A_{i+k\tau} - A_{j+k\tau}\right)^2 \quad [10.53]$$
Then, it goes from dimension m to m + 1. A new coordinate is added to each vector A_i and A_j, and the distance becomes

$$R_{m+1,i}^2 = R_{m,i}^2 + \left(A_{i+m\tau} - A_{j+m\tau}\right)^2 \quad [10.54]$$
Any neighbor for which
$$\left(\frac{R_{m+1,i}^2 - R_{m,i}^2}{R_{m,i}^2}\right)^{1/2} = \frac{\left|A_{i+m\tau} - A_{j+m\tau}\right|}{R_{m,i}} > R_t \quad [10.55]$$
where R_t is a given heuristic threshold, is designated as false. When the number of false nearest
neighbors is zero, m is the suitable embedding dimension. The value of R_t is arbitrary. Cao (1997)
proposed another algorithm to avoid the arbitrary choice of R_t. Cao's algorithm is:
$$E(m) = \frac{1}{N - m\tau}\sum_{i=1}^{N-m\tau}\frac{R_{m+1,i}}{R_{m,i}} \quad [10.56]$$
E(m) depends on m and τ. In [10.56], we must consider only the distances of the nearest neighbors.
To investigate how E(m) varies when the embedding dimension increases by one unit, Cao
defined E1(m) = E(m + 1)/E(m). If the time series comes from an attractor, E1(m) stops changing when m is greater than a value m_0; then m_0 + 1 is the best estimate of the minimum embedding
dimension. For a stochastic time series, E1(m) never attains a saturation value as m increases.
Sometimes it is difficult to resolve whether E1(m) is slowly increasing or has stopped changing when
m is large. Another quantity, E2(m), circumvents this difficulty. First, we calculate

$$E^{*}(m) = \frac{1}{N - m\tau}\sum_{i=1}^{N-m\tau} R_{m,i} \quad [10.57]$$
Then, we define E2(m) = E*(m + 1)/E*(m). For stochastic data, since the future values are independent of the past values, E2(m) will be equal to 1 for any m. On the other hand, for deterministic time series, E2(m) is certainly related to m. As a result, it cannot be a constant for all m; there must exist some m's such that E2(m) ≠ 1. E2(m) is useful to distinguish deterministic from random time series.
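A brute-force sketch of Cao's quantities [10.56]-[10.57]; the nearest-neighbor search, the Euclidean norm, and the function name are our choices, and duplicate points (zero distances) are not handled:

```python
import numpy as np

def cao_E_Estar(A, m, tau):
    """E(m) and E*(m) of [10.56]-[10.57] for a scalar time series A."""
    A = np.asarray(A, dtype=float)
    n = len(A) - m * tau                     # points with A_{i+m*tau} available
    # m-dimensional delay vectors, as in [10.49]
    X = np.column_stack([A[k * tau : k * tau + n] for k in range(m)])
    R_m = np.empty(n)                        # nearest-neighbor distance, dim m
    R_m1 = np.empty(n)                       # same pair in dimension m+1, [10.54]
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                        # exclude the point itself
        j = int(np.argmin(d))                # nearest neighbor of point i
        R_m[i] = d[j]
        R_m1[i] = np.sqrt(d[j] ** 2 + (A[i + m * tau] - A[j + m * tau]) ** 2)
    return np.mean(R_m1 / R_m), np.mean(R_m)   # [10.56] and [10.57]

# E1(m) = E(m+1)/E(m) saturates beyond m_0 when the data come from an attractor;
# E2(m) = E*(m+1)/E*(m) stays close to 1 for all m only for stochastic data.
```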
10.8.1.3 Lyapunov Exponents
After building the phase space of a time series, an essential indicator of the possible presence of
deterministic chaos is the exponential divergence of initially nearby trajectories. Such divergence
can be probed by determining the Lyapunov exponents. The Rosenstein (Rosenstein et al. 1993)
and Kantz (1994) methods allow the calculation of the largest Lyapunov exponent.
The Rosenstein method locates the nearest neighbor of each point in the phase space. The
nearest neighbor of A_j is the point A_ĵ that minimizes the distance

$$d_j(0) = \min_{A_{\hat{j}}}\left\| A_j - A_{\hat{j}} \right\| \quad [10.58]$$
where ‖·‖ denotes the Euclidean norm. The j-th pair of nearest neighbors diverges at a
rate approximately proportional to the largest Lyapunov exponent (λ). In fact, the distance between the two points,
after a time iΔt (where Δt is the sampling period of the time series), is
$$d_j(i) \sim d_j(0)\,e^{\lambda\,(i\Delta t)} \quad [10.59]$$
In logarithmic form, equation [10.59] becomes
$$\ln d_j(i) = \ln d_j(0) + \lambda\,(i\Delta t) \quad [10.60]$$
The linear relation [10.60] can be defined for all the points in the phase space. Finally, the largest
Lyapunov exponent is calculated using a least-squares fit to the “average” line defined by
$$y(i) = \frac{1}{\Delta t}\left\langle \ln d_j(i)\right\rangle \quad [10.61]$$

where ⟨·⟩ denotes the average over all j.
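A compact sketch of the Rosenstein procedure [10.58]-[10.61]; for brevity it omits the temporal-exclusion window used in the original paper, and the names are ours:

```python
import numpy as np

def rosenstein_lyapunov(X, dt, n_steps):
    """Largest Lyapunov exponent from the embedded data X (one row per
    phase-space point), following [10.58]-[10.61] (sketch)."""
    n = len(X) - n_steps
    nbr = np.empty(n, dtype=int)
    for j in range(n):                       # nearest neighbor of each point [10.58]
        d = np.linalg.norm(X[:n] - X[j], axis=1)
        d[j] = np.inf
        nbr[j] = np.argmin(d)
    y = np.empty(n_steps)
    idx = np.arange(n)
    for i in range(n_steps):                 # average ln d_j(i), as in [10.61]
        d_i = np.linalg.norm(X[idx + i] - X[nbr + i], axis=1)
        y[i] = np.mean(np.log(d_i[d_i > 0]))
    t = np.arange(n_steps) * dt
    slope, _ = np.polyfit(t, y, 1)           # least-squares fit of [10.60]
    return slope
```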
Kantz’s method consists in computing
$$D(\Delta t) = \frac{1}{N}\sum_{j=1}^{N}\ln\left(\frac{1}{\left|U_j\right|}\sum_{i\in U_j}\operatorname{dist}\left(T_j,T_i;\Delta t\right)\right) \quad [10.62]$$
where N is the length of the time series. The term dist(T_j, T_i; Δt) is the distance between a reference
trajectory T_j and a neighboring trajectory T_i after the relative time, Δt, determined as
$$\operatorname{dist}\left(T_j,T_i;\Delta t\right) = \left\| A_j - A_i \right\| \quad [10.63]$$
where A_i is a neighbor of A_j after Δt, and U_j is the ε-neighborhood of A_j, i.e., the set of all the delay vectors of the series having a distance less than or equal to ε with respect to A_j. The quantity D(Δt) is calculated for different sizes of ε (ranging from a minimum given by (data interval)/1000 to a
maximum given by (data interval)/100) and for different embedding dimensions, m. The slope of
D versus Δt is an estimate of the maximal Lyapunov exponent.
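A direct sketch of [10.62] for a single neighborhood size ε; the O(N²) neighbor search and the names are our simplifications:

```python
import numpy as np

def kantz_D(X, eps, n_steps):
    """D(Delta t) of [10.62] for embedded data X and neighborhood size eps."""
    n = len(X) - n_steps
    D = np.zeros(n_steps)
    n_ref = 0                                    # reference points with neighbors
    for j in range(n):
        d0 = np.linalg.norm(X[:n] - X[j], axis=1)
        U = np.flatnonzero((d0 <= eps) & (np.arange(n) != j))  # U_j
        if U.size == 0:
            continue
        n_ref += 1
        for i in range(n_steps):
            dist = np.linalg.norm(X[U + i] - X[j + i], axis=1)  # [10.63]
            D[i] += np.log(np.mean(dist))
    return D / n_ref   # the slope of D versus i*dt estimates the maximal exponent
```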
10.8.1.4 Kolmogorov-Sinai Entropy
In a phase space of dimension m, a hypersphere of initial conditions, having hypervolume V, evolves
into a hyper-ellipsoid whose principal axes change at rates given by the Lyapunov exponents. In any
chaotic conservative system, the so-called chaotic Hamiltonian systems, the sum of the positive
Lyapunov exponents is counterbalanced by the sum of the negative Lyapunov exponents. In fact,
according to Liouville's theorem, the total hypervolume V of a Hamiltonian system remains
constant during its evolution in time. In the case of dissipative chaos, when there is degradation
of energy due to friction, viscosity, or other processes, the total hypervolume V shrinks and the
system remains confined in a strange attractor. The sum of the negative Lyapunov exponents
outweighs the sum of the positive ones. This does not mean that lengths contract in all directions:
intuitively, some directions stretch, provided that some others contract so much that the final
hypervolume is smaller than the initial one. The exponential separation of close trajectories, which
is a feature of any chaotic dynamics, takes place along the directions characterized by positive
Lyapunov exponents (Eckmann and Ruelle 1985).
The sum of all the positive Lyapunov exponents equals the Kolmogorov-Sinai entropy (S_K):

$$S_K = \sum_{\lambda_i > 0} \lambda_i \quad [10.64]$$
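For example, using the commonly quoted Lyapunov spectrum of the classical Lorenz attractor (approximately +0.906, 0, and −14.57), equation [10.64] reduces to summing the single positive exponent:

```python
# Kolmogorov-Sinai entropy as the sum of positive Lyapunov exponents [10.64].
# The spectrum below holds for the Lorenz attractor with the classical
# parameters (sigma = 10, rho = 28, beta = 8/3); values are approximate.
lyapunov_spectrum = [0.906, 0.0, -14.57]
S_K = sum(l for l in lyapunov_spectrum if l > 0)
print(S_K)   # ~0.906 > 0: a signature of chaos
```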
The Kolmogorov-Sinai entropy is related to the Information Entropy (Frigg 2004). It has been
formulated to describe the uncertainty we have in predicting chaotic dynamics in the phase space.
A dynamical system in a phase space of total hypervolume M is like an information source. This
is evident if we partition M into a certain number of rigid, non-intersecting boxes, β = {b_i, i = 1, …, k}, covering the entire phase space. The evolution of a differentiable dynamical system is described by
m first-order differential equations of the type

$$\frac{dA(t)}{dt} = f\big(A(t)\big) \quad [10.65]$$
in the case of continuous time, or by a map
$$A(n+1) = f\big(A(n)\big) \quad [10.66]$$
in the case of discrete time.
If we start from an initial condition A(0), the system traces a trajectory during its evolution. In other
words, the system generates a string that is the list of boxes it visits.9 For example, if we consider a partition consisting of two boxes, β = {b_1, b_2}, then the dynamical system generates a string of
the same sort as the one we obtain from a source that sends only binary digits. As it occurs when
we receive an unknown message, also in the case of chaotic dynamics we are not acquainted with the box the system will visit next.
9 The definition of S_K is grounded on the idea that the volume in the phase space can be interpreted as a probability. The phase space is partitioned into a certain number of rigid, non-intersecting boxes, β = {b_i, i = 1, …, k}. Together, they cover the entire phase space of hypervolume M. It is worthwhile noticing that if β = {b_i, i = 1, …, k} is a partition of M, then fβ = {fb_i | i = 1, …, k} is one as well, where f is the function [10.65] or [10.66] describing the evolution of the system. All we know at each iteration (for discrete systems) or after each fixed time step (for a continuous system) is in which box the trajectory (A) lies. The measurements that A lies in box b_0 at time t_0 and in box b_1 at time t_1 tell us that A, in fact, lies in the region b_0 ∩ f^{-1}(b_1), i.e., in the intersection of b_0 with the pre-image of b_1. The pre-image f^{-1}(b_i) of each b_i is the set of all points that will be mapped into b_i after a single iteration or time step. The intersection of the two sets b_i and f^{-1}(b_j) gives a finer partition β_1 of the attractor, β_1 = {b_i ∩ f^{-1}(b_j)}, and the entropy of this partition is S(β_1) = −Σ_{ij} p_{ij}^{1} log p_{ij}^{1}, where p_{ij}^{1} is the probability of finding a point in the box b_i ∩ f^{-1}(b_j). The Kolmogorov-Sinai entropy is the supremum, taken over all choices of the initial partition β_0, of the limit over an infinite number of iterations or time steps (N → ∞): S_K = sup(lim_{N→∞} (1/N) S(β_N)), with β_N = {b_0 ∩ f^{-1}b_1 ∩ f^{-2}b_2 ∩ … ∩ f^{-N}b_N}. A positive value of S_K is proof of chaos.
Due to the restrictions on the accuracy with which the initial conditions can be defined, we may
not know precisely what the system's initial state is, and this uncertainty is amplified by the
chaotic dynamics as time goes on. Therefore, we gain information when we learn that the system is
in one box, say b_1, rather than in the other, b_2; this information amounts to one bit. The
replacement of the two-box partition with one having more boxes is the analogue of the transition
from an information source with two symbols to one with more symbols. If a dynamical system has
a positive S_K, then, whatever the past history of the system, we are not able to predict with
certainty in which box of the partition the system's state will lie next. Moreover, the magnitude of
S_K measures how badly our predictions of the future will fail: the greater the Lyapunov exponents,
the larger the Kolmogorov-Sinai entropy, and the more uncertain we are about the future.
10.8.1.5 Correlation Dimension
Dissipative systems with chaotic dynamics generate attractors that are strange. Strange attractors
have fractal structures and typically non-integer dimensions. A way of determining the strangeness
of an attractor consists in measuring the correlation dimension D_C by the Grassberger-Procaccia
(1983) algorithm, which requires the calculation of the correlation sum:
$$C(m,\varepsilon) = \frac{1}{N'^2}\sum_{i\neq j}\Theta\left(\varepsilon - \left\| A_i - A_j \right\|\right) \quad [10.67]$$
where:
ε is the size of the cell (whose minimum is given by (data interval)/1000 and whose maximum is the (data interval)),
N′ = N − τ(m − 1),
Θ is the Heaviside step function, which is 1 when ‖A_i − A_j‖ ≤ ε and 0 when ‖A_i − A_j‖ > ε.
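A direct O(N′²) sketch of the correlation sum [10.67], assuming X is the embedding matrix built as in [10.49]; plotting ln C against ln ε and reading the slope in the scaling region then estimates D_C:

```python
import numpy as np

def correlation_sum(X, eps):
    """C(m, eps) of [10.67] for the N'-row embedding matrix X."""
    n = len(X)
    count = 0
    for i in range(n - 1):
        d = np.linalg.norm(X[i + 1 :] - X[i], axis=1)
        count += np.sum(d <= eps)     # Theta(eps - ||A_i - A_j||)
    return 2.0 * count / n ** 2       # each unordered i != j pair counted once, doubled
```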
Then, the correlation sum is used to determine the correlation dimension through equation [10.68]: