Distinguishing Noise from Chaos

Chaotic systems share with stochastic processes several properties that make them almost indistinguishable. In this communication we introduce a representation space, to be called the complexity-entropy causality plane. Its horizontal and vertical axes are suitable functionals of the pertinent probability distribution, namely, the entropy of the system and an appropriate statistical complexity measure, respectively. These two functionals are evaluated using the Bandt-Pompe recipe to assign a probability distribution function to the time series generated by the system. Several well-known model-generated time series, usually regarded as being of either stochastic or chaotic nature, are analyzed so as to illustrate the approach. The main achievement of this communication is the possibility of clearly distinguishing between them in our representation space, something that is rather difficult otherwise.

Although of quite different physical origin, time series arising from chaotic systems (CS) share with those generated by stochastic processes (SP) several properties that make them almost indistinguishable: (1) a wide-band power spectrum (PS), (2) a delta-like autocorrelation function, and (3) an irregular behavior of the measured signals. In fact, this similarity has made it possible to replace SP by CS in many practical applications. We attempt here to distinguish between SP and CS by recourse to an appropriate representation in which the starring role is played by a so-called complexity measure. We deal with well-known models that generate time series according to prespecified rules. This is to be contrasted with the situation posed by real data, which always possess a stochastic component due to omnipresent dynamical noise [1,2]. Indeed, Wold proved [1] that any (stationary) time series can be decomposed into two different parts. The first (deterministic) part can be exactly described by a linear combination of its own past; the second part is a moving-average component (in general of infinite order). Hence it may seem superfluous to ask whether a time series generated by ''natural processes'' is deterministic, chaotic, or stochastic. However, bearing Wold's theorem [2] in mind, it makes sense to ask, with respect to the deterministic part (predictable from the past), whether (i) it is dominant vis-à-vis the unpredictable stochastic part or (ii) it is of a regular or chaotic nature. CS always produce time series with a physical structure, and looking for this physical structure is our leitmotif. To this end, several statistical complexity measures based on the notion of the so-called ''disequilibrium'' have recently been introduced in the literature [3][4][5].
In Ref. [5], a version of the statistical complexity measure was advanced that is (i) able to grasp essential details of the dynamics, (ii) an intensive quantity, and (iii) capable of discerning among different degrees of periodicity and chaos. This measure, to be referred to as the intensive statistical complexity measure $C_{JS}[P]$, is a functional of the probability distribution $P$ associated with the time series. It reads
$$ C_{JS}[P] = Q_J[P, P_e]\, H_S[P], \tag{1} $$
where, to the probability distribution $P = \{p_j;\ j = 1, \dots, N\}$, one associates the normalized entropic measure
$$ H_S[P] = S[P]/S_{\max}, \qquad S_{\max} = S[P_e] = \ln N, \qquad 0 \le H_S \le 1. $$
We denote by $P_e = \{1/N, \dots, 1/N\}$ the uniform distribution, while $S[P] = -\sum_{j=1}^{N} p_j \ln p_j$ stands for Shannon's entropy. $Q_J$, the disequilibrium referred to above, is defined in terms of the extensive Jensen-Shannon divergence (which induces a squared metric, in contrast to the Kullback-Leibler divergence [5]) as follows:
$$ Q_J[P, P_e] = Q_0\, \mathcal{J}[P, P_e], \qquad \mathcal{J}[P, P_e] = S\!\left[\frac{P + P_e}{2}\right] - \frac{S[P]}{2} - \frac{S[P_e]}{2}, $$
with $Q_0$ a normalization constant ($0 \le Q_J \le 1$) equal to the inverse of the maximum possible value of $\mathcal{J}[P, P_e]$, which is attained when a single $p_j$ equals 1:
$$ Q_0 = -2 \left\{ \frac{N+1}{N} \ln(N+1) - 2 \ln(2N) + \ln N \right\}^{-1}. $$
The disequilibrium $Q_J$ is an intensive quantity that differs from zero only if there exist ''privileged,'' or ''more likely,'' states among the accessible ones [3][4][5]. A critical point is the use of the methodology proposed by Bandt and Pompe [6] for evaluating $P$. Given a time series $\{x_t;\ t = 1, \dots, M\}$ and an embedding dimension $D$, each of the $M - D + 1$ overlapping vectors $(x_s, \dots, x_{s+D-1})$ is assigned the ordinal pattern (permutation) $\pi$ of $\{0, \dots, D-1\}$ that sorts its components, and the probability of each of the $N = D!$ possible patterns is estimated as
$$ p(\pi) = \frac{\#\{ s \mid 1 \le s \le M - D + 1;\ (x_s, \dots, x_{s+D-1}) \text{ has type } \pi \}}{M - D + 1}. $$
In this expression, the symbol $\#$ stands for ''number.'' The Bandt-Pompe method [6] for evaluating the probability distribution $P$ is based on the details of the attractor-reconstruction procedure, and causal information is duly incorporated in the construction process that yields $P \in \Omega$ (with $\Omega$ the probability space) [8]. A notable Bandt-Pompe result is a clear improvement in the performance of the information quantifiers obtained using their P-generating algorithm. One must assume with them that the system fulfills a very weak stationarity condition (for $k < D$, the probability for $x_t < x_{t+k}$ should not depend on $t$) [6] and that enough data are available for a correct attractor reconstruction. The advantages of the Bandt-Pompe method reside in (a) its simplicity, (b) the concomitant extremely fast calculation process, (c) its robustness, and (d) its invariance with respect to nonlinear monotonic transformations. Also, it can be applied to any type of time series (regular, chaotic, noisy, or real-world) [6]. Finally, it is important to remark that calculations made with the Bandt-Pompe prescription are robust in the presence of observational and dynamical noise [6]. If we allow $D$ to grow without bounds, significant (and, from a theoretical viewpoint, important) consequences ensue [10]. We remark that the above statistical complexity measure quantifies not only randomness but also the degree of correlational structures. Consequently, it is not a trivial function of the entropy: for a given $H_S$ value, there exists a range of possible $C_{JS}$ values between a minimum $C_{\min}$ and a maximum $C_{\max}$ [3]. Thus, evaluating $C_{JS}$ provides an important additional piece of information regarding the peculiarities of a probability distribution, not already carried by the entropy. This additional information disappears if, for example, a probability density function (PDF) based on histograms is used.
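The Bandt-Pompe recipe and the complexity of Eq. (1) can be sketched in Python with NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the optional time delay `tau` (the text works with consecutive samples, i.e., `tau = 1`), and the stable-sort rule for breaking ties between equal values are choices made here. Patterns that never occur are kept as zero-probability entries, so that $N = D!$ enters both $H_S$ and $Q_0$.

```python
import math

import numpy as np


def ordinal_distribution(x, D=6, tau=1):
    """Bandt-Pompe probabilities p(pi) of the ordinal patterns of order D.

    Returns an array of length D! with the relative frequency of every
    observed pattern, padded with zeros for patterns that never occur.
    The order of the entries is irrelevant for H_S and C_JS.
    """
    x = np.asarray(x, dtype=float)
    span = (D - 1) * tau + 1
    if x.size < span:
        raise ValueError("time series shorter than one embedding vector")
    # Each row is an embedding vector (x_s, x_{s+tau}, ..., x_{s+(D-1)tau})
    vectors = np.lib.stride_tricks.sliding_window_view(x, span)[:, ::tau]
    # The permutation that sorts a vector is its ordinal pattern; a stable
    # sort ranks equal values by their order of appearance
    patterns = np.argsort(vectors, axis=1, kind="stable")
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    probs = np.zeros(math.factorial(D))
    probs[: counts.size] = counts / counts.sum()
    return probs


def shannon(p):
    """Shannon entropy S[P] = -sum p_j ln p_j, with 0 ln 0 = 0."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def entropy_complexity(p):
    """Return (H_S, C_JS) for a probability vector p, following Eq. (1)."""
    p = np.asarray(p, dtype=float)
    N = p.size
    pe = np.full(N, 1.0 / N)  # uniform distribution P_e
    h = shannon(p) / np.log(N)  # normalized entropy H_S
    js = shannon((p + pe) / 2) - shannon(p) / 2 - shannon(pe) / 2
    q0 = -2.0 / ((N + 1) / N * np.log(N + 1) - 2 * np.log(2 * N) + np.log(N))
    return h, q0 * js * h
```

For instance, `entropy_complexity(ordinal_distribution(x, D=6))` returns the pair $(H_S, C_{JS})$ that locates a series `x` in the CH plane; a strictly monotonic series yields $(0, 0)$, because a single ordinal pattern carries zero entropy.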
A general procedure for obtaining the bounds $C_{\min}$ and $C_{\max}$ corresponding to the family of statistical complexity measures is given by Martín, Plastino, and Rosso in Ref. [4].
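The boundary curves can be traced numerically by sweeping one-parameter families of distributions. The sketch below assumes the families as we understand the construction of Ref. [4] (this should be checked against that reference): the lower bound $C_{\min}$ comes from distributions with one probability $p \in [1/N, 1]$ and the other $N-1$ entries equal, and the upper bound $C_{\max}$ from distributions with $n$ zero entries, one entry $p \in [0, 1/(N-n)]$, and the remaining $N-n-1$ entries equal, for $n = 0, \dots, N-2$. Function names and grid sizes are hypothetical.

```python
import numpy as np


def h_and_c(p):
    """Normalized Shannon entropy H_S and complexity C_JS of probability vector p."""
    N = p.size
    pe = np.full(N, 1.0 / N)

    def S(q):
        q = q[q > 0]
        return -np.sum(q * np.log(q))

    h = S(p) / np.log(N)
    js = S((p + pe) / 2) - S(p) / 2 - S(pe) / 2
    q0 = -2.0 / ((N + 1) / N * np.log(N + 1) - 2 * np.log(2 * N) + np.log(N))
    return h, q0 * js * h


def c_min_curve(N, points=200):
    """Assumed lower boundary: one entry p in [1/N, 1], the other N-1 entries equal."""
    curve = []
    for p in np.linspace(1.0 / N, 1.0, points):
        q = np.full(N, (1.0 - p) / (N - 1))
        q[0] = p
        curve.append(h_and_c(q))
    return np.array(curve)


def c_max_curve(N, points=50):
    """Assumed upper boundary: n zero entries, one entry p in [0, 1/(N-n)],
    and the remaining N-n-1 entries equal, for n = 0, ..., N-2."""
    curve = []
    for n in range(N - 1):
        m = N - n  # number of entries allowed to be nonzero (at least 2)
        for p in np.linspace(0.0, 1.0 / m, points):
            q = np.zeros(N)
            q[0] = p
            q[1:m] = (1.0 - p) / (m - 1)
            curve.append(h_and_c(q))
    return np.array(curve)
```

Each returned row is an $(H_S, C_{JS})$ pair; for $D = 6$ one would call both functions with $N = 720$. The last family of the upper curve ($n = N - 2$) contains only two nonzero probabilities, i.e., a binary PDF.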
One may content oneself with using just $Q_J$ instead of $C_{JS}$. In so doing, however, the advantage of having a quantifier guaranteeing (as $C_{JS}$ does) (1) a zero value for both regular and completely random series and (2) maximum values for systems with ''immersed'' (or hidden) structures [4] would be lost. In order to study the time evolution of $C_{JS}$, a diagram of $C_{JS}$ versus $H_S$, the CH plane, can be used (in this case, $H_S$ can be regarded as an arrow of time [12]). This kind of diagram has also been used to study changes in a system's dynamics originated by modifications of some characteristic parameters [3,5,13-16].
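One way to build such a time-evolution diagram is to slide a window along the series and place the $(H_S, C_{JS})$ point of each window in the plane. The sketch below is an illustration under our own assumptions: the window length, the step, the default $D = 4$ (chosen so that $D! = 24$ stays far below the window length), and the function names are all hypothetical.

```python
import math

import numpy as np


def bp_entropy_complexity(x, D=4):
    """(H_S, C_JS) of the Bandt-Pompe distribution of x, with time delay 1."""
    vectors = np.lib.stride_tricks.sliding_window_view(np.asarray(x, float), D)
    _, counts = np.unique(np.argsort(vectors, axis=1, kind="stable"),
                          axis=0, return_counts=True)
    N = math.factorial(D)
    p = np.zeros(N)
    p[: counts.size] = counts / counts.sum()
    pe = np.full(N, 1.0 / N)

    def S(q):
        q = q[q > 0]
        return -np.sum(q * np.log(q))

    h = S(p) / np.log(N)
    js = S((p + pe) / 2) - S(p) / 2 - S(pe) / 2
    q0 = -2.0 / ((N + 1) / N * np.log(N + 1) - 2 * np.log(2 * N) + np.log(N))
    return h, q0 * js * h


def ch_trajectory(x, window, step, D=4):
    """(H_S, C_JS) of consecutive, possibly overlapping, windows of x."""
    starts = range(0, len(x) - window + 1, step)
    return np.array([bp_entropy_complexity(x[s:s + window], D) for s in starts])
```

A series that switches from white noise to a chaotic regime, for example, traces a path that leaves the neighborhood of $(H_S, C_{JS}) = (1, 0)$ and moves toward lower entropy and higher complexity.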
The processes studied here were selected as illustrative examples of (a) CS and (b) SP, two different classes of processes in the sense indicated in the introduction. We dealt with the following five kinds of CS: (1) The logistic map [17], defined by $x_{n+1} = r\,x_n (1 - x_n)$, with $0 \le x_n \le 1$ and $0 < r \le 4$. Note that for $r = 4$ this map has a nonuniform natural invariant PDF.
(3) Hénon's map: a two-dimensional extension of the logistic map [17], given by $x_{n+1} = 1 - a x_n^2 + y_n$, $y_{n+1} = b x_n$. The values used here, $a = 1.4$ and $b = 0.3$, correspond to a chaotic attractor with a nonsmooth PDF.
(5) Schuster maps: Schuster and co-workers [17] introduced a class of maps, $x_{n+1} = x_n + x_n^z \pmod 1$, which generate intermittent signals with chaotic bursts that also display $1/f^z$ noise. In particular, results for $z = 5/2$, 2, and 3/2 are reported. We considered the following two kinds of SP here: (6) Noises with an $f^{-k}$ PS, generated as follows: (a) The MATLAB© RAND function is used to produce pseudorandom numbers in the interval $(-0.5, 0.5)$ with (i) an almost flat PS, (ii) a uniform PDF, and (iii) zero mean value. (b) Then, the fast Fourier transform (FFT) $y_1(k)$ is obtained and multiplied by $f^{-k/2}$, yielding $y_2(k)$. (c) Now, $y_2(k)$ is symmetrized so as to obtain a real function, and the pertinent inverse FFT $x_i$ is obtained after discarding the small imaginary components produced by numerical approximations. The ensuing time series $x_i$ has the desired PS and, by construction, is representative of non-Gaussian noises.
(7) Fractional Brownian motion (FBM) and fractional Gaussian noise (FGN): FBM is the only family of processes that is (a) Gaussian, (b) self-similar, and (c) endowed with stationary increments (see Ref. [16] and references therein). The normalized family of these Gaussian processes, $\{B_H(t),\ t > 0\}$, is endowed with the following properties: (i) $B_H(0) = 0$ almost surely, i.e., with probability 1; (ii) $E[B_H(t)] = 0$ (zero mean); and (iii) covariance given by
$$ E[B_H(t)\,B_H(s)] = \frac{1}{2}\left( |t|^{2H} + |s|^{2H} - |t - s|^{2H} \right), \tag{11} $$
for $s, t \in \mathbb{R}$. Here $E$ refers to the average computed with a Gaussian PDF. The power exponent $0 < H < 1$ is commonly known as the Hurst parameter (exponent). These processes exhibit ''memory'' for any Hurst parameter except $H = 1/2$, as one realizes from Eq. (11). The $H = 1/2$ case corresponds to classical Brownian motion, whose successive increments are as likely to have the same sign as the opposite one (there is no correlation among them). Thus, Hurst's parameter defines two distinct regions in the interval (0, 1). When $H > 1/2$, consecutive increments tend to have the same sign, so that these processes are persistent. For $H < 1/2$, on the other hand, consecutive increments are more likely to have opposite signs, and we say that they are antipersistent. Let us introduce the quantity $\{W_H(t),\ t > 0\}$ (the FBM ''increments'') so as to express our Gaussian noise in the fashion
$$ W_H(t) = B_H(t + 1) - B_H(t). $$
Note that for $H = 1/2$ all correlations at nonzero lags vanish, and $\{W_{1/2}(t),\ t > 0\}$ thus represents white noise. The FBM and FGN processes are continuous but nondifferentiable (in the classical sense). FBM, being a nonstationary process, does not possess a spectrum defined in the usual sense; however, it is possible to define a generalized power spectrum of the form $\propto |f|^{-\alpha}$, with $\alpha = 2H + 1$, $1 < \alpha < 3$, for FBM, and $\alpha = 2H - 1$, $-1 < \alpha < 1$, for FGN. Because of their Gaussian nature and the other characteristics enumerated above, the Bandt-Pompe ideas are applicable to the FBM and FGN dynamical processes [18]. For generating the FBM and FGN time series we adopt the Davies-Harte algorithm [19], as recently improved by Wood and Chan [20], which is both exact and fast.
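Generators for models (1), (3), (6), and (7) can be sketched in Python with NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' code: the initial conditions, the 1000-step transients, and all function names are hypothetical. For model (6), NumPy's `rfft`/`irfft` pair, which enforces the Hermitian symmetry, stands in for MATLAB's RAND-based pipeline and for the explicit symmetrize-and-discard step (c). For model (7), a circulant-embedding generator in the spirit of the Davies-Harte/Wood-Chan method is used with a standard complex-Gaussian randomization, and FBM is obtained as the cumulative sum of FGN.

```python
import numpy as np


def logistic(n, r=4.0, x0=0.1, transient=1000):
    """Model (1): x_{n+1} = r x_n (1 - x_n), after discarding a transient."""
    x, out = x0, np.empty(n)
    for i in range(transient + n):
        x = r * x * (1.0 - x)
        if i >= transient:
            out[i - transient] = x
    return out


def henon(n, a=1.4, b=0.3, transient=1000):
    """Model (3): x_{n+1} = 1 - a x_n^2 + y_n, y_{n+1} = b x_n, started at (0, 0)."""
    x, y = 0.0, 0.0
    xs, ys = np.empty(n), np.empty(n)
    for i in range(transient + n):
        x, y = 1.0 - a * x * x + y, b * x  # right-hand side uses the old x, y
        if i >= transient:
            xs[i - transient], ys[i - transient] = x, y
    return xs, ys


def power_law_noise(n, k, rng):
    """Model (6): uniform white noise reshaped so that its PS scales as f^(-k)."""
    y1 = np.fft.rfft(rng.uniform(-0.5, 0.5, size=n))  # steps (a) and (b)
    f = np.fft.rfftfreq(n)  # frequencies in cycles per sample, f[0] = 0
    gain = np.zeros_like(f)
    gain[1:] = f[1:] ** (-k / 2.0)  # the f = 0 (mean) term is set to zero
    x = np.fft.irfft(y1 * gain, n=n)  # step (c): real output by construction
    return x / x.std()


def fgn(n, H, rng):
    """Model (7): unit-variance FGN sampled exactly by circulant embedding."""
    k = np.arange(n + 1, dtype=float)
    # Autocovariance gamma(k) = (|k+1|^2H - 2|k|^2H + |k-1|^2H) / 2
    gamma = 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))
    row = np.concatenate([gamma, gamma[-2:0:-1]])  # first row of a 2n x 2n circulant
    lam = np.fft.fft(row).real  # eigenvalues of the circulant matrix
    if lam.min() < -1e-10 * lam.max():
        raise ValueError("circulant embedding is not nonnegative definite")
    lam = np.clip(lam, 0.0, None)
    w = rng.standard_normal(row.size) + 1j * rng.standard_normal(row.size)
    # Real part has the target covariance; the imaginary part is an independent copy
    return np.fft.fft(np.sqrt(lam / row.size) * w).real[:n]


def fbm(n, H, rng):
    """FBM at t = 1, ..., n as the cumulative sum of FGN increments (B_H(0) = 0)."""
    return np.cumsum(fgn(n, H, rng))
```

Since $y_{n+1} = b\,x_n$ with $b > 0$, the Hénon Y series returned here is a scaled, one-step-delayed copy of X; this is why both coordinates share the same ordinal structure.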
For each of the cases studied here, 10 time series of $2^{15}$ data points each were analyzed, each series starting from a different initial condition. The corresponding mean values of $H_S$ and $C_{JS}$ are plotted in Fig. 1.
All the CS under scrutiny are seen, in our causality plane, to lie (i) in the entropy region between 0.45 and 0.7 and (ii) near the maximum-complexity curve $C_{\max}$. This indicates that high $C_{JS}$ values are produced by structures immersed in chaotic time series. Higher $H_S$ values may be obtained by using randomizing techniques that increase the mixing and destroy correlational structures, as will be reported elsewhere. For the logistic map, the effect of a control parameter $r < 4$ was also studied. We found that in our causality plane the pertinent points lie in a low-entropy region, although they always remain near the maximum-complexity curve, in agreement with what happens in the case of a binary PDF [5]. For the Hénon map, the X and Y coordinates have the same ordinal structure and therefore occupy the same point in the CH plane. Schuster maps also have low entropy values, $H_S < 0.6$, because these maps exhibit laminar regions separated by chaotic bursts; their complexity is lower than that of the chaotic maps. When the parameter $z$ decreases, $H_S$ increases and the representative trajectory always remains below (but close to) that of the chaotic maps. This is so because the size of the laminar regions diminishes, so that the system becomes more similar to a fully chaotic one.
Noises with an $f^{-k}$ power spectrum and $0 \le k \le 3$ exhibit medium-to-high entropy values ($0.45 < H_S < 1$) and $C_{JS}$ values almost equidistant between the curves of maximum and minimum complexity. In particular, their $C_{JS}$ values are much lower than those of deterministic noises (Schuster and chaotic maps). For the small values $k = 0$ and $k = 1$ they become almost ideal noises, with $H_S \approx 1$ and $C_{JS} \approx 0$. As $k$ increases, correlations among different values become apparent and, consequently, $H_S$ decreases. FBM ($1 < \alpha < 3$) exhibits entropies near those of the $f^{-k}$ PS noises, but with a lower $C_{JS}$ than that of the non-Gaussian process. The associated FGN ($-1 < \alpha < 1$) has higher entropy values ($0.97 < H_S < 1$) and complexity values between 0 and 0.1; both quantities are higher than those for an $f^{-k}$ PS [see Fig. 1(b)]. We associate this behavior with the Gaussian or non-Gaussian nature of the respective processes. Ordinary Brownian motion ($\alpha = 2$) is characterized by a relatively high entropy and a low $C_{JS}$ ($H_S \approx 0.9$ and $C_{JS} \approx 0.18$). Also, persistent FBM ($2 < \alpha < 3$, long-memory processes) is more complex than antipersistent FBM ($1 < \alpha < 2$, short-memory processes), in agreement with the intuitive idea for this kind of behavior. Complexity values for FGN are higher than those corresponding to an $f^{-k}$ PS. In particular, persistent and antipersistent FGN ($0 < |\alpha| < 1$) display quite similar values [see Fig. 1(b)]. Maximum entropy and minimum complexity values are observed for $\alpha = 0$, which corresponds to white Gaussian noise. Note that, in the causality plane, this point lies below the one corresponding to the case $k = 0$ of the $f^{-k}$ PS.
Summing up, our representation plane is shown here, with regard to signals generated by well-known models, to accommodate noise and chaos at different planar locations. Such a representational property could be useful when dealing with real data (which always have a stochastic component due to omnipresent dynamical noise) in order to classify different degrees of ''stochasticness.''

FIG. 1 (color online). Continuous lines represent the minimum, $C_{\min}$, and maximum, $C_{\max}$, complexities. The area enclosed by them is the CH plane. (a) Location of the different CS and SP in the CH plane. (b) Enlargement near the ideal point $H_S = 1$, $C_{JS} = 0$. $D = 6$ is used. The graph illustrates that, for textbook models usually regarded as being of either stochastic or deterministic nature [17], our numerical results place them at clearly different planar locations.
Our representation also (a) distinguishes Gaussian from non-Gaussian processes and (b) discriminates among different degrees of correlation (colored noises). Consequently, this representation plane is an effective tool for revealing the sometimes subtle difference between noise and chaos.
This work was partially supported by the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina (Nos. PIP 5687/05, PIP 6036/05) and ANPCyT, Argentina (No. PICT 11-21409/04). O. A. R. gratefully acknowledges support from the Australian Research Council (ARC) Centre of Excellence in Bioinformatics, Australia.
Of course, the embedding dimension $D$ plays an important role in the evaluation of the appropriate probability distribution, since $D$ determines the number of accessible states, $D!$, and tells us the length $M$ of the time series needed to work with reliable statistics. Concerning this last point, the condition $M \gg D!$ is satisfied in all the calculations reported here. It is essential for our present purposes to consider rather small $D$ values. In particular, Bandt and Pompe suggest working with $3 \le D \le 7$ for practical purposes, and this is what we do here (in the present work we used $D = 6$).
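As a quick check of the condition $M \gg D!$ with the values quoted in the text ($M = 2^{15}$ data points per series and $D = 6$):
$$ \frac{M}{D!} = \frac{2^{15}}{6!} = \frac{32768}{720} \approx 45.5, $$
so that, if all ordinal patterns were equally likely, each of the 720 patterns would be visited about 45 times on average, whereas $D = 7$ would give only $32768/5040 \approx 6.5$.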