The Shannon entropy as a measure of diffusion in multidimensional dynamical systems

In the present work, we introduce two new estimators of chaotic diffusion based on the Shannon entropy. Using theoretical, heuristic and numerical arguments, we show that the entropy, S, provides a measure of the diffusion extent of a given small initial ensemble of orbits, while an indicator related with the time derivative of the entropy, S′\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S'$$\end{document}, estimates the diffusion rate. We show that in the limiting case of near ergodicity, after an appropriate normalization, S′\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$S'$$\end{document} coincides with the standard homogeneous diffusion coefficient. The very first application of this formulation to a 4D symplectic map and to the Arnold Hamiltonian reveals very successful and encouraging results.


Introduction
The characterization of chaotic diffusion in phase space of near-integrable systems is a matter of main significance in several fields, in particular, both in Celestial Mechanics and galactic astronomy. Indeed, just for the sake of illustration, we refer the reader to Maffione et al. (2015) where the relevance of chaos for halo stars in the Solar neighborhood is thoroughly discussed, and to Martí et al. (2016) which includes a careful study of diffusion in the Gliese-876 planetary system. More recently, in Cincotta et al. (2018) and references therein, a general discussion on the macroscopic diffusion is provided, as well as further applications to planetary dynamics. For instance, in Cincotta et al. (2018) the proper way to measure a physical instability in a planetary system in a Laplace resonance is addressed. Meanwhile, in Martí et al. (2016) and by means of diverse chaos indicators, the motion within such a resonance was shown to be completely chaotic, though diffusion experiments revealed an almost fully stable inner part but a highly unstable outer region in the resonance.
For the sake of clarity let us state that, from now on, we will term chaotic diffusion (or just diffusion) the time variation of the prime integrals or actions of an integrable Hamiltonian due to a small non-integrable perturbation, as it is usual in dynamical astronomy.
In the above cited works, we remarked that, in the case of macroscopic diffusion, the so-called normal diffusion approximation seems not to apply in almost any near-integrable system. Such an approach assumes that a single diffusion coefficient D w , w being any phase space coordinate, could be defined as the constant rate at which the variance of w, σ 2 w , evolves with time so that the linear trend, σ 2 w = D w t, is asserted. When a 2 factor is included in the definition of D, it becomes D = σ 2 /2t, relationship known in physics as Fick's law, after the pioneer work of A. Fick (1855). However, when dealing with dynamical systems, the diffusion coefficient is introduced in the frame of a generalization of Brownian motion, the review by Chandrasekhar (1943) being one of the most relevant references in this field.
In fact, deviations from normal diffusion were observed in many different dynamical systems. A significant contribution to the so-called transport processes was due to Zaslavsky (see for instance Zaslavsky 1994;Zaslavsky and Abdullaev 1995;Zaslavsky and Edelman 2000;Zaslavsky 2002a, b among others). Further relevant papers which pose a statistical approach are Klafter et al. (1990Klafter et al. ( , 1993, Korabel and Klages (2004), Schwarzl et al. (2017), Venegeroles (2008), Miguel et al. (2014), and simple applications of such theories to planetary dynamics can be found in Zhou et al. (2002), Cordeiro and Mendes de Souza (2005) and Cordeiro (2006). Such a phenomena, called anomalous diffusion (or, more precisely, anomalous transport), affords also a characterization through the variance progression with time, which is no longer linear but of a more general form. Indeed, in the case of abnormal diffusion, the variance runs like σ 2 w =D w t β , where the exponent β = 1 strongly depends on the local dynamical structure of the deemed region of phase space. Most of the theoretical efforts concerning anomalous transport (including chaotic diffusion) considered low-dimensional systems or maps and it seems not evident how to extrapolate this statistical approach to systems of higher dimension. Indeed, the due approach to this problem is completely open to the time being.
Only in the context of normal diffusion does the derivation of a diffusion coefficient seem to be widespread in dynamical astronomy (see Cincotta et al. 2018 for more details), and the way of measuring both diffusion and its rate is far from clear in the general case. Nonetheless, suitable diffusion experiments would well serve to neatly distinguish stable and unstable regions within a chaotic domain as it was shown, for instance, in Maffione et al. (2015), Martí et al. (2016), Cincotta et al. (2018) and Maffione et al. (2018).
In the last decade, studies on diffusion in multidimensional near-integrable systems were conducted by many authors, some of them dealing with the slow diffusion regime in the limit of weak chaos (see for example Lega et al. 2003Lega et al. , 2008Froeschlé et al. 2005Froeschlé et al. , 2006Efthymiopoulos and Harsoula 2013;Cincotta et al. 2014). On the other hand, diffusion studies in largely chaotic scenarios were also considered. For instance, in Tsiganis et al. (2005) and Tsiganis (2008) the diffusion theory is reviewed, including its application to the chaotic dynamics of asteroids and Jupiter's Trojans. Meanwhile, in Cachucho et al. (2010) Chirikov's formulation (Chirikov 1979;Cincotta 2002) is applied in order to derive a local diffusion coefficient within a given asteroidal three-body resonance domain, while in Batygin et al. (2015) and Martí et al. (2016) the dynamics and diffusion in the Laplace mean motion resonance in the GJ-876 planetary system is addressed. In all cases, the normal diffusion approximation was invoked.
Herein, we recover an strategy already outlined in Cincotta and Giordano (2012), which consists on estimating the chaotic diffusion extent by recourse to the Shannon entropy. Furthermore, we provide numerical evidence that a measure of the actual mean diffusion rate could be derived from the entropy's time derivative.
In the present effort, all these issues are thoroughly discussed, first considering a wellknown 4D-conservative map, where chaotic diffusion experiments are computationally much less expensive when dealing with systems of discrete time. However, we also propose the present formulation to the study of a system of continuous time, in particular, to a multidimensional near-integrable system, and present several numerical experiments regarding the 2 1 2 degrees of freedom Arnold Hamiltonian introduced in Arnold (1964). This system is the paradigmatic model for the so-called Arnold diffusion (actually the Arnold mechanism that leads to an instability).

A time discrete dynamical model
In order to show the use of the finite time entropy to quantify and measure the chaotic diffusion in a multidimensional system, we take the 4D map: and the quantities Δ i are fixed so that the f i have zero average. This map has been largely discussed in Cincotta and Giordano (2016) and references therein. Rescaling the actions y i , the map (1) becomes In what follows, we take for simplicity ε 1 = ε 2 = ε, μ 1 = μ 2 = μ 3 = μ, γ + = γ − = γ and all the ϕ i = 0, i = 1, . . . , 3, so Δ i = 0. The main difference of this 4D map from two coupled standard maps rests on the concomitant primary resonances' set. Indeed, denoting withŷ 1 = y 1 /2π,ŷ 2 = y 2 /2π and ≡ ε 2 , εγ, up to O( μ 2 ), this system has the following primary resonances (see Cincotta and Giordano 2016 for details):

Fig. 1
On the left, the theoretical resonance web of the map (2) for 0 < |k 1 | + |k 2 | + |k 3 | ≤ 6 and (y 1 /2π, y 2 /2π) ∈ [0, 0.5) × [0, 05). On the right, contour plot of the (logarithmic) time derivative of the conditional entropy, J , (log J values in the figure) for the map (2) with the given parameters after 400 iterates (log 2 ≈ 0.3). The initial values of the phases are fixed as Therefore, the full set of resonances at any order in and μ is given by Thus, horizontal lines correspond to the resonances of the uncoupled (x 2 , y 2 ) map, vertical ones to those of the uncoupled (x 1 , y 1 ) map, while the coupling resonances are given by the lines y 2 = −(k 1 y 1 + 2πk 3 )/k 2 , k 2 = 0. Figure 1(left) illustrates the theoretical resonance web for (y 1 /2π, y 2 /2π) ∈ [0, 0.5) × [0, 05) for 0 < |k 1 | + |k 2 | + |k 3 | ≤ 6. The actual resonance web is shown in Fig. 1(right) where the final values for t = 400 of the (logarithmic) time derivative of the conditional entropy, J (see Cincotta and Simó 1999) have been computed for an equispaced grid of 1500 × 1500 pixels in the domain (y 1 /2π, y 2 /2π) ∈ [0, 0.5) × [0, 05), (x 1 (0) = x 2 (0) = 0), since for the interval [0.5, 1) it is enough to take 1 − y i /2π, with y i /2π ∈ [0, 0.5). Recall that the (symmetric) conditional entropy of nearby orbits, I , is such that I ∼ δ 2 0 λ 2 t 2 1 for stable motion, with λ the maximum linear rate of linear divergence and δ 0 1 the initial value of the deviation vector; while I ∼ δ 2 0 exp(2σ t) ∼ 1 for unstable chaotic motion, where σ is the maximum Lyapunov exponent of the given trajectory. On the other hand, J = tİ /I behaves like the MEGNO (Cincotta and Simó 2000;Cincotta et al. 2003;Cincotta and Giordano 2016), J ≈ 2 for quasiperiodic motion and J ∼ σ t 1 for chaotic dynamics. We have chosen this particular dynamical indicator since it provides a clear picture of the global dynamics in very short motion times due to the fact that it is of second order in the solution of the variational equations. Although I is based on a continuous entropy associated to two nearby trajectories, say γ and γ and also to the orbit defined by γ × γ , J thus defined is quite similar to the MEGNO indicator. For further details, we refer to Cincotta and Simó (1999), while a brief discussion of both indicators is provided in Cincotta and Giordano (2016).
To clearly interpret the observed resonance structure, let us recall that the product (x 1 , y 1 )× (x 2 , y 2 ) at x 1 = x 2 = 0 only reveals the hyperbolic 2D tori (that is, the homoclinic tangle) of the primary resonances on the (y 1 , y 2 ) plane (as it would occur in the case of two coupled standard maps). We neatly distinguish the integer and the semi-integer resonances for the uncoupled maps as well as many coupling resonances and several high-order ones. Initial conditions that lead to strong chaotic motion take the highest values of log J while the smaller values correspond to stable quasiperiodic or periodic motion. This figure will serve later to locate the initial conditions within different chaotic domains.
In order to visualize the diffusion of any orbit onto the (y 1 , y 2 ) plane shown in Fig. 1(right), we proceed as follows (from now on, y i will denote y i /2π). We take a small ensemble of n p trajectories around some initial condition y 1 (0), y 2 (0), x 1 (0) = x 2 (0) = 0, iterate the ensemble during a total motion time t f and record the intersections of the n p orbits with the section S defined as |x 1 | + |x 2 | < η 1. Since the evolution of the variance of the ensemble is also needed, we compute the variance over the section (see Cincotta et al. 2018 and references therein for alternative ways of computing different variances). To this end, we take a fixed time interval Δt t f , define t l = lΔt and consider motion times t l−1 < t ≤ t l , l ∈ Z + . Assume that the n p initial conditions have n l 1 intersections with S at different times τ l ∈ (t l−1 , t l ]; thus, the variance over the section is computed as when the initial condition lies on a single resonance, y f denotes the values of the corresponding fast action (taken along the resonance line) for the ensemble. Meanwhile, for the initial conditions on multiple resonances or in a domain of resonances' overlap, we take y 2 f = y 2 1 + y 2 2 .

Theoretical outline and its experimental support
As we have already discussed, it is not a simple task to provide a measure of diffusion, in particular, when the latter is macroscopic. We could not invoke normal diffusion when dealing with systems of divided phase space and consequently, the derivation of a diffusion coefficient from the evolution of the variance of the unperturbed integrals is no longer valid. Thus, in the present section we go beyond in the search of an alternative tool to this end bringing into play the finite time entropy, known as Shannon Entropy, and investigate its effectiveness for such an application. A first attempt in this direction was already given in Cincotta and Giordano (2012), while the theoretical background on the Shannon entropy can be found in Shannon and Weaver (1949); however, different approaches to the entropy in dynamical systems are presented in for instance Katz (1967), Arnol'd andAvez (1989) and Wehrl (1978).
We will use the entropy to measure the wandering of the normalized unperturbed actions y = (y 1 , y 2 ) upon a given unit square (with opposite sides identified), T , as time goes by. The adopted T corresponds to the intersections of any trajectory of the 4D map with the starting phase plane or section |x 1 | + |x 2 | < η 1. Following Arnol'd and Avez (1989) and Lesne (2014), consider the real-valued function Z defined in [0, 1] as: Clearly, Z (x) ≥ 0, it has a maximum at x = e −1 and Z < 0. Let be a partition of T , for instance, a collection of q bidimensional cells that cover the whole unit square. The elements of the partition are assumed to be both measurable and disjoint. For any trajectory γ of the map, a probability density on T can be defined by where {y i , i = 1, . . . , N } denotes the intersections of the trajectory γ with T , and δ is the delta function. It is evident that and the measure μ of any element of the partition a i turns out to be Then, for the partition α the entropy of γ is defined in the fashion Let us notice that for a given partition and any trajectory, the entropy is bounded. Indeed, for the partition (6) it is 0 ≤ S(γ , α) ≤ ln q. The minimum is reached when γ is restricted to a single element of α, say the k-th element, which would correspond to the case of almost full stability (for instance when all the iterates lie onto a single torus). In such a case, it is μ(a k ) = 1, μ(a i ) = 0, ∀i = k, yielding S = 0. On the other hand, the maximum value, S = ln q, will be reached whenever the q elements of the partition have equal measure, μ(a i ) = 1/q, that corresponds to the situation in which the unperturbed actions wander all over the unit square in a uniform fashion, i.e., in case of ergodicity. Therefore, the entropy seems to be a priori a rather natural measure of the diffusion extent. 1 By means of the very same arguments given in  and Cincotta (1999), if the a i are small enough and N is large, we could assume that for any trajectory, all the nonempty elements of the partition have nearly the same measure. Thus, if q 0 denotes the number of cells visited by the orbit after a given motion time, then from (10) and the normalization condition, Therefore, under this assumption, the entropy behaves in a logarithmic way and it only depends on the fraction of the area of the unit square covered by the trajectory. Moreover, if N is large enough, q 0 would also provide an indication of the amount of instability on T . Indeed, if q 0 /q ≪ 1 we would be in presence of almost stability, while the instability would be quite large whenever q 0 /q 1.
The time derivative of the entropy,Ṡ, can be approximated bẏ where q 0 (t) denotes the number of cells visited by the orbit at time t and δq 0 /δt the rate at which q 0 changes with time. At this point, we make the following heuristic approximation: where δ y 2 j denotes either δy 2 1 , δy 2 2 or (δy 2 1 +δy 2 2 )/2 and · is the average on T over the interval (t, t + δt). The above estimate rests on the assumption that the variation of the occupied cells is entirely due to changes in the actions in any direction (see below).
Let us now consider two asymptotic scenarios. If γ is a stable orbit lying on a torus, then δ y 2 j = 0, δq 0 = 0 and no variation of the number of cells takes place. Both the entropy and its time derivative vanish and therefore the quantity also vanishes. On the other hand, in the case of a nearly ergodic system, for any trajectory it is δq 0 ∝ δ y 2 j > 0. In this particular situation, it is known that δ y 2 j ≈ Dδt, where D is the homogeneous, constant diffusion coefficient of the system and thus, S ∝ D, so where k is a normalization constant that depends on the extreme values attained by the actions. 2 Indeed, k is chosen such that kq 0 (t) is the area of the action space covered by the diffusion of a given orbit. The heuristic approximation (13) rests on the assumption that δ y 2 j is the difference in the ensemble variance on the section S at two nearby times, δ y 2 j = Δy 2 j (t + δt) − Δy 2 j (t) . Then if this change in the variance satisfies δ y 2 j 1/q, then δ y 2 j ∼ δq 0 . On the other hand, if δ y 2 j 1/q, for q large enough, it is almost stability, δ y 2 j 1 and thus δq 0 = 0.
2 Note that to compute δ y 2 j the actions are not restricted to the unit square. In order to test these assumptions in the case of a nearly ergodic scenario, in Fig. 2(top) we display the diffusion on the unit square, actually onto the section S : |x 1 | + |x 2 | ≤ 0.02, for the map (2) with values of the parameters large enough such that the system presents global and fast diffusion all over T . This plot corresponds to t = 10 6 iterates of the map for an ensemble of size 10 −6 of 2000 initial conditions around (x 1 (0), x 2 (0), y 1 (0), y 2 (0)) = (0, 0, 0.25, 0.25). The bottom panel (left) shows the evolution of the entropy S accordingly to (10) and its approximation S 0 given by (11), both normalized by ln q, with q = 800 × 800 cells of equal size. Clearly, in this particular case, both entropies completely agree. In the bottom panel at the right, we present both the temporal evolution of S given by (15) and the time average of the variance over the section of y 1 and y 2 . To compute the action variances δy, we have used (4) with Δt = 5 × 10 3 and being n l 700, while for S we have taken the per step numerical change of S every Δt. To determine the constant k, we have just evaluated the maximum and minimum values attained by the actions during the iteration of the ensemble and thus where y jM , y jm denote such maximum and minimum values, respectively. We observe that the diffusion behaves as a normal process, the rate of the variance's average approaches very quickly to a constant value leading to a diffusion coefficient, D ≈ 2 × 10 −5 . This is the very same constant value of S over the whole time interval, showing then that (15) holds. The key point of this formulation is the assumption (13), where the approximation (12) is used. It is clear that the exact form ofṠ also involves the time derivatives of μ(a i ). Even though we have numerically computed the full derivative of S taking into account all its terms, we found that the smoothest and best approximation toṠ is provided by the numerical derivative of S evaluated every δt, as shown in Fig. 2. Thus, in what follows, we assume (13) to be true in any case and numerical evidence of its validity will be delivered in the next subsection.

Further Numerical experiments
In this section, we consider additional numerical experiments but involving values of the map parameters such that the resulting dynamical system presents a divided phase space, with components of comparable measure.
In Fig. 3(top), we present the results of the intersections of the same initial ensemble as in Fig. 2(top) (ensemble 1) but for the values of ε, γ and μ adopted for Fig 1(right). We have changed the partition to q = 500 × 500 while the rest of the parameters (t, Δt, S) are the same as above.
We observe that diffusion, though macroscopic, is restricted to a relatively small domain of the unit square. This is revealed by the values of the concomitant entropies that are smaller than those for the case in which the system covers almost all the unit square; however, here the size of the cells of the partition is larger. Note that though the normalized entropies, S and S 0 somewhat depart, both evolve at a similar rate. We can provide a rough estimate of the extent of the diffusion recalling that, for the normalized entropy it is q 0 ≈ q S and being the size of the cells 1/q, the area covered by the ensemble is q 0 /q ≈ q S−1 and therefore the linear extension of the diffusion would be Δy i ∼ q (S−1)/2 , for q = 500 × 500 and after 10 6 iterates, S ≈ 0.65, leading to Δy i ∼ 0.11. The right panel reveals that, even in the case of a bounded diffusion, Eq. (13) holds. The constant k in (15) is obtained again by recourse to (16). We observe that the mean rate at which the variances evolve with time is still slowly decreasing at t = 10 6 , revealing that the diffusion is not normal over this time interval.
In view of the above results and those of our vast numerical experimentation, we here propound S and S as an indication of the extent of the diffusion and the rate at which it changes, respectively, the latter providing a measure of the macroscopic diffusion rate. In this direction, we focus on the next experiment.
Fixing again ε = 0.25, γ = 0.1 and μ = 0.5, we take a similar ensemble as the one above, but now centered at (x 1 (0), x 2 (0), y 1 (0), y 2 (0)) = (0, 0, 0.20, 0.20) (ensemble 2). A simple inspection of Fig. 3 (left panel) reveals that the ensemble 2 lies on the region covered by the diffusion starting at y 1 (0), y 2 (0)) = (0.25, 0.25) (ensemble 1) so that we should expect similar dynamical properties for both ensembles, that is comparable variance evolution and diffusion rates. Figure 4(left) shows that the diffusion of both ensembles, though with a different surface density distribution, their corresponding areas largely overlap; however, their variances evolve in a quite different way, as Fig. 4(right) shows. Again, it is evident that the observed diffusion is not normal and thus any attempt to estimate a diffusion coefficient, for instance by a linear fit on the variance evolution, would lead to very dissimilar values for such a macroscopic instability. Now, Fig. 5(left) shows the evolution of the normalized entropies for both ensembles. As expected, the final value of S for ensemble 1 is larger but comparable to that of ensemble 2, since the intersections of the latter with the section seem to be more confined than those of ensemble 1. Also they evolve in a different way and note that at t ≈ 1.5 × 10 5 both entropies take the very same value. For t > 1.5 × 10 5 the entropy for ensemble 2 slightly varies, while for ensemble 1 it is still increasing, revealing that diffusion continues visiting new cells of phase space. The evolution of S for both ensembles also reveals that the measure of the rate of this macroscopical diffusion should be comparable, and in fact, their final values are rather similar. In both cases, the diffusion rate would be close to 10 −9 after t = 10 6 . Several numerical experiments have been carried out with different values of the map parameters, in particular for very small, moderate and large values of ε, γ and μ. Also the total motion time ranged from t = 10 4 to t = 10 7 , with different selections of Δt, for ensembles located all over the unit square and considering different partition sizes. The results concerning S(t), S (t), q 0 (t)/q and variances' evolution are quite well represented by the results shown in Figs. 4 and 5.
Finally, as a global experiment and adopting again ε = 0.25, γ = 0.1, μ = 0.5 we take a grid of 500 × 500 initial values of (y 1 , y 2 ) ∈ [0, 0.5) × [0, 0.5) with x 1 (0) = x 2 (0) = 0. We consider ensembles of size 10 −7 of 10 initial conditions around each y 1 (0), y 2 (0) and compute the corresponding final values of S, S and q 0 /q on the section S : |x 1 |+|x 2 | ≤ 0.02, with q = 500 × 500 and after a total motion time t = 5 × 10 5 , taking Δt = 2.5 × 10 3 . In order to reduce oscillations in the final value of S as Fig. 5(right) shows, we average the values of S over the interval (5Δt, t). Figure 6 displays the results for S and S (in logarithmic scale). Both figures should be compared with Fig. 1(right), but keeping in mind that while the chaos dynamical indicator J has been obtained for t = 400, the entropy computation involves t = 5 × 10 5 so that around resonances and their crossings the chaotic domains become larger, as both S and S contour plots reveal. Recall that due to computational limitations Fig. 6 has 11% of the resolution of Fig. 1(right). Although the global picture given by the three figures is similar, note that while J measures the local exponential rate of divergence around each initial condition y 1 (0), y 2 (0), S and S provide a measure of the extent and the rate of diffusion of a small ensemble starting at y 1 (0), y 2 (0), respectively. As we observe from these figures, for the map (2) and the given values of ε, γ and μ, the mean diffusion rate ranges from log S − 13 to log S > − 10. Note that the lower values of S correspond to stable, regular motion where the expected value is S ≈ 0 while the larger ones appear in chaotic domains of the phase space. The most unstable region corresponds to the intersection of the two half integer resonances y 1 = y 2 = 0.5, while around the integer resonances' crossing the diffusion rate is clearly smaller and certain dynamical structure is revealed by both S and S . The lower values of the (normalized) entropy are very close to 0 as expected; on the other hand the highest values are nearly half of its maximum value, so every ensemble of orbits with S ∼ 0.5 visits ∼ √ q = 500 cells of size 4 × 10 −6 in 5 × 10 5 units of time, leading to a linear extension of the larger instability of the order of 0.05. A very similar (not included) figure is obtained for the contour plot of q 0 /q, revealing that this simple ratio also provides a measure of the extent of diffusion. The question is why S and S yield a better estimation of the diffusion extent and its mean rate than those derived for instance from the variance (4). The answer seems to be quite simple. The variance evolution of the ensemble is highly sensitive to stickiness and this phenomena seriously affects its time rate, as for instance Fig. 4 clearly shows. As a consequence, any estimation of a diffusion coefficient computed over a finite time interval by means of a normal diffusion assumption would in general fail. 3 On the other hand, the entropy, which takes into account all the dynamical information of the ensemble trajectory, evolves in a smooth way and it is much less affected by stability domains that slow down the chaotic diffusion during some time interval. Essentially, the entropy measures the full area of the intersections of the orbits with the section S. Of course, the intrinsic time dependence of the diffusion due to the dynamical phase space structure and its anisotropic character are also observed by means of the entropy. The same arguments apply to S and thus we are confident of S and S to provide a good measure of the chaotic diffusion within a given finite time interval.
For ε = 0 the system has two global integrals of motion, namely I 1 and I 2 , which determine the invariant tori supporting the quasiperiodic motion with frequencies ω 1 = I 1 , ω 2 = I 2 . In case ε = 0, μ = 0 the system still has two integrals, and the unperturbed Hamiltonian takes the form The Hamiltonian H ε 1 is the pendulum model for the resonance ω 1 = 0; H ε 1 ≡ h 1 = −2ε corresponds to the exact resonance or stable equilibrium point at (I 1 , θ 1 ) = (0, π) while h 1 = 0 to the separatrix and thus (I 1 , θ 1 ) = (0, 0) is the unstable point. The frequencies are then where ω p (h 1 , ε) denotes the pendulum frequency. Close to the separatrix, for both oscillations and rotations, ω p (|h 1 | 1, ε) ≡ ω sx (h 1 , ε) → 0 when |h 1 | → 0. The half-width of the resonance ω 1 = 0 in action space is (ΔI 1 ) r = 2 √ ε, so within this resonance the variation of I 1 is bounded by |ΔI 1 | ≤ 2 √ ε while I 2 remains constant. Therefore in action space, (17) can be recast as where H ε 0 is given by (19) and θ 2 (t) = ω 2 t + θ 0 2 . The perturbation μV ε , for εμ 1, mainly affects the motion close to the separatrix of the resonance ω 1 = 0 leading to the formation of the chaotic layer; the dynamics becomes unstable, chaotic in a small domain |I 1 ±2 √ ε| ≤ w s , where w s denotes the width of this layer. However, due to the dependence of V ε on θ 2 , also I 2 changes, and therefore motion along the stochastic layer could proceed. Due to the chaotic character of the dynamics inside this narrow layer, the variation of I 2 should also be chaotic, giving rise to variations in I 2 . Thus, as I 2 could change unboundedly, a large instability might exist. This is the way in which Arnold diffusion is discussed in Chirikov (1979).
In (20) ω 1 = 0 is just one of the six first-order resonances involved. A simple trigonometric manipulation reveals that the resonances at order ε and εμ are The resonances intersect at seven different points in frequency space, namely (ω 1 , ω 2 ) = (0, 0), (0, ± 1), (± 1, ± 1), 4 hence the diffusion would spread over all this resonance set. However for εμ ε 1 the diffusion rate should be negligible along all resonances except for ω 1 = 0, since the latter has the main amplitude ∼ ε, while all the other resonances have amplitudes ∼ εμ ε. 5 Considering the fully perturbed motion, besides the ones given in (21), the full set of resonances (in action-energy space) is given by a linear combination of the form In Cincotta et al. (2018) portraits similar to those displayed in Fig. 1 are included showing the resonance web defined by (22) and a MEGNO dynamical map for this model.

Numerical experiments
Let us investigate the diffusion in this model by fixing the value of ε = 0.25 and taking two different values of μ = 0.01, 0.025. We will focus on chaotic diffusion along the stochastic layer of the main or guiding resonance, ω 1 = 0, and for different initial values of ω 2 = I 2 .
According to Chirikov's theoretical results Chirikov (1979) by means of the normal diffusion approximation, when με ε 1 the local diffusion rate for |ω 2 | < 1 should be larger than for |ω 2 | > 1. As pointed out in Cincotta et al. (2018), this difference in the diffusion rates rests on the assumption that for |ω 2 | < 1, the strongest perturbation to the motion on the separatrix of the guiding resonance is due to the closest resonance in frequency space, the layer resonances ω 2 = ± ω 1 that give rise to the chaotic layer around the separatrix of the guiding resonance (motion across the layer), and thus the diffusion along this layer is driven by the smaller perturbing terms, the driving resonances ω 1 = ± 1. For |ω 2 | > 1, the layer resonances turn to ω 1 = ± 1, while the driving resonances become ω 2 = ± ω 1 . Clearly, near |ω 2 | = 1 all these arguments are no longer applicable as well as when ε 1, so for the adopted values of ε, μ and for |ω 2 | ≈ 1 numerical experiments are required.
For ε = 0.25, the unperturbed separatrix (or the center of the chaotic layer) of the guiding resonance lies at I 1 = 1 and typical time scales for the motion inside the chaotic layer are T a 20 (see Cincotta et al. 2018). We focus on the diffusion within a range of |ω 2 | where the guiding, layer, driving and other high-order resonances exhibit crossings and overlapping as the MEGNO contour plots reveal in Figs. 7 and 8, e.g., |ω 2 | ≤ 2.
We took action values on the chaotic layer, (I 1 , I 2 ) = (1, ω 2 ), and θ 1 = π, θ 2 = t = 0 as the initial values for the angle variables, in coincidence with those adopted for computing the MEGNO map. We considered six ensembles of 100 initial values of the actions around (1, ω 2 ) of size ∼ 10 −7 for each value of μ. The center of each ensemble lies at ω 2 = − 1.69, − 1.13, − 0.173, 0.053, 0.55, 1.55, respectively. The equations of motion were integrated with a time step of size 0.01 for a total motion time 4 × 10 6 .
The results for μ = 0.025 are reported in Fig. 7, where the wandering of the actions for the initial ensemble (depicted in white) is pursued and superimposed on the MEGNO contour plot (for |I 1 | ≤ 1.5, |I 2 | ≤ 2), each color identifying a different set of 100 trajectories. In order to observe the diffusion in the 2D plane (I 1 , I 2 ), we considered the section defined by |θ 1 (t) − π| + |θ 2 (t)| < 0.01.
From these experiments, we observe that despite the location of the initial ensemble, the diffusion spreads over the range |I 2 | ≤ 2 almost uniformly and in all cases the same resonances are visited after the considered motion time. Then, for these values of ε and μ, the instability is large and it is expected the same mean diffusion rate all over |I 2 | 2.
The results for μ = 0.01 are presented in Fig. 8, where we observe that the ensembles located at the lowest and highest values of ω 2 reveal a bounded diffusion over the range 0.4 |I 2 | < 2, while for the rest of the ensembles the diffusion spreads over a barely smaller domain than that for the case of μ = 0.025, |I 2 | < 2. Therefore, for these four ensembles, we also expect a similar characterization of the diffusion, as well as for the ensembles located at ω 2 = − 1.69, 1.55. Figure 9 presents the variance evolution of the ensembles over the section |θ 1 (t) − π| + |θ 2 (t)| < 0.01 computed by means of (4) for Δt = 2 × 10 4 (n l 1000). We observe that for μ = 0.025 the diffusion does not behave in a normal fashion; moreover, fitting a power law provides quite different exponents ranging from 0.10 to 1.02 when considering the full Six initial ensembles (indicated as white circles) are followed onto the MEGNO contour plot for = 0.25, μ = 0.025, white and light gray denote stable motion while black indicates strong chaotic dynamics. The concomitant trajectories for the six ensembles that intersect the section |θ 1 (t) − π | + |θ 2 (t)| < 0.01 are depicted with different colors. Figure taken from Cincotta et al. (2018) time span. However, when the last 80% lapse is regarded, lower exponents are obtained in some cases. Similar results arise for μ = 0.01, although the variance's evolution seems to be closer to that corresponding to a normal process: a power law fit provides exponents in the range 0.63 − 0.8 for the ensembles that present a more extended diffusion and close to 1 for the other ones. Therefore, the characterization of the diffusion by means of the variance evolution over the considered time-scale seems not to represent what it is observed in Figs. 7, 8, since in almost all cases the dynamical properties of the system for each value of μ should be very similar.
A region in action space encompassing all the crossings with the section has to be determined in order to define a partition for the computation of S and S . Here we take L 1 = {I 1 : |I 1 | ≤ 1.4}, L 2 = {I 2 : |I 2 | ≤ 2.5} for μ = 0.025, while L 1 = {I 1 : |I 1 | ≤ S' < δI 2 2 > / t 1.12}, L 2 = {I 2 : |I 2 | ≤ 2} for μ = 0.01 being l 1 and l 2 the diameter of L 1 and L 2 , respectively. Figure 10 shows the evolution of the entropy for all the ensembles when adopting a partition q = 400 × 400. Both figures reveal a very similar trend for the entropies for μ = 0.025, while for μ = 0.01 the ensembles for which the diffusion is restricted to 0.4 |I 2 | < 2, S reaches smaller values, as expected. For instance, for μ = 0.025, after t = 4 × 10 6 , S ≈ 0.8 and thus for the given partition it leads to a linear extension of the instability in I 2 of ∼ q (S−1)/2 l 2 ≈ 1.5, which is smaller but comparable to the observed one for all the ensembles. Figure 11 corresponds to the ensemble located at ω 2 = − 1.13 (for μ = 0.025) and displays a comparison between the evolution of S (normalizing k to the extent of the area covered by the diffusion of this ensemble) with δ I 2 2 /t. Notice that while the latter is still decreasing at t = 4 × 10 6 , S already reaches a nearly asymptotic value at t = 2 × 10 6 , though both quantities exhibit a comparable evolution and similar final values.
Finally, Fig. 12 shows the evolution of S for all the ensembles and both values of μ. It turns out that for μ = 0.025 all the ensembles present a very similar diffusion rate measured by S , of the order of 10 −7 . On the other hand, for μ = 0.01 the diffusion rates for the ensembles for which the diffusion spreads over nearly the same region are similar, ∼ 5 × 10 −8 , while for a partition q = 400 × 400 over the domains |I 1 | ≤ 1.4, |I 2 | ≤ 2.5 for μ = 0.025 and |I 1 | ≤ 1.12, |I 2 | ≤ 2 for μ = 0.01 those leading to a more restricted diffusion, also have comparable values, of the order of ∼ 10 −8 .

Conclusions
Herein we face the difficulty to compute a meaningful measure of the chaotic diffusion and its time rate, independent of the scaling of the variance with time. Theoretical, heuristic and numerical arguments support that S and S are suitable estimators of diffusion in both limits, local and global diffusion. The strongest assumption for our proposal is indeed that the variation of the number of occupied cells is entirely due to changes in the actions during the diffusion process, while a second but weaker one regards the approximation of the actual entropy by the logarithm of the number of non-empty cells. A third though more natural assumption is the nearly equal measure of the cells occupied by the intersections of the ensemble trajectory with a given section. Anyway a more rigorous foundation of this formulation should be provided. We remark that the conjecture δy 2 (t) ∝ δq 0 (t) is independent of the transport process that takes place in phase space and thus the time evolution of S and S would reveal the character of the chaotic diffusion, i.e., normal or abnormal.
With the help of a 4D symplectic map and by recourse to the conditional entropy as indicator of chaos, we yield numerical evidence that S gives reliable values of the mean diffusion rate within the domain covered by the roam of the action values, while S measures the extension of this domain, i.e., the range of variation of the actions over the considered time span.
For the sake of illustration, we have presented a few numerical examples involving chaotic diffusion of a single ensemble and the corresponding evolution of S, S 0 , S and the variances. However, plenty of computations have been carried out considering quite restricted diffusion, overlapping of several ensembles, dependence of the entropy on the adopted partition, different parameters of the map, etc. The outcomes presented in Sect. 3 tailor a quite representative sample of all our numerical experiments. Further, the global results displayed in Fig. 6 are pretty eloquent: both S and S are very effective dynamical indicators (as follows from a simple comparison with Fig. 1); however, they also provide further information about the local and global dynamics than, for instance, the classical chaos indicators.
The application of this approach to chaotic diffusion in a well-known Hamiltonian system of more than two degrees of freedom reveals that our initial conclusions, that are based on profuse numerical experimentation on a 4D map, seem to be general and thus we claim for S and S to characterize chaotic diffusion in phase space.
The computational effort to obtain the entropy and its derivative is nearly the same needed to evaluate the variances over the section, but of course it is quite large in comparison with the one required by any dynamical indicator of chaos. Indeed, in general, a dynamical indicator is computed for a given grid of individual initial conditions for relatively short motion times, while the entropy, as well as the variances, require a grid of initial ensembles and larger times. Anyway, when dealing with maps, this is not a serious limitation; however, it could be restrictive in the case of systems of continuous time and several degrees of freedom.
It turns out then that a combination of different techniques would furnish a very efficient way to investigate global and local dynamics in multidimensional systems. The structure of the phase space would be revealed by any chaos indicator, which would provide information on the location of invariant manifolds, resonances, stable and chaotic regions, etc., allowing to obtain a clear picture of the global dynamics of the model. However, since any classical indicator could not distinguish between stable or unstable chaos, S and S should be computed in those chaotic domains of physical interest in order to get a measure of the instability and its time rate.