L-moments and C-moments

It is well known that the computation of higher order statistics, like skewness and kurtosis, (which we call C-moments) is very dependent on sample size and is highly susceptible to the presence of outliers. To obviate these difﬁculties, Hosking (1990) has introduced related statistics called L-moments. We have investigated the relationship of these two measures in a number of different ways. Firstly, we show that probability density functions (pdf ) that are estimated from L-moments are superior estimates to those obtained using C-moments and the principle of maximum entropy. C-moments computed from these pdf ’s are not however, contrary to what one may have expected, better estimates than those estimated from sample statistics. L-moment derived distributions for ﬁeld data examples appear to be more consistent sample to sample than pdf ’s determined by conventional means. Our observations and conclusions have a signiﬁcant impact on the use of the conventional maximum entropy procedure which typically uses C-moments from actual data sets to infer probabilities.


Introduction
Probability density functions are of central importance in science and engineering, and certainly to us.For example, Mauricio Sacchi (1996) shows how we can build objective functions using Bayes' theorem, Allan Woodbury (1989) is involved in Bayesian interpolation and Tad Ulrych (1998) is examining proba-bility density functions associated with chaotic time series.An excellent preprint by Gouveia, de Moraes and Scales (1998) who write about the use of higher moments for the maximum entropy construction of Bayesian a priori pdf 's, presents clearly both the use of derived pdf 's as well as dif®culties encountered in computing these functions from realizations.These dif®culties prompted Hosking (1990) to formalize a method for the estimation of statistics that are measures of higher moments, skewness and kurtosis for example, but are more robust with respect to sample size and the presence of outliers.These measures, called Lmoments, are based on expectations of linear combinations of order statistics.Indeed, Vogel (1995) observes that the introduction of the theory of L-moments by Hosking ``is probably the single most signi®cant recent advance in relating to our understanding of extreme events''.
In this paper we concern ourselves, speci®cally, with the following question.What is the relationship between conventional moments, that we call C-moments in this paper for convenience, and L-moments?This brings into focus such issues as the inversion of L-moments to compute pdf 's, the comparison of pdf 's computed using C-and L-moments and, ®nally, the computation of C-moments from L-moment derived distributions.
The robust computation of higher moments of pdf 's from limited datasets are often used statistics.Important applications occur in wavelet estimation (Velis and Ulrych, 1996) and in the determination, by means of the principle of maximum entropy, of distributions that enter as prior probabilities in Bayesian inference (Gouveia et al., 1998).The central concept here is that of robustness.The conventional method for the determination of higher order moments, called the direct approach, is by means of an integral formulation.L-moments simplify this computation considerably.Not only do L-moments lead to simpli®cation, they also yield estimates with much lower bias.These properties are of such importance that Royston (1991) has proposed that all statistical packages incorporate L-moments as the method of computing higher order moments.The question remains in our minds, however.How are these different estimates of pdf properties, related?
In order to investigate this topic, we pursue the following methodology.We ®rst compute C-and L-moments from a particular sample size.We then determine the associated pdf using the principle of maximum entropy for both C-and L-moments (no mean task, as it turns out) and compare the derived distributions with the actual pdf.Finally, we compute C-moments from the L-moment derived pdf.

Theoretical summary
We present, brie¯y, the derivation of L-moments, as well as some details of the inverse problem that we solve to determine pdf 's from L-moments.

L-moments
We are all very familiar with the ®rst two moments, the mean and the variance.The next two moments, skewness and kurtosis are also familiar, but less so.In applied geophysics, at least, these moments are seldom used.Although their application has been somewhat limited in the past, the scale dependent phenomena that we are constantly encountered with demands that we look at our data in ever ®ner detail.This is where these statistics are of great importance.As we will see, conventional estimation of skewness and kurtosis has serious drawbacks.To obviate some of these drawbacks, Hosking (1990) formalized a method for their estimation that is based on expectations of linear combinations of order statistics.This approach originated with the work of Gini (1912) and was described by Kaigh and Driscoll (1987).Hosking's paper, a beautiful paper indeed, has had an explosive effect in some ®elds.Hoskings's paper has led to several papers that attempt to explain to those of whom, like us, are less well versed with the ®ne art of pure statistics, what these linear combinations actually mean and what advantageous properties they posses.We are referring to two papers in particular: Royston (1992) and Wang (1996).We present here a distillation that is brief and, we hope, to the point.

A reminder
First of all, a reminder.Consider a random variable X with a pdf px.The rth moment of px is de®ned as EX r x r pxdx 1 De®ning the mean and the variance by l and r 2 , respectively, we obtain (using Royston's (1992)  where E Á denotes expectation.

Definition of L-moments
De®ne X jXn to be the jth smallest moment in a sample of size n.The L-moments of X are de®ned by The ®rst four L-moments are then de®ned by Let us ®rst note that we are estimating linear combination of order statistics.
In fact the L in L-moments represents exactly this linearity.Wang (1996) gives the following, intuitive, interpretation of L-moments.One value in a sample gives a feel for the magnitude of X.Two samples, through their difference, give a feel for the variability of X.Three samples, give an indication of the asymmetry of px and, ®nally, four samples, tell us something about the ratio of peak to tails of px.Akin to the de®nition of conventional normalized moments, s 3 l 3 al 2 and s 4 l 4 al 3 are statistics related to the skewness and kurtosis of the pdf.where F is the cumulative distribution.

Estimating L-moments by means of PWMs
In a manner similar to the determination of product moments about the mean or about the origin, Hosking has shown that the ®rst four L-moments are given by Landwehr et al. (1979) have shown that the unbiased estimator of b r is given by b r 1 n The L-moments may now be computed by means of Eq. ( 14).At this stage, Wang (1996) points out that this procedure is not particularly ef®cient or clear.In other words, PWMs obscure the intuitive understanding of Lmoments.Wang makes the point that how L-moments are de®ned appears to be unrelated to how they may be estimated and he suggests a more logical approach that he calls the direct method.

Direct estimation of L-moments
Consider as a speci®c example the estimation of l 2 , Eq. ( 6).For each combination of any two values from a population of size n, we form the average of all differences between larger and smaller.Clearly, by examining Eq. ( 6), we see that this average is simply equal to 2l 2 .Similar considerations apply to the other Lmoments.Wang now shows us a method of covering all possible combinations, that may be large even for small sample sizes, to ef®ciently compute the direct estimators of L-moments.

2.1.5
The Wang scheme where, as before, x i Y i 1Y 2Y F F F Y n are samples ranked in ascending order and For a derivation of these equations, please see Wang (1996).

Maximum entropy density estimation
A very useful, and perhaps the best, method for conservatively assigning probabilities consists of maximizing the entropy of the unknown distribution subject to constraints on its moments (Jaynes, 1957).The problem can be solved by means of Lagrange multipliers in the case that the constraints are given by the conventional moments (i.e.mean, variance, skewness, kurtosis, etc.).Additionally, a normalization constraint is included.The formulation leads to an unconstrained problem where the Lagrange multipliers represent the unknowns.The entropy of a discrete distribution has been de®ned by Shannon (1948) as where p i is the probability of one of N possible outcomes of a given experiment.For a continuous distribution, is known as the relative or cross-entropy (Rietsch, 1977;Shore and Johnson, 1981), where x is a continuous variable that lies in the range aY b, and qx is Jaynes' invariant measure.This term represents a state of knowledge against which one makes comparisons.It is also known as the prior.
In practice, a and b are determined from the data (e.g.minimum and maximum values of the sample), and the prior is often the uniform distribution on aY b (the least informative prior).A particularly ¯exible method of incorporating prior information is by means of the principle of minimum relative entropy, that has found important application in the inversion of linear problems of general interest (see Woodbury and Ulrych, 1998 for a complete review).
The normalization is written as where l r are estimated from the data using sample statistics.Maximizing Eq. ( 21) with an uniform prior subject to the above constraints, leads to The Lagrange multipliers k r are obtained by solving a set of K 1 non-linear equations.These equations come from replacing the solution, Eq. ( 24), into each of the constraints.It is important to note here that the constraints are linear with respect to the unknown distribution.This allowed us to write Eq. ( 24) as a function of k r only.
We propose the use of L-moments instead of C-moments in the maximum entropy determination of probability densities.The advantages of L-moments have already been summarized in a previous section and more details can be found in Hosking (1991) and Royston (1992).Because L-moments are de®ned in terms of Fx rather than in terms of px, the constraints cannot be easily incorporated into the optimization problem through Lagrange multipliers.Rather, we de®ne the following cost function to be minimized with respect to the unknown distribution: In Eq. ( 25), the expression between square brackets is a penalty term (a is a constant) that takes into account the constraints.Distribution L-moments, l r , are evaluated using Eqs.(10)±(13) together with Eq. ( 9), whereas sample L-moments, l r , are computed using Eqs.(15)±(18).
All integrals appearing in the above equations are computed numerically by means of Gauss±Legendre M-point quadrature, where M is the number of points in aY b, with which we approximate the distribution, and correspond to the roots of the Gauss±Legendre polynomial (see for example Press et al. (1992)).In practice, we found it very useful to write all unknowns in terms of increments d i , that is and compute p i using central ®nite differences: in such a way that the cost function ( 25) can be expressed in terms of d i only.Note that setting 0 d i 1 is very convenient for ensuring that the F i 's are monotonically increasing and the p i 's are all positive.Then we write x i F r i p i w i X 30 In summary, the optimization problem consists of ®nding the unknown increment vector d, such that Ud is minimum.Since the cost function Ud is highly non-linear, we can either solve the problem by means of a linearization approach, that may depend strongly on a good initial guess, or by using a global optimization algorithm (for example, simulated annealing).Since we believe that the solution of the C-moments problem is, indeed, a good initial guess, we use a local convergent algorithm to minimize the cost function.Here, given d j at iteration j, we compute F j and p j using Eqs.( 26) and ( 27) respectively.We then compute H and the constraint terms (including the PWMs) as described above.

Examples
We now illustrate the behavior of the non-parametric density estimation using the maximum entropy criterion with C-and L-moment constraints.We choose two distributions that re¯ect only kurtosis and both skewness and kurtosis, and a mixture of two distributions that re¯ects both skewness and kurtosis, too.Clearly, this is a rather limited investigation, but, we are primarily interested in some broad rather than detailed conclusions.Field data examples are also considered.
We examine some realizations of re¯ection coef®cients obtained from well logs, and log-hydraulic conductivity values from an aquifer.All ®gures also show the pdf derived using a kernel approach, speci®cally the Epanechnikov kernel (Silverman (1986)), for comparison.

Symmetric distribution
The ®rst example illustrates the method for samples of various sizes drawn from a Laplace distribution, with and without outliers.The Laplace probability density function with location n and scale s is given by This distribution has p b 1 0 and b 2 24s 2 .For the experiment we set s 0X5 and generated 120 realizations of sample sizes between 10 and 250, and estimated the density function using the ®rst four C-moments and L-moments, respectively.The results of calculating the skewness and kurtosis coef®cients from the derived pdf 's, as well as the root-mean square (rms) error between true and estimated pdf, are shown in Fig. 1 (1st and 2nd rows).We can observe that the L-moments solution yields smaller rms error for all sample sizes, though the resulting C-moments derived from the density function are no better than the sample C-moments.Although the kurtosis is less biased towards smaller values, its variance is larger than that of the sample kurtosis.A possible explanation for this behavior is that C-moments contain much more information regarding the tail of the distribution than the L-moments.On the contrary, L-moments are insensitive to the tails of the distribution and thus provide more information for modeling the body of the distribution.
Figure 2 compares the resulting pdf for 10 independent realizations of sample size 150.The sharp peak of the Laplace distribution is recovered more accurately using L-moment than C-moment constraints.Table 1 summarizes the mean and standard deviations of the resulting skewness, kurtosis, and rms error for 100 independent realizations of sample size 150.The same experiment was repeated but adding a single outlier point at X 2X5.The results are shown in Fig. 1 (3rd and 4th rows).The bias towards higher moment values is now very clear, even for the larger sample sizes.Though Cmoments derived from the L-moment constrained pdf have large variance, they are less sensitive to the presence of the outlier.The rms error is also smaller and the shape of the true distribution is more accurately recovered, as may be seen in Fig. 2. Note the failing to reproduce the left shoulder of the pdf in the C-moment constrained case.This is because the right tail is ¯atter so as to ®t the high kurtosis values.

Skewed distribution
In the next example we used a Log-normal distribution with parameters n and s: n has been selected in such a way that l n exps 2 a2 0 for convenience.For s 0X4, we obtain n À1X083Y p b 1 1X32 and b 2 6X24.The results of the simulations for sample sizes between 10 and 250 are depicted in Fig. 3 (1st and 2nd rows).As observed from the ®gure, both methods perform well in terms of the resulting C-moments and estimated distributions, though the C-moment constrained pdf error curve shows less variability.Figure 4 shows 10 realizations using sample sizes of 150, and Table 2 summarizes mean and standard deviations for 100 realizations.
The computations were repeated with the addition of a single outlier at X 5X0.As expected the C-moments are heavily biased toward larger values, as illustrated in Fig. 3 (3rd and 4th rows), especially for smaller data samples.The bias is smaller in the L-moment case, while the error is not much different from the no outlier case.

Bimodal distribution
Perhaps the major impact of using L-moments instead of C-moments can be best appreciated by the following example.Random samples have been drawn from a mixture of two Gaussian distributions NÀ1Y 0X2 and N1X0Y 0X2, where the proportion of the mixture was 40% and 60%, respectively.We run the same experiment as in the previous examples and the results are displayed in Figs. 5  and 6.Here, we added a single outlier at X 7X0.In all cases the rms error of the L-moment constrained pdf is signi®cantly smaller that the C-moment counterpart.At the same time, the conventional skewness and kurtosis measures obtained from the resulting L-moment constrained pdf show less sensitivity to sample size and outliers than the sample statistics.In some cases, the two modes of the distribution can only be well-resolved using L-moments, as illustrated in Fig. 6.When the outlier has been added, both the C-moment (constrained) and the kernel pdf 's fail to reproduce the true distribution.
Although not shown here, the use of a larger sample size does not affect signi®cantly the estimation of the pdf using C-moments, unless more than the ®rst four moments are utilized.On the other hand, the ®rst four L-moments appear to contain more complete information about the distribution shape than the ®rst four C-moments.In this case, and as opposed to L-moments, C-moments are very sensitive to the tails of the distribution, and not very good at characterizing its center, as shown in Fig. 5 and Table 3.

Field examples
We obtained interesting results while computing C-and L-moment derived pdf 's for some realizations of re¯ection coef®cients obtained from well logs in the Campos basin, offshore Brazil (Rosa and Ulrych, 1991).We examined many different logs but show the results of only one for obvious reasons.These results are, however, remarkably representative.Figure 7 (top row) shows the derived Fig. 3. Skewness, kurtosis and rms error of the estimated pdf's with C-and L-moments constraints for samples drawn from a Log-normal distribution without outlier (1st and 2nd rows), and with outlier at X 5X0 (3rd and 4th rows) Fig. 4. Estimated pdf 's and error for the Log-normal distribution without outlier (top row) and with outlier (bottom row).Sample size 150 (10 independent realizations).Dashed line shows the true pdf pdf 's when the whole sample containing 1000 points is considered.Figure 7 (bottom row) shows the results when the pdf 's are computed with 100 point nonoverlapping windows to avoid non-stationary effects in the computation.Apart from a single 100 point realization that yields a broad bell shaped distribution on both the C-and L-moment pdf 's, the L-moment derived pdf 's show more consistency, sample to sample.
A second ®eld example consists of log-hydraulic conductivity values from the Borden aquifer in Ontario, Canada.This particular data set, and subsequent interpretations have sparked considerable controversy in the literature (Sudicky, 1986).Domenico and Schwartz (1998, p. 38) make the point that ``there is no hard and fast rule that hydraulic conductivity is log-normally distributed, or for that matter are other parameters''.Indeed, Woodbury and Sudicky (1991) show that other distributions may be appropriate for a valid description of the hydraulic conductivity.Woodbury and Sudicky (1991, Figs. 1, 2) also note the presence of outliers in the data set, i.e. values of ln-K of less than À6X0, and that the data set has a signi®cant negative skew.What is then, the effect of outliers on the parametric modeling of such a data set?In Fig. 8 we present the estimated pdf 's using 100 samples out of the total data set of 1188 values of ln-K data (solid line).The sub-sampling procedure is used to examine statistics from a data set affected by correlation (Woodbury and Sudicky, 1991).In this Fig. 8. ln-K pdf 's.Data from the Borden Aquifer, Ontario Canada (Sudicky, 1986).Smaller plots show the difference between the no-outlier and the outlier cases particular case, there is almost no visual difference between the C-and the Lmoment solutions.The calculations were also repeated including a single outlier at ln-K À7X0 (dashed line), and at ln-K À1X0 (dotted line), in order to simulate possible data errors.In the second case (outlier at ln-k À1X0), sample skewness and kurtosis go from À0X31 and 3.02 to 1.92 and 14.48, respectively.On the contrary, skewness and kurtosis derived from the L-moments constrained pdf are 0.53 and 5.83 respectively, values that are much closer to those of the no-outlier case (see Table 4).

Summary and conclusions
We have attempted to answer, or at least explore, the question of how C-and Lmoments are related.The approach we have used is to compare these moments by means of the pdf 's that can be computed by using them as constraints in the inversion.First, we summarize in the form of a partial list, some of the properties of L-moments as reported in the literature.1. L-moments, as compared to C-moments, are linear functions of the data and suffer less from effects of sampling variability, are more robust to outliers and are less sensitive to sample size (Hosking, 1991).Of particular signi®cance is the much larger bias that p b 1 and b 2 exhibit for small sample sizes, in comparison to L-moment ratios.2. In general, Royston (1991) points out that whereas p b 1 and b 2 are very sensitive to small and perhaps unimportant perturbations in the tails of the distribution, s 3 and s 4 are dependent on changes in the shape of the main portion.3. A Normal distribution, NlY r 2 , may be described in terms of its L-moments as l 1 l and l 2 p À1a2 r.Unfortunately, according to Hosking, an L-moment analog of covariance is not easily de®ned.4. Another observation by Hosking that has unfortunate rami®cations for the inversion problem (see Gouveia et al., 1998), is that no extension of L-moments to the multivariate case is immediately apparent.5. Whereas in some cases (Cauchy distribution being an example) C-moments may not exist, L-moments are guaranteed to exist.
The points enumerated above, clearly describe L-moments as statistics that show many advantages in comparison to conventional measures.We have also found, in our simulations, that, in general, L-moments are much less sensitive to sample size and, in particular to outliers.Perhaps the examples that are illustrated in this paper do not underline this conclusion, but certainly the work of other researchers as well as many examples that we do not have room to report, strongly support this observation.Given these properties, we had hoped that pdf 's computed using L-moments could then be used to compute C-moments that would be more robust measures than those derived from sample statistics.This has not turned out to be unequivocally so.It is true that the pdf 's derived from L-moment constraints are ``better'' than those derived from C-moments in a mean squared error sense.Although it may be argued, from the limited experiments that we have performed that, when outliers are present in the data, C-moments derived from pdf 's computed from L-moment constraints show somewhat less bias and a smaller variance than C-moments computed from samples, the difference is certainly not dramatic.
The fact that the L-moment derived pdf 's are superior estimates of the true pdf 's is interesting and may be of importance when the shape of the pdf is the object of the experiment.The Laplace pdf, for example, serves as a good illustration of the different estimates.It is clear that the L-moment derived pdf shows much more decisively than the conventional estimate that the pdf may be Laplacian.Such a conclusion can, of course, be then used to parametrically ®t an actual Laplacian pdf to the data and thereby obtain estimates of the C-moments.This conclusion is also supported by the mixture of two distributions example, which demonstrates that L-moments are better for characterizing the main portion of the distribution than the C-moments.
An observation worth making that stems from our work, and is well demonstrated by our ®rst three examples, is that the degree to which L-moments are superior estimates to C-moments, for a given small data sample, depends to a large extent on the pdf itself.i.e., the robustness of L-moments is, to some extent, data dependent.
In our limited experience, application of L-moment derived pdf 's to ®eld data appears to show strong advantages.The derived pdf 's are certainly more consistent when various logs from the same area are compared.The same observation applies when short data lengths from the same log are considered in order to avoid possible non-stationarity.In the case of ®eld data that included an imposed outlier, L-moment statistics were considerably more robust than the C-moment counterparts.
notation) the usual indices of skewness, p b 1 , and kurtosis, b 2 , as p b 1 EX À l 3 ar 3 2 and b 2 EX À l 4 ar 4 3 l r Y r 1Y F F F Y K Y 23

Fig. 1 .Fig. 2 .
Fig.1.Skewness, kurtosis and rms error of the estimated pdf 's with C-and L-moments constraints for samples drawn from a Laplace distribution without outlier (1st and 2nd rows), and with outlier at X 2X5 (3rd and 4th rows)

Fig. 5 .Fig. 6 .Fig. 7 .
Fig.5.Skewness, kurtosis and rms error of the estimated pdf 's with C-and L-moments constraints for samples drawn from a mixture distribution without outlier (1st and 2nd rows), and with outlier at X 7X0 (3rd and 4th rows)

Table 1 .
Mean and standard deviation of the resulting skewness, kurtosis, and rms error for 100 independent realizations of sample size 150 for the Laplace pdf.Primes indicate outlier

Table 2 .
Mean and standard deviation of the resulting skewness, kurtosis, and rms error for 100 independent realizations of sample size 150 for the Log-normal pdf.Primes indicate outlier

Table 3 .
Mean and standard deviation of the resulting skewness, kurtosis, and rms error for 100 independent realizations of sample size 150 for the mixture pdf.Primes indicate outlier

Table 4 .
Skewness and kurtosis for data from the Borden Aquifer