Rarefied particle motions on hillslopes – Part 3: Entropy

Theoretical and experimental work (Furbish et al., 2021a, b) indicates that the travel distances of rarefied particle motions on rough hillslope surfaces are described by a generalized Pareto distribution. The form of this distribution varies with the balance between gravitational heating, due to conversion of potential to kinetic energy, and frictional cooling, due to particle–surface collisions; it varies from a bounded form associated with rapid thermal collapse to an exponential form representing isothermal conditions to a heavy-tailed form associated with net heating of particles. The generalized Pareto distribution in this problem is a maximum entropy distribution constrained by a fixed energetic “cost” – the total cumulative energy extracted by collisional friction per unit kinetic energy available during particle motions. That is, among all possible accessible microstates – the many different ways to arrange a great number of particles into distance states where each arrangement satisfies the same fixed total energetic cost – the generalized Pareto distribution represents the most probable arrangement. Because this idea applies equally to the accessible microstates associated with net cooling, isothermal conditions and net heating, the fixed energetic cost provides a unifying interpretation for these distinctive behaviors, including the abrupt transition in the form of the generalized Pareto distribution in crossing isothermal conditions. The analysis therefore represents a novel generalization of an energy-based constraint in using the maximum entropy method to infer non-exponential distributions of particle motions. Moreover, the energetic costs of individual particle motions follow an extreme-value distribution that is heavy-tailed for net cooling and light-tailed for net heating. 
The relative contribution of different travel distances to the total energetic cost is reflected by the product of the travel distance distribution and the cost of individual particle motions – effectively a frequency–magnitude product.


Introduction
In two companion papers (Furbish et al., 2021a, b) we examine a theoretical formulation of the probabilistic physics of rarefied particle motions and deposition on rough hillslope surfaces. The formulation is based on a description of the kinetic energy balance of a cohort of particles treated as a rarefied granular gas and a description of particle deposition that depends on the energy state of the particles. The formulation predicts a generalized Pareto distribution of particle travel distances whose form varies with the balance between gravitational heating, due to conversion of potential to kinetic energy, and frictional cooling, due to particle–surface collisions. Specifically, the generalized Pareto distribution varies from a bounded form associated with thermal collapse and rapid deposition to an exponential form representing isothermal conditions to a heavy-tailed form associated with net heating of particles and decreased deposition. The transition to a heavy-tailed form likely involves an increasing conversion of translational to rotational kinetic energy leading to larger travel distances with decreasing effectiveness of collisional friction. As described in Furbish et al. (2021b), these varying forms of the generalized Pareto distribution are consistent with laboratory measurements of particle travel distances reported by Gabet and Mendoza (2012) and Furbish et al. (2021b) and with field-based measurements of travel distances reported by DiBiase et al. (2017) and Roth et al. (2020).
Here we highlight a key point in Furbish et al. (2021a). Namely, the generalized Pareto distribution is not selected in an empirical manner based on goodness-of-fit criteria applied to data sets. Rather, this distribution is dictated by the physics of the problem, just as, for example, the Boltzmann distribution (an exponential distribution) emerges in classical statistical mechanics from consideration of the accessible energy microstates of a gas system. In this problem the versatile form of the generalized Pareto distribution, specifically its apparent success in describing three distinctive energetic behaviors of rarefied particle motions, is enigmatic. Although the different energetic behaviors have a clear mechanical explanation, the transition from a bounded form to a heavy-tailed form in crossing isothermal conditions is abrupt. The basis of this transition, including the upper bound on travel distances prior to transition, is unclear: it may represent a fundamental change in mechanical behavior, or it may simply be a mathematical curiosity of the generalized Pareto distribution.
The purpose of this third companion paper therefore is to further elaborate the probabilistic physics of particle motions as represented by the generalized Pareto distribution. To do this we appeal to the principle of maximum entropy as outlined in the pioneering work of Jaynes (1957a, b). We specifically demonstrate that in this problem the generalized Pareto distribution is a maximum entropy distribution constrained by a fixed total energetic "cost": the total cumulative energy extracted by collisional friction per unit kinetic energy available during particle motions. The relative energetic cost locally increases with increasing travel distance for net particle cooling and rapid thermal collapse, it is uniform for isothermal conditions, and it decreases with increasing travel distance for net particle heating. The cumulative cost involves integrating the local cost over the particle travel distance, and the total cumulative cost is then obtained by summing over all particles. This fixed total cost unifies the interpretation of the three energetic behaviors, where the upper bound on travel distances prior to transition is a probabilistic mechanical outcome.
As a point of reference, the canonical example of a maximum entropy distribution is the Boltzmann distribution of the energy states of the particles composing an ordinary gas at thermal equilibrium. Similarly, the Maxwell-Boltzmann distribution of particle speeds, which is derived from the Boltzmann distribution, is a maximum entropy distribution. Here we are referring to the Gibbs entropy of statistical mechanics. A maximum entropy distribution then is the unique distribution that maximizes the Gibbs entropy, subject to constraints imposed on the system. In the canonical case these constraints consist of a fixed number of particles and a fixed total energy, which together guarantee a fixed average energy equal to $k_B T$, where $k_B$ is the Boltzmann constant and $T$ is temperature. Moreover, any other distribution of particle energy states satisfying these constraints would coincide with a lower Gibbs entropy. Jaynes (1957a, b) elaborated the significance of the fact that the Gibbs entropy in statistical mechanics and the Shannon entropy in information theory are essentially one and the same, differing only by a constant. This similarity inspired Jaynes to champion the use of a maximum entropy criterion in choosing a probability distribution, leading to what is now known as the maximum entropy method (a.k.a. MaxEnt or MEM). The key idea of the maximum entropy method, whether viewed as a method of statistical mechanics or as one of inferential statistics, is that it provides an unbiased choice of a distribution by honoring only what is known mechanically about a system. That is, this unbiased choice is a maximally noncommittal choice that is faithful to what we do not know; it is therefore the most reasonable choice in the absence of additional information (Jaynes, 1957a; Williamson, 2010, pp. 25 and 51). Importantly, mechanical constraints imposed on the system are part of the choice of the distribution, as opposed to empirical fitting without regard to such constraints. The maximum entropy method has been applied in a remarkable variety of fields (Shore and Johnson, 1980; Ramirez and Carta, 2006; Verkley and Lynch, 2009; Singh, 2011; Peterson et al., 2013), including sediment transport (Furbish and Schmeeckle, 2013; Furbish et al., 2016).
In using the maximum entropy method, constraints imposed on the system normally translate to constraints imposed on the moments of the distribution. In this case the method leads to a distribution that is among the exponential family (e.g., exponential, Gaussian). However, applications of the maximum entropy method to non-exponential distributions, including heavy-tailed distributions, are of particular interest in many problems (Peterson et al., 2013). As described below, applying this method to heavy-tailed distributions presents a special challenge in that the first or second moment, or both of these moments, may be undefined for such distributions, including the generalized Pareto distribution (Pickands, 1975; Hosking and Wallis, 1987).
In Sect. 2 we provide background material, namely, the essential elements of the formulation of Furbish et al. (2021a) leading to the generalized Pareto distribution of particle travel distances, and a summary of the properties and derivation of a maximum entropy distribution. In Sect. 3 we describe how the energetic cost associated with collisional friction is expressed as a constraint used in the maximization method. In Sect. 4 we show how the generalized Pareto distribution is obtained as a maximum entropy distribution. In Sect. 5 we describe the probabilistic properties and significance of the energetic cost. We consider the implications of the analysis in the final section. In the fourth companion paper (Furbish and Doane, 2021) we step back and examine the philosophical underpinning of the statistical mechanics framework for describing sediment particle motions and transport.

Elements of the distribution of travel distances
With reference to Fig. 1, let $x$ denote the particle travel distance with probability density function $f_x(x)$. The theoretical formulation (Furbish et al., 2021a) then begins with the particle disentrainment rate function defined by

$$P_x(x) = \frac{f_x(x)}{R_x(x)} = \frac{f_x(x)}{1 - F_x(x)}, \tag{1}$$

where $R_x(x) = 1 - F_x(x)$ is the exceedance probability function and $F_x(x)$ is the cumulative distribution function. The disentrainment rate $P_x(x)$ may be interpreted as a conditional probability per unit distance. Namely, upon multiplying both sides of Eq. (1) by $dx$, then $P_x(x)\,dx = f_x(x)\,dx/R_x(x)$ is interpreted as the probability that a particle will become disentrained within the small interval $x$ to $x + dx$, given that it "survived" travel to the distance $x$. In turn, upon rearranging Eq. (1) and making use of the fact that $f_x(x) = -dR_x(x)/dx$, the density $f_x(x)$ is obtained from

$$f_x(x) = P_x(x)\exp\left[-\int_0^x P_x(x')\,dx'\right]. \tag{2}$$

Thus, the significance of the disentrainment rate function becomes clear: it completely determines the density $f_x(x)$ via Eq. (2). Equations (1) and (2) are standard elements of survival (or reliability) analysis, without reference to entropy.

The particle energy balance formulated in Furbish et al. (2021a) leads to the result that for a given particle size and shape the disentrainment rate on an inclined surface with uniform slope and roughness is

$$P_x(x) = \frac{1}{Ax + B}. \tag{3}$$

Substituting Eq. (3) into Eq. (2) then leads to the generalized Pareto distribution,

$$f_x(x) = \frac{B^{1/A}}{(Ax + B)^{1 + 1/A}}, \tag{4}$$

where $A \in \mathbb{R}$ is a shape parameter and $B > 0$ is a scale parameter (Pickands, 1975; Hosking and Wallis, 1987). The cumulative distribution is

$$F_x(x) = 1 - \frac{B^{1/A}}{(Ax + B)^{1/A}} \tag{5}$$

and the exceedance probability is

$$R_x(x) = \frac{B^{1/A}}{(Ax + B)^{1/A}}. \tag{6}$$

For $A < 1$ the mean is

$$\mu_x = \frac{B}{1 - A}, \tag{7}$$

and for $A < 1/2$ the variance is

$$\sigma_x^2 = \frac{B^2}{(1 - A)^2(1 - 2A)}. \tag{8}$$
The mean is undefined for $A \geq 1$ and the variance is undefined for $A \geq 1/2$. In mechanical terms the shape parameter $A$ (Eq. 9) reflects the balance between gravitational heating and frictional cooling, and the scale parameter is

$$B = \frac{\alpha}{\gamma}\frac{E_{a0}}{mg\mu\cos\theta}. \tag{10}$$

Here, $S$ is the magnitude of the slope inclined at an angle $\theta$, $m$ is particle mass, $g$ is acceleration due to gravity, $\mu$ is a friction factor due to extraction of particle kinetic energy $E_p = (m/2)u^2$ where $u$ is the surface-parallel particle velocity, $E_a = \overline{E_p}$ is the arithmetic average particle energy so that $E_{a0}$ is the initial average energy at $x = 0$, $\gamma = E_a/E_h$ where $E_h$ is the harmonic average particle energy, and $\alpha = \alpha_0/(1 - \mu_1 \mathrm{Ki})$ where $\alpha_0$ and $\mu_1$ are factors of order unity and $\mathrm{Ki}$ is the Kirkby number defined by

$$\mathrm{Ki} = \frac{S}{\mu}, \tag{11}$$

which represents the ratio of gravitational heating to frictional cooling. Here we emphasize that $mg\cos\theta$ in Eq. (10) is not to be interpreted as the static normal weight of the particle, and $\mu$ is not interpreted as a Coulomb-like friction coefficient. Rather, $\mu \sim \beta_x$, where $\beta_x$ denotes the expected proportion of particle kinetic energy extracted per particle–surface collision during downslope motion. Details are provided in Furbish et al. (2021a, b).

For plotting purposes we define a characteristic particle cooling distance $X = E_{a0}/(mg\mu\cos\theta)$ and in turn define dimensionless quantities denoted by circumflexes, including the dimensionless travel distance $\hat{x} = x/X$. In addition, $a = A$ and $b = (\alpha/\gamma)\hat{E}_{a0}$. Then the dimensionless form of the generalized Pareto distribution, Eq. (4), is written as

$$f_{\hat{x}}(\hat{x}) = \frac{b^{1/a}}{(a\hat{x} + b)^{1 + 1/a}}. \tag{13}$$

For $a < 0$ the density $f_{\hat{x}}(\hat{x})$ is bounded at $\hat{x} = b/|a|$ (Fig. 2). This density increases with $\hat{x}$ for $a < -1$, it is uniform for $a = -1$, and it decreases with $\hat{x}$ for $a > -1$. It is triangular for $a = -1/2$. For $a = 0$ the density $f_{\hat{x}}(\hat{x})$ is exponential. For $a > 0$ this density is heavy-tailed. For $a \geq 1$ the mean of $f_{\hat{x}}(\hat{x})$ is undefined, and for $a \geq 1/2$ the variance is undefined.

We note that the definition of the differential entropy given in the next section involves the logarithm of the probability density function. In a strict sense this is acceptable only if the density is expressed in dimensionless form as in Eq. (13) or if the definition involves a discrete probability mass function. Nonetheless, the maximization method removes this logarithm such that the outcome is dimensionally the same whether one starts with the dimensional form or the dimensionless form of the density. For simplicity we use the dimensional form, Eq. (4). In addition, for simplicity in plotting we set the scale parameter $B = 1$ in calculated functions containing this parameter, and in several plots we use dimensional abscissa values (e.g., distance $x$) without reference to units, noting that these have the same visual appearance as if plotted using dimensionless values.

Figure 2. Plot of dimensionless probability density $f_{\hat{x}}(\hat{x})$ versus dimensionless travel distance $\hat{x}$ for scale parameter $b = 1$ and different values of the shape parameter $a$ for (a) $a < 0$ and (b) $a \geq 0$ with associated exceedance probability plots (insets). Figure reproduced from companion paper (Furbish et al., 2021a). Compare with Fig. 1 in Hosking and Wallis (1987).
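As a numerical illustration of how the disentrainment rate determines the density, the following minimal sketch evaluates the survival-analysis relation $f_x(x) = P_x(x)\exp[-\int_0^x P_x(x')\,dx']$ with $P_x(x) = 1/(Ax + B)$ and compares it with the closed-form generalized Pareto density. The function names, the midpoint-rule integration and the test values of $A$, $B$ and $x$ are illustrative assumptions, not part of the original formulation.

```python
import math

def disentrainment_rate(x, A, B):
    """Disentrainment rate P_x(x) = 1 / (A x + B)."""
    return 1.0 / (A * x + B)

def gpd_exceedance(x, A, B):
    """Exceedance probability R_x(x); A = 0 is the exponential limit."""
    if A == 0.0:
        return math.exp(-x / B)
    return (B / (A * x + B)) ** (1.0 / A)

def gpd_density(x, A, B):
    """Generalized Pareto density f_x(x); A = 0 is the exponential limit."""
    if A == 0.0:
        return math.exp(-x / B) / B
    return B ** (1.0 / A) / (A * x + B) ** (1.0 + 1.0 / A)

def density_from_rate(x, A, B, steps=20000):
    """f_x(x) = P_x(x) exp(-integral_0^x P_x), integral by the midpoint rule."""
    h = x / steps
    integral = h * sum(disentrainment_rate((i + 0.5) * h, A, B)
                       for i in range(steps))
    return disentrainment_rate(x, A, B) * math.exp(-integral)

if __name__ == "__main__":
    B = 1.0   # scale parameter (illustrative value)
    x = 0.8   # must satisfy x < B/|A| when A < 0
    for A in (-0.5, 0.0, 0.5):  # net cooling, isothermal, net heating
        print(A, gpd_density(x, A, B), density_from_rate(x, A, B))
```

For each value of $A$ the two printed densities should agree to within the integration error, which is one way to see that the disentrainment rate alone fixes the distribution.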
Following Furbish et al. (2021b) we calculate the quantities

$$R_* = R_x^{A} \quad \text{and} \quad x_* = \frac{A}{B}x + 1. \tag{14}$$

Based on Eq. (6), values of the modified exceedance probability $R_*$ and the dimensionless travel distance $x_*$ should collapse to a straight line in a log-log plot with slope of $-1$ (Fig. 3). The data in this figure, spanning more than 3 orders of magnitude of the dimensionless travel distance $x_*$, are compiled from Furbish et al. (2021b; Fig. 16 therein). Values of $A$ and $B$ are estimated from laboratory measurements of particle travel distances reported by Gabet and Mendoza (2012) and Furbish et al. (2021b) and from field-based measurements of travel distances reported by DiBiase et al. (2017) and Roth et al. (2020). This plot does not prove, but nonetheless supports, the idea that the generalized Pareto distribution correctly describes the energetics of the behavior of rarefied particle motions for a variety of slope and surface roughness conditions. The data fits for individual experiments with detailed explanation are presented in Furbish et al. (2021b).
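The collapse described above can be checked with synthetic values. The sketch below assumes that the modified quantities are $R_* = R_x^A$ and $x_* = Ax/B + 1$, which is our reading of the collapse implied by the exceedance probability $R_x(x) = [B/(Ax + B)]^{1/A}$; the parameter pairs and the least-squares helper are hypothetical and chosen only for illustration.

```python
import math

def exceedance(x, A, B):
    """Generalized Pareto exceedance probability for A != 0."""
    return (B / (A * x + B)) ** (1.0 / A)

def collapse(x, R, A, B):
    """Return (x_star, R_star), assuming x* = 1 + A x / B and R* = R^A."""
    return 1.0 + A * x / B, R ** A

def loglog_slope(pairs):
    """Least-squares slope of ln(R*) against ln(x*)."""
    lx = [math.log(xs) for xs, _ in pairs]
    ly = [math.log(rs) for _, rs in pairs]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

def synthetic_pairs():
    """Pool (x*, R*) values from several illustrative (A, B) pairs."""
    pairs = []
    for A, B in ((-0.4, 2.0), (0.3, 0.5), (0.8, 1.5)):
        # Stay below the upper bound B/|A| when A < 0.
        x_max = 0.9 * B / abs(A) if A < 0 else 50.0 * B
        for i in range(1, 21):
            x = x_max * i / 20
            pairs.append(collapse(x, exceedance(x, A, B), A, B))
    return pairs

if __name__ == "__main__":
    print(loglog_slope(synthetic_pairs()))
```

Because the synthetic exceedance values are exact, the fitted slope should equal $-1$ to within rounding; with measured travel distances, scatter about this line reflects both sampling variability and uncertainty in the estimates of $A$ and $B$.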

Maximum entropy distribution
If $x$ denotes a continuous random variable with probability density $f_x(x)$ over $x = [0, \infty)$, then the differential entropy of $x$ is defined as

$$H(x) = -\int_0^\infty f_x(x)\ln f_x(x)\,dx, \tag{15}$$

where it is understood that $f_x(x)\ln f_x(x) = 0$ wherever $f_x(x) = 0$. Given the lineage of this definition, hereafter we follow Peterson et al. (2013) and refer to it as the Boltzmann-Gibbs-Shannon (BGS) entropy. In turn, let $g_j(x)$ denote a measurable quantity of $x$ with $j = 0, 1, 2, \ldots, n$. We then assume that

$$E[g_j(x)] = \int_0^\infty g_j(x) f_x(x)\,dx = a_j \tag{16}$$

with finite $a_j$, where $E[\,\cdot\,]$ denotes the expectation. For example, if $g_0(x) = g_0 = 1$, then Eq. (16) gives $a_0 = 1$. That is, the density $f_x(x)$ integrates to unity. If $g_1(x) = x$, then Eq. (16) gives the mean of the distribution, $a_1 = \mu_x$. If $g_2(x) = (x - \mu_x)^2$, then Eq. (16) gives the variance, $a_2 = \sigma_x^2$. Note, however, that $g_j(x)$ need not be selected just to obtain the usual moments of a distribution. Indeed, Eq. (16) may represent a constraint imposed by a function $g_j(x)$ that does not coincide with a moment of $f_x(x)$. As described below, this is essential for heavy-tailed distributions whose first or second moment, or both of these moments, is undefined.

The maximum entropy distribution is then given by

$$f_x(x) = \exp\left[\sum_{j=0}^{n}\lambda_j g_j(x)\right], \tag{17}$$

where $\lambda_0, \lambda_1, \lambda_2, \ldots$ are Lagrange multipliers introduced in the problem of maximizing the entropy $H(x)$ (Appendix A). Moreover, as above we set $g_0(x) = g_0 = 1$ with $a_0 = 1$, which guarantees that the probability density $f_x(x)$ integrates to unity. As a point of reference, a fixed mean with $g_1(x) = x$ and no other constraint leads to the result

$$f_x(x) = e^{\lambda_0} e^{\lambda_1 x}. \tag{18}$$

The Lagrange multipliers are then obtained as follows. By the definition of a probability density,

$$\int_0^\infty e^{\lambda_0} e^{\lambda_1 x}\,dx = -\frac{e^{\lambda_0}}{\lambda_1} = 1, \tag{19}$$

which leads to $e^{\lambda_0} = -\lambda_1$. Alternatively, Eq. (18) sometimes is presented as (e.g., Tolman, 1938; Schrödinger, 1946; Furbish and Schmeeckle, 2013)

$$f_x(x) = \frac{e^{\lambda_1 x}}{\int_0^\infty e^{\lambda_1 x}\,dx}, \tag{20}$$

where it becomes clear that $e^{\lambda_0}$ is a normalization factor that ensures the probability density integrates to unity.
In turn, by the definition of the mean,

$$\int_0^\infty x(-\lambda_1)e^{\lambda_1 x}\,dx = -\frac{1}{\lambda_1} = \mu_x, \tag{21}$$

which leads to $\lambda_1 = -1/\mu_x$ and the exponential distribution,

$$f_x(x) = \frac{1}{\mu_x}e^{-x/\mu_x}, \tag{22}$$

where it becomes clear that the Lagrange multiplier $\lambda_1$ enforces the constraint of a fixed mean. The Gaussian distribution is similarly obtained as the maximum entropy distribution with the constraint imposed by a fixed second moment (variance). The canonical example of the Boltzmann distribution of particle energy states is obtained in this manner as a maximum entropy distribution, where the mean is independently determined to be $k_B T$ (e.g., Schrödinger, 1946). The imposed constraints consist of extensive quantities that scale with system size: a fixed number of particles and a fixed total energy, which together guarantee a fixed mean energy. In a similar manner, Furbish and Schmeeckle (2013) and Furbish et al. (2016) derive an exponential distribution for the streamwise velocity states of particles transported as bed load, with the mechanical constraint imposed by a fixed total particle momentum under equilibrium transport conditions.

Our next task is to adapt these ideas to the generalized Pareto distribution, which is not among the exponential family of distributions. We note that there is a continuing effort given to this topic, notably in relation to heavy-tailed (non-exponential) distributions. Peterson et al. (2013) summarize the basis of these efforts and note that one approach for inferring non-exponential distributions is to appeal to nontraditional definitions of the entropy, for example, the Tsallis entropy (Tsallis, 1988), rather than the canonical BGS entropy. The procedure is the same: to maximize the defined entropy subject to an extensive constraint that scales with the system size. Here, however, we adopt the view of Peterson et al. (2013), who highlight the conclusions of Shore and Johnson (1980).
Namely, because the BGS definition of entropy uniquely ensures addition and multiplication rules of probability, any other definition of entropy yields a bias in the fitting of data. Peterson et al. (2013) suggest that this offers a "compelling first-principles basis for defining a proper variational principle for modeling distribution functions". Like these authors in their analysis of the energetics associated with the economics of scale, we retain the BGS definition of entropy and seek a non-extensive energy constraint aligned with the mechanics of the rarefied particle motion problem.
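To make the fixed-mean example concrete, the following sketch numerically compares the differential entropy of the exponential density with that of two other densities on $[0, \infty)$ having the same mean. The comparison densities (uniform and half-normal), the truncation of the integral at a finite upper limit and all function names are illustrative assumptions; the exponential case uses the multipliers $\lambda_1 = -1/\mu_x$ and $e^{\lambda_0} = -\lambda_1$.

```python
import math

def entropy(density, upper=60.0, steps=120000):
    """Differential entropy -integral f ln f over [0, upper], midpoint rule.

    Points where the density is zero contribute nothing (f ln f -> 0).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        f = density((i + 0.5) * h)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total

def exponential_density(mu):
    """Maximum entropy density for a fixed mean: exp(lam0 + lam1 * x)."""
    lam1 = -1.0 / mu           # enforces the fixed mean
    lam0 = math.log(-lam1)     # normalization, e^lam0 = -lam1
    return lambda x: math.exp(lam0 + lam1 * x)

def uniform_density(mu):
    """Uniform density on [0, 2 mu], which has mean mu."""
    return lambda x: 1.0 / (2.0 * mu) if x <= 2.0 * mu else 0.0

def half_normal_density(mu):
    """Half-normal density with mean mu (sigma = mu * sqrt(pi / 2))."""
    sigma = mu * math.sqrt(math.pi / 2.0)
    c = math.sqrt(2.0 / math.pi) / sigma
    return lambda x: c * math.exp(-x * x / (2.0 * sigma * sigma))

if __name__ == "__main__":
    mu = 1.0
    for name, make in (("exponential", exponential_density),
                       ("half-normal", half_normal_density),
                       ("uniform", uniform_density)):
        print(name, entropy(make(mu)))
```

With a shared mean the exponential density should return the largest entropy of the three, consistent with its role as the maximum entropy distribution when the mean is the only constraint.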

Energetic cost as a maximizing constraint
In the canonical example of the Boltzmann distribution, the particle energy state is an instantaneous quantity. Similarly, in the example of bed load particle velocities (Furbish and Schmeeckle, 2013;Furbish et al., 2016), the velocity state is an instantaneous quantity. The state of a particle changes from one instant to the next, and this state can be reached from smaller or larger state values. In these cases, the total particle energy and the total streamwise momentum are well-defined extensive quantities such that the moments of the distributions are fixed. In the absence of additional information, the maximum entropy distribution must be among the exponential family.
In contrast to instantaneous quantities, the particle travel distance x is an integrated quantity that reflects a dynamical particle history starting from the state x = 0. The state x must be reached from smaller (unrecorded) state values; it cannot be reached from larger state values. Moreover, travel distances are not like an extensive quantity that scales linearly with the system size. Nonetheless, particle motions require a source of energy and dissipation of energy. Following Peterson et al. (2013) we assume that the outcome of motions (the travel distances x) can be represented in terms of an energetic cost that probabilistically constrains the organization of a great number of particles into accessible states x.
The disentrainment rate $P_x(x)$ has special significance in defining the energetic cost. In particular, this rate determines the energetic cost associated with reaching the state $x$. We start by using Eq. (10) to rewrite Eq. (3) as

$$P_x(x) = \frac{(\gamma/\alpha)\,mg\mu\cos\theta}{E_{a0} + (\gamma/\alpha)\,mg\mu\cos\theta\,Ax}. \tag{23}$$

The denominator in Eq. (23) describes how the average particle energy $E_a(x)$ varies with $x$, whether this involves net cooling ($A < 0$), isothermal conditions ($A = 0$) or net heating ($A > 0$). The quantity $mg\mu\cos\theta$ in the numerator is the expected spatial rate at which energy is extracted by collisional friction, modulated by the factor $\gamma/\alpha$. Thus, the disentrainment rate represents the local relative energetic cost: the spatial rate at which particle energy is extracted per unit kinetic energy available during motion at position $x$.
In turn, the relative energy extracted within a small interval $dx$ is $P_x(x)\,dx$, so the cumulative energy extracted per unit kinetic energy available is

$$w(x) = \int_0^x P_x(x')\,dx'. \tag{24}$$

This is the cumulative energetic cost in reaching position $x$. For isothermal conditions ($A = 0$) the cumulative cost is

$$w(x) = \frac{x}{B}. \tag{25}$$

For non-isothermal conditions ($A \neq 0$) the cumulative cost is

$$w(x) = \frac{1}{A}\ln\left(\frac{A}{B}x + 1\right). \tag{26}$$

These two expressions for $w(x)$ converge at small $x$ (Fig. 4).
Relative to the linear cumulative cost of isothermal conditions, Eq. (25), the cumulative cost with net cooling (A < 0) increases more rapidly up to the limiting distance given by x = B/|A|, and the cumulative cost with net heating (A > 0) increases more slowly with increasing distance x.
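The behavior of the cumulative cost can be sketched numerically. The block below integrates the disentrainment rate $1/(Ax + B)$ in closed form, giving $x/B$ for $A = 0$ and $(1/A)\ln(Ax/B + 1)$ otherwise, and cross-checks the result against a midpoint-rule integration. The function names and parameter values are illustrative assumptions.

```python
import math

def cumulative_cost(x, A, B):
    """Cumulative energetic cost w(x), the integral of 1/(A x' + B) on [0, x]."""
    if A == 0.0:
        return x / B                      # isothermal: linear in x
    return math.log1p(A * x / B) / A      # (1/A) ln(A x / B + 1)

def cumulative_cost_numeric(x, A, B, steps=20000):
    """Midpoint-rule integration of the local cost, as a cross-check."""
    h = x / steps
    return h * sum(1.0 / (A * (i + 0.5) * h + B) for i in range(steps))

if __name__ == "__main__":
    B = 1.0
    for A in (-0.5, 0.0, 0.5):            # cooling, isothermal, heating
        print(A, [round(cumulative_cost(x, A, B), 4)
                  for x in (0.01, 0.5, 1.0, 1.5, 1.99)])
```

At small $x$ the three cases nearly coincide; at larger $x$ the net-cooling cost rises above the linear isothermal cost and grows without bound as $x$ approaches $B/|A|$, whereas the net-heating cost falls below it.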
Consider first the isothermal case to illustrate the significance of the cost w(x). This cost increases linearly with the distance x. Let N denote a great number of particles. Among all accessible microstates (the many ways of arranging N particles into states x where each arrangement has a fixed total cost), most microstates involve particles with small state values and fewer with large state values. As shown below, this constraint leads to an exponential distribution. Note that Furbish and Schmeeckle (2013) provide a detailed description of the analysis leading to this outcome, including the basis for counting microstates (see Fig. 3 and Appendix B therein), as applied to particle momentum states rather than travel distance states x. Nonetheless the analysis is otherwise conceptually identical. Tolman (1938) and Schrödinger (1946) provide clear descriptions of the canonical problem (in particular see chap. II, "The Method of the Most Probable Distribution", in Schrödinger's text).
With non-isothermal conditions and net heating, it is easier to achieve larger state values than with isothermal conditions. Among all accessible microstates, an increasing proportion will have particles in larger states than would be predicted with a uniform cost rate. In contrast, with net cooling a smaller proportion of microstates will have particles in large states x with an increasing relative cost to achieve these large states. Indeed, there is a limit on available energy to be spent in frictional cooling such that the relative cost goes to infinity at x = B/|A|. As shown below, these constraints lead to the generalized Pareto distribution.
The energetic cost w(x) is a natural choice for constraining the maximization method. As described in Sect. 6 ("Discussion and conclusions"), this choice is identical in form to the language of "cost" in the economics of scale (Peterson et al., 2013) leading to non-exponential (heavy-tailed) distributions of state values. We use these ideas next in deriving the maximum entropy distribution.

Constraints
Focusing on the generalized Pareto distribution, as above we start with the constraint given by $g_0(x) = g_0 = 1$, namely,

$$\int_0^\infty f_x(x)\,dx = 1. \tag{27}$$

A second, strong mechanical constraint is provided by assuming that the total cumulative energetic cost associated with collisional friction is fixed. Starting with Eq. (24),

$$w(x) = \int_0^x P_x(x')\,dx', \tag{28}$$

which is the cumulative energy extracted by friction per unit kinetic energy available in reaching position $x$. Then,

$$E[w(x)] = \int_0^\infty w(x) f_x(x)\,dx = \mu_w, \tag{29}$$

which is the average cumulative cost. Starting with isothermal conditions ($A = 0$), the disentrainment rate $P_x(x) = P_x = 1/B$. This gives

$$E[w(x)] = \frac{1}{B}\int_0^\infty x f_x(x)\,dx = \frac{\mu_x}{B} = 1, \tag{30}$$

which shows that the expected cumulative relative cost is unity with $\mu_x = B$. This is nominally the same as saying that the expected absolute cost is equal to the initial available energy $E_{a0}$. More generally with $P_x(x) = 1/(Ax + B)$,

$$E[w(x)] = \frac{1}{A}E\left[\ln\left(\frac{A}{B}x + 1\right)\right] = \mu_w. \tag{31}$$

We use these two results in the maximization of entropy.
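As a hedged illustration of how a cost constraint of this kind connects to the generalized Pareto form, the sketch below evaluates a density of the maximum entropy form $\exp[\lambda_0 + \lambda_1 w(x)]$, with $w(x)$ the cumulative cost, and compares it with the generalized Pareto density. The multiplier values $\lambda_1 = -(1 + A)$ and $e^{\lambda_0} = 1/B$ are assumptions chosen here because they make the two expressions coincide algebraically; they are not quoted from the derivation in the paper. The block also checks normalization and the isothermal expected cost $\mu_x/B = 1$ by midpoint integration.

```python
import math

def cost(x, A, B):
    """Cumulative cost w(x) = x/B (A = 0) or (1/A) ln(A x / B + 1)."""
    return x / B if A == 0.0 else math.log1p(A * x / B) / A

def maxent_density(x, A, B):
    """exp(lam0 + lam1 * w(x)) with assumed multipliers (see lead-in)."""
    lam0 = -math.log(B)       # assumed: e^lam0 = 1/B
    lam1 = -(1.0 + A)         # assumed: lam1 = -(1 + A)
    return math.exp(lam0 + lam1 * cost(x, A, B))

def gpd_density(x, A, B):
    """Generalized Pareto density; A = 0 is the exponential limit."""
    if A == 0.0:
        return math.exp(-x / B) / B
    return B ** (1.0 / A) / (A * x + B) ** (1.0 + 1.0 / A)

def integrate(func, upper, steps=100000):
    """Midpoint-rule integral of func over [0, upper]."""
    h = upper / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))

if __name__ == "__main__":
    for A, B, x in ((-0.5, 1.0, 1.2), (0.0, 2.0, 3.0), (0.5, 2.0, 3.0)):
        print(A, maxent_density(x, A, B), gpd_density(x, A, B))
    # Normalization for a bounded (net cooling) case, upper bound B/|A| = 2.
    print(integrate(lambda x: maxent_density(x, -0.5, 1.0), 2.0))
    # Isothermal expected cost, which should be close to mu_x / B = 1.
    print(integrate(lambda x: cost(x, 0.0, 2.0) * gpd_density(x, 0.0, 2.0), 80.0))
```

The design choice of writing the density as a function of $w(x)$ rather than of $x$ mirrors the idea that the constraint is imposed on the cost and not on a moment of the travel distance.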

Cumulative energetic cost
Because of the importance of the energetic cost as a constraint in the maximum entropy method, here we examine the properties of this cost. The cumulative energetic cost $w$ is a monotonic function of the travel distance $x$ according to Eqs. (25) and (26), so we can readily deduce (Appendix B) the probability density function $f_w(w)$ of the cost $w$. For isothermal conditions ($A = 0$) this density is

$$f_w(w) = e^{-w} \tag{37}$$

with mean $\mu_w = 1$. The cumulative distribution is

$$F_w(w) = 1 - e^{-w}. \tag{38}$$

For non-isothermal conditions ($A \neq 0$) the density $f_w(w)$ (Eq. 39; Appendix B) has attributes of an extreme value distribution. The mean $\mu_w$ (Eq. 40) involves the exponential integral, denoted by Ei, and the cumulative distribution $F_w(w)$ is given by Eq. (41). Note that these functions depend on the shape parameter $A$ but not on the scale parameter $B$.

Whereas the generalized Pareto distribution of travel distances $x$ for net cooling ($A < 0$) is bounded at $x = B/|A|$ (Fig. 2), the probability density $f_w(w)$ of energetic costs $w$ is unbounded (Fig. 5). For isothermal conditions ($A = 0$) the cost $w$ is linearly related to the travel distance $x$, so the distribution $f_w(w)$ has the same exponential form as $f_x(x)$. With net cooling ($A < 0$) the distribution $f_w(w)$ is heavy-tailed, and with net heating ($A > 0$) it is light-tailed. With cooling the energetic cost $w$ increases with distance $x$ up to $x = B/|A|$, so probability is shifted to larger values of $w$. With heating the energetic cost decreases with distance $x$, so probability is shifted to lower values of $w$ with increasing $A$.

Over the domain $-1 \leq A \leq 1$ the average cost $\mu_w$ has a maximum at an intermediate value of $A \approx -0.33$ (Fig. 6). For conditions to the left of the maximum the relative costs of motions are large, but the travel distances are small. For conditions to the right of the maximum the travel distances are larger but with smaller relative costs. For conditions of net heating ($A > 0$) the travel distances increase, but the relative costs decrease.
The total cumulative cost $W(w)$ up to the value $w$ is

$$W(w) = \int_0^w w' f_w(w')\,dw'. \tag{42}$$

Alternatively, the total cumulative cost up to the distance $x$ is

$$W(x) = \int_0^x w(x') f_x(x')\,dx'. \tag{43}$$

Expressions for Eqs. (42) and (43) are provided in Appendix B and show how the total costs $W(w)$ and $W(x)$ grow with increasing $w$ and $x$ to a finite value. Consider here the product $W_*(x) = w(x) f_x(x) = dW(x)/dx$, which is the total cost per unit travel distance. This function is like a frequency-magnitude product and reflects the relative contribution to the total cost of different parts of the travel distance domain (Fig. 7). For net cooling ($A < 0$) and large negative $A$ the total cost is dominated by the high individual costs of the largest travel distances near the upper bound given by $x = B/|A|$. With increasing $A$ the cost becomes more evenly distributed. At isothermal conditions ($A = 0$) the total cost is dominated by travel distances near the mean distance. For net heating the total cost is dominated by the relatively large individual costs and proportions of small travel distances, although the contribution of large travel distances grows with increasing $A$.
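The frequency-magnitude product can be explored with a short numerical sketch. The block below evaluates $W_*(x) = w(x) f_x(x)$ on a grid, locates its peak and integrates it over the travel distance domain. The grid search, the finite integration limit and the parameter values are illustrative assumptions; for $A < 0$ the caller must keep the upper limit below the bound $B/|A|$.

```python
import math

def cost(x, A, B):
    """Cumulative cost w(x) = x/B (A = 0) or (1/A) ln(A x / B + 1)."""
    return x / B if A == 0.0 else math.log1p(A * x / B) / A

def gpd_density(x, A, B):
    """Generalized Pareto density; A = 0 is the exponential limit."""
    if A == 0.0:
        return math.exp(-x / B) / B
    return B ** (1.0 / A) / (A * x + B) ** (1.0 + 1.0 / A)

def cost_product(x, A, B):
    """Frequency-magnitude product W*(x) = w(x) f_x(x)."""
    return cost(x, A, B) * gpd_density(x, A, B)

def peak_location(A, B, upper, steps=20000):
    """Grid search for the distance at which W*(x) is largest on (0, upper)."""
    h = upper / steps
    best = max(range(1, steps), key=lambda i: cost_product(i * h, A, B))
    return best * h

def total_cost(A, B, upper, steps=100000):
    """Midpoint-rule integral of W*(x) over [0, upper]."""
    h = upper / steps
    return h * sum(cost_product((i + 0.5) * h, A, B) for i in range(steps))

if __name__ == "__main__":
    B = 2.0
    print(peak_location(0.0, B, upper=20.0))   # isothermal peak
    print(total_cost(0.0, B, upper=60.0))      # isothermal total cost
    print(peak_location(-0.5, 1.0, upper=1.999))  # net cooling, bound at 2
```

For the isothermal case the peak should fall at the mean distance $\mu_x = B$ and the total should be close to unity, in line with the statement that the total cost is then dominated by travel distances near the mean.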

Frictional loss to heat
The energetic cost outlined above pertains to the conversion of translational kinetic energy into other forms, including rotational energy, surface deformation and heat, all under the heading of collisional friction. This cost, however, is not the same as the total energy conversion to heat. Consider the total energy extracted by friction and ultimately converted to heat. Note first that the quantity $mg\sin\theta$ at first glance normally is interpreted as the downslope component of the weight of a particle (or control volume) with mass $m$. In energetic terms, however, this quantity is to be interpreted as the accessible gravitational potential energy per unit downslope travel distance (Furbish et al., 2021a). For an individual particle with initial kinetic energy $E_{p0}$ traveling a distance $x$ the heat generated is

$$q_p = E_{p0} + mg\sin\theta\,x. \tag{44}$$

Taking the ensemble average of Eq. (44) and using Eq. (7),

$$\mu_{q_p} = E_{a0} + mg\sin\theta\,\mu_x = E_{a0} + mg\sin\theta\,\frac{B}{1 - A}. \tag{45}$$

The total heat generated by $N$ particles is then $N\mu_{q_p}$. As a fun point of reference, 100 particles, each with a diameter of 0.1 m and an average starting velocity of 1 m s$^{-1}$ traveling an average distance of 10 m down a 30° slope, produce about 0.32 J of heat, the equivalent of an ordinary 100 W light bulb turned on for 0.0032 s. On the other hand, for a million similar particles traveling an average of 100 m down a 45° slope, we must leave the light bulb on for nearly 8 min.

This result offers an example of how application of the maximum entropy method can be misleading. Namely, suppose we assume that a total fixed quantity of heat generated by particle motions, because this is an energetic "cost", provides a constraint on the maximization procedure. In this situation, and with no further constraints, the maximum entropy method leads to an exponential distribution $f_{q_p}(q_p)$ of heat states $q_p$ with mean $\mu_{q_p} = E_{a0} + mg\sin\theta\,\mu_x$. Because $q_p$ and $x$ are linearly related, then using Eq. (B1) (Appendix B) the distribution $f_x(x)$ of travel distances $x$ would be exponential.
Note that at this point, however, the mean travel distance µ x is not well constrained, as no mechanical information is provided for how particles achieve the distance states x. Whereas the choice of an exponential distribution for f q p (q p ) is a maximally unbiased choice, it almost certainly is incorrect. We comment further on this type of naïve use of the maximum entropy method below.

Discussion and conclusions
Let us acknowledge that a distribution identified as a maximum entropy distribution based on empirically constraining one or more of its moments is not necessarily a special outcome. For example, we frequently fit data to exponential and Gaussian distributions based on estimates of the mean and variance of these distributions, assuming these moments exist and are finite, without reference to maximum entropy. In other words, asserting that a random variable possesses a finite expected value (mean or variance) and then using this assertion to choose the distribution based on the maximum entropy method has no meaningful mechanical significance if the mechanical basis of the constraint is not specified. In this situation a maximum entropy criterion is just one among numerous inferential methods, albeit with the decided merit of being maximally indifferent in the choosing of the distribution. Only when the constraining moment has independent mechanical meaning, and in the absence of additional information, does the label of maximum entropy carry mechanical significance. The example of heat states $q_p$ described in Sect. 5.2 illustrates this point.
For example, Furbish et al. (2016) suggest the following: In focusing on the mechanical side of the duality of Jaynes's principle [of maximum entropy], it becomes important to distinguish between a "strong" mechanical constraint, a "weak" mechanical constraint, and an empirical constraint, as these inform confidence in the resulting choice of a distribution . . . A strong mechanical constraint is one that derives directly from a dynamics argument . . . A weak constraint is one that derives from a mechanical definition, for example, an appeal to mass conservation . . . An empirical constraint is one that appeals to our confidence in suggesting a general behavior from experiments or dimensional analysis but lacks a clear dynamics underpinning.
For rarefied bed load particles transported under equilibrium conditions, Furbish et al. (2016) show that the condition of fixed total particle momentum provides a strong mechanical constraint. In this situation the maximum entropy method predicts an exponential distribution of particle velocities in the absence of any additional mechanical information, consistent with measurements of particle velocities based on high-speed imaging (e.g., Lajeunesse et al., 2010; Roseberry et al., 2012; Furbish and Schmeeckle, 2013; Fathel et al., 2015; Wei et al., 2015). We suggest that the total cumulative energetic cost used herein to constrain the maximum entropy method similarly represents a strong mechanical constraint.
As a point of reference, the analysis presented herein is akin to the energetics associated with the economics of scale as examined by Peterson et al. (2013). To illustrate this idea we start with a binomial expansion of the disentrainment rate, Eq. (3), to give

$$P_x(x) = \frac{1}{Ax + B} = \frac{1}{B} - \frac{A}{B^2}x + \frac{A^2}{B^3}x^2 - \ldots \tag{46}$$

Momentarily focusing on the leading and first-order terms for illustration, Eq. (46) has the same form as the "communal cost-minus-benefit function" proposed by Peterson et al. (2013, Eq. (5) therein; Appendix C). Using the language of economic costs, here the state $x$ may be interpreted as the size of a community, for example, "particles forming colloidal clusters, or social processes such as people joining cities, citations added to papers, or link creation in a social network" (Peterson et al., 2013, p. 20381). The leading term in Eq. (46) may be interpreted as an intrinsic cost for an individual to achieve ("join") the state $x$. For $A > 0$ the first-order term represents a "discount" provided by the community of size $x$. For $A < 0$ the first-order term represents a "penalty" imposed by the community. If the cost is independent of size ($A = 0$), then the cost rate is fixed ($P_x = 1/B$) and the maximum entropy method leads to an exponential distribution of states $x$. If the cost is shared with increasing size ($A > 0$), then the cost of joining the state $x$ decreases with increasing size. This means that larger sizes (states) are more likely to occur than if a discount is not provided, leading to a heavy-tailed distribution of states $x$. Conversely, if joining a state $x$ involves a penalty ($A < 0$), then exclusion with increasing size occurs, leading to a light-tailed or bounded distribution of states $x$. In this analysis the idea of cost is fundamentally energetic, whether involving free energy for colloid particles, or the energy consumed by individuals in joining some form of social construct.

When rearranged, the "cost-minus-benefit" function proposed by Peterson et al. (2013) yields a cost function (Appendix C) whose form is identical to that of the disentrainment rate, Eq. (3). In the economics of scale problem the costs are nominally absolute energetic costs. In the problem of rarefied particle motions the cost function (i.e., the disentrainment rate) represents the local relative energetic cost. Nonetheless, the formalism involving a fixed total cumulative cost is essentially the same. With net particle heating it becomes easier for particles to achieve larger states $x$ relative to a fixed local energetic cost, analogous to effects of a discount in the economics of scale. With net cooling this effect is reversed, where the local relative energetic cost increases with the state $x$. The key mathematical construct of the disentrainment rate, Eq. (3), is that the state $x$ appears in the denominator of this cost function.
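The discount and penalty interpretation of the first-order term can be checked numerically. The following is a minimal sketch, assuming the disentrainment rate has the form $P_x(x) = 1/(Ax + B)$ quoted later in this section; the function names and the values of $A$, $B$ and $x$ are hypothetical and chosen only for illustration.

```python
def disentrainment_rate(x, A, B):
    """Exact cost rate P_x(x) = 1 / (A x + B)."""
    return 1.0 / (A * x + B)


def two_term_expansion(x, A, B):
    """Leading (intrinsic) term 1/B plus the first-order term -(A/B^2) x."""
    return 1.0 / B - (A / B**2) * x


B = 1.0   # illustrative scale (hypothetical value)
x = 0.1   # small state, so that |A x / B| << 1
for A in (0.5, 0.0, -0.5):  # net heating, isothermal, net cooling (hypothetical values)
    exact = disentrainment_rate(x, A, B)
    approx = two_term_expansion(x, A, B)
    first_order = approx - 1.0 / B
    if first_order < 0:
        label = "discount"
    elif first_order > 0:
        label = "penalty"
    else:
        label = "fixed cost rate"
    print(f"A={A:+.1f}  exact={exact:.5f}  two-term={approx:.5f}  {label}")
```

For small $|Ax/B|$ the two-term truncation tracks the exact rate closely, and the sign of $A$ alone decides whether the local cost falls below or rises above the intrinsic value $1/B$.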
In this problem the maximum entropy method in effect considers all possible accessible microstates: the many different ways to arrange a great number of particles into distance states $x$ where each arrangement satisfies the same fixed total energetic cost. (Figure 3 in Furbish and Schmeeckle (2013) illustrates this idea.) Then, the generalized Pareto distribution uniquely represents the most probable arrangement. This idea equally applies to the accessible microstates associated with net cooling, isothermal conditions and net heating. To elaborate this point, consider the upper bound on travel distances, $x = B/|A|$, under conditions of net cooling. This is the distance at which, according to Eq. (23), the expected available kinetic energy goes to zero such that the disentrainment rate $P_x(x)$ becomes unbounded. From this perspective, the conditional probability $P_x(x)\,\mathrm{d}x$ that motions cease within a small interval $\mathrm{d}x$ approaches unity as $x \to B/|A|$. However, this is not to be interpreted as a "hard" boundary determined by mechanical behavior. Rather, according to Eq. (26) the cumulative energetic cost becomes unbounded at the distance $x \to B/|A|$. For a small upper bound $x = B/|A|$ the total energetic cost involves contributions from all particle motions but is dominated by the large individual costs of the largest travel distances near this upper bound (Fig. 7). From this perspective, the bounded form of the distribution is just the most probable among all possible arrangements satisfying the constraint of a fixed total cost, in this case dominated by the individual costs of the largest travel distances. A similar conclusion pertains to the bounded form of the distribution as contributions of individual motions to the total cost become more broadly distributed with increasing $A$ (Fig. 7). In turn, no matter how large the upper bound $x = B/|A|$ becomes as $|A|$ approaches zero, this upper bound nonetheless remains finite.
The generalized Pareto distribution then "flips" to an exponential form with unbounded distance states only in the limit of $A \to 0^-$. In approaching this limit, the basic physics of particle motions does not change. Similarly, in approaching the limit $A \to 0^+$ from the heavy-tailed form of the generalized Pareto distribution, no changes in physics occur. That is, the essence of the balance between gravitational heating and frictional cooling by particle-surface collisions remains the same; there is nothing special or unusual about particle-surface interactions associated with crossing the isothermal transition. Thus, the most probable arrangement of distance states $x$ is in each case (net cooling, isothermal conditions and net heating) a reflection of the unifying probabilistic outcome associated with a fixed total energetic cost.
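The continuity of the distribution across the isothermal transition can be illustrated numerically. The sketch below assumes the density $f_x(x) = (1/B)(1 + Ax/B)^{-(1 + 1/A)}$, which is the form obtained from the survival formulation with $P_x(x) = 1/(Ax + B)$; the function name and the parameter values are hypothetical.

```python
import math


def gpd_density(x, A, B):
    """Density implied by P_x(x) = 1/(A x + B); zero beyond the bound when A < 0."""
    if A == 0.0:
        return math.exp(-x / B) / B          # isothermal (exponential) limit
    base = 1.0 + A * x / B
    if base <= 0.0:                          # x at or beyond the bound B/|A| (A < 0)
        return 0.0
    return base ** (-(1.0 + 1.0 / A)) / B


B, x = 1.0, 3.0                              # illustrative values (hypothetical)
exponential = gpd_density(x, 0.0, B)
for A in (-0.4, -0.1, -0.001, 0.001, 0.1, 0.4):
    bound = B / abs(A) if A < 0 else math.inf
    print(f"A={A:+.3f}  bound={bound:8.1f}  f(x)={gpd_density(x, A, B):.6f}  "
          f"exponential={exponential:.6f}")
```

With $A = -0.4$ the chosen distance lies beyond the bound $B/|A| = 2.5$, so the density is zero there; as $|A|$ shrinks from either side the density at fixed $x$ approaches the exponential value, even though the bound itself stays finite for every $A < 0$.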
Here we return to Eq. (2), the standard formulation of the probability density $f_x(x)$ presented in survival analysis,

$$f_x(x) = P_x(x)\, e^{-\int_0^x P_x(x')\,\mathrm{d}x'} , \tag{47}$$

and compare this with the entropy maximization criterion given by

$$f_x(x) = e^{\lambda_0} e^{\lambda_1 \int_0^x P_x(x')\,\mathrm{d}x'} . \tag{48}$$

Assuming the Lagrange multiplier $\lambda_1 = -(A + 1)$, then Eq. (48) becomes

$$f_x(x) = e^{\lambda_0} e^{-A \int_0^x P_x(x')\,\mathrm{d}x'} e^{-\int_0^x P_x(x')\,\mathrm{d}x'} , \tag{49}$$

which has the form of Eq. (47) with

$$P_x(x) = e^{\lambda_0} e^{-A \int_0^x P_x(x')\,\mathrm{d}x'} . \tag{50}$$

Substituting $e^{\lambda_0} = 1/B$ and $P_x(x) = 1/(Ax + B)$ into Eq. (49) and evaluating the integrals confirms that the generalized Pareto distribution is retrieved. We now have the interesting result that, for this problem, determining the distribution $f_x(x)$ according to Eq. (47) is the same as obtaining this distribution using a maximum entropy criterion. This occurs because the disentrainment rate $P_x(x)$ represents an energetic cost to particles reaching states $x$. Then, inasmuch as the total energetic cost probabilistically constrains the organization of a great number of particles into accessible states consistent with the maximization method, the resulting distribution must be a maximum entropy distribution. If instead the disentrainment rate function $P_x(x)$ is heuristically proposed or empirically fitted to data without reference to constraints imposed on the system, then the distribution obtained from Eq. (47) will be consistent with the disentrainment rate function, but this does not guarantee that the distribution is a maximum entropy choice.
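The equivalence described above can be verified numerically. The following is a sketch only, assuming $P_x(x) = 1/(Ax + B)$, $e^{\lambda_0} = 1/B$ and $\lambda_1 = -(A + 1)$ as stated in the text; the cumulative cost is evaluated by a simple trapezoidal rule, and the function names and parameter values are hypothetical.

```python
import math


def cumulative_cost(x, A, B, n=2000):
    """Trapezoidal estimate of the integral of P_x = 1/(A x' + B) from 0 to x."""
    h = x / n
    total = 0.5 * (1.0 / B + 1.0 / (A * x + B))
    for i in range(1, n):
        total += 1.0 / (A * i * h + B)
    return total * h


def survival_form(x, A, B):
    """f_x = P_x(x) * exp(-integral of P_x), the survival-analysis formulation."""
    return math.exp(-cumulative_cost(x, A, B)) / (A * x + B)


def maxent_form(x, A, B):
    """f_x = exp(lambda_0) * exp(lambda_1 * integral of P_x), with exp(lambda_0) = 1/B."""
    lam1 = -(A + 1.0)
    return math.exp(lam1 * cumulative_cost(x, A, B)) / B


def gpd_closed_form(x, A, B):
    """Generalized Pareto density B^(1/A) / (A x + B)^(1 + 1/A), for A != 0."""
    return B ** (1.0 / A) / (A * x + B) ** (1.0 + 1.0 / A)


B, x = 2.0, 1.5                       # illustrative values (hypothetical)
for A in (-0.3, 0.3):                 # net cooling (x below the bound B/|A|), net heating
    print(f"A={A:+.1f}  survival={survival_form(x, A, B):.8f}  "
          f"maxent={maxent_form(x, A, B):.8f}  closed={gpd_closed_form(x, A, B):.8f}")
```

The three values agree to within the quadrature error for both signs of $A$, which is the numerical counterpart of evaluating the integrals analytically.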
The analysis presented here represents an unusual situation. Namely, the generalized Pareto distribution of travel distances and its parametric values are known a priori, and this distribution is then shown to be a maximum entropy distribution consistent with the constraint imposed by a fixed energetic cost. In contrast, normally the distribution is not known and the maximum entropy method is used to choose the distribution in an unbiased manner based on known constraints, as exemplified by the Boltzmann distribution. As emphasized by many, starting with Jaynes (1957a), the maximum entropy method represents a compelling strategy for choosing a distribution. Nonetheless, it is important to highlight the fact that a distribution thus chosen is not necessarily the "correct" distribution (Furbish et al., 2016). Rather, a distribution derived from a maximum entropy criterion is unbiased in that it is faithful to what is known mechanically, but no more; it is the most reasonable choice in the absence of additional information. In this sense the maximum entropy method is a formal application of Occam's razor: an explanation involving the fewest possible assumptions. Thus, the value of showing that the generalized Pareto distribution is a maximum entropy distribution is this: the analysis represents a novel generalization of an energy-based constraint in using the maximum entropy method to infer non-exponential distributions, to include the versatile properties (forms) of the generalized Pareto distribution as applied to the rarefied particle motion problem. Importantly, the analysis uses the BGS definition of entropy rather than a nontraditional definition. We suggest that this result offers promise for examining particle motions in other systems, including particles transported as bed load, where insights involving particle energetics might become useful as we learn more about the physics involved.

Appendix A: Maximization
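The maximization method involves the calculus of variations (Cover and Thomas, 1991), of which a version closer to the original analysis of Boltzmann is presented in Furbish and Schmeeckle (2013) and Furbish et al. (2016). Using the BGS definition of entropy given by Eq. (15) together with the constraints $g_0(x) = g_0 = 1$ and $g_1(x)$ given by Eq. (29), we form the following objective function:

$$J(f_x) = -\int f_x(x) \ln f_x(x)\,\mathrm{d}x + \lambda_0^* \int f_x(x)\,\mathrm{d}x + \lambda_1 \int f_x(x)\, g_1(x)\,\mathrm{d}x , \tag{A1}$$

with Lagrange multipliers $\lambda_0^*$ and $\lambda_1$. Taking the functional derivative of Eq. (A1) with respect to $f_x(x)$ and setting the result to zero then leads to

$$\ln f_x(x) = \lambda_0 + \lambda_1 g_1(x) , \tag{A2}$$

with $\lambda_0 = \lambda_0^* - 1$. This yields

$$f_x(x) = e^{\lambda_0} e^{\lambda_1 g_1(x)} = e^{\lambda_0} e^{\lambda_1 \int_0^x P_x(x')\,\mathrm{d}x'} . \tag{A3}$$

For isothermal conditions with $P_x(x) = P_x = 1/B$, Eq. (A3) becomes $f_x(x) = e^{\lambda_0} e^{\lambda_1 x/B}$.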
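A discrete analogue makes the stationarity condition concrete. The sketch below is an illustration under stated assumptions, not the procedure used in the analysis: distance states are discretized on a finite grid, the cumulative cost is taken as $g_1(x) = (1/A)\ln(1 + Ax/B)$ (the integral of $1/(Ax + B)$), and probabilities are set proportional to $e^{\lambda_1 g_1}$ with $\lambda_1 = -(A + 1)$. A small perturbation that preserves both normalization and the mean cost is then applied to confirm that it lowers the entropy. All names and values are hypothetical.

```python
import math


def entropy(p):
    """Discrete Boltzmann-Gibbs-Shannon entropy, -sum p ln p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)


def maxent_distribution(costs, lam1):
    """Normalized probabilities p_i proportional to exp(lam1 * g_i)."""
    weights = [math.exp(lam1 * g) for g in costs]
    z = sum(weights)
    return [w / z for w in weights]


def feasible_perturbation(costs, i, j, k, eps):
    """Perturbation on three states that leaves sum(p) and sum(p * g) unchanged."""
    d = [0.0] * len(costs)
    d[i] = eps * (costs[j] - costs[k])
    d[j] = eps * (costs[k] - costs[i])
    d[k] = eps * (costs[i] - costs[j])
    return d


A, B = 0.3, 1.0                                       # illustrative values (hypothetical)
xs = [0.25 * n for n in range(40)]                    # discretized distance states
costs = [math.log(1.0 + A * x / B) / A for x in xs]   # assumed cumulative cost g_1(x)
p_star = maxent_distribution(costs, lam1=-(A + 1.0))

d = feasible_perturbation(costs, 2, 10, 30, eps=1e-4)
for sign in (+1.0, -1.0):
    q = [pi + sign * di for pi, di in zip(p_star, d)]
    print(f"entropy change: {entropy(q) - entropy(p_star):+.3e}")
```

Both perturbed arrangements have lower entropy than the exponential-family solution while carrying the same total cost, which is what the vanishing functional derivative expresses in the continuous case.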

Appendix B: Total cumulative cost
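Let $x$ denote a random variable with probability density $f_x(x)$. If a random variable $w$ is a monotonic function of $x$, namely $w = g(x)$, then the probability density $f_w(w)$ of $w$ is given by

$$f_w(w) = f_x\!\left[g^{-1}(w)\right] \left| \frac{\mathrm{d}g^{-1}(w)}{\mathrm{d}w} \right| . \tag{B1}$$

For isothermal conditions ($A = 0$) the cumulative cost $w(x)$ is

$$w(x) = \frac{x}{B} , \tag{B2}$$

so $g^{-1}(w) = x = Bw$. Then $\mathrm{d}g^{-1}(w)/\mathrm{d}w = B$ and the probability density is

$$f_w(w) = e^{-w} . \tag{B3}$$

The cumulative distribution is

$$F_w(w) = 1 - e^{-w} . \tag{B4}$$

The mean of this distribution is

$$\mu_w = 1 . \tag{B5}$$

For non-isothermal conditions ($A \neq 0$),

$$w(x) = \frac{1}{A} \ln\!\left(\frac{A}{B}x + 1\right) . \tag{B6}$$

The cumulative distribution is $F_w(w) = 1 - e^{1/A} e^{-(1/A)e^{Aw}}$.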
Noting that for $A < 0$ the limit of Eq. (B6) as $x \to -B/A$ is $w \to \infty$ and for $A > 0$ the limit as $x \to \infty$ is $w \to \infty$, then the mean of the distribution is where Ei denotes the exponential integral. The total cumulative cost $W(w)$ up to the value $w$ is The total cumulative cost $W(x)$ up to the distance $x$ is For isothermal conditions, For non-isothermal conditions, The total cumulative cost $W(x)$ systematically increases with increasing travel distance $x$ (Fig. B1).
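The change of variables that opens this appendix can be checked for the isothermal case. This is a minimal sketch, assuming an exponential travel-distance density with mean $B$ and the isothermal cost $w = x/B$; the Jacobian is approximated by a central difference, and the function names and the value of $B$ are hypothetical.

```python
import math


def exponential_density(x, B):
    """Isothermal travel-distance density with mean B."""
    return math.exp(-x / B) / B


def transformed_density(w, f_x, g_inverse, dw=1e-6):
    """f_w(w) = f_x(g^{-1}(w)) * |d g^{-1}/dw|, with a central-difference derivative."""
    jacobian = (g_inverse(w + dw) - g_inverse(w - dw)) / (2.0 * dw)
    return f_x(g_inverse(w)) * abs(jacobian)


B = 4.0                                  # illustrative mean travel distance (hypothetical)


def g_inverse(w):
    """Inverse of the isothermal cost w = x/B, so that x = B w."""
    return B * w


for w in (0.5, 1.0, 2.0):
    f_w = transformed_density(w, lambda x: exponential_density(x, B), g_inverse)
    print(f"w={w:.1f}  f_w={f_w:.6f}  exp(-w)={math.exp(-w):.6f}")
```

The transformed density matches $e^{-w}$ regardless of the value chosen for $B$, consistent with a cost distribution of unit mean under isothermal conditions.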