Rarefied particle motions on hillslopes: 3. Entropy

David Jon Furbish (1), Sarah G. W. Williams (1), and Tyler H. Doane (2, 3)
(1) Department of Earth and Environmental Sciences, Vanderbilt University, Nashville, Tennessee, USA
(2) Department of Geosciences, University of Arizona, Tucson, Arizona, USA
(3) Current: Department of Earth and Atmospheric Sciences, Indiana University, Bloomington, Indiana, USA
Correspondence: David Furbish (david.j.furbish@vanderbilt.edu)


Introduction
In two companion papers (Furbish et al., 2020a, 2020b) we examine a theoretical formulation of the probabilistic physics of rarefied particle motions and deposition on rough hillslope surfaces. The formulation is based on a description of the kinetic energy balance of a cohort of particles treated as a rarefied granular gas and a description of particle deposition that depends on the energy state of the particles. The formulation predicts a generalized Pareto distribution of particle travel distances whose form varies with the balance between gravitational heating, due to conversion of potential to kinetic energy, and frictional cooling, due to particle-surface collisions. Specifically, the generalized Pareto distribution varies from a bounded form associated with thermal collapse and rapid deposition, to an exponential form representing isothermal conditions, to a heavy-tailed form associated with net heating of particles and decreased deposition. The transition to a heavy-tailed form likely involves an increasing conversion of translational to rotational kinetic energy, leading to larger travel distances with decreasing effectiveness of collisional friction. As described in Furbish et al. (2020b), these varying forms of the generalized Pareto distribution are consistent with laboratory measurements of particle travel distances reported by Gabet and Mendoza (2012) and Furbish et al. (2020b), and with field-based measurements of travel distances reported by DiBiase et al. (2017) and Roth et al. (2020).
Here we highlight a key point in Furbish et al. (2020a). Namely, the generalized Pareto distribution is not selected in an empirical manner based on goodness-of-fit criteria applied to data sets. Rather, this distribution is dictated by the physics of the problem, just as, for example, the Boltzmann distribution (an exponential distribution) emerges in classical statistical mechanics from consideration of the accessible energy microstates of a gas system. In this problem the versatile form of the generalized Pareto distribution, specifically its apparent success in describing three distinctive energetic behaviors of rarefied particle motions, is enigmatic. Although the different energetic behaviors have a clear mechanical explanation, the transition from a bounded form to a heavy-tailed form in crossing isothermal conditions is abrupt. The basis of this transition, including the upper bound on travel distances prior to transition, is unclear: it may represent a fundamental change in mechanical behavior, or it may simply be a mathematical curiosity of the generalized Pareto distribution.
The purpose of this third companion paper therefore is to further elaborate the probabilistic physics of particle motions as represented by the generalized Pareto distribution. To do this we appeal to the principle of maximum entropy as outlined in the pioneering work of Jaynes (1957a, 1957b). We specifically demonstrate that in this problem the generalized Pareto distribution is a maximum entropy distribution constrained by a fixed total energetic "cost": the total cumulative energy extracted by collisional friction per unit kinetic energy available during particle motions. The relative energetic cost locally increases with increasing travel distance for net particle cooling and rapid thermal collapse, it is uniform for isothermal conditions, and it decreases with increasing travel distance for net particle heating. The cumulative cost involves integrating the local cost over the particle travel distance, and the total cumulative cost is then obtained by summing over all particles. This fixed total cost unifies the interpretation of the three energetic behaviors, where the upper bound on travel distances prior to transition is a probabilistic mechanical outcome.

As a point of reference, the canonical example of a maximum entropy distribution is the Boltzmann distribution of the energy states of the particles composing an ordinary gas at thermal equilibrium. Similarly, the Maxwell-Boltzmann distribution of particle speeds, which is derived from the Boltzmann distribution, is a maximum entropy distribution. Here we are referring to the Gibbs entropy of statistical mechanics. A maximum entropy distribution then is the unique distribution that maximizes the Gibbs entropy, subject to constraints imposed on the system. In the canonical case these constraints consist of a fixed number of particles and a fixed total energy, which together guarantee a fixed average energy equal to $k_B T$, where $k_B$ is the Boltzmann constant and $T$ is temperature. Moreover, any other distribution of particle energy states satisfying these constraints would coincide with a lower Gibbs entropy.

Jaynes (1957a, 1957b) elaborated the significance of the fact that the Gibbs entropy in statistical mechanics and the Shannon entropy in information theory are essentially one and the same, differing only by a constant. This similarity inspired Jaynes to champion the use of a maximum entropy criterion in choosing a probability distribution, leading to what is now known as the maximum entropy method (also known as MaxEnt or MEM). The key idea of the maximum entropy method, whether viewed as a method of statistical mechanics or as one of inferential statistics, is that it provides an unbiased choice of a distribution by honoring only what is known mechanically about a system. That is, this unbiased choice is a maximally noncommittal choice that is faithful to what we do not know; it is therefore the most reasonable choice in the absence of additional information (Jaynes, 1957a; Williamson, 2010, pp. 25 and 51). Importantly, mechanical constraints imposed on the system are part of the choice of the distribution, as opposed to empirical fitting without regard to such constraints. The maximum entropy method has been applied in a remarkable variety of fields (Shore and Johnson, 1980; Ramirez and Carta, 2006; Verkley and Lynch, 2009; Singh, 2011; Peterson et al., 2013), including sediment transport (Furbish and Schmeeckle, 2013; Furbish et al., 2016).
In using the maximum entropy method, constraints imposed on the system normally translate to constraints imposed on the moments of the distribution. In this case the method leads to a distribution that is among the exponential family (e.g., exponential, Gaussian). However, applications of the maximum entropy method to non-exponential distributions, including heavy-tailed distributions, are of particular interest in many problems (Peterson et al., 2013). As described below, applying this method to heavy-tailed distributions presents a special challenge in that the first or second moment, or both of these moments, may be undefined for such distributions, including the generalized Pareto distribution (Pickands, 1975; Hosking and Wallis, 1987).
In Section 2 we provide background material, namely, the essential elements of the formulation of Furbish et al. (2020a) leading to the generalized Pareto distribution of particle travel distances, and a summary of the properties and derivation of a maximum entropy distribution. In Section 3 we describe how the energetic cost associated with collisional friction is expressed as a constraint used in the maximization method. In Section 4 we show how the generalized Pareto distribution is obtained as a maximum entropy distribution. In Section 5 we describe the probabilistic properties and significance of the energetic cost. We consider the implications of the analysis in the final section. In the fourth companion paper (Furbish et al., 2020c) we step back and examine the philosophical underpinning of the statistical mechanics framework for describing sediment particle motions and transport.

Elements of the distribution of travel distances
With reference to Figure 1, let $x$ denote the particle travel distance with probability density function $f_x(x)$. The theoretical formulation (Furbish et al., 2020a) then begins with the particle disentrainment rate function defined by

$$P_x(x) = \frac{f_x(x)}{R_x(x)}. \tag{1}$$

Here, $R_x(x) = 1 - F_x(x)$ is the exceedance probability function, where $F_x(x)$ is the cumulative distribution function. The disentrainment rate $P_x(x)$ may be interpreted as a conditional probability per unit distance. Namely, upon multiplying both sides of Eq. (1) by $\mathrm{d}x$, then $P_x(x)\,\mathrm{d}x = f_x(x)\,\mathrm{d}x/R_x(x)$ is interpreted as the probability that a particle will become disentrained within the small interval $x$ to $x + \mathrm{d}x$, given that it "survived" travel to the distance $x$. In turn, upon rearranging Eq. (1) and integrating,

$$f_x(x) = P_x(x)\exp\left[-\int_0^x P_x(x')\,\mathrm{d}x'\right] \tag{2}$$

(Furbish et al., 2020a). Thus, the significance of the disentrainment rate function becomes clear: it completely determines the density $f_x(x)$ via Eq. (2). Both Eq. (1) and Eq. (2) are standard elements of survival (or reliability) analysis, without reference to entropy.
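The step from the definition of the disentrainment rate to the density it determines can be sketched as follows. This is a minimal derivation assuming only that $f_x(x) = -\mathrm{d}R_x/\mathrm{d}x$ and $R_x(0) = 1$; the dummy variable $x'$ is introduced here for the integration.

```latex
\begin{align*}
P_x(x) &= \frac{f_x(x)}{R_x(x)}
        = -\frac{1}{R_x(x)}\frac{\mathrm{d}R_x(x)}{\mathrm{d}x}
        = -\frac{\mathrm{d}}{\mathrm{d}x}\ln R_x(x), \\
R_x(x) &= \exp\left[-\int_0^x P_x(x')\,\mathrm{d}x'\right], \\
f_x(x) &= P_x(x)\,R_x(x)
        = P_x(x)\exp\left[-\int_0^x P_x(x')\,\mathrm{d}x'\right].
\end{align*}
```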
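The particle energy balance formulated in Furbish et al. (2020a) leads to the result that, for a given particle size and shape, the disentrainment rate on an inclined surface with uniform slope and roughness is

$$P_x(x) = \frac{1}{Ax + B}. \tag{3}$$

Substituting Eq. (3) into Eq. (2) then leads to the generalized Pareto distribution,

$$f_x(x) = \frac{B^{1/A}}{(Ax + B)^{1 + 1/A}}, \tag{4}$$

where $A \in \mathbb{R}$ is a shape parameter and $B > 0$ is a scale parameter (Pickands, 1975; Hosking and Wallis, 1987). The cumulative distribution is

$$F_x(x) = 1 - \frac{B^{1/A}}{(Ax + B)^{1/A}}, \tag{5}$$

and the exceedance probability is

$$R_x(x) = \frac{B^{1/A}}{(Ax + B)^{1/A}}. \tag{6}$$

For $A < 1$ the mean is

$$\mu_x = \frac{B}{1 - A}, \tag{7}$$

and for $A < 1/2$ the variance is

$$\sigma_x^2 = \frac{B^2}{(1 - A)^2 (1 - 2A)}. \tag{8}$$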
The mean is undefined for A ≥ 1 and the variance is undefined for A ≥ 1/2.
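As a numerical check on how the disentrainment rate determines the density, the sketch below integrates the rate $1/(Ax + B)$ with a simple midpoint rule and compares the result with the closed-form generalized Pareto density. The function names, the midpoint integration and the parameter values are illustrative choices made here, not part of the original formulation.

```python
import math

def hazard(x, A, B):
    """Disentrainment rate P_x(x) = 1/(A x + B)."""
    return 1.0 / (A * x + B)

def gpd_pdf(x, A, B):
    """Closed-form generalized Pareto density; A = 0 is the exponential limit."""
    if A == 0.0:
        return math.exp(-x / B) / B
    return B ** (1.0 / A) / (A * x + B) ** (1.0 + 1.0 / A)

def pdf_from_hazard(x, A, B, n=20000):
    """f = P(x) * exp(-integral of P from 0 to x), using the midpoint rule."""
    h = x / n
    integral = sum(hazard((i + 0.5) * h, A, B) for i in range(n)) * h
    return hazard(x, A, B) * math.exp(-integral)

def gpd_mean(A, B):
    """Mean B/(1 - A); defined only for A < 1."""
    if A >= 1.0:
        raise ValueError("mean is undefined for A >= 1")
    return B / (1.0 - A)

# Net cooling (A < 0), isothermal (A = 0) and net heating (A > 0), with B = 1.
for A in (-0.5, 0.0, 0.5):
    print(A, gpd_pdf(1.0, A, 1.0), pdf_from_hazard(1.0, A, 1.0))
```

The two columns printed for each value of $A$ should agree to within the accuracy of the midpoint rule.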

In mechanical terms the shape and scale parameters $A$ and $B$ are given by Eq. (9) and Eq. (10), respectively (Furbish et al., 2020a). Here, $S$ is the magnitude of the slope inclined at an angle $\theta$, $m$ is particle mass, $g$ is acceleration due to gravity, $\mu$ is a friction factor due to extraction of particle kinetic energy $E_p = (m/2)u^2$ where $u$ is the surface-parallel particle velocity, $E_a = \langle E_p \rangle$ is the arithmetic average particle energy so that $E_{a0}$ is the initial average energy at $x = 0$, $\gamma = E_a/E_h$ where $E_h$ is the harmonic average particle energy, and $\alpha = \alpha_0/(1 - \mu_1 Ki)$ where $\alpha_0$ and $\mu_1$ are factors of order unity and $Ki$ is the Kirkby number defined by

$$Ki = \frac{S}{\mu}, \tag{11}$$

which represents the ratio of gravitational heating to frictional cooling. Here we emphasize that $mg\cos\theta$ in Eq. (10) is not to be interpreted as the static normal weight of the particle, and $\mu$ is not interpreted as a Coulomb-like friction coefficient. Rather, $\mu \sim \beta_x$, where $\beta_x$ denotes the expected proportion of particle kinetic energy extracted per particle-surface collision during downslope motion. Details are provided in Furbish et al. (2020a, 2020b).
For plotting purposes we define a characteristic particle cooling distance $X = E_{a0}/(mg\mu\cos\theta)$ and in turn define dimensionless quantities denoted by circumflexes (Eq. 12). In addition, $a = A$ and $b = (\alpha/\gamma)\hat{E}_{a0}$. Then the dimensionless form of the generalized Pareto distribution, Eq. (4), is written as

$$f_{\hat{x}}(\hat{x}) = \frac{b^{1/a}}{(a\hat{x} + b)^{1 + 1/a}}. \tag{13}$$

For $a < 0$ the density $f_{\hat{x}}(\hat{x})$ is bounded at $\hat{x} = b/|a|$ (Figure 2). This density increases with $\hat{x}$ for $a < -1$, it is uniform for $a = -1$, and it decreases with $\hat{x}$ for $a > -1$. It is triangular for $a = -1/2$. For $a = 0$ the density $f_{\hat{x}}(\hat{x})$ is exponential. For $a > 0$ this density is heavy-tailed. For $a \geq 1$ the mean of $f_{\hat{x}}(\hat{x})$ is undefined, and for $a \geq 1/2$ the variance is undefined.
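The bounded forms described above can be illustrated with a short sketch that evaluates the dimensionless density $b^{1/a}/(a\hat{x} + b)^{1 + 1/a}$ for a few negative shape values. The function names and sample points are hypothetical, chosen here only for illustration, and the density is set to zero at and beyond the bound $b/|a|$ as a convention of this sketch.

```python
def gpd_pdf_dimensionless(x_hat, a, b):
    """Dimensionless generalized Pareto density for a != 0."""
    if a < 0 and x_hat >= b / abs(a):
        return 0.0  # at or beyond the upper bound b/|a| (convention of this sketch)
    return b ** (1.0 / a) / (a * x_hat + b) ** (1.0 + 1.0 / a)

def upper_bound(a, b):
    """Largest dimensionless travel distance; infinite unless a < 0."""
    return b / abs(a) if a < 0 else float("inf")

b = 1.0
xs = [0.0, 0.25, 0.5, 0.75]
uniform = [gpd_pdf_dimensionless(x, -1.0, b) for x in xs]              # a = -1
triangular = [gpd_pdf_dimensionless(x, -0.5, b) for x in xs]           # a = -1/2
rising = [gpd_pdf_dimensionless(x, -2.0, b) for x in (0.0, 0.2, 0.4)]  # a < -1

print(uniform)
print(triangular)
print(rising, upper_bound(-2.0, b))
```

With $a = -1$ the values are constant, with $a = -1/2$ they fall linearly toward the bound, and with $a = -2$ they rise toward the bound at $\hat{x} = b/2$.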
We note that the definition of the differential entropy given in the next section involves the logarithm of the probability density function. In a strict sense this is acceptable only if the density is expressed in dimensionless form as in Eq. (13), or if the definition involves a discrete probability mass function. Nonetheless, the maximization method removes this logarithm such that the outcome is dimensionally the same whether one starts with the dimensional form or the dimensionless form of the density. For simplicity we use the dimensional form, Eq. (4). In addition, for simplicity in plotting we set the scale parameter $B = 1$ in calculated functions containing this parameter, and in several plots we use dimensional abscissa values (e.g., distance $x$) without reference to units, noting that these have the same visual appearance as if plotted using dimensionless values.
Following Furbish et al. (2020b) we calculate the quantities

Maximum entropy distribution
If $x$ denotes a continuous random variable with probability density $f_x(x)$ over $x \in [0, \infty)$, then the differential entropy of $x$ is defined as

$$H(x) = -\int_0^\infty f_x(x)\ln f_x(x)\,\mathrm{d}x,$$

where it is understood that $f_x(x)\ln f_x(x) = 0$ wherever $f_x(x) = 0$. Given the lineage of this definition, hereafter we follow Peterson et al. (2013) and refer to it as the Boltzmann-Gibbs-Shannon (BGS) entropy. In turn, let $g_j(x)$ denote a measurable quantity of $x$ with $j = 0, 1, 2, \ldots, n$. We then assume that

$$\int_0^\infty g_j(x) f_x(x)\,\mathrm{d}x = a_j \tag{16}$$

with finite $a_j$. For example, if $g_0(x) = g_0 = 1$, then Eq. (16) gives $a_0 = 1$. That is, the density $f_x(x)$ integrates to unity. If $g_1(x) = x$, then Eq. (16) gives the mean of the distribution, $a_1 = \mu_x$. If $g_2(x) = (x - \mu_x)^2$, then Eq. (16) gives the variance, $a_2 = \sigma_x^2$. Note, however, that $g_j(x)$ need not be selected just to obtain the usual moments of a distribution. Indeed, Eq. (16) may represent a constraint imposed by a function $g_j(x)$ that does not coincide with a moment of $f_x(x)$. As described below, this is essential for heavy-tailed distributions whose first or second moment, or both of these moments, are undefined. The maximum entropy distribution is then given by

$$f_x(x) = e^{\lambda_0 + \lambda_1 g_1(x) + \lambda_2 g_2(x) + \cdots}, \tag{17}$$

where $\lambda_0, \lambda_1, \lambda_2, \ldots$ are Lagrange multipliers introduced in the problem of maximizing the entropy $H(x)$ (Appendix A). Moreover, as above we set $g_0(x) = g_0 = 1$ with $a_0 = 1$, which guarantees that the probability density $f_x(x)$ integrates to unity.
As a point of reference, a fixed mean with $g_1(x) = x$ and no other constraint leads to the result

$$f_x(x) = e^{\lambda_0 + \lambda_1 x}. \tag{18}$$

The Lagrange multipliers are then obtained as follows. By the definition of a probability density,

$$\int_0^\infty e^{\lambda_0 + \lambda_1 x}\,\mathrm{d}x = 1, \tag{19}$$

which leads to $e^{\lambda_0} = -\lambda_1$. Alternatively, Eq. (18) sometimes is presented as (e.g., Tolman, 1938; Schrödinger, 1946; Furbish and Schmeeckle, 2013)

$$f_x(x) = \frac{e^{\lambda_1 x}}{\int_0^\infty e^{\lambda_1 x}\,\mathrm{d}x}, \tag{20}$$

where it becomes clear that $e^{\lambda_0}$ is a normalization factor that ensures the probability density integrates to unity. In turn, by the definition of the mean,

$$\int_0^\infty x\,e^{\lambda_0 + \lambda_1 x}\,\mathrm{d}x = \mu_x, \tag{21}$$

which leads to $\lambda_1 = -1/\mu_x$ and the exponential distribution,

$$f_x(x) = \frac{1}{\mu_x} e^{-x/\mu_x}, \tag{22}$$

where it becomes clear that the Lagrange multiplier $\lambda_1$ enforces the constraint of a fixed mean. The Gaussian distribution is similarly obtained as the maximum entropy distribution with the constraint imposed by a fixed second moment (variance).
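To illustrate the claim that the exponential distribution maximizes the entropy for a fixed mean, the sketch below numerically estimates the differential entropy of three densities on $[0, \infty)$ that share the same mean. The two comparison densities (uniform and half-normal), the truncation of the integral at a finite upper limit, and all names are assumptions made here for illustration; the comparison is an example, not a proof.

```python
import math

def differential_entropy(pdf, upper, n=200000):
    """Midpoint-rule estimate of -integral of f ln f over [0, upper]; f ln f = 0 where f = 0."""
    h = upper / n
    total = 0.0
    for i in range(n):
        f = pdf((i + 0.5) * h)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total

mu = 1.0  # common mean imposed on every candidate density

def exponential(x):
    return math.exp(-x / mu) / mu

def uniform(x):
    # Uniform on [0, 2 mu], which has mean mu.
    return 1.0 / (2.0 * mu) if x <= 2.0 * mu else 0.0

sigma = mu * math.sqrt(math.pi / 2.0)  # half-normal scale that gives mean mu

def half_normal(x):
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-x * x / (2.0 * sigma * sigma))

H_exp = differential_entropy(exponential, 60.0)
H_uni = differential_entropy(uniform, 60.0)
H_hn = differential_entropy(half_normal, 60.0)
print(H_exp, H_uni, H_hn)
```

For the exponential density the estimate should be close to the analytical value $1 + \ln\mu_x$, and it should exceed the estimates for the other two densities.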
The canonical example of the Boltzmann distribution of particle energy states is obtained in this manner as a maximum entropy distribution, where the mean is independently determined to be $k_B T$ (e.g., Schrödinger, 1946). The imposed constraints consist of extensive quantities that scale with system size: a fixed number of particles and a fixed total energy, which together guarantee a fixed mean energy. In a similar manner, Furbish and Schmeeckle (2013) and Furbish et al. (2016) derive an exponential distribution for the streamwise velocity states of particles transported as bed load, with the mechanical constraint imposed by a fixed total particle momentum under equilibrium transport conditions.

Our next task is to adapt these ideas to the generalized Pareto distribution, which is not among the exponential family of distributions. We note that there is a continuing effort given to this topic, notably in relation to heavy-tailed (non-exponential) distributions. Peterson et al. (2013) summarize the basis of these efforts, and note that one approach for inferring non-exponential distributions is to appeal to nontraditional definitions of the entropy, for example, the Tsallis entropy (Tsallis, 1988), rather than the canonical BGS entropy. The procedure is the same: to maximize the defined entropy subject to an extensive constraint that scales with the system size. Here, however, we adopt the view of Peterson et al. (2013), who highlight the conclusions of Shore and Johnson (1980). Namely, because the BGS definition of entropy uniquely ensures addition and multiplication rules of probability, any other definition of entropy yields a bias in the fitting of data. Peterson et al. (2013) suggest that this offers a "compelling first-principles basis for defining a proper variational principle for modeling distribution functions." Like these authors in their analysis of the energetics associated with the economics of scale, we retain the BGS definition of entropy and seek a non-extensive energy constraint aligned with the mechanics of the rarefied particle motion problem.
Energetic cost as a maximizing constraint
In the canonical example of the Boltzmann distribution, the particle energy state is an instantaneous quantity. Similarly, in the example of bed load particle velocities (Furbish and Schmeeckle, 2013; Furbish et al., 2016), the velocity state is an instantaneous quantity. The state of a particle changes from one instant to the next, and this state can be reached from smaller or larger state values. In these cases, the total particle energy and the total streamwise momentum are well-defined extensive quantities such that the moments of the distributions are fixed. In the absence of additional information, the maximum entropy distribution must be among the exponential family.
In contrast to instantaneous quantities, the particle travel distance $x$ is an integrated quantity that reflects a dynamical particle history starting from the state $x = 0$. The state $x$ must be reached from smaller (unrecorded) state values; it cannot be reached from larger state values. Moreover, travel distances are not like an extensive quantity that scales linearly with the system size. Nonetheless, particle motions require a source of energy and dissipation of energy. Following Peterson et al. (2013) we assume that the outcome of motions (the travel distances $x$) can be represented in terms of an energetic cost that probabilistically constrains the organization of a great number of particles into accessible states $x$.
The disentrainment rate $P_x(x)$ has special significance in defining the energetic cost. In particular, this rate determines the energetic cost associated with reaching the state $x$. We start by using Eq. (10) to rewrite Eq. (3) in terms of the mechanical quantities, which gives Eq. (23).

The denominator in Eq. (23) describes how the average particle energy $E_a(x)$ varies with $x$, whether this involves net cooling ($A < 0$), isothermal conditions ($A = 0$) or net heating ($A > 0$). The quantity $mg\mu\cos\theta$ in the numerator is the expected spatial rate at which energy is extracted by collisional friction, modulated by the factor $\gamma/\alpha$. Thus, the disentrainment rate represents the local relative energetic cost, that is, the spatial rate at which particle energy is extracted per unit kinetic energy available during motion at position $x$.
In turn, the relative energy extracted within a small interval $\mathrm{d}x$ is $P_x(x)\,\mathrm{d}x$, so the cumulative energy extracted per unit kinetic energy available is

$$w(x) = \int_0^x P_x(x')\,\mathrm{d}x'. \tag{24}$$

This is the cumulative energetic cost in reaching position $x$. For isothermal conditions ($A = 0$) the cumulative cost is

$$w(x) = \frac{x}{B}. \tag{25}$$

For non-isothermal conditions ($A \neq 0$) the cumulative cost is

$$w(x) = \frac{1}{A}\ln\left(1 + \frac{Ax}{B}\right). \tag{26}$$

Relative to the linear increase given by Eq. (25), the cumulative cost with net cooling ($A < 0$) increases more rapidly up to the limiting distance given by $x = B/|A|$, and the cumulative cost with net heating ($A > 0$) increases more slowly with increasing distance $x$.
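The three behaviors of the cumulative cost can be compared with a short sketch. It assumes the closed form $(1/A)\ln(1 + Ax/B)$ obtained by integrating the disentrainment rate $1/(Ax + B)$ from zero to $x$; the function name and the sample values are illustrative choices made here.

```python
import math

def cumulative_cost(x, A, B):
    """Cumulative relative cost w(x): integral of 1/(A x' + B) from 0 to x."""
    if A == 0.0:
        return x / B                  # isothermal: linear in x
    if A < 0.0 and x >= B / abs(A):
        return math.inf               # net cooling: cost diverges at x = B/|A|
    return math.log1p(A * x / B) / A  # non-isothermal: (1/A) ln(1 + A x / B)

B = 1.0
for x in (0.5, 1.0, 1.5):
    print(x,
          cumulative_cost(x, -0.5, B),  # net cooling
          cumulative_cost(x, 0.0, B),   # isothermal
          cumulative_cost(x, 0.5, B))   # net heating
```

At each distance the cost with net cooling exceeds the isothermal cost, which in turn exceeds the cost with net heating, and the non-isothermal form approaches $x/B$ as $A$ approaches zero.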
Consider first the isothermal case to illustrate the significance of the cost $w(x)$. This cost increases linearly with the distance $x$. Let $N$ denote a great number of particles. Among all accessible microstates (the many ways of arranging $N$ particles into states $x$ where each arrangement has a fixed total cost), most microstates involve particles with small state values and fewer with large state values. As shown below, this constraint leads to an exponential distribution. Note that Furbish and Schmeeckle (2013) provide a detailed description of the analysis leading to this outcome, including the basis for counting microstates (see Figure 3 and Appendix B therein), as applied to particle momentum states rather than travel distance states $x$. Nonetheless the analysis is otherwise conceptually identical. See also Tolman (1938) and Schrödinger (1946).

With non-isothermal conditions and net heating, it is easier to achieve larger state values than with isothermal conditions. Among all accessible microstates, an increasing proportion will have particles in larger states than would be predicted with a uniform cost rate. In contrast, with net cooling a smaller proportion of microstates will have particles in large states $x$ with an increasing relative cost to achieve these large states. Indeed, there is a limit on available energy to be spent in frictional cooling such that the relative cost goes to infinity at $x = B/|A|$. As shown below, these constraints lead to the generalized Pareto distribution.

The energetic cost w(x) is a natural choice for constraining the maximization method. As described in Section 6 (Discussion and conclusions), this choice is identical in form to the language of "cost" in the economics of scale (Peterson et al., 2013) leading to non-exponential (heavy-tailed) distributions of state values. We use these ideas next in deriving the maximum entropy distribution.

Constraints
Focusing on the generalized Pareto distribution, as above we start with the constraint given by $g_0(x) = g_0 = 1$, namely,

$$\int_0^\infty f_x(x)\,\mathrm{d}x = 1.$$

A second, strong mechanical constraint is provided by assuming that the total cumulative energetic cost associated with collisional friction is fixed. We start with Eq. (24),

$$w(x) = \int_0^x P_x(x')\,\mathrm{d}x',$$

which is the cumulative energy extracted by friction per unit kinetic energy available in reaching position $x$. Then,

$$\int_0^\infty w(x) f_x(x)\,\mathrm{d}x = \mu_w,$$

which is the average cumulative cost.
Starting with isothermal conditions ($A = 0$), the disentrainment rate is $P_x(x) = P_x = 1/B$. This gives

$$\mu_w = \int_0^\infty \frac{x}{B} f_x(x)\,\mathrm{d}x = \frac{\mu_x}{B},$$

which shows that the expected cumulative relative cost is unity with $\mu_x = B$. This is nominally the same as saying that the expected absolute cost is equal to the initial available energy $E_{a0}$. More generally with $P_x(x) = 1/(Ax + B)$,

$$\mu_w = \int_0^\infty \frac{1}{A}\ln\left(1 + \frac{Ax}{B}\right) f_x(x)\,\mathrm{d}x.$$

We use these two results in the maximization of entropy.
For isothermal conditions ($A = 0$) the probability density of the cumulative cost $w$ has mean $\mu_w = 1$, with an associated cumulative distribution. For non-isothermal conditions ($A \neq 0$) the density of $w$ has attributes of an extreme value distribution; its mean involves the exponential integral Ei, and it likewise has an associated cumulative distribution. Note that these functions depend on the shape parameter $A$ but not on the scale parameter $B$. The total cumulative cost $W(w)$ up to the value $w$ is given by Eq. (42). Alternatively, the total cumulative cost up to the distance $x$ is

$$W(x) = \int_0^x w(x') f_x(x')\,\mathrm{d}x'. \tag{43}$$

Expressions for Eq. (42) and Eq. (43) are provided in Appendix B and show how the total costs $W(w)$ and $W(x)$ grow with increasing $w$ and $x$ to a finite value. Consider here the product $W^*(x) = w(x) f_x(x) = \mathrm{d}W(x)/\mathrm{d}x$, which is the total cost per unit travel distance. This function is like a frequency-magnitude product and reflects the relative contribution to the total cost of different parts of the travel distance domain (Figure 7). For net cooling ($A < 0$) and large negative $A$ the total cost is dominated

Frictional loss to heat
The energetic cost outlined above pertains to the conversion of translational kinetic energy into other forms, including rotational energy, surface deformation and heat, all under the heading of collisional friction. This cost, however, is not the same as the total energy conversion to heat.
Consider the total energy extracted by friction and ultimately converted to heat. Note first that the quantity $mg\sin\theta$ at first glance normally is interpreted as the downslope component of the weight of a particle (or control volume) with mass $m$. In energetic terms, however, this quantity is to be interpreted as the accessible gravitational potential energy per unit downslope travel distance (Furbish et al., 2020a). For an individual particle traveling a distance $x$ the heat generated is

$$q_p = E_{p0} + mg x\sin\theta, \tag{44}$$

where $E_{p0}$ denotes the initial kinetic energy of the particle. Taking the ensemble average of Eq. (44) and using Eq. (7),

$$\mu_{q_p} = E_{a0} + mg\sin\theta\,\mu_x = E_{a0} + mg\sin\theta\,\frac{B}{1 - A}. \tag{45}$$

The total heat generated by $N$ particles is then $N\mu_{q_p}$. As a fun point of reference, 100 particles, each with a diameter of 0.1 m and an average starting velocity of 1 m s$^{-1}$ traveling an average distance of 10 m down a 30 degree slope, produce about 0.32 J of heat, the equivalent of an ordinary 100 W light bulb turned on for 0.0032 s. On the other hand, for a million similar particles traveling an average of 100 m down a 45 degree slope, we must leave the light bulb on for nearly eight minutes.

This result offers an example of how application of the maximum entropy method can be misleading. Namely, suppose we assume that a total fixed quantity of heat generated by particle motions, because this is an energetic "cost," provides a constraint on the maximization procedure. In this situation, and with no further constraints, the maximum entropy method leads to an exponential distribution $f_{q_p}(q_p)$ of heat states $q_p$ with mean $\mu_{q_p} = E_{a0} + mg\sin\theta\,\mu_x$. Because $q_p$ and $x$ are linearly related, then using Eq. (B1) (Appendix B) the distribution $f_x(x)$ of travel distances $x$ would be exponential. Note that at this point, however, the mean travel distance $\mu_x$ is not well constrained, as no mechanical information is provided for how particles achieve the distance states $x$. Whereas the choice of an exponential distribution for $f_{q_p}(q_p)$ is a maximally unbiased choice, it almost certainly is incorrect. We comment further on this type of naïve use of the maximum entropy method below.
Let us acknowledge that a distribution identified as a maximum entropy distribution based on empirically constraining one or more of its moments is not necessarily a special outcome. For example, we frequently fit data to exponential and Gaussian distributions based on estimates of the mean and variance of these distributions, assuming these moments exist and are finite, without reference to maximum entropy. In other words, asserting that a random variable possesses a finite expected value (mean or variance) and then using this assertion to choose the distribution based on the maximum entropy method has no meaningful mechanical significance if the mechanical basis of the constraint is not specified. In this situation a maximum entropy criterion is just one among numerous inferential methods, albeit with the decided merit of being maximally indifferent in the choosing of the distribution. Only when the constraining moment has independent mechanical meaning, and in the absence of additional information, does the label of maximum entropy carry mechanical significance. The example of heat states $q_p$ described in Section 5.2 illustrates this point.

For example, Furbish et al. (2016) suggest: "In focusing on the mechanical side of the duality of Jaynes's principle [of maximum entropy], it becomes important to distinguish between a "strong" mechanical constraint, a "weak" mechanical constraint, and an empirical constraint, as these inform confidence in the resulting choice of a distribution... A strong mechanical constraint is one that derives directly from a dynamics argument... A weak constraint is one that derives from a mechanical definition, for example, an appeal to mass conservation... An empirical constraint is one that appeals to our confidence in suggesting a general behavior from experiments or dimensional analysis but lacks a clear dynamics underpinning."

For rarefied bed load particles transported under equilibrium conditions, Furbish et al. (2016) show that the condition of fixed total particle momentum provides a strong mechanical constraint. In this situation the maximum entropy method predicts an exponential distribution of particle velocities in the absence of any additional mechanical information, consistent with measurements of particle velocities based on high-speed imaging (e.g., Lajeunesse et al., 2010; Roseberry et al., 2012; Furbish and Schmeeckle, 2013; Fathel et al., 2015; Wei et al., 2015). We suggest that the total cumulative energetic cost used herein to constrain the maximum entropy method similarly represents a strong mechanical constraint.
As a point of reference, the analysis presented herein is akin to the energetics associated with the economics of scale as examined by Peterson et al. (2013). To illustrate this idea we start with a binomial expansion of the disentrainment rate, Eq. (3), to give

$$P_x(x) = \frac{1}{B} - \frac{A}{B^2}x + \frac{A^2}{B^3}x^2 - \cdots, \tag{46}$$

which converges for $|Ax/B| < 1$. Momentarily focusing on the leading and first-order terms for illustration, Eq. (46) has the same form as the "communal cost-minus-benefit function" proposed by Peterson et al. (2013, Eq. (5) therein; Appendix C). Using the language of economic costs, here the state $x$ may be interpreted as the size of a community, for example, "particles forming colloidal clusters, or social processes such as people joining cities, citations added to papers, or link creation in a social network" (Peterson et al., 2013, p. 20381). The leading term in Eq. (46) may be interpreted as an intrinsic cost for an individual to achieve ("join") the state $x$. For $A > 0$ the first-order term represents a "discount" provided by the community of size $x$. For $A < 0$ the first-order term represents a "penalty" imposed by the community. If the cost is independent of size ($A = 0$), then the cost rate is fixed ($P_x = 1/B$) and the maximum entropy method leads to an exponential distribution of states $x$. If the cost is shared with increasing size ($A > 0$), then the cost of joining the state $x$ decreases with increasing size. This means that larger sizes (states) are more likely to occur than if a discount is not provided, leading to a heavy-tailed distribution of states $x$. Conversely, if joining a state $x$ involves a penalty ($A < 0$), then exclusion with increasing size occurs, leading to a light-tailed or bounded distribution of states $x$. In this analysis the idea of cost is fundamentally energetic, whether involving free energy for colloid particles, or the energy consumed by individuals in joining some form of social construct.
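The roles of the leading and first-order terms in the expansion of the disentrainment rate $1/(Ax + B)$ can be checked numerically. The sketch below compares partial sums of the geometric series $(1/B)\sum_k (-Ax/B)^k$ with the exact rate; the function names and the parameter values are hypothetical, and the series is used only within its radius of convergence $|Ax/B| < 1$.

```python
def disentrainment_rate(x, A, B):
    """Exact rate P_x(x) = 1/(A x + B)."""
    return 1.0 / (A * x + B)

def series_rate(x, A, B, order):
    """Partial sum (1/B) * sum of (-A x / B)^k for k = 0..order; needs |A x / B| < 1."""
    r = -A * x / B
    return sum(r ** k for k in range(order + 1)) / B

A, B, x = 0.5, 1.0, 0.4
exact = disentrainment_rate(x, A, B)
leading = series_rate(x, A, B, 0)      # intrinsic cost 1/B
first_order = series_rate(x, A, B, 1)  # 1/B - (A/B^2) x: a "discount" when A > 0
print(exact, leading, first_order, series_rate(x, A, B, 10))

# With A < 0 the first-order term raises the cost above 1/B (a "penalty").
print(series_rate(x, -0.5, B, 1))
```

With $A > 0$ the first-order partial sum falls below the intrinsic cost $1/B$, and with $A < 0$ it rises above it, which mirrors the discount and penalty interpretation.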

When rearranged, the "cost-minus-benefit" function proposed by Peterson et al. (2013) yields a cost function (Appendix C) whose form is identical to that of the disentrainment rate, Eq. (3). In the economics of scale problem the costs are nominally absolute energetic costs. In the problem of rarefied particle motions the cost function (i.e., the disentrainment rate) represents the local relative energetic cost. Nonetheless, the formalism involving a fixed total cumulative cost is essentially the same. With net particle heating it becomes easier for particles to achieve larger states $x$ relative to a fixed local energetic cost, analogous to the "discount" provided by a community of increasing size (Figure 7). In turn, no matter how large the upper bound $x = B/|A|$ becomes as $|A|$ approaches zero, this upper bound nonetheless remains finite. The generalized Pareto distribution then "flips" to an exponential form with unbounded distance states only in the limit of $A \to 0^-$. In approaching this limit, the basic physics of particle motions does not change. Similarly, in approaching the limit $A \to 0^+$ from the heavy-tailed form of the generalized Pareto distribution, no changes in physics occur. That is, the essence of the balance between gravitational heating and frictional cooling by particle-surface collisions remains the same; there is nothing special or unusual about particle-surface interactions associated with crossing the isothermal transition. Thus, the most probable arrangement of distance states $x$ is in each case (net cooling, isothermal conditions and net heating) a reflection of the unifying probabilistic outcome associated with a fixed total energetic cost.
Here we return to Eq. (2), the standard formulation of the probability density $f_x(x)$ presented in survival analysis, and compare this with the entropy maximization criterion given by

Assuming the Lagrange multiplier $\lambda_1 = -(A + 1)$, then Eq. (48) becomes

which has the form of Eq. (47) with

Substituting $e^{\lambda_0} = 1/B$ and $P_x(x) = 1/(Ax + B)$ into Eq. (49) and evaluating the integrals confirms that the generalized Pareto distribution is retrieved.
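The substitution can be sketched as follows. This is an illustrative derivation rather than a reproduction of Eqs. (47) to (49). It assumes that the maximum entropy density takes the usual exponential form $e^{\lambda_0} e^{\lambda_1 w(x)}$, where $w(x)$ denotes the cumulative cost obtained by integrating the disentrainment rate, and that Eq. (2) is the standard survival-analysis relation between a rate and its density. The first line holds for A ≠ 0.

```latex
\begin{align}
w(x) &= \int_0^x \frac{dx'}{Ax' + B}
      = \frac{1}{A}\ln\!\left(1 + \frac{Ax}{B}\right), \\
e^{-A w(x)} &= \frac{B}{Ax + B} = B\,P_x(x), \\
f_x(x) &= e^{\lambda_0} e^{-(A+1) w(x)}
        = e^{\lambda_0} B\,P_x(x)\,e^{-w(x)}
        = P_x(x)\exp\!\left[-\int_0^x P_x(x')\,dx'\right]
        \quad \text{for } e^{\lambda_0} = 1/B, \\
f_x(x) &= \frac{1}{B}\left(1 + \frac{Ax}{B}\right)^{-1/A - 1}.
\end{align}
```

The last line is the generalized Pareto density, so under these assumptions the maximum entropy form and the survival-analysis form coincide.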
We now have the interesting result that, for this problem, determining the distribution $f_x(x)$ according to Eq. (47) is the same as obtaining this distribution using a maximum entropy criterion. This occurs because the disentrainment rate $P_x(x)$ represents an energetic cost to particles reaching states x. Then, inasmuch as the total energetic cost probabilistically constrains the organization of a great number of particles into accessible states consistent with the maximization method, the resulting distribution must be a maximum entropy distribution. If instead the disentrainment rate function $P_x(x)$ is heuristically proposed or empirically fitted to data without reference to constraints imposed on the system, then the distribution obtained from Eq. (47) will be consistent with the disentrainment rate function, but this does not guarantee that the distribution is a maximum entropy choice.
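A density can be built numerically from any proposed rate function in this way. The following sketch assumes the survival-analysis relation $f_x(x) = P_x(x)\exp[-\int_0^x P_x(x')\,dx']$ and integrates with the trapezoid rule. The helper `density_from_rate` and the linear rate used as a heuristic example are hypothetical and are not part of the analysis.

```python
import math

def density_from_rate(rate, x_max, n=20000):
    """Return (xs, fs) with f(x) = rate(x) * exp(-integral of rate from 0 to x)."""
    dx = x_max / n
    xs = [i * dx for i in range(n + 1)]
    fs, cum, prev = [], 0.0, rate(0.0)
    for i, x in enumerate(xs):
        r = rate(x)
        if i > 0:
            cum += 0.5 * (prev + r) * dx     # trapezoid rule for the cumulative rate
        fs.append(r * math.exp(-cum))
        prev = r
    return xs, fs

# Rate of the form 1/(A*x + B): the result should match the generalized Pareto density.
A, B = 0.5, 1.0                              # illustrative values
xs, fs = density_from_rate(lambda x: 1.0 / (A * x + B), x_max=10.0)
closed = [(1.0 / B) * (1.0 + A * x / B) ** (-1.0 / A - 1.0) for x in xs]
max_err = max(abs(f - c) for f, c in zip(fs, closed))

# A heuristic rate (hypothetical) still yields a consistent density.
xs2, fs2 = density_from_rate(lambda x: 0.5 + 0.1 * x, x_max=10.0)
area = sum(0.5 * (fs2[i] + fs2[i + 1]) * (xs2[i + 1] - xs2[i])
           for i in range(len(xs2) - 1))
print(max_err, area)
```

The second density is consistent with its rate function, but nothing in this construction indicates whether it is a maximum entropy choice; that depends on the constraints imposed on the system.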

The analysis presented here represents an unusual situation. Namely, the generalized Pareto distribution of travel distances and its parametric values are known a priori, and this distribution is then shown to be a maximum entropy distribution consistent with the constraint imposed by a fixed energetic cost. In contrast, normally the distribution is not known and the maximum entropy method is used to choose the distribution in an unbiased manner based on known constraints, as exemplified by the Boltzmann distribution. As emphasized by many, starting with Jaynes (1957a), the maximum entropy method represents a compelling strategy for choosing a distribution. Nonetheless, it is important to highlight the fact that a distribution thus chosen is not necessarily the "correct" distribution (Furbish et al., 2016). Rather, a distribution derived from a maximum entropy criterion is unbiased in that it is faithful to what is known mechanically, but no more; it is the most reasonable choice in the absence of additional information. In this sense the maximum entropy method is a formal application of Occam's razor. Thus, the value of showing that the generalized Pareto distribution is a maximum entropy distribution is this: the analysis represents a novel generalization of an energy-based constraint in using the maximum entropy method to infer non-exponential distributions, to include the versatile properties (forms) of the generalized Pareto distribution as applied to the rarefied particle motion problem. Importantly, the analysis uses the BGS definition of entropy rather than a nontraditional definition. We suggest that this result offers promise for examining particle motions in other systems, including particles transported as bed load, where insights involving particle energetics might become useful as we learn more about the physics involved.
Data availability. The data plotted in Figure 3 are available from sources described in Furbish et al. (2020b).

Appendix A

The maximization method involves the calculus of variations (Cover and Thomas, 1991), of which a version closer to the original analysis of Boltzmann is presented in Furbish and Schmeeckle (2013) and Furbish et al. (2016). Using the BGS definition of entropy given by Eq. (15) together with the constraints $g_0(x) = g_0 = 1$ and $g_1(x)$ given by Eq. (29), we form the following objective function:

with Lagrange multipliers $\lambda_0^*$ and $\lambda_1$. Taking the functional derivative of Eq. (A1) with respect to $f_x(x)$ and setting the result to zero then leads to Eq. (A5).

Appendix B

For isothermal conditions (A = 0) the cumulative cost is $w(x) = x/B$, so $g^{-1}(w) = x = Bw$. Then $dg^{-1}(w)/dw = B$ and the probability density is

The cumulative distribution is

The mean of this distribution is

For non-isothermal conditions (A ≠ 0),

Noting that for A < 0 the limit of Eq. (B6) as x → −B/A is w → ∞, and for A > 0 the limit as x → ∞ is w → ∞, the mean of the distribution is

where Ei denotes the exponential integral.
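For the isothermal case (A = 0) only, the change of variables from distance x to cumulative cost w can be written out as a worked example. This sketch assumes the exponential density of travel distances with mean B that corresponds to the fixed rate $P_x = 1/B$; the expressions may differ in presentation from the numbered equations of this appendix.

```latex
\begin{align}
f_x(x) &= \frac{1}{B}\,e^{-x/B}, \qquad
w = g(x) = \frac{x}{B}, \qquad
g^{-1}(w) = Bw, \\
f_w(w) &= f_x\!\left(g^{-1}(w)\right)\left|\frac{dg^{-1}(w)}{dw}\right|
        = \frac{1}{B}\,e^{-w}\,B = e^{-w}, \\
F_w(w) &= 1 - e^{-w}, \qquad
\int_0^\infty w\,e^{-w}\,dw = 1 .
\end{align}
```

Under these assumptions the isothermal cumulative cost is exponentially distributed with unit mean, independent of B.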

The total cumulative cost W(w) up to the value w is

For isothermal conditions,

For non-isothermal conditions,

The total cumulative cost W(x) up to the distance x is

For non-isothermal conditions,

The total cumulative cost W(x) systematically increases with increasing travel distance x (Figure B1).
Appendix C

Here, "the quantity on the left side of Eq. (C1) is the total cost-minus-benefit when a particle joins a k-mer community. The joining cost has two components, expressed on the right side: each joining event has an intrinsic cost $\alpha_0$ that must be paid, and each joining event involves some discount that is provided by the community. Because there are k members of the existing community, the quantity $\alpha_k/k_0$ is the discount given to a joiner by each existing community particle, where $k_0$ is a problem-specific parameter that characterizes how much of the joining cost burden is shouldered by each member of the community." Rearranging Eq. (C1) then gives

which is analogous to the disentrainment rate function $P_x(x)$ given by Eq. (3) in that k and x are in the denominators of these cost functions.