Introducing PebbleCounts: A grain-sizing tool for photo surveys of dynamic gravel-bed rivers

Grain-size distributions are a key geomorphic metric of gravel-bed rivers. Traditional measurement methods include manual counting or photo sieving, but these are achievable only at the 1–10 m scale. With the advent of unmanned aerial vehicles and increasingly high-resolution cameras, we can now generate orthoimagery over hectares at sub-cm resolution. These scales, along with the complexity of high-mountain rivers, necessitate different approaches for photo sieving. As opposed to other image segmentation methods that use a watershed approach to automatically segment entire images, our open-source 5 algorithm, PebbleCounts, relies on k-means clustering in the spatial and spectral domain and rapid manual selection of welldelineated grains. The result is improved grain-size estimates for complex river-bed imagery, without any post processing. In a second step, we develop a fully automated method, PebbleCountsAuto, that relies on edge detection and filtering suspect grains, without the k-means clustering or manual selection steps. The algorithms are tested in controlled indoor conditions on three arrays of pebbles and then applied to 12 × 1 m orthomosaic clips of high-energy mountain rivers collected with a 10 camera-on-mast setup (akin to a low-flying drone). A 20-pixel b-axis length lower truncation is necessary for attaining accurate grain-size distributions. For the k-means PebbleCounts approach, average percentile bias and precision are 0.03 and 0.09 ψ, respectively, for ∼1.16 mm/pixel images, and 0.07 and 0.05 ψ for one 0.32 mm/pixel image. The automatic approach has higher bias and precision of 0.13 and 0.15 ψ, respectively, for ∼1.16 mm/pixel images, but similar values of −0.06 and 0.05 ψ for one 0.32 mm/pixel image. For the automatic approach, only at best 70% of the grains are correct identifications, and 15 typically around 50%. PebbleCounts operates most effectively at the 1 m scale, where the algorithm can be rapidly applied in ∼5 minutes in many small areas to acquire accurate grain-size data over 10–100 m areas. These data can be used to validate PebbleCountsAuto applied at the scale of entire survey sites (10–10 m). We synthesize results and recommend best practices for image collection, orthomosaic generation, and grain-size measurement using both algorithms.

Accurate grain-size measurement is elusive in nature given the heterogeneity of gravel-bed rivers, particularly in steep mountain catchments where the range of grain sizes is large.Traditionally, grain-size distributions have been gathered via Published by Copernicus Publications on behalf of the European Geosciences Union.
physical clast measurement and counting along grids (Wolman, 1954), lines (Wohl et al., 1996), or in ∼ 1 m 2 patches (Bunte and Abt, 2001), all truncated at some lower observable limit (e.g., Rice and Church, 1998).Not only are these techniques time-consuming, prone to operator bias, and disruptive to the environment, but they also require large (hundreds of pebbles) sample sizes to accurately estimate the characteristic nature of the grains in each location (Wolcott and Church, 1991).
In light of this, measurement from photographs is an attractive option for increasing sample size and decreasing fieldwork, while covering larger areas.Increasingly affordable high-resolution -12-to 24-megapixel -cameras allow the collection of high-quality photo surveys via structure from motion with multi-view stereo (SfM-MVS) (Smith et al., 2015;Eltner et al., 2016) at scales of entire river cross sections or reaches with resolutions at, or exceeding, 1 cm pixel −1 (e.g., Woodget and Austrums, 2017).Even higher-resolution (1 mm pixel −1 ) river surveys can be accomplished with low-flying unmanned aerial vehicles (UAVs) (e.g., Carbonneau et al., 2018), pole-mounted cameras, or using handheld imagery.
We build on previous work and introduce the addition of color-space clustering techniques to present efficient new semi-automated (PebbleCounts) and fully automated (Peb-bleCountsAuto) algorithms for grain sizing from imagery in high-energy mountain rivers.Our algorithms are built on Python with a few popular libraries and are open source.The instructions and code can be accessed at https://github.com/UP-RS-ESP/PebbleCounts (last access: 12 September 2019) (Purinton and Bookhagen, 2019).In this study, we present previous work on grain-size measurement from rivers and our motivation for new developments.The processing chains of PebbleCounts and PebbleCountsAuto are then discussed.We test the algorithms in controlled conditions and then in a more challenging field setting in the northwestern Argentine Andes.The limits and caveats of the method are discussed using imagery of varying resolution, and suggestions for photo collection and processing are provided.
These methods only provide one estimate of grain size (e.g., D 50 ), which often requires site-specific calibration.Buscombe (2013) achieved full grain-size distribution measurements using wavelet decomposition and published an open-source Python tool, pyDGS.This is a texture method that has been designed for the analysis of thin sections or beach sands and requires each grain to be fully resolvable and the distributions to be fairly homogeneous in size and shape.Additional texture methods rely on the 3-D texture (or roughness) of point clouds to relate the variance of bed-scale topography to average grain size (Brasington et al., 2012;Rychkov et al., 2012;Westoby et al., 2015;Woodget and Austrums, 2017;Bertin and Friedrich, 2016); however, these techniques also require site calibration, and the relationships have been found to vary widely (Pearson et al., 2017).
In contrast to texture methods, the focus of segmentation is the full delineation and measurement of every visible grain.Segmentation is error prone in images that contain overlapping grains, a large range of grain sizes including sand patches, changes in land cover (e.g., vegetation), pebbles that are highly irregular in shape (non-ellipsoid), pebbles with intragranular color variations or texture such as veins or fractures, and in which shadowing is irregular.Herein, we refer to these factors collectively as image complexity.Furthermore, segmentation-based methods also require high-spatialresolution point clouds or images that resolve the specific grain geometries.The benefits are that segmentation does not require any site calibration besides knowledge of the image scale and it provides a full grain-size distribution and all the commonly used percentiles (D 5,16,25,50,75,84,95 ).Published methods by Butler et al. (2001), Sime and Ferguson (2003), and Graham et al. (2005a, b) all rely on edge detection followed by watershed segmentation and ellipse fitting to each separate grain to get the long (a) and intermediate (b) axes.Detert and Weitbrecht (2012) added some sophistication to the algorithm of Graham et al. (2005a, b) and provide a free -though closed-source -application called Basegrain for Matlab ™ , which has become a standard tool (e.g., Bertin and Friedrich, 2016;Bertin et al., 2017;Langhammer et al., 2017;Carbonneau et al., 2018).

Motivation
Watershed segmentation is effective for interlocking, uniformly colored, oblate grains; however, energetic gravel-bed rivers in mountains often have more complex grain compositions with intragranular variation, irregular shadowing, and a large range of sizes.The automated watershed methods proposed suffer from oversegmentation, grain misidentification, and the need for significant, time-consuming post-processing (e.g., in Basegrain with the split, merge, and delete tools) when applied to complex images.These issues limit their application to areas < 10 m 2 .
Thus, we are motivated to develop a new semi-automated technique that uses k-means clustering of pixels and rapid manual selection of well-defined grains, herein referred to as the k means with manual selection (KMS) or PebbleCounts approach, and a fully automated version that uses filtering of suspect grains, herein referred to as the automatic with image filtering (AIF) or the PebbleCountsAuto approach (Fig. 1).By avoiding oversegmentation and misidentification, we are able to select fewer grains per image but be sure that those selected are correctly delineated, thus improving the resulting distribution (Fig. 2), with the intention of upscaling to include many thousands of grain measurements over large areas.Despite the selection of fewer grains, Fig. 2 demonstrates that these represent the true grain size through the close match in distribution with hand-clicked results.
Furthermore, faced with diverse camera models and the rise of SfM-MVS for the generation of georeferenced orthophotos, we wish to explore reasonable and appropriate combinations for covering acre-to-hectare areas while maintaining accurate grain-size measurement.Fundamentally, our aim for the KMS approach is not in the delineation of a single high-resolution image from a ∼ 1 m 2 patch as in previous segmentation work but rather a method that can cover areas of 10-100 m 2 containing complex grain arrangements, despite missing many grains at the patch scale.These semiautomated photo-sieving results can then be used to validate the AIF method at much greater spatial scales (10 2 -10 4 m 2 ), where physical counting is infeasible and previous methods are unreliable or time-consuming.

Algorithm description
Our methods are similar to previous work by Graham et al. (2005a) and Detert and Weitbrecht (2012), with some key differences.A flow chart of both methods is shown in Fig. 3 and the processing is presented briefly.We direct the interested user to the manual (https://github.com/UP-RS-ESP/PebbleCounts) for a full description of the steps.Our algo-  6b).Watershed approach leads to oversegmentation of grains, giving an unreasonable number of clasts (276 versus 106 in the control) and an overly fine grain-size distribution.
rithms use 2-D image processing in the spatial and spectral domains, which ignores the potential to exploit third height dimensions from irregularly spaced point clouds generated via lidar or SfM-MVS.The reader is directed to Sect.S1 in the Supplement for our efforts in this regard.

PebbleCounts: KMS
We employ the additional color spaces HSV (hue, saturation, value) and CIELab (Russ, 2002), aside from traditional RGB (red, green, blue) and gray scale, to enhance differences in the spectral domain separate from lighting.First, the RGB image undergoes strong non-local means de-noising (Buades et al., 2011) to smooth intragranular color difference, interactive gray-scale shadow masking (Otsu, 1979) to separate obvious interstices, and HSV color selection for sand-patch masking (whereby sand is filtered by a narrow, user-selected color mask).The image and shadow/sand edge mask are then windowed for further processing.
At each window, the RGB image undergoes another weaker non-local means de-noising, is then converted to CIELab, and the chromaticity bands from this color space undergo bilateral filtering (Tomasi and Manduchi, 1998) to preserve intergranular edges while further smoothing color.Following this, edge detection on the smoothed, gray-scaled image occurs via a combination of top-hat, Sobel, and Canny methods with feature-AND selections (Russ, 2002), in which an edge is added to the full mask only if it overlaps with a found edge in the previous edge mask, thus piecewise building an edge map while avoiding lone (i.e., intragranular) edges (Detert and Weitbrecht, 2012).
After edge detection, our algorithm uses k-means clustering (Lloyd, 1982;Sculley, 2010) to further segment the pebbles.First, the matrix of non-masked pixels is converted into a vector that includes the spectral information at each location.This N × 4-dimensional vector (N being the number of non-masked pixels) includes two spectral observables: the green-red and blue-yellow smoothed chromaticity bands from CIELab; and the two spatial observables: the x and y coordinates of the pixel in image space.To avoid oversegmentation by anisotropic or image-spanning grains, the x, y coordinates are rescaled to 50 % of the color, which is also rescaled from 0 to 1.
We attempted using agglomerative Ward hierarchical clustering (Ward, 1963) to further improve results on anisotropic and/or large grains; however, this approach is prohibitively slow on large images, and test results did not show significant improvement.The k-means clustering requires a usersupplied number of clusters.Here, we add clusters beginning at 1 and recalculate up to an inertia improvement threshold of 1 %-10 % (user supplied).Resulting labeled masks are cleaned via binary operations and the user is prompted to select the labeled regions that contain full, single grains within a simple pop-up window (Fig. S3).
After selection, the orientation and a and b axes of an ellipse fit to the labeled region, shown to accurately approximate grain size (Graham et al., 2005a), are recorded and the grain is added to the final list and the masked region.This processing takes place over three separate scales representing a "burrowing" of the algorithm through the image (from largest to smallest window/grain size).Scales are set by the user-supplied longest expected a axis and image resolution.In contrast to the 46 variables employed by Basegrain, Peb-bleCounts has 20 command-line variable flags -of which 15 exert influence on the results -with most requiring little to no modification (Table S1).Examples of the command-line interface and clicking steps are in the manual and Sect.S2.

PebbleCountsAuto: AIF
This method applies the same initial non-local means denoising and interactive shadow/sand masking, with the option to input user-supplied values for full automation.From here, we diverge from the windowing and k-means approach and move directly to edge detection on the entire image using the same top-hat, Canny, and Sobel combination with feature-AND selections.
The resulting mask is then cleaned via binary morphological operations and each label is measured via ellipse fitting.To reduce the misidentified grains, the ellipses are filtered in a three-step chain.(A) Does the centroid fall within another ellipse?(B) Does the ellipse overlap with any neighboring el- lipses above some threshold?(C) Is the percent misfit (ellipse area versus grain-mask area) above some threshold?At each step, an answer of "yes" leads to the elimination of the grain.The (A) and (B) steps filter grains that have high overlap or are oversegmented, whereas (C) helps filter areas where multiple grains were combined in one mask or a non-grain was identified (e.g., remaining sand patch).Grains passing the test are taken as the final results, with the assumption that misidentified grains are minimal, particularly when upscaling to large areas and tens of thousands of pebbles on high-quality (low-blur) images.The command-line variables for this method are shown in Table S2, and command-line examples can be found in the manual.
We experimented with resampling (over-and undersampling) the image prior to grain detection to increase smoothing and to improve the detection of larger grains at the cost of measuring fewer smaller grains.The majority of images achieved the best results using the original resolution, though we did find a slight improvement in results using undersampling on some unsharp images (see Sect.S3).The selection of other parameters like the maximum percent misfit is also covered in Sect.S3.

Experimental setup
To test the KMS and AIF approaches on a simple control, we arranged three distributions of well-rounded, river pebbles with a-axis sizes from 3 to 130 mm in semi-overlapping patterns in a 0.5 × 0.5 m area (Fig. 4).As opposed to most studies that use b-axis lengths to measure the grain-size distribution (Bunte and Abt, 2001), in the experimental setup, we use a axes since it was easier to hand-measure the longest axis of the > 200 grains used.Six size-class bins (3-5, 10-20, 25-35, 40-50, 60-70, and 80-130 mm; all a axes) were sampled to approximate two log-normal and one bimodal grain-size distribution.The river pebbles used had uniform intragranular color with minimal striations (i.e., veins), low angularity, and a diverse array of intergranular colors.Lighting was www.earth-surf-dynam.net/7/859/2019/Earth Surf.Dynam., 7, 859-877, 2019 controlled by overhead fluorescent bulbs and the photos were taken without flash to limit cast shadows.

Orthomosaic generation
We tested a Fujifilm X100F model camera with a fixed 23 mm focal length lens and a Sony α6000 model with a removable 35 mm fixed length lens.Both had the same advanced photo system type-C (APS-C) sensors (23.6 mm × 15.6 mm) and both output photos at 24 megapixels in a 4000 × 6000-pixel format.Following initial tests, it became clear that the image quality and grain-size results were practically identical for these two cameras, so the results presented are only those for the Fujifilm, as the photo quality was slightly sharper throughout and less distorted at the image corners.To simulate reduced quality, the 24megapixel Fujifilm picture dimensions were reduced to 75 %, 50 %, and 25 %, resulting in 13.5-, 6-, and 1.5-megapixel images at pixel dimensions of 3000 × 4500, 2000 × 3000, and 1000 × 1500, respectively.
For each test setup, we collected ∼ 10 images from ∼ 20 • off nadir (oblique) and at least four overhead near-nadir (tilts < 10 • ) pictures, for 12-16 photos in total.The collection of oblique images aided in removing doming effects from the resulting point clouds (e.g., James and Robson, 2014) and for capturing the pebble edges and sides (Fig. S1).As consumergrade cameras have square pixels with negligible difference in horizontal and vertical resolution, the image scale can be calculated directly from the camera parameters and camera height with the resolution (R) in mm pixel −1 given by where S is the sensor height or width in millimeters, f is the lens focal length in millimeters, h is the camera height in millimeters, and I is the image height or width in pixels.S and I should either both be the width or both be the height of the sensor and image, respectively.This assumes no major distortions within the field of view, which is not valid for oblique imagery but is negligible for near-nadir photography at close range using non-fisheye lenses.With h = 1.55 m, the resulting image resolutions tested from the Fujifilm were 0.26, 0.35, 0.53, and 1.05 mm pixel −1 by Eq. ( 1).We used the 12-16 photos to generate SfM-MVS orthoimages in Agisoft Photoscan v.1.4.2 (Agisoft, 2018) -renamed Agisoft Metashape in recent versions.This allows rapid output of additional information including point clouds, digital elevation models (DEMs), and the undistorted orthomosaics, with resolution recorded in the image metadata for direct input into PebbleCounts and PebbleCountsAuto.Detailed Agisoft processing steps are provided in Sect.S4.

Comparison metrics
For the simple controlled experiment, with relatively coarse grain-size bins, it is not appropriate to compare percentiles (e.g., D 50 ) or to run Kolmogorov-Smirnov (KS) tests and measure the difference in distributions between the AIF or KMS and control grain-size distributions.Instead, we compared the counts in each bin between the control and algorithm and visually assessed the matching of the grain-size distributions.This provides a reasonable baseline for checking the performance of the algorithm in a highly controlled setting.

Results I: controlled experiment
For each of the three 150-200 clast arrangements, the KMS PebbleCounts runtime was ∼ 7 min on a laptop with 16 GB RAM and 2 cores (Intel i7-6650U 2.20 GHz) and no GPU, whereas the AIF PebbleCountsAuto runtime was ∼ 1 min.Both a single near-nadir image and the combined orthomosaic were used, but the results were entirely consistent aside from some inter-run variability in the KMS approach caused by the non-unique solution of k-means clustering.Given this consistency, we only present the results from the single nearnadir images.Furthermore, the use of only four overlapping near-nadir photos also generated the same results, albeit in about one-sixth the Agisoft orthomosaic processing time of using all 12-16 photos (∼ 10 min versus ∼ 1 h on the same laptop).
Across all three distributions, the KMS approach consistently undercounts the number of clasts in each a-axis bin (Fig. 4).However, and in agreement with previous research (Graham et al., 2010), this undercounting is uniformly distributed, and thus the grain-size distributions do not show notable differences between the algorithm and control.For the two arrangements with increased fine (3-5 mm) and coarse (60-130 mm) pebbles (Fig. 4b, c), the undercounting is stronger at the finer end of the distribution, leading to a slight underestimation of the grain-size distribution by the KMS approach in this region.This is caused partially by the user missing more of the smaller grains (of which there are exponentially more), some smaller grains being partially hidden by the larger, and also by the smallest grains being only a few pixels in area and thus eliminated during mask-cleaning steps or not captured at all.On the other hand, the AIF approach tends to overcount the fine pebbles, leading to overestimation of the grain-size distribution, because many small non-grain areas remaining in the masked image are automatically selected in the final result, rather than ignored as in the KMS approach.
As we reduced the resolution from 0.26 to 1.05 mm pixel −1 , the reduction in the finest size class increased dramatically for the KMS approach (Fig. 5).At the lowest resolution tested (1.5 megapixels), this undercounting leads to severe discrepancies in the grain-size  distribution curve.As the resolution degrades, it becomes more difficult to discern rocks in the smallest size class (3-5 mm), which correspond to a-axis grain sizes of 12-19, 9-14, 6-9, and 3-5 pixels for the 24-, 13.5-, 6-, and 1.5megapixel resolutions, respectively, indicating the necessity of a limiting lower measurement factor (e.g., Graham et al., 2005a).

Field setting
Having established the algorithms on control data, we sought to evaluate the performance on complex, natural photos.Field data provide the real-world application and detailed uncertainty analysis most useful for researchers seeking to apply the methods to their own sites.For this, we turned to photo surveys carried out on gravel-bed river cross sections of the foreland and topographic transition zone of the www.earth-surf-dynam.net/7/859/2019/Earth Surf.Dynam., 7, 859-877, 2019 northwestern Argentine Andes (Fig. 6).This is an area of strong precipitation, topographic, and environmental gradients, and the dynamic rivers surveyed are capable of transporting enormous quantities of sand, gravel, and boulders of various lithologies (Bookhagen and Strecker, 2012;Purinton and Bookhagen, 2018).Catchment-average erosion rates from the area, based on cosmogenic nuclide inventories, suggest rates on the order of 0.6-1 mm year −1 (Bookhagen and Strecker, 2012), with large variability during the Pleistocene and Holocene (Tofelde et al., 2017).The region is frequently affected by extreme hydrometeorological events that lead to flooding and drainage-pattern rearrangement (Castino et al., 2016(Castino et al., , 2017)).

Orthomosaic generation
All cross-section surveys were collected using the Sony α6000 camera model at 24-megapixel resolution, and survey sizes ranged from ∼ 1000 to 5000 m 2 .In this case, the standard zoom lens delivered with the camera was used at the shortest focal length of 16 mm to maximize the field of view.Also, to help cover the large survey sites, the camera was affixed to the end of a pole with a remote control trigger, allowing overhead shots to be collected from a height of 4.5-5 m (Fig. 7), giving a ground resolution of approximately 1.1-1.2mm pixel −1 by Eq. ( 1).UAV flights have proven difficult in the windy conditions experienced in these valleys, but flights at 20-30 m heights with the 12-megapixel camera provided on the DJI Mavic and Phantom models (focal lengths of 3.6-4.3mm, sensor dimensions of 6.17 × 4.55 mm, and image dimensions of 4000 × 3000 pixels) would result in image resolutions of ∼ 7-13 mm pixel −1 and are thus inadequate for delineating centimeter-scale pebbles.
To generate georeferenced orthomosaics that could be tiled and passed directly to PebbleCounts and Pebble-CountsAuto, survey sites on the dry riverbed were laid out with on average 18 coded targets (with a range of 10-24) and the position of each was measured with a differential GPS (Fig. 7).Kinematic post-processing with a permanent base station < 100 km away at the Universidad Nacional de Salta (UNSA) in Salta, Argentina, led to centimeter accuracy of XYZ target locations.The site was traversed in a crosshatched pattern with a photo captured every two to three paces, so that each location appeared in ∼ 9 near-nadir pictures from slightly different angles.We refer to the images as near nadir, rather than nadir, due to the fact that during mast photo collection some unintentional tilting of the camera (< 10 • ) occurred.These near-nadir photos aided in removing doming effects but did not allow us to capture the sides of pebbles as in the oblique images taken in the experimental setup (Fig. S1).Capturing oblique images of every patch in the field sites would require infeasible amounts of time and processing power.
Agisoft processing was similar to that described for the experiment, with some key differences (see Sect.S4).Given the volume of photos (600-1300 per site), the sites were processed automatically using the Python API for Agisoft, with processing times consistently over 10 h on an 80-core, 500 GB RAM server making use of 1 GPU NVIDIA Tesla K80 unit for some of the steps (e.g., dense matching).
From 10 of our full survey sites over three different river systems, we selected 12× ∼ 1 m 2 patches to clip out of the full orthomosaics and evaluate using the KMS and AIF approaches.The final resolution of these 12 GeoTiff orthoimages matched the theoretical value from Eq. ( 1), with an average of 1.16 mm pixel −1 and range of 1.08-1.24mm pixel −1 (standard deviation of 0.05 mm pixel −1 ).The patches (Fig. 6b) include variable amounts of sand and a large range of grain sizes, packing arrangements, and shadowing.From one site (S14A), there were handheld images available for the same selected patch from the same Sony α6000 camera zoomed to 20 mm focal length and taken from a height of ∼ 1.5 m, allowing for the generation of a complementary orthomosaic at 0.32 mm pixel −1 resolution.

Comparison metrics
For control data from the field, we return to b-axis measurements (rather than a axes as in the lab).In each patch, the b axes of all grains visible to the naked eye were manually digitized.This generated a 5490 pebble control dataset across all 12 mast-surveyed sites.For the lone handheld patch at 0.32 mm pixel −1 , the control data were 1726 pebbles versus 621 from the same patch at the 1.12 mm pixel −1 mast resolution, as smaller grains could be manually measured on the image at a resolution that was 4 times greater.
The use of continuous control data, as opposed to discrete bins in the lab experiment, allows a more detailed investigation of the performance of both approaches, including biases and their correction.The b-axis measurements of overlapping control and KMS grains were compared to look for sizing bias.This was followed by a search for the lower truncation limit (the lower cutoff in b-axis length in pixels that grains are reliably measured at) of the algorithm, also using the KMS results.For parts of the analysis, the size data were converted to the typical ψ scale (ψ = −φ = log 2(mm)) of grain-size measurement of coarse river sediments.This allows direct comparison of statistical results with other studies (e.g., Graham et al., 2005b) We compared the grain-size distributions from the KMS and AIF approaches with the control using a two-sample KS test to check the null hypothesis that the two samples are drawn from the same distribution.Because sample sizes were at times small, leading to erroneous KS-test results, we also devised a second metric of grain-size distribution comparison.Similar to the KS test, which uses the maximum distance between the cumulative distribution functions (grainsize distributions), our metric interpolates both distributions to the same lengths in 0.1 ψ steps and then sums the difference between the re-interpolated curve to give an approx-  imate integral of the difference between the two grain-size distributions (AIF or KMS minus the control), which we term A diff .Here, an A diff value close to 0 indicates good matching, and positive or negative values indicate underestimation or overestimation, respectively.
We also examined the performance of some key percentiles (D 5,16,25,50,75,84,95 ).The metrics for comparison of control (P C ) and KMS or AIF (P P ) percentiles are consistent with other studies (Sime and Ferguson, 2003;Graham et al., 2005bGraham et al., , 2010)).These are the mean (m = 1/n • (P P − P C )), the mean squared (ms = 1/n • (P P − P C ) 2 ), and the irreducible random error (e = √ ms − m 2 ).The bias of Pebble-Counts is quantified by m, and e measures the scatter or precision after bias correction (Sime and Ferguson, 2003).

Results II: field surveys
The KMS PebbleCounts approach took ∼ 10 min per 1 m 2 orthomosaic clip at 1.16 mm pixel −1 resolution, depending on the number of grains, and particularly the number of finer grains, present.Runtime for the AIF PebbleCountsAuto approach was typically ∼ 2 min per site.All runtimes refer to the same laptop with 16 GB RAM and 2 cores (Intel i7www.earth-surf-dynam.net/7/859/2019/Earth Surf.Dynam., 7, 859-877, 2019 6650U 2.20 GHz) and no GPU.For the 0.32 mm pixel −1 image, the processing for KMS took ∼ 45 min, as there were more fine grains to be identified (given the log-normal distribution) and so the clicking took exponentially longer, and the AIF took ∼ 20 min given the longer time spent filtering the large number of grains.We note that the use of a GPU for the filtering steps will significantly improve processing time.
Importantly, these runtimes refer to the use of no lower truncation value, which leads to much longer processing time.
An aggregation and coarse binning of all b axes in the control versus KMS and AIF data for the coarser imagery are presented in Fig. 8.There is obvious undercounting in these data from the KMS results, similar to the experimental setup, and here it causes a significant discrepancy in the grain-size distribution curves.Whereas the manual clicking found over 1000 grains in the smallest classes (1-2 and 2-3 ψ), the KMS approach found none in the smallest and only ∼ 100 in the second smallest.This skews the percentiles to the higher grain sizes and thus overestimates them significantly.In opposition to this, but again in agreement with the experimental setup, the AIF results display significant overcounting at the finer sizes as many non-grains are identified, particularly when the algorithm is run with no lower truncation.
The skewed results from both the KMS and AIF approaches warrant detailed analysis of the algorithms' deficiencies and grain-size distribution corrections.To begin, we examined the performance of PebbleCounts on grains manually digitized and the same grains selected during clicking in the KMS approach on the coarser imagery (Fig. 9).There is only a slight negative bias across all grain sizes, indicating underestimation of individual grains by PebbleCounts; however, this median shift varies with no apparent pattern and is likely caused by uncertainties in the manual b-axis digitization of thousands of grains.For instance, digitization with b-axis vector lines can achieve subpixel accuracy compared to the raster processing of PebbleCounts.The AIF approach measures grains identically to the KMS method and thus has the same misfit errors on correctly identified grains.From this, we conclude that the algorithm is effective on a grainby-grain basis and the skewing of the grain-size distributions is instead caused by sampling errors related to the image resolution and ability to find small grains (see Fig. 5).
The undercounting error can be explored on the full distribution of pebbles by gradually increasing the lower truncation value and assessing the error in percentiles versus the control data at each step (Fig. 10).As truncation is increased, the median percentile error decreases rapidly up to an inflecting value -manually chosen from the graph as a significant local minimum where the median difference is near 0 mm.Truncating the KMS distributions at a minimum baxis length of 23 mm (rounded to 20 pixels) improves the results significantly for the 1.16 mm pixel −1 imagery taken from the mast.Beyond this truncation, there is limited improvement.Regarding the 0.32 mm pixel −1 image, the 20-pixel (6.5 mm) truncation also results in a median difference near 0 mm, with subsequent truncation values leading to only ∼ 0.5 mm improvements.Supplying these truncation values directly to the KMS PebbleCounts tool results in reduced processing time to ∼ 5 min for the coarser imagery and ∼ 15 min for the finer, as many small grains were then ignored and left out of the clicking mask.
The same analysis for the AIF approach is complicated by the large number of false grains found and the extreme overcounting of fine grains.Given this, we instead make the assumption that the similarity of the two methods, particularly in the edge detection and ellipse fitting steps, leads to similar errors in both.Therefore, we assume the same 20-pixel truncation.For the AIF PebbleCountsAuto tool, processing times with the 20-pixel truncation reduced to < 1 min and ∼ 3 min for the coarse and fine images, respectively.
The combined results before and after lower truncation for the coarser (∼ 1.16 mm pixel −1 ) imagery taken from the mast surveys are shown in Fig. 11.Without any lower truncation, the AIF tool results in significant overcounting and grain-size distribution underestimation with a high A diff > 8.The KMS tool instead shows undercounting and grain-size distribution overestimation with a low A diff < −4.Both have KS-test p values < 0.0001.When we apply a 20-pixel truncation, both the AIF and KMS approaches achieve A diff values near or below −1, with the manual KMS approach performing best and achieving a high KS-test p value of 0.2398.The AIF approach retains a low p of 0.0008 with a ∼ 0.1-0.2ψ bias towards coarser values in the upper portion of the grain-size distribution (> D 50 ).
In Sect.S5 (Fig. S7), we show the 20-pixel truncated KMS and AIF results on a site-by-site basis.For the KMS approach, following truncation, 11 sites have p values > 0.1 and one site (S16) has p = 0.0971.A diff values are also near 0, indicating close matching of the grain-size distributions, aside from S24 and S34, which both show large discrepancies.The AIF results in Fig. S7 follow a similar trend to the KMS results, but there is a bias towards coarser values, with many A diff values < −1, and generally poorer results compared with the KMS approach, with grain-size distributions being overestimated by ∼ 0.1-0.2ψ.In the KMS results, despite a high p value, S24 demonstrates a stronger bias in the grain-size distribution towards coarser grains (up to 0.5 ψ discrepancy), as indicated by the high A diff value of −1.36.
Here, the KS-test pass is likely caused by the small sample size remaining after truncation (n = 24), the least of any site.The poor performance of S24 was expected given the large size range with many sub-centimeter pebbles and a few large boulders, strong cast shadows from the large grains, and intragranular edges on angular boulders with quartz veins (see Fig. 6b).Importantly, S24 is the only site not from a major river stem but rather from a debris-flow fan draining a small tributary catchment in the Quebrada del Toro.S34 also had a high A diff = −2.11.In this case, poor performance is due to Earth Surf.Dynam., 7, 859-877, 2019 www.earth-surf-dynam.net/7/859/2019/significant blurriness of this image and again a small sample size (n = 47).We also compared the individual percentiles of interest to assess the bias and accuracy of truncated results (Fig. 12).For the KMS approach, the bias (m) is 0.06 ψ with a precision (e) of 0.13 ψ.Excluding S24 and S34, m and e drop to 0.03 and 0.09 ψ, respectively.The AIF results have higher m and e values of 0.15 and 0.17 ψ, respectively, which are reduced to 0.13 and 0.15 ψ following exclusion of the same S24 and S34 sites, in addition to the S10 site, which was also somewhat blurry and with relatively few grains.For the AIF percentiles, we chose to include S16 despite a large overestimation at higher percentiles (Fig. S7), as this was a sharp image with a relatively large sample size.The high uncertainties from this scene likely require some adjustment of the edge-detection variables (see Sect.S3) for improved segmentation, but the results presented are realistic for fast processing using the AIF method, with the caveat of higher expected uncertainties.
The uncertainties in Fig. 12 are average values, and the inset plots also demonstrate the increasing uncertainty of larger percentiles.The maximum uncertainty for both at D 95 is m = 0.08ψ and e = 0.07 ψ for the KMS result and m = 0.35ψ and e = 0.2ψ for the AIF result.Importantly, since the ψ scale is logarithmic, the larger errors at higher percentiles correspond to similar percentage misfits as lower errors at smaller percentiles (e.g., 0.2 ψ precision at a grain size of 6.5 ψ (91 mm) is a 13 %-15 % misfit, whereas a 0.01 ψ precision at 4.5 ψ (23 mm) is a 4 %-10 % misfit).
As a final test for the KMS and AIF approaches, we turn towards our handheld imagery taken from S14A with a 4 times greater resolution of 0.32 mm pixel −1 (Fig. 13).We only show the 20-pixel truncated results, which displayed high KS-test p values > 0.2 and A diff close to 0 in both cases, with the AIF approach slightly underestimating (A diff = 0.6) and KMS slightly overestimating (A diff = −0.77).For the KMS approach, m and e are 0.07 and 0.05 ψ, respectively, and −0.06 and 0.05 ψ for AIF.

Caveat of PebbleCountsAuto AIF
The promising results of the AIF approach shown in Figs.11-13 come with some consideration of the grain-bygrain accuracy.In Fig. 14, we analyze the percentage of grains found in the AIF approach that have a corresponding grain in either the hand-clicked control (based on a 6 mm buffer of the b-axis line) or the KMS results (based on a www.earth-surf-dynam.net/7/859/2019/Earth Surf.Dynam., 7, 859-877, 2019  6 mm centroid buffer).From this subset of grains, we consider the AIF grain to be a matching (or correct) result if the b-axis difference between it and the nearby "good" grain (from the control or KMS) is < 1 cm.From this, we see that in the best-case scenario the percentage of correct grains identified by the AIF approach is only 70 %, from the handheld 0.32 mm pixel −1 image.A number of sites (S10, S16, S20B, S24, S34, and S35) have < 50 % matched grains.The two poorly performing sites (S24 with grain complexity and S34 with image blur) both demonstrate the lowest accuracy with < 40 % matches.Notably, despite a significant number of false positives in the results, when comparing the overall grain-size distributions (Fig. 11), and on a site-by-site basis (Fig. S7), the distribution of the AIF results matches the hand-clicked control well.The errors associated with the AIF method are demonstrated in Sect.S6.

Discussion
In this study, we developed two new methods for grain-size measurement with low uncertainties and the potential to deliver full grain-size distributions from complex images of Earth Surf.Dynam., 7, 859-877, 2019 www.earth-surf-dynam.net/7/859/2019/high-energy mountain rivers.Our open-source Python-based algorithms perform equally well to other image segmentation tools but can be applied more quickly over larger areas surveyed by the SfM-MVS workflow we present.Critical to success is the application of a strict lower cutoff, which limits the minimum measurable b-axis grain size to 20 times the pixel resolution.The automated version of the algorithm delivers less accurate measurements, but these can be limited by using low-blur, higher-resolution imagery.We focus our discussion on the comparison of our approach with similar work, the effect of the lower truncation on grain-size distri-bution estimates, and practical guidelines for acquiring imagery and applying PebbleCounts, including the application of UAV surveys.

Performance
For comparison of our algorithms to previous work, we do not consider errors reported in studies using texture-based measurements (e.g., Woodget et al., 2018), since these are based on correlative relationships rather than physical measurement of each grain.A match is defined as a grain within 5 pixels of the hand-clicked line or the KMS grain centroid for the 1.16 mm pixel −1 imagery or within 20 pixels for the 0.32 mm pixel −1 image (corresponding in both cases to a distance of ∼ 6 mm), and with a 1 cm maximum b-axis difference between the AIF grain and the match.The total percent correct, taken across all black triangles, is 51 %.
but high-energy mountain rivers with heterogeneous pebble arrangements and large ranges in sizes require segmentation approaches.Similar to other image segmentation methods, the KMS PebbleCounts approach undercounts grain sizes in each respective size class (Graham et al., 2010).This undercounting does not undermine the resulting grain-size distributions and associated percentile estimates, so long as an appropriate lower truncation is defined.This cutoff was found to be 20 pixels (compare to 23 pixels found by Graham et al. (2005a)) in b-axis length (Fig. 10), which explains the degradation in 3-5 mm counting in the reduced-resolution lab images (Fig. 5), where the smallest pebbles were only a few pixels in size as resolution was decreased.
As shown in Fig. 12, when we apply this cutoff and exclude poorly performing images, we find an average m (bias) and e (precision) of 0.03 and 0.09 ψ, respectively, for the ∼ 1.16 mm pixel −1 imagery and 0.07 and 0.05 ψ for the 0.32 mm pixel −1 image.For the AIF approach, these values are 0.13 and 0.15 ψ for the ∼ 1.16 mm pixel −1 imagery and −0.06 and 0.05 ψ for the 0.32 mm pixel −1 image.These are averages, which actually increase at higher percentiles in agreement with other image segmentation methods (e.g., Sime and Ferguson, 2003).We thus suggest higher error budgets at higher percentiles.
As demonstrated in Figs. 14 and S8, there are significant inaccuracies associated with the AIF approach.The errors associated with the AIF approach can be limited when applied to high-quality (low-blur) ∼ 1 mm pixel −1 resolution imagery, with better results possible on < 0.5 mm pixel −1 imagery.Ultimately, the uncertainties are highly dependent on the input image quality and complexity (range in grain size, angularity, intragranular variability) and providing blanket estimates is less useful than end users applying the KMS tool to a subset of images to validate the results of the AIF approach.
In spite of this caveat, our bias and precision values of −0.06-0.15ψ are on the low end of previously published errors from similar techniques (Table 1).To our knowledge, the only study to compare Basegrain results to control data by Westoby et al. (2015), makes comparisons in millime- ters rather than ψ units.Since the ψ scale is logarithmic, in our study, the error in millimeter increases with ψ from ∼ 0.8 mm uncertainty at 4.5 ψ (23 mm) to ∼ 7 mm uncertainty at 6.5 ψ (91 mm) for the ∼ 1.16 mm pixel −1 imagery in the KMS case.Westoby et al. (2015) report even greater bias and lower precision from Basegrain, with errors also increasing in magnitude at higher percentiles.We emphasize that the previous image segmentation techniques discussed all rely on watershed segmentation, whereas neither of our algorithms use this step for the reasons demonstrated in Figs. 1 and 2.

Lower truncation
The issue of lower truncation on grain-size distributions and percentile estimates has received much attention in the literature (e.g., Fripp and Diplas, 1993;Rice and Church, 1996;Bunte and Abt, 2001;Graham et al., 2010).Previously, field geomorphologists were interested in all grains above 8-16 mm, simply because smaller grains were difficult to manually identify and thus underrepresented in the results (e.g., Fripp and Diplas, 1993;Rice and Church, 1998).Previous work suggests that truncation at the finer end of the distribution primarily increases the lower percentiles, while having less effect on the large (> D 50 ) percentiles (Bunte and Abt, 2001).We find significant shifts in all percentiles of > 0.5 ψ when applying a 20-pixel truncation.Graham et al. (2010) report truncation errors of < 0.3 ψ for all percentiles in 1, 3, and 5 ψ truncated distributions.Their better results at lower percentiles are likely because the data were collected manually grid-by-number style in the field with the ability to include smaller grain sizes.The measurement resolution presents the ultimate control on how accurately grain-size percentiles can be measured.The purpose of the KMS and AIF approaches introduced here is in acquiring grain-size distributions from a subset of the full grain-size range present in the river, namely the subset with > 20-pixel b-axis length in image resolution.

Image acquisition
Ideally, collecting more than nine near-nadir images m −2 (as in our field surveys) or collecting an approximately 1 : 2 (or greater) ratio of near-nadir to oblique imagery (as in our experiments with point-cloud data dimensions; see Sect.S1), leads to the highest quality point-cloud results in Agisoft.
Higher-quality point clouds, in turn, lead to less distortion errors during orthorectification and higher-quality orthomosaics.Due to the textured nature of gravel images, we attained comparable results in reduced time using only four overlapping near-nadir images m −2 in the lab setting.In any case, high overlap of ∼ 80 % between images is recommended to ensure the best results.Where a user desires accurate and dense point-cloud data in addition to the 2-D orthomosaics, it is recommended that (many) more images closer to the surface be collected (e.g., Verma and Bourke, 2019) and from oblique viewing angles (e.g., James and Robson, 2014).
As we find the difference in calculated resolution and subsequent grain-size measurement to be negligible between orthorectified and raw near-nadir imagery at these scales, the use of orthoimagery is not strictly necessary when using image-segmentation software like PebbleCounts (e.g., Carbonneau et al., 2018).However, on very rough surfaces with cast shadows from large grains, generating orthoimagery will overcome distortions present in the raw photos.Furthermore, georeferenced orthomosaics may be preferable for capturing large sites at a constant resolution that can be fed into the algorithm.
In terms of camera and photographic height (and thus resolution) considerations, one first needs to assess the minimum grain size that is desired.Following this, the resolution of the image can be determined using Eq. ( 1) with some knowledge of the camera parameters (focal length, camera height, sensor size, and image size).The smallest grain b axis needed should be 20 times this resolution.For instance, using a similar camera to the Sony α6000 (24-megapixel, 15.6 × 23.5 mm sensor, 16 mm focal length), to measure all grains down to 1 cm one needs a resolution of 0.5 mm pixel −1 and thus a maximum camera height of ∼ 2 m.If finer grain sizes are desired, the user can use higher-resolution imagery but must be aware of the longer time needed for processing.

UAV surveying
The > 20 m flight heights typical of UAV surveys lead to centimeter-scale imagery with currently available 12-to 24megapixel cameras, which is less appropriate for Pebble-Counts processing, unless large (> 0.2 m) cobbles and boulders dominate the river site.Acquiring 0.5 mm pixel −1 imagery from a DJI Mavic drone with a 12-megapixel camera requires a very low flight height of ∼ 1.4 m, giving a field of view of only ∼ 1.5 × 2 m.This may be improved using better cameras like on the Mavic 2 Pro (20-megapixel camera), but gathering such imagery with the high overlap (∼ 80 %) required for SfM-MVS processing is still difficult, particularly given current ∼ 20 min flight length limitations from available batteries.Given continual technology improvements (e.g., greater battery life, more accurate geotags from onboard dGPS, higher megapixel cameras, and reduced motion blur), it is within reason to expect hectare to multi-hectare SfM-MVS UAV surveys at mm resolution in seamless orthomosaics along entire river reaches in the near future.But, for the time being, a single, non-overlapping orthoimage workflow proposed by Carbonneau et al. (2018) has high potential to achieve large-areal results.Their workflow, building on Carbonneau and Dietrich (2017), uses a number of high and oblique overlapping flights to orthorectify a lower nonoverlapping flight with millimeter-scale acquisition, with resulting single, scaled images passed to Basegrain, or, alternatively, to PebbleCounts.

Coverage and processing limits
Using handheld imagery, a survey site of 1000-5000 m 2 with ∼ 10 ground control points (GCPs) measured via dGPS can be covered in 2-6 h by one person (including GCP collection).Using a camera-on-mast setup, this time can be reduced by half, with even greater speed possible using more people and cameras (of the same sensor dimensions, focal length, and height).The potential to cover even larger sites up to or exceeding 100 × 100 m (1 ha) is feasible in a day of work by two people (with one measuring the targets and both sharing the photo-taking) using the proposed method with a 16-20 mm focal length lens and a 3-5 m mast.
One limit of the scalability of the PebbleCounts method is processing time.The KMS PebbleCounts tool is recommended to be applied to maximum 1-2 m 2 patches, depending on the image resolution, as the manual clicking of good grains is time-consuming, requiring 5-20 min per patch depending on patch size, image resolution, and abundance of finer grains.On the other hand, the AIF PebbleCountsAuto tool can theoretically be applied at larger scales.However, it is also advisable to tile data and feed them to the algorithm in maximum 1-2 m 2 patches for ∼ 1 mm pixel −1 imagery, since the non-local means de-noising can take minutes on very large images (> 2000 × 2000 pixels).Again, the use of systems with GPUs or large memory will shorten processing times and allow for larger images to be run.
In practical terms, a workflow to cover a ∼ 2500 m 2 survey site captured at 1 mm pixel −1 resolution and processed into a georeferenced orthomosaic would be (1) tiling into 2 m 2 patches, (2) passing each patch to the AIF PebbleCountsAuto tool with quick manual steps of shadow-masking and sandclicking (if sand is present), where each tile takes 1-2 min, (3) selecting a random subset of ∼ 20 tiles to pass to the KMS PebbleCounts tool as validation and uncertainty estimation for the AIF approach.Such a workflow could be accomplished in 1-2 d of work by an experienced user, providing tens to hundreds of thousands of measured grains from the survey site and a robust measurement of the full grain-size distribution.To increase processing speed, a gridded subset of tiles could also be extracted from the full survey site, with a 3-5 m step size between patches, to provide complete coverage across heterogeneous gravel-bar features, while avoiding unnecessary oversampling and processing of every patch in the survey site.

Conclusions
Using a k-means approach for pebble segmentation in the spectral and spatial domain combined with fast manual selection of good results, we developed a new semi-automated algorithm for grain sizing optimized for images taken over gravel-bed rivers (PebbleCounts).We also developed an automated algorithm that uses suspect grain filtering (Peb-bleCountsAuto), albeit with larger uncertainties in the results.The lower truncation of the methods (minimum b-axis length measurable) is limited to 20 pixels and above.These new methods were necessary to acquire grain-size distributions from dynamic high-mountain rivers with complexity from sources such as large ranges in grain size, intragranular heterogeneity, grain overlap, irregular shadowing, and sand patches.Similar to previous methods, PebbleCounts is best applied at the patch scale (∼ 1 m 2 ); however, PebbleCounts provides more realistic results in complex images without any post-processing steps in ∼ 5-10 min per patch, assuming ∼ 1 mm pixel −1 resolution imagery.PebbleCountsAuto performs very well on high-quality (low-blur) imagery, though with remaining misidentification that must be approached with caution.Grain-sizing results can be upscaled to areas on the order of 10 2 -10 4 m 2 when PebbleCounts results are used as validation for the automated PebbleCountsAuto function.

Figure 1 .
Figure1.Difference between our k means with manual selection (KMS) and automatic with image filtering (AIF) approaches versus a fully automated watershed segmentation approach on a gravel image from a high-mountain river.The a and b axes of each grain mask are found via an ellipse fit to the same area.Fewer grains are found in the KMS and AIF results, and there is still some misidentification in the case of AIF but less than in the watershed result.

Figure 2 .
Figure 2. Watershed segmentation (blue, dashed and dotted line) versus KMS (gray, dashed line) and AIF (red, dotted line) approaches compared with a hand-clicked b-axis grain-size distribution (black line) for a ∼ 1 m 2 river patch (S09 in Fig.6b).Watershed approach leads to oversegmentation of grains, giving an unreasonable number of clasts (276 versus 106 in the control) and an overly fine grain-size distribution.

Figure 3 .
Figure 3. Flowchart of PebbleCounts (a) and PebbleCountsAuto (b).The boxes are user-supplied input or output from the algorithm.Dashed lines indicate a user input step during processing, either entering and checking values or clicking.

Figure 4 .
Figure 4. Result of KMS (gray, dashed lines) and AIF (red, dotted lines) on the three experimental lab setups (a-c) with known grain inputs in six size classes (black line), measured as the grain a axis: (a) log normal, (b) log normal with increased number of all classes, including fines, and (c) skewed bimodal with increased number of coarser grains.Bottom row shows the counts per bin and the top row shows the resulting grain-size distribution.The images are 0.26 mm pixel −1 (24 megapixels).

Figure 5 .
Figure 5. Results of reducing the image dimensions to (a) 75 % (13.5 megapixels), (b) 50 % (6 megapixels), and (c) 25 % (1.5 megapixels) and rerunning the KMS approach on the distribution in Fig. 4a.The control is shown as solid black (left y axis) and gray (right y axis) lines and KMS as the dashed lines.

Figure 6 .
Figure 6.(a) Field cross-section survey sites (black triangles) in NW Argentina from three gravel-bed rivers (Toro, Vaqueros, and Grande) and their tributaries, draining from the sparsely vegetated mountains in the west towards the verdant foreland and city centers of Salta and Jujuy in the east.The Landsat 8 RGB composite satellite image (using bands 2, 3, and 4) from 12 June 2017 shows the climatic transition from wet foreland to dry mountains, demarcated by the green-brown transition zone, running approximately north-south, corresponding to vegetation changes.(b) Detailed view of the 12× ∼ 1 m 2 orthomosaic clips from each of the field sites with average resolution of 1.16 mm pixel −1 .

Figure 7 .
Figure 7. Sony α6000 24-megapixel camera affixed to mast for photo collection at a height of 4.5-5 m (a) and differential GPS measurement of coded targets (b).

Figure 8 .
Figure 8.Comparison of (a) KMS and (b) AIF at the 12 field sites all aggregated and coarsely binned.Control is shown as solid black (left y axis) and gray (right y axis) lines and KMS and AIF as the dashed lines.

Figure 9 .
Figure 9. Measurement error of PebbleCounts (here the KMS results) versus control on a grain-by-grain basis for overlapping grains in the coarser (1.16 mm pixel −1 ) imagery.There is an overall median shift, but the binned medians do not display a consistent pattern.

Figure 10 .
Figure 10.(a) Error in each percentile (5th-95th) as lower truncation value is increased in 1 mm steps for the 1.16 mm pixel −1 imagery.Only a few steps are plotted for clarity.(b) The median difference in percentiles compared with the control versus the lower truncation value, with the normalized median absolute difference (NMAD) shown as the error envelope(Höhle and Höhle, 2009).From this analysis, we select a lower truncation of 20 pixels.The analysis in panel (a) was repeated for the finer image (with 0.5 mm truncation steps) to get the gray squares line in panel (b) and is not shown here.

Figure 11 .
Figure 11.Results from hand-clicked control (black line), KMS PebbleCounts (gray, dashed line), and AIF PebbleCountsAuto (red, dotted line) with the initial non-truncated run (a) and the 20-pixel truncated run (b).In corresponding colors are the p value results of a KS test and the A diff approximate integral between the curves for each approach versus the control data.The legend indicates the number of grains (n) making up each curve.Note the reduction in x-axis scale between the columns, where the right truncated distributions are plotted on a narrower range to emphasize the remaining discrepancies.The curves separated by site (Fig. 6b) are shown in Sect.S5 and Fig. S7.

Figure 12 .
Figure 12.Comparing the key b-axis percentiles across all 12 field sites and between the KMS and AIF approaches with the 20-pixel truncation applied.(a) All 12 sites from KMS, (b) KMS improvement when excluding S24 and S34, (c) all 12 sites from AIF, and (d) AIF improvement when excluding S10, S24, and S34.For the main plot, each data point is a percentile value from a single site and the 1 : 1 relationship is the gray diagonal.The mean (m), mean squared (ms), and irreducible (e) errors are shown for each plot, taken as the average of all seven percentile errors across the 9-12 sites plotted.The m and e are separately plotted for each percentile in the inset plot.The number of grains in the control ("control grains") and KMS or AIF results ("grains found") is also indicated.The individual site curves where these data points originate are shown in Sect.S5 and Fig. S7.

Figure 13 .
Figure 13.(a) Results from hand-clicked control (black line), KMS PebbleCounts (gray, dashed line), and AIF PebbleCountsAuto (red, dotted line) from the 20-pixel truncated run on the 0.32 mm pixel −1 handheld imagery.In corresponding colors are the p-value results of a KS test and the A diff approximate integral between the curves for each approach versus the control data.(b) Percentile comparison for both methods with KMS in gray and AIF in red, with inset box showing the uncertainties for each in the corresponding color.

Figure 14 .
Figure 14.Percentage of grains from AIF results with a matching grain in either the hand-clicked control or in the KMS result.A match is defined as a grain within 5 pixels of the hand-clicked line or the KMS grain centroid for the 1.16 mm pixel −1 imagery or within 20 pixels for the 0.32 mm pixel −1 image (corresponding in both cases to a distance of ∼ 6 mm), and with a 1 cm maximum b-axis difference between the AIF grain and the match.The total percent correct, taken across all black triangles, is 51 %.
Texture methods work well for homogeneous pebble arrangements in lower-energy settings,

Table 1 .
Comparison of PebbleCounts and PebbleCountsAuto results with other segmentation-based pebble counting studies.Taken from only three percentiles(D 50,84,95).bCorrectedvalue presented byGraham et al. (2005b).cComparisonmade in millimeters, converted to ψ units here.d Large spread caused by significant disagreement at higher percentiles. a