Automated riverbed composition analysis using deep learning on underwater images

Ermilov, Alexander A.; Benkő, Gergely; Baranya, Sándor

doi:https://doi.org/10.5194/esurf-11-1061-2023

Articles | Volume 11, issue 6

Research article

01 Nov 2023

Abstract

The sediment of alluvial riverbeds plays a significant role in river systems both in engineering and natural processes. However, the sediment composition can show high spatial and temporal heterogeneity, even on river-reach scale, making it difficult to representatively sample and assess. Conventional sampling methods are inadequate and time-consuming for effectively capturing the variability of bed surface texture in these situations. In this study, we overcome this issue by adopting an image-based deep-learning (DL) algorithm. The algorithm was trained to recognise the main sediment classes in videos that were taken along cross sections underwater in the Danube. A total of 27 riverbed samples were collected and analysed for validation. The introduced DL-based method is fast, i.e. the videos of 300–400 m long sections can be analysed within minutes with continuous spatial sampling distribution (i.e. the whole riverbed along the path is mapped with images in ca. 0.3–1 m² overlapping windows). The quality of the trained algorithm was evaluated (i) mathematically by dividing the annotated images into test and validation sets and also via (ii) intercomparison with other direct (sieving of physical samples) and indirect sampling methods (wavelet-based image processing of the riverbed images), focusing on the percentages of the detected sediment fractions. For the final evaluation, the sieving analysis of the collected physical samples was considered the ground truth. After correcting for samples affected by bed armouring, comparison of the DL approach with 14 physical samples yielded a mean classification error of 4.5 %. In addition, based upon the visual evaluation of the footage, the spatial trend in the fraction changes was also well captured along the cross sections. 
Suggestions for performing proper field measurements are also given; furthermore, possibilities for combining the algorithm with other techniques are highlighted, briefly showcasing the multi-purpose nature of underwater videos for hydromorphological assessment.

Notice on corrigendum
The requested paper has a corresponding corrigendum published. Please read the corrigendum first before downloading the article.
Received: 28 Sep 2022 – Discussion started: 14 Nov 2022 – Revised: 15 Sep 2023 – Accepted: 26 Sep 2023 – Published: 01 Nov 2023

1 Introduction

The physical composition of a riverbed plays a crucial role in fluvial hydromorphological processes as a sort of boundary condition in the interaction mechanisms between the flow and the solid bed. Within these processes, the grains on the riverbed are responsible for multiple phenomena, such as flow resistance (Vanoni and Hwang, 1967; Zhou et al., 2021), stability of the riverbed (Staudt et al., 2018; Obodovskyi et al., 2020), development of bed armour (Rákóczi, 1987; Ferdowsi et al., 2017), sediment clogging (Rákóczi, 1997; Fetzer et al., 2017), and fish shelter (Scheder et al., 2015). Through these physical processes, the bed material composition has a determining effect on numerous river uses, e.g. the possibility of inland waterway transport (Xiao et al., 2021), the drinking water supply through bank filtration (Cui et al., 2021), or the quality of riverine habitats (Muñoz-Mas et al., 2019). Knowledge of riverbed morphology and sediment composition (sand, gravel, and cobble content) is therefore of major importance in river hydromorphology. In order to gain information about riverbed sediments, in situ field sampling methodologies are implemented.

Traditionally, bed material sampling methods are intrusive (i.e. sediment is physically extracted from the bed for follow-up analysis) and carried out via collecting the sediment grains one by one (areal, grid-by-number, and pebble count methods; e.g. Bunte and Abt, 2001; Guerit et al., 2018) or in a larger amount by a variety of grab samplers (volumetric methods, such as WMO, 1981; Singer, 2008). This is then followed by measuring their sizes individually on site or transporting them to a laboratory for mass-sieving analysis (Fehr, 1987; Diplas, 1988; Bunte and Abt, 2001). These sampling procedures are time- and energy-consuming, especially in large-gravel and mixed-bed rivers, where characteristic grain sizes can strongly vary both in time and space (Wolcott and Church, 1991; USDA, 2007), requiring a dense sampling point allocation. The same goes for critical river reaches, where significant human impact led to severe changes in the morphological state of the rivers (e.g. the upper section of the Hungarian Danube; Török and Baranya, 2017). When assessing bed material composition on a river-reach scale, experts usually try to extrapolate from the samples and describe larger regions of the bed (even several thousand square metres) using data gathered from several dozen points (e.g. USDA, 2007; Haddadchi et al., 2018; Baranya et al., 2018; Sun et al., 2021). Gaining a representative amount of the sediment samples is also a critical issue. For instance, following statistical criteria such as those of Kellerhals and Bray (1971) or Adams (1979), a representative sample should weigh anywhere from tens to hundreds of kilograms. Additionally, physical bed material sampling methods are unable to directly quantify important, hydromorphological features such as roughness or bedforms (Graham et al., 2005). Due to these constraints, surrogate approaches have recently been tested to analyse the riverbed. Examples are introduced in the rest of this section. 
Unlike the conventional methods, these techniques are non-intrusive and rely on computers and other instrumentation to decrease the need for human intervention and speed up the analyses.

One group of the surrogate approaches includes the acoustic methods, where an acoustic wave source (e.g. an Acoustic Doppler Current Profiler, ADCP) is pointed towards the riverbed from a moving vessel, emitting a signal. The strength and frequency of this signal are measured while it passes through the water column, reflecting back to the receiver from the sediment transported by the river and finally from the riverbed itself. This approach is fast, and larger areas can be covered relatively quickly (Grams et al., 2013). While it has already become widely used for describing sediment movement (i.e. suspended sediment, Guerrero et al., 2016; bedload, Muste et al., 2016; and indirect flow velocity, Shields and Rigby, 2005) and channel shape (Zhang et al., 2008), it has not reached a similar breakthrough for riverbed material analysis. Researchers experimented with the reflecting signal strength (dB) from the riverbed (e.g. Shields, 2010) to establish its relationship with the riverbed material. Their hypothesis was that the absorption (and hence the reflectance) of the acoustic waves reaching the bed correlates with the type of bed sediment. Following initial successes, the method presented several disadvantages and limitations; hence, it has so far not established itself as a surrogate method for riverbed material measurements. For example, Shields (2010) showed that it was necessary to apply instrument-specific coefficients to convert the signal strength into bed hardness, and these coefficients could only be derived by first validating each instrument using collected sediment samples with corresponding ADCP data. Moreover, the method was sensitive to the bulk density of the sediment and to bedforms. 
Based on his results and observations, the sediment classification could only extend to differentiating between cohesive (clay, silt) and non-cohesive (sand, gravel) sediment patches, but gravel could not be reliably distinguished from sand as they produced similar backscatter strengths. Buscombe et al. (2014a, b) further elaborated on the topic and successfully developed a better, less limited, decision-tree-based approach. They showed that spectral analysis of the backscatter is much more effective for differentiating the sediment types compared to the statistical analysis used by Shields. With this approach it became possible to classify homogeneous sand, gravel, and cobble patches. However, Buscombe et al. (2014a, b) also emphasise that acoustic approaches are not capable of separating the effects of surface roughness from the effects of bedforms; therefore, the selection of an appropriate ensemble averaging window size is of great importance for their introduced method. This size has to be small enough to not include morphological signal, for which an a priori analysis of riverbed elevation profiles is needed at each site. Furthermore, they suggest their method is sensitive to and limited by high concentrations of (especially cohesive) sediment; therefore, its application to heterogeneous riverbeds would require site-specific calibrations. The above-mentioned studies also note that acoustic methods in general inherently do not allow the measurement of individual sediment grains due to their spatial averaging nature. The detected signal strength correlates with the median grain size of the covered area; information about other nominal grain sizes cannot be gained.

Another group of surrogate approaches is the application of photography (Adams, 1979; Ibbekken and Schleyer, 1986) and later computer vision or image-processing techniques. During the last 2 decades, two major subgroups emerged: one uses object and edge detection (by finding abrupt changes in intensity and brightness of the image, segmenting objects from each other; Sime and Ferguson, 2003; Detert and Weitbrecht, 2013), while the other analyses the textural properties of the whole image, using autocorrelation and semi-variance methods to define the empirical relationship between the image texture and the grain size of the photographed sediments (Rubin, 2004; Verdú et al., 2005). Both image-processing approaches were very time-consuming and required mostly site-specific manual settings; however, a few transferable and more automated techniques have also been developed recently (e.g. Graham et al., 2005; Buscombe, 2013). Even though there is a continuous improvement in the applied image-based bed sediment analysis methods, there are still major limitations the users face. These limitations include the following problems.

Most of the studies (all the ones listed above) focus on gravel-bedded rivers, and only a few exceptions can be found in the literature where sand is also accounted for (texture-based methods; e.g. Buscombe, 2013).
The adaptation environment was typically non-submerged sediment instead of underwater conditions (with a few exceptions, e.g. Rubin et al., 2007; Warrick et al., 2009).
The computational demand of the image processing is high (e.g. 1–10 min per image; Detert and Weitbrecht, 2013).
The analysis requires operator expertise (higher than in the case of any conventional method).
There is an inherent pixel and image resolution limit (Buscombe and Masselink, 2008; Cheng and Liu, 2015; Purinton and Bookhagen, 2019). The finer the sediment, the higher the required resolution of the images will be (higher calculation time). Alternatively, they must be taken from a closer position (smaller area and sample per image).
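The resolution limit in the last item can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the function names, the 2000-pixel image width, and the 1 m footprint width are assumptions chosen for the example and are not taken from the cited studies; 2 mm is the conventional sand–gravel boundary.

```python
def mm_per_pixel(footprint_width_m, image_width_px):
    """Ground size of one pixel (mm), assuming the image spans
    footprint_width_m metres of riverbed across image_width_px pixels."""
    if footprint_width_m <= 0 or image_width_px <= 0:
        raise ValueError("footprint and image width must be positive")
    return footprint_width_m * 1000.0 / image_width_px


def grain_span_px(grain_mm, footprint_width_m, image_width_px):
    """Number of pixels that a grain of diameter grain_mm stretches across."""
    return grain_mm / mm_per_pixel(footprint_width_m, image_width_px)


if __name__ == "__main__":
    # Hypothetical camera: 2000 px wide image covering 1 m of riverbed.
    print(mm_per_pixel(1.0, 2000))        # ground size of one pixel in mm
    print(grain_span_px(2.0, 1.0, 2000))  # pixels across a 2 mm grain
    # Moving closer so that the same image covers only 0.5 m of bed.
    print(grain_span_px(2.0, 0.5, 2000))
```

With these assumed numbers a grain at the sand–gravel boundary covers only a handful of pixels, which is why finer sediment needs either more pixels per image or a smaller footprint per image, as the list item states.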

Nowadays, with the rising popularity of artificial intelligence (AI), several machine learning (ML) techniques have been implemented in image recognition as well. The two main approaches, segmentation versus textural analysis, still remain; however, an AI defines the empirical relationship between the object sizes (Igathinathane et al., 2009; Kim et al., 2020) or texture types (Buscombe and Ritchie, 2018) in the images and their real sizes. In the field of river sedimentology a few examples can already be found, where ML (e.g. deep learning, DL) was implemented. For instance, Rozniak et al. (2019) developed an algorithm for gravel-bed rivers, performing textural analysis. With this approach, information is not gained on individual grains (e.g. their individual shape and position) but rather the general grain size distribution (GSD) of the whole image. At certain points of the studied river basins, conventional physical samplings (pebble count) were performed to provide real GSD information. Using this data, the algorithm was trained (with ∼1000 images) to estimate GSD for the rest of the study site based on the images. The method worked for areas where grain diameters were larger than 5 mm and the sediment was well sorted. The developed method showed sensitivity to sand coverage, blurs, reduced illuminations (e.g. shadows), and white pixels. Soloy et al. (2020) presented an algorithm that used object detection on gravel- and cobble-covered beaches to calculate individual grain sizes and shapes. A total of 46 images were used for the model training; however, the number of images was multiplied with data augmentation (rotating, cropping, blurring the images; see Perez and Wang, 2017) to enhance the learning session and increase the input data. The method was able to reach a limited execution speed of a few seconds per square metre and adequately measured the sizes of the gravel. Ren et al. (2020) applied an ensemble bagging-based ML algorithm to estimate GSD along the 70 km long region of Hanford Reach on the Columbia River. Due to its economic importance, a large amount of measurement data has been accumulated for this study site over the years, making it ideal for using ML. By the time of the study, 13 372 scaled images (i.e. their millimetre-to-pixel ratio was known) were taken both underwater and in the dry zones, covering approx. 1 m² each. The distance between the image sampling points was generally between 50 and 70 m. An expert defined the GSD (eight sediment classes) of each image by using a special visual evaluation classification methodology (Delong and Brusven, 1991; Geist et al., 2000). This dataset was fed to an ML algorithm along with their corresponding bathymetric attributes and hydrodynamic properties, simulated with a 2D hydrodynamic model. Following this, it was tested to predict the sediment classes based on the hydrodynamic parameters only. The algorithm performed with a mean accuracy of 53 %. Even though this method was not image-based (only indirectly, via the origin of the GSD data), it highlighted the possibilities of an AI for a predictive model using a high-dimensional dataset. Such a large dataset of grain size information can be considered exceptional and takes a huge amount of time to gather, even with the visual classification approach they adopted. Moreover, this was still considered spatially sparse information (point-like measurements, 1 m² covered area, and images dozens of metres away from each other). Buscombe (2020) used a set of 400 scaled images to train an AI algorithm on image texture properties using another image-processing method (Barnard et al., 2007) for validation. The algorithm reached a good result for not only gravel but also sand GSD calculation, outperforming an earlier, but promising, texture-based method (wavelet analysis; Buscombe, 2013). In addition, the method required fewer calibration parameters than the wavelet image-processing approach. The study also foresaw the possibility to train an AI that estimates the real sizes of the grains, without knowing the scale of one pixel (mm/pixel ratio) if the training is done properly. The AI might learn unknown relationships between the texture and sizes if it is provided with a wide variety (images of several sediment classes) and scale (mm/pixel ratio) in the dataset (however, it is also prone to learn unwanted biases). Recently, Takechi et al. (2021) further elaborated on the importance of shadow detection and removal using a dataset of 500 pictures for training a texture-based AI with the help of an object-detecting image-processing technique (Basegrain; Detert and Weitbrecht, 2013). The previously presented studies, applying ML and DL techniques, significantly contributed to the development and improvement of surrogate sampling methods, incorporating the great potential of AI. However, there are still several shortcomings to these procedures. Firstly, none of the image-based AI studies used underwater recordings, even though the underwater environment offers completely different challenges. Secondly, the training images were always scaled, i.e. the sizes of the grains could be easily reconstructed, which is again complicated to accomplish in a river. Lastly, they were not adapted for continuous (i.e. spatially dense) measurement but instead focused on a sparse grid-like approach.

The goal of this study is to further investigate the applicability of image processing as a surrogate method and attempt to solve the shortcomings of previous AI-based approaches. Hence, we introduce a riverbed-material-analysing DL algorithm and field measurement methodology and our first set of results. The introduced technique can be used to measure the gravel and sand content of the submerged riverbed surface. It aims to eventually become a practical tool for exploratory mapping, by detecting sedimentation features (e.g. deposition zones of fine sediment, colmation zones, bed armour) and helping decision-making for river sedimentation management. In addition, the long-term hypothesis of the authors includes the creation of an image-based measurement methodology, where underwater videos of the riverbed could serve multiple sediment-related purposes simultaneously. Part of this is the current approach for mapping the riverbed material texture and composition. Others include measuring the surface roughness of the bed (Ermilov et al., 2020) and detecting bedload movement (Ermilov et al., 2022).

Compared to the studies introduced earlier, the main novelty of our study is that both the training and analysed videos are recorded underwater, continuously along cross sections of a large river. Furthermore, the training is unscaled, meaning that the camera–riverbed distance varies while recording the videos without considering image scale. Moreover, compared to the relatively low number of training images in most previous studies, we used a very large dataset (∼15 000) of sediment images for the texture-based AI, containing mostly sand, gravel, cobble, and to a smaller extent bedrock, together with some other, non-sediment-related objects.

2 Methods

2.1 Case studies

The results presented in this study are based on riverbed videos taken during three measurement campaigns in sections of the Danube, Hungary. The first campaign was at Site A, Ercsi settlement (∼ 1606 rkm, river kilometres), where three transects were recorded; the second one was at Site B, Gönyű settlement (∼ 1791 rkm), with two transects; and the third was at Site C, near Göd settlement (∼ 1667 rkm), with three transects (Fig. 1). Each transect was recorded separately (one video per transect); therefore, our dataset included a total of eight videos.

Figure 1. Locations of the sites where the underwater riverbed videos were recorded. All sites are located in Hungary in central Europe. The surveys were carried out on the Danube, which is Hungary's largest river.

The training of the DL algorithm was done using the video images of Site C and a portion of Site A (test set; see later in Sect. 2.3), while Site B and the rest of the images from Site A served for validation. The measurements were carried out during daytime at a middle-water regime at Site A (Q = 1900 m³ s⁻¹) and at a low-water regime at Site B (Q = 1350 m³ s⁻¹) and Site C (Q = 700 m³ s⁻¹). Site C served only for increasing the training image dataset (i.e. conventional samplings were not carried out at the time of recording the videos); we therefore do not discuss it in further detail in the rest of the study, but its main characteristics are listed in Table 1.

Table 1. Main hydromorphological parameters of the measurement sites. Q_survey is the discharge during the survey, B_survey is the river width during the survey, H_mean,survey is the mean water depth during the survey, S_survey is the riverbed slope during the survey, SSC_survey is the mean suspended sediment concentration during the survey, Q_annual,mean is the annual mean of the discharge at the site, and Q_1 % is the flood discharge with a 1 % annual exceedance probability.

As underwater visibility conditions are influenced by the suspended sediment (SSC_survey – suspended sediment concentration), the characteristics of this sediment transport are also included in Table 1. The highest water depths were around 6–7 m in all cases. At Site A, measurements included mapping of the riverbed with a camera along three separate transects (Fig. 2a). At Site B, two transects were recorded (Fig. 2b).

Figure 2. Bathymetry of Site A and Site B. The measurement cross sections are also marked. The vessel moved along these lines from one bank to the other while carrying out ADCP measurement and recording riverbed videos. Physical bed material samples were also collected at certain points of these sections. The x and y coordinates are given in EOV, which refers to the Hungarian Uniform National Projection system. The background aerial images were downloaded from © Google Earth Pro.

2.2 Field data collection

Figure 3 presents a sketch of the measurement process with the equipment and a close-up of the underwater instrumentation. During the field measurements, the camera was attached to a streamlined weight (originally used as an isokinetic suspended sediment sampler) and lowered into the water from the vessel by an electric reel. The camera was positioned perpendicularly to the water and the riverbed in front of the nose of the weight. Next to the camera, two diving lights worked as underwater light sources, focusing into the camera's field of view (FoV). In addition, four laser pointers were mounted in handmade isolation cases to provide possible scales for secondary measurements. They were also perpendicular to the bottom, projecting their points onto the underwater camera field of view. Their purpose was to ensure a visible scale (mm/pixel ratio) in the video footage for validation. During the measurement procedure, the vessel crossed the river slowly along the transects, while the position of the equipment detailed above was constantly adjusted by the reel. Simultaneously, ADCP and real-time kinematic (RTK) GPS measurements were carried out by the same vessel, providing water depth, riverbed geometry, flow velocity, ship velocity, and position data. Based on this information and by constantly checking the camera's live footage on deck, the camera was lowered or lifted to keep the bed in camera sight and avoid colliding with it. The appropriate camera–riverbed distance depended on the suspended sediment concentration near the bed and the illumination used. The reel was equipped with a register, with its zero adjusted to the water surface. This register showed the length of cable already released under the water, effectively the rough distance between the water surface and the camera (i.e. the end of the cable). Because of the drag force the cable was not vertical; nevertheless, this value was continuously compared to the water depth measured by the ADCP. 
Differencing these two values gave a continuous approximation of the camera–riverbed distance. The appropriate difference was established by monitoring the camera footage while lowering the device towards the bed. This value was then maintained with smaller corrections during the survey of the given cross section, always supported by observing the camera recording, and adjusting to environmental changes. The vessel's speed was also adjusted based on the video and slowed down if the video was blurry or the camera got too far away from the bed (see later in Sect. 3.3). The measurements required three personnel to (i) drive the vessel; (ii) handle the reel, adjust the equipment position, and monitor the camera footage; and (iii) monitor the ADCP data while communicating with the other personnel (see Fig. 3).
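The two pieces of arithmetic described above, the camera–riverbed distance from the reel register and the ADCP depth, and the image scale from the laser dots, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' processing code: the function names, the optional cable tilt angle, the laser spacing, and the pixel coordinates are all hypothetical, and the scale calculation assumes parallel laser beams hitting a bed that is perpendicular to them.

```python
import math


def camera_bed_distance(water_depth_m, cable_length_m, cable_angle_deg=0.0):
    """Approximate camera-riverbed distance in metres.

    water_depth_m:   water depth under the vessel (e.g. from the ADCP)
    cable_length_m:  cable released below the surface (reel register)
    cable_angle_deg: hypothetical cable tilt from vertical caused by drag;
                     0 gives the plain difference described in the text
    """
    if water_depth_m < 0 or cable_length_m < 0:
        raise ValueError("depth and cable length must be non-negative")
    if not 0.0 <= cable_angle_deg < 90.0:
        raise ValueError("cable angle must be in [0, 90) degrees")
    # Vertical depth of the camera below the water surface.
    camera_depth_m = cable_length_m * math.cos(math.radians(cable_angle_deg))
    return water_depth_m - camera_depth_m


def scale_from_lasers(laser_spacing_mm, dot_a_px, dot_b_px):
    """Image scale (mm per pixel) from two laser dots whose physical
    spacing is known (assumes parallel beams and a bed normal to them)."""
    pixel_distance = math.dist(dot_a_px, dot_b_px)
    if pixel_distance == 0:
        raise ValueError("laser dots must be at different pixel positions")
    return laser_spacing_mm / pixel_distance


if __name__ == "__main__":
    # Hypothetical readings: 6.0 m depth, 5.2 m of cable released.
    print(camera_bed_distance(6.0, 5.2))
    # Same readings with an assumed 20 degree cable tilt.
    print(camera_bed_distance(6.0, 5.2, cable_angle_deg=20.0))
    # Two dots 100 mm apart that appear 200 px apart in the frame.
    print(scale_from_lasers(100.0, (100, 200), (300, 200)))
```

Because a tilted cable places the camera shallower than the released cable length suggests, the plain difference underestimates the true distance to the bed, which errs on the safe side with respect to collisions.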

Figure 3. (a) Sketch of the measurement process. The vessel was moving perpendicular to the riverbank along a cross section (i). A reel was used to lower a camera close to the riverbed (ii). Simultaneously, the bed topography and water depth were measured by an ADCP (iii). (b) Close-up sketch of the underwater instrumentation.
