Automated quantification of floating wood pieces in rivers from video monitoring: a new software tool and validation

Wood is an essential component of rivers and plays a significant role in ecology and morphology. It can also be considered a risk factor in rivers due to its influence on erosion and flooding. Quantifying and characterizing wood fluxes in rivers during floods would improve our understanding of the key processes but is hindered by technical challenges. Among the various techniques for monitoring wood in rivers, streamside videography is a powerful approach to quantify different characteristics of wood in rivers, but past research has employed a manual approach that has many limitations. In this work, we introduce new software for the automatic detection of wood pieces in rivers. We apply different image analysis techniques such as static and dynamic masks, object tracking, and object characterization to minimize false positive and missed detections. To assess the software performance, results are compared with manual detections of wood from the same videos, which was a time-consuming process. Key parameters that affect detection are assessed, including surface reflections, lighting conditions, flow discharge, wood position relative to the camera, and the length of wood pieces. Preliminary results had a 36% rate of false positive detection, primarily due to light reflection and water waves, but post-processing reduced this rate to 15%. The missed detection rate was 71% of piece numbers in the preliminary result, but post-processing reduced this error to only 6.5% of piece numbers and 13.5% of wood volume.


Introduction
Floating wood has a significant impact on river morphology (Gurnell et al., 2002; Gregory et al., 2003; Wohl, 2013; Wohl and Scott, 2017). It is both a component of stream ecosystems and a source of risk for human activities (Comiti et al., 2006; Badoux et al., 2014; Lucía et al., 2015). The deposition of wood at given locations can cause a reduction of the cross-sectional area, which can both increase upstream water levels (and the risk for neighboring communities) and laterally concentrate the flow downstream, which can lead to damaged infrastructure (Lyn et al., 2003; Lagasse, 2010; Mao and Comiti, 2010; Badoux et al., 2014; Ruiz-Villanueva et al., 2014; De Cicco et al., 2018; Mazzorana et al., 2018). Therefore, understanding and monitoring the dynamics of wood within a river is fundamental to assess and mitigate risk. An important body of work on this topic has grown over the last two decades, which has led to the development of many monitoring techniques (Marcus et al., 2002; MacVicar et al., 2009a; MacVicar and Piégay, 2012; Benacchio et al., 2015; Ravazzolo et al., 2015; Ruiz-Villanueva et al., 2019; Ghaffarian et al., 2020; Zhang et al., 2021) and conceptual and quantitative models (Braudrick and Grant, 2000; Martin and Benda, 2001; Abbe and Montgomery, 2003; Gregory et al., 2003; Seo and Nakamura, 2009; Seo et al., 2010). A recent review by Ruiz-Villanueva et al. (2016), however, argues that the area remains in relative infancy compared to other river processes such as the characterization of channel hydraulics and sediment transport. Many questions remain open areas of inquiry, including wood hydraulics, which is needed to understand wood recruitment, movement and trapping, and wood budgeting, where better parametrization is needed to understand and model the transfer of wood in watersheds at different scales.
In this domain, the quantification of wood mobility and wood fluxes in real rivers is a fundamental limitation that constrains model development. Most early works were based on repeated field surveys (Keller and Swanson, 1979; Lienkaemper and Swanson, 1987), with more recent efforts taking advantage of aerial photos or satellite images (Marcus et al., 2003; Lejot et al., 2007; Lassettre et al., 2008; Senter and Pasternack, 2011; Boivin et al., 2017) to estimate wood delivery at larger time scales of 1 year up to several decades.
Others have monitored wood mobility once introduced by tracking wood movement in floods (Jacobson et al., 1999; Haga et al., 2002; Warren and Kraft, 2008). Tracking technologies such as active and passive Radio Frequency Identification transponders (MacVicar et al., 2009a; Schenk et al., 2014) or GPS emitters and receivers (Ravazzolo et al., 2015) can improve the precision of this strategy. To better understand wood flux, specific trapping structures such as reservoirs or hydropower dams can be used to sample the flux over time windows (Moulin and Piégay, 2004; Seo et al., 2008; Turowski et al., 2013). Accumulations upstream of a retention structure can also be monitored where they trap most or all of the transported wood, as was observed by Boivin et al. (2015), to quantify wood flux at the flood event or annual scale. All these approaches allow the assessment of wood budgets and the in-channel wood exchange between geographical compartments within a given river reach and over a given period (Schenk et al., 2014; Boivin et al., 2015, 2017).
For finer scale information on the transport of wood during flood events, video recording of the water surface is suitable for estimating instantaneous fluxes and size distributions of floating wood in transport (Ghaffarian et al., 2020). Classic monitoring cameras installed on the river bank are cheap and relatively easy to acquire, set up and maintain. As seen in Table 1, a wide range of sampling rates and spatial/temporal scales have been used to assess wood budgets in rivers. MacVicar and Piégay (2012) and Zhang et al. (2021), for instance, monitored wood fluxes at 5 frames per second (fps) and resolutions from 640 × 480 up to 800 × 600 pixels. Boivin et al. (2017) used a similar camera and frame rate as MacVicar and Piégay (2012) to compare periods of wood transport with and without the presence of ice. Senter et al. (2017) analyzed the complete daytime record of 39 days of videos recorded at 4 fps and a resolution of 2048 × 1536 pixels. Conceptually similar to the video technique, time-lapse imagery can be substituted in large rivers where surface velocities are low enough and the field of view is large. Kramer and Wohl (2014) and Kramer et al. (2017) applied this technique in the Slave River (Canada) and recorded one image every 1 and 10 minutes.
Where possible, wood pieces within the field of view are then visually detected and measured using simple software to measure the length and diameter of the wood to estimate wood flux (pieces/s) or wood volume (m³/s) (MacVicar and Piégay, 2012; Senter et al., 2017). Critically for this approach, the time it takes for researchers to extract information about wood fluxes has limited the fraction of the record that can be reasonably analyzed. Given the outdoor location of the camera, the image properties depend heavily on lighting conditions (e.g. surface light reflections, low light, ice, poor resolution or surface waves), which may also limit the accuracy of frequency and size information (Muste et al., 2008; MacVicar et al., 2009a). In such situations, simpler metrics such as a count of wood pieces, a classification of wood transport intensity, or even just a binary presence/absence may be used to characterize the wood flux (Boivin et al., 2017; Kramer et al., 2017). A fully automatic wood detection and characterization algorithm can greatly improve our ability to exploit the vast amounts of data on wood transport that can be collected from streamside video cameras.
From a computer science perspective, however, automatic detection and characterization remain challenging issues. In computer vision, detecting objects within videos typically consists of separating the foreground (the object of interest) from the background (Roussillon et al., 2009; Cerutti et al., 2011, 2013). The basic hypothesis is that the background is relatively static and covers a large part of the image, allowing it to be matched between successive images. In riverine environments, however, such an assumption is unrealistic because the background shows a flowing river, which can have rapidly fluctuating properties (Ali and Tougne, 2009). Floating objects are also partially submerged in water that has high suspended material concentrations during floods, making them only partially visible (e.g. a single piece of wood may be perceived as multiple objects) (MacVicar et al., 2009b). Detecting such an object in motion within a dynamic background is an area of active research (Ali et al., 2012, 2014; Lemaire et al., 2014; Piégay et al., 2014; Benacchio et al., 2017). Accurate object detection typically relies on the assumption that objects of a single class (e.g., faces, bicycles, animals, etc.) have a distinctive aspect or set of features that can be used to distinguish between types of objects. With the help of a representative dataset, machine learning algorithms aim at defining the most salient visual characteristics of the class of interest (Lemaire et al., 2014; Viola and Jones, 2006). When the objects have a wide intra-class aspect range, a large amount of data can compensate by allowing the application of deep learning algorithms (Gordo et al., 2016; Liu et al., 2020). To our knowledge, such a database is not available in the case of floating wood.
The camera installed on the Ain River in France has been operating more or less continuously for over 10 years, and vast improvements in data storage mean that these data can be saved indefinitely (Zhang et al., 2021). The ability to process this image database to extract wood fluxes allows us to integrate this information over floods, seasons and years, which would significantly advance our understanding of the variability within and between floods over a long time period. An unsupervised method to identify floating wood in these videos by applying intensity, gradient and temporal masks was developed by Ali and Tougne (2009) and Ali et al. (2011). In this model, the objects were tracked through the frame to ensure that they followed the direction of flow. An analysis of about 35 minutes of video showed that approximately 90% of the wood pieces were detected (i.e., about 10% of detections were missed), which confirmed the potential utility of this approach. An additional set of false detections related to surface wave conditions amounted to approximately 15% of the total detections. However, the developed algorithm was not always stable and was found to perform poorly when applied to a larger data set (i.e., video segments longer than 1 hr).
The objectives of the present work are to describe and validate a new algorithm and computer interface for quantifying floating wood pieces in rivers. First, the algorithm procedure is introduced to show how wood pieces are detected and characterized. Second, the computer interface is presented to show how manual annotation is integrated with the algorithm to train the detection procedure. Third, the procedure is validated using data from the Ain River. The validation period comprises six days in January and December 2012, when flow conditions ranged from ~400 m³/s, which is below bankfull discharge but above the wood transport threshold, to more than 800 m³/s.

Monitoring site and camera settings
The Ain River is a piedmont river with a drainage area of 3630 km² at the gauging station of Chazey-sur-Ain, with a mean flow width of 65 m, a mean slope of 0.15%, and a mean annual discharge of 120 m³/s.
The lower Ain River is characterized by an active channel shifting within a forested floodplain (Lassettre et al., 2008). An AXIS 221 Day/Night™ camera with a resolution of 768 × 576 pixels was installed at this station to continuously record the water surface of the river at a maximum frequency of 5 fps (Fig 1). This camera replaced a lower resolution camera at the same location used by MacVicar and Piégay (2012). The specific location of the camera is on the outer bank of a meander, on the side closest to the thalweg, at a height of 9.8 m above the base flow elevation. The meander and a bridge pier upstream help to steer most of the floating wood so that it passes relatively close to the camera, where it can be readily detected with a manual procedure (MacVicar and Piégay, 2012). The flow discharge is available from the national hydrometric website (www.hydro.eaufrance.fr).
The survey period examined on this river was during 2012, from which two flood events (January 1-7 and December 15) were selected for annotation. A range of discharges from 400 m³/s to 800 m³/s occurred during these periods (Fig 1.e), which is above a previously observed wood transport threshold of ~300 m³/s (MacVicar and Piégay, 2012). A summary of automated and manual detections for the six days is shown in Table 3.

Methodological procedure for automatic detection of wood
The algorithm for wood detection comprises a number of steps that seek to locate objects moving through the field of view in a series of images and then identify the objects most likely to be wood. The algorithm used in this work modifies the approach described by Ali et al. (2011). The steps work from the pixel to the image to the video scale, with the context from the larger scale helping to assess whether the information at the smaller scale indicates the presence of floating wood or not. In a still image, a single pixel is characterized by its location within the image, its color and its intensity. Looking at its surrounding pixels, at the image scale, allows that information to be spatially contextualized. Meanwhile, the video data add temporal context, so that previous and future states of a given pixel can be used to assess its likelihood of representing floating wood. At the video scale, the method can embed expectations about how wood pieces should move through frames, how big they should be, and how lighting and weather conditions can evolve to change the expectations of wood appearance, location, and movement.

Wood probability masks
In the first step, each pixel was analyzed individually and independently. The static probability mask answers the question "is this pixel likely to belong to a wood block, given its color and intensity?". The algorithm assumes that the wood pixels can be identified by their light intensity following a Gaussian distribution (Fig 3.a). To set the algorithm parameters, pixelwise annotations of wood under all the observed lighting conditions were used to determine the mean and standard deviation of wood piece pixel intensity. Applying this algorithm produces a static probability mask (Fig 3.b). From this figure, it is possible to identify the sectors where wood presence is likely, which includes the floating wood piece seen in Fig 2.b, but also standing vegetation in the lower part of the image and a shadowed area in the upper left. The advantage of this approach is that it is computationally very fast. However, misclassification is possible, particularly when the light condition changes. The second mask, called the dynamic probability mask, outlines each pixel's recent history. The corresponding question is: "is this pixel likely to represent wood now, given its past and present characteristics?".
Again, this step is based on what is most common in our database: it is assumed that a wood pixel is darker than a water pixel. Depending on lighting conditions, such as shadows cast on the water or waves, this is not always true, i.e., water pixels can be as dark as wood pixels. However, pixels displaying water and then wood tend to become immediately and significantly darker, while pixels displaying wood and then water tend to become significantly lighter. Meanwhile, the intensity of pixels that keep on displaying wood tends to be rather stable. Thus, we assign the wood pixel probability according to an updated version of the function proposed by Ali et al. (2011) (Fig 4.a) that takes four parameters. This updating function f produces a temporal probability mask from the inter-frame pixel value. On a probability map, a pixel value ranges from -1 (likely not wood) to 1 (likely wood). The temporal mask value M_d for a pixel at location (x, y) and at time t is M_d(x, y, t) = f(ΔI, ·) + M_d(x, y, t - 1), where ΔI is the inter-frame change in pixel intensity. We apply a threshold to the output of M_d(x, y, t) so that it always stays within the interval [0, 1]. The idea is that a pixel that becomes suddenly and significantly darker is assumed to be likely wood; the function f is such that, under those conditions, it increases the pixel probability map value (two of the four parameters control this increase). A pixel that becomes lighter over time is unlikely to correspond to wood (a third parameter controls the corresponding decrease). A pixel whose intensity is stable and that was previously assumed to be wood shall still correspond to wood, while a pixel whose intensity is stable and whose probability of being wood was low is unlikely to represent wood now. A small decay factor (the fourth parameter) was introduced in order to prevent divergence (in particular, it prevents noisy areas from being activated too frequently). The final wood probability mask is created using a combination of both the static and dynamic probability masks. Wood objects thus had to have a combination of the correct pixel color and the expected temporal behavior of water-wood-water color. The masks were combined assuming that both probabilities are independent, which allowed us to use the Bayesian probability rule in which the probability masks are simply multiplied, pixel by pixel, to obtain the final probability value for each pixel of every frame.
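As an illustration of how the two masks can be combined, the sketch below gives a minimal per-frame implementation in NumPy. It is not the software's code: the gain, penalty and decay constants, the exact shape of the updating function, and the assumption that intensities are normalised to [0, 1] are placeholders; only the overall structure (a Gaussian static score, a temporal update driven by inter-frame darkening, and a pixel-by-pixel multiplication of the two masks) follows the description above.

```python
import numpy as np

def static_mask(frame, mu_wood, sigma_wood):
    """Static probability mask: closeness of each pixel intensity to the
    Gaussian intensity model of annotated wood pixels (intensities in [0, 1])."""
    return np.exp(-0.5 * ((frame - mu_wood) / sigma_wood) ** 2)

def update_dynamic_mask(prev_mask, prev_frame, frame,
                        gain=0.5, penalty=0.5, decay=0.02):
    """Dynamic probability mask: pixels that suddenly become darker are more
    likely to be wood, pixels that become lighter are less likely, and a small
    decay keeps noisy areas from staying activated (constants are illustrative)."""
    d_i = frame.astype(float) - prev_frame.astype(float)    # inter-frame change
    f = np.where(d_i < 0, gain * -d_i, -penalty * d_i) - decay
    return np.clip(prev_mask + f, 0.0, 1.0)                 # keep mask in [0, 1]

def combined_mask(static, dynamic):
    """Assume the two masks are independent and multiply them pixel by pixel."""
    return static * dynamic
```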

Wood object identification and characterization
From the probability mask it is necessary to group pixels with high wood probabilities into objects and then to separate these objects from the background to track them through the image frame. For this purpose, pixels were classified as high- or low-probability based on a threshold applied to the combined probability mask. Then, the high-probability pixels were grouped into connected components (that is, small, contiguous regions of the image) to define the objects. At this stage, a pixel size threshold was applied to the detected objects so that only the bigger objects were considered to represent woody objects on the water surface (Fig 5.a, the large white region in the middle of the frame).
A number of smaller components were often related to non-wood objects, for example waves, reflections, or noise from the camera sensor or data compression.
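A minimal sketch of this grouping step is given below, using SciPy connected-component labelling; the probability threshold and the minimum object size are illustrative values, not the ones used in the software.

```python
import numpy as np
from scipy import ndimage

def extract_objects(prob_mask, prob_threshold=0.5, min_pixels=50):
    """Group high-probability pixels into connected components and keep only
    the components large enough to plausibly be floating wood."""
    binary = prob_mask > prob_threshold                 # high/low classification
    labels, n_components = ndimage.label(binary)        # 4-connected regions by default
    sizes = np.bincount(labels.ravel())                 # pixel count per label (0 = background)
    keep = np.flatnonzero(sizes >= min_pixels)
    keep = keep[keep != 0]                              # drop the background label
    # Pixel coordinates of every retained object
    return [np.argwhere(labels == lab) for lab in keep]
```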
After the size thresholding step, movement direction and velocity were used as filters to distinguish real objects from false detections. The question here is, "is this object moving through the image frame the way we would expect floating wood to move?". To answer it, the spatial and temporal behavior of the components was analyzed. First, to deal with partly immersed objects, we agglomerated multiple components within a frame into a single object if the distance separating them was less than a set threshold. Second, we associated wood objects in successive frames to determine if the motion of a given object was compatible with what is expected from driftwood. This can be assessed with the dimensionless parameter Tp/Δt, which provides a general guideline for the distance an object travels between two consecutive frames (Zhang et al., 2021). Here Tp (passing time) is the time that one piece of wood takes to pass through the camera field of view and Δt is the time between two consecutive frames; in practice, it is recommended to use videos with Tp/Δt > 5 in this software. In our case, tracking wood is rather difficult for classical object tracking approaches in computer vision: the background is very noisy, the acquisition frequency is low, and the appearance of the objects can be highly variable due to temporarily submerged parts and highly variable 3D structures. Given these considerations it was necessary to use very basic rules for this step. The rules are therefore based on loose expectations, in terms of pixel intervals, on the motions of the objects, depending on the camera location and the river properties. How many pixels is the object likely to move between image frames from left to right? How many pixels from top to bottom? How many appearances are required? How many frames can we miss because of temporary immersions? Using these rules, computational costs remained low and the analysis could be run in real time while also providing good performance.
The final step was to characterize each object, which at this point in the process is considered a wood object. Each object appears several times in different frames, and a procedure is needed to either pick a single representative occurrence or use a statistical tool to analyze multiple occurrences to estimate the characterization data. In this step, all images containing the object are transformed from pixel to Cartesian coordinates (as described in the next section) and the median length is calculated and used as the most representative state.
This approach also matched the manual annotation procedure, where we tended to pick the view in which the object covers the largest area to make measurements. For the current paper, every object was characterized from the raw image based on its size and its location. It is worth noting that detection was only possible during daylight.
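The tracking rules described above reduce to a handful of interval checks on the frame-to-frame displacement and on the number of appearances of a candidate object. The sketch below illustrates this logic under assumed pixel bounds and appearance counts; the bounds themselves depend on the camera location, the river properties and the Tp/Δt ratio, and are therefore placeholders.

```python
def centre(box):
    """Centre of an axis-aligned bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return (0.5 * (x0 + x1), 0.5 * (y0 + y1))

def plausible_motion(prev_box, box, dx_range=(5, 80), dy_range=(-20, 20)):
    """Is the displacement of the bounding-box centre between two consecutive
    frames compatible with floating wood (assumed left-to-right drift here)?"""
    (px, py), (cx, cy) = centre(prev_box), centre(box)
    return (dx_range[0] <= cx - px <= dx_range[1]
            and dy_range[0] <= cy - py <= dy_range[1])

def is_wood_track(frame_indices, min_appearances=3, max_missed=2):
    """Keep a track if the object was seen often enough and never disappeared
    (e.g. through temporary submersion) for too many consecutive frames."""
    frames = sorted(frame_indices)
    gaps = [b - a - 1 for a, b in zip(frames, frames[1:])]
    return len(frames) >= min_appearances and (not gaps or max(gaps) <= max_missed)
```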

Image rectification
Warping whole images according to a perspective transform results in an important loss of quality: on warped images, areas farther from the camera provide little detail and are overall very blurry and uninformative. Therefore, rectification was applied to the saved pixel-based characterization of each object, rather than to the full images, to calculate wood length, velocity, and volume. To do so, the fisheye lens distortion was first corrected. Fisheye distortion is a characteristic of the lens that produces visual distortion intended to create a wide panoramic or hemispherical image. This effect was corrected by a standard Matlab process using the Computer Vision Toolbox™ (Release 2017b).
Ground-based cameras also have an oblique angle of view, which means that the pixel-to-meter correspondence is variable and images need to be orthorectified to obtain estimates of object size and velocity in real terms (Muste et al., 2008). Orthorectification refers to the process by which image distortion is removed and the image scale is adjusted to match the actual scale of the water surface. Translating from pixels to Cartesian coordinates required us to assume that our camera follows the pinhole camera model and that the river can be approximated as a plane of constant altitude. Under such conditions, it is possible to translate from pixel coordinates to a metric 2D space with a perspective transform, by assuming a virtual pinhole camera on the image and estimating the position of the camera and its principal point (center of the view). An example of orthorectification of a detected wood piece tracked over a set of consecutive frames is shown in pixel coordinates in Fig 6.a and in metric coordinates in Fig 6.b. The transform matrix is obtained with the help of at least 4 non-colinear points (Fig 6.c, blue GCPs (Ground Control Points) acquired with DGPS) for which we know both the relative 2D metric coordinates for a given water level (Fig 6.b, blue points) and the corresponding localization within the image (Fig 6.a, blue points). To achieve better accuracy, it is advised to acquire additional points and to solve the resulting over-determined system with a Least Squares Regression (LSR). Robust estimators such as RANSAC (Forsyth and Ponce, 2012) can be useful tools to limit the effect of acquisition noise. After identifying the virtual camera position, the perspective transform matrix becomes parameterized by the water level. The variable water level was handled for each piece of wood by measuring the relative height between the camera and the water level at the time of detection, based on information recorded at the gauging station to which the camera was attached. The transformation matrix on the Ain River at the base flow elevation, with the camera as the origin, is shown in Fig 6.d. Straight lines near the edges of the image appear curved because the fisheye distortion has been corrected on this image; conversely, a line that is straight in reality is presented without any curvature in the image.
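As a sketch of this step, the snippet below estimates a perspective transform from four or more GCPs with OpenCV (which can solve the over-determined system robustly) and applies it to detected object coordinates. The GCP values are invented for illustration, the fisheye correction is assumed to have been applied already, and the water-level parameterization described above is omitted for brevity.

```python
import numpy as np
import cv2

# Hypothetical GCPs: pixel positions in the image and the corresponding
# 2D metric coordinates on the water-surface plane (camera as origin).
gcp_pixels = np.array([[120, 450], [640, 430], [700, 180], [90, 200]], dtype=np.float32)
gcp_metric = np.array([[5.0, 10.0], [60.0, 12.0], [70.0, 85.0], [3.0, 80.0]], dtype=np.float32)

# Perspective transform estimated from the GCP correspondences; with more than
# four points the over-determined system is solved robustly (RANSAC here).
H, _ = cv2.findHomography(gcp_pixels, gcp_metric, cv2.RANSAC)

def to_metric(pixel_points, H):
    """Map detected object pixels to metric coordinates on the river plane."""
    pts = np.asarray(pixel_points, dtype=np.float32).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

# Example: length of a wood piece from its two detected end points (in pixels).
ends_m = to_metric([[300, 320], [340, 335]], H)
length_m = float(np.linalg.norm(ends_m[1] - ends_m[0]))
```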

User interface
The software was developed to provide a single environment for the analysis of wood pieces on the surface of the water from streamside videos. It consists of four distinct modules: Detection, Annotation, Training, and Performance. The home screen allows the operator to select any of these modules. From within a module, a menu bar on the left side of the interface allows operators to switch from one module to another.
In the following sections, the operation of each of these modules is described.

Detection module
The detection module is the heart of the software. This module allows, from learned or manually specified parameters, the detection of floating objects without human intervention (see Fig 7.a).

Annotation module

As mentioned in Sec. 2, the detection procedure requires the classification of pixels and objects into wood and non-wood categories. To train and validate the automatic detection process, a ground truth, i.e. a set of videos with manual annotations, is required. Such annotations can be performed using different techniques. For example, objects can be identified with the help of a bounding box or a selection of endpoints, as in MacVicar and Piégay (2012), Ghaffarian et al. (2020) and Zhang et al. (2021). It is also possible to sample wood pixels without specifying instances or objects, or to sample pixels within annotated objects. Finally, objects and/or pixels can be annotated multiple times in a video sequence to increase the amount and detail of information in such an annotation database. This annotation process is time-consuming, so a trade-off must be made regarding the purpose of the annotated database and its required accuracy. Manual annotations are especially important when they are intended to be used within a training procedure, for which different lighting conditions, camera parameters, wood properties, and river hydraulics must be balanced. The rationale for the manual annotations in the current study is presented in section 5.1.
Given that the tool is meant to be as flexible as possible, the annotation module was developed to allow the operator to perform annotations in different ways, depending on the purpose of the study. As shown in Fig 7.b, this module contains three main parts: (i) the column on the far left allows the operator to switch to another module (detection, learning, or performance), (ii) the central part consists of a video player with a configuration tab for extracting the data, and (iii) the right part contains the tools to generate, create, visualize and save annotations. The tools allow rather quick coarse annotation, similar to what was done by MacVicar and Piégay (2012) and Boivin et al. (2015), while still allowing the possibility of finer pixel-scale annotation. The principle of this module is to associate annotations with the frames of a given video. Annotating a piece of wood is like drawing its shape, directly on a frame of the video, using the drawing tools provided by the module. It is possible to add a text description to each annotation. Each annotation is linked to a single frame of the video; however, a frame can contain several annotations. An annotated video, therefore, consists of a video file, as well as a collection of drawings, possibly with textual descriptions, associated with frames. It is possible to link annotations from one frame to another to signify that they belong to the same piece of wood. These data can be used to learn the movement of pieces of wood in the frame.

Performance module
The performance module allows the operator to set rules to compare automatic and manual wood detection results. This module also allows the operator to use a bare, pixel-based annotation or to specify an orthorectification matrix to extract wood-size metrics directly from the output of an automatic detection.
For this module, an automatic detection file is first loaded and then the result of this detection is compared with a manual annotation for that video, if the latter is available. Comparison results are then saved in the form of a summary file (*.csv format), allowing the operator to perform statistical analysis of the results or of the performance of the detection algorithm. A manual annotation file can only be loaded if it is associated with an automatic detection result.
The performance of the detection algorithm can be evaluated at several levels:
• Object. The idea is to annotate one (or more) occurrences of a single object, and to operate the comparison at the bounding box scale. A detected object may comprise a whole sequence of occurrences, over several frames. It is validated as soon as a single occurrence happens to be related to an annotation. This is the minimum possible effort required to obtain an extensive overview of the object frequency in such an annotation database. This approach can, however, lead us to misjudge wrongly detected sequences as True Positives (see below), or vice-versa.
• Occurrence. The idea is to annotate, even roughly, every occurrence of every woody object, so that the comparison can happen between bounding boxes rather than at the pixel level. Every occurrence of any detected object can then be validated individually. This option requires substantially more annotation work than the object-level annotation.
• Pixel. This case implies that every pixel of every occurrence of every object is annotated as wood. It is very powerful for evaluating the algorithm performance, and possibly for refining its parameters with the help of some machine learning technique. However, it requires an extensive annotation effort.

Assessment procedure
To assess the performance of the automatic detection algorithm, we used a set of videos from the Ain River in France that were both comprehensively manually annotated and automatically analyzed. According to the data annotated by the observer, the performance of the software can be affected by different conditions: (i) wood piece length, (ii) distance from the camera, (iii, iv) wood X, Y position, (v) flow discharge, (vi) daylight, and (vii, viii) lightness and darkness of the frame (see Table 2). If, for example, the software detects a 1 cm piece at a distance of 100 m from the camera, there is a high probability that this is a false positive detection.
Therefore, knowing the performance of the software in different conditions, it is possible to develop rules to enhance the quality of the data. The advantage of this approach is that all eight parameters introduced here are easily accessible in the detection process. In this section the monitoring details and annotation methods are introduced before the performance of the software is evaluated by comparing the manual annotations with the automatic detections.

Table 2 summarizes the eight parameters, why each one affects detection, and how each one was measured:
• Length: longer pieces are easier to detect. Measured by transferring coordinates to metric space and calculating length, position, and distance.
• Distance: objects closer to the camera are easier to detect. Measured as for length.
• X, Y position: some particular areas of turbulent flow in the field of view affect detection (e.g., presence of a bridge pier). Measured as for length.
• Discharge: flow discharge affects water color, turbulence and the amount of wood. Taken from recorded water elevation data and the calibrated rating curve at the hydrologic station.
• Time: luminosity of the frames varies with the time of day. Read from the time of day indicated on top of each frame.
• Dark roughness: small spots with sharp (darker) contrast affect detection. Measured as the % of pixels below an intensity threshold.
• Light roughness: small spots with sharp (lighter) contrast affect detection. Measured as the % of pixels above an intensity threshold.

Ghaffarian et al. (2020) and Zhang et al. (2021) show that the wood discharge (m³ per time interval) can be measured from the flux or frequency of wood objects (number of pieces per time interval). An object-level detection was thus sufficient for the larger goal of this research at the Ain River, which is to obtain a complete budget of transported wood volume.
A comparison of annotated with automatic object detections gives rise to three options:
• True Positive (TP): an object is correctly detected and is recorded in both the automatic and the annotated database.
• False Positive (FP): an object is incorrectly detected and is recorded only in the automatic database.
• False Negative (FN): an object is not detected automatically and is recorded only in the annotated database.
Even for overlapping occurrences of wood objects in the two databases, the objects could vary in position and size between them. For the current study we set the TP threshold as the case where either at least 50% of the automatic and annotated bounding box areas were common, or at least 90% of an automatic bounding box area was part of its annotated counterpart.
In addition to the raw counts of TP, FP, and FN, we defined two measures of the performance of the application:
• Recall Rate (RR) is the fraction of wood objects that are automatically detected (TP/(TP + FN)); and
• Precision Rate (PR) is the fraction of detected objects that are wood (TP/(TP + FP)).
The higher the RR and the PR are, the more accurate our application is. However, the two rates tend to interact. For example, it is possible to design an application that displays a very high RR (which means that it does not miss many objects) but suffers from a very low PR (it outputs a large amount of inaccurate data), and vice-versa. Thus, we have to find a balance that is appropriate to each application.
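A small sketch of the matching rule and of the two rates is given below, assuming axis-aligned bounding boxes in (x0, y0, x1, y1) form; the helper names are illustrative and the 50%/90% overlap rule is interpreted as described in the previous paragraphs.

```python
def area(box):
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)

def overlap(a, b):
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return area((x0, y0, x1, y1))

def is_true_positive(auto_box, annot_box):
    """TP if at least 50% of both boxes is common, or at least 90% of the
    automatic box lies inside its annotated counterpart."""
    inter = overlap(auto_box, annot_box)
    common = inter >= 0.5 * area(auto_box) and inter >= 0.5 * area(annot_box)
    contained = inter >= 0.9 * area(auto_box)
    return common or contained

def recall_rate(tp, fn):
    return tp / (tp + fn)     # RR: fraction of annotated wood that is detected

def precision_rate(tp, fp):
    return tp / (tp + fp)     # PR: fraction of detections that are wood
```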
It was well known from previous manual efforts to characterize wood pieces and develop automated detection tools that certain wood objects are easier to detect than others. In general, the ability to detect wood objects in the dynamic background of a river in flood was found to vary with the size of the wood object, its position in the image frame, the flow discharge, the amount and variability of the light, interference from other moving objects such as spiders, and other weather conditions such as wind and rain. In this section, we describe and define the metrics that were used to understand the variability of the detection algorithm performance.
In general, more light results in better detection. The light condition can vary with a set of factors such as the weather conditions or the amount of sediment carried by the river. In any case, daylight is a factor that changes the light condition systematically, i.e. low light early in the morning (Fig 8.a), bright light at midday with potential for direct light and shadows (Fig 8.b), and low light again in the evening, though different from the morning because the hue is more bluish (Fig 8.c). This effect of the time of day was quantified simply by noting the time of the image, which was marked on the top of each frame of the recorded videos.
Detection is also strongly affected by the frame 'roughness', defined here as the variation in light over small distances in the frame. The change in light is important for the recognition of wood objects, but light roughness can also occur when there is a region with relatively light pixels due to something such as reflection off the surface of the water, and dark roughness can occur when there is a region with relatively dark pixels due to something such as shadows from the surface water waves or surrounding vegetation. Detecting wood is typically more difficult around light roughness, which results in false negatives, while the color map of a darker surface is often close to that of wood, which results in false positives. Both of these conditions can be seen in Fig 8, where they are highlighted in Fig 8.a. In general, the frame roughness increases on windy days or when there is an obstacle in the flow, such as downstream of the bridge pier in the current case. The light roughness was calculated for the current study by defining a light intensity threshold and calculating the fraction of pixels of higher value within the frame. The dark roughness was calculated in the same way, but in this case the pixels below the threshold were counted. In this work, thresholds equal to 0.9 and 0.4 were used for light and dark roughness, respectively.
The oblique view of the camera means that the distance of the wood piece from the camera is another important factor in detection (Fig 8.c). The effect of distance on detection interacts with wood length, i.e. shorter pieces of wood that are detectable near the camera may not be detectable toward the far bank due to the pixel size variation (Ghaffarian et al., 2020). Moreover, if a piece of wood passes through a region with high roughness (Fig 8.c) or amongst bushes or trees (Fig 8.c, right hand side), it is more likely that the software will be unable to detect it. In our case, one day of video record could not be analyzed due to the presence of a spider that moved around in front of the camera.
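The light and dark roughness metrics defined above amount to two pixel-counting operations per frame; a possible implementation is sketched below, with intensities assumed to be normalised to [0, 1] and the thresholds set to the 0.9 and 0.4 values used in this study.

```python
import numpy as np

def frame_roughness(gray, light_threshold=0.9, dark_threshold=0.4):
    """Fraction of pixels brighter than the light threshold (light roughness)
    and darker than the dark threshold (dark roughness), intensities in [0, 1]."""
    gray = np.asarray(gray, dtype=float)
    light = float(np.mean(gray > light_threshold))
    dark = float(np.mean(gray < dark_threshold))
    return light, dark
```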
Flow discharge is another key variable in wood detection. Increasing flow discharge generally means that water levels are higher, which brings wood near the near bank of the river closer to the camera. This change can make small pieces of wood more visible, but it also reduces the angle between the camera position and the pixels, which makes wood farther from the camera harder to see. High flows also tend to increase surface waves and velocity, which can increase the roughness of the frame and lead to the wood being intermittently submerged or obscured. More suspended sediment is carried during high flows, which can change the water surface color and increase the opacity of the water.

Detection performance
Automatic detection software performance was evaluated based on the event-based TP, FP, and FN raw numbers and the precision (PR) and recall (RR) rates using the default parameters in the software. On average, manual annotation resulted in the detection of approximately twice as many wood pieces as the detection software (Table 3). Measured over all the events, RR = 29%, which indicates that many wood objects were not detected by the software, while among the detected objects about 36% were false detections (PR = 64%). To better understand model performance, we first tested the correlation between the factors identified in the previous section by calculating each one of the eight parameters for all detections as one vector and then calculating the correlation between each pair of parameters (Table 4). As shown by the bold values, the pairs of dark/light roughness, length/distance and discharge/time were highly correlated (corr. = 0.59, 0.46, 0.37, respectively). For this reason, they were considered together to evaluate the performance of the algorithm within a given parameter space. The X/Y positions were also considered as a pair despite a relatively low correlation (0.15) because together they represent the position of an object. As a note, the correlation between time and dark roughness is higher than that between discharge and time, but we used the discharge/time pair because discharge has a good correlation only with time. As recommended by MacVicar and Piégay (2012), wood lengths were grouped using a log base 2 transformation to better compare different classes of floating wood, similar to what is done for sediment sizes.
The post-processing of the detections comprises two parts. In the first part, we present a process to enhance the precision of the software by an a posteriori distinction between TP and FP. After removing FP from the detected pieces, in the second part, we test a process to predict the annotated data that the software missed, i.e., the false negatives.

Precision improvement
To improve the precision of the automatic wood detection, we first ran the software to detect pieces and extracted the eight key parameters for each piece as described in section 5.1. Having the value of the eight key parameters (four pairs of parameters in Fig 9) for each piece of wood, we then estimated the total precision of each object as the average of the four precisions from the corresponding sub-figures of Fig 9. In the current study the detected piece was considered to be a true positive if the total precision exceeded 50%. To check the validity of this process, we used cross-validation by leaving one day out, calculating the precision matrices based on the five other days, and applying the calculated PR matrices to the day that was left out. As seen in Table 5, this post-processing step increases the precision of the software to 85%, an enhancement of 21%. The degree to which the precision is improved depends on the day left out for cross-validation. If, for example, the day left out had conditions similar to the mean, the PR matrices were well trained and were able to distinguish between TP and FP (e.g., 2nd Jan with a 42% enhancement). On the other hand, if we have an event with new characteristics (e.g., very dark and cloudy weather, or discharges different from what we have in our database), the PR matrices were relatively blind and offered little improvement (e.g., 15th Dec with a 10% enhancement).
¹ FNpp denotes the false estimations of the precision matrices, which result in missing some TP.
² RRpp denotes the recall rate of the post-processing, which corresponds to FNpp.
One difficulty with the post-processing reclassification of wood pieces is that this new step can also introduce error by classifying real objects as false positives (making them false negatives), or vice-versa. Using the training data, we were able to quantify this error and categorized these pieces as post-processed false negatives (FNpp) with an associated recall rate (RRpp). As shown in Table 5, the precision enhancement process lost only around 14% of the TP (RRpp = 86%).
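The reclassification step can be summarized as averaging, for each detection, the four precision values read from the correction matrices of Fig 9 and keeping the detection only if the average exceeds 50%. The sketch below assumes that each correction matrix is wrapped in a lookup callable; how those lookups are built (binning, interpolation, leave-one-day-out training) is not shown.

```python
def posterior_precision(detection, pr_lookups):
    """pr_lookups maps each parameter pair to a callable that returns the
    precision read from the corresponding correction matrix (Fig 9)."""
    pairs = ("length_distance", "xy_position", "discharge_time", "roughness")
    return sum(pr_lookups[p](detection) for p in pairs) / len(pairs)

def keep_as_true_positive(detection, pr_lookups, threshold=0.5):
    """Keep detections whose estimated total precision exceeds 50% as TP;
    the rest are discarded as likely false positives."""
    return posterior_precision(detection, pr_lookups) > threshold
```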

Estimating missed wood pieces based on the recall rate
The automated software detected 29% of the number of manually annotated wood pieces (Table 5). In the previous section, methods were described that enhance the precision of the software by ensuring that the automatically detected pieces are TP. The larger question, however, is how to estimate the missing pieces. Based on Fig 9, both PR and RR are much higher for very large objects in most areas of the image and in most lighting conditions. However, the smaller pieces were found to be harder to detect, making the wood length the most important factor governing the recall rate. Based on this idea, the final step in the post-processing is to estimate the number of smaller wood pieces that were not detected by the software using the length distribution extracted from the annotations.
The estimation is based on the concept of a threshold piece length. Above the threshold, wood pieces are likely to be accurately counted by the automatic software. Below the threshold, on the other hand, the automatic detection software is likely to deviate from the manual counts. The length distribution obtained from the manual annotations (TP + FN) (Fig 10.a) was assumed to be the most realistic distribution that can be estimated from the video monitoring technique, and it was therefore used as the benchmark. Also shown are the raw results of the automatic detection software (TP + FP) and the raw results with the false positives removed (TP). At this stage, the difference between the TP and the TP + FN lines corresponds to the false negatives (FN) that the software missed. Comparison between the two lines shows that they tend to deviate between 2 and 3 m. The correlation coefficient between the length distribution of TP as one vector and TP + FN as the other vector was calculated for thresholds varying from 1 cm to 15 m, and a length of 2.5 m was defined as the optimum threshold for recall estimation (Fig 10.b).
In the next step we wanted to estimate the pieces shorter than 2.5 m that the software missed. During the automatic detection process, when the software detects a piece of wood, the RR can be calculated for this piece according to Fig 9 (third column) (same protocol as the precision enhancement in Sect 5.3.1). Therefore, if for example the average recall rate for a piece of wood is 50%, there is likely to be another piece in the same conditions (defined by the eight parameters described in Table 2) that the software could not detect.
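One possible way to reproduce the threshold search is sketched below: for each candidate threshold, the length distributions of the true positives and of the annotated pieces above that threshold are binned and compared through their correlation coefficient, and the threshold with the best agreement is retained. This is one interpretation of the procedure described above; the binning and the candidate range are illustrative, and the study settled on 2.5 m.

```python
import numpy as np

def best_length_threshold(tp_lengths, annotated_lengths, candidates, bins):
    """Scan candidate thresholds and return the one for which the length
    distributions of detected (TP) and annotated pieces agree best."""
    best, best_r = None, -np.inf
    for t in candidates:
        h_tp, _ = np.histogram([l for l in tp_lengths if l >= t], bins=bins)
        h_an, _ = np.histogram([l for l in annotated_lengths if l >= t], bins=bins)
        if h_tp.sum() == 0 or h_an.sum() == 0:
            continue                                # nothing left above this threshold
        r = np.corrcoef(h_tp, h_an)[0, 1]           # agreement of the two distributions
        if r > best_r:
            best, best_r = t, r
    return best
```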
To correct for these missed pieces, additional pieces were added to the database; note that these were imaginary pieces inferred from the wood length distribution and were not detected by the software. Fig 10.a shows the length distribution after adding these virtual pieces to the database (blue line, total of 5841 pieces).
The result shows a good agreement between this and the operator annotations (green line, total of 6249 pieces), which results in a relative error of only 6.5% in the total number of wood pieces.
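A minimal sketch of the virtual-piece inference follows: for every detected piece shorter than the threshold length, the recall rate estimated from the correction matrices (Fig 9, third column) implies an expected number of similar pieces that were missed, and that many virtual pieces are appended to the database. The rr_lookup callable and the detection dictionaries are assumptions of this sketch.

```python
def add_virtual_pieces(detections, rr_lookup, length_threshold=2.5):
    """For each detected piece shorter than the threshold, add the expected
    number of undetected twins implied by its recall rate (RR = 50% implies
    one additional unseen piece under the same conditions)."""
    virtual = []
    for det in detections:
        if det["length_m"] >= length_threshold:
            continue
        rr = max(rr_lookup(det), 1e-3)           # recall rate from the Fig 9 matrices
        n_missed = int(round(1.0 / rr - 1.0))    # expected number of missed pieces
        virtual.extend(dict(det, virtual=True) for _ in range(n_missed))
    return list(detections) + virtual
```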

Conclusion
Here, we present new software for the automatic detection of wood pieces on the river surface. After presenting the corresponding algorithm and the user interface, an example of automatic detection was presented. We annotated 6 days of flood events that were used to first check the performance of the software and then develop post-processing steps to both remove possibly erroneous data and model data that were possibly missed by the software. To evaluate the performance of the software, we used precision and recall rates. The automatic detection software detects around one third of all annotated wood pieces with a 64% precision rate. Then, using the operator annotations as the benchmark, post-processing was applied to extrapolate the data extracted from the detection results, aiming to come as close as possible to the annotations. It is shown that, using four pairs of key factors, (i) light and dark roughness of the frame, (ii) daytime and flow discharge, (iii) X, Y coordinates of the detection position, and (iv) distance of detection as a function of piece length, it is possible to detect false positives and increase the software precision to 86%. Using the concept of a threshold piece length for detection, it is shown that it is then possible to model the missed wood pieces (false negatives). In the presented results, the final recall rate corresponds to a relative error of only 6.5% for piece number and 13.5% for wood volume. It should be noted that the software cannot distinguish between a single piece of wood and the pieces in a cluster of wood during congested wood fluxes.
This work shows the feasibility of the detection software to detect wood pieces automatically. Automation will significantly reduce the time and expertise required for manual annotation, making video monitoring a powerful tool for researchers and river managers to quantify the amount of wood in rivers. Therefore, the developed algorithm can be used to characterize wood pieces for a large image database at the study site. The results from the current study were all taken from a single site at which a large database of manual annotations was available for developing the correction procedures. In future applications it is unlikely that such a large database would be available. In such cases it is recommended to first ensure that the images collected are of high quality by following the recommendations of Ghaffarian et al. (2020) and Zhang et al. (2021). As data are collected, the automatic algorithm can be run to identify periods of high wood flux. Manual review of other high-water periods is also recommended to assess whether lighting conditions were preventing the detection of wood. When suitable flood periods with floating wood are identified, manual annotations should be done to create the correction matrices. Future applications of this approach at a wide range of sites should lead to new insights on the variability of wood pieces at the reach and watershed scales in world rivers.
Finally, we think of this work as a first step towards more autonomous systems to detect and quantify wood in rivers. Applying the post-processing steps in real time is a realistic next step because, once the correction matrices have been extracted, which is a time-consuming process, the calculation time for the PR and RR enhancement is negligible (less than 0.001 s per piece). Moreover, over recent years, automatic visual recognition tasks have progressed substantially with the advances in machine learning techniques, and especially Deep Convolutional Neural Networks (DCNNs), which are now able to answer complex problems in real time. However, our context is very challenging for this class of solution, since wood objects have a highly variable shape and they appear in very noisy environments under a wide variety of lighting conditions. Most training techniques are supervised, meaning that to train an effective DCNN to solve this problem, we would require an extensive annotated dataset. The software presented in this work can be used as a first step towards that goal: it can help human operators to quickly build an annotated dataset by correcting its output rather than annotating from scratch.

Fig 1 Study site at Pont de Chazey: a) location of the Ain River catchment in France and of the gauging station, b) camera position and its viewing angle in yellow, c) overview of the gauging station with the camera installation point, d) view of the river channel from the camera, and e) daily mean discharge series for the monitoring periods of 1-7 January and 15 December.
The specific steps followed by the algorithm are shown in a simple flow chart (Fig 2.a). An example image with a wood piece in the middle of the frame is also shown for reference (Fig 2.b).

Fig 2 a) Flowchart of the detection software and b) an example frame to which the different flowchart steps are applied.

Fig 3 Static probability mask: a) Gaussian distribution of the light intensity range for a piece of wood, b) application of the probability mask to the sample frame.

Fig 4 Dynamic probability mask: a) updating function f(ΔI, ·) adapted from Ali et al. (2011) and b) application of the probability mask to the sample frame.

Fig 5 a) Object extraction by (i) combining static and dynamic masks and (ii) applying a threshold to retain only high-probability pixels. b) Object tracking as a filter to deal with partly immersed objects and to distinguish moving objects from static waves.

Fig 6 Image rectification process: a) the non-colinear GCPs localized within the image and b) the relative 2D metric coordinates for a given water level; the different solid lines represent the successive detections of a wood piece in a set of consecutive frames. c) 3D view of the non-colinear GCPs in metric coordinates. d) Rectifying transformation matrix on the Ain River at low flow level with the camera at (0,0,0).

Fig 7 User interface of (a) the detection module and (b) the annotation module of the automatic detection software.


Fig 8 Different light conditions during a) morning, b) noon and c) late afternoon result in different frame roughnesses and different detection performances. Panel c) also shows how wood position can strongly affect the quality of detection: pieces passing in front of the camera are detected much better than pieces far from the camera.

Fig 9 Correction matrices: a, b, c) wood length as a function of the distance from the camera, d, e, f) detection position, g, h, i) flow discharge during the daytime, and j, k, l) light and dark roughness. The first column shows the number of all annotated pieces; the second and third columns show the Precision and Recall rates of the software, respectively.

Fig 10 a) Steps to post-process the software's automatic detections: (i) raw detections (TP + FP, red line), (ii) only true positives after the PR improvement process (TP, blue dashed line), and (iii) modeled false negatives added (blue line). The operator annotation (green dotted line) is used as a benchmark. b) The correlation coefficient between the operator annotations and the true positive detections for threshold lengths varying from 1 cm to 15 m.