Comment on esurf-2021-102

This paper presents an intriguing and likely novel data set, with multiple repeated high-resolution scans of a vegetated floodplain using numerous different cutting-edge techniques to assess the vegetation structure and therefore roughness. Vegetation classifications near a highly mobile river reach are performed using machine-learning techniques that leverage modern algorithms and computing power. However, despite the numerous data sets presented here, the manuscript does not yet sufficiently justify how it represents a substantial contribution to scientific progress.

Earth Surf. Dynam. Discuss., https://doi.org/10.5194/esurf-2021-102-RC2, 2022
Most obviously, the paper claims to be a 4D (3D space & time) comparison, but it falls short of this intent. For one thing, remote sensing data are processed to provide static 2D (rather than time-varying 3D) maps of vegetation guild coverage (e.g., Figure 4A). For another, although geomorphic channel change is characterized, it does not appear that temporal changes in vegetation are considered. There are undoubtedly changes in vegetation phenology (flowers vs. no flowers, leaves vs. no leaves) and morphology (herbaceous shoots vs. dry stalks) over time in vegetated regions, as well as growth of new plants and shoots, but this is not featured; instead, discussions of change over time focus on unvegetated fluvial regions. In fact, it is unclear to what extent 2D maps, let alone the location and characteristics of individual plants, are consistent from one time to another. Given the focus of the paper on changes in vegetated regions, it is a major oversight to omit a detailed discussion of differences (due to changes or uncertainty or both) between repeat scans in regions that remained vegetated (no avulsions etc.).
Second, the manuscript examines two drastically different spatial/temporal scales of interest, with only loose connections between them. One scale is the decadal scale of channel change and avulsion (Sections 3.1, 4.1); the other is the seasonal/annual scale of individual plant growth and characterization (Sections 3.2, 4.2). Given the extensive discussion of the hydrodynamic impacts of vegetation presented in the introduction, as well as the highly resolved tree-level observations possible with the remote sensing detail, the manual classification of the floodplain into only two vegetation classes (large vegetation vs. not large vegetation) for erosion assessment is overly simplistic. The spatially explicit locations of erosion and new channels during the study year are presented (Sections 3.3, 4.3), but these locations are compared only anecdotally in Figure 11 to the various types of vegetation that were identified throughout the study reach. Without some attempt to quantitatively tie these various types of data together, the paper lacks cohesion and misses its opportunity to evaluate both the geomorphic importance of its classification scheme and the controls on channel change.
Third, although the focus of much of the article is on traits-based classification of vegetation, no validation data are presented for this site or even these species. Without some sort of independent assessment (ideally from field observations), it is difficult to know to what extent the categorization presented herein is appropriate. Previous studies (e.g., Butterfield et al. 2020) have included ground truthing. The algorithms that were used were developed for different species in different ecological settings (e.g., Scots pines in Finland, beech and oak in the Netherlands), so their validity at this site is difficult to assess, especially for application to non-woody grasses and herbaceous plants. An error/misclassification analysis based on field data (which may have been obtained; cf. Line 407) would greatly enhance the vegetation classification portion of the study.
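
To illustrate the kind of error/misclassification analysis meant here, the following is a minimal sketch, not a prescription of the authors' workflow. It assumes that each field-identified plant or plot can be paired with the class assigned by the remote-sensing workflow; the function names and the class labels in the usage note are hypothetical and are not taken from the manuscript.

```python
from collections import Counter


def confusion_matrix(field_labels, predicted_labels):
    """Count (field class, predicted class) pairs for co-located samples."""
    if len(field_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not field_labels:
        raise ValueError("at least one validation sample is required")
    return Counter(zip(field_labels, predicted_labels))


def accuracy_summary(field_labels, predicted_labels):
    """Return overall accuracy plus per-class producer's and user's accuracy.

    Producer's accuracy: fraction of field samples of a class that were
    classified correctly. User's accuracy: fraction of samples assigned to
    a class that truly belong to it. None marks a class with no samples.
    """
    counts = confusion_matrix(field_labels, predicted_labels)
    classes = sorted(set(field_labels) | set(predicted_labels))
    total = sum(counts.values())
    overall = sum(counts[(c, c)] for c in classes) / total
    producers, users = {}, {}
    for c in classes:
        n_field = sum(counts[(c, p)] for p in classes)  # row total
        n_pred = sum(counts[(f, c)] for f in classes)   # column total
        producers[c] = counts[(c, c)] / n_field if n_field else None
        users[c] = counts[(c, c)] / n_pred if n_pred else None
    return overall, producers, users
```

With hypothetical field labels `["tree", "tree", "shrub", "herb", "herb", "herb"]` and predicted labels `["tree", "shrub", "shrub", "herb", "herb", "tree"]`, four of six samples agree. Reporting the per-class values alongside the overall figure would show which vegetation types drive the misclassification.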

Specific Comments
Line 42ff: The introduction focuses on the classification of vegetation into a relatively new framework developed to characterize eco-geomorphic relationships. Though this is an intriguing question, this narrow focus likely represents a missed opportunity to provide enough detail that ecologists and biologists could appreciate and use the results. An expansion of the idea of "traits-based classifications" to include other ecological goals may make this paper much more useful to a broader group of readers.
Line 55ff: The section on "The importance of vegetation" is focused exclusively on the role of aboveground vegetation in affecting river corridors. Surely the roots (belowground portions) are important as well. Although these portions obviously cannot easily be measured by remote sensing, their known contributions should at least be summarized.
Line 164ff: An important aspect of vegetation reconfiguration and drag is whether the stem is woody or not. A discussion of this aspect (and relevant citations) should be added to this section on functional traits.
Line 170ff: To complement the extensive discussion of the impact of vegetation on hydrodynamics, the background information on the impact of vegetation on scour should be increased, especially at the scale of the bar or channel, which is what is measured in this study. This crucial paragraph does not contain any in-text citations, despite a wealth of experimental and field studies on the topic. This paragraph should be expanded and should include specific citations to previous studies.
Line 176ff: The subsection titled "Remote Sensing of River Corridor Vegetation" is quite short and does not do justice to previous attempts to remotely sense vegetation that may be present in river corridors. A key omission is a description of previous efforts to use UAVs and TLS for remote sensing of vegetated regions, especially their methods (i.e., indices/proxies used, ground-control points, SfM, etc.) and successes and failures. Here are a few papers that might be relevant:
Line 246: Explain in text exactly how a mix of spectral bands was used to highlight the channel position of banks under trees, or provide a citation for this method.
Line 254: Specify in text whether the same centerline/transects were used for each digitized year or whether these changed position each time, and, if the latter, how this horizontal change affected assessments of channel width.
Line 255: Specify whether the SCE was calculated separately for the left and right bank.
Line 260: Specify in text whether the woodland areas needed to be near the channel. Also clarify whether there were changes in the distribution of large vegetation over time and, if so, how that affected the classification: i.e., if vegetation grew in a region over time, was it classified as vegetated, or not, or did its classification change over time? Somewhere (Figure 1? Figure 4? Figure 8?) a map of these classifications should be shown.
Line 263ff: "the analysis was repeated for changes…" The rest of this paragraph is unclear. Be specific about what happened: what does removing a transect mean, or using a separate baseline? What does baseline mean in this context? Without making this point clear, the assessment that "the impact on the results from channel switching can be isolated and removed" is not supported.
Line 268: Specify what statistical comparison was used. Either a t-test or a nonparametric method should be used to evaluate differences between groups.
Line 310ff: Specify how/whether analysis was performed for each of the flights shown in Table 1. Were data sets projected to a common reference frame/grid, or did they differ? Was each of the five identified steps performed independently for each data type (UAV-LS vs. UAV-MS vs. TLS), or did some steps involve the comparison of multiple data types? Were repeat scans of the same area processed completely independently, or (for example) was the spatial location of a vegetation point cloud (i.e., specific plant) identified at one time used to identify a point cloud location at a different time? Did all analyses require classification into individual plants, or were some vegetation types best classified using bulk metrics (canopy height, density, etc.)? Answers to these basic questions are important for interpretation of the rest of Section 3.3.
Line 323: "leaves and flowering parts were removed from the clouds…" How were these items identified, and was it performed only for TLS or all studies?
Line 325: "Any statistical outliers were detected, removing points 2.5 standard deviations and above the mean distance between points...." How were distances between points calculated, and what does it mean to remove a point that is above a mean distance?
Line 326: "…a dataset consisting of 37 herbaceous plants." There were presumably many more than 37 herbaceous plants within the study site. How were these 37 selected? Was it the same plants for all repeat studies, or did they change over time?
Lines 341-342: "Shrubs and grasses who structure could not be fully resolved from the UAV-LS or TLS data were not analyzed for traits extraction." This seems like a major hole in the current analysis, which set out to characterize all types of vegetation.
Line 394ff: It is unclear for which/how many plants/scans the PCA analyses were performed, and whether these were the same among different methods (TLS vs. UAV-LS vs. UAV-MS). Clarify in text.
Line 407: "field observations": explain how and when these were performed.
Line 445: "Due to the limited number of samples being used, …" An error analysis is important. If not enough samples were used to enable even an internal consistency check, then the number of samples should be increased.
Line 464ff: Explain how SfM and UAV-LS data sets were combined. Was a single DEM produced for each observation date? Etc.
Line 482 Table 3: Provide statistical significance for differences between each classification. Remove bonus "s" from caption. Specify units for data shown in table.
Line 495 Figure 4: In Panels B and C, bars should show some sort of uncertainty, stemming perhaps from the horizontal accuracy of transects or bank determination.
Lines 518-519: "Overall, model repeats appear to have good agreement with one another, and provide a basis for separating out vegetation with similar hydraulic functional traits." Do these model repeats refer to repeated classification of the same image, or comparison between images? Add this information to methods, and also explain here.
Line 521 Table 4: If six vegetation classes were used, then this table should show all six vegetation classes, as well as a statistical analysis of whether values are the same among different classes. In caption, specify meaning of all initialisms.
Line 556: In text, explain whether any attempts were made to characterize the understory vegetation. Unclear as written.
Line 569: "….many areas being classified as expected." On what basis were these expectations made or assessed?
Line 590 Figure 8: For which time period was this classification produced? If only produced once for the entire period of study, then how much change was observed during the study?
Line 633ff: "It is not possible to isolate a single variable that may cause such switches to take place, such as particular flow thresholds, baseline conditions, vegetation, or soil characteristics." It does not appear that any detailed, let alone quantitative, analysis of any of these factors was performed; without that analysis, it does not make sense to comment that no such factor was identified.
Line 654: "…especially once trade-offs in terms of time and spatial extent are accounted for." This is an intriguing idea; would be nice to see it expanded.
Lines 739-741: "The largest areas of change appear to be within the reaches absent of large vegetation, with the stable patches aligning well with those identified in the decadal analysis." First, as noted above, the polygons showing the spatial location of large vegetation are not shown anywhere in the manuscript. Second, comparing Figures 4A and 8, it appears that the downstream mobile bend was located within a reach with large vegetation. Be more specific (and ideally more quantitative) about how the results were used to reach this conclusion.
Line 758 Figure 11: Remove erosion/deposition scale bar from figure since apparently not used. In caption, explain how vegetation stability was assessed.
Lines 765-766: "…there is clear evidence of light green patches where dark green patches may be expected had the vegetations [sic; should be vegetation's] stabilizing effect not been present." This is an intriguing idea, but no details are provided for why dark green patches might be expected in these regions. Explain why this is reasonable.
Line 783ff: Acquiring and processing UAV or TLS data represents a major investment in equipment and technician training. The presented data set (multiple repeat flyovers with different techniques) is much more detailed than what would be available for most (all?) other sites. Therefore, it would be extremely helpful to have the authors leverage the current data to assess best practices and the minimum data collection needed in other settings. For example, if only one UAV overpass were possible, at what time of year should it be flown, and using which technique (LS, MS, or RGB)? Would the answer depend on the type of vegetation characterized and, if so, how? This is a huge missed opportunity for this data set.
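
Regarding the comment on Line 268 above, one possible nonparametric comparison is a two-sided permutation test on the difference in group means. The following is only a minimal sketch under stated assumptions: it presumes one erosion value per transect and two groups (with and without large vegetation), and the function name, default parameters, and the example values in the usage note are hypothetical, not the authors' data or method.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10000, seed=0):
    """Two-sided permutation test on the absolute difference in group means.

    Returns the observed absolute difference and an estimated p-value.
    Group labels are shuffled at random; the p-value is the share of
    shuffles whose difference is at least as large as the observed one.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one value")
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

As a usage example, `permutation_test([10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6])` compares two hypothetical sets of transect values. A permutation test makes no normality assumption, which may suit skewed erosion data; a t-test or a rank-based test would be equally acceptable if their assumptions are checked and stated.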

Technical Comments
The text is generally well written and clear, though there are several glaring exceptions.
Line 180 (and elsewhere in text): The use of nested parentheses is odd and confusing. Consider using a semicolon within a single set of parentheses.
Line 218: The word "exemplar" suggests that the study site is somehow better than other sites. Either explain in text why it is so outstanding and uniquely qualified for this type of study, or (if you instead want to suggest that the same methods could work elsewhere) then use a different term.
Line 226: The phrase "starting from the earliest gauge record" is confusing and unclear. Rewrite. Would also be nice to specify that the gauge period of record was from 2002 to 2021.
Line 229 Figure 1: Label each subpanel (there are at least 4) with a letter for easy reference in text. In the map, delineate the area used for the decadal analysis and shown in Figure 4A. Increase font size of all text in the discharge plot. Make sure that the exceedance level for 1.48 m in legend is consistent with text (is it 99% or 99.9% exceedance?).
Line 270: "To investigate the morphological process of avulsions…" The rest of this paragraph has a totally different topic than the first two sentences; move to a separate paragraph. In addition, this section discusses UAV flood imagery, which has not yet been presented; it would be helpful to move this section until after the UAV images have been discussed.
Line 277: It appears from Table 1 that one TLS survey was used. Change "TLS surveys" to be singular.
Line 298: Spell out abbreviation GCP.
Line 301, 318, etc.: Make sure that the entire methods section is in the past tense to describe what you did.