Short Communication: Motivation for standardizing and normalizing inter-model comparison of computational landscape evolution models
Nicole M. Gasparini
Katherine R. Barnhart
Adam M. Forte
Abstract. This manuscript is a call to the landscape evolution modeling community to develop benchmarks for model comparison. We illustrate the use of benchmarks in illuminating the strengths and weaknesses of different landscape evolution models (LEMs) that use the stream power process equation (SPPE) to evolve fluvial landscapes. Our examples compare three different modeling environments—CHILD, Landlab, and TTLEM—that apply three different numerical algorithms on three different grid types. We present different methods for comparing the morphology of steady-state and transient landscapes, as well as the time to steady state. We illustrate the impact of time step on model behavior. There are numerous scenarios and model variables that we do not explore, such as model sensitivity to grid resolution and boundary conditions, or processes beyond fluvial incision as modeled by the SPPE. Our examples are not meant to be exhaustive. Rather, they illustrate a subset of best practices and practices that should be avoided. We argue that all LEMs should be tested in systematic ways that illustrate the domain of applicability for each model. A community effort beyond this study is required to develop model scenarios and benchmarks across different types of LEMs.
Status: final response (author comments only)
CC1: 'Comment on esurf-2023-17', John Armitage, 21 Jun 2023
I have a short comment on this short manuscript:
While the idea of a community benchmark for landscape evolution models is a worthy goal, I think this manuscript rather demonstrates some straightforward numerical modelling principles, e.g., (1) make sure your time step is small enough to have numerically accurate results, (2) take care with implicit algorithms, as even though they are numerically stable you still need to keep the time step sufficiently small to get an accurate result, and (3) the results of a model that uses the stream power law and calculates upstream area using a numerical estimate that is a function of the discretization of model space (grid resolution and mesh type: triangular, Voronoi, regular) will vary in where erosion localises and in the eventual drainage patterns.
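As a concrete illustration of point (1): for detachment-limited stream power, E = K A^m S^n, with n = 1, the erosion signal migrates upstream with celerity c = K A^m, so an explicit scheme needs dt <= dx / max(c). The sketch below (plain Python, not taken from CHILD, Landlab, or TTLEM; K, m, and the drainage areas are illustrative placeholders) shows such a Courant-style check:

```python
import numpy as np

# Minimal sketch of a Courant-style time-step check for detachment-limited
# stream power, E = K * A**m * S**n, with n = 1, where the erosion wave
# celerity along the flow path is c = K * A**m.
# K, m, and the drainage areas below are illustrative placeholders.

def max_stable_dt(drainage_area, dx, K=1e-5, m=0.5, courant=1.0):
    """Largest explicit time step satisfying dt <= courant * dx / max(c)."""
    celerity = K * drainage_area**m          # kinematic-wave celerity (n = 1)
    return courant * dx / np.max(celerity)

# Example: 100 m grid spacing, drainage areas between 1e4 and 1e8 m^2
areas = np.logspace(4, 8, 50)                # m^2
print(f"Largest stable explicit time step: {max_stable_dt(areas, dx=100.0):.0f} years")
```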
I have a strong suspicion the models might localise on numerical errors. A good test would be to run the codes on different CPUs, or on a different number of CPUs, and see if the results are identical. If they are not, then there is a problem with them.
I agree completely with the aims of this manuscript, but I think there is a more fundamental problem that we must overcome first: if model results are a function of numerical choices, such as grid resolution, then this must be resolved first. Once the models are demonstrated to converge, then it is time to explore natural landscapes or laboratory experiments that can be used for a benchmark to explore how the mathematical models give different answers.
I think this manuscript would benefit from careful consideration of the brief model comparison, to decide which parts can be expanded into some future benchmark and which aspects are more a demonstration of how best to verify that your model is robust before exploring a hypothesis.
I hope this comment is useful. If it isn't then please ignore it.
John Armitage, IFP Energies Nouvelles
Citation: https://doi.org/10.5194/esurf-2023-17-CC1
RC1: 'Comment on esurf-2023-17', Anonymous Referee #1, 06 Jul 2023
The comment was uploaded in the form of a supplement: https://esurf.copernicus.org/preprints/esurf-2023-17/esurf-2023-17-RC1-supplement.pdf
RC2: 'Comment on esurf-2023-17', Andrew Wickert, 15 Jul 2023
Dear Dr. Schwanghart (editor) and Dr. Gasparini and co-authors;
Thank you for this preliminary and motivational study towards benchmarking of landscape evolution models. I agree wholeheartedly with your statement that we need to have a more formalized and community-organized mechanism to share knowledge and evaluate models and their utility. Indeed, as you pointed out, even this seemingly simple implementation of a stream-power-based erosion rule produced divergent results among computational frameworks that are geomorphic household names.
Unfortunately, my review will be a bit short: this paper is absolutely *not* a short contribution, and I did not have the time budgeted for something equal to or longer than a standard paper. Therefore, as opposed to my usual more exhaustive list of everything from small ideas to typographical or grammatical details, I will just be giving some pointers on the major takeaways. Towards this, my first recommendation would be to reframe the paper (not "short") and check for these small details towards a final draft.
The paper is written with an informal-to-conversational tone (at least, among the distribution of academic articles that I have read), and while this is less standard, it is something I appreciate: Our job is to interpret the past and predict the future. Papers are a way of communicating with one another. We need not (always?) enter into some different form of sharing information to bring this forward, and I feel that this is particularly true for an article such as this, which is really a call to the community.
With this, I leave some comments below.
Andy Wickert
15.07.2023
Potsdam
Major points:
Overall 1: Please carefully proofread the full manuscript for proper word use and grammar.
Overall 2: I believe that you should be able to tighten your argument considerably with a bit of thought. To me, this seems to be: (1) We haven't really done model intercomparisons. (2) Even an ad-hoc attempt with something super simple shows that we get different answers. (3) We should do this to be able to support a larger community with confidence. Is this true? If so, maybe some clarity from the start would be good.
Associated with this, the paper overall reads quite long, and looks like a community effort in which multiple people added ideas and no one has really wanted to muck with each other's writing enough to really tighten it to the core messages within sections and paragraphs. This is especially true outside of the sections that are simply discussing model results. Of course, there is a question of optimization involved here as well: How much time do you want to spend building a really really nice bit of tight text vs. how many other good things can you accomplish with the same time -- and how many people will actually read this paper fully through, beyond citing it for the general motivations? I am not the right person to make this decision, and so simply hand over my thoughts.
Section 6.1 and Figure 2.
Landlab looks like what one would expect from the boundary conditions. By this, I mean that the approximately "X"-shaped ridge set is a stable configuration that maximizes the distance of the high topography from the dropping Dirichlet boundary conditions. On the other hand, TTLEM shows a more creative distribution of basins that does not satisfy this basic expectation. Although I recognize that your goal is to motivate intercomparisons, the TTLEM result defies my (I think, correct) expectations so thoroughly that the discussion surrounding it feels a bit lacking. Within your scope, I think spending a bit of space addressing this could relate to just this question: when models don't match intuition for the shape of the solution. In this case, could there be something semi-hidden that only such detailed studies are likely to pick out and help us fix?
I do wonder about your ideas with sink-filling and drainage routing: Might it be possible that TTLEM has greater sensitivity to initial conditions? Maybe this could be tested without too much extra work by repeating the runs with a different randomized topography -- or even more directly, by rotating or mirroring the current initial topography. If Landlab remains visibly the same and TTLEM changes, then the initial conditions seem to be what is causing the change. This could be an important finding (and again, motivation for your study), considering that the impact of initial conditions is a big question in landscape evolution. [See, e.g., Kwang & Parker, 2019; Perron & Fagherazzi, 2012]
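One minimal, hypothetical way to set up the suggested test (plain NumPy, not specific to any of the three codes; the grid size and random seed are arbitrary choices) is to build a single seeded random initial surface plus rotated and mirrored copies of it, and then start each model from all three:

```python
import numpy as np

# Hypothetical sketch of the suggested sensitivity test: one seeded random
# initial surface, plus rotated and mirrored copies with identical statistics.
# Starting each model from all three and comparing the resulting drainage
# patterns would show how sensitive each code is to the arbitrary details
# of the initial condition.
rng = np.random.default_rng(seed=42)
z0 = rng.uniform(0.0, 1.0, size=(200, 200))   # initial topography (m)

initial_conditions = {
    "original": z0,
    "rotated_90": np.rot90(z0),               # same noise, new layout
    "mirrored": np.fliplr(z0),
}
# Each array in `initial_conditions` would seed a separate model run.
```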
Regardless of your method to address these questions, and your decisions about the relevance of this compared to your larger point (we need model intercomparison!), I wanted to bring them up.
Section 6.2.
I am somewhat less convinced that this section is either entirely so useful (for the forward-difference case) or that it is really fundamentally a model-intercomparison question (well, yes, it is, strictly, and yet...).
Taking time steps that are too large and thereby breaking the forward-difference method is a fundamental known within numerical methods and feels useful primarily to maintain symmetry with tests of the implicit method.
Showing the reduced accuracy in an implicit method is also well established within numerical methods, but perhaps could be more useful for a geomorphological community with an uneven background in applied mathematics and computation. ("I can run this on my 1997 laptop! Time steps of one BILLION years, and the solution is for sure at steady state!") It is easy to become enthused with implicit-method stability, and this acts as a caution.
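To make that caution concrete, here is a minimal sketch (a single node draining to a fixed baselevel at elevation zero over distance dx, with n = 1, so the stream power equation reduces to dz/dt = -a z with a = K A^m / dx and exact solution z0 exp(-a t); all parameter values are placeholders, and this is not code from any of the three models) comparing one explicit and one implicit step against the exact answer:

```python
import numpy as np

# One node at elevation z0 draining to a fixed baselevel at 0 over distance dx,
# detachment-limited stream power with n = 1: dz/dt = -a*z, a = K*A**m/dx.
K, m, A, dx, z0 = 1e-5, 0.5, 1e7, 100.0, 50.0
a = K * A**m / dx                     # decay rate (1/yr)

for dt in (1e3, 1e4, 1e5, 1e6):       # progressively larger time steps (yr)
    exact = z0 * np.exp(-a * dt)      # analytical solution after one step
    explicit = z0 * (1.0 - a * dt)    # forward Euler: goes unstable once a*dt > 2
    implicit = z0 / (1.0 + a * dt)    # backward Euler: always bounded, but
                                      # increasingly inaccurate as dt grows
    print(f"dt = {dt:>9.0f} yr   exact = {exact:8.3f}   "
          f"explicit = {explicit:12.3f}   implicit = {implicit:8.3f}")
```

Backward Euler never blows up, but at very large time steps its bounded answer can still sit far from the exact solution, which is exactly the enthusiasm-tempering point.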
However -- and to my first point -- numerical stability and accuracy are themes that you are investigating in your study via these models, but to me, seem like things that should already be properly managed before any intercomparison takes place.
Therefore, I wonder if you might want to adjust the scope of your paper (i.e., so that you also include demonstrations of breaking the Courant condition, as an illustrative example of how different models misbehave differently when pushed beyond reasonable limits) or focus on topics that are more germane to these models rather than to the principles of numerical methods. I suppose that I might go for the former, given that you've already done the work and that it seems useful. Therefore, my very soft recommendation would be for a reframing, with consideration of what it is that you are aiming to test and what is really happening numerically and mathematically, and perhaps, whether a model that is operated in a mode in which it is no longer valid is still a model. (I would say maybe-to-no, but this might be controversial.)
Overall 4:
This paper provides important motivations. It isn't perfect, but it also seems valuable to push real solid science forward. Personally, I have long worried that geomorphic modelers have been all reinventing Step 0 such that we are wildly inefficient at building excellence. I also think that this paper can help to turn our collective low-integer steps into something that is a real service to the community and beyond. And for this (and more), I fully support its intent and think that it contains a message that we collectively need to read and hear.
REFERENCES
Kwang, J. S., & Parker, G. (2019). Extreme memory of initial conditions in numerical landscape evolution models. Geophysical Research Letters, 46(12), 6563-6573.
Perron, J. T., & Fagherazzi, S. (2012). The legacy of initial conditions in landscape evolution. Earth Surface Processes and Landforms, 37(1), 52-63.
Citation: https://doi.org/10.5194/esurf-2023-17-RC2
CC2: 'Comment on esurf-2023-17', Risa Madoff, 22 Jul 2023
I commend Gasparini et al. for initiating an important topic that I hope will have a platform to continue. Starting a discussion with little or no precedent is always difficult. The Short Communication inviting a call for examination and assessment of LEMs through inter-model comparisons raises many issues with underlying considerations that I am very interested in, not as a software developer, but as a geomorphologist who uses numerical modeling. I am less surprised by the results and more curious about the diversity of model types. I have been assuming that a fundamental point of all models was to describe or represent something real, often used to predict future or past patterns, or at least hypothesize about and test them. Given your call to action, which also suggested the possibility of proposals to fund meetings of the minds of “super users” of LEMs to establish benchmarks, I was compelled to wonder about questions that likely fall outside the scope of your communication but which an exploration into model assessment seems to raise. I outline three areas and do so because I am interested in how modeling should be taught to novices and presented to community members who do not model but often rely on models. I came up with a few more items that might be considered as someone who is actually extremely interested in modeling landscape processes: 1) the difference between LEM(ing) as you have defined it and other kinds of numerical modeling, such as empirical modeling, and the possibility, not discussed, of such different models informing each other; 2) teaching, training, and community engagement with regard to LEMs and numerical modeling; and 3) what the community guiding the science of model development looks like, in terms of the variety of approaches to science and what is produced.
1) How might empirical models, or other models, factor into a deterministic LEM(s)? Processes within a landscape operate on multiple temporal and spatial scales and do so from boulder weathering to sediment transport on land and in streams to external drivers of climate and tectonics that also bring variability in time-scale with them. Scale variance is acknowledged as something beyond the scope of the paper, but when a LEM is defined by a particular landscape(s), what does the potential for variance mean for benchmarks you would like to have agreement on? For example, the effect of a transient erodibility factor can depend on the magnitude and frequency of climate transience of a region (Madoff and Putkonen, 2022). Such physical realities and even insights, I would contend, should be used to inform LEMs. This, then, suggests the possibility of model comparison for the purpose of validating how a model approaches the physical world it is generated to represent. Would such comparisons be a part of inter-model discussion? Sure, this might be beyond the scope of your proposal, but when and where should these broader discussions occur? Is it practical to wait until inter-model benchmarks are decided on before venturing into these other discussions that might well inform your proposal? Who would lead these calls to action? Would only the super-users of the kinds of models you refer to lead them? I often think about who is permitted to decide or influence which questions are worth addressing in landscape analyses and therefore which modeling tools are promoted to guide the science of landscape analysis.
2) Perhaps another scenario to consider would be a call for teaching novices a certain mindset: a way of thinking about comparing process models, data sources of process models, and LEMs compared with numerical models based on observed data vs long-term averages or parameters optimized by modeling. Granted, basic coding and modeling skills are needed before someone can engage in a discussion about models and their modeling and care about deep complex questions. But I think there may be some who would be more drawn and motivated to dig more deeply into the modeling process if this multifaceted approach was introduced from the start. Further, are there not fundamental questions that all modeling endeavors should address, such as how they should or could be tested against the diversity of physical landscapes or be explicit as to what they are being used for when they do not?
3) The science of numerical modeling fascinates me, and there is much more about LEM(ing) that deserves attention not typically received. I am glad you started the discussion and I hope to see it developed further, but I approach it from a different starting point than software development. I would like to see a meta-model user group that considers meta-model(ing) questions to inform topics about LEMs, such as: data management, model comparisons and validations, applications, ethics of model development, and pedagogy, and what mathematical considerations need to be factored into the other topics. I do not think I would be included in the super user group you mentioned but believe there is much worthwhile and useful to explore in a meta-modeling group that I might be able to contribute to or at least be informed by.
Madoff, R.D. & Putkonen, J. (2022). Global variations in regional degradation rates since the Last Glacial Maximum mapped through time and space. Quaternary Research, 109, 128-140.
Citation: https://doi.org/10.5194/esurf-2023-17-CC2
AC1: 'Comment on esurf-2023-17', Nicole Gasparini, 24 Aug 2023
Introduction and Summary
We thank the two reviewers - Dr. Wickert and an anonymous reviewer (R1) - and two community members - Dr. Armitage and Dr. Madoff - for their thoughtful comments on our paper. We are encouraged that community members found the topic intriguing enough to participate in the review process without being asked. This suggests general interest for the core ideas we want to share with our community.
Despite this interest, it seems our paper in its current form did not clearly communicate our ideas as we intended. We included topics that some reviewers thought were unnecessary while excluding topics that other reviewers thought were necessary. To extremely briefly summarize how we interpret each reviewer's comments:
R1: Reject the paper for many reasons, including that the paper covers too many topics but not the ones that this reviewer thinks are most important. R1 also argues that what we suggest should become community practice is already LEM community practice (as discussed below, we disagree with R1's assessment of what is already LEM community practice). R1 further suggests that the mispractices we present are not ones currently made in the LEM community.
Dr. Wickert: The paper is too long and yet doesn't give enough explanation of some of the surprising results. Similarly to R1, Dr. Wickert argues that some of what we say should be standard practice is already known by our community. However, Dr. Wickert likes the idea of a call to the community to develop benchmarks, just not our current form of it.
Dr. Armitage: There are many issues with LEMs that need to be discussed. If he were to prioritize issues with LEMs for benchmarking, Dr. Armitage might focus on a different set of issues than we highlighted (platform-specific numerical errors, grid resolution effects). However, Dr. Armitage seems to generally like the idea of a call to the community for clear benchmarks and user protocols.
Dr. Madoff: Although Dr. Madoff focuses on yet another set of issues, different from those that Dr. Armitage or we focused on, Dr. Madoff also seems to support an open discussion on LEM benchmarking. Dr. Madoff went even further, suggesting other topics beyond benchmarking that the LEM community should address.
We find the fact that all the reviewers, in one way or another, liked the idea of community standards to indicate that this is a topic that needs to be discussed. However, our presentation and example choices did not impress any of the reviewers. As described below, our plan for revisions will effectively remove most of the components that both the formal reviewers and community commenters took issue with, and as such, a formal rebuttal of the main points summarized above is not really warranted. Instead, we describe below our plan for a revised submission.
Moving forward on this submission
We accept that we may have tried to do both too much and not enough with this manuscript. Ultimately we would like to write a paper that will compel our community to agree on benchmarks and best practices for developing and using LEMs. However, in trying to motivate that, we have not presented what any of our reviewers think is a motivating case. In some ways that was part of our point - we (the authors) should not decide how to benchmark but the community should. However, we were not successful at getting our point across.
We would like to revise this submission by cutting out most of the manuscript and presenting only the results on time to steady state. This would allow us to resubmit something that better fits the description of a “short communication.” It would also allow us to highlight these results, which we think are extremely important but were likely overlooked in our original submission. Based on discussions with community members outside of this review process, we think that the time to steady state results will be interesting and useful for many who use LEMs. Focusing on these results would allow us to more fully discuss why these results leave some of our previous assumptions in LEM studies on shaky ground. This would also open a discussion on the scope of what LEMs can and cannot do.
Notably, the time-to-steady-state comparison is a concrete example of the need for the type of LEM benchmarking and intercomparison we hoped to motivate with our initial contribution. By focusing the revised contribution on this portion of our initial paper, we expect we will be able to document one example of why a community effort around benchmarking would be valuable.
Our general sense from R1 is that "mistakes" like those we illustrated are not generally made (e.g., use of time steps that are longer than is stable under a Courant condition). We disagree with this, but at the same time, we do not want, or feel it would be productive, to write a paper that catalogs the mistakes made by others in published work as motivation. However, to effectively rebut one of the primary criticisms of R1 would essentially require us to do such a cataloging. By writing a short contribution that is more focused on one specific issue, i.e., the variability in the time to steady state in our experiments, we believe that we can use this as an illustrative example of the types of mistakes that can be made, explain how such mistakes fit in the context of the literature (without criticizing previous work), and use our own "mistake" as motivation for a commentary and call to the community on LEM benchmarking and best use. This new strategy will also allow us to more fully explore the dynamics of the variability in the time to steady state observed in our experiments. As part of this revision, we would also then remove much of the content that R1 especially found superfluous, i.e., the reviewing and definition of terms, etc.
Finally, we would like to thank the editors - Dr. Wolfgang Schwanghart and Dr. Andreas Lang - for helping us navigate the ESURF submission and review process. We recognize their sustained voluntary contributions to ESURF and our community.
Citation: https://doi.org/10.5194/esurf-2023-17-AC1