A Versatile, Linear Complexity Algorithm for Flow Routing in Topographies with Depressions

. We present a new algorithm for solving the common problem of ﬂow trapped in closed depressions within digital elevation models, as encountered in many applications relying on ﬂow routing. Unlike other approaches (e.g., the so-called "Priority-Flood" depression ﬁlling algorithm), this solution is based on the explicit computation of the ﬂow paths both within and across the depressions through the construction of a graph connecting together all adjacent drainage basins. Although this represents many operations, a linear time-complexity can be reached for the whole computation, making it very efﬁcient. 5 Compared to the most optimized solutions proposed so far, we show that this algorithm of ﬂow path enforcement yields the best performance when used in landscape evolution models. Besides its efﬁciency, our proposed method has also the advantage of letting the user choose among different strategies of ﬂow path enforcement within the depressions (i.e., ﬁlling vs. carving). Furthermore, the computed graph of basins is a generic structure that has the potential to be reused for solving other problems as well , such as the simulation of erosion. This sequential algorithm may be helpful for those who need to, e.g., process digital 10 elevation models of moderate size on single computers or run batches of simulations as part of an inference study.


Introduction
Finding flow paths on a topographic surface represented as a Digital Elevation Model (DEM) is a very common task that is required by many applications in domains such as hydrology, geomorphometry, soil erosion and landscape evolution modeling, and for which various algorithms have been proposed either for gridded DEMs (e.g., O'Callaghan and Mark, 1984;Jenson and Domingue, 1988;Quinn et al., 1991;Tarboton, 1997) or unstructured meshes (e.g., Jones et al., 1990;Banninger, 2007).
Closed depressions may arise in DEMs because they are real topographic features or result from interpolation error during DEM generation or its lack of resolution.These spurious local minima need to be resolved because they disrupt flow routing, produce hydrologically unrealistic results or introduce artificial singularities that may result from a sudden, unrealistic jump 1 is usually considered as more realistic, especially under temperate or humid climates.
Although not having a linear time complexity, the most recent algorithms of depression removal -e.g., the "Priority-Flood" algorithm and its variants (Barnes et al., 2014a;Zhou et al., 2016;Wei et al., 2018) -have been optimized so that they can be used efficiently on large datasets.To increase performance for very large datasets, further optimization efforts have been 5 focused primarily on rather complex, parallel variants of these algorithms (Barnes, 2016;Zhou et al., 2017).
Yet, in some applications flow path enforcement still remains the main bottleneck.This is for example the case in many Landscape Evolution Models (LEMs) simulating an evolving topography (see Tucker and Hancock, 2010, for a review) and that rely on flow routing to compute erosion rates.To produce realistic results, flow path enforcement is often applied many times, i.e., at each simulation time step (Figure 1), even when this eventually becomes irrelevant as the modelled erosional processes 10 usually tend to remove depressions rather than deepen or add new ones (Braun and Willett, 2013).Furthermore, LEMs are also used as forward models in sensitivity analyses and/or inferences on the parameters that control erosional processes, which often require running a large number of models to adequately explore the parameter space.Parallel flow routing and hydrological correction algorithms don't help much here, as grid-search and/or sampling methods (e.g., Sambridge, 1999) are generally easier to implement and more effective to execute in parallel.Highly optimized, sequential algorithms are still needed in this case.
We have developed a new method of flow enforcement that is based on the explicit building of a graph of drainage basins (possibly encompassing depressions) and the computation of the flow paths both within inner and across those basins.This idea was first introduced in a Computer Graphics implementation of the Stream Power Law (Cordonnier et al., 2016), but with a sub-optimal complexity.Although this approach may appear naive at first glance, we have improved it by using fast algorithms of linear complexity at each step of the procedure, which now makes the whole computation very efficient.Not only this method enable the use of a wide range of techniques of flow enforcement within the closed depressions (e.g., depression filling, channel carving or more advanced techniques); but it also provides generic data structures that could potentially be reused for solving other problems like modeling the behavior of erosion/deposition processes within those depressions.
After a detailed presentation of the different steps of the method, we will show in the sections below through some results how our algorithm behaves and performs compared to existing solutions of flow path enforcement.We will finally discuss the assets and limitations of our method, with some focus on landscape evolution modeling applications.

Algorithm
The input of the algorithm is a topography T = (N , E), where N is a set of nodes and E is a set of edges that link pairs of neighbor nodes.A node n is given a horizontal position p n and a vertical elevation z n .A topography may for example result from a triangulation or correspond to a regular grid of 4-connectivity (i.e, four neighbors per nodes) or 8-connectivity (i.e., also including diagonal neighbors).We follow the conventions of Braun and Willett (2013) to define flow paths on the topography: each node n is given (1) a single flow receiver, rcv(n), which corresponds to the one of its (strictly) downslope neighbors having the steepest slope, and (2) a set of flow donors, Donors(n), which is a subset of the neighbors of n and is defined as Donors(n) = {k ∈ Nb(n), s.t.rcv(k) = n}.We set rcv(n) = ∅ when n is a singular node: it either corresponds to a user-defined boundary node (e.g., a node on the domain boundary) or a local minimum in the topography, i.e., a node inside the domain where all of its neighbors have a higher elevation, and either correspond to a pit or a flat-bottomed depression in Lindsay (2016) terminology.
We propose an algorithm that updates the receivers of a subset of N such that the flow is never trapped in local minima.This algorithm primarily aims at resolving local minima in the context of flow routing and thus leaves the elevation of the nodes unchanged.Hence it breaks the previously introduced definition of a flow receiver: the new receivers assigned by the algorithm generally produce some localized "upslope flow".While this seems unnatural and may not be wanted, the data structures used by the algorithm provide enough information to efficiently address this issue later depending on the application, which is beyond the scope of this work.Still, the algorithm ensures that the updated flow routing stays consistent across the whole topography by respecting the following properties for each node n of the topography: 1.There exists a single boundary node b (not a local minimum), and a unique flow path from n to b.The flow path is defined as the set of nodes P = (n, rcv(n), rcv(rcv(n)), ..., b).
2. This flow path does not contains any cycle, i.e., each node appears only once in P 3. The receivers defining P are chosen such that it satisfies properties 1 and 2, and minimizes the energy E, defined as: As a first approximation, we set E i = z i (the altitude of the node).We will discuss later the special case of nodes under water level.
Our method is essentially based on the computation of a graph connecting adjacent drainage basins.We define a basin as the set of all nodes that flow toward the same singular node (Figure 2 (b)).A basin is either a boundary basin or an inner basin depending on whether the singular node is a boundary node or a local minimum.To better explain the problem that we want to solve, we consider a filled topography as the result of an ideal physical process where a perfectly fluid material has been poured onto an impermeable ground and stabilized at steady state.For a node n, we define as water level (w n ) the elevation of the fluid surface, and as a spill any node s such that ∃d ∈ Donors(s)|w d = w s and z s > w rcv(s) .Note that for a flow routing observing the aforementioned properties, the water level can be computed as w n = max(w rcv(n) , z n ).We also use the term depression from Lindsay (2016) terminology, and we define it with respect to a basin B as a subset of nodes of B under water level, characterized by w n = w rcv(n) .Note that the water level of a boundary basin corresponds to the elevation of its associated boundary node so that it contains no depression.In the case of nested depressions, the water level of a basin may be higher than the elevations of all its nodes, which means that the spill does not always belong to B. The energy of the nodes should be changed to E i = w i , but as described later, one may choose various routing strategies inside the depressions depending on the application.Therefore, we allow any path within depressions by setting E i to zero inside them, and keeping E i = z i elsewhere.
One may break the problem of flow path enforcement down to three, smaller problems: find the spill of each depression, force the flow within the depressions to be routed toward their respective spill, and ensure that the flow through the spills is properly routed into adjacent basins.The proposed algorithm addresses this problem in an explicit manner and can be divided into three main stages: 1. Compute the basins and link all pairs of adjacent basins (Figure 2 (b)).
2. Select only some of the basin links computed at the previous stage and orient them such that the flow is routed consistently across adjacent basins, from inner basins toward the boundary basins (Figure 2 (c)).This operation is not trivial: an optimal selection needs a global knowledge of the whole basin graph.To do so, we use an algorithmic structure: a minimum spanning tree of the basin graph.We propose here two algorithms, a simple one with O(n log n) complexity, and a more complex one with O(n) complexity.
3. Update the flow receivers.Using the links selected at the previous stage, we update (only some of) the receivers to enforce the flow both within and across inner basins so that it is ensured to finally reach the boundary basins and their associated boundary nodes.We propose three different methods (one may choose a method over another depending on 5 the specific problem to solve).
Each of these stage processes the whole DEM, and as such are run only once for a given topography.They are each detailed in the next sections.

Basin computation and linkage
This first stage consists in first assigning a basin identifier, basin_id(n), to each node n of the topography.The identifiers are 10 added sequentially by starting at singular nodes and parsing the nodes using a depth first traversal in the direction of the donors (see appendix A1).The case of flat bottomed depressions does not require any particular treatment: all nodes within flat areas are singular nodes and therefore are each assigned a unique basin identifier.
Then, the links connecting all pairs of adjacent basins are retrieved.To each link also corresponds an edge of the topography, here called a pass, which represents the crossing of lowest elevation between the two basins.For example, the link L = (B 1 , B 2 ) connects the basins B 1 and B 2 and has the corresponding Pass(L) = (n 1 , n 2 ), where n 1 ∈ B 1 and n 2 ∈ B 2 and where the chosen (n 1 , n 2 ) minimizes z pass(L) = max(z n1 , z n2 ).We define a single procedure to retrieve both the links and their pass (see appendix A2).This procedure parses each edge of the topography: if the two nodes of the current edge have each different basin identifiers, then (1) it adds a new link if no link has been already set for these two basins, and (2) it sets or maybe updates the pass of that link with the current edge.
The sets of basins B and the set of retrieved links L both define a basin graph.It is worth noting that, at this stage, the links/passes are not oriented and that only one link/pass is stored for two adjacent basins.The procedure described above runs sequentially and won't add the link (B 2 , B 1 ) if it already added the link (B 1 , B 2 ).

Flow routing across adjacent basins
This second stage tackles the problem of selecting the right subset of links so that we obtain consistent flow paths on the basin graph.To illustrate the proposed solution, let's start from an inner basin.If it is filled with water, the water level will raise until it finds a pass where water eventually pours into another, adjacent basin.The associated link is then called the outflow of the basin.Hence, routing the flow across the basins consists in connecting all outflows such that the resulting flow paths, from inner basins to the boundary basins, have the same properties than stated above, i.e., those paths are unique, contain no cycle and minimize the energy needed to reach the boundary basins.
If we add to the basin graph a virtual basin (let's call it external basin) to which we link all the boundary basins (i.e., the external basin may be viewed as a bucket collecting all the flow that leaves the domain), then we can represent the connected outflows using a specific algorithmic structure: a tree.More specifically, a basin tree is a tree that satisfies the properties above: it actually corresponds to a minimum spanning tree of the basin graph, i.e., a subset of the basin graph resulting from a selection of the links so that the following energy is minimized: Where O is the set of selected links (or the set of outflows) and z pass(L) is the elevation of their respective passes.
We propose two algorithms for the computation of a minimum spanning tree.Kruskal's algorithm is very generic and simple with a log-linear complexity.We also propose a second algorithm, which leverages the planar nature of the basin graph to reach a linear complexity.

Kruskal's algorithm
Kruskal's algorithm (Kruskal, 1956)  on the complexity of our solution.This algorithm uses a Union-Find structure to store and merge equivalence classes of objects (see Algorithm 1).The idea here is to parse all links L ∈ L sorted by increasing elevation z pass(L) , progressively grouping each pair of basins as a larger, virtual one (equivalence class).All subsequent paths between basins within this equivalence class are discarded to prevent loops.The Union-Find data structure has three operations: MakeSet Create an equivalence class containing a single element.
Union Merge two equivalence classes.
Find Get the equivalence class of an object.
The optimal implementation of the Union-Find structure provides a O(α(N )) complexity for these operations, where N is the number of elements in the structure (i.e., here the number of basins) and α is the inverse Ackermann function whose complexity is lower than O(log N ).This however requires first sorting the links by increasing weight (i.e., by the elevation of their respective passes), which finally yields a O(m log m) complexity for the whole computation.

Planar graphs
The problem of finding the minimum spanning tree is known to have a O(N ) complexity when the graph is planar (Mareš, 2002).A planar graph is a graph which can be embedded in a plane such that none of its edges cross another one.The basin graph described in section 2.1 is an example of planar graph.The key intuition behind the algorithm proposed in Mareš (2002) is that at least half of the vertices of a planar graph have at most 8 neighbors.The algorithm is then an adaptation of another classical algorithm, named Boruvka's algorithm (Boruvka, 1926) ; see Algorithm 2 for more details.The O(N ) complexity comes from the fact that at each step of the outer loop, we parse and remove at least half of the nodes of the graph, and As the number of grid nodes n > N , the complexity of this algorithm is bounded by O(n).As demonstrated by Mareš (2002), the limit of 8 neighbors for the selection of a basin in the inner loop is critical in halving the number of edges at each iteration of the outer loop and thus in obtaining a linear time complexity.
Algorithm 2 Planar Boruvka (Mareš, 2002) Initialize the basin tree structure while There remains nodes in the basin graph do while There is a basin B that has less than 8 neighbors do Add the link with the lowest neighboring pass to the basin tree Contract the link (if the link connects basins B and Bp, remove B and append all remaining neighbors of B to Bp)

end while
Clean the graph: bucket sort all links lexicographically to remove parallel edges.
end while A special case may arise when the basin graph is computed from a grid of 8-connectivity.In this case, the edges of the graph may cross each other due to the diagonal connectivity, possibly making the basin graph not perfectly planar.This is, however, rather unlikely as it implies that two passes connecting different basins are found on the two diagonals connecting four adjacent nodes of the grid.Furthermore, this issue does not impact the correctness of the algorithm.Only the linear complexity is not formally proven.Because it is not planar, the case of a 8-connectivity grid would fall in the second category mentioned by Mareš (2002) of graphs closed on graph minor.We have validated this experimentally by randomly computing minors of different sized 8-connected graph.We found an edge density of 4, implying that half of the basins in the basin graph are linked to at most 16 adjacent basins (and not 8 as for planar graphs) at any step of the algorithm.Therefore, we have demonstrated the linear complexity for 8-connected graphs experimentally, although future work is needed to prove this in a formal framework.

Updating flow receivers
The basin tree obtained at the previous stage must be oriented before routing the flow from inner basins to the boundary basins.This is achieved by traversing the tree in the reverse order (i.e., starting from the boundary basins) and labelling the two nodes of each pass, one as n in (incoming flow) and the other one as n out (outgoing flow).Depending on their elevation, either n in or n out is the spill node of the corresponding basin.
The last stage then consists in updating the flow receivers so that any flow entering an inner basin is ensured to leave the basin through n out .The most straightforward solution would be to only update the receiver of each local minimum p so that rcv(p) = n out .Note that if n in has a higher elevation than n out , then two receivers must be updated: rcv(n in ) = n out and rcv(p) = n in .This very simple solution ensures topological continuity of the flow but does not preserve its spatial continuity.
We therefore propose two other, more realistic methods: one similar to depression filling and another similar to depression carving.Note that we use carving and filling as metaphors as our algorithm only changes the flow graph connectivity without altering elevation values.For each of the variants, the donors and stack structures need to be updated to reflect the changes in the receivers.Figure 1 already shows the effect of flow path enforcement vs. no enforcement on the evolution of an escarpement under active erosion processes.A second set of experiments, shown in Figure 4, illustrates the impact that the different strategies of flow receiver updating have on the evolution of the topographic surface under the action of channel erosion.The input synthetic topography is defined on a 100 × 100 regular grid and looks like an inverted pyramid with 45 • regular slopes, forming a single, big depression (Figure 4 (a)).The node at the middle of the top boundary is the only node that is not part of the depression: it has same elevation than the node at the center of the grid and it is defined as a boundary node.A single time step of 5000 years of erosion only (no uplift) is performed using each of the strategies described in section 2.3: Simple correction.In this specific case, the algorithm updates the receivers of only three nodes: (1) one of the neighbors of the boundary node, which here corresponds to the spill of the closed depression, (2) one of the neighbors of the spill that, together with the spill, forms the pass connecting the depression to the boundary node and (3) the local minimum at the bottom of the depression.The new assigned receivers are respectively for (1) the boundary node itself, (2) the spill and (3) the other node of the pass.We can see in Figure 4  Choosing one strategy over another greatly depends on the specific application.For example, the simple correction strategy may be acceptable if one assumes that no erosion could happen in depressions below the water level.However, interrupted drainage area patterns within the depressions may be problematic when used with erosion algorithms like the FastScape model, which uses an implicit time scheme for solving the Stream Power Law but still treats drainage area explicitly, resulting in too slow opening of the closed depressions by erosion.The depression carving or depression filling strategies generally yield better results in the latter case.These two strategies have, however, contrasted behaviors and choosing one or the other will depend on several criteria such as the size (i.e., depth vs. volume) of the depressions.

Performances
To assess the performance of our algorithm, we have run multiple benchmarks under various settings.Although these benchmarks mostly take place in the framework of landscape evolution modeling, they provide results that may be useful in other applications too.Note that for better readability, we present here only the results from benchmarks applied on a fixed grid of 16384 × 16384 nodes.We obtain consistent results for other grid sizes.(c) Execution time measured for local minima resolution at each time step, with either spatially uniform or variable uplift rates (i.e., a relative noise magnitude of either 0% or 20%).The blue curves refer to our algorithm using the O(n) variant for computing the minimum spanning tree.
We have run benchmarks for our algorithm -including the two variants for computing the minimum spanning tree but considering only the depression filling strategy -as well as for three other, state-of-the-art algorithms of local minima resolution, respectively proposed by Barnes et al. (2014a), Zhou et al. (2016) and Wei et al. (2018).All three of those algorithms fill the depressions using improved variants of the "Priority-Flood" algorithm that reduce the number of nodes processed by a priority queue.The Barnes et al. (2014a) variant used here, i.e., "Priority-Flood+ ", is only slightly optimized but has the advantage number of local depressions -and thus the size of the graph -is small enough with respect to the size of the grid, resulting in only a small memory overhead compared to the Priority-Flood variants.

Conclusions
We have presented here a new algorithm for flow path enforcement in topographies with depressions.We have designed this algorithm within the framework of landscape evolution modeling and we have demonstrated through benchmarks that, in this scope, it may greatly improve performance compared to other, state-of-the-art solutions.The potential of this algorithm is, however, not limited to landscape evolution models.On a broader scope, the basin graph and its minimum spanning tree are generic structures that other applications may leverage, possibly through derived quantities such as the water level of each depression.We propose here optimal methods to compute those structures and quantities.Despite the fact that our algorithm is rather complex and requires some work to be properly implemented, it is designed in a composable way such that it is easy to reuse one or several of its components.Adding new features like alternative strategies of flow path enforcement within the depressions would require only little effort, too.
While being versatile, this new algorithm does not provide an universal solution to the problem of flow routing both within and across closed depressions.Perhaps its main limitation is the assumption of single direction flow, i.e., each node has one unique flow receiver.Adding full support for multiple direction flow (MDF) without loosing in performance is rather difficult and would require a fair amount of re-design work at each of the three stages of the algorithm: -Basin computation should take into account divergent flow (basin labels are not unique for grid nodes located on drainage divides).
-It should be theoretically possible to route the outflow from an inner basin into more than one of its adjacent basins (this is currently not possible using a minimum spanning tree computed from the basin graph).
-Alternative, MDF-compliant methods should be implemented to update the flow receivers within the depressions.
Other algorithms like the Priority-Flood don't have that limitation: they act directly on elevation values and don't prevent us from applying MDF flow routing methods on the modified topography.
Another limitation of this algorithm is its sequential implementation.Further work is needed to adapt it so that it could be run on modern, multi-core and/or GPU-based architectures.Still, many use cases would benefit from the current implementation.
These include processing datasets of moderate size on a single computer or running batches of simulations or analysis pipelines, e.g., in the context of sensitivity analyses or inferences on model parameters.
library that has been extracted for reproducibility purpose.Further maintenance and new developments will happen in the fastscapelib's main repository: https://github.com/fastscape-lem/fastscapelib.

Figure 1 .
Figure 1.Simulation of the evolution of an escarpment over 70 time steps of 1, 000 years each and on a 1024 × 256 regular grid, using the FastScape model (Braun and Willett, 2013, see also section 3 below).The grid nodes of the leftmost column (boundary nodes) have a fixed elevation while the initial elevation of the other nodes corresponds to a 500 m high flat surface with small random perturbations.Using the same set of model parameters, simulation results are shown (a) without and (b) with correction of flow routing in local depressions at each time step.As illustrated, flow path disruptions in (a) cause a much slower migration of the escarpment, while the topography predicted in (b)

Figure 2 .
Figure 2. Illustration of the inputs and the first steps of the proposed flow routing algorithm.(a)The input topography is defined on top of a mesh by a set of nodes and edges.A single edge is selected for each node, it connects the node to its flow receiver, i.e., its neighbor with the steepest slope.Nodes with no receiver are local minima (colored in the figure).(b) All the nodes that flow to a same local minimum belong to the same basin.A graph of basins is created by connecting together adjacent basins with links, which are materialized on the mesh by edges representing the passes, i.e., the crossings of lowest elevation that connect each pair of basins (black thick arrows).(c) Some of the links are selected by computing a minimum spanning tree and the corresponding passes are oriented in the direction of the flow across the basins (unidirectional arrows).This structure is then used to update the flow receivers so that the flow reaches the domain boundaries without being interrupted.
is one of the most classical algorithms used for computing minimum spanning trees and is known to have a O(m log m) complexity, where m is the number of links.The number of links being always bounded by a linear function of the number n of nodes in the grid (Euler formula), using this algorithm induces a global upper bound of O(n log n) Sort all links L by increasing weight z pass(L) for each Link L = (B1, B2) do if Find(B1) = Find(B2) then Add (B1, B2) to the Basin Tree Union(B1, B2)

Figure 3 .
Figure 3.Our algorithm of flow enforcement run on a synthetic case.(a) Hillshade and contour plot of the input topography with apparent depressions.(b) basins (areas of unique, random colors) and all passes connecting adjacent basins (thin black lines).(c) Flow directions across the basins (white arrows), as resulting from the computation of a minimum spanning tree from the basin graph, and water level (blue areas) after some erosion is applied on the input topography.

3. 2
Effect of flow path enforcement strategies on eroded topographies

Figure 4 .
Figure 4. Demonstration of the effect of flow path enforcement on erosion, using different strategies of flow receivers "correction" within inner basins.(a) Hillshade and contour plot of the initial topography.(b), (c) and (d) Hillshade and contour plot of the topography obtained after running a single time step of 5, 000 years with channel erosion only (no uplift), with flow receivers updated using each of the different strategies described in section 2.3.Water level is shown in blue, and is computed by propagating the spill elevation while parsing the nodes in the upstream order (based on updated donors).
(b) that this strategy doesn't allow channel erosion to propagate much from the boundary node into the closed depression.In fact, drainage area values close to the boundary node are high enough to trigger erosion but the low values of drainage area in the vicinity (within the depression) prevents further propagation of the erosion wave.Depression carving.Unlike the former strategy and as expected, Figure 4 (c) shows that the depression carving strategy allows erosion to propagate toward the local minimum along a narrow and deep trench.Depression filling.Using the depression filling strategy, flow receivers are updated over a large area of the depression as if the water surface was replaced by a very gentle slope toward the spill.As a result, erosion affects a great part of the modelled domain, with the emergence of a star-like pattern centered at the spill (Figure4 (d)).The number and disposition of the branches of the star are due to the grid 8-connectivity used here.

Figure 5 .
Figure 5. Results from benchmarks assessing the performance of our algorithm for local minima resolution -including both O(n log n) Kurskal's and O(n) Mareš' variants for computing the minimum spanning tree, compared to three other solutions based on variants of the Priority-Flood depression filling algorithm proposed by Barnes et al. (2014a), Zhou et al. (2016) and Wei et al. (2018), respectively.See text for more details about the setup of these benchmarks.(a) Execution time measured for local minima resolution applied once on a synthetic input topography vs. total number of local minima generated in the input topography.(b) Evolution of the number of local minima detected in the topography obtained at each of the first 20 time steps of a FastScape model run.Each curve corresponds to a given magnitude of random perturbations added to produce spatially variable uplift rates (magnitude values are relative to a fixed uplift rate of 5 × 10 −3 m y −1 ).