Parameterization of river incision models requires accounting for environmental heterogeneity: insights from the tropical Andes

. Landscape evolution models can be used to assess the impact of rainfall variability on bedrock river incision over millennial timescales. However, isolating the role of rainfall variability remains difficult in natural environments, in part because environmental controls on river incision such as lithological heterogeneity are poorly constrained. In this study, we explore spatial differences in the rate of bedrock river incision in the Ecuadorian Andes using three different stream power models. A pronounced rainfall gradient due to orographic precipitation and a high lithological heterogeneity enable us to 25 explore the relative roles of either these controls. First, we use an area-based stream power model to scrutinize the role of lithological heterogeneity on river incision rates. We show that lithological heterogeneity is key to predicting spatial patterns of incision rates. Accounting for lithological heterogeneity reveals a non-linear relationship between river steepness, a proxy for river incision, and cosmogenic radio nuclide (CRN) derived denudation rates. Second, we explore this nonlinearity using runoff-based and stochastic-threshold stream power models, combined with a hydrological dataset to calculate spatial and 30 temporal runoff variability. Statistical modelling suggests that the non-linear relationship between river steepness and denudation rates can be attributed to a spatial runoff gradient and incision thresholds. Our findings have two main implications for the overall interpretation of CRN-derived denudation rates and the use of river incision models : (i) applying sophisticated stream power models to explain denudation rates at the landscape scale is only relevant when accounting for the confounding role of environmental factors such as lithology and (ii) spatial patterns in runoff due to orographic precipitation in combination 35 with incision thresholds explain part of the non-linearity between river steepness and CRN-derived denudation rates. Our methodology can be used as a framework to study the coupling between river incision, lithological heterogeneity and climate at regional to continental scales.


Background
Research on how climate variability and tectonic forcing interact to make a landscape evolve over time has long been limited by the lack of techniques that measure denudation rates over sufficiently long time spans (Coulthard and Van de Wiel, 2013). Consequently, the relative role of climate variability and tectonic processes could only be deduced from sediment archives (e.g., Hay et al., 1988). However, whether sediment archives offer reliable proxies remains contested because sediment sources and transfer times to depositional sites are often obscured by stochastic processes that shred environmental signals (Bernhardt et al., 2017;Jerolmack and Paola, 2010;Romans et al., 2016;Sadler, 1981).
Nowadays, cosmogenic radionuclides (CRNs) contained in quartz minerals of river sediments provide an alternative tool for determining catchment-wide denudation rates on a routine basis (Codilean et al., 2018;Harel et al., 2016;Portenga and Bierman, 2011). In sufficiently large catchments, detrital CRN-derived denudation rates (E CRN ) integrate over timescales that average out the episodic nature of sediment supply . Hence, benchmark or natural denudation rates can be calculated for disturbed as well as pristine environments (Reusser et al., 2015;Safran et al., 2005;Schaller et al., 2001;Vanacker et al., 2007).
Catchment-wide denudation rates have been found to correlate with a range of topographic metrics including basin relief, average basin gradient and elevation (Abbühl et al., 2011;Kober et al., 2007;Safran et al., 2005;Schaller et al., 2001). However, in tectonically active regimes, hillslopes tend to evolve towards a critical threshold gradient, which is controlled by mechanical rock properties (Anderson, 1994;Roering et al., 1999;Schmidt and Montgomery, 1995). Once slopes approach this critical gradient, mass wasting becomes the dominant process controlling hillslope response to changing base levels (Burbank et al., 1996). In such a configuration, hillslope gradients are no longer an indication of denudation rates (Binnie et al., 2007;Korup et al., 2007;Montgomery and Brandon, 2002), and hillslope metrics (Hurst et al., 2012) often require high-resolution topographic data that are not widely available.
Contrary to hillslope gradients, rivers and river longitudinal profiles are more sensitive to changes in erosion rates . Bedrock rivers in mountainous regions mediate the interplay between uplift and erosion (Whipple and Tucker, 1999;Wobus et al., 2006). They incise into bedrock and efficiently convey sediments, thus setting the base level for hillslopes and controlling the evacuation of hillslope-derived sediment. Quantifying the spatial patterns of natural denudation rates in tectonically active regions therefore requires detailed knowledge of the processes driving fluvial incision (Armitage et al., 2018;Castelltort et al., 2012;Finnegan et al., 2008;Gasparini and Whipple, 2014;Goren, 2016;Scherler et al., 2017;Tucker and Bras, 2000).
River morphological indices, such as channel steepness (k sn ) (Wobus et al., 2006), have successfully been applied as a predictor for catchment denudation and thus E CRN by Safran et al. (2005) and many others, commonly identifying a monotonically increasing relationship between channel steepness (k sn ) (Wobus et al., 2006) and E CRN (Cyr et al., 2010;DiBiase et al., 2010;Mandal et al., 2015;Ouimet et al., 2009;Safran et al., 2005;Vanacker et al., 2015). Several authors identified a nonlinear relationship between k sn and E CRN in both regional (e.g., DiBiase et al., 2010;Ouimet et al., 2009;Scherler et al., 2014;Vanacker et al., 2015) and global compilation studies (Harel et al., 2016). Theory suggests that this nonlinear relationship reflects the dependency of long-term denudation on hydrological variability (Deal et al., 2018;Lague et al., 2005;Tucker and Bras, 2000). Hydrological variability affects both temporal and spatial variations in river discharge, and the effect of river discharge on denudation and river incision rates can be approximated by theoretical model derivations. However, the impact of hydrological variability on incision rates in natural environments has, until now, only been successfully identified in a limited number of case studies (DiBiase and Whipple, 2011;Ferrier et al., 2013;Scherler et al., 2017).
We identify two limitations hampering the large-scale application of river incision models that include hydrological variability. First, the necessary high-resolution hydrological data are usually unavailable. Mountain regions are typically characterized by large temporal and spatial variation in runoff rates (e.g., Mora et al., 2014). Yet, most of the observational records on river discharge in mountain regions are fragmented and/or have limited geographic coverage. Second, large catchments are often underlain by variable lithologies. Studies exploring the role of river hydrology in controlling river incision have hitherto mainly focused on regions underlain by rather uniform lithology (DiBiase and Whipple, 2011;Ferrier et al., 2013) or they have considered lithological variations to be of minor importance (Scherler et al., 2017). However, tectonically active regions have usually experienced tectonic accretion, subduction, active thrusting, volcanism and denudation, resulting in a highly variable lithology over > 100 km distances (Horton, 2018). Rock strength is known to control river incision rates and is a function of its lithological composition and stratigraphic age (Brocard and van der Beek, 2006;Lavé and Avouac, 2001;Stock and Montgomery, 1999), as well as its rheology and fracturing (Molnar et al., 2007). If we want to use geomorphic models not only to emulate the response of landscapes to climatic and/or tectonic forces but also to predict denudation rates, then we need to account for variations in physical rock properties (Attal and Lavé, 2009;Nibourel et al., 2015;Stock and Montgomery, 1999). Even more impor-tantly, these variations in rock erodibility can potentially obscure the relation between river incision and discharge (Deal et al., 2018). Therefore, the climatic effects on denudation rates can only be correctly assessed if the geomorphic model accounts for physical rock properties and vice versa. Based on current limitations, we formulate two main objectives: we want (i) to assess the impact of lithological heterogeneity on river incision and (ii) to unravel the role of allogenic (spatial and/or temporal runoff variability) versus autogenic (incision thresholds) controls on river incision. We develop and evaluate our approach in the southern Ecuadorian Andes, for which detailed lithological information is available as is a database of CRN-derived denudation rates (Vanacker et al., , 2015.

River incision models
Bedrock rivers are shaped by processes including weathering, abrasion-saltation, plucking, cavitation and debris scouring (Whipple et al., 2013). However, explicitly accounting for these processes renders models too complex at the spatial and temporal scales relevant to understanding landscape evolution of entire mountain ranges. Therefore, a broad variety of models have been proposed to simplify the complex nature of river incision dynamics (Armitage et al., 2018;Lague et al., 2005;Shobe et al., 2017;Venditti et al., 2019). Most river incision models assume a functional dependence of river incision on the shear stress (τ ; Pa) exerted by the river on its bed (Sklar and Dietrich, 1998;Whipple and Tucker, 1999). However, within the family of shear stress-stream power models, several approaches exist. Most commonly used is the Area-Based Stream Power Model (A-SPM), explicitly representing the universally observed inverse power relation between channel slope and drainage area (Howard, 1994;Whipple and Tucker, 1999). Parametrization of the A-SPM is purely empirical and involves the calibration of three incision parameters (an erosion efficiency parameter, an area exponent and a slope exponent). Given the interdependency of these parameters (e.g., Campforts and Govers, 2015;Croissant and Braun, 2014;Roberts and White, 2010), there is an ongoing effort to calibrate river incision models using a process-oriented strategy whereby small-scale observations and physical mechanisms are upscaled to the landscape scale (Venditti et al., 2019). In particular and not exclusively, ongoing efforts evaluate how the three incision parameters are affected by the presence of incision thresholds (e.g., DiBiase and Whipple, 2011;Lague, 2014), discharge variability (DiBiase and Whipple, 2011;Lague et al., 2005;Snyder et al., 2003;Tucker and Bras, 2000), and the spatial and temporal distribution of runoff (Deal et al., 2018;Ferrier et al., 2013;Lague et al., 2005;Molnar et al., 2006). In this paper, we evaluate how two of such derived models (the Stochastic-Threshold and Runoff-Based Stream Power Model -ST-SPM and R-SPM, respectively) can be used to explain measured variations in denudation rates at the landscape scale.

Area-Based Stream Power Model
The Area-Based Stream Power Model (A-SPM; Howard, 1994) is a first, lumped statistical approach to represent river incision: in which E is the long-term river erosion (L T −1 ), K (L 1−2m T −1 ) is the erosional efficiency as a function of rock erodibility and erosivity, A (L 2 ) is the upstream drainage area, S (L L −1 ) is the channel slope, and m and n are exponents whose values depend on lithology, rainfall variability and sediment load. Equation (1) can be rewritten as a function of the steepness index, k s , where k s can be written as the upstream area-weighted channel gradient: in which θ = m/n is the concavity index (Snyder et al., 2000;Whipple and Tucker, 1999). In order to compare steepness indices from different locations, θ is commonly set to 0.45 and referred to as the normalized steepness index, k sn (Wobus et al., 2006). Variations in k sn are often used to infer uplift patterns by assuming a steady state between uplift and erosion (Kirby and Whipple, 2012). In transient settings, in which steady-state conditions are not necessarily met, the k sn values can be used to infer local river incision rates (Harel et al., 2016;Royden and Perron, 2013).
When using the A-SPM, the effect of autogenic (caused by intrinsic river dynamics such as incision thresholds and changes in channel width) and allogenic (originating from the transient response of river dynamics to extrinsic changes such as climate variability) controls is assumed to be accounted for in the model parameters (K , m and n). For example, it has been shown that incision thresholds translate into a slope exponent n greater than unity when applying the A-SPM (Lague, 2014). Notwithstanding empirical evidence supporting the A-SPM, such as the scaling between drainage area and channel slope in steady-state river profiles (Lague, 2014) or its capability to simulate transient river incision pulses , the lumped modeling approach of the A-SPM cannot be used to evaluate the role of autogenic or allogenic river response.

Stochastic-Threshold Stream Power Model
The Stochastic-Threshold Stream Power Model (ST-SPM; Crave and Davy, 2001;Deal et al., 2018;Lague et al., 2005;Snyder et al., 2003;Tucker and Bras, 2000) simulates the impact of hydrological variability and incision thresholds on river incision and thus enables us to evaluate the role of autogenic or allogenic river response.
The ST-SPM is calculated in two consecutive steps. First, instantaneous river incision I (L t −1 ) is calculated as in which Q * represents the dimensionless normalized daily discharge calculated by dividing daily discharge Q (L 3 T −1 ) by mean annual discharge Q (L 3 T −1 ), k e (L 2.5 T 2 m −1.5 ) is the erosional efficiency constant, R (L T −1 ) is the mean annual runoff, a is the shear stress exponent reflecting the nature of the incision process , ψ is the threshold term (L T −1 ), and k t , k w , α, β, ω a and ω b are the channel hydraulic parameters described in Table 1.
In a second step, long-term river incision is calculated by multiplying instantaneous river incision, I , calculated for a discharge of a given magnitude (Q * ) with the probability for that discharge to occur (pdf(Q * )), and subsequently integrating this product over the range of possible discharge events specific to the studied timescale (DiBiase and Whipple, 2011;Lague et al., 2005;Scherler et al., 2017;Tucker and Bras, 2000;Tucker and Hancock, 2010): in which Q * c is the minimum normalized discharge required to exceed the critical shear stress (τ c ), and Q * m is the maximum possible normalized discharge over the time considered.

Runoff-Based Stream Power Model
The Runoff-Based Stream Power Model (R-SPM) is a simplified version of the Stochastic-Threshold Stream Power Model (ST-SPM). The R-SPM assumes that the incision thresholds are negligible (ψ = 0) and that discharge is constant over time (Q * = 1), simplifying Eq. (5) to In the following sections, we first describe the study area, characterize the lithological configuration by developing a lithological erodibility index and compile a database to represent runoff variability. Second, we present the methods and assumptions used for calibrating and simulating river incision. In a third section, the modeling results are presented at the catchment scale: we start by evaluating the impact of lithological heterogeneity on river incision rates using an area-based river incision model (A-SPM). We then evaluate to what extent the variability in denudation rates can be explained by spatial and/or temporal runoff variability and the   (Farr et al., 2007). The main map is produced with Topo Toolbox (Schwanghart and Scherler, 2014). The inset map is made in QGis 3 ©. existence of incision thresholds using the R-SPM and ST-SPM. In a final section, we discuss our findings, highlight the implications of our work and discuss further perspectives.

Tectonics and geomorphic setting
The Paute catchment is a 6530 km 2 transverse drainage basin (2.9 • S, 79 • W): the Paute River has its source in the eastern flank of the Western Cordillera, traverses the Cuenca intramontane basin and cuts through the Eastern Cordillera before joining the Santiago River, a tributary of the Amazon ( Fig. 1; Hungerbühler et al., 2002;Steinmann et al., 1999). Where the Paute River cuts through the Eastern Cordillera, the topography is rough with steep hillslopes (90th percentile of slope gradients: 0.40 m m −1 ) and deeply incised river valleys (Guns and Vanacker, 2013).
The oblique accretion of terranes to the Ecuadorian margin during the Cenozoic resulted in a diachronous exhumation and cooling history along the Ecuadorian cordillera system (Spikings et al., 2010). South of 1.5 • S, where the Paute  (Spikings et al., 2010). In the Western Cordillera, no elevated cooling is observed during the Paleogene and extensional subsidence of the Cuenca basin allowed synsedimentary deposition of marine, lacustrine and terrestrial facies until the Middle to Late Miocene (Hungerbühler et al., 2002;Steinmann et al., 1999). The collision between the Carnegie ridge and Ecuadorian trench at some time between the Middle to Late Miocene (Spikings et al., 2001) resulted in uplift of the Western Cordillera and caused a tectonic inversion of the Cuenca basin (Hungerbühler et al., 2002;Steinmann et al., 1999). Based on a compilation of mineral cooling ages available for the Cuenca basin, Steinman et al. (1999) estimated a mean rock uplift rate of ca. 0.7 mm yr −1 and a corresponding surface uplift of ca. 0.3 mm yr −1 from 9 Ma to present. Uplift patterns are assumed to be reflected in the river steepness and not explicitly simulated in this paper. The Paute basin is characterized by a tropical mountain climate (Muñoz et al., 2018). Despite the presence of mountain peaks up to ca. 4600 m (Fig. 1), the region is free of permanent snow and ice (Celleri et al., 2007). The region's precipitation is regulated by its proximity to the Pacific Ocean (ca. 60 km distance), the seasonal shifting of the Intertropical Convergence Zone (ITCZ) and the advection of continental air masses sourced in the Amazon basin, giving rise to an orographic precipitation gradient along the eastern flank of the Eastern Cordillera (Bendix et al., 2006). Total annual precipitation is highly variable within the Paute basin and ranges from ca. 800 mm in the center of the basin up to ca. 3000 mm in the eastern parts of the catchment (Celleri et al., 2007;Mora et al., 2014).

Lithological strength
The erodibility map was developed using an empirical, hybrid classification method: it combines information on the lithological composition (Aalto et al., 2006) and the age of non-igneous formations assuming higher degrees of diagenesis and increased lithological strength for older formations (see Kober et al., 2015). Adding age information to evaluate lithological strength has advantages because lithostratigraphic units are typically composed of different lithologies but mapped as a single entity because of their stratigraphic age. The lithological erodibility (L E ) is calculated as , non-igneous rocks L L 2 , igneous rocks. (7a) L A is a dimensionless erodibility index based on stratigraphic age (Fig. 2a), and L L is a dimensionless erodibility index based on lithological strength (Table 3), similar to the erodibility indices published by Aalto (2006). Note that L A varies between 1 (Carboniferous) and 6 (Quaternary), whereas L L ranges between 2 (e.g., granite) and 12 (e.g., unconsolidated colluvial deposits). The lithological strength thus has a double weight, resulting in L values ranging between 1 and 6. For igneous rocks, only L L is considered, assuming that the lithological strength of igneous rocks remains constant over time. For river incision parameters to be comparable to other published ranges, L E is finally scaled around 1 by multiplying L with 2/7. L E therefore ranges between 2/7 and 12/7. A description of the lithological units, the age of the formations and their lithological strength (L A , L L and L E ) is provided in Supplement Table S3.
Using Eq. (7), we developed the erodibility map of Ecuador ( Fig. S1) and the Paute catchment ( Fig. 2c) based on the 1M geological map of Ecuador (Egüez et al., 2017). The lithological erodibility values were compared with field measurements (n = 9) of bedrock rheology by Basabe (1998). An overview of measured lithological strength values is provided in Table S4 (e.g., uniaxial compressive strength). Figure 2b shows good agreement (R 2 = 0.77) between the lithological erodibility index, L E , and the measured uniaxial compressive strength.  Table S1. (b) Field measurements of uniaxial compressive strength (Basabe, 1998; Table S4) versus the empirical erodibility index calculated using Eq. (7). Note that two of the nine observations overlap on this plot. (c) Spatial distribution of L E in the Paute catchment. The underlying topographic map is based on the 30 m SRTM v3 DEM (Farr et al., 2007). The lithological erodibility map for Ecuador was used to delineate different lithostratigraphic units and is based on the 1M geological map of Ecuador (Egüez et al., 2017;see also Fig. S1 in the Supplement). The map is produced with Topo Toolbox (Schwanghart and Scherler, 2014).

CRN-derived denudation rates
Catchment-wide denudation rates are derived from in situ produced 10 Be concentrations in river sand. At the outlet of 30 sub-catchments ( Fig. 1, Table 2), fluvial sediments were collected. We refer to Vanacker et al. (2015) for details on sample processing and derivation of CRN denudation rates taking into account altitude-dependent production, atmospheric scaling and topographical shielding (Dunai, 2000;Norton and Vanacker, 2009;Schaller et al., 2002). CRN concentrations are not corrected for snow or ice coverage because there is no evidence of glacial activity during the integration time of CRN-derived denudation rates (Vanacker et al., 2015). Three data points were excluded from model optimization runs: two catchments with a basin area smaller than 0.5 km 2 (MA1 and SA) and one catchment with an exceptionally low 10 Be concentration that can be attributed to recent landslide activity (NG-SD; see Vanacker et al., 2015).

River morphology
Based on a gap-filled SRTM v3 digital elevation model (DEM) with 1 arcsec resolution (Farr et al., 2007), we calculate river steepness for all channels with drainage areas > 0.5 km 2 and average it over 500 m reaches. The optimized concavity θ for the Paute catchment (0.42; Text S1) is close to the frequently used value of 0.45, so we fix concavity to the reference value of 0.45 and report river steepness as normalized river steepness (k sn ) in the remainder of this paper. The spatial pattern of k sn values (Fig. 3) is a result of the transient geomorphic response to river incision initiated at the Andes Amazon transition zone (Vanacker et al., 2015). To evaluate the extent to which transient river features influence simulated denudation rates, chi plots (χ) for all studied sub-catchments are calculated following Royden and Perron (2013) and given in the Supplement (Text S1; Fig. S4; Royden and Perron, 2013).
To constrain the value of k w used in the process-based incision models (Eqs. 4 and 6), we calibrate the relationship between bankfull river width (W b ) and discharge (Leopold and Maddock, 1953): in which k w (L 1−3ω b t ω b ) and ω b are scaling parameters regulating the interaction between mean annual discharge Q and incision rates (Eq. 4). We constrain k w by analyzing downstream variations in bankfull channel width for a fraction of the river network (see Scherler et al., 2017). River sections are selected based on the availability of high-resolution optical imagery in Google Earth, and river width was derived using the ChanGeom toolset (Fisher et al., 2013;Fig. S5).
The power-law fit between Q and W yields a value of 0.43 for the scaling exponent, ω b , with an R 2 of 0.51 (Fig. 4). The value of this exponent lies within the range of published values of 0.23-0.63 (Fisher et al., 2012;Kirby and Ouimet, 2011). To maintain a dimensionally consistent stream power Table 2. Characteristics of the sub-catchments studied in this paper. IDs correspond to the numbers indicated in Fig. 1. The 10 Be cosmogenic nuclide erosion rates were derived from Vanacker et al. (2015) a . Coordinates are given in decimal degrees in the WGS84 datum, L E is the average lithological index for the catchment, k sn is the normalized catchment average steepness, P RIDW and R RIDW are respectively the downscaled catchment average precipitation and runoff, and k is the optimized discharge variability coefficient (see Eq. 9).

ID Sample
Latitude Longitude Area 10 Be erosion  (2015) is not listed because its area (< 0.1 km 2 ) did not allow us to accurately calculate the catchment properties listed here. b Catchments excluded from model optimization runs (see text).
model, ω b was fixed to a value of 0.55. When doing so, the fit remains good (R 2 = 0.5) and we obtained a k w value of 3.7 m −0.65 s 0.55 that is used in the remainder of the paper.

Runoff variability
Evaluating the role of spatial and temporal runoff variability (Eqs. 5 and 6) requires estimates of catchment-specific runoff (R, spatial variability) and discharge (temporal variability). Although measured runoff data and discharge records are available for the Paute basin e.g., Mora et al., 2014;Muñoz et al., 2018), the monitoring network of existing hydrological stations does not capture the spatial variability present in the different sub catchments of the 6530 km 2 Paute basin (Fig. 1). To estimate runoff variability for all 30 sub-catchments, we use hydrological data derived in the framework of the Earth2Observe Water Resource Reanalysis project (WRR2; Schellekens et al., 2017) available from 1979 to 2014. Specifically, we use the hydrological data calculated with the global water model WaterGAP3 (Water -Global Assessment and Prognosis: Alcamo et al., 2003;Döll et al., 2003) at a spatial resolution of 0.25 • and a daily temporal resolution (http://www.earth2observe.eu, last access: 19 May 2020). Uncertainties associated with the WaterGAP3 data originate from hydrological model assumptions and spatially distributed input data . We revisit the impact of uncertainties in the climatological data on our model runs in the Discussion section of this paper. In the following paragraphs, we explain how we derive (i) a highresolution runoff map by spatially downscaling these coarse data and (ii) catchment-specific magnitude frequency distri-  Farr et al., 2007). The highest values can be observed in two major knickzones located in the lower part of the Paute basin. In these zones, topographic rejuvenation started and a transient incision pulse has propagated from east to west (see also Fig. S3). The map is produced with Topo Toolbox (Schwanghart and Scherler, 2014).  butions of discharge (pdf_Q * ) characterizing the temporal variability of runoff.

Spatial runoff patterns
A global hydrological reanalysis dataset such as WaterGAP provides daily runoff data over several decades and makes our methodology transferable to other regions. However, a spatial resolution of 0.25 • is insufficient to represent highly variable regional trends in water cycle dynamics over mountainous regions (Mora et al., 2014) and in small catchments. Therefore, we downscale the Ecuadorian WaterGAP3 data to a resolution of 2.5 km by amalgamating rain gauge data with the reanalysis product. The procedure consisted of the following steps and is presented in Figs. 5 and 6.
The relationship between precipitation (P ) and runoff (R) is constrained from the fit between monthly mean values for P and R available for all Ecuadorian WaterGAP 0.25 • pixels (Fig. 5).
A high-resolution mean annual precipitation map (P RIDW ) is calculated by downscaling the WaterGAP precipitation data (P ) using a series of rain gauge observations (338 stations, 1990-2013) from the Ecuadorian national meteorological service (INAMHI; available from http:// www.serviciometeorologico.gob.ec/biblioteca/, last access: 19 May 2020). A residual inverse distance weighting (RIDW) method is applied to amalgamate mean annual gauge data with the mean annual WaterGAP3 precipitation map. First, the differences between the gauge and WaterGAP data are interpolated using an IDW method (Fig. S6). Second, the resulting residual surface is added back to the orig-  (Farr et al., 2007). The maps are produced with Topo Toolbox (Schwanghart and Scherler, 2014). inal P data. A similar approach is often applied to integrate gauge data with satellite products, and we refer to the literature for further details on its performance (e.g., Dinku et al., 2014;Manz et al., 2016). Figure 6a shows P for the Paute region, and Fig. 6c shows its downscaled equivalent (P RIDW ).
Daily precipitation data (12 784 daily grids between 1979 and 2014) are downscaled to 2.5 km using the ratio between P RIDW and P , thereby assuming that the mean annual correction for precipitation also holds for daily precipitation patterns.
The relationship between P and R (Fig. 5) is used to derive daily runoff values from the downscaled precipitation data for every day between 1979 and 2014.
The mean annual runoff map for the Paute basin is shown in Fig. 6b and its downscaled equivalent in Fig. 6d. Mean annual values are further used to calculate mean catchment runoff (R) and the discharge variability (next paragraph) for every sub-catchment described in Table 2. The mean catchment-specific runoff averaged for all catchments equals 0.82 ± 0.35 m yr −1 .

Frequency magnitude distribution of orographic discharges
Runoff variability is typically cast in terms of spatial runoff variability (Sect. 2.4.1). However, the temporal pattern of runoff might also influence river incision and is typically represented by discharge magnitude frequency distributions. Constraining the shape of these distributions is important because the number of large storm events determines the frequency with which thresholds for river incision to occur are exceeded (see Sect. 1.2.2 and references therein). The probability distribution of discharge magnitudes consists of two components: at low discharges, the frequency of events increases exponentially with increasing discharge (Lague et al., 2005), whereas at high discharge, the frequency of events decreases with increasing discharge following a power-law distribution (Molnar et al., 2006). An inverse gamma distribution captures this hybrid behavior and can be written as (Crave and Davy, 2001;Lague et al., 2005) in which is the gamma function and k is a discharge variability coefficient; k represents the scale factor of the inverse gamma distribution and (k + 1) the shape factor. Previous studies used a single average k value to characterize regional discharge: DiBiase and Whipple (2011) use a constant k value for the San Gabriel Mountains, whereas Scherler et al. (2017) use a constant k value for high and low discharge but distinguish between eastern Tibet and the Himalaya. However, given the strong variation in temporal precipitation regimes in the Paute basin (Celleri et al., 2007;Mora et al., 2014), we explicitly evaluated the role of temporal runoff variability by calculating catchment-specific discharge distributions from the WRR2 WaterGAP dataset. Daily variations in discharge at the sub-catchment outlets ( Fig. 1) were calculated by weighing flow accumulation with runoff (R RIDW ; see Sect. 5.1.1). For every catchment, the complementary cumulative distribution function (CCDF) of the daily discharge was fitted through the observed discharge distribution as where is the lower incomplete gamma function. Figure S7 illustrates the fit between the WaterGAP-derived discharge distribution and the optimized CCDF for one of the catchments. Site-specific discharge variability values (k) are calculated for all catchments and listed in Table 2. The obtained k values range between 0.8 and 1.2 with a mean of 1.01±0.12.

Methods
The presented river incision models (A-SPM, R-SPM and ST-SPM in Sect. 1.2) all depend on river steepness, k sn , which is known to correlate well with E CRN (DiBiase et al., 2010;Ouimet et al., 2009;Scherler et al., 2017;Vanacker et al., 2015). Moreover, E CRN integrates over time spans that average out temporal fluctuations of denudation rates and over spatial extents that are sufficient to average out the erratic nature of hillslope processes. Therefore, E CRN can be used to constrain models of river incision provided a set of assumptions that we first describe below.

CRN-derived denudation rates to calibrate river incision
The use of CRN-derived denudation rates to calibrate river incision relies on three main assumptions, summarized by Scherler et al. (2017). A first assumption is that the catchment-wide denudation rates derived from CRN are representative for long-term fluvial incision. Positive correlations between river steepness, k sn , and CRN-derived denudation rates support this assumption (Vanacker et al., 2015), except for very small catchments where CRN-derived denudation rates are sensitive to the occurrence of deep-seated landslides during which material shielded at depth is supplied to the river (Niemi et al., 2005;Yanites et al., 2009). A second assumption when using CRN data to calibrate river incision models is that the sediment cosmogenic nuclide budget is at steady state at the catchment scale so that the input of CRN via in situ production equals the export of CRN via sediment export and radioactive decay. Given the size of the studied basins, this assumption seems to be reasonable. A third assumption, in particular when using the process-based R-SPM and ST-SPM, is that the runoff data used to calibrate the incision parameters are uniform within the sampled subcatchments and representative of the time span over which CRN data integrate (1-100 kyr). This is a challenging assumption given that available hydrological data only cover the recent past. While spatial patterns of runoff, mainly controlled by orographic precipitation, could be assumed to be broadly similar over the integration time of CRN-derived denudation, this is not necessarily true for the temporal variation in runoff. We will revisit the validity and implications of these three assumptions in the Discussion section of this paper.

River incision models
In a first set of model runs, we evaluate the performance of the area-based SPM (A-SPM) in predicting E CRN rates. To account for rock strength variability Eq.
(2) is rewritten as where k a (L 1−2m T −1 ) is the erosional efficiency parameter and L E is a dimensionless catchment mean lithological erodibility value. Given its empirical nature, wherein the effect of allogenic (e.g., runoff variability) and autogenic (e.g., incision thresholds and river width dynamics) controls of fluvial processes is integrated within the empirical scaling parameters (K, m and n), the A-SPM does not enable us to identify the role of spatial or temporal runoff variability and incision thresholds. Note that, at any point in the paper, lithological heterogeneity within the Paute catchment is represented using the average values of L E for the individual subcatchments indicated with L E and listed in Table 2. If lithological heterogeneity is not considered, L E is fixed to a value of 1.
In a second set of model runs, we evaluate to what extent more advanced SPMs can be used to understand the role of these allogenic and autogenic processes. We start by evaluating the performance of a runoff-based SPM (R-SPM). To account for rock strength variability Eq. (6) is rewritten as An overview of the parameter values required to solve the R-SPM is given in Table 1. Only the value of k w is based on a regional calibration of the hydraulic geometry scaling (see Sect. 2.3). Other parameters are set to theoretical values (reported by Deal et al., 2018;DiBiase and Whipple, 2011;Scherler et al., 2017). Actively incising bedrock channels are often covered by a layer of sediment (Shobe et al., 2017). Therefore, we assume that river incision is scaled to the bed shear stress as for bedload transport (Meyer-Peter and Müller, 1948) and set a to 3/2 (see DiBiase and Whipple, 2011;Scherler et al., 2017). We use the Darcy-Weisbach resistance relation and coefficients (α = β = 2/3) to calculate shear stress exerted by the river flow on its bed and assume a friction factor of 0.08, resulting in a flow resistance factor k t of 1000 kg m −7/3 s −4/3 (e.g., Tucker, 2004). The use of Darcy-Weisbach friction coefficients in combination with a = 3/2 results in a value for the slope exponent equal to unity (n = 1; see Eq. 4). Based on these theoretical derivations, we fix n to unity when constraining the R-SPM. Note that this contrasts with the first set of model runs (application of the A-SPM) in which we allow n to vary. By fixing n to unity, we want to verify whether spatial variations in runoff (incorporated in K from Eq. 12) can explain variations in incision rates otherwise ascribed to nonlinear river incision. The only parameter not fixed to a constant value is the erosivity coefficient k e , which is optimized as described in Sect. 3.3.
In a final set of model runs, we apply the Stochastic-Threshold SPM (ST-SPM) to evaluate the role of temporal precipitation variability and thresholds for incision (Eq. 4). Here, we adjust the ST-SPM to account for rock strength variability as To derive long-term erosion rates (E), Eq. (13) is integrated over the probability density function of discharge magnitudes (Eq. 5), which requires values for the lower (Q * c ) and the upper (Q * m ) limit of the integration interval. Constraining Q * m is difficult based on observational records alone as they might miss some of the most extreme flooding events. However, when simulating incision rates over long time spans and thus considering long return times of Q * m (> 1000 years), the solution of Eq. (5) is insensitive to the choice of Q * m (Lague et al., 2005). We therefore set Q * m to infinity in all our model runs. The critical discharge (Q * c ) for erosion to occur can be derived from Eq. (13) by setting I equal to 0: The impact of spatial variations in runoff and discharge variability is evaluated by setting R and k to the subcatchment-specific values or the mean of these values (listed in Table 2; Eq. 4). The parameters left free during optimization are the erosivity coefficient k e and the critical shear stress τ * c . Parameter values of both variables are optimized as described in Sect. 3.3.

Optimization of model parameters
We propose three metrics to evaluate the performance of the river incision models. The first one is the commonly used model error (ME), where nb is the number of E CRN data points, O i represents the catchment-specific measured E CRN denudation rates, M i represents the catchment-specific modeled river incision and σ i represents the catchment-specific standard deviation of E CRN . The advantage of the ME is that it explicitly incorporates the error on the analytical data (E CRN ) by weighing the model error with the analytical error. However, errors on CRN data are heteroscedastic: they systematically increase with increasing denudation rates. Although the ME thus provides a good metric to evaluate overall model performance, the metric is not well suited to optimize model parameters in an optimization procedure: too much weight will be given on optimization of the model in the lower regime of the denudation spectrum in which measured errors on E CRN are low, whereas higher measured E CRN data will not be approximated well because of large associated errors. To compensate for the effect of heteroscedasticity we rescale values O i , M i and E i using a logarithm with base 10 when calculating ME . In this paper, ME will be used to evaluate model performance but not to optimize model parameters. A second metric is the coefficient of determination, R 2 : where f i represents the fitted E CRN denudation rates. Contrary to ME, R 2 evaluates the explained variance of the model by giving all observations the same weight, regardless their analytical error. However, when model parameters result in an offset between simulated and observed data (i.e., the intercept of the fit), this can still result in a high R 2 . We therefore use the Nash-Sutcliffe model efficiency to optimize model parameters (NS; Nash and Sutcliffe, 1970): The NS coefficient ranges between −∞ and 1, where 1 indicates optimal model performance explaining 100 % of the data variance. When NS = 0, the model is as good a predictor of the mean of the observed data. When NS < = 0, the model performance is unacceptably low. The NS coefficient was developed in the framework of hydrological modeling but has been applied in a wide range of geomorphologic studies (e.g., Jelinski et al., 2019;Nearing et al., 2011).

Comparing model results with CRN-derived denudation rates
In the following sections, we compare simulated erosion rates, obtained with the river incision models presented in Eqs. (11)-(13), with measured CRN-derived denudation rates. We start with the use of the A-SPM (Eq. 11) to evaluate the extent to which lithological variability controls denudation rates. Once the impact of lithological heterogeneity on river incision is clarified, we evaluate whether runoff variability and incision thresholds can explain variations in E CRN -derived denudation rates. To this end, two river incision models are evaluated (the R-SPM and ST-SPM, presented in Eqs. 12 and 13, respectively). The optimized parameters and model performance of all model scenarios are listed in Table 4. Best-fit results of a selected number of model runs are presented in Figs. 7 and 8. An overview of model fits for all the scenarios listed in Table 4 is given in Figs. S8, S9 and S10.

Area-based stream power model
In a first set of model runs we evaluate the use of an areabased stream power model (A-SPM) to explain observed variations in CRN-derived denudation rates (E CRN ). We optimize river incision parameters for four scenarios (Table 4: A-SPM scenarios 1-4): in the first two scenarios, the slope exponent, n, is left as a free parameter. In the second two scenarios, the slope parameter is fixed to unity (n = 1). Figure 7 illustrates both the k sn − E CRN ( Fig. 7a and b) and corresponding E Mod −E CRN relationships, wherein E Mod represents the simulated river incision ( Fig. 7c and d).
In A-SPM scenario 1 (Table 4, Fig. 7c), we assume a spatially uniform erodibility (L E fixed to 1 in Eq. 11) and leave the erosion efficiency coefficient (K ) and the slope parameter n as free parameters during model optimization. The optimized fit between simulated erosion (E; Eq. 2) and E CRN is shown in Fig. 7c. The optimized fit results in a high degree of data scattering, resulting in an NS model efficiency of 0.5, an R 2 of 0.5, an ME of 3.25, and optimized values for K and n of respectively 0.73 m 0.1 s −1 and 1.07. The fit between k sn and E CRN (Fig. 7a) or simulated river incision  (Table 2). b If R is fixed, a uniform mean runoff value of 0.8 m yr −1 is used for all catchments. If R is variable, catchment-specific values are used (Table 2). c If k is fixed, a uniform mean discharge variability value of 1.01 is used for all catchments. If k is variable, catchment-specific values are used (Table 2). d The slope exponent (n) is optimized as a free parameter in A-SPM 1-2. It is fixed to 1 in A-SPM 3-4 (see text).  (Table 2). By correcting the E CRN values for lithological heterogeneity, the k sn − E CRN relationship becomes significantly nonlinear (n = 1.63 ± 0.5). (c) A-SPM scenario 1 (see Table 4). Modeled erosion rates for catchments consisting of strong rocks (blue) are mostly overpredicted and plot below the 1 : 1 line. Modeled erosion rates for catchments consisting of weak rocks (red) are mostly underpredicted and plot above the 1 : 1 line. (d) A-SPM scenario 2 (Table 4) in which spatially variable lithological erodibility is explicitly accounted for. A complete overview of all best model fits for A-SPM scenarios 1-4 is given in Fig. S8. and measured denudation rates (Fig. 7c) hints at the existence of a correlation between E CRN and river incision rates. The fit shown in Fig. 7c illustrates that modeled erosion rates for catchments with a low mean erodibility index (high resistance to erosion) are mostly overpredicted (plotting below the 1 : 1 line), whereas modeled erosion rates of catchments with a high erodibility index are mostly underpredicted (plotting above the 1 : 1 line).
In A-SPM scenario 2 (Table 4, Fig. 7d), we quantify the impact of varying lithology by using sub-catchment-specific values for the lithological erodibility (L E in Eq. 11) and leaving k a and n as free optimization parameters. The optimized fit between simulated river incision (E, Eq. 11) and E CRN is shown in Fig. 7d. Optimization results in an NS model efficiency of 0.73, an R 2 of 0.73, an ME of 2.23, and optimized values for k a and n of respectively 0.07 m 0.1 s −1 and 1.63. Considering lithological erodibility strongly reduces data scatter surrounding the fit. The importance of lithological strength in controlling the A-SPM and the k sn − E CRN relation (Fig. 7b) confirms that strong metamorphic and plutonic rocks erode at slower rates than lithologies that are less resistant to weathering such as volcaniclastic deposits. The erodibility index appears to provide an appropriate scaling of relative rock strength: analysis of residuals did not reveal any significant relation of residuals with lithology. When using spatially variable, sub-catchment-specific lithological erodibility values (L E ) (Fig. 7d), the n coefficient of the SPM is considerably larger than unity (n = 1.63) and the k sn − E CRN relationship becomes nonlinear (Fig. 7b), corroborating earlier empirical findings (DiBiase et al., 2010;Harel et al., 2016;Lague, 2014;Whittaker and Boulton, 2012). To evaluate the impact of a variable n exponent on the performance of the empirical A-SPM, we executed two more model optimizations.
In A-SPM scenario 3 (Table 4, Fig. S8c), we assume a spatially uniform lithology and erodibility (L E fixed to 1 in Eq. 11), fix n to 1 and only leave K to be optimized as a free model parameter. With an NS model efficiency of 0.5, an R 2 of 0.5, an ME of 3.2 and an optimized value for K of 1.00 m 0.1 s −1 , the model fit and performance are similar to the values obtained in scenario 1.
In A-SPM scenario 4 (shown in Table 4, Fig. S8d), lithological variability is considered (using sub-catchmentspecific values for L E in Eq. 11), n is fixed to 1 and K is a free model parameter. With an NS model efficiency of 0.51, an R 2 of 0.56, an ME of 3.05 and an optimized value for K of 1.4 m 0.1 s −1 , the model performance is much lower than when leaving the slope exponent n as a free parameter (A-SPM scenario 2).
The results from the four scenarios show that a nonlinear relationship between river steepness (k sn , representing river incision rates) and E CRN is unveiled when the lithological heterogeneity is explicitly taken into account (Fig. 7b). Likewise, a nonlinear river incision model (A-SPM scenario 2; Fig. 7d) in which lithological heterogeneity is considered outperforms the other evaluated A-SPM scenarios (Table 4).

Runoff-based and stochastic-threshold stream power models
The previous analysis shows that the explanatory power of the A-SPM model, and therefore the k sn − E CRN relationship, improves when considering spatial variations in lithology. Moreover, when considering variations in lithological erodibility, river incision is found to be nonlinearly depen-dent on the channel slope (S), with n = 1.63. In a next step we evaluate whether this nonlinear relation can be explained by spatial and/or temporal rainfall variability and/or the existence of thresholds for river incision (Table 4: R-SPM scenarios 1-2 and ST-SPM scenarios 1-8; Fig. 8).

Runoff-Based SPM (R-SPM)
In a first set of model runs, we evaluate the performance of the Runoff-Based Stream Power Model (R-SPM Eq. 12) to evaluate the role of spatially variable runoff using catchmentspecific values for mean runoff (R derived from the Water-GAP data; reported in Table 2 and shown in Fig. 6).
In R-SPM scenario 1 (Table 4, Fig. S9a), lithological variability is not considered (L E fixed to 1 in Eq. 12). With an NS model efficiency of 0.49, an ME of 3.57 and an R 2 of 0.51, model performance is comparable to the regular A-SPM under uniform lithology with n fixed to 1 (NS = 0.5). This illustrates that studying spatial runoff variability is not feasible when ignoring the confounding role of lithological erodibility in controlling denudation rates.
In R-SPM scenario 2 (Table 4, Fig. 8a), lithological variability is considered (using sub-catchment-specific values for L E in Eq. 12). With an NS model efficiency of 0.7, an ME of 2.61 and an R 2 of 0.75, model performance is close to that of the regular A-SPM under uniform lithology with n 1 (NS = 0.72). This model simulation therefore suggests that spatial variations in runoff can account for the nonlinearity in the k sn − E CRN relationship: while slope dependency in the R-SPM is fixed to unity (see derivation in Eq. 4a-c), the model is capable of explaining the spatial pattern in denudation rates. This implies that orographic rainfall and thus runoff gradient as shown in Fig. 6 influence the efficiency of river incision. The offset between the R 2 (0.75) and NS (0.70) values can be attributed to the way in which these metrics work: whereas R 2 evaluates the goodness of the linear fit between modeled and measured observations, NS evaluates the absolute differences between modeled and observed denudation rates. Hence, for the NS model efficiency to be high, observations must fit on the 1:1 line (Fig. 8a). However, most of the simulated values for low denudation rates are overestimated when using the optimized parameter values of the R-SPM and plot below the 1 : 1 line (Fig. 8a). Therefore, we conclude that the R-SPM performs well in predicting measured denudation rates but low denudation rates are overestimated, resulting in an NS and ME value respectively slightly lower and higher than those of the empirical A-SPM. In the following section we evaluate whether introducing temporally variable runoff coefficients and/or incision thresholds can further improve the performance of a river incision model.  (Table 4) using the average lithological erodibility (L E ) and runoff R values per sub-catchment (both listed in Table 2). (b) ST-SPM scenario 7 (Table 4) using the average lithological erodibility (L E ) and runoff (R) values, as well as an incision threshold (τ c = 14 Pa). Numbered observations in (b) correspond to catchment IDs as listed in Table 2 (see also the discussion in Sect. 5). A complete overview of all best model fits for R-SPM scenarios 1-2 and ST-SPM scenarios 1-8 is given in Figs. S9 and S10, respectively.

Stochastic-Threshold SPM (ST-SPM)
In a final series of model runs, we use the Stochastic-Threshold Stream Power Model (ST-SPM, Eq. 13) to evaluate the role of spatially variable runoff (catchment-specific R; reported in Table 2 and shown in Fig. 6) in combination with catchment-specific runoff variability (k; reported in Table 2) and the presence of incision thresholds (τ c in ψ in Eqs. 4 and 10). Table 4 reports details on the different model scenarios in which ST-SPM is optimized to the observed E CRN data considering all possible combinations (4) of uniform or spatially variable catchment mean runoff (R) and uniform or spatially variable catchment mean runoff variability (k). For reference, the four scenarios include both uniform and spatially variable lithological erodibility, L E (eight scenarios in total).
In ST-SPM scenarios 1-4 (Table 4, Fig. S10a-d), the ST-SPM is optimized assuming a constant erodibility (L E fixed to 1). Similar to what has been found for the R-SPM, model performance is not any better compared to the use of a simple A-SPM when not considering lithological variability. This confirms that optimizing more complex river incision models (such as the ST-SPM) has little added value when the heterogeneity in environmental conditions (lithological heterogeneity) is not considered.
In ST-SPM scenarios 5 and 6 (Table 4, Fig. S10e-f), catchment mean runoff (R) is fixed to the average value of all catchments (0.82 m yr −1 ) in order to evaluate the role of (i) variations in observed temporal runoff variability (k) and (ii) optimized values for the incision threshold (τ c ). In scenario 5, k is fixed to the average value for all catchments (k = 1.01), whereas in scenario 6, k is set to the catchmentspecific values as listed in Table 2. Both scenarios (5 and 6) perform well with an NS value equalling 0.71, indicating that temporal runoff variability (k) is not influencing model performance. Regardless of the lack of spatially variable runoff (R), both scenarios perform as well as R-SPM scenario 2, in which runoff variability was considered. The good performance of ST-SPM scenarios 5 and 6 can be attributed to the presence of an incision threshold (ψ > 0 in Eq. 13) at which τ c is optimized to a value of ca. 30 Pa (Table 4). The fact that the use of the ST-SPM with constant runoff values yields a good model fit suggests that part of the nonlinear relationship between river steepness, k sn , and E CRN can be attributed to the presence of thresholds for river incision to occur (Lague, 2014).
ST-SPM scenarios 7 and 8 (Table 4, Figs. S10e-f and 8b) are similar to scenarios 5 and 6, with the difference that spatial runoff variability is considered by using catchmentspecific values for runoff (R; Table 2). Similarly to scenarios 5 and 6, using catchment-specific values for k does not improve model performance, resulting in a similar model performance for scenarios 7 and 8. Overall, ST-SPM scenarios 6 and 7 result in the highest model performance of all tested scenarios, with an NS model efficiency of 0.75, an ME of 2.22 and 2.21, and an R 2 of 0.75. The optimized model fit for ST-SPM scenario 7 is shown in Fig. 8b and corresponds well with the 1 : 1 line between modeled and observed denudation rates. Optimized values for τ c are ca. 14-15 Pa, which is in the range but at the lower spectrum of earlier documented values for critical shear stress (e.g., Shobe et al., 2018, report τ c values between 10 and 1000 Pa). Contrary to the R-SPM with which low denudation rates are overestimated (Fig. 8a), the ST-SPM does predict low denudation rates better due to the consideration of an incision threshold that mainly influences simulated river denudation rates at the lower end of the spectrum.
ST-SPM scenarios 7 and 8 have a model error (ME of 2.22 and 2.21, respectively) similar to the model error of A-SPM scenario 2 (ME = 2.23). Hence, we conclude that an ST-SPM considering spatial variations in runoff and simulating a critical threshold for river incision performs as well as an A-SPM with the effect of allogenic (runoff variability) and autogenic (incision thresholds) response cast in the lumped empirical incision parameters. While the R-SPM and ST-SPM do not necessarily predict spatial patterns in observed E CRN rates better than an A-SPM, they do enable one to simulate the effect of runoff variability and incision thresholds, therefore providing an operational tool to simulate past and future climate changes. Note that differences in model performance between R-SPM scenario 2 and ST-SPM scenarios 5-8 are existent but not very pronounced. To evaluate the significance of these differences, our analysis should be repeated on larger datasets capturing a wider variability in denudation rates and hydrology.

Equilibrium between river incision and hillslope denudation
In theory, rates of hillslope denudation equal rates of river incision if landscapes are either in a steady state or if transient landscapes are characterized by rapid hillslope response (e.g., threshold hillslopes). Steady-state landscapes can only be achieved under stable climatic and tectonic settings that prevail over millions of years. Such stability is rarely met in tectonically active regions where landscapes continuously respond to environmental perturbations (Armitage et al., 2018;Bishop et al., 2005;Campforts and Govers, 2015). The downstream reaches of the Paute catchment are a good example of a transient landscape where a major knickzone is propagating upstream, resulting in steep threshold topography downstream of the knickzone (Fig. S3 and Vanacker et al., 2015). Facing a sudden lowering of their base level after river rejuvenation, soil production and linear hillslope processes (Campforts et al., 2016) are no longer in equilibrium with rapidly incising rivers (Fig. 15 in Hurst et al., 2012). In steep topography, hillslopes may transiently evolve to their mechanically limited threshold slope whereby any further perturbation will result in increased sediment delivery through mass-wasting processes such as rockfall or land-sliding (Bennett et al., 2016;Blöthe et al., 2015;Burbank et al., 1996;Larsen et al., 2010;Schwanghart et al., 2018). Given the erratic nature of landslides, not all threshold hillslopes will respond simultaneously to base-level lowering depending on local variations in rock strength, hydrology, land use and seismic activity (Broeckx et al., 2020;Guns and Vanacker, 2014). Therefore, catchments in transient landscapes might experience hillslope denudation with highly variable rates (Vanacker et al., 2020).
We argue that CRN-derived denudation rates in the Paute basin both overestimate and underestimate long-term incision rates in these catchments. Overestimation may result from the occurrence of recent, deep-seated landslide events, that deliver sediments with a low CRN concentration to rivers (Tofelde et al., 2018). Underestimation, in turn, may occur if long-term hillslope lowering is accomplished by rare and large landslides whose return periods exceed the integration time of CRN-derived denudation rates (Niemi et al., 2005;Yanites et al., 2009).
Longitudinal profiles of rivers draining to the knickzone in the Paute catchment show marked knickpoints. This is particularly evident in catchments 9-16 ( Fig. 1) where k sn values are high (Fig. 2) and knickpoints appear in the longitudinal profiles (Figs. S3 and S4). Simulated erosion rates for some of these catchments deviate from CRN-derived denudation rates ( Fig. 8b; IDs 13, 14 and 16), whereas for others (e.g., IDs 9 and 11), predictions from the stochastic-threshold river incision model show good agreement with E CRN data. For catchments with a sufficiently large drainage area, modeled incision rates correspond well with E CRN (IDs 9 and 11 being both ca. 700 km 2 ), most likely because the mechanisms that potentially cause overestimation and underestimation cancel each other out at this scale. For smaller catchments (IDs 8, 13, 14 and 16 all being < 12 km 2 ) there is a discrepancy between simulated river incision rates and E CRN .
Although river incision models can be used to simulate denudation patterns in large transient catchments (> 10 km 2 ), there is a need to develop alternative approaches including landslide mechanisms in long-term landscape evolution models such as the TTLEM (Topo Toolbox Landscape Evolution Model; Campforts et al., 2017) or Landlab (Hobley et al., 2017).

Integration timescales of E CRN and k sn
Our analysis reveals the potential role of temporal and spatial variations of rainfall in long-term landscape evolution. Integration times of CRN-derived denudation rates measured in the Paute basin are of the order of 1.5-175 kyr. In contrast, response times of longitudinal river profiles generally range 0.25-2.5 Myr (Campforts et al., 2017;Goren et al., 2014;Snyder et al., 2003;Whipple, 2001;Wobus et al., 2006). During thousand-year to million-year timescales, it is unlikely that temporal rainfall distributions remain stationary. Thus, there is little reason to assume that the hydrometeo-rological data that we inferred from 35 years of data fully capture rainfall variability over the response times of river profiles and hillslopes. Contrary to temporal variations, the spatial patterns in orographic precipitation are characteristic of the formation of a mountain range at geological timescales (Garcia-Castellanos and Jiménez-Munt, 2015). In the southern Ecuadorian Andes, moist air advection via the South American low-level flow generates pronounced patterns of orographic precipitation (Campetella and Vera, 2002). These patterns might have persisted since at least the most recent phase of Andean uplift in the Late Miocene (Spikings et al., 2010;Spikings and Crowhurst, 2004). Present-day rainfall and runoff spatial gradients (Fig. 6) are thus deemed to be informative for spatial patterns of discharge at longer timescales (Sect. 3.1). The performance of the stream power models underscores this interpretation. While accounting for spatial patterns in runoff improves the performance of a stochastic-threshold SPM (Table 4 and Sect. 4.2.2), incorporating proxies for temporal discharge variability leads to no improvement of model performance (the role of k in Sect. 4.2.2).

Impact of lithological heterogeneity on long-term river incision rates
In all our simulations, model efficiency improves when incorporating rock strength variability (Table 4), which is consistent with earlier studies (Lavé and Avouac, 2001;Stock and Montgomery, 1999). In the absence of generally accepted metrics of erodibility, we employ an empirically derived lithological erodibility index (L E ; Eq. 7) based on the age and lithological composition of stratigraphic units. Owing to its simplicity, this or a similar index can potentially be applied at continental to global scales at which information on rock physical properties is usually lacking the detail available at smaller spatial scales (Attal and Lavé, 2009;Nibourel et al., 2015). Notwithstanding, river incision also depends on other rock properties such as the density of bedrock fractures, joints and other discontinuities . Fracture density has in turn been linked to spatial patterns of seismic activity (Molnar et al., 2007). Given the limited variability of seismic activity within the Paute basin (Petersen et al., 2018; Fig. S2), seismicity was not considered in our regional analysis but could be considered when applying our approach to other regions characterized by more spatial seismic variability. Incorporating spatial patterns of rock strength not only reduces the scatter surrounding the modeled river incision versus E CRN -derived denudation rates, but also controls the degree of nonlinearity between river steepness (k sn ) and denudation rates, expressed by the slope exponent n in the A-SPM (Fig. 7). Omitting rock strength variability results in a k sn − E CRN relation that is close to linear in the Paute catchment (with n = 1.07). This contradicts other studies in which lithology was assumed to be uniform and n has been reported  Fig. 7). It performs well when lithological heterogeneity is considered and all parameters are freely calibrated, resulting in a slope steepness exponent (n; see Eq. 1) of 1.63 (for a full overview of model parameters, see Table 4). In an A-SPM scenario in which n is fixed to 1, the model performance strongly deteriorates. In the R-SPM and ST-SPM models, n is fixed to the theoretically derived value of 1. The R-SPM model explicitly incorporates runoff variability (see Fig. 8a), and the ST-SPM model also includes an incision threshold (see Fig. 8b). Both models perform well when lithological heterogeneity is accounted for. Overall, the best model performance (highest NS and smallest ME) is obtained under the ST-SPM scenario in which lithological and runoff variability, as well as river incision thresholds, are considered. to be larger than 1 (e.g., DiBiase et al., 2010;Lague, 2014;Whittaker and Boulton, 2012).

Impact of runoff variability on long-term river incision rates
The A-SPM performs well in explaining E CRN when lithology is considered and n 1 ( Fig. 9; high NS model efficiency, low ME). For n = 1, the performance of the A-SPM is low. The result is consistent with earlier studies reporting n 1 (e.g., DiBiase et al., 2010;Harel et al., 2016;Ouimet et al., 2009;Scherler et al., 2014), which Lague (2014) attributes to discharge variability and incision thresholds. We tested this hypothesis using the R-SPM and ST-SPM. Our results underscore the fact that the nonlinear relationship between k sn and E CRN can be attributed to the spatial variability of mean annual runoff. Figure 9 shows that the R-SPM (in which n is fixed to the theoretically obtained value of 1) performs better than an A-SPM when n is fixed to 1. In tectonically active regions, steep river reaches often spatially coincide with the edge of the mountain range over which mean annual rainfall rates are highest. Accordingly, if variations in runoff are not considered, the effects of orographic precipitation will be partly accommodated by a nonlinear relationship between river steepness and denudation rates. The R-SPM accounts for this effect but results in an underestimation of low river incision rates (Fig. 8a). Moreover, the model error (Fig. 9b) shows that the R-SPM does not perform as well as the A-SPM. In a final set of model runs, we apply the ST-SPM with the explicit simulation of a threshold, which improves model performance, especially for low denudation rates, resulting in an overall model error equal to the one obtained with the A-SPM with n 1 (Fig. 9). This finding points to the potentially important role of thresholds for river incision to occur.
The model performance of the ST-SPM equals the performance of an empirical A-SPM with a slope exponent 1 (Fig. 9). Our interpretation is that (i) spatial variations in runoff and (ii) the incision thresholds are the causes of an observed nonlinear relation between k sn and E CRN . With a seemingly equal model performance, one could wonder what the benefit of the more complex ST-SPM model is over a simple, nonlinear A-SPM. The aim of using an ST-SPM is, however, beyond fitting observed denudation rates: we want to identify to what extent the system is forced by internal allogenic dynamics such as the presence of incision thresholds or external autogenic forces such as runoff variability. The use of the ST-SPM illustrated that both processes can be accounted for in a quantitative way so that future studies can explicitly consider their role when reconstructing past landscape response to external perturbations (e.g., climate change).
To further explore the interdependency between incision thresholds and spatial runoff variability, our approach can be applied to CRN datasets covering regions characterized by more pronounced rainfall gradients (e.g., in Chile: Carretier et al., 2018). Accounting for spatial variations in temporal discharge distributions (with k characterizing the stochastic flood occurrence) did not improve or deteriorate model performance (ST-SPM scenario 8 in Table 4). This is likely due to data limitations: the necessary data to characterize temporal variations in discharge within a given catchment over a timescale that is relevant for CRN-derived denudation rates are, at present, not available.
Our finding that spatial patterns in precipitation are related to river incision patterns corroborate findings in Hawaii (Ferrier et al., 2013), the Himalaya (Scherler et al., 2017) and in the Andes (Sorensen and Yanites, 2019). Sorensen and Yanites (2019) evaluated the role of latitudinal rainfall variability in the Andes in erosional efficiency using a set of numerical landscape evolution model runs. They show that erosion efficiency in tropical climates at low latitudes, where the Paute basin is located, is well captured by the spatial pattern of mean annual precipitation and thus runoff. At higher latitudes (25-50 • ) where mean annual precipitation decreases but erosivity is still high due to the intensity of storms (Sorensen and Yanites, 2019), river erosivity is likely better captured by spatial patterns in storm magnitude and frequency.

Conclusions
Numerous studies report a nonlinear relationship between channel steepness and CRN-derived denudation rates. Based on the growing mechanistic understanding of river incision processes, this nonlinear relationship is often attributed to incision thresholds. Rainfall variability controls the frequency of river discharges that exceed incision thresholds. Although the dynamic interplay between stochastic runoff and incision thresholds theoretically results in a nonlinear relationship between channel steepness and denudation rates, coupling theory with field data has been challenging. We address this issue in the Paute basin where we scrutinize the relationship between CRN-derived denudation rates and river incision using three different stream power models. We show that lithological variability obscures the relationship between channelsteepness-based river incision and CRN-derived denudation rates.
In order to account for rock strength variability, which for the Paute basin is mainly ascribed to variations in lithological strength in the study area, we propose the use of an empirical lithological strength index that is based on the lithology and age of lithostratigraphic units. Including lithological variability in the models increases the correlation between river steepness and denudation rates and reveals a nonlinear relation, which we seek to explain using a stochasticthreshold SPM (ST-SPM). Using a downscaled version of a hydrological reanalysis dataset, we show that the combination of spatially varying runoff and incision thresholds explains the observed nonlinear relationship. We do not detect, however, an impact of temporal discharge patterns on river incision. We attribute this to the integration time of CRN data and response times of river longitudinal profiles, which extend beyond the timescales at which discharge distributions can be assumed to be stationary.
Our study shows the potential of an ST-SPM to infer regional and, potentially, continental to global differences in rainfall variability. However, we emphasize that its application needs to account for confounding environmental variables such as rock strength. Simplified process representation of stream-power-based incision models (e.g., lack of sediment-bedrock interactions) might explain part of the remaining scatter between predicted and measured denuda-tion rates. However, residual analysis shows that most of the remaining scatter occurs in small transient catchments (up to 10 km 2 ) where sporadic mass-wasting processes on hillslopes likely obscure the relation between measurements and predictions. Elucidating this relation further could potentially be fostered by dynamic numerical landscape evolutions models that explicitly simulate the coupling between transient river adjustment and hillslope response. Data availability. All data used in this paper are freely available from the referenced agencies. The hydrological data used in this paper were created in the framework of the Earth2ObserveWater Resource Reanalysis project (WRR2; Schellekens et al., 2017) and are available from 1979 to 2014. Specifically, we use the hydrological data calculated with the global water model Water-GAP3 (Water -Global Assessment and Prognosis: Alcamo et al., 2003;Döll et al., 2003) at a spatial resolution of 0.25 and a daily temporal resolution (http://www.earth2observe.eu, last access: 19 May 2020) . A high-resolution mean annual precipitation map (PRIDW) is calculated by downscaling the WaterGAP precipitation data (P ) using a series of rain gauge observations (338 stations, 1990-2013) from the Ecuadorian national meteorological service (INAMHI; available from http://www.serviciometeorologico.gob.ec/biblioteca/, last access: 19 May 2020) (INAMHI, 2020). Topographic data are available from NASA (Farr et al., 2007). Lithological data are provided in the Supplement. Calculations were done in MATLAB using the Topo Toolbox Software (Schwanghart and Scherler, 2014).
Author contributions. BC conceived the project in collaboration with VV, MVM and GG. BC performed the statistical analysis and took the lead in writing the paper. All authors contributed to shaping the research and analyses, as well as writing the paper.
Competing interests. The authors declare that they have no conflict of interest.
Financial support. Benjamin Campforts received a postdoctoral grant from the Research Foundation Flanders (FWO, reference number 12Z6518N). Gustavo E. Tenorio received a PhD grant from ARES -the Académie de recherche et d'enseignement supérieur de la Belgique -as part of the PRD research cooperation project "Strengthening the scientific and technological capacities to implement spatially integrated land and water management schemes adapted to local socio-economic, cultural and physical settings", as well as a grant from the Secretaría de Educación Superior de Ciencia, Tecnología e Innovación de la República del Ecuador.
The article processing charges for this open-access publication were covered by a Research Centre of the Helmholtz Association.
Review statement. This paper was edited by Simon Mudd and reviewed by two anonymous referees.