Rainfall erosivity values are required for soil erosion prediction. To calculate the mean annual rainfall erosivity (

Intense soil erosion has a significant impact on the environment, for
example presenting a major threat for agricultural production or leading to
increased sedimentation and pollution in rivers, which also affects aquatic
organisms. Soil erosion modelling can be performed to detect critical parts
of the landscape and design suitable countermeasures to reduce soil losses.
One of the most frequently applied models for soil erosion modelling is the
Universal Soil Loss Equation (USLE) and its successor the Revised Universal Soil Loss Equation (RUSLE) (e.g. Renard et al., 1997). In the scope of the RUSLE model, soil erosion is described by six factors, one of which is the rainfall erosivity factor

In order to obtain robust rainfall erosivity values, high-resolution observed rainfall data are needed, ideally at a temporal resolution of 1 or 5 min (Dunkerley, 2019) and at a spatial density sufficient to represent the spatial heterogeneity (Peleg et al., 2021). However, rainfall data at this resolution are often only available for shorter periods of observation (e.g. 10 or 20 years) or not at all. A solution to overcome this shortcoming is the use of stochastic rainfall models that allow the generation of time series of arbitrary length. Additionally, these models can generate time series for ungauged locations through regionalisation of model parameters. In some cases, gridded rainfall erosivity data are used as input to the USLE-type soil erosion models. In terms of spatial description of the rainfall erosivity, most often station-based data are interpolated (e.g. Panagos et al., 2017), while satellite-based data could in the future provide useful estimates of gridded rainfall erosivity (e.g. Bezak et al., 2022). However, due to the station-based approach of the USLE, the generation of spatial rainfall is not required but would be useful for more sophisticated approaches to estimate erosion (Eekhout et al., 2021). A few studies have investigated the possibility of applying stochastic precipitation models to generate rainfall time series to then estimate rainfall erosivity (e.g. Jebari et al., 2012; Lobo et al., 2015; de Oliveira et al., 2018; Haas et al., 2018). Methods used to model rainfall include cluster-based models (e.g. Onof et al., 2000; Onof and Wang, 2020), cascade models (e.g. Molnar and Burlando, 2005; Pohle et al., 2018; Müller and Haberlandt, 2018), method-of-fragments models (e.g. Breinl and Di Baldassarre, 2019), and alternating renewal models (e.g. Callau Poduje and Haberlandt, 2017), as well as rainfall models that form part of weather generators (Peleg et al., 2017).
The parameters of these methods are estimated based on observations, and the complexity of the models generally depends on the target temporal resolution. Thus, most studies are focused on daily data. For example, CLImate GENerator (CLIGEN) was applied to obtain daily rainfall estimates and calculate rainfall erosivity using daily data (e.g. de Oliveira et al., 2018; Lobo et al., 2015; Wang et al., 2018). Shortcomings such as sensitivity to the input parameters have been reported in the literature (e.g. Meyer et al., 2008; Haas et al., 2018). On the other hand, temporal high-resolution time series (i.e. 5 min) are less often generated using stochastic rainfall models, although in recent years advancements have been made (e.g. Haberlandt et al., 2008; Vandenberghe et al., 2011; Vernieuwe et al., 2015; Callau Poduje and Haberlandt, 2017; Müller-Thomy, 2020).

Thus, a few studies (e.g. de Oliveira et al., 2018; Haas et al., 2018) have investigated whether stochastic rainfall models are able to correctly predict rainfall erosivity patterns at specific locations. If a stochastic rainfall model is able to mimic the rainfall erosivity characteristics, generated long-term high-resolution rainfall time series should then allow a robust estimation of annual and even monthly erosivity patterns. Similarly, a limited number of studies (e.g. Angulo-Martinez et al., 2009) have investigated the performance of different interpolation techniques for the mapping of rainfall erosivity.

The main aim of this study is to evaluate and compare different rainfall generators and regionalisation approaches in order to obtain, either directly or indirectly, annual rainfall erosivity estimates for ungauged locations. Given a lack of high-resolution rainfall time series, the research question is whether these tested methods can adequately reproduce observed annual rainfall. As a follow-up research question, we investigate the performance of the tested methods in terms of specific erosive event characteristics. This information is not directly needed as input to the USLE-type models but is often investigated in rainfall erosivity studies. Finally, given the existence of a high-resolution rainfall time series, we also investigated how long these time series should be in order to obtain a stable annual rainfall erosivity estimate at a site. Hence, this information is relevant both for soil erosion model applications using USLE-type models and for studies investigating erosive event characteristics.

All tests were performed via leave-one-out cross validation, as the premise of this study is that high-resolution time series are not widely available. Additionally, the effect of station density on the regionalisation performance was assessed by performing each test with five different station counts (20 %, 40 %, 60 %, 80 %, and 100 % of observed stations). To minimise the sampling uncertainty, 20 realisations of each test at each station density were performed.
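The sampling design described above can be illustrated with a minimal Python sketch. It assumes that the density scenarios are nested (each scenario adds stations to the previous one) and that the stations of the smallest scenario serve as cross-validation targets. The function names `nested_density_scenarios` and `leave_one_out`, the rounding of station counts, and the random selection are illustrative assumptions, not the procedure used in the study.

```python
import numpy as np


def nested_density_scenarios(n_stations, fractions, rng):
    """Draw nested station subsets: each density contains the previous one.

    Assumption: stations are picked at random and counts are rounded to the
    nearest integer.
    """
    order = rng.permutation(n_stations)
    return {f: order[: int(round(f * n_stations))] for f in fractions}


def leave_one_out(targets, available):
    """Yield (target, donors), where donors are all available stations
    except the target station itself."""
    for t in targets:
        yield t, available[available != t]


fractions = (0.2, 0.4, 0.6, 0.8, 1.0)
n_realisations = 20

for realisation in range(n_realisations):
    rng = np.random.default_rng(realisation)
    scenarios = nested_density_scenarios(159, fractions, rng)
    targets = scenarios[0.2]  # assumed reference stations
    for f in fractions:
        for target, donors in leave_one_out(targets, scenarios[f]):
            pass  # a regionalisation method would be applied here
```

Because the same targets are evaluated at every density in this sketch, differences in performance between scenarios can be attributed to the number of donor stations rather than to a change in the evaluated locations.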

Location of all recording stations (

High-resolution observed rainfall data for 159 stations bounded by the
rectangle 7 to 12

Long-term climate (1881–2019) averaged across the German federal state of Lower Saxony that is investigated in the scope of this study. Source: German Weather Service (DWD) Climate Data Center (CDC) (CDC, 2020).

In this study, four different methods to calculate annual rainfall erosivity
(

Overview of the estimation of the annual rainfall erosivity (

Rainfall erosivity is one of the factors that has the highest impact on soil
erosion rates. Rainfall erosivity is characterised by multiple properties of
rainfall events such as the kinetic energy of raindrops, rainfall intensity,
and rainfall duration. In order to calculate annual rainfall erosivity for a
selected time span, the following equation proposed by Renard et al. (1997)
can be used:

Erosive rainfall events are defined according to the RUSLE methodology (Renard et al., 1997). A rainfall event is considered erosive if the total volume exceeds 12.7 mm of rain or if the maximum volume in 15 min is more than 6.35 mm. Here a 6 h period without rain is used in order to separate two consecutive erosive rainfall events. The monthly and annual rainfall erosivity values are then calculated based on the rainfall erosivity of all erosive events.
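The event definition above can be sketched in Python for a 5 min rainfall series. This is a minimal illustration under stated assumptions: the function names are hypothetical, a dry gap of at least 6 h (72 consecutive dry steps) is taken to separate events, and the 15 min maximum is computed as a rolling sum over three 5 min steps.

```python
import numpy as np

STEP_MIN = 5                          # time step of the series (min)
DRY_GAP_STEPS = 6 * 60 // STEP_MIN    # 6 h without rain separates events
WIN_15_STEPS = 15 // STEP_MIN         # number of steps in a 15 min window


def split_events(rain_mm):
    """Return (start, end) index pairs (end exclusive) of rainfall events."""
    wet = np.flatnonzero(np.asarray(rain_mm) > 0)
    if wet.size == 0:
        return []
    dry_between = np.diff(wet) - 1    # dry steps between consecutive wet steps
    breaks = np.flatnonzero(dry_between >= DRY_GAP_STEPS)
    starts = np.concatenate(([wet[0]], wet[breaks + 1]))
    ends = np.concatenate((wet[breaks], [wet[-1]])) + 1
    return list(zip(starts.tolist(), ends.tolist()))


def is_erosive(event_mm):
    """Apply the two thresholds: total > 12.7 mm or 15 min maximum > 6.35 mm."""
    event_mm = np.asarray(event_mm, dtype=float)
    total = event_mm.sum()
    max_15 = np.convolve(event_mm, np.ones(WIN_15_STEPS)).max()
    return bool(total > 12.7 or max_15 > 6.35)


def erosive_events(rain_mm):
    """Return the index pairs of all erosive events in the series."""
    rain_mm = np.asarray(rain_mm, dtype=float)
    return [(s, e) for s, e in split_events(rain_mm) if is_erosive(rain_mm[s:e])]
```

In this sketch, a single 5 min step with 7 mm qualifies through the 15 min criterion, whereas a long low-intensity event qualifies only once its total exceeds 12.7 mm.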

The direct interpolation of erosivity was carried out using geostatistical
methods (see e.g. textbooks from Isaaks and Srivastava, 1990, or Goovaerts,
1997). The spatial dependence of

A second approach for the regionalisation of the annual

Stochastic rainfall models allow for the generation of rainfall time series of arbitrary length, including for unobserved locations through regionalisation. For this study, the alternating renewal model (ARM) based on the theory of renewal processes was used to generate 5 min synthetic rainfall time series. In this model, rainfall is described as a series of independent alternating wet and dry spells, described by the three variables: wet spell amount (WSA), wet spell duration (WSD), and dry spell duration (DSD) (Fig. 4). Probability distributions were fitted to observations of these variables using the method of L-moments, with observed rainfall events being limited by a minimum WSA (1 mm) and DSD (60 min). Synthetic rainfall time series were then generated by producing random variates of these distributions.
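The external structure of such a model can be illustrated with a strongly simplified Python sketch. It is not the ARM used in this study: WSA, WSD, and DSD are drawn independently (the study models dependence with copulas), all three variables use a three-parameter Weibull distribution, the rainfall amount is spread uniformly over the wet spell, and all parameter values are hypothetical.

```python
import numpy as np

STEP_MIN = 5  # target resolution of the synthetic series (min)


def weibull3(rng, shape, scale, loc, size):
    """Random variates of a three-parameter Weibull distribution."""
    return loc + scale * rng.weibull(shape, size)


def generate_series(n_events, rng, dsd, wsd, wsa):
    """Alternate dry and wet spells and return (5 min series, spell amounts).

    dsd, wsd, wsa are (shape, scale, loc) tuples for dry spell duration (min),
    wet spell duration (min), and wet spell amount (mm).
    """
    dry_min = weibull3(rng, *dsd, n_events)
    wet_min = weibull3(rng, *wsd, n_events)
    amount_mm = weibull3(rng, *wsa, n_events)
    dry_steps = np.maximum(1, np.round(dry_min / STEP_MIN)).astype(int)
    wet_steps = np.maximum(1, np.round(wet_min / STEP_MIN)).astype(int)
    pieces = []
    for d, w, a in zip(dry_steps, wet_steps, amount_mm):
        pieces.append(np.zeros(d))        # dry spell
        pieces.append(np.full(w, a / w))  # wet spell, uniform intensity (assumption)
    return np.concatenate(pieces), amount_mm


rng = np.random.default_rng(42)
series, amounts = generate_series(
    1000, rng,
    dsd=(0.7, 1500.0, 60.0),  # hypothetical; location equals the 60 min minimum DSD
    wsd=(0.8, 200.0, 5.0),    # hypothetical
    wsa=(0.7, 3.0, 1.0),      # hypothetical; location equals the 1 mm minimum WSA
)
```

Setting the location parameters to the event thresholds keeps the generated spells consistent with the minimum WSA and DSD used to define observed events, which is one possible design choice rather than a documented property of the study's model.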

Schematic of the external structure of the ARM model. The black boxes describe rainfall events derived from observations.

Additionally, the temporal distribution of rainfall within a wet spell is
described by a double exponential function conditioned on the wet spell time
to peak (WSTP – modelled using a uniform distribution), wet spell peak
intensity (WSPI – modelled using a copula; see below), and WSA (Fig. 5).
Full details of the model can be found in Callau Poduje and Haberlandt
(2017), with the following alterations which have been found to provide a
better model performance, especially for regionalisation:

a two-parameter Khoudraji–Gumbel copula describes the dependence between WSA and WSD,

a two-parameter Tawn copula describes the dependence between WSD and the ratio WSPI : WSA,

the three-parameter Weibull distribution is used instead of the four-parameter kappa distribution for the variables DSD and WSA, being more robust in a regionalisation setting.

Internal structure of the ARM model according to Callau Poduje and Haberlandt (2017).

Another possibility to generate high-resolution rainfall time series is to
disaggregate daily time series, which generally exist for longer time
periods and with higher station densities. For this study the
micro-canonical cascade model after Müller-Thomy (2019, 2020) was
applied due to its performance in previous studies (e.g. Müller and
Haberlandt, 2015, 2018). The general cascade model scheme of disaggregating
one coarse time step into “

General scheme of the cascade model for the first two disaggregation steps with exemplary rainfall amounts for a daily total of 12 mm (blue boxes show wet time steps).

The cascade model parameters are estimated by the aggregation of observed 5 min time series of the recording stations available in each density scenario (no parameter calibration or optimisation was carried out). For the daily time series serving as a starting point for the disaggregation, the aggregated 5 min time series of all 159 stations are used, independent of the applied density scenario. To investigate how suitable the disaggregation of daily information is for unobserved locations, a regionalisation of the daily precipitation is done prior to the application of the cascade model. Here, an ordinary kriging (OK) approach was used for the regionalisation of daily rainfall time series from 2007 to 2016. The interpolation uses an isotropic exponential variogram as in Eq. (6), a minimum of 4 neighbours and a maximum of 12 neighbours within a radius of 150 km.
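The neighbourhood rule stated above (4 to 12 neighbours within 150 km) can be sketched as follows. The function name `select_neighbours` is hypothetical, coordinates are assumed to be planar and given in kilometres, and returning `None` when fewer than four stations lie within the radius is an assumption; the behaviour of the kriging software used in the study in that case is not described here.

```python
import numpy as np


def select_neighbours(target_xy, station_xy, min_n=4, max_n=12, radius_km=150.0):
    """Return indices of the nearest stations within the search radius.

    At most max_n indices are returned, ordered by increasing distance.
    Returns None if fewer than min_n stations lie within the radius
    (assumption for this sketch).
    """
    station_xy = np.asarray(station_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    dist = np.hypot(*(station_xy - target_xy).T)
    inside = np.flatnonzero(dist <= radius_km)
    if inside.size < min_n:
        return None
    return inside[np.argsort(dist[inside])][:max_n]
```

The selected indices would then define the stations entering the kriging system for the target location on a given day.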

In order to adequately test the performance of the different methods, all
regionalisations were performed in a cross-validation mode at varying
station densities. Five station density scenarios were chosen: 20 %
(

Five different station density scenarios shown for two realisations. The red stations are used for the cross validation for all station densities. The additional stations (blue) provide supplementary information for the regionalisation, with darker shades added to previous scenarios.

One further research question of interest is how long a rainfall time series must be in order to achieve a stable (i.e. robust) result of mean annual rainfall erosivity. The importance of such a research question lies in the fact that rainfall series used to calculate annual erosivity are often of limited length. Time series that are too short could lead to uncertain estimations of rainfall erosivity, which could affect the estimated soil erosion (i.e. under- or overestimation).

Using the alternating renewal rainfall model described in Sect. 3.3.1, we
investigated how many years of data are needed in order to obtain stable
estimations of the annual rainfall erosivity. For this purpose, 18 stations
with the longest observation time series length (mean

To assess the relative performance of the four different methods (Fig. 3),
three different evaluation criteria were chosen. The first is the Pearson
correlation coefficient

The main focus of this study was to evaluate the performance of the four
different tested methods in reproducing observed mean annual erosivity as
input to the USLE-type models. It is the most relevant attribute from the
perspective of the soil erosion modelling community. For the direct
regionalisation of erosivity (Direct-R), EDK with mean annual precipitation
(

Pearson's correlation coefficient, relative bias, and relative RMSE
results for the four tested methods. Box plots show the relevant statistic
for the 20 realisations (

With increasing station density, the median result was generally improved,
in particular for the Pearson correlation (Fig. 8). This is less noticeable
for

In terms of the number of stations needed to provide good estimates of
rainfall erosivity for the ungauged locations, it is clear that higher
station density yields better results, which was expected (Fig. 8). Thus,
Direct-R showed decreasing performance with decreasing station density,

Although the main focus of the study was the calculation of the mean annual
erosivity

However, three of the four methods presented in this study, namely Direct-P,
ARM, and Disagg, are able to reproduce erosive events themselves from which
the mean annual

Relative bias of all stations (

Median values over all reference stations (

For both the mean event duration and volume (Fig. 9), the results again show that the Direct-P method performed best. The Disagg method significantly overestimated the event duration and at the same time underestimated the event volume. However, with the Disagg method the station density had less of an effect on performance than with the other methods (not shown here). An overestimation of the erosive event duration was to be expected, as it was previously identified by Jebari et al. (2012) with overestimations higher than 40 %. Here the overestimations are higher as the disaggregation inherits the errors caused by the regionalisation of the daily rainfall. Because of the unbiased estimator, OK tends to smooth the spatial structures of the rainfall, leading to overestimation of the low intensities (explaining the longer event durations) and underestimation of the extreme ones (explaining the lower event volumes). The performance of the disaggregation may improve if another regionalisation method that better captures the temporal variability is employed.

The ARM method underestimated both mean event duration and mean event volume, which explains the underestimation in both the annual number of events and their volume. Thus, Direct-P outperformed the ARM and Disagg methods not only in terms of annual rainfall erosivity but also in terms of specific event characteristics. If erosive event characteristics are needed for ungauged sites, the Direct-P method should be preferred since it produced acceptable results and performed better than the two tested stochastic rainfall models.

Moreover, it should be noted that other available stochastic rainfall models could perhaps yield better performance in comparison to the selected stochastic rainfall models (see Sect. 1 for examples). The scope of the study limited the selection of stochastic rainfall models to the two presented in this study only. Since any rainfall model has its own unique strengths and weaknesses, it is possible that some other non-tested model could yield better performance in terms of reproducing maximum 30 min rainfall intensities, which are directly used for the estimation of the rainfall erosivity (Sect. 3.1).

Despite the fact that Direct-R and Direct-P methods yielded better
performance than the evaluated stochastic rainfall models, the benefit of
the latter is the ability to generate long time series of arbitrary length
of high-resolution data for unobserved locations. The goal of this section
is to investigate how long the synthetic time series should be in order to
obtain a stable estimate of the mean annual rainfall erosivity, which is
most frequently used as an input to the soil erosion models (Panagos et al.,
2015, 2017). Thus, this information is relevant for the soil erosion
modellers in order to evaluate the impact of potential bias in the
assessment of the mean annual rainfall erosivity on the soil erosion
modelling results. Using the ARM model and the methodology described in
Sect. 3.5, it was investigated how many years of data are needed in order to
obtain a stable estimate of annual rainfall erosivity. It is important to remember
that the ARM model performance for this task is better than what was shown
in Sect. 4.1 and 4.2, as here the ARM model was fitted directly to
observations and not regionalised. Figure 10 shows the results of this
investigation. The variability between different
realisations was quite high (Fig. 10). Moreover, investigation of the
intersection between the 5 % and 95 % realisation quantiles (i.e. with the
aim to exclude potential extremes) and the

Relationship between mean annual rainfall erosivity and number of
years used in the rainfall erosivity calculation. Panel

This study evaluated four methods that can be used to estimate the annual
mean rainfall erosivity (

For the mean annual rainfall erosivity, both tested direct regionalisation methods (Direct-R and Direct-P) outperformed the tested stochastic rainfall models (ARM and Disagg), with slightly better results for Direct-R (Fig. 8). Furthermore, in terms of method complexity, Direct-R can be regarded as the simplest since it does not require the fitting of any model parameters. Differences among tested methods were relatively large, for example, up to 25 % in relative bias.

The main drawback of the Direct-R method is that it cannot be used to estimate the number of erosive events or mean event duration without applying the model to every variable separately (e.g. number of erosive events, annual rainfall erosivity). This information is sometimes additionally required in erosivity studies, although it is not directly used by the USLE-type models. In contrast, the Direct-P method has the advantage that it is able to generate high-resolution time series of erosive events for ungauged sites, so that the number and characteristics of erosive events can be determined as well. In terms of the characteristics of the erosive events, the Direct-P method yielded better performance than both tested stochastic rainfall models.

Both rainfall generators have proven their applicability in the field of soil erosion modelling since they are able to produce long synthetic series of high-resolution data, which can be used to calculate stable rainfall erosivity estimates.

The cross-validation methodology using multiple density scenarios (Fig. 7)
indicated that all methods performed slightly better with increasing station
density. However, interpolation of rainfall erosivity for ungauged locations
will in the case of the Direct-R and Direct-P methods introduce some bias
(

Investigation of the impact of time series length on the annual rainfall erosivity for 18 stations was additionally carried out using the ARM model. More than 60 years of data were required in the case that one would like to obtain rainfall erosivity estimates within 20 % of the actual long-term mean annual rainfall erosivity.
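The idea behind such a length analysis can be sketched in Python. The sketch below is an assumption-laden illustration, not the procedure of Sect. 3.5: it resamples years with replacement from a long series of annual erosivity values, computes the 5 % and 95 % quantiles of the resulting means for each series length, and returns the shortest length for which both quantiles lie within a tolerance of the long-term mean. The function name `required_years` and the lognormal stand-in data are hypothetical.

```python
import numpy as np


def required_years(annual_r, tolerance=0.20, n_samples=1000, rng=None):
    """Shortest series length (years) whose 5-95 % range of sample means
    lies within +/- tolerance of the long-term mean, or None if not reached."""
    rng = np.random.default_rng() if rng is None else rng
    annual_r = np.asarray(annual_r, dtype=float)
    long_term = annual_r.mean()
    for n in range(1, annual_r.size + 1):
        # draw n_samples random n-year samples (with replacement; assumption)
        idx = rng.integers(0, annual_r.size, size=(n_samples, n))
        means = annual_r[idx].mean(axis=1)
        q05, q95 = np.quantile(means, [0.05, 0.95])
        if q05 >= (1 - tolerance) * long_term and q95 <= (1 + tolerance) * long_term:
            return n
    return None


# Hypothetical stand-in for annual erosivity values from a long synthetic series
rng = np.random.default_rng(7)
synthetic_annual_r = rng.lognormal(mean=6.0, sigma=0.8, size=500)
n_years = required_years(synthetic_annual_r, rng=rng)
```

The result of this sketch depends entirely on the variability of the stand-in data and should not be compared with the values reported in this study.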

Thus, this conclusion is of critical importance for soil erosion studies
where rainfall erosivity estimates are used as input, since in most cases
the high-resolution data used to estimate rainfall erosivity are much shorter
than 60 years. So, in cases where only 5–10 years of observed rainfall data
are available, the estimated mean annual rainfall erosivity can be up to

It should be noted that the approaches presented in this paper should be applied and tested in further case studies with rainfall and topographical characteristics different from those of Lower Saxony, which is mostly flat and without major orographic obstacles. Additionally, some study limitations and lessons learned can be identified based on the presented results and conclusions: for example, the resolution of the measurement device, which has evolved in recent decades, has a significant effect on the calculated rainfall erosivity and relative bias (Fig. S1).

Data can be requested from the German Weather Service (DWD). The code used in this study is freely available from the first author upon request.

The supplement related to this article is available online at:

All authors developed the concepts of the manuscript. RP conducted most of the calculations with the support of BS, HMT, UH, and NB. NB drafted the first version of the manuscript. All authors contributed to writing and editing of the manuscript.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We would like to acknowledge the German Weather Service (DWD) for providing the data used in this study. We also thank Nadav Peleg, Mark Silburn, and the anonymous reviewer, as well as the associate editor Greg Hancock.

The results of the study are part of the bilateral research project between Slovenia and Germany “Stochastic rainfall models for rainfall erosivity evaluation” and research programme P2-0180 “Water Science and Technology, and Geotechnical Engineering: Tools and Methods for Process Analyses and Simulations, and Development of Technologies”, which is financed by the Slovenian Research Agency (ARRS). Hannes Müller-Thomy has been financially supported by the DFG e.V., Bonn, Germany, through a Research Fellowship (MU 4257/1-1). Additionally, part of the results were obtained in the scope of the bilateral project between Slovenia and Germany “Validation of precipitation reanalysis products for rainfall-runoff modelling in Slovenia (PRE-PROMISE)”, funded by the German Federal Ministry of Education and Research (BMBF). This open-access publication was funded by Technische Universität Braunschweig.

This paper was edited by Greg Hancock and reviewed by Nadav Peleg, D. Mark Silburn, and one anonymous referee.