Inverse modeling of turbidity currents using an artificial neural network approach: verification for field application

Although in situ measurements in modern frequently occurring turbidity currents have been performed, the flow characteristics of turbidity currents that occur only once every 100 years and deposit turbidites over a large area have not yet been elucidated. In this study, we propose a method for estimating the paleohydraulic conditions of turbidity currents from ancient turbidites by using machine learning. In this method, we hypothesize that turbidity currents result from suspended sediment clouds that flow down a steep slope in a submarine canyon and into a gently sloping basin plain. Using inverse modeling, we reconstruct seven model input parameters including the initial flow depth, the sediment concentration, and the basin slope. A reasonable number (3500) of repetitions of numerical simulations using a one-dimensional layer-averaged model under various input parameters generates a dataset of the characteristic features of turbidites. This artificial dataset is then used for supervised training of a deep-learning neural network (NN) to produce an inverse model capable of estimating paleo-hydraulic conditions from data on the ancient turbidites. The performance of the inverse model is tested using independently generated datasets. Consequently, the NN successfully reconstructs the flow conditions of the test datasets. In addition, the proposed inverse model is quite robust to random errors in the input data. Judging from the results of subsampling tests, inversion of turbidity currents can be conducted if an individual turbidite can be correlated over 10 km at approximately 1 km intervals. These results suggest that the proposed method can sufficiently analyze field-scale turbidity currents.


Introduction
Turbidity currents are sediment-laden density flows that occur intermittently in deep-sea environments (Talling, 2014). Turbidity currents are the main drivers of mass circulation processes in deep-sea environments. In fact, estimating the flux of organic carbon transported and buried by turbidity currents is particularly necessary to understand carbon cycle processes (Buscail and Germain, 1997;Heussner et al., 1999). In addition, the deposits of turbidity currents, i.e., turbidites, form submarine fans on the seafloor, which may function as large-scale hydrocarbon reservoirs (Kendrick, 1998;Yoneda et al., 2015), and are thus economically essential.
Through recent development of observational instruments, the velocity and flow depth of deep-sea currents can be mea-sured directly (Hughes Clarke, 2016). Consequently, numerous records of turbidity currents have been reported at locations such as Squamish Bay, Canada (Hughes Clarke, 2016), Monterey canyon offshore of California (Xu et al., 2004;Xu, 2010;Paull et al., 2018), and the Congo Submarine Channel (Vangriesheim et al., 2009;Azpiroz-Zabala et al., 2017). Surprisingly, these records have revealed that turbidity currents occur almost monthly in modern submarine environments (Paull et al., 2018). The record in Squamish even indicates seven events in 1 d. However, the turbidites observed in outcrops and cores are deposited at intervals of 500-1000 years or longer. For example, Ishihara et al. (1997) investigated the deposits of the fore-arc basin (the Pliocene Awa Group) and reported that turbidite beds were deposited approximately once every 1200-1300 years. Clare et al. (2014) analyzed turbidites in the western Mediterranean Sea, off the northwestern coast of Africa, and in the Apennines and found that they were deposited with a frequency of about 1400-36 000 years in all regions. In contrast to these geologic records, recent field observations show that turbidity currents are not a rare event.
What caused the difference in observed frequency of modern turbidity currents compared with the records of ancient turbidites? One of the possibilities is that most of the turbidity currents observed in the present day may be very small in magnitude or are diluted and leave little or no deposits in large areas. If this is the case, turbidites several tens of centimeters thick observed in geologic records can be interpreted as deposits of extraordinarily large-scale events that occurred once every several hundred years. This hypothesis implies that turbidites in the strata resulted from lowfrequency but very high-risk events such as large tsunamis and earthquakes (Goldfinger et al., 2003). In this case, we would have to assume that the velocity and concentration of turbidity currents obtained from in situ observations are quite different from those of turbidite-forming currents in strata. Another possibility is that the areas where turbidity currents have been measured experienced very special conditions and that the frequency of turbidity currents will be significantly reduced even in those areas over long timescales. It is very difficult to determine which of these hypotheses is correct at this time because typical hydraulic conditions under which ancient turbidites were deposited have not been well understood. Although the characteristics of turbidity currents in natural environments have been elucidated rapidly through recent in situ observations of flow properties (Paull et al., 2018), the flow characteristics of turbidity currents that form actual submarine fans remain unknown.
The inverse analysis of turbidites in strata may fill the gap between observations of turbidity currents and geologic field observations of ancient turbidites. The reconstruction of past conditions by inverse analysis has been a major tool in several research fields including sedimentology and geomorphology. For example, several studies have reconstructed the magnitudes of past tsunamis from tsunami deposits (Jaffe and Gelfenbaum, 2007;Naruse and Abe, 2017;Mitra et al., 2020), and Rossano et al. (1996) estimated the behavior of pyroclastic flows using inverse analysis. If the hydraulic conditions of turbidity currents, such as velocity and concentration from turbidites, can be reconstructed, it should be possible to verify whether turbidite beds in geologic records were deposited from flows of different scales or not by comparing the reconstructed values with the in situ observations. However, no practical methodology for the inverse analysis of turbidity currents applicable on a field scale has yet been established. Early attempts to obtain hydraulic parameters of turbidity currents were based on the grain size distribution of turbidites (Scheidegger and Potter, 1965;van Tassell, 1981;Bowen et al., 1984;Komar, 1985;Kubo, 1995) or on sedimentary structures (Harms and Fahnestock, 1960;Walker, 1965;Allen, 1982;Komar, 1985;Allen, 1991;Baas et al., 2000). The estimation of hydraulic conditions for turbidity currents based on grain size assumed that the flow is close to the criteria of suspension or auto-suspension (Komar, 1985), but it has been emphasized that this assumption is highly problematic and leads to significantly different results compared with the actual hydraulic conditions for turbidity currents (Hiscott, 1994). Although the methods based on sedimentary structures can provide rough estimates of the conditions of a turbidity current, assumptions regarding the thickness of the flow are required (Ohata et al., 2017).
To obtain reasonable flow characteristics from turbidites, inverse analysis using a numerical model should be performed. Falcini et al. (2009) proposed a method for predicting the hydraulic conditions of turbidity currents from ancient turbidites and applied it to the Laga Formation in the central Apennines, Italy. Their steady-state model was largely simplified to obtain an analytical solution of the model. However, most of the ancient turbidites are characterized by graded bedding (Bouma, 1962), which suggests a non-steady waning nature of currents. Therefore, the applicability of this method should be quite limited to non-graded turbidites deposited from long-maintained flows. Conversely, Lesshafft et al. (2011) applied a direct numerical simulation model for the inversion of turbidite; however, the application to field-scale data is difficult because of the high calculation cost. Parkinson et al. (2017) proposed a method applicable to non-steady field-scale flows by using a layer-averaged model as the forward model, which is potentially applicable to turbidites in outcrops. However, the flow conditions predicted from ancient turbidites were quite unrealistic in their study. They analyzed a turbidite in the Marnoso Arenacea Formation in the Apennines and gave flow depth of 3950 m or 1.92 mm; both reconstructions are not acceptable as realistic conditions. These extremely large or small estimates may be due to oversimplification in their forward model or failure in the optimization of the input parameters. Nakao and Naruse (2017) were the first to successfully perform an inverse analysis of turbidites using a general non-steady layer-averaged model. Although their reconstruction of the hydraulic conditions of the turbidity current was reasonable, the computational load of the inverse analysis was high because they used a genetic algorithm for optimization. Thus, they were unable to repeatedly analyze various artificial or field data to test the validity and robustness of their inverse model. In addition, because of the high computational load, modifying their forward model to a more complex one in the future would be difficult. These previous attempts suggest that a robust inverse model that can accept a more complex forward model is required to conduct inversion of turbidity currents from turbidites under realistic conditions.
Here, we propose a new methodology using an artificial neural network (NN) for obtaining flow characteristics of turbidity currents from their deposits (Fig. 1). NNs are machinelearning systems that can be trained to perform very complex functions (Hecht-Nielsen, 1987). NNs have been used in a wide range of applications such as classification (Krizhevsky et al., 2012) and generative modeling (Sun, 2018). In recent years, this method has also been widely applied in the field of Earth and planetary sciences (Laloy et al., 2018). Particularly, NNs are a powerful tool for high-dimensional regression of multiple variables with complex distributions (LeCun et al., 2015). In this study, we generate a nonlinear regression model to estimate the hydraulic conditions of turbidity currents from the spatial distribution of bed thickness and grain size of turbidites using NN. If the regression is adequate, the NN can be used as an inverse model of turbidity currents. However, there are too few in situ measurements of the hydraulic conditions of turbidity currents available. Although it is predicted that at least several hundred datasets of hydrological conditions and depositional characteristics are required to train an NN, such frequent observation of turbidity currents that occur intermittently on the deep seafloor cannot be expected. Therefore, the method proposed in this study is designed to generate data on deposits from known conditions by numerical calculations of the forward model. In this case, the generation of training data can be completely parallelized, and therefore any model that incurs a high computational load can be implemented as a forward model.
In this study, we implement an NN-based inverse analysis and examine its effectiveness for turbidites at the field scale. The focus of this study is on rapidly decelerating sedimentary turbidity currents, and normally graded turbidites are considered to be deposited from such decaying flows. This approach has already proven to be effective for the inverse analysis of tsunami deposits by Mitra et al. (2020). However, the forward model used in their study was based on the assumption of quasi-steady flow, and thus our work is the first to perform the inverse analysis using a neural network with completely unsteady flow. The success of the inverse analysis for turbidity currents, which exhibit quite different properties from those of tsunamis, would indicate the wide applicability of our inversion framework for event deposits.

Forward model description
Here we describe the formulation of the forward model used for producing training datasets for the inverse model (Fig. 2). This model is based on the model developed by Kostic and Parker (2006), which predicts the behavior of surge-type turbidity currents, but we modified it to consider sediment transport and deposition of multiple grain size classes. The initial setting of the flows was set to be the lock-exchange condition, which assumes that the collapse of a rectangular-shaped cloud of sediment suspension produces a turbidity current.
In a turbidity current flowing over hundreds of kilometers, Luchi et al. (2018) suggested that the upper layer of the current is predicted to be continuously diluted, while the lower layer remains highly concentrated, thus maintaining the cur-rent over long distances. Existing one-layer shallow-water equation models are insufficient to reproduce such phenomena. The forward model of this study is not an exception.
However, the focus of this study is on rapidly decelerating sedimentary turbidity currents. Normally graded turbidites are considered to be deposited from such decaying flows. In this study, the distribution of turbidites is assumed to be limited to several tens of kilometers at most, and the separation of the lower and upper layers that occurs in sustained turbidity currents after flowing tens of kilometers does not need to be considered when calculating such relatively small-scale turbidity currents. In fact, the model of Kostic and Parker (2006), on which the forward model of this study is based, has been verified to reproduce turbidity currents at experimental and small natural scales (e.g., Fildani et al., 2006). This suggests that the inverse model in this study is well suited to analyze a single bed of turbidites that generally exhibit normal grading in strata.

Layer-averaged equations
Let t and x be the time and bed-attached streamwise coordinates, respectively. Parameters U and h denote the layeraveraged flow velocity and the depth, respectively. The total sediment concentration is C T . Here, we apply the following layer-averaged conservation equations of fluid mass, momentum, and suspended sediment mass of a turbidity current (Parker et al., 1986;Kostic and Parker, 2006).
Here, R(= ρ s /ρ f −1) is the submerged specific density of the sediment (ρ s and ρ f are the densities of the sediment and the fluid), and g is the gravity acceleration. S is the slope, and C f denotes the friction coefficient. The right-hand side of the fluid mass conservation (Eq. 1) considers the entrainment of ambient fluid to the flow, in which the empirical entrainment coefficient e w is applied. Equation (3) describes the mass conservation of the suspended sediment in the flow, which varies depending on the balance between settling and entrainment of the sediment from and to the active layer. In this model, the grain size distribution of sediment is discretized to N classes. The parameter C i denotes the suspended sediment concentration of the ith class. The model applies the active layer assumption, in which the grain size distribution is vertically uniform in the bed surface layer (active layer) that exchanges sediment with suspended load (Hirano, 1971). F i indicates the fraction of the ith grain size class in the active layer. The parameter w si denotes the settling velocity of the sediment particles in the ith class, and r 0 denotes the ratio of  The volumetric rate of settling of the ith grain size class of sediment is calculated from the basal sediment concentration r 0 C i multiplied by the sediment settling velocity w si . The sediment entrainment rate from the active layer is w si F i e si , where F i is the volumetric fraction of the ith grain size class in the active layer and E si is the unit dimensionless rate of sediment entrainment. The time variation of grain size distribution in the active layer is computed in this model. the near-bed concentration to the layer-averaged concentration of the suspended sediment. The mass conservation of the sediment in the active layer and the deposit (historical layer) takes the form where η i denotes the volume per unit area of the ith grain size class, and η T is the total thickness of the deposit. L a denotes the thickness of the active layer, which is assumed to be constant for simplicity. The parameter λ p denotes porosity of the active layer and the deposit (0.4 in this study), and e si is an empirical coefficient for sediment entrainment of the ith class from the active layer. Equation (4) describes the mass conservation of the ith class sediment in the bed, and the rate of bed aggradation is obtained by summation of the accumulation rates of all grain size classes (Eq. 5). Equation (6) considers the temporal development of the grain size distribution in the active layer; the time development of the total bed thickness η T is obtained by summation of the right-hand side of Eq. (4) for all grain size classes. To solve Eqs.
(1)-(6), empirical relations are required for the parameters w si , r 0 , C f , L a , e w , and e si . Here we applied the formulation of Dietrich (1982) for obtaining the settling velocity w si . The ratios of near-bed to layer-averaged concentrations r 0 and the bed friction coefficient C f are fixed to be 2.0 and 0.004 for simplicity (Garcia, 1990). The active layer thickness L a is assumed to be constant (0.003 m). Regarding the entrainment coefficients of ambient water and basal sediment e w and e si , we applied formulations proposed by Parker et al. (1987) and Garcia and Parker (1991), respectively.
For computational efficiency and numerical stability, a deformed grid approach was adopted to solve Eqs. (1)-(3). In this transformed coordinate, the propagating flow head was fixed at the downstream boundary using a Landau transformation (Crank, 1984). The tail of the flow was also fixed at the upstream end of the calculation domain, and thus the grid spacing in the dimensional coordinate space was continuously stretched during calculation, whereas that in dimensionless space remained constant. This scheme was based on Kostic and Parker (2006), and more details regarding the numerical implementation are given by Nakao and Naruse (2017).

Model input parameters and topographic settings
In this study, a turbidity current was assumed to occur from a cloud of suspended sediment (height H 0 , length l 0 ). The initial flow velocity was set to 0, and the sediment of the Figure 3. Model input parameters. The initial conditions of the turbidity current are assumed to be the suspended sediment cloud that is H 0 and l 0 in height and length, respectively. The initial sediment concentrations C 1 to C 4 and the basin slope S l are to be specified for calculation. These seven input parameters are subject reconstruction by inverse analysis.
ith grain size class was considered to be initially homogeneously distributed in the suspension cloud at the concentration C i (Fig. 3). The suspended sediment cloud was located at the upstream end of the calculation domain, where the slope gradient was 0.1. This steep slope extended for 5.0 km and transited to a gently sloping basin plain (gradient is S l ) in the downstream region. The total length of the calculation domain was 100 km. In summary, the number of initial conditions required for the forward model calculation was three (H 0 , l 0 , and S l ) plus the number of grain size classes (C i ).

Inverse modeling by a deep-learning NN
In this study, numerical simulation of a turbidity current is repeated under various random initial conditions to produce a dataset of the characteristic features of turbidites. Then, this artificial dataset of turbidites is used for supervised training of a deep-learning NN. The values of the turbidite characteristics, i.e., distribution of volume per unit area of all grain size classes, in the training dataset are input to the NN, and the estimated initial conditions (e.g., initial flow height and concentration) of the turbidity current are obtained from the output nodes of the NN. The output values of the NN are compared with the true conditions. The optimization of weight coefficients of the NN is then conducted to reduce the mean square of the difference between the true conditions and the output values of the NN. If the number of training datasets is sufficiently large, the trained NN should be able to estimate the paleo-hydraulic conditions from the data on the ancient turbidites ( Fig. 1). In other words, an empirical relationship with numerical results and the model input parameters is explored in this method, and the discovered relationship is used for inverse modeling of turbidity currents.
The local conditions of a turbidity current (velocity, concentration, etc.) at any location and time can be estimated from the reconstructed initial conditions. The flow parameters are obtained by calculating the time evolution of the for-ward model from the initial conditions. In this way, we can obtain the behavior of the flow with a relatively small number of parameters. This approach has already been tried successfully by Lesshafft et al. (2011), andFalcini et al. (2009) also reconstructed flow conditions of turbidity currents by obtaining boundary conditions of the model.
The details of these procedures are described below.

Production and preprocessing of training and test datasets for supervised machine learning
We conducted iterative calculations using the forward model and accumulated data to train and validate the inverse model. To investigate the appropriate amounts of data for training the inverse model, we conducted 500-3500 iterations of the forward model calculations. To verify the performance of the trained model, 300 test datasets were also generated numerically, independent of the training data.
Model input parameters that are subject to inversion are required to produce the training and test data by the forward model calculation (Fig. 3). In this study, the model inputs are the initial flow height H 0 , the initial flow length l 0 , the initial sediment concentration for the ith grain size class C i , and the basin slope S. These model parameters are generated as uniform random numbers within a certain range, and their range is changed according to the target of the inverse analysis. Since this study is aimed at field-scale analysis, the following ranges are chosen. Both initial depth and length of the suspended cloud range from 50 to 600 m. The sediment concentration for each grain size class ranges from 0.01 % to 1.0 %. The number of grain size classes N is four, and the representative grain diameters are 1.5, 2.5, 3.5, and 4.5 φ. The inclination of the basin plain where the turbidites are expected to form ranges from 0 % to 1.0 %.
Each run of the forward model calculation is initiated with the given model input parameters and is terminated when the flow head reaches the downstream end or a sufficiently long time period (1.2 × 10 5 s) has elapsed. As a result of the calculation, the forward model outputs the volume per unit area of sediment for all grain size classes over the 100 km long calculation domain. The inverse model estimates the model input parameters from the resultant spatial distribution of the granulometric characteristics of the deposits. However, in natural outcrops, it is unlikely that the entire distribution of the turbidite beds would be exposed. Therefore, we limit the length of the sampling window in the calculation domain, and only the sediment data contained in this window are extracted for both training and testing. The upstream end of the sampling window was set at the transition point between the steep slope and the basin plain (5 km from the upstream end), and the length of the window varies from 1 to 30 km to evaluate the data interval required for the inverse analysis.
Before the model input parameters are input to the NN, all values are normalized between 0 and 1 using the following equation: where I * i and I i denote the ith normalized and original input parameters, respectively. I maxi and I mini are the maximum and minimum values used for generating the ith input parameter, respectively. This min-max normalization is applied to consider all parameters at equal weights because the range of the initial flow conditions is significantly different between them.

Structure of NN
The artificial NN is used as the inverse model to reconstruct flow conditions from the depositional architecture. We input the spatial distribution of volume per unit area of multiple grain size classes of a turbidite in the NN, which outputs the values of the flow initial conditions and the basin slope. In this study, we use a fully connected NN that has four hidden layers. The volume per unit area of N grain size classes of sediment deposited on M spatial grids in the sampling window is given to the input nodes of the NN. Thus, the total number of NN input nodes is N × M. The number of nodes in all hidden layers is set to 2000 in this study.
The rectified linear unit (ReLU) activation function is adopted for all NN layers (Nair and Hinton, 2010;Glorot et al., 2011). The ReLU is the half-wave rectifier f (z) = max(z, 0). Compared with other smoother nonlinearities, such as tanh(z) or 1/(1+exp(−z)), the ReLU typically learns much faster in an NN with multiple layers (Glorot et al., 2011), and thus it allows us to train a deep supervised network without unsupervised pre-training (LeCun et al., 2015).
The NN is expected to output the model input parameters (i.e., the initial flow conditions and the basin slope), and therefore the number of nodes in the output layer is equal to the number of input parameters for the forward model, which is seven here (the initial flow length, depth, sediment concentrations, and the basin slope).

Training the inverse model
To develop the inverse model, supervised training is conducted using the artificial dataset produced by the forward model calculation. First, the artificial dataset is randomly split into training and validation datasets to detect overfitting during the training process. The ratio of the validation dataset is set to 0.2 so that 80 % of the artificial dataset is used for training. The model input parameters used for producing training and validation sets were regarded as the teacher data to train and evaluate the model.
The methodology applied for training the NN is as follows. The mean squared error (MSE) is adopted as the loss function because the supervised training of the NN in this study is classified as a regression problem (Specht, 1991), and MSE is a common loss function for regression (Bishop, 2006;Hastie et al., 2009;Shalev-Shwartz and Ben-David, 2014). Before training, all weight coefficients of the NN are randomly initialized using the Glorot uniform distribution (Glorot and Bengio, 2010). The back-propagation algorithm (Rumelhart et al., 1986) is used to calculate the derivative of this error metric for each connection between the nodes, and the stochastic gradient descent method (SGD) with Nesterov momentum (Nesterov, 1983) is used for optimizing the weight coefficients of the NN to minimize the difference between the model predictions and the teacher datasets. Other optimization methods, such as AdaGrad (Duchi et al., 2011), RMSprop (Tieleman and Hinton, 2012), and AdaDelta (Zeiler, 2012), have been tested, but SGD shows the best performance in this case. Dropout regularization (Srivastava et al., 2014) is applied for each epoch to reduce overfitting and to improve the generalization ability of the NN. One training epoch, which refers to one cycle through the full training dataset, is repeated until the loss function of the validation dataset converges to a constant value. These methods are all implemented in Python with the library Tensorflow 2.1.0 (Raschka and Mirjalili, 2019), and the calculations are conducted using GPU NVIDIA GeForce GTX 2080 Super with libraries CUDA 11.0 and CuDNN 7.0.
Several hyperparameters should be specified for the training of an NN. Specifically, the dropout rate, the learning rate, the batch size, the number of epochs, and the momentum are adjusted manually after repeated trial and error. To perform an optimization calculation with SGD, the batch size and the learning rate were set to 32 and 0.02, and the value 0.9 was chosen for the momentum. The dropout rate for regularization was 0.5.

Testing the inverse model
The performance of the inverse model is tested using a set of 300 data that are produced independently of the training and validation datasets. The inversion precision for each model input parameter is evaluated by the root mean square error (RMSE) and the mean absolute error (MAE) of the prediction. These error metrics are computed for both raw and normalized values with true values and used to evaluate the model. Moreover, the bias of prediction (i.e., the mean deviation of the model predictions from the true input parameters) is used to describe the accuracy of the inversion.
Three additional tests are conducted for verifying the robustness of the inverse model that is significant for the applicability of the model to field datasets. The results of these tests are evaluated by the average of the normalized RMSE, which is defined as where I pj k and I j k denote the predicted and the original values of the j th model input parameter for the kth test dataset, respectively. J and K are the numbers of model input parameters and test datasets. First, noise is artificially added to the test data to evaluate the robustness of the inversion results against the measurement error. Under natural conditions, measurement errors in the thickness and grain size analysis of turbidites as well as the local topography affect these results. If the results of the inverse analysis change significantly due to such errors, it means that our method is not suitable for application to field data. To investigate this, we apply normal random numbers to the volume per unit area at each grid point in the training data at various rates, and we observe how much influence the noise has on the inverse analysis results.
The second test on the inverse model is to perform a subsampling of the grid points in the training data. Outcrops are not continuous over tens of kilometers, so the thickness and the grain size distribution of a turbidite in the interval between outcrops can only be obtained by interpolation. To simulate this situation, the grid points in test datasets are randomly removed in this test, and the volume per unit area at the removed grid points is linearly interpolated. By varying the rate at which grid points are removed, this test also allows us to estimate the average interval of the outcrops necessary for conducting the inverse analysis. That is, if 90 % of the grid points set at 5 m intervals are removed and the inverse analysis is conducted on the remaining 10 %, the average distance between the grid points is 50 m. Estimating the outcrop spacing requires obtaining reasonable results of inverse analysis before applying it to the actual field.
Finally, the influence of the length of the upstream slope was examined. In this study, it is assumed that a steep slope (10 %) of a submarine canyon with a length of 5 km exists upstream, and a basin plain with a gentle slope exists downstream of the steep slope. Although the topography and deposits of the upstream slope are not the subject of the inverse model analysis, the length of the slope potentially affects the results of the inverse analysis. As a test, we set a slope of 10 km length instead of 5 km upstream and deposited a turbidite bed from the turbidity current flowing down from the uppermost part of the slope. The turbidite was then analyzed using a model trained on the assumption of a 5 km slope to compare the reconstructed values with the original conditions.

Properties of artificial datasets of turbidites
Here, we describe the properties of turbidite artificial data generated for training and testing the inverse model. Several artificial datasets of turbidites are produced using a 1D shallow-water equation model. Figure 4 exhibits examples of the calculated spatial distribution in the bed thickness and grain size of turbidites deposited in the region of the basin plain. Most beds exhibit the typical "top hat" or "core and drape" shape of turbidites (Hirayama and Nakajima, 1977;Talling et al., 2012;Pantopoulos et al., 2013): turbidite beds become thicker in the upstream part of the basin and then thin rapidly from their peak of thickness. Thereafter, beds continue over a long distance, gradually decreasing in thickness (Fig. 4). At the same time, the grain size gradually becomes finer downstream. The maximum thickness of beds is 1.27 m on average (standard deviation σ = 1.65 m), and the mean value of the area where sediments with a thickness greater than 1 cm are distributed is 42.0 km (σ = 15.7 km). Each bed is composed of four grain size classes. All distributions of the volume per unit area of the grain size classes are still top-hatshaped (Fig. 4b), but the depositional center and the amounts of deposition are different for each class depending on their size.

Results of training
We trained the NN inverse model with various numbers of artificial data and lengths of the sampling window, and the best result in terms of the value of the loss function for the validation sets and the practical usage of the model can be obtained with 3500 training datasets and a 10 km long sampling window (Fig. 5). Results with fewer than 2000 training datasets produce a discrepancy in the loss function between the training and the validation sets, indicating over-learning of the NN. Conversely, when the number of datasets exceeds 2000, the loss function of the validation set is slightly less than the value of the training set. As the number of training data increases, the resultant values of the loss function improve. However, when the number of data exceeded 2500, the improvement of values of the loss function became not so rapid. Regarding the distance of the sampling window, the training results are not stable when the sampling window is shorter than 5 km (Fig. 5). On the other hand, the training results are stable when the window length is longer than 10 km, and the results gradually improve as the window length increases. However, extending the window length from 10 to 30 km results in little improvement of the loss function. We do not fully understand why the results are not stable for sampling windows shorter than 5 km, but it probably indicates that the training results fall into a local optimum solution depending on the initial values of the weight coefficients of the neural network (given by random numbers) due to incomplete information. In any case, the loss function is very good (less than 0.01), so even turbidites that can be tracked for less than 5 km are likely to give good results if the outcrop spacing is sufficiently narrow and detailed observation of beds is possible.
Hereafter, we further investigate the performance of the inverse model trained on 3500 datasets with a 10 km long sampling window. The history of training indicates that the values of the loss function improved significantly in the first 1000 epochs, and the results are improved up to 15 000 epochs (Fig. 6). Eventually, saturation is reached at approximately 20 000 epochs. The resultant loss function (i.e., the MSE of prediction) is 3.78 × 10 −3 for training sets and is 1.03 × 10 −3 for validation sets.

Precision and accuracy of inverse analysis
Using 300 test datasets, the performance of the inverse model trained with 3500 datasets and a 10 km long sampling window is evaluated. The estimated parameters match well, with slight deviations (Figs. 7, 8; Table 1). R 2 values are beyond 0.98 for all parameters. Particularly good agreement is obtained for the estimates of the initial height and the length of the suspended sediment cloud. Values of the normalized RMSE and MAE for these parameters are less than 9 % and 6 %, respectively. The sediment concentration is also precisely estimated. The normalized RMSE for the sediment concentration ranges from 12 % to 16 %, which corresponds to only 0.02-0.03 volumetric percent. The prediction for the basin slope shows relatively large errors (RMSE is close to 20 % and MAE is 11.7 %), but these errors correspond to only 0.03 % of slope. Focusing on the bias of the estimates, all estimated values except for the basin slope tend to be slightly smaller, whereas the predicted values of the basin slope tend to be larger (Fig. 8). The values of the bias, however, range only from 2 % to 12 % of the original value.
The forward model is calculated again using the reconstructed values to examine the influence of the estimation error of the model input parameters on the predicted flow behavior (Fig. 9). The chosen test values deviate from the true conditions as indicated by the RMSE value of 0.27 (Table 2), but the time evolution of the flow characteristics agrees very well with those calculated from the true values (Fig. 9). When comparing the velocity and concentration of the flow at 10 km from the upstream end, the discrepancy between calculation results using reconstructed and original parameters is less than 5 % for both parameters.

Tests for robustness against noise and subsampling on input data
The test data with various normal random values are analyzed to verify the robustness of the inverse model. Consequently, even when the standard deviation of the normal random numbers given as measurement errors was set to approximately 200 % of the value of the original data, only a small effect was observed in the normalized root mean square (rms) of the results of the inverse analysis (Fig. 10). The rms values gradually increase when the standard deviation of errors exceeds 50 %, but there is no rapid increase in the RMSE of the results at any particular threshold. Similarly, using subsampling data obtained by extracting some of the spatial grids from the original data, we conducted an inverse analysis of the test datasets. The results show that there is little influence on the RMSE values of the inverse analysis of the test datasets when the sampling rate of grids 1100 H. Naruse and K. Nakao: Inverse modeling of turbidity currents by a neural network   is greater than 1 % (Fig. 11). The RMSE values gradually increased when the sampling rate falls below 1 %, and RMSE becomes extremely high when the rate drops below 0.4 %.

Tests for influence of length of the upstream slope
Here, a turbidite deposited in a different topographic setting was analyzed to determine the influence of the topo-  graphic assumptions on inversion results. The slope of 10 km instead of 5 km was set at the upstream end of the calculation domain. The initial conditions for this test assuming a 10 km slope were a suspended sediment cloud 359 m high and 227 m long, with concentrations of 0.13 %, 0.15 %, 0.38 %, and 0.65 % for the four grain size classes. The gradient of the downstream slope was set to be 0.69 %. As a result, the initial conditions estimated by the inverse model trained on the assumption of a 5 km upstream slope were a suspended sediment cloud 117 m high and 587 m long, with concentrations of 0.33 %, 0.38 %, 0.48 %, and 0.53 % for each grain size class, and the downstream slope was estimated to be 0.96 %. Then, these initial conditions were given to the forward model to calculate their time development, and the obtained parameters were compared on a basin plain where the turbidite was deposited (Fig. 12).
The results showed that the model with a 5 km slope predicted values relatively close to the original results for the flow velocity (Fig. 12). The model with both a 5 and 10 km slope calculated velocities that were approximately 3 m s −1 maximum over the basin plain and gradually decelerated downstream. However, because the slope length is different, the time to reach each point on the basin plain differs greatly.
In contrast, the concentration of the turbidity current was significantly overestimated in the model reconstruction assuming a 5 km slope (Fig. 12). When the flow reached the downstream gentle slope at about 10 km, the original turbidity concentration was about 0.2 % at maximum, while the restored value was closer to 0.5 %. As a result, the thickness of turbidite estimated from the reconstructed initial values was also thicker than the original values.

Performance of inverse model
The performance of the inverse model for turbidity currents is evaluated using the test dataset, implying that this model can accurately reconstruct the flow characteristics of the turbidity currents from the spatial distribution of the thickness and grain size of turbidites (Figs. 7 and 8). The biases in the values reconstructed from the true input parameters are also very small and should thus not pose a serious issue when the method is applied to actual field data.
The inverse model reconstructed not only the initial conditions of turbidity currents accurately, but also the predicted time evolution of the flow behavior accurately and precisely. In the results of the forward model calculations using the predicted model input parameters that relatively deviate from the true values (Table 2), the time evolution of the velocity and the thickness of the flow does not deviate significantly from the results using the true values (Fig. 9).
Turbidity currents have a mechanism called selfacceleration, which is caused by erosion and associated increase in the flow density (Parker et al., 1986;Naruse et al., 2007;Sequeiros et al., 2009). Therefore, even slight differences in the initial conditions of the flow can lead to very different results of the time evolution of the flow parameters. However, the results of this test imply that the accuracy of the inverse analysis in this study is enough to prevent such a drastic change in the flow behavior.
The relationship between turbidity currents and characteristics of turbidites is nonlinear. Especially when the flow is self-accelerating, a small difference in the initial conditions can result in very different sedimentary characteristics. This means that it is easy to find the initial conditions of the flow by inverse analysis because even if the characteristics of the deposits are very different, the initial conditions of the flow should not be so different. Thus, the inverse results in this case are expected to be robust even if there are some measurement errors in the characteristics of deposits. In other words, there is a trade-off between the robustness of the forward and inverse modeling.
This property of the inversion can be understood when we consider the opposite case. If the initial conditions of the flow are different but the characteristics of the turbidites are exactly the same, it is impossible to estimate the flow conditions from the turbidites. The inverse analysis of hydraulic conditions is possible because the depositional characteristics are sensitive to conditions of turbidity currents. The selfacceleration of turbidity flow is an extreme example of the sensitivity of turbidites to the flow initial conditions.

Applicability to field-scale problems
To apply this method to outcrops, the extent of the area that should be surveyed to collect data and the interval between outcrops should be determined. The tests with different sizes of sampling windows suggest that the survey region should be located more than 10 km from the proximal region (Fig. 5). The loss function (i.e., the MSE of the estimates of the parameters) decreases as the length of the sampling window increases, and the best result is obtained at the 10 km long window. Regarding the interval of the outcrops, the test results of sampling rates of more than 1.0 % with interpolation for data at non-sampled grids are not inferior to the full sample. Since the training data used in this study are computed on 5 m spaced grids, extracting data from these grids with a 1.0 % probability is equivalent to conducting an inverse analysis from outcrop data that are distributed at 0.5 km intervals on average. Although the RMSEs of the model prediction certainly increase when the sampling rate decreases below 1.0 %, the RMSE values does not drastically worsen until 0.5 %. Therefore, even if the outcrop spacing is about 1 km, it should be possible to obtain reasonable estimates of the flow characteristics.
These requirements for accurate inversion are attainable in the actual field. For example, Hirayama and Nakajima (1977) correlated individual turbidites of the Pleistocene Otadai Formation distributed in the Boso Peninsula, Japan, on the basis of the key tuff beds. Their correlation covered a region over 30 km long with 33 outcrops. Thus, the average interval between outcrops was approximately 1 km. Amy and Talling (2006) correlated individual beds in the Miocene Marnoso Arenacea Formation, Italy, using the Contessa Megabed and an overlying "columbine" marker bed as the key beds. Their correlation covers 109 sections of approximately 30 m thick succession and extends over 120 km in a direction parallel to flow. Other studies in various regions (e.g., the Arnott Sandstone in France) also reported the correlation of individual turbidites at similar scale and frequency (Hesse, 1974;Tokuhashi, 1979Tokuhashi, , 1989Amy et al., 2000Amy et al., , 2004. Furthermore, Bartolini et al. (1972) surveyed the western Alboran Basin Plain, Mediterranean Sea, and discovered an individual turbidite on the seafloor at 49 cores over approximately 30 km. The records of cores at similar scale and intervals have also been reported by other studies of modern submarine fans in different areas (Bornhold and Lilkey, 1971;Pilkey et al., 1980). In summary, although the method proposed in this study requires fairly high-resolution data on turbidite individual beds correlated over a long distance, such conditions in ancient geological records and modern seafloor surveys can be achieved.
Besides these outcrop conditions, measurement errors in the field are another important factor for application. The test results suggest that the proposed inverse model of this study is very robust against random noise; random errors in the measured data have little effect on the results (Fig. 10). Therefore, even if localized and small-scale scouring and sedimentation occur due to some processes such as bottom currents after the deposition of a turbidite, results of inverse analysis will not be seriously affected. However, if deposits of multiple events are amalgamated to form a single thick massive sandstone, the hydraulic conditions reconstructed from the bed should be considerably different from the actual conditions. To avoid this situation, it is important to identify the erosional surface inside the bed carefully at the actual outcrop. In addition, it is safer not to analyze massive sand-stones that are more than several meters thick because they are likely to be amalgamated deposits.
Perhaps the most significant drawback to analyzing actual turbidites is the assumption about the topography of the upstream submarine canyon. In this study, we tested doubling the length of the upstream slope and found that the predicted values for the concentration were different from the original values (Fig. 12). In the case of actual analysis, the upstream topography can be set correctly if the modern submarine fan is analyzed. Regarding ancient turbidites, however, some assumptions about the length and scale of the submarine canyons are necessary without measurements. In this case, it is recommended to set up various lengths of submarine canyons within a reasonable range and to carefully examine the degree to which these assumptions affect the inverse analysis results. Nevertheless, it is worth noting that the test results were reasonable for velocity (Fig. 12), even if the assumption about the length of the upstream slope was substantially different. This suggests that the inverse model proposed in this study can generally reconstruct the behavior of turbidity currents in sedimentary basins, even if the development process of turbidity currents upstream is different.

Comparison with previous methodologies
In existing inverse analysis methods for turbidity currents, the difference in depositional characteristics between the outputs of the forward model and the field observation is quantified as the objective function, and the initial and boundary conditions of the forward model are determined by conducting optimization calculations to minimize the objective function (e.g., Nakao and Naruse, 2017). This is because models of turbidity currents are generally nonlinear and are difficult to linearize, especially when considering the entrainment of the basal sediment (Parker et al., 1986). Although the actual computational load depends on the choice of algorithm, this type of optimization calculation generally consists of multiple steps, and each step depends on the results of the previous calculation. Thus, the entire optimization procedure is difficult to parallelize. For instance, the kriging-based surrogate management method (Lesshafft et al., 2011) and the genetic algorithm (Nakao and Naruse, 2017) have been used to optimize the objective function for inversion of turbidity currents. In these methods, multiple calculations are conducted in each calculation step (generation), and the distribution of the objective function in the parametric space is iteratively estimated. Although the computations within each generation can be parallelized in this kind of algorithms, the next generation's computation depends on the results of the previous generation's computation, and therefore the entire computation process cannot be parallelized. Thus, if the computational load of the forward model is high, the inverse analysis takes an unrealistic amount of time. Parkinson et al. (2017) applied the adjoint method with the gradient-based optimization algorithm. Although the differ-entiation of the layer-averaged model by the adjoint method greatly reduces the load of the gradient calculation, this approach still requires an iterative calculation for optimization. Thus, the sediment entrainment process is omitted from their model. Their model does not consider the resuspension (entrainment) process of sediment, whereas suspended sand in turbidity currents is maintained by balancing the effects of particle settling and diffusion from the bottom (i.e., entrainment). Their model only considers advection and settling of particles so that the suspended sediment quickly settles and is lost over short distances at realistic flow thicknesses and concentrations. The only way to transport large amounts of suspended sediment for long distances and to deposit thick turbidites without resuspension is to make the flow extremely thick or to suppose unusually high velocity or concentration. This is the reason that the extremely thick flow depth (more than 3000 m) was obtained in their results. Their inversion method requires iterations that cannot be parallelized, so the forward model needs to be simplified for this purpose. In addition, gradient-based optimization tends to have problems with initial value dependency and escaping from local optimal solutions. For this reason, the results of their inverse analysis of turbidites were quite unrealistic. In contrast, we were able to adopt a "full model" that incorporates the entrainment process of suspended sand into our model without any problems. As a result, our inversion did not produce any anomalous reconstructions even though most of our test data exhibit thickness and grain size distributions similar to realistic turbidites. This strongly suggests the robustness of our inverse model and its applicability to real turbidites.
Another potential approach to optimization is the Markov chain Monte Carlo (MCMC) method, but even with this method, repetition of the forward model calculation is unavoidable, since MCMC usually requires repetition of calculations of the objective function, which cannot be parallelized more than the order of 10 4 time. The layer-averaged model of unsteady turbidity currents is probably not suitable for forward models due to their computational load.
The approach proposed in this study is obviously superior to existing methods in terms of applicability to the field, as it allows computationally demanding models to be applied as forward models. The general relationship between the bed and the input parameters is learned by an NN rather than adjusting the input parameters of the numerical model to reproduce the characteristics of specific individual beds. The objective function used in the training of this NN is not the difference between the features of the sediment, but the precision of the inverse analysis results themselves. The most computationally demanding part of the inverse analysis method proposed here is the generation of the training data for the NN. However, since the computations of the forward models are completely independent of each other, the generation of the training data can be conducted in parallel. Thus, our method enables us to easily prepare a large number of training data by using PC clusters, even for very computa-tionally demanding forward models. In addition, the number of calculations required for training is not as high as other methods, specifically only approximately 3000. It is also advantageous that the proposed method enables us to perform various tests for robustness or precision of inversion before application to field examples because the NN outputs the results of inverse analysis extremely fast. For these reasons, we consider this study to have successfully generated an inverse model using the layer-averaged model for unsteady turbidity currents that can be applied to the field.

Limitations and future tasks
The inverse model proposed in this study has several limitations. Inevitably, the accuracy of the inverse analysis is governed by the validity of the forward model that generates the training data. The present implementation of the inverse model uses the one-dimensional layer-averaged model as the forward model, but this model is likely to be applicable only to sedimentary basins that are laterally constrained or to the inside of the submarine channels. The layer-averaged model of Parker et al. (1986) used in this study has been widely accepted, but various doubts have been recently raised such as the formulation of entrainment rates of basal sediment (Dorrell et al., 2018) and ambient seawater (Luchi et al., 2018).
The assumption of a lock-exchange condition for the occurrence of turbidity currents may not be appropriate in some situations.
Although Luchi et al. (2018) suggested that the a singlelayer model may not be sufficient for considering behavior of turbidity currents maintained over long distances, it is expected that such turbidity currents do not leave turbidites and create a bypassing zone. Otherwise, the concentration in the lower layers of turbidity currents decreases, and therefore the currents stop within a relatively short distance. Thus, a twolayer model of turbidity currents is not always necessary for inversion of bed-scale turbidites. However, modeling of continuous sustained turbidity currents is necessary for inverse analysis of the development of submarine fans and channellevee systems on a larger scale.
It is relatively easy to solve these problems described above. Without changing the framework of the proposed method, we can adapt it to any situation by changing the forward model to generate the training data. For processes such as sediment transport, it is easy to revise the model to incorporate state-of-the-art knowledge. By adopting computationally demanding models, inverse analysis using 2D and 3D forward models may be possible. In future research, these issues should be addressed, and the methodology should be applied to actual field examples.
The analysis of ancient turbidites will be an important issue in the future. However, even if ancient turbidites are analyzed, it is not possible to verify that the results obtained are correct because the hydraulic conditions for ancient turbidity currents are unknown. Another way to verify the validity of the method is to reconstruct the hydraulic conditions of experimental turbidity currents from the turbidites deposited in the flume and compare them with the measured values. The turbidity currents measured in the modern submarine canyons and their deposits would be another candidate to be used for model verification.

Conclusions
This study implemented an inverse model that reconstructs the flow characteristics of turbidity currents from their deposits using an NN and verified its effectiveness at the field scale. In this study, we assumed that turbidity currents occur from suspended sediment clouds, which flow down from the steep slope in a submarine canyon to a gently sloping basin plain. The inverse model attempts to reconstruct seven model input parameters (height and length of the initial suspended sediment cloud, sediment concentration of four grain size classes, and slope of the basin plain) from the thickness and grain size distribution of the turbidite deposited on the basin plain. The forward model, using one-dimensional layer-averaged equations, was used to produce training datasets with random conditions in prescribed ranges. The NN was trained using the generated data to develop the inverse model. Thereafter, the test data generated independently from the training data were analyzed to verify the performance of the inverse model.
As a result of the training and tests conducted on the inverse model, the following was found.
1. More than 2000 datasets were required for the training to avoid over-learning. An increase in the number of training datasets results in improved performance of the inverse model; however, the degree of improvement becomes smaller if more than 3000 datasets are used.
2. The hydraulic conditions and basin slopes were precisely reconstructed from the test datasets. The thickness and grain size distribution of the turbidites deposited over a 10 km long interval in a sedimentary basin were sufficient to reconstruct the flow conditions.
3. The inverse model of this study is quite robust to random errors in the input data. The addition of a normal random number with about the same magnitude of the standard deviation as the original data had little effect on the results of the inverse analysis.
4. Judging from the results of subsampling tests, the inversion of turbidity currents can be performed if an individual turbidite can be correlated over 10 km at approximately 1 km intervals.
These results imply that the inverse model of turbidity currents proposed in this study is promising for analyzing fieldscale turbidites. This method is expected to be applied to actual turbidites in the future.