Inverse modeling of turbidity currents using artificial neural network: verification for field application

Although in situ measurements of modern, frequently occurring turbidity currents have been performed, the flow characteristics of turbidity currents that occur only once every several hundred years and deposit turbidites over a large area have not yet been elucidated. In this study, we propose a method for estimating the paleo-hydraulic conditions of turbidity currents from ancient turbidites by using machine learning. In this method, we hypothesize that turbidity currents result from suspended sediment clouds that flow down a steep slope in a submarine canyon and into a gently sloping basin plain. Using inverse modeling, we reconstruct seven model input parameters, including the initial flow depth, the sediment concentration and the basin slope. Repeated numerical simulation using one-dimensional shallow water equations under various input parameters generates a dataset of the characteristic features of turbidites. This artificial dataset is then used for supervised training of a deep learning neural network (NN) to produce an inverse model capable of estimating paleo-hydraulic conditions from data of the ancient turbidites. Only 3,500 datasets are needed to train this inverse model. The performance of the inverse model is tested using independently generated datasets. The NN successfully reconstructs the flow conditions of the test datasets. In addition, the proposed inverse model is quite robust to random errors in the input data. Judging from the results of subsampling tests, inversion of turbidity currents can be conducted if an individual turbidite can be correlated over 10 km at approximately 1 km intervals. These results suggest that the proposed method can sufficiently analyze field-scale turbidity currents.

However, no practical methodology for the inverse analysis of turbidity currents applicable on a field scale has yet been established. Early attempts to obtain hydraulic parameters of turbidity currents were based on the grain-size distribution of turbidites (Scheidegger and Potter, 1965; van Tassell, 1981; Bowen et al., 1984; Komar, 1985; Kubo, 1995) or on sedimentary structures (Harms and Fahnestock, 1960; Walker, 1965; Allen, 1982; Komar, 1985; Allen, 1991; Baas et al., 2000). The estimation of hydraulic conditions for turbidity currents based on grain size assumed that the flow is close to the criterion of suspension or auto-suspension (Komar, 1985), but it has been emphasized that this assumption is highly problematic and leads to results that differ significantly from the actual hydraulic conditions of turbidity currents (Hiscott, 1994).
Although the methods based on sedimentary structures can provide rough estimates of the conditions of a turbidity current, assumptions regarding the thickness of the flow are required (Ohata et al., 2017). To obtain reasonable flow characteristics from turbidites, inverse analysis using a numerical model should be performed. Falcini et al. (2009) proposed a method for predicting the hydraulic conditions of turbidity currents from ancient turbidites and applied it to the Laga Formation in the Central Apennines, Italy. Their steady-state model was largely simplified to obtain an analytical solution. However, most ancient turbidites are characterized by graded bedding (Bouma, 1962), which suggests a non-steady, waning nature of the currents. Therefore, the applicability of this method should be limited to non-graded turbidites deposited from long-maintained flows. In contrast, Lesshafft et al. (2011) applied a direct numerical simulation model to the inversion of turbidites, but its application to field-scale data is difficult because of the high computational cost. Parkinson et al. (2017) proposed a method applicable to non-steady field-scale flows by using a layer-averaged model as the forward model, which is potentially applicable to turbidites in outcrops. However, the flow conditions predicted from ancient turbidites were quite unrealistic in their study. They analyzed a turbidite in the Marnoso Arenacea Formation in the Apennines and obtained flow depths of 3950 m or 1.92 mm; neither reconstruction is acceptable as a realistic condition. These extremely large or small estimates may be due to oversimplification in their forward model or failure in the optimization of the input parameters. Nakao and Naruse (2017) were the first to successfully perform an inverse analysis of turbidites using a general non-steady shallow water equation model.
Although their reconstruction of the hydraulic conditions of the turbidity current was reasonable, the computational load of the inverse analysis was high because they used a genetic algorithm for optimization. Thus, they were unable to repeatedly analyze various artificial or field data to test the validity and robustness of their inverse model. In addition, because of the high computational load, modifying their forward model to a more complex one in the future would be difficult. These previous attempts suggest that a robust inverse model that can accept a more complex forward model is required to conduct inversion of turbidity currents from turbidites under realistic conditions.
Here, we propose a new methodology using an artificial neural network (NN) for obtaining flow characteristics of turbidity currents from their deposits (Fig. 1). NNs are machine learning systems that can be trained to perform very complex functions (Hecht-Nielsen, 1987). NNs have been used in a wide range of applications such as classification (Krizhevsky et al., 2012) or generative modeling (Sun, 2018). In recent years, this method has also been widely applied in the field of earth and planetary sciences (Laloy et al., 2018). In particular, NNs are a powerful tool for high-dimensional regression of multiple variables with complex distributions (LeCun et al., 2015). In this study, we generate a non-linear regression model to estimate the hydraulic conditions of turbidity currents from the spatial distribution of bed thickness and grain size of turbidites using an NN.
[...] study is designed to generate data of deposits from known conditions by numerical calculations of the forward model. In this case, the generation of training data can be completely parallelized, and therefore, any model that incurs a high computational load can be implemented as a forward model. This approach has already proved to be effective for the inverse analysis of tsunami deposits (Mitra et al., 2020). In this study, we implement an NN-based inverse analysis and examine its effectiveness for turbidites at the field scale.

Forward model description
Here we describe the formulation of the forward model used for producing training datasets for the inverse model (Fig. 2).
This model is based on the model developed by Kostic and Parker (2006), which predicts the behavior of surge-type turbidity currents, but we modified it to consider sediment transport and deposition of multiple grain-size classes. The initial condition of the flow was set to the lock-exchange type, which assumes that the collapse of a rectangular cloud of sediment suspension produces a turbidity current.

Layer-averaged equations
Let t and x be the time and bed-attached streamwise coordinates, respectively. Parameters U and h denote the layer-averaged flow velocity and the flow depth, respectively. The total sediment concentration is C_T. Here, we apply the layer-averaged conservation equations of fluid mass (Equation 1), momentum (Equation 2) and suspended sediment mass (Equation 3) of a turbidity current (Parker et al., 1986; Kostic and Parker, 2006). In these equations, R (= ρ_s/ρ_f − 1) is the submerged specific density of the sediment (ρ_s and ρ_f are the densities of the sediment and the fluid, respectively), and g is the gravitational acceleration. S is the slope, and C_f denotes the friction coefficient. The right-hand side of the fluid mass conservation (Equation 1) considers the entrainment of ambient fluid into the flow, in which the empirical entrainment coefficient e_w is applied. Equation 3 describes the mass conservation of the suspended sediment in the flow, which varies depending on the balance between the settling of sediment to the active layer and its entrainment from the active layer. In this model, the grain-size distribution of sediment is discretized into N classes. The parameter C_i denotes the suspended sediment concentration of the ith class. The model applies the active layer assumption, in which the grain-size distribution is vertically uniform in the bed surface layer (active layer) that exchanges sediment with the suspended load (Hirano, 1971). F_i indicates the fraction of the ith grain-size class in the active layer. The parameter w_si denotes the settling velocity of the sediment particles in the ith class, and r_0 denotes the ratio of the near-bed concentration to the layer-averaged concentration of the suspended sediment.
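For reference, the three conservation equations referred to in this section as Equations 1-3 are commonly written in the form below (after Parker et al., 1986). This is a sketch assembled from the symbol definitions given here, and it is assumed rather than confirmed to match the exact form used in this study. The symbol e_si is the entrainment coefficient of basal sediment of the ith class, and C_T is taken as the sum of C_i over all classes.

```latex
\begin{align}
\frac{\partial h}{\partial t} + \frac{\partial (U h)}{\partial x}
  &= e_w U \tag{1}\\
\frac{\partial (U h)}{\partial t} + \frac{\partial (U^2 h)}{\partial x}
  &= R g C_T h S - \frac{1}{2} R g \frac{\partial (C_T h^2)}{\partial x} - C_f U^2 \tag{2}\\
\frac{\partial (C_i h)}{\partial t} + \frac{\partial (U C_i h)}{\partial x}
  &= w_{si} \left( F_i e_{si} - r_0 C_i \right) \tag{3}
\end{align}
```

In this form, the right-hand side of Equation 3 is positive when entrainment from the active layer exceeds settling, and negative when settling dominates.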
The mass conservation of the sediment in the active layer and in the deposit (historical layer) takes the form of Equations 4-6, in which η_i denotes the volume per unit area of the ith grain-size class, and η_T is the total thickness of the deposit. To solve Equations 1-6, empirical relations are required for the parameters w_si, r_0, C_f, L_a, e_w and e_si. Here we applied the formulation of Dietrich (1982) to obtain the settling velocity w_si. The ratio of near-bed to layer-averaged concentration r_0 and the bed friction coefficient C_f are fixed at 2.0 and 0.004, respectively, for simplicity (Garcia, 1990). The active layer thickness L_a is assumed to be constant (0.003 m). For the entrainment coefficients of ambient water e_w and basal sediment e_si, we applied the formulations proposed by Parker et al. (1987) and Garcia and Parker (1991), respectively.
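As an illustration of one of these closures, the water entrainment relation of Parker et al. (1987) is usually quoted as e_w = 0.075 / sqrt(1 + 718 Ri^2.4), where Ri = R g C_T h / U^2 is the bulk Richardson number. The sketch below assumes this commonly cited form; the function names and the default R = 1.65 (quartz in water) are illustrative choices, not taken from this study.

```python
import math

def richardson_number(U, h, C_T, R=1.65, g=9.81):
    """Bulk Richardson number Ri = R g C_T h / U^2 of the layer-averaged flow."""
    return R * g * C_T * h / U**2

def water_entrainment(Ri):
    """Ambient-water entrainment coefficient e_w as a function of Ri
    (form commonly attributed to Parker et al., 1987)."""
    return 0.075 / math.sqrt(1.0 + 718.0 * Ri**2.4)

# Example: a 100 m thick flow at 5 m/s carrying 0.5 % sediment by volume.
Ri = richardson_number(U=5.0, h=100.0, C_T=0.005)
e_w = water_entrainment(Ri)
```

The coefficient approaches 0.075 for weakly stratified flows (small Ri) and decreases monotonically as stratification strengthens.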
For computational efficiency and numerical stability, a deformed grid approach was adopted to solve Equations 1-3. In this transformed coordinate, the propagating flow head was fixed at the downstream boundary using a Landau transformation (Crank, 1984). The tail of the flow was also fixed at the upstream end of the calculation domain, and thus the grid spacing in the dimensional coordinate space was continuously stretched during calculation, whereas that in dimensionless space remained constant. This scheme was based on Kostic and Parker (2006), and more details regarding the numerical implementation were given by Nakao and Naruse (2017).
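The coordinate transformation can be written compactly. Assuming that the tail is fixed at x = 0 and that the head is located at x = s(t) (the symbols s and the hatted coordinates are introduced here for illustration only), the moving domain 0 ≤ x ≤ s(t) is mapped onto the fixed interval from 0 to 1, and the derivatives transform by the chain rule:

```latex
\hat{x} = \frac{x}{s(t)}, \qquad \hat{t} = t,
\qquad
\frac{\partial}{\partial x} = \frac{1}{s} \frac{\partial}{\partial \hat{x}},
\qquad
\frac{\partial}{\partial t} = \frac{\partial}{\partial \hat{t}}
  - \frac{\hat{x}}{s} \frac{\mathrm{d}s}{\mathrm{d}t} \frac{\partial}{\partial \hat{x}}
```

A uniform grid in the transformed coordinate therefore corresponds to a dimensional grid spacing equal to s(t) times the dimensionless spacing, which stretches as the head advances.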
Figure 3. Model input parameters. The initial condition of the turbidity current is assumed to be a suspended sediment cloud that is H_0 in height and l_0 in length. The initial sediment concentrations C_1 to C_4 and the basin slope S_l are to be specified for the calculation. These seven input parameters are to be reconstructed by inverse analysis.

Model input parameters and topographic settings
In this study, a turbidity current was assumed to occur from a cloud of suspended sediment (height H_0, length l_0). The initial flow velocity was set to 0, and the sediment of the ith grain-size class was considered to be initially homogeneously distributed in the suspension cloud at the concentration C_i (Fig. 3). The suspended sediment cloud was located at the upstream end of the calculation domain, where the slope gradient was 0.1. This steep slope extended for 5.0 km and then transitioned to a gently sloping basin plain (gradient S_l) in the downstream region. The total length of the calculation domain was 100 km. In summary, the number of initial conditions required for the forward model calculation was three (H_0, l_0 and S_l) plus the number of grain-size classes.
The optimization of the weight coefficients of the NN is then conducted to reduce the mean square of the difference between the true conditions and the output values of the NN. If the number of training datasets is sufficiently large, the trained NN should be able to estimate the paleo-hydraulic conditions from the data of ancient turbidites (Fig. 1). In other words, an empirical relationship between the numerical results and the model input parameters is explored in this method, and the discovered relationship is used for inverse modeling of turbidity currents. The details of these procedures are described below.

Production and preprocessing of training and test data sets for supervised machine learning
We conducted iterative calculations using the forward model and accumulated data to train and validate the inverse model.
To investigate the appropriate amount of data for training the inverse model, we conducted 500 to 3500 iterations of the forward model calculation. To verify the performance of the trained model, 300 test datasets were also generated numerically, independent of the training data.
Model input parameters that are subject to inversion are required to produce the training and test data by the forward model calculation (Fig. 3). In this study, the model inputs are the initial flow height H_0, the initial flow length l_0, the initial sediment concentration C_i for the ith grain size class, and the basin slope S_l. These model parameters are generated as uniform random numbers within a certain range, and the range is chosen according to the target of the inverse analysis. Since this study is aimed at field-scale analysis, the following ranges are chosen. Both the initial depth and the initial length of the suspended cloud range from 50 to 600 m. The sediment concentration for each grain size class ranges from 0.01% to 1.0%. The number of grain size classes N is four, and the representative grain diameters are 1.5, 2.5, 3.5 and 4.5 phi. The inclination of the basin plain where the turbidites are expected to form ranges from 0 to 1.0%.
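The sampling of model inputs described above can be sketched as follows. This is a minimal illustration, assuming that concentrations and slope are stored as fractions (0.01% = 1e-4) and that each parameter is drawn independently; the dictionary keys, the function name and the seeds are hypothetical.

```python
import numpy as np

# Ranges quoted in the text: (minimum, maximum) of each model input parameter.
PARAM_RANGES = {
    "H0": (50.0, 600.0),   # initial flow height [m]
    "l0": (50.0, 600.0),   # initial flow length [m]
    "C1": (1e-4, 1e-2),    # concentration of the 1.5 phi class (0.01 % to 1.0 %)
    "C2": (1e-4, 1e-2),    # 2.5 phi class
    "C3": (1e-4, 1e-2),    # 3.5 phi class
    "C4": (1e-4, 1e-2),    # 4.5 phi class
    "Sl": (0.0, 0.01),     # basin slope (0 to 1.0 %)
}

def sample_inputs(n_runs, seed=0):
    """Draw n_runs parameter sets as independent uniform random numbers.
    Returns an array of shape (n_runs, 7), columns ordered as PARAM_RANGES."""
    rng = np.random.default_rng(seed)
    lo = np.array([r[0] for r in PARAM_RANGES.values()])
    hi = np.array([r[1] for r in PARAM_RANGES.values()])
    return rng.uniform(lo, hi, size=(n_runs, lo.size))

training_inputs = sample_inputs(3500, seed=1)
test_inputs = sample_inputs(300, seed=2)   # generated independently of training
```

Each row of the resulting arrays would be passed to one run of the forward model.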
Each run of the forward model calculation is initiated with the given model input parameters, and is terminated when the flow head reaches the downstream end or when a sufficiently long time period (1.2 × 10^5 s) has elapsed. As a result of the calculation, the forward model outputs the volume-per-unit-area of sediment for all grain size classes over the 100 km-long calculation domain. The inverse model estimates the model input parameters from the resultant spatial distribution of the granulometric characteristics of the deposits. However, in natural outcrops, it is unlikely that the entire distribution of the turbidite beds would be exposed. Therefore, we limit the length of the sampling window in the calculation domain, and only the sediment data contained in this window are extracted for both training and testing. The upstream end of the sampling window is set at the transition point between the steep slope and the basin plain (5 km from the upstream end), and the length of the window is varied from 1 to 30 km to evaluate the extent of data required for the inverse analysis.
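Extracting the sampling window can be sketched as below. The sketch assumes that the forward-model output has been stored on a fixed grid with 5 m spacing, which is the spacing quoted later for the data grids, and that the window contains exactly length/dx points; the function and argument names are hypothetical.

```python
import numpy as np

def extract_window(deposit, dx=5.0, start=5000.0, length=10000.0):
    """Cut the sampling window out of a forward-model result and flatten it.

    deposit : array (N, n_grid), volume-per-unit-area of each grain-size class
    dx      : grid spacing [m] (assumed fixed output grid)
    start   : upstream end of the window [m] (slope-basin transition at 5 km)
    length  : window length [m] (1 to 30 km in the text)
    """
    i0 = int(round(start / dx))
    m = int(round(length / dx))                 # M grid points in the window
    return deposit[:, i0:i0 + m].reshape(-1)    # N x M values for the NN input

# Hypothetical deposit: 4 classes on a 100 km domain with 5 m spacing.
deposit = np.zeros((4, 20000))
x_input = extract_window(deposit)
```

With a 10 km window this gives M = 2000 grid points and N × M = 8000 input values per turbidite.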
Before the model input parameters are given to the NN, all values are normalized between 0 and 1 using the following equation: I*_i = (I_i − I_min,i) / (I_max,i − I_min,i), where I*_i and I_i denote the ith normalized and original input parameters, respectively. I_max,i and I_min,i are the maximum and minimum values used for generating the ith input parameter, respectively. This min-max normalization is applied to consider all parameters at equal weights, because the ranges of the initial flow conditions differ significantly between parameters.
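A minimal sketch of this min-max normalization and of its inverse, which would be needed to convert NN outputs back to physical units, is given below. The function names and the example values are illustrative assumptions.

```python
import numpy as np

def normalize(I, I_min, I_max):
    """Min-max normalization: maps each parameter from [I_min, I_max] to [0, 1]."""
    return (I - I_min) / (I_max - I_min)

def denormalize(I_star, I_min, I_max):
    """Inverse mapping, converting normalized values back to physical units."""
    return I_star * (I_max - I_min) + I_min

# Ranges used for generating the seven inputs (H0, l0, C1 to C4, Sl).
I_min = np.array([50.0, 50.0, 1e-4, 1e-4, 1e-4, 1e-4, 0.0])
I_max = np.array([600.0, 600.0, 1e-2, 1e-2, 1e-2, 1e-2, 0.01])

I = np.array([325.0, 600.0, 1e-4, 5e-3, 1e-2, 2e-3, 0.005])
I_star = normalize(I, I_min, I_max)
```

After normalization, a 1% error in flow height and a 1% error in concentration contribute equally to the loss, even though their raw magnitudes differ by several orders.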

Structure of NN
The artificial NN is used as the inverse model to reconstruct flow conditions from the depositional architecture. We input the spatial distribution of the volume-per-unit-area of multiple grain size classes of a turbidite into the NN, which outputs the values of the initial flow conditions and the basin slope. In this study, we use a fully connected NN that has four hidden layers. The volume-per-unit-area of N grain-size classes of sediment deposited on M spatial grids in the sampling window is given to the input nodes of the NN. Thus, the total number of NN input nodes is N × M. The number of nodes in all hidden layers is set to 2000 in this study.
The NN is expected to output the model input parameters (i.e., the initial flow conditions and the basin slope), and therefore, the number of nodes in the output layer is equal to the number of input parameters for the forward model, which is seven here (the initial flow length, depth, sediment concentrations and the basin slope).
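The layer structure described above can be sketched in plain NumPy. Several points are assumptions made for illustration: the hidden-layer activation (ReLU) is not specified in this section, the 2000 nodes are read as 2000 per hidden layer, and the output layer is taken to be linear. Glorot uniform initialization is the scheme named in the training description.

```python
import numpy as np

def glorot_uniform(rng, n_in, n_out):
    """Glorot (Xavier) uniform initialization of a weight matrix."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def init_network(n_input, n_hidden=2000, n_layers=4, n_output=7, seed=0):
    """Weights and biases of a fully connected NN with n_layers hidden layers."""
    rng = np.random.default_rng(seed)
    sizes = [n_input] + [n_hidden] * n_layers + [n_output]
    return [(glorot_uniform(rng, a, b), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Forward pass: ReLU hidden layers (an assumption) and a linear output."""
    for W, b in params[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = params[-1]
    return x @ W + b

# Small sizes for a quick check; the text uses 2000 hidden nodes and N x M inputs.
params = init_network(n_input=4 * 50, n_hidden=64)
y = forward(params, np.ones((3, 4 * 50)))
```

Each row of `y` holds seven values, one per normalized model input parameter.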

Training the inverse model
To develop the inverse model, supervised training is conducted using the artificial dataset produced by the forward model calculation. First, the artificial dataset is randomly split into training and validation datasets to detect overfitting during the training process. The ratio of the validation dataset is set to 0.2, so that 80% of the artificial dataset is used for training. The model input parameters used for producing the training and validation sets are regarded as the teacher data to train and evaluate the model.
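The random 80/20 split can be sketched as follows; the function name, the seed and the placeholder arrays are hypothetical.

```python
import numpy as np

def train_validation_split(X, Y, validation_ratio=0.2, seed=0):
    """Randomly split paired arrays into training and validation subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(round(validation_ratio * len(X)))
    val, train = idx[:n_val], idx[n_val:]
    return X[train], Y[train], X[val], Y[val]

# Placeholder arrays standing in for 3500 deposits (X) and their inputs (Y).
X = np.arange(3500 * 2, dtype=float).reshape(3500, 2)
Y = np.arange(3500, dtype=float).reshape(3500, 1)
X_tr, Y_tr, X_va, Y_va = train_validation_split(X, Y)
```

The same permutation indexes both arrays, so each deposit stays paired with the model inputs that produced it.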
The methodology applied for training the NN is as follows. The mean squared error (MSE) is adopted as the loss function because the supervised training of the NN in this study is classified as a regression problem (Specht, 1991), and MSE is a common loss function for regression (Bishop, 2006; Hastie et al., 2009; Shalev-Shwartz and Ben-David, 2014). Before training, all weight coefficients of the NN are randomly initialized using the Glorot uniform distribution (Glorot and Bengio, 2010). The backpropagation algorithm (Rumelhart et al., 1986) is used to calculate the derivative of this error metric for each connection between the nodes, and the stochastic gradient descent method (SGD) with Nesterov momentum (Nesterov, 1983) is used for optimizing the weight coefficients of the NN to minimize the difference between the model predictions and the teacher datasets.

Other optimization methods, such as AdaGrad (Duchi et al., 2011), RMSprop (Tieleman and Hinton, 2012) and AdaDelta (Zeiler, 2012), have been tested, but SGD shows the best performance in this case. Dropout regularization (Srivastava et al., 2014) is applied for each epoch to reduce overfitting and to improve the generalization ability of the NN. One training epoch, which refers to one cycle through the full training dataset, is repeated until the loss function of the validation dataset converges.
Several hyperparameters should be specified for the training of the NN. Specifically, the dropout rate, the learning rate, the batch size, the number of epochs, and the momentum are adjusted manually after repeated trial and error. To perform an optimization calculation with SGD, the batch size and the learning rate are set to 32 and 0.02, respectively, and the value 0.9 is chosen for the momentum. The dropout rate for regularization is 0.5.
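The optimizer settings quoted above can be illustrated with a minimal sketch of mini-batch SGD with Nesterov momentum. The update rule below is one common formulation (velocity first, then a look-ahead step); whether it matches the implementation used in this study is an assumption. A linear model with an MSE loss stands in for the NN, the data are synthetic, and dropout is omitted.

```python
import numpy as np

def sgd_nesterov_step(w, v, grad, lr=0.02, momentum=0.9):
    """One SGD update with Nesterov momentum: the velocity is updated first,
    and the weights then move by the look-ahead step momentum * v - lr * grad."""
    v = momentum * v - lr * grad
    w = w + momentum * v - lr * grad
    return w, v

def mse_and_grad(w, X, y):
    """Mean squared error of a linear model X @ w and its gradient."""
    r = X @ w - y
    return np.mean(r**2), 2.0 * X.T @ r / len(y)

# Toy regression problem standing in for the NN (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(640, 3))
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true

w, v = np.zeros(3), np.zeros(3)
for epoch in range(50):
    order = rng.permutation(len(X))
    for start in range(0, len(X), 32):           # batch size of 32
        batch = order[start:start + 32]
        _, g = mse_and_grad(w, X[batch], y[batch])
        w, v = sgd_nesterov_step(w, v, g)
final_loss, _ = mse_and_grad(w, X, y)
```

For the NN itself, `grad` would be supplied by backpropagation, and the loop would stop once the validation loss no longer improves.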

Testing the inverse model
The performance of the inverse model is tested using a set of 300 data that are produced independently of the training and validation datasets. The inversion precision for each model input parameter is evaluated by the root mean square error (RMSE) and the mean absolute error (MAE) of the prediction. These error metrics are computed both for raw values and for values normalized by the true values, and are used to evaluate the model. Moreover, the bias of prediction (i.e., the mean deviation of the model predictions from the true input parameters) is used to describe the accuracy of the inversion.
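These three error metrics can be computed per parameter as in the following sketch; the function name and the example arrays are hypothetical.

```python
import numpy as np

def inversion_errors(pred, true):
    """RMSE, MAE and mean bias of each model input parameter.

    pred, true : arrays (K, J) for K test datasets and J parameters.
    """
    diff = pred - true
    return {
        "rmse": np.sqrt(np.mean(diff**2, axis=0)),
        "mae": np.mean(np.abs(diff), axis=0),
        "bias": np.mean(diff, axis=0),   # positive values mean overestimation
    }

true = np.array([[100.0, 0.004], [200.0, 0.006]])
pred = np.array([[110.0, 0.003], [190.0, 0.005]])
errors = inversion_errors(pred, true)
```

In this example the first parameter has an RMSE of 10 but zero bias, because the over- and underestimates cancel, whereas the second is consistently underestimated.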
Two additional tests are conducted to verify the robustness of the inverse model, which is significant for the applicability of the model to field datasets. The results of these tests are evaluated by the average of the normalized RMSE over all model input parameters. Here, I^p_jk and I_jk denote the predicted and the original values of the jth model input parameter for the kth test dataset, respectively, and J and K are the numbers of model input parameters and test datasets, respectively.
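One plausible reading of this metric is sketched below: the relative error of each prediction is taken with respect to the true value, the RMSE is computed over the K test datasets for each parameter, and the result is averaged over the J parameters. The exact definition used in this study is assumed here, not confirmed.

```python
import numpy as np

def average_normalized_rmse(pred, true):
    """Average over J parameters of the RMSE of relative errors over K datasets.
    Assumed form: (1/J) * sum_j sqrt((1/K) * sum_k ((Ip_jk - I_jk) / I_jk)^2).
    """
    rel = (pred - true) / true            # requires non-zero true values
    return np.mean(np.sqrt(np.mean(rel**2, axis=0)))

true = np.array([[100.0, 0.004], [200.0, 0.005]])
pred = np.array([[110.0, 0.004], [180.0, 0.005]])
score = average_normalized_rmse(pred, true)
```

A single scalar of this kind makes it straightforward to plot inversion quality against a noise level or a sampling rate.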
First, noise is artificially added to the test data to evaluate the robustness of the inversion results against measurement errors. Under natural conditions, measurement errors in the thickness and grain size analysis of turbidites, as well as the local topography, affect these results. If the results of the inverse analysis changed significantly due to such errors, our method would not be suitable for application to field data. To investigate this, we add normal random numbers to the volume per unit area at each grid point in the test data at various rates, and we observe how much influence the noise has on the inverse analysis results.
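The noise test can be sketched as follows, assuming that the standard deviation of the added noise is proportional to the original value at each grid point. Clipping negative values to zero is an additional assumption made here, since a negative deposit volume is not physical; the function name is hypothetical.

```python
import numpy as np

def add_measurement_noise(deposit, rate, seed=0):
    """Perturb each grid value with zero-mean normal noise whose standard
    deviation is `rate` times the original value (rate=0.5 means 50 %).
    Negative results are clipped to zero (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    noisy = deposit + rng.normal(0.0, 1.0, size=deposit.shape) * rate * deposit
    return np.clip(noisy, 0.0, None)

deposit = np.full((4, 2000), 0.01)     # placeholder uniform deposit [m]
noisy = add_measurement_noise(deposit, rate=0.5)
```

Repeating the inversion for a range of `rate` values gives the relationship between noise level and inversion error.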
The second test on the inverse model is to perform a subsampling of the grid points in the test data. Outcrops are not continuous over tens of kilometers, so that the thickness and the grain size distribution of a turbidite in the interval between outcrops can only be obtained by interpolation. To simulate this situation, the grid points in the test datasets are randomly removed in this test, and the volume-per-unit-area at the removed grid points is linearly interpolated. By varying the rate at which grid points are removed, this test also allows us to estimate the average interval of the outcrops that is necessary for conducting the inverse analysis. For example, if 90% of the grid points set at 5 m intervals are removed, and the inverse analysis is conducted on the remaining 10%, the average distance between the grid points is 50 m. The outcrop spacing required to obtain reasonable results of inverse analysis should be estimated in this way before the method is applied to the actual field.
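The subsampling test can be sketched as follows. The sketch keeps a fixed fraction of randomly chosen grid points and rebuilds the others with `numpy.interp`, which holds the end values constant outside the retained range; that treatment of the window ends is an assumption, as are the function names.

```python
import numpy as np

def subsample_and_interpolate(x, values, rate, seed=0):
    """Keep a random fraction `rate` of grid points and rebuild the removed
    ones by linear interpolation between the retained points.
    x : (M,) grid coordinates; values : (N, M) volume-per-unit-area."""
    rng = np.random.default_rng(seed)
    n_keep = max(2, int(round(rate * x.size)))
    keep = np.sort(rng.choice(x.size, size=n_keep, replace=False))
    # np.interp holds the end values constant outside the retained range.
    return np.vstack([np.interp(x, x[keep], v[keep]) for v in values])

def mean_outcrop_spacing(dx, rate):
    """Average distance between retained grid points."""
    return dx / rate

x = np.arange(0.0, 10000.0, 5.0)                # 10 km window, 5 m grid
values = np.vstack([1.0 - x / 20000.0] * 4)     # placeholder linear thinning
rebuilt = subsample_and_interpolate(x, values, rate=0.01)
```

With a 5 m grid, `mean_outcrop_spacing(5.0, 0.1)` corresponds to the 50 m example in the text, and a rate of 0.01 corresponds to an average spacing of 500 m.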

Properties of artificial data sets of turbidites
Here, we describe the properties of the artificial turbidite data generated for training and testing the inverse model. Several artificial [...] spatial distribution in bed thickness and grain size of turbidites deposited in the region of the basin plain. Most beds exhibit the typical "top-hat" or "core and drape" shape of turbidites (Hirayama and Nakajima, 1977; Talling et al., 2012; Pantopoulos et al., 2013), in which turbidite beds become thicker in the upstream part of the basin and then thin rapidly downstream of their thickness peak. Thereafter, beds continue over a long distance, gradually decreasing in thickness (Fig. 4). At the same time, the grain size gradually becomes finer downstream. The maximum thickness of beds is 1.27 m on average (standard deviation σ = 1.65 m), and the mean length of the region where sediments with a thickness greater than 1 cm are distributed is 42.0 km (σ = 15.7 km). Each bed is composed of four grain size classes. All distributions of the volume-per-unit-area of the grain size classes are still "top-hat" shaped (Fig. 4b), but the depositional center and the amount of deposition differ for each class depending on its grain size.
When the number of training data exceeds 2500, the improvement in the loss function slows. Regarding the length of the sampling window, the training results are not stable when the sampling window is shorter than 5 km (Fig. 5). On the other hand, the training results are stable when the window length is longer than 10 km, and the results gradually improve as the window length increases. However, extending the window length from 10 km to 30 km results in little improvement of the loss function.
Hereafter, we further investigate the performance of the inverse model trained on 3500 datasets with a 10 km-long sampling window. The history of training indicates that the values of the loss function improve significantly in the first 1000 epochs, and the results continue to improve up to 15,000 epochs (Fig. 6). Eventually, saturation is reached at approximately 20,000 epochs.
The resultant loss function (i.e., the MSE of prediction) is 3.78 × 10^−3 for the training sets and 1.03 × 10^−3 for the validation sets.
Table 1. Errors and bias of the predicted parameters. Prediction errors are expressed by the root mean squared error (RMSE) and the mean absolute error (MAE), and the mean bias is also given. Values of RMSE, MAE and mean bias normalized by the true values are also shown.

Precision and accuracy of inverse analysis
Using 300 test datasets, the performance of the inverse model trained with 3500 datasets and a 10 km-long sampling window is evaluated. The estimated parameters match the true values well, with only slight deviations (Figs. 7, 8; Table 1). R^2 values exceed 0.98 for all parameters. Particularly good agreement is obtained for the estimates of the initial height and the length of the suspended sediment cloud. Values of the normalized RMSE and MAE for these parameters are less than 9% and 6%, respectively. The sediment concentration is also precisely estimated. The normalized RMSE for the sediment concentration ranges from 12 to 16%, which corresponds to only 0.02-0.03 volumetric %. The prediction for the basin slope shows relatively large errors (RMSE is close to 20% and MAE is 11.7%), but these errors correspond to only 0.03% of slope. Focusing on the bias of the estimates, all estimated values except for the basin slope tend to be slightly smaller than the true values, whereas the predicted values of the basin slope tend to be larger (Fig. 8). The values of the bias, however, range only from 2 to 12% of the original value.

The forward model is calculated again using the reconstructed values to examine the influence of the estimation error of the model input parameters on the predicted flow behavior (Fig. 9). The chosen test values deviate from the true conditions, as indicated by the RMSE value of 0.27 (Table 2), but the time evolution of the flow characteristics agrees very well with that calculated from the true values (Fig. 9). When comparing the velocity and concentration of the flow at 10 km from the upstream end, the discrepancy between the calculation results using the reconstructed and the original parameters is less than 5% for both quantities.

Tests for robustness against noise and subsampling on input data
The test data with various amounts of normal random noise are analyzed to verify the robustness of the inverse model. Even when the standard deviation of the normal random numbers given as measurement errors is set to approximately 200% of the value of the original data, only a small effect is observed in the normalized RMSE of the results of the inverse analysis (Fig. 10). The RMSE values gradually increase when the standard deviation of the errors exceeds 50%, but there is no rapid increase in the RMSE of the results at any particular threshold.
Similarly, using subsampled data obtained by extracting some of the spatial grids from the original data, we conduct an inverse analysis of the test datasets. The results show that there is little influence on the RMSE values of the inverse analysis of the test datasets when the sampling rate of grids is greater than 1% (Fig. 11). The RMSE values gradually increase when the sampling rate falls below 1%, and the RMSE becomes extremely high when the rate drops below 0.4%.

Performance of inverse model
The evaluation of the inverse model for turbidity currents using the test datasets implies that this model can accurately reconstruct the flow characteristics of turbidity currents from the spatial distribution of the thickness and grain size of turbidites (Figs. 7 and 8). The biases of the reconstructed values relative to the true input parameters are also very small and thus should not pose a serious issue when the method is applied to actual field data.
The inverse model not only reconstructed the initial conditions of turbidity currents accurately, but the time evolution of the flow behavior predicted from the reconstructed conditions was also sufficiently accurate and precise. In the forward model calculations using predicted model input parameters that deviate relatively strongly from the true values (Table 2), the time evolution of the velocity and the thickness of the flow does not deviate significantly from the results obtained using the true values (Fig. 9). Turbidity currents have a mechanism called self-acceleration, which is caused by erosion and the associated increase of the flow density (Parker et al., 1986; Naruse et al., 2007; Sequeiros et al., 2009). Therefore, even slight differences in the initial conditions of the flow can lead to very different time evolutions of the flow parameters. However, the results of this test imply that the accuracy of the inverse analysis in this study is sufficient to avoid such a drastic change in the flow behavior.

Applicability to field-scale problems
To apply this method to outcrops, the extent of the area that should be surveyed to collect data and the interval between outcrops should be determined. The tests with different sizes of sampling windows suggest that the survey region should extend more than 10 km from the proximal region (Fig. 5). The loss function (i.e., the MSE of the estimates of the parameters) decreases as the length of the sampling window increases, and the best result is obtained at the 10 km-long window. Regarding the interval of the outcrops, the test results for sampling rates of more than 1.0%, with interpolation for data at non-sampled grids, are not inferior to those of the full sample. Since the training data used in this study are computed on 5 m-spaced grids, extracting data from these grids with a 1.0% probability is equivalent to conducting an inverse analysis from outcrop data that are distributed at 0.5 km intervals on average. Although the RMSEs of the model prediction certainly increase when the sampling rate decreases below 1.0%, the RMSE values do not drastically worsen until 0.5%. Therefore, even if the outcrop spacing is about 1 km, it should be possible to obtain reasonable estimates of the flow characteristics.

These requirements for accurate inversion are attainable in the actual field. For example, Hirayama and Nakajima (1977) correlated individual turbidites of the Pleistocene Otadai Formation distributed in the Boso Peninsula, Japan, on the basis of key tuff beds. Their correlation covered a region over 30 km long with 33 outcrops. Thus, the average interval between outcrops was approximately 1 km. Amy and Talling (2006) [...] and frequency (Hesse, 1974; Tokuhashi, 1979, 1989; Amy et al., 2000, 2004). Furthermore, Bartolini et al. (1972) [...] submarine fans in different areas (Bornhold and Pilkey, 1971; Pilkey et al., 1980). In summary, although the method proposed in this study requires fairly high-resolution data of individual turbidite beds correlated over a long distance, such conditions can be achieved in ancient geological records as well as in modern seafloor surveys.
Besides these outcrop conditions, measurement errors in the field are another important factor for application. The test results suggest that the proposed inverse model of this study is very robust against random noise; random errors in the measured data

Comparison with previous methodologies
In existing inverse analysis methods of turbidity currents, the difference in depositional characteristics between the outputs of the forward model and the field observation is quantified as the objective function, and the initial and boundary conditions of the forward model are determined by conducting optimization calculations to minimize the objective function (e.g., Nakao and Naruse, 2017). This is because models of turbidity currents are generally nonlinear and are difficult to linearize, especially when considering the entrainment of the basal sediment (Parker et al., 1986). Although the actual computational load depends on the choice of algorithm, this type of optimization calculation generally consists of multiple steps, and each step depends on the results of the previous calculation. Thus, the entire optimization procedure is difficult to parallelize. For instance, the kriging-based surrogate management method (Lesshafft et al., 2011) and the genetic algorithm (Nakao and Naruse, 2017) have been used to optimize the objective function for inversion of turbidity currents. In these methods, multiple calculations are conducted in each calculation step (generation), and the distribution of the objective function in the parametric space is iteratively estimated. Although the computations within each generation can be parallelized in this kind of algorithm, the computation of the next generation depends on the results of the previous generation, and therefore, the entire computation process cannot be parallelized. Thus, if the computational load of the forward model is high, the inverse analysis takes an unrealistic amount of time. Parkinson et al. (2017) applied the adjoint method with a gradient-based optimization algorithm.
Although the differentiation of the shallow-water model by the adjoint method greatly reduces the load of the gradient calculation, this approach still requires an iterative calculation for optimization. Thus, the sediment entrainment process is omitted from their model. In addition, gradient-based optimization tends to suffer from dependence on initial values and difficulty in escaping from local optima. For this reason, the results of their inverse analysis of turbidites were quite unrealistic. Another potential approach to optimization is the Markov chain Monte Carlo (MCMC) method, but even with this method, repetition of the forward model calculation is unavoidable, since MCMC usually requires on the order of 10^4 or more evaluations of the objective function, which cannot be parallelized. The shallow water model of unsteady turbidity currents is probably not suitable as the forward model in these methods because of its computational load.
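The optimization-based inversion discussed above can be summarized schematically as follows. The notation is introduced here only for illustration and is not taken from the cited studies: θ denotes the vector of initial and boundary conditions, F(θ) the depositional characteristics predicted by the forward model, d_obs the corresponding field observations, and Φ the update rule of whichever optimization algorithm is chosen. The squared misfit is an assumed form of the objective function.

```latex
\begin{align}
  \hat{\theta} &= \arg\min_{\theta} J(\theta), &
  J(\theta) &= \left\lVert F(\theta) - d_{\mathrm{obs}} \right\rVert^{2}, \\
  \theta^{(k+1)} &= \Phi\!\left(\theta^{(k)},\, J\!\left(\theta^{(k)}\right)\right), &
  k &= 0, 1, 2, \dots
\end{align}
```

Because each θ^(k+1) requires J(θ^(k)), and hence a completed forward model run, the iterations must be executed one after another; only the evaluations within a single step can run concurrently.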
The approach proposed in this study is superior to existing methods in terms of applicability to the field, as it allows computationally demanding models to be applied as forward models. The general relationship between the bed and the input parameters is learned by the NN, rather than the input parameters of the numerical model being adjusted to reproduce the characteristics of specific individual beds. The objective function used in the training of this NN is not the difference between the features of the sediment, but the precision of the inverse analysis results themselves. The most computationally demanding part of the inverse analysis method proposed here is the generation of the training data for the NN. However, since the computations of the forward models are completely independent of each other, the generation of the training data can be conducted in parallel. Thus, our method enables us to easily prepare a large amount of training data by using PC clusters, even for very computationally demanding forward models. In addition, the number of calculations required for training is not as high as in other methods, at only approximately 3,000. It is also advantageous that the proposed method enables us to perform various tests of the robustness or precision of the inversion before application to field examples, because the NN outputs the results of inverse analysis extremely fast. For these reasons, we consider that this study successfully generated an inverse model using the shallow water model for unsteady turbidity currents that can be applied to the field.
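The independence of the forward model runs can be illustrated with a minimal sketch. It is not the implementation used in this study: `toy_forward_model`, the two parameter names, and their ranges are hypothetical placeholders, and a thread pool stands in for a PC cluster.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical prescribed ranges for two input parameters (illustrative values only).
PARAM_RANGES = {
    "initial_height_m": (50.0, 600.0),
    "basin_slope": (0.0, 0.01),
}


def toy_forward_model(params):
    """Placeholder for an expensive forward model; returns a fake thickness profile."""
    h0 = params["initial_height_m"]
    slope = params["basin_slope"]
    # Thickness decays downstream; purely illustrative, not a physical model.
    return [h0 * 0.001 * (1.0 - 10.0 * slope) / (1.0 + x) for x in range(10)]


def run_one(seed):
    """One independent sample: draw parameters from the ranges, then run the model."""
    rng = random.Random(seed)  # per-run generator, so runs share no state
    params = {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
    return params, toy_forward_model(params)


def generate_dataset(n_runs, max_workers=4):
    """Run all samples concurrently; no run needs the result of any other run."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order of the seeds, regardless of finish order.
        return list(pool.map(run_one, range(n_runs)))
```

Each (parameters, deposit) pair depends only on its own seed, so the same runs could be distributed over separate processes or cluster nodes without coordination. For a CPU-bound forward model, a process pool or a cluster scheduler would be the more realistic choice than threads.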

Limitations and future tasks
The inverse model proposed in this study has several limitations. Inevitably, the accuracy of the inverse analysis is governed by the validity of the forward model that generates the training data. The present implementation of the inverse model uses the one-dimensional shallow water equations as the forward model, but this model is likely to be applicable only to sedimentary basins that are laterally constrained or to the inside of submarine channels. The shallow water equation model of Parker et al. (1986) used in this study has been widely accepted, but doubts have recently been raised about, for example, its formulations of the entrainment rates of basal sediment (Dorrell et al., 2018) and ambient seawater (Luchi et al., 2018). In addition, the assumption of a lock exchange condition for the occurrence of turbidity currents may not be appropriate in some situations.

However, it is relatively easy to solve these problems. Without changing the framework of the proposed method, we can adapt to any situation by changing the forward model used to generate the training data. For processes such as sediment transport, it is easy to revise the model to incorporate state-of-the-art knowledge. By adopting computationally demanding models, inverse analysis using 2-D and 3-D forward models may be possible. In future research, these issues should be addressed, and the methodology should be applied to actual field examples.

Conclusions
This study implemented an inverse model that reconstructs the flow characteristics of turbidity currents from their deposits using an NN, and verified its effectiveness at the field scale. We assumed that turbidity currents originate from suspended sediment clouds, which flow down a steep slope in a submarine canyon to a gently sloping basin plain. The inverse model attempts to reconstruct seven model input parameters (height and length of the initial suspended sediment cloud, sediment concentration of four grain size classes, and slope of the basin plain) from the thickness and grain size distribution of the turbidite deposited on the basin plain. The forward model using one-dimensional shallow water equations was used to produce training data sets with random conditions in prescribed ranges. The NN was trained using the generated data to develop the inverse model. Thereafter, test data generated independently from the training data were analyzed to verify the performance of the inverse model.
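The workflow summarized above can be written compactly as follows. This is a schematic sketch under stated assumptions, not the exact formulation of this study: uniform sampling within the prescribed ranges and a mean squared error loss are assumed here, and the symbols are introduced only for illustration (θ_i: the seven input parameters of run i; F: the forward model; d_i: the resulting thickness and grain size distribution; f_w: the NN with weights w; N: the number of training data sets).

```latex
\begin{align}
  \theta_i &\sim \mathcal{U}\!\left(\theta_{\min}, \theta_{\max}\right), &
  d_i &= F(\theta_i), \qquad i = 1, \dots, N, \\
  w^{*} &= \arg\min_{w} \frac{1}{N} \sum_{i=1}^{N}
           \left\lVert f_{w}(d_i) - \theta_i \right\rVert^{2}, &
  \hat{\theta} &= f_{w^{*}}\!\left(d_{\mathrm{obs}}\right).
\end{align}
```

Verification then consists of evaluating the same misfit between f_{w*}(d_j) and θ_j on test pairs (θ_j, d_j) that were generated independently and never used to fit w.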

The training and tests conducted on the inverse model yielded the following findings:
1. More than 2000 data sets were required for training to avoid overlearning. Increasing the number of training data sets improves the performance of the inverse model; however, the degree of improvement becomes small once more than 3000 data sets are used.
The symbols L, M and T denote dimensions of length, mass and time respectively. The symbol [1] denotes that the value is dimensionless.