ML23221A402

Engineering Applications of Artificial Intelligence and Machine Learning for Mechanical Systems and Component Performance
ML23221A402
Person / Time
Issue date: 08/11/2023
From: Matthew Homiack, Raj Iyengar, Matrachisia J, Pillai R, Savara A, Starr M, Verzi S, Villareal T
Office of Nuclear Regulatory Research, Oak Ridge, Sandia

The views expressed in this paper are those of the authors and do not reflect the views of the U.S. Nuclear Regulatory Commission. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States. Approved for public release; distribution is unlimited.

This report was prepared as an account of work sponsored by an agency of the U.S. Government. Neither the U.S. Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party's use, or the results of such use, of any information, apparatus, product, or process disclosed in this report, or represents that its use by such third party would not infringe privately owned rights. The views expressed in this paper are not necessarily those of the U.S. Nuclear Regulatory Commission.

Engineering Applications of Artificial Intelligence and Machine Learning for Mechanical Systems and Component Performance

Matthew Homiack1*, John Matrachisia1, Tristan Villarreal1, Aditya Savara1, Raj Iyengar1, Stephen Verzi2, Michael Starr2, and Rishi Pillai3

1 U.S. Nuclear Regulatory Commission, Washington, D.C., U.S.A.

2 Sandia National Laboratories, Albuquerque, New Mexico, U.S.A.

3 Oak Ridge National Laboratory, Oak Ridge, Tennessee, U.S.A.

* Corresponding Author (Matthew.Homiack@nrc.gov)

Abstract

Artificial intelligence (AI) and machine learning (ML) methods are some of the fastest-growing technologies globally and have the potential to enhance efficiency, effectiveness, and decision-making processes for the nuclear industry. This paper explores several recent use cases of AI/ML methods in support of the U.S. Nuclear Regulatory Commission (NRC) staff's safety research efforts for mechanical systems and component performance.

The first use case explored ML to monitor the performance of a system. This study used a full-scope boiling water reactor (BWR) simulator for synthetic data generation. Scenarios were created to induce component malfunctions that may go undetected by operators and could lead to adverse operating conditions. The goal was to support early detection of the malfunctions using ML and thereby provide operators with increased response times. A long short-term memory (LSTM) autoencoder was trained and tested for identifying the anomalies in real-time. The study demonstrated the potential for using ML to monitor system performance.

The second use case explored ML to augment Monte Carlo simulation. For this use case, an interface was developed between open-source ML models and the Extremely Low Probability of Rupture (xLPR) probabilistic fracture mechanics (PFM) code. The xLPR code was used to analyze leak-before-break behavior in a pressurized water reactor piping system subject to cracking. Supervised ML, in the form of random forest regression, was applied to the sample input and output data from the xLPR code to conduct a sensitivity analysis. Through this analysis, the input variables that were most important with respect to the selected quantities of interest (QoIs) were determined both individually in univariate analyses and across all the QoIs in a multivariate analysis. The ML models were then further applied to explore time series prediction as a surrogate model for the xLPR simulation.

The third use case explored ML to overcome sparse data and enable long-term predictions of materials compatibility for nuclear reactor components in molten salt environments. It involved surrogate ML model prediction of values relevant to corrosion and associated uncertainties. This use case involved the application of a piecewise surrogate ML model for predicting the activities of chemical elements in alloys. Such predictions are useful for lifetime assessment of materials compatibility because the activities have been associated with long-term loss of metal due to corrosion attack, particularly loss of chromium. The demonstration showed that surrogate ML models can be used to augment sparse datasets with predicted values for unmeasured or not explicitly calculated data points, with reasonable prediction uncertainties.

Submitted to PSAM 2023 for publication on October 23, 2023. This preprint (draft) is not to be cited or reproduced.

Probabilistic Safety Assessment and Management (PSAM) Topical, October 23-25, 2023, Virtual

Keywords: Artificial Intelligence, Nuclear, Regulation, Research

1. Introduction

AI/ML methods are some of the fastest-growing technologies globally and have the potential to enhance the business of industry, academia, and government. As in other applications, the benefits of AI/ML for engineering generally include increases in efficiency and effectiveness and enhanced decision-making. For instance, AI/ML methods can generate data more efficiently than physical experiments and other forms of synthetic data generation. AI/ML methods can be more effective at processing large or high-velocity data, thereby supporting analysis of highly complex systems. AI/ML methods can also enhance decision-making by identifying patterns and trends and enabling uncertainty quantification.

This paper explores three recent use cases of AI/ML methods in support of the NRC staff's safety research efforts for mechanical systems and component performance in the following areas:

1. system performance monitoring
2. Monte Carlo simulation and PFM for piping integrity analysis
3. overcoming sparse data to enable long-term predictions of materials compatibility of nuclear reactor components in molten salt environments

Sections 2, 3, and 4 present the technical approach, results, and discussion for each use case, respectively.

2. Use Case 1: System Performance Monitoring

The first use case explored ML methods to monitor the performance of a system. Specifically, an anomaly detector was developed using an unsupervised ML method to detect a malfunction from multivariate, sequential data. Synthetic data for training and testing the ML models were generated using a full-scope BWR simulator, which supports modeling of a substantial number of initial conditions, operations, malfunctions, and recording of real-time plant parameters. As a result, the simulator data was suitable for developing realistic scenarios and training various types of ML models for system performance monitoring research.

2.1 Approach

The approach to demonstrate potential application of ML methods for system performance monitoring began with the curation of a robust dataset. Using the simulator as a surrogate for a real BWR, a subset of relevant plant parameters was captured in time-series datasets. After the samples were collected, techniques such as feature selection, averaging, binning, and splitting were applied to ensure that the data was in a form suitable for model training and testing.

The next step was to select the proper ML algorithm. This selection was informed by the characteristics of the data, the problem at hand, and the desired output. For this use case, an anomaly detector was developed with an autoencoder, which uses an unsupervised learning algorithm. An autoencoder relies only on input data; thus, during training the input data is set as the ground truth for the predictions. The neural network compresses the input data into a reduced dimension space and reconstructs the information into the output, which in theory should closely match the input data (Srivastava et al., 2015). As the neural network trains, the error between the predicted output and the input is backpropagated to adjust the network weights using the loss function or, in this case, the mean squared error (MSE). The MSE is based on the residuals between the input and the autoencoder's predicted output. During training, the autoencoder MSE was used as the loss function and the autoencoder was trained to minimize the MSE for the training set. When new datasets similar to the training dataset are fed as inputs to the model, the MSE of the predicted outputs will be low; however, when new datasets quantitatively different from the training dataset are fed as inputs, the MSE will be relatively high.
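The reconstruction error described above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the names `x` and `x_hat` for the input window and its reconstruction, and the sample values, are assumptions introduced here.

```python
import numpy as np

def reconstruction_mse(x, x_hat):
    """Mean squared error between an input window and its reconstruction."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    if x.shape != x_hat.shape:
        raise ValueError("input and reconstruction must have the same shape")
    return float(np.mean((x - x_hat) ** 2))

# Hypothetical window: 3 time steps by 2 features.
window = np.array([[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]])
good_reconstruction = window + 0.1  # close to the input, so the MSE is low
poor_reconstruction = window + 1.0  # far from the input, so the MSE is high

mse_good = reconstruction_mse(window, good_reconstruction)
mse_poor = reconstruction_mse(window, poor_reconstruction)
```

A well-reconstructed window yields a small value, while a window the network cannot reproduce yields a large one, which is the property the anomaly detector relies on.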

Figure 1 shows the general method of a standard autoencoder. The autoencoder has a symmetrical neural network architecture composed of an encoder, a latent space, and a decoder. For this use case, because the input data is sequential, the autoencoder used LSTM neural networks. An LSTM neural network is a type of recurrent neural network that has special gates which can feed output back to themselves and forget information from the previous state (Hochreiter & Schmidhuber, 1997). Due to their nature, LSTM neural networks work well for processing time-series data. With the combination of the LSTM neural network within the structure of an autoencoder, the model can train with a window of multivariate sequential data, such as a time series.

Figure 1: Illustration of a general autoencoder architecture with MSE-based training shown by arrows.
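Training on "a window of multivariate sequential data" implies slicing each recording into fixed-length windows. The sketch below shows one common way to do that; the window length, step, and array contents are assumptions for illustration, and only the count of 7 input features comes from this paper.

```python
import numpy as np

def make_windows(series, window_size, step=1):
    """Slice a (time, features) array into (n_windows, window_size, features)."""
    series = np.asarray(series, dtype=float)
    n_steps = series.shape[0]
    if window_size > n_steps:
        raise ValueError("window_size exceeds the number of time steps")
    starts = range(0, n_steps - window_size + 1, step)
    return np.stack([series[s:s + window_size] for s in starts])

# Hypothetical recording: 10 time steps of 7 plant parameters.
recording = np.arange(70, dtype=float).reshape(10, 7)
windows = make_windows(recording, window_size=4)
```

Each resulting window has the (time steps, features) shape that a sequence model such as an LSTM autoencoder typically expects as one training sample.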

The final step of the approach was to train and evaluate the ML model. The autoencoder was trained using a dataset based on a subset of normal BWR operating conditions and evaluated using datasets of abnormal operating conditions. This step was an iterative process focused on optimizing the neural network by performing a sensitivity study on the hyperparameters (e.g., epochs, batch sizes, learning rates, and activation functions), preprocessed data, and the overall architecture of the neural network. After multiple iterations of the autoencoder, the model was successfully trained and then evaluated for its anomaly detection capabilities.

2.2 Results and Discussion

The performance of the anomaly detector built using the ML models was evaluated by comparing the MSE from the normal operating condition data against the MSE from the abnormal operating condition data. The optimized neural network had 251,783 trainable parameters and was trained on 14,878 training samples for 100 epochs using 7 input features. The input features were a subset of relevant plant parameters that could be available to a typical BWR instrumentation and control system, such as the system flow rates, temperatures, pressures, and water and power levels.

Figure 2 shows the MSE results from the training dataset. The MSE reached a maximum value of 3.23 with an average value of 3.03. The maximum value is critical for determining the threshold for detecting an anomaly because it bounds the MSE observed under normal operating conditions, against which the MSE from the abnormal operating conditions can be compared.

Figure 2: Training dataset results based on normal operating conditions.

Figure 3 shows the MSE results from four test cases where a simulated equipment malfunction was introduced at 10, 15, 20, and 25 minutes, respectively. The results are compared to the MSE from the normal operating conditions. The simulated malfunction was a recirculation pump runaway where the recirculation flow rate increases unexpectedly resulting in an increase in core power. As the results show, when the malfunction was introduced, the MSE markedly increased giving a clear indication that there was a deviation from normal operating conditions.

Figure 3: Autoencoder MSE for normal operating conditions vs. abnormal operating conditions produced via simulated equipment malfunctions at different points in time.
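The thresholding idea behind Figures 2 and 3 can be sketched as follows: take the maximum MSE seen under normal conditions as the threshold, then report the first test sample that exceeds it. All values below are hypothetical and chosen only to echo the magnitudes reported in this section; the function name is introduced here for illustration.

```python
def first_anomaly_index(mse_series, threshold):
    """Return the index of the first MSE value above the threshold, or None."""
    for i, value in enumerate(mse_series):
        if value > threshold:
            return i
    return None

# Hypothetical MSE values from normal operation, used to set the threshold.
training_mse = [3.01, 3.05, 2.98, 3.23, 3.02]
threshold = max(training_mse)

# Hypothetical test trace; the jump at index 4 stands in for a malfunction.
test_mse = [3.02, 3.04, 3.00, 3.10, 5.80, 7.40]
detected_at = first_anomaly_index(test_mse, threshold)
```

In practice a margin above the training maximum, or a requirement for several consecutive exceedances, might be preferred to reduce false alarms; the study does not state which rule was applied.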

A fifth test case was also run to demonstrate that the neural network was properly trained. For this test case, simulated equipment malfunctions were introduced at several intervals with different severities. Figure 4 shows the results indicating the correct timing and severity of the malfunctions.

Figure 4: Autoencoder MSE for normal operating conditions vs. abnormal operating conditions produced by simulated equipment malfunctions at various severities and intervals.

Overall, the results of the trained autoencoder demonstrate that some aspects of system performance can be monitored using ML methods. This research could be expanded by further exploring the capabilities and limitations of the autoencoder and by enhancing the anomaly detector by using other ML methods to both detect and classify malfunctions. Although the results of the induced malfunctions were as expected, the model was trained on a small subset of normal operating condition data, thus more training would be needed to fit a wider range of normal operating conditions so that the model would not detect these conditions as anomalous.

3. Use Case 2: Monte Carlo Simulation Support

In PFM, stochastic analyses enabled by Monte Carlo simulation are used to better understand the various uncertainties when predicting the load-carrying capabilities of components containing cracks. The second use case explored ML methods to augment PFM simulations by conducting sensitivity analysis and time-series prediction via surrogate models.

3.1 Approach

This use case consisted of two separate but related applications of ML coupled with the xLPR code. xLPR is a PFM code for piping applications that was developed jointly by the NRC Office of Nuclear Regulatory Research and the Electric Power Research Institute (Homiack et al., 2021). This code was used to simulate the potential behaviors of a preexisting circumferential crack in an unmitigated, Westinghouse-designed, pressurized water reactor pressurizer surge line weld subject to primary water stress corrosion cracking (PWSCC).

There were 56 inputs in the simulation, including both constant and uncertain or probabilistic inputs. These inputs covered such areas as the geometry of the pipe, operating conditions, crack size, applied stresses (e.g., welding residual stresses (WRS) defined at 26 points through the thickness of the weld), and material properties (e.g., PWSCC growth rate model parameters). Six outputs or QoIs were selected related to key behaviors of the problem, such as the crack size (e.g., crack depth normalized by the weld thickness) and whether the cracks cause leaks (e.g., occurrence of leak). A full list of all the inputs and outputs is in (Carlson et al., 2022).

In both applications, four separate, off-the-shelf, open-source ML regression methods were explored: (1) linear regression, (2) random forest, (3) gradient boosting, and (4) multilayer perceptron. Each of these methods utilizes supervised ML, where the xLPR simulation data provided the inputs and target outputs for supervision. The ML models were constructed and trained in Python using the scikit-learn library. Linear regression was used to baseline the other models, even though some QoIs were not expected to respond linearly. The ML methods were not optimized for their hyperparameters, and specific values used in each application are detailed next.

Determining the Most Important PFM Analysis Inputs. The first application sought to use ML methods to automate a sensitivity analysis. As described in (Hund et al., 2022), sensitivity analysis focuses on identifying how the input uncertainties contribute to the uncertainty in the QoIs and can help to identify those inputs that explain substantial uncertainty in the model output. For this application, all four regression methods performed similarly; therefore, only the results from the random forest regressor are presented herein. In using the scikit-learn random forest ensemble regression method, the only parameter besides the random seed that needed to be specified was the number of estimators, which was set to 1,000. The permutation importance method from the scikit-learn inspection sub-package was used to rank the relative importance of all the input parameters.
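A minimal sketch of this ranking workflow, using scikit-learn's `RandomForestRegressor` and `permutation_importance`, is shown below. The data are synthetic, the feature names are hypothetical placeholders rather than actual xLPR inputs, and the number of estimators is reduced from the 1,000 used in the study to keep the example quick.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n_samples = 500

# Hypothetical input names; columns 2 and 3 do not affect the response.
names = ["input_a", "input_b", "input_c", "input_d"]
X = rng.normal(size=(n_samples, len(names)))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

# The study used 1,000 estimators; 100 is an assumption to shorten run time.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Shuffle each input in turn and measure the resulting drop in model score.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]

for idx in ranking:
    print(f"{names[idx]}: {result.importances_mean[idx]:.4f}")
```

Because permutation importance only needs a fitted model and a scoring function, the same call could be repeated with any of the four regression methods for comparison.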

Surrogate Modeling for Generating PFM Analysis Outputs. The second application sought to use ML methods to improve the efficiency of the PFM simulation via development and training of a surrogate model that could predict the time series of a specific QoI given a set of input samples from the xLPR code.

At first, the occurrence of leak time series was selected for prediction. All four of the ML regression methods were used to predict the time at which a leak occurs, if a leak occurs, given specific xLPR inputs.

However, because all the ML methods are supervised, and the case of no leak was much more likely in this problem, there were not enough positive leak samples to support such a prediction. Each ML method could be trained using the entire set of 2,000 samples, but none could generalize when trained on only a portion of the data, including when the data was split into 75 percent for training and 25 percent for testing.

Given these limitations, the focus of the study was changed to predicting the normalized crack depth propagation over time. Normalized crack depth propagation was not expected to be linear, thus the linear regression results needed special interpretation beyond a predicted value of 1.0 for the normalized crack depth (i.e., the point at which the crack penetrates through-wall). Neither the gradient boosting nor the multilayer perceptron method is suited for time-series prediction without further research and design considerations. Thus, the results for the surrogate modeling were restricted to the linear regression and random forest methods. For time-series prediction, the random forest method was provided with all the xLPR code inputs as well as the current normalized crack depth as an input, and it was trained to predict the next normalized crack depth.
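The "current depth in, next depth out" scheme can be sketched as an autoregressive rollout. In the example below, `simulate_depth` is a toy stand-in invented for illustration and is not the xLPR code; the single `rate` input, the growth law, and the estimator count are likewise assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def simulate_depth(rate, n_steps, start=0.1):
    """Toy simulator: depth grows geometrically and is capped at 1.0."""
    depth = [start]
    for _ in range(n_steps - 1):
        depth.append(min(1.0, depth[-1] * (1.0 + rate)))
    return np.array(depth)

rng = np.random.default_rng(0)
rates = rng.uniform(0.05, 0.15, size=200)

# Build (inputs, current depth) -> next depth training pairs.
features, targets = [], []
for rate in rates:
    series = simulate_depth(rate, n_steps=40)
    for current, nxt in zip(series[:-1], series[1:]):
        features.append([rate, current])
        targets.append(nxt)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(np.array(features), np.array(targets))

def rollout(model, rate, start, n_steps):
    """Feed each predicted depth back in as the next step's input."""
    depth = [start]
    for _ in range(n_steps - 1):
        nxt = model.predict(np.array([[rate, depth[-1]]]))[0]
        depth.append(float(np.clip(nxt, 0.0, 1.0)))
    return np.array(depth)

predicted = rollout(model, rate=0.10, start=0.1, n_steps=40)
reference = simulate_depth(0.10, n_steps=40)
```

Because each prediction becomes the next input, errors can accumulate along the rollout, which is one reason to compare the predicted series against the reference series step by step.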

3.2 Results and Discussion

Determining the Most Important PFM Analysis Inputs. Following the approach described in Section 3.1, the uncertain inputs in the xLPR code simulation were ranked by importance for each QoI individually (i.e., univariate analysis) and across all the QoIs simultaneously (i.e., multivariate analysis).

As the appropriate number of realizations needed to be determined for the surrogate modeling application, three separate ML models were trained given data from 200; 2,000; and 20,000 xLPR realizations, respectively.

Table 1 lists the top ten most important inputs based on their permutation importance as determined using the random forest ML model for each sample size. Empty cells indicate that the input did not rank in the top ten for the given sample size. All three sample sizes resulted in the WRS at point 1 through the weld thickness being the most important input in the analysis. The results from the 2,000 and 20,000 sample sizes also identified two parameters in the PWSCC growth model (i.e., the component-to-component and within-component variability factors) as being the second- and third-most important inputs in the analysis, respectively. Given the consistency between the results based on the 2,000 and 20,000 sample sizes, the former was determined to be appropriate for the surrogate modeling application.

Table 1: Ranked permutation importance values for the top ten xLPR code inputs across all QoIs (multivariate) using various sample sizes.

Rank | Input Variable Description                                   | Permutation Importance
     |                                                              | 200 Realizations | 2,000 Realizations | 20,000 Realizations
1    | WRS at point 1                                               | 0.5973           | 0.9740             | 1.2146
2    | PWSCC growth model component-to-component variability factor | 0.0280           | 0.2029             | 0.3489
3    | PWSCC growth model within-component variability factor       |                  | 0.1503             | 0.2416
4    | WRS at point 14                                              |                  | 0.0301             |
5    | WRS at point 26                                              | 0.0680           | 0.0268             | 0.0177
6    | WRS at point 22                                              | 0.0501           | 0.0230             |
7    | WRS at point 2                                               | 0.0158           | 0.0176             | 0.1115
8    | WRS at point 23                                              |                  | 0.0163             |
9    | WRS at point 24                                              | 0.0545           | 0.0162             | 0.0162
10   | WRS at point 7                                               | 0.0133           | 0.0158             | 1.2146

Surrogate Modeling for Generating PFM Analysis Outputs. Following the approach described in Section 3.1, the normalized crack depth time series was predicted using three ML methods: (1) linear regression, (2) single-trained random forest, and (3) random forest using a dataset of 2,000 random samples generated by the xLPR code. For the linear regression method, the dataset was split into 75 percent for training and 25 percent for testing; for the random forest method, the split was reversed (i.e., 25 percent for training and 75 percent for testing). Also, the random forest model was parameterized using 250 estimators versus the 1,000 estimators that were used in the application for determining the most important inputs. Reducing the number of estimators in the ensemble allows the random forest model to generalize better.

Figure 5 shows the three ML model predictions vs. the xLPR code predictions, which represent the ground truth. The linear regression results captured the time to through-wall crack penetration (i.e., normalized crack depth value of 1.0), even though they did not fully capture the curvature of the crack depth propagation or the termination at the normalized crack depth value of 1.0. For this reason, the vertical axis range of the linear regression plot is twice that of the other two plots. The single-trained random forest model captured the curvature of the crack depth propagation well; however, it predicted a normalized crack depth of 1.0 sooner than the xLPR code simulation predicted. Lastly, since the random forest regression method is stochastic, 100 random forest models were trained to determine the spread of the predictions. The results show that ML methods can support time-series prediction for determining the time after which a leak will occur given a set of input samples from the xLPR code, particularly in the case of the random forest regression method. A potential benefit of using the random forest method over the linear regression method is that fewer samples were required (i.e., only 25 percent of the dataset was required for training the random forest model versus 75 percent of the dataset required for training the linear regression model).

Figure 5: xLPR code predictions of normalized crack depth time-series vs. ML model predictions using linear regression (left), single-trained random forest regression (middle), and random forest regression (right).
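Because random forest training is stochastic, the spread of predictions can be estimated by refitting with different random seeds, as in the 100-model study described above. The sketch below uses synthetic data, 20 models, and 25 estimators per model; all of these are assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=300)

# Hypothetical query points at which the spread is evaluated.
X_query = np.array([[0.25, 0.5], [0.75, 0.5]])

predictions = []
for seed in range(20):  # the study trained 100 models
    model = RandomForestRegressor(n_estimators=25, random_state=seed)
    model.fit(X, y)
    predictions.append(model.predict(X_query))

predictions = np.array(predictions)        # shape: (n_models, n_queries)
mean_prediction = predictions.mean(axis=0)
spread = predictions.std(axis=0)           # seed-to-seed variability
```

The resulting `spread` reflects only the variability from the training randomness, not the full predictive uncertainty of the surrogate.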

4. Use Case 3: Overcoming Sparse Data

The third use case explored ML methods to overcome sparse data and enable long-term predictions of materials compatibility of nuclear reactor components in molten salt environments. Molten salt reactors (MSRs) for nuclear power generation are a current area of interest in nuclear research and development.

Molten salts based on fluorides and chlorides are being considered across a broad spectrum of MSRs (Delpech et al., 2010, Serp et al., 2014). There is, in general, a lack of advanced materials that are robust against material degradation in these reactors for tens of years. Further, corrosion of structural materials in molten salts is a critical degradation phenomenon that presents a barrier to the technical realization of MSRs. Notably, there are significant gaps in the scientific understanding of the governing mechanisms and correspondingly large gaps in the ability to predict corrosion rates. Multiple corrosion mechanisms can also occur simultaneously, which increases the complexity of determining the degradation kinetics. However, surrogate ML models may help to fill these data gaps.

A key challenge is that certain materials degradation phenomena might only appear after years or tens of years. The routine experiments, modeling, and monitoring of today can only achieve short timescales (e.g., minutes, days, or months) and thus may not be representative of long-term material degradation. Therefore, it is beneficial to develop, validate, and verify computational methods that can be used to predict materials compatibility and the performance of MSRs over their operational timescales. These methods consider model sensitivities and uncertainties and propagate them under varying conditions. Routine, long-term simulations are computationally intensive and hence not practical from an engineering standpoint when explicit physics-based approaches are used. A promising route for addressing these challenges is to enhance multi-scale modeling simulations with surrogate ML models. In general, only a small number of long-term experiments are currently performed, typically on the order of ten (e.g., a study might have 5 different alloy compositions for the same set of corrosion conditions). To overcome this sparsity of data to make long-term predictions, one option is to use a combination of experimental data and ML-augmented simulation data to predict points for conditions or systems that differ from those of the experimental data. This augmentation can be done with consideration of the joint uncertainty probability distributions of the long-term predictions with uncertainty propagation using Bayesian methods. With Bayesian methods, it is possible to first create a surrogate model using best guesses from simulations, and later update the posterior distributions (i.e., final models) each time additional experimental or theoretical data become available (Matera et al., 2019). This strategy is compatible with using surrogate ML models for loss of metals during corrosion processes in MSRs, provided that the surrogate ML model has quantifiable uncertainties. The corrosion rates have been found to be nearly linear relative to chemical elements' thermodynamic activities in one set of studies under MSR conditions (Pillai et al., 2021), particularly with the thermodynamic activity of chromium (Pillai et al., 2015; Pillai et al., 2023). Accordingly, it is useful to investigate whether a surrogate ML model can predict the activity of metal alloys for arbitrary compositions at arbitrary temperatures. If so, such a model could then be used in multi-scale modeling efforts.
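The idea of starting from a simulation-based estimate and updating it as data arrive can be illustrated with the simplest Bayesian case, a normal prior updated by normally distributed observations. This is a generic textbook sketch, not the method of the cited work; the function name and all numbers are hypothetical.

```python
def update_normal(prior_mean, prior_var, observation, obs_var):
    """Conjugate update of a normal prior with one normal observation."""
    precision = 1.0 / prior_var + 1.0 / obs_var
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + observation / obs_var)
    return post_mean, post_var

# Hypothetical prior from simulation: mean 1.0, variance 0.25.
mean, var = 1.0, 0.25

# Hypothetical experimental values, each with measurement variance 0.25.
for observation in [1.4, 1.2]:
    mean, var = update_normal(mean, var, observation, obs_var=0.25)
```

Each update pulls the mean toward the new data and shrinks the variance, which is why the approach requires a surrogate whose uncertainties are quantifiable.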

4.1 Approach

Sparse data set augmentation by a surrogate ML model was investigated to predict the thermodynamic activity of chemical elements in alloys, with investigation of the surrogate ML model's ability to return accurate predictions and uncertainties. The performance of a multi-linear regression model was also compared. The temperature range was 600 to 800 degrees Celsius (°C) (1,112 to 1,472 degrees Fahrenheit (°F)). The alloys were composed of the following ten elements: iron, chromium, manganese, silicon, carbon, titanium, molybdenum, aluminum, niobium, and nickel, with concentrations chosen from the realistic range for iron-based alloys. The training and testing data creation and validation were both based on the thermodynamic calculations from the Thermo-Calc software for Calculation of Phase Diagrams (CALPHAD) (Andersson et al., 2002). A successful surrogate ML model with reliable uncertainty quantification for its predictions could enable ML-augmented multi-scale modeling if it can generate more rapid predictions than CALPHAD. Additionally, as noted, the surrogate ML model could be built upon using experimental data and Bayesian methods, thereby using both forms of data to create useful predictions with acceptable uncertainties.

To create the training and testing data, CALPHAD was used to calculate the thermodynamic activities of each of the ten elements present for 50,000 points of iron-based alloys at temperatures of 600, 700, and 800°C, with the points chosen by astroidal Sobol sampling in the realistic concentration ranges. The CALPHAD calculations additionally provided the percent of face-centered cubic gamma and body-centered cubic alpha phases, which were also used in the training. The activity of the elements is not simply a linear function of their concentration, and the presence of other elements affects the thermodynamic activity of each. Figure 6 shows some representative element activities in this range (sliced out of the multi-dimensional space) to illustrate that the activities do not have a functional dependence on the concentration of those elements. Clearly, the responses are nonlinear.

Figure 6: Activity versus concentration for chromium (left), manganese (center), and silicon (right) as calculated by CALPHAD for iron-based alloys at temperatures of 600, 700, and 800°C.

Although there are 150,000 data points in total, the problem is one of sparse data because even this amount of data is not sufficient for accurate linear interpolation given the 13-dimensional, nonlinear dependence. However, as will be shown, a surrogate ML model can predict the activities of the various elements with high accuracy and good computational efficiency relative to the CALPHAD model. There is an upfront training cost, but following training, the surrogate ML model enables predictions for arbitrary compositions and temperatures within the range of training.

The surrogate ML modeling method that was chosen was based on Gaussian process (GP) regression. GPs have gained popularity in ML due to their combination of high accuracy and ability to provide estimated uncertainties for predicted points. A limitation is that GPs do not scale well to large datasets and training points, so techniques such as bagging or binning must be used for large datasets (e.g., tens of thousands of data points). To accomplish this scaling, a first stage of unsupervised ML is performed with constrained k-means clustering to produce 200 clusters with 500 to 1,000 points per cluster, which is suitable for making a piecewise surrogate model. This level of cluster size is near the practical limit for the conventional computing architectures of today with GPs. As noted, GPs enable uncertainty estimates for the predictions, and these uncertainties can be compared to the actual error on untrained points as well as from statistical sampling between multiple choices of training sets. These comparisons are important for trust considerations, which are a priority in safety-related applications.
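The ability of a GP to return an uncertainty alongside each prediction can be sketched with scikit-learn's `GaussianProcessRegressor`. The paper does not state which GP library was used, so the library, the kernel choice, and the one-dimensional synthetic data here are all assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(40, 1))
y_train = np.sin(6.0 * X_train[:, 0]) + rng.normal(scale=0.05, size=40)

# Assumed kernel: a radial basis function plus a noise term.
kernel = RBF(length_scale=0.2) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X_train, y_train)

# One point inside the training range and one far outside it.
X_new = np.array([[0.5], [3.0]])
mean, std = gp.predict(X_new, return_std=True)
```

The returned standard deviation is typically small near the training data and grows away from it, which is what allows the predicted uncertainties to be checked against actual errors on held-out points.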

The following approach was used to create the piecewise GP (p-GP) surrogate model with uncertainty quantification as a priority:

1. Constrained k-means clustering was performed to create regions for the piecewise surrogate model.
2. For each cluster, GP regression was performed with five-fold Monte Carlo cross-validation with an 80 percent training, 20 percent testing split within each fold. The GP regressions were performed independently for the activities of each of the ten elements.

During the GP regression, the kernels evaluated were Mat32, Mat52, radial basis function, exponential, and cosine. The kernel retained was the first to achieve a regression coefficient of determination, r², greater than 0.97, or the kernel with the highest r² value if no kernel achieved a value greater than 0.97. Within the five folds per element activity for a given cluster, it was possible for different kernels to be chosen across the different folds. The surrogate model then averages the predictions from this set of 5 GPs. With ten elements, this means there are 50 GPs for a given cluster. The estimate of the final surrogate ML model uncertainty of the prediction is taken as the greater of either (a) the average one standard deviation (1σ) uncertainty returned by the 5 GPs, which is the composite mean GP-predicted 1σ uncertainty (<UGP>), or (b) the 1σ variability from the 5-fold cross validation, UCV. This pair of uncertainties (i.e., one from the GP and one from the statistical sampling) makes it possible to check the GPs' ability to account for their epistemic uncertainties. The final surrogate model uncertainty was then taken as UF = max(<UGP>, UCV) for each elemental activity at each cluster.
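The kernel-selection rule and the uncertainty envelope described above can be sketched in a few lines. This is a hedged illustration rather than the code used in this work: the function names are hypothetical, the example scores and predictions are made up, and the use of the sample standard deviation for the cross-validation variability is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev

R2_THRESHOLD = 0.97  # threshold stated in the text

def select_kernel(scores):
    """scores: list of (kernel_name, r2) pairs in the order they were evaluated.

    Returns the first kernel whose r2 exceeds the threshold, otherwise the
    kernel with the highest r2.
    """
    for name, r2 in scores:
        if r2 > R2_THRESHOLD:
            return name
    return max(scores, key=lambda item: item[1])[0]

def combine_folds(fold_predictions, fold_sigmas):
    """Combine the outputs of the GPs trained on the different folds.

    Returns (prediction, u_f), where u_f = max(<UGP>, UCV).
    """
    prediction = mean(fold_predictions)
    u_gp = mean(fold_sigmas)        # composite mean of the GP-reported 1-sigma values
    u_cv = stdev(fold_predictions)  # assumed: sample std. dev. across the folds
    return prediction, max(u_gp, u_cv)

# Made-up numbers for a single predicted activity.
chosen = select_kernel([("Mat32", 0.95), ("Mat52", 0.98), ("RBF", 0.99)])
prediction, u_f = combine_folds([1.0, 1.1, 0.9, 1.0, 1.0], [0.02] * 5)
print(chosen, prediction, u_f)
```

In the made-up example the fold-to-fold scatter exceeds the GP-reported uncertainty, so the envelope takes the cross-validation value. Taking the maximum is a conservative choice because neither estimate alone is guaranteed to capture the full uncertainty.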

Computational considerations were also important in this use case. For these applications, the method should scale linearly and be parallelizable for both training and evaluation. On the order of 1,000 GP points were needed for reasonable accuracy and to remain viable with conventional computing architectures (e.g., a system with less than a 5 gigahertz processor and 40 gigabytes (GBs) of random-access memory).

Accordingly, the full training dataset of 150,000 points was divided piecewise by constrained k-means clustering with clusters constrained to 500 to 1,000 points per cluster. With these choices, the clustering and training each finished on a timescale of days with 200 clusters. The total number of GPs within the piecewise surrogate model was thus 10,000 (i.e., 200 clusters times 5 samplings per activity times 10 activities). This type of surrogate ML model does not compress the data. The training data was 35 megabytes on disk, while the surrogate model was much larger because the storage of GPs is known to scale on the order of O(n²) for n training points (Wang et al., 2019; Rasmussen & Williams, 2006). The total size of the surrogate ML model was on the order of 80 GBs when serialized, and greater than 10 GBs if completely loaded into memory. However, the piecewise surrogate was utilized by cycling through the clusters (which could be parallelized), thus requiring less than 10 GBs of memory during predictions. The loading time was on the order of 10 seconds per cluster, and the evaluation for each predicted elemental activity at a given composition required, on average, 0.0052 seconds with a standard deviation of 0.0065 seconds.
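The quadratic storage scaling can be made concrete with a rough back-of-the-envelope estimate. The sketch below multiplies the factors given above and assumes, purely for illustration, that each GP stores one dense n-by-n covariance matrix of 64-bit floats and is trained on 80 percent of an average-sized cluster. Actual serialized models may store additional arrays, so the result is only an order-of-magnitude indication.

```python
def dense_kernel_bytes(n_train, bytes_per_value=8):
    """Bytes for one dense n x n covariance matrix of 64-bit floats (assumed layout)."""
    return n_train * n_train * bytes_per_value

n_clusters = 200
n_folds = 5
n_activities = 10
n_gps = n_clusters * n_folds * n_activities   # 200 * 5 * 10

avg_cluster_size = 150_000 // n_clusters      # average points per cluster
n_train = int(0.8 * avg_cluster_size)         # 80 percent training split

per_gp_mb = dense_kernel_bytes(n_train) / 1e6
total_gb = n_gps * dense_kernel_bytes(n_train) / 1e9
print(f"{n_gps} GPs, {per_gp_mb:.2f} MB each, about {total_gb:.1f} GB in total")
```

Under these assumptions the estimate lands in the tens of gigabytes, far above the 35 megabytes of training data. Doubling the number of training points per GP would quadruple the storage, which is why the cluster size is capped.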

4.2 Results and Discussion

Following the training and testing, the activities of 15,000 new data points were used to validate the model. Figure 7 shows the representative parity plots for the predicted activities as calculated by the p-GP surrogate model versus the actual activities calculated by CALPHAD for chromium, manganese, and silicon. All the plots show good agreement between the p-GP surrogate model and the validation data.

Figure 7: Representative parity plots for the p-GP surrogate model predicted versus CALPHAD-calculated activities for several chemical elements in iron-based alloys at various temperatures.

To demonstrate the benefit of this type of surrogate ML model over a multi-linear regression, an ordinary least squares (OLS) surrogate model was made with the same input data. Although not shown here, the parity plots obtained for the OLS model were nonlinear. Table 2 shows the mean absolute errors (MAEs) for each model and each elemental activity along with the ratio of the errors, taken as the p-GP model MAE divided by the OLS model MAE. The comparisons show that the MAE from the p-GP model is on the order of one-hundredth to one-tenth that of the OLS model. The parity plots and comparisons with the OLS surrogate model predictions both indicate that the trained p-GP surrogate model was able to capture the non-linearity of the data well.
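The error metric and ratio used in this comparison can be written out explicitly. The following is a minimal sketch with hypothetical function names; the only values taken from this paper are the iron MAEs reported in Table 2, and the short lists passed to `mean_absolute_error` are made-up numbers.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all validation points."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mae_ratio(mae_pgp, mae_ols):
    """p-GP MAE divided by OLS MAE; values below 1 favor the p-GP model."""
    return mae_pgp / mae_ols

# Made-up predictions and reference values, for illustration only.
example_mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])

# Iron MAEs as reported in Table 2.
iron_ratio = mae_ratio(2.2e-4, 4.4e-3)
print(example_mae, round(iron_ratio, 2))
```

A ratio well below one means that the p-GP surrogate reproduces the CALPHAD-calculated activities with a much smaller average error than the linear model.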

Table 2: Accuracies of the p-GP and OLS surrogate models.

Activity      p-GP Surrogate Model MAE   OLS Surrogate Model MAE   Ratio of MAEs
Iron          2.2 x 10⁻⁴                 4.4 x 10⁻³                0.05
Chromium      6.6 x 10⁻⁴                 1.6 x 10⁻²                0.04
Manganese     2.6 x 10⁻⁸                 1.6 x 10⁻⁶                0.02
Silicon       8.8 x 10⁻¹³                5.1 x 10⁻¹¹               0.02
Carbon        3.5 x 10⁻⁵                 1.5 x 10⁻⁴                0.23
Titanium      9.2 x 10⁻⁹                 4.0 x 10⁻⁸                0.23
Molybdenum    8.2 x 10⁻⁶                 8.3 x 10⁻⁵                0.10
Aluminum      2.5 x 10⁻¹¹                2.3 x 10⁻⁹                0.01
Niobium       2.3 x 10⁻⁷                 1.1 x 10⁻⁶                0.20
Nickel        2.0 x 10⁻⁷                 1.5 x 10⁻⁵                0.01

Uncertainty estimates for the p-GP surrogate model should be reliable and within a specified tolerance to render confidence in applications of the proposed method. Thus, two uncertainty metrics were examined:

(1) whether <UGP> was less than UCV (a measure of the GPs' ability to include epistemic uncertainty in <UGP>), and (2) whether the final surrogate ML model predicted UF in agreement with the distribution of the prediction errors, as measured by the difference between the predictions and actual values (a measure of the final accuracy of UF). For the first metric, considering all the predicted points across all the elemental activities, <UGP> was less than UCV for 97 percent of the values. This result suggests that the GPs do a good job at including epistemic uncertainties in their estimated uncertainties. For the second metric, 2 x UF was compared to the prediction errors (i.e., the residuals, R). For accurate, normally distributed errors, |R| < 2 x UF will be true for approximately 95 percent of the points. Here, it was found that |R| < 2 x UF held 95.0 +/- 0.8 percent of the time when averaged across all ten element activities and across all 15,000 validation points. This level of agreement was fortuitous. These results demonstrate that taking an envelope of 2 x UF as a 95-percent confidence interval works well. For points found to be further away from the prediction estimate (e.g., 3 x UF), the residuals were greater than would be expected for normally distributed uncertainties. However, the ML methods used here would allow for more complex uncertainty estimates than were used in the limited analysis presented here.
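The second metric amounts to an empirical coverage check: the fraction of validation points whose residual magnitude falls inside a multiple of the predicted uncertainty. The sketch below illustrates the calculation on synthetic, normally distributed residuals with a known standard deviation. The function name `coverage`, the seed, and the synthetic data are assumptions for illustration and are not the validation data from this work.

```python
import random

def coverage(residuals, uncertainties, multiplier=2.0):
    """Fraction of points with |residual| < multiplier * predicted uncertainty."""
    inside = sum(1 for r, u in zip(residuals, uncertainties)
                 if abs(r) < multiplier * u)
    return inside / len(residuals)

# Synthetic residuals drawn from a normal distribution with known sigma.
rng = random.Random(0)
sigma = 0.01
residuals = [rng.gauss(0.0, sigma) for _ in range(15_000)]
uncertainties = [sigma] * len(residuals)

cov_2 = coverage(residuals, uncertainties, 2.0)  # expected near 0.95
cov_3 = coverage(residuals, uncertainties, 3.0)  # expected near 0.997
print(f"2-sigma coverage: {cov_2:.3f}, 3-sigma coverage: {cov_3:.3f}")
```

For well-calibrated normal uncertainties the two-sigma coverage should be close to 95 percent and the three-sigma coverage close to 99.7 percent. A three-sigma coverage noticeably below that value would indicate heavier tails than a normal distribution, consistent with the observation above for points further from the prediction estimate.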

This use case demonstrated that, for this problem, this type of surrogate ML model was suitable for accurate target and uncertainty predictions relative to OLS, even when using a reduced, one-dimensional representation of the uncertainties. The uncertainty estimates are thus suitable for sensitivity analysis and uncertainty propagation and can enable reasonable confidence interval predictions for applications where only sparse data is available.

5. Conclusions

The NRC staff has explored three use cases of AI/ML for engineering applications involving mechanical systems and component performance. For the first use case, the NRC staff developed an anomaly detector using an LSTM neural network within the structure of an autoencoder to monitor nuclear system performance. For the second use case, the NRC staff used augmented Monte Carlo simulations of piping component integrity using random forest regression models to conduct sensitivity analysis and predict time series using surrogate models. For the third use case, the NRC staff augmented a sparse data set for MSR materials compatibility research by creating a p-GP surrogate model. The success of these efforts indicates that AI/ML may have a future role in augmenting nuclear regulatory activities through increased efficiency and effectiveness and enhanced decision-making.

Acknowledgements

The authors wish to thank John McKirgan for his support and encouragement of the NRC staff in exploring applications of innovative AI/ML methods for engineering applications.

Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC (NTESS), a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration (DOE/NNSA) under contract DE-NA0003525. This written work is authored by an employee of NTESS. The employee, not NTESS, owns the right, title and interest in and to the written work and is responsible for its contents. Any subjective views or opinions that might be expressed in the written work do not necessarily represent the views of the U.S. Government. The publisher acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this written work or allow others to do so, for U.S. Government purposes. The DOE will provide public access to results of federally sponsored research in accordance with the DOE Public Access Plan.

References

Andersson, J-O, T. Helander, L. Höglund, and P. Shi, Thermo-Calc and DICTRA, Computational Tools for Materials Science. Calphad, 2002. pp. 273-312.

Carlson, J., M. Homiack, and R. Iyengar, Autonomous Researcher Feasibility Studies, TLR-RES/DE/REB-2022-13, 2022. Washington, DC: U.S. NRC.

Delpech, S., C. Cabet, C. Slim, and G. S. Picard, Molten Fluorides for Nuclear Applications. Materials Today, 2010. pp. 34-41.

Hochreiter, S. and J. Schmidhuber, Long Short-Term Memory. Neural Computation, 1997. pp. 1735-1780.

Homiack, M., G. Facco, M. Benson, M. Erickson, and C. Harrington, Extremely Low Probability of Rupture Version 2 Probabilistic Fracture Mechanics Code, NUREG-2247, 2021. Washington, DC: U.S. NRC.

Hund, L., J. Lewis, N. Martin, M. Starr, D. Brooks, A. Zhang, R. Dingreville, A. Eckert, J. Mullins, P. Raynaud, D. Rudland, D. Dijamco, and S. Cumblidge, Technical Basis for the use of Probabilistic Fracture Mechanics in Regulatory Applications, NUREG/CR-7278, 2022. Washington, DC: U.S. NRC.

Matera, S., W. F. Schneider, A. Heyden, and A. Savara, Progress in Accurate Chemical Kinetic Modeling, Simulations, and Parameter Estimation for Heterogeneous Catalysis. ACS Catal., 2019. pp. 6624-6647.

Pillai, R., S. S. Raiman, and B. A. Pint, First Steps toward Predicting Corrosion Behavior of Structural Materials in Molten Salts. Journal of Nuclear Materials, 2021. p. 152755.

Pillai, R., D. Sulejmanovic, T. Lowe, S. S. Raiman, and B. A. Pint, Establishing a Design Strategy for Corrosion Resistant Structural Materials in Molten Salt Technologies. JOM, 2023. pp. 994-1005.

Pillai, R., W. G. Sloof, A. Chyrkin, L. Singheiser, and W. J. Quadakkers, A New Computational Approach for Modelling the Microstructural Evolution and Residual Lifetime Assessment of MCrAlY Coatings. Materials at High Temperatures, 2015. pp. 57-67.

Rasmussen, C. E. and C. K. I. Williams, Gaussian Processes for Machine Learning, Volume 1, 2006. Cambridge, Massachusetts: MIT Press.

Serp, J., M. Allibert, O. Beneš, S. Delpech, O. Feynberg, V. Ghetta, D. Heuer, D. Holcomb, V. Ignatiev, J. L. Kloosterman, L. Luzzi, E. Merle-Lucotte, J. Uhlíř, R. Yoshioka, and D. Zhimin, The Molten Salt Reactor (MSR) in Generation IV: Overview and Perspectives. Progress in Nuclear Energy, 2014. pp. 308-319.

Srivastava, N., E. Mansimov, and R. Salakhutdinov, Unsupervised Learning of Video Representations using LSTMs. In Proceedings of the 32nd International Conference on Machine Learning, 2015. Lille, France.

Wang, K. E., G. Pleiss, J. R. Gardner, S. Tyree, K. Q. Weinberger, and A. G. Wilson, Exact Gaussian Processes on a Million Data Points. In Proceedings of the 33rd Conference on Neural Information Processing Systems, NeurIPS, 2019. Vancouver, Canada.