ML23221A402: Difference between revisions



=Text=
{{#Wiki_filter:Submitted to PSAM 2023 for publication on October 23, 2023. This preprint (draft) is not to be cited or reproduced.
{{#Wiki_filter:Engineering Applications of Artificial Intelligence and Machine Learning for Mechanical Systems and Component Performance
Matthew Homiack1*, John Matrachisia1, Tristan Villarreal1, Aditya Savara1, Raj Iyengar1, Stephen Verzi2, Michael Starr2, and Rishi Pillai3
1 U.S. Nuclear Regulatory Commission, Washington, D.C., U.S.A.
2 Sandia National Laboratories, Albuquerque, New Mexico, U.S.A.
3 Oak Ridge National Laboratory, Oak Ridge, Tennessee, U.S.A.
* Corresponding Author (Matthew.Homiack@nrc.gov)
 
Abstract
Artificial intelligence (AI) and machine learning (ML) methods are some of the fastest-growing technologies globally and have the potential to enhance efficiency, effectiveness, and decision-making processes for the nuclear industry. This paper explores several recent use cases of AI/ML methods in support of the U.S. Nuclear Regulatory Commission (NRC) staff's safety research efforts for mechanical systems and component performance.
The first use case explored ML to monitor the performance of a system. This study used a full-scope boiling water reactor (BWR) simulator for synthetic data generation. Scenarios were created to induce component malfunctions that may go undetected by operators and could lead to adverse operating conditions. The goal was to support early detection of the malfunctions using ML and thereby give operators more time to respond. A long short-term memory (LSTM) autoencoder was trained and tested for identifying the anomalies in real time. The study demonstrated the potential for using ML to monitor system performance.
 
The second use case explored ML to augment Monte Carlo simulation. For this use case, an interface was developed between open-source ML models and the Extremely Low Probability of Rupture (xLPR) probabilistic fracture mechanics (PFM) code. The xLPR code was used to analyze leak-before-break behavior in a pressurized water reactor piping system subject to cracking. Supervised ML, in the form of random forest regression, was applied to the sample input and output data from the xLPR code to conduct a sensitivity analysis. Through this analysis, the input variables that were most important with respect to the selected quantities of interest (QoIs) were determined both individually in univariate analyses and across all the QoIs in a multivariate analysis. The ML models were then further applied to explore time series prediction as a surrogate model for the xLPR simulation.
The third use case explored ML to overcome sparse data and enable long-term predictions of materials compatibility for nuclear reactor components in molten salt environments. It involved surrogate ML model prediction of values relevant to corrosion and associated uncertainties. This use case involved the application of a piecewise surrogate ML model for predicting the activities of chemical elements in alloys.
 
Such predictions are useful for lifetime assessment of materials compatibility because the activities have been associated with long-term loss of metal due to corrosion attack, particularly loss of chromium. The demonstration showed that surrogate ML models can be used to augment sparse datasets with predicted values for unmeasured or not explicitly calculated datapoints, with reasonable prediction uncertainties.
Keywords: Artificial Intelligence, Nuclear, Regulation, Research

The views expressed in this paper are those of the authors and do not reflect the views of the U.S. Nuclear Regulatory Commission. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States. Approved for public release; distribution is unlimited.
This report was prepared as an account of work sponsored by an agency of the U.S. Government. Neither the U.S. Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party's use, or the results of such use, of any information, apparatus, product, or process disclosed in this report, or represents that its use by such third party would not infringe privately owned rights. The views expressed in this paper are not necessarily those of the U.S. Nuclear Regulatory Commission.

Probabilistic Safety Assessment and Management (PSAM) Topical, October 23-25, 2023, Virtual
: 1. Introduction
AI/ML methods are some of the fastest-growing technologies globally and have the potential to enhance the business of industry, academia, and government. As with other applications, the benefits of AI/ML for engineering generally include increases in efficiency and effectiveness and enhanced decision-making. For instance, AI/ML methods can generate data more efficiently than physical experiments and other forms of synthetic data generation. AI/ML methods can be more effective at processing large or high-velocity data, thereby supporting analysis of highly complex systems. AI/ML methods can also enhance decision-making by identifying patterns and trends and enabling uncertainty quantification.
 
This paper explores three recent use cases of AI/ML methods in support of the NRC staff's safety research efforts for mechanical systems and component performance in the following areas:
: 1. system performance monitoring
: 2. Monte Carlo simulation and PFM for piping integrity analysis
: 3. overcoming sparse data to enable long-term predictions of materials compatibility of nuclear reactor components in molten salt environments
Sections 2, 3, and 4 present the technical approach, results, and discussion for each use case, respectively.
: 2. Use Case 1: System Performance Monitoring
The first use case explored ML methods to monitor the performance of a system. Specifically, an anomaly detector was developed using an unsupervised ML method to detect a malfunction from multivariate, sequential data. Synthetic data for training and testing the ML models were generated using a full-scope BWR simulator, which supports modeling of a substantial number of initial conditions, operations, and malfunctions, as well as recording of real-time plant parameters. As a result, the simulator data was suitable for developing realistic scenarios and training various types of ML models for system performance monitoring research.
 
2.1 Approach
The approach to demonstrate potential application of ML methods for system performance monitoring began with the curation of a robust dataset. Using the simulator as a surrogate for a real BWR, a subset of relevant plant parameters was captured in time-series datasets. After the samples were collected, techniques such as feature selection, averaging, binning, and splitting were applied to make sure that the data was in a form suitable for model training and testing.
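The preparation steps named above can be illustrated with a minimal sketch. The paper does not give bin sizes, window lengths, or split ratios, so the values below, the function names, and the random stand-in for simulator output are all assumptions made for illustration only.

```python
import numpy as np

def bin_average(series, bin_size):
    """Average consecutive samples in non-overlapping bins of length bin_size."""
    series = np.asarray(series, dtype=float)
    n_bins = series.shape[0] // bin_size
    trimmed = series[: n_bins * bin_size]  # drop any incomplete trailing bin
    return trimmed.reshape(n_bins, bin_size, series.shape[1]).mean(axis=1)

def split_chronologically(series, train_fraction=0.8):
    """Split a time series into earlier (training) and later (testing) parts."""
    n_train = int(len(series) * train_fraction)
    return series[:n_train], series[n_train:]

def make_windows(series, window, stride=1):
    """Slice a (time, features) array into windows of shape (n, window, features)."""
    starts = range(0, series.shape[0] - window + 1, stride)
    return np.stack([series[s:s + window] for s in starts])

# Hypothetical stand-in for simulator output: 100 time steps, 7 plant parameters.
rng = np.random.default_rng(0)
signals = rng.normal(size=(100, 7))

binned = bin_average(signals, bin_size=2)       # (50, 7)
train, test = split_chronologically(binned)     # (40, 7) and (10, 7)
train_windows = make_windows(train, window=10)  # (31, 10, 7)
```

Splitting before windowing keeps every training window strictly earlier in time than the test data, which is one common way to avoid leakage in sequential datasets.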
The next step was to select the proper ML algorithm. This selection was informed by the characteristics of the data, the problem at hand, and the desired output. For this use case, an anomaly detector was developed with an autoencoder, which uses an unsupervised learning algorithm. An autoencoder relies only on input data; thus, during training, the input data is set as the ground truth for the predictions. The neural network compresses the input data into a reduced dimension space and reconstructs the information into the output, which in theory should closely match the input data (Srivastava et al., 2015). As the neural network trains, the error in the predicted value is backpropagated so that the output fits the input, as measured by the loss function or, in this case, the mean squared error (MSE). The MSE is based on the residuals between the input and the autoencoder's predicted output. During training, the autoencoder MSE was used as the loss function and the autoencoder was trained to minimize the MSE for the training set. When new datasets similar to the training dataset are fed as inputs to the model, the MSE of the predicted outputs will be low; however, when new datasets quantitatively different from the training dataset are fed as inputs, the MSE will be relatively high.
Figure 1 shows the general method of a standard autoencoder. The autoencoder has a symmetrical neural network architecture composed of an encoder, a latent space, and a decoder. For this use case, because the input data is sequential, the autoencoder used LSTM neural networks. An LSTM neural network is a type of recurrent neural network that has special gates which can feed output back to themselves and forget information from the previous state (Hochreiter & Schmidhuber, 1997). Due to their nature, LSTM neural networks work well for processing time-series data. With the combination of the LSTM neural network within the structure of an autoencoder, the model can train with a window of multivariate sequential data, such as a time series.
Figure 1: Illustration of a general autoencoder architecture with MSE-based training shown by arrows.
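For reference, the reconstruction error used as the loss can be written out explicitly. The notation here is an assumption made for illustration and is not taken from the paper: $x$ denotes one input window with $T$ time steps and $F$ features, and $\hat{x}$ denotes the autoencoder's reconstruction of that window.

```latex
\mathrm{MSE}(x, \hat{x}) = \frac{1}{T F} \sum_{t=1}^{T} \sum_{f=1}^{F} \left( x_{t,f} - \hat{x}_{t,f} \right)^{2}
```

Under this definition, a window that resembles the training data yields small residuals and a low MSE, while a window from an unfamiliar operating condition yields larger residuals and a higher MSE.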
The final step of the approach was to train and evaluate the ML model. The autoencoder was trained using a dataset based on a subset of normal BWR operating conditions and evaluated on datasets of abnormal operating conditions. This step was iterative and focused on optimizing the neural network through a sensitivity study of the hyperparameters (e.g., epochs, batch sizes, learning rates, and activation functions), the preprocessed data, and the overall architecture of the neural network. After multiple iterations of the autoencoder, the model was successfully trained and then evaluated for its anomaly detection capabilities.
2.2 Results and Discussion
The performance of the anomaly detector built using the ML models was evaluated by comparing the MSE from the normal operating condition data against the MSE from the abnormal operating condition data. The optimized neural network had 251,783 trainable parameters and was trained on 14,878 training samples with 7 input features for 100 epochs. The input features were a subset of relevant plant parameters that could be available to a typical BWR instrumentation and control system, such as the system flow rates, temperatures, pressures, and water and power levels.
Figure 2 shows the MSE results from the training dataset. The MSE reached a maximum value of 3.23 with an average value of 3.03. The maximum value is critical for determining the threshold for detecting an anomaly; thus, the MSE from the normal operating conditions can be compared to the MSE from the abnormal operating conditions.
 
Figure 2: Training dataset results based on normal operating conditions.
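The thresholding idea can be sketched as follows. The paper states only that the maximum training MSE informs the threshold, so setting the threshold equal to that maximum (with an optional margin) is an assumption, as are the function names and the illustrative MSE sequences; only the value 3.23 comes from the text.

```python
def threshold_from_training(train_mse, margin=0.0):
    """Assumed rule: threshold is the maximum training MSE, inflated by a margin."""
    return max(train_mse) * (1.0 + margin)

def first_anomaly_index(mse_series, threshold):
    """Return the index of the first sample whose MSE exceeds the threshold, or None."""
    for index, mse in enumerate(mse_series):
        if mse > threshold:
            return index
    return None

# Illustrative values only; 3.23 is the maximum training MSE reported in the text.
threshold = threshold_from_training([3.01, 3.23, 3.05])
normal_run = [3.01, 3.05, 3.10, 3.02]
faulted_run = [3.02, 3.04, 5.80, 9.40]

normal_flag = first_anomaly_index(normal_run, threshold)
faulted_flag = first_anomaly_index(faulted_run, threshold)
```

A nonzero margin trades detection delay for fewer false alarms on normal conditions that were not represented in the training set.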
Figure 3 shows the MSE results from four test cases where a simulated equipment malfunction was introduced at 10, 15, 20, and 25 minutes, respectively. The results are compared to the MSE from the normal operating conditions. The simulated malfunction was a recirculation pump runaway, in which the recirculation flow rate increases unexpectedly, resulting in an increase in core power. As the results show, when the malfunction was introduced, the MSE markedly increased, giving a clear indication of a deviation from normal operating conditions.
Figure 3: Autoencoder MSE for normal operating conditions vs. abnormal operating conditions produced via simulated equipment malfunctions at different points in time.
A fifth test case was also run to demonstrate that the neural network was properly trained. For this test case, simulated equipment malfunctions were introduced at several intervals with different severities. Figure 4 shows the results indicating the correct timing and severity of the malfunctions.
 
 
Figure 4: Autoencoder MSE for normal operating conditions vs. abnormal operating conditions produced by simulated equipment malfunctions at various severities and intervals.
Overall, the results of the trained autoencoder demonstrate that some aspects of system performance can be monitored using ML methods. This research could be expanded by further exploring the capabilities and limitations of the autoencoder and by enhancing the anomaly detector with other ML methods to both detect and classify malfunctions. Although the results of the induced malfunctions were as expected, the model was trained on a small subset of normal operating condition data; thus, more training would be needed to fit a wider range of normal operating conditions so that the model would not flag these conditions as anomalous.
: 3. Use Case 2: Monte Carlo Simulation Support
In PFM, stochastic analyses enabled by Monte Carlo simulation are used to better understand the various uncertainties when predicting the load-carrying capabilities of components containing cracks. The second use case explored ML methods to augment PFM simulations by conducting sensitivity analysis and time-series prediction via surrogate models.
3.1 Approach
This use case consisted of two separate but related applications of ML coupled with the xLPR code. xLPR is a PFM code for piping applications that was developed jointly by the NRC Office of Nuclear Regulatory Research and the Electric Power Research Institute (Homiack et al., 2021). This code was used to simulate the potential behaviors of a preexisting circumferential crack in an unmitigated, Westinghouse-designed, pressurized water reactor pressurizer surge line weld subject to primary water stress corrosion cracking (PWSCC).
There were 56 inputs in the simulation, including both constant and uncertain or probabilistic inputs. These inputs covered such areas as the geometry of the pipe, operating conditions, crack size, applied stresses (e.g., welding residual stresses (WRS) defined at 26 points through the thickness of the weld), and material properties (e.g., PWSCC growth rate model parameters). Six outputs or QoIs were selected related to key behaviors of the problem, such as the crack size (e.g., crack depth normalized by the weld thickness) and whether the cracks cause leaks (e.g., occurrence of leak). A full list of the inputs and outputs is given in Carlson et al. (2022).
 
In both applications, four separate, off-the-shelf, and open-source ML regression methods were explored:
(1) linear regression, (2) random forest, (3) gradient boosting, and (4) multilayer perceptron. Each of these methods uses supervised ML, where the xLPR simulation data provided the inputs and target outputs for supervision. The ML models were constructed and trained in Python using the scikit-learn library. Linear regression was used as a baseline for the other models, even though some QoIs were not expected to respond linearly. The hyperparameters of the ML methods were not optimized, and the specific values used in each application are detailed next.
Determining the Most Important PFM Analysis Inputs. The first application sought to use ML methods to automate a sensitivity analysis. As described in Hund et al. (2022), sensitivity analysis focuses on identifying how the input uncertainties contribute to the uncertainty in the QoIs and can help to identify those inputs that explain substantial uncertainty in the model output. For this application, all four regression methods performed similarly; therefore, only the results from the random forest regressor are presented here. In using the scikit-learn random forest ensemble regression method, the only parameter besides the random seed that needed to be specified was the number of estimators, which was set to 1,000. The permutation importance method from the scikit-learn inspection sub-package was used to rank the relative importance of all the input parameters.
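A minimal sketch of this kind of ranking is shown below, using scikit-learn's `RandomForestRegressor` and `permutation_importance`. The synthetic data, the input names, and the helper `rank_inputs` are hypothetical stand-ins and are not xLPR inputs or outputs; the function defaults to 1,000 estimators as in the text, but the demonstration call uses 100 to keep the run short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

def rank_inputs(X, y, names, n_estimators=1000, seed=0):
    """Fit a random forest and rank inputs by mean permutation importance."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
    model.fit(X, y)
    result = permutation_importance(model, X, y, n_repeats=5, random_state=seed)
    order = np.argsort(result.importances_mean)[::-1]  # most important first
    return [(names[i], float(result.importances_mean[i])) for i in order]

# Hypothetical sample data: the first input dominates the response by construction.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
y = 10.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=300)
names = ["wrs_inner", "growth_rate", "pipe_od", "temperature"]  # illustrative names

ranking = rank_inputs(X, y, names, n_estimators=100)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Permutation importance measures how much the model score degrades when one input column is shuffled, so it can be applied unchanged to any of the four regressors mentioned in the text.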
Surrogate Modeling for Generating PFM Analysis Outputs. The second application sought to use ML methods to improve the efficiency of the PFM simulation via development and training of a surrogate model that could predict the time series of a specific QoI given a set of input samples from the xLPR code.
At first, the occurrence of leak time series was selected for prediction. All four of the ML regression methods were used to predict the time at which a leak occurs, if a leak occurs, given specific xLPR inputs.
However, because all the ML methods are supervised and the no-leak case was much more likely in this problem, there were not enough positive leak samples to support such a prediction. Each ML method could be trained using the entire set of 2,000 samples, but none could generalize when trained on only a portion of the data, including a split of 75 percent for training and 25 percent for testing.
Given these limitations, the focus of the study was changed to predicting the normalized crack depth propagation over time. Normalized crack depth propagation was not expected to be linear, thus the linear regression results needed special interpretation beyond a predicted value of 1.0 for the normalized crack depth (i.e., the point at which the crack penetrates through-wall). Both gradient boosting and multilayer perceptron methods are not suited for time-series prediction without some further research and design considerations. Thus, the results for the surrogate modeling were restricted to the linear regression and random forest methods. For time-series prediction, the random forest method was provided with all the xLPR code inputs as well as the current normalized crack depth as an input, and it was trained to predict the next normalized crack depth.
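The next-step formulation described above can be sketched as an autoregressive rollout, in which each predicted depth is fed back in as the "current depth" input for the following step. The sketch below is an assumption-laden illustration: the toy growth law, the single `rates` input, and the `rollout` helper are hypothetical and stand in for the xLPR inputs and outputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n_runs, n_steps = 40, 30

# Hypothetical stand-in for xLPR realizations: one growth-rate input per run.
rates = rng.uniform(0.02, 0.08, size=n_runs)
depths = np.zeros((n_runs, n_steps))
depths[:, 0] = 0.1
for t in range(1, n_steps):
    # Toy nonlinear growth law, capped at 1.0 (through-wall penetration).
    depths[:, t] = np.minimum(1.0, depths[:, t - 1] + rates * (1.0 + depths[:, t - 1]))

# Training pairs: (run input, current depth) -> next depth.
X = np.column_stack([np.repeat(rates, n_steps - 1), depths[:, :-1].ravel()])
y = depths[:, 1:].ravel()

model = RandomForestRegressor(n_estimators=250, random_state=0)
model.fit(X, y)

def rollout(rate, depth0, steps):
    """Predict a full time series by feeding each prediction back as input."""
    series = [depth0]
    for _ in range(steps - 1):
        nxt = model.predict(np.array([[rate, series[-1]]]))[0]
        series.append(float(min(1.0, nxt)))
    return series

predicted = rollout(rate=0.05, depth0=0.1, steps=n_steps)
print([round(d, 3) for d in predicted[:5]])
```

One consequence of this design is that prediction errors can accumulate along the rollout, since every step after the first depends on an earlier prediction rather than on a simulated value.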
 
3.2 Results and Discussion
Determining the Most Important PFM Analysis Inputs. Following the approach described in Section 3.1, the uncertain inputs in the xLPR code simulation were ranked by importance for each QoI individually (i.e., univariate analysis) and across all the QoIs simultaneously (i.e., multivariate analysis).
As the appropriate number of realizations needed to be determined for the surrogate modeling application, three separate ML models were trained given data from 200; 2,000; and 20,000 xLPR realizations, respectively.
Table 1 lists the top ten most important inputs based on their permutation importance as determined using the random forest ML model for each sample size. Empty cells indicate that the input did not rank in the top ten for the given sample size. All three sample sizes resulted in the WRS at point 1 through the weld thickness being the most important input in the analysis. The results from the 2,000 and 20,000 sample sizes also identified two parameters in the PWSCC growth model (i.e., the component-to-component and within-component variability factors) as being the second- and third-most important inputs in the analysis, respectively. Given the consistency between the results based on the 2,000 and 20,000 sample sizes, the former was determined to be appropriate for the surrogate modeling application.
Table 1: Ranked permutation importance values for the top ten xLPR code inputs across all QoIs (multivariate) using various sample sizes.

Rank  Input Variable Description                                     Permutation Importance
                                                                     200           2,000         20,000
                                                                     Realizations  Realizations  Realizations
1     WRS at point 1                                                 0.5973        0.9740        1.2146
2     PWSCC growth model component-to-component variability factor   0.0280        0.2029        0.3489
3     PWSCC growth model within-component variability factor         -             0.1503        0.2416
4     WRS at point 14                                                -             0.0301        -
5     WRS at point 26                                                0.0680        0.0268        0.0177
6     WRS at point 22                                                0.0501        0.0230
7     WRS at point 2                                                 0.0158        0.0176        0.1115
8     WRS at point 23                                                -             0.0163        -
9     WRS at point 24                                                0.0545        0.0162        0.0162
10    WRS at point 7                                                 0.0133        0.0158        1.2146

Surrogate Modeling for Generating PFM Analysis Outputs. Following the approach described in Section 3.1, the normalized crack depth time-series was predicted using three ML methods: (1) linear regression, (2) single-trained random forest, and (3) random forest using a dataset of 2,000 random samples generated by the xLPR code. For the linear regression method, the dataset was split into 75 percent for training and 25 percent for testing; for the random forest method, the dataset was split conversely. Also, the random forest model was parameterized using 250 estimators versus the 1,000 estimators that were used in the application for determining the most important inputs. Reducing the number of estimators in the ensemble allows the random forest model to generalize better.
 
Figure 5 shows the three ML model predictions vs. the xLPR code predictions, which represent the ground truth. The linear regression results captured the time to through-wall crack penetration (i.e., normalized crack depth value of 1.0), even though they did not fully capture the curvature of the crack depth propagation or the termination at the normalized crack depth value of 1.0. For this reason, the vertical axis is twice that of the other two plots. The single-trained random forest model captured the curvature of the crack depth propagation well; however, it predicted a normalized crack depth of 1.0 sooner than the xLPR code simulation predicted. Lastly, since the random forest regression method is stochastic, 100 random forest models were trained to determine the spread of the predictions. The results show that ML methods can support time-series prediction for determining the time after which a leak will occur given a set of input samples from the xLPR code, particularly in the case of the random forest regression method. A potential benefit of using the random forest method over the linear regression method is that fewer samples were required (i.e., only 25 percent of the dataset was required for training the random forest model versus 75 percent of the dataset required for training the linear regression model).
 
Figure 5: xLPR code predictions of normalized crack depth time-series vs. ML model predictions using linear regression (left), single-trained random forest regression (middle), and random forest regression (right).
: 4. Use Case 3 – Overcoming Sparse Data
The third use case explored ML methods to overcome sparse data and enable long-term predictions of materials compatibility of nuclear reactor components in molten salt environments. Molten salt reactors (MSRs) for nuclear power generation are a current area of interest in nuclear research and development. Molten salts based on fluorides and chlorides are being considered across a broad spectrum of MSRs (Delpech et al., 2010, Serp et al., 2014). There is, in general, a lack of advanced materials that are robust against material degradation in these reactors for tens of years. Further, corrosion of structural materials in molten salts is a critical degradation phenomenon that presents a barrier to the technical realization of MSRs. Notably, there are significant gaps in the scientific understanding of the governing mechanisms and correspondingly large gaps in the ability to predict corrosion rates. Multiple corrosion mechanisms can also occur simultaneously, which increases the complexity of determining the degradation kinetics. However, surrogate ML models may help to fill these data gaps.
A key challenge is that certain materials degradation phenomena might only appear after years or tens of years. The routine experiments, modeling, and monitoring of today can only achieve short timescales (e.g., minutes, days, or months) and thus may not be representative of long-term material degradation. Therefore, it is beneficial to develop, validate, and verify computational methods that can be used to predict materials compatibility and the performance of MSRs over their operational timescales. These methods consider model sensitivities and uncertainties and propagate them under varying conditions. Routine, long-term simulations are computationally intensive and hence not practical from an engineering standpoint when explicit physics-based approaches are used. A promising route for addressing these challenges is to enhance multi-scale modeling simulations with surrogate ML models. In general, only a small number of long-term experiments are currently performed, typically on the order of ten (e.g., a study might have five different alloy compositions for the same set of corrosion conditions). To overcome this sparsity of data when making long-term predictions, one option is to use a combination of experimental data and ML-augmented simulation data to predict points for conditions or systems that differ from those of the experimental data. This augmentation can be done with consideration of the joint uncertainty probability distributions of the long-term predictions with uncertainty propagation using Bayesian methods. With Bayesian methods, it is possible to first create a surrogate model using best guesses from simulations, and later update the posterior distributions (i.e., final models) each time additional experimental or theoretical data become available (Matera et al., 2019). This strategy is compatible with using surrogate ML models for loss of metals during corrosion processes in MSRs, provided that the surrogate ML model has quantifiable uncertainties. The corrosion rates have been found to be nearly linear relative to chemical elements' thermodynamic activities in one set of studies under MSR conditions (Pillai et al., 2021), particularly with the thermodynamic activity of chromium (Pillai et al., 2015, Pillai et al., 2023). Accordingly, it is useful to investigate whether a surrogate ML model can predict the activity of metal alloys for arbitrary compositions at arbitrary temperatures. If so, such a model could then be used in multi-scale modeling efforts.
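The idea of starting from a simulation-based best guess and updating the posterior as data arrive can be illustrated with the simplest possible case. The sketch below assumes a single scalar quantity, a Gaussian prior, and a known measurement noise variance (a conjugate normal-normal update); the function name `update_gaussian` and all numbers are hypothetical and are not taken from the cited work.

```python
def update_gaussian(prior_mean, prior_var, observation, noise_var):
    """Conjugate normal-normal update for a single scalar quantity."""
    gain = prior_var / (prior_var + noise_var)
    post_mean = prior_mean + gain * (observation - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var

# Hypothetical numbers: a simulation-based best guess, then three measurements.
mean, var = 10.0, 4.0
for obs in (12.0, 11.5, 12.5):
    mean, var = update_gaussian(mean, var, obs, noise_var=1.0)
    print(f"posterior mean = {mean:.3f}, variance = {var:.3f}")
```

Each update moves the estimate toward the new data and shrinks the variance, which is the behavior that makes quantifiable surrogate uncertainties a prerequisite for this strategy.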
4.1 Approach
Sparse data set augmentation by a surrogate ML model was investigated to predict the thermodynamic activity of chemical elements in alloys, including an assessment of the surrogate ML model's ability to return accurate predictions and uncertainties. The performance of a multi-linear regression model was also compared. The temperature range was 600 to 800 degrees Celsius (°C) (1,112 to 1,472 degrees Fahrenheit (°F)). The alloys were composed of the following ten elements: iron, chromium, manganese, silicon, carbon, titanium, molybdenum, aluminum, niobium, and nickel, with concentrations chosen from the realistic range for iron-based alloys. The training and testing data creation and validation were both based on the thermodynamic calculations from the Thermo-Calc software for Calculation of Phase Diagrams (CALPHAD) (Andersson et al., 2002). A successful surrogate ML model with reliable uncertainty quantification for its predictions could enable ML-augmented multi-scale modeling if it can generate more rapid predictions than CALPHAD. Additionally, as noted, the surrogate ML model could be built upon using experimental data and Bayesian methods, thereby using both forms of data to create useful predictions with acceptable uncertainties.
To create the training and testing data, CALPHAD was used to calculate the thermodynamic activities of each of the ten elements present for 50,000 points of iron-based alloys at temperatures of 600, 700, and 800°C, with the points chosen by astroidal Sobol sampling in the realistic concentration ranges. The CALPHAD calculations additionally provided the percent of face-centered cubic gamma and body-centered cubic alpha phases, which were also used in the training. The activity of the elements is not simply a linear function of their concentration, and the presence of other elements affects the thermodynamic activity of each. Figure 6 shows some representative element activities in this range (sliced out of the multi-dimensional space) to illustrate that the activities do not have a functional dependence on the concentration of those elements. Clearly, the responses are nonlinear.
 
Figure 6: Activity versus concentration for chromium (left), manganese (center), and silicon (right) as calculated by CALPHAD for iron-based alloys at temperatures of 600, 700, and 800°C.
Although there are 150,000 data points in total, the problem is one of sparse data because even this amount of data is not sufficient for accurate linear interpolation given the 13-dimensional, nonlinear dependence.
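A rough way to see why 150,000 points is sparse in 13 dimensions is to ask how many levels per axis the same number of points would give if they were laid out on a regular grid. This is only an illustrative equivalence, since the actual points were not placed on a grid; the variable names are hypothetical.

```python
n_points = 150_000
n_dims = 13

# Equivalent number of grid levels per axis if the points formed a regular grid.
points_per_axis = n_points ** (1.0 / n_dims)
print(f"{points_per_axis:.2f} equivalent levels per axis")

# Fraction of a modest 10-level-per-axis grid that the dataset would cover.
full_grid = 10 ** n_dims
print(f"{n_points / full_grid:.1e} of a 10-level grid")
```

With only two to three equivalent levels per axis, linear interpolation between neighbors has little chance of resolving a nonlinear response.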
 
However, as will be shown, a surrogate ML model can predict the activities of the various elements with high accuracy and good computational efficiency relative to the CALPHAD model. There is an upfront training cost, but following training, the surrogate ML model enables predictions for arbitrary compositions and temperatures within the range of training.
 
The surrogate ML modeling method that was chosen was based on Gaussian process (GP) regression. GPs have gained popularity in ML due to their combination of high accuracy and ability to provide estimated uncertainties for predicted points. A limitation is that GPs do not scale well to large datasets and training points, so techniques such as bagging or binning must be used for large datasets (e.g., tens of thousands of data points). To accomplish this scaling, a first stage of unsupervised ML is performed with constrained k-means clustering to produce 200 clusters with 500 to 1,000 points per cluster, which is suitable for making a piecewise surrogate model. This level of cluster size is near the practical limit for the conventional computing architectures of today with GPs. As noted, GPs enable uncertainty estimates for the predictions, and these uncertainties can be compared to the actual error on untrained points as well as from statistical sampling between multiple choices of training sets. These comparisons are important for trust considerations, which are a priority in safety-related applications.
The following approach was used to create the piecewise GP (p-GP) surrogate model with uncertainty quantification as a priority:
: 1. Constrained k-means clustering was performed to create regions for the piecewise surrogate model.
: 2. For each cluster, GP regression was performed with five-fold Monte Carlo cross-validation with an 80 percent training, 20 percent testing split within each fold. The GP regressions were performed independently for the activities of each of the ten elements.
During the GP regression, the kernels evaluated were Mat32 and Mat52 (Matérn 3/2 and 5/2), radial basis function, exponential, and cosine. The kernel retained was the first to achieve a regression coefficient of determination, r², greater than 0.97, or the kernel with the highest r² value if no kernel achieved a value greater than 0.97. Within the five folds per elemental activity for a given cluster, it was possible for different kernels to be chosen across the different folds. The surrogate model then averages the predictions from this set of 5 GPs. With ten elements, this means there are 50 GPs for a given cluster. The estimate of the final surrogate ML model uncertainty of the prediction is taken as the greater of either (a) the average one standard deviation (1σ) uncertainty returned by the 5 GPs, which is the composite mean GP-predicted 1σ uncertainty, <UGP>, or (b) the 1σ variability from the 5-fold cross validation, UCV. This pair of uncertainties (i.e., one from the GP and one from the statistical sampling) makes it possible to check the GPs' ability to account for their epistemic uncertainties. The final surrogate model uncertainty was then taken as UF = max(<UGP>, UCV) for each elemental activity at each cluster.
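The kernel selection rule and the uncertainty combination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the example r² values and fold outputs, and the use of the sample standard deviation across fold predictions for UCV are all hypothetical choices.

```python
from statistics import mean, stdev

def select_kernel(r2_by_kernel, threshold=0.97):
    """Return the first kernel (in evaluation order) with r^2 above the
    threshold, or the kernel with the highest r^2 if none exceeds it."""
    for name, r2 in r2_by_kernel.items():
        if r2 > threshold:
            return name
    return max(r2_by_kernel, key=r2_by_kernel.get)

def combine_fold_predictions(fold_means, fold_sigmas):
    """Combine per-fold GP outputs for one elemental activity at one point."""
    prediction = mean(fold_means)  # surrogate prediction: average of the GPs
    u_gp = mean(fold_sigmas)       # <UGP>: mean GP-predicted 1-sigma uncertainty
    u_cv = stdev(fold_means)       # UCV: spread across folds (assumed sample std)
    u_f = max(u_gp, u_cv)          # UF = max(<UGP>, UCV)
    return prediction, u_gp, u_cv, u_f

# Hypothetical r^2 values, listed in the order the kernels were tried
chosen = select_kernel({"Mat32": 0.95, "Mat52": 0.98, "RBF": 0.99})

# Hypothetical outputs from the 5 GPs of one cluster (illustrative only)
means = [0.101, 0.099, 0.100, 0.102, 0.098]
sigmas = [0.002, 0.003, 0.002, 0.003, 0.002]
prediction, u_gp, u_cv, u_f = combine_fold_predictions(means, sigmas)
```

In this example the GP-reported uncertainty exceeds the fold-to-fold spread, so UF is set by <UGP>.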
Computational considerations were also important in this use case. For these applications, the method should scale linearly and be parallelizable for both training and evaluation. On the order of 1,000 GP points were needed for reasonable accuracy and to remain viable with conventional computing architectures (e.g., a system with less than a 5 gigahertz processor and 40 gigabytes (GBs) of random-access memory).
Probabilistic Safety Assessment and Management (PSAM) Topical, October 23-25, 2023, Virtual

Accordingly, the full training dataset of 150,000 points was divided piecewise by constrained k-means clustering, with clusters constrained to 500 to 1,000 points per cluster. With these choices, the clustering and training each finished on a timescale of days with 200 clusters. The total number of GPs within the piecewise surrogate model was thus 20,000 (i.e., 200 clusters times 5 samplings per activity times 10 activities). This type of surrogate ML model does not compress the data. The training data was 35 megabytes on disk, while the surrogate model was much larger because the storage of GPs is known to scale on the order of O(n²) for n training points (Wang et al., 2019; Rasmussen & Williams, 2006). The total size of the surrogate ML model was on the order of 80 GBs when serialized, and greater than 10 GBs if completely loaded into memory. However, the piecewise surrogate was utilized by cycling through the clusters (which could be parallelized), thus requiring less than 10 GBs of memory during predictions. The loading time was on the order of 10 seconds per cluster, and the evaluation for each predicted elemental activity at a given composition required, on average, 0.0052 seconds with a standard deviation of 0.0065 seconds.
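The O(n²) storage scaling can be illustrated with a rough estimate. The sketch below assumes that each GP stores one dense n x n matrix of 8-byte floating-point values and that n equals the full cluster size; both are simplifying assumptions, and the actual serialized layout of the surrogate model is not described here.

```python
def dense_kernel_bytes(n_points, bytes_per_value=8):
    """Bytes to store one dense n x n kernel matrix (assumes 8-byte floats)."""
    return n_points ** 2 * bytes_per_value

N_GPS = 20_000  # total number of GPs in the piecewise surrogate, as reported

for n in (500, 1_000):  # cluster-size bounds from the constrained clustering
    total_gb = N_GPS * dense_kernel_bytes(n) / 1e9
    print(f"n = {n}: about {total_gb:.0f} GB")
```

Under these assumptions the estimate ranges from about 40 GB to about 160 GB, which brackets the reported serialized size on the order of 80 GBs.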
4.2 Results and Discussion

Following the training and testing, the activities of 15,000 new data points were used to validate the model.
Figure 7 shows the representative parity plots for the predicted activities as calculated by the p-GP surrogate model versus the actual activities calculated by CALPHAD for chromium, manganese, and silicon. All the plots show good agreement between the p-GP surrogate model and the validation data.
Figure 7: Representative parity plots for the p-GP surrogate model predicted versus CALPHAD-calculated activities for several chemical elements in iron-based alloys at various temperatures.
To demonstrate the benefit of this type of surrogate ML model over a multi-linear regression, an ordinary least squares (OLS) surrogate model was made with the same input data. Although not shown here, the parity plots obtained for the OLS model were nonlinear. Table 2 shows the mean absolute errors (MAEs) for each model and each elemental activity along with the ratio of the errors, taken as the p-GP model MAE divided by the OLS model MAE. The comparisons show that the MAE from the p-GP model is on the order of a hundredth to a tenth of that of the OLS model, with ratios ranging from 0.01 to 0.23. The parity plots and comparisons with the OLS surrogate model predictions both indicate that the trained p-GP surrogate model was able to capture the non-linearity of the data well.
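The error metric and ratio used in this comparison can be sketched as follows. The helper function and the toy values are hypothetical; only the two iron MAEs are taken from Table 2.

```python
def mean_absolute_error(predicted, actual):
    """Mean absolute error between paired predictions and reference values."""
    if len(predicted) != len(actual):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Toy values (hypothetical) showing the MAE calculation itself
toy_mae = mean_absolute_error([0.10, 0.20, 0.30], [0.11, 0.18, 0.30])

# MAEs reported for the iron activity in Table 2
pgp_mae_iron = 2.2e-4
ols_mae_iron = 4.4e-3
ratio_iron = pgp_mae_iron / ols_mae_iron
print(f"ratio of MAEs for iron: {ratio_iron:.2f}")
```

A ratio well below 1 means the p-GP surrogate reproduces the CALPHAD-calculated activity more closely than the OLS surrogate.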
Table 2: Accuracies of the p-GP and OLS surrogate models.

Activity                   Iron          Chromium      Manganese      Silicon        Carbon
p-GP Surrogate Model MAE   2.2 x 10^-4   6.6 x 10^-4   2.6 x 10^-8    8.8 x 10^-13   3.5 x 10^-5
OLS Surrogate Model MAE    4.4 x 10^-3   1.6 x 10^-2   1.6 x 10^-6    5.1 x 10^-11   1.5 x 10^-4
Ratio of MAEs              0.05          0.04          0.02           0.02           0.23

Activity                   Titanium      Molybdenum    Aluminum       Niobium        Nickel
p-GP Surrogate Model MAE   9.2 x 10^-9   8.2 x 10^-6   2.5 x 10^-11   2.3 x 10^-7    2.0 x 10^-7
OLS Surrogate Model MAE    4.0 x 10^-8   8.3 x 10^-5   2.3 x 10^-9    1.1 x 10^-6    1.5 x 10^-5
Ratio of MAEs              0.23          0.10          0.01           0.20           0.01

Uncertainty estimates for the p-GP surrogate model should be reliable and within a specified tolerance to render confidence in applications of the proposed method. Thus, two uncertainty metrics were examined:
(1) whether <UGP> was less than UCV (a measure of the GPs' ability to include epistemic uncertainty in <UGP>), and (2) whether the final surrogate ML model predicted UF in agreement with the distribution of the prediction errors as measured by the difference between the predictions and actual values (a measure of the final accuracy of UF). For the first metric, considering all the predicted points across all the elemental activities, <UGP> was less than UCV for 97 percent of the values. This result suggests that the GPs do a good job at including epistemic uncertainties in their estimated uncertainties. For the second metric, 2 x UF was compared to the prediction errors (i.e., the residuals, R). For accurate, normally distributed errors, |R| < 2 x UF will be true for about 95 percent of the points. Here, it was found that |R| < 2 x UF held 95.0 +/- 0.8 percent of the time when averaged across all ten elemental activities and across all 15,000 validation points. This level of agreement was fortuitous. These results demonstrate that taking an envelope of 2 x UF as a 95-percent confidence interval works well. For points found to be further away from the prediction estimate (e.g., 3 x UF), the residuals were greater than would be expected for normally distributed uncertainties. However, the ML methods used here would allow for more complex uncertainty estimates than were used in the limited analysis presented here.
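The second metric amounts to an empirical coverage check. The sketch below is illustrative only: the function name and the synthetic data are assumptions, with residuals drawn from a normal distribution whose standard deviation equals the stated uncertainty.

```python
import random

def coverage(residuals, uncertainties, k=2.0):
    """Fraction of points whose absolute residual lies within k * UF."""
    inside = sum(abs(r) < k * u for r, u in zip(residuals, uncertainties))
    return inside / len(residuals)

# Synthetic check (hypothetical data): normally distributed residuals with a
# correctly estimated 1-sigma uncertainty should give roughly 95 percent.
rng = random.Random(0)
u_f = [rng.uniform(0.5, 2.0) for _ in range(20_000)]
residuals = [rng.gauss(0.0, u) for u in u_f]
frac_2 = coverage(residuals, u_f, k=2.0)
frac_3 = coverage(residuals, u_f, k=3.0)
```

Comparing the observed fractions at several multiples of UF against the values expected for a normal distribution indicates where the tails of the actual error distribution are heavier than assumed.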
This use case demonstrated that, for this problem, this type of surrogate ML model was suitable for accurate target and uncertainty predictions relative to OLS, even when using a reduced, one-dimensional representation of the uncertainties. The uncertainty estimates are thus suitable for sensitivity analysis and uncertainty propagation and can enable reasonable confidence interval predictions for applications where only sparse data is available.
5. Conclusions

The NRC staff has explored three use cases of AI/ML for engineering applications involving mechanical systems and component performance. For the first use case, the NRC staff developed an anomaly detector using an LSTM neural network within the structure of an autoencoder to monitor nuclear system performance. For the second use case, the NRC staff augmented Monte Carlo simulations of piping component integrity with random forest regression models to conduct sensitivity analysis and to predict time series as surrogate models. For the third use case, the NRC staff augmented a sparse data set for MSR materials compatibility research by creating a p-GP surrogate model. The success of these efforts indicates that AI/ML may have a future role in augmenting nuclear regulatory activities through increased efficiency and effectiveness and enhanced decision-making.
Acknowledgements

The authors wish to thank John McKirgan for his support and encouragement of the NRC staff in exploring applications of innovative AI/ML methods for engineering applications.
Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC (NTESS), a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration (DOE/NNSA) under contract DE-NA0003525. This written work is authored by an employee of NTESS. The employee, not NTESS, owns the right, title and interest in and to the written work and is responsible for its contents. Any subjective views or opinions that might be expressed in the written work do not necessarily represent the views of the U.S. Government. The publisher acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this written work or allow others to do so, for U.S. Government purposes. The DOE will provide public access to results of federally sponsored research in accordance with the DOE Public Access Plan.
 
References

Andersson, J-O, T. Helander, L. Höglund, and P. Shi, Thermo-Calc and DICTRA, Computational Tools for Materials Science. Calphad, 2002. pp. 273-312.
Carlson, J., M. Homiack, and R. Iyengar, Autonomous Researcher Feasibility Studies, TLR-RES/DE/REB-2022-13, 2022. Washington, DC: U.S. NRC.
materialstoday, 2010. pp. 34-41.
Hochreiter, S. and J. Schmidhuber, Long Short-Term Memory. Neural Computation, 1997. pp. 1735-1780.
Homiack, M., G. Facco, M. Benson, M. Erickson, and C. Harrington, Extremely Low Probability of Rupture Version 2 Probabilistic Fracture Mechanics Code, NUREG-2247, 2021. Washington, DC: U.S. NRC.
Hund, L., J. Lewis, N. Martin, M. Starr, D. Brooks, A. Zhang, R. Dingreville, A. Eckert, J. Mullins, P. Raynaud, D. Rudland, D. Dijamco, and S. Cumblidge, Technical Basis for the use of Probabilistic Fracture Mechanics in Regulatory Applications, NUREG/CR-7278, 2022. Washington, DC: U.S. NRC.
Matera, S., W. F. Schneider, A. Heyden, and A. Savara, Progress in Accurate Chemical Kinetic Modeling, Simulations, and Parameter Estimation for Heterogeneous Catalysis. ACS Catal., 2019. pp. 6624-6647.
Pillai, R., S. S. Raiman, and B. A. Pint, First Steps toward Predicting Corrosion Behavior of Structural Materials in Molten Salts. Journal of Nuclear Materials, 2021. p. 152755.
L. Kloosterman, L. Luzzi, E. Merle-Lucotte, J. Uhlíř, R. Yoshioka, and D. Zhimin, The Molten Salt Reactor (MSR) in Generation IV: Overview and Perspectives. Progress in Nuclear Energy, 2014. pp. 308-319.
Srivastava, N., E. Mansimov, and R. Salakhutdinov, Unsupervised Learning of Video Representations using LSTMs. In Proceedings of the 32nd International Conference on Machine Learning, 2015. Lille, France.
Wang, K. E., G. Pleiss, J. R. Gardner, S. Tyree, K. Q. Weinberger, and A. G. Wilson, Exact Gaussian Processes on a Million Data Points. In Proceedings of the 33rd Conference on Neural Information Processing Systems, NeurIPS, 2019. Vancouver, Canada.
15}}
15}}

Latest revision as of 14:15, 13 November 2024

Engineering Applications of Artificial Intelligence and Machine Learning for Mechanical Systems and Component Performance
ML23221A402
Person / Time
Issue date: 08/11/2023
From: Matthew Homiack, Raj Iyengar, Matrachisia J, Pillai R, Savara A, Starr M, Verzi S, Villareal T
Office of Nuclear Regulatory Research, Oak Ridge, Sandia
To:
References
Download: ML23221A402 (1)


Text

Engineering Applications of Artificial Intelligence and Machine Learning for Mechanical Systems and Component Performance

Matthew Homiack1*, John Matrachisia1, Tristan Villarreal1, Aditya Savara1, Raj Iyengar1, Stephen Verzi2, Michael Starr2, and Rishi Pillai3

1 U.S. Nuclear Regulatory Commission, Washington, D.C., U.S.A.

2 Sandia National Laboratories, Albuquerque, New Mexico, U.S.A.

3 Oak Ridge National Laboratory, Oak Ridge, Tennessee, U.S.A.

* Corresponding Author (Matthew.Homiack@nrc.gov)

Abstract

Artificial intelligence (AI) and machine learning (ML) methods are some of the fastest-growing technologies globally and have the potential to enhance efficiency, effectiveness, and decision-making processes for the nuclear industry. This paper explores several recent use cases of AI/ML methods in support of the U.S. Nuclear Regulatory Commission (NRC) staff's safety research efforts for mechanical systems and component performance.

The first use case explored ML to monitor the performance of a system. This study used a full-scope boiling water reactor (BWR) simulator for synthetic data generation. Scenarios were created to induce component malfunctions that may go undetected by operators and could lead to adverse operating conditions. The goal was to support early detection of the malfunctions using ML and thereby provide operators with increased response times. A long short-term memory (LSTM) autoencoder was trained and tested for identifying the anomalies in real time. The study demonstrated the potential for using ML to monitor system performance.

The second use case explored ML to augment Monte Carlo simulation. For this use case, an interface was developed between open-source ML models and the Extremely Low Probability of Rupture (xLPR) probabilistic fracture mechanics (PFM) code. The xLPR code was used to analyze leak-before-break behavior in a pressurized water reactor piping system subject to cracking. Supervised ML, in the form of random forest regression, was applied to the sample input and output data from the xLPR code to conduct a sensitivity analysis. Through this analysis, the input variables that were most important with respect to the selected quantities of interest (QoIs) were determined both individually in univariate analyses and across all the QoIs in a multivariate analysis. The ML models were then further applied to explore time series prediction as a surrogate model for the xLPR simulation.

The third use case explored ML to overcome sparse data and enable long-term predictions of materials compatibility for nuclear reactor components in molten salt environments. It involved surrogate ML model prediction of values relevant to corrosion and associated uncertainties. This use case involved the application of a piecewise surrogate ML model for predicting the activities of chemical elements in alloys.

Such predictions are useful for lifetime assessment of materials compatibility because the activities have been associated with long-term loss of metal due to corrosion attack, particularly loss of chromium. The demonstration showed that surrogate ML models can be used to augment sparse datasets with predicted values for unmeasured or not explicitly calculated datapoints, with reasonable prediction uncertainties.

The views expressed in this paper are those of the authors and do not reflect the views of the U.S. Nuclear Regulatory Commission. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States. Approved for public release; distribution is unlimited.

This report was prepared as an account of work sponsored by an agency of the U.S. Government. Neither the U.S. Government nor any agency thereof, nor any of their employees, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party's use, or the results of such use, of any information, apparatus, product, or process disclosed in this report, or represents that its use by such third party would not infringe privately owned rights. The views expressed in this paper are not necessarily those of the U.S. Nuclear Regulatory Commission.

Keywords: Artificial Intelligence, Nuclear, Regulation, Research

1. Introduction

AI/ML methods are some of the fastest-growing technologies globally and have the potential to enhance the business of industry, academia, and government. As with other applications, the benefits of AI/ML for engineering generally include increases in efficiency and effectiveness and enhanced decision-making. For instance, AI/ML methods can efficiently generate data as compared to physical experiments and other forms of synthetic data generation. AI/ML methods can be more effective at processing large or high-velocity data, thereby supporting analysis of highly complex systems. AI/ML methods can also enhance decision-making by identifying patterns and trends and enabling uncertainty quantification.

This paper explores three recent use cases of AI/ML methods in support of the NRC staff's safety research efforts for mechanical systems and component performance in the following areas:

1. system performance monitoring
2. Monte Carlo simulation and PFM for piping integrity analysis
3. overcoming sparse data to enable long-term predictions of materials compatibility of nuclear reactor components in molten salt environments

Sections 2, 3, and 4 present the technical approach, results, and discussion for each use case, respectively.

2. Use Case 1: System Performance Monitoring

The first use case explored ML methods to monitor the performance of a system. Specifically, an anomaly detector was developed using an unsupervised ML method to detect a malfunction from multivariate, sequential data. Synthetic data for training and testing the ML models were generated using a full-scope BWR simulator, which supports modeling of a substantial number of initial conditions, operations, malfunctions, and recording of real-time plant parameters. As a result, the simulator data was suitable for developing realistic scenarios and training various types of ML models for system performance monitoring research.

2.1 Approach

The approach to demonstrate potential application of ML methods for system performance monitoring began with the curation of a robust dataset. Using the simulator as a surrogate for a real BWR, a subset of relevant plant parameters was captured in time-series datasets. After the samples were collected, techniques such as feature selection, averaging, binning, and splitting were applied to make sure that the data was in a form suitable for model training and testing.

The next step was to select the proper ML algorithm. This selection was informed by the characteristics of the data, the problem at hand, and the desired output. For this use case, an anomaly detector was developed with an autoencoder, which uses an unsupervised learning algorithm. An autoencoder relies only on input data, so during training the input data is set as the ground truth for the predictions. The neural network compresses the input data into a reduced-dimension space and reconstructs the information into the output, which in theory should closely match the input data (Srivastava et al., 2015). As the neural network trains, the error in the predicted value is backpropagated to fit the output to the input using the loss function, in this case the mean squared error (MSE). The MSE is based on the residuals between the input and the autoencoder's predicted output. During training, the autoencoder was trained to minimize the MSE for the training set. When new datasets similar to the training dataset are fed as inputs to the model, the MSE of the predicted outputs will be low; however, when new datasets quantitatively different from the training dataset are fed as inputs, the MSE will be relatively high.
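The reconstruction error that drives both training and anomaly detection can be sketched as follows. This is a minimal illustration, not the model used in the study: the function name and the example window are hypothetical, and the reconstructions are supplied by hand rather than produced by a trained autoencoder.

```python
def reconstruction_mse(window, reconstruction):
    """MSE between an input window and its reconstruction.

    Both arguments are lists of time steps; each time step is a list of
    feature values (for example, plant parameters).
    """
    squared_errors = [
        (x - x_hat) ** 2
        for step_in, step_out in zip(window, reconstruction)
        for x, x_hat in zip(step_in, step_out)
    ]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical 2-step, 3-feature window and two candidate reconstructions
window = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]
close = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.0]]  # near-perfect reconstruction
far = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # poor reconstruction

mse_close = reconstruction_mse(window, close)
mse_far = reconstruction_mse(window, far)
```

A window that resembles the training data reconstructs well and yields a low MSE, whereas a window unlike the training data yields a high MSE.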

Figure 1 shows the general method of a standard autoencoder. The autoencoder has a symmetrical neural network architecture composed of an encoder, a latent space, and a decoder. For this use case, because the input data is sequential, the autoencoder used LSTM neural networks. An LSTM neural network is a type of recurrent neural network that has special gates which can feed output back to themselves and forget information from the previous state (Hochreiter & Schmidhuber, 1997). Due to their nature, LSTM neural networks work well for processing time-series data. With the combination of the LSTM neural network within the structure of an autoencoder, the model can train with a window of multivariate sequential data, such as a time series.

Figure 1: Illustration of a general autoencoder architecture with MSE-based training shown by arrows.

The final step of the approach was to train and evaluate the ML model. The autoencoder was trained using a dataset based on a subset of normal BWR operating conditions and evaluated using datasets of abnormal operating conditions. This step is an iterative process and focused on optimization of the neural network by performing a sensitivity study on the hyper-parameters (e.g., epochs, batch sizes, learning rates, and activation functions), preprocessed data, and the overall architecture of the neural network. After multiple iterations of the autoencoder, the model was successfully trained and then evaluated for its anomaly detection capabilities.

2.2 Results and Discussion

The performance of the anomaly detector built using the ML models was evaluated by comparing the MSE from the normal operating condition data against the MSE from the abnormal operating condition data. The optimized neural network was trained with 251,783 trainable parameters; 14,878 training samples; 100 epochs; and 7 input features. The input features were a subset of relevant plant parameters that could be available to a typical BWR instrumentation and control system, such as the system flow rates, temperatures, pressures, and water and power levels.

Figure 2 shows the MSE results from the training dataset. The MSE reached a maximum value of 3.23 with an average value of 3.03. The maximum value is critical for determining the threshold for detecting an anomaly, because it allows the MSE from the normal operating conditions to be compared to the MSE from the abnormal operating conditions.
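Threshold-based detection of this kind can be sketched as follows. Using the maximum training MSE directly as the alarm threshold is an assumption made for illustration, since the exact detection rule is not stated here; the function name and the MSE trace are also hypothetical.

```python
def first_alarm_index(mse_series, threshold):
    """Index of the first MSE value above the threshold, or None if none."""
    for i, mse in enumerate(mse_series):
        if mse > threshold:
            return i
    return None

# Threshold assumed equal to the maximum training MSE reported above
THRESHOLD = 3.23

# Hypothetical MSE trace: normal operation followed by a malfunction
trace = [3.01, 3.05, 2.98, 3.10, 4.70, 6.20]
alarm = first_alarm_index(trace, THRESHOLD)
```

In practice, a margin above the maximum training MSE or a persistence requirement over several consecutive windows might be added to reduce false alarms.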

Figure 2: Training dataset results based on normal operating conditions.

Figure 3 shows the MSE results from four test cases where a simulated equipment malfunction was introduced at 10, 15, 20, and 25 minutes, respectively. The results are compared to the MSE from the normal operating conditions. The simulated malfunction was a r ecirculation pump runaway where the recirculation flow rate increases unexpectedly resulting in an increase in core power. As the results show, when the malfunction was introduced, the MSE markedly increased giving a clear indication that there was a deviation from normal operating conditions.


Figure 3: Autoencoder MSE for normal operating conditions vs. abnormal operating conditions produced via simulated equipment malfunctions at different points in time.

A fifth test case was also run to demonstrate that the neural network was properly trained. For this test case, simulated equipment malfunctions were introduced at several intervals with different severities. Figure 4 shows the results indicating the correct timing and severity of the malfunctions.


Figure 4: Autoencoder MSE for normal operating conditions vs. abnormal operating conditions produced by simulated equipment malfunctions at various severities and intervals.

Overall, the results of the trained autoencoder demonstrate that some aspects of system performance can be monitored using ML methods. This research could be expanded by further exploring the capabilities and limitations of the autoencoder and by enhancing the anomaly detector with other ML methods to both detect and classify malfunctions. Although the results of the induced malfunctions were as expected, the model was trained on a small subset of normal operating condition data. More training would therefore be needed to fit a wider range of normal operating conditions so that the model would not flag these conditions as anomalous.

3. Use Case 2: Monte Carlo Simulation Support

In PFM, stochastic analyses enabled by Monte Carlo simulation are used to better understand the various uncertainties when predicting the load-carrying capabilities of components containing cracks. The second use case explored ML methods to augment PFM simulations by conducting sensitivity analysis and time-series prediction via surrogate models.

3.1 Approach

This use case consisted of two separate but related applications of ML coupled with the xLPR code. xLPR is a PFM code for piping applications that was developed jointly by the NRC Office of Nuclear Regulatory Research and the Electric Power Research Institute (Homiack et al., 2021). This code was used to simulate the potential behaviors of a preexisting circumferential crack in an unmitigated, Westinghouse-designed, pressurized water reactor pressurizer surge line weld subject to primary water stress corrosion cracking (PWSCC).

There were 56 inputs in the simulation, including both constant and uncertain or probabilistic inputs. These inputs covered such areas as the geometry of the pipe, operating conditions, crack size, applied stresses (e.g., welding residual stresses (WRS) defined at 26 points through the thickness of the weld), and material properties (e.g., PWSCC growth rate model parameters). Six outputs or QoIs were selected related to key behaviors of the problem, such as the crack size (e.g., crack depth normalized by the weld thickness) and whether the cracks cause leaks (e.g., occurrence of leak). A full list of all the inputs and outputs is provided in (Carlson et al., 2022).


In both applications, four separate, off-the-shelf, open-source ML regression methods were explored: (1) linear regression, (2) random forest, (3) gradient boosting, and (4) multilayer perceptron. Each of these methods uses supervised ML, where the xLPR simulation data provided the inputs and target outputs for supervision. The ML models were constructed and trained in Python using the scikit-learn library. Linear regression was used to baseline the other models, even though some QoIs were not expected to respond linearly. The hyperparameters of the ML methods were not optimized, and the specific values used in each application are detailed next.

Determining the Most Important PFM Analysis Inputs. The first application sought to use ML methods to automate a sensitivity analysis. As described in (Hund et al., 2022), sensitivity analysis focuses on identifying how the input uncertainties contribute to the uncertainty in the QoIs and can help to identify those inputs that explain substantial uncertainty in the model output. For this application, all four regression methods performed similarly; therefore, only the results from the random forest regressor are presented herein. In using the scikit-learn random forest ensemble regression method, the only parameter besides the random seed that needed to be specified was the number of estimators, which was set to 1,000. The permutation importance method from the scikit-learn inspection sub-package was used to rank the relative importance of all the input parameters.
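As a hedged illustration of this workflow, the sketch below fits a scikit-learn random forest to synthetic data and ranks the inputs by permutation importance. The feature names, the synthetic response, and the reduced number of estimators (100 rather than the 1,000 used in the study) are assumptions made to keep the example small; the xLPR data are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic stand-in for sampled simulation inputs (names are hypothetical).
rng = np.random.default_rng(0)
n_samples = 500
feature_names = ["wrs_point_1", "growth_factor", "unrelated_input"]
X = rng.normal(size=(n_samples, 3))

# Synthetic output: dominated by the first input, weakly nonlinear in the second.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=n_samples)

# Only the number of estimators and the random seed are set, as in the text.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Permutation importance: the drop in model score when one input is shuffled.
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranking = sorted(
    zip(feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.4f}")
```

Because each input is shuffled independently, the importances need not sum to one, and strongly influential inputs can score above 1.0.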

Surrogate Modeling for Generating PFM Analysis Outputs. The second application sought to use ML methods to improve the efficiency of the PFM simulation via development and training of a surrogate model that could predict the time series of a specific QoI given a set of input samples from the xLPR code.

At first, the occurrence of leak time series was selected for prediction. All four of the ML regression methods were used to predict the time at which a leak occurs, if a leak occurs, given specific xLPR inputs. However, because all the ML methods are supervised, and the case of no leak was much more likely in this problem, there were not enough positive leak samples to support such a prediction. Each ML method could be trained using the entire set of 2,000 samples, but the methods could not generalize when trained on only a portion of the data, such as when the data were split into 75 percent for training and 25 percent for testing.

Given these limitations, the focus of the study was changed to predicting the normalized crack depth propagation over time. Normalized crack depth propagation was not expected to be linear, thus the linear regression results needed special interpretation beyond a predicted value of 1.0 for the normalized crack depth (i.e., the point at which the crack penetrates through-wall). Neither the gradient boosting nor the multilayer perceptron method is suited for time-series prediction without further research and design considerations. Thus, the results for the surrogate modeling were restricted to the linear regression and random forest methods. For time-series prediction, the random forest method was provided with all the xLPR code inputs as well as the current normalized crack depth as an input, and it was trained to predict the next normalized crack depth.
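The next-step formulation described above can be sketched as follows. Everything here is an assumption for illustration: `simulate_depth` is a toy growth law standing in for the xLPR code, a single hypothetical `rate` input replaces the full input set, and 50 estimators are used for speed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def simulate_depth(rate, steps=40, start=0.1):
    """Toy stand-in for the PFM code: nonlinear growth capped at through-wall (1.0)."""
    depth = [start]
    for _ in range(steps):
        depth.append(min(1.0, depth[-1] + rate * (1.0 + depth[-1])))
    return np.array(depth)


# Build (static input, current depth) -> next depth training pairs.
rng = np.random.default_rng(0)
features, targets = [], []
for rate in rng.uniform(0.01, 0.05, size=200):
    series = simulate_depth(rate)
    for current, following in zip(series[:-1], series[1:]):
        features.append([rate, current])
        targets.append(following)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(np.array(features), np.array(targets))


def roll_out(model, rate, steps=40, start=0.1):
    """Predict a full time series by feeding each prediction back in as input."""
    depth = [start]
    for _ in range(steps):
        next_depth = model.predict(np.array([[rate, depth[-1]]]))[0]
        depth.append(float(min(1.0, next_depth)))
    return np.array(depth)


predicted = roll_out(model, rate=0.03)
reference = simulate_depth(0.03)
print("largest absolute deviation:", float(np.max(np.abs(predicted - reference))))
```

Feeding predictions back in lets errors accumulate over the roll-out, which is one reason the spread across repeatedly trained models is worth examining.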

3.2 Results and Discussion

Determining the Most Important PFM Analysis Inputs. Following the approach described in Section 3.1, the uncertain inputs in the xLPR code simulation were ranked by importance for each QoI individually (i.e., univariate analysis) and across all the QoIs simultaneously (i.e., multivariate analysis). Because the appropriate number of realizations needed to be determined for the surrogate modeling application, three separate ML models were trained on data from 200; 2,000; and 20,000 xLPR realizations, respectively.

Table 1 lists the top ten most important inputs based on their permutation importance as determined using the random forest ML model for each sample size. A dash indicates that the input did not rank in the top ten for the given sample size. All three sample sizes resulted in the WRS at point 1 through the weld thickness being the most important input in the analysis. The results from the 2,000 and 20,000 sample sizes also identified two parameters in the PWSCC growth model (i.e., the component-to-component and within-component variability factors) as being the second- and third-most important inputs in the analysis, respectively. Given the consistency between the results based on the 2,000 and 20,000 sample sizes, the former was determined to be appropriate for the surrogate modeling application.

Table 1: Ranked permutation importance values for the top ten xLPR code inputs across all QoIs (multivariate) using various sample sizes.

Permutation Importance by Number of Realizations
Rank  Input Variable Description                                       200      2,000    20,000
1     WRS at point 1                                                   0.5973   0.9740   1.2146
2     PWSCC growth model component-to-component variability factor     0.0280   0.2029   0.3489
3     PWSCC growth model within-component variability factor           -        0.1503   0.2416
4     WRS at point 14                                                  -        0.0301   -
5     WRS at point 26                                                  0.0680   0.0268   0.0177
6     WRS at point 22                                                  0.0501   0.0230   -
7     WRS at point 2                                                   0.0158   0.0176   0.1115
8     WRS at point 23                                                  -        0.0163   -
9     WRS at point 24                                                  0.0545   0.0162   0.0162
10    WRS at point 7                                                   0.0133   0.0158   1.2146

Surrogate Modeling for Generating PFM Analysis Outputs. Following the approach described in Section 3.1, the normalized crack depth time-series was predicted using three ML methods: (1) linear regression, (2) single-trained random forest, and (3) random forest, each using a dataset of 2,000 random samples generated by the xLPR code. For the linear regression method, the dataset was split into 75 percent for training and 25 percent for testing; for the random forest method, the dataset was split conversely. Also, the random forest model was parameterized using 250 estimators versus the 1,000 estimators that were used in the application for determining the most important inputs. Reducing the number of estimators in the ensemble allows the random forest model to generalize better.

Figure 5 shows the three ML model predictions vs. the xLPR code predictions, which represent the ground truth. The linear regression results captured the time to through-wall crack penetration (i.e., normalized crack depth value of 1.0), even though they did not fully capture the curvature of the crack depth propagation or the termination at the normalized crack depth value of 1.0. For this reason, the vertical axis range of the linear regression plot is twice that of the other two plots. The single-trained random forest model captured the curvature of the crack depth propagation well; however, it predicted a normalized crack depth of 1.0 sooner than the xLPR code simulation predicted. Lastly, since the random forest regression method is stochastic, 100 random forest models were trained to determine the spread of the predictions. The results show that ML methods can support time-series prediction for determining the time after which a leak will occur given a set of input samples from the xLPR code, particularly in the case of the random forest regression method. A potential benefit of using the random forest method over the linear regression method is that fewer samples were required (i.e., only 25 percent of the dataset was required for training the random forest model versus 75 percent of the dataset required for training the linear regression model).

Figure 5: xLPR code predictions of normalized crack depth time-series vs. ML model predictions using linear regression (left), single-trained random forest regression (middle), and random forest regression (right).

4. Use Case 3: Overcoming Sparse Data

The third use case explored ML methods to overcome sparse data and enable long-term predictions of materials compatibility of nuclear reactor components in molten salt environments. Molten salt reactors (MSRs) for nuclear power generation are a current area of interest in nuclear research and development. Molten salts based on fluorides and chlorides are being considered across a broad spectrum of MSRs (Delpech et al., 2010, Serp et al., 2014). There is, in general, a lack of advanced materials that are robust against material degradation in these reactors for tens of years. Further, corrosion of structural materials in molten salts is a critical degradation phenomenon that presents a barrier to the technical realization of MSRs. Notably, there are significant gaps in the scientific understanding of the governing mechanisms and correspondingly large gaps in the ability to predict corrosion rates. Multiple corrosion mechanisms can also occur simultaneously, which increases the complexity of determining the degradation kinetics. However, surrogate ML models may help to fill these data gaps.

A key challenge is that certain materials degradation phenomena might only appear after years or tens of years. The routine experiments, modeling, and monitoring of today can only achieve short timescales (e.g., minutes, days, or months) and thus may not be representative of long-term material degradation. Therefore, it is beneficial to develop, validate, and verify computational methods that can be used to predict materials compatibility and the performance of MSRs over their operational timescales. These methods consider model sensitivities and uncertainties and propagate them under varying conditions. Routine, long-term simulations are computationally intensive and hence not practical from an engineering standpoint when explicit physics-based approaches are used. A promising route for addressing these challenges is to enhance multi-scale modeling simulations with surrogate ML models.

In general, only a small number of long-term experiments are currently performed, typically on the order of ten (e.g., a study might have 5 different alloy compositions for the same set of corrosion conditions). To overcome this sparsity of data and make long-term predictions, one option is to use a combination of experimental data and ML-augmented simulation data to predict points for conditions or systems that differ from those of the experimental data. This augmentation can be done with consideration of the joint uncertainty probability distributions of the long-term predictions, with uncertainty propagation using Bayesian methods. With Bayesian methods, it is possible to first create a surrogate model using best guesses from simulations and later update the posterior distributions (i.e., final models) each time additional experimental or theoretical data become available (Matera et al., 2019). This strategy is compatible with using surrogate ML models for loss of metals during corrosion processes in MSRs, provided that the surrogate ML model has quantifiable uncertainties. The corrosion rates have been found to be nearly linear relative to chemical elements' thermodynamic activities in one set of studies under MSR conditions (Pillai et al., 2021), particularly with the thermodynamic activity of chromium (Pillai et al., 2015, Pillai et al., 2023). Accordingly, it is useful to investigate whether a surrogate ML model can predict the activity of metal alloys for arbitrary compositions at arbitrary temperatures. If so, such a model could then be used in multi-scale modeling efforts.
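The idea of starting from a simulation-based estimate and updating it as data arrive can be illustrated with a textbook conjugate update for a normally distributed quantity with known measurement variance. This is a generic sketch, not the method of Matera et al. (2019); the prior, the measurements, and the variances are hypothetical.

```python
def update_normal(prior_mean, prior_var, observation, obs_var):
    """Posterior of a normal mean given a normal prior and one noisy observation."""
    posterior_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    posterior_mean = posterior_var * (prior_mean / prior_var + observation / obs_var)
    return posterior_mean, posterior_var


# Hypothetical prior from simulation: a best guess with a wide variance.
mean, var = 1.0, 4.0

# Hypothetical measurements arriving one at a time, each with variance 1.0.
for measurement in [2.0, 2.2, 1.9]:
    mean, var = update_normal(mean, var, measurement, obs_var=1.0)
    print(f"posterior mean = {mean:.3f}, posterior variance = {var:.3f}")
```

Each update pulls the estimate toward the data and narrows the variance, which is the behavior that makes sequential updating attractive when experiments are sparse.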

4.1 Approach

Sparse data set augmentation by a surrogate ML model was investigated to predict the thermodynamic activity of chemical elements in alloys, with investigation of the surrogate ML model's ability to return accurate predictions and uncertainties. The performance of a multi-linear regression model was also compared. The temperature range was 600 to 800 degrees Celsius (°C) (1,112 to 1,472 degrees Fahrenheit (°F)). The alloys were composed of the following ten elements: iron, chromium, manganese, silicon, carbon, titanium, molybdenum, aluminum, niobium, and nickel, with concentrations chosen from the realistic range for iron-based alloys. The training and testing data creation and validation were both based on the thermodynamic calculations from the Thermo-Calc software for Calculation of Phase Diagrams (CALPHAD) (Andersson et al., 2002). A successful surrogate ML model with reliable uncertainty quantification for its predictions could enable ML-augmented multi-scale modeling if it can generate more rapid predictions than CALPHAD. Additionally, as noted, the surrogate ML model could be built upon using experimental data and Bayesian methods, thereby using both forms of data to create useful predictions with acceptable uncertainties.

To create the training and testing data, CALPHAD was used to calculate the thermodynamic activities of each of the ten elements present for 50,000 points of iron-based alloys at temperatures of 600, 700, and 800°C, with the points chosen by astroidal Sobol sampling in the realistic concentration ranges. The CALPHAD calculations additionally provided the percent of face-centered cubic gamma and body-centered cubic alpha phases, which were also used in the training. The activity of the elements is not simply a linear function of their concentration, and the presence of other elements affects the thermodynamic activity of each. Figure 6 shows some representative element activities in this range (sliced out of the multi-dimensional space) to illustrate that the activities do not have a simple functional dependence on the concentration of those elements alone. Clearly, the responses are nonlinear.

Figure 6: Activity versus concentration for chromium (left), manganese (center), and silicon (right) as calculated by CALPHAD for iron-based alloys at temperatures of 600, 700, and 800 °C.

Although there are 150,000 data points in total, the problem is one of sparse data because even this amount of data is not sufficient for accurate linear interpolation given the 13-dimension, non-linear dependence.


However, as will be shown, a surrogate ML model can predict the activities of the various elements with high accuracy and good computational efficiency relative to the CALPHAD model. There is an upfront training cost, but following training, the surrogate ML model enables predictions for arbitrary compositions and temperatures within the range of training.

The surrogate ML modeling method that was chosen was based on Gaussian process (GP) regression. GPs have gained popularity in ML due to their combination of high accuracy and ability to provide estimated uncertainties for predicted points. A limitation is that GPs do not scale well to large numbers of training points, so techniques such as bagging or binning must be used for large datasets (e.g., tens of thousands of data points). To accomplish this scaling, a first stage of unsupervised ML is performed with constrained k-means clustering to produce 200 clusters with 500 to 1,000 points per cluster, which is suitable for making a piecewise surrogate model. This level of cluster size is near the practical limit for the conventional computing architectures of today with GPs. As noted, GPs enable uncertainty estimates for the predictions, and these uncertainties can be compared to the actual error on untrained points as well as from statistical sampling between multiple choices of training sets. These comparisons are important for trust considerations, which are a priority in safety-related applications.

The following approach was used to create the piecewise GP (p-GP) surrogate model with uncertainty quantification as a priority:

1. Constrained k-means clustering was performed to create regions for the piecewise surrogate model.
2. For each cluster, GP regression was performed with five-fold Monte Carlo cross-validation with an 80 percent training, 20 percent testing split within each fold. The GP regressions were performed independently for the activities of each of the ten elements.
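One way to read the second step is as repeated random sub-sampling: five independent 80/20 shuffles of the points in a cluster. The sketch below implements that reading with the standard library only. It is an interpretation rather than the authors' code; the cluster size of 750 points is a hypothetical value within the stated 500 to 1,000 range, and the constrained k-means step is not shown.

```python
import random


def monte_carlo_splits(n_points, n_splits=5, test_fraction=0.2, seed=0):
    """Return n_splits independent random (train, test) index partitions."""
    rng = random.Random(seed)
    n_test = round(n_points * test_fraction)
    splits = []
    for _ in range(n_splits):
        indices = list(range(n_points))
        rng.shuffle(indices)
        # The first n_test shuffled indices form the test set; the rest train.
        splits.append((sorted(indices[n_test:]), sorted(indices[:n_test])))
    return splits


# Hypothetical cluster of 750 points, within the 500 to 1,000 range in the text.
splits = monte_carlo_splits(n_points=750)
for train, test in splits:
    print(len(train), "training points,", len(test), "testing points")
```

Unlike classical k-fold cross-validation, the test sets drawn this way can overlap between folds.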

During the GP regression, the kernels evaluated were Mat32, Mat52, radial basis function, exponential, and cosine. The kernel retained was the first to achieve a regression coefficient of determination, r^2, greater than 0.97, or the kernel with the highest r^2 value if no kernel achieved a value greater than 0.97. Within the five folds per element activity for a given cluster, it was possible for different kernels to be chosen across the different folds. The surrogate model then involves averaging the predictions from this set of 5 GPs. With ten elements, this means there are 50 GPs for a given cluster. The estimate of the final surrogate ML model uncertainty of the prediction is taken as the greater of either (a) the average one standard deviation (1σ) uncertainty returned by the 5 GPs, which is the composite mean GP-predicted 1σ uncertainty, <UGP>, or (b) the 1σ variability from the 5-fold cross validation, UCV. This pair of uncertainties (i.e., one from the GP and one from the statistical sampling) makes it possible to check the GPs' ability to account for their epistemic uncertainties. The final surrogate model uncertainty was then taken as UF = max(<UGP>, UCV) for each elemental activity at each cluster.
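The kernel selection rule and the combined uncertainty can be sketched with scikit-learn, which is an assumed library choice since the source does not name the GP package used. The synthetic data, the reduced kernel list (Matern 3/2, Matern 5/2, and radial basis function only), and the reading of the cross-validation variability as the standard deviation of the five fold predictions are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern
from sklearn.metrics import r2_score
from sklearn.model_selection import ShuffleSplit

# Synthetic stand-in for one cluster and one target (not CALPHAD data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(120, 2))
y = np.sin(4.0 * X[:, 0]) + X[:, 1] ** 2

candidate_kernels = [Matern(nu=1.5), Matern(nu=2.5), RBF()]


def fit_with_kernel_rule(X_train, y_train, X_test, y_test, threshold=0.97):
    """Keep the first kernel whose r^2 exceeds the threshold, else the best one."""
    best_model, best_r2 = None, -np.inf
    for kernel in candidate_kernels:
        model = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, normalize_y=True)
        model.fit(X_train, y_train)
        score = r2_score(y_test, model.predict(X_test))
        if score > best_r2:
            best_model, best_r2 = model, score
        if score > threshold:
            break
    return best_model


# Five random 80/20 splits, one GP per split.
splitter = ShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
models = [fit_with_kernel_rule(X[tr], y[tr], X[te], y[te]) for tr, te in splitter.split(X)]

X_new = np.array([[0.3, 0.6], [0.8, 0.1]])
means, stds = zip(*(m.predict(X_new, return_std=True) for m in models))
means, stds = np.array(means), np.array(stds)

prediction = means.mean(axis=0)      # averaged surrogate prediction
u_gp = stds.mean(axis=0)             # <UGP>: mean GP-reported 1-sigma
u_cv = means.std(axis=0, ddof=1)     # UCV: spread of the five predictions
u_final = np.maximum(u_gp, u_cv)     # UF = max(<UGP>, UCV)
print(prediction, u_final)
```

Taking the larger of the two estimates is conservative: the reported uncertainty is never smaller than either the GPs' own estimate or the observed fold-to-fold spread.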

Computational considerations were also important in this use case. For these applications, the method should scale linearly and be parallelizable for both training and evaluation. On the order of 1,000 GP points were needed for reasonable accuracy and to remain viable with conventional computing architectures (e.g., a system with less than a 5 gigahertz processor and 40 gigabytes (GBs) of random-access memory).

Accordingly, the full training dataset of 150,000 points was divided piecewise by constrained k-means clustering with clusters constrained to 500 to 1,000 points per cluster. With these choices, the clustering and training each finished on a timescale of days with 200 clusters. The total number of GPs within the piecewise surrogate model was thus 10,000 (i.e., 200 clusters times 5 samplings per activity times 10 activities). This type of surrogate ML model does not compress the data. The training data was 35 megabytes on disk, while the surrogate model was much larger because the storage of GPs is known to scale at the order of O(n^2) for n training points (Wang et al., 2019, Rasmussen & Williams, 2006). The total size of the surrogate ML model was on the order of 80 GBs when serialized, and greater than 10 GBs of memory if completely loaded into memory. However, the piecewise surrogate was utilized by cycling through the clusters (which could be parallelized), thus requiring less than 10 GBs of memory during predictions. The loading time was on the order of 10 seconds per cluster, and the evaluation for each predicted elemental activity at a given composition required, on average, 0.0052 seconds with a standard deviation of 0.0065 seconds.
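A rough calculation shows why the quadratic storage scaling favors a piecewise model. The sketch assumes a dense n-by-n covariance matrix of 8-byte floating-point values, which is a simplification; actual serialized sizes depend on the GP library and on what each model stores.

```python
def dense_matrix_bytes(n_points, bytes_per_value=8):
    """Storage for one dense n x n matrix of double-precision values."""
    return n_points ** 2 * bytes_per_value


# One GP on a cluster-sized training set versus one GP on the full dataset.
per_cluster = dense_matrix_bytes(1_000)
full_dataset = dense_matrix_bytes(150_000)

print(f"1,000 points:   {per_cluster / 1e6:.0f} MB")
print(f"150,000 points: {full_dataset / 1e9:.0f} GB")
```

Under this assumption a single covariance matrix for all 150,000 points would need about 180 GB, while each cluster-sized matrix needs only a few megabytes and can be loaded one cluster at a time.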

4.2 Results and Discussion

Following the training and testing, the activities of 15,000 new data points were used to validate the model. Figure 7 shows the representative parity plots for the predicted activities as calculated by the p-GP surrogate model versus the actual activities calculated by CALPHAD for chromium, manganese, and silicon. All the plots show good agreement between the p-GP surrogate model and the validation data.

Figure 7: Representative parity plots for the p-GP surrogate model predicted versus CALPHAD-calculated activities for several chemical elements in iron-based alloys at various temperatures.

To demonstrate the benefit of this type of surrogate ML model over a multi-linear regression, an ordinary least squares (OLS) surrogate model was made with the same input data. Although not shown here, the parity plots obtained for the OLS model were nonlinear. Table 2 shows the mean absolute errors (MAEs) for each model and each elemental activity along with the ratio of the errors, taken as the p-GP model MAE divided by the OLS model MAE. The comparisons show that the MAE from the p-GP model is on the order of one-hundredth to one-tenth that of the OLS model. The parity plots and comparisons with the OLS surrogate model predictions both indicate that the trained p-GP surrogate model was able to capture the non-linearity of the data well.

Table 2: Accuracies of the p-GP and OLS surrogate models.

Activity     p-GP Surrogate Model MAE   OLS Surrogate Model MAE   Ratio of MAEs
Iron         2.2 x 10^-4                4.4 x 10^-3               0.05
Chromium     6.6 x 10^-4                1.6 x 10^-2               0.04
Manganese    2.6 x 10^-8                1.6 x 10^-6               0.02
Silicon      8.8 x 10^-13               5.1 x 10^-11              0.02
Carbon       3.5 x 10^-5                1.5 x 10^-4               0.23
Titanium     9.2 x 10^-9                4.0 x 10^-8               0.23
Molybdenum   8.2 x 10^-6                8.3 x 10^-5               0.10
Aluminum     2.5 x 10^-11               2.3 x 10^-9               0.01
Niobium      2.3 x 10^-7                1.1 x 10^-6               0.20
Nickel       2.0 x 10^-7                1.5 x 10^-5               0.01
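For reference, the MAE and the ratio reported in Table 2 follow the definitions sketched below. The toy activity values are hypothetical; only the iron MAEs in the last lines are taken from the table.

```python
def mean_absolute_error(actual, predicted):
    """Average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical activities and two sets of hypothetical predictions.
actual = [0.10, 0.20, 0.30, 0.40]
flexible_model = [0.11, 0.19, 0.30, 0.41]
linear_model = [0.20, 0.10, 0.45, 0.30]

mae_flexible = mean_absolute_error(actual, flexible_model)
mae_linear = mean_absolute_error(actual, linear_model)
print("ratio of MAEs:", round(mae_flexible / mae_linear, 2))

# Ratio for iron using the rounded MAEs listed in Table 2.
iron_ratio = round(2.2e-4 / 4.4e-3, 2)
print("iron ratio:", iron_ratio)
```

Ratios recomputed from the rounded table entries can differ in the last digit from the tabulated ratios, which were presumably computed from unrounded errors.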

Uncertainty estimates for the p-GP surrogate model should be reliable and within a specified tolerance to render confidence in applications of the proposed method. Thus, two uncertainty metrics were examined: (1) whether <UGP> was less than UCV (a measure of the GPs' ability to include epistemic uncertainty in <UGP>), and (2) whether the final surrogate ML model predicted UF in agreement with the distribution of the prediction errors as measured by the difference between the predictions and actual values (a measure of the final accuracy of UF). For the first metric, considering all the predicted points across all the elemental activities, <UGP> was less than UCV for 97 percent of the values. This result suggests that the GPs do a good job at including epistemic uncertainties in their estimated uncertainties. For the second metric, 2 x UF was compared to the prediction errors (i.e., the residuals, R). For accurate, normally distributed errors, the magnitude of R will be less than 2 x UF for about 95 percent of the points. Here, it was found that the magnitude of R was less than 2 x UF 95.0 +/- 0.8 percent of the time when averaged across all ten element activities and across all 15,000 validation points. This level of agreement was fortuitous. These results demonstrate that taking an envelope of 2 x UF as a 95-percent confidence interval works well. For points found to be further away from the prediction estimate (e.g., 3 x UF), the residuals were greater than would be expected for normally distributed uncertainties. However, the ML methods used here would allow for more complex uncertainty estimates than were used in the limited analysis presented here.
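The second metric amounts to a coverage check: the fraction of residuals whose magnitude falls inside a multiple of the reported uncertainty. The sketch below runs that check on synthetic, normally distributed residuals; the value of sigma and the random seed are arbitrary, and the sample count matches the 15,000 validation points only for scale.

```python
import random


def coverage(residuals, uncertainties, multiplier=2.0):
    """Fraction of points whose residual magnitude is within multiplier x uncertainty."""
    inside = sum(abs(r) < multiplier * u for r, u in zip(residuals, uncertainties))
    return inside / len(residuals)


# Synthetic residuals drawn from a normal distribution with a known sigma.
rng = random.Random(0)
sigma = 0.5
residuals = [rng.gauss(0.0, sigma) for _ in range(15_000)]
uncertainties = [sigma] * len(residuals)

two_sigma = coverage(residuals, uncertainties, multiplier=2.0)
three_sigma = coverage(residuals, uncertainties, multiplier=3.0)
print(f"within 2 sigma: {two_sigma:.3f}, within 3 sigma: {three_sigma:.3f}")
```

For well-calibrated normal uncertainties the two fractions should be close to 0.954 and 0.997; a shortfall at 3 sigma, as reported above, points to heavier-than-normal tails.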

This use case demonstrated that, for this problem, this type of surrogate ML model was suitable for accurate target and uncertainty predictions relative to OLS, even when using a reduced, one-dimensional representation of the uncertainties. The uncertainty estimates are thus suitable for sensitivity analysis and uncertainty propagation and can enable reasonable confidence interval predictions for applications where only sparse data is available.

5. Conclusions

The NRC staff has explored three use cases of AI/ML for engineering applications involving mechanical systems and component performance. For the first use case, the NRC staff developed an anomaly detector using an LSTM neural network within the structure of an autoencoder to monitor nuclear system performance. For the second use case, the NRC staff augmented Monte Carlo simulations of piping component integrity using random forest regression models to conduct sensitivity analysis and predict time series using surrogate models. For the third use case, the NRC staff augmented a sparse data set for MSR materials compatibility research by creating a p-GP surrogate model. The success of these efforts indicates that AI/ML may have a future role in augmenting nuclear regulatory activities through increased efficiency and effectiveness and enhanced decisionmaking.


Acknowledgements

The authors wish to thank John McKirgan for his support and encouragement of the NRC staff in exploring applications of innovative AI/ML methods for engineering applications.

Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC (NTESS), a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy's National Nuclear Security Administration (DOE/NNSA) under contract DE-NA0003525. This written work is authored by an employee of NTESS. The employee, not NTESS, owns the right, title and interest in and to the written work and is responsible for its contents. Any subjective views or opinions that might be expressed in the written work do not necessarily represent the views of the U.S. Government. The publisher acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this written work or allow others to do so, for U.S. Government purposes. The DOE will provide public access to results of federally sponsored research in accordance with the DOE Public Access Plan.

References

Andersson, J-O, T. Helander, L. Höglund, and P. Shi, Thermo-Calc and DICTRA, Computational Tools for Materials Science. Calphad, 2002. pp. 273-312.

Carlson, J., M. Homiack, and R. Iyengar, Autonomous Researcher Feasibility Studies, TLR-RES/DE/REB-2022-13, 2022. Washington, DC: U.S. NRC.

Delpech, S., C. Cabet, C. Slim, and G. S. Picard, Molten Fluorides for Nuclear Applications. materialstoday, 2010. pp. 34-41.

Hochreiter, S. and J. Schmidhuber, Long Short-Term Memory. Neural Computation, 1997. pp. 1735-1780.

Homiack, M., G. Facco, M. Benson, M. Erickson, and C. Harrington, Extremely Low Probability of Rupture Version 2 Probabilistic Fracture Mechanics Code, NUREG-2247, 2021. Washington, DC: U.S. NRC.

Hund, L., J. Lewis, N. Martin, M. Starr, D. Brooks, A. Zhang, R. Dingreville, A. Eckert, J. Mullins, P. Raynaud, D. Rudland, D. Dijamco, and S. Cumblidge, Technical Basis for the use of Probabilistic Fracture Mechanics in Regulatory Applications, NUREG/CR-7278, 2022. Washington, DC: U.S. NRC.

Matera, S., W. F. Schneider, A. Heyden, and A. Savara, Progress in Accurate Chemical Kinetic Modeling, Simulations, and Parameter Estimation for Heterogeneous Catalysis. ACS Catal., 2019. pp. 6624-6647.

Pillai, R., S. S. Raiman, and B. A. Pint, First Steps toward Predicting Corrosion Behavior of Structural Materials in Molten Salts. Journal of Nuclear Materials, 2021. p. 152755.

Pillai, R., D. Sulejmanovic, T. Lowe, S. S. Raiman, and B. A. Pint, Establishing a Design Strategy for Corrosion Resistant Structural Materials in Molten Salt Technologies. JOM, 2023. pp. 994-1005.

Pillai, R., W. G. Sloof, A. Chyrkin, L. Singheiser, and W. J. Quadakkers, A New Computational Approach for Modelling the Microstructural Evolution and Residual Lifetime Assessment of MCrAlY Coatings. Materials at High Temperatures, 2015. pp. 57-67.

Rasmussen, C. E. and C. K. I. Williams, Gaussian Processes for Machine Learning, Volume 1, 2006. Cambridge, Massachusetts: MIT Press.

Serp, J., M. Allibert, O. Beneš, S. Delpech, O. Feynberg, V. Ghetta, D. Heuer, D. Holcomb, V. Ignatiev, J. L. Kloosterman, L. Luzzi, E. Merle-Lucotte, J. Uhlíř, R. Yoshioka, and D. Zhimin, The Molten Salt Reactor (MSR) in Generation IV: Overview and Perspectives. Progress in Nuclear Energy, 2014. pp. 308-319.

Srivastava, N., E. Mansimov, and R. Salakhutdinov, Unsupervised Learning of Video Representations using LSTMs. in Proceedings of the 32nd International Conference on Machine Learning, 2015. Lille, France.

Wang, K. E., G. Pleiss, J. R. Gardner, S. Tyree, K. Q. Weinberger, and A. G. Wilson, Exact Gaussian Processes on a Million Data Points. in Proceedings of the 33rd Conference on Neural Information Processing Systems, NeurIPS, 2019. Vancouver, Canada.
