ML21230A364

Issue date: 08/18/2021
From: NRC/RES/DE
To: Homiack M
Shared Package: ML21230A354
Download: ML21230A364 (10)
Text
Global Sensitivity Analysis of xLPR using Metamodeling (Machine Learning)
xLPR User Group Meeting, August 18, 2021
Background
- As part of applying xLPR to production analyses and to further validate the model, sensitivity analyses were conducted
- Sensitivity studies can be used to assess the impacts of uncertain parameters and analysis assumptions on the results
- Sensitivity analysis is a useful tool for identifying important uncertain model inputs that explain a large degree of the uncertainty in a quantity of interest
- Reasons to perform a sensitivity analysis:
- Identify inputs that warrant the greatest level of scrutiny, validation, and further sensitivity analysis
- Identify inputs that are key to the results
- Model validation
- Improve understanding of model behavior
- Reduction of model complexity (e.g., set unimportant inputs to constant values)
- Inform advanced Monte Carlo sampling strategies (e.g., importance sampling)
- Available techniques (see TLR-RES/DE/CIB-2021-11; ML21133A485):
- One-at-a-time
- Local partial derivatives (e.g., Adjoint Modeling)
- Variance-based (e.g., Sobol method)
- Linear regression
- Metamodels
Sensitivity Analysis using Metamodels
- Why machine learning metamodeling?
- Can handle correlated inputs
- Accurately reflects non-monotonicity, non-linearity, and interactions
- Importance measures reflect the whole input space
- Several machine learning models automatically generate sensitivity metrics and down-select input variables based on information gained as part of the model fitting process
- Fitted model can be used in place of the original model to compute quantitative sensitivity measures at lower computational cost
- Focus of this presentation: using built-in sensitivity metrics generated during fitting (a minimal sketch follows)
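As an illustration of those built-in metrics, the following is a minimal sketch that fits a gradient boosting classifier on synthetic placeholder data (not actual xLPR results) and reads off the importance metric produced during fitting; the input names are hypothetical.

```python
# Minimal sketch of built-in importance metrics, using synthetic
# placeholder data (not xLPR results); input names are hypothetical.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 3))                   # sampled inputs
y = (X[:, 0] + 0.2 * X[:, 1] > 0.7).astype(int)   # binary outcome (e.g., leak yes/no)

model = GradientBoostingClassifier().fit(X, y)

# feature_importances_ is generated as a by-product of fitting;
# no additional model evaluations are needed.
for name, imp in zip(["input_a", "input_b", "input_c"], model.feature_importances_):
    print(f"{name}: {imp:.3f}")
```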
Metamodeling Analysis Workflow
- Run the probabilistic code and collect results
- Implement metamodeling code
- Import results from probabilistic code runs
- Transform results to prepare for input to metamodel fitting (e.g., accounting for spatially sampled variables)
- Fit the metamodel, including parameter optimization using cross-validation (see the sketch after this list)
- Extract and report input importance metrics
- Evaluate
- Examine goodness of fit metrics
- Compare importance ranking results from alternate metamodels
- Compare importance ranking results across different outputs of interest
- Iterate
- Collect more inputs
- Analyze different outputs
- Run different discrete configurations of the probabilistic code
- Use different metamodels / different metamodel parameters
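The fit-and-evaluate steps above can be summarized in a minimal sketch, assuming the probabilistic-code results have already been imported and flattened into arrays X (inputs) and y (binary outcome); the parameter grid is illustrative, not the grid used in this work.

```python
# Minimal sketch of fitting with cross-validated parameter optimization.
# X and y stand in for flattened probabilistic-code results; the grid
# values are illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(size=(2000, 5))
y = (X[:, 0] * X[:, 1] > 0.3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Parameter optimization using cross-validation
search = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 3]},
    cv=5,
)
search.fit(X_train, y_train)

# Goodness-of-fit metrics on held-out data
print("best CV score:", search.best_score_)
print("test accuracy:", search.best_estimator_.score(X_test, y_test))

# Extract and report input importance metrics
print("importances:", search.best_estimator_.feature_importances_)
```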
Model Implementation
- Python 3.6 using the scikit-learn package*
- Machine learning models implemented:
- Gradient Boosting Decision Trees
- Random Forest Decision Trees
- Linear Support Vector Machines
- All models used are classifiers (as opposed to regressors) because the outcomes are binary (yes/no). Regressor models would be used for scalar outputs.
- All models include metrics for feature selection / feature importance (illustrated in the sketch below)
- Initial work focused on subset of 60 inputs:
- Inputs that are expected to have high importance
- Distributed inputs
- Constant inputs uniformly distributed from 0.8 to 1.2 times the constant value
- Outputs analyzed:
- Occurrence leak
- Occurrence rupture (with and without inservice inspection (ISI))
* Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011
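A minimal sketch of the three model families and their built-in importance metrics, on placeholder data: the tree ensembles expose feature_importances_ directly, while using coefficient magnitudes for the linear SVM is an assumption, since the presentation does not state which SVM metric was used.

```python
# Minimal sketch of the three classifier families and their importance
# metrics, on placeholder data. Using |coefficients| for the linear SVM
# is an assumption; the presentation does not state its SVM metric.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X = rng.uniform(size=(1000, 4))
y = (X[:, 0] > 0.5).astype(int)

models = {
    "GradientBoosting": GradientBoostingClassifier().fit(X, y),
    "RandomForest": RandomForestClassifier().fit(X, y),
    "LinearSVM": LinearSVC(dual=False).fit(X, y),
}

for name, m in models.items():
    imp = getattr(m, "feature_importances_", None)
    if imp is None:
        imp = np.abs(m.coef_).ravel()    # linear SVM: coefficient magnitudes
    print(name, "ranking (most important first):", np.argsort(imp)[::-1])
```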
Spatially Distributed Inputs / Outputs
- Pipe section split into 19 subunits that can potentially crack
- Some inputs sampled on a subunit basis
- Some outputs also available on a subunit basis
- Aggregation methodology for subunit inputs / outputs
- Pipe subunit inputs and outputs: Analyze each pipe subunit and crack direction separately and average feature importance metrics
- Pipe subunit inputs and global outputs: average inputs across all pipe subunits (and crack types) and perform a single analysis to determine feature importance
- This method may cause underreporting of importance metrics in comparison to alternative methods (both approaches are sketched below)
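Both aggregation approaches can be sketched as follows; the array shapes (realizations x 19 subunits x inputs), the data, and the fit_importance() helper are hypothetical placeholders.

```python
# Minimal sketch of the two aggregation approaches; shapes, data, and
# the fit_importance() helper are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

n_real, n_sub, n_feat = 1000, 19, 6
rng = np.random.default_rng(3)
X_sub = rng.uniform(size=(n_real, n_sub, n_feat))  # subunit-sampled inputs
y_sub = X_sub[..., 0] > 0.9                        # per-subunit binary outcome
y_global = y_sub.any(axis=1)                       # e.g., leak in any subunit

def fit_importance(X, y):
    return GradientBoostingClassifier().fit(X, y).feature_importances_

# Subunit inputs and outputs: fit each subunit separately, average metrics
per_subunit = [fit_importance(X_sub[:, k, :], y_sub[:, k]) for k in range(n_sub)]
imp_averaged_analyses = np.mean(per_subunit, axis=0)

# Subunit inputs with a global output: average inputs first, fit once
imp_averaged_inputs = fit_importance(X_sub.mean(axis=1), y_global)

print(imp_averaged_analyses)
print(imp_averaged_inputs)
```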
Results: Leak Output
- Output: Leak (through-wall crack) in any pipe subunit
- Analyzed using Gradient Boosted Trees Classifier (GBC)
- Allows comparison between averaging subunit inputs and averaging subunit analysis outputs
- Top importance parameters for averaged subunit inputs:
- Primary water stress-corrosion cracking (PWSCC) initiation parameters
- PWSCC growth parameters
- Operating temperature/pressure
- Pipe outside diameter/thickness
- Welding residual stresses (WRS), hoop direction
- Pipe yield strength
Results: Rupture Output
- Rupture is a full-model output (not on a subunit basis)
- Analyzed using all three machine learning classification algorithms
- Best prediction accuracy and cross-validation (CV) score obtained using the Gradient Boosted Trees Classifier
- General agreement between all three fitted models
- Top importance parameters consistent with leak parameters
- PWSCC initiation
- Axial WRS ranked above hoop WRS (the opposite of the leak results)
Changes in Importance Rankings
- Importance factor results may be compared between different scenarios/cases to show changes in the relative ordering of inputs (a rank-comparison sketch follows this list)
[Figure: importance rank plot. The most important inputs consistently drive the result; scatter indicates low confidence in the relative ranking (in the noise).]
- Useful for:
- Comparison between alternate metamodeling approaches
- Determining differences in sensitivity between different outputs of interest
- Comparing runs with different model settings (e.g., different ISI intervals)
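One way to make such comparisons quantitative is a rank correlation between the importance vectors of two cases. This minimal sketch uses Spearman correlation on hypothetical values; the correlation summary is an assumption, not a statistic named in the presentation, which shows rank plots instead.

```python
# Minimal sketch: comparing importance rankings between two cases
# (e.g., with/without ISI). Values are hypothetical; using Spearman
# correlation as the agreement summary is an assumption.
import numpy as np
from scipy.stats import rankdata, spearmanr

imp_case_a = np.array([0.40, 0.25, 0.15, 0.12, 0.08])
imp_case_b = np.array([0.35, 0.30, 0.05, 0.20, 0.10])

# Rank 1 = most important input
print("ranks A:", rankdata(-imp_case_a))
print("ranks B:", rankdata(-imp_case_b))

rho, _ = spearmanr(imp_case_a, imp_case_b)
print("Spearman rho:", rho)  # near 1 = stable ordering; low = in the noise
```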
Conclusions
- Key findings
- Relative comparisons (e.g., axial vs. circumferential, rupture with/without ISI) are very useful for sanity-checking the model
- Relatively high confidence in the identification of highest-impact inputs but low confidence in ordering of low-impact inputs
- General challenges
- Input distributions need to be selected carefully to get informative results
- A default real-world analysis input set is probably not sufficient
- Special consideration needed for inputs that are not continuous variables (e.g., settings flags)
- xLPR-specific challenges
- Prediction of simulation-wide outcomes using subunit-level sampled values
- Consideration of all inputs would be time-intensive (labor to extract sampled values and simulation time to adequately cover full input space)
- Potential future improvements
- Include more inputs in the machine learning model
- Examine other outputs of interest (e.g., leak rate jump indicator)
- Examine alternate configurations that cannot be covered automatically using input distributions
- Use more advanced methods to improve on the relative rank importance metric (e.g., variance decomposition)