ML21230A364



=Text=
{{#Wiki_filter:Global Sensitivity Analysis of xLPR using Metamodeling (Machine Learning)
xLPR User Group Meeting, August 18, 2021
 
===Background===
* As part of applying xLPR to production analyses and to further validate the model, sensitivity analyses were conducted
    -  Sensitivity studies can be used to assess the impacts of uncertain parameters and analysis assumptions on the results
    -  Sensitivity analysis is a useful tool for identifying important uncertain model inputs that explain a large degree of the uncertainty in a quantity of interest
* Reasons to perform a sensitivity analysis:
    -  Identify inputs that warrant the greatest level of scrutiny, validation, and further sensitivity analysis
    -  Identify inputs that are key to the results
    -  Model validation
    -  Improve understanding of model behavior
    -  Reduction of model complexity (e.g., set unimportant inputs to constant values)
    -  Inform advanced Monte Carlo sampling strategies (e.g., importance sampling)
* Available techniques (see TLR-RES/DE/CIB-2021-11; ML21133A485):
    -  One-at-a-time
    -  Local partial derivatives (e.g., Adjoint Modeling)
    -  Variance-based (e.g., Sobol method)
    -  Linear regression
    -  Metamodels
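For contrast with the metamodel approach, a one-at-a-time (OAT) screen can be sketched in a few lines. This is a minimal illustration under stated assumptions, not xLPR tooling: `toy_model` is a hypothetical stand-in for the probabilistic code, and the 0.8 to 1.2 swing around nominal is an assumed range. Because each input moves alone while the others stay at nominal, OAT cannot detect interactions or account for correlated inputs.

```python
import numpy as np


def toy_model(x):
    """Hypothetical stand-in for an expensive probabilistic code (not xLPR)."""
    return 3.0 * x[0] + 0.5 * x[1] ** 2 + 0.01 * x[2]


def one_at_a_time(model, nominal, low, high):
    """Move each input from its low to its high value while holding all
    other inputs at nominal; return the absolute output swing per input."""
    nominal = np.asarray(nominal, dtype=float)
    swings = np.empty(nominal.size)
    for i in range(nominal.size):
        x_low, x_high = nominal.copy(), nominal.copy()
        x_low[i], x_high[i] = low[i], high[i]
        swings[i] = abs(model(x_high) - model(x_low))
    return swings


nominal = np.array([1.0, 1.0, 1.0])
# Assumed swing range: 0.8 to 1.2 times the nominal value of each input.
swings = one_at_a_time(toy_model, nominal, 0.8 * nominal, 1.2 * nominal)
ranking = np.argsort(swings)[::-1]  # input indices, largest swing first
print(swings, ranking)
```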
 
===Sensitivity Analysis using Metamodels===
* Why machine learning metamodeling?
    - Can handle correlated inputs
    - Accurately reflects non-monotonicity, non-linearity, and interactions
    - Importance measures reflect the whole input space
    - Several machine learning models automatically generate sensitivity metrics and down-select input variables based on information gained as part of the model fitting process
    - Fitted model can be used in place of the original model to compute quantitative sensitivity measures at lower computational cost
* Focus of this presentation: using built-in sensitivity metrics generated during fitting
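As a minimal sketch of what a built-in sensitivity metric generated during fitting can look like in scikit-learn, the snippet below fits a random forest classifier to synthetic binary outcomes and reads its impurity-based `feature_importances_` attribute. The data and the outcome rule are invented for illustration; they are not xLPR inputs or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_inputs = 2000, 5

# Synthetic stand-in for sampled inputs from probabilistic code runs.
X = rng.uniform(0.0, 1.0, size=(n_samples, n_inputs))
# Hypothetical binary outcome: driven mainly by input 0, weakly by input 1,
# and not at all by inputs 2-4.
y = (X[:, 0] + 0.2 * X[:, 1] > 0.7).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Impurity-based importances are computed during fitting and sum to 1.
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]
for idx in ranking:
    print(f"input {idx}: {importances[idx]:.3f}")
```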
 
===Metamodeling Analysis Workflow===
* Run the probabilistic code and collect results
* Implement metamodeling code
    - Import results from probabilistic code runs
    - Transform results to prepare for input to metamodel fitting (e.g., accounting for spatially sampled variables)
    - Fit the metamodel, including parameter optimization using cross-validation
    - Extract and report input importance metrics
* Evaluate
    - Examine goodness of fit metrics
    - Compare importance ranking results from alternate metamodels
    - Compare importance ranking results across different outputs of interest
* Iterate
    -  Collect more inputs
    -  Analyze different outputs
    -  Run different discrete configurations of the probabilistic code
    -  Use different metamodels / different metamodel parameters
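The fit, cross-validate, extract, and evaluate steps of the workflow might look roughly like the following sketch. It assumes the probabilistic code results have already been imported into an input matrix `X` and a binary outcome vector `y` (replaced here by synthetic data); the input names and the hyperparameter grid are hypothetical choices, not those used in the presentation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in for imported run results (hypothetical names and synthetic data).
rng = np.random.default_rng(1)
input_names = ["input_a", "input_b", "input_c", "input_d"]
X = rng.normal(size=(1500, len(input_names)))
y = (X[:, 2] - 0.5 * X[:, 0] + 0.3 * rng.normal(size=1500) > 1.0).astype(int)

# Hold out a test set for examining goodness of fit afterwards.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the metamodel, choosing hyperparameters by 5-fold cross-validation.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    cv=5,
)
search.fit(X_train, y_train)
best = search.best_estimator_

# Extract and report input importance metrics, most important first.
report = sorted(
    zip(input_names, best.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, value in report:
    print(f"{name}: {value:.3f}")

# Evaluate: accuracy on the held-out runs.
test_accuracy = best.score(X_test, y_test)
print("best params:", search.best_params_, "test accuracy:", round(test_accuracy, 3))
```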
 
===Model Implementation===
* Python 3.6 using the scikit-learn package*
* Machine learning models implemented:
    - Gradient Boosting Decision Trees
    - Random Forest Decision Trees
    - Linear Support Vector Machines
* All models used are classifiers (as opposed to regressors) because the outcomes are binary (yes/no). Regressor models would be used for scalar outputs.
* All models include metrics for feature selection / feature importance
* Initial work focused on a subset of 60 inputs:
    - Inputs that are expected to have high importance
    - Distributed inputs
    - Constant inputs, sampled uniformly from 0.8 to 1.2 times their constant value
* Outputs analyzed:
    - Leak occurrence
    - Rupture occurrence (with and without in-service inspection (ISI))
*Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011
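A sketch of the setup on this slide, under stated assumptions: one distributed input, one nominally constant input (hypothetical value 300.0) sampled uniformly over 0.8 to 1.2 times its value, and one input with no effect, plus a hypothetical binary outcome. The two tree ensembles expose `feature_importances_` directly. For the linear SVM, this sketch uses the normalized absolute coefficients of standardized inputs as the importance metric; that is an assumption about how such a metric could be defined, not necessarily what the presentation used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n = 2000

# Hypothetical inputs: a distributed input, a nominally constant input
# sampled uniformly over [0.8, 1.2] x its constant value, and a no-effect input.
distributed = rng.normal(loc=10.0, scale=2.0, size=n)
constant_value = 300.0
constant_sampled = rng.uniform(0.8 * constant_value, 1.2 * constant_value, size=n)
no_effect = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([distributed, constant_sampled, no_effect])

# Hypothetical binary outcome (e.g., leak yes/no) driven by the first two inputs.
y = ((distributed - 10.0) / 2.0
     + (constant_sampled - constant_value) / 60.0 > 1.0).astype(int)

models = {
    "GBC": GradientBoostingClassifier(random_state=0),
    "RFC": RandomForestClassifier(n_estimators=200, random_state=0),
    # Standardize inputs so the linear SVM coefficients are comparable.
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(C=1.0, dual=False)),
}

importance = {}
for name, model in models.items():
    model.fit(X, y)
    if name == "LinearSVC":
        # Assumed metric: normalized |coefficient| of each standardized input.
        coefs = np.abs(model[-1].coef_.ravel())
        importance[name] = coefs / coefs.sum()
    else:
        importance[name] = model.feature_importances_

for name, values in importance.items():
    print(name, np.round(values, 3))
```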
 
===Spatially Distributed Inputs / Outputs===
* Pipe section split into 19 subunits that can potentially crack
* Some inputs sampled on a subunit basis
* Some outputs also available on a subunit basis
* Aggregation methodology for subunit inputs / outputs
    - Pipe subunit inputs and outputs: Analyze each pipe subunit and crack direction separately and average feature importance metrics
    - Pipe subunit inputs and global outputs: Average each input across all pipe subunits (and crack types) and perform a single analysis to determine feature importance
    - This method may cause underreporting of importance metrics in comparison to alternative methods
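The two aggregation strategies can be contrasted with a small synthetic example. Assumptions: inputs are stored as an array of shape (runs, subunits, inputs); crack direction is omitted for brevity (it would add one more axis to loop over); and the outcome rule is hypothetical. Method 1 fits one model per subunit and averages the importance vectors; Method 2 averages each input across subunits and fits a single model against the global "any subunit" outcome.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)
n_runs, n_subunits, n_inputs = 600, 19, 3

# Hypothetical subunit-level sampled inputs, shape (runs, subunits, inputs).
X_sub = rng.uniform(size=(n_runs, n_subunits, n_inputs))
# Hypothetical subunit-level outcome driven by input 0 of that subunit.
noise = 0.1 * rng.normal(size=(n_runs, n_subunits))
y_sub = (X_sub[:, :, 0] + noise > 0.95).astype(int)
# Global outcome: the event occurs in any subunit.
y_global = y_sub.max(axis=1)


def fit_importance(X, y):
    """Fit a gradient boosting classifier and return its feature importances."""
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X, y)
    return model.feature_importances_


# Method 1: subunit inputs and subunit outputs. Fit one model per subunit,
# then average the importance vectors across subunits.
per_subunit = np.array(
    [fit_importance(X_sub[:, k, :], y_sub[:, k]) for k in range(n_subunits)]
)
importance_method1 = per_subunit.mean(axis=0)

# Method 2: subunit inputs and a global output. Average each input across
# subunits, then fit a single model.
X_avg = X_sub.mean(axis=1)
importance_method2 = fit_importance(X_avg, y_global)

print("method 1:", np.round(importance_method1, 3))
print("method 2:", np.round(importance_method2, 3))
```

Averaging 19 independent subunit values shrinks their spread and discards which subunit drove the event, which is one plausible way the second method can underreport importance.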
 
===Results: Leak Output===
* Output: Leak (through-wall crack) in any pipe subunit
* Analyzed using a Gradient Boosted Trees Classifier (GBC)
* Allows comparison between averaging subunit inputs and averaging subunit analysis outputs
* Top importance parameters for averaged subunit inputs:
    -  Primary water stress-corrosion cracking (PWSCC) initiation parameters
    -  PWSCC growth parameters
    -  Operating temperature / pressure
    -  Pipe outside diameter / thickness
    -  Welding Residual Stresses (WRS) - Hoop
    -  Pipe yield strength
 
===Results: Rupture Output===
* Rupture: full-model output (not on a subunit basis)
* Analyzed using all three machine learning classification algorithms
* Best prediction accuracy and cross-validation (CV) score obtained with the Gradient Boosted Trees Classifier
* General agreement between all three fitted models
* Top importance parameters consistent with leak parameters
    - PWSCC initiation
    - Axial WRS ranked above Hoop (opposite of leak)
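Comparing classifiers by prediction accuracy and cross-validation score can be sketched with scikit-learn's `cross_val_score`. The synthetic outcome below includes an interaction term (a hypothetical choice that echoes the point about non-linearity and interactions in the metamodel motivation), so the linear SVM is expected to score lower than the tree ensembles here; this says nothing about which model performs best on actual xLPR data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 4))
# Hypothetical rupture indicator with an interaction between inputs 0 and 1.
y = (X[:, 0] * X[:, 1] + 0.5 * X[:, 2] > 0.5).astype(int)

candidates = {
    "GBC": GradientBoostingClassifier(random_state=0),
    "RFC": RandomForestClassifier(n_estimators=200, random_state=0),
    "LinearSVC": make_pipeline(StandardScaler(), LinearSVC(C=1.0, dual=False)),
}

# Same shuffled, stratified folds for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)
print(cv_scores, "best:", best_name)
```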
 
===Changes in Importance Rankings===
* Importance factor results may be compared between different scenarios/cases to show changes in the relative ordering of inputs
* Useful for:
    - Comparison between alternate metamodeling approaches
    - Determining differences in sensitivity between different outputs of interest
    - Comparing runs with different model settings (e.g., different ISI intervals)
[Figure annotations: "Most important inputs consistently drive result"; "Scatter indicates low confidence in relative ranking (in the noise)"]
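Changes in relative ordering between two cases can be quantified by converting each importance vector to ranks, then reporting per-input rank shifts and a Spearman rank correlation. The two importance vectors below are made-up numbers for illustration, and the input labels are placeholders.

```python
import numpy as np
from scipy.stats import spearmanr

# Placeholder labels and made-up importance values for two cases
# (for example, runs with two different ISI intervals).
input_labels = ["x1", "x2", "x3", "x4", "x5", "x6"]
importance_case_a = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])
importance_case_b = np.array([0.38, 0.27, 0.10, 0.15, 0.04, 0.06])


def to_ranks(importances):
    """Return rank 1 for the most important input, 2 for the next, etc."""
    order = np.argsort(importances)[::-1]
    ranks = np.empty(len(importances), dtype=int)
    ranks[order] = np.arange(1, len(importances) + 1)
    return ranks


ranks_a = to_ranks(importance_case_a)
ranks_b = to_ranks(importance_case_b)
rank_shift = ranks_b - ranks_a  # positive: input moved down in case B
rho, _ = spearmanr(importance_case_a, importance_case_b)

for label, ra, rb in zip(input_labels, ranks_a, ranks_b):
    print(f"{label}: rank {ra} -> {rb}")
print(f"Spearman rank correlation: {rho:.3f}")
```

In this made-up example, x1 and x2 keep ranks 1 and 2 while x3 through x6 trade places pairwise, the kind of pattern the slide attributes to scatter among low-impact inputs.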
 
===Conclusions===
* Key findings
    - Relative comparisons (e.g., Axial vs. Circ, Rupture with/without ISI) are very useful for sanity checking the model
    - Relatively high confidence in the identification of highest-impact inputs but low confidence in ordering of low-impact inputs
* General challenges
    - Input distributions need to be selected carefully to get informative results
    - A default real-world analysis input set is probably not sufficient
    - Special consideration is needed for inputs that are not continuous variables (e.g., settings flags)
* xLPR-specific challenges
    - Prediction of simulation-wide outcomes using subunit-level sampled values
    - Consideration of all inputs would be time-intensive (labor to extract sampled values and simulation time to adequately cover full input space)
* Potential future improvements
    - Include more inputs in the machine learning model
    - Examine other outputs of interest (e.g., leak rate jump indicator)
    - Examine alternate configurations that cannot be covered automatically using input distributions
    - Use more advanced methods to improve on the relative rank importance metric (e.g., variance decomposition)}}


Xlpr Metamodeling
ML21230A364
Issue date: 08/18/2021
From: NRC/RES/DE
To: Homiack M
Shared Package: ML21230A354
List: References

