ML21230A364: xLPR Metamodeling

Issue date: 08/18/2021
From: NRC/RES/DE
To: Homiack M
Shared Package: ML21230A354
Download: ML21230A364 (10)

=Text=
Global Sensitivity Analysis of xLPR using Metamodeling (Machine Learning)
xLPR User Group Meeting, August 18, 2021

===Background===
* As part of applying xLPR to production analyses and to further validate the model, sensitivity analyses were conducted
    -  Sensitivity studies can be used to assess the impacts of uncertain parameters and analysis assumptions on the results
    -  Sensitivity analysis is a useful tool for identifying important uncertain model inputs that explain a large degree of the uncertainty in a quantity of interest
* Reasons to perform a sensitivity analysis:
    -  Identify inputs that warrant the greatest level of scrutiny, validation, and further sensitivity analysis
    -  Identify inputs that are key to the results
    -  Validate the model
    -  Improve understanding of model behavior
    -  Reduce model complexity (e.g., set unimportant inputs to constant values)
    -  Inform advanced Monte Carlo sampling strategies (e.g., importance sampling)
* Available techniques (see TLR-RES/DE/CIB-2021-11; ML21133A485):
    -  One-at-a-time
    -  Local partial derivatives (e.g., Adjoint Modeling)
    -  Variance-based (e.g., Sobol method; see the sketch after this list)
    -  Linear regression
    -  Metamodels
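To make the variance-based entry concrete, here is a minimal Sobol-index sketch on a toy response function. The SALib package and the toy model are illustrative assumptions and are not part of the xLPR work described in this presentation.

<syntaxhighlight lang="python">
# Illustrative Sobol sensitivity analysis on a toy model (not the xLPR code).
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

# Three hypothetical uncertain inputs on [0, 1].
problem = {
    "num_vars": 3,
    "names": ["x0", "x1", "x2"],
    "bounds": [[0.0, 1.0]] * 3,
}

X = saltelli.sample(problem, 1024)               # Saltelli sampling scheme
Y = X[:, 0] + 2.0 * X[:, 1] + X[:, 0] * X[:, 2]  # toy stand-in for the model

Si = sobol.analyze(problem, Y)
print("first-order indices:", dict(zip(problem["names"], Si["S1"].round(3))))
print("total-order indices:", dict(zip(problem["names"], Si["ST"].round(3))))
</syntaxhighlight>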
 
===Sensitivity Analysis using Metamodels===
* Why machine learning metamodeling?
    - Can handle correlated inputs
    - Accurately reflects non-monotonicity, non-linearity, and interactions
    - Importance measures reflect the whole input space
    - Several machine learning models automatically generate sensitivity metrics and down-select input variables based on information gained as part of the model fitting process
    - Fitted model can be used in place of the original model to compute quantitative sensitivity measures at lower computational cost
* Focus of this presentation: using built-in sensitivity metrics generated during fitting (a minimal sketch follows)
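As a minimal sketch of what "built-in sensitivity metrics generated during fitting" means in scikit-learn: tree ensembles expose impurity-based feature importances as a by-product of training. The synthetic data and input names below are assumptions standing in for xLPR samples.

<syntaxhighlight lang="python">
# Feature importances fall out of fitting a gradient-boosted tree classifier.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 5))                  # 5 synthetic uncertain inputs
y = X[:, 0] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=2000) > 1.0  # binary outcome

clf = GradientBoostingClassifier().fit(X, y)

# Importances are computed during fitting; no extra model evaluations needed.
names = [f"input_{i}" for i in range(5)]
for name, imp in sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
</syntaxhighlight>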
 
===Metamodeling Analysis Workflow===
* Run the probabilistic code and collect results
* Implement metamodeling code (sketched after this list)
    - Import results from probabilistic code runs
    - Transform results to prepare for input to metamodel fitting (e.g., accounting for spatially sampled variables)
    - Fit the metamodel, including parameter optimization using cross-validation
    - Extract and report input importance metrics
* Evaluate
    - Examine goodness of fit metrics
    - Compare importance ranking results from alternate metamodels
    - Compare importance ranking results across different outputs of interest
* Iterate
    -  Collect more inputs
    -  Analyze different outputs
    -  Run different discrete configurations of the probabilistic code
    -  Use different metamodels / different metamodel parameters
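The steps above might look like the following in scikit-learn, with synthetic arrays standing in for imported xLPR results; the input names, outcome rule, and parameter grid are all assumptions for illustration.

<syntaxhighlight lang="python">
# Sketch of the metamodeling workflow: import (here: synthesize) results,
# fit with cross-validated parameter optimization, then report importances.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
input_names = [f"input_{i}" for i in range(8)]        # hypothetical input names
X = rng.uniform(size=(3000, 8))                       # stands in for imported samples
y = 2 * X[:, 0] + X[:, 3] + 0.2 * rng.normal(size=3000) > 1.5  # binary outcome

# Fit the metamodel, optimizing parameters with cross-validation.
search = GridSearchCV(
    GradientBoostingClassifier(),
    param_grid={"n_estimators": [100, 300], "max_depth": [2, 3]},
    cv=5,
)
search.fit(X, y)
print("best CV accuracy:", round(search.best_score_, 3))

# Extract and report input importance metrics from the fitted metamodel.
best = search.best_estimator_
for i in np.argsort(best.feature_importances_)[::-1]:
    print(input_names[i], round(float(best.feature_importances_[i]), 3))
</syntaxhighlight>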
 
===Model Implementation===
* Python 3.6 using the scikit-learn package*
* Machine learning models implemented:
    - Gradient Boosting Decision Trees
    - Random Forest Decision Trees
    - Linear Support Vector Machines
* All models used are classifiers (as opposed to regressors) because the outcomes are binary (yes/no). Regressor models would be used for scalar outputs.
* All models include metrics for feature selection / feature importance (compared in the sketch below)
* Initial work focused on a subset of 60 inputs:
    - Inputs that are expected to have high importance
    - Distributed inputs
    - Constant inputs sampled uniformly from 0.8 to 1.2 times their constant value
* Outputs analyzed:
    - Occurrence leak
    - Occurrence rupture (with and without inservice inspection (ISI))
    *Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011
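A sketch comparing the three classifier families above and the importance metric each exposes after fitting. The data are synthetic, and the standardization step for the linear SVM is an added assumption, since its coefficients are only comparable across inputs on a common scale.

<syntaxhighlight lang="python">
# The three model families above each provide a per-input importance measure.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.uniform(size=(2000, 6))
y = X[:, 1] + X[:, 4] > 1.0                        # toy binary outcome

gbc = GradientBoostingClassifier().fit(X, y)
rfc = RandomForestClassifier().fit(X, y)
svc = LinearSVC(dual=False).fit(StandardScaler().fit_transform(X), y)

print("GBC:", gbc.feature_importances_.round(3))   # impurity-based importances
print("RFC:", rfc.feature_importances_.round(3))   # impurity-based importances
print("SVM:", np.abs(svc.coef_[0]).round(3))       # |coefficients| on scaled inputs
</syntaxhighlight>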
 
===Spatially Distributed Inputs / Outputs===
* Pipe section split into 19 subunits that can potentially crack
* Some inputs sampled on a subunit basis
* Some outputs also available on a subunit basis
* Aggregation methodology for subunit inputs / outputs (both options sketched below)
    - Pipe subunit inputs and outputs: Analyze each pipe subunit and crack direction separately and average feature importance metrics
    - Pipe subunit inputs and global outputs: Average each input across all pipe subunits (and crack types) and perform a single analysis to determine feature importance
    - This method may cause underreporting of importance metrics in comparison to alternative methods
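A sketch of the two aggregation options above, assuming subunit-sampled inputs arranged as a (realizations, subunits, inputs) array; the shapes and the toy outcome rule are assumptions, not the xLPR data layout.

<syntaxhighlight lang="python">
# Two ways to handle subunit-sampled inputs when computing feature importances.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
n_real, n_sub, n_in = 1000, 19, 4                    # 19 subunits per pipe section
X_sub = rng.uniform(size=(n_real, n_sub, n_in))      # inputs sampled per subunit
y_sub = X_sub[:, :, 0] > 0.9                         # per-subunit outcome (toy rule)
y_glob = y_sub.any(axis=1)                           # global outcome: any subunit

# Option 1: subunit inputs and subunit outputs -> fit each subunit separately,
# then average the feature importance metrics.
imps = [
    GradientBoostingClassifier().fit(X_sub[:, s, :], y_sub[:, s]).feature_importances_
    for s in range(n_sub)
]
print("averaged per-subunit importances:", np.mean(imps, axis=0).round(3))

# Option 2: subunit inputs and a global output -> average each input across
# subunits and perform a single analysis.
X_avg = X_sub.mean(axis=1)
imp_glob = GradientBoostingClassifier().fit(X_avg, y_glob).feature_importances_
print("global-output importances:", imp_glob.round(3))
</syntaxhighlight>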
 
===Results: Leak Output===
* Output: Leak (through-wall crack) in any pipe subunit
* Analyzed using Gradient Boosted Trees Classifier (GBC)
* Allows comparison between averaging subunit inputs and averaging subunit analysis outputs
* Top importance parameters for averaged subunit inputs:
    -  Primary water stress-corrosion cracking (PWSCC) initiation parameters
    -  PWSCC growth parameters
    -  Operating Temp./Pressure
    -  Pipe outside diameter / thickness
    -  Welding Residual Stresses (WRS) - Hoop
    -  Pipe yield strength
 
===Results: Rupture Output===
* Rupture is a full-model output (not on a subunit basis)
* Analyzed using all three machine learning classification algorithms
* Best prediction accuracy and cross-validation (CV) score obtained with the Gradient Boosted Trees Classifier
* General agreement between all three fitted models
* Top importance parameters consistent with leak parameters
    - PWSCC initiation
    - Axial WRS ranked above hoop WRS (the opposite of the leak results)
 
===Changes in Importance Rankings===
* Importance factor results may be compared between different scenarios/cases to show changes in the relative ordering of inputs
[Figure: importance rankings compared across cases - the most important inputs consistently drive the result, while scatter indicates low confidence in relative ranking (in the noise)]
* Useful for:
    - Comparison between alternate metamodeling approaches
    - Determining differences in sensitivity between different outputs of interest
    - Comparing runs with different model settings (e.g., different ISI intervals); a rank-comparison sketch follows
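One simple way to quantify "changes in the relative ordering" is a rank correlation between the importance vectors from two cases. The numbers below are made up for illustration, and Spearman correlation is one possible metric, not a method named in the presentation.

<syntaxhighlight lang="python">
# Compare importance rankings between two cases (e.g., with and without ISI).
import numpy as np
from scipy.stats import spearmanr

imp_case_a = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])  # hypothetical case A
imp_case_b = np.array([0.35, 0.30, 0.10, 0.12, 0.08, 0.05])  # hypothetical case B

rho, p = spearmanr(imp_case_a, imp_case_b)
print(f"rank correlation between cases: {rho:.2f}")

# Rank of each input in each case (0 = most important) and how far it moved.
rank_a = np.argsort(np.argsort(-imp_case_a))
rank_b = np.argsort(np.argsort(-imp_case_b))
print("rank shifts per input:", rank_a - rank_b)
</syntaxhighlight>

Inputs with near-zero importance will show large but meaningless rank shifts, which matches the slide's caution that scatter indicates low confidence in the relative ranking of low-impact inputs.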
 
===Conclusions===
* Key findings
    - Relative comparisons (e.g., axial vs. circumferential cracks, rupture with/without ISI) are very useful for sanity checking the model
    - Relatively high confidence in the identification of highest-impact inputs but low confidence in ordering of low-impact inputs
* General challenges
    - Input distributions need to be selected carefully to get informative results; a default real-world analysis input set is probably not sufficient
    - Special consideration needed for inputs that are not continuous variables (e.g., settings flags)
* xLPR-specific challenges
    - Prediction of simulation-wide outcomes using subunit-level sampled values
    - Consideration of all inputs would be time-intensive (labor to extract sampled values and simulation time to adequately cover full input space)
* Potential future improvements
    - Include more inputs in the machine learning model
    - Examine other outputs of interest (e.g., leak rate jump indicator)
    - Examine alternate configurations that can't be covered automatically using input distributions
    - Use more advanced methods to improve on the relative rank importance metric (e.g., variance decomposition)
