ML21201A375

NRC Data Science and AI Workshop 1 All_Presentations
ML21201A375
Person / Time
Issue date: 07/07/2021
From: Theresa Lalain
NRC/RES/DSA
To:
Matthew D
Shared Package
ML21201A227 List:
References
Download: ML21201A375 (216)


Text

Opening Remarks Theresa Lalain, Ph.D.

Deputy Director, Division of Systems Analysis Office of Nuclear Regulatory Research

WELCOME

  • Over 250 registered attendees
  • Participation from the U.S., Canada, Spain, France, UAE, and Japan

Regulatory Purpose

  • NRC recognizes a need to use data analytics for regulatory enhancements as part of its effort to become a modern, risk-informed regulator
  • The nuclear industry is investigating and using AI applications; therefore, the NRC must be prepared to understand and evaluate the technology

Data Science and Artificial Intelligence Overview

  • Artificial Intelligence (AI)

- Build intelligent machines

  • Machine Learning (ML)

- Learn from data and deliver predictive models

  • Natural Language Processing (NLP)

- Process and analyze large amounts of natural language data

  • Deep Learning (DL)

- ML methods based on artificial neural networks

Engagement and Initiatives

SHARING KNOWLEDGE AND SEEKING STAKEHOLDER INPUT. LEVERAGING RESEARCH ACTIVITIES.

NRC is engaging and participating with external entities to best prepare for AI impacts on regulatory processes and decision-making.

Upcoming

  • Current Topics Workshops
  • AUGUST 2021
  • Future Focused Initiatives
  • SEPT/OCT 2021

June 29, 2021. Ronald Laurids Boring, PhD, FHFES

Introduction to Artificial Intelligence (AI) and Some of Its Basic Terminology

Is this AI?

Decoder Ring

Is this AI?

Microsoft Clippy

Is this AI?

Google Maps

Is this AI?

Apple Watch Heartrate Monitor

Is this AI?

Microsoft Power BI Sample Dashboard

Is this AI?

NuScale Power Control Room Simulator

They All Feature Applications of AI. Let's Look at Some of the History and Technology Underlying AI.

It all began in 1956

1956 Was a Watershed Year

  • Two Congressional Hearings on Automation
  • Dartmouth Summer Workshop on Artificial Intelligence: "We propose that a 2-month, 10-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it."

Birth of AI, featuring founders like Marvin Minsky, John McCarthy, Claude Shannon, Allen Newell, and Herb Simon

  • Symposium on Information Theory at MIT on September 11, 1956: birthplace of information-processing theory and the study of cognition. Featured George Miller, Noam Chomsky, Allen Newell, Herb Simon, and others.
  • The births of AI and cognitive psychology occurred at the same time because they were interested in the same problems: deconstructing human thinking into information allowed us to make computer models of it.

Big Picture in Information Processing: the Human-System Interface (HSI)

  • Computer output = human sensation and perception
  • Human action = computer input
  • It's a feedback loop. Each step also represents a form of intelligence that may be modelled artificially:
  • Perception: pattern recognition, computer vision, natural language processing
  • Knowledge: expert systems
  • Actions and behaviors: automated controllers

How Does AI Work?

Two Types of AI

  • Good Old-Fashioned AI (GOFAI)

- Symbolic logic systems to represent basic elements of human thought like language, numbers, or goals

- Production systems featuring if-then logic

  • General Problem Solver created by Newell and Simon in 1959
  • Cognitive modeling architectures: systems like Soar and ACT-R, with a heavy emphasis on how humans accomplish goals
  • Much of the focus is not to create learning but to capture human-like intelligence related to how humans carry out decisions and actions
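The if-then production-system idea can be sketched in a few lines of forward chaining; the rules and facts below are invented for illustration, not taken from any GOFAI system:

```python
# Minimal production system: if-then rules fire against a working memory
# of facts, asserting new facts until nothing more can be inferred.

RULES = [
    # (name, condition facts, facts to assert)  -- illustrative contents
    ("alarm-rule", {"temperature_high", "pressure_rising"}, {"alarm"}),
    ("response-rule", {"alarm"}, {"notify_operator"}),
]

def run(facts):
    """Forward-chain over RULES until no rule adds a new fact."""
    facts = set(facts)
    fired = True
    while fired:
        fired = False
        for name, conditions, actions in RULES:
            if conditions <= facts and not actions <= facts:
                facts |= actions
                fired = True
    return facts

print(run({"temperature_high", "pressure_rising"}))
```

Chaining is what makes this more than a lookup table: the first rule's conclusion satisfies the second rule's condition.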

Two Types of AI

  • Neural Networks

- Perceptron developed in 1958 as an approximation of a single-cell neuron

- By the 1960s, mathematical algorithms like backpropagation were developed to allow perceptrons to learn through training

  • Machine learning: multiple perceptrons chained together to create neural networks
  • More layers of neural networks chained together to create deep learning
  • Facilitated by greater availability of parallel computing (e.g., graphics processing units)
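A single perceptron and its training rule fit in a dozen lines; this toy learns the logical AND function (the data, learning rate, and epoch count are illustrative):

```python
# Rosenblatt-style perceptron: threshold unit trained with the
# classic update rule, w += lr * error * input.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(20):                 # a few training epochs suffice
    for x, target in data:
        err = target - predict(x)   # 0 if correct, +/-1 otherwise
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        b += lr * err

print([predict(x) for x, _ in data])  # -> [0, 0, 0, 1]
```

Chaining many such units into layers is exactly the step from perceptrons to neural networks described above.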

Two Types of AI

  • Different Uses

- GOFAI is good at following rules and making decisions

- Neural networks are good at pattern recognition when trained

  • Self-Driving Vehicle Example

- GOFAI handles the rules of the road: procedural knowledge, control automation

- Neural networks recognize the world: the eyes on the road, information automation

Very Briefly Noted: Some Key Applications of AI in the Nuclear Industry

Key Applications of AI in Nuclear Industry

Automation

  • Control automation: Using AI to control a system (or a plant, such as might be the case in a microreactor)
  • Information automation: Using AI to intelligently gather information that the operator needs

Detection

  • Detection of problems, such as early warning systems and condition monitoring

Prediction

  • Predictive (instead of preventative) maintenance systems

Human-System Interface

  • Smart notification systems like alarm filtering
  • Natural language processing for hands-free interactivity

Example: Possible Automation in Nuclear Power: Information Automation (top), Control Automation (middle), and Analog Control (bottom)

Computerized Operator Support System (INL)

Predictive Maintenance

  • Look for signs of performance degradation through sensor data

- Catch parts that are failing sooner than anticipated

- Leave perfectly good parts in operation

  • Convey the information to the human

(Image: Cassia Networks)
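A minimal sketch of the degradation-detection idea, assuming a made-up vibration channel and an arbitrary 10% drift tolerance (real systems use far richer models):

```python
# Toy predictive-maintenance check: flag a sensor channel when the mean
# of its recent readings drifts beyond a tolerance of the healthy baseline.

def drift_alert(readings, baseline, window=5, tol=0.10):
    """True if the mean of the last `window` readings deviates from
    `baseline` by more than the fractional tolerance `tol`."""
    recent = readings[-window:]
    mean = sum(recent) / len(recent)
    return abs(mean - baseline) / baseline > tol

healthy = [100, 101, 99, 100, 100, 101, 99]      # arbitrary units
degrading = [100, 101, 103, 107, 112, 118, 125]  # slow upward drift

print(drift_alert(healthy, baseline=100))    # stays in service
print(drift_alert(degrading, baseline=100))  # schedule maintenance early
```

The point of the slide is exactly this trade: act before the anticipated failure, but leave healthy parts alone.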

Example Smart Notification System: Computerized Operator Support System (INL)

Who Knows What the Future Will Bring, But AI Will Be Part of It!

ronald.boring@inl.gov

Introduction to Machine Learning

Dr. Mark Fuge, Univ. of Maryland, College Park, (301) 405-2558, fuge@umd.edu, ideal.umd.edu

What I hope you get from today:

1. What is Machine Learning?
2. When is it helpful?
3. When is it not helpful?
4. Where do you go from here?

Supervised Learning, Unsupervised Learning, Reinforcement Learning

Types of ML and Typical Engineering or Science Tasks

  • Supervised Learning: Reduced Order Models; Multi-Fidelity / Coarse-graining; Inverse Problems / Design
  • Unsupervised Learning: Forecasting / Prognostics; Generative Design; Anomaly Detection; System Identification
  • Reinforcement Learning: Optimal Control; Optimization

1. What is the problem that needs solving?
2. How can machine learning help?
3. How do we know it is working?
4. When does it break down?

[Diagram: Input -> Model -> Output, with Loss(Input, Output) used to train the Model]

[Example: molecular structure -> model -> properties]
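The Input -> Model -> Output -> Loss loop can be made concrete with a toy supervised problem; the data and the one-parameter model y = w*x are invented for illustration:

```python
# Supervised learning in miniature: fit y = w*x by gradient descent
# on the squared loss between model output and labeled targets.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by the "true" relationship y = 2x

w, lr = 0.0, 0.01
for _ in range(500):
    # Loss = sum (w*x - y)^2  =>  dLoss/dw = sum 2*(w*x - y)*x
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys))
    w -= lr * grad

print(round(w, 3))  # close to 2.0
```

Every supervised method in the table above is some elaboration of this loop: a parameterized model, a loss comparing outputs to targets, and an update that reduces the loss.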

Example: ARPA-E DIFFERENTIATE Program, Inverse Design of Aero & Heat Transfer Surfaces

  • Input: Mach number, Reynolds number, target lift coefficient (1, a scalar)
  • Model: CFD + supercomputer
  • Output: optimized airfoil coordinates (192 x 2) + angle of attack

Problem: Original airfoil representation (~100 coordinates) is too large to be useful.

The Manifold Hypothesis. Example: Learning Airfoil Manifolds

[Diagram sequence: the airfoil coordinates (Input) pass through two chained models, one compressing to a low-dimensional manifold coordinate and one reconstructing the coordinates (Output), with Loss(Input, Output) comparing reconstruction to input]
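As a linear stand-in for the learned compress-and-reconstruct models above, here is a sketch using PCA via numpy's SVD: synthetic 2-D points that secretly live on a line are encoded to one manifold coordinate and decoded back:

```python
# The manifold hypothesis in miniature: noisy 2-D data that is really 1-D
# compresses to one coordinate with almost no reconstruction loss.
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(-1, 1, size=(200, 1))
X = np.hstack([t, 3 * t]) + rng.normal(0, 0.01, size=(200, 2))  # near y = 3x

Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

z = Xc @ Vt[0]                                  # encode: 1-D coordinate
X_rec = np.outer(z, Vt[0]) + X.mean(axis=0)     # decode: back to 2-D

err = np.abs(X_rec - X).max()
print(err < 0.1)   # reconstruction nearly exact: the data was ~1-D all along
```

Airfoil manifold learning replaces this linear projection with learned nonlinear models, but the encode/decode/reconstruction-loss structure is the same.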

Reinforcement Learning vs. Other Models

  • Supervised/unsupervised: Input -> Model -> Output, with Loss(Input, Output)
  • Reinforcement learning: the current state feeds the model (aka. the policy), which acts to adjust the state; a performance/reward signal drives Loss(Reward)
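The state/reward loop can be sketched in the simplest RL setting, a two-armed bandit with an epsilon-greedy policy; the payout probabilities are invented:

```python
# Epsilon-greedy bandit: the agent sees only rewards (no labeled
# input/output pairs) and learns value estimates for each action.
import random

random.seed(0)
true_payout = [0.3, 0.8]      # hidden reward probabilities (illustrative)
value = [0.0, 0.0]            # agent's running value estimates
counts = [0, 0]

for step in range(2000):
    if random.random() < 0.1:                  # explore occasionally
        a = random.randrange(2)
    else:                                      # otherwise exploit the policy
        a = 0 if value[0] > value[1] else 1
    reward = 1.0 if random.random() < true_payout[a] else 0.0
    counts[a] += 1
    value[a] += (reward - value[a]) / counts[a]   # incremental mean update

print(value[1] > value[0])   # the agent learned which arm pays better
```

The contrast with the supervised loop is visible in the code: there is no Loss(Input, Output), only a reward signal that shapes the policy.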

Example: Optimizing Molecular Properties

[Diagram sequence contrasting two Input -> Output, Loss framings with the State, Loss(Reward) reinforcement-learning framing of the problem]

Where do you go from here?

Technical Challenges

  • How do we create, collect, and share benchmark datasets?
  • How do we best combine existing Engineering knowledge with ML techniques?
  • How do we perform Verification and Validation?
  • What are appropriate Standards for such models?
  • What are the key Figures of Merit we should be optimizing in such systems?

Socio-Economic Challenges

  • How do we estimate the economic Return on Investment for ML techniques or datasets?
  • How do we protect IP or Privacy in trained models?
  • What regulatory frameworks do we need for verification of safety-critical or other systems?
  • How should we train our workforce differently to leverage these techniques?

For more details see:

- JMD Editorial: ML in Engineering Design: http://ideal.umd.edu/papers/paper/ml-eng-design-jmd

- Summary of Data-Driven Design workshop: http://ideal.umd.edu/papers/paper/d3-implications

Where do you go from here?

What can you do?

  • Continue your education in these areas, or for those of your workforce.
  • Reach out to researchers and domain experts for new technical challenges we can resolve in these areas.
  • Provide guidance to policy and regulatory bodies on how these techniques might be managed.
  • Advocate for additional studies of impact in these areas.

Thank you Dr. Mark Fuge Univ. of Maryland, College Park (301) 405-2558 fuge@umd.edu ideal.umd.edu

Backup Slides

What are Generative Models doing?

\log p(x) = \log p(z) + \sum_{l=1}^{L} \log \left| \det \frac{\partial h^{(l-1)}}{\partial h^{(l)}} \right|

Example: Identifying Feasible Performance Regions

[Diagram: Input -> Model -> Output in {Yes, No}, with Loss(Input, Output); panels compare conventional adaptive sampling (Straddle) against Active Expansion Sampling]

Introduction to Deep Learning

Dr. Mark Fuge, Univ. of Maryland, College Park, (301) 405-2558, fuge@umd.edu, ideal.umd.edu

Machine Learning: Input -> Model -> Output, with Loss(Input, Output)

Deep Learning: Machine Learning with Adaptive Basis Functions

Typical unreadable Deep Learning Slide from my research group for a recent DoE Technical Review

Let's build a Deep Learning model to predict airfoil lift.

[Diagram: Input -> Model -> Output, with Loss(Input, Output)]

What should the input be? Lift = Model(Input)

(This will be our basis function \phi(x))

How do you mathematically represent an airfoil?

{x_1, y_1} ... {x_100, y_100}: the airfoil as 100 surface coordinate pairs.

Lift = w^T x = w_1 x_1 + w_2 y_1 + ... + w_199 x_100 + w_200 y_100

Loss = (Lift_predicted - Lift_actual)^2 = (w^T x - Lift_actual)^2

Find w where dLoss/dw = 0
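Setting dLoss/dw = 0 for squared loss gives the normal equations; here is a sketch with 3 invented coordinate features standing in for the 200 airfoil coordinates:

```python
# Closed-form least squares: for Loss = ||Xw - y||^2, setting the
# gradient to zero yields the normal equations (X^T X) w = X^T y.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))            # 50 "airfoils", 3 coordinates each
w_true = np.array([0.5, -1.2, 2.0])     # invented ground-truth weights
lift = X @ w_true                        # noise-free synthetic lift values

w = np.linalg.solve(X.T @ X, X.T @ lift)
print(np.allclose(w, w_true))  # exact recovery with noise-free data
```

With noisy data the recovery is approximate rather than exact, but the derivation is the same one the slide points at.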

How do you mathematically represent an airfoil?

Alternatively, use a spline basis: control points {cx_1, cy_1} ... {cx_4, cy_4} and knot parameters {t_1} ... {t_4}.

Lift = w^T c = w_1 cx_1 + w_2 cy_1 + ... + w_11 t_3 + w_12 t_4

The only thing we changed was the basis. But the basis was fixed/static. What if we adapted or learned the basis?

How do you adapt a basis?

With the raw coordinates, the model was:

Lift = w^T x = w_1 x_1 + w_2 y_1 + ... + w_199 x_100 + w_200 y_100

Loss = (Lift_predicted - Lift_actual)^2 = (w^T x - Lift_actual)^2

Now wrap each input in a function g:

Lift = w^T g(x) = w_1 g(x_1) + w_2 g(y_1) + ... + w_199 g(x_100) + w_200 g(y_100)

Loss = (Lift_predicted - Lift_actual)^2 = (w^T g(x) - Lift_actual)^2

How do you adapt a basis? What is g?

Another model!

Lift = w_1^T g(x), where g is itself a learned transformation, g(x) = w_2^T x

Loss = (Lift_predicted - Lift_actual)^2
     = (w^T g(x) - Lift_actual)^2
     = (w_1^T {w_2^T (x)} - Lift_actual)^2

Now we can see that the Deep Learning model is (in essence) a series of chained basis transformations: g(g(...g(x)))!
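The chaining can be shown directly: each layer re-represents its input, and the read-out is ordinary linear regression in the final learned basis. The weights below are random and tanh stands in for whatever nonlinearity a real network would use; this only illustrates the structure, not a trained model:

```python
# A deep model as chained basis transformations: h <- tanh(W_l h)
# layer by layer, followed by a linear read-out w^T g(...).
import numpy as np

rng = np.random.default_rng(2)
layers = [rng.normal(size=(8, 2)), rng.normal(size=(8, 8))]  # two stacked layers
w_out = rng.normal(size=8)

def deep_basis(x):
    h = np.asarray(x, dtype=float)
    for W in layers:          # each pass transforms the representation
        h = np.tanh(W @ h)
    return h

def lift(x):
    return w_out @ deep_basis(x)   # final linear model in the learned basis

print(deep_basis([0.1, -0.4]).shape)   # the transformed representation
```

Training would adjust `layers` and `w_out` jointly by backpropagating the loss, which is exactly "learning the basis."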

Why use Deep Learning over other (non-Deep) approaches, or not?

Advantages

1. Fairly extensible with modern libraries
2. Plays nicely with other differentiable approaches
3. Good hardware acceleration
4. Active research community + industrial investment

Disadvantages

1. Certain modeling assumptions difficult to do
2. Certain architectures have difficulty converging or possess pathologies
3. Theory less developed than some other models

Opportunities and Directions

  • Merging of Engineering and Deep-Learning models
  • New invariances & constraints on Deep Learning models: Generalizing Convolution (Cohen and Welling, 2016, arxiv.org/abs/1602.07576; de Haan et al., 2020, arxiv.org/abs/2003.05425)
  • Combining Probabilistic and Deep Learning Models

Thank you Dr. Mark Fuge Univ. of Maryland, College Park (301) 405-2558 fuge@umd.edu ideal.umd.edu

Backup Slides

Now we can see that the Deep Learning model is (in essence) a series of chained basis transformations!

Minimize the Sinkhorn divergence:

[Objective equation not legible in the extracted text]

Loss: Conditional Formulation; Surrogate Log-Likelihood (SLL); Cost Function

What are Generative Models doing?

\log p(x) = \log p(z) + \sum_{l=1}^{L} \log \left| \det \frac{\partial h^{(l-1)}}{\partial h^{(l)}} \right|

INTRODUCTION TO NATURAL LANGUAGE PROCESSING: THEORY AND APPLICATION FOR ENGINEERING

Thurston Sexton, Knowledge Extraction and Application Project, Systems Integration Division, Engineering Laboratory

DISCLAIMER The use of any products described in any presentation does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that products are necessarily the best available for the purpose.

This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States. Approved for public release; distribution is limited.

BACKGROUND: PROJECT/PROGRAM OVERVIEW

Knowledge Extraction and Application

  • Much of manufacturing know-how is computationally inaccessible, within informally-written documents
  • Create human-centric data pipelines to extract value from existing unstructured data at minimal labor cost
  • Develop guidelines for using semi-structured data in KPI creation, functional taxonomy prediction, and customized worker training paths

BACKGROUND: MAINTENANCE WORK-ORDER DATA

Sample raw entries and resolutions (spellings as written):

  • HP coolant pressure at 75 psi - Bad Gauge / Low pressure lines cleaned ou
  • Hyd leak at saw atachment - Replaced seal in saw attachment but still leaking - Reapirs pending with ML
  • Major hydraulic leak at Sp#6 horseshoe - Repaired horseshoe seals
  • Clamping spool guard broken - Replaced - Operator could have done this!

BACKGROUND: CURRENT MWO DATA ENTRY: SPREADSHEETS AND WORK ORDER FORMS

Date | Mach | Description | Issued By | Date Up | Maint Tech Assigned | Resolution
29-Jan-16 | H15 | Slug detector at station 14 not working. Would not recognize Start signal. | JS | 29-Nov-16 | SA | St#14 tool detect INOP
1-Jun-16 | Mitsu FT | Brakes worn - Not stopping when in gear | AB | 28-Jun-16 | Steve A | Repaired
1-Jun-16 | H8 | St#7 rotator collet broken - wait for Bob B to show him how to remove | JS | 8-Jun-16 | John Smith | Machine went offline on 6/8 - Mark removed and instructed Bob B on removal/install process

Do AI to it! (...?)

Natural Language Processing (et al.) as Engineering Tools

TODAY'S TALK: TAKE-HOME

1. NLP Theory Basics

a. Data models and engineering assumptions
b. NLP Tasks and approaches
c. Metrics and Evaluation

2. Contextualize NLP techniques and paradigms

a. How NLP concepts interface with Engineering Practice
b. Continuous interaction between experts (domain <-> NLP)

TODAY'S TALK: STRUCTURE (Engineering Practice)

  • Goal & Approach: State the methods followed and why.
  • Assumptions: State your assumptions.
  • Measure & Evaluate: Apply adequate factors of safety.
  • Validate: Always get a second opinion.

Hutcheson, M. L. (2003). Software Testing Fundamentals: Methods and Metrics. John Wiley & Sons.

ASSUMPTIONS That turn Natural Language into something to Process

ASSUMPTIONS: RULE-BASED VS. NUMERICAL

Some very successful ways to process natural language involve rules. Assume a language model based on known logic:

  • Pattern matching (e.g. regex), coding, etc.
  • Clear definitions and transparent assumptions (iterate!)
  • Can be powerful and efficient
  • Can be brittle and labor-intensive

Newer techniques instead assume only the text and its statistical properties.
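A small example of the rule-based approach; the work-order line and the pattern are invented for illustration:

```python
# Rule-based extraction with a regular expression: named groups pull the
# reported pressure and the station number out of a raw work-order line.
import re

line = "HP coolant pressure at 75 psi, leak at Sp#6 horseshoe"
pattern = re.compile(r"(?P<pressure>\d+)\s*psi|Sp#(?P<station>\d+)")

found = {k: v for m in pattern.finditer(line)
         for k, v in m.groupdict().items() if v}
print(found)  # {'pressure': '75', 'station': '6'}
```

The transparency is the appeal: every match is explainable by the pattern. The brittleness is equally visible: "75psi.", "sp #6", or "Sp-6" would need new rules.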

ASSUMPTIONS: THE CONTEXT SPECTRUM

How do we turn text into numbers? Traditional techniques come in two flavors:

a. Bag-of-Words (global frequency and context)
b. Markov Model (local sequence probability)

These are opposite answers to the question: how much does global vs. local context matter to you and/or this text?

[Spectrum: Bag of Words (more global) ... Markov Model (more local)]

ASSUMPTION: GLOBAL FREQUENCY & CONTEXT

Basic Bag-of-Words: words in similar contexts are similar.

- Hydraulic leak at saw attachment

- Worn seal caused leak, replaced seal.

- Replaced saw, operator could have done this

        Hyd.  leak  saw  seal  rep.  ...
Doc 1    1     1     1     0     0   ...
Doc 2    0     1     0     2     1   ...
Doc 3    0     0     1     0     0   ...

Remarkably powerful: similarity is vector-directional (between documents or terms): cosine similarity.
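The count table and cosine similarity above can be built from scratch in a few lines; the documents are adapted from the slide's examples:

```python
# Bag-of-words vectors plus cosine similarity: documents sharing terms
# point in similar directions, regardless of document length.
import math

docs = ["hydraulic leak at saw attachment",
        "worn seal caused leak replaced seal",
        "replaced saw operator could have done this"]

vocab = sorted({w for d in docs for w in d.split()})
vecs = [[d.split().count(w) for w in vocab] for d in docs]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

print(round(cosine(vecs[0], vecs[1]), 3))  # > 0: both mention "leak"
print(round(cosine(vecs[0], vecs[2]), 3))  # > 0: both mention "saw"
```

Note that word order is completely discarded; that is exactly the "global, not local" assumption the spectrum slide describes.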

ASSUMPTION: GLOBAL FREQUENCY & CONTEXT: MODIFICATIONS

Re-weighting schemes

  • Normalization, TF-IDF
  • Ties to informational entropy

Dimension Reduction & Topics

  • Some latent set of topics: the stuff we talk about has less variety than the words we have
  • Acronym soup: PCA, SVD, LSA, NMF, LDA, TSNE, UMAP

[Diagram: individual words grouped under latent Topic-1, Topic-2, Topic-3, ...]

ASSUMPTION: LOCAL SEQUENCE PROBABILITY

Markov Model: the next state (read: token/character) is conditionally dependent on the past.

  After "quick brown": fox (0.91), dog (0.08), cat (0.01)

  • Useful to generate text and estimate conditional probabilities
  • High preference for observed sequences (precision)

Hidden Markov Model: what we observe are emissions from a sequence of states we cannot observe.

  "dear jane youve won" -> hidden state: Normal Email (?) or Spam (?)

  • Used for last-generation language models, bio-informatics, etc.
  • Modular! See: GMMs, Bayes-nets...

ASSUMPTIONS: MODERN EMBEDDINGS

But neural nets?! We like the global context, but also want local sensitivity...

Neural nets can be trained to find a vector space model that balances both.

a. "Trained" is the operative term.
b. Packages/tools that let us embed text have already been trained on some textual corpus. You are assuming your text is like that text. Otherwise these are an approach, and require proper design!

(Spectrum: Bag-of-Words Model = more global ... Word2vec ... BERT ... Markov Model = more local)

ASSUMPTIONS: MORE ON MODERN EMBEDDINGS

Word2Vec (2013) trains at the word level:
- Continuous Bag-of-Words (CBOW): predict the target word from local context
- Skip-Gram: predict the local context from the target word
- Maintains semantic linearity ("word algebra"); also see GloVe (2014):
    lunch + night - day ≈ dinner
    better - good + bad ≈ worse
    wine + barley - grapes ≈ beer
    coffee - drink + snack ≈ pastry

BERT (2018) is a sub-word model... context (sentence) dependent!
- Can capture separate semantic meanings (homographs) and out-of-vocabulary words
- State of the art in 2019; used for your Google searches
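The "word algebra" property can be demonstrated with hand-made toy vectors. Real word2vec embeddings are learned and typically 100-300 dimensional; these 2-D vectors are constructed so the analogy holds by design.

```python
import math

# Hand-crafted 2-D "embeddings": the second dimension encodes a comparative
# offset, so better - good + bad lands on worse. Purely illustrative.
emb = {
    "good":   [1.0, 1.0],
    "better": [1.0, 2.0],
    "bad":    [-1.0, 1.0],
    "worse":  [-1.0, 2.0],
    "okay":   [0.0, 1.0],   # distractor word
}

def cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c):
    """Return the word closest (by cosine) to b - a + c, excluding the inputs."""
    target = [bb - aa + cc for aa, bb, cc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cos(emb[w], target))

result = analogy("good", "better", "bad")  # better - good + bad
```

With trained embeddings (e.g. via a library such as gensim) the same arithmetic is done over learned vectors and a nearest-neighbor search over the full vocabulary.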

GOALS & APPROACH

NLP Tasks and The Pipeline

GOALS & APPROACHES: OVERVIEW

Typical NLP Tasks (and their image-processing relatives):

a. Document Grouping, Classification
b. Keyword Extraction, Multi-Label Classification
c. Named Entity Recognition and Parts-of-Speech

The NLP Pipeline:
a. Preprocessing
b. Analyses

GOAL: DOCUMENT TYPING

Clustering (Unsupervised)

Detect natural groupings for analysts to parse. Also: interpreting topic models. May or may not be relevant, but a useful tool.

Example: The Structure of Recent Philosophy. Noichl, M. "Modeling the structure of recent philosophy." Synthese 198, 5089-5100 (2021). https://doi.org/10.1007/s11229-019-02390-8. Image distributed as CC BY 4.0.

Each dot is a paper:

- Embed to 2 dimensions (UMAP)
- Cluster (HDBSCAN)
- Interpret, synthesize (hard)

Fully interactive online: https://homepage.univie.ac.at/maximilian.noichl/full/zoom_final/index.html

Example: MSEC: A Quantitative Retrospective. Sexton, T., Brundage, M. P., Dima, A., & Sharp, M. "MSEC: A Quantitative Retrospective." September 2020. https://doi.org/10.1115/MSEC2020-8440

Topic models as an approach to typing:

- Useful understanding
- LDA for static snapshots
- Dynamic LDA over time
- We had to name the topics.

GOAL: DOCUMENT TYPING

Clustering (Unsupervised)
- Detect natural groupings for analysts to parse
- Also: interpreting topic models
- May or may not be relevant, but a useful tool

Classification (Supervised)
- Labels required: 1 per category (mutually exclusive)
- Can be useful for recommendations: relevant vs. not
- Images: is this a stoplight? Which animal? etc.

GOAL: DOCUMENT KEYWORDS

Keyword Extraction (Unsupervised)
- Use statistical properties to find important terms
- Also see: text summarization
- TF-IDF (sum), TextRank (graph-based), YAKE, + more

Multi-Label Classification (Supervised)
- Labels required: multiple per document (a multiset)
- Several ways to train; can use domain knowledge
- Harder problem, but maybe easier to make training data
- Images: what animals are present?
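Summed TF-IDF, the first keyword-extraction method listed, can be sketched as follows. The corpus reuses the earlier maintenance snippets as a toy example; real pipelines add stopword removal and normalization.

```python
import math
from collections import Counter

docs = [
    "hydraulic leak at saw attachment".split(),
    "worn seal caused leak replaced seal".split(),
    "replaced saw operator could have done this".split(),
]
N = len(docs)
# Document frequency: in how many documents does each term appear?
df = Counter(w for d in docs for w in set(d))

def keywords(doc, k=2):
    """Top-k terms of one document scored by tf * idf."""
    tf = Counter(doc)
    scores = {w: tf[w] * math.log(N / df[w]) for w in tf}
    return sorted(scores, key=scores.get, reverse=True)[:k]

top = keywords(docs[1])  # "seal": frequent in this doc, absent elsewhere
```

Terms common to many documents (like "leak") are down-weighted by the idf factor, so document-specific vocabulary surfaces first.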

GOAL: ENTITY RECOGNITION

Named Entity Recognition
- Find text spans that contain keywords, and annotate them
- Predetermined vocabulary/taxonomy (usually 2 levels)
- E.g. "I went to New York [LOC]" or "They owe me $25 [CURR]"
- Images: highlight and label the animals

Parts-of-Speech
- Automatic determination of grammar information
- SVO triples, dependency parsing, etc.
- Can be used to mine knowledge graphs
- Domain/language-dependent: hard with technical text!

GOALS: OTHERS WORTH MENTIONING

Wide variety of other tasks:
- Sentiment Analysis
- Seq2Seq & Machine Translation
- Reading complexity, writing quality, inclusivity
- Question Answering
- Text Synthesis

What does it take to get to this point?

PROCESS: THE PIPELINE

In theory, the NLP Pipeline is a sequential progression that provides usable insight: Preprocessing -> Analyses -> Actionable Results.

It is impossible to outline every variation on this theme. Here is:
- A common sequence: a day in the life of your analyst
- Benefits and drawbacks of each step

PROCESS: TEXT PREPROCESSING

"Technical language processing: Unlocking maintenance knowledge." Brundage, M. P., Sexton, T., Hodkiewicz, M., Dima, A., & Lukens, S. (2021). Manufacturing Letters, 27, 42-46. Image adapted from original.

PROCESS: TEXT ANALYSES

"Technical language processing: Unlocking maintenance knowledge." Brundage, M. P., Sexton, T., Hodkiewicz, M., Dima, A., & Lukens, S. (2021). Manufacturing Letters, 27, 42-46. Image adapted from original.

MEASURE & EVALUATE

Importance of metrics and knowing what gets evaluated

MEASURE & EVALUATE: OVERVIEW

A key skill of the analyst or engineer is knowing how to translate qualitative needs and constraints into quantitative metrics and evaluations:

- What do I want to measure?
- Do my assumptions conflict with the measurement?
- Do the metric's assumptions conflict with my goal/process?
- Will multiple metrics provide a broader insight? (yes)
- What constitutes progress toward, or success in, my goal?
- Have I encoded my (stakeholder) expectations (preferences) sufficiently?
- Do I have parameters to tune (continuously and/or iteratively)?
- Most important: have I transparently documented my decisions for iteration?

MEASURE

What do I need to measure? Have I done my homework?

Similarity or Distance
- Discrete options, spellings: Levenshtein, Hamming, SymSpell, Jaccard
- Vector/geometry: Euclidean, Mahalanobis, Minkowski
- Distributions: Kullback-Leibler, Earth-mover/Wasserstein, Cross-Entropy

Quality
- Annotation coverage, label/class imbalance (rare event?)
- Usefulness: topic perplexity, (B/A) Information Criterion
- Inter-rater agreement: Fleiss' kappa, Kendall's W, graph-based?

Importance
- Information content: Shannon Entropy, log-odds, lift, sum-TFIDF
- Centrality: degree, betweenness, spectral (e.g. TextRank)

EVALUATE: PRECISION & RECALL

NLP often involves multilabel or imbalanced classification, where accuracy is unfair or overly optimistic.

Precision
- Also Positive Predictive Value (PPV): TP / (TP + FP)
- Of things predicted X, how many are X?

Recall
- Also True Positive Rate or Sensitivity: TP / (TP + FN)
- Of the things that are X, how many were predicted X?

F-Score
- Harmonic mean of Precision & Recall; explicitly combines our preferences for the two
- Parameter β (usually 1): assigns β-times more importance to recall than to precision

Image by Walber, distributed under a CC BY-SA 4.0 license.
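The definitions above can be checked with a small sketch. Note how accuracy flatters the imbalanced toy example while the F-score does not; the labels are illustrative, not workshop data.

```python
def precision_recall_f(y_true, y_pred, beta=1.0):
    """Precision, recall, and F-beta for binary labels (1 = positive).
    beta > 1 weights recall more heavily; beta < 1 favors precision."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return precision, recall, f

# Imbalanced toy labels: only 3 of 10 examples are positive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 0, 1]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
p, r, f1 = precision_recall_f(y_true, y_pred)
```

Here accuracy is 0.8 even though the classifier misses a third of the positives; precision, recall, and F1 all sit at 2/3, which is the more honest summary.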

EVALUATE: THRESHOLDS AND TRADE-OFFS

(Series of figures: precision (P) vs. recall (R) curves showing how moving the decision threshold trades one off against the other.)

Sexton, T., and Fuge, M. (January 13, 2020). "Organizing Tagged Knowledge: Similarity Measures and Semantic Fluency in Structure Mining." ASME Journal of Mechanical Design, March 2020; 142(3): 031111. https://doi.org/10.1115/1.4045686

EVALUATE: SUMMARY

- Do your homework: if there's something you want to measure, a metric may already exist.
- Metrics evaluate: use fundamentals to design metrics that assess what matters.
- Metrics communicate: confusion is never the answer; strive for mutual understanding.
- Remember that NLP is working on data for humans, by humans.
- Be transparent and reproducible.

VALIDATION

The open problem of human-in-the-loop, domain-specific NLP

VALIDATION: PROBLEMS

So far we have glossed over some very common problems:

- Interpreting topic models can be fraught [1]
- Out-of-the-box tools are pre-trained on very different text
- There is not enough data to train custom models
- It is too hard to hand-annotate the data we have
- There is no existing standard annotation to apply, no ontology we agree on
- Events of interest are far too rare (unclear if over-sampling applies)

In most Engineering Design and Reliability tasks, we validate: sanity checks, second opinions, processes for oversight and collaboration.

[1] Chang, Jonathan, et al. "Reading tea leaves: How humans interpret topic models." Advances in Neural Information Processing Systems. Vol. 22. 2009.

VALIDATION: RE-ASSESSING THE PIPELINE

Reality is never as clean as The Pipeline:

"In practice, the line between input and output are not well defined. An analyst might use intermediary tasks and representations to enrich annotations and cascade into further tasks. A holistic approach to improving one component will inevitably improve the others; a stolid adherence to a given pipeline can prevent progress all-around.

[...]

By lowering barriers to entry for text analysis through the development of efficiency-boosting tools and a more human-centered annotation approach, engineers have a unique opportunity to simultaneously learn from other domains and improve on their processes. A new approach is needed to adapt NLP methods to industry use cases in a scalable and reproducible way." [1]

View NLP as a socio-technical system rather than as an algorithmic pipeline.

[1] Brundage, Michael P., et al. "Technical language processing: Unlocking maintenance knowledge." Manufacturing Letters 27 (2020): 42-46.

VALIDATION: TECHNICAL LANGUAGE PROCESSING

Enter Technical Language Processing (TLP):
- NLP techniques do not always adapt well to engineering text
- Current NLP solutions need to be adapted correctly for use in technical domains
- TLP is a methodology to tailor NLP solutions to engineering text and industry use cases in a scalable and reproducible way

"Adapting Natural Language Processing for Technical Text." Dima, Alden, et al. Applied AI Letters (2021): e33. Image adapted from original.

The paper covers:
- How the TLP approach to meaning and generalization differs from NLP
- How data quantity and quality can be addressed
- Potential risks of not adapting NLP

VALIDATION: GET INVOLVED

Plan for Distributed Collaboration in the TLP CoI:

I. GitHub Organization (just started): TLP-CoI
   A. Documentation: best practices for TLP, theory, etc.
   B. Networking: curated list for the state of the practice: awesome-tlp
   C. Collaboration: base or forks for open tool repositories
II. Events:
   A. Past workshop (slides)
   B. TLP-COI Slack workspace (QR code)
   C. Other options? Webinars? Let us know!

THANK YOU Thurston Sexton thurston.sexton@nist.gov

AI Enabling Technologies Grooper and Watson Content Analytics June 29, 2021

Overview What is Grooper?

Software that provides Thrilling Automation with Intelligent Document Processing*

Use Case: Extraction of data from operator licensing (OL) applications Forms:

  • NRC Form 396 (Certification of Medical Examination by Facility Licensee)

Interfaces:

  • Electronic Information Exchange (document ingestion)
  • Reactor Program System (authoritative OL data source)

Grooper NRC Features

Other features used:
- De-skew, brighten, etc.
- Optical Character Recognition (OCR)
- Parse and extract data; write to an XML schema
- Optical Mark Recognition (OMR): recognizes checkmarks and write-ins
- Fuzzy Logic: a dictionary of defined values that can be OCRed or extracted based on a confidence threshold

Grooper AI Features

Natural Language Processing and Machine Learning find paragraphs, sentences, or other language elements in documents based on contextual meaning.

Use Case: Document sensitivity

Method:
  • Manually review documents for sensitive keywords and identify true positives (in the Grooper client)
  • Start to train Grooper to contextually search around the area of each true positive
  • Repeat with several document samples until properly trained

Grooper AI Features, Cont'd*

Overview: What is Watson Content Analytics?

Software that extrapolates business information from large collections of documents and uses natural language processing to uncover meaningful business insights.

Use Case: RES - Identify Event Reports that included an outage of two or more units

NLP Method:
  • Define noun/verb combinations; NLP automatically derives variations of those combinations

References (indicated by an *)

BIS, Inc. (2020-2021). AI-Powered Data Integration. Retrieved from https://www.bisok.com/.

AI Enabling Technologies Grooper and Digitizing Success June 29, 2021

  • Digitize Key Docketed Information
  • Make licensing and design basis information readily available to staff to streamline the review process
  • Expand public access to materials
  • Comply with federal records management mandates (e.g., M-19-21)
  • Reduce storage cost
  • NUDOCS microform (1979-1999): 110K microfiche and 88K aperture cards comprising 2.3M documents or 43M images
  • 109,424 (100%) microfiche and 87,929 aperture cards digitized
  • 43,009,225 (100%) images of the 43M fiche/aperture frames scanned
  • Over 2,355,157 PDFs generated
  • AEC paper records (pre-1978): 1,095 boxes of paper records comprising 205K documents or 3.2M images
  • 191 boxes, 332,879 pages digitized (COVID-19)
  • 13,619 PDFs generated
  • Components
  • Mekel Mach 7
  • Grooper software

New Searchable Image Processed with Artificial Intelligence

AI Enabling Technologies Enterprise Data Warehouse June 29, 2021

Overview: What is the Enterprise Data Warehouse (EDW)?

  • Central repository of integrated data from NRC's authoritative systems
  • Purpose: provide timely, accurate data from authoritative data sources to be used for reporting and data analytics

Overview: Enterprise Data Warehouse Architecture

  • Interfacing systems: NRC's operational systems, the authoritative data sources
  • ETL Process: the Enterprise Data Warehouse extracts data from authoritative sources, transforms it in a staging area, then loads it into the EDW on a scheduled interval
  • EDW: database that stores the data to be used for reporting and data analytics

(Diagram: Operational Systems -> ETL Process -> Data Warehouse -> Analysis / Reporting / Visualization)
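The extract-transform-load cycle described above can be illustrated in miniature. The row fields, cleanup rules, and in-memory "warehouse" below are invented stand-ins, not the EDW schema.

```python
# Extract: rows as pulled from an operational source system.
source_rows = [
    {"docket": " 050-00123 ", "status": "active"},
    {"docket": "050-00456",   "status": "Closed"},
]

def transform(row):
    """Staging-area cleanup: trim whitespace, standardize case."""
    return {"docket": row["docket"].strip(), "status": row["status"].upper()}

warehouse = []  # stands in for the EDW table used by reporting tools

def load(rows):
    """Load transformed rows; in production this runs on a scheduled interval."""
    warehouse.extend(transform(r) for r in rows)

load(source_rows)
```

Keeping the transform as a pure function of each source row is what makes the scheduled reload repeatable and auditable.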

Overview: Benefits of the Data Warehouse

  • Improved reporting performance and efficiency
  • Improved data quality and consistency
  • Empowers users to gain data insights

Data Warehouse migration to the Azure Cloud:
  • Azure Analysis Services
  • Azure Cognitive Services

NRC AI Workshop Event Management Response Tool (EMRT) Project Relief Request Index Project Nick Mohr, Senior Technical Leader, EPRI Welding and Repair Technology Center (WRTC)

June 29, 2021 www.epri.com © 2021 Electric Power Research Institute, Inc. All rights reserved.

Event Management Response Tool (EMRT)

Nick Mohr, Senior Technical Leader, EPRI
Kriti Dhaubhadel, SparkCognition
Abubaker Sheikh, SparkCognition
Prateek Jindal, SparkCognition
Chris Taylor, SparkCognition
Bryan Corralejo, SparkCognition
Jaidev Amrite, SparkCognition

What is the Event Management Response Tool (EMRT)?

A single location that consolidates various data sources for searching and correlation:

- Uses machine learning to refine and improve future searches
- Ingests various file formats (Excel, PDF, PowerPoint, etc.) to make unstructured data structured
- Allows previews of relevant locations within a document to ensure downloading is valuable

(Inputs: structured data (e.g., Excel, Access) and unstructured data (Word, PDF, PowerPoint, etc.))

Purpose and Objectives

Goal: increase productivity by:
  • Reducing the time associated with finding the needed research products
  • Displaying the most relevant information based on a member search within research products
  • Reducing the time associated with finding Code and Regulatory information (e.g., regulatory submittals, content within Nuclear Regulatory Research, etc.)
  • Reducing the time associated with finding operating experience and lessons learned from other EPRI members related to an event

Value/Objective: Provide EPRI members the needed information to make informed decisions, in one location, in reduced time.

Event Management Response Tool (EMRT)

Inputs: EPRI Research, Code and Regulatory, Operating Experience, Member Input -> Event Management Response Tool

OBJECTIVE: Provide members needed info in one location to make informed decisions in reduced time.

EMRT: Natural Language Processing & Machine Learning

Flow: Access Full Data Library -> Input -> Display of Output -> Feedback Loop & Refinement of Output
- Typeahead search based on prior search terms
- Current input is a text string, but we would like to support other input methods in the future

EMRT Search Results: 3 Locations Display Content

1) Tabulated search results
2) Preview of the user-selected search result within the respective document
3) Preview of the user-selected search result with the ability to scroll to different pages within the respective document

EMRT Regulatory Information Example

4) Suggested relevant topics on the initial search screen and for specific searches
- Ability to scroll within a document to determine whether it is valuable to download; the Download button takes the user to the NRC site.

EMRT Regulatory Information: NRC ADAMS Document Library

- Regulatory information is necessary to make decisions
- NRC ADAMS contains a large number of publicly available documents (subset shown)
- Currently, users search ADAMS directly, but finding data can be difficult
- We can use NLP and machine learning if we ingest and extract the data from these documents
- This would help members search this information more effectively
- Use of the existing NRC Application Programming Interface (API) permits filtering by document type

EMRT Regulatory Information (focus: NRC ADAMS)

Metadata from the NRC API: use Document Types to focus on desired documents. Example selection (highlighted in the original): Reference Safety Analysis Report; Reference Safety Analysis Report, Amendment; Regulatory Analysis; Regulatory Guidance; Regulatory Guide; Regulatory Guide, Draft; Report of Proposed Activities in Non-Agreement States, NRC Form 241; Report, Administrative; Report, Miscellaneous; Report, Technical; Request for Access Authorization; Request for Additional Information (RAI); Request for OMB Review; Request for Procurement Action (RFPA), NRC Form 400; Request for Review of OMB Reporting Requirements; RES Office Letter; Research Information Letter (RIL); Resume; Reviewer Comments on Conference/Symposium/Workshop Paper; Route Approval Letter to Licensee; Routine Status Report (Recurring Weekly/Monthly); Rulemaking - Final Rule; Rulemaking - Proposed Rule. The selected types are ingested as PDF documents.

Project Overview (High Level)

  • Prototype developed with a small subset of information
  • 2020: Alpha version completed late 2020, incorporating a large data set plus EPRI member and personnel feedback and suggestions
  • 2021: Beta version currently in development, which will include a larger set of information (EPRI Nuclear research, EPRI OE (meeting materials, surveys, etc.), NRC ADAMS data)
  • 2022: Incorporate feedback from users and consider other sources of Operating Experience, etc.

Relief Request Index Project

Craig Harrington, Technical Executive, EPRI
Nick Mohr, Senior Technical Leader, EPRI
Jacqueline Espinoza, Beyond the Arc
Steven Ramirez, Beyond the Arc

2020-2021: Relief Request Index Proof of Concept

Research Question: Can we apply modern text mining and natural language processing techniques to curate a body of knowledge that would be helpful to plant engineers who are addressing welding repairs and material reliability situations?

NRC ADAMS is a large source of valuable information, but it can be difficult to find the desired information.

Value:
  • Reduce time spent finding the complete series of a request for alternatives (relief request)
  • The curated index assists users in understanding:
  • Where code cases have been used
  • Any potential conditions that should be addressed when a similar request is being submitted
  • Identify new trends

Background

EPRI decided to explore a proof of concept in 2021 using a subset of desired ASME Code Cases:

N-432, N-504, N-562, N-638, N-661, N-666, N-722, N-729, N-740, N-752, N-762, N-766, N-770, N-786, N-789, N-818, N-839, N-853

Index filters by these topics:
- ASME Code Case Number
- Systems / Assets
- Relief Requests for Inspection
- Relief Requests for Repair
- Plant Name
- Operator

Process Flow | Creating Code Relief Series

Filters are applied in sequence to isolate the most relevant records. Each count is the number of documents remaining after a filter; each percentage is the reduction that filter produced:

1. Query the ADAMS database for Relief Requests (in title or document type): 45,000 records
2. Identify the Relief Requests that include ASME Code Cases of interest: 3,368 (93% reduction)
3. Identify duplicate documents and remove them: 1,721 (49%)
4. Identify documents that include PWR plants (docket numbers) and isolate them: 1,013 (41%)
5. Isolate documents whose API document type equals "Relief Request": 269 (73%)
6. Remove Relief Requests that do not include the ASME Code Cases of interest: 118 (56%)
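A filter cascade like this can be mimicked on toy record dicts. The field names ("title", "doc_type", "reactor") and sample records below are hypothetical stand-ins, not the ADAMS metadata schema.

```python
records = [
    {"id": 1, "title": "Relief Request re Code Case N-740", "doc_type": "Relief Request", "reactor": "PWR"},
    {"id": 2, "title": "Relief Request re Code Case N-740", "doc_type": "Relief Request", "reactor": "PWR"},  # duplicate
    {"id": 3, "title": "Inspection Report",                 "doc_type": "Report",         "reactor": "PWR"},
    {"id": 4, "title": "Relief Request re Code Case N-432", "doc_type": "Relief Request", "reactor": "BWR"},
]
CODE_CASES = {"N-740", "N-432"}

# Steps 1-2: Relief Requests that mention a code case of interest.
pool = [r for r in records
        if ("Relief Request" in r["title"] or r["doc_type"] == "Relief Request")
        and any(cc in r["title"] for cc in CODE_CASES)]

# Step 3: remove duplicates (here, keyed on title).
seen, deduped = set(), []
for r in pool:
    if r["title"] not in seen:
        seen.add(r["title"])
        deduped.append(r)

# Step 4: keep PWR plants only.
pwr = [r for r in deduped if r["reactor"] == "PWR"]
```

Tracking the record count after each step, as the slide's bar chart does, is a useful sanity check that no single filter is discarding more than expected.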

Process Flow | Creating Code Relief Series

Extract More Records:
  • Convert PDF files to TXT
  • Tag these documents as Origin(*) records
  • Run an NLP algorithm to extract reference numbers, dates, and accession numbers
  • Query ADAMS for additional records based on each origin record
  (*) This designation means that these documents are the ones used to expand the search for related records.

Organize the Records:
  • Group records by the Origin document
  • Organize by topical dataset, beginning with the oldest date and proceeding to the most recent within each dataset
  • Assign each dataset a three-digit Series number

Refine the Topical Datasets:
  • Remove records within the Series that are not related to the Relief Request for an ASME Code Case
  • Remove duplicate series

Home Page

Blue buttons lead to different views of the curated data: Beginning Record, Additional Correspondence, Ending Record.

Datasets are organized by the ASME Code Case number that appears in the Relief Request. A visualization of each of the topics is provided on the home page.

View of datasets for Code Case N-740

In this view, Relief Request datasets are organized by the ASME Code Case number that appears in the beginning or ending record. The title headers (e.g., Series 105, Series 072) indicate the start of a unique series, each with a link to its abstract.

Each Series has an NLP-developed abstract.

Where are we going next?

An easier way to find a complete series of information. We can now look at the data in new ways:
- What does this data mean?
- Are we seeing initial trends (e.g., the start of degradation in certain components, the need for new Code changes, research, etc.)?

Next Steps:
- Mine a larger NRC ADAMS data set now that the process has been developed, and determine if there are any interesting trends
- Obtain broader member feedback from the proof of concept

Future: Potential to use the developed process on the structured NRC ADAMS dataset from the Event Management Response Tool (EMRT) project, other code cases, and requests for alternatives.

Questions?

Together...Shaping the Future of Energy

Power Industry Dictionary for Text-Mining and Natural Language Processing Application: Proof of Concept

Karen Kim-Stevens, kkim@epri.com
EPRI Principal Project Manager, Radiation Safety

U.S. NRC Data Science and Artificial Intelligence Regulatory Applications Workshops, Workshop #1, June 29, 2021

Today, NLP tools will parse words based on their more common usage.

Example: NRC.gov text: "On April 6, 2006, a drain cooler relief valve in the feedwater system lifted and remained open."

A generic NLP dictionary classifier indexes "drain" (noun), "cooler" (noun), "relief valve" (noun), and "system" (noun) separately, versus recognizing the domain terms "drain cooler," "relief valve," and "feedwater system."

Objective: Build a Nuclear Industry NLP Dictionary

Our use case for this proof of principle: Groundwater Contamination

Use case owner: Karen Kim-Stevens

Goal: Develop an NLP proof of principle that demonstrates the potential benefits of machine learning applied to this domain.

Tasks:
- Create a preliminary dictionary to be used for classification.
- Develop an NLP text analytic demo and generate preliminary insights.

Benefits: Natural language text analytics will help the industry enhance preparation and implementation of mitigating actions in the event of inadvertent leaks and spills of radioactive materials.

Scenario: The industry has thousands of text documents from operating experience, maintenance reports, work orders, regulatory filings, and more that reference groundwater contamination. Given the safety significance and the need to operate more efficiently, all nuclear plants would benefit from extracting and sharing key information from these documents to make quicker, informed decisions, reduce the number of inadvertent spills and leaks, and enhance the safety of and response time to a contamination situation.

Potential use cases to develop risk mitigation strategies

Identify Specific SSCs: identify which SSCs could be associated with a failure and release radioactive liquid into the environment
  • How have the sources of SSC leaks and spills changed over time?
  • Does the age of the plant impact the components?
  • Do certain components leak after a certain amount of time in service?

Work Practices: identify which work practice tasks could be associated with the jobs or systems that could cause the most release of radioactive liquid into the environment
  • Do work practices during planned vs. unplanned outages affect the prediction?
  • Do routine vs. non-routine tasks affect the prediction?
  • How have leaks from work practices changed over time?

Concentration of Radioactive Material: identify how the concentration of radioactive material varies by type of leak or spill
  • How much does the concentration vary by SSC or work practice?
  • Does the magnitude vary for SSCs at BWR vs. PWR plants?
  • Can this information be used to help plants identify the source of leaks or spills?

To ensure data integrity and quality results, we followed a structured data science approach. (Diagram: NRC.gov source documents -> report.)

These NLP techniques help to get preliminary results quickly:

1. Tokenization
   - What: algorithms that segment text into groupings, phrases, and punctuation, called tokens.
   - Why: tokens become the inputs for conducting text mining.

2. Part of Speech
   - What: statistical models that assign a part of speech to each token (noun, verb, adjective, adverb, etc.).
   - Why: these tags enable modeling to infer the relationships between words in phrases and sentences.
   - Example: "A relief valve in the feedwater system" -> DET ADJ NOUN ADP DET NOUN NOUN

3. Named Entity Recognition
   - What: statistical models that assign labels to tokens such as date, quantity, location, and more.
   - Why: entity recognition is helpful for information extraction and filtering.

4. Rules-based Matching
   - What: algorithms that find phrases, sequences of tokens, and entities.
   - Why: improves information extraction and text mining; this approach is one way to directly incorporate SME input.
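Steps 1 and 4 above (tokenization and rules-based matching) can be sketched without any NLP toolkit. The phrase list below is illustrative, not EPRI's actual dictionary.

```python
import re

COMPONENT_PHRASES = ["drain cooler", "relief valve", "feedwater system"]

def tokenize(text):
    """Step 1: segment text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_phrases(text, phrases):
    """Step 4: find dictionary phrases as exact token subsequences."""
    tokens = tokenize(text)
    hits = []
    for phrase in phrases:
        p = phrase.split()
        if any(tokens[i:i + len(p)] == p
               for i in range(len(tokens) - len(p) + 1)):
            hits.append(phrase)
    return hits

sentence = ("On April 6, 2006, a drain cooler relief valve in the "
            "feedwater system lifted and remained open")
hits = match_phrases(sentence, COMPONENT_PHRASES)
```

Matching over token sequences rather than raw strings is what lets a multi-word domain term like "drain cooler" be treated as one unit instead of two generic nouns.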

Architecture for the library of dictionaries

Four top-level dictionaries:
- Systems, Structures and Components (SSC): e.g., auxiliary, blowdown, radwaste, buildings, seals, concrete pits, drains, sumps, pumps, liquid radwaste tanks, tanks, pipelines, valves, power block, wells
- Work Practices: e.g., filling tanks, engineering support activity, temporary modification, maintenance, transfer of radioactive fluids, valve alignments, operational
- Corrective Actions: e.g., store radioactive material, restore design, repair, replace discharge hoses, seals program
- Plants: BWR, PWR

The dictionary map provides guidance on how topics are organized, overlap, and relate to each other. The design evolves based on programmatic exploration and feedback from our SME.

Each word has an additional level of word associations that serve as training topics for machine learning. For example, under SSC -> Radwaste, associated phrases include: rad waste, rad waste facility, rad waste floor drain, rad waste system, rad waste yard tanks, LLRW, low level rad waste, low level rad waste facility, non-rad waste system, NRW. Under SSC -> Tanks: condensate storage tank (CST), condensate head tank, hydrazine mix tank, mobile tank, sump drain tank, sump pump tank, water storage tank, tank storage area.

The lack of a consistent industry nomenclature is a key challenge in building NLP models. For example:

- groundwater: gw, ground water, ground-water, gnd water, g water, g-water
- picocuries: pCi, pCi/L, pCi / L, pCi/liter, pCi / liter, picocuries / liter, picocuries per liter
- pits: basins, moats, motes, ponds
- power block: auxiliary building, auxiliary system, rad waste building, radwaste building
- seismic gap: cracks, rattle space, seals, structural joints
- storm drains: drain systems, roof drains, storm systems, yard drains

These word variabilities were identified through NLP algorithms and augmented by our subject matter expert (SME). An SME needs to provide guidance as we use the NLP tools.
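Variant lists like these translate directly into a normalization map, the simplest form of an industry dictionary. The entries below are a small illustrative subset, not the full EPRI dictionary.

```python
CANONICAL = {
    # groundwater variants
    "gw": "groundwater", "ground water": "groundwater",
    "ground-water": "groundwater", "gnd water": "groundwater",
    # picocurie variants
    "pci/l": "picocuries per liter", "pci / l": "picocuries per liter",
    "pci/liter": "picocuries per liter",
    "picocuries / liter": "picocuries per liter",
    # structure variants
    "rad waste building": "radwaste building",
}

def normalize(term):
    """Map a free-text variant to its canonical dictionary entry;
    unknown terms pass through lowercased."""
    key = term.strip().lower()
    return CANONICAL.get(key, key)

canon = normalize("Ground-Water")
```

Applying such a map before counting or classifying collapses the variant spellings into one term, which is what makes cross-plant aggregation meaningful.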

Preliminary Visualizations

- Counts for systems, structures, and components associated with groundwater incident reports extracted from the corpus
- Tritium activity (logarithmic scale) of reported events for one type of reactor design

Elements of our work are transferable and can help others get started on a similar project.

Roles to make this type of project successful

Project Sponsor

- The project sponsor is the person or group who owns the project. They hold overall accountability for the project and are responsible for providing the resources, support, and guidance that enable success. This role ensures that the analysis is aligned with research and business goals.

Subject Matter Expert (SME)

- The SME plays a vital role in helping the data scientist understand the data and its nuances. This role evaluates the text analytic output, ensures it is producing relevant results, and helps describe the specific real-world problem that the machine learning project is trying to solve.

Data Scientist

- A data scientist collects, analyzes, and interprets large amounts of data. Their skills and expertise in highly advanced analytical tools enable them to understand the data and develop operational models, systems, and tools by applying experimental and iterative methods and techniques.

Data Analyst

- A data analyst examines the patterns, trends, and other insights extracted from the data. They are responsible for deriving meaningful, actionable insights from the data. They support the project by creating visualizations.


Key Takeaways

  • Open-source dictionaries do not understand electric power industry language
  • An industry-specific dictionary is needed to conduct text mining and apply NLP-based algorithms
  • The workflow template for dictionary construction is repeatable and can be applied to new topics
  • Developing an industry-specific dictionary will require investment; however, the nuclear industry will benefit from more efficient ways of digesting and applying industry data and knowledge

For More Information:

Please download Quick Insight - Power Industry Dictionary for Text-Mining and Natural Language Processing: Proof of Concept.

https://www.epri.com/research/products/000000003002019609

Together…Shaping the Future of Electricity

High Level Process for NLP Projects

High Level Process for NLP Projects

[Figure: an NRC.gov report as the source corpus]

High Level Process for NLP Projects

[Figures: parts-of-speech tagging and word frequency examples]

Move forward when you have enough understanding of the text to work with it as data.
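A first-pass look at the corpus can be as simple as a word-frequency count over tokenized text (parts-of-speech tagging at this step would typically use a library such as NLTK or spaCy). This is a generic sketch; the sample sentence and the stopword list are our own, not from the EPRI corpus.

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset({"the", "a", "of", "to", "in", "due"})):
    """Tokenize lowercased text and count content words."""
    tokens = re.findall(r"[a-z][a-z\-/]*", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

report = ("Monitoring wells installed due to historical onsite tritium "
          "contamination. The rad waste tank storage area drains to the sump.")
print(word_frequencies(report).most_common(3))
```

Inspecting the most common tokens is usually enough to decide whether the text is understood well enough to treat as data and move to the next step.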

High Level Process for NLP Projects

[Figure: pattern matching algorithms, e.g., "…determined contamination due to Unit 1 Spe…" and "…monitoring wells installed due to historical onsite tritium contamination due to past opera…"; key terms algorithms, e.g., liquid rad waste tank storage, liquid rad waste processing, temporary systems, effluent, aux storm drain systems, power block, rad storage area, cathodic protection system, source]

Simultaneously search key words and patterns in the corpus. Move forward when you have enough structure to begin text mining.
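The combined "key terms plus patterns" search can be sketched as a term list and a regular expression run over each document. The example terms come from the slide; the regex and the `scan` function are illustrative assumptions, not EPRI's implementation.

```python
import re

# Key terms taken from the slide; the cause-phrase regex is our assumption,
# inspired by the slide's "contamination due to ..." snippets.
KEY_TERMS = ["rad waste tank storage", "aux storm drain systems", "power block"]
CAUSE_PATTERN = re.compile(r"contamination due to ([^.]+)")

def scan(text):
    """Return key-term hits and 'contamination due to ...' cause phrases."""
    lower = text.lower()
    hits = [t for t in KEY_TERMS if t in lower]
    causes = CAUSE_PATTERN.findall(lower)
    return hits, causes

doc = ("Monitoring wells were installed due to historical onsite tritium "
       "contamination due to past operations near the power block.")
print(scan(doc))
```

Running both searches in one pass, as the slide suggests, means each document is read once while both the term index and the pattern index are populated.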

High Level Process for NLP Projects

[Figures: magnitude of leaks/spills; high-risk components at PWR plants]

Iterate. The analysis will determine if more extractions are needed.

INPO and Data Science

Paul Steiner
Manager, Data Management and Industry Trends

© Copyright 2021 Institute of Nuclear Power Operations

Data Science Application

  • Supports monitoring station and corporate performance between evaluations/peer reviews
  • Informs application of resources


Data Science Tools - Current

  • Neural Models Applying Artificial Intelligence


Neural Modeling

  • Hundreds of data points are collected monthly
  • Experience records are continuously reported
  • Thousands of indicators are developed combining the data points and experience records
  • Effects-based models limit subjectivity
  • Neural modeling identifies patterns within these indicators that correlate to overall and area assessments
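INPO's actual models are proprietary and not described on the slide, but the core idea that indicator patterns correlate to assessments can be illustrated with a toy single-layer logistic model. Everything in this sketch (the indicator names, the made-up data, and the training loop) is hypothetical.

```python
import math

def train(indicators, labels, epochs=500, lr=0.1):
    """Fit a single-layer logistic model by plain gradient descent."""
    w, b = [0.0] * len(indicators[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(indicators, labels):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            g = p - y  # gradient of the log-loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def flag(w, b, x):
    """True if the modeled probability of an adverse assessment exceeds 0.5."""
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b))) > 0.5

# Hypothetical indicators: [scram-rate trend, backlog trend, staffing index]
X = [[0.1, 0.2, 0.9], [0.9, 0.8, 0.1], [0.2, 0.1, 0.8], [0.8, 0.9, 0.2]]
y = [0, 1, 0, 1]  # 1 = area assessment flagged for attention
w, b = train(X, y)
```

A real effects-based model would combine thousands of indicators and use a deeper network, but the principle is the same: the model, not an analyst's judgment, maps indicator patterns to the assessment, which is how subjectivity is limited.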


Data Science - Future

  • Neural Forecasting
  • Scram Correlations
  • Equipment Failure Correlations
  • Predictive, Behavior-Informed Modeling


Questions?
