ML21277A098

=Text=
{{#Wiki_filter:Resource Prediction Using Natural Language Processing
Trey Hathaway
U.S. Nuclear Regulatory Commission, RES/DSA/AAB
August 18, 2021
NRC Data Science and Artificial Intelligence Regulatory Applications Workshops: Current Topics
 
Natural Language Processing
* Techniques that allow computers to understand the contents of natural language
  - Allows for the extraction of information and insights from documents
  - Collection of techniques:
    - Rule-based, statistical, or neural
 
Goals
* Apply Natural Language Processing techniques to NRC data and use cases
* Demonstrate successes

NRC Use Cases: Resource Prediction
* Challenge: Deviations between resource estimates to complete a licensing review and the actual hours charged
* Goal: Create a tool to assist project managers in formulating resource estimates
              - Leverage historical data
              - Find historically similar reviews
* Method: Use term frequency-inverse document frequency vectors to represent documents and perform similarity calculations
              - Rank documents based on similarity
* Term Frequency-Inverse Document Frequency (tf-idf)
              - Weighting factor for words
              - Product of term frequency and inverse document frequency
* Term Frequency (tf)
              - How frequently a word appears in a document
              - Importance of the word within that document
* Inverse Document Frequency (idf)
              - Based on how many documents in the collection contain the word
              - Words that appear in many documents receive lower weight
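The slides do not state which tf-idf variant the tool uses. One common textbook form, shown here as an assumption, multiplies the raw count of word t in document d by the logarithm of N / df(t), where N is the number of documents in the collection and df(t) is the number of documents that contain t:

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \ln \frac{N}{\mathrm{df}(t)}
```

For example, in a collection of N = 3 documents, a word that occurs twice in one document and in no other document has weight 2 ln 3 ≈ 2.197, while a word that appears in all three documents has weight ln(3/3) = 0 no matter how often it occurs.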
 
Term Frequency-Inverse Document Frequency (Vector Representation)
* Represent a document as a vector
  - The vector reflects the word usage in the document
  - The vector will have thousands of dimensions, one for each word in the vocabulary
 
Term Frequency-Inverse Document Frequency (Vector Space Corpus)
* Represent the collection of documents as vectors
                      - Create a vocabulary of all words used in the collection
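A minimal pure-Python sketch of building the vocabulary and the tf-idf matrix, assuming whitespace tokenization, raw-count term frequency, and the unsmoothed idf = ln(N / df) variant. The function names and example reviews are hypothetical. A production tool would more likely use a library such as scikit-learn's `TfidfVectorizer`, whose defaults (smoothed idf, L2-normalized rows) produce different numbers.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def build_vocabulary(docs: list[str]) -> list[str]:
    # Every distinct word used anywhere in the collection, in sorted order.
    return sorted({word for doc in docs for word in tokenize(doc)})


def idf_weights(docs: list[str], vocab: list[str]) -> dict[str, float]:
    # df(t): number of documents containing the word at least once.
    doc_freq = Counter(word for doc in docs for word in set(tokenize(doc)))
    n_docs = len(docs)
    return {word: math.log(n_docs / doc_freq[word]) for word in vocab}


def tfidf_matrix(
    docs: list[str], vocab: list[str], idf: dict[str, float]
) -> list[list[float]]:
    # One row per document, one column per vocabulary word.
    matrix = []
    for doc in docs:
        counts = Counter(tokenize(doc))  # raw term frequency
        matrix.append([counts[word] * idf[word] for word in vocab])
    return matrix


reviews = [
    "steam generator inspection",
    "steam generator tube repair",
    "emergency diesel generator test",
]
vocab = build_vocabulary(reviews)
idf = idf_weights(reviews, vocab)
matrix = tfidf_matrix(reviews, vocab, idf)
```

Here `vocab` holds 8 words, so each review becomes an 8-dimensional row; the `generator` column is all zeros because that word appears in every review.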
 
Term Frequency-Inverse Document Frequency (Similarity Calculations)
* A new document is converted to a vector based on the vocabulary of the collection of documents
            - The similarity (cosine of the angle between vectors) is calculated as the dot product of the length-normalized vectors
            - Documents are ranked by similarity score
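The similarity and ranking steps can be sketched as follows, under stated assumptions: documents are stored as sparse word-to-weight dictionaries, idf = ln(N / df), and words in the new document that never appear in the collection are dropped, which mirrors converting the new document with the collection's vocabulary. Cosine similarity is the dot product divided by the product of the vector lengths. All names and example text are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def fit_idf(corpus: list[str]) -> dict[str, float]:
    # idf(t) = ln(N / df(t)); the keys form the collection's vocabulary.
    doc_freq = Counter(word for doc in corpus for word in set(tokenize(doc)))
    return {word: math.log(len(corpus) / df) for word, df in doc_freq.items()}


def vectorize(text: str, idf: dict[str, float]) -> dict[str, float]:
    # Sparse tf-idf vector; words outside the collection vocabulary are ignored.
    counts = Counter(tokenize(text))
    return {word: n * idf[word] for word, n in counts.items() if word in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    # Dot product divided by the product of the vector lengths.
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


def rank_similar(query: str, corpus: list[str]) -> list[tuple[float, int]]:
    # Return (score, document index) pairs, most similar first.
    idf = fit_idf(corpus)
    query_vec = vectorize(query, idf)
    scores = [
        (cosine(query_vec, vectorize(doc, idf)), i) for i, doc in enumerate(corpus)
    ]
    return sorted(scores, key=lambda pair: pair[0], reverse=True)


past_reviews = [
    "steam generator inspection",
    "steam generator tube repair",
    "emergency diesel generator test",
]
ranking = rank_similar("steam generator tube inspection", past_reviews)
```

For this toy input the ranking is review 0 (about 0.73), then review 1 (about 0.53), then review 2 with a score of 0.0, because the only word it shares with the query, `generator`, appears in every review and therefore carries zero weight.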
 
Approach
* Acquire historical licensing actions and resource requirements
* Extract text data from pdf files
* Clean data
* Create tf-idf matrix
* Create User Interface
              - Extracts text data
              - Performs similarity calculations
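The slides list "Clean data" without detail. One plausible cleaning pass for text extracted from PDF files, assumed here purely for illustration, rejoins words hyphenated across line breaks, lowercases, replaces punctuation with spaces, and drops a small, hypothetical stop-word list. Replacing punctuation also splits citations such as "50.59" into "50 59", which is acceptable for similarity ranking but not for citation extraction.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def clean_text(raw: str) -> str:
    # Rejoin words split across a line break by a hyphen ("Gen-\nerator" -> "Generator").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    text = text.lower()
    # Replace every character that is not an ASCII lowercase letter, digit,
    # or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() also collapses runs of spaces and newlines.
    words = [word for word in text.split() if word not in STOP_WORDS]
    return " ".join(words)
```

For example, `clean_text("The Steam Gen-\nerator  Inspection,\n10 CFR 50.59")` returns `"steam generator inspection 10 cfr 50 59"`.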
 
Resource Estimation Tool

Current Status and Follow-on Work
* Preliminary acceptance testing complete
              - Historical data provides reasonable estimates of required resources and review durations
* NRR/EMBARK and NRR/DORL are coordinating to finalize visualizations
* Develop and deploy the final User Interface
* Potential Follow-on Work:
              - Search capabilities
              - Predict Branch assignments
              - Predict Standard Review Plan
              - Predict which Regulatory Guide(s) were used for the licensing action

Regulatory Named Entity Recognition
* Challenge: Title 10 of the Code of Federal Regulations (CFR) and other regulatory documents contain references to sections of 10 CFR
              - Revisions to 10 CFR could impact other sections
* Goal: Create a tool to find and extract 10 CFR references from documents
* Method: Use Named Entity Recognition (NER) to label text as regulations and extract that text
 
Named Entity Recognition
[Figures: spaCy Default Entities; Addition of NRC Specific Language Patterns]
* Used the Python package spaCy
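The tool itself used spaCy, where NRC-specific patterns can be added to the default pipeline (for example through spaCy's rule-based `EntityRuler` component). To keep the illustration dependency-free, the sketch below swaps in a plain regular expression for the same idea: it labels 10 CFR citations with a hypothetical `REGULATION` label and returns character spans, similar in shape to NER entities. The pattern is an assumption and covers only forms like "10 CFR 50.59", "10 CFR Part 21", and "10 CFR 50.55a(f)(4)".

```python
import re

# Hypothetical pattern for 10 CFR citations; not the pattern used by the NRC tool.
# Matches forms such as "10 CFR 50.59", "10 CFR Part 21", and "10 CFR 50.55a(f)(4)".
CFR_PATTERN = re.compile(
    r"\b10\s+CFR\s+"       # title number and "CFR"
    r"(?:Part\s+)?\d+"     # optional "Part", then the part number
    r"(?:\.\d+[a-z]?)?"    # optional section number with an optional letter suffix
    r"(?:\([a-z0-9]+\))*"  # any number of paragraph designators such as (f)(4)
)


def extract_cfr_references(text: str) -> list[tuple[int, int, str, str]]:
    # Return (start, end, matched text, label) tuples, similar in shape to NER entity spans.
    return [
        (match.start(), match.end(), match.group(), "REGULATION")
        for match in CFR_PATTERN.finditer(text)
    ]


sample = (
    "The change was evaluated under 10 CFR 50.59 and reported under "
    "10 CFR Part 21. Testing follows 10 CFR 50.55a(f)(4)."
)
references = extract_cfr_references(sample)
```

On the sample text this returns three spans whose text is "10 CFR 50.59", "10 CFR Part 21", and "10 CFR 50.55a(f)(4)". A single pattern like this misses citations written in other styles, such as "Part 50 of Title 10".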
 
10 CFR Reference Identification Tool

Conclusions
* Natural Language Processing is a powerful tool for leveraging unstructured data in historical documents
* Deploying these tools would increase staff efficiency by reducing the time required for manual searches
              - Staff can leverage historical data to inform decisions}}

Revision as of 20:52, 16 January 2022

5 Trey Hathaway - Resource Prediction Using Nlp
ML21277A098
Person / Time
Issue date: 08/18/2021
From: Hathaway T
NRC/RES/DSA
To:
Dennis M