Revision as of 20:52, 16 January 2022

Text

5 Trey Hathaway - Resource Prediction Using NLP
	ML21277A098
Person / Time
Issue date:	08/18/2021
From:	Hathaway T; NRC/RES/DSA
To:	;
	Dennis M
References
	Download: ML21277A098 (18)

@@ Line 15: / Line 15: @@
 =Text=
-{{#Wiki_filter:Resource Prediction Using Natural Language Processing
-Trey Hathaway, U.S. Nuclear Regulatory Commission, RES/DSA/AAB
-August 18, 2021
-NRC Data Science and Artificial Intelligence Regulatory Applications Workshops:
+{{#Wiki_filter:}}
-Current Topics
-Natural Language Processing
-* Techniques that allow computers to understand the contents of natural language
-   - Allows for the extraction of information and insights from documents
-   - Collection of techniques: rule-based, statistical, or neural
-Goals
-* Apply Natural Language Processing techniques to NRC data and use cases
-* Demonstrate successes
-Use Cases
-Resource Prediction
-* Challenge: Deviations between resource estimates to complete a licensing review and the actual hours charged
-* Goal: Create a tool to assist project managers in formulating resource estimates
-               - Leverage historical data
-               - Find historically similar reviews
-* Method: Use term frequency-inverse document frequency vectors to represent documents and perform similarity calculations
-               - Rank documents based on similarity
-* Term Frequency-Inverse Document Frequency (tf-idf)
-              - Weighting factor for words
-              - Product of term frequency and inverse document frequency
-* Term Frequency (tf)
-              - How frequently a word appears in a document
-              - Indicates the importance of the word to that document
-* Inverse Document Frequency (idf)
-              - Based on how frequently a word appears in a collection of documents; words that appear in many documents receive lower weight
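The weighting described above can be illustrated with a minimal sketch. The slides do not state which tf-idf variant the tool uses, so the formula below (relative term frequency multiplied by an unsmoothed logarithmic idf) is an assumption, and the function name `tf_idf` and the toy corpus are hypothetical rather than NRC data.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {word: weight} dict per document in corpus (lists of tokens)."""
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in corpus for word in set(doc))
    # Inverse document frequency: words found in fewer documents score higher.
    # Assumed variant: plain log(N / df) with no smoothing.
    idf = {word: math.log(n_docs / count) for word, count in df.items()}
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        # Term frequency: share of the document's tokens that are this word.
        weights.append({w: (c / len(doc)) * idf[w] for w, c in counts.items()})
    return weights

# Hypothetical toy corpus for illustration only.
corpus = [
    "license amendment request for pump replacement".split(),
    "license amendment request for valve testing".split(),
    "license renewal application".split(),
]
weights = tf_idf(corpus)
```

In this toy corpus, "license" appears in every document and so receives a weight of zero, while "pump" appears in only one document and outweighs "amendment", which appears in two.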
-Term Frequency-Inverse Document Frequency (Vector Representation) [figure with axes wordx and wordz]
-* Represent a document as a vector
-   - The vector reflects the word usage in the document
-   - The vector will have thousands of dimensions
-Term Frequency-Inverse Document Frequency (Vector Space Corpus) [figure with axes wordx and wordz]
-* Represent the collection of documents as vectors
-                      - Create a vocabulary of all words used in the collection
-Term Frequency-Inverse Document Frequency (Similarity Calculations) [figure with axes wordx and wordz]
-* A new document is converted to a vector based on the vocabulary of the collection of documents
-             - The similarity (cosine of the angle between vectors) is calculated as the dot product between the length-normalized vectors
-             - Documents ranked by similarity score
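The similarity calculation above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the helper names (`build_idf`, `to_unit_vector`, `rank_by_similarity`), the `+ 1.0` idf smoothing, and the toy documents are all hypothetical. Words in the new document that are absent from the collection vocabulary are dropped, and each vector is scaled to unit length so that the dot product equals the cosine of the angle between vectors.

```python
import math
from collections import Counter

def build_idf(corpus):
    """Vocabulary and idf weights learned from the historical collection."""
    n_docs = len(corpus)
    df = Counter(word for doc in corpus for word in set(doc))
    # Assumed smoothing (+1.0) so words found in every document keep some weight.
    return {word: math.log(n_docs / count) + 1.0 for word, count in df.items()}

def to_unit_vector(tokens, idf):
    """tf-idf vector over the collection vocabulary, scaled to length 1."""
    counts = Counter(t for t in tokens if t in idf)  # unknown words are dropped
    vec = {w: c * idf[w] for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else {}

def rank_by_similarity(new_doc, corpus):
    """Return (index, cosine similarity) pairs, most similar first."""
    idf = build_idf(corpus)
    query = to_unit_vector(new_doc, idf)
    scores = []
    for i, doc in enumerate(corpus):
        vec = to_unit_vector(doc, idf)
        # Dot product of unit vectors equals the cosine of the angle between them.
        scores.append((i, sum(query[w] * vec.get(w, 0.0) for w in query)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical historical documents and a new request, for illustration only.
corpus = [
    "license amendment request for pump replacement".split(),
    "license amendment request for valve testing".split(),
    "license renewal application".split(),
]
new_doc = "amendment request for pump seal replacement".split()
ranking = rank_by_similarity(new_doc, corpus)
```

Here the first historical document ranks highest because it shares the most distinctive words with the new request, and the third scores zero because it shares none.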
-Approach
-* Acquire historical licensing actions and resource requirements
-* Extract text data from PDF files
-* Clean data
-* Create tf-idf matrix
-* Create User Interface
-              - Extracts text data
-              - Performs similarity calculations
-Resource Estimation Tool
-Current Status and Follow-on Work
-* Preliminary acceptance testing complete
-              - Historical data provides reasonable estimates of required resources and review durations
-* NRR/EMBARK and NRR/DORL coordinating to finalize visualizations
-* Develop and deploy final User Interface
-* Potential Follow-on Work:
-              - Search capabilities
-              - Predict Branch assignments
-              - Predict Standard Review Plan
-              - Predict which Regulatory Guide(s) was used for the licensing action
-Regulatory Named Entity Recognition
-* Challenge: Title 10 of the Code of Federal Regulations (CFR), and other regulatory documents, reference sections of 10 CFR
-               - Revisions to 10 CFR could impact other sections
-* Goal: Create a tool to find and extract 10 CFR references from documents
-* Method: Use Named Entity Recognition (NER) to label text as regulations and extract that text
-Named Entity Recognition [figure panels: spaCy Default Entities; Addition of NRC-Specific Language Patterns]
-* Used Python package spaCy
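The slides do not show the NRC-specific language patterns that were added to spaCy. As a simplified stand-in, the sketch below uses a standard-library regular expression rather than spaCy to label 10 CFR citations in text. The pattern, the function name `find_cfr_references`, the `REGULATION` label, and the sample sentence are all assumptions for illustration, and the pattern covers only a few common citation shapes.

```python
import re

# Assumed citation shapes: "10 CFR 50.59", "10 CFR Part 21", "10 CFR 50.55a(g)(4)".
CFR_PATTERN = re.compile(
    r"\b10\s+CFR\s+(?:Part\s+)?\d+(?:\.\d+[a-z]?)?(?:\([A-Za-z0-9]+\))*"
)

def find_cfr_references(text):
    """Return (start, end, matched_text, label) tuples for each 10 CFR citation."""
    return [
        (m.start(), m.end(), m.group(0), "REGULATION")  # hypothetical label name
        for m in CFR_PATTERN.finditer(text)
    ]

# Hypothetical sample sentence, not taken from an NRC document.
sample = (
    "The change was evaluated under 10 CFR 50.59 and 10 CFR Part 21, "
    "consistent with 10 CFR 50.55a(g)(4)."
)
refs = [match[2] for match in find_cfr_references(sample)]
```

A rule-based NER pipeline would attach the same kind of label to token spans inside a parsed document, which also lets the rules sit alongside a model's default entities; the regular expression here only demonstrates the extraction step.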
-10 CFR Reference Identification Tool
-Conclusions
-* Natural Language Processing is a powerful tool for leveraging unstructured data in historical documents
-* Deploying these tools would increase staff efficiency by reducing the time required for manual searches
-               - Staff can leverage historical data to inform decisions}}

ML21277A098: Difference between revisions
