ML23262B188

Meeting Slides 20230524-final
Person / Time
Issue date: 05/24/2023
From: Chang Y, Friedman C, Mishkin A, Polra S, Pringle S, Smith T, Vasquez G
NRC/RES/DRA/HFRB, Sphere of Influence
Download: ML23262B188 (1)


Text

Machine Learning Demo Wednesday: Prioritizing Inspections using ML
Alec Mishkin, Guillermo Vasquez, Stuti Polra, Casey Friedman, Scott Pringle, Theresa Smith
Wednesday, May 24, 2023

Agenda
- Varying Inputs and Topic Representations
- Topic Modeling Metrics and Domain Expert Evaluation
- Topic Modeling Metrics Applied to Experiments
- Further Investigation
- Progress

Varying Inputs and Topic Representations

Varying Inputs and Topic Representations for Cluster Formation

- 3 inputs x 4 cluster sizes x 2 representations = 28 experiments run
- PLUS 3 custom representations computed for all 28 experiments
- Stopwords (custom list of 95 words/phrases) removed from each of the 5 representations (top 100 topic terms/phrases) as a post-processing step

1. Topic Modeling Input (3)
- Item Introduction
- Item Introduction Summary (Pegasus_cnn_dailymail model)
- Item Introduction Key Phrases (KeyphraseVectorizer + Guided KeyBERT with custom vocab of 1411 abbreviations, full forms, and failure modes)

2. Topic Modeling Parameters (4)
- min cluster size 10, 20, 40, 60
- 15 neighbors, 5 components
- all-MiniLM-L6-v2 embedding model
- n-grams range of 1-3

3. Topic Representation (5)
- TF-IDF on input text in each topic cluster:
  - BERTopic MMR: MMR (diversity = 0.6)
  - BERTopic MMR + POS: MMR (diversity = 0.6) + POS (NOUN, PROPN, ADJ-NOUN, ADJ-PROPN)
- TF-IDF and counts via string matching on full item introductions in each topic cluster (the 3 custom representations):
  - Vocabulary (1411 abbreviations + full forms + failure modes)
  - Key Phrases (66,325 words/phrases extracted from Item Introductions using KeyphraseVectorizer + Guided KeyBERT with vocab of 1411 abbreviations, full forms, and failure modes)
  - Vocabulary + Key Phrases (67,402)
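For reference, these settings map directly onto BERTopic's API. Below is a minimal sketch of a single run under the slide's settings, assuming the item introductions are already loaded as a list of strings; variable names are illustrative, not from the project code.

```python
from bertopic import BERTopic
from bertopic.representation import MaximalMarginalRelevance
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN
from sklearn.feature_extraction.text import CountVectorizer

# Embedding model named on the slide
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# 15 neighbors, 5 components (the slide's dimensionality-reduction settings)
umap_model = UMAP(n_neighbors=15, n_components=5)

# One of the four min cluster sizes varied across runs: 10, 20, 40, 60
hdbscan_model = HDBSCAN(min_cluster_size=40)

# n-gram range of 1-3 for the topic vocabulary
vectorizer_model = CountVectorizer(ngram_range=(1, 3))

# The "BERTopic MMR" representation; the MMR + POS runs additionally
# chain a bertopic.representation.PartOfSpeech model
representation_model = MaximalMarginalRelevance(diversity=0.6)

topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    vectorizer_model=vectorizer_model,
    representation_model=representation_model,
    top_n_words=100,  # top 100 topic terms/phrases, per the slide
)
topics, probs = topic_model.fit_transform(item_introductions)  # assumed loaded
```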

Topic Modeling Metrics and Domain Expert Evaluation

Testing the accuracy of the coherence metric
As mentioned previously, metrics will never replace subject matter experts. We had Guillermo classify 60 topics (20 from each of three different models) as good, intermediate, or bad in coherence quality, and then compared our coherence metric against his labels. Guillermo's results suggest that mmr-pos > no-rep > mmr; our metric suggests the same ordering. We will keep Guillermo's results as a baseline whenever we test new coherence metric schemes.
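The deck does not name the coherence measure being tested. As one concrete possibility, gensim's CoherenceModel can score the top terms of each topic against a tokenized reference corpus; a sketch, with tokenized_docs and topic_words assumed to be available:

```python
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel

# tokenized_docs: reference corpus as a list of token lists (assumed loaded)
# topic_words: list of top-term lists, one per topic (assumed extracted)
dictionary = Dictionary(tokenized_docs)

cm = CoherenceModel(
    topics=topic_words,
    texts=tokenized_docs,
    dictionary=dictionary,
    coherence="c_v",  # one common measure; not necessarily the one used here
)
print(cm.get_coherence())              # corpus-level average
scores = cm.get_coherence_per_topic()  # one score per topic, for ranking
```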

A Larger Corpus
The previous slide showed promising results; however, that dataset only contained uni-grams. The next tests will involve bi-grams and tri-grams, and to get the best results we need a larger corpus.

Our new corpus is built from 269 NRC technical reports (NUREGs) and has 329,000 unique words, compared to our previous corpus with only 27,000 unique words. There are still many missing words in our reference corpus, so we will continue to improve upon that. Before using the larger corpus on new experiments, we re-ran the experiment from two weeks ago and confirmed the results were still consistent.
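As a rough illustration of how such a reference vocabulary can be counted, here is a sketch assuming the NUREGs have already been extracted to plain-text files under a hypothetical nuregs_extracted/ directory; the tokenization rule is illustrative only:

```python
import re
from pathlib import Path

vocabulary = set()
for path in Path("nuregs_extracted").glob("*.txt"):  # hypothetical layout
    text = path.read_text(encoding="utf-8", errors="ignore").lower()
    vocabulary.update(re.findall(r"[a-z][a-z\-']+", text))

# The deck reports ~329,000 unique words from 269 NUREGs,
# versus ~27,000 in the earlier corpus
print(f"{len(vocabulary):,} unique words")
```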

[Plots comparing results computed with the small corpus vs. the large corpus]

Topic Modeling Metrics Applied to Experiments

Experiment 1
Stuti has been performing many different experiments using different representations, cluster sizes, and inputs. The results presented below are the coherence scores from BERTopic MMR models using:

- Item Introduction as input (blue)
- Key Phrases from the Item Introduction as input (orange)
- Summary of the Item Introduction as input (green)

Using the Item Introduction as input seems to create the most coherent results. We plotted these results using the top 10-70 words, and from 30-70 words this trend held.

- The plot on the left uses the top 50 words
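A sketch of how such a comparison plot can be produced with matplotlib; the score arrays below are random placeholders standing in for the real per-input coherence scores:

```python
import numpy as np
import matplotlib.pyplot as plt

top_n_values = np.arange(10, 80, 10)  # top 10-70 words, as on the slide

rng = np.random.default_rng(0)
scores = {  # placeholder data only - substitute the real coherence scores
    "Item Introduction (blue)": rng.random(len(top_n_values)),
    "Key Phrases (orange)": rng.random(len(top_n_values)),
    "Summary (green)": rng.random(len(top_n_values)),
}

for label, values in scores.items():
    plt.plot(top_n_values, values, marker="o", label=label)
plt.xlabel("Top-N topic words scored")
plt.ylabel("Coherence")
plt.legend()
plt.show()
```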

Experiment 2
The results with Stuti's custom representations have yielded the most positive feedback from the NRC. Below we have plotted results using the Vocabulary representation, the Key Phrases representation, and the Vocabulary + Key Phrases representation. Again, we see the Item Introduction as the most coherent input. The Key Phrases representation appears to show the best results (even better than the MMR representation from the previous slide).
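The custom representations are described elsewhere in the deck as TF-IDF/count string matching of a fixed vocabulary against the full item introductions in each cluster. A minimal sketch of the counting idea with scikit-learn, assuming key_phrases (the fixed list) and docs_in_cluster are already available:

```python
from sklearn.feature_extraction.text import CountVectorizer

# key_phrases: the fixed list to match - e.g. the 1411 abbreviations/full
# forms/failure modes, the 66,325 key phrases, or their union (assumed loaded)
vectorizer = CountVectorizer(vocabulary=key_phrases, ngram_range=(1, 3))

# docs_in_cluster: full item introductions assigned to one topic cluster
counts = vectorizer.fit_transform(docs_in_cluster).sum(axis=0).A1

# Rank the fixed vocabulary by frequency within this cluster and keep
# the top 100 terms/phrases, mirroring the deck's representation size
ranked = sorted(zip(vectorizer.get_feature_names_out(), counts),
                key=lambda pair: pair[1], reverse=True)
top_terms = [term for term, n in ranked[:100] if n > 0]
```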

Custom vs Default Representation Comparison
[Side-by-side plots: Key Phrases representation vs. MMR representation (BERTopic default)]

Conclusion / Next Steps
All the presented results suggest that the Item Introduction is the best input. However, it is possible that our reference corpus is biasing that result, so we will need to be careful with the analysis. We also want to continue to grow our reference corpus:

- Use more of the NUREGs to create an even more expansive corpus

- Add files from Licensee Event Reports to the corpus as well

Additional next steps:
- Have Guillermo perform his own quality analysis on a subset of the previous experiments
- Perform diversity metric calculations on the previous experiments (one common formulation is sketched below)
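The deck does not define its diversity metric. One common formulation, topic diversity as the fraction of unique terms across all topics' top-N terms (1.0 means no term is shared between topics), is easy to sketch:

```python
def topic_diversity(topic_words, top_n=25):
    """Fraction of unique terms across each topic's top-N terms.

    topic_words: list of per-topic term lists, best terms first.
    Returns a value in (0, 1]; 1.0 means no topic overlap.
    """
    terms = [term for topic in topic_words for term in topic[:top_n]]
    return len(set(terms)) / len(terms)
```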

Takeaways and Next Steps
- Domain expert feedback on the experiments varying input and topic representations; compare with the metrics
- Enlarging the reference corpus to include more documents that discuss the kinds of safety issues, and their context, that we are trying to discover from inspection findings text

In progress: Eliminating stochasticity

- Computing MMR and MMR+POS representations on the same BERTopic model so that all 5 topic representations (including the 3 custom ones) can be compared on the same topic clusters from the same run

- Setting a random state for UMAP dimensionality reduction if performance is not heavily impacted (a one-line change, sketched below)
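A sketch of the UMAP change, assuming otherwise-default BERTopic settings; the seed value is arbitrary:

```python
from bertopic import BERTopic
from umap import UMAP

# Fixing random_state makes the reduction deterministic, but it disables
# some of UMAP's parallelism - the performance caveat noted above
umap_model = UMAP(n_neighbors=15, n_components=5, random_state=42)
topic_model = BERTopic(umap_model=umap_model)
```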

In progress: Outlier Reduction
- 4 different outlier reduction techniques are available from BERTopic

- Compute metrics on topics before and after the various outlier reduction techniques (sketched below)
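BERTopic exposes its outlier reduction strategies through a single reduce_outliers() call, which makes the before/after comparison straightforward to loop. A sketch, assuming topic_model, docs, topics, and probs come from an earlier fit_transform; the "probabilities" strategy requires the model to have been fitted with calculate_probabilities=True:

```python
# The four strategies reduce_outliers() supports
for strategy in ["probabilities", "distributions", "c-tf-idf", "embeddings"]:
    new_topics = topic_model.reduce_outliers(
        docs, topics, probabilities=probs, strategy=strategy
    )
    # ...recompute coherence/diversity on new_topics and compare with
    # the pre-reduction metrics here...
```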

TBD: The topic model and its steps have not been extensively tuned yet, which can affect results
- Embedding model (fine-tuned on domain data)

- Embedding reduction and clustering tuned together

Further Investigation

Further Investigation Based on our research and analysis in the three phases of this project, we have identified 5 topics that warrant further investigation and prototyping. Each of these initiatives could be pursued independently and would be similar in size to the current project. Additional details on selected topics will be provided.

Document Gisting Tool to Accelerate Analysis
Assist analysts in understanding large volumes of documents quickly.
- Machine generated summaries
- Quick understanding of numerous documents
- Inspection Reports, LERs, and others

Dynamic Analysis and Discovery
Enable analysts to explore and understand safety issues quickly.
- Software and storage for dynamic pivots and analyst-directed queries
- Find sites with similar connection structures
- Dynamic topic modeling to see how safety clusters change through time

Cluster Representation - Names and Descriptions
Provide analysts with better insights into clustered safety issues.
- Custom-defined named entity recognition + customized pattern matching
- Topic modeling by category
- Text similarity between input documents and inspection procedures

Safety Event Alerting
Inform analysts of potential safety events.
- Text classification: use inspection reports to train a system that can classify events and LERs into cornerstones + cross-cutting areas

Safety Cluster Tuning - a: Guided cluster discovery using custom vocabularies
- Influence cluster formation with NRC-specified safety terms and concepts
- Use supervised or semi-supervised methods to discover clusters rather than relying on fully unsupervised approaches

Safety Cluster Tuning - b: Fine-tuning a pre-trained model or a custom-trained model
- Enhance existing cluster models with NRC-specific language
- Enable text summarization or question-answering

Safety Cluster Tuning - c: Pre-training a language model with NRC text
- SOTA language models suited for scientific/engineering text
- Use regulations, manuals, inspection procedures, licensee event notifications + reports, inspection reports, Part 21 reports, and more

Progress

SOW Task Status

Phase I: March 6, 2023 - April 9, 2023
- Describe the Problem: Complete
- Search the Literature: Complete
- Select Candidates: Complete
- Select Evaluation Factors: Complete
- Develop evaluation factor weights: Complete
- Define evaluation factor ranges: Complete
- Perform assessment: Complete
- Report Results: Complete
- Deliver Trade study report: Complete

Phase II: March 20, 2023 - May 7, 2023
- Platform/system selection and installation: Complete
- Data acquisition and preparation: Complete
- Feature pipeline engineering: Complete
- Clustering method experimentation & selection: Complete
- Cluster pipeline engineering: Complete
- Anomaly detection (as needed): Not needed
- Model Development, Training, Evaluation: Complete
- Test harness development: Complete
- PoC integration and demonstration: Complete
- Trial runs and evaluation: Complete
- Demonstrate PoC capability: Complete

Phase III: April 19, 2023 - June 16, 2023
- Live data ingestion: In progress
- Model execution: In progress
- Cluster evaluation: In progress
- Critical Method documentation: Not started
- Technical Report Document: Not started
- Deliver final report with findings: Not started