RIL 2022-11
HUMAN PERFORMANCE TEST FACILITY (HPTF)
VOLUME 4 - SUPPLEMENTAL EXPLORATORY ANALYSES OF TASK ORDER EFFECTS

Research Information Letter
Office of Nuclear Regulatory Research
Date Published: July 2024

Prepared by:
N. Hughes¹, J. Lin², G. Matthews², D. Barber², K. Dickerson¹

¹ U.S. Nuclear Regulatory Commission, Office of Nuclear Regulatory Research, Division of Risk Assessment, Human Factors and Reliability Branch, Washington, DC 20555-0001
² Institute for Simulation and Training, University of Central Florida, 3100 Technology Pkwy, Orlando, FL 32826

Niav Hughes Green & Kelly Dickerson, NRC Project Managers
Disclaimer

Legally binding regulatory requirements are stated only in laws, NRC regulations, licenses (including technical specifications), or orders, not in Research Information Letters (RILs). A RIL is not regulatory guidance, although the NRC's regulatory offices may consider the information in a RIL to determine whether any regulatory actions are warranted.
PREFACE

HPTF RIL Series (RIL 2022-11) Preface

Much of the basis for current NRC human factors engineering (HFE) guidance comes from data from research conducted in other domains (e.g., aviation, defense), qualitative data from operational experience in nuclear power plants (NPPs), and a limited amount from empirical studies in a nuclear environment. The Commission, in staff requirements memorandum (SRM) SECY-08-0195, approved the staff's recommendation and directed the staff to consider using generic simulator platforms for addressing human performance issues, as simulators provide a tool to gather more empirical nuclear-specific human performance data. These data would enhance the current information-gathering process, thus providing stronger technical bases and guidance to support regulatory decision making.
The former Office of New Reactors (NRO) issued a user need for the Office of Nuclear Regulatory Research (RES) to update its human factors (HF) review guidance with regard to emerging technologies (User Need NRO-2012-007), and more recently the Office of Nuclear Reactor Regulation (NRR) issued a follow-on user need with the same purpose (User Need NRR-2019-008). In the spring of 2012, the NRC sponsored a project to procure a low-cost simulator to empirically measure and study human performance aspects of control room operations, addressing human performance concerns related to current as well as new and advanced control room designs and operations. Using this simulator, the Human Factors and Reliability Branch (HFRB) in the RES Division of Risk Assessment (DRA) began a program of research known as the NRC Human Performance Test Facility (HPTF) to collect empirical human performance data with the purpose of measuring and ultimately better understanding the various cognitive and physical elements that support safe control room operation. Additionally, the baseline methodology documented in these volumes will enable HRA data research that addresses key gaps in available data on topics such as dependency and errors of commission, improving the state of the art of human reliability analysis (HRA) and thus serving dual HF and HRA data missions.
Recognizing the essential role of data to our HF and HRA programs, the NRC historically approached data collection through multiple avenues, each with its inherent strengths and weaknesses:

1. Licensed operators - controlled experiments at the Halden Reactor Project
2. Licensed operators - the Scenario Authoring, Characterization, and Debriefing Application (SACADA) database capturing training scenarios
3. Novice populations - scientific literature, laboratory settings, non-nuclear

The HPTF program captures data from both novice and operational populations, and the work is specifically targeted to the nuclear domain. In addition, the HPTF methodology expands upon these data collection methods by also including formerly licensed operators and other individuals with nuclear domain expertise (e.g., former plant designers, PRA experts, NRC staff). The HPTF methodology (described in detail in RIL 2022-11 Volume 1) enables the NRC to fill in the gaps left by the other three data collection activities and conduct responsive research to support the informational needs of our users (e.g., NRR HFE technical reviewers and HRA analysts).
The intent of the HPTF was to design experiments that balanced domain realism and laboratory control sufficiently to collect systematic, meaningful human performance data related to execution of common nuclear main control room (MCR) tasks. Three large-scale experiments were conducted to address challenges associated with developing a research methodology for using novices in a highly complex, expert-driven domain. These three experiments are reported as Studies 1 and 2 in RIL 2022-11 Volume 1, which describes the approach and methodology underlying this research effort and the resulting findings for the series of studies. In RIL 2022-11 Volume 2, the Volume 1 findings were further validated via a fourth data collection by testing a formerly licensed operator population using a full-scale, full-scope simulator. Cross-experiment comparisons were enabled by leveraging a formerly licensed operator as a member of the research team to serve as senior reactor operator (SRO) and ensure participants received an experience as similar and structured as possible to the studies in Volume 1.¹
The HPTF team works with the technical staff in the user offices to ensure pertinent research questions can be addressed within the constraints of the HPTF methodology. HPTF research questions are formulated collaboratively between NRC staff and a contractor with an identical simulator and performance assessment capabilities.
Three experimental design workshops have been held to date. The first workshop was held on March 5 and 6, 2018, upon completion of the first three HPTF experiments. The direction resulting from this first workshop was to validate the methodology and generalize the findings from the baseline HPTF experiments by using formerly licensed operators as participants to complete an experimental scenario using an analog, full-scope, full-scale simulator and a digital, part-task simulator. RIL 2022-11 Volume 2 describes the research approach and findings for the fourth experiment in the series.
The second workshop was held on August 20 and 21, 2019. The direction resulting from this second workshop was to perform a reanalysis of all HPTF experiments thus far to investigate: (1) workload measure sensitivities (RIL 2022-11, Volume 3), (2) task order effects (this volume), and (3) touchscreen ergonomics (forthcoming RIL 2022-11 Volume 5). Due to the COVID-19 health crisis, the third workshop was held as a virtual series consisting of six 2-hour blocks between October 29 and November 20, 2020. The future direction topics discussed during the most recent workshop are described in RIL 2022-11 Volume 6 (in press). The final direction and experimental design are yet to be set, but the resulting methodology and results may be published as Volume 7.
These volumes of research illustrate the NRC's ongoing effort to perform systematic human performance data collection using a simulator to better inform NRC guidance and technical bases in response to SRM SECY-08-0195 and SRM-M061020. The HF and HRA data are essential to ensure that our HFE guidance documents and HRA methods support the review and evaluation of state-of-the-art HF programs (as required by Title 10 of the Code of Federal Regulations (10 CFR) 50.34(f)(2)(iii)).
¹ Systematic experimentation is challenging in the nuclear domain using real operators and full, dynamic scenarios because operators can take many paths to achieve a successful outcome. This variability is not conducive to controlled laboratory study. By including a confederate SRO in a study using a dynamic scenario, this hard-to-control variability is managed, thereby enabling stable observations. See RIL 2022-11 Volumes 1 and 2 for examples of these methodological benefits.
ABSTRACT

The staff of the U.S. Nuclear Regulatory Commission (NRC) is responsible for reviewing and determining the acceptability of new reactor designs and modifications to operating plants to ensure they support safe plant operations. Human factors staff use Chapter 18 of the Standard Review Plan (NUREG-0800) and the guidance documents referenced therein, in part, to ensure that plant operators can safely control the plant. The NRC's Human Factors Engineering Program Review Model, NUREG-0711, Rev. 3 (NRC, 2012), is one of these documents.
NUREG-0711 states that a generic human-centered HFE design goal should include a design that supports personnel in maintaining vigilance over plant operations and provides acceptable workload levels. Furthermore, NUREG-0711's review elements highlight the importance of considering workload (WL). In particular, for Elements 3 (Task Analysis) and 4 (Staffing and Qualifications), providing an estimate of WL is explicitly part of the review criteria that must be met.
The basis for current NRC HFE guidance comes, in part, from data from research conducted in other domains (e.g., aviation, defense), qualitative data from operational experience in NPPs, and a limited amount from empirical studies in a nuclear environment. For new designs, technologies, and concepts of operations for new or existing control rooms, there is a lack of operational experience and appropriate research literature to draw from to inform NRC HFE guidance. To address this issue, the Commission, in Staff Requirements Memorandum (SRM) SECY-08-0195, directed the staff to consider using generic simulator platforms to address human performance issues. In response to the SRM, RES developed the NRC HPTF research program to empirically measure and study human performance aspects of control room operations using an NPP simulator and a combination of objective and subjective measures of workload. The information gained will be used to enhance the technical basis for the NRC's regulatory guidance in HFE and to better inform models for HRA.
Four large-scale data collections have been performed for the HPTF research program. The experiments are reported in RIL 2022-11 Volumes 1 and 2. To delve further into the data previously collected, we performed a reanalysis of all HPTF experiments thus far to further investigate: (1) workload measure sensitivities, (2) task order effects, and (3) touchscreen ergonomics. The results of each of these supplementary analyses and their regulatory implications are discussed in RIL 2022-11 Volumes 3-5. The present RIL is Volume 4 in the RIL 2022-11 series and describes the supplementary analyses performed on datasets from four HPTF experiments to further investigate task order effects.
Previous reports on the HPTF project for the NRC have documented how psychophysiological and subjective workload measures are diagnostic of the impacts of task type, simulator type, and operator expertise. However, potential sensitivity of workload assessments to task order effects could compromise their validity. This report presents supplementary analyses of data from four previous experiments to test for possible task order effects. One set of analyses tested whether workload metrics for checking, detection, and response implementation tasks were influenced by the nature of the preceding task in the task sequence. Significant order effects were infrequent and inconsistent from experiment to experiment. A second set of analyses tested for order effects at the task step level. Order effects associated with possible vigilance effects were found for hemodynamic workload measures. Overall, these findings suggest that order effects on workload assessments are minimal and do not threaten validity when assessing workload in NPP operations. However, in some contexts, it may be advisable to attend to effects of novelty and to vigilance-like effects when detection tasks are repeated in sequence.
TABLE OF CONTENTS

PREFACE
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
1 INTRODUCTION
  1.1 Workload Assessment Methodology for Nuclear Power Plant Operations
  1.2 The Human Performance Test Facility Research Program
    1.2.1 Types of NPP Simulated Environments
    1.2.2 Use of Novice and Experienced Participants
    1.2.3 Operator Task Classification
    1.2.4 Multivariate Workload Assessment
  1.3 Order Effects and Workload Assessment
  1.4 Aims
2 SUMMARY OF METHODS FOR STUDIES 1-4
  2.1 Method: Studies 1-3
    2.1.1 Study 1
    2.1.2 Study 2
    2.1.3 Study 3
    2.1.4 Experimental Scenario and Task Order (Studies 1-3)
    2.1.5 Study 4
3 GENERAL METHODS
  3.1 Measures of Workload
    3.1.1 Subjective Measures
    3.1.2 Physiological Measures
4 RESULTS
  4.1 Results: Studies 1-3
    4.1.1 Study 1 Checking Conditions
    4.1.2 Study 1 Response Implementation Conditions
    4.1.3 Study 1 Detection Conditions
    4.1.4 Study 2 Checking Conditions
    4.1.5 Study 2 Response Implementation Conditions
    4.1.6 Study 2 Detection Conditions
    4.1.7 Study 3 Checking Conditions
    4.1.8 Study 3 Response Implementation Conditions
    4.1.9 Study 3 Detection Conditions
  4.2 Results Summary: Studies 1-3
  4.3 Procedure Step Level Order Effects: Studies 1-3
    4.3.1 ECG
    4.3.2 fNIRS
    4.3.3 TCD
    4.3.4 EEG
  4.4 Order Effect: Study 4
5 DISCUSSION
  5.1 Order Effects for the Checking Task
  5.2 Task Step Order Effects
6 CONCLUSION
  6.1 Workload Assessment
  6.2 Implications for Operations and Operator Workload During MCR Tasks
  6.3 Implications for HPTF Methodology
  6.4 Implications for HRA
7 REFERENCES

LIST OF FIGURES

Figure 1. ABM's B-Alert X10 EEG/ECG system
Figure 2. Spencer Technologies ST3 Transcranial Doppler
Figure 3. Functional near-infrared spectroscopy (fNIRS)
Figure 4. Electrode locations for the ECG system
Figure 5. Subjective metrics for two orderings of the checking task (Study 1)
Figure 6. Physiological metrics for two orderings of the checking task (Study 1)
Figure 7. Subjective and physiological metrics for two orderings of the response implementation task (Study 1)
Figure 8. Subjective and performance metrics for two orderings of the checking task (Study 2)
Figure 9. EEG metrics for two orderings of the checking task (Study 2)
Figure 10. Subjective and EEG metrics for three orderings of the detection task (Study 2)
Figure 11. Subjective and TCD metrics for two orderings of the checking task (Study 3)
Figure 12. Physiological metrics for three orderings of the response implementation task (Study 3)
Figure 13. Step-by-step changes in HRV for response implementation
Figure 14. Step-by-step changes in HR for response implementation
Figure 15. Step-by-step changes in HRV for checking
Figure 16. Step-by-step changes in oxygen saturation in each hemisphere for detection
Figure 17. Step-by-step changes in cerebral blood flow velocity in each hemisphere for detection
Figure 18. Step-by-step changes in right hemisphere EEG beta power for detection

LIST OF TABLES

Table 1. Summary of types of NPP simulated environments
Table 2. Summary of sensors and metrics used for workload assessment at the HPTF
Table 3. Task orderings for three experimental scenario conditions
Table 4. Statistical tests for significant findings for Study 2 checking task for frontal, parietal, and occipital regions
Table 5. Statistical tests for significant findings for Study 2 checking tasks for left and right hemispheres
Table 6. Summary table for significant tests of temporal order (step) effects
1 INTRODUCTION

1.1 Workload Assessment Methodology for Nuclear Power Plant Operations

The human factors engineering (HFE) staff of the Nuclear Regulatory Commission (NRC) evaluate the HFE programs submitted in license applications for nuclear power plants (NPPs) to ensure their safety. One element of the review is to determine appropriate function allocation. Function allocation is the assignment of functions to (1) personnel (e.g., manual control), (2) automatic systems, and (3) combinations of both. Exploiting the strengths of personnel and system elements enhances the plant's safety and reliability, including improvements achievable through assigning control to these elements with overlapping and redundant responsibilities. Functions allocated to human and system resources are separated into tasks. The subsequent analysis of personnel tasks identifies the alarms, displays, controls, and task support needs required for performing each task. Tasks are arranged into jobs and assigned to staff positions or roles within the control room (e.g., reactor operator, balance of plant). Each position is evaluated to verify that the workload is acceptable (O'Hara, Higgins, Fleger, & Pieringer, 2012). As such, due consideration should be given to whether there are aspects of operator tasking that are liable to impose excessive cognitive workload and thereby raise error probabilities and impact safety.
Error probabilities (i.e., likelihoods) are quantified using human reliability analysis (HRA) (Boring, 2012). In the NPP context, HRA generally, and prospective HRA specifically, may be valuable as an approach to assessing and minimizing risk in next-generation control rooms (Tran, Boring, Joe & Griffith, 2007), in particular when the workload imposed by new designs or concepts of operation has yet to be characterized.
The value of workload assessment for HRA is that it can help to identify relevant performance shaping factors that raise error probabilities, especially those derived from task demands. While HRA traditionally focuses on predicting error rates on a probabilistic basis, contemporary approaches aim also to model the cognitive processes that underlie human performance (Xing, Chang, & DeJesus, 2020; Mosleh & Chang, 2004). Workload assessment contributes to quantifying these cognitive processes and their sensitivity to task demands. In addition, factors influencing performance are often dynamic and interdependent, and continuous psychophysiological monitoring of operator state provides a means for tracking performance influencing factors dynamically in a research environment (Tran et al., 2007).
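To make the link between workload-related performance shaping factors and error probabilities concrete, the sketch below illustrates the kind of multiplicative HEP adjustment used in HRA methods such as SPAR-H. This is an illustration only; the nominal HEP and multiplier values are assumptions for the sketch, not values drawn from this report or from the HPTF analyses.

```python
# Illustrative sketch only: a SPAR-H-style multiplicative adjustment of a
# nominal human error probability (HEP) by performance shaping factor (PSF)
# multipliers. All numeric values below are assumed for illustration.

def adjusted_hep(nominal_hep, psf_multipliers):
    """Multiply a nominal HEP by each PSF multiplier, capping at 1.0."""
    hep = nominal_hep
    for multiplier in psf_multipliers.values():
        hep *= multiplier
    return min(hep, 1.0)  # a probability cannot exceed 1

# Hypothetical PSF assignment: elevated stress/workload doubles the nominal HEP.
psfs = {"stress_and_workload": 2.0, "ergonomics_hmi": 1.0, "experience": 1.0}
print(adjusted_hep(0.001, psfs))  # 0.002
```

In a scheme like this, a workload assessment that flags a task as high-demand would motivate a larger stress/workload multiplier and hence a higher estimated error probability.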
Workload assessment in the NPP context is especially important because reactors in the United States utilize a variety of plant designs, interfaces, and safety systems. For example, the main control room (MCR) will be designed differently depending on whether the plant is a boiling water reactor (BWR) or a pressurized water reactor (PWR), the plants age, and the number of control room modifications implemented. All these design factors contribute to workload differently. For MCR modernization, there may be impacts on task demands from new interface features such as touchscreens. The diversity of designs requires a standard workload assessment methodology that can, in turn, support a systematic HRA process.
Workload assessments are generally considered sensitive to task and scenario context. Context includes the order in which tasks are completed. It is possible that when a more difficult task is completed before an easier task, the workload rating for the easier task would be artificially low. Conversely, when a more difficult task follows an easier task, the workload ratings could be artificially elevated. To date, there are very few studies examining task order effects and their influence on subjective workload ratings. The only direct evaluation of task order effects identified by the authors was Dickenson, Byblow, and Ryan (1993), who looked at both easy and hard scenarios for grid operators. They found that scenario order (easy first vs. difficult first) did not impact the magnitude of workload ratings. Dickenson et al. did not look specifically at the task level, leaving a gap in the literature in understanding the link between task order and workload in real-world environments like the NPP MCR. To address this gap, this report describes analyses of task order effects on workload response to simulated NPP MCR operations, supplementing previous reports addressing factors such as task type, interface type, and operator experience (Reinerman-Jones & Mercado, 2014; Reinerman-Jones, Teo & Harris, 2016; Reinerman-Jones et al., 2018, 2019). This introduction provides a summary of the work performed and detailed in existing reports and the motivation for performing the additional analyses.
1.2 The Human Performance Test Facility Research Program

The program of research known as the Human Performance Test Facility (HPTF) has aimed to support the NRC's mission by advancing, validating, and documenting workload assessment methodology for NPP MCR operations using a generic plant simulator (Hughes, D'Agostino, & Reinerman-Jones, 2017). Using these simulators, the Human Factors and Reliability Branch (HFRB) in the Office of Nuclear Regulatory Research (RES) began a program of research known as the NRC HPTF to collect empirical human performance data with the purpose of measuring and understanding the cognitive and physical elements that support safe control room operation. To accomplish this and access a large sample population (i.e., university students), the NRC partnered with a university. The HFRB staff worked as co-investigators along with a team of researchers at the University of Central Florida (UCF) Institute for Simulation and Training (IST) to design and carry out a series of experiments aimed at measuring and understanding the human performance aspects of common control room tasks through a variety of physiological and self-report metrics.²
Four key features of the methodology are the use of NPP MCR simulated environments, novice participants, the definition of task components, and multivariate workload assessment using both subjective and objective measures. For more detailed background on the program history, goals, and methodological development, see RIL 2022-11, Volume 1, sections 1.2, 2.2, and 2.4.
1.2.1 Types of NPP Simulated Environments

The use of a real NPP simulator to create a realistic experimental environment is a cornerstone of the HPTF methodology. As NPP reactor technology and control room design have modernized and evolved, so too have NPP simulator technology and capability. In the HPTF studies, we characterize the types of NPP simulators by five main features, summarized in Table 1.
² Performance-based measures were also collected, but those data are not part of the order effects reanalysis presented in this report.
Table 1. Summary of types of NPP simulated environments.

Scope
  a. Full-scope simulator - has the capability to simulate all of the physical and underlying thermodynamics occurring in the would-be plant
  b. Part-task simulator - has the capability to simulate only part of plant behavior

Layout
  a. Spatially dedicated - all instrumentation and controls (I&Cs) are available and continuously in view to the operator and presented in a fixed location
  b. Hierarchical - all I&Cs are available but not continuously in view; the I&Cs can be displayed in a hierarchical manner embedded within the workstation displays

Interface types
  a. Analog - conventional hard panels or benchboards with hard-wired analog I&Cs
  b. Digital - computer-based workstations with digital I&Cs
  c. Hybrid - analog hard panels and computer-based workstations
  d. Simulated analog - digital representation emulating analog I&C hard panels

Workstation design
  a. Sit-down workstations
  b. Stand-up workstations

Control interaction techniques
  a. Mouse click input (for digital and hybrid interfaces)
  b. Touchscreen input (for digital and hybrid interfaces)
  c. Manual manipulation of hard-wired controls (for conventional analog interfaces)
Based on these definitions, the simulators used in the HPTF studies can be characterized as three types: (1) a full-scope simulator with hierarchical layout and simulated analog interface in sit-down desktop mouse-click workstations; (2) a full-scope simulator with hierarchical layout and simulated analog interface in stand-up touchscreen workstations; and (3) a full-scope simulator with spatially dedicated layout and analog interface in a stand-up benchboard with manually manipulated controls.
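The characterization above can be captured as a small data structure. The sketch below is an illustration, not an artifact of the research program; it encodes the five features from Table 1 and the three simulator types used in the HPTF studies.

```python
# Illustrative encoding of the five simulator features from Table 1 and the
# three simulator types used in the HPTF studies (names are for illustration).
from dataclasses import dataclass

@dataclass(frozen=True)
class SimulatorType:
    scope: str        # "full-scope" or "part-task"
    layout: str       # "spatially dedicated" or "hierarchical"
    interface: str    # "analog", "digital", "hybrid", or "simulated analog"
    workstation: str  # "sit-down" or "stand-up"
    interaction: str  # "mouse click", "touchscreen", or "manual manipulation"

HPTF_SIMULATORS = (
    SimulatorType("full-scope", "hierarchical", "simulated analog",
                  "sit-down", "mouse click"),
    SimulatorType("full-scope", "hierarchical", "simulated analog",
                  "stand-up", "touchscreen"),
    SimulatorType("full-scope", "spatially dedicated", "analog",
                  "stand-up", "manual manipulation"),
)
```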
1.2.2 Use of Novice and Experienced Participants

Section 1.2.3 in RIL 2022-11 Volume 1 describes in detail the decisions that led to using both novice and expert participants and the study design requirements to compare the two groups.
Briefly, it is often challenging to recruit licensed operators for research studies, especially given the need for multiple team members (e.g., senior reactor operator (SRO), reactor operator (RO)). Thus, this program of research investigated and found support for the use of novice participants without industry experience under certain conditions. For instance, tasks used for experimentation were selected to minimize the role of prior experience and knowledge, while still imposing comparable cognitive demands on the critical elements of information-processing.
For example, operators engage in detection when scanning the boards, they engage in checking when verifying a value specified in the procedures, and they implement responses when changing values on the panel or at a workstation when indicated in the procedures. These tasks are all readily accessible to novices, and they require the same underlying cognitive resources, including working memory, selective and sustained attention, and manual response selection and execution. This "equal but different" approach ensures that cognitive demands are comparable across populations while the knowledge requirements are calibrated to the skill base of novice participants (see also RIL 2022-11, Volume 1, section 2.4.5 for additional description of reducing complexity while maintaining fidelity). From a cognitive engineering standpoint, experimental studies that follow this type of approach can reveal processing operations that may be vulnerable to overload in both novices and experts, uncovering areas where operator demands will need to be managed or monitored.
1.2.3 Operator Task Classification

The emergency operating procedure (EOP) represented in the simulation can be decomposed into a series of discrete tasks, labeled checking, detection, and response implementation, which can be readily trained within the novice population. These tasks are representative of tasks performed primarily by ROs and directed by SROs (O'Hara et al., 2008; O'Hara & Higgins, 2010; Reinerman-Jones et al., 2013). Section 2.4 in RIL 2022-11 Volume 1 describes the rationale for focusing exclusively on skill- and rule-based tasks (see also Rasmussen, 1983) in a novice population.
Checking requires a one-time inspection of an instrument or control to verify that it is in the appropriate state. Detection requires continuous monitoring of a control parameter to identify a change in the state of the plant. Response implementation requires a fine motor response (mouse usage or finger touch) to change the state of the NPP by locating a control and subsequently manipulating the control in the required direction. The experimental protocol represents an EOP as a sequence of tasks of these three types. The temporal order of tasks can be manipulated but, in an actual NPP EOP, checking always precedes response implementation, while detection can occur at any point. Thus, possible task type sequences include (1) checking, response implementation, and detection, (2) checking, detection, and response implementation, and (3) detection, checking, and response implementation.
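As a concrete illustration of the ordering constraint just described, the short sketch below (ours, not from the original reports) enumerates the task type sequences that remain valid once checking is required to precede response implementation; it reproduces the three sequences listed above.

```python
# A minimal sketch of the ordering constraint described above:
# checking (C) must precede response implementation (RI), while
# detection (D) may occur at any point in the sequence.
from itertools import permutations

valid_orders = [
    order for order in permutations(("C", "D", "RI"))
    if order.index("C") < order.index("RI")  # C before RI, as in a real EOP
]
print(valid_orders)
# [('C', 'D', 'RI'), ('C', 'RI', 'D'), ('D', 'C', 'RI')]
```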
1.2.4 Multivariate Workload Assessment

There has been a longstanding debate in the human factors community over the optimal methodology for workload assessment. A major challenge has been that different measures may dissociate (Hancock & Matthews, 2019). That is, manipulations of task demands may have different impacts on subjective workload, psychophysiological indicators of brain response, and objective performance metrics. Dissociation can point to additional dimensions of workload not captured in one of the other assessments or could indicate entirely different workload trends.
Thus, while the NASA Task Load Index (NASA-TLX; Hart & Staveland, 1988) is the single most popular workload measure, previous evidence of dissociation (RIL 2022-11, Volumes 1 and 2) suggests that it is not a comprehensive workload assessment.
Work conducted in the HPTF has supplemented the NASA-TLX with additional subjective measures, including the Multiple Resource Questionnaire (MRQ; Boles & Adair, 2001), which has greater diagnosticity for isolating different sources of demand, such as working memory and spatial attention. Stress response is assessed with the Dundee Stress State Questionnaire (DSSQ; Matthews et al., 2002). Workload is also assessed with an integrated suite of psychophysiological sensors, summarized in Table 2. Performance measures include those capturing effectiveness of three-way communication as well as those that index accuracy of task execution. Taken together, these multiple subjective and objective measures provide a comprehensive picture of operator response to changing task demands.
Table 2. Summary of sensors and metrics used for workload assessment at the HPTF.

| Sensor | Method | Metrics |
|---|---|---|
| Electrocardiogram (ECG) | Typical electrode placement: single-lead electrodes on the center of the right clavicle and the lowest left rib | Heart rate (HR), inter-beat interval (IBI), heart rate variability (HRV) |
| Electroencephalogram (EEG) | Multiple scalp electrodes at frontal, temporal, parietal, and occipital sites | Spectral power densities (SPDs) for frequency bands (delta, theta, alpha, beta) |
| Transcranial Doppler (TCD) | Ultrasonography using transceivers above the zygomatic arch | Task-induced response in bilateral cerebral blood flow velocity (CBFV) in the middle cerebral arteries |
| Functional near-infrared spectroscopy (fNIRS) | Forehead infrared (IR) light sources and detectors to measure prefrontal blood oxygenation | Bilateral cortical oxygenation in the prefrontal cortex |

Findings from the HPTF studies have been summarized in a series of reports and articles (Reinerman-Jones & Mercado, 2014; Reinerman-Jones et al., 2016, 2018, 2019). A full summary is beyond the present scope, but one consistent theme is that the three different tasks elicit different workload responses. Workload on multiple metrics tends to be highest on the detection task, consistent with human factors research on sustained attention tasks (Warm, Parasuraman & Matthews, 2008). Studies using novice samples have provided the statistical power required to define workload responses accurately. Conducting similar work with operators would be impractical due to cost and time limitations; it is simply not feasible to have 60-75 operators participate in a 3- to 4-hour study on a routine basis. Additional studies have confirmed that workload factors generalize to experienced populations, ensuring that findings are relevant to operational practice. Experienced participants include both well-practiced HPTF researchers (Leis et al., 2014) and former NPP operators tested at a simulator at NRC headquarters (Reinerman-Jones et al., 2018) and at the NRC Technical Training Center (TTC) in Chattanooga, Tennessee (Reinerman-Jones et al., 2019).
1.3 Order Effects and Workload Assessment

Analyses of workload and performance conducted thus far have averaged metrics over multiple task steps. However, workload impacts may vary depending on the ordering of tasks, due in part to difficulties associated with how operators transition between tasks and the demands of those transitions (e.g., Bowers, Christensen, & Eggemeier, 2014; Cox-Fuenzalida, 2007). One possible mechanism is adaptation. If a person becomes accustomed to a particular level of workload while performing a specific task, then the magnitude of a subsequent workload change may be accentuated (or attenuated) if the following task differs in its difficulty relative to the adapted task (see also Dickenson et al., 1993). Adapting to a change in task demand may itself be a source of workload. There may also be fatigue-like effects associated with multiple repetitions of similar steps; for example, the temporal effects on the detection task type are associated with loading on sustained attention and could indicate a vigilance decrement in performance (see Warm et al., 2008).
NPP operations involve multiple types of tasks performed by operators, such as detection, checking, and response implementation (see RIL 2022-11, Volume 1, Table 2-2 for task definitions and descriptions). The order of these task type combinations could be a factor influencing workload and performance. From an assessment methodology standpoint, order effects might influence the validity of assessment. For example, measured workload values for task A might vary according to whether it follows task B or task C. Accordingly, whether task A is judged a potential overload and error vulnerability factor from an HRA standpoint may depend on its position in the task sequence. Such a conclusion would require fine-grained and potentially lengthy assessments of operator tasks to accommodate order effects. Similar considerations apply to the impact of multiple repetitions of steps; a task may only elicit excessive workload if performed toward the end of the sequence.
1.4 Aims

The aim of the research reported here was to analyze existing data sets to investigate possible order effects. Data from four studies were utilized for this purpose: two using novice samples (Studies 1 and 2) and two using former operator samples (Studies 3 and 4).³ Studies 1-3 used a common scenario based on the EOP for loss of all alternating current power, executed using the GSE Generic PWR simulator. The scenario was presented using a digital, part-task simulator, which allowed the number of each of the three task types to be equated using a blocking method for experimental control purposes. Three different orderings were used, allowing tests of the impacts of certain task orders. Study 4 was conducted in a full-scale, full-scope simulator environment that reproduced both the physical environment and the underlying physics of a real plant and its response. It used a more realistic but also more complex sequence of steps and execution of a full scenario.
2 SUMMARY OF METHODS FOR STUDIES 1-4

2.1 Method: Studies 1-3

Full details of the methods for these studies are provided in the reports already delivered to the NRC (Reinerman-Jones & Mercado, 2014; Reinerman-Jones et al., 2016, 2018, 2019) and in RIL 2022-11 Volumes 1 and 2 (Hughes et al., 2023). Here, we provide an overview only, with a focus on the investigation pertaining to effects of task order.
2.1.1 Study 1

This study aimed to establish the feasibility of a novel methodology in the nuclear domain. It used novice participants to perform common NPP operator tasks in a simplified desktop-based simulated environment. Digitized analog control room panels were presented on two 24-inch (16:10 aspect ratio) Ultra Extended Graphics Array (UXGA) monitors; because not all the controls could fit in the display area, participants had to use the mouse and scroll wheel to view all the controls. Task performance required mouse and keyboard inputs. Participants were 81 students (45 males, 36 females; mean age (Mage) = 21, standard deviation (SDage) = 4.11)⁴ trained to an acceptable level of proficiency prior to the main workload assessment in the simulated EOP. The study confirmed that, of the three tasks, the detection task imposed the highest workload on multiple metrics, evident in both subjective and objective measures, including higher NASA-TLX scores, higher spatial attentive and temporal workload, higher regional brain oxygenation (measured by fNIRS), and less accurate communication performance. Some specific workload indices showed differing trends but, in general, the convergence between workload and performance data confirmed the HRA relevance of the assessment.
³ Note that in RIL 2022-11 Volume 1, Studies 1 and 2 are analyzed together, but they are treated as separate studies for the purpose of this reanalysis.
⁴ These statistics describe the demographics (e.g., sex and age) of the sample population. The mean age of participants was 21, and the standard deviation of age was 4.11.
2.1.2 Study 2

The second study used a similar design, with the aims of testing generalization of findings to a touchscreen interface and identifying differences in workload and performance between desktop and touchscreen interfaces. The touchscreen interface consisted of eight 27-inch Wide Quad High Definition (WQHD) touchscreen monitors arranged in a grid (two high by four wide). The interface displayed the instrumentation and control panel in its entirety (i.e., removing the need for scrolling and zooming), but the large interface required participants to stand and move laterally in order to visually scan and interact with it. Seventy-one participants (40 males, 31 females; Mage = 20.15, SDage = 2.65) from the UCF student pool participated. Participants attended in pairs, with the two participants acting as ROs and an experimenter serving as the SRO. Study 2 confirmed that task type influenced multiple workload metrics and performance when using a touchscreen interface. Task type effects were comparable to those found for the desktop interface, with the detection task tending to produce the highest workload response across multiple metrics. Some generally minor differences in detail in task type effects were found. Results also showed some differences in workload imposed by the two interfaces, depending on the metric examined. Findings suggested some of the costs and benefits of introducing touchscreens to the MCR as part of modernization efforts.
2.1.3 Study 3

The third study used a similar design with a formerly licensed operator sample (N = 18; 14 males, 4 females; Mage = 45.94, SDage = 10.63). Participants had operational experience working in a PWR or BWR MCR in either the commercial power generation or naval nuclear power generation domains. The study aimed to determine whether comparable workload findings would be obtained from an expert sample, relative to the results from the novice samples in Studies 1 and 2. Experimental sessions were conducted in a mock MCR at NRC headquarters in Rockville, Maryland. A Generic Pressurized Water Reactor (GPWR) NPP MCR simulator was configured for a crew of three operators, including an SRO and two ROs. A touchscreen interface was used, comprising four 27-inch touch monitors arranged two high by two wide. The study also distinguished former operators performing in RO1 and RO2 roles. Findings showed task type differences broadly comparable to those demonstrated in the first two studies, suggestive of highest workload on the detection task. The study also demonstrated workload differences between RO1 and RO2 on some metrics.
2.1.4 Experimental Scenario and Task Order (Studies 1-3)

The experimental scenario consisted of common NPP MCR operating procedure tasks: checking (C), detection (D), and response implementation (RI). Tasks were composed of individual steps, e.g., one checking operation. There were twelve steps in the experimental scenario, grouped by task type (4 checking steps, 4 detection steps, and 4 response implementation steps). The order of the task type blocks was counterbalanced across participants (Table 3). The task types were only partially counterbalanced to create scenarios because the tasks of checking and response implementation are directly linked, such that checking always occurs before response implementation in real NPP operations. Task order must thus be constrained to maintain external validity.
Table 3. Task orderings for three experimental scenario conditions.

| Order | Condition 1 | Condition 2 | Condition 3 |
|---|---|---|---|
| 1 | C | D | C |
| 2 | C | D | C |
| 3 | C | D | C |
| 4 | C | D | C |
| 5 | RI | C | D |
| 6 | RI | C | D |
| 7 | RI | C | D |
| 8 | RI | C | D |
| 9 | D | RI | RI |
| 10 | D | RI | RI |
| 11 | D | RI | RI |
| 12 | D | RI | RI |

Given this scenario, we conducted planned comparisons in Analysis of Variance (ANOVA)⁵ to test the effect of the preceding task as permitted by the design. For example, checking is performed either at the beginning of the task sequence (Conditions 1 and 3) or following performance of detection tasks (Condition 2). The planned comparison then tests whether the mean workload metric for checking in Conditions 1 and 3 differs from the mean for the metric in Condition 2. If the test is significant, the implication is that the workload metric is influenced by the preceding task condition. On this basis, we could perform three planned comparisons:
- Checking following detection (CD) compared to checking at the beginning (C0): CD vs. C0
- Response implementation following checking (RIC) compared to response implementation following detection (RID): RIC vs. RID
- Detection following response implementation (DRI) compared to detection at the beginning (D0) and detection following checking (DC): DRI vs. D0 vs. DC

That is, order conditions for each task type were defined separately for the analyses. For the checking task type, Conditions 1 and 3 were defined as checking at the beginning, whereas Condition 2 was defined as checking following detection. Similarly, response implementation conditions were defined as response implementation following checking and response implementation following detection. Detection conditions were defined as detection at the beginning, detection following checking, and detection following response implementation.

⁵ Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among group means in a sample.
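To illustrate the logic of these planned comparisons, the sketch below runs a simplified two-group version of the CD vs. C0 comparison on synthetic data; the group sizes, means, and spreads are assumptions, not study values. The report's actual analyses were planned comparisons within ANOVA, which use the pooled error term across all conditions; the two-group test shown here is a simplification of that idea.

```python
# Simplified sketch with synthetic data: does checking-task workload differ
# when checking follows detection (Condition 2, "CD") versus when checking
# comes first (Conditions 1 and 3, "C0")?
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
cond1 = rng.normal(50, 10, 27)   # hypothetical workload scores, checking first
cond3 = rng.normal(50, 10, 27)   # hypothetical workload scores, checking first
cond2 = rng.normal(45, 10, 27)   # hypothetical workload scores, checking after D

c0 = np.concatenate([cond1, cond3])        # pool the two "checking first" groups
f_val, p_val = stats.f_oneway(c0, cond2)   # single-df comparison as a one-way test
dof = len(c0) + len(cond2) - 2
print(f"F(1,{dof}) = {f_val:.2f}, p = {p_val:.3f}")
```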
2.1.5 Study 4

The fourth study aimed to validate the HPTF methodology (lightweight simulator, novice population) on a full-scope simulator with formerly licensed operators. This validation study was conducted at the NRC TTC, on a Westinghouse 4-loop PWR analog control room simulator capable of simulating all of the underlying thermodynamics occurring in the real plant. Thirty former operators provided the sample (Mage = 55.47, SDage = 7.82). Similar to Study 3, Study 4 distinguished former operators performing in RO roles. In this case, roles were designated as RO and BOP (Balance of Plant). By contrast with Studies 1-3, the scenario followed a realistic EOP, without attempting to experimentally control the frequencies and orders of the different task types. Specifically, the experimental scenario was developed based on a generic version of an EOP for a Loss of All Alternating Current (AC) Power (ECA-0.0) scenario, modified for experimental use. The experimental procedure contained 69 steps supporting the three task types, i.e., checking, detection, and response implementation. There were 30 steps (16 checking, 5 detection, and 9 response implementation) for the RO and 39 steps (27 checking, 1 detection, and 11 response implementation) for the BOP. The number of steps was not balanced across roles or task types due to the nature of the original, realistic EOP, which requires steps to be taken in a prescribed sequence. In case the crew made an error or took alternative actions outside the scope of the experimental procedures, an original EOP was available to the SRO for use as a contingency plan. However, the contingency plan was never required in the actual experiments.
Overall, the study confirmed the generalizability of the results from the novice samples collected in Studies 1-3 and supported the feasibility of utilizing digital simulators to conduct research, identify safety concerns, and supplement operator training.
In comparison to the previous studies, Study 4 utilized a more realistic but also more complex sequence of task steps to replicate the execution of a full scenario; thus, the steps were not artificially grouped in task type blocks (see Table 3). This has implications for the order effects analysis, which will be discussed in subsequent sections.
3 GENERAL METHODS

3.1 Measures of Workload

For a full account of the measures and experimental design employed for this series of experiments, see RIL 2022-11, Volume 1.
3.1.1 Subjective Measures

3.1.1.1 NASA-TLX

The NASA-TLX (Hart & Staveland, 1988; Hart, 2006) is a multi-dimensional questionnaire. It was used to assess each participant's perceived workload. Subscales included mental demand, physical demand, temporal demand, effort, frustration, and performance. The NASA-TLX uses a 100-point sliding scale to rate each subscale. The average score of the six subscales provided a separate measure of global workload. Participants received a copy of the questionnaire with subscale definitions and completed the NASA-TLX at the end of each task type throughout the scenario.
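As a small illustration of the scoring just described (an unweighted average, assuming no subscale weighting), the sketch below computes a global workload score from six hypothetical subscale ratings.

```python
# A minimal sketch of the unweighted NASA-TLX global score described above:
# the mean of the six subscale ratings, each on a 100-point scale.
TLX_SUBSCALES = ("mental_demand", "physical_demand", "temporal_demand",
                 "performance", "effort", "frustration")

def tlx_global(ratings):
    """Average the six subscale ratings (0-100) into a global workload score."""
    return sum(ratings[s] for s in TLX_SUBSCALES) / len(TLX_SUBSCALES)

# Hypothetical ratings for one participant after a task block:
example = {"mental_demand": 70, "physical_demand": 20, "temporal_demand": 55,
           "performance": 40, "effort": 65, "frustration": 30}
print(tlx_global(example))  # approximately 46.67
```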
3.1.1.2 Instantaneous Self-Assessment (ISA)
The ISA (Hulbert, 1989; Jordan, 1992) was used to measure immediate subjective workload on a five-point Likert scale assessed during the performance of a task (Tattersall & Foord, 1996). Participants received a copy of the measure with definitions and completed the ISA halfway through each task type using a customized computer program that automatically activated an audio prompt containing the questionnaire. The audio prompt contained the phrase "please rate your workload" (Study 1) or "RO1 [RO2], please rate your workload" (Study 2), signaling participants to respond by writing down their rating on a sheet of paper.
3.1.1.3 Multiple Resource Questionnaire (MRQ)
The MRQ was used to characterize the nature of the mental processes used during each task (Boles & Adair, 2001). The items on the questionnaire were derived from factor analytic studies of lateralized processes (Boles, 1991, 1992, 1996, 2002). Participants received a copy of the scales, with definitions, and completed the MRQ at the end of each task type throughout the scenario. Boles (1996) indicated that the MRQ is most effective when only the target scales for the task are included. The following 14 of 17 scales were used in the HPTF studies:

- auditory emotional process
- auditory linguistic process
- manual process
- short-term memory process
- spatial attentive process
- spatial categorical process
- spatial concentrative process
- spatial emergent process
- spatial positional process
- spatial quantitative process
- visual lexical process
- visual phonetic process
- visual temporal process
- vocal process

3.1.2 Physiological Measures

3.1.2.1 Electroencephalogram (EEG)
The Advanced Brain Monitoring (ABM) B-Alert X10 system has nine EEG channels and one ECG channel (Figure 1). Electrodes were positioned following the international standard 10-20 system, at Fz, F3, F4, Cz, C3, C4, Pz, P3, and P4. The sampling rate was 256 Hz. Reference electrodes were placed on the mastoid bone. Power spectral density analysis techniques were used to analyze three standard bandwidths: theta (4-8 Hz), alpha (9-13 Hz), and beta (14-30 Hz) (Wilson, 2002). Each bandwidth was collected at all nine channels. Additional analyses comparing left and right hemispheres, as well as frontal, temporal, and parietal lobes, were conducted by collapsing across the nine electrode sites.
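To make the band-power computation concrete, the sketch below applies Welch's method to a synthetic 256-Hz signal and integrates the resulting power spectral density over the theta, alpha, and beta bands defined above. The window length and the white-noise stand-in signal are assumptions for illustration; the studies' actual processing pipeline is described in the cited reports.

```python
# Illustrative sketch: Welch PSD on a synthetic 256-Hz "EEG" channel, with
# band power obtained by summing the PSD over each frequency band.
import numpy as np
from scipy.signal import welch

FS = 256  # sampling rate (Hz), as in the B-Alert X10 recordings
BANDS = {"theta": (4, 8), "alpha": (9, 13), "beta": (14, 30)}

rng = np.random.default_rng(1)
signal = rng.standard_normal(FS * 60)  # white-noise stand-in for 60 s of EEG

freqs, psd = welch(signal, fs=FS, nperseg=FS * 2)  # 2-s windows, 0.5-Hz bins
df = freqs[1] - freqs[0]
for band, (lo, hi) in BANDS.items():
    mask = (freqs >= lo) & (freqs <= hi)
    band_power = psd[mask].sum() * df  # approximate integral of PSD over band
    print(f"{band}: {band_power:.4f}")
```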
Figure 1. ABM's B-Alert X10 EEG/ECG system

3.1.2.2 Transcranial Doppler (TCD)
The Spencer Technologies ST3 Digital Transcranial Doppler, model PMD150, was used to monitor CBFV of the middle cerebral artery in the left and right hemispheres through high pulse repetition frequency (Figure 2). The Marc 600 head frame set was used to hold the TCD probes in place.

Figure 2. Spencer Technologies ST3 Transcranial Doppler

3.1.2.3 Functional Near-Infrared Spectroscopy (fNIRS)
The Covidien INVOS Cerebral/Somatic Oximeter, model 5100C, was used to measure (hemodynamic) changes in oxygenated hemoglobin and deoxygenated hemoglobin in the prefrontal cortex of the left and right hemispheres (Ayaz et al., 2011; Chance et al., 1993) (Figure 3).

Figure 3. Functional near-infrared spectroscopy (fNIRS)

3.1.2.4 Electrocardiogram (ECG)
The Advanced Brain Monitoring B-Alert X10 system was used to monitor the ECG, sampling at 256 Hz. Single-lead electrodes were placed on the center of the right clavicle and on the lowest left rib (Figure 4). HR was computed from peak cardiac activity by measuring the interval between successive beats. The So and Chan QRS detection method was used to calculate IBI and HRV (Taylor et al., 2010). This approach maximizes the amplitude of the R-wave (Henelius et al., 2009).
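The sketch below shows, on hypothetical R-peak times, how the cardiac metrics described above follow from detected beats: IBI as the interval between successive R-peaks, HR from the IBI, and a common time-domain HRV index (SDNN, used here as an illustrative stand-in, since the report does not specify which HRV index the pipeline computed). R-peak detection itself (the So and Chan method in these studies) is assumed to have already been performed.

```python
# Simplified sketch: IBI, HR, and an illustrative time-domain HRV index (SDNN)
# computed from R-peak times assumed to be already detected (the studies used
# the So and Chan QRS detection method for that step).
import numpy as np

r_peaks = np.array([0.00, 0.82, 1.65, 2.51, 3.33, 4.12])  # hypothetical times (s)

ibi = np.diff(r_peaks) * 1000   # inter-beat intervals (ms)
hr = 60000.0 / ibi              # instantaneous heart rate (bpm)
sdnn = ibi.std(ddof=1)          # SDNN: standard deviation of IBIs (ms)

print(f"mean IBI = {ibi.mean():.0f} ms, mean HR = {hr.mean():.1f} bpm, "
      f"SDNN = {sdnn:.1f} ms")
```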
Figure 4. Electrode locations for the ECG system
4 RESULTS

4.1 Results: Studies 1-3

A series of one-way ANOVAs was run for each of the subjective, performance, and psychophysiological metrics to test the order effect based on task type conditions (checking, detection, and response implementation).
4.1.1 Study 1 Checking Conditions

Analyses of subjective and performance measures revealed that, when checking occurred at the beginning of trials, participants reported higher global mental workload for the checking task on the NASA-TLX (M⁶ = 50.98, SD⁷ = 24.39), F(1,78)⁸ = 6.06⁹, p < .05¹⁰, ηp² = .07¹¹,¹².

On the DSSQ, participants reported greater distress (M = 9.98, SD = 6.25), F(1,78) = 4.71, p < .05, ηp² = .10, and greater engagement (M = 21.26, SD = 5.24), F(1,78) = 8.47, p < .01, ηp² = .06, and they achieved a higher percentage of correct checking (M = 65.99, SD = 31.38), F(1,78) = 4.02, p < .05, ηp² = .05, when checking occurred first than when the checking tasks followed detection tasks (mental workload: M = 36.54, SD = 24.97; distress: M = 6.88, SD = 5.35; engagement: M = 17.50, SD = 5.76; percentage of correct checking: M = 50.73, SD = 32.91) (Figure 5).
None of the metrics from the MRQ and ISA, or other metrics from the NASA-TLX and DSSQ, showed significant differences for the checking planned comparison (checking at the beginning (C0) vs. checking following detection (CD)).
6 The mean (M) (average) of a data set is found by adding all numbers in the data set and then dividing by the number of values in the set.
7 Standard Deviation (SD) is the measure of dispersion of a set of data from its mean. It measures the absolute variability of a distribution.
8 In statistics, the number of degrees of freedom (the number in brackets after the F symbol) is the number of values in the final calculation of a statistic that are free to vary. The number of independent ways by which a dynamic system can move, without violating any constraint imposed on it, is called number of degrees of freedom.
9 An F-test (F) is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled.
10 In statistical testing, the p-value (p) is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis.
11 F(1,78) = 6.06, p < .05, ηp² = .07 is an ANOVA result reported in APA style: the F value with degrees of freedom in parentheses, the p value, and partial eta squared (effect size).
12 Partial eta squared (ηp²) is the ratio of the variance associated with an effect to the sum of that effect's variance and its associated error variance.
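Expressed as a formula (the standard definition, not specific to this report), where SS denotes a sum of squares from the ANOVA:

$$\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}$$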
Figure 5. Subjective metrics for two orderings of the checking task (Study 1).¹³
Analyses of the psychophysiological metrics revealed that participants' HRV decreased in the checking task block at the beginning (M = -6.78, SD = 18.66) and increased in the checking task block following detection tasks (M = 15.35, SD = 24.18), F(1,63) = 16.89, p < .01, ηp² = .21.
Oximeter results indicated a smaller decrease in oxygen saturation in the prefrontal cortex in the checking task block at the beginning (left: M = -.78, SD = 2.44; right: M = -.71, SD = 2.70) than in the checking task block following detection tasks (left: M = -2.96, SD = 2.65; right: M = -2.41, SD = 3.59); left: F(1,63) = 10.90, p < .01, ηp² = .15; right: F(1,63) = 4.63, p < .05, ηp² = .07. Taken together, the HRV and oximeter results suggest participants experienced greater workload when checking occurred as the first task block than when it followed detection. These results are consistent with the subjective assessments (NASA-TLX, DSSQ, MRQ, ISA) of workload.
Significant EEG differences came from the right hemisphere theta band and the parietal lobe alpha band. There was a greater increase in theta band activity when the checking task block occurred at the beginning than when it followed detection tasks, F(1,63) = 8.01, p < .01, ηp² = .11. There was a greater decrease in alpha band activity when the checking task block followed detection tasks than when it was at the beginning, F(1,63) = 10.55, p < .01, ηp² = .14 (Figure 6). Parietal alpha suppression in the presence of generalized prefrontal theta elevation can be indicative of fatigue, which can be associated with workload; however, a single physiological metric is not diagnostic of an overall order-related effect. Analyses of other psychophysiological metrics did not reveal significant differences.
13 In the figure, C0 stands for checking tasks at the beginning, and CD refers to checking tasks following detection tasks.
Figure 6. Physiological metrics for two orderings of the checking task (Study 1).
4.1.2 Study 1 Response Implementation Conditions
According to the subjective measure of the MRQ, participants reported higher mental process ratings in terms of short-term memory, F(1,78) = 5.70, p < .05, ηp² = .07; spatial categorical process, F(1,78) = 6.49, p < .05, ηp² = .08; and spatial emergent process, F(1,78) = 4.69, p < .05, ηp² = .06, in the response implementation tasks following checking tasks than in the response implementation tasks following detection tasks.
EEG data showed a greater increase in theta band activity in the occipital lobe when performing the response implementation tasks following detection tasks (M = 45.13, SD = 92.15) than when performing the response implementation tasks following checking tasks (M = 3.95, SD = 26.72), F(1,56) = 6.74, p < .05, ηp² = .11 (Figure 7). This pattern of EEG results makes sense because the location (occipital lobe), waveform (theta), and direction of change (increase) combined are indicative of workload associated with visual working memory. During response implementation following detection, participants must locate a control item and execute an action, which requires substantial additional visual working memory relative to other tasks. None of the other metrics revealed significant differences for the response implementation planned comparisons (response implementation following checking, response implementation following detection).
Figure 7. Subjective and physiological metrics for two orderings of the response implementation task (Study 1).
4.1.3 Study 1 Detection Conditions
No significant effect of detection conditions was revealed in Study 1.
4.1.4 Study 2 Checking Conditions
Participants reported higher workload ratings on the MRQ for the spatial attentive process, F(1,67) = 4.64, p < .05, ηp² = .07, and a greater level of engagement on the DSSQ, F(1,56) = 4.64, p < .05, ηp² = .07, in the checking tasks at the beginning than in the checking tasks following detection tasks.
In terms of performance, participants completed three-way communication more accurately during the checking tasks when they followed the detection tasks than when the checking tasks were at the beginning (Figure 8). Other metrics of subjective workload and performance were not significant.
Figure 8. Subjective and performance metrics for two orderings of the checking task (Study 2).
The only psychophysiological measures for the checking tasks to show significant effects were the EEG measures. Analyses of EEG data revealed significant effects for all power bands and regions (Figure 9; Tables 4 and 5). Taken together, the pattern of results is suggestive of an increase in workload when checking follows detection. The modest alpha suppression suggests high resource demand on visual information processing and overall sensory processing areas (parietal and occipital alpha, respectively; see Figure 9 and Table 4). However, despite controlling for the familywise error rate, there is still the possibility of Type I error, so care should be taken in interpreting the results of this analysis.
Table 4. Statistical tests for significant findings for Study 2 checking tasks for frontal, parietal, and occipital regions.

| Region | Alpha | Beta | Theta |
|---|---|---|---|
| Frontal | F(1,67) = 14.61, p < .01, ηp² = .18 | F(1,67) = 14.86, p < .01, ηp² = .18 | F(1,67) = 15.08, p < .01, ηp² = .18 |
| Parietal | F(1,67) = 12.14, p < .01, ηp² = .15 | F(1,67) = 6.449, p < .05, ηp² = .09 | F(1,67) = 9.32, p < .01, ηp² = .12 |
| Occipital | F(1,67) = 12.88, p < .01, ηp² = .16 | F(1,67) = 19.29, p < .01, ηp² = .22 | F(1,67) = 11.40, p < .01, ηp² = .15 |

Table 5. Statistical tests for significant findings for Study 2 checking tasks for left and right hemispheres.

| Hemisphere | Statistic |
|---|---|
| Left | F(1,67) = 19.72, p < .01, ηp² = .23 |
| Right | F(1,67) = 12.13, p < .01, ηp² = .15 |
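The report does not state which familywise correction was applied to this family of tests. The sketch below illustrates one common choice, Holm's step-down procedure, applied to a placeholder vector of p-values standing in for the EEG checking-task tests above (the exact p-values are not reported, only their thresholds).

```python
from statsmodels.stats.multitest import multipletests

# Placeholder p-values for the family of EEG checking-task tests in
# Tables 4-5 (illustrative only; the report gives thresholds, not values).
pvals = [0.001, 0.001, 0.001, 0.002, 0.013, 0.004,
         0.002, 0.001, 0.002, 0.001, 0.001]

# Holm's procedure controls the familywise error rate and is uniformly
# more powerful than plain Bonferroni correction.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(list(zip(p_adj.round(3), reject)))
```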
Figure 9. EEG metrics for two orderings of the checking task (Study 2).
4.1.5 Study 2 Response Implementation Conditions
No significant effect of response implementation conditions was revealed in Study 2.
4.1.6 Study 2 Detection Conditions
A similar trend in the ISA data was observed when detection was ordered first, followed by checking or response implementation, F(2,66) = 3.94, p < .05, ηp² = .11. There were also significant effects in the EEG theta band in the frontal lobe, F(1,67) = 5.81, p < .01, ηp² = .15, and in overall theta activity in the left hemisphere, F(1,67) = 4.51, p < .05, ηp² = .01 (Figure 10). Alpha band activity in the left hemisphere increased when the detection tasks occurred at the beginning (M = 15.53, SD = 26.89) but decreased in the detection tasks following checking tasks (M = -8.06, SD = 30.82) and in the detection tasks following the response implementation tasks (M = -8.35, SD = 17.70), F(1,67) = 6.53, p < .01, ηp² = .17, suggesting a reversal of the previously observed workload finding based on a slightly different ordering of tasks. Overall, these results suggest an instability in task order effect trends.
Figure 10. Subjective and EEG metrics for three orderings of the detection task (Study 2).
4.1.7 Study 3 Checking Conditions
Although the sample size for Study 3 was small, there were some significant findings. The MRQ subscales of spatial concentrative process workload, F(1,7) = 9.83, p < .05, ηp² = .58, and spatial categorical process workload, F(1,7) = 5.93, p < .05, ηp² = .46, and the associated cerebral blood flow velocity in the left hemisphere, F(1,5) = 20.18, p < .01, ηp² = .08 (Figure 11), were significant. No other metrics were significant.
Figure 11. Subjective and TCD metrics for two orderings of the checking task (Study 3).
4.1.8 Study 3 Response Implementation Conditions
No significant effect of response implementation conditions was revealed in Study 3.
4.1.9 Study 3 Detection Conditions
Participants' physiological responses indicated increased IBI, F(2,4) = 7.98, p < .05, ηp² = .80, and a greater increase in oxygen saturation in the left prefrontal cortex, F(2,4) = 11.24, p < .05, ηp² = .85, in the detection tasks performed at the beginning. However, the EEG beta band in the frontal lobe showed the greatest increase in the detection tasks following response implementation tasks, F(2,4) = 83.10, p < .01, ηp² = .98 (Figure 12).
Figure 12. Physiological metrics for three orderings of the detection task (Study 3).
4.2 Results Summary: Studies 1-3
The re-analysis of Studies 1-3 revealed good convergence between the subjective workload measures, particularly the MRQ and NASA-TLX, and EEG. There was a consistent pattern of elevated occipital and parietal beta and theta with moderate to strong alpha suppression, consistent with expected task demands and reported workload. For example, stronger occipital alpha suppression (relative to baseline) with elevated occipital and parietal beta during a visual detection task is diagnostic of high visual workload and is corroborated by subjective ratings.
The absence of strong order effects in the re-analysis is not surprising. Across the cognitive and perceptual literature, order effects tend to be small and are often tied to phenomena such as priming and bias. In complex tasks with a logical, proceduralized order, a classical order effect would be less likely to emerge. Example cases in which task order could be a relevant performance shaping factor in a human reliability analysis are discussed in the general discussion.
4.3 Procedure Step Level Order Effects: Studies 1-3
We also investigated order effects on a more fine-grained basis by examining changes in workload on a step-by-step basis for each of the three tasks: checking (C), detection (D), and response implementation (RI). One-factor repeated-measures ANOVAs were run for each metric and task, with four levels of step, to determine whether metric values changed across steps. This analysis is limited in that the data sampling intervals for the checking and response implementation tasks are very short (around a minute); step duration for detection was five minutes. EEG is expected to remain reliable over short durations. As in the previous analyses, crossing experiments, tasks, and workload metrics produces numerous analyses, raising the likelihood of chance findings and Type I errors.
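The fractional degrees of freedom reported in the subsections below (e.g., F(2.19, 151.25)) indicate that a sphericity correction was applied. A sketch of how such a step-level analysis could be run with the pingouin library follows; the data file and column names are hypothetical.

```python
import pandas as pd
import pingouin as pg

# Long-format data with one row per participant x step; the file and
# column names ('participant', 'step', 'hrv') are hypothetical.
df = pd.read_csv("hrv_steps.csv")

# One-factor repeated-measures ANOVA over the four steps (s1-s4).
# correction=True applies the Greenhouse-Geisser sphericity adjustment,
# which produces the fractional degrees of freedom seen in the text.
aov = pg.rm_anova(data=df, dv="hrv", within="step",
                  subject="participant", correction=True,
                  effsize="np2")  # partial eta squared, as reported
print(aov)
```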
Table 6. Summary table for significant tests of temporal order (step) effects.
| Study / Task | ECG | fNIRS | TCD | EEG |
|---|---|---|---|---|
| Study 1, C | HRV** | L* | | -F*/P**/L*; -P*/O**/R** |
| Study 1, RI | HR**, HRV** | L*, R* | R* | -P*; -L* |
| Study 1, D | | L**, R** | L**, R** | -P* |
| Study 2, C | HR**, HRV**, IBI** | L*, R** | L*, R** | -F**/O**/L**; -F**/P**/L* |
| Study 2, RI | HR*, HRV**, IBI* | L**, R** | L**, R** | -F**/P**/L** |
| Study 2, D | HR**, HRV**, IBI** | L**, R** | L**, R** | -F**/P**/O*/L**/R*; -F**/P**/O**/L**/R**; -F* |
| Study 3, C | | L**, R* | | -O* |
| Study 3, RI | | R** | | -F**/P*/O**/L**/R** |
| Study 3, D | | L*, R* | L* | |
Note: *p < .05, **p < .01; replicated effects in bold.
Due to these limitations, the analyses are presented as follows. First, we summarize significant effects from all analyses (Table 6). Then, we identify the effects that replicate across at least two studies as meeting the criteria for further examination. Effects significant in only one experiment may reflect either chance findings or idiosyncratic features of the study concerned; in either case, the practical implications are minor at most. For ECG, fNIRS, and TCD, we focused primarily on the detection task, given its longer sampling interval, but we checked whether effects on this task replicated for the other two tasks. For EEG, we considered evidence from all three tasks equally.
4.3.1 ECG
Analyses of HRV for response implementation tasks revealed significant order effects on a step-by-step basis in Study 1, F(2.19,151.25) = 10.07, p < .01, ηp² = .13, and Study 2, F(2.41,163.76) = 10.07, p < .01, ηp² = .20. Generally, HRV decreased over time. Although not significant, a similar trend was observed in Study 3, which had a small sample size (Figure 13). Analyses of HR for the response implementation tasks were also significant in Study 1, F(2.46,178.22) = 4.50, p < .01, ηp² = .06, and Study 2, F(1.91,130.04) = 3.44, p < .05, ηp² = .05 (Figure 14). However, the trends were not consistent between Studies 1 and 2, likely because these studies were underpowered, as indicated by the small effect sizes and F values. Moreover, the nonsignificant trend suggested by Study 3 was not in line with the effect in Study 1 or 2.
Besides the response implementation task, analyses of HRV also revealed significant effects for the checking task steps in Study 1, F(2.48,151.25) = 10.07, p < .01, ηp² = .13, and Study 2, F(3,201) = 3.67, p < .05, ηp² = .05 (Figure 15). However, the trends were inconsistent between Studies 1 and 2. The nonsignificant effect in Study 3 appeared similar to the effect observed in Study 2, which utilized a simulator with the same touchscreen interface.
IBI indicated significant effects in all three task types in Study 2, but IBI may not be a sensitive metric, given that the significant effects in Study 2 were not replicated in the other studies. Overall, the ECG metrics indicate an increase in workload over time: the longer the participants performed the tasks (both response implementation and checking), the higher the physiological workload.
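The step-by-step figures that follow plot each metric as a percentage change from baseline. A minimal sketch of that transformation, assuming a single resting-baseline value per participant (the report's exact baseline window is not restated here), is:

```python
import numpy as np

def pct_change_from_baseline(step_values, baseline):
    """Express a psychophysiological metric at each task step as a
    percentage change from a resting baseline, as plotted in the
    step-by-step figures below (illustrative sketch)."""
    return 100.0 * (np.asarray(step_values, dtype=float) - baseline) / baseline

# e.g., HRV at steps RIs1-RIs4 relative to a hypothetical baseline of 40.0:
# pct = pct_change_from_baseline([42.0, 38.5, 35.1, 33.0], baseline=40.0)
```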
Figure 13. Step-by-step changes in HRV for response implementation.
Figure 14. Step-by-step changes in HR for response implementation.
Figure 15. Step-by-step changes in HRV for checking.
4.3.2 fNIRS
The analyses of fNIRS data revealed a significant temporal effect on oxygen saturation in both the left and right prefrontal cortex during detection tasks across all three studies (Figure 16). In Study 1, oxygen saturation generally decreased toward the last detection step in the left prefrontal cortex, F(2.51,175.88) = 12.05, p < .01, ηp² = .19, and in the right prefrontal cortex, F(2.53,176.78) = 11.52, p < .01, ηp² = .14. In Study 2, oxygen saturation decreased toward the last detection step but fluctuated during the middle steps in the left prefrontal cortex, F(1.92,120.89) = 6.03, p < .01, ηp² = .09, and in the right prefrontal cortex, F(2.14,134.72) = 7.34, p < .01, ηp² = .10. In Study 3, the trends were similar to those revealed in Study 2 in the left prefrontal cortex, F(3,18) = 4.40, p < .05, ηp² = .42, and in the right prefrontal cortex, F(3,18) = 3.21, p < .05, ηp² = .35. This trend of decreased oxygen saturation toward the final step of the detection tasks indicated a decrement in sustained attention and distraction during a string of detection tasks. In addition, significant changes in oxygen saturation were also observed in the checking and response implementation task types across the three studies (see Table 6). However, the effects did not converge toward the same trend. The checking and response implementation tasks typically required less sustained attention due to the nature of the tasks, so concentration may play a less important role in the change of oxygen saturation during those tasks; this may explain why the effects did not converge.
Figure 16. Step-by-step changes in oxygen saturation in each hemisphere for detection.
4.3.3 TCD
The analyses of TCD data revealed a similar temporal effect, with a pattern of generally decreased cerebral blood flow velocity (CBFV) in both the left and right hemispheres during detection tasks across all three studies (Figure 17). In Study 1, CBFV generally decreased during the detection tasks in the left hemisphere, F(2.75,192.24) = 4.93, p < .01, ηp² = .07, and in the right hemisphere, F(2.49,174.14) = 11.33, p < .01, ηp² = .14. In Study 2, the decrease in CBFV was more notable toward the last detection step in the left hemisphere, F(2.46,154.97) = 34.50, p < .01, ηp² = .35, and in the right hemisphere, F(2.15,135.17) = 25.53, p < .01, ηp² = .29. In Study 3, the change in CBFV was significant only in the left hemisphere, F(3,18) = 3.57, p < .05, ηp² = .37. The consistent pattern of decreased CBFV, especially in the right hemisphere, suggested a loss of vigilance and alertness during detection tasks over time. In addition, significant changes in CBFV were also observed in the checking and response implementation task types across the first two studies (see Table 6), but the effects did not converge toward the same trend. Checking and response implementation tasks were transient and completed in a short time, and TCD data may be easily affected by other physiological responses, introducing random error.
Figure 17. Step-by-step changes in cerebral blood flow velocity in each hemisphere for detection.
4.3.4 EEG
Within the three EEG bandwidths, theta (4-8 Hz), alpha (9-13 Hz), and beta (14-30 Hz), beta showed the most notable changes. Analysis of right hemisphere beta indicated a generally increasing level during the detection tasks, F(2.17,143.35) = 24.32, p < .01, ηp² = .27 (Figure 18). The increased beta may suggest an increase in cognitive activity during detection tasks over time. However, the trends observed in Studies 2 and 3 were not in line with the increasing trend in Study 1: beta power showed an irregular temporal trend in Study 2 and a tendency to decrease in Study 3. Moreover, the significant effects revealed in checking and response implementation tasks were not consistent across the three studies either. This inconsistency makes the results difficult to interpret. Perhaps multiple types of cognitive processing were involved in the complex MCR tasks, and the dominant cognitive load may differ across task types or task steps.
Figure 18. Step-by-step changes in right hemisphere EEG beta power for detection.
4.4 Order Effect: Study 4
One-factor repeated-measures ANOVAs were run to test the task type order effect using different psychophysiological measures as indices. We identified two comparison groups between the two operator roles: we could compare responses to checking tasks performed at the beginning versus checking tasks following response implementation tasks, as well as the same string of checking steps performed earlier versus later in the scenario, for operators in both RO and BOP roles. Overall, none of the twenty-two selected metrics from ECG, EEG, TCD, or fNIRS suggested any significant effect. Unsurprisingly, these non-significant results are consistent with the generally weak and inconsistent order effects revealed in the first three studies. Taken together, this suggests that task order is not a major driver of workload effects; however, task type may moderate the impact of other factors on workload. For example, the data from Studies 1-3 suggest that detection elicits the highest demand, both in terms of observable neurocognitive demand and the participants' subjective experience of workload. The visual channel is clearly taxed by detection (and, to a lesser extent, checking) tasks.
Given the size of most control rooms (digital or analog), the visual field can easily be fully occupied by a limited amount of visual material, making relevant changes in parameters easy to miss. Operators also have to invest substantial cognitive effort in repeated transitions between global visual processing (board scanning) and local visual processing (parameter detection and recognition). These kinds of effortful visual activities are likely to increase in frequency and complexity as the number of near and far visual field navigation tasks increases in digital control rooms that have implemented a workstation and overview display design. Thus, while the order of tasking may have only a weak influence, the nature of the tasks continues to be an important performance moderator.
5 DISCUSSION
On the basis of previous work conducted at the UCF HPTF and at NRC sites, we have advocated a multivariate approach to workload assessment. Such an approach can support HRA by identifying task configurations in which workload may be sufficiently high to increase error probabilities. For workload indices to be diagnostic of operator performance, their sensitivity to workload must generalize across different contexts. Previous work has suggested that key findings, such as task type differences, are sufficiently robust to generalize across novice and expert samples and across different types of plant simulation (Reinerman-Jones et al., 2019). The current report re-analyzed the data from studies of novices and experts to examine another contextual factor: the sequential placement of the task within an ordered task sequence.
Previous studies of workload history (e.g., Cox-Fuenzalida, 2007) suggest that workload may be sensitive to order effects.
In general, the current analyses did not support previous findings of workload sensitivity to order effects. There was no statistical evidence from Studies 1-3 of strong order effects or of effects that generalized across studies. In Studies 1-3, order effects for metrics captured from the response implementation and detection tasks did not exceed chance levels (see Table 4).
Similarly, in Study 4, there were no significant order effects in the analyses of checking tasks performed by experienced operators using an analog, full-scope, full-scale simulator (note that the design of this study precluded analyses of detection and response implementation tasks). A somewhat higher incidence of significant effects occurred for the checking tasks in Studies 1-3, especially in Study 2. The fine-grained analysis of step-by-step order effects performed for these studies also identified some temporal effects that were significant for multiple studies. We discuss these effects next.
5.1 Order Effects for the Checking Task
Study 2 showed various significant effects for EEG metrics from the checking task across multiple electrode sites and spectral frequency bands. EEG response was higher when the checking task followed detection (CD), relative to performance of checking as the first task in the sequence (C0). These findings are equivocal in relation to workload implications. Elevated theta and beta response for the CD condition suggests increased workload relative to C0, but elevated alpha suggests loss of alertness and task-directed effort. A tentative explanation is that, because of its vigilance requirement, the detection task induced a state of mild cognitive fatigue that carried over to the following checking task. Attentional resource depletion may have elevated the mental demands of checking, expressed in theta and beta, and fatigue may have increased alpha. However, this effect was not evident in the ECG, fNIRS, or TCD data from Study 2, suggesting that it was minor. Order effects for the detection task were also found in the Study 2 EEG data, but the direction of the effect was reversed: EEG theta and alpha activity were higher when the detection task was performed first, relative to detection following checking or response implementation. Furthermore, there was no generalization of effects to Study 1 or Study 3. The order effect may thus be unique to a novice sample working with a touchscreen and of little practical significance.
In all three studies, a small number of order effects were found for subjective response to the checking task. For the most part, these effects were unique to a single study, suggesting that they reflect either chance (Type I error) or idiosyncratic features of the study concerned. The one exception is the order effect for subjective task engagement, which replicated across Studies 1 and 2: engagement was higher for C0 than for CD. One explanation is that exposure to the novel simulated environment is somewhat motivating and challenging for novice participants initially, but engagement declines after the somewhat monotonous detection task. A small number of other subjective measures showed a similar trend, such as NASA-TLX mental demand in Study 1, but none of these measures showed a consistent effect across the first two studies. In addition, no such trends were evident in the operator samples, who had greater familiarity with NPP operations in simulator environments. Thus, there appears to be minimal operational relevance to the finding.
5.2 Task Step Order Effects
Analyses of changes in psychophysiological response across the sequences of task steps showed a higher incidence of significant effects than the analyses of task-based order effects (see Table 6). However, closer inspection of the results showed considerable inconsistency in significant effects for the same metric across studies. This inconsistency may, in part, have reflected the checking and response implementation time intervals being too short for reliable metric assessment for all sensors except EEG.
There was one set of findings for the detection task that was quite robust across Studies 1-3, which included both novices and former operators (tested on a lightweight simulator). All three studies showed a significant temporal decline in CBFV in both hemispheres of the brain (Figure 17). This result is consistent with the vigilance literature, which shows that decline in CBFV typically parallels the vigilance decrement in performance (Warm, Matthews & Parasuraman, 2009). Analyses of fNIRS data also showed significant step effects in each study, but the pattern of change was less consistent. Study 1 (novices) showed a decline in blood oxygenation over time, resembling the CBFV effect. Studies 2 (novices) and 3 (former operators) showed a consistent but more irregular temporal trend; both showed declining blood oxygenation at the last step, which might be an indication of loss of vigilance. Significant effects on EEG beta power were also found in all three studies, but each showed a different temporal trend, and the results appear highly idiosyncratic.
Given the short time intervals, findings from the checking and response implementation tasks should be treated with considerable caution. However, a consistent tendency toward higher HRV at the first step compared to later ones was found in the response implementation task in all three studies. There were also significant effects on mean HR, but these were specific to each individual study. The HRV effect is suggestive of lower workload in the initial step, but this interpretation is not corroborated by other metrics. It also failed to generalize to the checking task, where significant effects on HRV were found but varied from study to study.
6 CONCLUSION
6.1 Workload Assessment
It is important to be aware of potential order effects in workload assessment and their sources. However, the current data do not suggest that order effects pose a major threat to the validity of assessment in the NPP context. The incidence of significant effects was low, and they showed limited replicability across the first three studies. Thus, the order of task types appears to be, at most, a minor influence on workload in NPP operations. The weakness of order effects suggests that the methodology for workload assessment is likely to be robust across different operational scenarios. Additionally, no order effects were found in the more realistic Study 4 scenario. Given the modest sample sizes, it would be desirable to have more data from studies using operator samples, although the current findings do not provide grounds for expecting stronger evidence for order effects if this were done.
Two sets of findings may have some implications for workload assessment. First, analyses of the checking task suggested that novice participants may be more engaged if the task is the first one in the sequence. There was no evidence for this effect in the operator sample of Study 3. Nevertheless, if the plant technology, main control room layout, simulated control room, or scenario is novel for operators, it may be worth taking sufficient time to familiarize them with it prior to workload assessment. Second, vigilance-like decrements across successive detection tasks were found using TCD and, to some extent, fNIRS. This finding suggests that there may be a specific operator vulnerability associated with repeated detection tasks, which workload assessment methodology should accommodate. In general, though, workload assessments using all subjective and physiological measures for NPP MCR operators appear to be robust across different task orderings.
6.2 Implications for Operations and Operator Workload During MCR Tasks
As in Studies 1-3, the incidence of significant effects in the operator data was low and showed poor replicability, and no order effects were found in the more realistic Study 4 scenario. Thus, the order of task types appears to be, at most, a minor influence on workload in NPP operations, and the workload assessment methodology is likely to be robust across different operational scenarios. Given the modest sample sizes, more data from studies using operator samples would be desirable, although the current findings do not provide grounds for expecting stronger evidence for order effects if this were done.
6.3 Implications for HPTF Methodology
The re-analysis presented here highlights one key advantage of the HPTF methodology: it enables a granular evaluation of tasks, their order, their execution, and how these factors together influence human performance. These results, along with others from previous HPTF studies, further refine the methodological techniques and will contribute to the technical basis supporting updates to NUREG-0711, Rev. 3, particularly in the areas of Task Analysis, Function Allocation, Staffing and Qualifications, and Human Factors Verification and Validation.
6.4 Implications for HRA
One step during a Human Reliability Analysis is Human Error Probability (HEP) quantification, which generates an estimate of the HEP for important (i.e., probabilistic risk assessment creditable) human actions (Xing, Chang, & DeJesus, 2020). To complete quantification, the analyst must understand the dependency structure between a failure event and each of the causal factors (Paglioni & Groth, 2022). Task order is among the potential performance influencing factors (PIFs) that can lead to an error event. For example, in Study 1, workload and stress were higher and performance was more accurate when the checking task came before the detection task. This suggests that there may be an optimum level of workload for minimizing performance errors and that reaching this level of workload could be moderated by task order.
An important taxonomical point about the human factors studies in the HPTF and many of the existing HRA models is that the HPTF uses O'Hara and Higgins' (2010) task classification framework definitions of checking, detection, and response implementation, but these terms do not have a 1:1 mapping to task definitions used in some HRA analyses. For example, within the Integrated Human Event Analysis System (IDHEAS), detection is a macrocognitive function that includes both the HPTF detection and checking tasks. Future research focused on aligning these constructs may be needed for experimental human factors research to be characterized or quantified using existing HRA methods.
The mixed results of the order effects physiological data analysis suggest that, while these data are a potentially useful source of additional information about the underlying cognitive mechanisms of important human actions, more research is needed to assess their stability and applicability to HRA (see Alvarenga & Melo, 2019, and NUREG-2114 (Whaley, 2016) for discussion of mapping between cognitive mechanisms and macrocognitive functions). Relatedly, there is limited consistency in the results from Studies 1-3. This limited replicability has implications for the re-use potential of these data within an HRA context, but it also points to a potentially broader issue in HRA data sourcing: when simulator data are re-analyzed for quantification purposes, the analyst must determine whether there is value in situating the findings within the context of other datasets or whether the data only have value as standalone events.
Regarding enhancement of NRC HRA models, research on workload can enhance the formal process of HRA (Tran et al., 2007). Through a clearer understanding of the sensitivities of workload measures, these results help to determine how changes in task and scenario composition may affect vulnerability to human error. In particular, an enhanced understanding of the workload PIF and of the task factors associated with repeated detection tasks, as mentioned earlier, will help to better inform HRA methods. In addition to physiological workload measures being a potentially useful technique to enhance HRA data collection, the results of this research program can be used to inform dependencies between human actions. These same results could also inform the design and evaluation of the conduct of operations for current as well as new designs and concepts of operations (e.g., small modular reactors). For instance, future research can address how tasks should be allocated or distributed among crew members so as not to overload one individual.
7 REFERENCES
Boles, D. B., & Adair, L. P. (2001). The multiple resources questionnaire (MRQ). Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 45(25), 1790-1794.
Boring, R. L. (2012). Fifty years of THERP and human reliability analysis (No. INL/CON-25623). Idaho Falls, ID: Idaho National Laboratory (INL).
Bowers, M. A., Christensen, J. C., & Eggemeier, F. T. (2014, September). The effects of workload transitions in a multitasking environment. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 58, 220-224.
Cox-Fuenzalida, L. E. (2007). Effect of workload history on task performance. Human Factors, 49, 277-291.
Hancock, P. A., & Matthews, G. (2019). Workload and performance: Associations, insensitivities, and dissociations. Human Factors, 61, 374-392.
Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In P. A. Hancock, & N. Meshkati (Eds.), Advances in Psychology (Vol. 52, pp. 139-183). Amsterdam, the Netherlands: North-Holland.
Henelius, A., Hirvonen, K., Holm, A., Korpela, J., & Muller, K. (2009). Mental workload classification using heart rate metrics. In Engineering in Medicine and Biology Society, 2009: Proceedings of the Annual International Conference of the IEEE (pp. 1836-1839).
New York, NY: IEEE.
Hughes, N., D'Agostino, A., Dickerson, K., Matthews, G., Reinerman-Jones, L., Barber, D., Mercado, J., Harris, J., & Lin, J. (2022). Human Performance Test Facility (HPTF): Systematic Human Performance Data Collection Using Nuclear Power Plant Simulator: A Methodology - Volume 1 (RIL-2022-11). Washington, DC: U.S. Nuclear Regulatory Commission.
Hughes, N., D'Agostino, A., Dickerson, K., Matthews, G., Reinerman-Jones, L., Barber, D., Mercado, J., Harris, J., & Lin, J. (2022). Human Performance Test Facility (HPTF): Comparing Operator Workload and Performance Between Digitized and Analog Simulated Environments - Volume 2 (RIL-2022-11). Washington, DC: U.S. Nuclear Regulatory Commission.
Hughes, N., Dickerson, K., Lin, J., Matthews, G., & Barber, D. (2022). Human Performance Test Facility (HPTF) - Supplementary Exploratory Analyses of Sensitivity of Workload Measures - Volume 3 (RIL-2022-11). Washington, DC: U.S. Nuclear Regulatory Commission.
Hughes, N., D'Agostino, A., & Reinerman-Jones, L. (2017). The NRC's Human Performance Test Facility: Methodological considerations for developing a research program for systematic data collection using an NPP simulator. Proceedings of the Enlarged Halden Programme Group (EHPG) meeting, September 2017, Lillehammer, Norway.
Leis, R., Reinerman-Jones, L., Mercado, J., Barber, D., & Sollins, B. (2014). Workload from nuclear power plant task types across repeated sessions. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 58, 210-214.
Matthews, G., Campbell, S. E., Falconer, S., Joyner, L. A., Huggins, J., Gilliland, K.,... & Warm, J.
S. (2002). Fundamental dimensions of subjective state in performance settings: Task engagement, distress, and worry. Emotion, 2, 315-340.
Mosleh, A., & Chang, Y. H. (2004). Model-based human reliability analysis: Prospects and requirements. Reliability Engineering & System Safety, 83, 241-253.
O'Hara, J. M., & Higgins, J. C. (2010). Human-system interfaces to automatic systems: Review guidance and technical bases (BNL-91017-2010). Human factors of advanced reactors (NRC JCN Y-6529). Washington, DC: United States Nuclear Regulatory Commission.
O'Hara, J. M., Higgins, J. C., Fleger, S. A., & Pieringer, P. A. (2012). Human Factors Engineering Program Review Model (NUREG-0711, Rev. 3). Washington, DC: United States Nuclear Regulatory Commission.
O'Hara, J. M., Higgins, J. C., Brown, W. S., Fink, R., Persensky, J., Lewis, P., & Boggi, M. (2008).
Human factors considerations with respect to emerging technology in nuclear power plants (NUREG/CR-6947). Washington, DC: United States Nuclear Regulatory Commission.
Rasmussen, J. (1983). Skills, rules, and knowledge; signals, signs, and symbols, and other distinctions in human performance models. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(3), 257-266.
Reinerman-Jones, L. E., Guznov, S., Mercado, J., & D'Agostino, A. (2013). Developing methodology for experimentation using a nuclear power plant simulator. In D. D.
Schmorrow, & C. M. Fidopiastis (Eds.), Foundations of augmented cognition (pp. 181-188). Heidelberg, Germany: Springer.
Reinerman-Jones, L. E., Lin, J., Matthews, G., Barber, D., & Hughes, N. (2019). Human performance test facility experiment 4: Former operator workload and performance comparison between two simulated environments. Rockville, MD: United States Nuclear Regulatory Commission.
Reinerman-Jones, L. E., Matthews, G., Harris, J., Barber, D., Hughes, N., & D'Agostino, A. (2018). Human performance test facility experiment 3: Former operator workload and performance on three tasks in a simulated environment. Rockville, MD: United States Nuclear Regulatory Commission.
Reinerman-Jones, L. E., & Mercado, J. (2014). Human performance test facility task order 1 technical report (JCN # V621). Rockville, MD: United States Nuclear Regulatory Commission.
Reinerman-Jones, L. E., Teo, G., & Harris, J. (2016). Human performance test facility task order 1 technical report (JCN # V621). Rockville, MD: United States Nuclear Regulatory Commission.
Swain, A. D., & Guttmann, H. E. (1983). Handbook of human-reliability analysis with emphasis on nuclear power plant applications: Final report (NUREG/CR-1278). Albuquerque, NM: Sandia National Laboratories.
Tran, T.Q., Boring, R.L., Joe, J.C., & Griffith, C.D. (2007). Extracting and converting quantitative data into human error probabilities. Official Proceedings of the Joint 8th IEEE Conference on Human Factors and Power Plants and the 13th Annual Workshop on Human Performance/Root Cause/Trending/Operating Experience/Self Assessment, pp.164-169.
Warm, J. S., Matthews, G., & Parasuraman, R. (2009). Cerebral hemodynamics and vigilance performance. Military Psychology, 21(sup1), S75-S100.
Warm, J. S., Parasuraman, R., & Matthews, G. (2008). Vigilance requires hard mental work and is stressful. Human Factors, 50, 433-441.
Whaley, A. M. (2016). Cognitive basis for human reliability analysis (NUREG-2114). Washington, DC: U.S. Nuclear Regulatory Commission.
Xing, J., Chang, Y. J., & Segarra, J. D. (2020). The General Methodology of an Integrated Human Event Analysis System (IDHEAS-G) (NUREG-2198). Washington, DC: U.S. Nuclear Regulatory Commission.