ML20056F445

Requests That Encl Listed Documents Be Forwarded to PDR, Including Rater Orientation Guidebook, Team Interactions Skills Study,
Person / Time
Issue date: 08/16/1993
From: Gaddy C, Lewis P, Jonathan Montgomery
NRC OFFICE OF NUCLEAR REGULATORY RESEARCH (RES)
To:
NRC OFFICE OF INFORMATION RESOURCES MANAGEMENT (IRM)
References
NUDOCS 9308270242
Download: ML20056F445 (71)


Text

August 16, 1993

NOTE FOR: Document Control Desk (P1-37)

FROM: Paul Lewis (NLf4-316)
      Human Factors Branch
      Office of Research

Please send the attached documents to the Public Document Room.

The attached documents are:

o Montgomery, J. C., C. D. Gaddy, R. C. Lewis-Clapper, S. T. Hunt, C. W. Holmes, A. J. Spurgin, J. L. Toquam, A. Bramwell. April 1992. Team Skills Evaluation Criteria for Nuclear Power Plant Control Room Crews. Volume I. Working Draft. Pacific Northwest Laboratory, Richland, Washington.

o Rater Orientation Guidebook: Team Interactions Skills Study.

WORKING DRAFT

TEAM SKILLS EVALUATION CRITERIA FOR NUCLEAR POWER PLANT CONTROL ROOM CREWS

Volume I

J. C. Montgomery
C. D. Gaddy (a)
R. C. Lewis-Clapper (a)
S. T. Hunt
C. W. Holmes
A. J. Spurgin (b)
J. L. Toquam (c)
A. Bramwell (c)

April 1992

Pacific Northwest Laboratory
Richland, Washington 99352

LIMITED DISTRIBUTION

This document copy, since it is transmitted in advance of patent clearance, is made available in confidence solely for use in performance of work under contracts with the U.S. Department of Energy. This document is not to be published nor its contents otherwise disseminated or used for purposes other than specified above before patent approval for such release or use has been secured, upon request, from Patent Services, Pacific Northwest Laboratory, Richland, Washington 99352.

(a) General Physics Corporation, Columbia, MD
(b) Accident Prevention Group, San Diego, CA
(c) Battelle Human Affairs Research Center, Seattle, WA


TEAM SKILLS EVALUATION CRITERIA FOR NUCLEAR POWER PLANT CONTROL ROOM CREWS

EXECUTIVE SUMMARY

The present study developed and evaluated measures of team interaction skills using nuclear power plant control room crews in a simulator environment. Team interaction skills refer to the ability of a control room crew to establish and maintain positive interactions among themselves while working together to perform coordinated and integrated activities in order to provide safe and effective plant operations.

Such skills have been seen by researchers as impacting decision making quality, performance and productivity, and plant safety, especially during emergencies.

Previous research on team interaction skills has done little to develop reliable, valid measures, and virtually none of the research has focused on the nuclear industry. The intent of this study was to develop comprehensive and highly reliable measures (i.e., that could be used with a high degree of agreement by different raters) that would have both practical and research value.

Rating scales capable of measuring team interaction skills were developed using an iterative development, evaluation, and revision process.

Initially, seven dimensions of team interaction skills were identified from the existing research literature, the assistance of Battelle Pacific Northwest Laboratory (PNL) contract license examiners, and the experience of the research team in control room settings. Two rating formats were employed: 1) a Behaviorally Anchored Rating Scale (BARS) format requiring ratings of crew performance on each dimension of team interaction skills, and 2) a Relative Behavioral Frequency format requiring recollection of the frequency of occurrence of specific team behaviors. After an initial data collection and analysis process (see Montgomery et al., 1990), these dimensions were revised to the following six: 1) Communications, 2) Team Spirit, 3) Openness/Participation, 4) Task Coordination, 5) Adaptability, and 6) Crew Reactions to Volatile Situations.

The revised scales were tested with the help of the training staff and control room crews at a nuclear power plant located in the western United States. The training staff participated in a day-long training session which helped them become familiar with the rating scales; provided opportunities for viewing and rating videotape scenarios depicting poor, good, and moderate team interaction skills; and incorporated immediate feedback and discussion of rating performance. Following training, the instructors rated 14 different control room crews during requalification training using simulator scenarios specifically selected for the present study.

The ratings of team interaction skills proved to be generally very positive, indicating that all of the crews were seen as performing well as a team. The raters demonstrated a high degree of interrater agreement for both the BARS and the Relative Behavioral Frequency ratings. In addition, it was found that the six dimensions of team interaction skills were highly related, rather than independent, reflecting one or two underlying dimensions. Additional analyses were conducted using the ratings obtained during rater training to compare trainee and "expert" ratings, thus assessing rater error.

The BARS ratings showed a pattern of less error than the Relative Behavioral Frequency ratings, and some evidence was found of decreasing error over time. The BARS ratings were able to clearly differentiate between the poor, good, and moderate levels of performance depicted on the videotapes, and these ratings closely matched the expert ratings.

The results provided strong support for the reliability and validity of the BARS scales for use in measuring team interaction skills. After considering the literature on cognitive judgment processes, it was concluded that the Relative Behavioral Frequency ratings may not, in fact, have elicited recall of behavior frequency but probably required judgment processes similar to those required for the BARS. The far larger number of Behavioral Frequency ratings may simply have proved to be too difficult a cognitive task for the raters.

Methodological issues were reviewed, including the need to randomize presentation of the videotape scenarios to eliminate potential order effects, to increase videotape length, to include more expert raters, and to include control room crews who do not always perform extremely well as a team. Additional areas for research were discussed, including linking team interaction skills to objective measures of simulator/plant performance. In terms of practical application of the measures, it was concluded that the BARS ratings are suitable for use in training and development purposes and may also be used for control room crew requalification examinations should team interaction skills become a part of the examination process.


CONTENTS

1.0 PURPOSE OF THE PRESENT STUDY ............................ 1.1
    1.1 RESEARCH OBJECTIVE .................................. 1.1
    1.2 NRC AND INDUSTRY BACKGROUND ......................... 1.1
2.0 TECHNOLOGY EMPLOYED ..................................... 2.1
3.0 RESEARCH METHOD ......................................... 3.1
    3.1 INTRODUCTION ........................................ 3.1
    3.2 BACKGROUND TO SCALE DEVELOPMENT EFFORTS: STATUS OF
        RESEARCH ON MEASURING TEAM INTERACTION SKILLS ....... 3.2
    3.3 SUMMARY OF PRELIMINARY RESEARCH EFFORTS ............. 3.3
    3.4 APPROACH AND METHOD OF THE PRESENT STUDY ............ 3.5
        3.4.1 Rating Scale Revision ......................... 3.5
        3.4.2 Study Participants ............................ 3.6
        3.4.3 Rater Training ................................ 3.6
        3.4.4 Simulator Scenarios ........................... 3.8
        3.4.5 Data Collection ............................... 3.8
4.0 RESULTS ................................................. 4.1
    4.1 DEMOGRAPHIC STATISTICS .............................. 4.1
    4.2 RATING SCALE DESCRIPTIVE STATISTICS AND SCALE
        INTERCORRELATIONS ................................... 4.1
    4.3 FACTOR ANALYSES ..................................... 4.3
    4.4 INTERRATER RELIABILITY .............................. 4.4
    4.5 INTERNAL CONSISTENCY RELIABILITY .................... 4.7
    4.6 TRAINING RATINGS .................................... 4.7
    4.7 CRONBACH'S COMPONENTS OF ACCURACY ANALYSIS .......... 4.10
5.0 DISCUSSION .............................................. 5.1
    5.1 OVERVIEW OF FINDINGS OF THE STUDY ................... 5.1
        5.1.1 Descriptive Statistics ........................ 5.1
        5.1.2 Factor Analysis/Internal Consistency
              Reliability ................................... 5.2
        5.1.3 Interrater Reliability ........................ 5.2
        5.1.4 Training Ratings .............................. 5.3
        5.1.5 Cronbach's Components of Accuracy Analysis .... 5.4
    5.2 BARS VS. BEHAVIORAL FREQUENCY RATINGS:
        THEORETICAL ISSUES .................................. 5.5
    5.3 METHODOLOGICAL CONSIDERATIONS ....................... 5.7
    5.4 CONCLUSIONS AND RECOMMENDATIONS ..................... 5.9
6.0 REFERENCES .............................................. 6.1


TABLES

3.1. Definitions of Team Interaction Skills Dimensions ...... 3.7
4.1. Descriptive Statistics for the Behaviorally Anchored
     Rating Scales .......................................... 4.2
4.2. Descriptive Statistics for the Behavioral Frequency
     Scales ................................................. 4.2
4.3. Internal Consistency Reliability of Behavioral
     Frequency Ratings ...................................... 4.7


FIGURES

4.1. Interrater Reliability for BARS Ratings, Scenario 1 .... 4.6
4.2. Interrater Reliability for Behavioral Frequency Scales,
     Scenario 1 ............................................. 4.6
4.3. Mean BARS Rating for Videotape Scenarios of Poor,
     Good, and Moderate Performance ......................... 4.9
4.4. Mean Behavioral Frequency Rating for Videotape Scenarios
     of Poor, Good, and Moderate Performance ................ 4.10
4.5. Values of EL Component for Videotape Scenarios of Poor,
     Good, and Moderate Performance ......................... 4.13
4.6. Values of SA Component for Videotape Scenarios of Poor,
     Good, and Moderate Performance ......................... 4.14


1.0 PURPOSE OF THE PRESENT STUDY

1.1 RESEARCH OBJECTIVE

Team interaction skills have a strong impact on effective team decision-making, performance and productivity, plant safety, and plant operations during emergencies (Newman, 1989). Nonetheless, measurement efforts in the nuclear industry have focused almost exclusively on technical skills, with little attention given to developing measures of team interaction skills. Earlier Nuclear Regulatory Commission (NRC) sponsored research identified the existence and importance of critical team interaction skills (Davis, Gaddy, & Turney, 1985). The current study takes the additional steps of developing, administering, evaluating, and refining methods of evaluating team interaction skills.

1.2 NRC AND INDUSTRY BACKGROUND

Attention to team interaction skills intensified in the nuclear industry following the Three Mile Island (TMI) accident (Gaddy & Wachtel, 1992). The initial response was the requirement of in-plant drills, which were derived from the Navy nuclear program's experience in practicing for accident situations. The experience gained from in-plant drills led to increased concern for the interpersonal and interactional aspects of emergency response.

Nuclear industry organizations have begun to incorporate team interaction skills into their activities. For example, the Electric Power Research Institute (EPRI) sponsored a study of plant communications (see Topmiller et al., 1981) in which several communications problems (such as a lack of standard terminology and a lack of procedural guidelines) were described. The Institute for Nuclear Power Operations (INPO) has assembled samples of team training materials based on its audits of nuclear power plant training programs (INPO, 1988). Recently INPO also initiated team skills training, relying on dimensions of team interaction skills that closely paralleled the dimensions used in the preliminary effort of the present study (see Montgomery et al., 1990). However, none of these efforts were intended to provide reliable, valid measures of team interaction skills.


2.0 TECHNOLOGY EMPLOYED

The development of measures of team interaction skills began with a comprehensive assessment of the dimensions underlying this construct, based on the applied psychology research literature as well as on interviews and input from control room operators, operator training staff, and licensing examiners. Decisions were then made regarding the most appropriate approaches to scale development for these dimensions. Initial scales were developed, data were collected using the scales, and the scales were evaluated for their suitability using a variety of statistical techniques. An iterative revision process was followed until a quality end product was developed.

3.0 RESEARCH METHOD

3.1 INTRODUCTION

This report presents the findings of the second year of a two-year study designed to develop and evaluate methods of measuring the team interaction skills of nuclear power plant control room operators. Results of this research are presented and discussed in terms of scale development and validation activities as well as potential applications in a nuclear power plant environment. Results of the initial year's research are presented in detail elsewhere (Montgomery et al., 1990) but will be summarized in the present report as needed for clarity and continuity. The present report focuses on the process of revising scales developed previously, modifying the research design to strengthen study outcomes, collecting additional data, analyzing and interpreting the new data, and discussing the implications of these results for potential use of the scales in a field setting.

As discussed in the earlier report (Montgomery et al., 1990), team interaction skills are an important aspect of performance that needs to be clearly distinguished from related concepts such as technical skills and team performance. Team interaction skills refer to the ability of a control room crew to establish and maintain positive interactions among the crew members as well as to work together to perform coordinated and integrated activities in order to provide safe and effective plant operations. Team interaction skills thus are closely associated with the interpersonal and group processes occurring within the control room crew. Examples of positive team interaction behaviors might include sending clear messages, attending to what was said, providing suggestions and ideas when appropriate, helping to resolve conflicts among crew members, participating in decision making when appropriate, and displaying positive emotional responses towards other crew members and to the crew as a whole. Team interaction skills may also involve establishing and maintaining cohesive coordination and cooperation between the control room crew and other power plant personnel, such as auxiliary operators, I&C technicians, health physics technicians, and plant management.

To date, adequate measures of team interaction skills have simply not been developed. The present research attempts to develop, evaluate, and refine team interaction skill measures that can be used with confidence by training staff, licensing examiners, and others who need to make assessments regarding team behaviors and competence.

3.2 BACKGROUND TO SCALE DEVELOPMENT EFFORTS: STATUS OF RESEARCH ON MEASURING TEAM INTERACTION SKILLS

Brannick, Roach, and Salas (1991) emphasized that the impact of team interaction variables on team performance (a key research question) cannot be evaluated without the presence of good team interaction measures.

Similarly, a number of reviewers, including Dyer (1984), have concluded that reliable, valid measurement tools need to be developed in order to assess team interaction skills. Oser, McCallum, Salas, and Morgan (1989) noted the presence of largely inadequate measurement systems in team research, and Cannon-Bowers and Salas (1990) argued for the need to develop effective measures of team functioning.

As a starting point, several researchers have attempted to identify the domain of variables that need to be included in measures of team interaction skills. Gladstein (1984) identified the dimensions of open communication, supportiveness, conflict, discussion of strategy, weighing individual inputs, and boundary management. Cannon-Bowers and Salas (1990), for example, mentioned such behaviors as closed-loop communications, compensatory behavior, adaptability, coordination, giving and receiving feedback, mutual performance monitoring, anticipation of team members' actions, and sharing resources. Oser, McCallum, Salas, and Morgan (1989), using the critical incident approach, identified the dimensions of communication, cooperation, team spirit and morale, giving suggestions and criticism, accepting suggestions and criticism, coordination, and adaptability.

Brannick, Roach, and Salas (1991) have argued that the heart of the measurement issue is the lack of construct validity of the measures which have been developed. Essentially they mean that the measures have not been proven to comprehensively cover the domain of team interaction skills, that there is no evidence that those using the measures are able to do so reliably, and that there is no support for their value in research or in practical applications.

Brawley, Carron, and Widmeyer (1987) concluded from their review of the literature that little attempt has been made to base measures on conceptual models, that measures typically under-represent the construct, and that little attempt has been made to assess the psychometric properties of team interaction skills measures. Because the research on team interaction skills has been based on inadequate measures, it is difficult to compare results across studies or to obtain a cumulative growth in knowledge about teams over time.

One of the goals of the present research was therefore to address some of the key weaknesses in the past research on team interaction skills, including the lack of sound, valid measures of team interaction skills. Such measures must meet high technical standards, such as displaying high interrater agreement, high internal consistency, a sound factor structure, and consistency with other measures or indicators of team interaction skills. To the extent that such standards are met, one can conclude that there is evidence of construct validity -- that the measures do, in fact, measure what they are intended to measure. The exclusive focus of the present research was on the nuclear industry. Although attention has been given to the importance of team interactions by the NRC, INPO, and the nuclear industry itself, prior research has not directly focused on the development of reliable and valid measures of this construct.

3.3 SUMMARY OF PRELIMINARY RESEARCH EFFORTS

The study began in 1989 with an extensive scale development process. An initial set of seven dimensions of team interaction skills was eventually identified on the basis of the existing research literature, the experience of the research team in control room settings, and assistance from PNL contract licensing examiners. These seven dimensions were: 1) Two-Way Communication of Objectives/Plant Status, 2) Resource Management, 3) Inquiry, 4) Advocacy, 5) Conflict Resolution and Decision-Making, 6) Stress Management, and 7) Team Spirit.

Two primary rating scale formats were developed for measuring these dimensions: the Behaviorally Anchored Rating Scales (BARS) and the Relative Behavioral Frequency Scales. The BARS format consisted of a simple, graphic, seven-point rating scale for each dimension. Descriptors ("anchors") for high, medium, and low performance for each dimension were also included. The Behavioral Frequency Scales did not rely on a single rating for each dimension. Rather, several items were developed for each dimension. For each item, raters were to provide information about the frequency of occurrence of the behaviors described in the item. Thus, raters faced quite different tasks on the two scales. While the BARS required global judgments of good versus poor performance on each dimension, the Behavioral Frequency Scales required the recall of the frequency of occurrence of several behaviors for each dimension. The resulting scales were pilot-tested at two simulator sites. Several revisions were made to the two types of scales based on the pilot studies, which included comments and suggestions from study participants.
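The structural contrast between the two instruments can be sketched schematically. The record layouts below are hypothetical (field names, anchor texts, and item wordings are invented for illustration, not taken from the study materials): a BARS dimension carries one global seven-point judgment guided by anchors, while a Behavioral Frequency dimension carries several items, each with its own recalled-frequency rating.

```python
# Schematic sketch (hypothetical fields) of the two rating formats.
from dataclasses import dataclass, field

@dataclass
class BARSDimension:
    # One global 7-point judgment per dimension, guided by anchors.
    name: str
    anchors: dict            # e.g. descriptors for low/medium/high performance
    rating: int = 0          # 0 = not yet rated

@dataclass
class FrequencyDimension:
    # Several behavior items, each rated for frequency of occurrence.
    name: str
    items: list
    frequencies: list = field(default_factory=list)

bars = BARSDimension(
    name="Communications",
    anchors={1: "messages unclear", 4: "mostly clear", 7: "clear and concise"},
)
bars.rating = 6              # a single judgment covers the whole dimension

freq = FrequencyDimension(
    name="Communications",
    items=["Sent clear messages", "Attended to what was said"],
)
freq.frequencies = [6, 5]    # one recalled frequency per item

print(bars.rating, freq.frequencies)
```

The layout makes the cognitive difference concrete: the BARS rater produces one judgment per dimension, while the frequency rater must recall and score each behavior separately.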

The revised scales were then used to evaluate the simulator performance of 14 control room crews, each with four to five members. Raters included simulator instructors, members of the research team, and crew members themselves. Both groups participated in a short rater training program prior to making the ratings. To help assess scale reliability and validity, each crew was rated on more than one training scenario.

The ratings tended to be very positive for all control room crews for both the BARS and the Behavioral Frequency Scale ratings. Several of the dimensions proved difficult for raters to use, resulting in numerous "Not Observed" ratings. Interrater reliability for each scale (across all dimensions) proved to be unstable for both scales. That is, interrater reliability was sometimes high and sometimes low with no apparent pattern. Internal consistency proved reasonably high. Numerous user comments were obtained which, on the whole, indicated that the BARS format was easier to use than that of the Behavioral Frequency ratings.

Although the research effort represented a significant step in terms of developing measures of team interaction skills, the need to further refine the scales was apparent. In particular, the need to obtain consistently high interrater reliability was critical. An extensive scale revision and evaluation process (the study summarized here) was then undertaken.

3.4 APPROACH AND METHOD OF THE PRESENT STUDY

The present study, begun in the fall of 1990, was similar to the initial effort in that rating scales were developed and evaluated using the performance of control room crews in simulators. However, the present study differed in several key ways from the initial study. First, scale development involved a scale refinement and revision process, based on statistical analyses and on user comments about the original scales, rather than an initial development effort. Second, the emphasis given to rater training was greatly increased, with training expanding from one hour (or less) in the earlier study to a full day, including the use of videotape scenarios and extensive practice using the rating scales. (Scenarios depicting poor, good, and moderate performance were developed as a part of this study.) Third, data collection involved a more focused sample, using only training instructors at a single simulator facility. Finally, the data analysis strategy was modified to include several more complex, sophisticated analyses, as well as conducting analyses of ratings provided during training to get a better understanding of the types and degree of rater error that might be present in the data.

3.4.1 Rating Scale Revision

The rating scale revision process was guided by both the content analyses of rater comments and statistical considerations. The research team met in June 1990 to thoroughly review these analyses and begin the revision process. The original seven dimensions of team interaction skills were subsequently revised and reduced to six. These included Communications and Team Spirit (analogous to two of the previously used dimensions), and four extensively revised dimensions: Openness/Participation, Task Coordination, Adaptability, and Crew Reactions to Volatile Situations. The new dimensions were felt to be more consistent with the statistical evidence, likely to be observable in most simulator scenarios, and more easily understandable and usable for study participants. The new dimensions and their definitions are provided in Table 3.1. In addition, the individual items for the Behavioral Frequency scales were extensively revised and new anchors were developed for the BARS.

3.4.2 Study Participants

Operator training personnel and control room operator crews at a Pressurized Water Reactor (PWR) located in the western United States served as the primary participants in the study. The crews were in requalification training at the time the study was conducted. Participation was strictly voluntary. In addition, certain members of the research team served as raters, including a nuclear engineer, an organizational development specialist, and a contract license examiner. While the entire operator training staff at the PWR served as raters, they rotated through the study so that only a few were used for any given crew.

3.4.3 Rater Training

All raters participated in a full-day training session to familiarize them with the meaning of the dimensions of Team Interaction Skills, to help ensure the appropriate use of the seven-point BARS and Behavioral Frequency rating formats, and to provide multiple opportunities to use the scales in a controlled setting prior to actual data collection efforts.

The training incorporated three videotape scenarios developed by training personnel at the Limerick Generating Station specifically for the present study. The scenarios depicted control room crews displaying poor, good, and moderate levels of Team Interaction Skills. Following each scenario, training participants provided both BARS and Behavioral Frequency scale ratings of the observed crew behaviors. The research team led a discussion of the ratings and supplied "true score" ratings which served as feedback to the study participants regarding their own ratings. The strong emphasis on training in the present study reflected concern that the one-hour training used previously had not provided sufficient time for study participants to become thoroughly familiar with the rating materials.

TABLE 3.1. Definitions of Team Interaction Skills Dimensions

COMMUNICATION
The Communications dimension consists of the transmission of factual information in a clear and concise manner. In terms of crew behaviors this includes listening skills, nonverbal behavior, and articulation of plant status or instructions for future activities. Communications does not include emotional aspects of information transmission.

OPENNESS/PARTICIPATION
Openness/Participation consists of crew members' tendency to ask for, give, and receive suggestions. It includes questioning decisions and discussing alternatives to arrive at the best possible decision. Openness also involves the reactions of crew members to feedback, which should focus on the task rather than on the person when reviewing actions.

TASK COORDINATION
Task Coordination refers to the crew members' ability to match the available resources, such as people and procedures, to the task in order to achieve the optimal workload distribution.

TEAM SPIRIT
Team Spirit consists of the mutual support, cohesiveness, group identity, and extra effort crew members exhibit to accomplish a common goal.

MAINTAINING TASK FOCUS IN TRANSITIONS
Maintaining Task Focus in Transitions deals with crew members' responses to changes from normal to non-normal plant conditions (e.g., loss of pressure in a feedwater pump). These responses include focusing on the task, avoiding emotional overreaction or panic, and maintaining poise.

ADAPTABILITY
Adaptability reflects crew members' ability to adjust or modify their behavior based on the situation, to be flexible in responding to the environment, and to recognize the need for change.

3.4.4 Simulator Scenarios

Project staff worked with plant training personnel to select simulator scenarios appropriate for use in the study. It was intended that each selected scenario would provide opportunities for crew member interactions beyond simply reading and responding to procedures in a step-by-step fashion. The scenarios were to allow crews to take control of critical plant parameters (e.g., rod position, boron concentration, pressurizer level, temperature, containment pressure, radiation levels) during the simulation of emergency control room conditions. In addition, the scenarios were to proceed gradually from ordinary to emergency procedures, include equipment and instrument failures to increase workload, involve the less familiar emergency operating procedures (EOPs), and involve secondary failures (covered in less detail in the EOPs).

The scenarios ultimately selected by the research team and the training instructors were named Dropped Rod(s)/ATWS, Stuck Rods/RCS Leak, and Stuck Rods/Off-site Power Loss (NAT CIRC). All three scenarios were dry-run prior to data collection, with a project research member observing the process, and were found to be appropriate for research needs.

3.4.5 Data Collection

Several distinct types of data were collected. First, during the one-day rater training program, the training participants provided both BARS and Behavioral Frequency scale ratings for the videotaped scenarios of poor, good, and moderate performance. These data were gathered in part to improve familiarity with the rating instruments, but also to allow for a detailed evaluation of sources of rating error, as determined by comparison with "expert ratings" provided by the project research staff.
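In its simplest form, the trainee-versus-expert comparison described here reduces to a deviation score. The fragment below is a hypothetical sketch (the ratings are invented, and the report's actual accuracy analysis -- Cronbach's components of accuracy -- decomposes error more finely than this single statistic):

```python
# Minimal sketch (hypothetical data) of scoring a trainee against
# the "true score" expert ratings for one videotape scenario.

def mean_absolute_error(trainee, expert):
    """Average absolute deviation of trainee ratings from expert ratings."""
    return sum(abs(t - e) for t, e in zip(trainee, expert)) / len(expert)

expert_ratings = [2, 3, 2, 2, 3, 2]    # "poor" tape: expert rating per dimension
trainee_ratings = [3, 3, 2, 4, 3, 2]   # one trainee's ratings of the same tape

print(f"mean absolute error = {mean_absolute_error(trainee_ratings, expert_ratings):.2f}")
```

Tracking a score like this across successive practice tapes is one way to see the decreasing-error-over-time pattern mentioned in the executive summary.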

The primary data collection effort involved ratings by project staff and plant training personnel of the team interaction skills of control room crews in the plant simulator during requalification training. The requalification training took place from January 1991 to February 8, 1991, while data collection took place during the first three weeks of this period. For each of those three weeks, several control room crews spent a three-hour session in the simulator responding to the three scenarios which had been selected for the study. Ratings of the team interaction skills displayed were provided by three project staff members and by several plant training personnel who had participated in the rater training session.


4.0 RESULTS

This section provides the results obtained from the current phase of the study. The following statistics were computed and are reported here: demographic and descriptive statistics, factor analyses, internal consistency reliability, interrater reliability, and analysis of rater errors based on training data.

I 4.1 DEMOGRAPHIC STATISTICS A total of 14 training instructors served as raters of the Team Interaction skills during perfarmance of control room crews on simulator scenarios.

Instructors were drawn from personnel with experience in both Nuclear Operations and/or the Conmercial Electric Generating Industry and had a wide variety of educational backgrounds.

Eleven possessed Senior Reactor i

Operator (SRO) licenses. The training instructors averaged 12 years of nuclear power plant operating experience.

Twelve existing control room crews participated in the study, with each crew performing in three scenarios for a total of 36 crew observation sessions. Nearly 75% of the crew members possessed the SRO license, with the remaining possessing the R0 license.

Crew members averaged 6.5 years of nuclear power plant operating experience.

4.2 RATING SCALE DESCRIPTIVE STATISTICS AND SCALE INTERCORRELATIONS

At the conclusion of each simulator scenario, both BARS and Behavioral Frequency ratings were completed independently by members of the observer team, with no discussion of ratings. Descriptive statistics for the BARS ratings are provided in Table 4.1.

Note that these values were computed across all scenarios and crews, since little variation in ratings was found.

Given that each scale consisted of 7 anchor points, ideally distributed ratings would result in mean values of roughly 4.0, the scale midpoint. The means shown in Table 4.1 are all considerably higher than 4.0, suggesting a pattern of very favorable ratings of team interaction skills. Virtually all the ratings were at the high end, with few at the middle or low end. Unfortunately, the high ratings also indicated the presence of restriction of range in the data. Restriction of range can adversely impact some statistics that are based on the assumption of a more normal (i.e., bell-shaped) distribution. For example, the correlation coefficient can be severely attenuated by data of restricted range.
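The attenuation effect can be illustrated with simulated data (all values below are hypothetical, not taken from the study): two raters who agree well over the full range of performance show a markedly smaller correlation once the sample is restricted to its upper end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two raters whose scores share a strong underlying correlation
true_skill = rng.normal(0, 1, 5000)
rater_a = true_skill + rng.normal(0, 0.5, 5000)
rater_b = true_skill + rng.normal(0, 0.5, 5000)

full_r = np.corrcoef(rater_a, rater_b)[0, 1]

# Restrict the range: keep only cases rated well above average by rater A,
# mimicking the pile-up of favorable ratings seen in Table 4.1
keep = rater_a > 0.8
restricted_r = np.corrcoef(rater_a[keep], rater_b[keep])[0, 1]

print(full_r, restricted_r)   # the restricted correlation is much smaller
```

The same correlation coefficient, computed on the same raters, shrinks substantially purely because of the truncated range of scores.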

TABLE 4.1. Descriptive Statistics for the Behaviorally Anchored Rating Scales

Dimension                 M     SD    N
Communications            5.37  0.92  176
Openness                  5.51  0.82  176
Task Coordination         5.36  1.02  175
Team Spirit               5.54  0.95  175
Maintaining Task Focus    5.51  0.87  175
Adaptability              5.55  0.87  174

Mean values for the six Behavioral Frequency dimensions were computed by determining the mean of all the items present in each dimension (see Table 4.2).

These means also indicated very positive levels of team interaction skills by significantly exceeding the scale midpoint of 4.0.

TABLE 4.2. Descriptive Statistics for the Behavioral Frequency Scales

Dimension                 M     SD    N
Communications            5.46  0.67  176
Openness                  4.88  0.99  138
Task Coordination         4.76  1.18  175
Team Spirit               4.83  0.97  177
Maintaining Task Focus    4.96  0.76  141
Adaptability              5.36  0.82  174

The correlations between the BARS ratings and Behavioral Frequency dimension values were computed. If both types of rating scales measured the same construct with equal accuracy, relatively high correlations along the diagonal would be expected. That is, the BARS Communications ratings should correlate highly with the Behavioral Frequency Communications dimension, and so on for each dimension (indicating "convergent validity"). Further, all off-diagonal correlations should be weak (indicating "discriminant validity"). For example, the BARS Communications ratings should correlate weakly with the remaining Behavioral Frequency dimensions. However, the mean diagonal correlation value of 0.44 was only slightly higher than the mean off-diagonal correlation value of 0.36.
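The diagonal versus off-diagonal comparison can be computed directly from two matched rating matrices. A minimal sketch on simulated data (the function name and all values are our own, not the study's):

```python
import numpy as np

def convergent_discriminant(bars, freq):
    """Mean same-dimension (convergent) and cross-dimension (discriminant)
    correlations between two instruments with matched dimension order.
    Inputs are (observations x dimensions) arrays."""
    k = bars.shape[1]
    cross = np.corrcoef(bars.T, freq.T)[:k, k:]   # BARS x Frequency block
    diag = np.diag(cross).mean()
    off = cross[~np.eye(k, dtype=bool)].mean()
    return diag, off

# Hypothetical ratings: both instruments measure the same six latent dimensions
rng = np.random.default_rng(1)
latent = rng.normal(0, 1, (200, 6))
bars = latent + rng.normal(0, 1, (200, 6))
freq = latent + rng.normal(0, 1, (200, 6))

diag, off = convergent_discriminant(bars, freq)
print(round(diag, 2), round(off, 2))
```

When the two instruments really do measure the same distinct dimensions, as simulated here, the mean diagonal correlation is well above the mean off-diagonal correlation; the narrow 0.44 versus 0.36 gap in the study is what weak convergent/discriminant evidence looks like.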

4.3 FACTOR ANALYSES

Confirmatory factor analyses were conducted for both the BARS and Behavioral Frequency scales using the EQS software program (Bentler, 1989).

Since factor analysis allows an assessment of how different variables "group together," the results have important implications for conclusions about the validity of the scales. For example, for both scales a key question was whether the raters would use the six dimensions as distinct entities (a six-factor hypothesis), whether a general sense of "team interaction skills" would underlie all the ratings (a single-factor hypothesis), or whether some other pattern might emerge. The software was programmed to test for the presence of one, two, or six factors in the data. For the BARS, a two-factor solution fit the data exceedingly well. The dimensions of Communications, Openness, and Team Spirit formed one factor; Maintaining Task Focus and Adaptability formed the second factor; and Task Coordination was found to be included in both factors.

Since the Behavioral Frequency dimensions were composed of multiple items, the factor analyses would provide additional crucial information about how well each item would group together with other items from the same dimension. However, it was found that none of the confirmatory analysis models (one, two, or six factors) provided a good fit to the data. Exploratory factor analyses were also run, but no clear solution emerged. (A five-factor solution provided the clearest factor structure, but did not account for sufficient variance in the ratings to warrant acceptance.) Items on the Behavioral Frequency scale thus failed to group together as predicted during the scale development process.
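The study's confirmatory analyses used EQS; as a rough illustration of how items "group together," a simple exploratory eigenvalue (Kaiser) screen of the item correlation matrix can be sketched on simulated data (all values hypothetical; this is a crude screen, not a substitute for the confirmatory procedure):

```python
import numpy as np

def eigenvalue_screen(items):
    """Eigenvalues of the item correlation matrix, plus the count of
    eigenvalues exceeding 1.0 (the Kaiser criterion)."""
    eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(items.T)))[::-1]
    return eigvals, int((eigvals > 1.0).sum())

# Hypothetical data: 200 observations of 6 items driven by two latent factors
rng = np.random.default_rng(2)
factors = rng.normal(0, 1, (200, 2))
loadings = np.array([[.8, 0], [.8, 0], [.7, .3],
                     [0, .8], [0, .8], [.3, .7]])
items = factors @ loadings.T + rng.normal(0, 0.5, (200, 6))

eigvals, n_factors = eigenvalue_screen(items)
print(n_factors)   # for this simulated two-factor structure, two eigenvalues exceed 1.0
```

A clean factor structure produces a few large eigenvalues followed by a sharp drop; the study's Behavioral Frequency items, by contrast, produced no such clear pattern.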

4.4 INTERRATER RELIABILITY

Interrater reliability was evaluated to determine the extent to which the raters agreed in their assessments of the team interaction skills displayed by the control room crews. This was a critical analysis, since failure to obtain agreement among raters would cast significant doubts on the validity of the measures. Several statistical approaches are commonly used to assess interrater reliability, including intraclass correlations, mean percentage of agreement, and average intercorrelations among judges' ratings. However, in cases where the ratings are characterized by substantial restriction of range, as in the present study, James (1984) has shown that these methods are inappropriate because each approach understates the actual degree of interrater agreement.

As an extreme example illustrating James' concern, if two raters provided exactly the same rating on all six dimensions they would be in perfect agreement, yet the correlation between their ratings would be undefined. (Plotting the points, using the dimensions for one rater along the x-axis and the other rater's dimensions along the y-axis, results in a single point, from which no correlation can be computed.)

Even worse, if the ratings were the same on five of six dimensions, but on the sixth one rater scored a point above the common value and the other a point below, the correlation would be -1.0, suggesting complete disagreement.
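This paradox is easy to verify numerically (the values are chosen so each rater deviates by one point, in opposite directions, on the sixth dimension):

```python
import numpy as np

# Five dimensions rated identically; on the sixth, rater A is one point above
# and rater B one point below the shared value of 6.
rater_a = np.array([6, 6, 6, 6, 6, 7])
rater_b = np.array([6, 6, 6, 6, 6, 5])

r = np.corrcoef(rater_a, rater_b)[0, 1]
print(r)   # -1.0, despite near-perfect absolute agreement
```

Correlation measures only the pattern of deviations around each rater's mean, so under restricted range it can wildly misrepresent absolute agreement.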

James was also concerned that some common rating errors can result in the appearance of consistency across raters when none may in fact exist.

For example, raters displaying leniency rating error (the persistent tendency to rate very positively) might appear to be providing consistent, reliable ratings because all ratings would be at the high end of the scales, when the ratings might be essentially random.

James consequently developed an approach for assessing interrater reliability in situations of restricted range where the potential for rating bias was present.

In this approach, the overall variance in ratings was first statistically compared to the variance that would be obtained from completely random ratings. In addition, comparisons were made between obtained rating variance and the variance that would result from conditions of hypothetical moderate or severe rater bias. True rater agreement is indicated if ratings are found to be even more consistent than could be explained by even extreme (albeit hypothetical) rating biases.

Calculation of interrater reliability was performed separately for the BARS and for the Behavioral Frequency scales. Three hypothetical distributions were used for comparison purposes. Using James' terminology, these were "no skew" (i.e., no bias), moderate negative skew (moderate positive bias), and severe negative skew (severe positive bias).
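James' index compares the observed rating variance with the variance of a hypothesized null (no-true-agreement) response distribution. A minimal sketch for a 7-point scale (the skewed null probabilities below are illustrative stand-ins, not the exact distributions used in the study):

```python
import numpy as np

ANCHORS = np.arange(1, 8)   # the 7 scale anchor points

def null_variance(probs):
    """Variance of a hypothesized null (no-agreement) response distribution."""
    mu = probs @ ANCHORS
    return probs @ (ANCHORS - mu) ** 2

def interrater_agreement(ratings, null_var):
    """James-style agreement index: 1 - (observed variance / null variance)."""
    return 1.0 - np.var(ratings, ddof=1) / null_var

# Hypothetical: five raters score one crew on one dimension
ratings = np.array([6, 6, 5, 6, 7])

uniform = np.full(7, 1 / 7)                               # "no skew" null
skewed = np.array([.01, .02, .05, .12, .20, .30, .30])    # illustrative negative skew

rwg_uniform = interrater_agreement(ratings, null_variance(uniform))
rwg_skewed = interrater_agreement(ratings, null_variance(skewed))
print(rwg_uniform, rwg_skewed)
```

Because a negatively skewed null has less variance than the uniform null, it credits the raters with less agreement, which is why the moderate- and severe-skew conditions in Figures 4.1 and 4.2 yield somewhat lower values.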

Results of the analyses indicated very high levels of interrater agreement for both types of rating scales, for virtually all of the crews observed, and for all three of the simulator scenarios. Figures 4.1 and 4.2 provide examples of the results for all eight crews on the first simulator scenario. Very similar results were obtained for the other two scenarios. The values of interrater reliability displayed on the graphs should be taken to represent the proportion of agreement among the raters, with 1.0 indicating 100% agreement and 0.0 indicating 0% agreement. Assuming the presence of moderate or severe skew in the ratings results in somewhat lower values of interrater reliability than the no-skew, or random, condition, but the values are nonetheless consistently above 0.80, with a number of values exceeding 0.90. Note that a value for interrater reliability of 0.80 is typically used as a standard indicating good agreement among raters.

FIGURE 4.1. Interrater Reliability for BARS Ratings, Scenario 1.
[Figure: bar chart of interrater reliability (0 to 1) for Crews 1 through 8, with bars for the random distribution, moderate skew, and extreme skew comparison conditions.]

FIGURE 4.2. Interrater Reliability for Behavioral Frequency Scales, Scenario 1.
[Figure: bar chart of interrater reliability (0 to 1) for Crews 1 through 8, with bars for the random distribution, moderate skew, and extreme skew comparison conditions.]


4.5 INTERNAL CONSISTENCY RELIABILITY

Cronbach's coefficient alpha (see Nunnally, 1978), a measure of internal consistency reliability, was computed for the BARS and Behavioral Frequency ratings. Internal consistency indicates the degree of homogeneity, or intercorrelation, of the items within each dimension. Since the BARS required only one rating per dimension, coefficient alpha assessed homogeneity across all six dimensions. The resulting value of alpha was 0.90, indicating a high degree of intercorrelation, or homogeneity, among the rating dimensions.
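Coefficient alpha can be computed directly from a ratings matrix as k/(k-1) times one minus the ratio of summed item variances to total-score variance. A minimal sketch on hypothetical data (not the study's ratings):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (observations x items) rating matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings: 6 crews (rows) scored on 6 BARS dimensions (columns)
ratings = np.array([
    [6, 6, 5, 6, 6, 6],
    [5, 5, 5, 6, 5, 5],
    [7, 6, 6, 7, 7, 6],
    [4, 5, 4, 4, 5, 4],
    [6, 7, 6, 6, 6, 7],
    [5, 6, 5, 5, 6, 5],
])
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Because the simulated dimensions move together across crews, alpha comes out high, much as the 0.90 obtained for the BARS dimensions here.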

For the Behavioral Frequency ratings, coefficient alpha was computed first for all 26 items, resulting in a value of 0.81, indicating a substantial degree of intercorrelation across all six dimensions. Coefficient alpha was then computed for each of the six dimensions (see Table 4.3). The values for each dimension indicated a low to moderate degree of internal consistency. In general, alpha values of 0.70 and above would indicate a substantial degree of homogeneity of items within each dimension. The low to moderate values of alpha for each dimension are consistent with the factor analytic results, which demonstrated that the Behavioral Frequency items did not load on dimensions as hypothesized.

TABLE 4.3. Internal Consistency Reliability of Behavioral Frequency Ratings

Dimension                    Coefficient Alpha
1. Communications            0.68
2. Openness                  0.44
3. Task Coordination         0.54
4. Team Spirit               0.48
5. Maintaining Task Focus    0.25
6. Adaptability              0.52

Coefficient Alpha for Overall Behavioral Frequency Ratings = 0.81

4.6 TRAINING RATINGS

During rater training the training instructors were asked to rate the team interaction skills in three videotape scenarios depicting poor, good, and moderate team interaction skills. In addition, prior to conducting the training, the research team provided expert, or "true score," ratings to serve as standards of the performance depicted in the videotapes.

Analyses of these ratings were conducted to assess the extent to which the training ratings were able to accurately reflect the videotape performance.

A close match between the training ratings and the true score ratings would provide additional evidence for scale validity and greater confidence in the ability of the scales to differentiate among varying levels of team interaction skills in real-life settings.

Since high correlations between dimensions within each type of rating scale were found, summative scales were formed by computing the mean values across the six dimensions. Separate scores were computed for each scale. The resulting mean values were therefore more reliable (i.e., contained less error) than individual dimension scores (Nunnally, 1978) and helped to simplify interpretation of analyses.

Mean values of the training and true score ratings across the six dimensions of team interaction skills for the BARS are shown in FIGURE 4.3.

The ratings for the first scenario, depicting poor performance, were quite low for both the training instructors (mean=2.1) and for the true score ratings (mean=2.7). For the second scenario (good performance), training instructor ratings (mean=6.3) and true score ratings (mean=6.5) were considerably higher and again quite consistent with one another. For the third scenario, depicting moderate performance, training instructor ratings (mean=4.5) and true score ratings (mean=4.3) dropped to values roughly midway between those obtained for good and poor performance. The training instructor ratings thus closely followed the level of performance depicted on the videotape.

FIGURE 4.4 provides similar analyses based on the Behavioral Frequency scales. The true score ratings for the poor, good, and moderate performance scenarios proved to be low (mean=2.7), high (mean=5.0), and medium (mean=4.1), respectively, consistent with scenario performance levels but displaying less variation than was found for the BARS. However, mean values provided by the training instructors varied only slightly across the three scenarios. The value for the poor performance scenario (mean=3.8) was only slightly lower than that for the good performance scenario (mean=4.1), which in turn only slightly exceeded that for the moderate performance scenario (mean=4.0). The Behavioral Frequency training instructor ratings failed to clearly distinguish scenario performance levels, and the match between expert and instructor ratings was clearly weaker than that obtained for the BARS ratings.

FIGURE 4.3. Mean BARS Rating for Videotape Scenarios of Poor, Good, and Moderate Performance
[Figure: line graph of mean rating (1 to 7) for Scenarios 1 through 3; one line for training ratings, one for true score ratings.]

FIGURE 4.4. Mean Behavioral Frequency Rating for Videotape Scenarios of Poor, Good, and Moderate Performance
[Figure: line graph of mean rating (1 to 7) for Scenarios 1 through 3; one line for training ratings, one for true score ratings.]

4.7 CRONBACH'S COMPONENTS OF ACCURACY ANALYSIS

Additional analyses were undertaken to look more closely at the types of errors that might be present in the ratings provided by the training instructors during training.

When some form of criterion or true score measure is available to serve as the basis for comparison, it is possible to evaluate rater errors through several statistical techniques (Murphy & Balzer, 1989). In the present case, the true score training ratings allowed the use of Cronbach's (1955) approach to assessing rater accuracy. This technique was of special interest because of its applicability where the sample size is small, as was the case here. The use of Cronbach's analysis has been described as "effectively limited to laboratory settings" (Murphy & Balzer, 1989, p. 619) and is quite unusual in applied situations because of the requirements for having criterion measures as well as a complete data set with no missing values.


Using the Cronbach approach, several measures of rater accuracy can be computed for each rater, based on comparisons with true score estimates of performance. True score in this case refers to the ratings provided by the research team leader, who also conducted the training in use of the rating instruments and was thoroughly familiar with both the issue of team interaction skills in control room settings and the use of the rating instruments.

Using Cronbach's approach, four components of accuracy were computed:

Elevation (EL): the tendency for a rater to rate consistently above or below the average true score. Interpreted as the component of error due to overall leniency/severity of the rater.

Differential Elevation (DEL): the tendency to rate a given scenario, across dimensions, consistently lower or higher than is warranted by the true average performance on the scenario. Describes the degree of closeness to the true average rating for a given scenario, and indicates the ability of the rater to appropriately rank each scenario.

Stereotype Accuracy (SA): the tendency to rate a particular dimension, across scenarios, above or below the true dimensional average value. Represents accuracy in ordering rating dimension averages across scenarios.

Differential Accuracy (DA): the tendency, for each individual rating of a scenario on a dimension, to rate above or below the average true rating for that scenario on that dimension. Represents the ability to predict scenario-to-scenario differences on individual dimensions.
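In Cronbach's squared-error formulation, these four components form an exact, orthogonal decomposition of a rater's mean squared difference from the true scores (grand mean, scenario effects, dimension effects, and their interaction). A minimal numpy sketch (the matrices and function name below are invented for illustration, not the study's data):

```python
import numpy as np

def cronbach_components(rater, true_scores):
    """Decompose a rater's mean squared difference from the true scores into
    Cronbach's (1955) four accuracy components (squared-error form).
    Both inputs are (scenarios x dimensions) matrices."""
    d = np.asarray(rater, float) - np.asarray(true_scores, float)
    grand = d.mean()                                   # overall leniency/severity
    row = d.mean(axis=1) - grand                       # per-scenario misfit (DEL)
    col = d.mean(axis=0) - grand                       # per-dimension misfit (SA)
    resid = d - grand - row[:, None] - col[None, :]    # scenario x dimension (DA)
    return {"EL2": grand ** 2,
            "DEL2": (row ** 2).mean(),
            "SA2": (col ** 2).mean(),
            "DA2": (resid ** 2).mean()}

# Hypothetical example: one rater's scores on 3 scenarios x 6 dimensions,
# compared against the corresponding true score ratings
rater = np.array([[2, 2, 3, 2, 2, 2],
                  [6, 7, 6, 6, 7, 6],
                  [5, 4, 5, 4, 5, 4]])
true_scores = np.array([[3, 2, 3, 3, 2, 3],
                        [6, 6, 7, 6, 7, 6],
                        [4, 4, 5, 4, 4, 4]])

comps = cronbach_components(rater, true_scores)
msd = ((rater - true_scores) ** 2).mean()
print(comps)   # the four components sum exactly to the mean squared difference
```

Because the decomposition is orthogonal, comparing the component values for two instruments shows where each instrument's error concentrates, which is how the BARS/Behavioral Frequency comparisons below were made.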

These four accuracy components (EL, DEL, SA, and DA) were computed for each rater for the BARS ratings as well as for the Behavioral Frequency ratings, for a total of eight accuracy component values per rater. Having these values for both scales allowed statistical comparisons to be made to see which type of scale produced less error. It was found that the EL (elevation) error was significantly greater for the BARS ratings than the Behavioral Frequency ratings.

However, DEL (differential elevation) error was greater for the Behavioral Frequency ratings than the BARS, as was DA (Differential Accuracy).

The SA (stereotype accuracy) error, however, was nearly identical for both types of scales.

Therefore, although there appeared to be a greater degree of leniency error in the BARS ratings, the BARS ratings provided more accurate measures of each scenario and of dimension ratings within each scenario.

A second issue was whether or not rater error for the training instructors might decrease over time, due to increasing experience with the scales and feedback about their rating performance (which was provided after viewing and rating each videotape scenario). Consequently, the accuracy components were computed for each scenario. Only the components EL and SA were used, since DEL and DA prove to be zero at the scenario level. Each rater therefore received 12 accuracy component values: two error components for each of three scenarios for the BARS and the same number for the Behavioral Frequency ratings.

A graph of these results for EL is provided in FIGURE 4.5. Repeated measures analyses of variance were used to analyze the data, with follow-up simple effects analyses needed to clarify several ambiguous results. There were no significant differences among the EL values for the BARS ratings across the three scenarios. However, for the Behavioral Frequency ratings, EL error decreased significantly from Scenario One to Scenario Two, and again from Scenario Two to Scenario Three.

FIGURE 4.5. Values of EL Component for Videotape Scenarios of Poor, Good, and Moderate Performance
[Figure: line graph of EL error for Scenarios 1 through 3; one line for the BARS, one for the Behavioral Frequency ratings.]

A graph of the results for the SA accuracy component is shown in Figure 4.6.

Analyses of variance showed that the SA error for the Behavioral Frequency scale was significantly greater than that found for the BARS.

In addition, the SA error for the BARS decreased significantly from Scenario One to Scenario Two, then increased again in Scenario Three.

For the Behavioral Frequency ratings, the SA error decreased from Scenario One to Scenario Two, then remained roughly constant from Scenario Two to Scenario Three.

FIGURE 4.6. Values of SA Component for Videotape Scenarios of Poor, Good, and Moderate Performance
[Figure: line graph of SA error for Scenarios 1 through 3; one line for the BARS, one for the Behavioral Frequency ratings.]


5.0 DISCUSSION

The purpose of this study was to develop reliable and valid measures of team interaction skills. This section reviews and interprets the results of the study in light of this objective, relates the findings to the empirical and theoretical literature, and proposes directions for improvements in the study.

5.1 OVERVIEW OF FINDINGS OF THE STUDY

A wide range of statistical analyses was performed on the training and simulator scenario ratings of team interaction skills. The following subsections examine the implications and interpretations of the primary findings.

5.1.1 Descriptive Statistics

The training instructors consistently provided very positive ratings of the team interaction skills of the control room crews, especially for the BARS. It seems most likely that the ratings reflect genuinely good performance, although the alternative hypothesis of bias cannot be completely refuted. First, all of the crews had been working together for some time, and all members had participated in extensive training, simulator practice, and previous testing to ensure their knowledge and competence. Many had undergone the rigorous selection and training that is an inherent part of the Navy nuclear program. All crews had demonstrated competent performance in the nuclear plant for some time. Thus, a consistent pattern of very positive ratings should be no surprise.

In addition, the research situation was deliberately set up to minimize pressure for rating bias. For example, it was emphasized that only the research team would see the ratings and that there would be no impact of the ratings on either the training instructors or the control room crews. The potential impacts of social desirability, peer pressure, or fear of making negative evaluations should all have been minimal in this situation.


5.1.2 Factor Analysis / Internal Consistency Reliability

Confirmatory and exploratory factor analyses were conducted to assess the extent to which the hypothesized six-dimension structure of team interaction skills was found in the rating data. For the BARS, a one- or a two-factor solution was found to provide a good fit to the data. For the two-factor solution, the first factor might best be labeled an interpersonal aspect of team interaction skills, while the second factor appeared to represent a task-oriented aspect. The results suggest that raters using the BARS tend to draw a distinction between how crew members get along together as a group and how they work together to accomplish their tasks. The one-factor solution was also a reasonably good fit to the data, suggesting a global interpretation of team interaction skills by the raters. Clearly, the factor analyses did not support the independence of the six rating dimensions. These findings do not, however, suggest that the number of ratings be reduced to one or two. Rather, the six dimensions should be summed to form one or two indicators of team interaction skills. The presence of multiple "items" for each dimension will serve to reduce rater error and substantially increase dimension reliability (see Nunnally, 1978).

No satisfactory factor solutions were found for the Behavioral Frequency scales, indicating that the items simply did not fall into clearly defined clusters, or dimensions, as hypothesized. It should be noted, however, that the ratio of the number of raters to the number of items is a key issue here. A ratio of at least 10:1 is recommended for stability in factor solutions (Gorsuch, 1974). The ratio in the present case was much higher for the six BARS dimensions because of the smaller number of items (roughly 30:1) than for the Behavioral Frequency ratings of 26 items (roughly 7:1). The Behavioral Frequency factor analysis results simply cannot be interpreted as strongly as can those for the BARS.

Ideally, additional data should be collected in order to adequately assess the factor structure of the Behavioral Frequency items.

5.1.3 Interrater Reliability

A key finding of the present study was that high values of interrater reliability were obtained using both rating scales. The raters were clearly in strong agreement among themselves regarding performance of the crew members in the simulators.

The James (1984) approach to measuring interrater agreement was used because of its appropriateness for data displaying restriction in range. Interrater agreement was generally in the 0.85 to 0.90 range, although consistently somewhat higher for the BARS than for the Behavioral Frequency ratings. Strong interrater reliability serves as a necessary, but not sufficient, condition for inferring scale validity (Nunnally, 1978).

That is, had low reliability been obtained, the validity of the scales would have been seriously challenged. However, finding such strong agreement among raters is not in and of itself enough to prove that the scales do, in fact, measure what was intended. As discussed below, additional forms of validity evidence must also be present.

5.1.4 Training Ratings

One such form of additional validity evidence centers around results obtained from using the rating scales during rater training. The training ratings were used, in part, to assess the extent to which the rating instruments could be used to detect and differentiate varying levels of team interaction skills. Valid rating instruments would be able to detect such differences, while instruments of poor validity would be unlikely to do so. The videotape training scenarios provided considerable variation in performance with which to test the rating instruments.

Analysis of the BARS training ratings showed that the training instructors clearly differentiated poor, good, and moderate crew performance. In addition, the BARS ratings were found to correspond closely with expert ratings provided prior to training. The ability of the BARS ratings to differentiate levels of performance and the close match with expert ratings thus provided strong evidence in support of the validity of the BARS rating instrument.


Results for the Behavioral Frequency ratings were less positive.

Only small differences were found between the training instructors' ratings of poor, good, and moderate performance on the training videotape.

The expert ratings themselves showed relatively little variation across the differing levels of performance.

The Behavioral Frequency ratings thus showed little ability to differentiate among levels of performance.

In addition, a relatively poor match was found between the training instructors' ratings and the expert ratings.

In spite of good interrater reliability, these findings cast doubt on the validity of the Behavioral Frequency scales, that is, the ability of the scales to really measure the construct (team interaction skills) that they were designed to measure.

5.1.5 Cronbach's Components of Accuracy Analysis

The Cronbach components of accuracy analysis was conducted to assess rater error present in the rating data gathered during training. The elevation component EL was significantly greater for the BARS ratings than for the Behavioral Frequency ratings. This result may reflect a statistical artifact, since EL is computed on the basis of rater and true score averages across all three scenarios. Analysis of EL at the scenario level revealed that the BARS error component was somewhat lower than the Behavioral Frequency EL component for two of the three scenarios. Thus, no firm conclusions should be drawn about EL error for the two types of rating scales.

The DEL, differential elevation error, was significantly greater for the Behavioral Frequency ratings than the BARS.

That is, the BARS ratings across all six dimensions for each of the three scenarios were more accurate than the Behavioral Frequency ratings of each scenario.

The DA, differential accuracy error, was also significantly greater for the Behavioral Frequency ratings than for the BARS, indicating the BARS ratings of each dimension within each scenario were more accurate than the corresponding Behavioral Frequency ratings. Overall, with the exception of the ambiguous EL finding, the BARS ratings were shown to contain less error than the Behavioral Frequency ratings.


Analyses of Cronbach's components were also conducted at the scenario level to determine if rating errors decreased with training and experience. The SA and EL components were computed for each scenario, for each type of rating scale. (Only these two components could be computed by scenario.) Strong support was found for the hypothesis of decreasing error with training and experience for the SA error component, with the primary decrease in error occurring between scenarios one and two. For the EL component, the BARS errors showed essentially no change across scenarios, while the Behavioral Frequency error showed a decrease from scenario one to two, and again from two to three.

The results do not allow a clear interpretation of the cause of the decrease in error, since the improvement in ratings might have been due to training, to the discussions of ratings, to feedback about rating true scores, to increased experience with the scales, or to some or all of the above.

In addition, the three videotape scenarios were shown only a single time to all of the training instructors, and thus the order of presentation could not be controlled (i.e., counterbalanced).

Ideally the scenarios would have been presented in random order to each training instructor.

Although order effects do not appear to be a likely explanation of the findings, the lack of a random presentation order makes this alternative hypothesis impossible to eliminate empirically.

In follow-up research, the order of presentation should be varied at random to eliminate such possibilities.

Finally, the scenario-wise components analyses also tended to show less error present in the BARS ratings. The EL error was somewhat lower for the BARS than for the Behavioral Frequency ratings for the first two scenarios and higher on the third.

For the SA analysis, error for the BARS was significantly less than for the Behavioral Frequency ratings for all three scenarios.

5.2 BARS VS. BEHAVIORAL FREQUENCY RATINGS: THEORETICAL ISSUES

Results of the present study documented a consistent pattern of findings in which the BARS ratings performed better than did the Behavioral Frequency ratings.

However, it might be anticipated that the recall process required by the Behavioral Frequency scale would be easier for the raters and ultimately result in greater rating accuracy than the more complex global judgment process required by the BARS. In addition, the use of multiple homogeneous items per dimension for the Behavioral Frequency scale should have promoted high internal consistency reliability and correspondingly decreased rating error. Why then did the BARS fare better than the Behavioral Frequency scale? To attempt an explanation requires some attention to the underlying cognitive processes involved in making the ratings.

The cognitive judgment process required for BARS-type ratings has been described by a number of writers. For example, Nathan and Alexander (1985) stressed that, when providing ratings, raters recall the target person as belonging to a good/bad category to which this person had been assigned at the time of observation. Behaviors typical of this category are then recalled, rather than behaviors which were directly observed. That is, category "prototype" behaviors are recalled which may or may not be closely related to behaviors that were actually observed. Accuracy of rating will then depend on: 1) the degree to which the rater's existing categories (called implicit theories of personality) reflect genuine levels of potential ratee performance, and 2) whether the ratee was assigned to the proper category.

l Extensive empirical research (see Murphy, Martin, & Garcia,1982, for an i

overview) has supported this model.

Providing ratings for an entire group or team is likely to involve a very similar process. However, the complexity of observing, encoding, and storing information on the performance of multiple persons should increase the reliance on categories and decrease the ability of the rater to retain specific details of individual or group performance in memory.

Regarding the cognitive processes involved in making Behavioral Frequency-type ratings, Murphy et al. (1982) argued that to recall the frequency of occurrence of specific behaviors observed over more than a relatively short period of time would involve a tremendous amount of information processing capability. However, given the limited ability of humans to process information, they felt that raters of frequency must also rely on categories and implicit personality theories, just as was the case for making BARS-like ratings. Murphy's empirical research provided support for this hypothesis.

He concluded that Behavioral Frequency-type scales are simply "disguised measures of traitlike performance" and that, as the actual memory of behavior fades, the frequency rating process shifts to an inferential task that essentially does not differ from that of ratings requiring complex judgments. Other writers, such as Nathan and Alexander (1985), have reached similar conclusions. Therefore, given the necessity in the present study of recalling numerous behaviors of multiple crew members over a one- to two-hour scenario, it is likely that raters were forced to rely on some form of implicit categorization process.

If similar cognitive processes were involved in providing both types of ratings, one must wonder why the Behavioral Frequency ratings fared so poorly in comparison to the BARS. For example, in the Murphy et al. (1982) study the behavioral frequency-type ratings converged with the BARS-type ratings. The answer probably relates to the number of ratings required in each format. A total of six ratings were needed to complete the BARS, while 26 were needed for the Behavioral Frequency ratings. Gaugler and Thornton (1989), in a study of assessment center ratings, found that the greatest rater accuracy resulted when three to six dimensions were used, with decreasing accuracy for larger numbers of dimensions. It is likely that in the present study the raters were able to provide the six BARS ratings with little difficulty and a reasonable degree of accuracy. However, the cognitive demands involved in making the 26 distinct Behavioral Frequency ratings may have been great enough to result in greater error and decreased accuracy.

Future research on team interaction skills ratings might attempt to address the rating process question directly, perhaps by including a very short form of the Behavioral Frequency instrument or by alternating the use of several forms with varying numbers of items. The results could be used to assess the degree to which the Behavioral Frequency ratings might converge with the BARS if the number of items is kept to a manageable limit.


5.3 METHODOLOGICAL CONSIDERATIONS

Although the research methodology used in the present study represented an improvement over the previous research in several ways (e.g., the scale development process, extended rater training, use of a training videotape, and statistical approaches), there are several ways in which the study might be further strengthened.

First, as mentioned above, the training videotape allowed only a single order of scenario presentations. Future research should provide for the development of several tapes, each presenting the scenarios in a different order, with the tape to be used in training selected at random. The use of randomly ordered tapes would eliminate the possibility of order effects in assessing the improvement of rating errors over time.

In addition, the expert ratings of the performance depicted on the tapes could be obtained more rigorously.

Here, the research team member responsible for developing the tapes and conducting the training provided the expert ratings. A potentially better process would involve several persons closely connected with the research team who are familiar with both the scales and nuclear power plant operation. True scores would then be based on the mean ratings of the expert raters or on consensus ratings.
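The mean-based true-score construction described above amounts to simple averaging across expert raters. A minimal sketch, with invented example values and an assumed data layout (one rating vector per expert, one entry per dimension):

```python
def true_scores_from_experts(expert_ratings):
    """Average several experts' ratings into one true-score profile.

    expert_ratings: list of rating vectors, one per expert, each giving
    that expert's rating on every dimension for a given scenario.
    """
    n_experts = len(expert_ratings)
    n_dims = len(expert_ratings[0])
    return [
        sum(expert[j] for expert in expert_ratings) / n_experts
        for j in range(n_dims)
    ]
```

A consensus-based alternative would replace the mean with a discussion-to-agreement procedure; the averaging version has the advantage of being mechanical and reproducible.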

Further, increasing the length of the videotapes would also be helpful.

Each of the existing scenarios is of less than five minutes duration.

Longer tapes would allow a greater quantity of information about team interaction skills to be included in each scenario.

The study was also hampered by characteristics of the data collected during observations of simulator performance.

For example, the simulator-based ratings reflected very high positive levels of performance by the control room crews.

Ideally, future research should include crews, perhaps from a variety of different sites, in which more variability in team interaction skills and performance capabilities could be obtained. With less restriction of range in the data, some of the complexities of data analysis encountered here could be alleviated.

For example, more conventional measures of interrater reliability could be used, and greater faith could be placed in the meaning of the correlation coefficients and analysis of variance outcomes.
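One agreement index that remains usable under restriction of range is the within-group coefficient r_wg of James, Demaree, and Wolf (1984), cited in the references, which compares observed rating variance with the variance expected from random responding on the scale. The following is a sketch with invented example values, not the study's actual analysis code:

```python
def rwg(ratings, scale_points=7):
    """Within-group interrater agreement r_wg (James, Demaree & Wolf, 1984).

    ratings: one group's ratings of the same target on a single item.
    Compares the observed sample variance with the variance expected
    from a uniform (random-responding) null distribution on an
    A-point scale: (A**2 - 1) / 12, which is 4.0 for a 7-point scale.
    """
    n = len(ratings)
    mean = sum(ratings) / n
    s2 = sum((x - mean) ** 2 for x in ratings) / (n - 1)  # sample variance
    sigma2_null = (scale_points ** 2 - 1) / 12.0
    return 1.0 - s2 / sigma2_null
```

Perfect agreement (all raters giving the same value) yields r_wg = 1.0; agreement no better than uniform random responding yields r_wg near 0.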

5.4 CONCLUSIONS AND RECOMMENDATIONS

Measures of team interaction skills of nuclear power plant control room operators were developed and evaluated in this study. The development process was extensive and rigorous, covering more than a two-year period, and was based on input from contract license examiners, training instructors, and control room operators working with the research team.

Data collected with the initial instruments, both qualitative and quantitative, were used to revise the instruments and begin a second round of data collection and evaluation. The development process helped ensure that the rating instruments provided broad and representative coverage of the domain of team interaction skills (i.e., ensuring content validity) and that the language and wording used were consistent with the needs of potential users.

The data collected in the second year of the study demonstrated, for the BARS ratings at least, good interrater reliability and internal consistency. Additional validity evidence was obtained in the training ratings, where raters were able to differentiate varying levels of team interaction skills.
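Internal consistency of the kind reported here is conventionally indexed by coefficient alpha (see Nunnally, 1978, in the references). A minimal sketch of the computation, with invented scores and an assumed layout (one list of scores per item, columns aligned across ratees):

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for internal consistency.

    item_scores: list of items, each a list of scores across ratees
    (item_scores[i][r] is ratee r's score on item i).
    """
    k = len(item_scores)
    n = len(item_scores[0])

    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[r] for item in item_scores) for r in range(n)]
    item_var_sum = sum(var(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / var(totals))
```

Parallel items that rank ratees identically drive alpha toward 1.0; items that share little variance drive it toward 0.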

While further research may contribute additional knowledge about the rating scales, and should be actively pursued, it should be concluded that sufficient support for the validity of the scales has been demonstrated to warrant their usage in an applied setting. The use of the BARS rating instrument must, however, be linked with a training program that includes the features of the training employed in the current study. These features include explanation of the dimensions of team interaction skills, exposure to videotaped scenarios of different levels of team interaction skills performance, multiple opportunities to provide ratings, and multiple opportunities to discuss and receive feedback on the ratings.


Several potential applications are apparent for the scales developed here, including the assessment of team interaction skills as a part of 1) the control room crew requalification examinations, and 2) the team training and development process. For the former purpose, serious attention must be given to standardization of rater training, selection of appropriate simulator scenarios, interrater agreement, and periodic retraining. For the latter purpose, the emphasis would need to be more on the feedback process, on identifying ways of improving team behaviors, and on monitoring improvements in performance.

An additional use of the team interaction skills measures is for further research into the nature and optimum functioning of control room crews. For example, several writers (e.g., Gladstein, 1984) have stressed the importance of team interaction skills for team outcomes and performance from a theoretical perspective. It is now possible to empirically examine the linkages of team interaction skills with team productivity and performance. As computer-generated measures of simulator or power plant performance become available, the strength of the relationship between team interactions and various objective measures of plant performance may also be investigated.

New approaches to team training, involving immediate feedback on both team interaction skills and simulator outcome measures, may be developed as a result. The measures of team interaction skills developed in this study may prove useful for the purposes of control room crew evaluation, for development purposes, and for further research designed to increase our understanding of the importance of team interaction skills to control room crew performance and objective plant outcomes.


6.0 REFERENCES

Brannick, M. T., & Salas, E. (1991). Understanding Team Performance: A Multimethod Study. Orlando, Florida: Naval Training Center.

Cannon-Bowers, J. A., & Salas, E. (1990). Cognitive Psychology and Team Training: Shared Mental Models in Complex Systems. Paper presented at the Society for Industrial/Organizational Psychology annual meeting, St. Louis, MO.

Cronbach, L. J. (1955). Processes affecting scores on "understanding of others" and "assumed similarity." Psychological Bulletin, 52, 177-193.

Davis, L. T., Gaddy, C. D., & Turney, J. R. (1985). An Approach to Training of Nuclear Power Plant Control Room Crews. NUREG/CR-4258, U.S. Nuclear Regulatory Commission, Washington, D.C.

Dyer, J. L. (1984). Team research and team training: A state-of-the-art review. Human Factors Review, 285-323.

Gaddy, C. D., & Wachtel, J. A. (1992). Team interaction skills training in the nuclear power industry. In Teams: Their Training and Performance, R. W. Swezey and E. Salas (Eds.), Norwood, New Jersey: Ablex Publishing.

Gorsuch, R. L. (1974). Factor Analysis. Philadelphia: W. B. Saunders Co.

Institute of Nuclear Power Operations (INPO). (1988). Guideline for Teamwork and Diagnostic Skill Development. INPO 88-003, Atlanta, Georgia.

James, L. R., Demaree, R. G., & Wolf, G. (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology, 69(1), 85-98.

Montgomery, J. C., Gaddy, C. D., Holmes, C. W., Seaver, D. A., Hauth, J. T., Spurgin, A. J., & Beare, A. N. (1990). Team Skills Evaluation Criteria for Nuclear Power Plant Control Room Crews (PNL-7250). Richland, WA: Battelle Pacific Northwest Laboratories.

Murphy, K. R., Martin, C., & Garcia, M. (1982). Do behavioral observation scales measure observation? Journal of Applied Psychology, 67(5), 562-567.

Newman, L. (1989). Building teamwork in the control room. Nuclear News, 32(7), 56-57.

Nunnally, J. C. (1978). Psychometric Theory. New York: McGraw-Hill.

Oser, R., McCallum, G. A., Salas, E., & Morgan, B. B. (1989). Toward a Definition of Teamwork: An Analysis of Critical Team Behaviors. NTSC TR-004. Orlando, Florida: Naval Training Systems Center.

Pulakos, E. D. (1984). A comparison of rater training programs: Error training and accuracy training. Journal of Applied Psychology, 69(4), 581-588.

Pulakos, E. D., & Borman, W. C. (1986). Rater orientation and training. In E. D. Pulakos and W. C. Borman (Eds.), Development and Field Test of Army-Wide Rating Scales and the Rater Orientation and Training Program. Minneapolis, Minnesota: Personnel Decisions Research Institute.

Shiraki, C. (1989). Operator Licensing Examiner Standards. NUREG-1021, Revision No. 5. Washington, D.C.: U.S. Nuclear Regulatory Commission.

Topmiller, D. A., Burgy, D. C., Roth, D. R., Doyle, P. A., & Espey, J. J. (1981). Survey and Analysis of Communication Problems in Nuclear Power Plants. NP-2035, Palo Alto, California: Electric Power Research Institute.

RATER ORIENTATION GUIDEBOOK

TEAM INTERACTIONS SKILLS STUDY

ALL INFORMATION OBTAINED IN THIS STUDY WILL BE USED FOR RESEARCH PURPOSES ONLY. ALL INFORMATION WILL BE HELD IN STRICTEST CONFIDENCE.


Introduction

We would like to thank you for participating in the Team Interactions Skills Study. This study is being conducted by the Battelle Pacific Northwest Laboratory (PNL), General Physics Corporation, and the Battelle Human Affairs Research Centers. Our research objective is to assist the Nuclear Regulatory Commission (NRC) in identifying and developing valid and reliable criteria for evaluating team interactions skills in nuclear power plant operations. To reach the NRC's overall objective, we will attempt to determine whether or not team interactions skills can be evaluated consistently and accurately, and whether or not these skills are related to safe nuclear plant operations. To help us reach these goals, you have received this guidebook, and you will participate in a one-day training program that will familiarize you with the team interactions skills rating scales that were developed for this study. After this training, you will use the scales to record your observations of several control room crews performing emergency operating procedures in a simulator environment.

In addition to the team interactions skills rating scales that are featured in this guidebook, and that will be discussed extensively in the training program, another set of scales is described in this guidebook. This set is the team technical performance scales. These scales will not be discussed in the training program, however. They are concerned with the technical performance of the control room team: how well the team solves the problems they face, and their ability to reduce the danger that each emergency presents to the plant. NRC license examiners and contract examiners use scales similar to these on a regular basis to evaluate control room teams' technical performance. Thus, these scales only require a short review in this guidebook.

It is important to note that good team interactions skills and good technical performance do not always go together. For some teams, a high amount of experience and technical competence, relative to other crews, can compensate for poor team interactions skills. Conversely, for some other teams, a lack of experience and technical competence, relative to other teams, may be compensated for, at least in part, by good team interactions skills.

This guidebook was developed to help you make more consistent and accurate ratings. It is designed to introduce you to the research objectives and your role in the project. The guidebook is also designed to familiarize you with the team interactions skills dimensions, the scales themselves, potential rating errors, and the technical performance scales.

Team Interaction Skill Rating Scale Dimensions

The scale dimensions described below were the product of an extensive research process. An initial version of the team interaction rating scales was developed by a group of researchers familiar with nuclear control room operations. This version of the scales was used by a group of contract examiners and project staff. Based on their review and the statistical analyses of the data collected in the initial effort, the scales were modified through an iterative process. The feedback from the original set of raters was integrated by a group of project staff members. This preliminary version of the present scales was submitted to the entire project team and several contract examiners for their comments. Based on these comments, further revisions were executed. This iterative process continued until a consensus about the dimensions and their definitions was reached among the project staff and participating contract examiners.

On the following pages you will find the team interaction rating scale dimensions. Also included are the definitions and illustrative examples of the major characteristics of each dimension. Please read these carefully. Doing so will help project staff members answer questions and clear up any differences in understanding that may exist regarding these dimensions. It is critical to the success of the study that these dimensions be thoroughly understood before the actual ratings are made.


COMMUNICATIONS

[Illustration: two cartoon panels contrasting factual information (example: "The sky is falling") with emotional information.]

The Communications dimension consists of the transmission of factual information in a clear and concise manner. In terms of crew behaviors this includes listening skills, nonverbal behavior, and articulation of plant status or instructions for future activities. Communications does not include emotional aspects of information transmission.


TASK COORDINATION

[Illustration: cartoon of crew members distributing tasks among themselves.]

Task Coordination refers to the crew members' ability to match the available resources, such as people and procedures, to the task in order to achieve the optimal workload distribution.


OPENNESS

[Illustration: cartoon of crew members exchanging suggestions ("I think we should do X because..." — "I don't think X will work.").]

Openness consists of crew members' tendency to ask for, give, and receive suggestions. It includes questioning decisions and discussing alternatives to arrive at the best possible decision. Openness also involves the reactions of crew members to feedback, which should focus on the task rather than on the person when reviewing actions.


TEAM SPIRIT

[Illustration: cartoon of crew members working together toward a common goal.]

Team Spirit consists of the mutual support, cohesiveness, group identity, and extra effort crew members exhibit to accomplish a common goal.


MAINTAINING TASK FOCUS IN TRANSITIONS

[Illustration: two cartoon panels — task-focused ("We just had a scram." "Procedures say...") and non-task-focused ("I hate working with people who dress funny." "Isn't the weather nice?").]

Maintaining Task Focus in Transitions deals with crew members' responses to changes from normal to non-normal plant conditions (e.g., loss of pressure in a feedwater pump). These responses include focusing on the task, avoiding emotional overreaction or panic, and maintaining poise.


ADAPTABILITY

[Illustration: cartoon of one crew member offering to take over another's task ("Will you let me take over for you while you switch to this task?").]

Adaptability reflects crew members' ability to adjust or modify their behavior based on the situation, to be flexible in responding to the environment, and to recognize the need for change.


Team Interactions Skills Rating Scales

Two types of team interactions skills rating scales were developed. These two types of scales are called "Behavioral Frequency Rating Scales" and "Behaviorally Anchored Rating Scales." The former is comprised of a number of scale items for each dimension, for which you will give ratings. The latter is comprised of a single, global rating item for each dimension. A more detailed description of the scales is found below.

The Behavioral Frequency Rating Scales were designed to evaluate the number of times the crew behaves in a certain way during control room operations; that is, what type of behavior is displayed and how often it is displayed in a particular situation. These rating scales are designed to measure the relative frequency of certain behaviors that take place during the scenario.

The Behaviorally Anchored Rating Scales were designed to provide a way to express how the crews perform overall on the team interactions scale dimensions. These items are different from the others in that they are not designed to look at the frequency of individual behaviors; rather, they are a cumulative look at how the crews perform on the dimensions based on the behavioral anchors provided.

In the pages that follow, you will find the team interaction skills rating scale items. Each type of scale item, Behavioral Frequency and Behaviorally Anchored, is preceded by some general guidelines for making the ratings. These guidelines include information about the scale items and the type of scale.


4 GUIDELINES FOR MAKING BEHAVIORAL FRE00ENCY RATINGS General Guidelines Consider how crew members interact.

Focus on the crew as a team.

m LJ Avoid making ratings based on a single crew member.

Facts about the Behavioral Frecuency Ratines Scales A 7-point scale is used.

r u tly crew behaves in a For these ratings you will consider how certain manner.

For each scale dimension, you will..ak several (abou 4 different

~

ratings.

Guidelines for Makino Ratinos Carefully read the definition and it sf each team interaction skills dimension.

rm L*'

If a crew always behav in the m,

dest ed in the statement, 7.

then assign the crew a at i

r If a crew never behaved 1 th

.n rs tribed in the statement, the-assign the crew a rating o 1.

Record your r ng the bl k

'ne provided to the left of the statement.

Complete 11 he b haviora quency ratings before moving on to the next ra ng s 1.

Example of the 7-Po1 + Beh ioral Frecuency Ratino Scale

{

\\

2b 3

4 5

6 7

ever Seid m Sometimes About Often Very Always T_j Malf the Frequently

'~-

Time C

M

~

0.11 I

f3 t

1 2

3 4

5 6

7 Never Seldom Sometimes About Often very Always g

Half the Frequently l

Time r-C0m VNICATIONS Crew members:

g.

C

~

Fail to keep everyone fully infomed when c ng s in plant status occur.

r b.

Provide specific, precise instructions.

s Give specific clear responses to que ti s or requ s.

Acknowledge the receipt of inforr ti (e..

nodding, hand gesture, or verbal response).

OPENNESS

{~

Crew Members:

w Focus comments (feedba ) entire task p fomance.

}

N Accept constructiva cri "cis con e o

oposed actions or decisions p siti ely - thank the giver).

(e.g., express agreement rr React in defensive manner t ins ructions or feedback (e.g., respond i

k"~

+

with sarcastic or pers nal attack).

j

.n Openly oppo e ci ion

  • ey do t agree with.

Particip ei th d

" n-m-i.ng process.

l N

TASK COORDINATION l

p, Crew memb s:

t g,

Di cuss the eq ence of actions to complete tasks.

P Ass t other w b are overloaded with work.

  • u i

Re., and ck to requests for assistance from others.

Emphasi e task to detriment of others.

E 1

1 0.12

t i

~

I 2

3 4

5 6

7 Never Seldom Sometimes About Often Very Always k

Half the Frequently Time TEAM SPIRIT f

Crew members:

m I

Q Group atmosphere is clearly positive.

Refer to each other by name when asking f h

or giving direction.

l Provide verbal support to one another.

Volunteer to perfom more than the

.c sary minim tas..

Are missing the team spirit elema t.

MAINTAINING TASK FOCUS ?1 TRANSITIONS Crew members:

Become emotional in stre >

conditions

.g, use expletives in comunication to other, interru n count suggestion)

M Wait until others are f is d sp ak' b ore replying.

I Make comments or actions t at el' ve tension and re-focus effort on a

the task.

~

Remain calm arin th most cr'ti l' moments of an emergency situation.

ADAPTABILITY Definition: Adapta ~ ity flects crew members' ability to adjust or modify their beh ior ed on the situation, to be flexible in ing t the vironment, and to recognize the need for Crew r mb s:

Ma rapid ajjuetments in work assignment when required by changing lant ndifio Disc the eed to change based on the developing situation.

Demonstra e flexibility in dealing with an unexpected or unfamiliar situation.

Consider only one option when the unexpected ' occurs.

k D.D

GUIDELINES FOR MAKING BEHAVIORALLY ANCHORED RATINGS General Guidelines N

Consider how crew members interact.

Focus on the crew as a team.

Avoid making ratings Dased on a single crew member.

Facts about the Behavioral Frecuency Ratinos Scales A 7-point scale is used.

For these ratings you will consider how c>

beha s on a given dimension.

Guidelines for Makina Ratinos Carefully read the anchors for ach a

in eraction skills dimension.

If the crew always behaved in the m er scribed in the statement, then circle the number above that rati i

i If the crew did not beh e in act man described in the anchor, circle the mos ap riate nu..~

between the two anchors Q

that best describe the ew' be, vior.

member, for each dimension a number on the scale mus be ci c1 1

Examole of the 7-Point h iereliv cho ed Ratino Scale Low r

High v

w j

2 2

4 S

6 7

Low Anchor erage Anchor High Anchor 4

4 4

4

+

0.14

.t

i BARS -- COMMUNICATIONS LOW AVERAGE HIGH l

1 2

3 4

5 6

7 LOW AVERAGE HIGH 0

Provide insufficient Provide plant status aintain constant information about updates to one anoth-awareness of plant f

plant status and plans er; generally appear

.atus and plans for for stabilizing the aware of plans for st ilizing the plant.

v pl ant.

Crew members stabilizing the pia rew mbers transmit are difficult to Crew members tra s, t

fa ual formation in understand when factual inform-io a el d concise transmitting f actual that is mostl ci ar manner.

Communica-i information. They and concise bu+ oct tions including fac-

+

r seldom acknowledge the sionally i di u

tual information are i

L receipt of factual to understan Con-always verbally or information.

Communi-munications inv vin nonverbally acknowl-cations include a high f actual informatio edged by recipients proportion of non-are lly or non-(e.g., "I understand,"

{

edged or waving a hand or task-relevant informa-ver ally at tion.

b. rec' 4 nts abo making the "OK" sign).

i hal of +he

'me M

j

?

'~

BARS -- Ooenness l

LOW A ' ArE HIGH v/ v i

1 2

3 4

5 6

7 LO" AVERAGE HIGH Crew,,em rs sug-ask-focused sugges-Crew members request r-1 ces ns o feedbac 1,ans or feedback are suggestions and feed-k)

Inclu ers 1

p.ovided, but seldom back.

Responses are reference C

request such informa-task-focused.

Crew members rec e sug-tion.

Crew members' members receive suo-h gestions or fe-a reactions to feedback gestions and feedback by regularly inte -

are mostly positive, in a positive manner rupting or replying but occasionally may (e.g.,.thank the sarcastically.

be negative (e.g.,

sender).

"Get off my case").

11 I

C.lf EyJ

l 1

BARS -- TASK COORDINATION LOW AVERAGE HIGH l

l 1

2 3

4 5

6 7

g 8

LOW AVERAGE HIGH R'esources within the Staff and resources rew members use control room are within the control available and appro-allocated without room are used effec-iate resources both considering the task.

tively most of the wit n and outside the Consult procedures, time.

Future needs gntr room.

Crew but do not rely on are neither anti me... ars nsult pro-them to guide pated nor consi re.

cedur vien neces-sary.

rrectly responses.

anticipate future

"'*d' ""d

'ct'"

E BARS -- TEAM SPIRIT HIGH LOW AV K g

l l

~

1 2

3 4

5 6

7 LOW AVER.'G H]GH Crew members seem Cra niembe ; hesitate Actively and willingly he in each other.

cooperate in crew unable to recogn a

when another crew Verbal and nonverbal activities. Verbal L

member needs assis-port for crew mem-and nonverbal support tance. Verb ~

on-ber present only for team members is verbal su.

rt for r' g normal oper-provided during normal crew mem rs '

, dem at ng conditions.

operating and emer-

[,-]

gency conaition's exprese d ider any condi on. Disagre (e.g., "That's okay, M

ment re t unre-we can take care of this," or " Good work, l,j solved are ' nor I needed that.").

D.16 f~]

L-

BARS -- MAINTAINING TASK FOCUS IN TRANSITIONS LOW AVERAGE HIGH I

2 3

4 5

6 7

LOW AVERAGE HIGH Crew members express Crew members tend to ben a novel or un-anger or frustration wait and adapt slowly usual event occurs, to each other when to plant conditions.

c w members discuss novel or unusual Options are discus d.

opt s calmly, thor-conditions occur.

Crew members expr es

' hly, and rapidly.

some frustratio, i Voi s re ain the same is not directe a as whe ormal condi-another crew..em r.

tions occur (calmly).

Anger, frustration, or tension cannot be detected.

BA'.3 -- ADAPTABILITY LOW ER HIGH i

1 2

3 4

5 6

7 LW AV. AGE HIGH After a change in.

.,.fter a change in After a change in

'8 plant conditions, crew p nt conditions, crew plant conditions, crew.

members occ v..'

mem. rs may recognize members immediately recogniz he need t need to change, recognize the need for

change, ri itie

.ay often change prior-change and rapidly or may no change, a d iths slowly, and shifts priorities to some ork ssignmenti change work assign-reflect changing and g

event lly

'ange, b't ments only after a rapid adjustments in 5

others not.

significant period of work assignments.

time elapses.

2 in D.17 h-r

l I

Team Technical Performance Rating Scales The Team Technical Perfonnante D.ating Scales were initially designed to be used during the trial simulator examination component of requalification g-examinations.

In keepi.7 wiin the purpose of the study, these scales are

[

e geared toward evaluating the crew as a whole, rather than individual operators. Specifically, these rating scales are designed to measure team r

technical skills in the areas of understanding / interpretation of L_J annunciator / alarm signals, diagnosis of events / conditions based on signals / readings, understanding of plant systems' responses, compliance /use of l

procedures and technical specifications, and control b operations.

E.

Like with the other rating scales, the Team Tech ca Derformance Rating Scale items are preceded by a set of guidelines fo mak these ratings.

It is suggested that you note the differences betwe t

e sb les and the others used for this study (e.g., team technical perfo a es les ems are rated on 3-point scales, while the others use 7-poi s les).

I E

E va a

.>c 3

i E

E.

E E

u D.18 5

GUIDELINES FOR MAKING TEAM TECHNICAL PERFORMANCE RATINGS 1

General Guidelines

[-1 d

Consider how crew members interact.

Focus on the crew as a team.

Avoid making ratings based on a single crew member.

p Facts about the Behavioral Frecuency Ratinas Scales A 3-point scale is used.

This rating scale differs from the prev ou tin scales.

For these ratings, you will consider technical s T pe rma e of the crew; that is, how technically proficient crew appe s to e.

1 Guidelines for Makina Ratinos Read each performance dimensio de i on nd statement carefully.

+

of he crew, first decide whicn knen rating the technical performa a

g level of performance most closely mat es team's performance for 3

the particular dimension Circle the number on er

'ng sca e have chosen.

g Work through the rating cal sq. e.

After completing the ratin scal, please review all the rating scales g

a for accuracy a d mpletenes.

2 i

l k

s 8

D.19

j N

UNDERSTANDING / INTERPRETATION OF ANNUNCIATOR / ALARM SIGNALS i

Did the crew:

(a)

NOTICE and ACKNOWLEDGE alarms, and ATTEND T0 alarms in order of their importance/ severity?

3 2

1 All alams that directly Minor awareness or ailed to notice and/or l

related to significant response difficulties e remely slow at i

changes in plant or lapser.

sponding to signif1-conditions were noted, ant alarms at critical t as; easily distracted g

y n 'sance alarms.

(b) Correctly INTERPRET the meaning and significance of alarms and annunciators (including the use of the Alarm Response Procedures, as applicable)?

3 - Crew readily determined what failures/events alarms were indicating.
2 - Minor inaccuracies in alarm interpretation, but without safety-related consequences.
1 - Significant misinterpretations, resulting in plant degradation.

(c) VERIFY that annunciator/alarm signals were consistent with plant/system conditions?

3 - All necessary verifications performed, including the identification of erroneous alarms.
2 - Minor lapses in alarm verification, but no inappropriate actions taken as a result of inadequate verification.
1 - Verification of failed systems was poor or altogether absent.

D.20

DIAGNOSIS OF EVENTS/CONDITIONS BASED ON SIGNALS/READINGS

DID THE CREW:

(a) RECOGNIZE off-normal trends/status?

3 - Timely and accurate recognition of trends even prior to alarms.
2 - Recognition of trends at time of, but not prior to, sounding of alarms.
1 - Failed to recognize trends, even after sounding of alarms and annunciators.

(b) USE INFORMATION and use REFERENCE MATERIAL (prints, books, charts) to aid in the diagnosis/classification of events/conditions?

3 - Correct, timely use of information and reference material led to accurate diagnoses.
2 - Minor errors by crew in use or interpretation of information and reference material.
1 - Failure to use reference material; misuse/misinterpretation of information resulted in improper diagnoses.


(c) Correctly DIAGNOSE plant conditions based on control room indications?

3 - Diagnoses by crew were accurate and timely.
2 - Minor errors or difficulties in diagnoses.
1 - Faulty diagnoses resulted in incorrect control manipulations.

D.21

UNDERSTANDING OF PLANT/SYSTEMS RESPONSE

DID THE CREW:

(a) LOCATE and INTERPRET control room indicators correctly and efficiently to ascertain and verify the status/operation of plant systems?

3 - Accurate and efficient instrument location and interpretation by all crew members.
2 - Minor errors in locating or interpreting instruments and displays; some crew members required assistance.
1 - Serious omissions, delays, or inaccuracies made in instrument location and interpretation.


(b) Demonstrate an UNDERSTANDING of how the plant, systems, and components operate, including setpoints, interlocks, and automatic actions?

3 - All crew members demonstrated thorough understanding of how systems/components operate.
2 - Minor instances of errors due to gaps in crew knowledge of system/component operation; some crew members required assistance.
1 - Inadequate knowledge of system/component operation resulted in serious mistakes or plant degradations.

(c) Demonstrate an understanding of how their ACTIONS (or inaction) affected system/plant conditions?

3 - All members understood the effect that actions or directives had on plant/system conditions.
2 - Actions or directives indicated minor inaccuracies in understanding by individuals; actions were corrected by the team.
1 - Crew appeared to act without knowledge of or disregard for effect on plant.

D.22

COMPLIANCE/USE OF PROCEDURES AND TECHNICAL SPECIFICATIONS

DID THE CREW:

(a) REFER TO the appropriate procedures in a timely manner?

3 - Crew used procedures as required; knew what conditions were covered by procedures and where.
2 - Minor failures by crew to refer to procedures without prompting, but did not affect plant status.
1 - Failed to correctly refer to procedures when required, resulting in faulty system operation.

(b) CORRECTLY IMPLEMENT procedures, including following procedural steps in correct sequence, abiding by cautions and limitations, selecting correct paths on decision blocks, and correctly transitioning between procedures?

3 - Timely, accurate enactment of procedural steps by crew, demonstrating thorough understanding of procedural purposes/bases.
2 - Minor instances of misapplication, but corrections made in sufficient time to avoid adverse impact.
1 - Important procedural steps were not enacted correctly, which led to impeded and/or slow recovery or unnecessary degradation.

(c) RECOGNIZE EOP ENTRY CONDITIONS and carry out appropriate immediate actions without the aid of references or other forms of assistance?

3 - Consistently accurate and timely recognition and implementation.
2 - Minor lapses or errors; individual crew members needed assistance from the team to implement procedures.
1 - Failed to accurately recognize conditions or execute actions, even with use of aids.


(continued on next page)

D.23

Compliance/Use of Procedures and Technical Specifications (cont'd)

(d) CORRECTLY RECOGNIZE and COMPLY with Technical Specifications and Action Statements of LCOs?

3 - Recognized and fully complied with LCOs/Action Statements.
2 - Minor difficulties in referring to and/or applying Tech Specs; crew had to prompt SRO on TS requirements.
1 - Failure to recognize/comply with Tech Spec LCOs.

D.24

CONTROL BOARD OPERATIONS

DID THE CREW:

(a) LOCATE CONTROLS efficiently and accurately?

3 - Controls and indicators were located without hesitation by individual operators.
2 - Instances of hesitancy/difficulty in locating controls by one or more operators.
1 - Instances of failure to locate controls jeopardized system status.

(b) MANIPULATE CONTROLS in an accurate and timely manner?

3 - Smooth manipulation of plant within controlled parameters.
2 - Minor shortcomings in manipulations, but recovery from errors without causing problems.
1 - Mistakes made in manipulating controls caused system transients and related problems.

(c) Take MANUAL CONTROL of automatic functions, when appropriate?

3 - All operators took control and smoothly operated automatic systems manually, without assistance, thereby averting adverse events.
2 - Minor lapses and/or prompting needed before overriding/operating automatic functions; plant transients were avoided when possible.
1 - Failed to control automatic systems manually, even when ample time and indications existed.

D.25

Common Rating Errors

When making ratings of the type used in this study, several common rating errors may occur. Below you will find a description of four different rating errors.

It is important to avoid these errors when rating the crews in the simulator. The descriptions and examples will be further elaborated in the training seminar.

Halo Effect - Tendency to give the same ratings on all performance dimensions. For instance, raters may continuously give one crew the same rating for all dimensions for a scenario, even though they performed very well in some areas and very poorly in others.

Severity/Leniency - Tendency to give only extreme ratings. On the Team Interactions Rating Scales, this means the rater would tend to only use 1's, 2's, 6's, and 7's for all crews. This is true despite the fact that the probability of every crew being extreme on all dimensions is very remote.

Stereotyping - Tendency to rate the crew based on perceptions of some crew characteristics other than their actual performance. If the crew members dress poorly (e.g., hair disheveled, shirt not tucked in, stains on the pants, etc.), raters may give them lower (or higher, if their dress was exemplary) ratings than their performance would deserve.

One-Incident Performance - Tendency to rate a crew based on a single incident of performance that occurred in the past. For example, the crew members may coordinate tasks well in an initial incident in the scenario, but over time they may not coordinate well at all. The initial incident may influence the raters to give scores on the "Task Coordination" dimension that are too high.

It is important to avoid these errors when making ratings. Therefore, we suggest several guidelines to avoid these errors. These guidelines are:

Consider only the behavior related to a particular performance dimension - do not let performance in one area dictate how you rate a crew in another performance dimension.

Use all points on a rating scale. Do not limit yourself to the extremes.

Base ratings on the actual performance you observed in the scenario.

If you have prior information about the crew, do not let that interfere with the ratings you make following each scenario.

When making your ratings, focus on many positive and negative incidents. Use all relevant information to make your ratings.

When making team interactions skills ratings, separate those ratings from the technical skills ratings. When making team technical performance ratings, disregard the crew's performance in the team interactions skills areas.

D.26