ML20137Y315
| ML20137Y315 | |
|---|---|
| Site: | 05000000, LaSalle |
| Issue date: | 04/29/1985 |
| From: | Graves N ENERGY, INC. |
| To: | |
| Shared Package: | ML082840482 |
| References: | CON-FIN-A-1378, FOIA-85-769, NUDOCS 8603120246 |
HUMAN RELIABILITY RECOVERY METHODOLOGY FOR THE LaSALLE PRA
Sandia National Laboratories
Donnie W. Whitehead
Department 6412
FIN A-1378
DRAFT
April 29, 1985
Norman Graves
ENERGY INCORPORATED
5345 Wyoming NE, Suite 101
Albuquerque, New Mexico 87109
(505) 821-9381
1.0 INTRODUCTION
A nuclear power plant probabilistic risk assessment (PRA) is derived from a
comprehensive plant model which considers initiating events, system faults and accident sequences.
The value of the PRA is a function of the depth to which each of these areas is modeled consistent with the objectives of the study.
Historically, the human reliability contributions to each of these areas have been difficult to assess.
Human errors contribute to initiators, to system and component unavailability, and to recovery probabilities.
The purpose of this study is to develop a method of evaluating the human reliability contribution to recovery probabilities associated with the accident sequence cut sets from the internal events analysis.
The cut sets are the combinations of system and component failures that, if left uncorrected, will result in core damage.
In the context of this effort, recovery includes all actions that can be accomplished to restore a critical system or component before core damage occurs.
This includes actions such as manually changing valve positions, restoring power sources, etc.
Because of time constraints imposed by the onset of core damage, major repair actions such as replacing a pump are not considered.
To accomplish this goal, the method must allow the analyst to begin with the accident sequence cut sets.
From these, he should be able to apply standardized guidelines to determine which failures are potentially recoverable and then to assign recovery probabilities to each failure.
For consistency and utility the method developed must have the following general features:
The model/method must be generic.
The generic model must allow the analyst to identify and evaluate both the importance and magnitude of any plant-specific deviations from the generic model.
The method must accurately predict recovery probabilities by utilizing parameters the analyst can be expected to apply without extensive training or laborious data collection procedures.
The method must consider the maximum time available for the operating team to successfully perform recovery actions.
The method should be testable in a simulator environment, allowing the analyst to compare his plant-specific model against the generic model where possible.
A review of the existing literature in this area indicates that many of the efforts to evaluate human reliability do not apply to recovery.
Because of this, we will develop an approach derived from an understanding of the operator's response during recovery operations as they relate to PRA requirements.
2.0 OVERVIEW OF FACTORS IMPORTANT TO RECOVERY
To establish a method of evaluating the operator's potential for recovering a system or component, we must identify the factors affecting the operating team's performance.
The critical parameters affecting the recovery potential of the operating team and the operating staff are diagnosis and post-diagnosis action.
These are discussed in Sections 2.1 and 2.2.
The nonrecovery probability is then a function of the failure to diagnose the problem and the failure to perform the required actions.
2.1 Diagnosis
Diagnosis is defined in this study as the process by which the operating team recognizes and evaluates an operational situation that must be corrected to prevent core damage.
The two phases, recognition and evaluation, must be accomplished within the period allowed for recovery actions.
Recognition
To successfully recover a critical failure, the operating team must realize that a failure has occurred.
These failures are functional in nature and, if left uncorrected, will result in core damage.
The critical parameters associated with these functional problems are reactor power, containment pressure, reactor vessel level, and reactor pressure.
These do not include routine operational problems such as a turbine trip or a reactor scram.
These occurrences, while they do require operator response and represent an economic cost to the utility, do not necessarily represent a threat to plant safety.
The situations the operator must recognize from a PRA standpoint consist of potential degraded core cooling situations, failures of emergency core cooling systems, and other conditions that adversely affect the critical parameters identified above.
These realizations must be accomplished within the framework of normal operational demands, utilizing standard control room indications such as:
Indicators (chart recorders, digital indicators, analog indicators)
Alarms
Equipment status lights
Tag outs (tags placed on the operating controls of equipment that is out of service for maintenance or repair)
Switch positions
Controller settings (analog control devices for controlling a plant parameter such as turbine speed)
Plant status board
Operating logs
From these indications the operating team must recognize that a functional problem exists.
This involves the acts of perceiving, discriminating, and interpreting as defined in Table 12-1 (attached) of the Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications (NUREG/CR-1278, Oct 1983).
For example, assume the High Pressure Core Spray (HPCS) pump did not start when required.
The operating team initially perceives an alarm or indication that directs their attention to the problem.
The team then discriminates between these and other indications to determine that the low flow alarm for the HPCS system is annunciated.
They must then interpret these data, correlating them to the critical parameter of reactor vessel water level.
In this case the loss of flow will either cause the parameter limit to be exceeded or fail to restore the parameter if it is outside of its operating limits.
The variety of alarms and indications in a nuclear power plant and the operational situations in which they may occur present too many combinations to be evaluated on an individual basis.
A few examples will illustrate the problem and suggest methods of evaluation.
First let us assume a simple situation in which a single alarm notifies the operating team that a low flow condition exists in system X.
In this case the alarm itself is the identification, and we can assume that any team will be able to identify the problem as low flow in system X.
This type of situation is easy for the team to recognize and should be easy for the analyst to evaluate.
Now look at a more difficult situation, one that involves multiple alarms and indications.
This type of situation may or may not be more difficult for the team to recognize.
The degree of difficulty is not necessarily related to the number of alarms and indications the team must observe.
The factor having the most effect upon the team's ability to recognize a problem is their familiarity with the particular pattern of alarms and indications that accompany a given failure.
For example, the simultaneous occurrence of five alarms with which the team is familiar may indicate a given failure as readily as the occurrence of one alarm.
The converse is a situation that gives no, or very few, alarms.
An example of this type of situation would be a flux tilt resulting from an incorrect rod sequence.
The plant could be operated for an extended period without the operating team noticing a problem, particularly if the error resulted in power indication variances between channels similar to those inherent in the instruments.
Evaluating this phase requires the development of recognition difficulty categories with well-defined guidelines for assigning failures to each category.
Initially we would expect these categories to be a function of the type of failure, the directness with which the alarms and indications annunciate the failure, the operator's familiarity with the failure, and the operational situation in which the failure occurred.
Evaluation
During this phase the operating team determines the course of action necessary to cope with the problem.
This does not have to be a detailed analysis of the specific causes of the problem.
The operating team is only required to evaluate the situation to a level that will allow them to decide upon a course of action.
This involves the acts of diagnosis and decision making as defined in the Handbook.
During the evaluation the team will use the same information used in recognition, as well as other sources such as:
Procedures
Engineering constants
Component technical manuals
Plant status board
Tag out log
Advice from other shift personnel (technicians, technical advisors, etc.)
Going back to our earlier example, the team has correctly completed the first step, recognizing the problem.
Before they can initiate corrective action, they must determine the reason for the pump failure.
This is necessary because the cause of the failure may be something the team can manually override to recover the system, or the cause may also have defeated an alternate system normally available for recovery.
In either case the team must obtain additional information that will allow them to decide upon a
course of action.
This involves observing other indications to determine if the failure can be corrected in a reasonable time frame or identifying other systems which are available to accomplish the same purpose.
If there is a standby pump available that will restore flow, the team can immediately start the standby pump.
If no standby pump is available, they can either attempt to determine why the pump is not running, so the problem can be corrected, or select some other alternative, such as switching to a standby system.
Evaluating this phase presents the same difficulties as the recognition phase did.
An example of a simple situation to evaluate would be a low flow alarm resulting from the inadvertent closure of a system valve.
The operating team immediately recognizes the problem, observes the control room indications associated with the alarm, determines valve 2 is closed and reopens it.
The opposite occurs in a situation resulting from a containment high temperature alarm.
This alarm could result from an instrument failure, a malfunction of the area cooling system, a small Loss of Coolant Accident (LOCA) in the vicinity of the detector, a component overheating near the detector, or a fire.
The team cannot identify this problem immediately.
At this point they only know an alarm exists.
Their course of action will depend upon their experience with this type of alarm.
Suppose that in the last several weeks many containment high temperature alarms had been spuriously initiated and the alarm had been informally identified as a
nuisance alarm.
The team may initially give the alarm no attention at all.
Assuming they do take the alarm seriously, they will look at all indications they can reasonably correlate to the alarm in an attempt to determine the cause.
In the case of a small LOCA in the vicinity of the detector, they will probably only be able to eliminate certain causes, leaving others as possibilities until additional information becomes available.
Evaluating this phase requires the development of evaluation difficulty categories similar to those used to categorize the recognition phase.
The factors expected to influence these categories are similar to those identified for the recognition phase.
2.2 Action
The final step in the recovery is performance of the actions necessary to recover the system or component.
In a comprehensive study where all systems have been considered, these actions will consist primarily of manually operating components that have failed to operate automatically, or restoring components or control circuits that were not restored following maintenance.
In studies that have excluded some alternative systems, the actions would consist of initiating these systems.
Of these actions there are two distinct categories:
those actions that can be performed strictly by the operating team, and those that will require the services of support groups.
The reason for this distinction is to account for the longer time frame for performance when a
support group must be included in the evaluation.
3.0 RECOVERY ANALYSIS METHOD
In the preceding discussions we have identified the factors which must be evaluated: diagnosis and action.
For the recovery to be successful, both steps must be completed prior to the onset of core damage.
Before the application of the method is described, the components and definitions associated with this approach will be explained.
The parameters which will be used to determine the nonrecovery probabilities associated with the cut sets are:
Time Factors
Tm - Maximum time in which both phases of the recovery must be completed
Ta - Time to perform the recovery action after successful diagnosis
Td - Time to diagnose the problem
Diagnosis Factors
Dr - Recognition difficulty
De - Evaluation difficulty
Probabilities
Pd - Diagnosis failure probability
Pa - Human error probability associated with performing the required post-diagnosis recovery action
Pr - Nonrecovery probability, $P_r = P_d + P_a$
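The additive rule for Pr can be checked with a short sketch. The Python below is a minimal illustration, assuming Pd and Pa are probabilities for a single recoverable failure and that the action is only attempted if diagnosis succeeds; under that assumption the exact nonrecovery probability is Pd + (1 - Pd)Pa, and the method's Pd + Pa is a slightly conservative upper bound that stays close to it when both terms are small. The function names and input values are hypothetical.

```python
def _check_probability(p: float) -> None:
    """Reject values outside [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")


def nonrecovery_sum(p_d: float, p_a: float) -> float:
    """Pr = Pd + Pa, as written in the method (capped at 1)."""
    _check_probability(p_d)
    _check_probability(p_a)
    return min(1.0, p_d + p_a)


def nonrecovery_exact(p_d: float, p_a: float) -> float:
    """Recovery fails if diagnosis fails, or if diagnosis succeeds and the action fails.

    Assumes the action is attempted only after a successful diagnosis and that
    the two failure events are independent; equals 1 - (1 - Pd)(1 - Pa).
    """
    _check_probability(p_d)
    _check_probability(p_a)
    return p_d + (1.0 - p_d) * p_a


if __name__ == "__main__":
    # Hypothetical inputs, not values from the LaSalle PRA.
    p_d, p_a = 0.05, 0.01
    print(f"sum:   {nonrecovery_sum(p_d, p_a):.4f}")
    print(f"exact: {nonrecovery_exact(p_d, p_a):.4f}")
```

With these placeholder inputs the two forms give 0.0600 and 0.0595, so the simpler sum is adequate at small probabilities; the gap only matters when Pd or Pa is large.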
These parameters will be used with a diagnosis difficulty matrix and a diagnosis probability curve to obtain the recovery probability associated with a failure identified in the accident sequence cut sets.
Diagnosis Difficulty Matrix
In the discussion of the diagnosis phase of the recovery, two critical parameters were identified:
recognition and evaluation.
To evaluate these parameters we have established two identical scales.
Initially these scales will have five divisions categorizing the difficulty of each parameter.
Evaluation of the data from the simulator studies may allow the number of categories to be reduced.
The categories assigned to each parameter are:
Very easy to recognize / evaluate (VE)
Easy to recognize / evaluate (E)
Average ease of recognition / evaluation (A)
Difficult to recognize / evaluate (D)
Very difficult to recognize / evaluate (VD)
Using these categories to form a matrix establishes a method for the analyst to determine the difficulty of the recovery action he is attempting to evaluate.
Guidelines developed from studies of operating team actions observed in a simulator environment and from expert opinion (primarily operator interviews) will assist the analyst in determining where a specific failure should be located on the matrix.
In its final form the matrix will appear as follows:
| Recognition (rows) / Evaluation (columns) | VE | E | A | D | VD |
|---|---|---|---|---|---|
| VD | VD-VE | VD-E | VD-A | VD-D | VD-VD |
| D | D-VE | D-E | D-A | D-D | D-VD |
| A | A-VE | A-E | A-A | A-D | A-VD |
| E | E-VE | E-E | E-A | E-D | E-VD |
| VE | VE-VE | VE-E | VE-A | VE-D | VE-VD |

Each point on the matrix defines a diagnosis difficulty value which identifies a specific diagnosis failure probability curve.
These curves will be derived from time-to-first-correct-response experiments for operational situations corresponding to each value in the matrix.
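As a data structure, the matrix is simply a 5 x 5 grid keyed by the pair (recognition, evaluation). The Python sketch below reproduces the matrix layout and looks up a curve for a given point. The `Difficulty` enumeration, the function names, and storing curves in a dictionary keyed by the pair are illustrative assumptions, since the curves themselves have not yet been derived.

```python
from enum import IntEnum
from typing import Dict, List, Tuple, TypeVar

T = TypeVar("T")


class Difficulty(IntEnum):
    """The five categories shared by the recognition and evaluation scales."""
    VE = 1  # very easy
    E = 2   # easy
    A = 3   # average
    D = 4   # difficult
    VD = 5  # very difficult


def matrix_cell(recognition: Difficulty, evaluation: Difficulty) -> str:
    """Label a matrix point as recognition-evaluation, e.g. 'D-VE'."""
    return f"{recognition.name}-{evaluation.name}"


def matrix_rows() -> List[List[str]]:
    """Rows in matrix order: recognition from VD (top) to VE (bottom),
    evaluation from VE (left) to VD (right)."""
    return [
        [matrix_cell(r, e) for e in Difficulty]
        for r in sorted(Difficulty, reverse=True)
    ]


def select_curve(recognition: Difficulty, evaluation: Difficulty,
                 curves: Dict[Tuple[Difficulty, Difficulty], T]) -> T:
    """Return the curve assigned to a matrix point (hypothetical storage scheme)."""
    key = (recognition, evaluation)
    if key not in curves:
        raise KeyError(f"no curve established yet for {matrix_cell(*key)}")
    return curves[key]


if __name__ == "__main__":
    for row in matrix_rows():
        print("  ".join(f"{cell:>6}" for cell in row))
```

Keeping the two scales as one ordered type makes it easy to collapse categories later, which the text anticipates if the simulator data show fewer than five are needed.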
Diagnosis Probability Curves
Diagnosis failure probability curves will provide the diagnosis failure probability (Pd).
Curves similar to these have been obtained in other studies, such as Criteria for Safety-Related Nuclear Power Plant Operator Actions: Initial Boiling Water Reactor (BWR) Simulator Exercises (ORNL/TM-8195) and Post Event Human Decision Errors: Operator Action Tree/Time Reliability Correlation (BNL-NUREG-51601).
These curves may be applicable to some of the matrix values described above.
Curves for other matrix values will be obtained from simulator experiments to be conducted at the LaSalle simulator.
The ordinate of the graph will be the diagnosis probability derived from statistical analysis of the time to first correct operator action.
The abscissa will be the diagnosis time (Td).
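To make the curve concept concrete, the sketch below evaluates one possible curve shape. It assumes, purely for illustration, that the time to first correct response for a given matrix point is lognormally distributed, so that Pd(Td) is the lognormal survival function; the functional form, the median times, and the spread parameters are hypothetical placeholders, not values from the LaSalle experiments or from the studies cited above.

```python
import math


def diagnosis_failure_probability(t_d: float, median: float, sigma: float) -> float:
    """Pd(Td): probability that the crew has NOT diagnosed the problem by time Td.

    Illustrative assumption: time to first correct response is lognormal with
    the given median (minutes) and log-scale standard deviation sigma.
    """
    if median <= 0.0 or sigma <= 0.0:
        raise ValueError("median and sigma must be positive")
    if t_d <= 0.0:
        return 1.0  # no time available, so diagnosis is assumed to fail
    z = math.log(t_d / median) / sigma
    # Lognormal survival function: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical parameters for two matrix points; not measured values.
    placeholder_params = {"VE-VE": (2.0, 0.6), "VD-VD": (20.0, 1.0)}
    for cell, (median, sigma) in placeholder_params.items():
        for t_d in (5.0, 15.0, 30.0):
            p = diagnosis_failure_probability(t_d, median, sigma)
            print(f"{cell}  Td={t_d:>4.0f} min  Pd={p:.3g}")
```

Harder matrix points would be represented by longer median times, which shifts the whole curve to the right; the actual shape of each curve is what the simulator experiments are meant to establish.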
The general appearance of these curves is shown below.
[Figure: diagnosis failure probability Pd (ordinate, from high to low) versus time (abscissa, from low to high)]
Application
The analyst begins with the accident sequence cut sets derived from the PRA.
These identify the initiating event and the various combinations of system failures that will lead to core damage if corrective actions are not completed.
The flow chart below shows the relationship of the various steps involved in evaluating nonrecovery probabilities.
[Flow chart, read from bottom to top: Accident Sequence Cut Sets; Identify Recoverable Failures; Determine Maximum Time (Tm); Identify Recovery Actions; Determine Action HEP; Determine Action Time (Ta); Calculate Action Failure Probability (Pa); Calculate Diagnosis Time (Td); Determine Diagnosis Difficulty; Select Diagnosis Probability Curve; Determine Diagnosis Failure Probability (Pd); Calculate Nonrecovery Probability]
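The sequence of steps in the flow chart can be sketched as a single function for one recoverable failure. This is a hedged illustration, not the method's implementation: the `RecoverableFailure` record, the category keys, the toy curve in the usage example, and all numbers are hypothetical, and the HEP is used directly as Pa, which is only the simplest reading of "derived from the HEPs".

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Provisional action times Ta (minutes) by action category, from the text.
ACTION_TIME_MIN = {"simple-control-room": 5.0, "simple-local": 15.0, "complex": 30.0}

Curve = Callable[[float], float]  # maps diagnosis time Td (minutes) to Pd


@dataclass
class RecoverableFailure:
    """Hypothetical record for one recoverable failure taken from a cut set."""
    name: str
    t_max: float          # Tm (minutes), from thermal-hydraulic calculations
    action_category: str  # key into ACTION_TIME_MIN
    action_hep: float     # HEP for the post-diagnosis action, used here as Pa
    recognition: str      # VE, E, A, D or VD
    evaluation: str       # VE, E, A, D or VD


def nonrecovery_probability(failure: RecoverableFailure,
                            curves: Dict[Tuple[str, str], Curve]) -> float:
    """Follow the flow chart for one failure and return Pr = Pd + Pa."""
    t_a = ACTION_TIME_MIN[failure.action_category]  # determine action time
    p_a = failure.action_hep                        # action failure probability
    t_d = max(0.0, failure.t_max - t_a)             # Td = Tm - Ta
    curve = curves[(failure.recognition, failure.evaluation)]  # select curve
    p_d = curve(t_d)                                # diagnosis failure probability
    return min(1.0, p_d + p_a)                      # nonrecovery probability


if __name__ == "__main__":
    # Toy step-shaped curve and invented numbers, for illustration only.
    curves = {("E", "E"): lambda t: 0.5 if t < 10.0 else 0.01}
    hpcs = RecoverableFailure("HPCS pump fails to start", 40.0,
                              "simple-control-room", 0.003, "E", "E")
    print(f"{hpcs.name}: Pr = {nonrecovery_probability(hpcs, curves):.3f}")
```

Passing the curves in as plain callables keeps the pipeline independent of how each curve is eventually fitted from simulator data.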
The initial step for the analyst will be to determine which system failures are recoverable and the time the operating team has to accomplish these actions.
Two different categories of this time constraint must be considered by the analyst.
These are component failures that are sequence independent and core or containment states that are sequence dependent.
An example of the first case is the failure of a support system which must be recovered within an allotted time to prevent nonrecoverable damage to a component that is critical to controlling one of the four critical parameters.
An example of the second case is a failure that must be identified and corrected within a time period determined by the state of the core or containment, as defined by the accident sequence.
A more detailed discussion of these considerations is contained in Appendix C of the Interim Reliability Evaluation Program Analysis of the Arkansas Nuclear One - Unit One Power Plant (NUREG/CR-2787).
The time allotted to the recovery effort before the onset of core damage is the maximum time (Tm) and can be obtained from thermal hydraulic calculations.
Having identified a recoverable action, the analyst's next step will be to determine the Human Error Probability (HEP) associated with the action and to categorize the action time (Ta).
For this step in the analysis the action failure probabilities (Pa) will be derived from the HEPs.
The recovery actions are grouped into three categories:
Simple actions performed by an operator in the control room.
Examples of these actions are starting a pump or opening a Motor Operated Valve (MOV) that failed to actuate correctly during a system initiation sequence.
Simple actions performed by an operator outside the control room.
An example of this type of action would be manually opening an MOV locally when the motor had failed.
Complex actions performed in or out of the control room.
Examples of these actions are manually starting a diesel that had failed to auto start or removing a test jumper from an initiation circuit that was not removed following maintenance.
The action times associated with the three categories have been arbitrarily identified at this point as 5 minutes, 15 minutes, and 30 minutes, respectively.
From this the analyst then calculates the diagnosis time (Td), where
$T_d = T_m - T_a$
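The time budget is simple arithmetic. The Python sketch below encodes the three action categories with their provisional action times and computes Td = Tm - Ta. The enumeration names are hypothetical, and clamping a negative result to zero (meaning the action cannot be completed in time even with instant diagnosis) is an assumption not stated in the method.

```python
from enum import Enum


class ActionCategory(Enum):
    """Action categories with their provisional action times Ta, in minutes."""
    SIMPLE_IN_CONTROL_ROOM = 5
    SIMPLE_OUTSIDE_CONTROL_ROOM = 15
    COMPLEX_IN_OR_OUT_OF_CONTROL_ROOM = 30


def diagnosis_time(t_max: float, category: ActionCategory) -> float:
    """Td = Tm - Ta, the time left for diagnosis after reserving time for the action.

    Assumption: if Ta >= Tm the result is clamped to zero, meaning the action
    cannot be completed before core damage even with instant diagnosis.
    """
    return max(0.0, t_max - category.value)


if __name__ == "__main__":
    t_max = 45.0  # hypothetical Tm from thermal-hydraulic calculations
    for category in ActionCategory:
        print(f"{category.name}: Td = {diagnosis_time(t_max, category):.0f} min")
```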
The analyst then categorizes the recovery action according to the difficulty matrix described above.
Guidelines for assigning a recovery action to the matrix have yet to be established.
Initially it is believed that both phases of the matrix will be related to the number and types of indications and the operating team's familiarity with the failure in question.
In addition, the evaluation phase may be affected by additional factors involving the number and types of recoverable failures or alternative actions.
In practice, when the analyst has a simulator available, the cut sets can be run on the simulator to assist in evaluating the matrix value.
In situations where a simulator is not available, expert opinion obtained from the operating staff of the plant under investigation may be used.
The influence of these factors will be investigated during simulator experiments.
When the difficulty matrix value has been determined, the proper response curve is selected to determine the diagnosis failure probability (Pd).
The final step is to calculate the nonrecovery probability, $P_r = P_d + P_a$.
4.0 DATA COLLECTION
Some data applicable to the methodology just outlined already exist.
HEPs for rule-based actions, for example, are fairly well documented.
Some time-to-first-correct-response data have been collected but have not been categorized according to the difficulty matrix described earlier.
Additional response data will have to be collected to establish statistically reliable response curves for each category.
Finally, guidelines for assigning cut set failures to the difficulty matrix must be established.
Data for developing the response curves will be collected at the LaSalle simulator.
This will be accomplished by evaluating operator response to standard training scenarios and to special scenarios developed specifically from cut sets derived from the LaSalle PRA.
An evaluation sheet which will be filled out by the instructor will identify the time it took for the operator's first correct response and will also obtain the instructor's subjective evaluation of the actions, and his assessment of the difficulty of recognizing and evaluating the simulated failure.
This information will be used in establishing the guidelines discussed earlier.
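Once instructor sheets are collected, each difficulty category yields a set of times to first correct response, and an empirical curve follows directly: the estimated nonresponse probability at time t is the fraction of crews that had not yet responded correctly by t. The sketch below assumes crews that never responded correctly are recorded as infinite times; the function name and the observations are hypothetical. If scenarios end at different times, a censored-data estimator such as Kaplan-Meier would be the more appropriate choice.

```python
from bisect import bisect_right
from typing import Sequence


def empirical_nonresponse(times: Sequence[float], t: float) -> float:
    """Estimated probability that the first correct response occurs after time t.

    `times` holds observed times to first correct response (minutes) for one
    difficulty category; crews that never responded correctly are entered as
    float("inf").
    """
    if not times:
        raise ValueError("no observations for this category")
    ordered = sorted(times)
    responded = bisect_right(ordered, t)  # number of crews with time <= t
    return 1.0 - responded / len(ordered)


if __name__ == "__main__":
    # Invented observations for a single matrix point.
    observed = [2.5, 3.0, 4.0, 6.5, 9.0, float("inf")]
    for t in (0.0, 3.0, 5.0, 10.0):
        print(f"t={t:>4} min  Pd~{empirical_nonresponse(observed, t):.2f}")
```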
Additional information necessary to establish effective guidelines for assigning difficulty factors to the recognition and evaluation phases of the diagnosis will be developed from information obtained from the operators.
This information will be collected following each scenario and will obtain the operator's subjective evaluation of the classification of the scenario.
These data will be evaluated by personnel experienced in both PRA and operations to establish categorization guidelines, with examples, for use in future recovery evaluations.
Features expected to influence these guidelines are:
Recognition
Number of alarms
Familiarity of alarm pattern
Operator awareness of the importance of the functional parameter affected by the failure
Types of indications
Correlation between indications
Initiator category
Evaluation
Number of indications
Correlation between indications
Number of alarms
Familiarity of alarm pattern
Number of failures to be evaluated
Number of identified failures which can be recovered
Examples of the operator and instructor questionnaires are attached.
In addition, an operator profile sheet will be completed to establish an operator profile baseline and to determine what correlations, if any, exist between the operator's performance and the factors identified in the profile.
Table 12-1 Definitions of cognition-related terms and usage in the Handbook.
| Term | Dictionary Definition* | Handbook Usage |
|---|---|---|
| Cognition | the act or process of knowing, including both awareness and judgment | restricted to those aspects of behavior involved in diagnosis of abnormal events |
| Judgment | the process of forming an opinion or evaluation by discerning and comparing | not used in our models--too imprecise; used only in the context of expert estimation |
| Perceive | to attain awareness or understanding; to become aware through the senses | used in the very narrow sense of "awareness" without the further meaning of "understanding," e.g., "some annunciator tiles over there are blinking" |
| Discriminate | to mark or perceive the distinguishing or peculiar features of; to distinguish one like object from another | distinguishing one signal (or a set of signals) from another, e.g., "the coolant level in Tank A is 37 feet," or, if there are limit marks on the meter, "the coolant level is out of limits" (in the latter case, some interpretation is done for the operator by the design of the display) |
| Interpret | to conceive in the light of individual belief, judgment, or circumstance | assignment of a meaning to the pattern of signals (or stimuli) that was discriminated, e.g., "the coolant level in Tank A is low, which means that the makeup pump is not running, or there is a leak somewhere, or the indicator is out of order"; if there is only one possible cause for the observed signal, the interpretation is equivalent to diagnosis |
| Diagnosis | a statement or conclusion concerning the nature or cause of some phenomenon | the attributing of the most likely cause(s) of the abnormal event to the level required to identify those systems or components whose status can be changed to reduce or eliminate the problem; diagnosis includes interpretation and (when necessary) decision-making |
| Decide | to make a choice or judgment | "decision-making" used instead of "deciding" |
| Decision-Making | | (1) decision-making as part of diagnosis: the act of choosing between alternative diagnoses, e.g., to settle on the most probable cause of the pattern of stimuli associated with an abnormal event; (2) postdiagnosis decision-making: the act of choosing which actions to carry out after a diagnosis has been made; in most cases, these actions are prescribed by rules or procedures, and decision-making is not required |
| Action | a thing accomplished usually over a period of time, in stages, or with the possibility of repetition | carrying out one or more activities (e.g., steps or tasks) indicated by diagnosis, operating rules, or written procedures |

*Webster (1975)