ML20045B346
| ML20045B346 | |
| Person / Time | |
|---|---|
| Site: | South Texas |
| Issue date: | 06/08/1993 |
| From: | Hehl C, Jordan E NRC OFFICE FOR ANALYSIS & EVALUATION OF OPERATIONAL DATA (AEOD) |
| To: | |
| Shared Package | |
| ML20045B306 | List: |
| References | |
| NUDOCS 9306170216 | |
Text
ENCLOSURE 1
DIAGNOSTIC EVALUATION TEAM REPORT ON SOUTH TEXAS PROJECT ELECTRIC GENERATING STATION (March 29-April 9 and April 26-30, 1993)
SOUTH TEXAS PROJECT ELECTRIC GENERATING STATION
U.S. Nuclear Regulatory Commission
Office for Analysis and Evaluation of Operational Data
Division of Operational Assessment
Diagnostic Evaluation and Incident Investigation Branch
9306170216 930610 PDR ADOCK 05000498 P PDR
OFFICE FOR ANALYSIS AND EVALUATION OF OPERATIONAL DATA
DIVISION OF OPERATIONAL ASSESSMENT
Licensee: Houston Lighting and Power Company
Facility: South Texas Project Electric Generating Station, Units 1 and 2
Location: Matagorda County, Texas
Docket Nos: 50-498, Unit 1; 50-499, Unit 2
Evaluation Period: March 29 - April 30, 1993
Team Manager: C. William Hehl
Administrative Assistant: Michelle Smith
Team Members: Henry Bailey, Bruce Bartlett, Christopher Caldwell, Robert Haag, Ronald Lloyd, Alan Madison, _arry Nicholson, Peter Prescott, Sadanandan Pullani, Walter Rogers, John Thompson
Contractors: John Darby, Brian Haagensen, David Schultz, Frank dsworth
Submitted By: C. William Hehl, Team Manager, South Texas Diagnostic Evaluation Team
Approved By: Edward L. Jordan, Director, Office for Analysis and Evaluation of Operational Data
CONTENTS
Abbreviations
Executive Summary
1.0 INTRODUCTION
1.1 Background
1.2 Scope and Objectives
1.3 Methodology
1.4 Facility Description
1.5 Organization
2.0 EVALUATION RESULTS
2.1 Operations
2.1.1 Marginal Staffing for Scope of Responsibility
2.1.2 Poor Support to Operations
2.1.3 Confusing and Conflicting Management Expectations
2.1.4 Inconsistent Operator Performance
2.1.5 Ineffective Problem Identification and Resolution
2.1.6 Positive Observations
2.2 Maintenance and Testing
2.2.1 Ineffective Corrective Maintenance
2.2.2 Less than Fully Effective Preventive Maintenance Program
2.2.3 Maintenance Training Deficiencies
2.2.4 Deficiencies in the Replacement Parts Program
2.2.5 Insufficient Support to Maintenance
2.2.6 Inefficient Work Control Process
2.2.7 Post-Maintenance Testing Program Not Always Effective
2.2.8 Periodic Testing Not Always Effective
2.2.9 Positive Observations
2.3 Engineering Support
2.3.1 Weak Support in Resolving Plant Problems
2.3.2 System Engineering Program Not Effectively Implemented
2.3.3 Engineering Work Backlogs Were Large, Poorly Tracked, and Not Well Managed
2.3.4 Use of Industry and Site Operational Experience Was Inadequate
2.3.5 Insufficient Support to Engineering
2.3.6 Configuration Control Weaknesses
2.3.7 Functional and Programmatic Weaknesses Could Adversely Affect the Operability of the Essential Chilled Water System
2.3.8 Untimely Resolution of Fire Protection Issues
2.3.9 Positive Observations
2.4 Management and Organization
2.4.1 Ineffective Direction and Oversight
2.4.2 Poor Support and Resources Utilization
2.4.3 Communications and Teamwork Were Weak
2.4.4 Ineffective Corrective Action Process
2.4.5 Ineffective Utilization of Self-assessment and Quality Oversight Functions
2.4.6 Inadequate Information Systems
3.0 ROOT CAUSES
3.1 Failure of Management to Provide Adequate Support
3.2 Ineffective Management Direction and Oversight
3.3 Failure to Effectively Utilize Self-Assessment and Quality Oversight Functions
3.4 Inadequate Root Cause and Ineffective Corrective Action Processes
4.0 EXIT MEETING
Appendix A - Exit Presentation
ABBREVIATIONS
AFW Auxiliary Feedwater
BOP Balance-Of-Plant
CAG Corrective Action Group
CCW Component Cooling Water
CFR Code of Federal Regulations
CH Essential Chilled Water
CRDM Control Rod Drive Mechanism
DBA Design Basis Accident
DBD Design Basis Documentation
DED Design Engineering Department
DET Diagnostic Evaluation Team
DRPI Digital Rod Position Indication
EAP Experimental Activity Proposal
EAB Electrical Auxiliary Building
ECP Essential Cooling Pond
ECW Essential Cooling Water
ESF Engineered Safety Feature
FSAR Final Safety Analysis Report
F Fahrenheit
FP Fire Protection
FCR Field Change Request
FWIV Feedwater Isolation Valve
FW Feedwater
GL Generic Letter
HL&P Houston Lighting and Power
HVAC Heating, Ventilating, and Air Conditioning
HHSI High-Head Safety Injection
ISEG Independent Safety Engineering Group
IST Inservice Testing
LCO Limiting Condition for Operation
LOCA Loss of Coolant Accident
LPSI Low Pressure Safety Injection
LO Lube Oil
MCB Main Control Board
M&TE Maintenance and Testing Equipment
MIS Management Information System
MOD Modification
MOV Motor Operated Valve
MOVATS Motor Operated Valve Analysis and Test System
MOP Master Operating Plan
MSIV Main Steam Isolation Valve
NPRDS Nuclear Plant Reliability Data System
NRC Nuclear Regulatory Commission
NRR Nuclear Reactor Regulation (Office of)
NSRB Nuclear Safety Review Board
OER Operating Experience Review
OJT On-the-Job-Training
OTL Operability Tracking Log
PM Preventive Maintenance
PRA Probabilistic Risk Analysis
PED Plant Engineering Department
PMT Post-Maintenance Testing
PORV Power-Operated Relief Valve
QA Quality Assurance
QC Quality Control
QDPS Quality Display Parameter System
RCA Root Cause Analysis
RCS Reactor Coolant System
RCP Reactor Coolant Pump
1RE04 Unit [1] Refueling [outage] number 04
RES Radiant Energy Shield
RHR Residual Heat Removal (system)
RO Reactor Operator
RPO Reactor Plant Operator
SALP Systematic Assessment of Licensee Performance
SDG Standby Diesel Generator
SG Steam Generator
SGFPT Steam Generator Feed Pump Trip
SI Safety Injection
SOV Solenoid Operated Valve
SPR Station Problem Report
SR Service Request
SSFA Safety System Functional Assessment
SSPS Solid State Protection System
STP South Texas Project
TCV Temperature Control Valve
TDAFW Turbine-Driven Auxiliary Feedwater
TM Temporary Modification
TPNS Total Plant Numbering System
TSC Technical Support Center
TS Technical Specification
TSI Technical Specification Interpretation
VETIP Vendor Equipment Technical Information Program
VDC Volt Direct Current
EXECUTIVE SUMMARY
From March 29 to April 30, 1993, a diagnostic evaluation team from the U.S. Nuclear Regulatory Commission (NRC) evaluated the performance of Houston Lighting and Power in ensuring safe operation of the South Texas Project Electric Generating Station. The evaluation had been requested by the NRC Executive Director for Operations in order to obtain information needed to make an adequately informed decision on overall performance at the South Texas Project. The team of 15 evaluators was led by an NRC manager during the 5-week evaluation. Areas evaluated included operations, maintenance and testing, engineering support, and management and organization. Both units remained shut down throughout the evaluation. Unit 1 was in a forced outage, and Unit 2 was in a refueling outage, both starting in early February 1993.
The team found that the assigned workload and poor site support adversely impacted the capability of the shift supervisor and the control room staff to safely operate the plant. The shift supervisors were not maintaining a broad plant perspective because their attention was frequently consumed with administrative duties and resource-intensive surveillances. The near absence of operational experience outside the operations organization placed an excessive reliance on shift supervisors to screen work packages for safety impact and selection of appropriate post-maintenance testing. Operators were significantly affected by degraded plant equipment, including equipment workarounds and the administrative burden associated with the high rate of removal and return of equipment to service. Non-licensed operator staffing was insufficient as evidenced by the high routine use of overtime, poor performance on plant log keeping rounds, and several events with staffing shortages and fatigue as contributing factors. The control room staff, including the non-licensed operators, was also affected by longstanding equipment design issues, numerous technical specification inconsistencies, and an overall lack of support from the other site organizations. Operator training was reduced in scope and often deferred to help compensate for staffing shortages.
Several events occurred during the past year which undermined the credibility of site management with the control room staff. Managers communicated confusing and conflicting expectations and policies to the control room staff through numerous memoranda and other informal guidance. This situation was caused, in part, by the absence of appropriate operations-perspective feedback to decisions and policy development and the limited opportunities for shift crew interaction with site management.
Operator performance was inconsistent, in part, because of poor communications, cumbersome programs and procedures, and excessive operator distractions.
In performing self-assessments of their performance and root cause evaluations of operational events, both operations and other departments did not consider the broader implications of events. Corrective actions to programs and components were ineffective, presented repetitive challenges to plant operators, and contributed to events and performance problems.
The team also noted that shift turnovers and operator knowledge of control board status were good and the shift crews were observed to be very dedicated with a generally good morale. Strong radiological housekeeping practices were observed.
Weaknesses in maintenance and testing reduced the reliability of safety-related and balance-of-plant equipment. Ineffective corrective and weak preventive maintenance significantly contributed to poor equipment performance. Ineffective corrective maintenance was caused by inadequate root cause analysis, poor prioritization of work activities, and poor craft performance. Preventive maintenance was not always effective because the program lacked appropriate scope and included deficient implementing procedures. Craft performance suffered from numerous training deficiencies. Senior managers did not consistently enforce quality performance. Unavailable replacement parts delayed maintenance activities and negatively affected equipment operability. The size of the maintenance staff and the amount of emergent work resulting from equipment failures limited the time that the maintenance personnel could spend on balance-of-plant corrective maintenance. The inefficient work control system, the added workload of a three-train plant, and longstanding design deficiencies also reduced the amount of balance-of-plant corrective maintenance that could be performed.
Surveillance and post-maintenance testing did not always verify equipment operability. Post-maintenance testing weaknesses resulted from a poor reference document and the lack of planner training to compensate. The surveillance testing procedures did not contain all required technical specification attributes. Other contributors to the maintenance and testing weaknesses were poor communications and coordination and a poor management information system.
Maintenance and testing contained several positive attributes including the quality of the maintenance shop facilities and the recent relocation of almost all the participants in the work control process into one building. The creation of the Technical Support Engineer position improved communications with engineering. The licensee had improved the resolution of field difficulties through walkdown crews (when work schedule constraints permitted) and the establishment of the General Maintenance Supervisor position.
Several conditions reduced the ability of the engineering staff to support other organizations. Neither the plant engineers nor the design engineering staff had sufficient resources to support the site. This caused engineering to be slow in identifying deficient conditions and hasty in performing investigations or root cause evaluations, resulting in many engineering solutions or products that corrected the symptom, but not the root cause. Approved corrective actions generally took a long time to implement because of schedular or financial considerations. The system engineering program was comprehensive, but was not effectively implemented because of insufficient resources and management oversight. Engineering work backlogs were large, rapidly increasing, poorly tracked, and not well managed. Industry and site operational experience was not adequately used, which led to avoidable site events, repetitive equipment failures, and additional engineering time expenditures. The engineering staff was not sufficiently trained and lacked the analytical tools for some tasks. Information databases were often inaccurate or not current, and computers for system-level trending were very limited in number. Improvement programs often did not help improve the efficiency of engineering support, and the resultant corrective actions were often delayed or cancelled because of low priority or high cost.
Configuration control weaknesses adversely affected the performance of safety-related equipment and the quality of design documents.
The design, maintenance, and testing of the essential chilled water system contained functional and programmatic weaknesses, which, if not corrected, could adversely affect the operability of the system. The licensee had never demonstrated or analyzed the ability of the system to function under design basis accident low heat load conditions. As a result, the team questioned the operability of the system under certain cold weather accident conditions.
The licensee did not resolve several chronic fire protection issues in a timely manner. The issues included excessive shrinkage of penetration seals, an unreliable fire alarm system, a large backlog of service requests on fire protection systems, and inadequate control of transient combustibles in the plant.
Team observations included a dedicated engineering staff, an evolving engineering program to include an increased level of support through the establishment of the technical support engineer group, well written and comprehensive major modifications, and a partially completed design basis document program that appeared to produce good results.
Senior management failed to provide the staff clear direction and oversight in several key areas including performance standards and station priorities. Senior management sent frequent, conflicting messages about the implementation of these standards and priorities. As a result, the staff often questioned the credibility of senior management. Middle managers often failed to obtain feedback on problems and give consistent direction because they did not interact frequently enough with people in the plant. These factors significantly decreased management effectiveness throughout the organization.
Management failed to apply sufficient resources to maintain performance levels and standards. Significant station activities were not adequately funded despite the clearly stated objections of the middle managers responsible for those areas.
Information systems, including management information systems, did not effectively support performance monitoring and in some cases impeded the plant staff.
Management did not establish good communications and teamwork. Expectations regarding competing priorities between budget, schedule, and safety performance were not communicated well. Vertical communications were particularly weak. Senior managers did not foster frank, open feedback from lower managers and staff. As a result, lower managers were reluctant to bring problems to senior management. Horizontal communications and interface problems added to the difficulty of completing work using established processes. There was a lack of coordination and accountability between the disciplines.
The team concluded that the licensee's ineffective corrective action process [...] Ineffective problem identification, shallow root cause analyses, inadequate safety evaluations, lack of aggressive problem resolution, poor information systems, and budgetary constraints resulted in short term rather than long term solutions. Managers did not respond effectively to the findings, concerns, and recommendations of their principal self-assessment and quality oversight functions, including those of the Nuclear Safety Review Board and Quality Assurance.
The team also noted recent improvements to the Master Operating Plan (MOP) and recent management and organizational changes, both those completed and those underway at the end of the evaluation period. Additionally, most of the performance issues observed by the team had also been identified by the licensee's own assessment effort.
The underlying root causes for declining performance at the South Texas Project included the following: (1) failure of management to provide adequate support; (2) ineffective management direction and oversight; (3) failure to effectively utilize self-assessment and quality oversight functions; and (4) ineffective root cause and corrective action processes.
1.0 INTRODUCTION
1.1 Background
South Texas Project (STP) had a decline in performance during the past two Systematic Assessment of Licensee Performance (SALP) periods. In the last SALP period (June 1991 to August 1992), performance declined in the functional areas of maintenance/surveillance, engineering/technical support, security, and safety assessment/quality verification. Performance problems discussed in the SALP reports appeared to result from weaknesses in three broad areas: material condition and housekeeping, human performance, and organizational performance.
Historically, hardware problems, some of which were repetitive, resulted in numerous plant trips, transients, engineered safety features (ESF) actuations, and forced outages. Many of these system and component problems were limited to balance-of-plant equipment, but the licensee had not fully resolved longstanding safety-related hardware problems. Personnel errors also had resulted in reactor trips, ESF actuations, and technical specification violations. Findings from a Region IV Operational Safety Team Inspection, conducted in the late fall of 1992, indicated that the licensee had not resolved these issues. In February 1993, an event involving failures of the turbine driven auxiliary feedwater pumps on both units prompted the NRC to send an augmented inspection team (AIT). The AIT further confirmed the continuing problems with repetitive equipment failures.
There had recently been a large number of management changes. The operations manager, maintenance manager, and training manager recently left the facility. In addition, the plant manager, QA manager, and planning and scheduling manager had recently been reassigned along with many lower level managers.
At a meeting in January 1993, NRC Senior Managers discussed the decline in performance and the licensee's corrective actions. The Senior Managers determined that additional information was needed to make an adequately informed decision on performance at South Texas Project. The Executive Director for Operations (EDO) directed the staff to obtain this information by conducting a diagnostic evaluation at the South Texas Project.
Prior to the NRC's decision to conduct a diagnostic evaluation, the licensee had acknowledged the existence of problems and had made several attempts to improve performance, including implementing an operations improvement program; however, these efforts were only partially effective. The licensee had developed a Master Operating Plan for South Texas that included action plans to effect improvements. After the NRC announced the diagnostic evaluation, the licensee conducted its own pre-diagnostic evaluation, which confirmed the need for further action.
1.2 Scope and Objectives
The EDO directed the staff to perform a broadly structured evaluation to assess overall plant operations and the adequacy of the licensee's major programs for supporting safe plant operation. The following goals were set for the diagnostic evaluation: (1) provide information to NRC Senior Management to supplement the Systematic Assessment of Licensee Performance Program and other assessment data, (2) evaluate licensee management and staff involvement and effectiveness with respect to safe plant operation, (3) evaluate the effectiveness of the licensee's improvement programs and plans, and (4) determine the root causes of safety-related equipment and performance problems.
1.3 Methodology
The diagnostic evaluation team (the team) consisted of 15 technical members and an administrative assistant and was organized with four team leaders reporting to a team manager. The team devoted several weeks to preparation that included team meetings and briefings by representatives from Region IV, the Office of Nuclear Reactor Regulation (NRR), and the Office for Analysis and Evaluation of Operational Data (AEOD). On March 29, 1993, the team began a 2-week evaluation at the facility. This first onsite evaluation period was followed by a 2-week in-office review of the data collected. The team returned to the site on April 26, 1993, for an additional week of onsite evaluation. The NRC Resident Inspectors frequently attended team meetings at the site and provided technical advice to the team. Representatives from the team met daily with their licensee counterparts to discuss team activities and findings.
An in-depth assessment of the Essential Chilled Water system was performed to gain insight into the licensee's performance of activities such as maintenance, testing, operation, and design control. Case study reviews were also performed of equipment problems with the standby diesel generators, the auxiliary feedwater pumps, and the main feedwater system. Special emphasis was placed on identifying the causes of performance problems. The licensee's performance in identifying and correcting its own problems was also assessed.
1.4 Facility Description
The South Texas Project Electric Generating Station (STP) is located 20 miles southwest of Bay City, in Matagorda County, Texas, or about 87 miles south-southwest of Houston, Texas, the nearest major city. The plant features twin Westinghouse Pressurized Water Reactors of the four-loop design. The STP Units 1 and 2 are large (3800 MWt), physically separate plants with three trains of major safety equipment. The facility was constructed by Ebasco with Bechtel as the architectural engineer. The operating licenses were issued on August 25, 1988, and June 19, 1989, for Units 1 and 2, respectively.
1.5 Organization
The STP is jointly owned by Houston Lighting and Power Company, the cities of San Antonio and Austin, and Central Power and Light Company, headquartered in Corpus Christi, Texas. Houston Lighting and Power Company is the licensed operator of STP. The following chart illustrates the HL&P organizational structure for management and support of STP.
[Figure: South Texas Project Electric Generating Station organizational structure]
2.0 EVALUATION RESULTS
2.1 Operations
The team evaluated the performance of the operations department in the areas of staffing and scope of responsibility, site support to operations, communication of management expectations, operator performance, and problem identification and resolution. To accomplish this evaluation, the team used a combination of formal interviews, informal interviews, document reviews, plant walkdowns, and extended control room observations.
The team found that the assigned workload and poor site support adversely impacted the capability of the shift supervisor and the control room staff to safely operate the plant. The shift supervisors were not maintaining a broad plant perspective because their attention was frequently consumed with administrative duties and resource-intensive surveillances. The near absence of operational experience outside the operations organization placed an excessive reliance on shift supervisors to screen work packages for safety impact and selection of appropriate post-maintenance testing. Operators were significantly affected by degraded plant equipment, including equipment workarounds and the administrative burden associated with the high rate of removal and return of equipment to service. Non-licensed operator staffing was insufficient as evidenced by the high routine use of overtime, poor performance on plant log keeping rounds, and several events with staffing shortages and fatigue as contributing factors. The control room staff, including the non-licensed operators, was also affected by longstanding equipment design issues, numerous technical specification inconsistencies, and an overall lack of support from the other site organizations. Operator training was reduced in scope and often deferred to help compensate for staffing shortages.
Several events occurred during the past year which undermined the credibility of site management with the control room staff. Managers communicated confusing and conflicting expectations and policies to the control room staff through numerous memoranda and other informal guidance. This situation was caused, in part, by the absence of appropriate operations-perspective feedback to decisions and policy development and the limited opportunities for shift crew interaction with site management.
Operator performance was inconsistent, in part, because of poor communications, cumbersome programs and procedures, and excessive operator distractions.
In performing self-assessments of their performance and root cause evaluations of operational events, both operations and other departments did not consider the broader implications of events. Corrective actions to programs and components were not always adequate, presented repetitive challenges to plant operators, and contributed to events and performance problems.
The team also noted that shift turnovers and operator knowledge of control board status were good and the shift crews were observed to be very dedicated with a generally good morale. Strong radiological housekeeping practices were observed.
2.1.1 Marginal Staffing for Scope of Responsibility
The excessive scope of responsibility for the control room staff was not commensurate with the staffing resources. The shift supervisors and their control room staff could not effectively maintain the proper focus and overview of plant operations because of their participation in administrative programs and resource-intensive surveillances. Certain programs and processes had evolved to become cumbersome and labor-intensive, depending excessively on operations personnel to effect desirable performance. To accommodate the outage workload, the number of operating crews was reduced from 5 to 4 for each unit. Operator training was reduced in scope and was often deferred to help compensate for marginal staffing levels. Furthermore, strained operator staffing had prevented a normal progression of operations personnel into other parts of the site organization.
Staffing of senior reactor operator (SRO) positions was most affected by administrative burdens. Each of the units' crews contained two SROs, the shift supervisor and unit supervisor. The shift supervisor spent the majority of his time performing a number of administrative duties, including reviewing work packages for work start authority and again at closeout for post-maintenance test adequacy. The team also confirmed through interviews that there was a heavy administrative burden placed on the shift supervisors during power operations. This situation was exacerbated during refueling outages.
One shift supervisor reviewed 22 procedure field changes during a dayshift watch, taking a significant portion of his shift. Additionally, the team observed that the shift supervisor was routinely involved in providing the maintenance craft personnel with general information, such as plant status and schedules, that could have been obtained elsewhere. This responsibility left the unit supervisor to monitor the control room and any plant tests or evolutions.
The surveillance test program was also a significant resource burden on the control room staff in general and the SROs in particular. Each unit has three trains of safety equipment, thus adding a third more surveillances than the conventional two-train design. Operations, in lieu of the instrumentation and control department, conducted the solid state protection system (SSPS) logic surveillances that essentially consumed the entire control room staff. Shift supervisors stated that during these tests, it was sometimes necessary for them to become directly involved in collecting test data. In addition, with the implementation of the reactor trip reduction program, SROs were expected to assume a more active oversight role during certain critical surveillances. This program was a good initiative, but was implemented without regard to the accompanying additional resource burden.
The work control program, including post-maintenance testing (PMT) and equipment clearance orders, had evolved to become cumbersome and labor-intensive. The limited operational experience throughout the site organization placed an excessive reliance on the shift supervisor to screen work packages for safety impact and selection of appropriate PMT. The shift supervisor spent considerable time reviewing work packages to determine the appropriate PMT requirements because the PMT requirements recommended by the work planners were often vague and unusable. A typical example was the wording in a work package that listed several possible PMTs with directions to "perform PMT in accordance with applicable procedures, if required."
The three-train design requirements and the history of material condition problems frequently caused the control room staff to place the plant in limiting conditions for operation (LCOs). For a 2-year period ending February 1993, Unit 1 averaged 19 LCO entries each day while Unit 2 averaged 26 entries each day. These figures did not include entries into an LCO for surveillances of less than 8-hour duration. Although these LCO entries and exits were appropriate, they placed a substantial administrative burden on the operations staff. At the team's request, the licensee performed a survey and concluded that the plant entered LCOs at a rate greater than four times that of similar facilities.
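The scale of the administrative burden implied by these daily averages can be illustrated with simple arithmetic. The Python sketch below is illustrative only: it assumes the 2-year period is 2 x 365 days (the report does not give exact start and end dates), and the names `DAILY_LCO_ENTRIES`, `PERIOD_DAYS`, and `estimated_total_entries` are hypothetical. The resulting totals are rough estimates derived from the reported averages, not figures from the evaluation.

```python
# Reported averages from the report: LCO entries per day, by unit.
DAILY_LCO_ENTRIES = {"Unit 1": 19, "Unit 2": 26}

# Assumption: the 2-year period is approximated as 2 x 365 days.
PERIOD_DAYS = 2 * 365


def estimated_total_entries(daily_average: int, days: int = PERIOD_DAYS) -> int:
    """Estimate total LCO entries from a daily average over a number of days."""
    return daily_average * days


# Estimated totals per unit over the assumed period.
totals = {
    unit: estimated_total_entries(average)
    for unit, average in DAILY_LCO_ENTRIES.items()
}

for unit, total in totals.items():
    print(f"{unit}: roughly {total:,} LCO entries over {PERIOD_DAYS} days")
print(f"Both units combined: roughly {sum(totals.values()):,} entries")
```

Each entry and exit carries its own logging and tracking work, so totals of this order help explain why the team considered the burden substantial even though the individual entries were appropriate.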
The licensee further strained staffing levels for the non-licensed reactor plant operators (RPOs) by implementing 12-hour shifts without margin above the administrative staffing limit of 4 each shift. Thus, any delay in an RPO reporting to work resulted in holding one of the onshift RPOs over past the normal 12-hour shift and therefore, on occasions, exceeding the technical specification (TS) overtime guidelines. Since January 1993, operators exceeded the TS guidelines for overtime on several occasions.
The RPOs were significantly affected by degraded equipment and balance-of-plant (BOP) workarounds.
RPO log keeping rounds were being conducted on an expedited basis to accommodate management's expectations to keep work moving.
Numerous examples of frayed insulation and oil leaks were left unchallenged by the RPOs. The shortage of RPOs resulted from the decisions management made in 1991 and 1992 to reduce the operator training pipeline size and frequency, as well as to staff an operations support activity with reactor operators (RO) and RPOs in lieu of outside contractors. Additionally, management recently decided to relax the standards for staffing a crew to allow use of apprentice RPOs as long as they were qualified at their specific watchstations. These management decisions could continue to impact plant performance because of the need to utilize seasoned RPOs to fill the upcoming reactor operator license class, thus further reducing the skill level of the remaining RPOs in the field. Any future attrition of RPOs would necessitate removing RO candidates from license class to perform RPO duties.
The additional workload associated with the dual unit outages had forced the licensee to defer operator training and reduce the shift rotation from five to four crews.
Personnel from the extra crew that would normally be in training were dispersed into the remaining crews to support the outages. Training personnel stated that the proposed schedule to resume training would reduce the scope of requalification training to include only the minimum required subjects.
In addition, the licensee had suspended on-the-job RPO training since February 18, 1993, to correct performance issues relating to the role of the evaluators. An attempt to retrain evaluators, both in an initial one-day class and a subsequent series of classes, failed in part because operations could not divert individuals away from their plant duties to attend.
The team reviewed the staffing required to mitigate a resource-intensive accident (reactor shutdown outside the control room) and concluded that the existing staffing would be significantly strained to handle such a scenario.
Strained staffing resources also contributed to several plant events. The licensee concluded that inadequate staffing for a surveillance procedure contributed to a non-licensed RPO throttling the wrong train of essential cooling water (ECW). Earlier that same day, another RPO inadvertently deenergized the Unit 2 plant computer system.
Prior to the latter event, the RPO had worked 8 consecutive midnight shifts, several of which were of 12-hour duration.
In another example, an electrical transient and accompanying transfer of residual heat removal (RHR) control to the remote shutdown panel occurred when an RPO failed to correctly return an inverter to service. The SRO on duty during the event later stated that he was extremely busy with administrative tasks and therefore failed to stop and conduct an adequate pre-job discussion with the RPO.
2.1.2 Poor Support to Operations
Poor support to operations was adversely impacting their capability to safely operate the plant. Degraded equipment, inadequate support to computer information systems, and numerous TS inconsistencies created unnecessary obstacles for the control room staff. Management's inability to resolve these problems had resulted in an operations staff that tolerated poor plant material condition and poor site support.
Longstanding design issues and degraded plant equipment contributed significantly to the operator workload.
Examples of these deficiencies are listed below.
The absence of permanently-installed flow measuring devices required the use of temporary test instrumentation to support routine pump flow surveillances in safety-related systems such as the essential chilled water, auxiliary feedwater, RHR, and spent fuel cooling systems.
Extended surveillance setup times had been necessary to obtain accurate and meaningful surveillance results.
Numerous Target Rock solenoid-operated valves (SOVs) exhibited problems due in part to installation in high temperature applications. Some of the problems resulted in the SOVs being out of their required position or without proper remote indication. Operators obtained local readings and measurements to compensate for these inadequacies and performed contingency actions to operate these valves properly. Systems where these SOVs were installed included the primary sample system, the steam generator bulk water sample system, the chemical volume and control system, and the reactor vessel head vent system.
Numerous automatic controls, such as temperature control valves (TCVs), had been inoperable for a significant period of time. Examples included the TCVs in the BOP lube oil coolers, the seal oil coolers, and the hydrogen coolers on the turbine generator. These TCVs were oversized and had to be manually throttled, along with the associated bypass valves, in order to control cooling for the various systems.
Support to operations was inadequate regarding computer information systems and software programs. The Information Resources Organization supplied the operations staff with programs, such as a TS Action Statement Program, which it could not use because they did not perform the required tasks and were difficult to use. As a result, operations developed an internal network of computer information systems and software programs that aided in performing such functions as work control, equipment clearances, and reactor coolant system leak-rate calculations. These systems were originally intended to aid the operators in performing these operating functions. However, operators had come to rely on these information systems as more than aids because they could centralize and organize multi-task functions.
These systems were initially developed without appropriate quality assurance controls and procedural guidance. The team reviewed the licensee's actions to date and found these computer systems still lacked quality controls regarding software development and utilization.
Problems resulted from operators using these uncontrolled computer information systems. For example, a problem with the Operability Tracking Log (OTL) software contributed to an event on March 10, 1993, when the Unit 2 heating, ventilation, and air conditioning (HVAC) system was found to be incorrectly aligned. The OTL program tracked equipment operability to aid the operators in deciding on the limiting and applicable TS action requirements, based on equipment operability. Due to an error in the OTL software report program, and because the operators were inappropriately relying on the information in the computer report, the operators were not alerted to the proper system lineup configuration for the HVAC system.
Consequently, the operators did not appropriately align the HVAC system. The licensee was reluctant to formally adapt and upgrade these computer systems because a new, site-wide integrated computer network has been planned and will assume the functions of the various site computer systems. However, operators expressed significant concerns that the product would not meet their needs.
The licensee had not aggressively pursued TS revisions to resolve the numerous inconsistencies within the TS at STP.
The licensee has written approximately 150 technical specification interpretations (TSIs) and clarifications (TCIs) to help clarify some of these TS inconsistencies. These TSIs and TCIs were only intended to provide short-term guidance, with the eventual implementation of this guidance to be provided in a more permanent document, such as a TS amendment. However, some of the TSIs have been in effect for over 4 years.
Particularly troublesome areas regarding TS inconsistencies included the toxic gas and control room HVAC, standby diesel generator (SDG), and RHR systems.
Although the licensee has pursued essential TS amendments needed to continue plant operation, the bulk of the licensee's TS improvement effort was deferred with the understanding that a new standard TS would eliminate many of the TS inconsistencies.
2.1.3 Confusing and Conflicting Management Expectations
Management has sent confusing and conflicting guidance to the control room staff through numerous memoranda without soliciting input from the first line supervisors.
Some of this guidance consisted of the implementation of operations policies and standards and other informal guidance. Many of these informal memoranda were revisions or changes that sometimes contradicted earlier memoranda. This problem was compounded by the absence of the control room staff feedback to decisions and policy development and has impaired management's ability to communicate expectations and goals to the shift supervisors. Although the licensee attempted to resolve this issue, several events and issues occurred during the past year which undermined the credibility of site management with the control room staff.
Examples of some of these events and issues are described below.
A station problem report (SPR) and quality assurance (QA) audit of operations in 1991 stated that hundreds of memoranda on various subjects, such as TS interpretations and operations policies, were issued each year and that many of them conflicted with procedures or with each other.
Most of these memoranda were not controlled documents because they had not been reviewed formally.
The licensee discovered that the Unit 2 reactor trip breaker shunt trips had never been adequately tested.
The breakers were declared inoperable and a TS shutdown action statement was entered without informing or involving the control room personnel that were licensed and responsible for operability decisions.
When the control room personnel were finally informed, they were told they had been in the action statement for over 2 hours and would therefore need to perform an accelerated shutdown.
A memorandum was issued after the above event occurred regarding the TS shutdown.
Approximately 3 months later, this memorandum was revised and transmitted to the control room approximately 2 minutes before entry into a 6-hour shutdown action statement. The new policy allowed extra time for troubleshooting before initiating an orderly shutdown, yet did not contain an explanation for the change and appeared to conflict with previous guidance and the TS. The shift supervisor disregarded this policy revision and initiated an orderly shutdown.
The team concluded through numerous interviews that some of the first-line supervisors in operations were still confused as to their management's expectations in areas such as policies, programs and TS interpretations. Some of the shift supervisors indicated that they would have disregarded guidance in some of these memoranda and would have continued to operate the plant using previous documented operating guidance.
The licensee attempted to consolidate their written guidance to the control rooms into a "Plant Policies and Procedures Manual."
This effort appeared to have been hampered by the inability of the licensee to determine the extent and subject matter of the memoranda that had been issued.
Program and policy implementation was ineffective, in part, because of a lack of operations perspective and middle management involvement.
A licensee survey and team discussions with the shift supervisors indicated the reactor trip prevention program was implemented without being explained sufficiently to be uniformly understood and accepted.
Management's desire to reduce trips by deferring more work to the outage, while at the same time not providing additional resources or extending the outage duration, appeared as a conflicting message to the control room staffs.
2.1.4 Inconsistent Operator Performance
Operators generally performed their duties in a competent manner. However, poor communications, marginal staffing levels, and excessive operator distractions and tasks contributed to inconsistent operator performance. Cumbersome programs and procedures were also barriers to successful performance. A common element in many previous operator performance problems was the failure to stop and adequately focus on the specific task.
Work schedule, practices, and staffing were significant contributors to performance problems in operations. Examples reviewed by the team included the following:
No SRO was in the Unit 2 control room for a short period because the unit supervisor left the control room to participate in a surveillance activity. The licensee determined the root causes to have been a lack of self-verification and deficiencies in management guidance regarding command and control. Contributing factors included the relative inexperience of the SROs involved, shift rotation, and competing tasks that called the unit supervisor out of the control room.
An inadvertent boron dilution event occurred while the operators attempted to borate the reactor coolant system. The licensee determined that the event was caused by a deficient understanding of the system operation during shutdown conditions. However, other contributing factors mentioned in the licensee's assessment included an inadequate shift turnover, insufficient crew experience, and the inability of personnel to properly focus on a specific task.
During a periodic surveillance of the ECW system, the operator who was performing the local valve manipulations had to leave the area to locate a valve lock key so he could throttle flow to a heat exchanger. When he returned, he throttled the valve to the wrong heat exchanger in a different train. The licensee determined that the event resulted in part from inadequate self-verification. The licensee stated that a contributor to the event was the insufficient number of personnel available to perform the evolution. SROs who have performed this surveillance in the past stated to the team that generally, four RPOs are required to perform this surveillance, although the surveillance could have been performed efficiently with three RPOs.
In this case, only two RPOs performed this surveillance, which made it difficult to focus on the required specific tasks. The three remaining RPOs on shift at the time were not available because they were performing other duties.
Cumbersome programs and procedures were barriers to successful operator performance. The following are examples:
Weaknesses in the PMT program, such as difficulties in understanding the PMT reference manual, have resulted in confusion and differing interpretations by the various users. As a result, the PMT recommendations from the planners were often very broad and vague. This contributed to the performance of incorrect post-maintenance testing following painting activities on SDG 13.
Poor procedures contributed to two occasions in which an RHR pump tripped on low flow. One of these trips occurred during a reactor cavity draindown.
An operating crew shifted from charging pump 1B, which was operable, to charging pump 1A, which was inoperable, because they did not thoroughly review a work package for closure.
In this case, two maintenance groups were performing work activities associated with pump 1A. One group had completed its work and had sent its package to the control room; the other had not. There was no easy way to determine the status of work being performed.
The team generally agreed with the licensee's assessment that there were two fundamental factors for the events in 1992 and early 1993: (1) personal accountability and responsibility needed to be emphasized, stressing self-verification and attention to detail and (2) organizational and programmatic support had to be strengthened to enhance the clarity of written guidance, oral briefings and instructions, equipment design and labeling, and repetitive task assignments. However, the team considered that work schedule, work practices and staffing issues have also been significant contributors to past events. These were only recently being considered as contributing factors by the licensee.
2.1.5 Ineffective Problem Identification and Resolution
The identification and resolution of problems within operations were not always effective, presented repetitive challenges to plant operators, and resulted in events and performance problems. Self-assessments of operations' performance and root cause evaluations of operational events, performed by both operations and other departments, lacked depth and in some cases did not consider the broader implications of events.
The procedure for performing the operations self-assessment program appeared to provide a good, detailed methodology. However, in implementing this procedure, the operations staff performed shallow assessments that were relatively ineffective in identifying program weaknesses. Self-assessments performed in 1991 and 1992 on various topics such as operator aids, equipment clearance orders, and locked valves failed to find significant problems that were later evident through events or reviews by other organizations. The assessment of operator aids was limited to a one page check sheet. Those who performed a 1992 self-assessment of operator rounds stated that they lacked sufficient time to conduct a detailed review within the 2 days allotted to prepare, perform, and document their assessment of the program. The operations manager stated that the assessments have been largely ineffective in assessing operations programs.
Evaluations of operational events, both by operations and other organizations, were of limited depth and did not always consider the broader implications and impact on the plant. Examples reviewed by the team included:
In followup to a Unit 1 inverter trip on March 29, 1993, the corrective actions group (CAG) focused on several narrow elements of the event such as the RPO energizing the cabinet without a procedure in hand.
However, the CAG did not address other generic aspects of the event, such as the adequacy of the recovery actions and the RHR system controls automatically swapping to the remote shutdown panel.
Two performance problems reviewed by the team concerned the return of essential chiller 21A to service without the proper paperwork being completed and a failure to verify control rod position between digital rod position indication and demand position. The operations staff determined that the root causes were inattention to detail and human performance problems, respectively.
Recommended corrective actions focused on counseling the individuals or issuing memoranda to the operators. However, the more fundamental aspects of these events, including weaknesses in the work control process and distractions in the control room, were not pursued.
Discussions with the applicable operations personnel indicated they were aware that more fundamental issues existed but did not have the time or charter to pursue them further.
Under the licensee's corrective action program, the CAG and the Independent Safety Engineering Group (ISEG) assigned investigations and operating experience reviews (OER) to the operations staff. The two SPR coordinators on the operations staff were responsible for performing 8 to 10 OER and 20 to 30 SPR reviews a month. These individuals spent large amounts of overtime to complete the sizeable workload as the volume of SPRs continued to grow.
Management support to correct program and component problems was not always effective. This was evidenced by management deferral of corrective action proposals to fix several longstanding problems. The operators continually faced challenges such as poor plant labeling, a weak locked valve program, and difficulty in controlling plant cooldown after a reactor trip. Additionally, to reduce waterhammer in the auxiliary feedwater (AFW) system, the operators had to control AFW flow to the steam generators with a stop check valve.
Management did not properly address this problem until after the thermal cycles on the steam generator nozzles from this method of flow control became an issue.
Poor component labels contributed to numerous plant transients and other events.
In response to a 1991 NRC concern, the licensee stated that a labeling improvement program was being implemented, and committed to reconsider the direction and schedule for the program.
In February 1992, a labeling study indicated labeling problems had been recognized since 1988. The study revealed that 21 percent of valve labels included incorrect information.
As a result, plans to upgrade labels were submitted in both the February and August 1992 budget proposals. However, management did not fund the program on either occasion as it preferred to maintain previous efforts to replace missing labels when identified.
In late 1992 and 1993, component labeling contributed to errors by personnel. For example, in March 1993, while preparing to begin work on a breaker, maintenance personnel discovered that the breaker was energized. Subsequent investigation revealed that the wrong breaker had been cleared for return to an operable status by operations due to confusing equipment labeling on the breaker panel. At the end of the evaluation, the licensee informed the team that it was again reviewing the prioritization of the plant labeling upgrade.
2.1.6 Positive Observations
Positive observations on the shift crews included good shift turnovers and awareness of control board status. The shift crews were observed to be very dedicated, with generally good morale. Excellent radiological housekeeping practices were observed.
2.2 Maintenance and Testing
The team performed a detailed evaluation of maintenance and testing of equipment for the essential chilled water system (CH), standby diesel generators (SDG), auxiliary feedwater (AFW) and main feedwater (FW) systems, and to a lesser degree, other safety and nonsafety-related components and systems. The team reviewed preventive maintenance, predictive maintenance, corrective maintenance, periodic testing, and post-maintenance testing to determine their contributions to proper equipment performance.
The team also reviewed the number and experience of available personnel resources, the work control process, maintenance training, maintenance facilities and the availability of spare parts as they related to the performance of maintenance, the performance of tests, and the maintenance request backlog. The team conducted formal interviews, informal interviews, machinery history record reviews, and direct observations to evaluate maintenance and testing performance.
The team found that maintenance and testing weaknesses reduced the reliability of safety-related and balance-of-plant equipment.
Ineffective corrective and weak preventive maintenance significantly contributed to poor equipment performance.
Ineffective corrective maintenance, caused by inadequate root cause analysis, poor prioritization of work, and poor craft performance, adversely affected safety-related equipment performance. Preventive maintenance weaknesses resulted from the lack of appropriate scope and incorrect implementing procedures. Craft performance suffered from numerous training deficiencies. Senior managers did not consistently reinforce quality performance. Unavailable replacement parts delayed maintenance activities and negatively affected equipment operability. The size of the maintenance staff and the amount of emergent work resulting from equipment failures limited the time that the maintenance personnel could spend on balance-of-plant corrective maintenance. The inefficient work control system, the added workload of a three-train plant and longstanding design deficiencies also detracted from the amount of balance-of-plant corrective maintenance. Surveillance and post-maintenance testing did not always verify equipment operability. Post-maintenance testing weaknesses resulted from a poor reference document and the lack of training to compensate for this document. The surveillance testing procedures did not contain all required technical specification attributes.
Other contributors to the maintenance and testing weaknesses were poor communications and coordination, the quality of the management information system, and the limited staffing to perform vibration analysis for predictive maintenance.
The team found several positive attributes including the quality of the maintenance shop facilities and the recent relocation of almost all the participants in the work control process into one building. The creation of the Technical Support Engineer position improved communications with engineering. Using walkdown crews, when work schedule constraints permitted, alleviated some of the field difficulties. The General Maintenance Supervisor position was an asset also in resolving field difficulties. The licensee had recognized the need for improvements and had recently developed or was developing corrective action plans to address many of the problem areas.
2.2.1 Ineffective Corrective Maintenance
Ineffective corrective maintenance, caused by poor root cause analysis, poor prioritization of work, and poor craft performance, adversely affected safety-related equipment performance.
The licensee had established a program to determine the root cause of events and major equipment failures, but the identification and evaluation of maintenance issues did not always occur. This resulted in the ineffective or untimely resolution of equipment problems. Craft personnel occasionally made mistakes during corrective maintenance. Though the procedures in many instances did not help alert workers to potential problems, a well-trained, qualified and attentive workforce could have successfully completed the tasks.
The following were examples of poor root cause determination and poor maintenance efforts:
A feedwater isolation bypass valve (a containment isolation valve) was found partially open for over a year. Maintenance had been performed on the valve to correct a failure to get a closed indication light in the control room. Maintenance personnel stroked the valve several times and then adjusted the closed limit switch to bring in the closed light without confirming the actual position of the valve.
Five months later the licensee issued another SR to correct an apparent discrepancy between the control room indication and the local position indication.
However, the potential safety significance of this condition was not properly recognized and the SR was worked six months later. At that time maintenance personnel determined that the valve was only going 75 percent closed.
Standby diesel generator (SDG) injector pump hold down studs failed on nine separate occasions. The root cause analysis was shallow and corrective actions were insufficient to preclude recurrence. The licensee did not perform a more detailed analysis of the stud failures until the team became involved.
An SDG jacket water leak took four attempts to correct.
The first two repair efforts were unsuccessful because maintenance personnel installed the wrong size of gasket.
In a third repair attempt, the gaskets were made on site with material not suited for that application.
Corrective maintenance performed on the high head safety injection (HHSI) pump damaged the motor when too much oil was added. The oil level sight glass was reinstalled upside down resulting in a higher level mark on the sight glass. The procedure specified 11 quarts as the capacity of the bearing reservoir. Due to the unrecognized reversed sight glass, maintenance personnel added 20 quarts of oil to obtain the level mark on the sight glass. The result was oil intrusion into the motor windings.
Repeatedly, the overspeed trip tappet of a turbine driven auxiliary feedwater pump (TDAFWP) did not return to its normal position after a manual or overspeed trip. The initial corrective action involved removing a sticky tar-like substance from the tappet and the upper turbine housing. Personnel did not determine the cause of the tar-like substance and took no action to preclude its recurrence. Approximately six months later the tappet stuck again in its tripped position when the turbine was manually tripped.
In 1989, the windings to a motor-operated valve, critical in establishing hot leg recirculation following a LOCA, electrically shorted rendering the valve inoperable. The licensee performed an inadequate root cause analysis and did not rectify the problem.
In 1993, the windings shorted again rendering the valve inoperable.
Untimely corrective maintenance and poor prioritization resulted in delays in restoring equipment to an operable status, allowed degraded equipment to deteriorate until it was incapable of performing its intended safety function, and resulted in site personnel being forced to work around the failed and degraded equipment. Specific examples follow.
In August of 1992, the licensee discovered that seismic hold down screws in the Qualified Display Processing System (QDPS) card racks were missing but did not issue a SR to replace the missing screws for four months. The team noted that the SR had not been implemented or evaluated for operability. At the request of the team the licensee evaluated the situation. Consequently, QDPS was declared inoperable affecting both units.
The steam generator primary side access covers on Unit 1 had known leaks for two and a half years prior to being repaired. On four separate occasions licensee personnel noted the leaks; however, corrective action was not implemented. These leaks existed through two refueling outages.
While numerous SRs were written for repairs, confusion concerning the status of the SRs resulted in the repair efforts not being performed.
Failure to assess the safety impact of a steam leak and properly prioritize the repair effort resulted in an inoperable steam generator power operated relief valve (PORV).
The steam was impinging on the PORV actuator, but the leak was not immediately repaired. Having observed previous failures of the FWIVs, caused by degraded hydraulic fluid, the licensee knew that subjecting hydraulic fluid to high temperatures would cause it to degrade.
Eventually, the oil degraded, preventing the PORV from operating, and it was declared inoperable. After repair efforts failed, the licensee entered an 8-day forced maintenance outage.
There was a large maintenance backlog of security system components such as rusted camera base plates, water in manholes, broken doors, and degraded intrusion detection systems. An average of 13 officers, each working 12-hour shifts, were being scheduled to compensate for long-term maintenance issues.
A number of components in the inservice test program were in the alert and failed condition. Seven had been in the alert condition since 1989 without effective corrective action taken.
Eleven components had been in the alert range before failing and being declared inoperable. Also, the increased testing frequency for items in the alert range from quarterly to monthly resulted in another burden on operators to accomplish testing.
2.2.2 Less than Fully Effective Preventative Maintenance Program
Weaknesses in the scope and implementation procedures for the preventative maintenance (PM) program contributed to poor equipment performance. These weaknesses could be attributed in part to poor development of the PM program in terms of scope and procedure accuracy that were not properly addressed.
In developing the initial PM program before plant licensing, the licensee identified approximately 33,000 PM tasks.
In the late 1980s the licensee revised the program to include approximately 11,000 "active" tasks and 12,000 "inactive" (no longer scheduled) tasks, with the remaining tasks either cancelled or superseded. The licensee selected the inactive tasks based on "importance factors" that had been assigned to the individual PM activities when they were developed. After the "importance factors" screening, the only review performed to determine which individual PM tasks would be classified as inactive or active was a non-technical one by maintenance personnel.
As a result of not performing these inactive PM tasks, the following preventable events, equipment failures, and instances of poor assurance of operability (mostly dealing with instrument calibrations) occurred:
- 1) An uncalibrated lubricating oil pressure switch contributed to a startup feed pump not starting on demand following a reactor trip;
- 3) The technical support center (TSC) chillers failed, resulting in high temperature conditions and corresponding alarms on the plant computer. Only inactive PMs were associated with the TSC chiller;
- 4) Temperature indicators used to determine the operability of safety-related chillers were not periodically calibrated; and
- 5) An uncalibrated level switch in the component cooling water system contributed to an ESF actuation.
Appropriate PM tasks were not developed or included in the PM program for some important equipment in the SDGs and support systems.
Relay failures in the voltage-regulating circuit caused inoperable SDGs on two different occasions.
The relays that failed had been installed beyond their ten-year service life, but had never been replaced nor scheduled to be replaced. Main control board meters used during SDG testing and SDG monitoring were not in the PM program and had not been calibrated since startup.
In reviewing the issue of noncalibrated SDG meters, the licensee identified approximately 150 additional main control board instruments that were not in the PM program.
Some of these instruments monitored important parameters for the 125 VDC batteries and the battery chargers.
Incomplete or incorrect PM procedures resulted in poor equipment performance.
Examples of equipment failures, malfunctions, or inoperable equipment resulting from procedural deficiencies were: 1) Repeated examples of 13.8 kV breakers failing to cycle due to inadequate PM lubrication instructions; 2) An ESF actuation to which an improperly calibrated emergency cooling water transmitter contributed, because the PM instructions did not specify the type of M&TE to be used; and 3) Two relief valves having incorrect setpoints because the PM procedures specified the wrong setpoint.
The method for improving the PM program involved the use of PM "feedback" forms to identify errors and refinements for incorporation into the program.
However, since 1991 a large backlog of PM feedback forms had accumulated.
In 1992, over 2500 feedback forms were not processed on schedule. As of April 1993, the backlog of unprocessed PM feedback forms was approximately 5800.
Recently, the licensee added personnel to address this large backlog.
2.2.3 Maintenance Training Deficiencies
The training program established for maintenance craft personnel was deficient. This contributed to numerous instances of ineffective maintenance and poor equipment performance. Key maintenance support personnel such as maintenance planners and procedure writers received only limited formal technical training. The team noted the procedures did not include enough detail and cautions to compensate for training deficiencies.
In mid-1992, an industry organization determined the licensee's basic maintenance craft skills training program was deficient.
In response, the licensee established a recertification testing program for journeymen in the three disciplines. To allow continuation of work, craft qualification matrices were established.
Each matrix listed individual craftsmen and the tasks in which they were currently "qualified," such as breaker maintenance.
To compensate for a lack of "qualified" individuals, a supervisor or qualified journeyman continuously observed the work of the unqualified personnel. This decreased the supervisor's freedom to observe work activities under his cognizance and to select workers for particular tasks. The poorly trained work force and the obstacles associated with the matrix further reduced the effectiveness of the maintenance program.
The team noted the following deficiencies in the basic craft training knowledge:
The training for molded case circuit breakers did not include the correct method for determining the breaker settings based on the values (amperes) provided in the setpoint document. This lack of training and the complex procedural instructions for determining the breaker settings resulted in incorrect breaker settings, rendering seven safety-related components inoperable.
I&C technicians introduced air into essential chillers and flooded a control panel with oil due to a lack of understanding of how the chillers function under vacuum. This contributed to degraded equipment performance and lack of equipment availability.
Craft personnel were not trained on the need to expeditiously place battery chargers into service after performing discharge testing of 125 VDC station batteries. This lack of training and the omission from the testing procedure of this critical element of battery testing could have resulted in permanent damage to the station batteries.
Beyond the basic skills training deficiencies, the licensee identified that training in specialized skills did not match the necessary tasks to be performed. Examples included:
The mechanical maintenance staff was not trained to maintain the TDAFWP governor or the TDAFWP overspeed trip mechanism.
This contributed to the numerous unsuccessful attempts to resolve problems on TDAFWPs.
Training for reactor coolant pump motors was based on a generic 2000 horsepower motor and did not include the unique features of these motors.
Training on the SDGs did not include the governor or voltage regulator.
I&C technicians assigned to work on the security system were not trained on certain aspects of that system. Three of the five designated technicians had not received specific security system related training, and the other technicians received only limited training.
2.2.4 Deficiencies in the Replacement Parts Program
The team found numerous deficiencies in the spare parts system, including the lack of parts and the use of wrong parts. These deficiencies contributed to inefficient use of maintenance resources and negatively impacted equipment operability.
The lack of parts caused safety-related equipment to remain inoperable and degraded the performance of equipment important to safety. The lack of readily available parts contributed to the size of the maintenance backlog.
Approximately 25 percent of all non-outage corrective maintenance work packages were routinely in a parts hold status. Numerous general usage materials such as bolts, nuts, gaskets, and desiccant were not available as general issue items from the warehouse. To support emergent work, needed items were obtained by substituting parts that were reserved for other planned work. This time-consuming process frequently occurred, hampering maintenance effectiveness.
Examples of unavailable parts which adversely impacted equipment performance included:
In December 1992, during maintenance to repair an AFW turbine trip throttle valve, a replacement disc and seat were not available in the warehouse. The valve was reassembled and the system declared operable.
This leaking valve contributed to numerous overspeed turbine trips in January and February of 1993.
The lack of parts contributed to valves within the primary containment being inoperable for a year. During the 1991 refuel outage, "T" drains were not available for installation into some new valve motors. Without the "T" drains installed, moisture could not drain from the motors and could damage the components after an accident. A failure of the work control system later resulted in the "T" drains not being installed in a timely manner.
The Unit 2 secondary side B PORV was inoperable because of an internal hydraulic leak that caused premature failure of a pressure switch. The internal leak caused the hydraulic pump to cycle frequently and eventually resulted in the high pressure switch failing low. The hydraulic pump ran continuously until its thermal overloads tripped.
The switch was replaced but the leak was not fixed because of a lack of parts.
Previously, several switches on the CH system failed and were replaced. However, if they had failed again, no replacements were in the warehouse or on order when the inventories were reviewed by the team.
Occasionally, maintenance personnel installed or attempted to install the wrong part in safety-related systems at the facility. The major reason for these situations appeared to be in the parts sourcing process. The process to determine the correct replacement part was extremely difficult and cumbersome.
The computerized parts reference system consisted of two databases requiring the viewing of multiple screens. The overall response of the system was slow.
Numerous part numbers were "flagged" for revision because of the large engineering document backlog. Sometimes part numbers, as in some Rockwell valve components, were wrong. As a result, some planners distrusted the computer information. When computer information was questionable, such as being flagged, design and purchase documents had to be used. However, a number of these documents had unincorporated revisions due to the large engineering backlog. The overall process was prone to error and was time consuming. Examples of attempts to install incorrect parts follow.
During repair activities to stop a jacket water leak on the inlet header of a SDG, the discharge header gasket was installed. This occurred twice before the mechanics recognized that the gasket was not the correct size.
During repair activities to return an essential chiller to service, the correct type of pressure switch was installed but was not qualified as safety-related. The switch was replaced before the chiller was placed back into service.
2.2.5 Insufficient Support to Maintenance
Besides training and parts availability deficiencies, management support to maintenance was poor in a number of other areas, reducing the effectiveness of the maintenance process and the quality of the maintenance effort, and thus contributing to the poor material condition.
Maintenance department senior supervisors provided limited reinforcement of expected quality performance standards. Their time was dominated by preparation for meetings, attending meetings, and performing administrative tasks. This was exemplified by the maintenance manager attending approximately four meetings daily and, until recently, reviewing all maintenance feedback forms. The team rarely saw senior maintenance supervision observing work activities. This observation was confirmed in interviews with key maintenance supervision.
The staff size was insufficient to accomplish corrective maintenance given the productivity achieved using the existing system, the unique three-train design of the facility, and the untimely resolution of design deficiencies. The balance-of-plant corrective maintenance effort suffered mostly due to the lack of personnel resources.
From the end of the Unit 2 refuel outage (December 1991) until the beginning of the Unit 1 refuel outage (September 1992), both units were essentially operating at power. However, during these 9 months, the backlog of nonoutage SRs increased by 1600, an increase of approximately 50 percent. Three-fourths of the SRs were on balance-of-plant systems.
The more significant percentage of these were on systems such as feedwater, condensate polishers and fire protection which were in poor material condition.
The third train added approximately 25 percent more work to the testing and maintenance effort.
Recognized design deficiencies for numerous pieces of equipment had not been resolved.
Examples included the Brown Boveri breakers for the TSC diesel generator, the obsolete fire protection computer system and water intrusion into the startup feedwater pump's lubrication system.
Refrigerant and oil contamination mitigation devices had not been permanently installed on essential chillers even though air and moisture intrusion had reduced their reliability.
In an outage condition, substantial, routine overtime was used to try to accomplish the scheduled tasks. Since September 1992, craft workers and supervisors, including the maintenance manager, averaged 50 percent overtime.
In some instances Technical Specification overtime guidelines were exceeded without appropriate management review and approval. By the first quarter of 1993, absenteeism for illness of craft workers had increased in all three disciplines, with mechanical maintenance almost doubling from previous years.
Also, staffing limitations impaired the amount of vibration monitoring accomplished under the predictive maintenance program. Only one technician was assigned to routine vibration monitoring of rotating equipment. As demonstrated by the following examples, activities were infrequently performed on some equipment and some degraded conditions were not aggressively pursued.
During a vibration analysis in May 1990, the Unit 1 main generator seal oil backup pump exceeded alarm limits. However, over 2 1/2 years passed before the next vibration readings were taken in January 1993.
Subsequently, the deteriorated motor and pump bearing had to be replaced.
Since the plant began commercial operation the vibration of the Unit 1 HHSI pump motors exceeded the alarm limits of the predictive maintenance program.
However, more than 27 months passed between vibration readings on the 1C pump and 18 months passed for the 1A pump. Eventually, unsatisfactory oil samples were taken on the 1A and 1C motor bearings.
The combination of the high vibration and oil sample results prompted maintenance personnel to inspect and replace the bearings.
As much as three years passed between vibration readings on the Unit 1 auxiliary feedwater pumps.
2.2.6 Inefficient Work Control Process
The work control process was inefficient and manpower intensive. This resulted in the inefficient use of staff, which contributed to the poor material condition of the plant and the completion of only high priority work. Consequently, the high maintenance backlog significantly stressed the maintenance department in the form of emergent work, rendering the process more inefficient. Also, multiple barriers to an efficient work control process existed within the planning, preparation, scheduling and execution of work.
The large amount of emergent work significantly contributed to the inefficient work control process.
This was due, in part, to the large corrective maintenance backlog which inhibited the timely repair of deficiencies before their condition degraded. The emergent work estimates of about 20 percent were routinely exceeded with over 40 percent of routinely accomplished corrective maintenance being emergent work. This stressed all aspects of the work control system by reducing the time in which personnel had to accomplish their assigned tasks.
Less than 80 percent of the scheduled corrective maintenance was performed. The excessive emergent work prompted the staff to postpone previously planned or partially planned jobs, adding to the backlog.
Numerous barriers existed in the planning and preparation to accomplish work.
A major detractor was the quality of management information systems. Planner performance was inhibited in some cases by incorrect component identifications within the facility on SRs. This necessitated walkdowns of all equipment to verify the correct component number against design documents.
As discussed previously, the tools (computer and vendor references) for identifying correct part numbers were cumbersome and not user friendly.
The computer hindered scheduler performance because it did not allow for changes in workforce size or show support discipline ties to performing the job. The schedule was only published every other day, with handwritten updates needed when it was not published.
Due to previous training program deficiencies, there were numerous unqualified maintenance personnel requiring increased supervisor observation and direction.
Occasionally, the lack of certified workers required work activities to be postponed until a qualified individual was available.
In the field, numerous barriers existed in the execution of work.
Coordination and communication weaknesses contributed to poor maintenance, while work package quality and parts availability deficiencies decreased efficiency.
Poor communications resulted in several recurring safety-related equipment failures and damage to safety-related equipment.
Examples follow:
During an uncoupled run of the reactor coolant pump, the lower motor bearing failed as a result of lube oil starvation. The starvation occurred when a maintenance worker, attempting to correct a suspect high lube oil level, drained approximately 3 gallons of lube oil before the run. The maintenance worker failed to notify the control room that the lube oil had been drained.
The maintenance worker's supervisor, stationed in the control room, stated that he did not know of the suspect high lube oil level and would have stopped the job if he had known that 3 gallons had been drained.
Several SDG failures resulted from broken fuel oil injector pump hold down studs, many of which were installed using a deficient stud driver tool designed by the system engineer. The system engineer failed to consult design engineering or the SDG vendor while designing the tool.
An inadequate turnover contributed to maintenance personnel flushing two feedwater isolation valve hydraulic systems with used coolant from the balance-of-plant diesel generator instead of the proper flushing fluid.
An inadequate pre-job brief contributed to a HHSI motor pump bearing reservoir sight glass being improperly installed. As a result, lube oil was introduced into the motor windings.
Coordination of the various support groups did not always occur, as evidenced by the team observing two work activities which could not continue because support workers did not erect the designated scaffolding. Approximately 20 percent of the work packages were revised to correct errors or to change the scope of the work activity. The work procedures occasionally contained unneeded information and did not match the experience of the individuals using the procedures. The procedures were sometimes ignored. For example, the contractors testing motor operated valves either did not take the procedure to the field or taped all four corners of the 200-plus-page procedure shut.
When the job required parts not originally anticipated, the parts had to be sourced for availability and usually deallocated from another planned job. However, the General Maintenance Supervisor, who had to approve the deallocation, and numerous line supervisors were not sufficiently trained to use the computer, which detracted from the parts sourcing effort.
2.2.7 Post-Maintenance Testing Program Not Always Effective
Numerous weaknesses in the implementation and programmatic requirements for post-maintenance testing (PMT) reduced assurance that equipment was operable upon return to service. The PMT manual used by planners to select the appropriate test requirements did not specify appropriate detail and occasionally specified the wrong test. The planners lacked appropriate training, experience, and guidance that would allow them to compensate for the manual's deficiencies. This resulted in planners listing all possible PMT that might be necessary and specifying the PMTs to be performed as "if required." This required the already heavily burdened shift supervisor to review the scope of work completed in order to specify the appropriate post-maintenance test to be performed.
Periodically, the shift supervisor selected inappropriate PMT, and in some instances inoperable equipment was not identified, such as:
SDG 13 was inoperable for 2 weeks because of the failure to perform adequate PMT after painting activities. The correct PMT had been specified in the work package but was inappropriately cancelled due to a concern over excessive SDG starts.
PMT was not performed on a SDG output breaker after a fuel oil injector pump was repaired. During that maintenance activity, the output breaker was racked out to support work on the injector pump and later improperly racked in.
For PMT the SDG was started but breaker closure was not tested. During a subsequent surveillance test, the SDG output breaker would not close onto the bus.
After work was performed on the feeder breaker for essential chiller 21C, no PMT was performed, yet the chiller was declared operable. The following day the chiller's feeder breaker tripped during a routine start attempt due to breaker problems.
2.2.8 Periodic Testing Not Always Effective
Previous LERs and NRC enforcement actions documented that the licensee's testing procedures did not ensure all TS surveillance requirements were being met. Numerous instances had been identified where procedures were inadequate to meet TS surveillance requirements, thereby reducing assurance that the equipment was operable. Among these was a failure to completely test a manual reactor trip handswitch and the nonconservative setting of one of the four reactor protection channels during a reactor startup. To address these inadequacies, the licensee committed to perform a sample review of TS surveillance tests and verify their technical adequacy. The licensee's sample indicated that the TS surveillance program needed strengthening but did appear to satisfy TS. The licensee later committed to enhance the TS surveillance procedures.
The team determined that the enhancement program only included those TS surveillance tests performed by operations and maintenance personnel. The team was concerned that the enhancement program did not include TS surveillances performed by engineering and other groups and that those procedures could be deficient.
In follow up, the team questioned the licensee concerning an engineering test of the control room emergency ventilation recirculation charcoal adsorbers. Subsequently, the licensee determined the surveillance requirements had not been satisfied in that a defective method had been devised to determine when adsorber testing should be performed. The failure to send the charcoal sample for testing within the required interval resulted in a 3-month delay in determining that the charcoal bed was below required standards for iodine adsorption.
If the adsorbers had been tested at the proper test interval, the results may have indicated the degradation, prompting adsorber replacement before the bed fell below the required standards.
While reviewing the root causes of this event, the licensee stated that management originally intended to include all TS surveillances in the enhancement program. However, due to a personnel error, the scope of the program had been reduced to include only the operations and maintenance surveillances. The licensee expanded the scope of the enhancement program to meet the original intent.
2.2.9 Positive Observations
Maintenance and testing contained several positive attributes. Shop facilities were good and almost all the participants in the work control process had been recently relocated into one building. The creation of the Technical Support Engineer position improved communications with engineering.
Using walkdown crews (when work schedule constraints permitted) alleviated some of the field difficulties. The General Maintenance Supervisor position was also an asset in resolving field difficulties. The licensee had recognized the need for improvements and recently developed or was developing corrective action plans to address many of the problem areas observed by the team, including the computer based management information system, maintenance training and the work control process.
2.3 Engineering Support
The team evaluated the effectiveness of engineering and technical support functions by reviewing engineering evaluations, root cause determinations, responses to plant deficiencies, licensee event reports (LERs), and engineering work backlog trends associated with the plant engineering department (PED) and design engineering department (DED). The team reviewed and walked down the essential chilled water system (CH) and associated heating, ventilation, and air conditioning systems to compare the as-built condition with design. The team interviewed members of the engineering staff and management to evaluate teamwork, communication, technical background, training, and the work environment.
Several conditions reduced the ability of the engineering staff to support other organizations. Neither the plant nor the design engineering staff had sufficient resources to appropriately support the site. This caused engineering to be slow in identifying deficient conditions and hasty in performing investigations or root cause evaluations, resulting in many engineering solutions or products that corrected the symptom, but not the root cause. Approved corrective actions generally took a long time to implement because of schedular or financial considerations. The system engineering program was comprehensive, but was not effectively implemented because of insufficient resources and management oversight.
Engineering work backlogs were large, rapidly increasing, poorly tracked, and not well managed.
Industry and site operational experience was not effectively used, which led to avoidable site events, repetitive equipment failures, and additional engineering time expenditures. The engineering staff was not sufficiently trained and lacked the analytical tools for some tasks.
Information databases were often inaccurate or not current, and computers for system-level trending were very limited in number.
Improvement programs often did not help improve the efficiency of engineering support, and the resultant corrective actions were often delayed or cancelled because of low priority or high cost.
Configuration control weaknesses adversely affected safety-related equipment and the quality of design documents.
The design, maintenance, and testing of the essential chilled water system contained functional and programmatic weaknesses, which, if not corrected, could adversely affect the operability of the system. The licensee had never demonstrated or analyzed the ability of the system to function under design basis accident low heat load conditions. As a result, the team questioned the operability of the system under certain cold weather accident conditions.
The licensee also did not resolve several chronic fire protection issues in a timely manner. The issues included excessive shrinkage of penetration seals, an unreliable fire alarm system, a large backlog of service requests on fire protection systems, and inadequate control of transient combustibles in the plant.
Team observations noted a dedicated engineering staff, an evolving engineering program to include an increased level of support through the establishment of the technical support engineer group, well written and comprehensive major modifications, and a partially completed design basis document program that appeared to produce good results.
2.3.1 Weak Support in Resolving Plant Problems
The engineering departments gave weak support in resolving plant problems.
The root cause analyses (RCAs) and resulting corrective actions were often ineffective in preventing repetitive equipment problems. The team found inadequacies regarding equipment operability evaluations. The permanent plant modification program, although complex, was comprehensive and well defined, and modification packages were generally well prepared. However, the installation of plant modifications to effect plant improvements was not always successful. Temporary modifications (TMs) were not thoroughly evaluated and were not aggressively pursued to closure.
Examples of ineffective engineering support, investigations, root cause analyses, and corrective actions include the following:
The licensee did not determine the root cause of repetitive failures of the fuel injector pump hold-down studs associated with the standby diesel generators (SDGs). Nine separate failures occurred between 1987 and 1993, including five failures on SDG 22. The failure of these studs was a significant contributor to the high unavailability of SDG 22.
The RCAs and accompanying corrective actions were ineffective in preventing repeated failures of the toxic gas monitors and containment ventilation isolation system.
Since 1987, 28 toxic gas monitor events occurred, including 6 during 1991 through 1992. Similarly, there were several repeat occurrences involving spurious actuation of the containment ventilation isolation system, including 4 during 1991 through 1992.
Widespread, longstanding problems with the application and performance of Target Rock solenoid-operated valves (SOV) were not resolved. These valves were used extensively in several safety-related systems.
Multiple LERs involving wear, aging, debris, contamination, and valve misapplication had occurred since 1990. Temporary modifications were installed to bypass containment isolation valves to allow steam generator sampling. Previous corrective actions, such as re-orienting the main steam isolation valve above the seat drains, did not prevent additional failures.
The licensee started up with a significant design deficiency that resulted in excessive water hammer in the auxiliary feedwater system (AFW).
Engineering's resolution to the water hammer issue was to install mechanical stops on the AFW valves to prevent them from closing, which created additional operational concerns. Operators could no longer effectively throttle valves during certain plant conditions to control flow to the steam generators. As a result, operators controlled flow by cycling the stop check valves, resulting in an excessive number of thermal cycles on steam generator nozzles.
Corrective actions for numerous safety and nonsafety-related circuit breaker problems were not aggressive or complete. The licensee evaluated each breaker failure and took corrective actions for safety-related breakers. Many of these actions were incomplete.
Further, the licensee was slow in resolving problems and taking corrective actions for many nonsafety-related breakers.
After a reactor trip, the startup feedwater pump (SUFP) failed to start upon demand because of low oil pressure.
Repeated occurrences of moisture intrusion had caused the oil filters to become clogged, reducing the lube oil pressure. A previous SUFP trip on low lube oil pressure had not been properly evaluated, resulting in the failure to recognize design deficiencies.
During oil pump transfers, the steam generator feed pump turbine tripped repeatedly because the oil pressure decreased rapidly.
Engineering mistakenly accepted the recommendation of a vendor to drill holes in the pump casing to prevent air binding, which, when implemented, exacerbated the problem.
The Technical Support Center diesel generator was not reliable, as evidenced by repeated failures to start and load during testing.
Contributing to the poor reliability were exposure to the environment, design weaknesses, and poor circuit breaker reliability. The licensee only partially implemented proposed resolutions to these problems.
The engineering staff did not always adequately evaluate equipment operability as illustrated below:
In August 1992, a system engineer discovered that seismic hold-down screws were missing from the Unit 1 quality display parameter system (QDPS) card racks, but did not understand the seismic consequences and did not request an evaluation for operability. The licensee did not properly evaluate the effect of the deficiency on operability until so requested by the team in April 1993.
The QDPS was subsequently declared inoperable.
Torque measurements and computations associated with testing of motor operated valves (MOVs) were not evaluated to verify valve operability.
The licensee discovered, upon evaluating previous test data, that several residual heat removal valves had been torqued above design values because of a deficiency in the test procedure and associated engineering documents to measure or compute torque.
The permanent plant modification program, although complex, was comprehensive and well defined, and modification packages were generally well prepared.
Most safety evaluations were thorough. The licensee performed followup actions, such as drawing and document revisions, in a timely manner. However, the installation of plant modifications to effect plant improvements was not always successful. TMs were not thoroughly evaluated and were not aggressively pursued to closure, as illustrated in the following:
Sixteen TMs were installed for more than 2 years, including some that caused problems for operators. Some TMs were originally assigned a long restoration period (1 to 2 years) or given an extension without adequate justification. Some were later converted to permanent modifications and remained open until the permanent modifications were closed.
In performing engineering evaluations for TMs affecting the CH system and steam generator sample valves, the engineering staff failed to realistically evaluate required operator action in a potential high radiation field, to compensate for failed safety-related automatic valve actuators.
2.3.2 System Engineering Program Not Effectively Implemented
Program expectations for the system engineers greatly exceeded the resources provided. Some system engineers were assigned the "primary" responsibility for as many as 10 systems, with an additional 10 systems assigned as "backup."
Most system engineers could not remember what backup systems they were assigned, and were not knowledgeable in their backup system assignment.
Staffing allocation was roughly based upon other two-unit facilities; however, the three-train safety system design resulted in an increased workload for the system engineers when compared to otherwise equivalent nuclear facilities with two trains.
System engineers generally did not complete their monthly walkdowns or did not sufficiently document them when performed. Some system engineers performed walkdowns of multiple systems in both units on the same day, indicating a cursory review at best. System health reports lacked useful detail and trending information. Most system engineers received no feedback on the content of the system health reports from their supervisors, did not review and track service requests on their assigned systems, did not know how many service requests were outstanding on their systems, did not know how many modifications affected their systems, and did not track and trend problems or particular attributes of their systems. The licensee indicated that trending will not be performed until the end of 1993 when the software becomes available.
PED relied heavily on on-the-job training (OJT) or previous work experience of the system engineer to fulfill its needs.
Several engineers were deficient in training or equivalent work experience, which, with the demands on time available for daily responsibilities and a perception of limited resources, resulted in system engineers receiving little training for specific jobs, components, or systems. Those engineers who were "hands-on" oriented and focused more on the equipment aspects of their system tended not to be as involved in technical monitoring and analysis, which included design basis issues, system tracking and trending, and proactive activities.
According to PED's system engineering guidelines, the system engineer was expected to "maintain a knowledge of the status of, and will be considered the plant expert for, their assigned system(s)." However, the system engineers could not effectively do their job because of the large amount of emergent work, lack of detailed system or component specific training, inaccurate material history databases, lack of system tracking and trending, lack of individual computer systems, and staffing limitations. Additionally, management did not oversee and direct the program in a consistent manner.
System engineers reported to different supervisors who had differing standards for implementing the system engineering program, which resulted in program inconsistencies. Because of the reactive nature of system engineering work, and the networking between operations and maintenance, first line supervisors maintained minimal control over work assigned to the system engineers, who spent over 40 percent of their time supporting emergent work of other site organizations. Thus, the system engineers received support requests that had not been screened for validity by PED supervision.
2.3.3 Engineering Work Backlogs Were Large, Poorly Tracked, and Not Well Managed
The engineering work backlog was large, increasing rapidly, and ineffectively managed.
Emergent work consumed approximately 40 percent of engineering time and prevented engineering from reducing the backlog, accomplishing scheduled work, or helping to improve the plant.
The licensee did not have an effective method to determine the size and composition of the engineering backlog. This conclusion is based on the fact that the data initially given to the team was grossly inaccurate and it subsequently took more than 4 weeks to provide reasonably accurate data. The backlog consisted of approximately 10,800 work items on May 1, 1993, including 253 modifications, 395 engineering change notices, 6674 preventive maintenance feedback items, 209 predictive maintenance items, 200 Station Problem Report (SPR) investigation items, 690 plant change form items, 204 design change notices, 381 request-for-action items, 54 TMs, 385 procedures, 33 vendor equipment technical information program (VETIP) items, 51 vendor packages, 660 "closure" items, 44 operating experience review (OER) items, and other miscellaneous items. The backlog did not include work assignments of administrative or contractor personnel.
The licensee's person-hour estimates to complete each type of work item indicated that the modifications, the engineering change packages, and the preventive maintenance feedback items were the most significant items in the backlog. The number of work items in the backlog was increasing at a net rate of 428 each calendar quarter (7 person years each quarter). To compensate for this workload, numerous individuals worked more than 70 percent overtime and some worked more than 100 percent overtime in a pay period.
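The backlog figures quoted above can be cross-checked with a short tally. The following is only an illustrative sketch: the category counts and the net growth rate are taken from the text, while the remainder attributed to miscellaneous items and the annualized growth figure are derived values that assume the approximate total of 10,800 and a constant quarterly rate. The variable names are hypothetical.

```python
# Backlog categories and counts as reported for May 1, 1993.
backlog_items = {
    "modifications": 253,
    "engineering change notices": 395,
    "preventive maintenance feedback items": 6674,
    "predictive maintenance items": 209,
    "SPR investigation items": 200,
    "plant change form items": 690,
    "design change notices": 204,
    "request-for-action items": 381,
    "temporary modifications": 54,
    "procedures": 385,
    "VETIP items": 33,
    "vendor packages": 51,
    "closure items": 660,
    "OER items": 44,
}

REPORTED_TOTAL = 10_800        # approximate total stated in the report
NET_GROWTH_PER_QUARTER = 428   # net new work items per calendar quarter

# Sum of the categories the report itemizes.
itemized_total = sum(backlog_items.values())

# Remainder implied for "other miscellaneous items" (derived, approximate).
miscellaneous = REPORTED_TOTAL - itemized_total

# Largest single category and its share of the reported total.
largest = max(backlog_items, key=backlog_items.get)
largest_share = backlog_items[largest] / REPORTED_TOTAL

# Annualized growth, assuming the quarterly rate stays constant.
annual_growth = 4 * NET_GROWTH_PER_QUARTER

print(f"Itemized total: {itemized_total}")
print(f"Implied miscellaneous items: {miscellaneous}")
print(f"Largest category: {largest} ({largest_share:.0%} of total)")
print(f"Items added per year at the reported rate: {annual_growth}")
```

On these numbers, preventive maintenance feedback items alone account for well over half of the reported work items, although the licensee's person-hour estimates, not item counts, were the basis for ranking significance.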
The licensee was not incorporating amendments into site vendor drawings in a timely manner. On March 19, 1993, approximately 11,150 vendor drawings (approximately 50 percent being safety-related) had one or more unincorporated amendments. The many unincorporated amendments rendered the associated vendor drawings cumbersome to use and impeded work planning and execution. Previous initiatives to reduce this backlog were not effective.
2.3.4 Use of Industry and Site Operational Experience Was Inadequate
Industry and site OERs performed by the licensee were not comprehensive or timely, and failed to completely address problems or recommendations.
In several instances, engineering failed to review and benefit from industry experience, such as described in NRC information notices and bulletins, vendor service bulletins, and industry reports, or site operational experience, which led to avoidable site events, repetitive equipment failures, and additional engineering time expenditures.
The following are examples in which the licensee failed to properly implement the OER program.
NRC Information Notice 91-046, "Degradation of Emergency Diesel Generator Fuel Oil Delivery Systems," listed instances where inadvertent painting of fuel injector assemblies, including metering rods, rendered emergency diesel generators inoperable.
The licensee's response to the notice indicated that adequate controls were in place and that no further actions were necessary.
However, during painting activities, paint dripped into the holes which contained the fuel metering rods, rendering a diesel inoperable as later discovered during the performance of a surveillance test.
During tests in March 1993, the licensee discovered that 36 control rods in Unit 1 were thermally locked. The event occurred following a reactor cooldown in February 1993 with the control rods energized on the core bottom. The licensee could have avoided the event by following guidance in Westinghouse Technical Bulletin TB-92-05 of May 21, 1992.
The licensee received the bulletin in June 1992 but failed to route it to Reactor Engineering and Operations Support groups. Therefore, its contents were not incorporated into operating procedures by cognizant operational groups.
When replacing SDG rocker arms with a modified design, the licensee failed to include specific Cooper-Bessemer service bulletin requirements for torquing and installing modified parts. This could have prevented the replaced rocker arms from functioning properly.
During an uncoupled run of a reactor coolant pump, the lower motor bearing failed from a lack of lube oil (LO) after a maintenance worker drained approximately 3 gallons of LO in an attempt to correct a suspect high LO level. An investigation showed that reactor coolant pump motor bearing oil levels had a history of erratic readings and that a lower reactor coolant pump motor bearing was damaged during a previous outage because of insufficient LO in the lower bearing.
In May 1990, the licensee detected high vibration readings on the Unit 1 turbine generator seal oil backup pump, but did not monitor the pump until completing the 1992 outage and inspection of the main turbine and auxiliaries. During turbine startup, high vibration readings were again observed on the seal oil motor and pump bearings, which necessitated repair.
The licensee assigned limited personnel and hardware resources to the VETIP to receive, distribute, and track vendor information. The licensee added staff temporarily to correct problems, but did not take long term corrective actions, thus permitting problems to recur.
For example, the licensee noted a significant backlog of unincorporated vendor information in 1988, temporarily reduced the backlog in 1991, but since then, allowed a significant backlog to accumulate. Many examples of inadequate incorporation of vendor information were repeatedly noted by Quality Assurance (QA), the Independent Safety Engineering Group (ISEG), and other audit groups without substantive corrective action being taken.
The licensee established a probabilistic risk assessment (PRA) group and a detailed PRA for the site. The licensee used this PRA information to enhance some plant procedures and hardware, and to justify 16 proposed changes to the technical specifications (TS). However, the licensee had not updated the PRA database to reflect actual plant equipment failure data. The PRA had been updated to reflect the frequency and duration of maintenance for some plant equipment. The licensee was not using the unique capabilities of the PRA group to identify plant equipment reliability or to help in ranking modification or maintenance work. During this evaluation, the licensee used PRA to address team concerns with the reliability of the SDGs, in particular for SDG 22, but only in response to specific and repeated team requests.
2.3.5 Insufficient Support to Engineering
Management did not support engineering by assigning an adequate number of staff; supplying resources to implement engineering modifications, corrective actions, and improvement initiatives in a timely manner; providing an accurate material database; and maintaining an accurate management information system, including personal computer and software support. These factors reduced the effectiveness of engineering performance as illustrated in the following:
Management assigned inadequate information systems to aid engineering in evaluating system performance, trending maintenance history, accessing industry and site experience, performing investigations and root cause analyses, and making informed decisions. The equipment maintenance history database was not accurate and current because of the poor quality of information loaded into the system, and because of the large backlog of outstanding entries, estimated by the licensee to be approximately 6-8 months. The engineering staff did not use information contained in the database because it was not reliable. A sample of various databases showed conflicting and incomplete information concerning the maintenance history of CH chillers, failure histories for the SDGs, lists of temporary modifications, and MOV issues. The licensee could not retrieve design basis variances concerning MOV setpoints, and could not track or index Plant Change Forms by system or type. The licensee had to manually search service requests (SRs) to determine where modified SDG rocker arms were installed, and whether they were installed in accordance with a Cooper-Bessemer bulletin.
The effectiveness of engineering was hampered by sparse computer resources and analytical tools to monitor and assess component or system performance. Until the end of 1992, only five percent of the system engineers had a computer to aid in performing their job function.
Backlogged engineering work continued to increase at the rate of seven person-years each quarter, even though most groups in PED and DED worked substantial amounts of overtime. Many staff and supervisors worked more than 70 percent overtime, and some more than 100 percent, in a pay period.
Management support for training was weak and inequitable.
PED was weaker than DED in terms of background and experience and had more staff (170 vs. 148), but was assigned only one-seventh the training budget of DED. The licensee primarily used OJT (particularly in PED) or previous work experience rather than training.
Engineering performance was not substantially improved through the improvement program process. The licensee fell behind its schedule in completing many improvement programs designed to improve engineering performance and cancelled some after investing substantial resources.
Some corrective actions resulting from improvement programs produced no improvement in performance and were later cancelled. The licensee appeared to classify improvement program action items as " closed" without evaluating their effectiveness.
Substantial recurrent problems noted by maintenance, operations, engineering or other groups often resulted in design modifications to resolve the problem. However, the modifications were not installed in a timely manner.
In April 1993, the licensee conducted a modification scoping meeting to discuss the various due dates and installation windows in preparation for executive review of the 1994 modification budget. Over 150 modifications, many involving multiple components, were reviewed. The licensee speculated that approximately 50 percent of the modifications scheduled would actually be worked. Many of the modifications considered for 1994, 1995 or 1996 were initiated between 1987 and 1990. During the same meeting, over 100 ECNs were discussed for implementation, most of which were initiated during 1988 to 1990.
The licensee failed to make effective use of studies critical of engineering activities. A substantial contractor review of engineering activities completed in February 1992 resulted in 42 major findings, 20 conclusions, and 11 recommendations. The licensee did not track the recommendations because "the recommendations and associated actions were not viewed by the line organizations as having sufficient specific value to warrant attention." Many of the findings and conclusions made in the contractor report were similar to the team findings. Examples include: Duties of system engineers not effectively communicated, frequent priority and modification scope changes, crisis management atmosphere, information database fragmentation and inconsistencies preventing efficient use of resources for engineers, large backlogs, work process inefficiencies, poor to non-existent professional training, supervisors not aware of their responsibilities, PED not proactive, system engineer group understaffed, trending not done, focus on short term problems, transition from construction to operations not fully accomplished, and budgeting and tracking of engineering manpower not specific.
2.3.6 Configuration Control Weaknesses
Configuration control weaknesses that adversely affected safety-related plant equipment were noted in several instances, such as molded case circuit breakers, SDGs, and environmental qualification of MOVs.
In other instances, such as the vendor drawings, the team observed weaknesses in configuration control that, if left uncorrected, could adversely affect plant operations.
Ineffective management oversight and direction, including insufficient resources, were significant contributors to these weaknesses.
Examples of configuration control weaknesses include the following:
The Electrical Setpoint Index for molded case circuit breakers was not properly understood or implemented in the field, resulting in operability concerns.
While performing maintenance on molded case circuit breakers, the licensee discovered that magnetic trip settings were adjusted using the electrical penetration test point calculations for permissible currents rather than trip values obtained from the index. The licensee later determined that the instantaneous trip (magnetic) settings were improperly adjusted for approximately 30 breakers in Units 1 and 2.
The licensee found operability concerns with 10 breakers powering MOVs such as containment and accumulator isolation valves. This condition may have existed since startup. Although the index contained appropriate criteria, the licensee had not prepared detailed work or procedural instructions for craft personnel to use in interpreting or scaling the index guidance.
When installing SDG rocker arms with a modified design, the licensee failed to include specific Cooper-Bessemer service bulletin requirements for torquing and installing the modified parts, which could have caused the replaced rocker arms to function improperly. Even after the licensee was alerted to the bulletin requirements, installation of the rocker arms was still not completed correctly; i.e., the requirement to replace both the intake and exhaust rocker arms as a set was not accomplished. The licensee also had to resort to hand searches of service requests to locate where the modified rocker arms were installed.
The licensee did not maintain the environmental qualification of valve actuator motors in containment by installing "T" drains as required by design. A service request submitted in November 1990 to install two "T" drains in Unit 2 train B residual heat removal suction isolation valve was still open during the evaluation. The team requested the licensee to determine which MOVs did not have installed "T" drains.
The licensee found five actuator motors that did not have "T" drains. The engineering staff evaluated three of the five, concluded that no action was required, and was evaluating corrective actions for the remaining two valve actuator motors.
The many unincorporated amendments to vendor drawings remained significant and could impede work planning and execution. As of March 19, 1993, approximately 11,150 vendor drawings (approximately 50 percent being safety-related) had one or more outstanding unincorporated amendments.
2.3.7 Functional and Programmatic Weaknesses Could Adversely Affect the Operability of the Essential Chilled Water System
The safety-related CH system supplies chilled water to air-handling units in areas of the plant housing safety-related equipment. The safety function of the system is to maintain space temperatures within design limits in the event of an accident.
Functional and programmatic weaknesses were observed in the design, testing, modification, and maintenance of the system that, if uncorrected, could adversely affect the operability of the system.
The ability of the CH system to function for extended periods during a design basis accident under low heat load conditions was never demonstrated, either by testing the system at various design basis accident heat loads, or by engineering analysis.
Examples of weaknesses include the following:
The licensee did not complete an analysis for the CH system under low heat load conditions.
If an accident occurred during cold weather and all chillers operated, the chillers would be under-loaded, causing surging and failure and resulting in loss of CH cooling of safety-related equipment. A 1989 Safety System Functional Assessment (SSFA) of the essential cooling water (ECW) system conducted by the licensee identified concerns with chiller operation for accidents during cold weather. The licensee performed calculations for some concerns of the SSFA, but failed to address overcooling of the CH system during accidents when more than two trains of the chilled water system operate, as in response to a safety injection or loss-of-offsite-power signal, which starts all three trains, or when heat loads would be lower than those assumed for worst-case high heat load accident conditions. The findings in a March 26, 1993, QA report of a CH system SSFA included continuing concerns about compressor low heat loads. Although the planned action to resolve the findings was adequate, a number of issues, such as verification of field setting of compressor suction pressure trip setpoint, were not adequately addressed by the licensee.
The licensee made a commitment to the team to evaluate under-loading of chillers during accident conditions.
Preoperational, surveillance, and post-maintenance testing (PMT) performed on the CH system did not demonstrate that the system would be operable for extended periods under design basis heat load conditions.
The piping design configuration did not allow the CH system to be tested with heat loads representative of those anticipated during accident conditions.
Compressor refrigerant and oil contamination was a long term problem that significantly affected reliability. The vendor proposed installing a proven refrigerant clean-up kit that would allow uninterrupted chiller operation. Although the modification was approved in September 1991 for installation in 1992, its installation date was deferred to October 1994 for Unit 1 and April 1995 for Unit 2.
In 1993, after further evaluation and repeated attempts at installation, the licensee cancelled plans to install a proximate vibration probe assembly recommended by a vendor in 1988 to detect high speed thrust bearing displacement and an automatic compressor trip function for the 300-ton compressors to prevent catastrophic failure.
In 1989, the licensee implemented a temporary modification to remove an ECW valve actuator which automatically controlled flow to the chiller condensers by using an upstream manual valve rather than correcting automatic control system design and material deficiencies.
After maintenance work was performed on the feeder breaker for essential chiller 21C, the chiller was declared operable without PMT. The following day the chiller tripped during a routine start attempt because of breaker problems.
The maintenance craft personnel introduced air into the essential chillers and flooded a control panel with oil because they did not understand how the chillers function under vacuum.
Inadequate training caused poor maintenance work and contributed to degraded performance of the equipment and the lack of availability.
2.3.8 Untimely Resolution of Fire Protection Issues
The licensee did not resolve numerous fire protection issues in a timely manner. The issues included excessive shrinkage of penetration seals, an unreliable fire alarm system, a large backlog of service requests on fire protection systems, and inadequate control of transient combustibles in the plant. Management did not adequately oversee and direct the efforts to resolve these issues in a timely manner.
Excessive shrinkage and resultant cracks of Hydrosil-type penetration seals allowed free air to pass between fire areas and raised questions of structural integrity, making the seals ineffective fire barriers. The problem was previously identified in 1990 and was thought to have been corrected after a 100 percent survey in 1991-92 and subsequent repairs/rework. The cracking was again identified in March 1993. The investigation of the problem was scheduled to be completed by May 31, 1993.
The Pyrotronics fire protection computer system, which monitors fires in various plant areas and alarms in the control room, was unreliable with numerous chronic problems, including defective detectors and electronic transmitter boards.
Numerous false alarms frequently annunciated (20-30 each day) and control room operators could not quickly ascertain which detector was in alarm status.
Replacement parts were not available because the system was obsolete. Although a modification was proposed to replace the system, the modification received low priority, and was not scheduled for installation until 1996. The team raised concerns about the system reliability and the ability of operators to determine if and where a fire existed.
At the time of the evaluation, the licensee had a large backlog of 361 open SRs for fire protection systems (164 for Unit 1, 122 for Unit 2, and 75 common). Included were 249 SRs associated with fire suppression system problems, the majority being valve packing leaks; and 112 SRs associated with fire detection systems, 30 percent caused by trouble-alarms because of dirty fire detectors.
In addition to the backlog of 361 open SRs for fire protection systems, the licensee had another backlog of 68 SRs for Unit 1 and 163 for Unit 2, pertaining to fire barrier breaches including Thermo-Lag installations and Hydrosil penetration seals.
The large backlog indicated that the reliability of fire protection systems was questionable.
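The service request counts above can be reconciled with a brief calculation. This is only an illustrative sketch: the per-unit and per-type counts are taken from the text, while the combined total and the approximate number of dirty-detector SRs are derived here and are not stated in the report. The variable names are hypothetical.

```python
# Open fire protection system SRs, broken down two ways in the report.
open_srs_by_unit = {"Unit 1": 164, "Unit 2": 122, "common": 75}
open_srs_by_type = {"fire suppression": 249, "fire detection": 112}

# Separate backlog of SRs for fire barrier breaches.
barrier_breach_srs = {"Unit 1": 68, "Unit 2": 163}

total_by_unit = sum(open_srs_by_unit.values())
total_by_type = sum(open_srs_by_type.values())
barrier_total = sum(barrier_breach_srs.values())

# Derived: both backlogs combined (not a figure given in the report).
combined_total = total_by_unit + barrier_total

# Derived: roughly 30 percent of detection SRs were attributed to dirty detectors.
dirty_detector_srs = round(0.30 * open_srs_by_type["fire detection"])

print(f"Open SRs by unit: {total_by_unit}")
print(f"Open SRs by type: {total_by_type}")
print(f"Fire barrier breach SRs: {barrier_total}")
print(f"Combined backlog: {combined_total}")
print(f"Approximate dirty-detector SRs: {dirty_detector_srs}")
```

Both breakdowns sum to the reported 361 open SRs, so the per-unit and per-type figures are mutually consistent.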
In April 1993, the licensee found significant quantities of transient combustibles, such as wooden tables, waste oil, oil-soaked rags, and miscellaneous combustible items, located throughout the plant. The presence of such large amounts of transient combustibles was indicative of an inadequate control program.
2.3.9 Positive Observations
Team observations included a dedicated engineering staff, an improving engineering program that included an increased level of support through establishment of the technical support engineer group, well written and comprehensive major modifications, and a partially completed design basis document program that appeared to produce good results.
2.4 Management and Organization
The team evaluated the involvement and effectiveness of STP management and organization with respect to safe plant operation. This evaluation resulted in significant findings and observations in the following areas: direction and oversight, support and resource utilization, communication and teamwork, corrective actions, self-assessment and quality oversight, and information systems.
Several factors significantly decreased management effectiveness throughout the organization. Senior management failed to provide the staff clear direction and oversight in several key areas, including performance standards and station priorities.
Frequent, conflicting messages about the implementation of these standards and priorities were sent by senior management. Numerous uncontrolled memoranda and oral instructions were used to change standards and priorities. As a result, the staff questioned the credibility of senior management. Middle managers often failed to obtain feedback on problems and give consistent direction because they did not interact frequently enough with people in the plant.
Management failed to apply sufficient resources to maintain adequate performance levels and standards. Significant station activities were not adequately funded despite the clearly stated objections of the middle managers responsible for those areas.
Information systems, including management information systems, did not adequately support performance monitoring and in some cases impeded the plant staff.
Management did not establish good communications and teamwork.
Expectations regarding competing priorities between budget, schedule and safety performance were not communicated well. Vertical communications were particularly weak.
Senior managers did not foster frank, open feedback from lower managers and staff. As a result, lower managers were reluctant to bring problems to senior management. Horizontal communications and interface problems added to the difficulty of completing work using established processes. There was weak coordination and accountability between the disciplines during routine work.
As a result, an excessive number of task forces, outside the normal organization, seemed to be required to accomplish routine work.
The team concluded that the licensee's ineffective corrective action processes were major obstacles to improving plant equipment and human performance.
Ineffective problem identification, shallow root cause analyses, inadequate safety evaluations, lack of aggressive problem resolution, poor information systems, and budgetary constraints resulted in short term rather than long term solutions. Managers did not respond effectively to the findings, concerns, and recommendations of their principal self-assessment and quality oversight functions, especially those of the Nuclear Safety Review Board and Quality Assurance.
Prior to the NRC's decision to conduct a diagnostic evaluation, the licensee had identified and acknowledged most of the problems identified by the team and had made several attempts to improve performance, including implementation of an operations improvement program. However, these efforts were only partially effective. The licensee had developed a Master Operating Plan for South Texas that included action plans to effect improvements.
Following the announcement of the NRC diagnostic evaluation, the licensee conducted a pre-diagnostic evaluation which confirmed the need for further action. The Master Operating Plan was substantially modified to incorporate the results of that evaluation. Another positive observation was the recent management and organizational changes, both those completed and those underway at the end of the evaluation period.
2.4.1 Ineffective Direction and Oversight
Senior management failed to provide the staff clear direction and oversight in several key areas including performance standards and station priorities.
Frequent, conflicting messages about the implementation of these standards and priorities were sent by senior management. Numerous uncontrolled memoranda and oral instructions were used to change standards and priorities. As a result, the staff often questioned the credibility of senior management.
Management's stated emphasis on "doing things right, not just doing them" often seemed to conflict with these memoranda and instructions. As a result, the staff w n tioneo the credibility of senior management.
Several factors significantly decreased management's effectiveness throughout the organization. Middle managers often failed to obtain feedback on problems and give consistent direction because they did not interact frequently enough with people in the plant. The reactive mode of the organization, a result of the poor material condition of equipment, contributed to managers not spending enough time for face to face communications with people in the plant.
Although the licensee initiated the management surveillance program in 1990 in an attempt to increase management's presence in the plant, the plant staff had not fully accepted the program. The perception by plant personnel was that the managers focused on minor housekeeping items rather than effectively interfacing with personnel and providing one-on-one direction and feedback.
The lack of clear and consistent station management direction combined with senior management's over-involvement in lower level issues created a widespread perception that middle managers had little authority. Many levels of approval were required for some lower level decisions, such as small purchase orders. This over-involvement contributed to a high senior management workload, limited their time available to focus and provide direction on higher level issues, and discouraged ownership and accountability at the lower levels of management. As a result, many of the plant's more important activities and initiatives, such as root cause analyses, did not receive consistent and clear management direction and did not have an owner that really felt accountable. Key performance issues were often not fully appreciated by senior management even after they were identified by outside industry and regulatory agencies, despite precursors and warnings within the organization at STP.
Most managers at STP lacked commercial nuclear operating experience outside of STP. Some managers had Navy nuclear experience, but had very limited experience at STP. Weaknesses in commercial operational experience resulted in a lack of insight into some operational problems and failure to fully recognize the safety significance of some operational issues and also contributed to management's failure to fully appreciate and compensate for the unique challenges of the plant's design.
In addition, many managers had recently been rotated into positions for which they had little background.
The majority of the department level managers had been rotated one or more times during the past year. The large number of recent management changes caused sufficient concern for top management to issue a one year moratorium in early 1993 on management rotations in an attempt to stabilize the management team.
The lack of clear and consistent direction and oversight often contributed to a failure to successfully implement a key program or improvement initiative.
Although the licensee recently revised the Master Operating Plan (MOP) to better reflect management's stated strategic goals, many of these new goals remained to be implemented. The discontinuity between the strategic goals and the daily activities had undercut the credibility of senior management's plans and the MOP.
2.4.2 Poor Support and Resource Utilization
Management failed to provide and adequately focus sufficient resources to maintain performance levels and standards for the existing plant conditions.
Significant station activities were not adequately funded despite the clearly stated objections of the responsible middle level managers. Although top management clearly stated to the team that resource limitations had never prevented the accomplishment of necessary maintenance or the resolution of safety-related problems, middle level managers perceived that resources would not be approved if the proposed line item caused department budgets to exceed the target budget levels established by senior management.
STP management had not established management systems that would effectively and efficiently accomplish the strategic goals listed in the MOP by implementing those goals into the daily work schedule. The planning, scheduling and work process controls did not support the timely and reliable completion of work by maintenance, operations and engineering. Although station management had recognized this problem, they had failed, until recently, to focus the necessary resources to correct this situation. The impact of this problem was readily observable across the station in operations, maintenance, and engineering. These management systems problems had contributed to job performance errors, low productivity, deferral of maintenance, and poor prioritization of necessary station work.
Senior management's reaction to unforeseen, emergent work was to defer or cancel other previously budgeted line items to maintain the target budget expenditure goals. This approach resulted in deferral or cancellation of budget line items that had previously been judged to have merit. STP routinely experienced a significant end-of-year deficit in the accomplishment of planned, priority work because of the failure to adequately anticipate and budget for emergent work. The increasing backlogs of deferred work in maintenance, engineering and operations were clear indicators of this management approach.
One example that illustrates the senior management response to high priority budget requests was the previous maintenance and training managers' request for maintenance training.
Both managers had established the need and requested the funds to provide additional maintenance craft training in response to recognized deficiencies. The request was not adequately funded despite a clearly written budget justification highlighting the significant consequences of not funding this program. Subsequently, the licensee's maintenance staff knowledge was found to be below industry standards and the licensee was forced to initiate an accelerated remedial training program.
A second example was the elimination of funding for the operations training pipeline for replacement of plant operations personnel due to attrition.
Operations staffing had been reduced to such a low level as to become the critical resource for implementation of some station activities. Attempts to justify funding for training of additional non-licensed operators were rejected by senior management based on inaccurate staffing projections that failed to account for the actual scope of work and responsibilities assigned to operators.
Failure to provide additional non-licensed operators had prevented upward progression of non-licensed operators into the ranks of licensed operators and precluded utilization of licensed operator experience in other functional areas at STP.
A third example of the impact of insufficient funding was the budget exclusion noted in the proposal submitted by engineering highlighting the fact that engineering backlogs of modifications and corrective actions would not be reduced in 1993 due to lack of funding.
In fact, engineering backlogs had continued to increase in 1993.
Staffing levels were marginal or insufficient in several key areas.
There were several incidents where operations personnel exceeded technical specification guidelines on overtime. Minimal operations staffing was exemplified by some operating crews' inability to compensate for unplanned absences despite the fact that the licensee had recently gone from five operating crews to four and operator training was deferred. The maintenance department as a whole routinely worked in excess of 50 percent overtime.
Performance review boards conducted very recently by the licensee identified fatigue as a root cause of several personnel errors.
Insufficient staffing and resource utilization in engineering had resulted in some engineers working in excess of 70 percent overtime and system engineers being unable to meet management's expectations of their performance.
Insufficient staffing in the CAG and the independent safety engineering group (ISEG) resulted in significant backlogs of station problem report (SPR) and operational experience report (OER) reviews.
ISEG had also been forced to cancel or delay safety-significant tasks in the past, due to resource constraints.
Further, even when the failure to approve a request could clearly be projected to result in mid-year requests for additional funding, key budget line items were still not adequately funded.
For example, when the operations department requested additional funds for operator overtime, senior management denied these funds based on budget target goals that had been established on data from the 1992 approved budget and not on actual 1992 expenditures. Operations management made a convincing case but senior management stated that the budget targets would not be exceeded. This had resulted in a mid-year request by operations to provide these funds or over-run their budget.
The decision to have several station staffing studies conducted by outside consultants indicated senior management's concern over appropriate staffing levels. However, the recommended staffing levels in the most recent study were based on incorrect assumptions on productivity. The average time required to complete a service request (SR) was estimated in this study as less than 30 hours.
Information supplied by the licensee at the request of the team indicated that the actual time required to complete a SR ranged between 42 and 50 hours. Additional information found by the team indicated that the latter figures may also be low because the licensee's management information system (MIS) did not account for lost time due to lack of coordination and parts availability, which were significant problems at STP, and time expended by one craft assisting another.
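The size of the gap between the study's assumption and the reported figures can be illustrated with a short calculation. The sketch below is only an illustration: it uses the 30, 42 and 50 hour figures quoted above, treats 30 hours as the upper bound of the study's "less than 30 hours" estimate, and the function name is hypothetical. Because the study's figure was below 30 hours and the reported figures may themselves be low, the resulting factors should be read as lower bounds.

```python
# Hours per service request (SR), taken from the figures quoted in the text.
STUDY_HOURS_PER_SR = 30.0             # study said "less than 30 hours"; 30 is used as an upper bound
REPORTED_HOURS_PER_SR = (42.0, 50.0)  # range supplied by the licensee


def understatement_factor(actual_hours: float, assumed_hours: float) -> float:
    """Return how many times larger the actual labor per SR is than the assumed labor."""
    if assumed_hours <= 0:
        raise ValueError("assumed_hours must be positive")
    return actual_hours / assumed_hours


# Factor by which labor per SR exceeded the study's assumption, at each end of the range.
low, high = (understatement_factor(h, STUDY_HOURS_PER_SR) for h in REPORTED_HOURS_PER_SR)
print(f"{low:.2f} {high:.2f}")  # prints "1.40 1.67"
```

Under these assumptions, a staffing level sized to the study's estimate would cover at most roughly 60 to 70 percent of the labor actually needed per SR.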
Staff productivity was not effectively measured or understood by management.
Although the licensee identified inefficient work control processes as major contributors to the large work backlogs, the MIS did not provide adequate measures of staff productivity. The maintenance required to complete SRs was not accurately measured and no system existed to measure engineering staff productivity. Additionally, the licensee did not account for all overtime worked by salaried employees.
In addition to staffing based on incorrect assumptions on productivity, the licensee generally appeared to be staffing based on levels predicated on the station operating in a stable condition with only long term requirements and no significant backlogs or emergent workloads. These assumptions did not reflect the state of maintenance backlogs, improvement programs, and other post construction transitional requirements that existed at STP.
Licensee management was reviewing staffing levels when the team left the site and had added temporary contractors to the ISEG to address OER backlog issues.
Licensee management acknowledged at the close of the evaluation period that staffing levels in some areas were not sufficient, given the existing work process and backlog issues.
Support of training, including funding, was weak.
For example, the team identified several instances where lack of training contributed to system engineer ineffectiveness, weak problem resolution, and poor decision making.
However, the funds budgeted for training of design engineering department (DED) engineers were seven times greater than the funding budgeted for approximately the same number of engineers in the plant engineering department (PED), which included the system engineers. This appeared to be a historical discrepancy in that engineering training funds were based on previous years' budgets and not on actual need. Also, the scope and duration of operations training were frequently altered to support manpower shortages in the plant.
Management did not adequately budget for or effectively manage spare/replacement parts.
Lack of parts availability had adversely impacted the SR backlog. The licensee had established a master parts list with minimum and maximum stocking levels. However, several problems identified by the team indicated that this system may have been based on an inaccurate economic model, coupled with errors in the plant labeling system (parts identification numbers).
It appeared that management considered the entire inventory as homogenous when assessing inventory turnover frequency rather than separating long-term strategic from rotating stock. When requested by the team to provide numbers identifying the turnover of routinely used parts, it was apparent that these figures were not considered or monitored by STP. The low reported turnover rates, when coupled with the lack of available consumables reported by maintenance and the budgeted figure for parts obsolescence and disposal, indicated that STP was not adequately stocking those parts routinely used. This was exacerbated by the failure of the master parts list system to adequately account for surges in usage. The master parts list also had errors in its design data, due to inadequacies in the vendor information tracking program.
Further, the performance indicator related to stocked material availability was inaccurate due to the common practice of not starting work until it was known that parts were available.
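The point about treating the inventory as homogeneous can be illustrated with a small sketch. It assumes the common definition of turnover as the value of parts issued in a year divided by the average value held in stock; the report does not state which definition STP used. The class names, the function name, and all dollar figures below are hypothetical and are not taken from the evaluation.

```python
from dataclasses import dataclass


@dataclass
class StockClass:
    name: str
    annual_issues: float      # value of parts issued from stock in a year (hypothetical units)
    average_inventory: float  # average value held on the shelf (hypothetical units)


def turnover(annual_issues: float, average_inventory: float) -> float:
    """Inventory turnover: annual issues divided by average inventory held."""
    if average_inventory <= 0:
        raise ValueError("average_inventory must be positive")
    return annual_issues / average_inventory


# Hypothetical figures chosen only to show the effect of blending the two classes.
strategic = StockClass("long-term strategic", annual_issues=1.0, average_inventory=50.0)
rotating = StockClass("rotating", annual_issues=30.0, average_inventory=10.0)

# Treating the whole inventory as one pool gives a single blended rate.
blended = turnover(
    strategic.annual_issues + rotating.annual_issues,
    strategic.average_inventory + rotating.average_inventory,
)

# Separating the classes shows how different the underlying rates are.
per_class = {s.name: turnover(s.annual_issues, s.average_inventory) for s in (strategic, rotating)}

print(f"blended: {blended:.2f}")
for name, rate in per_class.items():
    print(f"{name}: {rate:.2f}")
```

With these illustrative numbers the blended rate looks low even though the rotating stock turns over several times a year, which is the kind of masking that separate tracking of routinely used parts would expose.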
Station improvements were adversely impacted due to budget pressures.
For example, the plant relabeling program was initiated as a result of numerous operator errors involving wrong train and wrong component and errors in parts identification which had produced maintenance delays. Operations management requested approximately $220k for this program in 1993 budget requests.
Senior management approved $12k. A second example was the Long Range Engineering Improvement Plan.
Funding for this program was eliminated in the 1993 budget.
The licensee had provided staff with excellent facilities, buildings, office space, maintenance shops, etc., and had planned these structures to facilitate work by locating similar work groups, e.g., maintenance, in close proximity.
2.4.3 Communications and Teamwork Were Weak
Management did not establish good communications and teamwork.
Expectations regarding competing priorities between budget, schedule and safety performance were not communicated well. Vertical communications were particularly weak.
Senior managers did not foster frank, open feedback from lower managers and staff. Horizontal communications and interface problems added to the difficulty of completing work using established processes. There was a lack of coordination and accountability between the disciplines during routine work.
As a result, an excessive number of task forces, outside the normal organization, seemed to be required to accomplish work.
The level of routine administrative workload and the reactive mode of the organization tended to leave little time for communications and coordination within work groups and with other groups. This problem existed, to some extent, at all levels of the organization. The senior managers sometimes did not have good communications and teamwork among themselves. The weak teamwork was accompanied by weak ownership and accountability at the lower levels. The team viewed the weak communications and teamwork as a significant contributor to the weak coordination in work planning, scheduling and control.
Management expectations in some areas often were not well communicated.
Competing priorities between budget, schedule and safety performance were not managed or communicated well and the messages received by the plant staff often reflected this conflict. For example, the team observed during meetings to discuss the Unit 1 workload and startup schedule that senior management did not appreciate the impact of their startup schedule expectation on the operations department workload and had not accurately weighed the competing priorities of safety and schedule adherence partly due to a lack of operations' input into the startup schedule.
Management had failed, in some cases, to clearly define and communicate appropriate standards and priorities for personnel and plant performance. In addition, there were often conflicting messages sent in the implementation of these standards. As a result, personnel were often not sure what management actually expected.
Examples included:
- The threshold of SPR initiation and depth of root cause analyses was not well defined and communicated to staff. As a result, the quality of root cause analyses was often weak, but varied significantly between groups and individuals within a group.
- The MOP goal of increased reliability was in conflict with the deferral of maintenance.
Vertical communications were particularly weak. The team observed that top onsite management practices did not foster frank, open feedback from lower levels of management and staff. As a result, lower managers were reluctant to bring problems to senior management. This had a negative effect on teamwork as well as vertical communications. The team attended meetings where senior management dominated the meeting to such an extent that there was little communication except top down. On several occasions after the senior management left the meeting, the team observed markedly improved communications and coordination.
The licensee had established several formal methods for employees to communicate upward to management. The Speakout Program was intended for safety concerns and the Employee Assistance Program (EAP) was intended for nonsafety concerns.
Although both programs were supposed to be anonymous, there was a perception among many employees that these programs were not, which limited their effectiveness. There was also the perception that management was not interested in hearing about problems as demonstrated by the lack of results when issues were brought forward. Some employees had used the Speakout Program to express concerns for the lack of maintenance training, but their expressions of concern proved to be ineffective. Some employees expressed concern that some issues, such as excessive overtime, were processed as nonsafety concerns in the EAP.
Although personnel expressed reluctance to report some problems, the team did not detect any reluctance for employees to report issues perceived as immediate safety concerns.
Horizontal communications and interface problems resulted in difficulties in getting work done using the existing processes. There was a lack of coordination and accountability between the disciplines for routine work. For example, a lack of effective interface, during design engineering modifications to build permanent valve operating platforms for operators, resulted in platforms that were located on the wrong end of the valve operator, not accessible to the handwheel. Another result of this lack of coordination was an excessive dependence on task forces, outside the normal organization, to accomplish many tasks at STP. While the team was onsite, there were more than 40 task forces in place.
2.4.4 Ineffective Corrective Action Process
The team concluded that the licensee's ineffective corrective action process was a major obstacle to plant equipment and human performance improvement.
Poor problem identification, shallow root cause analyses, inadequate safety impact evaluations, and lack of aggressive problem resolution, combined with poor information systems and budgetary constraints, resulted in short term rather than long term solutions to station problems. Many of the technical and management problems had been identified through internal, industry, and NRC assessments. However, until recently, management had not responded appropriately to concerns identified by these entities.
Insufficient staffing in the ISEG and the corrective action group (CAG) for the existing workload, and lack of ownership for SPR reviews and root cause analyses performed by other groups, contributed to an ineffective corrective action process.
The team found several examples where confusion and lack of training resulted in SPRs not being issued in a timely manner on safety-related equipment. The team observed the impact of this failure to document identified problems during licensee meetings to discuss SDG work. Management was not sufficiently aware of these problems or their significance because SPRs had not been written. The licensee's QA department had repeatedly notified management of a weakness in the definition of "conditions adverse to quality" which resulted in licensee personnel not being aware of when to write a SPR. This QA finding remained open at the conclusion of the team's onsite period because management had not effectively addressed this concern. Additionally, lack of effectiveness in reporting problems reflected workers' willingness to live with problems, due at least in part to conflicting management expectations and standards regarding material condition.
The team identified many examples of inadequate root cause analyses related to equipment failures. A formal root cause methodology had been established, including the recent addition of human performance factors. However, several individuals outside of the CAG who performed root cause analyses had not been adequately trained. Also, as in the case of engineering, individuals performing root cause analyses often were not knowledgeable on the system or component of concern.
An example of inadequate root cause analysis was the licensee's failure to identify the root cause of repeated failures of SDG fuel injector pump hollow hold-down studs. Additionally, until very recently, the licensee had not identified "fatigue" as a root cause of personnel errors even though the station had essentially been in an outage since September 1992, with maintenance personnel working in excess of 50 percent overtime. The NSRB had "excessive overtime" as a watch list item of concern for the previous year. The licensee's ISEG and quality assurance (QA) department, as well as the NRC, had repeatedly identified inadequate root cause analysis as a contributing factor to ineffective corrective actions.
The team identified several instances where inadequate safety evaluations resulted in ineffective corrective actions. For example, the licensee failed to adequately evaluate the impact of missing seismic hold down screws in the Unit 1 Quality Display Parameter System (QDPS) card racks when discovered in August 1992. The QDPS was declared inoperable eight months later following an operability review requested by the team.
The team identified several examples where timely and effective corrective actions were not taken.
For example, the licensee failed to take appropriate actions to resolve the presence of moisture in the feedwater isolation valves' hydraulic fluid or to address similar concerns in the essential chill water flow control valves and the steam generator PORVs.
Although senior management expressed the desire to become more responsive on corrective actions, it appeared from documentation and interviews that little progress had been made and that budgetary pressures had an adverse impact on corrective actions. This situation was clearly exemplified by the 1992 Unit 1 refueling outage where management established an unrealistic goal of a 63 day schedule which led to starting up the unit with an excessive number of deferred work items and an unreliable AFW train (even though the outage actually lasted for 105 days). In addition, failure to fund important corrective actions, such as the plant relabeling program and improvements to the CH system, had significantly contributed to the lack of station improvement.
The team found that the CAG was not adequately staffed to address the workload that existed at the time of the evaluation. The CAG had been formed in response to identified weaknesses in the corrective action program. The CAG had been budgeted to perform 50 category 1 through 3 SPR reviews, 9 event review teams, and 1500 final SPR reviews annually. However, the current workload was more than twice this amount and included OER reviews and the recent addition of performance review boards.
Further, the team found that the CAG had been funded by reducing or eliminating the corrective action funds of other departments.
In fact, the corrective action workload had increased in maintenance, operations and engineering since the establishment of the CAG.
The limited staffing available for SPR review and root cause analysis had contributed to shallow and hurried efforts. Compounding this problem, the team found that the CAG lacked ownership of the corrective action program with respect to the SPR reviews and root cause analyses not performed by CAG.
The effectiveness of ISEG in identifying root causes of problems and proper corrective actions was also limited. The scope and detail of work assigned to ISEG had exceeded the capability of the assigned staff to meet those functions required by technical specifications in a timely manner. Coordination of the OER program suffered severely from ISEG's overloaded and limited staff. This was evidenced by the excessive backlog of OERs requiring an ISEG review and the cancellation of safety-significant tasks. Two examples where lack of an effective OER program contributed to ineffective corrective actions were the long standing problems with the AFW and SDG systems discussed in earlier sections of this report. Management's failure to provide more than the technical specification minimum staffing for ISEG and the frequent change or absence of ISEG directors were further evidence of management's lack of support for corrective actions.
At the close of the evaluation, the licensee was planning revisions to both the SPR and the service request (SR) programs. Additionally, recent licensee assessments had identified lack of coordination between these programs and others (radiological occurrence reports, equipment history, etc.) as a root cause of poor corrective action performance. However, the licensee had not taken any steps to establish overall coordination of these programs.
2.4.5 Ineffective Utilization of Self-assessment and Quality Oversight Functions
Managers did not respond effectively to the findings, concerns, and recommendations of their principal self-assessment and quality oversight functions, including the Nuclear Safety Review Board (NSRB) and QA. In addition, management had not fully supported the ISEG review for lessons learned.
Since 1991, NSRB had advised STP management of specific problems which could affect safety and needed management attention. An NSRB "Watch List" was established in 1992 to emphasize the importance to safety of the 10 top issues of concern to the NSRB.
In February 1992, the material condition of the plant and the corrective maintenance backlog concerns were near the top of the NSRB Watch List of concerns.
In the last 14 months, the NSRB had repeatedly advised senior management to take proper remedial measures, but little, if any, improvement was noted in these areas. As of April 1993, these same concerns were near the top of the NSRB Watch List.
Personnel concerns were also included on the April 1993 NSRB Watch List, such as supervisor effectiveness, the effectiveness of communicating the station philosophy of getting things done right, rather than just getting them done, and personnel integrity.
Similar findings from QA audits and reviews had been equally ineffective in bringing about corrective actions and improvement even though these findings and recommendations were routed to top management and other managers who had responsibility for the issues addressed. The team found numerous records which documented QA's persistence in attempting to gain management's attention to their safety findings. Examples included continuing weaknesses in:
- the corrective action process
- material condition of safety-related systems
- management's use of informal documents (e.g., night orders) to amend or revise procedural requirements before changing the procedures themselves
- configuration management
- maintenance training, qualifications, and work activities
Beginning in 1990 and continuing well into 1993, SPEAKOUT, the QA program, and the training department manager alerted senior management that there were serious problems in maintenance training, maintenance qualifications, and maintenance performance. Management failed to effectively address the root causes and the corrective actions recommended by QA in these areas, even when advised that unacceptable practices still existed. The result was a growing backlog of work, reduced plant reliability, and an extended unplanned outage.
The ISEG review for lessons learned was not fully effective due to weak management support as discussed earlier.
Less than 50 percent of the OER documents that required review for applicability to STP systems had received any review by ISEG.
Approximately 300 OERs were open in April 1993. Many reviews had been incomplete or did not address the industry identified problems or recommendations. Additionally, there had been 5 ISEG directors in the previous five years and, frequently, as was the case during the evaluation period, this position was filled in an "acting" capacity.
2.4.6 Inadequate Information Systems
Information systems, including Management Information Systems, did not adequately support and in some cases were an impediment to plant operations, work control and other management functions. Although this had been previously identified by the licensee, this situation detracted from station performance over several years.
Information systems suffered from a lack of management support and the failure of management to adequately address the needs of end users.
Inadequacies in information systems resulted in misleading and inaccurate performance indicators and an absence of information necessary for plant operations, work process control, productivity measurements and budgetary justifications.
The computerized information system consisted of several non-integrated hardware configurations, including seven local area networks. There were also several uncontrolled computer programs utilized in the control room for various work control processes. There was no interactive (sharing of data) interface between the different computers which meant that similar data was duplicated on different computers. This method of managing data was inefficient and increased the probability for error due to multiple entry at different time intervals. The team found that the data in several areas was unreliable. Additionally, STP was experiencing significant delays in accessing data from its main computer system due to hardware and processing limitations.
The team identified and confirmed the following weaknesses in information systems:
(1) Equipment history records were incomplete and approximately eight months behind in being updated. This resulted in the licensee's tendency to not rely on these records as observed by the team.
(2) The acquisition of parts information was cumbersome, slowing down maintenance work package preparation.
(3) The information system used for outage planning was not capable of performing assessments of critical path items.
(4) Computer assistance to aid the system engineer in documenting and trending system performance and condition was not generally available. The licensee had purchased approximately 700 personal computers in 1992; however, most of these remained in the warehouse at the time of the evaluation.
(5) The PRA database was not updated to reflect actual plant failure data.
(6) Information used to derive the plant performance indicators was inaccurate and misleading.
(7) Information to support management in budget justification was missing or inaccurate.
(8) Staff productivity measurements were nonexistent or misleading.
(9) Issues identified by the team regarding the control and use of computerized information systems in the control room were additional examples of inadequate information systems and their impact.
The licensee's performance indicators (PIs) were, in some cases, misleading and inaccurate. Many PIs were in error and were not indicative of the major performance problems at STP.
For example, PIs associated with overtime indicated that the overtime rate was less than 5 percent for salaried and less than 10 percent for non-salaried. However, the team found that most salaried employees worked at least 25 percent overtime and that many engineering personnel worked in excess of 70 percent overtime with a few in excess of 100 percent overtime. Additionally, most maintenance craft and supervisory employees worked approximately 50 percent overtime. A second example was that the performance indicator related to stocked material availability was inaccurate due to the practice of maintenance, scheduling, and planning personnel not starting work until it was known that parts were available.
The licensee was in the process of purchasing a new computer program directed at improving information systems. However, management's errors in establishing the current system were being repeated in the information system improvement program in that input and feedback from the end users was not being adequately incorporated. Management's lack of support for information systems improvement was further evidenced by the failure to replace, in a timely manner, the manager responsible for this improvement program following his promotion to another on-site organization. At the close of this evaluation, licensee management had begun to involve end users in the final development of the long range information systems improvement plan and had hired a replacement program manager.
3.0 ROOT CAUSES
The underlying root causes for declining performance at South Texas Project were attributed to the following:
3.1 Failure of Management to Provide Adequate Support
The STP units entered the operational phase with large backlogs of unresolved equipment and design issues. Not long after these units were licensed, management initiated a continuing effort to reduce costs at the facility, which included reductions in staffing. The licensee also failed to allocate resources commensurate with management expectations of performance. Resources allocated by management were insufficient to resolve the existing backlog of outstanding work and precluded initiatives needed for long term improvement.
The lack of commercial operating experience outside of STP, at senior management levels, contributed to a lack of appreciation for the resources needed to achieve and maintain performance improvement.
3.2 Ineffective Management Direction and Oversight
Senior management failed to provide the staff clear direction in several key areas, including performance standards and station priorities. Management sent conflicting messages regarding priorities between budget, schedule and safety performance. Senior management practices did not foster frank, open feedback from lower levels of management and staff. Middle management often failed to get feedback on problems and provide consistent oversight and direction, in part due to limited interactions with people in the plant and poor management information systems. These factors significantly decreased management credibility and effectiveness. Horizontal communications and interface problems, such as weak coordination and accountability, added to management's difficulty in providing direction and getting work done.
3.3 Failure To Effectively Utilize Self-Assessment and Quality Oversight Functions
Management had not responded effectively to findings, concerns and recommendations of its principal self-assessment and quality oversight functions, including the Nuclear Safety Review Board (NSRB), and Quality Assurance (QA). These and other STP oversight functions had clearly identified significant programmatic weaknesses associated with operations, maintenance, engineering, and the corrective action process. In addition, management had not fully supported the Independent Safety Engineering Group's (ISEG) review for lessons learned.
Since 1991, the NSRB had advised STP management of specific problems which could affect safety and needed management attention, but little, if any, improvement was noted in many of these areas. Similar information was available from QA audits and reviews.
3.4 Inadequate Root Cause and Ineffective Corrective Action Processes
The licensee's ineffective corrective action processes were major obstacles to improving plant equipment and human performance. Ineffective problem identification, shallow root cause analyses, inadequate safety evaluations, the lack of aggressive problem resolution, poor information systems, and budgetary constraints resulted in short term rather than long term solutions. This resulted in numerous operator workarounds, repetitive equipment failures and personnel errors, degraded material conditions, and large backlogs of deferred corrective maintenance and modifications. This exacerbated management's reactive mode by requiring frequent responses to previously identified problems that were not effectively resolved.
4.0 EXIT MEETING
On June 3, 1993, the Director, AEOD, the Associate [...], NRR, [...] HL&P Chairman of the Board and Chief Executive [...] Operating Officer, and senior managers and staff from STP to review the results of the evaluation. This DET exit meeting was open for public observation. [...] of Texas, the Mayor of Bay City, Texas, and members [...]. Briefing notes, summarizing the team findings and conclusions, are attached as Appendix A.
APPENDIX A SOUTH TEXAS PROJECT DIAGNOSTIC EVALUATION TEAM FINDINGS
SOUTH TEXAS PROJECT ELECTRIC GENERATING STATION
June 3, 1993
SELECTION OF SOUTH TEXAS BASED ON
• DECLINE IN PERFORMANCE IN THE LAST TWO SALP REPORTS
• REPETITIVE HARDWARE PROBLEMS
• SIGNIFICANT NUMBERS OF PERSONNEL ERRORS
• NUMBER OF RECENT MANAGEMENT CHANGES
• ORGANIZATIONAL PERFORMANCE PROBLEMS NOT WELL UNDERSTOOD
DET GOALS AND OBJECTIVES
• PROVIDE INFORMATION TO SUPPLEMENT OTHER ASSESSMENT DATA AVAILABLE TO NRC SENIOR MANAGEMENT
• EVALUATE LICENSEE MANAGEMENT INVOLVEMENT AND EFFECTIVENESS WITH RESPECT TO SAFE PLANT OPERATION
• EVALUATE THE EFFECTIVENESS OF THE LICENSEE'S IMPROVEMENT PROGRAMS AND PLANS
• DETERMINE THE ROOT CAUSES OF SAFETY-RELATED EQUIPMENT AND PERFORMANCE PROBLEMS
DET METHODOLOGY
• 15-MEMBER TEAM: 3-OPS, 4-M&T, 4-ENG, 4-M&O
• 5-WEEK EVALUATION: 3 WEEKS ON-SITE, 2 WEEKS IN-OFFICE
• OVER 140 INTERVIEWS CONDUCTED FROM COB/CEO TO RPO
• 3 DAYS OF NEAR ROUND-THE-CLOCK CR OBSERVATION
• IN-DEPTH REVIEW OF 4 SYSTEMS
OPERATIONS WEAKNESSES
• MARGINAL OPS STAFFING LEVEL CONSIDERING WORKLOAD
• POOR SITE SUPPORT TO OPERATIONS
• CONFLICTING MANAGEMENT EXPECTATIONS AND POLICIES
• INCONSISTENT OPERATOR PERFORMANCE
• INEFFECTIVE PROBLEM IDENTIFICATION AND RESOLUTION
OPERATIONS POSITIVE OBSERVATIONS
• DEDICATION
• CONTROL BOARD AWARENESS
• SHIFT TURNOVERS
• RADIOLOGICAL HOUSEKEEPING
MAINTENANCE AND TESTING WEAKNESSES
• INEFFECTIVE CORRECTIVE MAINTENANCE
• PREVENTIVE MAINTENANCE PROGRAM LESS THAN FULLY EFFECTIVE
• MAINTENANCE TRAINING DEFICIENCIES
• DEFICIENCIES IN THE REPLACEMENT PARTS PROGRAM
• INSUFFICIENT SUPPORT TO MAINTENANCE
MAINTENANCE AND TESTING WEAKNESSES (Continued)
• INEFFICIENT WORK CONTROL PROCESS
• POST MAINTENANCE TESTING NOT ALWAYS EFFECTIVE
• PERIODIC TESTING NOT ALWAYS EFFECTIVE
MAINTENANCE AND TESTING POSITIVE OBSERVATIONS
• QUALITY OF MAINTENANCE FACILITIES
• TECHNICAL SUPPORT ENGINEER POSITION
• GENERAL MAINTENANCE SUPERVISOR POSITION
ENGINEERING SUPPORT WEAKNESSES
• WEAK SUPPORT IN RESOLVING PLANT PROBLEMS
• SYSTEM ENGINEERING PROGRAM NOT EFFECTIVELY IMPLEMENTED
• ENGINEERING BACKLOGS WERE LARGE, POORLY TRACKED, AND NOT WELL MANAGED
• USE OF OPERATIONAL EXPERIENCES WAS INADEQUATE
• INSUFFICIENT SUPPORT TO ENGINEERING
ENGINEERING SUPPORT WEAKNESSES (Continued)
• CONFIGURATION CONTROL WEAKNESSES
• ESSENTIAL CHILLED WATER SYSTEM DESIGN, MAINTENANCE, AND TESTING ISSUES CHALLENGE OPERABILITY
• UNTIMELY RESOLUTION OF FIRE PROTECTION ISSUES
ENGINEERING SUPPORT POSITIVE OBSERVATIONS
• TECHNICAL SUPPORT ENGINEERS
• DESIGN BASIS DOCUMENTATION PROGRAM
MANAGEMENT AND ORGANIZATION WEAKNESSES
• INEFFECTIVE MANAGEMENT DIRECTION AND OVERSIGHT
• POOR SUPPORT AND RESOURCE UTILIZATION
• COMMUNICATIONS AND TEAMWORK WERE WEAK
• INEFFECTIVE CORRECTIVE ACTION PROCESS
• INEFFECTIVE UTILIZATION OF SELF ASSESSMENT AND QUALITY OVERSIGHT FUNCTIONS
• INADEQUATE INFORMATION SYSTEMS
MANAGEMENT AND ORGANIZATION POSITIVE OBSERVATIONS
• RECENT MASTER OPERATING PLAN IMPROVEMENTS
• RECENT MANAGEMENT AND ORGANIZATIONAL CHANGES, COMPLETED AND UNDERWAY
ROOT CAUSES
• FAILURE OF MANAGEMENT TO PROVIDE ADEQUATE SUPPORT
• INEFFECTIVE MANAGEMENT DIRECTION AND OVERSIGHT
• FAILURE TO EFFECTIVELY UTILIZE SELF-ASSESSMENT AND QUALITY OVERSIGHT FUNCTIONS
• INEFFECTIVE ROOT CAUSE/CORRECTIVE ACTION PROCESS