ML070170386

APS Response to NRC Inspection Report 05000528-06-012; 05000529-06-012; 05000530-06-012
Site: Palo Verde (Arizona Public Service)
Issue date: 01/09/2007
From: Mauldin D, Arizona Public Service Co
To: Document Control Desk, NRC Region 1
References: 102-05626-CDM/SAB/JAP/CJS, IR-06-012

A subsidiary of Pinnacle West Capital Corporation
Palo Verde Nuclear Generating Station
David Mauldin
Vice President, Nuclear Engineering and Support
Tel: 623-393-5553
Fax: 623-393-6077
Mail Station 7605
PO Box 52034
Phoenix, Arizona 85072-2034

102-05626-CDM/SAB/JAP/CJS
January 09, 2007

U.S. Nuclear Regulatory Commission
ATTN: Document Control Desk
Washington, DC 20555

Dear Sir:

Subject:

Palo Verde Nuclear Generating Station (PVNGS)

Units 1, 2 and 3
Docket Nos. STN 50-528, 50-529, and 50-530
APS Response to NRC Inspection Report 05000528/2006012; 05000529/2006012; 05000530/2006012

In NRC Special Inspection Report 2006012, dated December 6, 2006, the NRC documented its examination of activities associated with the PVNGS Unit 3, Train A, emergency diesel generator (EDG) failures that occurred on July 25 and September 22, 2006. On both occasions, the EDG failed to produce an output voltage during testing.

The report discusses two findings: (1) a lack of adequate instructions for corrective maintenance of the K-1 relay, and (2) the failure to identify and correct the cause of erratic K-1 relay operation before installation of the spare relay on July 26, 2006. These two findings resulted in the Unit 3, Train A, EDG being inoperable from September 4 until September 22, 2006. APS has reviewed the NRC Inspection Report and has no substantive disagreement with the facts as documented in the report.

In accordance with Inspection Manual Chapter 0609, the NRC is currently evaluating the safety significance of the findings. At a January 16, 2007 Regulatory Conference in Arlington, Texas, APS will provide the NRC with its perspective on the facts and analytical assumptions relevant to determining the safety significance of the findings.

The purpose of this letter is to provide the results of APS' evaluation of the EDG K-1 relay failures in advance of the Regulatory Conference, to facilitate a focused discussion at the conference on the safety significance of the EDG K-1 relay failures. APS is providing its position on the findings, as well as the causes of the failures and the corrective actions that have been or will be taken.

A member of the STARS (Strategic Teaming and Resource Sharing) Alliance: Callaway, Comanche Peak, Diablo Canyon, Palo Verde, South Texas Project, Wolf Creek

The NRC letter dated December 22, 2006, which communicated the results of the Regulatory Conference on the Spray Pond operability issue, reiterated the NRC's concern about the continuing occurrence of problem identification, root cause analysis, and technical rigor issues. APS recognizes that the K-1 relay failures are another such example and is committed to improving its performance in these areas.

APS realizes the troubleshooting and problem solving process lacked the technical rigor necessary to ensure deficiencies were properly identified and resolved the first time. In this case, the failure to consider all possible causes of the July K-1 relay failure resulted in a subsequent failure in September. Immediate actions have been taken to assure the EDGs remain within their design basis. Additional actions to address the programmatic weaknesses have been identified. We continue to implement, reinforce, monitor and adjust our performance improvement plan to provide greater confidence that similar events will not recur.

The Enclosure to this letter contains a summary of APS' preliminary root cause of failure evaluation for the K-1 relay and also includes a response to the two apparent violations.

Finally, APS intends to supplement Licensee Event Report (LER) 2006-006 to reflect the results of the K-1 relay failure investigation.

The actions described in the Enclosure represent corrective actions and are not regulatory commitments. There are no regulatory commitments in this letter. If you have any questions, please contact James A. Proctor at (623) 393-5730.

Sincerely,

JML/SAB/JAP/CJS/gt

Enclosure: Summary of APS Investigation into Unit 3, Train A, Emergency Diesel Generator (EDG) K-1 Relay Failures and Corrective Actions

cc:
B. S. Mallett, NRC Region IV Regional Administrator
M. B. Fields, NRC NRR Project Manager
M. T. Markley, NRC NRR Project Manager
G. G. Warnick, NRC Senior Resident Inspector for PVNGS

ENCLOSURE

Summary of APS Investigation into Unit 3, Train A, Emergency Diesel Generator (EDG) K-1 Relay Failures and Corrective Actions

1 - Introduction

The following describes the sequence of events and establishes the context for the K-1 relay failure on September 22, 2006. This discussion is not intended to justify the recurrence of the failure. It is intended to establish that APS personnel acted in good faith, though, in retrospect, with less than adequate rigor, to identify the apparent cause of the July 2006 failure. The EDG was tested repeatedly to confirm that the apparent cause had been addressed before it was returned to service in July 2006.

The root cause of the September 22, 2006 failure of the replacement K-1 relay was determined to be inadequate auxiliary dc contact 'compression.' The symptoms of the July failure, however, could be explained by contact oxidation. The contact 'compression' issue did not reveal itself during repeated testing before the relay was returned to service in July 2006. This is explained in more detail in the following sections.

At the time of the original K-1 relay failure on July 25, 2006, there had not been a failure of an EDG to produce any output voltage following a start in the emergency mode in over 3,000 starts since 1990, when a database was initiated to track EDG start history.

A written troubleshooting plan was developed on July 25, 2006 by personnel with over 40 years of combined EDG experience. When the K-1 relay was identified as the reason the EDG did not produce output voltage, the 'original' K-1 relay was removed and segregated for failure analysis. Because of the relay's hybrid design, the maintenance practice had been to replace the K-1 relay as a unit rather than to perform maintenance on it.

APS had already planned to replace all of the EDG automatic voltage regulators, including the K-1 relays, with a different design during the next refueling outage for each unit, beginning with Unit 1 in the spring of 2007. The replacements were planned for a variety of reasons, including the inability to obtain replacement parts: the component parts of the K-1 relay are no longer manufactured, and only limited spares were available.

When the first replacement K-1 relay (one of two remaining spares) was obtained from the warehouse, it exhibited symptoms of auxiliary dc contact oxidation. This was not unexpected, given that the relay had been stored for about 20 years. Initial attempts to remove this oxidation by non-intrusive methods were not entirely successful. The last spare in the warehouse was judged unsuitable for use because of a warped relay cover and apparent auxiliary dc contact oxidation. Only after these efforts did APS personnel conclude that they had no recourse but to disassemble the auxiliary dc contact assembly on the first replacement relay to perform more extensive contact cleaning.

Before the dc auxiliary contact assembly on the replacement relay was disassembled, the craft disassembled a dc auxiliary contact assembly from a training relay to ensure adequate knowledge of the device. Maintenance and Engineering personnel practiced repeated disassembly and reassembly of the auxiliary dc contact assembly for approximately 2 hours to ensure proficiency. After disassembly, the only work performed was contact cleaning. There was no attempt to change the configuration of the relay.

Following the corrective maintenance on the replacement relay, approximately 10 manual actuation tests and 3 electrical functional tests verified that the replacement relay was performing properly. A successful maintenance run of the Unit 3, Train A, EDG was performed and the EDG passed a Technical Specification surveillance test, before being declared operable. During these retests, there was no indication of erratic dc auxiliary contact assembly operation. The Unit 3, Train A, EDG was subsequently successfully tested on August 7 and 24, 2006 and finally on September 4, 2006. Our root cause efforts have led us to the conclusion that the relay failed to properly reset after the September 4, 2006 test. This failure led to the September 22, 2006 failure of the Unit 3, Train A, EDG.

In retrospect, APS acknowledges that, as described in the inspection report, "the licensee's problem analysis efforts were narrowly focused, which led them to conclude that the cause of the erratic dc auxiliary switch operation was oxidized contacts" (page 7 of the IR enclosure). The failure mechanism of the September 22, 2006 event, inadequate contact 'compression,' did not reveal itself during the repeated maintenance and post-installation tests that followed cleaning of the contacts in July 2006. Subsequent testing demonstrated that inadequate contact 'compression' was the cause of the September 22, 2006 failure.

The latent failure mechanism did ultimately reveal itself through the routine testing protocols implemented during normal plant operations. Subsequent root cause of failure testing of the replacement K-1 relay determined that the mean number of operations before failure was 58 cycles, with a minimum of 0 and a maximum of 323 cycles.

2 - Summary of Emergency Diesel Generator K-1 Relay Root Cause of Failure (RCF) Testing and Evaluation

The testing developed to determine the root cause of the September 22, 2006 failure was performed in the following five phases:

1. Verified the integrity of circuits external to the K-1 relay to conclusively determine that the K-1 relay is the appropriate focus of this root cause investigation.
2. Performed a physical and dimensional comparison of the K-1 relay with other training, failed, and spare K-1 relays.
3. Determined whether temperature can affect K-1 relay dc auxiliary contact assembly performance by causing dimensional tolerances to grow and open closed contacts.
4. Electrically cycled the K-1 relay to determine whether the September 22, 2006 event could be repeated.
5. Performed an internal inspection of the dc auxiliary contact assembly involved in the September 22, 2006 event.

The root cause of failure testing of the K-1 relay produced the following substantive conclusions:

  • The K-1 relay failure to reset was repeatable.

  • The September 22, 2006 troubleshooting event was repeatable. In that event, a technician made contact with a terminal (wire number 69) on the dc auxiliary contact assembly, which caused the K-1 relay to reset. The K-1 testing revealed that when the K-1 relay failed to reset, manipulating wire number 69 caused the dc auxiliary normally open contact to open and close. An internal inspection of the dc auxiliary contact assembly found that the lower terminal on this device was loose because of an anomaly in the molded enclosure. Straightening the K-1 relay metal actuator arm rectified the loose terminal by applying positive pressure to the stationary and movable contacts of the dc auxiliary contact assembly.

  • Dimensional comparison showed that the root cause of the September 22, 2006 failure was the accumulation of tolerances associated with the various components that make up the K-1 relay (inadequate auxiliary dc contact 'compression').

  • With the K-1 relay in the latched state and the dc auxiliary contact assembly normally open contact marginally closed, an increase in temperature did not provide the external stimulus needed to change the state of this contact.

  • During the cycle testing, no failures were noted when the dc auxiliary contact assembly's normally open contact closed and then opened. During a failure, the normally open contact simply did not make up when the K-1 relay was latched.

  • The decision made by engineering, after the September 22, 2006 failure, to 'field straighten' the metal actuator was correct. This compensates for any tolerance stack-up issues. This configuration was evaluated both dimensionally and by cycling a K-1 relay.
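The tolerance stack-up mechanism described above can be illustrated with a simple worst-case calculation. The following Python sketch is illustrative only and is not part of the APS root cause of failure evaluation. The contributor names and all dimensions and tolerances are hypothetical and do not represent K-1 relay data. The sketch shows how contributors that are each within tolerance can combine to give a negative minimum contact 'compression,' so that the normally open contact may not make up. It also shows how a fixed adjustment of the actuator arm, analogous to the 'field straightening' described above, shifts the whole band positive.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    """One dimension in the stack-up (HYPOTHETICAL values, in mm)."""
    name: str
    nominal: float    # nominal dimension
    tolerance: float  # symmetric +/- tolerance
    sign: int         # +1 adds to contact compression, -1 subtracts from it


def worst_case_compression(stack):
    """Worst-case (arithmetic) stack-up: all tolerances are taken at their
    limits together, first all unfavorable, then all favorable.
    Returns (minimum, maximum) contact compression."""
    nominal = sum(c.sign * c.nominal for c in stack)
    spread = sum(c.tolerance for c in stack)
    return nominal - spread, nominal + spread


# Hypothetical contributors; these are NOT K-1 relay drawing values.
stack = [
    Contributor("actuator arm travel", 1.20, 0.15, +1),
    Contributor("contact block mounting offset", 0.80, 0.10, -1),
    Contributor("molded terminal seat position", 0.25, 0.10, -1),
]

low, high = worst_case_compression(stack)
# A negative minimum means some relays built within tolerance may never close.
print(f"as built: {low:+.2f} to {high:+.2f} mm")

# A hypothetical field adjustment of the actuator arm, modeled as an added
# contributor with zero tolerance, shifts the whole band upward.
adjusted = stack + [Contributor("actuator arm adjustment", 0.30, 0.0, +1)]
low_adj, high_adj = worst_case_compression(adjusted)
print(f"adjusted: {low_adj:+.2f} to {high_adj:+.2f} mm")
```

With these hypothetical values, the as-built band runs from -0.20 mm to +0.50 mm and the adjusted band runs from +0.10 mm to +0.80 mm. The sketch uses a worst-case sum instead of a statistical (root-sum-square) combination. A worst-case sum bounds every possible build, which is the conservative choice when a single marginal unit can make an EDG inoperable.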

In summary, the root cause of failure analysis concluded that the Unit 3, Train A, EDG was inoperable from September 4 to September 22, 2006. It was inoperable because the K-1 relay dc auxiliary contact assembly contacts did not close properly after the EDG was shut down on September 4. If the auxiliary dc contacts reset properly, they were stable (i.e., would not change state due to physical contact). If they failed to reset properly, they were unstable (i.e., easily changed state with physical contact).

3 - Response to Apparent Violations

This section sets forth APS' position on the two apparent violations and summarizes corrective actions taken or planned that are directly related to the apparent violations.

Apparent Violation of 10 CFR 50, Appendix B, Criterion V, "Instructions, Procedures, and Drawings"

Restatement of Apparent Violation

10 CFR Part 50, Appendix B, Criterion V, "Instructions, Procedures, and Drawings," states, in part, that activities affecting quality shall be prescribed by documented instructions, procedures, or drawings of a type appropriate to the circumstances and shall be accomplished in accordance with these instructions, procedures, or drawings.

Contrary to this, the licensee failed to develop appropriate instructions or procedures for corrective maintenance activities on the Unit 3 Train A EDG K-1 relay. This failure resulted in the Unit 3 Train A EDG being inoperable between September 4 and 22, 2006. This item has been entered into the licensee's corrective action program as Condition Report/Disposition Request (CRDR) 2926830. Pending determination of safety significance, this finding is identified as apparent violation (AV) 05000530/2006012-01, "Failure to Establish Appropriate Instructions."

Apparent Violation of 10 CFR 50, Appendix B, Criterion XVI, "Corrective Action"

Restatement of Apparent Violation

10 CFR Part 50, Appendix B, Criterion XVI, "Corrective Action," states, in part, that measures shall be established to assure that conditions adverse to quality, such as failures, malfunctions, deficiencies, deviations, defective material and equipment, and nonconformances are promptly identified and corrected and for significant conditions adverse to quality, measures shall assure that the cause of the condition is determined and corrective action taken to preclude repetition.

Contrary to this, the licensee failed to identify and correct the cause of the erratic EDG K-1 relay operation prior to installation of the relay on July 26, 2006. This failure resulted in the Unit 3 Train A EDG being inoperable between September 4 and 22, 2006. This item has been entered into the licensee's corrective action program as CRDR 2926830. Pending determination of safety significance, this finding is identified as AV 05000530/2006012-02, "Failure to Identify and Correct a Condition Adverse to Quality."

Admission

APS admits these apparent violations.

Cause

Personnel attempted to address the most likely cause of the failure first, and then to determine whether correcting that cause resolved the problem. This approach is intended to keep actual causes from being masked. Masking can occur when multiple actions are taken at once and it is then unclear which action resolved the apparent problem. This narrow approach, however, did not consider all potential failure possibilities and did not identify the latent flaw in the set-up of the K-1 replacement relay.

When contact cleaning appeared to correct the performance problem, as evidenced by repeated manual, electrical and in-service tests, it was thought that the cause had been addressed.

APS acknowledges that the problem analysis efforts were narrowly focused, which led to an incorrect conclusion that the cause of the erratic relay operation was solely oxidized contacts. During the investigation following the September 22, 2006 failure, it was determined that an opportunity to identify the cause of failure was missed early in the July 25 investigation. Communication between organizations (i.e., technicians and engineers) was lacking during the troubleshooting and relay installation process. Better communication may have led to the realization that more than contact oxidation was involved.

APS recognized that this failure to correctly identify the cause of the July 25, 2006 Unit 3, Train A, EDG K-1 relay failure had potential organizational and programmatic elements and initiated a specific investigation (CRDR 2950124) to identify any such issues. The analysis methodology and preliminary results of this investigation are provided to demonstrate that APS recognizes that this specific equipment failure is an opportunity to reassess the adequacy of the problem analysis, troubleshooting and cause evaluation processes being implemented at Palo Verde.

Instead of simply extrapolating the organizational and programmatic (O&P) root causes from this single event, the investigation team took a broader look at all the barriers intended to minimize error in problem solving and root cause analysis at PVNGS in general. An independent consultant with failure mode analysis and O&P experience was used in the assessment, working in conjunction with PVNGS personnel.

The team identified the possible failure modes that result in inconsistent use of the problem solving methodology to solve equipment failures and evaluated their effects.

The team evaluated the effectiveness of barriers and determined their contribution to ineffective cause analysis. Finally, the team applied stream analysis to determine cause-effect relationships among the organizational and programmatic failure modes.

In summary, the team has preliminarily identified 4 organizational and programmatic 'root' causes and 6 'contributing' causes. The root causes were:

1. Inconsistent equipment root cause of failure program management, resulting in varying degrees of document quality.
2. No formal problem solving and troubleshooting process in place to establish evaluation consistency.
3. Inconsistent management reinforcement of the equipment root cause of failure application methodology resulting in inconsistent evaluation quality.
4. Inconsistent application of a problem solving methodology, as evidenced by varying degrees of rigor in root cause of failure determinations.

The contributing causes were:

1. Inconsistent consideration of all failure modes (lack of formal troubleshooting and problem solving process).
2. No continuing training on equipment failure analysis methodology.
3. Operating experience was not typically reviewed prior to opening components for analysis (lack of formal problem solving and troubleshooting process).
4. Program to program interface between corrective maintenance and the equipment root cause of failure programs has allowed the equipment root cause of failure process to be bypassed in some cases.
5. Inconsistent priority has been given to the equipment failure analyses, resulting in less than acceptable documentation.
6. Inappropriate accountability for meeting both timeliness and quality expectations on equipment failure determinations.

Corrective Actions Taken and Results Achieved

Equipment Actions

Corrective actions involved mechanical adjustments to the relay actuating arm to provide adequate auxiliary contact compression. All other affected EDG K-1 relays were also inspected, cleaned, and mechanically adjusted as necessary. This work followed a detailed methodology that determined the proper amount of contact 'compression.' These actions corrected the direct cause of the September 22, 2006 Unit 3, Train A, EDG failure.

Organizational and Programmatic Actions

Completed and ongoing corrective actions for the root and contributing causes of the organizational and programmatic issues include elements of the Performance Improvement Plan. They also include review of equipment root cause of failure analysis (ERCFA) reports by the Engineering Product Review Board (EPRB).

The Operations decision making process (ODP-16) and the Engineering human performance tools (EDG-01 and 02) are considered sufficient interim action until the more extensive planned actions are fully implemented.

Corrective Actions to Be Taken

Equipment Actions

APS plans to replace all of the EDG automatic voltage regulators, including the K-1 relays, with a different design during the next refueling outage for each unit.

This is the longer-term equipment corrective action to prevent recurrence.

Related longer-term corrective actions resulting from the root cause of failure analysis that address precluding event recurrence include:

  • Perform a review of the proposed modification that will change the control and power components of the diesel generator excitation systems. This review will include:

o Method of de-excitation (e.g., type of relay to be used for field shorting, other types of field shorting devices used in the industry, reset circuit logic, circuit redundancy, etc.)

o Evaluate the need for an annunciator and alarm circuit on the field shorting relay.

o Completeness of vendor documentation (e.g., complete set of control documentation from the prime and sub-component suppliers, etc.)

o PMs for the various newly installed components (e.g., PMs on the K-1 relay, including verification of sufficient contact 'compression').

(Due date: May 1, 2007)

  • Incorporate the findings from the failure analysis into the current K-1 documentation (e.g., technical manuals, operating procedures, etc.). (Due date: August 31, 2007)

  • Evaluate the need for the field shorting components of the diesel generator excitation system. (Due date: August 31, 2007)

  • Evaluate installing a jumper across the dc auxiliary contact assembly normally open contact. The jumper would eliminate the potential for this contact to fail to close and make the EDG inoperable. (Due date: August 31, 2007)

  • Assess whether there are other similar safety-related circuits in the units where alarms need to be installed to monitor the operability of the circuit. (Due date: August 31, 2007)

  • The final root cause analysis will assess the extent of cause and the extent of condition.

Organizational and Programmatic Actions

Corrective actions to be taken for each root and contributing cause of the organizational and programmatic issues are as follows:

The Plant Health Committee will review the organizational and programmatic root cause evaluation. (Due date: February 28, 2007)

Root Cause 1 - Inconsistent equipment root cause of failure analysis (ERCFA) program management, resulting in varying degrees of ERCFA document quality.

Corrective Actions

Establish a single ERCFA program owner. (Due date: February 1, 2007)

Establish performance indicators for the ERCFA program. (Due date: April 1, 2007)

Perform an ERCFA program self assessment. (Due date: March 1, 2007)

Include the ERCFA program in procedure 73DP-OAP05, Engineering Programs Management and Health Reporting. (Due date: March 31, 2007)

Establish an ERCFA program improvement plan. (Due date: April 1, 2007)

Root Cause 2 - Lack of a formal problem solving and troubleshooting process to establish ERCFA consistency.

Corrective Actions

Develop a troubleshooting and problem solving process to be used by Operations, Maintenance and Engineering. As part of this process, include guidance and direction on when and how to develop specific work or troubleshooting instructions in the absence of component design instructions or information. (Due date: February 1, 2007)

Revise the ERCFA program to include the new problem solving process. (Due date: March 1, 2007)

Provide training on the new process to selected Operations, Maintenance and Engineering personnel. (Due date: May 1, 2007)

Root Cause 3 - Inconsistent management reinforcement of ERCFA application methodology, which resulted in inconsistent quality of root cause of failure determinations.

Corrective Actions

Establish a quality checklist for EPRB and Corrective Action Review Board review of ERCFA evaluations. (Due date: February 10, 2007)

Ensure samples of ERCFA evaluations are included in the next EPRB meeting. (Due date: February 10, 2007)

The existing Leadership Performance Improvement Plan has established common goals and the expectation that the highest standards of performance and personal accountability be pursued.

Root Cause 4 - Inconsistent application of a systematic problem solving methodology, resulting in varying degrees of rigor in root cause of failure determinations and quality of documented evaluations.

Corrective Actions

Provide training to selected Operations, Maintenance and Engineering personnel on the new troubleshooting and problem solving process. (Due date: May 1, 2007)

Contributing Cause 1 - Inconsistent consideration of all failure modes.

Corrective Actions

Provide training to ERCFA qualified engineers that will include the need to consider all failure modes as part of initial troubleshooting and root cause activities. (Due date: May 1, 2007)

Contributing Cause 2 - No continuing training on equipment failure analysis methodology.

Corrective Actions

Provide training to ERCFA-qualified engineers on changes to the ERCFA program. (Due date: May 1, 2007)

Establish periodic ERCFA industry events training. (Due date: March 31, 2007)

Contributing Cause 3 - Operating experience was not typically reviewed prior to opening components for analysis.

Corrective Actions

Provide training to ERCFA qualified engineers that will include reviewing any applicable Operating Experience as part of the initial troubleshooting and root cause activities. (Due date: May 1, 2007)

Contributing Cause 4 - Program to Program interface between Corrective Maintenance and ERCFA programs allows the ERCFA process to be bypassed in some cases.

Corrective Actions

Perform a self assessment of the ERCFA program, including the corrective maintenance interface, to identify appropriate corrective actions. (Due date: March 1, 2007)

Contributing Cause 5 - Inconsistent priority has been given to the equipment failure analyses, resulting in less than acceptable documentation.

Corrective Actions

Provide training to ERCFA qualified engineers that will include a discussion of establishing appropriate priority to ensure a quality analysis. (Due date: May 1, 2007)

The existing Leadership Performance Improvement Plan has established common goals and the expectation that the highest standards of performance and personal accountability be pursued.

Contributing Cause 6 - Inappropriate accountability for meeting both timeliness and quality expectations on ERCFA determinations.

Corrective Actions

Establish a single ERCFA program owner. (Due date: February 1, 2007)

Provide training to ERCFA qualified engineers that will include a discussion of accountability and expectations for both quality and timeliness. (Due date: May 1, 2007)

The existing Leadership Performance Improvement Plan has established common goals and the expectation that the highest standards of performance and personal accountability be pursued.

An effectiveness review will be established to ensure the equipment failure analysis program improvements are achieving the desired results. Performance indicators established as part of the above corrective actions will be used to monitor performance.

4 - Conclusion

APS realizes the troubleshooting and problem solving process lacked the technical rigor necessary to ensure deficiencies were properly identified and resolved the first time. In this case, the failure to consider all possible causes of the July K-1 relay failure resulted in a subsequent failure in September. Immediate actions have been taken to assure the EDGs remain within their design basis. Additional actions to address the programmatic weaknesses have been identified to provide greater assurance that similar events will not recur.
