ML070180148

From kanterella
| page count = 12
| page count = 12
}}
}}
See also: [[see also::IR 05000528/2006012]]


=Text=
{{#Wiki_filter:CORRECTED  
{{#Wiki_filter:CORRECTED COPY
A subsidiary of Pinnacle West Capital Corporation
David Mauldin
Vice President, Nuclear Engineering and Support
Palo Verde Nuclear Generating Station
Mail Station 7605, PO Box 52034, Phoenix, Arizona 85072-2034
Tel: 623-393-5553
Fax: 623-393-6077

102-05626-CDM/SAB/JAP/CJS
January 09, 2007

U.S. Nuclear Regulatory Commission
ATTN: Document Control Desk
Washington, DC 20555
==Dear Sir:==
==Subject:==
Palo Verde Nuclear Generating Station (PVNGS) Units 1, 2 and 3
Docket Nos. STN 50-528, 50-529, and 50-530
APS Response to NRC Inspection Report 05000528/2006012; 05000529/2006012; 05000530/2006012

In NRC Special Inspection Report 2006012, dated December 6, 2006, the NRC documented its examination of activities associated with the PVNGS Unit 3, Train A, emergency diesel generator (EDG) failures that occurred on July 25 and September 22, 2006. On both occasions the EDG failed to produce an output voltage during testing. The report discusses two findings: (1) a lack of adequate instructions for corrective maintenance of the K-1 relay and (2) the failure to identify and correct the cause of erratic K-1 relay operation prior to installation of the spare relay on July 26, 2006. These two findings resulted in the Unit 3, Train A, EDG being inoperable from September 4 until September 22, 2006. APS has reviewed the NRC Inspection Report and has no substantive disagreement with the facts as documented in the report. In accordance with Inspection Manual Chapter 0609, the NRC is currently evaluating the safety significance of the findings.
At a January 16, 2007 Regulatory Conference in Arlington, Texas, APS will provide the NRC its perspective on the facts and analytical assumptions relevant to determining the safety significance of the findings. The purpose of this letter is to provide the results of APS' evaluation of the EDG K-1 relay failures in advance of the Regulatory Conference, to facilitate a focused discussion at the conference on the safety significance of those failures.
APS is providing its position on the findings as well as the causes and the corrective actions that have been or will be taken.

A member of the STARS (Strategic Teaming and Resource Sharing) Alliance: Callaway, Comanche Peak, Diablo Canyon, Palo Verde, South Texas Project, Wolf Creek

The NRC letter dated December 22, 2006, which communicated the results of the Regulatory Conference on the Spray Pond operability issue, reiterated the NRC's concern about the continuing occurrence of problem identification, root cause analysis, and technical rigor issues. APS recognizes that this is another such example and is committed to improving its performance in these areas. APS realizes that the troubleshooting and problem solving process lacked the technical rigor necessary to ensure deficiencies were properly identified and resolved the first time. In this case, the failure to consider all possible causes of the July K-1 relay failure resulted in a subsequent failure in September.
Immediate actions have been taken to assure the EDGs remain within their design basis. Additional actions to address the programmatic weaknesses have been identified.
We continue to implement, reinforce, monitor, and adjust our performance improvement plan to provide greater confidence that similar events will not recur. The Enclosure to this letter contains a summary of APS' preliminary root cause of failure evaluation for the K-1 relay and also includes a response to the two apparent violations.
Finally, APS intends to supplement Licensee Event Report (LER) 2006-006 to reflect the results of the K-1 relay failure investigation.
The actions described in the Enclosure represent corrective actions and are not regulatory commitments.
There are no regulatory commitments in this letter. If you have any questions, please contact James A. Proctor at (623) 393-5730.

Sincerely,

JML/SAB/JAP/CJS/gt
==Enclosure:==
Summary of APS Investigation into Unit 3, Train A, Emergency Diesel Generator (EDG) K-1 Relay Failures and Corrective Actions

cc: B. S. Mallett, NRC Region IV Regional Administrator
M. B. Fields, NRC NRR Project Manager
M. T. Markley, NRC NRR Project Manager
G. G. Warnick, NRC Senior Resident Inspector for PVNGS

ENCLOSURE
Summary of APS Investigation into Unit 3, Train A, Emergency Diesel Generator (EDG) K-1 Relay Failures and Corrective Actions

1 - Introduction

The following describes the sequence of events and establishes the context for the K-1 relay failure on September 22, 2006. This discussion is not intended to justify the recurrence of the failure, but to establish that APS personnel acted in good faith, though in retrospect with less than adequate rigor, to identify the apparent cause of failure in July 2006. The EDG was tested repeatedly to confirm that the apparent cause had been addressed before it was returned to service in July 2006. The root cause analysis determined that the September 22, 2006 failure of the replacement K-1 relay was caused by inadequate auxiliary dc contact 'compression.'
The symptoms of failure in July, however, could be explained by contact oxidation, and the contact 'compression' issue did not reveal itself during repeated testing prior to returning the relay to service in July 2006. This is explained in more detail in the following sections.

At the time of the original K-1 relay failure on July 25, 2006, there had not been a failure of an EDG to produce any output voltage following a start in the emergency mode in over 3,000 starts since 1990, when a database was initiated to track EDG start history. A written troubleshooting plan was developed on July 25, 2006 by personnel with over 40 years of combined EDG experience.
When the K-1 relay was identified as the cause of the EDG's failure to produce output voltage, the 'original' K-1 relay was removed and segregated for failure analysis.
The maintenance practice had been to replace the K-1 relay unit, due to its hybrid design, rather than to perform maintenance on it. APS had already planned to replace all of the EDG automatic voltage regulators, including the K-1 relays, with a different design during the next refueling outage for each unit, beginning with Unit 1 in the spring of 2007. The planned replacements were for a variety of reasons, including the inability to obtain replacement parts: the component parts of the K-1 relay are no longer manufactured, and only limited spares were available.
When the first replacement K-1 relay was obtained from the warehouse (one of two remaining spares), it exhibited symptoms of auxiliary dc contact oxidation, which was not unexpected given that the relay had been stored for about 20 years. Initial attempts to remove this oxidation by non-intrusive methods were not entirely successful, and the last remaining spare in the warehouse was judged unsuitable for use because of a warped relay cover and apparent auxiliary dc contact oxidation.
Only after these efforts, having no other recourse, did APS personnel disassemble the auxiliary dc contact assembly on the first replacement relay to perform more extensive contact cleaning. Before the replacement relay's dc auxiliary contact assembly was disassembled, a dc auxiliary contact assembly from a training relay was disassembled with the craft to ensure adequate knowledge of the device. Maintenance and Engineering personnel practiced repeated disassembly and reassembly of the auxiliary dc contact assembly for approximately 2 hours to ensure proficiency.
Following disassembly, there was no attempt to change the configuration of the relay, only an effort to clean the contacts.
Following the corrective maintenance on the replacement relay, approximately 10 manual actuation tests and 3 electrical functional tests verified that the replacement relay was performing properly.
A successful maintenance run of the Unit 3, Train A, EDG was performed, and the EDG passed a Technical Specification surveillance test before being declared operable.
During these retests, there was no indication of erratic dc auxiliary contact assembly operation.
The Unit 3, Train A, EDG was subsequently tested successfully on August 7 and 24, 2006, and finally on September 4, 2006. Our root cause efforts have led us to conclude that the relay failed to properly reset after the September 4, 2006 test, and that this failure led to the September 22, 2006 failure of the Unit 3, Train A, EDG.

In retrospect, APS acknowledges, as described in the inspection report, that "the licensee's problem analysis efforts were narrowly focused, which led them to conclude that the cause of the erratic dc auxiliary switch operation was oxidized contacts." (Page 7 of IR enclosure)
The failure mechanism behind the September 22, 2006 event, inadequate contact 'compression,' did not reveal itself during repeated maintenance and post-installation tests following cleaning of the contacts in July 2006. Subsequent testing demonstrated that inadequate contact 'compression' was the cause of the September 22, 2006 failure. The latent failure mechanism did ultimately reveal itself through the routine testing protocols implemented as part of normal plant operations.
It should be noted that subsequent root cause of failure testing of the replacement K-1 relay determined that the mean number of operations before failure was 58 cycles, with a minimum of 0 and a maximum of 323 cycles.

2 - Summary of Emergency Diesel Generator K-1 Relay Root Cause of Failure (RCF) Testing and Evaluation

The testing developed to determine the root cause of failure of the September 22, 2006 event was performed in the following five phases:
1. Verified the integrity of circuits external to the K-1 relay to conclusively determine that the K-1 relay is the appropriate focus of this root cause investigation.
2. Performed a physical and dimensional comparison of the K-1 relay as well as other training, failed, or spare K-1 relays.
3. Determined whether temperature can affect K-1 relay dc auxiliary contact assembly performance such that dimensional tolerances grow and open closed contacts.
4. Electrically cycled the K-1 relay to determine whether the September 22, 2006 event could be repeated.
5. Performed an internal inspection of the September 22, 2006 dc auxiliary contact assembly.

The root cause of failure testing of the K-1 relay produced the following substantive conclusions:
* The K-1 relay failure to reset was repeatable.
* A troubleshooting event on September 22, 2006, in which a technician made contact with a terminal (wire number 69) on the dc auxiliary contact assembly and thereby caused the K-1 relay to reset, was repeatable.
The K-1 testing revealed that when the K-1 relay failed to reset, it was then possible to open and close the dc auxiliary normally open contact by manipulating wire number 69. An internal inspection of the dc auxiliary contact assembly found that the lower terminal on this device was loose due to an anomaly in the molded enclosure.
Straightening the K-1 relay metal actuator arm rectified the loose terminal by applying positive pressure to the stationary and movable contacts of the dc auxiliary contact assembly.
* Dimensional comparison showed that the root cause of failure of the September 22, 2006 event was the accumulation of tolerances associated with the various components that make up the K-1 relay (inadequate auxiliary dc contact 'compression').
* With the K-1 relay in the latched state and the dc auxiliary contact assembly normally open contact marginally closed, an increase in temperature did not provide the external stimulus needed to cause this contact to change state.
* During the cycle testing, no failures were noted when the K-1 relay's dc auxiliary contact assembly normally open contact closed and then subsequently opened. It was observed that during a failure, the normally open contact simply did not make up when the K-1 relay was latched.
* The decision made by engineering after the September 22, 2006 failure to 'field straighten' the metal actuator was correct. This compensates for any tolerance stack-up issues. This configuration was evaluated both dimensionally and by cycling a K-1 relay.

In summary, the root cause of failure analysis concluded that the Unit 3, Train A, EDG was inoperable from September 4 to September 22, 2006 because the K-1 relay dc auxiliary contact assembly contacts did not close properly following the September 4 shutdown of the EDG. The auxiliary dc contacts were stable (i.e., would not change state due to physical contact) if they properly reset, but were unstable (i.e., easily changed state with physical contact) if they failed to properly reset.

3 - Response to Apparent Violations

This section sets forth APS' position on the two apparent violations and summarizes corrective actions taken or planned that are directly related to the apparent violations.
Apparent Violation of 10 CFR 50, Appendix B, Criterion V, "Instructions, Procedures, and Drawings"

Restatement of Apparent Violation

10 CFR Part 50, Appendix B, Criterion V, "Instructions, Procedures, and Drawings," states, in part, that activities affecting quality shall be prescribed by documented instructions, procedures, or drawings of a type appropriate to the circumstances and shall be accomplished in accordance with these instructions, procedures, or drawings.

Contrary to this, the licensee failed to develop appropriate instructions or procedures for corrective maintenance activities on the Unit 3 Train A EDG K-1 relay. This failure resulted in the Unit 3 Train A EDG being inoperable between September 4 and 22, 2006. This item has been entered into the licensee's corrective action program as Condition Report/Disposition Request (CRDR) 2926830. Pending determination of safety significance, this finding is identified as apparent violation (AV) 05000530/2006012-01, "Failure to Establish Appropriate Instructions."

Apparent Violation of 10 CFR 50, Appendix B, Criterion XVI, "Corrective Actions"

Restatement of Apparent Violation

10 CFR Part 50, Appendix B, Criterion XVI, "Corrective Action," states, in part, that measures shall be established to assure that conditions adverse to quality, such as failures, malfunctions, deficiencies, deviations, defective material and equipment, and nonconformances are promptly identified and corrected and, for significant conditions adverse to quality, measures shall assure that the cause of the condition is determined and corrective action taken to preclude repetition.
Contrary to this, the licensee failed to identify and correct the cause of the erratic EDG K-1 relay operation prior to installation of the relay on July 26, 2006. This failure resulted in the Unit 3 Train A EDG being inoperable between September 4 and 22, 2006. This item has been entered into the licensee's corrective action program as CRDR 2926830. Pending determination of safety significance, this finding is identified as AV 05000530/2006012-02, "Failure to Identify and Correct a Condition Adverse to Quality."

Admission

APS admits these apparent violations.
Cause

Personnel attempted to address the most likely cause of the failure first, and to determine whether correcting this first cause addressed the problem. This approach is intended to ensure that actual causes are not masked by performing multiple actions and then not knowing which action solved the apparent problem. This narrow approach, however, did not consider all potential failure possibilities and was not successful in identifying the latent flaw in the set-up of the K-1 replacement relay. When contact cleaning appeared to correct the performance problem, as evidenced by repeated manual, electrical, and in-service tests, it was thought that the cause had been addressed.
APS acknowledges the problem analysis efforts were narrowly focused, which led to an incorrect conclusion that the cause of the erratic relay operation was solely due to oxidized contacts.
During the investigation following the September 22, 2006 failure, it was determined that an opportunity to identify the cause of failure was missed early in the July 25 investigation.
There was a lack of communication between organizations (i.e., technicians and engineers) during the troubleshooting and relay installation process; better communication may have led to a realization that more than contact oxidation was involved.

APS recognized that this failure to correctly identify the cause of the July 25, 2006 Unit 3, Train A, EDG K-1 relay failure had potential organizational and programmatic elements and initiated a specific investigation (CRDR 2950124) to identify any such issues. The analysis methodology and preliminary results of this investigation are provided to demonstrate that APS recognizes this specific equipment failure as an opportunity to reassess the adequacy of the problem analysis, troubleshooting, and cause evaluation processes being implemented at Palo Verde.

Instead of simply extrapolating the organizational and programmatic (O&P) root causes from this single event, the investigation team took a broader look at all the barriers intended to minimize error in problem solving and root cause analysis at PVNGS in general. An independent consultant with failure mode analysis and O&P experience was used in the assessment, working in conjunction with PVNGS personnel.
The team identified the possible failure modes that result in inconsistent use of the problem solving methodology to solve equipment failures and evaluated their effects. The team also evaluated the effectiveness of barriers and determined their contribution to ineffective cause analysis.
Finally, the team applied stream analysis to determine cause-effect relationships among the organizational and programmatic failure modes.

In summary, the team has preliminarily identified 4 organizational and programmatic 'root' causes and 6 'contributing' causes. The root causes were:
1. Inconsistent equipment root cause of failure program management, resulting in varying degrees of document quality.
2. No formal problem solving and troubleshooting process in place to establish evaluation consistency.
3. Inconsistent management reinforcement of the equipment root cause of failure application methodology, resulting in inconsistent evaluation quality.
4. Inconsistent application of a problem solving methodology, as evidenced by varying degrees of rigor in root cause of failure determinations.
The contributing causes were:
1. Inconsistent consideration of all failure modes (lack of formal troubleshooting and problem solving process).
2. No continuing training on equipment failure analysis methodology.
3. Operating experience was not typically reviewed prior to opening components for analysis (lack of formal problem solving and troubleshooting process).
4. The program-to-program interface between the corrective maintenance and equipment root cause of failure programs has allowed the equipment root cause of failure process to be bypassed in some cases.
5. Inconsistent priority has been given to the equipment failure analyses, resulting in less than acceptable documentation.
6. Inappropriate accountability for meeting both timeliness and quality expectations on equipment failure determinations.
Corrective Actions Taken and Results Achieved

Equipment Actions

Corrective actions involved mechanical adjustments to the relay actuating arm to provide adequate auxiliary contact compression.
Additional corrective actions included inspecting, cleaning, and making mechanical adjustments, as necessary, to all other affected EDG K-1 relays, consistent with a detailed methodology that determined the proper amount of contact 'compression.'
These actions corrected the direct cause of the September 22, 2006 Unit 3, Train A, EDG failure.

Organizational and Programmatic Actions

Completed and ongoing corrective actions for the root and contributing causes of the organizational and programmatic issues include elements of the Performance Improvement Plan and review of equipment root cause of failure analysis (ERCFA) reports by the Engineering Product Review Board (EPRB). The Operations decision making process (ODP-16) and the Engineering human performance tools (EDG-01 and 02) are considered sufficient interim actions until the more extensive planned actions are fully implemented.
Corrective Actions to Be Taken

Equipment Actions

APS plans to replace all of the EDG automatic voltage regulators, including the K-1 relays, with a different design during the next refueling outage for each unit. This is the longer-term equipment corrective action to prevent recurrence.
Related longer-term corrective actions resulting from the root cause of failure analysis that address precluding event recurrence include:
* Perform a review of the proposed modification that will change the control and power components of the diesel generator excitation systems. This review will include the following (Due date: May 1, 2007):
** Method of de-excitation (e.g., type of relay to be used for field shorting, other types of field shorting devices used in the industry, reset circuit logic, circuit redundancy)
** Evaluation of the need for an annunciator and alarm circuit on the field shorting relay
** Completeness of vendor documentation (e.g., a complete set of control documentation from the prime and sub-component suppliers)
** PMs for the various newly installed components (e.g., PMs on the K-1 relay, including verification of sufficient contact 'compression')
* Incorporate the findings from the failure analysis into the current K-1 documentation (e.g., technical manuals, operating procedures). (Due date: August 31, 2007)
* Evaluate the need for the field shorting components of the diesel generator excitation system. (Due date: August 31, 2007)
* Evaluate installing a jumper across the dc auxiliary contact assembly normally open contact to eliminate the potential of the contact failing to close and making the EDG inoperable. (Due date: August 31, 2007)
* Assess whether there are other similar safety-related circuits in the units where alarms need to be installed to monitor the operability of the circuit. (Due date: August 31, 2007)
* The final root cause analysis will assess the extent of cause and the extent of condition.
Organizational and Programmatic Actions

Corrective actions to be taken for each root and contributing cause of the organizational and programmatic issues are as follows:

The Plant Health Committee will review the organizational and programmatic root cause evaluation. (Due date: February 28, 2007)

Root Cause 1 - Inconsistent equipment root cause of failure analysis (ERCFA) program management, resulting in varying degrees of ERCFA document quality.

Corrective Actions
* Establish a single ERCFA program owner. (Due date: February 1, 2007)
* Establish performance indicators for the ERCFA program. (Due date: April 1, 2007)
* Perform an ERCFA program self assessment. (Due date: March 1, 2007)
* Include the ERCFA program in procedure 73DP-OAP05, Engineering Programs Management and Health Reporting. (Due date: March 31, 2007)
* Establish an ERCFA program improvement plan. (Due date: April 1, 2007)

Root Cause 2 - Lack of a formal problem solving and troubleshooting process to establish ERCFA consistency.

Corrective Actions
* Develop a troubleshooting and problem solving process to be used by Operations, Maintenance and Engineering. Include as part of this process guidance and direction on when and how to develop specific work or troubleshooting instructions in the absence of component design instructions or information. (Due date: February 1, 2007)
* Revise the ERCFA program to include the new problem solving process. (Due date: March 1, 2007)
* Provide training on the new process to selected Operations, Maintenance and Engineering personnel. (Due date: May 1, 2007)

Root Cause 3 - Inconsistent management reinforcement of ERCFA application methodology, which resulted in inconsistent quality of root cause of failure determinations.

Corrective Actions
* Establish a quality checklist for EPRB and Corrective Action Review Board review of ERCFA evaluations. (Due date: February 10, 2007)
* Ensure samples of ERCFA evaluations are included in the next EPRB meeting. (Due date: February 10, 2007)
* The existing Leadership Performance Improvement Plan has established common goals and the expectation that the highest standards of performance and personal accountability be pursued.

Root Cause 4 - Inconsistent application of a systematic problem solving methodology, resulting in varying degrees of rigor in root cause of failure determinations and quality of documented evaluations.

Corrective Actions
* Provide training to selected Operations, Maintenance and Engineering personnel on the new troubleshooting and problem solving process. (Due date: May 1, 2007)

Contributing Cause 1 - Inconsistent consideration of all failure modes.

Corrective Actions
* Provide training to ERCFA-qualified engineers that will include the need to consider all failure modes as part of initial troubleshooting and root cause activities. (Due date: May 1, 2007)

Contributing Cause 2 - No continuing training on equipment failure analysis methodology.

Corrective Actions
* Provide training to ERCFA-qualified engineers on changes to the ERCFA program. (Due date: May 1, 2007)
* Establish periodic ERCFA industry events training. (Due date: March 31, 2007)

Contributing Cause 3 - Operating experience was not typically reviewed prior to opening components for analysis.

Corrective Actions
* Provide training to ERCFA-qualified engineers that will include reviewing any applicable operating experience as part of the initial troubleshooting and root cause activities. (Due date: May 1, 2007)

Contributing Cause 4 - The program-to-program interface between the Corrective Maintenance and ERCFA programs allows the ERCFA process to be bypassed in some cases.

Corrective Actions
* Perform a self assessment of the ERCFA program, including the corrective maintenance interface, to identify appropriate corrective actions. (Due date: March 1, 2007)

Contributing Cause 5 - Inconsistent priority has been given to the equipment failure analyses, resulting in less than acceptable documentation.

Corrective Actions
* Provide training to ERCFA-qualified engineers that will include a discussion of establishing appropriate priority to ensure a quality analysis. (Due date: May 1, 2007)
* The existing Leadership Performance Improvement Plan has established common goals and the expectation that the highest standards of performance and personal accountability be pursued.

Contributing Cause 6 - Inappropriate accountability for meeting both timeliness and quality expectations on ERCFA determinations.

Corrective Actions
* Establish a single ERCFA program owner. (Due date: February 1, 2007)
* Provide training to ERCFA-qualified engineers that will include a discussion of accountability and expectations for both quality and timeliness. (Due date: May 1, 2007)
* The existing Leadership Performance Improvement Plan has established common goals and the expectation that the highest standards of performance and personal accountability be pursued.

An effectiveness review will be established to ensure the equipment failure analysis program improvements are achieving the desired results. Performance indicators established as part of the above corrective actions will be used to monitor performance.
4 - Conclusion

APS realizes that the troubleshooting and problem solving process lacked the technical rigor necessary to ensure deficiencies were properly identified and resolved the first time. In this case, the failure to consider all possible causes of the July K-1 relay failure resulted in a subsequent failure in September.

Immediate actions have been taken to assure the EDGs remain within their design basis. Additional actions to address the programmatic weaknesses have been identified to provide greater assurance that similar events will not occur.}}
}}

Revision as of 04:06, 18 September 2019

APS Response to NRC Inspection Reports 05000528-06-012, 05000529-06-012 and 05000530-06-012, Corrected Copy
ML070180148
Site: Palo Verde  
Issue date: 01/09/2007
From: Mauldin D
Arizona Public Service Co
To:
Document Control Desk, NRC Region 4
References
102-05626-CDM/SAB/JAP/CJS IR-06-012


Text

CORRECTED COPY

A subsidiary of Pinnacle West Capital Corporation

David Mauldin
Vice President, Nuclear Engineering and Support
Palo Verde Nuclear Generating Station
Tel: 623-393-5553
Fax: 623-393-6077
Mail Station 7605
PO Box 52034
Phoenix, Arizona 85072-2034

102-05626-CDM/SAB/JAP/CJS
January 09, 2007

U.S. Nuclear Regulatory Commission
ATTN: Document Control Desk
Washington, DC 20555

Dear Sir:

Subject:

Palo Verde Nuclear Generating Station (PVNGS) Units 1, 2 and 3
Docket Nos. STN 50-528, 50-529, and 50-530
APS Response to NRC Inspection Report 05000528/2006012; 05000529/2006012; 05000530/2006012

In NRC Special Inspection Report 2006012, dated December 6, 2006, the NRC documented its examination of activities associated with the PVNGS Unit 3, Train A, emergency diesel generator (EDG) failures that occurred on July 25 and September 22, 2006. On both occasions, the EDG failed to produce an output voltage during testing. The report discusses two findings: (1) a lack of adequate instructions for corrective maintenance of the K-1 relay and (2) the failure to identify and correct the cause of erratic K-1 relay operation before the spare relay was installed on July 26, 2006. These two findings resulted in the Unit 3, Train A, EDG being inoperable from September 4 until September 22, 2006. APS has reviewed the NRC Inspection Report and has no substantive disagreement with the facts as documented in the report. In accordance with Inspection Manual Chapter 0609, the NRC is currently evaluating the safety significance of the findings.

At a January 16, 2007 Regulatory Conference in Arlington, Texas, APS will provide the NRC its perspective on the facts and analytical assumptions relevant to determining the safety significance of the findings.

The purpose of this letter is to provide the results of APS' evaluation of the EDG K-1 relay failures in advance of the Regulatory Conference, to facilitate a focused discussion at the conference on the safety significance of the EDG K-1 relay failures.

APS is providing its position on the findings, as well as the causes and the corrective actions that have been or will be taken.

A member of the STARS (Strategic Teaming and Resource Sharing) Alliance: Callaway, Comanche Peak, Diablo Canyon, Palo Verde, South Texas Project, Wolf Creek

The NRC letter dated December 22, 2006, which communicated the results of the Regulatory Conference on the Spray Pond operability issue, reiterated the NRC's concern about the continuing occurrence of problem identification, root cause analysis and technical rigor issues. APS recognizes that this is another such example and is committed to improving its performance in these areas.

APS realizes the troubleshooting and problem-solving process lacked the technical rigor necessary to ensure deficiencies were properly identified and resolved the first time. In this case, the failure to consider all possible causes of the July K-1 relay failure resulted in a subsequent failure in September.

Immediate actions have been taken to assure the EDGs remain within their design basis. Additional actions to address the programmatic weaknesses have been identified.

We continue to implement, reinforce, monitor and adjust our performance improvement plan to provide greater confidence that similar events will not recur.

The Enclosure to this letter contains a summary of APS' preliminary root cause of failure evaluation for the K-1 relay and also includes a response to the two apparent violations.

Finally, APS intends to supplement Licensee Event Report (LER) 2006-006 to reflect the results of the K-1 relay failure investigation.

The actions described in the Enclosure represent corrective actions and are not regulatory commitments.

There are no regulatory commitments in this letter. If you have any questions, please contact James A. Proctor at (623) 393-5730.

Sincerely,

JMLISAB/JAP/CJS/gt

Enclosure:

Summary of APS Investigation into Unit 3, Train A, Emergency Diesel Generator (EDG) K-1 Relay Failures and Corrective Actions

cc:
B. S. Mallett, NRC Region IV Regional Administrator
M. B. Fields, NRC NRR Project Manager
M. T. Markley, NRC NRR Project Manager
G. G. Warnick, NRC Senior Resident Inspector for PVNGS

ENCLOSURE

Summary of APS Investigation into Unit 3, Train A, Emergency Diesel Generator (EDG) K-1 Relay Failures and Corrective Actions

1 - Introduction

The following describes the sequence of events and establishes the context for the K-1 relay failure on September 22, 2006. This discussion is not intended to justify the recurrence of the failure. Its purpose is to establish that APS personnel acted in good faith, though in retrospect with less than adequate rigor, to identify the apparent cause of failure in July 2006. The EDG was tested repeatedly to confirm that the apparent cause had been addressed before it was returned to service in July 2006.

The root cause of the September 22, 2006 failure of the replacement K-1 relay was determined to be inadequate auxiliary dc contact 'compression.'

The symptoms of the July failure, however, could be explained by contact oxidation, and the contact 'compression' issue did not reveal itself during repeated testing before the relay was returned to service in July 2006. This is explained in more detail in the following sections.

At the time of the original K-1 relay failure on July 25, 2006, no EDG had failed to produce any output voltage following an emergency-mode start in over 3,000 starts since 1990, when a database was initiated to track EDG start history.

A written troubleshooting plan was developed on July 25, 2006 by personnel with over 40 years of combined EDG experience.

When the K-1 relay was identified as the reason the EDG did not produce output voltage, the 'original' K-1 relay was removed and segregated for failure analysis.

The maintenance practice had been to replace the K-1 relay unit, because of its hybrid design, rather than to perform maintenance on the K-1 relay.

APS had already planned to replace all of the EDG automatic voltage regulators, including the K-1 relays, with a different design. The replacements were scheduled for the next refueling outage for each unit, beginning with Unit 1 in the spring of 2007. The planned replacements were for a variety of reasons, including the inability to obtain replacement parts: the component parts of the K-1 relay are no longer manufactured, and limited spares were available.

When the first replacement K-1 relay was obtained from the warehouse (one of two remaining spares), it exhibited symptoms of auxiliary dc contact oxidation. This was not unexpected, since the relay had been stored for about 20 years. Initial attempts to remove this oxidation by non-intrusive methods were not entirely successful. The last remaining spare in the warehouse was judged unsuitable for use because of a warped relay cover and apparent auxiliary dc contact oxidation.

Only after these efforts, with no other recourse, did APS personnel disassemble the auxiliary dc contact assembly on the first replacement relay to perform more extensive contact cleaning. Before the replacement relay's dc auxiliary contact assembly was disassembled, the craft disassembled a dc auxiliary contact assembly from a training relay to ensure adequate knowledge of the device. Maintenance and Engineering personnel practiced repeated disassemblies and reassemblies of the auxiliary dc contact assembly for approximately 2 hours to ensure proficiency.

Following disassembly, there was no attempt to change the configuration of the relay; the only effort was to clean the contacts.

Following the corrective maintenance on the replacement relay, approximately 10 manual actuation tests and 3 electrical functional tests verified that the replacement relay was performing properly.

A successful maintenance run of the Unit 3, Train A, EDG was performed, and the EDG passed a Technical Specification surveillance test before being declared operable.

During these retests, there was no indication of erratic dc auxiliary contact assembly operation.

The Unit 3, Train A, EDG was subsequently tested successfully on August 7 and 24, 2006, and finally on September 4, 2006. Our root cause efforts have led us to conclude that the relay failed to reset properly after the September 4, 2006 test. This failure led to the September 22, 2006 failure of the Unit 3, Train A, EDG.

In retrospect, APS acknowledges, as described in the inspection report, that "the licensee's problem analysis efforts were narrowly focused, which led them to conclude that the cause of the erratic dc auxiliary switch operation was oxidized contacts." (Page 7 of IR enclosure)

The failure mechanism behind the September 22, 2006 failure, inadequate contact 'compression,' did not reveal itself during the repeated maintenance and post-installation tests that followed cleaning of the contacts in July 2006. Subsequent testing demonstrated that inadequate contact 'compression' was the cause of the September 22, 2006 failure. The latent failure mechanism did ultimately reveal itself through the routine testing protocols implemented as part of normal plant operations.

It should be noted that subsequent root cause of failure testing of the replacement K-1 relay determined that the mean number of operations before failure was 58 cycles, with a minimum of 0 and a maximum of 323 cycles.

2 - Summary of Emergency Diesel Generator K-1 Relay Root Cause of Failure (RCF) Testing and Evaluation

The testing developed to determine the root cause of the September 22, 2006 failure was performed in the following five phases:
1. Verified the integrity of circuits external to the K-1 relay to conclusively determine that the K-1 relay is the appropriate focus of this root cause investigation.
2. Performed a physical and dimensional comparison of the K-1 relay with other training, failed, and spare K-1 relays.
3. Determined whether temperature could affect the performance of the K-1 relay dc auxiliary contact assembly by causing dimensional tolerances to grow and open closed contacts.
4. Electrically cycled the K-1 relay to determine whether the September 22, 2006 event could be repeated.
5. Performed an internal inspection of the dc auxiliary contact assembly involved in the September 22, 2006 event.

The root cause of failure testing of the K-1 relay produced the following substantive conclusions:
• The K-1 relay failure to reset was repeatable.
  • A troubleshooting event on September 22, 2006 was repeatable. In that event, a technician made contact with a terminal (wire number 69) on the dc auxiliary contact assembly, which caused the K-1 relay to reset.

The K-1 testing revealed that, when the K-1 relay failed to reset, manipulating wire number 69 caused the dc auxiliary normally open contact to open and close. An internal inspection of the dc auxiliary contact assembly found that the lower terminal on this device was loose due to an anomaly in the molded enclosure.

Straightening the K-1 relay metal actuator arm corrected the loose terminal by applying positive pressure to the stationary and movable contacts of the dc auxiliary contact assembly.
• Dimensional comparison showed that the root cause of the September 22, 2006 failure was the accumulation of tolerances associated with the various components that make up the K-1 relay (inadequate auxiliary dc contact 'compression').
• With the K-1 relay in the latched state and the dc auxiliary contact assembly normally open contact marginally closed, an increase in temperature did not provide the external stimulus needed to change the state of this contact.
• During cycle testing, no failures occurred in which the normally open contact of the K-1 relay's dc auxiliary contact assembly closed and then subsequently opened. During a failure, the normally open contact simply did not make up when the K-1 relay was latched.
• After the September 22, 2006 failure, engineering decided to 'field straighten' the metal actuator, and that decision was correct. This adjustment compensates for any tolerance stack-up issues. This configuration was evaluated both dimensionally and by cycling a K-1 relay.

In summary, the root cause of failure analysis concluded that the Unit 3, Train A, EDG was inoperable from September 4 to September 22, 2006. The cause was that the K-1 relay dc auxiliary contact assembly contacts did not close properly following the September 4 shutdown of the EDG. The auxiliary dc contacts were stable (i.e., would not change state due to physical contact) if they reset properly. They were unstable (i.e., easily changed state with physical contact) if they failed to reset properly.

3 - Response to Apparent Violations

This section sets forth APS' position on the two apparent violations and summarizes corrective actions taken or planned that are directly related to the apparent violations.

Apparent Violation of 10 CFR 50, Appendix B, Criterion V, "Instructions, Procedures, and Drawings"

Restatement of Apparent Violation

10 CFR Part 50, Appendix B, Criterion V, "Instructions, Procedures, and Drawings," states, in part, that activities affecting quality shall be prescribed by documented instructions, procedures, or drawings of a type appropriate to the circumstances and shall be accomplished in accordance with these instructions, procedures, or drawings.

Contrary to this, the licensee failed to develop appropriate instructions or procedures for corrective maintenance activities on the Unit 3 Train A EDG K-1 relay. This failure resulted in the Unit 3 Train A EDG being inoperable between September 4 and 22, 2006. This item has been entered into the licensee's corrective action program as Condition Report/Disposition Request (CRDR) 2926830. Pending determination of safety significance, this finding is identified as apparent violation (AV) 05000530/2006012-01, "Failure to Establish Appropriate Instructions."

Apparent Violation of 10 CFR 50, Appendix B, Criterion XVI, "Corrective Action"

Restatement of Apparent Violation

10 CFR Part 50, Appendix B, Criterion XVI, "Corrective Action," states, in part, that measures shall be established to assure that conditions adverse to quality, such as failures, malfunctions, deficiencies, deviations, defective material and equipment, and nonconformances are promptly identified and corrected and for significant conditions adverse to quality, measures shall assure that the cause of the condition is determined and corrective action taken to preclude repetition.

Contrary to this, the licensee failed to identify and correct the cause of the erratic EDG K-1 relay operation prior to installation of the relay on July 26, 2006. This failure resulted in the Unit 3 Train A EDG being inoperable between September 4 and 22, 2006. This item has been entered into the licensee's corrective action program as CRDR 2926830. Pending determination of safety significance, this finding is identified as AV 05000530/2006012-02, "Failure to Identify and Correct a Condition Adverse to Quality."

Admission

APS admits these apparent violations.

Cause

Personnel attempted to address the most likely cause of the failure first, and then to determine whether correcting that cause resolved the problem. This approach is intended to ensure that actual causes are not masked by performing multiple actions at once and then not knowing which action solved the apparent problem. This narrow approach, however, did not consider all potential failure modes and was not successful in identifying the latent flaw in the set-up of the K-1 replacement relay. When contact cleaning appeared to correct the performance problem, as evidenced by repeated manual, electrical and in-service tests, personnel concluded that the cause had been addressed.

APS acknowledges the problem analysis efforts were narrowly focused, which led to an incorrect conclusion that the cause of the erratic relay operation was solely due to oxidized contacts.

During the investigation following the September 22, 2006 failure, it was determined that an opportunity to identify the cause of failure was missed early in the July 25 investigation.

There was a lack of communication between organizations (i.e., technicians and engineers) during the troubleshooting and relay installation process. Better communication might have led to the realization that more than contact oxidation was involved.

APS recognized that this failure to correctly identify the cause of the July 25, 2006 Unit 3, Train A, EDG K-1 relay failure had potential organizational and programmatic elements. APS therefore initiated a specific investigation (CRDR 2950124) to identify any such issues. The analysis methodology and preliminary results of this investigation are provided to demonstrate that APS recognizes this specific equipment failure as an opportunity to reassess the adequacy of the problem analysis, troubleshooting and cause evaluation processes being implemented at Palo Verde.

Instead of simply extrapolating the organizational and programmatic (O&P) root causes from this single event, the investigation team took a broader look at all the barriers intended to minimize error in problem solving and root cause analysis at PVNGS in general. An independent consultant with failure mode analysis and O&P experience was used in the assessment, working in conjunction with PVNGS personnel.

The team identified the possible failure modes that result in inconsistent use of the problem solving methodology to solve equipment failures and evaluated their effects. The team also evaluated the effectiveness of barriers and determined their contribution to ineffective cause analysis.

Finally, the team applied stream analysis to determine cause-effect relationships among the organizational and programmatic failure modes.

In summary, the team has preliminarily identified 4 organizational and programmatic 'root' causes and 6 'contributing' causes. The root causes were:
1. Inconsistent equipment root cause of failure program management, resulting in varying degrees of document quality.
2. No formal problem solving and troubleshooting process in place to establish evaluation consistency.
3. Inconsistent management reinforcement of the equipment root cause of failure application methodology, resulting in inconsistent evaluation quality.
4. Inconsistent application of a problem solving methodology, as evidenced by varying degrees of rigor in root cause of failure determinations.

The contributing causes were:
1. Inconsistent consideration of all failure modes (lack of formal troubleshooting and problem solving process).
2. No continuing training on equipment failure analysis methodology.
3. Operating experience was not typically reviewed prior to opening components for analysis (lack of formal problem solving and troubleshooting process).
4. The program-to-program interface between the corrective maintenance and equipment root cause of failure programs has allowed the equipment root cause of failure process to be bypassed in some cases.
5. Inconsistent priority has been given to equipment failure analyses, resulting in less than acceptable documentation.
6. Inappropriate accountability for meeting both timeliness and quality expectations on equipment failure determinations.

Corrective Actions Taken and Results Achieved

Equipment Actions

Corrective actions involved mechanical adjustments to the relay actuating arm to provide adequate auxiliary contact compression.

Additional corrective actions included inspecting, cleaning, and making mechanical adjustments, as necessary, to all other affected EDG K-1 relays. This work followed a detailed methodology that determined the proper amount of contact 'compression.'

These actions corrected the direct cause of the September 22, 2006 Unit 3, Train A, EDG failure.

Organizational and Programmatic Actions

Completed and ongoing corrective actions for the root and contributing causes of the organizational and programmatic issues include elements of the Performance Improvement Plan and review of ERCFA reports by the Engineering Product Review Board (EPRB). The Operations decision making process (ODP-16) and the Engineering human performance tools (EDG-01 and 02) are considered sufficient interim actions until the more extensive planned actions are fully implemented.

Corrective Actions to Be Taken

Equipment Actions

APS plans to replace all of the EDG automatic voltage regulators, including replacement of the K-1 relays, with a different design during the next refueling outage for each unit. This is the longer-term equipment corrective action to prevent recurrence.

Related longer-term corrective actions resulting from the root cause of failure analysis that are intended to preclude event recurrence include:
• Perform a review of the proposed modification that will change the control and power components of the diesel generator excitation systems. This review will include:
  o Method of de-excitation (e.g., type of relay to be used for field shorting, other types of field shorting devices used in the industry, reset circuit logic, circuit redundancy, etc.)
  o Evaluation of the need for an annunciator and alarm circuit on the field shorting relay.
  o Completeness of vendor documentation (e.g., complete set of control documentation from the prime and sub-component suppliers, etc.)
  o PMs for the various newly installed components (e.g., PMs on the K-1 relay, including verification of sufficient contact 'compression').
  (Due date: May 1, 2007)
• Incorporate the findings from the failure analysis into the current K-1 documentation (e.g., technical manuals, operating procedures, etc.) (Due date: August 31, 2007)
• Evaluate the need for the field shorting components of the diesel generator excitation system. (Due date: August 31, 2007)
• Evaluate installing a jumper across the dc auxiliary contact assembly normally open contact to eliminate the potential of the contact failing to close and making the EDG inoperable. (Due date: August 31, 2007)
• Assess whether there are other similar safety-related circuits in the units where alarms need to be installed to monitor the operability of the circuit. (Due date: August 31, 2007)
• The final root cause analysis will assess the extent of cause and the extent of condition.

Organizational and Programmatic Actions

Corrective actions to be taken for each root and contributing cause for the organizational and programmatic issues are as follows:

The Plant Health Committee will review the organizational and programmatic root cause evaluation. (Due date: February 28, 2007)

Root Cause 1 - Inconsistent equipment root cause of failure analysis (ERCFA) program management, resulting in varying degrees of ERCFA document quality.
Corrective Actions
• Establish a single ERCFA program owner. (Due date: February 1, 2007)
• Establish performance indicators for the ERCFA program. (Due date: April 1, 2007)
• Perform an ERCFA program self-assessment. (Due date: March 1, 2007)
• Include the ERCFA program in procedure 73DP-OAP05, Engineering Programs Management and Health Reporting. (Due date: March 31, 2007)
• Establish an ERCFA program improvement plan. (Due date: April 1, 2007)

Root Cause 2 - Lack of a formal problem solving and troubleshooting process to establish ERCFA consistency.

Corrective Actions
• Develop a troubleshooting and problem solving process to be used by Operations, Maintenance and Engineering. Include as part of this process guidance and direction on when and how to develop specific work or troubleshooting instructions in the absence of component design instructions or information. (Due date: February 1, 2007)
• Revise the ERCFA program to include the new problem solving process. (Due date: March 1, 2007)
• Provide training on the new process to selected Operations, Maintenance and Engineering personnel. (Due date: May 1, 2007)

Root Cause 3 - Inconsistent management reinforcement of ERCFA application methodology, which resulted in inconsistent quality of root cause of failure determinations.

Corrective Actions
• Establish a quality checklist for EPRB and Corrective Action Review Board review of ERCFA evaluations. (Due date: February 10, 2007)
• Ensure samples of ERCFA evaluations are included in the next EPRB meeting. (Due date: February 10, 2007)
The existing Leadership Performance Improvement Plan has established common goals and the expectation that the highest standards of performance and personal accountability be pursued.

Root Cause 4 - Inconsistent application of a systematic problem solving methodology, resulting in varying degrees of rigor in root cause of failure determinations and quality of documented evaluations.

Corrective Actions
• Provide training to selected Operations, Maintenance and Engineering personnel on the new troubleshooting and problem solving process. (Due date: May 1, 2007)

Contributing Cause 1 - Inconsistent consideration of all failure modes.
Corrective Actions
• Provide training to ERCFA-qualified engineers that will include the need to consider all failure modes as part of initial troubleshooting and root cause activities. (Due date: May 1, 2007)

Contributing Cause 2 - No continuing training on equipment failure analysis methodology.

Corrective Actions
• Provide training to ERCFA-qualified engineers on changes to the ERCFA program. (Due date: May 1, 2007)
• Establish periodic ERCFA industry events training. (Due date: March 31, 2007)

Contributing Cause 3 - Operating experience was not typically reviewed prior to opening components for analysis.
Corrective Actions
• Provide training to ERCFA-qualified engineers that will include reviewing any applicable operating experience as part of the initial troubleshooting and root cause activities. (Due date: May 1, 2007)

Contributing Cause 4 - The program-to-program interface between the Corrective Maintenance and ERCFA programs allows the ERCFA process to be bypassed in some cases.
Corrective Actions
• Perform a self-assessment of the ERCFA program, including the corrective maintenance interface, to identify appropriate corrective actions. (Due date: March 1, 2007)

Contributing Cause 5 - Inconsistent priority has been given to equipment failure analyses, resulting in less than acceptable documentation.

Corrective Actions
• Provide training to ERCFA-qualified engineers that will include a discussion of establishing appropriate priority to ensure a quality analysis. (Due date: May 1, 2007)
The existing Leadership Performance Improvement Plan has established common goals and the expectation that the highest standards of performance and personal accountability be pursued.

Contributing Cause 6 - Inappropriate accountability for meeting both timeliness and quality expectations on ERCFA determinations.

Corrective Actions
• Establish a single ERCFA program owner. (Due date: February 1, 2007)
• Provide training to ERCFA-qualified engineers that will include a discussion of accountability and expectations for both quality and timeliness. (Due date: May 1, 2007)
The existing Leadership Performance Improvement Plan has established common goals and the expectation that the highest standards of performance and personal accountability be pursued.

An effectiveness review will be established to ensure the equipment failure analysis program improvements are achieving the desired results. Performance indicators established as part of the above corrective actions will be used to monitor performance.

4 - Conclusion

APS realizes the troubleshooting and problem solving process lacked the technical rigor necessary to ensure deficiencies were properly identified and resolved the first time. In this case, the failure to consider all possible causes of the July K-1 relay failure resulted in a subsequent failure in September.

Immediate actions have been taken to assure the EDGs remain within their design basis. Additional actions to address the programmatic weaknesses have been identified to provide greater assurance that similar events will not occur.