ML20140H901

Forwards List of Questions & Requested Actions for Use in Preparing for ACRS Review of Proposed Final SRP Sections, BTPs & RGs
Person / Time
Issue date: 01/23/1997
From: Miller D
Advisory Committee on Reactor Safeguards
To: Coffman F, Wermiel G
NRC (Affiliation Not Assigned), NRC OFFICE OF NUCLEAR REGULATORY RESEARCH (RES)
Shared Package
ML20140H678 List:
References
ACRS-3047, NUDOCS 9705130315


Text

January 23, 1997

MEMORANDUM TO: Jared L. Wermiel, Chief
Instrumentation and Controls Branch
Division of Reactor Controls and Human Factors
Office of Nuclear Reactor Regulation

Franklin D. Coffman Jr., Chief
Control, Instrumentation and Human Factors Branch
Division of Systems Technology
Office of Nuclear Regulatory Research

FROM: Dr. Don W. Miller, Chairman
Instrumentation and Control Systems and Computers Subcommittee
Advisory Committee on Reactor Safeguards

SUBJECT: ISSUES AND REQUESTED ACTIONS FOR NRC STAFF USE IN PREPARING FOR ACRS REVIEW OF PROPOSED FINAL SRP/BTP/RGs

The purpose of this memorandum is to forward a list of questions and requested actions for your use in preparing for ACRS review of proposed final Standard Review Plan (SRP) sections, Branch Technical Positions, and Regulatory Guides (RGs). We have discussed many of these issues during past meetings. As you know, some ACRS Members feel very strongly about some of these issues. Therefore, I am providing this list of issues and requested actions to facilitate your preparation in resolving these matters during future Subcommittee and ACRS deliberations.

Issue 1:

Several Members of the ACRS point to Ontario Hydro's experience at Darlington as an innovative approach to software (S/W) design and assessment. Cognizant members of the staff have informally expressed concern that the formal methods introduced (forced on?) by Parnas for software design are confusing and "a step backwards." The staff has expressed the view that the approach advocated by Parnas is neither transparent nor useful. Furthermore, the staff believes that we are substantially ahead of the Canadians in this area.

Recommended Action:

  • Discuss the staff's opinion of the S/W design methods used by Ontario Hydro.

Issue 2:

A recurring question and concern raised by Members of the ACRS has been that S/W system design methods rely heavily on process, with little reliance on product testing or evaluation, as a means for developing high-quality, highly reliable S/W. There is agreement that a high-quality process will improve product quality and reliability and that a precisely defined and auditable process results in a product which is easier to maintain and update, and simplifies configuration management. However, some ACRS Members expressed concern that the acceptance criteria are not clearly specified. In their view, the acceptance criteria provide limited guidance on what is acceptable and what is not acceptable in terms of requirements for accuracy, consistency, and competency.

Recommended Action:

  • Address these concerns by citing specific examples (i.e., using a simple system). On the issue of acceptance criteria, explicitly demonstrate how acceptance criteria are specified by the IEEE standards.
  • Discuss the assertion that the acceptance criteria provide limited guidance on what is acceptable and what is not acceptable, and that there are no criteria for accuracy, consistency, and competency.

Discussion:

The following abbreviated definition of Verification and Validation (V&V) may be considered as a starting point in discussing these questions and concerns.

V&V is "the process of determining whether the requirements for a system or component are complete and correct, the products of each development phase fulfill the requirements or conditions imposed by the previous phase, and the final system or component complies with specified requirements. Note the activities involved in V&V for digital systems are essentially equivalent to some of the activities that have traditionally been performed for design and acceptance testing of any nuclear-safety related equipment." (Reference: "Guideline on Evaluation and Acceptance of Commercial Grade Digital Equipment for Nuclear Safety Applications," July 1996.)

V&V is important to assessing that the process is done correctly. Testing is used to verify the process is completed and the product meets specified requirements.

V&V provides for and requires a well-defined PROCESS to assure that the process of S/W development is complete and correct AND that the PRODUCT at each phase of the development process performs as expected AND that the FINAL PRODUCT performs as specified. What it does not do is guarantee performance indefinitely or provide a statistical methodology for predicting failure.

Simply put, V&V is a well-defined process for design and performance evaluation of S/W products, analogous to the design and performance evaluation process used for hardware (H/W). There are, however, two distinct and important differences: 1) the process of translating system requirements into S/W requirements is difficult and is the cause of design errors being the dominant source of failures. Design errors are fundamentally human performance errors and are not stochastic. Other S/W failures such as coding errors, although less prevalent, also are not stochastic. In the case of H/W, most failures are not dominated by design errors. Consequently, H/W failures are dominated by "mother nature" and are random and stochastic. Stochastic failures can be characterized by mean time to failure and failure rates. If design errors were dominant in H/W systems, H/W systems would have properties similar to those found in S/W systems (i.e., the inability to predict failures and common-mode failure in redundant systems).
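
As a rough illustration of what it means to characterize stochastic H/W failures by mean time to failure and failure rate, the sketch below assumes a constant failure rate (an exponential lifetime model). That model, the function names, and the example rate are assumptions made for illustration only; none of them come from this memorandum or from the standards it discusses.

```python
import math


def mttf_hours(failure_rate_per_hour: float) -> float:
    """Mean time to failure under an ASSUMED constant failure rate."""
    if failure_rate_per_hour <= 0:
        raise ValueError("failure rate must be positive")
    return 1.0 / failure_rate_per_hour


def reliability(t_hours: float, failure_rate_per_hour: float) -> float:
    """Probability of running t_hours without failure: R(t) = exp(-rate * t)."""
    if t_hours < 0:
        raise ValueError("time must be non-negative")
    return math.exp(-failure_rate_per_hour * t_hours)


if __name__ == "__main__":
    # Hypothetical component with an assumed rate of 1e-5 failures per hour.
    rate = 1e-5
    print("MTTF (hours):", mttf_hours(rate))
    print("One-year survival probability:", reliability(8760.0, rate))
```

No comparable calculation applies to a design error: it is either present or absent from the start, which is the distinction the paragraph above draws.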

Golay (see below) captures this concept from a different perspective:

"The concept of S/W failure is fundamentally different from H/W failure. A S/W failure consists of encountering an initially present but still undetected error. Conversely, a H/W failure consists of exciting a failure mode by exercising a flaw which has been introduced since the last time that this failure mode was demonstrated to be inactive. Thus a perfect S/W program can be expected to remain error free (indefinitely) but an analogous H/W system cannot."

Issue 3:

On numerous occasions the staff and the ACRS have stated that nuclear safety systems are inherently simple. The SRP Chapter 7 update is based on consensus-based industrial standards, standards that represent generally accepted S/W engineering practice and consensus standards developed for all S/W based systems, simple or complex.

A question has been raised on several occasions, including at the recent meeting with the Commission, whether guidance or standards can be developed specifically for simple systems such as those used with nuclear plant safety systems.

Recommended Action:

  • Discuss the feasibility of this approach with specific attention to the pros and cons.

Issue 4:

The methodology for addressing common-cause failures (CCFs) should be adaptable to redundant S/W based systems. In fact, identification of possible CCF paths may be inherently easier than many H/W paths since it is well known that S/W failures are dominated by design errors and that design errors are a significant contributor to CCFs.

Recommended Action:

  • Discuss the methodology for addressing CCFs in digital systems.

Issue 5:

The SRP is intended to be guidance for reviewers of digital systems and should not include specific examples. Several ACRS Members have suggested that the SRP cite analytical tools that may be used in specific cases. For example, tools for translation of system requirements into S/W requirements.

Recommended Action:

  • Cite possible examples of tools that may be used.

Issue 6:

The staff has stated that a graded approach to reviews based on importance to safety will be used. However, the criteria for reviews based on a graded approach are not explicitly defined. The staff also stated that guidance on graded quality assurance (GQA) is being developed for PRA as a part of the SRP update to Chapter 19. A draft Regulatory Guide for GQA has been developed.

Recommended Action:

  • Consider how or whether the fundamental concepts and methods in the GQA RG can be applied to the review of digital systems.

Issue 7:

During past ACRS discussions it has been suggested that the guidance provided for review of digital systems is analogous to the quality assurance (QA) process for safety-related equipment specified in Appendix B. This analogy has been implied when one considers the graded approach to review and considers the use of the concepts in the RG for graded quality assurance.

Recommended Action:

  • Comment on the merits of this analogy.

Issue 8:

The methodology and tools used in developing design specifications should be transparent and independent of natural language.

Recommended Action:

  • Provide examples including a tutorial on their use.

Issue 9:

There is an expectation that the designer/developer complete tasks specified by the process. The SRP specifies what is to be done and in some cases why, but not how to do it, which is designer specific.

Recommended Action:

  • Evaluate S/W development and design methods that supplement or provide an alternative to the traditional "Waterfall" methodology on which the IEEE S/W standards are based.

Two such approaches were reported in papers published at the recent PSA '96 ANS Topical meeting in Park City, Utah. I have included references to each with quotes from their respective papers.

1. M.W. Golay, J. Luniholfer and M. Ouyang, "A Strategy for Developing and Demonstrating Highly Reliable Nuclear Software"
  • Applies primarily to simple systems
  • The keys to developing reliable S/W are a combination of requiring the structure to be simple and a thorough testing program.
  • The concept of S/W failure is fundamentally different from H/W failure. A S/W failure consists of encountering an initially present but still undetected error. Conversely, a H/W failure consists of exciting a failure mode, by exercising a flaw which has been introduced since the last time that this failure mode was demonstrated to be inactive. Thus, a perfect S/W program can be expected to remain error free (indefinitely), but analogous H/W cannot.
  • When these methods have been compared to the more traditional methods (i.e., Waterfall), they have been shown to produce usable code more efficiently and to be considerably more effective in eliminating various forms of S/W errors.

2. A. Clark and C. Smidts, "Systematic Generation of Software Failure Mode and Effects Analysis for Fault Tc W A - e.

This approach is recovery-oriented and focuses on tolerating and recovering from faults to continue to provide service.

The S/W FMECA approach was prototyped ... for FAA's new air traffic control system which has an unavailability requirement of 1E-7.

Like H/W FMECA, S/W FMECA identifies inadequacies in the design, identifies the need for corrective actions and provides data to develop test plans.

However, unlike H/W FMECA, S/W FMECA process described in this paper takes a recovery oriented paradigm, where failures are designed to be tolerated rather than eliminated, because S/W failures are more difficult to identify and eliminate than are H/W failures.
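
To give a sense of scale for the unavailability figure of 1E-7 quoted above, the sketch below converts an unavailability into allowed downtime per year and shows the standard steady-state relation between unavailability, mean time to failure, and mean time to recover. It assumes that unavailability means the long-run fraction of time out of service; that reading, the 365.25-day year, and all function names are illustrative assumptions, not details taken from the cited paper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed average calendar year


def allowed_downtime_seconds(unavailability: float,
                             period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Downtime permitted in a period, ASSUMING unavailability is a time fraction."""
    if not 0.0 <= unavailability <= 1.0:
        raise ValueError("unavailability must lie between 0 and 1")
    return unavailability * period_seconds


def steady_state_unavailability(mttf: float, mttr: float) -> float:
    """Textbook steady-state relation: U = MTTR / (MTTF + MTTR), same time units."""
    if mttf <= 0 or mttr < 0:
        raise ValueError("mttf must be positive and mttr non-negative")
    return mttr / (mttf + mttr)


if __name__ == "__main__":
    print("Seconds of downtime per year at 1e-7:", allowed_downtime_seconds(1e-7))
    # Hypothetical values: one failure per 1000 hours, 1 second to recover.
    print("Unavailability:", steady_state_unavailability(1000.0 * 3600.0, 1.0))
```

Under these assumptions, 1E-7 corresponds to roughly three seconds of downtime per year. The second function shows why a recovery-oriented design can help: for a fixed failure rate, shortening recovery time lowers unavailability directly.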

Comment: It will be interesting to see what position the NAS study takes on these issues.

Issue 10:

Reference: GL 95-02 and digital I&C upgrades.

Several ACRS Members expressed the view that there is some ambiguity in the licensing application process for digital I&C upgrades. The update of the SRP Chapter 7 should help, but there remains a question whether more should be done. Currently, the staff recommends that licensees who are considering upgrades contact the staff ahead of time and present an overview of the proposed change. The staff will then provide an opinion on whether it can be done via 10 CFR 50.59 or whether a license amendment is required to assure an unreviewed safety question (USQ) does not arise. The staff and industry should have gained sufficient experience since issuance of GL 95-02 to "narrow the gray areas" and more clearly identify criteria for 50.59 changes. The NRC Technical Training Center (TTC) plans to hold a second "Regulatory Perspectives Workshop" in January or February regarding these matters. This topic should be discussed both formally and informally.

Recommended Action:

  • Based on experience with digital upgrades, clarify criteria for 50.59 changes.

Issue 11:

Reference: K. Korsah, T.J. Tanaka, T.L. Wilson and R.T. Wood, "Environmental Testing of an Experimental Digital Safety Channel," NUREG/CR-6406, September 1996.

Significant Findings and Conclusions:

1. Interfaces were found to be the most vulnerable element of the Experimental Digital Safety Channel (EDSC). "Thus, qualification testing should confirm the response of any digital interfaces to environmental stressors."

2. The most prevalent stressor-induced upsets, as well as the most severe, were found to occur during the EMI/RFI tests. These tests produced the only permanent failure of the EDSC (i.e., power supply). Also, the effect of the stressor was typically immediate, whereas the occurrence of high temperature/humidity and smoke exposure effects was delayed for some interval (i.e., tens of minutes).

Discussion:

High-voltage spikes on power leads were found to cause a greater number of upsets, and within a relatively short time (i.e., seconds), compared to low-voltage, sinusoidal rms noise on the same power leads (Reference: 4.8, "Summary of EMI/RFI Test Results"). Throughout all the EMI/RFI tests, this (the power supply of the original PRS/MUX multiplexer backplane under test inside the GTEM cell failed permanently after the 20 MHz, 72 V/m test) is the only hard or permanent failure that occurred. The minimum field strength at which temporary errors occurred with the DTC was 40 V/m (Reference: 4.7.3, "Analysis of RS03 Test Results").

It was noted that susceptibility of particular systems can be mitigated by grounding, shielding, isolation, and surge practices.

Recommended Action:

Considering the results from the EMI/RFI tests, address the following two questions:

1. How similar were the high-voltage spikes on power leads to transients that might be expected to result from lightning?

2. Taking into consideration the BNL Risk study of environmental stressors, which concluded that lightning represented the highest environmental risk for digital I&C systems, does the EPRI EMI/RFI Guideline endorsed by an NRC SER provide sufficient guidance relative to grounding, shielding, isolation and surge practices?

Issue 12: NRC Research Plan

Identify the most important regulatory and technical issues raised in the National Academy of Science Phase 2 study and relate them to current NRC research programs for I&C and identify new research program needs. Include issues considered in this memo where appropriate.

Recommended Action:

  • Identify points of agreement and disagreement with the NAS/NRC Phase 2 study report.

  • Present the preliminary assessment of the changes required as a result of the study.

cc: ACRS Members
    J. Larkins
    R. Savio
    S. Duraiswamy
    ACRS Staff and Fellows