ML16246A157

Digital Instrumentation and Control System Reliability
Person / Time
Issue date: 09/16/1992
From: Banks M
Advisory Committee on Reactor Safeguards
To: Selin I
Advisory Committee on Reactor Safeguards
References
D920916



D920916

The Honorable Ivan Selin
Chairman
U.S. Nuclear Regulatory Commission
Washington, D.C. 20555

Dear Chairman Selin:

SUBJECT: DIGITAL INSTRUMENTATION AND CONTROL SYSTEM RELIABILITY

During the 389th meeting of the Advisory Committee on Reactor Safeguards, September 10-12, 1992, we reviewed the staff's proposed approach with respect to defense against common-mode failure of digital I&C systems, as discussed in policy issue "A" of the draft Commission paper entitled, "Design Certification and Licensing Policy Issues Pertaining to Passive and Evolutionary Advanced Light Water Reactor Designs," forwarded to the Commission on June 25, 1992. Specific comments on policy issue "A" are contained in a letter to Mr. Taylor dated September 16, 1992. The concerns we raise here are, however, more generally applicable, e.g., in connection with the staff's proposed generic letter on analog-to-digital replacements.

The trend in most industries over the last few decades has been toward the replacement of analog instrumentation and control systems with digital alternatives, and the nuclear industry has been no exception. This has been true for both functional replacements within existing nuclear facilities and for new designs, so it has been necessary for the staff to develop regulatory practices to deal with both the novel opportunities and the novel threats posed by these systems.

Experience, both military and industrial, has generally shown the digital systems to be more reliable and versatile than their analog counterparts. There are, however, some caveats and some regulatory conundrums. An advantage is that the digital systems are capable of more complex functions, so it is possible to build in self-testing capabilities that provide continuous assurance of operability with negligible system stress. In addition, the digital systems don't wear out; a billion activations of a CMOS gate are no more damaging than a thousand. While much has been made of the vulnerabilities of multiplexed data transmission systems, some of which are doubtless real, such systems generally provide greater fidelity and reliability of data transfer, along with greater fault tolerance through error-correcting coding. (If an analog signal is corrupted, it is often not possible to know it has happened.)

Indeed, error detection and error correction can be carried to arbitrary lengths for digitized data. There are many other advantages, and the future clearly belongs to digital systems, where they can be used.
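The point about error-correcting coding can be made concrete with a minimal sketch. It assumes a Hamming(7,4) code, chosen here purely for illustration; the letter does not name any particular code, and the function names are hypothetical. Four data bits are sent with three parity bits, and any single corrupted bit in the seven-bit word is both located and repaired at the receiver.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword.

    Codeword layout (positions 1..7): p1, p2, d1, p3, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(word):
    """Return (data_bits, error_position).

    error_position is 0 when no error is detected; otherwise it is the
    1-based position of the single bit that was flipped and corrected.
    """
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome names the bad position
    if position:
        c[position - 1] ^= 1  # repair the corrupted bit
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    sent = hamming74_encode([1, 0, 1, 1])
    received = list(sent)
    received[4] ^= 1  # simulate one bit corrupted in transmission
    print(hamming74_decode(received))  # prints ([1, 0, 1, 1], 5)
```

Stronger codes follow the same pattern with more redundancy, which is the sense in which detection and correction can be extended as far as the application warrants. This particular code corrects only single-bit errors; a double-bit error would be miscorrected.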

On the negative side, the available complexity of function afforded by digital systems invites the creation of complex software, which can be difficult to validate and can be subject to surprising error modes. Such systems are also hard to regulate, because only the simplest programs are amenable to formal validation and verification (V&V), in the sense of a complete analysis of the mapping of the input space to the output space. For more complex programs (relevant to nuclear control systems, but not necessarily to instrumentation or safety actuation systems), there are many analytical techniques in use, none perfect. That is also true of analog systems. Solid-state systems, whether digital or analog, are also peculiarly vulnerable to environmental damage, e.g., from overheating. Finally, programmable digital systems have their own special vulnerabilities to human error.
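What a complete analysis of the mapping from input space to output space means can be sketched for a very small function. The example assumes a hypothetical two-out-of-four trip voter with Boolean channel inputs; neither the voting scheme nor the names below come from the staff's proposal. With only 16 possible input combinations, every one can be checked against an independently stated specification.

```python
from itertools import product


def trip_2oo4(channels):
    """Hypothetical implementation under test: trip when any two channels agree."""
    a, b, c, d = channels
    return ((a and b) or (a and c) or (a and d)
            or (b and c) or (b and d) or (c and d))


def spec_2oo4(channels):
    """Independent statement of the requirement: at least two channels demand a trip."""
    return sum(channels) >= 2


def find_mismatches(implementation, specification=spec_2oo4, n_channels=4):
    """Enumerate the whole input space and list every input where the two disagree."""
    return [
        inputs
        for inputs in product([False, True], repeat=n_channels)
        if bool(implementation(inputs)) != specification(inputs)
    ]


if __name__ == "__main__":
    # An empty list means the implementation matches the specification
    # on all 2**4 = 16 possible inputs.
    print(find_mismatches(trip_2oo4))
```

The approach stops scaling quickly: each additional Boolean input doubles the number of cases, and programs with internal state or analog-valued inputs have input spaces far too large to enumerate, which is why only the simplest programs can be treated this way.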

The staff has concentrated its attention on one of these many issues, the vulnerability of digital systems to certain kinds of common-mode failures, principally through programming errors introduced into the software, and therefore common to all channels.

To deal with this supposedly special susceptibility to common-mode failure, the staff has proposed a set of regulatory requirements.

The set includes some unarguable items, like the provision of adequate diversity to cope with common-mode failures that can affect safety systems, and analysis of the appropriate accident sequences. The set also includes some items whose desirability is less clear, and we now turn to these. Since each of these would require an extensive discussion to develop the point completely, and since our recommendation is that the staff revisit all these points, we will be brief. There is no special order.

The lack of explicit and quantifiable safety standards for instrumentation and control systems is particularly troublesome here. The staff speaks of reliability for digital systems in the same terms (failures per demand) that it uses for items which do wear out, like relays and switches. The entirely different failure mechanisms make this an inappropriate transfer of terminology.

Indeed, a simple software-based system, in which the hardware is kept within its environmental constraints, and whose software is simple enough to have been subjected to a full validation and verification (in the sense used above) can be expected to never fail. (Never is only a slight exaggeration.) The failure anecdotes we all know are typically in systems that are too complex for formal V&V, leaving the door open to software errors, or have been mistreated, opening the door to hardware failures. The latter problem is not unique to digital systems.

In view of the lack of explicit standards for the reliability of the digital systems, the staff seems to have drifted to what has been called the "bring me a rock" posture, in which the industry is asked to analyze its own vulnerabilities, after which the staff will make its ruling about the adequacy of the design. The spirit of the safety-goal initiative was presumably to help make regulation more predictable, and this approach is clearly in the other direction.

The focus on common-mode failures is troublesome. Software errors in single systems can lead to accidents just as serious as those due to common-mode failures in redundant systems, and the entire question of software reliability greatly transcends the issues raised here. We have been conducting a coordinated series of meetings on the safety issues involved in the inevitable computerization of the industry, already in progress. When we report on these, we will doubtless raise the question of whether sufficient talent, both in quantity and in experience, is being directed at these issues by NRC. That question is also an underlying issue here.

For the specific issue of protection against common-mode failures, whether for digital systems or such devices as diesel generators, there is a set of standard prophylaxes like diversity and defense in depth, which are useful when applied sensibly. (Slogans can be overplayed. It makes no sense to insist that multi-engine aircraft have a suitable mix of turbine and piston engines.)

The most controversial specific position taken by the staff is that there must be a safety-grade set of displays and controls located in the control room, independent of the computer systems, and "conventionally hardwired" to the lowest level practicable. Though the intent of the words in quotations is unclear, we were assured that it was to require analog backup systems. We do not concur in this proposed requirement. We think that the staff is unnecessarily mixing up the issues of digital/analog, hard wire/multiplex, and software/hardware.

Each instrumentation and control system that is important to the safety of a plant ought to meet some identifiable standard of reliability and fault tolerance, regardless of the hardware/software basis used in designing and fabricating the system.

It is not necessary that any given element of the system be perfect, but that the system as a whole meet some recognized standard, presumably in the form of a relevant surrogate for the Commission's safety goals. Both the identification of that standard and the evaluation of conformance for the system in question pose problems, but each should somehow be completed before, not after, a regulatory position is established. For example, the staff proposes to require that a backup system provide protection equivalent to that of the primary system, whereas the need is for sufficient protection to assure the adequate safety of the plant. It is not at all uncommon for backup systems to be designed to lower standards than the primaries, taking into account the fact that they will be called upon less often. (Consider spare tires.)

It is entirely possible that a digital system may turn out to be a better backup than an analog system. (The proposed position does accommodate this idea, but the staff briefings did not.) For some situations a light beam is a more reliable means of communication than a hard wire. A general-purpose microprocessor that is in widespread commercial use may be more reliable (and more thoroughly tested) than a special-purpose analog switch. And so forth.

In each case it is necessary to make a specific reliability analysis, measured against a reasonable standard, and the staff gave no evidence of having done so for any case. Instead, it has adopted a general requirement for an analog backup for all cases, and we were not convinced by the justification provided.

We recommend that the staff revisit these issues, augment its own capabilities, and broaden its interaction with those elements of the outside world who have previously dealt with such problems. It would be unwise, however, to read too literally into the nuclear arena the considerations that are relevant to far more complex systems. We are dealing here with the relatively simple safety-centered parts of the computerized instrumentation and control system, and an architecture that exploits this fact may be more robust.

Sincerely,

David A. Ward
Chairman

References:

1. Memorandum dated June 25, 1992, from James M. Taylor, Executive Director for Operations, NRC, for The Commissioners, Subject: Review of the Draft Commission Paper, "Design Certification and Licensing Policy Issues Pertaining to Passive and Evolutionary Advanced Light Water Reactor Designs"

2. 57 Federal Register, 36680, August 14, 1992, Proposed Generic Communication; Analog-to-Digital Replacements Under the 10 CFR 50.59 Rule