ML20045B262

Expresses Gratitude for Sending Equations for Confidence Limits. Supports Setting Arbitrary Thresholds for Different Levels of Attention to Diesels
Person / Time
Issue date: 02/19/1991
From: Lewis H
Advisory Committee on Reactor Safeguards
To: Jerrica Johnson
NRC OFFICE OF NUCLEAR REGULATORY RESEARCH (RES)
Shared Package
ML20042D089 List:
References
FRN-57FR14514, REF-GTECI-B-56, REF-GTECI-EL, RULE-PR-50, TASK-B-56, TASK-OR AE06-1-060, AE6-1-60, NUDOCS 9306170086



UNITED STATES
NUCLEAR REGULATORY COMMISSION
ADVISORY COMMITTEE ON REACTOR SAFEGUARDS
WASHINGTON, D.C. 20555

February 19, 1991

MEMORANDUM FOR: Dr. James W. Johnson, RES/PRAB

FROM: Dr. H. Lewis, ACRS

SUBJECT: CONFIDENCE LIMITS

Thanks for sending along your equations for the confidence limits.

The only difference between us on the definitions is whether the observed number of failures should be included in the sum, and that depends on what you plan to do with them, since confidence limits are not uniquely defined anyway in the case of a discrete set.

(In fact, the theorem that determines the properties of confidence limits is not valid for a discrete system.)

The choice you made, to do the sums inclusively, is not unique, and leads to a conservative set of "limits" at the low end, and nonconservative at the high end, in the NRC sense.

It produces the maximum distance between the upper and lower "limits," making the "limits" span a larger range of parameters, but thereby throwing away information.

I chose inclusive at one end and exclusive at the other, to be conservative at both ends, in the sense of trying to underestimate the reliability, but neither of us is "right" in the usual sense of the word.

At the end that matters, the low reliability end, our definitions agree.

Since the definition fails for a discrete system (there are many papers in the literature about this), inclusive or exclusive is a matter of choice.

Anyway, that isn't the main issue.

However they are calculated, the "confidence limits" don't have any regulatory value.

Let me illustrate the point by using your definition, and a Poisson distribution with parameter w (to keep it as simple as possible---binomial is almost the same for the low failure rates we're talking about). Then, if n failures are observed in a sample for which the underlying expected number is w, the symmetric 90% confidence limits (calculated your way, so the probability of enclosing the correct value is OVER 90%) are

    n    lower    upper
    0    0.000    2.996
    1    0.051    4.744
    2    0.355    6.296
    3    0.817    7.754
    4    1.366    9.154

and so forth.

(Note that the lower "bound" here is in the failure rate, corresponding to an upper bound in the reliability. That is where our definitions differ.)
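The table above matches the standard exact Poisson construction with inclusive sums (via the chi-square quantile identity). As a sketch only, assuming Python with scipy, it can be reproduced like this:

```python
from scipy.stats import chi2

def poisson_limits_90(n):
    """Symmetric 90% 'inclusive' exact limits on a Poisson mean after
    observing n events: lower from chi2(2n), upper from chi2(2n + 2)."""
    lower = 0.0 if n == 0 else chi2.ppf(0.05, 2 * n) / 2
    upper = chi2.ppf(0.95, 2 * (n + 1)) / 2
    return lower, upper

# Reproduce the memo's table for n = 0..4.
for n in range(5):
    lo, up = poisson_limits_90(n)
    print(f"{n}  {lo:.3f}  {up:.3f}")
```

The function name is mine, not the memo's; the construction is the usual one for "inclusive at both ends," which is why the n = 0 row gives 0.000 to 2.996.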

If these were real confidence intervals, then WHATEVER the true value of w, the probability distribution of observed values of n would be such that the computed confidence limits would enclose the correct value EXACTLY 90% of the time. (Note that this does NOT mean that there is a 90% probability that the real value lies within the set of confidence limits derived from a particular measurement of n---that may be the most common misunderstanding in all of statistics.) The problem created by discreteness is that only a few possible values of w are represented in the "limits" table, so the table cannot cover the parameter space.

Now just do an example, to see the point. Suppose the real (unknown) value of w were 0.05; then it is easy to calculate that a measurement has a 95.1% chance of spanning the correct value, compared to a claim of 90%. (In effect, only a measurement of no failures will do it.) Now suppose that the real value is trivially different, equal to 0.06, and do the same calculation. Then the chance of enclosing it within the confidence limits increases to a mind-boggling 99.8%, because a measurement of one failure will also do the job. They are both much larger than the assumed 90%, because of your definition. This is of course an extreme case, but the same pattern applies for larger failure rates. Going from a true failure rate of 2.9 to 3.0 changes the probability of enclosing the true value from 97.1% to 91.7%, and so forth.

This erratic behavior is a direct consequence of the discreteness. All these numbers would be exactly 90% for a continuous case.
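The coverage figures quoted above (95.1%, 99.8%, 97.1%, 91.7%) can be checked with a short sketch: sum the Poisson probability of every n whose interval happens to enclose the true mean w. This assumes Python with scipy and the inclusive interval construction described earlier.

```python
from scipy.stats import chi2, poisson

def covers(n, w, conf=0.90):
    """True if the inclusive exact interval for n observed events encloses w."""
    a = (1 - conf) / 2
    lower = 0.0 if n == 0 else chi2.ppf(a, 2 * n) / 2
    upper = chi2.ppf(1 - a, 2 * (n + 1)) / 2
    return lower <= w <= upper

def coverage(w):
    """Probability that the computed 90% interval encloses the true mean w."""
    return sum(poisson.pmf(n, w) for n in range(200) if covers(n, w))

for w in (0.05, 0.06, 2.9, 3.0):
    print(f"w = {w}: coverage = {coverage(w):.1%}")
```

At w = 0.05 only n = 0 covers (the n = 1 lower limit, 0.051, just misses), giving e^-0.05, about 95.1%; at w = 0.06, n = 1 covers too, jumping to 99.8%---the jagged behavior the memo describes.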

Certainly anyone who tried to regulate with these temperamental numbers would be in trouble, because they don't mean what they say for the underlying reliability of the diesels.

On top of that, these ranges are far too large for any regulatory use.

The cases of interest in the diesel application are a few failures out of a sample of a few dozen tries, and that's not enough to conclude anything at all.

Take three failures from the table, and your confidence interval spans nearly a factor of ten in the underlying failure rate (0.817 to 7.754), which is pretty useless. (It doesn't appear that big in the table you gave to Commissioner Curtiss, because you calculated the reliability instead of the failure rate. Taking, for example, the case of 3/100 at 90% from your table, you show the bounds on the reliability as 0.924 and 0.992. This means expected failure rates of 0.076 and 0.008, which differ by nearly a factor of ten, and, when multiplied by 100, mean an expected number of failures between 0.8 and 7.6, close enough to the Poisson values given above.)

You can't regulate with factors of ten floating around.
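The conversion in that parenthesis is a few lines of arithmetic; the 0.924 and 0.992 reliability bounds are the ones quoted from the table, and the rest follows directly:

```python
# The 3-failures-in-100-demands case: 90% bounds on reliability,
# as quoted in the memo from the table given to Commissioner Curtiss.
trials = 100
rel_low, rel_high = 0.924, 0.992

# Reliability bounds flip into failure-rate bounds: 1 - reliability.
fail_high = 1 - rel_low   # ≈ 0.076 per demand
fail_low = 1 - rel_high   # ≈ 0.008 per demand

print(fail_high / fail_low)                    # ≈ 9.5, nearly a factor of ten
print(trials * fail_low, trials * fail_high)   # ≈ 0.8 to 7.6 expected failures
```

Stated on the reliability scale the interval looks tight (0.924 to 0.992); on the failure-rate scale, which is what matters, it spans almost an order of magnitude.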

My position is the same as it always has been, that there is nothing wrong with setting arbitrary thresholds for different levels of attention to the diesels, but that it is statistically incorrect to try to tie them to any underlying reliability number, as is required to comply with the blackout rule.

The Commission knows my position.

There is plenty of precedent for the use of arbitrary action thresholds. The most extreme case is aviation, where each single commercial accident triggers a flood of attention and rules, but nobody pretends that it has any computable significance with respect to the underlying safety of flying. NRC got itself into this pickle, presumably by getting bad statistical advice, or none at all, and needs some help in pulling itself out.

I have no quarrel with the way you did your calculation---you were no more arbitrary in the choice of limits for the sum than I was.

You chose the definition that gives the maximum spread, and I chose the one that was conservative, in the sense that it minimizes the estimated reliability at both ends.

We could argue about which is better, and undoubtedly agree to disagree, but that would be pointless, if only because the important end, the lower "limit" on reliability, is the one where we used the same definition. The problem is only that people who are statistically unsophisticated will read much more significance into the tables of confidence intervals than is warranted, and conceivably even depend on them to regulate the industry.

cc: C. J. Heltemes, RES
    L. R. Abramson, RES/PRAB
    W. T. Russell, NRR
    M. A. Cunningham, RES/PRAB
