ML19351E680

Some Selected Consensus Approaches, Presented at 1980 IEEE Intl Conference on Cybernetics & Society,801008-10
Issue date: 10/10/1980
From: Abramson L
NRC OFFICE OF MANAGEMENT AND PROGRAM ANALYSIS (MPA)
NUDOCS 8012100650



SOME SELECTED CONSENSUS APPROACHES

Lee R. Abramson
Applied Statistics Branch
U.S. Nuclear Regulatory Commission
Washington, DC 20555

Summary

A number of approaches to the problem of forming a consensus based on individual assessors' judgments have been proposed. This paper discusses and provides examples of several of these approaches, including the weighted average approach, the Bayesian approach and the performance criterion approach.

1. Introduction

Suppose that a group of n assessors each provides an assessment of a subjective probability distribution, denoted by $f_i$, $i = 1, \ldots, n$.

Winkler [11] discusses three approaches to the problem of forming a consensus assessment f which, in some sense, best represents the group assessment. These are the weighted average approach, the controlled feedback approach and the Bayesian approach.

In the weighted average approach, the consensus f is usually expressed as a linear combination of the $f_i$'s, although nonlinear combinations are sometimes used (e.g., median, geometric mean, harmonic mean).

The weights are either equal or proportional to some ranking of the assessors which reflects their expertise, e.g., self-ratings. Some studies have shown that equal weights work about as well as other methods of assigning weights. The approach can be a one-step procedure or can be iterative. If iterative, the weights can be constant or can change at each step.
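As a small illustration of the weighted average approach, the Python sketch below pools several assessors' probabilities for a single event using equal weights, unequal (e.g., self-rating) weights, and the nonlinear alternatives mentioned above. The helper name `pool` and all numbers are hypothetical, and because only one event is pooled, no renormalization across events is attempted.

```python
from statistics import geometric_mean, harmonic_mean, median

def pool(probs, weights=None, method="linear"):
    """Combine several assessors' probabilities for ONE event (hypothetical helper)."""
    if method == "linear":
        if weights is None:
            weights = [1.0 / len(probs)] * len(probs)   # equal weights
        if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights must be non-negative and sum to 1")
        return sum(w * p for w, p in zip(weights, probs))
    if method == "median":
        return median(probs)
    if method == "geometric":
        return geometric_mean(probs)
    if method == "harmonic":
        return harmonic_mean(probs)
    raise ValueError(f"unknown method: {method}")

assessments = [0.10, 0.20, 0.60]          # three assessors, one event
for m in ("linear", "median", "geometric", "harmonic"):
    print(m, round(pool(assessments, method=m), 4))
# Unequal weights, e.g. proportional to self-ratings of expertise.
print("weighted", round(pool(assessments, weights=[0.5, 0.3, 0.2]), 4))
```

The nonlinear pools generally give different answers from the equally weighted average, which is one reason the choice of combination rule matters.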

In the controlled feedback approach, the consensus f is developed by several rounds of feedback and reassessment. The reassessment can be either group (face-to-face) or individual (Delphi). While this approach is widely used, the resultant consensus may be overly influenced by group dynamics. Experimentation by psychologists has shown that when group interaction involves open discussion, group positions tend toward uniformity and established norms. This can be induced by the influence of discussion leaders as well as by the desire to reach agreement. Even with anonymity, feedback can induce pressure towards a consensus.

In the Bayesian approach, the $f_i$'s are viewed as sample information which is then combined with the decision-maker's prior through Bayes' theorem. While this approach has the virtue of directly involving the decision-maker, the results may be difficult to interpret. Furthermore, the $f_i$'s are often chosen for mathematical convenience rather than as an unconstrained expression of the assessors' judgments.

In [3], Dalkey proves that no group decision rule exists which is consistent with all of the postulates of probability theory. For example, the average of a set of probabilities fulfills the requirement that probabilities of exclusive events add; however, it does not fulfill the requirement that the probability of the conjunction of two independent events is the product of their probabilities.

The converse is true for the product (or the geometric mean) as an aggregation rule; it does not sum to one for exclusive and exhaustive events, but it is multiplicative for conjunctions.
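To make Dalkey's dilemma concrete, the sketch below uses two hypothetical assessors who each regard events A and B as independent. Averaging keeps P(A) and P(not A) summing to one, but the averaged probability of the conjunction differs from the product of the averaged marginals; the (unnormalized) geometric mean shows the opposite behavior. All numbers are illustrative assumptions.

```python
from math import isclose, prod

# Two hypothetical assessors; each treats A and B as independent.
p_a = [0.2, 0.8]   # each assessor's P(A)
p_b = [0.5, 0.9]   # each assessor's P(B)

def mean(xs):
    return sum(xs) / len(xs)

def geo(xs):
    return prod(xs) ** (1 / len(xs))

# Each assessor's P(A and B) is the product of that assessor's own marginals.
p_ab = [a * b for a, b in zip(p_a, p_b)]

# Averaging: additivity over exclusive, exhaustive events is preserved ...
avg_sum = mean(p_a) + mean([1 - a for a in p_a])
# ... but the pooled conjunction is not the product of the pooled marginals.
avg_conj, avg_prod = mean(p_ab), mean(p_a) * mean(p_b)

# Geometric mean: multiplicative for the conjunction ...
geo_conj, geo_prod = geo(p_ab), geo(p_a) * geo(p_b)
# ... but P(A) and P(not A) no longer sum to one.
geo_sum = geo(p_a) + geo([1 - a for a in p_a])

print(f"mean:      sum={avg_sum:.3f}  P(AB)={avg_conj:.3f}  P(A)P(B)={avg_prod:.3f}")
print(f"geometric: sum={geo_sum:.3f}  P(AB)={geo_conj:.3f}  P(A)P(B)={geo_prod:.3f}")
```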



As a solution to this dilemma, Dalkey espouses what he terms the Emerson Principle*: Performance is at least as important a criterion for aggregation as consistency ([4], page 10). The Emerson Principle is a rationale for the use of performance criteria, i.e., probabilistic scoring rules which reward the assessors depending on the accuracy of their assessments.

This paper discusses some selected examples of the consensus approaches introduced above. No attempt at completeness is made; the purpose is to illustrate some of the ways in which the consensus problem has been approached.

2. Weighted Averages

An Axiomatic Approach

In [1], Abramson uses an axiomatic approach to the problem of combining subjective probability distributions into a group consensus. A small number of plausible properties which a consensus distribution should satisfy are specified, and it is then shown that there is only one function of the individual distributions which satisfies these properties.

Assume that each of a group of n assessors provides a subjective probability distribution on a common set of m mutually exclusive and exhaustive events, as indicated in the table below.

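$$
\begin{array}{l|cccccc}
\text{Assessor} & E_1 & E_2 & \cdots & E_j & \cdots & E_m \\ \hline
1 & p_{11} & p_{12} & \cdots & p_{1j} & \cdots & p_{1m} \\
2 & p_{21} & p_{22} & \cdots & p_{2j} & \cdots & p_{2m} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
i & p_{i1} & p_{i2} & \cdots & p_{ij} & \cdots & p_{im} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
n & p_{n1} & p_{n2} & \cdots & p_{nj} & \cdots & p_{nm} \\ \hline
\text{Consensus} & C_1 & C_2 & \cdots & C_j & \cdots & C_m
\end{array}
$$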

Here $p_{ij}$ is the probability assigned by assessor i to event $E_j$ and $C_j$ is the consensus probability for $E_j$. For the case of equally weighted assessors, it is assumed that the consensus distribution (if it exists) must satisfy the following three properties:

(1) The consensus probabilities sum to 1.

(2) The consensus probability for any event depends only on the set of probabilities for that event, and not on the probabilities for the other events or on which assessor assigns which probability.

(3) If all assessors agree on the probability of an event, then the consensus probability is their common probability.

* "A foolish consistency is the hobgoblin of little minds..." - Ralph Waldo Emerson

It is then proven that the only consensus distribution which satisfies properties (1), (2) and (3) is the average of the assessors' subjective probability distributions, i.e.,

$$C_j = \frac{1}{n} \sum_{i=1}^{n} p_{ij}, \quad j = 1, \ldots, m.$$

These results are generalized to the case where the assessors have arbitrary known weights. Let $w_i$ = weight of assessor i, where $w_i \ge 0$ and $w_1 + \cdots + w_n = 1$.

Properties (1) and (3) remain unchanged and property (2) is generalized to allow the consensus probability for an event to depend on both the probabilities and weights for that event. Two additional properties are assumed:

(4) If two assessors assign the same probability to an event, they can be replaced by a single assessor with weight equal to the sum of their weights.

(5) The consensus probability for any event is a continuous function of the assessors' weights.

It is then proven that the only consensus distribution which satisfies properties (1)-(5) is the weighted average of the assessors' subjective probability distributions, i.e.,

$$C_j = \sum_{i=1}^{n} w_i p_{ij}, \quad j = 1, \ldots, m.$$

Dalkey ([4], p. 228) proves essentially the same result with a very similar approach. One difference between the approaches is that Abramson assumed that the assessors are assigned weights independently of their subjective assessments and that the significance of these weights is expressed by property (4), while in Dalkey's derivation the weights are implied by the consensus distribution. (The implied weight for each assessor is the consensus probability of an event to which that assessor assigns probability one and to which all other assessors assign probability zero.)
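The weighted-average consensus can be checked numerically. The sketch below, with hypothetical weights and assessments, computes $C_j = \sum_i w_i p_{ij}$ for an n × m table, verifies properties (1) and (3), and recovers each assessor's implied weight in Dalkey's sense by pooling a two-event table in which that assessor alone puts probability one on the first event. The function name `linear_pool` is illustrative and does not come from [1] or [4].

```python
def linear_pool(table, weights):
    """Consensus C_j = sum_i w_i * p_ij for an n x m table of distributions."""
    n, m = len(table), len(table[0])
    if len(weights) != n or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per assessor")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return [sum(weights[i] * table[i][j] for i in range(n)) for j in range(m)]

# Hypothetical assessments of m = 3 events by n = 3 assessors (rows sum to 1).
table = [[0.2, 0.5, 0.3],
         [0.1, 0.5, 0.4],
         [0.4, 0.5, 0.1]]
weights = [0.5, 0.3, 0.2]
consensus = linear_pool(table, weights)
print([round(c, 3) for c in consensus])          # [0.21, 0.5, 0.29]

# Property (1): the consensus probabilities sum to 1.
assert abs(sum(consensus) - 1.0) < 1e-12
# Property (3): all assessors give event 2 probability 0.5, so does the consensus.
assert abs(consensus[1] - 0.5) < 1e-12

# Dalkey's implied weight of assessor i: the consensus probability of an event
# that assessor i alone gives probability one (everyone else gives it zero).
for i in range(len(table)):
    indicator = [[1.0, 0.0] if r == i else [0.0, 1.0] for r in range(len(table))]
    print(i, linear_pool(indicator, weights)[0])  # recovers weights[i]
```

For the weighted average the implied weights coincide with the assigned weights, which is why the two derivations lead to the same rule.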

Iterative Weighting

Constant weights. DeGroot considers the following problem. Let

$\theta$ = parameter to be estimated (may be a vector)
$F_i$ = subjective probability distribution assigned by assessor i to parameter $\theta$ (i = 1, ..., k)
$p_{ij}$ = weight that assessor i assigns to the distribution of assessor j, where $p_{ij} \ge 0$ and $\sum_{j=1}^{k} p_{ij} = 1$ (i, j = 1, ..., k).

The $F_i$ are revised by each assessor using the weights $p_{ij}$. Thus, the first revision of $F_i$ by assessor i is

$$F_i^{(1)} = \sum_{j=1}^{k} p_{ij} F_j, \quad i = 1, \ldots, k.$$

In matrix notation, with $P = (p_{ij})$ and $F = (F_1, \ldots, F_k)^T$, $F^{(1)} = PF$ and, after further revisions,

$$F^{(n)} = P F^{(n-1)} = P^n F, \quad n = 2, 3, \ldots$$

By definition, a consensus is reached if the revised subjective distributions all approach some limiting distribution $F^*$.

A necessary and sufficient condition for a consensus to be reached is that there exists a vector $\pi = (\pi_1, \ldots, \pi_k)$ such that

$$\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j \quad \text{for } i, j = 1, \ldots, k,$$

where $p_{ij}^{(n)}$ denotes the (i, j) element of $P^n$. In that case,

$$F^* = \sum_{i=1}^{k} \pi_i F_i \quad \text{and} \quad \pi P = \pi.$$

A sufficient condition for a consensus to be reached is that for some n, every element in at least one column of the matrix $P^n$ is positive. In other words, for some iteration, there is at least one assessor to whom all of the other assessors give positive weight.
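The DeGroot iteration is easy to sketch when each $F_i$ is represented as a probability vector over a common discrete grid of values of $\theta$. The sketch below assumes such a discretization and a hypothetical 3 × 3 weight matrix P with all entries positive (so the sufficient condition holds). It repeatedly applies $F^{(n)} = P F^{(n-1)}$ and compares the limit with $\sum_i \pi_i F_i$, where $\pi$ is obtained here by power iteration on $\pi P = \pi$, one of several possible methods. Only the standard library is used.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical weights p_ij (rows sum to 1) and distributions F_i on a 4-point grid.
P = [[0.6, 0.3, 0.1],
     [0.2, 0.7, 0.1],
     [0.3, 0.3, 0.4]]
F = [[0.70, 0.20, 0.10, 0.00],
     [0.10, 0.40, 0.40, 0.10],
     [0.00, 0.10, 0.30, 0.60]]

# Iterate F^(n) = P F^(n-1); every row approaches the same limit F*.
Fn = F
for _ in range(200):
    Fn = mat_mul(P, Fn)

# Stationary row vector pi with pi P = pi, found by power iteration.
pi = [[1 / 3, 1 / 3, 1 / 3]]
for _ in range(200):
    pi = mat_mul(pi, P)
F_star = mat_mul(pi, F)[0]   # F* = sum_i pi_i F_i

print("pi =", [round(x, 4) for x in pi[0]])
print("F* =", [round(x, 4) for x in F_star])
for row in Fn:
    assert all(abs(x - y) < 1e-9 for x, y in zip(row, F_star))
```

For this P, $\pi = (5/14, 1/2, 1/7)$: assessor 2, who receives the most weight from the others, has the largest influence on the consensus.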

In a validation experiment conducted by Moskowitz and Bajgier [8], subjects, participating either as a member of a panel discussion or of a Delphi group, made iterative subjective probability distribution (SPD) assessments using the fractile method on various unknown quantities. Examples included the percentage of Purdue students on academic probation and the number of miles driven per automobile accident fatality. The DeGroot model with constant weights did not appear to predict or describe the panel discussion or Delphi group consensus process. Opinion weights were not stable and appeared to vary inversely with the dispersion of a group member's SPD. These weights, however, tended to stabilize after several iterations. Models in which the weights were inversely proportional to the variance or the .01 to .99 fractile range of each group member's SPD gave considerably better predictions than did the DeGroot model.
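A minimal sketch of the better-performing alternative, weights inversely proportional to each member's SPD variance, follows. The medians and variances are hypothetical, and pooling the members' medians with these weights is an assumption made for illustration rather than the exact model fitted in [8]. The same normalization could be applied to the .01 to .99 fractile ranges instead of the variances.

```python
def inverse_variance_weights(variances):
    """Weights proportional to 1/variance, normalized to sum to 1."""
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]

# Hypothetical SPD summaries for three group members.
medians = [12.0, 15.0, 20.0]
variances = [4.0, 1.0, 16.0]     # the tightest SPD receives the most weight

w = inverse_variance_weights(variances)
pooled = sum(wi * mi for wi, mi in zip(w, medians))   # assumed pooling rule
print([round(x, 3) for x in w], round(pooled, 3))
```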

Variable weights. Chatterjee and Seneta [2] generalized the DeGroot model to the case of variable weights. Let $p_{ij}(n)$ = weight assigned by individual i to the distribution of individual j after n iterations. Then a sufficient condition for a consensus to be reached is that

$$\sum_{n=1}^{\infty} \max_j \left( \min_i p_{ij}(n) \right) = \infty. \qquad (1)$$

Three examples where consensus is reached are as follows.

(a) There are an infinite number of occasions when there is at least one assessor to whose opinion everyone attaches a weight of at least $\delta > 0$ (this generalizes the constant-weight criterion).


(b) Open-minded reassessors. As the iterations proceed, information is exchanged and the assessors' initial specialized information tends to become group knowledge, i.e., the assessors tend to give equal weight to all opinions. Then $p_{ij}(n) \to 1/k$, Eq. (1) is satisfied and consensus will be reached.

(c) Slow hardening of positions. Suppose that the information exchange process causes the assessors to put more weight on their own opinions and less on those of others, with a tendency in the limit to put all the weight on their own opinions. If the hardening of positions is sufficiently slow, even this situation can lead to a consensus. For example, suppose that $p_{ii}(n) = 1 - \frac{1}{2n}$ and $p_{ij}(n) = \frac{1}{2(k-1)n}$ for $j \ne i$. Since $\sum_n \frac{1}{2(k-1)n}$ diverges, Eq. (1) is satisfied and consensus will be reached.
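Example (c) can be checked by simulation. The sketch below assumes k = 3 assessors and weights of the form $p_{ii}(n) = 1 - 1/(2n)$, $p_{ij}(n) = 1/(2(k-1)n)$ for $j \ne i$. It tracks one summary per assessor (the mean of $\theta$), which the linear revision updates exactly as it updates the full distribution. The starting values are hypothetical. Because the off-diagonal weights decay only like 1/n, the spread between assessors shrinks toward zero, though slowly.

```python
def revise(values, n):
    """One step F^(n) = P(n) F^(n-1) under the assumed slowly hardening weights."""
    k = len(values)
    own = 1 - 1 / (2 * n)             # p_ii(n): weight on one's own opinion
    other = 1 / (2 * (k - 1) * n)     # p_ij(n), j != i; each row still sums to 1
    total = sum(values)
    return [own * v + other * (total - v) for v in values]

# Hypothetical initial means of theta for k = 3 assessors; the revision is
# linear, so each mean is updated exactly as the full distribution would be.
values = [10.0, 20.0, 40.0]
spread0 = max(values) - min(values)
for n in range(1, 10_001):
    values = revise(values, n)
# Here max_j min_i p_ij(n) = 1/(2(k-1)n); its sum over n diverges, so Eq. (1) holds.
spread = max(values) - min(values)
print(spread0, round(spread, 5), [round(v, 3) for v in values])
```

After 10,000 revisions the spread is well under 1% of its starting value. Because these particular weight matrices are symmetric, the common limit is simply the initial average, 70/3.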

3. Qualitative Controlled Feedback

A controlled feedback procedure can be characterized as follows:

(i) Each member of a group of respondents independently answers a battery of related questions. Sometimes, reasons for their answers are also solicited.

(ii) Summary information is presented to each group member, and step (i) is repeated.

(iii) The questioning and feedback process is repeated until it stabilizes (little change from round to round). The stabilization can be either in the form of a group consensus or of judgment nuclei, i.e., a hung jury.

The commonly used Delphi procedure is a quantitative controlled feedback procedure, whereby the summary information in step (ii) is in the form of group medians, quantiles and the like. In [9], Press presents a qualitative controlled feedback procedure whereby panelists supply answers and justifying reasons as in step (i), but only a composite of the reasons is fed back in step (ii).

If one question only is asked, Press proposes the following model:

$$z_{i1} = \beta_0 + \sum_{k=1}^{r} \beta_k x_{ik} + u_{i1},$$

where

$z_{i1}$ = first-stage response of respondent i
$x_{ik}$ = "cue" variables (demographic and attitudinal characteristics of the respondent as well as variables related to the question)
$\beta_k$ = unknown regression coefficients
$u_{i1}$ = random disturbance with zero mean and constant variance.

For stages $n \ge 2$,

$$z_{in} = z_{i,n-1} + \sum_{j=1}^{R_{n-1}} \alpha_j \left[1 - \delta_{ij}^{(n-1)}\right] p_j^{(n-1)} + u_{in},$$

where

$z_{in}$ = n-stage response of respondent i
$R_n$ = total number of reasons presented at stage n
$\alpha_j$ = unknown weight coefficients
$\delta_{ij}^{(n)}$ = 1 if respondent i gives reason j at stage n, and 0 otherwise
$p_j^{(n)}$ = probability of response j at stage n
$u_{in}$ = random disturbance with zero mean and variance $\sigma^2$ (see Remark 3 for the assumed error structure).
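For a single question, the expected stage-n responses under this model can be computed once the weight coefficients are known or estimated. The sketch below is a hypothetical illustration. It assumes the form $z_{in} = z_{i,n-1} + \sum_j \alpha_j [1 - \delta_{ij}^{(n-1)}] p_j^{(n-1)}$ with the disturbance set to its zero mean, and it treats $p_j$ as the share of panelists giving reason j. It uses made-up values throughout and reports the group mean, which Remark 5 below proposes as the estimate of the group judgment.

```python
def expected_next_response(z_prev, gave_reason, p, alpha):
    """E[z_in] = z_{i,n-1} + sum_j alpha_j * (1 - delta_ij) * p_j  (assumed form)."""
    shift = sum(a * (1 - d) * pj for a, d, pj in zip(alpha, gave_reason, p))
    return z_prev + shift

# Hypothetical stage-(n-1) data for 3 panelists and R = 3 reasons.
z_prev = [40.0, 55.0, 70.0]            # responses at stage n-1
delta = [[1, 0, 0],                    # delta_ij: 1 if panelist i gave reason j
         [1, 1, 0],
         [0, 0, 1]]
p = [0.67, 0.33, 0.33]                 # assumed: share of panelists giving reason j
alpha = [5.0, -2.0, 10.0]              # assumed "importance" weights alpha_j

z_next = [expected_next_response(z, d, p, alpha) for z, d in zip(z_prev, delta)]
group_mean = sum(z_next) / len(z_next)
print([round(z, 2) for z in z_next], round(group_mean, 2))
```

Each panelist moves only in response to reasons that panelist did not give, which is the interpretation stated in Remark 2 below.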

Remarks

1. The first-stage model is a conventional multiple regression model which assumes independent responses, but the model for the subsequent stages is an autoregressive model which accounts for the dependencies induced by the feedback process.

2. The model for $n \ge 2$ can be interpreted as saying that the change in panelist i's response from stage (n-1) to stage n is proportional to the "importance" of the reasons given by the panel in stage (n-1) which panelist i did not give. (The $p_j$ are not fed back to the panelists.)

3. The assumed error structure expresses the requirement that the same information is fed back to each panelist on each round and assumes the panel is homogeneous.

4. The model can be used to predict responses on a given round from responses on earlier rounds. This capability could be used in situations where it is inconvenient, costly or impossible to carry out the next round of questioning.

5. The expected group mean response after n rounds of qualitative controlled feedback is proposed as an estimate of the group judgment.

6. The model can be extended to the case where there is quantitative feedback (e.g., the mean response), either with or without qualitative feedback.

7. In [10], Press generalizes the model to the multivariate case where there are many related questions of simultaneous interest. The generalization consists of the assumption of an arbitrary covariance matrix for any individual's responses. There is no other assumed interaction among the different questions. Panelists are still assumed to respond independently.

8. Many questions remain to be addressed by empirical research.

(a) Should an analyst edit the panelists' reasons which appear to be duplicates or paraphrases of other reasons, or should the analyst leave them untouched because of the semantic issues which might arise?

(b) Should panelists generate all of the reasons themselves or should a list be provided?

(c) Should panelists be questioned by mail, telephone, personal interview, or by on-line computer?

(d) Should models account for round-to-round changes in responses on a relative or absolute basis?

(e) Does group polarization disappear under qualitative controlled feedback?

(f) Most important, how well do the models predict?

4. Bayesian Calibration

In [7], Morris proposes a model whereby a decision maker's prior state of information is modified by expert opinion in a Bayesian framework to produce the decision maker's posterior. The key to the model is the decision maker's subjective calibration of the expert(s). For a single expert, the model takes the following form:

$$\pi(x \mid f, d) = k\, C(x)\, f(x)\, \pi(x \mid d) = k\, f_c(x)\, \pi(x \mid d),$$

where

d = decision maker's prior state of information
$\pi(x \mid d)$ = decision maker's prior
$\pi(x \mid f, d)$ = decision maker's posterior
f = f(x) = expert's prior
C(x) = Calibration Function (decision maker's subjective calibration of the expert)
$f_c(x) \equiv C(x) f(x)$ = expert's subjectively calibrated prior
k = normalizing constant.
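A grid-based sketch makes the single-expert formula concrete. It assumes a normal decision-maker prior, a normal expert prior, and, purely for illustration, a calibration function chosen so that $C(x) f(x)$ is a wider normal density (the decision maker trusts the expert's location but judges the expert overconfident). None of these choices or names comes from [7]. The posterior $k\,C(x) f(x)\,\pi(x \mid d)$ is normalized numerically over the grid.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

dx = 0.01
grid = [i * dx for i in range(-1000, 3001)]        # x from -10 to 30

prior = [normal_pdf(x, 0.0, 4.0) for x in grid]    # pi(x | d), decision maker
expert = [normal_pdf(x, 10.0, 2.0) for x in grid]  # f(x), expert's prior
# Assumed C(x): the decision maker trusts the expert's location but judges the
# expert overconfident, so C(x) f(x) is a wider normal density, N(10, 3^2).
calib = [normal_pdf(x, 10.0, 3.0) / normal_pdf(x, 10.0, 2.0) for x in grid]

calibrated = [c * f for c, f in zip(calib, expert)]   # f_c(x) = C(x) f(x)
unnorm = [fc * p for fc, p in zip(calibrated, prior)]
k = 1.0 / (sum(unnorm) * dx)                          # normalizing constant
posterior = [k * u for u in unnorm]                   # pi(x | f, d)

post_mean = sum(x * q for x, q in zip(grid, posterior)) * dx
print(round(post_mean, 3))   # analytically 6.4 for these normal choices
```

With normal densities throughout, the posterior is normal with a precision-weighted mean, which gives an easy check on the numerical result.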

For several experts, the model becomes

$$\pi(x \mid \mathbf{f}, d) = k\, C(x)\, f_1(x) \cdots f_n(x)\, \pi(x \mid d) = k\, f_s(x)\, \pi(x \mid d),$$

where

$\mathbf{f} = (f_1, \ldots, f_n)$ = set of expert priors
C(x) = Joint Calibration Function
$f_s(x) = C(x) f_1(x) \cdots f_n(x)$ = surrogate prior.
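The multi-expert formula can be sketched the same way on a grid. The sketch below builds the surrogate prior under the strong (and, as Remark 3 below notes, rare) assumption that the experts are independent, so that the Joint Calibration Function factors into individual ones and the surrogate prior is a product of individually calibrated priors $C_i(x) f_i(x)$. Each calibrated prior is taken to be a hypothetical normal density, so the numerical posterior mean can be checked against the usual precision-weighted formula.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

dx = 0.01
grid = [-20 + i * dx for i in range(6001)]     # x from -20 to 40

# Hypothetical calibrated priors C_i(x) f_i(x) for three independent experts.
calibrated = [(8.0, 2.0), (12.0, 3.0), (10.0, 4.0)]   # (mean, sd) of each
decision_prior = (0.0, 10.0)                          # pi(x | d)

def surrogate(x):
    """f_s(x) = prod_i C_i(x) f_i(x), i.e. C(x) = prod_i C_i(x) (assumed independence)."""
    value = 1.0
    for mean, sd in calibrated:
        value *= normal_pdf(x, mean, sd)
    return value

unnorm = [surrogate(x) * normal_pdf(x, *decision_prior) for x in grid]
k = 1.0 / (sum(unnorm) * dx)
posterior = [k * u for u in unnorm]
post_mean = sum(x * q for x, q in zip(grid, posterior)) * dx

# Check: a product of normal densities is normal with a precision-weighted mean.
precisions = [1 / sd**2 for _, sd in calibrated + [decision_prior]]
means = [m for m, _ in calibrated + [decision_prior]]
exact = sum(w * m for w, m in zip(precisions, means)) / sum(precisions)
print(round(post_mean, 4), round(exact, 4))
```

The combination is multiplicative, consistent with the continuous-variable case described in Remark 4 below.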


Remarks

1. In principle, in the single-expert case, the Calibration Function can be "measured" by "obtaining a frequency distribution of performance measures on a large set of variables (including the variable of interest), over which the expert's assessment performance is indistinguishable" ([7], p. 14).

2. "In the dependent, multi-expert case, measurement [of the Calibration Function] becomes much more difficult. A set of variables must be found, over which, in rough terms, all experts share the same degree of dependence" ([7], p. 14). Accordingly, the consensus problem of combining the experts' priors has been replaced by the decision maker's specification of a Joint Calibration Function.

3. If the experts are "independent", then $C(x) = \prod_{i=1}^{n} C_i(x)$. "Of course, situations where the experts are independent are rare. The experts need not associate with each other to be dependent in the probabilistic case" ([7], p. 11).

4. "The model may also be extended to the situation in which the experts provide event probabilities as opposed to probability densities on continuous variables. ... In the event case, the expert probabilities are combined using a normalized additive rule, as contrasted to the continuous variable case presented here in which the probabilities are combined using a normalized multiplicative formula" ([7], p. 15).

5. These results hold under rather general assumptions. In the paper, the results are proved for normal priors, but Morris asserts that they hold for many other priors. Two further assumptions are also made:

"Invariance to Scale: The variance of the expert's prior alone provides no information about the uncertain quantity. In other words, the expert's stated confidence in his own prediction ability gives no information independent of his actual prediction. For example, if the only information we have from an expert is a statement that he feels quite knowledgeable about the height of the Eiffel Tower, we have no reason to change our own beliefs about the height unless he further provides his actual assessment."

"Invariance to Shift: The assessment of the location (i.e., the mean) of the expert's prior is directly related to the revealed value of the uncertain quantity. If the revealed value is shifted by some amount, the assessment of the location of the expert's prior must shift by that amount."

Both of these assumptions can be relaxed without affecting the form of the result. "If the uncertain quantity depends upon the variance of the expert's prior alone, then the assessment of this dependence adds another multiplicative term to the likelihood function" ([7], p. 15). If the invariance to shift assumption is relaxed, the Calibration Function has the same form but is more difficult to measure.

5. Probabilistic Scoring Rules

As performance criteria, probabilistic scoring rules are defined as follows (e.g., cf. [4]):

$E = (E_j)$ = a set of mutually exclusive and exhaustive events for which probabilities are desired.
$R = (R_j)$ = the probabilities which the estimator reports.
$P = (P_j)$ = the (unknown) objective probabilities.
$S(R, j)$ = a reward function (scoring rule) which, after the fact, pays the estimator an amount S, depending on the report R and the event j which occurs.

It is crucial that the estimator be motivated to report his assessment accurately. In other words, the scoring rule should not reward the estimator for deliberately distorting his assessment. This requirement can be met by using only scoring rules whose expected value is a maximum when R = P. Such rules are called proper scores and satisfy

$$\sum_j P_j S(R, j) \le \sum_j P_j S(P, j) \quad \text{for all } R.$$

Some examples of proper scores are as follows:

1. Logarithmic Score.

$$S(R, j) = \log R_j$$

The logarithmic score has a number of unique properties.

(a) It is the only rule which depends solely on the probability reported for the event that occurs.

(b) It is the only rule which is additive over successive estimates.

(c) It is the only rule which is invariant over logically equivalent estimates (e.g., estimates expressed in terms of conditional probabilities, disjunctive combinations, and the like).

2. Quadratic Score.

$$S(R, j) = 2R_j - \sum_k R_k^2$$

The quadratic score is the only one where the difference between the expected score of a perfect forecaster (i.e., one that announces P) and that of one that announces R is a function solely of R - P.

3. "Scientific" Score.

$$S(R, j) = \begin{cases} 1 & \text{if } R_j = \max_k R_k \\ 0 & \text{otherwise} \end{cases}$$

This score can be interpreted as the usual score in an objective test (1 for each correct answer and 0 for each incorrect answer) in which the test-taker checks the answer that he thinks is most likely to be correct.
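The three example scores can be written down directly. The Python sketch below, with hypothetical probabilities, evaluates the expected score $\sum_j P_j S(R, j)$ and checks by random search that no report beats the honest report R = P. For the quadratic score it also confirms that the shortfall of a report R equals $\sum_j (R_j - P_j)^2$, a function of R - P only. The random search is an illustration, not a proof. The scientific score is proper only in the weak sense that R = P ties with any report that picks out the same most likely event.

```python
import random
from math import log

def log_score(R, j):
    return log(R[j])

def quadratic_score(R, j):
    return 2 * R[j] - sum(r * r for r in R)

def scientific_score(R, j):
    return 1.0 if R[j] == max(R) else 0.0

def expected_score(score, R, P):
    """Expected reward sum_j P_j S(R, j) when the objective probabilities are P."""
    return sum(pj * score(R, j) for j, pj in enumerate(P))

def random_report(m, rng):
    raw = [rng.random() + 1e-6 for _ in range(m)]   # strictly positive entries
    total = sum(raw)
    return [x / total for x in raw]

rng = random.Random(0)
P = [0.5, 0.3, 0.2]        # hypothetical objective probabilities
for score in (log_score, quadratic_score, scientific_score):
    honest = expected_score(score, P, P)
    best_other = max(expected_score(score, random_report(3, rng), P) for _ in range(1000))
    print(score.__name__, round(honest, 4), round(best_other, 4))
    assert honest >= best_other - 1e-12   # no random report beats R = P

# Quadratic score: the shortfall equals the squared distance sum_j (R_j - P_j)^2.
R = [0.6, 0.3, 0.1]
gap = expected_score(quadratic_score, P, P) - expected_score(quadratic_score, R, P)
print(round(gap, 4), round(sum((r - p) ** 2 for r, p in zip(R, P)), 4))
```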

Some properties of a proper score are as follows:

1. A proper score is operational, i.e., it can be assigned on the basis of a single instance.

2. A proper score rewards the forecaster for accuracy, i.e., the expected score increases as the report R gets closer to the actual probability.

3. A proper score rewards a forecaster for honesty. If the forecaster believes Q and reports R, then his subjective expectation is a maximum when R = Q.

4. A proper score rewards the estimator for increasing his information concerning the events before formulating his report.

If there are several estimators, then an "n-heads" rule should be used. An n-heads rule is a scoring rule such that a group of estimators performs better than the individual members of the group.

If the group estimate is the average, then an n-heads rule requires that the expected score of the group be greater than or equal to the average expected score of the individual estimators:

$$\sum_j P_j S(\bar{R}, j) \ge \frac{1}{n} \sum_{k=1}^{n} \sum_j P_j S(R_k, j),$$

where k = 1, ..., n is the index for individual estimators, $R_k = (R_{kj})$ = set of probabilities reported by estimator k, and $\bar{R} = \frac{1}{n} \sum_{k=1}^{n} R_k$. A necessary and sufficient condition for S(R, j) to be an n-heads rule is that S(R, j) be concave in R.

Examples are the logarithmic score and the quadratic score.

An improved n-heads rule can be derived if the method of aggregation is tailored to the form of the scoring rule. For example, the geometric mean "fits" the logarithmic score better than the mean.

The expected group score using the geometric mean is equal to the average expected individual score plus a term which is an increasing function of the dispersion of the individual estimates but is independent of the objective probabilities P.

If the quadratic score is used and the mean is the aggregation function, the group advantage is the sum of the variances of the individual reports.
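The two group advantages just described can be verified numerically. The sketch below uses hypothetical reports from three estimators and hypothetical objective probabilities P. For the quadratic score with the mean, the group's expected score exceeds the average individual expected score by the sum over events of the variances of the reports (computed with divisor n). For the logarithmic score with the geometric mean, the advantage is $-\log Z$, where Z is the sum over events of the unnormalized geometric means. Renormalizing the geometric mean to sum to one is an assumption of this sketch; it is what makes the advantage depend only on the reports and not on P.

```python
from math import exp, log

def expected(score, R, P):
    return sum(pj * score(R, j) for j, pj in enumerate(P))

def quadratic(R, j):
    return 2 * R[j] - sum(r * r for r in R)

def logarithmic(R, j):
    return log(R[j])

P = [0.5, 0.3, 0.2]                       # hypothetical objective probabilities
reports = [[0.6, 0.3, 0.1],               # hypothetical individual reports R_k
           [0.3, 0.4, 0.3],
           [0.5, 0.2, 0.3]]
n, m = len(reports), len(P)

# Quadratic score with the mean as aggregation function.
mean_report = [sum(r[j] for r in reports) / n for j in range(m)]
avg_indiv = sum(expected(quadratic, r, P) for r in reports) / n
advantage_q = expected(quadratic, mean_report, P) - avg_indiv
variances = [sum((r[j] - mean_report[j]) ** 2 for r in reports) / n for j in range(m)]
print(round(advantage_q, 6), round(sum(variances), 6))      # equal

# Logarithmic score with the renormalized geometric mean as aggregation function.
log_means = [sum(log(r[j]) for r in reports) / n for j in range(m)]
Z = sum(exp(v) for v in log_means)        # Z <= 1 by the AM-GM inequality
geo_report = [exp(v) / Z for v in log_means]
avg_indiv_log = sum(expected(logarithmic, r, P) for r in reports) / n
advantage_log = expected(logarithmic, geo_report, P) - avg_indiv_log
print(round(advantage_log, 6), round(-log(Z), 6))          # equal, independent of P
```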

In general, individual assessments tend to be correlated because they are often based on the same background experience. This is, perhaps, especially true for groups of experts. In [5], Dalkey considers the problem of aggregating expert assessments without knowing anything about the dependency structure among the experts.

Dalkey models the experts as inquiry systems and shows that, if a proper scoring rule is used, it is possible to aggregate the individual assessments so that the group assessment is better than any individual assessment.


References

1. Abramson, Lee R. (1978). Forming a consensus from subjective probability distributions. ORSA/TIMS Joint National Meeting, Los Angeles, California, November 15, 1978.

2. Chatterjee, S. and Seneta, E. (1977). Some convergence theorems on repeated averaging. J. Appl. Prob., 14, 89-97.

3. Dalkey, N. (1972). An impossibility theorem for group probability functions. The Rand Corporation, P-4862.

4. - (1977). Group Decision Theory. UCLA School of Engineering and Applied Science, UCLA-ENG-7749.

5. - (1980). Aggregation of probability estimates. TIMS/ORSA Joint National Meeting, Washington, D.C., May 7, 1980.

6. DeGroot, Morris H. (1974). Reaching a consensus. J. Amer. Stat. Assoc., 69, 118-121.

7. Morris, Peter A. (1975). Modeling experts. Xerox Palo Alto Research Center, ARG Tech. Report No. 75-2.

8. Moskowitz, Herbert and Bajgier, Steve M. (1978). Validity of the DeGroot model for achieving consensus in panel and Delphi groups. Krannert Graduate School of Management, Purdue Univ., Paper No. 672.

9. Press, James S. (1978). Qualitative controlled feedback for forming group judgments and making decisions. J. Amer. Stat. Assoc., 73, 526-535.

10. - (1979). Multivariate group judgments by qualitative controlled feedback. Dept. of Statistics, Univ. of California, Tech. Report No. 39.

11. Winkler, Robert L. (1968). The consensus of subjective probability distributions. Management Sci., 15, B-61-B-75.
