ML20148J057

Applicants' Rebuttal Testimony No. 4 (Rebuttal to Corrected Testimony of A.E. Luloff Re Beach Blanket Survey Conducted for Commonwealth of MA)* Witnesses: B.D. Spencer & D.S. Mileti
Person / Time
Site: Seabrook
Issue date: 01/22/1988
From:
PUBLIC SERVICE CO. OF NEW HAMPSHIRE
To:
Shared Package
ML20148H865 List:
References
OL, NUDOCS 8801270354
Download: ML20148J057 (21)


Text

Dated: January 22, 1988

UNITED STATES OF AMERICA
NUCLEAR REGULATORY COMMISSION
before the
ATOMIC SAFETY AND LICENSING BOARD

                                     )
In the Matter of                     )
                                     )
PUBLIC SERVICE COMPANY               )   Docket Nos. 50-443-OL
OF NEW HAMPSHIRE, et al.             )               50-444-OL
                                     )
(Seabrook Station, Units 1           )   (Offsite Emergency
and 2)                               )    Planning Issues)
                                     )

APPLICANTS' REBUTTAL TESTIMONY NO. 4 (REBUTTAL TO THE CORRECTED TESTIMONY OF DR. ALBERT E. LULOFF REGARDING THE BEACH BLANKET SURVEY CONDUCTED FOR THE COMMONWEALTH OF MASSACHUSETTS)

Witnesses: Bruce D. Spencer
           Dennis S. Mileti

Applicants' rebuttal testimony regarding the Beach Blanket Survey conducted on July 17, 18 and 28, 1987 by AEL Associates for the Commonwealth of Massachusetts was developed from two viewpoints: the external validity of the Survey, or the ability to generalize the Survey results to the general population which did not participate in the Survey, and the internal validity, the Survey's ability to measure what it claims to measure, and its associated freedom from bias.

I. Analysis of External Validity

The Beach Blanket Survey is not based on random sampling. Random sampling involves deliberate randomized selection of respondents from an identified or identifiable population of respondents. No such deliberate randomization was utilized in the Beach Blanket Survey. To show that no randomization was used and to indicate the many ways that the sample could fail to be representative (in any sense of the word), it is useful to review the major steps in the Survey.

1. Three days in mid-July, a Friday, Saturday, and Tuesday, were chosen on which to conduct the survey interviews.
2. The New Hampshire shoreline was divided into four sections. Beachfronts within each section were to be surveyed on each of the three days.
3. Each of the beach areas was subdivided into pieces; it is assumed that the pieces did not overlap and that all the pieces put together covered the entire beach under survey. According to Attachment 4, p. 2, interviewers were "told to divide the beach area into roughly equivalent pieces", and so it is deduced that the pieces for any given beach consisted of roughly the same area. It is also assumed that each piece was a strip of land running from the seawall to the waterline.


4. Subpieces were then systematically chosen within each piece (or within each group of three contiguous pieces). One subpiece was to be selected from the part of the piece that was 3/4 of the way from the seawall to the waterline, one subpiece from the part that was 1/2 of the way from the seawall to the waterline, and one subpiece from the part that was 1/4 of the way from the seawall to the waterline.

5. Beach blankets were to be selected within each subpiece. The selection of the blankets was not determined by a deliberate randomization. Rather, the selection was up to the discretion of the interviewers, subject only to certain quotas by sex and by three broad age groups.

Based on the above brief review of AEL procedures, it is seen that at no point in this Survey was randomization used.

It is possible, however, for a non-random sample to yield useful data.

To clarify this statement we would like to describe some differences between random and non-random samples.

All of us make inferences every day from experiences that do not result from random sampling.

Sometimes those inferences are valid for situations not exactly the same as the ones immediately experienced, and sometimes not. In the same way, statistics calculated from samples sometimes are accurate measures of what the results would be if the statistics were calculated from the entire population from which the sample was drawn. When the sample is a random sample, statistics such as proportions are generalizable to the population from which the sample is drawn, in the following sense: if the sample were selected repeatedly and independently, the average value of the statistic (i.e., averaged over the various samples) would equal the value of the statistic for the whole population.

If the sample is not a random sample this generalizability does not hold; the value of the statistic might, on average, equal the statistic for the whole population, but it very well might not. The only way to attribute generalizability to a non-random sample, such as the Beach Blanket Survey, is by an act of faith or by subjective judgment.
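
The distinction can be made concrete with a short computational sketch. The sketch is ours and is not part of the Survey record; the population, its size, and the 40% figure are all hypothetical, chosen only to illustrate why repeated random samples average out to the population value while a non-random selection need not.

    import random

    random.seed(1)

    # Hypothetical beach population of 10,000 people; 40% would evacuate.
    # The 1s are placed at one end to mimic a spatial pattern on the beach.
    population = [1] * 4000 + [0] * 6000
    true_p = sum(population) / len(population)          # 0.400

    # Repeated simple random samples: the estimates average out to true_p.
    estimates = [sum(random.sample(population, 500)) / 500
                 for _ in range(1000)]
    print("true proportion:           %.3f" % true_p)
    print("mean over random samples:  %.3f" % (sum(estimates) / 1000))

    # A non-random selection -- say, interviewers who only work one end
    # of the beach (the first 500 positions) -- carries no such guarantee.
    biased = population[:500]
    print("one non-random selection:  %.3f" % (sum(biased) / 500))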

Generalizability does not ensure accuracy, i.e., that the statistic from any single chosen sample is close to the statistic for the population. The typical magnitude of the error -- the difference between the statistic for the sample and the statistic for the whole population -- is called the standard error. With random samples the standard error decreases as the sample size increases; with non-random samples this need not occur. Also, with random samples the standard error can be assessed internally, from the chosen sample itself. The standard error for non-random samples cannot be assessed internally -- only by comparing the results of a non-random sample with accurate external benchmarks can the standard error of a non-random sample be assessed.
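
For a simple random sample, the standard error of a proportion can indeed be assessed internally, via the textbook formula sqrt(p(1-p)/n), and it shrinks as the sample size grows. A minimal sketch of that computation (ours, for illustration; the sample sizes are arbitrary):

    import math

    def standard_error(p_hat, n):
        # Estimated standard error of a sample proportion under simple
        # random sampling: sqrt(p_hat * (1 - p_hat) / n).
        return math.sqrt(p_hat * (1 - p_hat) / n)

    # For a random sample the standard error falls as n rises; for a
    # non-random sample no such formula applies.
    for n in (100, 400, 1600):
        print("n = %4d  SE = %.4f" % (n, standard_error(0.5, n)))
    # n =  100  SE = 0.0500
    # n =  400  SE = 0.0250
    # n = 1600  SE = 0.0125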

Some impression about the possible accuracy of a non-random sample can be gleaned from consideration of the manner in which the sample was selected. If the selection seems free of systematic biases or tendencies to over-represent or under-represent some groups, then we may be willing to believe that the sample might yield accurate results.

In order to judge whether the Beach Blanket Survey might yield accurate statistics we examined the consequences of the various stages of sampling described in paragraphs 1-5 above.

1. The first stage of sampling is really the restriction of the Survey to responses by people at the beach. No reasons are presented to show why one would be justified in generalizing conclusions which might apply to persons at the beach to people at locations other than the beach.
2. The next stage of sampling is the restriction of the Survey to a few days in mid-July. No reasons are presented to show how one would be justified in generalizing to other days. If one is interested in how people would respond to the Survey on a "typical" day, then one would want to calculate statistics for each day separately and see if the responses were consistent from one day to the next, as sketched below. The statistics presented in Attachment 4 most heavily represent Saturday and least represent Tuesday. Different kinds of people use the beaches on different days; for example, 60% of those interviewed on Tuesday reported that they would "do as told" if a policeman told them to go in a different direction, compared to an average of 50% for those interviewed on Friday and Saturday.

3. The next stage in the sampling is the choice of four particular beach areas to represent all of the New Hampshire beachfront. Attachment 4, "Beach survey", indicates that roughly 2/5 (41.6%) of the interviews occurred at Hampton Beach, 1/4 (28.6%) at North Hampton (Boar's Head to 101C), 1/10 (11.5%) at Seabrook, and 1/5 (18.3%) from 101C to Odiorne Point. Counts from aerial photographs on July 18 in early afternoon using these same geographical boundaries show discrepancies from the sampled proportions:

    Beach                     Sampled        Actual Proportions (from
                              Proportions    1987 beach population counts)

    Hampton                   41.6%          57.6%
    North Hampton - 101C      28.6%          10.1%
    101C - Odiorne Point      18.3%          21.6%
    Seabrook                  11.5%          10.7%
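
The over- and under-representation shown in the table can be quantified directly. The following sketch (ours) simply computes the ratio of sampled to actual share for each area, using the figures above:

    # Sampled vs. actual beach-population proportions from the table above.
    beaches = {
        "Hampton":              (41.6, 57.6),
        "North Hampton - 101C": (28.6, 10.1),
        "101C - Odiorne Point": (18.3, 21.6),
        "Seabrook":             (11.5, 10.7),
    }
    for beach, (sampled, actual) in beaches.items():
        # Ratio > 1 means the beach is over-represented in the sample.
        print("%-22s sampled/actual = %.2f" % (beach, sampled / actual))
    # North Hampton - 101C comes out near 2.83 (heavily over-represented);
    # Hampton comes out near 0.72 (under-represented).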

Furthermore, the coverage of the individual beaches was far from uniform across the three days of the Survey. For instance, nearly half of the interviews at North Hampton Beach occurred on Tuesday, whereas only 16% of all interviews occurred on a Tuesday; also, 97% of the Boar's Head interviews occurred on Saturday, with only 3% on Friday, and none on Tuesday. In addition, it appears that certain sections of the shorefront (Rye, North Beach, Foss Beach, Bass Beach, Little Boar's Head and Plaice Cove) may not be represented at all in the Survey.

4. Dividing each beach into pieces (as described on page 2 above) ensures that all parts of the beach areas are covered by the Survey. However, since approximately equal numbers of interviews occurred in each piece (it is assumed), a higher ratio of interviews per beach blanket occurred in pieces with few beach blankets than in pieces with more beach blankets. Thus, the Survey would tend to under-represent persons who clustered near other persons at the beach and to over-represent persons who placed their blankets at some distance from other blankets, as the sketch below illustrates.
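
The resulting unequal selection probabilities are easy to exhibit. In this sketch (ours; the interview allocation and blanket counts are hypothetical), each piece receives the same number of interviews regardless of how many blankets it contains:

    # Equal interviews per piece, unequal blankets per piece: the chance
    # that any one blanket is selected differs sharply between pieces.
    interviews_per_piece = 5
    blanket_counts = {"sparse piece": 10, "crowded piece": 100}

    for piece, blankets in blanket_counts.items():
        p = min(1.0, interviews_per_piece / blankets)
        print("%-13s P(selected) = %.2f" % (piece, p))
    # sparse piece  P(selected) = 0.50
    # crowded piece P(selected) = 0.05
    # A blanket set apart from others is ten times as likely to be chosen.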

5. The manner in which the selection of subpieces occurred, as described earlier, entails that the subpieces tended to be located along three horizontal strips running lengthwise along the beaches: one strip 1/4 of the way from the seawall to the waterline, one strip 1/2 of the way, and one strip 3/4 of the way. Persons who settled their blankets very close to the seawall would not fall into these selected subpieces and therefore would not be represented in the Survey at all.
6. The fact that the interviewers could decide which beach blankets in a subpiece they would select for the sample, subject to quotas on age and sex of respondents, suggests that certain types of people were more likely to be interviewed than other types; e.g., blankets occupied by approachable-looking people would be more likely to be chosen than others. Indeed, this selectivity may explain why the cooperation rate was high (Attachment 4, page 3 notes that there were "very few refusals"). This selectivity tends to cause systematic errors in the Survey statistics. The use of quotas by sex and age was an attempt to control the selectivity, and certainly the selectivity would have been even more severe without the quotas, but selectivity still occurred for people within age and sex subgroups; e.g., approachable-looking middle-aged men were more likely to be interviewed than other middle-aged men.

7. In addition to the fact that the use of quotas does not completely control systematic errors from interviewer selectivity, the use of quotas also introduces systematic error into the Survey statistics because the sex and age groups are represented in the Survey according to prespecified target values (quotas) and not according to the numbers in which they actually are present on the beach or in whatever population to which the Survey is supposed to be generalized. Thus, if the actual proportion of beach-going persons aged under 25 is really 40%, then the Survey will over-represent them because 52.6% of the surveyed people were under 25 years of age (the quota was set at around 1/2).

Additionally, the sex and age quotas used in sample selection on all beaches were based on a 1983 survey of Hampton Beach. There is no basis on which to assume that the age and sex distribution observed at Hampton Beach in 1983 is equivalent to the distribution that would occur at Hampton Beach in 1987 or on other beaches. The effect of such quota misallocation on an overall estimate is sketched below.
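
The following sketch (ours) shows how a quota that misstates the age composition shifts an overall estimate. The 52.6% and 40% under-25 shares come from the testimony above; the per-group "would evacuate" rates are invented solely for illustration, and no reweighting was possible in the actual Survey because the true beach shares were never measured.

    # Age shares: 52.6% under 25 in the Survey (quota ~1/2) vs. a
    # supposed 40% actually on the beach. Response rates are invented.
    share_sample = {"under_25": 0.526, "25_plus": 0.474}
    share_actual = {"under_25": 0.400, "25_plus": 0.600}
    would_evacuate = {"under_25": 0.70, "25_plus": 0.50}   # hypothetical

    unweighted = sum(share_sample[g] * would_evacuate[g] for g in share_sample)
    reweighted = sum(share_actual[g] * would_evacuate[g] for g in share_actual)
    print("quota-weighted estimate:      %.1f%%" % (100 * unweighted))
    print("population-weighted estimate: %.1f%%" % (100 * reweighted))
    # 60.5% vs. 58.0%: the quota alone shifts the headline figure.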

The Beach Blanket Survey report (September 14, 1987) states (Attachment 4, pp. 3-4): ". . . the sampling error is plus or minus five percentage points. This means that in theory if the Survey were to be repeated an infinite number of times, that 95% of the time we would obtain the same results." That statement is wrong, and was later amended (Errata to Testimony of Dr. Albert E. Luloff . . . December 14, 1987) to:

This means that in theory if the survey were to be repeated 100 times using the same techniques, in 95 out of 100 times the results obtained for a particular question would not vary by more than 5 percentage points from the results which would have been obtained had every beach-blanket group on the beach been surveyed.

The amended statement would be correct if certain conditions held, namely:

(i) the sample were a random sample,

(ii) the standard error were actually 2.5 percentage points (we may infer that Dr. Luloff means sampling error to refer to approximately two times the standard error), and

(iii) the sample size were sufficiently large.

Unfortunately, Dr. Luloff's theoretical interpretation of sampling error is not applicable because the Beach Blanket Survey is not based on a random sample and there is no known way to calculate the standard error for the Survey.
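
For reference, a plus-or-minus-five-point figure is roughly what the textbook formula yields for a simple random sample of the Survey's approximate size; the point of conditions (i)-(iii) is that the formula has no force here. A sketch of that textbook computation (ours; it assumes simple random sampling, which the Survey did not use):

    import math

    def margin_of_error(n, p=0.5, z=1.96):
        # Textbook 95% margin of error for a proportion under simple
        # random sampling; it has no meaning for a non-random sample.
        return z * math.sqrt(p * (1 - p) / n)

    # With the Survey's ~584 respondents, a genuine random sample would
    # carry a margin of error of roughly +/- 4 percentage points:
    print("%.3f" % margin_of_error(584))    # about 0.041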


Two conclusions that follow from the fact that the Beach Blanket Survey is not based on random sampling are:

(1) there is no statistical-theoretical basis for assuming that its conclusions are accurate, and (2) the estimates of sampling error presented in the report are meaningless and have no statistical-theoretical basis.

Furthermore, the manner in which the sample was selected involves so many kinds of selectivity and unequal representation of persons on the beaches that the statistics calculated from the Survey should not be trusted even to the extent of generalizing to the beach population on the days sampled. Generalization to other populations is certainly not supported by such a blatantly non-random sample.

II. Analysis of Internal Validity

Applicants' Direct Testimony No. 7 on Evacuation Time Estimates and Human Behavior in Emergencies and Applicants' Rebuttal Testimony No. 3 demonstrate that the behavioral intentions of people (be those intentions voiced on a beach blanket or anywhere else) are not indicative of actual behavior in an actual future emergency. We do not detail this viewpoint here; our conclusion is simply that even if this Survey had sound behavioral intention measurement (internal validity) and sound external validity (discussed above), it would shed little light on the actual behavior of beachgoers in an actual future emergency at Seabrook. This basic point must not be lost in the context of this critique of the technical aspects of this poll. Human response to an actual emergency would largely be directed by factors which prevail during the emergency as it is experienced. These factors were not simulated in the Beach Blanket Survey, nor could a survey adequately simulate them. Behavioral intention polls of beachgoer behavior can be nothing more than what respondents thought the day they were interviewed, taking into account only what they may or may not have had in mind before questions were answered. Actual public behavior in an actual future emergency is the consequence of other factors which cannot be simulated in pre-emergency polls or surveys. But these other factors and how they affect behavior are well known, based on actual emergencies, and these should guide and shape emergency planning for actual emergencies at Seabrook, not a behavioral intention poll.

The Beach Blanket Survey at best held the potential to gather "factual" data (how people got to the beach the day they were interviewed, for example) and speculative perceptual data (how people "thought" they might behave in a future unexperienced emergency) from the persons interviewed.

However, the flaws of internal validity in the Survey design and interview schedule were profound and numerous. There is really little if any basis, therefore, to conclude that the answers interviewees gave to the questions which they were asked can be taken as any measurement of behavioral intentions (let alone actual future emergency response behavior) or even of much of the "factual" data the Survey was designed to collect.

a. Internal Validity of Behavioral Intentions.

There are many reasons why the internal valldity of the Beach Blanket Survey was too low to generate trustworthy data on the behavioral intentions of beachgoers. The reasons for this conclusion follow.

The Beach Blanket Survey was performed on the wrong unit of analysis. It appears as though the Survey assumed that the driver in beach-going groups would make emergency response decisions for that entire group. As a result, interviews were conducted with individual drivers instead of the entire group of persons who came and/or would leave or shelter with the driver. The Survey approach incorrectly assumed that drivers would have significant decision-making power to determine group emergency response.

Empirical evidence teaches that engaging in protective actions (for example, evacuation) in response to emergency warnings and situations is largely a group affair. For example, people prefer evacuating in family or intimate groups when all other things are held constant. The structure of this Survey -- drivers were interviewed rather than the entire intimate group which comprised the "beach blanket" -- was such that drivers were not able to adequately account for the decision-making input from other "beach blanket" members, all of whom would engage in discussions leading up to response decisions in an actual emergency. The correct unit of analysis for the interviews performed in this Survey would have been the entire "beach blanket" intimate group and not just the driver. In a real emergency actual response decisions would be made through a process of group interaction in which the driver would provide only one voice.

This emergency response group decision making process is well understood, and it has even been diagrammed regarding family evacuation decision making (see Stanley D. Brunn, James H. Johnson, Jr., and Donald Zeigler, 1979, Final Report on a Social Survey of Three Mile Island Residents. East Lansing: Michigan State University, Department of Geography, page 46).

This fundamental flaw in the Survey design suggests that answers to behavioral intention questions for all respondents (except perhaps those who came to the beach alone) are not reflective of even what "groups" thought, and it is these "groups", as noted, who would make collective response judgments in an actual emergency, not just drivers. This flaw in the Survey has dramatic implications for the answers obtained to intended emergency response behavior questions 7, 8, 9, 10A, 10B, 10C, 10D, 10E, 10F, 10G, 11A, 11B and 12. These questions gathered data solely on driver behavioral intentions. Answers to these questions, therefore, are the intentions of a response unit that is not relevant.

It is likely that during the driver interview process other beach blanket members may have engaged in voicing their opinions (some obviously would have overheard the interview as it was being conducted). This "spontaneous" interaction, however, could not approximate the actual group decision making process that would occur in an actual emergency for many reasons: (a) many "beach blanket" intimate group members who would participate in emergency response group decision making would have undoubtedly not spoken up during the driver's interview since they were not invited to participate, (b) "beach blanket" intimate group members were away from the blanket during the driver interview and could not participate spontaneously -- this was true in 25% of the interviews -- as indicated by responses to question V10B in the Survey frequencies, and (c) whatever "spontaneous" interaction which could have occurred during the interview was likely severely limited by other factors that would not constrain group interaction during a real emergency, for example, sleeping, eating and so on.

Some of the behavioral intention questions on the interview schedule (see questions 7, 8, 10A, 10B, 10C, 10D, 10E, 10F, 10G and 12) are of the sort that asked respondents to speculate about behavioral intentions in response to simulated emergency information. Yet the emergency information simulated for respondents in reference to these questions was not similar to the emergency information that would be provided in a Seabrook emergency. In fact, under cross-examination Dr. Luloff disassociated himself from any claim that the Beach Blanket Survey was designed to simulate EBS response. (January 11, 1988, Tr. 8224-26)

An actual response to an actual emergency by beachgoers, however, would certainly include hearing EBS messages, as well as a three-minute sounding of the sirens. It seems grossly inappropriate, therefore, to suggest that behavioral intentions in response to something else would resemble behavioral intentions in response to the actual EBS information that beachgoers would respond to in an actual Seabrook emergency. We can only conclude, therefore, that the Beach Blanket Survey is a study of behavioral intentions in response to something other than the character of an actual Seabrook Station emergency. It is not possible, therefore, to conclude that the behavioral intentions which respondents voiced in reference to questions 7, 8, 10A, 10B, 10C, 10D, 10E, 10F, 10G and 12 would be similar to intentions that would be voiced in response to the actual emergency information which would characterize a Seabrook emergency.

Question 8 in the Survey portrays emergency scenario characteristics so inconsistent with the Seabrook plan as to render answers to this question meaningless with regard to behavioral intentions.

A similar problem exists for question 7 which asks respondents to incorrectly assume that "designated" shelters have been identified. Finding a "designated" shelter is not the same thing as going inside a building. It seems very unlikely that behavioral intentions would be the same in response to question 7 were the question reworded to reflect actual Seabrook emergency planning and actual emergency public information.

Question 9 also contains bias. This question reads as follows:

Suppose the first shelter you went to was already filled to capacity, what would you do? Would you search for another shelter or would you evacuate the area, or would you do something else?

This question requires that the respondent assume that a "search" must be engaged in to find an alternative shelter, which suggests that seeking another shelter may be burdensome; the wording of this question would "direct" people to answer with the alternative major protective action, that is, "evacuate." Indeed, Attachment 4 reports frequencies of 55.0% intended evacuation in response to this question versus 31.4% intended seeking another shelter. In a real emergency, however, the "search" for another shelter could be but a few steps away. In this way the word "search" in this question significantly biases results toward respondents selecting evacuation as their behavioral intention.

Questions 11A and 11B read as follows, respectively:

If you were in your vehicle for 1 hour and traffic had moved less than 1 mile, would you remain in your car as authorities insist . . . or would you get out?

Suppose you were in the car/truck/van for 3 hours with little movement -- would you still remain in your car as authorities insist or would you get out?

Attachment 4 frequencies report that 14.5% intended car abandonment after 1 hour, while 38.3% intended it after 3 hours. Two points are relevant regarding this sequence of questions and answers. First, there is significant bias in sequential questionnaire items such as these; bias certainly operates, through question interaction, to inflate the response to the second question. Put simply, question 11B is biased to overestimate car abandonment since it would be heard by some respondents as follows: "OK, so you wouldn't consider abandonment of your car after one hour, what about if we tripled the time and said three hours?"

Intended car abandonment almost tripled, from 14.5% to 38.3%.

A third question was not asked, for example, "What about if we said you were in your car with little movement for 6 hours -- then would you abandon your car?" Were this third question asked, intended abandonment would have increased proportionately due to the bias introduced by question sequencing.

Second, these questions and answers illustrate more than bias; they illustrate how behavioral intentions and actual emergency public behavior can be dramatically different. Dr. Luloff uses the car abandonment "data" to suggest that car abandonment is a likely Seabrook evacuation problem, yet we do not know of one evacuation in the history of the United States where car abandonment has ever been an impediment to evacuation. It is surprising that Dr. Luloff would have us ignore decades of actual data on actual behavior in evacuations (including evacuations precipitated by technological emergencies) in favor of respondent speculations based on biased questions in a poll. Dr. Luloff's concern over car abandonment is an imprudent hypothesis for yet another reason. It appears as if his concern presumes that an abandoned car is one left on the road as opposed to one driven onto a shoulder and out of other traffic's way. This presumption is unreasoned and unreasonable when viewed in light of the public's true concern for others in emergencies.

b. Internal Validity and "Factual" Data.

This Survey also sought to gather data on "facts" in addition to data on behavioral intentions. In particular, questions 2, 3A, 3B, 3C and 3D served to provide a basis for estimating auto occupancy. These questions about "facts" relevant to estimating auto occupancy, however, each contain enough bias to render auto occupancy estimates based on answers to these questions untrustworthy. The reasons for this conclusion follow.

Question 2 read as follows: "Counting yourself, how many people are with you today?" The answers given to this question would be just that; that is, how many people were with the respondent. Answers would have undoubtedly included people who met each other on the beach, but who did not come to the beach in the driver-respondent's vehicle; these persons, for example, could have walked to the beach from their house or motel. The point is simply that the wording of question 2 would have biased results in terms of overestimating the number of people who came to the beach together.

Question 3A read as follows: "How did your party get to the beach today?" Answers to this question were: 1 = car (truck, van), 2 = motorcycle, 3 = bus, 4 = bicycle, and 5 = other. A total of 107, or 18.3%, of the respondents answered that they walked or got to the beach in some other way, while 461, or 79.1%, answered that they came in a car, truck or van.

These answers cannot be trusted since respondent referents could have varied depending on how the question was interpreted in terms of ". . . get to the beach today." Some respondents could have driven to friends/relatives/hotels/motels and then walked to the beach. There is no way to know which part of their journey they had in mind when this question was answered.

Question 3B read as follows: "How many vehicles did your party take to the beach today?" There is no way to know, as was the case in question 3A, what respondents may have had in mind when this question was answered. For example, was the beach defined as local friends/relatives or motel/hotel; was the number of vehicles envisioned as the number literally taken to the interview site or to the local domicile; and so forth. The point is simply that different respondent referents could have operated to jeopardize the internal validity of this question.

Questions 3A and 3B, discussed above, both lack internal validity, and answers to these questions are hardly trustworthy for the same reason. Both questions are unreliable measures; answers would not be the same if the survey were repeated, since respondent answers would vary depending on what people happened to be thinking when the questions were asked. The questions did not seek to make measurement reliable; the answers therefore lack internal validity.

Question 3C read as follows: "Where did you park?", and the answers provided were: 1 = parking lot, 2 = back or front yard, 3 = street, and 4 = other. This question and its answer categories are an unreliable measure. There is no way to know if back or front yard parkers were actually parked in driveways or in yard overflow parking areas, since different respondents could have defined the answer choices in different ways.

Question 3D read as follows: "About how long did it take to get from where you parked your car to the beach?" 557 respondents answered from 1 to over 10 minutes (see frequencies in Attachment 4). Since 557 is 95% of the total 584 respondents, this implies that all but 27 respondents had a car parked at the beach. This is inconsistent with answers to question 3C, in which 51, or 8.7%, of respondents claimed not to have parked but rather to have walked to the beach; it is inconsistent with answers to question 3A, in which 107, or 18.3%, of respondents answered that they got to the beach by walking or in some other (non-car, truck, van, bus, motorcycle or bicycle) way; and it is inconsistent with Dr. Luloff's direct testimony at page 12, in which he claims that 18.3% of the respondents got to the beach by walking. These inconsistencies are not a surprise. What they reveal is simply that questions on surveys which lack internal validity and which are unreliable produce results which cannot be trusted. Obviously, these inconsistencies illustrate that it is impossible to use with confidence the data collected as part of this Survey regarding parking/walking, the number of cars at the beach, and so on. The "factual" data collected as part of this Survey simply lack internal validity and are inaccurate. The arithmetic behind these inconsistencies is set out below.

Additionally, the vehicle occupancy rate calculated by Resource Systems Group is equally untrustworthy, since it relied solely upon quite questionable data gathered with internally invalid measures. Invalid, unreliable survey questions are, in essence, a "rubber ruler"; that is, the Survey would not produce the same results were it replicated.
