ML12335A679

Official Exhibit - ENT000017-00-BD01 - Accuracy and Coverage Evaluation of Census 2000: Design and Methodology
Site: Indian Point
Issue date: 09/30/2004
From: US Dept of Commerce, Bureau of Census
To: Atomic Safety and Licensing Board Panel; SECY RAS
References: RAS 22095, 50-247-LR, 50-286-LR, ASLBP 07-858-03-LR-BD01


Text

United States Nuclear Regulatory Commission Official Hearing Exhibit

In the Matter of: Entergy Nuclear Operations, Inc. (Indian Point Nuclear Generating Units 2 and 3)

ASLBP #: 07-858-03-LR-BD01
Docket #: 05000247 | 05000286
Exhibit #: ENT000017-00-BD01
Identified: 10/15/2012
Admitted: 10/15/2012
Submitted: March 28, 2012
Withdrawn:
Rejected:
Stricken:
Other:

Accuracy and Coverage Evaluation of Census 2000: Design and Methodology
September 2004
DSSD/03-DM

[Cover illustration: the A.C.E. Revision II dual system estimation (DSE) formula]

U.S. Census Bureau
U.S. Department of Commerce, Economics and Statistics Administration
Helping You Make Informed Decisions

ACKNOWLEDGMENTS

This technical document was prepared under the direction of Donna Kostanich, Assistant Division Chief for Sampling and Estimation, Decennial Statistical Studies Division. The overall management and coordination of the review was conducted by Dawn Haines and Douglas Olson.

The combined efforts of numerous U.S. Census Bureau staff have culminated in the publication of this document. Some staff members wrote chapters, while others reviewed chapters. In some cases, staff members filled both capacities.

Contributing to the March 2001 portion of Accuracy and Coverage Evaluation of Census 2000: Design and Methodology were Patrick Cantwell, Inez Chen, Danny Childers, Peter Davis, James Farber, Deborah Fenstermaker, Richard Griffin, Dawn Haines, Howard Hogan, Michael Ikeda, Donna Kostanich, Vincent Thomas Mule, Mary Mulry, Alfredo Navarro, Douglas Olson, J. Gregory Robinson, Robert Sands, and Michael Starsinic. Joseph Waksberg, of Westat, Inc., reviewed these chapters for readability and consistency.

Contributing to the A.C.E. Revision II section of Accuracy and Coverage Evaluation of Census 2000: Design and Methodology were Tamara Adams, Michael Beaghen, William Bell, Patrick Cantwell, Deborah Fenstermaker, Richard Griffin, Dawn Haines, Michael Ikeda, Donna Kostanich, Elizabeth Krejsa, Vincent Thomas Mule, Mary Mulry, Rita Petroni, Robert Sands, Eric Schindler, Bruce Spencer, of Northwestern University, and David Whitford.

Rhonda Geddings provided administrative support.

Bernadette Beasley, Meshel Butler, Helen Curtis, Susan Kelly, and Kim Ottenstein of the Administrative and Customer Services Division, Walter Odom, Chief, provided publications and printing management, graphics design, and composition and editorial review for print and electronic media. General direction and production management were provided by James Clark, Assistant Division Chief.

Margaret Smith of ACSD provided assistance in placing the electronic version of this document on the Internet (see www.census.gov/dmd/www/refroom.html).

We are grateful for the assistance of the individuals listed and all others who contributed but are not specifically mentioned. The preparation and publication of this document were possible because of their invaluable contributions.

U.S. Department of Commerce
Donald L. Evans, Secretary
Theodore W. Kassinger, Deputy Secretary

Economics and Statistics Administration
Kathleen B. Cooper, Under Secretary for Economic Affairs

U.S. Census Bureau
Charles Louis Kincannon, Director

SUGGESTED CITATION
FILES: Census 2000, Accuracy and Coverage Evaluation of Census 2000: Design and Methodology, U.S. Census Bureau, 2004

ECONOMICS AND STATISTICS ADMINISTRATION
Kathleen B. Cooper, Under Secretary for Economic Affairs

U.S. CENSUS BUREAU
Charles Louis Kincannon, Director
Hermann Habermann, Deputy Director and Chief Operating Officer
Vacant, Principal Associate Director and Chief Financial Officer
Vacant, Principal Associate Director for Programs
Preston Jay Waite, Associate Director for Decennial Census
Nancy M. Gordon, Associate Director for Demographic Programs
Cynthia Z.F. Clark, Associate Director for Methodology and Standards
Marvin D. Raines, Associate Director for Field Operations
Arnold A. Jackson, Assistant Director for Decennial Census

Foreword

The U.S. Census Bureau conducted the Accuracy and Coverage Evaluation (A.C.E.) survey to measure the coverage of the population in Census 2000. The A.C.E. was designed to serve two purposes: (1) to measure the net coverage of the population, both in total and for major subgroups, and (2) to provide data that could serve as the basis for correcting the census counts for such uses as Congressional redistricting, state and local redistricting, funds allocation and governmental program administration. The A.C.E. survey provides critical information that can be used to improve the census-taking process. However, the design, methodology, operations and data collection efforts are extremely complex and not widely understood. The work described in this publication was a major undertaking, and the technical documentation is intended to increase awareness and knowledge, and subsequently improve the 2010 Census and coverage measurement techniques.

Despite the fact that coverage measurement techniques have been utilized by the Census Bureau for several decades, this is the first comprehensive documentation of its kind. This technical document describes the methodologies that were used to produce estimates of Census 2000 coverage error from the A.C.E. The first part of this document discusses the entire survey design used to produce the original estimates of net undercount released in March 2001. Analysis and evaluations indicated that there were serious errors in the March 2001 A.C.E. Research efforts to fix the detected errors resulted in improved coverage estimates referred to as A.C.E. Revision II. The second part of this document describes the methodology used to correct for errors in the March 2001 A.C.E.

After extensive analysis and consideration, the Census Bureau ultimately decided not to use the A.C.E. (neither the March 2001 nor the Revision II results) to correct the Census 2000 counts or any other data products. A.C.E. Revision II, the superior of the two results, provides useful coverage measurement information that can be used for research purposes. All of these results, decisions, supporting analyses, technical assessments, and limitations can be found on the Census Bureau's Web site at www.census.gov/dmd/www/EscapRep.html.

This document is intended to promote knowledge and encourage collaboration on coverage measurement issues. As such, we welcome comments and suggestions from colleagues on technical issues and also on the value of this document.

Charles Louis Kincannon
Director, U.S. Census Bureau

CONTENTS

Section I: A.C.E. March 2001

Chapters
1. Introduction to the A.C.E.
2. Accuracy and Coverage Evaluation Overview
3. Design of the A.C.E. Sample
4. A.C.E. Field and Processing Activities
5. Targeted Extended Search
6. Missing Data Procedures
7. Dual System Estimation
8. Model-Based Estimation for Small Areas

Appendixes
A. Census 2000 Missing Data
B. Demographic Analysis
C. Weight Trimming
D. Error Profile for A.C.E. Estimates

Section I References

Section II: A.C.E. Revision II March 2003

Chapters
1. Introduction to A.C.E. Revision II
2. Summary of A.C.E. Revision II Methodology
3. Correcting Data for Measurement Error
4. A.C.E. Revision II Missing Data Methods
5. Further Study of Person Duplication in Census 2000
6. A.C.E. Revision II Estimation
7. Assessing the Estimates

Section II References

Accuracy and Coverage Evaluation of Census 2000: Design and Methodology

Section I: A.C.E. March 2001

U.S. Census Bureau, Census 2000

Chapter 1.

Introduction to the A.C.E.

INTRODUCTION

The U.S. Census Bureau conducted the Accuracy and Coverage Evaluation (A.C.E.) to measure the coverage of the population in Census 2000 and to allow for the possibility of correcting the census results for the measured undercount. It also provides a wealth of information on the census process and may, thus, enable improvement in future censuses. This document is written to provide a clear and permanent record of the methods and operations used in this project.

The current chapter presents the objectives and scope of the A.C.E., and discusses limitations of what it was attempting to accomplish. It includes a brief history of the evolution of the statistical and operational methods upon which the A.C.E. is based. Chapter 2 presents an overview of the various statistical steps necessary to produce estimates of census coverage and how they are tied into the operation of the survey. The sequence of major activities and their timing is given. Subsequent chapters discuss in detail A.C.E. sampling, interviewing, processing, and estimation steps.

Goals

The evaluation of the completeness of census enumeration has been an integral part of the decennial census since the 1950 census. This evaluation has taken on many forms including demographic analysis, administrative record checks, matches to independent surveys, and dependent record rechecks and reinterviews.

The evaluation of the five censuses from 1950 to 1990 clearly showed that each of the traditional decennial censuses undercounted the total population, and further, missed certain identifiable population groups at greater rates than others. Specifically, these evaluations clearly showed that undercounts were not merely random occurrences, but predictable biases in the census-taking process. The undercount has been consistently higher for the African-American population than for the rest of the population, and while the data set is not so extensive, the evidence also pointed to consistently higher undercounts for Hispanics, Asians, Pacific Islanders, and American Indians than for the White non-Hispanic population. The undercount was also related to socioeconomic status, chiefly measured by home ownership, with renters having consistently higher undercounts. The U.S. Census Bureau designed the Accuracy and Coverage Evaluation to measure this differential undercount and, if possible, correct the counts, thereby making the census more accurate.

As mentioned earlier, the A.C.E. was designed to serve two purposes. One goal was to measure coverage of the population, both total and in various major subdivisions such as race/ethnicity, sex, major geographical areas, and socioeconomic groupings. These measurements indicate whether changes made in enumeration methods in the 2000 census were successful in improving the census and show where improvements may be necessary in future censuses. Another goal was to provide data that could serve as the basis for correcting the census counts. In planning the A.C.E., the Census Bureau focused on the accuracy of population totals for both geographic areas and demographic groups. Consideration was given to the possibility of improving both the population totals (numeric accuracy) and the population shares (distributive accuracy). Although early planning considered using dual system estimation to produce a one-number census, after the Supreme Court ruled on the use of sampling for congressional apportionment in 1999, the survey was redesigned and refocused on non-apportionment uses.

One important use was congressional redistricting. Thus an important consideration in the design was to improve the accuracy of congressional districts, which average around 650,000 people. The U.S. Census Bureau also recognized other uses, including state and local redistricting, funds allocation, and program administration. The traditional goals of coverage evaluation, to inform users and aid in the planning of the next census, continue to be important. These goals greatly influenced the sample and estimation design.

The A.C.E. Defined

The A.C.E. is a post-enumeration survey, based on the theory of dual system estimation. The results of the dual system estimation can be used with model-based estimation to produce census files adjusted for the measured net undercount (or net overcount). The design involved comparing (matching) the information from an independent sample survey to initial census records.

In this process, the Census Bureau conducted field interviewing and computerized and clerical matching of records. Using the results of this matching, the Census Bureau applied dual system estimation to develop estimates of coverage for various population groups. The initial plans were to apply correction factors to the census files that could be used to produce all required Census

2000 tabulations, other than apportionment. The correction aspect of Census 2000 tabulations was later abandoned. The A.C.E. can be summarized as follows:

  • Select a stratified random sample of blocks for the A.C.E.
  • Create an independent list of housing units in the sample of A.C.E. blocks.
  • Begin conducting telephone interviews of housing units that mailed in a completed questionnaire and that could be clearly linked to a telephone number.
  • After the initial census nonresponse follow-up, conduct a personal visit interview at every housing unit on the independent list not already interviewed by telephone.
  • Match the results of the A.C.E. interview to the census and vice versa.
  • Search the census records for duplicates.
  • Resolve cases that require additional information for matching by conducting a personal visit follow-up interview.
  • Use the information from other, similar people to impute missing information.
  • Categorize the A.C.E. data by age, sex, tenure, race/ethnicity and other appropriate predefined variables into estimation groupings called post-strata.
  • Calculate the coverage correction factors for each post-stratum using the dual system estimator.
  • If appropriate, apply the coverage correction factors to correct the initial census data using a model-based estimator and tabulate the statistically corrected census results.

There are a number of assumptions inherent in the A.C.E. Proper application of the dual system estimation (DSE) model requires that the A.C.E. be conducted independently of the census and that the rules used to determine correct enumerations be the same as the rules used to determine cases eligible for matches. The DSE model can be sensitive to measurement errors. It is important to obtain consistent reporting of Census Day residence. Inclusion of fictitious persons and errors in matching can directly influence the DSE. There are other assumptions necessary in developing models for handling nonresponse and other missing information.

The A.C.E. design was based very much on the theoretical concepts discussed and publicly presented by the Census Bureau in advance of the census. These concepts included careful attention to statistical independence, a strict application of the concepts of sufficient information, and careful attention to balancing the concepts used to measure census misses, as well as census erroneous inclusions. For a more detailed discussion of this approach see Hogan (2000).

Design Limitations of the A.C.E.

The A.C.E. was designed to measure the household population for large social, economic, ethnic, racial and geographic groups and compare them with the census counts. The results provide a measure of net undercount and a mechanism to correct that net undercount, if that appears advisable. Although the goal of the A.C.E. was to measure the net undercount, it also provides information on the separate components of the net undercount such as omissions and various types of erroneous enumerations in the census. Measures of gross error cannot be obtained directly and exclusively from these components because of the strict definition of "correct" that is needed to implement the dual system estimator. For example, the A.C.E. treats census enumerations as not correctly enumerated if they lacked sufficient information for accurate matching. This requirement allows for more precise matching, but increases both the number of nonmatching cases and the number of cases coded as erroneous. A similar strict rule on correct block location of an address also increases both the nonmatches and the erroneous enumerations. These rules may be inapplicable in the census outside the DSE context.

The design of the A.C.E. does not provide information on very local or unique errors in the census process. Specifically, the A.C.E. was not designed to correct for particular errors made by, say, a census taker or a local census manager, or to correct for local errors in the census address list. The Census Bureau had other programs in place to deal with these issues, such as the quality assurance process, the coverage improvement follow-up, and the local update of census addresses. The A.C.E. was designed, rather, to correct for large systematic errors in census taking, most especially the historic differential undercount.

Finally, the A.C.E. was not designed to measure the undercount for some special population groups such as the group quarters population (including college dormitories, institutions, and military barracks), the population that uses homeless shelters and/or soup kitchens, or the remote areas of Alaska. The Census Bureau instituted specialized procedures for these groups in order to achieve the best count possible. Extending the A.C.E. methods to all of these populations would have been very costly and difficult to implement properly.

HISTORY

Starting with 1950, every census has included a formal study of the coverage of the population. The 2000 Accuracy and Coverage Evaluation (A.C.E.) is very much a continuation of that tradition.

1950 through 1970

The U.S. Census Bureau conducted its first post-enumeration survey, or PES, as part of the 1950 census. The essential elements in a post-enumeration survey are a


second attempt to enumerate a sample of households and, using case-by-case matching, to determine the number and characteristics of people not included in the first census enumeration. This first PES was not based on dual system estimation.

During the next two decades the Census Bureau experimented with alternative coverage measurement methods based on case-by-case matching, including a Reverse Record Check, administrative record checks, and a match to the Current Population Survey. In addition, there were various alternative versions of PES designs.

Soon after the completion of the 1950 census, methods of aggregate demographic analysis for coverage analysis were developed at Princeton University by Ansley Coale and colleagues. See Coale (1955), Coale and Rives (1973), and Coale and Zelnick (1963) for details. Demographic analysis (DA) is the construction of an estimate of the true population using birth, death, migration and other data sources. This methodology can provide independent measures of the census net undercount by age, sex, and Black/non-Black; however, it is subject to its own limitations and uncertainties. An important limitation is the lack of data to independently estimate the Hispanic, Asian, and American Indian populations or other detailed demographic groups, such as homeowners or renters. Nor can demographic analysis provide estimates for geographic areas below the national level. In addition, the level of emigration and undocumented immigration must be estimated using indirect methods. Because the U.S. had reasonably complete birth registration only since 1935, sophisticated analysis was needed in 1950 for the population over age 15. Early studies were restricted to the native-born White population, but with time were expanded to include the native-born African-American population as well.

Later work at the U.S. Census Bureau by Jacob Siegel and colleagues expanded the estimates to the total population, with the first official estimates being issued in conjunction with the 1970 census (Siegel, 1974). The 1970 estimates recognized the need to address the problem of race misclassification in the complete count. By the time of the 1970 census, the population covered by birth registration included those under age 35, with tests of birth registration completeness having been conducted in 1940, 1950, and the mid-1960s. Medicare data now provided a basis for estimates for those over age 65.

However, the difficulty of measuring migration, an important component of DA, gained attention. These studies noted that "the figures on net immigration for the 1960 to 1970 decade should be considered as estimates subject to considerable error." Importantly, the estimates did not include any allowance for "...unrecorded alien immigration, particularly illegal immigration." See Siegel (1974) for more details.

During these same decades, the methods of dual system estimation were being refined for use in the human population. Although introduced over a century ago for use in animal populations, dual system estimation was first used with human populations in an important article by Sekar and Deming (1949) that applied the technique to measuring births. Dual system estimation was widely used to measure births and deaths in developing countries during the 1970s in conjunction with important operational and theoretical work. The ideas from dual system estimation were soon applied to post-enumeration surveys. See most importantly Marks (1979).

1980

The design of the A.C.E. traces most directly to the 1980 Post-Enumeration Program (PEP). This was the first large-scale post-enumeration survey to use dual system estimation. In addition, it included several important innovations, as well as important lessons on the design of a PES.

The 1980 PEP was based on a match of people included in the April and the August Current Population Survey to the 1980 census. This match was used to determine the proportion of people counted in the census. It was a sample of people known to exist and be residents of the U.S., and was labeled the Population or P sample.

All matching was done by clerks and technicians. In order to make it possible to do the matching, each person's address needed to be assigned the correct census geographic code (geocoded). This process was slow and error prone.

In addition, a separate sample of census records was drawn. This was known as the Enumeration or E sample. The census records included in the E sample were checked in the office to see if they were duplicated, followed by a field operation to determine whether the people were real, whether they lived at the address on Census Day, and whether the unit was assigned the correct census geographic code (correctly geocoded).

One important concept introduced in 1980 was that of sufficient information for matching. Sufficient information for matching means that a record, from either the P or E sample, contains sufficient information, including most importantly a name, to allow accurate matching and follow-up. Records that lack this information are removed from matching, processing and estimation. For the E sample, this exclusion is done in two parts: census imputed records (non-data-defined) are excluded from the sampling frame, and then sampled data-defined records are reviewed for name and other necessary information.

Another concept used earlier but made explicit in 1980 was that of search area. A person was only considered correctly enumerated if he/she was counted in a specific,

defined area that included the address where he/she should have been enumerated. This search area was to be applied to both the P and the E samples.

The 1980 PEP was also, very importantly, the first PES to be, itself, carefully evaluated (Fay et al., 1988). This evaluation proved invaluable to the design of the 1990 PES. Among the important findings were:

  • Sampling variances were very high.
  • Geocoding a sample of housing units was costly and error prone.
  • Drawing independent P and E samples made it very hard to apply the same concepts, especially that of search area.
  • Levels of missing data needed to be reduced and methods to account for the missing data needed to be refined.
  • Matching needed to be made more accurate and faster.
  • An independent sample of people living in institutions proved nearly impossible to match and process, both because the interviews relied on the same set of administrative records and because administrators often refused to give names, even to the Census Bureau.

By 1980, the precision of demographic analysis benefited from the fact that the part of the population not covered by either adequate birth registration data or Medicare data was now reduced to only those 45 to 65 (in 1980). However, immigration, especially illegal/undocumented/unauthorized immigration, remained a problem. Early demographic estimates for 1980, which again did not contain an allowance for illegal immigration, showed a net overcount of the population. However, pathbreaking work by Jeff Passel and colleagues produced the first estimates of the number of illegal immigrants counted in the census. This work was generally validated when data from the Immigration Reform and Control Act (IRCA) produced similar numbers of immigrants applying for legalization.

Although the 1980 PEP was not explicitly designed to correct the census for measured undercount, it was the first PES to be considered in this context. Increased use of census results for congressional, state, and local redistricting, as well as for federal funds allocation, highlighted the importance of census accuracy. The voting rights cases of the 1960s (Baker v. Carr (1962), Reynolds v. Sims (1964)) had greatly increased the importance of census data in redistricting. General Revenue Sharing funds, distributed in part based on census data, became an important source of local government revenue in the mid-1970s. The legal and statistical questions were discussed in academic journals and as part of several lawsuits, including influential suits by the City of Detroit and the City of New York. The U.S. Census Bureau's position was that the 1980 PEP was not of sufficient accuracy for this purpose, and this decision was upheld.

1990

Building on the knowledge gained in 1980, the Census Bureau made major design changes for the 1990 PES. Important changes included:

  • Excluding institutional population and military ships/barracks from the universe.
  • The use of a block sample tied to census geographic codes, with the same sample of blocks used for both the P and the E sample.
  • Repeated call-backs to reduce nonresponse and missing data.
  • A computer and computer-assisted clerical matching operation.
  • A model to account for missing data, taking into account the important covariates.

The design of the estimation cells (post-strata) was completely changed. Following the advice of John Tukey and others, the estimation cells were not restricted to a single state, but allowed to cross state lines. Thus, Hispanics living in Utah could be combined with Hispanics living in Colorado and other mountain states to form one estimation cell, rather than being combined with non-Hispanics living in Utah. A smoothing model was used to combine information within Census Region.

The 1990 PES was explicitly designed so that it could be used to adjust the census results. Specifically, model-based methods were developed to carry the estimates down to the smallest census geographic units (blocks) and to include positive or negative whole-person records to account for the measured net undercount or overcount. This complete file could then be aggregated to obtain data that was consistent for all geographical levels.

Many lessons were learned in 1990, many having to do with the need for tight operational control and testing. One important statistical lesson concerned the use of statistical smoothing methods. These methods became highly controversial and became the focus of much statistical analysis and debate. They were not well understood, and the U.S. Census Bureau decided to drop the use of smoothing and instead recompute the results with fewer and thus larger estimation cells.

Demographic analysis estimates went very smoothly in 1990, with birth registration and Medicare data covering all but those age 55 to 65. The IRCA data and the work of Jeff Passel and others (see Fay et al., 1988, Chapters 2


and 3) provided an allowance for undocumented immigrants. Further, for the first time, the Census Bureau produced explicit allowances for the uncertainty in the demographic analysis estimates. This analysis showed that the preferred, or point, demographic analysis estimates tended to fall at the lower end of the uncertainty range. However, this method of expressing the uncertainty range came under criticism from outside the Census Bureau. Limitations of this method are documented in Robinson et al. (1993) and Himes and Clogg (1992).

The 1990 Demographic Analysis estimates were in general agreement with the results of the 1990 PES. At the national level the two estimates were very close: a 1.8 percent undercount for demographic analysis (later revised to 1.7 percent) and 1.6 percent for the PES. At more detailed levels, differences emerged, especially the tendency for the PES to greatly underestimate the undercount for adult African-American males. Taking into account what was known about the biases and uncertainties of each, it seemed clear that both were measuring a real differential undercount, even though the PES was underestimating the amount for adult African-American males.

2000

In the early 1990s, task forces and National Academy of Sciences panels suggested that the differential undercount in the census could not be reduced without elaborate enumeration and matching procedures, which are too costly to be carried out except on a sample of the population. In the 1995 and 1996 Census Tests, an alternative Census Plus methodology was compared to the DSE. The DSE performed better, and subsequent research efforts focused on improving it. Consequently, most of the A.C.E. design can be seen as a continuation and refinement of the 1990 PES design. Among the important refinements are:

  • Much larger and better designed block sample.
  • Earlier interviewing, including the use of early telephone interviewing.
  • Computer-assisted (laptop) telephone and personal interviewing.
  • More refined estimation cells (post-strata).
  • Explicit collapsing rules to account for small cell size.
  • Explicit weight trimming rules in case of extraordinary (outlier) cells.

The survey universe was restricted to the housing unit/household population. All group quarters populations, not just military and institutional ones, were excluded. Consequently, the A.C.E. estimate of coverage error will be underestimated to the extent there were errors in the group quarters population.

Another concern is the treatment in the DSE of cases involved in the Housing Unit Duplication Operation (referred to as late census adds) and the level of whole-person imputations in the census. These records were not included in the A.C.E. matching, processing, or follow-up processes. They were also excluded from the DSE, although properly accounted for in computing the net undercount. It is possible that, had these records been included in the A.C.E. and the DSE, the estimated undercount would have differed. The number of excluded records is much larger than it was in 1990. If the ratio of matches to correct enumerations is the same for the excluded and included cases, the expected value of the DSE should be nearly the same. However, if the people referred to in the correct cases were either much more likely or much less likely to have been included in the A.C.E., then excluding these cases from the A.C.E. would have changed the level of correlation bias and affected the A.C.E. For more detail, see Hogan (2001).

There was a change in the treatment of people who had moved between April 1 and the time of the PES interview. In 1980 and 1990, these movers were sampled at their current (i.e., PES Interview Day) address. In the A.C.E., they were sampled at their Census Day, April 1, address.

Although conceptually much the same, the implementation of the search area was very different. In 1990, the entire search area was always to be searched for all cases in order to find matches or duplicates, and all cases were map-spotted to determine whether they were inside the search area. In 2000, the search of the surrounding blocks was restricted by both targeting and sampling. First, the surrounding blocks were searched only for certain kinds of cases, specifically cases where there was a likelihood of geocoding error in the basic census process. In addition, a stratified subsample was taken for this search, with only some of the initial sample blocks subjected to this extended search. This process was known as Targeted Extended Search, or TES.

Because of the difficulty in explaining and defending the 1990 smoothing methods, smoothing models were not employed. Instead, the A.C.E. relied upon a larger sample size and a more refined set of estimation cells to produce estimates.

Finally, although this was not a separate step, the A.C.E. was subjected to much more exacting specification, documentation, and testing than any previous coverage measurement study. Much of the operational success of the A.C.E. can be traced to the care and attention given to documentation and testing.


This document is then very much part of the overall A.C.E. process. It attempts to document, concisely and clearly as well as precisely and accurately, the A.C.E. design.
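The condition stated earlier for the excluded records (equal ratios of matches to correct enumerations in the excluded and included cases) can be checked with a short derivation. This is a sketch under simplifying assumptions, not the analysis in Hogan (2001): it uses aggregate totals for a single post-stratum, ignores sampling variability, holds the P-sample total N_p fixed, and assumes the excluded records would enter the estimator only through the estimated correct enumerations and the matches. It writes the post-stratum estimator of Chapter 2 (Activity 11), DSE = DD × (CE/N_e) × (N_p/M), as Ĉ N_p / M, where Ĉ = DD × CE/N_e is the estimated number of correct census enumerations, and uses subscripts i and x for the included and excluded cases.

```latex
\begin{aligned}
\mathrm{DSE}_{\mathrm{incl}} &= \hat{C}_i\,\frac{N_p}{M_i}, \qquad
\mathrm{DSE}_{\mathrm{all}} = \left(\hat{C}_i + \hat{C}_x\right)\frac{N_p}{M_i + M_x}, \\
\frac{M_x}{\hat{C}_x} = \frac{M_i}{\hat{C}_i} = \rho
&\;\Longrightarrow\; M_i + M_x = \rho\left(\hat{C}_i + \hat{C}_x\right) \\
&\;\Longrightarrow\; \mathrm{DSE}_{\mathrm{all}} = \frac{N_p}{\rho}
  = \hat{C}_i\,\frac{N_p}{M_i} = \mathrm{DSE}_{\mathrm{incl}}.
\end{aligned}
```

If the excluded cases instead had a different ratio of matches to correct enumerations, ρ would differ between the two groups and the two estimates would diverge, which is the correlation-bias concern raised above.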


Chapter 2.

Accuracy and Coverage Evaluation Overview

INTRODUCTION

The Accuracy and Coverage Evaluation Survey (A.C.E.) was designed primarily to measure the net undercoverage or overcoverage in the census enumeration. The methodology used was dual system estimation, which requires two independent systems of measurement. The P sample, or Population sample, measured the housing unit population, as did the census, but was conducted independently of the census. This was done by selecting a sample of block clusters (geographically contiguous groups of blocks) and interviewing the housing units obtained by independently canvassing each block cluster. The results of the P sample were matched to census enumerations to determine the omission rate in the census. Additionally, a sample of census enumerations, the E sample, was selected to measure the erroneous enumeration rate in the census. The E sample consisted of census enumerations in the same sample block clusters as the P sample. These overlapping samples reduced the variance of the dual system estimator, reduced the amount of field activities and their cost, and resulted in efficient data processing.

There were considerable challenges in the implementation of the A.C.E. One of the requirements of the A.C.E. was to produce measures of net undercount or overcount shortly after the census counts were compiled. This was a daunting task because the requirement for independence meant that A.C.E. activities could not interfere with, or in any way affect, the results of the census enumerations, or vice versa. As with most surveys, the A.C.E. consisted of designing a sample, creating a frame, selecting the sample, conducting the interviews, dealing with nonresponse and missing information, and producing the estimates. In addition, the A.C.E. had several matching and field follow-up activities. In order to accomplish these tasks and meet the goals of the A.C.E. in a timely manner, its design was uniquely built around census operations. Additionally, to ensure quality with such a compressed time schedule, it was essential that software systems be written and thoroughly tested prior to the start of an activity.

One census operation that had major influence on the A.C.E. design and estimation plan was the Housing Unit Duplication Operation. As the census questionnaires were being processed, the Census Bureau suspected that there was a significant number of duplicate addresses in the census files. To address the suspected housing unit duplication, the Housing Unit Duplication Operation was introduced in the fall of 2000. See Nash (2000) for further details. The primary goal of this census operation was to improve the quality of the census; however, its design allowed the A.C.E. operations to proceed. Essentially, suspected duplicate housing units were temporarily removed from the census files while further analysis was done for these cases. Approximately 5.9 million person records were in these suspected duplicate housing units, which were: 1) out-of-scope for the E-sample component of the A.C.E., 2) not available for the person matching, including the identification of person duplicates in the E sample, and 3) excluded from the census component in the dual system estimates. Approximately 2.3 million person records were reinstated into the census after the E sample was selected and were reflected in the net coverage estimates. Hogan (2001) showed that excluding these person records from the A.C.E. would not affect the dual system estimates if the number of P-sample matches was reduced proportionately to the number of E-sample correct enumerations.

This chapter summarizes the major activities of the A.C.E. and indicates their relationship to the census. Subsequent chapters go into considerably greater detail about the methodology of the A.C.E. and are organized as follows:

  • Chapter 3. Design of the A.C.E. Sample
  • Chapter 4. A.C.E. Field and Processing Activities
  • Chapter 5. Targeted Extended Search
  • Chapter 6. Missing Data Procedures
  • Chapter 7. Dual System Estimation
  • Chapter 8. Model-Based Estimation for Small Areas

The intent of this chapter is to provide a broad context for the design of the A.C.E. Here we give a sequential accounting of these activities. Table 2-1 gives the order in which the A.C.E. activities occurred and maps the activities to the chapter where each is discussed in further detail. This table shows the substantial integration of the sampling and operational activities. Figure 2-1 shows the flow of the major activities.
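As a reference point for the activities that follow, the post-stratum dual system estimator defined under Activity 11, DSE = DD × (CE/N_e) × (N_p/M), can be written as a small Python function. This is a minimal sketch, not production code: the class and function names are hypothetical, the inputs are assumed to be already-weighted post-stratum totals, and the net undercount rate uses the conventional definition (DSE - census count)/DSE, which this section does not itself state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostStratumTotals:
    """Weighted A.C.E. totals for one post-stratum (symbols as in Activity 11)."""
    dd: float   # DD: data-defined census persons at the time of A.C.E. matching
    ce: float   # CE: weighted E-sample correct enumerations
    ne: float   # N_e: weighted E-sample total (people in the census)
    np_: float  # N_p: weighted P-sample total (people found by the A.C.E.)
    m: float    # M: weighted P-sample people matched to census enumerations


def dual_system_estimate(t: PostStratumTotals) -> float:
    """DSE = DD * (CE / N_e) * (N_p / M)."""
    if min(t.ne, t.np_, t.m) <= 0:
        raise ValueError("N_e, N_p and M must be positive")
    correct_enumeration_rate = t.ce / t.ne  # measured by the E sample
    match_rate = t.m / t.np_                # measured by the P sample
    return t.dd * correct_enumeration_rate / match_rate


def net_undercount_rate(dse: float, census_count: float) -> float:
    """Assumed convention: (DSE - census count) / DSE; negative means net overcount."""
    return (dse - census_count) / dse


# Made-up totals for a single hypothetical post-stratum.
totals = PostStratumTotals(dd=1000, ce=950, ne=1000, np_=1000, m=900)
dse = dual_system_estimate(totals)
print(round(dse, 1), round(net_undercount_rate(dse, census_count=1000), 4))
```

With these invented totals the correct enumeration rate is 0.95 and the match rate is 0.90, so the DSE is 9,500/9 (about 1,055.6) and, against a census count of 1,000, the net undercount rate is 1/19 (about 5.3 percent). Keeping CE/N_e and M/N_p as separate factors mirrors the division of labor described above: the E sample measures erroneous enumeration and the P sample measures omission.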


Table 2-1. Sequence of A.C.E. Activities

Activity  Description                                       Chapter(s)
1         First-phase sampling                              3
2         Independent listing                               4
3         Second-phase sampling                             3
4         Initial housing unit matching/field follow-up     4
5         Targeted extended search                          4 and 5
6         Subsampling within large block clusters           3
7         A.C.E. person interviewing                        4
8         E-sample identification                           3 and 4
9         Person matching and field follow-up               4
10        Missing data processing                           6
11        Dual system estimation                            7
12        Model-based estimation for small areas            8

Table 2-2 further illustrates the integration of the sampling activities and operations by summarizing the sample size at each phase of sampling and the operations for which the sample is an input. The data collected from each operation is input to the next sampling operation. For example, the first phase of sampling resulted in 29,136 sample areas with almost 2 million housing units. Independent address lists were created for these areas. The results of the independent listing were used in the second phase of sampling.

Activity 1: First-Phase Sampling

Timing: March through June, 1999; prior to the creation of the census address list.

At the time of the January, 1999 Supreme Court ruling against the use of sampling for apportionment, the Census Bureau was heavily involved in the first phases of sampling for the Integrated Coverage Measurement (ICM). The goal of the ICM was to produce reliable estimates of coverage of each state's total population, and this required a very large sample: a 750,000 housing unit sample was planned. As a result of the Supreme Court ruling, state population estimates for apportionment were no longer key estimates of the coverage survey; instead, the goal was to measure census coverage for national and subnational population domains having different census coverage properties. These estimates could be measured with sufficient precision with a sample of about 300,000 housing units.

Rather than abandon the effort (software development, etc.) that had already been invested in the ICM, it was more efficient, particularly from a software quality perspective, to complete the sampling for the ICM and then select a subsample for the A.C.E. The infrastructure for the field staff was being deployed in preparation for the first field operation, which started in September, 1999, and the development of the sampling system, which was scheduled to begin production in March, 1999, was well underway. There was not adequate time to redesign the A.C.E. sample allocation entirely, select the sample, produce the different listing materials including maps, conduct the listing as scheduled, and ensure a high level of quality in a revised software system. Consequently, the A.C.E. sample design was derived from the ICM design using a double sampling approach. The entire ICM sample was selected as originally planned and then reduced through various steps to yield the A.C.E. target housing unit sample.

The first-phase sampling consisted of:

  • Forming primary sampling units.
  • Stratifying primary sampling units.
  • Systematic sampling of primary sampling units.

The A.C.E. primary sampling unit was the block cluster, a group of one or more geographically contiguous census blocks. To make field workloads efficient, the target size of block clusters was about 30 housing units, although block clusters varied in size. Within each state, block clusters were stratified by size using housing unit counts from a preliminary census address list: small (0 to 2 housing units), medium (3 to 79 housing units), and large (80 or more housing units). Some states included a separate sampling stratum for American Indian Reservations. Within each sampling stratum, a systematic sample of block clusters was selected with equal probability.

This phase of sampling yielded 29,136 block clusters with an estimated 2 million housing units in the 50 states and the District of Columbia.
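The first-phase stratification and selection steps can be sketched in Python. The size boundaries come from the text above; the sampling intervals, cluster identifiers, and function names in the example are hypothetical, and the sketch leaves out the American Indian Reservation stratum and any sort order the production system may have applied before systematic selection.

```python
import random
from collections import defaultdict


def size_stratum(housing_units: int) -> str:
    """First-phase size stratum: small (0-2), medium (3-79), large (80+) housing units."""
    if housing_units < 0:
        raise ValueError("housing unit count cannot be negative")
    if housing_units <= 2:
        return "small"
    if housing_units <= 79:
        return "medium"
    return "large"


def systematic_sample(units: list, interval: float, rng: random.Random) -> list:
    """Equal-probability systematic sample with a random start.

    Each unit is selected with probability 1 / interval (requires interval >= 1).
    """
    if interval < 1:
        raise ValueError("sampling interval must be at least 1")
    start = rng.random() * interval  # random start in [0, interval)
    picks = []
    k = 0
    while start + k * interval < len(units):
        picks.append(units[int(start + k * interval)])
        k += 1
    return picks


def first_phase_sample(clusters, intervals, seed=0):
    """Stratify (cluster_id, housing_units) pairs by size, then sample each stratum.

    `intervals` maps stratum name -> sampling interval (hypothetical values).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for cluster_id, housing_units in clusters:
        strata[size_stratum(housing_units)].append(cluster_id)
    return {name: systematic_sample(ids, intervals[name], rng)
            for name, ids in strata.items()}


# 200 made-up block clusters with assorted housing unit counts.
counts = [0, 1, 5, 30, 45, 120, 2, 60, 85, 31] * 20
clusters = [(f"BC{i:04d}", hu) for i, hu in enumerate(counts)]
sample = first_phase_sample(clusters, {"small": 10, "medium": 5, "large": 2})
print({name: len(ids) for name, ids in sample.items()})
```

With these inputs there are 60 small, 100 medium, and 40 large clusters, and intervals of 10, 5, and 2 select 6, 20, and 20 of them. A systematic draw with a random start gives every cluster in a stratum the same selection probability, 1/interval, which is the equal-probability property described in the text.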

Table 2-2. Sample Sizes by Sampling Phase and Operation

                                          Sample size
Sampling phase                         Areas   Housing units   Operations
First-phase                           29,136       1,989,000   Independent listing
Second-phase                          11,303         844,000   Initial housing unit matching/follow-up
Subsampling within large cluster      11,303         301,000   A.C.E. person interviewing, person
  (P sample)                                                   matching/follow-up, dual system estimation
E-sample identification               11,303         311,000   Person matching/follow-up, dual system
                                                               estimation
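Because the A.C.E. sample was reached through successive phases (the double sampling approach behind Table 2-2), a housing unit's overall chance of ending up in the final sample is the product of its conditional selection probabilities at each phase, and the inverse of that product is its base weight. The Python sketch below illustrates this under loudly stated assumptions: the phase probabilities are invented for illustration (the actual rates varied by state and stratum and are not given in this section), and the function names are hypothetical. The example mirrors the Activity 6 point that within-cluster subsampling brought large-cluster housing units more in line with medium-cluster ones.

```python
from math import prod


def overall_selection_probability(phase_probabilities):
    """Multiphase sampling: product of the conditional selection probabilities."""
    for p in phase_probabilities:
        if not 0 < p <= 1:
            raise ValueError(f"phase probability {p} is not in (0, 1]")
    return prod(phase_probabilities)


def base_weight(phase_probabilities):
    """Base sampling weight: inverse of the overall selection probability."""
    return 1.0 / overall_selection_probability(phase_probabilities)


# Hypothetical (first phase, second phase, within-cluster subsampling) probabilities.
large_cluster_unit = (0.02, 0.5, 0.5)   # higher first-phase rate, half the segments kept
medium_cluster_unit = (0.01, 0.5, 1.0)  # no within-cluster subsampling

p_large = overall_selection_probability(large_cluster_unit)
p_medium = overall_selection_probability(medium_cluster_unit)
print(p_large, p_medium, base_weight(large_cluster_unit))
```

In this made-up case both housing units end with an overall probability of 0.005 and a base weight of 200, even though the large-cluster unit entered the first phase at twice the rate.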

Activity 2: Independent Listing

Timing: September through early December, 1999; well before census enumeration began.

Field staff visited the sample block clusters and created an independent address list of all housing units, including housing units at special places. The goal of this operation was to create an independent address frame of all the housing units that were likely to exist on Census Day, April 1, 2000. Since this operation occurred prior to Census Day, any potential housing unit structures were included on the independent address list. Later, during housing unit follow-up, these structures were visited to confirm that they actually contained housing units on Census Day. Since housing units could not be added to the independent address frame in this later operation, but could be removed, it was important to include structures with questionable housing unit status during the independent listing.

This listing consisted of approximately 2 million housing units or potential housing units in the 50 states and the District of Columbia.

Activity 3: Second-Phase Sampling

Timing: December, 1999 through February, 2000; prior to mailing the census questionnaire.

The second phase of sampling selected block clusters from the first phase to be the final A.C.E. sample areas. Block clusters were stratified using two housing unit counts: 1) a count from the independent listing operation, and 2) a count from the updated census address list as of January, 2000. It was important to reduce the first-phase sample before the next operations, the housing unit matching and field follow-up, to reduce the number of clusters going into those operations. The stratification of the block clusters was done separately by first-phase sampling strata: 1) medium and large strata, and 2) small strata. All first-phase clusters from the American Indian Reservation stratum were retained in the second-phase sample.

Medium and large strata. The resulting national sample allocation was roughly proportional to state population, with some differential sampling within states. The two goals of the differential sampling were: 1) to provide sufficient sample to support reliable estimates for several subpopulations, and 2) to reduce the variance contribution due to clusters with the potential for high omission or erroneous enumeration rates. These clusters were identified and put into separate sampling strata by comparing the consistency of housing unit counts between the independent list and the updated census list for each cluster.

Small cluster stratum. Conducting interviews and follow-up operations in small block clusters is much more costly per housing unit than in medium or large block clusters. Lower sampling rates were, therefore, used in this stratum. However, two considerations were taken into account in establishing the lower rates. One goal was to avoid having small clusters with an overall probability of selection much lower than the probability of selection of other clusters in the sample. A second goal was to have higher probabilities of selection for small clusters in which the number of housing units was greater than the expected 0 to 2 housing units. These two goals were intended to reduce the contribution of small clusters to the variance of the dual system estimates. Small block clusters with the potential for high erroneous enumeration or nonmatch rates were retained at higher rates. The second-phase sample contained 11,303 block clusters for the 50 states and the District of Columbia.

Activity 4: Initial Housing Unit Matching and Field Follow-Up

Timing: February through April, 2000; prior to census nonresponse follow-up.

The objectives of these operations were:

1. Create a list of confirmed A.C.E. housing units in order to:
     • obtain the best list of housing units to facilitate person interviewing in later activities.
     • have better control of the final A.C.E. housing unit sample size.
2. Establish a link between the A.C.E. and census housing units in order to:
     • identify the A.C.E. housing units eligible for telephone interviewing.
     • facilitate overlapping P and E samples.
3. Identify potential geocoding errors in order to:
     • establish the targeted extended search sampling frame.
     • identify sample areas for which the creation of a new independent address list, or relisting, was necessary.

Housing unit matching. The housing units on the census address list in January, 2000 were matched to the A.C.E. independent address list. First, the addresses were computer matched. The computer matching was followed by a clerical review of the computer match results in an automated environment intended to find additional matches using supplemental materials. There was also a clerical search, limited to the block cluster, for duplicate housing units during this phase of the matching. Possible duplicates in both the A.C.E. and the census were identified.
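The computer-matching step described under Housing unit matching can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the address normalization rules and the exact-key comparison are hypothetical simplifications, not the Census Bureau's matching software, and the leftover nonmatches and repeated keys stand in for the cases that went on to clerical review and field follow-up.

```python
import re
from collections import Counter

# Hypothetical standardization table; the production rules are not described here.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "APARTMENT": "APT"}


def normalize_address(address: str) -> str:
    """Build a comparison key: upper-case, strip punctuation, standardize a few words."""
    words = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def match_housing_units(ace_addresses, census_addresses):
    """Exact matching on normalized keys within one block cluster.

    Returns sorted lists of matched keys, nonmatches on each side, and keys that
    occur more than once on either list (possible duplicates).
    """
    ace_counts = Counter(normalize_address(a) for a in ace_addresses)
    census_counts = Counter(normalize_address(c) for c in census_addresses)
    ace_keys, census_keys = set(ace_counts), set(census_counts)
    duplicates = {k for k, n in ace_counts.items() if n > 1}
    duplicates |= {k for k, n in census_counts.items() if n > 1}
    return {
        "matches": sorted(ace_keys & census_keys),
        "ace_nonmatches": sorted(ace_keys - census_keys),
        "census_nonmatches": sorted(census_keys - ace_keys),
        "possible_duplicates": sorted(duplicates),
    }


result = match_housing_units(
    ["12 Main Street", "14 Main St.", "3 Oak Road"],   # made-up A.C.E. listing
    ["12 MAIN ST", "12 Main St", "7 Elm Avenue"],       # made-up census list
)
print(result)
```

Here "12 Main Street" matches the census entry, "14 Main St." and "3 Oak Road" remain A.C.E. nonmatches, "7 Elm Avenue" is a census nonmatch, and the two census spellings of 12 Main St are flagged as possible duplicates, the kinds of cases the text says were sent on for review and follow-up.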


Housing unit follow-up. In some cases, the computer and clerical matching were not able to determine the status of a housing unit. Field staff visited these cases to get more information about these housing units. After matching, the cases that were not matched, were possibly matched, or were possible duplicates were sent to the field for follow-up interviews. Some of the matched cases were also sent for additional information. The field follow-up was designed to determine if a housing unit existed, if it existed in the block cluster, or if different addresses were referring to the same housing unit.

Activity 5: Targeted Extended Search

Timing: May, 2000.

The targeted extended search was designed to improve the accuracy of the dual system estimate by searching for matches, correct enumerations, and duplicates one ring beyond the sample block cluster. The operation was implemented in a subset of A.C.E. block clusters selected through a combination of certainty and probability sampling.

There are census geocoding errors of exclusion and inclusion in the A.C.E. sample block clusters. Census geocoding errors of exclusion (i.e., housing units miscoded in the census so they appear to be outside the A.C.E. block cluster) affect the P-sample match rate. Census geocoding errors of inclusion (i.e., housing units miscoded in the census to appear inside the block cluster) affect the erroneous enumeration rate in the census or E sample. If the census housing unit is omitted from the sample block cluster, the P-sample household cannot be matched, which yields a lower match rate. On the E-sample side, if a housing unit is included in the sample block cluster due to a geocoding error, the E-sample people will be considered erroneously enumerated.

The primary motivation for using an extended search area was to reduce the sampling variance of the dual system estimates due to census geocoding error. Even though the extended search allowed more P-sample people to be matched and more E-sample people to be converted to correct enumerations, the expected value of the dual system estimate should not be affected as long as the two samples were treated equally with respect to the search area. Another benefit is that the extended search makes the dual system estimate more robust by protecting against potential bias due to P-sample geocoding error.

Previous census evaluations have shown that geocoding errors are highly clustered. The targeted extended search was designed to take advantage of the distribution of geocoding errors by focusing on those clusters that contain the most potential geocoding errors. The implementation of this operation resulted in dual system estimates with more precision.

The initial housing unit matching results were used to identify the A.C.E. housing unit nonmatches and census housing unit geocoding errors. Clusters without A.C.E. housing unit nonmatches or census geocoding errors were out-of-scope for the targeted extended search sampling. Changes to the census inventory of housing units after January, 2000 were not reflected in the housing unit matching used to identify targeted extended search clusters.

Only whole households of nonmatched people were eligible for the extended search during person matching. Partial household nonmatches (i.e., households in which some members were matches) were not as likely to indicate that the housing unit was a geocoding error.

Activity 6: Subsampling Within Large Block Clusters

Timing: April and May, 2000; during census nonresponse follow-up.

Subsampling was used in large block clusters for the final selection of housing units to participate in the P sample. The objective was to reduce costs and yield manageable field workloads without seriously affecting the precision of the A.C.E., by taking advantage of the high intra-class correlation expected in large block clusters. Since the large block clusters had a higher initial probability of selection than medium block clusters, the reduction in sample size had a fairly minor effect on the precision of the A.C.E. estimates. The subsampling of housing units within large clusters brought the overall probability of selection of these housing units more in line with housing units in the medium clusters.

Any block cluster with 80 or more confirmed A.C.E. housing units, based on the initial housing unit match, was eligible for this housing unit reduction. The reduction of housing units within a large block cluster was done by forming groups of adjacent housing units, called segments, and selecting one or more segments for A.C.E. person interviewing. The segments had roughly equal numbers of housing units within a block cluster. Segments of housing units were used as the sampling unit in order to obtain compact interviewing workloads and to facilitate the identification of an overlapping E sample. The A.C.E. housing units that were retained after all of the subsampling comprise the P sample.

After the reduction of housing units within large block clusters was completed, the A.C.E. interview sample size for the 50 states and the District of Columbia was approximately 300,000 housing units.

Activity 7: A.C.E. Person Interviewing

Timing: April through mid-June, 2000 for the telephone phase; mid-June through mid-September, 2000 for the personal visit phase; after census enumeration was complete.


The goal of the A.C.E. person interview was to provide a list of persons who lived at the sample address on Census Day, as well as those who lived at the address at the time of A.C.E. interviewing. The A.C.E. person interview was conducted using a Computer Assisted Personal Interview (CAPI) instrument.

To get an early start on interviewing, a telephone interview was conducted at households for which the census questionnaire was data-captured and included a telephone number. Both households with mail returns and households with enumerator-filled questionnaires were eligible for telephone interviews. Certain types of housing units, such as those without a house number and street name, were not eligible for a telephone interview. All remaining interviews following the telephone operation were conducted in person. However, some nonresponse conversion operation interviews and interviews in gated communities or secured buildings were conducted by telephone.

The person interview was conducted only with a household member during the first 3 weeks of interviewing. If an interview with a household member was not obtained after 3 weeks, an interview with a nonhousehold member was attempted. This was called a proxy interview. Proxy interviews were allowed during the remainder of the interviewing period. During the last 2 weeks of interviewing, a nonresponse conversion operation was attempted for the noninterviews using interviewers who were considered to be the best available.

Activity 8: E-Sample Identification

Timing: October, 2000.

The E sample consisted of the census enumerations in the same sample areas as the P sample. All data-defined census person records in the A.C.E. block clusters were eligible to be in the E sample.¹ To be a census data-defined person, the person record must have two 100-percent data items filled. Name was not required for the person record to be considered data-defined, but could be one of the two required items. As with the P sample, it was sometimes necessary to subsample the census housing units in a cluster when it contained a large number of census housing units. The goal of the E-sample identification was to create overlapping P and E samples in an effort to reduce person follow-up workloads. An overlapping P and E sample is not necessary, but it improves both the cost effectiveness of the subsequent operation and the precision of the dual system estimates.

¹ Excludes data-defined person records temporarily removed from the census.

If a block cluster had fewer than 80 census housing units, then all of the census housing units in the block cluster were in the E sample. For block clusters with 80 or more census housing units, the within-cluster segments of adjacent housing units defined for the P-sample reduction were mapped onto the census records. This was possible when a link between the census and A.C.E. housing unit was established during the initial housing unit matching. Using specific rules, census housing units that did not have this link were assigned to a segment. The segment selected for the P sample was selected for the E sample. If the sample segment contained 80 or more census housing units with no established link to an A.C.E. housing unit, then a systematic sample of these housing units was selected to reduce the E-sample person follow-up workloads.

This resulted in approximately 311,000 census housing units in the E sample for the 50 states and the District of Columbia.

Activity 9: Person Matching and Field Follow-Up

Timing: October and November, 2000.

Insufficient information for matching. Rules were established for determining which person records had sufficient information for matching. These rules were established and applied before the start of the matching operation to avoid introducing potential bias into the matching results. Both the P and E samples used the same rules. Each person record required a complete name and two other characteristics.

Person matching. All P-sample persons who lived at each sample housing unit on Census Day were matched to the people enumerated in the census to estimate the match rate. Census persons in the E sample who matched to the P sample were considered to be correctly enumerated. The E-sample person records that did not match to the P sample were interviewed during field follow-up operations to classify them as correctly or erroneously enumerated. This matching was a computer operation with clerical review. Variables such as name, address, date of birth, age, sex, race, Hispanic origin, and relationship were used to identify matches between the P sample and the census enumerations. Duplicates were identified in both the P sample and E sample. If a case qualified for targeted extended search, the search for matches and duplicates was extended to the ring beyond the sample block cluster.

Person follow-up. The person follow-up interview collected additional information that was sometimes necessary for the accurate coding of the residence status of the nonmatched P-sample people and the enumeration status of the nonmatched E-sample people. The goal of this operation was to confirm that ambiguous P-sample nonmatches actually lived in the sample block cluster on Census Day. Thus, follow-up interviews for P-sample nonmatched cases were carried out when there was a possibility the residence status was not correct. Similarly,

E-sample nonmatch cases were subject to follow-up interviews to determine if they were correctly or erroneously enumerated in the block cluster. Possible matches were interviewed to resolve their match status. There were also other cases sent to follow-up, such as matched people with unresolved residence status and other types of cases considered to have the potential for geographic errors in the P sample. The person follow-up interview used a paper questionnaire. Interviewers gathered information that permitted each person to be coded as a matched resident/nonresident or a nonmatched resident/nonresident of the block cluster on Census Day. There was considerable emphasis on obtaining a knowledgeable respondent before the follow-up questions were asked. After the follow-up interview was completed, the results were reviewed by clerks who assigned final status to these cases using an automated system.

Activity 10: Missing Data Processing

Timing: December, 2000 through the early part of January, 2001.

Since the results of the matching operation were to be used in the estimation phase of the A.C.E., it was necessary to determine the match, correct enumeration, and residence status of all sample cases. When these could not be resolved through computer and clerical matching or through field follow-up interviews, the match, correct enumeration, or residence probabilities were imputed based on the distribution of outcomes of the resolved follow-up interviews. Also, as in the census, some respondents did not answer all the questions in the A.C.E. interview that were needed for estimation. If the variables tenure, sex, race, Hispanic origin, or age were blank for P-sample individuals, the missing information was imputed based on the distribution of the variable within the household, the overall distribution of the variable, or hot-deck methods, depending on the variable. Imputation for missing information in the E sample was resolved in the census processing. Finally, a noninterview adjustment was made to account for the weights of households that should have been interviewed in the A.C.E., but were not.

Activity 11: Dual System Estimation

Timing: Late January, 2001.

Dual system estimation was used to estimate the net undercount or overcount of the household population included in the census. Coverage estimates of persons living in group quarters or in Remote Alaska areas were not made.

The term dual system estimation is used because data from two independent systems are combined to measure the same population. After matching to the census, the P sample was used to measure the omission rate in the census. The E sample was used to measure the erroneous enumeration rate in the census. The dual system estimator assumes that all persons have the same probability of being captured in the census. This is obviously an oversimplification of the existing situation. Post-stratification sharply reduced the likelihood that this assumption would bias the results, since it only requires equal capture probabilities within post-strata.

Post-stratification. Dual system estimation was used to calculate the proportion of persons missed in each of a number of relatively homogeneous population groups called post-strata. The post-strata for the Census 2000 A.C.E. were defined by the variables race/Hispanic origin domain, age/sex, tenure, census region, metropolitan statistical area size/type of enumeration area, and census return rate. A complete cross-classification of these variables would have unnecessarily increased the variances of the estimates due to small expected sample sizes in many of the post-strata. Consequently, many of the detailed cells were combined. In the United States, there were 448 potential post-strata, which were collapsed to 416 post-strata on the basis of small observed sample sizes or high coefficients of variation.

The dual system estimate. The dual system estimate (DSE) for each post-stratum was defined by:

$$\mathrm{DSE} = DD \times \frac{CE}{N_e} \times \frac{N_p}{M}$$

where DD was the number of data-defined persons in the census at the time of A.C.E. matching,² CE was the weighted estimate of the number of people in the census who were correctly enumerated, N_e was the weighted estimate of the number of people in the census, N_p was the weighted estimate of the number of people found by the independent A.C.E. collection procedures, and M was the weighted estimate of the number of persons found by the independent A.C.E. collection procedures who were matched to persons enumerated in the census.

² The data-defined persons term excludes cases temporarily removed from the census.

Activity 12: Model-Based Estimation for Small Areas

Timing: February, 2001.

Activities 1 through 11 were designed to provide estimates of net coverage for Census 2000. These estimates can serve two purposes. One purpose was to provide information on the quality of the census so that analysts can make more intelligent use of the data, and to help the Census Bureau improve procedures for future censuses. The second purpose was to have a basis for adjusting the census counts for net coverage, if deemed appropriate.

The sample sizes used in the A.C.E. provided adequate

2-6 Section IChapter 2 Accuracy and Coverage Evaluation Overview U.S. Census Bureau, Census 2000

reliability for such estimates for the U.S. as a whole, and for major geographical areas. However, the sample sizes were too small to provide reliable estimates for most states, counties, cities, and the thousands of other municipalities that normally make use of census data. As a result, model-based estimation was used in these areas.

Model-based estimation treats the coverage correction factors as uniform within a given post-stratum. In other words, the coverage error rate for a given post-stratum is assumed to be the same in all geographic areas. This assumption is obviously an oversimplification, and it introduces small errors. However, the model-based estimates provide a consistent set of estimates in which the sums of the population counts for small areas equal the dual system estimates for much larger areas (e.g., the U.S. total, regions, etc.).

Coverage correction factors were obtained by dividing the dual system estimates by the census counts of persons in housing units. Persons in group quarters were not adjusted for net coverage. Coverage correction factors for population groups that generally had good coverage were close to 1.00. Population groups with poor coverage had coverage correction factors higher than 1.00, while coverage correction factors less than 1.00 occurred in a post-stratum when the erroneous enumeration rate in the census exceeded the omission rate.

A coverage correction factor was calculated for each post-stratum. If a post-stratum was estimated to have more persons than the census count, a random sample of the appropriate size of census people in the post-stratum was selected within each block, and the data of the selected people were replicated in their blocks with a weight of +1. If a post-stratum was estimated to have fewer people than the census count, a random sample of the appropriate size of people in the post-stratum was selected within each block, and the data of the selected people were replicated in their blocks with a weight of -1. Under this procedure, no reported data for any individual were removed from the Census 2000 data files. A controlled rounding procedure was used to produce integer-valued model-based estimates at various geographic levels.

Estimates were made at various levels by aggregating the data from the appropriate blocks and/or post-strata.
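To make the arithmetic concrete, the sketch below computes the dual system estimate for one post-stratum from the formula given under Activity 11, divides it by the census count of persons in housing units to obtain the coverage correction factor, and reports how many census records would be replicated and with which weight (+1 or -1). It is a minimal illustration under stated assumptions: the input values are invented, the names `PostStratum`, `dual_system_estimate`, `coverage_correction_factor`, and `replication_plan` are hypothetical, and the replication step uses plain rounding of the post-stratum total instead of the within-block sampling and controlled rounding described above.

```python
from dataclasses import dataclass


@dataclass
class PostStratum:
    """Weighted A.C.E. inputs for one post-stratum (values are illustrative)."""
    dd: float   # DD: data-defined census persons at the time of A.C.E. matching
    ce: float   # CE: weighted estimate of correctly enumerated census persons
    n_e: float  # Ne: weighted estimate of census persons
    n_p: float  # Np: weighted estimate of persons found by the independent A.C.E.
    m: float    # M: weighted estimate of P-sample persons matched to the census


def dual_system_estimate(ps: PostStratum) -> float:
    """DSE = DD * (CE / Ne) * (Np / M)."""
    correct_enumeration_rate = ps.ce / ps.n_e  # share of census enumerations that are correct
    inverse_match_rate = ps.n_p / ps.m         # 1 / (share of the P sample matched to the census)
    return ps.dd * correct_enumeration_rate * inverse_match_rate


def coverage_correction_factor(dse: float, census_hu_persons: int) -> float:
    """DSE divided by the census count of persons in housing units."""
    return dse / census_hu_persons


def replication_plan(dse: float, census_hu_persons: int) -> tuple[int, int]:
    """Return (number of census persons to replicate, weight of +1 or -1).

    Simplification: plain rounding of the post-stratum total. The A.C.E.
    sampled persons within each block and used controlled rounding instead.
    """
    difference = round(dse) - census_hu_persons
    weight = 1 if difference >= 0 else -1
    return abs(difference), weight


if __name__ == "__main__":
    stratum = PostStratum(dd=9600, ce=9000, n_e=10000, n_p=10000, m=8000)
    dse = dual_system_estimate(stratum)                 # 9600 * 0.9 * 1.25
    print(dse, coverage_correction_factor(dse, 10000))  # 10800.0 1.08
    print(replication_plan(dse, 10000))                 # (800, 1)
```

With these made-up inputs the post-stratum is undercounted, so 800 census records would be replicated with a weight of +1; a DSE below the census count would instead yield a weight of -1.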


[Figure 2-1. Flow of Major A.C.E. Activities With Respect to Major Census Activities. Flowchart linking census operations (initial and updated census address lists, census enumerations and nonresponse follow-up, person duplication identification, unedited and edited census person files, adjusted census person file) with A.C.E. operations (initial and second phase P-sample sampling, independent listing, housing unit matching and follow-up, subsampling of housing units in large clusters, targeted extended search, person interviewing, E-sample identification, person matching and follow-up, missing data process, dual-system estimation, model-based estimation).]

Chapter 3.

Design of the A.C.E. Sample

INTRODUCTION

The A.C.E. sample design was a multiphase, national sample of 301,000 housing units. Its development was heavily influenced by its planned predecessor, the Integrated Coverage Measurement survey (ICM). Initial plans for Census 2000 were for a one-number census corrected for coverage based on the ICM. A primary purpose of the ICM was to produce direct state estimates of coverage with sufficient reliability for apportionment population counts. This called for a state-based design and a much larger overall national sample of 750,000 housing units. The January 1999 Supreme Court ruling against the use of sampling for apportionment resulted in a change of plans for the Census 2000 coverage survey, whose primary goal became the production of reliable national census coverage estimates, including estimates for selected sub-populations. This did not require as large a sample.

The A.C.E. sample design was derived from the ICM sample design. By the time the plans for the Census 2000 coverage survey changed, many operational plans for the ICM were too far advanced to make the significant changes a newly conceived sample design would have required. The implementation plans and software systems for creating the sampling frame and selecting the ICM sample were almost ready to start. Much of the field office infrastructure and staffing was being put in place for the first field operation under the ICM sample plan. It was critical to proceed as planned in order to meet schedules.

The A.C.E. sampling plan was thus developed as a multiphase design. The much larger ICM sample was selected first. Field staff canvassed the sample areas to create an independent address list. Then, using updated measures of size from the field canvass, the ICM sample was re-stratified and reduced with differential probabilities of selection to create the A.C.E. sample design.

The sections on the A.C.E. sample and its design are directed to a general audience. They provide results of the A.C.E. sample along with a broad overview of the sample design. Later sections of this chapter provide a more in-depth description of the A.C.E. design for readers who desire greater detail.

A.C.E. SAMPLE OVERVIEW AND RESULTS

The A.C.E. consisted of two parts. The Population Sample (P sample) and the Enumeration Sample (E sample) have traditionally defined the samples for dual system estimation. Both the P sample and the E sample measured the same household population. However, the P-sample operations were conducted independently of the census. The E sample consisted of census enumerations in the same sample areas as the P sample. After matching with the census lists and reconciliation, the P sample yields an estimated rate at which the population was missed in the census, whereas the E sample yields an estimated rate at which enumerations were erroneously included in the census. Combining them yields an A.C.E. estimate of net census coverage of the household population.

The Accuracy and Coverage Evaluation had three sampling phases:

1. First-phase sample. The selection of the ICM sample, comprising a large number of sample areas for which a list of housing unit addresses was created independently of the census.
2. Second-phase sample. The reduction of the first-phase sample, which resulted in the A.C.E. sample areas.
3. Third-phase sample. The reduction of housing units by subsampling within unusually large A.C.E. sample areas.

Table 3-1 summarizes the A.C.E. sample size after each phase of sampling for the United States. The dates given in the table are the production dates. The housing unit counts are approximate, based on the best known information at the time of the particular sampling phase.
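Because the sample was drawn in phases, a housing unit's overall probability of selection is the product of its conditional selection probabilities at each phase, and its base sampling weight is the inverse of that product. The sketch below illustrates this with hypothetical phase rates (not actual A.C.E. parameters), chosen to show how a higher first-phase rate for large clusters can be offset by third-phase subsampling; the function names are also hypothetical.

```python
from fractions import Fraction
from math import prod


def overall_selection_probability(phase_probs):
    """Multiply the conditional selection probabilities of successive phases."""
    for p in phase_probs:
        if not 0 < p <= 1:
            raise ValueError(f"phase probability out of range: {p}")
    return prod(phase_probs, start=Fraction(1))


def sampling_weight(phase_probs):
    """The base weight is the inverse of the overall selection probability."""
    return 1 / overall_selection_probability(phase_probs)


if __name__ == "__main__":
    # Hypothetical rates for phase 1 (cluster), phase 2 (reduction), phase 3 (within cluster).
    medium = [Fraction(1, 50), Fraction(1, 2), Fraction(1)]
    large = [Fraction(1, 25), Fraction(1, 2), Fraction(1, 2)]
    print(sampling_weight(medium), sampling_weight(large))  # 100 100
```

Exact fractions are used so the example shows the two weights coming out identical rather than nearly identical.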


Table 3-1. Census 2000 A.C.E. Sample Sizes by Sampling Phase

Start and finish date                  Sampling phase                  Sample areas   Estimated housing units
March 1999 through June 1999           First-phase                     29,136         1,989,000
December 1999 through February 2000    Second-phase                    11,303         844,000
April 2000 through May 2000            Third-phase                     11,303         301,000
                                         within-cluster reduction      3,153          106,000
                                         no within-cluster reduction   8,150          195,000

SURVEY CHARACTERISTICS AND THE A.C.E. SAMPLE DESIGN

Main Characteristics of the A.C.E. Sample

The A.C.E. sample:

  • Is a probability sample of 301,000 housing units in 11,303 sample areas for the United States.
  • Yields estimates of net census coverage of persons in households and housing units for the nation, excluding Remote Alaska.
  • Has independent samples in each state and the District of Columbia, but there are no state-based design criteria.
  • Has total state sample sizes roughly proportional to population size, except that the smaller states have additional sample; these smaller states have similar sample sizes.
  • Uses some differential sampling within states for areas that may contribute disproportionately to total variance or have higher concentrations of historically undercounted population groups.
  • Has a separate sample of American Indian Reservations and other associated trust lands.
  • Uses updated measures of size at each phase of sampling.
  • Balances operational limitations, such as field workloads, and statistical issues, such as weight variation.

Overview of the Design

The A.C.E. uses a multiphase sample to measure the net coverage for the household population in Census 2000. The national sample, 301,000 housing units in 11,303 sample areas, was distributed among the 50 states and the District of Columbia roughly in proportion to population size, except for the smaller states, whose samples were increased.

Primary sampling unit. The block cluster was the Primary Sampling Unit (PSU) for the A.C.E. Each block cluster consisted of one or more geographically contiguous census blocks. Each block cluster contained on average 30 housing units, which was an efficient interviewer workload. An important block cluster characteristic was well-defined, physical boundaries. Ambiguous block cluster boundaries could potentially lead to errors of omission or erroneous inclusion in the A.C.E. sample.

Phases of the A.C.E. sample. The three phases of A.C.E. sampling were:

1. Selection of an initial sample of approximately 30,000 block clusters for which the field staff developed an independent list of housing unit addresses.
2. Selection, from the initial sample, of a subsample of block clusters for the A.C.E. sample based on the results of the independent list.
3. Selection of a subsample of housing units within large block clusters.

First phase consisted of the selection of a systematic sample in each state. In the first phase of A.C.E. sampling, block clusters in each state were classified by size into four mutually exclusive groups known as sampling strata: (1) clusters with 0 to 2 housing units (small stratum), (2) clusters with 3 to 79 housing units (medium stratum), (3) clusters with 80 or more housing units (large stratum), and (4) clusters on American Indian Reservations with three or more housing units (American Indian Reservation stratum). Block clusters with 80 or more housing units were selected with higher probability than medium clusters in this phase because housing units in large clusters were subsampled in a later operation; this brought the overall probability of selection (the inverse of the sampling weight) for housing units in these clusters more in line with the overall selection probabilities of housing units in medium clusters. Within each sampling stratum, clusters were sorted and a systematic sample was selected with equal probability.

Second phase involved the reduction of the ICM first-phase sample to the level desired for the A.C.E. In the second phase, the block clusters from the medium and large sampling strata were re-stratified based on the estimated demographic composition of the block clusters and the relationship between the housing unit count from the independent list and the January 2000 updated census address list. This was done separately for

the medium and large strata within each state. These substrata are referred to as reduction strata. Within each reduction stratum, the clusters were sorted, and a systematic sample was selected with equal probability. This reduction used different selection probabilities across the reduction strata within a state and across states.

Next, using housing unit counts from the independent list and the January 2000 updated census address list, the small block clusters were stratified within each state by size, and systematic samples were selected from each stratum with equal probability. All clusters from the small sampling stratum with 10 or more housing units based on the updated information were retained. All clusters from the small sampling stratum that were on American Indian land, as well as List/Enumerate clusters, were also retained. The second phase of sampling was not done for the American Indian Reservation sampling stratum.

The third phase consisted of the sample reduction of housing units within large block clusters. In the third phase of A.C.E. sampling, a subsample of housing units was selected within large clusters. If a cluster contained 79 or fewer housing units, all the housing units were included in the A.C.E. sample. In clusters with 80 or more housing units, a subsample was selected to reduce the cost of data collection. Because the large clusters had a higher probability of selection at the first phase, this phase of sampling lowered the variation of selection probabilities for housing units within the same reduction stratum. This subsampling was done by forming groups of adjacent housing units, called segments. A systematic sample of segments within each cluster was selected, and all housing units in the selected segments were included in the A.C.E. sample.

The P sample and the E sample. The P sample consisted of the households used for the A.C.E. interviews that were conducted in these selected block clusters and block cluster segments. The E sample was the set of census enumerations in these same block clusters and block cluster segments.

Measures of Size

As stated earlier, the A.C.E. sample design used updated measures of size at each phase of sampling.

First-phase sample. The block cluster measure of size for the first-phase sample was based on preliminary census files existing in the spring of 1999. Ideally, the source of the block cluster measure of size would have been the Decennial Master Address File, the base file of census addresses for the decennial programs. However, the first version of this file was not available until the summer of 1999, too late for use in the block clustering. Instead, for a block cluster containing city-style addresses (house number and street name), the first-phase measure of size was typically the higher of the preliminary census housing unit count and the 1990 census address count. For block clusters with non-city-style addresses, the measure of size was the preliminary 2000 census housing unit count. The rules for determining which housing units on the preliminary 2000 census files would eventually move forward to the Decennial Master Address File had not been defined, so the block cluster measure of size was based on a reasonable set of criteria, but not the final set.

Second-phase sample. For the second phase of sampling, the block cluster measure of size was the count of housing units on the list of housing unit addresses created independently of the census in the fall of 1999. The reduction of the medium and large block clusters used a preliminary count of these housing units, which was a clerical tally of housing units from the listing sheets. The small block cluster reduction used the count of housing units from the independent listing sheets after the addresses had been keyed. For the most part, the preliminary and the keyed counts for each block cluster were identical, but for some clusters there were differences. Using a preliminary count was necessary because the medium and large cluster reduction had to be completed before the keying of the independent listing sheets was done.

Third-phase sample. For the third phase of A.C.E. sampling, the block cluster measure of size was the housing unit count resulting from the housing unit matching and follow-up operation. This operation confirmed the count resulting from the independent listing and removed any nonexistent addresses from the sampling frame.

FIRST PHASE OF THE A.C.E. SAMPLE DESIGN

The sample selection during the first phase consisted of three major steps:

1. Definition of the primary sampling units.
2. Stratification and allocation of the primary sampling units within each state.
3. Selection of the primary sampling units within each state.

Defining the Primary Sampling Unit

The Primary Sampling Units (PSUs) for the A.C.E. were block clusters. The PSUs were delineated so that they encompassed the entire land area of the United States, except for extremely remote areas of Alaska. Each block cluster consisted of a census block or several geographically contiguous census blocks. Block clusters contained an average of 30 housing units. The land area for each PSU was made reasonably compact so that it could be traversed by an interviewer in the field without incurring unreasonable costs.
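The third-phase subsampling described in this section (adjacent housing units grouped into segments, then a systematic sample of segments within each cluster of 80 or more housing units) can be sketched as follows. The segment size, sampling interval, and function names are hypothetical, since the text does not give the actual segment sizes or within-cluster rates, and housing units are assumed to be listed in geographic order so that consecutive units are adjacent.

```python
import random

LARGE_CLUSTER_MIN = 80  # per the text, clusters with 80 or more housing units were subsampled


def form_segments(housing_units, segment_size):
    """Split a cluster's housing units (assumed in geographic order) into adjacent groups."""
    if segment_size < 1:
        raise ValueError("segment_size must be positive")
    return [housing_units[i:i + segment_size]
            for i in range(0, len(housing_units), segment_size)]


def third_phase_subsample(housing_units, segment_size, interval, rng):
    """Keep smaller clusters whole; otherwise take a systematic sample of segments."""
    if len(housing_units) < LARGE_CLUSTER_MIN:
        return list(housing_units)      # 79 or fewer: every unit stays in sample
    if interval < 1:
        raise ValueError("interval must be at least 1")
    segments = form_segments(housing_units, segment_size)
    start = rng.randrange(interval)     # random start within the first interval
    chosen = segments[start::interval]  # every interval-th segment thereafter
    return [hu for segment in chosen for hu in segment]


if __name__ == "__main__":
    rng = random.Random(2000)
    cluster = [f"HU{i:03d}" for i in range(120)]  # a hypothetical 120-unit cluster
    sample = third_phase_subsample(cluster, segment_size=10, interval=3, rng=rng)
    print(len(sample))  # 40: 4 of the 12 segments, whatever the random start
```

Selecting whole segments rather than scattered units keeps each interviewer assignment geographically compact, which matches the operational motivation given for block clusters.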


Why the block cluster? A basic design decision, continued from the 1990 Post-Enumeration Survey, was that the PSU would be a block cluster: a single block or a group of adjacent blocks established for the collection of Census 2000 information. These blocks may be standard city blocks or irregularly shaped areas with identifiable political or geographic boundaries. Using block clusters as PSUs, instead of the counties or county groups more commonly used in national surveys, improved the precision considerably with only a modest increase in costs.

An alternative sample design was considered that would have defined PSUs by segmenting whole blocks into smaller components (roughly one-half of a block). The alternative design would likely have reduced sampling error, but it was rejected because it would have increased costs (primarily due to increased matching workloads and interviewer travel) and probably would have caused matching errors due to problems in identifying the PSU boundaries spatially.

Goals of block clustering. Block clusters were formed to meet both statistical and operational goals. In the Census 2000 Dress Rehearsal, a small census block was by definition a single block cluster. This rule led to a large number of small block clusters that could potentially exert undue influence on the final population and variance estimates. One feature of block clustering under the Census 2000 A.C.E. procedure was to combine small census blocks with adjacent census blocks if the neighboring block contained one or more housing units. This change in the treatment of small census blocks had an enormous impact on the number of small block clusters, which was reduced by approximately 65 percent, as seen in Table 3-2. Still, many block clusters contained zero housing units. Roughly 70 percent of the zero housing unit blocks occurred in sparsely populated areas. Without populated neighboring blocks, these zero housing unit blocks remained stand-alone zero block clusters.

The two operational goals of forming block clusters were to increase listing efficiency and to reduce the chance of listing error. The first goal was met by collapsing census blocks to produce block clusters that were geographically compact and averaged about 30 housing units, a manageable workload. The second goal was to create block clusters that were well defined, to minimize the chance that a cluster would be listed incorrectly. For example, a listing error may result when a census block has an invisible or nonphysical boundary, such as city limits, making it unclear where the block boundary is. As a result, census blocks separated by invisible boundaries were always combined.

Limitations. As mentioned earlier, the block cluster measure of size for the first phase was based on preliminary census address counts. Some census operations that helped build the census address list were not available at the time block clustering started. Instead, a snapshot of the best known information was used. This imposed some limitations on the data used for block clustering.

  • Address limitations: The results of the Block Canvassing and Local Update of Census Addresses (LUCA) operations were not incorporated into the census address list in time for block clustering. Block Canvassing was a Census 2000 field operation in mailout/mailback areas (mostly city-style addresses). The Census Bureau sent staff into the field to canvass their assignment areas and provide updates to the address list, such as corrections, adds, or deletes. Local Update of Census Addresses was also a Census 2000 program; it provided an opportunity for local and tribal governments to review and update address information in the census address list.

Table 3-2. Accuracy and Coverage Evaluation: Block Cluster Summary Statistics¹

                                       Preliminary number of housing units
                                       0-2          3-79         80+          Total
Number of census blocks²               2,969,000    4,009,000    245,000      7,223,000
Number of block clusters               1,029,000    2,486,000    252,000      3,767,000
Number of blocks per cluster³          1.3          2.2          1.5          1.9
Number of housing units per cluster    0.3          29.2         181.9        31.5

¹ The United States and Puerto Rico are included in these summary statistics.

² Count of census collection blocks before clustering and before block suffixing. Does not include water blocks or census blocks in Remote Alaska.

³ These numbers are not the first row divided by the second row. They are the number of census blocks in each block cluster size category divided by the number of block clusters in that category. For example, if two census blocks with 40 housing units each collapse to form an 80-housing-unit block cluster, those two census blocks are counted in the 80+ category for the blocks-per-cluster computation. Because block clustering can combine blocks across categories, the first and second rows are not consistent with each other.


  • Geographic limitations: Each block in the census address list had a Type of Enumeration Area (TEA) assignment. For Census 2000, TEA was a classification that identified both the census enumeration method and the method used to compile the census address list. The block clustering operation occurred concurrently with the census review of TEA assignments to ensure the most complete coverage of the area. This review process sometimes changed the TEA assignment of blocks after the block cluster was defined. On a few occasions, this resulted in a block cluster consisting of blocks that had different methods for compiling the census address list. For example, in one block cluster of three blocks, all three blocks had a TEA assignment of Block Canvassing and Mailout/Mailback at the time of block clustering. After the census TEA review, one of those blocks was converted to an Address Listing and Update/Leave TEA assignment. For a complete list of TEAs for Census 2000, see the attachment or visit http://www.geo.census.gov/mob/homep/teas.html.

General rules for defining block clusters.

  • Block clusters were formed by combining neighboring Census 2000 blocks.
  • Block clusters did not cross specific geographical boundaries. Among these were county, interim census tract, Local Census Office, TEA group, military area, and American Indian Country. For TEA groups, blocks from certain TEAs could be clustered together if the TEAs had the same method for compiling the address list. American Indian Country refers, collectively, to lands that are American Indian Reservation or other trust lands, tribal jurisdiction statistical areas (now known as Oklahoma Tribal Statistical Areas), tribal designated statistical areas, and Alaska native village statistical areas.
  • Blocks separated by an invisible boundary (a city line, for example) were clustered, except in the situations described above.
  • Whenever possible, small census blocks (those with fewer than three housing units) were clustered with neighboring census blocks containing housing units, to reduce the total number of small block clusters. If there was no neighboring census block with housing units, the small census block was a cluster by itself.
  • To prevent block clusters from becoming too large with respect to housing unit size, census blocks with 80 or more housing units were generally not clustered with other census blocks.
  • In addition to the criteria of unit size, any block larger than 15 square miles was generally a block cluster by itself.

These rules produced 3.8 million block clusters, about half the 7.2 million non-suffixed census blocks. The block clusters had an average of 29.2 housing units per medium block cluster and an average of 31.5 overall. The number of small block clusters also decreased from nearly three million to about one million, an approximate 65 percent reduction from the Census 2000 Dress Rehearsal rule of defining a small block to be a cluster by itself. However, since about 70 percent of small blocks occurred in less populated areas with little or no population to combine, many single zero-housing-unit block clusters were formed.

Stratifying and Allocating the Primary Sampling Units

Stratifying the first-phase sample. Prior to sampling, block clusters were stratified according to the expected number of housing units and the American Indian Reservation (AIR) status of the block cluster. The four sampling strata and their definitions are presented in Table 3-3.

Allocating the first-phase sample. As stated earlier, the Census Bureau was preparing to conduct the ICM, a much larger coverage measurement survey of 750,000 housing units, when the use of sampling for apportionment counts was disallowed by the Supreme Court in January 1999. To keep the coverage measurement survey on schedule, the Census Bureau went ahead with the plans to select the ICM sample and create independent address lists. This was followed by the subsampling of the first-phase sample to produce the A.C.E. sample design.

The first-phase sampling plan was a national sample of 30,000 block clusters: 25,000 medium and large block clusters and 5,000 small block clusters. Included in the 25,000 block clusters was a separate sample of block clusters for American Indian Reservations.

It is important to point out that the allocation of the 25,000 medium and large block clusters was based on the ICM sample design and on the assumption of roughly 30 housing units per block cluster. The allocation of the 5,000 small block clusters and of the separate American Indian Reservation sample to the states was done before block clusters had been defined for all states, since the first-phase sampling was done state by state on a flow basis. This means that the first-phase sample was selected for some states before the block clusters had been defined for other states. As a result, we used the best information we had at the time to carry out the allocation.

Medium and large block clusters. The 25,000 medium and large block clusters were allocated to the states to meet the ICM sample requirements (Schindler, 1998) with some minor modifications. Most states had between 300 and 500 block clusters, and the very largest states had an allocation of between 1,000 and 2,000 block clusters.
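The general rules for defining block clusters can be read as a pairwise eligibility test plus a preference for attaching small blocks to populated neighbors. The sketch below encodes them that way under simplifying assumptions: the `Block` fields and function names are hypothetical, rules the text says applied "generally" are treated as hard constraints, adjacency is taken as given by the caller, and invisible boundaries are not modeled because blocks separated only by such boundaries were clustered anyway. It is not the Census Bureau's clustering software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical block record; field names are illustrative only."""
    county: str
    interim_tract: str
    lco: str                      # Local Census Office
    address_list_method: str      # how the block's TEA compiles the address list
    military_area: Optional[str]  # None when the block is not in a military area
    aic_area: Optional[str]       # American Indian Country area, None if outside
    housing_units: int
    square_miles: float


def may_cluster(a: Block, b: Block) -> bool:
    """Apply the general rules as hard constraints to two neighboring blocks."""
    same_boundaries = (
        a.county == b.county
        and a.interim_tract == b.interim_tract
        and a.lco == b.lco
        and a.address_list_method == b.address_list_method  # TEA-group rule
        and a.military_area == b.military_area
        and a.aic_area == b.aic_area
    )
    if not same_boundaries:
        return False
    # Blocks with 80+ housing units or over 15 square miles stand alone.
    return all(blk.housing_units < 80 and blk.square_miles <= 15 for blk in (a, b))


def partner_for_small_block(small: Block, neighbors: list[Block]) -> Optional[Block]:
    """Pick the first eligible neighbor that contains housing units, if any."""
    if small.housing_units >= 3:
        raise ValueError("small blocks have fewer than three housing units")
    for nb in neighbors:
        if nb.housing_units > 0 and may_cluster(small, nb):
            return nb
    return None  # no populated eligible neighbor: the small block is its own cluster
```

A `None` result corresponds to the stand-alone zero or small block clusters that, as noted above, remained common in sparsely populated areas.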


Table 3-3. First-Phase Sampling Strata First-phase sampling stratum Definition Small 0 to 2 housing units Medium 3 to 79 housing units Large 80 or more housing units American Indian Reservation 3 or more housing units and on American Indian Reservations Within each state, the block cluster sample was propor- classified as small, but were observed to have a larger tionally allocated to the medium and large sampling strata number of housing units, there was concern about high based on the number of housing units in the sampling sampling weights disproportionately contributing to vari-stratum: ance. In an attempt to avoid the problems associated with the high weights, a larger number of small clusters was Hstate,k cstate,k Cstate initially selected, followed by an independent address list, Hstate followed by a subsample to remain in sample. Using where, updated measures of size for those 5,000 small block k = medium or large sampling stratum; clusters in the small cluster reduction helped to target cstate,k = target number of clusters in sampling clusters that could have contributed disproportionately to stratum k within state; the variance. These initial 5,000 small clusters were allo-Cstate = target number of A.C.E. first-phase cated to states proportionately to their estimated total medium and large sample clusters for number of housing units in small blocks.

state; Hstate,k = number of housing units in sampling stra- Ideally, we would have allocated the 5,000 block clusters tum k within state; proportionally to states based on the number of small Hstate = number of housing units in the medium block clusters in the state. This was not possible because and large strata in state. the first-phase sampling was done on a flow basis.

As an example, suppose that 402 medium and large block clusters in total were allocated to a particular state. Assume there are an expected 9,000 housing units in all clusters in the medium sampling stratum and 12,060 housing units across the medium and large sampling strata. The target number of clusters from the medium sampling stratum for the state is then calculated as follows:

$$ c_{state,medium} = 402 \times \frac{9{,}000}{12{,}060} = 300 $$

The target number of clusters from the large sampling stratum would then be 402 - 300 = 102.

Small block clusters. Because of cost considerations, small block clusters were generally sampled at a lower rate than either medium or large clusters. An overall allocation of 5,000 small block clusters was chosen because a total of 30,000 block clusters was deemed manageable for creating independent address lists. The high weights resulting from the lower sampling rates were not expected to have a serious impact on the estimates or variances for most clusters selected from the small block cluster sampling stratum. However, for clusters that were initially

American Indian Reservation block clusters. To ensure a sufficient sample for calculating reliable coverage estimates for American Indians living on reservations, we allocated 355 block clusters to American Indian Reservations nationwide. The 355 clusters were allocated to 26 states in proportion to the 1990 population of American Indians living on reservations. Small block clusters on American Indian Reservations were not included in these 355 block clusters; they were eligible for selection in the small cluster stratum. Block clusters in states with little or no American Indian population on reservations were represented in the medium and large strata.

This sample allocation produced variable first-phase selection probabilities across the states, despite our goal of allocating the American Indian Reservation (AIR) sample proportionally. This occurred because the average number of housing units per American Indian Reservation block cluster varied across states. To get similar first-phase selection probabilities, we would have needed to complete all of the block clustering before allocating the sample. However, the first-phase sampling was done on a flow basis.
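The proportional split in the worked example can be checked with a few lines of Python. This is a minimal sketch. `split_medium_large` is a hypothetical helper, not Census Bureau software. Rounding the medium-stratum target to the nearest integer is an assumption; in the example the division happens to be exact.

```python
def split_medium_large(total_clusters, hu_medium, hu_medium_and_large):
    """Split a state's medium/large cluster allocation in proportion to
    expected housing units (hypothetical helper for illustration)."""
    # Medium target: c_medium = total * (medium HUs / medium-and-large HUs).
    medium = round(total_clusters * hu_medium / hu_medium_and_large)
    # The large stratum receives the remainder of the state's allocation.
    large = total_clusters - medium
    return medium, large


print(split_medium_large(402, 9_000, 12_060))  # (300, 102)
```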

3-6 Section IChapter 3 Design of the A.C.E. Sample U.S. Census Bureau, Census 2000

Selecting the Primary Sampling Units Within Each State

Calculation of the sampling parameters. The block cluster probability of selection (PS) for each of the four sampling strata in each state is the ratio of the target sample size to the number of clusters in the stratum, adjusted by a workload reduction factor. It takes the following form:

$$ PS_{state,k} = \frac{c_{state,k} \times L_{state}}{C_{state,k}} $$

where,
PS_state,k = probability of selection (sampling rate) in sampling stratum k within state;
C_state,k = number of clusters in sampling stratum k within state;
c_state,k = target number of clusters in sampling stratum k within state;
L_state = the factor that reduces the number of clusters to select for the state if the expected listing workload exceeded the planning estimate:

$$ L_{state} = 1 \text{ for the small, medium and AIR sampling strata}, \qquad 0 < L_{state} \le 1 \text{ for the large sampling stratum} $$

The large block cluster sampling rate was reduced if the expected number of housing units to list was greater than the planning estimate of the listing workload. A second step of sampling was necessary in Missouri and Indiana because the selected sample of clusters produced more housing units to list than expected.

To meet operational constraints, a subsample of the block clusters selected in the first step was drawn. The second step of sampling occurred only in the large sampling stratum, because that stratum contributed disproportionately to the listing workload. It occurred only if the estimated number of housing units in the medium and large strata was at least ten percent larger than the planning estimate of the number of housing units to be listed.

For states needing the second step of sampling, the sampling rate took the following form:

$$ PS2_{state} = \frac{PW_{state}}{W_{state}} $$

where,
PS2_state = second-step sampling rate for the large sampling stratum in state,
W_state = resulting workload estimate from sample selection for the large sampling stratum in state, and
PW_state = planning workload estimate for the large sampling stratum in state.

Sorting the PSUs. Within each sampling stratum, the first-phase clusters were sorted as follows:

  • American Indian Country Indicator
  • Demographic/Tenure Group
  • 1990 Urbanization
  • County code
  • Block cluster identification number

Although there was no differential sampling within the four first-phase sampling strata, the clusters were sorted by several variables to improve the representativeness of the sample of block clusters. The first variable was the American Indian Country Indicator, which separated the block clusters into three American Indian categories:

1. American Indian Reservation or other trust land,
2. tribal jurisdiction statistical area, Alaska native village statistical area or tribal designated statistical area, and
3. all other areas.

The second sort variable was the demographic/tenure group. Block clusters with similar demographic/tenure proportions, based on 1990 census data, were grouped together. To help select a sample that adequately represented the six major race/origin groups, as well as owners and renters, block clusters were classified into 12 demographic/tenure groups. Many block clusters tend to have a large proportion of one demographic/tenure group, but they were rarely composed entirely of one group, so many clusters fit well in two or more categories. To ensure that each cluster was assigned to only one group, a hierarchical assignment rule was developed: a cluster was assigned to the first group, in hierarchy order, whose threshold it reached. These thresholds were based on a multivariate clustering method applied to 1990 census blocks. Table 3-4 lists these threshold values. The hierarchy gives the smaller demographic groups priority over the larger ones, and renters priority over owners. For example, if the approximate distribution of a block cluster population was 20 percent Asian Renter, 40 percent Asian Owner, and 40 percent White and other Renter, then the block cluster was assigned to the Asian Renter demographic/tenure group.
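The hierarchical assignment rule can be sketched as a first-match scan over the groups in the order of Table 3-4. This Python sketch assumes a hypothetical input format: a cluster's 1990 shares, each between 0 and 1, keyed by group name. It treats a share equal to the threshold as qualifying, which the Asian Renter example requires. `THRESHOLDS` and `assign_group` are illustrative names.

```python
# Table 3-4 thresholds in hierarchy order: smaller groups first,
# renters before owners. A cluster matching none falls into "All others".
THRESHOLDS = [
    ("Hawaiian and Pacific Islander renters", 0.10),
    ("Hawaiian and Pacific Islander owners", 0.10),
    ("American Indian and Alaska Native renters", 0.10),
    ("American Indian and Alaska Native owners", 0.10),
    ("Asian renters", 0.20),
    ("Asian owners", 0.20),
    ("Hispanic renters", 0.20),
    ("Hispanic owners", 0.20),
    ("Black renters", 0.25),
    ("Black owners", 0.25),
    ("White and other renters", 0.30),
]


def assign_group(shares):
    """Return the first group, in hierarchy order, whose share reaches
    its threshold; shares maps group name -> proportion (0 to 1)."""
    for group, threshold in THRESHOLDS:
        if shares.get(group, 0.0) >= threshold:
            return group
    return "All others"


# The example from the text: 20% Asian renter, 40% Asian owner,
# 40% White and other renter.
example = {"Asian renters": 0.20, "Asian owners": 0.40,
           "White and other renters": 0.40}
print(assign_group(example))  # Asian renters
```

Asian owners (40 percent) also clears its threshold in the example. Asian renters wins only because it comes earlier in the hierarchy.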


Table 3-4. Demographic/Tenure Group Thresholds (50 States and the District of Columbia)

Order  Demographic/Tenure Group                      Threshold
1      Hawaiian and Pacific Islander renters         10%
2      Hawaiian and Pacific Islander owners          10%
3      American Indian and Alaska Native renters     10%
4      American Indian and Alaska Native owners      10%
5      Asian renters                                 20%
6      Asian owners                                  20%
7      Hispanic renters                              20%
8      Hispanic owners                               20%
9      Black renters                                 25%
10     Black owners                                  25%
11     White and other renters                       30%
12     All others                                    all others

A third sort variable was the estimated level of urbanization of each block cluster, based on 1990 data. Each block cluster was categorized as an urbanized area with 250,000 or more people, an urbanized area with fewer than 250,000 people, or a non-urban area. Finally, the clusters were sorted geographically by county and cluster number.

General sampling procedure. A systematic sample of block clusters was selected from each sampling stratum, with each block cluster having the same probability of selection within a sampling stratum. The method used to select systematic samples follows:

1. Sampling units were sorted using the PSU sort criteria described at each sampling phase.
2. Each successive PSU was assigned an index number 1 through N within each sampling stratum, where N is the number of PSUs in the stratum.
3. A random number (RN) between zero and one, 0 < RN ≤ 1, was generated.
4. A random start (RS) for the sampling stratum was calculated. The random start was the random number multiplied by the inverse of the probability of selection, RS = RN × (1/PS), so that 0 < RS ≤ 1/PS.
5. Sampling sequence numbers were calculated. Given N PSUs, the sequence numbers were RS, RS + 1 × (1/PS), RS + 2 × (1/PS), ..., RS + (n - 1) × (1/PS), where n was the largest integer such that RS + (n - 1) × (1/PS) ≤ N. Each sequence number was rounded up to the next integer; an integer rounds to itself.
6. The rounded sequence numbers were compared to the index numbers assigned to the PSUs. Each PSU whose index number matched a rounded sequence number was selected. PSUs without a matching sequence number were not in sample.

First-Phase Sample Results

Table 3-5 lists the block cluster sample sizes and the number of housing units by sampling stratum for each state, the District of Columbia, and the nation.


Table 3-5. State First-Phase Sample Results by First-Phase Stratum

                      First-phase housing units(1)                 First-phase block clusters
State                  Small   Medium     Large    AIR      Total   Small  Medium  Large  AIR  Total
Alabama                   60    7,900    19,000      0     26,960     116     286    109    0    511
Alaska                    20    5,200    23,200     20     28,440      20     190    137    1    348
Arizona                   20    7,800    44,700  2,600     55,120      86     269    180  113    648
Arkansas                  40    9,600    15,900      0     25,540      90     353    101    0    544
California                50   45,000   227,600    230    272,880     184   1,442  1,311   11  2,948
Colorado                  20    8,000    25,600     60     33,680      83     293    157    2    535
Connecticut               10    6,100    25,600      0     31,710      20     211    159    0    390
Delaware                  20    7,200    28,700      0     35,920      20     243    156    0    419
District of Columbia      10    4,800    50,500      0     55,310      20     132    247    0    399
Florida                   50    7,500    50,100     30     57,680     145     259    230    1    635
Georgia                   70    6,100    30,300      0     36,470     154     220    162    0    536
Hawaii                    10    3,000    42,400      0     45,410      20     103    161    0    284
Idaho                     10    8,200    10,900    140     19,250      54     312     75    6    447
Illinois                 100    8,600    22,300      0     31,000     185     281    140    0    606
Indiana                   80    6,100     9,700      0     15,880     140     202     51    0    393
Iowa                     120    6,800     9,500      0     16,420     147     242     53    0    442
Kansas                   110    6,400    11,100     30     17,640     193     237     63    1    494
Kentucky                  60    7,200    22,300      0     29,560      96     268    135    0    499
Louisiana                 10   11,300    24,900      0     36,210      65     407    155    0    627
Maine                     20    5,800    11,000     10     16,830      38     226     79    1    344
Maryland                  20    5,300    38,000      0     43,320      36     177    175    0    388
Massachusetts             20    6,400    22,000      0     28,420      38     229    140    0    407
Michigan                  50    7,900    15,100    150     23,200     122     268    104    5    499
Minnesota                 70    6,000    14,000    270     20,340     141     208     83   10    442
Mississippi               40    8,400    11,700    120     20,260      81     303     77    3    464
Missouri                 110    5,700    14,500      0     20,310     162     200     71    0    433
Montana                   10    8,400     9,700    840     18,950      67     333     67   24    491
Nebraska                  80    6,800     7,700     70     14,650     142     245     55    3    445
Nevada                    10    6,400    57,800    190     64,400      46     225    230    5    506
New Hampshire             20    5,700    15,400      0     21,120      25     201    106    0    332
New Jersey                10    8,700    30,100      0     38,810      39     282    178    0    499
New Mexico                10    9,300    24,800  1,640     35,750     108     335    136   70    649
New York                  80   17,600   124,700     70    142,450     143     603    631    5  1,382
North Carolina           100    6,700    20,700     80     27,580     143     236    121    4    504
North Dakota             100    5,900     9,100    340     15,440     121     236     64   12    433
Ohio                     110    7,800    24,000      0     31,910     132     268    133    0    533
Oklahoma                  60    9,000    17,300    270     26,630     142     314    101    8    565
Oregon                    10    5,200    15,400     70     20,680      86     195     90    3    374
Pennsylvania             110   12,900    22,600      0     35,610     180     427    146    0    753
Rhode Island              10    7,600    18,000      0     25,610      20     256    108    0    384
South Carolina            40    8,200    19,100      0     27,340      95     285    112    0    492
South Dakota              50    5,800     9,200    450     15,500     106     242     57   27    432
Tennessee                 90    7,800    25,400      0     33,290     133     285    137    0    555
Texas                     70   34,700   148,500     30    183,300     349   1,222    681    1  2,253
Utah                      10    9,100    23,900    120     33,130      38     312    144    7    501
Vermont                   20    5,600    12,000      0     17,620      21     201     88    0    310
Virginia                  60    5,600    31,900      0     37,560      98     196    166    0    460
Washington                20    5,600    21,400    480     27,500      73     187    120   17    397
West Virginia             30    5,000    13,100      0     18,130      46     189     79    0    314
Wisconsin                 80    6,200     8,200    220     14,700     119     211     58   10    398
Wyoming                   10    8,700     9,200     90     18,000      72     346     69    5    492
Total U.S.             2,400  438,600 1,539,800  8,620  1,989,420   5,000  15,393  8,388  355 29,136

1 Preliminary census address list housing unit counts from spring 1999.
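The sampling-parameter formulas and the six-step systematic selection procedure described above can be sketched in Python as follows. The function names and the `rn` parameter are hypothetical. The sketch assumes the PSUs are already sorted and indexed 1 through N (steps 1 and 2). The example stratum (5 target clusters out of 20) and the workloads are made-up values, not figures from Table 3-5. Floating-point error in long runs of sequence numbers is ignored.

```python
import math
import random


def selection_probability(c_target, n_clusters, l_factor=1.0):
    """First-phase rate PS = c * L / C for one sampling stratum."""
    return c_target * l_factor / n_clusters


def second_step_rate(planning_workload, sampled_workload):
    """Second-step rate PS2 = PW / W for the large sampling stratum."""
    return planning_workload / sampled_workload


def systematic_sample(n_psus, ps, rn=None):
    """Steps 3-6: return the 1-based index numbers of the selected PSUs."""
    if rn is None:
        # random.random() is in [0, 1), so RN is in (0, 1].
        rn = 1.0 - random.random()
    interval = 1.0 / ps
    rs = rn * interval  # random start, 0 < RS <= 1/PS
    selected = []
    k = 0
    while rs + k * interval <= n_psus:
        # Round each sequence number up; an integer rounds to itself.
        selected.append(math.ceil(rs + k * interval))
        k += 1
    return selected


ps = selection_probability(c_target=5, n_clusters=20)  # 0.25
print(systematic_sample(20, ps, rn=0.5))               # [2, 6, 10, 14, 18]
print(second_step_rate(9_000, 12_000))                 # 0.75
```

Fixing `rn` makes the selection reproducible. Omitting it draws a fresh random start. In this example any start yields exactly 5 clusters, because the interval 1/PS = 4 divides N = 20 evenly.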

SECOND PHASE OF THE A.C.E. SAMPLE DESIGN

The second phase, often referred to as the A.C.E. reduction phase, linked the first-phase sample selection to the A.C.E. sampling plan. The A.C.E. reduction was the first of several operations that cut the number of housing units from nearly two million in the independent listing to the approximately 300,000 housing units that were sent for interview. Since not all of the first-phase block clusters were required for A.C.E., the reduction subsampled those clusters, and the selected clusters were retained for the A.C.E. operations.

Following the selection of the A.C.E. first-phase sample, field staff visited the block clusters and created an independent address list for A.C.E. These updated housing unit counts were used in the cluster subsampling phase. The cluster subsampling was done separately for:

  • medium and large cluster reduction, and
  • small block cluster reduction.

Medium and Large Cluster Reduction

The medium and large cluster reduction was the transition to the A.C.E. sampling plan. The resulting national sample allocation was roughly proportional to state population, with some differential sampling within states. Only block clusters from the medium and large first-phase sampling strata in the 50 states and the District of Columbia were subsampled in this phase. The sample reduction was also used to implement two other objectives of the A.C.E. sample.

One objective of the medium and large cluster reduction design was to stratify the first-phase clusters by the relationship between the current housing unit counts from the A.C.E. independent listing and the updated census address list as of January 2000. Clusters were sampled with different selection probabilities to reduce the variance contribution from inconsistent housing unit counts between the updated census list and the independent list. Clusters with significant differences between the counts were expected to have high erroneous enumeration and high omission rates. Sampling these clusters differentially was meant to reduce the sampling weights of clusters with relatively many missed persons or persons enumerated in error, since such clusters can contribute heavily to variance.

A second objective of the medium and large cluster reduction design was to differentially sample clusters based on the estimated demographic composition of the cluster. Clusters with a high proportion of persons of Hispanic origin or persons belonging to a census race group other than White were classified into a minority stratum. These clusters were sampled at a higher rate than predominantly non-Hispanic White clusters. The higher rate increased the sample size and improved the reliability of the A.C.E. population estimates for these historically undercounted subgroups.

Stratifying second-phase clusters. Each block cluster was put into two categories for the medium and large cluster reduction: a demographic group and a consistency group. Block clusters were put into reduction strata based on the combination of these two groups.

Demographic groups were based on the demographic/tenure groups created in the first-phase sample selection. The demographic/tenure groups classified block clusters by the race/Hispanic origin and tenure of each block as reported in the 1990 census. They were used as a sort variable in the selection of the first-phase sample. For this reduction, clusters were put into two demographic groups by combining the 12 demographic/tenure groups in Table 3-4. The two demographic groups are:

1. Minority: block clusters from one of the ten minority demographic/tenure groups
2. Non-minority: block clusters from one of the two other demographic/tenure groups

For this reduction, two updated cluster housing unit counts were used: the independent listing housing unit count and the housing unit count from the updated census address list as of January 2000. The two counts were compared, and each cluster was placed into a consistency group based on how they related. Large differences between the counts indicated that coverage problems might occur, so the sampling weights for such clusters were controlled to avoid serious variance effects.

Clusters were placed into three consistency groups as shown in Table 3-6.

Table 3-6. Second-Phase Sampling Consistency Groups

Relationship                                                  Consistency group
Independent list is at least 25 percent lower than census     Low inconsistent
Independent list is at least 25 percent greater than census   High inconsistent
Independent list is within +/- 25 percent of census           Consistent

For List/Enumerate clusters (see attachment), the census housing unit count was not known at the time of the reduction because that census operation had not started. Thus, all such clusters were classified as high inconsistent.
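The consistency-group rules of Table 3-6 translate directly into a small classifier. In this Python sketch the percentages are measured relative to the census count. Treating two identical counts (including two zero counts) as consistent is an assumption, since the text does not address empty clusters. `consistency_group` is a hypothetical name.

```python
def consistency_group(independent_hus, census_hus, list_enumerate=False):
    """Classify a cluster using the Table 3-6 relationships (sketch)."""
    if list_enumerate:
        # No census count existed yet for List/Enumerate clusters.
        return "high inconsistent"
    if independent_hus == census_hus:
        # Identical counts, including 0 and 0, are treated as consistent
        # (an assumption; the source does not discuss empty clusters).
        return "consistent"
    if independent_hus <= 0.75 * census_hus:
        return "low inconsistent"   # at least 25 percent lower
    if independent_hus >= 1.25 * census_hus:
        return "high inconsistent"  # at least 25 percent greater
    return "consistent"             # within +/- 25 percent


for counts in [(70, 100), (130, 100), (110, 100), (5, 0)]:
    print(counts, consistency_group(*counts))
```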


Based on the demographic group, the consistency group, and the independent listing housing unit count, block clusters were assigned to one of five reduction strata:

1. Minority (low inconsistent, high inconsistent, consistent)
2. Non-minority low inconsistent
3. Non-minority high inconsistent
4. Non-minority consistent
5. Medium stratum jumper

Medium stratum jumper clusters were selected from the medium sampling stratum for the first-phase sample but had 80 or more independent listing housing units.

Medium clusters were sampled at lower rates than large clusters in the first-phase sample, because large clusters were eventually to undergo within-cluster housing unit subsampling, an operation that increases sampling weights. Medium stratum jumper clusters also went through within-cluster housing unit subsampling, so their already higher sampling weights became even larger. Retaining all of the medium stratum jumper clusters in this reduction avoided introducing significant weight variation into the sample.

Allocating sample to strata. The first step was to allocate the national sample of 300,000 housing units to the 50 states and the District of Columbia. In most cases the allocation was proportional to 1998 population estimates, with a minimum of 1,800 housing units in each state. Hawaii was allocated approximately 3,750 housing units because of its concentration of Hawaiians and Pacific Islanders, for whom separate population coverage estimates were planned.

Within each state, the second-phase selection probabilities varied somewhat among the strata. First, all clusters in the medium stratum jumper reduction stratum were retained. Of the remaining four reduction strata, the minority, non-minority low inconsistent, and non-minority high inconsistent strata were given higher retention rates than the non-minority consistent stratum. The stratum differential sampling factor is the ratio of the probability of selection for the stratum to the probability of selection for the consistent stratum.

The following statements describe how the stratum differential sampling factors were set to yield the overall state sample size. They are not exact rules, but they give a sense of how much differential sampling was done within states.

  • The maximum expected sampling weight after all subsampling (the inverse of the overall probability of selection) was 650 for the non-minority consistent reduction stratum.
  • The maximum differential sampling factor was 3 for the two inconsistent reduction strata.
  • The differential sampling factor was around 2 for the minority reduction stratum, except in small states, where all of the minority clusters were retained.

The differential sampling factors were assigned using guidelines designed to achieve the two objectives of the reduction while also controlling the size of the sampling weights and the amount of differential sampling. The resulting differential sampling factors are summarized in Table 3-7.

Using the stratum differential sampling factors and the estimated number of housing units, the sample allocation for each reduction stratum was derived as follows:

$$ T_g = T \times \frac{DSF_g \, H_g}{\sum_{j=1}^{4} DSF_j \, H_j} $$

where,
g = A.C.E. second-phase sampling stratum,
T_g = target number of sample housing units allocated to reduction stratum g,
T = state target number of sample housing units, modified for medium stratum jumper clusters,
H_g = estimated number of housing units in reduction stratum g, based on the independent listing housing unit counts, and
DSF_g = differential sampling factor for reduction stratum g.
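The stratum assignment and the allocation formula can be illustrated together in a Python sketch. The input values are hypothetical. The sketch reads "modified for medium stratum jumper clusters" as T being the state target after the jumpers are set aside, which is an assumption. Checking the jumper condition before the demographic group is also an assumption about precedence. The function names are illustrative.

```python
def reduction_stratum(minority, consistency, from_medium_stratum, independent_hus):
    """Assign one of the five medium/large reduction strata (sketch).

    consistency is "low inconsistent", "high inconsistent" or "consistent".
    Testing the stratum-jumper condition first is an assumption.
    """
    if from_medium_stratum and independent_hus >= 80:
        return "medium stratum jumper"
    if minority:
        return "minority"  # all three consistency groups pooled
    return "non-minority " + consistency


def allocate(state_target, dsf, hus):
    """T_g = T * DSF_g * H_g / (sum of DSF_j * H_j over the four strata)."""
    weighted = {g: dsf[g] * hus[g] for g in dsf}
    total = sum(weighted.values())
    return {g: state_target * w / total for g, w in weighted.items()}


# Hypothetical state: DSFs and independent-listing housing unit counts.
dsf = {"minority": 2.0,
       "non-minority low inconsistent": 3.0,
       "non-minority high inconsistent": 3.0,
       "non-minority consistent": 1.0}
hus = {"minority": 1_500,
       "non-minority low inconsistent": 500,
       "non-minority high inconsistent": 500,
       "non-minority consistent": 4_000}

targets = allocate(2_000, dsf, hus)
print(targets)  # minority 600, each inconsistent stratum 300, consistent 800
rates = {g: targets[g] / hus[g] for g in targets}
print({g: round(rates[g] / rates["non-minority consistent"], 2) for g in rates})
```

Dividing each T_g by H_g gives the implied retention rates (0.4, 0.6, 0.6, and 0.2 here). Their ratios to the consistent stratum recover the DSFs, which matches the definition of the stratum differential sampling factor given above.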


Table 3-7. A.C.E. Second-Phase Sample Design Parameters for Large and Medium Clusters

                              Differential sampling factors(1)                          Target  First-phase
                                                 Low             High                   sample       sample
State                   Minority(2)  inconsistent(3)  inconsistent(4)  Consistent(5)   size(6)      size(7)
Alabama                        1.78             1.78             1.78           1.00     4,470       26,960
Alaska                         6.20             3.00             3.00           1.00     1,800       28,440
Arizona                        1.78             1.78             1.78           1.00     4,800       55,120
Arkansas                       2.00             3.00             3.00           1.00     2,610       25,540
California                     2.00             3.00             3.00           1.00    33,510      272,880
Colorado                       1.99             2.93             2.93           1.00     4,080       33,680
Connecticut                    2.00             3.00             3.00           1.00     3,360       31,710
Delaware                       2.91             3.00             3.00           1.00     1,800       35,920
District of Columbia           2.00             3.00             3.00           1.00     1,800       55,310
Florida                        1.00             1.00             1.00           1.00    15,300       57,680
Georgia                        2.01             2.01             2.01           1.00     7,830       36,470
Hawaii                         2.00             3.00             3.00           1.00     3,750       45,410
Idaho                          2.71             3.00             3.00           1.00     1,800       19,250
Illinois                       1.19             1.19             1.19           1.00    12,360       31,000
Indiana                        1.68             1.68             1.68           1.00     6,060       15,880
Iowa                           2.00             3.00             3.00           1.00     2,940       16,420
Kansas                         2.00             3.00             3.00           1.00     2,700       17,640
Kentucky                       2.00             3.00             3.00           1.00     4,050       29,560
Louisiana                      1.89             3.00             3.00           1.00     4,470       36,210
Maine                          6.55             3.00             3.00           1.00     1,800       16,830
Maryland                       1.87             2.46             2.46           1.00     5,280       43,320
Massachusetts                  2.33             2.33             2.33           1.00     6,300       28,420
Michigan                       1.25             1.25             1.25           1.00    10,080       23,200
Minnesota                      2.11             2.11             2.11           1.00     4,860       20,340
Mississippi                    1.96             2.83             2.83           1.00     2,820       20,260
Missouri                       2.25             2.25             2.25           1.00     5,580       20,310
Montana                        1.57             3.00             3.00           1.00     1,800       18,950
Nebraska                       2.44             3.00             3.00           1.00     1,800       14,650
Nevada                         1.95             2.76             2.76           1.00     1,800       64,400
New Hampshire                  6.84             3.00             3.00           1.00     1,800       21,120
New Jersey                     2.24             2.24             2.24           1.00     8,340       38,810
New Mexico                     1.73             1.73             1.73           1.00     1,800       35,750
New York                       2.00             3.00             3.00           1.00    18,660      142,450
North Carolina                 1.83             1.83             1.83           1.00     7,740       27,580
North Dakota                   2.14             3.00             3.00           1.00     1,800       15,440
Ohio                           1.22             1.22             1.22           1.00    11,490       31,910
Oklahoma                       2.00             3.00             3.00           1.00     3,420       26,630
Oregon                         1.94             2.76             2.76           1.00     3,360       20,680
Pennsylvania                   1.70             1.70             1.70           1.00    12,300       35,610
Rhode Island                   2.94             3.00             3.00           1.00     1,800       25,610
South Carolina                 1.60             1.60             1.60           1.00     3,930       27,340
South Dakota                   1.83             3.00             3.00           1.00     1,800       15,500
Tennessee                      1.99             2.86             2.86           1.00     5,580       33,290
Texas                          1.86             2.36             2.36           1.00    20,280      183,300
Utah                           2.00             3.00             3.00           1.00     2,160       33,130
Vermont                        6.91             3.00             3.00           1.00     1,800       17,620
Virginia                       1.90             1.90             1.90           1.00     6,960       37,560
Washington                     2.23             2.23             2.23           1.00     5,850       27,500
West Virginia                  2.00             3.00             3.00           1.00     1,860       18,130
Wisconsin                      1.75             1.75             1.75           1.00     5,370       14,700
Wyoming                        1.99             3.00             3.00           1.00     1,800       18,000

1 The observed or actual sampling factors differed from the design sample rates. See the section on Selecting a subsample.
2 Clusters with high concentrations of minorities.
3 Clusters where the independent listing housing unit count is at least 25 percent lower than the updated census list count.
4 Clusters where the independent listing count is at least 25 percent higher than the updated census list.
5 Clusters where the independent listing count and the updated census list do not differ by more than 25 percent.
6 Target state housing unit interview sample size, excluding American Indian Reservation sample.
7 First-phase preliminary census address list housing unit counts from spring 1999.


Sorting the PSUs. Within each combination of second-phase stratum and first-phase sampling stratum, the first-phase clusters were sorted as follows:

  • Consistency group
  • List/Enumerate indicator
  • American Indian Country Indicator
  • Demographic/Tenure Group
  • 1990 Urbanization
  • County code
  • Block cluster identification number

Selecting a subsample. Because the first-phase sample used different sampling rates for the medium and large sampling strata, separate samples were drawn for each second-phase stratum within each first-phase sampling stratum. Selecting the sample required calculating the sampling rates, sorting the clusters, and drawing a systematic sample of clusters.

All of the medium stratum jumpers were retained in the sample. The sampling rates for the remaining four reduction strata were computed so that an integer number of block clusters was selected. This took three steps. First, a sampling rate was computed from the ratio of housing units, which gave a non-integer expected number of clusters. Next, an integer number of clusters to select was determined. Finally, the final sampling rate was calculated from the ratio of clusters. The medium and large cluster reduction followed the sampling procedure discussed earlier.

This resulted in a total of 9,765 out of 24,136 medium and large clusters retained in the A.C.E. sample for the 50 states and the District of Columbia.

Medium and large cluster reduction sample results. Table 3-8 lists the number of housing units and clusters in sample.

Small Cluster Reduction

The first-phase sample contained 5,000 small clusters in the United States. Based on an early census address list, small clusters were expected to have between zero and two housing units. Interviewing and follow-up operations were less cost effective in clusters of this size than in larger clusters. Therefore, to allocate A.C.E. resources more efficiently, only a subsample of these small clusters was retained in the A.C.E. sample.

This subsampling operation attempted to balance three goals. First, no small cluster should have a sampling weight that was extremely high compared to other clusters in the sample. Second, sampling weights should be lower on clusters where the number of housing units differed from what was expected. These first two goals were intended to reduce the contribution of small clusters to the variance of the dual system estimates. The third goal was to improve operational efficiency by reducing the number of clusters and future field visits. Differential sampling was used to achieve these goals.

Stratifying first-phase clusters. The first-phase small clusters were classified into nine possible reduction strata within each state. These strata were defined using three cluster characteristics: size, American Indian Country status, and List/Enumerate status.

The size of a cluster was the greater of the independent listing housing unit count and the updated census address list housing unit count as of January 2000. For List/Enumerate clusters, the size was always the actual independent listing count, because the List/Enumerate operation had not yet started at the time of this reduction. The American Indian Country status had three categories, as described in the first phase of sampling. Table 3-9 contains the reduction strata for small block clusters.

Table 3-8. Second-Phase Results: Medium and Large Block Cluster and Housing Unit Counts

                                        Low           High                Stratum American Indian
Number of...        Minority   inconsistent   inconsistent   Consistent   jumpers    reservations    Nation
Housing units(1)     230,529         49,086         94,850      403,806    32,064           9,251   819,586
Clusters               2,553            971            842        4,801       243             355     9,765

1 Independent listing counts as of December 1999.
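The size rule above and the nine strata listed in Table 3-9 can be combined into a small lookup. This Python sketch checks size (ten or more housing units) first, then American Indian Country status, then List/Enumerate status. That is one ordering consistent with the table's dash entries, which are read as "any value". The function names and the category labels "reservation" and "tjsa" are hypothetical.

```python
def small_cluster_size(independent_hus, census_hus, list_enumerate):
    """Cluster size used for the small cluster reduction strata."""
    if list_enumerate:
        # The List/Enumerate operation had not started, so only the
        # independent listing count was available.
        return independent_hus
    return max(independent_hus, census_hus)


def small_cluster_stratum(size, ai_country, list_enumerate):
    """Map a small cluster to a second-phase stratum 1-9 (Table 3-9).

    ai_country: None, "reservation" (Reservation/Trust land) or
    "tjsa" (TJSA/TDSA/ANVSA); these labels are hypothetical.
    """
    if size >= 10:
        return 4
    if ai_country == "reservation":
        return 7
    if ai_country == "tjsa":
        return 8 if size <= 2 else 9
    if list_enumerate:
        return 5 if size <= 2 else 6
    if size <= 2:
        return 1
    return 2 if size <= 5 else 3


# Independent list found 1 unit, census list had 4: size 4, stratum 2.
print(small_cluster_stratum(small_cluster_size(1, 4, False), None, False))
```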


Table 3-9. Small Block Cluster Second-Phase Strata

Second-phase   Housing   American Indian          List/Enumerate
stratum        units     country                  status
1              0 to 2    No                       No
2              3 to 5    No                       No
3              6 to 9    No                       No
4              10+       -                        -
5              0 to 2    No                       Yes
6              3 to 9    No                       Yes
7              0 to 9    Reservation/Trust land   -
8              0 to 2    TJSA/TDSA/ANVSA¹         -
9              3 to 9    TJSA/TDSA/ANVSA          -

¹ Tribal Jurisdiction Statistical Area/Tribal Designated Statistical Area/Alaska Native Village Statistical Area.

Determining target sampling rates. Using independent listing housing unit counts, target sampling rates were determined. These rates attempted to satisfy the previously discussed statistical and operational goals.

Generally, the small clusters were stratified into four groups based on the number of housing units in the cluster. All clusters with ten or more housing units, on American Indian land, or classified as List/Enumerate were retained in sample. For the remaining three reduction strata, some differential sampling was introduced.

To determine the sampling rates for these strata, two conditions were imposed. The first condition was that, if possible, the number of weighted housing units in a cluster did not exceed 2,400 housing units. Through computer simulations, a number of different limits were tried until a cap of 2,400 yielded a sample of appropriate size. The second condition was a minimum sampling rate, which varied among the three strata. Table 3-10 contains a summary of the sampling conditions. Table 3-11 illustrates the process for determining the second-phase sampling rate for each stratum.

The overall target selection probability was based on the maximum number of housing units within a stratum and the previously mentioned cap of 2,400 housing units. For example, the maximum number of housing units in stratum group one was two. Hence, the overall target selection probability was 1 in (2,400/2), or 1 in 1,200. The sampling rate for each second-phase stratum was then set at the rate required to attain these overall target probabilities of selection.

Sorting the PSUs. The first-phase clusters were sorted in the following order in each second-phase stratum:

  • 1990 urbanization
  • county code
  • A.C.E. cluster identification number

Table 3-10. Small Cluster Reduction Sampling Conditions

Second-phase   Cluster size   Overall target          Minimum second-phase
stratum        (HUs)          selection probability   sampling rate
1              0 to 2         1/1,200                 1/10
2              3 to 5         1/480                   1/4
3              6 to 9         1/267                   1/2.22

Table 3-11. Second-Phase Sampling Rate Criterion

If (overall target selection probability) / (first-phase sampling rate) ≥ minimum second-phase sampling rate,
then the second-phase sampling rate equals (overall target selection probability) / (first-phase sampling rate).

If (overall target selection probability) / (first-phase sampling rate) < minimum second-phase sampling rate,
then the second-phase sampling rate equals the minimum second-phase sampling rate.
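The overall target selection probability and the Table 3-11 criterion can be written compactly. The sketch below is an illustration under stated assumptions. The first-phase sampling rate is passed in as an input because its value is not given in this section. The function names are hypothetical. The last function only checks the 2,400 cap on weighted housing units that motivated the targets.

```python
HU_CAP = 2_400  # cap on weighted housing units per cluster, from the text


def overall_target_probability(max_hus_in_stratum: int) -> float:
    """1 in (2,400 / maximum cluster size), e.g. 1 in 1,200 for stratum 1."""
    return max_hus_in_stratum / HU_CAP


def second_phase_rate(overall_target: float, first_phase_rate: float, minimum_rate: float) -> float:
    """Table 3-11: use target / first-phase rate unless it falls below the minimum."""
    ratio = overall_target / first_phase_rate
    return ratio if ratio >= minimum_rate else minimum_rate


def max_weighted_hus(max_hus_in_stratum: int, first_phase_rate: float, second_rate: float) -> float:
    """Weighted housing units in the largest possible cluster of the stratum."""
    return max_hus_in_stratum / (first_phase_rate * second_rate)


if __name__ == "__main__":
    # Stratum 1 of Table 3-10: clusters of 0 to 2 HUs, minimum second-phase rate 1/10.
    target = overall_target_probability(2)   # 1/1,200
    for first_phase in (1 / 50, 1 / 500):    # hypothetical first-phase rates
        rate = second_phase_rate(target, first_phase, 1 / 10)
        print(round(rate, 4), round(max_weighted_hus(2, first_phase, rate)))
```

With the first hypothetical first-phase rate, the ratio falls below 1/10, so the minimum applies and the largest cluster carries only about 1,000 weighted housing units. With the second rate, the ratio (about 0.4167) is used and the largest cluster reaches the 2,400 cap exactly. This shows why the target probability was set at 1 in (2,400 / maximum cluster size).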

Selecting a subsample. Separate samples were selected from each second-phase stratum within each state and the District of Columbia. This required calculating the actual sampling rate for the stratum, sorting the clusters, and drawing a systematic sample of clusters.

All clusters with 10 or more housing units, classified as List/Enumerate, or in American Indian Country were retained in sample. The sampling rates for the remaining three strata were computed to achieve an integer number of block clusters drawn from each stratum, similar to the procedures used for the medium and large cluster reduction. This required three steps: computing a sampling rate, which resulted in a non-integer expected number of clusters; determining an integer number of clusters to select; and calculating the final sampling rate based on the ratio of clusters. The small cluster reduction followed the sampling procedure discussed earlier.

This resulted in a total of 1,538 out of 5,000 small clusters retained in the A.C.E. sample for the 50 states and the District of Columbia.

Small cluster reduction results. Table 3-12 gives the distribution of block clusters and housing units after small block cluster reduction. As mentioned earlier, the larger of the independent listing housing unit count and the housing unit count from the updated census address list as of January 2000 was used to stratify the clusters. In Table 3-12, however, only the independent listing housing unit count is used in the tallies. Hence, the 55 clusters in the 6-9 cluster size group contain only 325 listed housing units, short of the minimum of 330 (55 × 6) implied by the cluster size.

Second-Phase Sampling Results

Table 3-13 lists the block cluster sample sizes and the number of housing units in each state, the District of Columbia, and the nation after the second phase of A.C.E. sampling.

Table 3-12. Second-Phase Results: Small Block Cluster and Housing Unit Counts

Cluster size   American Indian          List/Enumerate      Number of  Number of
(HUs)¹         country status           status         housing units²   clusters
0-2            No                       No                          209        692
3-5            No                       No                          358        117
6-9            No                       No                          325         55
10+            -                        -                         4,532        112
0-2            No                       Yes                          59        290
3-9            No                       Yes                          76         16
0-9            Reservation/Trust land   -                            43        128
0-2            TJSA/TDSA/ANVSA³         -                            40        121
3-9            TJSA/TDSA/ANVSA          -                            30          7
Total                                                             5,672      1,538

¹ The size of a cluster was based on the higher of the independent listing housing unit count or the January 2000 census address list count. For List/Enumerate clusters, the size was always based on the actual independent listing count, since the List/Enumerate operation had not yet been started by the time of this reduction.
² Keyed independent listing housing unit counts as of January 2000.
³ Tribal Jurisdiction Statistical Area/Tribal Designated Statistical Area/Alaska Native Village Statistical Area.


Table 3-13. State Second-Phase Sample Results by First-Phase Stratum

                            Second-phase housing units¹            Second-phase block clusters
State                  Small   Medium    Large    AIR    Total    Small Medium Large  AIR  Total
Alabama                   54    3,599    7,531      0   11,184       14    104    43    0    161
Alaska                    24    1,401    3,099     16    4,540        7     40    22    1     70
Arizona                  140    3,082   17,185  2,826   23,233       69     79    61  113    322
Arkansas                  16    2,077    3,566      0    5,659       13     71    24    0    108
California               401   19,124   77,913    204   97,642       93    528   469   11  1,101
Colorado                  19    2,722    9,248     52   12,041       24     85    55    2    166
Connecticut               37    1,699    6,718      0    8,454        5     59    47    0    111
Delaware                   7    1,572    2,979      0    4,558        3     40    23    0     66
District of Columbia       0    1,251    5,403      0    6,654        2     25    31    0     58
Florida                  265    7,976   54,986     20   63,247       43    259   230    1    533
Georgia                  211    4,095   21,195      0   25,501       27    138   111    0    276
Hawaii                    11    1,200   22,252      0   23,463        6     40    75    0    121
Idaho                     12    1,632    2,714    152    4,510       32     53    16    6    107
Illinois                  51    7,527   20,041      0   27,619       25    247   131    0    403
Indiana                  125    4,141    7,431      0   11,697       24    141    46    0    211
Iowa                     161    2,338    3,705      0    6,204       21     79    22    0    122
Kansas                    33    2,193    3,488     31    5,745       24     70    22    1    117
Kentucky                  92    2,329    9,621      0   12,042       14     92    52    0    158
Louisiana                  7    3,332    6,574      0    9,913       40    109    50    0    199
Maine                     38    1,447    2,020      1    3,506       24     53    16    1     94
Maryland                  22    3,288   17,041      0   20,351        6     77    82    0    165
Massachusetts            105    3,467   11,471      0   15,043       10    120    80    0    210
Michigan                  64    6,612   13,581    148   20,405       19    227    92    5    343
Minnesota                 79    3,210    7,275    286   10,850       28    116    49   10    203
Mississippi               84    2,499    2,957     96    5,636       20     76    25    3    124
Missouri                 269    3,229   11,558      0   15,056       24    113    51    0    188
Montana                   15    1,880    2,365    905    5,165       41     60    14   24    139
Nebraska                  25    1,685    1,317     91    3,118       31     53    13    3    100
Nevada                     1    1,361    8,506    204   10,072       38     28    30    5    101
New Hampshire             50    1,658    2,535      0    4,243       11     46    19    0     76
New Jersey                 4    4,883   14,960      0   19,847        8    147   103    0    258
New Mexico                29    1,813    2,666  1,854    6,362       76     47    19   70    212
New York                 582    8,256   62,616     93   71,547       34    271   317    5    627
North Carolina           300    5,149   18,901    136   24,486       28    151    93    4    276
North Dakota              35    1,332    2,076    394    3,837       34     58    17   12    121
Ohio                     146    6,906   22,631      0   29,683       22    230   127    0    379
Oklahoma                  96    2,557    5,142    267    8,062      104     89    31    8    232
Oregon                     7    2,165    7,231    124    9,527       52     70    44    3    169
Pennsylvania             203    8,622   15,227      0   24,052       28    293   107    0    428
Rhode Island               6    1,517    2,517      0    4,040        4     47    18    0     69
South Carolina           113    3,540    9,094      0   12,747       15     88    39    0    142
South Dakota              22    1,307    2,613    453    4,395       40     55    14   27    136
Tennessee                381    4,000   10,436      0   14,817       24    125    58    0    207
Texas                    714   13,473   47,011     30   61,228      149    405   238    1    793
Utah                     112    2,583    4,061    134    6,890       29     48    23    7    107
Vermont                   16    1,191    3,237      0    4,444       10     45    20    0     75
Virginia                  62    3,443   20,872      0   24,377       15    131   112    0    258
Washington               225    3,320   12,976    438   16,959       33    106    76   17    232
West Virginia             24    1,263    4,666      0    5,953       10     46    23    0     79
Wisconsin                164    4,380    5,909    219   10,672       24    138    39   10    211
Wyoming                   13    1,778    1,186     89    3,066       61     62    11    5    139
Total U.S.             5,672  187,104  642,303  9,263  844,342    1,538  5,880 3,530  355 11,303

¹ Keyed independent listing housing unit counts as of January 2000. "Keyed" means these counts went through a quality control review. Consequently, small discrepancies may exist between these independent listing housing unit counts and those from Table 3-8.
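The small-cluster selection step described under "Selecting a subsample" can be sketched as follows: sort the clusters in each second-phase stratum by the keys listed under "Sorting the PSUs," then draw a systematic sample. This is a sketch, not the operational code. The record type and field names are hypothetical. The random start and fractional skip interval are a standard systematic-sampling construction assumed here, not taken from the text. `n_select` would come from the integer-rounding step.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SmallCluster:
    urbanization_1990: int  # hypothetical code for 1990 urbanization
    county_code: str
    ace_cluster_id: str


def sort_key(c: SmallCluster) -> tuple:
    # Sort order from the text: 1990 urbanization, county code, A.C.E. cluster ID.
    return (c.urbanization_1990, c.county_code, c.ace_cluster_id)


def systematic_sample(clusters: list[SmallCluster], n_select: int, rng: random.Random) -> list[SmallCluster]:
    """Sort the clusters, then take every (N / n)-th one from a random start."""
    ordered = sorted(clusters, key=sort_key)
    n_total = len(ordered)
    if not 0 < n_select <= n_total:
        raise ValueError("n_select must be between 1 and the number of clusters")
    interval = n_total / n_select      # skip interval, always >= 1
    start = rng.random() * interval    # random start in [0, interval)
    # min() guards against floating-point round-up at the very end of the list.
    return [ordered[min(int(start + k * interval), n_total - 1)] for k in range(n_select)]


if __name__ == "__main__":
    frame = [SmallCluster(i % 3, f"{i % 4:03d}", f"C{i:04d}") for i in range(40)]
    sample = systematic_sample(frame, 7, random.Random(2000))
    print([c.ace_cluster_id for c in sample])
```

Because the interval is at least one, each selected position falls in a different cluster. The sample also preserves the implicit stratification created by the sort.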


THIRD PHASE OF THE A.C.E. SAMPLE DESIGN

In very large block clusters, the housing units within the cluster were subsampled. This achieved manageable field workloads for A.C.E. interviewing and person follow-up without a large loss of reliability. The strategy of the A.C.E. large block cluster sampling plan was to increase the number of clusters in sample while still attaining the targeted number of housing units for interview. Because housing units in a block cluster are often similar, interviewing all of them is not the most efficient use of resources. Instead, interviewing a manageable fraction of several different clusters provides a more geographically diverse sample.

In the first-phase sampling, large block clusters had a higher selection probability than medium block clusters to take into account this anticipated, subsequent housing unit reduction. The A.C.E. second-phase reduction maintained the differential selection probabilities between large and medium block clusters. After the reduction of housing units in large block clusters, the housing unit selection probabilities in medium and large block clusters in the same second-phase sampling stratum were similar.

Another important goal of this housing unit reduction was to geographically overlap the P and E samples to reduce the E-sample person follow-up workload. An overlapping P and E sample was not necessary, but it improved the precision of dual system estimates, the cost-effectiveness of the succeeding operation, and the data processing efficiency.

Identifying the P-Sample Housing Units

The source of the P-sample housing units, which were subject to person interviewing by the field staff, was the independently listed housing units that were confirmed to exist following the housing unit matching and follow-up operations. (See Chapter 4.) In block clusters that had fewer than 80 of these housing units, all of the housing units were designated to be in the P sample. In addition, all housing units in a block cluster selected from the American Indian Reservation stratum were in the P sample, regardless of how many housing units were in the block cluster. Most block clusters from this stratum were expected to have fewer than 80 housing units, and it was desirable to avoid introducing weight variation to the sample cases for this stratum. For block clusters with 80 or more housing units, the housing units were subsampled and the selected housing units were in the P sample.

The reduction of housing units within a large block cluster was done by forming groups of adjacent housing units called segments and selecting one or more segments of housing units to participate in the P sample. The segments had approximately equal numbers of housing units within a block cluster. Segments of housing units were used as the sampling units in order to obtain compact interviewing workloads and to facilitate overlapping P and E samples, which reduced E-sample person follow-up workloads.

Flow of operations. A complication of this project was that large block clusters were ready for housing unit subsampling on a flow basis, as the preceding operations (housing unit matching and follow-up) were completed. To remain on schedule, it was essential that the P-sample housing units were selected and prepared for interview as quickly as possible. This meant that sampling parameters were computed based on the housing unit counts from the independent listing. If scheduling had not been an issue, the housing unit counts from the housing unit matching and follow-up would have been used. The time schedule constraints did not permit the entire country to be processed prior to subsampling. Further, there was no pre-specified order in which block clusters were ready for housing unit subsampling. Thus, following the flow of block clusters from the preceding operations, the housing unit subsampling was performed daily.

Stratifying third-phase clusters. Before selecting the sample of segments, block clusters were divided into seven strata within each state. The first five strata were the same strata used for the second phase of sampling for the medium and large first-phase strata. The sixth stratum was the small-to-large stratum jumpers: block clusters from the small stratum observed to have more than 80 housing units during the independent listing. The seventh stratum was equivalent to the first-phase American Indian Reservation stratum, for which no housing unit reduction was done.

Allocating the sample. Nationally, the target distribution of the 300,000-housing-unit P sample was roughly proportional to population size, except for increases in sample size in the smaller states, which had roughly equal sizes. The second phase introduced differential sampling within each state and generated overall target sample sizes for each reduction stratum in the state (the Tg in the earlier section). Based on these targets and the observed second-phase sample block clusters, the sample was allocated to each stratum to provide approximately equal overall probabilities of selection for housing units from the same stratum.

Determining sampling parameters. Separate sampling parameters were computed for each stratum within a state. For each stratum, the selection probability was the ratio of the target number of housing units from large block clusters to the number of housing units from the independent listing in large block clusters:

$$\text{Within-cluster sampling rate} = \frac{\text{Target housing unit sample size in large block clusters}}{\text{Number of listed housing units in large block clusters}}$$
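The following is a minimal sketch of the P-sample designation rule and the within-cluster sampling rate described above. The function names and the "all"/"subsample" labels are hypothetical. As the text explains, the rate is computed from independent listing counts for schedule reasons, while the 80-unit threshold applies to housing units confirmed after matching and follow-up. The two inputs therefore come from different sources.

```python
TAKE_ALL_LIMIT = 80  # clusters with fewer confirmed HUs keep every housing unit


def p_sample_designation(confirmed_hus: int, from_air_stratum: bool) -> str:
    """Return 'all' if every confirmed housing unit enters the P sample, else 'subsample'."""
    if from_air_stratum or confirmed_hus < TAKE_ALL_LIMIT:
        return "all"
    return "subsample"


def within_cluster_rate(target_hus_large: float, listed_hus_large: int) -> float:
    """Target HU sample size in large clusters / listed HUs in large clusters."""
    if listed_hus_large <= 0:
        raise ValueError("need at least one listed housing unit in large clusters")
    return target_hus_large / listed_hus_large


if __name__ == "__main__":
    print(p_sample_designation(79, False), p_sample_designation(80, False),
          p_sample_designation(250, True))            # all subsample all
    print(within_cluster_rate(3_000, 12_000))         # 0.25 (hypothetical totals)
```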

The target housing unit sample size was derived by subtracting the number of housing units in medium block clusters, based on the independent list, from the target stratum sample size. When tallying the housing unit counts from the independent list, any housing units classified as future construction were omitted from the count. Although some of this future construction was probably going to be built by Census Day, this was expected to be a rare occurrence.

Within a particular stratum in a state, a fixed number of segments was formed in each block cluster. This number was a function of the within-cluster sampling rate. As a result, segment sizes differed across block clusters within the same stratum. This approach is a trade-off between having fewer segments to reduce nonsampling error and having more segments of a fixed size to reduce sample size variation. Nonsampling error was reduced by having fewer segment boundaries to identify. If the within-cluster sampling rate was less than or equal to 0.5, then

$$\text{Number of segments} = \left\lceil \frac{1}{\text{within-cluster sampling rate}} \right\rceil,$$

that is, the reciprocal of the rate rounded up to the nearest integer. When the within-cluster sampling rate was greater than 0.5, this formula yields only two segments, which increases sample size variation because each segment is large. To better control sample size variation when the sampling rate was greater than 0.5, the number of segments was calculated as

$$\text{Number of segments} = \frac{1}{1 - \text{within-cluster sampling rate}}.$$

Forming the segments. Within each block cluster, the housing units were sorted by census block and by geographic location within the block. Then, based on the number of segments, approximately equal numbers of housing units were assigned to each segment.

Selecting a subsample. Within-cluster subsampling was done daily as the clusters completed the housing unit matching and follow-up operations. Despite the daily processing, the subsampling was equivalent to a one-time sample, since the results of each day were carried over to the next. The one difference introduced by the daily operation was that the block cluster sort could not be controlled across all block clusters in the stratum, because of the flow of the block clusters. So, each day the block clusters that were to be subsampled were sorted by block cluster number within each stratum.

A sample of segments was selected by taking one systematic sample across all large block clusters in each stratum within a state. Selecting one systematic sample per sampling stratum, rather than a separate sample from each large cluster, reduced sample size variability. This allowed the observed sample size to come close to the target housing unit sample size.

P-Sample Results

Following within-cluster subsampling, the sample for the 50 states and the District of Columbia was 11,303 block clusters containing about 301,000 housing units. Table 3-14 displays the results for each state.


Table 3-14. State Third-Phase Sample Results for the P Sample

                          Housing unit counts¹              Block cluster counts
                          by cluster size²                  by cluster size²
State                     0-79      80+    AIR    Total     0-79   80+  AIR  Total
Alabama                  2,947    1,503      0    4,450      115    46    0    161
Alaska                   1,152      587     16    1,739       48    21    1     70
Arizona                  5,193    2,474  2,661    7,667      154    55  113    322
Arkansas                 1,795      921      0    2,716       86    22    0    108
California              18,608   14,919    192   33,527      675   415   11  1,101
Colorado                 2,662    1,491     50    4,153      113    51    2    166
Connecticut              1,971    1,272      0    3,243       72    39    0    111
Delaware                 1,077      693      0    1,770       42    24    0     66
District of Columbia     1,106    1,084      0    2,190       29    29    0     58
Florida                  8,736    6,518     20   15,254      329   203    1    533
Georgia                  4,690    3,072      0    7,762      183    93    0    276
Hawaii                   1,156    2,447      0    3,603       47    74    0    121
Idaho                    1,653      342    146    1,995       86    15    6    107
Illinois                 8,510    3,855      0   12,365      292   111    0    403
Indiana                  4,172    1,773      0    5,945      169    42    0    211
Iowa                     2,162      829      0    2,991      101    21    0    122
Kansas                   2,114      552     29    2,666      101    15    1    117
Kentucky                 2,607    1,372      0    3,979      111    47    0    158
Louisiana                3,031    1,386      0    4,417      153    46    0    199
Maine                    1,571      361      1    1,932       80    13    1     94
Maryland                 2,574    2,713      0    5,287       91    74    0    165
Massachusetts            4,500    1,893      0    6,393      151    59    0    210
Michigan                 7,224    2,756    147    9,980      259    79    5    343
Minnesota                3,734    1,420    261    5,154      151    42   10    203
Mississippi              2,332      602     96    2,934       97    24    3    124
Missouri                 3,389    2,120      0    5,509      141    47    0    188
Montana                  2,146      654    863    2,800      100    15   24    139
Nebraska                 1,736      225     79    1,961       86    11    3    100
Nevada                   1,141      973    189    2,114       70    26    5    101
New Hampshire            1,156      609      0    1,765       53    23    0     76
New Jersey               5,369    2,902      0    8,271      175    83    0    258
New Mexico               2,600      988  1,736    3,588      119    23   70    212
New York                 9,301    9,390     88   18,691      332   290    5    627
North Carolina           4,405    3,438     93    7,843      177    95    4    276
North Dakota             1,780      404    381    2,184       95    14   12    121
Ohio                     7,369    3,973      0   11,342      262   117    0    379
Oklahoma                 2,696      970    260    3,666      193    31    8    232
Oregon                   1,866    1,606    124    3,472      127    39    3    169
Pennsylvania             9,463    2,801      0   12,264      344    84    0    428
Rhode Island             1,200      574      0    1,774       49    20    0     69
South Carolina           2,505    1,994      0    4,499      103    39    0    142
South Dakota             1,657      520    439    2,177       95    14   27    136
Tennessee                4,071    1,748      0    5,819      156    51    0    207
Texas                   13,031    7,331     29   20,362      588   204    1    793
Utah                     1,640      846    122    2,486       73    27    7    107
Vermont                  1,345      571      0    1,916       57    18    0     75
Virginia                 3,765    3,122      0    6,887      156   102    0    258
Washington               4,064    2,043    416    6,107      147    68   17    232
West Virginia            1,108      769      0    1,877       56    23    0     79
Wisconsin                4,323    1,186    209    5,509      164    27   10    211
Wyoming                  1,391      527     83    1,918      121    13    5    139
Total U.S.             184,155  108,028  8,730  300,913    7,774 3,174  355 11,303

¹ The source of the P-sample housing unit counts was the independent list that was confirmed to exist following the housing unit matching and follow-up operations.
² Cluster size was based on the number of confirmed A.C.E. housing units after housing unit matching and follow-up.
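The third-phase P-sample steps can be sketched together: the target derivation, the segment-count rule, and one systematic sample of segments per stratum that is carried across daily batches (see "Determining sampling parameters," "Forming the segments," and "Selecting a subsample"). This is an illustration, not the operational system, and the class and function names are hypothetical. Each segment is treated as one unit of a running systematic sample with skip interval 1/rate, so each segment, and hence each housing unit, is selected with probability equal to the within-cluster rate. The text does not say how 1/(1 - rate) was made an integer, so the sketch assumes rounding up.

```python
import math
import random


def target_large_hus(stratum_target: float, medium_listed_hus: int) -> float:
    """Stratum target minus medium-cluster HUs from the independent list."""
    return stratum_target - medium_listed_hus


def number_of_segments(rate: float) -> int:
    if not 0 < rate < 1:
        raise ValueError("within-cluster sampling rate must be in (0, 1)")
    if rate <= 0.5:
        return math.ceil(1 / rate)
    # ASSUMPTION: the rounding of 1 / (1 - rate) is not stated; round up here.
    return math.ceil(1 / (1 - rate))


class SegmentSampler:
    """One systematic sample of segments per stratum, fed one day at a time."""

    def __init__(self, rate: float, rng: random.Random):
        self.interval = 1 / rate                      # skip interval, >= 1
        self.next_hit = rng.random() * self.interval  # random start
        self.units_seen = 0                           # segments processed so far, all days

    def process_day(self, clusters: list[tuple[int, int]]) -> list[tuple[int, int]]:
        """clusters: (block cluster number, number of segments) ready today.

        Returns the selected (block cluster number, segment index) pairs.
        """
        selected = []
        # Each day's clusters are sorted by block cluster number.
        for cluster_number, n_segments in sorted(clusters):
            for segment in range(n_segments):
                # This segment occupies [units_seen, units_seen + 1) on the running scale.
                if self.next_hit < self.units_seen + 1:
                    selected.append((cluster_number, segment))
                    self.next_hit += self.interval
                self.units_seen += 1
        return selected  # next_hit and units_seen carry over to the next day


if __name__ == "__main__":
    rate = within_rate = 0.25                  # hypothetical within-cluster rate
    k = number_of_segments(rate)               # 4 segments per cluster
    sampler = SegmentSampler(rate, random.Random(7))
    print(sampler.process_day([(102, k), (101, k)]))
    print(sampler.process_day([(103, k), (104, k)]))
```

With a rate of 0.25 and four segments per cluster, the sketch takes exactly one segment from each cluster, at the same position in each cluster because the interval is an integer. With other rates, some clusters contribute more segments than others, while the stratum total stays close to the target. This is the reduction in sample size variability that the text attributes to a single systematic sample per stratum.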

Identifying the E-Sample Housing Units

The E sample consisted of the census enumerations in the same sample areas as the P sample. The source of the E-sample housing units was the unedited census files. As in the P sample, all housing units in block clusters that had fewer than 80 census housing units, or in block clusters selected from the American Indian Reservation stratum, were designated to be in the E sample. For block clusters with 80 or more housing units, the housing units were subsampled and the selected housing units were in the E sample.


The reduction of housing units within a large block cluster was done by mapping the P-sample segments onto the census housing units. This was possible because, when there was a match between an A.C.E. independently listed address and a census address during the housing unit matching, the census identification number was linked to the A.C.E. unit. The same segment selected for the P sample was then selected for the E sample.

The census inventory of housing units changed between the housing unit matching operation and the identification of the E sample. Therefore, some census housing units did not have a link with an A.C.E. unit. These cases were assigned to a segment using pre-specified rules. Sometimes there were a large number of these cases in the segment selected to be in sample. If there were more than 80 of these, then an additional subsample was drawn from these census housing units without a link to an A.C.E. unit.

The data-defined census person enumerations in the E-sample housing units were in the E sample. To be a census data-defined person, the person record had to have two 100-percent data items filled. Name was not required for the person record to be considered data-defined, but it could be one of the two required items.

Census housing units not available for the E sample. Not all housing units on the unedited census file were eligible to be in the E sample. As the census enumerations were being processed, the Census Bureau suspected that there was a significant number of duplicate addresses in the census files. As a result, a new census operation, the Housing Unit Duplication Operation, was introduced in the fall of 2000. The primary goal of this operation was to improve the quality of the census; however, its design allowed the A.C.E. operations to proceed. Essentially, suspected duplicate housing units were set aside and analyzed further. These housing units and the corresponding census person enumerations were not eligible for the E-sample component of the A.C.E., were not available for person matching, and were excluded from the dual-system estimation calculation. Some of these set-aside housing units and the corresponding census enumerations were later put back into the final census counts.

Subsampling criteria. If a block cluster contained fewer than 80 available census housing units, then all available census housing units were in the E sample. If the block cluster was from the American Indian Reservation stratum, all available housing units were in the E sample. If the block cluster had 80 or more available census housing units, the housing units were subsampled.

Assigning housing units to segments. Within a block cluster, the census housing units were assigned to a segment based on the link to an A.C.E. housing unit address. If there was a link with an A.C.E. unit, then the census housing unit was assigned to the same segment as the A.C.E. unit. This helped to create overlapping P and E samples. Sometimes a census housing unit did not have a link with an A.C.E. housing unit. When this happened, all the available census housing units were sorted, and each census housing unit without a link was assigned to the same segment as the preceding census housing unit.

When the block cluster contained city-style addresses, the census housing units were sorted by census block number, street name, house number, and unit designation. When the block cluster contained non-city-style addresses, the census housing units were sorted by census block number and geographic location within the block. For city-style census addresses, geographic location was not available.

Selecting the E-sample housing units. Once all the census housing units within a block cluster were assigned to a segment, the census housing units in the segment or segments selected for the P sample were in the E sample. Occasionally, the selected segment or segments within the block cluster contained more than 80 census housing units that did not link to an A.C.E. housing unit. When this occurred, an additional step of subsampling was done to reduce the E-sample follow-up workload, since census housing units without this link were more likely to contribute to the follow-up workload than census housing units with this link.

A systematic subsample of census housing units without a link to an A.C.E. housing unit was drawn. Using the same sort used for assigning housing units to a segment, a subsample of 40 housing units was selected if the resulting subsampling rate was greater than 0.25. However, to avoid excessive sampling weight variation, the minimum subsampling rate was set to 0.25. In such block clusters, more than 40 census housing units without a link to an A.C.E. housing unit were therefore in the E sample.

Special case block clusters. Some block clusters were special cases because none of their census housing units linked to an A.C.E. housing unit address at the time of the housing unit matching. One example of a special case was a List/Enumerate cluster, since the List/Enumerate operation had not been conducted by the time the housing unit matching was done. None of the housing units in a List/Enumerate cluster could be assigned to a segment. Instead of selecting a compact segment of housing units to be in the E sample, a systematic subsample of the housing units was drawn using the same method as discussed above. This prevented the P and E samples from overlapping when these block clusters were large. This did not happen often.
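The E-sample steps described above can be sketched as follows. Unlinked census units are assigned to the segment of the preceding unit in sort order. When there are more than 80 unlinked units, they are subsampled at a rate of 40/n, but never below 0.25. The data layout is hypothetical: a precomputed sort key stands in for the city-style or non-city-style sort. The text does not say what happens when the first units in sort order are unlinked, so this sketch leaves them unassigned (`None`).

```python
import random
from dataclasses import dataclass
from typing import Optional

UNLINKED_LIMIT = 80        # more than this many unlinked units triggers subsampling
UNLINKED_TARGET = 40       # desired subsample size
MIN_UNLINKED_RATE = 0.25   # floor on the subsampling rate


@dataclass
class CensusUnit:
    sort_key: tuple                # hypothetical: (block, street, house no., unit) or (block, location)
    linked_segment: Optional[int]  # segment of the linked A.C.E. unit, None if unlinked


def assign_segments(units: list[CensusUnit]) -> list[tuple[CensusUnit, Optional[int]]]:
    """Linked units take their A.C.E. segment; unlinked units inherit the previous unit's."""
    assigned, current = [], None
    for unit in sorted(units, key=lambda u: u.sort_key):
        if unit.linked_segment is not None:
            current = unit.linked_segment
        assigned.append((unit, current))
    return assigned


def unlinked_rate(n_unlinked: int) -> float:
    """1.0 for up to 80 unlinked units; otherwise 40/n, but never below 0.25."""
    if n_unlinked <= UNLINKED_LIMIT:
        return 1.0
    return max(UNLINKED_TARGET / n_unlinked, MIN_UNLINKED_RATE)


def systematic_subsample(items: list, rate: float, rng: random.Random) -> list:
    """Every (1/rate)-th item, in the given sort order, from a random start."""
    interval = 1 / rate
    start = rng.random() * interval
    picks, k = [], 0
    while (pos := start + k * interval) < len(items):
        picks.append(items[int(pos)])
        k += 1
    return picks


if __name__ == "__main__":
    units = [CensusUnit((3,), None), CensusUnit((1,), 2), CensusUnit((2,), None), CensusUnit((4,), 1)]
    print([seg for _, seg in assign_segments(units)])  # [2, 2, 2, 1]
    for n in (80, 100, 400):
        rate = unlinked_rate(n)
        print(n, rate, len(systematic_subsample(list(range(n)), rate, random.Random(1))))
```

With 100 unlinked units, the rate is 0.4 and 40 units are kept. With 400 units, 40/400 would be only 0.1, so the 0.25 floor applies and 100 units are kept. This matches the text's note that the floor can leave more than 40 unlinked units in the E sample.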


E-Sample Results

Following E-sample identification and subsampling, the E sample for the 50 states and the District of Columbia was 11,303 block clusters containing about 311,000 housing units. Table 3-15 displays the results for each state.

Table 3-15. State Third-Phase Sample Results for the E Sample

                          Housing unit¹ counts              Block cluster counts
                          by cluster size²                  by cluster size²
State                     0-79      80+    AIR    Total     0-79   80+  AIR  Total
Alabama                  2,776    1,793      0    4,569      113    48    0    161
Alaska                     925      926     16    1,867       44    25    1     70
Arizona                  2,819    2,521  2,521    7,861      152    57  113    322
Arkansas                 1,838    1,118      0    2,956       85    23    0    108
California              17,906   16,228    271   34,405      658   432   11  1,101
Colorado                 2,623    1,587     49    4,259      114    50    2    166
Connecticut              2,074    1,241      0    3,315       73    38    0    111
Delaware                 1,270      659      0    1,929       45    21    0     66
District of Columbia     1,144    1,216      0    2,360       29    29    0     58
Florida                  8,108    7,037     26   15,171      320   212    1    533
Georgia                  4,373    3,346      0    7,719      179    97    0    276
Hawaii                   1,323    2,653      0    3,976       49    72    0    121
Idaho                    1,243      850    155    2,248       72    19    6    107
Illinois                 8,190    4,302      0   12,492      288   115    0    403
Indiana                  4,082    1,870      0    5,952      170    41    0    211
Iowa                     2,237      907      0    3,144      102    20    0    122
Kansas                   2,097      734     27    2,858      100    16    1    117
Kentucky                 2,360    1,692      0    4,052      107    51    0    158
Louisiana                3,078    1,809      0    4,887      152    47    0    199
Maine                    1,595      429      1    2,025       80    13    1     94
Maryland                 2,651    2,786      0    5,437       92    73    0    165
Massachusetts            4,249    2,736      0    6,985      146    64    0    210
Michigan                 6,682    3,311    146   10,139      253    85    5    343
Minnesota                3,183    1,720    260    5,163      148    45   10    203
Mississippi              2,374      647    114    3,135       99    22    3    124
Missouri                 3,231    2,098      0    5,329      141    47    0    188
Montana                  1,480      521    866    2,867       98    17   24    139
Nebraska                 1,637      258     68    1,963       86    11    3    100
Nevada                     907    1,175    133    2,215       67    29    5    101
New Hampshire            1,426      467      0    1,893       57    19    0     76
New Jersey               4,952    3,666      0    8,618      170    88    0    258
New Mexico               1,136      754  1,536    3,426      121    21   70    212
New York                 9,114   11,071     84   20,269      326   296    5    627
North Carolina           4,510    3,253    101    7,864      182    90    4    276
North Dakota             1,401      482    358    2,241       95    14   12    121
Ohio                     7,223    4,016      0   11,239      263   116    0    379
Oklahoma                 2,366    1,038    265    3,669      193    31    8    232
Oregon                   1,644    2,378    125    4,147      122    44    3    169
Pennsylvania . . . . . . . . .
. . . . . . . . . . . . . . 9,143 3,449 0 12,592 336 92 0 428 Rhode Island . . . . . . . . . . . . . . . . . . . . . . . 1,194 556 0 1,750 50 19 0 69 South Carolina . . . . . . . . . . . . . . . . . . . . . . 2,502 1,968 0 4,470 105 37 0 142 South Dakota . . . . . . . . . . . . . . . . . . . . . . . 1,278 495 433 2,206 95 14 27 136 Tennessee . . . . . . . . . . . . . . . . . . . . . . . . . . 4,022 2,429 0 6,451 157 50 0 207 Texas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12,412 9,213 27 21,652 574 218 1 793 Utah . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1,434 818 123 2,375 75 25 7 107 Vermont . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1,263 640 0 1,903 56 19 0 75 Virginia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3,555 3,731 0 7,286 152 106 0 258 Washington . . . . . . . . . . . . . . . . . . . . . . . . . 3,371 2,609 411 6,391 144 71 17 232 West Virginia . . . . . . . . . . . . . . . . . . . . . . . . 1,173 724 0 1,897 57 22 0 79 Wisconsin . . . . . . . . . . . . . . . . . . . . . . . . . . 4,067 1,159 211 5,437 167 34 10 211 Wyoming . . . . . . . . . . . . . . . . . . . . . . . . . . . 1,337 554 84 1,975 121 13 5 139 Total U.S. . . . . . . . . . . . . . . . . . . . . . . . . . . . 178,978 123,640 8,411 311,029 7,690 3,258 355 11,303 1

Available housing unit counts from the unedited census file.

2 Cluster size was based on available census housing unit tallies.
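As a quick consistency check on Table 3-15, the three cluster-size columns of housing units should add up to the Total column, and their shares show how the E-sample housing units split across cluster sizes. The Python sketch below copies a few rows from the table; the names `ROWS`, `row_is_consistent`, and `cluster_size_shares` are illustrative only and do not come from any Census Bureau system.

```python
# Housing unit counts by cluster size, copied from a few rows of Table 3-15.
# Tuple layout: (0-79, 80+, AIR, Total).
ROWS = {
    "Arizona": (2_819, 2_521, 2_521, 7_861),
    "New Mexico": (1_136, 754, 1_536, 3_426),
    "Total U.S.": (178_978, 123_640, 8_411, 311_029),
}


def row_is_consistent(counts: tuple[int, int, int, int]) -> bool:
    """True if the 0-79, 80+, and AIR columns add up to the Total column."""
    small, large, air, total = counts
    return small + large + air == total


def cluster_size_shares(counts: tuple[int, int, int, int]) -> tuple[float, ...]:
    """Percent of the row's housing units in each cluster-size column."""
    small, large, air, total = counts
    return tuple(round(100 * n / total, 1) for n in (small, large, air))


for name, counts in ROWS.items():
    print(name, row_is_consistent(counts), cluster_size_shares(counts))
```

For the Total U.S. row, the shares come to roughly 57.5, 39.8, and 2.7 percent for the 0-79, 80+, and AIR columns.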


Third-Phase Sampling Results

Table 3-16 gives the state weighted and unweighted P-sample and E-sample housing unit counts. Also displayed are the average P-sample and E-sample weights, prior to weight trimming, TES adjustment, and nonresponse adjustments. The average weights ranged from approximately 100 to 500.

In Table 3-16, for most of the states, the average E-sample weight is smaller than the average P-sample weight. Nationally, although the P- and E-sample sizes differ by about 10,000 housing units, after weighting the weighted number of P-sample housing units is less than one percent larger than the weighted number of E-sample housing units.
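The average weights in Table 3-16 are consistent with dividing each weighted housing unit estimate by the corresponding unweighted sample size. The text does not state this formula, so the Python sketch below treats it as an assumption and applies it, along with the two P/E ratios, to the United States row.

```python
# United States row of Table 3-16.
P_WEIGHTED, E_WEIGHTED = 115_650_803, 115_016_729  # weighted housing units
P_SAMPLE, E_SAMPLE = 300_913, 311_029              # unweighted sample sizes


def average_weight(weighted_total: int, sample_size: int) -> float:
    """Assumed definition: weighted estimate divided by unweighted count."""
    return weighted_total / sample_size


p_avg = average_weight(P_WEIGHTED, P_SAMPLE)
e_avg = average_weight(E_WEIGHTED, E_SAMPLE)
weighted_ratio = P_WEIGHTED / E_WEIGHTED
sample_ratio = P_SAMPLE / E_SAMPLE

print(f"average weight: P = {p_avg:.0f}, E = {e_avg:.0f}")
print(f"P/E weighted = {weighted_ratio:.3f}, P/E sample = {sample_ratio:.3f}")
print(f"E sample exceeds P sample by {E_SAMPLE - P_SAMPLE:,} housing units")
```

Run against the United States row, this reproduces the published averages of 384 and 370, the ratios of 1.006 and 0.967, and a sample-size gap of 10,116 housing units, which is the "about 10,000" cited in the text.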

Table 3-16. P-Sample and E-Sample Housing Unit Sampling Results

                         Weighted housing unit estimates   Housing unit sample sizes      Average weight
State                      P sample     E sample     P/E  P sample  E sample     P/E  P sample  E sample
Alabama                   1,967,703    1,953,559   1.007     4,450     4,569   0.974       442       428
Alaska                      186,971      187,657   0.996     1,739     1,867   0.931       108       101
Arizona                   2,291,735    2,419,098   0.947     7,667     7,861   0.975       299       308
Arkansas                  1,204,014    1,214,878   0.991     2,716     2,956   0.919       443       411
California               12,255,066   12,129,849   1.010    33,527    34,405   0.974       366       353
Colorado                  1,633,980    1,579,070   1.035     4,153     4,259   0.975       393       371
Connecticut               1,262,197    1,249,792   1.010     3,243     3,315   0.978       389       377
Delaware                    282,962      285,557   0.991     1,770     1,929   0.918       160       148
District of Columbia        295,972      295,099   1.003     2,190     2,360   0.928       135       125
Florida                   7,350,667    6,958,799   1.056    15,254    15,171   1.005       482       459
Georgia                   3,178,003    3,101,337   1.025     7,762     7,719   1.006       409       402
Hawaii                      446,780      467,582   0.956     3,603     3,976   0.906       124       118
Idaho                       475,978      494,377   0.963     1,995     2,248   0.887       239       220
Illinois                  4,752,616    4,723,175   1.006    12,365    12,492   0.990       384       378
Indiana                   2,565,559    2,611,248   0.983     5,945     5,952   0.999       432       439
Iowa                      1,286,159    1,303,393   0.987     2,991     3,144   0.951       430       415
Kansas                    1,054,277    1,085,066   0.972     2,666     2,858   0.933       395       380
Kentucky                  1,738,637    1,688,359   1.030     3,979     4,052   0.982       437       417
Louisiana                 1,690,093    1,767,498   0.956     4,417     4,887   0.904       383       362
Maine                       606,684      580,671   1.045     1,932     2,025   0.954       314       287
Maryland                  2,240,463    2,237,811   1.001     5,287     5,437   0.972       424       412
Massachusetts             2,637,732    2,652,699   0.994     6,393     6,985   0.915       413       380
Michigan                  3,945,568    3,948,348   0.999     9,980    10,139   0.984       395       389
Minnesota                 1,976,410    1,940,302   1.019     5,154     5,163   0.998       383       376
Mississippi               1,067,393    1,065,495   1.002     2,934     3,135   0.936       364       340
Missouri                  2,678,909    2,576,545   1.040     5,509     5,329   1.034       486       483
Montana                     463,607      459,884   1.008     2,800     2,867   0.977       166       160
Nebraska                    684,874      667,586   1.026     1,961     1,963   0.999       349       340
Nevada                      895,050      862,509   1.038     2,114     2,215   0.954       423       389
New Hampshire               558,641      523,562   1.067     1,765     1,893   0.932       317       277
New Jersey                3,377,908    3,338,768   1.012     8,271     8,618   0.960       408       387
New Mexico                  708,714      667,620   1.062     3,588     3,426   1.047       198       195
New York                  7,573,292    7,706,526   0.983    18,691    20,269   0.922       405       380
North Carolina            3,857,166    3,748,539   1.029     7,843     7,864   0.997       492       477
North Dakota                294,040      288,677   1.019     2,184     2,241   0.975       135       129
Ohio                      4,785,461    4,687,680   1.021    11,342    11,239   1.009       422       417
Oklahoma                  1,461,163    1,465,046   0.997     3,666     3,669   0.999       399       399
Oregon                    1,411,681    1,431,030   0.986     3,472     4,147   0.837       407       345
Pennsylvania              5,130,010    5,179,175   0.991    12,264    12,592   0.974       418       411
Rhode Island                408,426      401,022   1.018     1,774     1,750   1.014       230       229
South Carolina            2,274,389    2,332,485   0.975     4,499     4,470   1.006       506       522
South Dakota                300,952      297,492   1.012     2,177     2,206   0.987       138       135
Tennessee                 2,489,607    2,609,919   0.954     5,819     6,451   0.902       428       405
Texas                     8,116,215    8,098,923   1.002    20,362    21,652   0.940       399       374
Utah                        885,164      823,255   1.075     2,486     2,375   1.047       356       347
Vermont                     307,822      296,414   1.038     1,916     1,903   1.007       161       156
Virginia                  2,714,879    2,797,836   0.970     6,887     7,286   0.945       394       384
Washington                2,496,269    2,435,145   1.025     6,107     6,391   0.956       409       381
West Virginia               917,901      916,552   1.001     1,877     1,897   0.989       489       483
Wisconsin                 2,274,773    2,268,976   1.003     5,509     5,437   1.013       413       417
Wyoming                     190,271      194,844   0.977     1,918     1,975   0.971        99        99
United States           115,650,803  115,016,729   1.006   300,913   311,029   0.967       384       370
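The statement that the average E-sample weight is smaller than the average P-sample weight "for most of the states" can be checked against the last two columns of Table 3-16. The Python sketch below copies those two columns into a string and counts the rows in each direction; the names `AVERAGE_WEIGHTS` and `parse` are illustrative.

```python
# Average P-sample and E-sample weights, copied from the last two columns
# of Table 3-16 (50 states plus the District of Columbia).
AVERAGE_WEIGHTS = """\
Alabama 442 428
Alaska 108 101
Arizona 299 308
Arkansas 443 411
California 366 353
Colorado 393 371
Connecticut 389 377
Delaware 160 148
District of Columbia 135 125
Florida 482 459
Georgia 409 402
Hawaii 124 118
Idaho 239 220
Illinois 384 378
Indiana 432 439
Iowa 430 415
Kansas 395 380
Kentucky 437 417
Louisiana 383 362
Maine 314 287
Maryland 424 412
Massachusetts 413 380
Michigan 395 389
Minnesota 383 376
Mississippi 364 340
Missouri 486 483
Montana 166 160
Nebraska 349 340
Nevada 423 389
New Hampshire 317 277
New Jersey 408 387
New Mexico 198 195
New York 405 380
North Carolina 492 477
North Dakota 135 129
Ohio 422 417
Oklahoma 399 399
Oregon 407 345
Pennsylvania 418 411
Rhode Island 230 229
South Carolina 506 522
South Dakota 138 135
Tennessee 428 405
Texas 399 374
Utah 356 347
Vermont 161 156
Virginia 394 384
Washington 409 381
West Virginia 489 483
Wisconsin 413 417
Wyoming 99 99
"""


def parse(table: str) -> dict[str, tuple[int, int]]:
    """Map each state name to (average P weight, average E weight)."""
    rows = {}
    for line in table.splitlines():
        # State names may contain spaces, so split off the last two fields.
        name, p, e = line.rsplit(maxsplit=2)
        rows[name] = (int(p), int(e))
    return rows


weights = parse(AVERAGE_WEIGHTS)
smaller = [s for s, (p, e) in weights.items() if e < p]
equal = [s for s, (p, e) in weights.items() if e == p]
larger = [s for s, (p, e) in weights.items() if e > p]
print(f"{len(weights)} rows: E weight smaller in {len(smaller)}, "
      f"equal in {len(equal)}, larger in {len(larger)}")
print("E weight larger:", ", ".join(larger))
```

On these figures the E-sample average is smaller in 45 of the 51 rows, equal in 2 (Oklahoma and Wyoming), and larger in 4 (Arizona, Indiana, South Carolina, and Wisconsin).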

Attachment.

Census 2000 Type of Enumeration Areas (TEAs)¹

The term TEA has been used for several decennial censuses. For Census 2000, it reflects not only the type of enumeration, but also the method of compiling the census address list that controls the enumeration process.

The Census Bureau defines TEA codes at the census collection block level. Each block must have a TEA code, and no block may have more than one TEA code.

TEA 1 - Block Canvassing and Mailout/Mailback

  • Contains areas with predominantly city-style (house number/street name) addresses used for mail delivery
  • Census address list is created from USPS, 1990 census, local/tribal, and other potential supplementary address sources
  • Blocks are included in both Block Canvassing and the Postal Validation Check
  • Blocks are included in the local/tribal program to identify new construction

Mailout/Mailback is the most efficient, cost-effective enumeration method in heavily populated areas in which mail is delivered to city-style addresses in virtually all cases (there may be scattered non-city-style mailing addresses in use in these areas). In most instances, a census enumerator visits a residence once during Block Canvassing. A subsequent visit is sometimes necessary during Nonresponse Follow-up.

The mailing list used for this operation is derived initially from automated address files (the USPS Delivery Sequence File and the 1990 Census Address Control File), and updated through various operations, including Address List Review (LUCA 1998), ongoing DSF updates, Block Canvassing, the Postal Validation Check, and the New Construction Program.

TEA 2 - Address Listing and Update/Leave

  • Contains areas with some number of non-city-style (e.g., P.O. Box or Rural Route) mailing addresses
  • Census address list is created from Address Listing, and updated from Address List Review (LUCA) 1999 Recanvassing (in selected areas) and Update/Leave
  • Blocks are NOT included in Block Canvassing, the Postal Validation Check, or the New Construction Program
  • Puerto Rico, including its military bases, is completely in TEA 2

Address Listing and Update/Leave are implemented in areas where mail often is delivered to non-city-style addresses. In these areas, it is difficult to obtain an up-to-date mailing address list and then geocode each address (that is, assign it to a collection block code), because of the constantly changing relationship between residential location and mailing address (especially for P.O. Box addresses). The census address list therefore is compiled through a door-to-door independent listing operation (Address Listing) that is implemented in all TEA 2 blocks.

During Address Listing, enumerators knock on each residence door to obtain the occupant's name, phone number, residential address (or location description), and mailing address. (Enumerators do NOT revisit residences whose occupants are not present. This is why the census address list frequently does NOT contain a mailing address, and why the location description is the ONLY address in the census address list for many residences.) Enumerators identify the location of each building (containing living quarters) they encounter with a uniquely numbered map spot that they enter on their map and record in their address register; this number is linked to all residential units in the building, and stored in both the census address list and the TIGER database. These areas will be included in Address List Review (LUCA) 1999.

At census time, enumerators deliver census questionnaires to all housing units that were compiled during Address Listing and that remain in TEA 2. In the course of delivering these questionnaires, the enumerators also update the census address list and the map-spotted map to reflect housing units that were not previously listed, and to eliminate residences that they cannot locate. (This operation is called Update/Leave, because the enumerators UPDATE the census address list and maps and LEAVE questionnaires.) Update/Leave enumerators use the residential address/location description in conjunction with the map spot location to determine the correct delivery point for all questionnaires.

Most housing units in TEA 2 areas are visited at least twice by enumerators: once during Address Listing, and again during Update/Leave. Respondents must mail their completed census questionnaires to the Census Bureau, and so some residences also will be visited a third time, during Nonresponse Follow-up.

¹ This documentation is reproduced from the Geography Division, U.S. Census Bureau, Web site located at http://www.geo.census.gov/mob/homep/teas.html.
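The rule that every collection block carries exactly one TEA code is easy to check mechanically. The Python sketch below assumes a hypothetical input format, a set of all block IDs plus a list of `(block_id, tea_code)` pairs; the function name, block IDs, and data layout are illustrative and not part of any Census Bureau system.

```python
from collections import defaultdict
from typing import Iterable


def check_tea_codes(
    all_blocks: set[str], assignments: Iterable[tuple[str, int]]
) -> tuple[list[str], list[str]]:
    """Return (blocks with no TEA code, blocks with more than one TEA code).

    all_blocks: every collection block that must be coded (hypothetical IDs).
    assignments: (block_id, tea_code) pairs, possibly with repeats.
    """
    codes: dict[str, set[int]] = defaultdict(set)
    for block, tea in assignments:
        codes[block].add(tea)
    # .get() avoids inserting empty entries into the defaultdict.
    missing = sorted(b for b in all_blocks if not codes.get(b))
    conflicting = sorted(b for b, teas in codes.items() if len(teas) > 1)
    return missing, conflicting


blocks = {"1001", "1002", "1003"}
pairs = [("1001", 1), ("1002", 2), ("1002", 6), ("1001", 1)]
print(check_tea_codes(blocks, pairs))  # (['1003'], ['1002'])
```

A repeated pair with the same code (block 1001 above) is not a violation; only a block with two different codes, or none at all, is reported.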


TEA 3 - List/Enumerate

  • Contains areas that are remote, sparsely populated, or not easily accessible
  • Census address list is created and enumeration conducted concurrently
  • Blocks are not included in Block Canvassing, the Postal Validation Check, the New Construction Program, or Address Listing
  • Includes all military bases in TEA 3 areas
  • All island areas (except Puerto Rico), including their military bases, are TEA 3

Some areas are remote, sparsely populated, and/or not easily visited. Many of the residences in these areas do not have city-style mail delivery. It is inefficient and expensive to implement Address Listing, Update/Leave, and Nonresponse Follow-up operations involving multiple visits. Instead, the creation of the address list and the delivery/completion of the census questionnaire are accomplished during a single operation, List/Enumerate. Enumerators visit residences in TEA 3 blocks, LIST them for inclusion in the census address list, mark their location on their map with a map spot and number, enter that map spot number in their address register, and ENUMERATE the residents on-site. They collect the same address information as in Address Listing, and include a map spot to reflect each building that contains one or more living quarters. These areas will NOT be included in any Address List Review (LUCA) program, because there is no address list for them in advance of the census.

TEA 4 - Remote Alaska

  • Similar to List/Enumerate, but conducted earlier, before ice breakup/snow melt
  • These areas will NOT be included in any Address List Review (LUCA) program, because there is no address list for them in advance of the census

TEA 5 - Rural Update/Enumerate

  • Contains blocks initially in TEA 2, with map spots for all structures containing at least one housing unit
  • In some instances, blocks initially in TEA 3 will be converted to TEA 5. These blocks were not included in Address Listing and LUCA 1999, and therefore lack structures and map spots in the MAF and TIGER at the times that LUCA 1999 and Rural Update/Enumerate are conducted
  • Self-enumeration (through Update/Leave) is thought to be unlikely or problematic
  • Census address list is updated, and enumeration is conducted, concurrently
  • Blocks are NOT included in the Postal Validation Check or the New Construction Program
  • The term rural reflects Address Listing as the initial source of the census address list, and does NOT reflect the official census definition of the term rural
  • These areas will be included in Address List Review (LUCA) 1999 materials, as the MAF was compiled initially from Address Listing

In some areas that otherwise meet the criteria for inclusion in TEA 2, the Census Bureau has decided that having respondents enumerate themselves and return their questionnaires via the mail is not the best way to conduct the enumeration. Some targeted populations may be less likely to return their questionnaires in the mail, and more likely to respond to an enumerator. In other areas, housing units may be vacant because they are occupied seasonally.

In these and comparable situations, enumerators visit all residences on the census address list and complete the enumeration on-site. In the course of delivering these questionnaires, they also update the census address list to 1) reflect housing units that were not previously listed (including a map spot to reflect each building that contains one or more living quarters), and 2) eliminate housing units that they cannot locate. (This operation is called Rural Update/Enumerate, because the enumerators work in areas that were Address Listed, UPDATE the census address list [and assign map spots as well], and ENUMERATE the residents.)

TEA 6 - Military

  • Contains blocks within TEA 2 that are on military bases
  • Mailout/Mailback for family housing
  • Separate enumeration procedures for barracks, hospitals, etc.
  • Blocks are included in both Block Canvassing and the Postal Validation Check
  • These blocks are included in Address List Review (LUCA) 1998 materials, as the MAF was compiled initially in the same manner as TEA 1 areas

The Department of Defense has advised the Census Bureau that virtually all family housing (that is, individual residences as opposed to barracks, hospitals, and jails) is assigned city-style addresses to which the Postal Service delivers mail. The Census Bureau therefore implements Mailout/Mailback methods to enumerate the population of these individual residences. Within TEA 1 areas, blocks on military bases are assigned a TEA code of 1. Within TEA 2 areas, blocks on military bases are assigned a TEA code of 6. There is no difference between TEA 1 blocks on military bases and TEA 6 blocks in terms of either compiling the census address list or enumerating the population. Blocks within military bases in List/Enumerate areas (TEA 3) also are TEA 3.

TEA 7 - Urban Update/Leave

  • Contains blocks initially in TEA 1
  • Census address list is updated, and questionnaires are delivered concurrently, by Census Bureau staff (following procedures employed in TEA 2 areas, but without assigning map spots)
  • Blocks ARE included in the Postal Validation Check and the New Construction Program
  • The term urban reflects the predominance of city-style addresses, and does NOT reflect the official census definition of the term urban
  • These blocks are included in Address List Review (LUCA) 1998 materials, as the MAF was compiled initially in the same manner as TEA 1 areas

In many areas where mail is delivered mostly to city-style addresses, older apartment buildings are common. In many of these buildings, unit designators (that is, apartment numbers) often do not exist. Further, the subdivision of existing units into multiple units, and the conversion of non-residential space to living quarters, may be frequent. Mail, therefore, often is not delivered to individual apartments (or individual mail boxes), but instead left at common drop points.

In some other areas with mostly city-style addresses, many residents have elected to receive their mail at post office boxes. The Census Bureau is concerned that the city-style addresses of these residents may not appear in the census address list.

To ensure questionnaire delivery to the largest number of residences, Update/Leave procedures are employed. As these residences have city-style addresses, there is no need for enumerators to assign map spots to help identify these residences in subsequent operations.

TEA 8 - Urban Update/Enumerate

  • Contains blocks initially in TEA 1, without map spots for any addresses; maps generated for TEA 8 areas will not include map spots
  • Contains mostly blocks on those American Indian reservations that initially were included in both TEA 1 and either TEA 2 or 3
  • Same enumeration procedures as TEA 5
  • The term urban reflects the initial inclusion of the block in TEA 1 due to the predominance of city-style mailing addresses
  • These areas are included in Block Canvassing and the Postal Validation Check

Most American Indian reservations will be enumerated using a single enumeration procedure (Mailout/Mailback, Update/Leave, or Update/Enumerate). Some of these initially contained blocks with a mixture of TEA codes. In these instances, the reservations will be enumerated using Update/Enumerate methods (see TEA 5). However, for affected blocks initially in TEA 1, the MAF and TIGER do not include map spots for structures containing at least one housing unit. Instead of converting these blocks to TEA 5 (Rural Update/Enumerate) and determining map spot locations, the blocks are being distinguished by a separate TEA.

TEA 9 - Additions to Address Listing Universe of Blocks

  • Contains groups of blocks (assignment areas) initially assigned to TEA 1
  • Converted to Address Listing before Block Canvassing is conducted
  • Blocks are NOT included in Block Canvassing, the Postal Validation Check, or the New Construction Program

Some blocks that are in TEA 1 contain a significant number of living quarters with non-city-style addresses. These blocks should not be included in Block Canvassing, which is an operation designed to confirm and correct the existence and/or location of city-style addresses. The Geography and Field Divisions are identifying Block Canvassing assignment areas (AAs) that likely contain blocks with significant numbers of non-city-style addresses. Some of these AAs will be removed from Block Canvassing and included in Address Listing. The blocks in these AAs will be assigned a TEA code of 9, and the census address list compilation and census enumeration activities in TEA 9 blocks will be virtually identical to those in TEA 2 blocks (for instance, they will be included in Update/Leave and Nonresponse Follow-up).

Because most of these blocks had few, if any, addresses in the MAF from the USPS, the entities the blocks are in mostly had nothing to review during Address List Review (LUCA) 1998. For this reason, most of these blocks will have their address lists reviewed during a new phase of LUCA, often called LUCA 99 1/2.
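The TEA 6 narrative, together with the Puerto Rico bullet under TEA 2 and the island-area bullet under TEA 3, determines the TEA code of a block on a military base from the TEA code of the surrounding area. The Python sketch below encodes that rule; the function name, its arguments, and the choice to raise an error for surrounding codes the text does not address are assumptions made for illustration.

```python
def military_block_tea(surrounding_tea: int, in_puerto_rico: bool = False) -> int:
    """TEA code for a collection block on a military base (illustrative sketch).

    surrounding_tea: TEA code of the non-military area around the base.
    in_puerto_rico: Puerto Rico, including its military bases, is all TEA 2.
    """
    if in_puerto_rico:
        return 2
    rules = {
        1: 1,  # Mailout/Mailback areas: base blocks keep TEA code 1
        2: 6,  # Address Listing areas: base blocks get the Military TEA
        3: 3,  # List/Enumerate areas, including island areas: stay TEA 3
    }
    if surrounding_tea not in rules:
        raise ValueError(f"no military-base rule stated for TEA {surrounding_tea}")
    return rules[surrounding_tea]


for tea in (1, 2, 3):
    print(tea, "->", military_block_tea(tea))
print("Puerto Rico ->", military_block_tea(2, in_puerto_rico=True))
```

Treating Puerto Rico as an override before the general lookup mirrors the text, where Puerto Rico's bases are placed in TEA 2 even though TEA 6 otherwise covers military blocks within TEA 2 areas.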


Chapter 4.

A.C.E. Field and Processing Activities

INTRODUCTION

This chapter describes the operational aspects of the A.C.E., which consisted of four major activities: housing unit listing, housing unit matching, person interviewing, and person matching. Housing unit listing and person interviews were conducted as field activities, whereas housing unit matching and person matching were processing activities carried out in the National Processing Center (NPC) in Jeffersonville, Indiana. As described earlier, all of these activities were completed prior to estimation.

Once the sample clusters were selected, interviewers visited the clusters and independently listed all housing units. The A.C.E. and census housing units were then matched and, for those for which a match was not found, a follow-up interview was conducted to determine the status of the housing unit at the time of the census.

Following the resolution of the housing unit nonmatches, interviews were conducted with residents of the A.C.E. sample households (P sample) to obtain the roster of household residents and the detail required for matching. The P-sample persons were then matched to the list of persons enumerated in the census in the sample clusters. The search area was expanded to include one ring of surrounding blocks for those clusters identified as containing potential census geocoding errors. This operation was called the targeted extended search (TES) because it targeted clusters with high rates of A.C.E. housing unit nonmatches and census housing unit geocoding error. A further follow-up interview was conducted for selected mismatched people for whom additional information was required. Based on these activities, each person in the sample clusters, whether interviewed in the A.C.E. sample (P sample) or found in the census (E sample), was assigned a final match status code.

It is important to point out some key improvements of the A.C.E. 2000 operations over the 1990 Post-Enumeration Survey (PES). The 2000 A.C.E. improved on the 1990 PES in several ways for interviewing and clerical matching.

  • One problem in 1990 was the misreporting of Census Day addresses, with an estimated 0.7 percent of the P sample being erroneously reported as nonmovers (West 1991). The Computer Assisted Personal Interview (CAPI) instrument improved the quality of the reporting of mover status because it was a more automated process. In 2000, the Census Day household consisted of nonmovers and outmovers. The nonmovers lived in the housing unit at the time of the interview and on Census Day. The outmovers lived in the housing unit on Census Day, but moved before the A.C.E. interview. Nonmovers and outmovers in the P sample were matched to census people in their block cluster. In 1990, each inmover household (those that moved into PES block clusters after Census Day) had to be matched to a Census Day address, which was usually outside the cluster. In 2000, the reconstructed Census Day household was matched to the census enumerations in the sample block cluster.
  • A study of clerical error in the 1990 PES found error in coding matches (Davis 1991) and erroneous enumerations (Davis 1991b). In 1990, codes were entered into a computer system, but the actual matching and duplicate searches were done on paper. In the 2000 A.C.E., the matching was better controlled and more efficient than in 1990 because the clerical matching and quality assurance were automated and coded directly into the automated system. The automated interactive system did not prevent all matching error, but it significantly reduced the chances for error. Software allowed searching for matches in the census based on first names, last names, characteristics, and addresses. For example, the system allowed searching for all people named George, all people whose last name begins with an H, all people on Elm Street, or everyone in the age 30 to 40 range. The software controlled the match codes that were relevant to the situation. For example, only P-sample nonmatch codes could be assigned to a P-sample nonmatch.
  • The electronic searches for duplicates reduced the tedious searching through paper lists of census people. The searching in 1990 was limited to printouts in two sorts: last name and household by address. In 2000, the clerks had the capability to filter on name, characteristics, and address to help identify duplicates. The system monitored whether the matcher had completed all the necessary searches, such as looking for duplicates.
  • There were built-in edits to ensure consistency of coding. For example, codes that applied to a household, such as geographic codes, were assigned to all people in the household. The system automatically assigned certain codes, reducing coding error.
  • Clerical matchers could use a code indicating the case needed review at the next level of matching. This code allowed them to flag unusual cases to be examined by a person with more experience.


  • All quality assurance for the clerical matching was automated.
  • Clerical matching was centralized at the NPC instead of having separate groups of matchers in the seven census processing offices, as was done in 1990. Forty-six technicians were hired in August 1999 and were thoroughly trained in the design of the A.C.E. and methods of matching people and housing units. These technicians were responsible for quality assurance of the clerical matchers. Additionally, ten analysts who were among the most experienced matchers conducted quality assurance for the technicians and handled the most difficult cases.
  • The computer matcher identified matches and possible matches within a block cluster. After the clerical matching that preceded follow-up, additional computer programs were used to check the matching and to identify matches and duplicates in the expanded search area that the clerical matchers had not identified. Consistency checks were also performed between housing unit and person match codes.
  • Keying error, a problem in the data capture of the 1990 PES, was reduced because the 2000 interview used a CAPI instrument. A more accurate capture of the data increased the efficiency of the computer matching.

HOUSING UNIT LISTING

The first stage of sampling was the selection of A.C.E. block clusters. Then, from September through December 1999, a listing of the addresses of all the housing units in the A.C.E. sample clusters was conducted. The listing was independent of the census. Training in how to list both city-style and non-city-style areas lasted 3 days and included a review of the first completed cluster assigned to each lister. There were 29,136 sampled block clusters in the 50 states and the District of Columbia. This list of housing units recorded in the Independent Listing Books (ILB) became the frame of A.C.E. housing units from which the P sample was later selected. Besides listing each housing unit in the cluster, the listers inquired about housing units present at each special place and commercial structure.

The housing unit listing was by basic street address. Each basic street address was assigned a map spot number, and the map spot number was recorded on the A.C.E. map to identify the location of the basic street address. The address and coverage questions about the structure were asked for each basic street address. The number of housing units at the basic street address was obtained from a household member at the address, by proxy, from the apartment manager, or by observation. This contact

units within a basic street address were listed on the pages of the listing book reserved for multiunits. Also, the A.C.E. lister recorded the number of units within a basic street address on the map in parentheses to conform with census methodology.

Mobile homes that were not in mobile home parks were listed like single units. Each mobile home was assigned a unique map spot number and was listed on a separate line in the listing book. If the mobile homes were in a park, the park was listed in the housing unit section of the listing book, and each individual mobile home and vacant site was listed in the mobile home park section of the listing book. Each individual mobile home was assigned a unique map spot number, whether or not the mobile home was in a park. The location of the mobile home was identified by placing its map spot number on the map. This was the same procedure that was used in the census.

The following items were collected and recorded in the listing book for each basic street address:

  • City-style addresses (house number and street names)
  • Non-city-style addresses (route numbers, route and box numbers, or any other type of address that was not a city-style address)
  • Householder names (rural areas only)
  • Description of addresses (only for non-house-number addresses in both urban and rural areas)
  • Number of housing units in a basic street address
  • Type of basic street address (single unit, multiunit, mobile home not in a mobile home park, mobile home in a mobile home park, housing unit in special place, multiunit in a special place, or other)
  • Unit status for single units (occupied or intended for occupancy, under construction, future construction, unfit for habitation, boarded up, storage of household goods, and other)

The following items were also collected and recorded in the listing book for each unit within a multiunit basic street address:

  • Unit designation
  • Unit status for multiunits (occupied or intended for occupancy, under construction, future construction, unfit for habitation, boarded up, storage of household goods, and other)

The following items were also collected and recorded in the listing book for each mobile home in a mobile home park:

helped to improve the coverage of housing units in the

  • House number, lot number, or physical description A.C.E. A page in the listing book for single and multiunit structures is shown in Figure 4-1. The individual housing
  • Street name 4-2 Section IChapter 4 A.C.E. Field and Processing Activities U.S. Census Bureau, Census 2000
  • Rural address those in the E sample) eliminating the error prone cleri-cal E-sample identification required to achieve the over-
  • Unit status (intended for occupancy, unfit for habitation, lapping samples in the 1990 Post Enumeration Survey.

boarded up, storage of household goods, vacant trailer site in a mobile home park, and other)

  • The linking of addresses also allowed the person inter-viewing to begin earlier on the telephone using the cen-After the listing books were received in NPC, they were sus telephone number for the census questionnaire checked in and the data keyed into a computer file. The returned by mail. The telephone number from the cen-keying quality assurance was 100 percent. Keying rejects sus questionnaire was not available without the link were reviewed clerically to correct errors before the between the A.C.E. housing unit and the census ques-matching began. A data file of the A.C.E. housing units tionnaire for that housing unit.

was created to be used as input to the housing unit After sample reduction of clusters, there were 11,303 clus-matching.

ters in the 50 states and the District of Columbia. See Chapter 3 for a discussion on the sample reduction. The HOUSING UNIT MATCH 420 clusters in list/enumerate areas were not matched, because their census addresses were not available in the The housing unit matching consisted of four steps: com-Spring of 2000. Therefore, 10,883 clusters were matched puter matching, clerical matching, housing unit follow-up, in the housing unit phase. Table 1 contains the number of and after follow-up coding. The A.C.E. housing units were housing units and clusters in housing unit matching for compared to the census housing units within cluster by the A.C.E. The census numbers were preliminary; these computer, and then, clerically. Housing units that did not were the addresses prior to mailing the census question-match, possible matches, and possible duplicates were naires. Subsequent census operations added and removed followed up by field inspection and interview. The results addresses from this list. Even though this census list con-of the follow-up interview were recorded during the after tains more housing units than the A.C.E., this was not follow-up coding.

indicative of coverage differences due to the preliminary The purpose of housing unit matching was to create a list nature of the census numbers. See Chapter 3 for a discus-of addresses that existed as housing units in the block sion of the final P-sample and E-sample housing units and cluster on Census Day to use in the P-sample interviewing. how they compare.

The housing unit listing was conducted in the Fall of 1999. Addresses that had a chance to be housing units on Table 4-1. Sample Sizes for the A.C.E. Housing Census Day, such as under construction, future construc- Unit Matching tion, and vacant trailer sites, were listed. After the housing Housing unit matching and follow-up, only the housing units origi- Clusters units nally listed and confirmed to exist as housing units were Clusters with housing units . . . . . . . . . . . . . 10,157 included in the P sample for CAPI interviewing. Housing A.C.E. housing units. . . . . . . . . . . . . . . . . . 838,427 units with unresolved status were also included in the Census housing units. . . . . . . . . . . . . . . . . 859,296 interviewing. Clusters without housing units . . . . . . . . . . . 726 Total clusters in housing unit matching. . . . 10,883 Computer matching was conducted after the second phase of sampling, which consisted of sample reduction Computer Match and small block subsampling. The results of the computer The census housing units included on the DMAF in matching were reviewed clerically. All matching was con- January, 2000, in the block clusters retained in the A.C.E.

ducted within the sample block clusters. The census after sample reduction and small block subsampling, were addresses were the ones contained in the January, 2000 used in the housing unit matching. The housing unit data version of the Decennial Master Address File (DMAF). This from the independent listing book file and the DMAF was not the final version of the inventory of census extract went through a series of data preparation steps, addresses, because of later operations. The inventory of including address standardization. Addresses from either census housing units was final after the Hundred Percent file that were blank or could not be standardized were Census Unedited File (HCUF) was completed. matched clerically. The results of the computer matching and images of the A.C.E. and census maps with map spots As noted earlier, the P and E samples were located in the in rural areas were inputs into an automated review and same block clusters. The advantages of linking the A.C.E.

coding software for clerical matching.

and census housing units were:

Clerical Match

  • The link of A.C.E. and census addresses allowed an overlapping P sample and E sample, (i.e., the housing The clerical matchers used the results of the computer units selected for the P sample were mostly the same as matching to aid in their matching of addresses from the A.C.E. Field and Processing Activities Section IChapter 4 4-3 U.S. Census Bureau, Census 2000

A.C.E. and the census. There were 115 clerks, 46 technicians, and ten analysts involved in the matching operation. The clerks carried out the matching. The technicians applied quality control to the matching performed by the clerks, and the analysts carried out quality control on the work of the technicians. The clerks and technicians used a review code when they saw something unusual or something that should be looked at by the next level of matcher. The technicians and analysts examined the cases coded for review in the previous stage of matching, in addition to cases selected for quality control. The clerks used in the housing unit matching were given 4 weeks of training. The technicians were hired in August, 1999 and given extensive training on the background of coverage measurement and the design of the A.C.E., allowing them to make more informed decisions. The analysts were our most experienced people; they had worked on coverage measurement for many years and were quite knowledgeable about the A.C.E. The three levels of staff produced high-quality matching in a cost-efficient operation.

The clerical matching was conducted in the housing unit matching phase of the A.C.E. only for clusters expected to benefit from further examination. Since clerical matching was labor intensive, the amount of clerical work performed for the 2000 A.C.E. was reduced by automatically identifying clusters to be sent to follow-up interviewing without clerical review. These clusters had only a few nonmatches, or nonmatches on only one side. For example, there could be 25 A.C.E. nonmatches and no census nonmatches, so there was nothing the clerical matchers could do. The clerical matchers were thus able to concentrate on the more difficult clusters where the review was beneficial. In 2000, 3,267 clusters were sent to the field for the follow-up phase without clerical review.

Supplemental materials were provided to facilitate the clerical matching, such as maps with spots identifying the location of A.C.E. and census addresses in rural areas. The A.C.E. and census addresses that could not be matched by the computer were identified for clerical matching. The computer-matched addresses were not targeted for review, because experience in studies preparatory to the 2000 Census indicated a very high quality of the matches assigned by the computer. However, clerks were allowed to correct any errors in the computer matching that they noticed while attempting to match the housing units that were not computer matched.

The clerical matchers used all available housing unit information to match housing units. The urban areas had almost entirely city-style addresses. In rural areas, the addresses were more difficult to match, mainly because of the non-city-style addresses. The matchers had householder names and location descriptions to help in matching the A.C.E. and census addresses in rural areas. The spotted maps for the A.C.E. and the census were also used in the final determination of which housing units matched in rural areas. Computer images of the A.C.E. and census spotted maps used in the housing unit matching were accessed via the matching software and viewed on the screen.

There was also a clerical search, limited to the block cluster, for duplicate housing units during this phase of the matching. The possible duplicates were linked in the database for both the A.C.E. and the census. A follow-up interview was conducted to determine if the two addresses referred to the same housing unit.

One goal for the 2000 A.C.E. was not to use any paper in the clerical matching. Almost all materials needed for clerical matching were available on the computer. Paperless matching reduced the time needed for clerical matching, because the time spent waiting for an assignment and its associated material was eliminated. There was thus no need for a large staff to maintain an A.C.E. library. Paper maps were available for cases where the image of the map was not available or was not easy to view in the software.

The quality assurance was applied as follows. All of the work done by each clerical matcher was reviewed initially, until the matcher was determined to be performing at an acceptable level of quality. The number of records to be reviewed before a clerical matcher was classified as acceptable was 200; after that, an acceptable clerk had a systematic sample of clusters reviewed for quality assurance. There was a computer record of the level of quality of each clerk's work. If the work in the sample of reviewed clusters fell below the acceptable level of quality, all of the subsequent work of that clerk was reviewed by technicians until the clerk achieved an acceptable level of quality, and then sampling was resumed. The analysts performed the same type of quality assurance on the technicians.

Table 4-2 contains the results of the before follow-up clerical matching. These numbers include only the housing units in clusters that were processed in the housing unit matching; the list/enumerate clusters are therefore not included. The relisted clusters described at the end of this section are also not included in Table 4-2. The census had more possible duplicates and nonmatched housing units than the A.C.E. The follow-up interview resolved the housing unit status and determined if the possible duplicates were in fact duplicates.
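The automated identification of clusters that went to field follow-up without clerical review, described earlier in this section, amounts to a simple screening rule. The Python sketch below is a minimal illustration under stated assumptions: clerical matching resolves a nonmatch by pairing it with a nonmatch on the other side, so a cluster with nonmatches on only one side offers nothing to pair. The cutoff `FEW_NONMATCHES` for "only a few" nonmatches is a hypothetical value, and the function name is illustrative; the report does not give the actual criteria beyond that description.

```python
# Hypothetical cutoff for "only a few" nonmatches; the report gives no number.
FEW_NONMATCHES = 2


def skip_clerical_review(ace_nonmatches: int, census_nonmatches: int) -> bool:
    """Return True if a cluster can go straight to field follow-up.

    Clerical matching resolves a nonmatch by pairing it with a nonmatch on
    the other side, so a one-sided cluster leaves the clerks nothing to pair.
    """
    one_sided = ace_nonmatches == 0 or census_nonmatches == 0
    few = ace_nonmatches + census_nonmatches <= FEW_NONMATCHES
    return one_sided or few


# The example from the text: 25 A.C.E. nonmatches and no census nonmatches.
print(skip_clerical_review(25, 0))   # True
print(skip_clerical_review(6, 4))    # False: both sides have nonmatches to compare
```

A rule of this shape only routes work; the field follow-up still resolves every nonmatch in the skipped clusters.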

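The quality assurance rule for clerical matchers described above behaves like a small state machine: 100 percent review until a clerk qualifies, a systematic sample of clusters afterward, and a return to 100 percent review when sampled work falls below the acceptable level. The following Python sketch illustrates that logic. Only the 200-record qualification count comes from the text; the acceptable error rate (`MAX_ERROR_RATE`), the 1-in-10 sampling interval (`SAMPLE_INTERVAL`), the assumption that requalification uses the same 200-record rule, and the class and method names are all hypothetical.

```python
from dataclasses import dataclass

QUALIFY_RECORDS = 200   # from the text: records reviewed before a clerk is "acceptable"
MAX_ERROR_RATE = 0.02   # hypothetical: the report does not state the threshold
SAMPLE_INTERVAL = 10    # hypothetical: systematic 1-in-10 sample of clusters


@dataclass
class ClerkQA:
    """Quality-assurance state for one clerical matcher (illustrative sketch)."""
    full_review: bool = True   # every cluster is checked until the clerk qualifies
    records: int = 0           # records reviewed in the current full-review spell
    errors: int = 0            # matching errors found in that spell
    clusters_seen: int = 0     # clusters completed while in sampling mode

    def select_for_review(self) -> bool:
        """Decide whether the clerk's next completed cluster is reviewed."""
        if self.full_review:
            return True
        self.clusters_seen += 1
        return self.clusters_seen % SAMPLE_INTERVAL == 0

    def record_review(self, records: int, errors: int) -> None:
        """Update the state after a technician reviews one cluster."""
        if self.full_review:
            self.records += records
            self.errors += errors
            if (self.records >= QUALIFY_RECORDS
                    and self.errors / self.records <= MAX_ERROR_RATE):
                # Clerk qualifies: switch to systematic sampling of clusters.
                self.full_review = False
                self.clusters_seen = 0
        elif records and errors / records > MAX_ERROR_RATE:
            # Sampled work fell below standard: back to 100 percent review.
            self.full_review = True
            self.records = 0
            self.errors = 0


qa = ClerkQA()
qa.record_review(records=210, errors=2)   # about 1 percent errors over 210 records
print(qa.full_review)                     # False: the clerk moves to sampling
```

Keeping the counters per clerk mirrors the computer record of each clerk's quality level mentioned in the text; the same class could track technicians, whose work the analysts reviewed in the same way.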

Table 4-2. Housing Unit Matching Results Before Follow-Up Interviewing

                               A.C.E.                     Census
                      Housing units   Percent    Housing units   Percent
Matched                     681,385      81.6          681,385      79.7
Possible match               29,231       3.5           29,231       3.4
Possible duplicate              735       0.1            5,775       0.7
Not matched                 123,469      14.8          138,657      16.2
Remove from A.C.E.               10       0.0
Total                       834,830     100.0          855,048     100.0

Housing Unit Follow-Up

All of the cases coded as not matched, possibly matched, or possibly duplicated were sent for a follow-up interview, regardless of the type of basic street address code. Selected matched cases were also sent to follow-up to collect additional information. Specifically, the cases identified for field follow-up were:

  • A.C.E. addresses with a before follow-up code of not matched. Information was obtained to determine whether the addresses were housing units within the sample cluster.
  • Census addresses with a before follow-up code of not matched. Information was obtained to determine whether the addresses were housing units within the sample cluster.
  • Possible matches. The possible matches were sent to the field to determine if the A.C.E. and census addresses referred to the same housing unit. If they did not, they were identified as an A.C.E. nonmatch and a census nonmatch during the housing unit follow-up, and information was obtained to determine whether the addresses were housing units within the sample cluster.
  • Possible census duplicates. Census housing units that were identified as possible duplicates were followed up to determine if the two census addresses referred to the same housing unit.
  • Possible A.C.E. duplicates. A.C.E. housing units that were identified as possible duplicates were followed up to determine if the two A.C.E. addresses referred to the same housing unit.
  • Matched housing units with a code of under construction, future construction, unfit for habitation, vacant trailer site in a mobile home park, or other. These matches were followed up to determine if they fit the definition of a housing unit at the time of the follow-up interview.

An A.C.E. housing unit with a unit status indicating something other than an occupied or vacant housing unit intended for occupancy needed a follow-up interview to determine its status at the time of the follow-up interview. The address was either classified as a housing unit or removed from further processing. For example, a unit that was under construction or future construction at the time of listing may have fit the definition of a housing unit at the time of the follow-up interview. If the unit fit the definition of a housing unit, it was included in the A.C.E. housing unit processing. If construction had not progressed enough for it to fit the definition of a housing unit, it was coded as removed from the A.C.E. housing unit inventory.

The housing unit follow-up forms were computer generated. The questions for housing units requiring a follow-up interview were printed. In addition, all housing units in the block cluster were printed for reference. The questions for the A.C.E. nonmatches are in Figure 4-2. The same questions were asked for the census nonmatches.

The questions on the follow-up form were not designed to be read to respondents, but were intended to be used as a guide for an interviewer. Indeed, many questions were answered by observation, and the answer to one question may have been the result of asking several other questions. The follow-up interviewer modified the questions, when necessary, to fit the situation encountered in the field and recorded the appropriate answers on the follow-up form. This approach was adopted because many situations could occur, and a form covering every possible situation would have been cumbersome to handle. It was necessary to find out if the housing unit satisfied the census housing unit definition at the time of the follow-up interview. There was no attempt to gather information about why an address was something other than a housing unit.

For example, the follow-up interviewer determined if the address for an A.C.E. independent listing nonmatch or a census nonmatch existed as a housing unit. This was not a question meant for a respondent. There were several reasons why an address might not fit the definition of a housing unit, such as: it burned, it was a mobile home that moved, it was converted to fewer housing units, it was group quarters, it was used for storage of farm machinery, it was the laundry room in an apartment complex, it was a
business, and so forth. The interviewer modified the questions, as necessary, to fit the situation encountered in the field. Furthermore, the interviewer could identify matches or duplicates in the field that had not been identified in the clerical matching. Additional matches between the A.C.E. and census addresses were also identified during the follow-up interview, when the interviewer realized that two different addresses in the A.C.E. and census referred to the same unit. Corrections and updates to the addresses were also recorded on the follow-up form. The address updates were keyed into the database to accurately identify A.C.E. housing units for the person interviewing. The follow-up interviewers were instructed not to add housing units missed by both the A.C.E. and the census for the 2000 A.C.E.

After Follow-Up Coding

After the field follow-up, the completed forms were returned to the processing office. Using the information obtained during the field work, the clerical matchers assigned an after follow-up match code to the cases sent to the field. The technicians and analysts reviewed the clusters containing housing units with a review code and carried out quality assurance for the clusters processed in the after follow-up housing unit matching.

The follow-up forms were reviewed clerically, and codes were assigned to the A.C.E. and census housing units. Table 4-3 provides housing unit matching results for all A.C.E. and census housing units after the follow-up interview codes were assigned. A.C.E. housing units classified as existing in the block cluster and housing units with unresolved housing unit status were eligible for person interviewing; this included both matched and not matched units. A.C.E. addresses classified as not housing units, duplicates, and geocoding errors were removed from the A.C.E. universe and therefore were not eligible for person interviewing. The numbers in Table 4-4 are the A.C.E. housing units that were eligible for person interviewing before sample reduction. These numbers do not include the relisted clusters or the clusters in list/enumerate areas. Census housing units with codes of not matched and unresolved were not eligible to be included in the P sample for interviewing, because they were not listed in the A.C.E. independent listing.

Table 4-3. After Follow-Up Housing Unit Matching Results

                                                      A.C.E.                     Census
                                             Housing units   Percent    Housing units   Percent
Matched                                            719,013      86.1          719,013      84.1
Not matched, but existed in the block cluster       76,418       9.2           28,874       3.4
Did not exist as a housing unit                     30,770       3.7           48,684       5.7
Geocoded outside the cluster                         6,316       0.8           45,053       5.3
Duplicate                                            1,157       0.1           12,296       1.4
Unresolved                                           1,156       0.1            1,128       0.1
Total                                              834,830     100.0          855,048     100.0

Table 4-4. A.C.E. Housing Units Eligible for Person Interviewing

                                                      A.C.E.
                                             Housing units   Percent
Matched                                            719,013      90.3
Not matched, but existed in the block cluster       76,418       9.6
Unresolved                                           1,156       0.1
Total                                              796,587     100.0

Relisting for Clusters with A.C.E. Geocoding Errors

The follow-up operation also examined potential geocoding errors in the original A.C.E. housing unit listings. If a large proportion of the A.C.E. housing units in the cluster had wrong geocodes, the cluster was relisted. Clusters were identified for relisting when the after follow-up coding described in the previous section was completed. The decision to relist was automated: if 80 percent of the housing units in a cluster had a geocoding error, the cluster was relisted. There were 62 relisted clusters in the 50 states and the District of Columbia. The field lister for a relisted cluster had no previous contact with that cluster.

The relisting operation was carried out independently of the list of census housing units. To assure independence, the A.C.E. housing unit listings (both the original listing and the relisting) were done without the A.C.E. lister seeing the census inventory of housing units.

There was no housing unit matching in the relisted clusters during the housing unit matching phase of the A.C.E. The addresses listed for the A.C.E. during the relisting operation were the addresses used to conduct person interviewing. These clusters were treated in the same way as the list/enumerate clusters in 2000.

An unresolved code was assigned to all of the A.C.E. housing units in the relisted clusters and in the list/enumerate clusters. The census housing units in these clusters were assigned a blank housing unit code.

PERSON INTERVIEW

Prior to person interviewing there was another stage of sampling, the within block subsampling of large block
clusters. See Chapter 3 for more details. The resulting housing units from the A.C.E. comprised the P-sample housing units assigned for interviewing. There were 11,303 clusters selected for interviewing, and they contained 300,913 P-sample housing units. The person interview training lasted five days.

The A.C.E. mover and residence status codes necessary to identify P-sample people from the person interview were assigned within the interview instrument. These codes are described in Figure 4-3. The goal of the interview was to obtain a household roster for everyone living at the housing unit at the time of the interview and on Census Day, April 1, 2000. Procedure C was used for the 2000 A.C.E. With Procedure C, each A.C.E. person was assigned an A.C.E. mover code, an A.C.E. born since Census Day code, an A.C.E. group quarters code, and an A.C.E. other residence code. The A.C.E. status code combined all of the information from these codes to identify the people for whom matching was necessary. Attachment 1 contains the definitions of the codes for movers, those born since Census Day, members of group quarters, and other residence, and of the A.C.E. status code. See the Chapter 7 attachment for more on Procedure C.

Group quarters were not listed in the A.C.E., and A.C.E. interviews were not conducted in group quarters. See … for a discussion of the treatment of group quarters in A.C.E.

Mode of Interview

The A.C.E. person interview was conducted using a CAPI instrument on laptop computers. Attachment 3 contains a description of the procedures followed in the person interview. Some person interviews were conducted by telephone and some by personal visit.

To get an early start on the interviewing, a telephone interview was conducted at households where the census questionnaire included a telephone number and was received at a census processing office early enough for computer processing, before the start of person interviewing. The telephone number came from the census questionnaire of the matching census housing unit. The person interviews conducted by telephone were conducted from April 24, 2000 until June 13, 2000. See Byrne et al. (2001) for more details. A total of 88,573 interviews, or 29.4 percent of the total workload, were conducted by telephone.

The following cases were excluded from the A.C.E. telephone interviewing:

  • Housing units in census large household and census coverage edit follow-up
  • Questionnaires that were not returned by mail
  • Housing units without house number and street name addresses
  • Housing units in small multiunit structures (i.e., less than 20 units)

Large multiunit structures could be included in the telephone interviewing because they tended to have unique unit designations. Many small multiunit structures and rural areas did not have addresses that allowed the telephone interviewer to distinctly identify the address. Since there was no housing unit matching in relisted and list/enumerate clusters, all person interviewing in those clusters was by personal visit.

All remaining interviews after the end of the telephone operation were conducted in person, except for some nonresponse conversion operation (NRCO) interviews and interviews in gated communities or secured buildings. The person interviews conducted by personal visit were conducted from June 18 until September 11, 2000. Crew leaders and supervisors conducted telephone interviews to give them experience in interviewing.

Table 4-5 contains the number of interviewers, crew leaders, and supervisors used during production interviewing and during the interviewing for person follow-up after the clerical matching.

Table 4-5. Field Interview Personnel

                  Telephone     Personal      Person
                  interview     interview     follow-up
Interviewers           450         4,502         4,470
Crew leaders           794           836           712
Supervisors            189           186           184

For the first 3 weeks of interviewing, the person interview was conducted only with a household member. If an interview with a household member could not be carried out within 3 weeks, an interview with a knowledgeable nonhousehold member, called a proxy interview, was attempted. Proxy interviewing was allowed during the remainder of the interviewing period. During the last 2 weeks of interviewing for a cluster, a nonresponse conversion operation was conducted for the noninterviews using the best interviewers. This noninterview conversion attempted to obtain an interview with a household member or a knowledgeable proxy respondent, but not a last resort interview.¹ The nonresponse conversion operation converted 9,518 of the 9,735 total noninterviews to interviews.

¹ Last resort interviews were ones with minimal information, such as names like "White Female." The last resort interview is usually not from a knowledgeable proxy respondent. Last resort interviews were conducted in the census at the end of nonresponse follow-up, after all attempts to contact a knowledgeable respondent had failed to obtain an interview. Last resort interviews were not conducted for A.C.E.
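The exclusions from the telephone operation listed above can be read as a screening function applied to each P-sample housing unit. The Python sketch below is illustrative only: the `PSampleUnit` record and its field names are hypothetical, a unit is assumed to need a census telephone number from a mail return received early enough for processing, and single-unit structures are assumed not to fall under the small-multiunit exclusion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PSampleUnit:
    """Hypothetical record for one P-sample housing unit (field names are illustrative)."""
    census_phone: Optional[str]        # phone number from the matching census questionnaire
    mail_return_in_time: bool          # returned by mail early enough for processing
    in_large_hh_or_cefu: bool          # in census large household / coverage edit follow-up
    has_house_number_and_street: bool  # city-style address
    units_in_structure: int            # 1 for a single-unit structure
    relisted_or_list_enumerate: bool   # cluster had no housing unit matching


def telephone_eligible(unit: PSampleUnit) -> bool:
    """Apply the exclusions listed for the A.C.E. telephone operation."""
    if unit.relisted_or_list_enumerate:
        return False  # no census link, so no census telephone number
    if unit.census_phone is None or not unit.mail_return_in_time:
        return False
    if unit.in_large_hh_or_cefu:
        return False
    if not unit.has_house_number_and_street:
        return False
    if 1 < unit.units_in_structure < 20:
        return False  # small multiunit: unit designations hard to confirm by phone
    return True


apartment = PSampleUnit("555-0100", True, False, True, 48, False)
print(telephone_eligible(apartment))   # True: large multiunit with a census phone number
```

Every unit the function rejects would, under this sketch, fall to the personal-visit operation that began on June 18.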

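The telephone share and the nonresponse conversion figures quoted in the person interview discussion can be checked with simple arithmetic. Taking the 300,913 P-sample housing units as the total interviewing workload (an assumption, but one consistent with the stated 29.4 percent), the figures work out as follows; the conversion rate is derived here and is not stated in the text.

```python
# Figures quoted in the text.
telephone_interviews = 88_573
p_sample_housing_units = 300_913   # assumed to be the total person interview workload
converted, noninterviews = 9_518, 9_735

telephone_share = 100 * telephone_interviews / p_sample_housing_units
conversion_rate = 100 * converted / noninterviews

print(f"{telephone_share:.1f}")   # 29.4
print(f"{conversion_rate:.1f}")   # 97.8
```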

The Questionnaire

There were three paths or sections within the person interview. An interview was conducted using the first two paths when at least one of the household members for whom information was required currently lived at the housing unit when the interview was conducted. One path collected data from a household member, and another path collected data from a nonhousehold member (i.e., a proxy respondent) for these people. There were two paths because the questions were worded differently for interviews with household members and with proxy respondents. The interviews from the first two paths were in housing units containing:

  • Whole household nonmovers
  • Whole household inmovers
  • Households with a mixture of nonmovers, inmovers, and outmovers

The third path was for whole household outmovers. The data for outmovers were obtained by proxy from the current resident of the sample household or from other proxy respondents, when necessary. When there was an interview with whole household inmovers, there was also an interview using the third path for whole household outmovers.

When there were multiple interviews for the same housing unit, the CAPI data from the last interview were selected for processing. If there was also a quality assurance interview that replaced the original interview, the quality assurance interview was selected over any other interview.

After the interviewers obtained the names and characteristics of household members, they established the residence status on Census Day. For nonmovers and outmovers, mover status, in addition to questions about group quarters and other residences on Census Day, established the residence status. College students living elsewhere in dormitories were not part of the A.C.E. universe. However, they were inadvertently included as inmovers in the A.C.E. instrument. To correct for this, an edit was performed for partial household inmovers who were in group quarters on Census Day. If the inmover was in group quarters on Census Day and was between the ages of 18 and 22, inclusive, the inmover was given an A.C.E. status code of removed.

Quality Assurance of Person Interviewing

The quality assurance plan for the A.C.E. Person Interview operation consisted of a reinterview of a sample of the original A.C.E. interviews. The workload consisted of a preselected random sample of 5 percent of the total person interview caseload and another sample consisting of cases targeted by the supervisors in the regional offices using specially designed targeting reports. The targeting was based on various indicators likely to predict poor data quality or potential fabrication. The targeted sample was another 5 percent of the total workload.

A separate CAPI questionnaire was designed for the quality assurance interviews. The quality assurance questionnaire contained separate paths for telephone and personal visit quality assurance interviews. The questionnaire also included a complete version of the original interview, allowing quality assurance interviewers to conduct the household interview on cases suspected of fabrication. Consequently, it was not necessary to assign another field representative at a later date to conduct the household interviews for such cases.

Quality assurance interviews were conducted either by telephone or by personal visit. The interview determined whether or not the original respondent was contacted by an interviewer. If, after an initial set of questions, it appeared that the respondent had not been previously contacted, the quality assurance interview continued with a full household interview that replaced the original interview in all future processing.

The quality assurance plan centered on whether the original interviewer actually contacted the person who was reported to have been interviewed. When this was the case, the interview itself was assumed to be correct, because the person interview questionnaire was designed to ensure data quality using data edits and automated questionnaire skip patterns. When this was not the case (i.e., the proper household was not contacted), a full reinterview was conducted.

The quality assurance plan was designed to be most effective for the few interviewers who blatantly included data from fictitious interviews, which does occur in practice in similar surveys. Therefore, discrepant results were targeted by looking for inconsistent or conspicuous results identified using the targeting reports. Examples of inconsistent or conspicuous results include using the same name for respondents across cases, using famous names for household members, or completing cases too late in the day to really have been interviewing at someone's house.

Effectively identifying an interviewer with only one or two errors in a large workload of cases would require a prohibitively large random sample. Because later A.C.E. operations, such as the person follow-up interview, were expected to identify such cases, the quality assurance plan did not attempt to identify these situations beyond what fell in the 5 percent random sample.

Preliminary Estimation Outcome Codes

Preliminary P-sample estimation outcome codes were assigned to each P-sample housing unit before the computer and clerical matching. This outcome code was

assigned to the housing unit based on Census Day for nonmovers and outmovers. Only people with the following A.C.E. status codes were used in the matching operations:

  • N = nonmover resident
  • O = outmover resident
  • U = unresolved residence status

The preliminary estimation outcome codes identified interviews and noninterviews in occupied housing units, vacant housing units, and housing units that were removed from the P sample. The interview outcomes described in this section were Census Day interview outcomes after data editing. The editing converted whole households of Census Day residents with insufficient information for matching into noninterviews, and converted whole households of people who should not have been counted at the housing unit on Census Day into vacant housing units.

Interviews
  • Complete interviews. Interviews conducted with a household member.
  • Proxy interviews. Interviews conducted with someone outside the household.
  • Sufficient partial interviews. Interviews with household members or proxies that did not collect all required data, but did collect enough information to be considered interviews.

Noninterviews
  • Field noninterview.
  • Whole households of people with insufficient information to permit matching and follow-up.

Vacant on Census Day
  • Housing units identified as vacant on Census Day by the interviewer.
  • Whole households of people who should have been counted elsewhere on Census Day (i.e., whole household nonresidents).

Not a Housing Unit on Census Day
  • The housing units identified during the person interview as not a housing unit on Census Day were removed from the P sample.

Table 4-6 contains the number in each category of preliminary outcome codes, and the number and percentage of total occupied and vacant housing units for the preliminary outcome codes grouped into interview, noninterview, and vacant. The percentages of interview and noninterview for occupied housing units are also included. The noninterview rate for occupied housing units was 1.9 percent based on the preliminary outcome codes before clerical matching. The interviewers identified 10,206 addresses, or 3.4 percent of the A.C.E. addresses, as not being housing units on Census Day. The A.C.E. housing units identified as something other than housing units were not in the P sample. For more details see Childers et al. (2001).
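The two processing rules described earlier in this section for the person interview, deciding which interview's CAPI data were kept for a housing unit and the edit that removed college students recorded as inmovers, can be sketched in Python. This is a minimal illustration under stated assumptions, not Census Bureau code: the `Interview` and `Person` records, their field names, and the string `"removed"` standing in for the A.C.E. status code are hypothetical; only the precedence rule (a QA replacement interview first, otherwise the last interview) and the group quarters plus ages 18 to 22 inclusive condition come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Interview:
    """Hypothetical record for one CAPI interview at a housing unit."""
    sequence: int                    # order in which the interview was taken
    is_qa_replacement: bool = False  # QA interview that replaced the original


def select_interview(interviews: list[Interview]) -> Optional[Interview]:
    """Pick the interview whose CAPI data go on to processing.

    A QA replacement interview is chosen over any other interview;
    otherwise the last interview for the housing unit is used.
    """
    if not interviews:
        return None
    replacements = [iv for iv in interviews if iv.is_qa_replacement]
    candidates = replacements or interviews
    # Assumption: if there are several QA replacements, keep the latest one.
    return max(candidates, key=lambda iv: iv.sequence)


@dataclass
class Person:
    """Hypothetical person record; field names are illustrative only."""
    ace_status: str
    is_inmover: bool
    in_partial_household: bool
    in_group_quarters_on_census_day: bool
    age: int


def college_student_edit(person: Person) -> str:
    """Return the A.C.E. status after the college-student edit.

    Partial household inmovers who were in group quarters on Census Day
    and aged 18 to 22 inclusive get the status "removed"; everyone else
    keeps the status they already had.
    """
    if (
        person.is_inmover
        and person.in_partial_household
        and person.in_group_quarters_on_census_day
        and 18 <= person.age <= 22
    ):
        return "removed"
    return person.ace_status


if __name__ == "__main__":
    visits = [Interview(1), Interview(2, is_qa_replacement=True), Interview(3)]
    print(select_interview(visits).sequence)  # 2: the QA replacement wins
    student = Person("inmover", True, True, True, 19)
    print(college_student_edit(student))  # removed
```

Keeping the edit as a pure function that returns a status, rather than mutating the record, makes it easy to apply after the interview data are selected and to audit which people it changed.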

Table 4-6. Preliminary Census Day Estimation Outcome for A.C.E. Housing Units (Unweighted)

                                               Total housing units    Occupied housing units
Outcome code                                   Number     Percent     Number     Percent
Interview                                      257,624    88.6        257,624    98.1
  Complete interview with a household member   235,632
  Complete interview with a proxy respondent   19,380
  Sufficient partial interview                 2,612
Noninterview                                   4,988      1.7         4,988      1.9
  Field noninterview                           2,667
  All people have insufficient information
    for matching and follow-up                 2,321
Total occupied housing units                                          262,612    100.0
Vacant                                         28,095     9.7
  No Census Day residents                      4,184
  Vacant on Census Day                         23,911
Total occupied and vacant housing units        290,707    100.0
Not a housing unit on Census Day               10,206
Total interviewed housing units                300,913
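As a check on the arithmetic, the following Python sketch recomputes the subtotals and percentages in Table 4-6 from its unweighted category counts. The dictionary labels are chosen for this example only. The rate definitions follow the text: the noninterview rate divides noninterviews by occupied housing units (interviews plus noninterviews), and the 3.4 percent share of addresses that were not housing units is reproduced by dividing by the table's total of 300,913 interviewed addresses.

```python
# Unweighted counts transcribed from Table 4-6; the labels are our own.
INTERVIEWS = {
    "complete, household member": 235_632,
    "complete, proxy respondent": 19_380,
    "sufficient partial": 2_612,
}
NONINTERVIEWS = {
    "field noninterview": 2_667,
    "insufficient information for matching and follow-up": 2_321,
}
VACANT = {
    "no Census Day residents": 4_184,
    "vacant on Census Day": 23_911,
}
NOT_A_HOUSING_UNIT = 10_206


def percent(part: int, whole: int) -> float:
    """Share of `whole` in percent, rounded to one decimal like the table."""
    return round(100 * part / whole, 1)


interviews = sum(INTERVIEWS.values())
noninterviews = sum(NONINTERVIEWS.values())
vacant = sum(VACANT.values())

occupied = interviews + noninterviews        # base for the noninterview rate
occupied_and_vacant = occupied + vacant      # base for the "Total" columns
all_addresses = occupied_and_vacant + NOT_A_HOUSING_UNIT

noninterview_rate = percent(noninterviews, occupied)
not_a_housing_unit_share = percent(NOT_A_HOUSING_UNIT, all_addresses)

if __name__ == "__main__":
    print(interviews, noninterviews, vacant)             # 257624 4988 28095
    print(occupied, occupied_and_vacant, all_addresses)  # 262612 290707 300913
    print(noninterview_rate, not_a_housing_unit_share)   # 1.9 3.4
```

Running it reproduces every subtotal in the table as well as the 1.9 percent noninterview rate and the 3.4 percent not-a-housing-unit share quoted in the text; `percent(interviews, occupied)` likewise gives the 98.1 percent interview share for occupied units.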

The percent noninterview was calculated as the unweighted number of noninterviews divided by the number of occupied housing units, that is, the interviews plus the noninterviews. Preliminary noninterview rates by respondent type and by interview mode are presented in Tables 4-7 and 4-8.

Table 4-7. P-Sample Preliminary Percent Noninterview Before Follow-Up by Respondent Type

Respondent type                    P-sample preliminary percent noninterview
Household member                   0.9
Proxy                              13.8
Total                              1.9

Of all interviews at occupied housing units, 33.5 percent were completed by telephone, 66.1 percent were completed by personal visit, and 0.3 percent (910 interviews) were completed by a quality assurance replacement interview. The percent noninterview of occupied housing units for each interview mode is shown in Table 4-8.

Table 4-8. P-Sample Preliminary Percent Noninterview Before Follow-Up by Interview Mode

Interview mode                     P-sample preliminary percent noninterview
Telephone                          0.9
Personal visit                     2.2
Quality assurance replacement      36.0
Total                              1.9

Although telephone interviews were more likely than personal visit interviews to have insufficient information, because neighbors could not be contacted, this was offset by the straightforward nature of the telephone interviews. These were cases where the respondent completed and returned the census form in a timely manner and provided a telephone number on the form. Conversely, personal visit cases tended to be the more difficult situations (such as movers or reluctant respondents) and were therefore much more likely to result in noninterviews.

There were several reasons for the high noninterview rate for the quality assurance replacement interviews. These were difficult interviews, because they had failed the quality assurance check and needed a reinterview. Many of the noninterviews were refusals. Additionally, because the instrument was monitoring both the quality assurance case and the replacement interview, it was difficult to obtain the Census Day residents in mover cases, so many of these were noninterviews. There was also a problem with the instrument in cases where the quality assurance interviewer could not find the address on the day of the QA interview. When this occurred, the case failed the quality assurance check, but no data were collected to replace the original interview, since the QA interviewer could not find the address. However, unlike in personal visit cases, the QA interviewer made no attempt to determine whether the sample address also did not exist on Census Day. Therefore, these cases were considered Census Day noninterviews. There were 108 such cases.

PERSON MATCHING
After both the CAPI interviewing and the HCUF were completed, the E sample was identified from the HCUF and person matching began. People with incomplete names were identified by computer in both the P and E samples, because their records did not contain sufficient information for matching and follow-up. See Attachment 4 for more information about census data-defined people and people with insufficient information for matching and follow-up.

The P-sample people and the people in the HCUF within the sample clusters were computer matched. The possible matches, P-sample nonmatches, and E-sample nonmatches were clerically reviewed using an automated matching and review system. Additional matches and possible matches were identified by the clerical staff. Duplicates on both lists were also identified clerically. After the matching was completed, field follow-up was conducted and the results of the field interview were coded in the matching database.

Within Block Cluster Computer and Clerical Matching
With procedure C, the people in P-sample housing units who were initially matched to the E-sample and non-E-sample census enumerations were:

  • nonmovers and outmovers identified as residents (i.e., A.C.E. status equal to N and O), or
  • people with unresolved residence status (i.e., A.C.E. status equal to U)

The matching within the sample clusters was done by the computer matcher, followed by a computer assisted clerical review. The computer compared the nonmovers and outmovers to the E-sample census enumerations in sample clusters and, when necessary, to the non-E-sample enumerations. These non-E-sample enumerations were census people in housing units that were not included in the E sample after the subsampling of census housing units.

The clerical matchers also searched among people enumerated in the census in group quarters. A match was assigned when the name and characteristics of a P-sample person were found in the census data within the block cluster.

During computer matching, the P sample was matched to the census. However, this matching was prioritized; first
the P sample was matched to the E sample, then any leftover nonmatches from the P sample were matched to the non-E-sample people in housing units. The matching occurred in two steps:

  • Record Pair Ranking. The standardized names of the P-sample person and the census person were compared, along with the person characteristics, using a string comparison (Winkler, 1994). A ranking score was assigned to each pair of people and the optimal pairs were identified.
  • Determination of Match Cutoffs. The optimal pairs in the cluster were reviewed to determine the cutoffs for matches and nonmatches. All pairs above the match cutoff were identified as matches. All pairs between the match cutoff and the nonmatch cutoff were identified as possible matches. All pairs below the nonmatch cutoff were classified as not matched. Match cutoffs were assigned conservatively to prevent false matches.

The goal of the matching and follow-up operation was to produce the correct ratio of cases classified as omitted from the census to those classified as correctly included in the census. After the computer matching, P-sample and E-sample people who did not match were reviewed clerically. The clerical matchers were able to match people the computer could not, because they had the whole household to aid in matching. The P-sample nonmatches were searched for in the census. A duplicate search was also conducted clerically. The matching and duplicate search were aided by software that sorted and searched the census records. The computer assisted clerical matching software contained all A.C.E. and census information about P-sample and census people, including names, characteristics, outcome of the interview, and address.

The A.C.E. technicians carried out the quality assurance for the clerical matchers and resolved the cases flagged by the clerical matchers as needing further review. The A.C.E. analysts did the quality assurance for the technicians and resolved the cases flagged by the technicians as needing further review. There were 235 clerks, 46 technicians, and 10 analysts to do the clerical matching.

Census Images. Scanned images of census questionnaires were available for matching for the first time in Census 2000. The clerical matchers used these images as an aid in matching, and when additional information (like names) was found, the new information was made available for the follow-up interview. An E-sample record could be updated by the clerks to provide sufficient information for matching and follow-up or to correct image capture errors. In addition, some information written outside the capture boxes was used to update the data.

For Census 2000, all census forms were scanned and the resulting information was interpreted using Optical Mark Recognition and Optical Character Recognition, or was keyed. For person matching, images were only available for housing units on the January 2000 DMAF. Images were not available for census housing units added after January 2000. An address identified by census identification number (ID) could return more than one form, including the following: original census form, Be Counted form, a foreign language form, and/or a Simplified Enumerator Questionnaire. Be Counted forms were not available for viewing images, since they did not have a census ID associated with the form when the data were captured.

The clerical matchers reviewed data for census people with insufficient information for matching and follow-up and searched for additional information that might allow them to be matched when the image was available. All review of census people with insufficient information for matching and follow-up was done before the clerical matching began, and the census data in the matching software were updated. The software did not permit the assignment of a code until there were two characteristics and a complete name. After the software data were updated, the clerical matching process began and the matchers could match the P-sample person to the census people now containing sufficient information for matching. The matchers were also able to review data for nonmatches when they suspected data capture errors and to correct the records of name, relationship to person number one, sex, age, Hispanic origin, and race.

The corrected data were used on the follow-up form, but not sent to estimation. The updated data were not inserted into the HCUF. This updating was for matching in A.C.E. and for the follow-up form only. The matchers were NOT looking at people who were not data-defined to see if there was more information on the census form to make them data-defined. Therefore, people were NOT created in the census.

Duplicate Search Within Cluster. The search for duplicates was done clerically. A person was duplicated when the data collected for the person were repeated within the block cluster. The printouts used in 1990 for duplicate search were automated in 2000. Search routines in the 2000 clerical matching software made the searches quicker and more accurate. Duplicates were linked in the matching system for later analysis.

Duplicated People Were Identified:

  • Within the P sample. A duplicated P-sample person was removed from the final P sample, because only one record of the person was needed in that household in the P sample. When whole households of P-sample people were duplicated, one of the housing units was converted to a noninterview because the interview was not a good one. The duplicated P-sample household appeared in two different housing units, and in one of them it had been recorded instead of the people who actually lived at the address.
    For example, the Smith family was recorded in apartments A and B. Both apartments were housing units. The P-sample interview for the duplicated family was not a good interview and was converted to a noninterview after the P-sample people were removed.
  • Within the E sample. An E-sample person duplicate was an erroneous enumeration in the census.
  • Between E-sample people and people not in the E sample. The E-sample people were also compared to the census people in housing units within the same sample cluster who were left out of the E sample when large block clusters were subsampled during E-sample identification. There was no duplicate search between E-sample people and people enumerated in group quarters. Also, there was no duplicate search within group quarters.

When duplication between an E-sample person and a non-E-sample person was identified, it did not indicate a full erroneous enumeration. Therefore, a probability of erroneous enumeration caused by duplication was needed for the duplicated E-sample person. The formula for the probability of erroneous enumeration was

$$\Pr(\mathrm{EE}) = \frac{100\,d}{c + d + 1}\ \text{percent}$$

where
  c = number of times the E-sample person was duplicated with another E-sample person
  d = number of times the E-sample person was duplicated with a non-E-sample person

In 1990, when there was duplication between a person in the E sample and a person in a household that was not in the large-cluster subsample, and therefore not in the E sample, the E-sample person was assigned a probability of erroneous enumeration of one half. This methodology was refined in the 2000 A.C.E. to accommodate triplicates. The 1990 estimate was biased when there was a triplicate enumeration in the census that involved two E-sample duplicates and a third copy that was not in the E sample. However, there were only a few of these cases in 2000.

The formula assumes the E-sample person had been coded as correctly enumerated. If the E-sample person was coded unresolved, the final probability of erroneous enumeration included an imputation for unresolved enumeration status. If the E-sample person was assigned a match code that indicated erroneous enumeration, the number of times that the E-sample person was duplicated with non-E-sample people was irrelevant and ignored. A person could not have a probability of erroneous enumeration larger than 100 percent.

Census Geocoding Errors
The clerical matchers reviewed people in census housing units identified in the housing unit matching as geocoding errors. The clerical matchers assigned a code indicating geocoding error to E-sample persons for whole household E-sample nonmatches. There was no need for a follow-up interview, since the housing unit follow-up operation had identified these housing units as geocoding errors. These E-sample people were erroneously enumerated in this sample cluster because they were enumerated in a housing unit that was incorrectly geocoded to this sample cluster. In 1990, these people were followed up because it wasn't clear who was incorrectly geocoded until after the follow-up interview.

Coding Nonmatches in Large Households
The mail return short form had a continuation roster to collect names for persons seven through twelve. The mail return long form had a roster for the names of persons one through twelve. Data were collected for the first six people in the household, for both long and short forms. If the large household follow-up was unsuccessful, there were only names for persons seven through twelve for the long and short mail return forms. Census records were not created for the people in households with only names, since they were not data-defined.

The names on the rosters were used to reduce the P-sample follow-up of nonmatches in large households. P-sample people in large households who were found on the large household roster were not followed up because they were residents of the housing unit on Census Day. They were still counted as not matched to a census enumeration, but a follow-up interview was not needed to establish their residence on Census Day.

Targeted Extended Search
P-sample whole household nonmatches with no address match, and E-sample whole households of nonmatched people in housing units coded as geocoding errors, had their search area expanded into the first ring of surrounding blocks. The expanded search is referred to as targeted extended search (TES). See Chapter 5 for a full discussion.

The targeted extended search for the 2000 A.C.E. was a two-stage process. First, clusters were identified that would benefit most from expanding the search area to surrounding blocks. Second, blocks within the surrounding blocks were targeted for searching.

This extended search was targeted at the clusters most likely to benefit from expanding the search area. The clusters selected for targeted extended search for the 2000 Accuracy and Coverage Evaluation were:

  • Clusters included with certainty
      • Relisted clusters in A.C.E.
      • The 5 percent of clusters having the most unweighted census geocoding errors and A.C.E. address nonmatches

      • The 5 percent of clusters having the most weighted census geocoding errors and A.C.E. address nonmatches
  • Clusters selected at random from the clusters with A.C.E. housing unit nonmatches (i.e., A.C.E. housing units coded CI or UI) or census housing units identified as geocoding errors (i.e., coded GE)

The clusters not selected for targeted extended search were:

  • Clusters not selected from the clusters with A.C.E. housing unit nonmatches (i.e., A.C.E. housing units coded CI or UI) and census housing units identified as geocoding errors (i.e., coded GE), that is, clusters eligible for TES sampling but not selected
  • Clusters with no A.C.E. housing unit nonmatches or census geocoding errors identified in the housing unit matching
  • List/Enumerate clusters

Table 4-9 contains the number of clusters selected for TES and the number of P-sample and E-sample people in TES. The number of clusters includes the clusters included with certainty because they were relisted. P-sample people with a residence probability of zero have been excluded from the table.

Table 4-9. The TES Sample

                            Clusters    P-sample people    E-sample people
Included with certainty     1,150       28,533             20,572
Sampled for TES             1,089       3,889              2,281
Total                       2,239       32,422             22,853

Clusters with the most unweighted and weighted census geocoding errors and A.C.E. address nonmatches were included because some clusters with large weights contribute disproportionately to the estimates. Approximately 10 percent of the remaining clusters with A.C.E. housing unit nonmatches and census geocoding errors (49 percent of all clusters) were selected at random. There were 2,239 clusters selected for targeted extended search.

In the second stage of targeting, the work was targeted to blocks within the search area where the geocoding error was located. In 1990, the effort required to search for matches and duplicates in large areas that had only a few possible matches or duplicates appeared to lead to errors. There was anecdotal evidence of clerks who did not bother to look in surrounding blocks because they rarely found anything. Targeting the expanded search probably reduced clerical errors, as well as the cost of the operation.

P-Sample Matching Extended Search
The search area was expanded to clerically search the ring of surrounding blocks for the P-sample whole household nonmatches when a housing unit was not a match in housing unit matching (i.e., the housing unit match code was a nonmatch or unresolved). There was no searching in surrounding blocks for partial household nonmatches or for whole household nonmatches with matching addresses.

How the search was done depended on whether the cluster and its surrounding blocks consisted solely of urban type addresses, or whether they consisted partly or entirely of rural type addresses.

  • In areas that are completely urban, if the clerk located the basic street address in the surrounding blocks or determined that the range of addresses was in the surrounding blocks, person matching was conducted in the block where the basic street address or range was located. The matching was also conducted when there was a possible address match in a surrounding block.
  • In rural or mixed urban and rural areas, because of the difficulties in matching rural type addresses, there was no attempt to match addresses in the surrounding blocks. Instead, people were searched for in all of the surrounding blocks.

E-Sample Extended Search for Geocoding Errors
A census person in a housing unit that was coded as a geocoding error was an erroneous enumeration unless the housing unit was located inside the expanded search area.

The census geocoding errors were identified in the housing unit phase of the A.C.E. Another interview identified the housing units that physically existed in the surrounding blocks instead of within the cluster where they were enumerated. This field work was done for whole household E-sample nonmatches in housing units identified during the housing unit phase as geocoding errors. This field visit was conducted at about the same time as the A.C.E. person interview.

The people in these housing units were coded as follows:

  • If the housing unit was found to exist in the surrounding blocks, the clerks coded the E-sample person as geocoded to the surrounding blocks during the before follow-up person matching.
  • If the housing unit existed in the sample cluster, the E-sample person was coded as not a geocoding error, because that housing unit did exist in the sample cluster.
  • If the housing unit did not exist in the surrounding blocks or could not be located on the map sent with the case, the E-sample person was coded as a geocoding error, indicating the person was erroneously enumerated because the housing unit was incorrectly geocoded in the block cluster.
  • If the field work was not done, or if it could not be determined whether the block number entered on the form was in the block cluster or in the surrounding blocks, the unresolved code was used. There was no follow-up for the unresolved cases.

A person follow-up interview for the E-sample nonmatches coded in the sample cluster or in the surrounding blocks was needed to identify other reasons for erroneous enumeration, such as fictitious people and other residences where people should have been counted on Census Day.

E-Sample Targeted Duplicate Search
A search for duplicated people was conducted clerically in the targeted extended search clusters when the housing unit was identified during the field interview as physically existing in the surrounding blocks. Like the P-sample search for missed units, the duplicate search was designed to identify people who were duplicated because of geocoding error. There was no searching for duplicates in the group quarters enumerations.

If an E-sample housing unit was identified as existing in the surrounding blocks, a housing unit duplicate search was conducted. How this was done depended on whether the cluster and its surrounding blocks consisted solely of urban style addresses or partly or entirely of rural style addresses.

  • In urban areas, this duplicate search was done first on housing units and then on people. First, the clerks searched in the block in the ring of surrounding blocks where the housing unit should have been counted. If the housing unit was duplicated, a search was conducted to identify duplicated people. The duplicate search was conducted only in the block where the duplicated housing unit was located. These people were duplicated because the housing unit was enumerated correctly in a surrounding block and incorrectly in the sample cluster. If the housing unit was not duplicated, a search for person duplication was not conducted. The search concentrated on people who were duplicated and were in duplicated housing units caused by housing unit geocoding error in the surrounding blocks.
  • The duplicate search in rural or mixed areas was a search throughout the entire search area for person duplicates.

Added and Deleted Census Housing Units
Census coverage operations continued past the creation of the January 2000 DMAF. As a result, an added census housing unit is one that was not in the initial housing unit matching, because it was added to the inventory of census housing units after the January 2000 DMAF was created. A deleted census housing unit is one that was in the January 2000 DMAF, but was removed from the cluster before the final inventory of housing units was created.

The targeted extended search was based on the A.C.E. housing unit matching to the January 2000 DMAF and did not cover census housing units added to the block cluster since housing unit matching, thus excluding any geocoding errors that were not recognized in time to conduct the TES field follow-up. If a cluster was not identified for targeted extended search and a large building was added to the cluster, the first time it could have come to our attention was during person matching, and any added housing units would be identified as geocoding errors during the person follow-up. If any of these cases should have been included in the targeted extended search and were incorrectly geocoded, another follow-up operation would have been needed to identify the ones that actually existed in the surrounding blocks and those that existed outside the expanded search area.

There was not sufficient time to conduct another interview to determine which added census housing units with geocoding error really existed in the first ring of surrounding blocks. These cases were handled in two ways:

  • In TES clusters and clusters eligible for TES sampling, the people in added housing units where person follow-up identified geocoding error were treated as unresolved and the probability of correct enumeration was imputed. These new unresolved cases were treated the same as any other person coded with unresolved geography.
  • When the housing unit was not in a TES cluster, the people remained coded as geocoding errors and were erroneous enumerations.

A similar limitation existed when a housing unit that was matched in the housing unit matching was later deleted. There was a concern that the deleted unit may have been moved to a surrounding block. Clusters in which matched DMAF housing units were deleted from the HCUF had no chance of being TES clusters if the cluster had no A.C.E. housing unit nonmatches or census geocoding errors.

These deleted cases were also treated differently depending on whether they were in TES clusters:

  • If in a TES cluster, they were identified as TES people and a surrounding block search was conducted for the housing units in the TES P-sample matching.
  • If the housing unit was not in a TES cluster, there was no surrounding block matching. Surrounding block matching could not be done because there were no surrounding block people in non-TES clusters.

Before Follow-Up Results
Tables 4-10 and 4-11 contain the results of before follow-up matching for the P sample and the E sample. For details of these codes, see Childers (2001). These before
follow-up matching results are from unweighted data from Correctly enumerated. The correctly enumerated the fifty states and the District of Columbia. people in before follow-up matching were the ones match-ing the P sample.

The P-sample codes are grouped into:

Erroneously enumerated. The categories during before

  • Matched follow-up were fictitious people, duplicates, insufficient
  • Not matched information for matching and follow-up, and geocoding errors.
  • Possible match
  • The fictitious people were those where notes on the
  • Unresolved match status census image identified the person as one who died
  • Removed from the P sample before or was born after Census Day, or as not a real person such as a dog or other pet.

Matched. The P-sample person was found in the census.

  • The E-sample people enumerated more than once were Not matched. The P-sample person was not found in the coded as duplicates.

census. A follow-up interview was conducted for:

  • The E-sample people with insufficient information for
  • Partial household nonmatches matching and follow-up were those who were data-defined, but did not contain full name and at least two
  • Whole households of conflicting household members characteristics.4 (i.e., whole households of P-sample and census non-matches)2
  • Census people in housing units identified as geocoding errors5 during the initial housing unit follow-up were
  • Other whole household nonmatches where the P-sample coded as erroneously enumerated because of geocoding interview was conducted with a nonhousehold member3 error.

Possible match. The P-sample person may have been a Nonmatch. All E-sample people who did not match to the match to the census person. A follow-up interview was P sample were sent for a follow-up interview.

needed to determine if the two names referred to the Possible match. E-sample people who were coded as same person.

possible matches were followed up to determine whether Unresolved match status. The only category of unre- they were, in fact, matches.

solved in the before follow-up matching was insufficient Unresolved. In before follow-up matching, the unre-information for matching and follow-up. solved category only includes the census housing units that needed targeted extended search field work and that Removed from the P sample. The only category of field work was not done.

removed from the P sample in the before follow-up match-ing were the P-sample people coded as duplicates.

Table 4-10. P Sample Before Follow-Up The E-sample codes are grouped into: Matching

  • Correctly enumerated P-sample match status Unweighted people Percent
  • Erroneously enumerated Matched . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573,506 85.7
  • Nonmatch Not matched . . . . . . . . . . . . . . . . . . . . . . . . . . 76,804 11.5 Possible match . . . . . . . . . . . . . . . . . . . . . . . . 5,070 0.8
  • Possible match Unresolved . . . . . . . . . . . . . . . . . . . . . . . . . . . 7,524 1.1 Removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5,923 0.9
  • Unresolved Total . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668,827 100.0 2 4 These cases have been called the Smith/Jones cases in the This is the same rule that was used in the 1990 PES. There past. must have been enough information about the person to have a 3

No follow-up interview was conducted when there were chance at locating the person for a follow-up interview before the whole households of P-sample nonmatches from interviews with person was allowed into the matching process. See Childers household members in a housing unit that did not match in the (2001).

5 housing unit operation or matched to a housing unit containing A geocoding error is an error in assigning the housing unit to no data-defined people. the correct location.
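The duplicate-search rules described above for an E-sample housing unit found to exist in a surrounding block can be summarized as a small decision function. This is a minimal sketch under stated assumptions: the `GeocodedUnit` record and its field names are hypothetical illustrations, not part of the A.C.E. processing system, and the clerical search itself is reduced to a list of step descriptions.

```python
from dataclasses import dataclass


@dataclass
class GeocodedUnit:
    """Hypothetical record for an E-sample housing unit found to exist in a surrounding block."""
    urban_style_only: bool  # cluster and surrounding blocks have only urban style addresses
    unit_duplicated: bool   # clerks found the unit in the block where it should have been counted


def duplicate_search_steps(unit: GeocodedUnit) -> list[str]:
    """Return, in order, the duplicate-search steps described in the text (sketch only)."""
    if unit.urban_style_only:
        # Urban: housing units first, only in the block where the unit should have been counted.
        steps = ["search housing units in the correct surrounding block"]
        if unit.unit_duplicated:
            # People are searched only in the block holding the duplicated housing unit.
            steps.append("search people in that block only")
        return steps
    # Rural or mixed: one people search across the entire search area.
    # Group quarters enumerations are never searched for duplicates in either case.
    return ["search people throughout the entire search area"]


print(duplicate_search_steps(GeocodedUnit(urban_style_only=True, unit_duplicated=False)))
```

The person search in urban areas is conditional on finding a duplicated housing unit, which is why the sketch returns a variable number of steps.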

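The treatment of added and deleted census housing units depended only on the cluster's TES status. A sketch of that branching follows; the `Treatment` enum and the function names are hypothetical conveniences chosen for illustration.

```python
from enum import Enum


class Treatment(Enum):
    """Outcomes named in the text; the enum itself is a hypothetical convenience."""
    IMPUTE_CORRECT_ENUMERATION = "unresolved; probability of correct enumeration imputed"
    ERRONEOUS_ENUMERATION = "kept as geocoding error; erroneous enumeration"
    TES_SURROUNDING_BLOCK_SEARCH = "identified as TES; surrounding block search in TES P-sample matching"
    NO_SURROUNDING_BLOCK_MATCHING = "no surrounding block matching"


def added_unit_with_geocoding_error(tes_or_tes_eligible: bool) -> Treatment:
    """People in an added census housing unit that person follow-up found to be a geocoding error."""
    if tes_or_tes_eligible:
        # Treated like any other person coded with unresolved geography.
        return Treatment.IMPUTE_CORRECT_ENUMERATION
    return Treatment.ERRONEOUS_ENUMERATION


def deleted_matched_unit(in_tes_cluster: bool) -> Treatment:
    """A unit matched in housing unit matching but later deleted from the census inventory."""
    if in_tes_cluster:
        return Treatment.TES_SURROUNDING_BLOCK_SEARCH
    # Non-TES clusters had no surrounding block people to match against.
    return Treatment.NO_SURROUNDING_BLOCK_MATCHING


print(added_unit_with_geocoding_error(False).value)
```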

Table 4-11. E-sample Before Follow-Up Matching

E-sample enumeration status    Unweighted people    Percent
Correctly enumerated                     544,995       76.4
Erroneously enumerated                    27,934        3.9
Not matched                              134,916       18.9
Possible match                             4,751        0.7
Unresolved                                   304        0.0
Total                                    712,900      100.0

Note: Percentages in table may not add to total due to rounding.

A.C.E. Person Follow-Up

The person follow-up was conducted to gather additional information to accurately code the residence status of the nonmatched P-sample people and the enumeration status of the E-sample people. In addition, the match status of the possible matches was resolved during the follow-up interview. The following cases were sent to person follow-up:

  • P-sample partial household nonmatches
  • P-sample whole household nonmatches where the census enumerated different E-sample people (i.e., conflicting households or Smith/Jones cases)
  • P-sample whole household nonmatches where the A.C.E. person interview was with a proxy respondent
  • E-sample nonmatches
  • Possible matches between the P sample and the census
  • P-sample matches and nonmatches with unresolved residence status
  • P-sample nonmatches needing additional geographic work⁶

The results of the follow-up interview were recorded in the matching software by the matching clerks. Table 4-12 contains the results of the follow-up coding for the P-sample people who were followed up. The P-sample people who were followed up were clerically classified as:

  • Matched
  • Nonmatched resident of the cluster on Census Day
  • Unresolved residence or match status
  • Nonresident of the cluster on Census Day

Matched. The P-sample person was found in the census in the block cluster or in a surrounding block after the follow-up interview.

Nonmatched resident of the cluster on Census Day. The P-sample nonmatch was not found in the census, and the follow-up interview determined he or she should have been counted in the search area for this cluster.

Unresolved residence or match status. The person had unresolved residence status because the follow-up interview did not successfully collect the information required to accurately identify this person as a resident of the cluster on Census Day. In the case of possible matches, the follow-up interview was not able to ascertain the match status of the people.

Nonresident of the cluster on Census Day. The P-sample person was not a resident of the housing unit on Census Day and was removed from the P sample. These people were duplicates, fictitious, living in a P-sample housing unit that was listed in the cluster in error (i.e., P-sample geocoding error), or the P-sample person should have been counted at another residence on Census Day.

The results of the follow-up interview in Table 4-12 indicate 14.7 percent unresolved and 12.5 percent removed from the P sample.

Table 4-12. Results of P-sample Follow-Up Interview

After follow-up match code    Unweighted people    Percent
Matched                                   9,793       19.4
Nonmatched resident                      26,961       53.4
Unresolved                                7,451       14.7
Nonresident                               6,296       12.5
Total                                    50,501      100.0

Table 4-13 contains the results of the E-sample follow-up interviews. The followed-up E-sample people were classified as:

  • Matched
  • Correctly enumerated
  • Erroneously enumerated
  • Unresolved

Matched. The P-sample and E-sample enumerations refer to the same person. The match was made after the follow-up interview.

Correctly enumerated. The E-sample nonmatch was identified during the follow-up interview as correctly enumerated in the census.

⁶ Housing units in relist and list/enumerate clusters did not have housing unit matching. Therefore, P-sample geocoding errors in such clusters needed to be identified during person matching. In addition, when the interviewer changed the address in the CAPI instrument, the P-sample geography was checked to make sure the interviewer did not interview outside the sample cluster.
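The list of cases sent to person follow-up can be read as a single routing rule. This sketch assumes a hypothetical `MatchCase` record whose boolean fields stand in for the clerical codes; it illustrates the categories listed above and is not the matching software's logic. Consistent with footnote 3, whole household P-sample nonmatches interviewed with a household member, and not in conflict with a census household, are not routed.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class MatchCase:
    """Hypothetical matching record; the fields stand in for clerical codes."""
    sample: Literal["P", "E"]
    status: Literal["match", "nonmatch", "possible"]
    whole_household: bool = False        # the whole household failed to match
    conflicting_household: bool = False  # census listed different people (Smith/Jones case)
    proxy_interview: bool = False        # A.C.E. person interview taken with a proxy respondent
    residence_unresolved: bool = False   # P-sample residence status not resolved
    needs_geographic_work: bool = False  # P-sample nonmatch needing additional geographic work


def needs_person_follow_up(case: MatchCase) -> bool:
    """True if the case falls in one of the categories sent to A.C.E. person follow-up."""
    if case.status == "possible":
        return True  # possible matches between the P sample and the census
    if case.sample == "E":
        return case.status == "nonmatch"  # all E-sample nonmatches
    if case.residence_unresolved:
        return True  # P-sample matches and nonmatches with unresolved residence status
    if case.status != "nonmatch":
        return False
    if case.needs_geographic_work or not case.whole_household:
        return True  # additional geographic work, or a partial household nonmatch
    # Whole household nonmatches were sent only for conflicting households or proxy interviews.
    return case.conflicting_household or case.proxy_interview


print(needs_person_follow_up(MatchCase("P", "nonmatch", whole_household=True)))
```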

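The percentages in Tables 4-10 and 4-11 are simple shares of the unweighted totals. Recomputing them from the published counts, as in this sketch, also shows why the note under Table 4-11 is needed: its rounded shares sum to 99.9 rather than 100.0. The function name is illustrative.

```python
def percent_distribution(counts: dict[str, int]) -> dict[str, float]:
    """Share of the unweighted total for each category, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Unweighted before follow-up counts from Table 4-10 (P sample) and Table 4-11 (E sample).
TABLE_4_10 = {"Matched": 573_506, "Not matched": 76_804, "Possible match": 5_070,
              "Unresolved": 7_524, "Removed": 5_923}
TABLE_4_11 = {"Correctly enumerated": 544_995, "Erroneously enumerated": 27_934,
              "Not matched": 134_916, "Possible match": 4_751, "Unresolved": 304}

for label, table in (("Table 4-10", TABLE_4_10), ("Table 4-11", TABLE_4_11)):
    shares = percent_distribution(table)
    # Summing the rounded shares exposes the rounding effect noted under Table 4-11.
    print(label, sum(table.values()), shares, round(sum(shares.values()), 1))
```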

Erroneously enumerated. The E-sample nonmatch was identified during the follow-up interview as erroneously enumerated in the census because the person should have been counted at another residence on Census Day, was fictitious, had insufficient information for matching and follow-up, was duplicated, or lived in a household that was a geocoding error.

Unresolved. The follow-up interview for the census nonmatch was not successful.

The results of the E-sample follow-up in Table 4-13 indicate 7.4 percent of the E-sample people followed up were erroneously enumerated and 14.1 percent were unresolved.

Table 4-13. Results of E-sample Follow-Up for Nonmatches and Possible Matches

After follow-up match code    Unweighted people    Percent
Matched                                   9,088        6.3
Correctly enumerated                    103,589       72.2
Erroneously enumerated                   10,618        7.4
Unresolved                               20,185       14.1
Total                                   143,480      100.0

After Follow-Up Coding

After the follow-up was completed, the results of the interviews were reviewed and codes entered into the system by the matching clerks. See Attachments 5, 6, and 7 for definitions of the individual match, enumeration, and residence status codes assigned by the matching clerks.

The final P-sample results are shown in Tables 4-14 and 4-15. The P-sample people have been classified as matched, not matched, unresolved match status, and removed in Table 4-14 and also tabulated as resident, nonresident, and unresolved residence status in Table 4-15. The data are unweighted, but the people sampled out of the targeted extended search are removed from tabulations for this section.

The P-sample match status is defined as:

  • Matched
  • Not matched
  • Unresolved match status
  • Removed from the P sample

Matched. The P-sample person was found in the cluster or in the surrounding block in either a housing unit or in group quarters.

Not matched. The P-sample person was not found in the search area. If the nonmatch was sent to follow-up, the person was confirmed to be a resident of the cluster on Census Day. If the nonmatch was not sent for a follow-up interview, a household member identified the person as a resident of the housing unit during the original A.C.E. interview.

Unresolved match status. The match status was unresolved for possible matches with unsuccessful follow-up interviews and for P-sample people with insufficient information for matching and follow-up.

Removed from the P sample. People were removed from the P sample when they were fictitious, duplicates, geocoding errors, or not residents of the housing unit on Census Day.

Table 4-14. P-sample Match Status After Follow-Up

P-sample after follow-up match status    Unweighted people    Percent
Matched                                            578,695       88.6
Not matched                                         54,424        8.3
Unresolved                                           7,826        1.2
Removed                                             12,393        1.9
Total                                              653,338      100.0

The P-sample residence status was defined as:

  • Resident
  • Nonresident
  • Unresolved residence status

Resident. The P-sample matched or not matched person was a resident of the housing unit on Census Day.

Nonresident. P-sample people were nonresidents of the cluster when they were fictitious, duplicates, geocoding errors, or should not have been included as a resident of the housing unit on Census Day. Nonresidents were removed from the P sample.

Unresolved residence status. A matched or not matched P-sample person had unresolved residence status when the follow-up interview did not successfully determine the person's residence on Census Day. The residence status of the possible match was unresolved when the follow-up interview was not successful. The residence status was also unresolved when the P-sample person had insufficient information for matching.

Table 4-15. P-sample Residence Status After Follow-Up

P-sample after follow-up residence status    Unweighted people    Percent
Resident                                               625,863       95.8
Nonresident                                             12,393        1.9
Unresolved                                              15,082        2.3
Total                                                  653,338      100.0
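Tables 4-14 and 4-15 classify the same 653,338 P-sample people in two different ways. The sketch below, using only the published counts, checks that the two tabulations agree: the nonresidents in Table 4-15 are exactly the people removed in Table 4-14, so both tables split the same 640,945 retained people. The unresolved counts differ (7,826 versus 15,082) because a matched or not matched person can still have unresolved residence status. The function name is hypothetical.

```python
# Unweighted P-sample counts after follow-up (Tables 4-14 and 4-15).
MATCH_STATUS = {"Matched": 578_695, "Not matched": 54_424, "Unresolved": 7_826, "Removed": 12_393}
RESIDENCE_STATUS = {"Resident": 625_863, "Nonresident": 12_393, "Unresolved": 15_082}


def people_kept_in_p_sample(match: dict[str, int], residence: dict[str, int]) -> int:
    """Cross-check the two tabulations and return how many people stay in the P sample."""
    total = sum(match.values())
    if total != sum(residence.values()):
        raise ValueError("both tables should classify the same people")
    if match["Removed"] != residence["Nonresident"]:
        raise ValueError("nonresidents should be exactly the people removed from the P sample")
    # Given both checks, the kept people equal Matched + Not matched + Unresolved (Table 4-14)
    # and also Resident + Unresolved residence status (Table 4-15).
    return total - match["Removed"]


print(people_kept_in_p_sample(MATCH_STATUS, RESIDENCE_STATUS))  # 640945
```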

The final E-sample results are in Table 4-16. The E-sample people were classified as correctly or erroneously enumerated or as having an enumeration status of unresolved. These were the unweighted match results that go to imputation and estimation, with the people sampled out of the targeted extended search removed.

The E-sample enumeration status was defined as:

  • Correctly enumerated
  • Erroneously enumerated
  • Unresolved enumeration status

Correctly enumerated. E-sample people were correctly enumerated when they were matched to the P sample, or when they had been followed up and should have been enumerated in this cluster.

Erroneously enumerated. E-sample people were erroneously enumerated when they had another residence where they should have been counted on Census Day, were fictitious, were duplicated, lived in a housing unit that was a geocoding error, or had insufficient information for matching and follow-up.

Unresolved enumeration status. E-sample people had unresolved enumeration status when the follow-up interview was unsuccessful. The E-sample person may have been followed up to obtain information about the E-sample nonmatch, possible match, matched person with unresolved residence status, or geographic work to obtain the location of the housing unit.

Table 4-16. E-sample Matching After Follow-Up

E-sample enumeration status    Unweighted people    Percent
Correctly enumerated                     652,390       92.6
Erroneously enumerated                    31,064        4.4
Unresolved                                21,148        3.0
Total                                    704,602      100.0

There were unresolved codes assigned to P-sample and E-sample people. A probability of being matched was imputed for a P-sample person with unresolved match status. A probability that the P-sample person was a resident was imputed when the follow-up did not give enough information to resolve the person's residence status. The probability that a P-sample person was a resident was the probability that the person should have been included in the P sample. The probability that the E-sample person was correctly enumerated was also imputed for the E-sample people with unresolved enumeration status. A P-sample person could be matched but have unresolved residence status, or have both match and residence status unresolved. Therefore, tabulations for match status and residence status are shown separately for the P sample.

Estimation Outcome Codes

Two sets of outcome codes were prepared, one for the Census Day household and one for the Interview Day household. The final P-sample estimation outcome code identified the status of the interview for estimation on Census Day and on the day of the interview. For example, there were cases that were complete interviews for the current residents, but were reported as noninterview or vacant for the Census Day residents.

The final Census Day outcome codes are in Table 4-17. Outcome codes were changed as a result of the follow-up interview in the following types of situations:

  • No Census Day residents noninterview. Whole households of P-sample people who said they lived elsewhere on Census Day were converted to noninterviews.
  • No Census Day residents vacant. Whole households who lived in group quarters on Census Day or should have been enumerated at another residence were converted to vacant.

The outcome codes for these two situations were changed because new information from the follow-up interview indicated the original interview was incorrect. The housing unit outcome code for people identified as residents of the housing unit from the person interview, who said in the follow-up interview that they lived elsewhere, was changed to noninterview. The original person interview listed this household as residents of the housing unit when they did not live at this address. The interview was incorrect and was converted to a noninterview.

The housing unit outcome codes for people identified as residents of the housing unit from the person interview, who said in the follow-up interview that they lived in group quarters or should have been enumerated at another residence, were changed to vacant. The original person interview should have classified the housing unit as vacant, because the people should have been enumerated at another address.

Table 4-17 also contains the numbers of housing units identified as interviews, noninterviews, and vacant, with their percentages of total housing units, and the numbers and percentages of occupied housing units. The noninterview rate for occupied housing units for Census Day was 3.0 percent. Addresses that were not housing units on Census Day were removed from the P sample.
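The imputed probabilities described above let unresolved cases contribute fractionally to counts. As a minimal sketch, assuming an unweighted count and hypothetical status labels (`CE`, `EE`, `U`), each E-sample person contributes 1 if correctly enumerated, 0 if erroneously enumerated, and the imputed probability if unresolved. The imputation model that produces those probabilities, and the sampling weights used in the actual estimation, are not shown; the value 0.75 in the usage line is invented for illustration. The same idea would apply to the P-sample match and residence probabilities.

```python
from typing import Optional

# Hypothetical labels for the E-sample enumeration status.
CORRECT, ERRONEOUS, UNRESOLVED = "CE", "EE", "U"


def correct_enumeration_probability(status: str, imputed: Optional[float] = None) -> float:
    """Contribution of one E-sample person to an expected count of correct enumerations."""
    if status == CORRECT:
        return 1.0
    if status == ERRONEOUS:
        return 0.0
    if status == UNRESOLVED:
        if imputed is None or not 0.0 <= imputed <= 1.0:
            raise ValueError("an unresolved case needs an imputed probability in [0, 1]")
        return imputed
    raise ValueError(f"unknown enumeration status {status!r}")


def expected_correct_enumerations(people: list[tuple[str, Optional[float]]]) -> float:
    """Unweighted expected count: resolved cases count 1 or 0, unresolved cases count p."""
    return sum(correct_enumeration_probability(status, p) for status, p in people)


# Three hypothetical people; 0.75 is an invented imputed probability.
print(expected_correct_enumerations([(CORRECT, None), (ERRONEOUS, None), (UNRESOLVED, 0.75)]))  # 1.75
```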

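The two whole-household conversions of Census Day outcome codes can be expressed as a function of what the follow-up interview learned about each household member. This sketch assumes a hypothetical per-person `Finding` classification; households with mixed findings are not addressed in the text, so the sketch leaves their original code unchanged, which is an assumption.

```python
from enum import Enum


class Finding(Enum):
    """Hypothetical per-person summary of what the follow-up interview learned."""
    LIVED_HERE = "lived at the sample address on Census Day"
    LIVED_ELSEWHERE = "did not live at the sample address on Census Day"
    GROUP_QUARTERS = "lived in group quarters on Census Day"
    OTHER_RESIDENCE = "lived here but should have been enumerated at another residence"


NONINTERVIEW = "No Census Day residents noninterview"
VACANT = "No Census Day residents vacant"


def revised_census_day_outcome(original: str, household: list[Finding]) -> str:
    """Apply the two whole-household conversions described in the text (sketch)."""
    if household and all(f is Finding.LIVED_ELSEWHERE for f in household):
        # The original interview listed people who did not live here, so it was incorrect.
        return NONINTERVIEW
    if household and all(f in (Finding.GROUP_QUARTERS, Finding.OTHER_RESIDENCE) for f in household):
        # Everyone belonged elsewhere, so the unit should have been classified as vacant.
        return VACANT
    # Anything else, including mixed households (not addressed in the text), keeps its code.
    return original


print(revised_census_day_outcome("Interview", [Finding.GROUP_QUARTERS, Finding.OTHER_RESIDENCE]))
```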

Table 4-17. Final Census Day Estimation Outcome Codes for A.C.E. Housing Units (Unweighted)

                                                           Total housing units  Occupied housing units
Census Day outcome code                                       Number   Percent        Number   Percent
Census Day interview                                         254,175      87.5       254,175      97.0
  Complete Census Day interview with a household member      233,327
  Complete Census Day interview with a proxy respondent       18,335
  Sufficient partial interview                                 2,513
Census Day noninterview                                        7,794       2.7         7,794       3.0
  No Census Day residents                                      2,709
  Field Census Day noninterview                                2,667
  All people have insufficient information for matching
    and follow-up                                              2,418
Total occupied Census Day housing units                                              261,969     100.0
Vacant                                                        28,472       9.8
  No Census Day residents                                      4,561
  Vacant on Census Day                                        23,911
Total occupied and vacant housing units on Census Day        290,441     100.0
Not a housing unit on Census Day                              10,472
Total housing units                                          300,913

The Census Day noninterview rates in Tables 4-18 and 4-19 are for occupied housing units. The percent noninterview was calculated as the unweighted number of Census Day noninterviews divided by the occupied Census Day housing units, which were the interviews plus the noninterviews on Census Day. The Census Day noninterview rates were recalculated to reflect coding changes made in after follow-up matching.

Table 4-18. P-sample Noninterview Rates for Census Day in Occupied Housing Units by Interview Mode

Interview mode    Percent noninterview
Telephone                          1.1
Personal                           3.7
Quality assurance                 37.4
Total                              3.0

Table 4-19. P-sample Noninterview Rates for Census Day in Occupied Housing Units by Type of Interview

Type of interview                    Percent noninterview
Interview with a household member                     1.8
Proxy interview                                      17.4
Total                                                 3.0

Comparison of Initial and Final P-Sample Estimation Outcome Codes for Census Day

Table 4-20 compares the preliminary and final Census Day interview outcome codes. The preliminary Census Day outcome codes were changed when the follow-up interviews for the P sample classified people as nonresidents because they did not live at the sample address at the time of the census, or because they were considered as living at the sample address but should have been counted at another residence, such as group quarters or another home. The housing unit could also be identified as not being a housing unit on Census Day.
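The noninterview rate defined for Tables 4-18 and 4-19 excludes vacant units and addresses that were not housing units from the denominator. A short check, using the counts published in Tables 4-17 and 4-21, reproduces the 3.0 percent Census Day rate and the 1.1 percent Interview Day rate; the function name is illustrative.

```python
def noninterview_rate(interviews: int, noninterviews: int) -> float:
    """Percent noninterview among occupied units: noninterviews / (interviews + noninterviews).

    Vacant units and addresses that were not housing units are not in the denominator.
    """
    return 100 * noninterviews / (interviews + noninterviews)


census_day = noninterview_rate(254_175, 7_794)     # counts from Table 4-17
interview_day = noninterview_rate(264_103, 3_052)  # counts from Table 4-21
print(f"Census Day: {census_day:.1f}%  Interview Day: {interview_day:.1f}%")
# Census Day: 3.0%  Interview Day: 1.1%
```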

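Table 4-20, which follows, is a cross-tabulation whose column totals should reproduce the final counts in Table 4-17. This sketch, using only the published cell counts, checks that reconciliation and counts the housing units whose Census Day outcome code changed between the preliminary and final coding. The data-structure and function names are hypothetical.

```python
# Final Census Day outcome codes, in the column order (1)-(9) of Table 4-20.
FINAL_CODES = [
    "Interview with a household member", "Interview with proxy", "Partial interview",
    "No Census Day residents-noninterview", "Field noninterview",
    "Whole household insufficient information", "No Census Day residents-vacant",
    "Vacant", "Not a housing unit",
]
# Rows: preliminary Census Day outcome codes; cells: unweighted housing units.
TABLE_4_20 = {
    "Interview with a household member": [233_327, 0, 0, 2_033, 0, 0, 125, 0, 147],
    "Interview with proxy": [0, 18_335, 0, 676, 0, 0, 252, 0, 117],
    "Partial interview": [0, 0, 2_513, 0, 0, 97, 0, 0, 2],
    "Field noninterview": [0, 0, 0, 0, 2_667, 0, 0, 0, 0],
    "Whole household insufficient information": [0, 0, 0, 0, 0, 2_321, 0, 0, 0],
    "No Census Day residents-vacant": [0, 0, 0, 0, 0, 0, 4_184, 0, 0],
    "Vacant": [0, 0, 0, 0, 0, 0, 0, 23_911, 0],
    "Not a housing unit": [0, 0, 0, 0, 0, 0, 0, 0, 10_206],
}
# Final counts for the same nine categories, as published in Table 4-17.
TABLE_4_17_FINAL = [233_327, 18_335, 2_513, 2_709, 2_667, 2_418, 4_561, 23_911, 10_472]


def reconcile(table: dict[str, list[int]]) -> tuple[list[int], int, int]:
    """Return (final column totals, grand total, housing units whose code changed)."""
    column_totals = [sum(column) for column in zip(*table.values())]
    grand_total = sum(column_totals)
    # A unit kept its code when the preliminary row and final column name the same category.
    unchanged = sum(row[FINAL_CODES.index(name)] for name, row in table.items())
    return column_totals, grand_total, grand_total - unchanged


totals, grand_total, changed = reconcile(TABLE_4_20)
print(totals == TABLE_4_17_FINAL, grand_total, changed)  # True 300913 3449
```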
Table 4-20. Comparison of the Preliminary and Final Census Day Outcome Codes

Final Census Day outcome codes (columns):
(1) Interview with a household member
(2) Interview with proxy
(3) Partial interview
(4) No Census Day residents-noninterview
(5) Field noninterview
(6) Whole household insufficient information
(7) No Census Day residents-vacant
(8) Vacant
(9) Not a housing unit

Preliminary Census Day outcome code             (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)
Interview with a household member           233,327        0        0    2,033        0        0      125        0      147
Interview with proxy                              0   18,335        0      676        0        0      252        0      117
Partial interview                                 0        0    2,513        0        0       97        0        0        2
Field noninterview                                0        0        0        0    2,667        0        0        0        0
Whole household insufficient information          0        0        0        0        0    2,321        0        0        0
No Census Day residents-vacant                    0        0        0        0        0        0    4,184        0        0
Vacant                                            0        0        0        0        0        0        0   23,911        0
Not a housing unit                                0        0        0        0        0        0        0        0   10,206

Table 4-21. Final Interview Day Estimation Outcome Codes for A.C.E. Housing Units (Unweighted)

                                                           Total housing units  Occupied housing units
Interview Day outcome code                                    Number   Percent        Number   Percent
Interview Day interview                                      264,103      89.0       264,103      98.9
  Complete interview on Interview Day with a household
    member                                                   249,854
  Complete interview on Interview Day with a proxy
    respondent                                                12,317
  Sufficient partial interview                                 1,932
Interview Day noninterview                                     3,052       1.0         3,052       1.1
  No Interview Day residents-household converted to
    noninterview                                                 483
  Field noninterview on Interview Day                            373
  All people have insufficient information for matching
    and follow-up                                              2,196
Total occupied housing units on Interview Day                                        267,155     100.0
Vacant on Interview Day                                       29,662      10.0
Total occupied and vacant housing units on Interview Day     296,817     100.0
Not a housing unit on Interview Day                            4,096
Total housing units                                          300,913

Final P-Sample Estimation Outcome Codes for Interview Day

The final Interview Day outcome codes are in Table 4-21. The interview outcome, as of Interview Day, was for cases originally classified as nonmovers and inmovers. Changes as a result of the follow-up interview were from whole households of nonmovers who said they:

  • Never lived at this residence
  • Lived in group quarters on Census Day
  • Lived at another residence on Census Day

The outcome codes for these cases were converted to noninterviews.

The Interview Day noninterview rates were recalculated to reflect coding changes made in after follow-up matching. The final noninterview rates for Interview Day by interview mode and type of interview are in Tables 4-22 and 4-23.

Table 4-22. P-sample Noninterview Rates for Interview Day in Occupied Housing Units by Interview Mode

Interview mode    Percent noninterview
Telephone                          0.7
Personal                           1.0
Quality assurance                 15.4
Total                              1.1

Table 4-23. P-sample Noninterview Rates for Interview Day in Occupied Housing Units by Type of Interview

Type of interview                    Percent noninterview
Interview with a household member                     0.5
Proxy interview                                       8.6
Total                                                 1.1

A.C.E. Mover and Residence Status Code
  • A.C.E. Mover Code
      1 = Nonmover
      2 = Inmover
      3 = Outmover
  • A.C.E. Born Since Census Day Code
      0 or blank = Default for inmovers
      1 = Born on or before Census Day
      2 = Born since Census Day
      D = Don't know
      R = Refused
  • A.C.E. Group Quarters Code
      0 or blank = Default for whole household inmovers⁷
      1 = In group quarters on Census Day
      2 = Not in group quarters on Census Day
      D = Don't know
      R = Refused
  • A.C.E. Other Residence Code
      0 or blank = Default for whole household inmovers
      1 = In other residence on Census Day
      2 = Not in other residence on Census Day
      D = Don't know
      R = Refused
  • A.C.E. Status
      N = Nonmover, resident on Census Day
      O = Outmover, resident on Census Day
      I = Inmover, nonresident on Census Day
      R = Removed, nonresident on Census Day
      U = Unresolved residence status
      B = Born since Census Day, nonresident on Census Day

⁷ Partial household inmovers were assigned the codes of 1, 2, D, or R during the edit for CAPI data review.
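The code lists above are small lookup tables. One design point is that the same character can mean different things in different fields: R is Refused in the question codes but Removed in the A.C.E. Status code, so decoding has to be field-specific. A minimal sketch, with hypothetical field keys:

```python
# Code lists from the A.C.E. mover and residence status codes; the field keys are hypothetical.
_QUESTION_ANSWERS = {"D": "Don't know", "R": "Refused"}

ACE_CODES: dict[str, dict[str, str]] = {
    "mover": {"1": "Nonmover", "2": "Inmover", "3": "Outmover"},
    "born_since_census_day": {
        "0": "Default for inmovers", "": "Default for inmovers",
        "1": "Born on or before Census Day", "2": "Born since Census Day",
        **_QUESTION_ANSWERS,
    },
    "group_quarters": {
        "0": "Default for whole household inmovers", "": "Default for whole household inmovers",
        "1": "In group quarters on Census Day", "2": "Not in group quarters on Census Day",
        **_QUESTION_ANSWERS,
    },
    "other_residence": {
        "0": "Default for whole household inmovers", "": "Default for whole household inmovers",
        "1": "In other residence on Census Day", "2": "Not in other residence on Census Day",
        **_QUESTION_ANSWERS,
    },
    "status": {
        "N": "Nonmover, resident on Census Day",
        "O": "Outmover, resident on Census Day",
        "I": "Inmover, nonresident on Census Day",
        "R": "Removed, nonresident on Census Day",
        "U": "Unresolved residence status",
        "B": "Born since Census Day, nonresident on Census Day",
    },
}


def decode(field: str, code: str) -> str:
    """Translate one code; a blank code decodes only in fields that define '0 or blank'."""
    try:
        return ACE_CODES[field][code.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown code {code!r} for field {field!r}") from None


print(decode("status", "R"), "|", decode("group_quarters", "R"))
# Removed, nonresident on Census Day | Refused
```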

The Treatment of Group Quarters in A.C.E.

The A.C.E. was designed to provide estimates of person coverage in housing units. There was no sample of, and no estimates for, persons in group quarters. The P-sample housing units were selected for the A.C.E., and the people in the P-sample housing units were matched to the people enumerated in census housing units.

Classifying a structure as group quarters was difficult at times. For example, homes for the elderly have made it more common for a single structure to contain apartments for retired people, assisted living, and full care. Another example was college dormitories. A dormitory was group quarters when it was occupied by unmarried students. The dormitory contained housing units if it was occupied by married students. If the dormitory was mixed with married, unmarried, faculty, and staff, it contained housing units. As a result, housing units or group quarters could be misclassified when they were not easily classified as housing units or group quarters. This misclassification could be found in both the A.C.E. and the census.

When the P-sample people in A.C.E. housing units did not match to people enumerated in housing units in the census, they were matched to people enumerated in the census in group quarters. That is, group quarters were searched for P-sample nonmatches. If the P-sample people were found in the group quarters enumerations, they were treated as matched. However, no attempt was made to discover whether the misclassification was in the A.C.E. or the census.

Likewise, if a census person in the E sample was enumerated in a housing unit, but the housing unit was misclassified and should have been group quarters, the follow-up of the census nonmatch obtained information about the residence of the person. If it found the person should have been counted in this block in group quarters or a housing unit, the person was coded as correctly enumerated in A.C.E. processing. The ideal was not to classify someone as erroneous when they really should have been counted in this cluster, but the type of residence was misclassified.

If a structure contained both housing units and group quarters, the people who were enumerated in the census in a housing unit were eligible to be in the E sample. The follow-up interview identified such E-sample people who were not matched as living in the cluster and having no other residence. They were coded as correctly enumerated. There was no duplicate search between people enumerated in group quarters and housing units.

In summary, then, the approach was balanced:

  • Look for P-sample people in group quarters when they were not found in census housing units.
  • Follow up E-sample people in both housing units and group quarters in the cluster.

The population in housing units was covered, but there was no estimate of coverage in group quarters. If the housing unit was duplicated in the group quarters, the group quarters people were not counted as duplicates. Likewise, if a group quarter was missed, there was no determination of undercounted inhabitants.

4-22 Section I, Chapter 4 A.C.E. Field and Processing Activities U.S. Census Bureau, Census 2000

The A.C.E. Person Interview⁸

Household Roster

If the person lived at the sample address, the interviewer began the interview with a series of questions to obtain the names of everyone currently living at the sample housing unit. The first question was:

I need to get a list of everyone living here permanently or staying temporarily at this address. What is your name?

After obtaining the name of the person with whom the interviewer was speaking, the interviewer asked, "Anyone else?" If there was a yes response, the interviewer asked, "What is his or her name?" and followed that with "Anyone else?" until the interviewer received a no response.

As a check for types of people who were frequently left off listings of household members, there were two additional questions. The first question asked about people who may have lived at the household sometimes, but not all the time, such as children in joint custody or people who traveled a great deal of the time. The question was:

Are there any additional people who currently live or stay here, like someone who's temporarily away or someone who stays here off and on?

If the response was yes, the interviewer asked, "What was his or her name?" and followed with "Anyone else?" until the interviewer received a no response.

Other persons who were frequently omitted from household listings were roommates or live-in employees. The interviewer asked, "Is there anyone else like a roommate or a live-in employee who lives here?" If the response was yes, the interviewer asked, "What is his or her name?" and followed with "Anyone else?" until the interviewer received a no response.

At this point in the interview, the interviewer had collected a list of household members that the respondent had voluntarily mentioned, and the interviewer had also checked for two types of persons that research had shown were frequently left off household listings.

The interviewer then reviewed a screen that contained a list of the household members the respondent reported. The interviewer read the list of names and asked if the list was correct. The interviewer said, "I have listed [READS NAMES ON SCREEN]. Is that correct?" After the respondent had reviewed the names, the interviewer could change the spelling, or add or delete a name.

Movers

When the respondent agreed that the list was correct, the interviewer handed the respondent a calendar containing the months of March, April, and May of 2000 that had Census Day clearly marked. At this point in the interview, the goal was to begin determining whether the people listed as current residents were also residents of the sample housing unit on Census Day and whether anyone else should have been included as a Census Day resident. The interviewer asked if any of the listed persons (current residents) had moved into the sample housing unit after Census Day. The interviewer said to the respondent:

Please look at this calendar. Did any of the people I just listed move into <sample address> after Census Day, April 1, 2000?

If the answer was yes, the interviewer asked, "Who moved in after April 1?" Any person mentioned was considered a nonresident of the sample housing unit on Census Day. If everyone in the household was mentioned, then the whole household was considered nonresidents on Census Day.

The interviewer now had a list of current residents who also lived at the sample housing unit on Census Day. It was necessary to determine if there was anyone living at the sample housing unit on Census Day who did not live there currently. The interviewer asked, "Was there anyone else living or staying here on April 1, 2000 who has moved out?" If the response was yes, the interviewer asked, "What is his or her name?" and "Anyone else?" until a no response was received.

The interviewer now had a list of the names of everyone the respondent had reported living at the sample housing unit currently and on Census Day. The interviewer then established a reference person (relationships will be relative to this person) by asking who owns or rents the house or apartment. The interviewer asked, "In whose name is this (house/apartment) owned or rented?" The interviewer also asked whether the housing unit was owned or rented by saying, "Do you own this (house/apartment), rent it, or live here without payment of rent?"

8 See Keeley (2000) for details.
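The mover questions above split the household roster into people who lived at the sample address both now and on Census Day, people who moved in afterward, and people who have since moved out. The following is a minimal Python sketch of that bookkeeping. It assumes each person is identified by a name string. The function and class names (`classify_movers`, `MoverStatus`) are hypothetical and are not part of the A.C.E. instrument.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MoverStatus:
    nonmovers: List[str] = field(default_factory=list)  # current residents who also lived here April 1
    inmovers: List[str] = field(default_factory=list)   # moved in after Census Day: Census Day nonresidents
    outmovers: List[str] = field(default_factory=list)  # lived here April 1, 2000, but have moved out
    whole_household_moved_in: bool = False

    @property
    def census_day_residents(self) -> List[str]:
        return self.nonmovers + self.outmovers


def classify_movers(current_residents, moved_in_after_census_day, moved_out):
    """Sketch of the roster bookkeeping implied by the mover questions.

    All arguments are lists of names (a hypothetical simplification; the
    instrument worked with full person records).
    """
    moved_in = set(moved_in_after_census_day)
    unknown = moved_in - set(current_residents)
    if unknown:
        raise ValueError(f"movers not on the current roster: {sorted(unknown)}")

    status = MoverStatus()
    for name in current_residents:
        (status.inmovers if name in moved_in else status.nonmovers).append(name)
    # Defensive filter: someone who moved out should not also be a current resident.
    status.outmovers = [n for n in moved_out if n not in current_residents]
    # Everyone currently listed moved in after Census Day.
    status.whole_household_moved_in = bool(current_residents) and not status.nonmovers
    return status
```

For example, `classify_movers(["Ann", "Bob"], ["Bob"], ["Cy"])` gives Census Day residents `["Ann", "Cy"]` and leaves `whole_household_moved_in` false. Calling it with a household whose only member moved in sets the flag. That flagged case corresponds to the verification question asked later in the interview.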


Demographics

At this point in the interview, the interviewer began to collect demographic characteristics about all listed persons to facilitate matching the persons collected in this interview to persons listed on the census questionnaire for the sample housing unit. Demographic characteristics are also used to create post-strata in dual system estimation. See Chapter 7 for more details.

The demographic characteristics collected in the interview were:

1. Sex. The interviewer may have entered the sex of the person or asked the question when in doubt. The question was, "Is [NAME]⁹ male or female?"

2. Age. Age was collected in a series of questions. The interviewer asked for date of birth ("What is [NAME]'s date of birth?"). When the date of birth was entered in the instrument, the age of the person was calculated and the interviewer verified the age by saying, "So [NAME] was about [AGE] on April 1?" If the age was not correct, the interviewer changed the date of birth in the previous question and the age was then recalculated.

If the respondent did not know the date of birth, then the interviewer asked the person's age. The interviewer asked, "What was [NAME]'s age on April 1, 2000?"

3. Relationship. Relationship was to the person in whose name the house or apartment was owned or rented (called the Reference Person). The interviewer handed the respondent a card containing relationship categories and asked, "How is [NAME] related to [THE REFERENCE PERSON]?" for each person.

4. Hispanic Origin. Hispanic origin was collected in a series of questions. The first question was, "Is anyone of Spanish, Hispanic, or Latino origin?" If the response was yes, the interviewer asked, "Who is?" followed by "Is there anyone else of Spanish, Hispanic, or Latino origin?" until the response was no.

If anyone was mentioned as being of Hispanic origin, the interviewer asked, "Is [NAME] of Mexican, Puerto Rican, Cuban, or some other Spanish origin?" for each person mentioned.

5. Race. Race was also collected in a series of questions. The interviewer referred the respondent to the part of the card containing racial categories and said, "I'm going to read a list of race categories. Please choose one or more categories that best describe [NAME]'s race."

If the respondent said "American Indian or Alaska Native," the interviewer asked, "What is [NAME]'s enrolled or principal tribe(s)?" The interviewer recorded as many responses as given.

If the respondent said "Asian," the interviewer asked, "To what Asian group did [NAME] belong? Is [NAME] Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, or some other Asian group?" The interviewer recorded as many responses as given.

If the respondent said "Pacific Islander," the interviewer said, "To what Pacific Islander group did [NAME] belong? Is [NAME] Guamanian or Chamorro, Samoan, or some other Pacific Islander group?" The interviewer recorded as many responses as given.

At this point, the interviewer had a list of all reported current and Census Day residents and their demographic characteristics for use in matching these residents to residents reported on the census questionnaire for this housing unit.

For households that reported moving into the sample housing unit after Census Day, this information was verified. The interviewer said to the respondent:

So, everyone you mentioned today moved into <sample address> after April 1, 2000. Is that correct?

If the information was correct, the interview was continued by asking the respondent if he or she knew and had information about the residents of the sample housing unit who lived there on Census Day. (This part of the interview was discussed in the section on movers.)

Residence Section

For all households in which at least one member lived at the sample housing unit on Census Day, the interviewer continued with a few questions that checked for two types of special living situations that were potential sources of duplicate enumerations. Respondents tended to forget that household members may have been living or staying at a place away from the sample housing unit. This may have caused some persons to be reported more than once, at the sample housing unit and again at other places where they may have lived or stayed.
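The age item derived age from the date of birth and asked the interviewer to confirm it as of April 1. The following is a minimal Python sketch of that arithmetic. It assumes that completed years are counted and that a person whose birthday falls on April 1 has already reached the new age. This chapter does not document the instrument's exact rule, and the function names here are hypothetical.

```python
from datetime import date

CENSUS_DAY = date(2000, 4, 1)


def age_on_census_day(date_of_birth: date, census_day: date = CENSUS_DAY) -> int:
    """Completed years of age on Census Day (assumed rule)."""
    if date_of_birth > census_day:
        raise ValueError("date of birth is after Census Day")
    had_birthday = (date_of_birth.month, date_of_birth.day) <= (census_day.month, census_day.day)
    return census_day.year - date_of_birth.year - (0 if had_birthday else 1)


def age_verification_prompt(name: str, date_of_birth: date) -> str:
    """Fill the verification script 'So [NAME] was about [AGE] on April 1?'."""
    return f"So {name} was about {age_on_census_day(date_of_birth)} on April 1?"
```

If the respondent rejects the age, the sketch is simply rerun on the corrected date of birth. This mirrors the recalculation step described above.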

The first situation that had the potential to cause duplicate enumerations was when a person may have lived at a place that was not a private household on Census Day. Since the Census Bureau did special enumerations at places such as college dorms, nursing homes, prisons, and emergency shelters, the interviewer inquired if anyone was staying at any of these types of places by saying:

Your answers to the next few questions help us count everyone at the right place. The Census Bureau did a special count at all places where groups of people stay. Examples include college dorms, nursing homes, prisons, and emergency shelters. On April 1, 2000, were any of the people you mentioned today staying elsewhere at any of these types of places?¹⁰

If the response was yes, the interviewer asked, "Who stayed at one of these types of places?"

The next situation that could result in a duplicate enumeration was when a person might have had another residence. The interviewer said:

Some people have more than one place to live. Examples include a second residence for work, a friend's or relative's home, or a vacation home. On April 1, 2000, did any of the people you mentioned today have a residence other than <sample address>?

If the response was yes, the interviewer asked, "Who had another residence?"

For each person mentioned as having another residence, the interviewer asked, "As of April 1, did [NAME] spend most of the time at <sample address> or at the other residence?" If the response was "I don't know," the interviewer asked:

Which of the following categories most accurately describes the amount of time [NAME] stays at the other residence? A few days of each week; entire weeks of each month; months at a time; or some other period of time.

If the respondent still was not sure where the person spent most of the time, there was a series of questions designed to assign an amount of time spent at some other residence, such as "During a typical week, did [NAME] spend more days at <sample address> or at the other residence?" or "During a typical month, did [NAME] spend more weeks at <sample address> or at the other residence?"

If these questions did not help the respondent decide where the person spent most of the time, the person's residence was determined by asking:

Was [NAME] staying at <sample address> or the other residence on April 1, 2000?

At this point, the interviewer had a reported list of current and Census Day residents of the sample housing unit developed through an extensive household listing procedure. The interviewer had obtained the demographic characteristics of the listed persons. Through questions on mobility and other possible residences, it had been determined:

  • whether everyone listed in the household currently should be considered a Census Day resident of the sample housing unit
  • whether anyone currently absent from the household should be considered a Census Day resident.

Conclusion of Interview

The interviewer now was ready to conclude the interview. Before concluding, there was one last check of the household listing. The first name, middle initial, last name, sex, and age of each person listed as a current and Census Day resident was shown on the screen. The interviewer, again, showed the respondent the computer screen and asked, "Do I have the spelling, sex, and age correct for everyone?" If not, corrections could be made at this screen and the respondent was asked to verify and/or change the information until the respondent said that everything was correct.

The interviewer asked the respondent for his/her telephone number by saying, "In case we need to contact you again, may I please have your telephone number?" then thanked the respondent and concluded the interview by saying, "This concludes our interview. The Census Bureau thanks you for your participation."

9 The brackets containing name, age, and the Reference Person's name were filled by the instrument. When speaking to the respondent, "Are you" or other appropriate fillers replaced "Is [NAME]."

10 An interviewer help screen was available with a complete list of special enumeration places.
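The other-residence questions form a fallback cascade: each later question is asked only when the earlier answer is "I don't know." The following is a minimal Python sketch of that control flow. The text says only that typical-week or typical-month questions were asked. Pairing them with the "a few days of each week" and "entire weeks of each month" answers is an assumption made here. The names `census_day_residence`, `follow_up_question`, and the `"unresolved"` result are also hypothetical.

```python
from typing import Optional

SAMPLE = "sample address"
OTHER = "other residence"

# Assumed pairing of the time-category answer with a typical-period question.
FOLLOW_UP = {
    "a few days of each week": (
        "During a typical week, did {name} spend more days at "
        "<sample address> or at the other residence?"
    ),
    "entire weeks of each month": (
        "During a typical month, did {name} spend more weeks at "
        "<sample address> or at the other residence?"
    ),
}


def follow_up_question(time_category: str, name: str) -> Optional[str]:
    """Return the typical-period question, or None to go straight to April 1."""
    template = FOLLOW_UP.get(time_category.lower())
    return template.format(name=name) if template else None


def census_day_residence(most_of_time: Optional[str],
                         typical_period: Optional[str] = None,
                         on_april_1: Optional[str] = None) -> str:
    """Each answer is SAMPLE, OTHER, or None ("I don't know").

    The first definite answer, taken in question order, decides the residence.
    """
    for answer in (most_of_time, typical_period, on_april_1):
        if answer in (SAMPLE, OTHER):
            return answer
    return "unresolved"
```

Modeling "I don't know" as `None` keeps the cascade a single loop. Adding another fallback question would only mean adding another argument to the tuple.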

Insufficient Information for Matching and Follow-Up

The census person records were reviewed both by computer and clerically to identify people with insufficient information for matching and follow-up. Only people with sufficient information for matching and follow-up were allowed to be processed in the matching and follow-up interviewing phases of the person matching. The three types of insufficient information were:

  • The census people were not data-defined.
  • The census people were data-defined, but computer coded as insufficient information for matching and follow-up.
  • The census people were computer coded as sufficient information, but converted clerically to insufficient information for matching and follow-up.

The first type, census people who were not data-defined, were not included in the E sample. Only data-defined people were included in the E sample. These data-defined people create person records in the census.

Census Data-Defined

The term "data-defined" was a term that has been used in the past at the Census Bureau to mean that a census person record has been created. The term "Total Persons" was the total number of people counted in the census at a census housing unit. The term "Selected Persons" referred to data-defined census people in a census household. The difference was people who were not data-defined. These people had no census person record. A whole person imputation procedure was employed to create characteristic data in the census for these people.

Two characteristics were required to be data-defined, where name counts as a characteristic. Name must have had at least three characters in the first and last name together. Other characteristics that could be used in counting were relationship, sex, race, Hispanic origin, and either age or year of birth¹¹. Census records were created on the HCUF for all data-defined people. Anyone who was not data-defined was a whole person imputation.

Census people who were whole person imputations were counted separately from the other census people with insufficient information for matching, because they were treated differently in the Dual System Estimator. The number of whole person imputations was subtracted from the census count within post-strata. The E-sample people who were data-defined but had insufficient information for matching were included in the count of erroneous enumerations and were, thus, excluded from the count of whole person imputations in the Dual System Estimator.

The mail return census forms were designed to collect characteristics for six people. However, space was provided for the names of the additional residents in households with seven to twelve people. The large household follow-up operation attempted to obtain characteristics for these people by telephone.

The exception was the enumerator questionnaire used in nonresponse follow-up. There was space for five people, but a continuation form was used to record data for persons six and above in large households.

There was some consideration given to using the names in the long form roster for persons seven through twelve to create person records and having them data-defined. However, it seemed preferable not to do this, and the A.C.E. did not attempt to create additional data-defined census records for these people with only names in large households. These people were whole person imputations. The number of whole person imputations used in the Dual System Estimator will correspond to the counts used in the census.

Computer Coding of Insufficient Information for Matching and Follow-Up for the E Sample

The A.C.E. requires a minimum amount of information for matching and follow-up. The data-defined census people were reviewed to identify the ones with sufficient information for matching and follow-up for A.C.E. The minimum amount of data required for data-defined census people to have sufficient information for matching and follow-up was complete name and two characteristics.

Complete name was defined as:

  • First name¹², middle initial, and last name
  • First name and last name
  • First initial, middle initial, and last name

The A.C.E. used the same criteria for classifying age as the census used in determining whether a person was data-defined: only age and year of birth were used to determine whether age was present when counting characteristics. In other words, when the age and year of birth were both blank, month or day of birth were not considered.

Clerical Coding of Insufficient Information for Matching and Follow-Up for the E Sample

There were cases where the name was not blank, but was too incomplete or unlikely to be real to permit matching and follow-up. Census names like "Mr. Doe," "Donald Duck," and "White Female" were coded insufficient information by the clerical matchers. The computer could not recognize names that were not real or were really incomplete names.

The retrieval system contained an image of the census questionnaire. The image of the census questionnaire was reviewed for census people coded as insufficient information for matching and follow-up to see if there was additional data that could be used to convert them to sufficient information for matching and follow-up. The data capture system may have had problems reading the handwritten entries, or there may have been information outside the boxes on the census questionnaire. Names were obtained from the roster on the image of the questionnaire for the long forms. Children with first names and no last names were converted to sufficient information for matching and follow-up when the last name could be assumed from an adult with first and last name in the household. These updates to the names were captured into the matching software, which was programmed to decide if the person had sufficient information for matching and follow-up.

P-Sample Insufficient Information for Matching and Follow-Up

The P-sample people were reviewed by computer to identify people with insufficient information for matching and follow-up. The P-sample rules for sufficient information for matching and follow-up were the same as the E-sample rules: complete name and two characteristics. Cases identified by the computer as missing sufficient information were suppressed from viewing by the clerical matchers to prevent errors in matching people with insufficient information for matching. There were fewer than 4,000 P-sample people computer coded with insufficient information for matching and follow-up.

This computer review was established to avoid certain types of clerical errors in matching. For example, names like "D K" or "Don't Know" ("D" or "Don't" is the first name and "K" or "Know" is the last name), "R R" (refused for the first name and refused for the last name), or "M Smith" could neither be matched with certainty nor, if treated as a nonmatch, followed up with a high rate of success. The census might have recorded a person with a complete name, which might be matched by a clerk. If matching were allowed, it would have been biased by what was enumerated in the census. A match would have resulted if the names were present at the address, and a nonmatch if the names were not in the census. Since names like "DK" could not be followed up, they would have been coded as insufficient information for matching and follow-up. Therefore, a match would have been assigned when the census obtained complete names, and unresolved when no match was found. The best way to avoid a bias was to suppress the P-sample cases computer coded as insufficient information for matching and treat them as unresolved.

The probability of a match was imputed for the P-sample people coded as insufficient information for matching and follow-up. They were treated in the same way as other P-sample people with unresolved match status. If the whole household had insufficient information for matching and follow-up, the people were removed and converted to noninterview status.

11 Person one did not automatically have a relationship of head of household like it did in 1990, and the telephone number in item 2 on the mail return questionnaire did not count as a characteristic. The age and date of birth were examined together. If age was present, age/year of birth counted as a characteristic. If age was blank, but year of birth was present, then the age/year of birth counted as a characteristic. If age and year of birth were both blank, the age/date of birth did not count as a characteristic. The month and day of birth were used in Dress Rehearsal in the determination of counting the age/date of birth as a characteristic, but not in Census 2000.

12 The minimum number of characters to be a name was two. Two characters were required in the first name and two characters in the last name.
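The two computer screening rules, data-defined status and sufficient information for matching, can be stated compactly. The following is a minimal Python sketch under these assumptions:

  • characters are counted as letters only;
  • "two characteristics" for sufficiency means two characteristics besides the complete name, which is how we read the KI and KE code descriptions;
  • the record layout and function names are hypothetical.

The clerical review (names like "Mr. Doe") and the questionnaire-image review are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CensusPerson:
    """Hypothetical record layout. Month and day of birth are omitted because
    they were not used in counting characteristics in Census 2000."""
    first: str = ""
    middle_initial: str = ""
    last: str = ""
    relationship: Optional[str] = None
    sex: Optional[str] = None
    race: Optional[str] = None
    hispanic_origin: Optional[str] = None
    age: Optional[int] = None
    birth_year: Optional[int] = None


def _letters(text: str) -> int:
    return sum(ch.isalpha() for ch in text)


def other_characteristics(p: CensusPerson) -> int:
    """Characteristics other than name; age and year of birth count once."""
    count = sum(v is not None for v in (p.relationship, p.sex, p.race, p.hispanic_origin))
    if p.age is not None or p.birth_year is not None:
        count += 1
    return count


def is_data_defined(p: CensusPerson) -> bool:
    """Two characteristics, where a name of 3+ letters (first + last) counts as one."""
    name_counts = _letters(p.first) + _letters(p.last) >= 3
    return int(name_counts) + other_characteristics(p) >= 2


def has_complete_name(p: CensusPerson) -> bool:
    """First name and last name (middle initial optional), or first initial,
    middle initial, and last name; a name needs at least two letters."""
    if _letters(p.last) < 2:
        return False
    if _letters(p.first) >= 2:
        return True
    return _letters(p.first) == 1 and _letters(p.middle_initial) == 1


def sufficient_for_matching(p: CensusPerson) -> bool:
    """Complete name plus two characteristics besides the name (our reading)."""
    return has_complete_name(p) and other_characteristics(p) >= 2
```

Under these rules, "M Smith" with sex, race, and age is data-defined but not sufficient for matching, because a lone first initial needs a middle initial to form a complete name. "D K" with only sex is not data-defined at all.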

Final P-Sample Person Match Codes

Matched

M = The P-sample and census people were matched. The P-sample person was a resident of the housing unit on Census Day.

MR = The follow-up interview determined that the matched person with unresolved resi-dence status was a resident.

MU = The A.C.E. person was matched, but the follow-up interview obtained no useful information to resolve the residence status for the matched person who had a residence status of unresolved before follow-up. The P-sample person's residence status was unresolved.

Not Matched

NP = The P-sample person was not matched to a census person. There was no follow-up for the whole household nonmatches from person interviews with household members and the whole household nonmatches were not conflicting household nonmatches.

NC = The P-sample nonmatch was found on the census roster. This person in a partial nonmatch household was not matched to the census because only name was collected in the census for this person in a large household and the census person was not data-defined. No follow-up interview was necessary.

NR = The P-sample person was not matched and was identified as a resident in the block cluster on Census Day during the A.C.E. person follow-up interview.

NU = The P-sample person was not matched. Not enough information was collected during the A.C.E. person follow-up interview to identify the P-sample person as a resident or nonresident in the block cluster. The residence status for the P-sample person was unresolved. This code was also used when the P-sample person was followed up to collect geographic information and that information was not collected. The NU code was also used when the person did not live at the sample address on Census Day and the Census Day address was not complete enough to determine if the Census Day address was in the sample cluster.

Unresolved

P = There was not enough information collected during the follow-up interview to determine if the possible match was a match or not. The match status of the P-sample person was unresolved.

KI = Match not attempted for the P-sample person, because the person had insufficient information for matching and follow-up. The name was blank or incomplete or the name was complete, but the person had only one characteristic. This was a computer assigned code and these people were suppressed from view by the matchers.

KP = Match not attempted for the P-sample person, because (1) the name was incomplete, such as "Mr. Jones," or (2) the name was not a valid name, such as "White Female" or "Donald Duck." This was a clerically assigned code.


Removed from the P Sample

FP = The P-sample person was fictitious in this block cluster. The person was interviewed in error during the person interview. This person was not included in the final P sample.

NL = The P-sample person did not live at the sample address or in the block cluster on Census Day and was listed as a nonmover or outmover in error. This person was removed from the list of P-sample people, since he or she was collected during the person interview in error.

NN = The P-sample person was identified as a nonresident in the block cluster on Census Day during the A.C.E. person follow-up interview, because the person lived in group quarters on Census Day, or had another residence where the person should have been counted on Census Day according to census residence rules. This person was removed from the list of P-sample people, since he or she was collected during the person interview in error.

DP = The P-sample person was a duplicate of another P-sample person.

MN = The A.C.E. person follow-up interview determined that the matched person with unresolved residence status was not a resident in this housing unit or in this block cluster. The person was no longer in the list of P-sample people.

GP = The P-sample person was removed, because the person interview was conducted at a housing unit that exists outside the sample cluster. The person follow-up identified this housing unit as a P-sample geocoding error.

E-Sample Person Enumeration Codes

Correctly Enumerated

M = The P-sample and E-sample people were matched. The E-sample person was correctly enumerated.

CE = The E-sample nonmatch was identified as correctly enumerated during the A.C.E. person follow-up interview.

MR = The A.C.E. person follow-up interview determined that the matched person with unresolved residence status was a resident.

Erroneously Enumerated¹³

GE = The E-sample person was erroneously enumerated in this block cluster, because the census housing unit was a geocoding error (i.e., counted in the wrong block cluster). The E-sample person should have been enumerated elsewhere in the census.

EE = The E-sample nonmatch was identified during the person follow-up interview as erroneously enumerated.

FE = The E-sample nonmatch was determined to be fictitious in this block cluster during the follow-up interview. The person may have existed, but should not have been enumerated in the census within this block cluster. The E-sample person was erroneously enumerated in the census in this block cluster.

DE = The E-sample person was a duplicate of another E-sample person. The code was also used when the E-sample person was a duplicate of a census person in a surrounding block. The people in the E-sample housing unit were erroneously enumerated, because they were counted accurately in the surrounding block and duplicated in the sample cluster.

MN = The A.C.E. person follow-up interview determined that the matched person with unresolved residence status was not a resident in this housing unit or in this block cluster. The E-sample person was an erroneous enumeration.

KE = Match not attempted for the E-sample person. The name was blank or incomplete or the name was complete, but the person had only one characteristic. The name was incomplete or not a valid name, such as "Child Jones" or "Mickey Mouse."

13 The E-sample people who were duplicated with non-E-sample people were not full erroneous enumerations. See the section on Duplicate Search Within Cluster in this chapter for a discussion of the probability of erroneous enumeration when there was duplication between a census person in the E sample and a non-E-sample person.


Unresolved

UE = Not enough information was collected during the A.C.E. person follow-up interview to identify the E-sample person as correctly or erroneously enumerated in the block cluster. The enumeration status for the E-sample person was unresolved. The UE code was also used when the person did not live at the sample address on Census Day and the Census Day address was not complete enough to determine if the Census Day address was in the sample cluster. This code was also used when the E-sample person was followed up to collect geographic information and that information was not collected.

MU = The E-sample person was matched, but the follow-up interview obtained no useful information to resolve the residence status for the matched person who had a residence status of unresolved before follow-up. The E-sample person's enumeration status was unresolved.

P = There was not enough information collected during the follow-up interview to determine if the possible match was a match or not. The match status of the P-sample person was unresolved.

GU = The geographic work for the targeted extended search was unresolved. The code had the same definition in both the before and after follow-up matching. The difference was that, after follow-up, the code was used only in the list/enumerate clusters. The field work for the targeted extended search was not done, or the block number on the form was not in the surrounding blocks, in the block cluster, or on the map. It was not clear where the housing unit was located.
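Each final E-sample code falls into exactly one of the three enumeration-status groups listed above. The following is a minimal Python sketch of that lookup and a tally by group; the dictionary and function names are hypothetical. As a simplification, it counts every DE case as one full erroneous enumeration. Footnote 13 notes that duplication with a non-E-sample person was not a full erroneous enumeration.

```python
from collections import Counter

# Final E-sample person enumeration codes, grouped as in the lists above.
E_STATUS = {
    **dict.fromkeys(("M", "CE", "MR"), "correct"),
    **dict.fromkeys(("GE", "EE", "FE", "DE", "MN", "KE"), "erroneous"),
    **dict.fromkeys(("UE", "MU", "P", "GU"), "unresolved"),
}


def tally_e_sample(codes):
    """Count E-sample people by enumeration status; reject unknown codes."""
    tally = Counter()
    for code in codes:
        if code not in E_STATUS:
            raise ValueError(f"unknown E-sample code: {code!r}")
        tally[E_STATUS[code]] += 1
    return tally
```

A per-sample table is needed because the same code can mean different things in the two samples. In the P sample, MN removes the person from the list of P-sample people. In the E sample, MN makes the person an erroneous enumeration.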

Final P-Sample Person Residence Status Codes

Resident

M = The P-sample and census people were matched.

MR = The follow-up interview determined that the matched person with unresolved residence status was a resident.

NR = The P-sample person was not matched and was identified as a resident in the block cluster on Census Day during the A.C.E. person follow-up interview. The P-sample person was missed in the census.

NP = The P-sample person was not matched to a census person. There was no follow-up for the whole household nonmatches from person interviews with household members and the whole household nonmatches were not conflicting household nonmatches. These people were considered residents of the housing unit on Census Day.

NC = The P-sample nonmatch was found on the census roster. This person in a partial nonmatch household was not matched to the census because only name was collected in the census for this person in a large household and the census person was not data-defined. No follow-up interview was necessary.

Nonresident

FP = The P-sample person was fictitious in this block cluster. The person was interviewed in error during the person interview. This person was not included in the final P sample.

NL = The P-sample person did not live at the sample address or in the block cluster on Census Day and was listed as a nonmover or outmover in error. This person was removed from the list of P-sample people, since he or she was collected during the person interview in error.

NN = The P-sample person was identified as a nonresident in the block cluster on Census Day during the A.C.E. person follow-up interview, because the person lived in group quarters on Census Day or had another residence where the person should have been counted on Census Day according to census residence rules.

This person was removed from the list of P-sample people, since he or she was collected during the person interview in error.

DP = The P-sample person was a duplicate of another P-sample person.

MN = The A.C.E. person follow-up interview determined that the matched person with unresolved residence status was not a resident in this housing unit or in this block cluster. The person was no longer in the list of P-sample people.

GP = The P-sample person was removed because the person interview was conducted at a housing unit that exists outside the sample cluster. The person follow-up identified this housing unit as a P-sample geocoding error.


Unresolved

MU = The A.C.E. person was matched, but the follow-up interview obtained no useful information to resolve the residence status for the matched person who had a residence status of unresolved before follow-up. The P-sample person's residence status was unresolved.

NU = The P-sample person was not matched. Not enough information was collected during the A.C.E. person follow-up interview to identify the P-sample person as a resident or nonresident in the block cluster. The residence status for the P-sample person was unresolved. This code was also used when the P-sample person was followed up to collect geographic information and that information was not collected. The NU code was also used when the person did not live at the sample address on Census Day and the Census Day address was not complete enough to determine if the Census Day address was in the sample cluster.

P = There was not enough information collected during the follow-up interview to determine if the possible match was a match or not. The residence status of the P-sample person was unresolved.

KI = Match not attempted for the P-sample person, because the person had insufficient information for matching and follow-up. The name was blank or incomplete, or the name was complete but the person had only one characteristic. This was a computer-assigned code, and these people were suppressed from the matchers' view.

KP = Match not attempted for the P-sample person, because (1) the name was incomplete, such as Mr. Jones, or (2) the name was not a valid name, such as White Female or Donald Duck. This was a clerically assigned code.
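The final P-sample codes fall under three headings: Resident, Nonresident, and Unresolved. As a minimal illustrative sketch, the grouping can be held in a lookup table. The `ResidenceStatus` enum, the `FINAL_P_SAMPLE_STATUS` dictionary, and the `residence_status` helper are hypothetical names, not part of any A.C.E. processing system; KI and KP are placed under Unresolved only because that is where they appear in the list above.

```python
from enum import Enum


class ResidenceStatus(Enum):
    RESIDENT = "Resident"
    NONRESIDENT = "Nonresident"
    UNRESOLVED = "Unresolved"


# Final P-sample person residence status codes, grouped under the three
# headings of the code list. The dictionary layout is illustrative only.
FINAL_P_SAMPLE_STATUS: dict[str, ResidenceStatus] = {
    **dict.fromkeys(["M", "MR", "NR", "NP", "NC"], ResidenceStatus.RESIDENT),
    **dict.fromkeys(["FP", "NL", "NN", "DP", "MN", "GP"], ResidenceStatus.NONRESIDENT),
    **dict.fromkeys(["MU", "NU", "P", "KI", "KP"], ResidenceStatus.UNRESOLVED),
}


def residence_status(code: str) -> ResidenceStatus:
    """Return the residence status group for a final P-sample code such as 'NR'."""
    key = code.strip().upper()  # tolerate stray whitespace and lowercase input
    try:
        return FINAL_P_SAMPLE_STATUS[key]
    except KeyError:
        raise ValueError(f"unknown final P-sample residence status code: {code!r}") from None


if __name__ == "__main__":
    for code in ("NR", "gp", "MU"):
        print(code, "->", residence_status(code).value)
```

Raising an error for unknown codes, rather than defaulting to Unresolved, keeps a typo in a code from silently changing a person's status.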


Figure 4-1. Address Listing Book Page for Single and Multiunit Structures (Form D-1302, 6-23-99)

[Form image not reproduced. Section 4, Listing Page, opens with the interviewer introduction: "Hello, I'm (Your name) from the U.S. Bureau of the Census. Here's my identification. We are listing addresses as part of the Census 2000, and I have a few questions to ask you." Each listing line records: (1) line number; (2) block number; (3) house number; (4a) road/street name; (5a-5b) rural route and box number; (6) map spot number; (7) PO box number; (8) ZIP Code; (9) householder name; (10) physical location description or E-911 address; (11a) type of address; (11b) unit status; (12) remarks; (13)-(16) number of housing units at the basic address, with a floor-by-floor canvass for multi-unit addresses; and (17) source of information (household member, proxy, manager, or observation).]

Figure 4-2.

The Housing Unit Follow Up Questions for an A.C.E. Nonmatch

Header fields (A.C.E. address and census address): block, MSN, within MSN, type of unit, number of units, census ID, address status. If necessary, put address corrections here.

A.C.E. Nonmatch Address        Notes:

1. Is A.C.E. Nonmatch Address located within the nonshaded area shown on the A.C.E. Cluster Map?
[ ] Yes
[ ] No, address does not exist - Skip to item 6.
[ ] No, address is outside cluster - Skip to item 6.

2. Is there a housing unit at A.C.E. Nonmatch Address?
[ ] Yes
[ ] No - Explain in notes, then skip to item 6.

3. Does A.C.E. Nonmatch Address represent the same housing unit as any of the addresses listed in the Census column of the Housing Unit Reference List?
[ ] Yes - Enter Census ID here:
[ ] No

4. Does A.C.E. Nonmatch Address represent the same housing unit as any of the addresses listed in the A.C.E. column of the Housing Unit Reference List?
[ ] Yes - Enter A.C.E. map spot number: ___ and the Within MSN ID
[ ] No

5. How many housing units are at this basic address? Enter number of HUs here: (Explain in Notes below)

6. Information from: If more than one, mark the main source.
[ ] Household member   [ ] Building manager or landlord   [ ] Proxy   [ ] Observation

Notes - Continue on back of page if needed:


Figure 4-3.

A.C.E. Mover and Residence Status Flow

[Flowchart not reproduced. Starting from the CAPI interview, the chart routes each person by mover code (nonmover or outmover versus inmover). Decision points include: born since Census Day (based on date of birth), coded B and removed; inmover in group quarters and age 18 to 22; partial household inmover; in group quarters on Census Day; and where the person should be counted (this address, other address, or don't know/refused). Outcomes are removal, unresolved residence status, or resident of this housing unit on Census Day, which proceeds to person matching.]

* Other residence question is asked, but the person is removed regardless of the answer.
** Group quarters and Other residence questions are asked, but the person is removed regardless of the answer.


Chapter 5.

Targeted Extended Search

INTRODUCTION

The concept behind the dual system estimate is to estimate the census omission rate using the P sample and the erroneous enumeration rate using the E sample. The complete definition of being omitted from or erroneously enumerated in the census includes the concept of location; that is, a successful enumeration must have located the person in the right place. Right location in the census means anywhere in the block where the reported housing unit address was located, or in the search area, defined as one ring of adjacent blocks. The operation concerned with locating and matching the persons in the surrounding areas is Targeted Extended Search, or TES. The name was chosen because, unlike the similar procedure in the 1990 Post-Enumeration Survey (PES), where the surrounding area of every cluster was searched, the A.C.E. search was targeted in two ways:

1. Results from the initial housing unit matching operation were used to select the housing units that are candidates for TES.

2. In most cases, only clusters that include TES-eligible housing units were included in TES.

This chapter focuses on the statistical methods used in TES. A.C.E. field and processing activities, including TES, are described in Chapter 4.

Overview

The 1990 PES included a search in all blocks surrounding each sample cluster. Every person in every house in every block adjoining every sample block cluster was included in the search. This was determined to be burdensome in terms of time, cost, and perhaps mental fatigue on the part of matchers performing low-payoff searches (Hogan, 1993). To improve efficiency, the Census 2000 A.C.E. took a more focused (i.e. targeted) approach in selecting clusters, defining search areas, and determining which housing units and residents would be part of surrounding block operations.

The Census 2000 A.C.E. search operation differed from the 1990 PES in four primary ways:

1. Search area definition.
2. Amount of searching.
3. Persons eligible for search.
4. TES weighting.

Search Area Definition

The search area for the 2000 A.C.E. was limited to either just the sample block cluster or one ring of adjacent blocks. An adjacent block is one that touches the cluster of sample blocks at one or more points. This definition includes the blocks that touch the corner of the block cluster. Results from empirical research, using Census 1998 Dress Rehearsal data, show that the additional benefits of using two rings of surrounding blocks are negligible (Wolfgang, 1999).

Amount of Searching

There were two important differences between the extent of searching in the 1990 PES and the 2000 A.C.E.:

1. Only about 20 percent of A.C.E. block clusters had their surrounding areas searched, whereas in 1990 the surrounding area of every block cluster was searched.

2. The search was targeted (in most cases) to only housing units identified as being likely to exhibit geocoding error; in 1990, all persons in surrounding areas were eligible for search.

The clusters with a high number of potential geocoding errors were identified from the results of the initial housing unit matching operation and subsequent field follow-up (see Chapter 4). These were A.C.E. block clusters with a large number of Independent Listing housing units not found in the January 2000 Census Address List. These types of nonmatches are possibly census geocoding errors of exclusion (i.e. not included in the census within the sample area although they should have been). On the census side, A.C.E. block clusters with a large number of census geocoding errors are likely to be errors of inclusion (i.e. reported by the census in the block cluster, although the unit is physically outside the A.C.E. block cluster). These two types of housing units were eligible to be in the extended search as part of TES operations, and are thus TES-eligible housing units.

Any cluster that included at least one potential census geocoding error, of either inclusion or exclusion, was eligible to have TES operations performed in it and is termed a TES-eligible cluster. Clusters with no such potential geocoding errors became non-TES-eligible. The clusters in which TES was actually done are TES clusters, and were selected from among the TES-eligible clusters either with certainty or by probability sampling.
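The eligibility rules above reduce to a simple count per cluster. The sketch below is a hypothetical illustration, not the Census Bureau's processing code: the `ClusterMatchResults` record and its field names are assumptions. It treats census units found outside the cluster as potential errors of inclusion and Independent Listing units not on the January 2000 Census Address List as potential errors of exclusion, and marks a cluster TES-eligible when it has at least one of either. Relisted and List/Enumerate clusters, which had no such counts at selection time, are outside the scope of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterMatchResults:
    """Hypothetical summary of initial housing unit matching and follow-up for one cluster."""
    cluster_id: str
    # Census units found outside the cluster: potential errors of inclusion.
    inclusion_errors: int
    # Independent Listing units not on the January 2000 Census Address List:
    # potential errors of exclusion.
    exclusion_errors: int

    @property
    def potential_geocoding_errors(self) -> int:
        # The two kinds are added to give the cluster's count of potential errors.
        return self.inclusion_errors + self.exclusion_errors

    @property
    def tes_eligible(self) -> bool:
        # At least one potential error of either kind makes the cluster TES-eligible.
        return self.potential_geocoding_errors > 0


clusters = [
    ClusterMatchResults("0001", inclusion_errors=0, exclusion_errors=0),
    ClusterMatchResults("0002", inclusion_errors=3, exclusion_errors=0),
    ClusterMatchResults("0003", inclusion_errors=1, exclusion_errors=4),
]
eligible = [c.cluster_id for c in clusters if c.tes_eligible]
print(eligible)
```

Here cluster 0001 is non-TES-eligible, while 0002 (inclusion errors only) and 0003 (both kinds) are TES-eligible.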


Results from the 1990 PES show that geocoding errors are highly clustered. Slightly over 77 percent of the whole household nonmatches were concentrated in less than one-fourth of the PES sample block clusters. On the other hand, about 72 percent of the census geocoding errors were found in less than 3 percent of the PES sample block clusters. TES is a good example of Deming's 80-20 guideline: 80 percent of the benefits are realized by solving 20 percent of the problems.

Persons Eligible for Search

In order to be included in TES operations, a person must live in:

  • a TES cluster; and
  • a TES-eligible housing unit.

A person in a housing unit that was not a TES-eligible housing unit was a non-TES person and thus was not directly affected by TES operations. Any person in a TES-eligible housing unit was a TES person, unless someone in the housing unit matched (i.e. someone is confirmed to be not a TES person). TES persons in clusters that were not selected for TES operations were identified but did not have TES operations applied. Instead, these cases were effectively removed from the sample by being assigned a weight of zero. They were represented by persons in other TES clusters selected by sampling.

TES Weighting

Every selected TES cluster was assigned a sampling weight equal to the reciprocal of its selection probability. This TES cluster weight was assigned to all TES persons in that cluster and was multiplied by their A.C.E. sampling weights to produce their TES-adjusted weights. The TES-adjusted weight for TES persons in clusters not selected for TES is zero. In this way, the TES persons in the TES clusters represent the TES persons in non-TES clusters. All elements of the dual system estimate (DSE) calculation, except those involving inmovers, can be affected by the TES weighting because TES persons can be nonmovers or outmovers, matches or nonmatches, and correct or erroneous enumerations.

CLUSTER SAMPLING

The decision to select 20 percent of the A.C.E. block clusters for TES was based on the assumption that most of the TES-eligible housing units and persons would be concentrated in a small fraction of the block clusters. Hence, most of the benefits of a complete surrounding area search could be realized at a substantial reduction in cost if a disproportionate share of the effort was concentrated in the clusters with the greatest likely payoff: the ones with the most TES-eligible housing units and persons. Targeting these clusters would achieve one of the principal goals of the surrounding areas search, variance reduction. However, it is of at least equal importance that the surrounding area search be balanced. There are two ways TES could have been out of balance: (1) the geographical area included in the search could have differed between the P and E samples; (2) the TES block cluster sampling could have selected clusters containing errors of inclusion with greater or less likelihood than clusters with errors of exclusion. To achieve balance in sample selection, it was necessary for each cluster with TES-eligible housing units and persons to have some probability of being selected for TES and to be weighted by the inverse of the selection probability.

The information available for TES selection included the results of the initial housing unit matching, which included the results from the housing unit follow-up. Housing unit follow-up indicated, among other things, the count of potential geocoding errors of inclusion and exclusion. The geocoding errors of inclusion were census units found outside the cluster. Potential geocoding errors of exclusion were coded as address nonmatches in the independent listing. The combined number of census geocoding errors and independent listing address nonmatches was considered to be the number of potential geocoding errors in each cluster. For most clusters, the probability that a cluster would be selected for TES depended on the count of its potential geocoding errors. Exceptions are relisted clusters and clusters that were enumerated in the census using the List/Enumerate methodology. Those clusters did not go through housing unit matching and follow-up.

A housing unit that represented a potential geocoding error could have been discovered by TES operations to be a geocoding error or an actual coverage error. Putting a particular housing unit in the category of potential made it, and the persons living in it, eligible for TES. This search was intended to determine whether the housing unit and persons were geocoded incorrectly into a neighboring block, in which case they would be counted as correctly enumerated, or were truly enumeration errors.

Hence, the following TES selection strategy was implemented:

  • Clusters that did not have counts of potential geocoding errors available at the time of the TES sampling operation were assigned to a separate TES procedure. Clusters that were relisted (which were later included in TES with certainty) or enumerated using the List/Enumerate methodology (which were ultimately excluded from TES) fall into this group.
  • The 5 percent of clusters that included the largest number of housing units that were potential geocoding errors were included in TES with certainty.
  • The 5 percent of clusters that had the most housing units that were potential geocoding errors, when weighted by their A.C.E. cluster weight, were also included in TES with certainty. The 5 percent of clusters included in the above bullet for having the most unweighted cases were excluded before this step was performed, so that a total of 10 percent of the A.C.E. clusters were selected based on the two certainty criteria.
  • All remaining clusters with at least one potential geocoding error housing unit were assigned to a noncertainty stratum to be sampled at a uniform national rate for inclusion in TES. The sampling rate was set so that the overall size of the TES sample, including those selected by certainty and by sampling, totaled 20 percent of A.C.E. clusters (excluding the first group).

Clusters with no potential geocoding errors were excluded from TES selection since there were no housing units or persons that were candidates for TES operations. This creates a potential for a small bias in TES, because housing units added to or deleted from the address lists after the selection of TES clusters were not eligible for TES operations.

Sampling Methodology

For the United States as a whole, there were 11,303 A.C.E. clusters. Of these, 420 were excluded from TES selection because they used the List/Enumerate census method. Of the remaining 10,883 clusters, 20 percent, or 2,177, were selected for TES. Of the eligible clusters, 62 were relist clusters and were not part of the normal TES selection. (These clusters did not count as part of the 2,177 TES target sample size.) Five percent of the sampling universe, or 544 clusters, with the most potential geocoding errors were selected for TES with certainty and assigned a TES weight of 1. Of the remaining clusters, an additional 544 with the most potential geocoding errors, when weighted by the A.C.E. cluster weight, were also selected with certainty and assigned a TES weight of 1.

Of the remaining clusters that included at least one potential geocoding error, 1,089 were selected using systematic random sampling with equal probability. There were 5,326 clusters in the noncertainty stratum (i.e. all those that were not already selected by one of the other means and that contained at least one potential geocoding error), so the selected clusters were assigned a TES weight of 5,326 divided by 1,089, or 4.8907. The remaining 4,407 clusters were out of scope for TES because they had no identified potential geocoding errors.

For purposes of drawing the systematic sample, clusters were sorted in the order:

  • State
  • First-phase Sampling Stratum
  • Second-phase Sampling Stratum
  • Small Block Cluster Sampling Stratum
  • Cluster Number

The first four characteristics are the same ones used to select the A.C.E. sample. Sorting clusters in this order for TES improved the representativeness of TES with respect to the national A.C.E. sample. After sorting in this order, the clusters were systematically sampled with equal probability using a take-every of 4.8907 and were assigned a TES weight equal to that figure.

Results of Cluster Sampling

The TES sample included 2,239 block clusters out of 11,303, or 19.8 percent. (Originally it had been intended to include a small number of List/Enumerate clusters in TES, and some sample was set aside for them but never used.) Before subsampling within large block clusters was performed, there were about 45,000 E-sample and 77,000 P-sample TES-eligible housing units; the TES clusters included 80 and 73 percent of these, respectively. Because of differences in procedures, more E-sample units got into TES by certainty (76 percent versus 66 percent), while more P-sample units were selected by sampling (7 percent versus 5 percent). TES units represent about 7 percent of the housing units resulting from initial housing unit matching (see Table 5-1). This was not the final number of housing units included in TES field operations because:

  • Subsampling within large block clusters reduced the number of A.C.E. housing units in clusters with 80 or more housing units; and
  • Housing unit counts for Relist clusters were not available at the time the sample was selected.

Subsampling within large block clusters reduced the final TES workload to 12,000 E-sample and 18,000 P-sample housing units.
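The selection strategy and sampling methodology described above can be sketched as follows. This is an illustration under stated assumptions, not the Census Bureau's production code: the `Cluster` record and `select_tes_clusters` function are hypothetical, the 5 and 20 percent counts are rounded to the nearest whole cluster, ties in the certainty rankings are broken arbitrarily, and the systematic sample uses a uniform random start in [0, take-every).

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical A.C.E. cluster record; field names are illustrative."""
    cluster_id: str
    # (state, first-phase stratum, second-phase stratum,
    #  small block cluster stratum, cluster number)
    sort_key: tuple
    ace_weight: float        # A.C.E. cluster weight
    potential_errors: int    # potential geocoding errors from housing unit matching
    list_enumerate: bool = False
    relisted: bool = False


def select_tes_clusters(clusters, certainty_share=0.05, target_share=0.20, seed=None):
    """Return {cluster_id: TES weight} for relisted and TES-eligible clusters.

    1.0 = selected with certainty, take-every = selected by systematic sampling,
    0.0 = eligible but not selected. List/Enumerate clusters and clusters with
    no potential geocoding errors get no entry (they are out of scope).
    """
    universe = [c for c in clusters if not c.list_enumerate]
    n_certainty = round(certainty_share * len(universe))  # assumed rounding rule
    target = round(target_share * len(universe))          # relist clusters not counted

    # Relisted clusters had no counts at selection time; included with certainty.
    weights = {c.cluster_id: 1.0 for c in universe if c.relisted}
    pool = [c for c in universe if not c.relisted and c.potential_errors > 0]

    # Certainty rule 1: most potential geocoding errors, unweighted.
    pool.sort(key=lambda c: c.potential_errors, reverse=True)
    for c in pool[:n_certainty]:
        weights[c.cluster_id] = 1.0
    pool = pool[n_certainty:]

    # Certainty rule 2: most potential errors weighted by the A.C.E. cluster weight.
    pool.sort(key=lambda c: c.potential_errors * c.ace_weight, reverse=True)
    for c in pool[:n_certainty]:
        weights[c.cluster_id] = 1.0
    pool = pool[n_certainty:]

    # Noncertainty stratum: equal-probability systematic sample in sort order.
    pool.sort(key=lambda c: c.sort_key)
    n_sample = max(0, min(target - 2 * n_certainty, len(pool)))
    if n_sample == 0:
        weights.update({c.cluster_id: 0.0 for c in pool})
        return weights
    take_every = len(pool) / n_sample
    start = random.Random(seed).random() * take_every  # start in [0, take_every)
    picked = {min(math.floor(start + r * take_every), len(pool) - 1) for r in range(n_sample)}
    for idx, c in enumerate(pool):
        weights[c.cluster_id] = take_every if idx in picked else 0.0
    return weights
```

With inputs matching the national counts (10,883 clusters in scope, 62 of them relisted, and 5,326 left in the noncertainty stratum), these rules give 544 + 544 certainty clusters, a systematic sample of 2,177 - 1,088 = 1,089 clusters, and a take-every of 5,326/1,089, about 4.8907.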


Table 5-1. TES Sampling Frame and Selection Results

                                   Potential geocoding errors
                                   Total potential errors   Errors of inclusion   Errors of exclusion
                           Clusters       Number  Percent       Number  Percent       Number  Percent
Total                        11,303      122,440      100       45,053      100       77,387      100
Out-of-scope                  4,827            0      ...            0      ...            0      ...
  List/Enumerate                420            0      ...            0      ...            0      ...
  No TES HUs                  4,407            0      ...            0      ...            0      ...
Eligible for TES              6,476      122,440      100       45,053      100       77,387      100
  Certainty                   1,150       85,309       70       34,089       76       51,220       66
    Top weighted                544       11,858       10        4,037        9        7,821       10
    Top unweighted              544       73,451       60       30,052       67       43,399       56
    Relist                       62           0*      ...           0*      ...           0*      ...
  Noncertainty                5,326       37,131       30       10,964       24       26,167       34
    Selected into sample      1,089        7,642        6        2,106        5        5,536        7
    Not selected              4,237       29,489       24        8,858       20       20,631       27
TES clusters                  2,239       92,951       76       36,195       80       56,756       73

* TES units in Relist clusters had not been determined at the time the sample was selected.

Note: Percentages in table may not add to total due to rounding.
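The cluster counts in the Sampling Methodology text and in Table 5-1 are internally consistent, and the short check below reproduces them. It assumes, as the text implies but does not state, that "5 percent" and "20 percent" of the 10,883 in-scope clusters were rounded to the nearest whole cluster.

```python
# Counts quoted in the Sampling Methodology text and in Table 5-1.
TOTAL_CLUSTERS = 11_303
LIST_ENUMERATE = 420
NO_TES_HUS = 4_407
RELIST = 62
NONCERTAINTY_STRATUM = 5_326
NONCERTAINTY_SAMPLE = 1_089

universe = TOTAL_CLUSTERS - LIST_ENUMERATE        # clusters in scope for selection
per_certainty_rule = round(0.05 * universe)       # 5 percent of the universe (assumed rounding)
target = round(0.20 * universe)                   # 20 percent TES target (relist excluded)
take_every = NONCERTAINTY_STRATUM / NONCERTAINTY_SAMPLE
certainty = 2 * per_certainty_rule + RELIST       # "Certainty" row of Table 5-1
tes_clusters = certainty + NONCERTAINTY_SAMPLE    # "TES clusters" row
eligible = certainty + NONCERTAINTY_STRATUM       # "Eligible for TES" row
out_of_scope = LIST_ENUMERATE + NO_TES_HUS        # "Out-of-scope" row

print(universe, per_certainty_rule, target, f"{take_every:.4f}")
print(certainty, tes_clusters, eligible, out_of_scope)
print(f"TES share of all clusters: {tes_clusters / TOTAL_CLUSTERS:.1%}")
```

The same kind of check applies to the error counts: for example, the 85,309 certainty errors plus the 7,642 sampled errors give the 92,951 potential geocoding errors in TES clusters.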

TES FIELD AND PROCESSING ACTIVITIES

Details on the operations involved in TES are described in Chapter 4. In summary, the main activities are:

  • Cluster selection (Spring 2000). This operation selects the clusters for TES. Because of the need to select the cluster sample at a particular time, the final E and P samples had not been selected at the time of this operation.
  • Search for census units in surrounding blocks (Summer 2000). Determines if census units erroneously included in the sample block cluster are located within the surrounding ring of blocks. This field operation is described more fully in Chapter 4.
  • Identify TES persons (Fall 2000). An automated activity performed at the National Processing Center in Jeffersonville, Indiana. See Chapter 4 for more information.
  • Extend the search area to surrounding blocks for TES persons (Fall 2000). The P-sample TES persons were allowed to match to census records in the surrounding block. The E-sample TES persons were treated as correct enumerations if the census unit was located in a surrounding block. This was a clerical operation.
  • Assign TES weights (Winter 2000/2001). TES persons identified in TES-eligible clusters were assigned the TES weight associated with that cluster, either 1.0 for a cluster selected with certainty or 4.8907 for a cluster selected by sampling. TES persons in TES-eligible clusters not selected into the sample were assigned a zero weight.

Adds and Deletes

The preliminary census address list of housing units as of January 2000 was the source for the initial housing unit matching on which TES is based. Since some housing units on the January 2000 list were later deleted and others added, the final list of census housing units did not exactly match the initial housing unit matching counts of potential geocode errors. Therefore, procedures were necessary to update the TES identifications for adds and deletes.

In the vast majority of cases, where adds and deletes were not involved, P-sample housing units are TES-eligible if they did not match to a census address. However, if a P-sample unit was matched to an address during initial housing unit matching, but that address was deleted, then the unit was considered nonmatched. To adjust for deletions, P-sample persons in housing units that were matched to deleted census housing units were flagged as TES persons, as long as the unit did not contain any persons matched within the sample block (i.e. non-TES persons). This adjustment was performed only on persons in TES clusters.

E-sample housing units that were added to the final census list after January 2000 could represent geocoding errors, but they were not part of TES field operations. Without field operations, persons in such units would never be identified as surrounding block correct enumerations. Therefore, a correct enumeration probability was imputed for such persons in TES clusters. The imputed probability is the overall correct enumeration probability of all resolved persons in geocoding error housing units in the TES sample. See Chapter 6 for a description of the procedure.
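The TES-person rules, including the adjustment for deleted census units, and the weights in Table 5-3 can be combined into a per-person weight. In this hypothetical sketch, `Person`, `is_tes_person`, `person_weight`, and `weighted_component` are illustrative names rather than an official implementation; the cluster's TES weight is passed in as 1.0 (certainty), the take-every (systematic sample), or 0.0 (not selected). Multiplying the A.C.E. weight by that factor for TES persons, and leaving non-TES persons unchanged, is equivalent to the x/y/z indicator form of the weighted DSE components.

```python
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical A.C.E. person record; field names are illustrative."""
    ace_weight: float                              # A.C.E. weight w*_ij
    in_tes_eligible_unit: bool                     # lives in a TES-eligible housing unit
    unit_has_block_match: bool = False             # someone in the unit matched within the sample block
    unit_matched_to_deleted_address: bool = False  # P-sample unit matched to a later-deleted census address


def is_tes_person(p: Person, cluster_tes_weight: float) -> bool:
    """Apply the TES-person rules, including the adjustment for census deletes."""
    if p.unit_has_block_match:
        return False  # a match within the block marks the unit's persons as non-TES
    if p.in_tes_eligible_unit:
        return True
    # Deletes adjustment: applied only in TES clusters (nonzero TES weight).
    return p.unit_matched_to_deleted_address and cluster_tes_weight > 0


def person_weight(p: Person, cluster_tes_weight: float) -> float:
    """TES persons get A.C.E. weight x cluster TES weight; non-TES persons keep the A.C.E. weight."""
    if is_tes_person(p, cluster_tes_weight):
        return p.ace_weight * cluster_tes_weight
    return p.ace_weight


def weighted_component(records) -> float:
    """Weighted sum of a 0/1 characteristic m over (person, cluster TES weight, m) records."""
    return sum(person_weight(p, t) * m for p, t, m in records)
```

A TES person in an eligible but unselected cluster contributes nothing, while a TES person in a sampled cluster counts 4.8907 times their A.C.E. weight, which is how the sampled clusters stand in for the unselected ones.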


Table 5-2. Effect of Census Address List Changes after January 2000

                                                                            Matches/correct
                                                           Count    Weighted    enumerations
P sample - persons in housing units matched to deletes     2,319   2,036,564         675,892
E sample - geocode error adds                                 53      15,307          14,915

TES IN DUAL SYSTEM ESTIMATION

Accounting for TES in the DSE calculation is primarily a matter of applying weights properly. Every person in the A.C.E. is either a TES person or a non-TES person, and every A.C.E. cluster is either a TES cluster or a non-TES cluster. Every TES person is assigned the TES weight of his or her A.C.E. cluster. The calculation of the DSE requires the use of seven distinct components, all but one of which represent the sum of the A.C.E. weights for some group of persons in the A.C.E., including both TES and non-TES persons. Hence, six of the seven components represent a weighted sum of TES and non-TES persons, the former with their TES cluster weights applied.

Applying TES Weights

Every A.C.E. cluster including TES persons has a TES weight, although that weight is zero if the cluster is not selected for TES. A TES person must be weighted by the associated TES weight. The A.C.E. weight is multiplied by the TES weight to produce a person weight. TES weighting does not affect the weight of non-TES persons; their individual weights are the same as the A.C.E. weights.

Table 5-3. TES Weights by TES Status of the Person and Cluster

                   TES cluster                                       Non-TES cluster
TES persons        1, if cluster in TES with certainty               0
                   4.8907, if cluster selected for TES by sampling   0
Non-TES persons    1                                                 1

The issues related to inmovers, outmovers, and noninterviews are the same for TES persons as for all other persons. From a calculation standpoint, the only effect that TES status has on the dual system estimates is in applying the cluster's TES weight.

DSE Calculation

The DSE for Census 2000 is:

$$\mathrm{DSE} = (\mathrm{DD})\left(\frac{\mathrm{CE}}{N_e}\right)\left(\frac{N_n + N_i}{M_n + \dfrac{M_o}{N_o}\,N_i}\right)$$

where
  DD = census data-defined persons
  CE = estimated number of A.C.E. E-sample correct enumerations
  $N_e$ = number of A.C.E. E-sample persons
  $N_n$ = estimated number of A.C.E. P-sample nonmovers
  $N_i$ = estimated number of A.C.E. P-sample inmovers
  $N_o$ = estimated number of A.C.E. P-sample outmovers
  $M_n$ = estimated number of A.C.E. P-sample nonmover matches
  $M_o$ = estimated number of A.C.E. P-sample outmover matches

The estimator has seven distinct A.C.E. components (plus DD from the census enumeration). Six of the seven components represent a weighted sum of persons, including both TES and non-TES persons. Other than inmovers, who cannot be TES persons, each of the DSE components is expressed as:

$$\sum_{i=1}^{n}\sum_{j=1}^{n_{p_i}} w^*_{ij}\, m_{ij}\, x_{ij} \;+\; \sum_{i=1}^{n}\sum_{j=1}^{n_{p_i}} w^*_{ij}\, m_{ij}\, y_{ij} \;+\; \sum_{i=1}^{n}\sum_{j=1}^{n_{p_i}} w^*_{ij}\, t_{ij}\, m_{ij}\, z_{ij}$$

where
  $i$ = cluster index
  $j$ = person index
  $n$ = number of block clusters in the A.C.E. sample
  $n_{p_i}$ = number of persons in block cluster $i$
  $x_{ij}$ = 1 if the person is not a TES person, 0 otherwise
  $y_{ij}$ = 1 if the person is a TES person and is in the TES sample with certainty, 0 otherwise
  $z_{ij}$ = 1 if the person is a TES person and is in the TES systematic sample, 0 otherwise
  $m_{ij}$ = characteristic of interest (match, correct enumeration, E-sample person, or P-sample person)
  $w^*_{ij}$ = weight used for estimation (includes the inverse of the probability of selection for A.C.E., adjustment for household noninterview, and weight trimming)
  $t_{ij}$ = TES sampling weight, the TES systematic sample take-every

EFFECTS OF TES ON DUAL SYSTEM ESTIMATION

The principal effect of TES in Census 2000 is approximately what was expected: the overall correct enumeration rate was 2.9 percentage points higher with TES than it would have been without, and the overall match rate was 3.8 percentage points higher (see Table 5-4). The larger increase in the match rate, as compared to the correct enumeration rate, occurred because there were more identified potential geocoding errors in the P sample than in the E sample.

Table 5-4. Effect of TES at the National Level

                                          With TES   Without TES   Difference* Effect of TES
                                               (1)           (2)       (1)-(2)       (1)/(2)
E sample
  Persons (Ne)                         264,578,862   264,634,794    (55,932)**         1.000
  Correct Enumerations (CE)            252,096,238   244,387,951     7,708,288         1.032
  CE Rate (%)                                 95.3          92.3           2.9         1.032
P sample
  Persons (Np)                         263,037,259   262,906,916     130,343**         1.000
  Matches                              240,878,622   230,681,205    10,197,418         1.044
  Match Rate (%)                              91.6          87.7           3.8         1.044
Ratio of CE to Match Rate                    1.040         1.053       (0.012)         0.989
Coefficient of Variation for Ratio           0.129         0.314       (0.197)         0.405

* Percentages were calculated on unrounded values.
** The weighted E- and P-sample sizes differed slightly because of variance in TES sampling.

Note: Table above reflects national totals without regard to post-stratification and differs from other totals in which post-stratum totals were aggregated.

Table 5-5. Effect of TES on Coefficient of Variation (CV)

                                    Standard error      CV (percent)

With TES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355,451 0.129 The difference in the number of matches versus correct Without TES . . . . . . . . . . . . . . . . . . . . . . . . . . 877,664 0.314 enumerations (10.2 million and 7.7 million, respectively) from TES had been a source of concern, since it suggested Note: Table above reflects national totals without regard to post-the possibility of balancing error. Balancing error would stratification and differs from other totals in which post-stratum totals have occurred if the geographic boundaries included in were aggregated.

the P and E samples had not been consistent. For instance, suppose the P sample was allowed to match to census per-sons in housing units beyond the first ring, while a census At the post-stratum level, the average improvement in the unit could only be classified correct if it was within the DSE standard error is about 33 percent. The gains in preci-first ring. Adams and Liu (2001) performed an evaluation sion as measured by variance show that TES makes dual study of the P-sample housing units in A.C.E. and con- system estimates more precise, and that TES improves the cluded that the main source of the measured imbalance quality of the A.C.E., so long as it does not make the DSEs was geocoding error in the P sample. less accurate by introducing bias. The coefficient of varia-Table 5-4 shows that TES increased the number of correct tion was reduced for a majority of the collapsed post-enumerations from 244.4 million to 252.1 million and strata (448 original post-strata were collapsed into 416 matches from 230.7 million to 240.9 million. Before TES, post-strata for DSE calculation purposes).

there had been 20.2 million erroneous enumerations, of which 7.7 million were geocoding errors that were classi- Table 5-6. Effect of TES on Post-Stratum CVs fied as correct enumerations by TES. TES also allowed [Percent]

10.2 million additional P-sample matches to occur out of 32.2 million original nonmatches. Improving both the With TES Without TES match and correct enumeration rate this much signifi- Average CV . . . . . . . . . . . . . . . . . . . . . . . . . . 2.07 2.66 cantly improves the variance of the DSE, since over 90 per- Median CV . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.81 2.32 cent of people match or are correctly enumerated. Average CV weighted by census count . . 1.30 1.93 Table 5-5 shows the significant contribution that TES makes to variance reduction. For the A.C.E. considered as a whole (i.e. a direct DSE of the entire population without post-stratification), the coefficient of variation is 0.129 percent with TES and 0.314 percent without TES.
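To make the weighting concrete, the following minimal Python sketch applies the Table 5-3 multipliers inside the component sums and then combines the seven components with the DSE formula of this section. It is an illustration under stated assumptions, not the Census Bureau's production code: the `PersonRecord` fields, the `matched` flag used as an example characteristic $m_{ij}$, and the toy numbers are all hypothetical; only the 4.8907 take-every comes from Table 5-3.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PersonRecord:
    # Hypothetical record layout; field names are illustrative only.
    weight: float                 # w*_ij: A.C.E. weight after noninterview adjustment and trimming
    tes_person: bool = False      # person is a TES person
    tes_certainty: bool = False   # cluster is in TES with certainty (y_ij = 1 for TES persons)
    tes_sampled: bool = False     # cluster selected for TES by systematic sampling (z_ij = 1)
    take_every: float = 1.0       # t_ij, the TES systematic-sample take-every
    matched: bool = False         # example characteristic m_ij


def tes_multiplier(p: PersonRecord) -> float:
    """TES weight from Table 5-3 (always 1 for non-TES persons)."""
    if not p.tes_person:
        return 1.0              # x_ij = 1
    if p.tes_certainty:
        return 1.0              # y_ij = 1
    if p.tes_sampled:
        return p.take_every     # z_ij = 1, so the person also carries t_ij
    return 0.0                  # TES person in a cluster not selected for TES


def component(persons: Iterable[PersonRecord], m: Callable[[PersonRecord], bool]) -> float:
    """The three double sums combined: sum of w*_ij * (TES multiplier) * m_ij."""
    return sum(p.weight * tes_multiplier(p) * (1.0 if m(p) else 0.0) for p in persons)


def dse(dd, ce, n_e, n_n, n_i, n_o, m_n, m_o):
    """DSE = DD * (CE / Ne) * (Nn + Ni) / (Mn + (Mo / No) * Ni)."""
    return dd * (ce / n_e) * (n_n + n_i) / (m_n + (m_o / n_o) * n_i)


people = [
    PersonRecord(weight=10.0, matched=True),                                    # non-TES person
    PersonRecord(weight=10.0, tes_person=True, tes_certainty=True, matched=True),
    PersonRecord(weight=10.0, tes_person=True, tes_sampled=True, take_every=4.8907, matched=True),
    PersonRecord(weight=10.0, tes_person=True, matched=True),                   # cluster not selected
]
matches = component(people, lambda p: p.matched)   # 10 + 10 + 48.907 + 0
```

Each call to `component` stands for one weighted sum (for example, $M_n$ over nonmovers with $m_{ij}$ = match). Inmovers, who cannot be TES persons, would all carry a multiplier of 1.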


Chapter 6.

Missing Data Procedures

INTRODUCTION

This chapter gives an overview of missing data procedures for the Census 2000 Accuracy and Coverage Evaluation (A.C.E.). General background information is presented first, while the following sections describe three types of procedures used to account for data missing in the A.C.E. The noninterview adjustment accounts for whole-household nonresponse. The next section describes the characteristic imputation used to assign values for specific missing demographic variables. Finally, for persons with unresolved match, residence, or enumeration status, a probability of matching, residence, or correct enumeration was assigned according to the procedures described in the last section of this chapter.

Because missing data in the A.C.E. were addressed after the completion of the field operations that produced the A.C.E. data files, knowledge of the field activities and of the circumstances that led to specific outcomes is necessary to understand the motivation for these procedures. For this information, the reader is referred to Chapter 4 for details on the field operations.

The missing data procedures used in the Census 2000 A.C.E. were similar to those used on the Integrated Coverage Measurement (ICM) sample in the Census 2000 Dress Rehearsal. An outline of the ICM procedures and a summary of related research are given in Ikeda, Kearney, and Petroni (1998). Kearney and Ikeda (1999) provide an overview of the results from the Dress Rehearsal. For detailed missing data procedures for the 2000 A.C.E., see Cantwell (2000) and Ikeda and McGrath (2001). A few basic results on missing data from 2000 are found in this chapter; many more results can be found in Cantwell et al. (2001).

BACKGROUND

Before dual system estimates were calculated, it was necessary to account for missing information from the interviews of P-sample people and from the matching operations. It should be noted that the term missing data applies after all follow-up attempts have been made. Chapter 4 describes some of the extensive field procedures conducted to minimize the resulting level of such missing data. These activities, all specified in advance, included multiple attempts at interviews, the use of highly trained clerks and technicians to resolve cases, and the follow-up of cases where a second interview could provide additional required information.

There were two main types of missing data in the A.C.E. and three processes used to correct for them. The first type was unit missing data. These were households that were not interviewed in the A.C.E., either because they could not be contacted or because the interview was refused. The noninterview adjustment process spread the weights of these households among households that were interviewed in the same noninterview cell.

The other type of missing data was item missing data. This situation occurred when some information for a household or person was available but portions of the data were missing. Two groups of missing data items had to be addressed: demographic items and items relating to a specific operational status. Missing age, sex, tenure, race, and Hispanic origin were imputed to allow the production of estimates of the census undercount by these characteristics, and because they were necessary to assign people to post-strata.

For a small number of people in the P sample, there was not enough information available to determine the match status (whether or not the person matched to someone in the census in the appropriate search area) or the residence status (whether or not the person was living in the block cluster on Census Day). Determining residence status was important for the P sample because Census Day residents of the block clusters in the sample were used to estimate the proportion of the population who were not counted in the census. Similarly, some people in the E sample lacked information to determine whether the person was correctly enumerated. Such cases, where status could not be determined, were said to be unresolved. Generally, for cases with missing status, a probability of residence, match, or correct enumeration was assigned based on information available about the specific case and about cases with similar characteristics.

In the 1990 Post-Enumeration Survey, a hierarchical logistic regression program was used to calculate probabilities of match and correct enumeration for cases with missing information. (Due to the procedure used to treat movers in 1990, residence status played a different role then.) The model and some results are discussed in Belin et al. (1993). During census tests in 1995 and 1996, certain components of missing data were addressed using logistic regression, while for other components a simpler procedure called imputation cell estimation was used. The latter procedure was used exclusively in the Census 2000 Dress Rehearsal in 1998. Data from these tests indicate that the exact method of calculating probabilities for unresolved status (match, residence, or correct enumeration) has a minor effect on the dual system estimates. More details of this research can be found in Ikeda (1997, 1998, 1998b, and 1998c) and Cantwell (1999). Based on these findings and on concerns about implementing logistic regression in a production environment, the simpler procedure (that is, imputation cell estimation) was used to estimate missing data items in the A.C.E.

Noninterview Adjustment (Household Level)

At the time of the Computer Assisted Personal Interview (CAPI), questions were asked to determine who lived in the household on Interview Day and who lived there on Census Day, and a mover status was assigned based on the replies. Thus two rosters were created for each household: the Census Day roster and the Interview Day roster. The A.C.E. used inmovers to estimate the number of P-sample movers in the post-stratum, while using outmovers to estimate the match rate of the movers. This method is referred to as Mover Procedure C, or PES-C, in the research studies. See Chapter 4 for descriptions of the terms nonmover, inmover, and outmover.

All inmovers and all nonmovers were generally assumed to be A.C.E. Interview Day residents, with the exception of infants born after Census Day. People living in group quarters, such as college students in dormitories, were not eligible for the P sample. Therefore, for the purpose of estimating the number of inmovers, inmovers aged 18 to 22 who were living in group quarters on Census Day were not considered to be Interview Day residents.

Noninterview adjustment was performed only on the P sample. The procedure was similar to that used in the Census Dress Rehearsal. Due to the mover procedure described above, there were two noninterview adjustments: one based on housing unit status as of Census Day (i.e., the Census Day roster), and the other based on housing unit status as of the day of the A.C.E. interview (i.e., the Interview Day roster). An occupied housing unit was defined as an interview (for the given reference day, Census Day or Interview Day) if there was at least one person (with a name and at least two demographic characteristics) who possibly or definitely was a resident of the housing unit on the given reference day. An occupied housing unit (as of the given reference day) that was not an interview was a noninterview. Thus a unit that was vacant, removed from the list of eligible housing units (because, for example, it was demolished or used only as a business), or in certain special places was considered neither an interview nor a noninterview. In the latter two situations, the unit was deleted from the list of A.C.E. sample housing units.

If a housing unit was found to be vacant on Census Day or was deleted from the sample, then that household did not factor into the Census Day noninterview adjustment. The same concept applies to Interview Day. Thus, vacant and deleted units did not contribute toward dual system estimation. An illustrative block cluster, provided in Figure 6-1, page 6-10, shows how the status of a housing unit on Census Day and Interview Day would be determined. Results of the A.C.E. interviewing operation are shown in Table 6-1.

Table 6-1. Status of Household Interviews in the A.C.E. [Unweighted]

| | Census Day: Number | Census Day: Percent | Interview Day: Number | Interview Day: Percent |
|---|---|---|---|---|
| Total housing units | 300,913 | 100.0 | 300,913 | 100.0 |
| Interviews | 254,175 | 84.5 | 264,103 | 87.8 |
| Noninterviews | 7,794 | 2.6 | 3,052 | 1.0 |
| Vacant units | 28,472 | 9.5 | 29,662 | 9.9 |
| Deleted units | 10,472 | 3.5 | 4,096 | 1.4 |
| Noninterview rate | 3.0% | | 1.1% | |

Note: Percentages in table may not add to total due to rounding.

Of the 261,969 housing units occupied on Census Day, 7,794 (3.0 percent) were noninterviews. The corresponding numbers for Interview Day were 267,155 and 3,052 (1.1 percent). The noninterview rate was higher for Census Day than for Interview Day because interview status was determined by results obtained on Interview Day. On that date, information was sought for both Census Day and Interview Day. Any time a household member or knowledgeable proxy could be reached, an interview for Interview Day was generally obtained. Census Day data were not always obtainable from the same respondent, usually in cases where the housing unit's occupants had moved in after Census Day.

Each of the two noninterview adjustments generally spread the weights of noninterviewed units over interviewed units in the same noninterview cell, defined as the sample block cluster crossed with the type of basic address. For purposes of this adjustment, the types of basic address were single-family, multiunit (such as apartments), and all others. The Census Day noninterview adjustment, determined according to the status of housing units as of Census Day, was used to adjust the person weights of nonmovers and outmovers. Similarly, the Interview Day noninterview adjustment, determined according to the status of housing units as of Interview Day, was used to adjust the person weights of inmovers.

The formulae are described as follows. For a given block cluster and type of basic address, the Census Day noninterview adjustment factor was computed as

$$f_c = \frac{\displaystyle\sum_{\substack{\text{Census Day}\\ \text{interviews}}} w_i \;+\; \sum_{\substack{\text{Census Day}\\ \text{noninterviews}}} w_i}{\displaystyle\sum_{\substack{\text{Census Day}\\ \text{interviews}}} w_i}$$

where $w_i$ represents the weight of housing unit $i$, that is, the inverse of its probability of selection into the A.C.E. sample. When computing the noninterview adjustment factor, the weight $w_i$ incorporated the trimming that occurred in some block clusters (see Appendix C). However, the weights did not reflect the sampling for targeted extended search (TES; see Chapter 5), for two reasons. First, the noninterview adjustment was done at the housing unit level, but a housing unit could contain some people with TES status and others without it. Second, TES status was not determined until after the matching operation, but information was usually not collected about people in noninterviewed units, and these people were generally not sent to be matched. Therefore, there was no reasonable way of systematically classifying noninterviews into those with and without TES status.

Similarly, for a given block cluster and type of basic address, the Interview Day noninterview adjustment factor was computed as

$$f_i = \frac{\displaystyle\sum_{\substack{\text{Interview Day}\\ \text{interviews}}} w_i \;+\; \sum_{\substack{\text{Interview Day}\\ \text{noninterviews}}} w_i}{\displaystyle\sum_{\substack{\text{Interview Day}\\ \text{interviews}}} w_i}$$

The example in Figure 6-1, on page 6-10, demonstrates the calculation of the noninterview adjustment. When the unweighted number of noninterviewed units in a given noninterview cell (sample block cluster by type of basic address category) was more than twice the unweighted number of interviewed units, the weights of the noninterviewed units were spread over the interviewed units in a broader cell. This cell was formed by combining the sample block clusters in the same A.C.E. sampling stratum within the same type of basic address. Because the noninterview rates were so small, the noninterview adjustment factors were close to 1 for most housing units in the sample. For Census Day, the factors were smaller than 1.10 for more than 92 percent of the units; for Interview Day, the factors were smaller than 1.10 for over 98 percent of the units.

Characteristic (Item) Imputation (Person Level)

Production of A.C.E. undercount estimates required data on age, sex, tenure (owner versus nonowner), race, and Hispanic origin to classify respondents by these important demographic characteristics, so these items had to be imputed whenever the data were not collected. Characteristic imputation was not carried out for other missing variables (with the exception of the items with unresolved status). Several other variables used to assign post-strata, such as the location or return rate of the census tract, were the same for everyone in the block and so were never missing for individuals. The extent of the missing characteristics is portrayed in Table 6-2.

The imputation rates in the E sample for the five characteristics listed above ranged from 0.3 percent for sex up to 3.8 percent for tenure (using unweighted frequencies). Since the A.C.E. record for each person in the E sample was matched to the Census 2000 edited file, from which the five characteristics were extracted and copied, the following imputation procedures apply only to the P sample.

P-sample characteristic imputation for the A.C.E. was similar to that for the 1990 PES and the various Census 2000 tests, including the Dress Rehearsal. Age and sex were imputed based on the available demographic distributions determined from the P sample. Tenure was imputed using a form of nearest-neighbor hot-deck procedure. To impute race and Hispanic origin, the two approaches were combined.

For missing tenure, race, and Hispanic origin, a hot-deck procedure was used to take advantage of the correlations often found in these characteristics among people living in the same block cluster (or, more generally, in geographic proximity). The characteristics age and sex are geographically less clustered than tenure, race, and Hispanic origin. Further, the value of age or sex is often considerably affected by specific conditions, such as the person's relationship to the reference person, or whether information is available on the person's spouse. Thus, national distributions conditioned on relevant covariates were used to impute age and sex. These distributions were constructed before the imputation began, without regard to the imputation for other missing characteristics.

Table 6-2. Percent of Characteristic Imputation in the P and E Samples [Unweighted]

| | Total people | Age | Sex | Tenure | Race | Hispanic origin | One or more characteristics imputed |
|---|---|---|---|---|---|---|---|
| P sample | 706,245 | 2.5 | 1.7 | 1.9 | 1.4 | 2.4 | 5.5 |
| E sample | 704,602 | 3.1 | 0.3 | 3.8 | 3.5 | 3.6 | 11.2 |

Note: The Age through Hispanic origin columns give the percent of people with that characteristic imputed.
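The noninterview adjustment factors $f_c$ and $f_i$ described earlier in this chapter, together with the rule for collapsing sparse cells, can be sketched in Python. This is a minimal sketch under stated assumptions: each housing unit is a dict with the hypothetical keys `cluster`, `stratum`, `address_type`, `status`, and `weight`, where `status` is the unit's status for whichever reference day is being adjusted (so the same function serves both the Census Day and the Interview Day adjustment). When a cell is collapsed, the sketch simply applies the broader cell's overall factor to it; the production rules for redistributing weight within the broader cell may differ in detail.

```python
from collections import defaultdict


def noninterview_factors(units):
    """Noninterview adjustment factor for each (block cluster, type of basic address) cell."""
    # Per cell: [interview weight, noninterview weight, interview count, noninterview count]
    cells = defaultdict(lambda: [0.0, 0.0, 0, 0])
    # Broader cell: same A.C.E. sampling stratum and same type of basic address.
    broad = defaultdict(lambda: [0.0, 0.0])
    broad_of = {}
    for u in units:
        if u["status"] not in ("interview", "noninterview"):
            continue  # vacant and deleted units do not enter the adjustment
        key = (u["cluster"], u["address_type"])
        bkey = (u["stratum"], u["address_type"])
        broad_of[key] = bkey
        k = 0 if u["status"] == "interview" else 1
        cells[key][k] += u["weight"]
        cells[key][k + 2] += 1
        broad[bkey][k] += u["weight"]

    factors = {}
    for key, (int_w, non_w, int_n, non_n) in cells.items():
        if non_n > 2 * int_n:
            # More than twice as many noninterviews as interviews (unweighted):
            # use the broader cell instead (simplified collapsing rule).
            int_w, non_w = broad[broad_of[key]]
        if int_w == 0:
            raise ValueError(f"no interviewed weight to absorb noninterviews in {key}")
        factors[key] = (int_w + non_w) / int_w
    return factors


example_units = (
    [{"cluster": "A", "stratum": "S1", "address_type": "single", "status": "interview", "weight": 100.0}] * 2
    + [{"cluster": "A", "stratum": "S1", "address_type": "single", "status": "noninterview", "weight": 100.0}]
    + [{"cluster": "A", "stratum": "S1", "address_type": "single", "status": "vacant", "weight": 100.0}]
    + [{"cluster": "B", "stratum": "S1", "address_type": "single", "status": "interview", "weight": 50.0}]
    + [{"cluster": "B", "stratum": "S1", "address_type": "single", "status": "noninterview", "weight": 50.0}] * 3
)
factors = noninterview_factors(example_units)
# Cluster A: (200 + 100) / 200 = 1.5; the vacant unit is ignored.
# Cluster B collapses (3 noninterviews > 2 x 1 interview): (250 + 250) / 250 = 2.0
```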

Age. The value of age was missing for 2.5 percent (unweighted) of the P sample. When age was missing, one of four age categories (0-17, 18-29, 30-49, and 50 or older), rather than a specific age, was imputed, because only the category was used to assign people to a post-stratum for estimation. In one-person households, missing age was imputed from the distribution of ages reported in such households. In multiperson households, if the relationship to the reference person was missing, the distribution of ages (excluding those of reference persons) in all multiperson households was used. Otherwise, if the person was the spouse, child, sibling, or parent of the reference person, missing age was generally imputed from a distribution of reported ages using the relationship to the reference person and the age of the reference person. For reference persons, other relatives, and nonrelatives, age was imputed from the distribution of ages reported by persons with the same relationship. See Figure 6-2, on page 6-11, for details.

Sex. The imputation rate for sex was 1.7 percent in the P sample. For one-person households, sex was imputed from the distribution of sex in all one-person households. To impute the sex of a reference person in a household with more than one person but no spouse present, the distribution of sex for reference persons of multiperson households with no spouse present was used. If a spouse was present, the missing sex of the reference person or of the reference person's spouse was imputed as the sex opposite to that of his or her spouse. If sex was missing for both the reference person and the spouse, then the sex of the reference person was imputed from the distribution of sex for reference persons with a spouse present, and the spouse was then assigned the sex opposite to that of the reference person.

For other persons in multiperson households (that is, other than reference persons and spouses): 1) if the relationship to the reference person was missing, and if no one else in the household was recorded as a spouse of the reference person, sex was imputed from the distribution of sex for persons (excluding reference persons) from all multiperson households; 2) otherwise, sex was imputed from the distribution of sex for persons (excluding reference persons, spouses, and persons with missing relationship) from all multiperson households. Figure 6-3, on page 6-12, illustrates the procedure.

Tenure. Household tenure (owner versus nonowner) was missing for 1.9 percent of the people in the P sample. Tenure was imputed from the previous household that had the same type of basic address and had tenure recorded. As with the adjustment for noninterviews, three types of basic address were used: single-family, multiunit, and all other types of units. See Figure 6-4, on page 6-13, for further information.

Race. When race was missing (1.4 percent of the P sample), the imputed race could be any of the 63 possible combinations of the six basic race categories: White, Black, American Indian or Alaskan Native, Asian, Native Hawaiian or Other Pacific Islander, and Some Other Race. All 63 categories were treated the same in the imputation; that is, there were no special procedures for any categories or groups of categories.

Whenever possible, missing race was imputed from the same household. Independently for each household member with missing race, one person was selected at random from those household members with reported race, and the selected person's race was imputed to the given household member. If race was missing for all household members but someone had reported origin (Hispanic or non-Hispanic), then the race distribution of the nearest previous household with any reported race and the same origin was used. Note that the Hispanic origin of the household was that of the first person on the household roster with origin reported. When race and Hispanic origin were missing for the whole household, the race distribution of the nearest previous household with reported race was used, regardless of Hispanic origin. See Figure 6-5, on page 6-14, for details.

Hispanic Origin. A value of origin (Hispanic or non-Hispanic) was imputed for 2.4 percent (unweighted) of the P sample. The procedure was analogous to that for imputing missing race. That is, whenever possible, origin was imputed from within the same household. If everyone in the household was missing origin, then the nearest previous household with reported origin and the same race category was used. When both Hispanic origin and race were missing for the whole household, the Hispanic origin distribution of the nearest previous household with reported Hispanic origin was used, regardless of race. For the imputation procedure and the race categories used in it, see Figure 6-6, on page 6-15.

For each of the five characteristics discussed, the distribution of imputed values did not necessarily mirror the distribution of reported values, nor was this expected. However, because the imputation rates were low in the P and E samples, the distributions before and after imputation were very similar, as shown in the table "Distribution of Characteristics Before and After Imputation."
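The race hot-deck steps just described (same household first, then the nearest previous household with the same origin, then the nearest previous household regardless of origin) can be sketched as follows. This is a minimal sketch, not the production procedure: the `Member` class, the assumption that households arrive in processing (sort) order, the choice to draw donors only from reported values, and the handling of a household with no earlier donor (left missing) are all illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    race: Optional[str] = None        # None: race not reported
    hispanic: Optional[bool] = None   # None: origin not reported


def household_origin(household: List[Member]) -> Optional[bool]:
    """Origin of the household: that of the first person on the roster with origin reported."""
    for m in household:
        if m.hispanic is not None:
            return m.hispanic
    return None


def impute_race(households: List[List[Member]], rng: random.Random) -> None:
    """Hot-deck race imputation, in place; households must be in processing order."""
    # Donor values come only from races that were actually reported.
    reported = [[m.race for m in hh if m.race is not None] for hh in households]
    origins = [household_origin(hh) for hh in households]
    for k, hh in enumerate(households):
        missing = [m for m in hh if m.race is None]
        if not missing:
            continue
        pool = reported[k]  # 1) another member of the same household
        if not pool:
            # 2) nearest previous household with reported race and the same origin,
            # 3) or, if this household's origin is also missing, any origin.
            for d in range(k - 1, -1, -1):
                if reported[d] and (origins[k] is None or origins[d] == origins[k]):
                    pool = reported[d]
                    break
        if not pool:
            continue  # no donor earlier in the file; left missing in this sketch
        for m in missing:
            m.race = rng.choice(pool)  # independent draw for each member


rng = random.Random(2000)
households = [
    [Member("White", False), Member("White", False)],
    [Member("Asian", True)],
    [Member(None, False), Member(None, None)],   # no race reported; origin non-Hispanic
    [Member("Black", False), Member(None, False)],
    [Member(None, None)],                        # race and origin both missing
]
impute_race(households, rng)
print([[m.race for m in hh] for hh in households])
# [['White', 'White'], ['Asian'], ['White', 'White'], ['Black', 'Black'], ['Black']]
```

Hispanic origin could be handled by the same skeleton with the roles of race and origin swapped.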

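Distribution of Characteristics Before and After Imputation [Weighted]

| Characteristic | P sample: before imputes | P sample: imputed | P sample: after imputes | E sample: before imputes | E sample: imputed | E sample: after imputes |
|---|---|---|---|---|---|---|
| Race (P: 1.4% imputed; E: 3.2% imputed) | | | | | | |
| White only | 73.5% | 67.5% | 73.4% | 76.9% | 57.2% | 76.2% |
| Black only | 11.0% | 10.2% | 11.0% | 11.8% | 6.6% | 11.6% |
| AIAN only | 0.6% | 0.7% | 0.6% | 0.8% | 0.8% | 0.8% |
| Asian only | 3.5% | 3.4% | 3.5% | 3.7% | 2.9% | 3.7% |
| NHPI only | 0.1% | 0.3% | 0.1% | 0.1% | 0.3% | 0.1% |
| Some other race only | 8.3% | 14.4% | 8.4% | 4.5% | 28.5% | 5.3% |
| Multiple races | 3.0% | 3.5% | 3.0% | 2.3% | 3.7% | 2.3% |
| Hispanic origin (P: 2.3% imputed; E: 3.4% imputed) | | | | | | |
| Hispanic | 12.4% | 11.5% | 12.4% | 12.5% | 9.0% | 12.4% |
| Age (P: 2.4% imputed; E: 2.9% imputed) | | | | | | |
| 0-17 | 26.1% | 21.7% | 26.0% | 25.9% | 19.7% | 25.7% |
| 18-29 | 16.7% | 18.9% | 16.7% | 15.5% | 19.0% | 15.6% |
| 30-49 | 30.7% | 33.0% | 30.8% | 31.0% | 30.9% | 31.0% |
| 50+ | 26.5% | 26.4% | 26.5% | 27.6% | 30.5% | 27.6% |
| Sex (P: 1.7% imputed; E: 0.2% imputed) | | | | | | |
| Male | 48.4% | 47.2% | 48.3% | 48.8% | 53.9% | 48.8% |
| Female | 51.6% | 52.8% | 51.7% | 51.2% | 46.1% | 51.2% |
| Tenure (P: 1.9% imputed; E: 3.6% imputed) | | | | | | |
| Owner | 68.4% | 70.3% | 68.4% | 69.9% | 65.1% | 69.7% |
| Nonowner | 31.6% | 29.7% | 31.6% | 30.1% | 34.9% | 30.3% |

AIAN = American Indian or Alaska Native; NHPI = Native Hawaiian or Other Pacific Islander.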
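Why the "after" columns barely move can be seen by treating the post-imputation distribution as a mixture of reported and imputed values. The sketch below assumes that the percentage shown beside each characteristic (for example, "Race, P: 1.4% imputed") is the weighted share of values that were imputed and that "before imputes" is the distribution among reported values; under that reading it reproduces several published "after imputes" figures to within about a tenth of a percentage point. The function name is hypothetical.

```python
def after_imputation(before_pct: float, imputed_pct: float, imputed_rate_pct: float) -> float:
    """Mix the reported and imputed distributions in proportion to the share imputed."""
    r = imputed_rate_pct / 100.0
    return (1.0 - r) * before_pct + r * imputed_pct


# P sample, "White only": 1.4 percent of race values were imputed.
print(round(after_imputation(73.5, 67.5, 1.4), 1))  # prints 73.4
```

Even a large gap between reported and imputed shares (for example, 4.5 versus 28.5 percent for "Some other race only" in the E sample) shifts the combined share by less than a percentage point when only about 3 percent of values are imputed.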

Assigning Probabilities for Unresolved Cases (Person Level)

After all follow-up activities were completed, there remained a small fraction of the A.C.E. sample without enough information to compute the components of the dual system estimator given in Chapter 7. Their status was said to be unresolved. A procedure called imputation cell estimation was used to assign probabilities for P-sample people with unresolved match or Census Day residence status, and for E-sample people with unresolved enumeration status.

All P- and E-sample persons, resolved and unresolved, were placed into groups called imputation cells based on operational and demographic characteristics. Different variables were used to define cells for P-sample match and residence status and for E-sample enumeration status. Within each imputation cell, the weighted average of 1s and 0s (representing, e.g., match and nonmatch, respectively) among the resolved cases was calculated, and that average was imputed for all unresolved persons in the cell.

Note that the noninterview adjustment factor was not incorporated into the person weights when these averages were calculated. This is because the noninterview adjustment was designed to spread the weight of noninterviewed housing units over interviewed housing units. However, all persons with resolved residence status in noninterviewed units were nonresidents (since, by definition, if one person in the household was a resident, then the household was considered an interview). Therefore, using the noninterview factor to calculate the averages for unresolved cases would have produced a biased estimate of residence probability. The issue of which weights to use was moot when resolving E-sample cases with missing enumeration status, as a noninterview adjustment was not applied to E-sample persons.

Thus, the weights $w_i$ used here incorporated all stages of sampling, including the selection of people for targeted extended search, but were not adjusted for household noninterviews. Any trimming of the weights was also performed before these weighted averages were calculated.

Unresolved Residence Status in the P Sample

After follow-up was completed, all persons in the P sample who were eligible to be matched to the census (see Chapter 4) were classified into three types according to their status as residents of their sampled block at the time of the census: Census Day residents, Census Day nonresidents, and unresolved persons (those for whom there was not enough information to determine residence status). The results are displayed in Table 6-3.
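The imputation cell estimation described at the start of this section reduces to a weighted mean per cell. The following minimal Python sketch assumes each case is a `(weight, status)` pair, with status 1 or 0 when resolved and `None` when unresolved, and that the weights are the $w_i$ described above (all sampling stages, no noninterview adjustment). The function name and the cell labels are hypothetical; how cells are actually defined differs for match, residence, and enumeration status.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence, Tuple

Case = Tuple[float, Optional[int]]  # (weight w_i, status 1/0, or None if unresolved)


def imputation_cell_estimation(cases: Sequence[Case], cells: Sequence[Hashable]) -> List[float]:
    """Keep resolved statuses; give each unresolved case its cell's weighted mean of 1s and 0s."""
    num = defaultdict(float)
    den = defaultdict(float)
    for (w, status), cell in zip(cases, cells):
        if status is not None:
            num[cell] += w * status
            den[cell] += w
    result = []
    for (_, status), cell in zip(cases, cells):
        if status is not None:
            result.append(float(status))
        elif den[cell] > 0:
            result.append(num[cell] / den[cell])
        else:
            raise ValueError(f"cell {cell!r} has no resolved cases")
    return result


probs = imputation_cell_estimation(
    [(1.0, 1), (3.0, 0), (2.0, None), (2.0, 1), (5.0, None)],
    ["a", "a", "a", "b", "b"],
)
print(probs)  # prints [1.0, 0.0, 0.25, 1.0, 1.0]
```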

Table 6-3. Final Residence Status for the P Sample by Mover Status [Unweighted]

| | Total people | Confirmed resident | Confirmed nonresident | Unresolved | Residence rate for resolved cases |
|---|---|---|---|---|---|
| U.S. total | 653,337 | 95.8% | 1.9% | 2.3% | 98.1% |
| Mover status: nonmover | 627,992 | 96.6% | 1.7% | 1.7% | 98.3% |
| Mover status: outmover | 25,345 | 75.2% | 7.5% | 17.4% | 91.0% |
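Reading the last column of Table 6-3 as confirmed residents divided by all resolved cases (confirmed residents plus confirmed nonresidents), the U.S. total can be checked from the rounded percentages, along with the share of unresolved people cited in the text that follows:

$$
\frac{95.8}{95.8 + 1.9} = \frac{95.8}{97.7} \approx 0.981, \qquad \frac{15{,}082}{653{,}337} \approx 0.023
$$

The same calculation for outmovers gives $75.2/82.7 \approx 0.909$ rather than the published 91.0 percent, presumably because the published rates were computed from unrounded counts.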

Because of the uncertainty about the actual status of the 15,082 people (2.3 percent of 653,337) with unresolved residence status, each was assigned a probability of being a Census Day resident (see equation (6.2)). Then, when computing the dual system estimate, all nonmovers and outmovers were included with their estimation weight (see Chapter 7) and the following residence probability:

$$
\mathrm{Pr}_{res,j} =
\begin{cases}
1, & \text{if person } j \text{ is a resident on Census Day} \\
0, & \text{if person } j \text{ is NOT a resident on Census Day} \\
\mathrm{Pr}^*_{res,j}, & \text{if person } j \text{ is unresolved}
\end{cases}
\tag{6.1}
$$

To assign $\mathrm{Pr}^*_{res,j}$ for unresolved cases, the Census Day residence probability for inmovers was irrelevant for estimation and was not used. Only nonmovers and outmovers in the P sample who had a resolved final residence status and went through the person matching operation (formally, those with a final match-code status) were used. They were placed into a number of imputation cells as defined in Table 6-4. Within each cell, among the resolved cases (those with $\mathrm{Pr}_{res,j}$ = 1 or 0), the weighted proportion of Census Day residents, that is, the weighted average of 1s and 0s, was computed:

$$
\mathrm{Pr}^*_{res,j} = \frac{\displaystyle\sum_{\text{resolved persons}} w_i\, \mathrm{Pr}_{res,j}}{\displaystyle\sum_{\text{resolved persons}} w_i}
\tag{6.2}
$$

where $w_i$ was defined at the beginning of this section. This proportion was then assigned as $\mathrm{Pr}^*_{res,j}$ to each unresolved case in the cell. (The exception is follow-up match code group 7; this is explained below.) The cells used to resolve residence status, along with the probabilities assigned to the unresolved cases, are given in Table 6-4.

Match code groups 1 through 7, which partition the population into mutually exclusive and exhaustive groups, were determined from the match codes and other variables derived before the follow-up operation, as explained in Chapter 4. Group 8 was formed differently. Some information from the follow-up operation was coded in time for the A.C.E. missing data procedures. (Under the original schedule, this information would have become available too late to be of use.) After the follow-up operation, a small number of people in the P sample were coded as being potentially fictitious or as said to be living elsewhere on Census Day. Such people were placed in Group 8, even though they also qualified for one of Groups 1 through 7.

The two tenure categories were owners and nonowners. Persons were placed into one of two race categories: non-Hispanic White and all others. People of multiple races (for example, a person responding as White and Asian) were placed in the latter group. V3 was a variable defined only for match code group 3, partial household nonmatches.

Table 6-4. Imputation Cells and Probabilities Assigned for Resolving Residence Status in the P Sample

| Match code group | Owner: non-Hispanic White | Owner: others | Nonowner: non-Hispanic White | Nonowner: others |
|---|---|---|---|---|
| 1 = Matches needing follow-up | 0.982 | 0.986 | 0.993 | 0.991 |
| 2 = Possible matches | 0.973 | 0.968 | 0.966 | 0.972 |
| 3 = Partial household nonmatches needing follow-up, V3a | 0.755 | 0.901 | 0.883 | 0.928 |
| 3 = Partial household nonmatches needing follow-up, V3b | 0.956 | 0.971 | 0.959 | 0.969 |
| 4 = Whole household nonmatches needing follow-up, not conflicting households | 0.920 | 0.943 | 0.911 | 0.914 |
| 5 = Nonmatches from conflicting households | 0.910 | 0.927 | 0.945 | 0.954 |
| 6 = Resolved before follow-up | 0.993 | 0.990 | 0.990 | 0.988 |
| 7 = Insufficient information for matching (weighted column average of groups 1-5 and 8) | 0.813 | 0.867 | 0.844 | 0.872 |
| 8 = Potentially fictitious or said to be living elsewhere on Census Day | 0.119 | 0.123 | 0.177 | 0.157 |

Note: V3a = group 3 persons age 18-29 listed as a child of the reference person; V3b = all other group 3 persons.

In the Dress Rehearsal, only three weighted ratios were calculated for residence probability: a ratio for persons sent to follow-up, a ratio for persons not needing follow-up, and an overall ratio used for persons with insufficient information for matching. Based on Dress Rehearsal results, Kearney and Ikeda (1999) suggested calculating separate ratios by match code group and splitting persons from conflicting households into a separate match code group. The larger A.C.E. sample size in Census 2000, compared with the Dress Rehearsal, made it possible to separate matches needing follow-up from possible matches. Further research and discussion suggested adding more variables within match code groups.

V3a comprised persons in group 3 who were 18 to 29 years of age and were listed on the A.C.E. household ros- Unresolved Match Status ter as a child of the reference person. V3b included all other persons in group 3. Computing the dual system estimator required measuring the total number of P-sample people who were matched to The residence probability for unresolved P-sample persons persons included in the census. (Separate estimates were was computed as described above, except for those in obtained for nonmovers and outmovers, but that does not match code group 7 - people with insufficient information affect what follows.) After follow-up activities were com-for matching. Within this set of four cells (see Table 6-4), pleted, each confirmed or possible (unresolved) Census there were almost no resolved cases from which to extract Day resident in the P sample was determined to be a a probability of being a Census Day resident. Because of match, a nonmatch, or unresolved (that is, persons for the lack of information - most of these cases did not even whom match status could not be determined). Match sta-have a valid name - these people did not go through the tus of confirmed Census Day nonresidents was not used in matching operation and were not sent to follow-up. To the estimation. As is seen in Table 6-5, unresolved adjust for these cases, a weighted proportion of Census matches were infrequent in the P sample.

Day residents (1s and 0s) was computed among the resolved cases in each of the four columns of Table 6-4 The treatment of unresolved matches was similar to that using match code groups 1 through 5 and 8. Separately for unresolved residence status. For each confirmed or for each of the four tenure race/ethnicity classes, the over- possible Census Day resident j in the P sample, the value all weighted probability of being a resident among those Prm, j was assigned as 1, 0, or Pr*m, j, in a manner analo-sent to follow-up (groups 1 through 5 and 8) was assigned gous to equation (6.1), according to whether the person to those with insufficient information for matching (group was a match, a nonmatch, or had unresolved match sta-7). Left out of this computation were those people who tus, respectively. Unresolved matches accounted for 7,826 were resolved before follow-up (group 6). Observations of 640,945 people in the P sample, or 1.2 percent. Pr*m, j from the Census 2000 Dress Rehearsal indicated that, in was assigned using imputation cell estimation based on terms of their demographic and operational characteris- those with a resolved match status. The formula is the tics, people in group 7 tend to be more like those in same as in equation (6.2), but pertains to match status, groups 1 through 5 and 8, than like those in group 6. that is, uses the values of Prm, j.
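The imputation-cell logic described above can be shown with a short Python sketch. It covers equation (6.1), equation (6.2), and the pooled column average used for match code group 7. Per the text, the same approach is reused for match status. This is an illustration only, not the Bureau's production code, and it rests on several assumptions:

- The record layout `(group, column, weight, status)` and all function names are hypothetical.
- `column` stands for one of the four tenure by race/ethnicity columns of Table 6-4.
- The V3 split of group 3 is omitted.
- The input is assumed to contain only the nonmovers and outmovers who went through person matching.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical record layout: (match_code_group, column, weight, status), where
# column names one tenure x race/ethnicity column of Table 6-4 and status is
# 1 (Census Day resident), 0 (nonresident) or None (unresolved).
Record = tuple[int, str, float, Optional[int]]

# Groups whose resolved cases are pooled to impute group 7 (group 6 is left out).
POOLED_GROUPS = {1, 2, 3, 4, 5, 8}


def weighted_share(pairs: list[tuple[float, int]]) -> Optional[float]:
    """Equation (6.2): sum(w_i * Pr_i) / sum(w_i) over resolved cases."""
    total = sum(w for w, _ in pairs)
    if total <= 0:
        return None  # no resolved cases in this cell
    return sum(w * s for w, s in pairs) / total


def residence_probabilities(records: list[Record]) -> list[Optional[float]]:
    """Return Pr_res,j for each record, following equation (6.1)."""
    by_cell = defaultdict(list)  # (group, column) -> [(weight, status)]
    pooled = defaultdict(list)   # column -> [(weight, status)] for groups 1-5 and 8
    for group, column, w, status in records:
        if status is not None:
            by_cell[(group, column)].append((w, status))
            if group in POOLED_GROUPS:
                pooled[column].append((w, status))

    probs: list[Optional[float]] = []
    for group, column, w, status in records:
        if status is not None:
            probs.append(float(status))  # resolved cases keep 1 or 0
        elif group == 7:
            probs.append(weighted_share(pooled[column]))  # pooled column average
        else:
            probs.append(weighted_share(by_cell[(group, column)]))
    return probs


records = [
    (1, "owner/NHW", 100.0, 1),
    (1, "owner/NHW", 100.0, 0),
    (1, "owner/NHW", 200.0, None),  # unresolved in group 1: 100 / 200
    (6, "owner/NHW", 50.0, 1),      # group 6 is not pooled for group 7
    (8, "owner/NHW", 200.0, 0),
    (7, "owner/NHW", 80.0, None),   # unresolved in group 7: 100 / 400
]
print(residence_probabilities(records))  # [1.0, 0.0, 0.5, 1.0, 0.0, 0.25]
```

Note that the group 7 case takes its value from the column-wide pool, not from its own (empty) cell. This mirrors the "weighted column average of groups 1-5 and 8" row of Table 6-4.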

Table 6-5. Final Match Status for the P Sample by Mover Status [Unweighted]

                             Number of   Final match status                Match rate for
P sample (confirmed or       persons     Match    Nonmatch   Unresolved    resolved cases
possible residents)                                          match
U.S. total                   640,945     90.3%    8.5%       1.2%          91.4%
Mover status
  Nonmover                   617,490     91.1%    8.0%       0.9%          91.9%
  Outmover                    23,455     67.8%    21.7%      10.5%         75.8%
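The last column of Table 6-5 is consistent with the match rate among resolved cases: matches divided by matches plus nonmatches, with unresolved cases left out. The quick Python check below assumes that definition. Using the published (rounded) percentages, it reproduces the column to one decimal place. The function name is illustrative.

```python
def resolved_match_rate(match_pct: float, nonmatch_pct: float) -> float:
    """Share of matches among resolved cases, in percent (unresolved cases excluded)."""
    return 100.0 * match_pct / (match_pct + nonmatch_pct)


rows = [("U.S. total", 90.3, 8.5), ("Nonmover", 91.1, 8.0), ("Outmover", 67.8, 21.7)]
for label, match_pct, nonmatch_pct in rows:
    print(f"{label}: {resolved_match_rate(match_pct, nonmatch_pct):.1f}%")
# U.S. total: 91.4%
# Nonmover: 91.9%
# Outmover: 75.8%
```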


As with residence status, the cases were first classified according to several characteristics. Within cells, the weighted proportion of matches among the resolved cases (excluding all confirmed Census Day nonresidents) was computed and assigned to each of the unresolved cases in the same cell. Again, the weights, $w_i$, are defined earlier.

The characteristics used to define the imputation cells for match status differ from those used for residence status and are shown in Table 6-6. They were based on observations from the Census 2000 Dress Rehearsal and an analysis of the A.C.E. operations. Kearney and Ikeda (1999) showed that mover status (nonmover versus outmover) discriminated well between matches and nonmatches among the resolved cases. The housing unit address match code refers to the initial match between housing units on the independent (A.C.E.) listing and the census address list; conflicting housing units were determined during A.C.E. person matching activities.

People with at least one imputed demographic variable (i.e., age, sex, race, Hispanic origin, or tenure) were grouped together for imputation of match status. Unpublished studies indicate that, at least in the Dress Rehearsal, the presence of these imputed characteristics among resolved cases is negatively associated with the propensity to be a match. For outmovers from a unit that was a nonmatch or a conflicting household, people were not separated according to their imputed characteristics. The reason was to maintain a reasonable number of resolved cases in each cell from which to estimate the weighted proportion of matches. The probabilities assigned to people with unresolved match status are provided in Table 6-6.

Most persons with unresolved match status (7,693 of the 7,826) had insufficient information for matching. Most of them did not have a valid name, and their rate of missing characteristics was much higher than average. Further, almost all of these people (7,506) were in match code group 7. As such, they did not go through the matching process and were not sent for follow-up. This information was taken into account when the cells for imputing match status were selected. Variables such as age and ethnicity, which had a high chance of being imputed and might be of questionable quality, were avoided.

In the Dress Rehearsal, one overall weighted ratio for match probability was calculated and used within each of the four geographic sites. Kearney and Ikeda (1999) suggest calculating separate ratios for outmovers and nonmovers.

Unresolved Enumeration Status (E Sample)

The dual system estimator also required the total number of correct enumerations in the E sample. As with the operations previously discussed, follow-up activities left each person in the E sample with one of three types of enumeration status: correct, erroneous, or unresolved. According to that status, the person was assigned a number, $Pr_{ce,j}$, equal to 1, 0, or $Pr^{*}_{ce,j}$, respectively, similar to equation (6.1). Table 6-7 shows the distribution of persons by enumeration status. The values of $Pr^{*}_{ce,j}$ for the 21,148 unresolved E-sample people (3.0 percent of 704,602) were determined through imputation cell estimation.

Table 6-6. Imputation Cells and Probabilities Assigned for Resolving Match Status in the P Sample

                 Housing unit address match code
                 Housing unit was a match           Housing unit was a nonmatch or the household
                 (code 1)                           is conflicting (code 2 or 4)
Mover status     No imputes    1 or more imputes    No imputes    1 or more imputes
Nonmover         0.945         0.901                0.690         0.567
Outmover         0.798         0.791                0.516 (not split by imputes)

Table 6-7. Final Enumeration Status for the E Sample [Unweighted]

                 Number of    Final enumeration status                   Correct enumeration
E sample         persons      Correct       Erroneous     Unresolved      rate for resolved
                              enumeration   enumeration   enumeration     cases
U.S. total       704,602      92.6%         4.4%          3.0%            95.5%
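Table 6-6 works as a lookup keyed on three things: mover status, the housing unit address match code, and whether the person has imputed characteristics. The outmover cell for nonmatching or conflicting units is deliberately not split by imputes. The Python sketch below shows that lookup for an unresolved P-sample person. The key encoding and the function name are hypothetical, and the sketch covers only the cell lookup, not the wider estimation.

```python
from typing import Optional

# Imputed match probabilities from Table 6-6, keyed by
# (mover status, housing unit matched?, at least one imputed characteristic?).
# None means the cell is not split by imputed characteristics.
TABLE_6_6: dict[tuple[str, bool, Optional[bool]], float] = {
    ("nonmover", True, False): 0.945,
    ("nonmover", True, True): 0.901,
    ("nonmover", False, False): 0.690,
    ("nonmover", False, True): 0.567,
    ("outmover", True, False): 0.798,
    ("outmover", True, True): 0.791,
    ("outmover", False, None): 0.516,
}


def imputed_match_probability(mover_status: str, hu_matched: bool, has_imputes: bool) -> float:
    """Look up Pr*_m,j for an unresolved P-sample person.

    hu_matched is True for housing unit address match code 1 and False for
    codes 2 or 4 (housing unit nonmatch or conflicting household).
    """
    if mover_status == "outmover" and not hu_matched:
        # Not split by imputes, to keep enough resolved cases in the cell.
        return TABLE_6_6[("outmover", False, None)]
    return TABLE_6_6[(mover_status, hu_matched, has_imputes)]


print(imputed_match_probability("nonmover", False, True))  # 0.567
```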


The resolved and unresolved cases were placed in the cells shown in Table 6-8. Within each cell, the weighted proportion of correct enumerations among resolved cases was computed before accounting for duplication with non-E-sample people, analogous to equation (6.2). This proportion was then assigned to each unresolved case in the cell.

As with residence status for P-sample people, a key factor in determining enumeration status was the E-sample person's match code. These codes can be found in Chapter 4. People were placed in match code groups in the following sequence:
1) People coded as potentially fictitious or as said to be living elsewhere on Census Day (based on information collected during the follow-up operation) were placed in groups 11 and 12, respectively.
2) All other people included in the targeted extended search operation were placed in group 10. See Chapter 5 for details.
3) People in the remainder of the E sample were then placed in the appropriate match code group, as defined in Table 6-8.

The cells were also defined by three other characteristics: the presence or absence of imputed characteristics (as used to define cells for match status); whether the person was non-Hispanic White or any other race-ethnicity combination; and V3, as defined in the section on residence status.

An additional adjustment was made to the enumeration probability of E-sample people who duplicated persons subsampled out of the E sample in large clusters. Suppose the same identity was assigned to u E-sample persons and to v persons who were subsampled out of the E sample. Then 1) one of the u E-sample persons was selected during the person matching operation, and 2) during the missing data activities, that selected person's initial correct enumeration probability was multiplied by u/(u + v), because it was not known which person was the actual E-sample person. The other u - 1 E-sample persons were assigned a correct enumeration probability of 0.
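The duplication rule can be written as a small Python function. This sketch reads the rule as follows: the selected person's correct enumeration probability is scaled by u/(u + v), and the remaining u - 1 copies are set to 0. The function name and its argument layout are hypothetical.

```python
def duplicate_ce_probabilities(initial_prob: float, u: int, v: int) -> list[float]:
    """Correct enumeration probabilities for the u E-sample copies of one identity.

    initial_prob: the selected person's probability before the adjustment.
    u: number of E-sample persons sharing the identity.
    v: number of persons with the same identity subsampled out of the E sample.
    The first list entry is the selected person; the rest are the other copies.
    """
    if u < 1 or v < 0:
        raise ValueError("need u >= 1 and v >= 0")
    selected = initial_prob * u / (u + v)  # scaled by u / (u + v)
    return [selected] + [0.0] * (u - 1)    # the other u - 1 copies get 0


print(duplicate_ce_probabilities(0.96, 2, 2))  # [0.48, 0.0]
```

When no copies were subsampled out (v = 0), the selected person keeps the initial probability unchanged.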

Table 6-8. Probabilities Assigned for Resolving Enumeration Status in the E Sample

                                                          No imputed                  1 or more imputed
Match code group                                          characteristics             characteristics
1 = Matches needing follow-up                             0.977                       0.977
2 = Possible matches                                      0.968                       0.968
3 = Partial household nonmatches                          V3a*: 0.871                 V3a*: 0.908
                                                          V3b*: 0.974                 V3b*: 0.960
4 = Whole household nonmatches where the housing unit     Non-Hispanic White: 0.958
    matched; not conflicting households                   Others: 0.974               0.965
5 = Nonmatches from conflicting households; for housing
    units not in regular nonresponse follow-up            0.975                       0.965
6 = Nonmatches from conflicting households; housing
    units in regular nonresponse follow-up                0.914                       0.926
7 = Whole household nonmatches, where the housing unit    Non-Hispanic White: 0.950
    did not match in housing unit matching                Others: 0.947               0.959
8 = Resolved before follow-up                             Non-Hispanic White: 0.979
                                                          Others: 0.990               0.995
9 = Insufficient information for matching                 0.000
10 = Targeted extended search people                      0.928                       0.858
11 = Potentially fictitious people                        0.058                       0.088
12 = People said to be living elsewhere on Census Day     0.229                       0.210

  • V3a = Group 3 persons age 18-29 listed as child of reference person; V3b = All other group 3 persons.

Figure 6-1. Adjustment for Noninterviews: An Example

Consider a block cluster with nine housing units, all having the same type of basic address (for example, all single-family homes), as depicted below.

Housing  Actual situation                  A.C.E. interview (and            Status of      Status of A.C.E.  Weight
unit                                       information from)                Census Day     Interview Day
                                                                            interview      interview
1        Resident on 4/1/00 and at time    Interviewed in A.C.E.            Interview      Interview         100
         of A.C.E. interview
2        Resident on 4/1/00 and at time    Neighbor (proxy) interviewed     Interview      Interview         100
         of A.C.E. interview               in A.C.E.
3        Resident on 4/1/00 and at time    No one interviewed in A.C.E.     Noninterview   Noninterview      100
         of A.C.E. interview
4        Vacant on 4/1/00, resident at     Interviewed in A.C.E., knows     Vacant         Interview         100
         time of A.C.E. interview          of 4/1/00 status
5        Vacant on 4/1/00, resident at     Interviewed in A.C.E., no        Noninterview   Interview         100
         time of A.C.E. interview          knowledge of 4/1/00 status
6        Vacant on 4/1/00, resident at     No one interviewed in A.C.E.     Noninterview   Noninterview      100
         time of A.C.E. interview
7        Resident on 4/1/00, vacant at     Information obtained from        Interview      Vacant            100
         time of A.C.E. interview          proxy
8        Resident on 4/1/00, vacant at     No information on 4/1/00         Noninterview   Vacant            100
         time of A.C.E. interview          status; Census staff determines
                                           vacant at time of A.C.E.
9        Resident on 4/1/00, different     Interviewed in A.C.E., knows     Interview      Interview         100
         resident at time of A.C.E.        of 4/1/00 status
         interview

Note: In this noninterview cell (sample block cluster × type of basic address), people in interviewed housing units would have received the following noninterview adjustments:

a) To the person weights of nonmovers and outmovers, Census Day noninterview adjustment = 800 / 400 = 2.

b) To the person weights of inmovers, A.C.E. Interview Day noninterview adjustment = 700 / 500 = 1.4.
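The two factors in the note can be reproduced with one rule: divide the weighted count of occupied units (interviews plus noninterviews) by the weighted count of interviews, excluding vacant units. This reading is inferred from the 800/400 and 700/500 arithmetic in the example. The Python below is a sketch under that assumption, not the production procedure. It applies the rule separately to the Census Day statuses and to the Interview Day statuses of Figure 6-1.

```python
# (Census Day status, A.C.E. Interview Day status, weight) for units 1-9 of Figure 6-1.
UNITS = [
    ("Interview", "Interview", 100),
    ("Interview", "Interview", 100),
    ("Noninterview", "Noninterview", 100),
    ("Vacant", "Interview", 100),
    ("Noninterview", "Interview", 100),
    ("Noninterview", "Noninterview", 100),
    ("Interview", "Vacant", 100),
    ("Noninterview", "Vacant", 100),
    ("Interview", "Interview", 100),
]


def noninterview_adjustment(statuses_and_weights):
    """Weighted (interviews + noninterviews) / weighted interviews; vacant units are ignored."""
    interviewed = sum(w for status, w in statuses_and_weights if status == "Interview")
    occupied = sum(w for status, w in statuses_and_weights
                   if status in ("Interview", "Noninterview"))
    return occupied / interviewed


census_day = noninterview_adjustment([(cd, w) for cd, _, w in UNITS])
interview_day = noninterview_adjustment([(iday, w) for _, iday, w in UNITS])
print(census_day, interview_day)  # 2.0 1.4
```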


Figure 6-2. Imputation of Age in the P Sample

[Flowchart]

Decision points:
  • How many persons live in this household? (one; more than one)
  • Is there a reference person in the household?
  • Is there more than one reference person in the household?
  • Is the reference person missing age?
  • What is the value of relationship? (0 (missing); 1, 2, 3, or 4; 5, 6, or 9)

Actions:
  • Designate the first reference person on the roster as the reference person.
  • Impute from the age distribution in one-person households.
  • Impute the age of the reference person from the age distribution of reference persons.
  • Impute age from the distribution of age among persons with the same value of relationship (3 distributions).
  • Impute age from the distribution of age among persons with the same value of relationship with a reference person of the same age (16 distributions).
  • Impute age from the distribution of age among persons with the same value of relationship (4 distributions).
  • Impute age from the distribution of age among persons in multiperson households (excluding reference persons).

Note: For imputation purposes, age is imputed into 1 of 4 categories (0-17, 18-29, 30-49, or 50 and older). Relationship (to reference person) has the following possible values for imputation purposes: 0 = Missing, 1 = Spouse, 2 = Child, 3 = Sibling, 4 = Parent, 5 = Other relative, 6 = Nonrelative, 9 = A.C.E. reference person.

Figure 6-3. Imputation of Sex in the P Sample

[Flowchart]

Decision points:
  • How many persons live in this household? (one; more than one)
  • What is the value of relationship? (1 (spouse) or 9 (reference person); 2, 3, 4, 5, or 6 (all other); 0 (missing))
  • Does someone in the household have a relationship of spouse?
  • Is sex missing for both the reference person and the spouse?

Actions:
  • Impute from the sex distribution in one-person households.
  • 1. Impute the sex of the reference person from the distribution¹ of reference persons with spouse present. 2. Next, impute the opposite sex for the spouse.
  • Impute the opposite sex for the person missing sex.
  • Impute the sex of the reference person from the distribution¹ of reference persons with spouse absent.
  • Impute sex from a distribution¹ that excludes reference persons, spouses, and persons with missing relationship.
  • Impute sex from a distribution¹ that excludes reference persons.

Relationship (to reference person) has the following possible values for imputation purposes: 0 = Missing, 1 = Spouse, 2 = Child, 3 = Sibling, 4 = Parent, 5 = Other relative, 6 = Nonrelative, 9 = A.C.E. reference person.

¹ This distribution only includes persons in multiperson households (i.e., it excludes persons in one-person households).

Figure 6-4. Imputation of Tenure in the P Sample

START. Is this the first person in the household?
  Yes: Is this person missing tenure?
      Yes: Copy the tenure (collapsed to owner/nonowner) from the person in the previous household with the same type of basic address (single family, apartments, other units).
      No: Proceed to next person.
  No: Copy the tenure (collapsed to owner/nonowner) of the first person in the household on file to this person.
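The tenure flow in Figure 6-4 is simple enough to sketch in Python. The sketch assumes the following; the dict layout and function name are illustrative:

- Households arrive in file order as hypothetical dicts, and tenure is already collapsed to owner/nonowner.
- "The previous household with the same type of basic address" means the most recent such household whose tenure is known.
- A first person with no earlier household of the same type stays missing, since the figure does not say what happens in that case.

```python
def impute_tenure(households: list[dict]) -> None:
    """Fill tenure in place following the flow of Figure 6-4 (sketch).

    Each household is a hypothetical dict:
    {"address_type": "single family" | "apartments" | "other units",
     "persons": [{"tenure": "owner" | "nonowner" | None}, ...]}
    """
    # Tenure of the first person in the most recent household of each
    # address type (households whose tenure stays missing are skipped).
    last_by_type = {}
    for hh in households:
        persons = hh["persons"]
        if not persons:
            continue
        first = persons[0]
        if first["tenure"] is None:
            # First person missing tenure: copy from the previous household
            # with the same type of basic address (None if there is none).
            first["tenure"] = last_by_type.get(hh["address_type"])
        for person in persons[1:]:
            # Every later person receives the first person's tenure.
            person["tenure"] = first["tenure"]
        if first["tenure"] is not None:
            last_by_type[hh["address_type"]] = first["tenure"]


households = [
    {"address_type": "single family", "persons": [{"tenure": "owner"}, {"tenure": None}]},
    {"address_type": "apartments", "persons": [{"tenure": "nonowner"}]},
    {"address_type": "single family", "persons": [{"tenure": None}, {"tenure": "nonowner"}]},
]
impute_tenure(households)
print([[p["tenure"] for p in hh["persons"]] for hh in households])
# [['owner', 'owner'], ['nonowner'], ['owner', 'owner']]
```

Following the figure, a later person's reported tenure is overwritten by the first person's value. This is why the last household ends up as all owners.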

Figure 6-5. Imputation of Race in the P Sample

START. Is everyone in the household missing race?
  No: Impute race from within this household.
      1. Randomly select one person with reported race as the donor.
      2. Give the selected donor's value of race to the donee.
  Yes: Is everyone in the household missing Hispanic origin?
      Yes: 1. Select the nearest previous household with at least one person with reported race.
           2. From this household, randomly select one person with reported race as the donor.
           3. Give the selected donor's value of race to the donee.
      No:  1. Select the nearest previous household with the same value of Hispanic origin and at least one person with reported race.
           2. From this household, randomly select one person with reported race as the donor.
           3. Give the selected donor's value of race to the donee.

Figure 6-6. Imputation of Hispanic Origin in the P Sample

START. Is everyone in the household missing Hispanic origin?
  No: Impute Hispanic origin from within this household.
      1. Randomly select one person with reported Hispanic origin as the donor.
      2. Give the selected donor's value of Hispanic origin to the donee.
  Yes: Is everyone in the household missing race?
      Yes: 1. Select the nearest previous household with at least one person with reported Hispanic origin.
           2. From this household, randomly select one person with reported Hispanic origin as the donor.
           3. Give the selected donor's value of Hispanic origin to the donee.
      No:  1. Select the nearest previous household with the same value of race¹ and at least one person with reported Hispanic origin.
           2. From this household, randomly select one person with reported Hispanic origin as the donor.
           3. Give the selected donor's value of Hispanic origin to the donee.

¹ For imputing Hispanic origin, a household's race category, determined by the first person in the household with reported race, is coded as one of the following four values: 1) missing; 2) white; 3) other race, or white and other race; or 4) any of the remaining race categories.
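Figures 6-5 and 6-6 describe the same sequential hot-deck pattern, with the roles of race and Hispanic origin swapped. The Python sketch below covers both. It is not the Bureau's implementation, and it rests on these assumptions:

- Each household is a list of person dicts, in file order.
- "Nearest previous household" means the closest earlier household in that order.
- Each donee draws its own random donor.
- Only originally reported values may serve as donors.
- The household-level matching key comes from a caller-supplied function. The `first_reported` helper is a hypothetical stand-in that uses the first reported value. For Hispanic origin, the footnote's four-value race category would be used instead.

```python
import random
from typing import Callable, Hashable, Optional

Person = dict
Household = list[Person]


def hot_deck(households: list[Household], target: str,
             key_fn: Callable[[Household], Optional[Hashable]],
             rng: random.Random) -> None:
    """Impute a missing characteristic in place, following Figures 6-5 and 6-6 (sketch).

    target: the variable to impute, e.g. "race" or "hispanic".
    key_fn: returns the household-level value a donor household must share
            (Hispanic origin for race, race category for Hispanic origin), or
            None if that value is missing for everyone in the household.
    """
    # Only values reported before imputation may serve as donors.
    reported = [[p.get(target) is not None for p in hh] for hh in households]
    for h, hh in enumerate(households):
        donors = [p for p, ok in zip(hh, reported[h]) if ok]
        if not donors:
            # Everyone here is missing `target`: search backwards for the nearest
            # household with a reported value (and the same key, if this one has one).
            key = key_fn(hh)
            for g in range(h - 1, -1, -1):
                prev = [p for p, ok in zip(households[g], reported[g]) if ok]
                if prev and (key is None or key_fn(households[g]) == key):
                    donors = prev
                    break
        for p, ok in zip(hh, reported[h]):
            if not ok and donors:
                # Each donee receives the value of one randomly selected donor.
                p[target] = rng.choice(donors)[target]


def first_reported(var: str) -> Callable[[Household], Optional[Hashable]]:
    """Hypothetical key: value of `var` for the first person who reported it."""
    def key(hh: Household) -> Optional[Hashable]:
        return next((p[var] for p in hh if p.get(var) is not None), None)
    return key


households = [
    [{"race": "A", "hispanic": "no"}],
    [{"race": "B", "hispanic": "yes"}],
    [{"race": None, "hispanic": "no"}],
    [{"race": "C", "hispanic": "yes"}, {"race": None, "hispanic": "yes"}],
    [{"race": None, "hispanic": None}],
]
hot_deck(households, "race", first_reported("hispanic"), random.Random(2000))
print([[p["race"] for p in hh] for hh in households])
# [['A'], ['B'], ['A'], ['C', 'C'], ['C']]
```

In the example, every donor pool has a single person, so the result does not depend on the random seed. The third household skips household 2 (different Hispanic origin) and takes its race from household 1. The last household has no Hispanic origin to match on, so it uses the nearest earlier household with a reported race.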

Chapter 7.

Dual System Estimation

INTRODUCTION

Dual System Estimation (DSE) was used to estimate the coverage of Census 2000 using data from the Accuracy and Coverage Evaluation (A.C.E.) Survey. The U.S. Census Bureau also used DSE to estimate census coverage for the 1980 and 1990 censuses, and to evaluate coverage before 1980. Fay et al. (1988) describe the use of DSE to measure coverage in 1980, and Hogan (1992, 1993) describes its use in 1990. As described in Killion (1998), several alternatives to DSE were considered for Census 2000. These alternatives were either shown to produce results grossly inferior to DSE, or the research on them was not conclusive.

This chapter provides the details of DSE for the Census 2000 A.C.E. The DSE was calculated separately for a set of population groups referred to as post-strata. The post-stratification variables and the final post-stratification plan are discussed in detail. In addition, the variance estimation methodology used in each post-stratum is summarized and some basic results are given.

DUAL SYSTEM ESTIMATION

This section contains the details of the DSE calculated within each final post-stratum. It describes the basic DSE model, including a discussion of the advantage of post-stratification. It then presents the details of the DSE computed within each final post-stratum for Census 2000 and defines all components of the DSE. The DSE accounted for special handling of missing data, search areas for matching, and movers. Missing data and search areas for matching are covered in detail in Chapters 6 and 5, respectively. The method used to handle the special problems that movers caused in the Census 2000 DSE is also discussed. The attachment provides detailed background on options for dealing with movers in census coverage measurement surveys. The section concludes with a short discussion of how the DSE results serve as input to synthetic estimation down to the block level. A detailed discussion of synthetic estimation is provided in Chapter 8 and Haines (2001).

DSE Model

The DSE model is discussed in detail in Wolter (1986) and more generally in Hogan (1992). This chapter gives a general presentation. The DSE model (applied within each post-stratum) treats each person as having a probability of being either in or not in the census enumeration, and either in or not in the A.C.E.

Table 7-1. DSE Model

                  In census    Out of census    Total
In A.C.E.         N11          N12              N1+
Out of A.C.E.     N21          N22              N2+
Total             N+1          N+2              N++

All cells are conceptually observable except $N_{22}$ and the marginal cells that include $N_{22}$ (i.e., $N_{2+}$, $N_{+2}$, and $N_{++}$). The model assumes independence between the census and the A.C.E. This means that the probability of being in the ijth cell, $p_{ij}$, is the product of the marginal probabilities, $p_{i+}p_{+j}$. Under the independence assumption, the estimate of total population in a post-stratum is

$$\hat{N}_{++} = DSE = \frac{N_{+1} N_{1+}}{N_{11}}$$

The independence assumption can fail for two reasons: causal dependence between the census enumeration and the A.C.E. enumeration, or heterogeneity in capture probabilities within a post-stratum. Causal dependence occurs when an individual's inclusion in or exclusion from one system affects his or her probability of inclusion in the other system. For example, some people who did answer the census may not have cooperated with the A.C.E., thinking they had helped enough. As another example, a person contacted during A.C.E. listing may not have responded to the census, thinking that the A.C.E. lister had already recorded them. However, even if causal independence holds for all individuals ($p_{ij} = p_{i+}p_{+j}$), heterogeneity can still violate the independence assumption. For the assumption to hold, either the census inclusion probabilities $p_{+1}$ or the A.C.E. inclusion probabilities $p_{1+}$ must be the same for all individuals; homogeneity in both systems is not required. For example, some people may try their best to avoid being counted in both the census and the A.C.E., which gives them much smaller inclusion probabilities than other people. An error in the independence assumption, for either reason, results in correlation bias.

To decrease correlation bias, individuals likely to have similar inclusion probabilities were grouped into post-strata, and DSEs were calculated within each post-stratum.

Research was carried out to determine effective variables for the A.C.E. post-stratification design. All variables included in the 1990 PES post-stratification were considered, as were several new ones. The specific variables considered were race/Hispanic origin, age/sex, tenure, household composition, relationship, urbanicity, percent owner, return rate, percent minority, type of enumeration area, household size, hard-to-count scores, census division, census region, and regional census center. From these variables, fifteen post-stratification options were developed for empirical research. For each option, mean square errors of total population estimates and synthetic estimates were computed at the national, state, and congressional district levels, as well as for selected cities. The major conclusions were as follows:

  • The demographic variables used in the 1990 PES were effective but did not fully capture the geographic differences, especially those affected by the quality of the Master Address File. An urbanicity/type of enumeration area variable appeared to capture much of the geographic differences.

  • The tract-level return rate variable captured some of the socioeconomic differences for synthetic estimates at lower levels of aggregation.

Details of the Census 2000 post-stratification research methodology are given in Kostanich et al. (1999) and Griffin (1999). Results of this research are given in Griffin and Haines (2000) and Schindler (2000). The post-stratification design chosen for Census 2000 is provided in this chapter. The DSE can be written as follows:

$$DSE = N_{+1} \left( \frac{N_{1+}}{N_{11}} \right)$$

That is, the total population is estimated by the number captured in the census times the ratio of those captured in the A.C.E. survey to those captured in both systems. In practice, the components of the DSE are estimated from a sample survey. $N_{+1}$ is not the census count. The census count (C) must be corrected for erroneous enumerations and for persons enumerated in the census with insufficient information to match to the A.C.E. enumeration. To estimate the number of people correctly enumerated in the census, a sample of all data-defined persons is selected. This sample of data-defined census persons is called the enumeration or E sample. To estimate the ratio of those captured in both systems to those captured in the A.C.E., the population or P sample is used. The P sample consists of persons interviewed during A.C.E. enumeration. Census coverage measurement surveys such as the A.C.E. use the DSE in the following form:

$$DSE = DD \times \frac{CE}{N_e} \times \frac{N_p}{M}$$

where
DD = the number of census data-defined persons eligible and available for A.C.E. matching,
CE = the estimated number of correct enumerations from the E sample,
Ne = the estimated number of people from the E sample,
Np = the estimated total population from the P sample,
M = the estimated number of persons from the P-sample population who match to the census.

Note: For the A.C.E., persons in Group Quarters are excluded from all the above counts. So are persons in housing units that were added to the census after E-sample Identification (late adds).

Definitions

Block Cluster. A grouping of one or more census blocks. Block clusters are the primary sampling units for the A.C.E. and average about 30 housing units each.

Correct Enumeration (CE). A person who is enumerated in a sample block cluster during the census and whom A.C.E. operations also determine to have lived in that block cluster (or, if appropriate, a surrounding block) on Census Day. Correct enumerations have a correct enumeration probability, $Pr_{ce,j}$, equal to 1 for each person j.

Correct Enumeration Probability ($Pr_{ce,j}$). The probability that person j in the E sample was correctly enumerated in the A.C.E. block cluster (or a surrounding block). The probability of correct enumeration is typically 0 or 1, but missing data imputation can give it values between 0 and 1.

Coverage Correction Factor (CCF). The coverage correction factor for a post-stratum is calculated by dividing the DSE for that post-stratum by its census count. A.C.E. synthetic estimates for any data item for any geographic area are obtained by multiplying the coverage correction factor by the census count within each post-stratum, then summing over all post-strata (see Chapter 8 for details on synthetic estimation).

Data-Defined Person. This concept is defined for all census persons. A data-defined person is a person who has two or more of the 100-percent data items answered on the census form. The items can be any of the 100-percent data items, which include name, age, sex, race, and Hispanic origin. Relationship to person one is also a 100-percent data item for all persons other than person one. Persons not satisfying this criterion are referred to as non-data-defined.

E Sample. The E sample is the Enumeration sample. It consists of all data-defined persons in the A.C.E. block clusters who were enumerated in the census.


Group Quarters (GQ) Persons. Persons living in GQs, such as college dormitories, prisons, or military barracks. GQ persons were not covered in the A.C.E. and are excluded from the A.C.E. universe.

Inmover. A person who moved into a P-sample housing unit after Census Day.

Insufficient Information in Census (II). Those persons in the census for whom there is insufficient information for inclusion in the E sample. Very little data is available for these persons. This category includes non-data-defined persons and persons in whole-household imputations.

Note that insufficient information in census is different from insufficient information for matching. The former are excluded from the E sample and the latter are included in the E sample.

Late Adds. Late Adds are persons in housing units that were added to the census after E-sample Identification. These housing units had an unknown final status at the time of A.C.E. matching but were subsequently included in the census. Persons who are Late Adds were ineligible for matching and, therefore, not included in the census DSE component.

Match Probability ($\Pr_{m,j}$). The probability that person j in the P sample was matched to a census person in the search area (or in a TES block). The match probability is typically 0 or 1, but it can take on values within this range due to missing data imputation.

Mover Status. Each person in the P sample was classified as a nonmover, outmover, or inmover.

Nonmover. An A.C.E. sample person whose housing unit on Census Day and on A.C.E. Interview Day is the same.

Outmover. A person who moved out of an A.C.E. housing unit between Census Day and the date of the A.C.E. interview.

P Sample. Also known as the Population sample. The P sample consists of those persons confirmed to be residents of the housing units in the A.C.E. block clusters as of Census Day by the independent portion of the A.C.E. reinterview and subsequent operations.

Residence Probability ($\Pr_{res,j}$). The probability that person j on the P-sample file is a resident of the sample household on Census Day. All inmovers are assumed to be A.C.E. Interview Day residents. Nonmovers and outmovers can be Census Day nonresidents if information indicates they were not residents of the sample household under census residency rules. The residence probability is typically 0 or 1, but it can take on values within this range due to missing data imputation.

Targeted Extended Search (TES). The A.C.E. operation in which block clusters are identified and selected for a search of the immediate surrounding area to find persons geographically mislocated in a block neighboring the A.C.E. block cluster. More generally, the term refers to the methodology for targeting, sampling, and implementing the search operations in the field.

DSE Formula

The DSE for any given post-stratum was calculated by:

$$\mathrm{DSE} = DD \times \frac{CE}{N_e} \times \left[ \frac{N_n + N_i}{M_n + \dfrac{M_o}{N_o}\, N_i} \right]$$

All counts and estimates are for a specific post-stratum, and the subscripts n, i, and o stand for nonmovers, inmovers, and outmovers, respectively. Adjustments to this DSE were occasionally made to avoid the unlikely event that the formula results in division by zero. For post-strata with fewer than ten (unweighted) outmover persons, the ratio inside the square brackets was changed to the following:

$$\frac{N_n + N_o}{M_n + M_o}$$

Coverage Correction Factor Formula

The coverage correction factor (CCF) is a measure of the net overcount or net undercount of the household population within the census. The CCF for a post-stratum is the ratio of the DSE to the census count:

$$\mathrm{CCF} = \frac{\mathrm{DSE}}{C}$$

where
C = the final census household population count, with $C = DD + II + LA$,
II = the number of census people with insufficient information, and
LA = the number of people added (late) to the census and not available for A.C.E. matching. Late Adds include both data-defined and non-data-defined records.

Note: The numerator of the CCF is based on data-defined persons. The denominator includes data-defined and non-data-defined persons as well as late adds. Thus, we are implicitly assuming that the coverage of late adds and non-data-defined persons is the same as that of data-defined persons. For example, a coverage correction factor of 1.05 implies that for every 100 people counted in the census within the given post-stratum, the DSE estimates 105 people, a net undercount of five persons.

DSE Components

Each component of the DSE is described in the following paragraphs.
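The DSE and CCF formulas above can be checked numerically with a short sketch that takes the post-stratum component totals as inputs. The function and argument names, and the numbers in the usage lines, are illustrative assumptions rather than Census Bureau code, and the occasional adjustments against division by zero are not reproduced.

```python
def dual_system_estimate(dd, ce, n_e, n_n, m_n, n_i, n_o, m_o, outmovers_unweighted):
    """Post-stratum DSE from the component totals defined in this chapter.

    dd: unweighted census count of data-defined persons (DD)
    ce, n_e: weighted correct enumerations and E-sample total (CE, N_e)
    n_n, m_n: weighted P-sample nonmovers and nonmover matches (N_n, M_n)
    n_i: weighted P-sample inmovers (N_i)
    n_o, m_o: weighted P-sample outmovers and outmover matches (N_o, M_o)
    outmovers_unweighted: unweighted outmover count in the post-stratum
    """
    correct_enumeration_rate = ce / n_e
    if outmovers_unweighted < 10:
        # Fewer than ten unweighted outmovers: (N_n + N_o) / (M_n + M_o).
        bracket = (n_n + n_o) / (m_n + m_o)
    else:
        # Procedure C: inmovers give the mover count, outmovers the match rate.
        bracket = (n_n + n_i) / (m_n + (m_o / n_o) * n_i)
    # A zero denominator raises ZeroDivisionError in this sketch.
    return dd * correct_enumeration_rate * bracket


def coverage_correction_factor(dse, dd, ii, la):
    """CCF = DSE / C, where C = DD + II + LA."""
    return dse / (dd + ii + la)


# Illustrative totals only (not A.C.E. results).
dse = dual_system_estimate(dd=960, ce=900, n_e=1000, n_n=900, m_n=720,
                           n_i=100, n_o=50, m_o=40, outmovers_unweighted=25)
ccf = coverage_correction_factor(dse, dd=960, ii=25, la=15)
print(round(dse, 6), round(ccf, 6))  # 1080.0 1.08
```

Here C = 960 + 25 + 15 = 1,000, so a CCF of 1.08 reads as a net undercount of about 8 persons per 100 counted in the census.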


DD is the census count (unweighted) of data-defined persons in the post-stratum.

The estimated number of E-sample persons is written as:

$$N_e = \sum_{j \in \text{E sample}} W_j^*$$

where $W_j^*$ = the inverse of the probability of selection, including a factor for Targeted Extended Search sampling.

The estimated number of correct enumerations is calculated as:

$$CE = \sum_{j \in \text{E sample}} \Pr_{ce,j}\, W_j^*$$

where $\Pr_{ce,j}$ is:
1 if person j was correctly enumerated,
0 if person j was NOT correctly enumerated, or
$\Pr^*_{ce,j}$ if person j is unresolved, where $\Pr^*_{ce,j}$ is estimated through missing data imputation.

Note: Probabilities for persons with unresolved final correct enumeration status in the E sample, or unresolved final residence or match status in the P sample, are assigned using imputation cell estimation within groups. See Chapter 6 for details. Within each group, a probability equal to a simple proportion is imputed for unresolved persons. For example, E-sample (or P-sample) persons in a group with unresolved enumeration (match) status were assigned a correct enumeration (match) probability equal to the proportion of correct enumerations (matches) among persons with resolved enumeration (match) status in the group. These probabilities are denoted in the DSE formulas as:
$\Pr^*_{m,j}$, the estimated match probability for unresolved match status,
$\Pr^*_{res,j}$, the estimated residence probability for unresolved residence status, and
$\Pr^*_{ce,j}$, the estimated enumeration probability for unresolved enumeration status.

Some persons moved between Census Day and A.C.E. Interview Day. A mover is a person whose location on the day of the A.C.E. interview differs from his or her location on Census Day. The treatment of movers has important ramifications for estimation. The attachment to this chapter titled "The Effect of Movers on Dual System Estimation" discusses alternative methodologies for handling movers. For Census 2000, movers were treated by a procedure known as Procedure C, unless a post-stratum had fewer than ten (unweighted) outmover persons; in that case, Procedure A was implemented. Procedure C identifies all current residents living or staying at the sample address at the time of the A.C.E. interview (nonmovers and inmovers), plus all other persons who lived at the sample address on Census Day who have since moved (outmovers). The P sample includes nonmovers and outmovers. For outmovers, the interviewers attempted a proxy interview to obtain data such as name, sex, and age that were used for matching. The match rate for inmovers was estimated by the match rate of outmovers. In contrast, the number of movers in the P sample for A.C.E. sample areas was estimated by the inmovers. Note that no matching was done for inmovers.

$N_n$ is the weighted total population for nonmovers for the post-stratum from the P sample. The weight for each person j is the product of three values:
1. the inverse of the P-sample selection probability, including a factor for the Targeted Extended Search sampling ($W_j^*$),
2. a noninterview adjustment based on Census Day interview status ($f^*_{c,j}$), and
3. a Census Day residence probability ($\Pr_{res,j}$).

The estimated number of P-sample nonmovers is calculated as:

$$N_n = \sum_{j \in \text{nonmovers}} f^*_{c,j}\, \Pr_{res,j}\, W_j^*$$

where $\Pr_{res,j}$ is:
1 if person j is a resident on Census Day,
0 if person j is NOT a resident on Census Day, or
$\Pr^*_{res,j}$ if person j is unresolved, where $\Pr^*_{res,j}$ is estimated through missing data imputation.

Note: Persons who were not residents on Census Day are not included in $N_n$, since $\Pr_{res,j} = 0$ is a multiplicative factor in each person's contribution to $N_n$.

The estimated number of P-sample nonmover matches is written as:

$$M_n = \sum_{j \in \text{nonmovers}} \Pr_{m,j}\, f^*_{c,j}\, \Pr_{res,j}\, W_j^*$$

where $\Pr_{m,j}$ is:
1 if person j is a match on Census Day,
0 if person j is NOT a match on Census Day, or
$\Pr^*_{m,j}$ if person j is unresolved, where $\Pr^*_{m,j}$ is estimated through missing data imputation.

$N_i$ is the weighted total population for inmovers for the post-stratum from the P sample. The weight for each person j is the product of two values:
1. the inverse of the P-sample probability of selection ($W_j^*$ as defined above), and
2. a noninterview adjustment factor based on A.C.E. Interview Day status ($f^*_{a,j}$).

The estimated number of P-sample inmovers is denoted:

$$N_i = \sum_{j \in \text{inmovers}} f^*_{a,j}\, W_j^*$$
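The weighted totals and the cell-based imputation described above can be sketched as follows. The record layout (`PSamplePerson`), the `cell` labels, and all function names are hypothetical; Chapter 6 defines the actual imputation groups, and this sketch assumes the "simple proportion" is an unweighted proportion within each cell. It also computes the outmover totals $N_o$ and $M_o$, whose formulas the chapter gives next, because they follow the same pattern as $N_n$ and $M_n$.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PSamplePerson:
    mover_status: str                 # "nonmover", "inmover", or "outmover"
    weight: float                     # W*_j: inverse selection probability incl. TES factor
    f_c: float = 1.0                  # f*_{c,j}: Census Day noninterview adjustment
    f_a: float = 1.0                  # f*_{a,j}: A.C.E. Interview Day noninterview adjustment
    resident: Optional[float] = None  # 1, 0, or None when residence is unresolved
    match: Optional[float] = None     # 1, 0, or None when match status is unresolved
    cell: str = "all"                 # imputation group label (hypothetical)


def impute_by_cell(statuses: Sequence[Optional[float]], cells: Sequence[str]) -> List[float]:
    """Replace each unresolved (None) status with the unweighted proportion of
    1s among resolved statuses in the same imputation cell."""
    total = defaultdict(float)
    count = defaultdict(int)
    for status, cell in zip(statuses, cells):
        if status is not None:
            total[cell] += status
            count[cell] += 1
    # A cell with no resolved persons raises ZeroDivisionError in this sketch.
    return [s if s is not None else total[c] / count[c] for s, c in zip(statuses, cells)]


def e_sample_totals(weights, ce_status, cells):
    """Return (N_e, CE) for one post-stratum."""
    pr_ce = impute_by_cell(ce_status, cells)
    n_e = sum(weights)
    ce = sum(p * w for p, w in zip(pr_ce, weights))
    return n_e, ce


def p_sample_totals(people: Sequence[PSamplePerson]):
    """Return (N_n, M_n, N_i, N_o, M_o) for one post-stratum."""
    # Residence and match are imputed only among Census Day persons
    # (nonmovers and outmovers); inmovers are not matched.
    census_day = [p for p in people if p.mover_status != "inmover"]
    cells = [p.cell for p in census_day]
    pr_res = impute_by_cell([p.resident for p in census_day], cells)
    pr_m = impute_by_cell([p.match for p in census_day], cells)

    totals = {"nonmover": [0.0, 0.0], "outmover": [0.0, 0.0]}
    for p, res, m in zip(census_day, pr_res, pr_m):
        contribution = p.f_c * res * p.weight
        totals[p.mover_status][0] += contribution      # adds to N_n or N_o
        totals[p.mover_status][1] += m * contribution  # adds to M_n or M_o
    n_i = sum(p.f_a * p.weight for p in people if p.mover_status == "inmover")
    n_n, m_n = totals["nonmover"]
    n_o, m_o = totals["outmover"]
    return n_n, m_n, n_i, n_o, m_o
```

A nonresident ($\Pr_{res,j} = 0$) contributes nothing to either total, which mirrors the note on $N_n$ above.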

Note that all inmovers are assumed to be A.C.E. Interview Day residents.

The estimated number of P-sample outmovers is written:

$$N_o = \sum_{j \in \text{outmovers}} f^*_{c,j}\, \Pr_{res,j}\, W_j^*$$

The estimated number of P-sample outmover matches is calculated as:

$$M_o = \sum_{j \in \text{outmovers}} \Pr_{m,j}\, f^*_{c,j}\, \Pr_{res,j}\, W_j^*$$

Synthetic Estimation

The estimated coverage correction factors for each post-stratum were used to form synthetic estimates. Synthetic estimation combines coverage error results with census counts at the block level to produce adjusted block-level population estimates. The synthetic methodology assumes that coverage correction factors do not vary within a post-stratum. As a result, one coverage correction factor is assumed to be appropriate for all geographic areas within each post-stratum. To obtain block-level synthetic estimates, block-level census counts for post-strata are multiplied by post-stratum coverage correction factors and aggregated. There is one coverage correction factor for each post-stratum, and each person in a block is in one post-stratum. For example, suppose all persons in a block fall into one of six post-strata. A synthetic estimate for this block is formed by summing, over the six post-strata, the product of the block's census count in each post-stratum and that post-stratum's coverage correction factor. A controlled rounding technique was implemented, resulting in the creation of person records at the block level. Subsequent tabulations, based on the original and replicated records, are corrected for coverage error. A detailed discussion of synthetic estimation is provided in Chapter 8 and Haines (2001).

POST-STRATIFICATION

Background

The goal of post-stratification for dual system estimation is to establish groups of persons who are expected to have similar coverage. A common assumption is that people who are subject to similar housing, language, education, and cultural attitudes would also share similar census coverage. Hogan (1993) indicated that tenure, race and ethnic origin, age/sex, and degree of urban development were reasonable markers for these similarities in the 1990 census. An earlier section noted, however, that the independence assumption of the DSE model can be in error due to heterogeneous capture probabilities within a post-stratum. Post-strata are formed to support DSE by grouping persons with similar census coverage, so as to reduce heterogeneity in capture probabilities for DSEs. In many surveys, post-stratification is done to reduce variances and partially correct for problems in sampling or undercoverage. For DSE, the primary reason for post-stratification is to reduce heterogeneity bias. Any variance reduction or sampling bias correction associated with post-stratification is a bonus. In fact, the usual trade-off is that forming many post-strata reduces heterogeneity at the expense of adding variance: as the number of post-strata increases, fewer people in the coverage measurement survey fall into each individual post-stratum.

The post-stratification plan for the Census 2000 A.C.E. is summarized in this section, along with the detailed definitions of the post-stratification variables and the race/Hispanic origin domains. See Haines (2001b) for further details. The 2000 A.C.E. differs from the 1990 Post-Enumeration Survey (PES) in that it has approximately twice the sample size of the PES. This larger sample size permitted the formation of more post-strata, which has the advantage of reducing correlation bias as well as sampling variance. Additionally, in 2000, multiple responses to the race question were permitted; in 1990 only one race could be selected.

The 1990 PES post-strata started with a cross-classification of seven variables: age, sex, race, Hispanic origin, tenure, urbanicity, and region. There were 840 cells in the cross-classification. Collapsing was necessary to produce post-strata with sufficient sample for reliable Dual System Estimation (DSE). The collapsing reduced the number of post-strata to 357.

Race and Hispanic origin were considered the most important variables to retain in 1990. After collapsing, five race/Hispanic origin post-strata were maintained: Non-Hispanic White or Other, Black, Hispanic White or Other, Asian and Pacific Islander, and Reservation Indians. Off-reservation American Indians were placed in either the Non-Hispanic White or Other group or the Hispanic White or Other group, depending on whether they were of Hispanic origin. Within each of these race/Hispanic origin post-strata, seven age/sex categories were maintained. The other variables were collapsed in the following order: region, urbanicity, then tenure, if necessary. For American Indians residing on reservations, all these variables were collapsed. For Asian and Pacific Islanders, region and urbanicity were collapsed and tenure maintained. For the Black and Hispanic White or Other groups, region was collapsed for two levels of urbanicity. For Non-Hispanic White or Other, the full cross-classification of region, urbanicity, and tenure was maintained. Griffin and Haines (2000b) provide a detailed table on the 1990 PES post-stratification.

Post-Stratification Plan

The Census 2000 A.C.E. retained most of the 1990 PES post-stratification variables and included several additional ones. Nine variables were used in 2000: age, sex,

race, Hispanic origin, tenure, region, Metropolitan Statistical Area size/Type of Enumeration Area, and tract-level return rate. The Metropolitan Statistical Area size variable replaced the urbanicity variable, which was not available until the summer of 2001. Type of Enumeration Area (TEA) and the tract return rate were two new features of the 2000 A.C.E. post-stratification. The mailout/mailback areas were differentiated from other types of enumeration areas. In addition, tracts were classified by high or low return rates. Multiple responses to the race question were reflected in the race and Hispanic origin groupings.

Table 7-2 shows the 64 post-stratum groups for the Census 2000 A.C.E. Within each post-stratum group, there are seven age/sex groups (shown in Table 7-3). Thus, there was a maximum of 64 x 7 = 448 post-strata. The P-sample size was too small or the sampling variance too high for eight of the 64 post-stratum groups. For each of these eight groups, the 7 age/sex post-strata were collapsed into 3 post-strata (under 18; males 18+; and females 18+).

As a result, direct DSEs were calculated within each of 416 post-strata, which were expanded to 448 DSEs using synthetic estimation for the collapsed groups. The post-stratification plan was chosen to reduce correlation bias without having an adverse effect on the variance of the dual system estimator. Following is a detailed description of the post-stratification variables, including an explanation of the race/Hispanic origin domain assignment.
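The post-stratum arithmetic and the synthetic step in the paragraphs above can be sketched in a few lines. Two parts are assumptions: we read "expanded to 448 DSEs using synthetic estimation for the collapsed groups" as each original age/sex cell inheriting the CCF of the collapsed cell that contains it, and the block-level function omits the controlled rounding that created person records. The names `expand_collapsed_ccfs` and `synthetic_block_estimates` are hypothetical.

```python
from collections import defaultdict

# Table 7-3 age/sex codes mapped to the three collapsed groups used for the
# eight post-stratum groups whose age/sex cells were collapsed.
COLLAPSED_AGE_SEX = {
    1: "under 18",
    2: "18+ male", 4: "18+ male", 6: "18+ male",
    3: "18+ female", 5: "18+ female", 7: "18+ female",
}


def expand_collapsed_ccfs(collapsed_ccf):
    """Give each of the seven age/sex cells the CCF of the collapsed cell
    that contains it (our reading of the expansion, not a documented rule)."""
    return {code: collapsed_ccf[group] for code, group in COLLAPSED_AGE_SEX.items()}


def synthetic_block_estimates(census_counts, ccf):
    """Unrounded block-level synthetic estimates.

    census_counts: {(block_id, post_stratum): census count in that cell}
    ccf: {post_stratum: coverage correction factor}
    """
    estimates = defaultdict(float)
    for (block, stratum), count in census_counts.items():
        estimates[block] += count * ccf[stratum]
    return dict(estimates)


# 64 groups x 7 age/sex cells, minus the 4 cells lost in each of 8 collapsed groups.
n_direct = 64 * 7 - 8 * (7 - 3)
print(n_direct)  # 416
```

For example, a block with 100 census persons in a post-stratum whose CCF is 1.02 and 50 persons in one whose CCF is 0.98 gets an unrounded synthetic estimate of 102 + 49 = 151.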

Table 7-2. Census 2000 A.C.E. 64 Post-Stratum Groups (U.S.)

Where a single group number appears under a return rate heading, region was collapsed. For Domains 5, 6, 1, and 2, region, return rate, and MSA/TEA were all eliminated, so one group number covers every column.

| Race/Hispanic origin domain number* | Tenure | MSA/TEA | High return rate (NE, MW, S, W) | Low return rate (NE, MW, S, W) |
|---|---|---|---|---|
| Domain 7 (Non-Hispanic White or Some other race) | Owner | Large MSA MO/MB | 01, 02, 03, 04 | 05, 06, 07, 08 |
| | | Medium MSA MO/MB | 09, 10, 11, 12 | 13, 14, 15, 16 |
| | | Small MSA & Non-MSA MO/MB | 17, 18, 19, 20 | 21, 22, 23, 24 |
| | | All other TEAs | 25, 26, 27, 28 | 29, 30, 31, 32 |
| | Nonowner | Large MSA MO/MB | 33 | 34 |
| | | Medium MSA MO/MB | 35 | 36 |
| | | Small MSA & Non-MSA MO/MB | 37 | 38 |
| | | All other TEAs | 39 | 40 |
| Domain 4 (Non-Hispanic Black) | Owner | Large MSA MO/MB; Medium MSA MO/MB | 41 | 42 |
| | | Small MSA & Non-MSA MO/MB; All other TEAs | 43 | 44 |
| | Nonowner | Large MSA MO/MB; Medium MSA MO/MB | 45 | 46 |
| | | Small MSA & Non-MSA MO/MB; All other TEAs | 47 | 48 |
| Domain 3 (Hispanic) | Owner | Large MSA MO/MB; Medium MSA MO/MB | 49 | 50 |
| | | Small MSA & Non-MSA MO/MB; All other TEAs | 51 | 52 |
| | Nonowner | Large MSA MO/MB; Medium MSA MO/MB | 53 | 54 |
| | | Small MSA & Non-MSA MO/MB; All other TEAs | 55 | 56 |
| Domain 5 (Native Hawaiian or Pacific Islander) | Owner | All | 57 | 57 |
| | Nonowner | All | 58 | 58 |
| Domain 6 (Non-Hispanic Asian) | Owner | All | 59 | 59 |
| | Nonowner | All | 60 | 60 |
| American Indian or Alaska Native: Domain 1 (On Reservation) | Owner | All | 61 | 61 |
| | Nonowner | All | 62 | 62 |
| American Indian or Alaska Native: Domain 2 (Off Reservation) | Owner | All | 63 | 63 |
| | Nonowner | All | 64 | 64 |

* For Census 2000, persons can self-identify with more than one race group. For post-stratification purposes, persons are included in a single Race/Hispanic Origin Domain. This classification does not change a person's actual response. Further, all official tabulations are based on actual responses to the census.
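The group numbering in Table 7-2 follows a regular pattern, so it can be reproduced by a small lookup function. The helper `post_stratum_group` and its category labels are hypothetical, written only to make the table's structure explicit; it accepts every variable for every domain and ignores those that Table 7-2 collapses.

```python
REGIONS = ("NE", "MW", "S", "W")
MSA_TEA = ("large_msa_mo_mb", "medium_msa_mo_mb", "small_or_non_msa_mo_mb", "other_tea")


def post_stratum_group(domain, owner, msa_tea, high_return_rate, region):
    """Table 7-2 post-stratum group number (1-64) for one person's characteristics."""
    m = MSA_TEA.index(msa_tea)          # 0..3, in Table 7-2 row order
    r = REGIONS.index(region)           # 0..3, in Table 7-2 column order
    rr = 0 if high_return_rate else 1   # high return rate columns come first
    if domain == 7:
        if owner:                       # full cross-classification: groups 01-32
            return 8 * m + 4 * rr + r + 1
        return 33 + 2 * m + rr          # region eliminated: groups 33-40
    if domain in (4, 3):
        base = (41 if domain == 4 else 49) + (0 if owner else 4)
        pair = 0 if m <= 1 else 1       # {large, medium} vs {small/non-MSA, other TEAs}
        return base + 2 * pair + rr     # region eliminated, MSA/TEA partly collapsed
    if domain in (5, 6, 1, 2):
        base = {5: 57, 6: 59, 1: 61, 2: 63}[domain]
        return base + (0 if owner else 1)  # only tenure retained
    raise ValueError(f"unknown race/Hispanic origin domain: {domain}")


groups = {
    post_stratum_group(d, o, m, h, g)
    for d in range(1, 8)
    for o in (True, False)
    for m in MSA_TEA
    for h in (True, False)
    for g in REGIONS
}
print(len(groups), len(groups) * 7)  # 64 448
```

Crossing the 64 groups with the seven age/sex cells of Table 7-3 gives the 448 post-strata described above.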


Table 7-3. Census 2000 A.C.E. Age/Sex Groups

| Age | Male | Female |
|---|---|---|
| Under 18 | 1 | 1 |
| 18 to 29 | 2 | 3 |
| 30 to 49 | 4 | 5 |
| 50+ | 6 | 7 |

Post-stratification Variables

This section gives a detailed description of the post-stratification variables, including the handling of multiple responses to the race question. A.C.E. post-stratification used the following variables:
  • race/Hispanic origin - seven categories
  • age/sex - seven categories
  • tenure - two categories
  • Metropolitan Statistical Area (MSA) by Type of Enumeration Area (TEA) - four categories
  • return rate - two categories
  • region - four categories

The seven race/Hispanic origin domains were:
  • American Indian or Alaska Native on Reservations
  • Off-Reservation American Indian or Alaska Native
  • Hispanic
  • Non-Hispanic Black
  • Native Hawaiian or Pacific Islander
  • Non-Hispanic Asian
  • Non-Hispanic White or Some other race

Inclusion in a race/Hispanic origin domain is complicated, as it depends on several variables and on whether there are multiple race responses. In addition, inclusion in a race/Hispanic origin domain does not change a person's race/Hispanic origin response; all Census 2000 tabulations are based on the actual responses. For example, a person living on a reservation who responded as both American Indian and Black was placed in the first race/Hispanic origin category (Domain 1) for post-stratification purposes, but was tabulated in the census as American Indian/Black.

The seven age/sex categories were:
1. Under 18
2. 18 - 29 male
3. 18 - 29 female
4. 30 - 49 male
5. 30 - 49 female
6. 50+ male
7. 50+ female

The two tenure categories were:
1. Owner
2. Nonowner

The four MSA/TEA categories were:
1. Large MSA Mailout/Mailback (MO/MB)
2. Medium MSA MO/MB
3. Small MSA or Non-MSA MO/MB
4. All other TEAs

MSA/CMSA FIPS codes, as defined by the Office of Management and Budget, were used for post-stratification. For simplicity, MSA/CMSA is referred to herein as MSA. Large MSA consists of the ten largest MSAs based on unadjusted Census 2000 total population counts, including the population in Group Quarters. Medium MSAs are those (besides the largest 10) that have a total population of at least 500,000. Small MSAs are those with a total population of less than 500,000. For post-stratification purposes, MO/MB areas were contrasted with non-MO/MB areas.

The two return rate categories were:
1. High
2. Low

Return rate is a tract-level variable measuring the proportion of occupied housing units in the mailback universe that returned a census questionnaire. Low (high) return rate tracts are those tracts whose return rate is less than or equal to (greater than) the 25th percentile return rate. Separate 25th percentile cut-off values were formed for the six applicable race/Hispanic origin by tenure groups. Persons in List/Enumerate, Rural Update/Enumerate, and Urban Update/Enumerate TEAs were automatically placed in the High category.

The four region categories were:
1. Northeast
2. Midwest
3. South
4. West

Pre-Collapsing

Pre-collapsing was done prior to data collection and to knowledge of the exact sample size in each post-stratum. All race/Hispanic origin, age/sex, and tenure categories for the U.S. were initially maintained. The research for the determination of the important post-stratification variables

provided information on the expected sample size in each category, which was then used to define a collapsing hierarchy. The pre-collapsing plan for the region, MSA/TEA, and return rate variables was as follows:
  • Non-Hispanic White or Some other race Owners: No collapsing.
  • Non-Hispanic White or Some other race Nonowners: Region was eliminated.
  • Non-Hispanic Black: Region was eliminated. In addition, there was partial collapsing of the MSA/TEA variable within return rate and tenure categories.
  • Hispanic: Region was eliminated. In addition, there was partial collapsing of the MSA/TEA variable within return rate and tenure categories.
  • Native Hawaiian or Pacific Islander: The region, return rate, and MSA/TEA variables were eliminated. Only tenure and age/sex were retained.
  • Non-Hispanic Asian: The region, return rate, and MSA/TEA variables were eliminated. Only tenure and age/sex were retained.
  • American Indian or Alaska Native on Reservations: The region, return rate, and MSA/TEA variables were eliminated. Only tenure and age/sex were retained.
  • Off-Reservation American Indian or Alaska Native: The region, return rate, and MSA/TEA variables were eliminated. Only tenure and age/sex were retained.

Post-Collapsing

A.C.E. post-stratification included a plan, called post-collapsing, to collapse post-strata that contained fewer than 100 (unweighted) P-sample persons, since such a post-stratum was considered too small to produce reliable estimates. If a collapsed post-stratum was still too small, it could have been further collapsed. The collapsing procedure was hierarchical and required a predefined collapsing order. Given the pre-collapsing plan that yielded 448 post-strata, not much post-collapsing was anticipated, but an extensive post-collapsing strategy was designed for completeness and to satisfy the requirement of pre-specification.

Note that collapsing does not necessarily imply elimination of a variable. Collapsing can refer to a reduction in the number of categories for a variable. The following general outline describes the post-collapsing hierarchy that was planned:
  • If any of the 448 post-strata are too small, collapse age/sex first. This means that within any of the 64 U.S. post-stratum groups, if at least one of the seven age/sex categories defined in Table 7-3 has fewer than 100 P-sample persons, reduce age/sex to the following three categories: Under 18, 18+ male, and 18+ female.
  • If some post-strata are still too small and require collapsing, collapse region next, if applicable. This collapsing applies only to the Non-Hispanic White or Some other race domain, since region is included only in its post-stratification definition. In this case, all levels of region (Northeast, Midwest, South, West) are combined to eliminate the variable.
  • Next, collapse the four-level MSA/TEA variable into the following two groups:
      • Large and medium MSA MO/MB
      • Small MSA and non-MSA MO/MB and all other TEAs
  • If further collapsing is necessary, return rate is the next variable to collapse. High and Low return rate categories are combined to eliminate the variable.
  • Next, collapse the variable MSA/TEA. If necessary, the two groups defined above would be combined to eliminate the variable MSA/TEA completely.
  • The next variable to collapse is tenure. Owner and nonowner categories are combined to eliminate the variable entirely, if necessary.
  • If collapsing is still needed, the three remaining age/sex post-strata are combined to eliminate the age/sex variable completely.
  • In the event that there are fewer than 100 P-sample persons in a race/Hispanic origin domain, combine all persons in that domain with Domain 7, which includes Non-Hispanic White and Some other race.

In practice, only the first step of collapsing was necessary. Eight of the 64 post-stratum groups had their 7 age/sex post-strata collapsed to 3 age/sex groups, resulting in 8 x (7 - 3) = 32 fewer post-strata. Thus, there were 448 - 32 = 416 post-strata.

Race and Hispanic Origin Classifications

The Census 2000 questionnaire has 15 possible race responses. The 15 responses are collapsed into six major race groups, as shown below. Races that are included in the major groups are shown in parentheses. Persons self-identifying with a single race essentially place themselves into one of these six categories.
  • White
  • Black (Black, African American, Negro)
  • American Indian or Alaska Native
  • Asian (Asian Indian, Chinese, Filipino, Japanese, Korean, Vietnamese, Other Asian)
  • Native Hawaiian or Pacific Islander (Native Hawaiian, Guamanian or Chamorro, Samoan, Other Pacific Islander)


  • Some other race (There was a box on the questionnaire labeled "Some other race - Print race" with a line to enter any race the respondent desired.)

For the first time in census history, persons were able to respond to more than one race category. Allowing persons to self-identify with multiple races results in many more than six race groups. In fact, after collapsing race to the six major groups, there are $2^6 - 1 = 63$ possible race combinations. The 1 is subtracted to exclude the empty combination, since each individual is assumed to have a race.

The race variable defined above is often cross-classified with the Hispanic origin variable to define post-strata. The Hispanic origin variable consists of two responses, No and Yes. Categories that are included in the Yes response are shown in parentheses.
1. No, not Spanish/Hispanic/Latino
2. Yes (Mexican, Mexican American, Chicano, Puerto Rican, Cuban, Other Spanish/Hispanic/Latino)

Combining the race and Hispanic origin variables yields 63 x 2 = 126 possible race/Hispanic origin groups. It is important to note that in a survey the size of the A.C.E., no post-stratification plan of interest can support 126 race/Hispanic origin groups. Consequently, each of the 126 race/Hispanic origin response possibilities was assigned to one of seven race/Hispanic origin domains. The seven race/Hispanic origin domains are defined as follows:
1. American Indian or Alaska Native on Reservations
2. Off-Reservation American Indian or Alaska Native
3. Hispanic
4. Non-Hispanic Black
5. Native Hawaiian or Pacific Islander
6. Non-Hispanic Asian
7. Non-Hispanic White or Some other race

Note that missing race and Hispanic origin data are imputed. The rules used to classify the 126 race and Hispanic origin combinations into one of the seven race/Hispanic origin domains are now presented. Many of the decisions on how multiple-race persons were classified are based on cultural, linguistic, and sociological factors, which are known to affect coverage and are not necessarily data-driven.

A hierarchy was used to assign persons to a race/Hispanic origin domain. The race/Hispanic origin designation occurs in the following order: American Indian or Alaska Native on Reservations, Off-Reservation American Indian or Alaska Native, Hispanic, Non-Hispanic Black, Native Hawaiian or Pacific Islander, Non-Hispanic Asian, and Non-Hispanic White or Some other race. This collapsing was used only for post-stratification; all census data were tabulated in accordance with the race and Hispanic origin categories selected by census respondents.

For the following tables, Indian Country (IC) is a block-level variable that indicates whether a block is (wholly or partly) inside an American Indian reservation/trust land, Oklahoma Tribal Statistical Area (OTSA), Tribal Designated Statistical Area (TDSA), or Alaska Native Village Statistical Area (ANVSA).

Tables 7-4 and 7-5 display the assignment of race/Hispanic origin domains. Table 7-4 applies to Hispanic persons, while Table 7-5 applies to non-Hispanic persons. The first six rows of Tables 7-4 and 7-5 correspond to a single race response. The remaining portion of the tables addresses the assignment of multiple race responses to a single race/Hispanic origin domain. Although a person may be associated with multiple race responses, each person is included in only one of the seven race/Hispanic origin domains. All persons with a common number are assigned to the same race/Hispanic origin domain. The number for each race/Hispanic origin domain was assigned as follows:

Domain 1 (Includes American Indian or Alaska Native on Reservations). This domain includes any person living on a reservation who marks American Indian or Alaska Native either as their single race or as one of many races, regardless of their Hispanic origin.

Domain 2 (Includes Off-Reservation American Indian or Alaska Native). This domain includes any person living in Indian Country, but not on a reservation, who marks American Indian or Alaska Native either as a single race or as one of many races, regardless of their Hispanic origin. This domain also includes any Non-Hispanic person not living in Indian Country who marks American Indian or Alaska Native as a single race.

Domain 3 (Includes Hispanic). This domain includes all Hispanic persons who are not included in Domains 1 or 2. All Hispanic persons (excluding American Indian or Alaska Native in Indian Country) are included in Domain 3. The only exception to this rule occurs when a Hispanic person lives in the state of Hawaii and classifies himself or herself as Native Hawaiian or Pacific Islander, regardless of whether he or she identifies with a single or multiple race. All Hispanic persons satisfying this condition are reclassified into Domain 5.

Domain 4 (Includes Non-Hispanic Black). This domain includes any non-Hispanic person who marks Black as their only race. It also includes the combination of Black and American Indian or Alaska Native not in Indian Country. In addition, people who mark Black and one other single race group (Native Hawaiian or Pacific Islander, Asian, White, or Some other race) are included in Domain 4.


The only exception to this rule occurs when a Non-Hispanic Black person lives in the state of Hawaii and classifies himself or herself as Native Hawaiian or Pacific Islander. All Non-Hispanic Black persons satisfying this condition are reclassified into Domain 5.

Domain 5 (Includes Native Hawaiian or Pacific Islander). This domain includes any Non-Hispanic person marking the single race Native Hawaiian or Pacific Islander. For Non-Hispanic persons, it also includes the race combination of Native Hawaiian or Pacific Islander and American Indian or Alaska Native not in Indian Country. Also included is the race combination of Native Hawaiian or Pacific Islander with Asian for Non-Hispanic persons. All persons living in the state of Hawaii who classify themselves as Native Hawaiian or Pacific Islander, regardless of their Hispanic origin and whether they identify with a single or multiple race, are also included in Domain 5.

Domain 6 (Includes Non-Hispanic Asian). This domain includes any non-Hispanic person marking Asian as their single race. If a person self-identifies with Asian and American Indian or Alaska Native not in Indian Country, they are included in Domain 6.

Domain 7 (Includes Non-Hispanic White or Some other race). Non-Hispanic White or Non-Hispanic Some other race persons are included in Domain 7. Non-Hispanic persons who self-identify with American Indian or Alaska Native not in Indian Country and with White or Some other race are classified into Domain 7. If a Native Hawaiian or Pacific Islander response is combined with a White or Some other race response, the person is also included in Domain 7. A person who self-identifies with Asian and White, or with Asian and Some other race, is also included in this domain. Finally, all Non-Hispanic persons who self-identify with three or more races (excluding American Indian or Alaska Native in Indian Country) are included in Domain 7. The only exception to this rule occurs when a Non-Hispanic White or Non-Hispanic Some other race person lives in Hawaii and classifies themselves as Native Hawaiian or Pacific Islander, regardless of whether they identify with other races. Persons who satisfy this criterion are reclassified into Domain 5.
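The domain rules above, applied in the stated hierarchy, can be sketched as a single function. This is our reading of the rules and Tables 7-4 and 7-5, not Census Bureau code: the function name, the race and Indian Country labels, and the `lives_in_hawaii` flag are hypothetical, and the Non-Hispanic White plus Some other race pair (not spelled out in the rows shown here) is assumed to fall into Domain 7.

```python
from itertools import combinations

# Six major race groups after collapsing the 15 race responses.
RACES = ("White", "Black", "AIAN", "Asian", "NHPI", "SOR")

# Indian Country (IC) status of the person's block; labels are hypothetical.
NOT_IN_IC, IC_NOT_ON_RESERVATION, ON_RESERVATION = "not_in_ic", "ic_off_res", "on_res"


def race_hispanic_domain(races, hispanic, ic_status, lives_in_hawaii):
    """Assign one of the seven A.C.E. race/Hispanic origin domains (1-7)."""
    races = frozenset(races)
    if not races or not races <= set(RACES):
        raise ValueError(f"invalid race response: {sorted(races)}")
    # Domains 1 and 2: AIAN in Indian Country, regardless of Hispanic origin.
    if "AIAN" in races and ic_status == ON_RESERVATION:
        return 1
    if "AIAN" in races and ic_status == IC_NOT_ON_RESERVATION:
        return 2
    # Hawaii residents marking NHPI go to Domain 5 (the exceptions to 3, 4, 7).
    if "NHPI" in races and lives_in_hawaii:
        return 5
    if hispanic:
        return 3
    # Non-Hispanic from here on; any AIAN response is outside Indian Country.
    if len(races) == 1:
        single = {"AIAN": 2, "Black": 4, "NHPI": 5, "Asian": 6, "White": 7, "SOR": 7}
        return single[next(iter(races))]
    if len(races) >= 3:
        return 7
    if "Black" in races:
        return 4
    if races in ({"AIAN", "NHPI"}, {"NHPI", "Asian"}):
        return 5
    if races == {"AIAN", "Asian"}:
        return 6
    return 7  # remaining pairs involving White or Some other race (SOR)


# All non-empty combinations of the six race groups: 2**6 - 1 = 63.
combos = [frozenset(c) for k in range(1, 7) for c in combinations(RACES, k)]
print(len(combos), 2 * len(combos))  # 63 126
```

Ordering the checks this way makes the hierarchy visible: the Indian Country tests come first, the Hawaii rule overrides Domains 3, 4, and 7 but not 1 or 2, and only then do the Hispanic and non-Hispanic rules apply.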


Table 7-4. Census 2000 A.C.E. Race/Origin Post-stratification Domains for Hispanic Persons, by Indian Country (IC) Status

| Race response | Not in IC | In IC, not on reservation | In IC, on reservation |
|---|---|---|---|
| Single race: | | | |
| American Indian or Alaska Native | 3 | 2 | 1 |
| Black | 3 | 3 | 3 |
| Native Hawaiian or Pacific Islander | *3 | 3 | 3 |
| Asian | 3 | 3 | 3 |
| White | 3 | 3 | 3 |
| Some other race | 3 | 3 | 3 |
| American Indian or Alaska Native and: | | | |
| Black | 3 | 2 | 1 |
| Native Hawaiian or Pacific Islander | *3 | 2 | 1 |
| Asian | 3 | 2 | 1 |
| White | 3 | 2 | 1 |
| Some other race | 3 | 2 | 1 |
| Black and: | | | |
| Native Hawaiian or Pacific Islander | *3 | 3 | 3 |
| Asian | 3 | 3 | 3 |
| White | 3 | 3 | 3 |
| Some other race | 3 | 3 | 3 |
| Native Hawaiian or Pacific Islander and: | | | |
| Asian | *3 | 3 | 3 |
| White | *3 | 3 | 3 |
| Some other race | *3 | 3 | 3 |
| Asian and: | | | |
| White | 3 | 3 | 3 |
| Some other race | 3 | 3 | 3 |
| American Indian or Alaska Native and: | | | |
| Two or More Races | *3 | 2 | 1 |
| All Else** | *3 | 3 | 3 |

* All persons living in the state of Hawaii who classify themselves as Native Hawaiian or Pacific Islander, regardless of their Hispanic origin and whether they identify with a single or multiple race, are included in Domain 5, which includes Native Hawaiian or Pacific Islander.
** All Else encompasses all remaining combinations that exclude American Indian or Alaska Native.


Table 7-5. Census 2000 A.C.E. Race/Origin Post-stratification Domains for Non-Hispanic

                                                               Indian country (IC)
                                            Not in IC  Not on reservation   On reservation
Single race:
  American Indian or Alaska Native              2               2                  1
  Black                                         4               4                  4
  Native Hawaiian or Pacific Islander           5               5                  5
  Asian                                         6               6                  6
  White                                         7               7                  7
  Some other race                               7               7                  7
American Indian or Alaska Native and:
  Black                                         4               2                  1
  Native Hawaiian or Pacific Islander           5               2                  1
  Asian                                         6               2                  1
  White                                         7               2                  1
  Some other race                               7               2                  1
Black and:
  Native Hawaiian or Pacific Islander          *4               4                  4
  Asian                                         4               4                  4
  White                                         4               4                  4
  Some other race                               4               4                  4
Native Hawaiian or Pacific Islander and:
  Asian                                         5               5                  5
  White                                        *7               7                  7
  Some other race                              *7               7                  7
Asian and:
  White                                         7               7                  7
  Some other race                               7               7                  7
American Indian or Alaska Native and:
  Two or More Races                            *7               2                  1
All Else**                                     *7               7                  7

* All persons living in the state of Hawaii who classify themselves as Native Hawaiian or Pacific Islander, regardless of their Hispanic origin and whether they identify with a single or multiple race, are included in Domain 5, which includes Native Hawaiian or Pacific Islander.
** All Else encompasses all remaining combinations that exclude American Indian or Alaska Native.
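Tables 7-4 and 7-5 can be read as a lookup rule. The Python sketch below encodes that reading under several assumptions:

- A race response arrives as a set of the six race groups.
- Indian country status is already known.
- The Hawaii footnote applies only in the Not in IC column, which is where the tables place the asterisks.

The function name, the short race labels, and the `IndianCountry` enum are hypothetical and are not Census Bureau code.

```python
from enum import Enum

# Short labels for the six race groups (hypothetical, chosen for this sketch).
AIAN, BLACK, NHPI, ASIAN, WHITE, SOR = "AIAN", "Black", "NHPI", "Asian", "White", "SOR"

# Domains for Non-Hispanic single-race responses ("Single race" rows, Table 7-5).
SINGLE_RACE_DOMAIN = {AIAN: 2, BLACK: 4, NHPI: 5, ASIAN: 6, WHITE: 7, SOR: 7}


class IndianCountry(Enum):
    NOT_IN_IC = "not in IC"
    NOT_ON_RESERVATION = "in IC, not on reservation"
    ON_RESERVATION = "on reservation"


def race_origin_domain(races, hispanic, ic, lives_in_hawaii):
    """Return the post-stratification domain (1-7) as read from Tables 7-4/7-5."""
    races = frozenset(races)
    if not races or not races.issubset(SINGLE_RACE_DOMAIN):
        raise ValueError(f"unrecognized race response: {sorted(races)}")
    has_aian = AIAN in races

    # Any response that includes AIAN, in Indian country: Domain 1 or 2,
    # regardless of Hispanic origin or the other races reported.
    if has_aian and ic is IndianCountry.ON_RESERVATION:
        return 1
    if has_aian and ic is IndianCountry.NOT_ON_RESERVATION:
        return 2

    # Asterisked cells: Hawaii residents reporting NHPI, alone or in combination.
    if lives_in_hawaii and NHPI in races and ic is IndianCountry.NOT_IN_IC:
        return 5

    if hispanic:
        return 3  # every remaining cell of Table 7-4

    if len(races) == 1:
        return SINGLE_RACE_DOMAIN[next(iter(races))]

    others = races - {AIAN}
    if has_aian and len(others) == 1:
        # AIAN plus exactly one other race, not in IC: the other race's domain.
        return SINGLE_RACE_DOMAIN[next(iter(others))]

    if len(races) == 2:
        if BLACK in races:
            return 4  # "Black and:" rows
        if races == {NHPI, ASIAN}:
            return 5
    # NHPI or Asian with White or Some other race, White with Some other race,
    # and every three-or-more-race response ("All Else").
    return 7


print(race_origin_domain({ASIAN, AIAN}, False, IndianCountry.NOT_IN_IC, False))  # 6
print(race_origin_domain({NHPI, WHITE}, False, IndianCountry.NOT_IN_IC, True))   # 5
```

Writing the rule as ordered checks makes the precedence explicit:

1. Indian country rows for AIAN responses.
2. The Hawaii footnote.
3. Hispanic origin.
4. The Non-Hispanic race combinations.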


VARIANCE ESTIMATION

The A.C.E. sample was considered a three-phase sample:

- The initial listing sample was the first phase.
- A.C.E. reduction and small block cluster subsampling was the second phase.
- Targeted Extended Search (TES) was the third phase.

Multiphase sampling differs from multistage sampling in the following way. In a multistage design, the information needed to draw all stages of the sample is known before the sampling begins. In a multiphase design, the information needed to draw any phase of the sample is not available until the previous phase is completed. The design here is multiphase because housing counts were not available until after the first-phase listing. A new variance estimator therefore needed to be developed. Full details are given in Starsinic and Kim (2001).

Our goal is to obtain a variance estimator for the Dual System Estimator (DSE), of the form:

$$DSE = DD \times \frac{CE}{N_e} \times \frac{N_n + N_i}{M_n + \frac{M_o}{N_o} N_i} \qquad (1)$$

where:

$DD$ = number of census data-defined persons
$CE$ = estimated number of A.C.E. E-sample correct enumerations
$N_e$ = estimated number of A.C.E. E-sample persons
$N_n$ = estimated number of A.C.E. P-sample nonmovers
$N_i$ = estimated number of A.C.E. P-sample inmovers
$N_o$ = estimated number of A.C.E. P-sample outmovers
$M_n$ = estimated number of A.C.E. P-sample nonmover matches
$M_o$ = estimated number of A.C.E. P-sample outmover matches

The DSE is computed separately for each post-stratum, denoted by h. The national corrected population estimate is computed as:

$$T_{US} = \sum_h DSE_h \qquad (2)$$

There is no closed-form solution for the variance estimator, and the Taylor linearization variance estimator is very complex. That leaves replication as the only practical method of variance estimation. Specifically, a stratified jackknife was the replication method chosen for the implementation.

A jackknife estimator is calculated from a set of replicates. The number of replicates equals the number of observations in the sample (clusters, in this case). Each replicate represents what the DSE would have been had a particular cluster not been part of the sample. The overall variance is calculated by summing the weighted squares of the differences between each replicate DSE and the whole-sample DSE.

The most important challenge for the Census 2000 A.C.E. variance estimation was the precise form for calculating the contribution of replicate DSEs to the variance estimator. In particular, new weights had to be calculated for each replicate to represent the effect of removing the cluster whose replicate was being calculated. No previous results were directly applicable to the DSE, but a methodology was developed based on the work of Rao and Shao (1992).

The remaining part of this section describes the precise formulas in detail. They require somewhat complex notation and mathematical steps.

Detailed Methodology

A general estimator of a total is:

$$T_y = \sum_i w_i y_i \qquad (3)$$

The estimator for the jth replicate is

$$T_y(j) = \sum_i w_i(j) y_i \qquad (4)$$

where $y_i$ is the characteristic of interest, and $w_i(j)$ is the replicate weight for the ith unit. The replicate weight differs from the original weight in a prespecified subset of the observations. With these replicate estimators, a variance estimator can be constructed:

$$\widehat{\mathrm{Var}}(T_y) = \sum_j c_j \left( T_y(j) - T_y \right)^2 \qquad (5)$$

where $c_j$ is a constant associated with replicate j.

Before continuing, we must set down some specific notation.

- Let $w_i$ be the first phase sampling weight.
- Let $y_i$ be the cluster-level total of any of the seven estimated components of the DSE ($CE$, $N_n$, etc.).
- Let $A$ and $A_2$ indicate the first and second phase samples, respectively.
- Let $x_{ig} = 1$ if unit i is in group (second phase stratum) g, and zero otherwise.
- Let $n_h$ be the number of units selected in first-phase stratum h.
- Let $n_g$ be the number of units in stratum h that are also in group g, and let $r_g$ be the number of the $n_g$ units selected in the second phase.

In all of the following equations, j represents one cluster that is being dropped to calculate its associated replicate estimate $T(j)$. The index k denotes a cluster other than the one being dropped.

For two-phase stratified sampling, there are two different point estimators. The first is the Double Expansion Estimator (DEE):

$$DEE = \sum_g \frac{n_g}{r_g} \sum_{i \in A_2} w_i x_{ig} y_i \qquad (6)$$

The second is the Reweighted Expansion Estimator (REE):

$$REE = \sum_g \left( \frac{\sum_{i \in A} w_i x_{ig}}{\sum_{i \in A_2} w_i x_{ig}} \right) \sum_{i \in A_2} w_i x_{ig} y_i \qquad (7)$$
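Equation (1) is simple to evaluate once $DD$ and the seven estimated components are available. Equation (2) then sums over post-strata. The Python sketch below uses made-up inputs, not A.C.E. data, and the function names are hypothetical.

```python
def dual_system_estimate(dd, ce, n_e, n_n, n_i, m_n, m_o, n_o):
    """Equation (1): DSE = DD * (CE / Ne) * (Nn + Ni) / (Mn + (Mo / No) * Ni)."""
    correct_enumeration_rate = ce / n_e
    # P-sample persons: nonmovers plus inmovers.
    p_sample_persons = n_n + n_i
    # Mover matches: the outmover match rate Mo/No applied to the inmovers.
    p_sample_matches = m_n + (m_o / n_o) * n_i
    return dd * correct_enumeration_rate * p_sample_persons / p_sample_matches


def national_estimate(dse_by_post_stratum):
    """Equation (2): T_US is the sum of the post-stratum DSEs."""
    return sum(dse_by_post_stratum.values())


# Illustrative (made-up) inputs for a single post-stratum.
dse = dual_system_estimate(dd=1000, ce=950, n_e=1000, n_n=900, n_i=50,
                           m_n=850, m_o=40, n_o=50)
print(round(dse, 2))                                              # 1014.04
print(round(national_estimate({"h1": dse, "h2": 500.0}), 2))      # 1514.04
```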

There is an established result by Rao and Shao (1992) that gives a replicate variance estimator for the REE under two-phase stratified sampling. Unfortunately, all the individual components of the DSE, such as $N_e$ (the number of E-sample people), are DEEs. A closer look at the DEE, however, suggested a procedure that could be applied. Since

$$n_g = \sum_{i \in A} x_{ig}, \qquad r_g = \sum_{i \in A_2} x_{ig},$$

the DEE can be rewritten as

$$DEE = \sum_g \frac{n_g}{r_g} \sum_{i \in A_2} w_i x_{ig} y_i = \sum_g \left( \frac{\sum_{k \in A} x_{kg}}{\sum_{k \in A_2} x_{kg}} \right) \sum_{i \in A_2} w_i x_{ig} y_i = \sum_g \left( \frac{\sum_{k \in A} w_k x_{kg} w_k^{-1}}{\sum_{k \in A_2} w_k x_{kg} w_k^{-1}} \right) \sum_{i \in A_2} w_i x_{ig} y_i \qquad (8)$$

The DEE has just been rewritten in a form that is quite similar to the REE. This suggests the following generalization:

$$T_y^{(2)} = \sum_{i \in A_2} \alpha_i y_i, \quad \text{where} \quad \alpha_i = \sum_g \left( \frac{\sum_{k \in A} w_k x_{kg} q_k}{\sum_{k \in A_2} w_k x_{kg} q_k} \right) w_i x_{ig} \qquad (9)$$

and where $q_k = 1$ for the REE and $q_k = w_k^{-1}$ for the DEE. Replicates are then naturally written as:

$$T_y^{(2)}(j) = \sum_{i \in A_2} \alpha_i(j) y_i, \quad \text{where} \quad \alpha_i(j) = \sum_g \left( \frac{\sum_{k \in A} w_k(j) x_{kg} q_k}{\sum_{k \in A_2} w_k(j) x_{kg} q_k} \right) w_i(j) x_{ig} \qquad (10)$$

When $q_k = 1$ (the REE case), the replicate variance estimator of this generalized estimator, based on equation (5), is the same as the REE replicate variance estimator of Rao and Shao (1992).

Application to a Three-Phase Dual System Estimator

Seven components of the DSE are subject to sampling error: $CE$, $N_e$, $N_n$, $N_o$, $N_i$, $M_n$, and $M_o$. Within any of them, the cluster sums ($y_i$) can be broken down into two components:

- the total prior to any adjustments made by TES ($u_i$);
- the additional total from the TES sample ($v_i$).

The second piece can be further subdivided into TES totals from clusters sampled with certainty and TES totals from clusters sampled systematically. The estimator (a DEE) of one of the components is

$$T_y^{(3)} = \sum_{i \in A_2} \alpha_i u_i + \sum_{k=1}^{2} \sum_{i \in A_2} \alpha_i t_k s_{ik} a_i v_i \qquad (11)$$

where:

- $s_{ik}$ is the third phase stratum indicator. $s_{i1} = 1$ if the cluster is selected with certainty, and 0 otherwise. $s_{i2} = 1 - s_{i1}$ indicates that the cluster is eligible to be selected systematically.
- $a_i$ is the third phase sample indicator ($a_i = 1$ if the cluster is in $A_3$, 0 otherwise).
- $t_k$, the TES conditional weight, is equal to

$$t_k = \frac{\text{number of clusters selected in phase 2 (third phase stratum } k)}{\text{number of clusters selected in phase 3 (third phase stratum } k)} = \frac{\sum_{i \in A_2} s_{ik}}{\sum_{i \in A_3} s_{ik}} \qquad (12)$$

For $s_{i1}$, the certainty stratum, all clusters within it have $a_i = 1$, so $t_1 = 1$ for all clusters in the stratum.

To create the replicate estimator, simply apply what was learned above in equations (8) and (10):

$$T_y^{(3)}(j) = \sum_{i \in A_2} \alpha_i(j) u_i + \sum_{k=1}^{2} \sum_{i \in A_2} \alpha_i(j) t_k(j) s_{ik} a_i v_i = \sum_{i \in A_2} \alpha_i(j) u_i + \sum_{i \in A_2} \alpha_i(j) t_1(j) s_{i1} a_i v_i + \sum_{i \in A_2} \alpha_i(j) t_2(j) s_{i2} a_i v_i \qquad (13)$$

where $t_1(j) = 1$ and

$$t_2(j) = \frac{\sum_{i \in A_2} \alpha_i(j) s_{i2} \alpha_i^{-1}}{\sum_{i \in A_2} \alpha_i(j) s_{i2} a_i \alpha_i^{-1}}$$

Implementation of Variance Estimation for the A.C.E.

The first step in implementing this variance estimation methodology is calculating the replicate weights. Up to this point, the method of replication used to arrive at the variance is immaterial; we now state that the jackknife is used. Let the replicate weights after the first stage of sampling be the standard jackknife replicate weights:

$$w_i(j) = \begin{cases} 0 & \text{if } i = j \\ \dfrac{n_h}{n_h - 1} \, w_{hi} & \text{if } i \text{ and } j \text{ are in the same first phase stratum } h \\ w_{hi} & \text{otherwise} \end{cases} \qquad (14)$$

Then, the final weights are obtained by applying equation (10).

Note that this is an unusual form of the jackknife. Normally, the jackknife has as many replicates as observations. Here, there are 11,303 clusters remaining after the second phase of the sample, but the number of replicates equals the first phase's sample size of 29,136 clusters. The clusters sampled out in the second phase do not contribute to the variance due to the second and third phases. They must nevertheless be included to account accurately for the first phase of sampling. Deleting a cluster that was sampled out changes the weights of the other clusters in the same first phase sampling stratum.

The second step of the implementation is to adjust the imputation of certain probabilities to account for the replication. This component of the variance can be captured by including the effect of the replicate weights in the imputation. For some persons, their match, residence, or correct enumeration status remains unresolved even after follow-up operations. In these cases, a probability for each unresolved status is imputed using an imputation cell technique, with each unresolved case in an imputation cell getting the same imputed probability. The general form for the replicated imputation of the probability for an unresolved person in imputation cell k is:

$$Pr_k^*(j) = \frac{\sum_{p \in \text{resolved}_k} w_p^*(j) \, t_p^*(j) \, Pr_p}{\sum_{p \in \text{resolved}_k} w_p^*(j) \, t_p^*(j)} \qquad (15)$$

where the summation is over all resolved persons in imputation cell k, and:

$w_p^*(j)$ = person-level weight for replicate j, incorporating all sampling operations except TES, and not including the noninterview adjustment
$t_p^*(j)$ = conditional TES weight for replicate j (the inverse of the probability of selection in the TES sample) if the person is a TES person; 1 if the person is NOT a TES person
$Pr_p$ = 1 if the person is a match, resident, or correct enumeration (for the status being imputed); 0 if the person is NOT

To complete the estimation of the variances, the 29,136 replicate dual system estimates were computed for each of the 448 post-strata:

$$DSE_h(j) = (C - II) \times \frac{CE(j)}{N_e(j)} \times \frac{N_n(j) + N_i(j)}{M_n(j) + \frac{M_o(j)}{N_o(j)} N_i(j)} \qquad (16)$$

Here $C - II$ is the census data-defined count ($DD$ in equation (1)) for post-stratum h, which is not replicated.

Equation (13) was used to compute each of the seven replicated terms of the DSE separately: $CE(j)$, $N_e(j)$, $N_n(j)$, $N_i(j)$, $N_o(j)$, $M_n(j)$, and $M_o(j)$.

The variance estimates for post-stratum h used formula (5):

$$\widehat{\mathrm{Var}}(DSE_h) = \sum_j \frac{n_{1,i} - 1}{n_{1,i}} \left( DSE_h(j) - DSE_h \right)^2 \qquad (17)$$

Finally, the variance of the national adjusted population estimate is:

$$\widehat{\mathrm{Var}}(T_{US}) = \sum_{\text{post-stratum } h} \; \sum_{\text{post-stratum } h'} \widehat{\mathrm{Cov}}(DSE_h, DSE_{h'}) \qquad (18)$$

where $\widehat{\mathrm{Cov}}(DSE_h, DSE_h) = \widehat{\mathrm{Var}}(DSE_h)$, and

$$\widehat{\mathrm{Cov}}(DSE_h, DSE_{h'}) = \sum_j \frac{n_{1,i} - 1}{n_{1,i}} \left( DSE_h(j) - DSE_h \right) \left( DSE_{h'}(j) - DSE_{h'} \right)$$

Covariances exist between post-strata mostly because members of the same household can fall in different post-strata while sharing the same probability of inclusion in the sample. For instance, consider a given race/Hispanic origin/tenure/region group. There is some covariance among males 30-49, females 30-49, and children 0-17 within that group. Such persons are likely to live in the same household and hence show very similar census and A.C.E. inclusion probabilities.

RESULTS

The percent net undercount (UC) is the estimated net undercount (or net overcount) divided by the dual system estimate for a post-stratum, expressed as a percentage. A positive number implies undercoverage, while a negative number implies overcoverage. The percent net undercount for Census 2000 shown in this document is strictly for the household population and excludes group quarters persons.

$$UC = \left( \frac{DSE - C}{DSE} \right) \times 100$$

where C is the census count for the post-stratum.

Table 7-6 presents the estimated percent net undercount for each of the 64 post-stratum groups. Table 7-7 presents the standard error of each of these estimates. Many more results are available in Davis (2001).
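Equations (9), (10), (14), and (17) can be made concrete with a small Python sketch of the two-phase stratified jackknife for one component. The sketch simplifies and assumes the following:

- It ignores the TES term (every $v_i = 0$) and the replicated imputation of equation (15).
- Every group keeps at least one second-phase cluster in every replicate; otherwise the ratio in (10) is undefined.
- $n_{1,i}$ in (17) is read as the number of first-phase sample clusters in the dropped cluster's first-phase stratum.

The `Cluster` type and all function names are illustrative, not Census Bureau code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cluster:
    stratum: str      # first-phase stratum h
    group: str        # second-phase stratum (group) g
    weight: float     # first-phase weight w_i
    in_a2: bool       # True if the cluster is in the second-phase sample A2
    y: float = 0.0    # cluster total; only used when in_a2 is True


def alphas(clusters, w, q):
    """Equations (9)/(10): alpha_i = [sum_A w_k q_k / sum_A2 w_k q_k]_g * w_i."""
    num, den = defaultdict(float), defaultdict(float)
    for c, w_k, q_k in zip(clusters, w, q):
        num[c.group] += w_k * q_k
        if c.in_a2:
            den[c.group] += w_k * q_k
    return [w_i * num[c.group] / den[c.group] if c.in_a2 else 0.0
            for c, w_i in zip(clusters, w)]


def jackknife_weights(clusters, j):
    """Equation (14): first-phase replicate weights w_i(j) when cluster j is dropped."""
    h = clusters[j].stratum
    n_h = sum(c.stratum == h for c in clusters)
    return [0.0 if i == j
            else c.weight * n_h / (n_h - 1) if c.stratum == h
            else c.weight
            for i, c in enumerate(clusters)]


def estimate_and_variance(clusters, estimator="DEE"):
    """Point estimate, equation (9), and a jackknife variance in the form of (17)."""
    w = [c.weight for c in clusters]
    # q_k = 1/w_k gives the DEE and q_k = 1 the REE; q_k itself is not replicated.
    q = [1.0 / w_k for w_k in w] if estimator == "DEE" else [1.0] * len(w)
    y = [c.y for c in clusters]
    total = sum(a * y_i for a, y_i in zip(alphas(clusters, w, q), y))
    variance = 0.0
    for j, dropped in enumerate(clusters):  # one replicate per first-phase cluster
        n_h = sum(c.stratum == dropped.stratum for c in clusters)
        a_j = alphas(clusters, jackknife_weights(clusters, j), q)
        replicate = sum(a * y_i for a, y_i in zip(a_j, y))
        variance += (n_h - 1) / n_h * (replicate - total) ** 2
    return total, variance


# Toy data: two first-phase strata, each containing one second-phase group.
clusters = [
    Cluster("H1", "a", 2.0, True, 5.0),
    Cluster("H1", "a", 2.0, False),
    Cluster("H1", "a", 4.0, True, 3.0),
    Cluster("H2", "b", 1.0, True, 7.0),
    Cluster("H2", "b", 1.0, True, 2.0),
    Cluster("H2", "b", 1.0, False),
]
print(estimate_and_variance(clusters, "DEE"))  # approximately (46.5, 87.0)
```

As in the text, clusters that were sampled out in the second phase still generate replicates. Dropping one of them changes the weights of its first-phase stratum neighbors through equation (14).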


Table 7-6. Census 2000 A.C.E. 64 Post-Stratum Groups - Percent Net Undercount

Race/Hispanic origin domain number*                   High return rate              Low return rate
Tenure, MSA/TEA                                      NE     MW      S      W       NE     MW      S      W

Domain 7 (Non-Hispanic White or Some other race)
  Owner
    Large MSA MO/MB                                0.81   0.01   0.36  -0.38    -3.62  -2.61   2.19   1.14
    Medium MSA MO/MB                               0.30  -0.12   0.46  -0.28    -4.39  -0.33   0.66   1.81
    Small MSA & Non-MSA MO/MB                     -0.25   0.14   0.44   0.30     2.29   2.61   2.09   2.71
    All other TEAs                                 1.84  -1.11   1.34   0.85     0.56  -0.16   0.15   1.59
  Nonowner
    Large MSA MO/MB                                          1.82                          1.02
    Medium MSA MO/MB                                         0.61                          2.83
    Small MSA & Non-MSA MO/MB                                2.45                          3.61
    All other TEAs                                           1.64                          4.08
Domain 4 (Non-Hispanic Black)
  Owner
    Large MSA MO/MB, Medium MSA MO/MB                        1.63                         -1.31
    Small MSA & Non-MSA MO/MB, All other TEAs                0.07                          0.46
  Nonowner
    Large MSA MO/MB, Medium MSA MO/MB                        4.18                          3.42
    Small MSA & Non-MSA MO/MB, All other TEAs                2.64                          0.12
Domain 3 (Hispanic)
  Owner
    Large MSA MO/MB, Medium MSA MO/MB                        1.46                          0.04
    Small MSA & Non-MSA MO/MB, All other TEAs                1.66                          1.08
  Nonowner
    Large MSA MO/MB, Medium MSA MO/MB                        3.52                          4.98
    Small MSA & Non-MSA MO/MB, All other TEAs                4.88                         10.74
Domain 5 (Native Hawaiian or Pacific Islander)
  Owner                                                                    2.71
  Nonowner                                                                 6.58
Domain 6 (Non-Hispanic Asian)
  Owner                                                                    0.55
  Nonowner                                                                 1.58
Domain 1 (American Indian or Alaska Native, On Reservation)
  Owner                                                                    5.04
  Nonowner                                                                 4.10
Domain 2 (American Indian or Alaska Native, Off Reservation)
  Owner                                                                    1.60
  Nonowner                                                                 5.57

* For Census 2000, persons can self-identify with more than one race group. For post-stratification purposes, persons are included in a single Race/Hispanic Origin Domain. This classification does not change a person's actual response. Further, all official tabulations are based on actual responses to the census.

A negative net undercount denotes a net overcount.


Table 7-7. Census 2000 A.C.E. 64 Post-Stratum Groups - Standard Error of the Net Undercount in Percent

Race/Hispanic origin domain number*                   High return rate              Low return rate
Tenure, MSA/TEA                                      NE     MW      S      W       NE     MW      S      W

Domain 7 (Non-Hispanic White or Some other race)
  Owner
    Large MSA MO/MB                                0.43   0.36   0.87  -0.45     1.05   1.43   1.54   2.09
    Medium MSA MO/MB                               0.85  -0.28   0.42   0.38     1.52   0.84   1.10   2.79
    Small MSA & Non-MSA MO/MB                      1.33   0.40   0.43   0.57     3.60   2.12   1.08   1.49
    All other TEAs                                 1.06   0.39   0.97   1.66     2.17   1.21   0.65   1.89
  Nonowner
    Large MSA MO/MB                                          0.63                          1.01
    Medium MSA MO/MB                                         0.71                          1.24
    Small MSA & Non-MSA MO/MB                                0.51                          1.24
    All other TEAs                                           0.94                          1.67
Domain 4 (Non-Hispanic Black)
  Owner
    Large MSA MO/MB, Medium MSA MO/MB                        0.56                          1.24
    Small MSA & Non-MSA MO/MB, All other TEAs                1.07                          1.86
  Nonowner
    Large MSA MO/MB, Medium MSA MO/MB                        0.66                          1.05
    Small MSA & Non-MSA MO/MB, All other TEAs                0.96                          2.08
Domain 3 (Hispanic)
  Owner
    Large MSA MO/MB, Medium MSA MO/MB                        0.52                          1.26
    Small MSA & Non-MSA MO/MB, All other TEAs                1.01                          2.09
  Nonowner
    Large MSA MO/MB, Medium MSA MO/MB                        0.67                          1.12
    Small MSA & Non-MSA MO/MB, All other TEAs                1.55                          4.12
Domain 5 (Native Hawaiian or Pacific Islander)
  Owner                                                                    3.83
  Nonowner                                                                 4.07
Domain 6 (Non-Hispanic Asian)
  Owner                                                                    0.87
  Nonowner                                                                 0.98
Domain 1 (American Indian or Alaska Native, On Reservation)
  Owner                                                                    1.45
  Nonowner                                                                 1.42
Domain 2 (American Indian or Alaska Native, Off Reservation)
  Owner                                                                    1.95
  Nonowner                                                                 2.02

* For Census 2000, persons can self-identify with more than one race group. For post-stratification purposes, persons are included in a single Race/Hispanic Origin Domain. This classification does not change a person's actual response. Further, all official tabulations are based on actual responses to the census.

A negative net undercount denotes a net overcount.
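The UC formula in the Results section and the standard errors in Table 7-7 can be combined as sketched below. The first function applies the formula, with C taken as the census count for the post-stratum. The interval function is an assumption for illustration only: estimate ± 1.96 × SE is a common normal-approximation convention, and this document does not publish these intervals. The DSE and census inputs in the first two calls are made up.

```python
def percent_net_undercount(dse, census_count):
    """UC = 100 * (DSE - C) / DSE; a negative value indicates a net overcount."""
    return 100.0 * (dse - census_count) / dse


def approximate_interval(estimate, standard_error, z=1.96):
    """Interval estimate +/- z * SE, assuming approximate normality."""
    return estimate - z * standard_error, estimate + z * standard_error


print(percent_net_undercount(dse=1000.0, census_count=980.0))   # 2.0
print(percent_net_undercount(dse=1000.0, census_count=1010.0))  # -1.0

# Domain 4, Nonowner, Large/Medium MSA MO/MB, high return rate:
# UC 4.18 (Table 7-6) with standard error 0.66 (Table 7-7).
low, high = approximate_interval(4.18, 0.66)
print(round(low, 2), round(high, 2))  # 2.89 5.47
```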


Attachment.

The Effect of Movers on Dual System Estimation

This attachment discusses the effect of movers on Dual System Estimation (DSE). Three alternative methodologies for handling movers in DSE have been considered by the U.S. Census Bureau. Historically, they are referred to as PES-A, PES-B, and PES-C. However, the current terminology is to refer to them as Procedures A, B, and C. Following are the definitions of these methodologies as described in U.S. Bureau of the Census (1985).

Procedure A. This procedure reconstructs the households as they existed at the time of the census. A respondent is asked to identify all persons who were living or staying in the sample household on Census Day. These persons are then matched against names on the census questionnaire for the sample address (and surrounding area). From this information, estimates of the number and percent matched for nonmovers and outmovers can be made.

Procedure B. This procedure identifies all current residents living or staying in the sample household at the time of the interview. The respondent is asked to provide the address(es) where these persons were living or staying on Census Day. These persons are then matched against names on the corresponding census questionnaire(s) at the nonmover's or inmover's census address. Estimates of the number and percent matched for nonmovers and inmovers can be made.

Procedure C. This procedure identifies all current residents living or staying at the sample address at the time of the interview. It also identifies all other persons who lived at the sample address on Census Day and have moved since. However, only the Census Day residents (nonmovers and outmovers) are matched with the census questionnaire(s) at the sample address. This yields estimates of the number of nonmovers, outmovers, and inmovers, and of the percent matched for nonmovers and outmovers. Estimates of the numbers of nonmovers and movers come from Procedure B, and match rate estimates for the movers come from Procedure A (using outmover matching). Thus, Procedure C is a combination of Procedures A and B.

In 1990, Procedure B was used. The unresolved match rate for inmovers in 1990 was high, around 13 percent. In addition, sampling for nonresponse was initially planned for Census 2000, which would have made inmover matching even more difficult. A decision was made that Procedure B would NOT be used for Census 2000. When the Supreme Court decided against sampling for apportionment (no sampling for nonresponse), it was too late to change the decision on Procedure B.

In the 1995 and 1996 Census tests, Procedure A was used. The U.S. Census Bureau reasoned that an outmover match rate would be more accurate than an inmover match rate, particularly with sampling for nonresponse. For outmovers, interviewers attempted to obtain names, new addresses, and other data usable for matching from the new occupants or neighbors. Then an attempt could be made to trace the people to obtain an interview with a household member. The best available data for outmovers were matched to their Census Day addresses in the same manner as for the nonmovers.

Outmover tracing had problems in 1995 and was tested in 1996 and in the Census 2000 Dress Rehearsal. The outmover tracing evaluation by Raglin and Bean (1999) showed that there is little gain in an outmover tracing operation. A decision was made to use the outmover proxy interview data for outmover matching for Census 2000.

Procedure C was tested in the Census 2000 Dress Rehearsal and was used in Census 2000 (Schindler, 1999). The advantage of Procedure C is that the estimate of the number of movers uses inmover data. These data are more reliable because they are collected from the inmovers themselves. The match rate of the movers is estimated using the outmover match rate, so the difficulties of inmover matching are avoided. Outmover tracing is a problem, however, and in many cases it is necessary to use proxy data for matching. There was no outmover tracing for Census 2000. Procedure C attempts to obtain a Procedure B estimate with no inmover matching. Procedure C and Procedure B estimates are different since outmovers do not have the same match rate as inmovers. However, the disadvantage of the Procedure B inmover match rate estimate is that it may yield a high percentage of unresolved cases.
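The P-sample match rate implied by equation (1) of this chapter is $(M_n + (M_o/N_o) N_i)/(N_n + N_i)$. It shows how Procedure C combines its two sources: the inmover count $N_i$ is a Procedure B quantity, while the mover match rate $M_o/N_o$ comes from Procedure A-style outmover matching.

The Python sketch below uses made-up counts, and `p_sample_match_rate` is a hypothetical helper. Its Procedure B line uses a hypothetical inmover match count only to show how the two estimates can differ. No inmover matching was done for Census 2000.

```python
def p_sample_match_rate(n_n, n_i, m_n, mover_match_rate):
    """Estimated share of P-sample persons (nonmovers plus inmovers) who match."""
    return (m_n + mover_match_rate * n_i) / (n_n + n_i)


# Numbers of nonmovers and inmovers, and nonmover matches (made-up values).
n_n, n_i, m_n = 900, 50, 850

# Procedure C: the movers' match rate comes from outmover matching.
outmovers, outmover_matches = 60, 40
rate_c = p_sample_match_rate(n_n, n_i, m_n, outmover_matches / outmovers)

# Procedure B would instead match the inmovers themselves (hypothetical count).
inmover_matches = 30
rate_b = p_sample_match_rate(n_n, n_i, m_n, inmover_matches / n_i)

print(round(rate_c, 4), round(rate_b, 4))  # 0.9298 0.9263
```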


Chapter 8.

Model-Based Estimation for Small Areas

INTRODUCTION

This chapter documents the Accuracy and Coverage Evaluation (A.C.E.) methodology of synthetic estimation for small areas. It covers the estimation of sampling variances of synthetic estimates and the generalization of those variances. Synthetic estimation is the particular model used for coverage adjustment for small areas for A.C.E. First, the synthetic estimation methodology and the implied model are described. Then, the methodology for estimating sampling variances of these synthetic estimates and for generalizing these variances is discussed.

SYNTHETIC ESTIMATION METHODOLOGY FOR SMALL AREAS

Background

As discussed in Chapter 7, dual system estimates (DSE) and coverage correction factors were calculated at the post-stratum level. These are direct A.C.E. Survey estimates, based only on data from sample units in the post-stratum. However, census counts adjusted for coverage error are desirable for geographic areas much smaller than any post-stratum, such as blocks, tracts, counties, and congressional districts.

The adjusted counts were expected to improve the data used for congressional redistricting, as well as the data for states, most metropolitan areas, and larger counties and cities. They were also expected to provide consistent totals when census data are aggregated over many small areas.

Many of these areas do not include any A.C.E. sample units, making a direct estimate impossible (see Chapter 3 for details of A.C.E. sampling). The geographic areas that do include A.C.E. sample units have only a small number of them. A direct estimate would result in unacceptably large standard errors.

Synthetic estimation is discussed in Ghosh and Rao (1994), Gonzalez (1973), and Gonzalez and Waksberg (1973). Gonzalez (1973) describes synthetic estimation as follows: "An unbiased estimate is obtained from a sample survey for a large area; when this estimate is used to derive estimates for subareas under the assumption that the small areas have the same characteristics as the large area, we identify these estimates as synthetic estimates." Synthetic estimation was first used by the National Center for Health Statistics (1968) to calculate state estimates of long and short term physical disabilities from the National Health Interview Survey data (Ghosh and Rao, 1994). Synthetic estimation is a useful procedure for small area estimation. Its main advantages are its simplicity and its potential to increase accuracy by borrowing information from similar small areas.

Synthetic estimation was used for Census 2000 to provide adjusted population estimates for small geographic areas such as blocks, tracts, counties, and congressional districts. These block-level estimates can then be aggregated to any geographic level. The synthetic estimates provide revised population counts for all persons and for persons 18 and over. Counts by race are also provided for Hispanic or Latino persons (63 categories) and for Not Hispanic or Latino persons (63 categories). Both sets cover the total population and the population 18 years and over. For example, counts of single-race Asian persons who are Not Hispanic or Latino are given for both the total population and the population 18 years and over. Counts of single-race Asians who are Not Hispanic or Latino and less than 18 years of age can be obtained by subtraction.

Synthetic estimates are formed by combining coverage measurement results with census counts to produce population estimates for any geographic area of interest. For example, a block-level synthetic estimate is formed by distributing a post-stratum's coverage correction factor to blocks in proportion to the size of the post-stratum's population within each block. Rounded, adjusted synthetic estimates at the tabulation block level constitute the adjusted redistricting¹ data file.

The synthetic estimation model assumes that coverage correction factors are uniform within a given post-stratum. In other words, the coverage error rate for a given post-stratum is the same within all blocks. To the extent that the synthetic assumption is incorrect, the estimates of coverage for individual areas are biased. Hence, so are the population size estimates based on the coverage correction factors. Synthetic estimation bias decreases as the size of the geographic area increases.

¹ Since it was originally intended that the A.C.E. might be used to adjust census counts for redistricting, such data is called redistricting data, although it was not ultimately used for that purpose.

Synthetic Estimation

This section describes the calculation of synthetic estimates. Synthetic estimation includes a controlled rounding procedure used to produce estimates that are integer-valued. The visual representation of the twelve steps in the controlled rounding process given in Haines (2001) is provided here.
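The synthetic estimate described under Background, and the need for integer-valued results, can be illustrated with a small Python sketch. The estimate for post-stratum i in area g is $C_{i,g} \times CCF_i$, as defined under Calculation below.

The rounding step here is a deliberately simplified stand-in. It is a one-dimensional largest-remainder rule: each block total is rounded to an adjacent integer, and the sum stays within half a person of the unrounded total. It is not the Cox and Ernst (1982) two-dimensional controlled rounding program described under Controlled Rounding. The census counts, CCF values, and function names are made up for illustration.

```python
import math


def synthetic_estimates(census_counts, ccf):
    """N^S_{i,g} = C_{i,g} * CCF_i for every area g and post-stratum i."""
    return {area: {ps: count * ccf[ps] for ps, count in by_ps.items()}
            for area, by_ps in census_counts.items()}


def round_preserving_total(values):
    """Round each value to an adjacent integer so that the rounded values sum
    to the rounded total (a simple largest-remainder rule, not Cox-Ernst)."""
    floors = {k: math.floor(v) for k, v in values.items()}
    shortfall = round(sum(values.values())) - sum(floors.values())
    # Round up the entries with the largest fractional parts.
    by_fraction = sorted(values, key=lambda k: values[k] - floors[k], reverse=True)
    rounded = dict(floors)
    for k in by_fraction[:shortfall]:
        rounded[k] += 1
    return rounded


# Made-up census counts for three blocks and two post-strata.
census = {"b1": {"p1": 120, "p2": 41}, "b2": {"p1": 75, "p2": 10},
          "b3": {"p1": 33, "p2": 61}}
ccf = {"p1": 1.021, "p2": 0.987}

synthetic = synthetic_estimates(census, ccf)
block_totals = {b: sum(by_ps.values()) for b, by_ps in synthetic.items()}
print(round_preserving_total(block_totals))  # {'b1': 163, 'b2': 86, 'b3': 94}
```

The unrounded block totals here are about 162.99, 86.45, and 93.90. Their sum, about 343.33, rounds to 343, and the rounded blocks add up to exactly that.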


Calculation

Consider forming synthetic estimates for geographic level g for a given post-stratum. Let $C_{i,g}$ denote the census count for post-stratum i in geographic level g, and define $CCF_i$ to be the coverage correction factor for post-stratum i. The general form for a synthetic estimate for post-stratum i at geographic level g is calculated as

$$N_{i,g}^{S} = C_{i,g} \times CCF_i$$

Aggregating synthetic estimates over all the post-strata in geographic level g yields a synthetic estimate for the total population of geographic level g. This is denoted as

$$N_g^{S} = \sum_i C_{i,g} \times CCF_i$$

One purpose of synthetic estimation and the controlled rounding procedure is to produce integer-valued adjusted synthetic estimates at the tabulation block level. Then, summing over different geographies within a larger area yields the same estimate as that for the larger geographic area. These estimates comprise the adjusted redistricting data file.

Geography

Components of synthetic estimates use two slightly different organizations of geography. Both collection and tabulation blocks are used in the synthetic estimation process. A collection block is a geographic area used during census data-collection activities. The Hundred-Percent Census Edited File (HCEF) is based on collection block geography. Tabulation blocks, on the other hand, are geographic areas used for tabulating census data. The Hundred-Percent Detail File (HDF) is based on tabulation block geography.

Synthetic estimation census counts are based on tabulation block geography, while the coverage correction factors associated with post-strata are based on collection block geography. This could have ramifications for variables with a geographic component, although any such effects are probably small.

For example, consider the post-stratification variable return rate. Return rate was calculated at the tract level and based on collection-tract definitions. People were assigned to post-strata based on the return rate of tracts defined using collection blocks. Now consider the case where people are assigned to post-strata based on the return rate of tracts defined using tabulation blocks. The change in geography could cause an individual's post-stratum assignment to change. For example, suppose the return rate of a collection-tract is 80 percent and that the collection tract is split into two pieces by a tabulation-tract. A person who belonged to the collection-tract (with an 80 percent return rate) may now belong to a tabulation-tract with a different return rate.

Changes in an individual's post-stratum would also cause changes in the dual system estimates, coverage correction factors, and synthetic estimates. To avoid potential inconsistencies, people were assigned to post-strata only once. The assignment was based on collection-block geography, which was consistent with the geography used in the A.C.E. Further, this post-stratification assignment was maintained for all estimation purposes.

Controlled Rounding

Synthetic estimates at any geographic level are not typically integer-valued. The Census Bureau used a controlled rounding program that produces integer-valued estimates. The program was developed by the Statistical Research Division (SRD) of the U.S. Census Bureau. The theory of controlled rounding is given in Cox and Ernst (1982). The problem is represented as a transportation theory problem: minimize an objective function that measures the change due to controlled rounding. In essence, the controlled rounding program takes a two-dimensional matrix of numbers and rounds each to an adjacent integer value based on an efficiency algorithm. An optimal solution that minimizes the change due to controlled rounding is guaranteed; there can, however, be more than one optimal solution. The two dimensions of the matrix are:

1. the post-strata for one level of geography;
2. totals for a lower level of geography.

The controlled rounding procedure ensures that the sum of the synthetic estimates within a geographic level is rounded up or down by an amount strictly less than one person.

The overall goal of controlled rounding was to obtain an integer number of persons for each post-stratum i within each tabulation block b, reflecting the estimates of overcount and undercount. The controlled rounding program could not be implemented in one step due to the size of the post-strata by tabulation block matrix. As a result, controlled rounding was implemented in steps such that the rounded, adjusted synthetic estimates for blocks sum to:

  • the rounded, adjusted synthetic estimates for tracts,
  • the rounded, adjusted synthetic estimates for counties,
  • the rounded, adjusted synthetic estimates for states.

In other words, the block, tract, and county rounded, adjusted synthetic estimates would all be consistent with each other. Also, the state-level synthetic estimates are adjusted in order to guarantee that total population estimates at the state level sum to the national total population estimate.

A controlled rounding procedure for the U.S. can be implemented as follows:

8-2 Section IChapter 8 Model-Based Estimation for Small Areas U.S. Census Bureau, Census 2000

1. Form the ratio of the control-rounded dual system estimate ($DSE^R$) to the unrounded DSE for post-stratum i. It is written as

$$\frac{DSE^R_i}{DSE_i}$$

2. For each post-stratum i within state s, multiply the state-level synthetic estimate by the ratio formed in step 1. The superscript AS denotes an adjusted synthetic estimate. The resulting product is the adjusted synthetic estimate for post-stratum i within state s, written as

$$N^{AS}_{i,s} = N^{S}_{i,s}\left[\frac{DSE^R_i}{DSE_i}\right], \quad \text{where } N^{S}_{i,s} = C_{i,s} \times CCF_i.$$

3. Apply the controlled rounding procedure to the adjusted state-level synthetic estimates to produce rounded, state-level synthetic estimates, denoted $N^{RS}_{i,s}$. The superscript RS denotes a rounded, synthetic estimate. The two dimensions of this matrix are state s by post-stratum i.

   Matrix (rows: State 1, 2, ..., s; columns: post-stratum 1, 2, ..., i, ...): $N^{AS}_{i,s}$ is control-rounded to $N^{RS}_{i,s}$.

4. Calculate the ratio of the rounded state-level synthetic estimate to the state-level synthetic estimate for post-stratum i in state s, $N^{RS}_{i,s} / N^{S}_{i,s}$.

5. For each post-stratum i within county c for state s, multiply the county-level synthetic estimate by the ratio formed in step 4. The resulting product is the adjusted county-level synthetic estimate for post-stratum i, written as

$$N^{AS}_{i,c} = N^{S}_{i,c}\left[\frac{N^{RS}_{i,s}}{N^{S}_{i,s}}\right], \quad \text{where } N^{S}_{i,c} = C_{i,c} \times CCF_i.$$

6. Apply the controlled rounding procedure to the adjusted county-level synthetic estimates to produce rounded, adjusted, county-level synthetic estimates, denoted $N^{RS}_{i,c}$. The two dimensions of this matrix are county c (in state s) by post-stratum i (in state s).

   Matrix (rows: County 1, 2, ..., c; columns: post-stratum 1, 2, ..., i, ... in state s): $N^{AS}_{i,c}$ is control-rounded to $N^{RS}_{i,c}$.

7. Form the ratio of the rounded, adjusted, county-level synthetic estimate to the county-level synthetic estimate for post-stratum i in county c in state s, $N^{RS}_{i,c} / N^{S}_{i,c}$.

8. For each post-stratum i within tract t in county c for state s, form the product of the tract-level synthetic estimate and the ratio formed in step 7. This results in the adjusted tract-level synthetic estimate for post-stratum i, written as

$$N^{AS}_{i,t} = N^{S}_{i,t}\left[\frac{N^{RS}_{i,c}}{N^{S}_{i,c}}\right], \quad \text{where } N^{S}_{i,t} = C_{i,t} \times CCF_i.$$

9. Apply the controlled rounding procedure to the adjusted tract-level synthetic estimates to produce rounded, adjusted tract-level synthetic estimates, denoted $N^{RS}_{i,t}$. The two dimensions of this matrix are tract t (in county c in state s) by post-stratum i (in county c in state s).

   Matrix (rows: Tract 1, 2, ..., t; columns: post-stratum 1, 2, ..., i, ... in county c in state s): $N^{AS}_{i,t}$ is control-rounded to $N^{RS}_{i,t}$.

10. Calculate the ratio of the rounded, adjusted tract-level synthetic estimate to the tract-level synthetic estimate for post-stratum i in tract t in county c in state s, $N^{RS}_{i,t} / N^{S}_{i,t}$.


11. For each post-stratum i within block b in tract t in county c for state s, multiply the block-level synthetic estimate by the ratio formed in step 10. The resulting product is the adjusted block-level synthetic estimate for post-stratum i, written as

$$N^{AS}_{i,b} = N^{S}_{i,b}\left[\frac{N^{RS}_{i,t}}{N^{S}_{i,t}}\right], \quad \text{where } N^{S}_{i,b} = C_{i,b} \times CCF_i.$$

12. Again, apply the controlled rounding procedure to the adjusted block-level synthetic estimates to produce rounded, adjusted block-level synthetic estimates, denoted $N^{RS}_{i,b}$. The two dimensions of this matrix are block b in tract t in county c in state s by post-stratum i in tract t in county c in state s.

   Matrix (rows: Block 1, 2, ..., b; columns: post-stratum 1, 2, ..., i, ... in tract t in county c in state s): $N^{AS}_{i,b}$ is control-rounded to $N^{RS}_{i,b}$.

Record Replication for Coverage Correction

Once the rounded, adjusted block-level synthetic estimates were formed, they were compared with the census counts for post-stratum i in tabulation block b. Person records were then replicated at the post-stratum level to reflect the coverage correction for the census blocks. No attempt was made to place these persons in households. Thus, for example, the number of persons per household does not change due to coverage correction. The number of records replicated depends on the value of the coverage correction factor that is reflected in the rounded synthetic estimate for post-stratum i and tabulation block b.

Coverage Correction Factors ≥ 1

When the coverage correction factor for post-stratum i was greater than one, undercount person records were replicated to reflect the undercount in post-stratum i and block b as follows:

$$U_{i,b} = N^{RS}_{i,b} - C_{i,b}$$

If $U_{i,b} = 0$, then no additional records were necessary. If $U_{i,b} > 0$, then we replicated $U_{i,b}$ undercount person records for post-stratum i in tabulation block b.

Undercount person records were replicated by randomly selecting without replacement $U_{i,b}$ records from the $C_{i,b}$ available person records in post-stratum i and tabulation block b. The selected records were replicated and appended to the file of person records. The undercount person record for each of the replicated records was given an effective weight of +1 for tabulations. This resulted in an upward adjustment of people in post-stratum i in tabulation block b.

Coverage Correction Factors < 1

When the coverage correction factor for post-stratum i was less than one, overcount person records were replicated to reflect the overcount in post-stratum i and block b as follows:

$$O_{i,b} = C_{i,b} - N^{RS}_{i,b}$$

If $O_{i,b} > 0$, then $O_{i,b}$ overcount person records were replicated for post-stratum i in tabulation block b.

Overcount person records were replicated by randomly selecting without replacement $O_{i,b}$ records from the $C_{i,b}$ available person records in post-stratum i and tabulation block b. The selected records were replicated and appended to the file of person records. The overcount person record for each of the replicated records was given an effective weight of -1 for tabulations, resulting in a downward adjustment of people in post-stratum i in tabulation block b.

VARIANCE ESTIMATION FOR SMALL AREAS

Estimating the error due to sampling for any published estimate is a policy of the Census Bureau. This policy applies to synthetic estimates as well as the more traditional estimates. Due to the large number of estimates at lower levels of geography, it is not feasible to provide tables listing the standard error of each published estimate. Instead, a parameter, the generalized coefficient of variation (GCV), is provided that allows users to approximate the standard error for any desired estimate. The coefficient of variation of an estimate is simply the ratio of the estimate's standard error to the estimate itself.

Small area variance estimation is a two-step process. The first step consists of producing direct variance estimates for the synthetic count estimates for small areas such as census tracts. This process is explained under the heading Direct Variance Estimates. The second step is to model the direct variance estimates using the generalized coefficient of variation, or GCV. This method is explained under the heading Generalized Variance Estimates, along with an example.

Variances calculated for small areas do not account for all sources of synthetic error; they reflect only variation due to sampling. Synthetic population bias can exist since the same coverage correction factors are applied to areas with different net census coverage. See Griffin and Malec (2001) for details on estimating synthetic bias. In most very small geographic areas such as blocks and tracts, the biases are likely to be the principal source of error. Sampling errors dominate the total error for larger areas such as states and metropolitan areas. Bias in the post-stratum-level dual system estimates can stem from matching bias, data collection errors, and correlation bias, among other sources. Bell (2001) investigates and estimates correlation bias in the A.C.E. dual system estimates by comparing them to results from Demographic Analysis.

Direct Variance Estimates

During the post-stratum-level A.C.E. variance estimation operation, a variance-covariance matrix of the A.C.E. coverage correction factors (CCFs) was produced. The estimated variance of any synthetic population estimate can be computed using this matrix and the unadjusted census counts, broken down by post-stratum and excluding persons out of scope for the A.C.E. See Starsinic (2001) for details. A synthetic household population estimate (Group Quarters persons are not included) for tract t is written as

$$X_t = \sum_{h=1}^{416} X_{th} = \sum_{h=1}^{416} C_{th}\,CCF_h$$

where $C_{th}$ is the final, unadjusted census count for post-stratum h in tract t.

There were 416 post-strata used to estimate coverage. The variance for the synthetic household population estimate $X_t$ is

$$\mathrm{Var}(X_t) = \mathrm{Var}\Big(\sum_{h=1}^{416} X_{th}\Big) = \sum_{h=1}^{416}\sum_{h'=1}^{416} \mathrm{Cov}(X_{th}, X_{th'}) = \sum_{h=1}^{416}\sum_{h'=1}^{416} \mathrm{Cov}(C_{th}\,CCF_h,\; C_{th'}\,CCF_{h'}) = \sum_{h=1}^{416}\sum_{h'=1}^{416} C_{th}\,C_{th'}\,\mathrm{Cov}(CCF_h, CCF_{h'}).$$

For a given data item j in tract t, the synthetic variance for the synthetic household population estimate $X_{jt}$ is expressed as

$$\mathrm{Var}(X_{jt}) = \sum_{h=1}^{416}\sum_{h'=1}^{416} C_{jth}\,C_{jth'}\,\mathrm{Cov}(CCF_h, CCF_{h'}), \qquad (1)$$

where $C_{jth}$ is the final, unadjusted census count for data item j in post-stratum h in tract t.

Here h and h' refer to particular post-strata and j refers to a data item.

Generalized Variance Estimates

The generalized coefficient of variation (GCV) is the variance estimation methodology used for estimating variances of adjusted redistricting data and for estimates of adjusted population counts for the thousands of geographic areas that can be tabulated using synthetic estimation. For a given count in a particular state, the coefficient of variation (CV) was calculated for all tracts in that state that had population in the particular demographic category. The CV of an estimate is estimated as the ratio of the standard error of the estimate to the estimate itself, i.e.

$$CV(X) = \frac{SE(X)}{X}.$$

The standard error in the numerator is the square root of the variance estimate from (1). Tracts composed entirely of persons out of scope for the A.C.E. sample had no sampling variance (and therefore a CV of 0) and were removed from the processing. Also removed were tracts with a very small population in the demographic category, because the Census 2000 Dress Rehearsal analysis showed that such tracts had a disproportionate downward effect on the parameters. The process of removing tracts was controlled to prevent removing an overly large fraction of small tracts for any adjusted demographic data item. In addition, outliers were identified using the relative absolute deviation (RAD) statistic for each data item j. Tracts with a RAD value above the cutoff value were removed and a new GCV was computed using the CVs of the remaining tracts. There were four iterations of identifying and removing outliers.

Of the 286 unique demographic categories, GCVs were calculated for the 50 states and the District of Columbia for each of the 56 largest categories and 4 additional catch-all groups.

The average of the direct CVs for data items in a state is a GCV parameter. The state-level GCV parameters can then be used to estimate the standard error of a data item for all geographic areas within that state. Consider the following table of GCV parameters for a given state.


State Parameters for Calculating the Standard Error of A.C.E.-Adjusted Data

                                                                  All persons             Not Hispanic or Latino
Demographic category                                              All ages    18 and over All ages    18 and over
                                                                  GCV         GCV         GCV         GCV
All persons                                                       0.0063      0.0067      0.0066      0.0069
Hispanic or Latino                                                0.0106      0.0115      X           X
Population of one race                                            0.0064      0.0067      0.0066      0.0069
  White alone                                                     0.0073      0.0077      0.0081      0.0083
  Black or African American alone                                 0.0073      0.0083      0.0073      0.0083
  American Indian and Alaska Native alone                         0.0143      0.0147      0.0188      0.0190
  Asian alone                                                     0.0080      0.0085      0.0081      0.0086
  Native Hawaiian and Other Pacific Islander alone                0.0391      0.0495      0.0507      0.0545
  Some Other Race alone                                           0.0109      0.0119      0.0126      0.0139
Population of two or more races                                   0.0070      0.0077      0.0071      0.0082
  Population of two races                                         0.0071      0.0078      0.0071      0.0082
    White; Black or African American                              0.0103      0.0156      0.0103      0.0157
    White; American Indian and Alaska Native                      0.0088      0.0092      0.0096      0.0100
    White; Asian                                                  0.0116      0.0131      0.0120      0.0133
    Black or African American; American Indian and Alaska Native  0.0129      0.0140      0.0128      0.0140
    Asian; Native Hawaiian and Other Pacific Islander             0.0524      0.0560      0.0530      0.0566
  All other combinations of two or more races                     0.0088      0.0095      0.0088      0.0099

Suppose a data user is interested in calculating the standard error of the population estimate of all Asians in a given county. The data user would locate the GCV parameter that corresponds to the Asian alone demographic category and the All persons, All ages classification in the appropriate state table. For the table above, the GCV parameter is 0.0080. Now assume that the population estimate for all Asians in this county is 370 people. Users are instructed to use the formula

$$SE(X) = GCV \times X$$

to calculate the estimated synthetic standard error. This yields 0.0080 × 370 = 2.96, or about 3 people in this example. Similar calculations can be done for any geographic level and demographic category.
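The direct-variance formula (1), the CV, and the GCV shortcut can be sketched in a few lines of Python. This is a minimal illustration, not the Census Bureau's production code. The function names are hypothetical, and the two-post-stratum counts and covariance matrix are invented toy inputs rather than A.C.E. values. The tract screening and RAD-based outlier removal described above are reduced to dropping tracts with a CV of 0.

```python
import math


def synthetic_variance(counts, cov):
    """Equation (1): sum over h, h' of C_h * C_h' * Cov(CCF_h, CCF_h')."""
    n = len(counts)
    return sum(
        counts[h] * counts[k] * cov[h][k]
        for h in range(n)
        for k in range(n)
    )


def coefficient_of_variation(estimate, variance):
    """CV(X) = SE(X) / X, with SE taken as the square root of the variance."""
    return math.sqrt(variance) / estimate


def gcv_parameter(tract_cvs):
    """Average of the direct tract CVs after dropping tracts with a CV of 0.

    The production outlier screening (RAD statistic) is not modeled here.
    """
    kept = [cv for cv in tract_cvs if cv > 0]
    return sum(kept) / len(kept)


def gcv_standard_error(gcv, estimate):
    """Approximate SE(X) as GCV * X."""
    return gcv * estimate


if __name__ == "__main__":
    # Toy tract with two post-strata (invented numbers, for illustration only).
    counts = [100, 50]
    ccfs = [1.02, 0.99]
    cov = [[0.0001, 0.00002], [0.00002, 0.0004]]
    x = sum(c * f for c, f in zip(counts, ccfs))  # X_t = sum of C_th * CCF_h
    var = synthetic_variance(counts, cov)
    print(x, var, coefficient_of_variation(x, var))

    # The worked example from the text: GCV of 0.0080 for an estimate of 370.
    print(round(gcv_standard_error(0.0080, 370), 2))  # → 2.96
```

The last line reproduces the county example. In practice a user would only need `gcv_standard_error`, because the state tables already carry the averaged CVs.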

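The twelve-step controlled rounding cascade and the record replication earlier in this chapter share one pattern, applied for each post-stratum:

1. Scale the child-area synthetic estimates by the parent's ratio of rounded to synthetic estimate.
2. Round the scaled estimates so they still add to the parent's integer.
3. Compare the rounded block figures with the census counts and replicate records with weights of +1 or -1.

The Python sketch below illustrates that pattern for a single post-stratum, under stated assumptions. The source rounds a two-dimensional area-by-post-stratum matrix with the Cox and Ernst (1982) transportation-problem method. In contrast, the hypothetical `round_preserving_total` here is a one-dimensional largest-remainder rule that controls only the post-stratum total. All function names are illustrative. The sketch also assumes the child areas partition the parent, so the parent synthetic estimate equals the sum of the children's.

```python
import math
import random


def round_preserving_total(values, total):
    """Round each value to an adjacent integer so the results sum to `total`.

    One-dimensional largest-remainder rule (a stand-in, not Cox-Ernst).
    `total` is assumed to be the floor or ceiling of sum(values).
    """
    floors = [math.floor(v) for v in values]
    shortfall = total - sum(floors)
    if not 0 <= shortfall <= len(values):
        raise ValueError("total is not reachable by adjacent-integer rounding")
    # Round up the entries with the largest fractional parts.
    by_fraction = sorted(range(len(values)),
                         key=lambda k: values[k] - floors[k], reverse=True)
    rounded = floors[:]
    for k in by_fraction[:shortfall]:
        rounded[k] += 1
    return rounded


def adjust_and_round(child_synthetic, parent_rounded):
    """One cascade step for one post-stratum (e.g. steps 4 to 6: state to counties).

    Assumes the parent synthetic estimate equals the sum of the children's.
    """
    parent_synthetic = sum(child_synthetic)
    ratio = parent_rounded / parent_synthetic
    adjusted = [n * ratio for n in child_synthetic]           # N^AS for each child
    return round_preserving_total(adjusted, parent_rounded)   # N^RS for each child


def replicate_records(person_records, rounded_estimate, rng):
    """Select records to replicate for one post-stratum in one block.

    Returns (record, weight) pairs: weight +1 for undercount (U = N^RS - C),
    weight -1 for overcount (O = C - N^RS), and nothing when they are equal.
    """
    difference = rounded_estimate - len(person_records)
    weight = 1 if difference > 0 else -1
    # Sampling without replacement; random.sample raises ValueError if the
    # requested count exceeds the number of available records.
    chosen = rng.sample(person_records, abs(difference))
    return [(record, weight) for record in chosen]


if __name__ == "__main__":
    counties = adjust_and_round([10.4, 20.3, 30.3], parent_rounded=62)
    print(counties, sum(counties))   # → [10, 21, 31] 62
    rng = random.Random(2000)
    print(replicate_records(["p1", "p2", "p3", "p4"], 6, rng))
```

Because each step only redistributes an integer parent total among its children, the rounded block figures add up to the rounded tract, county, and state figures, as the text requires. A full implementation would replace the one-dimensional rule with a matrix rounding that also controls the area totals.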

Appendix A.

Census 2000 Missing Data

INTRODUCTION

The Census Bureau used imputation in the 2000 Decennial Census, as it has in prior censuses, to address the problem of missing, incomplete, or contradictory data, an inevitable aspect of censuses and surveys. It is impossible not to have missing data in an endeavor as massive and complex as a decennial census. In Census 2000, the Census Bureau processed data for over 120 million households, including over 147 million paper questionnaires and 1.5 billion pages of printed material. In the 2000 Census, the situations that resulted in missing data included:

  • incomplete or unavailable responses from housing units with previously confirmed addresses,
  • conflicting data about the same housing unit, and
  • failures in the data-capture process.

The types of missing data included:

  • characteristic data (information about an enumerated person, such as sex, race, and age),
  • population count data (information about the number of occupants in an identified housing unit), and
  • housing unit status data (whether the unit is vacant, occupied, or nonexistent).

The 2000 Census used two primary types of imputation.

1. The first type, called count imputation, is imputation of the number of occupants of a housing unit. Count imputation applies when the Census Bureau is unable to secure any information regarding a given address, or when the Census Bureau has limited information about the address and does not have definitive information on the number of occupants.
2. The second type, called characteristic imputation, is imputation that supplies missing characteristic data for a housing unit's response, but does not involve the number of occupants for a housing unit. For example, if a given housing unit did not provide ages for the individuals living in the housing unit, but supplied all other information, age would be imputed for the individuals in that housing unit. Sometimes the household size is known for the housing unit, but none of the characteristics of the people are known. In this case all of the persons' characteristics are imputed.¹

This appendix summarizes the methods used to impute these types of missing data in the census. Summary statistics showing the degree of imputation for these categories are given in the last section of this appendix.

¹ This does not include geographic characteristics such as location, urban or rural residency, etc., which are generally known for all households.

BACKGROUND

The census data collection activities started around mid-March of 2000, through the mail or directly using census enumerators. From June to September, census staff conducted nonresponse follow-up (NRFU) and coverage improvement follow-up (CIFU) operations. These operations revisited addresses for which census reports were not completed, that is, addresses that did not respond to mailout/mailback or early enumeration operations. Based on the results of these operations, the Census Bureau was able to designate more than 99.5 percent of housing unit records as occupied, vacant, or nonexistent housing unit addresses. Designating an address as vacant or nonexistent required at least two independent census operations. This was to ensure complete census coverage. Nonexistent housing units were addresses of places used only for nonresidential purposes, or places that were uninhabitable. These addresses were not included in the census counts.

To permit the production of census population counts, it was necessary for each census address to have a status of occupied, vacant, or nonexistent, and a household size if occupied. To permit the production of the redistricting file and other more detailed census products, it was also necessary to have information about each person, such as age, race, and sex. Count imputation supplied the status for housing units with undetermined status, and the household size for occupied units with an unknown number of occupants. Characteristic imputation was used to fill in the missing person data.

Census housing units identified in the Accuracy and Coverage Evaluation (A.C.E.) block clusters were defined as the E-sample housing units. Persons residing in these housing units were E-sample persons. It took several different census operations to establish a list of census housing unit records and a list of census person records. One of these operations was the creation of a Hundred Percent Census Unedited File (HCUF). At the housing unit level, all housing units designated as occupied or vacant through data collection or through imputation were included in the HCUF. The file was used as a source file to identify the E-sample housing units for the A.C.E. operations. At the person level, the HCUF was used as a source file for person matching between the census and the A.C.E. (This does not include imputed persons, since they were not sent to A.C.E. matching.) Chapter 3 provides detailed information on E-sample identification, while Chapter 4 provides information on person matching.

Census 2000 Missing Data Section IAppendix A A-1 U.S. Census Bureau, Census 2000

Two groups of persons were considered non-data-defined in the person Dual System Estimation (DSE): persons imputed to an occupied unit with an unknown number of occupants, and persons with all their characteristics imputed. For data-defined persons, characteristic imputation filled in missing census data, such as sex, age, ethnicity, and owner/renter status, for person DSE poststratification purposes.

COUNT IMPUTATION

The Census Bureau used count imputation for three categories of cases in Census 2000.

1. Household size imputation. The Census Bureau imputed the number of occupants for a housing unit when Census Bureau records indicated that the housing unit was occupied, but did not show the number of individuals residing in the unit.
2. Occupancy imputation. When Census Bureau records indicated that a housing unit existed, but not whether it was occupied or vacant, the Census Bureau imputed occupancy status (occupied or vacant). If the unit was imputed to be occupied, the household size was also imputed.
3. Status imputation. When the Census Bureau's records had conflicting or insufficient information about whether an address represented a valid, nonduplicated housing unit, the Census Bureau first imputed the status of the unit (occupied, vacant, nonexistent). Then, if the unit was occupied, the household size was imputed.

Methodology

The Census Bureau used the nearest-neighbor hot deck imputation methodology to perform the count imputation. "Nearest" was defined by the geographical closeness of housing units. Group quarters addresses were included in the measure of distance, although not otherwise involved in count imputation. The geographical proximity of housing records was described using census geographical identifiers, such as tract number, block number, or map spot number, along with street name, house number, or apartment number. To properly assign status and number of occupants to the housing units requiring imputation, limited donor pools and expanded donor pools were developed for each imputation category. These pools were further subdivided by type of structure.

All cases with missing status, occupancy, or household size went through intensive follow-up operations to reduce the amount of imputation as much as possible. This was the main purpose of the NRFU and CIFU for mailout/mailback areas and of the enumerator visits in list/enumerate or update/enumerate areas. To properly represent these cases (donees), the primary donor pools were also housing units from NRFU, CIFU, or other enumerator-visited cases. In the design phase, the Census Bureau did develop a standby procedure to include all enumerations in an expanded donor pool. With 99.5 percent of housing units having status and household size information available from data collection activities, the expanded donor pools were never used. The chart below characterizes the relationship between donees and the primary donor pool by imputation category.

Donors and Donees by Imputation Category

Household size imputation (a. single units; b. multiunits)
   Donees: occupied with unknown household population
   Donor pool: occupied units with known population (in NRFU, or CIFU, or from list/enumerate or update/enumerate areas)

Occupancy imputation (a. single units; b. multiunits)
   Donees: units known to be either occupied or vacant
   Donor pool: occupied units with known population or vacant units (in NRFU, or CIFU, or from list/enumerate or update/enumerate areas)

Status imputation (a. single units; b. multiunits)
   Donees: units with no status information
   Donor pool: occupied units with known population, vacant units, or nonexistent units (in NRFU, or CIFU, or from list/enumerate or update/enumerate areas)

In general, three factors determined whether a housing record could be used as a primary donor: type of structure (multi or single), type of enumeration (mail or list/enumerate), and final stage of data collection for the housing unit (initial collection, NRFU, or CIFU). Each available donor could be used only once. Most of the time, the nearest potential donor was selected as the donor. Occasionally, the second-nearest neighbor was designated as the donor, because the nearest donor had been taken by a previously processed donee. Whenever possible, the donor and donee were to be in the same tract, or in the same multiunit if the donee was located in a multiunit building.

To identify the nearest donor, a search was conducted in both directions: forward and backward. Using the donee as a reference point, potential donors surrounding the donee record were searched, and the donor housing unit geographically closest to the donee housing unit was determined. The search was done separately for single units and multiunits.
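The nearest-neighbor search with single-use donors can be sketched as follows. This is a hypothetical Python illustration, not the Census Bureau's program, and it rests on several simplifying assumptions:

  • Geographic closeness is approximated by each record's position in a file sorted in geographic order (`order`).
  • Only household size is imputed.
  • When two donors are equally close, the backward (earlier) record wins.
  • The donor-eligibility rules (type of enumeration, stage of data collection, same tract or multiunit) are reduced to matching the structure type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HousingRecord:
    order: int                      # position in a geographically sorted file (hypothetical proxy)
    structure: str                  # "single" or "multi"
    household_size: Optional[int]   # None marks a donee that needs count imputation


def impute_household_size(records: list[HousingRecord]) -> dict[int, int]:
    """Nearest-neighbor hot deck sketch in which each donor is used at most once."""
    recs = sorted(records, key=lambda r: r.order)
    used: set[int] = set()
    imputed: dict[int, int] = {}
    for i, donee in enumerate(recs):
        if donee.household_size is not None:
            continue
        donor = None
        # Step outward one record at a time, checking backward before forward.
        for step in range(1, len(recs)):
            for j in (i - step, i + step):
                if 0 <= j < len(recs):
                    cand = recs[j]
                    if (cand.household_size is not None
                            and cand.structure == donee.structure
                            and cand.order not in used):
                        donor = cand
                        break
            if donor is not None:
                break
        if donor is not None:  # otherwise an expanded donor pool would be needed
            used.add(donor.order)
            imputed[donee.order] = donor.household_size
    return imputed


if __name__ == "__main__":
    records = [
        HousingRecord(1, "single", 4),
        HousingRecord(2, "single", None),
        HousingRecord(3, "single", None),
        HousingRecord(4, "multi", 2),
        HousingRecord(5, "single", 9),
    ]
    print(impute_household_size(records))  # → {2: 4, 3: 9}
```

In the example, record 2 takes its immediate neighbor, record 1. For record 3, records 1 and 5 are the closest eligible single-unit donors (two positions away). Record 1 is already used, so record 5 donates, which mirrors the "second-nearest neighbor" case described above.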


CHARACTERISTIC IMPUTATION

Characteristic imputation was the process of filling in missing person characteristics, which include sex, age/date of birth, relationship, Hispanic origin, and race. The Census Bureau used characteristic imputation for three categories of cases in Census 2000.

1. Whole household imputation. The Census Bureau imputed all of the characteristics for all of the persons in the household when the household record did not contain any data-defined persons. To be data-defined, a person record must contain a name, or two or more of the 100-percent population data items.
2. Within household imputation. The Census Bureau imputed all the 100-percent characteristics for any non-data-defined persons in the household when the household record contained at least one data-defined person.
3. Within person imputation. When some of the 100-percent characteristic data for data-defined persons were missing, those items were imputed.

Methodology

The categories of characteristic imputation employ different methodologies. For whole household imputations, the process replicates all of the 100-percent person data items by substituting data from a hot deck nearest neighbor donor pool record of the same household size. This process is sometimes referred to as substitution, since it assigns all the characteristics for all of the persons in the selected donor household to the household requiring imputation. This substitution process is also used to obtain the person characteristics for those housing units that were imputed as occupied or had their household size imputed during the count imputation process. By definition, these households do not contain any data-defined persons. However, the majority of whole household imputations occur for cases where a census response on household size was obtained.

For within household imputations as well as within person imputations, the process allocates missing values for individual person characteristic data items on the basis of other reported information for other persons in the household, or from other persons in households with similar characteristics.

RESULTS

This section briefly summarizes the overall level of imputation for people whose 100-percent characteristics were totally imputed in Census 2000, for the U.S. population residing in housing units. Within person imputations are excluded.

Census 2000 Housing Unit Persons by Imputation Category
(Excludes within person imputations)

                                                          Number of    Percent of
                                                            persons total persons
Total housing unit population                           273,643,272        100.00
  100-percent characteristic imputation not required    267,869,007         97.89
  100-percent characteristic imputation required          5,774,266          2.11
    Count imputations                                     1,172,144          0.43
      Household size                                        495,600          0.18
      Occupancy                                             260,652          0.10
      Status                                                415,892          0.15
    Characteristic imputations                            4,602,122          1.68
      Whole household¹                                    2,269,010          0.83
      Within household                                    2,333,112          0.85

¹ The count imputation cases (also requiring characteristic imputation) are not included in this figure to avoid duplication.

About 2 percent of persons residing in housing units required imputation of all 100-percent characteristics. The majority of these cases, about 1.7 percent, occurred in situations where a census response on household size was obtained. Less than half of a percent were situations where household size or the status of the housing unit was unknown.
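Each percentage in the results table is the category count divided by the total housing unit population, times 100. The short Python check below reproduces the published percentages to two decimals. The counts are copied from the table; the dictionary keys and function name are illustrative only.

```python
TOTAL_HOUSING_UNIT_POPULATION = 273_643_272

# Counts of persons, copied from the imputation-category table.
COUNTS = {
    "imputation not required": 267_869_007,
    "imputation required": 5_774_266,
    "count imputations": 1_172_144,
    "household size": 495_600,
    "occupancy": 260_652,
    "status": 415_892,
    "characteristic imputations": 4_602_122,
    "whole household": 2_269_010,
    "within household": 2_333_112,
}


def percent_of_total(count, total=TOTAL_HOUSING_UNIT_POPULATION):
    """Share of the housing unit population, in percent, rounded to 2 decimals."""
    return round(100 * count / total, 2)


for name, count in COUNTS.items():
    print(f"{name:28s} {percent_of_total(count):6.2f}")
```

The same counts also confirm the table's structure:

  • Count imputations plus characteristic imputations equal "imputation required" (5,774,266).
  • The three count-imputation rows add to 1,172,144.
  • The two characteristic-imputation rows add to 4,602,122.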


Appendix B.

Demographic Analysis INTRODUCTION of the census and are highly reliable, such as administra-tive statistics on births, deaths, and Medicare data and The Census Bureau has used Demographic Analysis (DA) estimates of immigration and emigration. The difference to measure population coverage, trends between cen-between the DA estimated population (P) and the census suses, and differences in coverage by age, sex, and race count (C) provides an estimate of the net census under-(Black, non-Black) at the national level in every census count (u). Dividing the net undercount by the DA bench-since 1960 (Siegel and Zelnik (1966), Siegel (1974), Fay et mark provides an estimate of the net undercount rate (r):

al. (1988), and Robinson et al. (1993)). DA produces estimates of the U.S. population through the use of data from administrative records and other noncensus sources. It has documented both the long-term reduction in the census net undercount rate and the persistent and disproportionate undercount of certain demographic groups, such as Black men. One goal of Census 2000 was to reduce these differential undercounts, which has been a continuing effort for the last several censuses.

The independence from the census and the internal consistency of the DA estimation process allow us to compare the results with the survey-based Accuracy and Coverage Evaluation (A.C.E.) coverage estimates; in particular, the consistency of the age-sex results can be assessed. DA and A.C.E. use entirely different methodologies. Because the sources and patterns of errors in the two estimates are sufficiently different, any disagreement in the results can shed light on both the quality of the census and potential problems in the methodology of the A.C.E. or the DA. Because of data limitations, DA estimates and comparisons are only possible at the national level and for certain large demographic groups. A further discussion of DA limitations is found in the section Limitations of DA Estimates of this appendix.

The U.S. Census Bureau released two sets of DA results as part of its evaluation of Census 2000 and the A.C.E. All DA results in this section are from the revised values released in October 2001. See Robinson (2001) for details.

DESCRIPTION OF THE DEMOGRAPHIC ANALYSIS METHOD

Demographic Analysis represents a macro-level approach for measuring coverage. Estimates of net undercount are obtained by comparing census counts to independent estimates of the population derived from other measures (mostly administrative data). In general, DA population estimates are developed for the census date by combining various types of demographic data that are independent of the census. The net undercount $u$ and the net undercount rate $r_u$ are then computed from the DA estimate $P$ and the census count $C$:

$$u = P - C, \qquad r_u = \frac{u}{P} \times 100$$

The particular analytic procedure used to estimate coverage nationally for the various demographic subgroups depends primarily on the nature and availability of the required demographic data. Two different demographic techniques were used to produce the demographic analysis estimates for 2000: one for the population under age 65 and another for the population 65 and over.

Ages under 65. The Demographic Analysis estimates for the population below age 65 are based on the compilation of historical estimates of the components of population change: births since 1935 ($B$), deaths to persons born since 1935 ($D$), immigrants born since 1935 ($I$), and emigrants born since 1935 ($E$). Presuming that the components are accurately measured, the population estimates ($P_{0\text{-}64}$) are derived by the basic demographic accounting equation applied to each birth cohort:

$$P_{0\text{-}64} = B - D + I - E$$

The size of the component estimates used to develop the DA population under age 65 for 2000 is shown in Table B-1.

Table B-1. DA Estimates of the Components of Change for the U.S. Resident Population: April 1, 2000

Component                                                   Estimate
Total population                                         281,759,858
Under age 65 in 2000
  + Births since 1935 (B)                                234,860,298
  - Deaths to persons born since 1935 (D)                 14,766,736
  + Immigration of persons born since 1935 (I)            32,563,971
  - Emigration of persons born since 1935 (E)              5,485,117
Ages 65 and over in 2000
  Medicare-based population                               34,587,440

Demographic Analysis, Section I, Appendix B, B-1, U.S. Census Bureau, Census 2000
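As a rough consistency check on Table B-1, the accounting equation can be applied to the aggregate components. The actual DA calculation works on single-year birth cohorts, so this is only a sketch. The snippet adds and subtracts the published figures. The constant and function names are illustrative and do not come from any Census Bureau system.

```python
"""Aggregate check of the DA accounting equation P(0-64) = B - D + I - E.

The figures are the Table B-1 component estimates. The real DA method
applies the equation to each single-year birth cohort, not to these totals.
"""

BIRTHS = 234_860_298           # B: births since 1935
DEATHS = 14_766_736            # D: deaths to persons born since 1935
IMMIGRATION = 32_563_971       # I: immigration of persons born since 1935
EMIGRATION = 5_485_117         # E: emigration of persons born since 1935
MEDICARE_65_PLUS = 34_587_440  # Medicare-based population 65 and over
PUBLISHED_TOTAL = 281_759_858  # Table B-1 total population


def population_under_65(b: int, d: int, i: int, e: int) -> int:
    """Basic demographic accounting equation: add births and immigrants,
    subtract deaths and emigrants."""
    return b - d + i - e


p_0_64 = population_under_65(BIRTHS, DEATHS, IMMIGRATION, EMIGRATION)
da_total = p_0_64 + MEDICARE_65_PLUS
share_65_plus = 100 * MEDICARE_65_PLUS / PUBLISHED_TOTAL

print(f"P(0-64)         = {p_0_64:,}")
print(f"DA total        = {da_total:,} (published {PUBLISHED_TOTAL:,})")
print(f"65+ share of DA = {share_65_plus:.1f}%")
```

Run as written, the recomputed total comes out 2 persons below the published 281,759,858, presumably because the published components were rounded independently. The 65-and-over share comes out at about 12.3 percent, which matches the figure given in the text.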

Clearly, births (234.9 million) represent by far the largest component. The immigration component (32.6 million) is second largest, followed by deaths (14.8 million) and emigrants (5.5 million).

The actual calculations are carried out for single-year birth cohorts. For example, the estimate of the population age 40 on April 1, 2000 is based on births from April 1959 to March 1960 (adjusted for under-registration). It is reduced by deaths to the cohort in each year between 1959 and 2000, incremented by estimated immigration of the cohort, and reduced by estimated emigration of the cohort over the 40-year period.

The components for births and deaths are compiled principally from vital statistics records augmented by correction factors. The immigration component is estimated from its subcomponents:

Table B-2. DA Estimates of the Components of Immigration for the U.S. Resident Population Under 65 Years of Age: April 1, 2000

Component                                                   Estimate
Legally admitted permanent residents                      20,332,038
Other measured migration                                   2,249,001
  Migrants from Puerto Rico                                  905,698
  Temporary migrants                                         776,002
  Civilian citizen migration                                 891,940
  Armed Forces overseas                                     -324,639
Residual foreign-born migration (includes
  unauthorized migrants)                                   9,982,932

Age 65 and over. Administrative data on aggregate Medicare enrollments are used to estimate the population age 65 and over ($P_{65+}$):

$$P_{65+} = M + m$$

where $M$ is the aggregate Medicare enrollment and $m$ is the estimate of underenrollment in Medicare. The DA population 65 and over is based on 2000 Medicare enrollments. Medicare is an administrative data set from the Health Care Financing Administration. Medicare enrollment is generally presumed to be quite complete, but adjustments are made to the basic data to account for individuals who are omitted. An allowance is made for the estimated 1.3 million not enrolled (3.9 percent). Underenrollment factors are based on survey estimates of Medicare coverage and data on age at enrollment in the Medicare file. The DA population aged 65 and over (34.6 million) represents 12.3 percent of the total population in 2000.

The demographic component estimates for the population under 65 are combined with the Medicare-based estimate for the population 65 and over to produce the total DA population estimate of 281.8 million as of April 1, 2000.

LIMITATIONS OF DA ESTIMATES

DA estimates for the total population are available only at the national level and only for the broad categories Black and non-Black. DA cannot provide estimates for subnational geographic areas, such as states or metropolitan areas. It also cannot provide estimates for other demographic groups, such as Hispanics. DA also cannot provide separate estimates of census overcoverage and undercoverage; it is limited to estimating net undercount.

There are also certain inherent limitations on DA estimates because of data quality. The race categories reflect the race as assigned at the time of the event (e.g., birth or Medicare enrollment), which for some persons will differ from the race reported in the census. There is also considerable uncertainty in the quality of the data for some of the components related to immigration. This applies most importantly to the components that capture those who entered illegally or temporarily, or whose legal status had not yet been determined.

DA ESTIMATES

Compared to the Census 2000 count of 281.4 million, the DA estimate of 281.8 million implies a net census undercount of 0.12 percent (see Table B-3). The net census undercount in 2000 was dramatically different from that in 1990, which was 4.2 million, or 1.65 percent. However, DA provides only a net undercount estimate, not separate measures of gross undercount and overcount. This limits its ability to shed light on specific undercoverage or overcoverage problems in the census.

Table B-3. Demographic Analysis Estimate and Net Census Undercount for the Total Population: 1990 and 2000

Category                          1990 Census    2000 Census
DA (millions)                           252.9          281.8
Difference from Census                    4.2            0.3
Percent difference                       1.65           0.12
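The subcomponents in Table B-2 add up exactly. The four indented rows sum to the "Other measured migration" line, and the three main rows sum to the immigration component I of Table B-1. The sketch below checks that arithmetic and then applies u = P - C and r_u = 100 u / P to the totals. The census count used here, 281,421,906, is the official Census 2000 resident population. It is supplied as an outside input, because this appendix quotes it only as 281.4 million. The helper name `net_undercount` is illustrative.

```python
# Table B-2 subcomponents of "Other measured migration".
OTHER_MEASURED = {
    "Migrants from Puerto Rico": 905_698,
    "Temporary migrants": 776_002,
    "Civilian citizen migration": 891_940,
    "Armed Forces overseas": -324_639,  # entered as a negative amount in Table B-2
}
LEGAL_PERMANENT_RESIDENTS = 20_332_038
RESIDUAL_FOREIGN_BORN = 9_982_932  # includes unauthorized migrants

other_measured_total = sum(OTHER_MEASURED.values())
immigration_total = (
    LEGAL_PERMANENT_RESIDENTS + other_measured_total + RESIDUAL_FOREIGN_BORN
)


def net_undercount(da_estimate: int, census_count: int) -> tuple[int, float]:
    """Return (u, r_u) with u = P - C and r_u = 100 * u / P.

    A negative value would indicate a net overcount.
    """
    u = da_estimate - census_count
    return u, 100 * u / da_estimate


# P is the Table B-1 total; C is the official Census 2000 count (outside input).
u, r_u = net_undercount(281_759_858, 281_421_906)

print(f"Other measured migration: {other_measured_total:,}")
print(f"Immigration component I:  {immigration_total:,}")
print(f"Net undercount u = {u:,} ({r_u:.2f} percent)")
```

The computed rate rounds to the 0.12 percent shown in Table B-3. The 0.3 million "Difference from Census" in that table reflects the same roughly 338,000-person gap after rounding.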


The DA estimates indicate that the substantial reduction in net census undercount from 1990 to 2000 was shared by almost all demographic groups. The net census undercount rates of males and females each fell by about 1.5 percentage points. In 2000 this left an estimated net census undercount of 0.86 percent for males and an estimated net census overcount of 0.60 percent for females. The estimated net undercount rate dropped more for Blacks (estimated net census undercount of 2.78 percent in 2000) than for non-Blacks (estimated net census overcount of 0.29 percent in 2000). This reduced the differential undercount of Blacks relative to non-Blacks from 4.4 percentage points in 1990 to 3.1 points in 2000.

Table B-4. Demographic Analysis Estimates of Percent Net Census Undercount for the Total Population and Selected Demographic Groups: 1990 and 2000

Category                        1990 DA    2000 DA
Total                              1.65       0.12
Male                               2.39       0.86
Female                             0.93      -0.60
Black                              5.52       2.78
Non-Black                          1.08      -0.29
Black male, ages 20-64            11.31       8.44
Children, ages 0-4                 3.72       3.84
(A minus sign denotes a net overcount.)
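The percentage-point changes quoted above follow directly from Table B-4. A minimal sketch of the arithmetic follows. The dictionary layout and the helper names `differential` and `decline` are illustrative choices.

```python
# Percent net census undercount from Table B-4: (1990 DA, 2000 DA).
# A negative value denotes a net overcount.
TABLE_B4 = {
    "Total": (1.65, 0.12),
    "Male": (2.39, 0.86),
    "Female": (0.93, -0.60),
    "Black": (5.52, 2.78),
    "Non-Black": (1.08, -0.29),
    "Black male, ages 20-64": (11.31, 8.44),
    "Children, ages 0-4": (3.72, 3.84),
}
YEAR_INDEX = {1990: 0, 2000: 1}


def differential(group: str, reference: str, year: int) -> float:
    """Gap in percent net undercount between two groups, in percentage points."""
    k = YEAR_INDEX[year]
    return TABLE_B4[group][k] - TABLE_B4[reference][k]


def decline(group: str) -> float:
    """Fall in percent net undercount from 1990 to 2000 (negative means a rise)."""
    rate_1990, rate_2000 = TABLE_B4[group]
    return rate_1990 - rate_2000


for year in (1990, 2000):
    print(year, f"{differential('Black', 'Non-Black', year):.1f}")
for group in ("Male", "Female", "Children, ages 0-4"):
    print(group, f"{decline(group):.2f}")
```

Children ages 0-4 are the only group in the table whose rate rose (a decline of -0.12 points). This is consistent with the text's "almost all demographic groups".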


Appendix C.

Weight Trimming

INTRODUCTION

This appendix contains a general overview of the Accuracy and Coverage Evaluation (A.C.E.) weight trimming plan. The procedure was designed to protect against undue influence from a small fraction of the sample. The weight trimming criteria were established prior to the completion of data processing operations to ensure that there was no manipulation of the dual system estimates.

This procedure was implemented according to the prespecified criteria. Since only one cluster was trimmed, the impact on the dual system estimates was minimal. The A.C.E. weight trimming procedure was designed to reduce the sampling weights for clusters that potentially could have had an extreme influence on the dual system estimates and variances. The measure of cluster influence was the net cluster error: the absolute difference between the weighted estimate of nonmatches and the weighted estimate of erroneous enumerations. When the net error exceeded a pre-set maximum value, the sampling weights were reduced. This approach reduced variance and may have introduced some bias, but it is highly likely to have reduced the mean square error for most items. If the net error of the cluster did not exceed the pre-set maximum value, the sampling weights were unchanged.

The net error criterion was examined after the A.C.E. person matching operation was completed. If the criterion for weight trimming was met, trimming was applied to all sample cases in the cluster, even when the cluster contributed sample to multiple post-strata. This was done prior to the missing data process.

BACKGROUND

Weight trimming guards against the possibility of a small number of clusters exerting an undue influence on post-stratum estimates and variances. In the A.C.E., such influence was expected to come from a disproportionate number of census nonmatches or census erroneous enumerations within a few block clusters. Extreme sampling weights can also be a source of influence in surveys. However, the A.C.E. sampling weights (the inverse of the probability of selection) were reasonably well controlled in the sample design and were not expected to be an important source of variance in the A.C.E.

While the A.C.E. sample design helped minimize the occurrence of highly influential clusters, a weight trimming plan was developed to reduce the effect of these potential extreme clusters. The A.C.E. weight trimming plan was a modification of the method used for the 1990 Post-Enumeration Survey (PES). As in 1990, the weights for extremely influential clusters were trimmed to yield a prespecified net error. The intention of the plan was to lessen the impact of such clusters on the dual system estimates and variances.

The plan did not redistribute the weights across the remaining clusters to preserve totals. Doing so would have required treating the E and P samples differently to preserve their separate totals, which contradicts the preference for consistent treatment of both samples. Since the primary interest was in the dual system estimation ratios, and not in the E- and P-sample totals, the weights were not redistributed.

A.C.E. WEIGHT TRIMMING METHODOLOGY

Each cluster was evaluated to determine whether it contributed disproportionately to the dual system estimates and variances. If the cluster was an outlier, the cluster sampling weight was multiplied by a factor to decrease the influence of the cluster on the dual system estimates and variances.

Identify Outlier Clusters

A measure of the cluster influence was calculated for each cluster. Then, based on pre-set criteria, a decision was made whether the cluster should be identified as an outlier.

Cluster Influence. The measure of cluster influence was the net error. For purposes of weight trimming, the net error was the absolute difference between the weighted number of nonmatches and the weighted number of erroneous enumerations. The form of the weighted net error was

$$Z_i = \left| (P_i - M_i) - (E_i - CE_i) \right| \qquad (1)$$

where

$Z_i$ = the net error estimate for cluster $i$,
$P_i$ = the weighted P-sample population estimate for cluster $i$,
$M_i$ = the weighted P-sample match estimate for cluster $i$,
$E_i$ = the weighted E-sample population estimate for cluster $i$, and
$CE_i$ = the weighted E-sample correct enumeration estimate for cluster $i$.

The first term of equation (1) was the weighted number of nonmatches in the $i$th cluster, while the second term was the weighted number of erroneous enumerations in the $i$th cluster.
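Equation (1) can be written as a small function. In the sketch below, the field names mirror $P_i$, $M_i$, $E_i$ and $CE_i$. The `ClusterTotals` type and the example values are hypothetical. Because the measure is a net error rather than a gross error, a cluster whose nonmatches and erroneous enumerations are equal has a net error of zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterTotals:
    """Weighted totals for one block cluster (hypothetical container)."""

    p: float   # P_i: weighted P-sample population estimate
    m: float   # M_i: weighted P-sample match estimate
    e: float   # E_i: weighted E-sample population estimate
    ce: float  # CE_i: weighted E-sample correct enumeration estimate

    @property
    def nonmatches(self) -> float:
        return self.p - self.m

    @property
    def erroneous_enumerations(self) -> float:
        return self.e - self.ce


def net_error(cluster: ClusterTotals) -> float:
    """Equation (1): Z_i = |(P_i - M_i) - (E_i - CE_i)|."""
    return abs(cluster.nonmatches - cluster.erroneous_enumerations)


# Hypothetical clusters: 100 nonmatches vs 70 erroneous enumerations,
# and a cluster where the two offset each other exactly.
print(net_error(ClusterTotals(p=1_200, m=1_100, e=1_250, ce=1_180)))  # 30
print(net_error(ClusterTotals(p=1_000, m=500, e=1_000, ce=500)))      # 0
```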


Outlier Criteria. The outlier criterion was the maximum allowable net error for a single cluster. There were two different criteria based on the cluster geography. The nation was classified into two levels of geography: American Indian Reservations and the balance of the nation. The American Indian Reservation clusters were sampled at disproportionately higher rates relative to the balance of the country. In addition, separate post-stratum estimates were planned for American Indians on American Indian Reservations. If the American Indian Reservation clusters had been included with the rest of the nation, an influential cluster would probably not have been detected. The two outlier criteria are defined in Table C-1.

Table C-1. Outlier Cluster Criteria

Cluster geography                        Maximum net error
American Indian Reservations                         6,250
Balance of the United States                        75,000

All clusters with net error greater than the maximum allowable net error were considered influential clusters. They were expected to disproportionately influence the dual system estimates and variances. The sampling weights of these clusters were decreased.

The maximum net error for the balance of the country was based on experience in the 1990 PES. Since the A.C.E. was roughly double the PES sample size, the maximum allowable net error was set to half the 1990 value. For the American Indian Reservation clusters, the maximum allowable value was a function of the average sampling rates. The American Indian Reservation average P-sample cluster sampling weight was approximately one-twelfth the balance of the U.S. average P-sample cluster sampling weight. Because of this, the American Indian Reservation maximum allowable net error was one-twelfth the balance of the U.S. criterion.

Implementation Strategy. The outlier clusters were identified after the person matching operation (Chapter 4) was completed, but before the missing data process (Chapter 6). The person matching results were the major input into this process. The weight trimming estimate used the best estimate of cluster net error that was operationally feasible at that time. This timing had several implications:

  • Only nonmovers and outmovers were used for deriving the estimate of nonmatches above. For dual system estimation, if the number of outmovers in a post-stratum was less than 10, then only the nonmovers and outmovers were used. Because of the small number of movers expected in most clusters, this process used only nonmovers and outmovers.

  • Some nonmovers and outmovers had unresolved match status and residence status, and some E-sample cases had unresolved enumeration status. This meant the status of unresolved cases had to be estimated to identify outlier clusters. Information available at the time of the weight trimming process was used to approximately estimate the status of the unresolved cases. Since the weight trimming process was done before the missing data process, some of the information that the missing data process used to estimate unresolved status was not yet available.

  • A P-sample noninterview adjustment was approximated in the estimate of nonmatches. Information available during the weight trimming process was used to approximately estimate the noninterview adjustment for each cluster. As with the unresolved cases, the weight trimming process was done before the missing data process. Some of the information that the missing data process used for the noninterview adjustment was therefore not yet available.

  • The targeted extended search results and sampling rates were reflected in the estimate of nonmatches and erroneous enumerations (Chapter 5).

Down-Weighting Outlier Cluster

All outlier clusters were down-weighted so that no cluster contributed more than the maximum allowable net error for the appropriate geography. A separate down-weighting factor was computed for each outlier cluster. The down-weighting factor was the ratio of the outlier cluster criterion to the cluster net error computed above:

$$D_i = \frac{C}{Z_i} \qquad (2)$$

where

$D_i$ = the down-weighting factor for cluster $i$,
$C$ = the maximum net error from Table C-1 for the appropriate level of geography, and
$Z_i$ = the net error estimate for cluster $i$ from (1).

The cluster down-weighting factor was applied to the P-sample and the E-sample weights of the outlier cluster. The P-sample and E-sample weights for the remaining clusters were unchanged.
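Equation (2) and Table C-1 combine into a simple rule. The sketch below uses hypothetical geography keys and function names. It also shows why the rule yields a prespecified net error. Every weighted total in equation (1) is a sum of weights, so multiplying all of a cluster's P- and E-sample weights by $D_i$ multiplies $Z_i$ by $D_i$ as well, and $D_i Z_i = C$.

```python
# Table C-1: maximum allowable net error C by cluster geography.
MAX_NET_ERROR = {
    "balance_of_us": 75_000,
    # One-twelfth of the balance-of-U.S. criterion, reflecting the roughly
    # one-twelfth average P-sample cluster sampling weight on reservations.
    "american_indian_reservation": 75_000 / 12,  # = 6,250
}


def down_weighting_factor(z: float, geography: str) -> float:
    """Equation (2): D_i = C / Z_i for outlier clusters (Z_i > C), else 1."""
    c = MAX_NET_ERROR[geography]
    return c / z if z > c else 1.0


def trim_cluster(p_weights, e_weights, z, geography):
    """Apply the same factor to all P- and E-sample weights of one cluster.

    Weights of non-outlier clusters come back unchanged.
    """
    d = down_weighting_factor(z, geography)
    return [w * d for w in p_weights], [w * d for w in e_weights], d


# Hypothetical reservation cluster with Z_i = 10,000 (> 6,250).
p_new, e_new, d = trim_cluster([100.0, 200.0], [150.0], 10_000,
                               "american_indian_reservation")
print(d, p_new, e_new)  # 0.625 [62.5, 125.0] [93.75]
print(down_weighting_factor(50_000, "balance_of_us"))  # 1.0 (not an outlier)
```

No weight is redistributed to other clusters, in line with the plan described earlier in this appendix. Only the outlier cluster's own weights shrink.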


A.C.E. WEIGHT TRIMMING RESULTS

Table C-2 shows the one cluster down-weighted by the weight trimming process in the balance of the United States. No clusters were trimmed on American Indian Reservations. Figures C-1 and C-2 show the distributions of net error before weight trimming for the balance of the United States and the American Indian Reservation areas.

Table C-2. A.C.E. Weighted Net Errors for Down-Weighted Cluster

                                 Before trimming                                   After trimming
Geographic area                  Estimated         Estimated       Estimated       Estimated
                                 erroneous         weighted        weighted        weighted
                                 enumerations      nonmatches      net error       net error
Balance of the United States     79,371            1,396           77,975          75,000
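The single trimmed cluster in Table C-2 can be checked with equations (1) and (2). Its net error was driven almost entirely by erroneous enumerations. The factor $C/Z_i$ brings the weighted net error down to exactly the 75,000 criterion. A short sketch of that arithmetic follows; the variable names are illustrative.

```python
# Table C-2, balance of the United States (before trimming).
erroneous_enumerations = 79_371
nonmatches = 1_396
criterion = 75_000  # Table C-1 maximum net error for the balance of the U.S.

z_before = abs(nonmatches - erroneous_enumerations)   # equation (1)
factor = criterion / z_before                         # equation (2)
# Scaling every weight in the cluster scales both weighted counts, so the
# net error scales by the same factor.
z_after = abs(nonmatches * factor - erroneous_enumerations * factor)

print(f"Z before trimming: {z_before:,}")
print(f"Down-weighting factor: {factor:.4f}")
print(f"Z after trimming: {z_after:,.0f}")
```

The factor is about 0.9618, so the cluster kept roughly 96 percent of its original weight.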

Figure C-1. Distribution of Net Error for the Balance of the United States (vertical axis: number of clusters; horizontal axis: net error before trimming)

Figure C-2. Distribution of Net Error for American Indian Reservations (vertical axis: number of clusters; horizontal axis: net error before trimming)

Appendix D.

Error Profile for A.C.E. Estimates

INTRODUCTION

The Accuracy and Coverage Evaluation (A.C.E.) survey provided estimates of census coverage error that have been considered for adjusting Census 2000. The estimation used the PES-C version of dual system estimation with the data collected by the A.C.E. The adjusted estimates are subject to nonsampling error as well as sampling error. This appendix discusses the types of errors found in the use of PES-C and the measurement of these errors.

OVERVIEW OF ADJUSTED ESTIMATES

Define the following notation for each post-stratum $h$:

$C_h$ = census count for post-stratum $h$
$II_h$ = number of persons imputed into the original enumeration for post-stratum $h$
$I_{E,h}$ = estimated number of enumerations in post-stratum $h$ with insufficient information for matching¹
$EE_h$ = estimated number of erroneous enumerations in post-stratum $h$
$N_{E,h}$ = estimated population size for post-stratum $h$ from the E sample
$CE_h$ = estimated population size for post-stratum $h$ who could possibly be matched, $CE_h = N_{E,h} - I_{E,h} - EE_h$
$N_{P,h}$ = estimated size of the P-sample population
$M_h$ = estimated number of the P-sample population enumerated in the census

The dual system estimator for the population size $N_h$ in post-stratum $h$ is defined by

$$N_h = (C_h - II_h)\,\frac{CE_h}{N_{E,h}}\,\frac{N_{P,h}}{M_h}$$

The 2000 A.C.E. used the PES-C formulation of the dual system estimator. It uses the number of inmovers to estimate the number of outmovers, but uses the match rate for the outmovers to estimate the number of outmovers who match the census. The post-stratum index $h$ is suppressed in the following formula:

$$\frac{N_P}{M} = \frac{N_n + N_i}{M_n + (M_o / N_o)\,N_i}$$

where

$N_n$ = estimated number of nonmovers
$N_o$ = estimated number of outmovers
$N_i$ = estimated number of inmovers
$M_n$ = estimated number of nonmovers enumerated in the census
$M_o$ = estimated number of outmovers enumerated in the census

When a post-stratum had fewer than 10 outmovers, the PES-A version of the dual system estimator, which does not use inmovers, was employed as follows:

$$\frac{N_P}{M} = \frac{N_n + N_o}{M_n + M_o}$$

The adjustment factor for post-stratum $h$ is $A_h$. For area $j$, the unadjusted estimate is $N_{unadj,j}$ and the adjusted estimate is $N_{adj,j}$:

$$A_h = \frac{N_h}{C_h}, \qquad N_{unadj,j} = \sum_h C_{h,j}, \qquad N_{adj,j} = \sum_h A_h\,C_{h,j}$$

The estimate of undercount in the population size of area $j$ is $N_{adj,j} - N_{unadj,j}$, and the estimate of the corresponding undercount rate is $(N_{adj,j} - N_{unadj,j}) / N_{adj,j}$.

SOURCES OF ERROR IN ADJUSTMENTS

The adjusted estimates are subject to a variety of possible sources of error: sampling error, data collection and survey operations error, missing data, error from exclusion of late census data and data with insufficient information for matching, contamination error, correlation bias, synthetic estimation bias, inconsistent post-stratification, and balancing error.

P-Sample Matching Error and E-Sample Processing Error

Source. The term P-sample matching has been used to describe the search of the census records for enumerations of P-sample respondents. The P-sample respondents are designated as matching an enumeration in the census or as not enumerated. The counterpart for the E sample is called E-sample processing, in which census enumerations are designated as correctly enumerated or erroneously enumerated. When the status of a P-sample or E-sample case cannot be determined, it is designated as unresolved.

P-sample matching error refers to the net effect of errors during processing that affect the determination of whether a P-sample person matches a census enumeration. Likewise, the net effect of errors in assigning enumeration status to E-sample enumerations during the office processing is called E-sample processing error.

¹ Late enumerations are included with imputations in the original enumeration.
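The estimators above can be collected into a short sketch. It follows the formulas for $N_h$, the PES-C and PES-A versions of $N_P/M$, $A_h$, and the area-level undercount rate. The `PostStratum` container, its field names, and all numbers in the example are hypothetical. The fewer-than-10-outmovers rule is applied here to whatever outmover figure is passed in. This appendix does not state whether the production rule used weighted or unweighted counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostStratum:
    """Inputs for one post-stratum h (hypothetical container)."""

    c: float    # C_h: census count
    ii: float   # II_h: persons imputed into the original enumeration
    n_e: float  # N_{E,h}: E-sample population estimate
    i_e: float  # I_{E,h}: enumerations with insufficient information for matching
    ee: float   # EE_h: erroneous enumerations
    n_n: float  # N_n: nonmovers
    n_o: float  # N_o: outmovers
    n_i: float  # N_i: inmovers
    m_n: float  # M_n: nonmovers enumerated in the census
    m_o: float  # M_o: outmovers enumerated in the census


def p_sample_ratio(s: PostStratum) -> float:
    """N_P / M: PES-C, or PES-A when there are fewer than 10 outmovers."""
    if s.n_o < 10:
        # PES-A: inmovers are not used.
        return (s.n_n + s.n_o) / (s.m_n + s.m_o)
    # PES-C: inmovers estimate the number of movers; the outmover match
    # rate estimates how many of them match the census.
    return (s.n_n + s.n_i) / (s.m_n + (s.m_o / s.n_o) * s.n_i)


def dual_system_estimate(s: PostStratum) -> float:
    """N_h = (C_h - II_h) * (CE_h / N_{E,h}) * (N_{P,h} / M_h)."""
    ce = s.n_e - s.i_e - s.ee
    return (s.c - s.ii) * (ce / s.n_e) * p_sample_ratio(s)


def adjustment_factor(s: PostStratum) -> float:
    """A_h = N_h / C_h."""
    return dual_system_estimate(s) / s.c


def area_estimates(pieces):
    """pieces: (PostStratum, C_{h,j}) pairs for area j.

    Returns (N_unadj_j, N_adj_j, undercount rate).
    """
    unadj = sum(c_hj for _, c_hj in pieces)
    adj = sum(adjustment_factor(s) * c_hj for s, c_hj in pieces)
    return unadj, adj, (adj - unadj) / adj


# Hypothetical post-stratum: CE/N_E = 0.95 and N_P/M = 880/800 = 1.1.
h = PostStratum(c=1_000, ii=20, n_e=1_000, i_e=10, ee=40,
                n_n=840, n_o=50, n_i=40, m_n=768, m_o=40)
print(f"DSE = {dual_system_estimate(h):.2f}, A_h = {adjustment_factor(h):.4f}")
print(area_estimates([(h, 500)]))
```

In this made-up post-stratum, the E-sample correction (0.95) partly offsets the P-sample ratio (1.1). The net result is an adjustment factor of about 1.024, that is, a net undercount of roughly 2.4 percent.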


Errors may occur in either direction. P-sample people may be designated as matching a census enumeration although they are not in the census, called a false match. Alternatively, people may be designated as not enumerated although they are, called a false nonmatch. E-sample enumerations may be falsely assigned a correct enumeration status, called a false correct enumeration. Alternatively, enumerations may be incorrectly designated as erroneous, called a false erroneous enumeration.

Matching error also encompasses errors in the size of the P-sample population that may happen during the processing of the P sample. These errors also may occur in either direction. A person included as a member of a household may really reside at another location or not be in the population of interest. For example, the census residency rules consider family members away at college to reside at their college address. A family member in a nursing center is considered to be in the group quarters population, which is not part of the population of interest. Conversely, a person with two homes may be designated as living at the other home but really live at the one in the sample.

In the application of PES-C, respondents can have many more statuses in the processing of the P sample than were possible in 1990. The reason is that a P-sample respondent may be a nonmover, an outmover, an inmover, or an out-of-scope person. Nonmovers and outmovers have a further characteristic: they are either residents or nonresidents. A person who is living at the sample address on Census Day is called a resident.

Errors in mover status may go in all directions. A person designated as a nonmover may be an inmover or an outmover. All combinations of errors may happen and affect the DSE in different ways.

Definition. P-sample matching error affects the estimates of both nonmovers and inmovers in the estimate of the size of the P-sample population. Matching error also affects several components in the estimate of the number of matches: the number of nonmover matches, the number of outmovers and outmover matches, and the number of inmovers. E-sample processing error affects the estimate of the number of erroneous enumerations. The post-stratum index $h$ is suppressed in the following definitions.

$mnms$ = net P-sample matching error in the nonmover component of $M$
$momims$ = net P-sample matching error in the outmover and inmover component of $M$
$nnms$ = net P-sample matching error in the nonmover component of $N_P$
$nims$ = net P-sample matching error in the inmover component of $N_P$
$ees$ = net E-sample office processing error in $CE$

Under the assumption that all other errors are zero, the bias in the adjustment factors caused by P-sample matching error and E-sample processing error is defined as

$$B_{process,h} = A_h - \frac{C_h - II_h}{C_h}\cdot\frac{CE_h - B_{CEprocess,h}}{N_{E,h}}\cdot\frac{N_{P,h} - B_{Pprocess,h}}{M_h - B_{Mprocess,h}}$$

The error component definitions include a ratio adjustment because they are estimated using the Evaluation Sample. The ratio adjustment for components from the P sample is the ratio of the P-sample population total from the A.C.E. to the P-sample population total based on the Evaluation Sample, $N_{PF}$. The ratio adjustment for the components from the E sample is the ratio of the two E-sample totals defined comparably.

$$B_{Mprocess} = [mnms + momims]\,[N_P / N_{PF}]$$
$$B_{Pprocess} = [nnms]\,[N_P / N_{PF}]$$
$$B_{CEprocess} = [ees]\,[N_E / N_{EF}]$$

P-Sample and E-Sample Data Collection Error

Source. Errors may occur during the data collection. While an interview is in progress, the respondent may make an error in answering a question, or the interviewer may make an error in asking a question or recording the answer. Errors also occur when an interviewer goes to the wrong address. Such errors may be caused by the respondent, the interviewer, or both. In any of these cases they may cause the matching operation to assign mover status, residency status, or match status incorrectly to a person on the household roster. The A.C.E. interviewer collects both a Census Day roster and an Interview Day roster. A person who resides at the household on both days is classified as a nonmover. A person who lived there only on Census Day is an outmover, while a person who lived there only on Interview Day is an inmover. Persons classified as outmovers and nonmovers may or may not have been residents at the address on Census Day. Errors in the mover status, residency status, or other errors may cause the matching operation to fail to determine that a person was enumerated and to classify the person incorrectly as a nonmatch.

Sometimes people listed on household rosters do not exist. A more likely scenario involves an interviewer who is having trouble contacting the residents of a housing unit. Such an interviewer may copy a name from a mailbox and fill in the characteristics. This type of error is called P-sample fabrication. Fabricated households usually cause an underestimate of the match rate, because they are smaller than the average household size and do not match.

A special type of E-sample data collection error is the failure to identify duplicate enumerations. The processing includes a search for duplicate enumerations within the block cluster and the surrounding blocks. Duplicate enumerations outside the block cluster and surrounding

blocks are more difficult to find. Identifying these duplications requires the respondent to provide information concerning another address where a household member may also be enumerated. Errors may occur when the respondent does not understand the residency rules or is unaware that a household member may be enumerated at another address. The situations most prone to causing duplicate enumerations are:

  • college students enumerated at their family home and their college address,
  • children in joint custody agreements enumerated at both parents' addresses, and
  • people with two residences.

Another type of field error occurs during the listing of the housing units for the census or for the P sample. The housing units listed as being in the sample block may be in another block, or vice versa. These types of errors are called geocoding error. To account for minor geocoding errors in 2000, the search for matches occurred within all block clusters. For a sample of the cases with geocoding errors recorded in the E sample, it also extended to surrounding blocks, a design called Targeted Extended Search (TES). The variance estimates for the A.C.E. account for the TES design. Flaws in the execution of the TES may result in biases.

Definition. P-sample fabrication and data collection error affect the estimates of both nonmovers and inmovers in the estimate of the size of the P-sample population. Fabrication and data collection error also affect several components in the estimate of the number of matches: the number of nonmover matches, the number of outmovers and outmover matches, and the number of inmovers. E-sample data collection error affects the estimate of the number of erroneous enumerations. Again, the post-stratum index $h$ is suppressed in the following definitions.

$mnmr$ = net P-sample data collection error in the nonmover component of $M$
$momimr$ = net P-sample data collection error in the outmover and inmover component of $M$
$nnmr$ = net P-sample data collection error in the nonmover component of $N_P$
$nimr$ = net P-sample data collection error in the inmover component of $N_P$
$eer$ = net E-sample data collection error in $CE$
$mnmfp$ = net P-sample fabrication error in the nonmover component of $M$
$momimfp$ = net P-sample fabrication error in the outmover and inmover component of $M$
$nnmfp$ = net P-sample fabrication error in the nonmover component of $N_P$
$nimfp$ = net P-sample fabrication error in the inmover component of $N_P$

Under the assumption that all other errors are zero, the bias in the adjustment factors caused by P-sample data collection error and E-sample data collection error is defined as

$$B_{collect,h} = A_h - \frac{C_h - II_h}{C_h}\cdot\frac{CE_h - B_{CEcollect,h}}{N_{E,h}}\cdot\frac{N_{P,h} - B_{Pcollect,h}}{M_h - B_{Mcollect,h}}$$

The error component definitions include a ratio adjustment because they are estimated using the Evaluation Sample. The ratio adjustment for components from the P sample is the ratio of the P-sample population total from the A.C.E. to the P-sample population total based on the Evaluation Sample. The ratio adjustment for the components from the E sample is the ratio of the two E-sample totals defined comparably.

$$B_{Mcollect} = [mnmr + mnmfp + momimr + momimfp]\,[N_P / N_{PF}]$$
$$B_{Pcollect} = [nnmr + nnmfp + nimr]\,[N_P / N_{PF}]$$
$$B_{CEcollect} = [eer]\,[N_E / N_{EF}]$$

Missing Data

Source. A.C.E. data may be missing for a variety of reasons. Some A.C.E. interviews fail to take place, some households provide incomplete data on questionnaire items, and in some cases the information for classification as a match or nonmatch is ambiguous. The methods used to compensate for missing data effectively assume that the match status of a case with missing data equals, on average, the status of similar cases that have complete data. Missing data on characteristics are imputed from otherwise similar cases with complete data. Nonresponse weighting adjustments are used to account for sampled but noninterviewed households.

The P-sample matching and E-sample processing operation assigns unresolved enumeration status to a case when the available data are inadequate to determine whether the person is enumerated in the census. A probability of being correctly enumerated is then imputed for such cases. Also, error in the resolved cases causes error in the imputations, because the resolved cases are used to form the imputations. Even if the imputation model were perfect, the imputations would have error if the data used to fit the model had error. This type of error is called imputation error due to data error.

One can gauge the range of effects on the DSE by considering extreme alternatives (e.g., all unresolved matches truly are matches, or all truly are nonmatches). However, the range is too wide to be informative about the likely bias. The bias from the method used to compensate for missing data can in principle be estimated from intensive follow-up of cases with missing data, but in practice the fraction completed by follow-up is too low. The Census Bureau analyzed the missing-data bias by looking at the changes in the DSE when alternative methods were used to compensate for missing data.

Results from the Analysis of Reasonable Alternative Imputation Models are used to estimate the variance component. See Keathley et al. (2001) for details. The results of

Reasonable Alternative Imputation Models provided the Sample. The ratio adjustment for components from the data for the calculation of the variance-covariance matrix P-sample is the ratio of the P-sample population total from for adjustment factors the missing data component. The the A.C.E. to the P-sample population total based on the missing data variance-covariance matrix was added to the Evaluation Sample. The ratio adjustment for the compo-sampling variance-covariance matrix to obtain a variance- nents from the E Sample is the ratio of the two E-sample covariance matrix for the adjustment factors that con- totals defined comparably.

tained the random error due to sampling and imputation B Mimpdata [mnmi momi] [N N P

]

PF for missing data. N ]

B Pimpdata [nnmi] [N P PF B [ee ] [N N ]

Definition. The Census Bureau models the error due to CEimpdata i E EF imputation as a random effect and estimates its variance-Sampling Error covariance matrix. Modeling imputation error as a random effect is motivated by practicalities. In principle, the bias Source. Sampling error gives rise to random error, quan-from the method used to compensate for missing data can tified by sampling variance, and to a systematic error be estimated from intensive follow-up of cases with miss- known as ratio-estimator bias. The sampling variance is ing data, but in practice the fraction completed by present in any estimate based on a sample instead of the follow-up is too low. The variance component due to whole population. Ratio-estimator bias arises because imputation for missing data has three components. even if X and Y are unbiased estimators, X/Y typically is biased.

VM variance due to imputation VRA VB VI Definition. The sampling variance and ratio-estimator where bias for the adjustment factors are VRA variance due to the imputation model selection S2 sampling variance-covariance matrix for the VB variance due to the model parameter estimation adjustment factors VI within-person imputation variance.

B ratio,h ratio-estimator bias in the adjustment factor A h The imputation variance components due to parameter for post-stratum h.

estimation and within person estimation are included in Random sampling error is reflected in the estimated the sampling error estimates, leaving the variance due to variance-covariance matrix of the Ahs. The covariance model selection to be estimated separately. The missing matrix is estimated by the Census Bureaus sampling-error data variance-covariance matrix is added to the sampling software applied to the A.C.E. data. The software also can variance-covariance matrix to obtain a variance-covariance be used to produce estimates of ratio-estimator bias.

matrix for the adjustment factors that contained the ran-dom error due to sampling and imputation for missing Correlation Bias data.

Source. If there is variability of the enumeration prob-The components of imputation error due to data error abilities for persons in the same post-stratum, or if there affect estimate of the number of nonmovers, the estimate is a dependence between enumeration in the census and of the number of nonmovers enumerated, the estimate of in the A.C.E. e.g., people less likely to be enumerated in the match rate for the outmovers, and the estimate of the the census may also be less likely to be found in the number of erroneous enumerations. The post-stratum A.C.E., then correlation bias may arise. Correlation bias is index h is suppressed in the following definitions. most likely a source of downward bias in the DSE. Evi-mnmi net imputation error due to data error in the dence of correlation bias in national estimates is provided nonmover component of M by sex ratios (males to females) for adjusted numbers that momi net imputation error due to data error in the are low relative to ratios derived from demographic analy-outmover match rate component of M sis of data on births, deaths, and migration.

nnmi net imputation error due to data error in the The information from demographic analysis is insufficient nonmover component of N P to estimate correlation bias at the post-stratum level, how-eei net imputation error due to data error in CE.

ever, and alternative parametric models have been used to Under the assumption that all other errors are zero, the allocate correlation bias estimates for national age-race-caused by sex groups down to post-strata. Estimates of correlation bias in the adjustment factor A h bias at the national level provided by demographic analy-imputation error due to data error is defined as sis information also account for possible error from B

Ch IIh CE groups whose probabilities of enumeration are so low that h CEimpdata,h [NP,h BPimpdata,h]

B impdata,h Ah the DSE will fail to account for them. The estimates of cor-Ch N B M

E,h h Mimpdata,h relation bias based on sex ratios are affected by error in The error component definitions include a ratio adjust- the demographic-analysis sex ratios and by possible other ment because they are estimated using the Evaluation biases in the sex ratios in the DSE.
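The Sampling Error paragraph states that X/Y is typically biased even when X and Y are unbiased. The short Python sketch below makes that concrete with an exact calculation. The two-point distributions are invented for illustration and have nothing to do with A.C.E. data. Exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction
from itertools import product

# Invented example: X and Y are independent, each equally likely to be 1 or 3.
X_VALUES = [Fraction(1), Fraction(3)]
Y_VALUES = [Fraction(1), Fraction(3)]


def mean(values):
    """Expectation of an equally weighted discrete distribution."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


e_x = mean(X_VALUES)      # X is unbiased for E[X] = 2
e_y = mean(Y_VALUES)      # Y is unbiased for E[Y] = 2
target_ratio = e_x / e_y  # the quantity X/Y is meant to estimate
expected_ratio = mean(x / y for x, y in product(X_VALUES, Y_VALUES))
ratio_bias = expected_ratio - target_ratio

print(target_ratio, expected_ratio, ratio_bias)  # prints: 1 4/3 1/3
```

Both X and Y are unbiased, yet E[X/Y] = 4/3 while E[X]/E[Y] = 1. For ratio estimators the bias is generally of order 1/n, so it tends to shrink faster than the standard error as the sample grows.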
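The definition of $B_{\mathrm{impdata},h}$, together with the Evaluation Sample ratio adjustment, can be sketched in Python. This is a minimal illustration, not the Census Bureau's software, and it rests on several assumptions:

  • The `PostStratum` container and all function names are hypothetical.
  • Each error component is the amount by which the A.C.E. figure overstates the true figure, so the corrected value is the estimate minus the component.
  • $\mathit{mnmi}$ and $\mathit{momi}$ combine additively.
  • All numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class PostStratum:
    """Hypothetical container for the post-stratum quantities entering A_h."""
    C: float    # census count, C_h
    II: float   # the II_h records that the formula subtracts from C_h
    CE: float   # weighted E-sample correct enumerations, CE_h
    N_E: float  # weighted E-sample total, N_E,h
    N_P: float  # weighted P-sample total, N_P,h
    M: float    # weighted P-sample matches, M_h


def adjustment_factor(s: PostStratum, b_ce: float = 0.0,
                      b_p: float = 0.0, b_m: float = 0.0) -> float:
    """A_h, optionally recomputed with error components removed from CE, N_P and M."""
    return ((s.C - s.II) / s.C) * ((s.CE - b_ce) / s.N_E) * ((s.N_P - b_p) / (s.M - b_m))


def ratio_adjust(eval_error: float, ace_total: float, eval_total: float) -> float:
    """Scale an error estimated in the Evaluation Sample up to the A.C.E. total."""
    return eval_error * ace_total / eval_total


def imputation_data_bias(s: PostStratum, mnmi: float, momi: float, nnmi: float,
                         eei: float, N_PF: float, N_EF: float) -> float:
    """B_impdata,h: A_h minus A_h recomputed without imputation error due to data error."""
    b_m = ratio_adjust(mnmi + momi, s.N_P, N_PF)
    b_p = ratio_adjust(nnmi, s.N_P, N_PF)
    b_ce = ratio_adjust(eei, s.N_E, N_EF)
    return adjustment_factor(s) - adjustment_factor(s, b_ce, b_p, b_m)


# Invented numbers for one post-stratum (not A.C.E. results).
ps = PostStratum(C=1000.0, II=20.0, CE=900.0, N_E=950.0, N_P=1000.0, M=880.0)
print(round(adjustment_factor(ps), 4))
print(round(imputation_data_bias(ps, mnmi=2.0, momi=0.5, nnmi=1.0,
                                 eei=3.0, N_PF=500.0, N_EF=475.0), 4))
```

Under this sign convention, a positive error in $CE$ makes the corrected factor smaller, so the bias comes out positive.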


The assumptions and model underlying the measurement of correlation bias are discussed in detail in Bell (2001, 2001b) and are described briefly here. Although there are several models for how correlation bias is distributed, the two-group model was selected. The two-group model relies on the basic assumptions listed below for the estimation of correlation bias. In addition, sensitivity analyses assess the impact of variations in these assumptions.

  • The ratio of males to females measured in demographic analysis is more reliable than the A.C.E. for the two racial groups, Black and non-Black.
  • There is no correlation bias present in the A.C.E. estimates for females.
  • The relative correlation bias is equal across all A.C.E. post-strata within an age-race category.
  • The relative impact of other nonsampling errors is equal for males and females at the national level.

The two-group model assumes that the relative correlation bias is equal across post-strata within an age-sex category. This assumption permits correlation bias to be estimated through a multiplicative factor applied to the corrected DSE. Even more important, an unbiased estimate of the factor is available without actually estimating the nonsampling errors, under the assumption that their relative impact is equal for males and females.

Definition. Correlation bias usually causes the DSE to underestimate the population size.

$B_{\mathrm{correl},h}$: correlation bias in the adjustment factor $A_h$ for post-stratum $h$.

Excluded-data Error from Reinstated Census Enumerations

Source. The DSE treats late census data as nonenumerations. Thus, duplicate enumerations among the late data do not contribute to census data, but valid enumerations among the late data are treated as census misses and are estimated by the DSE. If the late census data were excluded from the entire adjustment process and estimation, no new source of error would be present. The adjusted estimates do partially incorporate late census data by including them in $C_h$ but excluding them from the computation of $N_h$. This use of late data affects the estimates for areas with disproportionately many or few late adds, with an effect similar to synthetic estimation error. In addition, the exclusion of late census data from the E sample could bias the estimates at the post-stratum level.

Two conditions must be met for the exclusion of the late adds from A.C.E. processing not to bias the dual system estimates at the post-stratum level:

  • The P sample covers the correct enumerations among the late adds at the same rate as other correct enumerations.
  • The late adds occur in the E sample at the same rate as they occur in the census (excluding the imputations).

Definition. Error due to excluding the reinstated census enumerations in the calculation of the DSE affects the estimate of the DSE and therefore the adjustment factor directly.

$B_{\mathrm{reinstate},h}$: bias in the adjustment factor $A_h$ for post-stratum $h$ due to excluding reinstated census enumerations.

Contamination Error

Source. Contamination occurs when the A.C.E. selection of a given block cluster alters the implementation of the census there and affects enumeration results. Examples include increasing or decreasing erroneous enumerations, or increasing or decreasing coverage rates. Contact with residents of the sample blocks during the listing for the P-sample may cause them not to respond to the census, because they think that the listing contact is a response to the census.

Definition. The bias in the adjustment factor for post-stratum $h$ from contamination is defined as follows.

$B_{\mathrm{contam},h}$: bias in the adjustment factor $A_h$ for post-stratum $h$ due to contamination error.

Synthetic Estimation Bias

Source. The adjustment methodology relies on a method called synthetic estimation to provide the same adjustment factor $A_h$ for all enumerations in a given post-stratum, regardless of whether the enumerations are from the same geographic area. Synthetic estimation bias arises when census enumerations from different areas but in the same post-stratum should have different adjustment factors.

To assess synthetic estimation bias for a given area, one needs to develop an estimate based on data from the area alone, which is rarely possible. Attempts to estimate synthetic estimation bias in undercount estimates have used artificial populations or surrogate variables whose geographic distributions are known. These attempts are unconvincing. Therefore, sensitivity analyses have been conducted to assess the impact of synthetic estimation bias. These studies show that it is reasonable to assume synthetic estimation has only a minor effect on uses of the data.

Definition. The synthetic estimates may cause a bias in the adjusted estimates for area $j$. Error from synthetic estimation does not affect the dual system estimate for a post-stratum, only the estimates for areas within a post-stratum.

$B_{\mathrm{syn},j}$: bias in the adjusted estimate $N_{\mathrm{adj},j}$ for area $j$ due to synthetic estimation error.
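Synthetic estimation, as described under Synthetic Estimation Bias, applies one factor $A_h$ to every area within post-stratum $h$. A minimal Python sketch follows, with these assumptions:

  • The adjusted estimate for area $j$ is the sum over post-strata of $A_h$ times area $j$'s census count in post-stratum $h$, the usual form of a synthetic estimator.
  • The area-specific "own" factors used to illustrate $B_{\mathrm{syn},j}$ are hypothetical, since the text notes such area-level information is rarely available.
  • Function names and numbers are invented.

```python
def synthetic_estimate(factors: dict, area_counts: dict) -> float:
    """N_adj,j = sum over h of A_h * C_{h,j}; every area in h shares the factor A_h."""
    return sum(factors[h] * count for h, count in area_counts.items())


def synthetic_bias(factors: dict, area_factors: dict, area_counts: dict) -> float:
    """B_syn,j computed as if area j's own (normally unknown) factors were available."""
    truth = synthetic_estimate(area_factors, area_counts)
    return synthetic_estimate(factors, area_counts) - truth


A = {"h1": 1.02, "h2": 0.99}              # invented post-stratum factors A_h
counts_j = {"h1": 500.0, "h2": 300.0}     # invented census counts for area j
own_factors_j = {"h1": 1.05, "h2": 0.99}  # hypothetical area-specific factors

print(synthetic_estimate(A, counts_j))
print(synthetic_bias(A, own_factors_j, counts_j))
```

If every area in a post-stratum shared the same true factor, the bias would vanish. That is the synthetic assumption the sensitivity analyses probe.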
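The two-group correlation-bias factor can be illustrated numerically, including why it does not require estimating the other nonsampling errors. This is a sketch under stated assumptions, not the model as implemented in Bell (2001, 2001b):

  • The female DSE has no correlation bias.
  • The demographic-analysis sex ratio is taken as correct.
  • Other nonsampling errors shift male and female estimates by the same relative amount.
  • The parameters `c` and `e`, all totals, and the function names are hypothetical.

```python
import math


def correlation_bias_factor(sex_ratio_da: float, dse_female: float, dse_male: float) -> float:
    """Factor applied to every male DSE in one age-race group (two-group model sketch).

    Implied male total = DA sex ratio * female DSE; the factor is that
    total divided by the male DSE.
    """
    return sex_ratio_da * dse_female / dse_male


def apply_factor(male_dse_by_poststratum: dict, factor: float) -> dict:
    """Equal relative correlation bias: one factor for all post-strata in the group."""
    return {ps: dse * factor for ps, dse in male_dse_by_poststratum.items()}


# Invented truth for one age-race group.
true_male, true_female = 1_050_000.0, 1_000_000.0
c = 0.04  # hypothetical: the male DSE misses a fraction c (no correlation bias for females)
e = 0.01  # hypothetical relative impact of other nonsampling errors, equal for both sexes
dse_male = true_male * (1 - c) * (1 + e)
dse_female = true_female * (1 + e)

k = correlation_bias_factor(true_male / true_female, dse_female, dse_male)
print(math.isclose(k, 1 / (1 - c)))  # the common error e cancels out of the factor
print(apply_factor({"PS-A": 400_000.0, "PS-B": 600_000.0}, k))
```

Algebraically, the factor equals $1/(1-c)$ whatever the value of $e$. This mirrors the text's point that an unbiased factor is available without estimating the other nonsampling errors.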


Inconsistent Post-stratification

Source. The computation of $CE_h/N_{E,h}$ requires census enumerations to be assigned to post-strata, and the computation of $N_{P,h}/M_h$ requires P-sample enumerations to be assigned to post-strata. When the assignments are not made consistently for the two samples, error arises in the ratio $N_{P,h}/M_h$. Inconsistent assignments to post-strata may be caused by misreporting of characteristics used in post-stratification.

Cases that are most prone to inconsistent classification are those where there is a different respondent for the household in the census and the A.C.E. For example, a household member's age or race may be reported differently in a self-response than when another household member responds for the person. Such inconsistencies also may be due to computer processing errors, as well as inconsistent reporting.

The matches in the A.C.E. sample provide a source of data for estimating the error due to inconsistent post-stratification. An estimate of the error for a post-stratum may be formed by assuming that the inconsistency rate observed in the matches also holds for those not matched.

Definition. Error due to inconsistent post-stratification affects the estimate of the DSE and therefore the adjustment factor directly.

$B_{\mathrm{inconsist},h}$: bias in the adjustment factor $A_h$ for post-stratum $h$ due to inconsistent post-stratification.

Error from Estimating Outmovers with Inmovers

Source. This error is unique to the PES-C model used in the A.C.E. For the PES-C model, the members of the P-sample are the residents of the housing units on Census Day. Identifying all the residents of all the housing units on Census Day is difficult because some move before the A.C.E. interview. The A.C.E. interview relies on the respondents to identify those who have moved out, the outmovers. Since the outmovers are identified by proxies, many of the outmovers are not recorded. Therefore, the estimate of outmovers is too low. PES-C uses the number of inmovers to estimate the number of outmovers. The inmovers are those who did not live in the sample blocks on Census Day but moved in before the A.C.E. interview. Theoretically, the number of inmovers in the whole country should equal the number of outmovers. However, the two numbers may differ within a post-stratum. For example, economic conditions may cause more people to move out of an area than to move into it.

Definition. The error due to using the inmovers to estimate the outmovers affects the estimates of the size of the P-sample population and the number of matches.

$\mathit{mio}_h$: net P-sample data collection error in the mover component of $M_h$ in post-stratum $h$
$\mathit{nio}_h$: net P-sample data collection error in the mover component of $N_{P,h}$ in post-stratum $h$.

Under the assumption that all other errors are zero, the bias in the adjustment factor $A_h$ caused by estimating outmovers with inmovers is defined as

$$B_{\mathrm{inout},h} = A_h - \frac{(C_h - II_h)\,CE_h\,(N_{P,h} - B_{P\,\mathrm{inout},h})}{C_h\,N_{E,h}\,(M_h - B_{M\,\mathrm{inout},h})}$$

The error component definitions include a ratio adjustment because they are estimated using the Evaluation Sample. The ratio adjustment for components from the P-sample is the ratio of the P-sample population total from the A.C.E. to the P-sample population total based on the Evaluation Sample. The post-stratum index $h$ is suppressed in the following definitions.

$$B_{M\,\mathrm{inout}} = \mathit{mio}\,\frac{N_P}{N_{PF}}$$
$$B_{P\,\mathrm{inout}} = \mathit{nio}\,\frac{N_P}{N_{PF}}$$

Balancing Error

Source. Balancing error must be addressed in the design of the search areas used to search for E-sample correct enumerations and P-sample matches. Limiting the search for correct enumerations and matches is necessary because the matching operation cannot search the entire census. With a limited search area, a small percentage of correct enumerations and a small percentage of matches will not be found. This causes an underestimate of both the correct enumerations and the matches. However, the estimate of the net error is not biased as long as the percentage error in the correct enumerations equals the percentage error in the matches. The A.C.E. design avoids balancing error by choosing the same block clusters for the E-sample and the P-sample and by drawing the search areas consistently.

Definition. There is not a separate measurement of balancing error. Any balancing error that may arise during the implementation of the A.C.E. will be included in the measurement of data collection error.
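The balancing argument can be checked numerically. The check uses the DSE form $(C_h - II_h)\,(CE_h/N_{E,h})\,(N_{P,h}/M_h)$ that appears inside the adjustment factor. In this minimal Python sketch, the limited search area is modeled as missing a fraction of correct enumerations and a fraction of matches. Function names and numbers are invented. Equal fractions leave the DSE unchanged, and unequal ones do not.

```python
import math


def dse(C, II, CE, N_E, N_P, M):
    """Dual system estimate for one post-stratum."""
    return (C - II) * (CE / N_E) * (N_P / M)


def dse_with_search_loss(C, II, CE, N_E, N_P, M, miss_ce, miss_m):
    """DSE when the search misses fractions miss_ce of CE and miss_m of M."""
    return dse(C, II, CE * (1 - miss_ce), N_E, N_P, M * (1 - miss_m))


args = (1000.0, 20.0, 900.0, 950.0, 1000.0, 880.0)  # invented post-stratum totals
base = dse(*args)
balanced = dse_with_search_loss(*args, 0.02, 0.02)    # same percentage missed in both
unbalanced = dse_with_search_loss(*args, 0.00, 0.02)  # only matches are missed

print(math.isclose(base, balanced), unbalanced > base)  # prints: True True
```

Missing only matches shrinks $M$ and inflates the DSE. This is the kind of imbalance that consistent search areas are meant to prevent.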
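The PES-C treatment of movers described under Error from Estimating Outmovers with Inmovers can be sketched in Python. The sketch assumes the usual PES-C form:

  • The inmover count stands in for the number of outmovers in both $N_P$ and $M$.
  • The outmover match rate is taken from the outmovers that were recorded.

The function name and all counts are hypothetical.

```python
def pes_c_totals(n_nonmovers: float, m_nonmovers: float, n_inmovers: float,
                 n_outmovers_recorded: float, m_outmovers: float) -> tuple:
    """Return (N_P, M) for one post-stratum under a PES-C-style estimator.

    The number of outmovers is estimated by the number of inmovers, and the
    outmover match rate comes from the outmovers that were recorded.
    """
    outmover_match_rate = m_outmovers / n_outmovers_recorded
    n_p = n_nonmovers + n_inmovers
    m = m_nonmovers + outmover_match_rate * n_inmovers
    return n_p, m


# Invented counts: only 40 outmovers were recorded by proxy, but 60 inmovers were found.
n_p, m = pes_c_totals(900.0, 810.0, 60.0, 40.0, 30.0)
print(n_p, m)  # prints: 960.0 855.0
```

Suppose the post-stratum actually lost a different number of people than the 60 who moved in. Then the mover terms in both totals would be off, which is the kind of error $\mathit{mio}_h$ and $\mathit{nio}_h$ are defined to capture.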


Section I.

References

Adams, T. and Liu, X. (2001). ESCAP II: Evaluation of Lack of Balance and Geographic Errors Affecting Person Estimates, Executive Steering Committee for A.C.E. Policy II, Report 2.

Baker v. Carr, 369 U.S. 186 (1962).

Belin, T., Diffendal, G., Mack, S., Rubin, D., Schafer, J., and Zaslavsky, A. (1993). Hierarchical Logistic Regression Models for Imputation of Unresolved Enumeration Status in Undercount Estimation, Journal of the American Statistical Association, 88, 1149-1166.

Bell, W. (2001). Accuracy and Coverage Evaluation: Correlation Bias, DSSD Census 2000 Procedures and Operations Memorandum Series #B-12*.

Bell, W. (2001b). ESCAP II: Estimation of Correlation Bias in 2000 A.C.E. Estimates Using Revised Demographic Analysis Results, Executive Steering Committee for A.C.E. Policy II, Report 10.

Byrne, R., Imel, L., Ramos, M., and Stallone, P. (2001). Accuracy and Coverage Evaluation: Person Interviewing Results, Census 2000 Procedures and Operations Memorandum Series #B-5*.

Cantwell, P. (1999). Accuracy and Coverage Evaluation Survey: Overview of Missing Data for P & E Samples, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-3.

Cantwell, P. (2000). Accuracy and Coverage Evaluation Survey: Specifications for the Missing Data Procedures, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-25.

Cantwell, P., McGrath, D., Nguyen, N., and Zelenak, M. (2001). Accuracy and Coverage Evaluation: Missing Data Results, DSSD Census 2000 Procedures and Operations Memorandum Series #B-7*.

Childers, D. (2001). Accuracy and Coverage Evaluation: The Design Document, Census 2000 Procedures and Operations Memorandum Series, Chapter S-DT-1, Revised.

Childers, D., Byrne, R., Adams, T., and Feldpausch, R. (2001). Accuracy and Coverage Evaluation: Person Matching and Followup Results, Census 2000 Procedures and Operations Memorandum Series #B-6*.

Coale, A. (1955). The Population of the United States in 1950 Classified by Age, Sex, and Color - A Revision of Census Figures, Journal of the American Statistical Association, 50, 16-54.

Coale, A. and Rives, N. (1973). A Statistical Reconstruction of the Black Population of the United States, 1880-1970: Estimates of True Numbers by Age and Sex, Birth Rates, and Total Fertility, Population Index, 39(1), 3-36.

Coale, A. and Zelnik, M. (1963). New Estimates of Fertility and Population in the United States, Princeton University Press.

Cox, L. and Ernst, L. (1982). Controlled Rounding, INFOR, Vol. 20, No. 4.

Davis, M. (1991). Preliminary Final Report for PES Evaluation Project P7: Estimates of P-sample Clerical Matching Error from a Rematching Evaluation, 1990 Coverage Studies and Evaluation Memorandum Series #H-2.

Davis, M. (1991b). Preliminary Final Report for PES Evaluation Project P10: Measurement of the Census Erroneous Enumerations - Clerical Error Made in the Assignment of Enumeration Status, 1990 Coverage Studies and Evaluation Memorandum Series #L-2.

Davis, P. (2001). Accuracy and Coverage Evaluation: Dual System Estimation Results, DSSD Census 2000 Procedures and Operations Memorandum Series #B-9*.

Fay, R., Passel, J., Robinson, J. G., and Cowan, C. (1988). The Coverage of Population in the 1980 Census, 1980 Census of Population and Housing: Evaluation and Research Reports, PHC80-E4, U.S. Bureau of the Census, Washington, D.C.

Ghosh, M. and Rao, J. N. K. (1994). Small Area Estimation: An Appraisal, Statistical Science, Vol. 9, No. 1, 55-93.

Gonzalez, M. (1973). Use and Evaluation of Synthetic Estimators, Proceedings of the Social Statistics Section, American Statistical Association.

Gonzalez, M. and Waksberg, J. (1973). Estimation of the Error of Synthetic Estimates, paper presented at the first meeting of the International Association of Survey Statisticians, Vienna, Austria, August 18-25, 1973.

Griffin, R. (1999). Accuracy and Coverage Evaluation Survey: Post-stratification Research Methodology, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-5.


Griffin, R. and Haines, D. (2000). Accuracy and Coverage Evaluation Survey: Post stratification for Dual System Estimation, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-21.

Griffin, R. and Haines, D. (2000b). Accuracy and Coverage Evaluation Survey: Final Post stratification Plan for Dual System Estimation, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-24.

Griffin, R. and Malec, D. (2001). Accuracy and Coverage Evaluation: Assessment of Synthetic Assumption, DSSD Census 2000 Procedures and Operations Memorandum Series #B-14*.

Haines, D. (2001). Accuracy and Coverage Evaluation Survey: Synthetic Estimation, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-46.

Haines, D. (2001b). Accuracy and Coverage Evaluation Survey: Computer Specifications for Person Dual System Estimation (U.S.) - Re-issue of Q-37, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-48.

Himes, C. and Clogg, C. (1992). An Overview of Demographic Analysis as a Method for Evaluating Census Coverage in the United States, Population Index, 58(4), 587-607.

Hogan, H. (1992). The 1990 Post-Enumeration Survey: An Overview, The American Statistician, Vol. 46(4), 261-269.

Hogan, H. (1993). The 1990 Post-Enumeration Survey: Operations and Results, Journal of the American Statistical Association, 88, 1047-1060.

Hogan, H. (2000). The Accuracy and Coverage Evaluation: Theory and Application, Proceedings of the Survey Research Methods Section, American Statistical Association.

Hogan, H. (2001). Accuracy and Coverage Evaluation Survey: Effect of Excluding Late Census Adds, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-43.

Ikeda, M. (1997). Effect of Using the 1996 ICM Characteristic Imputation and Probability Modeling Methodology on the 1995 ICM P and E-Sample Data, DSSD Census 2000 Dress Rehearsal Memorandum Series #A-20.

Ikeda, M. (1998). Effect of Different Methods for Calculating Match and Residence Probabilities for the 1995 P-Sample Data, DSSD Census 2000 Dress Rehearsal Memorandum Series #A-23.

Ikeda, M. (1998b). Effect of Different Methods for Calculating Correct Enumeration Probabilities for the 1995 E-Sample Data, DSSD Census 2000 Dress Rehearsal Memorandum Series #A-28.

Ikeda, M. (1998c). Effect of Using Simple Ratio Methods to Calculate P-Sample Residence Probabilities and E-Sample Correct Enumeration Probabilities for the 1995 Data, DSSD Census 2000 Dress Rehearsal Memorandum Series #A-30.

Ikeda, M., Kearney, A., and Petroni, R. (1998). Missing Data Procedures in the Census 2000 Dress Rehearsal Integrated Coverage Measurement Sample, Proceedings of the Survey Research Methods Section, American Statistical Association.

Ikeda, M. and McGrath, D. (2001). Accuracy and Coverage Evaluation Survey: Specifications for the Missing Data Procedures; Revision of Q-25, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-62.

Kearney, A. and Ikeda, M. (1999). Handling of Missing Data in the Census 2000 Dress Rehearsal Integrated Coverage Measurement Sample, Proceedings of the Survey Research Methods Section, American Statistical Association.

Keathley, D., Kearney, A., and Bell, W. (2001). ESCAP II: Analysis of Missing Data Alternatives for the Accuracy and Coverage Evaluation, Executive Steering Committee for A.C.E. Policy II, Report 12.

Keeley, C. (2000). Census 2000 Accuracy and Coverage Evaluation Computer Assisted Interview, DSSD Census 2000 Procedures and Operations Memorandum Series #S-QD-02.

Killion, R.A. (1998). Estimation Decisions for the Integrated Coverage Measurement Survey for Census 2000, Census 2000 Decision Memorandum No. 42.

Kostanich, D., Griffin, R., and Fenstermaker, D. (1999). Census 2000 Accuracy and Coverage Evaluation Survey: Sample Allocation and Post-stratification Plans, DSSD Census 2000 Procedures and Operations Memorandum Series #R-2.

Marks, E. (1979). The Role of Dual System Estimation in Census Evaluation, in K. Krotki (Ed.), Developments in Dual System Estimation of Population Size and Growth, Edmonton: University of Alberta Press, 156-188.

Nash, F. (2000). Overview of the Duplicate Housing Unit Operations, Census 2000 Information Memorandum Number 78.

National Center for Health Statistics (1968). Synthetic State Estimates of Disability, P.H.S. Publication 1759, U.S. Government Printing Office, Washington, D.C.

Raglin, D. and Bean, S. (1999). Outmover Tracing and Interviewing, Census 2000 Dress Rehearsal Evaluation Results Memorandum Series #C-3.

Rao, J.N.K. and Shao, J. (1992). Jackknife Variance Estimation with Survey Data Under Hot Deck Imputation, Biometrika, 79, 811-822.

Reynolds v. Sims, 377 U.S. 533 (1964).


Robinson, J. G. (2001). ESCAP II: Demographic Analysis Results, Executive Steering Committee for A.C.E. Policy II, Report 1.

Robinson, J. G., Ahmed, B., Gupta, P., and Woodrow, K. (1993). Estimation of Population Coverage in the 1990 United States Census Based on Demographic Analysis, Journal of the American Statistical Association, 88, 1061-1071.

Schindler, E. (1998). Allocation of the ICM Sample to the States for Census 2000, Proceedings of the Survey Research Methods Section, American Statistical Association.

Schindler, E. (1999). Comparison of DSE C and DSE A, Census 2000 Dress Rehearsal Evaluation Memorandum #C-8a.

Schindler, E. (2000). Accuracy and Coverage Evaluation Survey: Post-stratification Preliminary Research Results, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-23.

Sekar, C. C. and Deming, W. E. (1949). On a Method of Estimating Birth and Death Rates and the Extent of Registration, Journal of the American Statistical Association, 44, 101-115.

Siegel, J. (1974). Estimates of Coverage of Population by Sex, Race, and Age: Demographic Analysis, 1970 Census of Population and Housing: Evaluation and Research Program, PHC(E)-4, U.S. Bureau of the Census, Washington, D.C.

Siegel, J. and Zelnik, M. (1966). An Evaluation of Coverage in the 1960 Census of Population by Techniques of Demographic Analysis and by Composite Methods, Proceedings of the Social Statistics Section, American Statistical Association.

Starsinic, M. (2001). Accuracy and Coverage Evaluation Survey: Specifications for Covariance Matrix Output Files from Variance Estimation for Census 2000, DSSD Census 2000 Procedures and Operations Memorandum Series #V-4.

Starsinic, M. and Kim, J. (2001). Accuracy and Coverage Evaluation: Computer Specifications for Variance Estimation for Census 2000 - Revision, DSSD Census 2000 Procedures and Operations Memorandum Series #V-5.

U.S. Bureau of the Census (1985). Evaluating Censuses of Population and Housing, Statistical Training Document, ISP-TR-5, Washington, D.C.

West, K. (1991). Final Report for PES Evaluation Project P4: Quality of Reported Census Day Address - Evaluation Follow-up, 1990 Coverage Studies and Evaluation Memorandum Series #D-2.

Winkler, W. (1994). Advanced Methods for Record Linkage, Proceedings of the Survey Research Methods Section, American Statistical Association.

Wolfgang, G. (1999). Request for Dress Rehearsal Surrounding Block Files for A.C.E. Research, unpublished Census Bureau Memorandum.

Wolter, K. (1986). Some Coverage Error Models for Census Data, Journal of the American Statistical Association, 81, 338-346.

Woltman, H., Alberti, N., and Moriarity, C. (1988). Sample Design for the 1990 Census Post Enumeration Survey, Proceedings of the Survey Research Methods Section, American Statistical Association.

ZuWallack, R. (2000). Sample Design for the Census 2000 Accuracy and Coverage Evaluation, Proceedings of the Survey Research Methods Section, American Statistical Association.


Accuracy and Coverage Evaluation of Census 2000: Design and Methodology

Section II
A.C.E. Revision II

March 2003

U.S. Census Bureau, Census 2000

Chapter 1.

Introduction to A.C.E. Revision II

INTRODUCTION

The Accuracy and Coverage Evaluation (A.C.E.) survey was designed to measure and possibly correct net coverage error in Census 2000. However, because A.C.E. failed to measure a significant number of erroneous enumerations, A.C.E. did not meet these objectives. The Census Bureau's Executive Steering Committee for A.C.E. Policy (ESCAP) twice recommended NOT correcting the census counts.¹ There are, however, concerns about differential coverage error in Census 2000 data. Although the Census 2000 data products will not be corrected, it may still be possible to improve the post-censal population estimates used for survey controls. This is the Census Bureau's motivation for correcting errors in the A.C.E. data and developing improved estimates of the net undercount. The improved estimates are called A.C.E. Revision II estimates. These revised estimates provide a better picture of Census 2000 coverage and will help us design a better coverage measurement program for 2010. This part of the document describes the methodology used to produce the A.C.E. Revision II estimates. The first half of this publication gives a comprehensive technical description of the methodology used to produce the original estimates of net undercount released in March 2001.

This chapter summarizes the history of the two adjustment decisions and discusses key findings and limitations. It also introduces the key components of the revision and describes the major errors being corrected. The next chapter provides an overview of the revision process, and subsequent chapters provide detailed methodology as follows:

Chapter 2: Summary of A.C.E. Revision II Methodology
Chapter 3: Correcting Data for Measurement Error
Chapter 4: A.C.E. Revision II Missing Data Methods
Chapter 5: Further Study of Person Duplication in Census 2000
Chapter 6: A.C.E. Revision II Estimation
Chapter 7: Assessing the Estimates

¹ The ESCAP recommendations, supporting analyses, technical assessments, and limitations can be found on the Census Bureau's Web site at www.census.gov/dmd/www/EscapRep.html.

BACKGROUND

The original A.C.E. estimates were available in February 2001, in time to allow for the possibility of correcting the Census 2000 redistricting files. In March 2001, the Census Bureau's ESCAP recommended not correcting the Census 2000 counts for purposes of redistricting (ESCAP I, 2001). The Secretary of Commerce concurred. Given the information available at that time, this decision was not based on any clear evidence that the census counts were more accurate. It rested instead on concern that the A.C.E. contained some yet undiscovered error. The A.C.E. estimate of a 3.3 million net undercount was much larger than the Demographic Analysis (DA) estimate of only 340,000.

Further evaluations were conducted over the next 6 months to examine the reasons for the discrepancy and to determine whether other Census 2000 data products should be corrected. The Census 2000 redistricting files were the first of many Census 2000 data products scheduled for public release. See the Census Bureau's Web site, www.census.gov, for released data products. The question remained as to whether these other Census 2000 data releases should be corrected.

In October 2001, the ESCAP again decided not to correct the census counts for other Census 2000 data products. Analysis of A.C.E. evaluation data and a study of duplicates in the census revealed that the A.C.E. failed to measure large numbers of erroneous census enumerations (ESCAP II, 2001). This error called into question the quality of the A.C.E. survey results. Some of the key findings from the analyses are:

  • An evaluation by Krejsa and Raglin (2001) gave the first indication that A.C.E. seriously underestimated erroneous enumerations. It found an additional net 1.9 million erroneously enumerated persons among the cases that could be resolved. These results are based on an independent reinterview and matching of about 70,000 E-sample persons. Because of the serious implications of this finding, a further Review Study was conducted.
  • The Review Study by Adams and Krejsa (2001) found that A.C.E. underestimated erroneous enumerations by a net of 1.45 million persons. This is smaller than the evaluation figure but still substantial. This figure does not include unresolved cases, so the true amount is probably somewhat higher. This study was based on a sample of about 17,000 persons selected from the 70,000-person evaluation follow-up E sample. The most experienced analysts reviewed these cases to determine their enumeration status. They used both the original A.C.E. person follow-up interviews and the reinterview results.
  • Mule (2001) showed that Census 2000 suffers from a large number of duplicate enumerations, i.e., persons who were double counted. Mule computer-matched census enumerations in A.C.E. block clusters to those across the entire country. His matching was conservative in picking up census duplicates because it required exact matching at the first stage. Within the A.C.E. block clusters, Mule found only 38 percent of the in-scope duplicates that A.C.E. found. This led us to believe that his matching algorithm underestimated duplicates in the census. Note that A.C.E. was not designed to estimate duplicates outside the search area, and this in itself was not a design flaw. A.C.E. was, however, expected to determine which census enumerations were erroneous because they were reported at the wrong residence. The design of A.C.E. Revision II accounts for duplicates outside the search area. Mule's study did not distinguish which member of a duplicate pair was correct and which was erroneous. However, one could easily speculate that half of these should be correct and half should be erroneous.

Thompson et al. (2001) used Fay's lower bound on the level of unmeasured erroneous enumerations to produce a Revised Early Approximation of undercount for three race/Hispanic origin groups. These estimates were intended to be illustrative of net undercount and possible coverage differences. The same methodology and data were later used to extend the calculations to seven race/Hispanic origin groups. See Fay (2002) and Mule (2002) for details. These preliminary estimates show a very small net undercount. The data also indicate that the differential undercount has not been eliminated. These results are limited in that they provide information only at the national level for broad population groups. Furthermore, these preliminary approximations were based on a small subset of A.C.E. data and only partially correct for errors in measuring erroneous enumerations. Potential errors in measuring omissions were not accounted for.

In summary, the A.C.E. results were not acceptable because A.C.E. failed to measure large numbers of erroneous census enumerations. This was the reason for not using the A.C.E., but this does not mean that there were no other errors in the A.C.E. In particular, there was con-

cern about P-sample cases that matched to enumerations

  • Feldpausch (2001) examined the A.C.E. enumeration suspected of being duplicates. If the E-sample case was status for E-sample cases identified by Mule (2001) as erroneous, then that match cannot be valid. The extent of duplicates outside the search area. Only 14 percent of this problem was not quantified at the time of the ESCAP II the E-sample persons that were duplicates of a person decision. The level of other errors was small by compari-in a housing unit were coded as erroneous by A.C.E. son, and therefore, was not a major factor in this decision.

This was much lower than the expected 50 percent, See Hogan et al. (2002) and Mulry and Petroni (2002) for indicating that A.C.E. underestimated erroneous enu-further information.

merations due to not perceiving that these E-sample persons should have been counted at other residences.

Plans for Revising the 2000 A.C.E. Estimates Note that these results suggest measurement error in the original A.C.E. figures released. Even though the ESCAP recommended twice not to correct the census counts, they had concerns about differential

  • Fay (2001, 2002) then compared the enumeration status for the E-sample Review Study cases to the duplicates coverage error in Census 2000 data. They thought it pos-identified by Mule (2001) outside the search area. Only sible that further research resulting in revised estimates of 19 percent of the review cases that were duplicates of a coverage could potentially be used to improve the post-person in a housing unit were coded as erroneous by censal estimates. In addition, revised estimates would the Review Study. Again, this was much less than the provide a better understanding of Census 2000 coverage expected 50 percent, indicating that the evaluation data error that could be used to improve census operations for and the special review did not identify all the erroneous 2010 and would help in developing better methodologies enumerations. Using these data, Fay then produced a for the 2010 coverage measurement program.

lower bound on the level of unmeasured erroneous enumerations of 2.9 million. The major objective was to produce improved estimates of

  • There was also evidence that similar problems may the household population that could be used to measure have affected the population sample (P sample) which is net coverage error in Census 2000. This meant obtaining used to measure the omission rate. A.C.E. evaluation better estimates of erroneous census enumeration from data from Raglin and Krejsa (2001) show that there are the E sample and obtaining better estimates of census measurement errors in determining residency and omissions from the P sample. Furthermore, since the mover status. national net undercount, as indicated by both DA and the 1-2 Section IIChapter 1 Introduction to A.C.E. Revision II U.S. Census Bureau, Census 2000

Revised Early Approximations, was very close to zero and the census included large numbers of erroneous enumerations in the form of duplicates, it was imperative that the revised methodology carefully account for both overcounts and undercounts. Hogan (2002) summarized the major revision issues in the form of the following five challenges:

1. Improve estimates of erroneous census enumerations
2. Improve estimates of census omissions
3. Develop new models for missing data
4. Enhance the estimation post-stratification
5. Consider adjustment for correlation bias

There were no field operations associated with the A.C.E. Revision II process. Because of the late date, it was not feasible (or practical) to revisit households for additional data collection. Consequently, the revisions were based on data that had already been collected. One aspect of the strategy for revising the coverage estimates involves correcting measurement error using information from the A.C.E. evaluation data. This is referred to as the recoding operation. Another aspect involves conducting a more extensive person duplicate study to correct for measurement error that was not detected by the A.C.E. evaluations. This is referred to as the Further Study of Person Duplication (FSPD). The estimation method, discussed briefly in Chapter 2 and more fully in Chapter 6, is designed to handle overlap between the errors detected by these two studies so as to avoid overcorrecting for measurement error.

The recoding operation was designed to improve estimates of erroneous census enumerations and census omissions. It uses the original A.C.E. person interview (PI) and person follow-up (PFU), the evaluation follow-up interview (EFU), the matching error study (MES), and the PFU/EFU Review Study² to correct for measurement error in enumeration status, residence status, mover status, and matching status. This effort involved extensive recoding of about 60,000 P-sample cases and more than 70,000 E-sample cases.³ An automated computer algorithm was used to recode most of the cases, but many required a clerical review by experienced analysts at the National Processing Center. The analysts had access to the questionnaire responses, as well as interviewer notes that put them in a better position to resolve apparent discrepancies. It was not possible to completely code all cases because of missing or conflicting information; however, the number of conflicting cases was relatively small.

² The PFU/EFU Review Study was not a planned evaluation. It was a special study conducted in a subsample of the evaluation data to resolve discrepancies between the enumeration statuses assigned by the PFU and the EFU.
³ These are probability subsamples of the original A.C.E. P and E samples and, in the context of A.C.E. Revision II, are called revision samples, but they are in fact equivalent to the evaluation follow-up samples.

The duplicate study was designed to further improve estimates of erroneous census enumerations and census omissions. This study used computer matching and modeling techniques to identify E- and P-sample cases that link to census enumerations across the entire country, including group quarters, reinstated, and deleted census cases. For the E-sample links, this study does not identify which enumeration is correct and which is the duplicate. For P-sample links, this study does not identify whether the correct Census Day residence is at the P-sample location or the census location. Instead, the link information is used to model the probability that an E-sample linked case is a correct enumeration or that a P-sample case is a resident on Census Day.

New missing data models were developed to reflect the different types of missing data now possible as a result of the recoding operation. There were three new types of missing data to deal with:

1. P-sample households that were originally considered interviews, but for which the recoding determined that there were no valid Census Day residents,
2. cases with unresolved match, enumeration, or residency status because of incomplete or ambiguous interview data, and
3. cases with conflicting enumeration or residency status, because contradictory information was collected in the A.C.E. PFU and EFU interviews. It was impossible to determine which data were valid for these cases.

A household noninterview weighting adjustment using new cell definitions was used for type 1 above. Imputation cells and donor pools were developed for the second type of missing data based on detailed responses to the questionnaire. For the conflicting cases in type 3 above, there were no applicable donor pools, and probabilities of 0.5 were imputed for correct enumeration status and Census Day residency status. Fortunately, the recoding operation resulted in a relatively small number of these cases.

The revision effort incorporates separate post-strata for estimating census omissions and erroneous census enumerations because the factors related to each of these are likely to be different. The research effort focused on determining variables related to erroneous enumerations. This was because much of the previous work on developing the post-strata focused only on census omissions, and by default, the same post-strata were applied to the erroneous inclusions. For the E sample, some of the original post-stratification variables have been eliminated and additional variables have been included. Variables such as
region, Metropolitan Statistical Area/type of census enumeration area, and tract-level return rate were replaced by proxy status, type and date of census return, and household relationship and size. For the P sample, only the age variable was modified, to define separate post-strata for children aged 0 to 9 and those aged 10 to 17. This was done because the DA estimates suggested different coverage for these groups. The estimated correct enumeration rates and estimated match rates are used to calculate Dual System Estimates (DSEs) for the cross-classification of the E and P post-strata.

The A.C.E. Revision II DSEs include an adjustment for correlation bias. Correlation bias exists whenever the probability that an individual is included in the census is dependent on the probability that the individual is included in the A.C.E. This form of bias generally has a downward effect on estimates, because people missed in the census may be more likely to also be missed in the A.C.E. Since the intent of the A.C.E. Revision II is to estimate the net coverage error, it is important to carefully account for errors of omission and errors of erroneous inclusion. In previous coverage measurement surveys, the erroneous inclusions were assumed to be much smaller than omissions. Consequently, not adjusting for correlation bias had the effect of understating the net undercount, and the resulting correction to the census was in the right direction, just not big enough. In the presence of large numbers of overcounts, this assumption is no longer valid, and a correction might not even be in the right direction when the estimate is close to zero. For example, if there is a small true net undercount, it is possible to estimate an overcount, because the DSE would underestimate the population in the presence of correlation bias.

Estimates of correlation bias were calculated using the two-group model and sex ratios from DA. The sex ratio is defined as the number of males divided by the number of females. This model assumes no correlation bias for females or for males under 18 years of age, and that Black males have a relative correlation bias that is different from the relative correlation bias for non-Black males. The correlation bias adjustment is also done by three age categories, 18-29, 30-49, and 50 and over, with the exception of non-Black males 18 to 29 years of age. This exception was made because the A.C.E. Revision II sex ratios for non-Blacks 18-29 exceed the corresponding modified DA sex ratio, which is likely a result of a data problem. The model further assumes that relative correlation bias is constant over male post-strata within age groups.

The DSEs, adjusted for correlation bias, are used to produce coverage correction factors for each of the cross-classified post-strata. These factors are applied, or carried down, within the post-strata to produce estimates for geographic areas such as counties or places. This process is referred to as synthetic estimation. The key assumption underlying this methodology is that the net census coverage, estimated by the coverage correction factor, is relatively uniform within the post-strata. Failure of this assumption leads to synthetic error.
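As an illustration only, the sketch below computes a dual system estimate for one cross-classified cell (E post-stratum i, P post-stratum j) using the procedure C treatment of movers that Chapter 2 writes out. The function name and all input figures are hypothetical, and the plain numbers stand in for weighted sample tallies; the production estimates also involve imputation and the Revision II adjustments.

```python
def dse_procedure_c(cen, ii, ce, e, p_nm, p_im, m_nm, m_om, p_om):
    """Procedure C dual system estimate for one (i, j) cell.

    cen, ii:     census count and the II term for the cell
    ce, e:       weighted correct enumerations and E-sample total (E post-stratum i)
    p_nm, p_im:  weighted nonmover and inmover P-sample totals (P post-stratum j)
    m_nm, m_om:  weighted nonmover and outmover matches
    p_om:        weighted outmover P-sample total
    """
    correct_enumeration_rate = ce / e
    # Procedure C: inmovers estimate how many movers there are,
    # while outmovers supply the mover match rate.
    estimated_mover_matches = (m_om / p_om) * p_im
    match_rate = (m_nm + estimated_mover_matches) / (p_nm + p_im)
    return (cen - ii) * correct_enumeration_rate / match_rate


# Hypothetical tallies for one cell.
estimate = dse_procedure_c(cen=1000, ii=20, ce=950, e=1000,
                           p_nm=900, p_im=100, m_nm=810, m_om=40, p_om=50)
print(round(estimate, 2))  # 1046.07
```

Dividing by the overall match rate is algebraically the same as multiplying by (P_nm + P_im) / (M_nm + (M_om / P_om) P_im), which is how the formula is usually written.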

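To make the "wrong direction" point above concrete, the following sketch uses made-up numbers and the simplest two-list (Lincoln-Petersen) estimate, N = C × A / M, with no movers, weighting, or post-strata. Capture is positively correlated: the cell of people missed by both lists holds 50 people, whereas independence would imply only 100 × 50 / 800 = 6.25, so the DSE falls short of the truth by 43.75 and a small true undercount shows up as an estimated overcount.

```python
def simple_dse(census_correct, ace_total, matches):
    """Two-list (Lincoln-Petersen) estimate: N_hat = C * A / M."""
    return census_correct * ace_total / matches


# Hypothetical true population, cross-classified by inclusion in each list.
in_both, census_only, ace_only, in_neither = 800, 100, 50, 50
true_population = in_both + census_only + ace_only + in_neither  # 1000

census_correct = in_both + census_only      # 900 correct enumerations
ace_total = in_both + ace_only              # 850
erroneous = 60                              # hypothetical erroneous enumerations
census_count = census_correct + erroneous   # 960

dse = simple_dse(census_correct, ace_total, in_both)
true_net_undercount = true_population - census_count   # 40: a real undercount
estimated_net_undercount = dse - census_count           # -3.75: looks like an overcount
print(dse, true_net_undercount, estimated_net_undercount)  # 956.25 40 -3.75
```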
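The two-group model can be sketched as follows, under simplifying assumptions that are ours rather than the production specification: within one race group (Black or non-Black) and one adult age category, female DSEs are taken as free of correlation bias, the male total is raised until the male/female ratio matches the DA sex ratio, and that single factor is applied to every male post-stratum in the group (constant relative correlation bias). When the DSE sex ratio already meets or exceeds the DA ratio, as reported above for non-Black males 18-29, the sketch leaves the male DSEs unadjusted. The function name and numbers are hypothetical.

```python
def two_group_adjustment(male_dses, female_total, da_sex_ratio):
    """Scale male post-stratum DSEs so that males/females matches the DA sex ratio.

    male_dses:     DSEs for the male post-strata in one race group and age category
    female_total:  summed female DSEs for the same race group and age category
    da_sex_ratio:  Demographic Analysis males per female for that group
    """
    male_total = sum(male_dses)
    # One multiplicative factor for all male post-strata: constant relative
    # correlation bias within the race group and age category.
    factor = da_sex_ratio * female_total / male_total
    if factor <= 1.0:
        # DSE sex ratio already at or above DA: no upward adjustment in this sketch.
        return list(male_dses)
    return [dse * factor for dse in male_dses]


adjusted = two_group_adjustment([400.0, 500.0], female_total=1000.0, da_sex_ratio=0.95)
print([round(x, 2) for x in adjusted])  # [422.22, 527.78]
```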

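A minimal sketch of synthetic estimation, assuming the coverage correction factor for a post-stratum is its (correlation-bias-adjusted) DSE divided by its census count, and that each area's estimate is the sum over post-strata of the area's census count times the factor. Post-stratum labels, county names, and counts are invented for illustration.

```python
def coverage_correction_factors(dse_by_stratum, census_by_stratum):
    """Factor per post-stratum: estimated population / census count."""
    return {s: dse_by_stratum[s] / census_by_stratum[s] for s in dse_by_stratum}


def synthetic_estimates(area_counts, factors):
    """Carry factors down to areas: sum of census count x factor over post-strata.

    area_counts: {area: {post_stratum: census count in that area}}
    """
    return {
        area: sum(count * factors[s] for s, count in counts.items())
        for area, counts in area_counts.items()
    }


factors = coverage_correction_factors({"A": 1050.0, "B": 1980.0},
                                      {"A": 1000.0, "B": 2000.0})
counties = {"county_1": {"A": 600, "B": 500}, "county_2": {"A": 400, "B": 1500}}
estimates = synthetic_estimates(counties, factors)
print({k: round(v, 2) for k, v in estimates.items()})
# {'county_1': 1125.0, 'county_2': 1905.0}
```

Because the two counties together contain every census person in both post-strata, their estimates sum to the post-stratum DSE total (3030). Any county whose true coverage differs from its post-strata averages would carry synthetic error.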
Chapter 2.
Summary of A.C.E. Revision II Methodology

INTRODUCTION

The original A.C.E. estimates were found to be unacceptable because they failed to detect significant numbers of erroneous census enumerations. There were also suspicions that the A.C.E. may have counted as residents in its P sample some persons who were actually nonresidents. Thus, the major goal in revising the A.C.E. estimates was to correct these measurement errors. One aspect of these corrections involved correcting a subsample of the A.C.E. data. Another aspect involved correcting measurement errors that could not be detected with the information available in the subsample. These additional errors were identified via a duplicate study. The purpose of this chapter is to present a high-level overview of the process used to produce A.C.E. Revision II estimates of the population coverage of Census 2000. Further details concerning the methodology and procedures are included in subsequent chapters.

Background

The chronology of events leading to the corrected A.C.E. Revision II results was as follows:

1. The A.C.E. estimates produced in March 2001 were based on the Full E and P samples, which were probability samples of over 700,000 persons in 11,303 block clusters.

2. The Matching Error Study (MES) and the Evaluation Follow-up (EFU) were two programs that evaluated the March 2001 A.C.E. estimates. The MES measured errors introduced when the census and A.C.E. interviews were matched. The EFU, which was designed to study unusual living situations, entailed another interview. It evaluated the Census Day residency, enumeration status, and mover status assigned during the A.C.E. interview and the A.C.E. Person Follow-up (PFU) interview. The MES and EFU were conducted in a subsample of 2,259 block clusters selected from the original 11,303 block clusters. A further subsample of persons within these block clusters was selected for the EFU evaluation.

3. The PFU/EFU Review occurred next; it was not part of the planned evaluations. It was done in order to resolve major discrepancies in enumeration status between the EFU and PFU results. Thus, the Review E sample was a subsample of the EFU E sample.

4. At this point the A.C.E. Revision II program commenced. The Revision E and P samples were developed for purposes of producing A.C.E. Revision II estimates. Each consists of about 70,000 sample persons. These samples were essentially the same as the evaluation E and P samples for EFU, but the data underwent a major recoding to correct for measurement error. These data, along with other measurement error corrections identified by the duplicate study, were used to adjust the Full E and P samples to produce A.C.E. Revision II estimates.

The A.C.E. Revision II process is presented below. First, the corrections for measurement error (undetected erroneous enumerations and P-sample nonresidents) in the Revision Samples are explained. Then, a discussion is given of the missing data methods applied to cases whose match, residency, or enumeration status had changed in the Revision Samples. Next, the process for identifying census duplicates across the entire nation is discussed. An applicable dual system estimation formula that incorporates these changes and accounts for correlation bias is then presented. Finally, synthetic estimation was employed to produce A.C.E. Revision II results. See Kostanich (2003) for a summary of the methodology.

CORRECTING MEASUREMENT ERROR IN THE REVISION SAMPLES

As previously stated, the original A.C.E. process (step 1 above) failed to detect significant numbers of erroneous census enumerations (EEs). These undetected EEs (one part of measurement error in the A.C.E.) were uncovered during the evaluations of the A.C.E. (step 2 above).

In general, the original A.C.E. Person Interview (PI) and PFU, the EFU interview, the MES, and the PFU/EFU Review results were used to correct for measurement error in the enumeration, residency, mover, and match statuses for subsamples of the Full A.C.E., called the Revision E and P samples. No additional data were collected in this measurement error correction process.

The Revision Samples underwent extensive recoding using all the available data indicated above. This recoding included the original interview and matching results, the evaluation interview and matching results, as well as the recoding done for the PFU/EFU Review.

The A.C.E. Revision II recoding operation was an extension of the PFU/EFU Review clerical recoding, which was used to examine discrepancies between enumeration status in
the original A.C.E. and the Evaluation Follow-up (EFU). Given the information available, the recoding that was done on the 17,500 cases in the Review E sample was considered to have negligible error, since these data were reviewed and recoded by expert matchers using rules consistent with census residence rules.

An automated coding algorithm based on specific responses to the PFU and the EFU questionnaires was used to determine an appropriate code for each case. This was done for both the PFU interview and the EFU interview. The automated coding also assigned a "Why" code that described the reason the particular code was assigned.

A three-step process was followed to assign final codes to each case:

1. Validation. Determine, for categories of Why codes, whether the automated coding was of high quality, based on its level of agreement with the Review data.
2. Targeting. Target only those Why code categories whose automated codes had low levels of agreement with the Review data.
3. Clerical coding. Clerically recode only cases in the targeted Why code categories. The clerical recoding took advantage of handwritten interviewer comments.

In general, cases did not go to clerical review if both the PFU and EFU automated codes agreed, the mover statuses also agreed, and the Why code category was deemed to be of high enough quality.

After the A.C.E. Revision II recoding operation corrected for enumeration, residency, and mover status, the results of the MES were used to correct for false matches and false nonmatches. Some matching errors were a result of incorrect residency status coding and had already been corrected as part of the recoding operation discussed above. To determine the correct match status, each of the possible combinations of match status was reviewed to determine the appropriate match status for each type of case. In general, the MES match status was assigned when there were changes from a match to a nonmatch or from a nonmatch to a match. For other situations, the match status from the EFU coding was assigned. See Krejsa and Adams (2002) for further details.

ADJUSTMENT FOR MISSING DATA

As with all survey data, it is not possible to obtain interviews for all sample cases, nor is it possible to obtain answers to all interview questions. For the Full A.C.E. E and P samples, household noninterview adjustments were used to adjust for noninterviewed households. In addition, imputation methods were used to adjust for missing characteristics such as age or tenure, as well as enumeration, residency, and match status. For the A.C.E. Revision II work, these missing data adjustments for the Full A.C.E. E and P samples were essentially unchanged from those used to produce the March 2001 A.C.E. estimates.

For the Revision E and P samples, however, there were three new types of missing data to deal with:

1. Noninterviewed households: Revision P-sample households that were considered interviews in the A.C.E. P sample, but were identified as noninterviews in the Revision coding because it was determined that there were no valid Census Day residents;
2. Revision E- or P-sample cases with unresolved match, enumeration, or residency status because of incomplete or ambiguous interview data;
3. Revision E- or P-sample cases with conflicting enumeration or residency status. This occurred when contradictory information was collected in the A.C.E. PFU and the EFU interviews and it could not be determined which was valid.

Household Noninterview Adjustment for the Revision P Sample

For the original March 2001 A.C.E. estimates, the household noninterview adjustment generally spread the weights of the Full P-sample noninterviewed housing units over interviewed housing units in the same block cluster with the same housing unit structure type. The methodology for the Revision P-sample household noninterview adjustment for Interview Day was essentially unchanged from that used for the Full P sample. There was, however, an important change in the noninterview adjustment for Census Day residency. A separate cell was defined for new noninterviews due to whole households of persons determined to be inmovers or nonresident outmovers, based on the recoding that was done to correct for measurement error.

Imputation for Revision E- or P-Sample Unresolved Cases

In the Full A.C.E. P sample, persons with unresolved Census Day residency or match status came about in two ways. First, the person interview (PI) may not have provided sufficient information for matching and follow-up. Second, the Person Follow-up (PFU) may not have collected adequate information to determine a person's Census Day residency status or match status. The imputation method differed by how the case came to be unresolved.

Revision P-sample persons with insufficient information for matching and follow-up tended also to have had insufficient information in the original coding of the Full P sample, except for some rare coding changes. These persons with insufficient information were not sent out for an Evaluation Follow-up interview.
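The validation, targeting, and clerical routing steps described above could be sketched as follows. Everything here is an illustrative assumption: the record layout, the code values, and in particular the 0.9 agreement threshold, which the text does not specify. The routing rule follows the general statement that a case skipped clerical review only when the PFU and EFU automated codes agreed, the mover statuses agreed, and its Why code category was not targeted.

```python
from dataclasses import dataclass


@dataclass
class Case:
    why_category: str
    pfu_code: str     # automated code from the PFU interview
    efu_code: str     # automated code from the EFU interview
    pfu_mover: str    # mover status from the PFU
    efu_mover: str    # mover status from the EFU


def agreement_by_category(validation_pairs):
    """Step 1 (validation): share of automated codes matching the Review codes.

    validation_pairs: iterable of (why_category, automated_code, review_code).
    """
    totals, agreements = {}, {}
    for category, automated, review in validation_pairs:
        totals[category] = totals.get(category, 0) + 1
        agreements[category] = agreements.get(category, 0) + (automated == review)
    return {c: agreements[c] / totals[c] for c in totals}


def targeted_categories(agreement, threshold=0.9):
    """Step 2 (targeting): categories whose agreement is below a hypothetical threshold."""
    return {c for c, rate in agreement.items() if rate < threshold}


def needs_clerical_review(case, targeted):
    """Step 3 (clerical coding): route unless codes and movers agree and the category is trusted."""
    trusted = (case.pfu_code == case.efu_code
               and case.pfu_mover == case.efu_mover
               and case.why_category not in targeted)
    return not trusted


pairs = [("W1", "CE", "CE"), ("W1", "CE", "CE"), ("W2", "CE", "EE"), ("W2", "EE", "EE")]
targeted = targeted_categories(agreement_by_category(pairs))
print(targeted)  # {'W2'}
print(needs_clerical_review(Case("W1", "CE", "CE", "nonmover", "nonmover"), targeted))  # False
print(needs_clerical_review(Case("W2", "CE", "CE", "nonmover", "nonmover"), targeted))  # True
```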

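One simplified reading of the general MES rule, assuming the comparison is between the match status from the recoded EFU data and the MES match status: take the MES status whenever the two differ by a flip between match and nonmatch, and otherwise keep the EFU status. The status labels are hypothetical, and the actual procedure reviewed every combination of statuses individually (see Krejsa and Adams, 2002).

```python
MATCH, NONMATCH = "match", "nonmatch"


def final_match_status(efu_status, mes_status):
    """Assign the MES status for match <-> nonmatch changes, else keep the EFU status."""
    flipped = {efu_status, mes_status} == {MATCH, NONMATCH}
    return mes_status if flipped else efu_status


print(final_match_status(MATCH, NONMATCH))       # nonmatch
print(final_match_status("unresolved", MATCH))   # unresolved
```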
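A minimal sketch of the weighting-class idea behind the household noninterview adjustment: within each cell (here block cluster by housing unit structure type, as described for the March 2001 estimates), interviewed units absorb the weight of noninterviewed units, so the cell total is preserved. The field names are hypothetical; the separate Census Day cell for new noninterviews would simply be another cell key, and cells with no interviewed units would need to be collapsed, which the sketch does not handle.

```python
from collections import defaultdict


def noninterview_adjust(units):
    """Return adjusted weights, one per unit, in input order.

    units: list of dicts with hypothetical keys 'cluster', 'structure',
    'weight', and 'interviewed'. Within each (cluster, structure) cell,
    interviewed units absorb the weight of noninterviewed units.
    """
    cell_total = defaultdict(float)
    cell_interviewed = defaultdict(float)
    for u in units:
        cell = (u["cluster"], u["structure"])
        cell_total[cell] += u["weight"]
        if u["interviewed"]:
            cell_interviewed[cell] += u["weight"]

    adjusted = []
    for u in units:
        cell = (u["cluster"], u["structure"])
        if u["interviewed"]:
            # Cells without any interviewed unit never reach this division;
            # their weight would have to be moved by collapsing cells.
            adjusted.append(u["weight"] * cell_total[cell] / cell_interviewed[cell])
        else:
            adjusted.append(0.0)
    return adjusted


units = [
    {"cluster": 1, "structure": "single", "weight": 10.0, "interviewed": True},
    {"cluster": 1, "structure": "single", "weight": 10.0, "interviewed": True},
    {"cluster": 1, "structure": "single", "weight": 10.0, "interviewed": False},
    {"cluster": 1, "structure": "multi", "weight": 5.0, "interviewed": True},
]
print(noninterview_adjust(units))  # [15.0, 15.0, 0.0, 5.0]
```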

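How imputed probabilities enter estimation can be shown with a weighted tally of correct enumerations, in which each Revision E-sample case contributes its weight times its probability of being a correct enumeration: 1 or 0 for resolved cases, 0.5 for conflicting cases (as stated for type 3 above), and an imputed probability for unresolved cases. The single `unresolved_rate` below is a stand-in assumption, since in practice that value came from imputation cells and donor pools; status labels and weights are hypothetical.

```python
def weighted_correct_enumerations(cases, unresolved_rate):
    """Sum of weight x P(correct enumeration) over Revision E-sample cases.

    cases: list of (weight, status) with status in
           {"CE", "EE", "conflicting", "unresolved"} (hypothetical labels).
    unresolved_rate: stand-in for the probability imputed from imputation cells.
    """
    probability = {"CE": 1.0, "EE": 0.0, "conflicting": 0.5}
    total = 0.0
    for weight, status in cases:
        p = unresolved_rate if status == "unresolved" else probability[status]
        total += weight * p
    return total


cases = [(100.0, "CE"), (100.0, "EE"), (100.0, "conflicting"), (100.0, "unresolved")]
print(weighted_correct_enumerations(cases, unresolved_rate=0.9))
```

The same pattern carries over to Census Day residency in the P sample, where conflicting cases likewise receive a probability of 0.5.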
For the Revision P sample, the imputation of Census Day residency was improved by defining finer imputation cells that included whether the housing unit was matched, not matched, or had a conflicting household. The probability of a match was imputed based on the overall match rate for five groups defined by mover status and housing unit match status, as in the original A.C.E., and also by conflicting household status.

For Revision P- and E-sample persons who were unresolved because of ambiguous or incomplete follow-up information, the situation was more complicated because there were two follow-up interviews to consider, the PFU and the EFU. For the Full E and P samples, imputation cells were based mostly on information obtained before any follow-up was conducted. For the Revision E and P samples, imputation cells relied on the information obtained after follow-up. This change was the single most important improvement in the missing data methodology.

Imputation for Revision E- or P-Sample Conflicting Cases

When the A.C.E. PFU and EFU interviews had contradictory information, the case was assigned a code of conflicting. All cases determined to be conflicting based on the automated recoding were sent to analysts for further clerical review. By examining the handwritten notes of interviewers, the analysts could often determine which of the interviews was better and assign an appropriate code. In some cases the interviews appeared to be of equal quality, for example when both respondents were household members or both were proxies of equal caliber. For these conflicting cases, the interviews seemed equally valid based on the expertise of the analysts. Therefore, probabilities of 0.5 were imputed for correct enumeration for Revision E-sample conflicting cases and for Census Day residency for Revision P-sample conflicting cases.

FURTHER STUDY OF PERSON DUPLICATION

Earlier work showed that correcting measurement error by recoding was not going to correct all the missed erroneous enumerations. Evaluations of the March 2001 A.C.E. coverage estimates indicated that the A.C.E. failed to detect a large number of erroneous census enumerations. One type of erroneous census enumeration was the duplicate census enumeration; that is, a person included in the census two or more times. The A.C.E. was not specifically designed to detect duplicate census enumerations beyond the A.C.E. search area (the area where census and A.C.E. person matching was conducted). However, there was an expectation that the A.C.E. would detect that these E-sample enumerations had another residence and that roughly half the time this other residence was the usual residence. Feldpausch (2001) showed this expectation was not met.

For purposes of constructing A.C.E. Revision II estimates, the study of person duplication used matching and modeling techniques to identify duplicate links between the Full E and P samples and census enumerations. Links to group quarters, reinstated, deleted, and E-sample eligible records throughout the entire nation were allowed. The matching algorithm used statistical matching to identify linked records. Statistical matching allowed the matching variables to differ between the two records being compared. Because linked records may not refer to the same individual even when the characteristics used to match the records were identical, modeling techniques were used to assign a measure of confidence, the duplicate probability, that the two records refer to the same individual.

Matching Algorithm

The matching algorithm consisted of two stages. The first stage was a national match of persons using statistical matching. Statistical matching links records based on similar characteristics or close agreement of characteristics, which allowed two records to link in the presence of missing data and typographical or scanning errors. The first stage thereby established a link between two housing units.

The second stage of matching was limited to matching persons within households that contained a link from the first stage. It was a statistical match of all household members in the sample housing unit to all household members in the census housing unit.

Modeling Techniques

The set of linked records consists of both duplicated enumerations and person records that merely share common characteristics. Using two modeling approaches, the probability that the linked records were the same person was estimated. One approach used the results of the statistical matching and relied on the strength of multiple links within the household to indicate person duplication. The second relied on an exact match of the census to itself and on the distribution of births, names, and population size to indicate whether the individual link was a duplicate. These two approaches were combined to yield an estimated duplicate probability for the linked records from the statistical matching of the Full E and P samples to the census. See Chapter 5 for a full discussion of the person duplication study.

THE A.C.E. REVISION II DSE FORMULA

With the correction of measurement error in the Revision E and P samples, the adjustment for missing data in the Revision E and P samples, and the determination of census
duplicate links between the Full E and P samples and census enumerations, the dual system estimation formula can be applied. The following sections explain the formula and its adjustment for the A.C.E. Revision II work.

Using procedure C for movers and different post-strata for the E and P samples, the DSE formula can be written as:

$$DSE^{C}_{ij} = \left[ Cen'_{ij} - II'_{ij} \right] \left[ \frac{CE_i}{E_i} \right] \left[ \frac{P_{nm,j} + P_{im,j}}{M_{nm,j} + \dfrac{M_{om,j}}{P_{om,j}}\, P_{im,j}} \right]$$

The A.C.E. Revision II DSE formula using procedure C for movers, separate E and P post-strata, measurement error corrections from the E and P Revision Samples, and duplicate study results is:

$$ReDSE^{C}_{ij} = \left[ Cen'_{ij} - II'_{ij} \right] \left[ \frac{CE^{ND}_i f_{1,i'} + CE^{D}_i}{E_i} \right] \left[ \frac{P^{ND}_{nm,j} f_{6,j'} + \tilde{P}^{D}_{nm,j} + P_{im,j} f_{5,j'} + g\left(P^{D}_{nm,j} - \tilde{P}^{D}_{nm,j}\right)}{M^{ND}_{nm,j} f_{2,j'} + \tilde{M}^{D}_{nm,j} + \dfrac{M_{om,j} f_{3,j'}}{P_{om,j} f_{4,j'}} \left( P_{im,j} f_{5,j'} + g\left(P^{D}_{nm,j} - \tilde{P}^{D}_{nm,j}\right) \right)} \right]$$

Recall that the II' term excludes the late census adds.

Notation

Terms
CE  Correct enumerations
E  E-sample total
M  Matches
P  P-sample total
f  Adjusts for measurement error
g  Adjusts nonmovers to movers due to duplication

Subscripts
i, j  Full E and P post-strata
i', j'  Revision E and P measurement error correction post-strata
nm, om, im  Nonmover, outmover, inmover

Superscripts
C  DSE procedure C for movers
ND  Not a duplicate to a census enumeration outside the search area
D  Duplicate to a census enumeration outside the search area
~ (tilde over a term)  Includes probability adjustment for residency given duplication

Adjustment for Duplicates Using the Duplicate Study

The first task was to adjust the usual dual system estimate formula for those cases that have a link to a census enumeration outside the A.C.E. search area. P- and E-sample cases with links to census enumerations were assigned a nonzero probability of being a duplicate. P- and E-sample cases without duplicate links were assigned a probability of zero.

When estimating terms in the A.C.E. Revision II DSE involving nonduplicates, those indicated by a superscript ND, it was necessary to include the probability of not being a duplicate in the tallies. This probability of not being a duplicate was included in all of the terms involving the ND superscript.

Although the duplicate study identified E- and P-sample cases linking to census enumerations outside the A.C.E. search area, this study could not determine which component of the link was the correct one, since no additional data were collected to determine this. On the E-sample side, this study does not identify whether the linked E-sample case is the correct enumeration. On the P-sample side, this study does not identify whether the linked P-sample case is a resident on Census Day. Thus, it was necessary to estimate two conditional probabilities, which are reflected for the E sample in $CE^{D}_i$. In the P sample, these probabilities are reflected in the nonmover terms $\tilde{P}^{D}_{nm,j}$ and $\tilde{M}^{D}_{nm,j}$.

Adjustment for Measurement Error Using the Revision E and P Samples

Next, an adjustment is made for other measurement errors not accounted for by the duplicate study. This adjustment was applied only to nonduplicate terms, to avoid overcorrection due to any overlap between the duplicate study and the correction of measurement error.

In support of the A.C.E. Revision II program, the Revision Samples have undergone extensive recoding using all available interview data and matching results. Missing data adjustments have also been applied to the Revision Samples. These recoded data from the Revision Samples were used to correct for measurement error in the original Full E and P samples.

The ratio adjustments that correct for measurement error were based on the E or P Revision Sample; each is the ratio of an estimate using the Revision coding to the corresponding estimate using the original coding. These adjustments were done by measurement error correction post-strata i' or j' and are denoted by the f terms in the A.C.E. Revision II DSE formula.

The term g adjusts the number of inmovers for those Full P-sample nonmovers who are determined to be nonresidents because of duplicate links. Some of these persons are nonresidents because they are actually inmovers, and they should be added to the count of inmovers. The term $P^{D}_{nm,j} - \tilde{P}^{D}_{nm,j}$ is an estimate of nonresidents among nonmovers with duplicate links.

Adjustment for Correlation Bias Using Demographic Analysis

Next, the A.C.E. Revision II DSE estimates are adjusted to correct for correlation bias. Correlation bias exists whenever the probability that an individual is included in the
census is not independent of the probability that the individual is included in the A.C.E. This form of bias generally has a downward effect on estimates, because people missed in the census may be more likely to also be missed in the A.C.E. Estimates of correlation bias are calculated using the two-group model and sex ratios from Demographic Analysis (DA). The sex ratio is defined as the number of males divided by the number of females. This model assumes no correlation bias for females or for males under 18 years of age, and that Black males have a relative correlation bias different from that of non-Black males. The correlation bias adjustment is also done by three age categories: 18-29, 30-49, and 50 and over. This model further assumes that relative correlation bias is constant over male post-strata within age groups. The Race/Hispanic Origin Domain variable is used to categorize Black and non-Black.

The DA totals are adjusted to make them comparable with A.C.E. Race/Hispanic Origin Domains. First, Black Hispanics are subtracted from the DA total for Blacks and added to the DA total for non-Blacks. This is done because the A.C.E. assigns Black Hispanics to the Hispanic domain, not the Black domain. The second adjustment deletes the group quarters (GQ) people from the DA totals using Census 2000 data. The reason for making this adjustment is that the GQ population is not part of the A.C.E. universe. A final adjustment that could have been made would have been to remove the remote Alaska population from the DA totals, since it too is not part of the A.C.E. universe. Because this population is small, the DA sex ratios would not be affected in any meaningful way. See U.S. Census Bureau (2003) for technical details.

SYNTHETIC ESTIMATION
The coverage correction factors for detailed post-strata ij were calculated as:

$$CCF_{ij} = \frac{ReDSE^{C}_{ij}}{Cen_{ij}}$$

where:
$ReDSE^{C}_{ij}$ are the correlation bias adjusted DSEs for post-strata ij.
$Cen_{ij}$ are the census counts for post-strata ij, including late census adds.

A coverage correction factor was assigned to each post-stratum. The post-strata excluded persons in group quarters or in remote Alaska; effectively, these persons have a coverage correction factor of 1.0. In dealing with duplicate links to group quarters persons, the person in the group quarters was treated as if (s)he was a correct enumeration or as if this was their correct residence on Census Day. A synthetic estimate for any area or population subgroup b is given by:

$$N_{b} = \sum_{ij} Cen_{b,ij} \times CCF_{ij}$$

where $Cen_{b,ij}$ is the census count for the part of post-stratum ij that falls in b.
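The estimation chain above (the procedure C DSE, the two-group correlation bias factor, the coverage correction factor, and the synthetic estimate) can be traced numerically with a short sketch. It is a minimal Python illustration, not Census Bureau production code: the `DseInputs` field names and every function name are hypothetical, the inputs are assumed to be weighted totals already computed for one post-stratum pair, and the correlation bias helper is a simplified reading of the two-group model that ignores the DA adjustments for Black Hispanics and group quarters.

```python
from dataclasses import dataclass


@dataclass
class DseInputs:
    """Weighted totals for one (E post-stratum i, P post-stratum j) pair.

    Hypothetical field names; each mirrors a symbol in the DSE formula.
    """
    cen_prime: float   # Cen'_ij
    ii_prime: float    # II'_ij (excludes late census adds)
    ce_nd: float       # CE^ND_i
    ce_dt: float       # CE^~D_i (probability-adjusted, duplicate-linked)
    e_total: float     # E_i
    p_nm_nd: float     # P^ND_nm,j
    p_nm_dt: float     # P^~D_nm,j (residency-adjusted given duplication)
    p_nm_d: float      # P^D_nm,j (before the residency adjustment)
    p_im: float        # P_im,j, inmovers
    p_om: float        # P_om,j, outmovers
    m_nm_nd: float     # M^ND_nm,j
    m_nm_dt: float     # M^~D_nm,j
    m_om: float        # M_om,j, matched outmovers
    f: tuple           # (f1, f2, f3, f4, f5, f6) measurement error ratios
    g: float           # adjusts nonmovers to movers due to duplication


def dse_procedure_c(x: DseInputs) -> float:
    f1, f2, f3, f4, f5, f6 = x.f
    # Nonresidents among duplicate-linked nonmovers, scaled by g.
    g_term = x.g * (x.p_nm_d - x.p_nm_dt)
    inmovers = x.p_im * f5 + g_term
    ce_ratio = (x.ce_nd * f1 + x.ce_dt) / x.e_total
    p_total = x.p_nm_nd * f6 + x.p_nm_dt + inmovers
    # Procedure C: mover matches = inmovers times the outmover match rate.
    outmover_match_rate = (x.m_om * f3) / (x.p_om * f4)
    m_total = x.m_nm_nd * f2 + x.m_nm_dt + outmover_match_rate * inmovers
    return (x.cen_prime - x.ii_prime) * ce_ratio * p_total / m_total


def male_correlation_bias_factor(female_dse: float, male_dse: float,
                                 da_sex_ratio: float) -> float:
    """Simplified two-group sketch: with no correlation bias for females,
    the implied male total is the female DSE times the DA sex ratio; the
    ratio to the male DSE is applied to every male post-stratum in the
    same age and Black/non-Black group."""
    return (da_sex_ratio * female_dse) / male_dse


def coverage_correction_factor(adjusted_dse: float, cen: float) -> float:
    # CCF_ij = ReDSE^C_ij / Cen_ij
    return adjusted_dse / cen


def synthetic_estimate(cen_b: dict, ccf: dict) -> float:
    # N_b = sum over post-strata ij of Cen_{b,ij} * CCF_ij
    return sum(count * ccf[ps] for ps, count in cen_b.items())
```

With all f terms equal to 1, g = 0, and no duplicate links, `dse_procedure_c` reduces to $(Cen' - II') \times (CE/E) \times (P/M)$, with mover matches estimated by applying the outmover match rate to the inmovers.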

Chapter 3.

Correcting Data for Measurement Error

INTRODUCTION
The original A.C.E. estimates were found to be unacceptable because they failed to detect significant numbers of erroneous census enumerations. There were also suspicions that the A.C.E. may have included in its P sample residents who were actually nonresidents. Thus, a major goal for the A.C.E. Revision II estimates was to correct these measurement errors. One aspect of these corrections involves correcting a subsample of the A.C.E. data. Another aspect involves correcting measurement errors that cannot be detected with the information available in the subsample. These additional errors, which are identified via a duplicate study, are discussed in Chapter 5.

To understand the measurement error correction process, it is important to be familiar with the various sources of available information. These are summarized in the following table.

Table 3-1. Overview of A.C.E. Revision II Data Sources

Program | Sample | Sample size | What & when
Decennial census | | | Spring 2000
A.C.E. | Full E and P samples | E & P: About 700,000 persons in 11,303 block clusters | A.C.E. Person Interviewing (PI), Summer 2000; A.C.E. Person Follow-up (PFU), Fall 2000
Matching Error Study (MES) | Evaluation E and P samples | E & P: About 170,000 persons in 2,259 block clusters | Rematching Operation, December 2000
Evaluation Follow-up (EFU) | EFU E and P samples¹ | E: About 77,000 persons in 2,259 block clusters; P: About 61,000 persons in 2,259 block clusters | Evaluation Person Follow-up (EFU), January - February 2001
PFU/EFU Review | Review E sample | E: About 17,500 persons in 2,259 block clusters | Recoding Operation, Summer 2001
A.C.E. Revision II | Revision E and P samples | E: About 77,000 persons in 2,259 block clusters; P: About 61,000 persons in 2,259 block clusters | Recoding Operation, Summer 2002

¹ The number of sample cases included in the Evaluation Follow-up is smaller than the number selected for this sample. Cases were excluded from follow-up in certain situations, such as insufficient information or a duplicate enumeration.

The A.C.E. estimates produced in March 2001 were based on the Full E and P samples, which are probability samples of over 700,000 persons in 11,303 block clusters. The Matching Error Study (MES) and the Evaluation Follow-up (EFU) were two programs that had been planned to evaluate the March 2001 A.C.E. estimates. These evaluations were conducted in a subsample of 2,259 block clusters selected from the original 11,303 block clusters. A further subsample of persons within these block clusters was selected for the EFU evaluation. The probes used for EFU were designed to capture unusual living situations. The PFU/EFU Review was not part of the planned evaluations; it was conducted in order to resolve major discrepancies in enumeration status between the EFU and PFU results. Thus, the Review E sample is a subsample of the EFU E sample. The Revision E and P samples are referred to as such for purposes of producing A.C.E. Revision II estimates. These samples are essentially the same as the Evaluation E and P samples for EFU, but the data have undergone a major recoding to correct for measurement

error. This chapter discusses the measurement error corrections made to the Revision E and P samples. These corrected data, along with other measurement error corrections identified by the duplicate study, were used to adjust the Full E and P samples to produce A.C.E. Revision II estimates.

GOALS AND BACKGROUND
The goal for A.C.E. Revision II was to correct as much measurement error as possible in the original A.C.E. estimates, given resource and timing constraints.² The primary sources of measurement error were the determination of residence and enumeration status, match status, and mover status.

Residence and Enumeration Status. The original A.C.E. did not detect all of the erroneous enumerations; see Adams and Krejsa (2001) and Fay (2002) for documentation. The Evaluation Follow-up (EFU) detected approximately 1.4 million additional erroneous enumerations in the E sample. Since the coding of enumeration status in the E sample was identical to the coding of residence status in the P sample, similar results were expected for P-sample residence status coding (i.e., additional nonresidents were expected to be found as a result of the EFU). To correct for the residence status errors, the A.C.E. Revision II used a recoding of the Evaluation Follow-up interview in combination with the original A.C.E. to determine the best residence or enumeration status for each person in the Revision sample.

Matching Error. The Matching Error Study showed a net difference in match codes between the original March 2001 matching results and the evaluation matching results of 0.41 percent in the E sample and 0.20 percent in the P sample. Bean (2001) suggested that this net difference translated into an increase in the dual system estimate of 483,938 people. To correct for matching error, the results of the Matching Error Study and the A.C.E. Revision II recoding were used together to determine the appropriate match status for each person.

Mover Status. Raglin and Krejsa (2001) estimated a 2.6 percent gross difference rate in mover status between the original A.C.E. and the Evaluation Follow-up. This translated into a negative bias of 465,000 in the DSE (assuming no other biases). Results of the Evaluation Follow-up were used to correct for mover status errors. The EFU questionnaire contained questions designed to probe for a person's mover status. This information was captured during the clerical recoding and during the initial coding of the Evaluation Follow-up form. These types of measurement errors were corrected either by computer or clerically.

Two other sources of error were not part of the measurement error recoding portion of the A.C.E. Revision II: geocoding errors and duplicates outside the search area. Certain geocoding errors detected by various geocoding evaluations were not included in the A.C.E. Revision II.³ Within the P sample, 245,926 production nonmatched residents were found outside the search area,⁴ and 195,321 production correct enumerations in the E sample were found outside the search area; see Adams and Liu (2001). Some of the correct enumerations outside the search area were identified by the EFU interview and, hence, were reflected in the revised coding.⁵ Duplicates found outside the search area as a result of computer matching (see Chapter 5) were not handled by clerical coding. They were accounted for in the dual system estimator using estimation techniques; see Chapter 6 for a full description of these techniques.

RESIDENCE STATUS AND ENUMERATION STATUS
As already noted, the original March 2001 A.C.E. underestimated the number of erroneous enumerations. To correct for this, the best residence status code was based on available field follow-up data. Duplicates were corrected using a separate process. The following data were available for measurement error correction:

  • Person Interview (PI). The PI was the original A.C.E. enumeration of the P sample. It was a Computer-Assisted Personal Interview questionnaire designed to fully enumerate persons in the A.C.E. It was conducted by either phone or personal visit between April and September, 2000.
  • Person Follow-up (PFU). The PFU was the follow-up operation, conducted after before-follow-up matching, that was used to assign residence and enumeration status whenever those items had not been determined (Childers, 2001). It was conducted by personal visit in October and November, 2000, approximately 6-7 months after Census Day.
  • Evaluation Follow-up (EFU). The EFU was an evaluation of the A.C.E. designed to detect unusual living situations using additional probes and additional interviewing techniques (e.g., flashcards). It was conducted by personal visit in January and February, 2001, approximately 9-10 months after Census Day.

² In order to complete the A.C.E. Revision II estimates on time, 12 weeks were allotted for coding. Analysts at the National Processing Center were expected to code approximately 25,000 cases in this time frame.
³ As part of the A.C.E., several evaluations of geocoding error were conducted on various subsamples of the A.C.E., most notably Targeted Extended Search 2 (TES2) and Targeted Extended Search 3 (TES3). Results of these evaluations can be found in Adams and Liu (2001).
⁴ For the 2000 A.C.E., the search area, or area in which a person can be considered a correct enumeration or match, was the cluster and any census block touching the cluster.
⁵ Some of the cases in TES2 were evaluated using the Evaluation Follow-up questionnaire. For these cases, results of the geocoding evaluation were included in the Evaluation Follow-up; however, if a case was in TES2 but not in the Evaluation Follow-up, no geocoding evaluation results were included.

Results of the Person Interview were used to assign residence status by computer to all people in the A.C.E. who did not need follow-up. In contrast, the PFU was used to assign residence status for anyone who was eligible for follow-up (Childers, 2001). The PFU is similar to the PI. The PFU process interviewed both P-sample and E-sample people. The EFU followed up a sample of people sent to PFU and a sample of those not sent to PFU. This allowed the residence/enumeration status of a representative sample of people eligible for field follow-up to be evaluated.

There were measurement errors in both the A.C.E. PFU and the EFU resulting from limitations of their respective interviews. These errors are documented in Bean (2001) and Adams and Krejsa (2001), respectively. Also, the EFU was not strictly coded according to census residence rules. To evaluate the E sample for ESCAP II, the Census Bureau conducted the PFU/EFU Review in the summer of 2001. Expert matchers reviewed a subsample of the EFU E sample and applied consistent census residency rules. These analysts were assumed to make negligible errors; therefore, the PFU/EFU Review was considered to be free of coding error, given the available data.

For A.C.E. Revision II, this high-quality coding was needed for subsamples of the A.C.E. P and E samples that were large enough to provide accurate subgroup estimates of net coverage. Twelve weeks of coding time were allotted to clerically code approximately 25,000 cases. However, there were over 100,000 cases needing codes. To assign the highest quality codes while meeting a demanding schedule, keyed data from both the PFU and EFU forms were used to augment clerical coding procedures. An automated coding algorithm, based on specific responses to the PFU and EFU questionnaires, was used to determine an appropriate code for each case. This was done for both the PFU interview and the EFU interview. The automated coding also assigned a Why code that describes the reason the particular code was assigned. There were more than 60 possible Why code categories. A final code was assigned to each case using the following three-step process:

  • Validation. Determine, for each Why code category, whether the automated coding is of high quality, using the PFU/EFU Review as a truth deck.
  • Targeting. Target only those Why code categories that have low levels of agreement between the automated coding and the PFU/EFU Review data.
  • Clerical Review. Clerically recode only those cases in the targeted Why code categories. The clerical recoding takes advantage of handwritten interviewer comments.

Validation of Keyed Data
To validate the quality of coding produced by the keyed data algorithm, skip patterns for both questionnaires were programmed to determine an appropriate match code and Why code for each case. Then, for both the PFU and EFU forms, the percentage agreement with the original coding for the respective form (either production coding or the coding of the EFU form), the percentage agreement with the PFU/EFU Review, and the residual risk were examined. That is, the following calculations were performed twice: once for PFU and once for EFU.

The residual risk of disagreement (i.e., potential bias) represented the number of cases at risk of being coded wrong due to accepting the automated code for categories defined by questionnaire responses. Cases subject to risk were those where the automated code and the original code agreed. If they disagreed, the automated code was rejected and the case was sent for clerical review. The risk for the cases agreeing is calculated as follows:

$$\text{risk} = \text{AgreeK} - \text{AgreeRev}$$

where
AgreeK = the weighted number of cases whose code from the keyed data agreed with the original production code.
AgreeRev = of those cases where the code from the keyed data agreed with the original production code, the weighted number of cases whose code from the keyed data agreed with the PFU/EFU Review code.

The term risk, rather than error, is used because some potential coding changes may not have had an effect on the DSE. For example, people who were in group quarters have a residual risk of 26,517 after computer coding. These represent cases that probably should have been coded as erroneous enumerations, but were not. However, some of the 26,517 cases could be unresolved cases, which have a probability less than one of being correct.

The automated coding results for a given Why code category were rejected if the residual risk was too high or if there were not enough cases to make an informed decision. The exception to this rule was the category consisting of cases without any indication of living in a group quarters or other residence. This group was, by far, the largest category for both the PFU and EFU, so a higher residual risk⁶ was expected.

Targeting Cases for Clerical Review
After the decision was made to accept or reject the automated code for each Why code category, cases were targeted for clerical review. Analysts, who were the highest level of clerical matchers, performed the clerical review. Due to their experience and additional training, they were assumed to make negligible errors in coding.

⁶ Absolute risk, rather than relative risk, is used. Therefore, larger categories tended to have higher risks.
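The residual risk calculation and the accept/reject rule described above can be sketched as follows. This is a hedged Python illustration: the `ValidationCase` record, the thresholds `max_risk` and `min_cases`, and the `exempt` set are hypothetical, since the document does not state the cutoffs that were used; the exempt set stands in for the category of cases without any indication of another residence, which was not rejected on risk grounds.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class ValidationCase:
    """One weighted case from the PFU/EFU Review truth deck (hypothetical fields)."""
    why_code: str        # Why code category assigned by the automated coding
    keyed_code: str      # code from the keyed-data algorithm
    original_code: str   # original code (production or original EFU coding)
    review_code: str     # PFU/EFU Review code, treated as truth
    weight: float


def residual_risk(cases):
    """risk = AgreeK - AgreeRev for each Why code category."""
    agree_k = defaultdict(float)
    agree_rev = defaultdict(float)
    for c in cases:
        if c.keyed_code == c.original_code:       # counts toward AgreeK
            agree_k[c.why_code] += c.weight
            if c.keyed_code == c.review_code:     # ...and toward AgreeRev
                agree_rev[c.why_code] += c.weight
    return {why: agree_k[why] - agree_rev[why] for why in agree_k}


def accept_categories(cases, max_risk, min_cases, exempt=frozenset()):
    """Map each Why code category to True if its automated code is accepted.

    A category is rejected if its residual risk exceeds max_risk or it has
    fewer than min_cases cases; exempt categories skip both tests.
    """
    counts = Counter(c.why_code for c in cases)
    risks = residual_risk(cases)
    decisions = {}
    for why, n in counts.items():
        if why in exempt:
            decisions[why] = True
        else:
            decisions[why] = n >= min_cases and risks.get(why, 0.0) <= max_risk
    return decisions
```

Because risk is an absolute weighted count, a large category can exceed a fixed `max_risk` even when its agreement rate is high, which is consistent with footnote 6.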

In general, cases did not go to clerical review if both the PFU and EFU automated codes agreed, the mover statuses agreed, and the Why code category was deemed to be of high enough quality. In some instances, cases were exempt from clerical review because they could be coded based on information available in data files. For many of these situations, consistent and complete data were obtained from both the PFU and EFU interviews. These cases included:

  • Census Usual Home Elsewhere. If the person claimed a Usual Home Elsewhere on certain types of census forms, they were counted as a correct⁷ enumeration within the cluster and did not need clerical review.
  • Geocoding Errors from Initial Housing Unit Matching. If a case should not have been sent to PFU or EFU and was only sent due to clerical error in the initial production matching, then it did not need clerical review.

In contrast, some cases were automatically sent to clerical review. These included cases in the PFU/EFU Review that resulted in a conflicting status, noninterview cases, and cases where mover dates could not be determined from the EFU keyed data. Some of the cases that went to clerical review did so because the original A.C.E. or PFU results did not agree with the EFU results. Most of the cases went to clerical review because the automated coding process was not reliable for that Why code category. For P-sample inmovers, there were no validation data. Cases where the original EFU mover status did not match the mover status from the keyed data, or where the residence status from the keyed data did not match the original EFU residence status, were sent to clerical review. Noninterview cases and cases where mover dates could not be determined from the keyed data were also sent to clerical review.

Cases with the following attributes were sent to clerical review:

  • the code from the keyed data for either form was not accepted for that case.
  • the code from the keyed data was accepted for both forms, but at least one of the codes from the keyed data did not agree with its original code (i.e., the PFU code from the keyed data did not agree with production, or the EFU code from the keyed data did not agree with the original EFU code).
  • for P-sample people, the mover status from the keyed data did not agree with the mover status assigned during the EFU coding.
  • there was write-in information in open-ended questions on the form that could not be coded.
  • the case was a possible match in before follow-up matching and the production and original EFU codes disagreed.
  • the case was a duplicate in either the original EFU coding or production after follow-up coding.
  • the case was not yet flagged for clerical review, the PFU code from the keyed data did not agree with the EFU code from the keyed data, and one of the cases was not unresolved for certain reasons.
  • the case was in the PFU/EFU Review and was conflicting or had a mover status disagreement between the keyed data and the original EFU mover status.

Clerical Review
The clerical review for A.C.E. Revision II was an analyst-only operation. The following data were collected:

  • Match Code for each form
  • Why Code for each form
  • Respondent for each form
  • Whether the respondents are the same for the two interviews
  • Best Code. A code indicating which form is the better of the two forms
  • Smooshed Code. Information from both forms combined to make a code representing the true situation
  • Mover Status. Mover status from the EFU form for P-sample people

The match codes were assigned using the census residence rules to construct coding rules for the flow of the questionnaire.

The best code could be one of four values:

  • Both = The enumeration statuses were the same
  • PFU = The PFU form provided better information
  • EFU = The EFU form provided better information
  • Conflicting = Similar caliber respondents (e.g., husband and wife; two neighbors) provided contradictory information for the case

To ensure reproducibility, computer edits were applied to the best code. If the analyst did not follow pre-specified rules, then the analyst had to review the case again or leave a note indicating the situation.

⁷ A person can claim a usual home elsewhere if he or she is enumerated on certain types of census forms in group quarters (e.g., military, shipboard, and certain types of special places like shelters). If a person on one of these forms claims a usual home elsewhere, then that person is counted at the address they indicate is their usual home. These people are part of the E sample because they are part of the housing unit universe.
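A rough sketch of the routing rules listed above, written as a Python function that returns the reasons (if any) a case would be sent to clerical review. All field names are hypothetical flags assumed to be precomputed from the keyed data; the rule about cases that were "not unresolved for certain reasons" and the exemptions for usual-home-elsewhere and geocoding cases are omitted because the text does not specify them in enough detail.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RevisionCase:
    """Hypothetical flags for one Revision-sample case."""
    pfu_keyed_accepted: bool     # keyed PFU code accepted for its Why code category
    efu_keyed_accepted: bool     # keyed EFU code accepted for its Why code category
    pfu_keyed_code: str
    production_code: str
    efu_keyed_code: str
    original_efu_code: str
    keyed_mover_status: Optional[str] = None   # P-sample only
    efu_mover_status: Optional[str] = None     # P-sample only
    uncodable_write_in: bool = False
    bfu_possible_match: bool = False           # possible match before follow-up
    duplicate_in_coding: bool = False          # original EFU or production AFU
    in_review: bool = False                    # case is in the PFU/EFU Review
    review_conflicting: bool = False
    noninterview: bool = False
    mover_dates_unknown: bool = False


def clerical_review_reasons(c: RevisionCase) -> List[str]:
    """Return why a case goes to clerical review; an empty list means it does not."""
    reasons = []
    if c.noninterview or c.mover_dates_unknown:
        reasons.append("automatic: noninterview or unknown mover dates")
    if not (c.pfu_keyed_accepted and c.efu_keyed_accepted):
        reasons.append("keyed code not accepted for a form")
    elif (c.pfu_keyed_code != c.production_code
          or c.efu_keyed_code != c.original_efu_code):
        reasons.append("accepted keyed code disagrees with original code")
    mover_mismatch = (c.keyed_mover_status is not None
                      and c.efu_mover_status is not None
                      and c.keyed_mover_status != c.efu_mover_status)
    if mover_mismatch:
        reasons.append("mover status disagreement")
    if c.uncodable_write_in:
        reasons.append("uncodable write-in")
    if c.bfu_possible_match and c.production_code != c.original_efu_code:
        reasons.append("possible match with production/EFU disagreement")
    if c.duplicate_in_coding:
        reasons.append("duplicate in original EFU or production coding")
    if c.in_review and (c.review_conflicting or mover_mismatch):
        reasons.append("PFU/EFU Review conflict or mover disagreement")
    return reasons
```

Returning a list of reasons rather than a single boolean keeps the audit trail that the clerical analysts would need when reviewing a targeted case.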

CORRECTION OF MOVER STATUS ASSIGNMENT ERRORS
For each P-sample case, mover status was based on the EFU. This was used to determine whether or not the person needed clerical review.

CORRECTION OF MATCHING ERRORS
After the A.C.E. Revision II recoding operation corrected for enumeration, residence, and mover status, the results of the Matching Error Study (MES) were used to correct for false matches and false nonmatches. Some matching errors were a result of incorrect residence status coding and were corrected as part of the recoding operation discussed above. To determine the correct match status, each of the possible combinations of match status was reviewed to determine the appropriate match status for each type of case. In general, the MES match status was assigned when there were changes from a match to a nonmatch or from a nonmatch to a match. For other situations, the match status from the EFU coding was assigned.

DATA OUTPUTS
After the clerical operation was completed, two files were assembled: one for the P sample and another for the E sample. The files contain match codes and Why codes (where appropriate) for the original March 2001 A.C.E., EFU, PFU/EFU Review, Keyed Data, and A.C.E. Revision II Clerical Review. A final code is also assigned in the following hierarchy: A.C.E. Revision II Clerical Review, PFU/EFU Review, Keyed Data. This code reflects the final match, residence, and enumeration status for the A.C.E. Revision II process.

LIMITATIONS
There were several limitations on the data for the A.C.E. Revision II:

  • Sample Size. The sample used to estimate measurement error is 2,259 clusters, containing about 10 percent of the persons in the sample used in the production A.C.E. Due to the smaller sample size, some subgroup estimates are subject to higher variances compared to those for the original March 2001 A.C.E.
  • Conflicting Cases. Conflicting cases occurred when the PFU and EFU interviews had respondents of the same caliber (either both nonproxy, or proxy respondents who were in a position to have similar knowledge about the household, e.g., two neighbors) who gave contradictory information. Since an additional field follow-up was not possible, these cases were coded as conflicting, reviewed separately, and imputed.
  • Data Collection Error. Cases were coded as well as possible. However, there was no attempt to correct for any residual data collection error. Any remaining respondent and interviewer errors could not be rectified without an additional field follow-up.
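The match status correction and the final code hierarchy from the sections above can be illustrated with two small functions. This is a sketch under stated assumptions: status strings such as "match" and "nonmatch" are placeholders, and the MES change is assumed to be measured against the original production match status, which the text implies but does not state explicitly.

```python
from typing import Optional

MATCH, NONMATCH = "match", "nonmatch"   # placeholder status labels


def corrected_match_status(original: str, mes: str, efu: str) -> str:
    """Use the MES status when it flips the original status between match
    and nonmatch; otherwise keep the status from the EFU (Revision) coding."""
    if {original, mes} == {MATCH, NONMATCH}:
        return mes
    return efu


def final_code(clerical: Optional[str], review: Optional[str],
               keyed: Optional[str]) -> Optional[str]:
    """First available code in the hierarchy: A.C.E. Revision II Clerical
    Review, then PFU/EFU Review, then Keyed Data."""
    for code in (clerical, review, keyed):
        if code is not None:
            return code
    return None
```

Encoding the hierarchy as an ordered fallback makes the precedence explicit: a code from a higher-quality source always overrides one from a lower-quality source.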

Chapter 4.

A.C.E. Revision II Missing Data Methods

BACKGROUND
Missing data arise because it is not possible to obtain interviews for all sample cases or to obtain answers to all interview questions. This was as true for the A.C.E. Revision II as it was for the A.C.E. To put the A.C.E. Revision II missing data methods in perspective, a brief summary of the A.C.E. missing data adjustments is presented. For the A.C.E. P sample, a household noninterview adjustment compensated for noninterviewed households. Imputation methods were implemented to handle missing characteristics such as age or tenure. Further, match and residency probabilities were assigned when the respective match and residency statuses could not be definitively determined. There was no noninterview adjustment for the A.C.E. E sample, nor was there an imputation for missing characteristics, as the census imputations were used. However, E-sample cases with unresolved enumeration status were assigned probabilities of correct enumeration. See Ikeda and McGrath (2001) for details on the A.C.E. missing data methodology.

As will be discussed in Chapter 6, the A.C.E. Revision II estimation uses both the original A.C.E. coding results on the Full E and P samples and the Revision coding results on the smaller Revision Samples. Note that the A.C.E. Revision II subsample of the A.C.E. is referred to as the Revision Sample, and the new coding operation is called the Revision coding. The missing data adjustments for the A.C.E. E and P samples were unchanged from those used to produce A.C.E. estimates, with the exception of the imputation for missing age. It was necessary to impute age again for the Full A.C.E. P sample because the A.C.E. Revision II post-strata had different age groupings.

The Revision P sample used the same imputations for missing characteristics that the A.C.E. did, including the new age imputation. However, since the A.C.E. Revision II measurement methodology had important differences from the A.C.E. measurement methods, it was necessary to develop new missing data methods. The A.C.E. Revision II missing data methodology confronted three general types of new missing data problems:

1. New noninterviewed households: Revision P-sample households that were considered interviews in the A.C.E. were identified as noninterviews in the Revision coding when it was determined that none of the P-sample people there were valid Census Day residents.

2. Revision E- and P-sample cases with unresolved match, enumeration, or residency status because of incomplete or ambiguous interview data from the Person Follow-up (PFU) or the Evaluation Follow-up (EFU).

3. Revision E- or P-sample cases with conflicting enumeration or residency status because contradictory information was collected in the PFU and the EFU interviews and it could not be determined which was valid.

AGE IMPUTATION
For the original A.C.E., P-sample people with missing age were assigned to age categories defined by the post-stratification plan. The A.C.E. Revision P-sample post-stratification divided the original A.C.E. post-stratification group of 0-17 year olds into two age groups: 0-9 and 10-17. Those people with missing age who had been assigned to the 0-17 group were reassigned to either the 0-9 or the 10-17 group. This reassignment assumed that the age distribution of people missing age was uniform within the 0-17 age grouping. Other people with unresolved age remained in the age group to which they had originally been assigned.

HOUSEHOLD NONINTERVIEW ADJUSTMENT
The A.C.E. household noninterview adjustment generally spread the weights of P-sample noninterviewed housing units over interviewed housing units in the same block cluster with the same housing unit structure type. Housing units were determined to be noninterviews in two ways: 1) an interview was not conducted during the A.C.E. person interview operation, and 2) based on the results of the A.C.E. PFU, it was determined that a whole household of P-sample people should not have been listed in the first place, and that another household may have been residents at that housing unit. Separate household noninterview adjustments were implemented for Census Day and A.C.E. Interview Day.

The A.C.E. Revision II noninterview adjustment methodology for A.C.E. Interview Day was essentially unchanged from the A.C.E. There was, however, an important change from the A.C.E. methodology for the noninterview adjustment for Census Day residency. In A.C.E. Revision II, a new imputation cell was defined. It included new noninterviews due to whole households of A.C.E. nonmovers who were determined to be inmovers or nonresident outmovers by the Revision coding. The new noninterview cell

spread the weights of these noninterviewed units over housing units with at least one person who: 1) indicated he/she lived at another address, or 2) was identified as potentially fictitious in the A.C.E. These new noninterviews were assumed to have both a low match rate and a low residency rate similar to this group. Otherwise, the noninterview adjustment for Census Day used methodology similar to that of the A.C.E.

ASSIGNMENT OF PROBABILITIES OF CORRECT ENUMERATION, CENSUS DAY RESIDENCY, AND MATCH STATUS

In the A.C.E., P-sample people ended up with unresolved Census Day residency or match status in one of two ways. First, the A.C.E. person interview may not have provided sufficient information for match and follow-up. Second, the A.C.E. PFU may not have collected adequate information for determining a person's Census Day residency status or match status. Inadequate data collection could also result in unresolved enumeration statuses for A.C.E. E-sample people. In the A.C.E. Revision II, the EFU was an additional source of unresolved cases. How a case was imputed depended on how it became unresolved.

Imputation for People with Insufficient Information for Match and Follow-Up

The Revision P-sample people with insufficient information for match and follow-up tended to be the same people who had insufficient information for match and follow-up in the A.C.E., except for some rare cases with coding changes. Note that people who had insufficient information in the A.C.E. were not sent to EFU. There were about three million weighted people with insufficient information for match and follow-up in both the Full and Revision P samples.

In the A.C.E., P-sample people with insufficient information for match and follow-up were assigned a probability of Census Day residency equal to the residency rate of P-sample people who went to PFU. This methodology was improved in the A.C.E. Revision II by defining finer imputation cells that accounted for whether the housing unit was matched, nonmatched, or had a conflicting household. A conflicting household existed when the P- and E-sample households had no people in common. The probability of match was assigned based on the overall match rate, divided into groups based on mover status and housing unit match status, as was done in the A.C.E., and additionally on conflicting household status.

Imputation for People with Incomplete or Ambiguous Follow-Up

In contrast to P-sample people with insufficient information, the residency status for Revision P-sample people and the correct enumeration status for Revision E-sample people often changed from what it was in the A.C.E. These statuses changed because the Revision coding processed new information from the EFU, in addition to the original information from the PFU. Thus, while the EFU information resolved many cases that were unresolved in the A.C.E. because of the PFU, EFU cases with incomplete or ambiguous information were a new source of unresolved cases.

There were about the same number of weighted E-sample unresolved cases in the Revision sample as in the A.C.E., more than six million, with about half of these representing new unresolved cases. In contrast, the Revision coding generated substantially more P-sample unresolved cases than the A.C.E., 4.6 million compared to 2.7 million. This increase occurred because all Revision P-sample cases (except those with insufficient information) went to EFU, including whole households of nonmatched people who had not gone to PFU. These people were assumed to be resolved in the A.C.E. and could have become unresolved because of the EFU.

The original A.C.E. missing data plan based the imputation cells on information obtained before any follow-up was conducted. An ad hoc fix to the A.C.E. missing data methodology was implemented using information from the PFU; see Cantwell and Childers (2001) for details. Based on the PFU keyed data, after follow-up groups were created for people who were potentially fictitious and for people who lived elsewhere on Census Day. The new cells used information highly relevant to residence or enumeration status. Further, they showed greater discrimination in assigning probabilities of correct enumeration and residency. In A.C.E. Revision II, the before follow-up imputation cells were abandoned and the cells were defined based on after follow-up information. This change was the single most important improvement in the A.C.E. Revision II missing data methodology.

The after follow-up group definitions were based on keyed responses to the PFU and EFU questionnaire checkboxes and the Why codes. Why codes were clerically applied codes that reflected responses in the questionnaire checkboxes, as well as handwritten notes; see Adams and Krejsa (2002) for a detailed description. The keyed results and Why codes helped identify the following:

  • unresolved cases with the same history, i.e., the recipient cells.
  • resolved follow-up cases with the same history up to the point of being unresolved, i.e., the donor pool.

PFU after follow-up groups were defined for those cases that were unresolved as a result of the PFU. Similarly, EFU after follow-up groups were defined for those cases unresolved because of the EFU. It was necessary to define separate groups for the PFU and EFU because their interviews and questionnaires were different. However, the same after follow-up groups were

employed for the P- and E-sample unresolved cases, as the PFU and EFU questions about Census Day residency were the same as the PFU and EFU questions about enumeration status.

It is useful to distinguish between uninformative and informative unresolved cases:

  • uninformative unresolved: the follow-up was a noninterview or an incomplete interview, and there was no evidence of an erroneous enumeration or nonresident.
  • informative unresolved: a follow-up interview was conducted, and there was evidence of an erroneous enumeration or nonresident.

Note that when one interview was uninformative unresolved but the other interview was resolved, the Revision coding selected (i.e., the code was based on) the resolved interview. On the other hand, when the unresolved interview was informative, the Revision coding could choose the unresolved interview over the resolved one. See Adams and Krejsa (2002) for details of the Revision coding.

It often happened that both the PFU and EFU interviews were unresolved. To assign such a case to an imputation cell, the more informative of the two unresolved interviews was selected. When both interviews had the same level of information, the EFU was typically selected over the PFU, because questions on the EFU questionnaire were more sharply defined.

Consider the following example of an after follow-up group. One cell of unresolved E-sample people, or recipients, was defined as people with evidence from the EFU interview that they had moved in since Census Day or moved out before Census Day, though the EFU interview did not provide the address they moved to or from. It was impossible to determine the enumeration status of these people, since it was unclear whether their Census Day address was in the A.C.E. cluster. The corresponding donor pool consisted of those resolved people who indicated in the EFU that they had moved in after Census Day or moved out before Census Day. Generally, these people provided their mover address in the EFU. An analogous after follow-up group was formed for people unresolved because they indicated they were movers in the PFU interview. These groups are characterized as informative because the follow-up provided evidence of an erroneous enumeration.

Table 4-1 shows the nine EFU after follow-up groups, while Table 4-2 shows the nine PFU after follow-up groups. People who moved in after Census Day or moved out before Census Day were the largest informative after follow-up group. Another important informative after follow-up group consisted of people who, according to the follow-up, had another residence such as a vacation home, though the follow-up did not indicate whether the other residence or the sample address was the Census Day residence. The noninterview groups and the "didn't answer other residence questions" group were the larger uninformative groups.

Table 4-1. EFU After Follow-up Groups

Informative groups
  • The followed-up person Lived elsewhere or at another residence, but the address was not given.
  • The followed-up person moved in after Census Day or out before Census Day, but the Census Day address was not given.
  • The respondent indicated the followed-up person Never lived here at the sample address, but did not provide the Census Day address.
  • The followed-up person had another residence, but did not indicate whether the sample address or the other residence was the Census Day residence.
  • The followed-up person moved in or moved out, but no move dates were given.
Uninformative groups
  • The respondent indicated the followed-up person Lived here at the sample residence, but did not answer the other residence question.
  • The respondent answered the current residence question, but did not answer the group quarters and other residence questions.
  • The respondent did not answer the usual residence question, nor the group quarters and other residence questions.
  • Potentially fictitious person; no respondents knew of the followed-up person.

Some of the larger EFU groups were subdivided by A.C.E. operational variables, such as whether or not the household went to PFU, or whether the household was conflicting. The uninformative after follow-up groups tended to have imputed probabilities of correct enumeration or residence close to one, typically in the range of 0.92 to 0.99. In contrast, the informative after follow-up groups had smaller probabilities, often less than 0.25. The probability of correct enumeration is calculated as the weighted proportion of correct enumerations in the donor pool:

$$\text{Probability of correct enumeration} = \frac{\text{Weighted CEs in donor pool}}{\text{Weighted resolved enumerations in donor pool}}$$

For the P sample, probabilities of residency and match status were calculated analogously.
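The donor-pool ratio above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Census Bureau's production code. The tuple layout, the group labels (paraphrased from Table 4-1), the case IDs, the function names `donor_pool_probabilities` and `impute_unresolved`, and all weights are hypothetical.

```python
from collections import defaultdict


def donor_pool_probabilities(donors):
    """Weighted share of correct enumerations (CEs) in each donor pool.

    donors: iterable of (group, weight, is_ce) tuples for *resolved* cases,
    where group is the after follow-up group (hypothetical labels).
    """
    ce_weight = defaultdict(float)        # numerator: weighted CEs
    resolved_weight = defaultdict(float)  # denominator: weighted resolved cases
    for group, weight, is_ce in donors:
        resolved_weight[group] += weight
        if is_ce:
            ce_weight[group] += weight
    return {
        group: ce_weight[group] / total
        for group, total in resolved_weight.items()
        if total > 0
    }


def impute_unresolved(recipients, probabilities):
    """Give each unresolved case (recipient) its group's donor-pool rate."""
    return {case_id: probabilities[group] for case_id, group in recipients}


# Hypothetical weights: one informative and one uninformative group.
donors = [
    ("moved, no Census Day address", 120.0, False),
    ("moved, no Census Day address", 30.0, True),
    ("lived here, no other-residence answer", 200.0, True),
    ("lived here, no other-residence answer", 10.0, False),
]
probs = donor_pool_probabilities(donors)
imputed = impute_unresolved(
    [("E-001", "moved, no Census Day address"),
     ("E-002", "lived here, no other-residence answer")],
    probs,
)
print(imputed)
```

With these invented weights, the informative group gets 30/150 = 0.2 and the uninformative group about 0.95. This mirrors the pattern the text describes. For the P sample, the same function would apply with residency or match flags in place of `is_ce`.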


Table 4-2. PFU After Follow-up Groups

Informative groups
  • The followed-up person Lived elsewhere or at another residence, but the address was not given.
  • The followed-up person moved in after Census Day or out before Census Day, but the Census Day address was not given.
  • The respondent indicated the followed-up person did not live here at the sample address, but did not indicate the other address and did not answer the group quarters and other residence questions.
  • The followed-up person had another residence, but did not indicate where the usual residence was.
Uninformative groups
  • The respondent indicated the followed-up person Lived here at the sample residence, but did not answer the other residence question.
  • The respondent answered the usual residence question, but did not answer the group quarters and other residence questions.
  • The Lived here question was answered Don't Know/refused, and the group quarters and other residence questions were not answered.
  • Blank questionnaire.
  • Potentially fictitious person; no respondents knew of the followed-up person.

Imputation for Conflicting Coding Cases

When the A.C.E. EFU and PFU interviews had contradictory information, the Revision coding procedure assigned the case a conflicting code. Note that a conflicting code is different from a conflicting household. All conflicting cases in the Revision coding process were sent to analysts for clerical review. By examining the interviewers' handwritten notes, analysts could often determine which of the two interviews was better and assign the appropriate code.

There were some cases where the interviews appeared to be of equal quality, such as when both respondents were household members or both respondents were proxies of equal caliber. For these conflicting cases, the interviews seemed equally likely to be correct based on the analysts' expertise. Therefore, the probability of correct enumeration for Revision E-sample conflicting cases and the probability of Census Day residency for Revision P-sample conflicting cases were set to 0.5. It should be noted that the Revision coding resulted in considerably fewer conflicting cases than the PFU/EFU Review Sample. According to Adams and Krejsa (2001), the PFU/EFU Review Sample had about 2.6 million weighted conflicting people, in contrast to only about 100,000 weighted conflicting people in the Revision samples.
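The following sketch shows how the interview-selection rules for unresolved cases and the fixed 0.5 probability for equal-quality conflicting cases could be written. It is a hypothetical simplification. The status labels and the function `choose_interview` are invented for illustration. The text says the EFU was *typically* chosen on ties, but the sketch always chooses it. Two situations are not modeled and raise an error instead: cases the text leaves to the judgment of the Revision coding (a resolved interview paired with an informative unresolved one) and contradictory resolved interviews, which went to clerical review.

```python
# Rank of unresolved interviews: higher means more informative.
RANK = {"uninformative": 0, "informative": 1}

# Probability assigned to conflicting cases judged to be of equal quality.
CONFLICTING_PROBABILITY = 0.5


def choose_interview(pfu: str, efu: str) -> str:
    """Return 'PFU' or 'EFU', the interview used to code or impute a case.

    Each argument is 'resolved', 'informative' (unresolved, with evidence of
    an erroneous enumeration or nonresidence), or 'uninformative'
    (unresolved, with no such evidence).
    """
    # An uninformative unresolved interview yields to a resolved one.
    if pfu == "resolved" and efu == "uninformative":
        return "PFU"
    if efu == "resolved" and pfu == "uninformative":
        return "EFU"
    # Both unresolved: take the more informative one; ties go to the EFU,
    # whose questions were more sharply defined.
    if pfu in RANK and efu in RANK:
        return "PFU" if RANK[pfu] > RANK[efu] else "EFU"
    raise ValueError("left to Revision coding judgment or clerical review")


print(choose_interview("uninformative", "uninformative"))  # a tie
print(choose_interview("informative", "uninformative"))
```

The selected interview then determines the after follow-up group, and hence the donor pool, for the case. A conflicting case of equal quality bypasses the donor pool and receives `CONFLICTING_PROBABILITY`.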


Chapter 5.

Further Study of Person Duplication in Census 2000

INTRODUCTION

Evaluations of the March 2001 coverage estimates indicated that the A.C.E. failed to detect a large number of erroneous census enumerations. One type of erroneous enumeration was duplicate census enumerations, that is, census enumerations included in the census two or more times. The A.C.E. was not specifically designed to detect duplicate census enumerations beyond the search area. However, there was an expectation that the A.C.E. would detect that these E-sample enumerations had another residence and that, roughly half the time, this other residence was the usual residence. Feldpausch (2001) showed this expectation was not met.

For purposes of constructing A.C.E. Revision II estimates, matching and modeling techniques were used to identify duplicate links between the Full E and P samples and census enumerations. The matching algorithm used statistical matching to identify linked records. Statistical matching allowed the matching variables to agree only approximately between the two records being compared. Because linked records may not refer to the same individual, even when the characteristics used to match the records are identical, modeling techniques were used to assign a measure of confidence, the duplicate probability, that the two records refer to the same individual. These duplicate probabilities were used in the A.C.E. Revision II estimates.

This chapter documents the matching and modeling methods that were used to identify duplicate links and to produce duplicate probabilities. Note that this study was not intended to identify which enumeration was in the correct location. Chapter 6 describes how to compute the conditional probability that the sample case was in the correct location given that it had a link to a census enumeration outside the A.C.E. search area. This calculation affects the correct enumeration status in the E sample and the residence status in the P sample. A full discussion of the estimation components is given in Chapter 6.

BACKGROUND

Mule (2001) reported results for initial attempts at measuring the extent of person duplication in Census 2000. This work was conducted by an inter-divisional group as part of the further research to inform the ESCAP II decision on adjusting census data products. This study is referred to as the ESCAP II duplicate study in this chapter. The ESCAP II duplicate study used conservative computer matching rules to minimize the number of false matches that could be introduced when doing a nationwide search, since there was no clerical review of the results. As a consequence of the matching rules, comparisons to benchmarks indicated that the ESCAP II duplicate estimates were a lower bound. Specifically, comparing the ESCAP II results within the A.C.E. sample area to the A.C.E. clerical matching results showed that only 37.8 percent of the census duplicates were identified. Fay (2001, 2002) estimated the matching efficiency at 75.7 percent when accounting for census records out of scope for the A.C.E. duplicate search. The out-of-scope records were those that were reinstated and deleted from the Housing Unit Duplication Operation, documented in Nash (2000).

The ESCAP II matching was a two-step process. First, the sample of census records was matched to the full census on first name, last name, month of birth, day of birth, and computed age. Age was allowed to vary by one year. Middle initials and suffixes scanned into the first name field were accounted for; however, the other characteristics had to be exact matches at this stage. This first-stage match established a link between households. In the second stage, all person records in the linked households from the first stage were statistically matched using first name, middle initial, last name, month of birth, day of birth, and computed age. The matching parameters used in the statistical matching were borrowed from other Census 2000 matching operations. Mule (2001) describes this matching algorithm in more detail.

To reduce the impact of false matches, particularly for persons with common names and the same month and day of birth, model weights were applied to each set of linked records as a measure of confidence that the linked records were indeed duplicates. Due to schedule constraints, a national Poisson model was used in lieu of a probability model.

The ESCAP II census duplicate methodology satisfied the intended project goals and provided a valuable evaluation of the census by showing that person duplication existed. However, limitations of the methodology made it difficult to estimate the magnitude of person duplication in the census.

OVERVIEW OF THE DUPLICATE STUDY PLAN

Like the ESCAP II study, the A.C.E. Revision II duplicate plan involved matching the Full E and P samples to the census to establish potential duplicate links. Then, modeling techniques were used to identify the links most likely

to be duplicate enumerations and to assign a measure of confidence that the links are duplicates. Key differences from the ESCAP II study include extending the use of statistical matching and developing models to assign a duplicate probability to the links. An advantage of duplicate probabilities over the Poisson model weights used in the ESCAP II study is that all duplicate links outside the A.C.E. search area could be reflected in the A.C.E. Revision II estimates. Fay (2001, 2002) used a subset of the ESCAP II duplicate links to produce a lower bound on the level of erroneous enumerations that the A.C.E. did not measure.

Estimates of census duplication were based on matching and modeling E-sample cases to the census. For purposes of A.C.E. Revision II estimation, the P sample was also matched to the census. However, these results did not contribute to estimates of person duplication in the census. The A.C.E. Revision II estimation methodology adjusted the A.C.E. correct enumeration rate for E-sample cases with links outside the A.C.E. search area. Further, the A.C.E. Revision II estimation methodology adjusted the A.C.E. match rate for P-sample cases that linked to census cases outside the search area.

The matching algorithm consisted of two stages. The first stage was a national match of persons using statistical matching as described in Winkler (1995). Statistical matching attempted to link records based on similar characteristics or close agreement of characteristics, whereas exact matching required exact agreement of characteristics. Statistical matching allowed two records to link in the presence of missing data and typographical or scanning errors.

Six characteristics common to both files, called matching variables, were used to link records in the Full E and P samples with records in the census. Matching parameters associated with each matching variable were used to measure the degree to which the matching variables agreed between the two records, ranging from full agreement to full disagreement. The measure of the degree to which each matching variable agreed was called the variable match score. The overall match score for the linked records was the sum of the variable match scores.

Full agreement on at least four characteristics was required for a link to be considered a duplicate link. Because this study was a computer process without the benefit of a clerical review, this restriction on the statistical matching was necessary in order to minimize linking records with similar characteristics that represented different people. This was a particular concern when looking for duplicate enumerations across the entire country. The need to use statistical matching at the first stage was apparent after the limited success of the ESCAP II exact matching procedure in identifying A.C.E. duplicates in the A.C.E. sample areas. The statistical matching yielded better identification of the A.C.E. duplicates, but identifying all of the A.C.E. duplicates would have required fewer characteristics to be exact matches. This could potentially lead to a high number of false links.

The search for duplicate links between the Full E and P samples and the census was limited to those pairs that agreed on certain identifiers, or blocking criteria. Blocking criteria were sort keys used to increase computer processing efficiency by searching for links where they were most likely to be found. For instance, to search for duplicates only when the first and last names agreed, both the sample and census files would have been sorted by the blocking criteria of first and last name. Then, all possible pairs within each first name/last name combination would have been searched for duplicate links. Although true matches can be missed by using blocking criteria, multiple sets of blocking criteria minimize the number of missed matches. The A.C.E. Revision II duplicate study used four sets of blocking criteria.

At the first stage of matching, it was possible for one sample case to link to multiple census records. All of these links were retained for the second stage of matching.

The second stage of matching was limited to matching persons within households. If an E- or P-sample case linked to a census record in group quarters, the case did not go to the second stage. Using results from the first stage of matching, a link between two housing units was established. The second stage was a statistical match of all household members in the sample housing unit to all household members in the census housing unit. The second-stage matching variables were the same as in the first stage; however, the matching parameters differed.

Using a subset of the first-stage links, the second-stage matching parameters were derived using the Expectation-Maximization (EM) algorithm; see Winkler (1995) for a more detailed explanation. A key difference between the first- and second-stage parameters was the reduced emphasis on requiring last names to agree in the second stage. This makes intuitive sense, since second-stage matching was within a given household.

The household was the only blocking criterion used at the second stage of matching. Sample records were allowed to link to only one census record within the household. As a consequence, this limited the ability to identify within-household duplicate links. Each link had an overall match score based on the second-stage matching.

The set of linked records from the second-stage matching and the links to group quarters enumerations from the first stage consisted of both duplicate enumerations and person records with common characteristics. Two modeling approaches were used to estimate the probability that the linked records were duplicates. One approach used the results of the statistical matching and relied on the strength of multiple links within the household to indicate

person duplication. The second relied on an exact match of the census to itself and the distribution of births, names, and population size to indicate whether the individual link was a duplicate. These two approaches were referred to as the statistical match modeling and the exact match modeling, respectively. These two approaches were combined so that each sample case with a link to a census enumeration had an estimated probability of being a duplicate.

The statistical match modeling was used when two or more duplicate links were found between housing units in the second stage. After the second-stage matching, each duplicate link between a sample household and a census household had an overall match score. So, for each sample household, a set of match scores was observed. For any resulting set of match scores, a probability of not observing this set of match scores was estimated; see the attachment for details. The higher this probability, the more likely it was that the set of linked records in the household were duplicates. The estimate of the probability of not observing this set of match scores assumed independence of the individual match scores within each household. This assumption was based on using the EM algorithm to determine the second-stage matching parameters. The probability of observing the individual match scores was estimated from the empirical distribution of individual match scores resulting from the second-stage matching. Further, this measure accounted for the number of times that a unique sample household was matched to different census households within a given level of geography. The probability of not observing this set of match scores was translated into a statistical match duplicate probability of 0 or 1 based on critical values that varied by level of geography.

The exact match modeling relied on an exact match of the census to itself. The methodology accounted for the overall distribution of births, frequency of names, and population size in a specific geographic area. Duplicate probabilities were computed separately by the geographical distance of the links. Further, duplicate links were modeled separately by how common the last name was, as well as for Hispanic names.

The two approaches were combined to assign an estimated probability that the linked records were duplicates. The duplicate probabilities for the links to group quarters in the first stage and for one-person household links came from the exact match modeling. For all other links, the duplicate probability was the larger of the two model estimates. For nonexact matches, this was always the statistical match modeling estimate. For exact matches, adjustments were made to account for the integration of these two methods.

Based on the results of this matching and modeling, an overall estimate of census duplicates was derived from the E-sample links. Further, for each Full E- and P-sample person who linked outside the A.C.E. search area, these results provided the probability that the linked records were in fact the same person. These duplicate probabilities were used in the A.C.E. Revision II estimates.

MATCHING ALGORITHM

Efforts to increase matching efficiency over the ESCAP II duplicate study included implementing statistical matching of persons at the first stage and the use of more discriminating matching parameters at the second stage.

Inputs

Both the Full E and P samples were matched to the census records. The E-sample records reflected any updates made by the clerical staff during the A.C.E. matching operation when the census characteristics were incorrectly transcribed or scanned. The P sample included all nonmovers, outmovers, and inmovers. The same matching algorithm was used for the Full E and P samples.

The census files consisted of data-defined person records for both the household and group quarters populations. Both the reinstated and deleted records from the Housing Unit Duplication Operation described in Nash (2000) were included in the matching, so these links could be reflected in the A.C.E. Revision II estimates.

First Stage: Person-Level Matching

The first stage was a statistical match of the Full E and P samples to the census. This was a national match where each Full sample case was compared with census records across the nation to assess how well the matching variables agreed.

The matching variables were first name, last name, middle initial, month of birth, day of birth, and computed age. The matching variables and parameters are given in Table 5-1. The agreement weight and the disagreement weight are the matching parameters of each variable. Standard matching parameters were used at the first stage. Together, the agreement and disagreement parameters determined the match score for each variable. For example, the full agreement score for first name was 2.1972, whereas the full disagreement score was -2.1972. The sum of the variable match scores was the total match score. A total match score of 9.4006 indicated full agreement on all variables; a score of -9.4006, on the other hand, indicated full disagreement.
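The rule for combining the two model estimates can be written as a small function. This sketch rests on three assumptions. The link-type labels and the function name are hypothetical. The statistical match estimate is taken to be 0 or 1, as described above, and absent when fewer than two links were found between the housing units. The adjustments made for exact matches are not specified in the text and are omitted.

```python
from typing import Optional


def combined_duplicate_probability(
    link_type: str, exact_prob: float, stat_prob: Optional[float]
) -> float:
    """Combine the exact match and statistical match model estimates.

    link_type: 'group_quarters' (first-stage link to a group quarters
    record), 'one_person' (one-person household link), or 'household'
    (hypothetical labels).
    exact_prob: duplicate probability from the exact match modeling.
    stat_prob: 0 or 1 from the statistical match modeling, or None when
    that model was not applied.
    """
    if link_type in ("group_quarters", "one_person") or stat_prob is None:
        return exact_prob
    # The adjustment made for exact matches is not specified in the text
    # and is not reproduced here.
    return max(exact_prob, stat_prob)


print(combined_duplicate_probability("household", 0.35, 1.0))
print(combined_duplicate_probability("group_quarters", 0.35, None))
```

Because the statistical match estimate is 0 or 1, taking the maximum means a household-level link flagged by that model receives a duplicate probability of 1. Otherwise it falls back to the exact match estimate.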


Table 5-1. First-Stage Matching Parameters

Matching        Type of       Agreement    Disagreement   Agreement score   Disagreement score
variable        comparison    weight (m)   weight (u)     ln(m/u)           ln((1-m)/(1-u))
First name      String (uo)   0.9          0.1             2.1972           -2.1972
Last name       String (uo)   0.9          0.1             2.1972           -2.1972
Middle initial  Exact         0.7          0.3             0.8473           -0.8473
Month of birth  Exact         0.8          0.2             1.3863           -1.3863
Day of birth    Exact         0.8          0.2             1.3863           -1.3863
Computed age    Age (p)       0.8          0.2             1.3863           -1.3863
Total                                                      9.4006           -9.4006

The type of comparison indicated the statistical matching method for comparing the variables. For example, the string comparator was used for first name and last name. This method addressed typographical errors in names: "Tim" and "Tum" can yield a positive agreement score, whereas an exact match algorithm would have treated them as a disagreement. For age, the values could differ by up to one year and still receive a full agreement score on computed age.

The Statistical Research Division matching software called BigMatch, documented in Yancey (2002), was used in the first stage. This software allowed a sample record to link to more than one census record. This capability was important, since there could be more than two enumerations of the same person in the census.

Four blocking criteria were used. Blocking restricted the comparisons of records to only those that exactly agreed on certain values. Most records that did not agree on the values below are probably not duplicates. The blocking criteria were:

  • First name, last name
  • First name, first initial of last name, age groupings (0-9, 10-19, 20-29, etc.)
  • Last name, first initial of first name, age groupings (0-9, 10-19, 20-29, etc.)
  • First initial of first name, first initial of last name, month of birth, day of birth

All possible links within each set of blocking criteria were compared. For each comparison, the variable match scores and the total match score were computed. The first-stage matching decision rules were as follows. First, a match must have had at least four of the match variables in full agreement. This meant that four of the variables had to have a match score equal to the agreement match score in Table 5-1. The one exception was the middle initial. When the middle initial was blank, it was considered to be in full agreement in this study, since the middle initial was often missing on the sample and census records. In this case, the middle initial score was zero. Second, the total match score had to be 4.7 or greater. This minimum score was about half the total score for full agreement of all matching variables.

Table 5-2 shows the distribution of A.C.E. links within clusters that were identified, by the number of matching variables in full agreement. There were a total of 10,559 duplicate links identified by the A.C.E. clerical staff that agreed on the first letter of the first and last name. The table shows the number of identified A.C.E. duplicates as the number of matching variables in full agreement decreased. The table also displays the total number of links that were identified. The percent of A.C.E. links in each row of the table decreases as the number of matching variables in full agreement decreases.

By requiring at least four matching variables to be in full agreement, 68.4 percent of these A.C.E. duplicates were identified. On the other hand, when only four of the six variables fully agreed, only 30.4 percent of the total links identified by this criterion were A.C.E. Revision II duplicates.

Note that it was tempting to require that only three variables be in full agreement, since this would increase the number of A.C.E. duplicates by 20 percent. However, this change would substantially increase the number of false matches.

Table 5-3 shows that introducing a minimum total score greatly increased the density of A.C.E. links identified. For example, among links with exactly four variables in full agreement, the share that were A.C.E. links rose from 30.4 percent to 62.9 percent. Note that some A.C.E. duplicate links were dropped by using this criterion. This was a consequence of applying rules that reduced the false link rate.

Second Stage: Household-Level Matching

The second stage of matching was restricted to the household population. The person links from the first stage established a link between two housing units. The second stage was a statistical match of the household members from the two housing units. A sample household was included in the second stage multiple times if the sample household had persons with links to multiple census households in the first stage. This was the same approach used for the ESCAP II duplicate study.
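The four blocking criteria and the two first-stage decision rules can be sketched as follows. This is a simplified illustration, not BigMatch. Names are compared exactly here, whereas the study used a string comparator that gives partial credit, so borderline outcomes can differ. A blank middle initial on either record is assumed to trigger the zero-score exception. The record layout and all function names are hypothetical. The scores are the Table 5-1 full agreement values, and each disagreement score is their negative.

```python
from collections import defaultdict

# Full-agreement scores from Table 5-1; first-stage disagreement scores are
# the negatives of these.
AGREE = {"first": 2.1972, "last": 2.1972, "middle": 0.8473,
         "month": 1.3863, "day": 1.3863, "age": 1.3863}
MIN_FULL_AGREEMENTS = 4
MIN_TOTAL_SCORE = 4.7


def blocking_keys(rec):
    """One key per blocking criterion; the tag keeps criteria separate."""
    decade = rec["age"] // 10  # age groupings 0-9, 10-19, 20-29, ...
    return [
        ("first+last", rec["first"], rec["last"]),
        ("first+last initial+age", rec["first"], rec["last"][:1], decade),
        ("last+first initial+age", rec["last"], rec["first"][:1], decade),
        ("initials+birth date", rec["first"][:1], rec["last"][:1],
         rec["month"], rec["day"]),
    ]


def candidate_pairs(sample, census):
    """(sample index, census index) pairs sharing at least one blocking key."""
    index = defaultdict(list)
    for j, rec in enumerate(census):
        for key in blocking_keys(rec):
            index[key].append(j)
    return {(i, j) for i, rec in enumerate(sample)
            for key in blocking_keys(rec) for j in index.get(key, [])}


def first_stage_decision(s, c):
    """Apply both first-stage rules; return (is_link, total_score)."""
    total, full = 0.0, 0
    for var in ("first", "last", "month", "day"):  # exact comparison only
        agrees = s[var] == c[var]
        total += AGREE[var] if agrees else -AGREE[var]
        full += int(agrees)
    age_agrees = abs(s["age"] - c["age"]) <= 1  # +/- one year is full agreement
    total += AGREE["age"] if age_agrees else -AGREE["age"]
    full += int(age_agrees)
    if not s["middle"] or not c["middle"]:
        full += 1  # blank middle initial: full agreement with a score of zero
    elif s["middle"] == c["middle"]:
        total += AGREE["middle"]
        full += 1
    else:
        total -= AGREE["middle"]
    return full >= MIN_FULL_AGREEMENTS and total >= MIN_TOTAL_SCORE, total


sample = [dict(first="TIM", last="SMITH", middle="", month=3, day=14, age=40)]
census = [dict(first="TIM", last="SMITH", middle="J", month=3, day=14, age=41),
          dict(first="TOM", last="SMITH", middle="", month=3, day=15, age=40)]
for i, j in sorted(candidate_pairs(sample, census)):
    print(j, first_stage_decision(sample[i], census[j]))
```

In this made-up example, both census records survive blocking: the second one shares the last-name, first-initial, and age-group key. Only the first passes both rules. The second has four variables in full agreement, but its total score of about 1.39 falls below 4.7 under this exact-comparison simplification.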

5-4 Section IIChapter 5 Further Study of Person Duplication in Census 2000 U.S. Census Bureau, Census 2000

Table 5-2. Distribution of Links Within A.C.E. Clusters by Full Agreement
[Percentages may not add due to rounding]

Number of variables in full agreement | A.C.E. links: Count | A.C.E. links: Percent | A.C.E. links: Cumulative percent | Total links | Percent of A.C.E. links in row
6 | 2,348 | 22.2 | 22.2 | 2,451 | 95.8
5 | 2,895 | 27.4 | 49.7 | 3,983 | 72.7
4 | 1,983 | 18.8 | 68.4 | 6,520 | 30.4
3 | 2,211 | 20.9 | 89.4 | 40,891 | 5.4
2 | 954 | 9.0 | 98.4 | 180,324 | 0.5
1 | 164 | 1.6 | 99.9 | 601,370 | <0.1
0 | 4 | <0.1 | 100 | 350,987 | <0.1
Total | 10,559 | 100 | 100 | 1,186,526 | 0.9

Table 5-3. Distribution of A.C.E. and Total Links Within A.C.E. Clusters
[Only include links with score ≥ 4.7]

Number of variables in full agreement | A.C.E. links | Total links | Percent of A.C.E. links in row
6 | 2,348 | 2,451 | 95.8
5 | 2,868 | 3,763 | 76.2
4 | 1,680 | 2,670 | 62.9
3 | 0 | 0 | n/a
2 | 0 | 0 | n/a
1 | 0 | 0 | n/a
0 | 0 | 0 | n/a
Total | 6,896 | 8,884 | 77.6

Table 5-4. Second-Stage Matching Parameters

Matching variable | Type of comparison | Agreement weight (m) | Disagreement weight (u) | Agreement match score, ln(m/u) | Disagreement match score, ln((1-m)/(1-u))
First name | String (uo) | 0.9500 | 0.0125 | 4.3307 | -2.9832
Last name | String (uo) | 0.9600 | 0.5700 | 0.5213 | -2.3749
Middle initial | Exact | 0.0840 | 0.0220 | 1.3398 | -0.0655
Month of birth | Exact | 0.6000 | 0.0600 | 2.3026 | -0.8544
Day of birth | Exact | 0.3000 | 0.0200 | 2.7081 | -0.3365
Computed age | Age (p) | 0.9750 | 0.1325 | 1.9959 | -3.5467
Total | | | | 13.1984 | -4.1948

The matching variables were the same as in the first stage: first name, last name, middle initial, month of birth, day of birth, and age. Table 5-4 gives the matching parameters; its entries have the same meaning as those for the first-stage parameters in Table 5-1. Using a subset of the first-stage links, the second-stage matching parameters were derived using the EM algorithm as described in Winkler (1995). These parameters were anticipated to be more discriminating than the set used for the ESCAP II study. Since the first-stage matching established a link between two housing units, first name had more discriminating power than last name in the second stage, because members of the same household often share a last name. When first name fully agreed, it contributed 4.3307 toward the total score, while last name contributed only 0.5213 when it was in full agreement. Further, month of birth and day of birth were more powerful than computed age. This was expected, since adults in a housing unit often have similar ages but not the same month and day of birth.

The Statistical Research Division Record Linkage software described in Winkler (1999) was used for the second stage. Each sample record was linked to only one census record within the household (one-to-one matching). There were no additional blocking criteria beyond household; all possible links within households were compared. Each link had a total match score ranging from -4.1948 to 13.1984. This second-stage match score was used for the modeling.
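The match scores in Table 5-4 have the form of Fellegi-Sunter log-likelihood ratio weights: a variable in full agreement contributes ln(m/u), and a variable in disagreement contributes ln((1-m)/(1-u)). The Python check below recomputes each row from its m and u values. It is only an arithmetic sketch: it treats agreement as all-or-nothing and ignores any partial-agreement scoring that the string and age comparison types may produce, and `total_score` is a hypothetical helper. The 13.1984 total in the table equals the sum of the rounded per-variable entries.

```python
import math

# (m, u) parameters for each second-stage matching variable, from Table 5-4.
PARAMS = {
    "first name": (0.9500, 0.0125),
    "last name": (0.9600, 0.5700),
    "middle initial": (0.0840, 0.0220),
    "month of birth": (0.6000, 0.0600),
    "day of birth": (0.3000, 0.0200),
    "computed age": (0.9750, 0.1325),
}


def agreement_score(m, u):
    return math.log(m / u)


def disagreement_score(m, u):
    return math.log((1 - m) / (1 - u))


def total_score(agrees):
    """Hypothetical helper: sum scores for {variable: True if in full agreement}."""
    return sum(
        agreement_score(*PARAMS[v]) if ok else disagreement_score(*PARAMS[v])
        for v, ok in agrees.items()
    )


# Per-variable scores rounded to four decimals, as printed in Table 5-4.
SCORES = {
    v: (round(agreement_score(m, u), 4), round(disagreement_score(m, u), 4))
    for v, (m, u) in PARAMS.items()
}
for v, (agree, disagree) in SCORES.items():
    print(f"{v:15s} {agree:8.4f} {disagree:8.4f}")

# Sum of the rounded agreement scores (the table's maximum total score).
print(f"{sum(a for a, _ in SCORES.values()):.4f}")
```

The large gap between the first-name and last-name agreement scores comes entirely from u: last names agree far more often among nonmatching pairs within linked households.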


All links with a second-stage match score greater than 0.3419 were retained as input to the modeling.

Reverse Name Matching

Occasionally, first and last names were captured in reverse order on the data files: the first name was in the last-name field and the last name was in the first-name field. When the data were in reverse order on one file but not the other, these duplicate links were difficult to identify, since the variable match scores for first and last name disagreed in both the first and second stages. To identify these cases, the first- and last-name fields were reversed and the records were matched to the census files a second time. The duplicate links from both runs, with names in the usual order and in reverse order, were input to the modeling. When both runs identified the same duplicate link, the higher of the two match scores was retained and used in the modeling.

MODELING LINKS

Since the goal of this study was to provide duplicate information for calculating A.C.E. Revision II estimates, it was important to provide a measure of confidence that could be incorporated into the estimation methodology. Consequently, modeling efforts focused on methods for estimating the probability that two linked records were duplicate enumerations. An advantage of duplicate probabilities over the Poisson model weights used in ESCAP II was that all duplicate links outside the A.C.E. search area could be reflected in the A.C.E. Revision II estimates. The statistical and exact match modeling approaches were combined to yield an estimated duplicate probability for the linked records from the statistical matching of the E and P samples to the census.

Statistical Match Probability

The statistical match modeling was used when the second-stage matching resulted in two or more duplicate links. After the second-stage matching, each duplicate link between a sample household and a census household had an overall match score. So, for each sample housing unit to census housing unit match, a set of match scores was observed. For any resulting set of match scores, a probability of not observing this set of match scores, Pr(NT), was estimated for each link within the sample household. The higher this probability, the more likely that the set of linked records in the household were duplicates. Since a sample housing unit could have been matched to more than one census housing unit during the second stage, there were multiple sets of duplicate links and match scores for each sample housing unit. Each set of duplicate links for a sample housing unit was assigned a separate Pr(NT), since the match scores differ for each matching attempt. Further, the Pr(NT) for each set of duplicate links for a sample housing unit varied with the geographic distance of the duplicate links. As shown in the attachment, Pr(NT) was estimated by

$$\Pr(NT) = \left[ 1 - \prod_{d=1}^{p} \Pr(X_d \ge x_d) \right]^{n}$$

where

Pr(X_d ≥ x_d) was the probability of getting a total match score X_d greater than or equal to x_d,
p was the number of duplicate links in the sample household, and
n was the number of census housing units the sample household was matched with in the second stage within a given geographic area.

The estimate of the probability of not observing this set of match scores assumed independence of the individual match scores within each household. This assumption was based on using the EM algorithm to determine the second-stage matching parameters. The probability of observing the individual match scores was estimated from the empirical distribution of individual match scores resulting from the entire second-stage matching. Further, this measure accounted for the number of times that a unique sample household was matched to different census households within a given level of geography. The geographic levels were block, tract, same county (outside tract), same state (outside county), and different state.

For the E sample, this analysis was done at the E-sample household level. For the P sample, a household consisted of any combination of nonmovers, outmovers, and inmovers. To account for this, the duplicate links were analyzed separately by mover status when looking at patterns of match scores.

The probability of not observing this set of match scores was translated into a statistical match duplicate probability of 0 or 1 based on critical values that varied by level of geography. Table 5-5 shows the minimum value of Pr(NT) for assigning a statistical match duplicate probability of 1 for the E and P samples.

Table 5-5. Minimum Value for Assigning Statistical Match Probability
[Minimum Pr(NT)]

Geographic distance of linked records | E sample | P sample
Same block | 0.00 | 0.25
Same tract (different block) | 0.70 | 0.35
Same county (different tract) | 0.97 | 0.60
Same state (different county) | 0.97 | 0.60
Different state | 0.97 | 0.60
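The Pr(NT) formula and the Table 5-5 thresholds can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the empirical score distribution below is a small hypothetical list (in the study it came from the entire second-stage matching), the geography labels are hypothetical shorthand for the Table 5-5 rows, and the separate treatment of P-sample links by mover status is not modeled.

```python
from bisect import bisect_left
from math import prod

# Minimum Pr(NT) for a statistical match duplicate probability of 1 (Table 5-5),
# stored as (E sample, P sample) thresholds by geographic distance of the linked records.
MIN_PR_NT = {
    "same block": (0.00, 0.25),
    "same tract": (0.70, 0.35),       # same tract, different block
    "same county": (0.97, 0.60),      # same county, different tract
    "same state": (0.97, 0.60),       # same state, different county
    "different state": (0.97, 0.60),
}


def empirical_survival(sorted_scores, x):
    """Empirical Pr(X >= x) from a sorted list of second-stage match scores."""
    n_total = len(sorted_scores)
    return (n_total - bisect_left(sorted_scores, x)) / n_total


def pr_nt(observed, sorted_scores, n):
    """Pr(NT) = [1 - prod_d Pr(X_d >= x_d)]^n, assuming independent scores."""
    chance = prod(empirical_survival(sorted_scores, x) for x in observed)
    return (1.0 - chance) ** n


def statistical_match_probability(pr_nt_value, geography, sample="E"):
    """Return 1 if Pr(NT) meets the Table 5-5 minimum for the sample, else 0."""
    e_min, p_min = MIN_PR_NT[geography]
    return 1 if pr_nt_value >= (e_min if sample == "E" else p_min) else 0


# Hypothetical empirical distribution of second-stage match scores.
ALL_SCORES = sorted([0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 9.0, 12.0])

# Hypothetical household with p = 2 duplicate links scoring 9.0 and 8.0.
for n in (1, 3):
    value = pr_nt([9.0, 8.0], ALL_SCORES, n)
    print(n, round(value, 4),
          statistical_match_probability(value, "same tract"),
          statistical_match_probability(value, "different state"))
```

Raising n lowers Pr(NT), because scores this high are more likely to turn up by chance when the household is compared with more census housing units; the stricter thresholds for distant links then make a probability of 1 harder to reach.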

Duplicate links with a Pr(NT) greater than or equal to the minimum value in Table 5-5 were assigned a statistical match duplicate probability of 1. All other links were assigned a statistical match duplicate probability of 0.

Exact Match Probability

Given exact matching of the census to itself, duplicate probabilities were assigned to linked records by taking into account the overall distribution of births, frequency of names, and population size in a specific geographic area. Duplicate probabilities were computed separately for links within county, links within state but in a different county, and links in different states. Further, duplicate links were modeled separately by how common the last name was, as well as for Hispanic names. Fay (2002b) gives the model and preliminary results. The following are excerpts from Fay (2002b) to give the reader a general idea of the approach.

"Like the Poisson model, the new approach uses frequencies of occurrences of combinations of first and last name. The result is an estimated probability of duplication for most matches, except for matches of frequently occurring names, where the probability of duplication is low and difficult to estimate with high relative precision.

"This work results in a series of probability models, with parameters that can be estimated statistically from observed census data. A core model characterizes probabilities of duplication, triple enumeration (apparent enumeration of the same person three times), and other forms of multiple enumeration within a given geographic area. The other models account for duplication across domain.

"The first part of the core model expresses the probability of coincidentally sharing a birthday. A second set of expressions, a model for census duplication, is built on top of the model for coincidental sharing of date of birth. The core model combines the two models to account for observed patterns of exact computer matches of census enumerations. The core model provides a basis to estimate a probability that a given computer match links the same person instead of two persons coincidentally sharing a birthday. An approximate argument allows the core model to be extended to nested geographic categories, such as (1) counties, (2) other counties within state, and (3) other states."

The result of the exact match model is a duplicate probability greater than or equal to zero, but less than one, for census records that agree exactly on first name, last name, month and day of birth, and two-year age intervals.

Combining the Two Models

The two approaches were combined to give one duplicate probability to each E- and P-sample duplicate link. Table 5-6 summarizes the results of combining the two models. The duplicate probabilities for links to group quarters in the first stage and for one-person household links were from the exact match modeling. For all other links, the duplicate probability was the larger of the two model estimates, as indicated by the shaded cells in Table 5-6. For nonexact matches, the duplicate probability assignment was always based on the statistical match modeling.

For exact matches in sample households with two or more persons, adjustments were made to account for the integration of these two methods. The exact match probabilities were determined conditionally, requiring a downward adjustment of the exact match probabilities for the links to which the statistical match modeling assigned a probability of zero. The amount of the downward adjustment was based on the upward adjustment made when using the statistical match probability of one instead of the exact match probability.

Table 5-6. Combining the Two Modeling Results

Type of link | Size of sample HU | Type of match | Statistical match probability | Exact match probability
Housing Unit | 1 | Exact | - | [0, 1)
Housing Unit | 1 | Nonexact | - | -
Housing Unit | 2+ | Exact | 1 | [0, 1)
Housing Unit | 2+ | Exact | 0 | [0, 1)
Housing Unit | 2+ | Nonexact | 1 | -
Housing Unit | 2+ | Nonexact | 0 | -
Group Quarter | | Exact | - | [0, 1)
Group Quarter | | Nonexact | - | -
- Modeling did not assign a value.

The results of this modeling provided, for each Full E- and P-sample person who linked to a census person outside the A.C.E. search area, the probability that they were in fact the same person. These probabilities, referred to as pt in Chapter 6, were used to obtain A.C.E. Revision II estimates.

Reinstated and Deleted Census Records

For the exact match modeling, separate probabilities were computed based on population distributions with and without the reinstated and deleted records. One computed duplicate probability allowed sample records to link to reinstated and deleted census records, while a second duplicate probability did not allow links to reinstated and deleted records. Under this second scenario, any links to reinstated or deleted records were assigned a duplicate probability of zero. The duplicate probabilities used in the A.C.E. Revision II estimation were those that allowed links to reinstated and deleted census records.

ASSESSMENT OF LINKS

Throughout the development of the Further Study of Person Duplication in Census 2000, the A.C.E. duplicate links found during production were the benchmark used to
gauge whether the matching algorithm did a good job of finding true duplicates and minimizing the number of false links found within the block cluster. This study and the ESCAP II duplicate study documented in Fay (2001, 2002) used the same method for estimating efficiency for the E sample. Basically, the method estimated the effectiveness of identifying A.C.E. clerical duplicates within the A.C.E. sample area and accounted for duplicate links to reinstated and deleted records that were out of scope for A.C.E. Instead of producing one overall efficiency measure, several measures were computed for various levels of detail, including size of sample household and number of links between the units.

FORMING ESTIMATES OF DUPLICATES

Estimates of census duplicates were formed by summing the product of the sampling weight for the E-sample person, the duplicate probability, and the multiplicity factor. Since a sample of the census (the E sample) was matched to the census, a naive approach would treat each duplicate link of A to B as one duplicate. However, had a different sample been drawn, it could have contained the B to A link, so the same pair could be found from either end. Applying a multiplicity factor of 1/2 in this simple case ensured that the estimate for this example was only one duplicate. See Mule (2002b) for more details on the computation of the multiplicity factor.
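A minimal Python sketch of how a duplicate probability might be chosen for a link and then used in the weighted estimate of duplicates. It follows only the rules stated in the text and Table 5-6: the exact match probability for group quarters and one-person housing units, the statistical match probability for nonexact matches, and the larger of the two otherwise. It is a simplification, not the study's procedure: it omits the conditional downward adjustment of exact match probabilities, treats cells where the modeling assigned no value as zero, and takes the multiplicity factor as an input rather than computing it as in Mule (2002b). The `Link` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    weight: float               # E-sample sampling weight
    group_quarters: bool        # True for a group quarters link, False for a housing unit
    hu_size: int                # persons in the sample housing unit
    exact: bool                 # exact agreement on name, month/day of birth, age interval
    p_stat: Optional[int]       # statistical match probability (0 or 1), None if not assigned
    p_exact: Optional[float]    # exact match probability in [0, 1), None if not assigned
    multiplicity: float = 0.5   # e.g., 1/2 when the pair could be sampled from either end


def duplicate_probability(link):
    """Combine the two model results following the rules summarized in Table 5-6."""
    stat = float(link.p_stat) if link.p_stat is not None else 0.0
    exact = link.p_exact if link.p_exact is not None else 0.0
    if link.group_quarters or link.hu_size == 1:
        return exact          # exact match modeling only ("-" treated as 0)
    if not link.exact:
        return stat           # nonexact: statistical match modeling only
    return max(stat, exact)   # otherwise the larger of the two (no downward adjustment)


def estimated_duplicates(links):
    """Sum of sampling weight x duplicate probability x multiplicity factor."""
    return sum(l.weight * duplicate_probability(l) * l.multiplicity for l in links)


# The A-B example from the text: if both records were in sample, the pair would be
# found twice (A to B and B to A); a factor of 1/2 on each link counts it once.
pair = [Link(1.0, False, 2, True, 1, 0.7), Link(1.0, False, 2, True, 1, 0.7)]
print(estimated_duplicates(pair))

# Hypothetical links with sampling weights.
links = [
    Link(100.0, False, 3, True, 1, 0.6),    # 2+ person HU, exact: max(1, 0.6) = 1
    Link(80.0, False, 1, True, None, 0.9),  # one-person HU: exact probability 0.9
    Link(50.0, False, 2, False, 0, None),   # nonexact: statistical probability 0
]
print(estimated_duplicates(links))
```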


Attachment.

Probability of Not Observing a Set of Match Scores

Each E-sample household had a set of duplicate links to a particular census household. Each duplicate link had a corresponding overall match score from the second-stage matching, resulting in a pattern of match scores for the sample household. The task was to assess whether this observed set of match scores occurred because the links were duplicates or because the records had characteristics in common but were different people.

Objective: To estimate the probability of not observing this set of match scores or better for each E-sample household.

The hypothesis is that the higher the probability of not observing this set of match scores or better, the more likely the links represent duplicate enumerations.

Suppose a particular E-sample household has p ≥ 2 duplicate links with observed match scores x_1, x_2, ..., x_p. Define Pr(NT) to be the probability of not observing the set of match scores or better, (X_1 ≥ x_1, X_2 ≥ x_2, ..., X_p ≥ x_p). This probability can be expressed as

$$\Pr(NT) = \left[ 1 - \Pr(X_1 \ge x_1, X_2 \ge x_2, \ldots, X_p \ge x_p) \right]^{n} \qquad (1)$$

where n was the number of different census housing units that the E-sample housing unit was linked to during the second-stage match. This calculation accounted for the fact that the more times the E-sample housing unit matched to different housing units, the greater the chance of obtaining this outcome by coincidence, and hence the smaller Pr(NT).

Individual match scores X_1, X_2, ..., X_p were assumed to be independent, since the second-stage matching parameters gave more emphasis to first name than to last name. Further, the parameters gave more emphasis to month and day of birth than to age. Under the independence assumption, (1) can be written as follows:

$$\Pr(NT) = \left[ 1 - \prod_{d=1}^{p} \Pr(X_d \ge x_d) \right]^{n} \qquad (2)$$

The probability of observing a match score X_d greater than or equal to x_d, Pr(X_d ≥ x_d), was obtained from the empirical distribution of second-stage match scores. The probability in (2) was used for the P-sample households as well.


Chapter 6.

A.C.E. Revision II Estimation

The A.C.E. Revision II Dual System Estimate (DSE) methodology was developed with the following objectives in mind:

  • Integration of the corrections for measurement errors, so that measurement errors identified by both the evaluations and the duplicate study are not over-corrected.
  • Separate estimation for both E- and P-sample persons based on whether or not they linked to a census enumeration outside the search area.
  • Flexibility in the post-stratification design, because the factors that affect correct enumeration (as measured by the E sample) were not necessarily the same as those that affect coverage (as measured by the P sample).
  • Adjustment for correlation bias.

This chapter presents how this additional information was incorporated into the DSE for A.C.E. Revision II estimates. The reader is assumed to be familiar with the basic dual system model and how it was used to produce the March 2001 A.C.E. estimates. See Haines (2001) for a detailed description of this methodology and Davis (2001) for the original dual system estimation results. This chapter describes the approach to A.C.E. Revision II dual system estimation. The chapter discusses estimation of the term accounting for persons in the census who are not in the E sample. The correct enumeration rate from the E-sample data is described. Then, the estimation of the match rate from the P-sample data is addressed. The census, E-sample, and P-sample data are combined to form a single DSE formula. Next, the post-stratification variables used for the A.C.E. Revision II Full and Revision Samples are defined. The chapter then discusses adjustment for correlation bias using demographic analysis sex ratios and concludes with a discussion of synthetic estimation.

DUAL SYSTEM ESTIMATION

The basic form of the dual system estimate (DSE) is:

$$DSE = (Cen' - II') \times \frac{CE}{E} \times \frac{P}{M} \qquad (1)$$

where

Cen' = the census count, excludes late adds
II' = non-data-defined census records, excludes late adds
LA = late additions to the census, i.e., records included too late for A.C.E. processing; primarily reinstated cases from the housing unit duplication operation
CE = E-sample weighted estimate of census correct enumerations
E = E-sample weighted estimate of census total enumerations (includes insufficient information for matching and followup cases; excludes non-data-defined cases and late adds)
P = P-sample weighted estimate of total persons
M = P-sample weighted estimate of matches to census persons

DSEs were computed separately within post-strata. A post-stratum is a group of people defined by demographic and geographic characteristics who are assumed to have the same probabilities of inclusion in the census. Post-strata can also be defined to have equal probabilities of correct enumeration in the census.

The DSE in (1) can be written as a function of the final census count, Cen, which includes late adds, and the following three rates:

$$DSE = Cen \times r_{DD} \times \frac{r_{CE}}{r_{M}} \qquad (2)$$

where

rDD = (Cen' - II') / Cen is the census data-defined rate. The numerator excludes late adds, but the denominator includes late adds.
rCE = CE / E is the E-sample correct enumeration rate.
rM = M / P is the P-sample match rate.

The three rates can be interpreted as estimates of probabilities. Thus, within a post-stratum,

  • rDD estimates the probability that a census person record has sufficient (and timely) information for inclusion in A.C.E. processing,
  • rCE estimates the probability that an E-sample universe person is a correct enumeration, and
  • rM estimates the probability that a person in the P sample is included in the census.

A.C.E. Revision II Estimation Section IIChapter 6 6-1 U.S. Census Bureau, Census 2000
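Equations (1) and (2) describe the same estimator, because Cen x rDD = Cen' - II'. The Python sketch below checks this numerically for a single post-stratum, computing equation (2) as three steps: multiply by rDD, multiply by rCE, and divide by rM. The post-stratum totals are hypothetical, and the code assumes Cen = Cen' + LA, which follows from the definitions above (Cen includes late adds, Cen' excludes them).

```python
from dataclasses import dataclass


@dataclass
class PostStratum:
    cen_prime: float  # Cen': census count, excluding late adds
    ii_prime: float   # II': non-data-defined census records, excluding late adds
    la: float         # LA: late additions to the census
    ce: float         # CE: E-sample weighted correct enumerations
    e: float          # E: E-sample weighted total enumerations
    p: float          # P: P-sample weighted total persons
    m: float          # M: P-sample weighted matches to census persons


def dse_eq1(s):
    """Equation (1): (Cen' - II') x CE/E x P/M."""
    return (s.cen_prime - s.ii_prime) * (s.ce / s.e) * (s.p / s.m)


def dse_eq2(s):
    """Equation (2): Cen x rDD x rCE / rM."""
    cen = s.cen_prime + s.la                  # final census count includes late adds
    r_dd = (s.cen_prime - s.ii_prime) / cen   # data-defined rate
    r_ce = s.ce / s.e                         # correct enumeration rate
    r_m = s.m / s.p                           # match rate
    data_defined = cen * r_dd                 # census persons eligible for the E sample
    correct = data_defined * r_ce             # reduced by the correct enumeration rate
    return correct / r_m                      # increased by dividing by the match rate


# Hypothetical post-stratum totals.
s = PostStratum(cen_prime=10_000, ii_prime=500, la=200, ce=9_000, e=9_500, p=9_800, m=9_100)
print(round(dse_eq1(s), 1), round(dse_eq2(s), 1))
```

Because Cen cancels in equation (2), the late adds affect the estimate only through the data-defined rate; the two functions agree up to floating-point rounding.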

The interpretation of rM may be less obvious than the other two; it is the sample-weighted proportion of P-sample persons who were also found in the census. The general independence assumption underlying DSE is that either the census or the A.C.E. inclusion probabilities are the same for all persons within the post-stratum (it is not required that both be). Assuming causal independence, the match rate rM estimates the probability of census inclusion for the post-stratum.

Equation (2) also gives an interpretation of how the DSE constructs population estimates within a post-stratum:

  • Multiply the census count (Cen) by the data-defined rate, rDD, to estimate the number of census persons who are data-defined and, therefore, eligible for inclusion in the E sample.
  • Reduce this product by multiplying it by the estimated probability of correct enumeration, rCE.
  • Increase this result by dividing it by the estimated probability of census inclusion, rM.

The primary tasks in developing DSEs at the post-stratum level are the estimation of the three rates involved. The estimate rDD is straightforward because it is based on 100-percent census tabulations. More detail is provided for the estimates rCE and rM, since they are more challenging.

The different estimation tasks can be tackled one term at a time. Basically, the goal is to estimate the numerators and denominators of the terms rCE and rM. Since E, the estimated number of total census data-defined enumerations, is a simple, direct sample-weighted estimate, the challenges relate mostly to developing the estimates CE, P, and M. The estimation challenges for A.C.E. Revision II focus on accounting for: (i) information from the revised coding of the A.C.E. Revision Sample, (ii) information from the A.C.E. Revision II study of census duplicates, and (iii) different post-stratification schemes for the Full E and P samples. The most difficult issue is (ii).

Before proceeding to a detailed discussion of the A.C.E. Revision II DSE components, consider the general nature of the estimator. While the basic DSE shown in equation (1) was applied in the 1990 PES (Hogan, 1993), the March 2001 A.C.E. incorporated the modification called PES-C estimation. See Haines (2001) and Mule (2001b) for details. This DSE had the general form:

$$DSE_C = (Cen' - II') \times \frac{CE}{E} \times \frac{P_{nm} + P_{im}}{M_{nm} + \dfrac{M_{om}}{P_{om}} P_{im}} \qquad (3)$$

where the following quantities are all P-sample weighted estimates for the given post-stratum:

Mnm = estimate of matches to census persons for nonmovers
Mom = estimate of matches to census persons for outmovers
Pnm = estimate of total nonmovers
Pom = estimate of total outmovers
Pim = estimate of total inmovers

Nonmovers, outmovers, and inmovers were defined with reference to their status in the period of time between Census Day (April 1, 2000) and the A.C.E. interview. Nonmovers were those who did not move during this period, outmovers were those who moved out of a sample block during this period, and inmovers were those who moved into a sample block during this period. Equation (3) estimated P-sample matches (M) as the sum of estimated matches among nonmovers (Mnm) and estimated matches among movers. The number of mover matches was estimated as the product of an estimated number of movers (Pim) and an estimate of the mover match rate (Mom / Pom). Thus, P-sample outmovers were used to estimate the mover match rate, while P-sample inmovers were used to estimate the number of movers. This approach implies that Pnm + Pim should be used for the estimated total of P-sample persons (P).

Equation (3) can be further expanded to include post-stratification subscripts. The Full E- and P-sample post-strata are denoted by subscripts i and j, respectively. The census term was calculated for the cross-classification of i and j post-strata, denoted ij. The DSE formula, using procedure C for movers, with different post-strata for the E and P samples is:

$$DSE_{C,ij} = Cen_{ij} \times r_{DD,ij} \times \left[ \frac{CE_i}{E_i} \right] \times \left[ \frac{P_{nm,j} + P_{im,j}}{M_{nm,j} + \dfrac{M_{om,j}}{P_{om,j}} P_{im,j}} \right]$$

ESTIMATION OF rDD

Recall the general form of the DSE in equation (2). This section discusses the estimation of the data-defined rate, or DD-rate.

The DD-rate estimate (rDD) is defined as (Cen' - II') / Cen for a given detailed ij post-stratum, where Cen', II', and Cen are defined from 100-percent census tabulations. At the post-stratum level, Cen x rDD reduces to Cen' - II'.

This suggests that an alternative to computing rDD at a post-stratum level is to compute Cen' - II' for all levels (e.g., demographic post-stratum groups within small geographic areas) for which estimates were to be computed, and then to adjust these quantities by the appropriate rCE / rM factors. This approach may be problematic, especially when applied to very small areas.

The problem with direct computation of Cen' - II' for very small areas can be seen with the following hypothetical example. Suppose a particular small geographic area (e.g.,
a collection of blocks) has a high rate of imputation in the census, say 15.0 percent. Imputation rates will vary geographically, and high rates could result from a number of factors, such as difficulties getting access to housing units in secure communities or difficulties in hiring and retaining census enumerators in a particular area. In this hypothetical example, removing all imputations from the census count for the area by computing Cen' - II' would reduce the census count by 15.0 percent. Subsequent multiplication by the rCE / rM factors and summing the resulting DSEs over post-strata may increase the population estimate from this base, but perhaps by no more than two or three percent (depending on the post-stratum composition of the area). The net synthetic DSE would, thus, be 12.0 or 13.0 percent lower than the census count. While this estimate could make sense if almost all the housing units for which persons were imputed were actually vacant (and this fact were not discovered in the census enumeration), it would not make sense if most of the units were occupied and the high rate of imputation resulted from other factors such as those suggested above. Calculating rDD for post-strata and applying it synthetically avoids such problems in small area estimates, though perhaps incurring some error for larger areas for which the direct tabulation of Cen' - II' would be sensible.

The data-defined rates, rDD, are computed at the detailed post-stratum obtained as the intersection of the E- and P-sample post-strata.

ESTIMATION OF rCE

This section discusses the estimation of the correct enumeration rate, rCE = CE/E. The Full E-sample post-strata are denoted by the subscript i. The Revision E sample has post-strata denoted by i', where i' is based on collapsed post-strata i. This means that the Revision Sample post-strata were obtained by collapsing the Full Sample post-strata i. The correct enumeration rate is written:

$$r_{CE,i} = \frac{CE^{ND}_i \, f_{1,i'} + \widetilde{CE}^{D}_i}{E_i} \qquad (4)$$

Note that the numerator term separates the E-sample enumerations with a duplicate link to a census enumeration outside the A.C.E. search area, as identified in the duplicate study, from those enumerations without a link. As discussed in Chapter 5, the duplicate study used computer-based record linkage techniques to match the Full P and E samples to census enumerations outside the search area. The census enumerations included those enumerations that were added too late to be included in the E sample, as well as those enumerations that were determined to be duplicates and, therefore, were never included in the census.

The term $CE^{ND}_i$ estimates the number of correct enumerations in the Full E sample without duplicate links in post-stratum i. This term includes the probability of not being a duplicate, 1 - pt.

The component $\widetilde{CE}^{D}_i$ represents the estimated number of correct enumerations in the Full E sample with duplicate links in post-stratum i that are retained after unduplication. This term includes the probability of being a duplicate, pt, as well as the conditional probability that an E-sample case is a correct enumeration given that it is a duplicate to another census enumeration outside the A.C.E. search area.

The total weighted number of persons in post-stratum i in the E sample is denoted by $E_i$.

The double-sampling ratio factor $f_{1,i'}$ corrects for measurement error based on the Revision E sample. It is the ratio of an estimate that uses the revised coding (indicated by *) to an estimate that uses the original coding. These adjustments, which are calculated for measurement error post-strata i', are represented by:

$$f_{1,i'} = \frac{CE^{ND*}_{i'} / E^{ND}_{i'}}{CE^{ND}_{i'} / E^{ND}_{i'}} = \frac{CE^{ND*}_{i'}}{CE^{ND}_{i'}}$$

P- and E-sample cases with duplicate links were assigned a nonzero probability of being a duplicate, pt. P- and E-sample cases without duplicate links were assigned a pt value of zero. This probability is usually 0 or 1 for E- and P-sample cases, but some duplicate links have a value in between, indicating less confidence that the link represents the same person. These probabilities are also transferred to the E and P Revision Samples.

Although the duplicate study identified E- and P-sample cases linking to census enumerations outside the A.C.E. search area, this study could not determine which component of the link was the correct one, since no additional data were collected for this purpose. Assuming that the linked person does exist, the goal is to determine which of the two locations is the appropriate place to count the person. Since linked persons may be geographically close or far apart, this has implications for the degree of synthetic error. On the E-sample side, this study does not identify whether the linked E-sample case is the correct enumeration. Thus, it is necessary to estimate the following conditional probability:

zt = the probability that an E-sample case is a correct enumeration given that it is a duplicate to another census enumeration outside the A.C.E. search area.

E-Sample Links

From the duplicate study, an estimate of correct census enumerations can be derived by considering the situation of the linked enumerations, as well as assuming that each link represents one correct enumeration. This assumes, of course, that the link consists of true duplicates. These
assumptions are used to estimate the contribution to correct enumerations from Full E-sample cases with duplicate links, including those originally coded as correct, as well as those originally coded as erroneous. This contribution to correct enumerations is given by the term $\widetilde{CE}^{D}_i$. To estimate this term, the E-sample links are first classified according to the characteristic of the linked situation and the original coding of the E sample. Attachment 1 summarizes this classification and the rules for assigning zts.

The components of equation (4) are defined below.

$$\widetilde{CE}^{D}_i = \sum_{t \in E_i} W_{E,t} \, p_t \, z_t \, PR_{ce,t}$$

is the estimated number of correct enumerations with duplicate links in post-stratum i who were retained after unduplication.

$$CE^{ND}_i = \sum_{t \in E_i} W_{E,t} \, (1 - p_t) \, PR_{ce,t}$$

is the number of correct enumerations without duplicate links in post-stratum i, where the summation is taken over all enumerations in the A.C.E. E sample in post-stratum i.

$W_{E,t}$ is the production A.C.E. sampling weight for E-sample person t.

pt is the probability that person t has a duplicate link outside the search area. This is usually 0 or 1, but could be between these two values for probability matches, where the accuracy of the link was uncertain.

$PR_{ce,t}$ is the probability that person t is a correct enumeration in the original production coding. This is either 0 or 1 unless it was not possible to code the E-sample case as a correct or erroneous enumeration. In these cases, a probability of correct enumeration was imputed.

$$f_{1,i'} = \frac{CE^{ND*}_{i'}}{CE^{ND}_{i'}} = \frac{\sum_{t \in E_{i'}} W_{RR,t} \, (1 - p_t) \, PR_{ceR,t}}{\sum_{t \in E_{i'}} W_{R,t} \, (1 - p_t) \, PR_{ce,t}}$$

where

$W_{RR,t}$ is the A.C.E. Revision Sample weight for person t to be used for Revision Sample coding.
$W_{R,t}$ is the A.C.E. Revision Sample weight for person t to be used with production coding.

First, linked situations are identified where one component of the link is thought to be correct and the other incorrect. If a person in a housing unit links with a person in a group quarters, such as a college dormitory, the person in the housing unit is taken to be incorrect and assigned a zt of zero. See Linked Situation 1. in Attachment 1. If a linked person 18 years of age or older is listed in only one of the households as a child of the reference person, this person is assumed to be incorrectly included with their parents and correctly included in the other household, unless A.C.E. had already determined them to be an erroneous inclusion. An example of this might be a college student who was listed with their parents and also listed in an off-campus apartment. This is represented by Linked Situations 2a. and 2b. in Attachment 1.

For other Linked Situations, the choice of which person is correct is not clear. Consider links between whole households where all household members are duplicated (Linked Situation 3.). This includes families that might have moved some time around Census Day and were inadvertently included at both places, or this might involve households with multiple residences with a helpful, but perhaps uninformed, proxy respondent. Another situation, Linked Situation 4., involves children ages 0 to 17, per-

haps of divorced parents, that are linked between two dif- These two weights could differ slightly ferent households. For these and all other situations, it is depending on TES status and noninterview assumed that only half of these census enumerations with adjustment.

duplicate links are correct. To estimate the conditional probability, zt , that the E-sample person is the correct enu- PRceR, t is the probability that person t is a correct meration, controls cells are defined for Linked Situations enumeration in the A.C.E. Revision Sample 3., 4., and 5., as indicated in Attachment 1, by: coding.

  • 3 Race/Hispanic Origin Domains Ei WE, t ti
  • Tenure is the total weighted number of persons in the E sample in These resulting control cells are given in Attachment 2. post-stratum i.

Within each control cell, the $z_t$s are determined so that duplicate E-sample cases originally coded correct or unresolved weight up to one half the number of census duplicates identified, including the erroneous enumerations. This is calculated as:

$$z_t = \frac{0.5 \sum_t W_t\, p_t}{\sum_t W_t\, p_t\, \Pr(CE)}$$

The summations are over the links in a control cell regardless of the original E-sample coding.

ESTIMATION OF $r_M$

This section discusses the estimated match rate in equation (2). E-sample post-strata are indexed by $i$, while P-sample post-strata are indexed by $j$. The match rate for post-stratum $j$ is represented as:

$$r_{M,j} = \frac{M^{ND}_{nm,j} f_{2,j'} + M^{\tilde{D}}_{nm,j} + \dfrac{M_{om,j} f_{3,j'}}{P_{om,j} f_{4,j'}}\left(P_{im,j} f_{5,j'} + g\,(P^{D}_{nm,j} - P^{\tilde{D}}_{nm,j})\right)}{P^{ND}_{nm,j} f_{6,j'} + P^{\tilde{D}}_{nm,j} + P_{im,j} f_{5,j'} + g\,(P^{D}_{nm,j} - P^{\tilde{D}}_{nm,j})} \qquad (5)$$
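Equation (5) is an arithmetic combination of already-computed weighted tallies. The short sketch below takes those tallies as inputs and evaluates the PES-C match rate. It is illustrative only, not the Census Bureau's production code. The class name `PSampleTallies`, the function name `match_rate_pes_c`, and the field names are hypothetical stand-ins for the symbols defined in the text.

```python
from dataclasses import dataclass


@dataclass
class PSampleTallies:
    """Hypothetical container for the equation (5) inputs of one post-stratum j.

    The f ratios belong to the Revision post-stratum j' that contains j.
    """
    m_nd_nm: float  # M^ND_{nm,j}: nonmover matches without duplicate links
    m_td_nm: float  # M^~D_{nm,j}: resident nonmover matches with duplicate links
    m_om: float     # M_{om,j}: matched outmovers
    p_nd_nm: float  # P^ND_{nm,j}: nonmovers without duplicate links
    p_td_nm: float  # P^~D_{nm,j}: resident nonmovers with duplicate links
    p_d_nm: float   # P^D_{nm,j}: all nonmovers with duplicate links
    p_om: float     # P_{om,j}: outmovers
    p_im: float     # P_{im,j}: inmovers
    f2: float
    f3: float
    f4: float
    f5: float
    f6: float
    g: float


def match_rate_pes_c(t: PSampleTallies) -> float:
    """Evaluate r_{M,j} from equation (5) (procedure C for movers)."""
    # Inmovers, plus the share g of duplicate-linked nonresident nonmovers
    # that are treated as inmovers.
    inmovers = t.p_im * t.f5 + t.g * (t.p_d_nm - t.p_td_nm)
    # Outmover match rate, corrected for measurement error, applied to inmovers.
    outmover_match_rate = (t.m_om * t.f3) / (t.p_om * t.f4)
    numerator = t.m_nd_nm * t.f2 + t.m_td_nm + outmover_match_rate * inmovers
    denominator = t.p_nd_nm * t.f6 + t.p_td_nm + inmovers
    return numerator / denominator
```

When every f ratio equals 1 and g = 0, the function reduces to the uncorrected PES-C match rate. In that case the inmovers are matched at the observed outmover match rate.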

The terms in equation (5) are defined below. Summation over $t \in j$ denotes summation over A.C.E. Full P-sample post-stratum $j$. Summation over $t \in j'$ denotes summation over Revision Sample post-stratum $j'$. The summation notation also indicates two further things: whether the sum is taken over nonmovers, outmovers, or inmovers, and whether the production or the Revision (R) Sample coding is used.

$$M^{ND}_{nm,j} = \sum_{\substack{t \in j \\ \text{nonmover, production}}} W_{P,t}\,(1 - p_t)\, PR_{res,t}\, PR_{m,t}$$

where

$W_{P,t}$ is the P-sample production weight of person $t$.

$p_t$ is the probability that person $t$ has a duplicate link outside the search area.

$PR_{m,t}$ is the probability that person $t$ is a match in the production coding.

$PR_{res,t}$ is the probability that person $t$ is a resident in the production coding.

$$f_{2,j'} = \frac{M^{ND*}_{nm,j'}}{M^{ND}_{nm,j'}} = \frac{\sum_{\substack{t \in j' \\ \text{nonmover, revision}}} W^{P}_{RR,t}\,(1 - p_t)\, PR^{R}_{res,t}\, PR^{R}_{m,t}}{\sum_{\substack{t \in j' \\ \text{nonmover, production}}} W^{P}_{R,t}\,(1 - p_t)\, PR_{res,t}\, PR_{m,t}}$$

is the double-sampling adjustment for nonmover matches.

$PR^{R}_{m,t}$ is the probability that person $t$ is a match in the Revision Sample coding.

$PR^{R}_{res,t}$ is the probability that person $t$ is a resident in the Revision Sample coding.

$W^{P}_{RR,t}$ is the A.C.E. Revision Sample weight for person $t$ to be used for Revision Sample coding.

$W^{P}_{R,t}$ is the A.C.E. Revision Sample weight for person $t$ to be used with production coding. These two weights could differ slightly depending on TES status and the noninterview adjustment.

$$M_{om,j} = \sum_{\substack{t \in j \\ \text{outmover, production}}} W_{P,t}\, PR_{res,t}\, PR_{m,t}$$

is the number of matched outmovers in the Full Sample in post-stratum $j$.

The residence status of P-sample movers was adjusted for coding error only; the computer matching results were not used for movers. Outmovers in the P sample were collected by proxy interview, which made it difficult to obtain date of birth and age information. Date of birth and age were important characteristics in the computer matching, so the movers were adjusted only for coding error.

The duplicate study identified E- and P-sample cases linking to census enumerations outside the A.C.E. search area. However, it could not determine which component of a link was the correct one, because no additional data were collected for that purpose. Assuming that the linked person does exist, the goal is to determine which of the two locations is the appropriate place to count the person. Linked persons may be geographically close or far apart, which has implications for the degree of synthetic error.

On the P-sample side, this study does not identify whether the linked P-sample case is a resident on Census Day. Thus, it is necessary to estimate the following conditional probability:

$h_t$ = the probability that a P-sample case is a resident on Census Day given that it links to a census enumeration outside the A.C.E. search area.

P-Sample Links

Unlike the E-sample side, the duplicate study does NOT provide an estimate of the number of correct Census Day residents in the P sample. To estimate $h_t$, it is therefore necessary to borrow the resulting $z_t$s from the E-sample links. Attachment 1 summarizes how the $h_t$s borrow information from the $z_t$s.

First, the P-sample links to census enumerations outside the search area are identified for situations where it can be determined which component of the link is the correct residence. The Linked Situations and the rules for assigning $h_t$s are the same as those used for comparable types of E-sample links. For example, consider a P-sample person 18 years of age or older who is listed as a child of the reference person. This person links with a census enumeration in a household where they are not listed as a child. The P-sample person would be assigned an $h_t$ of zero regardless of how A.C.E. coded them. That is, it is assumed that this person should not have been included in the P sample.

For the other Linked Situations 3, 4, and 5, there is once again no information to determine whether it was the P sample or the census that had the person at the correct location. Nor is there a reasonable assumption about how many of these linked P-sample persons should be at the correct location. To overcome this obstacle, it is assumed that the error in identifying correct residence is similar to the error in identifying correct enumeration for similar situations. Therefore, the $h_t$ for P-sample persons is set equal to the $z_t$ determined for the E sample for comparable linked situations, as identified by the control cells in Attachment 2. The $h_t$s are then included in the weighted tallies, along with the $p_t$s, to calculate the duplicate contribution to the Full P-sample nonmovers and nonmover matches.
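The sketch below covers three pieces of the duplicate handling. The first is the Attachment 1 assignment rules. The second is the control-cell calibration of $z_t$ from the E-Sample Links discussion. The third is the borrowing of $h_t = z_t$ described above. It is a minimal illustration under assumptions, not the production implementation. The record layout (keys `cell`, `w`, `p`, `pr_ce`), the cell labels, and the function names `calibrate_z` and `assign_rule` are hypothetical. The text does not say whether $z_t$ was capped at one, so the sketch does not cap it.

```python
from collections import defaultdict


def calibrate_z(links):
    """Control-cell z for Linked Situations 3, 4 and 5.

    Each link is a dict (hypothetical layout) with keys:
      cell  - control cell label (domain x tenure x linked situation)
      w     - sampling weight W_t
      p     - duplicate-link probability p_t
      pr_ce - Pr(CE): 1 for CE, 0 for EE, an imputed value for UE
    """
    all_links = defaultdict(float)  # sum of W_t p_t over every original coding
    ce_links = defaultdict(float)   # sum of W_t p_t Pr(CE)
    for link in links:
        wp = link["w"] * link["p"]
        all_links[link["cell"]] += wp
        ce_links[link["cell"]] += wp * link["pr_ce"]
    # z = 0.5 * sum(W p) / sum(W p Pr(CE)); cells with no CE/UE weight are skipped.
    return {cell: 0.5 * all_links[cell] / ce_links[cell]
            for cell in all_links if ce_links[cell] > 0}


def assign_rule(situation, coded_ok, z_by_cell=None, cell=None):
    """Attachment 1 rule for z_t (E sample) or h_t (P sample) for one link.

    situation - "1", "2a", "2b", "3", "4" or "5"
    coded_ok  - True if originally coded CE/UE (E) or Res/UE (P),
                False if coded EE (E) or NonRes (P)
    """
    if not coded_ok or situation in ("1", "2a"):
        return 0.0
    if situation == "2b":
        return 1.0
    # Situations 3-5: z1, z2, z3 from the E-sample control cell; the P sample
    # borrows the same value (h_t = z_t) for the comparable cell.
    return z_by_cell[cell]


# Usage with made-up links in one control cell: 30 weighted CE links and
# 10 weighted EE links, so z is chosen to make the CE links count as 20.
links = [
    {"cell": "D4-owner-3", "w": 30.0, "p": 1.0, "pr_ce": 1.0},
    {"cell": "D4-owner-3", "w": 10.0, "p": 1.0, "pr_ce": 0.0},
]
z = calibrate_z(links)
```

In this toy cell, the weighted correct-or-unresolved links multiplied by $z_t$ equal half of all weighted links, including the erroneous enumerations. That is the target the text describes.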


The remaining terms in equation (5) are defined as follows.

$$f_{3,j'} = \frac{M^{*}_{om,j'}}{M_{om,j'}} = \frac{\sum_{\substack{t \in j' \\ \text{outmover, revision}}} W^{P}_{RR,t}\, PR^{R}_{res,t}\, PR^{R}_{m,t}}{\sum_{\substack{t \in j' \\ \text{outmover, production}}} W^{P}_{R,t}\, PR_{res,t}\, PR_{m,t}}$$

is the double-sampling ratio for matched outmovers for post-stratum $j'$.

$$P_{om,j} = \sum_{\substack{t \in j \\ \text{outmover, production}}} W_{P,t}\, PR_{res,t}$$

is the number of outmovers in the Full Sample for post-stratum $j$.

$$f_{4,j'} = \frac{P^{*}_{om,j'}}{P_{om,j'}} = \frac{\sum_{\substack{t \in j' \\ \text{outmover, revision}}} W^{P}_{RR,t}\, PR^{R}_{res,t}}{\sum_{\substack{t \in j' \\ \text{outmover, production}}} W^{P}_{R,t}\, PR_{res,t}}$$

is the double-sampling ratio for outmovers for post-stratum $j'$.

$$P_{im,j} = \sum_{\substack{t \in j \\ \text{inmover, production}}} W_{P,t}$$

is the number of inmovers in the Full Sample for post-stratum $j$.

$$f_{5,j'} = \frac{P^{*}_{im,j'}}{P_{im,j'}} = \frac{\sum_{\substack{t \in j' \\ \text{inmover, revision}}} W^{P}_{RR,t}\, PR^{R}_{inmover,t}}{\sum_{\substack{t \in j' \\ \text{inmover, production}}} W^{P}_{R,t}}$$

is the double-sampling ratio for inmovers for post-stratum $j'$.

$PR^{R}_{inmover,t}$ is the probability that person $t$ in the Revision Sample is an inmover.

$$M^{\tilde{D}}_{nm,j} = \sum_{\substack{t \in j \\ \text{nonmover, production}}} W_{P,t}\, p_t\, h_t\, PR_{m,t}\, PR_{res,t}$$

is the number of duplicate persons determined to have been Census Day residents who matched to the census in post-stratum $j$.

$$P^{ND}_{nm,j} = \sum_{\substack{t \in j \\ \text{nonmover, production}}} W_{P,t}\,(1 - p_t)\, PR_{res,t}$$

is the number of nonmovers without links outside the search area in post-stratum $j$.

$$f_{6,j'} = \frac{P^{ND*}_{nm,j'}}{P^{ND}_{nm,j'}} = \frac{\sum_{\substack{t \in j' \\ \text{nonmover, revision}}} W^{P}_{RR,t}\,(1 - p_t)\, PR^{R}_{res,t}}{\sum_{\substack{t \in j' \\ \text{nonmover, production}}} W^{P}_{R,t}\,(1 - p_t)\, PR_{res,t}}$$

is the double-sampling adjustment for nonmovers in post-stratum $j'$.

The term $g\,(P^{D}_{nm,j} - P^{\tilde{D}}_{nm,j})$ adjusts the number of inmovers for those Full P-sample nonmovers who are determined to be nonresidents because of duplicate links. Some of these people are nonresidents because they are inmovers, and they should be added to the count of inmovers.

The difference $P^{D}_{nm,j} - P^{\tilde{D}}_{nm,j}$ is an estimate of nonresidents among nonmovers with duplicate links. It is multiplied by $g$, an estimate of the proportion of originally coded nonmovers with duplicate links who are true nonresidents that have moved in since Census Day. The term $g$ is estimated from the Revision Sample, using both the original A.C.E. coding and the revision coding, as follows:

$$g = \frac{P^{D}_{nm,im*}}{P^{D}_{nm,nr*}}$$

where

$P^{D}_{nm,im*}$ is an estimate (using the Revision P sample) of persons with a duplicate link who were originally coded as nonmovers but whom the revision coding determined to be inmovers (a subset of nonresidents).

$P^{D}_{nm,nr*}$ is an estimate (using the Revision P sample) of persons with a duplicate link who were originally coded as nonmovers but whom the revision coding determined to be nonresidents.

Two important assumptions are:

  • If the revision coding determined that a person was a nonresident, they really are a nonresident. That is, revision-coded nonresidents are assumed to be a subset of true nonresidents.
  • The rate of inmovers among revision-coded nonresidents is the same as that among true nonresidents.
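The two formulas just given, the ratio estimate of $g$ and the inmover adjustment $g\,(P^{D}_{nm,j} - P^{\tilde{D}}_{nm,j})$, can be sketched as follows. This is a minimal illustration that takes the weighted tallies as given. The function names are hypothetical. $P^{D}_{nm,j}$ and $P^{\tilde{D}}_{nm,j}$ are the nonmover duplicate tallies defined with the DSE formula.

```python
def estimate_g(p_d_nm_im_star: float, p_d_nm_nr_star: float) -> float:
    """g = P^D_{nm,im*} / P^D_{nm,nr*}, both tallied on the Revision P sample.

    Numerator: duplicate-linked original nonmovers recoded as inmovers.
    Denominator: duplicate-linked original nonmovers recoded as nonresidents.
    """
    if p_d_nm_nr_star <= 0:
        raise ValueError("no revision-coded nonresidents to form g")
    return p_d_nm_im_star / p_d_nm_nr_star


def inmover_adjustment(g: float, p_d_nm: float, p_td_nm: float) -> float:
    """g * (P^D_{nm,j} - P^~D_{nm,j}).

    This is the estimated number of nonresident, duplicate-linked nonmovers
    in post-stratum j that is added to the inmover count.
    """
    return g * (p_d_nm - p_td_nm)
```

Take hypothetical tallies of 30 recoded inmovers among 120 recoded nonresidents, which give g = 0.25. With $P^{D}_{nm,j} = 400$ and $P^{\tilde{D}}_{nm,j} = 240$, the adjustment adds 40 weighted persons to the inmover count.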


$$P^{\tilde{D}}_{nm,j} = \sum_{\substack{t \in j \\ \text{nonmover, production}}} W_{P,t}\, p_t\, h_t\, PR_{res,t}$$

is the estimated number of nonmover persons with duplicate links who were residents after unduplication.

$$P^{D}_{nm,j} = \sum_{\substack{t \in j \\ \text{nonmover, production}}} W_{P,t}\, p_t\, PR_{res,t}$$

is the number of P-sample persons with duplicate links, regardless of whether the unduplication process determined them to be residents.

THE A.C.E. REVISION II DSE FORMULA

The A.C.E. Revision II DSE formula uses procedure C for movers, separate E- and P-sample post-strata, measurement error corrections from the E- and P-Revision Samples, and the duplicate study results. It is:

$$DSE^{C}_{ij} = Cen_{ij} \times r_{DD,ij} \times \frac{\left[\dfrac{CE^{ND}_{i} f_{1,i'} + CE^{\tilde{D}}_{i}}{E_{i}}\right]}{\left[\dfrac{M^{ND}_{nm,j} f_{2,j'} + M^{\tilde{D}}_{nm,j} + \dfrac{M_{om,j} f_{3,j'}}{P_{om,j} f_{4,j'}}\left(P_{im,j} f_{5,j'} + g\,(P^{D}_{nm,j} - P^{\tilde{D}}_{nm,j})\right)}{P^{ND}_{nm,j} f_{6,j'} + P^{\tilde{D}}_{nm,j} + P_{im,j} f_{5,j'} + g\,(P^{D}_{nm,j} - P^{\tilde{D}}_{nm,j})}\right]}$$

In some small post-strata, the number of inmovers was substantially larger than the number of outmovers. With only a few outmovers, the outmover match rate was subject to high sampling error. In these post-strata it was not considered appropriate to apply a suspect match rate to what could be a relatively large number of inmovers, so PES-A was used instead. PES-A uses only outmovers. It was applied to post-strata with nine or fewer P-sample outmovers. For these post-strata, it was assumed that some of the duplicate links determined not to have been residents were really outmovers.

The DSE formula that uses procedure A for movers, with different post-strata for the E- and P-samples, is:

$$DSE^{A}_{ij} = Cen_{ij} \times r_{DD,ij} \times \frac{CE_{i} / E_{i}}{\left(M_{nm,j} + M_{om,j}\right) / \left(P_{nm,j} + P_{om,j}\right)}$$

The A.C.E. Revision II DSE formula using procedure A for movers, with separate E- and P-sample post-strata, measurement error corrections from the E- and P-Revision Samples, and the duplicate study results, is written:

$$DSE^{A}_{ij} = Cen_{ij} \times r_{DD,ij} \times \frac{\left[\dfrac{CE^{ND}_{i} f_{1,i'} + CE^{\tilde{D}}_{i}}{E_{i}}\right]}{\left[\dfrac{M^{ND}_{nm,j} f_{2,j'} + M^{\tilde{D}}_{nm,j} + M_{om,j} f_{3,j'} + g\,(M^{D}_{nm,j} - M^{\tilde{D}}_{nm,j})}{P^{ND}_{nm,j} f_{6,j'} + P^{\tilde{D}}_{nm,j} + P_{om,j} f_{4,j'} + g\,(P^{D}_{nm,j} - P^{\tilde{D}}_{nm,j})}\right]}$$

This version of the formula is used only when the sample size for outmovers in the Full P sample is strictly less than 10. It was used 93 times in the A.C.E. Revision II production process. The new term introduced in this formula is defined as follows:

$$M^{D}_{nm,j} = \sum_{\substack{t \in j \\ \text{nonmover, production}}} W_{P,t}\, p_t\, PR_{res,t}\, PR_{m,t}$$

is the number of matched P-sample persons with duplicate links, regardless of whether the unduplication process determined them to be residents.

Notation

Terms
  CE      Correct enumerations
  E       E-sample total
  M       Matches
  P       P-sample total
  f       Adjusts for measurement error
  g       Adjusts nonmovers to movers due to duplication
Subscripts
  i, j          Full E and P post-strata
  i', j'        Revision E and P measurement error correction post-strata
  nm, om, im    nonmover, outmover, inmover
Superscripts
  C       DSE procedure C for movers
  ND      Not a duplicate to census enumeration outside search area
  D       Duplicate to census enumeration outside search area
  ~       Includes probability adjustment for residency given duplication

A.C.E. REVISION II POST-STRATIFICATION DESIGN

The A.C.E. Revision II estimates were based on the Full E- and P-samples with the original coding results, which had been used to produce the March 2001 estimates of census coverage. The March 2001 A.C.E. estimates were determined to be unacceptable because of large amounts of measurement error. Each of these Full samples comprised over 700,000 sample persons. Instead of one set of post-stratification variables, the A.C.E. Revision II estimates include separate post-strata for the Full E and P samples, indicated by subscripts $i$ and $j$, respectively.

Full P Sample

For the Full P sample, the new post-strata were nearly identical to those used for the March 2001 A.C.E. estimates. The only difference was that the 0-17 age group

was split into two groups, 0-9 and 10-17, which resulted in some collapsing differences. The Full P sample, consisting of 480 post-strata (as opposed to the previous 416 post-strata), was based on the following characteristics:

  • Race/Hispanic Origin Domain
  • Tenure
  • Size of Metropolitan Statistical Area
  • Type of Census Enumeration Area
  • Return Rate Indicator (Low vs. High)
  • Region
  • Age
  • Sex

For the Full P sample, the post-stratum groups either retained all eight Age/Sex categories or were collapsed to four Age/Sex categories, as shown below:

Figure 6-1. P-Sample Age/Sex Groupings (8 groups, 4 groups, and 1 group*; ages 0-9, 10-17, 18-29, 30-49, and 50+, for males and females)
* The 1 group is not used for the Full P-sample post-strata (j), only for the Revision P-sample post-strata (j').

Table 6-1 shows the 64 Full P-sample post-stratum groups. The number in each cell is the number of Age/Sex categories in that post-stratum group.


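Table 6-1. Full P-Sample Post-Stratum Groups and Number of Age and Sex Groupings (j)

Race/Hispanic origin domain | Tenure | MSA/TEA | High return rate (NE, MW, S, W) | Low return rate (NE, MW, S, W)
--- | --- | --- | --- | ---
Domain 7 (Non-Hispanic White or Some other race) | Owner | Large MSA MO/MB | 8, 8, 8, 8 | 8, 4, 8, 4
Domain 7 (Non-Hispanic White or Some other race) | Owner | Medium MSA MO/MB | 8, 8, 8, 8 | 4, 8, 8, 8
Domain 7 (Non-Hispanic White or Some other race) | Owner | Small MSA & Non-MSA MO/MB | 8, 8, 8, 8 | 4, 8, 8, 8
Domain 7 (Non-Hispanic White or Some other race) | Owner | All other TEAs | 8, 8, 8, 8 | 8, 8, 8, 8
Domain 7 (Non-Hispanic White or Some other race) | Nonowner | Large MSA MO/MB | 8 (all regions) | 8 (all regions)
Domain 7 (Non-Hispanic White or Some other race) | Nonowner | Medium MSA MO/MB | 8 (all regions) | 8 (all regions)
Domain 7 (Non-Hispanic White or Some other race) | Nonowner | Small MSA & Non-MSA MO/MB | 8 (all regions) | 8 (all regions)
Domain 7 (Non-Hispanic White or Some other race) | Nonowner | All other TEAs | 8 (all regions) | 8 (all regions)
Domain 4 (Non-Hispanic Black) | Owner | Large MSA and Medium MSA MO/MB | 8 (all regions) | 8 (all regions)
Domain 4 (Non-Hispanic Black) | Owner | Small MSA & Non-MSA MO/MB and All other TEAs | 8 (all regions) | 8 (all regions)
Domain 4 (Non-Hispanic Black) | Nonowner | Large MSA and Medium MSA MO/MB | 8 (all regions) | 8 (all regions)
Domain 4 (Non-Hispanic Black) | Nonowner | Small MSA & Non-MSA MO/MB and All other TEAs | 8 (all regions) | 4 (all regions)
Domain 3 (Hispanic) | Owner | Large MSA and Medium MSA MO/MB | 8 (all regions) | 8 (all regions)
Domain 3 (Hispanic) | Owner | Small MSA & Non-MSA MO/MB and All other TEAs | 8 (all regions) | 8 (all regions)
Domain 3 (Hispanic) | Nonowner | Large MSA and Medium MSA MO/MB | 8 (all regions) | 8 (all regions)
Domain 3 (Hispanic) | Nonowner | Small MSA & Non-MSA MO/MB and All other TEAs | 8 (all regions) | 4 (all regions)
Domain 5 (Native Hawaiian or Pacific Islander) | Owner | All | 4 (combined) |
Domain 5 (Native Hawaiian or Pacific Islander) | Nonowner | All | 4 (combined) |
Domain 6 (Non-Hispanic Asian) | Owner | All | 8 (combined) |
Domain 6 (Non-Hispanic Asian) | Nonowner | All | 8 (combined) |
Domain 1 (American Indian or Alaska Native On Reservation) | Owner | All | 8 (combined) |
Domain 1 (American Indian or Alaska Native On Reservation) | Nonowner | All | 8 (combined) |
Domain 2 (American Indian or Alaska Native Off Reservation) | Owner | All | 8 (combined) |
Domain 2 (American Indian or Alaska Native Off Reservation) | Nonowner | All | 8 (combined) |

"All regions" marks a single group spanning the four regions within one return-rate level. "Combined" marks a single group spanning both return-rate levels and all regions.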
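As a consistency check on the table layout, the sketch below encodes each post-stratum group of Table 6-1 as its number of Age/Sex categories. It then confirms the totals stated in the text: 64 groups and 480 Full P-sample post-strata. The variable and function names are only illustrative.

```python
# One entry per post-stratum group in Table 6-1; the value is the number of
# Age/Sex categories in that group.
DOMAIN7_OWNER = [
    [8, 8, 8, 8, 8, 4, 8, 4],  # Large MSA MO/MB: high NE, MW, S, W; low NE, MW, S, W
    [8, 8, 8, 8, 4, 8, 8, 8],  # Medium MSA MO/MB
    [8, 8, 8, 8, 4, 8, 8, 8],  # Small MSA & Non-MSA MO/MB
    [8, 8, 8, 8, 8, 8, 8, 8],  # All other TEAs
]
DOMAIN7_NONOWNER = [[8, 8]] * 4                     # four MSA/TEA rows: high, low
DOMAIN_3_OR_4 = [[8, 8], [8, 8], [8, 8], [8, 4]]    # owner x2, nonowner x2 rows
OTHER_DOMAINS = {"5": [4, 4], "6": [8, 8], "1": [8, 8], "2": [8, 8]}


def table_6_1_groups():
    """Flatten Table 6-1 into a list of Age/Sex category counts per group."""
    groups = [c for row in DOMAIN7_OWNER for c in row]
    groups += [c for row in DOMAIN7_NONOWNER for c in row]
    for _domain in ("4", "3"):
        groups += [c for row in DOMAIN_3_OR_4 for c in row]
    for cells in OTHER_DOMAINS.values():
        groups += cells
    return groups


groups = table_6_1_groups()
print(len(groups), sum(groups))  # 64 480
```

The number of groups is the count of table cells. The number of post-strata is the sum of the cell values, since each group splits into that many Age/Sex post-strata.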

Full E Sample

For the A.C.E. Revision II Full E sample, the post-strata definitions have undergone major revisions. Some of the original post-stratification variables were omitted and additional variables were added. Logistic regression models identified several variables, not included in the Full P-sample post-stratification, that were good indicators of correct enumeration. The Full E sample, consisting of 525 post-strata, was defined using the following characteristics:

  • Proxy Status
  • Race/Hispanic Origin Domain
  • Tenure
  • Household Relationship
  • Household Size
  • Type of Census Return (mailback vs. nonmailback)
  • Date of Return (early vs. late)
  • Age
  • Sex

The new variables (proxy status, household relationship and size, and the type and date of census return) are described generally below.

  • Proxy Status. Nonproxy includes those housing unit persons for whom census data were provided by a household member. Proxy includes those housing unit persons for whom census data were provided by a nonhousehold member, such as a neighbor or rental agent.
  • Household Relationship. The Householder/Nuclear (HHer/Nuclear) relationship category includes persons in housing units consisting only of the householder with a spouse or own children (17 or younger). The Other relationship category consists of single-person households and persons in housing units with any other type of relationship, including unrelated persons.
  • Household Size. The number of persons residing in the housing unit.
  • Early/Late Mailback. Persons in mailback housing units are early if the earliest form processing date is on or before March 24 and late if it is after March 24.
  • Early/Late Nonmailback. Persons in nonmailback housing units are early if the earliest form processing date is on or before June 1 and late if it is after June 1.

For the Full E sample, the post-stratum groups either retained all eight Age/Sex categories or were collapsed to four, two, or one Age/Sex groups, based on sample sizes, as shown below:

Figure 6-2. E-Sample Age/Sex Groupings (8 groups, 4 groups, 2 groups, and 1 group; ages 0-9, 10-17, 18-29, 30-49, and 50+, for males and females)

Table 6-2 shows the 93 Full E-sample post-stratum groups. The number in each cell is the number of Age/Sex categories in that post-stratum group.


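Table 6-2. Full E-Sample Post-Stratum Groups and Number of Age and Sex Groupings

Proxy status & domain | Tenure | Relationship | HH size | Early mailback | Early nonmailback | Late mailback | Late nonmailback
--- | --- | --- | --- | --- | --- | --- | ---
Proxy: Domain 7 (Non-Hispanic White or Some Other Race) | All | All | All | 8 (combined) | | |
Proxy: Domain 4 (Non-Hispanic Black) | All | All | All | 8 (combined) | | |
Proxy: Domain 3 (Hispanic) | All | All | All | 8 (combined) | | |
Proxy: Domain 5 (Native Hawaiian or Pacific Islander) | All | All | All | 1 (combined) | | |
Proxy: Domain 6 (Non-Hispanic Asian) | All | All | All | 4 (combined) | | |
Proxy: Domain 1 (American Indian or Alaska Native On Reservation) | All | All | All | 4 (combined) | | |
Proxy: Domain 2 (American Indian or Alaska Native Off Reservation) | All | All | All | 1 (combined) | | |
Nonproxy: Domain 7 (Non-Hispanic White or Some Other Race) | Owner | HHer/Nuclear | 2-3 | 8 | 8 | 8 | 8
Nonproxy: Domain 7 | Owner | HHer/Nuclear | 4+ | 8 | 8 | 4 | 8
Nonproxy: Domain 7 | Owner | Other | 1 | 2 | 2 | 1 | 2
Nonproxy: Domain 7 | Owner | Other | 2-3 | 8 | 8 | 2 | 4
Nonproxy: Domain 7 | Owner | Other | 4+ | 8 | 8 | 4 | 8
Nonproxy: Domain 7 | Nonowner | HHer/Nuclear | All | 8 | 8 | 8 | 8
Nonproxy: Domain 7 | Nonowner | Other | All | 8 | 8 | 8 | 8
Nonproxy: Domain 4 (Non-Hispanic Black) | Owner | HHer/Nuclear | All | 4 | 4 | 2 | 4
Nonproxy: Domain 4 | Owner | Other | All | 8 | 8 | 4 | 8
Nonproxy: Domain 4 | Nonowner | HHer/Nuclear | All | 8 | 8 | 8 | 8
Nonproxy: Domain 4 | Nonowner | Other | All | 8 | 8 | 8 | 8
Nonproxy: Domain 3 (Hispanic) | Owner | HHer/Nuclear | All | 8 | 8 | 4 | 8
Nonproxy: Domain 3 | Owner | Other | All | 8 | 8 | 4 | 8
Nonproxy: Domain 3 | Nonowner | HHer/Nuclear | All | 8 | 8 | 8 | 8
Nonproxy: Domain 3 | Nonowner | Other | All | 8 | 8 | 8 | 8
Nonproxy: Domain 5 (Native Hawaiian or Pacific Islander) | Owner & Nonowner | HHer/Nuclear | All | 2 | 2 | 2 | 2
Nonproxy: Domain 5 | Owner & Nonowner | Other | All | 2 | 2 | 1 | 2
Nonproxy: Domain 6 (Non-Hispanic Asian) | Owner & Nonowner | HHer/Nuclear | All | 8 | 8 | 4 | 4
Nonproxy: Domain 6 | Owner & Nonowner | Other | All | 4 | 4 | 2 | 4
Nonproxy: Domain 1 (American Indian or Alaska Native On Reservation) | Owner & Nonowner | HHer/Nuclear | All | 8 (combined) | | |
Nonproxy: Domain 1 | Owner & Nonowner | Other | All | 8 (combined) | | |
Nonproxy: Domain 2 (American Indian or Alaska Native Off Reservation) | Owner & Nonowner | HHer/Nuclear | All | 2 | 2 | 2 | 2
Nonproxy: Domain 2 | Owner & Nonowner | Other | All | 2 | 2 | 1 | 2

A value marked "(combined)" is a single group spanning all four return type and date columns.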
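The same check can be run on Table 6-2. Each row below lists the Age/Sex category counts of its post-stratum groups. A one-element row is a single group that spans all four return type and date columns. Counting the groups and summing the counts should reproduce the 93 groups and 525 post-strata stated in the text. The dictionary keys are only illustrative labels.

```python
# Rows of Table 6-2; each number is one post-stratum group's Age/Sex count.
TABLE_6_2 = {
    "proxy": [[8], [8], [8], [1], [4], [4], [1]],  # Domains 7, 4, 3, 5, 6, 1, 2
    "nonproxy_d7": [[8, 8, 8, 8], [8, 8, 4, 8], [2, 2, 1, 2], [8, 8, 2, 4],
                    [8, 8, 4, 8], [8, 8, 8, 8], [8, 8, 8, 8]],
    "nonproxy_d4": [[4, 4, 2, 4], [8, 8, 4, 8], [8, 8, 8, 8], [8, 8, 8, 8]],
    "nonproxy_d3": [[8, 8, 4, 8], [8, 8, 4, 8], [8, 8, 8, 8], [8, 8, 8, 8]],
    "nonproxy_d5": [[2, 2, 2, 2], [2, 2, 1, 2]],
    "nonproxy_d6": [[8, 8, 4, 4], [4, 4, 2, 4]],
    "nonproxy_d1": [[8], [8]],
    "nonproxy_d2": [[2, 2, 2, 2], [2, 2, 1, 2]],
}


def table_6_2_totals(table):
    """Return (number of post-stratum groups, number of post-strata)."""
    rows = [row for block in table.values() for row in block]
    n_groups = sum(len(row) for row in rows)
    n_post_strata = sum(sum(row) for row in rows)
    return n_groups, n_post_strata


print(table_6_2_totals(TABLE_6_2))  # (93, 525)
```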

Revision P Sample

The Revision P sample is a subsample of the Full P sample and comprises over 60,000 sample persons. As part of the original A.C.E. evaluation program, it was subjected to an additional field interview and/or rematching operation. In support of the A.C.E. Revision II program, the Revision P sample has undergone extensive recoding using all available interview data and matching results. Missing data adjustments have also been applied to it. These recoded data are used to correct for measurement error in the Full P sample.

The measurement error correction post-stratum definitions (j') depend on a person's mover status. Both inmovers and outmovers are subdivided into Owner and Nonowner groups. For nonmovers, there is a post-stratum for American Indians on Reservations (AIR). The Non-AIR nonmovers are crossed by Tenure (Owner versus Nonowner) and eight Age and Sex categories. The Age/Sex collapsing pattern from the Full P sample is retained when defining the measurement error correction post-strata. The Revision P-sample post-strata (j') are defined as follows:

Figure 6-3. Revision P-Sample Post-Strata (j')
  Movers, Domains 1 thru 7: Owner; Nonowner (1 Age/Sex group each)
  Nonmovers, Domains 2 thru 7: Owner; Nonowner (each crossed with 8 Age/Sex groups over ages 0-9, 10-17, 18-29, 30-49, and 50+, for males and females)
  Nonmovers, Domain 1 (American Indian or Alaska Native On Reservation): 1 group
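Each Revision Sample correction enters the DSE through a double-sampling ratio such as $f_{6,j'}$. The ratio is a revision-coded tally divided by a production-coded tally over the same Revision post-stratum $j'$. The sketch below shows that pattern under stated assumptions. The record keys (`w_rr`, `w_r`, `p`, `nonmover_rev`, `nonmover_prod`, `pr_res_rev`, `pr_res_prod`) and the function names are hypothetical. Mover status under each coding is carried as a 0/1 indicator, so that each tally covers only persons classified as nonmovers under its own coding.

```python
def double_sampling_ratio(records, revision_value, production_value):
    """Ratio of sum_t W_RR,t * revision_value(t) to sum_t W_R,t * production_value(t)."""
    numerator = sum(r["w_rr"] * revision_value(r) for r in records)
    denominator = sum(r["w_r"] * production_value(r) for r in records)
    return numerator / denominator


def f6(revision_sample):
    """Nonmover double-sampling adjustment f_{6,j'} for one post-stratum j'."""
    return double_sampling_ratio(
        revision_sample,
        # Revision coding: nonmover indicator, no duplicate link, resident probability.
        revision_value=lambda r: r["nonmover_rev"] * (1 - r["p"]) * r["pr_res_rev"],
        # Production coding: the same quantities from the original A.C.E. coding.
        production_value=lambda r: r["nonmover_prod"] * (1 - r["p"]) * r["pr_res_prod"],
    )
```

The other ratios ($f_{2,j'}$ to $f_{5,j'}$, and $f_{1,i'}$ on the E side) follow the same pattern with different per-person quantities.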


Revision E Sample

The Revision E sample is a subsample of the Full E sample and comprises over 75,000 sample persons. As part of the original A.C.E. evaluation program, it was subjected to an additional field interview and/or rematching operation. In support of the A.C.E. Revision II program, the Revision E sample has undergone extensive recoding using all available interview data and matching results. Missing data adjustments have also been applied to it. These recoded data are used to correct for measurement error in the Full E sample.

For the Revision E sample, there are measurement error correction post-strata for Proxies and for American Indians on Reservations (AIR). The Nonproxy/Non-AIR cases are crossed by a two-level Relationship variable and eight Age/Sex categories. Note that Household Size is collapsed out of the Household Relationship/Size variable. The Age/Sex collapsing pattern from the Full E sample is retained when defining the measurement error correction post-strata. The Revision E-sample post-strata (i') are defined as follows:

Figure 6-4. Revision E-Sample Post-Strata (i')
  Proxy (Domains 7, 4, 3, 5, 6, 1, and 2): 1 group
  Nonproxy, Domains 2 thru 7: HHer/Nuclear; Other (each crossed with 8 Age/Sex groups over ages 0-9, 10-17, 18-29, 30-49, and 50+, for males and females)
  Nonproxy, Domain 1 (American Indian or Alaska Native On Reservation): 1 group
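On the E side, the Revision E sample supplies $f_{1,i'}$. Together with the duplicate-study terms it forms the correct-enumeration component $(CE^{ND}_i f_{1,i'} + CE^{\tilde{D}}_i)/E_i$ of the DSE. The sketch below computes that component for one post-stratum $i$ from person records, taking $f_{1,i'}$ as already computed. It is a minimal sketch. The record keys `w`, `p`, `z`, `pr_ce` and the function name are hypothetical.

```python
def correct_enumeration_rate(e_persons, f1):
    """(CE^ND_i * f_{1,i'} + CE^~D_i) / E_i for one E-sample post-stratum i.

    Each person is a dict with keys w (W_E,t), p (p_t), z (z_t), pr_ce (PR_ce,t).
    """
    # Correct enumerations without duplicate links: weight times (1 - p_t).
    ce_nd = sum(x["w"] * (1 - x["p"]) * x["pr_ce"] for x in e_persons)
    # Correct enumerations with duplicate links, retained with probability z_t.
    ce_td = sum(x["w"] * x["p"] * x["z"] * x["pr_ce"] for x in e_persons)
    # E_i: total weighted E-sample persons.
    e_total = sum(x["w"] for x in e_persons)
    return (ce_nd * f1 + ce_td) / e_total
```

Only the nonduplicate term is multiplied by $f_{1,i'}$, matching the bracketed E-sample factor in the Revision II DSE formulas.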


ADJUSTMENT FOR CORRELATION BIAS USING DEMOGRAPHIC ANALYSIS

The dual system estimates are adjusted to correct for correlation bias. Correlation bias exists whenever the probability that an individual is included in the census is not independent of the probability that the individual is included in the A.C.E. This form of bias generally pushes estimates downward, because people missed in the census may be more likely to also be missed in the A.C.E. Estimates of correlation bias are calculated using the two-group model and sex ratios from Demographic Analysis (DA). The sex ratio is defined as the number of males divided by the number of females. The model makes three assumptions. There is no correlation bias for females or for males under 18 years of age. There is no correlation bias adjustment for non-Black males aged 18-29. Black males have a relative correlation bias that differs from the relative correlation bias for non-Black males. The correlation bias adjustment is also done by three age categories: 18-29, 30-49, and 50 and over. The model further assumes that relative correlation bias is constant over male post-strata within age groups. The Race/Hispanic Origin Domain variable is used to categorize Black and non-Black.

The DA totals are adjusted to make them comparable with the A.C.E. Race/Hispanic Origin Domains. First, Black Hispanics are subtracted from the DA total for Blacks and added to the DA total for non-Blacks, because the A.C.E. assigns Black Hispanics to the Hispanic domain, not the Black domain. The second adjustment deletes group quarters people from the DA totals using Census 2000 data, because the group quarters population is not part of the A.C.E. universe. A final adjustment would be to remove the Remote Alaska population from the DA totals, since it too is not part of the A.C.E. universe. This population is small, so the DA sex ratios would not be affected in any meaningful way. The resulting DA sex ratios for the three age groups by Black and non-Black domain are shown in Attachment 3.

In general, the correlation bias adjustment factor $c_k$ is defined for each of the three age groups $k$ such that:

$$E\left[c_k\, DSE^{m}_{k}\right] = \text{True male population for age group } k$$

where $DSE^{m}_{k}$ is the sum of DSEs over male post-strata in age group $k$. The purpose of this adjustment is to reflect persons missed in both the census and the A.C.E., so the value of $c_k$ was not allowed to be less than one.

Correlation Bias Adjustment for Black and Non-Black Males 18 Years and Older

The correlation bias adjustment for Black and non-Black males 18 years and older makes the A.C.E. Revision II sex ratios agree with the DA sex ratios for Blacks and for non-Blacks. It is calculated as:

$$c_{R,k} = \left(\frac{\sum_{ij \in k} DSE^{Rf}_{ij}}{\sum_{ij \in k} DSE^{Rm}_{ij}}\right) r_{DA\,R,k}$$

where

$DSE^{Rf}_{ij}$ = DSE for race $R$ (Black or non-Black), female post-strata $ij$.

$DSE^{Rm}_{ij}$ = DSE for race $R$ (Black or non-Black), male post-strata $ij$.

$r_{DA\,R,k}$ = DA sex ratio for race $R$ (Black or non-Black) for age group $k$, as given in Attachment 3.

The sum over the $ij$ post-strata includes only the intersection of those post-strata with age group $k$.

DSEs Adjusted for Correlation Bias

A correlation bias-adjusted DSE for a male, 18+ post-stratum $ij$ in age-race group $k$ is calculated as:

$$\widetilde{DSE}_{ijm} = c_k\, DSE_{ijm}$$

No correlation bias adjustment is made for the remaining post-strata, which include the female post-strata and the post-strata for persons under 18 years of age. Thus:

$$\widetilde{DSE}_{ijf} = DSE_{ijf}$$

The $\widetilde{DSE}_{ij}$s are then used to form synthetic estimates.

SYNTHETIC ESTIMATION

The coverage correction factors for detailed post-strata $ij$ are calculated as:

$$\widetilde{CCF}_{ij} = \frac{\widetilde{DSE}_{ij}}{Cen_{ij}}$$

where the $\widetilde{DSE}_{ij}$ are the correlation bias-adjusted DSEs for post-stratum $ij$ and the $Cen_{ij}$ are the census counts for post-stratum $ij$. Note that $Cen_{ij}$ includes late census adds.

A coverage correction factor was assigned to each census person except those in group quarters or Remote Alaska, who effectively have a coverage correction factor of 1.0. For duplicate links to group quarters persons, the person in the group quarters was treated as the correct enumeration. That is, the group quarters was taken to be the person's correct residence on Census Day. A synthetic estimate for any area or population subgroup $b$ is given by:

$$\tilde{N}_b = \sum_{ij \in b} Cen_{b,ij}\, \widetilde{CCF}_{ij}$$
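The steps above run in sequence: the factor $c_{R,k}$, the adjusted DSEs, and the synthetic estimate $\tilde{N}_b$. A minimal sketch of that chain follows, assuming the DSEs and census counts are already available. The function names and the dictionary keys (post-stratum labels) are hypothetical. Only the floor at one comes from the text. Other overrides, such as the fixed 1.00 for non-Black males 18-29 in Attachment 3, are treated as outside the function.

```python
def correlation_bias_factor(dse_female, dse_male, da_sex_ratio):
    """c_{R,k} = (sum of female DSEs / sum of male DSEs) * DA sex ratio.

    Computed for one race group R and age group k. The result is not allowed
    to fall below one.
    """
    c = sum(dse_female) / sum(dse_male) * da_sex_ratio
    return max(c, 1.0)


def adjusted_dse(dse, is_male_18_plus, c):
    """DSE~_ij: scale male 18+ post-strata by c and leave all others unchanged."""
    return c * dse if is_male_18_plus else dse


def synthetic_estimate(census_b, dse_tilde, census):
    """N~_b = sum over ij of Cen_{b,ij} * CCF~_ij, with CCF~_ij = DSE~_ij / Cen_ij.

    All three arguments are dicts keyed by post-stratum ij; census_b holds the
    census counts of area or subgroup b in each post-stratum.
    """
    return sum(census_b[ij] * dse_tilde[ij] / census[ij] for ij in census_b)
```

In practice, group quarters and Remote Alaska persons would be added to $\tilde{N}_b$ at a factor of 1.0, since the text says they receive no coverage correction.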

Note that the coverage correction factor can be expressed as:

$$\widetilde{CCF}_{ij} = \left(\frac{DD_{ij}}{Cen_{ij}}\right)\left(\frac{r_{CE,i}}{r_{M,j}}\right) c_k$$

where

$r_{CE,i}$ is the correct enumeration rate component of the DSE, varying over the $i$ post-strata.

$r_{M,j}$ is the match rate component of the DSE, varying over the $j$ post-strata.

$c_k$ is the correlation bias adjustment factor, varying over the Black and non-Black groups and the $k$ age cells.

$DD_{ij}/Cen_{ij}$ is the data-defined rate, varying over the $ij$ post-strata.
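The factorization above follows from the DSE structure $DSE_{ij} = Cen_{ij} \times r_{DD,ij} \times r_{CE,i} / r_{M,j}$, with $r_{DD,ij} = DD_{ij}/Cen_{ij}$. The check below shows that the factored form equals $c_k \, DSE_{ij} / Cen_{ij}$. It is a small numeric sketch. The function names and the sample values are made up for illustration.

```python
def dse(cen, dd, r_ce, r_m):
    """DSE_ij = Cen_ij * r_DD,ij * r_CE,i / r_M,j, where r_DD,ij = DD_ij / Cen_ij."""
    r_dd = dd / cen
    return cen * r_dd * r_ce / r_m


def ccf_from_components(cen, dd, r_ce, r_m, c_k=1.0):
    """CCF~_ij = (DD_ij / Cen_ij) * (r_CE,i / r_M,j) * c_k."""
    return (dd / cen) * (r_ce / r_m) * c_k


# Made-up component values for one post-stratum.
cen, dd, r_ce, r_m, c_k = 1000.0, 980.0, 0.95, 0.90, 1.05
direct = c_k * dse(cen, dd, r_ce, r_m) / cen
factored = ccf_from_components(cen, dd, r_ce, r_m, c_k)
```

Because $Cen_{ij}$ cancels, the two forms agree up to floating-point rounding. Each factor varies over a different index set ($ij$, $i$, $j$, and $k$), which is the point of writing the decomposition.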

Attachment 1. Rules for Assigning $z_t$ and $h_t$ for Full P- and E-Sample Duplicate Links

The Linked Situations and the assignment of $z_t$s and $h_t$s occur in the order listed below.

Linked situation: (E or P) / (Census) | Original E coding | $z_t$ | Original P coding | $h_t$
--- | --- | --- | --- | ---
1. (Person in a housing unit) / (Person in a group quarters) | EE | 0 | NonRes | 0
 | CE/UE | 0 | Res/UE | 0
2a. (Person 18+, child of reference person) / (Person 18+, not child of reference person) | EE | 0 | NonRes | 0
 | CE/UE | 0 | Res/UE | 0
2b. (Person 18+, not child of reference person) / (Person 18+, child of reference person) | EE | 0 | NonRes | 0
 | CE/UE | 1 | Res/UE | 1
3. (All persons in a housing unit) / (All persons in another housing unit) | EE | 0 | NonRes | 0
 | CE/UE | z1 | Res/UE | z1
4. (Child 0-17) / (Child 0-17) | EE | 0 | NonRes | 0
 | CE/UE | z2 | Res/UE | z2
5. All remaining linked situations | EE | 0 | NonRes | 0
 | CE/UE | z3 | Res/UE | z3

EE is erroneous enumeration.
CE is correct enumeration.
UE is unresolved.
Res is resident on Census Day.
NonRes is not a resident on Census Day.

Attachment 2. Control Cells for Linked E Sample

Race/Hispanic Origin Domain | Tenure | Linked situations (one control cell each)
--- | --- | ---
Domain 4 (Non-Hispanic Black) | Owner | 3., 4., 5.
Domain 4 (Non-Hispanic Black) | Nonowner | 3., 4., 5.
Domain 3 (Hispanic) | Owner | 3., 4., 5.
Domain 3 (Hispanic) | Nonowner | 3., 4., 5.
Domain 7 (Non-Hispanic White or Some Other Race); Domain 5 (Native Hawaiian or Pacific Islander); Domain 6 (Non-Hispanic Asian); Domain 1 (American Indian or Alaska Native On Reservation); Domain 2 (American Indian or Alaska Native Off Reservation) | Owner | 3., 4., 5.
Domains 7, 5, 6, 1, and 2 (as above) | Nonowner | 3., 4., 5.

Correlation Bias Adjustment Groupings and Factors

| Race/Hispanic origin domain | Age | DA sex ratio | Adjustment factor |
|---|---|---|---|
| Black: Domain 4 (Non-Hispanic Black) | 18-29 | 0.90 | 1.08 |
| | 30-49 | 0.89 | 1.10 |
| | 50+ | 0.76 | 1.05 |
| Non-Black: Domain 3 (Hispanic); Domain 7 (Non-Hispanic White or Some Other Race); Domain 5 (Native Hawaiian or Pacific Islander); Domain 6 (Non-Hispanic Asian); Domain 1 (American Indian or Alaska Native On Reservation); Domain 2 (American Indian or Alaska Native Off Reservation) | 18-29 | 1.04 | 1.00* |
| | 30-49 | 1.01 | 1.02 |
| | 50+ | 0.86 | 1.01 |

* This factor was set to 1.00 because of the inconsistency between the DA and A.C.E. Revision II results.
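The sketch below makes the lookup concrete by transcribing the table into a dictionary keyed by race group and age group. It assumes, as the use of DA sex ratios suggests, that each factor multiplies the dual system estimate for adult males in the corresponding grouping. This table does not restate how the factors are derived and applied, and the function names are hypothetical.

```python
BLACK_DOMAINS = frozenset({4})        # Domain 4: Non-Hispanic Black
ALL_DOMAINS = frozenset(range(1, 8))  # Domains 1-7 as listed in the table

# (race group, age group) -> adjustment factor, transcribed from the table.
ADJUSTMENT_FACTORS = {
    ("Black", "18-29"): 1.08,
    ("Black", "30-49"): 1.10,
    ("Black", "50+"): 1.05,
    ("Non-Black", "18-29"): 1.00,  # set to 1.00 per the table note
    ("Non-Black", "30-49"): 1.02,
    ("Non-Black", "50+"): 1.01,
}


def race_group(domain: int) -> str:
    """Map a race/Hispanic origin domain (1-7) to the table's two groups."""
    if domain not in ALL_DOMAINS:
        raise ValueError(f"unknown race/Hispanic origin domain: {domain}")
    return "Black" if domain in BLACK_DOMAINS else "Non-Black"


def age_group(age: int) -> str:
    """Map an age to the table's age groups; the table covers only ages 18+."""
    if age < 18:
        raise ValueError("the table gives factors only for ages 18 and over")
    if age <= 29:
        return "18-29"
    if age <= 49:
        return "30-49"
    return "50+"


def adjusted_male_estimate(dse: float, domain: int, age: int) -> float:
    """Scale an adult-male DSE by its grouping's factor (assumed application)."""
    return dse * ADJUSTMENT_FACTORS[(race_group(domain), age_group(age))]


if __name__ == "__main__":
    print(adjusted_male_estimate(10_000.0, domain=4, age=35))
    print(adjusted_male_estimate(10_000.0, domain=3, age=20))
```

Raising an error for ages under 18, instead of silently returning a factor of 1, keeps the sketch from implying a rule for children that the table does not state.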


Chapter 7.

Assessing the Estimates

INTRODUCTION

The evaluations of the A.C.E. Revision II estimates may be divided into two categories. One category contains the evaluations that focus on individual error components. The other consists of comparisons of the relative error of the census and of the A.C.E. Revision II estimator.

This chapter provides a brief description of the evaluation studies. The component errors examined by separate studies are sampling error, error from imputation model selection, error due to using inmovers to estimate outmovers in PES-C, synthetic error, error in the identification of census duplicates as determined by administrative records, error in the identification of computer duplicates as determined by a clerical review, error from inconsistent post-stratification variables, and potential error arising from the automated coding of some cases in the Revision Sample, called the at-risk coding. The comparisons of relative error between the census and the A.C.E. Revision II estimator include a comparison with Demographic Analysis, the construction of confidence intervals that account for bias as well as random error, and loss function analyses. Also in this category is an examination of the consistency of the estimates of coverage error measured by the A.C.E. Revision II estimator and by the Housing Unit Coverage Study (HUCS). Although an adjustment for correlation bias is included in the A.C.E. Revision II estimates, no evaluation addresses the error in the level of correlation bias or in the model used to distribute it across post-strata. The reason is that examining alternative models only accounts for differences among models. Those differences would reflect variations in how the several models correct the original DSEs for correlation bias, but would not reflect the presence or absence of correlation bias in the corrected DSEs.

SAMPLING ERROR

Sampling error gives rise to random error, which is quantified by the sampling variance. Sampling variance is present in any estimate based on a sample instead of the whole population. The variance estimation methodology is a simplified jackknife with the block clusters as the primary sampling units. The effect of within-cluster subsampling is implicitly captured in the weighting.

The March 2001 A.C.E. data showed that the simplified jackknife method produces satisfactory variance estimates. Since a correlation bias adjustment was included in the A.C.E. Revision II estimates, the adjustment for correlation bias was recalculated for each replicate. An alternative variance estimation procedure assumed that the correlation bias adjustment took the form of a scalar times the double-sampling estimator. The replication method also accounts for the A.C.E. block cluster sampling.

SYNTHETIC ERROR EVALUATION

The A.C.E. Revision II has several potential sources of synthetic error. One source is the use of error estimates computed at more aggregate levels to correct the individual post-stratum estimates, as in the corrections for correlation bias and measurement coding errors. However, the evaluation of synthetic error focuses on error in small area estimation.

Synthetic estimation bias arises when areas in a post-stratum have different coverage error rates but receive the same census coverage correction factor. To assess synthetic estimation bias for a given area, an estimate based on data from the area alone, called a direct estimate, must be developed. Such an estimate is possible only for large areas. In lieu of direct estimates, synthetic estimation bias in undercount estimates is estimated from analysis of artificial populations or surrogate variables whose geographic distributions are known. These surrogate variables are constructed, as far as possible, to have patterns similar to coverage error. Sensitivity analyses assess the impact of synthetic estimation bias for these variables.

The evaluation of synthetic error within post-strata uses an artificial population analysis similar to those conducted for ESCAP I and ESCAP II. These studies are documented in Griffin and Malec (2001, 2001b). This time, however, the evaluation compares the A.C.E. Revision II estimates and Census 2000. The study uses loss functions to assess the effect of synthetic error. The major products are:

  • Estimates of the bias in the difference between census loss and A.C.E. Revision II estimator loss.
  • An indicator of whether the decision to use the A.C.E. Revision II estimator would have changed because of synthetic error.

ERROR DUE TO USING INMOVERS TO ESTIMATE OUTMOVERS IN PES-C

The error due to using inmovers to estimate outmovers is unique to the PES-C model for dual system estimation used in the original A.C.E. and the A.C.E. Revision II. For the PES-C model, the members of the P sample are the residents of the housing units on Census Day. Identifying all the Census Day residents of all the housing units is difficult because some move before the A.C.E. interview. The A.C.E. interview relies on the respondents to identify those who have moved out, the outmovers. Since the outmovers are reported by proxies, many of them are not recorded, so the count of outmovers is too low. To avoid a bias caused by underestimating the number of movers, PES-C uses the number of inmovers to estimate the number of outmovers. The inmovers are those who did not live in the sample blocks on Census Day but moved in before the A.C.E. interview. In theory, the number of inmovers in the whole country should equal the number of outmovers. However, the two may differ within a post-stratum because of circumstances such as economic conditions causing more people to move out of an area than into it.

The first step of the methodology consists of raking the number of outmovers to the total number of inmovers. The distribution of the raked outmovers may describe the outmovers better than the distribution of the inmovers does. The A.C.E. Revision II estimates formed by using the number of inmovers are compared with the A.C.E. Revision II estimates calculated using the raked number.

ERROR FROM IMPUTATION MODEL SELECTION

This project estimates the uncertainty due to the choice of imputation model by drawing on the analysis of reasonable alternatives to the imputation model conducted in 2001. See Keathley et al. (2001) for details. The ideal approach would be to repeat this very time-consuming analysis of reasonable alternatives for the A.C.E. Revision II estimator; however, it was not conducted because of limited resources. Instead, an estimate of the additional variance due to the choice of imputation model is developed from the previous A.C.E. work. The study forms estimates of the variance component of the census coverage correction factors that accounts for missing data error, that is, error due to the imputation of enumeration status, residency status, and match status and to the P-sample noninterview adjustment. The replicates used to estimate this missing data variance are used in the loss function analysis to represent the random error due to the choice of imputation model for missing data.

EXAMINING THE QUALITY OF THE COMPUTER DUPLICATES WITH ADMINISTRATIVE RECORDS

Administrative records provide an opportunity to examine the quality of the estimates of duplicate enumerations used in the A.C.E. Revision II estimates. This study uses the Statistical Administrative Records System (StARS) 2000 (Leggieri et al., 2002; Judson, 2000) to assess the effectiveness of the automated methodology used in the Further Study of Person Duplication (FSPD) to identify duplicate enumerations. Secondary goals are to provide data that can be analyzed to determine the nature of census duplication, so that the information may be used in reducing census duplication in 2010, and to aid in evaluating the methodology for constructing StARS 2000. The study produces a comparison of the estimated amount of census duplication based on administrative records with the estimate from FSPD.

StARS is a new methodology that compiles seven administrative records files, including files from IRS, Medicare, HUD, and Selective Service.¹ The evaluation uses a previous match between the census and StARS 2000 to assign an Identification (ID) Number to as many census records as possible. The process of assigning ID Numbers was based on name and address. One pass through the census files used both the address and the name to assign ID Numbers; a second pass used only the name and birth date. A census record was assigned an ID Number only if it was linked with exactly one ID Number.

Census enumerations with the same ID Number are considered duplicates. The method accounts for coincidental agreement of names by assigning an ID Number only when exactly one ID Number was linked to the enumeration. In most cases, two people with very similar names and characteristics would have linked to each other's ID Numbers and so would not have been assigned a unique ID Number.

¹ The Census Bureau obtains administrative data for its StARS database as authorized by Title 13 U.S.C., section 6 and supported by provisions of the Privacy Act of 1974. Under Title 13, the Census Bureau is required to protect the confidentiality of all the information it receives directly from respondents or indirectly from administrative agencies and is permitted to use that information only for statistical purposes.

CLERICAL REVIEW OF COMPUTER DUPLICATES

The study examines the accuracy of the FSPD computer identification of census duplication by having clerks review the enumerations that the computer designates as duplicates. The clerks determine whether each pair of enumerations appears to be the same person. In addition, census enumerations identified as duplicates by administrative records, but not by the computer, are also reviewed clerically. These potential census duplicates are a by-product of the evaluation of the computer duplicates using administrative records. The review is restricted to duplicates between E-sample enumerations in the A.C.E. blocks and census enumerations outside the search area. Links between P-sample nonmatches and enumerations outside the search area are also reviewed.

The clerical review produces the following:

  • Number of E-sample enumerations with false duplicate links identified by the computer.
  • Number of E-sample enumerations with missed duplicates identified by administrative records that are correct.
  • Number of P-sample nonmatches with false duplicate links identified by the computer.
  • Number of P-sample nonmatches with missed duplicates identified by administrative records that are correct.

With these results, the accuracy rate of the computer identification of duplicates in the census, and between the P-sample nonmatches and the census, can be computed.

AT-RISK CODING

The study assesses the amount of error at risk because not every case in the Evaluation Follow-up (EFU) sample was reviewed clerically (Adams and Krejsa, 2002). The data collected in the Evaluation Follow-up of the A.C.E. revealed errors in the coding of E-sample census enumeration status and of P-sample residence and match status that needed to be corrected for the A.C.E. Revision II estimator. Ideally, this would have meant recoding the entire A.C.E. sample, but that was not possible because the Evaluation Follow-up collected data in only 2,259 of the 11,303 A.C.E. sample clusters. Even clerically recoding the 70,000 cases in the Evaluation Follow-up sample was not feasible because of time constraints. A new strategy was devised to provide the highest-quality data possible in the time allowed by restricting the clerical review to the more difficult cases. This strategy reduced the clerical workload to a feasible size of about 25,000 cases and ensured the largest possible sample for the A.C.E. Revision II estimates.

Since the Person Follow-up (PFU) and Evaluation Follow-up (EFU) questionnaires had been keyed and were available in electronic form, the data were combined using an algorithm based on the keyed data, plus clerical coding of the categories of cases where the computer did not appear to do a good job. The method compares the code assigned from the PFU questionnaire with the code assigned from the EFU questionnaire and then determines the best code. The effectiveness of the computer algorithm is assessed by the agreement between the two new codes and by a comparison with recodes assigned in the fall of 2001 to a subsample of the EFU E sample, called the Person Follow-up/Evaluation Follow-up (PFU/EFU) Review. The PFU/EFU Review is believed to have been the best A.C.E. coding operation.

For the P sample in the Evaluation Follow-up, a coding algorithm for the keyed data from the PFU and EFU questionnaires was also developed. Assessing its quality was not as easy for the nonmatches and unresolved cases as for the matches: although recodes from the PFU/EFU Review were available for the P-sample matches, none of the nonmatches or unresolved cases were included in that review.

The categories of cases not sent for clerical review had a high rate of agreement between the PFU and EFU codes assigned by the computer algorithm. For cases in these categories where the PFU and EFU codes disagreed, the selected code came from the form with more detailed information. Therefore, there are three types of cases in the estimation:

1. The PFU and EFU codes assigned by computer agree.
2. The PFU and EFU codes assigned by computer disagree, but the case is in a category where there is high consistency between the PFU and EFU codes, and either the PFU form or the EFU form does not have answers to all the questions. The code for the form with complete data is selected.
3. Clerically assigned codes.

The first group is called the at-risk cases. These cases may have a higher risk of error than the others because they lacked clerical review, even though the codes assigned by the computer algorithm agree. Cases in the second group may also have error, although they are in a category with high consistency between the PFU and EFU; for these cases, there is no way to assess the risk of error because one of the forms lacks information.

To assess the potential for error, the at-risk cases are assumed to have the same error rate as the cases in their category in the PFU/EFU Review. The potential impact is assessed by comparing the A.C.E. Revision II double-sampling adjustment factors with the double-sampling ratios computed under an assumption that incorporates these error rates. The double-sampling adjustment factors are described in Chapter 6.

INCONSISTENCY OF POST-STRATIFICATION VARIABLES

Inconsistency between the E- and P-sample reporting of the characteristics used to define the post-strata may create a bias in the dual system estimate (DSE). This bias affects the estimation of the P-sample match rate.

The analysis of the post-stratification variables for the A.C.E. Revision II estimator was similar to the investigation done for the original A.C.E. The basic approach was to estimate the inconsistency in the post-stratification variables using the matches, then assume that the same rates held for the nonmatches. The models used for the inconsistency analysis of the original A.C.E. post-strata, described in Haberman and Spencer (2001), were fitted in two steps: (1) models for the inconsistency of basic variables, and (2) derivation of inconsistency probabilities for the post-stratification given the inconsistency probabilities of the basic variables. The inconsistency probabilities led to an estimate of the bias in the P-sample match rate, which was used to estimate the bias in the DSE. The approach taken for the A.C.E. Revision II estimator is to recalculate the models in (1) and (2) to reflect the revisions in the P-sample post-stratification and to repeat the analysis.

To assess the bias due to inconsistency in the post-stratification variables, the A.C.E. Revision II estimates are calculated with a correction to the match rate for the inconsistency. Estimates with and without the correction are then compared.

CONSISTENCY BETWEEN THE A.C.E. REVISION II ESTIMATOR AND HUCS

The study examines the validity of the A.C.E. Revision II estimates by assessing the consistency between the results of the A.C.E. Revision II estimates and of the Housing Unit Coverage Study (HUCS) described in Barrett et al. (2001). Since the A.C.E. Revision II estimates could have been used in the post-censal estimates program, which uses the average household size in many calculations, it is important to consider the consistency between the A.C.E. Revision II estimates and the HUCS data.

The A.C.E. Revision II estimates census coverage of people, and HUCS estimates census coverage of housing units. Patterns in the differential coverage of demographic and geographic groups were examined. Similar patterns in the measures of change in census coverage between 1990 and 2000 for demographic and geographic groups are expected. However, if the census coverage error caused by missing whole households differs substantially from that caused by missing people within households, the differential coverage of people and of housing units may show different patterns.

If there are demographic or geographic groups where the differential coverage from the A.C.E. Revision II estimator and from HUCS is substantially different, the study attempts to determine whether the disagreement is a symptom of problems with the A.C.E. Revision II estimator or HUCS, or the result of legitimate differences in coverage.

RELATIVE ACCURACY OF THE CENSUS AND A.C.E. REVISION II ESTIMATOR USING DEMOGRAPHIC ANALYSIS

Demographic Analysis (DA) uses vital records, immigration statistics, and Medicare data to obtain an estimate of the population size. Since its methods are somewhat independent of the census, DA provides a way to assess the relative quality of the census and the A.C.E. Revision II. The consistency of the estimates of differential census coverage from the A.C.E. Revision II estimator and from DA is assessed for demographic groups.

Estimates of differential census coverage are compared by demographic characteristics, including race, sex, and age. The DA estimates of population size are not viewed with as much confidence as the DA estimates of differential coverage: DA does a better job of measuring differences in coverage between groups than of measuring population size.

In addition, sex ratios from the A.C.E. Revision II estimates and from DA are compared. The sex ratio is the ratio of males to females and provides a measure of the differential coverage of males and females, especially when calculated for race groups.

These comparisons are repeated with the 1990 Post-Enumeration Survey and DA estimates to provide a context for viewing the comparisons with the 2000 data. An assessment is conducted to determine whether both methods measure the same change in differential net undercounts from 1990 to 2000.

RELATIVE ACCURACY OF THE CENSUS AND THE A.C.E. REVISION II ESTIMATOR USING CONFIDENCE INTERVALS AND LOSS FUNCTION ANALYSIS

Two additional methods of assessing the relative accuracy of the census and the A.C.E. Revision II estimates are confidence intervals for the net undercount rate and a loss function analysis. Confidence intervals for net undercount rates are formed using estimates of net bias and variance. Since most of the data available on the quality of the original A.C.E. are incorporated in the A.C.E. Revision II estimates, the estimation of the net bias uses the data that were not included. In the loss function analysis, the mean squared error weighted by the reciprocal of the census count is used to estimate loss for levels and shares for counties and places across the nation and within state.

Confidence intervals that incorporate the net bias as well as the variance of the net undercount rate provide a method for comparing the relative accuracy of the census and the A.C.E. Revision II estimates. The net bias in the census coverage correction factor is estimated for each post-stratum. From the estimated bias and variance of each census coverage correction factor, the bias $B(\hat{U})$ and the variance $V$ of the estimated net undercount rate $\hat{U}$ are estimated. Then 95 percent confidence intervals for the net undercount rate are constructed as

$$\left(\hat{U} - B(\hat{U}) - 2\sqrt{V},\ \hat{U} - B(\hat{U}) + 2\sqrt{V}\right).$$

Since a net undercount rate of 0 corresponds to no adjustment of the census, one comparison of the relative accuracy of the census and the A.C.E. Revision II estimates is based on an assessment of whether the confidence intervals for the evaluation post-strata cover 0 and $\hat{U}$.
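The sketch below shows the bias-corrected interval and the coverage check for one evaluation post-stratum. It assumes that the estimates $\hat{U}$, $B(\hat{U})$, and $V$ are already in hand. The function names and the numbers in the example are illustrative only.

```python
import math


def bias_corrected_interval(u_hat: float, bias: float, variance: float) -> tuple[float, float]:
    """Approximate 95 percent interval: U_hat - B(U_hat) +/- 2 * sqrt(V)."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    center = u_hat - bias
    half_width = 2.0 * math.sqrt(variance)
    return center - half_width, center + half_width


def covers(interval: tuple[float, float], value: float) -> bool:
    """True if value lies inside the closed interval."""
    low, high = interval
    return low <= value <= high


if __name__ == "__main__":
    # Invented post-stratum: estimated net undercount rate 1.2 percent,
    # estimated net bias 0.4 percentage points, standard error 0.3 points.
    u_hat = 0.012
    ci = bias_corrected_interval(u_hat, bias=0.004, variance=0.003 ** 2)
    print("interval:", ci)
    print("covers 0 (no adjustment):", covers(ci, 0.0))
    print("covers U_hat:", covers(ci, u_hat))
```

Subtracting the estimated bias shifts the center of the interval, and the variance sets only its width. As a result, an interval can exclude $\hat{U}$ even when the variance is small.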

A loss function analysis for levels and shares compares the census and the A.C.E. Revision II estimator for counties and places across the nation and within state. The measure of accuracy used by the loss functions is the weighted mean squared error, with the weights set to the reciprocal of the census count for levels and to the reciprocal of the census share for shares. The groupings for the loss functions were selected for their potential use in the post-censal estimates. These groupings are:

  • Levels
      ◦ All states
      ◦ All counties with population of 100,000 or less
      ◦ All counties with population greater than 100,000
      ◦ All places with population at least 25,000 but less than 50,000
      ◦ All places with population at least 50,000 but less than 100,000
      ◦ All places with population greater than 100,000
  • Shares within state
      ◦ All counties
      ◦ All places
  • Shares within U.S.
      ◦ All places with population at least 25,000 but less than 50,000
      ◦ All places with population at least 50,000 but less than 100,000
      ◦ All places with population greater than 100,000
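The sketch below illustrates the weighted squared-error loss for levels and for shares. It replaces the expected values used in the evaluation, which are built from estimated bias and variance, with a hypothetical known target population. The area counts and function names are invented for illustration.

```python
from typing import Sequence


def _check_lengths(*columns: Sequence[float]) -> None:
    """All inputs must describe the same set of areas."""
    if len({len(c) for c in columns}) != 1:
        raise ValueError("all inputs must cover the same areas")


def level_loss(estimates: Sequence[float], targets: Sequence[float],
               census: Sequence[float]) -> float:
    """Sum over areas of (estimate - target)^2 / census count."""
    _check_lengths(estimates, targets, census)
    return sum((e - t) ** 2 / c for e, t, c in zip(estimates, targets, census))


def to_shares(values: Sequence[float]) -> list[float]:
    """Convert counts to shares of their total."""
    total = sum(values)
    return [v / total for v in values]


def share_loss(estimates: Sequence[float], targets: Sequence[float],
               census: Sequence[float]) -> float:
    """Same loss on shares, weighted by the reciprocal of each census share."""
    _check_lengths(estimates, targets, census)
    return level_loss(to_shares(estimates), to_shares(targets), to_shares(census))


if __name__ == "__main__":
    census = [1000.0, 2000.0, 3000.0]    # census counts for three invented areas
    ace = [1020.0, 2030.0, 3010.0]       # hypothetical adjusted estimates
    target = [1025.0, 2040.0, 3015.0]    # hypothetical "true" populations
    print("census level loss:", level_loss(census, target, census))
    print("A.C.E. level loss:", level_loss(ace, target, census))
    print("census share loss:", share_loss(census, target, census))
    print("A.C.E. share loss:", share_loss(ace, target, census))
```

The functions return totals over the areas in a grouping, not means. Both estimators are scored on the same areas, so dividing by the number of areas would not change which one has the smaller loss.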

Section II.

References

Adams, T. and Krejsa, E. (2001). ESCAP II: Results of the Person Followup and Evaluation Followup Forms Review, Executive Steering Committee for A.C.E. Policy II, Report 24.

Adams, T. and Krejsa, E. (2002). A.C.E. Revision II Measurement Subgroup Documentation, DSSD A.C.E. Revision II Memorandum Series #PP-6.

Adams, T. and Liu, X. (2001). ESCAP II: Evaluation of Lack of Balance and Geographic Errors Affecting Person Estimates, Executive Steering Committee for A.C.E. Policy II, Report 2.

Barrett, D., Beaghen, M., Smith, D., and Burcham, J. (2001). ESCAP II: Census 2000 Housing Unit Coverage Study, Executive Steering Committee for A.C.E. Policy II, Report 17.

Bean, S. (2001). ESCAP II: Accuracy and Coverage Evaluation Matching Error, Executive Steering Committee for A.C.E. Policy II, Report 7.

Cantwell, P. and Childers, D. (2001). Accuracy and Coverage Evaluation Survey: A Change to the Imputation Cells to Address Unresolved Resident and Enumeration Status, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-44.

Childers, D. (2001). Accuracy and Coverage Evaluation: The Design Document, DSSD Census 2000 Procedures and Operations Memorandum Series, Chapter S-DT-1, Revised.

Davis, P. (2001). Accuracy and Coverage Evaluation: Dual System Estimation Results, DSSD Census 2000 Procedures and Operations Memorandum Series #B-9*.

ESCAP I (2001). Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy, March 1, 2001. (See www.census.gov/dmd/www/pdf/Escap2.pdf)

ESCAP II (2001). Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy on Adjustment for Non-Redistricting Uses, October 17, 2001. (See www.census.gov/dmd/www/pdf/Recommend2.pdf)

Fay, R. (2001). ESCAP II: Evidence of Additional Erroneous Enumerations from the Person Duplication Study, Executive Steering Committee for A.C.E. Policy II, Report 9, Preliminary Version, October 26, 2001.

Fay, R. (2002). ESCAP II: Evidence of Additional Erroneous Enumerations from the Person Duplication Study, Executive Steering Committee for A.C.E. Policy II, Report 9, Revised Version, March 27, 2002.

Fay, R. (2002b). Probabilistic Models for Detecting Census Person Duplication, Proceedings of the Survey Research Methods Section, American Statistical Association.

Feldpausch, R. (2001). ESCAP II: Census Person Duplication and the Corresponding A.C.E. Enumeration Status, Executive Steering Committee for A.C.E. Policy II, Report 6.

Griffin, R. and Malec, D. (2001). Accuracy and Coverage Evaluation: Assessment of Synthetic Assumption, DSSD Census 2000 Procedures and Operations Memorandum Series #B-14*.

Griffin, R. and Malec, D. (2001b). ESCAP II: Sensitivity Analysis for the Assessment of the Synthetic Assumption, Executive Steering Committee for A.C.E. Policy II, Report 23.

Haberman, S. and Spencer, B. (2001). Estimation of Inconsistent Post-stratification in the 2000 A.C.E. Paper prepared by Abt Associates Inc. and Spencer Statistics, Inc. under Task Number 46-YABC-7-00001, Contract Number 50-YABC-7-66020.

Haines, D. (2001). Accuracy and Coverage Evaluation Survey: Computer Specifications for Person Dual System Estimation (U.S.) - Re-issue of Q-37, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-48.

Hogan, H. (1993). The 1990 Post-Enumeration Survey: Operations and Results, Journal of the American Statistical Association, 88, 1047-1060.

Hogan, H. (2002). Five Challenges in Preparing Improved Post Censal Population Estimates, DSSD A.C.E. Revision II Memorandum Series #PP-1.

Hogan, H., Kostanich, D., Whitford, D., and Singh, R. (2002). Research Findings of the Accuracy and Coverage Evaluation and Census 2000 Accuracy, Proceedings of the Survey Research Methods Section, American Statistical Association.

Ikeda, M. (2001). Accuracy and Coverage Evaluation Survey: Some Notes Related to Accuracy and Coverage Evaluation Missing Data Procedures, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-77.

Ikeda, M. and McGrath, D. (2001). Accuracy and Coverage Evaluation Survey: Specifications for the Missing Data Procedures; Revision of Q-25, DSSD Census 2000 Procedures and Operations Memorandum Series #Q-62.

Judson, D. (2000). The Statistical Administrative Records System: System Design and Challenges, Paper presented at the NISS/Telcordia Data Quality Conference, November 2000.

Keathley, D., Kearney, A., and Bell, W. (2001). ESCAP II: Analysis of Missing Data Alternatives for the Accuracy and Coverage Evaluation, Executive Steering Committee for A.C.E. Policy II, Report 12.

Kostanich, D. (2003). A.C.E. Revision II: Summary of Methodology, DSSD A.C.E. Revision II Memorandum Series #PP-35.

Krejsa, E. and Adams, T. (2002). Results of the A.C.E. Revision II Measurement Coding, DSSD A.C.E. Revision II Memorandum Series #PP-55.

Krejsa, E. and Raglin, D. (2001). ESCAP II: Evaluation Results for Changes in A.C.E. Enumeration Status, Executive Steering Committee for A.C.E. Policy II, Report 3.

Leggieri, C., Pistiner, A., and Farber, J. (2002). Methods for Conducting an Administrative Records Experiment in Census 2000, Proceedings of the Survey Research Methods Section, American Statistical Association.

Mule, T. (2001). ESCAP II: Person Duplication in Census 2000, Executive Steering Committee for A.C.E. Policy II, Report 20.

Mule, T. (2001b). Accuracy and Coverage Evaluation: Decomposition of Dual System Estimate Components, DSSD Census 2000 Procedures and Operations Memorandum Series #B-8*.

Mule, T. (2002). Revised Preliminary Estimates of Net Undercounts for Seven Race/Ethnicity Groupings, DSSD A.C.E. Revision II Memorandum Series #PP-2.

Mule, T. (2002b). Further Study of Person Duplication Statistical Matching and Modeling Methodology, DSSD A.C.E. Revision II Memorandum Series #PP-51.

Mulry, M. and Petroni, R. (2002). Error Profile for PES-C as Implemented in the 2000 A.C.E., Proceedings of the Survey Research Methods Section, American Statistical Association.

Nash, F. (2000). Overview of the Duplicate Housing Unit Operations, Census 2000 Information Memorandum Number 78.

Raglin, D. and Krejsa, E. (2001). ESCAP II: Evaluation Results for Changes in Mover and Residence Status in the A.C.E., Executive Steering Committee for A.C.E. Policy II, Report 16.

Robinson, J. G. (2001). ESCAP II: Demographic Analysis Results, Executive Steering Committee for A.C.E. Policy II, Report 1.

Thompson, J., Waite, P., and Fay, R. (2001). Basis of Revised Early Approximation of Undercounts Released October 17, 2001, Executive Steering Committee for A.C.E. Policy II, Report 9a.

U.S. Census Bureau (2003). Technical Assessment of A.C.E. Revision II, March 12, 2003. (See www.census.gov/dmd/www/pdf/ACETechAssess.pdf)

Winkler, W. (1995). Matching and Record Linkage, in Business Survey Methods, ed. B. G. Cox et al. (New York: J. Wiley and Sons), 355-384.

Winkler, W. (1999). Documentation for Record Linkage Software, U.S. Census Bureau, Statistical Research Division.

Yancey, W. (2002). BigMatch: A Program for Extracting Probable Matches from a Large File for Record Linkage, U.S. Census Bureau, Statistical Research Division.

DSSD/03-DM Accuracy and Coverage Evaluation of Census 2000: Design and Methodology, U.S. Census Bureau