ML12335A672
ML12335A672 | |
Person / Time | |
---|---|
Site: | Indian Point |
Issue date: | 03/28/2012 |
From: | Entergy Nuclear Operations, US Dept of Commerce, Bureau of Census |
To: | Atomic Safety and Licensing Board Panel |
SECY RAS | |
References | |
RAS 22094, 50-247-LR, 50-286-LR, ASLBP 07-858-03-LR-BD01 | |
United States Nuclear Regulatory Commission Official Hearing Exhibit ENT000016
In the Matter of: Entergy Nuclear Operations, Inc. (Indian Point Nuclear Generating Units 2 and 3)
Submitted: March 28, 2012
ASLBP #: 07-858-03-LR-BD01
Docket #: 05000247 | 05000286
Exhibit #: ENT000016-00-BD01
Identified: 10/15/2012
Admitted: 10/15/2012
Rejected:
Withdrawn:
Other:
Stricken:
Technical Assessment of A.C.E. Revision II
March 12, 2003
U.S. Census Bureau
Helping You Make Informed Decisions
Table of Contents
Technical Summary of A.C.E. Revision II for The Committee on National Statistics
- 1. Introduction
  - 1.1 Background
  - 1.2 Overview of the A.C.E. Revision II Process
  - 1.3 Census Errors
- 2. A.C.E. Revision II Results and Methodology
  - 2.1 Estimates of Census 2000 Coverage at the National Level
  - 2.2 Estimates of Census Coverage for Small Geographic Areas
  - 2.3 A.C.E. Revision II Estimation Methodology
- 3. Comparison to Demographic Analysis (DA)
- 4. Evaluation of A.C.E. Revision II Results
  - 4.1 Evaluation of the Identification of Duplicates
  - 4.2 Loss Functions and Confidence Intervals
- 5. Limitations of the A.C.E. Revision II Estimates
  - 5.1 Adjustment for Correlation Bias
  - 5.2 Underestimation of Duplicates by the A.C.E. Revision II Further Study of Person Duplication (FSPD)
  - 5.3 Alternative Approaches are Available for Tabulating Contributions to Correct Enumerations from E-Sample Cases with Duplicate Links
  - 5.4 Alternative Approaches are Available for Assigning Residency Probabilities to P-Sample Cases that Link to Census Cases Outside the Search Area
  - 5.5 Use of Different Post-strata for the E-Sample and P-Sample Could Either Reduce or Increase Synthetic Error
  - 5.6 Other Issues in Synthetic Estimation
- 6. Other Assessments of A.C.E. Revision II Results
  - 6.1 Measurement Coding
  - 6.2 Other Errors in Census Omissions
  - 6.3 Missing Data Models
  - 6.4 Coding of Mover Status Using Evaluation Data
  - 6.5 To What Extent are the Errors Noted Above Reflected in the Confidence Intervals and Loss Function Analyses
References
Attachment A
Technical Summary of A.C.E. Revision II for The Committee on National Statistics
This report summarizes the findings and limitations of the Accuracy and Coverage Evaluation (A.C.E.) Revision II research and analysis as set forth in the detailed research reports. The Census Bureau conducted the original A.C.E. as part of Census 2000 with the expectation that it could be used to adjust the Census 2000 results for all non-apportionment purposes if it improved the census data. The A.C.E. used Dual-System Estimation (DSE) to estimate the net coverage of Census 2000. The A.C.E. results were produced in March 2001, but evaluations and analyses of the estimates conducted through October 2001 indicated problems that precluded their use for any official purposes. The estimates were publicly released, under court order, in December 2002. The goal of the A.C.E. Revision II work carried out over the last year has been to examine these problems further in order to produce improved estimates of the net coverage of Census 2000. If sufficient improvements could be made in the net coverage estimates, the A.C.E. Revision II estimates could then potentially be used to improve the intercensal population estimates, adjusting them for the net coverage errors of Census 2000.
This work has now been completed and the Census Bureau is ready to issue its results and to determine whether or not to adjust the intercensal population estimates using the results of A.C.E. Revision II. This report neither documents that decision nor makes any recommendation about it; it is intended to document the results of the research.
Much of the research conducted for A.C.E. Revision II over the past year has been new and groundbreaking. While prior research and adjustment activities had focused on errors of omission, this research also accounts for errors of erroneous inclusion to improve estimates of net undercount. Based on the underlying premise of a very small net national undercount, this required a very careful accounting of both types of errors. Extensive research and analysis were conducted to reduce errors in the data and to develop new methodology so that corrections could be made to the DSE. New statistical matching and models for exact matching, which accounted for coincidental agreement, were developed to further enhance the detection of census duplicates. For the first time, extensive evaluation results were integrated into the DSE using double sampling ratio adjustments while accounting for overlap between evaluations. Additionally, new models were required to estimate probabilities of correct enumeration and probabilities of residency status on census day for those cases with a duplicate link. Some of the more important aspects of the work are discussed below.
A.C.E. Person Follow-up data were combined with evaluation follow-up data and a sample of the data was re-coded to produce improved estimates of census erroneous enumerations and census omissions. Further study of census duplication (a type of erroneous enumeration) resulted in improved identification of duplicates and potential non-residents included in the A.C.E. sample on Census Day. These results were integrated into DSEs by population subgroups called post-strata. New E-Sample post-strata (distinct from the P-Sample post-strata) were developed to account for additional variation in erroneous enumeration rates. Finally, the estimates thus obtained were further adjusted for correlation bias estimated using comparisons to sex ratios obtained from Demographic Analysis results. These efforts have not only addressed major problems with the March 2001 A.C.E. estimates, but have also yielded valuable insights into challenges of census coverage and the difficulties associated with coverage measurement. This new knowledge will be extremely useful in planning and carrying out the 2010 Census and an associated program of coverage measurement.
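For readers unfamiliar with dual-system estimation, the basic arithmetic for a single post-stratum can be sketched as follows. This is only a textbook form of the DSE, not the A.C.E. Revision II estimator, which adds the duplicate, residency, and correlation bias corrections described in this summary. The function names, argument names, the convention of expressing net undercount relative to the DSE, and all numbers are assumptions made for illustration.

```python
def dual_system_estimate(census_count, imputations,
                         e_total, e_correct,
                         p_total, p_matched):
    """Textbook dual-system estimate for one post-stratum (illustrative only).

    census_count : census count for the post-stratum
    imputations  : census records excluded from matching (e.g. whole-person imputations)
    e_total      : weighted E-Sample total
    e_correct    : weighted E-Sample correct enumerations
    p_total      : weighted P-Sample total
    p_matched    : weighted P-Sample persons matched to the census
    """
    if e_total <= 0 or p_total <= 0 or p_matched <= 0:
        raise ValueError("sample totals and matches must be positive")
    correct_enumeration_rate = e_correct / e_total   # estimated from the E-Sample
    match_rate = p_matched / p_total                 # estimated from the P-Sample
    return (census_count - imputations) * correct_enumeration_rate / match_rate


def net_undercount_percent(census_count, dse):
    """Percent net undercount relative to the DSE; negative means net overcount."""
    return 100.0 * (dse - census_count) / dse


# Hypothetical inputs, chosen only to show the arithmetic.
dse = dual_system_estimate(census_count=1000, imputations=20,
                           e_total=500, e_correct=475,
                           p_total=400, p_matched=380)
print(round(dse, 1), round(net_undercount_percent(1000, dse), 2))
```

In this form, a lower correct enumeration rate pulls the estimate down and a lower match rate pushes it up, which is why errors in either sample feed directly into the net coverage estimate.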
The totality of the evaluation and revision work performed leads to the conclusion that the A.C.E. Revision II estimates of the net coverage of Census 2000 are dramatically superior to the March 2001 A.C.E. estimates, and also represent significant improvements to the revised preliminary A.C.E. estimates issued in October 2001 (which were also limited in detail). The A.C.E. Revision II methodology, however, has some important limitations that lead to sources of error in the estimates, which are discussed below under question 4. The findings and limitations of the A.C.E. Revision II estimates are discussed in the remainder of this summary in the context of the following five questions:
- 1. What do the A.C.E. Revision II estimates say about net coverage in Census 2000?
- 2. Did the A.C.E. Revision II estimates address major problems with the March 2001 estimates?
- 3. What do evaluations say about the quality of the A.C.E. Revision II estimates?
- 4. What are the main limitations of the A.C.E. Revision II estimation methodology?
- 5. What do the A.C.E. Revision II results tell us about Census 2010 planning?
The full report provides additional background, more information on the A.C.E. Revision II methodology, and further discussion of the findings and limitations.
- 1. What Do the A.C.E. Revision II Estimates Say About Net Coverage in Census 2000?
The A.C.E. Revision II estimates suggest that Census 2000 produced a net overcount of the population. This finding is in contrast to the measures of net coverage for previous censuses and indeed contrasts dramatically with the March 2001 A.C.E. estimates as well. The estimates also suggest that Census 2000 reduced dramatically the differential net coverage between race groups seen in previous censuses. This latter finding is very consistent with findings of Demographic Analysis and indeed with the March 2001 A.C.E. estimates.
Some specific findings from the A.C.E. Revision II estimates are as follows:
- A.C.E. Revision II estimated that Census 2000 resulted in a net overcount of the total household population of about one-half of one percent.
- A.C.E. Revision II estimated a net overcount of 1.13 percent for non-Hispanic Whites, but a net undercount of 1.84 percent for non-Hispanic Blacks. Both of these estimates were found to be significantly different from zero.
- Net coverage estimates for all other race/Hispanic origin domains were not statistically different from zero for the A.C.E. Revision II estimates.
- A.C.E. Revision II estimated a net overcount of 1.25 percent for owners and a net undercount of 1.14 percent for non-owners. Non-owners showed differential net coverage estimates for every race/Hispanic origin domain with the exception of American Indians on reservations and Native Hawaiians and Other Pacific Islanders. These two domains are relatively small and their estimates of net coverage have large sampling errors.
- There were also differences in estimated census net coverage across age/sex groups. Statistically significant net overcounts were estimated for children aged 10-17, and for adult females aged 18-29, 30-49, and 50 and over, as well as for males aged 50 and over. Statistically significant net undercounts were estimated for males aged 18-29, and 30-49. The net coverage estimate for children aged 0-9 was a net overcount but not significantly different from zero.
In general, the A.C.E. Revision II findings are dramatically different from the March 2001 A.C.E. results. A.C.E. Revision II found a one-half of one percent net overcount of the total household population, whereas the March 2001 A.C.E. estimated a 1.18 percent net undercount. This difference reflects the correction of major errors in the March 2001 A.C.E. estimates. This further supports the October 2001 ESCAP II recommendation that the March 2001 A.C.E. estimates not be used for any official purposes. The A.C.E. Revised Preliminary estimates released initially in October 2001 are much closer to the A.C.E. Revision II estimates than are the March 2001 A.C.E. estimates. The A.C.E. Revised Preliminary estimates showed only a 0.06 percent net undercount of the household population. These preliminary estimates only corrected for errors in estimating census erroneous enumerations, whereas the A.C.E. Revision II estimates refined these corrections and also corrected for errors in estimating census omissions. Tables 1 and 2 in the appendix show key comparisons between these three sets of coverage estimates. The coverage estimates for the 1990 Census are also included.
The coverage discussed above is net census coverage. It is this coverage that is directly measured by the dual-system estimates. The gross errors of Census 2000, as with all previous censuses, are much larger than the net coverage errors. The DSE methodology does not directly provide estimates of gross census errors unless further assumptions are made. However, underlying research and evaluations can be used to provide some insight about the level of gross census errors.
The A.C.E. Revision II further study of census duplication estimated 5.8 million census duplicates, a figure which should be considered a lower bound on the true number of duplicates. In addition, the census contained other erroneous enumerations not caused by duplication such as fictitious enumerations or non-residents of the United States. To the extent that the erroneous inclusions were in the same geographic area and had the same relevant characteristics as the omitted persons, they were accurate predictions of persons who should have been counted and they improved census accuracy. On the other hand, to the extent that they resided in different areas or had different characteristics, they represented errors and they reduced census accuracy. The differential net coverage estimated by A.C.E. Revision II indicates that census erroneous enumerations did not exactly offset census omissions.
Also, the census counts included 1.2 million count imputation persons. The Census Bureau believes that this imputation improved the overall accuracy of the census. However, while some of these imputations would represent persons that should have been enumerated, others would represent gross errors, as when persons were imputed into truly vacant housing units or when no persons were imputed into truly occupied housing units.
- 2. Did the A.C.E. Revision II Estimates Address Major Problems with the March 2001 Estimates?
The major issues raised with the March 2001 estimates by the ESCAP and ESCAP II Committees have been for the most part addressed by the work leading up to the release of the A.C.E. Revision II estimates. Limitations of the A.C.E. Revision II methodology relevant to some of these issues are discussed later under questions 3 and 4.
- Underestimation of Erroneous Enumerations In October 2001, the ESCAP II Committee concluded that the March 2001 A.C.E. estimates had overstated the net undercount by at least 3 million persons due to significant underestimation of erroneous enumerations, many of which were census duplicates. The A.C.E. Revision II estimates incorporate revised estimates of census erroneous enumerations that use results from a new study of census duplication as well as results from a re-coding of a sample of records using both A.C.E. and evaluation follow-up data. A.C.E. Revision II estimates 4.7 million (rather than the at least 3 million reported earlier) more erroneous enumerations than were estimated by the March 2001 estimates.
- Differences with Demographic Analysis The ESCAP Committee was concerned that unexplained inconsistencies between the March 2001 A.C.E. estimates and estimates from Demographic Analysis raised the possibility of an as-yet undiscovered problem in the March 2001 A.C.E. methodology.
The Revision II estimates and the Demographic Analysis estimates are reasonably consistent at the national level, eliminating the previous concern. This consistency was enhanced by the application of correlation bias adjustment to the A.C.E. Revision II estimates for adult males, since these adjustments forced adult sex ratios of the A.C.E. Revision II estimates to agree exactly with sex ratios obtained using Demographic Analysis results. Estimates for adult females, however, were reasonably consistent, even though their estimates were not adjusted for correlation bias.
The primary exception to the consistency of results occurs for children aged 0-9.
While the A.C.E. Revision II estimates a small net overcount for children 0-9 (the estimate was not statistically significantly different from zero), Demographic Analysis estimated a net undercount of 2.56 percent. The Demographic Analysis estimate for this age group is more accurate than those for other age groups because the estimate for young children depends primarily on recent birth registration data which are believed to be highly accurate.
- Estimates of Omissions The March 2001 A.C.E. estimates clearly overstated census omissions from the P-Sample. The re-coding operations conducted as part of the A.C.E. Revision II process identified P-Sample cases that were mis-coded in regard to whether or not they were Census Day residents. The Census Day residency status was questionable both for some of the P-Sample matches and for some of the P-Sample non-matches. In addition, computer matching algorithms identified P-sample cases, both matches and nonmatches, that linked to census enumerations outside the A.C.E. search area, raising questions about whether these cases were Census Day residents of their A.C.E. sample blocks. The A.C.E. Revision II estimates were adjusted accordingly using the re-coded data and the results from the computer matching algorithms, and are believed to be much more accurate.
- Total Error Model and Loss Function Analysis The ESCAP II Committee noted that there was not sufficient time by October 2001 to develop a new total error model that accounted for all the errors discovered from evaluations of the March 2001 A.C.E. estimates. This situation has now changed. Since the A.C.E. Revision II estimates incorporate adjustments for errors based on evaluation results, most of these errors have been removed from the new estimates to the extent that we can assess them. This left less error to be accounted for in the error model used for the loss function analysis. Thus, the new error model accounted for some of the remaining errors. Other errors could not be accounted for due to lack of data, and were thus effectively assumed to be negligible. It is unknown whether or not significant errors in the A.C.E. Revision II estimates still exist, but if they do, they are not accounted for in the current model. While the revised model used in the loss function analysis (results of which are discussed below) favored the A.C.E. Revision II estimates over the Census 2000 numbers, these results should be interpreted in the context of the important limitations of the analysis. Both results and limitations are discussed below.
- Missing Data The level of missing data from the re-coding operation was comparable to that in the March 2001 A.C.E. and the 1990 PES. The A.C.E. Revision II missing data models are thought to be of higher quality than those used for the March 2001 A.C.E. estimates because the imputation cells rely on more information and more detailed questionnaire responses. Initially there was considerable concern about conflicting cases since there was not an appropriate donor pool for them that could be used for imputation. These cases result from situations where two apparently good and equal caliber interviews were obtained, but gave contradictory information. Special procedures were developed as part of the A.C.E. Revision II process where expert analysts were able to reduce the number of conflicting cases to very low levels.
- 3. What Do Evaluations Say About the Quality of the A.C.E. Revision II Estimates?
Evaluations were performed on the A.C.E. Revision II estimates to estimate their bias (systematic error) and variance (random error). The evaluations of bias were relatively limited because data that previously were used to estimate biases in the March 2001 estimates were used in the production of the A.C.E. Revision II estimates to correct for the major biases. The limited data available for evaluation of bias does not itself reflect negatively on the A.C.E. Revision II estimates; in fact, it is because of the corrections for these major errors that we believe the A.C.E. Revision II estimates to be of much higher quality than the March 2001 A.C.E. estimates. Nevertheless, although the evaluations do account for the variance arising from the corrections for bias, the corrections for bias in the A.C.E. Revision II estimates may themselves be subject to bias, the magnitude of which has not been quantified. This is particularly true for the corrections for correlation bias and for P-Sample cases that matched census enumerations outside the A.C.E. search area.
Loss function analysis examined the comparative accuracy of the A.C.E. Revision II estimates and the census for population levels and population shares for groupings of geographic areas such as states, counties, and places. The loss function analysis indicated that A.C.E. Revision II was more accurate for both shares and levels for all groupings considered, with the exception that the census produced more accurate estimates of levels for places with population of at least 100,000. More research is needed to understand the reason for the one exceptional result. The validity of the loss function analysis depends on the quality of the estimates of components of error in the A.C.E. Revision II, and some of those components are not quantified. The resulting implications on the loss function analysis are discussed below.
The measure of accuracy used by the loss functions was weighted mean squared error, with weights set inversely proportional to the census counts. Mean squared error equals the sum of variance and squared bias, and the bias and variance estimates account for both sampling and nonsampling errors. Of course, the bias and variance estimates will themselves have errors. The effect of omitting a variance component (if the corresponding error is uncorrelated with other random effects) would be to overstate the accuracy of the A.C.E. Revision II estimate and to understate the accuracy of the census, but we have not identified significant omitted variance components. In general, we cannot be certain whether omitted biases will tend to make any given loss function analysis overstate or understate the comparative accuracy of the A.C.E. Revision II estimates relative to the census. Whether omitted biases cause the loss function to favor the census or the A.C.E. Revision II depends on the signs of the correlations between the omitted biases and the expected undercount rate for the areas considered.
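The accuracy measure described above can be sketched in a few lines. This is a minimal illustration of a weighted mean squared error loss with weights inversely proportional to census counts; the function name, the per-area inputs, and all numbers are hypothetical, and the treatment of the census as having zero variance in the example is an assumption made here, not something stated in this summary.

```python
def weighted_mse_loss(census_counts, biases, variances):
    """Sum over areas of (variance + bias**2) / census_count.

    All three inputs are per-area sequences of equal length.
    """
    if not (len(census_counts) == len(biases) == len(variances)):
        raise ValueError("inputs must have the same length")
    loss = 0.0
    for count, bias, variance in zip(census_counts, biases, variances):
        if count <= 0:
            raise ValueError("census counts must be positive")
        mse = variance + bias ** 2      # mean squared error for this area
        loss += mse / count             # weight is 1 / census count
    return loss


# Hypothetical two-area comparison.
areas = [100, 200]
census_loss = weighted_mse_loss(areas, biases=[10, -20], variances=[0, 0])
adjusted_loss = weighted_mse_loss(areas, biases=[2, -4], variances=[30, 60])
print(census_loss, adjusted_loss)
```

Because the bias enters squared, an omitted bias component can move either loss in either direction depending on whether it adds to or offsets the bias already included, which is consistent with the caution expressed above.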
The loss function analysis accounted for some but not all error components that could be identified in the A.C.E. Revision II estimates. More specifically, the bias estimate included error components for inconsistency of post-stratification assignments based on census versus A.C.E. data, for error from estimating the numbers of outmovers by the numbers of inmovers, and most importantly, for error in the estimates of census duplicates (although evaluations indicate that this last error may have been mis-estimated). The variance estimate included sampling error components from both phases of sampling in A.C.E. Revision II estimates, and also random nonsampling error components from choice of imputation models and for models used to account for P-Sample cases that matched census enumerations outside the search area.
On the other hand, the loss function analysis did not account for the following errors: synthetic estimation error; bias from response error and coding error in P-Sample residency status, match status, and mover status; bias from response error and coding error in E-Sample correct enumeration status; bias in correlation bias adjustments to the estimates due to error in the Demographic Analysis sex ratios and to the choice of model used to implement the adjustments; and bias due to the choice of model used to adjust the dual system estimator for E-Sample cases with duplicate links.
Though not included in the loss functions, the effects of synthetic error were investigated. One source of synthetic error involves correcting the individual post-stratum estimates for errors estimated at more aggregate levels (such as the corrections for correlation bias and coding errors). Two of the variance components noted above (those related to choice of imputation models and to accounting for P-Sample cases matching to census enumerations outside the search area) were included in the loss functions, but these components reflect the level of these errors, not the synthetic errors from such corrections. Errors from other such corrections, such as the adjustments for correlation bias, also were not reflected. Another source of synthetic error is variation in census coverage within post-strata (something not captured by synthetic application of post-stratum coverage correction factors for specific areas). Analyses based on artificial populations that simulated patterns of coverage variation within post-strata were done to assess whether omission of resulting synthetic biases from the loss function analysis tilted the comparisons in one direction or another. These analyses did not change the overall loss function findings, though we recognize that the analyses were not conclusive. It should be kept in mind that synthetic error is expected to be more important for smaller areas. Any limitations of the loss functions regarding synthetic error are expected to be more important when comparing small places or counties than for large places or counties.
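The synthetic application of coverage correction factors mentioned above can be illustrated with a short sketch. It assumes that each post-stratum has a single coverage correction factor (taken here as the ratio of the post-stratum estimate to its census count) and that an area's estimate is the sum of its census counts by post-stratum multiplied by those factors. The function name, post-stratum labels, and numbers are hypothetical.

```python
def synthetic_estimate(area_counts, correction_factors):
    """Synthetic estimate for one area.

    area_counts        : dict mapping post-stratum label -> census count in the area
    correction_factors : dict mapping post-stratum label -> coverage correction factor
    """
    missing = set(area_counts) - set(correction_factors)
    if missing:
        raise KeyError(f"no correction factor for post-strata: {sorted(missing)}")
    return sum(count * correction_factors[ps] for ps, count in area_counts.items())


# Hypothetical area with two post-strata.
area = {"owner": 1000, "non_owner": 500}
factors = {"owner": 0.99, "non_owner": 1.02}
print(round(synthetic_estimate(area, factors), 1))
```

Every person in a post-stratum receives the same factor regardless of where they live, so any variation in coverage within a post-stratum is not captured. That is the synthetic error discussed above, and it matters more as areas get smaller.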
Although the loss function analysis incorporated all the components of error for which estimates were available, other biases in the A.C.E. Revision II estimates were not included, as described above. Although we cannot ascertain whether omitted biases cause the loss functions to favor the census or the A.C.E. Revision II, sensitivity analyses could partially address this issue by examining the effect on the loss functions of different amounts and distributions of unaccounted-for error. Such sensitivity analyses could lead to assessment of the amounts and distributions of error needed to change the directions of the indications from the loss function analysis.
In summary, when viewing the results of the loss function analysis, one must keep the assumptions and limitations in mind, as well as realize that the effect of any omitted biases could be in either direction (increasing or decreasing the estimate of the relative accuracy of the census versus the A.C.E. Revision II estimates). While the loss function evaluations suggest the superiority of the A.C.E. Revision II estimates, concerns do remain about whether the bias estimates used in the loss function analysis are of sufficient quality to assure the correctness of the results.
- 4. What are the Main Limitations of the A.C.E. Revision II Estimation Methodology?
The A.C.E. Revision II estimation methodology has some limitations, and uncertainty remains about certain methodology decisions that had an appreciable impact on the results. Some of these uncertainties are reflected in the loss function analysis mentioned above, but others are not.
The main limitations and associated uncertainties are:
- Post-stratification New post-strata were created for the E-Sample to better explain variation in correct enumeration rates. These new post-strata were different from those used for the P-Sample to estimate census omission rates. In particular, census proxy status was used as an E-Sample post-stratification factor, as proxy status was determined to be negatively correlated with correct enumeration. Use of the new post-strata may have reduced synthetic error for small geographic areas, but it also resulted in some extreme coverage correction factors for some combined E- and P-Sample post-strata, especially those related to proxy status. We do not know that these extreme coverage correction factors are incorrect, as census coverage for small areas can also be highly variable. We do recognize, however, that separately stratifying the E- and P-Samples raises a technical issue that could lead to a systematic bias in the direction of overcount estimates for places with unusually large proportions of proxy respondents.
- Correlation Bias Adjustments of the A.C.E. Revision II estimates for correlation bias removed a significant source of error, but with certain limitations: (1) The adjustments assumed estimates for adult females were unbiased, and compared sex ratios obtained from A.C.E. Revision II with those obtained from Demographic Analysis results to estimate correlation bias adjustments for adult males. This approach would not provide a good approximation to patterns of correlation bias that involve substantial correlation bias for adult females. (2) No correlation bias adjustments were made for children. (3) Different correlation bias models could have been used that provided identical fits to the available data, but that produced different sub-national results. The model chosen assumed constant relative correlation bias within post-strata, and was the simplest possible and probably had the lowest variances (though we have not yet been able to compare variances). (4) The demographic detail of the adjustments was limited (by that of the Demographic Analysis data used) to Black versus non-Black race groups by age. While it is possible that correlation bias differs according to race/Hispanic origin beyond the Black versus non-Black distinction of DA, or according to other factors (e.g., renter versus owner), we had no data to detect such differentials in correlation bias by race/Hispanic origin within the non-Black group. (5) Data for non-Blacks aged 18-29 did not permit estimation of correlation bias for this group, suggesting possible problems with the underlying assumptions (e.g., possible bias in estimates for non-Black females aged 18-29), or with the Demographic Analysis or A.C.E. Revision II data for this group. In general, given limitations of data and assumptions, we could better estimate correlation bias for Blacks than for non-Blacks.
- P-Sample Links and Residency Status The duplicate study identified P-Sample persons coded as non-mover residents who linked to a census enumeration outside the A.C.E. search area. The study could not, however, determine which location was the correct Census Day residence. This was addressed by assigning to the P-Sample record a probability that its location was the person's Census Day residence. Only limited data were available from which to develop these probabilities, so they were developed from correct enumeration probabilities developed for the E-Sample cases with duplicate links for comparable situations. Other reasonable alternatives for assigning residency status probabilities are reflected in the loss function analysis.
- Underestimation of Duplicates The identification of census duplicates was conservative in the sense that it probably resulted in underestimation of the number of duplicates. This leads to some error in the A.C.E. Revision II estimates. We considered, but rejected, applying an efficiency adjustment to account for missed duplicates. The duplicates within the A.C.E. clusters and their characteristics were studied. This study showed that, for those groups where it appeared that a large percentage of duplicates were missed within the cluster, the mechanism causing those duplicates, such as misdelivery of census mail forms, was unlikely to be the one causing duplication outside the cluster.
- Estimating Correct Enumerations Among Duplicates The person duplication study identified E-Sample cases with a link to a census enumeration outside the A.C.E. search areas. In most cases, we did not feel we could determine which census record of a duplicate pair was correct. Assuming that such duplicate records are the same person and that one of the two census records is a correct enumeration, it is reasonable to expect that half of these E-Sample links are correct enumerations and half are erroneous. There were some exceptions where we felt we could determine which member of a duplicate pair was correct, such as links to census group quarters residents (such as college dorm residents). Taking the exceptions into account, probabilities of correct enumeration were assigned to maintain the expected overall proportion of correct enumerations among the duplicate links. For national totals, it does not matter which member of a duplicate pair is the correct enumeration, but this does affect post-stratum estimates, and hence subnational estimates.
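The assignment described above can be illustrated with a minimal sketch. It assumes a hypothetical set of E-Sample duplicate links in which a few exception cases (for example, links to group quarters records) have a known status, and the remaining links share a common probability chosen so that the expected overall proportion of correct enumerations is one half. The function name, the input layout, and the rule for spreading the remainder are illustrative assumptions, not the production procedure.

```python
def assign_ce_probabilities(known, n_links, target=0.5):
    """Assign correct-enumeration probabilities to duplicate links.

    known   : dict mapping link index -> fixed probability (exception cases)
    n_links : total number of duplicate links
    target  : expected overall proportion of correct enumerations (assumed 0.5)
    """
    n_free = n_links - len(known)
    if n_free == 0:
        return [known[i] for i in range(n_links)]
    # Probability for the remaining links so the overall expectation hits the target.
    p_free = (target * n_links - sum(known.values())) / n_free
    if not 0.0 <= p_free <= 1.0:
        raise ValueError("exceptions are incompatible with the target proportion")
    return [known.get(i, p_free) for i in range(n_links)]


# Hypothetical example: 10 links, three of which were resolved as exceptions.
probs = assign_ce_probabilities({0: 0.0, 1: 1.0, 2: 0.0}, n_links=10)
print(sum(probs))  # expected number of correct enumerations among the 10 links
```

In this example the seven unresolved links each receive 4/7, so the expected number of correct enumerations remains half of the ten links.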
- 5. What Do the A.C.E. Revision II Results Tell Us About Planning for the 2010 Census?
The A.C.E. Revision II estimates will be invaluable tools for planning the 2010 Census, focusing and supporting 2010 Census research and design. The Census 2000 Testing, Experimentation, and Evaluation Program is currently engaged in assessing Census 2000 operations and results, measuring the effectiveness and impact on data quality of the Census 2000 design, operations, systems, and processes. The A.C.E. Revision II work will feed into 2010 plans for this program and further their goals. The A.C.E. Revision II work will inform the 2010 Census research, development and testing program and provide information for the Master Address File and other Census Bureau surveys.
While many elements of the A.C.E. Revision II work will feed into the 2010 Census planning process, several areas of additional research and possible testing are immediately suggested:
First, planning for the 2010 Census will clearly be informed by the A.C.E. Revision II work on census erroneous enumerations, particularly duplicates. The extent of undetected duplication in prior censuses is unclear, as previous censuses did not capture name information to allow duplicate detection by name (and birth date) matching. Much of the research in this area conducted by Census Bureau analysts over the past year has been new and groundbreaking. Clearly efforts should be made in the direction of preventing duplicates from occurring, as well as investigating ways to determine which member of a duplicate pair is correct.
Second, the A.C.E. Revision II work further highlights the need for research into the Census Bureau's residence rules. Decennial censuses use the concept of usual residence to determine where to count each individual, the goal being to count everyone once, only once, and in the correct location. The residence rules proved problematic for several groups in Census 2000, and may have introduced error. Notable difficult situations involve college students, children in joint custody, and individuals with more than one residence. Cognitive research and testing of simplified, more understandable residence rules will be part of the 2010 Census research, development and testing program.
Third, the A.C.E. Revision II work confirms that proxy data is highly error-prone.
Significant research and testing will be devoted to minimizing error in the 2010 Census caused by proxy data. Clearly census operations should be designed to limit the introduction of proxy data in the first place, and systems should be developed to improve the quality of the proxy data, when proxy data must be used.
The A.C.E. Revision II work will help build the foundation for making early and informed decisions about the role and scope of the 2010 Census in the federal statistical system and its interaction with the Master Address File. This work provides critical analysis and information for Census Bureau planning and implementation of decisions for the 2010 Census.
Appendix Table 1: Percent Net Undercount for Major Groups

| Characteristic | A.C.E. Revision II Est. (%) | A.C.E. Revision II S.E. (%) | A.C.E. Revised Preliminary Est. (%) | A.C.E. Revised Preliminary S.E. (%) | A.C.E. March 2001 Est. (%) | A.C.E. March 2001 S.E. (%) | 1990 PES Est. (%) | 1990 PES S.E. (%) |
|---|---|---|---|---|---|---|---|---|
| Total | -0.49 | 0.20 | 0.06 | 0.18 | 1.18 | 0.13 | 1.61 | 0.20 |
| Race/Hispanic Origin Domain | | | | | | | | |
| Non-Hispanic White* | -1.13 | 0.20 | -0.33 | 0.21 | 0.67 | 0.14 | 0.68 | 0.22 |
| Non-Hispanic Black | 1.84 | 0.43 | 0.78 | 0.45 | 2.17 | 0.35 | 4.57 | 0.55 |
| Hispanic | 0.71 | 0.44 | 1.25 | 0.54 | 2.85 | 0.38 | 4.99 | 0.82 |
| Non-Hispanic Asian** | -0.75 | 0.68 | -0.31 | 0.91 | 0.96 | 0.64 | 2.36 | 1.39 |
| Hawaiian or Pacific Isl** | 2.12 | 2.73 | 4.64 | 2.79 | 4.60 | 2.77 | 2.36 | 1.39 |
| AI on Reservation*** | -0.88 | 1.53 | 3.44 | 1.60 | 4.74 | 1.20 | 12.22 | 5.29 |
| AI off Reservation*** | 0.62 | 1.35 | 3.44 | 1.60 | 3.28 | 1.33 | 0.68 | 0.22 |
| Tenure | | | | | | | | |
| Owner | -1.25 | 0.20 | n/a | n/a | 0.44 | 0.14 | 0.04 | 0.21 |
| Non-Owner | 1.14 | 0.36 | n/a | n/a | 2.75 | 0.26 | 4.51 | 0.43 |
| Age/Sex | | | | | | | | |
| 0 - 9**** | -0.46 | 0.33 | n/a | n/a | 1.54 | 0.19 | 3.18 | 0.29 |
| 10 - 17**** | -1.32 | 0.41 | n/a | n/a | 1.54 | 0.19 | 3.18 | 0.29 |
| 18 - 29 Male | 1.12 | 0.63 | n/a | n/a | 3.77 | 0.32 | 3.30 | 0.54 |
| 18 - 29 Female | -1.39 | 0.52 | n/a | n/a | 2.23 | 0.29 | 2.83 | 0.47 |
| 30 - 49 Male | 2.01 | 0.25 | n/a | n/a | 1.86 | 0.19 | 1.89 | 0.32 |
| 30 - 49 Female | -0.60 | 0.25 | n/a | n/a | 0.96 | 0.17 | 0.88 | 0.25 |
| 50+ Male | -0.80 | 0.27 | n/a | n/a | -0.25 | 0.18 | -0.59 | 0.34 |
| 50+ Female | -2.53 | 0.27 | n/a | n/a | -0.79 | 0.17 | -1.24 | 0.29 |

The A.C.E. Revision II, the A.C.E. Revised Preliminary, and the A.C.E. March 2001 net undercount are for the household population.
The 1990 net undercount is for the PES universe which included noninstitutional, nonmilitary group quarters in addition to the household population. The results from the Committee on Adjustment of Post-censal Estimates (CAPE) are total population estimates. As a result, the 1990 estimates may differ from the CAPE results. See Bryant et al. (1992) and Thompson (1992).
* For 1990, AI off Reservation was included in the Non-Hispanic White Race/Hispanic Origin Domain. Therefore, the net undercount and standard error for these domains are identical.
** For 1990, Asian or Pacific Isl. was a single Race/Hispanic Origin Domain. Therefore, for Non-Hispanic Asian and for Hawaiian or Pacific Isl, the net undercount and standard error are repeated.
*** For the A.C.E. Revised Preliminary estimates, American Indian and Alaskan Native was a single Race/Hispanic Origin Domain. Therefore, for AI on Reservation and for AI off Reservation, the net undercount and standard error are identical.
**** For March 2001 and for the 1990 PES, the 0 - 17 Age/Sex group was a single group. Therefore, the net undercount and standard error for children 0 - 9 and 10 - 17 are identical.
A negative net undercount denotes a net overcount.
n/a means Not Available.
Appendix Table 2: Net Undercount Estimates for Major Groups (in thousands)

| Characteristic | Census 2000 | A.C.E. Revision II Est. | A.C.E. Revision II S.E. | A.C.E. March 2001 Est. | A.C.E. March 2001 S.E. | 1990 PES Est. | 1990 PES S.E. |
|---|---|---|---|---|---|---|---|
| Total | 273,587 | -1,332 | 542 | 3,262 | 378 | 3,994 | 488 |
| Race/Hispanic Origin Domain | | | | | | | |
| Non-Hispanic White* | 192,924 | -2,151 | 382 | 1,302 | 272 | 1,277 | 417 |
| Non-Hispanic Black | 33,470 | 628 | 146 | 741 | 121 | 1,389 | 168 |
| Hispanic | 34,538 | 248 | 152 | 1,014 | 141 | 1,102 | 181 |
| Non-Hispanic Asian** | 9,960 | -74 | 67 | 96 | 65 | 174 | 103 |
| Hawaiian or Pacific Isl** | 590 | 13 | 16 | 28 | 18 | | |
| AI on Reservation | 540 | -5 | 8 | 27 | 7 | 52 | 22 |
| AI off Reservation* | 1,565 | 10 | 21 | 53 | 22 | | |
| Tenure | | | | | | | |
| Owner | 187,925 | -2,320 | 372 | 840 | 264 | 71 | 334 |
| Non-Owner | 85,662 | 988 | 310 | 2,422 | 235 | 3,871 | 368 |
| Age/Sex | | | | | | | |
| 0 - 9*** | 39,642 | -180 | 130 | 1,127 | 141 | 2,084 | 191 |
| 10 - 17*** | 32,307 | -422 | 129 | | | | |
| 18 - 29 Male | 21,594 | 245 | 138 | 845 | 76 | 792 | 130 |
| 18 - 29 Female | 21,576 | -295 | 111 | 492 | 65 | 687 | 113 |
| 30 - 49 Male | 41,297 | 848 | 104 | 784 | 83 | 685 | 114 |
| 30 - 49 Female | 42,783 | -257 | 105 | 414 | 73 | 326 | 95 |
| 50+ Male | 33,798 | -270 | 90 | -83 | 61 | -160 | 93 |
| 50+ Female | 40,590 | -1,001 | 107 | -318 | 67 | -419 | 98 |

The Census count is for the household population.
The A.C.E. Revision II and the A.C.E. March 2001 net undercount are for the household population.
The 1990 net undercount is for the PES universe which included noninstitutional, nonmilitary group quarters in addition to the household population. The results from the Committee on Adjustment of Post-censal Estimates (CAPE) are total population estimates. As a result, the 1990 estimates may differ from the CAPE results. See Bryant et al. (1992) and Thompson (1992).
* For 1990, AI off Reservation was included in the Non-Hispanic White Race/Hispanic Origin Domain.
** For 1990, Asian or Pacific Isl. was a single Race/Hispanic Origin Domain. Therefore, the net undercount and standard error displayed are for the Asian or Pacific Isl. Domain.
*** For March 2001 and for the 1990 PES, the "0 - 17" Age/Sex group was a single group. Therefore, the net undercount and standard error displayed are for the "0 - 17" Age/Sex group.
Estimates from the A.C.E. Revised Preliminary methodology are not available. Since the revised preliminary estimates are only an approximation of the undercount, the dual system estimates were not calculated.
A negative net undercount denotes a net overcount.
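As a reading aid, the relationship between the two appendix tables can be checked with a short sketch. It assumes the percent net undercount is expressed relative to the coverage estimate, that is, the census count plus the estimated net undercount; the function name is illustrative, and the published percentages may have been computed from unrounded values.

```python
def percent_net_undercount(census_count, net_undercount):
    """Percent net undercount relative to the coverage estimate.

    Both inputs are in thousands, as in Appendix Table 2. The coverage
    estimate is taken to be census_count + net_undercount (an assumption).
    """
    estimate = census_count + net_undercount
    return 100.0 * net_undercount / estimate


# A.C.E. Revision II values from Appendix Table 2 (in thousands).
total = percent_net_undercount(273_587, -1_332)
black = percent_net_undercount(33_470, 628)
owner = percent_net_undercount(187_925, -2_320)
print(round(total, 2), round(black, 2), round(owner, 2))  # -0.49 1.84 -1.25
```

Under this assumption the rounded results agree with the A.C.E. Revision II column of Appendix Table 1 for these three rows.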
- 1. Introduction The Census Bureau's work on producing revised estimates of the net undercount in Census 2000 was completed in 2002. These revised estimates of coverage adjust for errors identified in the March 2001 Accuracy and Coverage Evaluation (A.C.E.) and are referred to as A.C.E. Revision II. The March 2001 A.C.E. estimates of Census 2000 coverage were determined to be unacceptable because A.C.E. failed to measure a significant number of erroneous census enumerations and thus overstated the net undercount. The background pertaining to these findings, an overview of the A.C.E. Revision II, and a summary of census error are contained in Sections 1.1, 1.2, and 1.3, respectively.
This revision effort was a formidable task made even more challenging by time constraints. This work was limited to those activities that could be completed in a year's time so that we could begin research preparing for coverage measurement in the 2010 census. The approach was to identify and correct for errors in the March 2001 Dual System Estimates (DSE). Considerable research was needed to better understand the components of coverage error and how they affect the estimate of the net undercount. As with previous coverage measurement surveys, the goal of these revisions was to estimate the net undercount. Limitations in the implementation of the DSE do not immediately permit a direct estimate of the gross error components (Hogan 1993).
Extensive research and analysis were conducted to reduce errors in the data and to develop new methodology so that corrections could be made to the DSE. New statistical matching and models for exact matching were developed to further enhance the detection of census duplicates. The revision program also included an evaluation aspect.
This document summarizes our analysis and evaluation of the A.C.E. Revision II estimates from a technical perspective. It is based on detailed evaluations and assessments that could be completed in the time allowed. See Attachment A for a listing of the technical reports fully documenting this effort. The primary purpose of this document is to clearly set forth specific findings and to highlight the limitations of the methodology. We do NOT provide here a recommendation on whether the A.C.E. Revision II estimates are more accurate than the Census.
The A.C.E. Revision II results and a brief overview of the revision methodology are presented in Section 2. Section 3 contains comparisons of the new estimates with estimates from Demographic Analysis. Evaluations, limitations, and other assessments of the A.C.E. Revision II estimates are summarized in Sections 4, 5, and 6, respectively. More detailed results can be found in the technical reports. A more comprehensive description of the methodology is available in Kostanich (2003a).
1.1 Background The original March 2001 A.C.E. estimates were available in time to allow for the possibility of correcting Census 2000 redistricting files. At that time the Census Bureau's Executive Steering Committee for A.C.E. Policy (ESCAP) recommended NOT to correct the Census 2000 counts for purposes of redistricting (ESCAP, 2001). The Secretary of Commerce concurred. Given the
information available at that time, this decision was not based on any clear evidence that the Census counts were more accurate, but rather concern that there was some yet undiscovered error in the March 2001 A.C.E. estimates. In particular, there were concerns about the inconsistency between the A.C.E. results and estimates from Demographic Analysis (DA). The A.C.E.
estimate of a 3.3 million net undercount was very different from the DA estimate of a 1.8 million net overcount.1 The ESCAP also noted concerns with the possibility of synthetic and balancing error.
Further evaluations were conducted over the next six months to examine the reasons for the discrepancy and to determine if Census 2000 data products, other than redistricting data, should be corrected. Two planned A.C.E. evaluation programs, the Matching Error Study (MES) (Bean 2001) and the Evaluation Followup (EFU) (Raglin and Kresja 2001), identified some but not all the errors in the A.C.E. The Person Duplication Study used computer matching techniques to identify large numbers of duplicate census enumerations that were not identified by the A.C.E.
evaluation results (Fay 2001, 2002). Additional evaluations were conducted to alleviate other concerns such as any problems with A.C.E. balancing, contamination, or missing data. Also, further research was done on the components of the DA estimates, resulting in some significant revisions to the components (particularly to the migration estimates), and a new set of DA estimates (Robinson 2001b).
In October 2001, the ESCAP again decided NOT to correct the census counts for other Census 2000 data products. Analysis of A.C.E. evaluation data and the results of the person duplication study revealed that the A.C.E. failed to measure large numbers of erroneous census enumerations, overstating the net undercount by at least 3 million persons (ESCAP II, 2001).
This error alone was sufficient to call into question the quality of the A.C.E. estimates. Coupled with the revisions to the DA estimates, it provided an explanation for the previously observed inconsistency with DA. The earlier concerns with A.C.E. balancing, contamination, and biases due to missing data had also been resolved. The level of other errors was believed to be small by comparison and therefore was not a major factor in the second ESCAP decision. See Hogan et al. (2002) and Mulry and Petroni (2002) for further information.
In October 2001, the Census Bureau released approximations of the undercount for three race/Hispanic origin groups (Thompson et al. 2001). These Revised Early Approximations corrected estimates of erroneous enumerations for census duplicates and for other erroneous enumerations identified in the A.C.E. evaluations but not in the full A.C.E. E-Sample. The results were intended to be illustrative of the effects of these corrections on net undercount estimates and on possible coverage differences. The same methodology and data were later used to expand the calculations to seven race/Hispanic origin groups (Fay 2002a, Mule 2002a). These preliminary estimates show a very small net undercount. The data also indicate that the differential undercount has not been eliminated. These results are limited to the extent that they only provide information at the national level for broad population groups. Furthermore, these preliminary approximations were based on a small subset of A.C.E. data and only partially corrected for errors in measuring erroneous enumerations using Fay's lower bound (Fay 2001). Potential errors in measuring omissions were not accounted for.
1 The 1.8 million net overcount estimate is from the original Base DA estimates available in March 2001 (Robinson 2001a). Alternative estimates that allowed for a higher level of net undocumented immigration, Alt DA, were also given by Robinson (2001a) for use in comparisons against the A.C.E. estimates. These yielded a net undercount estimate of 914,000. Revisions to the DA estimates (Robinson 2001b) ultimately changed these results to a net undercount estimate of about 340,000. All three DA estimates differ substantially from the March 2001 A.C.E. estimate of a 3.3 million net undercount.
1.2 Overview of the A.C.E. Revision II Process Even though the ESCAP recommended twice NOT to correct the census counts, they had concerns about differential coverage in Census 2000. They thought it possible that further research resulting in revised estimates of coverage could be used to improve the post-censal estimates. In addition, work on revised estimates would provide a better understanding of Census 2000 coverage error that could be used to improve census operations for 2010 and would help in developing better methodologies for the 2010 coverage measurement program. Hence, work began on revising the A.C.E. estimates to correct for detected errors in an effort now known as A.C.E. Revision II.
The major objective of A.C.E. Revision II was to produce improved estimates of the household population that could be used to measure net coverage error in Census 2000. Since the national net undercount, as indicated by both DA and the Revised Early Approximations, was very close to zero, and the census included large numbers of erroneous enumerations in the form of duplicates, it was imperative that the revised methodology carefully account for both overcounts and undercounts. This meant obtaining better estimates of erroneous census enumerations from the E-Sample and obtaining better estimates of census omissions from the P-Sample. Hogan (2002) summarized the major revision issues in the form of the following five challenges:
- 1. Improve estimates of erroneous census enumerations
- 2. Improve estimates of census omissions
- 3. Develop new models for missing data
- 4. Enhance the estimation post-stratification
- 5. Consider adjustment for correlation bias.
There were no new field operations associated with the A.C.E. Revision II process. Because of the late date, it was not feasible (or practical) to revisit households for additional data collection.
Consequently, the revisions were based on data that had already been collected. One aspect of our strategy for revising the coverage estimates involved correcting measurement errors using information from the A.C.E. evaluation data. This is referred to as the recoding operation.
Another aspect of these corrections involved conducting a more extensive duplicate study to provide results to be used to correct for measurement error due to duplication that was not detected by the A.C.E. evaluations. This study is referred to as the Further Study of Person Duplication (FSPD) (Mule 2002b). The estimation method, discussed briefly in Section 2.3 and more fully in Kostanich (2003a), is designed to handle overlap of errors detected by both of these studies to avoid overcorrecting for measurement error.
The recoding operation was designed to improve both estimates of erroneous census enumerations and census omissions. It used the original A.C.E. person interview (PI) and person followup (PFU) data, combined with data from the evaluation followup interview (EFU), the matching error study (MES), and the PFU/EFU review study2 to correct for coding error in enumeration status, residence status, mover status, and matching status. This effort involved extensive recoding of about 60,000 P-Sample cases and more than 70,000 E-Sample cases.3 An automated computer algorithm was used to recode most of the cases, but others required a clerical review by experienced analysts at the National Processing Center (NPC). These analysts had access to the questionnaire responses as well as to interviewer notes, which put them in a better position to resolve apparent discrepancies. It was not possible to completely code all cases because of missing or conflicting information; however, the number of conflicting cases was considerably reduced.
The FSPD was designed to provide information to improve estimates of both erroneous census enumerations and census omissions. This study used computer matching and modeling techniques to identify E- and P- Sample cases which linked to (matched) another census enumeration anywhere across the entire country, including group quarters enumerations, and reinstated and deleted census cases. For the E-Sample links the study could not generally identify which enumeration was correct and which was the duplicate. For P-Sample links, the study could not identify whether the correct Census Day residence was at the P-Sample location or the census location. Rather, the information from the FSPD was used to model the probability that an E-Sample linked case was a correct enumeration or that a P-Sample case was a resident on Census Day.
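To make the idea of linking records concrete, the following is a minimal sketch of exact matching on name and birth date. It is deliberately much simpler than the computer matching and modeling used in the FSPD: the record layout, field names, and normalization rule are hypothetical, and no statistical matching is attempted. Records lacking a name or birth date are skipped, which is one reason a conservative rule of this kind would understate duplication.

```python
from collections import defaultdict


def find_duplicate_links(records):
    """Group record ids that share a normalized name and birth date.

    records: iterable of dicts with hypothetical keys
             'id', 'first', 'last', 'dob'.
    Returns a list of id lists, one per set of linked records.
    """
    groups = defaultdict(list)
    for rec in records:
        first, last, dob = rec.get("first"), rec.get("last"), rec.get("dob")
        if not (first and last and dob):
            continue  # cannot be matched exactly without all three fields
        key = (first.strip().lower(), last.strip().lower(), dob)
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical records; ids 1 and 2 differ only in spacing and case.
sample_records = [
    {"id": 1, "first": "Ana", "last": "Lopez", "dob": "1980-02-11"},
    {"id": 2, "first": " ana ", "last": "LOPEZ", "dob": "1980-02-11"},
    {"id": 3, "first": "Ana", "last": "Lopez", "dob": None},
    {"id": 4, "first": "Ben", "last": "Kim", "dob": "1975-07-30"},
]
links = find_duplicate_links(sample_records)
print(links)  # [[1, 2]]
```

As in the study itself, a link found this way says only that two records appear to be the same person, not which of the two is the correct enumeration.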
New missing data models were developed to reflect the different types of missing data now possible as a result of the recoding operation. There were three new types of missing data to deal with: (1) P-Sample households that were originally considered interviews but the recoding determined that there were no valid Census Day residents, (2) cases with unresolved match, enumeration, or residency status because of incomplete or ambiguous interview data, and (3) cases with conflicting enumeration or residency status because contradictory information was collected in the A.C.E. PFU and the EFU interviews and it could not be determined which was valid. A household noninterview weighting adjustment using new cell definitions was used for (1). Imputation cells and donor pools were developed for the second type of missing data based on detailed responses to the questionnaire. For the conflicting cases in (3), there were no applicable donor pools, and probabilities of 0.5 were imputed for correct enumeration status and Census Day residency status. Fortunately, the recoding operation resulted in a relatively small number of these cases.
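The treatment of the second and third types of missing data can be sketched as follows. The sketch assumes, purely for illustration, that an unresolved case receives the mean status probability of the resolved cases in its imputation cell; the actual cell definitions and donor rules are documented in the technical reports and are not reproduced here. Conflicting cases receive 0.5, as stated above. All names and the data layout are hypothetical.

```python
from collections import defaultdict


def impute_status_probabilities(cases):
    """Return a status probability for each case.

    cases: list of dicts with hypothetical keys 'cell', 'status'
           ('resolved', 'unresolved', or 'conflicting'), and 'prob'.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for case in cases:
        if case["status"] == "resolved":
            sums[case["cell"]] += case["prob"]
            counts[case["cell"]] += 1

    imputed = []
    for case in cases:
        if case["status"] == "resolved":
            imputed.append(case["prob"])
        elif case["status"] == "conflicting":
            imputed.append(0.5)  # no applicable donor pool
        else:
            cell = case["cell"]
            if counts[cell] == 0:
                raise ValueError(f"no resolved donors in cell {cell!r}")
            # Assumed rule: cell mean of the resolved cases.
            imputed.append(sums[cell] / counts[cell])
    return imputed


example_cases = [
    {"cell": "A", "status": "resolved", "prob": 1.0},
    {"cell": "A", "status": "resolved", "prob": 1.0},
    {"cell": "A", "status": "resolved", "prob": 0.0},
    {"cell": "A", "status": "unresolved", "prob": None},
    {"cell": "B", "status": "conflicting", "prob": None},
]
imputed_probs = impute_status_probabilities(example_cases)
```

Here the unresolved case in cell A receives 2/3 and the conflicting case receives 0.5.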
2 The PFU/EFU review study was not a planned evaluation. It was a special study conducted in a subsample of the evaluation data to resolve discrepancies between enumeration status in the PFU and EFU.
3 These are probability subsamples of the original A.C.E. P- and E- Samples. In the context of A.C.E. Revision II they are called revision samples, but they are in fact equivalent to the evaluation followup samples.
The revised estimates incorporate separate post-strata for estimating census omissions and erroneous census enumerations because the factors related to each of these are likely to be different. Our research efforts focused on determining variables related to explaining variations in rates of erroneous enumerations. This is because much of the previous work on developing post-strata focused on census omissions, and the same post-strata were simply applied to the estimation of erroneous inclusions. For the E-Sample, some of the original post-stratification variables were eliminated and additional variables were added. Variables such as region, Metropolitan Statistical Area and type of enumeration area, and tract return rate were replaced by proxy status, type and date of census return, and household relationship and size. For the P-Sample, only the age variable was modified to define separate post-strata for children aged 0 to 9 and those 10 to 17. The same change to the age groups was made for the E-Sample as well. This change was made because the DA estimates suggested different coverage for younger versus older children. The estimated correct enumeration rates and estimated match rates were used to calculate Dual System Estimates (DSE) for the cross-classification of the E and P post-strata (see Section 2.3).
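The cross-classified calculation can be illustrated with a simplified sketch. It assumes the basic dual system form in which the census count for a cell is multiplied by the correct enumeration rate of its E-Sample post-stratum and divided by the match rate of its P-Sample post-stratum; the adjustments described in Section 2.3 and in Kostanich (2003a) are omitted. The post-stratum labels, rates, and counts below are invented for illustration only.

```python
def dual_system_estimate(census_count, ce_rate, match_rate):
    """Simplified DSE for one cell: count * CE rate / match rate."""
    if not 0.0 < match_rate <= 1.0:
        raise ValueError("match rate must be in (0, 1]")
    if not 0.0 <= ce_rate <= 1.0:
        raise ValueError("correct enumeration rate must be in [0, 1]")
    return census_count * ce_rate / match_rate


# Hypothetical rates by E-Sample and P-Sample post-stratum.
ce_rates = {"E1": 0.96, "E2": 0.90}
match_rates = {"P1": 0.92, "P2": 0.85}

# Hypothetical census counts for each (E, P) cross-classified cell.
census_counts = {
    ("E1", "P1"): 5000,
    ("E1", "P2"): 1000,
    ("E2", "P1"): 800,
    ("E2", "P2"): 1200,
}

dse = {
    cell: dual_system_estimate(n, ce_rates[cell[0]], match_rates[cell[1]])
    for cell, n in census_counts.items()
}
```

A cell whose correct enumeration rate exceeds its match rate yields an estimate above the census count (a net undercount), and the reverse yields a net overcount.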
The A.C.E. Revision II DSEs include an adjustment for correlation bias. Correlation bias exists if (within P-Sample post-strata) people missed in the census were more likely (or less likely) to also be missed in the A.C.E. In the "more likely to be missed" scenario, correlation bias has a downward effect on estimates. In previous coverage measurement surveys, the erroneous inclusions were assumed to be much smaller than omissions. In this setting not adjusting estimates for correlation bias had the effect of understating the net undercount, which resulted in corrections to the census that were in the right direction but not large enough. In the presence of overcounts, it is possible that corrections without a correlation bias adjustment might not even be in the right direction, and could actually increase errors relative to no adjustment. (See Section 5.1 for discussion.)
Estimates of correlation bias in A.C.E. Revision II were calculated using the two-group model and sex ratios (number of males divided by the number of females) obtained from DA data. The correlation bias estimates are made only for adult males under the assumption of no correlation bias for adult females. Also, correlation bias is not estimated for children. The correlation bias adjustments were done separately for Blacks and NonBlacks within three age categories: 18-29, 30-49, and 50 and over, with the exception of NonBlack males 18 to 29 years of age, a group for which the data would not support estimation of correlation bias for males. The model used for the correlation bias adjustment was about the simplest possible, and assumed that relative correlation bias was constant over male post-strata within the age-race groups.
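The sex-ratio logic described above can be sketched in a few lines. The sketch assumes that the female DSE total for an age-race group is unbiased, that the DA sex ratio gives the true ratio of males to females, and that a single multiplicative factor applies to every male post-stratum in the group. It is a simplification of the two-group model, whose details are in the technical reports; the function name and all numbers are hypothetical.

```python
def correlation_bias_factor(da_sex_ratio, dse_males, dse_females):
    """Factor that brings the male DSE total in line with the DA sex ratio.

    da_sex_ratio : males divided by females from Demographic Analysis
    dse_males    : unadjusted male DSEs by post-stratum (one age-race group)
    dse_females  : female DSEs by post-stratum (assumed unbiased)
    """
    implied_males = da_sex_ratio * sum(dse_females)
    return implied_males / sum(dse_males)


# Hypothetical DSEs (in thousands) for three post-strata of one group.
male_dse = [400.0, 350.0, 200.0]
female_dse = [450.0, 380.0, 220.0]

factor = correlation_bias_factor(0.95, male_dse, female_dse)
adjusted_male_dse = [factor * x for x in male_dse]  # constant relative bias
```

With these invented inputs the factor is 1.05, so each male post-stratum estimate is raised by 5 percent and the adjusted sex ratio equals the DA value of 0.95. A factor below 1 would indicate that the data do not support an upward adjustment under these assumptions.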
The DSEs, adjusted for correlation bias, were used to produce coverage correction factors for each of the cross-classified post-strata (E-Sample post-strata cross-classified with the P-Sample post-strata). These factors were applied (carried down) within the post-strata to produce estimates for geographic areas such as places and counties. This process is referred to as synthetic estimation. The key assumption underlying this methodology is that the net census coverage, estimated by the coverage correction factor, is relatively uniform within the cross-classified post-strata. Failure of this assumption leads to synthetic error.
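A minimal sketch of synthetic estimation follows. It assumes the coverage correction factor for a post-stratum is the DSE divided by the corresponding census count, and that the estimate for a small area is the sum over post-strata of the factor times the area's census count in that post-stratum. The post-stratum labels and all numbers are hypothetical.

```python
def coverage_correction_factors(dse, census):
    """Coverage correction factor per post-stratum: DSE / census count."""
    return {ps: dse[ps] / census[ps] for ps in census}


def synthetic_estimate(area_counts, ccf):
    """Carry the factors down to one area's census counts by post-stratum."""
    return sum(ccf[ps] * count for ps, count in area_counts.items())


# Hypothetical national totals for two cross-classified post-strata.
census_totals = {"A": 1000.0, "B": 2000.0}
dse_totals = {"A": 1020.0, "B": 1980.0}
ccf = coverage_correction_factors(dse_totals, census_totals)

# Hypothetical census counts for one county, by the same post-strata.
county_counts = {"A": 100.0, "B": 50.0}
county_estimate = synthetic_estimate(county_counts, ccf)
```

The county estimate here is 1.02 × 100 + 0.99 × 50 = 151.5. The calculation uses no county-specific coverage information, which is why departures from uniform coverage within a post-stratum appear as synthetic error.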
1.3 Census Errors Based on the analysis conducted in support of the two ESCAP recommendations, Census 2000 was determined to be an operational success. The Census Bureau was successful in lowering both the undercount and the differential undercount. Nonetheless, all censuses are subject to nonsampling errors. Nonsampling error may be introduced during any of the operations used to collect and process census data. Such errors include: not enumerating every household or every person in the population, enumerating some households and persons more than once (duplication), enumerating persons in the wrong place, failing to obtain all required information from the respondents, obtaining incorrect or inconsistent information, and recording information incorrectly. In addition, errors can occur during the field review of the enumerator's work, during clerical handling of the census questionnaires, or during the electronic processing of the questionnaires. In order to assess the results of the revised estimates of coverage, it is important to be aware of the level and pattern of errors in Census 2000. This includes both errors of omission and errors of erroneous enumeration.
Every census since at least 1940 has experienced a net undercount with substantial differential undercounts between Blacks and NonBlacks, between owners and renters, and between children and adults. The Census Bureau has historically spent considerable efforts on trying to reduce this persistent coverage error, and Census 2000 was no exception. The design of Census 2000 was fundamentally similar to the design of the 1990 census. There were, however, new innovative programs that were intended to improve coverage. These programs included:
- Local Update of Census Addresses (LUCA) Program - local and tribal government officials were given the opportunity to review and update the address list prior to Census Day.
- Paid Advertising Campaign Program - to increase awareness of the census across the country and in targeted cities to encourage cooperation with enumerators during nonresponse followup.
- New Construction Program - local governments could submit addresses for housing units that had been built subsequent to the completion of the address list in January 2000.
- Multiple Response Options - in addition to the conventional mail return and enumerator forms, responses were permitted by telephone and via the Internet, and Be Counted forms were available at public locations for individuals who believed that they might have been missed.
- Language Program - Questionnaires were available upon request in five languages and language assistance guides were available in more than forty languages.
- Restructured Pay Scale - to attract and hire sufficient numbers of temporary quality field workers to conduct the census, competitive local pay rates were established.
- Promotion and Outreach Program - established the largest number of partnerships ever with a wide range of organizations to implement promotional activities to educate the public about the importance of participating in the census.
While it was expected that these programs would improve coverage in the census, it was not expected that the differential undercount could be completely eliminated. The A.C.E. was designed for the purpose of: (1) measuring net coverage in Census 2000, and (2) possibly correcting the net coverage error in Census 2000. Census 2000 is believed to have been successful in making substantial gains in lowering the differential net undercount. Even though ESCAP recommended twice not to adjust the Census 2000 counts, there remained concerns about differential coverage. In fact, the evidence from DA demonstrated that the differential net undercount was reduced but not eliminated. DA estimated (Robinson 2001b) that the overall net undercount rate was reduced from 1.65 percent in 1990 to 0.12 percent in 2000. For Blacks the rate was reduced from 5.52 in 1990 to 2.78 in 2000; while for NonBlacks the rate went from 1.08 in 1990 to -0.29 (an overcount) in 2000.
As Census 2000 was in progress, analysts became concerned that there might be large numbers of duplicate addresses on the address list. This resulted in a new census operation that was designed to reduce the impact of these duplicates (Nash 2000, Miskura 2000). Based on an address matching operation, about 2.4 million addresses (and their associated 6 million enumerated persons) were temporarily set aside for further analysis. A more detailed examination resulted in 1.4 million of these addresses (3.6 million persons) being permanently removed and the remaining 1.0 million addresses (2.4 million persons) being put back into the census. It is anticipated that some of these addresses and persons may have been erroneously removed and some erroneously reinstated. However, the Census Bureau believes that this new operation improved the overall accuracy of the census.
The census counts included 1.2 million count imputation persons. The count imputation rate was higher than the rate observed in the 1990 Census but was comparable to the rates in earlier censuses. Some of these imputations may represent enumerations that were correct in all but a very technical sense. For example, some of these imputations were for persons in truly occupied housing units for which an enumeration was not obtained. We would not characterize such imputations as erroneous when they represent actual known people. However, some of the count imputations would also represent errors, as when persons were imputed into truly vacant housing units or when no persons were imputed into truly occupied housing units.
The Further Study of Person Duplication estimated 5.8 million duplicated persons in Census 2000. This is thought to be an underestimate since the computer matching algorithm was conservative in identifying duplicate records. Records not containing a name, age, or birth date were more difficult to match and, therefore, more difficult to identify as duplicates. This study also showed that the amount of duplication and the pattern of duplication differ by race. There were also differences in the pattern of duplicates by age/sex groups and by tenure. In addition, the census contained other erroneous enumerations not caused by duplication. These included fictitious enumerations, enumerations of persons who died before April 1 or of babies born after April 1, enumerations of nonresidents of the United States, and enumerations of people counted only one time, but in the wrong place sufficiently far from their true residence (for example in the wrong state). Thus, there is strong evidence pointing to a number of erroneous enumerations much higher than the 5.8 million duplicates identified. To the extent that erroneous inclusions
were in the same geographic area and had the same relevant characteristics as the omitted people, they were accurate predictions of persons who should have been counted and they improved census accuracy. On the other hand, to the extent that they resided in different areas or had different characteristics, they represented errors and they reduced census accuracy. The differential net coverage by post-strata measured by the A.C.E. Revision II estimates indicates that the offset was far from perfect.
The above statistics derived from Census 2000 data are firm indicators that erroneous enumerations are present. When combined with knowledge about net census coverage, this is suggestive about the magnitude of gross census errors of erroneous inclusions and omissions.
2. A.C.E. Revision II Results and Methodology
The A.C.E. Revision II estimation methodology was constructed to address five challenges posed by Hogan (2002). Detailed discussion of the estimation methodology can be found in Kostanich (2003a) or Kostanich (2003b). Section 2.3 provides a very brief summary explaining how the DSE and synthetic estimates are constructed. Before this, Sections 2.1 and 2.2 discuss the A.C.E. Revision II results. Section 2.1 discusses A.C.E. Revision II coverage estimates for Census 2000 at the national level, and Section 2.2 discusses A.C.E. Revision II coverage estimates for small geographic areas (places and counties).
2.1 Estimates of Census 2000 Coverage at the National Level
Table 1 shows A.C.E. Revision II estimates of percent net undercount in Census 2000 for the total household population and major demographic groups. For comparison, Table 1 also shows results from the A.C.E. Revised Preliminary estimates and the March 2001 A.C.E. estimates, as well as estimates of 1990 census coverage from the 1990 PES. Table 2 shows corresponding net undercount estimates in terms of number of persons (not computed and hence not shown for the A.C.E. Revised Preliminary estimates), along with corresponding census counts for reference.
We give a general discussion of main results from Table 1 below. Then, we discuss more detailed results from the A.C.E. Revision II estimates given later in Tables 3-9.
In examining Tables 1 and 2, pay attention to the footnotes noting issues of comparability between the sets of results. For example, only the A.C.E. Revision II estimates break the 0-17 age group into 0-9 and 10-17; thus, in Table 1 the same numbers are shown for 0-9 and 10-17 for the March 2001 A.C.E. estimates. This is also true of the 1990 PES estimates, while the A.C.E. Revised Preliminary estimates did not provide estimates by age-sex groups. Also, some of the race/Hispanic origin domain definitions used in the estimates for 2000 differed from those used in the 1990 PES (e.g., American Indians Off Reservations were included among non-Hispanic Whites in 1990).
There are also methodological differences between the different sets of estimates that affect their comparability. As noted in Section 1, the A.C.E. Revision II estimates improve on the March 2001 A.C.E. estimates by including corrections for undetected duplicates, data corrections that affect estimates of erroneous enumerations and census omissions, and adjustments for correlation bias for adult males. In fact, the March 2001 A.C.E. estimates are shown in Table 1 primarily so that comparing them to the A.C.E. Revision II estimates shows the effects of these corrections on the national estimates. The A.C.E. Revised Preliminary estimates also include corrections for undetected duplicates (though not the same corrections as the A.C.E. Revision II estimates), and only some data corrections, but not the other A.C.E. Revision II corrections noted. The 1990 PES estimates do not include corrections analogous to those of the A.C.E. Revision II, although the extent to which the 1990 PES estimates may have been affected by undetected duplicates and coding errors is unknown. Estimates of the bias for individual error components for the 1990 PES are contained in Table 1 of the paper by Mulry and Spencer (1993). However, there is evidence of correlation bias in the 1990 PES estimates (Bell 1993).
A.C.E. Revision II estimates a negative net undercount, or overcount, of the Census 2000 household population. The estimated percent net undercount of -0.49 with a standard error of 0.20 is significantly different from zero at the 10-percent significance level. This differs sharply from the March 2001 A.C.E. estimate of a 1.18 percent net undercount (standard error of 0.13), an estimate which was corrupted by undetected duplicates and the effects of coding errors. The A.C.E. Revision II estimate does not dramatically differ from the A.C.E. Revised Preliminary estimate of a 0.06 percent net undercount (standard error of 0.18), although the closeness of the two is aided by the correlation bias adjustment, which increased the A.C.E. Revision II undercount estimate. (Without the correlation bias adjustment, DSE with the A.C.E. Revision II data estimates a -1.12 percent net undercount, a figure that can be obtained from results in Appendix Table 2 of Shores (2002).) The A.C.E. Revision II estimate of Census 2000 coverage also differs dramatically from the 1990 PES estimate of a 1.61 percent net undercount (standard error of 0.20) in the 1990 census.
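The significance statement above can be checked with a simple calculation. The sketch below assumes a two-sided normal (z) test of the estimate against zero; the report does not say which test was used, so the test form and the helper name `z_test` are assumptions made for illustration.

```python
from statistics import NormalDist

def z_test(estimate, std_error, alpha=0.10):
    """Two-sided z-test of H0: true net undercount rate is zero (assumed test form)."""
    z = estimate / std_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.645 for alpha = 0.10
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, abs(z) > critical

# Estimates and standard errors quoted in the text (percent net undercount).
for label, est, se in [
    ("A.C.E. Revision II", -0.49, 0.20),
    ("A.C.E. Revised Preliminary", 0.06, 0.18),
]:
    z, p, significant = z_test(est, se)
    print(f"{label}: z = {z:.2f}, p = {p:.3f}, significant at 10% = {significant}")
```

Under this assumption the A.C.E. Revision II estimate is about 2.45 standard errors below zero, while the Revised Preliminary estimate is well within one standard error of zero.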
Among the A.C.E. Revision II coverage estimates by race/Hispanic origin domains, only those for the Non-Hispanic White and Non-Hispanic Black domains show estimated net undercounts that differ significantly from zero. The Non-Hispanic White domain has a negative estimated net undercount of -1.13 percent, reflecting an overcount, while the Non-Hispanic Black domain has an estimated net undercount of 1.84 percent.
The 1990 PES estimated very similar net undercount rates for Non-Hispanic Blacks and Hispanics. The A.C.E. Revision II estimate for the Hispanic domain is a net undercount of 0.71 percent, which is not as similar to the Non-Hispanic Black estimate as it was in 1990. This may partly be due to sampling variation. However, the A.C.E. Revision II net undercount estimates for the Non-Hispanic Black and Hispanic domains are not significantly different from one another. Differences in the estimates for these two domains are also affected by the correlation bias adjustment present in the A.C.E. Revision II estimates (and not present in the 1990 PES estimates). As noted in Section 5.1, the A.C.E. Revision II estimates for Non-Hispanic Blacks are more strongly affected by the correlation bias adjustment than are the estimates for the Non-Black race domains, including Hispanics.
The A.C.E. Revision II net undercount estimate for the On Reservation American Indian and Alaska Native (AIAN) population differs markedly from the estimate of the 1990 PES: an estimated net undercount of -0.88 percent in A.C.E. Revision II versus 12.22 in the 1990 PES, though the standard error on the latter estimate is large. The March 2001 A.C.E. estimate and the A.C.E. Revised Preliminary estimate for the On Reservation AIAN population fall in between these two. The differences between these point estimates and that from the A.C.E. Revision II show that corrections made in the A.C.E. Revision II estimates had important effects for this group.
Table 1 shows differential coverage estimates with respect to Tenure. Nationally, A.C.E. Revision II estimates owners to have a net undercount of -1.25 percent and non-owners a net undercount of 1.14 percent. These estimated net undercount rates are statistically different from zero, and their difference is also statistically significant. The 1990 PES estimated an even more dramatic difference in coverage between owners and non-owners, though in the same direction (higher estimated undercount for non-owners). Also, the 1990 PES estimated net undercount rate of 4.51 for non-owners was much higher than that of the A.C.E. Revision II.
The A.C.E. Revision II estimates show coverage differentials by age and sex. In particular, statistically significant net overcounts were estimated for children age 10-17 and for adult females 18-29, 30-49, and 50 and over, as well as for males 50 and over. In contrast, statistically significant net undercounts were estimated for males 18-29 and 30-49, and the net undercount estimate for children 0-9 was not significantly different from zero. The coverage differences by sex are affected by the correlation bias adjustments that increase the undercount estimates for adult males. This makes comparisons with the 1990 PES results somewhat difficult. The main thing in common to the two sets of estimates for age-sex groups appears to be the much lower undercount estimates (in fact, overcount estimates in all cases) for age 50 and over compared to the other adult age groups, a pattern that shows up for both males and females. One notable difference in estimated coverage occurs for children. The 1990 PES estimated a large net undercount for children 0-17 of 3.18 percent, a larger undercount point estimate than for any other group except 18-29 males. In contrast, the A.C.E. Revision II estimated net undercounts of
-0.46 and -1.32 percent for children 0-9 and 10-17, respectively, do not stand out in contrast to the estimates for adults, at least not as significantly higher. However, this comparison is affected partly by the correlation bias adjustments that increase the A.C.E. Revision II estimates for adult males. The comparisons to estimates for adult females are not affected by the correlation bias adjustments.
The March 2001 A.C.E. and the 1990 PES both estimate a percent net undercount for children ages 0-17. The A.C.E. Revision II separates children into two age groups: 0-9 and 10-17. The 0-9 year olds have an estimated percent net undercount of -0.46 which is not significantly different from zero. This estimated percent net undercount for 0-9 year olds is not consistent with Demographic Analysis. In contrast, the 10-17 year olds have a percent net undercount of
-1.32 which is significantly different from zero. In contrast to the 0-9 age group, the estimated percent net undercount for 10-17 year olds is consistent with Demographic Analysis.
Table 1: Percent Net Undercount for Major Groups

| Characteristic | A.C.E. Revision II Est. (%) | A.C.E. Revision II S.E. (%) | A.C.E. Revised Preliminary Est. (%) | A.C.E. Revised Preliminary S.E. (%) | A.C.E. March 2001 Est. (%) | A.C.E. March 2001 S.E. (%) | 1990 PES Est. (%) | 1990 PES S.E. (%) |
|---|---|---|---|---|---|---|---|---|
| Total | -0.49 | 0.20 | 0.06 | 0.18 | 1.18 | 0.13 | 1.61 | 0.20 |
| Race/Hispanic Origin Domain | | | | | | | | |
| Non-Hispanic White* | -1.13 | 0.20 | -0.33 | 0.21 | 0.67 | 0.14 | 0.68 | 0.22 |
| Non-Hispanic Black | 1.84 | 0.43 | 0.78 | 0.45 | 2.17 | 0.35 | 4.57 | 0.55 |
| Hispanic | 0.71 | 0.44 | 1.25 | 0.54 | 2.85 | 0.38 | 4.99 | 0.82 |
| Non-Hispanic Asian** | -0.75 | 0.68 | -0.31 | 0.91 | 0.96 | 0.64 | 2.36 | 1.39 |
| Hawaiian or Pacific Isl** | 2.12 | 2.73 | 4.64 | 2.79 | 4.60 | 2.77 | 2.36 | 1.39 |
| AI on Reservation*** | -0.88 | 1.53 | 3.44 | 1.60 | 4.74 | 1.20 | 12.22 | 5.29 |
| AI off Reservation*** | 0.62 | 1.35 | 3.44 | 1.60 | 3.28 | 1.33 | 0.68 | 0.22 |
| Tenure | | | | | | | | |
| Owner | -1.25 | 0.20 | n/a | n/a | 0.44 | 0.14 | 0.04 | 0.21 |
| Non-Owner | 1.14 | 0.36 | n/a | n/a | 2.75 | 0.26 | 4.51 | 0.43 |
| Age/Sex | | | | | | | | |
| 0 - 9**** | -0.46 | 0.33 | n/a | n/a | 1.54 | 0.19 | 3.18 | 0.29 |
| 10 - 17**** | -1.32 | 0.41 | n/a | n/a | 1.54 | 0.19 | 3.18 | 0.29 |
| 18 - 29 Male | 1.12 | 0.63 | n/a | n/a | 3.77 | 0.32 | 3.30 | 0.54 |
| 18 - 29 Female | -1.39 | 0.52 | n/a | n/a | 2.23 | 0.29 | 2.83 | 0.47 |
| 30 - 49 Male | 2.01 | 0.25 | n/a | n/a | 1.86 | 0.19 | 1.89 | 0.32 |
| 30 - 49 Female | -0.60 | 0.25 | n/a | n/a | 0.96 | 0.17 | 0.88 | 0.25 |
| 50+ Male | -0.80 | 0.27 | n/a | n/a | -0.25 | 0.18 | -0.59 | 0.34 |
| 50+ Female | -2.53 | 0.27 | n/a | n/a | -0.79 | 0.17 | -1.24 | 0.29 |

The A.C.E. Revision II, the A.C.E. Revised Preliminary, and the A.C.E. March 2001 net undercount are for the household population.
The 1990 net undercount is for the PES universe which included noninstitutional, nonmilitary group quarters in addition to the household population. The results from the Committee on Adjustment of Post-censal Estimates (CAPE) are total population estimates. As a result, the 1990 estimates may differ from the CAPE results. See Bryant et al. (1992) and Thompson (1992).
* For 1990, AI off Reservation was included in the Non-Hispanic White Race/Hispanic Origin Domain. Therefore, the net undercount and standard error for these domains are identical.
** For 1990, Asian or Pacific Isl. was a single Race/Hispanic Origin Domain. Therefore, for Non-Hispanic Asian and for Hawaiian or Pacific Isl., the net undercount and standard error are repeated.
*** For the A.C.E. Revised Preliminary estimates, American Indian and Alaskan Native was a single Race/Hispanic Origin Domain. Therefore, for AI on Reservation and for AI off Reservation, the net undercount and standard error are identical.
**** For March 2001 and for the 1990 PES, the "0 - 17" Age/Sex group was a single group. Therefore, the net undercount and standard error for children "0 - 9" and "10 - 17" are identical.
A negative net undercount denotes a net overcount.
"n/a" means "Not Available."
Table 2: Net Undercount Estimates for Major Groups (in thousands)

| Characteristic | Census 2000 | A.C.E. Revision II Est. | A.C.E. Revision II S.E. | A.C.E. March 2001 Est. | A.C.E. March 2001 S.E. | 1990 PES Est. | 1990 PES S.E. |
|---|---|---|---|---|---|---|---|
| Total | 273,587 | -1,332 | 542 | 3,262 | 378 | 3,994 | 488 |
| Race/Hispanic Origin Domain | | | | | | | |
| Non-Hispanic White* | 192,924 | -2,151 | 382 | 1,302 | 272 | 1,277 | 417 |
| Non-Hispanic Black | 33,470 | 628 | 146 | 741 | 121 | 1,389 | 168 |
| Hispanic | 34,538 | 248 | 152 | 1,014 | 141 | 1,102 | 181 |
| Non-Hispanic Asian** | 9,960 | -74 | 67 | 96 | 65 | 174 | 103 |
| Hawaiian or Pacific Isl** | 590 | 13 | 16 | 28 | 18 | | |
| AI on Reservation | 540 | -5 | 8 | 27 | 7 | 52 | 22 |
| AI off Reservation* | 1,565 | 10 | 21 | 53 | 22 | | |
| Tenure | | | | | | | |
| Owner | 187,925 | -2,320 | 372 | 840 | 264 | 71 | 334 |
| Non-Owner | 85,662 | 988 | 310 | 2,422 | 235 | 3,871 | 368 |
| Age/Sex | | | | | | | |
| 0 - 9*** | 39,642 | -180 | 130 | 1,127 | 141 | 2,084 | 191 |
| 10 - 17*** | 32,307 | -422 | 129 | | | | |
| 18 - 29 Male | 21,594 | 245 | 138 | 845 | 76 | 792 | 130 |
| 18 - 29 Female | 21,576 | -295 | 111 | 492 | 65 | 687 | 113 |
| 30 - 49 Male | 41,297 | 848 | 104 | 784 | 83 | 685 | 114 |
| 30 - 49 Female | 42,783 | -257 | 105 | 414 | 73 | 326 | 95 |
| 50+ Male | 33,798 | -270 | 90 | -83 | 61 | -160 | 93 |
| 50+ Female | 40,590 | -1,001 | 107 | -318 | 67 | -419 | 98 |

The Census count is for the household population.
The A.C.E. Revision II and the A.C.E. March 2001 net undercount are for the household population.
The 1990 net undercount is for the PES universe which included noninstitutional, nonmilitary group quarters in addition to the household population. The results from the Committee on Adjustment of Post-censal Estimates (CAPE) are total population estimates. As a result, the 1990 estimates may differ from the CAPE results. See Bryant et al. (1992) and Thompson (1992).
* For 1990, AI off Reservation was included in the Non-Hispanic White Race/Hispanic Origin Domain.
** For 1990, Asian or Pacific Isl. was a single Race/Hispanic Origin Domain. Therefore, the net undercount and standard error displayed are for the Asian or Pacific Isl. Domain.
*** For March 2001 and for the 1990 PES, the 0 - 17 Age/Sex group was a single group. Therefore, the net undercount and standard error displayed are for the 0 - 17 Age/Sex group.
Estimates from the A.C.E. Revised Preliminary methodology are not available. Since the revised preliminary estimates are only an approximation of the undercount, the dual system estimates were not calculated.
A negative net undercount denotes a net overcount.
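The percentages in Table 1 and the counts in Table 2 are linked by the definition of the net undercount rate: the net undercount divided by the population estimate, where the estimate is the census count plus the net undercount. The following sketch recomputes a few Table 1 rates from Table 2 values. The function name is hypothetical, and because Table 2 is rounded to thousands the agreement is only expected to two decimal places.

```python
def percent_net_undercount(census_count, net_undercount):
    """Net undercount as a percent of the population estimate.

    The population estimate is taken to be census count + net undercount.
    """
    population_estimate = census_count + net_undercount
    return 100.0 * net_undercount / population_estimate

# (census count, A.C.E. Revision II net undercount), in thousands, from Table 2.
table2_rows = {
    "Total": (273_587, -1_332),
    "Non-Hispanic Black": (33_470, 628),
    "Owner": (187_925, -2_320),
    "Non-Owner": (85_662, 988),
}

for name, (census, undercount) in table2_rows.items():
    print(f"{name}: {percent_net_undercount(census, undercount):.2f} percent")
```

Rounded to two decimals, these reproduce the A.C.E. Revision II column of Table 1 for the same rows (-0.49, 1.84, -1.25, and 1.14).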
Table 3 presents estimated net undercount rates for the cross-classification of race/Hispanic Origin domain by tenure for A.C.E. Revision II and the 1990 PES. Overall, owners and non-owners in A.C.E. Revision II have statistically significant estimated net overcounts and undercounts, respectively, while the 1990 PES estimated significant net undercounts for non-owners. In A.C.E. Revision II, both Non-Hispanic Asian and Non-Hispanic White owners have significant estimated net overcounts while Non-Hispanic Black and Hispanic non-owners have significant estimated net undercounts. All other tenure by domain coverage rates are not statistically different from zero.
The estimated coverage differences between owners and non-owners in A.C.E. Revision II are statistically significant within most of the race/Hispanic origin domains. In fact, this holds for all race/Hispanic origin domains except Native Hawaiian or Other Pacific Islander and American Indian or Alaska Native On Reservations. These two domains are relatively small and their estimates of net coverage have large standard errors.
Table 4 shows the estimated census inclusion rates (see footnote 4) for the 64 A.C.E. Revision II full P-Sample post-stratum groups. For six of the seven race/Hispanic origin domains, the census inclusion rates are higher for owners than for non-owners. This conclusion is not true for Domain 1, American Indian or Alaska Natives On Reservations. In addition, census inclusion rates are higher for the High Return Rate post-stratum groups than for the Low Return Rate post-stratum groups. The census inclusion rates are similar for the Non-Hispanic Black and Hispanic populations. Standard errors of these estimated rates are provided in Table 5.
Census correct enumeration rates are provided in Table 6 for the 93 A.C.E. Revision II full E-Sample post-stratum groups. The census correct enumeration rates are defined as the estimated correct enumerations divided by the census counts. The rates are quite low for the seven post-stratum groups for proxy enumerations. Mailback post-stratum groups have higher census correct enumeration rates than non-mailback post-stratum groups. Also, within either the mailback or non-mailback category, the census correct enumeration rates are higher for early responses than for late responses. Table 7 contains the standard errors associated with these estimated census correct enumeration rates.
Table 8 shows the percent net undercount estimates for the 64 A.C.E. Revision II full P-Sample post-stratum groups. Estimated net overcounts are present for nearly all owner post-stratum groups. High estimated net undercount rates for Hispanic and Non-Hispanic Black non-owners stand out. These rates, however, are accompanied by large standard errors. Table 9 displays the percent net undercount standard errors associated with the 64 A.C.E. Revision II full P-Sample post-stratum groups.
4 As discussed in Section 2.3, for individual post-strata without a correlation bias adjustment the census inclusion probabilities would be estimated by the P-Sample match rates, $r_{M,j}$. With the correlation bias adjustment for adult males, we divide the $r_{M,j}$ by the correlation bias adjustment factors. To get the census inclusion rates shown in Table 4 for the post-stratum groups we divided estimated correct enumerations for that group by the A.C.E. Revision II population estimate for that group (Bell 2002a). Examination of equation (1) in Section 2.3 will reveal that for individual P-Sample post-strata these two definitions are consistent with one another.
Table 3: Percent Net Undercount: Race/Hispanic Origin Domain by Tenure

| Characteristic (A.C.E. Revision II) | A.C.E. Revision II Est. (%) | A.C.E. Revision II S.E. (%) | 1990 PES Est. (%) | 1990 PES S.E. (%) | Characteristic (1990 PES) |
|---|---|---|---|---|---|
| Non-Hispanic White | -1.13 | 0.20 | 0.68 | 0.22 | Non-Hispanic White |
| Owner | -1.46 | 0.20 | -0.26 | 0.23 | Owner |
| Non-Owner | -0.07 | 0.41 | 3.06 | 0.50 | Non-Owner |
| AI off Reservation | 0.62 | 1.35 | | | |
| Owner | -1.53 | 1.77 | | | |
| Non-Owner | 3.54 | 2.18 | | | |
| Non-Hispanic Black | 1.84 | 0.43 | 4.57 | 0.53 | Black |
| Owner | 0.56 | 0.49 | 2.26 | 0.56 | Owner |
| Non-Owner | 3.06 | 0.60 | 6.48 | 0.83 | Non-Owner |
| Hispanic | 0.71 | 0.44 | 4.99 | 0.78 | Hispanic |
| Owner | -1.08 | 0.50 | 1.82 | 0.68 | Owner |
| Non-Owner | 2.35 | 0.62 | 7.43 | 1.18 | Non-Owner |
| Non-Hispanic Asian | -0.75 | 0.68 | 2.36 | 1.36 | Asian or Pacific Isl. |
| Owner | -1.71 | 0.85 | -1.45 | 1.47 | Owner |
| Non-Owner | 0.68 | 0.98 | 6.96 | 2.50 | Non-Owner |
| Hawaiian or Pacific Isl. | 2.12 | 2.73 | | | |
| Owner | 0.67 | 3.87 | | | |
| Non-Owner | 3.64 | 3.60 | | | |
| AI on Reservation | -0.88 | 1.53 | | | |
| Owner | -0.74 | 1.74 | | | |
| Non-Owner | -1.17 | 1.71 | | | |
| Total | -0.49 | 0.20 | 1.59 | 0.19 | Total* |
| Owner | -1.25 | 0.20 | 0.04 | 0.21 | Owner |
| Non-Owner | 1.14 | 0.36 | 4.51 | 0.41 | Non-Owner |

* Excludes American Indians on Reservations.
A negative net undercount denotes a net overcount.
The 1990 Hispanic domain excludes Blacks, Asian or Pacific Islanders, and American Indians on Reservation.
The 1990 net undercount is for the PES universe which included noninstitutional, nonmilitary group quarters in addition to the household population. As a result, the 1990 estimates may differ from the CAPE results. See Bryant et al. (1992) and Thompson (1992).
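Table 4: 64 A.C.E. Revision II Full P-Sample Post-Stratum Groups - Census Inclusion Rates (%)

Domain 7 (Non-Hispanic White or Some other race), Owner, by return rate and region:

| MSA/TEA | High Return Rate NE | High Return Rate MW | High Return Rate S | High Return Rate W | Low Return Rate NE | Low Return Rate MW | Low Return Rate S | Low Return Rate W |
|---|---|---|---|---|---|---|---|---|
| Large MSA MO/MB | 95.30 | 96.32 | 95.19 | 95.12 | 92.81 | 94.28 | 92.20 | 92.39 |
| Medium MSA MO/MB | 96.28 | 97.01 | 94.90 | 95.68 | 97.16 | 94.59 | 93.33 | 91.11 |
| Small MSA & Non-MSA MO/MB | 95.98 | 96.06 | 95.35 | 95.73 | 93.14 | 90.92 | 92.26 | 91.41 |
| All Other TEAs | 92.90 | 96.00 | 92.94 | 91.26 | 92.05 | 93.32 | 91.96 | 89.03 |

Groups defined by return rate only (all regions combined):

| Race/Hispanic Origin Domain Number* | Tenure | MSA/TEA | High Return Rate | Low Return Rate |
|---|---|---|---|---|
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Large MSA MO/MB | 89.10 | 85.39 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Medium MSA MO/MB | 89.51 | 85.17 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Small MSA & Non-MSA MO/MB | 89.65 | 85.19 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | All Other TEAs | 88.32 | 83.19 |
| Domain 4 (Non-Hispanic Black) | Owner | Large MSA MO/MB and Medium MSA MO/MB | 89.25 | 84.91 |
| Domain 4 (Non-Hispanic Black) | Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 89.07 | 86.87 |
| Domain 4 (Non-Hispanic Black) | Non-Owner | Large MSA MO/MB and Medium MSA MO/MB | 82.38 | 79.01 |
| Domain 4 (Non-Hispanic Black) | Non-Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 83.99 | 84.78 |
| Domain 3 (Hispanic) | Owner | Large MSA MO/MB and Medium MSA MO/MB | 92.10 | 88.01 |
| Domain 3 (Hispanic) | Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 90.87 | 88.83 |
| Domain 3 (Hispanic) | Non-Owner | Large MSA MO/MB and Medium MSA MO/MB | 86.44 | 80.72 |
| Domain 3 (Hispanic) | Non-Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 83.53 | 75.91 |

Groups defined by tenure only:

| Race/Hispanic Origin Domain Number* | Owner | Non-Owner |
|---|---|---|
| Domain 5 (Native Hawaiian or Pacific Islander) | 87.15 | 83.27 |
| Domain 6 (Non-Hispanic Asian) | 92.32 | 87.07 |
| Domain 1 (American Indian or Alaska Native On Reservation) | 86.13 | 87.14 |
| Domain 2 (American Indian or Alaska Native Off Reservation) | 90.54 | 84.25 |

* For Census 2000, persons can self-identify with more than one race group. For post-stratification purposes, persons are included in a single Race/Hispanic Origin Domain. This classification does not change a person's actual response. Further, all official tabulations are based on actual responses to the census.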
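Table 5: 64 A.C.E. Revision II Full P-Sample Post-Stratum Groups - Census Inclusion Rate Standard Errors (%)

Domain 7 (Non-Hispanic White or Some other race), Owner, by return rate and region:

| MSA/TEA | High Return Rate NE | High Return Rate MW | High Return Rate S | High Return Rate W | Low Return Rate NE | Low Return Rate MW | Low Return Rate S | Low Return Rate W |
|---|---|---|---|---|---|---|---|---|
| Large MSA MO/MB | 0.97 | 1.08 | 0.73 | 0.63 | 1.09 | 1.33 | 1.73 | 1.92 |
| Medium MSA MO/MB | 1.08 | 0.88 | 0.51 | 0.46 | 1.40 | 0.85 | 1.28 | 2.69 |
| Small MSA & Non-MSA MO/MB | 1.04 | 0.74 | 1.21 | 0.90 | 3.86 | 2.46 | 1.18 | 1.41 |
| All Other TEAs | 1.20 | 1.02 | 1.13 | 1.86 | 2.21 | 1.09 | 0.90 | 1.65 |

Groups defined by return rate only (all regions combined):

| Race/Hispanic Origin Domain Number* | Tenure | MSA/TEA | High Return Rate | Low Return Rate |
|---|---|---|---|---|
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Large MSA MO/MB | 0.98 | 1.08 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Medium MSA MO/MB | 0.98 | 1.32 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Small MSA & Non-MSA MO/MB | 0.89 | 1.13 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | All Other TEAs | 0.86 | 1.66 |
| Domain 4 (Non-Hispanic Black) | Owner | Large MSA MO/MB and Medium MSA MO/MB | 0.51 | 1.22 |
| Domain 4 (Non-Hispanic Black) | Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 0.77 | 1.73 |
| Domain 4 (Non-Hispanic Black) | Non-Owner | Large MSA MO/MB and Medium MSA MO/MB | 0.79 | 0.94 |
| Domain 4 (Non-Hispanic Black) | Non-Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 1.16 | 1.80 |
| Domain 3 (Hispanic) | Owner | Large MSA MO/MB and Medium MSA MO/MB | 0.47 | 1.36 |
| Domain 3 (Hispanic) | Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 1.04 | 1.84 |
| Domain 3 (Hispanic) | Non-Owner | Large MSA MO/MB and Medium MSA MO/MB | 0.77 | 1.27 |
| Domain 3 (Hispanic) | Non-Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 1.42 | 3.37 |

Groups defined by tenure only:

| Race/Hispanic Origin Domain Number* | Owner | Non-Owner |
|---|---|---|
| Domain 5 (Native Hawaiian or Pacific Islander) | 3.46 | 3.05 |
| Domain 6 (Non-Hispanic Asian) | 1.21 | 0.82 |
| Domain 1 (American Indian or Alaska Native On Reservation) | 1.39 | 1.38 |
| Domain 2 (American Indian or Alaska Native Off Reservation) | 1.66 | 1.75 |

* For Census 2000, persons can self-identify with more than one race group. For post-stratification purposes, persons are included in a single Race/Hispanic Origin Domain. This classification does not change a person's actual response. Further, all official tabulations are based on actual responses to the census.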
Table 6: 93 A.C.E. Revision II Full E-Sample Post-Stratum Groups - Census Correct Enumeration Rates (%)

Groups with a single rate (proxy enumerations, and Domain 1 non-proxy groups):

| Race/Hispanic Origin Domain* | Group | Rate |
|---|---|---|
| Domain 7 (Non-Hispanic White or SOR) | PROXY | 61.14 |
| Domain 4 (Non-Hispanic Black) | PROXY | 57.61 |
| Domain 3 (Hispanic) | PROXY | 59.18 |
| Domain 5 (Native Hawaiian or Pacific Islander) | PROXY | 63.25 |
| Domain 6 (Non-Hispanic Asian) | PROXY | 59.24 |
| Domain 1 (AI or AN On Reservation) | PROXY | 80.50 |
| Domain 2 (AI or AN Off Reservation) | PROXY | 72.59 |
| Domain 1 (American Indian or Alaska Native On Reservation) | Owner & Non-Owner, HHer/Nuclear | 93.02 |
| Domain 1 (American Indian or Alaska Native On Reservation) | Owner & Non-Owner, Other | 88.71 |

Non-proxy groups by response type:

| Race/Hispanic Origin Domain* | Tenure | Relationship | HH Size | Early Mailback | Late Mailback | Early Non-Mailback | Late Non-Mailback |
|---|---|---|---|---|---|---|---|
| Domain 7 (Non-Hispanic White or Some other race) | Owner | HHer/Nuclear | 2-3 | 97.33 | 96.22 | 94.33 | 90.26 |
| | | | 4+ | 97.97 | 97.09 | 96.15 | 93.17 |
| | | Other | 1 | 95.67 | 93.90 | 91.70 | 87.21 |
| | | | 2-3 | 92.03 | 91.57 | 90.48 | 87.77 |
| | | | 4+ | 89.78 | 89.59 | 88.73 | 87.88 |
| | Non-Owner | HHer/Nuclear | | 95.93 | 95.02 | 93.12 | 89.97 |
| | | Other | | 92.70 | 91.67 | 89.64 | 86.40 |
| Domain 4 (Non-Hispanic Black) | Owner | HHer/Nuclear | | 96.62 | 95.40 | 93.73 | 90.96 |
| | | Other | | 90.87 | 90.54 | 89.56 | 86.85 |
| | Non-Owner | HHer/Nuclear | | 94.75 | 93.79 | 92.69 | 89.12 |
| | | Other | | 90.11 | 89.02 | 89.34 | 84.57 |
| Domain 3 (Hispanic) | Owner | HHer/Nuclear | | 97.68 | 96.98 | 95.20 | 91.60 |
| | | Other | | 92.88 | 91.82 | 88.90 | 88.42 |
| | Non-Owner | HHer/Nuclear | | 96.12 | 95.29 | 92.84 | 89.50 |
| | | Other | | 90.83 | 90.40 | 86.16 | 84.85 |
| Domain 5 (Native Hawaiian or Pacific Islander) | Owner & Non-Owner | HHer/Nuclear | | 97.32 | 93.92 | 93.05 | 92.16 |
| | | Other | | 89.05 | 86.94 | 86.05 | 88.05 |
| Domain 6 (Non-Hispanic Asian) | Owner & Non-Owner | HHer/Nuclear | | 97.34 | 95.65 | 92.35 | 90.95 |
| | | Other | | 90.98 | 90.13 | 86.51 | 86.10 |
| Domain 2 (American Indian or Alaska Native Off Reservation) | Owner & Non-Owner | HHer/Nuclear | | 97.10 | 93.91 | 94.11 | 90.50 |
| | | Other | | 88.93 | 87.70 | 89.59 | 84.08 |

* For Census 2000, persons can self-identify with more than one race group. For post-stratification purposes, persons are included in a single Race/Hispanic Origin Domain. This classification does not change a person's actual response. Further, all official tabulations are based on actual responses to the census.
Table 7: 93 A.C.E. Revision II Full E-Sample Post-Stratum Groups - Census Correct Enumeration Rate Standard Error (%)

Groups with a single rate (proxy enumerations, and Domain 1 non-proxy groups):

| Race/Hispanic Origin Domain* | Group | Standard Error |
|---|---|---|
| Domain 7 (Non-Hispanic White or SOR) | PROXY | 1.08 |
| Domain 4 (Non-Hispanic Black) | PROXY | 1.57 |
| Domain 3 (Hispanic) | PROXY | 1.88 |
| Domain 5 (Native Hawaiian or Pacific Islander) | PROXY | 13.41 |
| Domain 6 (Non-Hispanic Asian) | PROXY | 3.78 |
| Domain 1 (AI or AN On Reservation) | PROXY | 3.20 |
| Domain 2 (AI or AN Off Reservation) | PROXY | 5.97 |
| Domain 1 (American Indian or Alaska Native On Reservation) | Owner & Non-Owner, HHer/Nuclear | 1.01 |
| Domain 1 (American Indian or Alaska Native On Reservation) | Owner & Non-Owner, Other | 1.10 |

Non-proxy groups by response type:

| Race/Hispanic Origin Domain* | Tenure | Relationship | HH Size | Early Mailback | Late Mailback | Early Non-Mailback | Late Non-Mailback |
|---|---|---|---|---|---|---|---|
| Domain 7 (Non-Hispanic White or Some other race) | Owner | HHer/Nuclear | 2-3 | 0.56 | 0.97 | 0.89 | 0.66 |
| | | | 4+ | 0.92 | 0.57 | 1.02 | 1.06 |
| | | Other | 1 | 0.93 | 0.82 | 0.91 | 1.15 |
| | | | 2-3 | 0.61 | 0.80 | 0.98 | 1.02 |
| | | | 4+ | 0.90 | 0.92 | 1.52 | 1.40 |
| | Non-Owner | HHer/Nuclear | | 0.59 | 0.41 | 0.71 | 0.64 |
| | | Other | | 0.58 | 0.48 | 1.21 | 0.59 |
| Domain 4 (Non-Hispanic Black) | Owner | HHer/Nuclear | | 1.09 | 0.62 | 1.28 | 0.83 |
| | | Other | | 0.68 | 0.67 | 0.99 | 1.23 |
| | Non-Owner | HHer/Nuclear | | 0.90 | 1.03 | 1.16 | 0.82 |
| | | Other | | 0.71 | 1.04 | 1.31 | 0.88 |
| Domain 3 (Hispanic) | Owner | HHer/Nuclear | | 0.77 | 0.61 | 0.85 | 1.11 |
| | | Other | | 0.95 | 0.68 | 1.28 | 1.38 |
| | Non-Owner | HHer/Nuclear | | 1.08 | 1.16 | 0.82 | 0.79 |
| | | Other | | 0.65 | 0.61 | 1.27 | 1.10 |
| Domain 5 (Native Hawaiian or Pacific Islander) | Owner & Non-Owner | HHer/Nuclear | | 0.73 | 1.82 | 2.81 | 2.05 |
| | | Other | | 2.38 | 3.10 | 3.78 | 2.97 |
| Domain 6 (Non-Hispanic Asian) | Owner & Non-Owner | HHer/Nuclear | | 0.59 | 1.20 | 1.11 | 0.99 |
| | | Other | | 1.13 | 1.18 | 2.19 | 1.61 |
| Domain 2 (American Indian or Alaska Native Off Reservation) | Owner & Non-Owner | HHer/Nuclear | | 1.18 | 1.48 | 2.05 | 1.92 |
| | | Other | | 1.80 | 2.08 | 2.60 | 2.92 |

* For Census 2000, persons can self-identify with more than one race group. For post-stratification purposes, persons are included in a single Race/Hispanic Origin Domain. This classification does not change a person's actual response. Further, all official tabulations are based on actual responses to the census.
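Table 8: 64 A.C.E. Revision II Full P-Sample Post-Stratum Groups - Percent Net Undercount

Domain 7 (Non-Hispanic White or Some other race), Owner, by return rate and region:

| MSA/TEA | High Return Rate NE | High Return Rate MW | High Return Rate S | High Return Rate W | Low Return Rate NE | Low Return Rate MW | Low Return Rate S | Low Return Rate W |
|---|---|---|---|---|---|---|---|---|
| Large MSA MO/MB | -1.45 | -2.13 | -0.97 | -1.37 | -3.57 | -4.70 | 0.45 | -0.22 |
| Medium MSA MO/MB | -1.96 | -2.56 | -1.01 | -1.85 | -5.87 | -2.36 | -0.90 | 1.43 |
| Small MSA & Non-MSA MO/MB | -1.81 | -1.81 | -1.40 | -1.83 | -0.54 | 1.58 | 0.43 | 0.83 |
| All Other TEAs | -0.69 | -3.26 | -0.60 | -0.53 | -0.68 | -1.95 | -1.36 | 1.27 |

Groups defined by return rate only (all regions combined):

| Race/Hispanic Origin Domain Number* | Tenure | MSA/TEA | High Return Rate | Low Return Rate |
|---|---|---|---|---|
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Large MSA MO/MB | -0.25 | -0.21 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Medium MSA MO/MB | -0.59 | 1.50 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Small MSA & Non-MSA MO/MB | -0.49 | 2.06 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | All Other TEAs | -1.04 | 2.67 |
| Domain 4 (Non-Hispanic Black) | Owner | Large MSA MO/MB and Medium MSA MO/MB | 0.81 | 1.04 |
| Domain 4 (Non-Hispanic Black) | Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 0.04 | -0.54 |
| Domain 4 (Non-Hispanic Black) | Non-Owner | Large MSA MO/MB and Medium MSA MO/MB | 3.49 | 3.76 |
| Domain 4 (Non-Hispanic Black) | Non-Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 1.87 | -2.03 |
| Domain 3 (Hispanic) | Owner | Large MSA MO/MB and Medium MSA MO/MB | -1.15 | -0.43 |
| Domain 3 (Hispanic) | Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | -1.23 | -1.40 |
| Domain 3 (Hispanic) | Non-Owner | Large MSA MO/MB and Medium MSA MO/MB | 1.02 | 4.66 |
| Domain 3 (Hispanic) | Non-Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 2.66 | 8.48 |

Groups defined by tenure only:

| Race/Hispanic Origin Domain Number* | Owner | Non-Owner |
|---|---|---|
| Domain 5 (Native Hawaiian or Pacific Islander) | 0.67 | 3.64 |
| Domain 6 (Non-Hispanic Asian) | -1.71 | 0.68 |
| Domain 1 (American Indian or Alaska Native On Reservation) | -0.74 | -1.17 |
| Domain 2 (American Indian or Alaska Native Off Reservation) | -1.53 | 3.54 |

* For Census 2000, persons can self-identify with more than one race group. For post-stratification purposes, persons are included in a single Race/Hispanic Origin Domain. This classification does not change a person's actual response. Further, all official tabulations are based on actual responses to the census.
A negative net undercount denotes a net overcount.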
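Table 9: 64 A.C.E. Revision II Full P-Sample Post-Stratum Groups - Percent Net Undercount Standard Error

Domain 7 (Non-Hispanic White or Some other race), Owner, by return rate and region:

| MSA/TEA | High Return Rate NE | High Return Rate MW | High Return Rate S | High Return Rate W | Low Return Rate NE | Low Return Rate MW | Low Return Rate S | Low Return Rate W |
|---|---|---|---|---|---|---|---|---|
| Large MSA MO/MB | 0.41 | 0.37 | 0.60 | 0.38 | 0.81 | 1.30 | 1.59 | 1.95 |
| Medium MSA MO/MB | 0.64 | 0.25 | 0.40 | 0.38 | 0.99 | 0.79 | 0.87 | 2.86 |
| Small MSA & Non-MSA MO/MB | 0.81 | 0.36 | 0.47 | 0.49 | 4.01 | 2.38 | 1.00 | 1.36 |
| All Other TEAs | 1.00 | 0.38 | 0.92 | 1.98 | 2.29 | 1.16 | 0.65 | 1.64 |

Groups defined by return rate only (all regions combined):

| Race/Hispanic Origin Domain Number* | Tenure | MSA/TEA | High Return Rate | Low Return Rate |
|---|---|---|---|---|
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Large MSA MO/MB | 0.68 | 1.00 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Medium MSA MO/MB | 0.69 | 1.24 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | Small MSA & Non-MSA MO/MB | 0.57 | 1.24 |
| Domain 7 (Non-Hispanic White or Some other race) | Non-Owner | All Other TEAs | 0.87 | 1.72 |
| Domain 4 (Non-Hispanic Black) | Owner | Large MSA MO/MB and Medium MSA MO/MB | 0.55 | 1.12 |
| Domain 4 (Non-Hispanic Black) | Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 0.90 | 1.73 |
| Domain 4 (Non-Hispanic Black) | Non-Owner | Large MSA MO/MB and Medium MSA MO/MB | 0.77 | 1.07 |
| Domain 4 (Non-Hispanic Black) | Non-Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 1.00 | 1.98 |
| Domain 3 (Hispanic) | Owner | Large MSA MO/MB and Medium MSA MO/MB | 0.54 | 1.33 |
| Domain 3 (Hispanic) | Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 0.97 | 2.02 |
| Domain 3 (Hispanic) | Non-Owner | Large MSA MO/MB and Medium MSA MO/MB | 0.72 | 1.18 |
| Domain 3 (Hispanic) | Non-Owner | Small MSA & Non-MSA MO/MB and All Other TEAs | 1.43 | 4.28 |

Groups defined by tenure only:

| Race/Hispanic Origin Domain Number* | Owner | Non-Owner |
|---|---|---|
| Domain 5 (Native Hawaiian or Pacific Islander) | 3.87 | 3.60 |
| Domain 6 (Non-Hispanic Asian) | 0.85 | 0.98 |
| Domain 1 (American Indian or Alaska Native On Reservation) | 1.74 | 1.71 |
| Domain 2 (American Indian or Alaska Native Off Reservation) | 1.77 | 2.18 |

* For Census 2000, persons can self-identify with more than one race group. For post-stratification purposes, persons are included in a single Race/Hispanic Origin Domain. This classification does not change a person's actual response. Further, all official tabulations are based on actual responses to the census.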
2.2 Estimates of Census Coverage for Small Geographic Areas
As will be discussed in Section 2.3, correlation bias-adjusted DSEs and corresponding coverage correction factors (CCFs) are formed for each combination of E- and P-Sample post-strata. Call these combinations detailed post-strata. The A.C.E. Revision II had 7,456 non-empty detailed post-strata, thus resulting in 7,456 direct DSEs and CCFs. The CCFs are applied synthetically within the detailed post-strata as shown in Section 2.3 to produce estimates for small geographic areas such as places and counties. This section examines the CCFs and the resulting place and county estimates. These results are based on Census 2000 collection geography.
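As a rough illustration of what applying CCFs synthetically means, the sketch below multiplies an area's census count in each detailed post-stratum by that post-stratum's CCF and sums the results. This is the general form of a synthetic estimate; the exact formulation is in Section 2.3. The post-stratum labels, CCF values, and counts are hypothetical and chosen only for illustration.

```python
def synthetic_estimate(area_counts, ccfs):
    """Sum CCF-weighted census counts over the detailed post-strata in one area."""
    return sum(ccfs[post_stratum] * count
               for post_stratum, count in area_counts.items())

# Hypothetical CCFs for three detailed post-strata (illustrative values only).
ccfs = {"ps_a": 1.02, "ps_b": 0.98, "ps_proxy": 0.60}

# Hypothetical census counts for one place, broken down by detailed post-stratum.
place_counts = {"ps_a": 1_000, "ps_b": 2_000, "ps_proxy": 100}

estimate = synthetic_estimate(place_counts, ccfs)
census_total = sum(place_counts.values())
print(f"census count: {census_total}, synthetic estimate: {estimate:.0f}")
```

Every area uses the same CCF for a given detailed post-stratum, so differences between areas in estimated coverage come only from differences in how their census counts are distributed across post-strata.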
Figure 1 shows the distribution of the CCFs for non-proxy detailed post-strata within three size groups defined by the detailed post-stratum census counts. The size groups are defined as follows: (1) less than 10,000; (2) between 10,000 and 100,000; and (3) greater than 100,000.
CCFs for detailed post-strata associated with proxy census enumerations are displayed separately. Figure 1 shows that detailed post-strata with 100,000 or more people typically have CCFs greater than 1, indicating an undercounted population. Detailed post-strata in the two smaller size categories have more CCFs below 1 than above 1, indicating predominantly overcounted populations. Detailed post-strata for proxy census enumerations have dramatically lower CCFs, ranging between 0.4 and 1.1. Nearly all the proxy CCFs are less than 1, and most are substantially less than 1, reflecting large estimated overcounts.
Table 10 and Figure 2 show the distribution of estimated net undercount rates for the A.C.E.
eligible population for places by size of place. The data in this table represent about 57.0 percent of the total population. Synthetic estimates were formed at the place level, and then compared to the census counts for those places. For each place, the net undercount rate is estimated as (100 times) the synthetic estimate less the census count divided by the synthetic estimate. The entries in the body of Table 10 give the count of places falling in each cell defined by the undercount rate categories and place size categories, and the percent that count represents of the column total (the total number of places in the given place size category).
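The net undercount rate defined above can be sketched in a few lines of Python. This is only an illustration of the stated formula; the function name and the example place (a census count of 10,200 against a synthetic estimate of 10,000) are hypothetical and do not come from the report.

```python
def net_undercount_rate(synthetic_estimate: float, census_count: float) -> float:
    """Percent net undercount: 100 * (estimate - census count) / estimate."""
    if synthetic_estimate <= 0:
        raise ValueError("synthetic estimate must be positive")
    return 100.0 * (synthetic_estimate - census_count) / synthetic_estimate


# Hypothetical place: the census counted 10,200 people, the synthetic
# estimate is 10,000. A negative rate means an estimated net overcount.
rate = net_undercount_rate(10_000, 10_200)
print(f"{rate:.2f}%")
```

Because the denominator is the synthetic estimate rather than the census count, an overcount of 200 on an estimate of 10,000 gives a rate of exactly -2 percent.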
For the 211 places with populations of at least 100,000, the net undercount rates fall between -2.0 percent and 3.0 percent. As the place size decreases, the estimated net undercount rates become more variable. The smallest places shown have estimated net undercount rates between -22.0 percent and 6.0 percent, the former corresponding to an estimated overcount rate of 22.0 percent. The fact that the largest places (at least 100,000 people) have more estimated net undercounts than overcounts may be due to some concentration of hard-to-count populations in large urban areas. For smaller places, the synthetic estimates produce more estimated net overcounts than undercounts, and the net overcount estimates are more extreme than for larger places.
Table 11 and Figure 3 show analogous results for counties. The data shown represent the full A.C.E. eligible population of 273,586,997 people. Counties are divided into categories based on their census counts as follows: (1) 0-2,499; (2) 2,500-9,999; (3) 10,000-24,999; (4) 25,000-99,999; (5) 100,000-249,999; (6) 250,000-999,999; and (7) greater than or equal to 1,000,000.
At the county level, synthetic estimation leads to more estimated net overcounts than undercounts overall. Even the largest counties (at least 1,000,000 people) yield an overall net undercount estimate of -0.11 percent, reflecting a slight estimated overcount. As county size decreases, the estimated net overcount rates increase. The largest estimated net overcounts are in the 7.0 - 10.0 percent range, but these instances are rare. One county in size category 0-2,499 has an estimated net undercount between 8.0 and 9.0 percent, but this is an extreme case. Figure 3 shows the distribution of county estimated undercount rates for each size category. The differences in the distributions across the county size categories shown in Figure 3 do not seem as large as the differences in the distributions across the place size categories shown in Figure 2.
Figure 1: Distribution of CCFs by Post-stratum Size
[Histogram. Horizontal axis: CCF, from 0.4 to 1.4. Vertical axis: number of post-strata, from 0 to 200. Series: >100K, 10K-100K, <10K, proxy.]
Table 10: Distribution of Net Undercount Rates for Places by Place Size - A.C.E. Eligible Population

Each cell in the body gives the number of places and, in parentheses, the percent of the column total. Rate categories below 0% are net overcounts; rate categories above 0% are net undercounts.

| Census 2000 Count | >= 100,000 | 25,000-99,999 | 10,000-24,999 | 2,500-9,999 | 1,000-2,499 | 250-999 | 100-249 | 1-99 |
|---|---|---|---|---|---|---|---|---|
| Number of Places | 211 | 888 | 1266 | 3316 | 3167 | 5423 | 2727 | 2271 |
| Total census | 68,684,049 | 41,870,505 | 19,934,239 | 16,950,949 | 5,061,970 | 2,941,782 | 460,589 | 120,028 |
| Average Rate | 0.2037% | -0.3830% | -0.6042% | -0.6902% | -1.0693% | -1.7356% | -2.2026% | -2.5312% |
| Median Rate | 0.1438% | -0.4445% | -0.6781% | -0.8479% | -1.1873% | -1.8767% | -2.2623% | -2.4361% |
| -56% to -10% | | | | | 1 (0.03%) | 16 (0.30%) | 14 (0.51%) | 76 (3.35%) |
| -10% to -9% | | | | | 1 (0.03%) | 9 (0.17%) | 11 (0.40%) | 26 (1.14%) |
| -9% to -8% | | | | 1 (0.03%) | 7 (0.22%) | 8 (0.15%) | 16 (0.59%) | 33 (1.45%) |
| -8% to -7% | | | | 4 (0.12%) | 5 (0.16%) | 21 (0.39%) | 34 (1.25%) | 50 (2.20%) |
| -7% to -6% | | | | 4 (0.12%) | 6 (0.19%) | 51 (0.94%) | 72 (2.64%) | 86 (3.79%) |
| -6% to -5% | | | | 6 (0.18%) | 22 (0.69%) | 107 (1.97%) | 102 (3.74%) | 131 (5.77%) |
| -5% to -4% | | | 2 (0.16%) | 12 (0.36%) | 49 (1.55%) | 301 (5.55%) | 216 (7.92%) | 231 (10.17%) |
| -4% to -3% | | 3 (0.34%) | 7 (0.55%) | 47 (1.42%) | 195 (6.16%) | 773 (14.25%) | 432 (15.84%) | 309 (13.61%) |
| -3% to -2% | | 33 (3.72%) | 81 (6.40%) | 315 (9.50%) | 593 (18.72%) | 1247 (22.99%) | 617 (22.63%) | 350 (15.41%) |
| -2% to -1% | 21 (9.95%) | 220 (24.77%) | 388 (30.65%) | 1082 (32.63%) | 850 (26.84%) | 1148 (21.17%) | 456 (16.72%) | 361 (15.90%) |
| -1% to 0% | 64 (30.33%) | 330 (37.16%) | 480 (37.91%) | 1019 (30.73%) | 745 (23.52%) | 877 (16.17%) | 383 (14.04%) | 245 (10.79%) |
| 0% to 1% | 102 (48.34%) | 233 (26.24%) | 234 (18.48%) | 513 (15.47%) | 430 (13.58%) | 558 (10.29%) | 202 (7.41%) | 178 (7.84%) |
| 1% to 2% | 22 (10.43%) | 63 (7.09%) | 56 (4.42%) | 238 (7.18%) | 189 (5.97%) | 207 (3.82%) | 123 (4.51%) | 102 (4.49%) |
| 2% to 3% | 2 (0.95%) | 6 (0.68%) | 16 (1.26%) | 55 (1.66%) | 55 (1.74%) | 73 (1.35%) | 33 (1.21%) | 43 (1.89%) |
| 3% to 4% | | | 2 (0.16%) | 15 (0.45%) | 14 (0.44%) | 23 (0.42%) | 11 (0.40%) | 29 (1.28%) |
| 4% to 5% | | | | 3 (0.09%) | 4 (0.13%) | 3 (0.06%) | 3 (0.11%) | 12 (0.53%) |
| 5% to 12% | | | | 2 (0.06%) | 1 (0.03%) | 1 (0.02%) | 2 (0.07%) | 9 (0.40%) |
Figure 2: Distribution of Place Undercount Rates for A.C.E. Eligible Population
[Chart. Horizontal axis: undercount rate, from -11% to 6%. Vertical axis: number of places, from 0 to 1250. Series: >100K, 25K-100K, 10K-25K, 2500-10K, 1000-2499, 250-999, 100-249, 0-99.]
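Table 11: Distribution of Net Undercount Rates for Counties by County Size - A.C.E. Eligible Population

Each cell in the body gives the number of counties and, in parentheses, the percent of the column total. Rate categories below 0% are net overcounts; rate categories above 0% are net undercounts.

| Census Count | >= 1,000,000 | 250,000-999,999 | 100,000-249,999 | 25,000-99,999 | 10,000-24,999 | 2,500-9,999 | 0-2,499 |
|---|---|---|---|---|---|---|---|
| Counties | 33 | 184 | 298 | 1016 | 880 | 607 | 117 |
| Total Census | 67,840,497 | 92,087,055 | 45,754,730 | 49,275,680 | 14,604,178 | 3,843,599 | 181,258 |
| Wtd. Average | -0.1125% | -0.4296% | [illegible] | -0.8290% | -0.9692% | -1.1053% | [illegible] |
| Median | -0.2613% | -0.5290% | -0.6690% | [illegible] | -0.8900% | -1.1833% | -1.0011% |
| -10% to -9% | | | | | 1 (0.11%) | | |
| -9% to -8% | | | | | 2 (0.23%) | 1 (0.16%) | 1 (0.85%) |
| -8% to -7% | | | | | | 1 (0.16%) | 2 (1.71%) |
| -7% to -6% | | | | 2 (0.20%) | | 3 (0.49%) | 5 (4.27%) |
| -6% to -5% | | | | 1 (0.10%) | 2 (0.23%) | 4 (0.66%) | 5 (4.27%) |
| -5% to -4% | | | | 1 (0.10%) | 2 (0.23%) | 9 (1.48%) | 7 (5.98%) |
| -4% to -3% | | | | 11 (1.08%) | 18 (2.05%) | 22 (3.62%) | 4 (3.42%) |
| -3% to -2% | | 2 (1.09%) | 13 (4.36%) | 118 (11.61%) | 147 (16.70%) | 132 (21.75%) | 13 (11.11%) |
| -2% to -1% | 5 (15.15%) | 47 (25.54%) | 89 (29.87%) | 295 (29.04%) | 246 (27.95%) | 160 (26.36%) | 22 (18.80%) |
| -1% to 0% | 15 (45.45%) | 82 (44.57%) | 130 (43.62%) | 427 (42.03%) | 313 (35.57%) | 148 (24.38%) | 25 (21.37%) |
| 0% to 1% | 12 (36.36%) | 47 (25.54%) | 50 (16.78%) | 130 (12.80%) | 113 (12.84%) | 83 (13.67%) | 18 (15.38%) |
| 1% to 2% | 1 (3.03%) | 6 (3.26%) | 16 (5.37%) | 27 (2.66%) | 31 (3.52%) | 40 (6.59%) | 12 (10.26%) |
| 2% to 3% | | | | 4 (0.39%) | 3 (0.34%) | 3 (0.49%) | 2 (1.71%) |
| 3% to 4% | | | | | 2 (0.23%) | 1 (0.16%) | |
| 4% to 5% | | | | | | | |
| 5% to 6% | | | | | | | |
| 6% to 7% | | | | | | | |
| 7% to 8% | | | | | | | |
| 8% to 9% | | | | | | | 1 (0.85%) |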
Figure 3: Distribution of County Undercount Rates for A.C.E. Eligible Population
[Chart. Horizontal axis: undercount rate, from -10% to 9%. Vertical axis: number of counties, from 0 to 450. Series: >1000K, 250K-1000K, 100K-250K, 25K-100K, 10K-25K, 2,500-9,999, <2500.]
2.3 A.C.E. Revision II Estimation Methodology

The A.C.E. Revision II estimation methodology uses Dual System Estimates (DSEs) that incorporate corrections for measurement errors obtained from two sources: the recoded cases from the A.C.E. evaluation data by the A.C.E. Revision II Measurement Coding Operation (see Adams and Kresja 2002a) and census duplicate detections from the Further Study of Person Duplication (see Mule 2002b). Both of these corrections affect the estimated E-Sample correct enumeration rates and the P-Sample match rates. The DSEs for adult males are also inflated by correlation bias adjustment factors estimated using DA sex ratios for the adult age groups (18-29, 30-49, 50+) at the national level by Black versus NonBlack race groups.
The specific form of the A.C.E. Revision II DSE is given in equation (1) and discussed below.
For a detailed discussion of the estimator, see Kostanich (2003a) or Kostanich (2003b).
$$DSE_{ij} = Cen_{ij} \times r_{DD,ij} \times \frac{r_{CE,i}}{r_{M,j}} \times \lambda \qquad (1)$$

where:

i and j denote the E- and P-Sample post-strata used to estimate the correct enumeration and match rates, respectively.

Cenij is the census count of the household population for the cross-classification of post-strata i and j. It includes the reinstated cases.

rDD,ij is the data-defined rate for the cross-classification of post-strata i and j. The reinstated cases are included in the denominator but not in the numerator.

rCE,i is the estimated correct enumeration rate for E-Sample post-stratum i.

rM,j is the estimated match rate for P-Sample post-stratum j.

λ is the correlation bias adjustment factor (for adult males, distinct for a given age-race group).
The numerator of the data-defined rate, rDD,ij, is the count of census data-defined persons, which is the census count excluding whole person imputations and all reinstated persons (those who were removed from the census but then reinstated as part of the Housing Unit Duplication Operation). The denominator of rDD,ij is the census count, so that the product, Cenij × rDD,ij, at the level of the ij post-strata is the count of data-defined persons that were eligible for A.C.E. matching.
The correct enumeration rate, rCE,i, is the ratio of the E-Sample estimated correct enumerations to the weighted estimate of data-defined persons for E-Sample post-stratum i. The product, Cenij × rDD,ij × rCE,i, effectively estimates correct enumerations for the detailed ij post-stratum under the synthetic assumption that correct enumeration rates are constant over persons within E-Sample post-stratum i.
The match rate, rM,j, is the ratio of estimated matches to estimated Census Day residents for P-Sample post-stratum j. Under the traditional DSE independence assumption (no correlation bias),
these match rates would estimate the probabilities of persons being included in the census, so that dividing the estimated correct enumerations (Cenij × rDD,ij × rCE,i) by rM,j would appropriately inflate them to account for census omissions (under the synthetic assumption that census inclusion probabilities are constant over persons within P-Sample post-stratum j). In the presence of correlation bias, the rM,j tend to overestimate the census inclusion probabilities, so that dividing by them does not sufficiently inflate the estimate of correct enumerations. Demographic Analysis sex ratios provide evidence of such correlation bias and permit its estimation for adult males (assuming no correlation bias for adult females) at the national level for age-race (Black versus NonBlack) groups. These estimates can be expressed as multiplicative factors λ (the correlation bias adjustment factors in equation (1)), which correct the adult male DSEs for this estimated correlation bias. Note this includes a synthetic assumption that correlation bias for adult males is constant over persons within the age-race groups. For children and adult females the factors λ are equal to 1.
The results of the A.C.E. Revision II Measurement Coding Operation and Further Study of Person Duplication affect the estimates of correct enumerations that are the numerators of the correct enumeration rates, rCE,i. The denominators of the correct enumeration rates are not affected. For example, E-Sample cases with duplicates that were originally coded as correct enumerations are given reduced correct enumeration probabilities, which reduces tabulated estimates of correct enumerations. The A.C.E. Revision II Measurement Coding Operation and Further Study of Person Duplication also affect both the estimates of matches and the estimates of P-Sample residents that are the numerators and denominators of the match rates, rM,j. The specifics are somewhat complicated. For details, see Kostanich (2003a) or Kostanich (2003b).
Equation (1) shows how the A.C.E. Revision II estimates are constructed for the cross-classified ij post-strata. To produce estimates for specific areas or population subgroups we first define coverage correction factors (CCFs) by dividing the DSEs from equation (1) by the corresponding census counts, i.e.,
$$CCF_{ij} = DSE_{ij} / Cen_{ij} = r_{DD,ij} \times \frac{r_{CE,i}}{r_{M,j}} \times \lambda \qquad (2)$$

To produce the estimate for any area or population subgroup a, the CCFs from equation (2) are applied synthetically:

$$\sum_{ij} Cen_{a,ij} \times CCF_{ij} = \sum_{ij} Cen_{a,ij} \times r_{DD,ij} \times \frac{r_{CE,i}}{r_{M,j}} \times \lambda$$

where the summation is over all the cross-classified ij post-strata, Cena,ij is the census count in post-stratum ij for area or subgroup a, and λ is the correlation bias adjustment factor from equation (1).
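The way equations (1) and (2) feed the synthetic estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the Census Bureau's production code: the function names, the post-stratum labels, and every rate and count below are made up for the example, and the correlation bias factor is passed as a plain number that defaults to 1.

```python
from typing import Dict, Tuple

# (E-Sample post-stratum i, P-Sample post-stratum j); labels are hypothetical.
PostStratum = Tuple[str, str]


def ccf(r_dd: float, r_ce: float, r_m: float, corr_bias: float = 1.0) -> float:
    """Coverage correction factor as in equation (2)."""
    if r_m <= 0:
        raise ValueError("match rate must be positive")
    return r_dd * (r_ce / r_m) * corr_bias


def dse(cen: float, r_dd: float, r_ce: float, r_m: float,
        corr_bias: float = 1.0) -> float:
    """Dual system estimate as in equation (1): census count times the CCF."""
    return cen * ccf(r_dd, r_ce, r_m, corr_bias)


def synthetic_estimate(area_counts: Dict[PostStratum, float],
                       ccfs: Dict[PostStratum, float]) -> float:
    """Sum of area census counts times the CCF of each detailed post-stratum."""
    return sum(count * ccfs[key] for key, count in area_counts.items())


# Illustrative (made-up) rates for two detailed post-strata.
ccfs = {
    ("i1", "j1"): ccf(r_dd=0.98, r_ce=0.95, r_m=0.92),
    ("i2", "j1"): ccf(r_dd=0.97, r_ce=0.90, r_m=0.92, corr_bias=1.02),
}
# Made-up census counts for one small area, by detailed post-stratum.
area_counts = {("i1", "j1"): 1000.0, ("i2", "j1"): 500.0}

estimate = synthetic_estimate(area_counts, ccfs)
print(round(estimate, 1))
```

The sketch makes the synthetic assumption visible: the area contributes only its census counts, while all rates come from the post-stratum level.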
The A.C.E. Revision II DSE can be thought of as incorporating the following enhancements to a traditional DSE:
- New post-stratification to reflect different factors related to erroneous inclusions and omissions.
- Measurement corrections to the correct enumeration rate from the Further Study of Person Duplication.
- Measurement corrections to the correct enumeration rate from the A.C.E. Revision II Measurement Coding Operation.
- Measurement corrections to the match rate from the Further Study of Person Duplication.
- Measurement corrections to the match rate from the A.C.E. Revision II Measurement Coding Operation.
- Adjustment for correlation bias.
The impact of these revisions can best be seen by looking at the numerical effects of incorporating one change at a time to the DSE. Consider Table 12 below which shows the impact of each change relative to the March 2001 A.C.E. estimates of national net undercount.
Table 12: Change in Estimated Net Undercount of the Household Population

| Revision | Change in Estimated Net Undercount* | Cumulative Estimated Net Undercount |
|---|---|---|
| March 2001 A.C.E. Estimate | | 3,261,876 |
| New Post-Stratification | 38,618 | 3,300,493 |
| E-Sample Corrections: Person Duplication | -2,814,355 | 486,138 |
| E-Sample Corrections: Coding Corrections | -2,427,198 | -1,941,060 |
| P-Sample Corrections: Person Duplication | -1,103,805 | -3,044,865 |
| P-Sample Corrections: Coding Corrections | 11,032 | -3,033,833 |
| Correlation Bias | 1,702,176 | -1,331,656 |
| A.C.E. Revision II Estimate | -4,593,532 | -1,331,656 |

* Shows the effect of adding in one revision at a time. A different ordering of the revisions would result in slightly different intermediate effects, but yield the same overall net undercount estimate. Estimated change in the net undercount is not the same as estimated additional erroneous enumerations or additional census omissions.
This table starts with the March 2001 A.C.E. estimate of a national net undercount of just under 3.3 million persons. Each row shows the effect on the net undercount estimate of making one of the specific revisions. Using only the new post-stratification and not making any measurement error corrections would increase the estimated net undercount to 3.3 million, an increase of less than 39,000. Though the effect of the new post-stratification is small at the national level, it has considerably more impact on subnational estimates, particularly for small areas, as noted in Section 2.2. When measurement error corrections are made to the correct enumeration rate, we see that if we first correct for those identified by the person duplication study the estimated net undercount is reduced by 2.8 million. Next, adding in the corrections identified from the recoding reduces the estimated net undercount by another 2.4 million, resulting in an estimated net overcount of 1.9 million. Next we incorporate measurement error corrections into the match rate.
First, adding in the corrections based on the person duplication study reduces the estimated net undercount by another 1.1 million. Adding in the corrections from the recoding causes the estimated net undercount to increase slightly by only 11,000. Making the final correction for correlation bias increases the estimated net undercount by 1.7 million, yielding the A.C.E.
Revision II estimate of a 1.3 million net overcount.
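The running totals described above can be reproduced with a short script. The change values are taken from Table 12; the variable names are illustrative only. Because the published figures are rounded, some intermediate totals computed this way may differ from the printed ones by one person, while the final estimate and the total change agree exactly.

```python
from itertools import accumulate

march_2001_estimate = 3_261_876

# Changes from Table 12, in the order the revisions were incorporated.
changes = [
    ("New post-stratification", 38_618),
    ("E-Sample person duplication corrections", -2_814_355),
    ("E-Sample coding corrections", -2_427_198),
    ("P-Sample person duplication corrections", -1_103_805),
    ("P-Sample coding corrections", 11_032),
    ("Correlation bias", 1_702_176),
]

# Running net undercount after each revision (drop the starting value).
running = list(accumulate((c for _, c in changes),
                          initial=march_2001_estimate))[1:]

for (label, change), total in zip(changes, running):
    print(f"{label:42s} {change:>12,d} {total:>12,d}")

final_estimate = running[-1]
total_change = final_estimate - march_2001_estimate
print(f"Final estimate: {final_estimate:,d}; total change: {total_change:,d}")
```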
Limitations - It is important to note that the changes in the net undercount estimate shown in this table reflect a specific ordering of incorporating the A.C.E. Revision II changes. If the order were rearranged, the estimated change in the net undercount for each incorporation would be different. However, the final estimates at the bottom of the table would still be the same. The net undercount change estimates are not equivalent to estimates of additional census erroneous enumerations measured by the A.C.E. Revision II. The table shows change in the net undercount estimates. For example, the table shows that after accounting for the new post-stratification and the additional erroneous enumerations, the estimate went from an undercount of 3.3 million to an overcount of 1.9 million. This is a change in the net undercount of 5.2 million people. This is not the change in erroneous enumerations. Likewise, changes in undercount estimates are not equivalent to the estimates of additional census omissions. See Mule (2003) for further discussion and for similar tables for the Race/Hispanic Origin domains.
- 3. Comparison to Demographic Analysis (DA)
This section summarizes the comparison of the A.C.E. Revision II coverage estimates of Census 2000 to the corresponding estimates based on Demographic Analysis (DA). We examine the consistency of the DA estimates at the national level with the A.C.E. Revision II estimates with adjustment for correlation bias. The adjustment for correlation bias is made on the basis of the DA results on sex ratios for adult males (separately for Black males and NonBlack males).
Robinson and Adlakha (2002) discuss the A.C.E. Revision II and DA comparisons in more detail.
Note that this assessment is confined to the comparison of DA and A.C.E. Revision II estimates at the national level; the consistency of the estimates for subnational areas is not addressed here.
DA represents a macro-level approach for estimating the net undercount by comparing aggregate sets of data or counts. The demographic method differs fundamentally from the survey-based method used in the Census 2000 A.C.E. The traditional DA population estimates are developed for the census date by analyzing various types of demographic data, such as administrative statistics on births, deaths, legal international migration, and Medicare enrollments, as well as estimates of legal emigration and unauthorized immigration. The difference between the DA estimate and the census count provides an estimate of the census net undercount. Dividing the net undercount by the DA estimate provides an estimate of the net undercount rate.
The Census 2000 count of 281.4 million is 0.34 million lower than the revised DA estimate of 281.8 million (Table 13). Relative to DA, the difference implies a net undercount of 0.12 percent.
This net undercoverage is dramatically different from that in the 1990 census or any other previous census. In 1990, the revised net undercount estimated by DA was 4.2 million or 1.65 percent. The DA results show that the improvement in coverage between the 1990 and 2000 censuses was shared by almost all demographic groups, males and females, Blacks and NonBlacks, and broad age groups. Overall, the DA results show that for Census 2000 the net census undercount had been reduced to substantially low levels except for the two groups--Black adult men and young children ages 0-9--for whom the net census undercount remained disproportionately high. These are the only groups in 2000 with coverage rates that differed by 2 percentage points or more from the coverage rate for the total population.
The A.C.E. Revision II estimates of net undercount rates with adjustment for correlation bias are broadly consistent with the DA estimates. The A.C.E. Revision II estimate with correlation bias adjustment (280.1 million) is 1.7 million below the revised DA estimate. The A.C.E. Revision II estimate implies a net census overcount of 1.3 million, or -0.48 percent (see note 5), compared to the DA estimated net undercount of 0.12 percent. The A.C.E. Revision II with an adjustment for correlation bias primarily affects the undercount estimates for Black adult males and brings the measured differentials in line with DA (Table 14 and Figure 4). This is basically a consequence of using the DA sex ratios to remove the correlation bias. The A.C.E. Revision II estimates for females (especially NonBlack females) are generally consistent with the DA estimates for ages 10 and over, even though they did not receive an adjustment for correlation bias (Figure 4).

Note 5: This estimated net undercount rate from A.C.E. Revision II is slightly different from the -0.49 percent estimate cited earlier, mainly because it is relative to the entire resident population including persons in group quarters.
The A.C.E. Revision II and the DA estimates remain inconsistent with regard to coverage rates for children aged 0-9. In contrast to DA results, which show a relatively large undercount of children (both Black and NonBlack), the A.C.E. Revision II estimates show a net overcount of NonBlack children and a small net undercount of Black children (Table 14 and Figure 4). We need to do further research into the causes of the inconsistency of the DA and A.C.E. Revision II results for young children.
For ages 50 and over, a smaller but systematic gap is observed between the DA estimate and A.C.E. Revision II estimate for each race-sex group (Figure 4). For Black males, the DA percent net undercount is higher than the corresponding A.C.E. Revision II estimate; for NonBlack males DA measures a small net undercount and the A.C.E. Revision II estimates a small net overcount; for Black females and NonBlack females both DA and the A.C.E. Revision II measure a net overcount but the DA estimate is smaller.
Table 13: Census Count, Demographic Analysis (DA) Estimate and A.C.E. Revision II Estimate for the U.S. Resident Population: April 1, 2000 (a minus sign indicates a net overcount)

Count or Estimate
- 1. Census Count 281,421,906
- 2. DA Estimate 281,759,858
- 3. A.C.E. Revision II Estimate 280,090,250

Net Census Undercount (Amount)
- 4. DA Estimate (=2-1) 337,952
- 5. A.C.E. Revision II Estimate (=3-1) -1,331,656

Net Census Undercount (Percent)
- 6. DA Estimate (=4/2*100) 0.12
- 7. A.C.E. Revision II Estimate (=5/3*100) -0.48

Source: U.S. Census Bureau
Notes:
- 1) A.C.E. Revision II estimate includes an adjustment for correlation bias, based on the DA sex ratios for adult males.
- 2) DA estimate reflects revised estimate published in U.S. Bureau of the Census, 2001, ESCAP II, Report No. 1, October 13.
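Rows 4 through 7 of Table 13 follow from rows 1 through 3 by the arithmetic given in the row labels. A brief sketch, with hypothetical variable and function names, shows the calculation:

```python
from typing import Tuple

# Rows 1-3 of Table 13.
census_count = 281_421_906
da_estimate = 281_759_858
ace_rev2_estimate = 280_090_250


def net_undercount(estimate: int, census: int) -> Tuple[int, float]:
    """Return (amount, percent); the percent is relative to the estimate."""
    amount = estimate - census
    return amount, 100.0 * amount / estimate


da_amount, da_percent = net_undercount(da_estimate, census_count)
ace_amount, ace_percent = net_undercount(ace_rev2_estimate, census_count)

print(f"DA:               {da_amount:>12,d}  {da_percent:6.2f}%")
print(f"A.C.E. Rev. II:   {ace_amount:>12,d}  {ace_percent:6.2f}%")
```

Each percent uses its own estimate as the denominator, which is why the two rates are not directly comparable as shares of the census count.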
Table 14: Estimates of Percent Net Undercount by Race, Sex, and Age based on DA and A.C.E. Revision II: Census 2000

| Category | Age | DA | A.C.E. Revision II |
|---|---|---|---|
| BLACK MALE | All ages | 5.15 | 4.19 |
| | 0-9 | 3.26 | 0.72 |
| | 10-17 | -1.88 | -0.59 |
| | 18-29 | 5.71 | 6.14 |
| | 30-49 | 9.87 | 8.29 |
| | 50+ | 3.87 | 2.43 |
| BLACK FEMALE | All ages | 0.52 | -0.61 |
| | 0-9 | 3.60 | 0.70 |
| | 10-17 | -1.20 | -0.55 |
| | 18-29 | -0.66 | 0.00 |
| | 30-49 | 1.28 | -0.40 |
| | 50+ | -1.03 | -2.51 |
| NONBLACK MALE | All ages | 0.21 | -0.19 |
| | 0-9 | 2.18 | -0.68 |
| | 10-17 | -2.01 | -1.46 |
| | 18-29 | -0.63 | 0.19 |
| | 30-49 | 0.63 | 1.05 |
| | 50+ | 0.14 | -1.10 |
| NONBLACK FEMALE | All ages | -0.78 | -1.41 |
| | 0-9 | 2.59 | -0.68 |
| | 10-17 | -1.55 | -1.44 |
| | 18-29 | -1.94 | -1.54 |
| | 30-49 | -1.01 | -0.63 |
| | 50+ | -1.18 | -2.42 |

Source and Notes: See Table 13.
Figure 4 a-d. Comparison of Alternative Estimates of Percent Net Census Undercount for 2000: DA Estimates and A.C.E. Revision II Estimates with Adjustment for Correlation Bias
[Four panels: (a) Black Female, (b) NonBlack Female, (c) Black Male, (d) NonBlack Male. Horizontal axis: Age Group (0-9, 10-17, 18-29, 30-49, 50+). Vertical axis: percent net undercount, from -4.0 to 10.0. Series: DA Estimate, A.C.E. Revision II.]
Source: Table 14.
- 4. Evaluation of A.C.E. Revision II Results

This section discusses the major results of the evaluations of the components of the A.C.E. Revision II estimation and the evaluation of relative error in the census and the A.C.E. Revision II using confidence intervals and loss functions. The individual components discussed in this section are the identification of census duplicates of records in the E-Sample and P-Sample. The identification of E-Sample records with duplicates affects the challenge of improving the estimation of erroneous enumerations. The identification of P-Sample nonmover residents linked to enumerations outside the search area affects the challenge of improving the estimation of census omissions.
4.1 Evaluation of the Identification of Duplicates

Issue:
Could the accuracy of the duplicates identified for A.C.E. Revision II using only computer algorithms and data collected in the census and A.C.E. be validated by an independent data source provided by administrative records and in a review by analysts?
Major Findings:
Generally, administrative records and a review by analysts agreed with the duplicates identified by A.C.E. Revision II. These studies suggest that not all the duplicates were found. The upper bound for additional duplicates outside the search area for the A.C.E. Revision II estimator appears to be 1.2 million in the E-Sample and 2.3 million for P-Sample nonmover residents.
However, some evidence suggests that there may be only half as many. The estimation of the order of magnitude should be possible with additional tabulations of current data.
Detailed Discussion:
Two evaluations provided evidence that duplicate enumerations are present in the census and were not detected by the March 2001 A.C.E. estimation and evaluations. The evaluations also showed that estimation of duplicates has been greatly improved for A.C.E. Revision II. One of the evaluations used administrative records, and the other used the Census Bureau's elite matching team. We will describe the studies and then discuss the results for the estimation of erroneous enumerations and census omissions.
The Census and Administrative Records Duplication Study (CARDS) (Bean and Bauder 2002) independently identified census duplicates of E-Sample enumerations and P-Sample people using administrative records. CARDS first assigned a Protected Identification Key (PIK), based on Social Security Numbers, to each census and P-Sample record. CARDS then designated each duplicate identified by the Further Study of Person Duplication (FSPD) as confirmed (same PIK), denied (different PIKs), or undetermined (a PIK could not be assigned to at least one record). CARDS also identified duplicates that A.C.E. Revision II did not designate as duplicates. CARDS was conducted in the A.C.E. sample.
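The three-way designation described above amounts to a simple rule on the pair of PIKs. The sketch below is only an illustration of that rule as stated in the text; the function name and the PIK strings are hypothetical, and the actual CARDS processing is not described here in enough detail to reproduce.

```python
from typing import Optional


def classify_link(pik_a: Optional[str], pik_b: Optional[str]) -> str:
    """Classify a candidate duplicate pair by comparing the two PIKs.

    None stands for a record to which no PIK could be assigned.
    """
    if pik_a is None or pik_b is None:
        return "undetermined"
    return "confirmed" if pik_a == pik_b else "denied"


# Made-up PIK values, for illustration only.
print(classify_link("PIK-001", "PIK-001"))
print(classify_link("PIK-001", "PIK-002"))
print(classify_link("PIK-001", None))
```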
In the Clerical Review of Census Duplicates (CRCD) (Byrne et al. 2002), the Census Bureau's elite matching team classified duplicates from A.C.E. Revision II statistical linking and CARDS as confirmed, denied, or undetermined. Some of the duplicates found by CARDS also were found by the A.C.E. Revision II exact matching. CRCD reviewed data collected in the census and A.C.E. for E-Sample cases and P-Sample nonmover residents with duplicates outside the surrounding blocks in a subsample of the A.C.E. block clusters known as the Evaluation Sample.
CRCD did not review duplicates to enumerations in group quarters because the analysts would not have information from other household members to use in making decisions.
Estimation of Erroneous Enumerations

Generally, CARDS and CRCD agreed with A.C.E. Revision II on the identification of E-Sample records with duplicates in the census. The estimate of the number of duplicates in the census using only the duplicates identified by administrative records is 6,653,171 while the A.C.E.
Revision II methodology estimated 5,826,478. CARDS found more duplicates that were geographically distant and more group quarters duplicates while the A.C.E. Revision II process was better at finding duplicates that were geographically close.
When we consider duplicates used in the A.C.E. Revision II, we are focusing on the duplicates outside the search area and including duplicates to enumerations that were deleted by the census Housing Unit Duplication Operation (HUDO) (Miskura 2000, Nash 2000). CARDS found approximately 1.2 million additional duplicates outside the search area in the E-Sample not found by A.C.E. Revision II statistical or exact matching. The elite matching team in CRCD raised some questions about the duplicates found only by CARDS. For discussion, we separate the three sources of duplicates, A.C.E. Revision II statistical matching, A.C.E. Revision II exact matching, and CARDS only. The CRCD focused only on households with members who had duplicates found by A.C.E. Revision II statistical matching or CARDS, and the elite matching team found very few additional duplicates in such households. Some of the cases in CRCD found by CARDS and not by A.C.E. Revision II statistical matching also were found by A.C.E.
Revision II exact matching.
The elite matching team agreed with 94.9 percent of the A.C.E. Revision II statistical matching E-Sample duplicates outside the search area, denied 3.8 percent, and found 1.3 percent undetermined. In fact, both CARDS and CRCD agreed with 73.4 percent (922,325 out of 1.25 million) of the duplicates found outside the search area. Also, both studies agreed that 81.7 percent (3.2 million out of 3.9 million) of the links A.C.E. Revision II statistical matching found outside the search area but did not declare duplicates were not duplicates. CRCD alone agreed that 93.8 percent were not duplicates, but that 4.6 percent were duplicates with 1.6 percent undetermined.
For A.C.E. Revision II exact matching duplicates in the E-Sample also found by CARDS, we have results for the elite matching team under the assumption that the probability of being a duplicate equals 1. Under this assumption, the elite team agreed with 80.0 percent of the A.C.E.
Revision II exact matching E-Sample duplicates in housing units eligible for the E-Sample found outside the search area, denied 8.3 percent, and found 11.8 percent undetermined. For duplicates to enumerations outside the search area in housing units that were reinstated or deleted during
HUDO, the elite team agreed with 98.5 percent, denied 1.0 percent, and found 0.5 percent undetermined. However, all these cases received a probability of being a duplicate that is less than 1 in the A.C.E. Revision II estimator. A tabulation using the A.C.E. Revision II probability of being a duplicate would represent the agreement rate for these cases as they were used in the estimator.
A.C.E. Revision II exact matching duplicates in the E-Sample that were not also found by CARDS were not included in the CRCD review because they were not available in time for sample selection. The implication is that 384,049 of the duplicates in the E-Sample incorporated in the A.C.E. Revision II estimates were not confirmed (denied or undetermined) by administrative records and not submitted for review by the matching team. Also, 513,984 duplicates to enumerations in group quarters outside the search area were not eligible for review in CRCD.
CARDS found approximately 1.2 million additional duplicates in the E-Sample not found by A.C.E. Revision II statistical or exact matching. The matching team's review raises questions about the duplicates found only by CARDS. The matching team agreed on 37.3 percent, disagreed on 47.3 percent, and was undecided on 15.4 percent. The question is whether the matching team is correct for these cases, because the reason given for 70.0 percent of the disagreements was household composition. We have concerns about the accuracy of the coding for the cases based on household composition because detecting a person who is truly a member of two different households is difficult.
The interpretation of the CRCD results for the duplicates found only by CARDS is further complicated by the fact that address information was not used in assigning all the PIKs. We have more confidence in duplicates identified using PIKs assigned using address information along with personal characteristics. However, the distributions of the duplicates by whether address information was used in assigning the PIKs for both members of the pair, only 1 member of the pair, or neither of the pair are not dramatically different for those found only by CARDS and those found by both CARDS and A.C.E. Revision II when tabulated separately by within state and between state.
An additional tabulation of the CRCD results for the pairs in the same state where one is outside the search area, broken down by whether CARDS identified them using a combination of address information and personal characteristics or only personal characteristics, would indicate whether there is a difference between the cases found by CARDS only and those found by both CARDS and A.C.E. Revision II. A tabulation of these cases by the amount of agreement between the personal characteristics (first name, middle initial, last name, sex, month of birth, day of birth, age) also would provide insight into the quality of the links.
Estimation of Census Omissions
Generally, CARDS and CRCD agreed with A.C.E. Revision II on the identification of P-Sample records with census enumerations, with patterns similar to the results for E-Sample. When we consider P-Sample cases linked to census enumerations used in the A.C.E. Revision II, we are focusing on the cases linked to enumerations outside the search area and including enumerations that were deleted by the census Housing Unit Duplication Operation. We report evaluation results for the P-Sample nonmover residents because they affect the A.C.E. Revision II estimator.
Since the E-Sample includes correct and erroneous enumerations, the results for the E-Sample would be more comparable to the P-Sample nonmover residents and nonresidents combined than to the P-Sample nonmover residents by themselves.
The estimate of the total number of P-Sample nonmover residents linked to enumerations outside the search area of the A.C.E. block based only on administrative records is 7,789,570, where 4,698,642 are matches and 3,090,928 are nonmatches. The estimate for A.C.E. Revision II is 6,264,996 where 3,360,417 are matches and 2,904,579 are nonmatches. The major difference is that for links between P-Sample nonmover residents and enumerations in different states, administrative records estimates 1.0 million more matches and 264,006 more nonmatches.
CARDS found approximately 2.3 million additional P-Sample nonmover residents with census enumerations outside the search area not found by A.C.E. Revision II statistical or exact matching. The elite matching team in CRCD raised some questions about the duplicates found only by CARDS. For discussion, we separate the three sources of duplicates, A.C.E. Revision II statistical matching, A.C.E. Revision II exact matching, and CARDS only. The CRCD focused only on households with members who had enumerations outside the search area found by A.C.E. Revision II statistical matching or CARDS, and the elite matching team found very few additional duplicates in such households. Some of the cases in CRCD found by CARDS and not by A.C.E. Revision II statistical matching also were found by A.C.E. Revision II exact matching.
The elite matching team agreed with 96.3 percent of the A.C.E. Revision II statistical matching P-Sample nonmover residents with census enumerations outside the surrounding blocks, disagreed with 2.6 percent, and were undecided about 1.1 percent. Both CRCD and CARDS agreed on 78.1 percent of the P-Sample nonmover residents linked to enumerations outside the search area that were found by A.C.E. Revision II statistical matching. Also, both studies agreed that 43.1 percent (507,531 out of 1.2 million) of the links A.C.E. Revision II statistical matching found between P-Sample nonmover residents and enumerations outside the search area but did not declare duplicates were not duplicates. The agreement on the denials is smaller for the P-Sample nonmover residents than for the E-Sample because a higher percentage for the P-Sample nonmover residents was classified as undetermined by CARDS. CRCD alone agreed that 66.3 percent were not duplicates, but that 26.3 percent were duplicates with 7.4 percent undetermined.
For A.C.E. Revision II exact matching P-Sample nonmover residents with census enumerations outside the search area also found by CARDS, we have results for the elite matching team under the assumption that the probability of being a duplicate equals 1. Under this assumption, the elite team agreed with 67.2 percent of the A.C.E. Revision II exact matching P-Sample nonmover residents with duplicates found in housing units eligible for the E-Sample outside the search area with 17.7 percent denied and 15.2 percent undetermined. For P-Sample nonmover residents with enumerations in housing units outside the search area that were reinstated or deleted during HUDO, the elite team agreed with 98.8 percent, denied 0.9 percent, and were undecided about 0.3 percent. However, all these cases received a probability of being a duplicate that is less than 1 in the A.C.E. Revision II estimator. A tabulation using the A.C.E. Revision II probability of being a duplicate would represent the agreement rate for these cases as they were used in the estimator.
P-Sample nonmover residents with census enumerations outside the search area identified by A.C.E. Revision II exact matching but not also found by CARDS were not included in the CRCD review because they were not available in time for sample selection. The implication is that 622,870 of the P-Sample nonmover residents linked to enumerations outside the search area incorporated in the A.C.E. Revision II estimates were not confirmed (undetermined or denied) by administrative records and not submitted for review by the matching team. Also, 401,634 P-Sample nonmover residents linked to enumerations in group quarters outside the search area were not eligible for review in CRCD.
CARDS found approximately 2.3 million additional P-Sample nonmover residents with census enumerations outside the search area not found by A.C.E. Revision II statistical or exact matching. The matching team's review raises questions about the P-Sample nonmover residents linked to enumerations outside the surrounding blocks found only by CARDS. The matching team agreed on 28.5 percent, disagreed on 56.4 percent, and was undecided on 15.1 percent. A tabulation of the reasons the team gave when they disagreed would indicate whether the reasons are the same as those for the E-Sample duplicates found only by CARDS.
As with the E-Sample, the interpretation of the CRCD results for the P-Sample nonmover residents linked to enumerations found only by CARDS is further complicated by the fact that address information was not used in assigning all the PIKs. However, the distributions of the duplicates by whether address information was used in assigning the PIKs for both members of the pair, only 1 member of the pair, or neither of the pair are not dramatically different for those found only by CARDS and those found by both CARDS and A.C.E. Revision II when tabulated separately by within state and between state. An additional tabulation of the distribution of the CRCD results by whether CARDS identified them using a combination of address information and personal characteristics or only personal characteristics would indicate whether there is a difference in the cases found by CARDS only and those found by both CARDS and A.C.E. Revision II. A tabulation of these cases by the amount of agreement between the personal characteristics (first name, middle initial, last name, sex, month of birth, day of birth, age) also would provide insight into the quality of the links.
4.2 Loss Functions and Confidence Intervals
Issue:
What is the relative accuracy of the census and the A.C.E. Revision II estimates for shares and levels for geographic groupings such as states and groups of counties and places used in fund allocations? What is the relative accuracy of the census and the A.C.E. Revision II estimates for domain and tenure groups?
Major Findings:
Evaluations were performed on the A.C.E. Revision II estimates to estimate bias (systematic error) and variance (random error) for use in constructing bias-corrected confidence intervals and in a loss function analysis. The evaluations of bias were relatively limited because data that previously were used to estimate bias were incorporated into the A.C.E. Revision II estimates in order to correct for major errors discovered in the March 2001 A.C.E. estimates. The limited data available for evaluation of bias does not itself reflect negatively on the A.C.E. Revision II estimates; in fact, it is because of the corrections for major errors that we believe the A.C.E. Revision II estimates to be of much higher quality than the March 2001 A.C.E. estimates.
Nevertheless, although the evaluations do account for the variance arising from the corrections for bias, the corrections for bias in the A.C.E. Revision II estimates may themselves be subject to bias, the magnitude of which has not been quantified. This is particularly true for the corrections for correlation bias and for P-Sample cases that matched census enumerations outside the A.C.E. search area.
The evaluations detected a small amount of bias in the A.C.E. Revision II estimate of the net undercount rate at the national level, only -0.16 percent. Based on the bias-corrected 95-percent confidence intervals, both the census and the A.C.E. Revision II estimates are too low for Non-Hispanic Blacks and for both Non-Hispanic Black Owners and Renters. The intervals show the census is too high for Non-Hispanic Whites, Owners, White Owners, and Hispanic Owners. All other census and A.C.E. Revision II estimates are covered by their bias-corrected 95-percent confidence intervals. The source of most of the bias estimate is the CARDS evaluation of the identification of duplicates. Tabulations of the CARDS E-Sample and P-Sample cases by race/ethnicity domain and enumeration (or residency) status would explain how the bias arises.
The loss function analysis examines the relative accuracy by using the estimates of sampling variance and nonsampling bias and variance to estimate the aggregate expected loss for the census and the A.C.E. Revision II for levels and shares for counties and places across the nation and within state. The analyses indicated that the A.C.E. Revision II is more accurate than the census for every loss function considered with the exception of levels for places with population of at least 100,000. The bulk of the error in the A.C.E. Revision II for places with population of at least 100,000 appears to lie in the nine (9) places with population of at least 1 million. More research is needed to understand the one exceptional result. The validity of the loss function analysis depends on the quality of the estimates of components of error in the A.C.E. Revision II, and some of those components are not accurately quantified. The resulting limitations on the loss function analysis are discussed below.
Detailed Discussion:
Two methods assess the relative accuracy of the estimates of population size from A.C.E. Revision II and Census 2000. One method examines the quality of the census and A.C.E. Revision II through the construction of confidence intervals for the census undercount rate corrected for bias as well as variance. The other method uses a loss function analysis to compare the relative accuracy of the census and the A.C.E. Revision II for states, counties, and places.
The confidence intervals and loss function analysis are based on estimates of components of error in the A.C.E. Revision II estimates. The calculations assume that we have available unbiased estimates of the biases and variances of the A.C.E. Revision II estimates. This will not be exactly true; however, if the biases and variances unaccounted for are relatively small, the loss function results will be approximately unbiased. If errors not accounted for are relatively large, however, the calculations will be biased and the validity of the conclusions will be in jeopardy. Even if the loss function results are approximately unbiased, they still are subject to random error. In principle one could develop a confidence interval for the difference in accuracy for the census and A.C.E. Revision II estimates, but this was not done. The limitations on the loss function analysis are discussed below.
The measure of accuracy used by the loss functions was weighted mean squared error, with weights set inversely proportional to the census counts. Mean squared error equals the sum of variance and squared bias, and the bias and variance estimates account for both sampling and nonsampling errors. Of course, the bias and variance estimates will themselves have errors.
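As an illustration of the accuracy measure described above, the following minimal sketch computes a weighted mean squared error loss, with each area's squared bias plus variance weighted by the reciprocal of its census count. All numbers and names are hypothetical, and treating the census error as pure bias with zero variance is a simplifying assumption made only for this example; it is not the estimation procedure used in the evaluation.

```python
def weighted_squared_error_loss(census_counts, biases, variances):
    """Sum over areas of (variance + bias**2) / census count."""
    total = 0.0
    for count, bias, variance in zip(census_counts, biases, variances):
        mean_squared_error = variance + bias ** 2  # MSE = variance + squared bias
        total += mean_squared_error / count        # weight = 1 / census count
    return total


# Hypothetical inputs for three areas (not taken from the report).
census_counts = [1000.0, 5000.0, 20000.0]

ace_biases = [5.0, -10.0, 40.0]
ace_variances = [100.0, 400.0, 900.0]

# Assumption for illustration only: census error treated as all bias.
census_biases = [-20.0, -60.0, 150.0]
census_variances = [0.0, 0.0, 0.0]

ace_loss = weighted_squared_error_loss(census_counts, ace_biases, ace_variances)
census_loss = weighted_squared_error_loss(census_counts, census_biases, census_variances)

print("A.C.E. loss:", ace_loss)
print("Census loss:", census_loss)
print("A.C.E. more accurate:", ace_loss < census_loss)
```

In this made-up example the estimate with the smaller aggregate loss would be judged more accurate; omitting a variance or bias term from either input list changes the comparison, which is the sensitivity the surrounding discussion describes.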
The effect of omitting a variance component (if the corresponding error is uncorrelated with other random effects) would be to overstate the accuracy of the A.C.E. Revision II estimate and to understate the accuracy of the census, but we have not identified significant omitted variance components. The effects of neglecting bias components are more difficult to predict for two reasons: (1) positive biases may cancel with negative biases, and (2) omitting biases affects the estimates of accuracy of both the A.C.E. Revision II estimates and the census. (The direction of the effect on the comparison of accuracy depends on the sign of a weighted sum of products of neglected biases and expected values of the undercount estimates. See Mulry and Spencer (2001) for details.) Thus, in general, we cannot be certain whether omitted biases will tend to make any given loss function analysis overstate or understate the comparative accuracy of the A.C.E. Revision II estimates relative to the census. Further analysis could, in principle, be done to investigate this. For example, sensitivity analyses could examine the effects on the loss function analyses of different assumed amounts and distributions of error. This would give indications of the amounts and distributions of error needed to reverse the comparisons from the loss function analysis.
The loss function analysis accounted for some but not all error components that could be identified in the A.C.E. Revision II estimates. More specifically, the bias estimate included error components for inconsistency of post-stratification assignments based on census versus A.C.E. data, for error from estimating the numbers of outmovers by the numbers of inmovers, and most importantly, for error in the estimates of census duplicates, although evaluations indicate that this error may have been misestimated. The variance estimate included sampling error components from both phases of sampling in A.C.E. Revision II estimates, and also random nonsampling error components from choice of imputation models and for models used to account for P-Sample cases that matched census enumerations outside the search area. The error in the identification of census duplicates is discussed in Section 4.1 and the other components are discussed further in Section 6. The potential errors for which the loss function analysis did not include an allowance are listed in Section 6.5. Though not fully included in the loss functions, the effects of synthetic error were investigated.
One source of synthetic error involves correcting the individual post-stratum estimates for errors estimated at more aggregate levels (such as the corrections for correlation bias and coding errors). Two of the variance components noted above (those related to choice of imputation models and to accounting for P-Sample cases matching to census enumerations outside the search area) were included in the loss functions, but these components reflect the level of these errors, not the synthetic errors from such corrections. Errors from other such corrections, such as the adjustments for correlation bias, were not reflected. Another source of synthetic error is variations of census coverage within post-strata (something not captured by synthetic application of post-stratum coverage correction factors for specific areas). Analyses based on artificial populations that simulated patterns of coverage variation within post-strata were done to assess whether omission of resulting synthetic biases from the loss function analysis tilted the comparisons in one direction or another. These analyses did not in general change the loss function results, though they had some limitations. It should be kept in mind that synthetic error is expected to be more important the smaller are the areas whose estimates are being compared, so that any limitations of the loss functions regarding synthetic error would be expected to be more important in comparisons for small places or counties than for large places or counties.
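The synthetic application of post-stratum coverage correction factors mentioned above can be sketched as follows. This is a minimal illustration, not the production procedure: the post-stratum labels, factors, and counts are hypothetical, and the sketch assumes each factor is a single multiplier applied uniformly to every area's census count in that post-stratum.

```python
def synthetic_estimate(area_counts_by_post_stratum, coverage_correction_factors):
    """Apply each post-stratum's correction factor to the area's census count."""
    return sum(
        count * coverage_correction_factors[post_stratum]
        for post_stratum, count in area_counts_by_post_stratum.items()
    )


# Hypothetical correction factors estimated at the post-stratum level.
coverage_correction_factors = {"ps1": 1.02, "ps2": 0.99}

# Hypothetical census counts for one small area, by post-stratum.
area_counts = {"ps1": 1000, "ps2": 2000}

area_estimate = synthetic_estimate(area_counts, coverage_correction_factors)
print(area_estimate)
```

Because every area receives the same factor for a given post-stratum, any variation in true coverage within the post-stratum is not captured, which is one source of the synthetic error discussed in this paragraph.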
While there are acknowledged limitations in the loss function analyses of the A.C.E. Revision II estimates, it is worth noting that the current analyses are markedly superior to the loss function analyses conducted of the March 2001 estimates. Furthermore, it was realized in March 2001 that there were very significant limitations of those analyses. First, inconsistencies with the Demographic Analysis estimates suggested there were potentially significant errors in the March 2001 estimates (something ultimately found to be the case), and such errors were not reflected in the loss function targets. Second, since the 2000 A.C.E. evaluation data were not then available, the analysis applied estimated error rates from the 1990 Post Enumeration Survey to the 2000 A.C.E. results as a crude approximation. It was clear this would lead to mis-specification of the resulting error components, though the amount of mis-specification was unknown. While sensitivity analyses (see Navarro and Asiala 2001) attempted to address this limitation, given our current knowledge it is clear that this was inadequate. In particular, the range considered for data collection error was far too narrow to reflect the large number of undetected census duplicates. As a result, the March 2001 loss function analysis was incorrect in its assessments of accuracy of the census and the A.C.E. estimates. In contrast, while there are limitations with the loss function analyses of the A.C.E. Revision II estimates, we have no evidence to suggest or reason to expect that these limitations approach the magnitude of those of the March 2001 loss function analyses.
When viewing the confidence intervals and results of the loss function analysis for the A.C.E. Revision II, one must keep the assumptions and limitations in mind. For example, the estimated bias in the A.C.E. Revision II estimates may not account for all the sources of bias or may not account for the included nonsampling error components well. Due to time limitations, estimates of ratio-estimator bias are not included. Estimates of correlation bias used in the A.C.E. Revision II are assumed to be without error. The estimated variance in the A.C.E. Revision II estimates may not account for all the sources of variance or may not account for the included nonsampling error components well, especially for error from choice of model used in the estimation of the probability of being a resident for the P-Sample nonmover residents with duplicates. As for choices in the loss function analysis, the expected loss could instead have been measured by a loss function other than squared error weighted by the reciprocal of the census count.
Considering the limitations, the bias-corrected estimate of the net undercount rate for the U.S. is -0.33 percent while the A.C.E. Revision II estimate is -0.49 percent. The estimated bias appears to be due to error in the identification of duplicates, since the effects of the error due to inconsistent post-stratification variables and the error due to using inmovers to estimate movers appear very small. Additional tabulations by enumeration and residency status by domain would indicate whether the increase in the undercount rate arises from the effect of undetected duplicates in the P-Sample or the E-Sample. For example, if the evaluation detected duplications of erroneous enumerations in the E-Sample, the A.C.E. Revision II estimate would increase.
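The arithmetic behind the national figures can be sketched as follows: subtracting the estimated bias of -0.16 percent from the A.C.E. Revision II estimate of -0.49 percent gives the bias-corrected -0.33 percent. The standard error below is a hypothetical placeholder (the national standard error is not given in this section), and the normal-approximation interval is an assumed form for illustration rather than the exact construction used in the evaluation.

```python
def bias_corrected_interval(estimate, bias, se, z=1.96):
    """Return (corrected estimate, lower bound, upper bound)."""
    corrected = estimate - bias          # remove the estimated bias
    half_width = z * se                  # normal-approximation half-width
    return corrected, corrected - half_width, corrected + half_width


# Figures from the text: net undercount rate and its estimated bias, in percent.
ace_estimate = -0.49
estimated_bias = -0.16

# Hypothetical standard error, in percentage points (assumption for illustration).
assumed_se = 0.20

corrected, low, high = bias_corrected_interval(ace_estimate, estimated_bias, assumed_se)
ace_is_covered = low <= ace_estimate <= high

print(round(corrected, 2), round(low, 3), round(high, 3), ace_is_covered)
```

An estimate that falls outside such an interval would be flagged as too high or too low, which is the kind of comparison reported for the domain and tenure groups.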
Based on the bias-corrected 95-percent confidence intervals, both the census and A.C.E. Revision II estimates for Non-Hispanic Blacks, Non-Hispanic Black Owners, and Black Renters are too low. Neither the census nor A.C.E. Revision II estimates lie within the 95-percent confidence interval that includes a bias correction. The bias-corrected estimate of the net undercount rate for the Non-Hispanic Blacks is 3.56 percent while the A.C.E. Revision II estimate is 1.72 percent. Additional tabulations by enumeration and residency status by domain would indicate whether the increase in the undercount rate arises from the effect of undetected duplicates in the P-Sample or the E-Sample. The estimate of a 2.78 percent net undercount rate for Blacks based on Demographic Analysis (Robinson and Adlakha 2002) does lie within the 95-percent confidence interval. The intervals for all the other domains cover both the census and the A.C.E. Revision II estimate, with the exception of the census for Non-Hispanic Whites. When the groups are the domains crossed by tenure, the bias-corrected 95-percent confidence intervals covered both the census and the A.C.E. Revision II estimate for the groups, with the exception of the census for all Owners, Non-Hispanic White Owners, and Hispanic Owners, where the intervals indicated the census was too high.
The loss function analysis considered shares for five geographic groupings and levels for five geographic groupings, with some overlap of groupings. If we accept the error components as estimated, the analyses indicate that the A.C.E. Revision II is more accurate than the census for every loss function considered with the exception of levels for places with population of at least 100,000. When the places with population of at least 100,000 are split into places with population between 100,000 and 1 million and places with population of at least 1 million, the loss function analysis indicates that the bulk of the error in the A.C.E. Revision II for places with population of at least 100,000 lies in the nine (9) places with population of at least 1 million.
The loss function analyses did not take synthetic estimation error into account, but separate analyses (Griffin 2002) suggest that had synthetic error been included, the conclusions would have been the same.
The major source of estimated bias in the A.C.E. Revision II concerns the estimation of census duplicates. There are two evaluations of those estimates, Census and Administrative Records Study (CARDS) (Bean and Bauder 2002) and Clerical Review of Census Duplicates (CRCD) (Byrne et al. 2002). The estimation of the bias in the loss function analysis is based on CARDS. There are some discrepancies in findings from CARDS and CRCD. If these differences were resolved, one or more of the conclusions from the outcome of the loss function analysis could change.
However, under the assumption that the A.C.E. Revision II estimates have only the bias due to inconsistent reporting of poststratification variables, which is very small, and the only other error components are the estimated sampling and nonsampling variance components, the loss function analysis finds that the A.C.E. Revision II estimates are more accurate than the census for all groupings considered, even for levels for places with population of at least 100,000. Further analyses assuming larger amounts of bias or a different distribution of the bias would increase the knowledge of the limitations of the data.
- 5. Limitations of the A.C.E. Revision II Estimates
This section discusses some other issues that arose with the A.C.E. Revision II estimates that are not discussed above. Most of these relate to methodological decisions that involved some uncertainty, i.e., where other decisions could have been made and this would have had some appreciable impact on the results. Table 15 below summarizes the issues, and Sections 5.1 to 5.6 that follow provide general discussions (with references given to reports providing still more detailed discussions when these are available). Some of these uncertainties are reflected to some extent in the loss function calculations discussed in Section 4, but some are not. This latter topic is discussed in Section 6.5.
5.1 Adjustment for Correlation Bias
Summary:
Correlation bias refers to the tendency towards underestimation by DSEs if persons found in the census are more likely than those missed in the census to also have been found in the coverage survey. The adjustment corrects for correlation bias in adult male DSEs at the national level by age-race (Black versus NonBlack) groups subject to the assumptions of: (1) no bias in female DSEs, and (2) accuracy of the DA sex ratios (modified for comparability with the A.C.E. Revision II universe). The correlation bias adjustment added about 800,000 persons to the estimates for Blacks and about 900,000 to the estimates for NonBlacks. In relative terms, it was thus much more important for Blacks than NonBlacks (since Blacks are a much smaller group overall). Without the correlation bias adjustment we would have estimated a 0.53 percent net overcount for Blacks rather than a (statistically significant) net undercount of 1.84 percent.
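The direction of correlation bias can be illustrated with a minimal sketch. It uses the simple two-list (Lincoln-Petersen) form of a dual system estimate, not the full A.C.E. Revision II estimator, and a hypothetical population of two subgroups with different capture probabilities. Expected counts are used in place of random draws so the result is deterministic.

```python
def dual_system_estimate(census_count, survey_count, matches):
    """Simple two-list estimate of population size: n1 * n2 / m."""
    return census_count * survey_count / matches


def expected_counts(group):
    """Expected census count, survey count, and matches for one subgroup,
    assuming the two captures are independent within the subgroup."""
    size = group["size"]
    census = size * group["p_census"]
    survey = size * group["p_survey"]
    matched = size * group["p_census"] * group["p_survey"]
    return census, survey, matched


# Hypothetical subgroups: one easy to count, one hard to count in both systems.
groups = [
    {"size": 800, "p_census": 0.95, "p_survey": 0.95},
    {"size": 200, "p_census": 0.60, "p_survey": 0.60},
]
true_total = sum(g["size"] for g in groups)

per_group = [expected_counts(g) for g in groups]

# Estimating within each subgroup recovers the true total.
by_group_total = sum(dual_system_estimate(*counts) for counts in per_group)

# Pooling the subgroups makes census and survey capture positively correlated,
# and the pooled estimate falls short of the true total.
pooled_counts = [sum(column) for column in zip(*per_group)]
pooled_estimate = dual_system_estimate(*pooled_counts)

print(true_total, round(by_group_total, 1), round(pooled_estimate, 1))
```

In the pooled data, persons found in the census are more likely to be found in the survey than persons missed by the census, which is the condition the summary above describes as producing underestimation.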
Table 15: Summary of Issues Discussed in Sections 5.1 to 5.6

Section | Title | Issue |
---|---|---|
5.1 | Adjustment for Correlation Bias | Alternative models for correlation bias can be used that are equally consistent with the data but that produce different subnational estimates. |
5.2 | Underestimation of Duplicates by the A.C.E. Revision II Further Study of Person Duplication | Some underestimation is expected since it is not possible to detect all census duplicates by computer matching due to coincidental agreements of names and birth dates, and to difficulties in detecting duplicated persons whose records contain missing data. |
5.3 | Alternative Approaches are Available for Tabulating Contributions to Correct Enumerations from E-Sample Cases with Duplicate Links | For most detected duplicate pairs it is not possible to determine which member of the pair is the true correct enumeration. There are alternative ways of handling duplicates for tabulating correct enumerations that are consistent with the one correct enumeration per duplicate pair principle yet produce different post-stratum estimates, and hence different subnational estimates. |
5.4 | Alternative Approaches are Available for Assigning Residency Probabilities to P-Sample Cases that Link to Census Cases Outside the Search Area | For most P-Sample cases linked to census cases outside the search area it is not possible to determine which of the two locations should be regarded as the person's Census Day residence. Alternative approaches are possible for treating these cases in tabulating P-Sample total residents and matches. Furthermore, there is no obvious aggregate control analogous to the one correct enumeration per duplicate pair principle used in the tabulation of correct enumerations for E-Sample cases with duplicate links. |
5.5 | Use of Different Post-strata for the E-Sample and P-Sample Could Either Reduce or Increase Synthetic Error | The new E-Sample post-stratification explains significantly more variation in correct enumeration rates than did the previous post-stratification, and this could reduce synthetic error under certain conditions. However, since the new E-Sample post-stratifying factors could not readily be tested for use in post-stratifying the P-Sample, it is also possible that these separate E- and P-Sample post-strata lead to a systematic bias and to more synthetic error. At this point whether this feature reduces, increases, or has little effect on synthetic error is unknown. |
5.6 | Other Issues in Synthetic Estimation | The synthetic nature of the corrections made for correlation bias and for duplicates (note issue statements above for Sections 5.1, 5.3, and 5.4) leads to synthetic error in post-stratum estimates, and hence also in estimates for geographic areas or population subgroups. |
Limitations:
(1) Different models that provide the same fit to the data can be used to allocate among post-strata the correlation bias estimated at the national level for age-race (Black versus NonBlack) groups. There is unresolvable uncertainty about which model is most appropriate, yet the different models yield different subnational estimates.
(2) The adjustment assumes no bias (including correlation bias) in DSE for adult females. Also, no correlation bias is estimated for children.
(3) Data for NonBlacks 18-29 do not support estimation of correlation bias; this could be due to errors in the DA or A.C.E. Revision II data for this group, and possibly to failure of the assumptions noted.
Detailed Discussion:
Shores (2002) discusses the calculation of the correlation bias adjustments in the A.C.E. Revision II estimates. Bell (1993) gives a more detailed discussion of the underlying methodology, and Bell (2001a, 2001b) discusses application of alternative models for correlation bias adjustments to the March 2001 A.C.E. data. As noted in these references, the approach used assumes no correlation bias for children and adult females, and adjusts estimates for adult males so national tabulations by age-sex-race (Black versus NonBlack) will reproduce sex ratios calculated from demographic analysis estimates modified for comparability with the A.C.E. Revision II estimates. (For example, one part of the modification is to subtract the census count of the group quarters population from the DA estimates.) Subject to the assumptions made and to the quality of the modified DA sex ratios, correlation bias adjustment should correct for a tendency of the DSE to underestimate census omissions of adult males. For A.C.E. Revision II the correlation bias adjustments were obtained from the two-group model, which assumes constant relative (percentage) bias in the adult male DSE over all post-strata within an age-race group.
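One way to read the constant-relative-bias assumption is that, within an age-race group, every adult male post-stratum DSE is multiplied by the same factor, chosen so that the national male-to-female ratio matches the modified DA sex ratio. The sketch below illustrates that reading only; the function name, the DSE values, and the sex ratio are hypothetical, and the actual calculation is the one documented in Shores (2002).

```python
def two_group_adjustment(male_dse, female_dse, da_sex_ratio):
    """Scale male post-stratum DSEs by one common factor so that
    sum(adjusted male) / sum(female) equals the DA sex ratio."""
    factor = da_sex_ratio * sum(female_dse) / sum(male_dse)
    adjusted_male = [factor * value for value in male_dse]
    return factor, adjusted_male


# Hypothetical post-stratum DSEs within one age-race group.
male_dse = [900.0, 450.0, 150.0]
female_dse = [1000.0, 500.0, 200.0]

# Hypothetical modified DA sex ratio (males per female).
da_sex_ratio = 0.95

factor, adjusted_male = two_group_adjustment(male_dse, female_dse, da_sex_ratio)
implied_ratio = sum(adjusted_male) / sum(female_dse)

print(round(factor, 4), [round(v, 1) for v in adjusted_male], round(implied_ratio, 2))
```

Because the factor is common to all post-strata, the added persons are allocated in proportion to the unadjusted male DSE; alternative models would reproduce the same national ratio while allocating the adjustment differently across post-strata.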
A fundamental issue regarding correlation bias adjustment is that there are various alternative models that can be used that are equally consistent with the available data (here A.C.E. Revision II data and modified DA sex ratios). This is due to the limited detail of the DA estimates, which are available only at the national level by age, sex, and Black versus NonBlack race. The different models will all reproduce the national modified DA sex ratios yet produce different subnational estimates. Since detailed data to discriminate between these alternative models does not exist, this issue is unavoidable. There are essentially two options:
- accept the correlation bias reflected in the aggregate results (i.e., don't adjust for it) because we cannot determine that any one model for adjusting for correlation bias produces results that are closer to the truth than any other;
- pick a model and use it to adjust for correlation bias understanding that other models could have been chosen and these would have produced different subnational results.
In planning leading up to the 2000 census it was decided not to adjust the March 2001 A.C.E. estimates for correlation bias for the reason cited in the first option above. Correlation bias adjustment was also considered and rejected for the 1990 PES estimates partly for this reason, though there was also considerable concern at that time about how the correlation bias adjustment would fit into the 1990 PES production schedule, as well as the perceived complexity of the adjustment from what was then a new methodology. (These latter reasons no longer apply: the models are now better understood and some of them, particularly the two-group model, are quite simple.) Another concern in work on the 1990 PES focused on what could happen if the female DSE were biased upward. In this case, correlation bias adjustment of the male DSE to force sex ratios of the resulting aggregates to agree with the modified DA sex ratios could result in overestimates for males as well. (This issue is related to a point made by Wachter and Freedman (1999) about the presence of other biases in DSE potentially affecting estimates of correlation bias.) Given the bias towards overestimation in the March 2001 DSE due to underestimation of erroneous enumerations, in hindsight this issue was quite relevant to the March 2001 estimates.
In determining the methods to be used for the A.C.E. Revision II estimates we favored the reasoning behind the second option above. Our thinking was influenced by the pattern of both estimated undercounts and overcounts from the A.C.E. Revised Preliminary estimates, as noted in Hogan (2002). In the work on the 1990 PES and in planning for the 2000 A.C.E., most concerns were focused on census undercount. In this setting, the DSE without correlation bias adjustment were viewed as conservative in that they would increase the estimates in the direction of truth though perhaps (due to correlation bias) not far enough. For A.C.E. Revision II, however, the situation is different. First, the Revision II DSE are already adjusted for other known significant biases, addressing the concern noted above about possible overestimation by female DSE and its consequences for correlation bias adjustment of male DSE. Second, in A.C.E. Revision II the DSE without correlation bias adjustment could yield estimated overcounts for groups that were truly undercounted. If the magnitude of such estimated overcounts exceeded the magnitude of the true undercounts, then the DSE without correlation bias adjustment would actually move the estimates further from the truth, a result that can hardly be regarded as conservative.
In the face of uncertainty about the most appropriate model for correlation bias adjustment we picked the two-group model since it is the simplest of the available models, and also the model we expected would produce estimates with the lowest variances.6 The alternative models can generally be thought of as producing subnational estimates with unknown biases, though with lower expected biases overall than estimates without the correlation bias adjustment. Since we do not know which of the alternative models yields the least biased subnational estimates, choosing the model expected to produce estimates with the lowest variances makes some sense.
6 Variance comparisons for the alternative correlation bias adjustment models have not been made on the A.C.E. Revision II data. In evaluations of the 1990 PES, variances were calculated for four models. While not including the two-group model, these comparisons suggested that models that allocate correlation bias according to more stable quantities have lower variances. The two-group model allocates correlation bias proportional to the DSE, which should yield more stable estimates than the other models.
Another justification for making a correlation bias adjustment in the face of the unresolvable uncertainty about the most appropriate adjustment comes from some comparisons reported in Bell (1997). Using 1990 PES data (357 post-strata), estimated state shares from alternative correlation bias adjustment models were compared among themselves and with corresponding PES estimates without correlation bias adjustment, as well as with state shares from the 1990 census counts. To summarize the results, comparisons made in various ways showed that while there were differences between the estimates from the alternative models for correlation bias adjustment, the differences of these estimates from the DSE without correlation bias adjustment were generally larger, and differences from results obtained from the census counts were much larger still. Bell (1993) reported similar comparisons for post-stratum estimates by age groups for Black and NonBlack race groups using the original 1990 PES estimates (from 1,392 post-strata). The comparisons for adult Blacks showed more agreement among the alternative models with correlation bias adjustment (four were considered) than between these estimates and the original 1990 PES estimates without correlation bias adjustment. The results were reversed for adult NonBlacks, however.
The confidence intervals and loss function analyses (Mulry and ZuWallack 2002) take the A.C.E. Revision II estimates and adjust for other biases in defining targets that are assumed unbiased. Thus, the targets used implicitly assume the correlation bias adjustment made in the A.C.E. Revision II estimates (from the two-group model) is correct, and so do not reflect any uncertainty about the appropriateness of the correlation bias adjustment. It would be possible to do such analyses using estimates adjusted for correlation bias via a different model, and thus reflect some error in the correlation bias adjustments, but that could not be carried out in the time available.
The A.C.E. Revision II results presented another issue with adjustment for correlation bias in that we could not estimate correlation bias for NonBlack Males age 18-29, and so could make no correlation bias adjustment for this group. This is because DSE without correlation bias adjustment from the A.C.E. Revision II data yielded sex ratios for NonBlacks 18-29 that exceeded the corresponding modified DA sex ratios. Use of these results under any of the available models would imply overestimation of males by the DSE, which does not correspond to any reasonable notion of correlation bias. This situation may be due to errors in the A.C.E. Revision II or DA data for NonBlacks 18-29. In particular, the DA estimates for this group may be most affected by errors in estimates of undocumented immigration. Another possibly contributing factor is that the assumption of no correlation bias for NonBlack females 18-29 may not hold. Whatever the reason, the same issue was present with the March 2001 A.C.E. results.
If data problems affect the results for NonBlacks 18-29, they may also affect the results for other NonBlack age groups. In fact, sex ratios for DSE without correlation bias adjustment from the A.C.E. Revision II data for NonBlacks 30-49 and 50+ are only slightly lower than the corresponding modified DA sex ratios, resulting in only small correlation bias adjustments for these groups. The results for NonBlacks could be due to there not being very much correlation bias for this group, or to problems in the Revision II data or the modified DA sex ratios, or to failure of the underlying assumptions (e.g., there could be correlation bias for females). We are unable to tell which of these is the case. As noted in Shores (2002), very similar patterns were observed in the results from the 1990 PES (357 post-strata), except that the 1990 PES sex ratio for NonBlacks 18-29 was slightly lower than that from DA, reflecting a small amount of possible correlation bias (0.3 percent).
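The sex-ratio comparison described above can be illustrated with a minimal sketch. It assumes, as a simplification, that the adjustment simply scales the male DSE so that the male-to-female ratio agrees with the modified DA sex ratio while the female DSE is left unchanged. The function name and all input values are hypothetical, and the two-group model as actually applied works within age-race groups and involves more detail than is shown here.

```python
def male_adjustment_factor(male_dse, female_dse, da_sex_ratio):
    """Return the factor that scales the male DSE so that the
    male/female ratio equals the DA sex ratio, or None when no
    correlation bias adjustment can be estimated."""
    dse_sex_ratio = male_dse / female_dse
    if dse_sex_ratio >= da_sex_ratio:
        # The DSE sex ratio already meets or exceeds the DA ratio; a
        # factor below 1 would imply the DSE overestimates males, which
        # is not a reasonable notion of correlation bias.
        return None
    return da_sex_ratio / dse_sex_ratio


# Hypothetical inputs (not values from this report).
print(male_adjustment_factor(9.0e6, 10.0e6, 0.95))   # factor above 1
print(male_adjustment_factor(10.1e6, 10.0e6, 0.98))  # None: no adjustment
```

The second call mirrors the situation described for NonBlacks 18-29, where the DSE sex ratio exceeded the DA sex ratio and no adjustment could be made.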
Finally, it is worth commenting on results from a modified version of the two-group model that assumes Hispanic males have the same correlation bias as Black males. This model produced results for Hispanic males that were dramatically different from those for the two-group model and other models that were tried. In our original planning (before results were available) we considered using this model. However, with further review we realized that we should, in principle, be able to find some evidence bearing on this assumption about Hispanic male correlation bias from the data on NonBlacks. We were unable to find evidence supporting this assumption for NonBlacks 18-29 and 30-49 even under the extreme assumption that all correlation bias for NonBlack males was due to correlation bias for Hispanic males. Results for age 50+ were less relevant; since Hispanic males are a relatively small proportion of NonBlack males (about 7.0 percent) the assumption that all NonBlack correlation bias is from Hispanic males is very extreme for this group. While the low estimates of correlation bias for NonBlack males (including Hispanic males) could be due to unknown data errors, as noted above, it nonetheless means that use of the modified two-group model would involve making an assumption that is not supported by our data. For this reason we dropped consideration of the modified two-group model, except for illustrative comparisons.
5.2 Underestimation of Duplicates by the A.C.E. Revision II Further Study of Person Duplication (FSPD)
Summary:
Duplicates that were detected by the FSPD (Mule 2002b) represent a substantial correction to the March 2001 A.C.E. estimates. (See Table 12 of Section 2.3 for the general magnitude of these corrections.) Significant improvements were made to the previous duplicate methodology resulting in higher detection efficiency and less need for an efficiency adjustment to correct for underestimation. One measure of the efficiency of the duplicate detection comes from comparisons against clerical detection of duplicates within the A.C.E. sample blocks. For the A.C.E. Revised Preliminary estimates such an evaluation formed the basis of an efficiency adjustment (Mule 2002a). The two primary reasons an efficiency adjustment was not done for the FSPD were: (1) for some groups, mainly cases in households with two or more persons duplicated, the rate of duplicate detection appeared quite high, and (2) for other groups, mainly single person households or single person duplicates in multi-person households, lower efficiency estimates obtained within the sample blocks may not apply to detection of duplicates outside the sample blocks.
Limitations:
(1) Duplicates of one person households or of single persons in multi-person households are inherently difficult to detect in a nationwide duplicate search using the available information because of coincidental agreements of names and birth dates. This is particularly true for persons with common names. Thus, some underestimation of duplicates is to be expected.
(2) The reasons for duplication within the sample blocks may be different than the reasons for duplication at greater distances (outside the sample blocks). Thus, comparisons of FSPD results against results from clerical detection of duplicates within the sample blocks do not provide a clear basis for an efficiency adjustment to correct for underestimation of duplicates.
Detailed Discussion:
The FSPD for A.C.E. Revision II detected census duplicates using a combination of statistical and exact matching techniques (Mule 2002b). Statistical matching could be used whenever two or more duplicate records were detected between housing units. When there was only one duplicate record between housing units, and in detecting duplicates between housing units and group quarters, we had to rely on exact matching. Since exact matching requires exact agreement on all characteristics used, it was less sensitive in detecting true duplicates than statistical matching. The overall goal of the FSPD was to improve the techniques used and thus improve on the results from the ESCAP II analysis (Fay 2002a, Mule 2001).
In general, duplicate detection involves a tradeoff between the efficiency with which true duplicates are detected (sensitivity) and the rate of detection of false duplicates (specificity). The methods used took steps to control the detection of false duplicates, but there were reasons to expect that not all duplicates could be found. Reasons for this include: (1) the need to rely on exact matching instead of statistical matching in certain situations (as noted above), (2) the need to allow for chance agreement of name and birth date between different persons, which becomes particularly important when matching across wide geographic areas for persons with common names, and (3) duplicates that involve census records with incomplete information (such as missing birth dates).
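Reason (2) can be made concrete with a rough calculation. The sketch below is not the probability model used in the FSPD (see Fay 2002b for that); it assumes, purely for illustration, that birth dates are equally likely and independent across persons who share a name. The function name and the counts are hypothetical.

```python
def expected_chance_pairs(n_same_name, n_birth_dates):
    """Expected number of pairs of different persons who share a name
    and, by coincidence, also share a birth date (uniform assumption)."""
    # Number of distinct pairs among persons with the same name.
    pairs = n_same_name * (n_same_name - 1) // 2
    # Each pair agrees on birth date with probability 1 / n_birth_dates.
    return pairs / n_birth_dates


# Hypothetical: roughly 80 years of possible birth dates.
birth_dates = 365 * 80
local = expected_chance_pairs(20, birth_dates)         # small search area
national = expected_chance_pairs(20_000, birth_dates)  # nationwide search
print(local, national)
```

Because the number of pairs grows with the square of the number of persons sharing a name, coincidental agreement that is negligible within a small area becomes common in a nationwide search for persons with common names.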
We estimated efficiency of the duplicate detection in the FSPD as was done for the ESCAP II analysis, that is, by using the duplicates detected within the sample blocks by the A.C.E. clerks as a benchmark. We estimated two efficiency measures this way. The first was the estimate using only the links to cases in the A.C.E. universe as was done by Mule (2001). Using this approach, the overall efficiency was 64.7 percent for FSPD versus 37.8 percent within the cluster for the ESCAP II analysis. The second estimate used the cases in the A.C.E. universe augmented by duplicates detected in the Housing Unit Duplication Operation (HUDO) as was done by Fay (2002a). Using this approach, we estimated an overall efficiency for FSPD of 86.9 percent versus Fay's estimate of 75.7 percent for the ESCAP II analysis. While there is clear evidence of improvement, there is also evidence that some duplicates were missed, which, for reasons noted above, was expected.
A.C.E. Revised Preliminary undercount estimates were derived from the March 2001 A.C.E. data (Mule 2002a). Efficiency adjustments were made to the A.C.E. Revised Preliminary estimates to inflate the number of duplicates detected based on the efficiency estimates made using the clerically detected duplicates. An efficiency adjustment was not applied to the A.C.E. Revision II estimates for several reasons. One was the improved efficiency of duplicate detection from the FSPD. In fact, when the population was broken into groups by such things as number of duplicate links per household, Mule (2002b) noted efficiency rates exceeding 90 percent for some groups. The lowest efficiency estimates were for the groups for whom we had to rely exclusively on exact matching results: duplicates between single person households and single duplicate links that involved multi-person households.
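As a rough illustration of the efficiency measure and of the kind of efficiency adjustment discussed above, the following sketch computes efficiency as the share of benchmark (clerically detected) duplicates that the automated search also found, and then inflates a detected count by the reciprocal of that share. The function names and counts are hypothetical, and the actual efficiency estimates were made using weighted sample data rather than simple counts.

```python
def detection_efficiency(benchmark_duplicates, also_detected):
    """Share of benchmark duplicates that the automated search found."""
    return also_detected / benchmark_duplicates


def efficiency_adjusted_count(detected_total, efficiency):
    """Inflate a detected duplicate count to allow for missed duplicates."""
    return detected_total / efficiency


# Hypothetical counts chosen to give an efficiency of 64.7 percent.
eff = detection_efficiency(benchmark_duplicates=1000, also_detected=647)
print(eff)
print(efficiency_adjusted_count(detected_total=6470, efficiency=eff))
```

The adjustment is only as good as the assumption that the efficiency measured against the benchmark applies to all detected duplicates.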
Another reason we did not do an efficiency adjustment, even for those groups for which the estimated efficiency was low, is that to apply such adjustments would require us to assume that the efficiency of duplicate detection within the sample clusters is the same as the efficiency of duplicate detection outside the sample clusters. This seems questionable because, for the specified subgroups, the mechanism that is causing the duplication within the cluster is suspected to be different from that causing duplication outside the cluster. For instance, duplicates within the cluster can be caused by misdelivery of forms or related families living close together. As the geographic distance increases, the duplicates are more likely to be movers, persons with two residences, or children in joint-custody situations. Also, there may be other variables like age or the type of response (Both Mail returns versus One Mail/One Non-Mail versus Both Non-Mail) which can show differential efficiency. Including these variables could produce different adjustments than the ones used in this analysis. Because the assumptions required seemed questionable, we decided not to adjust the estimates for efficiency of duplicate detection out of concern that this might in fact overadjust the estimates for certain groups.
The results from the FSPD were also the subject of two evaluation studies. The Census and Administrative Records Duplication Study or CARDS (Bean and Bauder 2002) compared FSPD results against duplicates detected using administrative records data. In the Clerical Review of Census Duplicates or CRCD (Byrne et al. 2002) clerks examined a sample of duplicates detected by FSPD as well as CARDS and tried to assess the accuracy of both. The CARDS study concluded that, "The FSPD process was more effective at finding duplicates that are geographically close, while CARDS identified more duplicates that are geographically distant, and CARDS identified more group quarters duplicates." It also concluded that, "CARDS links that were geographically more distant were more questionable." The CRCD study also raised questions about the duplicates detected by CARDS that were not detected by FSPD, concluding that about half of these were not true duplicates. Of course, clerical identification of duplicates cannot be regarded as error-free either. In particular, it seems unlikely that clerks could appropriately take into account the real phenomenon of chance agreement of names and birth dates since doing so requires sophisticated probability calculations (Fay 2002b).
The CARDS results were used to make estimates of the efficiency of the FSPD at detecting duplicates. See Table 15 in the CARDS analysis by Bean and Bauder (2002). To allow for some uncertainty about the accuracy of the CARDS results, calculations were done both assuming the CARDS results were completely accurate and also assuming that only half of the duplicates CARDS detected that FSPD did not were true duplicates. The analysis concluded that the efficiency estimate from comparison to A.C.E. within cluster clerical duplicate detection results was definitely too low for single duplicate links found within multi-person households. As this was one of the groups that the within cluster comparisons suggested had lowest efficiency, and hence for which an efficiency adjustment would have the greatest effect, this suggests that concerns about such an adjustment and the possibility of overadjustment were well-founded. This result provides some justification for the decision not to do the efficiency adjustment.
5.3 Alternative Approaches are Available for Tabulating Contributions to Correct Enumerations from E-Sample Cases with Duplicate Links
Summary:
The A.C.E. Revision II estimates make substantial corrections to estimates of correct enumerations at the national level to account for underestimation of erroneous enumerations due to duplicates. (See Table 12 of Section 2.3 for the general magnitude of these corrections.) For national totals about the only assumption needed is that each duplicate pair has one correct enumeration and one erroneous enumeration. (This ignores the possibility that both members of a duplicate pair are erroneous enumerations, but that is presumed to happen rarely.) For national totals, it does not matter which member of a duplicate pair is the correct enumeration, but this does affect post-stratum estimates, and hence subnational estimates.
Limitations:
(1) For most duplicate pairs there is no way to know which member of the pair is the correct enumeration and which is the erroneous enumeration. Alternative approaches could be used to pick one correct enumeration from each duplicate pair, or to weight both members of a duplicate pair (e.g., give weight 1/2 to both) while still remaining consistent with the one correct enumeration per duplicate pair principle. For duplicate pairs whose members are in different post-strata (e.g., one renter record and one owner record), the choice of approach affects the post-stratum estimates.
Detailed Discussion:
Bell (2002b) discusses this issue. The basic principle we followed was that used by Fay (2002a): each detected duplicate pair would contribute one correct enumeration in the A.C.E. Revision II estimation. This would be wrong for those duplicate pairs where both enumerations were truly erroneous, but that is probably a minor problem. Of more concern is the fact that we do not know which member of the duplicate pair is truly correct and which is truly erroneous. This does not matter much for aggregate tabulations at the national level but does matter for post-stratum estimates since the two records for a duplicated person could fall in different post-strata (e.g., one in an owner post-stratum and one in a non-owner post-stratum). Various assumptions can be made about what weights to assign to the two members of a duplicate pair as long as the two weights sum to 1. The simplest assumption would be to assign weight 1/2 to both. We felt we could do somewhat better than this, and so made the following assumptions, broken into general cases:
- For duplicate links of E-Sample persons to group quarters (GQ) residents we followed Fay (2002a) and assumed the GQ enumeration to be correct and the E-Sample enumeration to be erroneous. This assumption simplified tabulations since it avoided the need to modify the census GQ counts. For GQs that do not allow residents to claim a usual home elsewhere (UHE) this assumption is consistent with census residence rules. For residents of GQs who can claim a UHE (which provide about 11.0 percent of E-Sample duplicate links to GQ residents outside the search area) this could be wrong, but as Fay pointed out such persons do not have to claim a UHE nor would it necessarily be right for them to do so.
- For duplicate links involving a person 18+ listed as a child of the householder in one source and not a child of the householder in the other source we assumed the "not a child of" enumeration to be the correct enumeration and the other to be an erroneous enumeration. This was designed to handle adults who were actually living independently (and not in a GQ) but who were also listed at their parents' residence, such as college students living off-campus.
- For E-Sample persons with a duplicate link who were coded as erroneous enumerations, we assumed the erroneous enumeration code was right and so assigned the E-Sample record a weight of 0 (implicitly assigning the linked record a weight of 1). This reflected a belief that the primary problem with duplicates was the failure to recognize when an A.C.E. sample person was actually resident elsewhere, resulting in wrongly coding erroneous enumerations as correct enumerations, but that relatively few errors were made when erroneous enumeration codes were actually assigned, particularly to cases with duplicate links.
- For other duplicate links to E-Sample universe cases we assigned a correct enumeration probability so that over all duplicate links the weighted number of correct enumerations would be 1/2 the weighted number of duplicate links. This took into account the contributions to correct enumerations from the previous case just described. It was done separately for three race/Hispanic origin by Tenure domains within each of three linked situations. See Kostanich (2003a) for details.
To summarize, for the first three cases above we believed we could infer which member of a duplicate pair of enumerations was likely to be correct. For the remainder, however, we were uncertain which of the pair of enumerations was correct, and so assumed either was equally likely to be the correct enumeration, assigning correct enumeration probabilities to the E-Sample cases so as to maintain the desired overall number of correct enumerations from the one correct enumeration per duplicate pair rule. There is thus some unavoidable uncertainty regarding how post-stratum tabulations of correct enumerations should account for erroneous enumeration due to duplication in the census.
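The balancing step in the fourth case can be sketched as follows. This is a simplified reading of the rule, assuming a single domain in which the only other contribution comes from linked E-Sample cases already coded as erroneous enumerations (weight 0); the function name and the weighted totals are hypothetical, and Kostanich (2003a) gives the actual procedure.

```python
def residual_ce_probability(weight_coded_erroneous, weight_other_links):
    """Correct enumeration probability for the remaining duplicate links,
    chosen so that weighted correct enumerations equal half of the
    weighted duplicate links in the domain."""
    total_links = weight_coded_erroneous + weight_other_links
    target_correct = 0.5 * total_links
    # Cases coded erroneous contribute 0, so the remaining cases must
    # supply the whole target; a probability cannot exceed 1.
    return min(1.0, target_correct / weight_other_links)


# Hypothetical weighted totals for one domain.
p = residual_ce_probability(weight_coded_erroneous=20.0, weight_other_links=80.0)
print(p)                    # probability assigned to each remaining case
print(p * 80.0 + 0 * 20.0)  # weighted correct enumerations: half of 100
```

Because some linked cases already carry weight 0, the probability assigned to the remaining cases comes out above 1/2 in this example.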
Uncertainty about how correct enumerations from duplicated census records are distributed geographically also leads to synthetic error in estimates for geographic areas - see Section 5.6.
5.4 Alternative Approaches are Available for Assigning Residency Probabilities to P-Sample Cases that Link to Census Cases Outside the Search Area
Summary:
For each P-Sample case the FSPD searched the entire census for matching persons. Sometimes matching census records were found at locations outside the search area for the P-Sample case. Such links to census persons outside the search area occurred both for P-Sample matches and for P-Sample nonmatches. (For matches this implied duplication between the matching census record and the newly linked census record outside the search area.) In either case these links for P-Sample persons raised doubts about whether the P-Sample person actually was a Census Day resident at the given address. (Exception: P-Sample persons classified as inmovers were already known not to be Census Day residents.) The A.C.E. Revision II methodology assigned probabilities of being a Census Day resident to all P-Sample persons that linked to census records outside their search area in an analogous manner to how E-Sample duplicate links were handled. These residence probabilities were used in tabulations of P-Sample estimates of matches and nonmatches to correct the tabulations for these detected links. Overall this reduced the DSE because the percentage reduction in the number of nonmatches was larger than that for matches (both matches and nonmatches decrease from this correction because the linked cases get residence probabilities less than one.) (See Table 12 of Section 2.3 for the general magnitude of these corrections.) This raised the estimated P-Sample match rate, which lowered the DSE overall.
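The direction of this effect can be checked with a small numerical sketch. The counts and percentage reductions below are hypothetical and are chosen only to show that a larger proportional reduction in nonmatches than in matches raises the match rate; since the match rate enters the DSE as a divisor, a higher match rate lowers the DSE.

```python
def match_rate(matches, nonmatches):
    """P-Sample match rate: matches as a share of all P-Sample persons."""
    return matches / (matches + nonmatches)


# Hypothetical weighted tabulations before applying residence probabilities.
matches_before, nonmatches_before = 900.0, 100.0

# Hypothetical reductions: 1 percent for matches, 5 percent for nonmatches.
matches_after = matches_before * 0.99
nonmatches_after = nonmatches_before * 0.95

before = match_rate(matches_before, nonmatches_before)
after = match_rate(matches_after, nonmatches_after)
print(before, after)
```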
Limitations:
(1) As was the case with the E-Sample duplicate links, there is generally no way to know whether the P-Sample record gives the person's true Census Day residence. In principle it could be that the P-Sample record is always correct and the census record outside the search area is always wrong, or that the census record outside the search area is always correct and the P-Sample record is always wrong. More likely, truth lies somewhere in between these extremes, though exactly where is uncertain. Thus, alternative approaches could be used to assign residency probabilities to these linked P-Sample cases, and how this is done affects the post-stratum estimates.
(2) For P-Sample links there is no symmetry argument analogous to that used for the E-Sample duplicates (of one correct enumeration per duplicate pair with either member equally likely to fall in the sample). Thus, there is no clear target contribution from these P-Sample links to aggregate tabulations of P-Sample total persons and matches.
Detailed Discussion:
As part of the FSPD for A.C.E. Revision II we attempted to link P-Sample records to census records outside the search area. To do this we used essentially the same techniques that were used to link E-Sample records to census records (to detect census duplication). When a P-Sample person appears to be the same as the person represented by a census record outside the search area this casts doubt on the validity of the P-Sample record as a Census Day resident (except for P-Sample inmovers, since for most of these we would expect to find a matching census record outside the search area). In this situation either the P-Sample record is correct and the census record is erroneous, or the census record is correct and the P-Sample person was not truly a Census Day resident at the address where they were found in the P-Sample. A third possibility, which we shall ignore, is that both records are wrong.
Dealing with these P-Sample links is more difficult than dealing with the E-Sample duplicate links because we cannot make the same kind of symmetrical argument that was made for the E-Sample (of one correct enumeration per duplicate pair with either member equally likely to fall in the sample). We decided to put the linked P-Sample people in the same groups discussed in Section 5.3 that were used for E-Sample duplicates, and assign a probability of Census Day residency equal to the corresponding group's correct enumeration probability. People coded nonresidents in A.C.E. remained nonresidents. This method is justified in two ways. First, matched P-Sample records (with links outside the search area) should have the same probability of being correctly listed as the corresponding E-Sample record had of being a correct enumeration since they are the same person. Second, for similar groupings we might expect the same correct enumeration/Census Day residency rate for the census and the P-Sample. This will be true if we have defined groups that are homogeneous with respect to Census Day residency/correct enumeration probabilities.
The plausibility of the chosen methodology is only weakly supported by the observed correct enumeration and residency rates of the A.C.E. Revision II Measurement Coding Operation (revision coding) of cases with duplicate links. The revision coding coded 76.9 percent of E-Sample records with duplicate links as correct enumerations, and coded 76.2 percent of P-Sample records with links outside the search area as Census Day residents. While there are limitations to this analysis (since overall the revision coding did not code a sufficient number of cases with duplicate links as erroneous enumerations), the closeness of these two rates suggests similarity of the probability of a P-Sample person with a link being a resident and that of a linked E-Sample person being correctly enumerated.
5.5 Use of Different Post-strata for the E-Sample and P-Sample Could Either Reduce or Increase Synthetic Error
Summary:
In A.C.E. Revision II a substantially new post-stratification was introduced for the E-Sample, while only one small change was made to the P-Sample post-stratification (age group 0-17 split into 0-9 and 10-17). Prior work on post-stratification focused on post-stratifying the P-Sample for estimating census omission rates, with the E-Sample simply using the same post-strata. The new E-Sample post-stratification explains significantly more variation in correct enumeration rates than did the previous post-stratification, which could lead to improvements in synthetic estimates under certain conditions. Under other conditions, however, this could actually lead to worse synthetic estimates, while under still other conditions it may not make much difference. (See the Limitations below for more explanation.) Unfortunately, we don't know which set of conditions is present.
Limitations:
To explain the nature of this issue consider post-stratification of the E-Sample on proxy versus nonproxy response to the census. This variable had the most dramatic effect on explaining variation in correct enumeration rates because proxies have much lower correct enumeration rates than nonproxies. It could not be used in the P-Sample post-stratification because census proxy status is undefined for P-Sample nonmatches. Whether or not post-stratifying the E-Sample on proxy status improves synthetic estimates depends on two things: (1) the proxy versus nonproxy composition of a given area, and (2) how census inclusion probabilities differ between proxies and nonproxies. (The census inclusion probabilities are what we conventionally estimate by P-Sample match rates, though they are also affected by any correlation bias adjustment.) We can say the following:
(1) For areas whose proxy proportion of census respondents is similar to the corresponding national proportion the synthetic estimates are unlikely to be affected much by whether or not we post-stratify the E-Sample on proxy status.
The remaining comments apply to estimates for areas whose proxy proportions do differ significantly from the national proportion.
(2) If census inclusion probabilities for proxy cases are not much different from those for nonproxies, then post-stratifying the E-Sample on proxy status will improve synthetic estimates. This is essentially the assumption implicit in our use of proxy status only for the E-Sample post-stratification.
(3) If census inclusion probabilities for proxy cases differ from those for nonproxies in the same way that the correct enumeration rates differ between these two groups (i.e., if proxies have very low census inclusion probabilities), then post-stratifying the E-Sample on proxy status will lead to worse synthetic estimates than use of common post-strata for the E- and P- Samples.
(4) Somewhere between the extremes implicit in (2) and (3) is a neutral point where the accuracy of estimates with proxy status included in the E-Sample post-stratification would be about the same as that of estimates without proxy status included.
(5) Because proxy status (and the other new E-Sample post-stratifiers) were difficult or impossible to use on the P-Sample, we were unable to test the new E-Sample post-stratifiers in regard to how P-Sample match rates varied across them. As a result, we don't know where truth lies on the continuum referred to at the end of point (4).
(6) Apart from accuracy considerations, the greater variation in correct enumeration rates explained by the new E-Sample post-stratification can potentially lead to more extreme estimates of overcounts than we are accustomed to (there is less potential for extreme undercount estimates because there is less variation in estimated match rates). This could happen for areas with high proportions of proxy respondents, and, as shown in Section 2.2, it did happen for a small number of very small places and small counties.
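Points (1) and (6) can be illustrated with a stylized calculation. The sketch below assumes a single common match rate for proxies and nonproxies (the condition in point (2)), ignores data-defined rates and correlation bias adjustment, and uses hypothetical rates and counts throughout; it is not the production estimator.

```python
def synthetic_estimate(n_records, area_proxy_share, national_proxy_share,
                       ce_proxy, ce_nonproxy, match_rate, use_proxy_strata):
    """Stylized synthetic population estimate for one area."""
    if use_proxy_strata:
        # Apply proxy and nonproxy rates to the area's own composition.
        share = area_proxy_share
    else:
        # Common post-strata: the pooled rate reflects the national mix.
        share = national_proxy_share
    ce_rate = share * ce_proxy + (1.0 - share) * ce_nonproxy
    return n_records * ce_rate / match_rate


# Hypothetical rates: proxies have much lower correct enumeration rates.
rates = dict(national_proxy_share=0.05, ce_proxy=0.60, ce_nonproxy=0.97,
             match_rate=0.92)

for area_share in (0.05, 0.30):
    with_proxy = synthetic_estimate(1000, area_share, use_proxy_strata=True, **rates)
    without = synthetic_estimate(1000, area_share, use_proxy_strata=False, **rates)
    print(area_share, with_proxy, without)
```

When the area's proxy share equals the national share the two estimates coincide; when the area has a much higher proxy share, post-stratifying on proxy status yields a lower estimate and hence a larger estimated overcount. Whether that lower estimate is closer to truth depends on how inclusion probabilities actually differ between proxies and nonproxies, which this sketch simply assumes away.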
Detailed Discussion:
We decided to use different post-strata for the E-Sample (used to estimate correct enumeration rates) than for the P-Sample (used to estimate match rates reflecting census inclusion probabilities apart from correlation bias adjustments). Previous research using 1990 PES and 2000 A.C.E. production data, as well as more limited analyses of data from test censuses, suggested different factors were predictive of census erroneous enumeration and census omission. Previous work on post-stratification focused on using factors relevant to census omissions, hence, there was much more opportunity for enhancing the E-Sample post-stratification than the P-Sample post-stratification. Logistic regression modeling was done using 2000 A.C.E. production E- and P- Sample data to determine the most important factors for post-stratifying the E- and P- Samples. In the end the E-Sample post-stratification was substantially revised from the March 2001 A.C.E., but the only change made to the P-Sample post-stratification was to split the 0-17 age group into 0-9 and 10-17. The P-Sample change was suggested by DA results that showed differential census coverage for the younger versus higher ages within the 0-17 group.
The goal in post-stratifying is to account for variation in the corresponding rates: the correct enumeration rates from the E-Sample and the match rates from the P-Sample. DSE are then constructed synthetically for the cross-classification of the E- and P- Sample post-strata. As shown in Section 2.3, for each cell in this cross-classification the coverage correction factor (CCF) is constructed by taking the data-defined rate for that cell times the correct enumeration rate for the corresponding E-Sample post-stratum divided by the match rate for the corresponding P-Sample post-stratum. For adult males the CCFs will also incorporate the correlation bias adjustment factor. This process results in distinct CCFs for this relatively large number of cross-classified cells, but they are determined by a much smaller number of correct enumeration rates and match rates (and five correlation bias adjustment factors). The numbers of correct enumeration and match rates used are roughly comparable to the numbers of correct enumeration and match rates used for the 1990 PES with 357 post-strata, as well as to the numbers used for the March 2001 A.C.E. estimates. Thus, the fact that this cross-classification generates a large number of cells and corresponding CCFs is not in itself an issue.
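The CCF construction described above can be sketched in a few lines of Python. This is only an illustration, not the production estimator: the function name, argument names, and sample rates are hypothetical, and treating the correlation bias adjustment as a multiplicative factor is an assumption (Section 2.3 gives the actual formula).

```python
def coverage_correction_factor(dd_rate, ce_rate, match_rate, corr_bias_factor=1.0):
    """CCF for one cell of the E-by-P post-stratum cross-classification.

    dd_rate:          data-defined rate for the cell
    ce_rate:          correct enumeration rate of the cell's E-Sample post-stratum
    match_rate:       match rate of the cell's P-Sample post-stratum
    corr_bias_factor: correlation bias adjustment (assumed multiplicative here;
                      1.0 means no adjustment)
    """
    if not 0.0 < match_rate <= 1.0:
        raise ValueError("match_rate must be in (0, 1]")
    return dd_rate * ce_rate / match_rate * corr_bias_factor


# Hypothetical rates, for illustration only.
ccf = coverage_correction_factor(dd_rate=0.98, ce_rate=0.95, match_rate=0.92)
```

Because each cell draws its correct enumeration rate and match rate from a much smaller set of post-stratum rates, many cells share the same inputs even though their CCFs differ.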
An apparent issue did arise, though, in regard to the occurrence of some rather extreme estimates of correct enumeration rates for some E-Sample post-strata, and corresponding extreme estimates of net census coverage when calculated for these E-Sample post-strata and corresponding post-stratum groups. In particular, the proxy post-strata had extremely low census correct enumeration rates, which are defined conceptually as the proportion of census records that are correct enumerations. (The census correct enumeration rates are actually the product of the usual E-Sample correct enumeration rates and the data-defined rates; see Section 2.3.) In some cases the census correct enumeration rates were as low as or lower than 60.0 percent. (See Table 6 of Section 2.1.) No P-Sample post-strata had estimated census inclusion probabilities this low: among P-Sample post-stratum groups only two had census inclusion probabilities (a little) below 80.0 percent, and for many of the groups these probabilities exceeded 90.0 percent. (See Table 4 of Section 2.1.) Therefore, estimated census coverage rates computed for the proxy post-strata showed large estimated overcounts. It is possible, however, that the true census inclusion probabilities for proxies are more extreme than any of the post-stratum estimated census inclusion probabilities, and if the inclusion probabilities for proxies were truly as low as 60.0 percent then the true net undercount for proxies would be close to zero. But we do not know this to be the case; it could instead be that inclusion probabilities for proxies are not so extremely different from those for non-proxies. Unfortunately, it is difficult (not necessarily impossible) to even define census proxy status for P-Sample nonmatches, so we were unable to post-stratify the P-Sample by proxy status to estimate match rates and hence census inclusion probabilities separately for proxies. Left with this uncertainty about the true census inclusion probabilities for proxies, we do not know if their extreme estimated overcount rates are reasonable or not.
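The dependence of the proxy overcount estimate on the unknown inclusion probability can be illustrated with simple arithmetic. The sketch below assumes the estimate for a group is the census count times the census correct enumeration rate divided by the inclusion probability, and that net undercount is expressed as a percent of the estimate (negative values are overcounts); the function name is hypothetical.

```python
def net_undercount_percent(census_ce_rate, inclusion_prob):
    """Net undercount as a percent of the dual system estimate (DSE).

    Assumes DSE = census count * census_ce_rate / inclusion_prob and the
    convention net undercount = (DSE - census) / DSE; negative = overcount.
    """
    ccf = census_ce_rate / inclusion_prob
    return 100.0 * (1.0 - 1.0 / ccf)


# Hold the census correct enumeration rate at 60.0 percent and vary the
# (unknown) inclusion probability for proxies.
for p in (0.60, 0.80, 0.90):
    print(f"inclusion probability {p:.2f}: {net_undercount_percent(0.60, p):+.1f}%")
```

Under these assumptions, an inclusion probability equal to the 60.0 percent correct enumeration rate gives a net undercount of zero, while an inclusion probability of 90.0 percent gives an estimated overcount equal to half of the DSE.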
Analogous situations could arise, though much less extreme, for a few other E-Sample post-stratum groups, such as those for late Nonresponse Followup (NRFU) returns.
We examined results from a sensitivity analysis of a simple example designed to mimic the situation for the proxy post-strata. This led to the conclusions (1)-(5) listed under Limitations above. Note that point (1) says that if different geographic areas of interest do not differ that much in their proxy versus nonproxy proportions, then it should not matter much whether or not we include proxy status in the E-Sample post-stratification. This remark, and the other points noted, also apply to other factors in the E-Sample post-stratification for which concerns about possibly extreme overcount estimates might arise (such as late NRFU returns), though the correct enumeration rates for these other groups are not as extreme as those for proxies.
From the results of Section 2.2, synthetic A.C.E. Revision II estimates showed some rather extreme estimates for some small places and a few small counties. Thirty-one place estimates (out of 16,998) had estimated overcounts that exceeded 10.0 percent, with a few of these exceeding 15.0 percent (including one at 22.0 percent). The most extreme undercount estimates for places approached 6.0 percent. For counties the estimated overcounts were not so extreme, the largest approaching 10.0 percent, and only 8 (out of 3,135) exceeding 7.0 percent. One county had an estimated undercount of nearly 8.0 percent, though the next largest estimated county undercount rate was less than 4.0 percent.
The extreme synthetic estimates for some small places and a few small counties, although few in number, are troubling for two reasons. The first is the possibility that census inclusion probabilities for proxies are similarly low to their correct enumeration rates, in which case the large estimated overcounts would reflect significant bias in the estimates. The second reason is that even if census inclusion probabilities for proxies are not so extreme, so that the large estimated overcounts do not reflect large biases and are reasonable on average, some extreme place estimates may reflect large synthetic errors. Note that the data for any small place exert very little direct influence on the post-stratum estimates, and hence have little direct influence on the synthetic estimate for that small place. Thus, an extreme overcount (or undercount) estimate for any particular small place is not directly supported by data from that place. While we do not actually know that the extreme estimates for particular places are inaccurate, adjustments for overcounts on the order of 15.0 or 20.0 percent may be seen as implausible.
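A small sketch may help show why a place's own data have so little influence on its synthetic estimate. The cell labels, CCF values, and place counts below are all hypothetical; the only input specific to a place is its census count in each cell, while the CCFs come from post-stratum-level estimates.

```python
def synthetic_estimate(census_counts_by_cell, ccf_by_cell):
    """Synthetic estimate for an area: census counts scaled, cell by cell,
    by coverage correction factors estimated at the post-stratum level."""
    return sum(count * ccf_by_cell[cell]
               for cell, count in census_counts_by_cell.items())


# Hypothetical CCFs shared by every place (two cells only, for illustration).
ccf_by_cell = {"nonproxy": 1.01, "proxy": 0.70}

# Two hypothetical places that differ only in their share of proxy records.
place_a = {"nonproxy": 900, "proxy": 100}
place_b = {"nonproxy": 500, "proxy": 500}

for name, counts in (("A", place_a), ("B", place_b)):
    census = sum(counts.values())
    estimate = synthetic_estimate(counts, ccf_by_cell)
    # Net undercount as a percent of the estimate; negative = overcount.
    print(name, round(100.0 * (estimate - census) / estimate, 1))
```

In this illustration the place with the higher proxy share receives a much larger estimated overcount, even though nothing measured in that place itself entered the CCFs.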
5.6 Other Issues in Synthetic Estimation
Summary:
The corrections made for correlation bias and for duplicates as discussed in Sections 5.1, 5.3, and 5.4 all involve estimates at a very aggregate (national) level with little or no information available about how the effects being estimated truly affect correct enumeration rates and census inclusion probabilities for individual post-strata. Hence, the corrections lead to synthetic error in the post-stratum estimates. This in turn leads to synthetic error in estimates for geographic areas or population subgroups.
A particular aspect of the corrections for duplicates in regard to estimates of census correct enumerations is worth noting. This correction is made based on duplicate links obtained by matching E-Sample cases to the full census, and affects weighted estimates of correct enumerations for the E-Sample post-strata. Thus, geographic variation in rates of census duplication is not reflected in the estimates (since the E-Sample is not post-stratified by geography). Consideration was given to matching the full census against itself to detect duplications in the entire census, not just among the E-Sample cases. This would have permitted estimates for any specific place to account for actual detected duplicates for that place. However, matching of the full census against itself for duplicate detection proved to be infeasible in the limited time available.
- 6. Other Assessments of A.C.E. Revision II Results
6.1 Measurement Coding
Issue:
Since the evaluations of the A.C.E. found errors in the assignment of enumeration and residence status codes for the E- and P- Samples, respectively, the A.C.E. Revision II methodology included recoding a subsample of the A.C.E. sample and used the results in a double sampling ratio adjustment. In addition, there were a large number of Conflicting cases where the A.C.E. Followup interview and EFU interview collected information that was contradictory regarding whether the person was a resident in the sample block on Census Day although the interviews appeared to be of comparable quality. The A.C.E. Revision II recoding operation strove to correct coding errors and to reduce the number of Conflicting cases by accurately coding as many of these as possible. The recoding operation assigned some of the E- and P-Sample codes by a computer algorithm, with the rest assigned clerically by the Census Bureau's elite matching team (Adams and Krejsa 2002b). Both the automated coding and the procedure for reducing the Conflicting cases were new methods and had the potential for introducing error into the estimates.
Major Findings:
The automated recoding was evaluated by applying it to cases also coded by the elite matching team, and this evaluation found the potential for this error to be very small. While the assignment of the Conflicting codes may have been appropriate for evaluation purposes in the PFU/EFU review, these codes were assigned too readily for the purpose of coding cases for their use in DSE. The recoding was able to successfully code many of the Conflicting cases in regard to their enumeration status or Census Day residency status. After the second coding of cases initially coded as Conflicting, very few remained coded as Conflicting, and these are therefore believed to have a very small impact on the estimator.
Detailed Discussion:
Although the strategy of combining automated and clerical coding permitted recoding of a larger sample in the time available, most likely reducing the variance of the A.C.E. Revision II, there was concern that the automated assignment of enumeration and residence status for some of the cases increased the possibility of error in the A.C.E. Revision II dual system estimates (DSE).
An evaluation based on a subsample of cases coded both ways showed that the potential error from the automated coding was very small (Adams and Krejsa 2002c).
The A.C.E. Revision II coding operation coded some cases in the E-Sample and P-Sample Conflicting because information collected in the A.C.E. Person Followup and the Evaluation Followup appeared to be of comparable quality but disagreed as to whether the person was a resident of the A.C.E. sample block on Census Day. In the E-Sample, these are enumerations whose enumeration status could not be coded Correct, Erroneous, or Unresolved. In the P-Sample, these are cases whose residency status could not be coded Resident, Nonresident, or Unresolved.
Since the Conflicting code is relatively new, we will include some explanation about its use. The 2001 A.C.E. Person Followup/Evaluation Followup (PFU/EFU) Review (Adams and Krejsa 2001) was the first operation to use the Conflicting code, and it coded only a subsample of the E-sample for the Evaluation Followup. The PFU/EFU Review was mounted quickly to discover whether the coding of erroneous enumerations in the A.C.E. and the evaluations was correct, and had no intention of providing results for use in developing population estimates. Since the study arose from the concern that the A.C.E. evaluations were coding too many erroneous enumerations, the instructions were to be cautious in assigning the code of erroneous enumeration, and there was a liberal rule for using the Conflicting code. With respect to coding cases for their use in developing population estimates, however, the Conflicting code was used too readily in the PFU/EFU Review. Since the A.C.E. Revision II coding operation had the goal of providing data for producing population estimates, and there was evidence that the A.C.E. and the evaluations had underestimated, not overestimated, erroneous enumerations, the operation required stricter procedures for assigning the Conflicting code to the E-sample and comparable procedures for the P-sample.
In the construction of the A.C.E. Revision II estimator, we found that we did not have any data for developing a model for estimating the probability of being a correct enumeration for the Conflicting cases in the E-sample nor for estimating the probability of being a resident for the Conflicting cases in the P-sample. We addressed this problem by having the elite matching team review the Conflicting cases under somewhat relaxed rules with the goal of assigning Correct, Erroneous, or Unresolved to those in the E-sample and Resident, Nonresident, or Unresolved to those in the P-sample, if possible. The elite matching team could use their judgment more than was permitted in the initial assignment of the Conflicting codes. The team worked in pairs and assigned a code only when both members of the pair agreed on where the person resided on census day or that the code should be Unresolved. If they could not agree, then they let the code remain Conflicting. Relaxing the rules may cause the coding not to be reproducible, and the nonsampling variance may be increased slightly. However, we believe relaxing the rules reduces the bias since the elite matching team has years of experience and the skill to assign high quality codes. The elite matching team with their vast experience probably provided better data than any statistical model that could be developed. Because of the concern that the coding be reproducible, both the initial code of Conflicting as well as the final code are retained in the A.C.E. Revision II files although the second code was used in the A.C.E. Revision II estimation.
Of the initial 741,616 (weighted) E-sample cases coded as Conflicting, all but 46,738 received a code of Correct, Erroneous, or Unresolved in the second review. Of the 268,223 (weighted) P-sample cases coded as Conflicting, all but 59,225 received a code of Resident, Nonresident, or Unresolved in the second review (Adams and Krejsa 2002b).
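For reference, the weighted counts above can be converted to the share of initially Conflicting cases that remained Conflicting after the second review. The snippet below uses only the four figures quoted in the text; the variable names are illustrative.

```python
# Weighted counts quoted in the text.
initial = {"E-sample": 741_616, "P-sample": 268_223}
remaining = {"E-sample": 46_738, "P-sample": 59_225}

# Share of initially Conflicting (weighted) cases still Conflicting
# after the second review, by sample.
share_remaining = {name: remaining[name] / initial[name] for name in initial}

for name, share in share_remaining.items():
    print(f"{name}: {100.0 * share:.1f}% remained Conflicting")
```

This works out to roughly 6 percent of the initially Conflicting E-sample cases and roughly 22 percent of the initially Conflicting P-sample cases; these shares are relative to the initially Conflicting cases, not to the full samples.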
The A.C.E. Revision II evaluation program did not include a further evaluation of the coding of match status, but used the coding of match status for the P-Sample from the Measurement Error Review (Raglin and Krejsa 2001). The additional matches in the surrounding block discovered by the Matching Error Study also were incorporated for the A.C.E. Revision II match status (Bean 2001). It was felt that drawing from the results of these evaluations provided the best coding available, and that any further recoding probably would not provide better codes. Therefore, the assumption is that the A.C.E. Revision II match status codes are error-free.
6.2 Other Errors in Census Omissions
Issue:
The match rate may be biased by a respondent's inconsistent reporting of the variables used in the post-stratification in the census and P-Sample. Another potential source of bias in the match rate arises from using inmovers to estimate the number of outmovers.
Major Findings:
The evaluations of these two error sources showed they had very little effect on the A.C.E. Revision II estimates.
Detailed Discussion:
We also examined errors in the estimates of census omissions due to inconsistent reporting of variables used in post-stratification and to using inmovers to estimate the outmovers in the A.C.E. Revision II PES-C version of dual system estimation. A respondent's inconsistent reporting of characteristics in the census and the P-Sample may be problematic because the respondent is categorized on a different basis for the match rate and the correct enumeration rate.
Using inmovers in PES-C estimation is believed to produce a better estimate of the number of movers since the interviews for outmovers are all from proxies and therefore, tend to underestimate the number of movers. However, for particular areas, there may be legitimate reasons why the number of outmovers is less than the number of inmovers.
The evaluations of these two error sources showed they had very little effect on the A.C.E. Revision II estimates (Bench 2002, Keathley 2002).
6.3 Missing Data Models
Issue:
The recoding operation created new types of missing data that required additional missing data models.
Major Findings:
The A.C.E. Revision II missing data models are thought to be of higher quality than those used for the March 2001 A.C.E. estimates because the imputation cells rely on more information and more detailed questionnaire responses. We found that the variance due to the choice of missing data model was lower for the A.C.E. Revision II than for the March 2001 A.C.E.
Detailed Discussion:
The missing data models used for the March 2001 A.C.E. were not directly applicable to the Revision Sample used in the A.C.E. Revision II estimation because the recoding operation created new missing data. There were three new types of missing data that required models for the Revision Sample: (1) P-Sample households that were originally considered interviews but the recoding determined that there were no valid Census Day residents, (2) cases with unresolved match, enumeration, or residency status because of incomplete or ambiguous EFU and PFU data, and (3) cases with conflicting enumeration or residency status because contradictory information was collected in the A.C.E. PFU and the EFU interviews and it could not be determined which was valid.
A separate model was developed for each new type of missing data. A household noninterview weighting adjustment using new cell definitions was designed specifically to address housing units newly recoded as noninterviews (Ikeda 2002). For the cases that were unresolved because of incomplete or ambiguous EFU and PFU data, the creation of new imputation cells used EFU and PFU check boxes and enumeration, residency, and match status assigned during the recoding along with the why codes (Beaghen and Sands 2002). Since the March 2001 A.C.E. imputation for such cases used only data collected before followup, the new A.C.E. Revision II cells are viewed as an improvement since they provide greater discrimination in assigning probabilities of correct enumeration and residency. For the cases coded as Conflicting, there were no applicable donor pools to use in developing an imputation model. Therefore, probabilities of 0.5 were imputed for correct enumeration status and Census Day residency status. Fortunately, the recoding operation resulted in a relatively small number of these cases.
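The role of the imputed probabilities can be sketched as follows. This is a simplified illustration under stated assumptions, not the A.C.E. Revision II imputation code: the status labels, the function name, and the sample cases are hypothetical, and the only value taken from the text is the probability of 0.5 assigned to Conflicting cases.

```python
CONFLICTING_PROB = 0.5  # probability imputed for Conflicting cases (from the text)


def weighted_correct_enumerations(cases):
    """Weighted estimate of correct enumerations.

    cases: iterable of (weight, status, imputed_prob) tuples. imputed_prob is
    used only for 'unresolved' cases and stands in for the probability taken
    from the case's imputation cell (hypothetical representation).
    """
    total = 0.0
    for weight, status, imputed_prob in cases:
        if status == "correct":
            prob = 1.0
        elif status == "erroneous":
            prob = 0.0
        elif status == "conflicting":
            prob = CONFLICTING_PROB
        elif status == "unresolved":
            prob = imputed_prob
        else:
            raise ValueError(f"unknown status: {status!r}")
        total += weight * prob
    return total


# Hypothetical weighted cases, for illustration only.
example_cases = [
    (100.0, "correct", None),
    (50.0, "erroneous", None),
    (20.0, "conflicting", None),
    (10.0, "unresolved", 0.8),
]
estimate = weighted_correct_enumerations(example_cases)
```

With a fixed probability of 0.5, the contribution of the Conflicting cases depends only on their total weight, which is why a small number of such cases limits their effect on the estimate.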
All the imputation involves choosing a model. The choice of the imputation model may introduce error into the estimates. The evaluation of the March 2001 A.C.E. estimates included an analysis of reasonable alternative imputation models (Keathley et al. 2001). Time constraints did not permit the creation of alternative imputation models for the A.C.E. Revision II. However, an evaluation built on the alternative models created for the March 2001 A.C.E. and estimated the variance due to the choice of the missing data model. The evaluation showed that the variance due to missing data was lower for A.C.E. Revision II than was observed for the A.C.E. (Kearney 2002). The loss function analysis incorporated a variance component for error due to the choice of the missing data model (Mulry and ZuWallack 2002).
6.4 Coding of Mover Status Using Evaluation Data
Issue:
The design of the Evaluation Followup (EFU) questionnaire appeared to create the tendency for respondents not to report moving. The reason for suspecting this problem is that the EFU results for the P-Sample showed A.C.E. underestimated the resident nonmovers by 0.7 million and overestimated the resident outmovers and inmovers by 1.1 million and 0.5 million, respectively.
Traditionally movers are harder to measure than nonmovers and more careful interviewing finds more movers, but the EFU found fewer.
Major Findings:
The A.C.E. Revision II did not accept some EFU changes from mover to nonmover. The A.C.E. Revision II recoding operation permitted changing mover status using information collected by the A.C.E. personal interview and followup and the EFU. After the initial recoding, if the status of nonmover disagreed with the A.C.E. coding of outmover or inmover, an algorithm permitted accepting the A.C.E. mover status for the A.C.E. Revision II under some circumstances.
Detailed Discussion:
Both the A.C.E. and the A.C.E. Revision II estimators require that each P-Sample case have one of the following mover statuses: nonmover, outmover, or inmover. A mover, whether outmover or inmover, is a person who has one residence and leaves one housing unit for another. In contrast, a person who alternates living between two residences is a nonmover who cycles between two housing units. A person who cycles is a nonmover but may or may not be a resident of the housing unit in question on Census Day.
The mover status for the A.C.E. was assigned during the personal interview (PI) and no changes were allowed during the matching operations before or after followup. During the coding of data from the Evaluation Followup, changes in mover status were allowed and may have been based on data collected during the PI, production followup (PFU), or the Evaluation Followup (EFU).
However, the structure of the EFU questionnaire appears to cause respondents to tend not to report moving. The revised coding for A.C.E. Revision II permits changing any of the mover status codes, but does not focus on coding mover status.
For most of the P-Sample, the PI and the EFU mover status agree. The cases where they disagree are of concern, particularly when the change is from mover to nonmover. The tendency of the EFU questionnaire to cause movers to report themselves as nonmovers has to be balanced against the fact that errors in mover status in the PI could not be corrected during the original A.C.E. even when recognized.
The A.C.E. Revision II developed a procedure to override the initial recoding of nonmover for cases that were A.C.E. movers for some circumstances depending on the evidence supporting the classification of mover. If an A.C.E. inmover proved to be a duplicate of a nonmover in the recoding, this superseded all the rules and A.C.E. Revision II classified the case as a nonmover duplicate. For the cases where the initial recoding was nonmover but the A.C.E. status was outmover or inmover and there was evidence of another residence, then the A.C.E. Revision II took the nonmover status because the person appeared to cycle between residences. If there was no evidence of another residence and the respondent was a proxy for the A.C.E. or the EFU, then the A.C.E. Revision II accepted the mover status corresponding to the interview with a respondent from the household because it was believed to have more accurate information. For the cases where the initial recoding was nonmover but the A.C.E. status was inmover and the person had a census enumeration outside the search area, then the A.C.E. Revision II took the inmover status because the person appeared to have moved. In all the other circumstances when the recoding was nonmover and the A.C.E. status was outmover or inmover, the A.C.E. Revision II took the status of nonmover (Adams and Krejsa 2002b).
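The override rules in the preceding paragraph can be written as a short decision function. This is a reading of the prose, not the production algorithm: the function and argument names are hypothetical, the order in which the proxy rule and the outside-search-area rule are checked is an assumption, and the sketch assumes that the household-respondent EFU interview corresponds to the recoded nonmover status. It covers only cases where the initial recoding was nonmover and the A.C.E. status was outmover or inmover.

```python
def resolve_mover_status(ace_status, duplicate_of_nonmover, other_residence_evidence,
                         ace_proxy, efu_proxy, enumerated_outside_search_area):
    """Final mover status for a case recoded 'nonmover' whose A.C.E. status
    was 'outmover' or 'inmover' (rule order partly assumed; see lead-in)."""
    if ace_status not in ("outmover", "inmover"):
        raise ValueError("rules apply only to A.C.E. outmovers and inmovers")

    # An inmover found to duplicate a nonmover supersedes all other rules.
    if ace_status == "inmover" and duplicate_of_nonmover:
        return "nonmover duplicate"

    # Evidence of another residence: the person appears to cycle.
    if other_residence_evidence:
        return "nonmover"

    # Exactly one interview was with a proxy: follow the household respondent.
    if ace_proxy != efu_proxy:
        return "nonmover" if ace_proxy else ace_status

    # Inmover with a census enumeration outside the search area.
    if ace_status == "inmover" and enumerated_outside_search_area:
        return "inmover"

    # All other circumstances.
    return "nonmover"
```

For example, an A.C.E. outmover with no evidence of another residence, a household respondent in the A.C.E. interview, and a proxy respondent in the EFU would keep the outmover status under this reading.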
6.5 To What Extent are the Errors Noted Above Reflected in the Confidence Intervals and Loss Function Analyses
Issue:
The results of the Confidence Intervals and Loss Function Analysis (Mulry and ZuWallack 2002) are conditional on the assumption that the nonsampling bias and variance components used in the analyses accurately portrayed all the nonsampling errors in the A.C.E. Revision II estimates.
Since most of the data available on the quality of the March 2001 A.C.E. are being incorporated in the A.C.E. Revision II, the estimation of the net bias could use only data that were not included.
Major Findings:
Potential errors not reflected in the loss function analysis are excluded because measuring and estimating them is not feasible.
Detailed Discussion:
The estimation of bias in the A.C.E. Revision II underlying the construction of the bias-corrected confidence intervals and the loss function analysis excludes consideration of the following errors:
- synthetic estimation error
- response error and coding error in A.C.E. Revision II P-Sample residency and match status and E-Sample correct enumeration status (e.g., conflicting cases)
- response error and coding error in A.C.E. Revision II P-Sample mover status
- error in Demographic Analysis sex ratios for correlation bias estimation
- error due to the model used to estimate correlation bias from Demographic Analysis sex ratios
- error due to the model for estimating the effect of E-Sample cases with duplicates.
References
Adams, Tamara, and Krejsa, Elizabeth A. (2001) ESCAP II: Results of the Person Followup and the Evaluation Followup Forms Review. Executive Steering Committee For A.C.E. Policy II, Report No. 24., dated October 12, 2001. Census Bureau, Washington, DC.
Adams, Tamara and Krejsa, Elizabeth (2002a) Results of A.C.E. Revision II Measurement Coding, DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP-55. Census Bureau, Washington, DC.
Adams, Tamara and Krejsa, Elizabeth (2002b) A.C.E. Revision II Measurement Subgroup Documentation. DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP-6. Census Bureau, Washington, DC.
Adams, Tamara and Krejsa, Elizabeth (2002c) Evaluation of At-Risk Codes. DSSD Revised A.C.E. Estimates Memorandum Series #PP-45. Census Bureau, Washington, DC.
Beaghen, Michael and Sands, Robert (2002) Results from the Imputation for Unresolved Enumeration, Residency, and Match Status. DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP- 57. Census Bureau, Washington, DC.
Bean, Susanne L. (2001) ESCAP II: Accuracy and Coverage Evaluation Matching Error. Executive Steering Committee For A.C.E. Policy II, Report No. 7., October 12, 2001. Census Bureau, Washington, DC.
Bean, Susanne L. and Bauder, D. Mark (2002) Census and Administrative Records Duplication Study, DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP-44. Census Bureau, Washington, DC.
Bell, William R. (1993), Using Information from Demographic Analysis in Post Enumeration Survey Estimation, Journal of the American Statistical Association, 88, 1106-1118.
Bell, William R. (1997), Combining Demographic Analysis (DA) and ICM Results-Further Results, internal Census Bureau note, revised version, December 12, 1997.
Bell, William R. (2001a), Accuracy and Coverage Evaluation Survey: Correlation Bias, DSSD Census 2000 Procedures and Operations Memorandum Series B-12*, February 28, 2001.
Bell, William R. (2001b), ESCAP II: Estimation of Correlation Bias in 2000 A.C.E. Estimates Using Revised Demographic Analysis Results, Executive Steering Committee for A.C.E. Policy II, Report 10, October 16, 2001.
Bell, William R. (2002a), A.C.E. Revision II: Calculating aggregate data-defined, correct enumeration, and census inclusion rates (for groups that involve aggregation across post-strata), DSSD A.C.E. Revision II Memorandum Series #PP-40, Decennial Statistical Studies Division, U.S. Bureau of the Census; December 31, 2002.
Bell, William R. (2002b), On Alternative Options for Tabulating Estimates of Census Correct Enumerations Allowing for Duplicate Links, DSSD A.C.E. REVISION II MEMORANDUM SERIES PP-3, U.S. Bureau of the Census, May 21, 2002.
Bench, Katie (2002) P-Sample Match Rate Corrected for Error Due to Inconsistent Post-stratification Variables. DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP-46. Census Bureau, Washington, DC.
Bryant, B. E. et al., Assessment of Accuracy of Adjusted Versus Unadjusted 1990 Census Base for Use in Intercensal Estimates: Recommendation, Report of the Committee on Adjustment of Postcensal Estimates, U.S. Census Bureau, Washington, D.C., 1992.
Byrne, Rosemary, Beaghen, Michael, and Mulry, Mary H. (2002) Clerical Review of Census Duplicates. DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP- 43. Census Bureau, Washington, DC.
ESCAP (2001), Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy, March 1, 2001.
ESCAP II (2001), Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy on Adjustment for Non-Redistricting Uses, October 17, 2001.
Fay, Robert E. (2001), ESCAP II: Evidence of Additional Erroneous Enumerations from the Person Duplication Study, Executive Steering Committee for A.C.E. Policy II, Report 9, Preliminary Version, October 26, 2001.
Fay, Robert E. (2002a), ESCAP II: Evidence of Additional Erroneous Enumerations from the Person Duplication Study, Executive Steering Committee for A.C.E. Policy II, Report 9, Revised Version, March 27, 2002.
Fay, Robert E. (2002b), Probabilistic Models for Detecting Census Person Duplication, American Statistical Association, Proceedings of the Survey Research Methods Section.
Griffin, Richard (2002) Assessment of Synthetic Assumption. DSSD A.C.E. Revision II Estimates Memorandum Series #PP-49. Census Bureau, Washington, DC.
Hogan, Howard (1993), The 1990 Post Enumeration Survey: Operations and Results, Journal of the American Statistical Association, 88, 1047-1060.
Hogan, Howard (2002), Five Challenges in Preparing Improved Post-Censal Population Estimates, DSSD A.C.E. REVISION II MEMORANDUM SERIES PP-1, January 25, 2002. Census Bureau, Washington, DC.
Hogan, H., Kostanich, D., Whitford, D., and Singh, R. (2002), Research Findings of the Accuracy and Coverage Evaluation and Census 2000 Accuracy, American Statistical Association Joint Statistical Meetings, 2002 Proceedings of the Section on Survey Research Methods.
Ikeda, Michael (2002) Results of the Noninterview Adjustment, DSSD A.C.E. Revision II Estimates Memorandum Series #PP- 56. Census Bureau, Washington, DC.
Kearney, Anne (2002) Evaluation of Missing Data Model, DSSD A.C.E. Revision II Estimates Memorandum Series #PP- 48. Census Bureau, Washington, DC.
Keathley, Don (2002) Error Due to Estimating Outmovers Using Inmovers in PES-C, DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP- 47. Census Bureau, Washington, DC.
Keathley, Don, Kearney, Anne, and Bell, William (2001) ESCAP II: Analysis of Missing Data Alternatives for Accuracy and Coverage Evaluation, Executive Steering Committee For A.C.E. Policy II, Report No. 12. dated October 11, 2001. Census Bureau, Washington, DC.
Kostanich, D. (2003a), A.C.E. Revision II: Design and Methodology, DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP-30, U.S. Bureau of the Census, Washington, DC.
Kostanich, D. (2003b), A.C.E. Revision II: Summary of Methodology, DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP-35, U.S. Bureau of the Census, Washington, DC.
Miskura, Susan M. (2000), Results of Reinstatement Rules for the Housing Unit Duplication Operations, Memorandum for Preston J. Waite, Decennial Management Division, U.S. Census Bureau, November 21, 2000.
Mule, Thomas (2001), ESCAP II: Person Duplication in Census 2000, Executive Steering Committee for A.C.E. Policy II, Report 20, October 11, 2001.
Mule, Thomas (2002a), Revised Preliminary Estimates of Net Undercounts for Seven Race/Ethnicity Groupings, DSSD A.C.E. REVISION II MEMORANDUM SERIES PP-2; U.S. Bureau of the Census; April 4, 2002.
Mule, Thomas (2002b), Accuracy and Coverage Evaluation Revision II: Further Study of Person Duplication, DSSD A.C.E. REVISION II MEMORANDUM SERIES PP-51, U.S. Bureau of the Census, December 31, 2002.
Mule, Thomas (2003), A.C.E. Revision II Results: Change in Estimated Net Undercount, DSSD A.C.E. REVISION II MEMORANDUM SERIES PP-58, U.S. Bureau of the Census, March 4, 2003.
Mulry, M. and Petroni, R. (2002), Error Profile for PES-C as Implemented in the 2000 A.C.E., American Statistical Association Joint Statistical Meetings, 2002 Proceedings of the Section on Survey Research Methods.
Mulry, Mary H. and Spencer, Bruce D. (1993) Accuracy of the 1990 Census and Undercount Adjustments. Journal of the American Statistical Association, 88, 1080-1091.
Mulry, Mary H. and Spencer, Bruce D. (2001) Overview of Total Error Modeling and Loss Function Analysis, DSSD Census 2000 Procedures and Operations Memorandum Series B-19*. Census Bureau, Washington, DC.
Mulry, Mary and ZuWallack, Randal (2002) Confidence Intervals and Loss Function Analysis, DSSD A.C.E. REVISION II MEMORANDUM SERIES #PP- 42. Census Bureau, Washington, DC.
Nash, Fay (2000), Overview of the Duplicate Housing Unit Operations, Census 2000 Informational Memorandum Number 78, Decennial Management Division, U.S. Census Bureau, November 7, 2000.
Navarro, Alfredo and Asiala, Mark (2001) Accuracy and Coverage Evaluation: Comparing Accuracy, DSSD Census 2000 Procedures and Operations Memorandum Series B-19*.
Census Bureau, Washington, DC.
Raglin, David A. and Krejsa, Elizabeth A. (2001) ESCAP II: Evaluation Results for Changes in Mover and Residence Status in the A.C.E. Executive Steering Committee For A.C.E.
Policy II, Report No. 16, October 15, 2001, Census Bureau, Washington, DC.
Robinson, J. Gregory (2001a), Accuracy and Coverage Evaluation Survey: Demographic Analysis Results, DSSD Census 2000 Procedures and Operations Memorandum Series B-4*, March 2, 2001.
Robinson, J. Gregory (2001b), ESCAP II: Demographic Analysis Results, Executive Steering Committee for A.C.E. Policy II, U.S. Bureau of the Census, Report 1, October 13, 2001.
Robinson, J. Gregory and Adlakha, Arjun (2002) Comparison of A.C.E. Revision II Results with Demographic Analysis, DSSD A.C.E. REVISION II MEMORANDUM SERIES PP-41. Census Bureau, Washington, DC.
Shores, Roger (2002), Accuracy and Coverage Evaluation Revision II: Adjustment for Correlation Bias, DSSD A.C.E. REVISION II MEMORANDUM SERIES PP-53, U.S.
Bureau of the Census, December 31, 2002.
Thompson, J. H. (1992), CAPE Processing Results, U.S. Census Bureau Memorandum, Washington, D.C.
Thompson, J., Waite, P., and Fay, R. (2001), Basis of Revised Early Approximations of Undercounts Released Oct. 17, 2001, Executive Steering Committee for A.C.E. Policy II, Report 9a, October 26, 2001.
Wachter, Kenneth W. and Freedman, David A. (1999), The Fifth Cell: Correlation Bias in U.S.
Census Adjustment, Technical Report Number 570, Department of Statistics, University of California, Berkeley.
Attachment A
A.C.E. Revision II Reports: Assessment & Results

Memo Series # | Title & Description | Author | Date |
---|---|---|---|
PP-41 | Comparison of A.C.E. Revision II Results with Demographic Analysis - compares estimates of differential coverage by demographics | Gregg Robinson & Arjun Adlakha | 12/31 |
PP-42 | Confidence intervals and loss function analyses - (1) forms confidence intervals for adjustment factors using estimates of net bias and variance, and (2) uses weighted squared error loss for levels and shares for counties and places across the nation and within state | Mary Mulry & Randy ZuWallack | 12/31 |
PP-43 | Clerical Review of Census Duplicates (CRCD) - examines accuracy of computer duplicates by having NPC analysts review computer duplicates to determine whether they appear to be the same persons | Rose Byrne, Michael Beaghen, & Mary Mulry | 12/31 |
PP-44 | A.C.E. Revision II Report: Census and Administrative Records Duplication Study - examines accuracy of computer duplicates by using administrative records to identify computer duplicates who are not the same people using Personal Identification Keys (PIKs) and to confirm computer duplicates by using all the addresses found for their PIKs. PIKs are assigned using Social Security Numbers. | Susanne Bean & Mark Bauder | 12/31 |
PP-45 | At-Risk Codes Evaluation - assesses the amount of error at risk due to not having all the cases in the EFU sample reviewed clerically | Tammy Adams & Eli Krejsa | 12/31 |
PP-46 | Evaluation Report for P-Sample Match Rate Corrected for Error due to Inconsistent Post-stratification Variables - addresses bias due to inconsistent reporting of variables used in post-stratification in E-Sample and P-Sample | Katie Bench | 12/31 |
PP-47 | Report on the Error Due to Estimating Outmovers Using Inmovers in the PES-C - addresses use of inmovers to estimate number of movers by comparing DSE using inmovers and outmovers raked to total inmovers | Don Keathley | 12/31 |
PP-48 | A.C.E. Revision II Missing Data Evaluation Final Report - estimates uncertainty due to choice of imputation model by building on alternatives to the A.C.E. imputation model | Anne Kearney | 12/31 |
PP-49 | A.C.E. Revision II - Analysis of Synthetic Assumption - examines synthetic estimation error using artificial populations for states, counties, and places & examines effects on loss functions | Rick Griffin | 12/31 |
PP-50 | Comparison of A.C.E. Revision II Population Coverage Results with HMCS Housing Unit Coverage Results - assesses consistency in differential coverage for demographic and geographic groups | Gregg Robinson & Glenn Wolfgang | 12/31 |
PP-51 | A.C.E. Revision II Results: Further Study of Person Duplication - Estimates the number of duplicates in the Census by demographics. Estimates the number of potential P-Sample errors by examining those that link to Census enumerations outside the search area | Tom Mule | 12/31 |
PP-52 | A.C.E. Revision II Results: Estimated Correct Enumeration and Residence Probability for Duplicate Links - Summarizes the modeling results for broad domains and the situation of the linked cases. | Debbie Fenstermaker & Pete Davis | 12/31 |
PP-53 | Accuracy and Coverage Evaluation Revision II: Adjustment for Correlation Bias - Summarizes the effect of this adjustment for major demographic groups. | Roger Shores | 12/31 |
PP-54 | A.C.E. Revision II: Summary of Estimated Net Coverage - Summarizes the revised estimates of coverage for major demographic groups and makes comparisons to 1990 coverage estimates. Also summarizes components of coverage in terms of correct enumeration rates and match rates. | Debbie Fenstermaker & Dawn Haines | 12/31 |
PP-55 | Results of the A.C.E. Revision II Measurement Coding - Summarizes the results of measurement error corrections made to the revision samples | Tammy Adams & Eli Krejsa | 12/31 |
PP-56 | Results from the Noninterview Adjustment - Summarizes results of household noninterview adjustment in revision P-Sample | Michael Ikeda | 12/30 |
PP-57 | Results from the Imputation of Unresolved Enumeration, Residency, and Match Status - Summarizes the probabilities imputed by cell. | Michael Beaghen | 12/31 |
PP-58 | A.C.E. Revision II Results: Change in Estimated Net Undercount - shows the cumulative effect of incorporating methodological changes one step at a time. | Tom Mule | 3/4 |