October 2002 (Vol. 3, No. 5)
Racial and Ethnic Data: Are They Reliable for Program and Policy Development?
By: Donna Friedsam, MPH

Racial and ethnic minorities are at substantially higher risk for a broad range of adverse health outcomes. Research has demonstrated disparities in access, health care treatment, quality, and outcomes among racial and ethnic populations, regardless of income and insurance status, and even "within the same system of care, and within the same health plan." (1)

New programs and policies are emerging to address these disparities. So too are calls to strengthen data collection. Indeed, "collection of these data by health care providers, coupled with standards for collection, use, and privacy protection, would be a first step toward eliminating disparities." (1)

How, then, are these disparities identified and measured? Policymakers, program developers, and granting agencies rely on public data sets to assess need and assign relative priority to various populations and interventions. Yet the quality of racial and ethnic data consistently proves questionable. There appears to be a lack of cohesive federal directives, and methods for collecting and reporting data vary widely. Some even question the very validity of race and ethnicity as constructs by which to measure and categorize populations.

Validity: How relevant are these data, and should we even collect them?

The apparent correlation between race, genetic data, and disease has prompted at least two schools of thought among biomedical researchers. One holds that race is so poorly defined that, as opined in last year's New England Journal of Medicine, it is "biologically meaningless." The journal Nature Genetics warned of the "confusion and potential harmful effects of using 'race' as a variable in medical research." (4,5)

On the other hand, many population geneticists hold that it is essential to take race and ethnicity into account to understand each group's specific pattern of disease, particularly when race is self-reported by continent of ancestry. (4)

Meanwhile, these debates remain largely irrelevant to the practice community, the people who deliver services and run programs. In this arena, the use of these data is widespread and often mandatory. Yet last year, the Commonwealth Fund reported wide gaps between the federal goal of eliminating racial and ethnic health disparities and the way federal agencies actually collect the data needed to measure those disparities. (3)

Widespread confusion remains in the health care sector about the legality of collecting information on the race and ethnicity of patients and clients. Some observers argue that, even considering these legal and practical concerns, health plans can and should collect data on disparities in quality of care for racial and ethnic minority groups. Nerenz and colleagues suggest that stratified analyses of racial and ethnic data might be included within the quality-of-care information reported for HEDIS and NCQA purposes. (2)

Reliability: Are the data comparable across agencies and programs?

Even if we accept that racial and ethnic data should be collected, pressing concerns have arisen about the very validity and reliability of the data we are collecting.

The federal Office of Management and Budget (OMB) sets the standards by which data are to be collected and presented, but does not mandate the collection of these data. These standards are not applicable to states and private industry. (3)

The OMB Revised Standards (1997) define the five race categories: American Indian/Alaska Native, Asian, Black/African American, Native Hawaiian/Other Pacific Islander, and White. The two ethnic categories are Hispanic or Latino and Not Hispanic or Latino. The 2000 census conformed to these standards. Federal agencies have until January 1, 2003 to integrate the new standards into current data collection efforts.
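As an illustration only, the two-question format can be represented as a record that stores race as a set (so a respondent may select more than one category, as the 2000 census allowed) and Hispanic or Latino ethnicity as a separate field. The Python sketch below is hypothetical: the class names and the validation rule are assumptions made for this example, not part of the OMB standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Race(Enum):
    """The five minimum race categories of the 1997 OMB Revised Standards."""
    AMERICAN_INDIAN_OR_ALASKA_NATIVE = "American Indian or Alaska Native"
    ASIAN = "Asian"
    BLACK_OR_AFRICAN_AMERICAN = "Black or African American"
    NATIVE_HAWAIIAN_OR_OTHER_PACIFIC_ISLANDER = "Native Hawaiian or Other Pacific Islander"
    WHITE = "White"


class Ethnicity(Enum):
    """Ethnicity is asked as a separate question, not listed as a race."""
    HISPANIC_OR_LATINO = "Hispanic or Latino"
    NOT_HISPANIC_OR_LATINO = "Not Hispanic or Latino"


@dataclass(frozen=True)
class SelfReport:
    """Hypothetical record of one person's self-reported race and ethnicity."""
    races: FrozenSet[Race]   # one or more races may be selected
    ethnicity: Ethnicity

    def __post_init__(self) -> None:
        # Assumption for this sketch: an empty race selection is rejected;
        # a production system would also need a "not reported" code.
        if not self.races:
            raise ValueError("at least one race must be selected")
        if not all(isinstance(r, Race) for r in self.races):
            raise TypeError("races must be Race members")

    @property
    def is_multiracial(self) -> bool:
        return len(self.races) > 1


# Example: a Hispanic respondent who selects two races.
person = SelfReport(
    frozenset({Race.WHITE, Race.AMERICAN_INDIAN_OR_ALASKA_NATIVE}),
    Ethnicity.HISPANIC_OR_LATINO,
)
print(person.is_multiracial)  # True
```

Keeping ethnicity out of the race list is the point of the design: "Hispanic" cannot be recorded as a race, so the two dimensions can be cross-tabulated.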

Such data collection and reporting in health programs are prescribed in at least seven federal statutes and five sets of federal regulations. At the same time, some believe that the Health Insurance Portability and Accountability Act may have erected new hurdles to the consistent and uniform collection of racial and ethnic data. The Commonwealth Fund study (3) concludes: "Data systems at the federal and state levels are not sufficiently exchange of data… Concerns remain about the privacy and security in creating and maintaining databases that include race, ethnicity, and primary language."

Beyond data collection, reporting statistically reliable population health data on small minority populations is a challenge. Most published state-level reports group several races together as "other" or simply suppress the data due to small numbers.
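A minimal sketch of the small-number problem, assuming event counts behave like Poisson counts so that a rate's relative standard error (RSE) is roughly 1/sqrt(count). The suppression cutoff of 5 events and the reliability cutoff of 20 events are illustrative assumptions; actual thresholds differ by agency and state, and the function name is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RateEstimate:
    group: str
    count: int
    population: int
    rate_per_100k: Optional[float]  # None when suppressed
    rse_percent: Optional[float]    # relative standard error, in percent
    note: str


def estimate_rate(group: str, count: int, population: int,
                  suppress_below: int = 5,
                  unreliable_below: int = 20) -> RateEstimate:
    """Crude rate per 100,000 with small-number handling.

    Assumptions (not any specific state's policy):
    - counts are treated as Poisson, so RSE = 1 / sqrt(count);
    - nonzero counts below `suppress_below` are suppressed outright;
    - rates based on fewer than `unreliable_below` events are flagged.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if 0 < count < suppress_below:
        return RateEstimate(group, count, population, None, None, "suppressed")
    rate = count / population * 100_000
    if count == 0:
        return RateEstimate(group, count, population, rate, None, "zero events")
    rse = 100 / math.sqrt(count)
    note = "unreliable" if count < unreliable_below else "ok"
    return RateEstimate(group, count, population, rate, rse, note)


# Invented counts for three groups of different sizes.
for g, c, p in [("Group A", 3, 40_000), ("Group B", 12, 60_000),
                ("Group C", 400, 1_000_000)]:
    e = estimate_rate(g, c, p)
    print(e.group, e.note, e.rate_per_100k, e.rse_percent)
```

With these settings, the group with 3 events is suppressed, the group with 12 events gets a rate of 20 per 100,000 flagged as unreliable (RSE about 29%), and the group with 400 events has an RSE of 5%. Collapsing small groups into "other" is the alternative: it raises the count, but at the cost of losing the group-specific estimate.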

How should the data be collected?

The consistency and comparability of data across programs depends, of course, on how the data are collected. The inclusion of race and ethnicity fields in reporting forms does not guarantee that these data are actually collected. In particular, reporting for Hispanic/Latino people remains inconsistent. And some agencies have not yet changed this category from "race" to "ethnicity." (3)
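One practical wrinkle in moving from a single "race" field that listed Hispanic alongside the races to the two-question format is that old records cannot be fully converted. The sketch below uses made-up legacy codes (hypothetical, not any agency's actual coding scheme) and deliberately leaves unknown whatever the old form never asked.

```python
from typing import Optional, Tuple

# Hypothetical legacy single-question codes in which "Hispanic" was
# listed alongside the race categories.
LEGACY_TO_RACE = {
    "W": "White",
    "B": "Black or African American",
    "A": "Asian",
    "I": "American Indian or Alaska Native",
    "P": "Native Hawaiian or Other Pacific Islander",
}
LEGACY_HISPANIC = "H"


def split_legacy_code(code: str) -> Tuple[Optional[str], Optional[str]]:
    """Map one legacy code to (race, ethnicity) in the two-question format.

    None means "not collected". A legacy Hispanic record tells us the
    ethnicity but nothing about race; a legacy race record tells us the
    race but nothing about Hispanic origin, because the old form never
    asked. Neither gap is filled by guessing.
    """
    code = code.strip().upper()
    if code == LEGACY_HISPANIC:
        return None, "Hispanic or Latino"
    if code in LEGACY_TO_RACE:
        return LEGACY_TO_RACE[code], None
    return None, None  # blank or unrecognized code


for old in ["W", "h", "I", "?"]:
    print(old, split_legacy_code(old))
```

The design choice worth noting is that a legacy "White" record is not converted to "Not Hispanic or Latino"; assuming so would silently undercount Hispanic people in exactly the way the article describes.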

Suggestions abound for how to reduce misclassification at the time of data collection. Self-reported data are considered preferable. (1,2,3,4,7) Many have called for providing training for clinic, hospital, and institutional staff on how to ask about race/ethnicity.

The Institute of Medicine (7), in its recently released report, Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care, likewise calls for better collection and reporting of these data.

Researchers are also urged to collect statistically adequate samples for minority groups by increasing overall sample size, over-sampling minority groups, or conducting population-specific surveys.
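Oversampling only yields valid population-wide figures if the estimates are reweighted. A minimal sketch of stratified design weights (each respondent weighted by stratum population divided by stratum sample size), using invented numbers purely for illustration; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stratum:
    name: str
    population: int   # N_h, stratum size in the population
    sample_size: int  # n_h, respondents drawn from the stratum
    cases: int        # respondents with the outcome of interest

    @property
    def weight(self) -> float:
        # Each respondent stands in for N_h / n_h people.
        return self.population / self.sample_size

    @property
    def prevalence(self) -> float:
        return self.cases / self.sample_size


def weighted_prevalence(strata: List[Stratum]) -> float:
    """Population prevalence from a stratified sample with unequal
    sampling fractions: weighted cases divided by total weight."""
    total_weight = sum(s.weight * s.sample_size for s in strata)
    weighted_cases = sum(s.weight * s.cases for s in strata)
    return weighted_cases / total_weight


def unweighted_prevalence(strata: List[Stratum]) -> float:
    """Naive estimate that ignores the oversampling."""
    return sum(s.cases for s in strata) / sum(s.sample_size for s in strata)


# Invented example: the minority group is 5% of the population but is
# oversampled to half of a 1,000-person sample.
strata = [
    Stratum("minority", population=50_000, sample_size=500, cases=100),
    Stratum("majority", population=950_000, sample_size=500, cases=50),
]
print(strata[0].prevalence)                   # 0.2
print(round(weighted_prevalence(strata), 4))  # 0.105
print(unweighted_prevalence(strata))          # 0.15
```

Under proportional allocation the minority group would have contributed only about 50 respondents; oversampling gives it 500, enough for a usable group-specific estimate, while the weights keep the overall figure at 0.105 rather than the inflated unweighted 0.15.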

The Commonwealth Fund study concludes with a call: "Support research on existing best practices for racial, ethnic, and primary language data collection…Also, emphasize documentation of the relationship between best practices and improved health outcomes."

Model Initiative

The State of Wisconsin has undertaken several initiatives to improve the quality of health and vital statistics data on the state's racial and ethnic populations. These have been spearheaded in large part by efforts to update the Minority Health Report, last published in 1993 by the Department of Health and Family Services.

Among several efforts, attention has focused on American Indians, who are under-represented and undercounted throughout demographic and vital statistics databases. Previous research has shown that American Indians are often misclassified as being other races in health-related data sources. (6)

This misclassification occurs when data collectors assign race by personal observation or by interpreting surnames. Further, definitions are imprecise and inconsistent, and self-identification depends on differing and changing federal and tribal criteria for recognition of American Indian status.

The biomedical literature includes many articles documenting the effects of such misclassification, which typically produces undercounts of cases and rates of AIDS and STDs, mortality, injury, end-stage renal disease (ESRD), and cancer. Studies in the Bemidji region (MI, WI, MN) and elsewhere that linked tribal health records with public records produced significant increases in estimated cancer rates (up to 100% for some cancers).

In Wisconsin, a collaborative, community-based research effort is addressing these concerns. The state Cancer Reporting System (CRS), the Great Lakes Inter-Tribal Council, the UW Comprehensive Cancer Center, and the eleven American Indian tribal clinics are now working together to develop community-based cancer registries. They are also matching records in the state's CRS against those of the federal Indian Health Service.
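Database matching of this kind can be sketched as a deterministic record linkage: a registry case is recoded as American Indian when it links to a tribal or Indian Health Service patient record. Everything here (the record layout, the last-name/birth-date/sex key, and the toy records) is a hypothetical illustration; the article does not describe the matching method the Wisconsin partners actually use.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Set, Tuple

AIAN = "American Indian or Alaska Native"
Key = Tuple[str, str, str]


@dataclass(frozen=True)
class CaseRecord:
    last_name: str
    birth_date: str  # ISO date, e.g. "1948-03-17"
    sex: str
    race: str        # race as recorded in the registry


def match_key(last_name: str, birth_date: str, sex: str) -> Key:
    # Hypothetical exact-match key; real linkages typically use more
    # fields and probabilistic scoring to tolerate spelling variation.
    return (last_name.strip().upper(), birth_date, sex.strip().upper())


def reclassify(cases: Iterable[CaseRecord], tribal_keys: Set[Key]) -> List[CaseRecord]:
    """Recode a registry case as AI/AN when it links to a tribal record."""
    out = []
    for c in cases:
        if c.race != AIAN and match_key(c.last_name, c.birth_date, c.sex) in tribal_keys:
            c = replace(c, race=AIAN)
        out.append(c)
    return out


def count_aian(cases: Iterable[CaseRecord]) -> int:
    return sum(1 for c in cases if c.race == AIAN)


# Toy records, invented for illustration.
registry = [
    CaseRecord("Smith", "1950-01-02", "F", "White"),
    CaseRecord("Doe", "1962-07-15", "M", AIAN),
    CaseRecord("Lee", "1971-11-30", "F", "White"),
    CaseRecord("Roe", "1945-05-05", "M", "Unknown"),
]
tribal = {match_key("smith", "1950-01-02", "f"), match_key("Roe", "1945-05-05", "M")}

before = count_aian(registry)
after = count_aian(reclassify(registry, tribal))
print(before, after)                             # 1 3
print(f"{(after - before) / before:.0%} increase")  # 200% increase
```

Even a small number of linked records can change a small group's count dramatically, which is how linkage studies arrive at large percentage increases in estimated rates.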

This multi-sector statewide approach involves coordination among government agencies, public and private health organizations, academic institutions, and local communities. It may provide a model for continued efforts to improve the validity of these data and their relevance to users in community, academic, government, and private sectors.

Post-Script: The Denominator

The 2000 U.S. Census may further complicate rate calculations by race. For the first time, respondents could identify themselves as belonging to two or more races. Nationally, 2.4% (6.8 million persons) reported more than one race, as did 1.2% (66,895 persons) in Wisconsin. (8) Whatever its benefits, this change fragments the population denominators. The numerators (i.e., hospital data, vital statistics, morbidity and mortality reports) do not currently include multiple-race categories.

Researchers will inevitably need to devise conventions for handling these data. A consensus on such conventions may then allow for meaningful analysis of trends by race and ethnicity, while at the same time reflecting, and respecting, the true diversity of the U.S. population.
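To make the denominator problem concrete, the sketch below computes one group's population under three conventions: single race alone, race alone or in combination, and equal fractional assignment (a person reporting k races contributes 1/k to each). These resemble approaches that have been discussed for bridging multiple-race data, but the function, the "AIAN" label, and the numbers are illustrative assumptions, not a recommended standard.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple


def denominators(people: Iterable[Tuple[FrozenSet[str], int]]) -> Dict[str, Counter]:
    """Population denominators under three conventions for multiple-race
    responses. `people` yields (set of races reported, number of persons).

    - alone: only persons reporting exactly one race;
    - alone_or_combo: a person counts toward every race reported, so the
      totals across races exceed the population;
    - equal_fractions: a person reporting k races adds 1/k to each.
    """
    alone, combo, fractional = Counter(), Counter(), Counter()
    for races, n in people:
        if not races:
            continue  # race not reported
        if len(races) == 1:
            (only,) = races
            alone[only] += n
        for r in races:
            combo[r] += n
            fractional[r] += n / len(races)
    return {"alone": alone, "alone_or_combo": combo, "equal_fractions": fractional}


# Invented population counts and an invented numerator of 60 deaths
# coded AIAN on single-race death records.
population = [
    (frozenset({"AIAN"}), 9_000),
    (frozenset({"AIAN", "White"}), 3_000),
    (frozenset({"White"}), 88_000),
]
deaths = 60
d = denominators(population)
for convention, counts in d.items():
    rate = deaths / counts["AIAN"] * 100_000
    print(f"{convention}: {rate:.0f} per 100,000")
```

The same 60 deaths yield a rate of about 667, 500, or 571 per 100,000 depending on the convention, which is why an agreed-upon rule matters for comparing trends.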


  1. Bierman AS, Lurie N, Collins KS, Eisenberg JM. "Addressing Racial and Ethnic Barriers to Effective Health Care: The Need for Better Data." Health Affairs 21(3):91-102. May/June 2002.
  2. Nerenz DR, Bonham VL, et al. "Eliminating Racial/Ethnic Disparities in Health Care: Can Health Plans Generate Reports?" Health Affairs 21(3):259-26. Also, "Developing a Health Plan Report Card on Quality of Care for Minority Populations." Commonwealth Fund, 2002.
  3. Perot RT, Youdelman M. Racial, Ethnic, and Primary Language Data Collection in the Health Care System: An Assessment of Federal Policies and Practices. New York: The Commonwealth Fund. September 2001.
  4. Wade N. "Race Is Seen as Real Guide to Track Roots of Disease." New York Times. July 30, 2002.
  5. Villarosa L. "Beyond Black and White in Biology and Medicine." New York Times. September 2001.
  6. Partin MR, Rith-Najarian SJ, Slater JS, et al. "Improving Cancer Incidence Estimates for American Indians in Minnesota." Am J Public Health 1999; 89(11):1673-1677.
  7. Institute of Medicine. Unequal Treatment: Confronting Racial and Ethnic Disparities in Health Care. Washington, DC: National Academy Press. 2002.
  8. U.S. Census Bureau. The Two or More Races Population: 2000. Census 2000 Brief. November 2001.