July 2000 (Vol. 1, No. 3)
Wisconsin's Collaborative Population Health Data Library
by Robert Stone-Newsom, Ph.D. Senior Scientist Wisconsin Network for Health Policy Research & Kristin Bray, Graduate Student, Department of Preventive Medicine University of Wisconsin Medical School

The information, knowledge and wisdom to make good health care policy decisions are built on a foundation of good data. In Wisconsin, as elsewhere, this raw product is often scattered, difficult to locate and stored in incompatible formats and structures. In the case of small areas, like counties, the data may not even exist. Consequently, even highly experienced researchers and analysts find the mechanics of simply acquiring the data they need burdensome. These issues, as well as the widely differing motivations and methodologies behind primary data collections, often threaten to de-couple the natural relationship and reliance of good decision making on good data description and analysis.

The purpose of the recently launched Collaborative Population Health Data Base (CHDB) is to pull the scattered health care data sources into one location and format (a library) while making the data freely accessible to all interested parties. The Wisconsin Network for Health Policy Research has collaborated with the major users and manufacturers of state health data to make this health data library a reality. We believe that increasing the accessibility of data whose accuracy, completeness, relevance and timeliness are known will help generate new information and knowledge and, we hope, bring increased wisdom and effectiveness to our health care policy decisions. Functionally, the CHDB does secondary data collection and formatting. To extend a familiar analogy, CHDB is a wholesaler of data. Our customers, the retailers, are researchers and analysts who retail the data, in the form of numerical evidence and data-based conclusions, to their policymaking consumers.

The original vision for this collaborative data effort arose from meetings between representatives from the Applied Population Laboratory, the Bureau of Health Information, the Department of Health and Family Services and the Network. In the first phase of the project, we have developed an Internet site where interested parties can query and download health-related data for each Wisconsin county and the state as a whole, on:

These data were originally collected by a variety of agencies, but come to the CHDB principally through two sources: the US Census Bureau and the Wisconsin Bureau of Health Information (BHI), a unit within the Division of Health Care Financing in the Department of Health and Family Services. The data contains both actual population data and, particularly in the case of census data, updates based on large samples.

Why should we care about such data, and how can this data help us to make better health policy decisions? One answer to these questions revolves around the established need for long-term, consistent national measures of population health, health services and their utilization. Thanks to inexpensive computing power, the kind of data contained in the CHDB has formed the core of much of the policy-related health services research and knowledge base. Since national and regional health policy decision-making depends upon this data, having it available and accessible locally provides all stakeholders with a level playing field.

At its strongest, the CHDB data represents a best-effort inventory of the medical resources available to a population, the socioeconomic conditions under which that population lives, and a demographic representation of those who utilize medical resources along with the diagnoses and procedures they experienced. A weakness of all such data is that, with the exception of mortality data, there are no performance or outcomes data available. As noted in recent news media, almost no information is collected or available on whether, or how well, the medical resources we have are producing health improvements in the populations they serve. Indeed, if one compares dollars spent with results achieved, the United States appears to be seriously behind many other nations. A long-term goal of the CHDB is to include a source of outcomes data.

The questions in the sidebar (at right) illustrate the important strengths as well as the potential pitfalls of the CHDB. On one hand, the data is free, it is all in one place, it has not been analyzed or manipulated by others for some other purpose, it requires the user to learn only a single methodology for querying, and, once acquired, it is in a form that allows customers to use any software tool with which they may be familiar. For people who use this kind of data, these are highly empowering concepts. On the other hand, it is raw data, which means that there are no limits to the statistical analysis and inferences one may attempt. Such unfettered freedom can readily lead to unfettered errors.

Regardless of the inexperience or exuberance of the analyst, CHDB data is vulnerable to some sources of error simply because it is secondary data. These are: (1) Timing - the delay between primary collection, translation and addition to our storehouse. (2) Mechanical error - errors (usually of omission) resulting from mistranslation or transposition between data files. (3) Errors in meaning - the possibility that we may lose track of how or why the original data was collected, and so risk ascribing more or less significance to data than was originally intended by the primary data manufacturers.

Finally, the user / analyst of this data should remain aware that, as indicated, some of the data is truly population data in that it represents actual counts of the variable at hand. Other data represents estimates of variables, albeit with very large sample sizes, and is thus open to the problems inherent in the sampling of populations.

The preceding paragraph is at the core of why actual non-sampled population data is so expensive and time consuming to collect, and why it is so important. Sampled data is relatively inexpensive and usually much more timely, but the assurance of knowing whether one's results are actual, or a function of the methodology used, is forever obscured in error rates and statistical assumptions. Population data pays a trade-off price of being expensive to collect and not remaining timely, in return for ease of analysis and assurance in one's conclusions. Customers can be assured that the differences and directions they may observe in the CHDB population data were actual at the time the data was collected. Whether conditions have changed since the data was collected is, unfortunately, a problem that not even sampling can always solve.
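
As a rough illustration of the trade-off between counts and samples, the following Python sketch compares a full count with repeated samples drawn from the same population. The county size, number of cases and sample size are invented for illustration and are not CHDB figures, and the function name is hypothetical.

```python
import random
import statistics


def compare_count_with_samples(pop_size=50_000, true_cases=4_000,
                               sample_size=1_000, n_samples=200, seed=1):
    """Return the full-count rate and a list of sample-based estimates.

    All default values are made-up illustration numbers, not CHDB data.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # One record per resident: 1 if the condition is present, 0 otherwise.
    population = [1] * true_cases + [0] * (pop_size - true_cases)

    # A full count has no sampling error: this is the actual rate.
    full_count_rate = sum(population) / pop_size

    # Each sample gives a slightly different estimate of the same rate.
    estimates = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        estimates.append(sum(sample) / sample_size)

    return full_count_rate, estimates


rate, estimates = compare_count_with_samples()
print("Rate from the full count:", rate)
print("Lowest sample estimate:  ", min(estimates))
print("Highest sample estimate: ", max(estimates))
print("Spread (std. deviation): ", statistics.pstdev(estimates))
```

The full count returns a single figure, while the sample estimates scatter around it. An analyst holding only one sample cannot tell from that sample alone where in the scatter it falls, which is why sampled results must be reported with error rates.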

The Network is presently engaged in a market research study of the potential users of the CHDB. If warranted by our findings, our intent is first to construct training materials for those customers who have the fewest analytical resources at their command. Our vision for the near future also includes an expansion of the library to encompass private expenditure data such as that stored by major insurers, health providers and employer organizations. Somewhat further out, we expect to integrate data reflecting a summary measure of health outcomes and thus begin to tie together the determinants, costs and the short and long-term outcomes of health-affecting decisions.

Since June of 2000 the CHDB has undergone a complete re-write of the functional software application that queries and downloads data. We have also added support, history and documentation pages to the web site and are engaged in a full analysis of what will be required to update, maintain and add new data to the product. If you haven't visited the site, we welcome you - if you have, please come back and check out our vastly improved performance and functionality.

Following are some examples of the hundreds of questions the CHDB can provide answers to: