Data Improvement through Data Linkages and Data Validation

Record linkage is an important tool in creating data required for examining the health of the public and of the healthcare system itself. It can be used to improve data collection, quality assessment, and the dissemination of information. Data sources can be examined to eliminate duplicate records, to identify under-reporting and missing cases, and to generate disease registries and health surveillance systems.

Record linkage refers to matching or merging data from a variety of data sources for the same individual. Data linkage can be accomplished manually (by visually comparing records from two separate sources), but this approach becomes time-consuming, tedious, inefficient, and impractical as the number of records in the data files increases.
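As a rough illustration of what automated linkage replaces, the following minimal sketch performs an exact (deterministic) link between two small sources. The field names (last_name, first_name, birth_date) and the helper names normalize_key and exact_link are assumptions made for this example; they are not taken from any of the toolkit resources.

```python
def normalize_key(record):
    """Build a comparison key from identifying fields (field names are hypothetical)."""
    return (
        record["last_name"].strip().lower(),
        record["first_name"].strip().lower(),
        record["birth_date"],
    )


def exact_link(source_a, source_b):
    """Pair each record in source_a with every record in source_b sharing its key.

    Returns (linked_pairs, unlinked_records_from_source_a).
    """
    # Index source_b by key so each lookup is constant time instead of a full scan.
    index = {}
    for record in source_b:
        index.setdefault(normalize_key(record), []).append(record)

    linked, unlinked = [], []
    for record in source_a:
        matches = index.get(normalize_key(record), [])
        if matches:
            linked.extend((record, match) for match in matches)
        else:
            unlinked.append(record)
    return linked, unlinked
```

Records that remain unlinked under this approach may be true non-matches, or they may be the same person recorded with a spelling or coding difference, which exact comparison cannot distinguish.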

Linking the Data

Technological advances in computer systems and programming techniques make it efficient and economically feasible to accurately perform computerized record linkage between large files. Probabilistic matching is recommended over traditional, exact matching methods when coding errors, reporting variations, missing data, or duplicate records exist.
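The idea behind probabilistic matching can be sketched with a simplified agreement-weight score in the style of the Fellegi-Sunter model. This is an illustration only and does not describe how any particular product computes its scores. The field names, the m- and u-probabilities, and the two thresholds are hypothetical values chosen for the example.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the source):
#   m = probability the field agrees when two records are a true match
#   u = probability the field agrees when two records are not a match
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_date": (0.97, 0.003),
    "zip": (0.85, 0.05),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement and disagreement weights across the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue  # a missing value adds no evidence in either direction
        if a.strip().lower() == b.strip().lower():
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def classify(weight, upper=10.0, lower=0.0):
    """Map a score to a decision using assumed thresholds."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"


# A first-name spelling variation would defeat an exact match on all four fields.
a = {"last_name": "Smith", "first_name": "Jon", "birth_date": "1980-01-02", "zip": "97201"}
b = {"last_name": "SMITH", "first_name": "John", "birth_date": "1980-01-02", "zip": "97201"}
print(classify(match_weight(a, b)))  # prints "match"
```

Because each field contributes evidence separately, a single reporting variation lowers the score without ruling out the pair. Pairs that fall between the two thresholds are set aside for manual review.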

Data Linkage Strategies
Record Linkage Concepts (PDF file, 678 KB; HTML)
A presentation on the theory behind deterministic and probabilistic matching, deduplication, and blocking and matching variables. (Developed by the Northwest grantee)

Process Diagram for Linking Data (PDF file, 180 KB; HTML)
A diagram showing the approach used by the project to link inpatient data from various sources with the Northwest Tribal Registry (NTR). (Developed by the Northwest grantee)

Linking Maternal and Child Health Data to Create a Comprehensive Longitudinal Dataset: The Florida Experience (PDF file, 498 KB; HTML)
A presentation that illustrates issues involved in linking maternal and child health data, provides a comparison of the capabilities of various software data linking products, and discusses a customized SAS macro that is being used for linking. (Developed by a Florida recipient of an AHRQ Clinical Content Enhancement grant)

Using Link Plus Software
Link Plus is a free probabilistic record linkage and de-duplication program developed by the CDC. Originally designed for use by CDC's National Program of Cancer Registries, the program can be used with any type of data in fixed-width or delimited format.
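To make the difference between the two input formats concrete, the sketch below converts fixed-width records to comma-delimited text. It is a generic file-preparation example and does not use or describe any Link Plus interface. The column layout and the function names are hypothetical.

```python
import csv
import io

# Hypothetical layout: (field name, start offset, end offset).
# This is an assumed example layout, not a Link Plus specification.
LAYOUT = [
    ("last_name", 0, 15),
    ("first_name", 15, 27),
    ("birth_date", 27, 35),
    ("sex", 35, 36),
]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one fixed-width line into named fields, trimming the padding."""
    return {name: line[start:end].strip() for name, start, end in layout}


def fixed_width_to_delimited(lines, layout=LAYOUT, delimiter=","):
    """Return delimited text with a header row built from the layout's field names."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[name for name, _, _ in layout],
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_fixed_width(line.rstrip("\n"), layout))
    return out.getvalue()
```

In a fixed-width file, each field occupies the same character positions on every line, so the layout must be known in advance. A delimited file carries its structure in the separator character instead.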

Overview of Link Plus (PDF file, 394 KB; HTML)
A detailed presentation on using the Link Plus probabilistic record linkage and de-duplication program. (Developed by the Northwest grantee)

Link Plus Self-Training Manual for Linkage (PDF file, 697 KB; HTML)
A user guide that explains how to use the Link Plus software by walking through an example of a record linkage between the NTR and a state health registry. (Developed by the Northwest grantee)

Link Plus Tip Sheet and Resources (PDF file, 142 KB; HTML)
A two-page tip sheet offering ways to make using Link Plus easier, with links to additional resources. (Developed by the Northwest grantee)

Validating the Data

Data integrity and reliability are essential in identifying disparities in care and developing targeted quality interventions. Identifying missing, incomplete, and inaccurate data can be achieved using several data validation and assessment strategies, including comparison with population-based, administrative, or claims data.
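One simple way to quantify missing data is a completeness rate for a single field, as in the minimal sketch below. The set of codes treated as "not reported" and the field name race are assumptions for illustration; each database defines its own coding scheme.

```python
# Assumed codes meaning "not reported"; real databases define their own values.
MISSING_CODES = {"", "unknown", "unk", "refused", "9", "99"}


def completeness(records, field):
    """Return the share of records with a reported (non-missing) value for a field."""
    if not records:
        return 0.0
    reported = sum(
        1
        for record in records
        if str(record.get(field) or "").strip().lower() not in MISSING_CODES
    )
    return reported / len(records)


records = [
    {"race": "AI/AN"},
    {"race": "Unknown"},
    {"race": None},
    {"race": "White"},
]
print(completeness(records, "race"))  # prints 0.5
```

A completeness rate only shows whether a value was recorded. Judging whether the recorded value is accurate requires comparison against another source.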

The California grantees have developed audit measures for race/ethnicity reporting in statewide hospital databases. In addition, the Northwest region grantees assessed the validity of their record linkages by comparing their linked dataset to Census-based population estimates using data auditing and assessment methodologies.
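A comparison against population estimates could be sketched as a coverage ratio per geographic area. The grantees' actual methodology is not described here, so the county grouping, the 0.8 and 1.2 review bounds, and the function names below are assumptions for illustration only.

```python
from collections import Counter


def coverage_by_area(linked_records, population_estimates, area_field="county"):
    """Divide the number of linked records in each area by its estimated population."""
    counts = Counter(record[area_field] for record in linked_records)
    return {
        area: counts.get(area, 0) / population
        for area, population in population_estimates.items()
        if population > 0  # avoid dividing by zero for empty areas
    }


def flag_outliers(ratios, low=0.8, high=1.2):
    """Return areas whose ratio falls outside the assumed review bounds."""
    return {area: ratio for area, ratio in ratios.items() if ratio < low or ratio > high}
```

A ratio well below 1 may point to under-coverage in the linked dataset, and a ratio well above 1 may point to duplicate records or a population estimate that is too low. Either result is a prompt for further review, not a conclusion.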

Assessments and Results
New Measures to Assess the Quality of Race/Ethnicity Reporting in State Databases (PDF file, 170 KB; HTML)
A presentation on a project to develop validated audit measures for race/ethnicity reporting that can be used for any state's statewide databases. (Developed by the California grantee)

Northwest Tribal Registry Data Assessment (PDF file, 179 KB; HTML)
A summary of an assessment of the completeness and representativeness of the NTR and the population contained therein. (Developed by the Northwest grantee)

Evaluating Progress Among Hospitals: Collecting Improved Race, Ethnicity, and Tribal Data in New Mexico (PDF file, 1.3 MB; HTML)
A presentation describing the NM Hospital Inpatient Discharge Database hospital reporting requirements, challenges, and results from 50 hospitals in the areas of timeliness, quality, and completeness. (Developed by the New Mexico grantee)

Motor Vehicle Crash Mortality among Northwest AI/AN (PDF file, 958 KB; HTML)
A presentation that summarizes the results of a data linkage project that showed a high prevalence of racial misclassification among all AI/AN and discusses how the findings contribute to injury prevention efforts. (Developed by the New Mexico grantee)

Table showing results from the IDEA-NW project's data linkages in the areas of proportion of record matches and misclassifications of race. (Developed by the Northwest grantee)

Record Linkage to Enhance STD/HIV Surveillance Data for OR AI/AN Population (PDF file, 676 KB; HTML)
Poster presentation on the positive effect of correct classification of race on public health prevention and intervention efforts. (Developed by the Northwest grantee)

Racial Misclassification and Disparities in Mortality among AI/AN and Other Races, Washington (PDF file, 1.2 MB; HTML)
A presentation on the results of a linkage of WA death certificates with the NTR which determined that numerous AI/AN deaths were misclassified as deaths for other races. (Developed by the Northwest grantee)

Improving Data & Enhancing Access (IDEA-NW) Project (PDF file, 5.2 MB; HTML)
A presentation that describes the results of a record linkage project between NTR and various public health data sets to evaluate racial misclassification and improve disease/mortality estimates. (Developed by the Northwest grantee)

Pregnancy Risk Factors and Birth Outcomes within Oregon's AI/AN Population, 2008-2010 (PDF file, 1.0 MB; HTML)
A presentation that describes the results of a project that linked birth certificate data with the NTR to examine the public health effects of racial misclassification. (Developed by the Northwest grantee)

Trends in Unintentional Injury Mortality among AI/AN, Washington, 1990-2009 (PDF file, 1.0 MB; HTML)
A presentation that describes a project to link data between Indian patient registration and various disease registries. The results allow more accurate ascertainment of cause of death for the population. (Developed by the Northwest grantee)

Life Expectancy of Oregon AI/AN (PDF file, 1.3 MB; HTML)
A presentation on the results of a project to link death certificate data with the NTR to examine the underestimation of mortality measures for AI/AN due to racial misclassification. (Developed by the Northwest grantee)


Internet Citation: Race and Ethnicity Data Improvement Toolkit. Healthcare Cost and Utilization Project (HCUP). July 2017. Agency for Healthcare Research and Quality, Rockville, MD.
Last modified 7/13/17