HEALTHCARE COST & UTILIZATION PROJECT

Record Linkage Concepts

slide 1

Record Linkage Concepts

slide 2

Acknowledgements

Slides adapted from training materials developed by CDC—NPCR Faculty:

CDC/Link Plus development and training:

Adapted by:

slide 3

Overview of Record Linkage

slide 4

Overview of Record Linkage

slide 5

Duplicate Detection

slide 6

Deterministic Matching

Last Name  First Name  Site  SSN        DOB       Sex  DateDX
Smith      John        C619  123654789  02011934  1    06152004
Smith      John        C619  123654789  02011934  1    06152004

slide 7

Deterministic Matching

Last Name  First Name  Site  SSN        DOB       Sex  DateDX
Smith      John        C619  123654789  02011934  1    06152004
Smyth      John        C619  123456786  02081934  1    06102004

Last Name  First Name  Site  SSN        DOB       Sex  DateDX
Smith      John        C619  123654789  02011934  1    06152004
Smith      John        C619             02011934  1    06152004
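
The record pairs on the two Deterministic Matching slides can be checked with a short Python sketch. It assumes that a deterministic rule requires every compared field to agree exactly and that a missing value counts as a disagreement; both are illustrative assumptions, and the field and function names are hypothetical.

```python
from typing import Dict, List

# Fields compared, in the order shown on the slides (names are hypothetical).
FIELDS: List[str] = ["last_name", "first_name", "site", "ssn", "dob", "sex", "date_dx"]


def make_record(*values: str) -> Dict[str, str]:
    """Build a record dict from values given in FIELDS order."""
    return dict(zip(FIELDS, values))


def is_deterministic_match(a: Dict[str, str], b: Dict[str, str]) -> bool:
    """True only if every field is present and identical in both records.

    Treating a missing value as a disagreement is an assumption of this sketch.
    """
    return all(a[f] != "" and a[f] == b[f] for f in FIELDS)


reference = make_record("Smith", "John", "C619", "123654789", "02011934", "1", "06152004")
exact_copy = make_record("Smith", "John", "C619", "123654789", "02011934", "1", "06152004")
variant = make_record("Smyth", "John", "C619", "123456786", "02081934", "1", "06102004")
missing_ssn = make_record("Smith", "John", "C619", "", "02011934", "1", "06152004")

print(is_deterministic_match(reference, exact_copy))   # True
print(is_deterministic_match(reference, variant))      # False
print(is_deterministic_match(reference, missing_ssn))  # False
```

Under this strict rule, a spelling variant or a missing SSN is enough to reject a pair that may well describe the same person.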

slide 8

Deterministic Matching / Manual Review

Last Name  First Name  Site  SSN        DOB       Sex  DateDX
Smith      John        C619  123654789  02011934  1    06152004
Smith      John        C619  123654786  02101934  1    06152004
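
One way to route a near-miss pair like the one above to manual review is to count agreeing fields. The sketch below is only an illustration: the threshold of five agreeing fields and the names `agreeing_fields` and `classify_pair` are hypothetical, not rules taken from the slides.

```python
from typing import Dict, List

FIELDS: List[str] = ["last_name", "first_name", "site", "ssn", "dob", "sex", "date_dx"]


def agreeing_fields(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """List the fields that are present and identical in both records."""
    return [f for f in FIELDS if a[f] != "" and a[f] == b[f]]


def classify_pair(a: Dict[str, str], b: Dict[str, str], review_threshold: int = 5) -> str:
    """Exact agreement is a match; near agreement goes to manual review.

    review_threshold is an arbitrary illustrative value.
    """
    n_agree = len(agreeing_fields(a, b))
    if n_agree == len(FIELDS):
        return "match"
    if n_agree >= review_threshold:
        return "manual review"
    return "non-match"


first = dict(zip(FIELDS, ["Smith", "John", "C619", "123654789", "02011934", "1", "06152004"]))
second = dict(zip(FIELDS, ["Smith", "John", "C619", "123654786", "02101934", "1", "06152004"]))

print(classify_pair(first, second))  # manual review (SSN and DOB disagree)
print(classify_pair(first, first))   # match
```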

slide 9

Probabilistic Matching

slide 10

Probabilistic Matching

slide 11

Probabilistic Matching

slide 12

Probabilistic Matching

slide 13

Probabilistic Matching

slide 14

Probabilistic Matching

slide 15

Linkage basics

slide 16

Concept of Blocking

slide 17

Graphic - Left image: Pairs of socks that do not match at all.
Middle image: Clothes basket full of socks.
Right image: Pairs of socks that match exactly or almost match.

slide 18

Graphic - Top image: Pair of socks that match exactly.
Middle image: Pair of socks that match almost exactly.
Bottom image: Pair of socks that match in many ways, but not in other ways.

slide 19

Probabilistic linkage concepts (1)

Blocking
  Description: An initial step to reduce the number of record comparisons and increase efficiency of linkage. At least one blocking variable must match exactly (or phonetically) between the two records being compared; subsequent comparisons are made after blocking.
  Common usage*: Blocking variables:
  • Last name
  • First name
  • Social security number
  • Date of birth

Matching
  Description: After blocking, matching variables are compared to generate a match score for each record pair. Match scores for each variable are:
  • Field-specific (matching DOB is scored higher than matching sex)
  • Value-specific (last name of "Hoopes" is scored higher than "Smith" due to frequency of occurrence)
  Common usage*: Matching variables:
  • Last name
  • First name
  • Social security number
  • Date of birth
  • Sex
  • Address
  The user may designate matching algorithms & M-probabilities for each variable.
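
The blocking step described above can be sketched in Python as follows. This is a minimal illustration, not how any particular linkage software implements it: it handles exact agreement only (no phonetic matching), works within a single file as in duplicate detection, and uses made-up records 2 and 3 plus a hypothetical `candidate_pairs` helper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

BLOCKING_VARIABLES = ["last_name", "first_name", "ssn", "dob"]


def candidate_pairs(records: List[Dict[str, str]]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, that agree exactly on at least
    one blocking variable. Only these pairs go on to full comparison."""
    pairs = set()
    for var in BLOCKING_VARIABLES:
        blocks = defaultdict(list)
        for idx, rec in enumerate(records):
            value = rec.get(var, "")
            if value:  # records missing this variable cannot block on it
                blocks[value].append(idx)
        for members in blocks.values():
            pairs.update(combinations(members, 2))
    return sorted(pairs)


records = [
    {"last_name": "Smith", "first_name": "John", "ssn": "123654789", "dob": "02011934"},
    {"last_name": "Smyth", "first_name": "John", "ssn": "123456786", "dob": "02081934"},
    {"last_name": "Jones", "first_name": "Mary", "ssn": "987654321", "dob": "03151950"},  # made up
    {"last_name": "Smith", "first_name": "Anna", "ssn": "111223333", "dob": "07041960"},  # made up
]

print(candidate_pairs(records))  # [(0, 1), (0, 3)]
```

Four records give six possible pairs, but only two survive blocking: records 0 and 1 share a first name, and records 0 and 3 share a last name.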

slide 20

Probabilistic linkage concepts (2)

Match score
  Description: The total probability weight assigned to each record pair; equal to the sum of the scores generated by comparing each match field. Based on software-calculated M probability (sensitivity) and U probability (specificity).
  Common usage*: The range of match scores is examined to determine upper and lower cut-off values. Record pairs scoring above the upper cut-off value are likely true matches, and pairs scoring below the lower cut-off value are automatically designated false matches. Record pairs between the cut-off values are clerically reviewed.

Clerical review
  Description: Case-by-case review of uncertain matches that fall between the upper and lower cut-off values. Additional variables can be added to the record layout to assist in the designation of match status. This process can be completed independently by two or more reviewers to increase reliability.
  Common usage*: Additional variables may include:
  • Street address
  • City, state, zip code
  • Suffix
  • Race/ethnicity
  • Maiden name

* May vary based on data items and quality of data available in the matching data sets
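
The match score and cut-off logic can be illustrated with a small Python sketch. It assumes the standard log-ratio agreement and disagreement weights used in probabilistic linkage, log2(m/u) and log2((1-m)/(1-u)). The M and U probabilities, the cut-off values, the choice to skip missing fields, and all function names are made up for illustration and are not taken from any software.

```python
import math
from typing import Dict, Tuple

# Illustrative (made-up) M and U probabilities per matching variable.
M_U: Dict[str, Tuple[float, float]] = {
    "last_name": (0.95, 0.01),
    "first_name": (0.95, 0.02),
    "ssn": (0.90, 0.0001),
    "dob": (0.95, 0.001),
    "sex": (0.98, 0.5),
}


def field_weight(agree: bool, m: float, u: float) -> float:
    """Log-ratio weight: positive on agreement, negative on disagreement."""
    return math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))


def match_score(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Sum the field weights; fields missing in either record add nothing
    (an assumption of this sketch)."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a.get(field) and b.get(field):
            total += field_weight(a[field] == b[field], m, u)
    return total


def classify(score: float, lower: float = 0.0, upper: float = 15.0) -> str:
    """Apply hypothetical lower and upper cut-off values."""
    if score >= upper:
        return "match"
    if score < lower:
        return "non-match"
    return "clerical review"


rec_a = {"last_name": "Smith", "first_name": "John", "ssn": "123654789", "dob": "02011934", "sex": "1"}
rec_b = {"last_name": "Smith", "first_name": "John", "ssn": "123654786", "dob": "02101934", "sex": "1"}
rec_c = {"last_name": "Jones", "first_name": "Mary", "ssn": "987654321", "dob": "03151950", "sex": "2"}  # made up

print(classify(match_score(rec_a, rec_a)))  # match
print(classify(match_score(rec_a, rec_b)))  # clerical review
print(classify(match_score(rec_a, rec_c)))  # non-match
```

With these illustrative probabilities, agreement on DOB carries far more weight than agreement on sex, which reflects the field-specific scoring described earlier.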

Internet Citation: Record Linkage Concepts. Healthcare Cost and Utilization Project (HCUP). July 2014. Agency for Healthcare Research and Quality, Rockville, MD. www.hcup-us.ahrq.gov/datainnovations/raceethnicitytoolkit/or19.jsp.
Last modified 7/31/14