slide 1
Record Linkage Concepts
slide 2
Acknowledgements
Slides adapted from training materials developed by CDC—NPCR Faculty:
CDC/Link Plus development and training:
Adapted by:
slide 3
Overview of Record Linkage
slide 4
Overview of Record Linkage
slide 5
Duplicate Detection
slide 6
Deterministic Matching
Last Name | First Name | Site | SSN | DOB | Sex | DateDX |
---|---|---|---|---|---|---|
Smith | John | C619 | 123654789 | 02011934 | 1 | 06152004 |
Smith | John | C619 | 123654789 | 02011934 | 1 | 06152004 |
slide 7
Deterministic Matching
Last Name | First Name | Site | SSN | DOB | Sex | DateDX |
---|---|---|---|---|---|---|
Smith | John | C619 | 123654789 | 02011934 | 1 | 06152004 |
Smyth | John | C619 | 123456786 | 02081934 | 1 | 06102004 |
Last Name | First Name | Site | SSN | DOB | Sex | DateDX |
---|---|---|---|---|---|---|
Smith | John | C619 | 123654789 | 02011934 | 1 | 06152004 |
Smith | John | C619 | | 02011934 | 1 | 06152004 |
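The three record pairs on slides 6 and 7 can be run through a minimal sketch of deterministic matching. It assumes that "deterministic" here means every compared field must be present and identical, and that the second record in the last table has a blank SSN. The field names and the helper `deterministic_match` are illustrative, not taken from Link Plus.

```python
FIELDS = ("last_name", "first_name", "site", "ssn", "dob", "sex", "date_dx")


def make_record(*values):
    """Build a record dict from values given in FIELDS order."""
    return dict(zip(FIELDS, values))


def deterministic_match(a, b, fields=FIELDS):
    """Return True only if every field is non-blank and identical in both records."""
    return all(a[f] != "" and a[f] == b[f] for f in fields)


reference = make_record("Smith", "John", "C619", "123654789", "02011934", "1", "06152004")

# Slide 6: identical record.
exact_copy = make_record("Smith", "John", "C619", "123654789", "02011934", "1", "06152004")
# Slide 7, first table: spelling and keying differences.
typo_variant = make_record("Smyth", "John", "C619", "123456786", "02081934", "1", "06102004")
# Slide 7, second table: SSN assumed to be blank.
missing_ssn = make_record("Smith", "John", "C619", "", "02011934", "1", "06152004")

for label, candidate in [
    ("exact copy", exact_copy),
    ("typo variant", typo_variant),
    ("missing SSN", missing_ssn),
]:
    print(label, deterministic_match(reference, candidate))
```

Under these assumptions only the exact copy is accepted. Small keying errors or a single missing value are enough to make a strict deterministic rule miss what is plausibly the same person.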
slide 8
Deterministic Matching / Manual Review
Last Name | First Name | Site | SSN | DOB | Sex | DateDX |
---|---|---|---|---|---|---|
Smith | John | C619 | 123654789 | 02011934 | 1 | 06152004 |
Smith | John | C619 | 123654786 | 02101934 | 1 | 06152004 |
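One way to picture the manual review step is to count how many fields agree and send near-misses to a reviewer instead of rejecting them outright. The sketch below is an illustration only: the threshold of 5 agreeing fields and the names `agreement_count` and `classify` are hypothetical and are not part of the original slides.

```python
def agreement_count(a, b):
    """Number of positions where both values are non-blank and identical."""
    return sum(1 for x, y in zip(a, b) if x and x == y)


def classify(a, b, review_threshold=5):
    """Route a record pair based on how many fields agree.

    review_threshold is a hypothetical cut-off chosen for this example.
    """
    agreed = agreement_count(a, b)
    if agreed == len(a):
        return "match"
    if agreed >= review_threshold:
        return "manual review"
    return "non-match"


# Field order: last name, first name, site, SSN, DOB, sex, date of diagnosis.
reference = ("Smith", "John", "C619", "123654789", "02011934", "1", "06152004")
slide8_pair = ("Smith", "John", "C619", "123654786", "02101934", "1", "06152004")
slide7_pair = ("Smyth", "John", "C619", "123456786", "02081934", "1", "06102004")

print(classify(reference, reference))
print(classify(reference, slide8_pair))
print(classify(reference, slide7_pair))
```

The slide 8 pair agrees on five of seven fields (only SSN and DOB differ), so with this threshold it goes to manual review. The first slide 7 pair agrees on three fields and is rejected.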
slide 9
Probabilistic Matching
slide 10
Probabilistic Matching
slide 11
Probabilistic Matching
slide 12
Probabilistic Matching
slide 13
Probabilistic Matching
slide 14
Probabilistic Matching
slide 15
Linkage basics
slide 16
Concept of Blocking
slide 17
Graphic - Left image: Pairs of socks that do not match at all.
Middle image: Clothes basket full of socks.
Right image: Pairs of socks that match exactly or almost match.
slide 18
Graphic - Top image: Pair of socks that match exactly.
Middle image: Pair of socks that match almost exactly.
Bottom image: Pair of socks that match in many ways, but not in other ways.
slide 19
Probabilistic linkage concepts (1)
Concept | Description | Common usage* |
---|---|---|
Blocking | An initial step to reduce the number of record comparisons and increase the efficiency of linkage. At least one blocking variable must match exactly (or phonetically) between the two records being compared; subsequent comparisons are made after blocking. | Blocking variables: |
Matching | After blocking, matching variables are compared to generate a match score for each record pair. Match scores for each variable are: | Matching variables: The user may designate matching algorithms and M-probabilities for each variable. |
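Blocking as described above can be sketched in a few lines: records are grouped by a blocking key, and only records that share a key are compared. This example assumes a simplified American Soundex code on last name as the phonetic key. The function names, the choice of key, and all records other than Smith and Smyth are illustrative assumptions, not the behavior of any particular linkage package.

```python
from collections import defaultdict
from itertools import product

# Simplified American Soundex letter groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return a 4-character Soundex code (first letter plus three digits)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so repeated codes across a vowel count
    return (result + "000")[:4]


def block_pairs(file_a, file_b, key):
    """Yield only the (a, b) pairs whose blocking key is identical."""
    blocks = defaultdict(lambda: ([], []))
    for rec in file_a:
        blocks[key(rec)][0].append(rec)
    for rec in file_b:
        blocks[key(rec)][1].append(rec)
    for left, right in blocks.values():
        yield from product(left, right)


# Illustrative records: (last name, first name).
file_a = [("Smith", "John"), ("Jones", "Mary")]
file_b = [("Smyth", "John"), ("Johnson", "Mark"), ("Lee", "Ann")]

pairs = list(block_pairs(file_a, file_b, key=lambda rec: soundex(rec[0])))
print(len(file_a) * len(file_b), "pairs without blocking")
print(len(pairs), "pairs after blocking:", pairs)
```

Smith and Smyth share a Soundex code, so that pair survives blocking even though the spellings differ, while the other five of the six possible pairs are never compared.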
slide 20
Probabilistic linkage concepts (2)
Concept | Description | Common usage* |
---|---|---|
Match score | The total probability weight assigned to each record pair; equal to the sum of the scores generated by comparing each match field. Based on the software-calculated M probability (how often a field agrees among true matches, analogous to sensitivity) and U probability (how often a field agrees by chance among non-matches, the complement of specificity). | The range of match scores is examined to determine upper and lower cut-off values. Record pairs scoring above the upper cut-off are likely true matches, and pairs scoring below the lower cut-off are automatically designated non-matches. Record pairs between the cut-off values are clerically reviewed. |
Clerical review | Case-by-case review of uncertain matches that fall between the upper and lower cut-off values. Additional variables can be added to the record layout to assist in the designation of match status. This process can be completed independently by two or more reviewers to increase reliability. | Additional variables may include: |
* May vary based on the data items and the quality of data available in the matching data sets.
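The match score and cut-off logic in the table can be illustrated with a short sketch. It assumes the common Fellegi-Sunter formulation, in which a field contributes log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees; the source does not state which formula the software uses. Every M and U value, both cut-offs, and the function names are hypothetical values chosen for the example.

```python
import math

# Hypothetical (m, u) pairs per field:
# m = P(field agrees | true match), u = P(field agrees | non-match).
PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "ssn": (0.95, 0.0001),
    "dob": (0.95, 0.001),
    "sex": (0.98, 0.5),
}


def field_weight(agree, m, u):
    """Log-ratio weight: positive on agreement, negative on disagreement."""
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(a, b):
    """Sum of the field weights over all fields listed in PROBS."""
    return sum(field_weight(a[f] == b[f], m, u) for f, (m, u) in PROBS.items())


def classify(score, lower=0.0, upper=15.0):
    """Apply hypothetical lower and upper cut-off values to a match score."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "clerical review"


reference = {"last_name": "Smith", "first_name": "John", "ssn": "123654789",
             "dob": "02011934", "sex": "1"}
near_miss = {"last_name": "Smith", "first_name": "John", "ssn": "123654786",
             "dob": "02101934", "sex": "1"}
poor_pair = {"last_name": "Smyth", "first_name": "John", "ssn": "123456786",
             "dob": "02081934", "sex": "1"}

for label, other in [("identical", reference), ("near miss", near_miss),
                     ("poor pair", poor_pair)]:
    score = match_score(reference, other)
    print(label, round(score, 2), classify(score))
```

With these made-up probabilities the identical pair scores well above the upper cut-off, the pair that differs only in SSN and DOB lands between the cut-offs and goes to clerical review, and the pair that also differs in last name falls below the lower cut-off. Blank values are treated as ordinary disagreements here, which a real linkage would likely handle separately.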
Internet Citation: Record Linkage Concepts. Healthcare Cost and Utilization Project (HCUP). July 2014. Agency for Healthcare Research and Quality, Rockville, MD. hcup-us.ahrq.gov/datainnovations/raceethnicitytoolkit/or19.jsp.
Last modified 7/31/14