National EMSC Data Analysis Resource Center
Combining multiple databases into one extensive database for analysis or linking multiple events...
Have you ever wondered what effect seatbelt usage has on the amount of money spent for hospital admission for crash victims? Or perhaps you want to know if size-appropriate splinting in the prehospital setting reduces hospital admissions, lengths of stay, and charges?
In most cases, the answers to these questions are not contained in a single database. A researcher must therefore start from scratch, building a new database that follows patients from the point of splinting to the emergency department and finally records whether each patient was admitted. Building such databases can be expensive in both time and money. However, if you have access to existing databases, such as computerized EMS run reports and an emergency department or hospital discharge database, then probabilistic record linkage may be the tool for you.
The purpose of probabilistic record linkage is to combine multiple databases into one extensive database for analysis. It can also be used to link multiple events within one database that refer to a single patient or individual.
Probabilistic record linkage is accomplished by comparing data fields in two files, such as birth date or gender. The comparison of numerous data fields leads to a judgment of whether two records refer to the same patient and/or event (and should be linked). This judgment is based on the cumulative weight of agreement and disagreement among field values. The amount of information in a field affects the field's impact on whether two records should be linked. For instance, agreement on the gender field alone would not establish that two records refer to the same patient, but agreement on Social Security Number nearly guarantees that two records refer to the same individual. Probabilistic linkage software uses mathematical algorithms to determine whether two records should be linked based on the information in each record.
By assigning log-likelihood ratios to field comparisons, it is possible to computerize the judgment process. Let m_i equal the probability that the i-th field agrees, given that the records are known to refer to the same person or event (a true match). Let u_i equal the probability that the i-th field will agree by chance among records known not to match.
Then, for a given pair of records, if field i agrees, the agreement weight is w_i = log2(m_i / u_i). If field i disagrees, a disagreement weight w_i = log2((1 - m_i) / (1 - u_i)) is assigned. The composite weight for a record pair is the sum of the agreement and disagreement weights for all fields available for comparison.
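The weight calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular linkage package: the function names, the dictionary record layout, and the m and u values in the usage note are all hypothetical, and a field counts as "available for comparison" only when it is present in both records.

```python
import math

def field_weight(m, u, agrees):
    """Agreement weight log2(m/u) or disagreement weight log2((1-m)/(1-u))."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def composite_weight(record_a, record_b, probabilities):
    """Sum the field weights over every field present in both records.

    probabilities maps a field name to its (m, u) pair. Exact equality is
    used as the comparison rule here, which is a simplifying assumption.
    """
    total = 0.0
    for field, (m, u) in probabilities.items():
        value_a = record_a.get(field)
        value_b = record_b.get(field)
        if value_a is None or value_b is None:
            continue  # field not available for comparison, so it adds no weight
        total += field_weight(m, u, value_a == value_b)
    return total
```

For example, with illustrative values m = 0.95 and u = 0.5 for gender, agreement adds less than one unit of weight, while agreement on a field with a very small u (such as a full identifier) adds far more. This is how a field's information content shows up in the composite weight.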
To improve computation time, both files are sorted on one or several data fields, called blocking variables, and comparisons are made only between records that agree on those fields. The drawback is that if an error occurs in a data field used for blocking, records that should match will never be compared. To account for this problem, records that fail to match are subjected to subsequent matching attempts after the files are re-blocked on different data fields.
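A rough sketch of multi-pass blocking follows. It assumes records are dictionaries and that a caller-supplied function `is_match` decides whether a compared pair should be linked (for instance, by thresholding a composite weight). All names are hypothetical. The sketch also links each record to the first acceptable candidate it finds, which is a simplification; real linkage software would typically choose the best-scoring candidate.

```python
from collections import defaultdict

def block_key(record, fields):
    """Return the tuple of blocking values, or None if any value is missing."""
    key = tuple(record.get(f) for f in fields)
    return None if None in key else key

def link_in_passes(file_a, file_b, passes, is_match):
    """Link records over several blocking passes.

    passes is a list of blocking-field lists. Only records still unmatched
    after earlier passes are re-blocked and compared in later passes.
    """
    links = []
    unmatched_a = set(range(len(file_a)))
    unmatched_b = set(range(len(file_b)))
    for fields in passes:
        # Index the unmatched records of file B by their blocking key.
        index = defaultdict(list)
        for j in sorted(unmatched_b):
            key = block_key(file_b[j], fields)
            if key is not None:
                index[key].append(j)
        # Compare each unmatched record of file A only within its own block.
        for i in sorted(unmatched_a):
            key = block_key(file_a[i], fields)
            if key is None:
                continue
            for j in index.get(key, []):
                if j in unmatched_b and is_match(file_a[i], file_b[j]):
                    links.append((i, j))
                    unmatched_a.discard(i)
                    unmatched_b.discard(j)
                    break
    return links
```

With passes such as `[["birth_date"], ["zip_code"]]`, a pair whose birth date was mistyped in one file is missed in the first pass but can still be compared, and linked, in the second.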
Based on the sizes of the databases being linked and the number of expected matches, researchers can relate the match weight for a pair of records to the probability that these records are correctly matched. Generally, only record pairs with a probability of at least 0.90 of being correct are linked and considered true matches.
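One common way to make this conversion, sketched below, is Bayes' rule on the odds scale. The sketch assumes that the composite weight is the base-2 log-likelihood ratio for the pair and that the prior odds of a match come from the file sizes and the expected number of matches. The text does not spell out the exact formula, so treat this as an illustration under those assumptions; the function name and parameters are hypothetical.

```python
def match_probability(weight, size_a, size_b, expected_matches):
    """Convert a composite match weight into a posterior match probability.

    Prior odds: expected matches versus expected non-matches among all
    size_a * size_b possible record pairs.
    Posterior odds: prior odds multiplied by the likelihood ratio 2**weight.
    """
    total_pairs = size_a * size_b
    prior_odds = expected_matches / (total_pairs - expected_matches)
    posterior_odds = prior_odds * 2.0 ** weight
    return posterior_odds / (1.0 + posterior_odds)

def is_linked(weight, size_a, size_b, expected_matches, cutoff=0.90):
    """Apply the 0.90 probability cutoff described in the text."""
    return match_probability(weight, size_a, size_b, expected_matches) >= cutoff
```

Under these assumptions, the larger the files are relative to the number of expected matches, the higher the weight a pair needs before it clears the 0.90 cutoff.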
Probabilistic record linkage has been used on a national level to look at:
1. Johnson SW, Walker J. The Crash Outcome Data Evaluation System (CODES). Washington DC: National Highway Traffic Safety Administration; 1996.
2. Diller E, Cook LJ, Leonard DR, Dean M, Reading JM, Vernon DD. Evaluating Drivers Licensed with Medical Conditions in Utah, 1992-1996. National Highway Traffic Safety Administration 1999 June;Report No. DOT HS 809 023.
3. Knight S, Cook LJ, Nechodom PJ, Olson LM, Reading JC, Dean JM. Improper Use of Shoulder Straps in Motor Vehicle Crashes: A Statewide Analysis of Restraint Efficacy. In Press; Accident Analysis and Prevention.
4. Cook LJ, Knight S, Olson LM, Nechodom PJ, Dean JM. Crash Characteristics and Medical Outcomes of Older Drivers in Motor Vehicle Crashes in Utah, 1992-1995. Annals of Emergency Medicine 2000;35(6):585-591.
5. Cvijanovich NZ, Cook LJ, Nechodom PJ, Dean JM. A Population-Based Study of Teenage Drivers: 1992-1996. 43rd Annual Proceedings Association for the Advancement of Automotive Medicine 1999;175-186.
6. Berg M, Cook LJ, Corneli H, Vernon D, Dean JM. Effect of Seating Position and Restraint Use on Injuries to Children in Motor Vehicle Crashes. Pediatrics 2000;105(4):831-835.
7. Corneli HM, Cook LJ, Dean JM. Adults and Children in Severe Motor Vehicle Crashes: A Matched-Pairs Study. Annals of Emergency Medicine 2000 Oct;36(4):340-345.
8. Suruda AJ, Vernon DD, Reading J, Cook LJ, Nechodom PJ, Leonard D, Dean JM. Pre-Hospital Emergency Medical Services: A Population-Based Study of Pediatric Utilization. Injury Prevention 1999;5(4):294-297.
9. Knight S, Junkins EP, Lightfoot AC, Cazier C, Olson LM. Injuries in School Shop Classes. Pediatrics 2000;106(1):10-13.
rev. 04-Aug-2022