Imputation and record linkage strategies for educational data collected from surveys and administrative sources

Dr. Jörg Drechsler, IAB (Institute for Employment Research), Nuremberg
Dr. Joseph Sakshaug, IAB (Institute for Employment Research), Nuremberg

Research Staff:
Matthias Speidel, IAB (Institute for Employment Research), Nuremberg
Jonathan Geßendorfer, IAB (Institute for Employment Research), Nuremberg

Project Summary:

Possible ways to reduce non-consent bias in linked survey and administrative data

Record linkage has become an important tool for increasing research opportunities in the social sciences and is likely to become even more important in the “big data” era. Surveys that perform record linkage are often required to obtain informed consent from respondents prior to linkage, and this consent is not always given. This is also the case for the NEPS Starting Cohort 6 linkage to the Integrated Employment Biographies (IEB) of the German Federal Employment Agency. The major concern regarding non-consent is that non-consenters may differ systematically from consenters, which introduces an additional source of potential bias in analyses based on linked survey and administrative data.

One strategy for addressing the missing data problem induced by non-linkage is statistical matching: the missing administrative data of a survey unit are estimated using the data of a statistically similar individual in the IEB. Other potential solutions are multiple imputation and weighting techniques.
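
As an illustration of the statistical matching idea, the following is a minimal sketch of nearest-neighbour donor matching, not the project's actual procedure. The function name `nearest_neighbour_match`, the choice of standardised Euclidean distance, and the toy covariates (age and a binary indicator) are assumptions made for this example only.

```python
import numpy as np

def nearest_neighbour_match(survey_x, admin_x, admin_y):
    """Borrow admin_y from the most similar administrative record.

    survey_x: covariates of survey units without linked data (n_s x k)
    admin_x:  the same covariates for administrative records (n_a x k)
    admin_y:  administrative variable to be donated (length n_a)
    All names are hypothetical and used for illustration only.
    """
    survey_x = np.asarray(survey_x, dtype=float)
    admin_x = np.asarray(admin_x, dtype=float)
    admin_y = np.asarray(admin_y)

    # Standardise with the donor pool's mean and standard deviation
    # so that no single covariate dominates the distance.
    mean = admin_x.mean(axis=0)
    std = admin_x.std(axis=0)
    std[std == 0] = 1.0
    s = (survey_x - mean) / std
    a = (admin_x - mean) / std

    # Squared Euclidean distance between every survey unit and every donor.
    dist = ((s[:, None, :] - a[None, :, :]) ** 2).sum(axis=2)

    # Index of the closest donor for each survey unit.
    donors = dist.argmin(axis=1)
    return admin_y[donors], donors

# Toy data: columns are (age, binary indicator); values are invented.
admin_x = [[20, 0], [40, 1], [60, 0]]
admin_y = [100, 200, 300]
survey_x = [[22, 0], [58, 0]]
imputed, donors = nearest_neighbour_match(survey_x, admin_x, admin_y)
```

In this toy example each non-linked survey unit receives the administrative value of the single closest donor. A practical application would have to decide, among other things, which matching variables to use and whether donors may be reused.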

To evaluate the effectiveness of each strategy, we use only the respondents for whom the true links are available and induce synthetic non-consent that leads to bias. By comparing 1) the true (joint) distributions, 2) the biased (joint) distributions obtained if the synthetic non-consenters are not dealt with, and 3) the (joint) distributions after applying the respective method, one can assess whether or not the bias has been reduced.
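
The evaluation logic described above can be sketched with simulated data. The example below is an assumption-laden illustration rather than the project's analysis: the data are generated artificially, consent depends only on an observed binary covariate `x`, the correction method is a simple cell-based weighting, and a mean is compared instead of a full joint distribution. All names and parameter values are hypothetical.

```python
import numpy as np

def evaluate_weighting(n=20000, seed=0):
    """Compare true, naive, and weighted estimates under synthetic non-consent."""
    rng = np.random.default_rng(seed)

    # Fully linked "truth": covariate x and administrative variable y.
    x = rng.integers(0, 2, size=n)
    y = 10.0 + 5.0 * x + rng.normal(0.0, 1.0, size=n)

    # Induce synthetic non-consent that depends on x (assumed mechanism).
    p_consent = np.where(x == 1, 0.4, 0.9)
    consent = rng.random(n) < p_consent

    # 1) True value from all units, 2) naive value from consenters only.
    true_mean = y.mean()
    naive_mean = y[consent].mean()

    # 3) Corrected value: weight consenters by the inverse of the
    #    observed consent rate within their x cell.
    weights = np.zeros(n)
    for g in (0, 1):
        cell = x == g
        weights[cell & consent] = cell.sum() / (cell & consent).sum()
    weighted_mean = np.average(y[consent], weights=weights[consent])

    return {"true": true_mean, "naive": naive_mean, "weighted": weighted_mean}

result = evaluate_weighting()
bias_naive = result["naive"] - result["true"]
bias_weighted = result["weighted"] - result["true"]
```

Comparing `bias_naive` with `bias_weighted` shows whether the correction moved the estimate closer to the truth. Because consent here depends only on an observed covariate, weighting is expected to work well in this sketch; it would not remove bias if consent depended on unobserved characteristics.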