Statistical disclosure control (SDC) has been at the forefront of
privacy concerns for many years. The emphasis on open science and on
reproducible and replicable research has precipitated the release of
both raw and summary data on publicly accessible portals. Yet not
everyone releasing such data is aware of the re-identification risks
involved. Government agencies and academic researchers pioneered
methodological and policy work in this area, which ultimately culminated
in NIST standards, journal articles, and books that address
re-identification risk. Current SDC methodologies go well beyond the
HIPAA Safe Harbor method and other best practices aimed at reducing the
likelihood of data re-identification, and in many cases these more
advanced techniques should be employed to reduce the risk.

This seminar will introduce the SDC concepts of data re-identification,
quasi-identifiers, and sensitive values. We will provide guidance on
conducting re-identification risk assessments and on the SDC techniques
used to assess and mitigate this risk, including concepts such as
k-anonymity, l-diversity, and household/cluster risk. The trade-off
between data utility and re-identification risk will be discussed in
the context of mitigation methods such as suppression and perturbation.
Available tools for implementing these analyses will be highlighted.
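To make k-anonymity, l-diversity, and suppression concrete, here is a minimal sketch on a hypothetical toy dataset. It assumes records stored as plain dictionaries, three made-up quasi-identifiers (age band, ZIP prefix, sex), and one made-up sensitive value (diagnosis). It uses the simplest "distinct" variant of l-diversity and treats suppression as dropping whole records from small equivalence classes. Household/cluster risk and perturbation are not covered. This is an illustration of the definitions, not the method used by any particular SDC tool.

```python
from collections import defaultdict

# Hypothetical toy microdata: quasi-identifiers plus one sensitive value.
RECORDS = [
    {"age": "30-39", "zip": "021**", "sex": "F", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "sex": "F", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "021**", "sex": "F", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "sex": "M", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "021**", "sex": "M", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "021**", "sex": "M", "diagnosis": "diabetes"},
    {"age": "50-59", "zip": "024**", "sex": "F", "diagnosis": "cancer"},
    {"age": "30-39", "zip": "024**", "sex": "M", "diagnosis": "flu"},
]
QUASI_IDENTIFIERS = ("age", "zip", "sex")  # assumed, for illustration only
SENSITIVE = "diagnosis"                    # assumed, for illustration only


def equivalence_classes(records, qids):
    """Group records that share the same combination of quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        classes[tuple(rec[q] for q in qids)].append(rec)
    return dict(classes)


def k_anonymity(records, qids):
    """k is the size of the smallest equivalence class (0 for an empty dataset)."""
    classes = equivalence_classes(records, qids)
    return min((len(group) for group in classes.values()), default=0)


def l_diversity(records, qids, sensitive):
    """Distinct l: the fewest distinct sensitive values found in any equivalence class."""
    classes = equivalence_classes(records, qids)
    return min(
        (len({r[sensitive] for r in group}) for group in classes.values()),
        default=0,
    )


def suppress_small_classes(records, qids, k):
    """Suppress (drop) every record whose equivalence class has fewer than k members."""
    classes = equivalence_classes(records, qids)
    return [r for group in classes.values() if len(group) >= k for r in group]


if __name__ == "__main__":
    print("before: k =", k_anonymity(RECORDS, QUASI_IDENTIFIERS),
          "l =", l_diversity(RECORDS, QUASI_IDENTIFIERS, SENSITIVE))
    released = suppress_small_classes(RECORDS, QUASI_IDENTIFIERS, k=3)
    print("after:  k =", k_anonymity(released, QUASI_IDENTIFIERS),
          "l =", l_diversity(released, QUASI_IDENTIFIERS, SENSITIVE))
    print(f"records retained: {len(released)} of {len(RECORDS)}")
```

On this toy data the two unique records give k = 1. Suppressing them raises k to 3 at the cost of a quarter of the records, which is a crude view of the utility/risk trade-off. However, l stays at 1 because every record in the 40-49 class shares the same diagnosis. That is why l-diversity is assessed alongside k-anonymity.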