Privacy Characterization and Quantification in Data Publishing
The growing interest in collecting individuals' data and publishing it to the public, for purposes such as medical research, market analysis and economic measures, has raised major privacy concerns about individuals' sensitive information. To address these concerns, many Privacy-Preserving Data Publishing (PPDP) techniques have been proposed in the literature. However, they lack a proper privacy characterization and measurement. This work proposes a novel multi-variable privacy characterization and quantification model. Based on this model, the prior and posterior adversarial beliefs about the attribute values of individuals can be analyzed, as can the sensitivity of any identifier in the privacy characterization. It is then shown that privacy should not be measured using a single metric. Two different metrics for quantifying privacy leakage are proposed: distribution leakage and entropy leakage. Using these metrics, some of the most well-known PPDP techniques, such as k-anonymity, l-diversity and t-closeness, are analyzed. The proposed privacy characterization and measurement framework contributes to a better understanding and evaluation of these techniques.
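To make the two metrics concrete, the following is a minimal sketch rather than the definitions used in this work, which the abstract does not state. It assumes that the adversary's prior belief is the distribution of a sensitive attribute over the whole published table, that the posterior belief is its distribution within one group of records, that distribution leakage is the Euclidean distance between the two, and that entropy leakage is the drop in Shannon entropy from prior to posterior. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical distribution of a list of attribute values."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def distribution_leakage(prior, posterior):
    """ASSUMED definition: Euclidean distance between prior and posterior."""
    support = set(prior) | set(posterior)
    return math.sqrt(
        sum((prior.get(v, 0.0) - posterior.get(v, 0.0)) ** 2 for v in support)
    )


def entropy_leakage(prior, posterior):
    """ASSUMED definition: entropy of the prior minus entropy of the posterior."""
    return entropy(prior) - entropy(posterior)


# Hypothetical toy data: a sensitive attribute over a whole table, and the
# values an adversary sees in the one group that contains the target record.
table = ["flu", "flu", "cancer", "cancer", "hiv", "hiv"]
group = ["flu", "flu", "cancer"]

prior = distribution(table)
posterior = distribution(group)
print("distribution leakage:", distribution_leakage(prior, posterior))
print("entropy leakage:", entropy_leakage(prior, posterior))

# A second hypothetical case: the posterior swaps the two probabilities,
# so its entropy equals that of the prior although the belief has changed.
prior_swap = {"a": 0.75, "b": 0.25}
posterior_swap = {"a": 0.25, "b": 0.75}
print("distribution leakage:", distribution_leakage(prior_swap, posterior_swap))
print("entropy leakage:", entropy_leakage(prior_swap, posterior_swap))
```

Under these assumed definitions, the second case has zero entropy leakage but a clearly nonzero distribution leakage, which is one way to see why a single metric may not capture privacy leakage.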