Guide to Data Privacy: Models, Technologies, Solutions

Data privacy technologies are essential for implementing information systems with privacy by design.

Privacy technologies are clearly needed to ensure not only that released data does not lead to disclosure, but also that statistics and even data-driven machine learning models do not. For example, can a deep-learning model be attacked to discover that sensitive data was used in its training? This accessible textbook presents privacy models, computational definitions of privacy, and methods to implement them. The book also explains, with plentiful examples, how to implement differential privacy, k-anonymity, and secure multiparty computation, among other models.

Topics and features:

  • Provides an integrated presentation of data privacy (including tools from statistical disclosure control, privacy-preserving data mining, and privacy for communications)
  • Discusses privacy requirements and tools for different types of scenarios, including privacy for data, for computations, and for users
  • Offers a characterization of privacy models, comparing their differences, advantages, and disadvantages
  • Describes some of the most relevant algorithms to implement privacy models
  • Includes examples of data protection mechanisms

This unique textbook/guide contains numerous examples and succinctly and comprehensively gathers the relevant information. As such, it will be eminently suitable for undergraduate and graduate students interested in data privacy, as well as professionals wanting a concise overview.

Vicenç Torra is Professor with the Department of Computing Science at Umeå University, Umeå, Sweden.

Author(s): Vicenç Torra
Series: Undergraduate Topics in Computer Science
Publisher: Springer
Year: 2022

Language: English
Pages: 322
City: Cham

Preface
Privacy Models
Organization
How to Use This Book
Acknowledgements
Contents
1 Introduction
1.1 Motivations for Data Privacy
1.1.1 Privacy, Security and Inference
1.2 Two Motivating Examples
1.2.1 Sharing a Database
1.2.2 Sharing a Computation
1.2.3 Privacy Leakages and Risk
1.3 Privacy and Society
1.4 Terminology
1.4.1 The Framework
1.4.2 Anonymity and Unlinkability
1.4.3 Disclosure
1.4.4 Dalenius' Definitions for Attribute and Identity Disclosure
1.4.5 Plausible Deniability
1.4.6 Undetectability and Unobservability
1.4.7 Pseudonyms and Identity
1.4.8 Transparency
1.5 Privacy and Disclosure
1.6 Privacy by Design
1.7 Bibliographical Notes
2 Machine and Statistical Learning, and Cryptography
2.1 Machine and Statistical Learning
2.2 Classification of Techniques
2.3 Supervised Learning
2.3.1 Classification
2.3.2 Regression
2.3.3 Validation of Results: k-fold Cross-validation
2.4 Unsupervised Learning
2.4.1 Clustering
2.4.2 Association Rules Mining
2.4.3 Expectation-Maximization Algorithm
2.5 Cryptography
2.5.1 Symmetric Cryptography
2.5.2 Public-key Cryptography
2.5.3 Homomorphic Encryption
2.6 Bibliographical Notes
3 Disclosure, Privacy Models, and Privacy Mechanisms
3.1 Disclosure: Definition and Controversies
3.1.1 A Boolean or Measurable Condition
3.1.2 Identity Disclosure
3.1.3 Attribute Disclosure
3.1.4 Attribute Disclosure in Clusters and Cells
3.1.5 Discussion
3.2 Measures for Attribute Disclosure
3.2.1 Attribute Disclosure for Numerical Data Releases
3.2.2 Attribute Disclosure for Categorical Data Releases
3.2.3 Model-Based Attribute Disclosure
3.2.4 Attribute Disclosure for Absent Attributes
3.2.5 Discussion on Attribute Disclosure
3.2.6 Attribute Disclosure Through Membership Inference Attacks
3.3 Measures for Identity Disclosure
3.3.1 Uniqueness
3.3.2 Re-Identification for Identity Disclosure
3.4 Privacy Models
3.4.1 Privacy from Re-Identification
3.4.2 k-Anonymity
3.4.3 k-Anonymity and Anonymity Sets: k-Confusion
3.4.4 k-Anonymity and Attribute Disclosure: Attacks and Privacy Models
3.4.5 k-Anonymity and Computational Anonymity
3.4.6 Differential Privacy
3.4.7 Local Differential Privacy
3.4.8 Integral Privacy
3.4.9 Homomorphic Encryption
3.4.10 Secure Multiparty Computation
3.4.11 Result Privacy
3.4.12 Privacy Models for Clusters and Cells
3.4.13 Discussion
3.5 Classification of Privacy Mechanisms
3.5.1 On Whose Privacy Is Being Sought
3.5.2 On the Computations to Be Done
3.5.3 On the Number of Databases
3.5.4 Knowledge Intensive Data Privacy
3.5.5 Other Dimensions and Discussion
3.6 Summary
3.7 Bibliographical Notes
4 Privacy for Users
4.1 User's Privacy in Communications
4.1.1 Protecting the Identity of the User
4.1.2 Protecting the Data of the User
4.2 User's Privacy in Information Retrieval
4.2.1 Protecting the Identity of the User
4.2.2 Protecting the Query of the User
4.2.3 Private Information Retrieval
4.3 Other Contexts
4.4 Bibliographical Notes
5 Privacy for Computations, Functions, and Queries
5.1 Differential Privacy Mechanisms
5.1.1 Differential Privacy Mechanisms for Numerical Data
5.1.2 Composition Theorems
5.1.3 Differential Privacy Mechanisms for Categorical Data
5.1.4 Properties of Differential Privacy
5.1.5 Machine Learning
5.1.6 Concluding Remarks
5.2 Secure Multiparty Computation Protocols
5.2.1 Assumptions on Data and on Adversaries
5.2.2 Computing a Distributed Sum
5.2.3 Secure Multiparty Computation and Inferences
5.2.4 Computing the Exclusive OR Function
5.2.5 Secure Multiparty Computation for Other Functions
5.3 Bibliographical Notes
6 Privacy for Data: Masking Methods
6.1 Perturbative Methods
6.1.1 Data and Rank Swapping
6.1.2 Microaggregation
6.1.3 Additive and Multiplicative Noise
6.1.4 PRAM: Post-Randomization Method
6.1.5 Lossy Compression and Other Transform-Based Methods: De-Noising Data
6.2 Non-perturbative Methods
6.2.1 Generalization and Recoding
6.2.2 Suppression
6.3 Synthetic Data Generators
6.3.1 Synthetic Data Generators and Generative Adversarial Networks
6.3.2 Table-GANs
6.4 Masking Methods and k-Anonymity
6.4.1 Mondrian
6.4.2 Microaggregation and Generalization
6.4.3 Algorithms for k-Anonymity: Variants and Big Data
6.5 Data Protection Procedures for Constrained Data
6.5.1 Types of Constraints
6.6 Masking Methods and Big Data
6.7 Bibliographical Notes
7 Selection of a Data Protection Mechanism: Information Loss and Risk
7.1 Information Loss: Evaluation and Measures
7.1.1 Generic Versus Specific Information Loss
7.1.2 Information Loss Measures
7.1.3 Generic Information Loss Measures
7.1.4 Specific Information Loss
7.1.5 Information Loss and Big Data
7.2 Selection of Masking Methods
7.2.1 Aggregation: A Score
7.2.2 Visualization: R-U Maps
7.2.3 Optimization and Post-Masking
7.3 Machine Learning
7.4 Privacy in Federated Learning
7.5 Bibliographical Notes
8 Other Data-Driven Mechanisms
8.1 Result-driven Approaches
8.2 Tabular Data
8.2.1 Sensitivity Rules
8.2.2 Tabular Data Protection
8.2.3 Cell Suppression
8.2.4 Controlled Tabular Adjustment
8.3 Bibliographical Notes
9 Conclusions
9.1 Guidelines
A Matching and Integration: Record Linkage for Identity Disclosure Risk
A.1 Heterogeneous Distributed Databases
A.1.1 Data Integration
A.1.2 Schema Matching
A.1.3 Data Matching
A.1.4 Preprocessing
A.1.5 Indexing and Blocking
A.1.6 Record Pair Comparison: Distances and Similarities
A.1.7 Classification of Record Pairs
A.2 Probabilistic Record Linkage
A.3 Distance-Based Record Linkage
A.3.1 Weighted Distances
A.3.2 Distance and Normalization
A.3.3 Parameter Determination for Record Linkage
A.4 Record Linkage Without Common Attributes
A.5 Comparison of Record Linkage Algorithms
A.6 Bibliographical Notes
References
Index