Udacity Course

Differential Privacy

Types of DP

Local DP

Coin flip jaywalking example

Each person is now protected with plausible deniability

If we collect many samples and 60% answer yes, we can estimate the true rate as 70%: the observed 60% is the average of the true rate and the coin's 50% bias, so true = 2 × 60% − 50% = 70%.

NB: This privacy technique comes at the cost of accuracy, especially when we only have a few samples. The greater the privacy protection (plausible deniability), the less accurate the results.
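The coin-flip (randomized response) mechanism and the de-skewing step can be sketched as follows. This is a minimal simulation, not the course's code; the 70% true jaywalking rate and the function names are assumptions for illustration.

```python
import random

def randomized_response(truth: bool) -> bool:
    # First coin flip: heads -> answer honestly
    if random.random() < 0.5:
        return truth
    # Tails -> answer according to a second, unbiased coin flip
    return random.random() < 0.5

def estimate_true_rate(answers) -> float:
    # observed = 0.5 * true + 0.5 * 0.5, so true = 2 * observed - 0.5
    observed = sum(answers) / len(answers)
    return 2 * observed - 0.5

# Simulate a population where 70% truly jaywalk (assumed rate)
random.seed(0)
population = [random.random() < 0.7 for _ in range(10_000)]
answers = [randomized_response(p) for p in population]
estimate = estimate_true_rate(answers)  # close to 0.7 with many samples
```

Rerunning with a small population (say 100 people) shows the accuracy cost: the estimate can miss the true rate badly, matching the NB above.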


Types of Noise

How much noise to add?

Laplacian Noise
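In the Laplace mechanism, the amount of noise is set by the query's sensitivity divided by the privacy budget epsilon: more sensitivity or a smaller epsilon (stronger privacy) means more noise. A minimal sketch, with the counting query and epsilon value chosen for illustration:

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    # Noise scale b = sensitivity / epsilon
    b = sensitivity / epsilon
    return true_answer + np.random.laplace(loc=0.0, scale=b)

# A counting query ("how many people jaywalk?") has sensitivity 1:
# removing any one person changes the count by at most 1
np.random.seed(0)
db = np.random.rand(100) < 0.7  # assumed toy database
noisy_count = laplace_mechanism(float(db.sum()), sensitivity=1.0, epsilon=0.5)
```

The noisy count stays close to the true count in expectation, while any single person's presence or absence is masked by the noise.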


Perfect Privacy (AI model)

Training a model on a dataset should return the same model even if we remove any single person from the training set.

Training a model is kind of like querying a database
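If training is like querying a database, the database-side idea of sensitivity carries over: how much can the output change when any one person is removed? A small sketch of empirically measuring sensitivity (the brute-force remove-one-row approach; function names are my own):

```python
import numpy as np

def sensitivity(query, db: np.ndarray) -> float:
    # Maximum change in the query's output when any single person is removed
    full = query(db)
    return max(abs(full - query(np.delete(db, i))) for i in range(len(db)))

db = np.array([0, 1, 1, 0, 1])
sum_sens = sensitivity(np.sum, db)    # 1.0: a count changes by at most 1
mean_sens = sensitivity(np.mean, db)  # 0.15: removing a 0 shifts the mean most
```

A query with low sensitivity needs less noise for the same privacy guarantee, which is exactly why the amount of Laplacian noise is tied to sensitivity.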

Two points of complexity


Hospital Scenario

Steps

  1. Ask each hospital to train a model on their own dataset (10 models generated)
  2. Use each model to predict on your own local dataset, generating 10 labels for each datapoint
  3. Perform a DP query to generate the final (DP) label for each datapoint: count the votes for each label across the 10 predictions, add Laplacian noise to the counts, then take the max (most frequent) noisy label
  4. Retrain a new model on your local dataset which now has DP labels
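Step 3 (the noisy-max vote over the hospitals' predictions) can be sketched as below. This assumes the teacher predictions for one datapoint are collected into an array; the label set, epsilon, and function name are illustrative assumptions.

```python
import numpy as np

def noisy_max_label(teacher_preds: np.ndarray, num_labels: int, epsilon: float) -> int:
    # teacher_preds: one predicted label per teacher (hospital) model
    counts = np.bincount(teacher_preds, minlength=num_labels).astype(float)
    # Add Laplacian noise to each vote count before taking the argmax,
    # so the winning label is differentially private
    counts += np.random.laplace(loc=0.0, scale=1.0 / epsilon, size=num_labels)
    return int(np.argmax(counts))

# 10 teacher models vote on one datapoint (votes are made up for illustration)
np.random.seed(0)
votes = np.array([2, 2, 2, 2, 2, 2, 1, 1, 0, 2])
dp_label = noisy_max_label(votes, num_labels=3, epsilon=1.0)
```

The student model in step 4 is then trained only on these noisy labels, so it never touches the hospitals' raw data.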