A recent DeepMind paper on the ethical and social risks of large language models identified the leakage of sensitive information from training data as a potential risk that organizations working on these models have a responsibility to mitigate. Another recent paper showed that similar privacy risks can also arise in standard image classification models: a fingerprint of each individual training image can be found embedded in the model parameters, and malicious parties could exploit these fingerprints to reconstruct the training data from the model.
Privacy-enhancing technologies, such as differential privacy (DP), can be deployed at training time to mitigate these risks, but they often result in a significant reduction in model performance. In this work, we make substantial progress toward high-accuracy training of image classification models under differential privacy.
Figure 1: (left) Illustration of a training data leak in GPT-2 (credit: Carlini et al., “Extracting Training Data from Large Language Models”, 2021). (right) CIFAR-10 training examples reconstructed from a 100,000-parameter convolutional neural network (credit: Balle et al., “Reconstructing Training Data with Informed Adversaries”, 2022).
Differential privacy was proposed as a mathematical framework to capture the requirement of protecting individual records during statistical data analysis (including the training of machine learning models). DP algorithms protect individuals from any inference about the characteristics that make them unique (including full or partial reconstruction) by injecting carefully calibrated noise when computing the desired statistic or model. DP algorithms provide strong and rigorous privacy guarantees, both in theory and in practice, and have become a de facto standard adopted by a number of public and private organizations.
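For reference, the standard (ε, δ) formulation (a textbook definition, not specific to this work): a randomized mechanism M satisfies (ε, δ)-differential privacy if, for every pair of datasets D and D′ differing in a single record, and every set S of possible outputs,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta.$$

Smaller values of ε and δ correspond to stronger privacy: the output distribution of the mechanism is almost unchanged by the presence or absence of any one individual.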
The most popular DP algorithm for deep learning is differentially private stochastic gradient descent (DP-SGD), a modification of standard SGD achieved by clipping the gradients of individual examples and adding enough noise to mask the contribution of any individual to each update of the model:
Figure 2: Illustration of how DP-SGD processes individual example gradients and adds noise to produce model updates with privatized gradients.
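To make this concrete, here is a minimal JAX sketch of a single DP-SGD update for a toy linear model. The function names and hyperparameter values are illustrative choices of ours, not taken from the open-source release accompanying the paper.

import jax
import jax.numpy as jnp

def loss_fn(params, example):
    # Illustrative per-example loss: squared error of a linear model.
    x, y = example
    return (jnp.dot(x, params) - y) ** 2

def dp_sgd_step(params, batch, key, clip_norm=1.0,
                noise_multiplier=1.0, learning_rate=0.1):
    # One DP-SGD update: clip each example's gradient to clip_norm,
    # sum the clipped gradients, add calibrated Gaussian noise, average.
    def clipped_grad(example):
        g = jax.grad(loss_fn)(params, example)
        norm = jnp.linalg.norm(g)
        # Scale down so no single example contributes more than clip_norm.
        return g * jnp.minimum(1.0, clip_norm / (norm + 1e-12))

    # vmap computes per-example gradients for the whole batch at once.
    per_example = jax.vmap(clipped_grad)(batch)
    summed = per_example.sum(axis=0)

    # Gaussian noise proportional to clip_norm masks the contribution
    # of any individual example to the update.
    noise = noise_multiplier * clip_norm * jax.random.normal(key, summed.shape)
    noisy_mean = (summed + noise) / per_example.shape[0]
    return params - learning_rate * noisy_mean

# Toy usage: a batch of 32 examples with 10 features each.
xs = jax.random.normal(jax.random.PRNGKey(0), (32, 10))
ys = jnp.ones(32)
params = jnp.zeros(10)
params = dp_sgd_step(params, (xs, ys), jax.random.PRNGKey(1))

Note that this sketch only shows the gradient processing; in practice, the overall privacy guarantee also depends on how batches are sampled and on the total number of training steps, which is tracked with a privacy accountant.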
Unfortunately, previous work has shown that in practice, the privacy protection provided by DP-SGD often comes at the cost of significantly less accurate models, posing a major obstacle to the widespread adoption of differential privacy in the machine learning community. Empirical evidence from prior work also suggests that this utility degradation becomes more severe for larger neural network models, including the ones regularly used to achieve the best performance on challenging image classification benchmarks.
Our work investigates this phenomenon and proposes a series of simple modifications to both the training procedure and the model architecture, providing a significant improvement in the accuracy of DP training on standard image classification benchmarks. The most striking observation from our research is that DP-SGD can be used to efficiently train much deeper models than previously thought, provided we ensure that the model gradients are well behaved. We believe that the substantial performance increase achieved by our research has the potential to unlock practical applications of image classification models trained with formal privacy guarantees.
The figure below summarizes two of our main results: an improvement of approximately 10% on CIFAR-10 compared with previous work when training privately without additional data, and a top-1 accuracy of 86.7% on ImageNet when privately fine-tuning a model pre-trained on a different dataset, almost closing the gap with the best non-private performance.
Figure 3: (left) Our best results on training WideResNet models on CIFAR-10 without additional data. (right) Our best results on fine-tuning NFNet models on ImageNet. The best performing model was pre-trained on a disjoint internal ImageNet dataset.
These results are obtained at ε = 8, a standard setting for calibrating the strength of the protection offered by differential privacy in machine learning applications. We refer to the paper for a discussion of this parameter, as well as additional experimental results at other values of ε and on other datasets. Together with the paper, we are also open-sourcing our implementation so that other researchers can verify our results and build on them. We hope this contribution will help others interested in making practical DP training a reality.
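As a rough intuition for this setting (our gloss, not the paper's discussion): plugging ε = 8 into the definition above bounds how much the inclusion of any single training example can shift the probability of any outcome of the training algorithm:

$$\Pr[M(D) \in S] \;\le\; e^{8}\,\Pr[M(D') \in S] + \delta, \qquad e^{8} \approx 2981.$$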
Download our JAX implementation on GitHub.