You are What You Write: Preserving Privacy in the Era of Pre-Trained Language Models
61 Pages. Posted: 13 Apr 2023
Abstract
Large-scale adoption of pre-trained language models has introduced a new era of convenient knowledge transfer for a slew of natural language processing tasks. However, these models risk undermining user trust, since they may enable malicious users to expose personally identifying information about subjects in other datasets through re-identification attacks. We present an empirical investigation into the extent of the personal information that can be extracted from pre-trained representations produced by popular models, and we show a positive correlation between the complexity of a model, the amount of data used in pre-training, and data leakage. In this paper, we present the first wide-coverage evaluation and comparison of state-of-the-art privacy-preserving algorithms on a large, multilingual sentiment-analysis dataset annotated with demographic information (location, age, and gender). We provide evidence that privacy-preserving methods are more effective when applied to larger and more complex models, with improvements of >20% over non-private baselines. We also find that local differential privacy imposes serious performance penalties of ~20% in our test setting, which can be ameliorated using hybrid or metric-DP techniques.
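The local differential privacy setting mentioned above can be illustrated with the classic Laplace mechanism applied per coordinate of a text representation before it leaves the user's device. The function name, sensitivity bound, and embedding values below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def privatize_embedding(vec, epsilon, sensitivity=1.0, rng=None):
    """Illustrative local-DP step: add Laplace noise with scale
    sensitivity/epsilon to each coordinate of a representation vector.
    Assumes each coordinate's contribution is bounded by `sensitivity`
    (e.g. via clipping) so the epsilon guarantee holds."""
    if rng is None:
        rng = np.random.default_rng(0)
    scale = sensitivity / epsilon
    return vec + rng.laplace(loc=0.0, scale=scale, size=vec.shape)

# Hypothetical 4-dimensional sentence embedding, clipped to [-1, 1].
emb = np.array([0.4, -0.2, 0.1, 0.7])
noisy = privatize_embedding(emb, epsilon=1.0)
```

Smaller `epsilon` values give stronger privacy but larger noise, which is consistent with the utility penalty the abstract reports for local DP.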
Keywords: language models, privacy preservation, differential privacy, adversarial learning, re-identification attacks