C4DT Distinguished Lecture: Hidden Backdoors in Deep Learning Systems
By Ben Zhao, UChicago
The lack of transparency in today’s deep learning systems has paved the way for a new class of threats, commonly referred to as backdoor or Trojan attacks. In a backdoor attack, a malicious party corrupts a deep learning model (either at initial training time or later) to embed hidden classification rules that do not interfere with normal classification unless an unusual “trigger” is applied to the input, which then produces unusual (and likely incorrect) results. For example, a facial recognition model with a backdoor might recognize anyone wearing a pink earring as Elon Musk. Backdoor attacks have been validated in a number of image classification applications, and they are difficult to detect given the black-box nature of most DNN models.
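To make the mechanism concrete, the sketch below shows the standard data-poisoning recipe by which such a backdoor is typically planted. It is a minimal illustration under my own assumptions, not code from the talk or the papers: the trigger pattern (a small white square), its bottom-right placement, the poisoning rate, and the target label are all hypothetical choices.

```python
# Minimal sketch of training-time backdoor poisoning (illustrative only):
# stamp a small trigger patch onto a fraction of training images and
# rewrite their labels to the attacker's target class.
import torch

TARGET_LABEL = 0               # hypothetical attacker-chosen class
TRIGGER = torch.ones(3, 4, 4)  # hypothetical 4x4 white-square trigger

def poison(images: torch.Tensor, labels: torch.Tensor, rate: float = 0.1):
    """Stamp TRIGGER onto a random `rate` fraction of a batch and relabel
    those samples as TARGET_LABEL. images has shape (N, 3, H, W)."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(rate * len(images))
    idx = torch.randperm(len(images))[:n_poison]
    # Place the trigger in the bottom-right corner of each chosen image.
    images[idx, :, -4:, -4:] = TRIGGER
    labels[idx] = TARGET_LABEL
    return images, labels

# A model trained on this data behaves normally on clean inputs, but any
# input carrying the trigger patch is classified as TARGET_LABEL.
batch = torch.rand(32, 3, 32, 32)
targets = torch.randint(0, 10, (32,))
poisoned_batch, poisoned_targets = poison(batch, targets)
```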
In this talk, I will describe two recent results on detecting and understanding backdoor attacks on deep learning systems. I will first present Neural Cleanse (S&P 2019), the first robust tool to detect a wide range of backdoors in deep learning models. It uses inter-label perturbation distances to detect when a backdoor trigger has created a shortcut to misclassification into a particular label. Second, I will describe our new work on Latent Backdoors (CCS 2019), a stronger type of backdoor attack that is more difficult to detect and survives the retraining step of commonly used transfer learning systems. Our experiments show that latent backdoors can be quite robust and stealthy, even against the latest detection tools (including Neural Cleanse). There are no known techniques to detect latent backdoors, but we present alternative techniques to defend against them via disruption.
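The core intuition behind Neural Cleanse can be sketched in a few lines: for every candidate target label, reverse-engineer the smallest trigger (a mask plus a pattern) that forces misclassification into that label; an infected label needs an anomalously small perturbation, which an outlier test can flag. The code below is my simplified approximation of that idea, not the authors' implementation: the optimizer settings, the L1 weight `lam`, the step count, and the anomaly threshold are illustrative assumptions, and the helper names (`reverse_trigger`, `flag_infected`) are mine.

```python
# Simplified sketch of the Neural Cleanse idea (illustrative, not the
# paper's exact code): per-label trigger reverse-engineering followed by
# median-absolute-deviation (MAD) outlier detection.
import torch
import torch.nn.functional as F

def reverse_trigger(model, images, label, steps=500, lam=0.01):
    """Optimize a (mask, pattern) trigger that maps `images` to `label`;
    return the L1 size of the resulting mask."""
    mask = torch.zeros(1, 1, *images.shape[2:], requires_grad=True)
    pattern = torch.zeros(1, *images.shape[1:], requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=0.1)
    target = torch.full((len(images),), label, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)  # keep the mask in [0, 1]
        stamped = (1 - m) * images + m * torch.sigmoid(pattern)
        # Force the target label while penalizing the trigger's footprint.
        loss = F.cross_entropy(model(stamped), target) + lam * m.abs().sum()
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(mask).abs().sum().item()

def flag_infected(model, images, num_labels, threshold=2.0):
    """Flag labels whose reverse-engineered trigger is anomalously small."""
    sizes = torch.tensor([reverse_trigger(model, images, l)
                          for l in range(num_labels)])
    med = sizes.median()
    mad = (sizes - med).abs().median() * 1.4826  # MAD consistency constant
    anomaly = (med - sizes) / (mad + 1e-8)       # large => suspicious shortcut
    return [l for l in range(num_labels) if anomaly[l] > threshold]
```

Latent backdoors evade this style of analysis because the trigger's shortcut lives in intermediate representations that are inherited during transfer learning, so scanning the final classification layers of the student model does not reveal it.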