Robustness and Generalization via Generative Adversarial Training


While deep neural networks have achieved remarkable success in various computer vision tasks, they often fail to generalize to subtle variations of input images. Several defenses have been proposed to improve the robustness against these variations. However, current defenses can only withstand the specific attack used in training, and the models often remain vulnerable to other input variations. Moreover, these methods often degrade performance of the model on clean images. In this paper, we present Generative Adversarial Training, an approach to simultaneously improve the model’s generalization and robustness to unseen adversarial attacks. Instead of altering a single pre-defined aspect of images, we generate a spectrum of low-level, mid-level and high-level changes using generative models with a disentangled latent space. Adversarial training with these examples enable the model to withstand a wide range of attacks by observing a variety of input alterations during training. We show that our approach not only improves performance of the model on clean images but also makes it robust against unforeseen attacks and outperforms prior work. We validate effectiveness of our method by demonstrating results on various tasks such as classification, semantic segmentation and object detection.