Not So Robust after All: Evaluating the Robustness of Deep Neural Networks to Unseen Adversarial Attacks

Garaev, Roman; Rasheed, Bader; Khan, Adil Mehmood

Abstract

Deep neural networks (DNNs) have gained prominence in various applications, but remain vulnerable to adversarial attacks that manipulate data to mislead a DNN. This paper aims to challenge the efficacy and transferability of two contemporary defense mechanisms against adversarial attacks: (a) robust training and (b) adversarial training. The former suggests that training a DNN on a data set consisting solely of robust features should produce a model resistant to adversarial attacks. The latter creates an adversarially trained model that learns to minimise an expected training loss over a distribution of bounded adversarial perturbations. We reveal a significant lack of transferability in these defense mechanisms and provide insight into the potential dangers posed by L∞-norm attacks previously underestimated by the research community. Such conclusions are based on extensive experiments involving (1) different model architectures, (2) the use of canonical correlation analysis, (3) visual and quantitative analysis of the neural network's latent representations, (4) an analysis of networks' decision boundaries and (5) the use of equivalence of L2 and L∞ perturbation norm theories.
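The L∞-bounded perturbations discussed above can be illustrated with a minimal FGSM-style sketch. This is not the paper's experimental setup: the toy logistic-regression "model" and all variable names below are assumptions chosen so the example stays self-contained, but the sign-of-gradient step and the L∞ budget `eps` are the standard construction.

```python
import numpy as np

# Minimal FGSM-style L-infinity attack on a toy logistic-regression model.
# The model, data, and names here are illustrative, not the paper's setup.

rng = np.random.default_rng(0)
w = rng.normal(size=8)   # fixed "model" weights
x = rng.normal(size=8)   # clean input
y = 1.0                  # true label

def loss_grad_x(w, x, y):
    # Gradient of the binary cross-entropy loss w.r.t. the input x:
    # dL/dx = (sigmoid(w.x) - y) * w
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w

eps = 0.1  # L-infinity perturbation budget
x_adv = x + eps * np.sign(loss_grad_x(w, x, y))

# Every coordinate moves by at most eps, so the L-infinity bound holds exactly.
assert np.max(np.abs(x_adv - x)) <= eps + 1e-12
```

For this linear model the single sign step provably lowers the predicted probability of the true class; adversarial training, as described in the abstract, minimises the expected loss over exactly such eps-bounded perturbations.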

Citation

Garaev, R., Rasheed, B., & Khan, A. M. (2024). Not So Robust after All: Evaluating the Robustness of Deep Neural Networks to Unseen Adversarial Attacks. Algorithms, 17, Article 162. https://doi.org/10.3390/a17040162

Journal Article Type Article
Acceptance Date Apr 15, 2024
Online Publication Date Apr 19, 2024
Publication Date 2024
Deposit Date Apr 19, 2024
Publicly Available Date Apr 22, 2024
Journal Algorithms
Print ISSN 1999-4893
Electronic ISSN 1999-4893
Publisher MDPI
Peer Reviewed Peer Reviewed
Volume 17
Article Number 162
DOI https://doi.org/10.3390/a17040162
Keywords machine learning; deep learning; adversarial attacks
Public URL https://hull-repository.worktribe.com/output/4627382
