While machine learning (ML) algorithms have achieved remarkable performance in many applications, recent studies have demonstrated their lack of robustness against adversarial perturbations. This lack of robustness raises security concerns when ML models are deployed in real-world applications such as self-driving cars, robotic control, and healthcare systems.
Adversarial Robustness for Machine Learning summarizes recent progress on this topic and introduces popular algorithms for adversarial attack, defense, and verification. The core chapters cover adversarial attacks, verification, and defenses, focusing mainly on image classification, the standard benchmark in the adversarial robustness community. Additional chapters discuss adversarial examples beyond image classification, threat models beyond test-time attacks, and applications of adversarial robustness. For researchers, the book provides a thorough literature review of the latest progress in the area and can serve as a good reference for entering the field. It can also be used as a textbook for graduate courses on adversarial robustness or trustworthy machine learning.
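To make "adversarial perturbation" concrete, below is a minimal sketch of the Fast Gradient Sign Method (FGSM), one of the test-time attacks introduced in Chapter 2. This is an illustrative example, not code from the book; the model, the input/label tensors, and the epsilon budget are placeholder assumptions.

```python
# Minimal FGSM sketch (illustrative; `model`, `x`, `y`, and `epsilon` are placeholders).
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """One gradient-sign step of size epsilon under an L-infinity budget."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # attack objective: increase the loss
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()   # perturb each pixel toward higher loss
    return x_adv.clamp(0.0, 1.0).detach() # keep a valid image in [0, 1]
```

Chapter 2 presents FGSM and its iterative variant PGD as steepest-descent-style solutions to a constrained attack objective; the single step above is the simplest instance of that formulation.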
Author(s): Pin-Yu Chen, Cho-Jui Hsieh
Publisher: Academic Press
Year: 2022
Language: English
Pages: 275
City: London
Contents
Preface
References
Biography
Dr. Pin-Yu Chen (1986–present)
Dr. Cho-Jui Hsieh (1985–present)
1 Background and motivation
1.1 What is adversarial machine learning?
1.2 Mathematical notations
1.3 Machine learning basics
1.4 Motivating examples
Adversarial robustness ≠ accuracy – what standard accuracy fails to tell
Fast adaptation of adversarial robustness evaluation assets for emerging machine learning models
1.5 Practical examples of AI vulnerabilities
1.6 Open-source Python libraries for adversarial robustness
2 White-box adversarial attacks
2.1 Attack procedure and notations
2.2 Formulating attack as constrained optimization
2.3 Steepest descent, FGSM and PGD attack
2.4 Transforming to an unconstrained optimization problem
2.5 Another way to define attack objective
2.6 Attacks with different ℓp norms
2.7 Universal attack
2.8 Adaptive white-box attack
2.9 Empirical comparison
2.10 Extended reading
3 Black-box adversarial attacks
3.1 Evasion attack taxonomy
3.2 Soft-label black-box attack
3.3 Hard-label black-box attack
3.4 Transfer attack
3.5 Attack dimension reduction
3.6 Empirical comparison
3.7 Proof of Theorem 1
3.8 Extended reading
4 Physical adversarial attacks
4.1 Physical adversarial attack formulation
4.2 Examples of physical adversarial attacks
4.3 Empirical comparison
4.4 Extended reading
5 Training-time adversarial attacks
5.1 Poisoning attack
5.2 Backdoor attack
5.3 Empirical comparison
5.4 Case study: distributed backdoor attacks on federated learning
5.5 Extended reading
6 Adversarial attacks beyond image classification
6.1 Data modality and task objectives
6.2 Audio adversarial example
6.3 Feature identification
6.4 Graph neural network
6.5 Natural language processing
Sentence classification
Sequence-to-sequence translation
6.6 Deep reinforcement learning
6.7 Image captioning
6.8 Weight perturbation
6.9 Extended reading
7 Overview of neural network verification
7.1 Robustness verification versus adversarial attack
7.2 Formulations of robustness verification
7.3 Applications of neural network verification
Safety-critical control systems
Natural language processing
Machine learning interpretability
7.4 Extended reading
8 Incomplete neural network verification
8.1 A convex relaxation framework
8.2 Linear bound propagation methods
The optimal layerwise convex relaxation
8.3 Convex relaxation in the dual space
8.4 Recent progress in linear relaxation-based methods
8.5 Extended reading
9 Complete neural network verification
9.1 Mixed integer programming
9.2 Branch and bound
9.3 Branch-and-bound with linear bound propagation
9.4 Empirical comparison
10 Verification against semantic perturbations
10.1 Semantic adversarial example
10.2 Semantic perturbation layer
10.3 Input space refinement for Semantify-NN
10.4 Empirical comparison
11 Overview of adversarial defense
11.1 Empirical defense versus certified defense
11.2 Overview of empirical defenses
12 Adversarial training
12.1 Formulating adversarial training as bilevel optimization
12.2 Faster adversarial training
12.3 Improvements on adversarial training
12.4 Extended reading
13 Randomization-based defense
13.1 Earlier attempts and the EoT attack
13.2 Adding randomness to each layer
13.3 Certified defense with randomized smoothing
13.4 Extended reading
14 Certified robustness training
14.1 A framework for certified robust training
14.2 Existing algorithms and their performances
Interval bound propagation (IBP)
Linear relaxation-based training
14.3 Empirical comparison
14.4 Extended reading
15 Adversary detection
15.1 Detecting adversarial inputs
15.2 Detecting adversarial audio inputs
15.3 Detecting Trojan models
15.4 Extended reading
16 Adversarial robustness of models beyond neural networks
16.1 Evaluating the robustness of K-nearest-neighbor models
A primal-dual quadratic programming formulation
Dual quadratic programming problems
Robustness verification for 1-NN models
Efficient algorithms for computing 1-NN robustness
Extending beyond 1-NN
Robustness of K-nearest-neighbor versus neural network models on simple problems
16.2 Defenses with nearest-neighbor classifiers
16.3 Evaluating the robustness of decision tree ensembles
Robustness of a single decision tree
Robustness of ensemble decision stumps
Robustness of ensemble decision trees
Training robust tree ensembles
17 Adversarial robustness in meta-learning and contrastive learning
17.1 Fast adversarial robustness adaptation in model-agnostic meta-learning
When and how to incorporate robust regularization in MAML?
17.2 Adversarial robustness preservation for contrastive learning: from pretraining to finetuning
18 Model reprogramming
18.1 Reprogramming voice models for time series classification
18.2 Reprogramming general image models for medical image classification
18.3 Theoretical justification of model reprogramming
18.4 Proofs
18.5 Extended reading
19 Contrastive explanations
19.1 Contrastive explanations method
19.2 Contrastive explanations with monotonic attribute functions
19.3 Empirical comparison
19.4 Extended reading
20 Model watermarking and fingerprinting
20.1 Model watermarking
20.2 Model fingerprinting
20.3 Empirical comparison
20.4 Extended reading
21 Data augmentation for unsupervised machine learning
21.1 Adversarial examples for unsupervised machine learning models
21.2 Empirical comparison
Index