Build smart cybersecurity systems with the power of machine learning and deep learning to protect your corporate assets
Key Features
• Identify and predict security threats using artificial intelligence
• Develop intelligent systems that can detect unusual and suspicious patterns and attacks
• Learn how to test the effectiveness of your AI cybersecurity algorithms and tools
Book Description
Today's organizations spend billions of dollars globally on cybersecurity. Artificial intelligence has emerged as an effective way to build smarter and safer security systems that predict and detect suspicious network activity, such as phishing or unauthorized intrusions.
This cybersecurity book presents and demonstrates popular and successful AI approaches and models that you can adapt to detect potential attacks and protect your corporate systems. You'll learn about the role of machine learning, neural networks, and deep learning in cybersecurity, and how to build AI capabilities into smart defensive mechanisms. As you advance, you'll be able to apply these strategies across a variety of applications, including spam filters, network intrusion detection, botnet detection, and secure authentication.
By the end of this book, you'll be ready to develop intelligent systems that can detect unusual and suspicious patterns and attacks, and to use them to build strong network security defenses with AI.
What you will learn
• Detect email threats such as spamming and phishing using AI
• Categorize APT, zero-days, and polymorphic malware samples
• Overcome antivirus limits in threat detection
• Predict network intrusions and detect anomalies with machine learning
• Verify the strength of biometric authentication procedures with deep learning
• Evaluate cybersecurity strategies and learn how you can improve them
Who this book is for
If you're a cybersecurity professional or ethical hacker who wants to build intelligent systems using the power of machine learning and AI, you'll find this book useful. Familiarity with cybersecurity concepts and knowledge of Python programming are essential to get the most out of this book.
Author(s): Alessandro Parisi
Edition: 1
Publisher: Packt Publishing
Year: 2019
Language: English
Commentary: Vector PDF
Pages: 342
City: Birmingham, UK
Tags: Machine Learning; Unsupervised Learning; Reinforcement Learning; Anomaly Detection; Cybersecurity; Supervised Learning; Python; Malware Detection; Generative Adversarial Networks; IBM Watson; Feature Engineering; Keras; TensorFlow; Best Practices; scikit-learn; Spam Detection; Network Intrusion Detection; Malware Analysis; NumPy; matplotlib; Jupyter; PyTorch; Seaborn; Fraud Detection; Anaconda; Phishing; Detection Avoiding; Naïve Bayes
Cover
Title Page
Copyright and Credits
About Packt
Contributors
Table of Contents
Preface
Section 1: AI Core Concepts and Tools of the Trade
Chapter 1: Introduction to AI for Cybersecurity Professionals
Applying AI in cybersecurity
Evolution in AI: from expert systems to data mining
A brief introduction to expert systems
Reflecting the indeterministic nature of reality
Going beyond statistics toward machine learning
Mining data for models
Types of machine learning
Supervised learning
Unsupervised learning
Reinforcement learning
Algorithm training and optimization
How to find useful sources of data
Quantity versus quality
Getting to know Python's libraries
Supervised learning example – linear regression
Unsupervised learning example – clustering
Simple NN example – perceptron
AI in the context of cybersecurity
Summary
Chapter 2: Setting Up Your AI for Cybersecurity Arsenal
Getting to know Python for AI and cybersecurity
Python libraries for AI
NumPy as an AI building block
NumPy multidimensional arrays
Matrix operations with NumPy
Implementing a simple predictor with NumPy
Scikit-learn
Matplotlib and Seaborn
Pandas
Python libraries for cybersecurity
Pefile
Volatility
Installing Python libraries
Enter Anaconda – the data scientist's environment of choice
Anaconda Python advantages
Conda utility
Installing packages in Anaconda
Creating custom environments
Some useful Conda commands
Python on steroids with parallel GPU
Playing with Jupyter Notebooks
Our first Jupyter Notebook
Exploring the Jupyter interface
What's in a cell?
Useful keyboard shortcuts
Choose your notebook kernel
Getting your hands dirty
Installing DL libraries
Deep learning pros and cons for cybersecurity
TensorFlow
Keras
PyTorch
PyTorch versus TensorFlow
Summary
Section 2: Detecting Cybersecurity Threats with AI
Chapter 3: Ham or Spam? Detecting Email Cybersecurity Threats with AI
Detecting spam with Perceptrons
Meet NNs at their purest – the Perceptron
It's all about finding the right weight!
Spam filters in a nutshell
Spam filters in action
Detecting spam with linear classifiers
How the Perceptron learns
A simple Perceptron-based spam filter
Pros and cons of Perceptrons
Spam detection with SVMs
SVM optimization strategy
SVM spam filter example
Image spam detection with SVMs
How did SVM come into existence?
Phishing detection with logistic regression and decision trees
Regression models
Introducing linear regression models
Linear regression with scikit-learn
Linear regression – pros and cons
Logistic regression
A phishing detector with logistic regression
Logistic regression pros and cons
Making decisions with trees
Decision trees rationales
Phishing detection with decision trees
Decision trees – pros and cons
Spam detection with Naive Bayes
Advantages of Naive Bayes for spam detection
Why Naive Bayes?
NLP to the rescue
NLP steps
A Bayesian spam detector with NLTK
Summary
Chapter 4: Malware Threat Detection
Malware analysis at a glance
Artificial intelligence for malware detection
Malware goes by many names
Malware analysis tools of the trade
Malware detection strategies
Static malware analysis
Static analysis methodology
Difficulties of static malware analysis
How to perform static analysis
Hardware requirements for static analysis
Dynamic malware analysis
Anti-analysis tricks
Getting malware samples
Hacking the PE file format
The PE file format as a potential vector of infection
Overview of the PE file format
The DOS header and DOS stub
The PE header structure
The data directory
Import and export tables
Extracting malware artifacts in a dataset
Telling different malware families apart
Understanding clustering algorithms
From distances to clusters
Clustering algorithms
Evaluating clustering with the Silhouette coefficient
K-Means in depth
K-Means steps
K-Means pros and cons
Clustering malware with K-Means
Decision tree malware detectors
Decision trees classification strategy
Detecting malwares with decision trees
Decision trees on steroids – random forests
Random Forest Malware Classifier
Detecting metamorphic malware with HMMs
How malware circumvents detection?
Polymorphic malware detection strategies
HMM fundamentals
HMM example
Advanced malware detection with deep learning
NNs in a nutshell
CNNs
From images to malware
Why should we use images for malware detection?
Detecting malware from images with CNNs
Summary
Chapter 5: Network Anomaly Detection with AI
Network anomaly detection techniques
Anomaly detection rationales
Intrusion Detection Systems
Host Intrusion Detection Systems
Network Intrusion Detection Systems
Anomaly-driven IDS
Turning service logs into datasets
Advantages of integrating network data with service logs
How to classify network attacks
Most common network attacks
Anomaly detection strategies
Anomaly detection assumptions and challenges
Detecting botnet topology
What is a botnet?
The botnet kill chain
Different ML algorithms for botnet detection
Gaussian anomaly detection
The Gaussian distribution
Anomaly detection using the Gaussian distribution
Gaussian anomaly detection example
False alarm management in anomaly detection
Receiver operating characteristic analysis
Summary
Section 3: Protecting Sensitive Information and Assets
Chapter 6: Securing User Authentication
Authentication abuse prevention
Are passwords obsolete?
Common authentication practices
How to spot fake logins
Fake login management – reactive versus predictive
Predicting the unpredictable
Choosing the right features
Preventing fake account creation
Account reputation scoring
Classifying suspicious user activity
Supervised learning pros and cons
Clustering pros and cons
User authentication with keystroke recognition
Coursera Signature Track
Keystroke dynamics
Anomaly detection with keystroke dynamics
Keystroke detection example code
User detection with multilayer perceptrons
Biometric authentication with facial recognition
Facial recognition pros and cons
Eigenfaces facial recognition
Dimensionality reduction with principal component analysis (PCA)
Principal component analysis
Variance, covariance, and the covariance matrix
Eigenvectors and Eigenvalues
Eigenfaces example
Summary
Chapter 7: Fraud Prevention with Cloud AI Solutions
Introducing fraud detection algorithms
Dealing with credit card fraud
Machine learning for fraud detection
Fraud detection and prevention systems
Expert-driven predictive models
Data-driven predictive models
FDPS – the best of both worlds
Learning from unbalanced and non-stationary data
Dealing with unbalanced datasets
Dealing with non-stationary datasets
Predictive analytics for credit card fraud detection
Embracing big data analytics in fraud detection
Ensemble learning
Bagging (bootstrap aggregating)
Boosting algorithms
Stacking
Bagging example
Boosting with AdaBoost
Introducing the gradient
Gradient boosting
eXtreme Gradient Boosting (XGBoost)
Sampling methods for unbalanced datasets
Oversampling with SMOTE
Sampling examples
Getting to know IBM Watson Cloud solutions
Cloud computing advantages
Achieving data scalability
Cloud delivery models
Empowering cognitive computing
Importing sample data and running Jupyter Notebook in the cloud
Credit card fraud detection with IBM Watson Studio
Predicting with RandomForestClassifier
Predicting with GradientBoostingClassifier
Predicting with XGBoost
Evaluating the quality of our predictions
F1 value
ROC curve
AUC (Area Under the ROC curve)
Comparing ensemble classifiers
The RandomForestClassifier report
The GradientBoostingClassifier report
The XGBClassifier report
Improving predictions accuracy with SMOTE
Summary
Chapter 8: GANs – Attacks and Defenses
GANs in a nutshell
A glimpse into deep learning
Artificial neurons and activation functions
From artificial neurons to neural networks
Getting to know GANs
Generative versus discriminative networks
The Nash equilibrium
The math behind GANs
How to train a GAN
An example of a GAN – emulating MNIST handwritten digits
GAN Python tools and libraries
Neural network vulnerabilities
Deep neural network attacks
Adversarial attack methodologies
Adversarial attack transferability
Defending against adversarial attacks
CleverHans library of adversarial examples
EvadeML-Zoo library of adversarial examples
Network attack via model substitution
Substitute model training
Generating the synthetic dataset
Fooling malware detectors with MalGAN
IDS evasion via GAN
Introducing IDSGAN
Features of IDSGAN
The IDSGAN training dataset
Generator network
Discriminator network
Understanding IDSGAN's algorithm training
Facial recognition attacks with GAN
Facial recognition vulnerability to adversarial attacks
Adversarial examples against FaceNet
Launching the adversarial attack against FaceNet's CNN
Summary
Section 4: Evaluating and Testing Your AI Arsenal
Chapter 9: Evaluating Algorithms
Best practices of feature engineering
Better algorithms or more data?
The very nature of raw data
Feature engineering to the rescue
Dealing with raw data
Data binarization
Data binning
Logarithmic data transformation
Data normalization
Min–max scaling
Variance scaling
How to manage categorical variables
Ordinal encoding
One-hot encoding
Dummy encoding
Feature engineering examples with sklearn
Min–max scaler
Standard scaler
Power transformation
Ordinal encoding with sklearn
One-hot encoding with sklearn
Evaluating a detector's performance with ROC
ROC curve and AUC measure
Examples of ROC metrics
ROC curve example
AUC score example
Brier score example
How to split data into training and test sets
Algorithm generalization error
Algorithm learning curves
Using cross validation for algorithms
K-folds cross validation pros and cons
K-folds cross validation example
Summary
Chapter 10: Assessing your AI Arsenal
Evading ML detectors
Understanding RL
RL feedback and state transition
Evading malware detectors with RL
Black-box attacks with RL
Challenging ML anomaly detection
Incident response and threat mitigation
Empowering detection systems with human feedback
Testing for data and model quality
Assessing data quality
Biased datasets
Unbalanced and mislabeled datasets
Missing values in datasets
Missing values example
Assessing model quality
Fine-tuning hyperparameters
Model optimization with cross validation
Ensuring security and reliability
Ensuring performance and scalability
Ensuring resilience and availability
Ensuring confidentiality and privacy
Summary
Other Books You May Enjoy
Index