Mathematical Foundations of Data Science

This textbook aims to point out the most important principles of data analysis from a mathematical point of view. Specifically, it explores two questions: Which principles are necessary to understand the implications of an application, and which are necessary to understand the conditions for the success of the methods used? Theory is presented only to the degree necessary to apply it properly, striving for a balance between excessive complexity and oversimplification. The primary focus is on principles crucial for application success.

Topics and features:

  • Focuses on approaches supported by mathematical arguments, rather than on computing experience alone
  • Investigates conditions under which numerical algorithms used in data science operate, and what performance can be expected from them
  • Considers key data science problems: problem formulation, including the optimality measure; learning and generalization in relation to training set size and number of free parameters; and convergence of numerical algorithms
  • Examines original mathematical disciplines (statistics, numerical mathematics, system theory) as they are specifically relevant to a given problem
  • Addresses the trade-off between model size and volume of data available for its identification and its consequences for model parametrization
  • Investigates the mathematical principles involved in natural language processing and computer vision
  • Keeps subject coverage intentionally compact, focusing on key issues of each topic to encourage full comprehension of the entire book

    Although this core textbook is aimed directly at students of computer science and/or data science, it will also appeal to researchers in the field who want to gain a proper understanding of the mathematical foundations “beyond” computing experience alone.

    Author(s): Tomas Hrycej, Bernhard Bermeitinger, Matthias Cetto, Siegfried Handschuh
    Series: Texts in Computer Science
    Edition: 1
    Publisher: Springer
    Year: 2023

    Language: English
    Pages: 226
    City: Cham
    Tags: Data Science; Big Data; Statistical Learning; Machine Learning; Deep Learning; Artificial Neural Networks; Data Processing; Pattern Recognition; Learning and Generalization; Numerical Algorithms; Natural Language Processing; Computer Vision

    Preface
    For Whom Is This Book Written?
    What Makes This Book Different?
    Comprehension Checks
    Acknowledgments
    Contents
    Acronyms
    1 Data Science and Its Tasks
    Mathematical Foundations
    2 Application-Specific Mappings and Measuring the Fit to Data
    2.1 Continuous Mappings
    2.1.1 Nonlinear Continuous Mappings
    2.1.2 Mappings of Probability Distributions
    2.2 Classification
    2.2.1 Special Case: Two Linearly Separable Classes
    2.2.2 Minimum Misclassification Rate for Two Classes
    2.2.3 Probabilistic Classification
    2.2.4 Generalization to Multiple Classes
    2.3 Dynamical Systems
    2.4 Spatial Systems
    2.5 Mappings Received by “Unsupervised Learning”
    2.5.1 Representations with Reduced Dimensionality
    2.5.2 Optimal Encoding
    2.5.3 Clusters as Unsupervised Classes
    2.6 Chapter Summary
    2.7 Comprehension Check
    3 Data Processing by Neural Networks
    3.1 Feedforward and Feedback Networks
    3.2 Data Processing by Feedforward Networks
    3.3 Data Processing by Feedback Networks
    3.4 Feedforward Networks with External Feedback
    3.5 Interpretation of Network Weights
    3.6 Connectivity of Layered Networks
    3.7 Shallow Networks Versus Deep Networks
    3.8 Chapter Summary
    3.9 Comprehension Check
    4 Learning and Generalization
    4.1 Algebraic Conditions for Fitting Error Minimization
    4.2 Linear and Nonlinear Mappings
    4.3 Overdetermined Case with Noise
    4.4 Noise and Generalization
    4.5 Generalization in the Underdetermined Case
    4.6 Statistical Conditions for Generalization
    4.7 Idea of Regularization and Its Limits
    4.7.1 Special Case: Ridge Regression
    4.8 Cross-Validation
    4.9 Parameter Reduction Versus Regularization
    4.10 Chapter Summary
    4.11 Comprehension Check
    5 Numerical Algorithms for Data Science
    5.1 Classes of Minimization Problems
    5.1.1 Quadratic Optimization
    5.1.2 Convex Optimization
    5.1.3 Non-convex Local Optimization
    5.1.4 Global Optimization
    5.2 Gradient Computation in Neural Networks
    5.3 Algorithms for Convex Optimization
    5.4 Non-convex Problems with a Single Attractor
    5.4.1 Methods with Adaptive Step Size
    5.4.2 Stochastic Gradient Methods
    5.5 Addressing the Problem of Multiple Minima
    5.5.1 Momentum Term
    5.5.2 Simulated Annealing
    5.6 Section Summary
    5.7 Comprehension Check
    Applications
    6 Specific Problems of Natural Language Processing
    6.1 Word Embeddings
    6.2 Semantic Similarity
    6.3 Recurrent Versus Sequence Processing Approaches
    6.4 Recurrent Neural Networks
    6.5 Attention Mechanism
    6.6 Autocoding and Its Modification
    6.7 Transformer Encoder
    6.7.1 Self-attention
    6.7.2 Position-Wise Feedforward Networks
    6.7.3 Residual Connection and Layer Normalization
    6.8 Section Summary
    6.9 Comprehension Check
    7 Specific Problems of Computer Vision
    7.1 Sequence of Convolutional Operators
    7.1.1 Convolutional Layer
    7.1.2 Pooling Layers
    7.1.3 Implementations of Convolutional Neural Networks
    7.2 Handling Invariances
    7.3 Application of Transformer Architecture to Computer Vision
    7.3.1 Attention Mechanism for Computer Vision
    7.3.2 Division into Patches
    7.4 Section Summary
    7.5 Comprehension Check
    Index