Python Data Science

Rather than presenting Python as if it were Java or C, this textbook focuses on the essential Python programming skills for data scientists and on advanced methods for big data analysts. Unlike conventional textbooks, it is based on Markdown and uses full-color printing and a code-centric approach to highlight the 3C principles in data science: creative design of data solutions, curiosity about the data lifecycle, and critical thinking regarding data insights. Q&A-based knowledge maps, tips and suggestions, notes, and warnings and cautions explain the key points, difficulties, and common mistakes in Python programming for data science. The book also includes suggestions for further reading. It is supported by an open-source community on GitHub, and the course materials are licensed for free use under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).

Author(s): Chaolemen Borjigin
Publisher: Springer
Year: 2023

Language: English
Pages: 353

Preface
Contents
1. Python and Data Science
1.1 How to learn Python for data science
1.2 How to set up my Python IDE for Data Science
1.3 How to write and run my Python code
1.3.1 Inputs
1.3.2 Outputs
1.3.3 Errors and warnings
1.3.4 External data files
1.3.5 Tips for Python programming
Exercises
2. Basic Python Programming for Data Science
2.1 Data Types
2.1.1 Checking data types
2.1.2 Testing data types
2.1.3 Converting data types
2.1.4 Built-in data types
2.1.5 Sequences
2.2 Variables
2.2.1 Defining variables
2.2.2 Dynamically typed language
2.2.3 Strongly typed language
2.2.4 Variable naming rules
2.2.5 Case-sensitivity
2.2.6 Variable naming rules
2.2.7 Checking IPython variables
2.2.8 Checking Python keywords
2.2.9 Checking all defined variables
2.2.10 Deleting variables
2.3 Operators and Expressions
2.3.1 Commonly used operators
2.3.2 Built-in functions
2.3.3 Math modules
2.3.4 Precedence and associativity
2.4 Statements
2.4.1 Writing a statement in a line
2.4.2 Writing multiple statements in a single line
2.4.3 Splitting a statement into multiple lines
2.4.4 Compound statements
2.4.5 Empty statements
2.5 Assignment statements
2.5.1 Assigning objects
2.5.2 Chained assignment statements
2.5.3 Augmented assignment statements
2.5.4 Sequence unpacking
2.5.5 Swapping two variables
2.6 Comments
2.6.1 Line comments
2.6.2 Block comments
2.7 If statements
2.7.1 Basic syntax
2.7.2 Elif statement
2.7.3 Ternary operators
2.7.4 Advanced syntax
2.8 For statements
2.8.1 Basic syntax
2.8.2 The range( ) function
2.8.3 Advanced syntax
2.9 While statements
2.9.1 Basic syntax
2.9.2 Advanced syntax
2.10 Lists
2.10.1 Defining lists
2.10.2 Slicing
2.10.3 Reversing
2.10.4 Type conversion
2.10.5 The extend and append operators
2.10.6 List derivation
2.10.7 Insertion and deletion
2.10.8 Basic functions
2.11 Tuples
2.11.1 Defining tuples
2.11.2 Main features
2.11.3 Basic usage
2.11.4 Tuples in data science
2.12 Strings
2.12.1 Defining strings
2.12.2 Main features
2.12.3 String operations
2.13 Sequences
2.13.1 Indexing
2.13.2 Slicing
2.13.3 Iteration
2.13.4 Unpacking
2.13.5 Repeat operator
2.13.6 Basic Functions
2.14 Sets
2.14.1 Defining sets
2.14.2 Main features
2.14.3 Basic operations
2.14.4 Sets and data science
2.15 Dictionaries
2.15.1 Defining dictionaries
2.15.2 Accessing dictionary items
2.15.3 Dictionary and data science
2.16 Functions
2.16.1 Built-in functions
2.16.2 Module Functions
2.16.3 User-defined functions
2.17 Built-in functions
2.17.1 Calling built-in functions
2.17.2 Mathematical functions
2.17.3 Type conversion functions
2.17.4 Other commonly used functions
2.18 Module functions
2.18.1 import module name
2.18.2 import module name as alias
2.18.3 From module name import function name
2.19 User-defined functions
2.19.1 Defining user-defined functions
2.19.2 Function docStrings
2.19.3 Calling user-defined functions
2.19.4 Returning values
2.19.5 Parameters and arguments
2.19.6 Scope of variables
2.19.7 Pass-by-value and pass-by-reference
2.19.8 Arguments in functions
2.20 Lambda functions
2.20.1 Defining a lambda function
2.20.2 Calling a lambda function
Exercises
3. Advanced Python Programming for Data Science
3.1 Iterators and generators
3.1.1 Iterable objects vs. iterators
3.1.2 Generator vs. iterators
3.2 Modules
3.2.1 Importing and using modules
3.2.2 Checking built-in modules list
3.3 Packages
3.3.1 Packages vs modules
3.3.2 Installing packages
3.3.3 Checking installed packages
3.3.4 Updating or removing installed packages
3.3.5 Importing packages or modules
3.3.6 Checking Package Version
3.3.7 Commonly used Packages
3.4 Help documentation
3.4.1 The help function
3.4.2 DocString
3.4.3 Checking source code
3.4.4 The doc attribute
3.4.5 The dir() function
3.5 Exceptions and errors
3.5.1 Try/Except/Finally
3.5.2 Exception reporting mode
3.5.3 Assertion
3.6 Debugging
3.6.1 Enabling the Python Debugger
3.6.2 Changing exception reporting modes
3.6.3 Working with checkpoints
3.7 Search path
3.7.1 The variable search path
3.7.2 The module search path
3.8 Current working directory
3.8.1 Getting current working directory
3.8.2 Resetting current working directory
3.8.3 Reading / writing current working directory
3.9 Object-oriented programming
3.9.1 Classes
3.9.2 Methods
3.9.3 Inheritance
3.9.4 Attributes
3.9.5 Self and Cls
3.9.6 __new__ () and __init__()
Exercises
4. Data wrangling with Python
4.1 Random number generation
4.1.1 Generating a random number at a time
4.1.2 Generating a random array at a time
4.2 Multidimensional arrays
4.2.1 Creating ndarrays
4.2.2 Slicing and indexing ndarrays
4.2.3 Shallow copy and deep copy
4.2.4 Shape and reshape
4.2.5 Dimension and size
4.2.6 Evaluation of ndarrays
4.2.7 Insertion and deletion
4.2.8 Handling missing values
4.2.9 Broadcasting ndarray
4.2.10 Sorting an ndarray
4.3 Series
4.3.1 Creating Series
4.3.2 Working with Series
4.4 DataFrame
4.4.1 Creating DataFrames
4.4.2 Index or columns of DataFrames
4.4.3 Slicing DataFrames
4.4.4 Filtering DataFrames
4.4.5 Arithmetic operating on DataFrames
4.4.6 Descriptive analysis of DataFrames
4.4.7 Sorting DataFrames
4.4.8 Importing/Exporting DataFrames
4.4.9 Handling missing values with Pandas
4.4.10 Grouping DataFrames
4.5 Date and time
4.5.1 Creating a time or date object
4.5.2 Parsing a string to a time or date object
4.5.3 Getting current local date or time object
4.5.4 Evaluating the difference between two date or time objects
4.5.5 Setting a time or date object as the index of Pandas
4.5.6 The pandas.period_range() method
4.6 Data visualization
4.6.1 Matplotlib visualization
4.6.2 Adjusting plot attributes
4.6.3 Changing the type of a plot
4.6.4 Changing the value range of the axes of a plot
4.6.5 Adjusting the margins of a plot
4.6.6 Creating multiple plots on the same coordinates
4.6.7 Adding an Axes to the current figure or retrieving an existing Axes
4.6.8 Saving plots to image files
4.6.9 Creating more complicated plots
4.6.10 Data visualization with Pandas
4.6.11 Data visualization with Seaborn
4.6.12 Data visualization case projects
Exercises
5. Data analysis with Python
5.1 Statistical modelling with statsmodels
5.1.1 Business understanding
5.1.2 Data loading
5.1.3 Data understanding
5.1.4 Data wrangling
5.1.5 Model selection and hyperparameter tuning
5.1.6 Fitting model and summarizing the Regression Results
5.1.7 Model evaluation
5.1.8 Assumptions testing
5.1.9 Model optimization and re-selection
5.1.10 Model application
5.2 Machine learning with scikit-learn
5.2.1 Business understanding
5.2.2 Data loading
5.2.3 Data understanding
5.2.4 Data wrangling
5.2.5 Model selection and hyperparameter tuning
5.2.6 Model training
5.2.7 Predicting with a trained model
5.2.8 Model evaluation
5.2.9 Model optimization and application
5.3 Natural language understanding with NLTK
5.3.1 Business understanding
5.3.2 Data loading
5.3.3 Data understanding
5.3.4 Text normalization
5.3.5 Tokenization
5.3.6 Extracting high frequency words
5.3.7 Generating word clouds
5.4 Image processing with OpenCV
5.4.1 Installing and importing the opencv-python package
5.4.2 Loading image from file
5.4.3 Converting an RGB image into Grayscale
5.4.4 Detecting faces
5.4.5 Showing images
5.4.6 Writing images
Exercises
Appendix I Best Python Resources for Data Scientists
Appendix II Answers to Chapter Exercises