Python Machine Learning Cookbook: Practical Solutions from Preprocessing to Deep Learning (draft)

Author(s): Chris Albon
Publisher: O'Reilly
Year: 2017

Language: English

Chapter 1.

1.0 Introduction

The first step in any machine learning endeavor is to get the raw data into our system. The raw data can be held in a log file, dataset file, or database. Furthermore, we will often want to get data from multiple sources. The recipes in this chapter look at methods of loading data from a variety of sources, including CSV files and SQL databases. We also cover methods of generating simulated data with desirable properties for experimentation. Finally, while there are many ways to load data in the Python ecosystem, we will focus on using the pandas library’s extensive set of methods for loading external data and scikit-learn, an open source machine learning library in Python, for generating simulated data.

1.1 Loading A Sample Dataset

Problem

You need to load a pre-existing sample dataset.

Solution

scikit-learn comes with a number of popular datasets for you to use.

    # Load scikit-learn's datasets
    from sklearn import datasets

    # Load the digits dataset
    digits =
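The introduction above mentions loading CSV data with pandas. As a minimal sketch of that idea, the snippet below reads a small CSV from an in-memory string. The column names and values are hypothetical and exist only for illustration; in practice the argument to `pd.read_csv` would be a file path or URL.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk or at a URL
csv_text = """integer,category,value
5,a,0.5
7,b,1.5
9,a,2.5
"""

# read_csv accepts a path, a URL, or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the shape and the inferred column types
print(df.shape)
print(df.dtypes)
```

Reading from `io.StringIO` keeps the example self-contained; swapping in a real path should not require any other change.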
Chapter 2.

2.0 Introduction

Data wrangling is a broad term used, often informally, to describe the process of transforming raw data into a clean and organized format ready for further preprocessing or final use. For us, data wrangling is only one step in preprocessing our data, but it is an important step. The most common data structure used to “wrangle” data is the data frame, which can be both intuitive and incredibly versatile. Data frames are tabular, meaning that they are based on rows and columns like you would see in a spreadsheet. Here is a data frame created from data about passengers on the Titanic:

    # Load library
    import pandas as pd

    # Create URL
    url = 'https://raw.githubusercontent.com/chrisalbon/simulated_datasets/master/titanic.csv'

    # Load data
    df = pd.read_csv(url)

    # Show the first 5 rows
    df.head(5)

       Name                                  PClass  Age    Sex     Survived  SexCode
    0  Allen, Miss Elisabeth Walton          1st     29.00  female  1         1
    1  Allison, Miss Helen Loraine           1st     2.00   female  0         1
    2  Allison, Mr Hudson Joshua Creighton
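To make the row-and-column idea concrete without depending on the remote file, here is a small sketch that builds a data frame from the two complete rows visible in the preview above and then selects by column and by condition. The variable names `rows`, `survivors`, and `ages` are illustrative choices, not taken from the book.

```python
import pandas as pd

# The two complete rows visible in the preview; the third row is truncated
# in the source and is left out here rather than guessed.
rows = [
    {"Name": "Allen, Miss Elisabeth Walton", "PClass": "1st", "Age": 29.0,
     "Sex": "female", "Survived": 1, "SexCode": 1},
    {"Name": "Allison, Miss Helen Loraine", "PClass": "1st", "Age": 2.0,
     "Sex": "female", "Survived": 0, "SexCode": 1},
]

# Each dict becomes a row; the keys become the column names
df = pd.DataFrame(rows)

# Select rows that satisfy a condition
survivors = df[df["Survived"] == 1]

# Select a single column as a Series
ages = df["Age"]

print(survivors["Name"].tolist())
print(ages.mean())
```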
Chapter 3.

3.0 Introduction

Quantitative data is the measurement of something, whether class size, monthly sales, or student scores. The natural way to represent these quantities is numerically (e.g., 29 students or $529,392 in sales). In this chapter, we will cover numerous strategies for transforming raw numerical data into features purpose-built for machine learning algorithms.

3.1 Rescaling A Feature

Problem

You need to rescale the values of a numerical feature to be between two values.

Solution

Use scikit-learn’s MinMaxScaler to rescale a feature array:

    # Load libraries
    from sklearn import preprocessing
    import numpy as np

    # Create feature
    x = np.array([[-500.5],
                  [-100.1],
                  [0],
                  [100.1],
                  [900.9]])

    # Create scaler
    minmax_scale = preprocessing.MinMaxScaler(feature_range=(0, 1))

    # Scale feature
    x_scale = minmax_scale.fit_transform(x)

    # Show feature
    x_scale

    array([[ 0.        ],
           [ 0.28571429],
           [ 0.35714286],
           [ 0.42857143],
           [ 1.        ]])

Discussion

Rescaling is a common preprocessing task in ma
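For intuition about what min-max rescaling computes, the following is a hand-written sketch of the standard formula x' = low + (x - min) / (max - min) * (high - low), applied to the same feature values as the recipe. The function `min_max_rescale` is a hypothetical helper written for this illustration; it is not scikit-learn's implementation, and unlike this sketch scikit-learn does not raise an error on a constant feature.

```python
def min_max_rescale(values, feature_range=(0.0, 1.0)):
    """Rescale a list of numbers linearly into feature_range."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant feature has no spread to rescale in this sketch
        raise ValueError("cannot rescale a feature whose values are all equal")
    # Map v_min to low and v_max to high, linearly in between
    return [low + (v - v_min) / span * (high - low) for v in values]


# Same feature values as in the recipe above
x = [-500.5, -100.1, 0.0, 100.1, 900.9]
x_scale = min_max_rescale(x)

print([round(v, 8) for v in x_scale])
```

Rounded to eight decimal places, the result agrees with the array shown in the recipe's output: the smallest value maps to 0, the largest to 1, and the rest fall proportionally in between.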