Data Science Essentials in Python: Collect - Organize - Explore - Predict - Value

Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyze data using statistical and network-theoretical methods; and see actual examples of data analysis at work. This one-stop solution covers the essential data science you need in Python.

Data science is one of the fastest-growing disciplines in terms of academic research, student enrollment, and employment. Python, with its flexibility and scalability, is quickly overtaking the R language for data-scientific projects. Keep Python data-science concepts at your fingertips with this modular, quick reference to the tools used to acquire, clean, analyze, and store data.

This one-stop solution covers essential Python, databases, network analysis, natural language processing, elements of machine learning, and visualization. Access structured and unstructured text and numeric data from local files, databases, and the Internet. Arrange, rearrange, and clean the data. Work with relational and non-relational databases, data visualization, and simple predictive analysis (regressions, clustering, and decision trees). See how typical data analysis problems are handled. And try your hand at your own solutions to a variety of medium-scale projects that are fun to work on and look good on your resume.

Keep this handy quick guide at your side whether you're a student, an entry-level data science professional converting from R to Python, or a seasoned Python developer who doesn't want to memorize every function and option.

What You Need: You need a decent distribution of Python 3.3 or above that includes at least NLTK, Pandas, NumPy, Matplotlib, Networkx, SciKit-Learn, and BeautifulSoup. A great distribution that meets the requirements is Anaconda, available for free from www.continuum.io. If you plan to set up your own database servers, you also need MySQL (www.mysql.com) and MongoDB (www.mongodb.com). Both packages are free and run on Windows, Linux, and Mac OS.
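
The software requirements above can be checked before starting. The following is a minimal sketch and is not taken from the book: it assumes the usual import names for the listed packages (nltk, pandas, numpy, matplotlib, networkx, sklearn for SciKit-Learn, and bs4 for BeautifulSoup), and the helper names missing_modules and python_is_recent_enough are hypothetical.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above
# (sklearn = SciKit-Learn, bs4 = BeautifulSoup).
REQUIRED = ["nltk", "pandas", "numpy", "matplotlib", "networkx", "sklearn", "bs4"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough(minimum=(3, 3)):
    """Check the running interpreter against the stated minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_is_recent_enough():
        print("Python 3.3 or above is required")
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available")
```

The check only locates each module without importing it, so it runs quickly and does not fail partway through if one package is absent. It does not cover the optional MySQL and MongoDB servers.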

Author(s): Dmitry Zinoviev
Year: 2016

Language: English
Pages: 200

Cover
Table of Contents
Acknowledgments
Preface
About This Book
About the Audience
About the Software
Notes on Quotes
The Book Forum
Your Turn
1. What Is Data Science?
Unit 1. Data Analysis Sequence
Unit 2. Data Acquisition Pipeline
Unit 3. Report Structure
Your Turn
2. Core Python for Data Science
Unit 4. Understanding Basic String Functions
Unit 5. Choosing the Right Data Structure
Unit 6. Comprehending Lists Through List Comprehension
Unit 7. Counting with Counters
Unit 8. Working with Files
Unit 9. Reaching the Web
Unit 10. Pattern Matching with Regular Expressions
Unit 11. Globbing File Names and Other Strings
Unit 12. Pickling and Unpickling Data
Your Turn
3. Working with Text Data
Unit 13. Processing HTML Files
Unit 14. Handling CSV Files
Unit 15. Reading JSON Files
Unit 16. Processing Texts in Natural Languages
Your Turn
4. Working with Databases
Unit 17. Setting Up a MySQL Database
Unit 18. Using a MySQL Database: Command Line
Unit 19. Using a MySQL Database: pymysql
Unit 20. Taming Document Stores: MongoDB
Your Turn
5. Working with Tabular Numeric Data
Unit 21. Creating Arrays
Unit 22. Transposing and Reshaping
Unit 23. Indexing and Slicing
Unit 24. Broadcasting
Unit 25. Demystifying Universal Functions
Unit 26. Understanding Conditional Functions
Unit 27. Aggregating and Ordering Arrays
Unit 28. Treating Arrays as Sets
Unit 29. Saving and Reading Arrays
Unit 30. Generating a Synthetic Sine Wave
Your Turn
6. Working with Data Series and Frames
Unit 31. Getting Used to Pandas Data Structures
Unit 32. Reshaping Data
Unit 33. Handling Missing Data
Unit 34. Combining Data
Unit 35. Ordering and Describing Data
Unit 36. Transforming Data
Unit 37. Taming Pandas File I/O
Your Turn
7. Working with Network Data
Unit 38. Dissecting Graphs
Unit 39. Network Analysis Sequence
Unit 40. Harnessing Networkx
Your Turn
8. Plotting
Unit 41. Basic Plotting with PyPlot
Unit 42. Getting to Know Other Plot Types
Unit 43. Mastering Embellishments
Unit 44. Plotting with Pandas
Your Turn
9. Probability and Statistics
Unit 45. Reviewing Probability Distributions
Unit 46. Recollecting Statistical Measures
Unit 47. Doing Stats the Python Way
Your Turn
10. Machine Learning
Unit 48. Designing a Predictive Experiment
Unit 49. Fitting a Linear Regression
Unit 50. Grouping Data with K-Means Clustering
Unit 51. Surviving in Random Decision Forests
Your Turn
A1. Further Reading
A2. Solutions to Single-Star Projects
Bibliography
Index
– SYMBOLS –
– A –
– B –
– C –
– D –
– E –
– F –
– G –
– H –
– I –
– J –
– K –
– L –
– M –
– N –
– O –
– P –
– Q –
– R –
– S –
– T –
– U –
– V –
– W –
– X –
– Y –
– Z –