Success in data science depends on the flexible and appropriate use of tools. That includes Python and R, two of the foundational programming languages in the field. This book guides data scientists from the Python and R communities along the path to becoming bilingual. By recognizing the strengths of both languages, you'll discover new ways to accomplish data science tasks and expand your skill set.
Authors Rick Scavetta and Boyan Angelov explain the parallel structures of these languages and highlight where each one excels, whether in its linguistic features or in the power of its open source ecosystem. You'll learn how to use Python and R together in real-world settings and broaden your job opportunities as a bilingual data scientist.
• Learn Python and R from the perspective of your current language
• Understand the strengths and weaknesses of each language
• Identify use cases where one language is better suited than the other
• Understand the modern open source ecosystem available for both, including packages, frameworks, and workflows
• Learn how to integrate R and Python in a single workflow
• Follow a case study that demonstrates ways to use these languages together
Author(s): Rick J. Scavetta, Boyan Angelov
Edition: 1
Publisher: O'Reilly Media
Year: 2021
Language: English
Pages: 198
City: Sebastopol, CA
Tags: Data Science; Python; Data Visualization; R; Prophet; Data Engineering; Workflows; Data Exploration
Copyright
Table of Contents
Preface
Why We Wrote This Book
Technical Interactions
Who This Book Is For
Prerequisites
How This Book Is Organized
Let’s Talk
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Discovery of a New Language
Chapter 1. In the Beginning
The Origins of R
The Origins of Python
The Language War Begins
The Battle for Data Science Dominance
A Convergence on Cooperation and Community-Building
Final Thoughts
Part II. Bilingualism I: Learning a New Language
Chapter 2. R for Pythonistas
Up and Running with R
Projects and Packages
The Triumph of Tibbles
A Word About Types and Exploring
Naming (Internal) Things
Lists
The Facts About Factors
How to Find…Stuff
Reiterations Redo
Final Thoughts
Chapter 3. Python for UseRs
Versions and Builds
Standard Tooling
Virtual Environments
Installing Packages
Notebooks
How Does Python, the Language, Compare to R?
Import a Dataset
Examine the Data
Data Structures and Descriptive Statistics
Data Structures: Back to the Basics
Indexing and Logical Expressions
Plotting
Inferential Statistics
Final Thoughts
Part III. Bilingualism II: The Modern Context
Chapter 4. Data Format Context
External Versus Base Packages
Image Data
Text Data
Time Series Data
Base R
Prophet
Spatial Data
Final Thoughts
Chapter 5. Workflow Context
Defining Workflows
Exploratory Data Analysis
Static Visualizations
Interactive Visualizations
Machine Learning
Data Engineering
Reporting
Static Reporting
Interactive Reporting
Final Thoughts
Part IV. Bilingualism III: Becoming Synergistic
Chapter 6. Using the Two Languages Synergistically
Faux Operability
Interoperability
Going Deeper
Pass Objects Between R and Python in an R Markdown Document
Call Python in an R Markdown Document
Call Python by Sourcing a Python Script
Call Python Using the REPL
Call Python with Dynamic Input in an Interactive Document
Final Thoughts
Chapter 7. A Case Study in Bilingual Data Science
24 Years and 1.88 Million Wildfires
Setup and Importing Data
EDA and Data Visualization
Machine Learning
Setting Up Our Python Environment
Feature Engineering
Model Training
Prediction and UI
Final Thoughts
Appendix A. A Python:R Bilingual Dictionary
Package Management
Assign Operators
Types
Arithmetic Operators
Attributes
Keywords
Functions and Methods
Style and Naming Conventions
Analogous Data Storage Objects
Data Frames
Logical Expressions
Indexing
Index
About the Authors
Colophon