Big Data Analytics with R and Hadoop

Big data analytics is the process of examining large volumes of varied data to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide a competitive advantage over rival organizations and yield business benefits such as more effective marketing and increased revenue. Newer approaches to working with big data, such as Hadoop and MapReduce, offer alternatives to traditional data warehousing.

Author(s): Vignesh Prajapati
Publisher: Packt Publishing
Year: 2013

Language: English
Pages: 238
Tags: Library; Computer Literature; R

Cover......Page 1
Copyright......Page 3
Credits......Page 4
About the Author......Page 5
Acknowledgment......Page 6
About the Reviewers......Page 7
www.PacktPub.com......Page 9
Table of Contents......Page 10
Preface......Page 16
Chapter 1: Getting Ready to Use R and Hadoop......Page 28
Installing R......Page 29
Installing RStudio......Page 30
Performing data operations......Page 31
Increasing community support......Page 32
Performing data modeling in R......Page 33
Installing Hadoop......Page 34
Installing Hadoop on Linux, Ubuntu flavor (single node cluster)......Page 35
Installing Hadoop on Linux, Ubuntu flavor (multinode cluster)......Page 38
Installing Cloudera Hadoop on Ubuntu......Page 40
Understanding MapReduce......Page 43
Understanding HDFS components......Page 45
Understanding the HDFS and MapReduce architecture by plot......Page 46
Understanding Hadoop subprojects......Page 48
Summary......Page 51
Understanding the basics of MapReduce......Page 52
Introducing Hadoop MapReduce......Page 54
Loading data into HDFS......Page 55
Executing the Map phase......Page 56
Reducing phase execution......Page 57
Understanding the limitations of MapReduce......Page 58
Understanding the different Java concepts used in Hadoop programming......Page 59
Understanding MapReduce objects......Page 60
Deciding the number of Reducers in MapReduce......Page 61
Understanding MapReduce dataflow......Page 62
Taking a closer look at Hadoop MapReduce terminologies......Page 63
Writing a Hadoop MapReduce example......Page 66
Understanding the steps to run a MapReduce job......Page 67
Learning to monitor and debug a Hadoop MapReduce job......Page 73
Exploring HDFS data......Page 74
Understanding several possible MapReduce definitions to solve business problems......Page 75
Learning RHadoop......Page 76
Summary......Page 77
Chapter 3: Integrating R and Hadoop......Page 78
Introducing RHIPE......Page 79
Installing Hadoop......Page 80
Environment variables......Page 81
Installing RHIPE......Page 82
Understanding the architecture of RHIPE......Page 83
RHIPE sample program (Map only)......Page 84
Word count......Page 86
HDFS......Page 88
MapReduce......Page 90
Introducing RHadoop......Page 91
Installing RHadoop......Page 92
Understanding RHadoop examples......Page 94
Word count......Page 96
The hdfs package......Page 97
Summary......Page 100
Understanding the basics of Hadoop streaming......Page 102
Understanding a MapReduce application......Page 107
Understanding how to code a MapReduce application......Page 109
Executing a Hadoop streaming job from the command prompt......Page 113
Exploring an output from the command prompt......Page 114
Exploring an output from R or an RStudio console......Page 115
Understanding basic R functions used in Hadoop MapReduce scripts......Page 116
Monitoring the Hadoop MapReduce job......Page 117
Exploring the HadoopStreaming R package......Page 118
Understanding the hsTableReader function......Page 119
Understanding the hsKeyValReader function......Page 121
Understanding the hsLineReader function......Page 122
Running a Hadoop streaming job......Page 125
Summary......Page 127
Understanding the data analytics project life cycle......Page 128
Designing data requirement......Page 129
Performing analytics over data......Page 130
Visualizing data......Page 131
Understanding data analytics problems......Page 132
Designing data requirement......Page 133
Preprocessing data......Page 135
Performing analytics over data......Page 136
Identifying the problem......Page 143
Preprocessing data......Page 144
Performing analytics over data......Page 145
Visualizing data......Page 151
Identifying the problem......Page 152
Designing data requirement......Page 153
Preprocessing data......Page 154
Understanding Poisson-approximation resampling......Page 156
Summary......Page 162
Introduction to machine learning......Page 164
Linear regression......Page 165
Linear regression with R......Page 167
Linear regression with R and Hadoop......Page 169
Logistic regression......Page 172
Logistic regression with R and Hadoop......Page 174
Clustering......Page 177
Performing clustering with R and Hadoop......Page 178
Recommendation algorithms......Page 182
Steps to generate recommendations in R......Page 185
Generating recommendations with R and Hadoop......Page 188
Summary......Page 193
Chapter 7: Importing and Exporting Data from Various DBs......Page 194
Learning about data files as database......Page 196
Importing the data into R......Page 197
Understanding MySQL......Page 198
Learning to list the tables and their structure......Page 199
Understanding data manipulation......Page 200
Importing data into R......Page 201
Understanding MongoDB......Page 202
Installing MongoDB......Page 203
Mapping SQL to MongoDB......Page 204
Importing the data into R......Page 205
Understanding data manipulation......Page 206
Understanding SQLite......Page 207
Importing the data into R......Page 208
Understanding PostgreSQL......Page 209
Installing RPostgreSQL......Page 210
Exporting the data from R......Page 211
Installing Hive......Page 212
Setting up Hive configurations......Page 213
Understanding RHive operations......Page 214
Understanding HBase features......Page 215
Installing HBase......Page 216
Installing RHBase......Page 218
Summary......Page 219
R + Hadoop help materials......Page 220
Hadoop groups......Page 222
Popular R contributors......Page 223
Popular Hadoop contributors......Page 224
Index......Page 226