Data Warehousing in the Age of Big Data

Data Warehousing in the Age of Big Data will help you and your organization make the most of unstructured data with your existing Data Warehouse. As Big Data continues to revolutionize how we use data, it doesn't have to create more confusion. Expert author Krish Krishnan will help you make sense of how Big Data fits into the world of Data Warehousing in clear and concise detail. You'll learn what you need to know about your infrastructure options and integration, and come away with a solid understanding of how to leverage various architectures for integration. This book includes several business use cases that will help you visualize reference architectures for Big Data and the Data Warehouse. Learn how to leverage Big Data by effectively integrating it into your Data Warehouse. Includes real-world examples and use cases that clearly demonstrate Hadoop, NoSQL, HBase, Hive, and other Big Data technologies. Understand how to optimize and tune your current Data Warehouse infrastructure and integrate newer infrastructure that matches data processing workloads and requirements.

Author(s): Krish Krishnan
Series: The Morgan Kaufmann Series on Business Intelligence
Edition: 1
Publisher: Morgan Kaufmann
Year: 2013

Language: English
Pages: 370
Tags: Computer science and computer engineering; Artificial intelligence; Data mining

Front Cover
Data Warehousing in the Age of Big Data
Copyright Page
Contents
Acknowledgments
About the Author
Introduction
Part 1: Big Data
Part 3: Building the Big Data – Data Warehouse
Companion website
1 BIG DATA
Big Data
Why Big Data and why now?
Social Media posts
Survey data analysis
Survey data
Integration and analysis
Additional data types
Further reading
Data explosion
Machine data
Emails
Geographic information systems and geo-spatial data
Example: Funshots, Inc.
Data velocity
Social media
Data variety
Summary
Data processing revisited
Data processing techniques
Storage
Processing
Shared-everything and shared-nothing architectures
Shared-nothing architecture
OLTP versus data warehousing
Big Data processing
Infrastructure explained
Telco Big Data study
Data processing
Introduction
Distributed data processing
Big Data processing requirements
Technologies for Big Data processing
Google file system
Hadoop
HDFS
DataNodes
HDFS client
Heartbeats
File system snapshots
JobTracker and TaskTracker
MapReduce
MapReduce programming model
MapReduce program design
MapReduce job processing and management
MapReduce v2 (YARN)
YARN scalability
Comparison between MapReduce v1 and v2
SQL/MapReduce
Zookeeper features
Failure and recovery
Programming with Pig Latin
Common Pig command
HBase
HBase architecture
HBase components
Write-ahead log
Hive
Hive architecture
Execution: how does Hive process queries?
Chukwa
HCatalog
Sqoop1
Hadoop summary
NoSQL
CAP theorem
Column family store: Cassandra
Data model
Data sorting
Built-in consistency repair features
Cassandra ring architecture
Data partitioning
Gossip protocol: node management
Document database: Riak
Textual ETL processing
Further reading
Introduction
Producing electricity from wind
Tackling Big Data challenges
Surveillance and security: TerraEchos
The benefit
Correlating sensor data delivers a zero false-positive rate
Challenges
Solution: getting ready for Big Data analytics
Why Aster?
Overview
Making better use of the data resource
Solution components
Merging human knowledge and technology
Solution spotlight
Solution
Facilitates innovation
Overview
Enabling a better cross-sell and upsell opportunity
Example
Summary
2 THE DATA WAREHOUSING
Introduction
Traditional data warehousing, or data warehousing 1.0
Data architecture
Infrastructure
Pitfalls of data warehousing
Performance
Scalability
Architecture approaches to building a data warehouse
Pros and cons of datamart BUS architecture approach
Data warehouse 2.0
Overview of DSS 2.0
Further reading
Introduction
Enterprise data warehouse platform
Data warehouse
Issues with the data warehouse
Replatforming
Platform engineering
Data engineering
Modernizing the data warehouse
Current-state analysis
Business benefits of modernization
Scorecard
Program roadmap
Summary
Current state
Defining workloads
Understanding workloads
Datamarts
Analytical databases
Data warehouse processing overheads
Wide/Wide
Narrow/Wide
ETL and CDC workloads
Measurement
Current system design limitations
Big Data workloads
Technology choices
Summary
Data warehouse challenges revisited
Data volumes
Data transport
Data warehouse appliance
Appliance architecture
Data distribution in the appliance
Key best practices for deploying a data warehouse appliance
Cloud computing
Platform as a service
Cloud infrastructure
Issues facing cloud computing for data warehouse
What is data virtualization?
Implementing a data virtualization program
In-memory technologies
Further reading
3 BUILDING THE BIG DATA – DATA WAREHOUSE
Introduction
Data layer
Algorithms
Technology layer
Data classification
Workload
Physical component integration and architecture
Data volumes
External data integration
Hadoop & RDBMS
Big Data appliances
Data virtualization
Semantic framework
Clustering
Summary
Metadata
Process design–level metadata
Core business metadata
Master data management
Processing data in the data warehouse
Processing complexity of Big Data
Processing Big Data
Analysis stage
Metadata, master data, and semantic linkage
Types of probabilistic links
Machine learning
Summary
Information life-cycle management
Goals
Executive governance board
Business teams
Data quality
Metadata
Information life-cycle management for Big Data
Data governance
Processing
Summary
Big Data analytics
Data discovery
Visualization
Summary
Customer-centric business transformation
Outcomes
Hadoop and MySQL drives innovation
Benefits
Empowering decision making
Summary
Case study 1: Transforming marketing landscape
Case study 2: Streamlining healthcare connectivity with Big Data
Case study 3: Improving healthcare quality and costs using Big Data
Case study 4: Improving customer support
Case study 5: Driving customer-centric transformations
Case study 6: Quantifying risk and compliance
Case study 7: Delivering a 360° view of customers
Executive summary
The healthcare information factory
A visionary architecture
A common patient identifier
Integrating data
ETL and the collective common data warehouse
Common elements of a data warehouse
DSS/business intelligence processing
Textual data
The system of record
Metadata
Local individual data warehouses
Data models and the healthcare information factory
Creating the medical data warehouse data model
The collective common data model
Developing the healthcare information factory
Healthcare information factory users
Financing the infrastructure
Implementing the healthcare information factory
Further reading
Summary
Index