Beautiful Data: The Stories Behind Elegant Data Solutions

In this insightful book, you'll learn from the best data practitioners in the field just how wide-ranging -- and beautiful -- working with data can be. Join 39 contributors as they explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video. With Beautiful Data, you will:

  • Explore the opportunities and challenges involved in working with the vast number of datasets made available by the Web
  • Learn how to visualize trends in urban crime, using maps and data mashups
  • Discover the challenges of designing a data processing system that works within the constraints of space travel
  • Learn how crowdsourcing and transparency have combined to advance the state of drug research
  • Understand how new data can automatically trigger alerts when it matches or overlaps pre-existing data
  • Learn about the massive infrastructure required to create, capture, and process DNA data

That's only a small sample of what you'll find in Beautiful Data. For anyone who handles data, this is a truly fascinating book. Contributors include:

  • Nathan Yau
  • Jonathan Follett and Matt Holm
  • J.M. Hughes
  • Raghu Ramakrishnan, Brian Cooper, and Utkarsh Srivastava
  • Jeff Hammerbacher
  • Jason Dykes and Jo Wood
  • Jeff Jonas and Lisa Sokol
  • Jud Valeski
  • Alon Halevy and Jayant Madhavan
  • Aaron Koblin with Valdean Klump
  • Michal Migurski
  • Jeff Heer
  • Coco Krumme
  • Peter Norvig
  • Matt Wood and Ben Blackburne
  • Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen
  • Lukas Biewald and Brendan O'Connor
  • Hadley Wickham, Deborah Swayne, and David Poole
  • Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza
  • Toby Segaran

Author(s): Toby Segaran, Jeff Hammerbacher
Edition: 1
Publisher: O'Reilly Media
Year: 2009

Language: English
Pages: 383

Contents
Preface
How This Book Is Organized
Conventions Used in This Book
How to Contact Us
Safari® Books Online
Seeing Your Life in Data
Personal Environmental Impact Report (PEIR)
Personal Data Collection
Asynchronous data collection
Data Storage
Data Processing
Data Visualization
Mapping multivariate location traces
Choosing a color scheme
Displaying distributions
Sharing personal data
YFD
The Point
How to Participate
What Is UX?
The Benefits of Applying UX Best Practices to Data Collection
Challenges of Accessibility
Length of survey
Design Philosophy
Designing the Form Layout
Giving them some space
Interaction design considerations: Dynamic form length
Designing trust
Designing for accurate data collection
Reporting the live data results
Results and Reflection
Introduction
Some Background
To Pack or Not to Pack
The Three Tasks
Slotting the Images
Passing the Image: Communication Among the Three Tasks
Getting the Picture: Image Download and Processing
Image Compression
Conclusion
Introduction
The Challenge
Our Approach
More on mastership
Supporting ordered data
Trading off consistency for availability
The Challenge
Our Approach
Google’s BigTable
Amazon’s Dynamo
Other Systems at Yahoo!
Acknowledgments
References
Libraries and Brains
Facebook Becomes Self-Aware
A Business Intelligence System
The Death and Rebirth of a Data Warehouse
Beyond the Data Warehouse
The Cheetah and the Elephant
The Unreasonable Effectiveness of Data
New Tools and Applied Research
MAD Skills and Cosmos
The Data Scientist
Conclusion
The Geographic Beauty of a Photographic Archive
Beauty in Data: Geograph
What Is Beauty in Visual Data Exploration?
Making Treemaps Beautiful: A Geographic Perspective
A Geographic Perspective on Geograph Term Use
Representing the Term Hierarchy
Representing Relative Location with Spatial Treemaps
Representing Location Displacement
Beauty in Discovery
Acknowledgments
References
Introduction
The Benefits of Just-in-Time Discovery
Corruption at the Roulette Wheel
Federated Search Ain’t All That
Directories: Priceless
Components and Special Considerations
The Ability to Make Assertions (Same or Related) About New Observations
The Ability to Notify the Appropriate Entity of Such Insight
Conclusion
Introduction
XMPP
Formats
APIs
Rate limiting
Zero miles per gallon efficiency
Events
HTML 5 events
WAN Scale Events
Social Data Normalization
Business Value of Data
Public versus private
Conclusion: Mediation via Gnip
What Is the Deep Web?
Alternatives to Offering Deep-Web Access
Basics of HTML Form Processing
Queries and Query Templates
Selecting Input Combinations
Quality of query templates
Informativeness test
Searching for informative query templates
Predicting Input Values
Generic text inputs
Typed text inputs
References
How It All Started
The Data Capture Equipment
Velodyne Lidar
Geometric Informatics
The Data
The Outdoor Lidar Shoot
The Indoor Lidar Shoot
The Indoor GeoVideo Shoot
Post-Processing the Data
Launching the Video
Conclusion
Introduction
Background
Cracking the Nut
Making It Public
Revisiting
Conclusion
The Design of Sense.us
Visualization and Social Data Analysis
Data
Visualization
Provide effective visual encodings
Be engaging and playful
Birthplace Voyager
Population pyramid
View Sharing
Doubly Linked Discussion
Pointing via Graphical Annotation
Collecting and Linking Views
Awareness and Social Navigation
Hunting for Patterns
Making Sense of It All
Crowd Surfing
References
What Data Doesn’t Do
When Doesn’t Data Drive?
1. More Data Isn’t Always Better
3. Data Alone Doesn’t Explain
4. Data Isn’t Good for a Single Answer
5. Data Doesn’t Predict
8. The Real World Doesn’t Create Random Variables
9. Data Doesn’t Stand Alone
Conclusion
References
Natural Language Corpus Data
Word Segmentation
Secret Codes
Spelling Correction
Author Identification (Stylometry)
Discussion and Conclusion
Acknowledgments
DNA As a Data Store
Hacking Your DNA Data Store with Drugs
Cancer
Replication
Cracking the Code
Evolution As an Algorithm
DNA As a Data Source
A Quantum Leap
“My God, It’s Full of Bases...”
Fighting the Data Deluge
Project management
Flexible Data Capture
Instrument and Data Management
The Era of Big Data
Acknowledgments
The Problem with Real Data
Providing the Raw Data Back to the Notebook
Validating Crowdsourced Data
Unique Identifiers for Chemical Entities
Open Data and Accessible Services Enable a Wide Range of Visualization and Analysis Options
Integrating Data with a Central Aggregation Service
Enabling Data Integration via Unique Identifiers and Self-Describing Data Formats
Closing the Loop: Visualizations to Suggest New Experiments
Building a Data Web from Open Data and Free Services
References
Introduction
Preprocessing the Data
Exploring the Data
Age, Attractiveness, and Gender
Looking at Tags
Which Words Are Gendered?
Clustering
References
Introduction
How Did We Get the Data?
Data Checking
Analysis
The Influence of Inflation
The Rich Get Richer and the Poor Get Poorer
Geographic Differences
Census Information
Exploring San Francisco
Conclusion
References
Beautiful Political Data
Example 1: Redistricting and Partisan Bias
Example 2: Time Series of Estimates
Example 4: Public Opinion and Senate Voting on Supreme Court Nominees
Example 5: Localized Partisanship in Pennsylvania
References
Connecting Data
What Public Data Is There, Really?
The Possibilities of Connected Data
Within Companies
The Representation Problem
Shared Nouns and Shared Verbs
The Same Thing with Different Names
Possible Solutions
Collective Reconciliation
Conclusion
Contributors
Index