Beautiful Data: The Stories Behind Elegant Data Solutions

... The contents are less impressive: O'Reilly bring together a heterogeneous group of authors and let them fend for themselves, with no editorial effort to unite their stories. Some authors hold their own, presenting interesting analyses and visualizations, or just interesting tales; others are less successful. (The spectrum of statistical expertise, for example, is bounded by Andrew Gelman and a graduate student believing that normality is a requirement of the central limit theorem.) 'Interesting' is a good thing, but for $40 I would like 'useful', and here the book comes up short. An appealing leisure read, but not much more, I am afraid.

Author(s): Toby Segaran, Jeff Hammerbacher
Edition: 1
Publisher: O'Reilly Media
Year: 2009

Language: English
Pages: 384

Contents
Preface
How This Book Is Organized
Conventions Used in This Book
How to Contact Us
Safari® Books Online
Seeing Your Life in Data
Personal Environmental Impact Report (PEIR)
Personal Data Collection
Asynchronous data collection
Data Storage
Data Processing
Data Visualization
Mapping multivariate location traces
Choosing a color scheme
Displaying distributions
Sharing personal data
YFD
The Point
How to Participate
What Is UX?
The Benefits of Applying UX Best Practices to Data Collection
Challenges of Accessibility
Length of survey
Design Philosophy
Designing the Form Layout
Giving them some space
Interaction design considerations: Dynamic form length
Designing trust
Designing for accurate data collection
Reporting the live data results
Results and Reflection
Introduction
Some Background
To Pack or Not to Pack
The Three Tasks
Slotting the Images
Passing the Image: Communication Among the Three Tasks
Getting the Picture: Image Download and Processing
Image Compression
Conclusion
Introduction
The Challenge
Our Approach
More on mastership
Supporting ordered data
Trading off consistency for availability
The Challenge
Our Approach
Google’s BigTable
Amazon’s Dynamo
Other Systems at Yahoo!
Acknowledgments
References
Libraries and Brains
Facebook Becomes Self-Aware
A Business Intelligence System
The Death and Rebirth of a Data Warehouse
Beyond the Data Warehouse
The Cheetah and the Elephant
The Unreasonable Effectiveness of Data
New Tools and Applied Research
MAD Skills and Cosmos
The Data Scientist
Conclusion
The Geographic Beauty of a Photographic Archive
Beauty in Data: Geograph
What Is Beauty in Visual Data Exploration?
Making Treemaps Beautiful: A Geographic Perspective
A Geographic Perspective on Geograph Term Use
Representing the Term Hierarchy
Representing Relative Location with Spatial Treemaps
Representing Location Displacement
Beauty in Discovery
Acknowledgments
References
Introduction
The Benefits of Just-in-Time Discovery
Corruption at the Roulette Wheel
Federated Search Ain’t All That
Directories: Priceless
Components and Special Considerations
The Ability to Make Assertions (Same or Related) About New Observations
The Ability to Notify the Appropriate Entity of Such Insight
Conclusion
Introduction
XMPP
Formats
APIs
Rate limiting
Zero miles per gallon efficiency
Events
HTML 5 events
WAN Scale Events
Social Data Normalization
Business Value of Data
Public versus private
Conclusion: Mediation via Gnip
What Is the Deep Web?
Alternatives to Offering Deep-Web Access
Basics of HTML Form Processing
Queries and Query Templates
Selecting Input Combinations
Quality of query templates
Informativeness test
Searching for informative query templates
Predicting Input Values
Generic text inputs
Typed text inputs
References
How It All Started
The Data Capture Equipment
Velodyne Lidar
Geometric Informatics
The Data
The Outdoor Lidar Shoot
The Indoor Lidar Shoot
The Indoor GeoVideo Shoot
Post-Processing the Data
Launching the Video
Conclusion
Introduction
Background
Cracking the Nut
Making It Public
Revisiting
Conclusion
The Design of Sense.us
Visualization and Social Data Analysis
Data
Visualization
Provide effective visual encodings
Be engaging and playful
Birthplace Voyager
Population pyramid
View Sharing
Doubly Linked Discussion
Pointing via Graphical Annotation
Collecting and Linking Views
Awareness and Social Navigation
Hunting for Patterns
Making Sense of It All
Crowd Surfing
References
What Data Doesn’t Do
When Doesn’t Data Drive?
1. More Data Isn’t Always Better
3. Data Alone Doesn’t Explain
4. Data Isn’t Good for a Single Answer
5. Data Doesn’t Predict
8. The Real World Doesn’t Create Random Variables
9. Data Doesn’t Stand Alone
Conclusion
References
Natural Language Corpus Data
Word Segmentation
Secret Codes
Spelling Correction
Author Identification (Stylometry)
Discussion and Conclusion
Acknowledgments
DNA As a Data Store
Hacking Your DNA Data Store with Drugs
Cancer
Replication
Cracking the Code
Evolution As an Algorithm
DNA As a Data Source
A Quantum Leap
“My God, It’s Full of Bases...”
Fighting the Data Deluge
Project management
Flexible Data Capture
Instrument and Data Management
The Era of Big Data
Acknowledgments
The Problem with Real Data
Providing the Raw Data Back to the Notebook
Validating Crowdsourced Data
Unique Identifiers for Chemical Entities
Open Data and Accessible Services Enable a Wide Range of Visualization and Analysis Options
Integrating Data with a Central Aggregation Service
Enabling Data Integration via Unique Identifiers and Self-Describing Data Formats
Closing the Loop: Visualizations to Suggest New Experiments
Building a Data Web from Open Data and Free Services
References
Introduction
Preprocessing the Data
Exploring the Data
Age, Attractiveness, and Gender
Looking at Tags
Which Words Are Gendered?
Clustering
References
Introduction
How Did We Get the Data?
Data Checking
Analysis
The Influence of Inflation
The Rich Get Richer and the Poor Get Poorer
Geographic Differences
Census Information
Exploring San Francisco
Conclusion
References
Beautiful Political Data
Example 1: Redistricting and Partisan Bias
Example 2: Time Series of Estimates
Example 4: Public Opinion and Senate Voting on Supreme Court Nominees
Example 5: Localized Partisanship in Pennsylvania
References
Connecting Data
What Public Data Is There, Really?
The Possibilities of Connected Data
Within Companies
The Representation Problem
Shared Nouns and Shared Verbs
The Same Thing with Different Names
Possible Solutions
Collective Reconciliation
Conclusion
Contributors
Index