While Web 2.0 was about data, Web 3.0 is about knowledge and information. Scripting Intelligence: Web 3.0 Information Gathering and Processing offers the reader Ruby scripts for intelligent information management in a Web 3.0 environment—including information extraction from text, using Semantic Web technologies, information gathering (relational database metadata, web scraping, Wikipedia, Freebase), combining information from multiple sources, and strategies for publishing processed information. This book will be a valuable tool for anyone needing to gather, process, and publish web or database information across the modern web environment.
- Text processing recipes, including speech tagging and automatic summarization
- Gathering, visualizing, and publishing information from the Semantic Web
- Information gathering from traditional sources such as relational databases and web sites
What you’ll learn
- Gather and process information within the Web 3.0 environment.
- See the flexibility of scripting with Ruby to gather and process information.
- Extract text from various document formats.
- Work with the RDF data model and SPARQL query language, the foundations of the Semantic Web.
- Use GraphViz for data visualization.
- Extract information from relational databases and web sites.
Who is this book for?
- Anyone needing to gather and display information available in electronic formats
- Programmers needing to tag, summarize, or publish information
- Ruby programmers and computer enthusiasts interested in seeing what Ruby can do with information management and Semantic Web tools
- Academic researchers needing to extract and organize information in a more automated way
Author(s): Mark Watson
Series: Expert's Voice in Open Source
Edition: 1
Publisher: Apress
Year: 2009
Language: English
Pages: 394
Prelims
Contents at a Glance
Contents
About the Author
About the Technical Reviewer
Acknowledgments
Reasons for Using Ruby
Book Contents
Book Software
Development Tools
Representing Styled Text
Binary Document Formats
HTML and XHTML
OpenDocument
RSS
Atom
Handling PDF Files
GNU Metadata Extractor Library
Wrapup
Removing HTML Tags
Using REXML
Using Nokogiri
Segmenting Text
Spell-Checking Text
Recognizing and Removing Noise Characters from Text
Custom Text Processing
Wrapup
Natural Language Processing
Automating Text Categorization
Using Word-Count Statistics for Categorization
Using a Bayesian Classifier for Categorization
Using LSI for Categorization
Using Bayesian Classification and LSI Summarization
Extracting Entities from Text
Performing Entity Extraction Using Open Calais
Automatically Generating Summaries
Determining the "Sentiment" of Text
K-means Document Clustering
Clustering Documents with Word-Use Intersections
Combining the TextResource Class with NLP Code
Wrapup
Using RDF and RDFS Data Formats
Understanding RDF
Understanding RDFS
Understanding OWL
Converting Between RDF Formats
Working with the Protégé Ontology Editor
Exploring Logic and Inference
Creating SPARQL Queries
Accessing SPARQL Endpoint Services
Using the Linked Movie Database SPARQL Endpoint
Using the World Fact Book SPARQL Endpoint
Wrapup
Installing Redland RDF Ruby Bindings
Using the Sesame RDF Data Store
Embedding Sesame in JRuby Applications
Using the AllegroGraph RDF Data Store
Sources of Large RDF Data Sets
Loading RDF Data from UMBEL into Redland
Loading SEC RDF Data from RdfAbout.com into Sesame
Loading SEC RDF Data from RdfAbout.com into AllegroGraph
Wrapup
URI
RDF Typed Literal
Blank RDF Node
RDF Graph
Comparing SPARQL Query Syntax
SPARQL SELECT Queries
SPARQL CONSTRUCT Queries
SPARQL DESCRIBE Queries
Implementing Reasoning and Inference
RDFS Inferencing: Type Propagation Rule for Property Inheritance
Using rdfs:range and rdfs:domain to Infer Triples
Combining RDF Repositories That Use Different Schemas
Wrapup
Implementing SPARQL Endpoint Web Portals
Designing and Implementing a Common Front End
Designing and Implementing the JRuby and Sesame Back End
Designing and Implementing the Ruby and Redland Back End
Modifying the Portal to Accept New RDF Data in Real Time
Monkey-Patching the RedlandBackend Class
Modifying the portal.rb Script to Automatically Load New RDF Files
Modifying the Portal to Generate Graphviz RDF Diagrams
Wrapup
Working with Relational Databases
Doing ORM with ActiveRecord
Quick-Start Tutorial
One-to-Many Relationships
Handling Transactions
Handling Callbacks and Observers in ActiveRecord
Modifying Default Behavior
Using SQL Queries
Accessing Metadata
Doing ORM with DataMapper
Quick-Start Tutorial
Migrating to New Database Schemas
Modifying Default Behavior
Handling Callbacks and Observers in DataMapper
Wrapup
Using JRuby and Lucene
Doing Spatial Search Using Geohash
Using Solr Web Services
Using Nutch with Ruby Clients
Installing Sphinx
Installing Thinking Sphinx
Using PostgreSQL Full-Text Search
Developing a Ruby Client Script
Integrating PostgreSQL Text Search with ActiveRecord
Using MySQL Full-Text Search
Using MySQL SQL Full-Text Functions
Integrating MySQL Text Search with ActiveRecord
Wrapup
Using Web Scraping to Create Semantic Relations
Using Firebug to Find HTML Elements on Web Pages
Example Use of scRUBYt!
Database Schema for Storing Web-Scraped Recipes
Storing Recipes from CJsKitchen.com in a Local Database
Example Use of FireWatir
Storing Recipes from CookingSpace.com in a Local Database
Extending the ScrapedRecipe Class
Graphviz Visualization for Relations Between Recipes
RDFS Modeling of Relations Between Recipes
Automatically Generating RDF Relations Between Recipes
Comparing the Use of RDF and Relational Databases
Wrapup
Taking Advantage of Linked Data
Producing Linked Data Using D2R
DBpedia
Freebase
Open Calais
Wrapup
Database Master/Slave Setup for PostgreSQL
Database Master/Slave Setup for MySQL
Database Sharding
Using memcached
Using memcached with ActiveRecord
Using memcached with Web-Service Calls
Using CouchDB
Saving Wikipedia Articles in CouchDB
Reading Wikipedia Article Data from CouchDB
Using Amazon S3
Using Amazon EC2
Wrapup
Creating Web Mashups
Using the Twitter Gem
Google Maps API Overview
Using the YM4R/GM Rails Plugin
An Example Rails Mashup Web Application
MashupController Class
Handling Large Cookies
Wrapup
Performing Large-Scale Data Processing
Using the Distributed Map/Reduce Algorithm
Installing Hadoop
Running the Ruby Map/Reduce Functions
Creating an Inverted Word Index with the Ruby Map/Reduce Functions
Creating an Inverted Person-Name Index with the Ruby Map/Reduce Functions
Creating an Inverted Person-Name Index with Java Map/Reduce Functions
Running with Larger Data Sets
Running the Ruby Map/Reduce Example Using Amazon Elastic MapReduce
Wrapup
Searching for People’s Names on Wikipedia
Using the auto_complete Rails Plugin with a Generated auto_complete_for Method
Using the auto_complete Rails Plugin with a Custom auto_complete_for Method
A Personal "Interesting Things" Web Application
Back-End Processing
Rails User Interface
Web-Service APIs Defined in the Web-Service Controller
SPARQL Endpoint for the Interesting Things Application
Scaling Up
Wrapup
Using the AMI with Book Examples
Publishing HTML or RDF Based on HTTP Request Headers
Handling Data Requests in a Rails Example
Introducing RDFa
The RDFa Ruby Gem
Implementing a Rails Application Using the RDFa Gem
Index