Regular Expressions Cookbook: Detailed Solutions in Eight Programming Languages

Take the guesswork out of using regular expressions. With more than 140 practical recipes, this cookbook provides everything you need to solve a wide range of real-world problems. Novices will learn basic skills and tools, and programmers and experienced users will find a wealth of detail. Each recipe provides samples you can use right away.

This revised edition covers the regular expression flavors used by C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET. You’ll learn powerful new tricks, avoid flavor-specific gotchas, and save valuable time with this huge library of practical solutions.

  • Learn regular expression basics through a detailed tutorial
  • Use code listings to implement regular expressions with your language of choice
  • Understand how regular expressions differ from language to language
  • Handle common user input with recipes for validation and formatting
  • Find and manipulate words, special characters, and lines of text
  • Detect integers, floating-point numbers, and other numerical formats
  • Parse source code and process log files
  • Use regular expressions in URLs, paths, and IP addresses
  • Manipulate HTML, XML, and data exchange formats
  • Discover little-known regular expression tricks and techniques

Author(s): Jan Goyvaerts, Steven Levithan
Edition: 2
Publisher: O'Reilly Media
Year: 2012

Language: English
Commentary: Revision History for the Second Edition: 2012-08-10 First release
Pages: 612
City: Sebastopol, CA
Tags: Computer Programming; Text Processing; Regular Expressions; Computer Science

Table of Contents
Preface
Caught in the Snarls of Different Versions
Intended Audience
Technology Covered
Organization of This Book
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
Chapter 1. Introduction to Regular Expressions
Regular Expressions Defined
Many Flavors of Regular Expressions
Regex Flavors Covered by This Book
Search and Replace with Regular Expressions
Many Flavors of Replacement Text
Tools for Working with Regular Expressions
RegexBuddy
RegexPal
RegexMagic
More Online Regex Testers
RegexPlanet
regex.larsolavtorvik.com
Nregex
Rubular
myregexp.com
More Desktop Regular Expression Testers
Expresso
The Regulator
SDL Regex Fuzzer
grep
PowerGREP
Windows Grep
RegexRenamer
Popular Text Editors
Chapter 2. Basic Regular Expression Skills
2.1  Match Literal Text
Problem
Solution
Discussion
Variations
Block escape
Case-insensitive matching
See Also
2.2  Match Nonprintable Characters
Problem
Solution
Discussion
Variations on Representations of Nonprinting Characters
The 26 control characters
The 7-bit character set
See Also
2.3  Match One of Many Characters
Problem
Solution
Calendar with misspellings
Hexadecimal character
Nonhexadecimal character
Discussion
Variations
Shorthands
Case insensitivity
Flavor-Specific Features
.NET character class subtraction
Java character class union, intersection, and subtraction
See Also
2.4  Match Any Character
Problem
Solution
Any character except line breaks
Any character including line breaks
Discussion
Any character except line breaks
Any character including line breaks
Dot abuse
Variations
See Also
2.5  Match Something at the Start and/or the End of a Line
Problem
Solution
Start of the subject
End of the subject
Start of a line
End of a line
Discussion
Anchors and lines
Start of the subject
End of the subject
Start of a line
End of a line
Zero-length matches
Variations
See Also
2.6  Match Whole Words
Problem
Solution
Word boundaries
Nonboundaries
Discussion
Word boundaries
Nonboundaries
Word Characters
See Also
2.7  Unicode Code Points, Categories, Blocks, and Scripts
Problem
Solution
Unicode code point
Unicode category
Unicode block
Unicode script
Unicode grapheme
Discussion
Unicode code point
Unicode category
Unicode block
Unicode script
Unicode grapheme
Variations
Negated variant
Character classes
Listing all characters
See Also
2.8  Match One of Several Alternatives
Problem
Solution
Discussion
See Also
2.9  Group and Capture Parts of the Match
Problem
Solution
Discussion
Variations
Noncapturing groups
Group with mode modifiers
See Also
2.10  Match Previously Matched Text Again
Problem
Solution
Discussion
See Also
2.11  Capture and Name Parts of the Match
Problem
Solution
Named capture
Named backreferences
Discussion
Named capture
Named backreferences
Groups with the same name
See Also
2.12  Repeat Part of the Regex a Certain Number of Times
Problem
Solution
Googol
Hexadecimal number
Hexadecimal number with optional suffix
Floating-point number
Discussion
Fixed repetition
Variable repetition
Infinite repetition
Making something optional
Repeating groups
See Also
2.13  Choose Minimal or Maximal Repetition
Problem
Solution
Discussion
See Also
2.14  Eliminate Needless Backtracking
Problem
Solution
Discussion
See Also
2.15  Prevent Runaway Repetition
Problem
Solution
Discussion
Variations
See Also
2.16  Test for a Match Without Adding It to the Overall Match
Problem
Solution
Discussion
Lookaround
Negative lookaround
Different levels of lookbehind
Matching the same text twice
Lookaround is atomic
Alternative to Lookbehind
Solution Without Lookbehind
See Also
2.17  Match One of Two Alternatives Based on a Condition
Problem
Solution
Discussion
See Also
2.18  Add Comments to a Regular Expression
Problem
Solution
Discussion
Free-spacing mode
Java has free-spacing character classes
Variations
2.19  Insert Literal Text into the Replacement Text
Problem
Solution
Discussion
When and how to escape characters in replacement text
.NET and JavaScript
Java
PHP
Perl
Python and Ruby
More escape rules for string literals
See Also
2.20  Insert the Regex Match into the Replacement Text
Problem
Solution
Regular expression
Replacement
Discussion
See Also
2.21  Insert Part of the Regex Match into the Replacement Text
Problem
Solution
Regular expression
Replacement
Discussion
Replacements using capturing groups
$10 and higher
References to nonexistent groups
Solution Using Named Capture
Regular expression
Replacement
Flavors that support named capture
See Also
2.22  Insert Match Context into the Replacement Text
Problem
Solution
Discussion
See Also
Chapter 3. Programming with Regular Expressions
Programming Languages and Regex Flavors
Languages Covered in This Chapter
More Programming Languages
3.1  Literal Regular Expressions in Source Code
Problem
Solution
C#
VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
Discussion
C#
VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
See Also
3.2  Import the Regular Expression Library
Problem
Solution
C#
VB.NET
XRegExp
Java
Python
Discussion
C#
VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
3.3  Create Regular Expression Objects
Problem
Solution
C#
VB.NET
Java
JavaScript
XRegExp
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
Compiling a Regular Expression Down to CIL
C#
VB.NET
Discussion
See Also
3.4  Set Regular Expression Options
Problem
Solution
C#
VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
Additional Language-Specific Options
.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
See Also
3.5  Test If a Match Can Be Found Within a Subject String
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
C# and VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
See Also
3.6  Test Whether a Regex Matches the Subject String Entirely
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
C# and VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
See Also
3.7  Retrieve the Matched Text
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
PHP
Perl
Python
Ruby
See Also
3.8  Determine the Position and Length of the Match
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
PHP
Perl
Python
Ruby
See Also
3.9  Retrieve Part of the Matched Text
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Named Capture
C#
VB.NET
Java
XRegExp
PHP
Perl
Python
Ruby
See Also
3.10  Retrieve a List of All Matches
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
PHP
Perl
Python
Ruby
See Also
3.11  Iterate over All Matches
Problem
Solution
C#
VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
See Also
3.12  Validate Matches in Procedural Code
Problem
Solution
C#
VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
Discussion
See Also
3.13  Find a Match Within Another Match
Problem
Solution
C#
VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
Discussion
See Also
3.14  Replace All Matches
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
PHP
Perl
Python
Ruby
See Also
3.15  Replace Matches Reusing Parts of the Match
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Named Capture
C#
VB.NET
Java 7
XRegExp
PHP
Perl
Python
Ruby
See Also
3.16  Replace Matches with Replacements Generated in Code
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
See Also
3.17  Replace All Matches Within the Matches of Another Regex
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
See Also
3.18  Replace All Matches Between the Matches of Another Regex
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
Perl and Ruby
Python
See Also
3.19  Split a String
Problem
Solution
C#
VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
Discussion
C# and VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
See Also
3.20  Split a String, Keeping the Regex Matches
Problem
Solution
C#
VB.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
Discussion
.NET
Java
JavaScript
XRegExp
PHP
Perl
Python
Ruby
See Also
3.21  Search Line by Line
Problem
Solution
C#
VB.NET
Java
JavaScript
PHP
Perl
Python
Ruby
Discussion
See Also
3.22  Construct a Parser
Problem
Solution
C#
VB.NET
Java
JavaScript
XRegExp
Perl
Python
PHP
Ruby
Discussion
See Also
Chapter 4. Validation and Formatting
4.1  Validate Email Addresses
Problem
Solution
Simple
Simple, with restrictions on characters
Simple, with all valid local part characters
No leading, trailing, or consecutive dots
Top-level domain has two to six letters
Discussion
About email addresses
Regular expression syntax
Building a regex step-by-step
Variations
See Also
4.2  Validate and Format North American Phone Numbers
Problem
Solution
Regular expression
Replacement
C# example
JavaScript example
Other programming languages
Discussion
Variations
Eliminate invalid phone numbers
Find phone numbers in documents
Allow a leading “1”
Allow seven-digit phone numbers
See Also
4.3  Validate International Phone Numbers
Problem
Solution
Regular expression
JavaScript example
Discussion
Variations
Validate international phone numbers in EPP format
See Also
4.4  Validate Traditional Date Formats
Problem
Solution
Discussion
Variations
See Also
4.5  Validate Traditional Date Formats, Excluding Invalid Dates
Problem
Solution
C#
Perl
Pure regular expression
Discussion
Regex with procedural code
Pure regular expression
Variations
See Also
4.6  Validate Traditional Time Formats
Problem
Solution
Discussion
Variations
See Also
4.7  Validate ISO 8601 Dates and Times
Problem
Solution
Dates
Weeks
Times
Date and time
XML Schema dates and times
Discussion
See Also
4.8  Limit Input to Alphanumeric Characters
Problem
Solution
Regular expression
Ruby example
Discussion
Variations
Limit input to ASCII characters
Limit input to ASCII noncontrol characters and line breaks
Limit input to shared ISO-8859-1 and Windows-1252 characters
Limit input to alphanumeric characters in any language
See Also
4.9  Limit the Length of Text
Problem
Solution
Regular expression
Perl example
Discussion
Variations
Limit the length of an arbitrary pattern
Limit the number of nonwhitespace characters
Limit the number of words
See Also
4.10  Limit the Number of Lines in Text
Problem
Solution
Regular expression
PHP (PCRE) example
Discussion
Variations
Working with esoteric line separators
See Also
4.11  Validate Affirmative Responses
Problem
Solution
Regular expression
JavaScript example
Discussion
See Also
4.12  Validate Social Security Numbers
Problem
Solution
Regular expression
Python example
Discussion
Variations
Find Social Security numbers in documents
See Also
4.13  Validate ISBNs
Problem
Solution
Regular expressions
JavaScript example, with checksum validation
Python example, with checksum validation
Discussion
ISBN-10 checksum
ISBN-13 checksum
Variations
Find ISBNs in documents
Eliminate incorrect ISBN identifiers
See Also
4.14  Validate ZIP Codes
Problem
Solution
Regular expression
VB.NET example
Discussion
See Also
4.15  Validate Canadian Postal Codes
Problem
Solution
Discussion
See Also
4.16  Validate U.K. Postcodes
Problem
Solution
Discussion
See Also
4.17  Find Addresses with Post Office Boxes
Problem
Solution
Regular expression
C# example
Discussion
See Also
4.18  Reformat Names From “FirstName LastName” to “LastName, FirstName”
Problem
Solution
Regular expression
Replacement
JavaScript example
Discussion
Variations
List surname particles at the beginning of the name
See Also
4.19  Validate Password Complexity
Problem
Solution
Length between 8 and 32 characters
ASCII visible and space characters only
One or more uppercase letters
One or more lowercase letters
One or more numbers
One or more special characters
Disallow three or more sequential identical characters
Example JavaScript solution, basic
Example JavaScript solution, with x out of y validation
Example JavaScript solution, with password security ranking
Discussion
Example JavaScript solutions
Variations
Validate multiple password rules with a single regex
See Also
4.20  Validate Credit Card Numbers
Problem
Solution
Strip spaces and hyphens
Validate the number
Example web page with JavaScript
Discussion
Strip spaces and hyphens
Validate the number
Incorporating the solution into a web page
Extra Validation with the Luhn Algorithm
See Also
4.21  European VAT Numbers
Problem
Solution
Strip whitespace and punctuation
Validate the number
Discussion
Strip whitespace and punctuation
Validate the number
Variations
See Also
Chapter 5. Words, Lines, and Special Characters
5.1  Find a Specific Word
Problem
Solution
Discussion
See Also
5.2  Find Any of Multiple Words
Problem
Solution
Using alternation
Example JavaScript solution
Discussion
Using alternation
Example JavaScript solution
See Also
5.3  Find Similar Words
Problem
Solution
Color or colour
Bat, cat, or rat
Words ending with “phobia”
Steve, Steven, or Stephen
Variations of “regular expression”
Discussion
Use word boundaries to match complete words
Color or colour
Bat, cat, or rat
Words ending with “phobia”
Steve, Steven, or Stephen
Variations of “regular expression”
See Also
5.4  Find All Except a Specific Word
Problem
Solution
Discussion
Variations
Find words that don’t contain another word
See Also
5.5  Find Any Word Not Followed by a Specific Word
Problem
Solution
Discussion
Variations
See Also
5.6  Find Any Word Not Preceded by a Specific Word
Problem
Solution
Lookbehind you
Words not preceded by “cat”
Simulate lookbehind
Discussion
Fixed, finite, and infinite length lookbehind
Simulate lookbehind
Variations
See Also
5.7  Find Words Near Each Other
Problem
Solution
Discussion
Variations
Using a conditional
Match three or more words near each other
Exponentially increasing permutations
The ugly solution
Exploiting empty backreferences
JavaScript backreferences by its own rules
Multiple words, any distance from each other
See Also
5.8  Find Repeated Words
Problem
Solution
Discussion
Variations
See Also
5.9  Remove Duplicate Lines
Problem
Solution
Option 1: Sort lines and remove adjacent duplicates
Option 2: Keep the last occurrence of each duplicate line in an unsorted file
Option 3: Keep the first occurrence of each duplicate line in an unsorted file
Discussion
Option 1: Sort lines and remove adjacent duplicates
Option 2: Keep the last occurrence of each duplicate line in an unsorted file
Option 3: Keep the first occurrence of each duplicate line in an unsorted file
See Also
5.10  Match Complete Lines That Contain a Word
Problem
Solution
Discussion
Variations
See Also
5.11  Match Complete Lines That Do Not Contain a Word
Problem
Solution
Discussion
See Also
5.12  Trim Leading and Trailing Whitespace
Problem
Solution
Discussion
Variations
See Also
5.13  Replace Repeated Whitespace with a Single Space
Problem
Solution
Clean any whitespace characters
Clean horizontal whitespace characters
Discussion
Clean any whitespace characters
Clean horizontal whitespace characters
See Also
5.14  Escape Regular Expression Metacharacters
Problem
Solution
Built-in solutions
Regular expression
Replacement
Example JavaScript function
Discussion
Variations
See Also
Chapter 6. Numbers
6.1  Integer Numbers
Problem
Solution
Discussion
See Also
6.2  Hexadecimal Numbers
Problem
Solution
Discussion
See Also
6.3  Binary Numbers
Problem
Solution
Discussion
See Also
6.4  Octal Numbers
Problem
Solution
Discussion
See Also
6.5  Decimal Numbers
Problem
Solution
Discussion
See Also
6.6  Strip Leading Zeros
Problem
Solution
Regular expression
Replacement
Getting the numbers in Perl
Stripping leading zeros in PHP
Discussion
See Also
6.7  Numbers Within a Certain Range
Problem
Solution
Discussion
See Also
6.8  Hexadecimal Numbers Within a Certain Range
Problem
Solution
Discussion
See Also
6.9  Integer Numbers with Separators
Problem
Solution
Discussion
See Also
6.10  Floating-Point Numbers
Problem
Solution
Discussion
See Also
6.11  Numbers with Thousand Separators
Problem
Solution
Discussion
See Also
6.12  Add Thousand Separators to Numbers
Problem
Solution
Basic solution
Match separator positions only, using lookbehind
Discussion
Introduction
Basic solution
Match separator positions only, using lookbehind
Variations
Don’t add commas after a decimal point
Use infinite lookbehind
Search-and-replace within matched numbers
See Also
6.13  Roman Numerals
Problem
Solution
Discussion
Convert Roman Numerals to Decimal
See Also
Chapter 7. Source Code and Log Files
7.1  Keywords
Problem
Solution
Discussion
Variations
See Also
7.2  Identifiers
Problem
Solution
Discussion
See Also
7.3  Numeric Constants
Problem
Solution
Discussion
See Also
7.4  Operators
Problem
Solution
Discussion
7.5  Single-Line Comments
Problem
Solution
Discussion
See Also
7.6  Multiline Comments
Problem
Solution
Discussion
Variations
See Also
7.7  All Comments
Problem
Solution
Discussion
See Also
7.8  Strings
Problem
Solution
Discussion
Variations
See Also
7.9  Strings with Escapes
Problem
Solution
Discussion
Variations
See Also
7.10  Regex Literals
Problem
Solution
Discussion
See Also
7.11  Here Documents
Problem
Solution
Discussion
See Also
7.12  Common Log Format
Problem
Solution
Discussion
Variations
See Also
7.13  Combined Log Format
Problem
Solution
Discussion
See Also
7.14  Broken Links Reported in Web Logs
Problem
Solution
Discussion
See Also
Chapter 8. URLs, Paths, and Internet Addresses
8.1  Validating URLs
Problem
Solution
Discussion
See Also
8.2  Finding URLs Within Full Text
Problem
Solution
Discussion
See Also
8.3  Finding Quoted URLs in Full Text
Problem
Solution
Discussion
See Also
8.4  Finding URLs with Parentheses in Full Text
Problem
Solution
Discussion
See Also
8.5  Turn URLs into Links
Problem
Solution
Discussion
See Also
8.6  Validating URNs
Problem
Solution
Discussion
See Also
8.7  Validating Generic URLs
Problem
Solution
Discussion
See Also
8.8  Extracting the Scheme from a URL
Problem
Solution
Extract the scheme from a URL known to be valid
Extract the scheme while validating the URL
Discussion
See Also
8.9  Extracting the User from a URL
Problem
Solution
Extract the user from a URL known to be valid
Extract the user while validating the URL
Discussion
See Also
8.10  Extracting the Host from a URL
Problem
Solution
Extract the host from a URL known to be valid
Extract the host while validating the URL
Discussion
See Also
8.11  Extracting the Port from a URL
Problem
Solution
Extract the port from a URL known to be valid
Extract the port while validating the URL
Discussion
See Also
8.12  Extracting the Path from a URL
Problem
Solution
Discussion
See Also
8.13  Extracting the Query from a URL
Problem
Solution
Discussion
See Also
8.14  Extracting the Fragment from a URL
Problem
Solution
Discussion
See Also
8.15  Validating Domain Names
Problem
Solution
Discussion
See Also
8.16  Matching IPv4 Addresses
Problem
Solution
Regular expression
Perl
Discussion
See Also
8.17  Matching IPv6 Addresses
Problem
Solution
Standard notation
Mixed notation
Standard or mixed notation
Compressed notation
Compressed mixed notation
Standard, mixed, or compressed notation
Discussion
Standard notation
Mixed notation
Standard or mixed notation
Compressed notation
Compressed mixed notation
Standard, mixed, or compressed notation
See Also
8.18  Validate Windows Paths
Problem
Solution
Drive letter paths
Drive letter and UNC paths
Drive letter, UNC, and relative paths
Discussion
Drive letter paths
Drive letter and UNC paths
Drive letter, UNC, and relative paths
See Also
8.19  Split Windows Paths into Their Parts
Problem
Solution
Drive letter paths
Drive letter and UNC paths
Drive letter, UNC, and relative paths
Discussion
Drive letter paths
Drive letter and UNC paths
Drive letter, UNC, and relative paths
See Also
8.20  Extract the Drive Letter from a Windows Path
Problem
Solution
Discussion
See Also
8.21  Extract the Server and Share from a UNC Path
Problem
Solution
Discussion
See Also
8.22  Extract the Folder from a Windows Path
Problem
Solution
Discussion
See Also
8.23  Extract the Filename from a Windows Path
Problem
Solution
Discussion
See Also
8.24  Extract the File Extension from a Windows Path
Problem
Solution
Discussion
See Also
8.25  Strip Invalid Characters from Filenames
Problem
Solution
Regular expression
Replacement
Discussion
See Also
Chapter 9. Markup and Data Formats
Processing Markup and Data Formats with Regular Expressions
Basic Rules for Formats Covered in This Chapter
9.1  Find XML-Style Tags
Problem
Solution
Quick and dirty
Allow > in attribute values
(X)HTML tags (loose)
(X)HTML tags (strict)
XML tags (strict)
Discussion
A few words of caution
Quick and dirty
Allow > in attribute values
(X)HTML tags (loose)
(X)HTML tags (strict)
XML tags (strict)
Skip Tricky (X)HTML and XML Sections
Outer regex for (X)HTML
Outer regex for XML
See Also
9.2  Replace <b> Tags with <strong>
Problem
Solution
Discussion
Variations
Replace a list of tags
See Also
9.3  Remove All XML-Style Tags Except <em> and <strong>
Problem
Solution
Solution 1: Match tags except <em> and <strong>
Solution 2: Match tags except <em> and <strong>, and any tags that contain attributes
Discussion
Variations
Whitelist specific attributes
See Also
9.4  Match XML Names
Problem
Solution
XML 1.0 names (approximate)
XML 1.1 names (exact)
Discussion
XML 1.0 names
XML 1.1 names
Variations
See Also
9.5  Convert Plain Text to HTML by Adding <p> and <br> Tags
Problem
Solution
Step 1: Replace HTML special characters with named character references
Step 2: Replace all line breaks with <br>
Step 3: Replace double <br> tags with </p><p>
Step 4: Wrap the entire string with <p>…</p>
Example JavaScript solution
Discussion
Step 1: Replace HTML special characters with named character references
Step 2: Replace all line breaks with <br>
Step 3: Replace double <br> tags with </p><p>
Step 4: Wrap the entire string with <p>…</p>
See Also
9.6  Decode XML Entities
Problem
Solution
Regular expression
Replace matches with their corresponding literal characters
Example JavaScript solution
Discussion
See Also
9.7  Find a Specific Attribute in XML-Style Tags
Problem
Solution
Tags that contain an id attribute (quick and dirty)
Tags that contain an id attribute (more reliable)
<div> tags that contain an id attribute
Tags that contain an id attribute with the value “my-id”
Tags that contain “my-class” within their class attribute value
Discussion
See Also
9.8  Add a cellspacing Attribute to <table> Tags That Do Not Already Include It
Problem
Solution
Solution 1, simplistic
Solution 2, more reliable
Insert the new attribute
Discussion
See Also
9.9  Remove XML-Style Comments
Problem
Solution
Discussion
How it works
When comments can’t be removed
Variations
Find valid XML comments
Find valid HTML comments
See Also
9.10  Find Words Within XML-Style Comments
Problem
Solution
Two-step approach
Single-step approach
Discussion
Two-step approach
Single-step approach
Variations
See Also
9.11  Change the Delimiter Used in CSV Files
Problem
Solution
Example web page with JavaScript
Discussion
See Also
9.12  Extract CSV Fields from a Specific Column
Problem
Solution
Example web page with JavaScript
Discussion
Variations
Match a CSV record and capture the field in column 1 to backreference 1
Match a CSV record and capture the field in column 2 to backreference 1
Match a CSV record and capture the field in column 3 or higher to backreference 1
Replacement string
See Also
9.13  Match INI Section Headers
Problem
Solution
Discussion
Variations
See Also
9.14  Match INI Section Blocks
Problem
Solution
Discussion
See Also
9.15  Match INI Name-Value Pairs
Problem
Solution
Discussion
See Also
Index