Generic Regular Expressions (The Boring Kind)

This page details the generic concept of regular expressions and may exclude certain implementation-specific concepts. The BARF implementation of regular expressions has certain extensions and caveats which are described fully in Regular Expressions As Implemented By BARF (The Awesome Kind).

A regular expression (regex) is a syntactic form for compactly specifying a regular language. In other words, a regex is a string that defines a set of strings: exactly those strings accepted by the machine using the regex. Examples of regular expressions:

xyz 
hip{2}o 
this|that 
smashy( smashy)* 
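To make "the set of strings a regex defines" concrete, the four examples above can be checked with Python's standard `re` module. This is only an illustration and not BARF's implementation. It assumes that `re.fullmatch` (which requires the *whole* string to match) is a fair stand-in for "the string is in the language". The `EXAMPLES` table and the `accepts` helper are hypothetical names introduced here.

```python
import re

# Each example regex from the list above, paired with strings it should
# accept and strings it should reject (hypothetical test data).
EXAMPLES = {
    "xyz": (["xyz"], ["xy", "xyzz"]),
    "hip{2}o": (["hippo"], ["hipo", "hipppo"]),
    "this|that": (["this", "that"], ["thisthat", "thi"]),
    "smashy( smashy)*": (["smashy", "smashy smashy smashy"], ["smashysmashy", ""]),
}

def accepts(pattern: str, text: str) -> bool:
    """Return True if the entire text is in the language of the pattern."""
    return re.fullmatch(pattern, text) is not None

for pattern, (good, bad) in EXAMPLES.items():
    for s in good:
        assert accepts(pattern, s), (pattern, s)
    for s in bad:
        assert not accepts(pattern, s), (pattern, s)
    print(f"{pattern!r}: accepts {good}, rejects {bad}")
```

Python's `re` has many extensions of its own, but these four patterns use only generic syntax, so they behave as described on this page.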

Structure Of A Regular Expression

TODO: write about regexes, branches, pieces, atoms, etc -- in the context of parsing a regex

Atoms

In the context of a regex, atoms are the most basic components from which more complicated regexes are built (hence the "atom" metaphor). You can think of an atom as accepting a single character. (Technically, the special "conditional" characters also count as atoms, but they are not discussed in this section.) There are several forms of atoms.
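The idea that "an atom accepts a single character" can be sketched with Python's `re` module. The selection of atom forms below is an assumption based on common regex syntax: a literal, `.`, an escaped character, and a bracket expression. It is not a reproduction of this page's own list. The `ATOMS` table and the `single_char_matches` helper are hypothetical.

```python
import re

# A few common atom forms (an assumed selection, not the page's own list).
# Each one, used by itself, should accept exactly one character.
ATOMS = {
    "x": "a literal character",
    ".": "any single character (except newline in Python's re)",
    r"\*": "an escaped special character",
    "[aeiou]": "a bracket expression (one character from a set)",
}

def single_char_matches(atom: str, alphabet: str) -> str:
    """Return the characters of the alphabet that the atom accepts on their own."""
    return "".join(c for c in alphabet if re.fullmatch(atom, c))

alphabet = "xyz*ae"
for atom, description in ATOMS.items():
    print(f"{atom!r:10} {description}: {single_char_matches(atom, alphabet)!r}")
    # No single atom accepts a two-character string.
    assert re.fullmatch(atom, "xx") is None
```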

Operations

Regular expressions use a terse notation -- they wouldn't be very useful if each regex string were longer than the strings it accepts. For simplicity, the examples in this section do not use escaped characters or bracket expressions. The operations are as follows, from highest to lowest precedence.
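Precedence decides how an unparenthesized regex groups. In the conventional ordering, repetition (`*`, `{n}`) binds tighter than concatenation, which in turn binds tighter than alternation (`|`). The sketch below checks that ordering with Python's `re` module as a stand-in engine. The `accepts` helper is hypothetical, and the ordering is the conventional one, which Python's `re` follows.

```python
import re

def accepts(pattern: str, text: str) -> bool:
    """True if the entire text is accepted by the pattern."""
    return re.fullmatch(pattern, text) is not None

# Repetition binds tighter than concatenation: "ab*" means a(b*), not (ab)*.
assert accepts("ab*", "abbb")
assert not accepts("ab*", "abab")
assert accepts("(ab)*", "abab")

# Concatenation binds tighter than alternation: "ab|cd" means (ab)|(cd),
# not a(b|c)d.
assert accepts("ab|cd", "cd")
assert not accepts("ab|cd", "abd")
assert accepts("a(b|c)d", "abd")

# The {2} in "hip{2}o" applies only to the atom "p", not to "hip".
assert accepts("hip{2}o", "hippo")
assert not accepts("hip{2}o", "hiphipo")
assert accepts("(hip){2}o", "hiphipo")
print("precedence checks passed")
```

Parentheses override the default grouping, which is why `smashy( smashy)*` repeats the whole parenthesized group rather than only its last character.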

Conditionals In Generic Regular Expressions

From within the context of a regular expression (not inside a bracket expression), the characters ^ (caret) and $ (dollar sign) are special -- they match the empty string at the beginning and at the end of a line, respectively. A line begins at the beginning of input or immediately after a newline has been accepted. A line ends at the end of input or immediately before a newline in the unread input.

These two special characters are referred to as conditionals because they don't accept any physical input; instead, they accept only when certain conditions on the input are met. In various regex implementations, other conditionals exist, such as:

Here is an illustrative example.

two
lines! 
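The line-based behavior of ^ and $ on a two-line input like the one above can be sketched with Python's `re` module. This assumes that `re.MULTILINE` matches the rules described earlier: with that flag, ^ matches at the start of input and after each newline, and $ matches at the end of input and before each newline. This is Python's behavior, not necessarily BARF's. Without the flag, Python's ^ matches only at the start of the input.

```python
import re

text = "two\nlines!"  # the two-line input above

# Zero-width positions where ^ and $ match under re.MULTILINE.
starts = [m.start() for m in re.finditer(r"^", text, re.MULTILINE)]
ends = [m.start() for m in re.finditer(r"$", text, re.MULTILINE)]
print(starts, ends)  # [0, 4] [3, 10]

# Anchoring a pattern at both ends selects each complete line.
lines = re.findall(r"^.+$", text, re.MULTILINE)
print(lines)  # ['two', 'lines!']

# The conditionals consume no input: every match of "^" alone is empty.
assert all(m.group() == "" for m in re.finditer(r"^", text, re.MULTILINE))
```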

Again, see Regular Expressions As Implemented By BARF (The Awesome Kind) for implementation-specific details.


Hosted by SourceForge.net Logo -- Generated on Mon Jan 7 22:58:00 2008 for BARF by doxygen 1.5.1