Special Characters

Because we want to do more than simply search for literal pieces of text, we need to reserve certain characters for special use. In the regex flavors discussed in this tutorial, there are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), and the opening square bracket [, the opening curly brace {, These special characters are often called "metacharacters".
If you want to use any of these characters as a literal in a regex, you need to escape them with a backslash. If you want to match 1+1=2, the correct regex is 1\+1=2. Otherwise, the plus sign has a special meaning.
Note that 1+1=2, with the backslash omitted, is a valid regex. So you won't get an error message. But it doesn't match 1+1=2. It would match 111=2 in 123+111=234, due to the special meaning of the plus character.
If you forget to escape a special character where its use is not allowed, such as in +1, then you will get an error message.
Most regular expression flavors treat the brace { as a literal character, unless it is part of a repetition operator likea{1,3}. So you generally do not need to escape it with a backslash, though you can do so if you want. An exception to this rule is the Java, which requires all literal braces to be escaped.
All other characters should not be escaped with a backslash. That is because the backslash is also a special character. The backslash in combination with a literal character can create a regex token with a special meaning. E.g. \d is a shorthand that matches a single digit from 0 to 9.

Escaping a single metacharacter with a backslash works in all regular expression flavors. Many flavors also support the \Q…\E escape sequence. All the characters between the \Q and the \E are interpreted as literal characters. E.g. \Q*\d+*\E matches the literal text *\d+*. The \E may be omitted at the end of the regex, so \Q*\d+* is the same as \Q*\d+*\E. This syntax is supported by the JGsoft enginePerlPCREPHPDelphi, and Java, both inside and outside character classes. Java 4 and 5 have bugs that cause \Q…\E to misbehave, however, so you shouldn't use this syntax with Java.

Post a Comment

0 Comments