More Shorthand Character Classes

While support for \d, \s, and \w is quite universal, there are some regex flavors that support additional shorthand character classes. Perl 5.10 introduced \h and \v. \h matches horizontal whitespace, which includes the tab and all characters in the "space separator" Unicode category. It is the same as [\t\p{Zs}]. \v matches "vertical whitespace", which includes all characters treated as line breaks in the Unicode standard. It is the same as [\n\cK\f\r\x85\x{2028}\x{2029}]. PCRE also supports \h and \v starting with version 7.2. PHP does as of version 5.2.2 and Java as of version 8.
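The two equivalent character classes can be spelled out in a flavor that lacks these shorthands. The following is a minimal sketch in Python, whose re module is not one of the flavors listed above and does not support \p{Zs}, so the "space separator" characters are collected through the unicodedata module instead. The names ZS_CHARS, HORIZONTAL_WS, and VERTICAL_WS are made up for this illustration.

```python
import re
import sys
import unicodedata

# Collect every code point in the Unicode "space separator" (Zs) category,
# as reported by the Unicode database bundled with this Python build.
ZS_CHARS = "".join(
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
)

# Rough equivalent of [\t\p{Zs}]: the tab plus all space separators.
HORIZONTAL_WS = re.compile("[\t" + re.escape(ZS_CHARS) + "]")

# Rough equivalent of [\n\cK\f\r\x85\x{2028}\x{2029}]:
# line feed, vertical tab, form feed, carriage return,
# next line, line separator, paragraph separator.
VERTICAL_WS = re.compile("[\n\x0b\f\r\x85\u2028\u2029]")

if __name__ == "__main__":
    samples = {"tab": "\t", "no-break space": "\u00a0",
               "line feed": "\n", "vertical tab": "\x0b"}
    for name, ch in samples.items():
        print(name,
              "horizontal" if HORIZONTAL_WS.fullmatch(ch) else "-",
              "vertical" if VERTICAL_WS.fullmatch(ch) else "-")
```

The exact set of Zs characters depends on the Unicode version that ships with the interpreter, so this should be read as an approximation of what \h matches in Perl, PCRE, PHP, or Java 8 rather than a guaranteed match for any one of them.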
In many other regex flavors, \v matches only the vertical tab character. Perl, PCRE, and PHP never supported this, so they could give \v a different meaning. Java 4 to 7 did use \v to match only the vertical tab, but Java 8 changed the meaning of this token anyway. The vertical tab is also a vertical whitespace character. To avoid confusion, the character class given above for \v uses \cK to represent the vertical tab.
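Python's re module is one flavor where \v is documented as the vertical tab escape only. A small sketch of the difference follows; the helper name matches_v is hypothetical and exists only for this illustration.

```python
import re

# In Python's re module, \v in a pattern stands for the vertical tab
# (U+000B) only, not for the whole "vertical whitespace" class.
VT_ONLY = re.compile(r"\v")


def matches_v(ch: str) -> bool:
    """Return True if the single character ch is matched by \\v."""
    return VT_ONLY.fullmatch(ch) is not None


if __name__ == "__main__":
    for name, ch in [("vertical tab", "\x0b"),
                     ("line feed", "\n"),
                     ("line separator", "\u2028")]:
        print(name, matches_v(ch))
```

Only the vertical tab is reported as a match here, whereas the same three characters would all be matched by \v in Perl 5.10, PCRE 7.2, or Java 8 according to the description above.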
Ruby 1.9 and later have their own version of \h. It matches a single hexadecimal digit just like [0-9a-fA-F]. \v is a vertical tab in Ruby.
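In flavors without Ruby's \h, the explicit class does the same job. A brief Python sketch, with HEX_DIGIT as an illustrative name:

```python
import re

# Explicit equivalent of Ruby's \h: one hexadecimal digit.
HEX_DIGIT = re.compile(r"[0-9a-fA-F]")


def hex_digits(text: str) -> list[str]:
    """Return every hexadecimal digit found in text, in order."""
    return HEX_DIGIT.findall(text)


if __name__ == "__main__":
    print(hex_digits("#1aF z"))
```

Note that letters such as the "e" in an ordinary word also count as hexadecimal digits, so the class is usually combined with anchors or quantifiers in practice.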
