XML Character Classes

XML Schema and XPath regular expressions support the usual six shorthand character classes, plus four more. These four aren't supported by any other regular expression flavor. \i matches any character that may be the first character of an XML name. \c matches any character that may occur after the first character in an XML name. \I and \C are the respective negated shorthands. Note that the \c shorthand syntax conflicts with the control character syntax used in many other regex flavors.

You can use these four shorthands both on their own and inside bracketed character classes. They're very useful for validating XML references and values in your XML schemas. The regular expression \i\c* matches an XML name like xml:schema.

The regex <\i\c*\s*> matches an opening XML tag without any attributes. </\i\c*\s*> matches any closing tag. <\i\c*(\s+\i\c*\s*=\s*("[^"]*"|'[^']*'))*\s*> matches an opening tag with any number of attributes. Putting it all together, <(\i\c*(\s+\i\c*\s*=\s*("[^"]*"|'[^']*'))*|/\i\c*)\s*> matches either an opening tag with attributes or a closing tag.
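The combined tag regex can be tried out in a flavor without \i and \c by substituting explicit character classes. The following is a minimal Python sketch, assuming ASCII-only names: [_:A-Za-z] stands in for \i and [-._:A-Za-z0-9] for \c. The names NAME, ATTRIBUTE, TAG and find_tags are made up for this illustration.

```python
import re

# ASCII-only stand-ins for \i and \c. This is an assumption of the sketch:
# real XML names may also contain many non-ASCII characters.
NAME_START = r"[_:A-Za-z]"
NAME_CHAR = r"[-._:A-Za-z0-9]"
NAME = NAME_START + NAME_CHAR + "*"

# One attribute: whitespace, a name, '=', then a double- or single-quoted value.
ATTRIBUTE = r"\s+" + NAME + r"""\s*=\s*(?:"[^"]*"|'[^']*')"""

# Either an opening tag with any number of attributes, or a closing tag.
TAG = re.compile("<(?:" + NAME + "(?:" + ATTRIBUTE + ")*|/" + NAME + r")\s*>")


def find_tags(text):
    """Return every opening or closing tag found in text, in order."""
    return [match.group(0) for match in TAG.finditer(text)]
```

Calling find_tags('<a href="x">link</a>') returns the opening tag and the closing tag as two separate strings. Self-closing tags such as <br/> are not matched, because the regex above has no provision for the slash before the closing bracket.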

No other regex flavors discussed in this tutorial support XML character classes. If your XML files are plain ASCII, you can use [_:A-Za-z] for \i and [-._:A-Za-z0-9] for \c. If you want to allow all Unicode characters that the XML standard allows, then you will end up with some pretty long regexes. You would have to use [:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD] instead of \i and [-.0-9:A-Z_a-z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D\u037F-\u1FFF\u200C-\u200D\u203F\u2040\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD] instead of \c.
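As an illustration, the long Unicode classes can be assembled once and reused to emulate \i\c* in a flavor that lacks the shorthands. This is only a sketch in Python, whose re module understands \uXXXX escapes in string patterns. The constant and function names are hypothetical, and the ranges are taken from the classes quoted above, so they cover the Basic Multilingual Plane only; the XML specification also permits the supplementary range U+10000 to U+EFFFF in names, which this sketch does not handle.

```python
import re

# Replacement for \i: characters that may start an XML name (BMP only).
NAME_START = (
    r"[:A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD]"
)

# Replacement for \c: characters that may follow the first one (BMP only).
NAME_CHAR = (
    r"[-.0-9:A-Z_a-z\u00B7\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u203F\u2040\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD]"
)

# Equivalent of \i\c* built from the explicit classes.
XML_NAME = re.compile(NAME_START + NAME_CHAR + "*")


def is_xml_name(text):
    """Return True if the whole string is a valid XML name (BMP only)."""
    return XML_NAME.fullmatch(text) is not None
```

With these definitions, is_xml_name("xml:schema") is true, while a string that starts with a digit or a hyphen is rejected because those characters appear only in the second class.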
