Line Break Characters

While support for the dot is universal among regex flavors, there are significant differences in which characters they treat as line break characters. All flavors treat the newline \n as a line break. UNIX text files terminate lines with a single newline. None of the scripting languages discussed in this tutorial treat any other character as a line break. This isn't a problem even on Windows, where text files normally break lines with a \r\n pair, because these scripting languages read and write files in text mode by default. When running on Windows, \r\n pairs are automatically converted into \n when a file is read, and \n is automatically written to file as \r\n.
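The text-mode conversion can be observed directly. The following is a minimal sketch in Python, assuming Python 3; the file contents and variable names are made up for illustration, and newline="\r\n" is passed explicitly only to simulate what Windows does by default, so the result is the same on any platform.

```python
import os
import re
import tempfile

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Simulate Windows text-mode output: newline="\r\n" makes Python write
# each "\n" as "\r\n" on any platform (on Windows this is the default).
with open(path, "w", encoding="ascii", newline="\r\n") as f:
    f.write("first line\nsecond line\n")

# The raw bytes on disk contain the \r\n pairs.
with open(path, "rb") as f:
    raw = f.read()

# Default text mode converts \r\n back to \n when reading.
with open(path, "r", encoding="ascii") as f:
    text = f.read()

os.remove(path)

# The dot stops at \n, and no carriage return is left for it to pick up.
first = re.match(r".*", text).group()
print(raw)          # b'first line\r\nsecond line\r\n'
print(repr(first))  # 'first line'
```

Because the carriage returns never reach the string, the regex engine only ever sees \n.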

XML Schema and XPath also treat the carriage return \r as a line break character. JavaScript adds the Unicode line separator \u2028 and paragraph separator \u2029 on top of that. Java includes these plus the Latin-1 next line control character \u0085. Only Delphi and the JGsoft flavor support all Unicode line breaks, adding the form feed \f and the vertical tab \v to the mix.
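One way to check what a given flavor does is to probe the dot against each candidate character. The sketch below does this for Python's re module only, so it illustrates the "\n only" behavior of the scripting languages rather than the JavaScript or Java behavior described above. The CANDIDATES table and the helper name line_breaks_for_dot are hypothetical, chosen for this example.

```python
import re

# Characters that at least one flavor treats as a line break.
CANDIDATES = {
    "\n": "newline",
    "\r": "carriage return",
    "\u2028": "line separator",
    "\u2029": "paragraph separator",
    "\u0085": "next line",
    "\f": "form feed",
    "\v": "vertical tab",
}

def line_breaks_for_dot(flags=0):
    """Return the names of the candidates that the dot refuses to match."""
    dot = re.compile(".", flags)
    return [name for ch, name in CANDIDATES.items() if dot.fullmatch(ch) is None]

print(line_breaks_for_dot())           # ['newline']
print(line_breaks_for_dot(re.DOTALL))  # []
```

The same probe, ported to another language, should show which of these characters that flavor excludes from the dot.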

.NET is notably absent from the list of flavors that treat characters other than \n as line breaks. Unlike scripting languages that have their roots in the UNIX world, .NET is a Windows development framework that does not automatically strip carriage return characters from text files that it reads. If you read a Windows text file as a whole into a string, it will contain carriage returns. If you use the regex abc.* on that string, without setting RegexOptions.Singleline, then it will match abc plus all characters that follow on the same line, plus the carriage return at the end of the line, but without the newline after that.
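The effect is easy to reproduce in any flavor whose dot excludes only \n. The sketch below uses Python's re module as a stand-in for .NET, with a made-up string that still contains its carriage returns. The negated character class in the second pattern is one possible workaround, not something .NET requires.

```python
import re

# A Windows-style string whose carriage returns were not stripped.
text = "abc def\r\nnext line\r\n"

# The dot excludes only \n, so the match swallows the trailing \r.
greedy = re.search(r"abc.*", text).group()

# Excluding both characters explicitly stops before the carriage return.
trimmed = re.search(r"abc[^\r\n]*", text).group()

print(repr(greedy))   # 'abc def\r'
print(repr(trimmed))  # 'abc def'
```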

Some flavors allow you to control which characters should be treated as line breaks. Java has the UNIX_LINES option which makes it treat only \n as a line break. PCRE has options that allow you to choose between \n only, \r only, \r\n, or all Unicode line breaks.
