Use Negated Character Classes Instead of the Dot

negated character class is often more appropriate than the dot. The tutorial section that explains the repeat operators star and plus covers this in more detail. But the warning is important enough to mention it here as well. Again let's illustrate with an example.
Suppose you want to match a double-quoted string. Sounds easy. We can have any number of any character between the double quotes, so ".*" seems to do the trick just fine. The dot matches any character, and the star allows the dot to be repeated any number of times, including zero. If you test this regex onPut a "string" between double quotes, it matches "string" just fine. Now go ahead and test it onHouston, we have a problem with "string one" and "string two". Please respond.
Ouch. The regex matches "string one" and "string two". Definitely not what we intended. The reason for this is that the star is greedy.
In the date-matching example, we improved our regex by replacing the dot with a character class. Here, we do the same with a negated character class. Our original definition of a double-quoted string was faulty. We do not want any number of any character between the quotes. We want any number of characters that are not double quotes or newlines between the quotes. So the proper regex is "[^"\r\n]*".

Post a Comment

0 Comments