Regular expressions match more than literal strings, so certain characters are reserved for special use.
Below are the regex metacharacters, i.e. characters with a special meaning. To search for one of these characters literally, escape it with a backslash \ .
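The 10 metacharacters (each bracket pair counted as one):
[ ]  #Character set
\  #Escape character
^  #Start of string (negates a character set when it is the first character inside [ ])
$  #End of string
.  #Any single character (except a line break, by default)
|  #Alternation (OR)
?  #Zero or one
*  #Zero or more
+  #One or more
( )  #Grouping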
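A minimal Java sketch of escaping, assuming java.util.regex (the class name EscapeMetacharacters and the helper finds are hypothetical). It shows that an unescaped "+" or "." keeps its special meaning, while a backslash, or Pattern.quote, makes it literal. In Java source the backslash itself has to be doubled inside a string literal.

```java
import java.util.regex.Pattern;

public class EscapeMetacharacters {

    // Returns true if the regex finds a match anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "Total: 1+1=2";

        // Unescaped '+' is the "one or more" quantifier: "1+1=2" means
        // "one or more 1s, then 1=2", so it needs "11=2" in the input.
        System.out.println(finds("1+1=2", text));   // false
        System.out.println(finds("1+1=2", "11=2")); // true

        // A backslash (doubled in Java source) makes '+' literal.
        System.out.println(finds("1\\+1=2", text)); // true

        // Pattern.quote escapes every metacharacter in the string for you.
        System.out.println(finds(Pattern.quote("1+1=2"), text)); // true

        // '.' matches any character, so "a.c" also finds "abc";
        // "a\\.c" requires a literal dot.
        System.out.println(finds("a.c", "abc"));   // true
        System.out.println(finds("a\\.c", "abc")); // false
    }
}
```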
Most regex engines treat “{” as a literal character unless it is part of a repetition operator like {1,3}. Generally you don’t need to escape it, but the java.util.regex package expects literal braces to be escaped: an unescaped “{” that does not start a valid quantifier causes a PatternSyntaxException.
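The brace rule can be checked with a small Java sketch (the class BracesInJava and its helpers are illustrative only): {1,3} acts as a quantifier, escaped braces are literal, and an unescaped "{" that is not a quantifier is rejected by Pattern.compile.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BracesInJava {

    // Returns true if the regex matches the whole input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Returns true if java.util.regex refuses to compile the pattern.
    static boolean rejects(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // As a quantifier, {1,3} means "one to three of the previous item".
        System.out.println(matchesWhole("a{1,3}", "aaa"));        // true
        System.out.println(matchesWhole("a{1,3}", "a{1,3}"));     // false

        // Escaped braces are plain characters.
        System.out.println(matchesWhole("a\\{1,3\\}", "a{1,3}")); // true

        // An unescaped "{" that does not start a quantifier is a syntax error.
        System.out.println(rejects("{name}"));                    // true
        System.out.println(rejects("\\{name\\}"));                // false
    }
}
```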
Note: Do not put “\” in front of a character unless you need to, because a backslash combined with a literal can form a regex token. For example, “\d” matches a single digit from 0 to 9.
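A short Java sketch of how a backslash turns a letter into a token (the class BackslashTokens is hypothetical): the letter d alone is literal, \d is the digit token, and a forward slash has no special meaning in java.util.regex.

```java
import java.util.regex.Pattern;

public class BackslashTokens {

    // Returns true if the regex finds a match anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "d" on its own is just the letter d.
        System.out.println(finds("d", "room 7"));   // false
        // \d (written "\\d" in Java source) matches one digit 0-9.
        System.out.println(finds("\\d", "room 7")); // true
        // "/d" is simply a slash followed by the letter d.
        System.out.println(finds("/d", "room 7"));  // false
        System.out.println(finds("/d", "a/d"));     // true
    }
}
```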
Q & A
Question : Search for “C:\temp” in “Hello world this is the temp dir path C:\temp you can find files here” text
Text : Hello world this is the temp dir path C:\temp you can find files here
Find : C:\temp
Regex : C:\\temp
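The same search in Java, as a sketch (the class WindowsPathSearch and its helper are hypothetical). Because Java string literals also use the backslash as an escape, the regex C:\\temp has to be written as "C:\\\\temp" in source code. Pattern.quote is an alternative that escapes the text for you.

```java
import java.util.regex.Pattern;

public class WindowsPathSearch {

    // The regex C:\\temp as a Java literal: each regex backslash is doubled again.
    static final String REGEX = "C:\\\\temp";

    static boolean containsPath(String text) {
        return Pattern.compile(REGEX).matcher(text).find();
    }

    public static void main(String[] args) {
        // In Java source, "C:\\temp" is the 7-character text C:\temp.
        String text = "Hello world this is the temp dir path C:\\temp you can find files here";
        System.out.println(containsPath(text)); // true

        // Pattern.quote builds an equivalent literal pattern without manual escaping.
        System.out.println(Pattern.compile(Pattern.quote("C:\\temp")).matcher(text).find()); // true

        // With only one backslash at the regex level, the regex sees a tab token
        // followed by "emp", so the path is not found.
        System.out.println(Pattern.compile("C:\\temp").matcher(text).find()); // false
    }
}
```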
Non-printable characters
Special escape sequences let you match non-printable characters in a regular expression. For example, to search for a tab character, use “\t” as the regex. Below are some of these escape sequences, along with a few common shorthand character classes and anchors.
Token | Matches |
\s | Whitespace character (space, tab, line break, etc.) |
\t | Tab character |
\r | Carriage return |
\n | Line feed (new line) |
\v | Vertical tab |
\x | Character by two-digit hexadecimal code, e.g. “\xA9” for the copyright symbol |
\c | ASCII control character, e.g. “\cM” for carriage return |
\u | Unicode character by four-digit hexadecimal code point, e.g. “\u20AC” for the euro currency symbol (the code point is the same whatever encoding, such as UTF-8, the text uses) |
\b | Word boundary |
\B | Any position that is not a word boundary |
\w | Word character (a-z, A-Z, 0-9 and the underscore _) |
\W | Any character that is not a word character |
\d | Digit (0-9) |
\D | Any character that is not a digit |
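A Java sketch exercising several tokens from the table (the class ShorthandDemo, the helper findAll and the sample strings are made up for illustration). The regex backslash is doubled in Java string literals, while "\t", "\r" and "\n" inside the sample text are Java escapes that put the actual characters into the string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShorthandDemo {

    // Collects every substring of input matched by regex, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String row = "id\tname\r\nuser_42\tcat";

        // Two tab characters.
        System.out.println(findAll("\\t", row).size());          // 2
        // Whitespace: two tabs plus the carriage return and line feed.
        System.out.println(findAll("\\s", row).size());          // 4
        // Runs of word characters; the underscore stays inside user_42.
        System.out.println(findAll("\\w+", row));                // [id, name, user_42, cat]
        // Runs of digits.
        System.out.println(findAll("\\d+", row));                // [42]
        // Word boundaries: "cat" as a whole word only, not inside "concat".
        System.out.println(findAll("\\bcat\\b", "cat concat"));  // [cat]
        // Non-boundary: "cat" only where a word character precedes it.
        System.out.println(findAll("\\Bcat", "cat concat"));     // [cat]
    }
}
```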
Note: To understand the difference between an encoding and a character set, follow the hyperlink.
Q & A
Question : Search for “©” symbol in “Hello World i am copyright symbol © search me” text
Text : Hello World i am copyright symbol © search me
Find : ©
Regex UNICODE: \u00A9
Regex Hexcode : \xA9
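Both regexes can be tried in Java, whose java.util.regex supports the \uhhhh and \xhh escapes (the class CopyrightSearch and its helper are hypothetical). The \u00A9 inside the sample text is a Java source escape that places the © character itself in the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CopyrightSearch {

    // Returns the index of the first match, or -1 if there is none.
    static int indexOfMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        // The Java escape below puts the actual copyright sign into the string.
        String text = "Hello World i am copyright symbol \u00A9 search me";

        // Regex escape by Unicode code point (backslash doubled for Java).
        System.out.println(indexOfMatch("\\u00A9", text)); // 34
        // Regex escape by two-digit hex code.
        System.out.println(indexOfMatch("\\xA9", text));   // 34
    }
}
```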
Most regex flavors also support the tokens «\cA» through «\cZ» to insert ASCII control characters. The letter after the backslash is always a lowercase c. The second letter is an uppercase letter A through Z, to indicate Control+A through Control+Z. These are equivalent to «\x01» through «\x1A» (1 through 26 decimal). For example, «\cM» matches a carriage return, just like «\r» and «\x0D». In XML Schema regular expressions, «\c» is a shorthand character class that matches any character allowed in an XML name.
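A quick Java check that these spellings name the same character (the class ControlCharacters is hypothetical); java.util.regex supports \cX, \r and \xhh.

```java
import java.util.regex.Pattern;

public class ControlCharacters {

    // Returns true if the regex matches the whole input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String cr = "\r"; // a single carriage return, code 13 (0x0D)

        // Control+M, the carriage-return escape, and hex 0D are one character.
        System.out.println(matchesWhole("\\cM", cr));  // true
        System.out.println(matchesWhole("\\r", cr));   // true
        System.out.println(matchesWhole("\\x0D", cr)); // true

        // Control+Z is the last one: code 26 (0x1A).
        System.out.println(matchesWhole("\\cZ", "\u001A")); // true
    }
}
```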
If your regular expression engine supports Unicode, use «\uFFFF» rather than «\xFF» to insert a Unicode character. The euro currency sign occupies code point 0x20AC. If you cannot type it on your keyboard, you can insert it into a regular expression with «\u20AC».
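A Java sketch of the euro sign search (the class EuroSign is hypothetical). It also shows why «\xFF» cannot reach code points above 0xFF: in java.util.regex, the \x escape without braces reads exactly two hex digits, so «\x20AC» means a space (0x20) followed by the letters AC.

```java
import java.util.regex.Pattern;

public class EuroSign {

    // Returns true if the regex finds a match anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // The Java escape below puts the actual euro sign into the string.
        String price = "Price: 20 \u20AC";

        // The four-hex-digit regex escape reaches code point 0x20AC.
        System.out.println(finds("\\u20AC", price));      // true

        // The two-digit hex escape stops after "20": this is " AC", not the euro sign.
        System.out.println(finds("\\x20AC", price));      // false
        System.out.println(finds("\\x20AC", "PRICE AC")); // true
    }
}
```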