Meta-characters in Regex

Regular expressions search for more than string literals, so certain characters are reserved for special use.

Below are the metacharacters in regex, i.e. characters that have a special meaning. To search for one of these characters literally, escape it with a backslash \ .

10 Meta characters
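[ ]    #Character set
\      #Escape character
^      #Start of line; negates a character set when placed first inside [ ]
$      #End of line (ends with)
.      #Matches any character (except a line break in most engines)
|      #Alternation (OR)
?      #Zero or one
*      #Zero or more
+      #One or more
( )    #Grouping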
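
As a minimal sketch of escaping, here is how a metacharacter behaves with and without the backslash. Python's `re` module is used for illustration only; the sample text and variable names are assumptions, not part of the original article.

```python
import re

text = "What is 1+1? (easy)"

# Unescaped, "+" is a quantifier: the pattern means "one or more 1s, then a 1".
# The text has no two consecutive 1s, so nothing is found.
unescaped = re.search(r"1+1", text)

# Escaped with a backslash, "+" only matches a literal plus sign.
escaped = re.search(r"1\+1", text)

# re.escape() adds the backslashes when the search string is arbitrary.
auto = re.search(re.escape("(easy)"), text)

print(unescaped)        # None
print(escaped.group())  # 1+1
print(auto.group())     # (easy)
```

`re.escape()` is handy when the literal text comes from user input and may contain any of the metacharacters listed above.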

Most regex engines treat “{” as a literal character unless it is part of a repetition operator such as {1,3}. Generally you don’t need to escape this character, but the java.util.regex package requires all literal braces to be escaped.
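
The brace behaviour can be sketched in Python's `re` module, which is one of the engines that accepts an unescaped literal brace. The sample strings are assumptions chosen for illustration.

```python
import re

# Inside a valid repetition such as {1,3}, the braces form a quantifier.
quantified = re.fullmatch(r"a{1,3}", "aaa")

# Outside a repetition, Python's re treats "{" as an ordinary character.
literal = re.search(r"{name}", "Hello {name}!")

# Escaping the braces gives the same result and is the portable form,
# since some engines (java.util.regex, per the text above) require it.
escaped = re.search(r"\{name\}", "Hello {name}!")

print(quantified.group())  # aaa
print(literal.group())     # {name}
print(escaped.group())     # {name}
```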

Note : “\” should not be used when it is not required, because a backslash followed by a literal can create a regex token. For example, “\d” matches a single digit from 0-9 rather than the letter “d”.
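
A short sketch of that difference, using Python's `re` module and an assumed sample string:

```python
import re

sample = "d1 and d22"

# Without a backslash, "d" is just the letter d.
letters = re.findall(r"d", sample)

# With a backslash, "\d" becomes the digit token.
digits = re.findall(r"\d", sample)

print(letters)  # ['d', 'd', 'd']
print(digits)   # ['1', '2', '2']
```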

Q & A

Question : Search for “C:\temp” in “Hello world this is the temp dir path C:\temp you can find files here” text
Text : Hello world this is the temp dir path C:\temp you can find files here
Find : C:\temp
Regex : C:\\temp
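
The answer can be checked with a small sketch in Python's `re` module. It assumes a sample text that actually contains the path C:\temp, and uses raw strings so that Python itself does not interpret the backslashes.

```python
import re

# Assumption: the sample text contains the literal path C:\temp.
text = r"Hello world this is the temp dir path C:\temp you can find files here"

# The escaped backslash (\\) matches one literal backslash.
match = re.search(r"C:\\temp", text)

# Without the escape, \t is read as a tab token, so the path is not found.
no_match = re.search(r"C:\temp", text)

print(match.group())  # C:\temp
print(no_match)       # None
```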

Non-printable characters

You can use special tokens to match non-printable characters in a regular expression. For example, to search for a tab, use “\t” as the regex. The tokens below cover common non-printable characters, along with the shorthand classes and word boundaries that use the same backslash syntax.

\s    Whitespace character (space, tab, line break)
\t    Tab character
\r    Carriage return
\n    New line (line feed)
\v    Vertical tab
\x    Hexadecimal character code, for example “\xA9” for the copyright symbol
\c    ASCII control character, for example “\cM” for carriage return
\u    Unicode code point, for example “\u20AC” for the euro currency symbol (the text itself may be encoded as UTF-8)
\b    Word boundary
\B    Matches at every position where \b does not
\w    Word character (a-z, A-Z, 0-9 and underscore)
\W    Any character that is not a word character
\d    Digit (0-9)
\D    Any character that is not a digit
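
A few of these tokens in action, as a rough sketch in Python's `re` module. The sample line and variable names are assumptions made for this example.

```python
import re

line = "id:\t42 cats"

tab = re.search(r"\t", line)         # the tab between "id:" and "42"
digits = re.findall(r"\d+", line)    # runs of digits
words = re.findall(r"\w+", line)     # runs of word characters
whole = re.search(r"\bcat\b", line)  # "cat" as a whole word
inside = re.search(r"\Bat\B", line)  # "at" with word characters on both sides

print(tab.start())     # 3
print(digits)          # ['42']
print(words)           # ['id', '42', 'cats']
print(whole)           # None, because "cat" is followed by "s"
print(inside.group())  # at
```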

Note : To understand difference between encoding and character set follow the hyperlink

Q & A

Question : Search for “©” symbol in “Hello World i am copyright symbol © search me” text
Text : Hello World i am copyright symbol © search me
Find : ©
Regex UNICODE: \u00A9
Regex Hexcode : \xA9
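
Both answers can be tried out with a minimal sketch in Python's `re` module, which accepts the \uXXXX and \xHH forms in string patterns:

```python
import re

text = "Hello World i am copyright symbol © search me"

by_unicode = re.search(r"\u00A9", text)  # four hex digits after \u
by_hex = re.search(r"\xA9", text)        # two hex digits after \x

print(by_unicode.group())                   # ©
print(by_hex.start() == by_unicode.start()) # True: both find the same character
```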

Most regex flavors also support the tokens «\cA» through «\cZ» to insert ASCII control characters. The letter after the backslash is always a lowercase c. The second letter is an uppercase letter A through Z, to indicate Control+A through Control+Z. These are equivalent to «\x01» through «\x1A» (26 decimal). E.g. «\cM» matches a carriage return, just like «\r» and «\x0D». In XML Schema regular expressions, «\c» is a shorthand character class that matches any character allowed in an XML name.
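
The mapping from Control+letter to a character code can be sketched as follows. Python's `re` module, used here for illustration, does not accept the «\cM» form to my knowledge, so the sketch computes the control character and matches it with the equivalent «\x0D» and «\r» tokens instead. The sample text is an assumption.

```python
import re

# Control+M is the 13th control character: ord("M") - 64 == 13 == 0x0D.
control_m = chr(ord("M") - 64)

text = "line one\rline two"

by_hex = re.search(r"\x0D", text)  # hexadecimal form
by_name = re.search(r"\r", text)   # named escape

print(control_m == "\r")                 # True
print(by_hex.start(), by_name.start())   # 8 8
```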

If your regular expression engine supports Unicode, use «\uFFFF» rather than «\xFF» to insert a Unicode character. The euro currency sign occupies code point 0x20AC. If you cannot type it on your keyboard, you can insert it into a regular expression with «\u20AC».
