Character Classes in Regular Expressions

Expression      Meaning
[abc]           Either ‘a’, ‘b’ or ‘c’
[^abc]          Any character except ‘a’, ‘b’ or ‘c’ (negation)
[a-z]           Any lowercase letter
[A-Z]           Any uppercase letter
[a-zA-Z]        Any letter
[0-9]           Any digit from 0 to 9
[a-zA-Z0-9]     Any alphanumeric character
[^a-zA-Z0-9]    Any non-alphanumeric character, i.e. a “special” character

Negation in character set([ ])?

A character set is negated by placing a caret “^” right after the opening bracket. Suppose the text is “Jimmy meets jonny at Germany@9:30 CET” and the regex is “y[^a-zA-Z0-9]”. This should not be read as “y followed by a special character” but as “y followed by any one character other than a letter or a digit”. In the example above the regex matches the “y” and the space after “Jimmy”, the “y” and the space after “jonny”, and the “y” and the “@” in “Germany@”.
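The matches described above can be checked with a minimal sketch like the following, assuming Java's java.util.regex package. The class name NegationDemo and the helper findAll are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegationDemo {
    // Collects every substring of text matched by regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Jimmy meets jonny at Germany@9:30 CET";
        // Each match is "y" plus the one non-alphanumeric character after it.
        System.out.println(findAll("y[^a-zA-Z0-9]", text)); // [y , y , y@]
    }
}
```

A negated set still has to consume one character, so findAll("y[^a-zA-Z0-9]", "Jimmy") returns an empty list: nothing follows the final “y”.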

Meta-characters inside character set ([ ])?

• No, not all the meta-characters have a special meaning in a character set. Only the square brackets “[” and “]”, the caret “^”, the hyphen “-” and the backslash “\” have a special meaning inside a character set; the rest of the characters are just normal characters and you don’t have to escape them.
• Example: if you want to search for a “*” or a “+” symbol you can do it directly with the regex [*+].
• You can escape these characters with “\”, giving [\*\+], but doing so reduces the readability of the regex.
• Escapes for non-printable characters, such as \u and \x, work the same way as they do outside the character set.
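As a small sketch of the points above, assuming Java's java.util.regex package (the class MetaInClassDemo and the helper countMatches are hypothetical names), the escaped and unescaped forms find the same symbols:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaInClassDemo {
    // Counts how many times regex matches in text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "2*3+4";
        System.out.println(countMatches("[*+]", text));     // 2
        System.out.println(countMatches("[\\*\\+]", text)); // 2
        // A hyphen placed last in the set is a literal hyphen, not a range.
        System.out.println(countMatches("[+-]", "1+2-3"));  // 2
    }
}
```

Note that inside a Java string literal every regex backslash is doubled, which is one more reason to prefer the unescaped form [*+].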

Shorthand Character Classes

Because certain character classes are used so often, shorthands exist for them.
• \d for digits from 0-9
• \w for word characters (letters, digits and underscore)
• \s for whitespace characters
Each of the three shorthands above also has a negated counterpart:
• \D is equivalent to [^\d]
• \W is equivalent to [^\w]
• \S is equivalent to [^\s]
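A minimal sketch of the shorthands in use, assuming Java's java.util.regex package with its default (ASCII) definitions of \d, \w and \s. The class ShorthandDemo, the helper findAll and the sample text are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShorthandDemo {
    // Collects every substring of text matched by regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Room 42, floor_3";
        System.out.println(findAll("\\d+", text)); // [42, 3]
        System.out.println(findAll("\\w+", text)); // [Room, 42, floor_3]
        System.out.println(findAll("\\S+", text)); // [Room, 42,, floor_3]
    }
}
```

The underscore counts as a word character, so “floor_3” is a single \w+ match, while \S+ also keeps the comma attached to “42”.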

Be careful when using the negated shorthands inside square brackets. “[\D\S]” is not the same as “[^\d\s]”. The latter matches any character that is neither a digit nor whitespace, so it matches “x” but not “8”. The former, however, matches any character that is either not a digit or not whitespace. Because a digit is not whitespace, and whitespace is not a digit, “[\D\S]” matches any character: digit, whitespace or otherwise.
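The difference can be seen by testing single characters against both sets. This is only a sketch, assuming Java's java.util.regex package; the class NegatedShorthandDemo and the helper accepts are hypothetical names.

```java
import java.util.regex.Pattern;

class NegatedShorthandDemo {
    // True when the single character c is matched by the character class regex.
    static boolean accepts(String regex, char c) {
        return Pattern.matches(regex, String.valueOf(c));
    }

    public static void main(String[] args) {
        char[] samples = {'x', '8', ' '};
        for (char c : samples) {
            System.out.println("'" + c + "'"
                    + "  [^\\d\\s] -> " + accepts("[^\\d\\s]", c)
                    + "  [\\D\\S] -> " + accepts("[\\D\\S]", c));
        }
    }
}
```

Only “x” is accepted by [^\d\s], while [\D\S] accepts all three sample characters.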

Repeating Character Classes

• By placing “*”, “+” or “?” after a character class you can match zero or more, one or more, and zero or one occurrences respectively.
• Suppose you want to search for words in a paragraph: you would use the regex [a-zA-Z]+
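To make the three quantifiers concrete, here is a short sketch that applies each one to the class [0-9], assuming Java's java.util.regex package. The class QuantifierDemo and the helper matchesWhole are invented names, and Pattern.matches requires the whole input to match.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True when the entire text is matched by regex.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        String[] inputs = {"", "7", "2024"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\""
                    + "  [0-9]* -> " + matchesWhole("[0-9]*", s)
                    + "  [0-9]+ -> " + matchesWhole("[0-9]+", s)
                    + "  [0-9]? -> " + matchesWhole("[0-9]?", s));
        }
    }
}
```

The empty string is accepted by [0-9]* and [0-9]? but not by [0-9]+, and “2024” is rejected only by [0-9]?, which allows at most one digit.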

Q & A

Search for words in a sentence “Hello World i am learning regex”
Text : Hello World i am learning regex
Results : “Hello” “World” “i” “am” “learning” “regex”
=================================================
Regex : [a-zA-Z]+
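The answer above can be reproduced with a sketch like this one, assuming Java's java.util.regex package (WordSearchDemo and findAll are hypothetical names). It also shows why the upper bound of the second range must be an uppercase Z: the range A-z additionally covers the six ASCII characters between “Z” and “a”, including the underscore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordSearchDemo {
    // Collects every substring of text matched by regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Hello World i am learning regex";
        System.out.println(findAll("[a-zA-Z]+", text));
        // [Hello, World, i, am, learning, regex]

        // Pitfall: A-z also spans [ \ ] ^ _ ` so the underscore is kept.
        System.out.println(findAll("[a-zA-Z]+", "snake_case")); // [snake, case]
        System.out.println(findAll("[a-zA-z]+", "snake_case")); // [snake_case]
    }
}
```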

• If you want to reference a captured group (a part of the regex enclosed in parentheses) again later in the search, use \1 to refer to the first captured group, \2 to refer to the second captured group, and so on.
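As an illustration of \1, the following sketch looks for a word that is immediately repeated. It assumes Java's java.util.regex package; the class BackreferenceDemo, the helper findAll and the sample sentence are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Collects every substring of text matched by regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // ([a-zA-Z]+) captures a word; \1 then requires the same word again.
        String repeatedWord = "\\b([a-zA-Z]+) \\1\\b";
        System.out.println(findAll(repeatedWord, "this is is a test test"));
        // [is is, test test]
    }
}
```

The \b word boundaries keep the pattern from matching the “is” inside “this”.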

Q & A

Find html tags in a given input
Input :
<html>
<body>
Hello Regex
</body>
</html>
Search : <body> and all other tags
==========================================================
Regex : <[/a-zA-Z]+>
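A sketch of this tag search, assuming Java's java.util.regex package; the class TagSearchDemo and the helper findAll are hypothetical. The second pattern, </?[a-zA-Z]+>, is not from the answer above: it is a slightly stricter variant suggested here, which allows a slash only directly after the “<”.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagSearchDemo {
    // Collects every substring of text matched by regex, in order of appearance.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "<html>\n<body>\nHello Regex\n</body>\n</html>";
        System.out.println(findAll("<[/a-zA-Z]+>", input));
        // [<html>, <body>, </body>, </html>]

        // The character set accepts a slash anywhere, so this also matches:
        System.out.println(findAll("<[/a-zA-Z]+>", "<a//b>"));  // [<a//b>]
        // Stricter variant (an assumption of this sketch, not the original answer):
        System.out.println(findAll("</?[a-zA-Z]+>", "<a//b>")); // []
    }
}
```

Neither pattern handles tags with digits or attributes, such as <h1> or <a href="...">; both are only meant for simple inputs like the one above.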

The example below demonstrates how to print each character-set match and its position in the target string using regex in Java.

package demo.example2;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * @author Tyson
 *
 *         Searching for a group of things together
 *
 *         Character Classes Example
 *
 */
public class Demo2 {
	public static void main(String[] args) {
		printMatch("[abc]", "a7b@2#9", "Find a or b or c");
		printMatch("[^abc]", "a7b@2#9", "Find anything except a or b or c");
		printMatch("[a-z]", "a7b@2#9", "Find any lowercase letter from a to z");
		printMatch("[0-9]", "a7b@2#9", "Find any digit");
		printMatch("[a-zA-Z0-9]", "a7b@2#9", "Find any alphanumeric character");
		printMatch("[^a-zA-Z0-9]", "a7b@2#9", "Find any special character, i.e. anything except alphanumerics");
	}

	public static void printMatch(String regularExpression, String targetString, String comment) {
		Pattern p = Pattern.compile(regularExpression);
		Matcher m = p.matcher(targetString);
		System.out.println("===" + regularExpression + "===in===" + targetString + "=== " + comment);
		while (m.find()) {
			System.out.println(m.start() + "..." + m.group());
		}
		System.out.println("\n");
	}
}
