What are character classes, and what are some common examples?

Let’s learn about character classes in regular expressions, including some common examples.

Character classes are a special syntax you can use to match sets or subsets of characters.

The first character class you should learn is the wildcard class.

The wildcard is represented by a period or dot (.), and matches any single character except line breaks. To allow the wildcard class to match line breaks, remember that you would need to use the s flag.

A regular expression that matched the letter a followed by one single character might look like:

const regex = /a./;

This can be helpful when you are looking for specific patterns in a string, but don’t know what might be between those two patterns. But you can also use character classes to narrow down your matches.

For example, what if you wanted to match a numerical character? You might have to write out every possible digit, separating them with the or operator (|):

const regex = /0|1|2|3|4|5|6|7|8|9/;

A character class exists for this exact pattern, and gives you a shorthand syntax for writing the same thing. In this case, the character class is written as a backslash (\) followed by a d character:

const regex = /\d/;

This regular expression will match the exact same pattern as our previous expression: a single numerical character anywhere in the string.

Now consider a regular expression which also needs to match any letter character a through z. You could write out each individual character separated by the or operator. Or you could use another character class.

The \w class, which is a backslash followed by a w, represents any word character:

const regex = /\w/;

A word character is defined as any letter, from a to z, or a number from 0 to 9, or the underscore (_) character.

The inclusion of the underscore might seem strange, but consider the naming conventions for variables – variable names can often include underscores, so \w is designed to match that as well.

There is one more special character class to consider: the whitespace class \s, represented by a backslash followed by an s. This character class will match any whitespace, including new lines, spaces, tabs, and special unicode space characters.

These special character classes can be negated. To negate one of these character classes, instead of using a lowercase letter after the backslash, use the uppercase equivalent:

const regex = /\D/;

This regular expression, for example, does not match a numerical character. Instead, it matches any single character that is not a numerical character.

Negating the \w class would match any character that is not a to z0 to 9, or an underscore, and negating the \s character class would match any character that is not a whitespace.

But what if you wanted to match more specific subsets of characters?

Maybe you’re a professor grading papers, and you need to make sure your grades are valid. A valid grade can be ABCD, or F. You can use square brackets to construct your own character class:

const regex = /[abcdf]/;

This regular expression will match a single character that is in the list abcd, or f.

What about checking only grades that pass? A passing grade would be an ABC, or D. You can modify your character class to stop matching f by removing that character from the list:

const regex = /[abcd]/;

You may have noticed now that our character class consists only of consecutive characters. abc, and d are all directly next to each other in the alphabet. For numbers, consecutive characters might be 45, and 6.

When you have consecutive characters, you can create a range using the hyphen character.

Using a range, we can turn our regular expression into a shorter syntax while matching the exact same pattern.

const regex = /[a-d]/;

Remember that regular expressions are case-sensitive by default. This means our character class will only match the lowercase variants of abc, and d. You could use the i flag to achieve this, but you can also bake it directly into your character class by including the uppercase variants:

const regex = /[a-zA-Z]/;

You can also mix digits and numbers in your character class. For example, if you wanted the behavior of the \w class without the underscore, you could construct your own:

const regex = /[a-zA-Z0-9]/;

Note that if you want your character class to match a literal hyphen, you need to place a hyphen at the beginning or end of the class:

const regex = /[-a-zA-Z0-9]/;

And finally, you can include special character classes in your custom class. Maybe you want to include a hyphen in the set matched by \w:

const regex = /[-\w]/;

Character classes are a powerful tool that gives you incredible control over your pattern matching.