Capturing Groups and Backreferences

What are capturing groups and backreferences, and how do they work?

Let’s learn about capturing groups and backreferences in regular expressions.

A capturing group lets you “capture” a portion of the matched string for later use. You define one by wrapping the pattern to capture in parentheses, with no special leading characters such as the ?= that begins a lookahead.

Let’s capture the code from our freeCodeCamp regular expression. To do that, we’ll enclose code in parentheses and define it as a capture group:

const regex = /free(code)camp/i;

To confirm the behavior, we can test it against a freecodecamp string:

const regex = /free(code)camp/i;
console.log(regex.test("freecodecamp")); // true

But this doesn’t actually make use of our captured group. Instead, let’s take a look at the result of using match:

const regex = /free(code)camp/i;
console.log("freecodecamp".match(regex));
// [
//   'freecodecamp',
//   'code', <--
//   index: 0,
//   input: 'freecodecamp',
//   groups: undefined
// ]

Here we can see that our match array has a second element, at index 1, which is the portion of the string that was captured by our capture group.
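As a minimal sketch of reading that captured value in code, the example below pulls the elements out of the match array with destructuring. The variable names (fullMatch, captured, miss) and the test strings are assumptions made up for illustration.

```javascript
const regex = /free(code)camp/i;
const result = "freeCODEcamp".match(regex);

// Index 0 holds the full match; index 1 holds the first capture group.
const [fullMatch, captured] = result;
console.log(fullMatch); // freeCODEcamp
console.log(captured); // CODE

// match returns null when nothing matches, so check before destructuring.
const miss = "freecamp".match(regex);
console.log(miss); // null
```

Note that the captured text keeps the casing of the input string, not the casing written in the pattern.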

Notice how the capture group matches the exact sequence code, whereas a character class would match a single character from the list c, o, d, and e.
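To make that contrast concrete, here is a small sketch that compares a capture group with a character class built from the same letters. The names group and charClass and the test strings are hypothetical, chosen only for this comparison.

```javascript
const group = /(code)/; // matches the characters c, o, d, e in that order
const charClass = /[code]/; // matches any one of c, o, d, or e

console.log("decode".match(group)[0]); // code
console.log("decode".match(charClass)[0]); // d

// "deco" has all four letters, but never in the order c-o-d-e.
console.log(group.test("deco")); // false
console.log(charClass.test("deco")); // true
```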

But how can we actually use this? Well, capture groups are often used when replacing contents of a string. Let’s set up some code to do that. We’re going to turn freecodecamp into paidcodeworld:

const regex = /free(code)camp/i;
console.log("freecodecamp".replace(regex, "paidcodeworld"));

This works on its own, but what if we didn’t know how many o’s were in code? We could add a quantifier to match one or more o’s:

const regex = /free(co+de)camp/i;
console.log("freecoooooooodecamp".replace(regex, "paidcodeworld"));

We still get paidcodeworld as our result, because the replacement string is hardcoded. We want to preserve the number of o’s, so we need to reuse what was captured by the regular expression.

This is where a backreference comes in. Instead of hardcoding the code portion of our replacement string, we can reference the captured group directly.

In a replace call, you achieve a backreference by using a dollar sign ($) followed by the number of the capture group to use. In our case, that would be $1, since code is captured in the first capture group:

const regex = /free(co+de)camp/i;
console.log("freecoooooooodecamp".replace(regex, "paid$1world")); // paidcoooooooodeworld
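The same numbering extends to patterns with more than one capture group. As a rough sketch, assuming we split the word into three groups purely for illustration, each group can be referenced by its position, counting opening parentheses from left to right:

```javascript
// Three capture groups: $1 is "free", $2 is the co+de part, $3 is "camp".
const threeGroups = /(free)(co+de)(camp)/i;
const input = "freecoodecamp";

// Reorder the captured pieces in the replacement string.
const reordered = input.replace(threeGroups, "$3 $2 $1");
console.log(reordered); // camp coode free
```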

We have now successfully preserved an unknown number of o characters when converting freecodecamp into paidcodeworld. But backreferences aren’t just limited to the replace call. You can actually use them directly in a regular expression.

This would allow you to match a previously captured pattern later on in the regular expression.

Let’s say we want to match freecodecamp twice, with the same number of o’s, but anywhere in the string.

First, we need to separate the two occurrences with our wildcard character (.), and add the * quantifier so that any number of characters can sit between them:

const regex = /free(co+de)camp.*free(co+de)camp/i;

This current expression won’t ensure that the number of o characters is the same, however. To achieve that, we need to replace the second capture group with a reference to the first.

Inside a regular expression, a backreference is denoted with a backslash followed by the number of the capture group:

const regex = /free(co+de)camp.*free\1camp/i;
console.log(regex.test("freecooooodecamp is great i love freecooooodecamp")); // true
console.log(regex.test("freecooooodecamp is great i love freecodecamp")); // false

And with that, we can see that a string with the same number of o’s in both places matches, while a string with two different numbers of o’s does not.
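Another common use of an in-pattern backreference is spotting an accidentally repeated word. The sketch below is one possible way to do it; the name repeatedWord and the sample sentences are assumptions for illustration.

```javascript
// (\w+) captures a word, \s+ matches the gap, and \1 requires the same word again.
// The \b word boundaries keep the match from starting or ending mid-word.
const repeatedWord = /\b(\w+)\s+\1\b/i;

console.log(repeatedWord.test("I love love regular expressions")); // true
console.log(repeatedWord.test("I love regular expressions")); // false

// The captured group tells us which word was repeated.
console.log("this is is a test".match(repeatedWord)[1]); // is
```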

This syntax is great, but can quickly get confusing when you are referencing multiple capture groups. Thankfully, instead of using numbers, you can give your groups names.

To define a named capture group, you add a question mark (?) followed by the name enclosed in less than and greater than signs to the beginning of the group. Let’s name our capture group code:

const regex = /free(?<code>co+de)camp.*free\1camp/i;

Now we can update our backreference in the regular expression to refer to this group. A named backreference starts with a backslash followed by the letter k in JavaScript. Then you add the name, again enclosed in less than (<) and greater than (>) signs. Let’s take a look at that:

const regex = /free(?<code>co+de)camp.*free\k<code>camp/i;

Now if we check our test() call, we can see that we still pass:

const regex = /free(?<code>co+de)camp.*free\k<code>camp/i;
console.log(regex.test("freecooooodecamp is freecooooodecamp")); // true

To use our named capture group in a replace() call, we’d insert a dollar sign into the string, followed by the name enclosed in less than and greater than signs:

const regex = /free(?<code>co+de)camp/i;
console.log("freecooooodecamp".replace(regex, "paid$<code>camp")); // paidcooooodecamp
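Named groups also show up in the result of match. Earlier, the groups property of our match array was undefined; once the pattern contains a named group, that property becomes an object keyed by group name. A minimal sketch, with the variable names namedRegex and namedResult chosen only for illustration:

```javascript
const namedRegex = /free(?<code>co+de)camp/i;
const namedResult = "freecoodecamp".match(namedRegex);

// The named group is available on the groups object...
console.log(namedResult.groups.code); // coode

// ...and it is still numbered, so index 1 holds the same text.
console.log(namedResult[1]); // coode
```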

Finally, sometimes you want to create a group of characters, but don’t need the captured result.

Let’s say we want to match either freecodecamp or freecandycamp. You could create two patterns separated by the alternation operator (|), which acts as an OR:

const regex = /freecodecamp|freecandycamp/i;

But this can become quite lengthy for larger-scale regular expressions. Instead, you can create a non-capturing group around the characters that you need to OR:

const regex = /free(?:code|candy)camp/i;

A non-capturing group does not store the code|candy match separately in memory, so it adds no extra element to the match result. It is helpful for creating alternate patterns without sacrificing readability or performance.
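To see the difference, we can compare the match results of a capturing and a non-capturing version of the same pattern. This is only a sketch; the names capturing and nonCapturing are made up for the comparison.

```javascript
const capturing = /free(code|candy)camp/i;
const nonCapturing = /free(?:code|candy)camp/i;

// The capturing version returns the full match plus one captured element.
console.log("freecandycamp".match(capturing).length); // 2
console.log("freecandycamp".match(capturing)[1]); // candy

// The non-capturing version returns only the full match.
console.log("freecandycamp".match(nonCapturing).length); // 1

// Both alternatives still match.
console.log(nonCapturing.test("freecodecamp")); // true
```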
