Regular Expressions: Tutorial

The absolute minimum amount of regex to learn

The minimum regex you need to know to be fairly productive.

Almost all of this stuff should work in most regex implementations.
Notable exceptions are: \< and \>, [a-Z], \Q\E. Your mileage may vary.

Basic Syntax

Regex     Matches                                   Remarks
abc       Literals match themselves. Here: abc
^s        s, but only at the start of the input
e$        e, but only at the end of the input       ^ and $ match a position, not a character
\$        A literal $                               \ is the escape character
.         Any one character                         Also matches \r and \n with the dotall (s) flag
\<e       A word that starts with e                 Word characters are [a-zA-Z0-9_]
e\>       A word that ends with e
\b        Word boundary                             Between \w and \W

Character Classes
[az]      a or z
[0-9]     A single digit                            Same as \d
[a-Z]     Reversed range, an error in most engines  [A-z] is valid but also matches [ \ ] ^ _ and `
[\n$^-]   A newline, $, ^ or a hyphen (-)           A ^ at the start negates the class
[^a\b]    Anything but a and backspace              [\b] is backspace, \b is a word boundary

Quantifiers
a?        Zero or one
a*        Zero or more
a+        One or more
a{2,5}    Two to five
a{4}      Exactly four
a{5,}     Five or more                              a{,5} for 0 to 5 where supported; a{0,5} is portable

Groups
(ab|yz)   Literal ab or yz                          Alternation
(?:ab)    ab, in a non-capturing group
(\w+) \1  A word, a space, the same word again      \1 is a backreference to group 1 (also \2 etc.)
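
A few rows of the table above can be tried out directly. The following is a minimal sketch using Python's re module, which is only one of many implementations; the sample strings and variable names are made up for illustration.

```python
import re

# Anchors match a position, not a character.
starts_with_s = bool(re.search(r"^s", "stone"))
ends_with_e = bool(re.search(r"e$", "stone"))

# An escaped $ is a literal dollar sign; [0-9]+ is one or more digits.
prices = re.findall(r"\$[0-9]+", "cost: $15, was $20")

# a{2,5} takes between two and five a's, as many as it can get.
runs = re.findall(r"a{2,5}", "a aa aaaaaaa")

# Alternation inside a non-capturing group.
pairs = re.findall(r"(?:ab|yz)", "ab yz abz")

# \1 must repeat the exact text that group 1 captured.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

print(starts_with_s, ends_with_e)  # True True
print(prices)                      # ['$15', '$20']
print(runs)                        # ['aa', 'aaaaa', 'aa']
print(pairs)                       # ['ab', 'yz', 'ab']
print(doubled.group(0))            # is is
```

The \b around the backreference pattern keeps it from finding "is is" inside "this is".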


Shorthand  Meaning           Remarks
\d         [0-9]             \D is [^0-9]
\w         [a-zA-Z0-9_]
\s         Whitespace
\t         Tab
\r         Carriage return
\n         Newline
\b         Word boundary     \A and \Z: start and end of the string
\Q…\E      Literal sequence  Example: \Qlite[]ral\E

The class shorthands \d, \w and \s are negated by capitalizing them: \D (not a digit), \W, \S.

Also \v (vertical tab), \f (form feed).
Inside a character class, \b matches a backspace.
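
To see the shorthands in action, here is a small sketch in Python's re module (again just one implementation; the sample text is invented). It also shows the two meanings of \b.

```python
import re

text = "Order 66\tshipped\n"

digits = re.findall(r"\d+", text)      # runs of digits
non_digits = re.findall(r"\D+", text)  # runs of anything else
words = re.findall(r"\w+", text)       # runs of word characters
spaces = re.findall(r"\s", text)       # single whitespace characters

# Outside a class, \b is a word boundary ...
whole_word = re.findall(r"\bcat\b", "cat concat cats")
# ... but inside a class, [\b] is the backspace character (0x08).
backspaces = re.findall(r"[\b]", "a\x08b")

print(digits)      # ['66']
print(non_digits)  # ['Order ', '\tshipped\n']
print(words)       # ['Order', '66', 'shipped']
print(spaces)      # [' ', '\t', '\n']
print(whole_word)  # ['cat']
print(backspaces)  # ['\x08']
```

In Python 3, \d and \w also accept non-ASCII digits and letters in str patterns unless the re.ASCII flag is passed, so the [0-9] and [a-zA-Z0-9_] equivalences hold exactly only with that flag.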


Modifier       Description
g(lobal)       JS: match more than once
i(nsensitive)  Case-insensitive matching
m(ultiline)    ^ and $ match at the start and end of every line (vs \A and \Z)
s (dotall)     . also matches \r and \n
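
The modifiers are spelled differently per implementation. As a rough sketch in Python, where they are passed as flags (re.IGNORECASE, re.MULTILINE, re.DOTALL) and the role of g is played by choosing re.findall over re.search; the sample text is made up:

```python
import re

text = "first line\nSecond line"

# g: re.search stops at the first match, re.findall returns all of them.
first_only = re.search(r"line", text).group(0)
all_matches = re.findall(r"line", text)

# i: case-insensitive matching.
insensitive = re.findall(r"second", text, flags=re.IGNORECASE)

# m: ^ matches at the start of every line, \A only at the start of the string.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
string_start = re.findall(r"\A\w+", text, flags=re.MULTILINE)

# s: without dotall the dot stops at the newline.
without_dotall = re.search(r"first.*Second", text)
with_dotall = re.search(r"first.*Second", text, flags=re.DOTALL)

print(first_only, all_matches)  # line ['line', 'line']
print(insensitive)              # ['Second']
print(line_starts)              # ['first', 'Second']
print(string_start)             # ['first']
print(without_dotall)           # None
print(with_dotall.group(0) == "first line\nSecond")  # True
```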


Replacement  Description
$1           First captured group
$2           Second captured group
$$           A literal $
$&           Entire match
$`           Text before the match
$'           Text after the match
$+           Last captured group

Some implementations use \ instead of $.

Use non-capturing groups (?:) to keep your backreferences ($1, $2, …) in check.
Or use named groups if supported in your regex implementation.
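
Python's re module is one of the implementations that use \ instead of $, so the sketch below writes \1 where the table says $1 and \g<0> where it says $&. It also shows how a non-capturing group leaves the numbering alone, and how named groups avoid numbers altogether. The sample strings and group names are invented for illustration.

```python
import re

# \1, \2, \3 refer to the first, second and third captured group.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2024-01-31")

# \g<0> is the entire match.
bracketed = re.sub(r"\d+", r"<\g<0>>", "a 12 b 345")

# (?:...) does not capture, so the amount is still group 1.
amount = re.sub(r"(?:USD|EUR) (\d+)", r"\1", "USD 42")

# Named groups: (?P<name>...) in the pattern, \g<name> in the replacement.
flipped = re.sub(r"(?P<key>\w+)=(?P<value>\w+)", r"\g<value>=\g<key>", "a=1")

print(swapped)    # 31/01/2024
print(bracketed)  # a <12> b <345>
print(amount)     # 42
print(flipped)    # 1=a
```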

Looking around

The last and toughest feature every developer should know: lookahead and lookbehind.

Lookahead vs Lookbehind syntax


A(?=B)   – Literal A followed by B
A(?!B)   – Literal A not followed by B
(?<=B)A  – Literal A preceded by B
(?<!B)A  – Literal A not preceded by B

Lookarounds do not consume any characters themselves, which makes them very handy when you want to replace some text but only when it is (or is not) preceded or followed by something else.
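
A short sketch of this in Python's re module: each replacement touches only the A, never the B that the lookaround inspected. The lookbehind forms used here, (?<=B)A and (?<!B)A, are the standard syntax; not every engine supports lookbehind. The thousands-separator pattern at the end is one illustrative use, not the only way to do it.

```python
import re

# Lookahead: the B is checked but not consumed, so it survives the replace.
followed = re.sub(r"A(?=B)", "x", "AB AC")
not_followed = re.sub(r"A(?!B)", "x", "AB AC")

# Lookbehind: same idea, looking to the left of the match.
preceded = re.sub(r"(?<=B)A", "x", "BA CA")
not_preceded = re.sub(r"(?<!B)A", "x", "BA CA")

# Both at once: match the empty position that has a digit before it and
# a multiple of three digits after it, then insert a comma there.
grouped = re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", "1234567")

print(followed)      # xB AC
print(not_followed)  # AB xC
print(preceded)      # Bx CA
print(not_preceded)  # BA Cx
print(grouped)       # 1,234,567
```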

