Regular Expressions: Tutorial
posted in productivity on • by Wouter Van SchandevijlThe minimum Regex one should know and still be fairly productive.
Almost all of this stuff should work in most regex implementations.
Notable exceptions are: \< and \>, [a-Z], \Q\E. Your mileage may vary.
Basic Syntax
| Regex | Matches | Remarks |
|---|---|---|
| Literals | ||
| abc | Literals match themselves. Here: abc |
|
| Metacharacters | ||
| ^s | s but only at the start of the input |
|
| e$ | e but only at the end. |
^ and $ match a position. |
| \$ | Escape character \: Match any $ |
|
| . | Match any one character | Matches \r\n with dotall (s) flag |
| \<e | Word that starts with e |
Word boundaries: [a-zA-Z0-9_] |
| e\> | That ends with e |
|
| \b | Word boundary | Between \w and \W |
| Character Classes | ||
| [az] | a or z |
|
| [0-9] | A single digit | \d |
| [a-zA-Z] | ||
| [a-Z] | This might also include []\\^_ and a literal ` |
|
| [\n$^-] | A newline, $, ^ or a hyphen (-) |
A ^ at the start negates the class |
| [^a\b] | Everything but a and ‘backspace’ |
[\b] (backspace) vs \b (word boundary) |
| Quantifiers | ||
| a? | Zero or one | |
| a* | Zero or more | |
| a+ | One or more | |
| a{2,5} | Two to five | |
| a{4} | Exactly four | |
| a{5,} | Five or more | a{,5} for 0 to 5 |
| Grouping | ||
| (ab|yz) | Literal ab or yz. |
Alternation |
| (?:ab) | ab but non-capturing group |
|
| (\w+) \1 | \1 matches the same thing as (\w+) |
Backreferences (also \2 etc) |
Shorthands
| Shorthand | Meaning | Remarks |
|---|---|---|
| \d | [0-9] | \D -> [^0-9] |
| \w | [a-zA-Z0-9_] | |
| \s | Whitespace | |
| \t | Tab | |
| \r | Carriage return | |
| \n | Newline | |
| \b | Word boundary | \A and \Z: Start/End of string |
| \Q…\E | Literal sequence | Ex: \Qlite[]ral\E |
Shorthands can be inverted by capitalizing them: \D (not a digit)
Also \v (vertical tab), \f (form feed).
Inside a character class \b matches backspace.
Modifiers
| Modifier | Description |
|---|---|
g(lobal) |
JS: Match more than once |
i(nsensitive) |
Case insensitive matching |
m(ultiline) |
^$ match every line in the string (vs \A and \Z) |
s (dotall) |
. matches \r|\n |
Replacement
| Replacement | Description |
|---|---|
| $1 | First captured group |
| $2 | Second captured group |
| $$ | A literal $ |
| $& | Entire match |
| $` | Before matched string |
| $’ | After matched string |
| $+ | Last matched string |
Some implementations use \ instead of $.
Use non-capturing groups (?:) to keep your backreferences ($1, $2, …) in check.
Or use named groups if supported in your regex implementation.
Looking around
The last & thoughest feature each developer should definitely know: lookahead & lookbehind!

Example:
A(?=B) – Literal A followed by B
A(?!B) – Literal A not followed by B
Lookarounds do not match anything in themselves which make them very handy when you want to replace some text but only when it is (not) preceded/succeeded by something else.
-
itenium-be/Regex-Tutorial : Powerpoint pptx and hands-on exercises