
Regular Expressions (RegEx)

Basic introduction to Regular Expressions

Overview

Regular expressions are a way to identify patterns in data. A Regular Expression (RegEx) uses a sequence of characters to specify a search pattern, and it can represent something as simple as "every value containing a zero" or something as complicated as "every value that's between 10 and 12 characters in length and doesn't contain a capital A, a lowercase g, or a zero".
The backslash \ character is an "escape" character, which signals that the character after it should be treated differently from its usual meaning: \d is any digit rather than the letter d, and \. is a literal full stop rather than "any character". This means . returns different results than \. , as shown in the structures guide below.
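The difference between . and \. can be checked with a short sketch. It assumes a JavaScript-style regex engine (which the /pattern/ notation used on this page resembles); the sample values are hypothetical and chosen only for illustration.

```javascript
// Hypothetical sample values, chosen only to illustrate the difference.
const values = ["3.14", "3x14", "314"];

// Unescaped dot: any single character may sit between 3 and 14.
const anyChar = /^3.14$/;
// Escaped dot: only a literal full stop may sit between 3 and 14.
const literalDot = /^3\.14$/;

const matchedByAnyChar = values.filter((v) => anyChar.test(v));
const matchedByLiteralDot = values.filter((v) => literalDot.test(v));

console.log(matchedByAnyChar);    // [ '3.14', '3x14' ]
console.log(matchedByLiteralDot); // [ '3.14' ]
```

The value 314 matches neither pattern, because both require some character between the 3 and the 14.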
Regular Expressions can be quite overwhelming at first, but most standard needs in Data-flo will be accomplished using a small number of structures. There are numerous resources online for learning RegEx, including a straightforward tutorial at RegexOne.

Basic RegEx structures guide

The following table shows each RegEx structure, what that structure represents, and examples of what it might match. See below for specific examples showing real-world use of these structures in Data-flo adaptor arguments.
| RegEx structure | RegEx meaning | Examples |
| --- | --- | --- |
| abcABC... | letters | Text |
| 123... | digits (numbers) | 9825 |
| \d | any digit (number) | 4 |
| \D | any non-digit character | B; or _ |
| . | any character | 4; or B; or _ |
| \. | full stop (period) | . |
| [abc] | only a, b, or c | [gb]et matches get and bet, but doesn't match let or net |
| [^abc] | not a, b, or c | [^ln]et matches get and bet, but doesn't match let or net |
| [a-z] | lowercase letters a to z | [a-z]101 matches m101 but not 2101 |
| [0-9] | numbers 0 to 9 | [0-9]101 matches 2101 but not a101 |
| \w | any alphanumeric character or underscore | a; or T; or 7; or _ |
| \W | any character that is not alphanumeric or an underscore | @; or - |
| {m} | m repetitions | a{3} matches aaa; [wxy]{3} can match www, xxx, wyy, etc.; [0-9]{2} matches any two-digit number |
| {m,n} | m to n repetitions | a{2,4} matches aa or aaa or aaaa; .{2,3} matches any two- or three-character string |
| * | zero or more repetitions | AB* matches A or AB or ABB |
| + | one or more repetitions | AB+ matches AB or ABB or ABBB, but not A or BA (the + applies only to the B; to repeat the pair, use a group: (AB)+ matches AB or ABAB) |
| ? | optional character | ba?123 matches ba123 or b123 but not a123 |
| \s | any whitespace (space, tab, new-line, carriage return) | a\sb matches a b |
| \S | any non-whitespace character (anything but space, tab, new-line, carriage return) | a\Sb matches aab but not a b |
| ^...$ | start and end anchors (the beginning and end of a field). Note: $ means something different in a replacement reference, where it refers to a specific capture group | ^123$ matches 123 but not 1123 or 1233 |
| (...) | capture group | |
| (a(bc)) | capture sub-group | |
| (.*) | capture all | |
| (abc|def) | matches abc or def | |
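Several of the structures above can be verified with a short sketch, assuming a JavaScript-style regex engine. The ^ and $ anchors are added here so that each pattern must match the whole value; the checks array and variable names are hypothetical.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)].
const checks = [
  [/^[gb]et$/, "get", true],
  [/^[gb]et$/, "let", false],
  [/^[^ln]et$/, "bet", true],
  [/^[^ln]et$/, "net", false],
  [/^a{2,4}$/, "aaa", true],
  [/^a{2,4}$/, "a", false],
  [/^ba?123$/, "b123", true],
  [/^ba?123$/, "a123", false],
  [/^a\sb$/, "a b", true],
  [/^a\Sb$/, "a b", false],
  [/^(abc|def)$/, "def", true],
];

// Run every pattern against its input and compare with the expectation.
const results = checks.map(([pattern, input]) => pattern.test(input));
const allAsExpected = checks.every(([, , expected], i) => results[i] === expected);
console.log(allAsExpected); // true

// Capture groups are numbered by the position of their opening parenthesis,
// so the outer group is 1 and the sub-group is 2.
const groups = "abc".match(/(a(bc))/);
console.log(groups[1]); // abc
console.log(groups[2]); // bc
```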

Examples

/^.+-CGPS+-[0-9]{6}/ matches someamountoftext-CGPS-123454, someamountoftext-CGPS-1234540, x-CGPS-0000000000, and x-CGPS-CGPS-000000. In all cases, the start of the line ^ is followed by any character . one or more times + , followed by the literal text -CGP and one or more S characters (the + applies only to the character immediately before it, so S+ matches S, SS, and so on; to repeat the whole text, group it as (-CGPS)+), followed by a hyphen and any six {6} digits [0-9]. The pattern doesn't specify what happens after the six digits, so extra trailing digits are allowed. x-CGPS-CGPS-000000 matches because the leading .+ absorbs x-CGPS.
If a dollar sign is added at the end, it signifies that the six digits must be the end of the line, so /^.+-CGPS+-[0-9]{6}$/ matches someamountoftext-CGPS-123454 but not someamountoftext-CGPS-1234540.
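The effect of the trailing $ can be sketched as follows, assuming a JavaScript-style regex engine. The first four sample values come from the text above; the last one, x-CGPSSS-123456, is a hypothetical value added to show that + repeats only the character immediately before it (here, S).

```javascript
const unanchored = /^.+-CGPS+-[0-9]{6}/;
const anchored = /^.+-CGPS+-[0-9]{6}$/;

const samples = [
  "someamountoftext-CGPS-123454",
  "someamountoftext-CGPS-1234540",
  "x-CGPS-0000000000",
  "x-CGPS-CGPS-000000",
  "x-CGPSSS-123456", // hypothetical value: repeated S, exactly six digits
];

// Without $, anything may follow the six digits.
const unanchoredResults = samples.map((s) => unanchored.test(s));
// With $, the six digits must be the last characters in the value.
const anchoredResults = samples.map((s) => anchored.test(s));

console.log(unanchoredResults); // [ true, true, true, true, true ]
console.log(anchoredResults);   // [ true, false, false, true, true ]
```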

Converting lat/long to negative numbers

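/(.+)[W|w|West|WEST|west|S|s|South|south]/ matches 100.67W and 90.7 S and can be used as the pattern when converting latitude and longitude to negative numbers, with the replacement value -$1 turning those values into -100.67 and -90.7 respectively (the space before the S in 90.7 S is captured along with the number, so trim it if needed). Note that square brackets define a character class, not a list of words: this pattern matches any single one of the listed characters, so a value such as 100.67West would become -100.67Wes. To match whole direction words, use a group with alternation and anchor it to the end of the field, for example /^(.+?)\s*(West|WEST|west|W|w|South|SOUTH|south|S|s)$/.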
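The conversion can be sketched as below, assuming a JavaScript-style replace with $1 references. The pattern here is a variant that uses a parenthesised alternation plus \s*, so that whole direction words and an optional space are consumed; this variant, the toSigned helper, and the 12.5 West and 45.2N inputs are assumptions for illustration rather than part of Data-flo.

```javascript
// Variant pattern (an assumption): lazy capture of the number, optional
// whitespace, then one whole direction word or letter at the end of the field.
const directionPattern = /^(.+?)\s*(West|WEST|west|W|w|South|SOUTH|south|S|s)$/;

// Hypothetical helper: prefix the captured number with a minus sign.
// Values that do not end in a West/South marker are returned unchanged.
function toSigned(value) {
  return value.replace(directionPattern, "-$1");
}

console.log(toSigned("100.67W"));   // -100.67
console.log(toSigned("90.7 S"));    // -90.7
console.log(toSigned("12.5 West")); // -12.5
console.log(toSigned("45.2N"));     // 45.2N
```

The results are still strings; converting them to numbers would be a separate step.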

Select and reference everything in a field

  • pattern is everything in the field: /^(.+)$/
    • ^ and $ anchor the start and end of the field, () designate a capture group to reference in the replacement, and .+ means any character, one or more times.
  • replacement is #$1 (where $1 means everything in the first capture group, which here is everything you've selected)
  • This example shows the two different uses of the dollar sign $ character. In the pattern, it means the end of the field. In the reference, it signifies a capture group.
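The pattern and replacement above can be sketched as follows, assuming a JavaScript-style replace; the input value ABC123 is hypothetical.

```javascript
// ^ and $ anchor the whole field; (.+) captures everything in it.
const wholeField = /^(.+)$/;

// In the replacement, $1 stands for the text captured by the first group.
const prefixed = "ABC123".replace(wholeField, "#$1");
console.log(prefixed); // #ABC123

// .+ needs at least one character, so an empty field is left unchanged.
const emptyUnchanged = "".replace(wholeField, "#$1");
console.log(emptyUnchanged === ""); // true
```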

External Resources (unrelated to CGPS)

RegexOne

To get a step-by-step walk-through of how Regular Expressions work, and more information about the structures involved, visit https://regexone.com/.

Autoregex

This resource (www.autoregex.xyz) allows you to write plain English and return RegEx, which can be a good way to familiarize yourself with the concepts and get started creating a complicated pattern.
https://www.autoregex.xyz/home
Example of input and output on autoregex.xyz site

RegEx101

Regex101 is a good place to test and debug RegEx functionality, although some users find the interface unintuitive.
regex101: build, test, and debug regex

RegExr

Another place to build, test, and debug RegEx is https://regexr.com/.