Regular Expressions (RegEx)
Basic introduction to Regular Expressions
Regular expressions are a way to identify patterns in data. A regular expression (RegEx) uses a sequence of characters to specify a search pattern. It can represent something as simple as "every value containing a zero" or something as complicated as "every value that is between 10 and 12 characters long and doesn't contain a capital A, a lowercase g, or a zero".
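As a rough illustration, the two descriptions above could be written as the patterns 0 and ^[^Ag0]{10,12}$. The sketch below tests them with Python's built-in re module. The sample values and the helper name matching are made up for the example, and Data-flo's own regex engine may differ from Python's in small details.

```python
import re

# "Every value containing a zero": search() looks anywhere in the value.
CONTAINS_ZERO = re.compile(r"0")

# "Between 10 and 12 characters long, with no capital A, lowercase g or zero":
#   ^ and $   anchor the start and end, so the whole value is checked
#   [^Ag0]    any single character except A, g or 0
#   {10,12}   repeated between 10 and 12 times
LENGTH_AND_EXCLUSIONS = re.compile(r"^[^Ag0]{10,12}$")


def matching(pattern: re.Pattern, values: list[str]) -> list[str]:
    """Return the values in which the pattern finds a match (hypothetical helper)."""
    return [v for v in values if pattern.search(v)]


values = ["sample-0001", "xylophones!", "short", "bigAppleTree", "abcdefghij"]
print(matching(CONTAINS_ZERO, values))          # ['sample-0001']
print(matching(LENGTH_AND_EXCLUSIONS, values))  # ['xylophones!']
```

"bigAppleTree" is the right length but contains both A and g, and "abcdefghij" is excluded because of its g.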
The \ character is an "escape" character: it signals that the character after it should be treated differently from usual. For example, "." matches any single character, while "\." matches only a literal full stop, as shown in the structures guide below.
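A quick way to see the difference the escape character makes is to test both versions against a few values. This is a minimal sketch using Python's re module (an assumption for illustration; it is not how Data-flo runs its patterns), with made-up sample values. The r"..." raw strings pass the backslash through to the regex engine unchanged.

```python
import re

# "." on its own matches any single character.
UNESCAPED = re.compile(r"1.5")
# "\." is an escaped dot, which matches only a literal full stop.
ESCAPED = re.compile(r"1\.5")

for value in ["1.5", "1x5", "125"]:
    # fullmatch() requires the whole value to match the pattern.
    print(value, bool(UNESCAPED.fullmatch(value)), bool(ESCAPED.fullmatch(value)))
# 1.5 True True
# 1x5 True False
# 125 True False
```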
Regular Expressions can be quite overwhelming at first, but most standard needs in Data-flo will be accomplished using a small number of structures. There are numerous resources online for learning RegEx, including a straightforward tutorial at RegexOne.
The following table shows the structure of a piece of RegEx, what that structure represents, and examples of what it might match. See below for specific examples showing real-world use of these structures in Data-flo adaptor arguments.
x-CGPS-CGPS-000000 because in all cases, the start of the line "^" is followed by any character "." any number of times "+", followed by the very specific text "-CGPS" any number of times "+", followed by any six digits "[0-9]". The pattern doesn't specify what happens after the six digits.
If a dollar sign ($) is added at the end of the pattern, it signifies that the six digits must be the end of the line, so a value with any characters after those six digits no longer matches.
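The table's own example pattern isn't reproduced here, so the sketch below uses a simplified, hypothetical pattern, ^.+-CGPS-[0-9]{6}, to show what adding $ changes. It uses Python's re module; apart from x-CGPS-CGPS-000000, the IDs are invented.

```python
import re

# Hypothetical, simplified ID pattern (not the exact one from the table):
#   ^         start of the value
#   .+        any character, one or more times
#   -CGPS-    the literal text "-CGPS-"
#   [0-9]{6}  exactly six digits
OPEN_ENDED = re.compile(r"^.+-CGPS-[0-9]{6}")
# Adding $ requires the six digits to be the very end of the value.
END_ANCHORED = re.compile(r"^.+-CGPS-[0-9]{6}$")

ids = ["x-CGPS-CGPS-000000", "x-CGPS-000000-extra", "x-CGPS-0000001"]

for value in ids:
    print(value, bool(OPEN_ENDED.search(value)), bool(END_ANCHORED.search(value)))
# x-CGPS-CGPS-000000 True True
# x-CGPS-000000-extra True False
# x-CGPS-0000001 True False
```

Without $, trailing text and a seventh digit are silently accepted; with $, only values that end right after the six digits match.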
The pattern ^(.+)$ can be used when converting latitude and longitude values to negative numbers, with the replacement value -$1 turning those values into negative numbers (for example, 90.7 becomes -90.7):
- The pattern captures everything in the field:
  ^ and $ anchor the start and end of the field,
  () designates a capture group to reference in the replacement, and
  .+ means any character, any number of times (at least once).
- The replacement is -$1, where $1 means everything in the first capture group, which here is the whole field.
- This example shows the two different uses of the dollar sign ($) character. In the pattern, it marks the end of the field. In the replacement, it introduces a reference to a capture group.
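The same pattern and replacement can be tried outside Data-flo. This minimal Python sketch assumes the values are plain numbers stored as text, and the function name make_negative is hypothetical. One difference to be aware of: Python writes the capture-group reference in the replacement as \1 rather than $1.

```python
import re

# ^ and $ anchor the start and end of the field, () captures the contents,
# and .+ means any characters, at least once, so group 1 is the whole value.
WHOLE_FIELD = re.compile(r"^(.+)$")


def make_negative(value: str) -> str:
    """Prefix the whole field with a minus sign."""
    # Data-flo writes the replacement as -$1; Python spells the same
    # reference to capture group 1 as \1.
    return WHOLE_FIELD.sub(r"-\1", value)


print(make_negative("90.7"))       # -90.7
print(make_negative("12.34"))      # -12.34
print(repr(make_negative("")))     # '' (no match: .+ needs at least one character)
```

Because the pattern takes the whole field, a value that is already negative would become --5, so it is worth applying the replacement only to the rows that need it.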
This resource (www.autoregex.xyz) lets you describe a pattern in plain English and returns the corresponding RegEx, which can be a good way to get familiar with the concepts and to get started on a complicated pattern.
Example of input and output on autoregex.xyz site
Regex101 is a good place to test and debug RegEx patterns, although some users find the interface unintuitive.