Data-flo
Regular Expressions (RegEx)

Basic introduction to Regular Expressions


Last updated 2 years ago


Overview

Regular expressions are a way to identify patterns in data. A Regular Expression (RegEx) uses a sequence of characters to specify a search pattern. It can represent something as simple as "every value containing a zero" or something as complicated as "every value that's between 10 and 12 characters long and doesn't contain a capital A, a lowercase g, or a zero".
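The two searches described above can be written as patterns. This is a minimal sketch in TypeScript, assuming Data-flo evaluates patterns the way JavaScript regular expressions do (the /.../ notation on this page suggests so, but that is an assumption). The variable names and sample values are hypothetical, for illustration only.

```typescript
// Assumption: JavaScript-style regular expressions. Names and samples are hypothetical.

// "every value containing a zero": finding a 0 anywhere in the value is enough.
const containsZero = /0/;

// "every value that's between 10 and 12 characters long and doesn't contain
// a capital A, a lowercase g, or a zero":
//   ^ ... $   the pattern must cover the whole value
//   [^Ag0]    any single character except A, g, or 0
//   {10,12}   repeated 10 to 12 times
const tenToTwelveWithoutAg0 = /^[^Ag0]{10,12}$/;

const overviewSamples: string[] = ["sample-101", "abcdefghij", "Bxyz-hello", "short"];

for (const sample of overviewSamples) {
  // Prints the sample, then whether each pattern matches it.
  console.log(sample, containsZero.test(sample), tenToTwelveWithoutAg0.test(sample));
}
// sample-101 true false
// abcdefghij false false
// Bxyz-hello false true
// short false false
```

The length condition only works because of the ^ and $ anchors: without them, /[^Ag0]{10,12}/ would also match any longer value that happens to contain a qualifying 10-character stretch.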

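The backslash \ is an "escape" character: it signals that the character after it should be treated specially. It can make a special character literal, or give an ordinary letter a special meaning. For example, . matches any character, whereas \. matches only a full stop (period); d matches the letter d, whereas \d matches any digit. The structures guide below shows these and other combinations.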
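A short sketch of the escaping behaviour described above, again assuming JavaScript-style regular expressions. The version-number values and variable names are made up for illustration.

```typescript
// Assumption: JavaScript-style regular expressions; names and values are made up.

// "." on its own matches any single character, so this also accepts "v1x0".
const anyCharacterPattern = /^v1.0$/;
// "\." matches only a literal full stop (period).
const fullStopPattern = /^v1\.0$/;

console.log(anyCharacterPattern.test("v1.0")); // true
console.log(anyCharacterPattern.test("v1x0")); // true: "." accepted the "x"
console.log(fullStopPattern.test("v1.0"));     // true
console.log(fullStopPattern.test("v1x0"));     // false: "\." needs a real "."

// The escape works the other way round too: "d" is just the letter d,
// while "\d" means "any digit".
console.log(/^d$/.test("7"));  // false
console.log(/^\d$/.test("7")); // true
```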

Regular Expressions can be overwhelming at first, but most common tasks in Data-flo can be accomplished with a small number of structures. There are numerous resources online for learning RegEx, including a straightforward tutorial at RegexOne (https://regexone.com/).

Basic RegEx structures guide

The following guide lists each RegEx structure, what that structure represents, and examples of what it might match. See the Examples section below for real-world use of these structures in Data-flo adaptor arguments.

  • abcABC... : letters, matched as written. Example: Text

  • 123... : digits (numbers), matched as written. Example: 9825

  • \d : any digit (number). Example: 4

  • \D : any non-digit character. Examples: B; or _

  • . : any single character. Examples: 4; or B; or _

  • \. : a full stop (period). Example: .

  • [abc] : only a, b, or c. Example: [gb]et matches get and bet, but doesn't match let or net

  • [^abc] : any character except a, b, or c. Example: [^ln]et matches get and bet, but doesn't match let or net

  • [a-z] : any lowercase letter from a to z. Example: [a-z]101 matches m101 but not 2101

  • [0-9] : any digit from 0 to 9. Example: [0-9]101 matches 2101 but not a101

  • \w : any "word" character, meaning a letter, a digit, or the underscore _. Examples: a; or T; or 7; or _

  • \W : any character that is not a letter, digit, or underscore. Examples: @; or -; or a space

  • {m} : exactly m repetitions of the preceding character or group. Examples: a{3} matches aaa; [wxy]{3} can match www, xxx, wyy, etc.; [0-9]{2} matches any two digits, such as 07 or 42

  • {m,n} : m to n repetitions. Examples: a{2,4} matches aa, aaa, or aaaa; .{2,3} matches any two- or three-character string

  • * : zero or more repetitions of the preceding character or group. Example: ab*c matches ac, abc, or abbbc

  • + : one or more repetitions of the preceding character or group. Example: AB+ matches AB, ABB, or ABBB (the + applies only to B); to repeat AB as a pair, use (AB)+, which matches AB or ABAB

  • ? : the preceding character is optional (zero or one occurrence). Example: ba?123 matches ba123 or b123 but not a123

  • \s : any whitespace character (space, tab, new-line, carriage return). Example: a\sb matches a b

  • \S : any non-whitespace character (anything but space, tab, new-line, carriage return). Example: a\Sb matches aab but not a b

  • ^...$ : start and end anchors, which tie the pattern to the beginning and end of a field. Example: ^123$ matches 123 but not 1123 or 1233. Note: $ means something different in a replacement value, where $1 refers to a specific capture group.

  • (...) : capture group (the text it matches can be referenced in a replacement value, for example as $1)

  • (a(bc)) : capture sub-group (a group nested inside another group)

  • (.*) : capture all

  • (abc|def) : matches abc or def
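To check a few entries from the guide, the sketch below tests each pattern against a sample value and throws if the result differs from what the guide describes. It assumes JavaScript-style regular expressions and wraps each pattern in ^...$ so that the whole value must match; structureChecks is an illustrative name, not part of Data-flo.

```typescript
// Assumption: JavaScript-style regular expressions. Each entry is
// [pattern, sample value, expected result]; all names are illustrative.
const structureChecks: Array<[RegExp, string, boolean]> = [
  [/^[gb]et$/, "get", true],
  [/^[gb]et$/, "let", false],
  [/^[^ln]et$/, "bet", true],
  [/^[^ln]et$/, "net", false],
  [/^[a-z]101$/, "m101", true],
  [/^[a-z]101$/, "2101", false],
  [/^\w$/, "_", true],          // the underscore counts as a "word" character
  [/^\W$/, "@", true],
  [/^\W$/, "_", false],
  [/^a{3}$/, "aaa", true],
  [/^a{2,4}$/, "aaaaa", false], // five a's is more than 4
  [/^ab*c$/, "ac", true],       // * allows zero b's
  [/^AB+$/, "ABBB", true],      // + repeats only the B
  [/^AB+$/, "ABAB", false],
  [/^(AB)+$/, "ABAB", true],    // group AB to repeat the pair
  [/^ba?123$/, "b123", true],
  [/^ba?123$/, "a123", false],
  [/^a\sb$/, "a b", true],
  [/^a\Sb$/, "a b", false],
  [/^123$/, "1123", false],
  [/^(abc|def)$/, "def", true],
];

for (const [pattern, input, expected] of structureChecks) {
  const actual = pattern.test(input);
  console.log(`${pattern} on "${input}": ${actual}`);
  if (actual !== expected) {
    throw new Error(`Unexpected result for ${pattern} on "${input}"`);
  }
}
```

Without the ^...$ anchors, a pattern only needs to match somewhere inside the value; for example, /AB+/ on its own would find the AB inside ABAB.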

Examples

/^.+-CGPS+-[0-9]{6}/ matches someamountoftext-CGPS-123454, someamountoftext-CGPS-1234540, x-CGPS-0000000000, and x-CGPS-CGPS-000000. In all cases, the start of the field ^ is followed by any character . one or more times +, then the specific text -CGP, then one or more S characters S+, then a hyphen, then any six {6} digits [0-9] (the pattern doesn't specify what happens after the six digits). The + after S applies only to the letter S, not to the whole of -CGPS; x-CGPS-CGPS-000000 matches because .+ takes up x-CGPS.

Adding a dollar sign at the end requires the field to end right after the six digits, so /^.+-CGPS+-[0-9]{6}$/ matches someamountoftext-CGPS-123454 but not someamountoftext-CGPS-1234540.
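The CGPS example can be checked with a short sketch, assuming JavaScript-style regular expressions. The first four values come from the text; x-CGPSS-123456 is made up here to show that S+ repeats only the letter S. Variable names are illustrative.

```typescript
// Assumption: JavaScript-style regular expressions. Variable names are illustrative.
const cgpsPrefix = /^.+-CGPS+-[0-9]{6}/;      // six digits; anything may follow
const cgpsWholeField = /^.+-CGPS+-[0-9]{6}$/; // six digits, then the end of the field

const cgpsIds: string[] = [
  "someamountoftext-CGPS-123454",
  "someamountoftext-CGPS-1234540",
  "x-CGPS-0000000000",
  "x-CGPS-CGPS-000000", // matches because .+ takes up "x-CGPS"
  "x-CGPSS-123456",     // S+ means "one or more S", so -CGPSS- is accepted too
];

for (const id of cgpsIds) {
  console.log(id, cgpsPrefix.test(id), cgpsWholeField.test(id));
}
// someamountoftext-CGPS-123454 true true
// someamountoftext-CGPS-1234540 true false
// x-CGPS-0000000000 true false
// x-CGPS-CGPS-000000 true true
// x-CGPSS-123456 true true
```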

Converting lat/long to negative numbers

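/^([0-9.]+)\s*(W|w|West|WEST|west|S|s|South|south)$/ matches 100.67W and 90.7 S and can be used as the pattern when converting latitude and longitude to negative numbers: with the replacement value -$1, those values become -100.67 and -90.7 respectively. Here ([0-9.]+) captures the number, \s* allows optional whitespace before the direction, and the parentheses with | list the accepted directions. Avoid square brackets for this: [W|w|West|WEST|west|S|s|South|south] is a character class that matches only one character from the set (which also includes E, t, o, and h), so it would turn values such as 45.2E into negative numbers and leave a trailing space in -90.7 .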
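The sketch below compares a pattern that lists the directions in a group with | against the bracketed form [W|w|West|...], assuming JavaScript-style regular expressions and that the replacement is applied to each value separately. toSignedCoordinate is a hypothetical helper, and the extra values 12.5 West, 45.2N, and 45.2E are made up to show the difference.

```typescript
// Assumption: JavaScript-style regular expressions, applied to one value at a time.
//   ^([0-9.]+)  capture group 1: the number (digits and full stops)
//   \s*         optional whitespace before the direction
//   (W|...)     exactly one of the listed direction words or letters
//   $           the direction must end the value
const westOrSouth = /^([0-9.]+)\s*(W|w|West|WEST|west|S|s|South|south)$/;

// Hypothetical helper: values without a west/south marker are returned unchanged.
function toSignedCoordinate(value: string): string {
  return value.replace(westOrSouth, "-$1");
}

console.log(toSignedCoordinate("100.67W"));   // -100.67
console.log(toSignedCoordinate("90.7 S"));    // -90.7
console.log(toSignedCoordinate("12.5 West")); // -12.5
console.log(toSignedCoordinate("45.2N"));     // 45.2N (unchanged)
console.log(toSignedCoordinate("45.2E"));     // 45.2E (unchanged)

// Square brackets build a character class that matches ONE character from the
// set W | w e s t E S T o u h, so the bracketed pattern misfires:
const bracketClass = /(.+)[W|w|West|WEST|west|S|s|South|south]/;
console.log("12.5 West".replace(bracketClass, "-$1")); // -12.5 Wes
console.log("45.2E".replace(bracketClass, "-$1"));     // -45.2 (an east value made negative)
```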

Select and reference everything in a field

  • pattern is everything in the field: /^(.+)$/

    • ^ and $ anchor the start and end of the field, () designates a capture group to reference in the replacement, and .+ means any character, one or more times.

  • replacement is #$1 ($1 means everything in the first capture group, which here is the whole field, so the result is the original value with # added in front)

  • This example shows the two different uses of the dollar sign $ character. In the pattern, it marks the end of the field. In the replacement, $1 refers to a capture group.
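A minimal sketch of this pattern and replacement, assuming JavaScript-style regular expressions, where $1 in a replacement string also refers to the first capture group. The sample values are made up.

```typescript
// Assumption: JavaScript-style regular expressions. Sample values are made up.
const wholeField = /^(.+)$/; // here $ is the end-of-field anchor

// In the replacement "#$1", $1 is the first capture group (the whole field).
console.log("abc123".replace(wholeField, "#$1")); // #abc123
console.log("ff0000".replace(wholeField, "#$1")); // #ff0000

// .+ needs at least one character, so an empty field is left unchanged.
console.log("".replace(wholeField, "#$1") === ""); // true
```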

External Resources (unrelated to CGPS)

RegexOne

A straightforward RegEx tutorial: https://regexone.com/

Autoregex

This resource (https://www.autoregex.xyz/home) lets you describe a pattern in plain English and returns a RegEx, which can be a good way to familiarize yourself with the concepts and get started on a complicated pattern.

RegEx101

Regex101 (https://regex101.com/) is a good place to build, test, and debug regular expressions, although some users find the interface unintuitive.

RegExr

To get a step-by-step walk-through of how Regular Expressions work, and more information about the structures involved, visit RegExr (https://regexr.com/). It is also another place to build, test, and debug RegEx.