Data-flo

There's no single-column unique row ID (primary key)

Last updated 2 years ago

A single-column identifier key is needed

Data best practices demand that each row of data contains a unique identifier that is not shared by any other row (in a database, this is called the primary key). Often this is the value in a single column, but sometimes a combination of columns sets the row's identity (also known as a composite key).

Some data analysis methods require that uniqueness to be held in a single column.

Scenario 1: Need single ID column in Microreact

Microreact requires a single ID value that's shared across the data table, the tree file, the network file, etc., and this ID must be contained within a single column of each data source.

Scenario 2: Join-datatables adaptor needs single column IDs

Data-flo cannot properly join two datatables unless there is a single column in each table defining the join relationship. In this example, the unique rows are defined by the combination of "Patient" and "Sample Date". Joining on either of these columns alone will run in Data-flo, but the output will be incorrect.
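To illustrate why a single-column join goes wrong when the real key is composite, here is a minimal sketch in plain Python. It is a generic inner join, not Data-flo's join-datatables implementation, and the sample rows, the `Species` and `Resistant` columns, and the `inner_join` helper are all hypothetical.

```python
# Hypothetical data: rows are unique only by (Patient, Sample Date).
samples = [
    {"Patient": "P1", "Sample Date": "2021-01-05", "Species": "E. coli"},
    {"Patient": "P1", "Sample Date": "2021-02-10", "Species": "K. pneumoniae"},
]
results = [
    {"Patient": "P1", "Sample Date": "2021-01-05", "Resistant": "yes"},
    {"Patient": "P1", "Sample Date": "2021-02-10", "Resistant": "no"},
]


def inner_join(left, right, keys):
    """Pair every left row with every right row that matches on all keys."""
    joined = []
    for left_row in left:
        for right_row in right:
            if all(left_row[k] == right_row[k] for k in keys):
                joined.append({**left_row, **right_row})
    return joined


# Joining on Patient alone pairs each sample with both results.
by_patient = inner_join(samples, results, ["Patient"])
# Joining on the full composite key pairs each sample with its own result.
by_both = inner_join(samples, results, ["Patient", "Sample Date"])

print(len(by_patient))  # 4 rows: every combination for patient P1
print(len(by_both))     # 2 rows: one per real sample
```

The single-column join runs without error but produces rows that mix one sample's data with another sample's result, which is the kind of silent error described above.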

Solution: create a single-column row identifier

Determine which columns together make each row unique. These columns must be present, with the same values, in all data sources. The combination should be the minimal set of attributes required to specify one row and differentiate it from all other rows.
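One way to test a candidate combination before building the key is to count how often each combination of values occurs. The sketch below does this outside Data-flo in plain Python; the `duplicated_keys` helper and the sample rows are hypothetical, and the column names are borrowed from the example in this section.

```python
from collections import Counter


def duplicated_keys(rows, columns):
    """Return the value combinations that occur in more than one row."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical rows for illustration only.
rows = [
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2021-03-01"},
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2021-03-15"},
    {"Patient": "P1", "Sample": "S2", "Sequencing Date": "2021-03-01"},
    {"Patient": "P2", "Sample": "S1", "Sequencing Date": "2021-03-01"},
]

# Two columns are not enough: (P1, S1) appears twice.
print(duplicated_keys(rows, ["Patient", "Sample"]))
# All three columns together identify every row, so nothing is duplicated.
print(duplicated_keys(rows, ["Patient", "Sample", "Sequencing Date"]))
```

An empty result means the combination is unique in that dataset. If a smaller set of columns also returns an empty result, prefer the smaller set.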

Example

In this example, the uniqueness of the row is a combination of three columns (Patient, Sample, Sequencing Date). If this is the combination that is required to enable a join to other datasets, then the columns-concatenation adaptor would be used.

Arguments:
columns = Patient, Sample, Sequencing Date
delimiter = _
target = Row ID
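The effect of these arguments can be sketched in plain Python. This is an illustration of the idea only, not the adaptor's actual implementation; the `concatenate_columns` function and the sample rows are hypothetical, while the parameter names mirror the arguments listed above.

```python
def concatenate_columns(rows, columns, delimiter, target):
    """Return copies of the rows with a new target column that joins
    the values of the given columns using the delimiter."""
    return [
        {**row, target: delimiter.join(str(row[c]) for c in columns)}
        for row in rows
    ]


# Hypothetical rows for illustration only.
rows = [
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2021-03-01"},
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2021-03-15"},
]

with_id = concatenate_columns(
    rows,
    columns=["Patient", "Sample", "Sequencing Date"],
    delimiter="_",
    target="Row ID",
)

for row in with_id:
    print(row["Row ID"])
# P1_S1_2021-03-01
# P1_S1_2021-03-15
```

Choose a delimiter that does not occur in the values themselves, so that different value combinations cannot collapse into the same ID.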

Now the new column can be used as the ID column in Microreact:

Use columns-concatenation to combine the values from those columns into a new column to use as the ID (key).

Figure: Selecting the ID column in Microreact requires a single column for uniqueness
Figure: The same ID column is used to make the connection between Tree data & Metadata
Figure: Connecting these two datatables requires using Patient and SampleDate columns
Figure: Data lacking a single-column row identifier
Figure: Adaptor card
Figure: New Row ID column contains concatenation of Patient, Sample, and Sequencing Date.