What is Data-flo

Modular, reproducible, and extensible data integration and harmonisation.

Upgrade Announcement

In April 2024, a new version of Data-flo was released for testing at https://next.data-flo.io/. This large-scale upgrade includes significant changes to the user interface, an improved ability to process large datasets, additional adaptors, and enhanced sharing permissions. The current version of Data-flo, described here, is planned to be sunset on 31 March 2025. Release plan: https://cgps.gitbook.io/data-flo/readme/new-release

What is Data-flo?

Data-flo (https://data-flo.io/) is a system for customised integration and manipulation of diverse data via a simple drag and drop interface.

Data-flo can easily combine epidemiological data, genomic data, laboratory data, and various metadata from disparate sources (i.e., different data systems) and formats.

Data-flo provides a visual method to design a reusable pipeline to integrate, clean, and manipulate data in a multitude of ways, eliminating the need for continuous manual intervention (e.g., coding, formatting, spreadsheet formulas, manual copy-pasting).

Data-flo pipelines are combinations of ready-to-use data adaptors that can be tailored, modularised and shared for reuse and reproducibility. Once a Data-flo pipeline has been created, anyone with access can run it at any time, enabling push-button data extraction and transformation. This saves significant time by automating most of the repetitive, multi-step manual work, so practitioners can focus on analysis and interpretation.

Why might I use Data-flo?

  • Prepare data for sharing

  • Automatically update a Microreact project with fresh data

  • Inform public health decisions using epidemiological metadata

  • Improve the consistency of data passed from one team to another

  • Make easier, faster data-driven decisions in your genome sequencing lab

  • Save time by automating repetitive, multi-step manual workflows, so practitioners can focus on analysis and interpretation

Data-flo turns raw, messy data into clean, usable data

Overview

Using Data-flo’s visual, drag-and-drop interface, a user builds a data-flo that performs the integration and harmonisation: ingesting data from one or more data sources; combining, cleaning, harmonising, and reshaping it; and then sending the data onward in a format that’s ready for downstream applications or sharing. Once the data-flo has been created, the setup work is done: running it requires only pointing it to new data and clicking the Run button. The run page can be shared with other users, so that any of them can run the data-flo.

In some settings, such as public health, users need to keep their data safe behind a firewall. In such cases, CGPS works with those users to install a local version of Data-flo behind their firewall, so that private data never leave their network.

Within the Data-flo software, you create a data-flo by combining adaptors.

A complete data-flo includes import and export adaptors, which connect its transformation adaptors to external data locations.

Each adaptor has input and output arguments that can be defined and customised.
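Data-flo itself is configured visually rather than in code, so the following is only a conceptual sketch of the adaptor model described above, not Data-flo's implementation or API. It assumes that an adaptor can be modelled as a function whose keyword parameters stand in for input arguments and whose returned dictionary keys stand in for output arguments. The function names (`import_csv`, `strip_values`, `rename_column`, `export_csv`, `run_pipeline`) are hypothetical and chosen for illustration only.

```python
import csv
import io
from typing import Dict, List

# Hypothetical stand-in for a data table: a list of row dictionaries.
Table = List[Dict[str, str]]


def import_csv(text: str) -> Dict[str, Table]:
    # Hypothetical import adaptor: parse CSV text into a table (output argument "data").
    return {"data": list(csv.DictReader(io.StringIO(text)))}


def strip_values(data: Table) -> Dict[str, Table]:
    # Hypothetical transformation adaptor: trim surrounding whitespace from every value.
    cleaned = [
        {k: (v.strip() if isinstance(v, str) else v) for k, v in row.items()}
        for row in data
    ]
    return {"data": cleaned}


def rename_column(data: Table, old: str, new: str) -> Dict[str, Table]:
    # Hypothetical transformation adaptor: "old" and "new" are customisable input arguments.
    renamed = [{(new if k == old else k): v for k, v in row.items()} for row in data]
    return {"data": renamed}


def export_csv(data: Table) -> Dict[str, str]:
    # Hypothetical export adaptor: serialise the table back to CSV text (output argument "csv").
    if not data:
        return {"csv": ""}
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(data[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(data)
    return {"csv": out.getvalue()}


def run_pipeline(raw_csv: str) -> str:
    # Wire each adaptor's "data" output into the next adaptor's "data" input.
    step1 = import_csv(raw_csv)
    step2 = strip_values(step1["data"])
    step3 = rename_column(step2["data"], old="country", new="Country")
    step4 = export_csv(step3["data"])
    return step4["csv"]


if __name__ == "__main__":
    print(run_pipeline("id,country\n1, UK \n2,Brazil\n"), end="")
```

Running the script prints `id,Country`, `1,UK` and `2,Brazil` on separate lines. Re-running `run_pipeline` on a new input file, without changing any of the steps, mirrors the "point it to new data and click Run" idea in this simplified model.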

An example of a Data-flo pipeline
