What is Data-flo?
Modular, reproducible, and extensible data integration and harmonisation.
Data-flo (https://data-flo.io/) is a system for customised integration and manipulation of diverse data via a simple drag-and-drop interface.
Data-flo can easily combine epidemiological data, genomic data, laboratory data, and various metadata from disparate sources (i.e., different data systems) and formats.
Data-flo provides a visual method to design a reusable pipeline to integrate, clean, and manipulate data in a multitude of ways, eliminating the need for continuous manual intervention (e.g., coding, formatting, spreadsheet formulas, manual copy-pasting).
Data-flo pipelines are combinations of ready-to-use data adaptors that can be tailored, modularised, and shared for reuse and reproducibility. Once a Data-flo pipeline has been created, anyone with access can run it at any time, enabling push-button data extraction and transformation. This removes most of the repetitive manual workflows that involve multiple sequential or tedious steps, so practitioners can focus on analysis and interpretation.
- Prepare data for sharing
- Inform public health decisions using epidemiological metadata
- Improve the consistency of data passed from one team to another
- Make data-driven decisions in your genome sequencing lab easier and faster
- Save time on repetitive manual workflows, freeing practitioners to focus on analysis and interpretation
Data-flo turns raw, messy data into clean, usable data
Using Data-flo’s visual, drag-and-drop interface, a user sets up a data-flo to perform the integration and harmonisation: ingesting data from one or more data sources; combining, cleaning, harmonising, and reshaping it; and then sending it onward in a format that is ready for downstream applications or sharing. Once the data-flo has been created, the setup work is done: running it requires only pointing it to new data and clicking the Run button. The run page can be shared with other users so that any of them can run the data-flo.
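To make the ingest, combine, clean, and export stages concrete, here is a minimal sketch in plain Python. It is not Data-flo's own code or API (Data-flo is configured visually, not scripted); the function names, the sample data, and the country lookup table are all assumptions made up for illustration. It only shows the kind of work a data-flo automates.

```python
import csv
import io
from datetime import datetime

# Hypothetical inputs: two sources describing the same samples in different formats.
EPI_CSV = """sample_id,collection_date,country
S1,03/01/2023,uk
S2,15/02/2023,United Kingdom
"""

LAB_CSV = """Sample ID,Species
s1,Klebsiella pneumoniae
S2,Escherichia coli
"""

# Assumed lookup table for harmonising country names.
COUNTRY_NAMES = {"uk": "United Kingdom", "united kingdom": "United Kingdom"}


def ingest(text):
    """Import step: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def harmonise_epi(row):
    """Clean step: normalise the sample ID, the date (to ISO 8601), and the country name."""
    parsed_date = datetime.strptime(row["collection_date"], "%d/%m/%Y").date()
    return {
        "sample_id": row["sample_id"].strip().upper(),
        "collection_date": parsed_date.isoformat(),
        "country": COUNTRY_NAMES.get(row["country"].strip().lower(), row["country"]),
    }


def combine(epi_rows, lab_rows):
    """Combine step: join lab results onto epidemiological rows by sample ID."""
    species_by_id = {r["Sample ID"].strip().upper(): r["Species"] for r in lab_rows}
    return [{**row, "species": species_by_id.get(row["sample_id"], "")} for row in epi_rows]


def export(rows):
    """Export step: write the harmonised rows back out as CSV text."""
    out = io.StringIO()
    fieldnames = ["sample_id", "collection_date", "country", "species"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run(epi_text, lab_text):
    """Run the whole flow: only the input data changes between runs."""
    epi_rows = [harmonise_epi(r) for r in ingest(epi_text)]
    return export(combine(epi_rows, ingest(lab_text)))
```

Calling `run(EPI_CSV, LAB_CSV)` returns one CSV with consistent sample IDs, ISO dates, and a single spelling of the country name. Running it again on new data means passing different text to `run`, which is roughly the scripted equivalent of pointing a data-flo at new data and clicking Run.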
Sometimes, as in public health, users need to keep their data safe behind a firewall. In such cases, the Centre for Genomic Pathogen Surveillance (CGPS) works with those users to install a local version of Data-flo behind their firewall, so that private data never leave their network.
Within the Data-flo software, you create a data-flo by combining adaptors.
A complete data-flo includes import adaptors and export adaptors, which connect the transformation adaptors in the data-flo to external locations.
Each adaptor has input and output arguments that can be defined and customised.
An example of a data-flo pipeline
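The adaptor model described above (steps with declared input and output arguments, chained into a data-flo) can be sketched conceptually in Python. This is an illustration under stated assumptions, not Data-flo's internal design: the `Adaptor` class, the `run_flow` function, and the three adaptor names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Adaptor:
    """Hypothetical model of an adaptor: a named step with declared inputs and outputs."""
    name: str
    inputs: List[str]
    outputs: List[str]
    func: Callable[..., Dict[str, Any]]

    def run(self, values: Dict[str, Any]) -> Dict[str, Any]:
        # Pass only the arguments this adaptor declares as inputs.
        args = {key: values[key] for key in self.inputs}
        result = self.func(**args)
        # Keep only the declared outputs so later steps see predictable names.
        return {key: result[key] for key in self.outputs}


def run_flow(adaptors: List[Adaptor], initial: Dict[str, Any]) -> Dict[str, Any]:
    """Run adaptors in order; each adaptor's outputs become available to later ones."""
    values = dict(initial)
    for adaptor in adaptors:
        values.update(adaptor.run(values))
    return values


# Illustrative adaptors (names are made up for this sketch).
split_lines = Adaptor(
    name="split-lines",
    inputs=["text"],
    outputs=["lines"],
    func=lambda text: {"lines": text.splitlines()},
)
remove_blank = Adaptor(
    name="remove-blank-lines",
    inputs=["lines"],
    outputs=["lines"],
    func=lambda lines: {"lines": [line for line in lines if line.strip()]},
)
count_rows = Adaptor(
    name="count-rows",
    inputs=["lines"],
    outputs=["row_count"],
    func=lambda lines: {"row_count": len(lines)},
)

result = run_flow([split_lines, remove_blank, count_rows], {"text": "a\n\nb\nc\n"})
```

Here `result["lines"]` holds the three non-blank lines and `result["row_count"]` is 3. Declaring inputs and outputs by name is what lets the steps be rearranged or reused: any adaptor that produces `lines` can feed any adaptor that consumes `lines`.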