Building a data-flo

Import data --> clean / join / organize data --> export data

The details of what happens in a data-flo are generally hidden from the end user; they'll see the Run Page and simply set the necessary inputs. This is true regardless of the permissions.

Some things to consider when building a data-flo:

What are you trying to accomplish?
Where are you getting the data to bring into Data-flo?
What will you be doing with the data you get out of Data-flo?
Who will be running this data-flo once it's built?
Are there sections of your data process that can be modular?
Is there an existing data-flo that does part of what you need to do, or does something similar?

The idea is that you build one data-flo and then reuse it on different data. A data-flo does not store any data: it does not remember the data from the previous run (with the exception of information used in "Bind to a value").
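Data-flo is built visually rather than in code, but the "build once, reuse on different data" idea can be pictured as a function that keeps no state between calls. The Python sketch below is only an analogy: `run_flow`, the column names, and the sample rows are all hypothetical and are not part of Data-flo.

```python
def run_flow(rows):
    """Hypothetical stand-in for a data-flo: import -> clean -> export."""
    # "Clean" step: drop rows with a missing sample ID and trim whitespace.
    cleaned = [
        {key: value.strip() for key, value in row.items()}
        for row in rows
        if row.get("sample_id", "").strip()
    ]
    # "Export" step: return the result; nothing is kept between runs.
    return cleaned


# The same flow is reused on two unrelated batches of input.
batch_a = [{"sample_id": " S1 ", "site": "Lab A"}, {"sample_id": "", "site": "Lab A"}]
batch_b = [{"sample_id": "S9", "site": " Lab B "}]

result_a = run_flow(batch_a)
result_b = run_flow(batch_b)
print(result_a)
print(result_b)
```

Each call depends only on the rows passed in, which mirrors how a data-flo run depends only on the inputs set on the Run Page.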

Sometimes, a data-flo is a compilation of other, modular data-flos that perform specific series of tasks.

Ways to build a data-flo

You can build a data-flo by:

Starting from scratch

In the bottom right corner of the Transformations Page, click the purple Plus Button and select "New dataflow".
Give your data-flo a name.
Set the description (e.g. the purpose of the data-flo, what it's doing, etc.).
Set the folder if you want to organize your data-flos that way.
Determine who can access the Run Page for this data-flo, and set permissions accordingly (Tip: leave the data-flo as "Private" until it is completely built).
There's no rule about how to start building the data-flo, but generally it's best to start by importing data, and working from left to right, checking the outputs of each adaptor as you go.
Tip: use Debugging mode, and test-run the data-flo frequently as you build it.

Copying an existing data-flo

You may not need to start from scratch. Any data-flo you can access can act as a template. Copy the existing data-flo and adapt it to your needs.

Navigate to the RUN PAGE of the data-flo you want to copy.
Click the 'copy' icon to open a new copy of the data-flo. It has the same name as the original.
Rename the new data-flo.
Adapt as needed -- note that this new data-flo is an exact copy of the original EXCEPT that "secret inputs" are not included (because they were defined as secret).

Importing a Data-flo .json manifest file

In the bottom right corner of the Transformations Page, click the purple Plus Button and select "Import".

Using an existing data-flo as a step in a new data-flo

Building modular or nested data-flos is a great way to save time and avoid duplicating effort. If there's a data source or a series of steps you will need to use repeatedly, it often makes sense to create a modular data-flo instead of rebuilding the same steps or creating many slightly different copies of a data-flo.

Existing data-flos show up in the Canvas Page menu just like adaptors do, in a section called “Data-flos”. As with adaptors, the data-types matter -- you can always check the expected input or output type by clicking on the argument.

Example

A common laboratory need is to join the instrument output from a plate-reader analysis with the information about the samples that went onto the plate. This needs to be done every time a new plate is run through the lab, and the joined data may be used in many ways. In this example, the "Sample wells and well metrics" data-flo performs the join. It can then be plugged into a new data-flo as if it were a single step. This makes the new data-flo simpler to understand and ensures that the join is performed the same way in every data-flo that uses it, which increases consistency and interchangeability.
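For readers who think in code, the nesting described above resembles one function calling another. The Python sketch below is an analogy only, not how Data-flo is implemented: the function names, the column names (`well`, `sample_id`, `od600`), the sample values, and the choice of an inner join are all assumptions made for illustration.

```python
def join_wells(sample_rows, metric_rows):
    """Hypothetical analogue of the "Sample wells and well metrics" data-flo."""
    # Index the instrument output by well position for fast lookup.
    metrics_by_well = {row["well"]: row for row in metric_rows}
    joined = []
    for sample in sample_rows:
        metrics = metrics_by_well.get(sample["well"])
        if metrics is None:
            continue  # assumed inner join: skip wells with no instrument reading
        joined.append({**sample, **metrics})
    return joined


def summarise_plate(sample_rows, metric_rows):
    """A second, larger flow that reuses the join as a single step."""
    joined = join_wells(sample_rows, metric_rows)
    return {row["sample_id"]: row["od600"] for row in joined}


# Made-up plate data for illustration.
samples = [
    {"well": "A1", "sample_id": "S1"},
    {"well": "A2", "sample_id": "S2"},
    {"well": "A3", "sample_id": "S3"},
]
metrics = [
    {"well": "A1", "od600": 0.42},
    {"well": "A2", "od600": 0.57},
]

joined_rows = join_wells(samples, metrics)
summary = summarise_plate(samples, metrics)
print(joined_rows)
print(summary)
```

Because `summarise_plate` calls `join_wells` rather than repeating its logic, any fix to the join is picked up everywhere it is used, which is the same benefit a nested data-flo provides.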

Set your outputs

To complete a data-flo, there needs to be at least one adaptor output checked as "Mark as data-flo output" (without this, the Run Page will say that the data-flo is incomplete and cannot be run).

Some adaptors push data to another location, while others make the data available on the Run Page. On the Run Page, the results from multiple outputs are shown in an order determined by the order the outputs were defined. Changes can be made to this order by altering the data-flo (deleting and redefining the outputs).

When you select "Mark as data-flo output", the "name" you enter acts as a label that the user will see. If a file is being created, the “filename” argument determines the name of the file, while the output's name is what is shown on the Run Page. A descriptive output name is especially helpful when the data-flo is being nested within another data-flo.

Once your data-flo is complete, you can share it with others.

Permissions (Access control) enable people other than the owner to access and run the data-flo.

You can also export the data-flo as a .json file, and send that to others. Anyone with the file can upload it into Data-flo themselves.
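Before sending an exported file to someone, it can be worth confirming that it is still well-formed JSON. The Python sketch below does only that and lists the top-level keys; it makes no assumption about what a Data-flo manifest contains. The helper `top_level_keys`, the file name, and the stand-in contents are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def top_level_keys(manifest_path):
    """Parse a JSON file and return its top-level keys (or [] if it is not an object)."""
    with open(manifest_path, encoding="utf-8") as handle:
        document = json.load(handle)  # raises json.JSONDecodeError if malformed
    return sorted(document) if isinstance(document, dict) else []


# Stand-in file with made-up contents; a real export will look different.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "example-export.json"
    path.write_text(json.dumps({"name": "demo", "steps": []}), encoding="utf-8")
    keys = top_level_keys(path)

print(keys)
```

If the file has been truncated or edited by hand incorrectly, `json.load` raises an error instead of returning keys, which is a quick signal that the recipient's import is likely to fail as well.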

Keep in mind that you can use secret inputs in data-flos you plan to share.


