Data in separate files

Example scenarios and solutions

One column is separate from main data

Problem: Table 1 has plate layout, Table 2 has plate data

Example scenario: In a laboratory, there is a list of samples to test in a 96-well plate. There is a spreadsheet tracking which sample went into which well location.

The 96-well plate gets loaded onto the plate reader, which produces a data file with well number and measurement reading.

The plate reader measurements are useful when they are associated back to the samples. It's possible to associate them because both spreadsheets contain well location information.

  • Sample list: Sample ID, Well Location

  • Plate reader data: Well location and all the readings associated with that well.

Goal: associate the plate reader data with the sample IDs. Since Well Number is the same in both sheets, this is your Key in the Key-Value pair, and Sample is the Value.

Solution: Create data map, Extend datatable using map for values argument

While this problem can also be solved using Join-datatables, it is a simple single-column mapping that provides a good example of how the map data type can be useful. In this case,

One shared column, many different columns

When only one column is missing from the desired datatable, it can often be pulled in via a map (see Solution 1). When multiple columns need to be pulled from a second file, however, a larger JOIN is required.

Problem: Lab data and Epi data in two files

Making data-driven decisions requires information contained in many columns from LIMS (laboratory information management system) and many columns from a infectious disease surveillance system. The columns are different, but there is a unique ID that is common between the files.

Solution: Join datatables, keep multiple columns from both files

This is a simple use case for the join-datatables adaptor. One datatable becomes the "main" datatable, and columns from the "other" datatable are pulled in via matching on the unique ID shared between the two datasets.

Multiple files with the same columns

If the rows in two files could be all in the same spreadsheet without altering the column names or data types, you may want to copy-paste them into a single file. Data-flo can do that for you using the merge-datatables adaptor.

Problem: Files have same columns, different rows

Examples scenario: GISAID limits the number of rows that can be downloaded at once, so multiple smaller files get created instead of a single file containing all the rows.

Solution: Import each file, then merge them

This is a simple use case for the merge-datatables adaptor. If there are more than two tables, the adaptor can be used multiple times, adding new rows each time.

Last updated