There's no single-column unique row ID (primary key)

A single-column identifier key is needed

Data best practices require that each row of data contain a unique identifier of some sort that is not shared by any other row (in a database, this is called the "primary key"). Often this is the value in a single column, but sometimes it is a combination of columns that together define the row's identity (known as a composite key).
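
As a quick way to check whether a column, or a combination of columns, actually identifies each row, here is a minimal Python sketch. The `is_unique_key` helper and the example rows are hypothetical illustrations, not part of Data-flo or Microreact.

```python
# Hypothetical example rows: one patient sampled on two different dates.
rows = [
    {"Patient": "P1", "Sample Date": "2021-03-01", "Result": "positive"},
    {"Patient": "P1", "Sample Date": "2021-03-08", "Result": "negative"},
    {"Patient": "P2", "Sample Date": "2021-03-01", "Result": "positive"},
]

def is_unique_key(rows, columns):
    """Return True if the values in `columns` are different for every row."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True

print(is_unique_key(rows, ["Patient"]))                 # False: P1 appears twice
print(is_unique_key(rows, ["Patient", "Sample Date"]))  # True: a composite key
```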

Some data analysis tools require that this uniqueness be held in a single column.

Scenario 1: Need single ID column in Microreact

Microreact requires a single ID value that's shared across the data table, the tree file, the network file, etc., and this ID must be contained within a single column of each data source.

Selecting the ID column in Microreact requires a single column for uniqueness
The same ID column is used to make the connection between Tree data & Metadata

Scenario 2: Join-datatables adaptor needs single-column IDs

Data-flo cannot properly join two datatables unless a single column in each table defines the join relationship. In this example, unique rows are defined by the combination of "Patient" and "Sample Date". Joining on either column alone will still run in Data-flo, but the output will be incorrect, because each value in that column can match several rows in the other table.
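
To see why joining on a single non-unique column goes wrong, here is a minimal Python sketch using a naive nested-loop inner join over two small hypothetical tables. The `inner_join` helper, the data, and the "Result"/"Lineage" columns are illustrative assumptions; this is not how Data-flo implements join-datatables.

```python
# Hypothetical tables: each has exactly one row per (Patient, Sample Date).
samples = [
    {"Patient": "P1", "Sample Date": "2021-03-01", "Result": "positive"},
    {"Patient": "P1", "Sample Date": "2021-03-08", "Result": "negative"},
]
sequencing = [
    {"Patient": "P1", "Sample Date": "2021-03-01", "Lineage": "A"},
    {"Patient": "P1", "Sample Date": "2021-03-08", "Lineage": "B"},
]

def inner_join(left, right, on):
    """Naive inner join: pair each left row with every right row whose `on` values all match."""
    return [
        (l, r)
        for l in left
        for r in right
        if all(l[c] == r[c] for c in on)
    ]

wrong = inner_join(samples, sequencing, ["Patient"])
correct = inner_join(samples, sequencing, ["Patient", "Sample Date"])
print(len(wrong))    # 4: every P1 sample is paired with every P1 sequencing record
print(len(correct))  # 2: each sample is paired only with its own sequencing record
```

Two of the four pairs in `wrong` attach a sequencing result to a sample taken on a different date, which is exactly the kind of silently incorrect output described above.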

Connecting these two datatables requires both the Patient and Sample Date columns

Solution: create a single-column row identifier

Determine which columns together make each row unique; these columns must be present, with matching values, in all data sources. This combination of columns should contain the minimal set of attributes required to specify one row and differentiate it from other rows.
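
If you are unsure which columns form the minimal combination, a brute-force search over column subsets, smallest first, can suggest a candidate. This is a hypothetical Python sketch (the `minimal_key` helper and the "Lab" column are illustrative assumptions), and it only reflects the rows you give it: confirm that the suggested columns identify a row by their meaning and exist in every data source before relying on them.

```python
from itertools import combinations

def minimal_key(rows, columns):
    """Return the smallest tuple of columns whose combined values are unique per row, or None."""
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            keys = [tuple(row[c] for c in combo) for row in rows]
            if len(set(keys)) == len(keys):
                return combo
    return None

# Hypothetical data: no single column or pair of columns is unique here.
rows = [
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2021-04-01", "Lab": "L1"},
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2021-04-15", "Lab": "L1"},
    {"Patient": "P1", "Sample": "S2", "Sequencing Date": "2021-04-01", "Lab": "L1"},
    {"Patient": "P2", "Sample": "S1", "Sequencing Date": "2021-04-01", "Lab": "L1"},
]

print(minimal_key(rows, ["Patient", "Sample", "Sequencing Date", "Lab"]))
# ('Patient', 'Sample', 'Sequencing Date')
```

The number of subsets grows exponentially with the number of columns, so this approach is only practical for the handful of columns you already suspect are involved.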

Use the columns-concatenation adaptor to combine the values from those columns into a new column to use as the ID (key).

Example

In this example, the uniqueness of each row comes from a combination of three columns (Patient, Sample, Sequencing Date). If this combination is required to join to other datasets, use the columns-concatenation adaptor to combine them.

Data lacking a single-column row identifier
Adaptor card

Arguments:
columns = Patient, Sample, Sequencing Date
delimiter = _
target = Row ID

The new Row ID column contains the values of Patient, Sample, and Sequencing Date joined by underscores.
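
The effect of the columns-concatenation arguments in this example can be sketched in Python as follows. The `concatenate_columns` function and the sample rows are hypothetical and only mimic the described behavior; they are not Data-flo's implementation.

```python
def concatenate_columns(rows, columns, delimiter, target):
    """Return copies of `rows` with a `target` column holding the `columns` values joined by `delimiter`."""
    return [
        {**row, target: delimiter.join(str(row[c]) for c in columns)}
        for row in rows
    ]

# Hypothetical rows that differ only by Sequencing Date.
rows = [
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2021-04-01"},
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2021-04-15"},
]

result = concatenate_columns(rows, ["Patient", "Sample", "Sequencing Date"], "_", "Row ID")
ids = [row["Row ID"] for row in result]
print(ids)  # ['P1_S1_2021-04-01', 'P1_S1_2021-04-15']

# Sanity check: the new ID should be as unique as the column combination it replaces.
assert len(set(ids)) == len(ids)
```

Choose a delimiter that never appears in the values themselves: if it does, two different combinations can produce the same ID (for example, "A_B" + "C" and "A" + "B_C" both become "A_B_C").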

The new column can now be used as the ID column in Microreact.
