There's no single-column unique row ID (primary key)

A single-column identifier key is needed

Data best practices demand that each row of data contain a unique identifier that is not shared by any other row (in a database, this is called the "primary key"). Often this identifier is the value in a single column, but sometimes it is a combination of columns that establishes the row's identity (known as a composite key).

Some data analysis methods require that this uniqueness be held in a single column.

Scenario 1: Need single ID column in Microreact

Microreact requires a single ID value that's shared across the data table, the tree file, the network file, etc., and this ID must be contained within a single column of each data source.

Scenario 2: Join-datatables adaptor needs single column IDs

Data-flo cannot properly join two datatables unless a single column in each table defines the join relationship. In this example, each row is uniquely identified only by the combination of "Patient" and "Sample Date". A join on either of these columns alone will run in Data-flo, but the output will be incorrect because the values in that column repeat across rows.
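The effect of joining on only part of a composite key can be sketched outside Data-flo. The Python below is an illustration only, not how the join-datatables adaptor is implemented; the table contents, the "Species" and "Result" columns, and the inner_join helper are all hypothetical.

```python
# Hypothetical data: rows are unique only by (Patient, Sample Date).
samples = [
    {"Patient": "P1", "Sample Date": "2023-01-05", "Species": "E. coli"},
    {"Patient": "P1", "Sample Date": "2023-02-10", "Species": "K. pneumoniae"},
    {"Patient": "P2", "Sample Date": "2023-01-05", "Species": "S. aureus"},
]
results = [
    {"Patient": "P1", "Sample Date": "2023-01-05", "Result": "resistant"},
    {"Patient": "P1", "Sample Date": "2023-02-10", "Result": "susceptible"},
    {"Patient": "P2", "Sample Date": "2023-01-05", "Result": "susceptible"},
]


def inner_join(left, right, keys):
    """Pair each left row with every right row that agrees on all key columns."""
    joined = []
    for left_row in left:
        for right_row in right:
            if all(left_row[k] == right_row[k] for k in keys):
                joined.append({**left_row, **right_row})
    return joined


# Joining on "Patient" alone pairs each P1 sample with both P1 results.
by_patient = inner_join(samples, results, ["Patient"])
# Joining on the full composite key pairs each sample with its own result.
by_both = inner_join(samples, results, ["Patient", "Sample Date"])

print(len(by_patient))  # 5 rows: two of them are mismatched pairings
print(len(by_both))     # 3 rows: one per original sample
```

The single-column join completes without any error, which is why the problem is easy to miss: the extra rows only show up when the row count is compared with the input.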

Solution: create a single-column row identifier

Determine which columns, taken together, make each row unique. These columns must be present, with the same values, in every data source. The combination should contain the minimal set of attributes required to specify one row and differentiate it from all other rows.
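One way to find that minimal combination is to test progressively larger sets of columns until one has no repeated values. The sketch below is a hypothetical helper written in plain Python, not a Data-flo feature, and the sample rows are invented for illustration. It checks a single table only, so the chosen columns still need to be confirmed in every other data source.

```python
from itertools import combinations


def is_unique_key(rows, columns):
    """Return True if no two rows share the same values in `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def smallest_unique_keys(rows, candidates):
    """Return every smallest column combination that uniquely identifies rows."""
    for size in range(1, len(candidates) + 1):
        found = [
            list(combo)
            for combo in combinations(candidates, size)
            if is_unique_key(rows, combo)
        ]
        if found:
            return found
    return []  # no combination of the candidate columns is unique


# Hypothetical rows: sample S1 from patient P1 was sequenced twice.
rows = [
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2023-03-01"},
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2023-03-15"},
    {"Patient": "P1", "Sample": "S2", "Sequencing Date": "2023-03-01"},
    {"Patient": "P2", "Sample": "S1", "Sequencing Date": "2023-03-01"},
]

keys = smallest_unique_keys(rows, ["Patient", "Sample", "Sequencing Date"])
print(keys)  # [['Patient', 'Sample', 'Sequencing Date']]
```

In this invented data no single column and no pair of columns is unique, so all three columns are needed to identify a row.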

Use the columns-concatenation adaptor to combine the values from those columns into a new column that serves as the ID (key).

Example

In this example, each row is uniquely identified by the combination of three columns (Patient, Sample, Sequencing Date). If this is the combination required to enable a join to other datasets, use the columns-concatenation adaptor to build the key.

Arguments:
columns = Patient, Sample, Sequencing Date
delimiter = _
target = Row ID
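For readers who want to see what this concatenation produces, here is a minimal Python sketch of the same idea. It is not the adaptor's implementation; the add_row_id function and the sample rows are hypothetical, and only the parameter names mirror the arguments listed above.

```python
def add_row_id(rows, columns, delimiter="_", target="Row ID"):
    """Return copies of `rows` with a new `target` column built from `columns`."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[target] = delimiter.join(str(row[c]) for c in columns)
        result.append(new_row)
    return result


# Hypothetical rows, unique only by all three columns together.
rows = [
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2023-03-01"},
    {"Patient": "P1", "Sample": "S1", "Sequencing Date": "2023-03-15"},
    {"Patient": "P2", "Sample": "S1", "Sequencing Date": "2023-03-01"},
]

with_ids = add_row_id(rows, ["Patient", "Sample", "Sequencing Date"])
ids = [row["Row ID"] for row in with_ids]

print(ids[0])                     # P1_S1_2023-03-01
print(len(ids) == len(set(ids)))  # True: every Row ID is distinct
```

One caution with any concatenated key: if the delimiter can also occur inside the values, two different rows can produce the same ID (for example "A_B" joined with "C" and "A" joined with "B_C" both give "A_B_C"), so it is worth checking that the new column is unique after creating it.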

The new Row ID column can now be used as the ID column in Microreact.
