Fixing datatable headers

All datatables require a single line of headers for column names.

If data have no column names, or if there is extra header space before the row containing column names, the data require editing before use. The top row will be interpreted as column names, so inputs need to be formatted that way.


A datatable has a single row of headers as column names, without any other rows above it or extra rows after it before the data rows begin.



Problem: No column names in input data

This CSV has no headers, so the first row of data gets interpreted as column names.


Solution: Add a header row containing column names.

If the file is small:

  1. Whatever format your data are in, get them into CSV.

  2. Use Concatenate-text adaptor to prepend column names to the data. Left argument (Bind to value) is the comma-separated series of desired column names, ending with a newline. Right argument (Bind to another transformation) is the csv from the data input.

  3. Combination output argument then connects to the next step (probably CSV-to-datatable)

To see this example, make a copy of this Data-flo so you can see how it is built.

If the file is large: Avoid converting to text. Instead,

  1. Use Datatable-columns to turn the header row (which is really supposed to be your first row of data) into a list

  2. use List-to-datatable so the list becomes the data in a new datatable column

  3. use Add-column to create a dummy column (for use in Spread-rows)

  4. use Spread-rows to turn the column of column names into the values in a single row

  5. use Remove-column to get rid of the dummy column. Now your first row of data is both the header and the first row of data.

  6. Finally, use Merge-datatables with this as 'first data' and the original datatable (which has the first data-row as column names instead of as data) as 'second data'.

  7. Now you can Rename-columns and have all your data as data.

To see this example, make a copy of this Data-flo so you can see how it is built.

Problem: Extra header rows before column names

In this situation, the extra information (or empty space) in the header gets interpreted as column names, so you need to remove it.

information above the header
more information above the header

Solution: Limit the range of data imported

  1. Use spreadsheet-file or google-spreadsheet adaptor to import the file.

  2. Use the **RANGE ** argument to remove rows at the top of the sheet. In the example data above, the real headers are in Row3. The desired import Range will be A3 until end of file.

    1. If you are unsure what the last row will be, choose something larger than expected (e.g. range=“A3:P20000”.

    2. The output of the adaptor will only include rows and columns that contain data within this specified range; you won’t get empty cells.

Last updated