Julian Gimbel on 22 Jun 2023 06:50:51
When trying to load a CSV file whose first-row column headers contain spaces into the lakehouse via "right click > Load to Tables", the process fails with an error:
"Invalid column name(s) 'Item Type,Sales Channel,Order Priority,Order Date,Order ID,Ship Date,Units Sold,Unit Price,Unit Cost,Total Revenue,Total Cost,Total Profit' in file 'Files/100.csv'. Column names must contain UTF-8 encoded Unicode word characters and can be a maximum of 128 characters long. Unicode word characters include letters of any case and their nonspacing marks, punctuation connectors like the underscore(_), and decimal digits."
In a SaaS service I would expect a warning instead of an error: the service should tell me that it will automatically replace the spaces in the column names with underscores and still load the data.
Administrator
The Fabric product group has committed to delivering the idea.
- Comments (1)
RE: Load CSV with spaces in column headers into table
My current workaround is to read the data with Spark:

df = spark.read.format("csv").option("header", "true").load("Files/folder/data.csv")

and then replace the problematic column names with a regular expression:

import re
for col in df.columns:
    # collapse runs of non-word characters and underscores into one underscore
    df = df.withColumnRenamed(col, re.sub(r'[\W_]+', '_', col))
It converts: ["asd / asd", "def.def", "wcd & csv", "_ ubs_ _"]
to ["asd_asd", "def_def", "wcd_csv", "_ubs_"]
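The rename logic can be checked in isolation with plain Python, without a Spark session; the `sanitize` helper name below is my own, not part of any API:

```python
import re

def sanitize(name: str) -> str:
    # Collapse every run of non-word characters and underscores
    # into a single underscore, matching the rename loop above.
    return re.sub(r'[\W_]+', '_', name)

cols = ["asd / asd", "def.def", "wcd & csv", "_ ubs_ _"]
print([sanitize(c) for c in cols])
# → ['asd_asd', 'def_def', 'wcd_csv', '_ubs_']
```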
But that is just a workaround; the UI should offer this renaming directly.