Plinio Nunez on 09 Jul 2024 19:24:05
I have an event stream (reading from an Event Hub) that periodically writes data to a Delta table. This Delta table, due to the nature of the data it contains, can grow very large; for example, a small streaming dataset can produce around 100 million rows in two weeks. Since data is constantly being inserted into the table, I cannot issue a DELETE statement to trim it down, because the concurrent inserts cause it to fail.
The workaround would be to partition the table, or so I thought. It turns out that Delta tables apparently cannot be partitioned when used as a streaming sink; I keep getting: 'DeltaTable's partition columns are not equal to the configuration'.
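Here is roughly what the streaming write looks like. This is a sketch, not my exact job: the paths, the eh_conf settings, and the event_date column are placeholders, and I'm assuming the Azure Event Hubs Spark connector (format "eventhubs", which exposes an enqueuedTime column):

```python
# Sketch of the streaming write that triggers the partitioning error.
# eh_conf, paths, and column names are placeholders, not my real values.
# `spark` is the ambient SparkSession (e.g. a Databricks/Synapse notebook).
from pyspark.sql import functions as F

eh_conf = {
    # Connection settings for the Event Hubs Spark connector (elided here;
    # the real connection string is supplied/encrypted per the connector docs).
    "eventhubs.connectionString": "<connection-string>",
}

events = (
    spark.readStream
    .format("eventhubs")   # Azure Event Hubs source
    .options(**eh_conf)
    .load()
)

query = (
    events
    # Derive a date column from the enqueued timestamp to partition on.
    .withColumn("event_date", F.to_date("enqueuedTime"))
    .writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/checkpoints/events")
    .partitionBy("event_date")   # this is what raises the error above
    .start("/delta/events")
)
```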
If I instead create the Delta table beforehand, with the partition column already defined, that partition column always ends up null on every inserted row.
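And this is roughly the pre-created variant, using the Delta Lake Python builder API (again, the schema and path are illustrative):

```python
# Sketch of pre-creating the partitioned table before starting the stream.
# The schema is illustrative; the real table has more columns.
from delta.tables import DeltaTable

(
    DeltaTable.createIfNotExists(spark)
    .location("/delta/events")
    .addColumn("body", "STRING")
    .addColumn("event_date", "DATE")   # intended partition column
    .partitionedBy("event_date")
    .execute()
)
# After streaming into this table, event_date comes back NULL on every row.
```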
Is this something you can allow, or do we have to seek an alternative approach?
Administrator on 23 Aug 2024 15:58:42
Feedback received. We need more votes before we can take the next step.