
Improve Data Factory (pipeline & Dataflow Gen2)


Bamakus Montreuil on 13 Jun 2024 10:58:13

Hi Fabric Ideas team,


✔️ Improve the Copy activity (data pipeline & Dataflow Gen2) with these use cases, handled natively to avoid the current workarounds:


- [Full overwrite] ==> ingestion into all layers (RAW / Staging / Bronze / Silver / Gold tables)

- [Append] ==> ingestion into all layers (RAW / Staging / Bronze / Silver / Gold tables)

- [Incremental append based on a date-type column] ==> ingestion into all layers (RAW / Staging / Bronze / Silver / Gold tables) | settings = date column name (see the sketch after this list)

- [Incremental append based on a file's system update datetime] ==> ingestion, first step only: RAW / Staging / Bronze layers

- [Row snapshot (SCD Type 2)] ==> Gold table management; automatically provides start/end row-version datetimes and a surrogate key | settings = primary key columns and Type 2 column names

- [Merge based on primary keys and Type 1 column updates] ==> Gold table management | settings = primary key columns and Type 1 column names (see the sketch after this list)
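
For illustration, here is a minimal sketch of the incremental-append pattern we hand-roll today, assuming a Fabric Spark environment; the table and column names ("staging_orders", "silver.orders", "updated_at") are hypothetical:

```python
# Minimal sketch of "incremental append based on a date column".
# Assumptions (hypothetical names): source view "staging_orders",
# destination Delta table "silver.orders", watermark column "updated_at".
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically in Fabric notebooks

# 1. Read the current high-water mark from the destination.
watermark = spark.table("silver.orders").agg(F.max("updated_at")).first()[0]

# 2. Keep only source rows newer than the watermark (first load: take everything).
src = spark.table("staging_orders")
if watermark is not None:
    src = src.filter(F.col("updated_at") > F.lit(watermark))

# 3. Append only the new rows to the destination.
src.write.format("delta").mode("append").saveAsTable("silver.orders")
```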
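Likewise, a minimal sketch of the Type 1 merge pattern from the last bullet, using the Delta Lake merge API we currently have to call ourselves; all names are hypothetical:

```python
# Minimal sketch of "merge based on primary keys and Type 1 column updates".
# Assumptions (hypothetical names): Delta table "gold.customers", primary key
# "customer_id", Type 1 columns "email" and "phone".
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

gold = DeltaTable.forName(spark, "gold.customers")
updates = spark.table("silver.customers")

(gold.alias("t")
     .merge(updates.alias("s"), "t.customer_id = s.customer_id")
     .whenMatchedUpdate(set={"email": "s.email", "phone": "s.phone"})  # Type 1: overwrite in place
     .whenNotMatchedInsertAll()                                        # new keys: insert
     .execute())
```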


✔️ Improve the Copy activity (data pipeline & Dataflow Gen2) with a setting that avoids duplicated rows when recovering from a failed append (full or incremental), as sketched below.
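
For context, the workaround such a setting would replace looks roughly like this: tag each run with a batch id and wipe the batch before re-appending, so a retry never duplicates rows. All names are hypothetical:

```python
# Minimal sketch of an idempotent append: wipe any rows a failed run already
# wrote for this batch, then append, so a retry never duplicates data.
# Assumptions (hypothetical names): Delta table "bronze.events" with a "batch_id" column.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

batch_id = "2024-06-13"  # identifier of the run being recovered

# 1. Remove any partial output left behind by the failed attempt.
DeltaTable.forName(spark, "bronze.events").delete(F.col("batch_id") == batch_id)

# 2. Re-append the full batch; running this twice yields the same result.
batch = spark.table("staging_events").withColumn("batch_id", F.lit(batch_id))
batch.write.format("delta").mode("append").saveAsTable("bronze.events")
```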


✔️ Copy activity (data pipeline & Dataflow Gen2)

Improve schema change management (source vs. destination). Settings could be:

- Schemaless (auto-update the destination table)

- Implicit schema: the current one on the destination (& a column mapping)

- Managed schema column “contract” (& a column mapping)

If we provide a schema, could we also set a default value to apply when a value is NULL? (A sketch follows below.)
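
To illustrate the contract-plus-default idea, a hypothetical PySpark sketch: enforce an explicit schema on read and substitute declared defaults wherever the source delivered NULL:

```python
# Minimal sketch of a schema "contract" with per-column defaults for NULLs.
# Assumptions (hypothetical names): CSV source at Files/input/orders.csv,
# three contracted columns, Delta destination "silver.orders".
from pyspark.sql import SparkSession, types as T

spark = SparkSession.builder.getOrCreate()

# The "contract": explicit column names and types the destination expects.
contract = T.StructType([
    T.StructField("order_id", T.LongType(), nullable=False),
    T.StructField("country", T.StringType(), nullable=True),
    T.StructField("amount", T.DoubleType(), nullable=True),
])

df = spark.read.schema(contract).option("header", "true").csv("Files/input/orders.csv")

# Default values applied where the source delivered NULL.
df = df.fillna({"country": "UNKNOWN", "amount": 0.0})

df.write.format("delta").mode("append").saveAsTable("silver.orders")
```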


✔️ For Gold SQL transformations, provide something similar to dbt

dbt patterns are very cool and useful (a sketch of the idea follows below).
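
To illustrate, a hypothetical sketch of the dbt-style pattern (the transformation is a plain SELECT and the runner decides how to materialize it), expressed through Spark SQL since that is the fallback today; all names are invented:

```python
# Minimal sketch of a dbt-style transformation: the logic is a plain SELECT,
# and the runner decides how to materialize it. Names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The model is a plain SELECT, like a dbt model file.
model_sql = """
    SELECT customer_id,
           SUM(amount)     AS lifetime_value,
           MAX(order_date) AS last_order_date
    FROM silver.orders
    GROUP BY customer_id
"""

# Materialize it as a Gold table (full refresh, like dbt's "table" materialization).
spark.sql("CREATE OR REPLACE TABLE gold.customer_ltv AS " + model_sql)
```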

✔️ Offer a data quality control task in data pipelines (SQL & semantic model completeness, custom rules, ...), as sketched below.
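
For example, a completeness rule could look like the hypothetical sketch below, where a violation fails the task and therefore the pipeline run:

```python
# Minimal sketch of a data quality rule: assert completeness of a key column
# and fail the run when the rule is violated.
# Assumptions (hypothetical names): table "gold.customer_ltv", rule on "customer_id".
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.table("gold.customer_ltv")
null_count = df.filter(F.col("customer_id").isNull()).count()

# Failing the task surfaces the incident in the pipeline run.
if null_count > 0:
    raise ValueError(f"DQ rule violated: {null_count} rows with NULL customer_id")
```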


✔️ Offer a Python functions task in data pipelines for API ingestion

- Extract and load via a Python script (see the sketch after this list)

- Similar to the very robust Azure Functions

- No need to use a notebook for this use case
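
A hypothetical sketch of what such a task could run: pull a paginated JSON API with `requests` and land the pages in the RAW layer (endpoint and paths invented):

```python
# Minimal sketch of an extract-and-load script for API ingestion.
# Assumptions (hypothetical): paginated JSON API, landing as JSON files
# in the lakehouse Files area.
import json
import requests

BASE_URL = "https://api.example.com/v1/orders"   # hypothetical endpoint
OUT_DIR = "/lakehouse/default/Files/raw"         # lakehouse Files mount in Fabric

page = 1
while True:
    resp = requests.get(BASE_URL, params={"page": page}, timeout=30)
    resp.raise_for_status()
    rows = resp.json()
    if not rows:
        break  # no more pages
    with open(f"{OUT_DIR}/orders_page_{page}.json", "w") as f:
        json.dump(rows, f)
    page += 1
```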