Bamakus Montreuil on 13 Jun 2024 10:58:13
Hi Fabric Ideas team,
✔️ Improve Copy (data pipeline & Dataflow Gen2) with these use cases
Native handling of these patterns, avoiding the current workarounds (a sketch of one typical workaround follows the list):
- [Full Overwrite] ==> Ingestion for all layers (RAW / Staging / Bronze / Silver / Gold tables)
- [Append] ==> Ingestion for all layers (RAW / Staging / Bronze / Silver / Gold tables) | settings = date column name
- [Incremental append based on a date column] ==> Ingestion for all layers (RAW / Staging / Bronze / Silver / Gold tables)
- [Incremental append based on a file and its file-system update datetime] ==> Ingestion, first step only: RAW / Staging / Bronze layers
- [Row snapshot] ==> Gold table management, automatically providing start/end row-version datetimes and a surrogate key | settings = primary key columns and Type 2 column names
- [Merge based on primary keys and Type 1 column updates] ==> Gold table management | settings = primary key columns and Type 1 column names
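For context, here is a minimal sketch of the kind of workaround we have to build today for the last pattern (merge on primary keys updating Type 1 columns): a notebook running a Delta Lake MERGE. The table and column names (gold.dim_customer, customer_id, email, city) are hypothetical examples, not anything the Copy activity exposes today.

```python
# Sketch of today's manual workaround for [Merge based on primary keys and
# Type 1 column updates]: a Delta Lake MERGE run from a notebook.
from delta.tables import DeltaTable

def merge_type1(spark, source_df, target_table, key_cols, type1_cols):
    """Upsert source rows into a Gold table, updating only the Type 1 columns."""
    target = DeltaTable.forName(spark, target_table)
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    updates = {c: f"s.{c}" for c in type1_cols}
    inserts = {c: f"s.{c}" for c in key_cols + type1_cols}
    (target.alias("t")
        .merge(source_df.alias("s"), on_clause)
        .whenMatchedUpdate(set=updates)
        .whenNotMatchedInsert(values=inserts)
        .execute())

# Example call with hypothetical names:
# merge_type1(spark, staged_df, "gold.dim_customer",
#             key_cols=["customer_id"], type1_cols=["email", "city"])
```

Having the Copy activity accept "primary key columns" and "Type 1 column names" as settings would replace this hand-written code.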
✔️ Improve Copy (data pipeline & Dataflow Gen2) with a setting that avoids duplicated rows when recovering from a failed append (full or incremental) run
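Today, if an append run fails partway and is re-run, the rows already written are appended a second time. A minimal sketch of the manual fix we use, assuming a Delta destination and a hypothetical business key (raw.orders, order_id), is a left-anti join against the destination before appending:

```python
# Sketch of an idempotent append: only write rows whose key is not already
# present in the destination, so a re-run after an incident adds no duplicates.
def idempotent_append(spark, source_df, target_table, key_cols):
    existing = spark.table(target_table).select(*key_cols)
    new_rows = source_df.join(existing, on=key_cols, how="left_anti")
    new_rows.write.format("delta").mode("append").saveAsTable(target_table)

# idempotent_append(spark, extracted_df, "raw.orders", key_cols=["order_id"])
```

A native "no duplicates on recovery" option in the Copy settings would make this extra step unnecessary.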
✔️ Copy (data pipeline & Dataflow Gen2)
Improve schema change management (source vs. destination). Settings could be:
- Schema-less (auto-update the destination table)
- Implicit schema = the current destination schema (& a column mapping)
- Managed schema column "contract" (& a column mapping)
If we provide a schema, can we also set a default value to use when the incoming value is NULL?
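To illustrate the "schema contract with defaults" idea, here is a minimal sketch of what we currently script by hand in PySpark; the contract, column names, and default values are hypothetical examples:

```python
# Sketch of a schema "contract": cast each incoming column to the contracted
# type and replace NULLs with the declared default value.
from pyspark.sql import functions as F

CONTRACT = {
    "customer_id": ("bigint", None),       # no default: value must be present
    "country":     ("string", "UNKNOWN"),  # default applied when NULL
    "amount":      ("decimal(18,2)", 0),
}

def apply_contract(df):
    for col, (dtype, default) in CONTRACT.items():
        expr = F.col(col).cast(dtype)
        if default is not None:
            expr = F.coalesce(expr, F.lit(default).cast(dtype))
        df = df.withColumn(col, expr)
    return df.select(*CONTRACT.keys())
```

The ask is for the Copy activity to take such a contract (plus column mapping and defaults) as settings instead of custom code.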
✔️ For Gold SQL transformations, provide something similar to DBT
DBT patterns are very useful
✔️ Offer a data-quality control task in data pipelines (SQL & semantic model completeness checks, built-in rules, custom rules, ...)
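As an example of the kind of rule such a task could run, here is a minimal completeness-check sketch; the table, columns, and threshold are hypothetical:

```python
# Sketch of a completeness rule: fail the pipeline run if key columns contain
# NULLs or if the row count drops below an expected minimum.
from pyspark.sql import functions as F

def check_completeness(spark, table, not_null_cols, min_rows):
    df = spark.table(table)
    total = df.count()
    assert total >= min_rows, f"{table}: only {total} rows, expected >= {min_rows}"
    for col in not_null_cols:
        nulls = df.filter(F.col(col).isNull()).count()
        assert nulls == 0, f"{table}.{col}: {nulls} NULL values found"

# check_completeness(spark, "gold.fact_sales",
#                    not_null_cols=["sale_id", "customer_id"], min_rows=1000)
```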
✔️ Offer a Python functions task in data pipelines for API ingestion (see the sketch after this list)
- Extract and load via a Python script
- Similar to the very robust Azure Functions
- No need to use a notebook for this use case
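A minimal sketch of what such a Python functions task could run: pull from a REST API and land the raw payload in the lakehouse, without spinning up a notebook. The endpoint URL and landing path below are hypothetical placeholders:

```python
# Sketch of an extract-and-load function for API ingestion into the RAW layer.
import json
from datetime import datetime, timezone

import requests

def extract_and_load(api_url: str, landing_dir: str) -> str:
    """Call the API and write the JSON response to a timestamped file."""
    response = requests.get(api_url, timeout=30)
    response.raise_for_status()
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = f"{landing_dir}/extract_{stamp}.json"
    with open(target, "w", encoding="utf-8") as f:
        json.dump(response.json(), f)
    return target

# extract_and_load("https://api.example.com/v1/orders",
#                  "/lakehouse/default/Files/raw/orders")
```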