Christoph Mettler on 08 Jun 2023 11:29:27
Data providers typically deliver data in formats such as parquet, JSON, or CSV. In traditional data warehouse scenarios, these files often land in a persistent staging area. Many organize these staging folders in a structured manner such as year/month/week/day. Currently, these files need to be converted to delta format with an additional workload before they can be consumed in the data warehouse.
The support of openrowset/create external table, as with parquet file globbing, would greatly simplify the load and transform to final delta files in the warehouse workload without the need for additional workloads/datalflow/pipelines complexity.
Charles Webb (administrator)
Folks, please feel free to share more details about the scenarios this would unblock for you.
- Comments (8)
RE: support openrowset/polybase
I agree with folks; this is a must-have feature in a lakehouse: the ability to easily separate the physical layer from the abstract layer without constantly copying or materializing data.Use case examples:Rapidly profile and explore raw data (CSV, JSON, etc.) in the lakehouse/files using T-SQL/SQL OPENROWSET in SQL ENDPOINT.MVVM style design pattern: Implement an interface (view) on top of raw data files (CSV, JSON, etc.) in SQL ENDPOINT using OPENROWSET to facilitate downstream data flows/ETLde-serialize and standardize json schema easily in a view using OPENROWSET and OPENJSONexpose data in a "fast track mode" to consumer So basically, just reply with the Synapse Serverless feature. ;-)Furthermore, it says that this idea is in "Planned" status, so can we get an ETA on it? This information could be very helpful in preparing our migration plan from Synapse to Fabric (or not).Thanks! :-)
RE: support openrowset/polybase
Hello Jovan PopovicAm using synapse serverless pool for converting JSON files on the fly to rows and columnsSo openrowset is definitely required in Fabric ThanksSanjay
RE: support openrowset/polybase
I think this idea asks for a similar approach https://ideas.fabric.microsoft.com/ideas/idea/?ideaid=6bd2453c-3eed-ee11-a73d-000d3a7c5b9a
RE: support openrowset/polybase
We are in synapse right now and use delta lake with serverless sql for all our use cases. It would be extremely important for us to have this feature in Fabric. Thanks again for your efforts to have this in Fabric.
RE: support openrowset/polybase
He use OpenRowset in all our projects.Allow usImplement data Virtualization, and avoid Staging tablesGrant access to data that is in the LZ but not in the User SQL tables with Views_LZImportant to haveThe ALDS / OneLake must be under firewall ==> Private endpoint compatible (**)Use SAS or Acces Keys . . .., But compatible with AAD (User MI or System MI) (**)Most of the Cases we use Authentication SQL Storage: Delegation / ImpersonateIn some cases for audit purpolses we need use Pass-Through Authentication, thus the AAD SQL user will appear on ADLS logsThe T-SQL Openrowset must be able to compile on Azure Data studio & Azure DevOps, currently is not able to do it and block all CI/CD progress (**)Format files (FMT) are really obsolete, need maintenance and are not part of Database Project / GIT, need to find a 2024 replacement (**)We hope that Select disctinct (col_1) [for Dim Tables] On parquet will be faster than on CSV, but sometimes is not (**)This Synapse "sugar" is very appreciatee: '*/*/*/myfile*.parquet'Currently select from openrowset INTO #Temptable is not supported (**)All the others: AGG pushdown, Metadata, Vertipaq / V-Order, PurView , , . . . future??(**) Hugo Queiroz knows about these issues and some of them are ACK by Azure support
RE: support openrowset/polybase
Hello Admin, is this being considered?Thanks.
RE: support openrowset/polybase
Hello,Currently we are serving many BI workloads(Power BI Reports, Excel Tables, etc) through Synapse Serverless SQL openrowset with data from datalake in different formats and from different layers (in medallion architecture).We are trying to move to Fabric now, but all our data is in parquet format in ADLS2 and existing pipelines are also ingesting the data into ADLS2 (external data, export setupts, etc).Shortcut is good but it can't be served to BI through fabric without loading it into delta tables. So there needs another ETL to load the ADLS2 data into Delta Tables through spark ETL which is killing mosquito with hammer.Hope to see OPENROWSET in fabric soon.Thanks.
RE: support openrowset/polybase
we use this pattern extensively at work, ADF, azure storage, staging table in sql server, our data engineer dismissed Fabric because of the lack of this feature.