Andy Cutler on 30 Jun 2023 06:43:14
Hi,
It would be great to have the Lakehouse and Warehouse support different collations, specifically case-sensitive and case-insensitive.
At the moment we are able to create objects with the same name in different casing, which can be confusing.
e.g. DimProduct, dimproduct, and Dimproduct can all be created. A minimal sketch of the behaviour is shown below.
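For illustration, this is roughly what the current case-sensitive default (Latin1_General_100_BIN2_UTF8) permits; the table and column names here are made up:

    -- All three statements succeed and create three distinct tables,
    -- because the default binary collation compares identifiers
    -- byte-for-byte, so differing case means different names.
    CREATE TABLE DimProduct (ProductKey INT, ProductName VARCHAR(100));
    CREATE TABLE dimproduct (ProductKey INT, ProductName VARCHAR(100));
    CREATE TABLE Dimproduct (ProductKey INT, ProductName VARCHAR(100));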
Kevin Conan (administrator)
We are currently working on this feature.
- Comments (10)
RE: Support Multiple Collations
Are there any plans to implement it on the Lakehouse? I know the Warehouse is more complete than the Lakehouse when it comes to T-SQL, but it would be great to have it on the Lakehouse too.
RE: Support Multiple Collations
The Fabric Roadmap seems to indicate this will be available for Warehouses in Q3 2024: https://learn.microsoft.com/en-us/fabric/release-plan/data-warehouse#case-insensitive-collation-support

Hopefully the same is coming for Lakehouses?

The roadmap entry reads:

Case insensitive collation support (in Warehouse)
Estimated release timeline: Q3 2024
Using the public REST APIs to create a Data Warehouse includes a new option to set the default collation. This can be used to set a new Case Insensitive Collation default. You'll also be able to use the COLLATE command with CREATE TABLE to directly control which collation your VARCHAR fields use. The two supported collations are Latin1_General_100_CI_AS_KS_WS_SC_UTF8 (which is Case Insensitive) and Latin1_General_100_BIN2_UTF8 (which is Case Sensitive and continues to be the default).
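If the Warehouse follows standard T-SQL COLLATE syntax, a table mixing the two supported collations might look roughly like this (the table and column names are illustrative, not from the roadmap):

    CREATE TABLE dbo.DimProduct
    (
        ProductKey  INT,
        -- case-insensitive comparisons for this column
        ProductName VARCHAR(200) COLLATE Latin1_General_100_CI_AS_KS_WS_SC_UTF8,
        -- case-sensitive, matching the current default
        ProductCode VARCHAR(50)  COLLATE Latin1_General_100_BIN2_UTF8
    );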
RE: Support Multiple Collations
Here's the reference documenting that "identifiers are case-insensitive" in Spark SQL, which ought to be a guide for Fabric as well: https://spark.apache.org/docs/latest/sql-ref-identifier.html

Further, SQL Server's default behavior in the US is to treat SQL identifiers as case-insensitive, via the default server collation SQL_Latin1_General_CP1_CI_AS: https://learn.microsoft.com/en-us/sql/relational-databases/collations/set-or-change-the-server-collation
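As a quick illustration of the Spark side (assuming the default spark.sql.caseSensitive=false setting), these two queries resolve to the same table and column:

    -- Spark SQL: identifiers are case-insensitive by default,
    -- so both statements reference the same objects.
    SELECT ProductName FROM DimProduct;
    SELECT productname FROM dimproduct;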
RE: Support Multiple Collations
If we want any level of consistency in the behavior, we should look to the Spark SQL standard, which says "identifiers are case-insensitive". Having collation options would be a plus, but at a minimum we should stick with standards as the default.
RE: Support Multiple Collations
Glad to hear that you are working on this feature. Wanted to throw this out there: it would be nice to be able to configure a default collation for all objects at the tenant and/or workspace level.
RE: Support Multiple Collations
Want to second the request for case-insensitive SQL at the column level as well. The performance of specifying COLLATE Latin1_General_CI_AI in the WHERE clause is awful, and the syntax is not as easy to pick up for some users. Thank you for working on this feature; we greatly appreciate it.
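For context, the workaround being described is per-query collation coercion, roughly like this (the table and column names are illustrative):

    -- Forcing a case-insensitive comparison at query time.
    -- Applying COLLATE to the column makes the predicate
    -- non-sargable in SQL Server terms, which is likely why
    -- performance suffers compared to a column-level collation.
    SELECT *
    FROM dbo.DimProduct
    WHERE ProductName COLLATE Latin1_General_CI_AI = 'widget';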
RE: Support Multiple Collations
I hope it means case-insensitive options on columns (the data in a table) and in Direct Lake semantic models.
RE: Support Multiple Collations
I see case-sensitivity in areas of Fabric as a serious mistake, particularly given the target audience: https://blog.greglow.com/2018/01/23/opinion-case-sensitivity-pox-computing/

It's not just case sensitivity, though. One thing Fabric could have been is a unifying experience, yet today each part of it has different rules for naming, etc. This should not have occurred. It is a lost opportunity to make things easier for users.
RE: Support Multiple Collations
Case sensitivity is going to be a MAJOR obstacle to adoption in our environment. We don't have case-sensitive data in all our sources, and making everything case sensitive in order to migrate to the Fabric lakehouse/warehouse would be a major undertaking with no appreciable value to the end user or the business for the effort involved.