Skip to main content

Data Warehouse

Planned

Support Multiple Collations

Vote (49) Share
Andy Cutler's profile image

Andy Cutler on 30 Jun 2023 06:43:14

Hi,


It would be great to have lakehouse and warehouse support different collations, specifically case sensitive and case insensitive.


At the moment we are able to create objects that are named the same but in different casing, this can be confusing.


e.g DimProduct, dimproduct, and Dimproduct can all be created.

Kevin Conan (administrator)

We are currently working on this feature.

Comments (10)
Andy Cutler's profile image Profile Picture

JOSE BARROS on 06 Aug 2024 19:00:52

RE: Support Multiple Collations

Are there any plans to implement it on Lakehouse? I know that Warehouse is more complete than Lakehouse when speaking about T-SQL, but it would be great to have it on Lakehouse too.

Andy Cutler's profile image Profile Picture

Jonathan Boarman on 23 May 2024 18:34:20

RE: Support Multiple Collations

The Fabric Roadmap seems to indicate this will be available for Warehouses in Q3 2024:https://learn.microsoft.com/en-us/fabric/release-plan/data-warehouse#case-insensitive-collation-supportHopefully the same is coming for Lakehouses??Case insensitive collation support (in Warehouse)Estimated release timeline: Q3 2024Using the public REST APIs to create a Data Warehouse includes a new option to set the default collation. This can be used to set a new Case Insensitive Collation default. You'll also be able to use the COLLATE command with CREATE TABLE to directly control which collation your VARCHAR fields use. The two supported collations are Latin1_General_100_CI_AS_KS_WS_SC_UTF8 (which is Case Insensitive) and Latin1_General_100_BIN2_UTF8 (which is Case Sensitive) and continues to be our default.

Andy Cutler's profile image Profile Picture

Jonathan Boarman on 19 May 2024 20:47:31

RE: Support Multiple Collations

Here's the reference documenting that "identifiers are case-insensitive" in Spark SQL, which ought to be a guide for Fabric as well: https://spark.apache.org/docs/latest/sql-ref-identifier.htmlFurther, SQL Server's default behavior in the US is to treat SQL identifiers as case-insensitive: SQL_Latin1_General_CP1_CI_AS. https://learn.microsoft.com/en-us/sql/relational-databases/collations/set-or-change-the-server-collation

Andy Cutler's profile image Profile Picture

Jonathan Boarman on 19 May 2024 20:36:01

RE: Support Multiple Collations

If we want any level of consistency in the behavior, we should look the Spark SQL standard, which says "identifiers are case-insensitive". Having collation options would be a plus, but at a minimum we should stick with standards as the default.

Andy Cutler's profile image Profile Picture

Trey U on 13 Feb 2024 22:37:18

RE: Support Multiple Collations

Glad to hear that you are working on this feature. Wanted to throw out there - it would be a nice option to be able to configure a default collation for all objects at the tenant and/or workspace level.

Andy Cutler's profile image Profile Picture

Nick Confer on 05 Feb 2024 14:15:57

RE: Support Multiple Collations

Want to second the input on being able to do case-insensitive sql for columns as well. The performance of specifying COLLATE Latin1_General_CI_AI in the where clause is awful and not as easy to pick up for some users. Thank you for working on this feature, we greatly appreciate it.

Andy Cutler's profile image Profile Picture

Greg Galloway on 30 Jan 2024 15:12:52

RE: Support Multiple Collations

I hope it means case-insensitive options on columns (the data in a table) and in Direct Lake semantic models.

Andy Cutler's profile image Profile Picture

Greg Low on 05 Sep 2023 00:50:44

RE: Support Multiple Collations

I see case-sensitivity in areas of Fabric as a serious mistake, particularly given the target audience.https://blog.greglow.com/2018/01/23/opinion-case-sensitivity-pox-computing/It's not just case sensitivity though. One thing that Fabric could have been is a unifying experience, yet today, each part of it has diferent rules for naming, etc. This should not have occured. It is a lost opportunity to make things easier for users.

Andy Cutler's profile image Profile Picture

Rick Steinke on 11 Jul 2023 12:21:28

RE: Support Multiple Collations

Case sensitivity is going to be a MAJOR obstacle to adoption in our environment. We don't have case sensitive data in all sources and trying to make everything case sensitive to migrate to fabric lake house / warehouse will be a major undertaking with no appreciable value to the end user or the business for the efforts involved.

Andy Cutler's profile image Profile Picture

Andrew Steinke on 07 Jul 2023 16:21:27

RE: Support Multiple Collations

Case sensitivity is going to be a MAJOR obstacle to adoption in our environment. We don't have case sensitive data in all sources and trying to make everything case sensitive to migrate to fabric lakehouse / warehouse will be a major undertaking with no appreciable value to the end user or the business for the efforts involved.