Data Warehouse
NewWarehouse optimizer enhancement: cascading filter criteria on between joins
Greg Galloway on 20 Mar 2024 19:31:41
The following is a reproduction of a real-world scenario. I came up with a way to make the query faster, but I wish the optimizer would do this optimization automatically.
In the reproduction query, the where clause filters Calendar.Date. Calendar.Date is also used in the join to dimension_city, but the optimizer doesn't appear to pick up that it could cascade that filter to the ValidFrom and ValidTo columns and filter dimension_city before attempting the join.
The optimization adds an extra join criteria which causes Fabric (or SQL Server for that matter) to filter in the scan of dimension_city which reduces the rowcount coming into the join.
I wish that adding the extra join criteria wasn't necessary and that the optimizer could implement this sort of cascading filter automatically. That's my feature request.
Here's some code you can run in a Fabric warehouse to reproduce this scenario with sample tables and try it out. Thank you for getting this in front of the optimizer team and considering implementing this. Note, I think this scenario could be improved in the SQL engine across all it's flavors (SQL Server, Azure SQL, Fabric, etc.) to automatically do this optimization. If you extend this example which joins a1-a4 and add a join to a5, a6, etc. you will see the query get exponentially slower. The real-world scenario had about 12 joins and would run for hours and never return.
The reproduction code can be found here:
https://gist.github.com/furmangg/707905c8c3ec0b51a1fadb685144dd1c