Skip to main content
Microsoft Idea

Power BI

Needs Votes

Query Folding support for bi-directional filters utilizing DirectQuery mode

Vote (1) Share
Pär Adeen's profile image

Pär Adeen on 08 Nov 2020 07:58:48

This might be a very easy fix.
The lack of Query Folding support for bi-directional filters in Direct Query is probably just an implementation bug/mistake. But as could bee seen below, the changes might be either on the Power BI Filter visual implementation (probably the easiest one) or in the Power Query code generated but the pattern TOPN(CALCULATETABLE(VALUES(),TREATAS()))

Fixing this problem will both support larger bi-directional dimensions, but also increase execution speed significantly when using bi-directional filters in DirectQuery mode.

Reasoning for fixing this problem:
One of the reasons for using DirectQuery mode is a huge amount of data. If there's a user demand to limit the items shown in a visual (this case a Filter visual), we nowadays better use the possibility to add a visual level filter. But... Since visual level filters executes two SQL queries in DirectQuery mode (can be seen in the SQLBI whitepaper for DirectQuery, or using the SQL profiler), this solution tends to be very slow. Stepping back to the pattern using bi-directional filters between dimensions is then the only feasible solution. Although, even this now seams to be very slow, and this is what I'm trying to adress

The problem:
Having large dimensions and bi-directional filters between facts and dimensions will cause a source server overload, in my case:
"The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information"

This error is caused within this DAX code that is self-generated by the Power BI standard Filter visual:
Note: The example below uses the AdventureWorks database, but the dimensions in the AdventureWorks database are not large enough to generate this problem.
DEFINE
VAR __DS0FilterTable =
TREATAS({"Absecon", "Accomac"}, 'Dimension City'[City])
VAR __DS0Core =
CALCULATETABLE(
VALUES('Dimension Customer'[Customer]),
KEEPFILTERS(__DS0FilterTable)
)
VAR __DS0PrimaryWindowed =
TOPN(101, __DS0Core, 'Dimension Customer'[Customer], 1)
EVALUATE __DS0PrimaryWindowed
ORDER BY 'Dimension Customer'[Customer]

To compare, replacing the Filter visual with a Table visual, the DAX code is identical except that DISTINCT() then is replacing VALUES(). Also, using a Table Visual we don't suffer from the source system overloading

So what's the reason for the the Filter Visual Using VALUES() instead of DISTINCT() used by the table visual. We might think that's it has anything to do with that the Filter visual shall be ably to show the "Blank()" row, but as I understand, the "Blank()" filter item is always implemented in a filter using DirectQuery mode.

So this leads me to Solution #1:
The Filter visual shall implements it's code using DISTINCT() instead of VALUES()

Looking into the generated SQL code below we can see the reason for the problem, although not visible in this minimized example. As a simple overview, the Filter visual executes two SQL queries in sequence. The first query gets all the dimension keys visible in the filter context, and the second query uses these keys in an WHERE IN() statements. Imaging having many thousands of keys, we will have many rows of WHERE X IN(,,,) OR X IN (,,,), etc

SELECT
TOP (1000001) [t7].[Customer Key]
FROM
(
(
select
[City Key],
[Customer Key]
from
[Fact].[Sale] as [$Table]
) AS [t7]
LEFT OUTER JOIN (
select
[$Table].[City Key] as [City Key],
[$Table].[City] as [City]
from
[Dimension].[City] as [$Table]
) AS [t8] on ([t7].[City Key] = [t8].[City Key])
)
WHERE
(([t8].[City] IN (N'Absecon', N'Accomac')))
GROUP BY
[t7].[Customer Key] / / SQL Query

SELECT
TOP (101) [t6].[Customer]
FROM
(
(
select
[_].[Customer Key],
[_].[Customer]
from
(
select
[Customer Key],
[Customer]
from
[Dimension].[Customer] as [$Table]
) as [_]
where
[_].[Customer Key] >= 0
)
) AS [t6]
WHERE
(([t6].[Customer Key] IN (396, 83)))
GROUP BY
[t6].[Customer]
ORDER BY
[t6].[Customer] ASC

I definitely can't see the reason for the first query as a separate query. So an easy-peasy solution is to have the first SQL query as a sub-query to the second query with either of these patterns
WHERE
(([t6].[Customer Key] IN (/*FIRST QUERY GOES HERE*/)))

of a
WHERE
EXISTS (SELECT 1 FROM (/*FIRST QUERY GOES HERE*/) as sub WHERE [t6].[Customer Key]=[sub].[Customer Key]

So this leads me to Solution #2:
Just Fold the first SQL query into the second SQL query