Let’s talk about the basic building block of any star schema: the table. Being able to create and copy tables is obviously critical, but you also want to be able to winnow them down to a manageable size when they have millions of rows. It is usually more efficient to reduce a table before performing an operation on it than to operate on the whole thing. Meet FILTER, a DAX function that lets you filter a table. Let’s explore it a little bit.
The syntax for FILTER is simple: it requires the name of the table to be filtered as the first argument, and then the filter expression you want to use.
FILTER(<table>,<filter expression>)
You can use it to create a brand new table with only the rows you want in it.
Here’s an example, using the Adventure Works database provided by Microsoft. The table name is Order Header, and the TotalDue column holds the value of each order. This expression creates a brand new table called Sales>10K by filtering the original table down to only the rows with a value greater than $10,000.
Sales>10K = FILTER('Order Header', 'Order Header'[TotalDue] > 10000)
(One note: I didn’t have to put double quotes around the number, because it is a number, not a text value. Also note that I don’t include the currency symbol.) For reference, the original Order Header table has 31,465 rows, while the new Sales>10K table has only 1,878.
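By contrast, a text value does need the double quotes. Adventure Works’ Order Header table doesn’t have an obvious text column for this, so treat the Status column below as hypothetical; it only exists to show the quoting:

// Hypothetical text filter: "Shipped" is quoted because Status holds text
Shipped Orders = FILTER('Order Header', 'Order Header'[Status] = "Shipped")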
But here’s the thing: if you create a new table this way, you have to connect it into the existing data model before you can use it the way you would use the original table. For many of us, creating the new table helps us understand and visualize what we are doing, but you should train yourself out of the habit. You probably already know that calculated columns are to be avoided, but should you also avoid calculated tables? The answer (as is so common in Power BI) is “it depends”. A calculated table can end up well compressed, but because of the way the VertiPaq engine handles it, every time the model refreshes the engine materializes the table fully in memory before compressing it. If the table is very large, that intermediate step may demand more RAM than you have. *
What should you do instead? Use the same FILTER expression, but embed it in the expression that accomplishes your ultimate objective. Let’s say I want to add up the sales for orders whose TotalDue is greater than $10,000. I can create a temporary table within the larger expression and then add up its TotalDue column.
Sales>10Kv2 =
SUMX(
    FILTER('Order Header', 'Order Header'[TotalDue] > 10000),
    'Order Header'[TotalDue]
)
This expression filters the Order Header table down to just the rows where TotalDue is greater than $10,000, then adds up TotalDue across the remaining rows. Note that providing the name of the column as the second argument is enough; don’t make the mistake of wrapping it in another aggregation function, as shown below.
Sales>10Kv2 =
SUMX(
    FILTER('Order Header', 'Order Header'[TotalDue] > 10000),
    SUM('Order Header'[TotalDue])
)
This construction will produce an incorrect result: SUMX evaluates its second argument once per row, and SUM ignores the row context, so every row returns the sum of the whole column in the current filter context. You end up with that grand total multiplied by the number of filtered rows.
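As an aside, the same calculation can also be written with CALCULATE instead of SUMX and FILTER. This is just a sketch of an equivalent pattern (the measure name Sales>10Kv3 is mine, not part of the example above). One caveat: CALCULATE’s boolean filter replaces any existing filter on the TotalDue column, so the two versions can diverge if a slicer or report filter already restricts TotalDue.

Sales>10Kv3 =
CALCULATE(
    SUM('Order Header'[TotalDue]),     // add up TotalDue...
    'Order Header'[TotalDue] > 10000   // ...keeping only rows over $10,000
)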
I recommend experimenting on a dataset first: create the winnowed table and write a simple measure against it. Then backtrack and write a measure that embeds the FILTER statement (in other words, without creating a new table). Finally, put the two measures in two visuals (I use the table or matrix visual) and compare the results. They should be the same.
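To make that concrete, here is a minimal sketch of the two measures I would compare. The measure names are mine, and the first one assumes you created the Sales>10K calculated table shown earlier:

// Measure against the pre-filtered calculated table
Sales From Table = SUM('Sales>10K'[TotalDue])

// Measure that embeds FILTER, so no extra table is needed
Sales From Filter =
SUMX(
    FILTER('Order Header', 'Order Header'[TotalDue] > 10000),
    'Order Header'[TotalDue]
)

With no other filters applied, both visuals should show the same total.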
- * I would like to thank SQLBI+ for their very thorough explanation of how the VertiPaq engine handles compression. Their video on the topic is very enlightening. (No affiliation; just a super satisfied customer.)
- PBILAB stands for “Power BI Like A Boss”. It’s a series I am creating (over time) exploring concepts that have been difficult for me to master. My target audience is the “citizen developer” who always needs to know why something works.