Welcome back flatlanders. First, a personal note: I have been travelling a lot in the last couple of weeks, and therefore have been remiss in my blogging. I promise to try to do better. But it was worth it. I went to “the” quilting show in Paducah Kentucky. If you are a quilt fan, follow this link to see the Best in Show quilt (you can skip the video at the top and just scroll down to see the quilt itself). I promise that you will be amazed!
Now back to data. In this post, I want to tackle one of the first things you need to know about DAX and your data model: how to add calculations (measures) that you need. Before we dive in, let’s review the uses for DAX (Data Analysis Expressions). DAX is the language used once you are inside the Power BI Desktop (or DAX Studio and Tabular Editor) to build four different types of content:
- New tables
- New columns (calculated columns)
- Row level security roles
Always try to create new columns and new tables as far upstream as possible and as far downstream as necessary (shout out to the great Matthew Roche for coining this aphorism). What does it mean exactly? It means do everything you can to avoid creating new columns or tables in the Power BI Desktop using DAX. Why? Because these creations do not compress as well as columns and tables that are imported into the Desktop from the Power Query Editor. But sometimes you don’t have access to the source (e.g. if it is in a database). Your next best option is to do the work in the Power Query Editor. Failing that option, you will have to use DAX.
With that disclaimer firmly in mind, let’s explore how to create a calculated column, a measure and an iterator measure. Measures are calculations that are not present in the data. If you think about it, one of the great strengths of Power BI is the ability to bring data together from a variety of sources. If you do that, it is highly likely that you won’t have all the calculations you might want because the data has never been ‘all together’ before. Here’s an easy example for us to start with:
In this case, we have a column for quantity and we have a column for the unit price, but we don’t have a column with the total price paid for each row (e.g. [Quantity]* [UnitPrice]). Now we Flatlanders instinctively want to reach for a column to solve this problem. And you can….but try to create the column either in the source or in the Power Query Editor. Assuming you can’t or won’t do either of these things, here’s how you would write the DAX to create this column:
Total Sale=’All Sales'[Quantity] * ‘All Sales'[UnitPrice]
Notice a couple of things in Version A:
- There is no function in this formula (and DAX is a functional language).
- Column names are enclosed in square brackets.
- The table names are enclosed in single quotations (this is only required if there is a space in the table name, but I use single quotes all the time because I can’t be bothered to remember when to use them).
This is the expression that will produce a new column called [Total Sale]. Once the column is there (whether you added it further upstream or used DAX to calculate it as we are doing here) you can now use DAX to add it up. In other words, you can write a measure using this column now:
Total Sales=SUM(‘All Sales'[Total Sale])
Notice a couple of things in Version B:
- Now we see a function–in this case the function SUM.
- The column name is accompanied by its table name. When you reference a column name in DAX, you always “fully qualify” it by putting its table name in front of it.
More importantly, there is a major difference between Version A and Version B. Version A produces a column where the calculation is performed row by row. Version B adds up the entire column. Let’s look at a Version C, which is wrong:
Wrong Total Sale= SUM(‘All Sales'[Quantity])*SUM(‘All Sales'[Unit Price]])
Notice a couple of things about Version C:
- Here we see the use of a function twice: once to add up the [Quantity] column and once to add up the [UnitPrice] column. This version is wrong because it is written as if we are creating a column and as if we are creating a measure (in other words it is a mashup of versions A and B). And while some mash ups are better together, some are just wrong. This is just wrong. It won’t work.
- (I have included this incorrect construction because when I first started writing DAX I always tried to write the expression this way and could not wrap my head around why it wasn’t working.)
Understanding why Version C doesn’t work involves understanding how DAX ‘thinks’. (I know, DAX is inanimate and doesn’t think per se, but if you have ever wrestled with writing an expression, you will believe DAX is a living, breathing entity with a mind of its own!) DAX ‘thinks’ in columns primarily. The default behavior when you write a DAX expression is for the entire column to be computed in one ‘gulp’. This is one of the reasons DAX is so fast. But what happens if you need to calculate something on a row by row basis? Look at the example above again–if we just added up the [Quantity] column and the [UnitPrice] column and then multiplied them together, we would get the wrong answer.
This is an example of a calculation that must be performed row by row. When you are confronted with this situation, you have two choices:
- Compute a calculated column that multiplies [Quantity] by [UnitPrice] and returns a value for each row (Version A)
- Write a measure that adds up the calculated column (Version B)
You can write a single measure that does both at once. If you don’t have a column that computes the total for each row, you don’t have to create one. Instead, use DAX to write a measure that will ‘iterate’ (go row by row) and calculate the total for each row and then add them up.
Total Sales Iterator=SUMX(‘All Sales’, ‘All Sales'[Quantity] * ‘All Sales'[Unit Price])
Notice a couple of things in Version D:
- This expression uses the function SUMX. The presence of an X at the end of the function tells DAX to perform the calculation row by row. Functions with an X at the end of them are called iterators. The first argument of the expression is the table that you want to iterate on. The second argument is the calculation you want performed.
- You will not see the individual row results. The iterator function keeps track of them in memory, but does not materialize (“show”) them to you. If you need to see the results row by row, you have to create a calculated column. When you use the measure in a visual, it will materialize the results.
- If you need to use one of the individual row results in a slicer or filter, you will have to create a calculated column. You can’t use a measure in a slicer or a filter.
Of all four options, Version D is the most effective if you do not have a column already present. The key here is understanding that DAX’s natural behavior is to gobble up columns. If you want to compute something row by row, you will have to use either a calculated column (Version A) and then a measure (Version B) or an expression using an iterator (Version D).
How can you practice this DAX? Easy. Build yourself a very small table in Excel with two columns [Unit Price] and [Quantity]. Bring that table into Power BI, and use each of the versions in turn. Remember, for Version A, you need to choose “New Column” first. For Version B, you need to choose “New Measure”. For Version C, it doesn’t matter if you choose new column or new measure–both will error out. For Version D, you need to choose “New Measure”.