Model, Don’t Muddle

Are you a flatlander? I am. Not just because I live in Illinois, which is flat. VERY flat. I am a flatlander in Power BI because I came from Excel to Power BI. This is a career trajectory that a lot of us have followed, and I train a lot of flatlanders . One of the biggest hurdles to mastery, for me, has been data modeling. The idea of having to model my data was foreign because when we work in Excel, if we want to create a pivot table, all of the data has to be on the same worksheet. Yes, you can work with the LOOKUP functions, but the data still has to be all on one worksheet–which is what I am calling “flatland”.  Power BI does not work the same way, which is why you have to tackle modeling head on, and not plan on ‘muddling’ through it.

When I got started, my husband tried to explain to me that the model is everything. I didn’t believe him–probably because he’s my husband! Much to my chagrin, he was (and is usually) right. This begs the question–why is the model “everything”? Because depending on how you construct your model, your report can be lightning fast and accurate. If you do a poor job of modeling, your report can be glacially slow and wrong! Seems like a no-brainer right? But it isn’t. Data modeling is a skill, a science and an art. So let’s review some basic principles.

  1. A data model is composed of multiple tables (unlike Excel, where all the data has to be in one worksheet).  Can you make Power BI work if you only have one table, with everything in it? Sure, but don’t expect your reports to be responsive.
  2. There are two primary types of tables in a Power BI data model: fact tables and dimension tables. Fact tables contain information (facts) that you want to report on. Dimension tables contain information about those facts.  Dimension tables are usually referred to as dim tables.
  3. The ideal structure for your data model is a star schema, with a fact table in the ‘center’ and dimension tables ringing it. The ‘star’ is just a construct for you to be able to visualize it.
  4. Tables are connected by relationships. If a table has no connections to any other tables, you will not be able to use fields from one table (typically from a dim table) with fields from another table (typically a fact table) in a visual.
  5. The fact table is usually the longest table in your model –the most number of rows. It should have the fewest columns. Ideally, the fields are all in a numeric data type (whole number, decimal, date, time).
  6. The dim tables have many more columns than rows. A dim table will have many fewer rows (usually) than a fact table. Columns in the dim tables can be any data type, but there will be quite a number of string (or text) fields because the purpose of the dim table is to describe the facts in greater details.

I have attached an infographic that is a cheat sheet to help you remember the differences between fact and dim tables.

Next up: Data Types and Data Formats