Inspiration for this post:
Data Engineering podcast, episode 46, “An Agile Approach to Master Data Management“, with guest Mark Marinelli, Head of Product at Tamr.
So you have some data you want to analyze. Are you lucky enough to have the data vetted by subject matter experts in your company? Has the data been ‘blessed’ (either de facto or de jure) because it originates in one of your ERP systems, such as the AP system, or the HR system?
In either of these cases, you are lucky because you won’t have to defend your data. By its’ very provenance (coming from a central data source), your analysis inherits legitimacy. (It goes without saying that ‘good’ data doesn’t protect you from your own misuse of self-same.) But what if you have some new data or some additional data that you want to combine in a new way? What steps in Power BI can you take to document both the source and the authenticity of your new mix-in without significantly slowing yourself down?
- Make sure you are mixing and matching data on fields that represent the same entity at the same level of detail (granularity). If a field in the canonical source (e.g. an HR source) has a certain heading, don’t assume that a field in your mix-in data with the same name represents the same value. (For example, if the “Department” field in the HR data is at one level of the organization, and the “Department” field in your mix-in represents a different level of the organization you have a mismatch and any analysis you create will be suspect.)
- If two fields represent the same value, but have slightly different names, you can rename one of them so that the readers of the analysis do not get confused. You should probably bias towards using the field name from the canonical source, as that name is an accepted standard. There are several ways to rename a field when using Power BI and the Power Query Editor. (Check out the next post for technical details.)
- It’s a good idea to document the source of your mix-in data. You can do that in the Properties section of the query in the Power Query Editor. (See the next post for several videos on this subject.) Even though you think you will remember where you got this data, odds are you won’t. The Data Source Settings feature in the Power Query Editor shows you the file path, but “where” is the bare minimum. You should also document the who, what, why, and when.
- You can change the name of a field in the Power BI Desktop. This is exactly the same as changing the field name in the Power Query Editor. (In fact, if you switch back to the Power Query Editor you will see a step added to rename the field.) If you only want to change the name as it is used in a visual, you can do that too (or instead). You have to create the visual first, then you can rename the field for that visual. The original name of the field stays the same in the Fields pane.
- Recently the Power BI Service added a new feature that allows you to certify or promote your data set. In this scenario you publish your data set (a workbook with a data model embedded in it) as a standalone or underpinning a Power BI report. You can then go to Settings, and choose to promote or certify your data set. (Only your Power BI Admin can certify a data set, so you will need to develop procedures for certification.) You can promote your data set, which means you stand behind the quality and content of this data. Whether you promote or certify a data set you have the option to enter a description of the data set. You could copy and paste the same description you used in the Properties section of the Power Query Editor, keeping both consistent. (Check out the next post for a video on how to do this.)
- If you put the description in the Power Query Editor, only you and anyone who has the .PBIX file can see and read it. If you put the documentation in the Settings in the Power BI Service, only members of the Workspace will see it. Where can you put it so that a consumer of the data who is seeing it in an App can see it? Simple. A text box could be added on a page on the report (which could then be added as a tile to a dashboard) that provides the same description/documentation.
The bottom line? Avoid buying a pig in a poke.
If you are using new data with the same fields as an established data source, consider aligning the field names so that they are the same. Require and provide documentation/descriptions for each of the mix-in data sets that you create or use. It takes about 5-10 minutes, not an onerous amount of time. Certainly not enough to detract from your “nimbility”.
Next up: Short videos (without sound) showing how to execute these mindset suggestions.