Recently, on #sml we discussed #powerbi and slowly changing dimensions (SCD). If you are wondering what an SCD is, it is a dimension that can change over time, such as an address or phone number. I will (eventually) write a newsletter on SCDs, but for now, I just wanted to post a great example of an SCD and how to handle it.
I volunteer at my local library (libraries are one of my happy places!) and we are in the process of cleaning up records prior to transitioning to a new catalog database. We had to decide how to handle the records of patrons who are either deceased or have moved away. Here were the options:
1) Delete the record entirely
2) Update the record in some way to “flag” it
We decided to be conservative (always a good idea with records) and looked for a field in the patron record that we could use to indicate an “Inactive” status. We had to be able to pull reports on this field and more specifically, reports that will allow us to specify the value “Inactive” for the field. We found a good field, and after checking that it would accept characters (it was previously used with only numeric values), we decided to use it. We tried entering “I” (for Inactive) but that looked too much like a lower case L and we wanted to avoid confusion. So we entered the full word “Inactive”. So far so good.
But we know that ultimately we want to delete these inactive patrons, but just not precipitously. So we decided to add a date to indicate when the patron was flagged as inactive. It had to be a separate field that we could include in a report. Now that we have an “Inactive” indicator and the date the record was designated as such, we can set a policy for when/why we will delete patron records. More importantly, we can take our time crafting this policy and making sure we have thought it through.
Lessons learned:
1) Do not be quick to delete records. Rather, look for a solution that allows you to mark a record and then reverse that mark as needed.
2) Make sure you have crafted a policy on how to handle the deletions of records; a policy that you can refer to and have gained agreement from all the stakeholders of the data.