It’s been said that data migration is 80% of any analytics project. If I were to argue with this statistic, it would be to say that it’s an underestimate. Getting the data out of all the places it lives and into a single analytics database can be a daunting challenge. At a high level, though, data migration is pretty simple: pull data from one place and put it in another. I’ve seen a lot of projects that took a minimal approach to data migration. Simply pull data and push data. Easy. I think we can do better. Every project I build has the following elements in common:
- Error Tracking
- Performance Monitoring
- Housekeeping Fields
- Documentation
Let’s break these key components down one by one.
It’s always a shock when I review a data migration and find there is no error tracking built in. How do those developers sleep at night? The ugly truth is that errors happen, even to perfectly designed data migrations. It’s my job to make sure that the data migrations I create catch and handle errors. No one wants to wake up to an email with a long list of errors from the previous night’s data updates, but I can tell you from experience that those emails are a lot better than not knowing about errors until they are found in a mission-critical report the next day.
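To make that concrete, here is a minimal Python sketch of record-level error tracking. Everything in it is hypothetical and only for illustration: the `MigrationError` structure, the `migrate_with_error_tracking` function, and the sample `load_into_target` loader are my own names, not part of any particular tool. The idea is simply that one bad record gets written down and the run keeps going.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class MigrationError:
    """One failed record, kept for the morning report (hypothetical structure)."""
    record: Any
    error_type: str
    message: str
    occurred_at: datetime


def migrate_with_error_tracking(
    records: Iterable[Any],
    load: Callable[[Any], None],
) -> Tuple[int, List[MigrationError]]:
    """Load each record; on failure, log the error and continue with the next one."""
    loaded = 0
    errors: List[MigrationError] = []
    for record in records:
        try:
            load(record)
            loaded += 1
        except Exception as exc:  # deliberately broad: one bad record should not stop the run
            errors.append(
                MigrationError(
                    record=record,
                    error_type=type(exc).__name__,
                    message=str(exc),
                    occurred_at=datetime.now(timezone.utc),
                )
            )
    return loaded, errors


# Example run: a list stands in for the target table (an assumption for the demo).
target = []


def load_into_target(record):
    # Raises KeyError when a required field is missing.
    target.append({"id": record["id"], "name": record["name"]})


source = [{"id": 1, "name": "a"}, {"name": "missing id"}, {"id": 3, "name": "c"}]
loaded, errors = migrate_with_error_tracking(source, load_into_target)
print(f"loaded={loaded}, failed={len(errors)}")
```

In a real migration the `errors` list would more likely be written to an error table or sent in that overnight email, but the shape stays the same: what failed, why, and when.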
Performance monitoring is something I see even less often than error tracking, at least in other developers’ work. I don’t blame them. Performance monitoring can be viewed as a luxury, but in my opinion, it is a basic necessity. Performance monitoring is how I gauge the health of my data migrations. Knowing how long a migration SHOULD take and how long it ACTUALLY takes gives you a pretty good idea of potential problems before they become obvious. Did a two-hour migration finish in two minutes? There might not be any error records, but I’ll bet there’s still a problem in there somewhere.
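A rough Python sketch of that SHOULD-versus-ACTUAL check might look like the following. The names `timed` and `classify_duration` and the 50% tolerance are assumptions I picked for illustration; the right threshold depends on how much your run times normally vary.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(result):
    """Store the elapsed wall-clock seconds of the wrapped block in result['seconds']."""
    start = time.perf_counter()
    try:
        yield
    finally:
        result["seconds"] = time.perf_counter() - start


def classify_duration(expected_seconds, actual_seconds, tolerance=0.5):
    """Compare actual run time to the expected run time.

    tolerance=0.5 (an assumed default) accepts anything within 50% of expected.
    """
    lower = expected_seconds * (1 - tolerance)
    upper = expected_seconds * (1 + tolerance)
    if actual_seconds < lower:
        return "too_fast"  # suspicious: did the source return almost no rows?
    if actual_seconds > upper:
        return "too_slow"
    return "ok"


# Timing a stand-in for a migration step.
run = {}
with timed(run):
    sum(range(1000))

# The two-hour migration that finished in two minutes.
status = classify_duration(expected_seconds=2 * 60 * 60, actual_seconds=2 * 60)
print(status)
```

The expected duration could come from a hard-coded estimate or from an average of previous runs; either way, a "too_fast" result deserves the same attention as an error record.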
These are the housekeeping fields that I add to every record that passes through any of my data migrations.
- date added
- added by
- date modified
- modified by
- hash key
These fields make it much easier to maintain the data, insert new records, and update existing ones, and they give us a lot of flexibility whether we implement change tracking from the start or add it later.
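Here is one way those five fields could be stamped onto a record in Python. This is a sketch under assumptions: I am treating the hash key as a SHA-256 digest of the record's non-housekeeping fields, used to detect whether a record has changed, and the function names (`compute_hash_key`, `stamp`, `has_changed`) and field spellings are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

HOUSEKEEPING = ("date_added", "added_by", "date_modified", "modified_by", "hash_key")


def compute_hash_key(record):
    """SHA-256 of the data fields, serialized with sorted keys so field order does not matter."""
    payload = {k: v for k, v in record.items() if k not in HOUSEKEEPING}
    serialized = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def has_changed(existing, incoming):
    """True when the incoming data differs from what is already stored."""
    return existing["hash_key"] != compute_hash_key(incoming)


def stamp(record, user, existing=None):
    """Return a copy of record with the housekeeping fields filled in."""
    now = datetime.now(timezone.utc)
    stamped = dict(record)
    stamped["hash_key"] = compute_hash_key(record)
    if existing is None:
        # New record: "added" and "modified" start out the same.
        stamped["date_added"] = now
        stamped["added_by"] = user
    else:
        # Update: keep the original "added" values.
        stamped["date_added"] = existing["date_added"]
        stamped["added_by"] = existing["added_by"]
    stamped["date_modified"] = now
    stamped["modified_by"] = user
    return stamped


first = stamp({"customer_id": 7, "city": "Austin"}, user="etl_job")
same = {"city": "Austin", "customer_id": 7}      # same data, different field order
changed = {"customer_id": 7, "city": "Dallas"}

if has_changed(first, changed):
    updated = stamp(changed, user="analyst", existing=first)
```

Comparing one hash per record is usually cheaper than comparing every column, which is what makes the insert-versus-update decision easy and leaves the door open for change tracking later.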
No one enjoys writing documentation. I certainly don’t. But I do it. I don’t do it for myself, or even for you … I do it for those that follow me. Documentation takes time, and it can be tricky to write documentation that is concise AND useful, but it is totally worth it.