Tuesday, 19 June 2012

Usage Deviation

Unless your company is brand new or very small, there will be plenty of legacy data marts dotted around your servers. When it comes to measuring the quality of the data, many people go back to the original functional specifications to devise the quality rules. If you are lucky, a lot of thought will have gone into these specifications, and you will have all the project documentation, including the original business requirements. However, the truth is that over time these documents can become as dated as the data itself.

You may find that other systems and measures have been set up to consume this legacy data, but for purposes other than those originally intended. The developers of these additional processes may have made incorrect assumptions about the content of the data. Or they may have known that the data does not precisely match their requirements but treated it as the 'best fit', accepting that their process is not perfect. Typical symptoms include:

  • The schedule of the batch processes that update the data mart may be out of sync with the newer processes.
  • The data mart may not capture all of the data to suit the new purpose.
  • The new processes may have to summarise unnecessary amounts of low-level, granular data every time they run, making them slow and unreliable.
  • The event dates used may be subtly different from what is needed (e.g. date keyed versus date effective).

So when measuring the quality of your data, it is not enough to simply revisit your original mart specifications to devise your business rules. You need to develop a more consumer-driven model, based on the context in which the data is used. This context can only be gained by going to your consumers and discovering how they are using your marts. Then you can build measures that assess not only whether your processes are working, but whether the data is still fit for purpose.
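
As a rough illustration of what a consumer-driven measure might look like, here is a minimal Python sketch. It checks two of the problems listed above (schedule drift and a mismatched event date) against what a single consumer says they need. Everything in it is hypothetical: the `ConsumerExpectation` record, the `check_fitness` function, the field names and the example values are assumptions made for illustration, not part of any real mart or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ConsumerExpectation:
    """What one downstream consumer needs from the mart (hypothetical record)."""
    name: str
    reads_at: datetime         # when the consumer's process reads the mart
    max_staleness: timedelta   # oldest refresh the consumer can tolerate
    date_field: str            # the event date the consumer actually needs


def check_fitness(expectation, last_refresh, mart_date_field):
    """Return a list of fitness-for-purpose problems for this consumer."""
    problems = []

    # Schedule drift: how old is the mart data when the consumer reads it?
    age = expectation.reads_at - last_refresh
    if age > expectation.max_staleness:
        problems.append(
            f"{expectation.name}: data is {age} old at read time"
        )

    # Event date mismatch: the mart is keyed on a different date than needed.
    if expectation.date_field != mart_date_field:
        problems.append(
            f"{expectation.name}: needs '{expectation.date_field}' "
            f"but the mart uses '{mart_date_field}'"
        )

    return problems


# Example values are invented purely to exercise the checks.
finance = ConsumerExpectation(
    name="finance_report",
    reads_at=datetime(2012, 6, 19, 6, 0),
    max_staleness=timedelta(hours=4),
    date_field="date_effective",
)
issues = check_fitness(
    finance,
    last_refresh=datetime(2012, 6, 18, 22, 0),
    mart_date_field="date_keyed",
)
for issue in issues:
    print(issue)
```

The point of the design is that the rules come from the consumer's stated needs rather than from the original mart specification, so the same mart can pass for one consumer and fail for another.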
