Sunday, 29 September 2013

Retention policy - a fundamental principle

There are a lot of annoying things in life you naturally expect - like taxes, bad television and crime. But I did not expect such vehemence about a recent article I wrote about the value of data and the conflict of interest it has with cloud providers.

It seems there is a groundswell of opinion in the technology area that all data should be retained, as valuing it and prioritising it is too hard to do. Furthermore, these people (often cloud providers) could not see the harm in keeping all their data indefinitely.

There are some very good reasons why you should be scrapping data on a regular basis, and here they are:

1.  It will save you money
2.  It will make you compliant with regulation
3.  It will improve trust with your suppliers and the general public
4.  It will stop you relying on stale data: as reality changes, your data degrades in quality until it is little better than useless

But the most important reason is something a little more fundamental. People have a right to make mistakes and have them forgiven and forgotten. People also have the right to change their opinions and image. They have the right to reinvent themselves and move on. If we cannot discard what did not work, how can we move on in a healthy and positive manner?

Friday, 20 September 2013

Questions to save you from data quality meltdown

In a large, busy organisation, inevitably, there are a lot of problems. Many issues can be assigned to data quality. But one of the biggest pitfalls for a data quality team is taking on too many assignments.

Colleagues with over-simplistic viewpoints may use the data quality department as 'long grass' to conveniently kick their problems into. Here are some questions to ask yourself to prevent a data quality team meltdown.

Is it really a data quality problem?

A popular mistake is to assume that operational or functional problems are 'data quality'. For example - if a telecommunications company keeps debiting a customer's monthly charges, even though they cancelled their mobile phone connection, it is not a data quality problem. It is an operational problem. The data correctly reflects what money the customer has paid. Spot these kinds of problems early and remove them from your inbox.

Is there a conflict of purpose?

Systems, databases and data marts get built for specific purposes. It can be tempting for analysts to try to use them for other purposes. When the data doesn't work as expected, they may declare that the data needs 'fixing' so they can use it. These are not data quality issues. They are issues for developers to solve.

Is it a nomenclature issue?

Naming terms can be a problem. Departments may have different names for the same things - or even worse - use the same name for two completely different things. Push the problem back to them until they can articulate their technical terms in plain English. Don't be afraid of sounding stupid by asking for this. It can uncover a lot of underlying problems.

Is there a migration issue?

When data gets migrated from one system to another, not all of the data for the new system will be contained in the old one. Very often these missing data items will either be blank or have default values. Although they are data quality issues, they cannot be fixed because the data was never collected in the first place. Get to know the start dates of each system, and the limitations of any migrated data. It could save you a great deal of running around.
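As a rough illustration of the point about start dates, here is a minimal Python sketch that separates blank or defaulted values inherited from a migration from ones that appeared after the new system went live. Everything in it is an assumption made up for the example: the go-live date, the placeholder values, the `Record` layout and the `email` field. Your own systems will differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical go-live date of the new system; replace with the real one.
SYSTEM_START = date(2012, 1, 1)

# Hypothetical placeholder values a migration might have written.
DEFAULT_VALUES = {"", "UNKNOWN", "N/A"}


@dataclass
class Record:
    record_id: int
    created: date          # when the record was first created
    email: Optional[str]   # example field that the old system never captured


def classify(record: Record) -> str:
    """Label the email field as 'ok', 'migration_gap' or 'data_quality_issue'."""
    missing = record.email is None or record.email.strip() in DEFAULT_VALUES
    if not missing:
        return "ok"
    # Records older than the new system were migrated, so the value
    # was never collected and cannot be fixed after the fact.
    if record.created < SYSTEM_START:
        return "migration_gap"
    # Records created in the new system should have the value.
    return "data_quality_issue"
```

Anything labelled `migration_gap` can be closed with an explanation rather than investigated, which leaves the team free to chase the records labelled `data_quality_issue`.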

Are you trying to boil the ocean?

There are some issues that require large-scale intervention. Be honest about your capabilities, and get the correct resources assigned. If this means rejecting issues that are too large, reject them until the right resources become available.

These are all common-sense questions to ask yourself before accepting data quality issues. If you have any others, please use the comments below.