Friday, 20 September 2013

Questions to save you from data quality meltdown

In a large, busy organisation, inevitably, there are a lot of problems. Many issues can be assigned to data quality. But one of the biggest pitfalls for a data quality team is taking on too many assignments.

Colleagues with over-simplistic viewpoints may use the data quality department as 'long grass' to conveniently kick their problems into. Here are some questions to ask yourself to prevent a data quality team meltdown.

Is it really a data quality problem?

A common mistake is to assume that operational or functional problems are 'data quality' problems. For example, if a telecommunications company keeps debiting a customer's monthly charges even though they have cancelled their mobile phone connection, it is not a data quality problem. It is an operational problem. The data correctly reflects what the customer has paid. Spot these kinds of problems early and remove them from your inbox.

Is there a conflict of purpose?

Systems, databases and data marts get built for specific purposes. It can be tempting for analysts to try to use them for other purposes. When the data doesn't work as expected, they may declare that the data needs 'fixing' so they can use it. These are not data quality issues. They are issues for developers to solve.

Is it a nomenclature issue?

Naming terms can be a problem. Departments may have different names for the same things or, even worse, use the same name for two completely different things. Push the problem back to them until they can articulate their technical terms in plain English. Don't be afraid of sounding stupid when you ask for this. It can uncover a lot of underlying problems.

Is there a migration issue?

When data gets migrated from one system to another, not all of the data for the new system will be contained in the old one. Very often these missing data items will either be blank or have default values. Although they are data quality issues, they cannot be fixed because the data was never collected in the first place. Get to know the start dates of each system, and the limitations of any migrated data. It could save you a great deal of running around.
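As a rough illustration of the point about start dates, the sketch below sorts a missing value into "never collected" or "worth investigating" depending on whether the record predates the new system. The go-live date, the placeholder values and the field names (`created`, `email`) are all assumptions made up for this example, not taken from any real system.

```python
from datetime import date

# Hypothetical go-live date of the new system (assumed for illustration).
NEW_SYSTEM_START = date(2010, 1, 1)

# Hypothetical placeholder values a migration might have written.
DEFAULT_VALUES = {"", "UNKNOWN", "N/A"}


def classify_field(record, field):
    """Classify one field of one record for triage purposes."""
    value = record.get(field)
    is_missing = value is None or str(value).strip().upper() in DEFAULT_VALUES
    if not is_missing:
        return "populated"
    # Records created before the new system went live were migrated,
    # so the value was probably never collected and cannot be fixed.
    if record["created"] < NEW_SYSTEM_START:
        return "never collected"
    # Missing on a record created in the new system: worth a look.
    return "investigate"


records = [
    {"id": 1, "created": date(2008, 5, 3), "email": "unknown"},
    {"id": 2, "created": date(2012, 7, 19), "email": ""},
    {"id": 3, "created": date(2012, 8, 1), "email": "a@example.com"},
]

results = {r["id"]: classify_field(r, "email") for r in records}
```

Only the records marked "investigate" would be candidates for the data quality team; the "never collected" ones are a known limitation of the migration.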

Are you trying to boil the ocean?

There are some issues that require large-scale intervention. Be honest about your capabilities, and get the correct resources assigned. If this means rejecting issues that are too large, reject them until the right resources become available.

These are all common-sense questions to ask yourself before accepting data quality issues. If you have any others, please use the comments below.

1 comment:

  1. An interesting read thanks, I like the comment on rejecting issues. For myself this has initially caused conflict however the client always comes round eventually and I sometimes even get a thank you.