Saturday, 9 February 2013

Data lineage - a cautionary tale

Recently, laboratories in Ireland discovered that some processed beef products contained horse meat. The public were outraged, and the Food Standards Agency insisted that all processed beef sold at retail in the UK be DNA tested for horse meat. Some high-profile branded processed beef products have been found to contain 100% horse meat.

Now the general public are very concerned that they have been eating food that could have been contaminated with chemicals that are used in the rearing of horses.

A senior politician was quoted as saying, "We need to know the farmer and the meat processor."

It seems that the modern processing of meat has become a very complicated business, with different parts of animals being moved from one company to another. No-one truly knows where their processed meat comes from. 

You may be surprised to hear that there are many companies who deal with data in a similar way to the meat processing industry. They may know where the data is manufactured, and where the results appear in reports, but it is the processing in the middle that they don't understand.

Data may arrive in a database, then get extracted, parsed, standardised and moved from one mart to another. It may be summarised and moved into multiple spreadsheets, where adjustments are made manually, and then the data is re-extracted into other systems before finally finding its way into a report. The full map of systems, processes and departments involved in the processing chain may not be known by any one person in the organisation. It is also unlikely that any of it is written down!

The understanding of how data is manufactured, processed, stored and used is called 'data lineage'. The financial industry is already addressing the problem of companies not knowing their data lineage through the EU directive Solvency II. Although the directive is not in force yet, it means that understanding data lineage is becoming a legal requirement.
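As a rough illustration of what "writing it down" could look like, the sketch below records a processing chain like the one described above as a list of hops and then walks it in either direction. The system names, the process labels and the helpers build_graph and trace are all hypothetical, invented for this example; real lineage tools capture far more detail, and this is only a minimal sketch of the idea.

```python
from collections import defaultdict

# Hypothetical hops in a processing chain: (source, process, target).
# Every name here is made up for illustration.
HOPS = [
    ("source_db", "extract", "staging"),
    ("staging", "parse and standardise", "mart_a"),
    ("mart_a", "move", "mart_b"),
    ("mart_b", "summarise", "spreadsheet"),
    ("spreadsheet", "manual adjustment and re-extract", "reporting_system"),
    ("reporting_system", "publish", "report"),
]


def build_graph(hops):
    """Index the hops so they can be followed in both directions."""
    upstream = defaultdict(list)    # target -> [(source, process)]
    downstream = defaultdict(list)  # source -> [(target, process)]
    for source, process, target in hops:
        upstream[target].append((source, process))
        downstream[source].append((target, process))
    return upstream, downstream


def trace(start, edges):
    """Return every system reachable from start by following edges."""
    seen = []
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, _process in edges.get(node, []):
            if neighbour not in seen:
                seen.append(neighbour)
                stack.append(neighbour)
    return seen


upstream, downstream = build_graph(HOPS)

# Where did the figures in the report come from?
print(trace("report", upstream))

# What is affected if mart_a changes?
print(trace("mart_a", downstream))
```

Tracing upstream from the report answers the "where did this come from" question, while tracing downstream from a system shows what a change to it would touch.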

If your company is large, tracking your lineage may be an expensive business. Certainly, the software is very expensive. Such costs may be hard to justify in the present financial climate, but doing it now, on your own terms, is far cheaper than waiting for an angry public and government legislation to force you to do it.


  1. Data Governance - With thousands of data attributes, delivered by hundreds of internal and external sources and stored in dozens of unconnected databases, we saw the need for a web-based solution capable of integrating the multiple business and technical tools currently in use by financial organizations. That is why we decided to work exclusively with Adaptive, which has helped us migrate our Semantics Repository to its standards-based Metadata Manager.

  2. I like the comparison of data lineage to the horse meat scandal.
    I agree that understanding data lineage is hugely important to an organisation. I have also found that the information from data lineage is gold dust to a support agent, who can use it to impact assess suggested changes to a source data system. The data lineage gives them a full picture of the data flow and, more importantly, what (systems and tables) and who (users of reports that reference these tables) will be affected by a change.

  3. Adaptive is a market-leading technology provider whose solutions support Data Governance, Data Quality, Data Lineage, Metadata and Enterprise Architecture initiatives.

  4. Thank you, Rupali. For the sake of balance, there are also many other technology providers that do similar things: IBM, Informatica, ASG, SAS, etc.