You've been given a project to migrate legacy data onto a new platform. The new platform has been prepared, but there is no documentation for the legacy system. It is a bespoke system built twenty years ago, and all of its developers have left. Your IT professionals know how to fix the legacy batch processes when they break, but that's all. Your business leaders know how the screens work and what the correspondence looks like. No-one knows about the data, and it's your job to sort it out...
How stressed are you??
So how do you get a grip on data that no-one knows about? How do you work out the table structures and the data content? Many people would open each table manually and write long SQL queries to summarise the data. For a large migration, that could easily take months.
But there is another way.
Profiling your data is a way of discovering the content of the data and how the tables join together. Many software vendors make profiling tools, such as SAS (DataFlux), IBM (InfoSphere), Trillium, Ataccama and Talend. Having a profiling tool available means you can do the following:
- Discover the content, format and ranges of your data by building statistical models of the data.
- Follow how the data changes from one day to the next.
- Map the relationship between data tables and fields across your organisation.
- Identify redundant data between related tables.
- Monitor primary keys to ensure your indexes are correct.
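To make the first and last of those points concrete, here is a minimal sketch of the kind of statistics a profiling tool computes per column, plus a naive candidate-key check. The table, column names and data are all hypothetical, and a real tool would do far more (pattern analysis, cross-table joins, drift over time):

```python
from collections import Counter

def profile_column(values):
    """Basic profile for one column: row count, null count,
    distinct count, most frequent values, and min/max when numeric."""
    non_null = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        profile["min"], profile["max"] = min(non_null), max(non_null)
    return profile

def candidate_keys(table):
    """Columns that are fully populated and unique -- possible primary keys."""
    return [
        name for name, values in table.items()
        if None not in values and len(set(values)) == len(values)
    ]

# Hypothetical legacy table, loaded column-wise
table = {
    "cust_id": [101, 102, 103, 104],
    "region":  ["N", "S", "N", None],
    "balance": [250.0, 0.0, 99.5, 12.0],
}
print(profile_column(table["balance"]))
print(candidate_keys(table))
```

Note that a uniqueness scan like this only flags *candidates*: in the sample above, `balance` happens to be unique too, so human judgement (or change-over-time profiling) is still needed to pick the real key.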
A profiling tool can help developers discover unknown data sources. It can form the basis for developing data quality rules, measures and dashboards. Profiling can also be done as part of the user acceptance testing of new systems, providing proof that large volumes of test data remain within acceptable tolerances. It really is the Swiss Army knife of data management.
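That "acceptable tolerances" idea can be sketched very simply: once you have profile statistics for a column, you compare them against agreed bounds. The profile values, rule names and bounds below are all hypothetical illustrations, not output from any particular tool:

```python
# Hypothetical profile produced by an earlier profiling run of an "age" column
profile = {"null_rate": 0.02, "distinct": 1840, "max": 120}

# Tolerances agreed with the business: (statistic, lower bound, upper bound)
rules = [
    ("null_rate", 0.0, 0.05),   # at most 5% missing values
    ("max", 0, 150),            # no impossible ages
    ("distinct", 1000, 5000),   # expected cardinality band
]

def broken_rules(profile, rules):
    """Return the rules the profile breaks; an empty list means it passes."""
    return [
        (stat, profile.get(stat)) for stat, lo, hi in rules
        if profile.get(stat) is None or not lo <= profile[stat] <= hi
    ]

print(broken_rules(profile, rules))  # [] when every tolerance holds
```

Run daily, a check like this turns a one-off profiling exercise into an ongoing data quality measure.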
So what's stopping you from getting one? I guarantee you will use it over and over.