Monday, 3 September 2012

What can data profiling do for me?

One of the more interesting challenges in publicising your data quality work is convincing colleagues of what the software tools can do for them.

Profiling is the statistical analysis of the content and relationships of data. This kind of dry description does not always capture the imagination of business leaders and accountants. You have to let them know the contexts in which it can be used.
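
To make that dry definition a little more concrete, here is a minimal sketch of what a profiling tool does to a single column, written in plain Python rather than any particular product. The function name `profile_column` and the sample values are hypothetical, and a real tool does far more; this only counts rows, blanks, distinct values and the "shapes" that the values take.

```python
from collections import Counter
import re


def profile_column(values):
    """Return a few basic profile statistics for one column of raw values."""
    # Treat None and blank strings as missing.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]

    def shape(value):
        # Reduce a value to its pattern: every digit becomes 9, every letter A.
        text = re.sub(r"[0-9]", "9", str(value))
        return re.sub(r"[A-Za-z]", "A", text)

    patterns = Counter(shape(v) for v in non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "patterns": patterns.most_common(),
    }


# Hypothetical sample: a date column stored as text.
sample = ["2012-09-03", "2012-09-04", None, "03/09/2012", "2012-09-04", ""]
report = profile_column(sample)
print(report)
```

Even on six rows the pattern counts show that one value uses a different date layout from the rest, which is the sort of finding that tends to get a business audience interested.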

So here is my list of real-life applications for a data profiling tool within the information technology sphere:

1:   Spearheading migration projects.
2:   Reverse-engineering of processes.
3:   Measurement of data quality and process efficiency.
4:   Discovering relationships between disparate data sources.
5:   Identifying dual usages for fields.
6:   Assessing data eligibility for Business Intelligence purposes.
7:   Reducing risk of data issues for master data management (single customer view) projects.
8:   Testing data for any new implementation in a test environment.
9:   Monitoring the effectiveness of batch processes.
10: Assessing the implications of integrating new data into existing systems.
11: Measuring the relevance of old data.
12: Selecting the most appropriate sources of data for any project where more than one data source is available.
13: Discovering whether a data source that is made for one purpose can be used for another.
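
As an illustration of item 4, here is a rough sketch of how the overlap between two columns from different sources might be measured. It is plain Python with made-up data; the function `value_overlap` and the names `crm_ids` and `billing_refs` are assumptions for the example, not part of any profiling product.

```python
def value_overlap(left, right):
    """Measure how many distinct values two columns have in common."""
    # Ignore missing values and compare distinct values only.
    left_set = {v for v in left if v is not None}
    right_set = {v for v in right if v is not None}
    shared = left_set & right_set
    return {
        "shared": len(shared),
        # Share of each column's distinct values that also appear in the other.
        "left_coverage": len(shared) / len(left_set) if left_set else 0.0,
        "right_coverage": len(shared) / len(right_set) if right_set else 0.0,
    }


# Hypothetical customer identifiers from two unrelated systems.
crm_ids = ["C001", "C002", "C003", "C004"]
billing_refs = ["C002", "C003", "C004", "C009", None]
overlap = value_overlap(crm_ids, billing_refs)
print(overlap)
```

A high coverage figure in both directions suggests the two fields may hold the same key, which is worth checking with the system owners before relying on it.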

Check the comments for more great uses supplied by Sam Howley.

So for measuring and modelling your data, a profiling tool is the Swiss Army knife of data management. If you are the first in your organisation to get one, and your colleagues know what it is capable of doing, prepare to become a very popular analyst indeed.


  1. Great list. I would add:

    14: Discovering mixed types in a field (we expected valid dates even though it's a varchar field, but 1 row out of 10,000 doesn't contain a valid date string).

    15: Identifying deprecated fields and tables (we still report off the EVENT_C table but it hasn't had a row inserted in 2 weeks; is the online system still using it?).

    16: Measuring and comparing data entry patterns (does office A enter a complete valid address more or less often than office B?).


    1. Thanks for the great ideas. I have added them to the article. Have a great day.

  2. Excellent points, adding to my ammunition for driving adoption of profiling at work.

  3. I think the big one for me is prioritisation and focus. You'll never be short of thousands of DQ issues to resolve, and data profiling helps you get a grip on which ones are critical, nice to have or irrelevant. It's one of the most essential data quality techniques.

    Great post.

    1. Thanks for the note, Dylan. That makes excellent sense. Am still getting my head around the potential of profiling, so that's another one to look out for.