Sunday 30 September 2012

5 things you can do with a metadata dictionary

You may have heard that you need a metadata dictionary. Usually this comes from your analysts or technicians. But what is it, and what can you do with one?

Metadata is information about your data. At its most basic, it could simply be a list of tables and fields with their properties.
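
At its simplest, that list might look like no more than this (a hypothetical sketch in Python, with invented table and field names):

    # A hypothetical, minimal metadata dictionary: one entry per field,
    # recording the table it belongs to and its basic properties.
    metadata_dictionary = [
        {"table": "CUSTOMER", "field": "CUSTOMER_ID", "type": "INTEGER",
         "nullable": False, "description": "Unique customer key"},
        {"table": "CUSTOMER", "field": "DATE_OF_BIRTH", "type": "DATE",
         "nullable": True, "description": "Customer date of birth"},
        {"table": "ORDERS", "field": "CUSTOMER_ID", "type": "INTEGER",
         "nullable": False, "description": "Foreign key to CUSTOMER"},
    ]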

Metadata can be collected and interpreted in whatever way is important to your organisation. So what can a database of this kind of information be used for?


1.  Managing change
If your metadata includes lineage tracking, then you can find out how many other systems would be affected if you changed or removed an information service.

2.  Tracing unauthorised or unplanned change
You can compare your dictionary to what is actually on your system and see whether unauthorised changes have been made. If systems fail, tables can be deleted. Comparing your production environment with a metadata dictionary can reveal the extent of your problems (a minimal sketch of this comparison follows the list below).

3.  Best source analysis
It can help you ascertain the best source of information for management information (MI) projects.

4.  Migration planning
You can use it to help plan migration projects.

5.  Tracking responsible and accountable data owners
You can name RACI colleagues for each data source, so that named people are held 'Responsible' and 'Accountable' when things go wrong, and are 'Consulted' and 'Informed' when a problem arises or a system change is required.
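
To make point 2 concrete, here is a minimal sketch of a dictionary-versus-production comparison. It uses an in-memory SQLite database and invented field names purely for illustration; a real implementation would query your own catalogue views.

    # Compare what the metadata dictionary says should exist with what
    # actually exists in a (here, in-memory SQLite) production database.
    import sqlite3

    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER, SURNAME TEXT)")

    # Fields the dictionary says should exist (invented for illustration).
    expected = {("CUSTOMER", "CUSTOMER_ID"), ("CUSTOMER", "DATE_OF_BIRTH")}

    # Fields actually present in the database.
    actual = set()
    for (table_name,) in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"):
        for column in connection.execute(f"PRAGMA table_info({table_name})"):
            actual.add((table_name, column[1]))  # column[1] is the column name

    print("Missing from production:", expected - actual)
    print("Undocumented in the dictionary:", actual - expected)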

A metadata dictionary will quickly become indispensable across many aspects of the business, from disaster recovery to data governance and change project activity.

Wednesday 26 September 2012

3 ways to align data quality to your company goals

One of the key decisions for a data quality team is how you are going to deploy within your organisational structure. The positioning you want to take within your organisation and the activities you choose to undertake are vital to getting co-operation and support. Here are my thoughts about ensuring your deployment of data quality is consistent with the aspirations of your organisation.

1.  Control your IS expenditure and track against high-level corporate goals
The reason why outsourcing to India has been so prevalent in the UK is that all businesses are watching IS expenditure. It is one of the most aggressively controlled areas. Aligning your IT costs to high-level corporate ambition will ensure you are not left out on a limb when the budget is being assigned.

2.  Align services with your marketing or sales department
Make sure that a sizeable amount of your service is focused towards supporting your sales and marketing functions. This ensures your actions support the commercial opportunities that your organisation wishes to attain.

3. Allow your risk, compliance and governance functions to have a say in the prioritisation of your work
This will ensure that your efforts are prioritised in accordance with your corporate intentions to avoid financial, regulatory, reputational and operational risk.

Data quality activity needs to be a balancing act: allocating the best projects both to mitigate risk and to maximise opportunity. So find the key departments that already have the best track record in delivering these outcomes and support them.

Sunday 23 September 2012

What can go wrong?

Computers are simple - really simple. Fundamentally, they only deal with two things: programs and data. If you are running everything on one solitary machine, and that machine is not defective, then it is either the program that is at fault, or the data that the program uses. If the program hasn't been changed since it last ran correctly, then it will be the data that is wrong. Simple.

When your computer connects to other computers, the network comes into play. Computers can get locked out when their passwords expire. This requires manual intervention to reset the user ID or change the password. Then you can have network congestion. This is when the quantity and/or size of all packets of information exceeds the capacity of the network router to deal with them. Individuals can control their own rates to achieve an optimal network rate allocation. You simply need to apply this formula:

\max_x \sum_i U(x_i) \quad \text{subject to} \quad Rx \le c

Simple...
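
For the genuinely curious, here is a rough sketch of what that optimisation looks like in practice, assuming logarithmic utilities and an invented two-link, three-flow network; none of it comes from a real router.

    # Network utility maximisation sketch: maximise sum of log(x_i)
    # subject to Rx <= c, for an invented routing matrix and capacities.
    import numpy as np
    from scipy.optimize import minimize

    R = np.array([[1, 1, 0],    # link 1 is used by flows 1 and 2
                  [0, 1, 1]])   # link 2 is used by flows 2 and 3
    c = np.array([10.0, 10.0])  # link capacities

    def negative_total_utility(x):
        return -np.sum(np.log(x))

    result = minimize(
        negative_total_utility,
        x0=np.ones(3),                      # start from 1 unit per flow
        bounds=[(1e-6, None)] * 3,          # rates must stay positive
        constraints=[{"type": "ineq",       # c - Rx >= 0, i.e. Rx <= c
                      "fun": lambda x: c - R @ x}],
    )
    print(result.x)  # the optimal rate allocation per flow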

You could have damaged cables, and if you work in a large organisation, testing each one to find the fault could take many hours. The damaged cable might not even be in your building. One of the network cards could get stuck in transmit mode, leading to excessive network collisions. There could be a software configuration problem - typically DNS or TCP/IP settings can cause many issues. You could also have more than one computer using the same IP address. This can create intermittent problems in communication that can be hard to trace.

All of these problems require a dedicated department that can:
  • Monitor the performance of your network
  • Keep records of failures
  • Keep a map of your network to enable diagnosis
  • Diagnose and fix problems
...otherwise known as your IT department. 

So if you think you can implement bespoke systems without the support of your IT department, think long and hard, particularly if you plan to use a network that other departments also rely upon. Do you really have a handle on what is required to fix the problems you may encounter? Can you accept the consequences if your system causes problems on that shared network?

There is an IT department for a reason. If you plan to develop on your network, engage them ASAP. They may be able to save you an awful lot of pain further down the road.

Wednesday 19 September 2012

Data Quality - the Missing Dynamic

When monitoring the productivity of your workforce, there are many tools that can help you to manage their workflow. One of the favourable outcomes of these workflow systems is the ability to generate management information on the productivity of your colleagues.

There are two basic measures that typically arise from workflow as MI - Efficiency and Effectiveness. Efficiency is a measure of the time colleagues are actually working, expressed as a percentage of the time they are available for work; unplanned interruptions are what erode it. Effectiveness is the volume of work completed, converted into time and expressed against the time available for work. Highly effective colleagues achieve a high volume of work in a shorter timescale.
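
Definitions differ between workflow tools, but as a rough sketch the two measures might be calculated along these lines (the standard time per work item is an assumption, not something every tool provides):

    # Sketch of the two workflow measures described above, assuming each
    # completed work item has an agreed standard time in minutes.
    def efficiency(time_working_minutes, time_available_minutes):
        """Time actually spent working as a percentage of time available."""
        return 100.0 * time_working_minutes / time_available_minutes

    def effectiveness(items_completed, standard_minutes_per_item,
                      time_available_minutes):
        """Work completed, converted to time, against the time available."""
        return 100.0 * (items_completed * standard_minutes_per_item) \
               / time_available_minutes

    print(efficiency(380, 420))        # e.g. 7 hours available, 40 minutes lost
    print(effectiveness(30, 15, 420))  # 30 items at a 15-minute standard time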

With the drive for ever greater efficiency and effectiveness, data quality can suffer greatly. The easiest way around this is to set up sampling plans for each colleague. However, when deadlines get stretched, this is usually the first thing to be de-prioritised, and with sampling you are also relying on luck to capture errors.

This is where data quality tooling can become a valuable asset for your organisation. Building business rules for processes, capturing the data and scoring each colleague on the quality of their work is a valuable and powerful addition to your productivity measures. It ensures that the drive for throughput does not adversely affect your quality. This will deliver greater efficiency by enabling more time to be spent delivering your services and less time remediating data.
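
As a hedged illustration of the idea, rather than a prescription for any particular tool, the scoring might look something like this (the rules, field names and records are all invented):

    # Rule-based quality scoring per colleague: each record is checked
    # against simple business rules and the pass rate becomes the score.
    records = [
        {"colleague": "A", "postcode": "LS1 4AP", "date_of_birth": "1975-06-12"},
        {"colleague": "B", "postcode": "", "date_of_birth": ""},
    ]

    rules = [
        ("postcode populated", lambda r: bool(r["postcode"].strip())),
        ("date of birth populated", lambda r: bool(r["date_of_birth"])),
    ]

    for record in records:
        passed = sum(1 for _, rule in rules if rule(record))
        score = 100.0 * passed / len(rules)
        print(record["colleague"], f"{score:.0f}% of rules passed")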

Monday 17 September 2012

4 important questions for migrating data

Planning a migration of data to new systems? One popular expectation is that the new system will fix errors caused by incorrect data on the legacy systems that held the original data.

New systems rarely do fix data errors. So the phrase "Garbage in, garbage out" is never more significant than when moving large amounts of data from one entity to another.

Very often business users will rightly concentrate on the needs and requirements of the business. This is correct business practice. But here are 4 alternative questions to ask before migrating data.

1. How old is my data?
Analyse those timestamps and find out how much of your data is likely to be out of date (a rough sketch of this follows the list below). Find out what the regulations are about your data. Do you have to keep it for a certain amount of time? At what age can your data be scrapped?

2. What granularity can you do without?
Perhaps you have a whole group of suppliers who have since been taken over by one company. Perhaps you have duplicates of customers. You could also have a complex list of multiple products that could be migrated over under one single product name. Multiple transaction types could be simplified. Standardising, simplifying and de-duplicating your data before migration greatly increases your chances of a smooth migration.

3. What data is not relevant?
Perhaps you have suppliers or customers who no longer do business with you. Perhaps you have products or services that you no longer offer. Do you need to migrate that data? 

4. How much of my data is bad?
Analyse the quality of your data. Do you really need to keep it if the quality is bad? Challenge why you need to migrate it at all. Make sure there is a substantial business purpose for data remediation, as it is potentially the most costly part of the operation.
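
To illustrate question 1, here is a rough sketch of age profiling against a retention period; the field names and the seven-year retention assumption are invented.

    # Profile record age from timestamps and count how many records fall
    # outside an assumed retention period.
    from datetime import datetime, timedelta

    records = [
        {"id": 1, "last_updated": "2003-04-17"},
        {"id": 2, "last_updated": "2011-09-02"},
    ]

    cutoff = datetime.now() - timedelta(days=7 * 365)  # assumed retention period
    stale = [r for r in records
             if datetime.strptime(r["last_updated"], "%Y-%m-%d") < cutoff]

    print(f"{len(stale)} of {len(records)} records are older than the retention period")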

When gathering requirements, it is easy to fall into the trap of migrating everything and over-complicating the process. Presumably, one of the reasons you are migrating your data to another platform is to take advantage of a more modern system. Don't let the old system's quirks and foibles infect the new one. Your business will be keen to migrate as much of the old data as possible. To ensure balance, stand the requirements on their head and offer the challenge - what data can we do without?

Wednesday 12 September 2012

Flamboyant code

I recently caught up with a colleague I used to work with (let's call him "Bill"). While I have had several jobs since, Bill is still in the same job he was in when I first met him. He hasn't moved on. He is still maintaining the systems he wrote nearly twenty years ago.

We both once worked for a pharmaceutical company. I was in a mainly admin role, with a lot of paper shuffling. The company implemented a new electronic document system to replace the more cumbersome microfiche factory they were running previously.

Being a bit of a data geek, I soon got to know some of the IT developers who were charged with developing this system (and many others) and supporting them. Bill was one of the developers in the team.

Bill was particularly proud of his work. He enjoyed showing off, and I liked to learn about programming. One day, he waved me over and pointed to his screen. What I saw absolutely fried my head.

"I reckon I'm the only one here who will ever understand this code," Bill announced proudly.

He then proceeded to explain it to me. I could grasp the concepts, but the way it was written was mind-boggling. I was instantly amazed at his complex use of subroutines, algorithms and spatial awareness. For a long time, I held that example in my head of what a developer should be.

Since then, my view has changed significantly. Whenever you write programs, it is always important that other developers can understand what is happening. It should be logical. The naming conventions should be consistent. It should be clearly annotated. You should take time to make the documentation as clear and simple to understand as your code.

This is particularly important if you are employing contractors to deliver single projects. Signs of flamboyant code are:
  1. Inappropriate naming conventions for subroutines or variables.
  2. Excessive use of subroutines.
  3. Multiple programs to do one job.
  4. Very little or no functional documentation. (ERD, process flowcharts etc.)
  5. No evidence of collection of business requirements.
  6. Very little annotation in the code.
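
As a small, invented illustration of the difference (Python is used purely for the sake of example):

    # Flamboyant: technically clever, but opaque to the next developer.
    def p(d):
        return {k: v for k, v in d.items() if v and k[0] != "_"}

    # Clearer: a descriptive name, a docstring and no tricks.
    def remove_empty_and_internal_fields(record):
        """Drop fields that are empty or whose names mark them as internal."""
        cleaned = {}
        for field_name, value in record.items():
            if value and not field_name.startswith("_"):
                cleaned[field_name] = value
        return cleaned
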
Devise coding standards. Insist they are adhered to by auditing delivered work before it is promoted to production. Standardising your approach to development will dramatically improve the technical agility of your organisation.

Friday 7 September 2012

Closing the gate

By now, you should all be aware that I like technical stuff. When I was 12, my parents bought me a Sinclair ZX81 (1k) home computer. I enjoyed learning BASIC programming. After a brief dalliance with CB radios (ground planes, aerial technology, bands in megahertz, phonetic language), I got an ORIC 1. Forth was a truly strange language to me, so I quickly swapped it for an Acorn Electron.

There was no stopping me. If it flashed or did something interesting, I was impressed. When other kids were hanging around the local newsagents trying to get adults to buy them cigarettes, I was hanging around Radio Shack buying a new RS232 interface lead. Not much has changed since. I work with computers. I love the work. I like helping people with their technical difficulties, and I like fixing things. So it was truly amazing when a walk with my family taught me a valuable lesson late last year.

I was working on selecting data quality software, and really enjoying the challenge of liaising with lots of vendors. But it is safe to say that while I was walking with the family, my mind was elsewhere, evaluating the pros and cons of various systems. We walked past a field with sheep in it, and on the gate there was a sign saying, "Please close the gate."

I stopped and realised that all my thoughts about software had been purely based on one half of the data quality journey. Data quality software is the trigger that instigates remedial action (the gate is open and your sheep have escaped). The other half of the data quality journey is being able to implement preventative actions (keeping the gate closed).

Preventing problems from happening has to be the first goal of a data quality team. Leveraging technology to ensure problems are prevented is a question of clever system design. But sometimes, all it might take is an old metal sign screwed to a gate.

Tuesday 4 September 2012

The dating game



There's no avoiding it. They can be a source of immense frustration. I am referring, of course, to computer dates - and not the romantic type!! When it is all going well, you hardly know they are there. But beware. Dates are tricky.

Many systems and databases can have their own way of working out dates. They can store them in a bewildering array of formats. The date could also be stored as an integer, and merely converted to a date format by the application, so that when moving from one database to another, you have to factor in an algorithm to compensate.

Dates can be packed into small spaces. They can even be stored as hexadecimal. Many formats are simply not recognised by data quality software, and you may have to resort to flamboyant coding to resolve them.
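
As an invented illustration, here is the same date hiding behind three different storage formats, and the decoding each one needs before two systems could agree (the formats and the spreadsheet-style epoch are assumptions, not rules):

    # One calendar date, three invented storage formats.
    from datetime import datetime, date, timedelta

    as_text = datetime.strptime("17/04/2012", "%d/%m/%Y").date()          # dd/mm/yyyy text
    as_integer = date(1899, 12, 30) + timedelta(days=41016)               # spreadsheet-style day count
    as_hex = datetime.strptime(format(0x1330361, "08d"), "%Y%m%d").date() # yyyymmdd packed as hex

    print(as_text, as_integer, as_hex)  # all three should print 2012-04-17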

When scheduled systems fail and have to be run late, this can affect system-generated dates. When systems fall out of synchronisation, very often the data is recovered but the dates have been delayed, which can have an adverse impact on the phasing of business intelligence reporting. You can often discover that the date schedule in one system is permanently out of synchronisation with the other, creating systemic latency.

Perhaps more frustrating still is the very definition of dates, especially when trying to share data between disparate systems. Nomenclature is key, but even consistent naming conventions can cause issues. For instance, "Start Date" may have a totally different definition from one system to another.

Also, when you look at a date within a system, are you sure it is correct? A profile may discover lots of dates coinciding with the date that the database started. These are often just put in as default dates when data was migrated over to the new system.

So whatever you do, wherever you are in data services, do not neglect your dates. Even if you think you have everything planned, think again and again.

Monday 3 September 2012

What can data profiling do for me?

One of the more interesting challenges about publicising your data quality work is convincing colleagues about what the software tools can do for them....

Profiling is the statistical analysis of the content and relationships of data. This kind of dry description does not always capture the imagination of business leaders and accountants. You have to let them know in what contexts it can be used.
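
One way to make it concrete: a minimal, hypothetical profile of a couple of columns, showing fill rates, distinct counts and the most common values (the sample data is invented):

    # A bare-bones column profile: fill rate, distinct count and the
    # most common value for each column in a tiny invented dataset.
    from collections import Counter

    rows = [
        {"surname": "JONES", "postcode": "LS1 4AP"},
        {"surname": "PATEL", "postcode": ""},
        {"surname": "JONES", "postcode": "LS1 4AP"},
    ]

    for column in rows[0]:
        values = [row[column] for row in rows]
        populated = [v for v in values if v]
        print(column,
              f"fill rate {100 * len(populated) / len(values):.0f}%",
              f"distinct {len(set(populated))}",
              "most common:", Counter(populated).most_common(1))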

So here is my list of real life applications that a data profiling tool can be used for within the information technology sphere:

1:   Spearheading migration projects.
2:   Reverse-engineering of processes.
3:   Measurement of data quality and process efficiency.
4:   Discovering relationships between disparate data sources.
5:   Identifying dual usages for fields.
6:   Assessing data eligibility for Business Intelligence purposes.
7:   Reducing risk of data issues for master data management (single customer view) projects.
8:   Testing data for any new implementation in a test environment.
9:   Monitoring the effectiveness of batch processes.
10: Assessing the implications of integrating new data into existing systems.
11: Measuring the relevance of old data.
12: Selecting the most appropriate sources of data for any project where more than one data source is available.
13: Discovering whether a data source that is made for one purpose can be used for another.

Check the comments for more great uses supplied by Sam Howley.
So for measuring and modelling your data, a profiling tool is the Swiss Army knife of data management. If you are the first in your organisation to get one, and your colleagues know what it is capable of doing, prepare to become a very popular analyst indeed.