Sunday, 16 December 2012

Reporting a problem

When the first retail computers were sold to the general public, they came with very thick technical handbooks on how to set them  up and run them. I remember my Sinclair ZX81 had a really thick book with lots of programming tips as well as clear instructions and diagrams on how to assemble it. But as computer use became more and more widespread, the instructions became more and more brief.

As the world took up personal computing, it became very clear that people were going to need help - hence the rise of the help desk.  Help desks are a modern fact of life. Where technical proficiency is required to solve problems, you will find a help desk.

Most people have horror stories about dealing with help desks. There is no doubt, some can be very frustrating to deal with. But take time to look at things from their point of view.

So when you are reporting a problem, here are my top tips for getting a result from your help desk. Before you pick up the phone, prepare the following information:

1.   Have all your contact information ready
Have any account numbers, user IDs, network addresses, authentication details etc. ready before you call. Scratching around for them during the call can cost you money and makes you more stressed before you have had a chance to tell them what the problem is.

2.   Specify your problem
Collect as much information about the problem as you can. Where is the failure, and what else is working OK? Be as precise as possible about timescales: when did the problem first happen? When did things last work well? How many times has the problem occurred since then, and how often does it fail? Gauge the extent of the problem: who else has the same problem? How many other systems/machines have the same problem? Is there any pattern or trend to the problem? What internal and external changes might have affected your system?

Now you have your information, give them a call and follow these rules:

1.   Listen carefully to automated systems and make sure you select the right option
If the help desk has automated systems for routing the callers, be aware that they deal with high call volumes and they have specialist areas to deal with different technical problems. Don't be afraid to hang up and start again if you think you keyed the wrong option. Getting the right person first time can be key to your success.

2.   Be patient
If you are put on hold, use this time to prepare your information on the problem.

3.   Don't complain about authentication questions
When they answer the phone, they will want to know if you are who you claim to be (authentication). Be nice about this. They are only making sure you aren't an identity thief.

4.   Keep to the subject
When communicating the problem, state it clearly. Keep focused on asking how the problem can be fixed, rather than setting out on blaming someone. Many of the highest performing companies now see the rectification of problems as an opportunity to improve their service and prove competence. However, even the best operator will find it hard to give you good service if you are ranting at them.

5.   If it can't be fixed while on the phone, be firm about establishing deadlines
Get the operator's name, then write down any dates or times for when things will be fixed. If no deadline is mentioned, ask for one. If they are vague or evasive, ask for a minimum and maximum timescale, then negotiate. If more than one thing has to happen, establish a timeline. Write this down slowly and clearly, so you can refer to it afterwards.

So now you have my recipe for dealing with help desks. Preparation before the call is key. Lastly - if you get good service, acknowledge it. Send an email to the head of the organisation. Good technical support is an art form.

Thursday, 13 December 2012

Diesel and Data

Recently, early in the morning, I was filling my car with diesel. In the back of my mind, I was planning my day. It was going to be busy, and I was also thinking about a technical challenge that required a solution. But while I listened to the fuel gurgling into my car, something wasn't quite right.  

It scratched on the edge of my consciousness, trying to get attention. Then I thought to myself, the diesel didn't smell like it usually did. I looked down, and everything went into slow motion as I realised, to my horror, that I had put £30 worth of unleaded petrol into my diesel car!

Now if you want to see what trouble unleaded petrol causes diesel cars, click HERE and HERE

I was not a happy bunny! So I went into the service station and told them of my problem. I then pushed my car to a safe place, and called the number of a company that pumps out cars when they have the wrong fuel. After a long wait, the van with specialist equipment arrived.

Being a bit of a geek, I watched with interest as they uncoupled the fuel pipe from the engine and proceeded to pump the fuel out of my car. The head mechanic was quite jolly. He told me about a lorry driver who had put nearly £1000 worth of unleaded into his wagon before he realised his mistake. He told me not everyone had the presence of mind to notice while they were at the pump, and usually had to break down before they called for help.

He asked me what I did for a living. I told him I am a data quality analyst.

"What do you do?" he asked.

"Same as you," I replied, "you take bad fuel out of cars, I take bad data out of computers and replace it with good data."

He seemed to like that. Grinning broadly, he handed me my keys.

"Hope I don't see you again," he quipped.

"Likewise," I replied, and headed off to work.

Saturday, 24 November 2012

Just how wrong can you get?

The specification statement was clear, "Dogs must be kept on a lead". The project analysts got together to break the request down. The statement did not say how many dogs must be kept on a lead. But after much consideration, the scope was defined as 'Any dogs within the area upon which the sign was situated.'

Everyone agreed that the 'Must' category was the best part of the statement..... (obviously a MoSCoW requirement.) The word 'kept' caused much anguish amongst the project analysts. What did they mean by kept? The dictionary was not much help....

kept - (especially of promises or contracts) not violated or disregarded; "unbroken promises"; "promises kept"
1. the past tense and past participle of keep
kept woman (censorious) - a woman maintained by a man as his mistress

Much thought was poured into the interpretation. Did 'kept' imply that the dogs were to be groomed  or fed while sat on the surface of a lead? They could not agree. One contractor finally lost his temper and said, "For crying out loud, enough of this intellectual onanism... can't you take anything literally? It's obvious. If you can read this sign, and you have a dog, it must be kept attached to a lead."

The next day, he lost his job for having an attitude.

And an additional handbook was written about how to hold the lead and who should be holding it. It stated clear guidelines as to the heaviest dog you could possibly hold, plus a leaflet about your dog's health, the potential risks to the walker and the effects of chafing on the hands.

The analysts, having "questioned to the void" long ago, had decided to pass the specification to their technical department, unchanged:

"Dogs must be kept on a lead"

The technical department sniffed at the statement. How could their colleagues in the business be so ambiguous? They found out where the sign was designed to go and surveyed the park capacity and neighbouring population. They decided there was an average at any given time of 12 dogs within the park, with a maximum capacity of 32 at peak times.

So taking this maximum into account, they constructed a huge lead with 40 collars attached.

As a RACI matrix wasn't signed off, no-one is accountable for actually holding the lead.

Saturday, 17 November 2012

Breaking through the Cassandra Complex

So we all know about the Cassandra Complex from a previous post? Take time to read it. Done that? Good! So here is part two - getting your colleagues to believe in what you are saying, and act on your requests.

When you work in a governance, compliance or data quality role within an organisation, it can be very hard to persuade people that your proposals are correct. One of the biggest mistakes I have seen is that professionals push for their own agendas without taking time to understand the people they are trying to convince. This brings me neatly to my first step:

1. Conduct an opinion survey
Find out exactly what your colleagues are thinking throughout the organisation. The ideal way is to construct an anonymous opinion survey where they are just asked for department, area and role. Here is a great site that helps you construct online surveys. You have probably seen the kind of questionnaire. You list common opinions (both good and bad), and they fill in whether they agree or disagree with them. The results of this survey will be the basis for your communications strategy, but one of the amazing things about surveys is that they are also great tools to get people to evaluate their beliefs.

2. Prioritise the biggest areas of concern
Perhaps you discover that your colleagues don't believe they can make a difference. Perhaps they don't understand how important their role is. They may not care about the consequences of getting things wrong. They may even be too afraid of reprisals to talk about problems they are having. Find your top problems for each department.

Now for each high priority damaging opinion, you need to build a communications strategy that does the following:

3. Create uncertainty about the damaging opinions
The first thing you should do is to make them feel less certain about these opinions. Refer to occasions where their opinion would not have worked. Quote the relevant statistics that contradict them. Ask them whether this opinion serves them and your customers well enough.

4. Reduce their resistance to the opinions you want them to have
This is where you start to introduce why your opinion is better for them, your customers and the company as a whole. 

5. Amplify your new attitude
Reframing opinions brings out changes in attitude. Once a change of attitude happens, it is important that you don't just stop there. You have to paint a vivid and inspiring picture of just how great things are going to be for everyone involved. This is often referred to as "encouraging the heart".

6. Test your results
Conduct another survey after your comms has been implemented. Compare it against the previous one. What has changed? Has anything improved? What are the lessons learned?

Remember, just conducting a survey can change the attitudes of your colleagues. It will also show that you value their opinion, and feeling unvalued can be one of the barriers. Also, being able to profile the beliefs and values of a department before you meet with them will give you plenty of help in understanding the problems they have on a day-to-day basis.

Once you understand your colleagues' opinions and beliefs, you can change their attitude and gain the behaviour and co-operation that you need.

Wednesday, 14 November 2012

Are your Excel formulae correct?

Formulae can save you a lot of time. Excel employs some fantastic formulae that can really help you analyse data. But what can go wrong?

Well, for 2012, there will be 53 weeks if you use the WEEKNUM function. It is a calendar quirk that may throw off any calculations you have this year. Look out for it. Here are some of the more common formulae problems I see in Excel spreadsheets:

1.  Not coding for zero values

Dividing one field by another is simple enough:

=A1/A2

But what if the value you are dividing by could possibly be zero? Excel does not like dividing by zero and you will end up with a #DIV/0! error. So put in a condition. Perhaps like this:

=IF(A2 = 0,0,(A1/A2))

2.  Circular references

Say you have two fields, A1: =B1+5 and B1: =A1-5.

A1 requires B1 to be calculated, but B1 requires A1 to be calculated. Don't do it. Excel doesn't like it.

3.  Hard-coding values into your formulae

Write your formulae as clearly as possible. Do not hard-code values into your formulae. Have your formulae reference values in other fields.

Wrong:        = A1*120

Right:          = A1*B1       (nb populate the field B1 with the value 120)

4.  Not including balance checks

When you have totalled your columns of data, it really is worth putting some balance checking into your sheet. This gives you some indication that your formulae have worked, and there are no missing values, or blank spaces where you haven't dragged your formulae properly.
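Two of the checks above, guarding against a zero divisor and balance checking, carry over to any tool, not just Excel. Here is a minimal sketch in Python. The function names `safe_divide` and `balance_check`, the tolerance and the sample figures are all my own illustrative assumptions, not anything built into a spreadsheet.

```python
def safe_divide(numerator, denominator, default=0):
    """Mirror =IF(A2 = 0,0,(A1/A2)): return a default instead of dividing by zero."""
    if denominator == 0:
        return default
    return numerator / denominator


def balance_check(row_totals, column_totals, tolerance=1e-9):
    """Return True when totalling by rows and by columns gives the same grand total."""
    return abs(sum(row_totals) - sum(column_totals)) <= tolerance


# Hypothetical sheet (assumed figures): three rows and two columns.
sheet = [
    [10, 20],
    [30, 40],
    [50, 60],
]
row_totals = [sum(row) for row in sheet]
column_totals = [sum(col) for col in zip(*sheet)]

print(safe_divide(10, 0))                        # guarded division returns the default
print(balance_check(row_totals, column_totals))  # the sheet balances
# A total that missed the last row (a formula not dragged down) fails the check.
print(balance_check(row_totals[:-1], column_totals))
```

The balance check works because the same grand total is reached by two independent routes. If one route misses a row, the two totals disagree.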

Excel is a great tool for working things out. I'm sure you have come across more common errors. Leave your comments in the place below.
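The WEEKNUM quirk mentioned at the top of this post can be checked outside Excel. The sketch below re-implements WEEKNUM's default numbering (week 1 contains 1 January, weeks start on Sunday) in Python. It does not call Excel, and the function name `excel_weeknum` is my own assumption. It also prints the ISO week for the same date for comparison.

```python
from datetime import date


def excel_weeknum(d):
    """Week number where week 1 contains 1 January and weeks start on Sunday.

    ASSUMPTION: this mimics WEEKNUM's default numbering; it is not Excel itself.
    """
    jan1 = date(d.year, 1, 1)
    # Python's weekday() has Monday=0..Sunday=6; shift so that Sunday=0.
    jan1_offset = (jan1.weekday() + 1) % 7
    days_since_jan1 = (d - jan1).days  # 0 for 1 January
    return (days_since_jan1 + jan1_offset) // 7 + 1


last_day = date(2012, 12, 31)
print(excel_weeknum(last_day))     # 53
print(last_day.isocalendar()[:2])  # ISO year and ISO week for the same date
```

Under ISO numbering the same day falls in week 1 of 2013, so two reports built on different week systems will not line up at the year end.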

Sunday, 4 November 2012

Is your business culture like a Greek tragedy?

Sounds a bit dramatic? It could well be. But the phenomenon I am referring to is the tale of Cassandra. In Greek mythology, Cassandra was a beautiful woman who refused the romantic attentions of the god Apollo. He had his revenge by giving her the gift of being able to see the future but, in a cruel twist, cursing her so that no-one would believe her. She was said to have been ignored when she predicted the fall of the ancient city of Troy.

Sadly, this is a real phenomenon. There are people who have been totally ignored by their colleagues and peers. Nouriel Roubini was the economist who accurately predicted the collapse of the housing market and the worldwide recession. The New York Times famously labelled him, "Doctor Doom", and he was ignored when he addressed the IMF about his concerns. 

Environmental campaigners have been predicting global warming, melting ice caps, rising sea levels and crazy weather since the 1970s, when the phenomenon was labelled the greenhouse effect. To this day, they are still being ignored, despite the rising tide of scientific evidence that it is happening.

When someone makes wild claims to be able to see the future, and their predictions are excessively negative and not based on any sound judgement, they are often referred to as having a "Cassandra Complex".

Operating in the space of data management, data governance and data quality, you could very easily be labelled as "Cassandra". You will be telling people that the easiest way is not the best. You should be warning them of unacceptable risk taking. You should be encouraging best practices. You will be insisting on higher standards of technical development. This is not the kind of thing that most people want to hear.

How do you avoid being labelled the tinfoil-hatted doom merchant in the corner? Stay tuned. My next blog article will tell you how.

Thursday, 25 October 2012

How Scary Is Your Data?

Soon it will be Halloween. It's a time of ghosts, ghouls and demons. But all of that pales into insignificance when compared to the truly terrifying reality of kids running around the streets pumped up on chocolate, sugar and energy drinks!!!! And to celebrate the witching hour, here is my list of Halloween data horrors... Don't say I didn't warn you... Mouhahahahaaaaaa!!!!

Undead data
This is the ancient data that you did not kill off. It served its purpose years ago, and you archived it, but did not delete it. It lies in its crypt waiting... waiting.. for the sun to set. If your regulators find out, it's you who will get it in the neck.

Alien data
It comes from another world (cue 50s B-movie music)... namely that company you have outsourced your data collection to. But you forgot to include data quality and governance standards in the agreement. And now you have data that is taking up all your resources trying to make sense of it. No-one can agree on the results and your whole organisation is paralysed.

Frankenstein data
They wanted to know diabolical things about your organisation, and they didn't care about how you did it. You could not find any documentation on your data sources, and they would not pay for a profiling tool. So you bolted and stitched huge amounts of unrelated data together to create an abomination. Deep into the night, you worked feverishly until finally you hysterically cried, "It lives, it lives".... All were amazed how you could breathe life into dead data and you reaped rewards. But deep down, you know it's only a matter of time before it either comes apart or brings your whole organisation crashing down around you.

Zombie systems
Those legacy systems died years ago. But someone keeps digging them up and re-animating them. Whoever did it, they certainly seem to have lost their BRRAAAIIIINS!!!

Godzilla data
No-one knows how or why they asked for it, but now it's here, and it's just too big. The scale is massive. All your IT staff run away screaming while it crushes servers and tangles networks. This big, 'Godzilla' data is requiring some other monster called 'Hadoop' to sort it out. They were last seen fighting off the coast of Java.

I hope you enjoyed my tales of data horror. Sleep well, now.. Pleasant dreams.... Mouhahahaaaa!!

Tuesday, 23 October 2012

Data Quality Failure - Apple Style

When Apple launched the iPhone 5, much was made of the new features of iOS 6, one of which was the new maps application. This was lauded as "A beautiful vector based interface" and "Everything's easy to read and you won't get lost".

Although the application functioned well, the data it used was far from effective. Contrary to the hype, people started to 'get lost'. One thing is patently clear: Apple had not conducted any data quality analysis of the databases that the maps application consumes.

All databases are models of reality. The discipline of data quality is to ensure that the database is the best model possible. It is obvious that the maps database was not checked against reality to ascertain whether it was an accurate or complete model.

An independent analysis of a sample of Apple Maps data (using the Canadian province of Ontario) provided some interesting stats. Of the 2028 place names in Ontario, 400 were correct, 389 were close to correct, 551 were completely incorrect, and 688 were missing.

Apple did not gather this information. It acquired the street and place data from Tom-Tom (the vehicle satellite navigation company) and integrated it with other databases. Despite strenuous denials of culpability by Tom-Tom, the facts show that the location data experienced by the users was missing or incorrect. 

To say that this has undermined the reputation of Apple is a large understatement. It prompted a public apology by the CEO, Tim Cook. 

So could Tom-Tom and the other suppliers of maps data have knowingly supplied incorrect data to Apple? Probably not. Surely Apple had data quality measurement in place? The results suggest not. Only 19.7% of place names were accurate, and 33.9% were missing.
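The percentages follow directly from the counts quoted for the Ontario sample. As a quick check, here is the arithmetic as a small Python sketch. The variable names are my own and the counts are the ones given above.

```python
# Counts quoted above for the Ontario sample of place names.
counts = {
    "correct": 400,
    "close to correct": 389,
    "completely incorrect": 551,
    "missing": 688,
}
total = sum(counts.values())  # should equal the 2028 place names sampled

# Each category as a percentage of the whole sample, to one decimal place.
percentages = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The four categories add up to the full 2028, so nothing in the sample is unaccounted for.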

When entering into agreements with 3rd party suppliers of information it is imperative that data quality standards are insisted on as part of the commercial agreement - with penalties for non-compliance. As the results of this little mess between Apple and their suppliers show, you may be able to outsource responsibility, but not accountability.

Thursday, 11 October 2012

5 Steps to Choosing the Right Data

You have a project, and you need data. So you go to your metadata dictionary and search for a data source, and you discover that there are several sources that you could possibly choose. Perhaps you have multiple measures and you need to know which ones to retire. How do you make the right choice? These are my 5 steps to choosing the best data source for your project.

1.  Classify and develop your objectives:
List all of your requirements: the data fields you need, reporting frequency, timings, transaction types, granularity etc. Make sure they are classified as either 'must haves' or 'wants'. When you have a full list, give each 'want' a weighting score - highest value being most important.

2.  Profile the data sources.
Build and run profiles of the data in each data source. Examine the field types, volumes, dates, times, transaction types and granularity. Profiling any creation timestamps will give you an idea of the scheduling that runs on the data. 

3.  Match the attributes and profile results of the data sources against the objectives.
Based on the profiling, how well does each source satisfy the objectives? Consider the timeliness of update and batch windows. Do they match the schedule in your objectives? Are the data sources structurally compatible with your requirements? Does each data source provide the correct level of granularity? If any of the 'must haves' are not met for a data source, reject it outright. For all the other options, total the score based on how well they achieve the 'wants'.

4.  Identify the risks
Take your two highest scorers and ask yourself the following questions about them:
  • What future threats should we consider?
  • If we choose the data source, what could go wrong?
  • Is our understanding of this data source good enough?
  • What are the capacity/system constraints?

5.  Choose your preferred data source:
Are you willing to accept the risks carried by the best performer in order to attain the objectives?
          If yes, choose it.
          If no, consider the next best performer and ask again.

So there you have it, a rigorous approach to choosing the best data source. How much detail you go to will depend on the rigour that is required for your industry sector.
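Steps 1 and 3 amount to a pass/fail filter on the 'must haves' followed by a weighted score on the 'wants'. Here is a minimal sketch in Python of how that could look. The objectives, weights, source names and the `DataSource` class are all hypothetical, made up purely to illustrate the scoring.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str
    meets: set = field(default_factory=set)  # objectives this source satisfies


# Hypothetical objectives: 'must haves' are pass/fail, 'wants' carry a weighting.
MUST_HAVES = {"daily refresh", "customer id"}
WANTS = {"transaction type": 5, "hourly granularity": 3, "ten years of history": 1}


def score(source):
    """Return None if any must-have is missing, else the total weight of wants met."""
    if not MUST_HAVES <= source.meets:
        return None  # step 3: reject outright
    return sum(weight for want, weight in WANTS.items() if want in source.meets)


def rank(sources):
    """Highest scorers first, with rejected sources dropped."""
    scored = [(score(s), s.name) for s in sources]
    return sorted([item for item in scored if item[0] is not None], reverse=True)


sources = [
    DataSource("warehouse", {"daily refresh", "customer id", "transaction type"}),
    DataSource("ledger", {"daily refresh", "customer id", "hourly granularity",
                          "ten years of history"}),
    DataSource("legacy extract", {"customer id", "transaction type"}),
]

for total, name in rank(sources):
    print(name, total)
```

The top two names from `rank` are the ones to take into the risk questions in step 4. The score only narrows the field; the final choice still rests on whether you can accept the risks.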

Sunday, 7 October 2012

The 4 C's of data management

The 4 C's are what I use to map the data journey. Here are the 4 C's:

Data is created. Generally, this is done by people who key in the data manually. Your customers may be data creators if they have to key online applications. Data creators are responsible for creating data as correctly as possible. 

The data is also changed.  It could be as simple as someone keying a change of address in your database, or changing services for your customers. Data Changers are responsible for keeping the data in line with changes to reality.

The data is also controlled. This includes data management, regulatory compliance and data quality monitoring and maintenance activities. In the data world, this can also cover parsing, standardising, error correction, de-duplication etc. Controllers are responsible for monitoring and controlling the data. It also covers anyone who has to archive or destroy data to fulfil data regulation.

Finally, the data is consumed. Consumers are people who view or use the information as part of their job. If you share your information with your customers (account statements etc), they are also included. This also covers Data Protection Act requests for information (UK only). They are responsible for understanding the data, challenging it if they find errors, and making the correct business decisions using it.

These are really snappy ways to remind yourself about the kind of questions you need to ask about a process that you are surveying to understand roles and responsibilities.

All are responsible for process change and maintenance in their areas. All should be consulted and informed about change to the data. The one who should be held Accountable is the colleague who has over-arching control over the whole process. Obviously, if the process spans several departments, accountability can be shared across several function leaders.

Sunday, 30 September 2012

5 things you can do with a metadata dictionary

You may have heard that you need a metadata dictionary. Usually this comes from your analysts or technicians. But what is it, and what can you do with one?

Metadata is information about your data. At its most basic, it could simply be a list of tables and fields with their properties.

Metadata can be collected and interpreted in whatever way is important to your organisation. So what can a database of this kind of information be used for?

1.  Managing change
If your metadata includes lineage tracking, then you can find out how many other systems are affected if you were to change or remove an information service.

2.  Tracing unauthorised or unplanned change
You can compare your dictionary to what is actually on your system and see if unauthorised changes have been made. If systems fail, tables can be deleted. Comparing your production environment with a metadata dictionary can show you the extent of your problems.

3.  Best source analysis
It can help to ascertain the best source of information for MI projects.

4.  Migration planning
You can use it to help plan migration projects.

5.  Tracking responsible and accountable data owners
You can name RACI colleagues for each data source so that the people are held 'Responsible' and 'Accountable' when things go wrong, or are 'Consulted' and 'Informed' when there is a problem or a system change required.

A metadata dictionary will quickly become indispensable to many aspects of business, from disaster recovery to data governance and change project activity.
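The comparison described under point 2 is, at heart, a set difference between what the dictionary says should exist and what production actually holds. Here is a minimal sketch in Python. The table and field names are hypothetical, and a real dictionary would hold far more than field names, but the idea is the same.

```python
# Hypothetical metadata dictionary: table name -> set of expected field names.
dictionary = {
    "customer": {"id", "name", "postcode"},
    "orders": {"id", "customer_id", "amount"},
}

# Hypothetical snapshot of what is actually on the production system.
production = {
    "customer": {"id", "name", "postcode", "temp_flag"},
    "audit_tmp": {"id"},
}


def compare(expected, actual):
    """Report tables and fields that differ between the dictionary and production."""
    report = {
        "missing_tables": sorted(expected.keys() - actual.keys()),
        "unexpected_tables": sorted(actual.keys() - expected.keys()),
        "changed_tables": {},
    }
    for table in expected.keys() & actual.keys():
        added = sorted(actual[table] - expected[table])
        removed = sorted(expected[table] - actual[table])
        if added or removed:
            report["changed_tables"][table] = {"added": added, "removed": removed}
    return report


print(compare(dictionary, production))
```

Missing tables point to failures or deletions, while unexpected tables and added fields point to unplanned change.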

Wednesday, 26 September 2012

3 ways to align data quality to your company goals

One of the key decisions for a data quality team is how you are going to deploy within your organisational structure. The positioning you want to take within your organisation and the activities you choose to undertake are vital to getting co-operation and support. Here are my thoughts about ensuring your deployment of data quality is consistent with the aspirations of your organisation.

1.  Control your IS expenditure and track against high-level corporate goals
Outsourcing to India has been so prevalent in the UK because all businesses are watching IS expenditure. It is one of the most aggressively controlled areas. Aligning your IT costs to high-level corporate ambition will ensure you are not left out on a limb when the budget is being assigned.

2.  Align services with your marketing or sales department
Make sure that a sizeable amount of your service is focused towards supporting your sales and marketing functions. This ensures your actions support the commercial opportunities that your organisation wishes to attain.

3. Allow your risk, compliance and governance functions to have a say in the prioritisation of your work
This will ensure that your efforts are prioritised in accordance with your corporate intentions to avoid financial, regulatory, reputational and operational risk.

Data quality activities need to be a balancing act, allocating the best projects to mitigate risk and also to maximise opportunities. So find the key departments that already have the best track record in delivering these attributes and support them. 

Sunday, 23 September 2012

What can go wrong?

Computers are simple - really simple. Fundamentally, they only deal with two things: programs and data. If you are running everything on one solitary machine, and that machine is not defective, then it is either the program that is at fault, or the data that the program uses. If the program hasn't been changed since it last ran correctly, then it will be the data that is wrong. Simple.

When your computer connects to other computers, the network comes into play. Computers can get locked out when their passwords expire. This requires manual intervention to reset the user ID or change the password. Then you can have network congestion. This is when the quantity and/or size of all packets of information exceeds the capacity of the network router to deal with them. Individuals can control their own rates to achieve an optimal network rate allocation. You simply need to apply this formula:

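\max_{x} \sum_{i} U(x_i) \quad \text{subject to} \quad Rx \le c

Here x_i is the rate of flow i, U is the utility of that rate, R is the routing matrix and c is the vector of link capacities.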

You could have damaged cables. So if you work in a large organisation, testing each one to find out if there is a problem could take you many hours. The damaged cable might not even be in your building. One of the network cards could get stuck on transmit mode, leading to excessive network collisions. There could be a software configuration problem - typically DNS or TCP/IP settings can cause many issues. You could also have more than one computer using the same IP address. This can create intermittent problems in communication that can be hard to trace. 

All of these problems require a dedicated department that can:
  • Monitor the performance of your network
  • Keep records of failures
  • Keep a map of your network to enable diagnosis
  • Diagnose and fix problems
...otherwise known as your IT department. 

So if you think you can implement bespoke systems without the support of your IT department, think long and hard, particularly if you plan to use a network that other departments also rely upon. Do you really have a handle on what is required to fix problems you may encounter? Can you accept the consequences if your system causes problems with a network that other departments also rely upon?

There is an IT department for a reason. If you plan to develop on your network, engage them ASAP. They may be able to save you an awful lot of pain further down the road.

Wednesday, 19 September 2012

Data Quality - the Missing Dynamic

When monitoring the productivity of your workforce, there are many tools that can help you to manage their workflow. One of the favourable outcomes of these workflow systems is the ability to generate management information on the productivity of your colleagues.

There are two basic measures that typically arise from workflow as MI - Efficiency and Effectiveness. Efficiency is a measure of the time they are actually working expressed as a percentage of the time they are available for work. Things that affect efficiency are unplanned interruptions in work. Effectiveness is the volume of work completed as time, expressed as a variance of the time available for work. Highly effective colleagues achieve a high volume of work in a shorter timescale.
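To make the two measures concrete, here is a minimal sketch in Python. Efficiency follows the definition above directly. For effectiveness I have had to assume that completed work is converted to time using a standard time per item, and that the variance is expressed as a percentage of the time available. Your workflow system may calculate it differently, and the figures are made up.

```python
def efficiency(hours_worked, hours_available):
    """Time actually working as a percentage of time available for work."""
    return 100 * hours_worked / hours_available


def effectiveness_variance(items_completed, standard_hours_per_item, hours_available):
    """Completed work converted to time, as a percentage variance from time available.

    ASSUMPTION: work is converted to time using a standard time per item.
    """
    earned_hours = items_completed * standard_hours_per_item
    return 100 * (earned_hours - hours_available) / hours_available


# Hypothetical colleague: 7.5 hours available, 6 hours working after interruptions,
# 36 items completed at a standard 0.25 hours each.
print(efficiency(6, 7.5))
print(effectiveness_variance(36, 0.25, 7.5))
```

A positive variance means the colleague completed more work than the standard times would predict for the hours available.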

With the drive for ever greater efficiency and effectiveness, data quality can suffer greatly. To get around this, the easiest way is to set up sampling plans for each colleague. However, when deadlines get stretched, this is usually the first thing to be de-prioritised. With sampling, you are also relying on luck to capture errors.

This is where data quality tooling can become a valuable asset for your organisation. Building business rules for processes, capturing the data and scoring each colleague on the quality of their work is a valuable and powerful addition to your productivity measures. It ensures that the drive for throughput does not adversely affect your quality. This will deliver greater efficiency by enabling more time to be spent delivering your services and less time remediating data.
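As a rough illustration of the idea (not of any particular tool), business rules can be written as simple pass/fail checks and applied to every record a colleague keys, rather than to a sample. The rules, field names and records in this Python sketch are hypothetical.

```python
from collections import defaultdict

# Hypothetical business rules: each takes a record (a dict) and returns True if it passes.
RULES = {
    "postcode present": lambda r: bool(r.get("postcode", "").strip()),
    "amount is positive": lambda r: r.get("amount", 0) > 0,
}

# Hypothetical captured work, tagged with the colleague who keyed it.
records = [
    {"keyed_by": "alice", "postcode": "AB1 2CD", "amount": 120},
    {"keyed_by": "alice", "postcode": "", "amount": 75},
    {"keyed_by": "bob", "postcode": "EF3 4GH", "amount": 50},
]


def quality_scores(records, rules):
    """Percentage of rule checks passed, per colleague."""
    passed = defaultdict(int)
    checked = defaultdict(int)
    for record in records:
        for rule in rules.values():
            checked[record["keyed_by"]] += 1
            passed[record["keyed_by"]] += rule(record)  # True counts as 1
    return {who: 100 * passed[who] / checked[who] for who in checked}


print(quality_scores(records, RULES))
```

Because every record is checked, the score does not rely on luck in the way a sampling plan does, and it can sit alongside the efficiency and effectiveness figures.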

Monday, 17 September 2012

4 important questions for migrating data

Planning a migration of data to new systems? One popular expectation of new systems is that they will fix errors caused by incorrect data on the legacy systems that held the original data.

New systems rarely fix data errors. So the phrase "Garbage in, garbage out" is never more significant than when moving large amounts of data from one entity to another.

Very often business users will rightly concentrate on the needs and requirements of the business. This is correct business practice. But here are 4 alternative questions to ask before migrating data.

1. How old is my data?
Analyse those timestamps and find out how much of your data is likely to be out of date. Find out what the regulations are about your data. Do you have to keep it for a certain amount of time? At what age can your data be scrapped?

2. What granularity can you do without?
Perhaps you have a whole group of suppliers who have since been taken over by one company. Perhaps you have duplicates of customers. You could also have a complex list of multiple products that could be migrated over under one single product name. Multiple transaction types could be simplified. Standardising, simplifying and de-duplicating your data before migration greatly increases your chances of a smooth migration.

3. What data is not relevant?
Perhaps you have suppliers or customers who no longer do business with you. Perhaps you have products or services that you no longer offer. Do you need to migrate that data? 

4. How much of my data is bad?
Analyse the quality of your data. Do you really need to keep it if the quality is bad? Make a challenge as to why you need to migrate it. Make sure there is substantial business purpose for data remediation, as it is potentially the most costly part of the operation.
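The first two questions lend themselves to a quick check before any migration work starts. The Python sketch below is only an illustration: the supplier records, the seven year retention period and the list of company suffixes are all assumptions, so substitute your own rules.

```python
import re
from datetime import date

RETENTION_YEARS = 7  # assumed retention period; check your own regulations


def years_old(d, today):
    """Whole years elapsed between d and today."""
    return today.year - d.year - ((today.month, today.day) < (d.month, d.day))


def past_retention(records, today, limit_years=RETENTION_YEARS):
    """Question 1: which records are old enough to be candidates for scrapping?"""
    return [r for r in records if years_old(r["last_updated"], today) >= limit_years]


def normalise_name(name):
    """Lower-case, strip punctuation and common company suffixes, collapse spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    words = [w for w in cleaned.split() if w not in {"ltd", "limited", "plc", "inc"}]
    return " ".join(words)


def deduplicate(records):
    """Question 2: keep the first record seen for each normalised supplier name."""
    seen = {}
    for r in records:
        seen.setdefault(normalise_name(r["name"]), r)
    return list(seen.values())


# Hypothetical supplier records, made up for illustration
suppliers = [
    {"name": "Acme Ltd.", "last_updated": date(2003, 5, 1)},
    {"name": "ACME Limited", "last_updated": date(2012, 1, 15)},
    {"name": "Bolt & Nut PLC", "last_updated": date(2009, 11, 30)},
]

today = date(2012, 9, 17)
old_records = past_retention(suppliers, today)
unique_suppliers = deduplicate(suppliers)
print([r["name"] for r in old_records])
print([r["name"] for r in unique_suppliers])
```

Real de-duplication usually needs fuzzier matching than this, but even a simple normalisation pass gives you a first estimate of how much you could do without.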

When gathering requirements, it is easy to fall into the trap of migrating everything and risking over-complicating the process. One of the reasons why you are migrating your data to another platform is presumably because you choose to take advantage of a more modern entity. Don't let the old entity's quirks and foibles infect your new system. Your business will be keen to migrate as much of the old data as possible. To ensure balance, stand the requirements on their head and offer the challenge - what data can we do without?

Wednesday, 12 September 2012

Flamboyant code

I recently caught up with a colleague I used to work with (let's call him "Bill"). While I have had several jobs since, Bill is still in the same job he had when I first met him. He hasn't moved on. He is still maintaining the systems he wrote nearly twenty years ago.

We both once worked for a pharmaceutical company. I was in a mainly admin role, with a lot of paper shuffling. The company implemented a new electronic document system to replace the more cumbersome microfiche factory they were running previously.

Being a bit of a data geek, I soon got to know some of the IT developers who were charged with developing this system (and many others) and supporting them. Bill was one of the developers in the team.

Bill was particularly proud of his work. He enjoyed showing off, and I liked to learn about programming. One day, he waved me over and pointed to his screen. What I saw absolutely fried my head.

"I reckon I'm the only one here who will ever understand this code," Bill announced proudly.

He then proceeded to explain it to me. I could grasp the concepts, but the way it was written was mind-boggling. I was instantly amazed at his complex use of subroutines, algorithms and spatial awareness. For a long time, I held that example in my head of what a developer should be.

Since then, my view has changed significantly. Whenever you write programs, it is always important that other developers can understand what is happening. It should be logical. The naming conventions should be consistent. It should be clearly annotated. You should take time to make the documentation as clear and simple to understand as your code.

This is particularly important if you are employing contractors to deliver single projects. Signs of flamboyant code are:
  1. Inappropriate naming conventions for subroutines or variables.
  2. Excessive use of subroutines.
  3. Multiple programs to do one job.
  4. Very little or no functional documentation. (ERD, process flowcharts etc.)
  5. No evidence of collection of business requirements.
  6. Very few annotations in the code.
Devise coding standards. Insist they are adhered to by auditing delivered work before it is promoted to production. Standardising your approach to development will dramatically affect the technical agility of your organisation.

Friday, 7 September 2012

Closing the gate

By now, you should all be aware that I like technical stuff. When I was 12, my parents bought me a Sinclair ZX81 (1k) home computer. I enjoyed learning BASIC programming. After a brief dalliance with CB radios (ground planes, aerial technology, bands in megahertz, phonetic language), I got an ORIC 1. Forth was a truly strange language to me, so I quickly swapped it for an Acorn Electron.

There was no stopping me. If it flashed or did something interesting, I was impressed. When other kids were hanging around the local newsagents trying to get adults to buy them cigarettes, I was hanging around Radio Shack buying a new RS232 interface lead. Not much has changed since. I work with computers. I love the work. I like helping people with their technical difficulties, and I like fixing things. So it was truly amazing when a walk with my family taught me a valuable lesson late last year.

I was working on selecting data quality software, and really enjoying the challenge of liaising with lots of vendors. But it is safe to say that while I was walking with the family, my mind was elsewhere, evaluating the pros and cons of various systems. We walked past a field with sheep in it. On that gate there was a sign saying, "Please close the gate."

I stopped and realised that all my thoughts about software had been purely based on one half of the data quality journey. Data quality software is the trigger that instigates remedial action (the gate is open and your sheep have escaped). The other half of the data quality journey is being able to implement preventative actions (keeping the gate closed).

Preventing problems from happening has to be the first goal of a data quality team. Leveraging technology to ensure problems are prevented is a question of clever system design. But sometimes, all it might take is an old metal sign screwed to a gate.

Tuesday, 4 September 2012

The dating game

There's no avoiding it. They can be a source of immense frustration. I am referring, of course, to computer dates - and not the romantic type!! When it is all going well, you hardly know they are there. But beware. Dates are tricky.

Many systems and databases can have their own way of working out dates. They can store them in a bewildering array of formats. The date could also be stored as an integer, and merely converted to a date format by the application, so that when moving from one database to another, you have to factor in an algorithm to compensate.
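As a rough illustration of the kind of algorithm I mean, suppose two hypothetical systems both store dates as a count of days, but count from different starting points. The two epochs below are assumptions chosen for the example, not a description of any particular product.

```python
from datetime import date, timedelta

# Assumed conventions for two hypothetical systems
EPOCH_A = date(1970, 1, 1)  # system A counts days from here
EPOCH_B = date(1900, 1, 1)  # system B counts days from here


def to_date(day_number, epoch):
    """Turn a stored integer back into a calendar date."""
    return epoch + timedelta(days=day_number)


def convert(day_number, source_epoch, target_epoch):
    """Re-base an integer date from one system's epoch to another's."""
    return (to_date(day_number, source_epoch) - target_epoch).days


a_value = (date(2012, 9, 4) - EPOCH_A).days  # how system A would store 4 September 2012
b_value = convert(a_value, EPOCH_A, EPOCH_B)  # the same day as system B would store it

print(a_value, b_value, to_date(b_value, EPOCH_B))
```

Copy the integer across without the conversion and every date in the target system will be wrong by decades, with no error message to warn you.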

Dates can be packed into small spaces. They can even be stored as hexadecimal. Many formats are simply not recognised by data quality software, and you may have to resort to flamboyant coding to resolve them.

When scheduled systems fail and have to be run late, this can affect system generated dates. When systems become out of synchronisation, very often the data is recovered, but the dates have been delayed, which can have an adverse impact on the phasing of business intelligence reporting. You can often discover the date schedule in one system is permanently out of synchronisation with the other, creating systemic latency.

Perhaps more frustrating about dates is their very definition, especially when trying to share data between disparate systems. Nomenclature is key. But even using consistent naming conventions can cause issues. For instance "Start Date" may have a totally different definition from one system to another.

Also, when you look at a date within a system, are you sure it is correct? A profile may discover lots of dates coinciding with the date that the database started. These are often just put in as default dates when data was migrated over to the new system.
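A simple frequency count is often enough to surface these default dates. The sketch below is a minimal example with made-up start dates; the 20% threshold is an arbitrary assumption that you would tune to your own data.

```python
from collections import Counter
from datetime import date


def suspicious_dates(dates, threshold=0.2):
    """Return any date that accounts for more than `threshold` of all values."""
    counts = Counter(dates)
    total = len(dates)
    return {d: n for d, n in counts.items() if n / total > threshold}


# Hypothetical "Start Date" column: six rows share the same date
start_dates = [date(2005, 1, 1)] * 6 + [
    date(2007, 3, 14),
    date(2008, 6, 2),
    date(2010, 10, 21),
    date(2011, 2, 9),
]

flagged = suspicious_dates(start_dates)
print(flagged)
```

A flagged date is not proof of a default value, only a prompt to go and ask what happened on that day.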

So whatever you do, and wherever you are in data services, do not neglect your dates. Even if you think you have everything planned, think again and again.

Monday, 3 September 2012

What can data profiling do for me?

One of the more interesting challenges about publicising your data quality work is convincing colleagues about what the software tools can do for them....

Profiling is the statistical analysis of the content and relationships of data. This kind of dry description does not always capture the imagination of business leaders and accountants. You have to let them know the contexts in which it can be used.

So here is my list of real life applications that a data profiling tool can be used for within the information technology sphere:

1:   Spearheading migration projects.
2:   Backwards-engineering of processes.
3:   Measurement of data quality and process efficiency.
4:   Discovering relationships between disparate data sources.
5:   Identifying dual usages for fields.
6:   Assessing data eligibility for Business Intelligence purposes.
7:   Reducing risk of data issues for master data management (single customer view) projects.
8:   Testing data for any new implementation in a test environment.
9:   Monitoring the effectiveness of batch processes.
10: Assessing the implications of integrating new data into existing systems.
11: Measuring the relevance of old data.
12: Selecting the most appropriate sources of data for any project where more than one data source is available.
13: Discovering whether a data source that is made for one purpose can be used for another.

Check the comments for more great uses supplied by Sam Howley.
So for measuring and modelling your data, a profiling tool is the Swiss Army knife of data management. If you are the first in your organisation to get one, and your colleagues know what it is capable of doing, prepare to become a very popular analyst indeed.
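To make the definition of profiling a little less dry, here is a minimal sketch of what a profile of a single column might contain. Real profiling tools do far more than this; the function and the sample column are made up for illustration.

```python
from collections import Counter


def profile_column(values):
    """Summarise one column: row count, blanks, distinct values, most common value."""
    non_blank = [v for v in values if v not in (None, "")]
    counts = Counter(non_blank)
    return {
        "rows": len(values),
        "blank": len(values) - len(non_blank),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical "title" column with two blank entries
titles = ["Mr", "Mrs", "Mr", "", "Dr", None, "Mr"]
summary = profile_column(titles)
print(summary)
```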

Wednesday, 29 August 2012


Governance or quality

The wonderful quandary - 'Which came first, the chicken or the egg?' has been with us since time immemorial. One can ask the same thing about so many aspects of life, including, perhaps, the two data management disciplines of Governance and Quality.

In modern organisations, very often you find quality audit functions appearing fairly quickly, particularly where there are manufacturing standards to uphold. In traditional manufacturing, it is easier to trace problems to specific areas and individuals to fix.

In the sphere of data management, delineation of responsibility and accountability can be a key issue, particularly when so many processes are scheduled and have been running for a long time. When systems mature, companies usually decide to put together specialist data quality initiatives. But when the data quality team discovers problems, securing resources and funds to fix them can be particularly difficult without the appropriately allocated responsible and accountable data owners.

So in this particular chicken and egg race, data quality often comes first. But to be truly effective, data governance should ideally commence first, because without governance to enforce accountability and responsibility, data quality initiatives can fall upon deaf ears.

Monday, 27 August 2012

The right stuff

Today I celebrate one of my lifelong heroes. On July the 20th, 1969, as  I was being born in a hospital in Macclesfield, Neil Armstrong took that iconic first step from the Eagle module onto the dusty surface of the moon.

When people consider the dangers of space travel, they like to think about the cold vacuum of space, or the radiation, or meteorites. It's a pretty dangerous place to be strapped to an overgrown firework! 

Then you add that they went there in equipment that had far less processing power than an iPhone, and you get some idea of the risks these men took.

The Apollo space missions had a computer system imaginatively called the 'Apollo Guidance Computer' (AGC). It was revolutionary at the time. It was a 16-bit machine running at about 1 MHz (an iPhone runs at about 800 MHz). Hobbyists are now making them in their basements for fun.

During the moon landing phase of the Apollo 11 mission, there was great concern, because one of the crew had left a flight radar system on. When they went into land, they switched their landing radar on. Both radars functioning at the same time caused the AGC to overload. It was very fortunate that Neil Armstrong was ignoring the landing radar and landing by sight, manually, 4 miles from the agreed place with only 20 seconds of fuel left.  

The simplicity of the AGC also made it extremely complicated to fix. While orbiting the moon and preparing to land, the Apollo 14 crew noticed that the abort process was being instigated without anyone pressing the abort switch. The engineers in Cape Canaveral worked out a 'patch', and the crew had to re-program the whole system code before they could land. The whole program took 90 minutes to re-key.

So when you feel like screaming because your laptop won't connect to the internet, or your report is late, just think about Neil Armstrong and his Apollo 11 crew - and their overloaded radar system while in the final descent to the moon, alone in space, 238000 miles from earth.... and ask yourself this question. Did Neil lose his temper and blame his crew? No, he kept his cool and focused on recovery.

Safe journey home, Neil. The world will miss one of its most enigmatic pioneers.

Friday, 24 August 2012

5 Steps to Credibility in Business Intelligence

A business intelligence department lives on its credibility. Yet that same credibility can be undermined very quickly when the rest of the organisation is not engaged.

Business intelligence is often seen as a bit of a 'black box' by other parts of the organisation, which can lead to misunderstandings. These lead to credibility issues. Before long, you have a long list of unwarranted queries about your reports. Credibility takes time to win and can be lost very quickly.

Here are my 5 steps to building business intelligence credibility.

1.  Definitions, definitions, definitions..

Write thorough, unambiguous, verbose business definitions with your customer and sign them off before developing. This ensures that the customer knows exactly what they are getting, the developers have a more focused set of requirements, and expectations are managed. Involving them in decision making and making the development process more visible are sure ways of building credibility. Establish a rule that all definitions are to be completed and signed off before development begins. Don't break that rule. 

2.  Visibility of lineage...

Publish your business definitions in a solution that integrates them with a metadata dictionary, so that there is complete visibility of the data lineage from user keystrokes to report. Make sure everyone knows about it. Visibility of data lineage informs people of the implications of their actions.

3.  Profile your source data before you start building

Profile the data that you are reading in. Take the results to your customer and discuss any finer points about the measure. Make it known that your area is merely reading data that other areas are manufacturing (lineage). Communicate any problems you encounter with this data and engage your Data Quality department/team. 

4.  Automated data quality measures

Arrange for your BI processes to be regularly profiled and have a data quality scorecard running on the same schedule as the finished report.

5.  Visibility of testing

Involve your customer in user-acceptance-testing activities. Even if it's just signing off the approach and the final report. This will give them visibility of the whole project life cycle.

When customers are aware of the definitions, development and data lineage, they understand that you are building the very best report you can. Giving them visibility of any data quality scorecard will highlight the steps you are going through to mitigate the risk that other areas pose to the accuracy of your report.

These are just some of the practical ways that good data governance and quality initiatives can improve the credibility of a business intelligence department.

Thursday, 23 August 2012

Humans are the exception

Even the best checking algorithms can make mistakes when dealing with a language that is as varied and flexible as ours.

The problem with language context is most commonly experienced when using predictive text messaging. How often have you sent a message and the phone has guessed the wrong word? Perhaps it's not as crazy as the one in the picture, but it happens often enough.

Speech recognition is far from complete and people are still having problems using their natural speaking language with a computer. Apple's Siri notoriously has problems understanding anyone with a distinctive accent.

So when looking for the cause of problems, the most probable systemic failure area is where there is a human to system interface. Perhaps one of the greatest challenges to the developers of future computer systems is to make them understand us better. Perhaps this will happen when we finally understand ourselves.

Tuesday, 14 August 2012

Access Excess?

Fast upon the heels of my article "Addicted to Excel", I now scrutinise another Microsoft Office tool - Access. When I started using databases, my first weapon of choice was Access.

Access has a very user-friendly graphical user interface that allows you to develop databases. You can drag and drop objects and build quick databases in no time at all. In my view, this is one of the most enabling applications I have ever used. It allowed me to learn about databases, relational models, macro functions, forms, queries, reports, visual basic, and many other concepts that I now take for granted. 

I owe my present career to MS Access.

But there is a point in your career when you have to 'step away from the access'.

Why? For a start, it has some serious limitations:
  • Access databases are limited to 2 GB in size. Try to put more than 2 GB of data into them, and they will become corrupt.
  • Access is prone to corruption.
  • Access does not handle multiple users updating the same record well.
  • It is hard to make an Access database secure.
  • It is hard to get Access to recognise different users correctly.
  • Access is not optimised to handle the bulk loading or querying of large data sets.
  • Error handling is not good.
But the real problem comes when you try to govern and control your data. Access is so easy to use that in a medium to large sized organisation there could be hundreds of unsupported databases being developed. Unless you are insisting on full documentation and consistent development standards, your operations are at the mercy of the Access developers. Access can also act as a front-end and connect to other databases via Microsoft's ODBC framework. This makes it a security risk, because it is a handy tool for anyone wanting to steal data.

Don't get me wrong... for an average start-up in a small business, Access is a little gem. Its suite of simple yet powerful tools is a great enabler. But for a large company, it brings too many operational and governance risks to be a serious prospect.

Monday, 13 August 2012

Addicted to Excel?

When I first started in a management information role, I was bombarded with multiple requests for single pieces of MI from all and sundry. Mostly, I would write the results as short reports. For a short time, I received a great deal of largely unwarranted scrutiny. Then I decided to change my approach and instead of pasting results into documents, I pasted the results into Excel spreadsheets. The scrutiny dropped rapidly.

Am I the only person to notice this apparent disbelief of everything that is not on Excel? I can't think this is an isolated part of business culture.

Which brings me to another Excel phenomenon - the 'spread mart'. Why build a business-critical data mart in a secure environment, with failover, disaster recovery and data quality feedback, when you can simply build a spreadsheet?

It constantly surprises me that business critical operations can be stored almost entirely in Excel. Are you addicted? Get a cure before it's too late!

Friday, 10 August 2012

Friday Dilbert

You may know by now, I love Dilbert. Before this showing, I will leave you this question...

What is data governance without ethics?

Have a great weekend everyone!

Tuesday, 7 August 2012

Joining data - an ethical question

As the Olympics draw to a close, the business of collating and analysing the data can begin in earnest. Take for instance the medal table. There have already been some interesting statistics emerging.

When you look at the normal medal table, ordered by medal count, you see the usuals at the top - China, USA etc. But when you cross-reference the medal volumes against each country's population size, you get a very different view. As a measure of medals per capita, New Zealand are top with Slovenia, Denmark and Australia close behind. This is because of their small population sizes.

The education establishment and development of sports will soon come under scrutiny, as recent data also shows that 30% of Great Britain's medals were won by people who attended a public school. This is not representative of GB, as public schools make up only 7% of the school population, and it implies that privileged children go on to be more successful in the Olympics.

As the Olympic data becomes available to more and more people, expect more insight to arise as this data gets joined to other proprietary data sets. Which brings me to the crux of my point.

When you share your information, what you don't know is what data sets are going to be joined to it. How will your data be extrapolated, and will that extrapolation be correct? What kind of business and personal decisions could be made that affect your future happiness, comfort and freedom?

So, while collecting data for one purpose may be perfectly ethical, joining it to an unrelated source to make unrelated assumptions may not be. 

Monday, 6 August 2012

5 ways to support your colleagues

Predictably, the number one cause of data problems is human driven typing errors and deviations from data standards. Although there is a lot of automation, people still type data into systems. Without the correct support, we all have the capacity to get things spectacularly wrong. 

We mis-key, we mis-spell audio commands, we fail to standardise data, we put the right data into the wrong fields and we hit the wrong option in multiple choice values. Here is my 5-point plan for giving everyone the support they need to get it right.

Field definitions - Make sure the design of the input fields restricts the keying options. If you need to put in a customer's title, restrict the field to a drop-down box of options. As much as possible, make sure that open text is not used. If they need to pick from a known customer list, give them a search option.

Training - Give all your colleagues information on how the data they manufacture is being consumed by other areas of the business. Raise their awareness. Arouse their curiosity. Win hearts and minds. It is so much more effective to get them to ask questions, rather than spoon-feeding them with empty directives.

Measurement - Measure the accuracy of your inputters and give rewards for the best and most improved. Avoid negative campaigns. Focus on what you do want, not what you don't.

Real-time validation - Modules can be added to real-time validate postal addresses, email addresses, telephone numbers and many other common data types. 

Give colleagues a voice - Listen to your colleagues. They may already have some inspiring ideas of their own about how to make things better. 
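The field definition and real-time validation points can be sketched in a few lines of Python. This is only an illustration: the list of titles is an assumption, and the email pattern is deliberately loose, checking the general shape of an address rather than whether it really exists.

```python
import re

ALLOWED_TITLES = ("Mr", "Mrs", "Ms", "Miss", "Dr")  # assumed contents of the drop-down box
EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose shape check only


def validate_entry(title, email):
    """Return a list of problems with a keyed entry; an empty list means it passed."""
    errors = []
    if title not in ALLOWED_TITLES:
        errors.append("title must be one of: " + ", ".join(ALLOWED_TITLES))
    if not EMAIL_SHAPE.match(email):
        errors.append("email address does not look valid")
    return errors


good = validate_entry("Mr", "a@example.com")
bad = validate_entry("Mister", "not-an-email")
print(good)
print(bad)
```

Catching the problem at the point of entry, while the colleague still has the customer in front of them, is far cheaper than remediating it later.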

Saturday, 4 August 2012

Data - the funny side

Don't you just love Dilbert? Here are some anecdotes punctuated by Dilbert:

How often do I check spreadsheets to discover that the year-to-date average has been calculated by averaging the monthly averages? "More often than I would like" is the answer.
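For anyone wondering why that is a problem, here is a small worked example in Python with made-up figures. When the months contain different numbers of records, the average of the monthly averages and the true year-to-date average can be a long way apart.

```python
# Hypothetical values: four records in January, one in February
monthly = {"Jan": [10, 10, 10, 10], "Feb": [40]}

# The spreadsheet shortcut: average each month, then average those averages
monthly_averages = [sum(v) / len(v) for v in monthly.values()]
average_of_averages = sum(monthly_averages) / len(monthly_averages)

# The true year-to-date average: every record counts once
all_values = [x for v in monthly.values() for x in v]
true_ytd_average = sum(all_values) / len(all_values)

print(average_of_averages)  # 25.0
print(true_ytd_average)     # 16.0
```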

The employee's-eye view of contractors? Surely not?

My mother always wanted me to be an actor....

Enjoy your weekend !

Wednesday, 1 August 2012

Bad data, bad health?

It has been long known that people with mental health issues have a shorter life expectancy than their healthier counterparts. But until now, it has not been understood just how far this extends. A recent health study has produced some startling results. People with mild mental health problems such as anxiety, depression and stress are 16% more likely to die prematurely.

There are many factors that contribute towards anxiety and stress - home life, status, location, self esteem etc. One of the major factors has to be where and how you work. You spend 8 hrs plus per day there, so it plays a significant role in your life. Most people will admit that they spend more time with their colleagues than their own families.

In my many years in business, I have found that one of the key drivers of stress is poor data quality that arises from high organisational entropy. Contrary to popular belief, your data quality section is not interested in apportioning blame. We are aware that the high entropy of the organisation is the real culprit. We work on all levels to solve problems, and you will rarely be personally scrutinised. If poor data quality is getting you down, you owe it to yourself, for the sake of your health, to raise your problems with colleagues who can help. Don't sit and suffer. Contact your data quality colleagues today.

Tuesday, 31 July 2012

When is a problem not a problem?

One of your colleagues comes to you with a problem. There is a field in one of the marts that is not being populated. The data is being collected in your production database, it is just not being passed into your mart.

An analysis of your mart produces interesting results. The field has never had the data added. This is where we get into the world of the defect vs feature.

I have heard the term 'feature' politely used for something that ought to work, but was either not tested, overlooked at testing or the problem was discovered so late that a decision was made to implement without it. There is significantly less desire within your IS department to correct a 'feature' than a true deviation from agreed process ('if-it-ain't-broke-don't-fix-it'). The longer the time span between implementation and discovery of the feature, the more reluctant your colleagues will become to fix it.

It is very easy to become cynical when faced by vigorous challenges from your technical areas around remediation of features. You may have tried to fix the problem before, but with the advent of new disciplines like data quality, there is much more support available for getting these problems remediated.

So the answer is, a problem is always a problem - no matter how Orwellian the description is. Some may choose to reframe the issue, but the truth is that it never worked properly. The 'feature' is still a problem. Screw your courage to the sticking place. Assert yourself. Engage your data quality section. Get it fixed.

Monday, 30 July 2012

4 fundamental questions for business intelligence

When setting up a business intelligence project, most people tend to start with systems, people and processes. It is very common to consider the 'hows' before the 'whys'. Taking a step back and asking some more fundamental questions may be more beneficial.

Any piece of information can be captured. Is it financially feasible to do so? Compare the costs to the anticipated benefits of acquiring the information. What people generally miss from a good cost benefit analysis is the cost of acquiring the money. This will depend on how your organisation is funded. If you have shareholders, the cost of acquiring money (dividend payments) is probably far higher than borrowing from a bank (interest rates). When you factor in financial acquisition costs, your net benefit may be far less than you thought. As new technologies become established, their cost comes down. Could you wait until this happens or are you losing a potential opportunity to gain an advantage over competitors?

Is it ethically appropriate to collect the information? Do your customers or colleagues know you are capturing this information? Do they have a right to know? If they found out, what would their reaction be? What are the rules about this information? What are the regulatory obligations and constraints? 

Who will own this new data? Ownership and accountability should be ascertained right at project inception. I guarantee that as soon as something has been built without an accountable owner agreed beforehand, your colleagues will head for the hills. If they can get something built without being held accountable, then they will. Without an accountable business owner, it becomes so much harder to get cooperation from the rest of the organisation when the data requires remediation. 

Who should have access to the data? This is like accountability, only the other way round. A new report or data set becomes available and EVERYONE wants access. Do they really need it? Is the information politically sensitive? Could the performance of other colleagues be derived from this information? What could be the repercussions of general circulation? How valuable would the insight be to an external company? How vulnerable is this data to theft?

Consider carefully before you start. There are inherent risks - both financial and human - to collecting new data that need to be carefully considered.

Saturday, 28 July 2012

Security considerations for data quality tooling

One of the more unexpected places you will experience resistance to implementing a data quality toolset could be from your own internal IT department. If they take security and system performance seriously, they will want to know all about your new software and how it interacts with all the databases and marts.

All data quality tooling uses ODBC or JDBC connections to access your enterprise data sources. Each different connection will require a userID and password. If you are planning to implement a desktop only solution, prepare for a long fight. Desktop versions of data quality systems rely upon your PC having these connections set up and the passwords embedded. This could cause problems if other people gain access to your desktop. Some will be able to use your connections with other programs, like MS Access, to query your data and save it anywhere. Even if the ODBC/JDBC connections are not accessible outside the data quality suite of programs, the embedded ETL capabilities of the data quality software may pose a security risk that may prove a step too far for some system administrators.

The solution is to implement a server version of your data quality software. This involves installing your software on a central server. The ODBC/JDBC connections are similarly centralised. Users then have a desktop program that interfaces with the server, and cannot access it until a password is keyed in. This is far more secure, but will effectively treble the set-up costs, especially if you build a failover solution. A failover option may be mandatory if you work in a highly governed business like pharma or banking.

In this information age, the value of data has never been higher. With all current crime trends pointing to internal colleagues stealing data and selling on to other companies as being the biggest data security threat, it is important that the capability of data quality tooling is not perceived as too great a risk. All of these risks can be successfully mitigated with the correct infrastructure implemented and governance controls around appropriate access to data. 

Thursday, 26 July 2012

Trust, a bridge to better quality

Remember my post on organisational entropy? One of the key secrets to lowering the entropy with everyone in your organisation is the building of trust. You can tell when trust is lost with any individual, because every interaction becomes almost impossible. Working in the quality and governance space is a highly sensitive dynamic, for who will trust you with their business problems if they believe you are not trustworthy?

You can take the role of a critical friend; you can become a master of body language and unspoken communication; you can even learn NLP. But these are all superfluous without the core principles to building trust. Here is my take on building trust:

Deliver to your promise
Say what you are going to do, when you will do it, and deliver. When you can consistently do this, you are well on the way to building trust. Broken promises damage trust. If you find that you cannot deliver on something, go immediately to your customer, apologise and let them know what you can do. The earlier you can tell your customer about a potential problem, the better they can manage expectations elsewhere.

Keep things private and personal, appropriately
Nothing destroys trust more than if you blurt out gossip that is told to you in confidence. This is particularly damaging if you talk about one department's troubles to another one. Word gets around that you can't be discreet, and that can cause problems. However, this is not the same as keeping secrets from colleagues who should be informed. If someone tells you of criminal activities, report them to the appropriate officials, and not to your friends on coffee break.

Delegate to educate
When you delegate something, you are also telling your colleague that you trust them. But don't just drop it and run. Use it as an opportunity to coach them in your area of expertise. Share the decision making that you would make if you were doing it yourself. Build that rapport.

Deliver together
If you have worked on something with someone, present the results together. Make sure they know they are being recognised as a key contributor. 

Take responsibility appropriately 
Don't blame others when things go wrong. At the same time, don't accept blame when it is not your fault. 

When dealing with data quality or governance issues, you are in a position of trust. Therefore, be worthy of that trust. It is one of the most important things you can build with your colleagues. Be consistent, be effective, be reliable and fair. The rewards are great, as Emerson said...... 
"Trust men and they will be true to you; treat them greatly and they will show themselves great."