Saturday, 24 November 2012

Just how wrong can you get?

The specification statement was clear, "Dogs must be kept on a lead". The project analysts got together to break the request down. The statement did not say how many dogs must be kept on a lead. But after much consideration, the scope was defined as 'Any dogs within the area upon which the sign was situated.'

Everyone agreed that the 'Must' category was the best part of the statement..... (obviously a MoSCoW requirement.) The word 'kept' caused much anguish amongst the project analysts. What did they mean by kept? The dictionary was not much help....

kept - (especially of promises or contracts) not violated or disregarded; "unbroken promises"; "promises kept"
1. the past tense and past participle of keep
kept woman Censorious a woman maintained by a man as his mistress

Much thought was poured into the interpretation. Did 'kept' imply that the dogs were to be groomed or fed while sat on the surface of a lead? They could not agree. One contractor finally lost his temper and said, "For crying out loud, enough of this intellectual onanism... can't you take anything literally? It's obvious. If you can read this sign, and you have a dog, it must be kept attached to a lead."

The next day, he lost his job for having an attitude.

And an additional handbook was written about how to hold the lead and who should be holding it. It stated clear guidelines as to the heaviest dog you could possibly hold, plus a leaflet about your dog's health, the potential risks to the walker and the effects of chafing on the hands.

The analysts, having "questioned to the void" long ago, had decided to pass the specification to their technical department, unchanged:

"Dogs must be kept on a lead"

The technical department sniffed at the statement. How could their colleagues in the business be so ambiguous? They found out where the sign was designed to go and surveyed the park capacity and neighbouring population. They decided there was an average at any given time of 12 dogs within the park, with a maximum capacity of 32 at peak times.

So taking this maximum into account, they constructed a huge lead with 40 collars attached.

As a RACI matrix wasn't signed off, no-one is accountable for actually holding the lead.

Saturday, 17 November 2012

Breaking through the Cassandra Complex

So we all know about the Cassandra Complex from a previous post? Take time to read it. Done that? Good! So here is part two - getting your colleagues to believe in what you are saying, and act on your requests.

When you work in a governance, compliance or data quality role within an organisation, it can be very hard to persuade people that your proposals are correct. One of the biggest mistakes I have seen is that professionals push their own agendas without taking time to understand the people they are trying to convince. This brings me neatly to my first step:

1. Conduct an opinion survey
Find out exactly what your colleagues are thinking throughout the organisation. The ideal way is to construct an anonymous opinion survey where they are just asked for department, area and role. Here is a great site that helps you construct online surveys. You have probably seen the kind of questionnaire. You list common opinions (both good and bad), and they fill in whether they agree or disagree with them. The results of this survey will be the basis for your communications strategy, but one of the amazing things about surveys is that they are also great tools to get people to evaluate their beliefs.

2. Prioritise the biggest areas of concern
Perhaps you discover that your colleagues don't believe they can make a difference. Perhaps they don't understand how important their role is. They may not care about the consequences of getting things wrong. They may even be too afraid of reprisals to talk about problems they are having. Find your top problems for each department.

Now for each high priority damaging opinion, you need to build a communications strategy that does the following:

3. Create uncertainty about the damaging opinions
The first thing you should do is to make them feel less certain about these opinions. Refer to occasions where their opinion would not have worked. Quote the relevant statistics that contradict them. Ask them whether this opinion serves them and your customers well enough.

4. Reduce their resistance to the opinions you want them to have
This is where you start to introduce why your opinion is better for them, your customers and the company as a whole. 

5. Amplify your new attitude
Reframing opinions brings out changes in attitude. Once a change of attitude happens, it is important that you don't just stop there. You have to paint a vivid and inspiring picture of just how great things are going to be for everyone involved. This is often referred to as "encouraging the heart".

6. Test your results
Conduct another survey after your comms has been implemented. Compare it against the previous one. What has changed? Has anything improved? What are the lessons learned?
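If you want to put numbers on "what has changed?", here is a minimal sketch in Python. It assumes you have, for each statement, the number of people who agreed and the number who responded, both before and after your comms. The statements and figures below are made up purely for illustration.

```python
def agreement_shift(before, after):
    """Return the change, in percentage points, of agreement with each statement.

    `before` and `after` map a statement to (number agreeing, number of responses).
    """
    shifts = {}
    for statement, (agree_before, n_before) in before.items():
        agree_after, n_after = after[statement]
        change = agree_after / n_after - agree_before / n_before
        shifts[statement] = round(100 * change, 1)
    return shifts


# Hypothetical survey results, not real data.
before = {
    "I can't make a difference": (30, 50),
    "My role matters to data quality": (20, 50),
}
after = {
    "I can't make a difference": (18, 45),
    "My role matters to data quality": (27, 45),
}

for statement, shift in agreement_shift(before, after).items():
    print(f"{statement}: {shift:+.1f} points")
```

Comparing shares rather than raw counts matters here, because the second survey rarely gets the same number of responses as the first.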

Remember, just conducting a survey will greatly change the attitudes of your colleagues. It will also show that you value their opinion, which can be one of the barriers. Also, being able to profile the beliefs and values of a department before you meet with them will give you plenty of help in understanding the problems they have on a day-to-day basis.

Once you understand your colleagues' opinions and beliefs, you can change their attitude and gain the behaviour and co-operation that you need.

Wednesday, 14 November 2012

Are your Excel formulae correct?

Formulae can save you a lot of time. Excel employs some fantastic formulae that can really help you analyse data. But what can go wrong?

Well, for 2012, there will be 53 weeks if you use the WEEKNUM formula. It is a calendar quirk that may throw any calculations you may have this year. Look out for it. Here are some of the more common formulae problems I see in Excel spreadsheets:

1.  Not coding for zero values

Dividing one field by another is simple enough:

=A1/A2

But what if one of your values could possibly be zero? Excel does not like dividing by zero and you will end up with an error. So put in a condition. Perhaps like this:

=IF(A2 = 0,0,(A1/A2))
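The same guard applies anywhere you divide, not just in Excel. Here is a minimal sketch of the idea in Python; the function name `safe_ratio` and the `default` parameter are my own, chosen for illustration.

```python
def safe_ratio(numerator, denominator, default=0.0):
    """Return numerator / denominator, or `default` when the denominator is zero."""
    if denominator == 0:
        # Mirrors the IF(A2 = 0, 0, ...) branch of the Excel formula.
        return default
    return numerator / denominator


print(safe_ratio(10, 4))
print(safe_ratio(10, 0))
```

Whether zero is the right fallback depends on what the figure is used for. If a zero could be mistaken for a real result, a blank or an explicit flag may be the safer choice.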

2.  Circular references

You have two fields A1:=B1+5 and B1:=A1-5.

A1 requires B1 to be calculated, but B1 requires A1 to be calculated. Don't do it. Excel doesn't like it.
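To see why this is a problem, think of each formula as a list of the cells it reads from. The sketch below, in Python, walks those dependencies and reports the first loop it finds. It is only an illustration of the idea, not how Excel itself detects circular references, and the function name `find_cycle` is my own.

```python
def find_cycle(dependencies):
    """Return one circular chain of cell names, or None if there is none.

    `dependencies` maps each cell to the cells its formula reads from.
    """
    visiting, done = set(), set()

    def visit(cell, path):
        if cell in done:
            return None
        if cell in visiting:
            # We have come back to a cell that is still being worked out.
            return path[path.index(cell):] + [cell]
        visiting.add(cell)
        for precedent in dependencies.get(cell, []):
            cycle = visit(precedent, path + [cell])
            if cycle:
                return cycle
        visiting.discard(cell)
        done.add(cell)
        return None

    for cell in dependencies:
        cycle = visit(cell, [])
        if cycle:
            return cycle
    return None


# The example from the text: A1 = B1 + 5 and B1 = A1 - 5.
print(find_cycle({"A1": ["B1"], "B1": ["A1"]}))
# A chain with no loop: A1 reads B1, B1 reads C1, C1 is a plain value.
print(find_cycle({"A1": ["B1"], "B1": ["C1"], "C1": []}))
```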

3.  Hard-coding values into your formulae

Write your formulae as clearly as possible. Do not put values into your formulae. Have your formulae reference values in other fields.

Wrong:        = A1*120

Right:          = A1*B1       (nb populate the field B1 with the value 120)

4.  Not including balance checks

When you have totalled your columns of data, it really is worth putting some balance checking into your sheet. This gives you some indication that your formulae have worked, and that there are no missing values or blank spaces where you haven't dragged your formulae down properly.
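As a rough illustration of what a balance check does, here is a small Python sketch. It recomputes a total from the rows, compares it with the total the sheet reports, and lists any blank rows. The row labels, figures and the `tolerance` parameter are assumptions made up for the example.

```python
def balance_check(rows, reported_total, tolerance=0.005):
    """Compare a recomputed total with the reported one and list blank cells.

    `rows` is a list of (label, value) pairs; a value of None stands for a blank cell.
    """
    blanks = [label for label, value in rows if value is None]
    recomputed = sum(value for _, value in rows if value is not None)
    return {
        "recomputed_total": recomputed,
        "balanced": abs(recomputed - reported_total) <= tolerance,
        "blank_rows": blanks,
    }


# Hypothetical sheet: the East row was never filled in, so the totals disagree.
rows = [("North", 120.0), ("South", 80.5), ("East", None), ("West", 42.0)]
result = balance_check(rows, reported_total=292.5)
print(result)
```

The small tolerance is there because totals built from decimals rarely match to the last digit, even when nothing is wrong.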

Excel is a great tool for working things out. I'm sure you have come across more common errors. Leave your comments in the place below.

Sunday, 4 November 2012

Is your business culture like a Greek tragedy?

Sounds a bit dramatic? It could well be. But the phenomenon I am referring to is the tale of Cassandra. In Greek mythology, Cassandra was a beautiful woman who refused the romantic attentions of the god Apollo. He had his revenge by giving her the gift of being able to see the future but, in a cruel twist, cursed her so that no-one would believe her. She was said to have been ignored when she predicted the fall of the ancient city of Troy.

Sadly, this is a real phenomenon. There are people who have been totally ignored by their colleagues and peers. Nouriel Roubini was the economist who accurately predicted the collapse of the housing market and the worldwide recession. The New York Times famously labelled him, "Doctor Doom", and he was ignored when he addressed the IMF about his concerns. 

Environmental campaigners have been predicting global warming, melting ice caps, rising sea levels and crazy weather since the 1970s, when the phenomenon was labelled the greenhouse effect. To this day, they are still being ignored, despite the rising tide of scientific evidence that it is happening. 

When someone makes wild claims to be able to see the future, and their predictions are excessively negative and not based on any sound judgement, they are often referred to as having a "Cassandra Complex".

Operating in the space of data management, data governance and data quality, you could very easily be labelled as "Cassandra". You will be telling people that the easiest way is not the best. You should be warning them of unacceptable risk taking. You should be encouraging best practices. You will be insisting on higher standards of technical development. This is not the kind of thing that most people want to hear.

How do you avoid being labelled the tinfoil-hatted doom merchant in the corner? Stay tuned. My next blog article will tell you how.

Thursday, 25 October 2012

How Scary Is Your Data?

Soon it will be Halloween. It's a time of ghosts, ghouls and demons. But all of that pales into insignificance when compared to the truly terrifying reality of kids running around the streets pumped up on chocolate, sugar and energy drinks!!!! And to celebrate the witching hour, here is my list of Halloween data horrors... Don't say I didn't warn you... Mouhahahahaaaaaa!!!!

Undead data
This is the ancient data that you did not kill off. It served its purpose years ago, and you archived it, but did not delete it. It lies in its crypt waiting... waiting... for the sun to set. If your regulators find out, it's you who will get it in the neck.

Alien data
It comes from another world (cue 50s B-movie music)... namely that company you have outsourced your data collection to. But you forgot to include data quality and governance standards in the agreement. And now you have data that is taking up all your resources trying to make sense of it. No-one can agree on the results and your whole organisation is paralysed.

Frankenstein data
They wanted to know diabolical things about your organisation, and they didn't care about how you did it. You could not find any documentation on your data sources, and they would not pay for a profiling tool. So you bolted and stitched huge amounts of unrelated data together to create an abomination. Deep into the night, you worked feverishly until finally you hysterically cried, "It lives, it lives".... All were amazed how you could breathe life into dead data and you reaped rewards. But deep down, you know it's only a matter of time before it either comes apart or brings your whole organisation crashing down around you.

Zombie systems
Those legacy systems died years ago. But someone keeps digging them up and re-animating them. Whoever did it, they certainly seem to have lost their BRRAAAIIIINS!!!

Godzilla data
No-one knows how or why they asked for it, but now it's here, and it's just too big. The scale is massive. All your IT staff run away screaming while it crushes servers and tangles networks. This big, 'Godzilla' data is requiring some other monster called 'Hadoop' to sort it out. They were last seen fighting off the coast of Java.

I hope you enjoyed my tales of data horror. Sleep well, now.. Pleasant dreams.... Mouhahahaaaa!!

Tuesday, 23 October 2012

Data Quality Failure - Apple Style

When Apple launched the iPhone 5, much was made of the new features of iOS 6. One of these was the new maps application. This was lauded as "A beautiful vector based interface" and "Everything's easy to read and you won't get lost".

Although the application functioned well, the data it used was far from effective. Contrary to the hype, people started to 'get lost'. One thing is patently clear. Apple had not conducted any data quality analysis of the databases that the maps application consumes. 

All databases are models of reality. The discipline of data quality is to ensure that the database is the best model possible. It is obvious that the maps database was not checked against reality to ascertain whether it was an accurate or complete model.

An independent analysis of a sample of the Apple Maps (using the Canadian province of Ontario) provided some interesting stats. Of the 2028 place names in Ontario, 400 were correct, 389 were close to correct,  551 were completely incorrect, and 688 were missing. 

Apple did not gather this information. It acquired the street and place data from Tom-Tom (the vehicle satellite navigation company) and integrated it with other databases. Despite strenuous denials of culpability by Tom-Tom, the facts show that the location data experienced by the users was missing or incorrect. 

To say that this has undermined the reputation of Apple is a large understatement. It prompted a public apology by the CEO, Tim Cook. 

So could Tom-Tom and the other suppliers of maps data have knowingly supplied incorrect data to Apple? Probably not. Surely Apple had data quality measurement in place? The results suggest not: only 19.7% of place names were accurate and 33.9% were missing. 
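For anyone who wants to check the arithmetic, the percentages follow directly from the Ontario counts quoted above. A quick Python sketch, using only those four figures:

```python
# Place-name counts from the Ontario sample quoted in this post.
counts = {
    "correct": 400,
    "close to correct": 389,
    "completely incorrect": 551,
    "missing": 688,
}

total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

print(f"Total place names: {total}")
for label, share in shares.items():
    print(f"{label}: {share}%")
```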

When entering into agreements with 3rd party suppliers of information it is imperative that data quality standards are insisted on as part of the commercial agreement - with penalties for non-compliance. As the results of this little mess between Apple and their suppliers show, you may be able to outsource responsibility, but not accountability.





Thursday, 11 October 2012

5 Steps to Choosing the Right Data

You have a project, and you need data. So you go to your metadata dictionary and search for a data source, and you discover that there are several sources that you could possibly choose. Perhaps you have multiple measures and you need to know which ones to retire. How do you make the right choice? These are my 5 steps to choosing the best data source for your project.

1.  Classify and develop your objectives:
List all of your requirements: the data fields you need, reporting frequency, timings, transaction types, granularity etc. Make sure each one is classified as either a 'must have' or a 'want'. When you have a full list, give each 'want' a weighting score - highest value being most important.  

2.  Profile the data sources.
Build and run profiles of the data in each data source. Examine the field types, volumes, dates, times, transaction types and granularity. Profiling any creation timestamps will give you an idea of the scheduling that runs on the data. 

3.  Match the attributes and profile results of the data sources against the objectives.
Based on the profiling, how well does each source satisfy the objectives? Consider the timeliness of update and batch windows. Do they match the schedule in your objectives? Are the data sources structurally compatible with your requirements? Does each data source provide the correct level of granularity? If any of the 'must haves' are not met for a data source, reject it outright. For all the other options, total the score based on how well they achieve the 'wants'.

4.  Identify the risks
Take your two highest scorers and ask yourself the following questions about them:
  • What future threats should we consider?
  • If we choose the data source, what could go wrong?
  • Is our understanding of this data source good enough?
  • What are the capacity/system constraints?

5.  Choose your preferred data source:
Are you willing to accept the risks carried by the best performer in order to attain the objectives?
                 
          If yes, choose it.
          If no, consider the next best performer and ask again.

So there you have it, a rigorous approach to choosing the best data source. How much detail you go to will depend on the rigour that is required for your industry sector.