Wednesday, 29 August 2012


Governance or quality

The wonderful quandary - 'Which came first, the chicken or the egg?' - has been with us since time immemorial. One can ask the same question about so many aspects of life, including, perhaps, the two data management disciplines of governance and quality.

In modern organisations, very often you find quality audit functions appearing fairly quickly, particularly where there are manufacturing standards to uphold. In traditional manufacturing, it is easier to trace problems to specific areas and individuals to fix.

In the sphere of data management, delineation of responsibility and accountability can be a key issue, particularly when so many processes are scheduled and have been running for a long time. When systems mature, companies usually decide to put together specialist data quality initiatives. But when the data quality team discovers problems, securing resources and funds to fix them can be particularly difficult without the appropriately allocated responsible and accountable data owners.

So in this particular chicken-and-egg race, data quality often comes first. But to be truly effective, data governance should commence first, because without governance to enforce accountability and responsibility, data quality initiatives can fall on deaf ears.

Monday, 27 August 2012

The right stuff

Today I celebrate one of my lifelong heroes. On July the 20th, 1969, as I was being born in a hospital in Macclesfield, Neil Armstrong took that iconic first step from the Eagle module onto the dusty surface of the moon.

When people consider the dangers of space travel, they like to think about the cold vacuum of space, or the radiation, or meteorites. It's a pretty dangerous place to be strapped to an overgrown firework! 

Add to that the fact that they went there in equipment with far less processing power than an iPhone, and you get some idea of the risks these men took.

The Apollo space missions had a computer system imaginatively called the 'Apollo Guidance Computer' (AGC). It was revolutionary at the time: a 16-bit machine running at around 1 MHz (an iPhone runs at 800 MHz). Hobbyists now build replicas in their basements for fun.

During the moon landing phase of the Apollo 11 mission, there was great concern because the rendezvous radar had been left on. When the crew went in to land, they switched the landing radar on too, and both radars feeding data at the same time overloaded the AGC. It was very fortunate that Neil Armstrong took manual control and landed by sight, four miles from the planned spot, with only around 20 seconds of fuel left.

The simplicity of the AGC also made it extremely complicated to fix. While orbiting the moon and preparing to land, the Apollo 14 crew noticed that the abort sequence was being triggered without anyone pressing the abort switch. Engineers on the ground worked out a 'patch', and the crew had to key the changes into the computer by hand, with little time to spare, before they could land.

So when you feel like screaming because your laptop won't connect to the internet, or your report is late, just think about Neil Armstrong and his Apollo 11 crew - and their overloaded radar system during the final descent to the moon, alone in space, 238,000 miles from Earth... and ask yourself this question. Did Neil lose his temper and blame his crew? No, he kept his cool and focused on recovery.

Safe journey home, Neil. The world will miss one of its most enigmatic pioneers.

Friday, 24 August 2012

5 Steps to Credibility in Business Intelligence

A business intelligence department lives on its credibility. Yet that same credibility can be undermined very quickly when the rest of the organisation is not engaged.

Business intelligence is often seen as a bit of a 'black box' by other parts of the organisation, which can lead to misunderstandings, and those misunderstandings lead to credibility issues. Before long, you have a long list of unwarranted queries about your reports. Credibility takes time to win and can be lost very quickly.

Here are my 5 steps to building business intelligence credibility.

1.  Definitions, definitions, definitions...

Write thorough, unambiguous business definitions with your customer and sign them off before developing. This ensures that the customer knows exactly what they are getting, the developers have a more focused set of requirements, and expectations are managed. Involving customers in decision making and making the development process more visible are sure ways of building credibility. Establish a rule that all definitions must be completed and signed off before development begins. Don't break that rule.

2.  Visibility of lineage...

Publish your business definitions in a solution that integrates them with a metadata dictionary, so that there is complete visibility of the data lineage from user keystrokes to report. Make sure everyone knows about it. Visibility of data lineage informs people of the implications of their actions.

3.  Profile your source data before you start building

Profile the data that you are reading in. Take the results to your customer and discuss any finer points about the measure. Make it known that your area is merely reading data that other areas are manufacturing (lineage). Communicate any problems you encounter with this data and engage your Data Quality department/team. 
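
A minimal sketch of the kind of column profiling meant here, in Python; the sample title values are made up for illustration:

```python
from collections import Counter

def profile_column(values):
    """Return simple profile statistics for one column of string values."""
    total = len(values)
    blanks = sum(1 for v in values if not v.strip())   # completeness check
    top = Counter(values).most_common(3)               # frequency check
    return {
        "rows": total,
        "blank_rate": blanks / total if total else 0.0,
        "distinct": len(set(values)),
        "top_values": top,
    }

# Illustrative source data: note the blank and the non-standard 'mr'.
titles = ["Mr", "Mrs", "", "Dr", "Mr", "mr"]
print(profile_column(titles))
```

Results like these - blank rates, distinct counts, unexpected variants - are exactly the "finer points" worth walking through with the customer before building anything.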

4.  Automated data quality measures

Arrange for your BI processes to be regularly profiled and have a data quality scorecard running on the same schedule as the finished report.
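
As a sketch of what such a scorecard might compute, the hypothetical rules below score each field as the percentage of records passing a validity check; the patterns are simplified illustrations, not production-grade rules:

```python
import re

# Hypothetical rules: each maps a field name to a validity check.
RULES = {
    "postcode": lambda v: bool(re.fullmatch(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}", v)),
    "email": lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v)),
}

def scorecard(records):
    """Percentage of records passing each rule - one row per quality measure."""
    scores = {}
    for field, check in RULES.items():
        values = [r.get(field, "") for r in records]
        passed = sum(1 for v in values if check(v))
        scores[field] = round(100.0 * passed / len(values), 1)
    return scores

# Illustrative records only.
records = [
    {"postcode": "SK10 2AA", "email": "jo@example.com"},
    {"postcode": "bad", "email": "not-an-email"},
]
print(scorecard(records))  # {'postcode': 50.0, 'email': 50.0}
```

Run on the same schedule as the finished report, scores like these give the reader the data quality context alongside the numbers.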

5.  Visibility of testing

Involve your customer in user-acceptance-testing activities, even if it's just signing off the approach and the final report. This will give them visibility of the whole project life cycle.

When customers are aware of the definitions, development and data lineage, they understand that you are building the very best report you can. Giving them visibility of any data quality scorecard will highlight the steps you are going through to mitigate the risk that other areas pose to the accuracy of your report.

These are just some of the practical ways that good data governance and quality initiatives can improve the credibility of a business intelligence department.

Thursday, 23 August 2012

Humans are the exception

Even the best checking algorithms can make mistakes when dealing with a language that is as varied and flexible as ours.

The problem with language context is most commonly experienced when using predictive text messaging. How often have you sent a message and the phone has guessed the wrong word? Perhaps it's not as crazy as the one in the picture, but it happens often enough.

Speech recognition is far from mature, and people still have problems using their natural spoken language with a computer. Apple's Siri notoriously struggles to understand anyone with a distinctive accent.

So when looking for the cause of problems, the most probable systemic failure area is where there is a human to system interface. Perhaps one of the greatest challenges to the developers of future computer systems is to make them understand us better. Perhaps this will happen when we finally understand ourselves.

Tuesday, 14 August 2012

Access Excess?

Fast upon the heels of my article "Addicted to Excel", I now scrutinise another Microsoft Office tool - Access. When I started using databases, my first weapon of choice was Access.

Access has a very user-friendly graphical user interface that allows you to develop databases. You can drag and drop objects and build quick databases in no time at all. In my view, this is one of the most enabling applications I have ever used. It allowed me to learn about databases, relational models, macro functions, forms, queries, reports, visual basic, and many other concepts that I now take for granted. 

I owe my present career to MS Access.

But there comes a point in your career when you have to 'step away from the Access'.

Why? For a start, it has some serious limitations:
  • Access databases are limited to 2 GB in size. Try to put more than 2 GB of data into one, and it will become corrupt.
  • Access is prone to corruption.
  • Access does not handle multiple users updating the same record well.
  • It is hard to make an Access database secure.
  • It is hard to get Access to recognise different users correctly.
  • Access is not optimised for bulk loading or querying large data sets.
  • Error handling is poor.
But the real problem comes when you try to govern and control your data. Access is so easy to use that in a medium to large organisation there could be hundreds of unsupported databases in development. Unless you insist on full documentation and consistent development standards, your operations are at the mercy of the Access developers. Access can also act as a front end, connecting to other databases via Microsoft's ODBC framework, which makes it a security risk: a convenient tool for anyone wanting to steal data.

Don't get me wrong... for a start-up or a small business, Access is a little gem. Its suite of simple yet powerful tools is a great enabler. But for a large company, it brings too many operational and governance risks to be a serious prospect.

Monday, 13 August 2012

Addicted to Excel?

When I first started in a management information role, I was bombarded with multiple requests for single pieces of MI from all and sundry. Mostly, I would write the results as short reports. For a short time, I received a great deal of largely unwarranted scrutiny. Then I decided to change my approach: instead of pasting results into documents, I pasted them into Excel spreadsheets. The scrutiny dropped rapidly.

Am I the only person to notice this apparent disbelief of everything that is not in Excel? I can't believe this is an isolated part of business culture.

Which brings me to another Excel phenomenon - the 'spread mart'. Why build a business-critical data mart in a secure environment, with failover, disaster recovery and data quality feedback, when you can simply build a spreadsheet?

It constantly surprises me that business-critical operations can be run almost entirely out of Excel. Are you addicted? Get a cure before it's too late!

Friday, 10 August 2012

Friday Dilbert

You may know by now that I love Dilbert. Before this showing, I will leave you with this question...

What is data governance without ethics?

Have a great weekend everyone!

Tuesday, 7 August 2012

Joining data - an ethical question

As the Olympics draw to a close, the business of collating and analysing the data can begin in earnest. Take, for instance, the medal table. Some interesting statistics have already emerged.

When you look at the normal medal table, ordered by medal count, you see the usual names at the top - China, the USA and so on. But when you cross-reference the medal volumes against each country's population size, you get a very different view. Measured as medals per capita, New Zealand is top, with Slovenia, Denmark and Australia close behind, because of their small populations.
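
The per-capita view is a simple re-weighting. A sketch in Python, using approximate figures for illustration:

```python
# Approximate, illustrative figures: total medals and population.
countries = {
    "USA":         {"medals": 104, "population": 314_000_000},
    "China":       {"medals": 88,  "population": 1_350_000_000},
    "New Zealand": {"medals": 13,  "population": 4_400_000},
}

# Medals per million people: the cross-referenced view described above.
per_capita = {
    name: c["medals"] / (c["population"] / 1_000_000)
    for name, c in countries.items()
}

for name, rate in sorted(per_capita.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate:.2f} medals per million people")
```

Even with rough numbers, the small country leaps to the top of the table - the same data, joined to population, telling a different story.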

The education establishment and the development of sport will soon come under scrutiny, as recent data also shows that 30% of Great Britain's medals were won by people who attended a public school. This is not representative of GB, as public schools account for only 7% of the school population, and it implies that privileged children go on to be more successful in the Olympics.

As the Olympic data becomes available to more and more people, expect more insight to arise as it gets joined to other proprietary data sets. Which brings me to the crux of my point.

When you share your information, what you don't know is what data sets are going to be joined to it. How will your data be extrapolated, and will that extrapolation be correct? What kind of business and personal decisions could be made that affect your future happiness, comfort and freedom?

So, while collecting data for one purpose may be perfectly ethical, joining it to an unrelated source to make unrelated assumptions may not be. 

Monday, 6 August 2012

5 ways to support your colleagues

Predictably, the number one cause of data problems is human error - typing mistakes and deviations from data standards. Although there is a lot of automation, people still type data into systems. Without the correct support, we all have the capacity to get things spectacularly wrong.

We mis-key, we mis-speak audio commands, we fail to standardise data, we put the right data into the wrong fields and we pick the wrong option from multiple-choice values. Here is my 5-point plan for giving everyone the support they need to get it right.

Field definitions - Make sure the design of the input fields restricts the keying options. If you need to capture a customer's title, restrict the field to a drop-down list of options. As far as possible, make sure open text is not used. If users need to pick from a known customer list, give them a search option.
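
A minimal sketch of restricting a field to a known list rather than open text; the allowed titles are an illustrative set:

```python
# Illustrative drop-down list: only these values may be keyed.
ALLOWED_TITLES = {"Mr", "Mrs", "Ms", "Miss", "Dr", "Prof"}

def set_title(record, title):
    """Reject anything outside the agreed list of titles."""
    if title not in ALLOWED_TITLES:
        raise ValueError(f"Unknown title {title!r}; pick from {sorted(ALLOWED_TITLES)}")
    record["title"] = title
    return record

customer = set_title({}, "Dr")   # accepted
# set_title({}, "Doctor")        # would raise ValueError
```

Pushing the restriction into the field itself means the inputter physically cannot create the 'mr'/'Mr.'/'MR' variants that plague open text.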

Training - Give all your colleagues information on how the data they manufacture is being consumed by other areas of the business. Raise their awareness. Arouse their curiosity. Win hearts and minds. It is so much more effective to get them to ask questions, rather than spoon-feeding them with empty directives.

Measurement - Measure the accuracy of your inputters and give rewards for the best and most improved. Avoid negative campaigns. Focus on what you do want, not what you don't.

Real-time validation - Modules can be added to real-time validate postal addresses, email addresses, telephone numbers and many other common data types. 
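
A sketch of what such a validation module might look like; the patterns below are simplified illustrations, not production-grade rules:

```python
import re

# Simplified, illustrative patterns for real-time checks at the point of entry.
VALIDATORS = {
    "email":    re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "uk_phone": re.compile(r"0\d{9,10}"),
}

def validate(field, value):
    """Return (ok, message) so the UI can flag problems as the user types."""
    if VALIDATORS[field].fullmatch(value):
        return True, "OK"
    return False, f"{value!r} does not look like a valid {field}"

print(validate("email", "jo@example.com"))
print(validate("uk_phone", "01625 12345"))  # space not allowed in this sketch
```

The point is the feedback loop: the inputter is told about the problem while the customer is still on the phone, not weeks later in a report.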

Give colleagues a voice - Listen to your colleagues. They may already have some inspiring ideas of their own about how to make things better. 

Saturday, 4 August 2012

Data - the funny side

Don't you just love Dilbert? Here are some anecdotes punctuated by Dilbert:

How often do I check spreadsheets and discover that the year-to-date average has been calculated by averaging the monthly averages? "More often than I would like" is the answer.
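
A quick worked example, with made-up figures, of why averaging the monthly averages misleads when months have different volumes:

```python
# Two months with very different transaction volumes (illustrative data).
months = {
    "Jan": [100] * 20,   # 20 transactions averaging 100
    "Feb": [400] * 5,    # 5 transactions averaging 400
}

# The wrong way: average of the monthly averages (weights months equally).
monthly_averages = {m: sum(v) / len(v) for m, v in months.items()}
average_of_averages = sum(monthly_averages.values()) / len(monthly_averages)

# The right way: one average over all transactions (weights each equally).
all_values = [v for vals in months.values() for v in vals]
true_ytd_average = sum(all_values) / len(all_values)

print(average_of_averages)  # 250.0
print(true_ytd_average)     # 160.0
```

Same data, two answers 56% apart - which is roughly how far apart my mood and the spreadsheet author's mood end up too.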

The employee - eye view of contractors? Surely not?

My mother always wanted me to be an actor....

Enjoy your weekend !

Wednesday, 1 August 2012

Bad data, bad health?

It has long been known that people with mental health issues have a shorter life expectancy than their healthier counterparts. But until now, it has not been understood just how far this extends. A recent health study has produced some startling results: people with mild mental health problems such as anxiety, depression and stress are 16% more likely to die prematurely.

There are many factors that contribute towards anxiety and stress - home life, status, location, self-esteem and so on. One of the major factors has to be where and how you work. You spend eight-plus hours a day there, so it plays a significant role in your life. Most people will admit that they spend more time with their colleagues than with their own families.

In my many years in business, one of the key drivers of stress has been poor data quality arising from high organisational entropy. Contrary to popular belief, your data quality section is not interested in apportioning blame. We are aware that the high entropy of the organisation is the real culprit. We work at all levels to solve problems, and you will rarely be personally scrutinised. If poor data quality is getting you down, you owe it to yourself, for the sake of your health, to raise your problems with colleagues who can help. Don't sit and suffer. Contact your data quality colleagues today.