Tuesday, 7 August 2012

Joining data - an ethical question

As the Olympics draw to a close, the business of collating and analysing the data can begin in earnest. Take, for instance, the medal table. Some interesting statistics have already emerged.

When you look at the normal medal table, ordered by medal count, you see the usual names at the top - China, USA etc. But when you cross-reference the medal counts against each country's population size, you get a very different view. Measured as medals per capita, New Zealand are top, with Slovenia, Denmark and Australia close behind. This is largely because their medal counts are divided by relatively small populations.
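The cross-referencing described above is a simple join on country name followed by a division. Here is a minimal sketch in Python; the country names and all figures are hypothetical placeholders chosen for illustration, not real medal or population data, and `medals_per_million` is a helper name made up for this example.

```python
# Hypothetical figures for illustration only; NOT real medal or population data.
medals = {"Country A": 100, "Country B": 40, "Country C": 6}
population_millions = {"Country A": 1000.0, "Country B": 50.0, "Country C": 4.0}


def medals_per_million(medals, population_millions):
    """Join the two tables on country name and return medals per million people."""
    joined = {}
    for country, count in medals.items():
        pop = population_millions.get(country)
        if pop:  # skip countries with a missing or zero population figure
            joined[country] = count / pop
    return joined


# Ranking by raw medal count.
by_count = sorted(medals, key=medals.get, reverse=True)

# Ranking after joining to the population data set.
per_capita = medals_per_million(medals, population_millions)
by_per_capita = sorted(per_capita, key=per_capita.get, reverse=True)

print(by_count)
print(by_per_capita)
```

With these made-up numbers the ranking flips completely once the second data set is joined in, which is the effect seen in the per capita medal table.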

The education establishment and the way sport is developed will soon come under scrutiny, as recent data also shows that 30% of Great Britain's medals were won by people who attended a public school. That is not representative of GB: public schools account for only 7% of the school population, so their former pupils are over-represented among medallists by a factor of more than four. This suggests that privileged children go on to be more successful in the Olympics.

As the Olympic data becomes available to more and more people, expect more insight to arise as this data gets joined to other proprietary data sets. Which brings me to the crux of my point.

When you share your information, what you don't know is what data sets are going to be joined to it. How will your data be extrapolated, and will that extrapolation be correct? What kind of business and personal decisions could be made that affect your future happiness, comfort and freedom?

So, while collecting data for one purpose may be perfectly ethical, joining it to an unrelated source to make unrelated assumptions may not be. 


  1. I completely agree. An even greater fear is government use or misuse of extrapolated data. With new laws including warrantless access to online information, you may unknowingly or mistakenly be associated with criminals or criminal activity.

    Data from the Internet is almost certainly a contributing factor to the exponential growth of prison populations worldwide. Six degrees of separation and erroneous data connections should be of great concern to innocent, law abiding citizens.

    1. Hi Dwayne, thanks for your comment. I agree with your sentiment that unwarranted extrapolation is a real problem. The rise in prison populations has probably got more to do with how police are handling their resources and changes in criminal law.

      With progress, there will always be regress. In the industrial revolution, power looms were smashed by 'luddites' who were concerned that they would lose their jobs. The problem with the legal profession is that it finds it hard to keep pace with technical innovation. It is really important that ethically driven data governance fills this gap and keeps us all on the 'straight and narrow', and so prevents draconian legislation from being imposed.