11/7/14

Who is 'data' really?

I have been listening recently to discussions about data, often in the context of transforming something or often with critical implications for some bottom line, at other times with claims about it being by nature 'disruptive'.

But increasingly I am beginning to suspect that many of these intertwined discussions are fundamentally different ones. Or at least, that people mean quite different things and actually have quite different thought processes behind how they wish to present 'data' in any given case.

For example, if I am talking with people about Open Data it is quite a lot easier. The crowd who do this are very clear about what it means for them. Here and here are posts from Open Knowledge advocating for these standards.

Data varies in terms of structure, use, and things like relevance and other qualities in different contexts. 

Then there are different types of data: open data, big data, etc., and the boundaries sometimes get blurred.


In terms of big data, for example, the definition put forward by SAP from a business perspective points you to the qualities of the data, e.g. volume, variety, velocity, validity.

This is a much more technical and ‘neutral’ definition really compared to the definition of 'big data' from UN Global Pulse for example. From the outset the concern is less about users, sources or use cases for the data.

Does it matter which definition we use? Well, I think it does but often in subtle ways. Agencies like UN Global Pulse are fairly clear about focusing on big data for global development. So it is really looking at a use case or context for application. I think it is also worth bearing in mind that the document seems to locate the construction of this definition at some time subsequent to meetings of the World Economic Forum and also of the G20.

To me this raises another associated question sort of by implication. For any agency or policy mission it is almost impossible to escape this - it is part of the whole business of making policy. In fact it is really hard to talk about something without thinking about what it is for. Inevitably the interests of certain groups become represented in the process.

So for UN Global Pulse we are looking at big data within international development or developing countries, particularly with the G20 in mind, so with an emphasis on economic activity. Consider also that UN Global Pulse does not have direct involvement with UN agencies such as the UN Department of Economic and Social Affairs, or the United Nations Economic and Social Council (ECOSOC). So you might expect that a different vision again could easily come from other areas, even within the UN.

Robert Kirkpatrick, who is the director of UN Global Pulse, gives a pretty clear interview here about what UN Global Pulse is. As he says, it is a sort of R&D lab based in New York, Jakarta and Kampala.

There are reasons why the definition matters, and why the 'use case' it implies matters too. For me, privacy is one of the most obviously problematic areas in how big data is presented and discussed in this document. If you look at the diagram from the World Economic Forum white paper that UN Global Pulse uses here, you can see that privacy concerns do surface at two levels, but not at all clearly for private industry:



It simply mentions questions about ‘Ownership of sensitive data’.

In terms of information streams relevant to global development it highlights these four:
- Data exhaust
- Online information
- Physical sensors
- Citizen reporting

Recently at the Open Development Camp in Amsterdam, I was able to listen to discussions and work going on around this area in particular. I think it is one that has been overlooked, but partly also crowded out by the language that gets adopted, in exactly the way it has been above. In a way it has also been silenced.

Check out the presentation here from the ODC in Amsterdam on "Responsible Development Data" and also download a copy of the great handbook that was written during the booksprint with the Engine Room and check out some of the discussion going on around this at Open Knowledge.

In case you want to see the other great presentations from ODC, they are all available on Vimeo here.

So why does it matter who creates the frameworks? Well, although the WEF sees a role for the private sector in the diagram above in terms of ‘managing and collecting’ consumer data, the vision in this document seems to mostly skip over how one huge area of data could really be made use of to protect people from shocks and price changes. That is, of course, the data produced and held by corporations on their own activities and those of consumers. It does mention how we could make inferences about the prices of food from interactions via mobile phones with banks, but it does not say how we could inform consumers or even small businesses about the movements of global corporations, exchange rates or even of their own government in order to manage price shocks, for example.

Furthermore, the use of terms such as ‘stock levels’ of data, and the constant reference to measuring things such as school attendance, are more suggestive to me of seeking to optimise and work around concerns about human capital, productivity and efficiency. In short, it tends to situate human activity and society very much within the gaze of a corporate understanding of the world.

The microeconomic, bounded-rationality optimisation, predictive modelling and closed feedback loop type solutions all appear to be founded on a paradigm of a world of inevitable risks and shocks. I personally think this is a dangerous and slightly spurious argument. It serves, I think, to lead us to default to placing nearly no emphasis on deliberative procedures or processes. If we are talking about work in 'international development', then the conception of big data for development here really presents quite limited scope for human agency.

It really seems to be advocating for a sort of social fire alarm and sprinkler system powered by streams of automatic data. But it assumes in a way that we are living inside the machine, and like Sunstein and Thaler’s idea of the nudge, it is not really about choice but about modifying choice architectures that sustain an equilibrium around a volatility inherent in the system, both via feedback loops and millions of automatic corrections. All of this for me almost instantly assumes that forms of consent to this are automatic, or simply fundamentally broken or degraded to a virtually meaningless degree.

Today another important document was released by the Independent Expert Advisory Group on the ‘data revolution and sustainable development’.

NGOs particularly have been critical of the weak representation of CSOs around the UN 'data revolution' discussions, see here and also here.

The press conference just went out here.

You can access the document itself and also details of the consultation, and judge for yourself whether you think it represents a fair process or not. Here is the report in full.

It looks at two key areas in terms of the 'data revolution' for sustainable development:
  • The challenge of invisibility (gaps in what we know from data, and when we find out)
  • The challenge of inequality (gaps between those with and without information, and what they need to know to make their own decisions)
I will be continuing to blog about it in my next post. 
