So we need data - for what?

This is a second blog post following the launch of the UN report on the ‘Data Revolution’. I'll be exploring my own ideas, and also trying to outline some ways that complexity theory is being used to work in this space. I'll also see if I can illustrate some points from my previous post. I am slowly rolling these ideas along; they appear to have first been scribbled in apparent frustration on the post-it below (there were many more). It comes from a collaboration and discussion that took place earlier in 2014 at the Open Development Fringe Event at OKFest in Berlin.

Brainstorm At Open Development Fringe Event 2014
Creative Commons Licensed (BY) flickr photo by tomsalmon: http://flickr.com/photos/fishytom/15185058154

One of the biggest challenges facing policy makers today is complexity, and this relates in interesting ways to ideas about a ‘Data Revolution’. There are quite a few initiatives working out ways to understand the limits to governing and acting instrumentally to fulfill policy goals. The OECD published a piece on ‘Applications of Complexity Science for Public Policy’ in 2009, and I'll take that as a fairly good indication of it being on the horizon of mainstream public policy debates. I am not particularly familiar with the established approaches to dealing with 'complexity'. My understanding is that complex situations might lead some people to offer preventative policy approaches, while others adopt resilience terminology and a final group makes up a more transformative paradigm. I'll try to avoid rolling them all into one, as I am sure that not all approaches are equal or have the same notion of complexity embedded in them. I also have not yet come across any 'approaches' that see a particular virtue in making claims about 'disruption', but I will keep my eyes open...

So take the recently published UNDP Human Development Report 2014, which proposes as its central idea the concept of ‘human vulnerability’ as a paradigm of complexity for development in the future. It goes on to describe the prospects post-2015, particularly of development and growth eroding people’s capabilities and choices, and argues for a sustained enhancement of individuals’ and societies’ capabilities with a particular focus on collective action to counter this.

It puts forward an argument about ‘structural vulnerability’, which as it is proposed in the report seems to go further than ideas of ‘risk mitigation’ or more limited ‘resilience type’ approaches, to generate a clearer picture of how inequality interacts with interventions and actors. So as the report points out, eliminating extreme poverty is not just about ‘getting to zero’; it is also about staying there. It strongly adopts the language of capabilities for individuals, and of cohesiveness and responsiveness for institutions and societies, describing how affirmative action and rights based approaches fit into this picture, as well as openness, transparency and accountability initiatives.

A simple message might be that this is one way of looking at adaptation in complex systems going through processes of change and innovation. Discussions around the 'data revolution' also seem caught up in an emerging tangle of efforts around this, although of course there are different takes on the ideas from different stakeholders.

For example, in one corner there is some interesting work going on looking at how to solve complex problems better at the Kennedy Centre BuildingCapacity Program and the ODI, where they are talking about ‘Doing Development Differently’ (yes: #DDD) and Problem Driven Iterative Adaptation (yes: #PDIA).

At first glance some of the language about ‘politically smart development’ from them appears at some point borrowed from programs like the Developmental Leadership Program (DLP). However, the emphasis is in fact more on thinking about design as a paradigm and using design thinking with problem solving approaches like the 5 Whys, problem trees and fishbone diagrams (the Ishikawa diagram). The Kennedy Center strongly references Hirschman’s ideas and approaches, quoting the “Principle of the Hiding Hand” proposed by him:

      “men engage successfully in problem-solving [when] they take up problems which they think they can    
       solve, find them more difficult than expected, but then, being stuck with them, attack willy-nilly the 
       unsuspected difficulties – and sometimes even succeed.”

I have to say, to me this is not so much a statement of approach, but it certainly sums up a lot of my experience of 'problem solving' within large slow moving organisations that are not particularly good at encouraging divergent thinking. Another way to put it, as Matt Andrews does in his blog, is that people have strong incentives to opt for 'big best practice type initiatives' and will also tend to 'overestimate the potential results'. So let's accept upfront that path dependency and institutional constraints affect thinking in our organisation to some degree. Then the only 'problem solving' that is really going on is in the attempts to devise a means, at each step of the process, to reconcile 'unsuspected difficulties' with the received 'best practice' methods. In some ways this seems to relate to a lot of the ideas around Agile Development and Open Development as well, but probably from a different perspective. It also speaks to Easterly's long-standing distinction between 'planners' and 'searchers' in development, which is also championed by many market-based approaches to development work.

Let’s illustrate this in a context which is broadly the one that this blog post is addressing. The UN report is highlighted in a recent Economist article on the ‘Data Revolution’, and states that a key problem is simply that there is an ever-growing divide between the data-rich countries and the data-poor ones. The article helps to paint a picture of why some of the slow moving machinery based on national statistics and silos of project data around the MDGs is unlikely to be fit for purpose in responding to the shifts and shocks (like Ebola) that could impact on the post-2015 world. In fact it takes Ebola as a prime example of why real-time data is so important in an interconnected world.

The Economist goes on to explain that humanitarian action is being hampered greatly by the lack of appropriate data on the location of hospitals, and by poor maps of cities. Volunteer efforts with the Humanitarian OpenStreetMap Team and charities like MSF and the Red Cross have led the way in creating MissingMaps.org, which is trying to create free maps of cities across the developing world, particularly in potential trouble spots, before the data is crucially needed in a disaster situation. Similarly it mentions how call-data from mobile-phone operators in an interconnected world can be put to use, for example by comparing data on malaria outbreaks with data on people’s movements to make predictions of spread. This seems all the more pressing considering the current situation of disaster around Ebola.

Interestingly the Economist then goes on to describe a role for the private sector as well. It points out that Premise (a startup in Silicon Valley) is being used to spot trends such as the fact that as the number of Ebola cases in any one location rises, the local price of staple foods also increases dramatically. As they point out, in recent weeks, as the number of cases fell, prices did too. The Economist then goes on to say that the authorities already realised that travel restrictions and closed borders would have this effect, but now they have a way to measure and track price shifts as they happen. The creators of Premise describe a little bit of what is behind their thinking (and success) here. For some of the analysis in the press around the relationship between Ebola and food prices and some of these concerns, there are articles here and here also.

"World Food Programme in Liberia 002" by 26th MEU(SOC) PAO (U.S. Marines) - 26th Marine Expeditionary Unit (Special Operations Capable) Public Affairs Office. Licensed under Public domain via Wikimedia Commons.

I think this returns my attention to the issue that I sketched out in my previous post: that in essence the Economist is celebrating the work of ‘Premise’ and others like them because they will allow food distribution companies to continue doing business without interruption globally, despite the Ebola crisis and the subsequent closing of borders and quarantines. I am not entirely sure that is exactly what the creators of Premise themselves envision for their platform, which seems to be aiming to work for the social good in a very innovative way.

The way the Economist presents this is actually not as an initiative focusing on development per se, although it certainly could be good for the companies involved. A potential benefit might be if we are able to use the data from ‘Premise’ and other companies in an open way, reusing it and remixing it in order to help predict disasters and improve coordination of relief efforts, for example. But the fact that people can make money out of this as a business is not the most interesting aspect of it. The question here should be how corporations and businesses could play a more useful role in helping to close the ‘data divide’ in the global south, instead of just focusing on a bottom line.

My question is also how much institutions like the Economist and the WEF are presenting a fairly narrow view of open development and open data here, and are actually pigeon-holing it into simplistic streams: the volunteers who map out cities, and the companies who need only focus on how much extra profit the emergence of data about the developing world will potentially bring them in the long run.

The point is not that Premise (the Google funded startup) is designed to make a profit out of its analysis of financial data. Or as the FT article puts it 'Premise taps into hunger for real-time data'. The point is this is how the Economist and the FT have chosen to present it to us.

So to return to the argument I proposed in my last blog post. At first glance the WEF is fairly comfortable with designating our personal data as a new class of asset (see the report here), and like the Economist it probably sees nothing wrong with this, or with anyone else selling off analytic data on financial markets in order to supply the market with real-time data. But beyond this they are both fairly silent about what responsibility the private sector might have to help to mitigate the impact of a humanitarian crisis like Ebola.

It is potentially quite hard to predict a crisis, and to map out an area in case a crisis might occur there. On the other hand, to ask companies collecting data from the developing world to consider broadly that their work could play an important role in alerting people and agencies to things like price increases in the context of a global epidemic like Ebola seems like a very useful and effective measure.

The role of open source software and open systems is also important here, because it is helping to build systems for data collection, and useful corridors and channels for information to flow. It is showing itself to be of key importance in the current fight against a threat like Ebola. This story from AllAfrica highlights some of the ways that NGOs and others are using open systems. A great resource is a recently published book edited by Matthew L. Smith and Katherine M. A. Reilly from MIT Press, which digs into what ‘Open Development’ is and may seek to achieve in the future. Also worth pointing out is the long-standing work that rOg media in Berlin has been doing in South Sudan, and the work that Stephen Kovats from rOg has already done with UNESCO on this here.

If you consider how much is being done with open software, it raises the question: why are we not also asking companies that have useful data and resources to think more clearly about how they can move towards making them open? Is it really ever going to benefit poorer countries in Africa, with less data to work with and less digital infrastructure, if in the end all we are doing is focusing on selling food price data on regions in crisis to the market instead?

To return to the problem of complexity here, I think this illustrates why we have to think outside of the ‘boxes’ that we might at first be tempted to approach these problems with. Part of my argument is that this drive to work with open data in development should not only try to make sense of the obvious challenges and opportunities for development actors in a post-2015 world, but also make sure that it is taking on board Hirschman’s call for more lateral thinking.

Seeking to predict and plan for disasters that will require local solutions, and providing maps and data to help NGOs, local governments and local actors to act effectively, is clearly valuable. However, an equally predictable effect of a crisis like Ebola anywhere is that markets will act in unpredictable ways and may even exacerbate the situation, and will capitalize on their access to real-time data and analysis to avoid being affected by it. Given the digital data divide, it seems that a more lateral approach to the situation would also be one whereby complex problem solving is employed to test out ways to yield benefits by matching up those who can deliver solutions best with those that have the knowledge, resources and data to assist in this. Not by working in silos, or necessarily by competing with large 'best practice' initiatives, and not by recreating proprietary systems that lead us further towards broadening an existing digital data chasm between the richer and poorer countries around the world.

So in my next blog post I will try to take a closer look at the report itself.


Who is 'data' really?

I have been listening recently to discussions about data, often in the context of transforming something or often with critical implications for some bottom line, at other times with claims about it being by nature 'disruptive'.

But increasingly I am beginning to suspect that many of these intertwined discussions are fundamentally different ones. Or at least, that people mean quite different things and actually have quite different thought processes behind how they wish to present 'data' in any given case.

For example if I am talking with people about Open Data it is quite a lot easier. The crowd who do this are very clear about what this means for them. Here and here are posts from Open Knowledge advocating for these standards.

Data varies in terms of structure, use, and things like relevance and other qualities in different contexts. 

Then there are different types of data: open data, big data, etc...and the boundaries sometimes get blurred.

In terms of big data, for example, the definition put forward by SAP from a business perspective points you to the qualities of the data, e.g. volume, variety, velocity, validity.

This is really a much more technical and ‘neutral’ definition compared to, for example, the definition of 'big data' from UN Global Pulse. In the SAP definition, from the outset the concern is less about users, sources or use cases for the data.

Does it matter which definition we use? Well, I think it does but often in subtle ways. Agencies like UN Global Pulse are fairly clear about focusing on big data for global development. So it is really looking at a use case or context for application. I think it is also worth bearing in mind that the document seems to locate the construction of this definition at some time subsequent to meetings of the World Economic Forum and also of the G20.

To me this raises another associated question sort of by implication. For any agency or policy mission it is almost impossible to escape this - it is part of the whole business of making policy. In fact it is really hard to talk about something without thinking about what it is for. Inevitably the interests of certain groups become represented in the process.

So for UN Global Pulse we are looking at big data within international development or developing countries, particularly with the G20 in mind, so with an emphasis on economic activity. Consider also that UN Global Pulse does not have direct involvement with UN agencies such as the UN Department of Economic and Social Affairs, or the United Nations Economic and Social Council (ECOSOC). So you might expect that a different vision again could easily come from other areas even within the UN.

Robert Kirkpatrick, who is the director of UN Global Pulse, gives a pretty clear interview here about what UN Global Pulse is; as he says, it is a sort of R&D lab based in New York, Jakarta and Kampala.

There are reasons why it matters what the 'use case' is going to be. For example, the issue of privacy is for me one of the most obvious problematic areas in how big data is being presented and discussed in this document. If you look at the diagram from the World Economic Forum white paper that UN Global Pulse uses here, you can see that privacy concerns do surface at two levels but not at all clearly for private industry:

It simply mentions questions about ‘Ownership of sensitive data’.

In terms of information streams relevant to global development it highlights these four:
- Data exhaust
- Online information
- Physical sensors
- Citizen reporting

Recently at the Open Development Camp in Amsterdam, I was able to listen to discussions and work going on around this area in particular. I think it is one that has been overlooked, but partly also crowded out by the language that gets adopted, in exactly the way it has been above. In a way it also has been silenced.

Check out the presentation here from the ODC in Amsterdam on "Responsible Development Data" and also download a copy of the great handbook that was written during the booksprint with the Engine Room and check out some of the discussion going on around this at Open Knowledge.

In case you want to see the other great presentations from ODC, they are all available on Vimeo here

So why does it matter who creates the frameworks? Well, although the WEF sees a role for the private sector in the diagram above in terms of ‘managing and collecting’ consumer data, the vision in this document seems to mostly skip over how one huge area of data could really be made use of to protect people from shocks and price changes. That is of course the data produced and held by corporations on their own activities and those of consumers. It does mention how we could make inferences about the prices of food from interactions via mobile phones with banks, but it does not say how we could inform consumers or even small businesses about the movements of global corporations, exchange rates or even of their own government in order to manage price shocks, for example.

Furthermore, the use of terms such as ‘stock levels’ of data, and the constant reference to measuring things such as school attendance, are more suggestive to me of seeking to optimise and work around concerns about human capital, productivity and efficiency. In short it tends to situate human activity and society very much within the gaze of a corporate understanding of the world.

The microeconomic / optimisation of bounded rationality type decision-making / predictive modelling and closed feedback loop type solutions all appear founded on a concept of living in a world built upon a paradigm of inevitable risks and shocks. I personally think this is a dangerous and slightly spurious argument. It serves, I think, to lead us to default to placing nearly no emphasis on deliberative procedures or processes. I mean, if we are talking about work in 'international development' then really it presents quite limited scope for human agency in the conception of big data for development here.

It really seems to be advocating for a sort of social fire alarm and sprinkler system powered by streams of automatic data. But it assumes in a way that we are living inside the machine, and like Sunstein and Thaler’s idea of the nudge, it is not really about choice but about modifying choice architectures that sustain an equilibrium around a volatility inherent in the system, both via feedback loops and millions of automatic corrections. All of this for me almost instantly assumes that forms of consent to this are automatic, or simply fundamentally broken or degraded to a virtually meaningless degree.

Today another important document was released by the Independent Expert Advisory Group on ‘data revolution and sustainable development’.

NGOs particularly have been critical of the weak representation of CSOs around the UN 'data revolution' discussions, see here and also here.

The press conference just went out here.

You can access the document itself and also details of the consultation, and judge for yourself if you think that it represents a fair process or not. Here is the report in full.

It looks at two key areas in terms of the 'data revolution' for sustainable development:
  • The challenge of invisibility (gaps in what we know from data, and when we find out)
  • The challenge of inequality (gaps between those with and without information, and what they need to know to make their own decisions)
I will be continuing to blog about it in my next post.