Opening up Open Data

FOI Man reports on a visit to meet Open Data experts at Southampton University.

Chris Gutteridge is a techie – in the best possible sense of the word. When I first arrive, he’s eager to show me “the cool stuff”. And it really is cool.

Despite my pseudonym, I don’t have superpowers, yet Chris is able to fly me over and into the university campus, bringing us to the ground with a bump outside his building. Of course, I’m talking about a visual representation of the campus achieved through the neat trick of linking building data collected from the university estates department to Google Earth. Chris is experimenting with making the maps 3D by collecting (and in some cases creating) data on the height of buildings on campus.

Cool stuff like this is how he sells the open data initiative to colleagues across the university. Central and local government have already made great strides in the open data arena, but Southampton are pioneers – and indeed, award winners – in the higher education sector. After all, it is the home of the Government’s open data tsars (can you have two?), Professors Nigel Shadbolt and Tim Berners-Lee (also famous these days for sitting behind a desk entertainingly at the Olympic opening ceremony – no mean feat). But even with this top-level support, it can be tricky to get busy departments to cooperate, a problem many FOI Officers will sympathise with.

Chris starts listing the objections colleagues raised when asked to provide datasets to make available for re-use through their Open Data Service. He looks puzzled when I start laughing, but some of you will recognise “what about data protection?”, “what if terrorists exploit it?”, and “isn’t it commercially sensitive?”. And the question that often lies behind these objections in reality, “what if someone realises our data is unreliable?” (or “shit” as Chris rather more prosaically puts it). Chris has blogged a whole list of these concerns and they all sound rather familiar.

But by demonstrating that once the data has been collected centrally it can be made useful to the departments that originally provided it, Chris and his team have started winning them over. One of their biggest supporters is the Catering department, who already maintained spreadsheets with details such as the items available in the cafes, bars and restaurants across campus and their unit cost. With a little adjustment, the spreadsheets were made into reusable data. The Catering department no longer have to manually update their web pages, because the availability and cost of food and drinks in individual outlets are pumped live from regularly refreshed open data straight to the pages. They’re now doing similarly useful things with events calendar data.
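For the technically curious, here is a rough idea of what "spreadsheet in, web page out" might look like. This is only a sketch and not Southampton's actual code: the column names (`outlet`, `item`, `price_gbp`), the sample rows and the function names are all invented for illustration, and it assumes the spreadsheet has been exported as CSV.

```python
import csv
import io
from collections import defaultdict
from html import escape

# Hypothetical sample standing in for a catering spreadsheet exported as CSV.
# The column names and rows are assumptions, not real Southampton data.
SAMPLE_CSV = """outlet,item,price_gbp
Library Cafe,Filter coffee,1.20
Library Cafe,Cheese sandwich,2.50
Staff Bar,Lemonade,1.00
"""


def load_menu(csv_text):
    """Group spreadsheet rows by outlet: {outlet: [(item, price), ...]}."""
    menu = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        menu[row["outlet"]].append((row["item"], float(row["price_gbp"])))
    return dict(menu)


def render_outlet(outlet, items):
    """Render one outlet's menu as a small HTML fragment."""
    lines = [f"<h2>{escape(outlet)}</h2>", "<ul>"]
    for item, price in items:
        lines.append(f"  <li>{escape(item)}: &pound;{price:.2f}</li>")
    lines.append("</ul>")
    return "\n".join(lines)


menu = load_menu(SAMPLE_CSV)
page = "\n".join(render_outlet(outlet, items) for outlet, items in menu.items())
print(page)
```

The point is that the web page is generated from the data rather than typed by hand, so when the spreadsheet changes, regenerating the page is all that is needed.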

Chris thinks this repurposing of the open data is key to making open data a success. His mantra, repeated to me several times, is that the main return on investment in open data for the university is the availability of “a huge pile of data that can be used internally without fear”. And if outside users (or indeed their own students) find the data useful to create new apps, then all the better.

Chris gives me some tips for anyone wanting to establish an open data repository. Given that recent FOI amendments mean that this will soon be a requirement, that’s pretty much all of us in the public sector.

Avoid controversy. You want to get buy-in from colleagues, so don’t startle the horses. Pick simple but useful datasets that nobody will challenge. At Southampton they started out with building data. It’s data that isn’t sensitive – people can see most of it by walking around the campus – but as we saw at the start of this piece it can be put to very effective use by developers.

Think carefully about which datasets should exist. Then speak to the relevant departments and see if they do have those datasets in a useful format. Chris suggests that often suppliers can be very helpful, and can give advice on how datasets can be extracted from their systems. They may even be prepared to look at how their systems are specified so that datasets can be more easily exported in future.

Encourage colleagues to give you their data even if it isn’t complete. It is better to have some data than no data at all in most circumstances.

Be ready to challenge concerns. Some colleagues will be concerned about giving data away for free, rather than making money from it. But as Chris points out, “if a Chinese takeaway gave away its food, it would soon go out of business. But if a Chinese takeaway didn’t give away its menus, it would go bust even faster.”

Look at what you’re already making available, and how. Chris demonstrates that my university already makes data available in a reusable format because we have an online repository called EPrints. He tells me that if you have an RSS feed on your website, you are “already more or less doing open data”.

Make information available in a reusable format. Open data enthusiasts grimace at the mention of portable document format (pdf). At the very least, try to make data available in a spreadsheet format.

Adopt an open licence. Open data is about more than publishing information in a reusable format. It is also about licensing. It’s really only open data if you state on your website/open data repository that data can be reused without charge. The best way to do this is to adopt the Open Government Licence.

Keep datasets up-to-date. One of the things that public bodies will be expected to do once the FOI amendments on datasets come into force is keep published datasets up-to-date. I ask Chris how Southampton maintain their datasets. It depends, of course, on where the data comes from.

  • Hearts will sink to be told that a lot are maintained manually by his team – not many of us will have resource to spare. So to make this work we need to ensure that it’s as easy as possible to get things corrected. Chris suggests encouraging feedback from users of the data so that they can flag up when data needs refreshing. At Southampton they also use student volunteers to maintain the datasets, which might be something to consider for universities in particular.
  • Some datasets update automatically. Depending on your setup, some will be fed live data from systems maintained by the supplying department. Some suppliers who provide systems across the public sector are already thinking about how to build open data publication functionality into their databases.
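For manually maintained datasets, even a very simple check can tell you which ones are overdue a refresh. The sketch below is my own illustration rather than anything Chris showed me: the catalogue entries, dates, refresh intervals and the `stale_datasets` function are all hypothetical.

```python
from datetime import date

# Hypothetical catalogue: each entry records when the dataset was last
# updated and how often (in days) we have promised to refresh it.
CATALOGUE = [
    {"name": "buildings", "last_updated": date(2012, 6, 1), "refresh_days": 365},
    {"name": "catering-menus", "last_updated": date(2012, 8, 1), "refresh_days": 7},
    {"name": "events", "last_updated": date(2012, 9, 10), "refresh_days": 7},
]


def stale_datasets(catalogue, today):
    """Return (name, age_in_days) for every dataset past its refresh interval."""
    stale = []
    for entry in catalogue:
        age = (today - entry["last_updated"]).days
        if age > entry["refresh_days"]:
            stale.append((entry["name"], age))
    return stale


overdue = stale_datasets(CATALOGUE, today=date(2012, 9, 14))
for name, age in overdue:
    print(f"{name} was last updated {age} days ago and needs refreshing")
```

A list like this could be combined with feedback from users, so that whoever maintains the data knows where to look first.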

Chris is keen to encourage the growth of an information ecology, with others across higher education publishing more and more open data. He encourages universities to consider creating ‘profile’ documents on their websites, describing where their key datasets can be found and in which formats. This will help with auto-discovery by the new open data hub for higher education, which will eventually provide potential users of datasets with a single portal to locate useful data.

So the higher education sector isn’t moving into this new era of reusable open data entirely unprepared. And if you’re thinking of taking your first steps into this brave new world, hopefully you, like me, feel slightly less daunted thanks to Chris’s enthusiasm.

I’d like to thank Chris, Ash and Patrick for letting me disturb them for a whole afternoon last Friday. Any errors here reflect my ignorance and not their skill or willingness to share!

4 comments

  1. […] He’s written it up in his blog. […]

  2. Daniel Olive says:

    Just for that PDF point I could almost kiss you. Rail franchise agreements are just images (you can’t search for words) while the (leaked, illegal and classified) Manual of Protective Security is fully searchable. Just two days ago UCAS had some great stats, but I can only find them as text on a web page, no sign of the machine-readable format they must have originated in, so there is no way I’m analysing those figures at all.

  3. S Jones says:

    I approve of publishing as much as possible. I am wondering where all the resource is going to come from within the NHS to get it done though.

    On a side note, we default to pdf for information unless otherwise asked because it’s readable with freely downloadable software on any operating system – which is not the case with many other formats!