
Game, Dataset and Match

FOI Man highlights forthcoming changes to FOI and provides some hints and tips for public authorities on how to deal with them.

Last year, the Protection of Freedoms Act was passed. Amongst the changes it brought in were a small number of amendments to the Freedom of Information Act.

But we’re still waiting for most, if not all, of those changes to come into force. To bring them into force, the Government has to lay a commencement order before Parliament…and this is yet to happen. It was expected that the commencement order would be laid last month, bringing the changes into force on 1 April. But this has now been delayed, as reported by the Information Commissioner’s Office earlier this month.

The most significant change is the requirement on public authorities to release datasets in a reusable format, and to publish disclosed datasets in their publication schemes. In my latest article for PDP’s Freedom of Information Journal, I’ve written about these requirements and how to comply with them. (And don’t forget also my report on open data work at Southampton University, which contains further tips on managing and publishing open data).

Personally, I don’t think public authorities should worry too much about these changes. There are a few reasons for this. Firstly, as I commented when the Bill was first published, the effect of these changes will be very limited in my view – they change very little. Public authorities already have to provide information in the format requested “so far as reasonably practicable”; I’ve never been convinced by Francis Maude’s claims that public authorities routinely (and deliberately) choose to disclose data in pdf just to frustrate entrepreneurs.

There may be a mad rush of requests for datasets later this summer (if indeed the Government sticks to its latest timetable), and no doubt there will be more impact for some than others. But I don’t anticipate that this is going to cause significant issues overall.

What can public authorities do to prepare? Well, I suggest the following:

  • identify your key datasets – if you regularly get requests for particular data, then you know what is likely to be asked for in future
  • work out what kind of licence you want to apply to these datasets if you disclose them; the easiest thing will be to use the Open Government Licence for information your authority owns the copyright for, but it is likely you will also be able to offer a non-commercial licence (limiting re-use to non-commercial use) or a charged licence (allowing re-use in exchange for a fee)
  • set up a section in your publication scheme for datasets and if you are happy to disclose datasets and make them available for re-use, get them up there on your website for people to use – don’t wait for the requests
  • once you’ve released a dataset and have licensed re-use, you are obliged to make it available in your publication scheme and to keep it up to date.

That last point may sound worryingly like a potentially unmanageable task for some public authorities, but the relevant amendment goes on to say “unless the authority is satisfied that it is not appropriate for the dataset to be published”. “Not appropriate” isn’t defined (as ever), but if it would be expensive to keep the dataset up to date, for instance, that might well be a justifiable reason not to do so.

So the usual advice applies to these changes – don’t panic! But we’ll have to wait and see what the actual impact will be. And indeed when that impact will be felt. At the moment we only have a draft Code of Practice to go on, so hopefully these few thoughts will be useful.

Are universities transparent enough?

FOI Man talks to Times Higher Education about universities and openness.

Times Higher Education magazine this week features an article about…higher education, and how open and transparent it is. I was interviewed for this feature a few weeks ago – wonder at my high rhetoric – “[FOI is seen] as a pain in the backside”. Seriously, it’s a comprehensive survey of all aspects of transparency in the UK university sector, including everything from FOI to open data to MOOCs (that’s massive open online courses for those of you not in the know).

Opening up Open Data

FOI Man reports on a visit to meet Open Data experts at Southampton University.

Chris Gutteridge is a techie – in the best possible sense of the word. When I first arrive, he’s eager to show me “the cool stuff”. And it really is cool.

Despite my pseudonym, I don’t have superpowers, yet Chris is able to fly me over and into the university campus, bringing us to the ground with a bump outside his building. Of course, I’m talking about a visual representation of the campus achieved through the neat trick of linking building data collected from the university estates department to Google Earth. Chris is experimenting with making the maps 3D by collecting (and in some cases creating) data on the height of buildings on campus.

And the cool stuff like this is how he sells the open data initiative to colleagues across the university. Central and local government have already made great strides in the open data arena, but Southampton are pioneers – and indeed, award winners – in the higher education sector. After all, it is the home of the Government’s open data tsars (can you have two?) Professors Nigel Shadbolt and Tim Berners-Lee (also famous these days for sitting behind a desk entertainingly at the Olympic opening ceremony – no mean feat). But even with this top-level support, it can be tricky to get busy departments to cooperate – a difficulty many FOI Officers will sympathise with.

Chris starts listing the objections colleagues raised when asked to provide datasets to make available for re-use through their Open Data Service. He looks puzzled when I start laughing, but some of you will recognise “what about data protection?”, “what if terrorists exploit it?”, and “isn’t it commercially sensitive?”. And the question that often lies behind these objections in reality, “what if someone realises our data is unreliable?” (or “shit” as Chris rather more prosaically puts it). Chris has blogged a whole list of these concerns and they all sound rather familiar.

But by demonstrating that once the data has been collected centrally it can be made useful to the departments that originally provided it, Chris and his team have started winning them over. One of their biggest supporters is the Catering department, who already maintained spreadsheets with details such as the items available in the cafes, bars and restaurants across campus and their unit cost. With a little adjustment, the spreadsheets were made into reusable data. Now the Catering department no longer have to update their web pages manually: the availability and cost of food and drink in individual outlets is pumped live from regularly refreshed open data straight to the pages. They’re now doing similarly useful things with events calendar data.
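The catering pattern – publish the data once, then generate the web page from it rather than editing it by hand – can be sketched in a few lines. This is a purely hypothetical illustration: the outlets, prices, column names and rendering are invented, not Southampton’s actual dataset or service.

```python
import csv
import io

# Hypothetical catering dataset, as it might be published in a
# reusable format (CSV) on an open data site.
CATERING_CSV = """outlet,item,price_gbp
Library Cafe,Americano,1.80
Staff Bar,Sandwich,2.50
Union Restaurant,Soup of the day,2.20
"""

def render_menu(csv_text):
    """Build HTML table rows from the open dataset, so the web page
    always reflects the latest published data."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(
        f"<tr><td>{r['outlet']}</td><td>{r['item']}</td>"
        f"<td>£{r['price_gbp']}</td></tr>"
        for r in rows
    )

print(render_menu(CATERING_CSV))
```

The point of the design is that the spreadsheet the department already maintains becomes the single source of truth; the web page is just one consumer of it, alongside any external developers.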

Chris thinks this repurposing of the open data is key to making open data a success. His mantra, repeated to me several times, is that the main return on investment in open data for the university is the availability of “a huge pile of data that can be used internally without fear”. And if outside users (or indeed their own students) find the data useful to create new apps, then all the better.

Chris gives me some tips for anyone wanting to establish an open data repository. Given that recent FOI amendments mean that this will soon be a requirement, that’s pretty much all of us in the public sector.

Avoid controversy. You want to get buy-in from colleagues, so don’t startle the horses. Pick simple but useful datasets that nobody will challenge. At Southampton they started out with building data. It’s data that isn’t sensitive – people can see most of it by walking around the campus – but as we saw at the start of this piece it can be put to very effective use by developers.

Think carefully about what datasets you think should exist. Then speak to the relevant departments and see if they do have those datasets in a useful format. Chris suggests that often suppliers can be very helpful, and can give advice on how datasets can be extracted from their systems. They may even be prepared to look at how their systems are specified so that datasets can be more easily exported in future.

Encourage colleagues to give you their data even if it isn’t complete. It is better to have some data than no data at all in most circumstances.

Be ready to challenge concerns. Some colleagues will be concerned about giving data away for free, rather than making money from it. But as Chris points out, “if a Chinese takeaway gave away its food, it would soon go out of business. But if a Chinese takeaway didn’t give away its menus, it would go bust even faster.”

Look at what you’re already making available, and how. Chris demonstrates that my university is already making data available in a reusable format because we have an online repository called EPrints. He tells me that if you have an RSS feed on your website, you are “already more or less doing open data”.

Make information available in a reusable format. Open data enthusiasts grimace at the mention of portable document format (pdf). At the very least, try to make data available in a spreadsheet format.
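Writing a dataset out in a spreadsheet-friendly format rather than pdf takes very little effort. A minimal sketch, with invented figures (the building names and heights here are illustrative only):

```python
import csv

# Hypothetical building data; the point is the format, not the content.
buildings = [
    {"building": "32", "name": "Engineering Building", "height_m": 24},
    {"building": "58", "name": "Arts Building", "height_m": 18},
]

# CSV can be opened in any spreadsheet program and parsed by any
# programming language, unlike a table locked inside a pdf.
with open("buildings.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["building", "name", "height_m"])
    writer.writeheader()
    writer.writerows(buildings)
```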

Adopt an open licence. Open data is about more than publishing information in a reusable format. It is also about licensing. It’s really only open data if you state on your website/open data repository that data can be reused without charge. The best way to do this is to adopt the Open Government Licence.

Keep datasets up-to-date. One of the things that public bodies will be expected to do once the FOI amendments on datasets come into force is keep published datasets up-to-date. I ask Chris how Southampton maintain their datasets. It depends, of course, on where the data comes from.

  • Hearts will sink to be told that a lot are maintained manually by his team – not many of us will have resource to spare. So to make this work we need to ensure that it’s as easy as possible to get things corrected. Chris suggests encouraging feedback from users of the data so that they can flag up when data needs refreshing. At Southampton they also use student volunteers to maintain the datasets, which might be something to consider for universities in particular.
  • Some datasets automatically update. Depending on your set up, some will be fed live data from systems maintained by the supplying department. Some suppliers who provide systems across the public sector are already thinking about how to build open data publication functionality into their databases.

Chris is keen to encourage the growth of an information ecology, with others across higher education publishing more and more open data. He encourages universities to consider creating ‘profile’ documents on their websites, describing where their key datasets can be found and in which formats. This will help with auto-discovery by the new open data hub for higher education, which will eventually provide potential users of datasets with a single portal to locate useful data.
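A ‘profile’ document of that kind might look something like the following sketch. To be clear, the field names, URLs and structure here are guesses for illustration; they are not the format the higher education data hub will actually expect.

```python
import json

# Hypothetical profile document describing where an institution's
# key open datasets live, to support auto-discovery by a data hub.
profile = {
    "organisation": "Example University",
    "datasets": [
        {
            "title": "Buildings and places",
            "url": "https://data.example.ac.uk/buildings.csv",
            "format": "text/csv",
            "licence": "OGL",
        },
        {
            "title": "Catering outlets and prices",
            "url": "https://data.example.ac.uk/catering.csv",
            "format": "text/csv",
            "licence": "OGL",
        },
    ],
}

# Published at a well-known location on the institution's website,
# a harvester could fetch and index this automatically.
with open("profile.json", "w") as f:
    json.dump(profile, f, indent=2)
```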

So the higher education sector isn’t moving into this new era of reusable open data entirely unprepared. And if you’re thinking of taking your first steps into this brave new world, hopefully you, like me, feel slightly less daunted thanks to Chris’s enthusiasm.

I’d like to thank Chris, Ash and Patrick for letting me disturb them for a whole afternoon last Friday. Any errors here reflect my ignorance and not their skill or willingness to share!

Draft Datasets Code of Practice

FOI Man highlights a new draft Code of Practice under section 45 of the Freedom of Information Act.

It’s all go with FOI at the moment. No sooner have we had to wade through the ICO’s Anonymisation Code of Practice than another comes along from the Ministry of Justice – this time a draft Code setting out best practice for meeting the new requirements under FOI relating to datasets.

The draft Code is a supplement to the existing section 45 Code of Practice, setting out best practice for public authorities in complying with FOI. It is required by the amendments made to FOI by the Protection of Freedoms Act (which are not yet in force).

It provides clarification on interpreting the definition of dataset in the amendments, as well as setting out the three licences (developed by The National Archives) that public authorities will be expected to use when licensing re-use of datasets (ie open, non-commercial and charged). What isn’t yet clear is what fees public authorities will be able to charge for re-use. The amendments allow for the Secretary of State for Justice to lay down regulations to allow this, but there is no news yet on if, or when, such regulations will be forthcoming.

It should be stressed that the Code is a draft, and the Government is inviting comments on it via the gov.uk website. So if you’re interested in the open data agenda, or simply want to ensure the Code is clear enough, do go and make your views known.

Anonymovember

FOI Man reports on the ICO’s new Code of Practice on anonymisation.

FOI Officers tend to be caught between a rock and a hard place on a pretty much continual basis. If it isn’t navigating between the Scylla of senior management and the Charybdis of requester ire, then it’s trying to balance the often competing demands of the Freedom of Information and Data Protection (DP) Acts.

So new guidance from the Information Commissioner on the important subject of anonymisation is very welcome. Though at over 100 pages, some FOI and DP Officers may struggle to find the time to read it between fielding requests and CMP notices. But, ever at your service, I attempt to extract the key points for you here.

The Code notes that the DPA does not require anonymisation to be completely risk-free – the role of the Code is to help organisations mitigate the risks involved with anonymisation. Similarly, it points out that – in line with R (on the application of the Department of Health) v Information Commissioner [2011] EWHC 1430 (Admin) – anonymised information ceases to be personal data. So if your data is truly anonymised, section 40 of FOI won’t apply to it, and the sort of large datasets that that nice Mr Maude likes Government departments to publish can be unleashed without concern.

But that’s the trick. We’ve got to be very careful that what we put out there is truly anonymised. The Code summarises the problems with that neatly – firstly, there are a number of ways that an individual could be identified, so just taking a name out may not be enough. And secondly, we have no way of knowing what information you folks out there might already have access to.

There are well documented examples of how individuals have been identified from supposedly anonymised datasets once put together with information available on the internet or with personal knowledge. The ICO point out that organisations aren’t omniscient – they can’t know for sure what is, and what will be, available to people. So what do they say about how FOI and DP Officers should reach the judgment as to whether or not it is safe to disclose an anonymised dataset?

Effectively – and I hate to throw a buzz word at you – it’s a risk assessment. They cite a Tribunal concept of the “motivated intruder”. Basically this is someone who will do anything short of commit crime to identify individuals where there is some motive, eg the information is newsworthy, of interest to the village gossip, perhaps politically sensitive. We need to consider whether someone like that could identify people using libraries, archives, the internet, social media. In other words, we’re talking about those people who you see on TV sometimes tracking down people for an inheritance. Or the producers of Who Do You Think You Are. Could they identify individuals from the data?
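The kind of linkage a motivated intruder relies on can be surprisingly simple. Here’s a deliberately simplified sketch with invented data throughout: a dataset stripped of names can still be joined to publicly available knowledge on a combination of other attributes.

```python
# Hypothetical illustration of a linkage attack: names have been
# removed, but postcode district plus age can still single someone
# out when joined with information gleaned from, say, social media.

anonymised_records = [
    {"postcode_district": "SO17", "age": 34, "condition": "asthma"},
    {"postcode_district": "SO15", "age": 52, "condition": "diabetes"},
]

# What a motivated intruder might already know or be able to find.
public_knowledge = [
    {"name": "A. Person", "postcode_district": "SO17", "age": 34},
]

def reidentify(records, knowledge):
    """Join the two sources on the shared attributes
    (often called quasi-identifiers)."""
    hits = []
    for k in knowledge:
        for r in records:
            if (r["postcode_district"], r["age"]) == (
                k["postcode_district"], k["age"]
            ):
                hits.append((k["name"], r["condition"]))
    return hits

print(reidentify(anonymised_records, public_knowledge))
```

The lesson is that removing obvious identifiers isn’t enough; it’s the combination of remaining attributes that has to be assessed.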

Of course, this is better than nothing, but it still relies on FOI and DP Officers or their colleagues to have the time to work out whether someone could be identified from all of these sources. If they haven’t got that time, then there is a risk that the Code just leaves us where we started – with authorities reluctant to release information for fear of individuals being identified.

Thankfully the ICO do recognise the difficulty of this with large datasets – the desire for publication of which is pretty much what prompted this Code. They say:

“It will often be acceptable [with larger datasets] to make a more general assessment of the risk of prior knowledge leading to identification, for at least some of the individuals recorded in the information and then make a global decision about the information.”

But it still means that many FOI and DP Officers will be left feeling uncomfortable whenever considering disclosure of anonymised datasets. Have I checked enough sources? What if I’d tried that other search engine? Should I subscribe to that genealogy site to check what someone could find there? It’s difficult to see what else the ICO could have advised, but FOI Officers will take limited comfort from the Code on this point.

There is some useful practical advice in the Code, such as the best ways to present personal and spatial data (eg in crime maps). The case studies that form the latter half of the publication will be helpful as well.

Overall, the Code is a useful guide to the issue of anonymisation for FOI and DP Officers and anyone working with datasets containing personal data. But it won’t be the last word and it will be interesting to see what comes out of the new UK Anonymisation Network announced yesterday by the Information Commissioner.