We wanted to mine data from HEFCE’s API of Impact Case Studies submitted to REF2014 to find out how many UK Data Service data collections were used in those case studies, and which case studies used them. Basic analysis of the database showed high usage of data, but very low citation using persistent identifiers (data DOIs).
John Matthews, software engineer in the census support team, talks us through the process:
“The first step was to figure out how the API worked. Luckily the API is very well built, and includes a good amount of documentation to go along with it. Sure, it might read like a wall of text, but some documentation is better than no documentation. The API itself works in a fairly regular way, with multiple endpoints and handy built-in functions that we can utilise later on.
The UK Data Service uses persistent identifiers for users to cite data in the collection, but many of the case studies using UK Data Service data seemed to refer to data collections by name rather than by data DOI. It therefore seemed reasonable to start with a list of our collections from ukdataservice.ac.uk/get-data/key-data.
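To illustrate what matching by name rather than by DOI involves, here is a minimal Python sketch. It is not the script used for this analysis: the function names `normalise` and `find_mentions`, the case study IDs and the sample texts are all made up for illustration, and it assumes the case study text and the collection titles are already available as plain strings.

```python
import re


def normalise(text):
    """Lower-case the text and collapse punctuation and whitespace to single
    spaces, so small formatting differences do not block a match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def find_mentions(case_studies, collection_titles):
    """Return a dict mapping each case study ID to the collection titles
    that appear, as whole phrases, in its text."""
    normalised_titles = {title: normalise(title) for title in collection_titles}
    mentions = {}
    for study_id, text in case_studies.items():
        # Pad with spaces so a title only matches on word boundaries.
        padded = f" {normalise(text)} "
        found = [
            title
            for title, norm in normalised_titles.items()
            if f" {norm} " in padded
        ]
        if found:
            mentions[study_id] = found
    return mentions


# Illustrative sample data only (hypothetical IDs and text).
collection_titles = ["British Household Panel Survey", "Labour Force Survey"]
case_studies = {
    "cs-001": "The research drew on the British Household Panel Survey (BHPS).",
    "cs-002": "Analysis of the labour-force survey informed the policy.",
    "cs-003": "This study used interviews only.",
}

mentions = find_mentions(case_studies, collection_titles)
print(mentions)
```

Exact phrase matching like this will miss abbreviations and reworded titles, so anything it finds is best treated as a starting point for manual checking.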
There’s a joke within programming circles that says “if you want to be the best programmer, you’ve got to be lazy.” Essentially, “script everything”. Why bother doing something manually when you can just get a computer to do it for you? Doing so will tune up your scripting and logic skills, plus you never know when something like this might come in handy later. So, with this in mind, it shouldn’t come as a surprise that instead of copying down each and every survey title by hand, we just got a script to do it for us. Here’s how the script worked: