Scientific research has adopted computerised tools to assist with all aspects of the research life-cycle; however, this progress has largely been driven by pragmatic adoption of tools and methods, with little reference to prevailing standards. This gradual evolution has largely benefited the scientific community, but it has also produced a highly heterogeneous ecosystem of data and metadata constructs that are difficult or impossible for third-party researchers to interpret. The growth in data collections, the inevitable consequence of this revolution in the way we 'do science', has presented us with some acute dilemmas:

  1. How do we ensure that the data are correctly interpreted by third parties?
  2. How can we describe our activities to funding bodies and the public?
  3. How should we build and curate archives that can preserve this information for the future?
  4. How can we manage access to these datasets?

These concerns can be summed up in a few key words: clarity, transparency, sustainability and control. Taken together, they amount to effective information security, and this is where the SERPent project has its roots.

SERPent set out to examine an existing standard, the Data Documentation Initiative (DDI), and its associated toolset. It was not a conventional software development project: rather than building new software, it examined a pre-existing set of software tools and standards to assess whether they could viably be adopted by the epidemiology and population health research community.

Deliverable assessment

Provision of DDI documentation for selected use cases.

During the course of the study, the six use cases originally selected were documented using DDI and uploaded to a local instance of the NADA catalogue service. This instance is hosted on an internal network at UCL and is not accessible to anonymous users. A public catalogue is available at https://epilab.ich.ucl.ac.uk/nada, in which three of these studies have been made available. In practice, releasing study metadata has proved problematic in some cases. This is due to contractual constraints imposed on the use of third-party datasets, and to intellectual property concerns from principal investigators whose study design has not yet been published. We therefore see value in providing a limited, high-level metadata record, encoded in DDI, in which variable-level information is optional.
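To illustrate what such a limited record might look like, the following is a minimal Python sketch. It assumes a simplified subset of DDI Codebook (2.x) element names (`codeBook`, `stdyDscr`, `citation`, `titlStmt`, `titl`, `IDNo`, `stdyInfo`, `abstract`, `dataDscr`, `var`, `labl`). It omits the DDI namespace and is not validated against the DDI schema. The function name `build_study_record` and the example identifiers are hypothetical. The variable-level `dataDscr` section is emitted only when variables are supplied.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def build_study_record(study_id: str, title: str, abstract: str,
                       variables: Optional[List[Tuple[str, str]]] = None) -> str:
    """Build a simplified, DDI Codebook-style study record (hypothetical helper).

    Element names follow DDI Codebook 2.x conventions, but the output is
    a sketch: no namespace, minimal content, not schema-validated.
    """
    code_book = ET.Element("codeBook", {"ID": study_id})

    # High-level study description: always present
    study = ET.SubElement(code_book, "stdyDscr")
    citation = ET.SubElement(study, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title
    ET.SubElement(title_stmt, "IDNo").text = study_id
    info = ET.SubElement(study, "stdyInfo")
    ET.SubElement(info, "abstract").text = abstract

    # Variable-level metadata is optional: omit dataDscr entirely when withheld
    if variables:
        data = ET.SubElement(code_book, "dataDscr")
        for name, label in variables:
            var = ET.SubElement(data, "var", {"name": name})
            ET.SubElement(var, "labl").text = label

    return ET.tostring(code_book, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example: a public, high-level record with no variables
    print(build_study_record("STUDY-001", "Example cohort study",
                             "High-level description only."))
```

With this design, the same study could be described publicly at the high level, while a fuller record including `dataDscr` is kept in a restricted catalogue.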

Secure enclave metadata catalogue

As mentioned above, the catalogue curation process has required public and private catalogue entries to be kept separate. For the time being, this is managed by maintaining two separate instances of the NADA catalogue tool, which provides suitable assurance of protection against unauthorised access. For the future management of the public catalogues, we intend to investigate a feature of NADA that supports federation through DDI harvester functionality. This would allow independently managed DDI catalogues to scale within a wider federated community of services.

Integration of a DDI based toolkit for the documentation of studies

The Nesstar Publisher application was originally purchased for use in this project. Since the project was completed, the licence holders have made the tool freely available. As a result, we have been able to deploy it as standard on all internal user desktops, making the production of DDI documents readily achievable for most users. We are currently developing course content to train all users in this tool, so that we can start standardising all high-level study metadata while leaving variable-level data as an optional element.

A by-product of this project has been the identification of a suitable metadata-driven data collection and survey tool, REDCap, produced by Vanderbilt University in the USA. UCL has joined the REDCap consortium (the first UK institution to do so) and has promoted this service within UCL with accompanying training, coordinated by Anthony Thomas.

Dissemination of lessons learned

The lessons of the SERPent project have been widely disseminated at academic and technical meetings, within UCL and among colleagues at the Medical Research Council.

  • UCL Biorepository workshop presentation (internal strategy discussion)
  • Dr Castillo chaired one of the sessions on 'Searching and Locating' and presented a paper entitled "The use of DDI tools and standards in epidemiology & public health research" (Abstract, Slides)
  • Five-day workshop to explore best practice in the use of DDI in longitudinal studies, resulting in a co-authored paper entitled "Metadata for the Longitudinal Data Life Cycle".
  • Presentation at the NCRI-NCI 2011 Joint Conference (invitation only), as part of the Tools and Technologies discussion
  • Workshop held in conjunction with the Open Data Foundation and sponsored by the Medical Research Council (MRC Data Management Workshop), to be hosted at the Institute of Child Health, focusing on the use of metadata standards in health research. We hope this will be the first in a series of similar meetings to explore issues and promote awareness of best practice in data management. (Details to be announced shortly)

Promotion of transparent information governance arrangements

The MRC Centre of Epidemiology for Child Health has commissioned a new website development project and plans to include high-level metadata descriptions of all projects as data feeds into the new site (expected completion August 2011).

Training material and guidelines

Anthony Thomas is developing a training course on basic concepts in metadata markup and on the use of Nesstar Publisher to represent study metadata. This course is one component of the centre website redevelopment project.

Effective network of database managers across UCL biomedical sciences

Aida Sanchez and Spiros Denaxas have established a UCL data managers' forum, with an associated website. The forum meets regularly to discuss relevant issues in healthcare research data management.

Common framework of data management in UCL Population Health Sciences

UCL is in the process of restructuring its existing faculties. This raises the profile of population health and epidemiology and will allow us to focus on promoting data management practice within the new structure. The new Faculty of Population Health Sciences brings all of the existing SERPent consortium partners together, since it includes the Institute of Child Health, the Institute for Women's Health and the Department of Epidemiology and Public Health.