Retrospective

The Retrospective session was facilitated by Pascal Heas and examined user perceptions and the issues encountered in using DDI to annotate the selected studies.

General comments

The general view of participants was that the Nesstar Publisher application had been easy to use and made the process of creating DDI fairly intuitive and straightforward. The application is particularly useful when used in conjunction with full datasets, since it provides a range of QA functions that allow the data to be examined and quality issues to be addressed.

Note: Nesstar Publisher is now available for free to download at: http://www.nesstar.com/software/download.html

National Study of HIV in Pregnancy and Childhood (NSHPC)

NSHPC worked with two types of datasets:

  1. MS-Access data, including derived data
  2. STATA dataset that is derived from the Access database for reporting purposes using an R-script.

The representation of some composite key relationships in the Access database proved to be impossible in DDI, and there are some limitations in the built-in data typing that DDI provides. The data manager commented that it was difficult to gauge how much annotation was needed.

DDI documentation has successfully been produced for the MS-Access database covering questionnaire variables, their data types and possible categories of values. The document is accessible through the NADA catalogue as are some supporting resources such as questionnaires.
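To give a sense of what variable-level DDI of this kind looks like, the following is a minimal sketch that builds a single DDI Codebook (2.x) style `var` element using Python's standard library. The variable name, label and categories are invented for illustration and are not taken from the NSHPC database. The element names (`var`, `labl`, `catgry`, `catValu`, `varFormat`) are assumed to follow the DDI Codebook schema, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_var(var_id, name, label, var_type, categories):
    """Return a DDI Codebook-style <var> element.

    categories maps a stored code (e.g. "1") to its label (e.g. "Yes").
    """
    var = ET.Element("var", {"ID": var_id, "name": name})
    ET.SubElement(var, "labl").text = label
    for code, cat_label in categories.items():
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = code
        ET.SubElement(catgry, "labl").text = cat_label
    # varFormat records the basic data type ("numeric" or "character").
    ET.SubElement(var, "varFormat", {"type": var_type})
    return var


# Hypothetical questionnaire variable, for illustration only.
example_var = build_var(
    var_id="V1",
    name="delivery_mode",
    label="Mode of delivery",
    var_type="numeric",
    categories={"1": "Vaginal", "2": "Caesarean section", "9": "Not known"},
)
example_xml = ET.tostring(example_var, encoding="unicode")
print(example_xml)
```

The coarse `type` attribute on `varFormat` hints at the data typing limitation noted above: richer database types have to be reduced to a small set of values.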

Further refinement of the DDI is possible. The existing documentation was extremely useful, but the exercise of producing DDI for the database highlighted that refining and structuring the current method of documenting the database would help the NSHPC team and make it easier to automate DDI production.

Whitehall II

From the metadata perspective, the output of the project consists of a DDI-XML document that provides comprehensive information at the survey and variable level. This initial effort focused on two core data files of the Whitehall II study but can now easily be replicated for other phases, which could rapidly provide a full collection in a web-based catalogue for end users.

From the tools perspective, the project resulted in a series of very useful utilities. While some may be fairly specific to the study, they can easily be adjusted to work in other contexts. The SAS to DDI2 export script is quite generic and represents a significant contribution to the community, as many SAS users face a similar challenge when it comes to producing DDI-XML for their datasets.
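The project's export script is not reproduced here. As a rough illustration of the general idea only, the sketch below assumes that variable-level metadata has already been exported from SAS to a CSV listing and turns it into a skeletal DDI2-style document. The column names (`NAME`, `TYPE`, `LABEL`), the 1/2 type coding and the sample variables are all assumptions made for this example, not details of the Whitehall II files.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed export of variable metadata, e.g. written out from SAS as CSV.
# Column names and the 1/2 type coding are assumptions for this sketch.
SAMPLE_CSV = """NAME,TYPE,LABEL
AGE,1,Age at screening
SEX,1,Sex of participant
GRADE,2,Employment grade
"""

TYPE_MAP = {"1": "numeric", "2": "character"}


def contents_to_ddi(csv_text):
    """Build a skeletal <codeBook><dataDscr> tree from variable metadata."""
    code_book = ET.Element("codeBook")
    data_dscr = ET.SubElement(code_book, "dataDscr")
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        var = ET.SubElement(data_dscr, "var", {"ID": f"V{i}", "name": row["NAME"]})
        ET.SubElement(var, "labl").text = row["LABEL"]
        ET.SubElement(var, "varFormat", {"type": TYPE_MAP[row["TYPE"]]})
    return code_book


code_book = contents_to_ddi(SAMPLE_CSV)
print(ET.tostring(code_book, encoding="unicode"))
```

Because the conversion depends only on a flat listing of names, types and labels, the same approach could in principle be pointed at other statistical packages that can export such a listing.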

As with any innovative project, there is always room for improvement and enhancement. However, we believe this use case to have been very successful. It has demonstrated the advantages of using DDI and has produced quality metadata for the Whitehall II survey, along with a series of useful tools.

Whitehall II chose one phase of the data collection (phase 7 from 2002), which was relatively 'clean' data. The ability to manage data across different phases of the study is not a feature of the version of DDI that was being used. The data manager would have liked to be able to devote more time to the work and is very interested in identifying further resources to allow for ongoing work in this area. The promise of DDI v3 is of particular interest.

UK Collaborative Trial of Ovarian Cancer Screening (UKCTOCS)

Early in the life of the SERPent project, it became apparent that two aspects of UKCTOCS set it slightly apart from standard data sets. Firstly, UKCTOCS is principally a cancer screening trial, and as such the collection of demographic and questionnaire-related data has not been the main focus of the trial. Secondly, UKCTOCS is ongoing: screening continues to the end of 2011 and further questionnaires are proposed for mid-2014. As a result, when compiling high-level metadata with the DDI2-based Metadata Editor, it was often difficult to judge how to enter information that was ongoing and/or incomplete into a package predominantly designed to cater for completed data sets. However, it has been possible to generate DDI documentation covering the principal demographic and survey-based data sets of UKCTOCS.

Multiple clinical trials are managed out of the Institute for Women’s Health, many of them using TMS software based on the UKCTOCS TMS. It is therefore going to be a much easier task to generate comparable XML for these studies with the software developed for UKCTOCS.

At the outset of this initiative the UKCTOCS team had only a very basic comprehension of the value of good quality metadata. However, the SERPent project has enabled DDI documentation to be produced for a large section of the UKCTOCS data. The documentation thus created has proved invaluable in the dissemination of data and in the facilitation of collaborative ventures. Plans are already underway to generate similar DDI documentation for other data sets currently managed and maintained by the Institute for Women’s Health. Furthermore, the team now feel that all future clinical trial projects would greatly benefit from metadata being considered at the start of a project rather than at the end.

UK Collaborative Study of Congenital Heart Defects

The data manager found Nesstar Publisher extremely simple to use and was able to mark up her study without the need for any assistance. The variable data was originally contained within STATA files and not all variables were fully annotated. Where variable annotations were missing, it was simple to add these using Nesstar Publisher. Both long and wide representations of the data were considered by the data manager, but it was noted that Nesstar Publisher was unable to transform between these two formats, which would have been very useful. This study is in part representative of similar surveillance studies that are coordinated from the British Paediatric Surveillance Unit, and it may be useful for the BPSU to consider developing a standard template for use with Nesstar Publisher.
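For readers unfamiliar with the distinction, the sketch below illustrates a wide-to-long transformation of the kind that would have to be carried out outside Nesstar Publisher, for example in a script run before import. It uses plain Python with invented field names (`child_id`, `weight_1`, `weight_2`) and is not drawn from the study's own data or tooling.

```python
def wide_to_long(rows, id_field, value_prefix):
    """Reshape wide records (one row per subject) into long records.

    Columns named <value_prefix><n> become one output row each, with the
    suffix n kept as the 'visit' number. Missing values (None) are skipped.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(value_prefix) and value is not None:
                long_rows.append({
                    id_field: row[id_field],
                    "visit": int(column[len(value_prefix):]),
                    "value": value,
                })
    return long_rows


# Hypothetical wide data: one weight column per follow-up visit.
wide = [
    {"child_id": "A01", "weight_1": 3.2, "weight_2": 5.9},
    {"child_id": "A02", "weight_1": 2.8, "weight_2": None},
]
long_rows = wide_to_long(wide, id_field="child_id", value_prefix="weight_")
for r in long_rows:
    print(r)
```

In the wide form each repeated measurement needs its own annotated variable, whereas the long form needs only one measurement variable plus a visit indicator, which is one reason the choice of representation affects how much DDI annotation is required.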

OMA & Caliber

The data manager was able to annotate both studies; however, because these studies are currently in the design phase, he did not feel able to publish the metadata on the public catalogue. The data manager had some difficulty reflecting the relational model as outputs in DDI and thought that the final metadata did not fully capture the study detail. This was particularly problematic with respect to hospital and primary care data.

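How do you currently work?

| | UKCSCHD | Caliber/OMA | Whitehall II | UKCTOCS | NSHPC |
|---|---|---|---|---|---|
| e-Docs | y | y | y | y | n |
| paper | y | n | y | y | y |
| database | y | y | y | y | y |
| spreadsheet | y | n | y | y | y |
| statistical packages | y | n | y | n | y |
| separate admin | y | y | y | n | n |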

What's missing?

| | UKCSCHD | Caliber/OMA | Whitehall II | UKCTOCS | NSHPC |
|---|---|---|---|---|---|
| sensitive flag | y | y | y | n | y |
| derived data | y | y | y | y | y |
| better terminology support | n | y | n | n | n |
| support for versioning | n | y | y | y | y |

What would you use in the future?

| | UKCSCHD | Caliber/OMA | Whitehall II | UKCTOCS | NSHPC |
|---|---|---|---|---|---|
| DDI_share | y | probably | y | y | y |
| DDI_archive | y | probably | y | y | y |
| Instrument_reg | unlikely | n | n | y | y |
| q_design | probably | n | n | y | n |
| access arrangements | limited exclusive access to primary researchers | limited exclusive access to primary researchers | controlled public access | collaborative access among scientists | collaborative access among scientists |
| data sharing plan | n | y | y | n | n |
| citation standards | n | y | n | n | n |
| registered open access db | y | n | n | n | n |
| registered public website | y | y | y | n | y |
| microdata submission | n | n | n | n | n |