2024-25 Project (Morris & Carey & Lewin)
Analysing aggregated electronic health records data, with application to European data on children with rare congenital anomalies
SUPERVISORY TEAM
Supervisor
Professor Joan Morris at SGUL
Email: jmorris@sgul.ac.uk
Co-Supervisor
Dr Iain Carey at SGUL
Email: sgjd450@sgul.ac.uk
Co-Supervisor
Dr Alex Lewin at LSHTM
Email: alex.lewin@lshtm.ac.uk
PROJECT SUMMARY
Project Summary
There has been a large increase in the availability of routinely collected electronic health care data from different health care settings across Europe. Many countries are unable to share data on individual cases but may provide aggregate data and analytic results. However, there is a lack of statistical methods to combine results from different sources when the samples within each data source are small, as occurs when analysing rare congenital anomalies and/or specific medications in pregnancy. Methods are being developed to combine data from Kaplan-Meier survival curves (to analyse the survival of children with congenital anomalies) and to combine data from median values (to analyse the length of hospital stays). However, these methods need to be refined further. The Observational Health Data Sciences and Informatics (OHDSI) program is a multi-stakeholder, interdisciplinary collaborative whose aim is to bring out the value of health data through large-scale analytics. All their solutions are open-source. They currently have an “EvidenceSynthesis” package (in R), which only includes the fitting of random-effects and fixed-effect meta-analysis models.
The aim of this PhD is to continue the collaboration with researchers in EUROlinkCAT (which evaluates the health and educational achievements of children with specific congenital anomalies compared to children without those anomalies across Europe) and ConcePTION (which evaluates the safety of medications in pregnancy) to develop the methods they require. A further aim is to contact the OHDSI program to suggest updates to their “EvidenceSynthesis” package so that it includes more tools for synthesising data, particularly when the data are sparse.
Project Key Words
Meta-analysis, Survival Analysis, R programming
MRC LID Themes
- Global Health = Yes
- Health Data Science = Yes
- Infectious Disease = Yes
- Translational and Implementation Research = Yes
Skills
MRC Core Skills
- Quantitative skills = Yes
- Interdisciplinary skills = No
- Whole organism physiology = No
Skills we expect a student to develop/acquire whilst pursuing this project
Ability to analyse data sets in R and to write open-source R scripts. Detailed knowledge about meta-analytic techniques. Knowledge of health care data sources across Europe as well as general knowledge about congenital anomalies and safety of medications in pregnancy will be acquired.
Routes
Which route/s is this project available for?
- 1+4 = Yes
- +4 = Yes
Possible Master’s programme options identified by supervisory team for 1+4 applicants:
- LSHTM – MSc Medical Statistics
Full-time/Part-time Study
Is this project available for full-time study? Yes
Is this project available for part-time study? Yes
Eligibility/Requirements
Particular prior educational requirements for a student undertaking this project
- SGUL’s standard institutional eligibility criteria for doctoral study.
- The candidate needs a strong mathematical background, preferably with some statistical knowledge.
Other useful information
- Potential CASE conversion? = No
PROJECT IN MORE DETAIL
Scientific description of this research project
Objectives
To extend methods of combining results from electronic health care databases across Europe with an application to the health of children with congenital anomalies (CAs).
Background
There is a wealth of information in routine electronic health care databases in Europe. However, many countries are unable to share individual record-level data but may provide aggregate data and analytic results. The EUROlinkCAT study linked data on health and education in children with specific CAs across Europe, created a common data model and ran scripts comparing the health and educational achievements of children with specific CAs with reference cohorts of children in different European countries. There is a lack of statistical methods to combine analytic results from different sources when the samples within each data source are small. Methods are therefore being developed to combine data from Kaplan-Meier estimates (survival of children with CAs) and to combine data from median values (length of hospital stays). These methods need to be refined further, and additional methods for combining data with categorical exposures and outcomes would be useful.
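As an illustration of the kind of aggregate-data synthesis described above, the following is a minimal sketch of one generic way to pool survival probabilities reported by several sites at a single time point. It is not the method being developed in EUROlinkCAT. It assumes each site shares only a Kaplan-Meier estimate and its standard error (for example from Greenwood's formula), transforms to the complementary log-log scale with the delta method, and pools by inverse-variance weighting. It is written in Python for brevity, although the project itself envisages R; the function name and the example numbers are hypothetical.

```python
import math


def pool_survival_cloglog(estimates):
    """Pool site-level survival probabilities at one time point.

    estimates: list of (survival, standard_error) pairs, one per site.
    Returns (pooled_survival, lower_95, upper_95).
    """
    sum_w = 0.0
    sum_w_theta = 0.0
    for surv, se in estimates:
        # The transform is undefined when survival is exactly 0 or 1, which
        # is common in very small samples; this sketch simply rejects such
        # sites rather than attempting a correction.
        if not 0.0 < surv < 1.0 or se <= 0.0:
            raise ValueError("each site needs 0 < survival < 1 and se > 0")
        theta = math.log(-math.log(surv))
        # Delta method: d/dS log(-log S) = 1 / (S * log S)
        se_theta = se / (surv * abs(math.log(surv)))
        weight = 1.0 / se_theta ** 2
        sum_w += weight
        sum_w_theta += weight * theta

    theta_hat = sum_w_theta / sum_w
    se_hat = math.sqrt(1.0 / sum_w)

    def back_transform(theta):
        return math.exp(-math.exp(theta))

    # The transform is decreasing in survival, so the bounds swap.
    lower = back_transform(theta_hat + 1.96 * se_hat)
    upper = back_transform(theta_hat - 1.96 * se_hat)
    return back_transform(theta_hat), lower, upper


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    sites = [(0.90, 0.04), (0.85, 0.05), (0.92, 0.03)]
    print(pool_survival_cloglog(sites))
```

The explicit error for survival estimates of exactly 0 or 1 marks the point where small samples break this simple approach, which is the setting the project targets.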
All methods developed will have great applicability across a range of subject areas. This is extremely timely, as there has been a large increase in the availability of data from different health care settings and increased motivation to use such data. For example, the ConcePTION consortium is establishing a platform across Europe in which to analyse routine health care data to establish the safety of medications in pregnancy. Such novel methods would be of immediate use in this consortium. The Observational Health Data Sciences and Informatics (OHDSI) program is a multi-stakeholder, interdisciplinary collaborative whose aim is to bring out the value of health data through large-scale analytics. All their solutions are open-source. They currently have an “EvidenceSynthesis” package (in R), which only includes the fitting of random-effects and fixed-effect meta-analysis models.
The aim of this PhD is to continue the collaboration with researchers in EUROlinkCAT and ConcePTION to develop the methods they require. A further aim is to contact the OHDSI program to suggest updates to their “EvidenceSynthesis” package so that it includes more tools for synthesising data, particularly when the data are sparse.
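For readers unfamiliar with the two standard models mentioned above, the following is a minimal sketch of fixed-effect (inverse-variance) pooling and of a random-effects model using the DerSimonian-Laird moment estimator of the between-site variance. This is a textbook formulation chosen for illustration; it is not taken from the “EvidenceSynthesis” package and makes no claim about how that package is implemented. It is written in Python for brevity, although the project itself envisages R, and the function names are hypothetical.

```python
def fixed_effect(estimates, variances):
    """Inverse-variance pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    return pooled, 1.0 / total


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, its variance, and tau^2."""
    k = len(estimates)
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_fe, _ = fixed_effect(estimates, variances)

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, estimates))
    c = total - sum(w ** 2 for w in weights) / total

    # Moment estimator of the between-site variance, truncated at zero.
    # With a single site c is zero, so tau^2 is set to zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    re_weights = [1.0 / (v + tau2) for v in variances]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / re_total
    return pooled, 1.0 / re_total, tau2
```

When there are few sites, the moment estimate of the between-site variance is itself very imprecise, which is one reason these standard models can perform poorly on sparse data.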
Techniques to be used
- Literature review identifying potential methods
- Refining published methods to be suitable for small samples
- Simulation studies to evaluate proposed methods using R
- Testing new methods on real data
- Producing guidance on appropriate analysis techniques
- Writing open-source R scripts for use by EUROlinkCAT, ConcePTION and OHDSI
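To give a flavour of the simulation studies listed above, the following is a minimal sketch of how the behaviour of a pooling method in small samples might be checked. It assumes a deliberately simple scenario that is not taken from the project: each site reports the mean and estimated variance of a normally distributed outcome, the site results are combined by inverse-variance weighting, and the simulation records the bias and the coverage of the nominal 95% confidence interval. It is written in Python for brevity, although the project itself envisages R; the function name and default settings are hypothetical.

```python
import math
import random
import statistics


def simulate_coverage(n_sites, n_per_site, true_mean=0.0, sd=1.0,
                      n_reps=2000, seed=1):
    """Estimate bias and 95% CI coverage of inverse-variance pooling.

    n_per_site must be at least 2 so that a sample variance exists.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    covered = 0
    bias_sum = 0.0
    for _ in range(n_reps):
        sum_w = 0.0
        sum_wy = 0.0
        for _ in range(n_sites):
            sample = [rng.gauss(true_mean, sd) for _ in range(n_per_site)]
            site_mean = statistics.fmean(sample)
            # Each site shares only its mean and the estimated variance
            # of that mean, mimicking aggregate-only data sharing.
            var_of_mean = statistics.variance(sample) / n_per_site
            weight = 1.0 / var_of_mean
            sum_w += weight
            sum_wy += weight * site_mean
        pooled = sum_wy / sum_w
        se = math.sqrt(1.0 / sum_w)
        if abs(pooled - true_mean) <= 1.96 * se:
            covered += 1
        bias_sum += pooled - true_mean
    return {"bias": bias_sum / n_reps, "coverage": covered / n_reps}


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(n, simulate_coverage(n_sites=5, n_per_site=n))
```

Comparing the coverage across site sizes shows how treating estimated variances as known can make intervals too narrow when each site contributes only a handful of observations.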
Confirmed availability of any required databases or specialist materials
The lead supervisor is the PI of the EUROlinkCAT consortium and lead statistician in ConcePTION. Hence, whilst individual permissions are necessary from the respective data owners, such permissions are likely to be given.
Potential risks to the project and plans for their mitigation.
Access to data from EUROlinkCAT and ConcePTION may be refused. In this case, data owners who are collaborating with the OHDSI program will be approached and asked to collaborate. Alternatively, published data (e.g. EUROSTAT) and simulated data may be used.
Additional information from the supervisory team
- The supervisory team has provided a recording for prospective applicants who are interested in their project. This recording should be watched before any discussions begin with the supervisory team: Morris-Carey-Lewin Recording
- The following website may be of interest: https://www.eurolinkcat.eu/
MRC LID LINKS
- To apply for a studentship: MRC LID How to Apply
- Full list of available projects: MRC LID Projects
- For more information about the DTP: MRC LID About Us