2025-26 Project (Tazare & McDonald & Williamson)
High-dimensional approaches for addressing confounding in vaccine research using routinely collected electronic health data
SUPERVISORY TEAM
Supervisor
Dr John Tazare at LSHTM
Email: john.tazare1@lshtm.ac.uk
Co-Supervisor
Dr Helen McDonald at LSHTM
Email: him24@bath.ac.uk
Co-Supervisor
Dr Elizabeth Williamson at LSHTM
Email: elizabeth.williamson@lshtm.ac.uk
PROJECT SUMMARY
Project Summary
Electronic health record (EHR) data are a critical resource for studying real-world effectiveness and safety of vaccines. This project tackles a key concern surrounding confounding due to differences between vaccinated and unvaccinated individuals. Many determinants of vaccine uptake are complex and not directly measurable in EHR data. Recent work has identified a set of proxy markers for health-seeking behaviour and healthcare access in EHR. Multiple proxy markers may be needed to capture the complex determinants of vaccine uptake. Data-driven/high-dimensional approaches are increasingly adopted across EHR research more generally and may offer improved ability to characterise these complex concepts and mitigate confounding.
Project objectives:
(1) Develop theoretical and data-driven frameworks for identifying key potential confounders
(2) Compare high-dimensional approaches with an investigator-led confounding strategy, applied to a vaccine safety/effectiveness case study (e.g. real-world effectiveness of seasonal influenza or RSV vaccine)
(3) Evaluate high-dimensional approaches to provide recommendations for future studies.
Project Key Words
Electronic health records, vaccine research
MRC LID Themes
- Health Data Science
- Infectious Disease
- Translational and Implementation Research
Skills
MRC Core Skills
- Quantitative skills
- Interdisciplinary skills
Skills we expect a student to develop/acquire whilst pursuing this project
This project will develop the student’s skills in statistical methodology, statistical modelling, machine learning, and vaccine epidemiology using EHR data. Successful completion of the project will result in a framework and software, which will be made freely available, for researchers to apply the generated approaches for addressing confounding in vaccine studies using EHR data. This will be an invaluable resource which is likely to be widely used in ongoing and future vaccine studies and could be adapted for other studies using these data.
Routes
Which route/s are available with this project?
- 1+4 = Yes
- +4 = Yes
Possible Master’s programme options identified by supervisory team for 1+4 applicants:
- LSHTM – MSc Epidemiology
- LSHTM – MSc Health Data Science
- LSHTM – MSc Medical Statistics
Full-time/Part-time Study
Is this project available for full-time study? Yes
Is this project available for part-time study? Yes
Location & Travel
Students funded through MRC LID are expected to work on site at their primary institution, meeting – at the minimum – the institutional research degree regulations and expectations. Students may also be required to travel for conferences (up to 3 over the duration of the studentship), and for any required training (for research degree study). Other travel expectations and opportunities highlighted by the supervisory team are noted below.
Primary location for duration of this research degree: LSHTM, London
Travel requirements for this project: The project can be completed without travel to additional sites. However, there is the possibility of travelling for short periods (e.g. short courses/research visits), if desired.
Eligibility/Requirements
Particular prior educational requirements for a student undertaking this project
- Minimum LSHTM institutional eligibility criteria for doctoral study.
- MSc in medical statistics, health data science, epidemiology or equivalent. Students holding an MSc in a less quantitative area may still be eligible if they can demonstrate sufficient quantitative aptitude/experience.
Other useful information
- Potential Industrial CASE (iCASE) conversion? = No
PROJECT IN MORE DETAIL
Scientific description of this research project
Electronic health record (EHR) data are invaluable for health research, including the real-world effectiveness and safety of vaccines. A key challenge for estimating causal effects using EHR is the identification and selection of important confounders. Data-driven approaches offer promise, but there is limited evidence on their performance against traditional adjustment strategies. This project tackles a key concern of observational vaccine studies: how to address confounding due to differences between vaccinated and unvaccinated individuals.
Many determinants of vaccine uptake are complex and not directly measurable in EHR data. Recent work has identified a set of proxy markers for health-seeking behaviour and healthcare access in EHR. A range of markers are needed to capture these complex phenomena, and each marker may behave differently. Extension of this approach to other factors influencing vaccine uptake, such as frailty and disability, may improve control of confounding, but their complexity poses a challenge for traditional variable selection and adjustment. This project will explore the potential value of a high-dimensional data-driven approach to characterise and adjust for complex confounders in vaccine studies.
The objectives of this PhD are to
(1) Develop theoretical and data-driven frameworks for identifying markers of potential confounders
(2) Compare high-dimensional approaches with an investigator-led confounding adjustment strategy in a case study
(3) Evaluate the application of high-dimensional approaches across different scenarios.
Techniques to be used
(1) Identifying markers of potential confounders
The student will use the ‘3Cs’ model of vaccine hesitancy to extend previous work on markers of health-seeking behaviour and healthcare access to other complex determinants of vaccine uptake. The student will identify (a) a parsimonious set of potential confounders based on a theoretical framework and (b) a high-dimensional set of markers using data-driven approaches (e.g. the high-dimensional propensity score algorithm or deep learning).
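As an illustration of the data-driven route, one step of the high-dimensional propensity score algorithm is to rank candidate binary markers by their potential to confound, commonly using the Bross bias multiplier. The sketch below is a minimal version of that ranking step only, under stated assumptions: the function names, the marker names and the eight-person toy dataset are hypothetical, and the code is not the project's method or software.

```python
import numpy as np

def bross_log_bias(covariate, exposure, outcome):
    """Absolute log Bross bias multiplier for one binary covariate."""
    c = np.asarray(covariate, dtype=bool)
    e = np.asarray(exposure, dtype=bool)
    d = np.asarray(outcome, dtype=bool)
    p_c1 = c[e].mean()    # covariate prevalence among the vaccinated
    p_c0 = c[~e].mean()   # covariate prevalence among the unvaccinated
    risk_with = d[c].mean() if c.any() else np.nan
    risk_without = d[~c].mean() if (~c).any() else np.nan
    if not (risk_with > 0 and risk_without > 0):
        return 0.0        # association not estimable; give lowest priority
    rr = risk_with / risk_without
    if rr < 1:
        rr = 1.0 / rr     # use the reciprocal for protective associations
    bias = (p_c1 * (rr - 1) + 1) / (p_c0 * (rr - 1) + 1)
    return float(abs(np.log(bias)))

def rank_covariates(X, exposure, outcome, names, top_k):
    """Return the top_k (name, score) pairs, highest potential bias first."""
    scores = [bross_log_bias(X[:, j], exposure, outcome)
              for j in range(X.shape[1])]
    order = np.argsort(scores)[::-1][:top_k]
    return [(names[j], scores[j]) for j in order]

# Hypothetical toy data: 4 vaccinated and 4 unvaccinated individuals.
exposure = np.array([1, 1, 1, 1, 0, 0, 0, 0])
outcome = np.array([1, 1, 0, 1, 1, 0, 0, 0])
marker_a = np.array([1, 1, 1, 0, 1, 0, 0, 0])  # imbalanced between groups
marker_b = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # balanced between groups
X = np.column_stack([marker_a, marker_b])

ranked = rank_covariates(X, exposure, outcome, ["marker_a", "marker_b"], top_k=2)
```

In this toy example the imbalanced marker ranks above the balanced one, which has a bias multiplier of exactly 1. In practice the candidate markers would number in the hundreds or thousands, and only the highest-ranked ones would enter the propensity score model.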
(2) Comparison of high-dimensional approaches with an investigator-led adjustment strategy in a case study
A vaccine safety/effectiveness case study will be conducted using a causal analysis framework applying both adjustment strategies from Objective (1). The use of machine learning for variable selection in high-dimensional approaches will be investigated. Results under the different strategies will be compared empirically and using established diagnostic measures. Case study selection will be tailored to the student's preference: suitable examples could include seasonal influenza vaccine effectiveness and real-world effectiveness of the newly introduced RSV vaccine.
(3) Evaluation of the use of high-dimensional approaches across a range of contexts
High-dimensional approaches will be applied to a second case study and assessed in plasmode simulation studies to evaluate their performance across a range of contexts. Results from the case studies and simulations will be synthesised to provide recommendations for confounder adjustment in vaccine studies.
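For readers unfamiliar with plasmode simulation, the idea is to keep the observed covariates and exposure from a real cohort, so that their joint structure is preserved, and to simulate only the outcome from a fitted model in which the exposure effect is set to a known value. The sketch below is a minimal illustration under stated assumptions: the cohort is synthetic stand-in data rather than EHR data, and the function names, coefficients, sample sizes and the choice of a logistic outcome model are all hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        # Tiny ridge term keeps the Hessian invertible.
        hess = X.T @ (X * w[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def plasmode_datasets(covariates, exposure, outcome, true_log_or,
                      n_sims, n_sample, rng):
    """Yield simulated cohorts with a known exposure log odds ratio."""
    n = len(exposure)
    design = np.column_stack([np.ones(n), exposure, covariates])
    beta = fit_logistic(design, outcome)
    beta[1] = true_log_or  # replace the estimated effect with the chosen truth
    for _ in range(n_sims):
        # Resample rows with replacement: exposure and covariates stay as observed.
        idx = rng.integers(0, n, size=n_sample)
        p = 1.0 / (1.0 + np.exp(-design[idx] @ beta))
        yield covariates[idx], exposure[idx], rng.binomial(1, p)

# Synthetic stand-in for an observed cohort (assumption, not real data).
rng = np.random.default_rng(0)
n = 2000
covariates = rng.binomial(1, 0.3, size=(n, 3)).astype(float)
p_exp = 1.0 / (1.0 + np.exp(-(-0.5 + covariates[:, 0])))
exposure = rng.binomial(1, p_exp).astype(float)
p_out = 1.0 / (1.0 + np.exp(-(-2.0 + 0.8 * covariates[:, 0] - 0.3 * exposure)))
outcome = rng.binomial(1, p_out).astype(float)

true_log_or = np.log(0.6)  # hypothetical "true" vaccine effect to recover
estimates = []
for x_sim, e_sim, y_sim in plasmode_datasets(
        covariates, exposure, outcome, true_log_or,
        n_sims=20, n_sample=5000, rng=rng):
    design_sim = np.column_stack([np.ones(len(e_sim)), e_sim, x_sim])
    estimates.append(fit_logistic(design_sim, y_sim)[1])
```

Because the true effect is known by construction, the estimates from any adjustment strategy (here a fully adjusted logistic model) can be compared against `true_log_or` to assess bias across scenarios.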
Confirmed availability of databases
All supervisors have access to CPRD databases and are members of the Electronic Health Record Research Group at LSHTM.
Potential risks to the project and plans for mitigation
We have extensive experience obtaining approval for CPRD data and using these data to study vaccines. In the unlikely scenario these data were unavailable, we have experience of suitable alternatives, e.g. OpenSAFELY.
Additional information from the supervisory team
The supervisory team has provided a recording for prospective applicants who are interested in their project. This recording should be watched before any discussions begin with the supervisory team.
Tazare-McDonald-Williamson recording
MRC LID LINKS
- To apply for a studentship: MRC LID How to Apply
- Full list of available projects: MRC LID Projects
- For more information about the DTP: MRC LID About Us