2025-26 Project (Carroll & Pittman)
Advancing genotype-phenotype correlations with Multiplex Assays of Variant Effect
SUPERVISORY TEAM
Supervisor
Dr Christopher Carroll at City St George’s
Email: ccarroll@sgul.ac.uk
Co-Supervisor
Dr Alan Pittman at City St George’s
Email: apittman@sgul.ac.uk
PROJECT SUMMARY
Project Summary
In this project, the student will integrate next-generation sequencing genomic data with high-content functional analyses of gene variant function to improve genetic diagnosis of rare diseases. Data from Multiplex Assays of Variant Effect (MAVE) experiments are anticipated to provide insight into gene variant function and to improve the diagnostic yield for individuals suspected of having a genetic disease. MAVE data will be incorporated into bioinformatic pipelines for the analysis of whole-genome and whole-exome datasets from cohorts of individuals with neurodevelopmental and neuromuscular disorders. The intended outcome of these studies is a molecular diagnosis for individuals with rare disease, allowing for genetic counselling and gene- or variant-specific therapies.
Project Key Words
MAVE, next-generation sequencing, genetics, genomics, bioinformatics
MRC LID Themes
- Health Data Science
- Translational and Implementation Research
Skills
MRC Core Skills
- Quantitative skills
Skills we expect a student to develop/acquire whilst pursuing this project
Bioinformatic pipeline scripting in Python and R, with analyses performed on SGUL STATS3/HPC, covering whole-exome and whole-genome sequencing workflows: raw data processing, read alignment, variant calling and annotation. Database development using MySQL.
Routes
Which route/s are available with this project?
- 1+4 = Yes
- +4 = Yes
Possible Master’s programme options identified by supervisory team for 1+4 applicants:
- City St George’s – MRes Biomedical Science – Clinical Biomedical Research
- City St George’s – MRes Biomedical Science – Reproduction and Development
- City St George’s – MSc Genomic Medicine
Full-time/Part-time Study
Is this project available for full-time study? Yes
Is this project available for part-time study? Yes
Location & Travel
Students funded through MRC LID are expected to work on site at their primary institution, meeting – at the minimum – the institutional research degree regulations and expectations. Students may also be required to travel for conferences (up to 3 over the duration of the studentship), and for any required training (for research degree study). Other travel expectations and opportunities highlighted by the supervisory team are noted below.
Primary location for duration of this research degree: City St George’s, London
Travel requirements for this project: None
Eligibility/Requirements
Particular prior educational requirements for a student undertaking this project
- Minimum City St George’s institutional eligibility criteria for doctoral study.
- 2:1 undergraduate degree or Merit MSc in biology or related degree
Other useful information
- Potential Industrial CASE (iCASE) conversion? = No
PROJECT IN MORE DETAIL
Scientific description of this research project
In 2022, the NHS released its five-year strategy “Accelerating genomic medicine in the NHS” with the ambition of firmly embedding genomics into routine healthcare. However, the recently completed 100,000 Genomes Project pilot study achieved a molecular diagnosis for only around 35% of patients with rare diseases, indicating that significant challenges in molecular diagnostics need to be overcome before these ambitions can be realised. The main challenge is distinguishing pathogenic from benign variants. Clinical assessment of variants for pathogenicity is often inconclusive, with approximately 50% of around three million curated variants in ClinVar classified as “variants of uncertain significance” (VUS), largely because of a lack of functional evidence of variant effect. To address this gap, Multiplex Assays of Variant Effect (MAVE) have recently been developed for large-scale functional assessment of genetic variants. In this project, we will apply functional data from MAVE experiments to the clinical interpretation of genetic variants in patients with rare diseases.
Project Objectives
1. Curate MAVE datasets
MAVE data for relevant rare disease genes will be obtained from MaveDB, a repository for datasets from Multiplexed Assays of Variant Effect, and from our own MAVE experiments. We will focus on rare diseases within our main research interests, including neurodevelopmental and neuromuscular disorders. A local database of MAVE data will be established for use in the annotation of whole-genome and whole-exome sequencing data.
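As an illustration only, a local store of MAVE scores could be prototyped along the following lines. This is a minimal sketch, not the project's actual design: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, column names, gene symbols, variants and scores are all hypothetical placeholders.

```python
import sqlite3


def build_mave_db(rows, path=":memory:"):
    """Create a small table of MAVE scores and load the given rows.

    The schema is hypothetical: one functional score per
    (gene, protein-level variant, source score set).
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS mave_scores (
            gene     TEXT NOT NULL,
            hgvs_pro TEXT NOT NULL,
            score    REAL NOT NULL,
            source   TEXT NOT NULL,
            PRIMARY KEY (gene, hgvs_pro, source)
        )
        """
    )
    conn.executemany(
        "INSERT OR REPLACE INTO mave_scores VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def lookup_score(conn, gene, hgvs_pro):
    """Return all (score, source) pairs recorded for one variant."""
    cur = conn.execute(
        "SELECT score, source FROM mave_scores "
        "WHERE gene = ? AND hgvs_pro = ? ORDER BY source",
        (gene, hgvs_pro),
    )
    return cur.fetchall()


# Made-up example rows; these are not real genes, variants or scores.
EXAMPLE_ROWS = [
    ("GENE1", "p.Arg10Trp", 0.12, "example_scoreset_A"),
    ("GENE1", "p.Ala20Val", 0.97, "example_scoreset_A"),
    ("GENE2", "p.Gly5Asp", 0.45, "in_house"),
]

conn = build_mave_db(EXAMPLE_ROWS)
print(lookup_score(conn, "GENE1", "p.Arg10Trp"))
```

Keying on gene, variant and source keeps scores from different assays of the same variant side by side rather than overwriting one another.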
2. Apply MAVE data to analysis of whole-genome and whole-exome sequencing data
We will use MAVE datasets to aid clinical interpretation of variants of uncertain significance in individuals with a suspected genetic disorder, drawing on the 100,000 Genomes Project, our own in-house database of exome-sequencing and genome-sequencing datasets from more than 3,000 individuals, ClinVar and the literature. A bioinformatic pipeline for variant annotation with MAVE-derived, variant-specific functional scores will be developed and used to annotate whole-genome and whole-exome sequencing datasets.
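The annotation step can be pictured as a lookup join between called variants and a table of functional scores. The sketch below is a simplified assumption about how that might look, not the pipeline itself: the record fields (`gene`, `hgvs_pro`, `mave_score`), the sample identifiers and all values are hypothetical, and a real pipeline would work from annotated VCF files rather than in-memory dictionaries.

```python
def annotate_variants(variants, mave_scores):
    """Return copies of variant records with a 'mave_score' field added.

    variants    : list of dicts, each with 'gene' and 'hgvs_pro' keys
    mave_scores : dict mapping (gene, hgvs_pro) -> functional score
    Variants with no matching score are annotated with None.
    """
    annotated = []
    for variant in variants:
        key = (variant["gene"], variant["hgvs_pro"])
        annotated.append({**variant, "mave_score": mave_scores.get(key)})
    return annotated


# Made-up scores and variant calls, for illustration only.
example_scores = {
    ("GENE1", "p.Arg10Trp"): 0.12,
    ("GENE1", "p.Ala20Val"): 0.97,
}
example_variants = [
    {"sample": "S1", "gene": "GENE1", "hgvs_pro": "p.Arg10Trp"},
    {"sample": "S2", "gene": "GENE3", "hgvs_pro": "p.Leu7Pro"},
]

for record in annotate_variants(example_variants, example_scores):
    print(record)
```

Returning new records instead of modifying the input keeps the original calls intact, which makes it easier to compare interpretations with and without MAVE scores.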
3. Evaluate the impact of applying MAVE datasets to the clinical interpretation of variants of uncertain significance
Whole-exome and whole-genome datasets will be re-evaluated with MAVE scores, and the clinical interpretation of variants of uncertain significance will be assessed according to American College of Medical Genetics and Genomics (ACMG) criteria.
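One way to picture this evaluation is to count how many variants of uncertain significance gain functional evidence once MAVE scores are available. The sketch below makes several loud assumptions: the thresholds are arbitrary placeholders, lower scores are taken to mean loss of function (the direction is assay-specific), and the labels PS3 and BS3 refer to the ACMG/AMP functional-evidence codes. It does not perform a full ACMG classification, which combines many evidence types.

```python
from collections import Counter


def functional_evidence(score, abnormal_below=0.3, normal_above=0.8):
    """Map a MAVE score to a functional-evidence label.

    Thresholds are hypothetical; in practice they would be calibrated
    per assay against known pathogenic and benign variants.
    """
    if score is None:
        return "none"            # variant not covered by any MAVE dataset
    if score <= abnormal_below:
        return "PS3"             # functionally abnormal in the assay
    if score >= normal_above:
        return "BS3"             # functionally normal in the assay
    return "indeterminate"       # score falls between the two thresholds


def summarise_vus(vus_scores):
    """Count evidence labels across a list of VUS scores."""
    return Counter(functional_evidence(score) for score in vus_scores)


# Made-up scores for five hypothetical VUS.
example_vus_scores = [0.1, 0.9, 0.5, None, 0.95]
print(summarise_vus(example_vus_scores))
```

A summary of this kind gives a rough measure of how many VUS could move towards a likely pathogenic or likely benign interpretation once functional evidence is added.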
Techniques to be used
Bioinformatic pipeline scripting in Python and R, with analyses performed on SGUL STATS3/HPC, covering whole-exome and whole-genome sequencing workflows: raw data processing, read alignment, variant calling and annotation. Database development using MySQL.
Confirmed availability of any required databases or specialist materials
MaveDB is an open-source platform to distribute and interpret data from MAVE studies.
ClinVar is a publicly available resource.
The project supervisors are members of several GeCIP domains of the 100,000 Genomes Project.
Potential risks to the project and plans for their mitigation
The required data are already available, so risks are low. Risks may include interruption to computing networks or corruption of data. The university performs regular data backups, and some analyses can be performed on local hardware.
Additional information from the supervisory team
The supervisory team has provided a recording for prospective applicants who are interested in their project. This recording should be watched before any discussions begin with the supervisory team.