Quantitative Skills for Large Data Sets
Theme Leads: Elizabeth Williamson (LSHTM) and David Strachan (SGUL)
Observational epidemiology is changing with increasing awareness of the limitations of standard analytical methods. The increasing availability of large and complex datasets, enhanced by extensive linkage between diverse data sources, requires highly skilled quantitative researchers who can adapt traditional analytic approaches and develop new ones. This applies particularly to extending current ‘big data’ methods (that are appropriate for prediction) to allow causal inference. This theme thus addresses the MRC skill priority in quantitative skills and the MRC strategic research aims of investigating health through the life course and how lifestyles and environment affect health.
LSHTM and SGUL have been at the forefront of major recent developments in, and applications of, research methods for large data sets. Priorities include developing and applying optimal approaches to analysing linked electronic healthcare records (e.g. Clinical Practice Research Datalink) and developing novel methods for record linkage. Both institutions have a strong track record in the development and application of methods for causal inference – directed acyclic graphs, propensity-score-based methods, instrumental variable methods including Mendelian randomization, mediation analysis within life-course epidemiology, structural models and g-methods – and related issues, including methods for handling missing data (with a focus on multiple imputation) and measurement error methods. Both institutions also have strong track records in methods for disease modelling and projections of disease (distributed lag non-linear models, methods based on Markov chain Monte Carlo, approximate Bayesian computation, emulation, and development of methods for efficient and rigorous fitting of complex models to data).
Interdisciplinary work is facilitated by the LSHTM Centres for Statistical Methodology and for Mathematical Modelling of Infectious Disease. A key feature of this theme is our commitment to methodological development to answer important causal medical and epidemiological questions, with our students applying novel methods to high-profile epidemiological studies. Students will benefit from extensive academic collaborations and partnerships with, for example, the Farr Institute of Health Informatics, the MRC Clinical Trials Unit, PHE, and industry partners (e.g. GSK). This programme will equip a cohort of researchers with strong quantitative skills and a breadth of experience in developing, applying and evaluating complex quantitative methods for causal inference in diverse settings. Training will be delivered through existing short courses (e.g. Causal Inference in Epidemiology; Statistical Analysis with Missing Data; Mathematical Modelling), supplemented by accredited modules providing general epidemiological and statistical skills.