Steven Watterson

Logo

Computational Cardiovascular Research @ Ulster University

Ideas List

Background

At the Northern Ireland Centre for Stratified Medicine, part of Ulster University, we recruited approx. 2500 patients across a range of immune and pro-inflammatory diseases, collecting their clinical data, genome sequence data and protein activity data in order to better learn how to stratify patients into specific sub-groups depending on disease and treatment characteristics. Our research is focused on following range of degenerative inflammatory/immune related diseases that each have shared genetic or lifestyle determinants: (a) cardiovascular disease, (b) diabetes, (c) rheumatoid arthritis, (d) mental health, (e) alzheimer’s and (f) cancer, including multiple myeloma, prostate, breast and head & neck cancers.

We have measurements of whole genome sequences, methylation, protein expression and abundence, and the microbiome of patients recruited in each of the above disease areas. By interrogating their phenotypic and omics data we aim to better inform the diagnoses and treatment, particularly for comorbid conditions that share a pro-inflammatory etiology. This work is driving the discovery and validation of biomarkers associated with disease predisposition and response to therapy, with the intenntion of ultimately devising more effective therapies.

Project topics

We are looking for projects that yield tools that better enable us to mine this data for insight. The general challenges we face include i) the automation of analysis, ii) the combinatorial scaling of analysis when subgroups are scanned and iii) the validation of mechanisms driving identified.

Idea 1 - Software to simulate health and disease

We have produced a number of openly available mathematical and computational models of cardiovascular disease. However, they currently exist in formats that are too obscure for clinicians and lab biologists to use. We would love to turn these models into easy-to-use predictive software tools with which clinicians or biologist could predict the effect of a mutation or a drug.

Expected outcomes: a piece of desktop software or a web app.

Required skills: Project management. GUI development. An understanding of numerical integration. We’re largely platform agnostic. An interest in biomedical science would be useful, but not essential.

Possible mentors: Dr Steven Watterson, Dr Priyank Shukla

Difficulty rating: Low

Idea 2 – Automated model optimisation

We have produced a number of mathematical and computational models of cardiovascular disease at the centre. An enduring challenge to using these models is that a significant proportion of the parameters have never been measured and must be estimated by fitting to experimental data. It would be very useful to be able to map out the dynamics of such models across the space of the unknown parameters and to identify the regions of parameter space corresponding to various model dynamics.

Expected outcomes: a function or small library.

Required skills: Project management. An understanding of numerical integration. We’re largely platform agnostic. An interest in biomedical science would be useful, but not essential.

Possible mentors: Dr Steven Watterson, Dr Priyank Shukla

Difficulty rating: Medium

Idea 3 - A tool set to extract the most update gene annotations of a given species

The NCBI (National Center for Biotechnology Information) provides biological databases which are queried by thousands of biomedical researchers daily all over the world. The NCBI Gene database stores records of all genes discovered or predicted and is updated regularly from the ever growing scientific literature. We often need to map gene identifiers among several different ID systems. However, for historical reason, almost all genes have different names in different databases, causing confusions and mismatched annotations. It would dbe very useful to have a tool set that can query the NCBI Gene database and others to extract the most up to date gene annotations.

Users would specify a set of gene identifiers from one ID system (Gene Symbol, Entrez Gene ID, RefSeq, GeneBank IDs, Protein ID, etc) and the tool will query the NCBI databases using the APIs provided by NCBI (Entrez Programming Utilities) to retrieve the latest gene records that map to the identifiers input. Such a tool set will be extremely useful in biomedical and bioinformatics research as the retrieved records would then serve as inputa to other bioinformatics applications. This would add a new and invaluable tool to the NCBI GitHub repository.

Expected outcomes: a function or small library.

Required skills: Project management. API experience. We’re largely platform agnostic. An interest in biomedical science would be useful, but not essential.

Possible mentors: Dr Shu-Dong Zhang, Dr Priyank Shukla

Difficulty rating: Medium

Idea 4 – Tools for the identification of patient subgroups

We currently use machine learning methods with patient data to build predictors of a patients disease or treatment outcomes. We would be very interested in software tools that enable us identify the patient subgroups for which the accuracy of prediction increases. This could be done by enumerating all permutations of patient subgroups and globally searching this space for the best performing classifications or locally by undertaking crude sweeps of patient subgroups and then using convergent algorithms (simulated annealing or genetic algorithms) to identify patient subgroups that can be classified effectively.

Expected outcomes: a function or small library.

Required skills: Project management. An understanding of machine learning and optimisation. We’re largely platform agnostic. An interest in biomedical science would be useful, but not essential.

Possible mentors: Dr Steven Watterson, Dr Priyank Shukla

Difficulty rating: High

Idea 5 - CellWhere MyoNeuro: automated molecular network schema for neuromuscular disease

When lab biologists study the genetic and molecular biology of diseases like Motor Neuron Disease or Duchenne Muscular Dystrophy, it’s very important to be able to conceptualize what is happening inside the diseased cell. The cell is a complex structure composed of many compartments, each containing thousands of different proteins. Each protein can physically bind to other proteins and the set of all possible binds describes molecular interaction networks (MINs). Large databases exist describing our knowledge of proteins, the compartments to which they belongand and the other proteins to which they bind.

Our existing CellWhere tool creates a schema of the cell and its compartments, building a MIN around a query list of proteins. At present, CellWhere treats the cell as a generic structure. We would like to transition CellWhere to become an open-source resource. Coding will focus on web-design (PHP, Javascript, CSS, etc) and there is scope for great creativity in realizing beautiful, stylistic representations of the cell. Along the way, we would automate extraction of cell schema for the neurons and muscle cells involved in neuromuscular disease.

Expected outcomes: a new open source code-base.

Required skills: Project management. PHP. Javascript. CSS. An interest in biomedical science would be useful, but not essential.

Possible mentors: Dr Bill Duddy, Dr Steven Watterson

Difficulty rating: High

Idea 6 - Your own idea!

Do you have a background in bioinformatics? Are there any tools you’d like to develop that are applicable to multi-disease, multi-omic’s data sets?

Expected outcomems: You tell us!

Required skills: You tell us!

Potential mentors: Dr Steven Watterson, Dr Priyank Shukla, Dr Shu-Dong Zhang, Dr Bill Duddy

Difficulty rating: You tell us!