Thanks to major advances in biotechnology and instrumentation, biology is becoming an information-centred science. The field of bioinformatics draws on computer science, mathematics and statistics to enable discoveries in biological data sets. Our research aims to develop, investigate and apply bioinformatics methodologies to understand and resolve a range of open problems in genomics, molecular and systems biology. Recent applications involve protein sorting, nuclear protein organisation, mechanisms of transcriptional regulation, sequence and structure determinants of protein function and modification, and protein engineering.
Biological data are now available at scales that challenge our ability to process and analyse them. On the flip side, greater scale gives statistical power to distinguish biologically meaningful signals from mere noise or artefacts, i.e. to identify "drivers" and "determinants" of function and structure. Sometimes the number of features that describe each observation is so great that we must use biological expertise to constrain the search for signals.
Broadly put, our research aims to
Reconstructing protein populations of the past to explain functional specificity and engineer biological diversity (with Gillam, Kobe and Rost) (Australian Research Council Discovery Project 160100865; Jan 2016 - Dec 2018)
The aim of this project is to develop computational methods to construct entirely new proteins that operate in combinations Nature never tried. Computational reconstruction of enzymes that have been extinct for over 400 million years has revealed remarkable opportunities for biotechnological innovation. The intended outcomes are to develop bioinformatics methods to broaden the scope of ancestral protein reconstruction to include protein super-families, to establish what specific changes led to the evolutionary success of a protein, and to re-run evolution to generate proteins that perform in conditions suitable for industrial and agricultural applications, in particular the production of hydroxylated fatty acids for bioplastics.
Tracing nature's template: Using statistical machine learning to evolve biocatalysts (with Gillam, UQ) (Australian Research Council Discovery Project 120101772; Jan 2012 - Dec 2014)
Proteins like the P450 enzymes are highly versatile biological catalysts with untapped potential to improve the efficiency of chemical industries, lessen their environmental impact, reduce drug development costs, and help to remediate environmental contamination. In this project we use statistical machine learning to reveal how to redesign proteins for industrial use based on the examples Nature has refined over millions of years. We make and experimentally characterise a large library of mutant proteins, use statistical methods to detect previously hidden relationships between protein sequence and structural stability, and then mine these data using machine learning to predict how best to design commercially useful proteins.
A systems biology approach to elucidate common principles and mechanisms underlying triplet repeat expansion associated genetic defects (with Balasubramanian, Monash University; Arumugam, UQ; Wiles, UQ; Sarsero, Murdoch Childrens Research Inst.) (National Health and Medical Research Council Project 1004112; June 2011 - Dec 2013)
Several human genetic diseases that affect the nervous system are caused by expansions of DNA repeats in the genome. This project combines cutting-edge approaches from systems biology and genomics to uncover the principles common to these diseases and to use them to devise novel therapeutic strategies. Specifically, we develop and/or use a range of bioinformatics methodologies to integrate information in large-scale biological data sets to construct models capable of predicting the occurrence of expanding repeats.
The group anno 2016. From left: Rhys, Gabe, Sean, Julian, Marnie, Alex, Mikael, Yosephine, Chris, Burkhard (visiting).
…