Boden lab: Bioinformatics

Thanks to major advances in biotechnology and instrumentation, biology is becoming an information-centred science. The field of bioinformatics draws on computer science, mathematics and statistics to enable discoveries in biological data sets. Our research aims to develop, investigate and apply bioinformatics methodologies to understand and resolve a range of open problems in genomics, molecular and systems biology. To this end, my group uses probabilistic modelling, machine learning and data analytics in particular, and we regularly publish prediction services and computational tools that are open to the scientific community.

Biological data are now available at scales that challenge our ability to process and analyse them. On the flip side, greater scale gives statistical power to distinguish biologically meaningful signals from mere noise or artefacts, i.e. to identify "drivers" and "determinants" of function and structure. Sometimes the number of features describing each observation is so great that we must use biological expertise to constrain the search for signals.

Broadly put, our research aims to

  1. effectively manage the complexity of operations involved in analysing millions of sequence reads, thousands of genomes, and proteomes of thousands of dynamically regulated molecules, etc.
  2. enable the seamless aggregation (or integration) of uncertain and incomplete data, typical of the next wave of biotechnology, across genomics, proteomics, structural biology, etc., while making use of biological expertise
  3. empower the interpretation of "whole system" data, aimed at understanding the basis of disease and other scientifically relevant phenotypes, using statistics and machine learning


TRIAGE: A disease agnostic computational and modelling platform to accelerate variant classification (Palpant, Yanes, Stark, Ingles, Harvey, McGaughran, Fatkin, Atherton, Hill, Bodén, Mallett, Bagnall, Shah, Coombes, Richardson) (Medical Research Future Fund 2021 Genomics Health Futures Mission; Dec 2022 - 2025)

Over a third of all patients who undergo genetic testing do not have a known cause of disease. This research program implements new methods to reveal genes that are the likely cause of disease. We couple these predictions with disease-agnostic modelling to determine whether specific disease variants identified in patients are the cause of disease. Collectively, these approaches will accelerate the classification of disease-causing variants for any disease.

What is the common factor driving brain overgrowth in autism? Investigating the relationship between epigenetic marks and neural stem cell proliferation (Piper, Thor, Bodén) (The Simons Foundation Autism Research Initiative 2022 Pilot Award; 2022 - 2024)

We will analyze how methylation of H3K36 regulates neural development. The rationale behind our work is that H3K36 methylation-mediated regulation of neural stem cell (NSC) proliferation is a central mechanism controlling brain size during development, and that deficits in this process lead to megalencephaly and autism spectrum disorder (ASD).

Dual-function ribonucleases: unexpected agents of antibiotic resistance (with Hugenholtz, Schenk, Soo and Schofield) (National Health and Medical Research Council 2010390; Jan 2022 - Dec 2024)

Antibiotic resistance is a major global problem. Metallo-β-lactamases (MBLs) are of particular concern as they inactivate the most widely prescribed class of antibiotics, the β-lactams, and no clinically useful MBL inhibitor is currently available. While investigating environmental reservoirs of one MBL subgroup (B3), we identified major lineages of zinc-dependent ribonucleases (RNaseZ and β-CASP nucleases), which share a common ancestor with B3 MBLs. Unexpectedly, representatives of these distantly related enzymes had a moderate to pronounced ability to confer β-lactam resistance, pointing to an unrecognized reservoir of antibiotic resistance that may comprise thousands of genes across multiple biomes and hosts (including bacteria, viruses, fungi and animals). Furthermore, we experimentally demonstrated that removal of the β-CASP domain from β-CASP nucleases results in enhanced β-lactamase activity. This raises the possibility that such ribonucleases could be "weaponised" for antibiotic resistance. This project will use deep sequence analysis to map the reservoir of these dual-function enzymes (Aim 1), reconstruct ancestral forms in order to define factors that underpin their antibiotic-degrading activity (Aim 2), and evaluate how this activity can evolve in real time from ribonucleases (Aim 3), to link the past, present and future states of a highly effective antibiotic resistance mechanism.

EnzOnomy - an enzyme-based production pipeline for the bioeconomy (with Schenk, Guddat, Hine and Sieber) (Australian Research Council Discovery Project 210101802; Jan 2021 - Dec 2023)

This project aims to harness the potential of protein engineering to develop a technology (EnzOnomy) to convert renewable raw material (e.g. sugar) into platform chemicals (e.g. isobutanol, a building block for jet fuels, fibers, plastics and antioxidants). It sets out to use a raft of data-driven analyses to identify and exploit biological sequence variation to optimise enzymes for different applications, including the use and continued development of phylogenetics-centred tools from the lab.

Reconstructing protein populations of the past to explain functional specificity and engineer biological diversity (with Gillam, Kobe and Rost) (Australian Research Council Discovery Project 160100865; Jan 2016 - Dec 2019)

The aim of this project is to develop computational methods to construct entirely new proteins that operate in combinations Nature never tried. Computational reconstruction of enzymes that have been extinct for over 400 million years has revealed remarkable opportunities for biotechnological innovation. The intended outcomes are to develop bioinformatics methods to broaden the scope of ancestral protein reconstruction to include protein super-families, to establish what specific changes led to the evolutionary success of a protein, and to re-run evolution to generate proteins that perform in conditions suitable for industrial and agricultural applications, in particular the production of hydroxylated fatty acids for bioplastics.

Tracing nature's template: Using statistical machine learning to evolve biocatalysts (with Gillam, UQ) (Australian Research Council Discovery Project 120101772; Jan 2012 - Dec 2015)

Proteins like the P450 enzymes are highly versatile biological catalysts with untapped potential to improve the efficiency of chemical industries, lessen their environmental impact, reduce drug development costs, and help to remediate environmental contamination. In this project we use statistical machine learning to reveal how to redesign proteins for industrial use based on the examples Nature has refined over millions of years. By making and experimentally characterising a large library of mutant proteins, we use statistical methods to detect previously hidden relationships between protein sequence and structural stability, then mine these data using machine learning to predict how best to design commercially useful proteins.

A systems biology approach to elucidate common principles and mechanisms underlying triplet repeat expansion associated genetic defects (with Balasubramanian, Monash University; Arumugam, UQ; Wiles, UQ; Sarsero, Murdoch Childrens Research Inst.) (National Health and Medical Research Council Project 1004112; June 2011 - Dec 2014)

Several human genetic diseases that affect the nervous system are caused by expansions of DNA repeats in the genome. This project uses a combination of cutting-edge approaches, such as systems biology and genomics, to uncover the common principles underlying these expansions and to use them to devise novel therapeutic strategies. Specifically, we develop and/or use a range of bioinformatics methodologies to integrate information in large-scale biological data sets to construct models capable of predicting the occurrence of expanding repeats.

The group anno 2021. From left: Brad, Ariane, Gabe, Sam, Mikael, Rich. In the background: Phaedra, Kieran.

The group anno 2016. From left: Rhys, Gabe, Sean, Julian, Marnie, Alex, Mikael, Yosephine, Chris, Burkhard (visiting).

  • Sorelle Bowman, Master student
  • Anoushka Shah, Master student
  • Hisatake Ishida, Winter research scholar
  • Rhys Newell, PhD student (at TRI with Gene Tyson & Ben Woodcroft)
  • Diya Prabhuram, Master student
  • Richard Pienaar, Master student
  • Kieran Convery, Project student
  • Woo Jun (Chris) Shim, PhD student, now Postdoc at IMB/UQ
  • Wenyu Pan, Summer research scholar/Master student
  • Qinglan Ou, Summer research scholar
  • Anastassia Demeschko, Summer research scholar
  • Alice Schulz, Summer research scholar
  • Tarisha Moodley, Master student
  • Asli Yoruk, Master student
  • Kate Wathen-Dunn, Master student, previously Sugar Research Australia, now Queensland Government, Department of Natural Resources, Mines and Energy
  • Alexandra Essebier, PhD student, now Data scientist at REDD Digital
  • Jhih-Siang (Sean) Lai, PhD student
  • Julian Zaugg, PhD student, now Bioinformatician/Software developer at Australian Centre for Ecogenomics
  • Suzanne Butcher, PhD student (primary supervisor Christine Wells)
  • Glen van den Bergen, PhD student (primary supervisor Alan Mark)
  • Marnie Lamprecht, Research officer
  • Yosephine Gumulya, Research officer, now Research Scientist at CSIRO
  • Jun Xu, Master student
  • Isaac Asamoah, MPhil student
  • Aniek Roelofs, Research intern from University of Amsterdam
  • Ron Ramsay, Undergraduate research student
  • Ralph Patrick, PhD student, now Postdoc at Victor Chang Cardiac Research Institute
  • Timothy O'Connor, PhD student, now bioinformatician at Illumina San Diego
  • Sitthichoke Subpaiboonkit, PhD student
  • Patricia Vera Wolf, Master of Bioinformatics student
  • Sam Dai, MPhil student (primary supervisor Bostjan Kobe)
  • Yufei Wang, Summer Research Scholar
  • Sebastian Seitz, Visiting student from TU Munich
  • Coralie Horin, Research Intern from Université Nice Sophia Antipolis
  • Danny Lee, Biotechnology Honours student
  • Elham Alhathli, Master of Bioinformatics student
  • Ahmed Mehdi, PhD student, now Postdoc at UQ/Diamantina
  • Diego Moncayo, Master student
  • Minh Duc Cao, Postdoc, now at Gritstone Oncology/US
  • Kai Willadsen, Postdoc
  • German Ibarra, MPhil student
  • Samir Lal, Honours student, now Computational Biologist at Pfizer
  • Benjamin Merlet, Research Intern from Université Nice Sophia Antipolis


  • Last modified: 2022/10/09 15:18
  • by mikael