
Boden lab: Bioinformatics

Research Topics

Thanks to major advances in biotechnology and instrumentation, biology is becoming an information-centred science. The field of bioinformatics draws on computer science, mathematics and statistics to enable discoveries in biological data sets. Our research aims to develop, investigate and apply bioinformatics methodologies to understand and resolve a range of open problems in genomics, molecular and systems biology. To this end, my group uses probabilistic modelling, machine learning and data analytics in particular, and we regularly publish prediction services and computational tools that are open to the scientific community.

Biological data are now available at scales that challenge our ability to process and analyse them. On the flip side, greater scale gives statistical power to distinguish biologically meaningful signals from mere noise or artefacts, i.e. to identify "drivers" and "determinants" of function and structure. Sometimes the number of features that describe each observation is so great that we must use biological expertise to constrain the search for signals.

Broadly put, our research aims to

  1. effectively manage the complexity of operations involved in analysing millions of sequence reads, thousands of genomes, and proteomes of thousands of dynamically regulated molecules, etc.
  2. enable the seamless aggregation (or integration) of uncertain and incomplete data, typical of the next wave of biotechnology, across genomics, proteomics, structural biology, etc., and the use of biological expertise
  3. empower the interpretation of "whole system" data, aimed at understanding the basis of disease and other scientifically relevant phenotypes, using statistics and machine learning

Publications

Invited talks

ASR symposium @ UNSW Sydney 23 Mar 2023

Library Design for Protein Engineering @ Okinawa, Japan 6-9 Nov 2023

International Conference on Bioinformatics 12-15 Nov 2023

Funded Projects

TRIAGE: A disease agnostic computational and modelling platform to accelerate variant classification (Palpant, Yanes, Stark, Ingles, Harvey, McGaughran, Fatkin, Atherton, Hill, Bodén, Mallett, Bagnall, Shah, Coombes, Richardson) (Medical Research Future Fund 2021 Genomics Health Futures Mission; Dec 2022 - 2025)

Over a third of all patients who undergo genetic testing do not have a known cause of disease. This research program implements new methods to reveal genes that are the likely cause of disease. We couple these predictions with disease-agnostic modelling to determine whether specific disease variants identified in patients are the cause of disease. Collectively, these approaches will accelerate the classification of disease-causing variants for any disease.

What drives the Anterior Expansion of the Central Nervous System? (Thor, Piper, Bodén) (Australian Research Council Discovery Project 230101750; Jan 2023-Dec 2025)

A striking and highly conserved feature of the central nervous system is that the brain is larger than the spinal cord. Despite the manifest implications this has for nervous system function, the underlying drivers are largely unknown. This project aims to investigate the mechanisms controlling anterior expansion of the central nervous system, and will generate new knowledge in the areas of nervous system development and evolution. It also aims to advance our understanding of nervous system function, develop bioinformatics tools with broad utility within the biosciences, strengthen Australia’s international standing in developmental neuroscience, and enhance the capacity for interdisciplinary international collaborations.

What is the common factor driving brain overgrowth in autism? Investigating the relationship between epigenetic marks and neural stem cell proliferation (Piper, Thor, Bodén) (The Simons Foundation Autism Research Initiative 2022 Pilot Award; 2022 - 2024)

We will analyze how methylation of H3K36 regulates neural development. The rationale behind our work is that H3K36 methylation-mediated regulation of neural stem cell (NSC) proliferation is a central mechanism controlling brain size during development, and that deficits in this process lead to megalencephaly and ASD.

Dual-function ribonucleases: unexpected agents of antibiotic resistance (with Hugenholtz, Schenk, Soo and Schofield) (National Health and Medical Research Council 2010390; Jan 2022 - Dec 2024)

Antibiotic resistance is a major global problem. Metallo-β-lactamases (MBLs) are of particular concern as they inactivate the most widely prescribed class of antibiotics, the β-lactams, and no clinically useful MBL inhibitor is currently available. While investigating environmental reservoirs of one MBL subgroup (B3), we identified major lineages of zinc-dependent ribonucleases (RNaseZ and β-CASP nucleases), which share a common ancestor with B3 MBLs. Unexpectedly, representatives of these distantly related enzymes had a moderate to pronounced ability to confer β-lactam resistance, pointing to an unrecognized reservoir of antibiotic resistance that may comprise thousands of genes across multiple biomes and hosts (including bacteria, viruses, fungi and animals). Furthermore, we experimentally demonstrated that removal of the β-CASP domain from β-CASP nucleases results in enhanced β-lactamase activity. This raises the possibility that such ribonucleases could be "weaponised" for antibiotic resistance. This project will use deep sequence analysis to map the reservoir of these dual-function enzymes (Aim 1), reconstruct ancestral forms in order to define factors that underpin their antibiotic-degrading activity (Aim 2), and evaluate how this activity can evolve in real time from ribonucleases (Aim 3), to link the past, present and future states of a highly effective antibiotic resistance mechanism.

EnzOnomy - an enzyme-based production pipeline for the bioeconomy (with Schenk, Guddat, Hine and Sieber) (Australian Research Council Discovery Project 210101802; Jan 2021 - Dec 2023)

This project aims to harness the potential of protein engineering to develop a technology (EnzOnomy) to convert renewable raw material (e.g. sugar) into platform chemicals (e.g. isobutanol, a building block for jet fuels, fibers, plastics and antioxidants). It sets out to use a raft of data-driven analyses to identify and use biological sequence variation to optimise enzymes for different applications, including the use and continued development of phylogenetics-centred tools from the lab.

Reconstructing protein populations of the past to explain functional specificity and engineer biological diversity (with Gillam, Kobe and Rost) (Australian Research Council Discovery Project 160100865; Jan 2016 - Dec 2019)

The aim of this project is to develop computational methods to construct entirely new proteins that operate in combinations Nature never tried. Computational reconstruction of enzymes that have been extinct for over 400 million years has revealed remarkable opportunities for biotechnological innovation. The intended outcomes are to develop bioinformatics methods to broaden the scope of ancestral protein reconstruction to include protein super-families, to establish what specific changes led to the evolutionary success of a protein, and to re-run evolution to generate proteins that perform in conditions suitable for industrial and agricultural applications, in particular the production of hydroxylated fatty acids for bioplastics.

Tracing nature's template: Using statistical machine learning to evolve biocatalysts (with Gillam, UQ) (Australian Research Council Discovery Project 120101772; Jan 2012 - Dec 2015)

Proteins like the P450 enzymes are highly versatile biological catalysts with untapped potential to improve the efficiency of chemical industries, lessen their environmental impact, reduce drug development costs, and help to remediate environmental contamination. In this project we use statistical machine learning to reveal how to redesign proteins for industrial use based on the examples Nature has refined over millions of years. By making and experimentally characterising a large library of mutant proteins, we use statistical methods to detect previously hidden relationships between protein sequence and structural stability, then mine these data using machine learning to predict how best to design commercially useful proteins.

A systems biology approach to elucidate common principles and mechanisms underlying triplet repeat expansion associated genetic defects (with Balasubramanian, Monash University; Arumugam, UQ; Wiles, UQ; Sarsero, Murdoch Childrens Research Inst.) (National Health and Medical Research Council Project 1004112; June 2011 - Dec 2014)

Several human genetic diseases that affect the nervous system occur due to expansions of DNA repeats in the genome. This project uses a combination of cutting-edge technologies such as systems biology and genomics to uncover the common principles and use them to devise novel therapeutic strategies. Specifically, we develop and/or use a range of bioinformatics methodologies to integrate information in large-scale biological data sets to construct models capable of predicting the occurrence of expanding repeats.

Group members

The group anno 2023 (Nov). From left: Linh, Ruyi, Olly, Gabe, Mikael, Sam, Seb, Chongting, Zach, Sanjana

The lab's new USA contingent, Nov 2023. From left: Ariane, Brad

The group anno 2023 (Oct). From left: Gabe, Mikael, Olly, Linh, Seb, Sanjana, Will, Ruyi, Brad

The group anno 2021. From left: Brad, Ariane, Gabe, Sam, Mikael, Rich. In the background: Phaedra, Kieran

The group anno 2016. From left: Rhys, Gabe, Sean, Julian, Marnie, Alex, Mikael, Yosephine, Chris, Burkhard (visiting).

Alumni

Authors

Created by admin (Mikael Boden) on 2017/01/17 09:30.

Contributing authors: