Boden lab: Bioinformatics

Research Topics

Thanks to major advances in biotechnology and instrumentation, biology is becoming an information-centred science. The field of bioinformatics draws on computer science, mathematics and statistics to enable discoveries in biological data sets. Our research aims to develop, investigate and apply bioinformatics methodologies to understand and resolve a range of open problems in genomics, molecular and systems biology. To this end, my group uses probabilistic modelling, machine learning and data analytics in particular, and we regularly publish prediction services and computational tools that are open to the scientific community.

Biological data are now available at scales that challenge our ability to process and analyse them. On the flip side, greater scale gives statistical power to distinguish biologically meaningful signals from mere noise or artefacts, i.e. to identify "drivers" and "determinants" of function and structure. Sometimes the number of features (that describe each observation) is so great that we must use (biological) expertise to constrain the search for signals.

Broadly put, our research aims to

effectively manage the complexity of operations involved in analysing millions of sequence reads, thousands of genomes, and proteomes of thousands of dynamically regulated molecules, etc.
enable the seamless aggregation (or integration) of uncertain and incomplete data, typical of the next wave of biotechnology, across genomics, proteomics, structural biology, etc., and the use of biological expertise
empower the interpretation of "whole system" data, aimed at understanding the basis of disease and other scientifically relevant phenotypes, using statistics and machine learning

Publications

Invited talks

ASR symposium @ UNSW Sydney 23 Mar 2023

Library Design for Protein Engineering @ Okinawa, Japan 6-9 Nov 2023

International Conference on Bioinformatics 12-15 Nov 2023

Funded Projects

Synergistic industry partnership for bioproduction of platform chemicals (Schenk, Bodén, Hine, Guddat, Evans, Ley, Sieber) (ARC Linkage Project LP250100335; 2026-2028)

Isobutanol is an attractive chemical for the manufacture of a range of products (e.g. rubber, aviation fuels), but has yet to reach competitive yields and pricing. This project integrates the pioneering technique of ancestral sequence reconstruction, in combination with AI, to engineer biocatalysts for the competitive production of isobutanol using both cell-free reaction cascades and fermentative processes. The scalability and commercial impact of both production pathways will be monitored by technoeconomic analyses. This project aims to deliver a rapid pipeline for the design of optimal biocatalysts for sustainable bioproduction processes, generate valuable IP, and contribute to the growth of the aviation fuel industry in Australia.

Ancestral Sequence Reconstruction meets protein Language Models (Bodén, Rost) Queensland Department of Environment and Science 2025-2026

The broad objective is to develop and deliver a proof-of-principle framework that “smartly” re-designs nature’s enzymes to produce high-value fine chemicals under industrial conditions; our target enzymatic cascade creates isobutanol from sugar, addressing critical needs in Queensland and Bavaria's biotechnology sectors. We integrate the enzyme-family-specific insight of Ancestral Sequence Reconstruction with the protein universe learned by Protein Language Models, enabling the design of novel proteins.

Learning from the past to design drugs for the future (Schenk, Gumulya, Bodén, Evans) (National Health and Medical Research Council 2036797) 2025-2028

Bacterial pathogens use a group of enzymes called metallo-β-lactamases (or MBLs) that enable them to be multi-drug resistant. Our project aims to determine how MBLs have evolved from harmless and widespread enzymes, from deep in the past until very recently in response to modern use of antibiotics. These insights will provide valuable functional and structural information that will enable the development of urgently needed strategies to combat β-lactam antibiotic resistance.

AI-designing enzymes for the bioeconomy (Bodén, Grimm) (Queensland-Bavaria Collaborative Research / Queensland State Government; 2024-2025)

This project specifically aims to develop novel strategies for so-called large language models popularised in AI to condition the generation of proteins with desired functions, with a focus on combining multiple functionally distinct enzymes into hybrid variants. By bringing together expertise in machine learning, bioinformatics, statistical genetics, and bioeconomy, the project aims to identify different training strategies and prototype AI-methods through collaborative hackathons.

TRIAGE: A disease agnostic computational and modelling platform to accelerate variant classification (Palpant, Yanes, Stark, Ingles, Harvey, McGaughran, Fatkin, Atherton, Hill, Bodén, Mallett, Bagnall, Shah, Coombes, Richardson) (Medical Research Future Fund 2021 Genomics Health Futures Mission; Dec 2022 - 2025)

Over a third of all patients with genetic testing do not have a known cause of disease. This research program implements new methods to reveal genes that are the likely cause of disease. We couple these predictions with disease-agnostic modelling to determine whether specific disease variants identified in patients are the cause of disease. Collectively, these approaches will facilitate accelerated classification of disease-causing variants for any disease.

What drives the Anterior Expansion of the Central Nervous System? (Thor, Piper, Bodén) (Australian Research Council Discovery Project 230101750; Jan 2023-Dec 2025)

A striking and highly conserved feature of the central nervous system is that the brain is larger than the spinal cord. Despite the manifest implications this has for nervous system function, the underlying drivers are largely unknown. This project aims to investigate the mechanisms controlling anterior expansion of the central nervous system, and will generate new knowledge in the areas of nervous system development and evolution. This project aims to advance our understanding of nervous system function, develop bioinformatics tools with broad utility within the biosciences field, strengthen Australia’s international standing in developmental neuroscience, and enhance the capacity for interdisciplinary international collaborations.

What is the common factor driving brain overgrowth in autism? Investigating the relationship between epigenetic marks and neural stem cell proliferation (Piper, Thor, Bodén) (The Simons Foundation Autism Research Initiative 2022 Pilot Award; 2022 - 2024)

We will analyze how methylation of H3K36 regulates neural development, with the rationale behind our work being that H3K36 methylation-mediated regulation of NSC proliferation is a central mechanism controlling brain size during development, and that deficits in this process lead to megalencephaly and ASD.

Dual-function ribonucleases: unexpected agents of antibiotic resistance (with Hugenholtz, Schenk, Soo and Schofield) (National Health and Medical Research Council 2010390; Jan 2022 - Dec 2024)

Antibiotic resistance is a major global problem. Metallo-β-lactamases (MBLs) are of particular concern as they inactivate the most widely prescribed class of antibiotics, the β-lactams, and no clinically useful MBL inhibitor is currently available. While investigating environmental reservoirs of one MBL subgroup (B3), we identified major lineages of zinc-dependent ribonucleases (RNaseZ and β-CASP nucleases), which share a common ancestor with B3 MBLs. Unexpectedly, representatives of these distantly related enzymes had a moderate to pronounced ability to confer β-lactam resistance, pointing to an unrecognized reservoir of antibiotic resistance that may comprise thousands of genes across multiple biomes and hosts (including bacteria, viruses, fungi and animals). Furthermore, we experimentally demonstrated that removal of the β-CASP domain from β-CASP nucleases results in enhanced β-lactamase activity. This raises the possibility that such ribonucleases could be "weaponised" for antibiotic resistance. This project will use deep sequence analysis to map the reservoir of these dual-function enzymes (Aim 1), reconstruct ancestral forms in order to define factors that underpin their antibiotic-degrading activity (Aim 2), and evaluate how this activity can evolve in real time from ribonucleases (Aim 3), to link the past, present and future states of a highly effective antibiotic resistance mechanism.

EnzOnomy - an enzyme-based production pipeline for the bioeconomy (with Schenk, Guddat, Hine and Sieber) (Australian Research Council Discovery Project 210101802; Jan 2021 - Dec 2023)

This project aims to harness the potential of protein engineering to develop a technology (EnzOnomy) to convert renewable raw material (e.g. sugar) into platform chemicals (e.g. isobutanol, a building block for jet fuels, fibers, plastics and antioxidants). It sets out to use a raft of data-driven analyses to identify and use biological sequence variation to optimise enzymes for different applications, including the use and continued development of phylogenetics-centred tools from the lab.

Reconstructing protein populations of the past to explain functional specificity and engineer biological diversity (with Gillam, Kobe and Rost) (Australian Research Council Discovery Project 160100865; Jan 2016 - Dec 2019)

The aim of this project is to develop computational methods to construct entirely new proteins that operate in combinations Nature never tried. Computational reconstruction of enzymes that have been extinct for over 400 million years has revealed remarkable opportunities for biotechnological innovation. The intended outcomes are to develop bioinformatics methods to broaden the scope of ancestral protein reconstruction to include protein super-families, to establish what specific changes led to the evolutionary success of a protein, and to re-run evolution to generate proteins that perform in conditions suitable for industrial and agricultural applications, in particular the production of hydroxylated fatty acids for bioplastics.

Tracing nature's template: Using statistical machine learning to evolve biocatalysts (with Gillam, UQ) (Australian Research Council Discovery Project 120101772; Jan 2012 - Dec 2015)

Proteins like the P450 enzymes are highly versatile biological catalysts with untapped potential to improve the efficiency of chemical industries, lessen their environmental impact, reduce drug development costs, and help to remediate environmental contamination. In this project we use statistical machine learning to reveal how to redesign proteins for industrial use based on the examples Nature has refined over millions of years. By making and characterising experimentally a large library of mutant proteins we use statistical methods to detect previously hidden relationships between protein sequence and structural stability, then mine this data using machine learning to predict how to best design commercially useful proteins.

A systems biology approach to elucidate common principles and mechanisms underlying triplet repeat expansion associated genetic defects (with Balasubramanian, Monash University; Arumugam, UQ; Wiles, UQ; Sarsero, Murdoch Childrens Research Inst.) (National Health and Medical Research Council Project 1004112; June 2011 - Dec 2014)

Several human genetic diseases that affect the nervous system occur due to expansions of DNA repeats in the genome. This project uses a combination of cutting-edge technologies such as systems biology and genomics to uncover the common principles and use them to devise novel therapeutic strategies. Specifically, we develop and/or use a range of bioinformatics methodologies to integrate information in large-scale biological data sets to construct models capable of predicting the occurrence of expanding repeats.

Group members

Mikael Boden, Group leader
Georgina Joyce, Postdoc (with Schenk)
Woo Jun Shim, Postdoc (with Palpant)
Sanjana Tule, PhD student
Sam Davis, PhD student
Oliver Hughes, PhD student
Zachary Riedlshah, PhD student
Ruyi Chen, PhD student
Claire Cheng, PhD student
Sebastian Porras, PhD student
William Rieger, PhD student
Adam Wyatt, Undergrad research student
Haeli Gagle, Undergrad research student

The group anno 2024 (Nov). From left: Sanjana, Claire, Mikael, Olly, Ariane (visiting from Caltech), Zach, Will, Sam, Ruyi, Gabe

The group anno 2024 (Oct). From left: Sanjana, Georgia, Sam, Seb, Mikael, Will, Ruyi, Olly, Chongting, Claire, Gabe

The group anno 2023 (Nov). From left: Linh, Ruyi, Olly, Gabe, Mikael, Sam, Seb, Chongting, Zach, Sanjana

The new lab contingent of USA Nov 2023. From left: Ariane, Brad

The group anno 2023 (Oct). From left: Gabe, Mikael, Olly, Linh, Seb, Sanjana, Will, Ruyi, Brad

The group anno 2021. From left: Brad, Ariane, Gabe, Sam, Mikael, Rich. In the background: Phaedra, Kieran

The group anno 2016. From left: Rhys, Gabe, Sean, Julian, Marnie, Alex, Mikael, Yosephine, Chris, Burkhard (visiting).

Alumni

Chongting Zhao, Master student, volunteer
Gabriel Foley, Postdoc, now Research Fellow at ARC Centre of Excellence in Synthetic Biology, QUT
Georgia Wyldbore, Master student
Harry Newton, Master student
Aishwarya Dhruva, Master student
Thuy Linh Nguyen, Master student
Ariane Mora, PhD student, now Postdoc at CalTech Arnold group
Brad Balderson, PhD student, now Postdoc at Salk Institute McVicker lab
William Rieger, Research assistant, now at FU Berlin
Sorelle Bowman, Master student
Anoushka Shah, Master student
Hisatake Ishida, Winter research scholar
Rhys Newell, PhD student (at TRI with Gene Tyson & Ben Woodcroft)
Diya Prabhuram, Master student
Richard Pienaar, Master student
Kieran Convery, Project student
Woo Jun (Chris) Shim, PhD student, now Postdoc at IMB/UQ
Wenyu Pan, Summer research scholar/Master student
Qinglan Ou, Summer research scholar
Anastassia Demeschko, Summer research scholar
Alice Schulz, Summer research scholar
Tarisha Moodley, Master student
Asli Yoruk, Master student
Kate Wathen-Dunn, Master student, previously Sugar Research Australia, now Queensland Government, Department of Natural Resources, Mines and Energy
Alexandra Essebier, PhD student, now Data scientist at REDD Digital
Jhih-Siang (Sean) Lai, PhD student
Julian Zaugg, PhD student, now Bioinformatician/Software developer at Australian Centre for Ecogenomics
Suzanne Butcher, PhD student (primary supervisor Christine Wells)
Glen van den Bergen, PhD student (primary supervisor Alan Mark)
Marnie Lamprecht, Research officer
Yosephine Gumulya, Research officer, now Research Scientist at CSIRO
Jun Xu, Masters student
Isaac Asamoah, MPhil student
Aniek Roelofs, Research intern from University of Amsterdam
Ron Ramsay, Undergraduate research student
Ralph Patrick, PhD student, now Postdoc at Victor Chang Cardiac Research Institute
Timothy O'Connor, PhD student, now bioinformatician at Illumina San Diego
Sitthichoke Subpaiboonkit, PhD student
Patricia Vera Wolf, Master of Bioinformatics student
Sam Dai, MPhil Student (primary supervisor Bostjan Kobe)
Yufei Wang, Summer Research Scholar
Sebastian Seitz, Visiting student from TU Munich
Coralie Horin, Research Intern from Université Nice Sophia Antipolis
Danny Lee, Biotechnology Honours student
Elham Alhathli, Master of Bioinformatics student
Ahmed Mehdi, PhD student, now Postdoc at UQ/Diamantina
Diego Moncayo, Masters student
Minh Duc Cao, Postdoc, now at Gritstone Oncology/US
Kai Willadsen, Postdoc
German Ibarra, MPhil student
Samir Lal, Honours student, now Computational Biologist at Pfizer
Benjamin Merlet, Research Intern from Université Nice Sophia Antipolis

…

Authors

Created by admin (Mikael Boden) on 2017/01/17 09:30.
