Open projects in Bioinformatics

Below you will find descriptions of projects that bioinformatics students may do in fulfilment of project courses in their respective degrees. Please contact the person listed for further detail.


The Barrier Atlas: Cross-Tissue Insights into Homeostasis and Dysfunction

Contact: Amanda Oliver (Amanda.Oliver@qimrb.edu.au)

Single-cell RNA sequencing (scRNA-seq) has transformed our understanding of human tissue biology, revealing cellular diversity across organs and disease states. Building on existing datasets profiling millions of cells, this project aims to construct a unified single-cell atlas of barrier tissues, including the lung and gut, to uncover shared and tissue-specific mechanisms that maintain immune balance at the body’s environmental interfaces. The student will develop and apply computational pipelines for large-scale data integration, quality control, cell type annotation, and spatial and microbial mapping across millions of cells and thousands of samples. Advanced methods such as gene regulatory network inference, deep learning, and foundation models will be used to explore cross-tissue immune regulation and barrier dysfunction. By combining single-cell, spatial, and microbiome data, the project will deliver the first cross-tissue atlas of barrier biology, providing new insights into diseases such as inflammatory bowel disease and chronic respiratory disorders.

Suitable for Masters or PhD students. Strong bioinformatics skills using Python or R are essential; experience with single-cell or spatial transcriptomics and knowledge of immunology or barrier tissue biology are highly desirable.

The Escape of Human Genomic Data into Public Repositories

Contact: Michael Hall (michael.hall1@uq.edu.au)

Public sequencing repositories (e.g. SRA) are growing rapidly, but many studies involving human clinical samples may inadvertently include identifiable host DNA—even when ethics approvals explicitly prohibit this. This project investigates the extent and implications of such data leakage.

Objectives:

•	Identify publicly available datasets from clinical pathogen/metagenomic sequencing studies
•	Quantify residual human genomic content using a variety of approaches and references
•	Benchmark human read detection approaches (e.g. host depletion vs k-mer-based methods)
•	Assess potential identifiability using forensic markers (e.g. Illumina Infinium SNPs, CODIS loci)
•	Explore the role of ethics language, technical variability, and population bias (e.g. African vs European genomes) in leakage rates
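
As a rough illustration of the k-mer-based idea in the objectives above, the sketch below flags reads whose k-mers mostly occur in a reference sequence. It is a toy example under stated assumptions, not the project's pipeline: the function names, the tiny `k`, and the 0.5 threshold are all hypothetical, reverse complements are ignored, and real tools such as kraken use far larger references and indexed data structures.

```python
def kmers(seq, k):
    """Return the set of k-mers in seq, skipping any k-mer that contains N."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "N" not in seq[i:i + k]
    }


def human_kmer_fraction(read, reference_kmers, k):
    """Fraction of a read's distinct k-mers that are present in the reference set."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & reference_kmers) / len(read_kmers)


def flag_human_reads(reads, reference, k=5, threshold=0.5):
    """Return one boolean per read: True if the read looks reference-derived.

    k and threshold are illustrative values only (assumptions, not recommendations).
    """
    reference_kmers = kmers(reference, k)
    return [
        human_kmer_fraction(read, reference_kmers, k) >= threshold
        for read in reads
    ]


# Toy data: the first read is a substring of the "reference", the second is not.
toy_reference = "ACGTACGTTAGCCGATAGGCTA"
toy_reads = [toy_reference[3:15], "TTTTTTTTTTTT"]
flags = flag_human_reads(toy_reads, toy_reference)
```

Benchmarking would then compare flags like these against an alignment-based approach on the same reads.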

Skills you’ll gain:

•	Handling and processing large sequencing datasets
•	Working knowledge of alignment and k-mer classification tools (e.g. minimap2, kraken) and human read detection pipelines
•	Experience in reproducible bioinformatics analysis and privacy-aware genomic research
•	Insight into the intersection of ethics, bioinformatics, and public data governance

This project is ideal for students interested in clinical genomics, privacy, ethics, or data-driven policy impact. Familiarity with the command line is necessary. Knowledge of Python would be great, but not required—we can build those skills as you go!

Decoding the relationships between DNA replication, genome architecture and chromatin organisation

Contact: Dr Mathew Jones (mathew.jones@uq.edu.au)

The human genome is packaged into chromatin and assembled into 3D self-interacting chromatin domains that regulate gene expression and coordinate the process of DNA replication. Understanding the relationships between genome structure and function is one of the outstanding challenges in modern biology. Changes in the 3D structure of the genome can cause copying errors (genetic mutations) during DNA replication that result in diseases such as cancer and advanced aging. Decoding the relationships between the genomic landscape and cellular processes such as DNA replication has the potential to inform the development of novel treatments for cancer and interventions that extend longevity.

In this project we are seeking talented and enthusiastic postgraduate students to tackle two fundamental questions: 1. How do the epigenome and the 3D organisation of the genome regulate DNA replication? 2. How are these processes disrupted in cancer and impacted by cancer therapies? The project will assess the impact of genomic features on replication using nanopore sequencing data generated by the Jones lab and their artificial intelligence assay for assessing DNA replication in human cells (https://doi.org/10.1101/2022.09.22.509021), together with publicly available Hi-C, Repli-Seq, CUT&RUN, ChIP-seq and scSeq datasets (e.g., GEO, ENCODE).

Bioinformatics and Computer Science students with skills in R, Python and C++ who are familiar with software suites for the comparison, manipulation and annotation of genomic features are encouraged to contact Dr Mathew Jones (mathew.jones@uq.edu.au) to learn more about the projects available.

Pangenomes to predict bacterial transmission in healthcare settings

Contacts: Leah Roberts l.roberts3@uq.edu.au, Michael Hall michael.hall2@unimelb.edu.au

Predicting whether two bacterial isolates are the same (and thereby inferring whether transmission has occurred) has traditionally been performed by identifying and counting single nucleotide variants (SNVs). To do this, a reference genome is usually selected, and isolate reads are mapped to the reference to identify SNVs in regions shared between all isolates. However, for large datasets of very diverse bacterial strains, a single reference genome is usually insufficient, as the regions shared between the strains become a very small proportion of the total genomic content.
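
To make the traditional approach concrete, the sketch below counts pairwise SNVs only at positions shared by all isolates. It is a minimal illustration under stated assumptions: the isolates are assumed to be already aligned to a common reference (equal-length strings, with `-` marking a position an isolate does not cover), and the isolate names, sequences and function names are invented for the example.

```python
from itertools import combinations


def core_positions(aligned):
    """Alignment columns where every isolate has an unambiguous base (A/C/G/T)."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]


def pairwise_snv_distances(aligned):
    """Count SNVs between every pair of isolates, restricted to core positions.

    Returns (distances, core_size) where distances maps (name_a, name_b) -> count.
    """
    core = core_positions(aligned)
    distances = {}
    for a, b in combinations(sorted(aligned), 2):
        distances[(a, b)] = sum(aligned[a][i] != aligned[b][i] for i in core)
    return distances, len(core)


# Hypothetical toy alignment: iso3 lacks coverage at the third position.
toy_alignment = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTACGA",
    "iso3": "AC-TTCGA",
}
snv_distances, core_size = pairwise_snv_distances(toy_alignment)
```

Every isolate that lacks a region removes it from the comparison for all pairs, which is why the usable core shrinks as more diverse strains are added.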

We propose a novel method using pangenome reference graphs to better identify and discriminate transmission of bacterial pathogens. This project would start to build test datasets and develop novel workflows for predicting transmission from pangenome graphs.

This project is suitable for an Honours, Masters, or PhD student. A background in the command line, HPC and Python is highly desirable. This project will be based at UQCCR (Herston Campus) and co-supervised by Dr Michael Hall (University of Melbourne).

Investigation of the effect of the circadian rhythm on the genetic control of gene expression

Contact: Sonia Shah sonia.shah@imb.uq.edu.au, Solal Chauquet uqschauq@uq.edu.au

The circadian rhythm reflects the daily cycle of behaviours and metabolic processes organisms exhibit. A 24-hour gene expression pattern occurs at the molecular level, with genes activated either during the day or night. Different tissues all display circadian control, with some more affected than others. Within the liver, for example, 3000 genes are subjected to circadian control. This regulation is orchestrated by a small group of CLOCK genes, establishing feedback loops that result in rhythmic gene expression in every tissue.

We know that gene expression can be influenced by genetic variants, called expression quantitative trait loci (eQTL), and this may be one mechanism linking genetic variants to disease. As a result, large eQTL datasets have been generated to assist in understanding disease mechanisms. However, it remains unknown whether sample collection time can affect eQTL identification. This project therefore aims to identify the possible effects of the circadian rhythm on the genetic control of gene expression using the Genotype-Tissue Expression (GTEx) dataset.

During this project, you will run Python tools such as PEER and tensorQTL to identify eQTL within 49 tissues. You will subsequently investigate the associations identified and follow up on the role of the genes under circadian control within different phenotypes.

Understanding the influence of taste and olfactory perception on eating behaviour and health conditions using big genetic data

Contact info: Daniel Hwang d.hwang@uq.edu.au

Project description: Human perception of taste and smell plays a key role in food preferences and choices. There is a large and growing body of work suggesting that taste and smell (together known as "chemosensory perception") determine eating behaviour and dietary intake, a primary risk factor of chronic conditions such as obesity, cardiometabolic disorders, and cancer. Evidence to date is largely based on observational studies that are susceptible to confounding and reverse causation, leaving the "causal effects" of chemosensory perception on food consumption unclear. If their relationship is truly causal, flavour modification may represent a tangible way of modifying food consumption in a way that benefits public health outcomes. This project aims to: (i) elucidate the genetic architecture underlying individual differences in taste and smell perception, (ii) use this information to assess their causal effects on eating behaviour, and (iii) create a sensory-food causal network mapping individual sensory qualities (i.e. sweet taste, bitter taste, and more) to individual food items.

Increasing drug success rate in human clinical trials using genomics

Around 90% of drug candidates fail in human clinical trials, largely due to lack of efficacy or safety concerns. This partly reflects the limitations of using in vitro and animal studies to predict the effect of compounds in humans. Recent studies highlight that drug targets backed by evidence from human genetic studies are twice as likely to make it to market. Human genetic data can also identify potential adverse side effects. Having such information before embarking on human clinical trials could improve the success rate of a compound and help avoid adverse outcomes for participants. This project will apply statistical genomics analyses to publicly available human genomic data to predict the efficacy and any safety concerns of compounds that are currently in the drug development pipeline.

Project significance: Findings from this project could identify new therapeutic applications or unknown side effects for these compounds, ultimately informing future human clinical trials.

Contact: Sonia Shah sonia.shah@imb.uq.edu.au

Supervisors: You will be working with a multidisciplinary team of supervisors: Prof Dave Evans, Dr Sonia Shah, Prof Glenn King and Assoc. Prof Nathan Palpant.

Familiarity with computational analyses (e.g. using R or Python) is needed for this project. Some knowledge of genome-wide association studies and statistical genomics methods such as Mendelian randomisation analysis would be beneficial.
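
For readers new to Mendelian randomisation, the sketch below shows the standard fixed-effect inverse-variance weighted (IVW) estimator computed from per-variant summary statistics. It is only an illustration of the general method and is not taken from this project: the function names are made up, the numbers are invented toy values rather than real GWAS results, and a real analysis also needs instrument selection and sensitivity checks.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from one variant: outcome effect divided by exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error.

    Equivalent to a weighted regression of outcome effects on exposure effects
    through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    estimate = numerator / denominator
    standard_error = (1.0 / denominator) ** 0.5
    return estimate, standard_error


# Invented toy summary statistics for three variants (not real data).
toy_beta_exposure = [0.1, 0.2, 0.4]   # variant effects on the drug-target proxy
toy_beta_outcome = [0.05, 0.1, 0.2]   # variant effects on the disease outcome
toy_se_outcome = [0.01, 0.02, 0.01]   # standard errors of the outcome effects

ivw_beta, ivw_se = ivw_estimate(toy_beta_exposure, toy_beta_outcome, toy_se_outcome)
```

In this toy case every variant has the same Wald ratio, so the IVW estimate equals that shared ratio.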

Developing quiescent stem cell classifier using single cell transcriptomics

Contact info: Dr Lachlan Harris (Lachlan.Harris@qimrberghofer.edu.au), Dr Olga Kondrashova (Olga.Kondrashova@qimrberghofer.edu.au)

Quiescence is a reversible state of cell-cycle arrest, sometimes referred to as the “G0” phase of the cell-cycle. It is an adaptive feature of most adult stem cell populations, where it ensures that stem cells divide only when needed, preserving regenerative capacity. However, quiescence is also adopted by cancer stem cells to evade chemo- and radiotherapies that preferentially kill fast-dividing cells. Single-cell data promises to uncover the molecular regulation of quiescent stem cells in health and disease but the identification of these cells within these datasets is either reliant on expert knowledge and manual curation or is currently impossible, due to a lack of marker genes.

The most common classifiers that define cell-cycle stages (G1/S/G2/M) in single-cell RNA-sequencing (scRNA-seq) data were trained on populations of actively cycling cells. Therefore, these tools cannot identify quiescent stem cells in the “G0” phase of the cell-cycle. It is an outstanding question as to whether there are sufficient transcriptomic similarities across quiescent stem cells from different tissue types to build a generalisable model to discriminate these cellular populations. Furthermore, it is unknown whether such a model would generalise to cancerous tissue, where increased variability in transcriptomic states often degrades the distinction between cell types.

This project aims to develop a broadly applicable quiescent classifier. As a first step towards this, this project will seek to 1) contribute to the curation of datasets and isolation of tissue-agnostic and tissue-specific feature sets that define quiescent stem cells and 2) compare methods for training quiescent classifiers and for determining the most salient features.

Understanding sex-specific cardiovascular disease risk

Contact info: Dr Sonia Shah (sonia.shah@imb.uq.edu.au), Dr Clara Jiang (j.jiang@uq.edu.au)

Description: Cardiovascular diseases (CVD) account for 35% of female deaths globally (29% in Australia). However, CVDs remain under-studied, under-diagnosed and under-treated in women. This sex disparity is partly due to the lack of knowledge of female-specific risk factors. This project involves statistical analysis of large-scale health and genetic data to identify sex-specific CVD risk factors and underlying mechanisms.

Requirements: A background in genetics and computational data analysis is preferable.

De-risking the drug development pipeline by finding biomarkers of drug action

Supervisor: Dr Nathan Palpant (n.palpant@uq.edu.au)

More than 90% of drugs fail to advance to clinical approval. Genetic evidence supporting a drug-target-indication link can improve the success rate by more than 50%. This project aims to make use of consortium-level data resources (UK Biobank, Human Cell Atlas, ENCODE, etc.) to identify genetic links between genetic targets and phenotypes to help facilitate the translation of drugs from healthy individuals (Phase 1 clinical trial assessing safety) into sick patients (Phase 2 clinical trial assessing efficacy). Finding orthogonal biomarkers of drug action in healthy individuals is critical to de-risk drug dosing when transitioning from Phase 1 to Phase 2 trials. Using ASIC1a as a candidate drug target being developed to treat heart attacks, we aim to develop a functionally validated computational pipeline to predict orthogonal biomarkers of ASIC1a inhibitor drug action in healthy individuals to help inform dosing in human clinical trials. Computationally predicted biomarkers will be validated using genetic knockout animals and pharmacological inhibitors of ASIC1a. Collectively, this project will help develop a proof-of-principle computational pipeline for orthogonal biomarker prediction of drug targets in the human genome.

Parsing the genome into functional units to understand the genetic basis of cell identity and function

Supervisor: Dr Nathan Palpant (n.palpant@uq.edu.au)

The billions of bases in the genome are shared among all cell types and tissues in the body. Understanding how regions of the genome control the diverse functions of cells is fundamental to understanding evolution, development, and disease. We recently identified approaches to define diverse biologically constrained regions of the genome that appear to control very specific cellular functions. This project will evaluate how these biologically constrained regions of the genome have influenced evolutionary processes, evaluate their regulatory basis in controlling the identity and function of cells, and analyse the promiscuity of cross-talk between different biologically constrained regions. The project will also study how these genomic regions impact disease mechanisms by evaluating how disease-associated variants in different regions influence survival of patients with cancer and assessing whether these regions are associated with identifying causal disease variants in human complex trait data. The project will involve significant collaborative work with industry partners and researchers across Australia with the goal of providing critical insights into fundamental mechanisms of genome regulation.

Machine learning integration of sequencing and imaging data in cancer research

Cancer is a complex disease that is difficult to treat due to the high level of variation within a tumor and between patients. To better understand cancer complexity at the tissue level, we use a combination of techniques such as single-cell sequencing, spatial transcriptomics, tissue imaging, statistical learning, and deep learning. We analyze the data using high-performance computing to computationally reconstruct the biological regulatory networks underlying human diseases within every single cell and between cells in a tissue, such as a tumor. By measuring both the molecular profiles of the cells and their neighborhood environment, we can integrate genomics and imaging data for earlier and more accurate diagnosis and prognosis of diseases using tissue biopsies. Our goal is to advance the understanding of biomarkers and cellular regulatory networks that are specific to cell types and the tissue microenvironment, which will contribute to early disease diagnosis, targeted drug discovery, and precision medicine.

Contact: Quan Nguyen quan.nguyen@imb.uq.edu.au

Resolving molecular trajectories of differentiation pathways of adaptive immune cells in chronic infection generating long-term memory

Chronic virus infections are a major ongoing global problem that establish lifelong disease due to the development of persistent refractory infections and the lack of long-term effective antiviral therapies. Cytotoxic CD8+ T cells play a key role in destroying virus-infected cells and are required to generate immunological memory, which provides long-term protective immunity. The transcriptional fate map of cells generated under conditions of chronic infection and how ‘stem cell-like’ features are encoded are not well understood. In this project, we aim to unravel the key transcription factors and molecular circuits that define the long-term generation and maintenance of stem cell-like memory responses during such infections. This study will use and develop computational and statistical tools to model the transcriptional landscape of T cell differentiation that generates memory heterogeneity and stem-cell properties. The project will include implementing protocols for quality control analysis, normalization, and clustering, analysing gene networks underlying cell subpopulations, and identifying key genetic regulators of cell states. We will explore how key factors can be manipulated to improve T cell immunotherapy for infections and cancer.

Contact: Gabrielle Belz g.belz@uq.edu.au

Dissecting the stromal and innate immune cell heterogeneity and transcriptional regulation in lung inflammation and infection

Lung protection depends on early defence mechanisms orchestrated by stromal cells and innate immune cells. The differentiation, heterogeneity and functional regulation of these cells are not yet understood. Understanding these processes is crucial to map the temporal development of lung protective responses and pinpoint key checkpoints that dictate reversible and irreversible stages of disease. Uncovering these features will maximize the capacity to identify translational pathways for cell therapeutics and drug discovery. In this project, you will use and develop computational and statistical tools to study the transcriptional landscape of lung cell fibrosis and inflammation at bulk and single cell resolution. The project will include implementing protocols for quality control analysis, normalization, and clustering, analysing gene networks underlying cell subpopulations, identifying key genetic regulators of cell states, and helping develop novel strategies for analysing single cell RNA-sequencing data to study biological questions.

Contact: Gabrielle Belz g.belz@uq.edu.au

Investigating Spatial Transcriptomics and Proteomics in Hard-to-Cure Paediatric Cancers

This MRFF-funded project aims to study spatial transcriptomics and proteomics in challenging paediatric cancers, specifically sarcoma and neuroblastoma, for which no new therapies have been offered in the last three decades. We are particularly interested in the immunoevasive strategies that cancer cells use to hijack anti-cancer immunity. By analysing the molecular landscape and spatial organization of tumour cells within the microenvironment, we aim to identify biomarkers and therapeutic targets, and to understand treatment resistance. The outcomes could revolutionize diagnosis, treatment, and prognosis for these devastating paediatric cancers.

Objective: Investigate spatial transcriptomics and proteomics in sarcoma and neuroblastoma to identify biomarkers, immune suppression and therapeutic targets, and to understand treatment resistance.

Methodology:

1. Use spatially resolved RNA sequencing to capture the transcriptomic landscape and identify distinct cell populations.
2. Analyse protein expression patterns and spatial distribution within tumour tissues using mass spectrometry-based proteomics.
3. Integrate data with clinical, histopathological, and genomic information to identify molecular subtypes and potential therapeutic opportunities.

Expected Outcomes:

•	Discover novel molecular markers and spatially distinct cell populations.
•	Enhance understanding of the tumour microenvironment and treatment response.
•	Identify potential therapeutic targets for personalized treatment strategies.

Impact: This project has the potential to revolutionize diagnosis, treatment, and prognosis in hard-to-cure paediatric cancers. By unravelling spatial complexities, it aims to develop targeted therapies, improve patient stratification, and achieve better clinical outcomes.

Collaborators: Dr. Arutha Kulasingue, Dr. Ahmed Mehdi & A/Prof. Fernando Guimaraes

Contact: f.guimaraes@uq.edu.au

Investigating Spatial Transcriptomics of Immune Responses in Respiratory Infections

This project aims to investigate the immune responses in bacterial pneumonia, influenza, COVID-19, and respiratory syncytial virus infection using spatial transcriptomics. By analysing existing sequencing data from these infections, we will explore the spatial organization of immune cells and gene expression profiles within lung tissue. The research aims to uncover unique immune signatures, molecular interactions, and potential therapeutic targets, improving our understanding and management of these respiratory infections.

Objective: Analyse spatial transcriptomics data to understand and compare immune responses in bacterial pneumonia, influenza, COVID-19, and respiratory syncytial virus infection.

Methodology:

•	Apply spatial transcriptomics techniques to study immune cell organization and gene expression patterns within lung tissue.
•	Conduct comparative analysis to identify distinct immune signatures and pathways across the infections.
•	Investigate molecular interactions and signalling pathways associated with immune responses.

Expected Outcomes:

•	Comparative understanding of immune responses in respiratory infections.
•	Identification of unique immune signatures and potential therapeutic targets.
•	Insights into molecular interactions and signalling pathways involved in the immune response.

Impact: This research project advances our understanding of immune responses in respiratory infections, potentially leading to improved diagnostics, therapies, and management strategies for bacterial pneumonia, influenza, COVID-19, and respiratory syncytial virus infection.

Collaborators: Dr. Arutha Kulasingue, Dr. Ahmed Mehdi & A/Prof. Fernando Guimaraes

Contact: f.guimaraes@uq.edu.au

Evaluating machine learning models classifying cancer-specific pattern in children with cancer

Profiling the expression of active genes and adaptive immune receptors on cancer cells to develop a deeper understanding of paediatric hematopoietic cancer

Developing single-cell trajectory analysis methods for adaptive immune cells

A cancer diagnosis at any age is upsetting, but it is felt more harshly when the patient is a young child who has only just started out in life. Compared to adult cancer patients, the window of opportunity to help child cancer patients is especially short. We need to create an early warning system for paediatric cancers. Specialized immune cells known as T-cells and B-cells use specific receptors to recognize tumour antigens and fight cancerous cells. My lab's vision is to harness these cells and their receptors to enable early cancer detection and disease monitoring. These specific adaptive immune receptors are essential for all aspects of the T- and B-cell’s life cycle, serving as natural ‘time-keepers’ of the immune response against cancer progression. We will create bespoke computational algorithms to explore the properties that define how effective these immune cells are in childhood cancer, perform high-resolution gene expression profiling at the single-cell level, and develop computer models that can be used to detect adaptive immune receptors that are targeted towards cancer. The projects will be largely dry-lab based, and candidates should expect to work as part of a team together with leading groups in Australia as well as international collaborative networks (Cambridge, Sanger, UK).

The projects will suit either an immunologist wanting to learn bioinformatics or a computer scientist who wants to apply their skills to biological problems. An ideal candidate would have a background in immunology, computer science, and/or bioinformatics. A basic understanding of statistical methods and machine learning, and experience working with Python/R, are highly desirable.

Contact: Kelvin (Zewen) Tuong z.tuong@uq.edu.au

Resolve by Bioinformatics the Ixodida (tick) branches of the tree-of-life from entire mitochondrial & nuclear genome sequences

This work is the Australian contribution to an international project to resolve the tree-of-life called the Tree-of-Life Web Project (http://tolweb.org). Students will use common bioinformatics tools to mine our Illumina nucleotide sequence data & the Sequence Read Archive for mitochondrial genomes & ribosomal RNA gene-clusters. Students will then use Geneious to manipulate sequences, & other programs to predict (“reconstruct”) the evolutionary tree of the ticks. Virtual time machines! Have a look at the YouTube channel of my Tick Mitochondrial Genome Network: https://www.youtube.com/channel/UCnBhfhYxjC4rsJmVpBwHT0g/featured

Mine with the tools of Bioinformatics our Illumina nucleotide sequence data & the Sequence Read Archive for new and interesting genes in ticks that are associated with disease in humans, our domestic animals (particularly dogs), and wildlife.

Contact info: Professor Stephen Barker SCMB s.barker@uq.edu.au

This project is designed for students who are studying for Masters of Molecular Biology, Masters of Biotechnology, & Masters of Bioinformatics (BIOX700x).

Available for semester 1, 2 and summer

Resolve by Bioinformatics & Big Data the climatic requirements of the ticks of Australia & PNG, & thus the potential changes to the geographic distributions & amount of disease caused by ticks, under a range of climate-change scenarios

Ticks are the “pests of our times” in that the current generation of Australians, & indeed people all over the world, are more concerned about the diseases associated with ticks than with any other infectious disease. This is understandable since we know so little about how the climate influences why ticks live where they live (geographic-risk) & why ticks are abundant in some years yet rare in other years (variation in risk among years). Students will use computer models to predict both of these effects of climate, and thus the potential changes to the geographic distributions & amount of disease caused by ticks, under a range of climate-change scenarios.

Contact info: Professor Stephen Barker SCMB s.barker@uq.edu.au

This project is designed for students who are studying for Masters of Molecular Biology, Masters of Biotechnology, & Masters of Bioinformatics (BIOX700x).

Available for semester 1, 2 and summer

Dissecting the ageing transcription factor network

Contact info: Dr Christian Nefzger (c.nefzger@imb.uq.edu.au) and Marina Naval-Sanchez (m.navalsanchez@imb.uq.edu.au)

On a cellular level, ageing appears to be a largely epigenetic phenomenon. To uncover transcription factors (TFs) and chromatin state changes that drive ageing in different cell types, we have generated a molecular atlas (RNAseq, ATACseq) comprised of dozens of mammalian cell types from both young and aged subjects. By pinpointing and analysing age-related changes to the TF network, the project will reveal whether there are TFs or TF families that drive ageing across different cell types or whether ageing is a largely cell type-specific process. The project will leverage bulk data and entail computational techniques related to quantification of TF activity levels. Integrative network analyses between transcriptional and chromatin state data will also be performed. The project aims to improve our understanding of the ageing process to ultimately find new ways to make aged cells work more efficiently.

The project will be well supervised. Students do not need to be directly familiar with the analysis procedures for this project, but the ideal candidate will be able to program efficiently in R or Python. This project is looking for bioinformatics Masters students (ideally 16 units, but we consider 8 unit applicants as well). Students placed overseas who want to conduct a project remotely are welcome. We also consider PhD students.

(Deep) Learning the regulatory grammar of ageing

Contact info: Dr Christian Nefzger (c.nefzger@imb.uq.edu.au), Dr. Marina Naval-Sanchez (m.navalsanchez@imb.uq.edu.au) and Dr. Ralph Patrick (ralph.patrick@imb.uq.edu.au)

Ageing is a gradual process of functional and homeostatic decline in living systems and the greatest risk factor for virtually all degenerative diseases. At a cellular level, epigenetic changes in the non-coding part of the genome play a major role in this functional decline. The laboratory has access to deep profiling of age-related chromatin accessibility changes (ATAC-seq) with matched gene expression (RNA-seq) from 22 purified primary cell types across 11 tissues, providing a roadmap of distinct regulatory elements, including promoters and enhancers, impacted by ageing.

Recently, machine learning and deep learning methods that learn the regulatory lexicon from DNA-protein interactions have advanced our understanding of gene regulation (http://kipoi.org). These methods can predict and annotate the sequence lexicon and the impact of mutations at nucleotide resolution.

The project aims to:

1. Compare available convolutional neural network (CNN) algorithms (http://kipoi.org) to decode the regulatory drivers of cellular ageing.

2. Statistically associate phenotypic variants from GWAS that impact the ageing regulatory lexicon.

The ideal candidate should have an interest in machine/deep learning and CNNs, and will be able to program in R/Python.

The project is embedded in the Nefzger lab with a major focus on “Cellular reprogramming and Ageing”. The applicant will be closely working with Dr. Naval-Sanchez as the main supervisor.

Decoding Transcription Factor Dosage Effects on Cell State Transitions with DoseH-Seq

Contact info: Dr Christian Nefzger (c.nefzger@imb.uq.edu.au), Ralph Patrick (ralph.patrick@imb.uq.edu.au) and Marina Naval-Sanchez (m.navalsanchez@imb.uq.edu.au)

Cell identity is controlled by different combinations of transcription factors (TFs) that bind to genomic regulatory elements to regulate gene expression. In most instances, TF activity is not binary but graded, responding to TF dosage levels (e.g., Naqvi et al., Nat Genet., 2023, PMID: 37024583). For this reason, TFs are strongly enriched for haploinsufficient disease associations (Seidman et al, 2002, J. Clin. Invest. PMID: 11854316; Van de Lee et al., 2020, Trends Genet., PMID: 32451166), and TF dosage and stoichiometry strongly affect reprogramming outcomes (e.g., Polo et al, 2012, Cell, PMID: 32939092; An et al., 2019, Cell Reports, PMID: 31722212). Furthermore, TF dosage effects may also underlie seemingly contradictory effects linked to overactivation of certain TFs in cancer contexts, including those of the Nfi family (Becker-Santos, 2017, The Lancet Discovery Science, PMID: 28596133).

Single-cell RNA+ATAC-seq is a uniquely powerful assay to measure the impact of TF levels on cell regulatory architecture; however, no tools currently exist to directly study TF dosage effects on temporal cell state transitions. To address this gap, we developed Dosage and Hashtag sequencing (DoseH-seq), an expansion of the 10x Genomics single-nucleus (sn)RNA+ATAC-seq assay that enables sensitive detection of lentiviral perturbations (e.g., TFs) linked to a heterogeneously expressed promoter. In combination with sample hashtagging, multiple temporal and dosage states can be profiled for theoretically any number of genes of interest. This allows detection of TF dosage-dependent effects on temporal cell state transitions, chromatin architecture, co-factor expression, and the rewiring of TF networks at high resolution. Compatibility with BGI sequencing technology enables the generation of low-cost datasets. We demonstrate the utility of DoseH-seq by tracking the dosage effects of the somatic transcription factor Nfix during reprogramming towards pluripotency. Contrary to the current dogma, we find that Nfi overexpression can act either as a reprogramming roadblock or as a reprogramming booster, depending on TF dosage and context. These insights may help resolve the TF’s paradoxical role in cancer. DoseH-seq represents a powerful tool for elucidating, and ultimately controlling, both desired and pathological cell state transitions.

The applicant would help drive method establishment around our novel DoseH-seq technique and support analysis to understand TF dosage effects with established data sets. The ideal candidate will be able to program efficiently in R or Python. This project is looking for bioinformatics Masters students (ideally 16 units, but we consider 8-unit applicants as well). We also consider PhD students.

Trans-ancestry conditional analyses of genome-wide association studies

Contact: Dr Loic Yengo (l.yengo@imb.uq.edu.au)

The experimental design of genome-wide association studies (GWAS) consists in testing the association between a large number of DNA polymorphisms and a trait of interest. Classically, these associations are tested using a simple linear regression (i.e. one at a time) framework, which cannot distinguish associations from correlated variants. To solve that issue, conditional and joint (COJO) analyses leverage the correlation structure between polymorphisms to identify subsets of variants that are jointly associated with the trait of interest. Current implementations of COJO algorithms can be applied to GWAS performed in individuals of a single ancestry, where the correlation structure between variants is constant; but they cannot yet handle meta-analyses of GWAS from diverse ancestries (e.g. East-Asian, European).
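The difference between one-at-a-time and joint effects can be illustrated with a small sketch. It assumes standardised genotypes and phenotype, so that marginal effects equal the LD correlation matrix multiplied by the joint effects; the function names and the two-SNP numbers are hypothetical, and this is not the COJO implementation itself, which additionally handles sampling error, variant selection and reference-panel LD.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def marginal_effects(ld, joint):
    """Marginal effects implied by joint effects: b_marginal = R @ b_joint."""
    return [sum(r * b for r, b in zip(row, joint)) for row in ld]


def joint_effects(ld, marginal):
    """Joint effects recovered from marginal ones: b_joint = R^-1 @ b_marginal."""
    return solve_linear(ld, marginal)


# Hypothetical example: two SNPs in LD (r = 0.8); only the first is causal.
ld = [[1.0, 0.8], [0.8, 1.0]]
true_joint = [0.5, 0.0]
marginal = marginal_effects(ld, true_joint)   # SNP 2 looks associated through LD alone
recovered = joint_effects(ld, marginal)       # conditioning on LD removes that signal
print(marginal, recovered)
```

In a trans-ancestry setting each population would have its own LD matrix, which is why a single correlation structure of this kind no longer suffices.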

This project aims to develop a COJO algorithm that simultaneously performs variant selection and meta-analysis of multiple GWAS from participants of diverse ancestries. The research will include: (i) developing and comparing algorithms, (ii) testing the impact of violations of model assumptions through simulations and (iii) writing C++ software implementing this algorithm. Application of this research can improve our ability to discover genes involved in susceptibility to common diseases.

The ideal candidate will have a good understanding of the multiple linear regression model and will be able to program efficiently in R/Python and C++.

Join the Tick Tree of Life Project

Contact: Professor Stephen Barker, SCMB, UQ (s.barker@uq.edu.au); Dr Renfu Shao, University of the Sunshine Coast, Qld (rshao@usc.edu.au)

Join the International Tick Tree of Life Project. Predict the evolutionary history (phylogeny) of ticks from mitochondrial and nuclear genomes. Students will use common bioinformatics tools to mine the Sequence Read Archive for mitochondrial genomes and rRNA gene clusters. Then the students will use Geneious to manipulate sequences, and other programs to predict (“reconstruct”) the evolutionary tree of the ticks. Virtual time machines!

Available all year for Master of Bioinformatics students; suitable for one or two semesters, full-time.

Generation of a metabolic model for the respiratory pathogen Haemophilus influenzae

Contact info: Ulrike Kappler (u.kappler@uq.edu.au) and Birgitta Ebert (birgitta.ebert@uq.edu.au)

Haemophilus influenzae is a human respiratory pathogen that both causes acute respiratory tract diseases and increases the severity of chronic diseases such as bronchiectasis, chronic obstructive pulmonary disease and cystic fibrosis. Virulence of this bacterium depends strongly on its metabolic processes, and in this project we aim to produce a metabolic model for this bacterium that will allow us to investigate the effect of changes in flux through specific pathways on pathogen fitness.

Suitable for bioinformatics project or Masters students.

Genome characterisation with real time long read sequencing

Contact info: Ben Hayes (b.hayes@uq.edu.au), for further information contact Elizabeth Ross (e.ross@uq.edu.au) and Loan Nguyen (t.nguyen3@uq.edu.au)

Genotyping by sequencing is a genetic screening method for characterising both novel and known single nucleotide polymorphisms (SNPs) and performing genotyping studies. Until recently, short-read sequence data were favoured for novel SNP discovery; however, recent advances in portable long-read sequencers may change this. Oxford Nanopore Technologies (ONT) offers a range of rapid real-time sequencing platforms, including sequencers that can fit in the palm of your hand. With the significant increase in read length as well as the absence of certain biases, Oxford Nanopore sequencing has become a popular option for genome characterisation. This project will implement nanopore sequencing for SNP genotyping, structural variant identification, and methylation calling. Students will develop a deep understanding of cutting-edge sequencing and genotyping methods, bioinformatics skills, project design, scientific communication and data management.

The project is available on an ongoing basis for honours or Masters of Bioinformatics students, full time. PhD projects are also available in this area.

Novel isoform discovery using Iso-seq

Contact info: Ben Hayes (b.hayes@uq.edu.au), for further information contact Elizabeth Ross (e.ross@uq.edu.au) and Loan Nguyen (t.nguyen3@uq.edu.au)

New technology now allows the sequencing of hundreds of thousands of full-length transcripts (expressed genes) from a single sample. A dataset of 10 tissues has been generated using Iso-seq, a method that can sequence the full-length expressed isoforms in a sample. This project will analyse that Iso-seq data and identify novel isoforms, including those for genes that are known to be of industry importance. This information will provide a deeper understanding of the genetic variation in Australian beef cattle and be used to inform large genome-wide association studies and the discovery of mutations controlling gene expression. The project focuses on bioinformatics and analysis skills in a fast-developing area of research.

The project is available on an ongoing basis for honours or Masters of Bioinformatics students, full time.

Rumen microbiome

Contact info: Ben Hayes (b.hayes@uq.edu.au), for further information contact Elizabeth Ross (e.ross@uq.edu.au)

Ruminants such as cattle are host to a vast array of microbial species which reside in a specialised chamber of their stomach called the rumen. Microbes in the rumen digest the feed which the animals eat. Cutting edge sequencing technologies now allow for accurate profiling of microbiome communities. This study will analyse the microbes that live inside the rumen of cattle fed a methane mitigating diet. Methane is a potent greenhouse gas that is produced as a by-product of ruminant digestion. The goal of this study is to utilize high throughput sequencing to identify the species of microbe that have a changed abundance in response to the diet and therefore increase our understanding of how the rumen microbiome can be manipulated to reduce methane emissions from ruminants.

The project is available on an ongoing basis for honours or Masters of Bioinformatics students, full time. PhD projects are also available in this area.

Comparing algorithms to estimate polygenic effects

Contact info: Ben Hayes (b.hayes@uq.edu.au)

With the advent of new genomic technologies comes the need to develop new statistical and computational algorithms that can handle large amounts of data in Animal Science. Within the Bayesian paradigm, current methods to estimate polygenic effects for complex traits rely mostly on Gibbs sampling. These approaches are not necessarily scalable to big datasets, as the computation time grows more than linearly with sample size. This means that huge computational resources, in terms of RAM and/or computing time, are needed to fit such models. The aim of this project is to compare the performance of alternative inference algorithms when estimating polygenic effects for complex traits in tropically adapted beef cattle. In addition to Gibbs sampling, at least two algorithms will be compared: Hamiltonian Monte Carlo, another Markov chain Monte Carlo (MCMC) method, and Variational Inference, an optimisation-based approximation. The student will also learn the basics of Bayesian Statistics and High Performance Computing at UQ.
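As a rough illustration of the Gibbs sampling baseline, the sketch below samples marker effects one at a time from their full conditional distributions under a simple ridge-type model (normal prior on each effect). It assumes the residual and effect variances are known, which real analyses would not; the function name, parameters and toy data are hypothetical and do not represent the project's software.

```python
import random


def gibbs_snp_effects(x, y, var_e=1.0, var_b=1.0, n_iter=2000, burn_in=500, seed=1):
    """Posterior mean of marker effects from a single-site Gibbs sampler.

    Model (an assumption for this sketch): y = Xb + e, e ~ N(0, var_e),
    b_j ~ N(0, var_b), with both variances treated as known.
    """
    rng = random.Random(seed)
    n, p = len(x), len(x[0])
    lam = var_e / var_b                      # shrinkage parameter
    xtx = [sum(x[i][j] ** 2 for i in range(n)) for j in range(p)]
    b = [0.0] * p
    resid = [float(v) for v in y]            # y - Xb, with b initialised at 0
    sums = [0.0] * p
    for it in range(n_iter):
        for j in range(p):
            # Add marker j's current contribution back into the residuals.
            for i in range(n):
                resid[i] += x[i][j] * b[j]
            rhs = sum(x[i][j] * resid[i] for i in range(n))
            denom = xtx[j] + lam
            # Full conditional of b_j is normal with this mean and variance.
            b[j] = rng.gauss(rhs / denom, (var_e / denom) ** 0.5)
            for i in range(n):
                resid[i] -= x[i][j] * b[j]
        if it >= burn_in:
            for j in range(p):
                sums[j] += b[j]
    return [s / (n_iter - burn_in) for s in sums]


# Toy data with orthogonal marker columns, so the exact posterior means
# are x_j'y / (x_j'x_j + lam) = 8/5 and -4/5.
genotypes = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
phenotypes = [1.0, 3.0, -3.0, -1.0]
posterior_means = gibbs_snp_effects(genotypes, phenotypes, n_iter=4000, burn_in=1000)
print(posterior_means)
```

Each sweep visits every marker and touches every individual, so the cost per iteration grows with the number of markers times the number of animals, which gives a sense of why scalability becomes a concern for large datasets.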

The project is available on an ongoing basis for honours or Masters of Bioinformatics students, full time.

How genomic variation and gene expression impact fertility

Contact info: Ben Hayes (b.hayes@uq.edu.au), for further information contact Bailey Engle (b.engle@uq.edu.au)

Fertility is an important but complex genetic trait influenced by a large number of genetic and environmental factors. In beef cattle, this combination of variables affects the pregnancy potential of a cow. The goal of this project is to assess genetic variation leading to different pregnancy outcomes in Brahman heifers. Students will use high density SNP genotypes and RNA-seq data to identify genes and genomic regions influencing positive pregnancy status in a very large genomic and phenotypic data set (up to 30,000 cattle). This project will develop skills in bioinformatics and data analysis, management of extremely large biological datasets, and applied quantitative genetics to understand and improve cow fertility.

The project is available on an ongoing basis for honours or Masters of Bioinformatics students, full time.

Develop tools to impute methylation sites from low-coverage sequencing

Contact info: Loan Nguyen (t.nguyen3@uq.edu.au)

In humans, the methylation state of CpG sites changes with age and can therefore be utilized as an accurate biomarker for aging. In cattle, biological age prediction based on methylation status could provide key information for genetic improvement programs. Additionally, comparing chronological age with biological age (based on methylation status) can provide important information about the stress an animal has been under during its lifetime. In this project, students will use cutting-edge data sources, including reduced representation bisulphite sequencing data, whole genome bisulphite sequencing, long-read sequencing and human methylation data, to develop a tool to impute methylation sites from low-coverage ONT sequence data.

This project is designed for students who are studying for Masters of Molecular Biology, Masters of Biotechnology, & Masters of Bioinformatics.

Available for semester 1, 2 and summer

Differentially methylated regions related to puberty in Brahman cattle

Puberty is a complex whole-body phenomenon that affects bone growth. In this study, we investigated how puberty in Bos indicus females affects methylation profiles in the epiphyseal growth plate, the cartilage that is essential to bone growth in long bones. The student will analyse nanopore sequencing data from 12 samples (6 pre-puberty and 6 post-puberty) to call methylation and identify the differentially methylated regions between these two groups.

This project is designed for students who are studying for Masters of Molecular Biology, Masters of Biotechnology, & Masters of Bioinformatics.

Available for semester 1, 2 and summer

CRISPR

Contact: Dr Dimitri Perrin (dimitri.perrin@qut.edu.au) and Prof. Ernst Wolvetang (AIBN/UQ)

CRISPR-Cas9 revolutionised the field of gene editing and opened up new applications in health, agriculture, environment, etc. The default approach is to rely on a guide RNA (gRNA) engineered to bind at a specific location of interest in the genome, and on Cas9 to ‘follow’ this guide and make a cut at that position. A more recent approach is to use ‘base editors’ that combine enzymes modifying DNA with inactive Cas9 variants. This allows base changes without a cut, a DNA repair mechanism or a donor template. This approach has huge potential, but the performance of base editors is still incompletely characterised.

In this project, you will develop a computational method to design optimal gRNAs for base editors, and validate these results in the context of human induced pluripotent stem cells. You will investigate the efficiency of the guide, as well as characterise the off-target effects and discover determinants that influence the targeting specificity of gRNA. By doing so, the objective is to develop an optimised gRNA selection and design platform that can be used to accelerate human stem cell research.

Origin and evolution of animal aquaporins

Contact: Dr Sandie Degnan (s.degnan@uq.edu.au)

Aquaporins are pore-forming membrane channels that play critical roles in controlling the water contents of cells in all kingdoms of life. Their discovery in 1992 led to the 2003 Nobel Prize in Chemistry. Aquaporins are remarkable for permitting the transport of water, and sometimes other small solutes, across biological membranes, while at the same time being completely impermeable to charged species. Both their physiological functions and the physiological relevance of their selectivity to unconventional permeants are poorly understood, not least because of the paucity of data from non-model organisms living in diverse environments with respect to water availability or osmolality. This project will involve bioinformatic characterisation, and phylogenetic analysis, of aquaporins from marine animals representing phyletic lineages at the base of the animal tree, including sponges, ctenophores, cnidarians and placozoans. There will be opportunity to explore the possibility of acquisition by animals of aquaporins from bacteria, as has been demonstrated already in plants, and thus to help clarify fundamental issues about the evolutionary origin and diversification of animal aquaporins.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

Machine learning and data integration in bioinformatics

Contact: Mikael Bodén (m.boden@uq.edu.au)

Biological data at greater scale give statistical power to distinguish meaningful signals from mere noise or artefacts, i.e. to identify "drivers" and "determinants" of function and structure.

The application of machine learning is far-reaching: my group is involved in several collaborative projects, ranging from mapping regulatory events during development and identifying cancer/disease drivers to engineering proteins for medical and industrial applications. With emerging technologies come new data types; there are challenges with combining them to compose an accurate representation of complex biology. If you have an interest in using and/or developing methods applicable to such problems, there is scope to articulate specific projects.

Available all year, for bioinformatics students with problem-solving skills, Honours or Masters.

Reconstruction of ancestral proteins

Contact: Elizabeth Gillam (e.gillam@uq.edu.au), Yosephine Gumulya (y.gumulya@uq.edu.au), or Mikael Boden (m.boden@uq.edu.au)

We are looking for a student interested in sequence analysis for the purpose of reconstructing ancestral proteins that may have existed many millions of years ago. This work can be focused on the application of current methodologies to generate candidate ancestral sequences, and/or the development of computational methodologies to support the process of generating candidate ancestors or ancestral components. The former angle is short term (a #2 course or part of a longer project course) and is suited to candidates with an interest in sequence analysis for understanding evolution of protein function and protein engineering. The latter angle is longer term (semester long) and suits candidates with strong computational skills.

Available all year, for Master of Bioinformatics students; suitable for one semester (#2 – #8).

Transcriptomic and proteomic analysis of cone snail venom

Contact: Richard J. Lewis (r.lewis@uq.edu.au)

The one-year project would analyse the transcriptome from the venom duct of a cone snail species using advanced bioinformatic tools and prepare tables and figures to visualise the data. If the study goes well, the student's work should end up as part of a publication. The following articles provide background to the project.

Dutertre S, Jin A-H, Vetter I, Hamilton B, Sunagar K, Lavergne V, Dutertre V, Fry BG, Antunes A, Venter DJ, Alewood PF, Lewis RJ (2014) Evolution of separate predation- and defence-evoked venoms in carnivorous cone snails. Nature Communications 5:3521.

Dutertre S, Jin AH, Kaas Q, Jones A, Alewood PF, Lewis RJ (2013) Deep venomics reveals the mechanism for expanded peptide diversity in cone snail venom. Mol Cell Proteomics 12:312-329.

Investigation of vaccine targeting to skin with microprojection arrays

Contact: Dr Stefano Meliga (s.meliga@uq.edu.au)

The Nanopatch is a high-density microprojection array for epidermal and dermal delivery of vaccines. Application to mouse skin has resulted in immunogenicity comparable with intramuscular injection using less than 1/100 of the dose. However, the mechanisms underlying this low-dose potent response are not yet fully understood. Experimental evidence suggests that enhanced immunogenicity is triggered by precise targeting of vaccine to skin antigen-presenting cells in conjunction with generation of controlled levels of cell damage. We offer student projects aiming at the numerical investigation of Nanopatch-mediated vaccine targeting, generation of inflammation and triggering of signalling pathways leading to immune response. The successful candidate will develop and apply mathematical / statistical models to verify the mode-of-action hypotheses, and have the chance to drive the design of the next-generation delivery device. This project is computer-centred and will take place at AIBN. Requirements: applied mathematical and statistical skills and programming ability (e.g. MATLAB, R); interest in working in a highly multidisciplinary team is a plus; knowledge of immunology is not mandatory.

Available all year.

Genomics and epitranscriptomics of coral reef symbionts

Contact: Dr Cheong Xin Chan (c.chan1@uq.edu.au)

Symbiodiniaceae are a specialised group of dinoflagellate algae that grow symbiotically with diverse coral reef animals including corals and sponges. A modest, episodic increase in ocean water temperature can break down the coral-dinoflagellate symbiotic association (thus causing coral bleaching); unless this symbiosis is soon re-established, corals are at risk of starvation, disease and death. Our group is interested in the genome evolution of Symbiodiniaceae and their closely related species, specifically related to their evolutionary transition from free-living to symbiotic lifestyles, and its functional implications for the coral host and the health of coral reefs in light of global climate change. This project aims to discover genes and functions in these algae that are specific and/or relevant to environmental adaptation through a comparative genomic approach, using newly sequenced and existing data (Illumina, PacBio and Nanopore). Using native direct RNA sequencing, we are also investigating the impact of mRNA modification (i.e. the epitranscriptome) on genome evolution of these ecologically important species.

This project is strictly computational. The researcher will acquire skills in genomics and bioinformatics of non-model systems, specifically in the analysis of high-throughput sequencing data, comparative analysis of large-genome-scale data, functional annotation of genome sequences, and/or the use of machine learning in sequence analysis. The researcher will work as part of a team, and is expected to produce a report or oral presentation at the end of their project. Research outcomes may be included in a scholarly publication.

This project is available on an on-going basis, and is suitable for Masters students or advanced undergraduate students (year 3+) with a strong background in life sciences (i.e. biology and related subject areas), mathematics, and/or computer sciences. A background in genomics and/or bioinformatics is desirable but not essential. This project will require scripting (e.g. Python), high-performance computing in the UNIX environment, and/or R.

QUT PhD Scholarships in Genomics and Computational Biology focused on Indigenous Health

Contact: Dr Shiv Nagaraj (shiv.nagaraj@qut.edu.au)

Life expectancy of Aboriginal and Torres Strait Islander Australians is much lower than that of other Australians, in part due to an apparent genetic predisposition to chronic diseases. Better understanding of this genetic contribution has the potential to improve early detection and target prevention strategies. This project will use whole genome sequencing (WGS) to define the genetic architecture of Indigenous Australians and its association with serious chronic diseases, helping to develop a precision medicine approach that will enable accurate diagnosis and inform targeted treatments. This project builds on the most comprehensive chronic disease profiling performed in any Indigenous community, the longest follow-up, treatment and prevention trials, and documentation of endpoints. The PhD research work could ultimately lead to development of tests for early detection of chronic disease, to protocols of personalised management, and ultimately, better health outcomes for Indigenous Australians.

Project: Genomic architecture of chronic disease in Australia’s First Peoples

The overall goal of this project is to understand the landscape of Indigenous genomes and define its architecture using nearly 500 whole genomes from Australia’s First Peoples. The student will be trained in the analysis of next-generation sequence datasets, including copy number variation and analysis of variants using ACMG guidelines. The project will create a global variant map and study the association of variants with chronic diseases. Experimental functional validation of the effects of the protein variants identified may be undertaken through collaboration with experimental biologists.

This project is aligned with the QUT Annual Scholarship Round. When applying for admission to the PhD, the applicant will be considered in the scholarship round provided that the application is submitted by 20 September 2020.

If successful in the Annual Scholarship Round the candidate will receive a tax-exempt living allowance stipend of $28,092 per annum (indexed annually) for three years. International applicants successful in the Annual Scholarship Round will also receive a full Tuition Fee Sponsorship. There is also the opportunity for a $5000.00 per annum top-up scholarship for applicants of exceptional academic merit.

Application criteria:

Our ideal candidates hold an MSc or equivalent in a relevant discipline (e.g. Bioinformatics, Statistics, Computer Science, Data Science, Genomics etc.) with strong analytical and programming skills. Excellent oral and written communication skills, motivation and the ability to work as part of a team are also required.

Aboriginal and Torres Strait Islander students are strongly encouraged to apply.

Identification and characterization of full-length extrachromosomal circular DNA in post-mitotic neurons

Contact: Dr Qiongyi Zhao (q.zhao@uq.edu.au)

Extrachromosomal circular DNA (eccDNA) has been observed across different species, from yeast to human. Circular DNA enrichment sequencing (CIDER-Seq) is a technique to enrich and accurately sequence circular DNA without the need for polymerase chain reaction amplification, cloning, and computational sequence assembly. In this study, we performed CIDER-Seq and generated long-read (PacBio) sequencing data using post-mitotic mouse neurons. Combined with the Illumina data that were generated previously, this project will require the candidate student to analyse both long-read (PacBio) and short-read (Illumina) sequencing data, to identify and characterize full-length eccDNAs in post-mitotic neurons.

The ideal candidate will have knowledge and experience in bioinformatics and analysis of next-generation sequencing data. Off-shore candidates need to be able to access the UQ computing cluster or a local computing server to process the sequencing data.

Network analysis of connectomics data

Contact: Dr Kai Feng (k.feng@uq.edu.au) and Prof. Barry Dickson (b.dickson@uq.edu.au)

Understanding the logic of biological neural networks will not only shed light on how behaviours are generated by the nervous system but also inspire the design of more agile robots and more efficient artificial neural network architectures. Reconstruction of neural circuits using electron microscopy has reached an unprecedented scale in the last few years, such that the connectome (i.e. the synaptic connections between all neurons) has now been determined for almost the entire brain and nerve cord of the fruit fly Drosophila (further reading: google “hemibrain”). This connectome consists of >10 million synapses between >200,000 neurons. Analysing the structure of this connectome will help to uncover the algorithms used by the brain to perform different kinds of computations. Connectomes can be analysed using methods similar to those developed for other biological networks, such as genomes and proteomes, which also seek to uncover motifs within a network of interacting components. We are specifically interested in the network motifs that generate the fly's motor patterns. Insights from these computational studies will create testable hypotheses to guide our experimental studies.
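To give a flavour of motif analysis, the following minimal sketch counts one classic three-node motif, the feed-forward loop, in a directed graph stored as an edge list. The neuron labels and edges are invented for illustration; a real connectome analysis would typically weight edges by synapse count, apply thresholds, and compare counts against randomised networks.

```python
def count_feed_forward_loops(edges):
    """Count feed-forward loops: ordered triples (a, b, c) with a->b, b->c and a->c.

    Self-connections are ignored. With reciprocal connections, the same three
    neurons can be counted more than once, in different roles.
    """
    successors = {}
    for src, dst in edges:
        if src != dst:
            successors.setdefault(src, set()).add(dst)
    count = 0
    for a, a_targets in successors.items():
        for b in a_targets:
            for c in successors.get(b, ()):
                if c in a_targets:
                    count += 1
    return count


# Hypothetical toy circuit with four neurons.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("A", "D")]
n_ffl = count_feed_forward_loops(toy_edges)
print(n_ffl)  # A->B->C with A->C, and A->C->D with A->D
```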

The project is suitable for one or two semesters, full time. International students working remotely from overseas are welcome. The ideal candidate should have interests in neuroscience and good skills in programming using Python or R. PhD scholarships in related areas are also available.

Complex neoantigen prediction in cancers

Contact: Dr Nic Waddell (nic.waddell@qimrberghofer.edu.au) and Dr Venkat Addala (Venkateswar.Addala@qimrberghofer.edu.au)

Next generation sequencing has allowed researchers to characterise the somatic landscape of cancer genomes, which has led to the discovery of biomarkers that may be predictive and prognostic for targeted therapies. However, current targeted therapies have failed to raise the overall survival curve in many tumour types. Immunotherapy has shown promising benefit in treating many tumours and has demonstrated remarkable responses in some patients, even at the recurrent, relapsed and metastatic stages. The challenge now is to determine which patients respond to treatment and why. Somatic mutations within the genomes of cancer cells may result in neoantigens that are presented on the tumour cell surface. These can be recognised by the patient's immune system, which can then kill the tumour cells.

This project will test and develop bioinformatic approaches that can be applied to understand complex tumour-immune interactions. Specifically, the project will use genome and RNA-seq data to predict neoantigens and determine which of these may be important in immunotherapy. The findings from this work are likely to shed new light on tumour immunology and may predict which patients will respond to immunotherapy. This project requires basic knowledge of R or Python.
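One early step in many neoantigen workflows is enumerating the candidate peptides that span a mutation. The sketch below does this for a single amino-acid substitution, assuming 9-residue peptides (a common length for MHC class I ligands) and a 0-based mutation position; the function name and the protein sequence are hypothetical. It does not score MHC binding or expression, and more complex events such as indels or fusions would need separate handling.

```python
def mutant_peptides(protein, position, alt_residue, length=9):
    """Return every peptide of `length` residues that contains the substitution.

    `position` is the 0-based index of the substituted residue in `protein`.
    """
    if not 0 <= position < len(protein):
        raise ValueError("position lies outside the protein sequence")
    mutated = protein[:position] + alt_residue + protein[position + 1:]
    # Window starts are limited so that each window covers the mutated
    # residue and stays within the sequence.
    first = max(0, position - length + 1)
    last = min(position, len(mutated) - length)
    return [mutated[start:start + length] for start in range(first, last + 1)]


# Hypothetical 22-residue sequence with a Q->L substitution at index 10.
example_protein = "MKTAYIAKQRQISFVKSHFSRQ"
peptides = mutant_peptides(example_protein, 10, "L")
for peptide in peptides:
    print(peptide)
```

Each returned peptide could then be passed to a binding predictor of choice, alongside the matching wild-type peptide for comparison.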

The project is available throughout the year for students with bioinformatics knowledge (Honours or Masters). This project cannot be carried out remotely due to ethics restrictions on data sharing.

Authors

Contributing authors: