Open projects in Bioinformatics

Below you will find descriptions of projects that bioinformatics students may undertake in fulfilment of project courses in their respective degrees. Please contact the person listed for further details.


Finding new phages in the genomes of gut bacteria

Contact: Rosalind Gilbert, ros.gilbert@daf.qld.gov.au, Department of Agriculture and Fisheries; Diane Ouwerkerk, diane.ouwerkerk@daf.qld.gov.au, Department of Agriculture and Fisheries.

Viruses infecting bacteria (bacteriophages, or phages) are highly abundant in microbial ecosystems such as those found in the gut. This project will involve finding and annotating novel bacteriophages present as prophage elements within the genome sequences of gut-associated bacteria: for example, phages infecting genera often found in ruminant livestock, such as Ruminococcus, Bacteroides and Butyrivibrio. Prophage elements will be annotated, characterised and compared to previously identified bacteriophages. The extent to which these novel prophages occur in gut microbial ecosystems will also be determined through comparison with metagenomic datasets. This computer-based project will use bioinformatics tools to build and interrogate sequence datasets, and combines interests in microbial genetics and viral ecology. Available all year, for Master of Bioinformatics or Honours students; suitable for 2-unit (one day per week, one semester) or 4-unit (two to three days per week, one semester) projects.

Identification of carbohydrate utilisation genes in gut-derived metagenomic datasets

Contact: Rosalind Gilbert, ros.gilbert@daf.qld.gov.au, Department of Agriculture and Fisheries; Diane Ouwerkerk, diane.ouwerkerk@daf.qld.gov.au, Department of Agriculture and Fisheries.

The herbivore gut has evolved to harbour a dense microbial community able to break down plant material that would otherwise be indigestible. To understand how this microbial community digests plant carbohydrates, a metagenomics approach can be used to identify the microbial genes responsible for carbohydrate breakdown. This project will involve processing new metagenomic sequence datasets (quality filtering, de novo assembly and sequence annotation). It will also develop an analysis pipeline to identify the genes responsible for carbohydrate utilisation and to allow comparison of gut-associated metagenomic datasets from ruminant livestock (cattle, sheep) and native animals (kangaroos).

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.
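The quality-filtering step mentioned in this project can be illustrated with a minimal Python sketch. It assumes reads arrive as plain four-line FASTQ records with Phred+33 quality encoding. The thresholds (`min_mean_q=20`, `min_len=50`) and the function names are hypothetical choices made for illustration. A real metagenomics pipeline would normally use an established read-trimming tool rather than this code.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) tuples from four-line FASTQ records."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records or at end of file
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def quality_filter(records: Iterable[Record], min_mean_q: float = 20.0,
                   min_len: int = 50) -> Iterator[Record]:
    """Keep reads that are long enough and have a high enough mean quality."""
    for header, seq, qual in records:
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            yield header, seq, qual
```

With an open file handle, `quality_filter(read_fastq(handle))` streams reads lazily, so a large metagenomic file does not need to fit in memory.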

Visualisation framework for cancer gene modules using Cytoscape

Contact: Sriganesh Srihari (s.srihari@uq.edu.au) or Mark Ragan (m.ragan@uq.edu.au)

Cytoscape (http://www.cytoscape.org/) is an open-source platform for visualisation of molecular networks. In this project, we seek to develop a comprehensive resource of cancer gene alterations (mutations, copy-number changes, gene-expression changes and methylation changes). We will do this by integrating multiple cancer datasets (from The Cancer Genome Atlas, METABRIC and our in-house datasets) into a visualisable, easy-to-navigate network of these alterations built in Cytoscape. Because this project will integrate multiple datasets, some R or Python programming ability is expected, together with experience of, or interest in, handling large data. Loading these data into Cytoscape will require similar data-handling skills.

This is a six-month to one-year project.

Changes in the epigenome associated with alcohol consumption and/or smoking

Contact: Professor Naomi Wray (Naomi.wray@uq.edu.au), Dr. Sonia Shah (s.shah1@uq.edu.au)

Epigenetic processes such as DNA methylation, which are essential for the regulation of gene expression, are dynamic: they are influenced by genetics, lifestyle and environment. We can offer two Honours projects in population epigenetics, one focused on alcohol consumption and the other on smoking. A number of genome-wide association studies (GWAS) have sought to identify genetic variants that may influence susceptibility to alcohol or nicotine dependence. This project aims to perform an epigenome-wide association study (EWAS) to identify CpG sites in the genome whose methylation status is associated with alcohol consumption or smoking status. Using the EWAS-identified CpGs, we would like to develop a predictor of alcohol consumption or smoking from methylation data. In many research cohorts, DNA and disease status are available but data on environmental risk factors may not be. Estimating environmental risk factors from DNA methylation status would therefore be a useful tool in research and forensics. This is a computer-based project suited to those with quantitative skills and some knowledge of statistics and bioinformatics.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.
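The EWAS-then-predictor idea in this project can be illustrated with a rough Python sketch. It has three steps:

- regress each CpG's methylation values on a binary exposure (for example, smoker versus non-smoker);
- keep the CpGs that pass a Bonferroni threshold;
- score new samples with a weighted sum over the selected CpGs.

Everything here is an assumption made for illustration. The CpG identifiers are hypothetical, and the p-value uses a normal approximation rather than the t-distribution. A real EWAS would adjust for covariates such as age, sex and cell-type composition, and would typically build the predictor with penalised regression.

```python
import math
from typing import Dict, Mapping, Sequence, Tuple


def slope_and_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares regression of y on x; returns (slope, t-statistic of the slope).

    Needs at least three samples and a non-constant x.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0.0:  # perfect fit: infinitely significant unless the slope is zero
        return slope, (math.copysign(math.inf, slope) if slope else 0.0)
    return slope, slope / se


def ewas(methylation: Mapping[str, Sequence[float]], exposure: Sequence[float],
         alpha: float = 0.05) -> Dict[str, float]:
    """Return {CpG: effect size} for CpGs associated with exposure at Bonferroni level alpha."""
    threshold = alpha / len(methylation)
    hits = {}
    for cpg, values in methylation.items():
        slope, t = slope_and_t(exposure, values)
        # Two-sided p-value from a normal approximation to the t-statistic (assumption).
        p = math.erfc(abs(t) / math.sqrt(2))
        if p < threshold:
            hits[cpg] = slope
    return hits


def methylation_score(sample: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of methylation values at the selected CpGs (a naive predictor)."""
    return sum(w * sample[cpg] for cpg, w in weights.items())
```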

Identification of DNA methylation changes in Anxiety and Depression

Contact: Dr Divya Mehta (d.mehta1@uq.edu.au), Dr Allan McRae (a.mcrae@uq.edu.au)

Background: By 2030, depression and anxiety are expected to be the number one health concern in the world. Epigenetics may help to explain how genes and environment interact in depression and anxiety disorders, leading to a better understanding of the biological underpinnings of these disorders. DNA methylation is the addition of a methyl group by DNA methyltransferases at cytosine-phosphate-guanine dinucleotide (CpG) sites within the promoter regions of genes, and it is the most widely studied epigenetic mechanism. Recent developments in microarrays make it possible to measure DNA methylation at approximately 480,000 CpG sites in the genome quickly, cost-effectively and at high throughput. CpG methylation in genes may be associated with scores on the Hospital Anxiety and Depression Scale (HADS). It could reflect the underlying genetic and/or non-genetic risk factors for anxiety and depression, and could therefore provide novel insight into the molecular pathways involved in the pathophysiology of these disorders.

Aim of project: To perform an unbiased genome-wide analysis to identify changes in DNA methylation associated with anxiety and depression.

Methods: Data from the Lothian Birth Cohorts of 1921 and 1936 have been combined; 1,366 participants have relevant methylation and phenotype data. Anxiety and depression were measured using the HADS, and methylation data were generated from peripheral blood samples using the Infinium HumanMethylation450 array. All statistical analyses will be carried out in R, a free software environment for statistical computing and graphics.

Expected outcomes: We expect to identify novel genes whose DNA methylation patterns are associated with depression or anxiety. The long-term goal of these studies is to one day provide novel biomarkers that predict susceptibility to, or onset of, disease, improve diagnosis and aid the development of epigenetics-based therapies.

Requirements: This is a computer-based project suited to students with basic quantitative skills.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

Shared genetic influences between telomere length and schizophrenia

Contact: Professor Naomi Wray (naomi.wray@uq.edu.au), Dr Divya Mehta (d.mehta1@uq.edu.au)

Telomeres are DNA-protein complexes located at chromosome ends. Telomere shortening is a marker of oxidative stress and ageing. Telomeres play a key role in disease because they prevent chromosome fusion and maintain genome stability. Psychiatric disorders including schizophrenia, bipolar disorder and major depressive disorder are associated with telomere shortening. This study will test the genetic link between telomere length and schizophrenia.

A polygenic risk score is a measure of an individual's genetic risk of disease. Polygenic risk scores have been increasingly used to understand the polygenic architecture of psychiatric diseases and to investigate the overlap between psychiatric diseases and other phenotypes. This study will use polygenic scores built from genetic variants associated with telomere length to test whether these predict risk of schizophrenia in an independent sample. We will use the most recent data from the Psychiatric Genomics Consortium (over 60,000 people) and results from genome-wide association studies (GWAS) of telomere length. The genetic overlap between telomere length and schizophrenia will be interrogated using statistical pipelines built with PLINK and R on Linux. Depending on initial results, we will extend these analyses to other psychiatric disorders such as bipolar disorder and major depressive disorder. The results will indicate whether the molecular underpinnings of telomere length also contribute to genetic susceptibility to psychiatric disorders. This is a computer-based project suited to students with basic quantitative skills.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.
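In its simplest form, the polygenic risk score described in this project is a weighted sum of an individual's effect-allele counts. The weights come from a discovery GWAS, here the telomere-length GWAS. The minimal Python sketch below makes two assumptions: effect alleles are already aligned between the GWAS summary statistics and the genotype data, and SNPs have already been clumped for linkage disequilibrium. The function name, input layout and p-value threshold are hypothetical. In practice these steps are usually run with PLINK, as the project description notes.

```python
from typing import Mapping, Tuple


def polygenic_score(
    gwas: Mapping[str, Tuple[float, float]],  # SNP -> (effect size beta, p-value) from the discovery GWAS
    dosages: Mapping[str, float],             # SNP -> effect-allele count for one person (0-2, may be imputed)
    p_threshold: float = 1.0,                 # include only SNPs with p <= p_threshold
) -> Tuple[float, int]:
    """Return (score, number of SNPs used) as the sum of beta * dosage over eligible SNPs."""
    score = 0.0
    n_used = 0
    for snp, (beta, p) in gwas.items():
        if p > p_threshold or snp not in dosages:
            continue  # skip SNPs above the threshold or missing from this person's genotypes
        score += beta * dosages[snp]
        n_used += 1
    return score, n_used
```

Scores computed at several p-value thresholds could then be compared between schizophrenia cases and controls in the independent sample, for example with logistic regression.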

The role of trans-eQTLs in schizophrenia

Contact: Professor Peter Visscher (Peter.Visscher@uq.edu.au), Dr Jake Gratten (j.gratten1@uq.edu.au), Dr Joseph Powell (joseph.powell@uq.edu.au)

Recent genome-wide association studies (GWAS) have identified more than 100 genomic regions containing common genetic variants underlying risk of schizophrenia, but the biological and molecular mechanisms behind these associations are not yet understood. The prevailing theory is that many will be explained by genetic variants influencing gene regulation, so-called expression quantitative trait loci (eQTLs). eQTLs are classified as cis or trans according to whether they influence the RNA transcript levels of nearby (cis) or distant (trans) genes. Most attention to date has focused on cis-eQTLs, but trans-eQTLs are likely to be extremely important.

The starting hypothesis for this proposal is that trans-eQTLs explain a proportion of the identified schizophrenia associations. A genome-wide scan will be performed using RNA transcription data from post-mortem human brain to identify genes whose regulation is controlled in trans by eQTLs in schizophrenia-associated genomic regions. Common genetic variants (single nucleotide polymorphisms, or SNPs) in the vicinity of these genes will then be assessed for enrichment of association signal in Psychiatric Genomics Consortium (PGC) GWAS datasets. Additional analyses are possible, and we encourage the student to develop and explore their own ideas in consultation with the supervisors. This is a computer-based project suited to those with quantitative skills, some knowledge of statistics and bioinformatics, and a desire to improve their programming skills.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.
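The cis/trans distinction in this project is usually operationalised as a distance rule. The Python sketch below assumes a common convention: an eQTL is called cis when the SNP lies on the same chromosome within 1 Mb of the gene's transcription start site. The window size varies between studies. The sketch then keeps genes that are regulated in trans by SNPs inside schizophrenia-associated regions. The `Locus` type, the function names and the example coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Locus:
    chrom: str
    pos: int  # base-pair position


def classify_eqtl(snp: Locus, gene_tss: Locus, cis_window: int = 1_000_000) -> str:
    """'cis' if the SNP is on the gene's chromosome within cis_window bp of its TSS, else 'trans'."""
    if snp.chrom == gene_tss.chrom and abs(snp.pos - gene_tss.pos) <= cis_window:
        return "cis"
    return "trans"


def trans_targets(
    associations: Iterable[Tuple[Locus, str, Locus]],  # (eQTL SNP, gene name, gene TSS)
    risk_regions: List[Tuple[str, int, int]],          # (chrom, start, end) of GWAS risk regions
    cis_window: int = 1_000_000,
) -> Set[str]:
    """Genes controlled in trans by eQTL SNPs that fall inside a risk region."""
    targets = set()
    for snp, gene, tss in associations:
        in_region = any(snp.chrom == c and start <= snp.pos <= end
                        for c, start, end in risk_regions)
        if in_region and classify_eqtl(snp, tss, cis_window) == "trans":
            targets.add(gene)
    return targets
```

Some studies define trans more strictly, as inter-chromosomal only. Adopting that definition would only require changing `classify_eqtl`.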

Analysis of candidate genes for motor neuron disease

Contact: Dr Marie Mangelsdorf (m.mangelsdorf@uq.edu.au)

Motor neuron disease (MND) is a late-onset neurodegenerative disease in which the motor neurons that control muscle movement die, usually leading to paralysis and death within 3 years of diagnosis. There is no cure. MND may be familial or sporadic. More than 20 MND genes have been identified, largely through analysis of familial cases, and for most of these genes sporadic cases have also been shown to harbour mutations. Currently, mutations in these genes account for ~60% of familial cases and 10% of sporadic cases. We are undertaking a multi-faceted genomics study of sporadic cases in order to uncover the genetic causes of sporadic MND. One aspect of this study is whole exome sequencing (WES) of sporadic cases. The Honours project will validate and investigate sequence variants in candidate genes identified by WES. Techniques used to determine the pathogenicity of the variants will include polymerase chain reaction, Sanger sequencing, molecular cloning, tissue culture and microscopy.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

Risk prediction for psychiatric disorders in a prodromal cohort

Contact: Professor Naomi Wray (Naomi.Wray@uq.edu.au), in conjunction with Professor Ian Hickie, University of Sydney

Young people presenting at adolescent mental health facilities display prodromal symptoms for which the diagnosis, and hence the optimum treatment strategy, may be unclear. Some may be on trajectories towards long-term mental health issues, while others are struggling with normal teenage angst and may not have long-term mental health problems. We have longitudinal data collected on a cohort of ~200 individuals attending a youth mental health clinic in Sydney. These individuals also provided a DNA sample and have been genotyped genome-wide. The student will construct predictors of genetic risk for a range of psychiatric disorders (and also for non-psychiatric disorders as a negative control), and will interrogate the longitudinal phenotypic data together with the genetic risk predictions. The ultimate goal is to determine whether genetic risk prediction at the time of first presentation at a youth mental health clinic has any clinical utility. This is a computer-based project that requires numerical competency and combines interests in genetics, psychiatry and epidemiology.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

The role of the mitochondrial genome in gene expression

Contact: Professor Peter Visscher (peter.visscher@uq.edu.au), Dr. Matthew Robinson (m.robinson11@uq.edu.au)

A central goal in modern biology is to understand how the integration of gene functions across a genome leads to an individual's specific phenotype. A key facet of this effort is to develop models that mathematically link genetic to phenotypic variation, an aim that is central to studies of human medical genetics. However, most genomics studies confine their analysis largely to the nuclear genome, with much less attention paid to the organellar genome (mitochondrial DNA). This neglect contrasts with the central role that the organellar genome plays in controlling organismal metabolism and function. It also contrasts with increasing evidence from non-human organisms that organellar genomic variation can modulate the effects of nuclear genomic variation. Genomic variation in human mitochondria has been linked to several severe diseases, and more recent quantitative studies of common human diseases have suggested that genetic variation in organellar genomes may modify the effects of nuclear loci. This study aims to directly estimate the amount of variation in gene expression attributable to organellar DNA and to interrogate the interaction between the nuclear and cytoplasmic genomes. This is a computer-based project suited to those with quantitative skills and some knowledge of statistics and/or bioinformatics.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

Genetics of X Inactivation

Contact: Professor Peter Visscher (peter.visscher@uq.edu.au), Dr. Allan McRae (a.mcrae@uq.edu.au)

X-chromosome inactivation is a process in which one of the two copies of the X chromosome in a female is inactivated early in development. Usually this inactivation occurs randomly, but one chromosome can be preferentially inactivated, which can mask X-linked disorders. In addition, some genes can escape inactivation and have the "inactivated" copy expressed. The extent of escape from X-inactivation at a gene can vary among individuals: some people show complete inactivation at a gene, while others have both copies of the gene fully expressed. This project will use gene-expression and DNA methylation microarray data to identify genes on the X chromosome that vary across individuals in their extent of escape from X-inactivation. The identified genes will then be tested to see whether any genetic variants explain the variation between individuals. This is a computer-based project suited to those who have some knowledge of statistics and a willingness to learn basic data manipulation skills.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

Gene expression from dried blood spots

Contact: Professor Naomi Wray (Naomi.wray@uq.edu.au), in conjunction with Professor Grant Montgomery at BQIMR (Grant.Montgomery@qimrberghofer.edu.au)

Gene expression studies are providing important insight into variation between individuals and into the causes and consequences of disease. Currently, gene expression studies require biological samples such as blood to be stabilised with RNA stabilisation products, so that the measured gene expression represents expression at the time of collection. At the Berghofer Queensland Institute of Medical Research we have collected large cohorts of twins and their family members. The participants have completed a broad range of interviews, questionnaires and other tests. In more recent years, RNA has been collected using PAXgene tubes. For all participants, blood has been collected in EDTA tubes and also spotted onto special filter paper. Recent publications have shown that RNA can be measured from dried blood spots. Measuring RNA from these historical samples would be very valuable for the BQIMR cohorts. It would also save grant funds if collection using expensive PAXgene tubes were no longer needed.

In this project, the student would develop protocols for measuring RNA from dried blood spots, and would compare gene expression levels in the same individuals using RNA from dried blood spots and from PAXgene tubes. If the project progresses well, it can be extended to examine gene expression from different paper types and the impact of sample age.

This is a lab-based project followed by analysis of gene expression data. It is suited to those with interests in molecular genetics, and is shared across BQIMR (lab work) and QBI (analysis). Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

Calling variants from whole exome sequencing data: a matter of depth

Contact: Professor Naomi Wray (Naomi.wray@uq.edu.au), Dr Qiongyi Zhao (q.zhao@uq.edu.au)

Recent advances in sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and to identify mutations relevant for diagnosis and therapy. In particular, whole-exome sequencing (WES) using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community because of its moderate cost, manageable data volumes and the straightforward interpretation of its results.

Although WES is increasingly popular, there is still no consensus on the optimal sequencing depth per sample for variant calling in terms of accuracy, sensitivity and specificity. In this project, we will investigate the effect of sequencing depth on variant discovery by applying the GATK pipeline to our in-house WES data, calling variants both at the individual level and by joint genotyping. If time allows, the study can be extended to test different variant callers in GATK, or even to include other publicly available variant-calling methods (e.g. the Illumina ELAND and CASAVA software tools). The results will help guide the design of future WES projects. The student will learn basic programming skills (ideally in Perl) and gain experience in using popular NGS analysis tools. This is a computer-based project suited to a student with basic quantitative skills.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.
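Comparing call sets across sequencing depths comes down to counting agreements with a reference set of variants. The Python sketch below assumes a trusted truth set is available, for example calls from the full-depth data or from an independent genotyping platform. It represents variants as hypothetical `(chrom, pos, ref, alt)` tuples and computes sensitivity and precision. Specificity additionally needs the number of assessed sites, which is passed here as `n_sites`. This is an illustration only and is not part of GATK.

```python
from typing import Dict, Optional, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def call_metrics(calls: Set[Variant], truth: Set[Variant],
                 n_sites: Optional[int] = None) -> Dict[str, float]:
    """Sensitivity, precision and (optionally) specificity against a truth set.

    A call counts as a true positive only if chrom, pos, ref and alt all match.
    """
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    metrics = {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }
    if n_sites is not None:
        # True negatives: assessed sites with neither a call nor a true variant.
        touched = {(c, p) for c, p, _, _ in calls | truth}
        tn = n_sites - len(touched)
        metrics["specificity"] = tn / (tn + fp) if tn + fp else float("nan")
    return metrics
```

Running `call_metrics` once per down-sampled depth gives a simple curve of sensitivity and precision against sequencing depth.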

Genetic mechanisms underlying beneficial effects of pregnancy in multiple sclerosis

Contact: Dr Marie Mangelsdorf (m.mangelsdorf@uq.edu.au), Dr Divya Mehta (d.mehta1@uq.edu.au)

Interestingly, pregnancy, although a psychosocial stressor, has a long-term beneficial effect on the development of multiple sclerosis. The precise mechanisms involved in this protective effect are not clear. Do a subset of genes confer a protective effect against multiple sclerosis via altered gene expression profiles in pregnancy? This project aims to evaluate longitudinal gene expression profiles in women during pregnancy. Whole-genome gene expression data will be analysed to identify genes that are significantly differentially expressed across pregnancy. Top candidates will be selected for further analysis in a rat model of multiple sclerosis: RNA will be extracted from rat brain and spinal cord, and gene expression will be tested by qPCR. This project encompasses statistics, bioinformatics and wet-laboratory procedures, providing a holistic research experience.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

Bioinformatics analysis to characterise species of gnathiid isopods

Contact: Jess Morgan, Jessica.morgan@uq.edu.au, The University of Queensland.

I am looking for a motivated student to help me investigate applying bioinformatics techniques to characterise species of gnathiid isopods that parasitise fish. Specifically, we aim to capture the complete mtDNA of the parasites; any other markers would be a downstream bonus. I need a student capable of assembling contigs from a next-generation sequencing library (either Ion Torrent or Illumina, as determined by their literature review). The student would then mine the contigs and scaffold them against published crustacean mitochondrial genomes via BLAST searches. This is a bioinformatics project for semester 2, 2014 and, depending on funding, may have the option to extend into semester 1, 2015. If you are interested in applying, please forward a short resume and your academic record to Jess Morgan (Jessica.morgan@uq.edu.au).

Available for Master of Bioinformatics students; suitable for one or two semesters, full-time.

Analysing cancer genomic datasets using HotNet

Contact: Sriganesh Srihari (s.srihari@uq.edu.au) or Mark Ragan (m.ragan@uq.edu.au), Institute for Molecular Bioscience, The University of Queensland

HotNet (http://compbio.cs.brown.edu/projects/hotnet/) is an algorithm developed by Eli Upfal and colleagues at Brown University to identify significantly altered subnetworks within large gene interaction networks using mutation scores for genes.

We are interested in applying HotNet to breast cancer datasets to understand alterations in key signalling pathways of the DNA-damage response (DDR) machinery. Our focus will primarily be on familial breast cancers, which are caused by germline mutations in key members of the DDR such as BRCA1, BRCA2, ATM, BRIP1 and PALB2. Interestingly, although these genes belong to the same or closely related pathways, alterations in them result in distinct molecular and clinical phenotypes. For instance, BRCA1 and BRCA2 belong to the same pathway (homologous recombination repair of DNA double-strand breaks). Yet tumours initiated by BRCA1 mutations are considerably more aggressive than BRCA2-initiated tumours: they show lower expression of oestrogen and progesterone receptors, higher metastatic ability and lower survival rates. We hypothesize that identifying the exact subpathways affected in these two tumour groups could shed further light on the reasons behind these differences, and could also help us identify specific drug targets for each group.

The project will involve applying HotNet to our in-house datasets to extract subpathways relevant to familial breast cancer groups. Requirements: programming ability in Python and/or C/C++. Some experience of, or interest in, network/graph algorithms is a plus. Knowledge of biology or cancer is not mandatory.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

Integration of multi-omic cancer datasets via networks

Contact: Sriganesh Srihari (s.srihari@uq.edu.au) or Mark Ragan (m.ragan@uq.edu.au), Institute for Molecular Bioscience, The University of Queensland

Over the last few years, there has been a deluge of studies producing large-scale datasets spanning epigenomic, genomic, gene expression, transcriptomic and proteomic profiles from thousands of cancer samples. Each of these datasets presents a distinct “mono-omic lens” through which to view cancer. An integrated multi-omic picture, which is crucial for investigating the complex mechanisms of cancer, is largely missing, because we lack methods to integrate the individual omic datasets in an additive and efficient manner. Networks of interactions among genes (conceptual networks), or of physical interactions among proteins (molecular networks), offer a promising way to integrate multi-omic datasets effectively.

This project seeks to develop efficient computational methods, based on fundamental concepts in network/graph theory, to integrate multi-omic datasets in cancer. We will devise a network propagation-based algorithm that takes different kinds of omic data on individual genes and propagates them to other genes, thereby building integrative multi-omic networks. The resulting network will be a highly enriched resource of biological information, and mining it could lead to novel and valuable insights into cancer mechanisms. The theory and algorithms required in this project have already been devised or investigated. The student is expected to discuss them with the supervisor, quickly understand them and implement them in software. The student should have some experience in developing software using Python, Perl or C/C++, and will be required to perform experiments and analysis in constant discussion with the supervisor.

This project is ideally suited to a student with a basic degree in computer science, mathematics, physics or statistics, although students from other backgrounds with the interest, skills and enthusiasm to work on a computational project are always welcome. Opportunities exist for developing new ideas and approaches, and for contributing to high-quality publications.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.
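Network propagation takes several forms. One widely used variant is a random walk with restart. Seed scores on genes (for example, mutation or expression evidence) are repeatedly spread to neighbouring genes, while a fraction `restart` of the score returns to the seeds at each step. The pure-Python sketch below illustrates that general technique. It is not the specific algorithm this project will devise, and the toy network, seed scores and parameter values are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable


def propagate(edges: Iterable[Tuple[Node, Node]], seeds: Dict[Node, float],
              restart: float = 0.5, tol: float = 1e-10,
              max_iter: int = 1000) -> Dict[Node, float]:
    """Random walk with restart on an undirected network.

    Iterates p <- restart * p0 + (1 - restart) * W p, where W spreads each node's
    score equally among its neighbours and p0 is the normalised seed vector.
    """
    nbrs: Dict[Node, Set[Node]] = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    for n in seeds:
        nbrs.setdefault(n, set())
    total = sum(seeds.values())
    if total <= 0:
        raise ValueError("seed scores must sum to a positive value")
    p0 = {n: seeds.get(n, 0.0) / total for n in nbrs}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nbrs}
        for u, neighbours in nbrs.items():
            if neighbours:
                share = (1 - restart) * p[u] / len(neighbours)
                for v in neighbours:
                    nxt[v] += share
            else:
                nxt[u] += (1 - restart) * p[u]  # an isolated node keeps its own score
        converged = sum(abs(nxt[n] - p[n]) for n in nbrs) < tol
        p = nxt
        if converged:
            break
    return p
```

For multi-omic integration, one simple option (again an assumption) is to propagate each omic layer's gene scores separately and then combine the propagated profiles, for example by averaging their ranks.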

Co-expression analysis to elucidate common factors underlying cancer cell metastasis

Contact: Alison Anderson or Mark Ragan (m.ragan@uq.edu.au), Institute for Molecular Bioscience, The University of Queensland

There is little overlap among genes identified as candidate mediators of metastatic colonisation in brain, lung, bone and liver (Valastyan & Weinberg, 2011). These tissue-specific profiles may, however, be driven by common transcriptional or post-translational regulatory mechanisms. The key genes involved in these underlying mechanisms are likely to show subtle changes in expression that are not detected by differential expression approaches. An alternative approach is to look for dysregulated co-expression: changes in the correlation between a gene's expression level and those of the genes it is known to interact with. Similarly, different primary cancers can metastasise to the same target organ by developing different molecular programs and activating distinct signalling pathways (Lorusso & Ruegg, 2012). There may be common underlying mechanisms here too that have been missed by traditional differential expression approaches. Consider, for example, the perturbation of a chromatin modifier that is key to providing access to target genes in response to a specific receptor signal. The signal trigger and target genes are tissue-specific, but the role of the chromatin modifier in opening up chromatin is likely to be a common factor. To look for common factors in cell metastasis using co-expression analysis, you would need to:
- construct a candidate gene set by conducting a literature search to identify key genes in the mechanisms of interest, then identify known protein-protein interactions for these genes;
- find suitable publicly available transcription data with which to profile co-expression of genes in the gene set;
- look for common features in the resulting co-expression networks.

Available all year; suitable for Master of Bioinformatics students with basic programming/scripting skills. Knowledge of gene transcription regulatory mechanisms would be an advantage.
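Dysregulated co-expression, as described in this project, can be scored by comparing the correlation of each interacting gene pair in two groups of samples (for example, primary tumours versus metastases). The Python sketch below computes a Pearson correlation in each group and a Fisher z-statistic for their difference, which is a standard test for comparing two independent correlations. The gene names, the data layout and the clipping constant are hypothetical. A real analysis would also correct for multiple testing across all pairs.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("expression profile has zero variance")
    return sxy / math.sqrt(sxx * syy)


def diff_coexpression(
    pairs: Iterable[Tuple[str, str]],    # known interacting gene pairs
    expr_a: Dict[str, Sequence[float]],  # gene -> expression across samples in group A
    expr_b: Dict[str, Sequence[float]],  # gene -> expression across samples in group B
) -> List[Tuple[str, str, float, float, float]]:
    """Return (gene1, gene2, r_A, r_B, z) sorted by |z|, largest change first.

    Each group needs more than 3 samples for the Fisher z standard error.
    """
    results = []
    for g1, g2 in pairs:
        r_a = pearson(expr_a[g1], expr_a[g2])
        r_b = pearson(expr_b[g1], expr_b[g2])
        n_a, n_b = len(expr_a[g1]), len(expr_b[g1])
        # Clip so atanh stays finite when a correlation is exactly +/-1.
        za = math.atanh(max(min(r_a, 0.999999), -0.999999))
        zb = math.atanh(max(min(r_b, 0.999999), -0.999999))
        z = (za - zb) / math.sqrt(1 / (n_a - 3) + 1 / (n_b - 3))
        results.append((g1, g2, r_a, r_b, z))
    return sorted(results, key=lambda row: abs(row[4]), reverse=True)
```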

Method for the reconstruction of an ancestral protein interaction network

Contact: Mikael Boden, m.boden@uq.edu.au, School of Chemistry and Molecular Biosciences, The University of Queensland.

While phylogenetic analysis has enabled the identification and, in some cases, the synthesis of ancestral genes, the broader context of such genes is seldom reconstructed in tandem. This project will investigate methods that enable the identification of interactions amongst ancestral proteins, on the basis of interactions present in extant species. See Gibson and Goldberg (Bioinformatics 27(3) 2011) and Voordeckers et al. (PLoS Biology 10(12) 2012).

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.
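One simple way to infer whether an interaction was present in an ancestor is to treat each interaction as a binary character (present or absent in each extant species) and to reconstruct it on the species tree by maximum parsimony. The Python sketch below applies Fitch's algorithm to a rooted binary tree. It is a deliberately simplified stand-in, not the method of the papers cited for this project. It also ignores gene duplication and loss, which a realistic method would need to model. The tree encoding, node labels and species are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple, Union

# A tree is either a leaf (species name) or (internal-node label, left subtree, right subtree).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def fitch(tree: Tree, leaf_states: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Fitch parsimony for one binary character (1 = interaction present, 0 = absent).

    Returns (state assigned to each internal node, minimum number of state changes).
    """
    candidate: Dict[str, Set[int]] = {}
    changes = 0

    def bottom_up(node: Tree) -> Set[int]:
        nonlocal changes
        if isinstance(node, str):
            return {leaf_states[node]}
        label, left, right = node
        a, b = bottom_up(left), bottom_up(right)
        states = a & b
        if not states:  # children disagree: take the union and count one change
            states = a | b
            changes += 1
        candidate[label] = states
        return states

    assignment: Dict[str, int] = {}

    def top_down(node: Tree, parent_state: Optional[int]) -> None:
        if isinstance(node, str):
            return
        label, left, right = node
        states = candidate[label]
        # Keep the parent's state when possible; otherwise break ties deterministically.
        state = parent_state if parent_state in states else min(states)
        assignment[label] = state
        top_down(left, state)
        top_down(right, state)

    bottom_up(tree)
    top_down(tree, None)
    return assignment, changes
```

Running the function once per interaction observed in any extant species yields a parsimonious ancestral network: the interactions assigned state 1 at a given internal node.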

Identification and characterization of short open reading frames

Contact: Joseph Rothnagel, j.rothnagel@uq.edu.au, School of Chemistry and Molecular Biosciences, The University of Queensland.

Short peptides (sPEPs) encoded by short open reading frames (sORFs) are surprisingly common in eukaryotic genomes. Recent bioinformatic and ribosomal footprinting studies have identified several thousand sORFs with coding potential, and several sPEPs have been detected by mass spectrometry. However, their roles in cellular function remain to be determined. In this project you will identify and characterise sPEPs using bioinformatic tools to interrogate large datasets from genomic, transcriptomic and proteomic experiments. You will help to determine the contribution of sPEPs to the human proteome, and provide insights into their roles.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.
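A first pass at finding sORFs is a simple scan for ATG-initiated open reading frames that end at an in-frame stop codon and fall below a length cut-off. The Python sketch below scans only the forward strand of a DNA sequence and uses hypothetical limits of 10 to 100 codons. Real sORF studies may also consider the reverse strand, alternative start codons and supporting evidence such as ribosome footprints or conservation.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq: str, min_codons: int = 10,
               max_codons: int = 100) -> List[Tuple[int, int, int, int]]:
    """Return (start, end, frame, n_codons) for ATG...stop ORFs on the forward strand.

    start/end are 0-based, end-exclusive and include the stop codon; n_codons counts
    the start codon but not the stop, so it equals the encoded peptide length.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i + 3
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the end of the sequence
            n_codons = (j - i) // 3
            if min_codons <= n_codons <= max_codons:
                hits.append((i, j + 3, frame, n_codons))
            i = j + 3  # resume after the stop; nested ATGs share this stop codon
    return sorted(hits)
```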

Reconstruction of ancestral proteins

Contact: Elizabeth Gillam (e.gillam@uq.edu.au), Yosephine Gumulya (y.gumulya@uq.edu.au), or Mikael Boden (m.boden@uq.edu.au).

We are looking for a student interested in sequence analysis for the purpose of reconstructing ancestral proteins, some of which may have existed many millions of years ago. This work can focus on applying current methodologies to generate candidate ancestral sequences, and/or on developing computational methodologies to support the process of generating candidate ancestors or ancestral components. The former angle is short term (a #2 course or part of a longer project course) and suits candidates with an interest in sequence analysis for understanding the evolution of protein function and protein engineering. The latter angle is longer term (semester long) and suits candidates with strong computational skills.

Available all year, for Master of Bioinformatics students; suitable for one semester (#2 to #8).

Transcriptomic and proteomic analysis of cone snail venom

Contact: Richard J. Lewis (r.lewis@uq.edu.au)

This one-year project would analyse the transcriptome of the venom duct of a cone snail species using advanced bioinformatic tools, and would prepare tables and figures to visualise the data. If the study goes well, the student's work should form part of a publication. The following articles provide background to the project.

Dutertre S, Jin AH, Vetter I, Hamilton B, Sunagar K, Lavergne V, Dutertre V, Fry BG, Antunes A, Venter DJ, Alewood PF, Lewis RJ (2014) Evolution of separate predation- and defence-evoked venoms in carnivorous cone snails. Nat Commun 5:3521.

Dutertre S, Jin AH, Kaas Q, Jones A, Alewood PF, Lewis RJ (2013) Deep venomics reveals the mechanism for expanded peptide diversity in cone snail venom. Mol Cell Proteomics 12:312-329.

Methods for data integration and models of transcriptional regulation

Contact: Mikael Bodén (m.boden@uq.edu.au)

Biological data at greater scale give the statistical power needed to distinguish meaningful signals from mere noise or artefacts, i.e. to identify "drivers" and "determinants" of function and structure. To enable the integration of uncertain and incomplete data with biological expertise, my group is developing probabilistic modelling tools and machine learning approaches that work hand in hand. These can provide interpretations of "whole system" data, aimed at understanding the basis of disease and other scientifically relevant phenotypes. Projects are available for students interested in developing algorithms, or in applying them, to understand regulatory events using genome-wide data such as ChIP-seq, ChIA-PET and chromatin state assays.

Available all year, for bioinformatics students with analytical skills; Honours or Masters.

Benchmarking statistical approaches for the meta-analysis of clinical data; application to Sjogren's Syndrome

Contact: Kim-Anh LeCao (k.lecao@uq.edu.au) or Florian Rohart (f.rohart@uq.edu.au).

This project is targeted at biostatistics or bioinformatics students. Its overarching aim is to evaluate and apply multivariate statistical approaches to a large clinical dataset in order to identify signatures of an autoimmune disease, Sjogren's Syndrome. Sjogren's Syndrome results in reduced production of saliva and tears ('dry eyes, dry mouth'). It is an inflammatory condition that affects many systems in the body, leading to pain and fatigue, and may be accompanied by rheumatoid arthritis, lupus and scleroderma [1]. It is caused by dysregulation of the immune system, and some individuals with Sjogren's Syndrome go on to develop lymphoma. The student will integrate experimental data from several Sjogren's datasets and evaluate different statistical approaches that can aid the identification of Sjogren's Syndrome biomarkers. This information will be used to gain a better understanding of the immune processes that lead to the disease. In addition, the student will look for useful classifiers within these datasets that might help clinicians stratify patients and predict the symptomatic course of the disease.

One of the major challenges facing modern genetic studies is the integration of data from different sources. Combining datasets is desirable for several reasons: it increases statistical power through larger sample sizes, it tests the robustness and reproducibility of findings across unrelated datasets, and it can reveal underlying patterns that may only be detectable in merged data. However, combining studies in a meta-analysis raises several potential confounding issues: variation driven by the data source (laboratory or technical operator), biological variation (such as genetic background, poorly defined phenotypic criteria or environmental factors), and variation driven by instrumentation (platform technology, file format, linear range of data). Normalization methods have been developed to remove unwanted technical variation that may confound downstream analyses. The student will run a head-to-head evaluation of commonly used normalization methods and benchmark them on a series of related clinical datasets. This supports the main aim of the project: to mine the combined data for biological signatures of disease severity.

The project will give the student experience in handling different file formats, in running several common normalization approaches, including quantile normalization [2], ComBat [3] and YuGene [4], and in applying classification methods, including machine learning approaches (support vector machines (SVM) [5] and random forests [6]) and multivariate approaches (sPLS-DA [7]).
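As a rough illustration of what quantile normalization [2] does, the sketch below forces every sample (column) to share the same distribution: each column is sorted, the mean of the k-th smallest values across columns is computed, and every value is replaced by the mean for its rank within its own column. This is a simplified sketch, not the implementation the project would use (in R the student would rely on established packages); the function name is hypothetical, the matrix is assumed to hold genes or probes in rows and samples in columns, and ties are broken by position rather than averaged.

```python
from typing import List

Matrix = List[List[float]]  # rows = genes/probes, columns = samples


def quantile_normalize(data: Matrix) -> Matrix:
    """Hypothetical helper: quantile-normalize the columns of a rectangular matrix.

    Simplification: tied values within a column receive different rank means
    (ties are broken by row order instead of being averaged).
    """
    n_rows = len(data)
    n_cols = len(data[0])
    # For each column, the row indices ordered from smallest to largest value.
    orders = [sorted(range(n_rows), key=lambda r: data[r][c]) for c in range(n_cols)]
    # Mean of the k-th smallest value across all columns (the reference distribution).
    rank_means = [
        sum(data[orders[c][k]][c] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    # Replace each value with the reference value for its rank in its own column.
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(orders[c]):
            result[r][c] = rank_means[k]
    return result
```

After normalization every column contains exactly the same set of values, only arranged differently; this is what removes distributional differences between samples, and also why the method assumes most genes are not differentially expressed between them.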

This project requires a self-motivated and enthusiastic applicant with a sound understanding of descriptive statistics and familiarity with the R programming environment. The project forms part of a wider collaboration between researchers at the University of Queensland, Australia, and two UK institutions: the University of Glasgow and Newcastle University. The clinical lead on the project is Professor Fai Ng, Newcastle University, and the outcomes of this work will go back to Professor Ng for evaluation within the Sjogren's project. The student will be supervised by Dr Kim-Anh LeCao (TRI, UQ) and Dr Florian Rohart (AIBN, UQ), in the laboratory of Associate Professor Christine Wells (AIBN, UQ and University of Glasgow, UK).

[1] Ng, W.F. (2012) Primary Sjogren's Syndrome Research and Therapy: Has a New Dawn Arrived? Current Pharmaceutical Biotechnology, 13(10), 1987-1988. Editorial (see also the accompanying articles in this issue).
[2] Bolstad, B.M., Irizarry, R.A., Astrand, M. and Speed, T.P. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185-193.
[3] Johnson, W.E., Li, C. and Rabinovic, A. (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8, 118-127.
[4] Le Cao, K-A., Rohart, F., McHugh, L., Korn, O. and Wells, C-A. (2014) YuGene: a simple approach to scale gene expression data derived from different platforms for integrated analyses. Genomics, 103(4), 239-251.
[5] Cortes, C. and Vapnik, V. (1995) Support-vector networks. Machine Learning, 20(3), 273-297.
[6] Breiman, L. (2001) Random Forests. Machine Learning, 45(1), 5-32.
[7] Le Cao, K-A., Boitard, S. and Besse, Ph. (2011) Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems. BMC Bioinformatics, 12, 253.

Available all year, for Master of Bioinformatics students; suitable for one semester, full-time.

Identification and functional analysis of novel transcripts in ankylosing spondylitis

Contact: Dr Gethin Thomas (gethin.thomas@uq.edu.au), Human Genetics Group, UQ Diamantina Institute

Ankylosing spondylitis (AS) is a common inflammatory arthritis affecting over 22,000 Australians. It causes pain and stiffness, predominantly of the spine, and inexorable, progressive fusion (ankylosis) of the affected joints, which no current treatment can prevent. AS is strongly heritable, and our group has led several genome-wide association studies (GWAS) that have identified 28 independent loci associated with the disease. Although some associated loci lie in coding regions and induce functional changes in the encoded proteins, nearly all complex-disease GWAS conducted to date indicate that most associated loci lie in introns or intergenic regions. In AS, 19 of the 28 independent loci are intergenic. It is therefore likely that some, if not most, of these loci act by altering transcription in these regions, either of the proposed candidate genes or of previously unidentified genes and transcripts.

We have adopted two strategies to fully characterise the transcriptional activity at these intergenic loci in peripheral blood mononuclear cells (PBMCs). To identify very rare transcripts (as we have already demonstrated for one of the loci), we have undertaken a “CaptureSeq” study, which uses very deep RNA sequencing to achieve an effective read depth equivalent to 10 billion reads. We have targeted this approach to 10 intergenic loci in 5 AS patients and 5 healthy controls. To complement it, we have also performed a large-scale RNA-seq study of 70 AS patients and 80 healthy controls, sequenced to a depth of 56 million reads, to identify and quantify transcripts across the whole genome. This is a world-first resource in AS, and we are now in a position to mine these data to further elucidate the molecular changes underlying the disease processes in AS.
We will be able to define the full transcriptome for AS, incorporating known genes and their splice variants, as well as a novel transcriptome cataloguing ncRNA expression in AS. We will also combine these data with genotyping data to undertake expression quantitative trait locus (eQTL) studies. Data analyses will involve validation of candidate transcripts using large-scale genomic resources such as ENCODE and FANTOM5, together with lab-based approaches. We have projects available to:
· Identify novel ncRNAs underlying GWAS hits
· Analyse the known AS transcriptome
· Analyse the novel AS transcriptome
· Analyse splicing in AS
· Undertake genome-wide eQTL analyses
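At its simplest, an eQTL test asks whether a gene's expression changes with the number of copies of an allele an individual carries. The sketch below fits an ordinary least-squares line of expression on genotype dosage for a single SNP-gene pair and reports the slope and Pearson correlation. It is an illustrative assumption, not the group's pipeline: the function name is hypothetical, genotypes are assumed to be coded additively as 0, 1 or 2 allele copies, and a genome-wide analysis would also adjust for covariates, compute p-values and correct for multiple testing, typically with dedicated eQTL software.

```python
from math import sqrt
from typing import Sequence, Tuple


def eqtl_association(genotypes: Sequence[int], expression: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical helper: least-squares fit of expression ~ genotype dosage.

    genotypes:  allele dosage per individual (assumed additive coding 0/1/2).
    expression: normalised expression of one gene in the same individuals.
    Returns (slope, intercept, pearson_r); no covariates, no p-value.
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need matched genotype/expression vectors with n >= 3")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    # Centred cross-product and sums of squares.
    s_ge = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_ee = sum((e - mean_e) ** 2 for e in expression)
    if s_gg == 0 or s_ee == 0:
        raise ValueError("genotype or expression is constant; association undefined")
    slope = s_ge / s_gg  # estimated change in expression per additional allele copy
    intercept = mean_e - slope * mean_g
    r = s_ge / sqrt(s_gg * s_ee)
    return slope, intercept, r
```

The slope is the effect size usually reported for an eQTL, while the sign of the correlation indicates whether the allele is associated with higher or lower expression.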

Available all year, for Master of Bioinformatics or Honours students; suitable for one semester, full-time.

Investigation of vaccine targeting to skin with microprojection arrays

Contact: Dr Stefano Meliga (s.meliga@uq.edu.au)

The Nanopatch is a high-density microprojection array for epidermal and dermal delivery of vaccines. Application to mouse skin has produced immunogenicity comparable to intramuscular injection at less than 1/100 of the dose. However, the mechanisms underlying this potent low-dose response are not yet fully understood. Experimental evidence suggests that enhanced immunogenicity is triggered by precise targeting of vaccine to skin antigen-presenting cells, in conjunction with the generation of controlled levels of cell damage. We offer student projects on the numerical investigation of Nanopatch-mediated vaccine targeting, the generation of inflammation and the triggering of signalling pathways leading to an immune response. The successful candidate will develop and apply mathematical/statistical models to test the mode-of-action hypotheses, and will have the chance to drive the design of the next-generation delivery device. This project is computer-centred and will take place at AIBN. Requirements: applied mathematical and statistical skills, and programming ability (e.g. MATLAB, R). An interest in working in a highly multidisciplinary team is a plus; knowledge of immunology is not required.

Available all year.

open_projects.1484692698.txt.gz · Last modified: 2017/01/18 09:38 by 192.168.223.219