Research interests

We are fascinated by how data science accelerates biomedical science, not only as a tool but also as a way of thinking, especially in this era of exponentially growing data.

In our lab, we aim to formulate a range of biomedical challenges as data science problems by designing powerful statistical models and developing efficient inference algorithms, including Markov chain Monte Carlo and variational inference. Our work is mostly problem-driven and methodology-focused, and we benefit greatly from close collaborations with experimental labs. We currently focus on the following themes.

You can also browse our ongoing reading in our paper monitoring archive.

Machine learning methods and their biomedical applications

Our recent work covers both supervised and unsupervised learning, including predicting RNA splicing efficiency from genomic sequences and learning effective embeddings for robust RNA velocity analysis. We are extending further into deep learning frameworks, often in conjunction with Bayesian graphical models.

Statistical modelling of single-cell genomics and spatial transcriptomics

In the past few years, we have developed a set of statistical methods for analysing single-cell genomic/transcriptomic data, for example, BRIE (Bayesian Regression for Isoform Estimate) for RNA splicing quantification and phenotype detection with efficient variational inference. Currently, we are actively investigating the dynamics of single-cell transcriptomes and inferring the underlying cell differentiation, particularly by leveraging RNA velocity, a recent technique based on intrinsic RNA splicing processes. The recent accumulation of high-quality spatial transcriptomic data is also promising for dissecting complex systems, and we are expanding our modelling of cellular transitions through both time and space.
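As a rough illustration of the idea behind RNA velocity (not the model used in our own methods), here is a minimal sketch for a single gene. It assumes the simple steady-state formulation, in which spliced RNA changes as ds/dt = beta * u - gamma * s, and gamma / beta is estimated by a least-squares fit of unspliced against spliced counts. The function name, the default beta = 1 and the toy counts are all hypothetical.

```python
import numpy as np

def steady_state_velocity(unspliced, spliced, beta=1.0):
    """Toy steady-state RNA velocity for one gene across cells.

    Assumes ds/dt = beta * u - gamma * s, with gamma / beta estimated
    as the slope of a least-squares line through the origin (u vs. s).
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    denom = float(s @ s)
    if denom == 0.0:
        raise ValueError("spliced counts are all zero; slope is undefined")
    # Slope of the regression through the origin: u ~ (gamma / beta) * s
    gamma = beta * float(u @ s) / denom
    # Positive velocity: more unspliced RNA than expected at steady state
    velocity = beta * u - gamma * s
    return gamma, velocity

# Hypothetical counts for four cells lying exactly on the steady-state line
gamma, velocity = steady_state_velocity([0.5, 1.0, 1.5, 2.0], [1.0, 2.0, 3.0, 4.0])
```

Cells above the fitted line (positive velocity) are interpreted as up-regulating the gene, and cells below it as down-regulating it. Practical methods fit many genes jointly and often replace the steady-state assumption with a dynamical model.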

Integrative analysis of cancer mutations and clonal evolution

The complexity of cancer tissues is partly caused by heterogeneity at multiple molecular layers, e.g., DNA and gene expression, so integrative analysis is often crucial for understanding clonal mutations. We recently developed a suite of tools for accurate analysis at different steps, including cellsnp-lite for efficient genotyping of a vast number of cells, MQuad for accurately identifying clonally informative variants in the mitochondrial genome, and Cardelino / Vireo for effectively clustering cells by their clonal mutations. Along this path, we are developing further methods to infer clonal structure and mutation evolution more accurately, and thereby decipher the impact of somatic mutations on transcriptome phenotypes, by integrating information at multiple levels:

  • multiple assays (e.g., scRNA-seq, single-cell DNA-seq, scATAC-seq, bulk exome-seq)
  • multiple mutation types (e.g., SNV, CNV and mtSNV)