Incorporating genomic variation information
into DNA sequencing data analysis

OPUS 11 scientific project of the National Science Centre

Project description

In the majority of DNA sequencing experiments, the first step of the analysis consists of mapping sequencing reads onto a so-called reference genome, which represents the consensus genomic sequence of the species of interest. Reference genomes are currently available for thousands of species, and much effort is devoted to the analysis of genomic diversity among them. This is especially visible in human genomics, where development is driven by the prospect of applications in personalized medicine. However, current pipelines for sequencing data analysis are unable to use this knowledge to reduce the bias and noise caused by differences between the reference and the actual genomes.

The objective of the current project is to address this problem. We will introduce the concept of a reference multi-genome, which will model multiple variants of particular genomic loci. Furthermore, we will design and implement a toolset incorporating this concept into current sequencing analysis pipelines. The toolset will consist of two components: an efficient algorithm for mapping reads onto a reference multi-genome, and a set of tools adapting the mapping results for further analysis within various standard sequencing data processing pipelines. Finally, we will illustrate the advantages of our approach in a case study: the discovery of DNA double-strand breaks in cancer cells using the BLESS experiment, a break detection method that is extremely precise yet sensitive to mapping errors.
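The idea of mapping against several variants of a locus can be illustrated with a toy sketch. It is not the project's algorithm: it assumes a multi-genome reduced to a reference string plus known single-nucleotide alternatives, and it uses an exhaustive scan rather than an efficient index. The names `mismatches`, `map_read` and `variants` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, Optional, Set, Tuple


def mismatches(read: str, reference: str, start: int,
               variants: Dict[int, Set[str]]) -> int:
    """Count read bases that match neither the reference base
    nor any known alternative allele at the same position."""
    count = 0
    for offset, base in enumerate(read):
        pos = start + offset
        allowed = {reference[pos]} | variants.get(pos, set())
        if base not in allowed:
            count += 1
    return count


def map_read(read: str, reference: str,
             variants: Optional[Dict[int, Set[str]]] = None) -> Tuple[int, int]:
    """Return (best_start, mismatch_count) found by an exhaustive scan.
    Ties are resolved in favour of the leftmost position."""
    variants = variants or {}
    best = (-1, len(read) + 1)
    for start in range(len(reference) - len(read) + 1):
        m = mismatches(read, reference, start, variants)
        if m < best[1]:
            best = (start, m)
    return best


reference = "ACGTACGTTAGC"
variants = {5: {"T"}}   # assumed known C>T variant at 0-based position 5
read = "ATGTT"          # read from a sample carrying the T allele

plain = map_read(read, reference)            # single reference only
multi = map_read(read, reference, variants)  # toy reference multi-genome
print(plain, multi)
```

In this example the read maps to position 4 in both cases, with one mismatch against the plain reference and none once the known variant is taken into account. Spurious mismatches of this kind are the bias that the multi-genome is meant to reduce.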

In summary, our project will provide a complete set of tools for incorporating reference multi-genomes into sequencing data analysis pipelines. Furthermore, we will show that our approach can be advantageous for a wide range of research projects that benefit from DNA sequencing technology, including cancer genomics and personalized medicine.


We are recruiting candidates for a PhD student internship position.

  • scholarship 3000 PLN/month
  • for up to 15 months
  • PhD student (or a candidate qualified for a doctoral school)
  • background in Computer Science, Bioinformatics or related field
  • solid experience in programming
  • interest in developing and implementing efficient algorithms for genomic data
  • send your CV and motivation letter to
  • application deadline: 30.08.2020