Incorporating genomic variation information
into DNA sequencing data analysis

OPUS 11 scientific project of the National Science Centre

Project description

In the majority of DNA sequencing experiments, the first step of the analysis is to map sequencing reads onto a so-called reference genome, which represents the consensus genomic sequence of the species of interest. Reference genomes are currently available for thousands of species, and much effort is devoted to analyzing the genomic diversity within them. This is especially visible in human genomics, where development is driven by the prospect of applications in personalized medicine. However, current sequencing data analysis pipelines are unable to use this knowledge to reduce the bias and noise caused by differences between the reference genome and the genome actually sequenced.

The objective of this project is to address this problem. We will introduce the concept of a reference multi-genome, which models multiple variants of particular genomic loci. Furthermore, we will design and implement tools that incorporate this concept into current sequencing analysis pipelines. The toolset will consist of two components: an efficient algorithm for mapping reads onto a reference multi-genome, and a set of tools that adapt the mapping results for further analysis within various standard sequencing data processing pipelines. Finally, we will illustrate the advantages of our approach in a case study: the discovery of DNA double-strand breaks in cancer cells using the BLESS experiment, a break detection method that is extremely precise yet sensitive to mapping errors.
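To make the idea of a reference multi-genome more concrete, here is a minimal toy sketch in Python. It is an illustration only, not the algorithm developed in the project: it assumes the multi-genome can be represented as a set of allowed bases per position (single-nucleotide variants only), and it uses a naive exact-match scan instead of an efficient index. The names build_multi_genome and map_read, as well as the example sequence and variant, are hypothetical.

```python
from typing import Dict, List, Set


def build_multi_genome(reference: str, snvs: Dict[int, Set[str]]) -> List[Set[str]]:
    """Return, for every 0-based position, the set of bases allowed there.

    Assumption: only single-nucleotide variants are modelled; insertions,
    deletions and haplotype structure are ignored in this toy representation.
    """
    allowed = [{base} for base in reference]
    for pos, alts in snvs.items():
        allowed[pos] |= alts
    return allowed


def map_read(read: str, multi_genome: List[Set[str]]) -> List[int]:
    """Return all 0-based start positions where the read matches exactly,
    accepting either the reference base or any known variant at each position.

    This is a naive O(genome length * read length) scan, for illustration only.
    """
    hits = []
    for start in range(len(multi_genome) - len(read) + 1):
        if all(base in multi_genome[start + i] for i, base in enumerate(read)):
            hits.append(start)
    return hits


# Hypothetical example data: a short reference and one known variant.
reference = "ACGTACGTTA"
known_snvs = {5: {"T"}}  # position 5: reference base C, known alternative T

plain = build_multi_genome(reference, {})          # ordinary reference genome
multi = build_multi_genome(reference, known_snvs)  # reference multi-genome

read = "TATG"  # carries the alternative allele T at reference position 5
hits_plain = map_read(read, plain)
hits_multi = map_read(read, multi)
```

In this example the read carrying the known variant has no exact match on the plain reference, but it maps to position 3 once the variant is part of the multi-genome. Note that this toy representation accepts any combination of alleles across positions; a realistic model would also have to decide which combinations of variants are allowed to co-occur.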

In summary, our project will provide a complete set of tools for incorporating reference multi-genomes into sequencing data analysis pipelines. We will also show that our approach can benefit a wide range of research projects that rely on DNA sequencing technology, including cancer genomics and personalized medicine.


We are recruiting candidates for two internship positions for master's students (1000 PLN/month, up to 18 months).

Requirements

  • reasonable experience in programming
  • background in Computer Science, Bioinformatics or related field
  • interest in DNA sequencing data analysis and/or developing and implementing efficient algorithms for genomic data

How to apply?

  • send your CV and motivation letter to
  • application deadline is 29.01.2017.