Incorporating genomic variation information
into DNA sequencing data analysis


OPUS 11 scientific project of National Science Centre

Project description

In the majority of DNA sequencing experiments, the first step of analysis consists of mapping sequencing reads onto a so-called reference genome, which represents the consensus genomic sequence of the species of interest. Reference genomes are currently available for thousands of species, and much effort is devoted to analyzing genomic diversity within these species. This is especially visible in human genomics, where development is driven by prospective applications in personalized medicine. However, current sequencing data analysis pipelines are unable to use this knowledge to reduce the bias and noise caused by differences between the reference genome and the genome actually sequenced.

The objective of the current project is to address this problem. We will introduce the concept of a reference multi-genome, which will model multiple variants of particular genomic loci. Furthermore, we will design and implement tools that incorporate this concept into current sequencing analysis pipelines. The toolset will consist of two components: an efficient algorithm for mapping reads onto a reference multi-genome, and a set of tools that adapt the mapping results for further analysis within various standard sequencing data processing pipelines. Finally, we will illustrate the advantages of our approach in a case study: the discovery of DNA double-strand breaks in cancer cells using the BLESS experiment, a break detection method that is extremely precise yet sensitive to mapping errors.
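To make the idea of a reference multi-genome concrete, here is a deliberately naive sketch in Python. It is not the project's algorithm or data structure: it assumes a multi-genome can be approximated by a single reference string plus a table of alternative bases at selected 0-based positions (single-nucleotide variants only), and it maps reads by brute-force exact matching. The names `allowed_bases` and `map_read` and the example sequences are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set


def allowed_bases(reference: str, variants: Dict[int, Set[str]], pos: int) -> Set[str]:
    """Return every base accepted at `pos`: the reference base plus known alternatives."""
    return {reference[pos]} | variants.get(pos, set())


def map_read(reference: str, variants: Dict[int, Set[str]], read: str) -> List[int]:
    """Return all start positions where `read` matches some combination of alleles.

    Brute-force scan, O(len(reference) * len(read)); for illustration only.
    """
    hits = []
    for start in range(len(reference) - len(read) + 1):
        if all(
            read[i] in allowed_bases(reference, variants, start + i)
            for i in range(len(read))
        ):
            hits.append(start)
    return hits


# Hypothetical toy data: position 2 carries either G (reference) or T (variant).
reference = "ACGTACGT"
variants = {2: {"T"}}
read = "CTTA"  # carries the T allele at position 2

with_variants = map_read(reference, variants, read)  # variant-aware mapping
without_variants = map_read(reference, {}, read)     # plain reference only
print(with_variants, without_variants)
```

In this toy case the read carrying the alternative allele maps only when the variant table is supplied; against the plain reference it finds no exact match. A practical tool would also need to handle insertions, deletions, sequencing errors, and indexing for speed, none of which is attempted here.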

In summary, our project will provide a complete set of tools for incorporating reference multi-genomes into sequencing data analysis pipelines. Furthermore, we will show that our approach can benefit a wide range of research projects that rely on DNA sequencing technology, including cancer genomics and personalized medicine.

Positions

We are recruiting candidates for one PhD student position and one master student internship position.

PhD student position

What?
  • scholarship 3000 PLN/month
  • from October-November 2017
  • for up to 36 months
Who?
  • PhD student
  • background in Computer Science, Bioinformatics or related field
  • solid experience in C/C++ programming
  • interest in developing and implementing efficient algorithms for genomic data
How?
  • send your CV and motivation letter to dojer@mimuw.edu.pl
  • application deadline: 30.09.2017.

Master student position

What?
  • scholarship 1000 PLN/month
  • from October-November 2017
  • for up to 24 months
Who?
  • master student
  • background in Computer Science, Bioinformatics or related field
  • reasonable experience in programming
  • interest in DNA sequencing data analysis and/or developing tools for genomic data analysis
How?
  • send your CV and motivation letter to dojer@mimuw.edu.pl
  • application deadline: 30.09.2017.

Contact