NCN Preludium

Bayesian analysis of bladder cancer subtypes based on high-throughput data

research project objectives

Technological progress, present in almost every field of science, goes hand in hand with the collection of huge amounts of data. A key example is molecular biology, which, thanks to automated diagnostics, can describe a given patient with gigabytes of data from high-throughput technologies. Examples of such technologies include microarray methods for measuring gene expression and spectrometric determination of metabolite concentrations in cells. Whatever the technology, the resulting data sets carry important and useful information that can and should be retrieved and used in diagnostics, personalised medicine or drug design.

On the other hand, the ongoing joint work of molecular biologists, biophysicists and biochemists results in a systematic description of the processes that occur in human organisms and opens up possibilities for highly specific inference. Signalling pathways are a good example of such a mechanistic description: their formal representation allows them to be studied in depth to better understand the complex system of a human cell.

This project aimed to derive an integrative model that analyses series of high-throughput experimental data in two interrelated contexts:

  • locally - the independent analysis of individual signalling pathways, which makes it possible to select the most significantly (in a statistical sense) perturbed signalling cascades in inter-cellular communication;
  • globally - the analysis of the full metabolic network, which describes all possible sources of phenotypic perturbations (with respect to the homeostatic profile) in the given experimental data.

project results

MPH - Molecular Process Heterogeneity

During the project, two workflows were described and developed. The first one, the MPH method, allows for the analysis of the activity of molecular processes on the basis of transcriptomic data in homogeneous cell populations.

The method carries out two phases of non-negative matrix factorization, resulting in an estimate of the functional composition of a homogeneous population of cells, along with potential marker genes and activity patterns of specific molecular processes. In the first step, we adapt the Bayesian DSection method, previously proposed for transcriptome analysis of heterogeneous populations. Using a priori knowledge of the expected proportions of functional sub-populations, the first step estimates the transcriptomic profiles that characterize the activity of the expected molecular processes. The obtained profiles are then used to extract marker genes of the individual processes, which serve as input to the second step: the iterative ssKL method, based on minimizing the Kullback-Leibler divergence. The second step yields the percentage of cells engaged in each molecular process and, for each process, indicates new potential marker genes.
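The idea behind the second step can be illustrated with a minimal sketch. It is not the ssKL implementation used in the project: it only shows the standard multiplicative update that minimizes the Kullback-Leibler divergence when the marker profiles are held fixed and only the process proportions are estimated. The profile matrix, the sample, and all function names below are hypothetical toy values.

```python
import math

def kl_divergence(v, w, h):
    """Generalized KL divergence D(v || W h) for one sample (v must be positive)."""
    total = 0.0
    for i, row in enumerate(w):
        approx = sum(row[k] * h[k] for k in range(len(h)))
        total += v[i] * math.log(v[i] / approx) - v[i] + approx
    return total

def estimate_proportions(v, w, n_iter=200):
    """Estimate process proportions for one sample with the profiles W fixed.

    Applies the multiplicative KL update
        h_k <- h_k * (sum_i W_ik * v_i / (W h)_i) / (sum_i W_ik)
    and rescales the result to sum to one.
    """
    n_genes, n_proc = len(w), len(w[0])
    h = [1.0 / n_proc] * n_proc  # uniform starting point
    col_sums = [sum(row[k] for row in w) for k in range(n_proc)]
    for _ in range(n_iter):
        approx = [sum(row[k] * h[k] for k in range(n_proc)) for row in w]
        h = [
            h[k] * sum(w[i][k] * v[i] / approx[i] for i in range(n_genes)) / col_sums[k]
            for k in range(n_proc)
        ]
    total = sum(h)
    return [x / total for x in h]

# Hypothetical toy data: 5 marker genes (rows) x 2 molecular processes (columns).
MARKER_PROFILES = [
    [10.0, 1.0],
    [8.0, 1.0],
    [1.0, 9.0],
    [1.0, 7.0],
    [5.0, 5.0],
]
TRUE_PROPORTIONS = [0.3, 0.7]
# Noise-free sample generated as a mixture of the two process profiles.
sample = [
    sum(row[k] * TRUE_PROPORTIONS[k] for k in range(2)) for row in MARKER_PROFILES
]

proportions = estimate_proportions(sample, MARKER_PROFILES)
print([round(p, 3) for p in proportions])
```

On noise-free toy data the estimate approaches the mixing proportions used to generate the sample; with real expression data the fit is only approximate and depends on the quality of the marker genes.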

extended FBA analysis

The second proposed method analyzes transcriptomic data in the context of the metabolic network of a human cell, thereby taking metabolomic knowledge into account. It aims to determine the metabolic activity profiles of patients diagnosed with cancer and then to identify the key metabolic reactions that could serve as biomarkers of the relevant molecular subtypes of a given tumor.

The extended FBA (Flux Balance Analysis) method determines the activity of metabolic reactions by integrating transcriptomic knowledge with the metabolic model of the human cell, with a correction for the heterogeneous distribution of enzymes in the metabolic network.

The first step in this method is the preparation of the patient's Personalized Genome-Scale Metabolic Model. We solve this task by integrating the individual transcriptomic profile with RECON 2.2, the general model of the human cell metabolic network. In general, the design of the model is based on determining the initial activity state of the enzymatic reactions in the network. The decision on the status of a reaction is based on the activity of the specific genes or groups of genes that guarantee the production of the enzymes necessary for that reaction to take place.
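A minimal sketch of how such a reaction status could be decided is given below. It assumes one common convention for gene-protein-reaction rules (the minimum over genes joined by "and", the maximum over alternatives joined by "or") and a single expression threshold; the project may use a different rule. The gene names, reaction identifiers, expression values and the threshold are all hypothetical.

```python
def gpr_score(rule, expression):
    """Evaluate a gene-protein-reaction rule against an expression profile.

    A rule is either a gene name (str) or a tuple ("and" | "or", sub_rule, ...).
    Assumed convention: "and" takes the minimum (all subunits are needed),
    "or" takes the maximum (any isoenzyme suffices).
    """
    if isinstance(rule, str):
        return expression.get(rule, 0.0)  # unmeasured genes count as not expressed
    operator, *operands = rule
    scores = [gpr_score(operand, expression) for operand in operands]
    if operator == "and":
        return min(scores)
    if operator == "or":
        return max(scores)
    raise ValueError(f"unknown operator: {operator!r}")

def initial_reaction_states(gpr_rules, expression, threshold):
    """Mark a reaction as active when its rule score reaches the threshold."""
    return {
        reaction: gpr_score(rule, expression) >= threshold
        for reaction, rule in gpr_rules.items()
    }

# Hypothetical rules and expression profile for one patient.
GPR_RULES = {
    "R1": "G1",
    "R2": ("and", "G1", "G2"),
    "R3": ("or", "G2", ("and", "G3", "G4")),
}
EXPRESSION = {"G1": 8.2, "G2": 1.5, "G3": 6.0, "G4": 7.1}

states = initial_reaction_states(GPR_RULES, EXPRESSION, threshold=5.0)
print(states)
```

Here R2 is switched off because one of its required genes is weakly expressed, while R3 stays active through its alternative enzyme complex.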

Then, on the basis of a given personalized model, a linear programming problem is formulated that maximizes the agreement between the reaction activities, constrained by the structure of the metabolic network, and the observed gene expression profiles. The problem formulated at this stage belongs to the family of FBA (Flux Balance Analysis) problems, which are commonly used to determine fluxes in networks. The solution to the linear programming problem is the patient's metabolic landscape, which describes the optimal distribution of the activity of individual metabolic reactions.
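For orientation, the generic form of an FBA problem is shown below. This is the textbook formulation, not the exact objective used in the project: here S is the stoichiometric matrix of the metabolic network, v the vector of reaction fluxes, l and u the lower and upper flux bounds, and c a weight vector. The assumption is that, in the extended method, c and the bounds are derived from the personalized model so that the objective rewards agreement with the expression profile; the precise weighting is not specified here.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& l_j \le v_j \le u_j, \qquad j = 1, \dots, n.
\end{aligned}
```

The constraint S v = 0 expresses the steady-state assumption: every metabolite is produced and consumed at the same rate, which is how the structure of the network enters the problem.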

The last step in the construction of our method was the introduction of a correction that accounts for the heterogeneous and incomplete knowledge of human metabolism. Enzymes that coordinate (or co-coordinate) the activity of many reactions give rise to artifacts related to the structure of the metabolic network when the metabolic profiles of patients are analyzed. Hence, the last step of the method corrects the estimated activity of each reaction according to the cell compartment the reaction belongs to and the frequency of its coordinating enzymes.
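The exact form of this correction is not given here, so the following sketch only illustrates the bookkeeping it relies on: counting, within each compartment, how many reactions every enzyme coordinates. The down-weighting applied afterwards (dividing by the highest frequency among a reaction's enzymes) is a purely hypothetical placeholder, as are the enzyme names, reaction identifiers, compartments and activity values.

```python
from collections import Counter

def enzyme_frequencies(reaction_enzymes, reaction_compartment):
    """Count, per compartment, how many reactions each enzyme takes part in."""
    counts = Counter()
    for reaction, enzymes in reaction_enzymes.items():
        compartment = reaction_compartment[reaction]
        for enzyme in set(enzymes):
            counts[(compartment, enzyme)] += 1
    return counts

def corrected_activity(activity, reaction_enzymes, reaction_compartment):
    """Down-weight reactions coordinated by frequently used enzymes.

    Hypothetical rule: divide the estimated activity by the largest
    within-compartment frequency among the reaction's enzymes.
    Reactions without annotated enzymes are left unchanged.
    """
    frequencies = enzyme_frequencies(reaction_enzymes, reaction_compartment)
    corrected = {}
    for reaction, value in activity.items():
        compartment = reaction_compartment[reaction]
        enzymes = reaction_enzymes.get(reaction, [])
        weight = max((frequencies[(compartment, e)] for e in enzymes), default=1)
        corrected[reaction] = value / weight
    return corrected

# Hypothetical toy network: "c" = cytosol, "m" = mitochondrion.
REACTION_ENZYMES = {"R1": ["E1"], "R2": ["E1", "E2"], "R3": ["E1"], "R4": []}
REACTION_COMPARTMENT = {"R1": "c", "R2": "c", "R3": "m", "R4": "c"}
ACTIVITY = {"R1": 4.0, "R2": 6.0, "R3": 2.0, "R4": 1.0}

corrected = corrected_activity(ACTIVITY, REACTION_ENZYMES, REACTION_COMPARTMENT)
print(corrected)
```

In this toy network enzyme E1 coordinates two cytosolic reactions, so R1 and R2 are halved, while R3 (the only mitochondrial reaction using E1) and the unannotated R4 keep their estimated activity.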

Funding

The project was funded by the Polish National Science Centre, grant no. 2016/21/N/ST6/01507.

Related publications

  1. Inferring Molecular Processes Heterogeneity from Transcriptional Data
    Gogolewski, K., Wronowska, W., Lech, A., Lesyng, B., and Gambin, A.
    Biomed Res Int 2017
  2. Renal cell carcinoma classification: a case study of pitfalls associated with metabolic landscape analysis
    Gogolewski, K., Kostecki, M., and Gambin, A.
    In 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018