Systems Biology

The following scheme illustrates the four main tasks involved in the analysis and control of biological systems. Our group actively conducts research in all of these areas.

[Scheme: Physical Modeling, Experiment Design, System Identification, Predictions and Control]

### Physical modeling of biochemical reaction networks

Chemical reaction networks consist of a set of distinct chemical species and a set of reactions that change the numbers of molecules present in the system. Traditionally, such systems have mostly been modeled deterministically using the so-called reaction rate equations. These are ordinary differential equations that describe the time evolution of the concentrations of the chemical species under the assumption that molecules are present in large numbers. Inside cells, however, reaction volumes are very small and some species may be present in low abundance. Under these conditions, stochastic models have proved necessary to explain phenomena that deterministic models do not capture. A famous example is the bistable switch network shown in Figure 1: switching between the two stable equilibria of the deterministic model can be explained only with a stochastic model.

Figure 1: Example of a bistable switch. Left: Genes a and b produce proteins A and B, respectively. Protein A inhibits the expression of gene b. Right: Evolution of the probability density (vertical axis) of the state (x1=A, x2=B, horizontal plane) of the bistable switch from a fixed initial value (simulation).
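To make the stochastic description concrete, the following is a minimal sketch of Gillespie's stochastic simulation algorithm applied to a toy switch. The network (two proteins that repress each other's production through Hill-type propensities), the function name `simulate_toggle_switch`, and all parameter values are assumptions chosen for illustration; they are not the model behind Figure 1.

```python
import numpy as np

def simulate_toggle_switch(t_end, x0=(0, 0), alpha=20.0, K=5.0, n=2.0,
                           gamma=1.0, seed=0):
    """Simulate one trajectory of a hypothetical two-protein toggle switch.

    Reactions (all rates are illustrative assumptions):
      0 -> A  with propensity alpha / (1 + (B/K)^n)   (repressed by B)
      A -> 0  with propensity gamma * A
      0 -> B  with propensity alpha / (1 + (A/K)^n)   (repressed by A)
      B -> 0  with propensity gamma * B
    """
    rng = np.random.default_rng(seed)
    # State-change vector of each reaction (rows) for the state (A, B).
    nu = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
    t = 0.0
    x = np.array(x0, dtype=int)
    times, states = [t], [x.copy()]
    while True:
        a_count, b_count = x
        propensities = np.array([
            alpha / (1.0 + (b_count / K) ** n),
            gamma * a_count,
            alpha / (1.0 + (a_count / K) ** n),
            gamma * b_count,
        ])
        total = propensities.sum()  # always > 0: production never stops
        # Waiting time to the next reaction is exponential with rate `total`.
        t += rng.exponential(1.0 / total)
        if t > t_end:
            break
        # Pick which reaction fires, proportionally to its propensity.
        k = rng.choice(4, p=propensities / total)
        x = x + nu[k]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

times, states = simulate_toggle_switch(t_end=50.0)
```

Repeating the simulation with different seeds and collecting the states at a fixed time gives an empirical estimate of the kind of probability density shown on the right of Figure 1.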

Under the assumptions that the system is well stirred and in thermal equilibrium and that the reaction volume is constant, it can be shown that the numbers of molecules present in the biochemical system follow a continuous-time Markov chain (CTMC) whose dimension equals the number of distinct chemical species. The time evolution of the probability distribution of this stochastic process is governed by the chemical master equation (CME).
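For reference, the CME is commonly written in the following form. The notation is an assumption of this sketch and is not taken from the text above: $x$ is the vector of molecule counts, $p(x,t)$ the probability of being in state $x$ at time $t$, $K$ the number of reactions, $a_k(x)$ the propensity of reaction $k$, and $\nu_k$ the change in $x$ caused by one firing of reaction $k$.

$$
\frac{\mathrm{d}}{\mathrm{d}t}\, p(x,t) \;=\; \sum_{k=1}^{K} \Big[ a_k(x-\nu_k)\, p(x-\nu_k,t) \;-\; a_k(x)\, p(x,t) \Big]
$$

The first term collects the probability flowing into state $x$ from the states $x-\nu_k$, the second the probability flowing out of $x$. There is one such equation for every reachable state, which is why the CME can rarely be solved exactly.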

A question that has been much discussed lately is how much of the variability observed in biological measurements can really be attributed to random molecular fluctuations, and how much stems from other factors such as differences between cells that are already present at the start of an experiment or different local micro-environments within the population. These sources of variability are usually termed intrinsic and extrinsic noise: intrinsic noise is the randomness inherent in the biochemical process of interest, whereas extrinsic noise is the variability that stems from factors external to the studied process. Biological experiments designed to separate intrinsic and extrinsic noise have shown that both sources of variability can play an important role. Often, extrinsic noise dominates but, as predicted by theory, intrinsic noise becomes more important when molecule counts are small. Continuous-time Markov chains and the chemical master equation offer a framework for modeling the intrinsic noise of biochemical processes. Our group works on constructing and analyzing extensions of CTMC models that also include extrinsic noise sources, mainly in the form of reaction rates that are assumed to vary randomly between the cells of the population. The major difficulty in the analysis of such models is that, in most cases, it is not possible to compute the time evolution of the whole probability distribution of the corresponding stochastic process.

Figure 2: Illustration of the different noise sources in biochemical reaction networks. Variability in measured population distributions stems, on the one hand, from random fluctuations inside single cells and, on the other hand, from small differences between genetically identical cells.
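One common way to formalize this split, given here as an illustration rather than as the specific decomposition used by the group, is the law of total variance. The symbols are assumptions of this sketch: $X$ denotes the measured molecule count of a cell and $Z$ the extrinsic factors of that cell, for example its cell-specific reaction rates.

$$
\operatorname{Var}[X] \;=\; \underbrace{\mathbb{E}\big[\operatorname{Var}[X \mid Z]\big]}_{\text{intrinsic}} \;+\; \underbrace{\operatorname{Var}\big[\mathbb{E}[X \mid Z]\big]}_{\text{extrinsic}}
$$

The first term is the average variability that remains when the extrinsic factors are held fixed; the second is the variability of the conditional mean across cells with different extrinsic factors.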

### Experiment design

Quantitative studies of biological systems with mathematical models depend strongly on an appropriate characterization of the underlying system, that is, on good knowledge of the underlying mechanisms and kinetic parameters. While extracting such knowledge from averaged cell population data is common practice, it has only recently been realized that the molecular noise observed in single cell measurements may also be a rich source of information about the model parameters. Mathematically, one way to quantify the information provided by single cell experiments is to determine the precision with which the model parameters can at best be estimated in a given experimental setup, that is, to determine the variances of the best possible unbiased estimators of the model parameters. By the Cramér-Rao inequality, lower bounds on these variances can be computed from the Fisher information matrix (FIM). Unfortunately, computing the FIM to decide whether a single cell experiment might be beneficial requires the solution of the CME and is usually very difficult for CTMC models. Our group is working on methods that compute the FIM approximately. One specific mathematical trick that we use is to move from a full description of the stochastic process to a computationally more tractable framework in which only low-order moments of the process are captured. These moments can then be used to compute the FIM for statistics of the measured single cell data and, hence, to determine lower bounds on the total information provided by the experiment. See for example [2], [3], [4] and [5].
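For reference, the Cramér-Rao inequality and the FIM can be stated as follows. The notation is an assumption of this sketch: $\theta$ is the parameter vector, $Y$ the measured data with density $p(y;\theta)$, and $\hat\theta$ any unbiased estimator of $\theta$.

$$
\operatorname{Cov}_\theta\big(\hat\theta\big) \;\succeq\; I(\theta)^{-1},
\qquad
I(\theta) \;=\; \mathbb{E}_\theta\!\left[ \nabla_\theta \log p(Y;\theta)\, \nabla_\theta \log p(Y;\theta)^{\top} \right]
$$

When a statistic of the data is treated as approximately Gaussian with mean $\mu(\theta)$ and covariance $\Sigma(\theta)$, its FIM depends only on these two moments. This is the standard Gaussian formula and not necessarily the exact construction used in [2]-[5]:

$$
I_{ij}(\theta) \;=\; \frac{\partial \mu^{\top}}{\partial \theta_i}\, \Sigma^{-1}\, \frac{\partial \mu}{\partial \theta_j}
\;+\; \frac{1}{2} \operatorname{tr}\!\left( \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_i}\, \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_j} \right)
$$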

As stated before, the FIM can be used to decide whether or not a single cell experiment should be performed. Possibly even more important, however, is that it can generally be used to compare different experiments and, hence, to search for the most informative one. This task, known as optimal experimental design, requires solving an optimization problem in which a functional (usually the determinant) of the FIM is maximized over the set of possible experiments. Our group is developing algorithms that can be used to perform optimal experimental design in realistic experimental settings and, together with our collaborators, we are also applying our methods in the wet lab. See for example [6].

Figure 5: Confidence regions for the model parameters obtained from an optimal (red) and an unplanned experiment (blue).
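The optimization over experiments can be illustrated with a deliberately small example. The sketch below assumes a hypothetical deterministic model $y(t) = A e^{-kt}$ observed with additive Gaussian noise of standard deviation `sigma`, and searches a grid of candidate measurement times for the pair that maximizes the determinant of the FIM (a D-optimal design). The model, the function names, and the nominal parameter values are illustrative assumptions, not the group's algorithms from [6].

```python
import itertools
import numpy as np

def sensitivities(t, amplitude, rate):
    """Derivatives of y(t) = amplitude * exp(-rate * t) w.r.t. (amplitude, rate)."""
    decay = np.exp(-rate * t)
    return np.column_stack([decay, -amplitude * t * decay])

def fisher_information(times, amplitude, rate, sigma):
    """FIM for independent Gaussian measurement noise with std `sigma`."""
    s = sensitivities(np.asarray(times, dtype=float), amplitude, rate)
    return s.T @ s / sigma ** 2

def d_optimal_design(candidate_times, n_measurements, amplitude, rate, sigma):
    """Exhaustively search for the time points maximizing det(FIM)."""
    best_times, best_det = None, -np.inf
    for times in itertools.combinations(candidate_times, n_measurements):
        det = np.linalg.det(fisher_information(times, amplitude, rate, sigma))
        if det > best_det:
            best_times, best_det = times, det
    return tuple(float(t) for t in best_times), float(best_det)

# Nominal parameter values are assumptions; the design is only locally optimal.
candidate_times = np.linspace(0.0, 10.0, 21)
best_times, best_det = d_optimal_design(
    candidate_times, n_measurements=2, amplitude=10.0, rate=0.5, sigma=1.0
)
print(best_times)
```

For this toy model the determinant is proportional to $e^{-2k(t_1+t_2)}(t_2-t_1)^2$, which is maximized at $t_1 = 0$ and $t_2 = 1/k$, so the search selects the times 0 and 2 for $k = 0.5$. Because the FIM depends on the unknown parameters, such a design is optimal only around the nominal values used to compute it.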

### System identification

The use of mathematical models is of paramount importance for the quantitative characterization of biological systems. To obtain such models, one typically derives the equations describing the dynamics of the compounds, starting from the available physical knowledge. These equations, however, depend on parameters that are usually unknown. A key step in the construction of a quantitative model is therefore the collection of data from real experiments, from which the unknown parameters can be inferred. The inference process is called grey-box identification to stress that the structure of the model is known but some pieces are missing (namely the parameter values), in contrast to black-box modeling, where no information about the structure is available. In the literature there are two main approaches to parameter identification.

• FREQUENTIST APPROACH: in this approach, a point estimate of the model parameters is computed by minimizing the distance, according to an appropriate norm, between the measured data and the model output. Prediction Error Methods (PEM) are among the most widely used approaches in this context. If additional structure in the model is available, more tailored methods can be used, as done in [7] for linear systems.
• BAYESIAN APPROACH: in contrast to the previous case, Bayesian inference methods return a probability distribution for the parameters. At its core, this approach is built on the idea that prior information about the model parameters should be incorporated into the inference process. This is done by treating the parameters as a random variable distributed according to a "prior probability distribution" that should incorporate all the available knowledge. If no information other than the physiological range is available, a uniform distribution can be used. After the data have been collected, this "a priori distribution" is updated into an "a posteriori distribution" according to Bayes' rule. Computing the posterior distribution in closed form is usually not possible; however, a good approximation can be obtained from samples drawn using Markov Chain Monte Carlo (MCMC) approaches.
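The two approaches can be contrasted on a toy problem. The sketch below assumes a hypothetical one-parameter model $y(t) = e^{-kt}$ with Gaussian measurement noise, synthetic data, and a uniform prior on an assumed "physiological range" $[0, 5]$. The point estimate uses a simple grid search over the squared error, and the posterior is sampled with a random-walk Metropolis-Hastings scheme. None of the names or values come from [7] or from the group's software.

```python
import numpy as np

def model_output(k, t):
    """Hypothetical one-parameter model: exponential decay with rate k."""
    return np.exp(-k * t)

def log_likelihood(k, t, y, sigma):
    """Gaussian log-likelihood up to an additive constant."""
    residuals = y - model_output(k, t)
    return -0.5 * np.sum(residuals ** 2) / sigma ** 2

def least_squares_estimate(t, y, k_grid):
    """Frequentist point estimate: minimize the squared error over a grid."""
    sse = [np.sum((y - model_output(k, t)) ** 2) for k in k_grid]
    return float(k_grid[int(np.argmin(sse))])

def metropolis_hastings(t, y, sigma, k_min, k_max, n_samples, step, rng):
    """Random-walk Metropolis sampler with a uniform prior on [k_min, k_max]."""
    k = 0.5 * (k_min + k_max)
    log_post = log_likelihood(k, t, y, sigma)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = k + step * rng.normal()
        # Proposals outside the prior support have zero posterior density.
        if k_min <= proposal <= k_max:
            log_post_prop = log_likelihood(proposal, t, y, sigma)
            if np.log(rng.uniform()) < log_post_prop - log_post:
                k, log_post = proposal, log_post_prop
        samples[i] = k
    return samples

# Synthetic data generated from assumed "true" values.
rng = np.random.default_rng(0)
k_true, sigma = 0.8, 0.05
t = np.linspace(0.0, 5.0, 20)
y = model_output(k_true, t) + sigma * rng.normal(size=t.size)

k_ls = least_squares_estimate(t, y, np.linspace(0.0, 5.0, 501))
samples = metropolis_hastings(t, y, sigma, 0.0, 5.0, 5000, 0.05, rng)
posterior = samples[1000:]  # discard burn-in
print(k_ls, posterior.mean(), posterior.std())
```

The frequentist route returns a single number `k_ls`, whereas the Bayesian route returns the array `posterior`, whose spread quantifies the remaining uncertainty about the parameter.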

### Predictions and control

Once a quantitative model of the biological system has been derived, it can be used for several different tasks.

• PREDICTIONS: the most straightforward application is the prediction of the system behavior under different conditions or different stimuli. In particular, in the Bayesian approach, it is possible to propagate the information contained in the parameter posterior distribution into the so-called predictive distribution. This allows one to quantify how much of the parameter uncertainty actually matters for the behavior of the system.
• ANALYSIS: a quantitative model can also be useful for gaining additional insight into the biology of the system. For example, the identified parameters can give information about the rates of the reactions (e.g. which reactions are more frequent or important). Another possibility is to use the model to study the effect of removing some of the reactions (knock-out) or the effect of external stimuli.
• CONTROL: an interesting aspect of systems biology is that it is nowadays possible to construct biological components that react to external inputs, such as light or concentration signals. Quantitative models of such components can be used to design the signals that should be applied to the system in order to obtain a desired behavior. For example, in the case study detailed below, a light signal is used to control the protein production of a gene expression circuit in yeast. In this context, techniques developed in control theory can make a fundamental contribution. Conversely, the analysis of controlled biological systems can open new theoretical challenges for control theorists. In our group, we are actively doing research on reachability for biological systems. For the system described in [8], this task turned out to be closely related to the reachability analysis of switched linear positive systems.
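The propagation of parameter uncertainty into a predictive distribution, mentioned under PREDICTIONS, can be sketched as follows. The example assumes a hypothetical model $y(t) = e^{-kt}$ and uses draws from a normal distribution as a stand-in for posterior samples of $k$; in practice these would come from an MCMC run. The function name `predictive_band` and all numerical values are assumptions.

```python
import numpy as np

def model_output(k, t):
    """Hypothetical one-parameter model: exponential decay with rate k."""
    return np.exp(-k * t)

def predictive_band(posterior_samples, t_grid, sigma, rng, level=0.95):
    """Pointwise predictive quantiles obtained by simulating every sample."""
    # One model trajectory per posterior sample (rows) over the time grid.
    trajectories = model_output(posterior_samples[:, None], t_grid[None, :])
    # Add measurement noise so the band refers to future observations.
    predicted = trajectories + rng.normal(0.0, sigma, size=trajectories.shape)
    tail = 0.5 * (1.0 - level)
    lower, median, upper = np.quantile(predicted, [tail, 0.5, 1.0 - tail], axis=0)
    return lower, median, upper

rng = np.random.default_rng(1)
# Stand-in for posterior samples of k (assumption, not real inference output).
posterior_samples = rng.normal(loc=0.8, scale=0.05, size=2000)
t_grid = np.linspace(0.0, 5.0, 50)
lower, median, upper = predictive_band(posterior_samples, t_grid, sigma=0.05, rng=rng)
```

Time points where the band between `lower` and `upper` is narrow are those where the remaining parameter uncertainty has little effect on the predicted behavior.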

#### Application: Control of a light-inducible gene expression system in yeast [9].

 © 1999-2014 by ETH Zurich | Francesca Parise | October  7, 2014