# Team:Slovenia/ModelingMethods

(Difference between revisions)
 Revision as of 16:08, 25 September 2012 (view source)← Older edit Revision as of 16:20, 25 September 2012 (view source)Dusanv (Talk | contribs) Newer edit → Line 692: Line 692: +

Transcription factor binding

+

+ Each binding site is modeled as an individual entity. The algorithm, at each simulation step, uniformly distributes transcription factors among their target binding sites based on the amount of each transcription factor available. A binding site entity, B, is defined by the following parameters: +

+
• capacity, C (i.e. the number of binding site repetitions),
• +
• the amount of bound activator, BA (integer) – how many activator entities are bound to this binding site;
• + +
• the amount of bound repressor, BR (integer) – how many repressor entities are bound to this binding site.
• +

+

+ The sum of all repressors and activators bound to a binding site is never greater than that binding site capacity: +

+

+ + +

+ Let A be the amount of activator entities (objects) available for binding and R the amount of repressor entities available for binding.  Activators and repressors that competitively bind to a binding site B distribute among the available binding site repetitions (specified by capacity C) uniformly, with equal binding affinity, according to the following equations: +

+
• if A+R < C, then: +
+ BA = A +
+ BR = R +

• + +
• + if A+R is greater or equal than C, then: +

+
• +
+

+ +

+ All values are non-negative integers, i.e. rounding is used to obtain final BA and BR values. These equations can easily be generalized for a non-competitive binding scenario, when only activators or only repressors bind – in the first case, R is set to 0 (i.e. only activator entities are available); in the second case, A is set to zero (i.e. only repressor  entities are available). +

+ +

+ For example, let 10 TAL-A binding site repetitions exist, meaning TAL-A binding site capacity is equal to 10. TAL-A:KRAB and TAL-A:VP16 competitively bind for this binding site. Suppose that at a given time, the available amount of each of these proteins is 100 units: +

+
• TAL-A:KRAB = 100 units
• +
• TAL-A:VP16 = 100 units
• +
+

+ +

+ The amount of TAL-A:VP16 that will bind to the TAL-A binding site is: +

+ meaning 5 binding site repetitions will become bound by the TAL-A:VP16. Similarly, the other 5 will become bound by the TAL-A:KRAB: +

+ +

+ Thus, a uniform distribution was achieved. +

+ +

+ Bound proteins degrade in the same manner as the unbound proteins. It should be noted that with transcription and translation in the context of this algorithm we also mean mRNA and protein degradation.

+ +

+ +

# Modeling methods

## Introduction

In order to examine (i.e. simulate) the proposed genetic switches in silico, different modeling approaches were used. First, a deterministic model based on the probabilistic interpretation of gene regulation was constructed for each type of a genetic switch. Next, a stochastic simulation was performed to take inherent stochastic dynamics of gene expression into account. To further verify the results obtained using these methods, we also developed a quantitative model that builds upon experimental data. Moreover, we developed a modeling algorithm to more explicitly simulate transcription factor binding, considering number of available binding sites and competitive binding. Each modeling approach is discussed in the following sections. We also discuss the notion of cooperativity in the context of bistability.

## Cooperativity

It is often assumed that functional cooperativity (e.g. multimeric regulation) is required for bistability. However, it has been shown theoretically that bistability can emerge in systems without multimeric regulation, provided that at least one regulatory autoloop is present. (Widder et al., 2006). Furthermore, in silico analysis has shown the existence of bistable architectures without the transcription factor cooperativity typically associated with switch-like properties. (Siegal-Gaskins et al., 2011). An essential feature of these proposed architectures was the competitive binding of two transcription factors to the promoter.

In terms of modeling, sigmoidal functions (characterizing the rate of change dP/dt) – arising from (Hill) exponents greater than one - are often equated with molecular cooperativity (the way the transcription factor binds to a promoter). However, as non-linearity and multi-stability can arise without assuming molecular cooperativity, it has been suggested this is not an accurate proposition and that mathematical, or functional cooperativity – referring to a sigmoidal function arising from system equations - should not automatically be interpreted as molecular cooperativity (Andrecut et al., 2011). One reason for this is that model equations represent a significant simplification of actual biological dynamics of gene expression, which include a large number of reactions not explicitly considered in modeling, such as reactions describing chromosome opening and transcription initiation.

For this reasons, we believe that sigmoidal behavior alone – arising in some of our models for transcription factors’ exponent values (non-linearity) greater than 1 - should not by default be interpreted as molecular cooperativity. Thus, in the context of modeling, with the term cooperativity we mean functional cooperativity greater than 1. Functional cooperativity equal to 1 is referred to as no cooperativity.

Indeed, in case of our positive feedback loop switch – which contains both competitive binding and regulatory autoloops - even deterministic models predict experimentally-verified bistability at low (i.e. close to 1; deterministic fractional occupancy model) or no (quantitative model) functional cooperativity. Our stochastic model of the positive feedback loop switch also predicts bistability is possible without cooperativity.

## Deterministic modeling

Our deterministic models are based on fractional occupancy, a quantity which expresses the degree of saturation at the transcription factor binding site (Sauro, 2012). Fractional occupancy can be expressed as a ratio of active binding site states – i.e. states leading to gene expression – to all possible binding site states:

As such, it can be interpreted as a probability of a promoter being active, or a probability that transcription and/or translation will occur. Gene expression in our model, in turn, is proportional to this probability.

Two main possibilities are distinguished in our models depending on the type of the promoter. In case of a minimal promoter, binding of a transcriptional activator (leading to an active state) is required for transcription initiation. Other states – binding site being free or bound by a repressor - are considered inactive. In case of a constitutive promoter, binding of a transcriptional repressor leads to an inactive state, while unbound (or activator-bound) states are active, hence leading to gene expression.

We assumed large species concentrations compared to the number of available binding sites. We also assumed fast transitions between promoter states, i.e. transcription factor binding and unbinding occurs much faster than transcription/translation, which in this type of deterministic models are represented as a single step of protein production. Another simplification made was a representation of multiple repetitions of a binding site as a single binding site (e.g. a specific TAL-A:KRAB binding site may in reality have 10 repetitions, while in our models it is represented as a single site). As the purpose of our deterministic models is to provide an approximate, basic characterization of the proposed logic, we consider these simplifications acceptable. Other models, such as C#Sim models (described later), try to formalize some of these aspects in more detail.

The mathematical framework which describes protein production and into which we incorporate fractional occupancy is represented as a set of ordinary differential equations (ODEs) of the form:

and is roughly based on a framework proposed in (Kaern et al., 2005). Here:
• [Protein] is protein concentration at a given time;
• k is a constant specifying protein production rate in case of an active promoter.
• kb is a constant specifying the amount of leaky gene expression, meaning protein production that takes place even in the case of inactive (i.e. repressed constitutive or non-activated minimal) promoter.
• dg is a constant specifying protein degradation rate.
• P(active) is the probability of a promoter being in an active state – this probability is equal to fractional occupancy f.
• P(inactive) is the probability of a promoter being inactive and is equal to (1-f).

No concrete quantitative data describing e.g. transcription, translation or degradation rates of TAL effectors was available to us. For this reason, the parameter values (specifying e.g production or degradation rates) in our models were assumed based on commonly accepted propositions, such as that production (i.e. transcription and translation) rates of proteins are (usually) higher than their degradation rates. Parameter values that were used in our simulations should hence only be considered in relative terms (e.g. as protein production to degradation ratios). We also assumed parameter symmetry of different TAL effectors used (TAL-A and TAL-B), because we did not expect that one would have e.g. a production rate significantly higher than the other.

All deterministic models were implemented in MATLAB.

## Stochastic modeling

Gene expression is an inherently stochastic process and deterministic simulation is often not satisfactory. The required conditions for the two approaches to be similar are large system size (high mRNA and protein numbers) and fast promoter kinetics (Kaern et al., 2005), but these conditions are not always met. Furthermore, stochastic models can capture the individuality of single cells, as opposed to deterministic models that can only predict the average behavior of a cell population (Kaern et al., 2003). For this reason, we performed stochastic simulations of the proposed genetic switches. Our results were obtained using SGN Sim, a stochastic genetic networks simulator (Ribeiro et al., 2007) based on stochastic simulation algorithm (SSA).

Competitive binding between an activator and a repressor exists in our positive feedback loop switch. This can lead to three major minimal promoter states: the promoter being free (i. e. unbound - neither an activator nor a repressor is bound, zero activation is present), bound by a repressor or bound by an activator. When minimal promoter is unbound, only leaky expression occurs. If bound by an activator, this causes high, activated expression levels. Bound repressor causes the expression levels to drop even below the ones achieved for the unbound (non-activated) state.

Our deterministic models made no clear distinction between free (unbound) minimal promoter state and a repressor-bound state – in both cases, inactive promoter state was assumed, leading to only leaky expression. In our stochastic models, we improved this by introducing different expression levels for all three states, achieving a more realistic formalization.

Using stochastic simulation, we also tested how the delay between transcription and translation affects bistability.

## Quantitative model and stability analysis

Based on experimental data, we constructed what we refer to as a quantitative model. Please see Quantitative and stability model for details. To get a better idea of the functioning of the model, an interactive web application was developed and is available here.

## C#Sim - algorithmic modeling

We developed a new modeling algorithm, called C#Sim. The main purpose of this was:

• to explicitly model a limited number of binding site repetitions;
• to explicitly model competitive binding of activators and repressors;
• to construct a new modeling approach that incorporates stochasticity of gene expression into an otherwise deterministic approach.

The assumptions made here were similar to deterministic modeling assumptions. First, we assumed that high concentrations of mRNA and proteins existed compared to the number of available binding sites that transcription factors could bind to. Second, we assumed that binding of transcription factors to binding sites was much faster than transcription and translation.

A feature of C#Sim is that it operates with discrete numbers – at each step of the simulation, mRNA and protein amounts are non-negative integers. Each mRNA and protein molecule is represented as an independent entity – or object - defined by its parameters and interactions. An object-oriented programming approach was used to achieve this. At the moment, C#Sim supports the following entity types, implemented as classes:

mRNA
Parametrs:
• birth time (integer) – tells at what simulation step this mRNA entity was created (transcribed); used for modeling transcription-translation delay;
• alive (true/false) – tells if this mRNA is alive and subject to translation (true) or not.
Protein
Parametrs:
• birth time (integer) – tells at what simulation step this protein entity was created (translated);
• alive (true/false) – tells if this protein is alive (true), e.g. it can function as a transcription factor.
Binding site
Parametrs:
• capacity (integer) – the number of binding site repetitions; each repetition can be bound by one Protein entity;
• amount of bound activator (integer) – the number of activator entities bound to this binding site;
• amount of bound repressor (integer) – the number of repressor entities bound to this binding site.
Promoter
Parametrs:
• promoter type – either minimal or constitutive;
• list of promoter binding sites;
• active mRNA transcription rate, which determines the number of mRNA entities to produce at each simulation step for activated promoter (in case of a minimal promoter) or non-repressed promoter (in case of a constitutive promoter);
• inactive (leaky) mRNA transcription rate, which determines the number of mRNA entities that can be produced even when the promoter is not activated (for minimal promoter), or when the promoter is repressed (for constitutive promoter);
• the amount of all repressors and activators bound to all promoter binding sites.
Gene
Parametrs:
• gene's promoter;
• translation rate, describing the number of proteins produced per each transcribed alive mRNA entity at each simulation step;
• mRNA and protein degradation rate (i.e. the percentage of mRNA and protein degraded at each simulation step).

Note that the majority of these parameters are computed automatically, not specified by the user (programmer). The parameters specified by the user (when defining the network structure) are: binding sites' capacities, promoter types, list of promoter binding sites, active and inactive transcription rates, a promoter for each gene, translation rates and degradation rates. A delay between transcription and translation can also be specified.

Using these entities, a hierarchy of gene regulatory network can quickly be constructed. For examples, please take a look at the source code, which contains both our switches implemented in C# language.

Other parameters also define the behavior of the mentioned entity types – they are listed and explained in more detail in the algorithm overview below.

### Algorithm overview

C#Sim modeling consists of two major steps:

1. define gene regulatory network structure as a series of related entities (objects): genes to transcribe and translate, promoters, binding sites and transcription factor binding interactions. Specify necessary parameters, such as production rates, degradation rates and promoter binding sites.
2. Specify the number of simulation steps to take. At each step of the simulation:
1. (optional) introduce input signals and define their actions (e.g., introduce signal 1 at a specific time of the simulation and make it prevent transcription factor from binding);
2. transcribe and translate genes; degrade (i.e. delete) a parameter-specified percentage of produced mRNA and proteins;
3. bind existent transcription factor proteins to their target binding sites; considering binding site capacity (i.e. the number of binding site repetitions), uniformly distribute transcription factors among target binding sites;
4. execute the next simulation step.

### Transcription factor binding

Each binding site is modeled as an individual entity. The algorithm, at each simulation step, uniformly distributes transcription factors among their target binding sites based on the amount of each transcription factor available. A binding site entity, B, is defined by the following parameters:

• capacity, C (i.e. the number of binding site repetitions),
• the amount of bound activator, BA (integer) – how many activator entities are bound to this binding site;
• the amount of bound repressor, BR (integer) – how many repressor entities are bound to this binding site.

The sum of all repressors and activators bound to a binding site is never greater than that binding site capacity:

Let A be the amount of activator entities (objects) available for binding and R the amount of repressor entities available for binding. Activators and repressors that competitively bind to a binding site B distribute among the available binding site repetitions (specified by capacity C) uniformly, with equal binding affinity, according to the following equations:

• if A+R < C, then:
BA = A
BR = R

• if A+R is greater or equal than C, then:

All values are non-negative integers, i.e. rounding is used to obtain final BA and BR values. These equations can easily be generalized for a non-competitive binding scenario, when only activators or only repressors bind – in the first case, R is set to 0 (i.e. only activator entities are available); in the second case, A is set to zero (i.e. only repressor entities are available).

For example, let 10 TAL-A binding site repetitions exist, meaning TAL-A binding site capacity is equal to 10. TAL-A:KRAB and TAL-A:VP16 competitively bind for this binding site. Suppose that at a given time, the available amount of each of these proteins is 100 units:

• TAL-A:KRAB = 100 units
• TAL-A:VP16 = 100 units

The amount of TAL-A:VP16 that will bind to the TAL-A binding site is:

meaning 5 binding site repetitions will become bound by the TAL-A:VP16. Similarly, the other 5 will become bound by the TAL-A:KRAB:

Thus, a uniform distribution was achieved.

Bound proteins degrade in the same manner as the unbound proteins. It should be noted that with transcription and translation in the context of this algorithm we also mean mRNA and protein degradation.

## References

Andrecut M, Halley JD, Winkler DA, Huang S. (2011) A General Model for Binary Cell Fate Decision Gene Circuits with Degeneracy: Indeterminacy and Switch Behavior in the Absence of Cooperativity. PLoS ONE 6(5): e19358. doi:10.1371/journal.pone.0019358.

Kaern M, Blake WJ, Collins JJ. (2003) The engineering of gene regulatory networks. Annual Review of Biomedical Engineering. 5, 179-206.

Kaern M, Elston TC, Blake WJ, Collins JJ. (2005) Stochasticity in gene expression: from theories to phenotypes. Nature. 6, 451-464.

Ribeiro AS, Lloyd-Price J. (2007) SGN Sim, a Stochastic Genetic Networks Simulator. Bioinformatics. 23 (6): 777-779.

Sauro HM. (2012) Enzyme Kinetics for Systems Biology. Future Skill Software.

Siegal-Gaskins D, Mejia-Guerra MK, Smith GD, Grotewold E. (2011) Emergence of Switch-Like Behavior in a Large Family of Simple Biochemical Networks. PLoS Comput Biol 7(5): e1002039. doi:10.1371/journal.pcbi.1002039.

Widder S, Macía J, Solé R. (2009) Monomeric Bistability and the Role of Autoloops in Gene Regulation. PLoS ONE 4(4): e5399. doi:10.1371/journal.pone.0005399.

Next: Deterministic model of the mutual repressor switch >>