Inferring Drosophila gap gene regulatory network: pattern analysis of simulated gene expression profiles and stability analysis

Background Inference of gene regulatory networks (GRNs) requires accurate data, a method to simulate the expression patterns and an efficient optimization algorithm to estimate the unknown parameters. Using this approach it is possible, without making any a priori assumptions about the interactions, to obtain alternative circuits that all simulate the observed patterns. It is therefore important to analyze the properties of these circuits. Findings We have analyzed the simulated gene expression patterns of previously obtained circuits that describe gap gene dynamics during early Drosophila melanogaster embryogenesis. Using hierarchical clustering we show that amplitude variation and defects observed in the simulated gene expression patterns are linked to similar circuits, which can be grouped. Furthermore, analysis of the long-term dynamics revealed four main dynamical attractors comprising stable patterns and oscillatory patterns. In addition, we performed a correlation analysis on the parameters, which showed an intricate correlation pattern. Conclusions The analysis demonstrates that the obtained gap gene circuits are not unique: they show variable long-term dynamics and scattered, highly correlated parameters. Furthermore, although the model can simulate the pattern up to gastrulation and confirms several of the known regulatory interactions, it does not reproduce the transient expression of all gap genes as observed experimentally. We suggest that the shortcomings of the model may be caused by overfitting, an incomplete model description and/or missing data.


Introduction
A biological system that has been extensively studied is the segmentation mechanism of early development in Drosophila melanogaster (see [1] for review). At an early stage, a cascade of maternal and zygotic genes is activated in the syncytial embryo that subdivides the ectoderm into smaller domains. First, maternal morphogens such as bicoid (bcd), caudal (cad) and hunchback (hb) activate zygotic gap genes such as hb, giant (gt), Krüppel (Kr), knirps (kni) and tailless (tll), which in turn activate the pair-rule genes. The pair-rule genes regulate the segment polarity genes and Hox genes, which together control the differentiation of each segment of the future embryo [1].
The gap gene circuit has been extensively investigated using mathematical models [2,3]. In all cases, the goal was to derive the regulatory interactions that control gene expression. The gene circuit approach [4], combined with a parameter optimization method, made it possible to infer gene regulatory interactions directly from experimental spatio-temporal gene expression data [5,6]. In all cases the optimization involved minimizing the difference between observed and simulated data. Previous studies [4,7,8] analyzed the obtained gene circuits essentially by visual inspection of the simulated patterns, mainly because the number of circuits was insufficient. Fomekong et al. [8] proposed a faster optimization method that yielded a larger number of circuits, allowing for a more detailed analysis. Finding a set of parameters that reproduces the observed data does not necessarily imply that the network structure has been identified correctly, or that the underlying pattern formation mechanism of the system has been revealed [9,10]. For some systems, the network structure itself inherently leads to robust pattern formation and depends only weakly on the specific parameter values [11,12]. Inference may lead to a unique network; in many cases, however, many circuits with different topologies and scattered parameter values are found. It is then necessary to analyze these circuits further and to discriminate between realistic and non-realistic circuits based on other criteria [13,14].
We have used descriptive statistics and stability analysis to analyze the simulated patterns and parameters of previously obtained circuits [8,15]. The incompleteness of the available experimental data, the complexity and non-linearity of the model, and the large number of unknown parameters, which can lead to overfitting, make the reverse engineering problem challenging. These factors might lead to circuits with different regulatory interactions, or to variability in the simulated patterns and dynamical behavior.
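The minimization of the difference between observed and simulated data can be illustrated with a sum-of-squares cost. This is only a minimal sketch: the exact cost function used for the circuits (weights, penalty terms) is not given in this section, and the function name, data layout and numbers below are hypothetical.

```python
def sum_squared_error(observed, simulated):
    """Sum of squared differences between two equally shaped profiles.

    Each argument is a list of rows (one per time point); each row is a
    list of expression levels along the A-P axis. This layout is an
    assumption made for illustration.
    """
    if len(observed) != len(simulated):
        raise ValueError("profiles must have the same number of time points")
    total = 0.0
    for obs_row, sim_row in zip(observed, simulated):
        if len(obs_row) != len(sim_row):
            raise ValueError("rows must have the same number of positions")
        total += sum((o - s) ** 2 for o, s in zip(obs_row, sim_row))
    return total


# Toy numbers, not measurements.
observed = [[0.0, 1.0, 2.0], [0.5, 1.5, 2.5]]
simulated = [[0.0, 1.0, 1.0], [0.5, 2.5, 2.5]]
cost = sum_squared_error(observed, simulated)
```

An optimizer would adjust the circuit parameters to lower this value; a low cost alone does not show that the topology is correct, which is the point made above.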
In groups 2 and 3, Gt represses hb, causing the dip observed in the anterior hb domain. In addition, Hb activates gt (in contrast to group 1). Consequently, anterior gt production should increase, and something must locally repress gt to keep it at its normal level. At this position, Tll is the gene that controls the gt expression level, and one way to keep that level constant would be to increase the weight of this repression.
tion time are observed (see Additional file 1). These features are not observed in the data and may indicate circuits that are not biologically realistic. We performed a hierarchical cluster analysis on the profiles to identify groups that share deviant features. By statistically comparing the parameters of the different groups using a T-test, we identified parameters that may explain the observed features. Some of the clusters share the same circuits, as shown in Figure 3. The group with the anterior hb dip largely overlaps with the tll bump cluster and also with one of the gt clusters. This means that the features in hb, tll and gt share a common circuit topology (see the Venn diagram, Figure S1 in Additional file 1).
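As an illustration of how hierarchical clustering can group profiles that share a deviant feature, the sketch below uses average linkage on Euclidean distances. Both choices are assumptions, since the distance measure and linkage used in the analysis are not stated here, and the four toy profiles are invented.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering; returns clusters as sorted index lists."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dists = [euclidean(profiles[i], profiles[j])
                         for i in clusters[a] for j in clusters[b]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Toy profiles (hypothetical): two with a flat plateau, two with a dip.
profiles = [
    [1.0, 1.0, 1.0, 0.0],
    [1.0, 0.9, 1.0, 0.0],
    [1.0, 0.3, 1.0, 0.0],
    [1.0, 0.2, 1.0, 0.0],
]
groups = average_linkage(profiles, 2)
```

With these toy data the two dipped profiles end up in one cluster and the two flat profiles in the other.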
The differences between the profiles may be explained by variability in circuit topology or by differences in parameter magnitude. Comparing the parameters of the different groups using a T-test, we find that the parameters of group 2 and group 3 do not differ significantly; these groups are therefore combined into one group 2-3. Comparison of group 1 and group 2-3 yields three significantly different parameters (Table 1). In group 2-3, Gt represses hb and causes the anterior hb dip. In addition, tll activation by Gt is considerably decreased, leading to higher production of tll, which causes the tll bump. Comparison of groups 1 and 4 shows that Kr autoactivation is strong and repression by Gt is weak within group 4 (Table 2). This combination causes a local increase in Kr production in the domain where Gt is expressed. One consequence of the strong autoactivation would be a higher level of Kr all along the A-P axis. This is prevented by increased repression through Hb and Kni. In addition, the lower production rate of Kr compensates for the strong autoactivation.
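A group-wise parameter comparison of this kind can be sketched as follows. The variant of the T-test used is not specified in this section, so Welch's unequal-variance form is assumed here, and the weights in the two toy groups are invented rather than taken from Tables 1 or 2.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two group means.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical regulatory weights of one parameter in two groups of circuits.
group_1 = [0.02, 0.03, 0.01, 0.02]
group_2_3 = [-0.05, -0.04, -0.06, -0.05]
t_stat, dof = welch_t(group_1, group_2_3)
```

The statistic would then be compared against the t distribution with `dof` degrees of freedom to obtain a p-value; a large absolute value, as in this toy case, marks the parameter as a candidate explanation for the difference between the groups.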

Pattern stability at later times
During gastrulation most of the gap domains, maternal bcd and cad disappear within 30 min. The anterior hb domain disappears rapidly during gastrulation [16], while posterior hb domain can still be detected for a few more hours until the end of germ band extension [17]. Central Kr domain decays rapidly after the onset of gastrulation [18,19]. Posterior gt domain disappears rapidly during gastrulation while the anterior domains persist for a few hours but change quite drastically and become involved in organ formation [20][21][22]. The entire kni domain and the posterior domain of tll disappear rapidly after gastrulation [23,24].
The long-term dynamics of the circuits should show whether the model is able to predict the disappearance of the gap gene domains, provide information about the asymptotic stability of the model, and potentially reveal its attractors. The parameters were obtained by fitting the model to real data up to gastrulation time. To study the long-term dynamics we simulated all circuits for an extended period (see Methods, Additional file 1 and Additional files 2, 3, 4, 5, where movies of the long-term dynamics are shown). We classified the behavior into the following groups:
1. stable patterns: 64 circuits, where tll and cad domains disappear completely in most cases. This group is composed of three sub-groups:
(a) 9 circuits show a rudimentary gap gene pattern with all gene domains more or less well defined (Figures 4-A, B).
(b) 27 circuits develop a uniform hb domain that covers the whole embryo (Figures 4-C, D).
Figure 4 Spatio-temporal surface plots showing the behavior of four different circuits at later times, and on the right the corresponding circuits.
(b) 19 circuits where all genes oscillate except tll, which disappears (Figures 4-G, H).
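One simple way to separate stable from oscillatory long-term behavior is to inspect the final part of a simulated time series. The sketch below is an assumption about how such a classification could be automated, not the procedure used for the circuits; the tail fraction and tolerance are hypothetical choices.

```python
import math


def classify_tail(series, tail_fraction=0.25, tolerance=1e-3):
    """Label a trajectory from the peak-to-peak range of its final segment."""
    start = int(len(series) * (1.0 - tail_fraction))
    tail = series[start:]
    if max(tail) - min(tail) < tolerance:
        return "stable"
    return "oscillatory"


# Toy trajectories of one gene at one position (not model output).
decaying = [math.exp(-0.1 * t) for t in range(400)]
oscillating = [1.0 + 0.5 * math.sin(0.2 * t) for t in range(400)]

label_decaying = classify_tail(decaying)
label_oscillating = classify_tail(oscillating)
```

A trajectory that is still drifting slowly at the end of the simulation would also exceed the tolerance, so a criterion like this only makes sense when the simulated period is long compared with the transient.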
We compared the parameters using T-tests and the average circuit topology (see Tables 3, 4 and 5). Table 3 shows that the main difference between the two stable patterns is the strong hb autoactivation in the group with expanded hb. In some of the oscillatory circuits (Figure 4-H), we observed a basic motif composed of autoactivation and negative feedback loops. It can be shown theoretically that the minimal requirement for oscillations in a two-gene network is that an activator activates both its repressor and itself. Nevertheless, the positive and negative feedback loops may be indirect, and the actual parameter values may prevent oscillations even if this minimal requirement is met. In the first oscillatory group we observe the basic motif for oscillation between Hb and Gt (Figure 5-H). In the data, the anterior hb peak collapses slightly, and it collapses more at the position of the anterior gt peak. In a number of circuits the fit to Hb is improved by repression of hb by Gt (group 2 in the cluster analysis). Almost all members of this group show oscillations. Hb in this group has intermediate autoactivation (the group with strong autoactivation does not show oscillations), and Cad activates hb, leading to constitutive activation of posterior hb. In addition to the hb-gt oscillatory motif we see similar motifs involving Tll. When the negative feedback interactions are removed, the corresponding gene no longer oscillates. Although the connections in these motifs are weak, they strongly affect the behavior at later times.
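The minimal requirement for oscillations in a two-gene network can be illustrated with a linearized activator-repressor pair. This is a sketch under stated assumptions: the equations below are a generic linearization around a steady state, not the gene circuit model itself, and all coefficient values are hypothetical.

```python
import cmath


def eigenvalues_2x2(j11, j12, j21, j22):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = j11 + j22
    det = j11 * j22 - j12 * j21
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def motif_eigenvalues(self_activation, activation, repression, decay=1.0):
    """Jacobian eigenvalues of a linearized activator-repressor motif.

    d(activator)/dt = (self_activation - decay) * activator
                      - repression * repressor
    d(repressor)/dt = activation * activator - decay * repressor
    """
    return eigenvalues_2x2(self_activation - decay, -repression,
                           activation, -decay)


# Same negative feedback loop, with and without activator autoactivation.
with_auto = motif_eigenvalues(3.0, 2.0, 2.0)
without_auto = motif_eigenvalues(0.0, 2.0, 2.0)
```

In this sketch the trace of the Jacobian is `self_activation - 2 * decay`. Without autoactivation the trace is negative, so the complex eigenvalues have a negative real part and any oscillation is damped. With sufficiently strong autoactivation the real part becomes positive, and in a bounded nonlinear system such an unstable spiral can give sustained oscillations. Whether it does depends on the actual parameter values, as noted above.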

Parameter correlations
From the two previous analyses and the T-tests, we see that parameters differ from circuit to circuit, leading to different behaviour. This might be a sign of overfitting, which can be assessed by examining the correlation matrix of all the parameters (Figure 5). Because of compensation mechanisms, parameters may not be identifiable. Examples are the promoter and decay rates, which both scale the expression profile.
Furthermore, the input weights on a single gene can also compensate each other. If a positive input on a gene becomes stronger, increasing negative weights or decreasing positive weights can adjust for the increased total input, such that the total input on that gene is not altered much. However, these correlation patterns may be more intricate [see Additional file 1 for an extended correlation analysis].
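A pairwise correlation matrix over the fitted parameters can be sketched as below. The parameter names and values are invented for illustration: the production and decay rates are made to co-vary across five hypothetical circuits, as compensating parameters would, while the third parameter is unrelated.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations between named parameter columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# One value per circuit for each parameter (toy numbers).
params = {
    "production_rate": [10.0, 14.0, 18.0, 22.0, 26.0],
    "decay_rate": [0.10, 0.15, 0.17, 0.23, 0.25],
    "unrelated_weight": [0.3, -0.2, 0.0, -0.2, 0.3],
}
corr = correlation_matrix(params)
```

A coefficient close to 1 or -1 between two parameters, as for the production and decay rates here, suggests that the data constrain only a combination of the two and not each value separately.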

Discussion
The table summarizes the parameters that are significantly different (mean mi, difference between means dm, and their p-value from the T-test t). The parameter differences found between Group I and II are the strength of hb autoactivation and the activation/repression of kni by Bcd [see Tab. S1 in Additional file 1, where network diagrams are shown].
It seems difficult to determine which of the circuits have the "correct" topology. From the clustering of the gastrulation profiles, we could have concluded that only circuits without defects should be taken into account, but this is not trivial, since circuits differ not only in the type of regulatory interaction but also in interaction strength. None of the circuits predicted the disappearance of the gap genes during gastrulation; this may be related to missing mechanisms such as degradation of maternal genes. The long-term dynamics of the circuits show that the patterns converge to four main attractors. This difference in convergence may be explained by differences in a few parameters, but also by the presence of certain motifs. More intriguingly, one of the attractors resembles the gastrulation pattern, and circuits falling into this group have interactions more consistent with experimental evidence. Combined with the well-defined parameters obtained from the correlation analysis [12], the following gap gene interactions consistent with the literature were derived:
1. All the gap genes are activated by Cad.
2. All the gap genes except kni are activated by Bcd.
4. kni does not show auto-repression, but auto-activation cannot be established with certainty (strong correlation coefficient with most parameters).

Mutual repression between Gt and Kr
These interactions are consistent with the regulatory mechanism proposed in [4] as well as with those reported in the earlier literature [19][20][21][25][26][27][28][29] and in a previous analysis [10]. For example, biological evidence suggests that the anterior hb dip is caused by different early and late regulation mechanisms, which are not included in the current model; consequently, for many circuits the optimization predicts repression of hb by Gt to mimic this feature of the data. Furthermore, the model tries to reproduce the experimentally observed decrease of cad by introducing negative feedback through the gap genes.
Jaeger et al. [4] suggested that the anterior shift of posterior domains after cycle 14A is caused by asymmetric repression of the gap genes. All the current circuits reproduce the shift, but the current analysis suggests that the shift is not necessarily a consequence of the asymmetric repression triggered by Hb. In many circuits the shift of these domains continues to progress and leads to domain expansion or to the disappearance of other domains. Schroder et al. [30] suggested that autoactivation is involved in maintenance of gap gene expression and sharpening of gap domain boundaries [15]. Although this might be true, strong autoactivation also affects pattern stability later on during gastrulation, making it more difficult for domains to fade. The inability of the circuits to predict transient expression suggests that either an additional mechanism is missing in the model or that the optimization failed to capture the dynamics.
Figure 5 Parameter correlation matrix. Left: Matrix showing the pairwise correlations; the colour scale goes from intense red (strong negative correlation) to bright green (positive correlation). The correlation matrix shows that there are many pairwise correlations, which tend to form clusters. Right: The absolute values of the correlation coefficients are used as a similarity measure to cluster the parameters, which is presented as a dendrogram. The parameters are sorted according to the dendrogram.