Materials
The NMA in question is a systematic review of randomised controlled trials that compared active treatments for bipolar disorder (or placebo), either as monotherapy or as add-on treatment, for at least 12 weeks [4]. The primary outcome was the number of participants with recurrence of any mood episode; it combined two secondary outcomes, namely the number of participants with recurrence of a manic episode and the number with recurrence of a depressive episode. In all, we identified and included 33 randomised controlled trials that examined 17 maintenance pharmacotherapies for bipolar disorder in 6846 participants. Figure 1 shows the network formed by the identified comparisons in this NMA. We conducted a random-effects network meta-analysis within a Bayesian framework using Markov chain Monte Carlo in OpenBUGS 3.2.2 [6].
Assessment of risk of bias of each study and of each direct comparison
Two assessors rated the risk of bias (RoB) of each RCT according to the Cochrane Handbook risk of bias tool [7]. The tool examines the key methodological issues in a randomised trial: generation of random sequence, concealment of allocation, blinding of participants, blinding of therapists, blinding of outcome assessment, incomplete outcome data, and selective outcome reporting. We also assessed whether the definitions of mood episode relapse or recurrence were explicit/operationalised in the primary studies, and whether there was sponsorship bias. We rated an item at unclear risk of bias when we did not find sufficient information to judge it at either high or low risk.
Then we made a summary evaluation of RoB for each included study according to the following categories:

Low risk of bias: there is no item rated at high risk among the nine items listed above.

Moderate risk of bias: there is one item rated at high risk.

High risk of bias: there are two or more items rated at high risk.
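The three categories above amount to counting the items rated at high risk. A minimal sketch in Python, assuming each study's nine item ratings are stored as the strings "low", "unclear" or "high" (the function name and the data layout are illustrative, not part of the original analysis):

```python
def summarise_study_rob(item_ratings):
    """Summarise item-level ratings into a study-level RoB category.

    item_ratings: iterable of "low", "unclear" or "high", one per item.
    Items rated "unclear" do not count towards the number at high risk.
    """
    n_high = sum(1 for rating in item_ratings if rating == "high")
    if n_high == 0:
        return "low"
    if n_high == 1:
        return "moderate"
    return "high"


# Hypothetical study: one item at high risk, the rest low or unclear.
example = ["low"] * 6 + ["unclear", "unclear", "high"]
print(summarise_study_rob(example))
```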
We examined the validity of this classification by pooling and comparing the RRs of studies rated at low, moderate or high risk of bias within a comparison, wherever that comparison included enough trials to permit such validation.
After making a summary evaluation of RoB for each study, we made a similar evaluation of RoB for each direct comparison. When studies rated at different risks of bias were pooled, we made a summary evaluation by taking into account the weight that each study is given in pooling the studies into one direct comparison estimate as follows:

Low risk of bias: all the included studies were rated as low risk of bias.

Moderate risk of bias: all the studies were rated as moderate or low risk of bias; or there was one study rated as high risk of bias but this study contributed less than one quarter of the pooled sample.

High risk of bias: two or more studies were rated at high risk of bias; or one major study rated at high risk of bias made a substantial contribution.
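These comparison-level rules can be sketched in the same way. The text does not define "substantial contribution" numerically; the sketch below assumes it means one quarter or more of the pooled sample, mirroring the threshold given for the moderate category. The function name and input layout are likewise illustrative.

```python
def summarise_comparison_rob(studies, threshold=0.25):
    """Summarise study-level RoB into a RoB category for one direct comparison.

    studies: list of (study_rob, sample_size) tuples, where study_rob is
             "low", "moderate" or "high".
    threshold: assumed cut-off for a "substantial" share of the pooled
               sample (an interpretation, not stated in the text).
    """
    total = sum(n for _, n in studies)
    high_sizes = [n for rob, n in studies if rob == "high"]

    if not high_sizes:
        if all(rob == "low" for rob, _ in studies):
            return "low"
        return "moderate"
    # Exactly one high-risk study contributing less than the threshold share.
    if len(high_sizes) == 1 and high_sizes[0] / total < threshold:
        return "moderate"
    return "high"


# Hypothetical comparison: one small high-risk study among larger low-risk ones.
print(summarise_comparison_rob([("high", 20), ("low", 100), ("low", 80)]))
```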
This method of summarising the RoB of the various domains into the RoB of a study, and then summarising study-level RoB into the RoB of a comparison, is admittedly somewhat arbitrary. However, whatever definitions are adopted, the same logic and calculations, as we demonstrate below, can be used to synthesise these characteristics at the level of each pairwise comparison into those at the level of each network estimate. In the following we shall therefore use the definitions above to illustrate our method.
Assessment of ‘enrichment design’ for each study and for each direct comparison
We also evaluated whether each study used the enrichment design in relation to the polarity of the mood episode. The influence of the enrichment design was assessed separately for the two secondary outcomes: prevention of depressive episodes and prevention of manic episodes. Participants were considered to be enriched for a certain drug for depressive episode relapse (depressive enrichment) when they had been recruited during an acute depressive episode and followed up for depressive episode relapse after being stabilised on that drug. Likewise, participants were considered to be enriched for a drug for manic episode relapse (manic enrichment) when they had been recruited during an acute manic episode and followed up for manic episode relapse after being stabilised on that drug.
We first calculated the percentages of depressive and manic enrichment for each study from the number of participants in an acute depressive or manic episode at recruitment. We then estimated the corresponding percentages for each direct comparison, which may consist of one or more studies, taking into account the direction of enrichment in each study. For example, if a direct comparison A vs B consisted of two studies, one of which (n = 100) did not use the enrichment design while the other (n = 200) recruited patients during a depressive episode and treated them with drug A, then this direct comparison would have 67 % (200/300) of participants enriched for depressive relapse in favour of drug A, 33 % not enriched for depressive relapse and 100 % not enriched for manic relapse.
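The worked example can be reproduced with a few lines of Python. This is a sketch under assumptions: the tuple layout (total participants, participants enriched, drug favoured by the enrichment) and the function name are hypothetical, and the calculation covers one polarity (here depressive enrichment) at a time.

```python
def enrichment_shares(studies):
    """Percentage of a direct comparison's participants enriched per drug.

    studies: list of (n_total, n_enriched, favoured_drug) tuples;
             favoured_drug is None when the study is not enriched.
    Returns a dict mapping each favoured drug, plus "not enriched",
    to a percentage of the pooled sample.
    """
    total = sum(n_total for n_total, _, _ in studies)
    shares = {"not enriched": 0.0}
    for n_total, n_enriched, favoured in studies:
        if n_enriched:
            shares[favoured] = shares.get(favoured, 0.0) + 100.0 * n_enriched / total
        shares["not enriched"] += 100.0 * (n_total - n_enriched) / total
    return shares


# The two studies from the example in the text (depressive enrichment).
shares = enrichment_shares([(100, 0, None), (200, 200, "A")])
print({drug: round(pct) for drug, pct in shares.items()})
```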
Using the contribution matrix to quantify the influence of RoB and of enrichment design in each network estimate
We used a recently developed tool for NMA, called the contribution matrix, that quantifies how much each direct comparison in the network contributes to each network estimate in the NMA [8, 9].
Let us take a simple triangular network ABC. We first calculate the direct estimates comparing A vs B, A vs C and B vs C by pooling the trials that make each comparison. We denote these as D_{AB}, D_{AC} and D_{BC}. In the NMA of the full triangle, the mixed or network estimate comparing A vs B comes from the direct comparison D_{AB} and the indirect comparison I_{AB} = D_{AC} − D_{BC}, formed from D_{AC} and D_{BC} via C. For the simple situation in which each of the direct estimates has the same variance, the indirect estimate has twice the variance of the direct one, so the network estimate is N_{AB} = (2*D_{AB} + (D_{AC} − D_{BC}))/3. The three direct estimates D_{AB}, D_{AC} and D_{BC} thus enter N_{AB} with absolute coefficients of 2/3, 1/3 and 1/3, which, rescaled to sum to 100 %, correspond to contributions of 50, 25 and 25 %, respectively.
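The arithmetic for the triangle can be checked with a short sketch. It assumes the network estimate is the inverse-variance weighted average of the direct and the indirect estimate, which holds for this simple consistent triangle with independent direct estimates. It is not the netweight implementation; the function name and the example values are hypothetical.

```python
def triangle_ab(d_ab, d_ac, d_bc, var_ab=1.0, var_ac=1.0, var_bc=1.0):
    """Network estimate N_AB for a triangle ABC and percentage contributions."""
    indirect = d_ac - d_bc            # I_AB, the indirect estimate via C
    var_indirect = var_ac + var_bc    # variances of independent estimates add

    w_direct = 1.0 / var_ab
    w_indirect = 1.0 / var_indirect
    total_weight = w_direct + w_indirect
    network = (w_direct * d_ab + w_indirect * indirect) / total_weight

    # Absolute coefficients of each direct estimate in N_AB
    # (D_BC enters with a negative sign; only its magnitude is used here).
    coefficients = {
        "AB": w_direct / total_weight,
        "AC": w_indirect / total_weight,
        "BC": w_indirect / total_weight,
    }
    scale = sum(coefficients.values())
    contributions = {k: 100.0 * c / scale for k, c in coefficients.items()}
    return network, contributions


# Equal variances, hypothetical effect sizes on an additive (e.g. log) scale.
n_ab, contrib = triangle_ab(d_ab=0.3, d_ac=0.5, d_bc=0.1)
print(round(n_ab, 4), {k: round(v, 1) for k, v in contrib.items()})
```

With unequal variances the same function shows how a more precise direct estimate takes a larger share, which is the behaviour described for more complex networks.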
When the network structure is complex and the variances are not equal, calculating the contribution of each direct estimate to each network estimate in the NMA is more complicated. In general, more weight is given to direct comparisons that are more precise and to those that are more central to the network and thus contribute to more indirect comparisons. Using the netweight command in Stata [10], we calculated the contribution matrix showing the contributions from each direct comparison to the network comparisons. The weight that each direct comparison contributes to the network estimates is a combination of the variance of the direct comparison and the network structure: a comparison with much direct information not only contributes much to the network estimate of that comparison but is also more influential on its neighbouring comparisons than on remote ones, and a comparison for which little direct evidence exists benefits most from the rest of the network. Using netweight,^{Footnote 1} the percentage contribution of each direct comparison to each network estimate is summarised in a matrix with rows representing the network estimates and columns representing the direct comparisons available in the network.
In order to characterise the RoB of each network estimate, we weighted the RoB rating of each direct comparison (low, moderate or high) by the contribution percentage that the direct estimate makes to the network estimate, and summed these contributions within each rating. This calculation provided the percentage of contributions from direct estimates rated at low, moderate or high risk of bias, respectively, to each network estimate.
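As an illustration of this step, the sketch below takes one row of a contribution matrix and the RoB rating of each direct comparison, and returns the RoB profile of that network estimate. The contribution percentages are those of the equal-variance triangle for A vs B described earlier; the RoB ratings are hypothetical, as is the function name.

```python
def rob_profile(contributions, comparison_rob):
    """Percentage of a network estimate coming from each RoB category.

    contributions: dict mapping a direct comparison to its percentage
                   contribution to one network estimate (one matrix row).
    comparison_rob: dict mapping each direct comparison to
                    "low", "moderate" or "high".
    """
    profile = {"low": 0.0, "moderate": 0.0, "high": 0.0}
    for comparison, pct in contributions.items():
        profile[comparison_rob[comparison]] += pct
    return profile


# Row of the contribution matrix for N_AB in the equal-variance triangle.
row_ab = {"AB": 50.0, "AC": 25.0, "BC": 25.0}
# Hypothetical comparison-level ratings.
ratings = {"AB": "low", "AC": "high", "BC": "moderate"}
print(rob_profile(row_ab, ratings))
```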
In order to quantify the contribution of the enrichment design to each network estimate, we multiplied the percentage of enrichment for each direct comparison by the contribution percentage that the direct estimate makes to the network estimate. For a particular network estimate of A vs B, this calculation provided the percentage of contributions from enriched studies favouring A, those favouring B, those disfavouring A (i.e. favouring another drug C over A), those disfavouring B, and those that involve neither A nor B (enrichment of unknown direction). The remainder came from non-enriched studies. We summed the percentages of contributions from studies favouring A and from those disfavouring B to obtain the percentage of enrichment favouring A. In the same manner, the percentage of enrichment favouring B was calculated by summing the percentages of contributions from studies favouring B and from those disfavouring A.
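The bookkeeping of directions can be sketched as follows. The data layout is an assumption made for illustration: each direct comparison is a pair of drug labels, and its enrichment is a dict giving the percentage of participants enriched in favour of each of its drugs. The numbers are hypothetical.

```python
def enrichment_profile(target, contributions, enrichment):
    """Split one network estimate's contributions by direction of enrichment.

    target: (a, b), the two drugs compared by the network estimate.
    contributions: dict {(x, y): pct}, one row of the contribution matrix.
    enrichment: dict {(x, y): {drug: pct_enriched_in_favour_of_drug}}.
    """
    a, b = target
    out = {"favouring A": 0.0, "favouring B": 0.0,
           "unknown direction": 0.0, "not enriched": 0.0}
    for comparison, pct in contributions.items():
        enriched = enrichment.get(comparison, {})
        for favoured, share in enriched.items():
            weight = pct * share / 100.0
            # The other drug in the comparison is the one disfavoured.
            disfavoured = comparison[1] if favoured == comparison[0] else comparison[0]
            if favoured == a or disfavoured == b:
                out["favouring A"] += weight   # favours A, or disfavours B
            elif favoured == b or disfavoured == a:
                out["favouring B"] += weight   # favours B, or disfavours A
            else:
                out["unknown direction"] += weight
        out["not enriched"] += pct * (100.0 - sum(enriched.values())) / 100.0
    return out


# Hypothetical inputs for the network estimate A vs B in a triangle ABC.
row_ab = {("A", "B"): 50.0, ("A", "C"): 25.0, ("B", "C"): 25.0}
enriched = {("A", "B"): {"A": 40.0}, ("B", "C"): {"C": 100.0}}
print(enrichment_profile(("A", "B"), row_ab, enriched))
```

Here the B vs C comparison is enriched in favour of C, which disfavours B, so its whole contribution is counted as enrichment favouring A.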