# A parameter estimation method for fluorescence lifetime data

- Daniel Sewell
^{1}, - Hajin Kim
^{2, 3}, - Taekjip Ha
^{4}and - Ping Ma
^{5}Email author

**Received: **26 November 2013

**Accepted: **19 May 2015

**Published: **9 June 2015

## Abstract

### Background

When modeling single-molecule fluorescence lifetime experimental data, the analysis often involves fitting a biexponential distribution to binned data. When dealing with small sample sizes, there is the potential for convergence failure in numerical optimization, for convergence to local optima resulting in physically unreasonable parameter estimates, and also for overfitting the data.

### Results

To avoid the problems that arise in small sample sizes, we have developed a gamma conversion method to estimate the lifetime components. The key idea is to use a gamma distribution for initial numerical optimization and then convert the gamma parameters to biexponential ones via moment matching. A simulation study is undertaken with 30 unique configurations of parameter values. We also performed the same analysis on data obtained from a fluorescence lifetime experiment using the fluorophore Cy3. In both the simulation study and the real data analysis, fitting the biexponential directly led to a large number of data sets whose estimates were physically unreasonable, while using the gamma conversion yielded estimates consistently close to the true values.

### Conclusions

Our analysis shows that using numerical optimization methods to fit the biexponential distribution directly can lead to failure to converge, convergence to physically unreasonable parameter estimates, and overfitting the data. The proposed gamma conversion method avoids these numerical difficulties, yielding better results.

### Keywords

Single-molecule fluorescence lifetime Numerical optimization Overfitting data Mixture distribution## Findings

### Background

In the single-molecule fluorescence lifetime experiments, a fluorophore is attached to the molecule under study, which is placed in a focal volume illuminated by a pulsed laser. The fluorophore emits photons when excited by the pulsed laser. The time length that it takes for the fluorophore to release the photon from the moment that it is excited is termed as the photon delay time (or fluorescence lifetime). The photon delay time is recorded by time-correlated single photon counting device [1].

Because the dye’s photon emission pattern depends on its photophysical state and molecular environment which are then affected by the conformational or electronic state of the molecule with which it is interacting (e.g., the active and inactive states of an enzyme could have different effect on the dye’s photon emission intensity in certain cases), by examining how the photon emission pattern fluctuates over time, one can investigate the underlying dynamics of the molecules. It is thus of interest to study the photon delay time.

This time lapse (or fluorescence lifetime) data are often binned to form count data. The decay curve describing the stochasticity of the continuous time lapse is then indirectly estimated from the count data. This leads to a two-level hierarchical model, where the first level models the binned counts and the second level models the continuous time lapse. That is, the stochasticity of the count data is determined by certain bin probabilities (the first level), and these probabilities are in turn modeled by the decay curve corresponding to the time lapse (the second level).

More specifically, conditioning on the total number of photons counted, the bin counts follow a multinomial distribution. The probability that a photon is counted during a given time interval (bin) is determined by the cumulative distribution function (cdf) of the time lapse. A mixture of exponential probability density functions (pdf) is most widely used to model the decay curve of the fluorescence lifetime [2–4]. The specific context considered here is that the data follow a two-component mixture of exponentials (biexponential distribution). Furthermore, we assume that, by carefully controlled experimental conditions, the major lifetime component is known (though as we will see later, this restriction is not necessary to our method of parameter estimation) and we aim to estimate the second component.

Parameter estimation in this context can often be difficult, unreliable and biased. Novikov et al. [5] showed that the parameter estimation for biexponential decays is more critical and depends on the detection procedure, leaving substantial obscurity on the estimation. Early work to address this was done by Sasaki and Masuhara [6], who used a convolved autoregressive model that can be fitted using the least squares (LS) method. This approach was made more efficient by Enderlein and Erdmann [7]. However, employing LS leads to unnecessary bias [8].

A better statistical approach would be to try to find the maximum likelihood estimators (MLEs) of the biexponential distribution involved in this hierarchical model. Indeed, using LS is equivalent to finding the MLE while assuming the bin counts follow a normal pdf; however, this assumption of normality is clearly not the case, as small bin counts and sparsity of the data make the normal model an inadequate approximation of the distribution of the bin counts [9]. The fact that finding the MLE is more appropriate than using LS has been reviewed by Maus et al. [10], Edel et al. [11], and Laurence and Chromy [9].

When dealing with mixture models such as a biexponential pdf, the expectation–maximization (EM) algorithm has been widely used for finding MLE’s [12]. In this hierarchical setting, however, the EM algorithm may be both difficult to implement and slow to converge, and hence other numerical optimization methods may be employed. With a small sample size and small bin width, there will inevitably be a zero count in many of the bins [8], and such sparsity of the data may cause these numerical optimization techniques to be unstable and error-prone in finding the MLE for a mixture distribution. The commonly used direct search Nelder–Mead algorithm [13] was found to perform poorly with such a two level hierarchical model (see McKinnon [14] for more details on situations in which the Nelder–Mead algorithm fails). Enderlein et al. [15] used an MLE approach to distinguish between distinct states or molecules. [16] used MLE and iterative convolutions to fit the arrival time histograms to single exponential decay. Enderlein and Sauer [17] presented a pattern-matching procedure for identifying single molecules from a mixture of molecules, although the algorithm presented works best only if the lifetimes are already known. This is not applicable to the cases where we cannot experimentally separate the two distinct states of a complex. complex always exhibits mixed states because we cannot predetermine the lifetimes of the respective states. Edel et al. [11] developed a modified MLE method to compensate localized background fluorescence and instrument response function (IRF). However, this method focuses on fitting only the monoexponential decay curve.

Moreover, there are some non-MLE based parameter estimation methods in the literature. For example, Digman et al. [18] developed a phasor plot method and required labor-intensive visual inspection. Kim et al. [19] developed a promptness ratio method for estimating the lifetime.

This paper focuses on two issues, numerical stability and overfitting small data sets. Overfitting the data in this context can be described as yielding a model which gives very high probability to data similar to the observed data yet describing the true underlying generative process poorly. When fitting a mixture of exponential decay curves with binned data, the numerical optimization algorithm for finding MLE may not converge. Even if it converges, in practice the numerical optimization algorithm may converge to a value that is physically unreasonable. In addition, we show that the MLE’s for the mixture of exponential distributions can often overfit the data, hence giving estimates that appear satisfactory but fail to accurately represent the true parameter values.

To address these issues, we propose a novel method of estimating the parameters of the biexponential distribution using binned count data. The object is to find a generalization of the mono-exponential distribution whose pdf is flexible enough to well approximate the shape of a biexponential density curve. With this motivation in mind, we propose a new estimation method which utilizes the gamma distribution family, a family which contains that of monoexponential distributions. We show that our approach can successfully recover the parameters of the underlying biexponential distribution, while avoiding the inherent numerical instabilities involved with a mixture distribution. Our proposed estimation algorithm is robust, and is not likely to overfit the data.

The rest of the paper is organized as follows. We first present the model and the estimation method. We then demonstrate the performance of the proposed method through simulations in which data sets are generated using biexponential pdfs with varying parameters. We finally present results for real data analysis from the fluorophore Cy3, collected via single photon counting technique.

### Methods

*k*th derivative of \(M_{X,\cdot }\) with respect to \(t\). The closed form solutions using the MLEs from fitting the gamma distribution to approximate the parameters of the biexponential distribution are

*a priori*, the system of equations needed to be solved for a \(M\) component exponential mixture model is \(M^{(k)}_{X,\gamma }(0)=M^{(k)}_{X,BE}(0)\) for \(k=1,2,\ldots ,2M-1\).

### Results

### Simulation results

Biexponential data were simulated as follows. In the biexponential distribution, the first lifetime component \(\tau _1\) was fixed at 1,500, and assumed known when fitting the biexponential distribution directly by maximizing the likelihood and when fitting the biexponential distribution indirectly by using the gamma conversion method; the first component weight \(c\) took values in \(\{ 0.60, \, 0.75, \, 0.90\}\); the second lifetime component \(\tau _2\) took values equaling \(k\tau _1\), for \(k\) in {0.500, 0.800, 0.900, 0.950, 0.990, 1.01, 1.05, 1.10, 1.20, 2.00}; the bin width \(\delta\) was set to be 50. We generated 1,000 data sets of 50 photons for each of the 30 configurations. For each data set we estimated the lifetime parameter values by fitting the biexponential directly and also by using our proposed approach. In both cases optimization was performed by using the Nelder–Mead algorithm, setting the maximum number of iterations to be 10,000 and the relative convergence tolerance to be \(1\times 10^{-8}\). In the former case, we attempted to fit the data using 25 different starting values of \(c\) and initializing \(\tau _2\) to be equal to the mean of the bin counts (i.e., \(\sum _j (jy_j)/\sum _{\ell }y_{\ell }\)). In the latter case we initialized \(\alpha =0.5\), and \(\tau _{\gamma }\) was initialized similarly to \(\tau _2\) when fitting the biexponential directly. We note here that to find good solutions from the optimization algorithm it was necessary to use multiple starting points for fitting the biexponential directly whereas this was not necessary with our method; in particular, without using multiple initialization points for fitting the biexponential distribution directly we would often fail to converge or obtain poor estimates. Out of the 30,000 simulated data sets, attempting to fit the biexponential model directly failed to converge in 14,272 instances even while using multiple starting points, as opposed to 2,160 instances when using the proposed gamma method using only one starting point.

In reality, many of these results yielding extremely large estimates simply would not be accepted in practice. Instead, an artificial ceiling may be put on the lifetime estimates. When we do this in our simulation study, using a cap of 100 ns, our results lead to the same conclusions. To give a brief summary of these slightly modified results, we computed the mean square error (MSE), which is the average of \((\widehat{\tau }_2-\tau _2)^2\), for the direct fitting of the biexponential (\(1{,}500\,ns^2\)) and for our proposed gamma conversion method (\(55\,ns^2\)); clearly even with this truncation of extremely high estimates, our proposed method is performing much better.

To evaluate the overfitting problem, we compute the Hellinger Distance and Pearson’s \(\chi ^2\) statistics. For each of the simulated data sets in which both methods converged, these two values were computed by fitting the biexponential distribution directly and also by using the gamma conversion method. Figure 2 gives the two-dimensional histogram of these values, where the plot on the left corresponds to fitting the biexponential directly and the plot on the right corresponds to using the gamma conversion method. We see that fitting the biexponential directly, in a large number of the data sets, yields estimates which fit the data quite well, as evidenced by a small \(\chi ^2\) value, but are far from the truth, as evidenced by a large Hellinger Distance. Using the gamma conversion method eliminates this overfitting problem, as evidenced by the observations that all the Hellinger Distance values are small.

### Fluorophore Cy3 results

Single molecule fluorescence lifetime was measured as follows. We used a confocal microscope setup to minimize the detection volume. A DNA strand labeled with Cy3

The excitation pulses were branched to a photodiode for synchronization. Time delay between the signals from the avalanche and sync photodiodes was measured by the time correlated single photon counter (SPC630, Becker&Hickl GmbH). Figure 3 shows a schematic of that described above. We used 2 nM of fluorophores for detecting fluorescence from diffusing molecules. We set it up such that it gives the APD counting rate smaller than \(10^5\)/s. Considering that the excitation pulse repeats at 80 MHz (i.e. 12.5 ns), this corresponds to detecting less than one photon every 800 pulses on average. The probability of detecting more than two photons (from two different molecules) from a single pulse is less than 1/800. As we used only 50 photons per histogram and also the pulse interval of 12.5 ns much longer than the decay time, there will be practically no photon that is not coming from the latest excitation pulse. Thus we confirm that we are measuring tightly correlated photon emission from excited single molecules. The data from SPC630 were collected until desired number of photons were detected and then plotted as a lifetime histogram with appropriate bin sizes. See [1] for more details.

As with the simulation study, we again applied a ceiling of 100 ns to the extremely high estimates of \(\tau _2\). Again the conclusions were the same for these modified results. The MSE for directly fitting the biexponential was \(725\,ns^2\) and for the gamma conversion method was \(0.89\,ns^2\).

Similar results were obtained for the fluorophore Cy3 data when \(\tau _1\) was assumed to be unknown. See Additional files 1, 2, 3 and 4 for these results.

### Conclusion

In the single-molecule fluorescence lifetime experiments, it is of great interest to study the photon delay time. In particular, we are interested in fitting a mixture of exponential model to the photon count data. However, directly fitting a mixture of exponential model may lead to numerical optimization problems, whether that be failure to converge or convergence to local optima resulting in physically unreasonable values or overfitting. In this paper, we proposed the gamma conversion method, where we first fit a gamma distribution to the data and then, via moment matching, estimate biexponential parameters. In this manner both the numerical instability and the overfitting problems are avoided.

The proposed method was evaluated using Pearson’s \(\chi ^2\) statistic and the Hellinger Distance. As an alternative to Pearson’s \(\chi ^2\) statistic and the Hellinger Distance, we could have compared the MSE, just as we did when we applied the ceiling to the lifetime estimates. Calculating the ratio of MSEs obtained from fitting the biexponentials directly and from our proposed method yielded a value of 5.6e10 for the real data example, and similar ratios were consistently found in all 30 simulation configurations. These observations suggest that the estimates obtained from gamma conversion significantly outperform those obtained from directly fitting biexponenetials.

Although the method was designed to analyze photon counts in single-molecule fluorescence lifetime experiments, the method may be applied to other problems involving fitting mixture of exponential distributions. Most FLIM measurements, however, have rather large number of photons (\(\sim1{,}000\)) for each pixel, and thus do not suffer from the overfitting or numerical instability issues highlighted here when discussing single molecule fluorescence lifetime data.

## Declarations

### Authors' contributions

PM conceived of the study. DS and PM designed algorithm. HK and TH performed experiments. DS, HK, TH, and PM wrote the manuscript. All authors read and approved the final manuscript.

### Acknowledgements

The study was partially supported by NSF Grants DMS-1055815 (DMS-1438957), DMS-1228288 (DMS-1440038) and DMS-1222718 (DMS 1440037). Hajin Kim was supported by the Institute for Basic Science (IBS-R020-D1), 2014 UNIST Research Fund (1.130091.01), and Basic Science Research Program through NRF of Korea (2014R1A1A1003949).

### Compliance with ethical guidelines

**Competing interests** The authors declare that they have no competing interests.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

## Authors’ Affiliations

## References

- Sorokina M, Koh H, Patel SS, Ha T (2009) Fluorescent lifetime trajectories of a single uorophore reveal reaction intermediates during transcription initiation. J Am Chem Soc 131(28):9630–9631PubMed CentralPubMedView ArticleGoogle Scholar
- Kollner M, Wolfrum J (1992) How many photons are necessary for fluorescence-lifetime measurements? Chem Phys Lett 200(1,2):199–204Google Scholar
- Xie XS (2002) Single-molecule approach to dispersed kinetics and dynamic disorder: probing conformational fluctuation and enzymatic dynamics. J Chem Phys 117(24):11024–11032View ArticleGoogle Scholar
- Duncan R, Bergmann A, Cousin M, Apps D, Shipston M (2004) Multi-dimensional time-correlated single photon counting (TCSPC) fluorescence lifetime imaging microscopy (FLIM) to detect fret in cells. J Microsc 215(1):1–12PubMed CentralPubMedView ArticleGoogle Scholar
- Novikov E, Hofkens J, Cotlet M, Maus M, De Schryver FC, Boens N (2001) A new analysis method of single molecule fluorescence using series of photon arrival times: theory and experiment. Spectrochim Acta Part A Mol Biomol Spectrosc 57(11):2109View ArticleGoogle Scholar
- Sasaki K, Masuhara H (1991) Analysis of transient emission curves by a convolved autoregressive model. Appl Opt 30(8):977–980PubMedView ArticleGoogle Scholar
- Enderlein J, Erdmann R (1997) Fast fitting of multi-exponential decay curves. Opt Commun 134(1):371–378View ArticleGoogle Scholar
- Turton D, Reid G, Beddard G (2003) Accurate analysis of fluorescence decays from single molecules in photon counting experiments. Anal Chem 75(16):4182–4187PubMedView ArticleGoogle Scholar
- Laurence TA, Chromy BA (2010) Efficient maximum likelihood estimator fitting of histograms. Nat Methods 7(5):338–339PubMedView ArticleGoogle Scholar
- Maus M, Cotlet M, Hofkens J, Gensch T, De Schryver FC, Schaffer J et al (2001) An experimental comparison of the maximum likelihood estimation and nonlinear least-squares fluorescence lifetime analysis of single molecules. Anal chem 73(9):2078–2086PubMedView ArticleGoogle Scholar
- Edel JB, Eid JS, Meller A (2007) Accurate single molecule fret efficiency determination for surface immobilized dna using maximum likelihood calculated lifetimes. J Phys Chem B 111(11):2986–2990PubMedView ArticleGoogle Scholar
- Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc Ser B 39:1–37. (with discussion) Google Scholar
- Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7(4):308–313View ArticleGoogle Scholar
- McKinnon KIM (1998) Convergence of the Nelder–Mead simplex method to a nonstationary point. SIAM J Optim 9(1):148–158View ArticleGoogle Scholar
- Enderlein J, Goodwin PM, Van Orden A, Patrick Ambrose W, Erdmann R, Keller RA (1997) A maximum likelihood estimator to distinguish single molecules by their fluorescence decays. Chem Phys Lett 270(5):464–470View ArticleGoogle Scholar
- Rothwell P, Berger S, Kensch O, Felekyan S, Antonik M, Wöhrl B et al (2003) Multiparameter single-molecule fluorescence spectroscopy reveals heterogeneity of HIV-1 reverse transcriptase: primer/template complexes. Proc Natl Acad Sci 100(4):1655–1660PubMed CentralPubMedView ArticleGoogle Scholar
- Enderlein J, Sauer M (2001) Optimal algorithm for single-molecule identification with time-correlated single-photon counting. J Phys Chem A 105(1):48–53View ArticleGoogle Scholar
- Digman MA, Caiolfa VR, Zamai M, Gratton E (2008) The phasor approach to fluorescence lifetime imaging analysis. Biophys J 94(2):14–16View ArticleGoogle Scholar
- Kim G-H, Legresley SE, Snyder N, Aubry PD, Antonik M (2011) Single-molecule analysis and lifetime estimates of heterogeneous low-count-rate time-correlated fluorescence data. Appl Spectrosc 65(9):981–990PubMedView ArticleGoogle Scholar
- Le Cam L, Lo Yang G (2000) Asymptotics in statistics: some basic concepts. Springer, New YorkView ArticleGoogle Scholar
- Sanborn ME, Connolly BK, Gurunathan K, Levitus M (2007) Fluorescence properties and photophysics of the sulfoindocyanine Cy3 linked covalently to DNA. J Phys Chem B 111(37):11064–11074PubMedView ArticleGoogle Scholar