How many of the digits in a mean of 12.3456789012 are worth reporting?
BMC Research Notes volume 12, Article number: 148 (2019)
A computer program tells me that a mean value is 12.3456789012, but how many of these digits are significant (the rest being random junk)? Should I report: 12.3?, 12.3456?, or even 10 (if only the first digit is significant)? There are several rules-of-thumb but, surprisingly (given that the problem is so common in science), none seem to be evidence-based.
Here I show how the significance of a digit in a particular decade of a mean depends on the standard error of the mean (SEM). I define an index, DM, that can be plotted in graphs. From these a simple evidence-based rule for the number of significant digits (‘sigdigs’) is distilled: the last sigdig in the mean is in the same decade as the first or second non-zero digit in the SEM. As an example, for mean 34.63 ± SEM 25.62, with n = 17, the reported value should be 35 ± 26. Digits beyond these contain little or no useful information, and should not be reported lest they damage your credibility.
Numerous scientists, perhaps a majority, need to report mean values, yet many have little idea of how many digits carry useful meaning (are significant: ‘sigdigs’) and at what point further digits are mere random junk. Thus a report that the mean of 17 values was 34.63 g with a standard error of the mean (SEM) of 25.62 g raises in a conspicuously permanent way a suspicion that none of the seven authors of the article were fully aware of what they were doing. But the frequency of a transition of a trapped and laser-cooled, lone ion of 88Sr+ was reported convincingly as 444,779,044,095,484.6 Hz, with an SEM of 1.5 Hz. It is a surprise that there seems to be no evidence to support the commonly used rules-of-thumb for this basic need. Here I derive simple evidence-based rules for restricting a mean value (and its SEM) to their sigdigs.
To understand the trends, consider Table 1A which shows the frequency of digits in 6 decades (from the ‘10’s to the ‘0.0001’s) in 8000 random samples from a population of Gaussian (‘normal’) values with mean 39.61500 and SEM 1.33. In the ‘10’s decade the frequency of ‘3’s is a bit more than that of the ‘4’s, reflecting the mean of 39…. The influence of the second digit (‘9’) is thus visible in the frequency of ‘4’s in the ‘10’s decade. The count (in italic) in target digit ‘3’ is also the most frequent (underlined). This decade is clearly significant: one or more digits close to the target dominate the frequencies. The same is true of the ‘1’s decade, though here there is a clear pattern of decline in frequency centred around the target ‘9’. In the ‘0.1’s decade the target digit (‘6’) is only next to the most frequent digit (‘7’), and the pattern around ‘7’ is not conspicuous.
We may measure inequality (non-uniformity) across the digits in a decade with an index, IQ, based on the sum of absolute deviations from the mean in a row/decade, defined by the ‘R’ expression ‘sum (abs (x − xbar))/s’, where x is a vector of the 10 counts for the individual digits, 0–9, xbar is the mean of the ‘x’ values, and ‘s = 2 * (sum (x) − mean (x))’ is a standardisation factor that brings IQ into the range 0–1. In Table 1 the IQ values are multiplied by 1000 as mIQ.
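The IQ definition above can be checked with a few lines of Python. This is a minimal sketch that transcribes the quoted R expression; the function name `iq` and the two test vectors are illustrative assumptions, not part of the original analysis.

```python
def iq(counts):
    """Inequality index IQ for the ten digit counts (digits 0-9) of one decade."""
    if len(counts) != 10:
        raise ValueError("expected one count for each digit 0-9")
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    xbar = total / len(counts)
    # Standardisation factor: the largest possible sum of absolute deviations,
    # reached when every count falls on a single digit.
    s = 2 * (total - xbar)
    return sum(abs(x - xbar) for x in counts) / s


print(iq([800] * 10))        # perfectly uniform counts -> 0.0
print(iq([8000] + [0] * 9))  # every count on one digit  -> 1.0
```

The two extremes show why the factor `s` brings the index into the range 0 to 1: with all 8000 counts on one digit the sum of absolute deviations is 7200 + 9 × 800 = 14400, which equals 2 × (8000 − 800).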
This IQ measure is linear and is a pure number, so values in different decades (rows) can be summed.
In Table 1A there are big reductions in IQ in the first 3 decades; thereafter values differ erratically governed by random frequencies of the digits. This pattern resembles an ice-hockey stick. As you move down the handle (rows/decades in Table 1A) the downward steps in the inequality measure are large. But when you reach the blade, differences in the measures between rows/decades become erratically smaller and larger, with no obvious further predictable change with additional rows/decades. At what decade may we suppose that little or no more useful information is present? This is tantamount to locating the junction between the hockey stick handle and blade. This is not a sharp angle, but a mIQ value of 200 seems, from Table 1, to be suitable. A crude stopping-rule is thus to continue down the decades until mIQ is below 200, i.e. (Table 1A) to the same decade as the first digit in the SEM. This becomes Rule 1 in Rules Box (later).
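The hockey-stick pattern can be reproduced approximately by simulation. The sketch below is an assumption about how such an experiment might be set up, not the author's program: it draws 8000 Gaussian values with mean 39.615 and standard deviation 1.33, tallies the digit found in each decade, and reports mIQ per decade. The seed, the helper names and the truncation to four decimal places are all illustrative choices, and the counts will not match Table 1 digit for digit.

```python
import random

PLACES = 4  # decimal places kept, so the last decade examined is the 0.0001's


def iq(counts):
    """Inequality index for the ten digit counts of one decade (range 0-1)."""
    xbar = sum(counts) / len(counts)
    s = 2 * (sum(counts) - xbar)
    return sum(abs(x - xbar) for x in counts) / s


def miq_by_decade(mean, sd, n_samples=8000, seed=1):
    """Return {power of ten: mIQ} for the 10's decade down to the 0.0001's."""
    rng = random.Random(seed)
    # Hold each sample as an integer count of 0.0001 units so that digits
    # can be extracted with exact integer arithmetic.
    scaled = [int(abs(rng.gauss(mean, sd)) * 10**PLACES) for _ in range(n_samples)]
    result = {}
    for exponent in range(1, -PLACES - 1, -1):
        counts = [0] * 10
        for v in scaled:
            counts[(v // 10 ** (exponent + PLACES)) % 10] += 1
        result[exponent] = round(1000 * iq(counts))
    return result


table_1a = miq_by_decade(39.615, 1.33)
for exponent, miq in table_1a.items():
    marker = "significant" if miq > 200 else "below the 200 threshold"
    print(f"10^{exponent:+d} decade: mIQ = {miq:4d}  ({marker})")
```

With these settings the ‘10’s and ‘1’s decades stay well above 200 and the later decades fall far below it, so the crude stopping rule ends at the decade of the first digit of 1.33.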
This rule uses the SEM to show where to stop: it makes no use whatever of the position of the decimal point. For example, the value 12.345 mm has 5 digits after the first non-‘0’, and 3 decimal places, while the same value in different units is 0.012345 m which also has 5 digits after the first non-‘0’ (i.e. ignoring preceding zeros) but 6, not 3, decimal places. Rules-of-thumb that specify a number of decimal places miss the point (literally as well as metaphorically) that precision is measured by SEM (and n).
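The unit-change example can be checked mechanically. The following sketch uses Python's standard `decimal` module to count digits from the first non-zero digit and decimal places separately; the helper name `digits_and_places` is a hypothetical one chosen for illustration.

```python
from decimal import Decimal


def digits_and_places(text):
    """Return (digits from the first non-zero digit, decimal places) of a numeral."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    # Decimal stores no leading zeros, so len(digits) ignores them;
    # a negative exponent is the number of decimal places.
    return len(digits), max(0, -exponent)


print(digits_and_places("12.345"))    # value in mm -> (5, 3)
print(digits_and_places("0.012345"))  # same value in m -> (5, 6)
```

The digit count is the same in both units while the number of decimal places changes, which is why a rule stated in decimal places cannot express precision.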
Table 1B shows similar results for the same mean as in Table 1A, 39.61500, but SEM 100 times smaller. The same features are visible, and the same crude stopping-rule emerges. The ‘10’s and ‘1’s decades show only a single (the target) digit; not until the ‘0.1’s do the frequencies begin to spread out.
The IQ calculation takes no notice of the order of the frequencies within a decade. Murray Hannah (personal communication) points out that at least one more decade may contain some residual conditional information. For example, in Table 1A, the ‘0.1’s decade contains the ‘run’ of increasing or decreasing values 751, 827, 851, 813, 776, draped over the most frequent value: a faint echo of the strong patterns in earlier decades. But in Table 1B at the ‘0.001’s decade (the first with mIQ < 200) there is no sign at all of a sequence. It seems that we need to add somewhere between 0 and 1 digits to the sigdig identified by the basic stopping rule (though this would require a fractional decade). At worst, the crude rule becomes: stop at the same decade as the second digit in the SEM.
A continuous index and trends for sigdigs
In Table 1A counts in the ‘0.1’s decade show little regularity, but if we were to decrease the SEM gradually (details not shown) the totals for each digit in a decade become more and more unequal as frequency peaks emerge and grow from the hummocky sinking plain and, consequently, indicate that we may soon be able to justify another sigdig. The examples in Table 1 are indicative, but to understand the trends and to distil general rules, we need a sigdig index, DM, for the mean that is continuous, and which can be plotted on a graph. For this purpose, because IQ is linear, we can simply add the IQ values for each decade (row) until we stop at the last decade with mIQ more than 200 (IQ more than 0.2). This value, DM = ΣIQ, is then a plottable measure of sigdigs (Figs. 1 and 2).
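The summation that defines DM is short enough to sketch directly. The function below is an illustrative reading of the text: it adds IQ values, decade by decade from the leading decade, and stops at the first decade whose IQ is not above 0.2. The input values in the example are invented for illustration and are not taken from Table 1.

```python
def dm(iq_by_decade, threshold=0.2):
    """Sum IQ over successive decades, stopping before the first one at or below threshold."""
    total = 0.0
    for value in iq_by_decade:
        if value <= threshold:
            break  # this decade and all later ones are treated as random digits
        total += value
    return total


# Hypothetical IQ values for four successive decades (leading decade first).
example = [0.89, 0.50, 0.02, 0.01]
print(dm(example))
```

Because every IQ value lies between 0 and 1, DM can never exceed the number of decades that pass the threshold, so it behaves as a fractional count of sigdigs.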
In Fig. 1, the large circles are for a stopping rule at 200 mIQ. Putting the stopping rule at 100 mIQ (not shown) makes little difference.
Sigdigs in the SEM, DSEM (Fig. 2), are obtained in the same way as DM.
The points in Fig. 1 show how DM depends experimentally on C, the quotient of mean/SEM in experiments similar to those outlined in Table 1. The sloping line, DM = log10 C, is close to the circles, but is not fitted to them. If we take the ceiling of these values—equivalent to truncating and adding 1—to get an integer value we get the broken line in Fig. 1, superimposed on which is the direct integer sigdig (triangles). The overshoot into the random digits region is from 0 to 1 sigdig.
The possibility of Murray Hannah’s contingent information may be accommodated by adding one extra decade to the dashed steps (Fig. 1). It may be accommodated in another way: shift the steps about half a decade left using log10 (3) ≈ 0.5 (continuous line steps in Fig. 1). The overshoot is more uniform at 0.5–1.5 digits, and this accommodates most if not all contingent information.
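The three step functions discussed here can be compared numerically. The sketch below is one reading of the text, with hypothetical function names: the basic steps truncate log10 C and add 1, the second variant adds a whole extra decade, and the third shifts the steps left by log10 (3), which is the same as applying the basic steps to 3C. It assumes C ≥ 1; for C < 1 the basic step gives zero or fewer sigdigs.

```python
import math


def sigdigs_basic(c):
    """Truncate log10(C) and add 1 (the broken-line steps)."""
    return math.floor(math.log10(c)) + 1


def sigdigs_extra_decade(c):
    """Basic steps plus one whole decade for contingent information."""
    return sigdigs_basic(c) + 1


def sigdigs_shifted(c):
    """Basic steps shifted left by log10(3), about half a decade."""
    return math.floor(math.log10(3 * c)) + 1


print(" C      log10C  basic  extra  shifted")
for c in (2, 5, 29.8, 40, 300):
    print(f"{c:6.1f}  {math.log10(c):6.3f}  {sigdigs_basic(c):5d}"
          f"  {sigdigs_extra_decade(c):5d}  {sigdigs_shifted(c):7d}")
```

The overshoot (steps minus log10 C) lies between 0 and 1 for the basic steps, between 1 and 2 with the extra decade, and between about 0.5 and 1.5 for the shifted steps, which matches the ranges quoted in the text.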
Rule 2 for DSEM is simpler but its origin is more complicated. Figure 2 shows, for a fixed mean and standard deviation (SD), how DSEM depends, in experiments similar to those in Table 1, on the number of items, NS, in the calculation of an SEM. Points for two such experiments, with the same mean and different SDs are shown. Over a range of 100 the value of DSEM rises with a slope ≈ 1 on the log-linear scales shown: DSEM ≈ log10 (NS) + c, but eventually it falls over a cliff creating a sawtooth pattern. The cliff effect is at first very confusing. We know that the precision of the SD estimate must increase monotonically with increasing sample size. So too must the precision of the SEM. The reason for the cliffs is that, since SEM = SD/√NS, it also decreases in magnitude. With every 100-fold increase in NS the SEM loses a leading significant decade, as a ‘1’ in the leading decade shrinks to a ‘9’ in the next decade. So while the precision increases, the number of significant digits decreases by one.
The overall slope of this saw-toothed progression (≈ 0.5) is half that of the teeth themselves, reflecting the fact that the SEM depends on √NS. The exact position of the sawtooth depends on the numerical value of the SEM, and to accommodate this the bounding line DSEM = log10 (NS)/2 + 1 is shown. The steps show Rule 2 in Rules Box. The offset for NS ≤ 6 accommodates the fact that at small NS the bounding line curves downwards, though this is not shown in detail in Fig. 2. Reports of percentages have additional problems. The Rules Box below lists all these rules. Cole considers the special case of risk (and other) ratios (strictly quotients).
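The cause of the cliffs, and the bounding line, can be illustrated with a little arithmetic. In this sketch the SD of 25 and the sample sizes are hypothetical values chosen so that the SEM comes out exactly; the function names are also illustrative. It shows the leading decade of SEM = SD/√NS dropping by one for each 100-fold increase in NS, alongside the bounding value log10 (NS)/2 + 1.

```python
import math


def sem(sd, ns):
    """Standard error of the mean for standard deviation sd and ns items."""
    return sd / math.sqrt(ns)


def leading_decade(x):
    """Power of ten of the first non-zero digit of x (x > 0)."""
    return math.floor(math.log10(x))


def dsem_bound(ns):
    """Bounding line for DSEM quoted in the text: log10(NS)/2 + 1."""
    return math.log10(ns) / 2 + 1


SD = 25.0  # hypothetical standard deviation
for ns in (4, 400, 40000):
    value = sem(SD, ns)
    print(f"NS = {ns:6d}  SEM = {value:7.3f}  leading decade = 10^{leading_decade(value):+d}"
          f"  bound = {dsem_bound(ns):.2f}")
```

Each 100-fold step in NS moves the first digit of the SEM down one decade (12.5, 1.25, 0.125) while the bound rises by exactly 1, so the long-run slope per decade of NS is one half.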
Special cases of zeros
Suppose a raw mean of 0.0298699 has DM = 3 sigdigs under Rule 1A. The reported value should be 0.0300. The first two ‘0’s locate the decade of the first sigdig; the final two ‘0’s are significant, and their presence is sufficient to show that. They should not be omitted.
But suppose that the raw mean is 298,699 with 3 sigdigs again, then the reported value should be 300,000. The first two ‘0’s are sigdigs, but the next three serve only to show where the decimal point is. One way (there are others) to indicate such packing digits is to set them in italics (the last three ‘0’s of 300,000), or to express the value in exponent form: 3.00e5.
Finally, apply these rules to the example in the Introduction: mean = 34.63, SEM = 25.62, n = 17. This justifies SEM = 26, mean = 30 (Rule 1A; the ‘0’ is just a packing digit and its numerical value is not significant) or 35 (Rule 1B).
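The decade rule for the mean is easy to apply in code. The sketch below is an illustrative helper, not the author's procedure: it finds the decade of the first non-zero digit of the SEM and rounds the mean to that decade, or to the next decade down when the second SEM digit is used. Rounding of the SEM itself depends on n and is not covered here.

```python
import math


def round_mean_to_sem(mean, sem, sem_digit=1):
    """Round mean so its last digit lies in the decade of the first (sem_digit=1)
    or second (sem_digit=2) non-zero digit of sem."""
    if sem <= 0:
        raise ValueError("sem must be positive")
    if sem_digit not in (1, 2):
        raise ValueError("sem_digit must be 1 or 2")
    first_decade = math.floor(math.log10(sem))  # e.g. 1 for 25.62
    last_decade = first_decade - (sem_digit - 1)
    return round(mean, -last_decade)


print(round_mean_to_sem(34.63, 25.62, sem_digit=1))  # 30.0
print(round_mean_to_sem(34.63, 25.62, sem_digit=2))  # 35.0
```

Using the second SEM digit reproduces the 35 quoted with the ± 26 in the opening summary. The function returns a float, so trailing significant zeros and packing zeros still have to be handled when the value is formatted for print.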
This analysis deals with precision alone. Bias (and sometimes mistakes) may often have a bigger effect on a mean than does precision.
Margolis HS, Barwood GP, Huang G, Klein HA, Lea SN, Szymaniec K, Gill P. Hertz-level measurement of the optical clock frequency in a single 88Sr+ ion. Science. 2004;306:1355–8.
Cole TJ. Settling number of decimal places for reporting risk ratios. BMJ. 2015;350:h1845.
RSC is the sole author. The author read and approved the final manuscript.
I thank Murray Hannah for pointing out possible contingent information beyond the DM limit. I salute those whose ignorance of when to stop goaded me to start this work.
RSC declares that he has no competing interests.
Availability of data
All in the article.