For 50 years, at least since Currie's seminal paper (Currie, 1968), metrologists have used Null Hypothesis Significance Testing (NHST) to characterize their measurements for their clients. Significance thresholds have been devised to decide whether a result is significant or not. Non-significant results are usually reported as below the detection limit (< LD), with the commendable exception of radioactivity measurements in the USA. In the environmental literature, such values are often called “nondetects”. These censored data require a substantial amount of work to analyse (Helsel, 2011). Bayesian methodologies (ISO 11929) have been used to determine characteristic limits, but they produce no discernible difference in the outputs and still censor non-significant results. Several authors have shown that this approach is not entirely satisfactory and that many problems remain.
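To make the thresholding concrete, here is a minimal sketch of Currie-type characteristic limits for a counting measurement. It assumes a paired blank counted for the same time as the sample, Poisson statistics in the Gaussian approximation, α = β = 0.05, and the widely used form L_D = k² + 2·L_C. The function names `currie_limits` and `report_censored` are hypothetical, not taken from any standard or library.

```python
import math

K = 1.645  # one-sided standard normal quantile for alpha = beta = 0.05 (assumed)


def currie_limits(blank_counts: float, k: float = K) -> tuple[float, float]:
    """Critical level L_C and detection limit L_D, in net counts.

    Assumes a paired blank counted for the same time as the sample and
    Poisson statistics in the Gaussian approximation, with alpha = beta.
    The observed blank count is used as the estimate of the blank mean.
    """
    sigma0 = math.sqrt(2.0 * blank_counts)  # std. dev. of the net count when the true net signal is zero
    l_c = k * sigma0                        # decision threshold: above it, declare "detected"
    l_d = k * k + 2.0 * l_c                 # true net signal detected with probability 1 - beta
    return l_c, l_d


def report_censored(gross: int, blank: int) -> str:
    """NHST-style reporting: the value is kept only if it exceeds L_C."""
    net = gross - blank
    l_c, l_d = currie_limits(blank)
    if net > l_c:
        return f"{net:.1f}"
    return f"< {l_d:.1f}"  # the measured net value is discarded


if __name__ == "__main__":
    print(report_censored(110, 100))  # prints "< 49.2"
    print(report_censored(150, 100))  # prints "50.0"
```

With a blank of 100 counts, a net result of 10 counts falls below L_C ≈ 23.3 and is replaced by "< 49.2"; the information carried by the measured value is lost.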
With the reproducibility crisis, voices are calling for statistical significance to be “ditched” in science (Nature, 20 March 2019). The authors in this special issue do not call for significance values themselves to be abandoned as a statistical tool; rather, they want an end to their use as an arbitrary threshold of significance, which is exactly how they are used in metrology. Is it time to reconsider the use of characteristic limits (based on NHST and significance thresholding) in metrology? We will try to give a tentative answer using the example of radioactivity measurements, which are inherently heteroscedastic: the Poisson variance of a count grows with its expected value, so the uncertainty changes with the signal level.
The first question is whether NHST delivers on its promises. While characteristic limits (decision thresholds and detection limits) seem to perform in a statistically satisfactory way for homoscedastic systems, this is not the case for heteroscedastic systems, as simulations have demonstrated (Strom & MacLellan, 2001). Following Bolstad (2007), we show that by using interval estimation for radioactivity measurements and treating intervals containing zero as non-significant, it is possible to design characteristic limits that perform well in simulation.
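The interval-based decision can be sketched as follows. This is an illustration under stated assumptions, not the authors' exact procedure: gross and blank counts are taken over equal counting times, the net count has Poisson standard uncertainty u = √(gross + blank), and the coverage factor is 1.645; all function names are hypothetical. Because u is computed from the observed counts, it changes with the signal level, which is the heteroscedastic situation described in the text. The Monte Carlo loop estimates how often an interval lies entirely above zero when the true net signal is zero.

```python
import math
import random

Z = 1.645  # coverage factor (assumption): 90 % two-sided, 5 % per side


def net_interval(gross: int, blank: int, z: float = Z) -> tuple[float, float, float]:
    """Net counts with a symmetric interval, for equal gross and blank counting times."""
    net = gross - blank
    u = math.sqrt(gross + blank)  # Poisson standard uncertainty of a difference of counts
    return net, net - z * u, net + z * u


def contains_zero(gross: int, blank: int) -> bool:
    """Non-significant result: the interval contains zero. The value is still reported."""
    _, low, high = net_interval(gross, blank)
    return low <= 0.0 <= high


def poisson(rng: random.Random, mean: float) -> int:
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-mean)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def false_detection_rate(blank_mean: float, trials: int = 5_000, seed: int = 1) -> float:
    """Fraction of intervals lying entirely above zero when the true net signal is zero."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        gross = poisson(rng, blank_mean)  # the sample contains no net activity
        blank = poisson(rng, blank_mean)
        _, low, _ = net_interval(gross, blank)
        if low > 0.0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for level in (2.0, 20.0, 100.0):
        print(f"blank mean {level:6.1f}: false detection rate {false_detection_rate(level):.3f}")
```

Running the loop at several blank levels is a quick way to check whether the false detection rate stays near its nominal value as the variance changes with the signal.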
The second question is whether non-censored measurement reports, given as interval estimates, are adequate for further processing. Non-censored results carry more information, make combining data much easier, and avoid the time-consuming, labour-intensive processing needed to analyse sets of censored measurements. This does not preclude attaching a “non-significant” qualifier to the reported results.
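A minimal sketch of why non-censored results are easier to process: reported (value, standard uncertainty) pairs can be combined directly by an inverse-variance weighted mean. The numbers below are hypothetical activities chosen for illustration, not data from the study, and the coverage factor of 1.645 and the function names are assumptions.

```python
import math

Z = 1.645  # coverage factor (assumption)


def combine(results: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance weighted mean of (value, standard uncertainty) pairs."""
    weights = [1.0 / (u * u) for _, u in results]
    total = sum(weights)
    mean = sum(w * x for w, (x, _) in zip(weights, results)) / total
    return mean, 1.0 / math.sqrt(total)


def interval(value: float, u: float, z: float = Z) -> tuple[float, float]:
    """Symmetric interval value +/- z*u."""
    return value - z * u, value + z * u


# Hypothetical activities (value, standard uncertainty), e.g. in Bq.
results = [(0.8, 0.6), (1.1, 0.7), (0.5, 0.6), (0.7, 0.5)]

if __name__ == "__main__":
    for x, u in results:
        low, high = interval(x, u)
        print(f"{x:.2f} +/- {u:.2f} -> [{low:.2f}, {high:.2f}]")  # each interval contains zero
    mean, u_mean = combine(results)
    low, high = interval(mean, u_mean)
    print(f"combined {mean:.3f} +/- {u_mean:.3f} -> [{low:.3f}, {high:.3f}]")
```

Each hypothetical result is non-significant on its own, yet the weighted mean (about 0.746 ± 0.294) has an interval lying entirely above zero. Had the same results been reported as "< LD", combining them would require substitution or the censored-data methods described by Helsel (2011).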
Finally, we ask whether this methodology is compatible with Bayesian methods. For radioactivity measurements, it can be shown that frequentist and Bayesian methods give compatible and adequate results.
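One way to see this compatibility is a minimal sketch assuming a Gaussian likelihood for the measured net value y with standard uncertainty u and a flat prior restricted to non-negative true values, the setting behind the best estimate in ISO 11929. The posterior is then a normal distribution truncated at zero, with a closed-form mean and standard deviation. The function names are hypothetical.

```python
import math


def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    # erfc keeps precision in the lower tail
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def posterior_truncated_normal(y: float, u: float) -> tuple[float, float]:
    """Posterior mean and standard deviation of the true value mu >= 0,
    given y ~ N(mu, u^2) and a flat prior on mu >= 0."""
    ratio = std_normal_pdf(y / u) / std_normal_cdf(y / u)
    mean = y + u * ratio
    sd = math.sqrt(u * u - (mean - y) * mean)
    return mean, sd


if __name__ == "__main__":
    for y in (5.0, 2.0, 0.0, -1.0):
        mean, sd = posterior_truncated_normal(y, 1.0)
        print(f"y = {y:+.1f}, u = 1.0 -> posterior mean {mean:.3f}, sd {sd:.3f}")
```

Far from zero (y = 5u) the posterior mean and standard deviation coincide with y and u, so the frequentist interval and the Bayesian credible interval agree; near or below zero, the Bayesian estimate is pulled onto the physically allowed non-negative range.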
We conclude that, at least for radioactivity measurements, interval estimation leads to characteristic limits with good statistical performance and to reported results that are easy to process and avoid the obstacles posed by censored data.