Errors on Small Numbers
This page grew out of a discussion on the Statistics Hypernews.
The issue is the treatment of errors on small numbers of events
(typically fewer than 10). For small numbers the standard ±√N error
approximation breaks down, because the Poisson distribution is no longer well
approximated by a Gaussian.
Physicists displaying such errors in histograms want a method that is 'correct' or,
since there is no single 'correct' technique, one that will be acceptable
to their Review Committee.
These errors are asymmetric, with the negative 'error' smaller than the positive one.
This is not because of the shape of the Poisson distribution as such, but because the
variance of the Poisson distribution increases with its mean: a true mean of 10.0 is more
likely to fluctuate down to 7 events than a true mean of 4.0 is to fluctuate up to 7.
To take an even more extreme example, a true mean of 10.0 can plausibly give 5 events,
whereas a true mean of 0.0 cannot fluctuate up to 5.
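The tail probabilities behind these two examples are easy to check directly. The short sketch below uses only the Python standard library; the helper names `poisson_pmf`, `prob_at_most` and `prob_at_least` are our own, chosen for illustration.

```python
import math

def poisson_pmf(k, mu):
    """Probability of exactly k events when the true mean is mu."""
    if mu == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def prob_at_most(n, mu):
    """P(N <= n) for a Poisson mean mu (0 when n < 0)."""
    return sum(poisson_pmf(k, mu) for k in range(n + 1))

def prob_at_least(n, mu):
    """P(N >= n) for a Poisson mean mu."""
    return 1.0 - prob_at_most(n - 1, mu)

# Downward fluctuation of a larger mean vs upward fluctuation of a smaller one.
p_down = prob_at_most(7, 10.0)   # true mean 10.0 gives 7 or fewer events
p_up = prob_at_least(7, 4.0)     # true mean 4.0 gives 7 or more events
print(f"P(N <= 7 | mu = 10.0) = {p_down:.3f}")
print(f"P(N >= 7 | mu = 4.0)  = {p_up:.3f}")
print(f"P(N <= 5 | mu = 10.0) = {prob_at_most(5, 10.0):.3f}")
print(f"P(N >= 5 | mu = 0.0)  = {prob_at_least(5, 0.0):.3f}")
```

This prints about 0.220 against 0.111 for the first comparison, and 0.067 against exactly 0 for the extreme case.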
The SWG convenors have steadfastly declined to lay down a compulsory BaBar procedure,
on the grounds that
there are various acceptable techniques and the best method will differ between
analyses. However, they do endorse several methods, and these should be acceptable
to your Review Committee without quibble, provided you obey Rule 1.
Anyone who has some time to do a systematic study of how these different
methods compare in practice would be doing us all a useful service.
Rule 1: Always state what procedure you are using.
The SWG report technique
The Report on Recommended Statistical Practices describes a procedure in
Chapter 13 (page 95), with a set of numerical values in Table 13.1 that can be used.
These values are obtained by requiring that the integral of the likelihood function
across the error interval be 68%. This is a Bayesian method, and it assumes
that the prior probability is the same for every true (non-negative) value,
i.e. a uniform prior.
The report does not actually state this (in violation of Rule 1), but
as this is purely for display purposes it is probably harmless.
However, it does give negative errors that are larger than the positive errors,
which is the wrong way round.
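A minimal sketch of the flat-prior construction, for readers who want to compute their own values. With a uniform prior on μ ≥ 0 the normalised Poisson likelihood for N observed events is μ^N e^(-μ) / N!, and its integral from x to infinity equals P(Poisson(x) ≤ N), so no special functions are needed. The report's rule for choosing *which* 68% interval is not given here, so we assume the shortest one (equal likelihood at both ends) and take 68% as 0.6827; this is not guaranteed to reproduce Table 13.1. All function names are hypothetical.

```python
import math

def log_like(mu, n):
    """Poisson ln L(mu) for n observed events, without the constant -ln(n!)."""
    if mu <= 0.0:
        return 0.0 if n == 0 else -math.inf
    return n * math.log(mu) - mu

def area_above(x, n):
    """Integral of mu^n e^-mu / n! from x to infinity, i.e. P(Poisson(x) <= n)."""
    term = math.exp(-x)
    total = term
    for k in range(1, n + 1):
        term *= x / k
        total += term
    return total

def endpoints(n, drop, iters=100):
    """Points either side of the peak (mu = n) where ln L is `drop` below its maximum."""
    target = log_like(n, n) - drop
    a = 0.0  # for n == 0 the peak is at mu = 0, so the interval starts there
    if n > 0:
        lo, hi = 0.0, float(n)  # ln L rises on [0, n]
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if log_like(mid, n) < target:
                lo = mid
            else:
                hi = mid
        a = 0.5 * (lo + hi)
    lo, hi = float(n), float(n) + 1.0  # ln L falls on [n, inf)
    while log_like(hi, n) > target:
        hi = n + 2.0 * (hi - n)  # widen until the crossing is bracketed
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_like(mid, n) > target:
            lo = mid
        else:
            hi = mid
    return a, 0.5 * (lo + hi)

def flat_prior_interval(n, cl=0.6827, iters=100):
    """Shortest interval [a, b] containing a fraction `cl` of the normalised likelihood."""
    def content(drop):
        a, b = endpoints(n, drop)
        return area_above(a, n) - area_above(b, n)
    lo, hi = 0.0, 1.0
    while content(hi) < cl:
        hi *= 2.0
    for _ in range(iters):  # content grows as the likelihood level is lowered
        mid = 0.5 * (lo + hi)
        if content(mid) < cl:
            lo = mid
        else:
            hi = mid
    return endpoints(n, 0.5 * (lo + hi))

for n in range(5):
    a, b = flat_prior_interval(n)
    print(f"N = {n}: -{n - a:.3f} +{b - n:.3f}")
```

Choosing the shortest interval is one common convention; a central interval (equal posterior probability in each tail) is another, and the two differ noticeably at small N.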
RooFit
RooFit uses Neyman's confidence-belt method to construct a 68% central interval
with the correct frequentist properties.
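For a single Poisson count, inverting a central Neyman belt reduces to two tail conditions: the upper limit is the mean for which N or fewer events has probability α, and the lower limit is the mean for which N or more events has probability α, with α = (1 - CL)/2 (this is often called the Garwood interval). The sketch below is not RooFit's code; the function names are hypothetical and 68% is taken as 0.6827.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu (0 when n < 0)."""
    if n < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total

def bisect_decreasing(f, target, lo, hi, iters=200):
    """Find x in [lo, hi] with f(x) = target, for f decreasing on that range."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def central_interval(n, cl=0.6827):
    """Central frequentist interval for a Poisson mean, given n observed events."""
    alpha = 0.5 * (1.0 - cl)  # probability left in each tail
    # Upper limit: P(N <= n | mu) = alpha.
    hi = n + 1.0
    while poisson_cdf(n, hi) > alpha:
        hi *= 2.0
    upper = bisect_decreasing(lambda mu: poisson_cdf(n, mu), alpha, 0.0, hi)
    # Lower limit: P(N >= n | mu) = alpha, i.e. P(N <= n-1 | mu) = 1 - alpha.
    lower = 0.0  # for n == 0 the lower limit is pinned at zero
    if n > 0:
        lower = bisect_decreasing(lambda mu: poisson_cdf(n - 1, mu),
                                  1.0 - alpha, 0.0, float(n))
    return lower, upper

for n in range(5):
    lo, up = central_interval(n)
    print(f"N = {n}: [{lo:.3f}, {up:.3f}]")
```

Because the count is discrete, such intervals over-cover slightly for most true means; that conservatism is the price of guaranteeing at least the stated coverage.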
Delta ln L = -1/2
You can approximate the full confidence belt by finding the points at which the
log likelihood, ln L(mu) = N ln mu - mu + constant, falls by 1/2 from its peak
value (which is at mu = N).
For a table and a discussion of the accuracy of this approximation, see
"A note on estimating errors from the likelihood function",
arXiv:physics/0403046, Nuclear Instruments and Methods in Physics Research A550,
pages 392-396.
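To illustrate the Delta ln L = -1/2 recipe, here is a short sketch that finds the two points where the Poisson log likelihood N ln mu - mu has dropped by 1/2 from its value at mu = N. For N = 0 the peak sits at the physical boundary mu = 0, so only an upper point exists (at mu = 0.5). The function names are hypothetical.

```python
import math

def log_like(mu, n):
    """Poisson ln L(mu) for n observed events, dropping the constant -ln(n!)."""
    if mu <= 0.0:
        return 0.0 if n == 0 else -math.inf
    return n * math.log(mu) - mu

def delta_lnl_interval(n, delta=0.5, iters=200):
    """Points where ln L has fallen by `delta` from its peak at mu = n."""
    target = log_like(n, n) - delta
    lower = 0.0  # for n == 0 the peak is at mu = 0 and there is no lower crossing
    if n > 0:
        lo, hi = 0.0, float(n)  # ln L increases on [0, n]
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if log_like(mid, n) < target:
                lo = mid
            else:
                hi = mid
        lower = 0.5 * (lo + hi)
    lo, hi = float(n), float(n) + 1.0  # ln L decreases on [n, inf)
    while log_like(hi, n) > target:
        hi = n + 2.0 * (hi - n)  # widen until the crossing is bracketed
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if log_like(mid, n) > target:
            lo = mid
        else:
            hi = mid
    return lower, 0.5 * (lo + hi)

for n in range(5):
    lo, up = delta_lnl_interval(n)
    print(f"N = {n}: -{n - lo:.3f} +{up - n:.3f}")
```

For N = 1 this gives roughly -0.70 and +1.36: the negative error is the smaller one, as expected from the discussion at the top of this page.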
SWG convenors: Roger Barlow and Ilya Narsky
Last modified: Fri Dec 28, 2007 Jan 11 14:31:39 PST 2006