Design Your Experiments: Noise characteristics and measurement



Design Your Experiments, Part III:
Noise Characteristics and Measurement

by Kevin Kilty

As I advised at the end of the previous installment of this series, even though noise is a constant impediment to our search for truth, there is no reason to despair over it. We can handle noisy experiments if we learn to think statistically. In this installment I wish to begin doing this by examining the following topics:

Characteristics of noise in general.
The statistical description of noise.
The effect of noise on experimental design.

What to expect of noise

As a model to explore the nature of noise I'll propose a simple one: Y_o = Y + n, where Y_o is the value I obtain from my measurement or experiment, Y is the true value (which I don't know, of course), and n is some additional value that has corrupted my measurement. You can call it error or noise. What should I expect of the noise values? This is straightforward to answer. I expect that as long as all conditions surrounding my measurement stay constant, there is some probability density, call it N, that describes the noise. A probability density is a function that assigns a probability to each possible range of noise values. I do not need to know the exact density at this time, but I do need to suppose a few of its characteristics.

The mean value of N should be zero. If it is not, then there is a constant difference between a best estimate based on experiment and a true value. This means that there is a systematic bias in the experiment. Experimental design ought to identify possible systematic errors and eliminate them.
Each n ought to be independent and random. If this is not true, then I need some measure of how one is correlated to another.
I should strive to make each individual value of n as small as possible. This makes my experimental result very precise.
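
The difference between zero-mean noise and a systematic bias can be illustrated with simulated measurements. This is only a sketch: it assumes Gaussian noise for convenience (the argument above does not require it), and the true value, bias, noise level, and the function name simulate are all made up for illustration.

```python
import random
import statistics

def simulate(true_value, bias, sigma, n, seed=1):
    """Generate n observations Y_o = Y + noise, with noise ~ Normal(bias, sigma)."""
    rng = random.Random(seed)
    return [true_value + rng.gauss(bias, sigma) for _ in range(n)]

Y = 10.0  # hypothetical true value; unknown in a real experiment
unbiased = simulate(Y, bias=0.0, sigma=0.5, n=1000)
biased = simulate(Y, bias=0.3, sigma=0.5, n=1000)

# Zero-mean noise averages away with replication; a systematic bias does not.
print("mean error, zero-mean noise:", statistics.mean(unbiased) - Y)
print("mean error, biased noise:   ", statistics.mean(biased) - Y)
```

Under these assumptions the first mean error should be close to zero, while the second stays near the 0.3 bias no matter how many replications are averaged.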

What replications of an experiment provide is a series of observed values that cluster around a central value. Statisticians call this central tendency. We assume, from our experiment being well designed and executed, that the central value is our best estimate of the true value of the thing we are looking for. This is the value that we will report in our experiment. I am going to assume throughout this installment that our experiment is calibrated to provide a true value. I'll discuss calibration separately at a later time.

Because each measurement contains an unknown n, my result has limited value unless I also report a measure of n. One question to answer is how to report it. The National Institute of Standards and Technology (NIST) suggests two ways of doing this. The first is to report the square root of the mean variance of replicated measurements as the standard uncertainty of the measurement result. Those of you familiar with statistics will recognize this as the standard error of a sample mean. Take {Y_oi}, i = 1...n, as the set of measured values from replicated experiments (here n counts the replications, not the noise value), and <Y> as the mean measured value. Then...

u = sqrt( Sum over i of (Y_oi - <Y>)² / (n(n-1)) )

is the standard uncertainty of these measurements.

As an example, suppose I measure the speed of light 5 times with some apparatus and get the following values in km/s: 299,034, 300,006, 298,510, 299,435, and 299,987. Then the best value to report for the speed of light, with the uncertainty in this estimate, is 299,394.4 ± 286.3 km/s. With an uncertainty this large, quibbling about whether or not to include the single digit beyond the decimal point is pointless.
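
The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name standard_uncertainty is just an illustrative label.

```python
import math

def standard_uncertainty(values):
    """Return (mean, standard uncertainty of the mean) for replicated measurements."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean, divided by n(n-1).
    sum_sq = sum((v - mean) ** 2 for v in values)
    return mean, math.sqrt(sum_sq / (n * (n - 1)))

speeds_km_s = [299034, 300006, 298510, 299435, 299987]
mean, u = standard_uncertainty(speeds_km_s)
print(f"{mean:.1f} +/- {u:.1f} km/s")  # 299394.4 +/- 286.3 km/s
```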

Calculating uncertainty directly from measurements this way is very useful, but it does obscure the idea that noise, or random error if you will, follows some probability density. Standard error provides a complete description of the noise density only if the noise is Gaussian, i.e., follows a normal probability distribution. This might not always be so, and at some point in a future installment I'll speak at length about probability density and distributions.

Calculating uncertainty from replicated measurements is fine when we have access to them, but what do we do otherwise? NIST suggests a second reporting method in which a standard uncertainty is obtained from the square root of a variance determined by other means. This is very vague, but also very interesting, for it suggests using prior knowledge about measurements, models of noise sources, equipment, and even experimental conditions to calculate and report uncertainty.

Once again let me provide an example. A photomultiplier counts photons. If I use one to scan across a spectral peak, each count it makes has some associated uncertainty. I could figure this uncertainty by making numerous photon counts at one setting of my spectrometer and calculating the standard uncertainty. However, counting photons is a Poisson process, the variance of which equals the expected count, so at low count rates I could spend forever obtaining data to calculate uncertainty directly. Yet, by knowing the theoretical density of a particular noise process, such as Poisson in this instance, I can estimate uncertainty from it: the square root of the count. What matters about an estimate of uncertainty is not so much how it was obtained as whether the method is defensible. Peter Baum suggested an even better example to me recently.

Suppose that I am measuring some quantity that varies little from one experiment to another, and I am doing it with an instrument that has not very much resolution. All of my measurements are practically the same, and when I get around to calculating a standard uncertainty from the replications I get a very tiny value. Have I made an impressively precise measurement? Of course not. This is exactly what I mean by a standard uncertainty that is not defensible. What I must do in such a situation is use an estimate of measurement uncertainty, such as a uniform noise (error) density whose width equals my instrument's least significant digit (call it e), and report the standard uncertainty as e/sqrt(12*n). The square root of 12 comes from the standard deviation of a uniform density, which is e/sqrt(12), and the factor of n, which is the number of independent measurements I made, makes this a standard error of the mean.
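
The two estimates "by other means" described above, Poisson counting and limited instrument resolution, can be sketched as follows. The function names and the sample numbers are hypothetical, chosen only to illustrate the formulas.

```python
import math

def poisson_uncertainty(count):
    """Standard uncertainty of one Poisson count: sqrt of the variance,
    with the observed count standing in for the expected count."""
    return math.sqrt(count)

def resolution_uncertainty(e, n):
    """Standard uncertainty of the mean of n readings limited by resolution e.

    Assumes the error is uniform over an interval of width e,
    whose standard deviation is e / sqrt(12).
    """
    return e / math.sqrt(12 * n)

# Hypothetical numbers, for illustration only.
print(poisson_uncertainty(400))          # a count of 400 photons -> 20.0
print(resolution_uncertainty(0.01, 10))  # 10 readings on a display with 0.01 resolution
```

Note how the relative uncertainty of a Poisson count, sqrt(count)/count, shrinks only as the count grows, which is why low count rates are so costly to characterize by replication.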

Now that I know how to figure uncertainty, how should I interpret it, and what effect does it have on experimental design?

The first part of the question is easy to answer. The distribution of noise is unknown, but if it is concentrated near a central value, then I can invoke the central limit theorem and treat the density of the mean value as normal with a standard deviation of u as calculated above. Therefore, the true value, Y, is within ±u of the mean value of this experiment to about 68% certainty. What this percentage certainty really means is difficult to assess. Statisticians from the school of "Bayesians" don't believe it means a thing. The idea is, though, that if a person were to make many repetitions of the experiment I described in my example of 5 measurements of the speed of light, then in about 68% of these the true value (Y) would occur within an interval of plus or minus u around each mean value, or <Y>±u. If more certainty is needed, then the interval <Y>±2u would be about 95% certain, <Y>±3u would be about 99.7% certain, and so forth.
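
A quick simulation can make this repeated-experiment interpretation concrete. The sketch below assumes Gaussian noise and uses made-up values for the true value, the noise level, and the number of replications (50 per experiment); the function name coverage is only illustrative.

```python
import math
import random

def coverage(true_value, sigma, n, k, trials, seed=0):
    """Fraction of simulated experiments whose interval mean +/- k*u contains true_value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ys = [true_value + rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(ys) / n
        # Standard uncertainty of the mean, estimated from this experiment alone.
        u = math.sqrt(sum((y - mean) ** 2 for y in ys) / (n * (n - 1)))
        if abs(mean - true_value) <= k * u:
            hits += 1
    return hits / trials

# Assumed setup: 4000 simulated experiments of 50 replications each.
for k in (1, 2, 3):
    print(k, coverage(true_value=100.0, sigma=5.0, n=50, k=k, trials=4000))
```

With these assumptions the three fractions should come out near 68%, 95%, and 99.7%. With only a handful of replications per experiment the fractions fall somewhat below these figures, because u itself is then a noisy estimate.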

The second part of the question gets to the heart of design, but is complex to explain. Remember that we do not actually know the true value Y. The whole point of the experiment is to estimate it as accurately as possible. The initial design of the experiment has two aims. First, we rid the experiment of systematic bias and make sure our method is calibrated. This ensures that our measurements tend to center on the true value. Second, we need to make the uncertainty of the central value (typically the mean) small enough to accomplish the experimental objective.

Often we organize an experiment to distinguish between two alternatives. For example, we may want to see if a treatment is 50% effective at preventing a disease. One alternative commonly is the nominal or null hypothesis, so called because it implies that a treatment is not effective. The other is the alternative hypothesis that the treatment is, in this example, 50% effective. Because of unavoidable experimental noise, it may not be possible to distinguish between these hypotheses unless we design carefully in advance.

This brings me to the idea of the power of an experiment and what it means for design. Nominal incidence of a disease implies a certain probability of incidence. A 50% effective treatment means that incidence of the disease is cut to one-half of the nominal probability in a group that gets treated. Power of an experiment is a probability measure and has to be between 0 and 100% as a result. Power of 90% means that if the treatment is truly 50% effective, our experimental design has a 90% chance of success in detecting it. Simply put, we can detect an effective treatment only if two things happen: 1) the incidence in the treated group is less than nominal incidence; and 2) the uncertainty of experimental incidence is so small that we cannot easily mistake it for nominal incidence. In other words, we have to reject the nominal hypothesis. The one thing that affects uncertainty and the probability of rejecting the nominal hypothesis when it is in fact false, and which is also under our control, is the number of replications of the experiment or the size of the control and test groups.

Another numerical example is in order. Let's take the 1954 trials of the Salk vaccine as a great historical example. The nominal incidence of polio in the 1950s was about 0.0003, or 30 per hundred thousand. Jonas Salk wanted to estimate the size required of test and control groups to detect 50% effectiveness of his vaccine at 95% confidence with 95% power. Polio is a binomial process. A person either gets the disease or not, and this dichotomy of outcomes is truly binomial. Unfortunately the binomial distribution is inconvenient. However, when the expected number of successes is at least 15 or so, the binomial density is nearly a normal density with mean value of np, where n is the number of trials (at-risk population size in this case) and p is the probability of incidence per trial (0.0003). Binomial variance is np(1-p), and its square root can serve as my uncertainty "determined by other means," in the vernacular of NIST.

Salk reasoned as follows. The nominal incidence of polio is 0.0003 and there is some uncertainty in a control group that might produce a slightly different result. A treated group of 50% effectiveness would have an incidence of 0.00015 with some uncertainty of a slightly different result. As long as the observed incidence of the treated group is farther than 1.96u from the observed incidence in the control, he would reject the nominal hypothesis, because the 95% confidence region of the normal density is ±1.96u. The required size of u is that which will separate the expected values at nominal and one-half nominal incidence by the combined 95% confidence regions around both the expected nominal and experimental result. This way Salk is 95% certain that his treated group result would be outside the acceptance region of the nominal group 95% of the time, presuming his vaccine really is 50% effective. Therefore, an experiment of 95% power requires that

 expected incidence difference > 95% treated uncertainty + 95% nominal uncertainty

or

 0.5np > 1.96 sqrt(0.5np(1 - 0.5p)) + 1.96 sqrt(np(1 - p))

The only unknown is n, and solving for it shows that 150,000 people in each group would just do the task. I hope your amateur experiments never need so many.
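
Solving the inequality for n takes only a little algebra: the left side grows as n and the right side as sqrt(n), so dividing through by sqrt(n) isolates sqrt(n). The sketch below does this numerically. The function name group_size and its parameter names are illustrative assumptions, not part of the original trial design.

```python
import math

def group_size(p, effectiveness=0.5, z=1.96):
    """Smallest per-group n satisfying the inequality in the text.

    p is the nominal incidence; the treated incidence is p * (1 - effectiveness).
    """
    p_treated = p * (1.0 - effectiveness)
    # Per-person standard deviations of the treated and nominal groups, added together.
    spread = math.sqrt(p_treated * (1.0 - p_treated)) + math.sqrt(p * (1.0 - p))
    # n * (p - p_treated) > z * sqrt(n) * spread  =>  sqrt(n) > z * spread / (p - p_treated)
    root_n = z * spread / (p - p_treated)
    return math.ceil(root_n ** 2)

print(group_size(0.0003))
```

With p = 0.0003 the result is a little under 150,000 people per group, consistent with the figure quoted above. Trying a larger nominal incidence or a more effective treatment shows how quickly the required group size falls.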
