What Are Repeatability and Reproducibility?
Part 2: The E11 Viewpoint
Q: Many ASTM standard test methods contain repeatability and reproducibility statements and values. What variations can be expected? What is the standard deviation for the repeatability and reproducibility of the method?
A: The interlaboratory study (ILS) is the first step to obtaining separate values for the repeatability and reproducibility of a test method. It is a way to learn how well or poorly a test method behaves when performed in different situations.
ASTM standard E691, Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method, is the basic standard describing how to perform an ILS and obtain values for the variations that one might expect for tests done at typical laboratories. As described in the preceding article of this series, taking some number of repeats within each laboratory in a very short time by the same operator and equipment yields a best-case situation, which should give the smallest variation among readings. This becomes a measure of the repeatability of measurements and is represented by calculating the repeatability standard deviation.
Usually only a small number of repeats for each test protocol in every laboratory is needed. By averaging results over many laboratories we learn how a typical lab might perform. Of course, all of this depends on how many laboratories participated and how well they represent the real world of laboratories. Thus, if you only have 10 laboratories involved, especially if they have been the ones developing the standard, it may be questionable to assume that any other random laboratory would perform as well. This could certainly be the case for new methods.
When we perform the test method in many different laboratories on the same material, we hope to discover all of the potential variations that can occur when the test method is used. Because we now have different operators, different equipment and different environmental conditions, all of the intermediate conditions and more will have been introduced. Thus, we should expect to have greater variability among the results from different laboratories. The measure of this larger variation due to readings taken among laboratories is found as the reproducibility standard deviation.
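As a rough illustration of how the two standard deviations are obtained from ILS data, here is a minimal sketch. It assumes a balanced design (every lab reports the same number of repeats on one material) and follows the calculation usually described for E691: pool the within-lab variances for repeatability, then combine that with the spread of the lab averages for reproducibility. The data and the function name `precision_sds` are invented for illustration and do not come from any real study.

```python
import math
import statistics

# Hypothetical ILS results for one material: 4 labs, 3 repeats each.
# These numbers are invented for illustration, not taken from any study.
results = {
    "lab_1": [10.1, 10.3, 10.2],
    "lab_2": [10.6, 10.4, 10.5],
    "lab_3": [9.9, 10.0, 10.1],
    "lab_4": [10.3, 10.2, 10.4],
}


def precision_sds(cells):
    """Return (s_r, s_R) for a balanced design (same repeat count in every lab)."""
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("every lab must report the same number of repeats")
    # Repeatability: square root of the average within-lab variance.
    within_variances = [statistics.variance(v) for v in cells.values()]
    s_r = math.sqrt(statistics.mean(within_variances))
    # Standard deviation of the lab averages.
    lab_averages = [statistics.mean(v) for v in cells.values()]
    s_xbar = statistics.stdev(lab_averages)
    # Reproducibility combines between-lab and within-lab variation.
    s_R = math.sqrt(s_xbar**2 + s_r**2 * (n - 1) / n)
    # By convention s_R is not reported as smaller than s_r.
    return s_r, max(s_R, s_r)


s_r, s_R = precision_sds(results)
print(f"repeatability SD   s_r = {s_r:.3f}")
print(f"reproducibility SD s_R = {s_R:.3f}")
```

In this made-up data set the labs agree closely with themselves but differ from one another, so the reproducibility standard deviation comes out noticeably larger than the repeatability standard deviation.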
Again, the problem of interpreting this variation is that it is dependent on how many laboratories have participated. When only a very small sample of all the labs that might run the method is used, you must be cautious in believing that these results are typical of what all laboratories might have done with the same test. In addition, it is also important to look at the results to see if some laboratories consistently perform differently. Often the major reason for variation among laboratories is a consequence of some type of bias, or systematic difference, that occurs for one or more of the labs. This is especially a problem to recognize when only a very small number of labs actually participated (say, fewer than 10).
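One way to look for a lab that runs consistently high or low is to standardize each lab's average against the other labs, in the spirit of the between-laboratory consistency statistic (h) used in E691. The sketch below is only an illustration: the lab averages are hypothetical, the function name `h_statistics` is an assumption, and no critical values are applied, since those depend on the number of labs and should be taken from the standard itself.

```python
import statistics

# Hypothetical lab averages for two materials (invented numbers).
lab_averages = {
    "material_A": {"lab_1": 10.2, "lab_2": 10.5, "lab_3": 10.0, "lab_4": 10.3},
    "material_B": {"lab_1": 20.1, "lab_2": 20.6, "lab_3": 20.2, "lab_4": 20.4},
}


def h_statistics(averages):
    """Standardized deviation of each lab average from the mean of all lab averages."""
    values = list(averages.values())
    grand_average = statistics.mean(values)
    s_xbar = statistics.stdev(values)  # SD of the lab averages
    return {lab: (avg - grand_average) / s_xbar for lab, avg in averages.items()}


h_by_material = {mat: h_statistics(avgs) for mat, avgs in lab_averages.items()}
for material, h in h_by_material.items():
    row = ", ".join(f"{lab}: {value:+.2f}" for lab, value in h.items())
    print(f"{material}: {row}")
```

A lab whose value keeps the same sign and a large magnitude across every material is a candidate for a systematic difference worth investigating. With only a handful of labs, as in this toy example, such patterns are weak evidence at best.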
If we use the terms repeatability and reproducibility as describing the nature of the variation, then that variation is best computed as a standard deviation. Let’s take a closer look at how these terms are defined in an ILS.
When the terms r and R were first presented, the idea was to provide a simple approximate comparison for a very special case of using the results of the ILS. The value of r, called the repeatability interval, is found by simply multiplying the repeatability standard deviation by 2.8; it is similar to the statistical estimate of a 95 percent confidence interval for the difference between two readings. So, by using r we reduce the statistical jargon. The same goes for R, which is the reproducibility standard deviation times 2.8. With these calculations we arrive at a reproducibility interval, which we then use to compare the difference between a pair of actual test results that we might observe from two labs.
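The multiplier 2.8 can be traced to two ingredients: the 95 percent two-sided normal quantile (about 1.96) and the fact that the difference between two independent readings has a standard deviation of √2 times that of a single reading. The short sketch below shows the arithmetic, assuming normally distributed, independent readings. The helper name `interval` is an assumption used only for this illustration.

```python
import math
from statistics import NormalDist

# Two-sided 95 percent normal quantile, about 1.96.
z = NormalDist().inv_cdf(0.975)

# The difference of two independent readings has SD = sqrt(2) * SD of one reading,
# so the 95 percent limit on the difference is z * sqrt(2) * SD.
factor = z * math.sqrt(2)
print(f"z = {z:.3f}, z * sqrt(2) = {factor:.3f}")  # rounds to 2.8


def interval(standard_deviation, multiplier=2.8):
    """r or R: the conventional 2.8 multiplier applied to s_r or s_R."""
    return multiplier * standard_deviation
```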
These interval types assume the following:
A few comments should also be noted:
Even more important is the repeated use of the comparison of samples. For example, if you make many paired comparisons, the chance that one would randomly be different rapidly increases.
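To see how quickly this chance grows, here is a small sketch. It assumes the paired comparisons are independent and that each one has a 5 percent chance of exceeding the interval purely by chance. Both are simplifying assumptions, and the function name is hypothetical.

```python
def chance_of_at_least_one_exceedance(comparisons, alpha=0.05):
    """Probability that at least one of several independent comparisons
    exceeds its 95 percent interval by chance alone."""
    return 1 - (1 - alpha) ** comparisons


for k in (1, 5, 10, 20):
    p = chance_of_at_least_one_exceedance(k)
    print(f"{k:2d} comparisons: {p:.0%} chance of at least one apparent difference")
```

Under these assumptions, with 10 comparisons the chance of at least one spurious "difference" is already about 40 percent.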
EXAMPLE: E691 Glucose in Serum Study
To interpret these values of standard deviation, if we had a reading of about 135 (material C), and we had a single operator in one laboratory run many tests on that material, then 95 percent of the readings would fall within a range of approximately ±3.0 units (about 1.96 times the repeatability standard deviation, for a total range of about 6.0 units). But if only two readings were run at random, then 95 percent of the time the difference between those two readings should not be more than 4.33 units (the value of r for material C). Similarly, if many laboratories ran a single test, then 95 percent of the single readings would fall in a range of about 8.6 units, but pairs of readings would rarely have a difference greater than 6.02 units.
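The arithmetic behind these statements can be retraced from the two interval values quoted for material C (r = 4.33 and R = 6.02). The sketch below backs out the standard deviations by dividing by 2.8 and then rebuilds the 95 percent ranges for single readings. The variable names are assumptions, and the standard deviations are derived here rather than copied from the study.

```python
# Interval values quoted in the text for material C.
r = 4.33   # repeatability interval
R = 6.02   # reproducibility interval

# Back out the standard deviations (interval = 2.8 * SD).
s_r = r / 2.8
s_R = R / 2.8

# 95 percent ranges for single readings (plus or minus 1.96 * SD).
half_width_within = 1.96 * s_r
half_width_between = 1.96 * s_R

print(f"s_r = {s_r:.2f}: single readings within +/-{half_width_within:.1f}, "
      f"total range {2 * half_width_within:.1f}")
print(f"s_R = {s_R:.2f}: single readings within +/-{half_width_between:.1f}, "
      f"total range {2 * half_width_between:.1f}")
```

With the 1.96 multiplier, the between-laboratory total range comes out near 8.4 units. The "about 8.6" quoted above would correspond to using a rounder multiplier of 2.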
Next Issue: Part 3 — Repeatability and reproducibility in measurement systems analysis, or “gage R&R” methodology.