January/February 2009

## Sample Sizes

## Considering the Number of Items to Include in a Sample
## Q: How many items from a lot should I sample to determine the average value for a property of the lot?

A: The number of items to sample is a compromise between the precision with which you must obtain the average and the cost of sampling and testing all the items in the sample. A statistically based formula for the sample size $n$ is

$$n = \left(\frac{3\sigma}{E}\right)^2$$

in which σ is the standard deviation of the property in the lot and $E$ is the required precision of the average value. The objective is that the error in the average due to sampling should almost certainly be less than $E$.

The principle behind the formula is the distribution of the average of $n$ items sampled at random from the lot:

- The expected value of the average is equal to the population mean µ,
- The standard deviation of the average is equal to $\sigma/\sqrt{n}$, and
- The form of the distribution of the average is close to Gaussian.
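As a rough illustration of the sample size formula, the sketch below computes the smallest whole number n for which three standard deviations of the average, 3σ/√n, do not exceed the required precision. The function name `sample_size`, its parameter names, and the example values are hypothetical and chosen only for illustration; the factor 3 is the one used in the formula discussed here.

```python
import math


def sample_size(sigma: float, precision: float, factor: float = 3.0) -> int:
    """Smallest n such that factor * sigma / sqrt(n) <= precision.

    sigma     -- standard deviation of the property in the lot
    precision -- required precision of the average (same units as sigma)
    factor    -- number of standard deviations of the average to allow for
                 (3 is the value used in the article's formula)
    """
    if sigma < 0 or precision <= 0:
        raise ValueError("sigma must be >= 0 and precision must be > 0")
    # n = (factor * sigma / precision)^2, rounded up to a whole item,
    # with at least one item always sampled.
    return max(1, math.ceil((factor * sigma / precision) ** 2))


# Illustrative values only: sigma = 2.0 units, required precision = 1.5 units.
print(sample_size(sigma=2.0, precision=1.5))  # 16
```

Rounding up rather than to the nearest integer is a deliberate choice in this sketch, so that the computed sample never falls short of the stated precision. Because the calculation uses floating-point arithmetic, a result that is mathematically a whole number can occasionally round up by one item.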
The Gaussian, or normal, distribution is the well-known “bell-shaped curve” of probability and statistics. For a random quantity with a Gaussian distribution, values will be within one standard deviation of the mean approximately two-thirds of the time, within two standard deviations approximately 95 percent of the time, and within three standard deviations approximately 99.7 percent of the time. The last degree of assurance is the one aimed for by the factor 3 in the sample size formula.

Notably absent from influence on the required sample size for a given accuracy is the size of the lot or population. The precision of the average does not depend on population size unless the lot is so small that the number to be sampled is a significant fraction of all the items. Sampling strategies often do take larger samples of larger populations. For example, a traditional prescription is given by the square-root-of-…

Also absent from significant influence on the required sample size is the form of the distribution of the property in the lot. The property does not have to have a normal distribution in order for the normal distribution to apply to the average. The property values may be skewed to one side or rectangular (box-shaped). The distribution of the average depends strongly only on the lot mean and standard deviation.

That the sample be a random representative sample is critical. The notion that the sample average has a statistical distribution at all depends on it. If, for example, you take a grab sample of five items, then those five give you only a snapshot of a small portion of the lot, and none of the distribution theory applies.

The lot standard deviation σ plays the key role. This is awkward, because the standard deviation may not be known at the time the sampling is planned. If similar material has been sampled before, a good projection is to pool standard deviations over the previous samplings.
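One common way to pool standard deviations from earlier samplings is to weight each sample variance by its degrees of freedom. The sketch below assumes that this is the kind of pooling intended; the function name `pooled_std` and the example figures are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pooled_std(samplings: Sequence[Tuple[int, float]]) -> float:
    """Pool standard deviations from previous samplings.

    Each entry is (n_i, s_i): the number of items in an earlier sample and
    the standard deviation observed in it. Variances are weighted by their
    degrees of freedom (n_i - 1), which is an assumption of this sketch.
    """
    dof = sum(n - 1 for n, _ in samplings)
    if dof <= 0:
        raise ValueError("need at least one sampling with two or more items")
    weighted_variance = sum((n - 1) * s ** 2 for n, s in samplings)
    return math.sqrt(weighted_variance / dof)


# Illustrative values only: two earlier samplings of similar material.
previous = [(11, 3.0), (6, 4.0)]
sigma_estimate = pooled_std(previous)
print(round(sigma_estimate, 2))  # 3.37
```

The pooled value can then stand in for σ when planning the sample size, on the understanding that it is only a projection from similar material.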
If it is practical to take the sample in two steps, then another effective strategy is to take an initial sample of …

If you know more about the lot, for example, that it contains runs that differ in mean value or that units in the lot differ in size, then this information can be exploited to design a sampling plan that may be more accurate than a simple random sample. ASTM E1402-08, Guide for Sampling Design, describes types of sampling plans that use the additional information.

The sample size may also be adjusted to fit a convenient number for sampling and testing. For example, you may estimate that you need 28 items from a lot, but testing is best done in batches of ten. Then 30 is the number of units to sample.
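The batch adjustment in the last example amounts to rounding the estimated sample size up to the next multiple of the batch size. A minimal sketch, with the hypothetical helper name `round_up_to_batch`:

```python
def round_up_to_batch(n: int, batch_size: int) -> int:
    """Round a required sample size up to a whole number of test batches."""
    if n < 0 or batch_size <= 0:
        raise ValueError("n must be >= 0 and batch_size must be > 0")
    # Ceiling division in integer arithmetic, then scale back to items.
    batches = -(-n // batch_size)
    return batches * batch_size


# The example from the text: 28 items needed, testing done in batches of ten.
print(round_up_to_batch(28, 10))  # 30
```

Rounding up rather than down keeps the sample at least as large as the estimate, so the required precision is not compromised.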
Statistics play an important role in the ASTM International standards you write, such as the development of precision and bias statements for test methods, running interlaboratory studies, knowing how to round numbers properly and determining sample size. A panel of experts is ready to answer your questions about how to use statistical principles in ASTM standards. Please send your questions to SN Editor in Chief Maryann Gorman at ASTM International, 100 Barr Harbor Drive, P.O. Box C700, W. Conshohocken, PA 19428-2959.