Significance and Use
5.1 ASTM regulations require precision statements in all test methods in terms of repeatability and reproducibility. This practice may be used in obtaining the needed information as simply as possible. This information may then be used to prepare a precision statement in accordance with Practice E177.
5.2 Test Method and Protocol—In this practice, the term “test method” is used both for the actual measurement process and for the written description of the process, while the term “protocol” is used for the directions given to the laboratories for conducting the ILS.
5.3 Observations, Test Determinations and Test Results:
5.3.1 A test method often has three distinct stages: the direct observation of dimensions or properties; the arithmetic combination of the observed values to obtain a test determination; and the arithmetic combination of a number of test determinations to obtain the test result of the test method. In the simplest test methods, a single direct observation is both the test determination and the test result. For example, the test method may require the measurement of the mass of a test specimen prepared in a prescribed way. Another test method may require the measurement of the area of the test specimen as well as the mass, and then direct that the mass be divided by the area to obtain the mass per unit area of the specimen. The whole process of measuring the mass and the area and calculating the mass per unit area is a test determination. If the test method specifies that only one test determination is to be made, then the test determination value is the test result of the test method. Some test methods require that several determinations be made and the values obtained be averaged or otherwise combined to obtain the test result. Averaging several determinations is often used to reduce the effect of local variations of the property within the material.
5.3.2 In this practice, the term “test determination” is used both for the process and for the value obtained by the process, except when “test determination value” is needed for clarity.
5.3.3 The number of test determinations required for a test result should be specified in each individual test method. The number of test results required for an interlaboratory study of a test method is specified in the protocol of that study.
5.4 Test Specimens and Test Units—In this practice a test unit is the total quantity of material needed for obtaining a test result as specified by the test method. The portion of the test unit needed for obtaining a single test determination is called a test specimen. Usually a separate test specimen is required for each test determination.
5.5 Precision, Bias, and Accuracy of a Test Method:
5.5.1 When a test method is applied to a large number of portions of a material that are as nearly alike as possible, the test results obtained will nevertheless not all have the same value. A measure of the degree of agreement among these test results describes the precision of the test method for that material.
5.5.2 Numerical measures of the variability between such test results provide inverse measures of the precision of the test method. Greater variability implies smaller (that is, poorer) precision and larger imprecision.
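The inverse relationship in 5.5.2 can be seen with the sample standard deviation, one common numerical measure of variability. In this hedged Python sketch, the two sets of test results are hypothetical, as if two test methods had been applied to portions of the same material.

```python
from statistics import stdev

# Hypothetical test results from two test methods applied to portions of
# the same material that are as nearly alike as possible.
method_a = [10.1, 9.9, 10.0, 10.2, 9.8]
method_b = [10.4, 9.5, 10.1, 10.6, 9.4]

s_a = stdev(method_a)  # sample standard deviation (n - 1 divisor)
s_b = stdev(method_b)

# Greater variability (larger standard deviation) means poorer precision.
poorer = "A" if s_a > s_b else "B"
print(f"s_A = {s_a:.3f}, s_B = {s_b:.3f}; method {poorer} is less precise")
```

Both sets average 10.0, so the difference between them lies entirely in their spread, which is what the precision of a test method describes.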
5.5.3 This practice is designed only to estimate the precision of a test method. However, when accepted reference values are available for the property levels, the test result data obtained according to this practice may be used in estimating the bias of the test method. For a discussion of bias estimation and the relationships between precision, bias, and accuracy, see Practice E177.
5.6 Repeatability and Reproducibility—These terms deal with the variability of test results obtained under specified laboratory conditions. Repeatability concerns the variability between independent test results obtained within a single laboratory in the shortest practical period of time by a single operator with a specific set of test apparatus using test specimens (or test units) taken at random from a single quantity of homogeneous material obtained or prepared for the ILS. Reproducibility deals with the variability between single test results obtained in different laboratories, each of which has applied the test method to test specimens (or test units) taken at random from a single quantity of homogeneous material obtained or prepared for the ILS.
5.6.1 Repeatability Conditions—The within-laboratory conditions specified above for repeatability. The single-operator, single-set-of-apparatus requirement means that, for a particular step in the measurement process, the same combination of operator and apparatus is used for every test result and on every material. Different steps may, however, be performed by different operators: one operator may prepare the test specimens, a second measure the dimensions, and a third measure the breaking force. “Shortest practical period of time” means that the test results, at least for one material, are obtained in a time no shorter than that taken in normal testing and not so long as to permit significant changes in the test material, equipment, or environment.
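Repeatability and reproducibility are usually expressed as standard deviations, s_r and s_R. The Python sketch below is minimal and assumes a balanced layout in which each of p laboratories reports n test results on one material. s_r pools the within-laboratory standard deviations, and s_R adds the between-laboratory spread of the laboratory averages. The function name and data are hypothetical. The formulas are the usual one-way analysis-of-variance estimates and should be checked against Section 15 before use.

```python
from math import sqrt
from statistics import mean, stdev


def precision_estimates(cells: list[list[float]]) -> tuple[float, float]:
    """Return (s_r, s_R) for one material.

    cells[i] holds the n test results reported by laboratory i; every
    laboratory is assumed to report the same number n >= 2 of results.
    """
    p = len(cells)
    if p < 2:
        raise ValueError("need at least two laboratories")
    n = len(cells[0])
    if n < 2 or any(len(c) != n for c in cells):
        raise ValueError("each laboratory needs the same number n >= 2 of results")

    cell_avgs = [mean(c) for c in cells]   # average of each laboratory's results
    cell_sds = [stdev(c) for c in cells]   # within-laboratory standard deviation
    grand = mean(cell_avgs)                # average of the cell averages
    s_x = sqrt(sum((a - grand) ** 2 for a in cell_avgs) / (p - 1))

    # Repeatability: pooled within-laboratory standard deviation.
    s_r = sqrt(sum(s ** 2 for s in cell_sds) / p)
    # Reproducibility: between-laboratory part plus the within-laboratory part.
    # It is taken as at least s_r, because reproducibility includes repeatability.
    s_R = max(sqrt(s_x ** 2 + s_r ** 2 * (n - 1) / n), s_r)
    return s_r, s_R


# Hypothetical data: three laboratories, two test results each, one material.
s_r, s_R = precision_estimates([[10.0, 10.2], [10.4, 10.6], [9.8, 10.0]])
print(f"s_r = {s_r:.4f}, s_R = {s_R:.4f}")
```

Here every laboratory shows the same internal scatter, so s_r reflects only that scatter, while s_R is larger because the laboratory averages disagree.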
Abstract
The procedure presented in this practice consists of three basic steps: planning the interlaboratory study, guiding the testing phase of the study, and analyzing the test result data. The analysis utilizes tabular, graphical, and statistical diagnostic tools for evaluating the consistency of the data so that unusual values may be detected and investigated, and also includes the calculation of the numerical measures of precision of the test method pertaining to both within-laboratory repeatability and between-laboratory reproducibility.
Tests performed on presumably identical materials in presumably identical circumstances do not, in general, yield identical results. This is attributed to unavoidable random errors inherent in every test procedure; the factors that may influence the outcome of a test cannot all be completely controlled. In the practical interpretation of test data, this inherent variability has to be taken into account. For instance, the difference between a test result and some specified value may be within that which can be expected due to unavoidable random errors, in which case a real deviation from the specified value has not been demonstrated. Similarly, the difference between test results from two batches of material will not indicate a fundamental quality difference if the difference is no more than can be attributed to inherent variability in the test procedure. Many different factors (apart from random variations between supposedly identical specimens) may contribute to the variability in application of a test method, including: (a) the operator, (b) the equipment used, (c) the calibration of the equipment, and (d) the environment (temperature, humidity, air pollution, etc.). It is considered that changing laboratories changes each of the above factors. The variability between test results obtained by different operators or with different equipment will usually be greater than that between test results obtained by a single operator using the same equipment. The variability between test results taken over a long period of time, even by the same operator, will usually be greater than that obtained over a short period of time because of the greater possibility of changes in each of the above factors, especially the environment.
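The two-batch comparison above can be made concrete. If two independent test results each have standard deviation s, their difference has standard deviation s·√2. About 95 % of differences arising from random error alone therefore fall within 1.96·√2·s ≈ 2.8s, the factor used for the 95 % limits in Practice E177. This Python sketch assumes normally distributed errors. The value of s, the results, and the function names are hypothetical.

```python
from math import sqrt

Z_95 = 1.96  # two-sided 95 % quantile of the normal distribution


def critical_difference(s: float) -> float:
    """Largest difference between two independent test results that can be
    attributed to random error at about 95 % probability, given their
    common standard deviation s (the difference has sd s * sqrt(2))."""
    return Z_95 * sqrt(2) * s


def real_difference_shown(x1: float, x2: float, s: float) -> bool:
    """True if |x1 - x2| exceeds what random error alone would explain."""
    return abs(x1 - x2) > critical_difference(s)


s = 0.15  # hypothetical standard deviation of a single test result
print(round(critical_difference(s), 3))        # 0.416
print(real_difference_shown(10.0, 10.3, s))    # False
print(real_difference_shown(10.0, 10.6, s))    # True
```

Which standard deviation to use depends on the conditions. Two results from one laboratory call for a repeatability value. Results from different laboratories call for a reproducibility value.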
The general term for expressing the closeness of test results to the “true” value or the accepted reference value is accuracy. For statements of accuracy to be of practical value, standard procedures are required for determining the accuracy of a test method, in terms of both its bias and its precision. This practice provides a standard procedure for determining the precision of a test method. Precision, when evaluating test methods, is expressed in terms of two measurement concepts, repeatability and reproducibility. Under repeatability conditions, the factors listed above are kept or remain reasonably constant and usually contribute only minimally to the variability. Under reproducibility conditions, the factors are generally different (that is, they change from laboratory to laboratory) and usually contribute appreciably to the variability of test results. Thus, repeatability and reproducibility are two practical extremes of precision.
The repeatability measure, by excluding the factors (a) through (d) as contributing variables, is not intended as a mechanism for verifying the ability of a laboratory to maintain “in-control” conditions for routine operational factors such as operator-to-operator and equipment differences or any effects of longer time intervals between test results. Such a control study is a separate issue for each laboratory to consider for itself, and is not a recommended part of an interlaboratory study.
The reproducibility measure, which includes the factors (a) through (d) as sources of variability, reflects what precision might be expected when random portions of a homogeneous sample are sent to random “in-control” laboratories.
To obtain reasonable estimates of repeatability and reproducibility precision, it is necessary in an interlaboratory study to guard against excessively sanitized data in the sense that only the uniquely best operators are involved or that a laboratory takes unusual steps to get “good” results. It is also important to recognize and consider how to treat “poor” results that may have unacceptable assignable causes (for example, departures from the prescribed procedure). The inclusion of such results in the final precision estimates might be questioned.
An essential aspect of collecting useful consistent data is careful planning and conduct of the study. Questions concerning the number of laboratories required for a successful study as well as the number of test results per laboratory affect the confidence in the precision statements resulting from the study. Other issues involve the number, range, and types of materials to be selected for the study, and the need for a well-written test method and careful instructions to the participating laboratories.
To evaluate the consistency of the data obtained in an interlaboratory study, two statistics may be used: the “k-value”, used to examine the consistency of the within-laboratory precision from laboratory to laboratory, and the “h-value”, used to examine the consistency of the test results from laboratory to laboratory. Graphical as well as tabular diagnostic tools help in these examinations.
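For each laboratory and material, the h-value is commonly computed as the deviation of the laboratory's average from the average of all laboratory averages, divided by the standard deviation of those averages. The k-value is the laboratory's within-laboratory standard deviation divided by the pooled repeatability standard deviation. This Python sketch assumes one material with equal numbers of test results per laboratory. The function name and data are hypothetical. The resulting values would then be compared with the critical values in Table 5.

```python
from math import sqrt
from statistics import mean, stdev


def consistency_statistics(cells: list[list[float]]) -> tuple[list[float], list[float]]:
    """Return (h, k), one h-value and one k-value per laboratory, for one material.

    cells[i] holds the test results of laboratory i (the same number, at
    least two, for every laboratory).
    """
    p = len(cells)
    cell_avgs = [mean(c) for c in cells]
    cell_sds = [stdev(c) for c in cells]
    grand = mean(cell_avgs)
    d = [a - grand for a in cell_avgs]                 # deviation of each cell average
    s_x = sqrt(sum(di ** 2 for di in d) / (p - 1))     # sd of the cell averages
    s_r = sqrt(sum(s ** 2 for s in cell_sds) / p)      # repeatability sd
    if s_x == 0 or s_r == 0:
        raise ValueError("h or k is undefined when s_x or s_r is zero")
    h = [di / s_x for di in d]        # between-laboratory consistency
    k = [s / s_r for s in cell_sds]   # within-laboratory consistency
    return h, k


# Hypothetical data: three laboratories, two test results each.
cells = [[10.0, 10.2], [10.4, 10.8], [9.8, 10.0]]
h, k = consistency_statistics(cells)
print([round(x, 2) for x in h], [round(x, 2) for x in k])
```

Two simple checks follow from these definitions. The h-values sum to zero, and their squares sum to p - 1. The squared k-values sum to p. These identities can catch arithmetic slips in a hand-built table.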
Scope
1.1 This practice describes the techniques for planning, conducting, analyzing, and treating the results of an interlaboratory study (ILS) of a test method. The statistical techniques described in this practice provide adequate information for formulating the precision statement of a test method.
1.2 This practice does not concern itself with the development of test methods but rather with gathering the information needed for a test method precision statement after the development stage has been successfully completed. The data obtained in the interlaboratory study may indicate, however, that further effort is needed to improve the test method.
1.3 Since the primary purpose of this practice is the development of the information needed for a precision statement, the experimental design in this practice may not be optimum for evaluating materials, apparatus, or individual laboratories.
1.4 Field of Application—This practice is concerned exclusively with test methods which yield a single numerical figure as the test result, although the single figure may be the outcome of a calculation from a set of measurements.
1.4.1 This practice does not cover methods in which the measurement is a categorization. For many practical purposes, however, categorical outcomes can be scored (for example, zero-one scoring for binary measurements, or integer scores such as ranks for well-ordered categories), and the test result can then be defined as an average, or other summary statistic, of several individual scores.
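The scoring approach in 1.4.1 can be sketched briefly in Python. The category names, their ranks, and the outcomes are hypothetical, and the average is assumed as the summary statistic.

```python
from statistics import mean

# Hypothetical well-ordered categories, scored as integer ranks.
RANKS = {"none": 0, "slight": 1, "moderate": 2, "severe": 3}


def binary_test_result(outcomes: list[bool]) -> float:
    """Zero-one scoring: the average is the fraction of positive outcomes."""
    return mean(1 if o else 0 for o in outcomes)


def ordinal_test_result(ratings: list[str]) -> float:
    """Rank scoring for well-ordered categories, averaged into one test result."""
    return mean(RANKS[r] for r in ratings)


print(binary_test_result([True, False, True, True]))                     # 0.75
print(ordinal_test_result(["slight", "moderate", "moderate", "none"]))   # 1.25
```

Once scored this way, each test result is a single number, so the interlaboratory calculations of this practice can be applied to it.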
1.5 The information in this practice is arranged as follows:
| Topic | Location |
| --- | --- |
| Scope | Section 1 |
| Referenced Documents | Section 2 |
| Terminology | Section 3 |
| Summary of Practice | Section 4 |
| Significance and Use | Section 5 |
| Planning the Interlaboratory Study (ILS): | |
| ILS Membership | Section 6 |
| Basic Design | Section 7 |
| Test Method | Section 8 |
| Laboratories | Section 9 |
| Materials | Section 10 |
| Number of Test Results per Material | Section 11 |
| Protocol | Section 12 |
| Conducting the Testing Phase of the ILS: | |
| Pilot Run | Section 13 |
| Full Scale Run | Section 14 |
| Calculation and Display of Statistics: | |
| Calculation of the Statistics | Section 15 |
| Tabular and Graphical Display of Statistics | Section 16 |
| Data Consistency: | |
| Flagging Inconsistent Results | Section 17 |
| Investigation | Section 18 |
| Task Group Actions | Section 19 |
| Glucose ILS Consistency | Section 20 |
| Precision Statement Information: | |
| Repeatability and Reproducibility | Section 21 |
| Appendixes: | |
| Theoretical Considerations | Appendix X1 |
| Pentosans in Pulp Example | Appendix X2 |
| References | |
| Tables and Figures: | |
| Glucose in Serum Example | Tables 1–4, 6–8 |
| Critical Values of Consistency Statistics, h and k | Table 5 |
| Glucose in Serum Example | Figures 1–3 |
1.6 This standard may involve hazardous materials, operations, and equipment. This standard does not purport to address all of the safety problems associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.