4. Significance and Use
4.1 This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.
4.1.1 A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations, there may be several variables defined for study.
4.1.2 The sample is selected from a larger set called the population. The population can be a finite set of items, a very large or essentially unlimited set of items, or a process. In a process, the items originate over time and the population is dynamic, continuing to emerge and possibly change over time. Sample data serve as representatives of the population from which the sample originates. It is the population that is of primary interest in any particular study.
4.2 The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes, the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary trials consist of a sequence of 0s and 1s in which a “1” indicates that the inspected item exhibited the attribute being studied and a “0” indicates the item did not exhibit the attribute. Each inspection item is assigned either a “0” or a “1.” Such data are often governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the inspection interval is recorded for each of n inspection intervals. The Poisson distribution often governs counting events over an interval.
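As a minimal sketch of the two attribute data types, assuming hypothetical inspection results (the values and function names below are illustrative and are not taken from this practice), binary trials might be summarized by the proportion of items exhibiting the attribute, and event counts by the mean count per inspection interval:

```python
# Hypothetical binary trials: 1 = item exhibited the attribute, 0 = it did not.
binary_trials = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

# Hypothetical event counts, one count per inspection interval.
event_counts = [2, 0, 1, 3, 0, 1, 2, 1]


def sample_proportion(trials):
    """Fraction of inspected items that exhibited the attribute."""
    return sum(trials) / len(trials)


def mean_count(counts):
    """Average number of events observed per inspection interval."""
    return sum(counts) / len(counts)


p_hat = sample_proportion(binary_trials)  # 3 items out of 10
rate_hat = mean_count(event_counts)       # 10 events over 8 intervals
print(p_hat, rate_hat)
```

Under a binomial model the sample proportion estimates the probability that an item exhibits the attribute; under a Poisson model the mean count estimates the event rate per interval. Whether either model is appropriate depends on the data at hand.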
4.3 For sample data to be used to draw conclusions about the population, the process of sampling and data collection must be considered, at least potentially, repeatable. Descriptive statistics are calculated using real sample data that will vary in repeating the sampling process. As such, a statistic is a random variable subject to variation in its own right. The sample statistic usually has a corresponding parameter in the population that is unknown (see Section 5). The point of using a statistic is to summarize the data set and estimate a corresponding population characteristic or parameter.
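The idea that a statistic varies from sample to sample can be illustrated with a small simulation. This is only a sketch: the population, its size, the sample size of 25, and the number of repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(12345)  # fixed seed so the sketch is reproducible

# Hypothetical finite population of 1000 simulated measurements.
population = [rng.gauss(50.0, 5.0) for _ in range(1000)]
population_mean = statistics.fmean(population)  # the parameter, normally unknown


def draw_sample_mean(n):
    """Draw n items without replacement and return the sample mean."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample)


# Repeat the sampling process 200 times; each repetition gives a different mean.
sample_means = [draw_sample_mean(25) for _ in range(200)]
spread_of_means = statistics.stdev(sample_means)

print(population_mean, min(sample_means), max(sample_means), spread_of_means)
```

Each repetition yields a different sample mean, while the population mean stays fixed. The spread of the sample means is what the standard error is intended to quantify.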
4.4 Descriptive statistics consider numerical, tabular, and graphical methods for summarizing a set of data. The methods considered in this practice are used for summarizing the observations from a single variable.
4.5 The descriptive statistics described in this practice are:
4.5.1 Mean, median, min, max, range, mid range, order statistic, quartile, empirical percentile, quantile, interquartile range, variance, standard deviation, Z-score, coefficient of variation, skewness and kurtosis, and standard error.
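Several of the statistics listed above can be sketched with the Python standard library. The data values are hypothetical, and the formulas used are the common textbook forms (for example, the sample variance with divisor n - 1), which are assumed here rather than quoted from this practice. Quartiles, percentiles, skewness, and kurtosis are omitted because their computational conventions vary.

```python
import math
import statistics

# Hypothetical sample of n = 8 observations of a single variable.
data = [4.1, 3.8, 4.4, 4.0, 4.6, 3.9, 4.2, 4.3]

n = len(data)
mean = statistics.fmean(data)
median = statistics.median(data)
minimum, maximum = min(data), max(data)
data_range = maximum - minimum
mid_range = (minimum + maximum) / 2

variance = statistics.variance(data)  # sample variance, divisor n - 1
std_dev = statistics.stdev(data)      # square root of the sample variance
coeff_var = std_dev / mean            # coefficient of variation (as a ratio)
std_error = std_dev / math.sqrt(n)    # standard error of the mean

# Z-score: distance of each observation from the mean in standard deviations.
z_scores = [(x - mean) / std_dev for x in data]

print(mean, median, data_range, mid_range, std_dev, std_error)
```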
4.6 Tabular methods described in this practice are:
4.6.1 Frequency distribution, relative frequency distribution, cumulative frequency distribution, and cumulative relative frequency distribution.
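The four tabular summaries can be built from one pass over the data. The sketch below assumes hypothetical data and hypothetical class boundaries, with each class taken as closed on the left and open on the right; this practice may prescribe different rules for choosing classes.

```python
from itertools import accumulate

# Hypothetical observations and class boundaries; classes are [lower, upper).
data = [3, 12, 15, 18, 22, 25, 27, 31, 8, 14]
edges = [0, 10, 20, 30, 40]


def frequency_table(values, edges):
    """Return one row per class with the four frequency summaries.

    Values that fall outside every class are ignored in this sketch.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    cumulative = list(accumulate(counts))
    rows = []
    for i, count in enumerate(counts):
        rows.append({
            "interval": (edges[i], edges[i + 1]),
            "frequency": count,
            "relative": count / total,
            "cumulative": cumulative[i],
            "cumulative_relative": cumulative[i] / total,
        })
    return rows


table = frequency_table(data, edges)
for row in table:
    print(row)
```

The cumulative relative frequencies, plotted against the upper class boundaries, give the points of an ogive.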
4.7 Graphical methods described in this practice are:
4.7.1 Histogram, ogive, boxplot, dotplot, normal probability plot, and q-q plot.
4.8 While the methods described in this practice may be used to summarize any set of observations, the results obtained by using them may be of little value from the standpoint of interpretation unless the data quality is acceptable and satisfies certain requirements. To be useful for inductive generalization, any sample of observations that is treated as a single group for presentation purposes must represent a series of measurements, all made under essentially the same test conditions, on a material or product, all of which have been produced under essentially the same conditions. When these criteria are met, we are minimizing the danger of mixing two or more distinctly different sets of data.
4.8.1 If a given collection of data consists of two or more samples collected under different test conditions or representing material produced under different conditions (that is, different populations), it should be considered as two or more separate subgroups of observations, each to be treated independently in a data analysis program. Merging of such subgroups, representing significantly different conditions, may lead to a presentation that will be of little practical value. Briefly, any sample of observations to which these methods are applied should be homogeneous or, in the case of a process, have originated from a process in a state of statistical control.
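The effect of treating subgroups separately can be sketched as follows. The condition labels, values, and function name are hypothetical; the point is only that each subgroup is summarized on its own rather than merged.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (test condition label, measured value).
records = [
    ("A", 10.1), ("A", 9.9), ("A", 10.0),
    ("B", 12.2), ("B", 11.8), ("B", 12.0),
]


def summarize_by_subgroup(records):
    """Group values by condition and summarize each subgroup independently."""
    groups = defaultdict(list)
    for condition, value in records:
        groups[condition].append(value)
    return {
        condition: {
            "n": len(values),
            "mean": statistics.fmean(values),
            "stdev": statistics.stdev(values),
        }
        for condition, values in groups.items()
    }


summary = summarize_by_subgroup(records)

# For contrast: merging the two conditions inflates the apparent variation.
merged_stdev = statistics.stdev([value for _, value in records])

print(summary)
print(merged_stdev)
```

In this illustration the merged standard deviation is much larger than either subgroup's, because it mostly reflects the difference between conditions rather than the variation within them.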
4.9 The methods developed in Sections 6, 7, and 8 apply to the sample data. When a term such as “mean” is used without qualification, it refers to the sample mean, not the population mean, unless indicated otherwise. It is understood that there is a data set containing n observations. The data set may be denoted as:
$x_1, x_2, x_3, \ldots, x_n$
4.9.1 There is no order of magnitude implied by the subscript notation unless subscripts are contained in parentheses (see 6.7).
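The distinction between plain and parenthesized subscripts can be sketched in a few lines. The values are hypothetical; plain subscripts follow the order of collection, while parenthesized subscripts are assumed here to denote the observations sorted in ascending order.

```python
# Hypothetical observations in the order they were collected: x_1, ..., x_n.
data = [7.2, 5.1, 6.4, 5.9]

# Order statistics: x_(1) <= x_(2) <= ... <= x_(n), the sorted observations.
order_statistics = sorted(data)

x_1 = data[0]                    # first observation collected
x_paren_1 = order_statistics[0]  # smallest observation, x_(1)
print(x_1, x_paren_1)
```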
1. Scope
1.1 This practice covers methods and equations for computing and presenting basic descriptive statistics using a set of sample data containing a single variable. This practice includes simple descriptive statistics for variable data, tabular and graphical methods for variable data, and methods for summarizing simple attribute data. Some interpretation and guidance for use is also included.
1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.
2. Referenced Documents
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method
ISO 3534-2 Statistics - Vocabulary and Symbols - Part 2: Applied Statistics
Keywords: boxplot; dot plot; empirical percentile; frequency distribution; histogram; kurtosis; mean; median; mid range; ogive; order statistic; population parameter; probability plot; q-q plot; range; sample statistic; skewness; standard deviation; standard error; variance
ICS Number Code 03.120.30 (Application of statistical methods)