DataPoints

Is It One or Two Groups of Data?

How to Determine It

Q: When I look at my data it appears that instead of one group of data I see two. How can I determine statistically if there are one or two groups of data present when I expect to see only one?

A: It is not uncommon for researchers to look at their data and see something they didn’t expect. Often the person expects two sets of measurements to look statistically similar when both are selected from what is believed to be a single population (measurement system) with mean $\mu$ and variance $\sigma^2$. Typically, one can test the difference between two sample means, say $\bar{x}_1$ and $\bar{x}_2$, with sample variances $s_1^2$ and $s_2^2$, based on sample sizes $n_1$ and $n_2$, respectively, using a Student’s t statistic as described in the basic statistics standard ASTM E2586, Practice for Calculating and Using Basic Statistics. However, in this case, we are interested in whether the two samples actually represent clusters of data from two different populations.
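For readers who want to see the conventional comparison in concrete terms, the following is a minimal sketch of the equal-variance (pooled) two-sample t statistic. It is the textbook form and is not copied from ASTM E2586; the function name `pooled_t` and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (t, degrees of freedom) for the equal-variance two-sample t test."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical measurements, used only to show the call.
t, df = pooled_t([10.1, 9.8, 10.3, 9.9], [10.6, 10.9, 10.4, 10.8])
print(round(t, 2), df)
```

This test assumes the two groups are known in advance, which is exactly what is not known in the clustering question addressed here.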

A set of observations $x_1, x_2, \ldots, x_n$ can be partitioned into two clusters $x_{11}, x_{12}, \ldots, x_{1n_1}$ and $x_{21}, x_{22}, \ldots, x_{2n_2}$. Fortunately, we don’t need to consider all $2^n$ possible clusterings of the data. If the observations are ordered, then we only need to consider the $(n - 1)$ partitions of the data:

$\{x_1\}, \{x_2, \ldots, x_n\}$

$\{x_1, x_2\}, \{x_3, \ldots, x_n\}$

etc.

$\{x_1, \ldots, x_{n-1}\}, \{x_n\}$
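The ordered partitions are easy to enumerate in code. The sketch below is one way to do it; the helper name `ordered_partitions` and the four data values are hypothetical and serve only to show the shape of the result.

```python
def ordered_partitions(values):
    """Yield the (n - 1) two-cluster splits of the sorted observations."""
    ordered = sorted(values)
    for k in range(1, len(ordered)):
        # The first k ordered values form one cluster, the rest the other.
        yield ordered[:k], ordered[k:]


# Made-up data: four observations give three candidate splits.
for left, right in ordered_partitions([4.2, 1.0, 3.7, 1.3]):
    print(left, right)
```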

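For the two clusters, Engelman and Hartigan¹ define a statistic whose maximum value over these partitions represents the maximum distance between the optimal clusters. This maximum value is denoted by C. If C exceeds a critical value, then it indicates that the clusters do represent two populations with means $\mu_1$ and $\mu_2$, respectively, with the same variance $\sigma^2$. The formula looks like this:

$$C = \max_{\text{partitions}} \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2 / (n_1 + n_2)}{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}$$

For a given partition, the numerator is the between-cluster sum of squares and the denominator is the within-cluster sum of squares.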

The value of C measures the distance between clusters and tests for the following:

  • Null hypothesis: $x_1, x_2, \ldots, x_n$ is a random sample from a single population with mean $\mu$ and variance $\sigma^2$, against the
  • Alternative hypothesis: For some partition of $x_1, x_2, \ldots, x_n$, the cluster $x_{11}, x_{12}, \ldots, x_{1n_1}$ is a sample from a population with mean $\mu_1$ and variance $\sigma^2$, and the cluster $x_{21}, x_{22}, \ldots, x_{2n_2}$ is a sample from a population with mean $\mu_2$ and the same variance $\sigma^2$.

We will let the critical value $C_\alpha$ be such that $P(C > C_\alpha) = \alpha$ under the null hypothesis. So, for a set of data, we compute the value of C from the sample statistics of the two clusters and then compare C to $C_\alpha$. Table 1 presents the critical values $C_\alpha$. Because C corresponds to the most extreme grouping possible, any value of C that exceeds $C_\alpha$ is evidence of clustering at significance level $\alpha$.
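The whole procedure can be sketched in a few lines. The code below assumes that C is the largest ratio of between-cluster to within-cluster sum of squares over the ordered partitions, a form that reproduces the C = 24.61 reported in the example in this article. The function names and the ten measurements are hypothetical; the critical value 9.89 is the Table 1 entry for n = 10 at α = .01.

```python
from statistics import mean


def sum_sq_dev(xs):
    """Sum of squared deviations of xs about its own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def cluster_statistic(values):
    """Largest between/within sum-of-squares ratio over the ordered splits."""
    ordered = sorted(values)
    best = 0.0
    for k in range(1, len(ordered)):
        left, right = ordered[:k], ordered[k:]
        n1, n2 = len(left), len(right)
        between = n1 * n2 / (n1 + n2) * (mean(left) - mean(right)) ** 2
        within = sum_sq_dev(left) + sum_sq_dev(right)
        if within == 0:
            # Degenerate case: no spread inside either cluster.
            ratio = float("inf") if between > 0 else 0.0
        else:
            ratio = between / within
        best = max(best, ratio)
    return best


def two_clusters_detected(values, critical_value):
    """True when C exceeds the critical value supplied by the caller."""
    return cluster_statistic(values) > critical_value


# Made-up measurements (n = 10); 9.89 is taken from Table 1 (n = 10, alpha = .01).
measurements = [10.1, 9.8, 10.3, 9.9, 10.0, 12.1, 11.9, 12.2, 12.0, 11.8]
print(two_clusters_detected(measurements, 9.89))
```

The critical value is passed in by the caller instead of being looked up, so the sketch stays independent of how the table is stored.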

Example

Suppose we have a dataset of eight lots that were tested for some characteristic and yielded the values 102, 95, 75, 201, 67, 194, 81 and 187. Can these lots be divided into two groups; that is, did the lots come from a single process or not? Ordering the values and looking at the (n - 1) possible partitions gives

{67},{75,81,95,102,187,194,201}

{67,75},{81,95,102,187,194,201}

{67,75,81},{95,102,187,194,201}

{67,75,81,95},{102,187,194,201}

{67,75,81,95,102},{187,194,201}

{67,75,81,95,102,187},{194,201}

{67,75,81,95,102,187,194},{201}
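To see how well each of these seven splits separates the data, one can compute the ratio of between-cluster to within-cluster sum of squares for every split. The sketch below does this for the eight lot values; the helper names are hypothetical, and the ratio is assumed to be the quantity that C maximizes.

```python
from statistics import mean

lots = [102, 95, 75, 201, 67, 194, 81, 187]


def sum_sq_dev(xs):
    """Sum of squared deviations of xs about its own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


def between_within_ratio(left, right):
    """Between-cluster over within-cluster sum of squares for one split."""
    n1, n2 = len(left), len(right)
    between = n1 * n2 / (n1 + n2) * (mean(left) - mean(right)) ** 2
    within = sum_sq_dev(left) + sum_sq_dev(right)
    return between / within


ordered = sorted(lots)
ratios = [
    between_within_ratio(ordered[:k], ordered[k:])
    for k in range(1, len(ordered))
]
for k, ratio in enumerate(ratios, start=1):
    print(ordered[:k], ordered[k:], round(ratio, 2))

# Size of the first cluster in the split with the largest ratio.
best_k = max(range(1, len(ordered)), key=lambda k: ratios[k - 1])
print("best split puts", best_k, "values in the first cluster")
```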

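The only reasonable possibility for two groups is the partition {67,75,81,95,102}, {187,194,201}, for which $n_1 = 5$, $\bar{x}_1 = 84$, $s_1^2 = 206$, $n_2 = 3$, $\bar{x}_2 = 194$ and $s_2^2 = 49$. Using the equation for C we have

$$C = \frac{(5)(3)(84 - 194)^2 / (5 + 3)}{(4)(206) + (2)(49)} = \frac{22687.5}{922} = 24.61$$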

In this example, we have a total of n = 8 values, so we choose $C_\alpha$ from Table 1 for an appropriate value of $\alpha$. Here we see that C = 24.61 is larger (more significant) than the critical value at $\alpha = 0.01$ ($C_{0.01} = 15.1$), so we can conclude that the lots came from two different groups (processes) with at least 99 percent confidence. In this case, the means of the two groups are 84 and 194, respectively, and their common standard deviation is estimated to be

$$s = \sqrt{\frac{(4)(206) + (2)(49)}{5 + 3 - 2}} = \sqrt{\frac{922}{6}} \approx 12.4$$

with 6 degrees of freedom.
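The group means and the common standard deviation can be checked with a short calculation. This is a sketch that assumes the usual pooled estimate, the within-cluster sum of squares divided by n - 2; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean

low = [67, 75, 81, 95, 102]
high = [187, 194, 201]


def sum_sq_dev(xs):
    """Sum of squared deviations of xs about its own mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


within = sum_sq_dev(low) + sum_sq_dev(high)  # pooled within-cluster sum of squares
df = len(low) + len(high) - 2                # degrees of freedom, n - 2
pooled_sd = sqrt(within / df)

print(mean(low), mean(high), round(pooled_sd, 1), df)
```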

Reference

1. Engelman, L., and Hartigan, J. A., “Percentage Points of a Test for Clusters,” Journal of the American Statistical Association, Vol. 64, No. 328, Dec. 1969, pp. 1647-1648.

Dean V. Neubauer, Corning Inc., Corning, N.Y., is an ASTM fellow; he serves as chairman of Committee E11 on Quality and Statistics, chairman of E11.90.03 on Publications and coordinator of the DataPoints column.

Table 1. Critical Values for C for Testing for Two Clusters

  n    α = .10    α = .05    α = .01
  5     15.10      24.00      74.10
  6      9.84      14.10      33.10
  7      7.66      10.50      20.90
  8      6.46       8.39      15.10
  9      5.68       7.18      11.70
 10      5.14       6.34       9.89
 11      4.75       5.77       8.66
 12      4.45       5.34       7.78
 13      4.21       5.00       7.11
 14      4.01       4.73       6.59
 15      3.85       4.51       6.15
 16      3.71       4.31       5.82
 17      3.59       4.15       5.53
 18      3.49       4.01       5.29
 19      3.40       3.89       5.08
 20      3.32       3.78       4.91
 21      3.25       3.69       4.74
 22      3.19       3.61       4.59
 23      3.13       3.53       4.47
 24      3.08       3.46       4.35
 25      3.03       3.40       4.25
 30      2.84       3.16       3.86
 35      2.71       2.99       3.59
 40      2.62       2.86       3.39
 45      2.54       2.77       3.24
 50      2.48       2.69       3.12

This article appears in the issue of Standardization News.