
DataPoints


Quantile Estimation

When More than the Mean and the Standard Deviation Are Needed

Q: Which method of quantile estimation should be used?

A: Estimating a specific quantile of a population characteristic is a routine statistical task. Commonly chosen levels are the first quartile (25 percent), the second quartile (50 percent, the median), and the third quartile (75 percent). A named quantile, say the 25 percent quantile, has 25 percent of the distribution below it and 100 - 25 = 75 percent of the distribution above it. Lower or higher levels such as 1, 5, 10, 90, 95, and 99 percent are often of interest when the tail regions of a population characteristic matter more than the core of the distribution.
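The same statement can be written in symbols. For a population with cumulative distribution function F, the quantile function is defined as follows. The notation is ours, not the column's.

```latex
Q(p) = \inf\{\, x : F(x) \ge p \,\}, \qquad 0 < p < 1

\text{For continuous } F:\quad \Pr\{X \le Q(p)\} = p, \qquad \Pr\{X > Q(p)\} = 1 - p
```

With p = 0.25 this gives 25 percent of the distribution at or below Q(0.25) and 75 percent above it.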

There are two broad approaches to quantile estimation. Both use a set of sample data, but in different ways. The first is the direct approach, in which a given quantile is typically estimated from one or two specific elements of the ordered data. Statisticians consider such direct estimates nonparametric because they do not rely on the parameters of an assumed distribution. The second is the distributional approach, in which the data are used to estimate the parameters of an assumed distributional model. Those parameter estimates then allow any selected quantile to be estimated. Each approach has potential advantages and disadvantages.
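Here is a rough sketch of the two approaches in symbols. Let x_(1) ≤ ... ≤ x_(n) be the ordered sample, x̄ the sample mean, s the sample standard deviation, and Φ the standard normal CDF. The direct rule shown is only one of many: it interpolates at position (n + 1)p, which is our reading of the SAS PCTLDEF = 4 definition discussed later in the column. The distributional rule assumes a normal model.

```latex
h = (n+1)\,p, \qquad j = \lfloor h \rfloor, \qquad g = h - j

\hat{Q}_{\text{direct}}(p) =
\begin{cases}
x_{(1)}, & j < 1 \\
(1 - g)\,x_{(j)} + g\,x_{(j+1)}, & 1 \le j < n \\
x_{(n)}, & j \ge n
\end{cases}

\hat{Q}_{\text{normal}}(p) = \bar{x} + \Phi^{-1}(p)\, s
```

The direct estimate can never leave the interval from x_(1) to x_(n). The normal-model estimate, by contrast, is defined for every 0 < p < 1.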

Figure 1 was constructed by simulating seven random observations from a standard normal distribution (mean 0, standard deviation 1). The blue curve shows the true population from which the sample was drawn. Two classes of quantile estimation are illustrated: distributional and direct. The distributional method fits a normal distribution (red curve) using the sample mean and sample standard deviation of the seven measurements. The difference between the red fitted model and the blue population model is due to sampling variation. This is part of the price of having very little sample data. Despite the small sample size, a distributional method can estimate any desired quantile.
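The distributional half of Figure 1 can be mimicked in a few lines of Python. This is a minimal sketch, not the code behind the figure. The seed is arbitrary, so the numbers will not match the plotted sample, and the sketch relies only on the standard-library `statistics.NormalDist` class.

```python
import random
import statistics
from statistics import NormalDist

def fitted_normal_quantiles(sample, levels):
    """Distributional approach: fit a normal model using the sample mean and
    sample standard deviation, then read each quantile off the fitted model."""
    model = NormalDist(statistics.fmean(sample), statistics.stdev(sample))
    return {p: model.inv_cdf(p) for p in levels}

rng = random.Random(2024)                          # arbitrary seed, for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(7)]   # seven draws, as in Figure 1
levels = [0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99]

true_model = NormalDist(0.0, 1.0)                  # the population sampled from
fitted = fitted_normal_quantiles(sample, levels)
for p in levels:
    print(f"p={p:.2f}  true={true_model.inv_cdf(p):+.3f}  fitted={fitted[p]:+.3f}")
```

Every level, including 1 and 99 percent, gets an estimate, even though seven observations say little about those tails.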

The green and cyan estimates were generated by two different direct estimation methods (Excel and SAS). Each uses the ordered sample data to estimate a given quantile, drawing on either one sample value or two neighboring ordered sample values to produce any desired quantile estimate. A familiar example is the median. For an odd sample size it is estimated as the middle ordered observation, and for an even sample size as the average of the two middle ordered observations. Although the two direct methods agree on the median (the 50 percent quantile), Excel and SAS generally do not agree well at other quantile levels.
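The two direct rules can be sketched as follows. We assume, based on our reading of the vendors' documentation, that Excel's PERCENTILE (PERCENTILE.INC) interpolates at 1-based position (n - 1)p + 1. We assume SAS PCTLDEF = 4 interpolates at (n + 1)p and clamps to the sample minimum or maximum when that position falls outside 1..n. The function names and the seven-value sample are hypothetical illustrations.

```python
import math

def pctl_type6(data, p):
    """Direct estimate interpolating at 1-based position (n + 1) * p
    (our reading of SAS PCTLDEF = 4). Positions below 1, or at or above n,
    are clamped to the sample minimum or maximum."""
    x = sorted(data)
    n = len(x)
    h = (n + 1) * p
    j = math.floor(h)          # lower order-statistic index (1-based)
    g = h - j                  # interpolation weight toward x_(j+1)
    if j < 1:
        return x[0]
    if j >= n:
        return x[-1]
    return (1 - g) * x[j - 1] + g * x[j]


def pctl_type7(data, p):
    """Direct estimate interpolating at 1-based position (n - 1) * p + 1
    (our reading of Excel's PERCENTILE / PERCENTILE.INC)."""
    x = sorted(data)
    n = len(x)
    h = (n - 1) * p + 1        # always between 1 and n for 0 <= p <= 1
    j = math.floor(h)
    g = h - j
    if j >= n:                 # p == 1 lands exactly on the maximum
        return x[-1]
    return (1 - g) * x[j - 1] + g * x[j]


sample = [2.1, -0.4, 0.9, 1.7, -1.2, 0.3, 0.6]    # hypothetical data
for p in (0.1, 0.25, 0.5, 0.9):
    print(f"p={p:.2f}  (n+1)p rule: {pctl_type6(sample, p):+.3f}   "
          f"Excel-style: {pctl_type7(sample, p):+.3f}")
```

With this sample, both rules return 0.6 for the median. At p = 0.1, however, the (n + 1)p rule returns the sample minimum, -1.2, while the Excel-style rule interpolates to about -0.72. For interior levels such as quartiles, Python's `statistics.quantiles` follows the same two positioning rules through `method="exclusive"` (the default) and `method="inclusive"`, which is a convenient cross-check.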

Figure 1 — Small Sample Size, Multiple Quantile Estimation Methodologies

Because the direct estimates are also based on the sample data, they reflect whatever bias was present in the sample data collection. Hence, in Figure 1, they track the fitted distributional model more closely than the true population distribution. The SAS and Excel estimates also have something in common: neither ever produces an estimate below the observed minimum or above the observed maximum. This is problematic. For example, if one more observation were collected beyond the seven, there is a 2/(7 + 1) = 25 percent chance that it would either exceed the prior maximum or fall below the prior minimum. Such direct estimates are therefore not reasonable for a quantile beyond the range of quantiles the sample data are likely to contain. In this respect, the Excel method (the percentile function) is worse than the SAS method illustrated. Neither, however, is good for small-sample estimation of a quantile relatively close to zero or relatively close to one, i.e., in the extremes of the distribution.

There are many alternative “direct” methods. SAS, for example, offers a choice of five (the PCTLDEF option). PCTLDEF = 4 is used here and is recommended, while the SAS default, PCTLDEF = 5, is not recommended for small sample sizes. Excel's results do not match any of these five SAS definitions and appear to reflect a “unique” definition. Minitab, for example, uses the equivalent of SAS's PCTLDEF = 4 when it reports quartiles in its descriptive statistics. For small sample sizes, it is recommended that you not use the Excel percentile function.
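The 2/(n + 1) figure follows from symmetry. For n + 1 independent draws from any continuous distribution, each draw is equally likely to be the smallest or the largest. The sketch below computes the exact value and checks it by simulation. The uniform distribution, seed, and trial count are arbitrary illustrative choices; the result itself does not depend on the distribution.

```python
import random
from fractions import Fraction

def prob_new_point_outside_range(n):
    """Exact chance that an (n+1)-th i.i.d. continuous draw falls below the
    minimum or above the maximum of the first n draws: 2 / (n + 1)."""
    return Fraction(2, n + 1)

def simulate(n, trials, seed=1):
    """Monte Carlo estimate of the same probability using uniform draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        old = [rng.random() for _ in range(n)]
        new = rng.random()
        if new < min(old) or new > max(old):
            hits += 1
    return hits / trials

print(prob_new_point_outside_range(7))   # 1/4
print(simulate(7, 100_000))              # close to 0.25
```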

Figure 2 — Moderate Sample Size, Multiple Quantile Estimation Methodologies

Figure 2 was constructed by simulating 50 random observations from a nonnormal distribution. The blue curve shows the true population from which the sample was drawn. The red curve shows the result of fitting a normal distribution using the sample mean and sample standard deviation. The relatively large differences between the red fitted model and the blue population model are due to assuming normality when it is not appropriate. In this example, the direct estimation methods do a much better job than the poorly chosen normal model. If any distribution is to be fitted, the data should, at a minimum, not statistically contradict the assumed model. In addition, the differences between the SAS PCTLDEF = 4 method (cyan) and the Excel method (green) have become relatively small except in the tail regions.
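The sketch below illustrates the Figure 2 point numerically. It assumes an exponential population with mean 1, since the column does not say which nonnormal distribution was used. Over many simulated samples of 50, it compares two estimates of the 5 percent quantile: a normal-model estimate and a direct (n + 1)p estimate from `statistics.quantiles`. The population, level, seed, and replicate count are all our assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist

P = 0.05                          # lower-tail level of interest
TRUE_Q = -math.log(1 - P)         # exact 5% quantile of an Exponential(mean 1)

def one_replicate(rng, n=50):
    """Absolute errors of a normal-model and a direct estimate for one sample."""
    sample = [rng.expovariate(1.0) for _ in range(n)]
    # Distributional approach under a (wrong) normal assumption.
    normal_fit = NormalDist.from_samples(sample).inv_cdf(P)
    # Direct approach: the first of 19 cut points at 5% steps, using the
    # "exclusive" rule, which interpolates at position (n + 1) * p.
    direct = statistics.quantiles(sample, n=20, method="exclusive")[0]
    return abs(normal_fit - TRUE_Q), abs(direct - TRUE_Q)

rng = random.Random(7)            # arbitrary seed
errors = [one_replicate(rng) for _ in range(500)]
mae_normal = statistics.fmean(e[0] for e in errors)
mae_direct = statistics.fmean(e[1] for e in errors)
print(f"mean |error|, normal fit: {mae_normal:.3f}")
print(f"mean |error|, direct:     {mae_direct:.3f}")
```

A fitted normal model places the 5 percent quantile near the mean minus 1.645 standard deviations, so here it typically lands below zero, a value an exponential population never takes. The direct estimate stays near the true value of about 0.051.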

As the sample size grows, relying on any fitted distributional model for quantile estimation becomes relatively more questionable. Any lack of fit in the assumed model does not shrink with more data, whereas direct estimates keep improving. At the same time, the differences between the various definitions for estimating a quantile from ordered data become less and less relevant. A large sample size almost always means that a direct method of estimation is preferred. The one large-sample caution is that estimating an extreme tail quantile can still be problematic unless the sample is large enough to contain data in that tail.
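The shrinking gap between definitions can be seen with a deterministic stand-in for a normal sample, namely evenly spaced normal scores (our choice, for reproducibility). The sketch compares first quartiles from the (n + 1)p and (n - 1)p + 1 rules using `statistics.quantiles`. The two rules always differ by a fixed fraction of a position, while neighboring order statistics draw closer together as n grows. As a result, the gap shrinks roughly in proportion to 1/n.

```python
import statistics
from statistics import NormalDist

def quartile_gap(n):
    """Absolute difference between first quartiles from the 'exclusive'
    ((n + 1)p) and 'inclusive' ((n - 1)p + 1) interpolation rules."""
    z = NormalDist()  # standard normal
    # Deterministic, evenly spread stand-in for a normal sample of size n.
    data = [z.inv_cdf((i + 0.5) / n) for i in range(n)]
    q_excl = statistics.quantiles(data, n=4, method="exclusive")[0]
    q_incl = statistics.quantiles(data, n=4, method="inclusive")[0]
    return abs(q_excl - q_incl)

sizes = (7, 50, 1_000, 100_000)
gaps = [quartile_gap(n) for n in sizes]
for n, gap in zip(sizes, gaps):
    print(f"n={n:>7}  quartile gap={gap:.6f}")
```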

Why should quantile estimation be of interest? In short, the mean and the standard deviation are often not enough to summarize a distribution effectively. Suppose a directly estimated quantile and one estimated from an assumed distribution disagree in a way that is both statistically significant and practically meaningful. That disagreement indicates that using the distributional model is ill advised. As the sample size becomes large, direct estimation will almost always give better results.

Thomas J. Bzik, Air Products and Chemicals Inc., Allentown, Pennsylvania, is the American Statistical Association representative to ASTM Committee E11 on Quality and Statistics; he serves as E11 secretary and as vice chairman of E11.11 on Sampling/Statistics.

Dean V. Neubauer, Corning Inc., Corning, New York, is an ASTM International fellow, chairman of E11.90.03 on Publications and coordinator of the DataPoints column; he is immediate past chairman of Committee E11 on Quality and Statistics.


This article appears in the issue of Standardization News.