Using Simple Graphics in Data Analysis
Many years ago, as a student in the statistics program at Rutgers University, I took a course entitled Interpretation of Data from Professor Ellis R. Ott. This course was very practical, as it stressed the use of graphics as an important part of data analysis. This article shares an example showing two simple graphics, the time plot and the multiple dot plot, that have proven to be particularly useful in process troubleshooting.
A test method was used in a control laboratory to determine the active content level in a powdered premix prepared by a dry blending operation. The test was conducted in duplicate, and the two test results were averaged and reported. The absolute difference between the two results (denoted as the range in statistical terminology) had to be less than three units in order for the assay to be valid. A similar limit is known within ASTM as the repeatability limit, with symbol r, the value below which the range of duplicate test results is expected to fall approximately 95 percent of the time.
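The validity rule for duplicates can be sketched in a few lines of Python. This is only an illustration: the function names, the example values and the default limit of 3.0 units are assumptions made here, not the laboratory's actual procedure. The second function uses the usual relationship between the repeatability limit and the repeatability standard deviation, r = 1.96 x sqrt(2) x s_r, or about 2.8 s_r.

```python
import math


def evaluate_duplicates(result_a, result_b, range_limit=3.0):
    """Return (average, range, is_valid) for one pair of duplicate results."""
    average = (result_a + result_b) / 2
    duplicate_range = abs(result_a - result_b)
    # The assay counts as valid only when the range is below the limit.
    is_valid = duplicate_range < range_limit
    return average, duplicate_range, is_valid


def repeatability_limit(s_r):
    """Approximate 95 percent limit on the range of two results (about 2.8 * s_r)."""
    return 1.96 * math.sqrt(2) * s_r


# Hypothetical duplicate results, for illustration only.
print(evaluate_duplicates(50.0, 52.0))  # (51.0, 2.0, True)
print(evaluate_duplicates(50.0, 54.0))  # (52.0, 4.0, False)
```

Whether a three-unit limit is reasonable depends on the repeatability standard deviation of the method, which is why the range data are worth studying before the limit is changed.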
The laboratory manager stated that this three-unit limit had been exceeded recently and asked me to look at the data with the possibility of raising the limit. I was given 46 recent analysis sheets that recorded the date of analysis, the two test results and their range, the lab operator performing the test and the blend number.
As with most process troubleshooting situations, the data occurred in a time-sequential order. A good initial graphic to use is the time plot, which simply plots the data in time order, as shown in Figure 1 for the test result ranges. (This may also be called a run chart.) For this data set, the time plot reflected the repeatability of the measurement process over time, and the plot indicated an upward shift in the test variability after the 20th time point. During this latter period, the three-unit limit on the ranges was exceeded three times. A cause for this shift in precision needed to be found.
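A time plot needs very little machinery. The sketch below draws one in plain text so that it runs without a plotting library; in practice any charting tool would do. The function name, the scale factor and the range values are hypothetical and are not the data behind Figure 1.

```python
def time_plot(values, limit, scale=4):
    """Return text lines plotting values in time order, one row per time point.

    Each value is drawn as '*' at a column proportional to its size. The limit
    is drawn as '|' so that points at or beyond it stand out. Values are
    assumed to be non-negative, as ranges always are.
    """
    limit_col = round(limit * scale)
    width = max(limit_col, round(max(values) * scale)) + 1
    lines = []
    for t, value in enumerate(values, start=1):
        row = [" "] * width
        row[limit_col] = "|"
        row[round(value * scale)] = "*"
        flag = "  <-- at or above limit" if value >= limit else ""
        lines.append(f"{t:>3} " + "".join(row).rstrip() + flag)
    return lines


# Hypothetical ranges in time order, for illustration only.
hypothetical_ranges = [0.5, 1.0, 0.25, 0.75, 2.0, 3.5, 1.5, 3.25]
print("\n".join(time_plot(hypothetical_ranges, limit=3.0)))
```

Reading down the rows in time order, a drift of the points toward the limit line is the kind of shift the time plot is meant to expose.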
The simple time plot approach may not always give such a definitive lead in troubleshooting, and the next two levels of sophistication would be a runs test (number of time points in a row above or below the average value) and a control chart analysis. A control chart is a time plot with control limits superimposed on the plot. ASTM E2587, Practice for Use of Control Charts in Statistical Process Control, contains additional material for using control charts, where the plotted points can represent individual data, averages, ranges, standard deviations, proportions or counts of events.
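Both follow-up checks are easy to sketch. The first function below counts the longest run of consecutive points on one side of the average, as described above. The second computes range chart limits using the standard control chart factors for subgroups of two (D3 = 0 and D4 = 3.267). The function names and the data are assumptions for illustration, and formal run rules and chart construction should be taken from a reference such as ASTM E2587.

```python
def longest_run(values):
    """Length of the longest run of consecutive points on one side of the average."""
    center = sum(values) / len(values)
    longest = current = 0
    previous_side = 0
    for value in values:
        # +1 if above the average, -1 if below, 0 if exactly on it.
        side = (value > center) - (value < center)
        if side != 0 and side == previous_side:
            current += 1
        else:
            current = 1 if side != 0 else 0
        previous_side = side
        longest = max(longest, current)
    return longest


def range_chart_limits(ranges, d3=0.0, d4=3.267):
    """Return (lower limit, center line, upper limit) for a range chart, n = 2."""
    r_bar = sum(ranges) / len(ranges)
    return d3 * r_bar, r_bar, d4 * r_bar


# Hypothetical ranges in time order, for illustration only.
hypothetical_ranges = [0.5, 1.0, 0.25, 0.75, 2.0, 3.5, 1.5, 3.25]
print(longest_run(hypothetical_ranges))
print(range_chart_limits(hypothetical_ranges))
```

A long run on one side of the average, or a range above the upper limit, signals a change in the measurement process even when the time plot alone is not clear-cut.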
The data sheets indicated that a single lab operator had conducted the first 20 analyses and that three additional operators appeared thereafter. This information suggested stratifying the range data by lab operator. A simple graphic for this situation is the multiple dot plot, where the data subsets for each operator are plotted on separate horizontal lines on the graph. This plot provides a rough picture of the data distribution for each operator for comparison purposes.
The ranges from the last 30 time points are depicted as a multiple dot plot in Figure 2. Operator 1 was the initial operator and Operators 2, 3 and 4 were the newly assigned ones. The data distributions indicated that the new operators had appreciably larger ranges between duplicate results than Operator 1.
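Stratifying by operator amounts to grouping the ranges by a label and drawing one row per group on a common scale. The text-only sketch below does this, showing the count of points in each bin rather than stacked dots. The function name, the bin width and the records are hypothetical and do not reproduce the data in Figure 2.

```python
from collections import defaultdict


def multiple_dot_plot(records, step=0.5):
    """Return one text row per group, on a common scale of bins of width `step`.

    `records` is a list of (label, value) pairs with non-negative values.
    Each bin shows '.' when empty, otherwise the number of points (capped at 9).
    """
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    n_bins = int(max(value for _, value in records) / step) + 1
    lines = []
    for label in sorted(groups):
        counts = [0] * n_bins
        for value in groups[label]:
            counts[int(value / step)] += 1
        row = "".join(str(min(c, 9)) if c else "." for c in counts)
        lines.append(f"{label:<12}{row}")
    return lines


# Hypothetical (operator, range) records, for illustration only.
hypothetical_records = [
    ("Operator 1", 0.5), ("Operator 1", 0.5), ("Operator 1", 1.0),
    ("Operator 2", 2.0), ("Operator 2", 3.5),
    ("Operator 3", 1.5), ("Operator 3", 3.0),
]
print("\n".join(multiple_dot_plot(hypothetical_records)))
```

Because every row shares the same horizontal scale, a group whose points sit farther to the right than the others is visible at a glance.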
In further discussion with the laboratory supervisor, it emerged that the production level of this product had increased and that manufacturing had gone from a single shift to a three-shift operation. He also stated that three more lab operators had been assigned to this test to support the increased testing load.
A meeting was scheduled with the four operators to compare how they conducted the test method. The main steps in the test method were 1) weighing out a subsample from the plant sample, 2) extracting the active content into a solvent and 3) reading the solution in an instrument. The operating procedure of the test method was well-defined for Steps 2 and 3, but the procedure for taking the subsample in Step 1 was not, so the operators had been left to their own devices in performing subsampling. The subsampling technique used by Operator 1 minimized de-mixing of the material, which would otherwise lead to different amounts of active content in the two subsamples used in the test. This technique, which gave the better precision, was adopted and written into the test method. After Operators 2-4 were trained in it, the repeatability problem was solved.
For further analysis of stratified data, additional data analysis techniques exist. A recommended reference for such techniques is the book Process Quality Control: Troubleshooting and Interpretation of Data, by Ott et al. (1). This classic book gives many examples of troubleshooting together with a number of data analysis procedures for insight into the problems.
Graphical analysis, in addition to statistical analysis, is an important component of data analysis applied to troubleshooting situations. As Yogi Berra might have said, "You can see a lot by just looking!"
1. Ott, E.R., Schilling, E.G., and Neubauer, D.V., Process Quality Control: Troubleshooting and Interpretation of Data, Fourth Edition, American Society for Quality/Quality Press, Milwaukee, Wis., 2005.
Thomas Murphy, T.D. Murphy Statistical Consulting LLC, Morristown, N.J., is chairman of Subcommittee E11.20 on Test Method Evaluation and Quality Control, a part of ASTM Committee E11 on Quality and Statistics.