    ASTM F3263 - 17

    Standard Guide for Packaging Test Method Validation

    Active Standard ASTM F3263 | Developed by Subcommittee: F02.50

    Book of Standards Volume: 15.10

    Significance and Use

    4.1 Addressing consensus standards with inter-laboratory studies (ILS) and methods specific to an organization. In many cases, test methods need to be validated before their results can be relied upon. This validation has to be done by the organization performing the tests. Validation is also performed during the development of standards through ILS; however, an ILS is not a substitute for the validation work to be performed by the organization performing the test.

    4.1.1 Validations at the Testing Organization—Validations at the organization performing the test include planning, executing, and analyzing the studies. Planning should include a description of the scope of the test method, covering the test equipment and the measurement range of the samples it will be used for, the rationale for the choice of samples, the number of samples, and the rationale for the choice of methodology.

    4.1.2 Objective of ILS Studies—ILS studies (per E691-14) are not focused on the development of test methods but rather on gathering the information needed for a test method precision statement after the development stage has been successfully completed. The data obtained in the interlaboratory study may, however, indicate that further effort is needed to improve the test method. Precision in this case is defined as the repeatability and reproducibility of a test method, commonly known as gage R&R. For interlaboratory studies, repeatability deals with the variation associated with one appraiser operating a single test system at one facility, whereas reproducibility is concerned with the variation between laboratories, each with its own unique test system. It is important to understand that if an ILS is conducted in this manner, reproducibility between appraisers and between test systems within the same laboratory is not assessed.

    4.1.3 Overview of the ILS Process—Essentially, the ILS process consists of planning, executing, and analyzing studies that are meant to assess the precision of a test method. From an ASTM perspective, the required steps are: create a task group, identify an ILS coordinator, create the experimental design, execute the testing, analyze the results, and document the resulting precision statement in the test method. For more detail on how to conduct an ILS, refer to E691-14.
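    The core precision calculations behind an E691-14 analysis can be sketched for a single material as follows. This is a minimal sketch, assuming a balanced design in which every laboratory reports the same number of replicate results; it omits the h and k consistency statistics and the data screening that E691-14 also calls for. The function name `ils_precision` and the example data are hypothetical. The 2.8 multiplier for the 95% limits follows the E177-14 convention.

```python
import math
from statistics import mean, stdev


def ils_precision(results: dict[str, list[float]]) -> dict[str, float]:
    """Repeatability/reproducibility statistics for ONE material in an ILS.

    `results` maps a (hypothetical) laboratory ID to that laboratory's
    replicate test results. A balanced design is assumed.
    """
    cells = list(results.values())
    if len(cells) < 2:
        raise ValueError("need results from at least two laboratories")
    n = len(cells[0])  # replicates per laboratory
    if n < 2 or any(len(c) != n for c in cells):
        raise ValueError("each laboratory needs the same number (>= 2) of replicates")

    cell_averages = [mean(c) for c in cells]
    cell_sds = [stdev(c) for c in cells]

    s_x = stdev(cell_averages)                      # spread of laboratory averages
    s_r = math.sqrt(mean(s * s for s in cell_sds))  # repeatability (within-lab) SD
    # Reproducibility (between-lab) SD; by convention never reported below s_r.
    s_R = max(math.sqrt(s_x**2 + s_r**2 * (n - 1) / n), s_r)

    return {
        "average": mean(cell_averages),
        "s_r": s_r,
        "s_R": s_R,
        "r": 2.8 * s_r,  # 95% repeatability limit
        "R": 2.8 * s_R,  # 95% reproducibility limit
    }


example = {
    "Lab A": [10.0, 10.2, 10.1],
    "Lab B": [10.4, 10.6, 10.5],
    "Lab C": [9.9, 10.1, 10.0],
}
stats = ils_precision(example)
print({k: round(v, 3) for k, v in stats.items()})
```

    Because each laboratory here uses its own test system, `s_R` mixes laboratory, appraiser, and equipment effects; as noted above, it does not separate appraisers or test systems within one laboratory.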

    4.1.4 Writing Precision and Bias Statements—When writing Precision and Bias Statements for an ASTM standard, the minimum expectation is that the Standard Practice outlined in E177-14 will be followed. However, in some cases it may also be useful to present the information in a form that is more easily understood by the user of the standard. Examples can be found in 4.1.5 below.

    4.1.5 Alternative Approaches to Analyzing and Stating Results—Variable Data:

    Capability Study:

    (1) A process capability (Ppk) greater than 2.00 indicates that the total variability (part-to-part plus test method) of the test output is very small relative to the tolerance. Mathematically,

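    $$P_{pk} = \min\left(\frac{USL - \mu}{3\sigma_{Total}},\ \frac{\mu - LSL}{3\sigma_{Total}}\right) > 2.00$$

    where μ is the mean of the test output and USL and LSL are the upper and lower specification limits.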

    (2) Note that σTotal in the above equation includes both σPart and σTM (σTotal² = σPart² + σTM²). Therefore, two conclusions can be made:

    (a) The test method can discriminate at least 1/12 of the tolerance: Ppk > 2.00 implies USL − LSL > 12σTotal, and σTM ≤ σTotal. Hence the test method resolution is adequate, and no additional analysis such as a Gage R&R Study is necessary.

    (b) The measurement is precise relative to the specification tolerance.

    (3) In addition, because the TMV capability study involves two or more operators using one or more test systems, a high capability value demonstrates consistent test method performance across operators and test systems.

    Gage R&R Study:

    (1) The proposed acceptance criteria below for %SV, %R&R, and %P/T are derived from widely adopted industry requirements for measurement systems. According to the Automotive Industry Action Group (AIAG) Measurement System Analysis Manual (4th edition, p. 78), a test method can be accepted if the test method variation (σTM) accounts for less than 30 percent of the total variation of the study (σTotal).

    (2) This is equivalent to:

    $$\%SV = \frac{\sigma_{TM}}{\sigma_{Total}} \times 100\% < 30\%$$

    (3) When historical data are available to evaluate the variability of the process (σProcess), the following should also hold:

    $$\%R\&R = \frac{\sigma_{TM}}{\sigma_{Process}} \times 100\% < 30\%$$

    (4) For %P/T, another widely accepted industry practice is to represent the population by the middle 99% of the normal distribution (footnote 5). Ideally, the tolerance range of the output should be wider than this interval. For a normally distributed population, this indicates:

    $$USL - LSL \ge 5.15\,\sigma_{Total}$$

    (5) The factor 5.15 in the above equation is the width, in standard deviations, of the middle 99% of a normal distribution (twice the two-sided 99% z-score of 2.576). Therefore:

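    $$\%P/T = \frac{5.15\,\sigma_{TM}}{USL - LSL} \times 100\% \le \frac{\sigma_{TM}}{\sigma_{Total}} \times 100\% < 30\%$$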

    (6) In practice this means that a test method with up to 6% P/T reproducibility would be effective at assessing the P/T for a given design.

    Power and Sample Size Study:

    (1) When comparing the means of two or more populations using statistical tests, excessive test method variability may obscure the real difference (“Signal”) and decrease the power of the statistical test. As a result, a large sample size may be needed to maintain adequate power (≥ 80%) for the statistical test. When the sample size becomes too large to be acceptable from a business perspective, the test method should be improved before the comparative test is run. Therefore, an accept/reject decision on a test method used for comparative testing could be based on its impact on the power and sample size of the comparative test (for example, a two-sample t-test).
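    The three variable-data approaches above (capability, Gage R&R, and power and sample size) can be sketched numerically as follows. This is a minimal, hypothetical sketch: the function names, the example specification of 1.0 to 1.5, and the standard deviations are illustrative assumptions, not values from this guide. `sample_size_per_group` uses the normal approximation for a two-sided, two-sample comparison of means; exact t-based calculations, such as those in commercial statistics packages, return slightly larger sample sizes.

```python
import math
from statistics import NormalDist


def ppk(mu: float, sigma_total: float, lsl: float, usl: float) -> float:
    """Process capability: distance from the mean to the nearer spec limit, in 3-sigma units."""
    return min(usl - mu, mu - lsl) / (3 * sigma_total)


def gage_rr_ratios(sigma_tm: float, sigma_total: float,
                   lsl: float, usl: float) -> tuple[float, float]:
    """Return (%SV, %P/T); each is compared with the 30% criterion."""
    pct_sv = 100 * sigma_tm / sigma_total
    pct_pt = 100 * 5.15 * sigma_tm / (usl - lsl)  # 5.15 sigma spans the middle 99%
    return pct_sv, pct_pt


def sample_size_per_group(delta: float, sigma_part: float, sigma_tm: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Units per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) * sigma / delta)^2.
    """
    sigma = math.hypot(sigma_part, sigma_tm)  # observed SD = sqrt(part^2 + test method^2)
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2
    return math.ceil(n)


# Hypothetical specification of 1.0 to 1.5 with a centered process mean.
print(ppk(mu=1.25, sigma_total=0.04, lsl=1.0, usl=1.5))  # above 2.00
print(gage_rr_ratios(sigma_tm=0.01, sigma_total=0.04, lsl=1.0, usl=1.5))
# Test method noise equal to part-to-part noise doubles the required sample size.
print(sample_size_per_group(delta=1.0, sigma_part=1.0, sigma_tm=0.0))  # 16
print(sample_size_per_group(delta=1.0, sigma_part=1.0, sigma_tm=1.0))  # 32
```

    The last two calls illustrate the point made in the Power and Sample Size Study: because the required sample size scales with σPart² + σTM², reducing test method variability directly shrinks the comparative test.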

    4.2 Attribute Test Method Validation: 

    4.2.1 Objective of Attribute Test Method Validation—Attribute test method validation (ATMV) demonstrates that the training and tools provided to inspectors enable them to distinguish between good and bad product with a high degree of success. There are two criteria that are used to measure whether an ATMV has met this objective. The primary criterion is to demonstrate that the maximum escape rate, β, is less than or equal to its prescribed threshold of βmax. The parameter β is also known as Type II error, which is the probability of wrongly accepting a non-conforming device. The secondary criterion is to demonstrate that the maximum false alarm rate, α, is less than or equal to its prescribed threshold of αmax. The parameter α is also known as Type I error, which is the probability of wrongly rejecting a conforming device.

    4.2.2 Overview of the ATMV Process—This section describes how an ATMV typically works. In an attribute test method validation, a single blind study is conducted that comprises both conforming and nonconforming units. The ATMV passes when the requirements of both sampling plans are met. The first sampling plan demonstrates that the test method meets the requirements for the maximum allowable beta error (escape rate), and the second sampling plan demonstrates that the test method meets the requirements for the maximum allowable alpha error (false alarm rate). In other words, the test method is able to demonstrate that it accepts conforming units and rejects nonconforming units with high levels of effectiveness. The beta error sampling plan consists entirely of nonconforming units. The beta trials conducted by all inspectors (footnote 6) are pooled, and the total number of misclassifications (nonconforming units that were accepted) must be less than or equal to the number of failures allowed by the beta error sampling plan. The alpha error sampling plan consists entirely of conforming units. The alpha trials conducted by all inspectors are pooled, and the total number of misclassifications (conforming units that were rejected) must be less than or equal to the number of failures allowed by the alpha error sampling plan.
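    The pooled pass/fail logic described above, together with one way to size a single sampling plan, can be sketched as follows. This is a minimal sketch, assuming a binomial model and a 90% confidence level; the confidence level, the function names, and the example inspector results are hypothetical, and each organization should substitute its own confidence and reliability requirements. With zero allowed failures and a 5% maximum escape rate, the sketch yields n = 45, which matches the n = 45, a = 0 beta plan used as an example in Step 11. The alpha plan n = 160, a = 13 is the example from Step 12.

```python
from math import comb


def prob_accept(n: int, a: int, p: float) -> float:
    """Binomial probability of at most `a` misclassifications in `n` trials at error rate p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a + 1))


def min_trials(a: int, p_max: float, confidence: float) -> int:
    """Smallest n such that passing with <= a failures demonstrates error rate <= p_max."""
    n = a + 1
    while prob_accept(n, a, p_max) > 1 - confidence:
        n += 1
    return n


def plan_met(per_inspector: list[tuple[int, int]], plan: tuple[int, int]) -> bool:
    """Pool (trials, misclassifications) across inspectors and compare with plan (n, a)."""
    n_required, a = plan
    trials = sum(t for t, _ in per_inspector)
    errors = sum(e for _, e in per_inspector)
    return trials >= n_required and errors <= a


def atmv_passes(beta_results, beta_plan, alpha_results, alpha_plan) -> bool:
    """The ATMV passes only if BOTH the beta (escape) and alpha (false alarm) plans are met."""
    return plan_met(beta_results, beta_plan) and plan_met(alpha_results, alpha_plan)


beta_plan = (min_trials(a=0, p_max=0.05, confidence=0.90), 0)
print(beta_plan)  # (45, 0)
# Hypothetical results from three inspectors: (trials, misclassifications).
beta_results = [(15, 0), (15, 0), (15, 0)]   # nonconforming units wrongly accepted
alpha_results = [(54, 4), (53, 3), (53, 5)]  # conforming units wrongly rejected
print(atmv_passes(beta_results, beta_plan, alpha_results, alpha_plan=(160, 13)))  # True
```

    Pooling means a single inspector's miss counts against the whole beta plan: with a = 0, one accepted nonconforming unit from any inspector fails the ATMV, regardless of how the other inspectors performed.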

    4.2.3 ATMV Examples—Attribute test methods cover a broad range of testing. Examples of these test method categories are listed in Table 1. The right half of the table consists of test methods that return qualitative responses, and the left half of the table contains test methods that provide variable measurement data.

    4.2.4 ATMV for Variable Measurement Data—It is good practice to analyze variable test methods as variable measurement data whenever possible. However, there are instances where measurement data are more effectively treated as qualitative data. Example: A Sterile Barrier System (SBS) for medical devices with a required seal strength specification of 1.0 to 1.5 lb/in. is to be validated. A tensile tester is to be used to measure the seal strength, but it only has a resolution of 0.01 lb. As a result, the Ppk calculations typically fail, even though a seal is very rarely out of specification in production. The validation team determines that the data will need to be treated as attribute data, and therefore an ATMV will be required rather than a variable test method validation.

    4.2.5 Self-evident Inspections—This section illustrates the requirements of a self-evident inspection, as called out in the definitions above. To be considered a self-evident inspection, the defect must be both discrete in nature and require little or no training to detect; satisfying only one of these requirements is not sufficient. The following may be considered self-evident inspections:

    (1) Sensor light illuminates when the lubricity level on a wire is correct and does not light up when lubrication is insufficient – Since the test equipment creates a binary output for the inspector and the instructions are simple, this qualifies as self-evident. However, note that the test method involving the equipment still needs to be validated.

    (2) Component is present in the assembly – If the presence of the component is reasonably easy to detect, this qualifies as self-evident since the outcome is binary.

    (3) The correct component is used in the assembly – As long as the components are distinct from one another, this qualifies as self-evident since the outcome is binary.

    The following would generally not be considered self-evident inspections:

    (1) Burn or heat discoloration – Unless the component completely changes color when overheated, this inspection is going to require the inspector to detect traces of discoloration, which fails to satisfy the discrete conditions requirement.

    (2) Improper forming of S-bend or Z-bend – The component is placed on top of a template, and the inspector verifies that the component is entirely within the boundaries of the template. The bend can vary from perfectly shaped to completely out of the boundaries in multiple locations with every level of bend in-between. Therefore, this is not a discrete outcome.

    (3) No nicks on the surface of the component – A nick can vary in size from “not visible under magnification” to “not visible to the unaided eye” to “plainly visible to the unaided eye”. Therefore, this is not a discrete outcome.

    (4) No burrs on the surface of a component – Inspectors vary in the sensitivity of their touch due to calluses on their fingers, and burrs vary in their degree of sharpness and exposure. Therefore, this is neither a discrete condition nor an easy-to-train instruction.

    (5) Component is cracked – Cracks vary in length and severity, and inspectors vary in their ability to see visual defects. Therefore, this is neither a discrete outcome nor an easy-to-train instruction.

    4.2.6 ATMV Steps:

    Step 1 – Prepare the test method documentation:

    (1) Make sure equipment qualifications have been completed or are at least in the validation plan to be completed prior to executing the ATMV.

    (2) Examples of equipment settings to be captured in the test method documentation include environmental or ambient conditions, magnification level on microscopes, lighting and feed rate on automatic inspection systems, pressure on a vacuum decay test and lighting standards in a cleanroom, which might involve taking lux readings in the room to characterize the light level.

    (3) Work with training personnel to create pictures of the defects. It may be beneficial to also include pictures of good product and less extreme examples of the defect, since the spectrum of examples will provide better resolution for decision making.

    (4) Where possible, the visual design standards should be shown at the same magnification level as will be used during inspection.

    (5) Make sure that the ATMV is run using the most recent visual design standards and that they are good representations of the potential defects.

    Step 2 – Establish acceptance criteria:

    (1) Identify which defects need to be included in the test.

    (2) Use scrap history to identify the frequency of each defect code or type. Alternatively, this information can be provided by the subject matter expert (SME).

    (3) Do not try to squeeze too many defects into a single inspection step. As more defects are added to an inspection process, inspectors will eventually reach a point where they are unable to check for everything, and this threshold may also show itself in the ATMV testing. Limits will vary by the type of product and test method, but for visual inspection, 15-20 defects may be the maximum number that is attainable.

    Step 3 – Determine the required performance level of each defect:

    (1) If the ATMV testing precedes completion of a risk analysis, the suggested approach is to use a worst-case outcome or high-risk designation. This needs to be weighed against the increase in sample size associated with the more conservative rating.

    (2) Failure modes that do not have an associated risk index may be tested to whatever requirements are agreed upon by the validation team. If a component or assembly can be scrapped for a particular failure mode, good business sense is to make sure that the inspection is effective by conducting an ATMV.

    (3) Pin gages are an example of a variable output that is sometimes treated as attribute data due to poor resolution combined with tight specification limits. In this application, inspectors are trained prior to the testing to understand the level of friction that is acceptable versus unacceptable.

    (4) Incoming inspection is another example of where variable data are often treated as attribute data. Treating variable measurements as pass/fail outcomes can allow for less complex measurement tools, such as templates, and require less training for inspectors. However, these benefits should be weighed against the additional samples that may be required and the information lost. For instance, attribute data treat samples centered between the specification limits as no different from samples just inside the specification limits. This could result in greater downstream costs and more difficult troubleshooting for yield improvements.

    Step 4 – Determine acceptance criteria:

    (1) Refer to your company’s predefined confidence and reliability requirements; or

    (2) Refer to the chart example in Appendix X1.

    Step 5 – Create the validation plan:

    (1) Determine the proportion of each defect in the sample.

    (a) While a rationale should be provided for how the defect proportions are distributed in the ATMV, there is some flexibility in choosing the proportions. Therefore, different strategies may be employed for different products and processes, for example, 10 defective parts in 30 or 20 defective parts in 30. The cost of the samples, along with the risk associated with incorrect outcomes, affects decision making.

    (b) Scrap production data will often not be available for new products. In these instances, use historical scrap from a similar product or estimate the expected scrap proportions based on process challenges that were observed during development. Another option is to represent all of the defects evenly.

    Step 6 – Determine the number of inspectors and devices needed:

    (1) When the number of trials is large, consider employing more than three inspectors to reduce the number of unique parts required for the test. More inspectors can inspect the same parts without adding more parts to achieve additional trials and greater statistical power.

    (2) Inspectors are not required to all look at the same samples, although this is probably the simplest approach.

    (3) For semi-automated inspection systems that are sensitive to fixture placement or setup by the inspector, multiple inspectors should still be employed for the test.

    (4) For automated inspection systems that are completely inspector-independent, only one inspector is needed. However, in order to reduce the number of unique parts needed, consider controlling other sources of variation such as various lighting conditions, temperature, humidity, inspection time, day/night shift, and part orientations.

    Step 7 – Prepare the Inspectors:

    (1) Train the inspectors prior to testing:

    (a) Explain the purpose and importance of ATMV to the inspectors.

    (b) Inspector training should be a two-way process. The validation team should seek feedback from the inspectors on the quality and clarity of visual standards, pictures and written descriptions in the inspection documentation.

    (1) Are there any gray areas that need clarification?

    (2) Would a diagram be more effective than an actual picture of the defect?

    (c) Review borderline samples. Consider adding pictures/diagrams of borderline samples to the visual standards. In some cases there may be a difference between functional and cosmetic defects. This may vary by method/package type.

    (d) Some validation teams have performed dry run testing to characterize the current effectiveness of the inspection. Note that the same samples should not be used for dry run testing and final testing if the same inspectors are involved in both tests.

    Step 8 – Select a representative group of inspectors as the test group:

    (1) There will be situations, such as site transfer, where all of the inspectors have about the same level of familiarity with the product. If this is the case, select the test group of inspectors based on other sources of variability within the inspectors, such as their production shift, skill level or years of experience with similar product inspection.

    (2) The inspectors selected for testing should at least have familiarity with the product, or this becomes an overly conservative test. For example, a lack of experience with the product may result in an increase in false positives.

    (3) Document that a varied group of inspectors was selected for testing.

    Step 9 – Prepare the Test Samples:

    (1) Collect representative units.

    (a) Be prepared for ATMV testing by collecting representative defect devices early and often in the development process. Borderline samples are particularly valuable to collect at this time. However, be aware that a sample that cannot even be agreed upon as good or bad by the subject matter experts is only going to cause problems in the testing. Instead, choose samples that are representative of “just passing” and “just failing” relative to the acceptance criteria.

    (2) Use best judgment as to whether man-made defect samples adequately represent defects that naturally occur during, for example, the sealing process, distribution simulation, or other manufacturing processes. If a defect cannot be adequately replicated and/or its occurrence rate is too low to provide samples for the testing, the defect type may be omitted from the testing with documented rationale.

    (3) Estimate from a master plan how many defects will be necessary for testing, and try to obtain 1.5 times the estimated number of samples required for testing. This will allow for weeding out broken samples and less desirable samples.

    (4) Traceability of samples may not be necessary. The only requirement on samples is that they accurately depict conformance or the intended nonconformance. However, capturing traceability information may be helpful for investigational purposes if there is difficulty validating the method or if it is desirable to track outputs to specific non-conformities.

    (5) Preferably, more than one SME should confirm the status of each sample in the test. Keep in mind that a trainer or production supervisor may also be an SME on the process defect types.

    (6) Select a storage method appropriate for the particular sample. Potential options include tackle boxes with separate labeled compartments, plastic resealable bags and plastic vials. Refer to your standardized test method for pre-conditioning requirements.

    (7) Writing a secret code number on each part successfully conceals the type of defect, but it is NOT an effective means of concealing the identity of the part. In other words, if an inspector is able to remember the identification number of a sample and the defect they detected on that sample, then the test has been compromised the second time the inspector is given that sample. If each sample is viewed only once by each inspector, then placing the code number on the sample is not an issue.

    (8) Video testing is another option for some manual visual inspections, especially if the defect has the potential to change over time, such as a crack or foreign material.

    (9) If the product is extremely long or large, such as a guidewire, guide catheter, pouch, tray, or container closure system (jar and lid), and the defects of interest are only in a particular segment of the product, the pertinent segment may be detached from the rest of the sample. If factors such as length or delicacy are part of what makes the full product challenging to inspect, then the full product should be used. Example: a leak test, where liquid in the package could affect the test result.

    (10) Take pictures or videos of samples with defects and store them in a key for future reference.

    Step 10 – Develop the protocol:

    (1) Suggested protocol sections

    (a) Purpose and scope.

    (b) Reference to the test method document being validated.

    (c) A list of references to other related documents, if applicable.

    (d) A list of the types of equipment, instruments, fixtures, etc. used for the TMV.

    (e) TMV study rationale, including:

    (1) Statistical method used for TMV;

    (2) Characteristics measured by the test method and the measurement range covered by the TMV;

    (3) Description of the test samples and the rationale;

    (4) Number of samples, number of operators, and number of trials;

    (5) Data analysis method, including any historical statistics that will be used for the data analysis (for example, the historical average for calculating %P/T with a one-sided specification limit).

    (f) TMV acceptance criteria.

    (g) Validation test procedures (for example, sample preparation, test environment setup, test order, data collection method, etc.).

    (h) Methods of randomization

    (1) There are multiple ways to randomize the order of the samples. In all cases, store the randomized order in another column, then repeat and append the second randomized list to the first stored list for each sample that is being inspected a second time by the same inspector.

    (2) Consider using Excel, Minitab, or an online random number generator to create the run order for the test.

    (3) Draw numbers from a container until the container is empty and record the order.

    (i) Some companies apply time limits to each sample or a total time limit for the test so that the testing is more representative of the fast-paced requirements of the production environment. If used, this should be noted in the protocol.

    Step 11 – Execute the protocol:

    (1) Be sure to comply with the pre-conditioning requirements during protocol execution.

    (2) Avoid situations where the inspector is hurrying to complete the testing. Estimate how long each inspector will take and plan ahead to start each test with enough time in the shift for the inspector to complete their section, or communicate that the inspector will be allowed to go for lunch or break during the test.

    (3) Explain to the inspector which inspection steps are being tested. Clarify whether there may be more than one defect per sample. However, note that more than one defect on a sample can create confusion during the testing.

    (4) If the first person fails to correctly identify the presence or absence of a defect, it is a business/team decision on whether to continue the protocol with the remaining inspectors. Completing the protocol will help characterize whether the issues are widespread, which could help avoid failing again the next time. On the other hand, aborting the ATMV right away could save considerable time for everyone.

    (5) It is not good practice to change the sampling plan during the test if a failure occurs (footnote 7). For instance, if the original beta error sampling plan was n = 45, a = 0, and a failure occurs, updating the sampling plan to n = 76, a = 1 during the test is incorrect, since the sampling plan actually being performed is a double sampling plan with n1 = 45, a1 = 0, r1 = 2, n2 = 31, a2 = 1. This results in an LTPD of 5.8% rather than the 5.0% LTPD of the original plan.

    (6) Be prepared with replacement samples in reserve if a defect sample becomes damaged.

    (7) Running the test concurrently with all of the test inspectors is risky, since the administrator will be responsible for keeping track of which inspector has each unlabeled sample.

    (8) Review misclassified samples after each inspector to determine whether the inspector might have detected a defect that the prep team missed.

    Step 12 – Analyze the test results:

    (1) Scrapping for the wrong defect code or defect type:

    (a) There will be instances where an inspector describes a defect with a word that wasn’t included in the protocol. The validation team needs to determine whether the word used is synonymous with any of the listed names for this particular defect. If not, then the trial fails. If the word matches the defect, then note the exception in the deviations section of the report.

    (2) Excluding data from calculations of performance:

    (a) If a defect is discovered after the test is complete, there are two suggested options. First, the inspector may be tested on a replacement part later if necessary. Alternatively, if the results of the individual trial will not alter the final result of the sampling plan, then the replacement trials can be bypassed. This rationale should be documented in the deviations section of the report.

    (1) As an example, consider an alpha sampling plan of n = 160, a = 13 that is designed to meet a 12% alpha error rate. After all inspectors had completed the test, it was determined that one of the conforming samples had a defect; five of the six trials on this sample identified this defect, while one of the six called it a conforming sample. The results of the six trials need to be excluded, but do they need to be repeated? If the remaining 154 conforming trials have few enough failures to still meet the required alpha error rate of 12%, then no replacement trials are necessary. The same rationale would also apply to a defective sample in a beta error sampling plan.

    (2) If a vacuum decay test sample that should have failed the leak test passes, the protocol may specify sending the sample back to the company that created the defective sample to confirm that it is indeed still defective. If it is found to no longer be representative of the desired defect type, the sample is excluded from the calculations.

    Step 13 – Complete the validation report:

    (1) When the validation test passes:

    (a) If the ATMV was difficult to pass or it requires special inspector training, consider adding an appraiser proficiency test to limit those who are eligible for the process inspection.

    (2) When the validation test fails:

    (a) Repeating the validation

    (1) There is no restriction on how many times an ATMV can fail. However, common sense should be applied, as a high number of attempts appears to be a test-until-you-pass approach and could become an audit flag. Therefore, a good rule of thumb is to perform a dry run or feasibility assessment prior to execution to optimize appraiser training and test methodology, in order to reduce the risk of failing the protocol. If an ATMV fails, members of the validation team could take the test themselves. If the validation team passes, then something is not being communicated clearly to the inspectors, and additional interviews are needed to identify the confusion. If the validation team also fails the ATMV, this is a strong indication that the visual inspection or attribute test method is not ready for release.

    (b) User Error

    (1) Examples of ATMV test error include:

    (a) Microscope set at the wrong magnification.

    (b) Sample traceability compromised during the ATMV due to a sample mix-up.

    (2) A test failure demonstrates that the variability among inspectors needs to be reduced. The key is to understand why the test failed, correct the issue and document rationale, so that subsequent tests do not appear to be a test-until-you-pass approach.

    (3) As much as possible, the same samples should not be used for the subsequent ATMV if the same inspectors are being tested that were in the previous ATMV.

    (4) Interview any inspectors who committed classification errors to understand if their errors were due to a misunderstanding of the acceptance criteria or simply a miss.

    (5) To improve the proficiency of defect detection/test methodology the following are some suggested best practices:

    (a) Define an order of inspection in the work instruction for the inspectors, such as moving from proximal end to distal end or doing inside then outside.

    (b) When inspecting multiple locations on a component or assembly for specific attributes, provide a visual template with ordered numbers to follow during the inspection.

    (c) Transfer the microscope picture to a video screen for easier viewing.

    (d) If there are too many defect types to look for at one inspection step, some may get missed. Move any inspections not associated with the process back upstream to the process that would have created the defect.

    (6) When an inspector has misunderstood the criteria, the need is to better differentiate good and nonconforming product. Here are some ideas:

    (a) Review the visual standard of the defect with the inspectors and trainers.

    (b) Determine whether a diagram might be more informative than a photo.

    (c) Change the magnification level on the microscope.

    (d) If an ATMV is failing because borderline defects are being wrongly accepted, slide the manufacturing acceptance criteria threshold to a more conservative level. This will potentially increase the alpha error rate, which typically has a higher error rate allowance anyway, but the beta error rate should decrease.

    (7) Consider using an attribute agreement analysis to help identify the root cause of the ATMV failure, as it is a good tool to assess the agreement of nominal or ordinal ratings given by multiple appraisers. The analysis will calculate both the repeatability of each individual appraiser and the reproducibility between appraisers, similar to a variable gage R&R.

    Step 14 – Post-Validation Activities:

    (1) Test Method Changes

    (a) If requirements, standards, or test methods change, the impact of the other two factors needs to be assessed.

    (b) As an example, many attribute test methods, such as visual inspection, have no impact on the form, fit, or function of the device being tested. It is therefore easy to overlook that changes to the test method criteria documented in design prints, visual design standards, and visual process standards need to be closely evaluated for their potential impact on the performance of the device.

    (c) A good practice is to bring together representatives from operations and design to review the proposed change and consider potential outcomes of the change.

    (d) For example, if the initial visual inspection standards used during design verification builds are changed, some defects may no longer be identified before the product goes through distribution simulation. Stresses that were missed during this initial inspection may be exacerbated by exposure to the shock, vibration, and thermal cycling associated with the distribution simulation process. Thus, it is important to understand the impact that changes to the visual standards used upstream may have on downstream inspections.

    (2) Augmented Test Method Validation—Sometimes a new defect is identified after the ATMV has already been validated. There are a variety of ways to validate detection of the new failure mode.

    (a) Option #1 – Repeat the entire validation with the addition of the new criterion.

    (1) Advantages: The end result is a complete, stand-alone validation that completely represents the final inspection configuration.

    (2) Disadvantages: This is an excessive level of work that amounts to revalidation of the entire ATMV, all for the addition of a single inspection criterion. Furthermore, if the ATMV fails for one of the pre-existing failure modes, this brings the validated ATMV into question, as well as historical production devices that were approved by this test method.

    (b) Option #2 – Run a fully powered ATMV with only the defect associated with the new criterion.

    (1) Advantages: Since the ability to detect the other defects has already been validated, this approach keeps the focus on the newly introduced criterion.

    (2) Disadvantages: The distribution of defect types in the test samples should be based on a proportional representation of historical scrap. Therefore, an ATMV comprising only the one defect would completely overwhelm the sample sizes of the other defect codes in the original ATMV. Secondly, if only a limited number of inspectors are available, finding enough nonconforming samples quickly becomes a burdensome effort.

    (3) Note—Consider doing a risk-based analysis of the defect type to determine the appropriate number of samples.

    (c) Option #3 – Run an augmented ATMV with only the new defect at a smaller sample size.

    (1) Advantages: This approach combines the advantages of the first two options without the drawbacks. The focus is on the defect with the new inspection criteria but at sample sizes that sufficiently exercise the knowledge of the inspectors. The augmented report can point back at the original ATMV and the updated MVR, so that future ATMVs are aware of the new defect to include in future reports.

    (2) Disadvantages: One could argue that the inspectors are not tested on a mix of new and old defects at the same time. However, the inspectors are still presented a mix of conforming and nonconforming parts in order to evaluate both alpha and beta error rates, so an augmented approach still challenges the inspectors’ understanding of the visual requirements.

    (d) Option #4 – Run an augmented ATMV based on additional process knowledge gained over time.

    (1) Advantages: May reduce sample sizes, because data collected over time give a better understanding of the true occurrence rate of defects.

    (2) Disadvantages: May increase sample sizes, depending on what the additional data reveal about the occurrence rate of defects.

    (3) Leveraging of Attribute Test Methods—Leveraging is a smart, efficient approach to use when it’s appropriate to do so. In particular, leveraging should be considered when the assembly being inspected is identical or sufficiently similar to a previously validated assembly. This might occur when an assembly is being used in a next-generation device or when a product is being transferred to another site.

    (a) Suggested ATMV requirements for leveraging:

    (1) The same tools, inspection criteria, and training methods are used. For instance, microscopes should be set at the same magnification level. If the inspection is being relaxed or tightened for the new device, then the ATMV should be repeated.

    (2) Minimum performance requirements still need to be met for the new product. So if the defect on the old product was rated as Low Risk Index but the same defect on the new product is considered to be a Medium or High Risk, then the existing ATMV cannot be leveraged unless the original ATMV was tested to High Risk requirement levels.

    (3) There cannot be any changes or “improvements” to the inspection process or acceptance criteria. One man’s clarification is another’s confusion.

    (4) A protocol is not necessarily required when leveraging; only a report is needed. The message here is that there should be a consistent way of documenting that the test method has been leveraged, but that this activity doesn’t require the protocol to be touched. Instead, just the original test method report can be updated to say that an additional product, production line or site is also using this validated test method.
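    Several of the ATMV activities above depend on alpha (false reject) and beta (false accept) error rates and on how many samples a repeated or augmented ATMV needs. The sketch below tallies both error rates from inspection results recorded as (true status, inspector call) pairs, and computes the zero-failure "success-run" binomial sample size, n >= ln(1 - C) / ln(1 - p). The record format and function names are hypothetical, and the success-run formula is one common way to size an attribute study, not necessarily the basis of this guide's sample size requirements.

```python
import math

def atmv_error_rates(results):
    """Return (alpha, beta) from (truth, call) pairs, each "pass" or "fail".

    alpha: share of conforming units that inspectors rejected.
    beta:  share of nonconforming units that inspectors accepted.
    """
    conforming = [call for truth, call in results if truth == "pass"]
    nonconforming = [call for truth, call in results if truth == "fail"]
    alpha = conforming.count("fail") / len(conforming)
    beta = nonconforming.count("pass") / len(nonconforming)
    return alpha, beta

def success_run_n(max_error_rate, confidence):
    """Smallest n for which zero misses in n trials shows the error rate
    is at most max_error_rate with the stated confidence, i.e. the
    smallest n with (1 - p) ** n <= 1 - C."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error_rate))

# Hypothetical results: 20 conforming and 10 nonconforming inspections.
results = ([("pass", "pass")] * 18 + [("pass", "fail")] * 2
           + [("fail", "fail")] * 9 + [("fail", "pass")] * 1)
print(atmv_error_rates(results))
print(success_run_n(0.05, 0.95))  # 59
```

    A zero-miss plan is the simplest case; plans that tolerate one or more misses need a larger n, found from the full binomial distribution.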

    4.3 Variable Test Method Validation: 

    4.3.1 Objective of variable test method validation: A variable test method should be validated before assuming the test results are accurate and precise. Test method variation exists in all testing and should be assessed for its impact on the test results and/or product specification tolerance. The purpose of running variable test method validation is to:

    (1) Demonstrate that the precision of the test method is adequate for the test characteristic being measured.

    (2) Provide objective evidence of consistent operation of test equipment and consistent performance of test procedures. The variable TMV covers the precision requirement for using a gage or test equipment. Although accuracy of the gage or test equipment is not explicitly defined in this document, it should be evaluated per the calibration procedure and equipment qualification process of your company or the equipment manufacturer’s recommendations. The successful outcome of a variable TMV, combined with the completion of calibration and equipment qualification, will ensure that the test method (including test equipment) makes accurate and precise measurements.

    4.3.2 Test Method Variation: In general, observed test results include two sources of variation: part-to-part variation and test method variation. The relationship between the observed, part-to-part, and test method variation can be described mathematically:

    $$\sigma^2_{\text{observed}} = \sigma^2_{\text{part-to-part}} + \sigma^2_{\text{test method}}$$

    Or, graphically as shown in Fig. 1.

    (1) Capability Study

    (a) In the absence of a statistically derived sample size, collect a minimum of n=30 measurement data points from two or more appraisers.

    (b) Existing data can be used if they contain a minimum of 30 samples; in this case, a TMV protocol is not required. However, the TMV report should include a reference to the original data sources.

    (c) If a test method is to be performed by a single designated appraiser, data solely from this appraiser may be used to meet the n=30 sample size requirement. However, the test method document and test method validation report should clearly state this limitation. To add trained appraisers to the test method later, the data from the original appraisers must be reassessed, and the test method document and test method validation report revised accordingly.
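    This passage does not state which statistic a capability study reports. A common choice is a process performance index such as Ppk, and the sketch below assumes that; the function name and the choice of Ppk are illustrative, not requirements of the guide. It enforces the n=30 minimum from (a) and works with one or two specification limits.

```python
import statistics

def capability_summary(data, lsl=None, usl=None):
    """Mean, SD and Ppk for a capability-study data set (n >= 30).

    Ppk is taken against whichever specification limits are supplied.
    """
    if len(data) < 30:
        raise ValueError("at least 30 measurements are expected")
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD over all appraisers' readings
    sides = []
    if usl is not None:
        sides.append((usl - mean) / (3 * sd))
    if lsl is not None:
        sides.append((mean - lsl) / (3 * sd))
    if not sides:
        raise ValueError("supply at least one specification limit")
    return {"n": len(data), "mean": mean, "sd": sd, "ppk": min(sides)}

data = [9.0, 11.0] * 15  # hypothetical readings
print(capability_summary(data, lsl=4.0, usl=16.0))
```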

    (2) Gage Repeatability and Reproducibility Study

    (a) A gage repeatability and reproducibility (Gage R&R) study is a statistical method to:

    (1) Estimate the repeatability, reproducibility and total variability of a test method;

    (2) Assess whether the precision of a test method is adequate relative to the tolerance range of the parts or products being measured, or to the overall process variation or the total study variation.
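    The estimates in (1) and the ratio in (2) are commonly obtained with a two-way crossed ANOVA (parts × operators, with repeated trials). The sketch below shows one standard formulation; the guide does not prescribe a particular analysis. It assumes a balanced, non-destructive design with at least two parts, two operators, and two trials, sets negative variance estimates to zero, and computes %P/T as 6σGRR over the tolerance range (two-sided) or 3σGRR over the distance between a single specification limit and a historical mean (one-sided). Function and argument names are hypothetical.

```python
import math
from statistics import fmean

def gage_rr(y, lsl=None, usl=None, hist_mean=None):
    """Crossed Gage R&R by two-way random-effects ANOVA.

    y[p][o][t] is the reading for part p, operator o, trial t. The design
    must be balanced, with at least 2 parts, 2 operators and 2 trials.
    """
    P, O, T = len(y), len(y[0]), len(y[0][0])
    grand = fmean(v for part in y for op in part for v in op)
    cell = [[fmean(y[p][o]) for o in range(O)] for p in range(P)]
    part_m = [fmean(cell[p]) for p in range(P)]
    oper_m = [fmean(cell[p][o] for p in range(P)) for o in range(O)]

    # Sums of squares for parts, operators, interaction and error.
    ss_part = O * T * sum((m - grand) ** 2 for m in part_m)
    ss_oper = P * T * sum((m - grand) ** 2 for m in oper_m)
    ss_int = T * sum((cell[p][o] - part_m[p] - oper_m[o] + grand) ** 2
                     for p in range(P) for o in range(O))
    ss_err = sum((v - cell[p][o]) ** 2
                 for p in range(P) for o in range(O) for v in y[p][o])

    ms_part = ss_part / (P - 1)
    ms_oper = ss_oper / (O - 1)
    ms_int = ss_int / ((P - 1) * (O - 1))
    ms_err = ss_err / (P * O * (T - 1))

    # Variance components from expected mean squares; negatives set to 0.
    var_rep = ms_err
    var_int = max(0.0, (ms_int - ms_err) / T)
    var_oper = max(0.0, (ms_oper - ms_int) / (P * T))
    var_part = max(0.0, (ms_part - ms_int) / (O * T))
    var_repro = var_oper + var_int
    sd_grr = math.sqrt(var_rep + var_repro)

    out = {
        "sd_repeatability": math.sqrt(var_rep),
        "sd_reproducibility": math.sqrt(var_repro),
        "sd_grr": sd_grr,
        "sd_part": math.sqrt(var_part),
    }
    if lsl is not None and usl is not None:
        # Two-sided %P/T: 6-sigma test method spread over the tolerance.
        out["pct_pt"] = 100 * 6 * sd_grr / (usl - lsl)
    elif hist_mean is not None and (lsl is not None or usl is not None):
        # One-sided %P/T: 3-sigma spread over distance from historical mean.
        limit = usl if usl is not None else lsl
        out["pct_pt"] = 100 * 3 * sd_grr / abs(limit - hist_mean)
    return out
```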

    (b) Sample Size of a Gage R&R Study

    (1) The goal of a Gage R&R study is to estimate σrepeatability and σreproducibility. Sample size affects how precisely these standard deviations (or variances) are estimated:

    Sample size: # of parts, # of operators, and # of trials, which together drive the estimation of σrepeatability and σreproducibility.

    (2) Generally speaking, the larger the sample size, the more precise the standard deviation estimate. This is because, for normally distributed data, the sample variance scaled by its degrees of freedom (df · s²/σ²) follows a chi-square distribution, with the degrees of freedom determined by the sample size. A larger sample size (more degrees of freedom) produces a chi-square distribution with relatively less spread and therefore a narrower confidence interval for the standard deviation.

    (3) A detailed mathematical assessment of Gage R&R sample size, described in the reference book Design and Analysis of Gage R&R [8], supports the following conclusions:

    $$(\text{\# of parts}) \times (\text{\# of operators}) \times (\text{\# of trials} - 1) \geq 15$$

    Using 3 or more operators in a Gage R&R study is recommended to reduce the risk of an inflated estimate of reproducibility, which has a tighter threshold.
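    The degrees-of-freedom condition in (3) and the chi-square reasoning in (2) can be checked numerically. This sketch (hypothetical helper names) finds the smallest number of trials that meets the 15-degree-of-freedom condition and shows how the confidence interval for σrepeatability narrows as degrees of freedom grow. It assumes normally distributed readings and, to stay within the Python standard library, uses the Wilson-Hilferty approximation in place of exact chi-square quantiles. Table 3 of the guide remains the reference for minimum trials.

```python
import math
from statistics import NormalDist

def repeatability_df(parts, operators, trials):
    """Repeatability degrees of freedom in a crossed Gage R&R study."""
    return parts * operators * (trials - 1)

def min_trials(parts, operators, min_df=15):
    """Smallest trial count giving at least min_df repeatability df."""
    return 1 + math.ceil(min_df / (parts * operators))

def chi2_quantile(q, df):
    """Wilson-Hilferty approximation to a chi-square quantile."""
    if df < 3:
        raise ValueError("this approximation is used only for df >= 3")
    z = NormalDist().inv_cdf(q)
    c = 2 / (9 * df)
    x = df * (1 - c + z * math.sqrt(c)) ** 3
    if x <= 0:
        raise ValueError("approximation breaks down for this df/quantile")
    return x

def sd_interval(s, df, confidence=0.95):
    """Approximate CI for sigma, from df * s**2 / sigma**2 ~ chi-square(df)."""
    tail = (1 - confidence) / 2
    lower = s * math.sqrt(df / chi2_quantile(1 - tail, df))
    upper = s * math.sqrt(df / chi2_quantile(tail, df))
    return lower, upper

for parts, operators in [(3, 2), (3, 3), (5, 3)]:
    trials = min_trials(parts, operators)
    df = repeatability_df(parts, operators, trials)
    lo, hi = sd_interval(1.0, df)
    print(f"{parts} parts x {operators} operators: {trials} trials, df={df}, "
          f"sigma CI for s=1: {lo:.2f} to {hi:.2f}")
```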

    (c) Non-destructive Test Method

    (1) Use a minimum of 3 test samples for a Gage R&R study. If the scope of the TMV includes multiple products/models/tabs, use one of the following three approaches to select samples for testing:

    (a) Use samples of the product/model/tab that provide the greatest challenge toward the TMV acceptance criteria (for example, the product that has high test method variability and the tightest tolerance range).

    (b) Use the bracketing technique. Run two or more Gage R&R studies, each for a product/model/tab that represents a possible extreme condition. Using this strategy is necessary to adequately assure the test method performance across the whole spectrum bracketed by those extreme conditions.

    (c) Pool samples of different products/models/tabs in one gage R&R study. This approach applies when the products/models/tabs being pooled have the same tolerance, or the test method variation across different products/models/tabs can be assumed comparable.

    (2) Select at least 2 appraisers from a pool of all qualified test method appraisers at random or covering a broad range of experience levels.

    (3) Follow Table 3 to determine the minimum number of trials (repeated readings) per each appraiser and each test sample.

    (4) Randomize the test order and prepare a datasheet if necessary.
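    Step (4) can be supported by a simple randomized datasheet. The sketch below assumes the study is run appraiser by appraiser and trial by trial, reshuffling the sample order each time to reduce order effects and recall of earlier readings; fully randomizing across appraisers is an alternative when they test concurrently. The function name and row layout are hypothetical. Passing a fixed seed makes the sheet reproducible, which is convenient when it is attached to the protocol.

```python
import random

def run_order(samples, appraisers, trials, seed=None):
    """Build a randomized run sheet of (appraiser, trial, sample) rows.

    Every appraiser measures every sample once per trial; the sample order
    is reshuffled for each appraiser and trial so no fixed sequence repeats.
    """
    rng = random.Random(seed)
    sheet = []
    for appraiser in appraisers:
        for trial in range(1, trials + 1):
            order = list(samples)
            rng.shuffle(order)
            sheet.extend((appraiser, trial, s) for s in order)
    return sheet

for row in run_order(["S1", "S2", "S3"], ["A", "B"], trials=2, seed=42):
    print(*row)
```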

    (d) Destructive Test Method

    (1) For a destructive test method, it is difficult to separate the test method repeatability from the part-to-part variation. However, both the test method repeatability and reproducibility can still be assessed in a destructive Gage R&R study. Four methods can be used:

    (a) Use surrogates. A surrogate is a substitute for the actual product/part that is representative of the test method intended use. The surrogate can either be measured repeatedly as a non-destructive test, or the surrogate’s part-to-part variability might be small enough compared to the repeatability component of the measurement error that it has a minimal impact on the total measurement error. To use a surrogate, the description of the surrogates and justification of their representativeness of the test method intended use should be included in the TMV protocol. The number of surrogate samples, number of appraisers, and number of trials should meet the requirement in Table 3. For example, using magnets of different strengths to represent the range of seal strength to be studied is one possible approach.

    (b) Use master groups. A master group (also known as master batch, master lot) is a collection of homogeneous units with small part-to-part variation (for example, units taken from a single manufacturing batch). In a Gage R&R study, each master group will be treated as a hypothetical “test part” and every unit within the master group will be treated as a trial. Notice that under this method, the estimated repeatability will include not only the true test method repeatability, but also the part-to-part variation within the master groups. To use the master group method for a Gage R&R study, the master groups should be clearly defined and documented in the TMV protocol. The number of master groups, number of appraisers, and number of trials are suggested in Table 3.

    (c) If master group homogeneity cannot be presumed, a Gage R&R analysis can still be run with the master group method to estimate the test method reproducibility. However, the test method repeatability will be overestimated because it includes a large part-to-part variation within the master groups. In this case, one can run a separate study using surrogates, standards, or other techniques to estimate the test method repeatability. The total standard deviation of the test method is then calculated according to the Root Sum Square (RSS) [9] method:

    $$\sigma_{\text{test method}} = \sqrt{\sigma^2_{\text{repeatability}} + \sigma^2_{\text{reproducibility}}}$$

    (d) In some cases, when measurement of a test sample is not completely repeatable, changes in subsequent measurements can be characterized using engineering or statistical models. These models may help to estimate the true test method repeatability. This method often involves advanced scientific or statistical techniques, and users should consult a subject matter expert (SME) for assistance and approval.

    Note 1: For commonly used destructive packaging test methods, such as ASTM F88 Heat Seal Peel Testing, it is important to understand that when performing a test method validation on a tensile tester (the instrument used for this method), the concern is the variation in the method, not in the material. Thus, where possible, use materials with as little variation as possible that still cover the range of force that will be tested.
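    Method (c) combines a reproducibility estimate from the master group study with a repeatability estimate from a separate study (surrogates, standards, or similar). The RSS step itself is short; the sketch assumes the two components are independent, which is what justifies adding their variances. The function name and numbers are illustrative.

```python
import math

def rss_test_method_sd(sd_repeatability, sd_reproducibility):
    """Combine independently estimated SD components by root sum square."""
    return math.sqrt(sd_repeatability ** 2 + sd_reproducibility ** 2)

# Repeatability 0.3 N from a surrogate study; reproducibility 0.4 N from a
# master group Gage R&R (its inflated repeatability estimate is not used).
print(rss_test_method_sd(0.3, 0.4))
```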

    (3) Power/Sample Size Study—A Power/Sample Size study only applies to validating a comparative test method that is used for comparing the means of populations using statistical tests (for example, 1-sample t test, 2-sample t test, or ANOVA). The purpose of running a Power/Sample Size study is to ensure that the variability of the test method does not obscure the difference of means (the “signal”) that should be detected by the statistical test with adequate power (≥ 80%) [10].

    (a) To conduct a Power/Sample Size study:

    (1) Establish the practical significance Δ for comparing the means and document the Δ with rationale in the TMV protocol and/or report.

    (2) Estimate the total standard deviation based on a minimum of n=15 measurements. These measurements should be collected from 5 or more unique samples.

    (3) Calculate the required sample size n for the statistical test to detect Δ with 80% power.

    (4) Accept the test method if the required sample size n is acceptable, that is, practical for the intended testing.
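    Steps (1) through (4) can be sketched as a short calculation. The version below plugs the practical significance Δ from step (1) and the total standard deviation from step (2) into the normal-approximation sample size formula for a two-sided comparison of means at 80% power. The normal approximation is an assumption made to keep the example dependency-free: exact t-test power calculations, as done in statistical software, give somewhat larger n for small samples and should govern the TMV decision. Function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist

def required_n(delta, sd_total, alpha=0.05, power=0.80, two_sample=True):
    """Approximate sample size (per group) for a two-sided test of means.

    Normal approximation: n = k * ((z_{1-alpha/2} + z_power) * sd / delta)**2,
    with k = 2 for a 2-sample comparison and k = 1 for a 1-sample test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    k = 2 if two_sample else 1
    return math.ceil(k * ((z_alpha + z_power) * sd_total / delta) ** 2)

# Detecting a difference of 1.0 unit when the test method SD is 1.0:
print(required_n(1.0, 1.0))                    # 2-sample, per group
print(required_n(1.0, 1.0, two_sample=False))  # 1-sample
```

    Because n grows with the square of sd_total / Δ, halving the test method variability cuts the required sample size by roughly a factor of four, which is the practical reason for validating the method's precision before relying on comparative studies.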

    (4) Use Existing Data—Existing data can be used for variable TMV if they meet the following requirements:

    (a) Existing data should be stored in a controlled manner with full traceability to its original sources (for example, data stored in DHF; in an approved electronic lab notebook (ELN); or as an attachment to an approved change controlled document).

    (b) Existing data should be collected from measuring products or parts that are within the validated range of the test method, or representative of the test method’s intended use.

    (c) The TMV report could include the rationale for how the existing data used meet these requirements.

    Step #4 – Complete TMV Protocol:

    (1) Except when using the “Report-only” approach, a released TMV protocol is required per ISO 11607 prior to executing the validation testing. A VTMV protocol should include, at minimum, the following contents:

    (a) Purpose and scope of the TMV protocol.

    (b) Reference to the test method document being validated.

    (c) A list of references to other related documents, if applicable.

    (d) A list of the types of equipment, instruments, fixtures, etc. used for the TMV.

    (e) TMV study rationale, including:

    (1) Statistical method used for TMV;

    (2) Characteristics measured by the test method and the measurement range covered by the TMV;

    (3) Description of the test samples and the rationale;

    (4) Number of samples, number of appraisers, and number of trials;

    (5) Data analysis method, including any historical statistics that will be used for the data analysis (for example, the historical average for calculating %P/T with a one-sided specification limit).

    (f) TMV acceptance criteria.

    (g) Validation test procedures (for example, sample preparation, test environment setup, test order, data collection method, etc.).

    Step #5 – Prepare Appraisers:

    (1) Select the required number of appraisers from a pool of all qualified test method appraisers, covering a broad range of skills and experience levels.

    (2) Train appraisers on the test method prior to executing the TMV protocol. Training should include any special setup of the test environment and operation of equipment unique to the TMV for which the appraisers will be responsible prior to running the test method. Training should be documented per company training requirements.

    Step #6 – Prepare Test Samples:

    (1) Collect the required number of test samples and prepare them for validation testing:

    (a) The test samples should be within the TMV scope, except when representative surrogates or standards are used.

    (b) For TMV using the Capability Study or Power/Sample Size study, the test samples should be from a nominal build that represents the actual design and/or production process.

    (c) For TMV using a Gage R&R study, the test samples may include nominal, borderline, and out-of-specification units. This will allow for assessment of the test method performance across the tolerance range.

    (d) Prepare the test samples in a consistent manner.

    (e) Label and/or control every test sample carefully to prevent mix-up and bias.

    1. Scope

    1.1 This guide provides information to clarify the process of validating packaging test methods, addressing both methods specific to the organization utilizing them and consensus standards validated through inter-laboratory studies (ILS).

    1.1.1 ILS discussion will focus on writing and interpretation of test method precision statements and on alternative approaches to analyzing and stating the results.

    1.2 This document provides guidance for defining and developing validations for both variable and attribute data applications.

    1.3 This guide provides limited statistical guidance; however, this document does not purport to give concrete sample sizes for all packaging types and test methods. Emphasis is on statistical techniques effectively contained in reference documents already developed by ASTM and other organizations.

    1.4 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.

    1.5 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.

    2. Referenced Documents

    The documents listed below are referenced within the subject standard but are not provided as part of the standard.

    ASTM Standards

    E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods

    E456 Terminology Relating to Quality and Statistics

    E691 Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method

    E2282 Guide for Defining the Test Result of a Test Method

    E2782 Guide for Measurement Systems Analysis (MSA)

    F17 Terminology Relating to Primary Barrier Packaging

    F2097 Guide for Design and Evaluation of Primary Flexible Packaging for Medical Products

    ISO Standards

    ISO/TS 16775 Packaging for terminally sterilized medical devices - Guidance on the application of ISO 11607-1 and ISO 11607-2

    ICS Code

    ICS Number Code 55.020 (Packaging and distribution of goods in general)

    UNSPSC Code

    UNSPSC Code 24120000 (Packaging materials)

    Referencing This Standard

    DOI: 10.1520/F3263-17

    Citation Format

    ASTM F3263-17, Standard Guide for Packaging Test Method Validation, ASTM International, West Conshohocken, PA, 2017, www.astm.org
