The standard will focus on developing a comprehensive, universal, and at the same time simple language quality assessment methodology that makes it possible to quickly and reliably assess the language quality of non-profit, publicly available content using crowdsourcing. The standard will concentrate on the most essential, holistic quality factors and will outline an approach that yields trustworthy and consistent results reflecting actual translation quality, despite the inevitable subjectivity of human perception and the natural limitations of crowdsourcing. Both the methodology and the quality metric will use an intentionally simplified approach based on selecting the factors most important for a brief analysis (and for human perception in general) and on strictly separating objective issues from semi-objective ones to improve the reliability of the results obtained. The standard will be applicable to a wide variety of content and subject matter areas, including both written and multimedia materials, but will not apply to interpreting services. Apart from generating robust results quickly and at minimal cost, the methodology could also serve as an introductory step for deciding whether to proceed with a detailed professional quality assessment.
Existing translation metrics are numerous and diverse; most of them are rather complex, require serious reviewer training and effort, and thus cannot be applied in a crowdsourcing context. The most notable problems with publicly available metrics include the following:

- The majority of metrics concentrate on technical errors, such as typos, grammar issues, broken functionality, violations of country standards, etc., and pay little or no attention to holistic issues, such as the adequacy between the meaning and message conveyed by the source and translated content, or the readability of the translated content as a whole, even though these tend to be the most important factors in human perception.

- No publicly available metric takes into account the nature of quality issues; all of them ignore the fact that some assessments (whether provided by a human reviewer or obtained through automated methods) are not completely objective by design. This methodological oversight results in low reliability (and wide statistical variance) of actual quality assessments in most cases.

- There is no universally accepted, complete, and thorough catalogue of language quality issues (potential problems). The first such attempt is currently being undertaken (work item WK46396), and the effort is based largely on the public MQM project. Its primary (and most commendable) target is establishing and standardizing an underlying framework of well-defined language quality issues. But neither that project nor the submission concentrates on quality metrics or methodology as such; rather, they are intended to serve as a universal foundation for building quality metrics.
It is important to emphasize that simply proposing a minimized, stripped-down quality issue framework for crowdsourcing projects is not sufficient; this framework needs to be paired with a methodology that makes it possible to interpret the obtained data properly, produces reliable results, clearly outlines the inherent limitations of the model, and minimizes effort. This work item will describe a clear-cut combination of a comprehensive methodology and a simplified quality metric that overcomes the deficiencies of existing metrics listed above and exploits the natural advantage of crowdsourcing, namely obtaining a statistical set of results instead of a single (human or automated) assessment, to compensate for factors such as the inherent lack of complete objectivity in certain key areas of quality assessment, the reduced issue framework, and minimal reviewer training. The methodology/metric combination outlined above is specifically targeted at obtaining quick, inexpensive, and reliable quality assessments of a wide variety of publicly available resources such as websites, audio/video content, etc. It is expected to provide a common, universal, and sufficiently objective approach for a high-exposure area where subjectivity currently rules. It would complement the detailed and all-embracing quality foundation framework described in work item WK46396.
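The core statistical idea, pooling many lightweight crowd assessments while treating objective and semi-objective findings differently, can be illustrated with a short sketch. This is a hypothetical illustration, not part of the proposed standard: the 1-5 rating scale, the reviewer-agreement threshold, and the aggregation rules (median for semi-objective holistic scores, maximum across reviewers for objective issue counts) are all assumptions chosen for the example.

```python
from statistics import median, pstdev

def aggregate_assessments(holistic_ratings, objective_issue_counts):
    """Pool crowdsourced quality assessments (hypothetical scheme).

    holistic_ratings: per-reviewer semi-objective scores (e.g. adequacy/
        readability on an assumed 1-5 scale); the median damps outliers
        caused by individual subjectivity.
    objective_issue_counts: per-reviewer counts of objective issues
        (typos, grammar errors, broken functionality); the maximum across
        reviewers approximates the union of independently found defects.
    """
    score = median(holistic_ratings)
    spread = pstdev(holistic_ratings)      # wide spread signals weak consensus
    objective = max(objective_issue_counts)
    reliable = spread <= 1.0               # hypothetical agreement threshold
    return {"holistic_score": score,
            "objective_issues": objective,
            "reliable": reliable}

# Five reviewers rate the same translated page; one outlier rating of 2
# barely shifts the pooled result.
result = aggregate_assessments([4, 5, 4, 2, 4], [1, 0, 2, 1, 1])
```

The point of the sketch is the separation the work item calls for: semi-objective judgments are only trusted as a statistical aggregate with an explicit reliability flag, while objective issues are counted at face value.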
Keywords: language quality assurance, language quality assessment, language quality evaluation, crowdsourcing, simplified evaluation
The title and scope are in draft form and are under development within this ASTM Committee.
Draft Under Development