MQM Council releases Multi-Range Theory of Translation Quality Evaluation with new scoring

The new MQM update offers a breakthrough. Translation Quality Evaluation (TQE) is the cornerstone of any translation and localization process, and it has only grown in importance with the advent of machine translation, and of neural machine translation (NMT) in particular. In an effort to standardize TQE efforts, the MQM (Multidimensional Quality Metrics) framework was developed and has been extended by a group of translation quality experts.

MQM provides a model for analytical translation quality evaluation (TQE). It was first introduced shortly before the arrival of NMT and was originally published as a deliverable of the EU-funded QTLaunchpad project. Since 2018, the widely used DQF subset of MQM has been improved by the MQM Council and updated to become MQM Core, along with its expanded MQM Full variant.

The original metrics were mostly implemented as scorecards that contained only a raw scoring model. The raw scoring model had disadvantages: its scores were not very human-readable or customizable, and they were difficult to compare and use because threshold levels varied between metrics.

The new linear calibrated scoring model enables implementers to create metrics that are comparable across content types, use cases and service levels. Calibrating the metric means setting quality thresholds that are relevant to the customer’s expectations and specific use cases. This approach is reflected in error tolerance limits that are much easier to understand and more flexible, making the PASS/FAIL decision clearer and more human-readable.
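The calibration idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MQM Council's published formula: the severity weights, the parameter names (`reference_word_count`, `tolerated_penalty`, `passing_threshold`) and their default values are all hypothetical. The sketch assumes the customer states how many penalty points are tolerable per reference word count, and that this tolerance is mapped linearly onto a passing threshold on a 0 to 100 scale.

```python
from dataclasses import dataclass

# Hypothetical severity weights; a real metric defines its own.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 25}


@dataclass
class CalibratedMetric:
    """Illustrative linear calibrated metric (all defaults are assumptions)."""

    reference_word_count: int = 1000   # sample size the tolerance refers to
    tolerated_penalty: float = 10.0    # penalty points tolerated per reference sample
    passing_threshold: float = 90.0    # score that corresponds to the tolerance limit
    max_score: float = 100.0

    def score(self, severities, evaluated_word_count):
        # Sum the weighted penalties for all annotated errors.
        absolute_penalty = sum(SEVERITY_WEIGHTS[s] for s in severities)
        # Normalize to the reference word count so samples of different sizes compare.
        normed_penalty = absolute_penalty * self.reference_word_count / evaluated_word_count
        # Linear calibration: the tolerated penalty lands exactly on the threshold.
        scale = (self.max_score - self.passing_threshold) / self.tolerated_penalty
        return self.max_score - normed_penalty * scale

    def passes(self, severities, evaluated_word_count):
        return self.score(severities, evaluated_word_count) >= self.passing_threshold


if __name__ == "__main__":
    metric = CalibratedMetric()
    errors = ["minor", "minor", "minor", "major"]  # 8 penalty points
    print(metric.score(errors, evaluated_word_count=2000))
    print(metric.passes(errors, evaluated_word_count=2000))
```

With these assumed settings, 8 penalty points in a 2,000-word sample normalize to 4 points per 1,000 words, which is within the tolerance of 10, so the sample passes. Changing `tolerated_penalty` or `passing_threshold` recalibrates the same metric for a different content type or service level without touching the error annotations.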

In addition to the MQM Error Typology itself, the MQM Council now offers two separate scoring models: the linear calibrated scoring model for medium-sized text samples and the non-linear scoring model for very large samples.

Although many users have previously used the linear scoring model, it has failed to provide the same consistent results for text that was either very large or very small. The non-linear model takes into account the fact that human perception changes throughout the process of content consumption. Human tolerance for error decreases sharply with increasing sample size. This means that the rater’s perception of content quality may become more subjective over time and may deviate from actual statistical TQE results achieved using a scoring formula. The non-linear scoring model is based on MQM’s standard analytical approach and typology, but it introduces a logarithmic function to define the score, reflecting this non-linearity of human perception. The non-linear scoring model can produce accurate scores over a wide range of sample sizes, from small to infinite.
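To make the contrast with a linear model concrete, here is a toy Python comparison. It is explicitly not the formula from the MQM Council's paper, which is not reproduced in this article. The function names, the `per_reference` tolerance, and the particular choice of `log(1 + n / reference)` are assumptions chosen only to show how a logarithmic curve makes the tolerated error count grow more slowly than the sample size.

```python
import math


def linear_tolerance(word_count, per_reference=10.0, reference=1000):
    """Tolerated penalty points if tolerance grows in proportion to sample size."""
    return per_reference * word_count / reference


def log_tolerance(word_count, per_reference=10.0, reference=1000):
    """Hypothetical logarithmic tolerance curve (an assumption, not the MQM formula).

    It is scaled so that both curves agree at the reference word count.
    """
    return per_reference * math.log(1 + word_count / reference) / math.log(2)


if __name__ == "__main__":
    for words in (100, 1000, 10_000, 100_000):
        print(words, round(linear_tolerance(words), 2), round(log_tolerance(words), 2))
```

Under this assumed curve, the two models agree at the reference size, the logarithmic model tolerates fewer penalty points than the linear one on larger samples, and it tolerates slightly more on smaller ones. That is the qualitative behaviour the non-linear model is described as capturing.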

The MQM Council has also addressed the problem of low Inter-Rater Reliability among human linguists evaluating very small samples (such as one sentence). For very small samples, the MQM Council recommends methods used in Statistical Quality Control (SQC).

Today, we are pleased to announce that after 18 months of close collaboration, the MQM Council working group has developed and published a full and detailed document that describes these problems and proposes methods to solve them depending on the size of the TQE samples.

The paper is freely available on ArXiv at:

In particular, this paper explains why it is impossible to judge translation quality from a sample as small as a single sentence, and it proposes the SQC method as the solution. SQC does not provide a quality rating as such. Instead, it provides a risk assessment that estimates the likelihood of producer risk and consumer risk.
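The notion of producer and consumer risk can be illustrated with a textbook acceptance-sampling calculation. The sketch below is a generic SQC example, not the specific procedure from the MQM Council's paper. It assumes that each evaluated segment is independently either acceptable or defective (a binomial model), and the function names and the example error rates are hypothetical.

```python
from math import comb


def acceptance_probability(sample_size, max_defects, defect_rate):
    """Probability that a sample contains at most `max_defects` defective segments."""
    return sum(
        comb(sample_size, k) * defect_rate**k * (1 - defect_rate) ** (sample_size - k)
        for k in range(max_defects + 1)
    )


def risks(sample_size, max_defects, acceptable_rate, unacceptable_rate):
    """Return (producer_risk, consumer_risk) for a simple accept/reject plan.

    producer_risk: chance of rejecting a translation whose true defect rate is acceptable.
    consumer_risk: chance of accepting a translation whose true defect rate is unacceptable.
    """
    producer_risk = 1 - acceptance_probability(sample_size, max_defects, acceptable_rate)
    consumer_risk = acceptance_probability(sample_size, max_defects, unacceptable_rate)
    return producer_risk, consumer_risk


if __name__ == "__main__":
    # One sentence, accepted only if it has no defect (rates are assumed values).
    print(risks(sample_size=1, max_defects=0, acceptable_rate=0.05, unacceptable_rate=0.20))
    # A larger sample with a small allowance for defects.
    print(risks(sample_size=50, max_defects=5, acceptable_rate=0.05, unacceptable_rate=0.20))
```

With a single sentence and the assumed rates, a translation in which one segment in five is defective would still be accepted 80% of the time. A larger sample with a small allowance for defects brings both risks down, which is why the result is expressed as a risk level rather than as a quality score.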

The MQM Council hopes that this work will both generate new research and promote change in TQE processes, enabling language experts to work faster and clients to have real confidence in the results of the assessment.

The MQM Substack newsletter has been established to share news, quality measurement research (including AI), reviews, opinions, comments, colleagues’ work and developments in the MQM universe.

Subscribe to MQM Matters Substack by visiting this page:
