What are BLEU, F-Measure and TER scores?

KantanMT provides a number of different methods to measure the performance of your KantanMT engine.
One of these methods is to provide industry-standard quality metrics for each engine build. These quality metrics are BLEU, F-Measure and TER.

  • BLEU: The BLEU score measures how many words in a translation overlap with a reference translation, giving higher scores to matching sequences of consecutive words. BLEU scores range from 0 to 100; the higher the score, the more closely the translation corresponds to a human translation. BLEU gives an indication of how fluent the output of an engine will be.

 

  • F-Measure: The F-Measure score measures how precisely KantanMT operates when retrieving words (precision) and how many words it can retrieve, or recall, during translation (recall). This is why it is commonly referred to as a Recall and Precision measurement. Combining these two measurements into a single score gives a good indicator of an engine's performance and its ability to translate content. F-Measure scores range from 0 to 1; the closer the score is to 1, the better the recall and precision of the translations will be. On your KantanMT dashboard, we display this score as a percentage. The F-Measure gives an indication of the quality of the translations that an engine will produce.

 

  • TER: The TER score measures the amount of editing that a translator would have to perform to change a translation so that it exactly matches a reference translation. By repeating this analysis on a large number of sample translations, it is possible to estimate the post-editing effort required for a project. TER scores also range from 0 to 1. However, unlike the other scores, a higher TER score is a sign of more post-editing effort, so the LOWER the score the better, as this indicates that less post-editing is required. Again, on your KantanMT dashboard, we display these scores as a percentage. TER gives an indication of how much post-editing will be required on the translated output of an engine.
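
To make the three definitions above more concrete, here is a minimal Python sketch of simplified, word-level versions of each metric. It is an illustration only and is not how KantanMT computes its scores. The function names (`ngram_precision`, `f_measure`, `edit_rate`) and the example sentences are hypothetical. Full BLEU typically combines several n-gram lengths and applies a brevity penalty, and full TER also counts word shifts; both are left out here, and whitespace tokenization is assumed.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, the building block of BLEU (0 to 1 here)."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def f_measure(candidate, reference):
    """Harmonic mean of word-level precision and recall (0 to 1)."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # words found in both, with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def edit_rate(candidate, reference):
    """Word-level edit distance divided by reference length.

    Simplified stand-in for TER: insertions, deletions and substitutions
    are counted, but word shifts are not.
    """
    cand, ref = candidate.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    prev = list(range(len(ref) + 1))
    for i, cand_word in enumerate(cand, 1):
        curr = [i]
        for j, ref_word in enumerate(ref, 1):
            cost = 0 if cand_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # drop a candidate word
                            curr[j - 1] + 1,      # add a missing reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1] / len(ref)


reference = "the cat sat on the mat"
candidate = "the cat is on the mat"

print(f"1-gram precision: {ngram_precision(candidate, reference, 1):.2f}")
print(f"2-gram precision: {ngram_precision(candidate, reference, 2):.2f}")
print(f"F-Measure:        {f_measure(candidate, reference):.2f}")
print(f"Edit rate:        {edit_rate(candidate, reference):.2f}")
```

In this example the candidate differs from the reference by a single word, so the F-Measure is high while the edit rate is low: one substitution out of six reference words. The drop from 1-gram to 2-gram precision shows how a single wrong word breaks more of the longer matching sequences, which is why sequence-based scores are sensitive to fluency.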

Use KantanBuildAnalytics to view these scores for a KantanMT engine.

 
