How to Create a Test Set for Automatic Evaluation

  1. After building your baseline engine, navigate to the KantanMT Dashboard and open the KantanBuildAnalytics™ page.
  2. When no test set is uploaded to an engine, KantanBuildAnalytics automatically and randomly selects 500 segments to use as a reference set to score the quality of that engine. These segments can be found in any of the following tabs: F-Measure, BLEU, TER.
  3. Navigate to any of these three tabs, and click Download to download the preselected test set.
  4. Open the XLSX spreadsheet.
  5. Filter the columns and order by smallest BLEU score.
  6. Remove all segments that show the maximum discrepancy between F-Measure and BLEU, that is, F-Measure = 100% and BLEU = 0%.
  7. Remove all segments where the MT output and the reference are identical but the BLEU score is not 100% (use the highlight duplicates function on the MT output and reference columns).
  8. Delete the score column.
  9. Calculate the source segment length with the formula =LEN(CELLVALUE), where CELLVALUE is the cell containing the source segment.
  10. Sort by smallest source segment length and delete all segments whose length is less than 15 or more than 150 characters.
  11. Delete all unnecessary segments that might cause false positives/negatives in the scores (e.g., telephone numbers, dates, addresses, websites, etc.), or any clear and visible mismatch between source and target.
  12. Delete all segments that appear as duplicates in the source column.
  13. Copy the source and target columns into two separate text files (using a text editor), encode them in UTF-8, save them as source.test.src and source.test.trg respectively, and upload them to the Training tab. Alternatively, save the spreadsheet as test.reference.set.xlsx and upload it to the Training tab.
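
If you prefer to script the clean-up rather than do it by hand in the spreadsheet, steps 6 to 13 can be sketched in Python as follows. This is only an illustrative sketch, not a KantanMT tool. It assumes the rows have already been read from the XLSX file into dictionaries, that the column names (source, reference, mt_output, bleu, f_measure) are hypothetical and need adjusting to the real headers, that the scores are numbers between 0 and 100, and that the target column is the reference translation. The regular expression for step 11 is a rough heuristic and does not replace a manual review.

```python
import re

# Hypothetical column names; adjust them to the headers in the downloaded spreadsheet.
SRC, REF, MT, BLEU, FM = "source", "reference", "mt_output", "bleu", "f_measure"

# Rough heuristic for step 11 (an assumption): URLs, or digit runs such as
# phone numbers and dates. Mismatched source/target pairs still need a manual check.
NOISE = re.compile(r"https?://|www\.|\d[\d\s./-]{5,}\d")


def filter_test_set(rows, min_len=15, max_len=150):
    """Return (source, reference) pairs that survive steps 6 to 12."""
    pairs, seen = [], set()
    for row in rows:
        src, ref = row[SRC], row[REF]
        if row[FM] == 100 and row[BLEU] == 0:      # step 6: maximum discrepancy
            continue
        if row[MT] == ref and row[BLEU] != 100:    # step 7: identical text, BLEU below 100
            continue
        if not min_len <= len(src) <= max_len:     # steps 9-10: length limits
            continue
        if NOISE.search(src):                      # step 11: partial, heuristic only
            continue
        if src in seen:                            # step 12: keep the first occurrence
            continue
        seen.add(src)
        pairs.append((src, ref))
    return pairs


def write_test_files(pairs, src_path="source.test.src", trg_path="source.test.trg"):
    """Write one segment per line to two UTF-8 files (step 13)."""
    with open(src_path, "w", encoding="utf-8") as src_file, \
         open(trg_path, "w", encoding="utf-8") as trg_file:
        for src, ref in pairs:
            src_file.write(src + "\n")
            trg_file.write(ref + "\n")
```

Keeping the first occurrence of a duplicated source segment is a design choice made here; if you would rather drop every copy of a duplicated segment, count the sources first and skip any that occur more than once.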


If you encounter any problem or need help, please contact us at
