How to Create a Test Set for Automatic Evaluation


We recommend reading these instructions in full before you start.

  1. After building your baseline engine, go to the KantanMT Dashboard and open the KantanBuildAnalytics™ page for that engine.
  2. If no test set has been uploaded to an engine, KantanBuildAnalytics automatically selects 500 random segments to use as a reference set for scoring the quality of that engine. You can find these segments in any of the following tabs: F-Measure, BLEU, TER.
  3. Navigate to any of these three tabs, and click Download to download the preselected test set.
  4. Open the XLSX spreadsheet and click ‘Yes’ when the warning message appears.
  5. Enable filtering on the column headers.
  6. If present, remove all segments that show the maximum discrepancy between F-Measure and BLEU, that is, F-Measure = 100% and BLEU = 0%, or F-Measure = 0% and BLEU = 100%.
  7. Using the Conditional Formatting tool, highlight duplicate values in the source column and delete any segments that appear as duplicates.
  8. Clear the Conditional Formatting from the source column.
  9. Use the Conditional Formatting tool again, this time to highlight duplicate values between the MT Output and Reference columns.
  10. Remove all segments where the MT Output and Reference are identical but the BLEU score is not 100%.
  11. In an empty column, calculate source segment length using the following function: (=LEN(CELLVALUE)), where ‘CELLVALUE’ is the source segment cell. Extend your filter so that you can sort according to segment length.
  12. Sort by source segment length in ascending order and delete all segments shorter than 15 or longer than 150 characters. You should have at least 350-400 segments at the end of the cleansing process, so you may need to include segments with 12-15 or 150-200 characters to reach this number.
  13. Delete all unnecessary segments that might cause false positives/negatives in the scores (e.g., telephone numbers, dates, addresses, websites, etc.), or any clear and visible misalignments between source and target.
  14. Copy the source and reference columns into two different text files (using a text editor), making sure they are encoded in UTF-8. Then, save them respectively as source.test.src and source.test.trg.
  15. Once saved, upload them to the Training tab of your engine, ensuring there is no .txt at the end of either file name.
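
If you prefer to script the cleansing in steps 6 to 14 rather than do it by hand in a spreadsheet, the following Python sketch shows one possible approach. It is an illustration only, not a KantanMT tool: the Segment class and its field names are hypothetical, the rows are assumed to have already been read from the downloaded XLSX file, duplicates are handled by keeping the first occurrence (an assumption), and the step 13 check is a rough heuristic that only catches web addresses and segments with no letters. A manual review for misalignments is still needed.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Segment:
    """One row of the downloaded test set (field names are hypothetical)."""
    source: str
    mt_output: str
    reference: str
    f_measure: float  # percentage, 0-100
    bleu: float       # percentage, 0-100


# Rough heuristic for step 13: matches obvious web addresses only.
URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)


def clean_segments(segments, min_len=15, max_len=150):
    """Apply the filters from steps 6, 7, 10, 12 and (partly) 13."""
    kept = []
    seen_sources = set()
    for seg in segments:
        # Step 6: drop maximum F-Measure/BLEU discrepancies.
        if (seg.f_measure, seg.bleu) in ((100, 0), (0, 100)):
            continue
        # Step 7: drop duplicate sources (assumption: keep the first one).
        if seg.source in seen_sources:
            continue
        seen_sources.add(seg.source)
        # Step 10: identical MT output and reference, but BLEU below 100%.
        if seg.mt_output == seg.reference and seg.bleu != 100:
            continue
        # Step 12: keep only sources of 15 to 150 characters.
        if not min_len <= len(seg.source) <= max_len:
            continue
        # Step 13 (partial): drop web addresses and segments without letters.
        if URL_RE.search(seg.source) or not any(c.isalpha() for c in seg.source):
            continue
        kept.append(seg)
    return kept


def write_test_files(segments, out_dir="."):
    """Step 14: write source and reference as UTF-8, one segment per line."""
    out = Path(out_dir)
    src_path = out / "source.test.src"
    trg_path = out / "source.test.trg"
    src_path.write_text("".join(s.source + "\n" for s in segments), encoding="utf-8")
    trg_path.write_text("".join(s.reference + "\n" for s in segments), encoding="utf-8")
    return src_path, trg_path
```

If fewer than 350 segments remain, you could call clean_segments again with min_len=12 and max_len=200, which mirrors the relaxed limits mentioned in step 12.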


If you encounter any problem or need help, please contact us at
