KantanMT can automatically pre-process training data and/or documents to be translated using PPX rules, which are search-and-replace patterns based on regular expressions (regex). When a PPX file is uploaded in the Training tab of an engine, all training data is automatically pre-processed with those rules wherever a pattern matches. The same applies to translation data when a PPX file is uploaded to the Translation tab.

This is what a typical PPX rule looks like:

Fig.1. Example of PPX rule
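
To illustrate the general idea of regex-based search and replace, here is a minimal Python sketch. It does not reproduce the PPX file syntax: the rule list, the two example patterns and the preprocess function are assumptions made up for this illustration.

```python
import re

# Hypothetical rules as (search pattern, replacement) pairs.
# These are illustrative only and are NOT written in PPX syntax.
RULES = [
    (r"\s{2,}", " "),        # collapse runs of whitespace into one space
    (r"(\d+)\s+%", r"\1%"),  # remove the space between a number and "%"
]

def preprocess(segment, rules=RULES):
    """Apply each search-and-replace rule, in order, to one segment."""
    for pattern, replacement in rules:
        segment = re.sub(pattern, replacement, segment)
    return segment

print(preprocess("Sales  grew by 12 %  this year."))
# prints: Sales grew by 12% this year.
```

A segment that matches none of the patterns is returned unchanged, which mirrors the behaviour described above: rules only alter the data where a pattern matches.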

To apply PPX rules to your training data, upload the necessary PPX file(s) to the Training tab. The file name determines which data is pre-processed: source.ppx for the source side of your training data, target.ppx for the target side, and mono.ppx for your monolingual data. To pre-process a document for translation, upload a file called source.ppx to the Translation tab.
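
The naming convention above can be summarised as a small lookup table. The Python sketch below is only a memory aid: the tab and file names come from the text, while the PPX_TARGETS dictionary and the ppx_target helper are hypothetical and not part of KantanMT.

```python
# Tab and file names as described in the text; the table and helper
# are hypothetical and exist only to summarise the convention.
PPX_TARGETS = {
    ("training", "source.ppx"): "source side of the training data",
    ("training", "target.ppx"): "target side of the training data",
    ("training", "mono.ppx"): "monolingual training data",
    ("translation", "source.ppx"): "documents to translate",
}

def ppx_target(tab, filename):
    """Return what a PPX file pre-processes, or None if the
    tab/file-name pair is not one of those listed in the text."""
    return PPX_TARGETS.get((tab.lower(), filename))

print(ppx_target("Training", "mono.ppx"))
# prints: monolingual training data
```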

Another way to add and apply PPX rules is through the KantanMT Rule Editor.


