Automated domain selection

The importance of selecting the correct domain

A smarter way to enrich your training data

The quality of a neural MT engine depends on the quality and quantity of the data used to train it. The smartest way to improve the datasets used to train your machine translation engine is to organize them by domain, which, for our purposes, is defined as a set of content associated with a particular subject.

More often than not, we encounter training data that is unorganized, uncleaned, drawn from many different domains, and of mixed content types. Training on such data does not yield good results.

Our automated domain selection service can help you generate a custom dataset based on domain-specific content. For example, we can use the translation memory of your product manual or e-commerce catalogue, which is domain-specific by definition, to retrieve pertinent data from a larger, unorganized set of translation memory entries comprising millions of segments from unspecified domains.

The result is a richer training dataset, which can be used to train your custom neural MT engine in the most effective way possible.
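To make the retrieval idea above more concrete, here is a minimal sketch of one common way to select in-domain segments: score each segment in the large pool by its lexical similarity to a small domain-specific seed and keep those above a cutoff. This is an illustration only and not necessarily how our service works. The function name `select_in_domain`, the tiny stopword list, the TF-IDF weighting, and the `threshold` value are all assumptions chosen for the example, and it looks at source-language text only, whereas real translation memories hold bilingual segment pairs.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "is", "be", "when", "must"}


def tokenize(text):
    """Lowercase, split on word characters, and drop stopwords."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS]


def select_in_domain(seed_segments, pool_segments, threshold=0.2):
    """Return pool segments whose TF-IDF cosine similarity to the seed
    meets the threshold, most similar first. Hypothetical helper."""
    pool_tokens = [tokenize(s) for s in pool_segments]
    n_docs = len(pool_tokens)

    # Document frequency of each term, counted over the pool.
    df = Counter()
    for tokens in pool_tokens:
        df.update(set(tokens))

    def idf(term):
        # Smoothed inverse document frequency; terms unseen in the pool get df = 0.
        return math.log((1 + n_docs) / (1 + df[term])) + 1.0

    def vectorize(tokens):
        return {term: count * idf(term) for term, count in Counter(tokens).items()}

    def cosine(a, b):
        dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Treat the whole seed as one document describing the domain.
    seed_vector = vectorize([t for s in seed_segments for t in tokenize(s)])

    scored = [
        (cosine(vectorize(tokens), seed_vector), segment)
        for tokens, segment in zip(pool_tokens, pool_segments)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [segment for score, segment in scored if score >= threshold]


# Example: a product-manual seed and a small mixed-domain pool (made-up data).
seed = [
    "Press the power button to turn on the printer.",
    "Replace the ink cartridge when the printer light blinks.",
]
pool = [
    "The printer cartridge must be replaced.",
    "The court dismissed the appeal.",
    "Add two cups of flour and stir.",
]
selected = select_in_domain(seed, pool)
print(selected)
```

In this toy run only the printer-related segment shares vocabulary with the seed, so it is the only one kept. At the scale of millions of segments, the same idea would typically rely on an indexed search or embedding-based similarity rather than an in-memory loop.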

To learn more about how we can assist you, @@book a consultation@@ to discuss your specific requirements.