Terminology analysis (Automated and Human) & clean up

Terminology and Machine Translation

What is the relationship between neural networks and terminology? Do engines have a mechanism to decide which word fits best within a given context? How do engines treat polysemy, that is, words that bear multiple meanings, sometimes in entirely unrelated semantic fields? And how do they handle ambiguity or homonyms?

In truth, MT engines do not cope well with terminology.

There is a simple reason for this: neural networks learn from large corpora of data. The bigger, the better, which is why very large models tend to improve more than smaller ones do. However, the machine does not recognize the meaning of a word; it only learns statistical patterns of how words are used together. Terminological ambiguity and polysemy can therefore be a real problem for a translation. Homonyms pose a further problem when speech-to-text conversion is linked to automatically translated output (textual or spoken).

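To clarify, consider the following examples. The first is an English word with two unrelated meanings; the others are English terms with several possible German translations: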

Arm can be a body part or a division of a company.

Accelerator – Verknüpfung, Shortcut

Alignment – Einheitlichkeit, Ausrichtung

Area – Zone, Fläche, Flächeninhalt


This is just a taster of the intricacies of terminology.
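
As a rough illustration of automated terminology analysis, the sketch below flags terms in a source text that have more than one candidate translation. The glossary reuses the examples above; the `keyboard` entry, the function name `find_ambiguous_terms` and the sample sentence are assumptions made for this example only.

```python
import re

# Hypothetical mini-glossary built from the examples above:
# each English term maps to its candidate German translations.
GLOSSARY = {
    "accelerator": ["Verknüpfung", "Shortcut"],
    "alignment": ["Einheitlichkeit", "Ausrichtung"],
    "area": ["Zone", "Fläche", "Flächeninhalt"],
    "keyboard": ["Tastatur"],  # assumed single-translation entry, for contrast
}


def find_ambiguous_terms(text, glossary=GLOSSARY):
    """Return glossary terms found in `text` that have several candidate translations."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = {}
    for word in words:
        candidates = glossary.get(word, [])
        if len(candidates) > 1:
            hits[word] = candidates
    return hits


flagged = find_ambiguous_terms("Check the alignment of the keyboard in this area.")
for term, candidates in flagged.items():
    print(f"{term}: {', '.join(candidates)}")
```

A list like this only tells a human reviewer where to look; choosing the right candidate still depends on context.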

Puns can get lost in translation, and superficially fluent texts can stumble on terms that do not fit the context, causing amusement or, worse, mistakes.

Processing terminology within machine translation is a complex undertaking. Since a neural network cannot understand terminology, there are two possible approaches:

  1. Add pre- and post-processing steps to the automatic machine translation of your content.
  2. Restrict the data used to train the engine to non-ambiguous terms.
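
One common way to realize the first approach is to mask known terms with placeholders before translation and restore the approved target terms afterwards. The following is a minimal sketch under stated assumptions: `TERM_BASE`, the placeholder format and `fake_translate` (a stand-in that returns its input unchanged instead of calling a real engine) are all hypothetical.

```python
import re

# Hypothetical approved term pairs for one project (source term -> required target term).
TERM_BASE = {"alignment": "Ausrichtung", "area": "Fläche"}


def protect_terms(text, term_base):
    """Pre-processing: replace each known term with a numbered placeholder."""
    mapping = {}

    def mask(match):
        token = f"__TERM{len(mapping)}__"
        mapping[token] = term_base[match.group(0).lower()]
        return token

    pattern = r"\b(" + "|".join(map(re.escape, term_base)) + r")\b"
    return re.sub(pattern, mask, text, flags=re.IGNORECASE), mapping


def restore_terms(translated, mapping):
    """Post-processing: swap each placeholder for the approved target term."""
    for token, target in mapping.items():
        translated = translated.replace(token, target)
    return translated


def fake_translate(text):
    """Stand-in for a real MT engine call; it returns the text unchanged."""
    return text


masked, mapping = protect_terms("Measure the area and check the alignment.", TERM_BASE)
output = restore_terms(fake_translate(masked), mapping)
print(masked)
print(output)
```

In practice an engine may alter or drop placeholders, and the restored term may need inflection or a matching article, so the post-processing step usually requires additional checks.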

The approach that will work best for your specific case depends on many variables, including project specifications, language pairs and content type.
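
For the second approach, one possible reading is to drop sentence pairs whose source side contains a term judged ambiguous before the engine is trained. The sketch below assumes a hypothetical `AMBIGUOUS_TERMS` set and a toy two-pair corpus invented for illustration.

```python
import re

# Hypothetical set of source terms judged ambiguous for this project.
AMBIGUOUS_TERMS = {"arm", "accelerator", "alignment", "area"}


def is_unambiguous(source_sentence, ambiguous_terms=AMBIGUOUS_TERMS):
    """True if the source sentence contains none of the ambiguous terms."""
    words = set(re.findall(r"[a-z]+", source_sentence.lower()))
    return words.isdisjoint(ambiguous_terms)


def filter_corpus(pairs, ambiguous_terms=AMBIGUOUS_TERMS):
    """Keep only (source, target) pairs whose source side is unambiguous."""
    return [(src, tgt) for src, tgt in pairs if is_unambiguous(src, ambiguous_terms)]


# Toy parallel corpus, invented for this example.
corpus = [
    ("Press the key.", "Drücken Sie die Taste."),
    ("Select the area.", "Wählen Sie die Fläche aus."),
]
kept = filter_corpus(corpus)
print(kept)
```

Filtering this way shrinks the training data, so it trades coverage for consistency; whether that trade is worthwhile depends on the variables mentioned above.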

