Terminology for machine translation vs. terminology for corporate language maintenance - New career opportunities for terminologists?

04.01.2021

CB Multilingual attended textshuttle’s MT MeetUp “Terminology integration in machine translation: does it work in practice?” on 25.10.2021 in Zurich.

Systematic terminology management is essential for the transfer of knowledge and the maintenance and development of a uniform corporate vocabulary in all company languages

For this purpose, corporate terminology databases contain different data categories, depending on the situation, the number of languages managed and the complexity of the underlying subject areas, so that each term can be delimited and defined precisely and clearly. In certain cases, further metadata is entered, such as grammatical gender, part of speech or data identifying projects, subsidiaries and specialist subjects. To show how terms are used, it can be useful to add context sentences, annotations or references to other terminology content to the entries. Only in this way can terminology satisfy the needs of its target groups (employees within the company, external partners and, of course, translators, editors and other language experts). The terminology entries must also be updated continuously.

In this context, terminologists recommend always stating a source when recording individual terms and always providing a definition in at least one language. Existing or possible linguistic variants (forms of designations) in the source and target texts, such as abbreviations or synonyms, should be marked with a corresponding attribute. Even terms that are not to be used, because they are either not permitted or obsolete, have their place in the terminology collection: only in this way can editors recognise these terms in advance and exclude them when writing technical content.

The goals of terminology in the context of machine translation (MT) look quite different

In this context, the target groups can be limited to the MT system and the post-editors of the MT output.

The speakers at the MT Meetup gave many recommendations on which elements of a terminology database can “confuse” the MT system and should therefore be avoided:

Synonyms can lead to inconsistencies (linguistic discrepancies)
Acronyms can be responsible for a “weird” output
Terms which are designated as “preferred” or “deprecated” are not recognised
Morphologically complex terms are unlikely to be translated correctly

In addition, large terminology databases slow down the MT process.

Basically – as the experts Philip Ursprung and Janine Aeberhard explained at the MT Meetup – the MT system only needs the designations, a designation ID and a language identifier for its work.
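To make the contrast concrete, here is a minimal Python sketch of reducing a rich termbase entry to the three fields mentioned by the speakers. The class and field names (TermEntry, MTTerm, usage_status and so on) are hypothetical and chosen for illustration only; they do not reflect the schema of any particular terminology tool or MT system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TermEntry:
    """A designation as it might be stored in a corporate termbase (hypothetical schema)."""
    designation_id: str
    language: str
    designation: str
    definition: Optional[str] = None
    source: Optional[str] = None
    usage_status: Optional[str] = None  # e.g. "preferred" or "deprecated"
    context: Optional[str] = None


@dataclass(frozen=True)
class MTTerm:
    """Only the designation, its ID and a language identifier (assumed field names)."""
    designation_id: str
    language: str
    designation: str


def to_mt_term(entry: TermEntry) -> MTTerm:
    # Drop definition, source, status and context: they serve human readers,
    # not the MT system.
    return MTTerm(entry.designation_id, entry.language, entry.designation)
```

The richer entry remains the master record for human users; the reduced record would simply be an export derived from it.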

Based on these considerations, some possible solutions were briefly outlined, for example:

Filtering entries by date, length or lack of synonyms or other term types
Tagging of relevant entries
Using separate terminology databases for each MT project
If possible, extracting terminology automatically from corpora and having it reviewed by human post-editors
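The filtering and tagging ideas above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the approach presented at the MeetUp: the Designation fields, the mt_relevant tag, the default cut-off of four words and the rule of skipping any concept that still has several candidate designations in one language are all hypothetical choices.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Designation:
    """One designation of a concept in one language (hypothetical schema)."""
    concept_id: str
    language: str
    text: str
    term_type: str = "full form"   # e.g. "full form", "synonym", "acronym"
    status: str = "preferred"      # e.g. "preferred", "deprecated"
    modified: date = date(2021, 1, 1)
    mt_relevant: bool = True       # tag set by the terminologist


def select_for_mt(designations: List[Designation],
                  modified_since: date,
                  max_words: int = 4) -> List[Designation]:
    # Filter by tag, status, term type, date and length.
    candidates = [
        d for d in designations
        if d.mt_relevant
        and d.status != "deprecated"
        and d.term_type == "full form"
        and d.modified >= modified_since
        and len(d.text.split()) <= max_words
    ]
    # Keep only concepts with exactly one remaining designation per language,
    # so the MT system never has to choose between competing variants.
    per_concept = Counter((d.concept_id, d.language) for d in candidates)
    return [d for d in candidates if per_concept[(d.concept_id, d.language)] == 1]
```

Because the usage status is evaluated before the export, the MT system never needs to interpret "preferred" or "deprecated" labels itself.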

New perspectives for terminologists

The use of MT goes hand in hand with changes in the working approaches of terminology management. MT focuses on term-oriented terminology rather than on the concept-oriented terminology that corresponds to the state of the art in technology and science.

The information that the MT system needs from the terminology seems to be quite different from what translators and other target groups need.

A specialised MT terminologist could leverage the full potential of the technology and the language assets. We look forward to seeing what new developments in this field have in store for us.