The cutting edge of machine interpreting, with its pitfalls and promises


The advent of machine interpreting marks an exciting chapter in the evolution of language services. While it may not completely replace human interpreters, machine interpreting offers tremendous potential in areas ranging from business to healthcare and beyond. By harnessing the power of artificial intelligence, we can strive for a more connected and inclusive world, breaking down language barriers and fostering global understanding.

Let us take a look at what machine interpreting is, what approaches are currently being researched to improve its performance, and what risks and limitations its use may entail.

What is machine interpreting?

Machine interpreting (MI), also known as automatic interpreting, AI interpreting or speech-to-speech translation, is the process of automatically translating spoken content from one language to another in real time. Rather than relying solely on human interpreters, it uses language processing algorithms and artificial intelligence to analyse and translate spoken language. Its aim is to enable communication and information sharing across language barriers.

What is the state of the art and what challenges does MI face?

In a recently published paper titled “The Emergence of Machine Interpreting”, Claudio Fantinuoli explores the progress made in using machine learning to develop MI. The technology has found applications in both informal and professional settings.

The paper discusses two primary approaches to MI:

  • The end-to-end approach directly translates spoken language without going through a written text stage (input audio to output audio).
  • The cascading approach, on the other hand, chains several steps: speech recognition, machine translation and the generation of spoken output (voice synthesis). Recent developments aim to simplify this process by combining the steps into a single component, with the goal of making the system more accurate and its output more natural.
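
To make the difference between the two approaches concrete, here is a minimal Python sketch of a cascading pipeline. It is an illustration only, not a system described in the paper: the class `CascadePipeline` and the `toy_*` functions are hypothetical stand-ins, and the "audio" is simply UTF-8 text encoded as bytes so that the example runs without any speech models.

```python
from dataclasses import dataclass
from typing import Callable

# In this sketch "audio" is just bytes; a real system would use waveform data.
Audio = bytes


@dataclass
class CascadePipeline:
    """Chains the three steps of the cascading approach (hypothetical names)."""
    recognise: Callable[[Audio], str]   # speech recognition: audio -> source text
    translate: Callable[[str], str]     # machine translation: source -> target text
    synthesise: Callable[[str], Audio]  # voice synthesis: target text -> audio

    def interpret(self, audio: Audio) -> Audio:
        source_text = self.recognise(audio)
        target_text = self.translate(source_text)
        return self.synthesise(target_text)


# Toy stand-ins so the sketch runs; each would be a trained model in practice.
def toy_recognise(audio: Audio) -> str:
    return audio.decode("utf-8")


TOY_LEXICON = {"hello": "bonjour", "world": "monde"}


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def toy_synthesise(text: str) -> Audio:
    return text.encode("utf-8")


pipeline = CascadePipeline(toy_recognise, toy_translate, toy_synthesise)
result = pipeline.interpret(b"hello world")
print(result)
```

An end-to-end system would expose the same outer interface (audio in, audio out) but as a single learned component, with no intermediate `source_text` or `target_text` to inspect. In a cascade, by contrast, an error made at the recognition step is passed on to translation and synthesis.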

The most sophisticated form is considered to be simultaneous MI, which involves interpreting ongoing speech streams in real time without interruption.

Evaluation methods for MI from a user-centred perspective are still in their infancy. Initial evaluation attempts have used written translation or human interpretation as a reference point, demonstrating high accuracy in certain scenarios, but revealing limitations in others. Comprehensive evaluation methods are still needed in the field of MI in order to assess speech clarity, speech naturalness and human-machine interaction.

Existing MI systems focus primarily on spoken language and lack the ability to understand non-verbal cues, vocal intonation and contextual information. However, there are emerging approaches that aim to address these limitations. For example, researchers are exploring the incorporation of additional information, such as images, and the use of generative language models to improve the quality of translations.

A long way to go to match the complexity of human communication, especially in multilingual exchanges

While MI demonstrates proficiency in tasks that demand high levels of human intelligence, it may not be suitable for all purposes. In situations where deep understanding, empathy, and accountability are crucial, human interpreters will continue to play an irreplaceable role.

It is therefore crucial to approach the implementation of MI with caution in certain contexts, such as legal and medical ones. Balancing the benefits of technology with the essential skills and expertise of human interpreters will be key to maintaining the highest standards of interpretation in such settings.

Machine interpreting should be seen as a complementary tool to human interpreters, rather than a complete replacement

Human interpreters bring invaluable cultural understanding, contextual knowledge and adaptability that machines cannot fully replicate. The synergy between humans and machines in interpreting services could lead to better results and a better user experience in the future.

The future is therefore likely to involve collaborative efforts between humans and machines to facilitate multilingual communication, requiring practical and ethical considerations in the use and regulation of MI.