In a collaboration between Google Meet, Google DeepMind, and Google's research teams, a real-time language translation feature has been introduced that aims to eliminate communication barriers for users worldwide. The effort was driven in part by team members Fredric Lindstrom and Huib Kleinhout, who themselves work across different countries and languages, and it enables seamless multilingual conversations during virtual meetings.

Transforming Communication in Meetings

Fredric and Huib, based in Sweden and Norway respectively, spearheaded the development of Speech Translation, Google Meet's new feature. By leveraging AI, the technology lets users communicate effectively during calls even when they speak different languages. As Fredric describes, "Speech Translation automatically translates your speech in near real-time, in a voice like yours," underscoring the emphasis on preserving the speaker's vocal characteristics to keep dialogue natural and uninterrupted.

AI Innovations and Rapid Development

Fredric, who leads the audio engineering team, has seen substantial gains in productivity from AI advancements. The journey began roughly two years ago, when existing models primarily supported offline translation; the central challenge was achieving instantaneous translation for live interactions. Many were skeptical about the timeline, with initial estimates suggesting a five-year development cycle. Tools from Google DeepMind, however, accelerated the work significantly, bringing together expertise from across Google to meet the project's ambitious schedule.

A Breakthrough in Translation Technology

Conventional audio translation depended on a cumbersome, multi-step pipeline: transcribing the speech, translating the text, and synthesizing it back into speech. This approach often introduced frustrating delays, with latencies of 10-20 seconds that impeded natural exchanges. Huib, the product manager for audio quality, shared that the key breakthrough came from using large models capable of producing near-instantaneous translations. Replacing the drawn-out pipeline with real-time output allowed conversations to flow smoothly, bringing translation latency down to just two to three seconds.
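The contrast between the two approaches can be sketched in code. This is a minimal illustration with hypothetical placeholder functions (`transcribe`, `translate`, `synthesize`, `direct_model` and their simulated delays are inventions for this sketch, not Google Meet's actual APIs or measured latencies): a cascaded pipeline accumulates the latency of each stage, while an end-to-end model has a single latency budget.

```python
import time

# Hypothetical stub stages standing in for real speech-to-text,
# machine-translation, and text-to-speech models. The sleep values
# are illustrative placeholders, not measured figures.

def transcribe(audio: str) -> str:
    time.sleep(0.003)  # stand-in for speech-to-text latency
    return f"transcript({audio})"

def translate(text: str, target: str) -> str:
    time.sleep(0.003)  # stand-in for machine-translation latency
    return f"{target}:{text}"

def synthesize(text: str) -> str:
    time.sleep(0.003)  # stand-in for text-to-speech latency
    return f"audio({text})"

def cascaded_pipeline(audio: str, target: str) -> str:
    """Classic three-stage approach: each stage waits on the previous one,
    so per-utterance latency is the sum of all three stages."""
    return synthesize(translate(transcribe(audio), target))

def direct_model(audio: str, target: str) -> str:
    """End-to-end speech-to-speech model: one stage, one latency budget,
    producing the same translated audio without intermediate hand-offs."""
    time.sleep(0.003)
    return f"audio({target}:transcript({audio}))"
```

The point of the sketch is structural: with the cascade, any slowdown in one stage stalls everything downstream, whereas a single model removes those hand-off boundaries entirely.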

Refining Quality and Addressing Challenges

Despite these advances, development of the feature met significant challenges, particularly in ensuring high-quality translations across speaker accents, background noise, and varying network conditions. To tackle these issues, the teams collaborated closely and ran extensive testing with linguists and language experts. This careful process aimed to capture the nuances of different languages; closely related languages proved notably easier to support than those with more complex syntactic structures. Some limitations persist, such as the model's tendency to translate idiomatic expressions literally, which can lead to misunderstandings.

Bridging Global Communication Gaps

Both Huib and Fredric expressed gratification at seeing their AI research come to fruition in a way that can profoundly affect a large global audience. The speech translation feature has so far been rolled out for several languages, including Italian, Portuguese, German, and French. Fredric reflects on the transformative power of the technology, especially for families divided by language barriers. He asserts, "Now they have a common language. All of a sudden, they can talk to each other," illustrating the potential for technology to bridge significant cultural gaps.