German tech darling DeepL has (finally) launched a voice-to-text service. It’s called DeepL Voice, and it turns audio from live or video conversations into translated text.
DeepL users can now listen to people speaking a language they don’t understand and have it automatically translated into one they do, in real time. The new feature currently supports English, German, Japanese, Korean, Swedish, Dutch, French, Turkish, Polish, Portuguese, Russian, Spanish, and Italian.
What makes the launch of DeepL Voice exciting is that it runs on the same neural networks as the company’s text-to-text offering, which it claims is the “world’s best” AI translator.
As someone who’s just moved to a foreign country, I’m keen to try a voice-to-text translator that might actually work. None of the ones I’ve tried so far work in real time: there’s a lag that renders them pretty useless, and the translation quality is poor.
For face-to-face conversations, you can launch DeepL Voice on your mobile and place it between you and the other speaker. It then displays your conversation so each person can follow translations easily on one device.
You can also integrate DeepL Voice into Microsoft Teams and video-conference across language barriers. The translated text appears on a sidebar as captions. It remains to be seen whether DeepL Voice will be available on platforms like Zoom or Google Meet anytime soon.
‘The next frontier’
While this is DeepL’s first such offering, it’s unlikely to be its last. DeepL’s founder and CEO, Jarek Kutylowski, called real-time voice translation the “next frontier” for the business.
“DeepL is already a leader in written translation, but real-time speech translation is an entirely different story,” said Kutylowski.
“When translating speech as it happens, you’re dealing with incomplete input, pronunciation issues, latency and more, all of which can lead to inaccurate translations and poor user experience.
“So we built a solution that would take these into account from the outset and enable businesses to break down language barriers by enabling them to communicate in multiple languages as required,” he added.
Quality will likely be DeepL Voice’s differentiating factor from the countless other providers of voice-to-text translations.
From a technological perspective, DeepL’s success lies in the architecture of its neural networks, the input from human editors, and the training data. But Kutylowski also believes it has a key advantage over its competitors: focus.
“Focus is always an important thing,” Kutylowski previously told TNW. “Translate isn’t the core business of Google — it’s one of the 100 side gigs. The same goes if you consider LLMs and the OpenAIs of this world as our competition; translation is only one thing of what they’re doing and their GPU is doing a tonne of different things. We’re focused on one particular area.”
In May, DeepL reached a $2bn valuation after securing a new investment of $300mn (€277mn). Its translator covers 32 languages, and the company counts over 100,000 business users.