WellSaid Labs Raises $10M Series A Round to Build ‘Human-Parity’ Synthetic Voices

AI text-to-speech startup to expand development of its core product line to deliver "voices for every brand."

WellSaid Labs

WellSaid Labs is working on AI text-to-speech technology.

Voice technology is one application of artificial intelligence with implications for human-machine interaction in industrial automation, autonomous vehicles, and service robots. WellSaid Labs Inc. today announced a $10 million Series A round. The Seattle-based company said it plans to use the funds to drive further AI and product innovation, scale go-to-market functions, and grow its team.

“Robotics engineers and makers now have an opportunity to further enhance the machine-to-human interface with WellSaid,” said Matt Hocking, CEO of WellSaid Labs. “From consumer-first robots chartered with ushering an experience, to manufacturing equipment capable of providing work instructions for maintenance or operation on its own.”

“Any robot designed to directly interact with humans can deliver a more pleasant and delightful experience by speaking with a WellSaid AI Voice,” he told Robotics 24/7.

WellSaid Labs works on AI challenge

Creating natural-sounding speech from text is considered a “grand challenge” in the field of AI and has been a research goal for decades. WellSaid Labs said that over the past three years, it has improved the quality, speed, and reliability of neural text-to-speech (TTS) systems.

In June 2020, the company claimed that its TTS technology became the “first synthetic media service to achieve human parity” for naturalness on short audio clips across multiple voices.

WellSaid said it has rearchitected TTS to enable content creators of all sizes to develop all their desired content in a consistent voice that represents their brands. Its Voice Avatar library provides access to multiple read styles and tones that producers can use. In addition, “brands can create their own AI voice avatars to spec — capturing the likeliness, style, and uniqueness of the voice needed to tell their stories in exactly the right way,” said the company.

“We’ve added AI voice to the toolkit of thousands of content creators and their teams,” said Hocking. “Our human-parity AI voice can be produced faster than real time and updated on demand, opening up new and exciting opportunities to 'add voice' where never before perceived possible. AI voice easily ensures every production can be created and updated efficiently at scale.”

WellSaid Studio eliminates the complexities of conventional TTS technologies, making the production, updating, and publishing of voiceovers extremely simple and cost-effective, said the company. Product developers can access WellSaid Labs’ core AI engine via real-time application programming interfaces (APIs) to power digital experiences with a reliable and scalable voice infrastructure.

VCs back synthetic voices

FUSE led WellSaid Labs' Series A, with participation from previous investor Voyager Capital, as well as Qualcomm Ventures LLC and Good Friends. The startup said its round was oversubscribed with interest from venture capital firms because of its record year-over-year growth in revenue and strong customer demand.

“Plain and simple, WellSaid is the future of content creation for voice,” said Cameron Borumand, general partner at FUSE. “This is why thousands of customers love using the product daily with off-the-charts bottom-up adoption.”

“Content creators or product experience designers were previously faced with difficult tradeoffs between quality and scalability when using TTS tools or human voiceover,” said James Newell of Voyager Capital. “WellSaid’s incredible voices, which are accessible through a studio application or a scalable API, removes the need to choose whether you want natural, lifelike speech or infinitely scalable and easily editable voice content.”

“WellSaid provides both and delivers it however your team would like to consume it,” he said. “Creative teams have found it to be extremely useful when they need to produce multiple pieces of high-quality content in a consistent voice in hours instead of weeks.”

“Recent developments in TTS technology using generative AI have enabled synthetic voices to sound very human-like, finding exciting new applications for voice including e-learning, advertising, and news readers,” said Carlos Kokron, vice president at Qualcomm Technologies Inc. and managing director of Qualcomm Ventures Americas. “We look forward to working with WellSaid Labs to help fuel the creator economy with human-parity AI voices across mobile and IoT.”

“WellSaid's team has applied deep technical expertise to build a platform that enables easy creation and editing of incredibly life-like audio,” said Dave Gilboa of Good Friends and co-CEO of Warby Parker. “We see meaningful growth potential in the use of high-quality audio in giving brands the ability to communicate with customers and creators the ability to engage with audiences.”

