OpenVoice, an instant voice cloning technology, represents a significant advancement in speech synthesis. Developed by Zengyi Qin, Wenliang Zhao, Xumin Yu, and Xin Sun and described in their 2023 arXiv preprint, OpenVoice offers three main advantages: accurate tone color cloning, flexible voice style control, and zero-shot cross-lingual voice cloning. Since its integration into myshell.ai in May 2023, OpenVoice has been used tens of millions of times worldwide, contributing to explosive user growth on the platform.
OpenVoice can generate speech in multiple languages and accents, control voice styles such as emotion and accent, and adjust style parameters such as rhythm, pauses, and intonation. Zero-shot cross-lingual voice cloning means that neither the language of the generated speech nor the language of the reference speech needs to be present in the training dataset. The technology is currently licensed under the Creative Commons Attribution-NonCommercial 4.0 International License, with plans to move to a license that permits free commercial use soon. MyShell retains the ability to detect whether audio was generated by OpenVoice, a safeguard intended to support responsible use and help prevent misuse.