
Your voiceprint is not safe now that Meta has launched Spirit LM, a model that imitates human voices
Meta launches the open-source “Spirit LM” model for voice generation
Meta has announced the launch of its new open-source model “Spirit LM”, which is designed to address the challenges facing multimodal AI models, particularly in voice generation.
The new model aims to deliver a more natural and richer voice experience, representing an advanced step toward developing intelligent robots capable of voice communication in a more sophisticated and realistic way.
The “Spirit LM” model is built on a pre-trained language model with 7 billion parameters, and it processes audio differently from traditional systems, which first convert speech to text using automatic speech recognition (ASR).
With this launch, Meta seeks to enhance artificial intelligence capabilities in the field of audio, opening new horizons for interaction between humans and machines.
Meta unveils the “Spirit LM” model to improve the voice generation experience
Meta points out that the traditional approach to audio processing loses much of the natural expression in speech. To overcome this limitation, the “Spirit LM” model uses phonetic tokens (sound units) together with pitch and tone tokens, enabling it to produce more expressive, natural-sounding voices.
This approach allows the model to learn new tasks, including speech recognition, text-to-speech conversion, and speech classification, which broadens its capabilities and makes it more useful to users.
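For readers who want a concrete picture, the difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Meta's code: the `Frame` class, the two functions, and the `[Ph:…]`, `[Pi:…]`, `[St:…]` token labels are hypothetical names invented here. The sketch shows that a text-only view keeps the sounds but drops pitch and tone, while a token-based view carries all three forward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One short slice of speech (hypothetical structure for illustration)."""
    phonetic: str  # which sound is being spoken
    pitch: int     # coarse pitch bucket
    style: str     # coarse tone/expression label


def cascaded_view(frames):
    """Mimic an ASR-first pipeline: only the spoken sounds survive."""
    return [f.phonetic for f in frames]


def expressive_view(frames):
    """Keep phonetic, pitch, and tone information in one flat token sequence."""
    tokens = []
    for f in frames:
        tokens += [f"[Ph:{f.phonetic}]", f"[Pi:{f.pitch}]", f"[St:{f.style}]"]
    return tokens


# A made-up two-frame utterance, used only to compare the two views.
utterance = [Frame("hel", 3, "excited"), Frame("lo", 5, "excited")]
print(cascaded_view(utterance))
print(expressive_view(utterance))
```

In the first printout the "excited" tone and the pitch values are gone; in the second they remain available to whatever model reads the sequence, which is the kind of information the article says the traditional approach loses.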
Meta revealed the details of the model in a research paper, which also reviews the research that led to the development of “Spirit LM”. The company also published samples of the model's vocal output, giving the public a clear idea of its future capabilities and of how it could improve human interaction with technology.





