Episode Summary
– A lot of news about Microsoft lately
– VALL-E can mimic a voice, including accent and emotion, from just a 3-second sample
– Trained on some 60,000 hours of English speech
– Consequences for voice actors and narrators?
– Potential for scams
– Sonantic acquired by Spotify
– Descript acquired Lyrebird in 2019 for its voice-cloning technology
– Pairing it with something like Google Translate for instant communication across languages
Resources:
VALL-E
https://valle-demo.github.io/
Microsoft’s new VALL-E AI can capture your voice in 3 seconds
https://newatlas.com/technology/microsoft-vall-e-speech-synthesis/
Steven Tey (@steventey) on Twitter, January 9, 2023: “Surprised there isn’t more chatter around VALL-E This new model by @Microsoft can generate speech in any voice after only hearing a 3s sample of that voice 🤯 Demo → https://t.co/GgFO6kWKha https://t.co/JY88vf4lYc”
Voice-imitation advance means we can’t trust what we see or hear anymore
https://newatlas.com/ai-mimic-voices-video-lyrebird/49851/?itm_source=newatlas&itm_medium=article-body
Microsoft Launched VALL-E, A Voice DALL-E
https://www.theinsaneapp.com/2023/01/microsoft-launched-a-voice-based-dall-e-called-vall-e.html
Descript | All-in-one video and audio editing
https://www.descript.com/home-2
Lyrebird – Descript
https://www.descript.com/lyrebird
Andrew Mason’s Descript snags $15M, acquires Lyrebird to let users type text to create audio in their own voices | TechCrunch
Spotify is acquiring Sonantic, the AI voice platform used to simulate Val Kilmer’s voice in ‘Top Gun: Maverick’ | TechCrunch
How A.I. restored Top Gun star Val Kilmer’s voice | Fortune
https://fortune.com/2022/05/27/how-does-val-kilmer-speak-in-top-gun-maverick-sonantic-artificial-intelligence/
VALL-E – Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers – Microsoft : singularity