OpenAI Releases Open Source Speech Recognition System Whisper

This is part of my live-learning series! I will be updating this post as I continue through my journey. I apologize for any grammatical errors or incoherent thoughts. This is a practice to help me share things that are valuable without falling apart from the pressure of perfection.

https://tylerbryden.com/podcast-player/60621/openai-releases-open-source-speech-recognition-system-whisper.mp3

Episode Summary

– Automatic speech recognition (ASR) system trained on 680,000 hours of multilingual data
– Enables transcription in multiple languages, as well as translation from those languages into English
– The models show improved robustness to accents, background noise, and technical language, and support zero-shot translation from multiple languages into English
– There are 9 models of different sizes and capabilities
– Anticipated use is to improve accessibility tools
– Concerns that the technology could be used to build capable surveillance tools
– OpenAI used Python 3.9.9 and PyTorch 1.10.1 to train and test the models
– Also uses HuggingFace Transformers for their fast tokenizer implementation and ffmpeg-python for reading audio files
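
To make the summary above a little more concrete, here is a minimal sketch of how transcription and translation might look from Python. The `load_model` and `transcribe` calls follow the usage shown in the openai/whisper README. The model names are the nine I understand were listed at release, so treat that list as an assumption that may have changed since. `pick_model` and the wrapper function `transcribe` are hypothetical helpers I made up for illustration, and running the wrapper assumes the `openai-whisper` package and ffmpeg are installed.

```python
# Model names as I understand them from the openai/whisper README at release.
# Assumption: later versions of the package may list different models.
MULTILINGUAL_MODELS = ["tiny", "base", "small", "medium", "large"]
ENGLISH_ONLY_MODELS = ["tiny.en", "base.en", "small.en", "medium.en"]
ALL_MODELS = ENGLISH_ONLY_MODELS + MULTILINGUAL_MODELS  # nine in total


def pick_model(size: str, english_only: bool = False) -> str:
    """Return a model name for the given size (hypothetical helper)."""
    name = f"{size}.en" if english_only else size
    if name not in ALL_MODELS:
        raise ValueError(f"unknown model: {name}")
    return name


def transcribe(audio_path: str, size: str = "base", translate: bool = False) -> str:
    """Transcribe an audio file, or translate its speech into English.

    Always uses a multilingual model, since translation needs one.
    """
    # Imported lazily so the helpers above work without the package installed.
    import whisper

    model = whisper.load_model(pick_model(size))
    task = "translate" if translate else "transcribe"
    result = model.transcribe(audio_path, task=task)
    return result["text"]
```

Calling `transcribe("episode.mp3")` would return the transcript as a string, and `transcribe("episode.mp3", translate=True)` would ask for an English translation instead. The file name here is just a placeholder.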

Resources

Sam Altman on Twitter: “near human-level speech recognition, open-sourced: https://t.co/eeuTfXJkwy (check out the examples, i find them difficult)” / Twitter
Sam Altman (@sama) / Twitter
Emad on Twitter: “Bravo 👏 Speech2Image anyone?” / Twitter
OpenAI (@OpenAI) / Twitter
OpenAI on Twitter: “We’ve trained a neural net called Whisper that approaches human-level robustness and accuracy on English speech recognition. It performs well even on diverse accents and technical language. Whisper is open source for all to use. https://t.co/ueVywYPEkK” / Twitter
whisper/model-card.md at main · openai/whisper
Introducing Whisper
whisper.pdf
openai/whisper
🤗 Transformers
kkroening/ffmpeg-python: Python bindings for FFmpeg – with complex filtering support
PyTorch
OpenAI

Affiliates

Shure MV7

Shure SM7B

Hashtags

tech news,openai,whisper,speech recognition,transcription,sam altman,artificial intelligence,open-source speech recognition,speech-to-text,voice recognition,machine learning,tech news today

Automated Transcription

Hello, hello, Tyler Bryden here. I hope everything is going well. It’s a big day in AI, big day in generative AI. OpenAI, finally, after April, that sort of beta release, has announced that they have removed the waitlist for DALL·E. You can sign up and start creating immediately. So I’ve got the link popped up here. I’ve got a couple other resources, but if you don’t even want to listen to me now, you just want to get creating and you’ve been waiting for access, you can now go and do it. I’ve got a couple other things that I think are super interesting about this, but you’ve got the actual link. You can sign up right away. I’ve been lucky enough to have access to DALL·E for a while and have been playing around with it. Obviously super incredible. And they talk about some of the stats of people who have been using it even in the beta, which is more than 1.5 million users, 2 million images a day. So that’s a lot already happening here. We’re going to continue to see, you know, a massive explosion for image generation, generative AI, image to video or text to video, text to audio, text to music creation. All this incredible stuff is going to continue to happen.

OpenAI with this sort of joins Midjourney and Stable Diffusion, which have sort of allowed, you know, wide access to people. But a lot of people, you know, look at some of the core innovations here as DALL·E and are really excited about this release. I, you know, I’ve been posting a couple videos about this, and immediately in the morning and last night got some messages already saying, hey, they finally have allowed access. So.

Yeah, I had previously put up a video about how to get access to DALL·E and that had like 6K views. It is no longer relevant because there is no beta. There’s no little, you know, tips and tricks that allow you to, you know, expedite that process. It is now available for you. And, you know, I think there are some things that are super interesting for this, a couple articles that have, you know, talked about this already. But one of the things is, first of all, there’s a price to it. So right here you’ve got 15 dollars for 115 credits, that’s about 460 square images. There is the ability to apply, and I think I’ve got the link here. I’ll pop it up and drop it just in case. If you are, you know, an artist and you need financial assistance for good reason, there’s a way to apply for subsidized access. So you’ve got that ability there.

A couple other sort of pieces are that there is this idea of, like, sort of, you know, what happens now. And I think in general, you know, Stable Diffusion has already sort of been criticized for this, that they have allowed, you know, wide access to this system. And already sort of pornographic and violent images have been created. And OpenAI specifically talks about how they have created filters, they’ve prepared for this moment. They’ve been doing a lot of work to make sure that, and I’ve got it up here, the release of this is as safe as possible and, you know, sort of adheres to, you know, the right guidelines for using this, no matter what your definition of that is. Obviously that can be, you know, very different depending on, you know, who you are and what you think this should be capable of. But if you violate the content policy, they can actually see. So if you put a text prompt in with things that are violent or sexual or whatever, that will be flagged on your account. If you continue to do that, you could land yourself in trouble. So that’s something to consider if you’re excitedly rushing into this and, you know, can’t wait, but you have these desires to do these certain things that maybe are not, you know, accepted under the guise of OpenAI here.

So I think this is again a super interesting day. We’re going to see what comes out of this. We’re going to see a flood of generative AI tools creating, you know, content. Yesterday I published a little video on Getty Images and Shutterstock removing, you know, basically removing the ability to upload images that were created by AI to their systems. So there’s already pushback, there’s already thoughts about copyright, but overall this has been a huge sort of, you know, crater of people being able to flex their own imagination, creativity without maybe some of the, you know, skills that, you know, people, you know, have worked really hard to build up. And some people would say, you know, that’s a good thing. It’s democratizing access to creativity. Other people would think, you know, that that is a right that should be built over practice and work and skill. And I think this conversation is going to continue at the edge. There’s a, you know, a philosophical piece to this.

And then there’s also just the incredible sort of technical innovation that has enabled this through, you know, transformers and large language models. And, you know, in the end, from my perspective, I see both sides, but I just do feel, I feel fired up. I feel inspired by the incredible art that is coming out of this, and I hope that you can now go ahead, get access to DALL·E and find, you know, find happiness, find joy, find love in all this because it is a super interesting and exciting time in technology and I think just the world in general. So if you like this, I’ll continue to share more of this as more news releases, but this is a big day for generative AI and appreciate you checking it out and I hope you have a lot of fun playing with this. This has been Tyler Bryden. Wonderful rest of your day. Bye bye.

