Speech-to-Art With DALLePaperFrame

This is part of my live-learning series! I will be updating this post as I continue through my journey. I apologize for any grammatical errors or incoherent thoughts. This is a practice to help me share things that are valuable without falling apart from the pressure of perfection.

Speak With Tyler Bryden

Episode Summary

– Press a button, speak and generate art
– Wav2vec and min-dalle
– 10-second generation time
– Connected with Inky Impression 5.7” ePaper frame


#promptbase #bloom #llm #dalle #openai #bigscience #gpt #gpt3 #gpt2 #dalle2 #huggingface #ai21labs #eleutherai #jeanzay #thomaswolf #idris #genci #minidalle #craiyon #cohere #thedallesong #imagen #ai #llms #ml #imagegeneration #dallemini #midjourney #generativeai #texttoimage #openbeta #beta #davidholz #jimkeller #natfriedman #philiprosedale #billwarner #katherinecrowson #aiart #CLIP #wombo #womboai #aiartgenerator #aimusic #musicai #cogvideo


kuprel/min-dalle: min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch
fairseq/README.md at main · facebookresearch/fairseq
Inky Impression 5.7″ (7 colour ePaper/eInk HAT) – Pimoroni
Buy a Raspberry Pi 1 Model B+ – Raspberry Pi
Create With Voice

YouTube Video


Automated Transcription

Alright, hello, hello, Tyler Bryden here, hope everything's going well today. I want to talk about something that just fires me up, and that is this idea of speech to art. Now I'm upset because I can't find the original LinkedIn post that led me to this, but we've got FamousDirector here, and he's writing code. He's doing stuff on GitHub, and he shared DALLePaperFrame. It's basically a document, and there's a whole GitHub link for it that I've got pulled up here. If you can't see by now, or if you're just listening, the idea here is speech to art, this idea of art on demand. Now that we have these text prompts with DALL·E and image generation, we really should just be able to use our voice to generate these prompts. As I mentioned, this is something that fires me up, and it's something I've actually worked on. I have this little piece here. We'll see if this works. Harp.

It did not work. Now I feel OK, I got a heart, I'll take it. There we go. So you can see the transcription accuracy is not that great when you run it directly through the browser and you're also running it on multiple things. I'm sorry for yelling "harp" at you, which was actually, I think, "heart", but I'll take it, and you can see it there. I don't think this is going to generate much if I refresh this page. Dog.

Beautiful, a slow little delay. Not too much CSS on this, and it's a little clunky. Also, I couldn't figure out the function to submit this form, so that was a challenge. But the idea here was that I could speak, through keywords, through phrases, and it should be able to generate images at that point. We actually looked at Giphy and pulled in GIFs that were related. This was before the proliferation of DALL·E, DALL·E Mini, and some of these systems. So again, what I'm basically trying to show here is that I love this stuff.
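For anyone curious how a keyword-to-GIF demo like the one just described could be wired up, here is a minimal Python sketch that turns a transcribed phrase into a Giphy search request URL. It is not the original code, which ran in the browser. It assumes Giphy's public search endpoint `https://api.giphy.com/v1/gifs/search` with its `api_key`, `q` and `limit` query parameters. The helper name `giphy_search_url` and the placeholder key are hypothetical. The code only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

# Giphy's public GIF search endpoint (assumed; check Giphy's API docs for current details).
GIPHY_SEARCH = "https://api.giphy.com/v1/gifs/search"


def giphy_search_url(transcript: str, api_key: str, limit: int = 5) -> str:
    """Build a Giphy search URL from a spoken phrase (hypothetical helper)."""
    # Normalize case and whitespace; browser speech recognition often returns odd spacing.
    query = " ".join(transcript.lower().split())
    if not query:
        raise ValueError("empty transcript")
    params = {"api_key": api_key, "q": query, "limit": limit}
    # urlencode percent-encodes the values and turns spaces into "+".
    return f"{GIPHY_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(giphy_search_url("  Dog ", api_key="YOUR_KEY"))
    # https://api.giphy.com/v1/gifs/search?api_key=YOUR_KEY&q=dog&limit=5
```

In a browser version, the same URL would be passed to `fetch`, and the returned GIFs would be rendered next to the transcript.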

I think this is truly a fascinating place we're going. I could talk for a long time about the potential use cases of this and the world I think we're moving into, which is truly interactive. In that world we create our own virtual environments, our own art, and our own experiences through just our voice, through an understanding of what we say and how we say it, and then we modify those environments in real time. That future is on its way, and it's nice to see other people who are also doing this, hence DALLePaperFrame. Thank you to FamousDirector. One day we'll find your real name.

On LinkedIn, he basically asked the question: I can use all these technologies, but I want to see this in real time, as quickly as possible, and I just want to use my voice. I believe it was a he. I shouldn't assume that, but I'm going to continue on the assumption that it was a he, based on what I saw on LinkedIn. He talks about AI art generators, most famously DALL·E, and then he wanted to figure out how you could locally host that AI art generation and combine it with automatic speech recognition, requesting new art generated on demand. Wonderful. So he linked together min-dalle, which gets a massive shout-out, and he gives a massive shout-out to Brett Kuprel.

It's an incredible PyTorch port, which you might be able to find. In fact, I'll open it up. He's also using the wav2vec model, and I should give credit here: it comes from Facebook, whose releases have been a huge innovation in automatic speech recognition systems. So how does it work? Basically, there's an NVIDIA GPU, I believe a Raspberry Pi for the ePaper frame, and then the microphone and the buttons. There are buttons on this that have different functions.
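wav2vec 2.0 models fine-tuned for speech recognition are typically trained with a CTC (connectionist temporal classification) head. For every audio frame, the model outputs a score for each character, including a special "blank" token. Turning those frame-level predictions into text then means taking the best token per frame, collapsing repeats, and dropping blanks. The sketch below shows only that greedy decoding step in plain Python. The toy vocabulary, the `<pad>` blank name, the `|` word separator and the hand-made one-hot frames are illustrative assumptions, not the project's actual model output.

```python
from typing import List, Sequence

BLANK = "<pad>"  # assumed name of the CTC blank token
VOCAB = [BLANK, "|", "A", "B", "D", "G", "I", "O", "R"]  # toy character vocabulary


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], vocab: List[str]) -> str:
    """Greedy CTC decoding: best token per frame, merge repeats, drop blanks."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    chars: List[str] = []
    prev = None
    for idx in best:
        # A repeated token counts once unless a blank separates the repeats.
        if idx != prev and vocab[idx] != BLANK:
            chars.append(vocab[idx])
        prev = idx
    # "|" marks word boundaries in this toy vocabulary.
    return "".join(chars).replace("|", " ").strip()


def one_hot(tokens: Sequence[str], vocab: List[str]) -> List[List[float]]:
    """Fake per-frame scores that put all weight on one token (for demos only)."""
    return [[1.0 if v == t else 0.0 for v in vocab] for t in tokens]


if __name__ == "__main__":
    frames = one_hot(["B", "B", BLANK, "I", "R", "R", "D"], VOCAB)
    print(ctc_greedy_decode(frames, VOCAB))  # BIRD
```

Real systems often replace this with beam search plus a language model, which can help with misrecognitions like "harp" for "heart".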

One requests a new generation of art. Another requests a new generation of art with a new prompt from pre-built prompts. And then, and I hadn't actually read this part, sorry, I should have, one requests a new generation with a new prompt created from the microphone. After that button is pressed, the microphone starts recording for three seconds.
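The three button behaviors just described map naturally onto a small dispatch table. The Python sketch below is hypothetical. The button names, the `PREBUILT_PROMPTS` list, the `PaperFrameController` class, and the injected `record`, `transcribe` and `generate` callables stand in for the real microphone, wav2vec and min-dalle code, which is not reproduced here. The sketch also assumes that the first button simply re-runs the current prompt. It shows the buffer size for the three-second recording, assuming 16 kHz mono audio, the sample rate wav2vec 2.0 models expect.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

SAMPLE_RATE = 16_000  # wav2vec 2.0 models expect 16 kHz audio
RECORD_SECONDS = 3    # recording window after the voice button is pressed
RECORD_SAMPLES = SAMPLE_RATE * RECORD_SECONDS  # 48,000 samples

# Hypothetical pre-built prompts.
PREBUILT_PROMPTS = ["a painting of a bird", "a lighthouse at sunset", "a cat astronaut"]


@dataclass
class PaperFrameController:
    record: Callable[[int], List[float]]      # returns n audio samples (hypothetical)
    transcribe: Callable[[List[float]], str]  # speech -> text, e.g. wav2vec (hypothetical)
    generate: Callable[[str], object]         # prompt -> image, e.g. min-dalle (hypothetical)
    prompt: str = PREBUILT_PROMPTS[0]
    rng: random.Random = field(default_factory=random.Random)

    def on_button(self, button: str) -> object:
        if button == "regenerate":
            pass  # keep the current prompt; new art comes from the generator's randomness
        elif button == "random_prompt":
            self.prompt = self.rng.choice(PREBUILT_PROMPTS)
        elif button == "voice":
            audio = self.record(RECORD_SAMPLES)
            text = self.transcribe(audio).strip()
            if text:  # keep the old prompt if nothing was recognized
                self.prompt = text
        else:
            raise ValueError(f"unknown button: {button}")
        return self.generate(self.prompt)


if __name__ == "__main__":
    frame = PaperFrameController(
        record=lambda n: [0.0] * n,
        transcribe=lambda audio: "a painting of a bird",
        generate=lambda p: f"<image of {p}>",
    )
    print(frame.on_button("voice"))  # <image of a painting of a bird>
```

Injecting the three callables keeps the button logic testable without a GPU, microphone or ePaper HAT attached.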

And that's the part that gets really exciting. Obviously this one was very clear. You already saw my heart one break right in front of your eyes. This one worked: it got the sentence, clipped it properly, and then did "a painting of a bird". Now, I've talked about this before, and if you've watched my videos, you've heard me talk about prompt engineering. You could have preset prompts, or attributes, built into the system, and then you're just using voice on top of that. So you're getting, say, hyper-realistic art, psychedelic art, or art in the style of Monet.

Or things like that. So you've already got that preset, or you could just say it with your voice and push it all the way through, and min-dalle would understand it in that case. He even has a "How do I use this project" section, so you can actually run this. If I were a good person and had these pieces of equipment, I would have run this right here and shown you, and I might even do that at some point, because I just think this is so fantastic. I do think there are some limitations with the tech, and he sort of complains about them himself.
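The preset idea discussed above, with fixed style attributes layered on top of whatever you say, could be as simple as string composition before the text reaches the image model. In this minimal sketch, the preset names and their wording are illustrative assumptions, taken loosely from the examples mentioned (hyper-realistic, psychedelic, in the style of Monet). If the speaker names a style themselves ("in the style of ..."), the spoken prompt is passed through unchanged.

```python
from typing import Optional

# Hypothetical style presets, e.g. chosen by a button or a config file.
STYLE_PRESETS = {
    "realistic": "hyper-realistic, highly detailed",
    "psychedelic": "psychedelic art, vivid colors",
    "monet": "in the style of Monet",
}


def compose_prompt(spoken: str, preset: Optional[str] = None) -> str:
    """Append a style preset to a transcribed prompt unless the speaker named a style."""
    text = " ".join(spoken.split())  # collapse stray whitespace from the transcriber
    if not text:
        raise ValueError("nothing was transcribed")
    if preset is None or "in the style of" in text.lower():
        return text  # no preset chosen, or the speaker described the style out loud
    return f"{text}, {STYLE_PRESETS[preset]}"


if __name__ == "__main__":
    print(compose_prompt("a painting of a bird", "monet"))
    # a painting of a bird, in the style of Monet
```

Keeping the preset outside the spoken text means a short three-second utterance still produces a fully styled prompt.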

First of all, the resolution is low, it only has seven colors, and it takes a while: the refresh time is about 30 seconds, so the demo is a heavily sped-up sample. This may not be the best output device to put it on. But obviously, as I've linked here, this could go directly onto a device that's just a laptop screen, a desktop screen, or whatever monitor you have, and you'd get much higher resolution. Generally, it's pretty easy to pipe speech into any text input box in browsers these days, so someone is probably already doing this with DALL·E, maybe because they're too lazy to type or just like speaking. I think it's a lot of fun to brainstorm and just say it right off the top of your head. So I've got a couple of links, and I'm not going to spend too much time on them.
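The ePaper constraints mentioned above come from the panel itself. The Inky Impression 5.7-inch is a 600×448, seven-colour display (black, white, green, blue, red, yellow, orange), so each generated image has to be placed on the panel and reduced to that palette before it is shown. The sketch below does the palette step with plain nearest-colour matching in RGB space. It also computes where a square image (DALL·E Mini-style generators produce square output) would sit on the panel. The RGB values are rough, assumed approximations of the panel's inks, not Pimoroni's calibrated numbers. Real pipelines usually add dithering (for example, Floyd-Steinberg) to hide banding.

```python
from typing import Iterable, List, Tuple

RGB = Tuple[int, int, int]

# Assumed approximations of the seven ink colours; not calibrated values.
PALETTE: List[RGB] = [
    (0, 0, 0),        # black
    (255, 255, 255),  # white
    (0, 255, 0),      # green
    (0, 0, 255),      # blue
    (255, 0, 0),      # red
    (255, 255, 0),    # yellow
    (255, 128, 0),    # orange
]


def nearest_index(pixel: RGB, palette: List[RGB] = PALETTE) -> int:
    """Index of the palette colour with the smallest squared RGB distance."""
    r, g, b = pixel
    return min(
        range(len(palette)),
        key=lambda i: (palette[i][0] - r) ** 2 + (palette[i][1] - g) ** 2 + (palette[i][2] - b) ** 2,
    )


def quantize(pixels: Iterable[RGB], palette: List[RGB] = PALETTE) -> List[int]:
    """Map each RGB pixel to a palette index (no dithering)."""
    return [nearest_index(p, palette) for p in pixels]


def fit_square(width: int = 600, height: int = 448) -> Tuple[int, int, int]:
    """Side length plus left/top offsets for centring a square image on the panel."""
    side = min(width, height)
    return side, (width - side) // 2, (height - side) // 2


if __name__ == "__main__":
    print(quantize([(250, 240, 10), (30, 20, 25), (230, 120, 20)]))  # [5, 0, 6]
    print(fit_square())  # (448, 76, 0)
```

On a monitor, none of this is needed, which is part of why a regular screen gives much higher-fidelity results.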

I'll let you explore them, but I do think this was a really fascinating project, and it points to a future where we use our voice to create and get feedback in real time. We'd be creating in real time and looking not just at what's being said, "a painting of a bird", but at how it's being said: what kind of emotion is reflected in that prompt, and are there any more attributes around it? And are we rendering only images, or videos too? I shared CogVideo, which does text-to-video generation, in a previous video.

You can pipe the same speech recognition into that too, and I think there's a long way to go as we move into completely virtual environments and the experiences we have as people within them. So I thought it was fun to take a couple of minutes to talk about this project. Awesome work from FamousDirector, who, again, I do not know. I'm calling you sir, and if you're not a sir, I apologize. There are a couple of links in here for you to check out. Overall, this was hacked together by someone, obviously someone intelligent with quite a bit of inspiration and technical ability.

But it points to a future where we're just speaking and creating, so I hope you enjoyed this. This has been Tyler Bryden. If you like this kind of stuff, send me a message. Have any other projects you think are fascinating? Let me know, I would love to hear about them. My name's Tyler. I'm a cool guy.

Like, comment, subscribe, algorithms. Appreciate you very much. Love you. Hope you have a good rest of the day. Bye bye.


