– Not available to the general public
– Trained on an internal Google dataset made up of 14 million video-text samples and 60 million image-text pairs
– Base Imagen Video model outputs 16 frames of 3fps footage at 24×48 resolution
– Boosted to generate a final 128-frame video at 1280×768 resolution and 24 frames per second – resulting in 5.3 seconds of high-definition video
– Share multiple examples
– “Video generative models can be used to positively impact society, for example by amplifying and augmenting human creativity”
– Another model released alongside it is Phenaki, which I will cover in more detail separately
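As a quick sanity check on the figures in these notes, here's a small sketch of the arithmetic (variable names are mine; the numbers are just the ones cited above):

```python
# Arithmetic behind the figures above (numbers are the ones cited in
# these notes, not taken from the paper directly).

BASE_FRAMES, BASE_FPS = 16, 3      # base Imagen Video model output
FINAL_FRAMES, FINAL_FPS = 128, 24  # after super-resolution boosting
BASE_W, BASE_H = 48, 24            # 24×48 base resolution
FINAL_W, FINAL_H = 1280, 768       # final HD resolution

duration_s = FINAL_FRAMES / FINAL_FPS                       # 128 / 24 ≈ 5.33 s
temporal_upscale = FINAL_FRAMES // BASE_FRAMES              # 8x more frames
spatial_upscale = (FINAL_W * FINAL_H) // (BASE_W * BASE_H)  # 853x more pixels

print(f"{duration_s:.1f} s of HD video")  # → 5.3 s of HD video
print(f"{temporal_upscale}x frames, {spatial_upscale}x pixels")
```

So the quoted "5.3 seconds" is just 128 frames divided by 24 frames per second.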
You can also learn how you can search for and find Stable Diffusion prompts using Lexica! And, if you don’t yet have access, find out how to get Beta access to Stable Diffusion by Stability AI.
Did you know that DALL-E Mini rebranded as Craiyon?
Google teases its own AI text-to-video model: Imagen Video • The Register
Text-to-image models are dated, text-to-video is in now • The Register
Google’s newest AI generator creates HD video from text prompts | Ars Technica
Google demos two new text-to-video AI systems, focusing on quality and length – The Verge
All these images were generated by Google’s latest text-to-image AI – The Verge
Google’s Imagen takes on Meta’s Make-A-Video as text-to-video AI models ramp up | VentureBeat
Google answers Meta’s video-generating AI with its own, dubbed Imagen Video | TechCrunch
Google Imagen: AI art but it’s videos now | Boing Boing
Introducing Make-A-Video: An AI system that generates videos from text
Sundar Pichai on Twitter: “Two important breakthroughs from @GoogleAI this week – Imagen Video, a new text-conditioned video diffusion model that generates 1280×768 24fps HD video. And Phenaki, a model which generates long coherent videos for a sequence of text prompts. https://t.co/nTs67r21Sf https://t.co/sOf5clkB3V” / Twitter
Dumitru Erhan 🇺🇦 (@doomie) / Twitter
text to video,text to video ai,imagen video,imagen,google ai,google brain,midjourney,make-a-video,meta,stable diffusion,stability ai,craiyon,dalle,dalle2,openai,image generation,video generation,generative ai,meta ai,dall e 2
Alright. Hello, hello, hello. Tyler Bryden here. I hope everything’s going well. Man, the days are flying by, and with every day that flies by, a new development in large language models and generative AI is born. And this one was a big one, coming right after last week’s. Honestly, I’m losing track of time.
Last week Facebook released, you know, Make-A-Video, basically text-prompt-to-video generation, and people were saying it leapfrogs everyone who’s doing image generation. Now Google comes in with something big enough to warrant a tweet from Sundar himself, announcing Imagen Video, a new text-conditioned video diffusion model that generates 1280×768, 24-frames-per-second HD video. On top of that, they drop Phenaki, which generates long, coherent videos from a sequence of text prompts. Super fascinating. I’m going to specifically talk about Imagen Video today, but I’m going to do a dedicated video on Phenaki, because it’s just mind-blowing what is possible. You’ve got this part here.
Completely generated. Obviously it, you know, struggles with text and spelling, but overall super fascinating. And then you can see the responses starting to come in: I’m excited, I’m scared, all these wonderful things. Now, one thing I wanted to do is give some credit to some of the people who are behind this. So I’ve got one of the announcement threads, specifically for Phenaki, and then I’ve got Jonathan Ho, who originally announced Imagen Video and tagged a bunch of great people on it. In the YouTube video description, I’ll drop these links, and these are definitely people worth following, all of them working on the Google AI and Google Brain teams and doing some mind-blowing stuff. So if you’re interested in this, definitely some people to keep tabs on. And I just wanted to hop in on a couple of things. There was Make-A-Video last week, and then, as The Register frames it here, Google is saying: not just Meta, we can do it even better. One of the challenges with Facebook’s Make-A-Video was the actual resolution: the base content was 64 pixels by 64 pixels at 16 frames, the frames were resized using a separate AI model, and they ended up at 768×768. And as we look into one of the links here, let’s see if I’ve got the right one: while Google is claiming theirs is higher quality, they actually did the same thing. They used low-res base content, the original version being only 16 frames at three frames per second at 24×48, and then they boosted it. So they’re specifically saying it’s higher quality than Meta’s Make-A-Video, and in the end we’ve got pretty high-resolution videos. I’ll just play a couple here.
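To put the “they did the same thing” point in perspective, here’s a rough comparison of how much spatial upscaling each pipeline does, using the base and final resolutions cited above (just a sketch of the arithmetic; `upscale_factor` is my own helper, not anything from either system):

```python
def upscale_factor(base, final):
    """Total pixel-count multiplier from base resolution to final resolution."""
    (bw, bh), (fw, fh) = base, final
    return (fw * fh) / (bw * bh)

# Resolutions as cited in the coverage: Make-A-Video goes from 64×64 to
# 768×768; Imagen Video goes from roughly 24×48 to 1280×768.
meta_factor = upscale_factor((64, 64), (768, 768))
google_factor = upscale_factor((48, 24), (1280, 768))

print(round(meta_factor))    # → 144
print(round(google_factor))  # → 853
```

In other words, both systems lean heavily on separate upscaling models; Google’s cascade just covers a much larger jump.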
Now, obviously some things don’t quite add up. But overall, I mean, it’s obvious to the human eye that these are not perfect, but I think all of us watching this, following this, are like: man, if this is where we are right now, where are we going to be in two years? Never mind two weeks. Because, again, I’ve said this before, but just the excitement, the competition between OpenAI, Midjourney, Stable Diffusion, Facebook slash Meta, and now Google: all of these companies want to win. And I don’t know if it’s a battle or not, I mean, it all is a battle for them, but they want to be the leaders in innovation in this sort of generative AI space. And obviously, text-to-image is one thing; a lot of practical use cases, I believe, will emerge, and people are still trying to figure that out. But text-to-video, and especially with this Phenaki piece, which I’ll touch on, being able to tell stories in video through simple text prompts: I’m just overwhelmed thinking about it. I’m sure if you’ve ended up here on this video, you may well feel the same way. A couple of notes on this: it’s not available to the general public. Google talks about how it can be used to positively impact society, for example by amplifying and augmenting human creativity. But at the same time, it can have a negative effect: it can be used to generate fake, hateful, explicit, or harmful content. And I’m just going to quickly show where that could come from. I’ve talked about this in a previous video; this is Lexica.art, and generally what we’ll see here is that there’s starting to be a little bit of sexualization of the images that are generated.
And the reason is, Stable Diffusion doesn’t really limit the kind of text prompts you’re able to put into the system. It’s open source, and that creates a lot of value, but it can also create problems. What I was looking to hone in on here is that there are images and videos of celebrities who we know. So I’m trying to find one. There we go. We’ve got, just as an example, Donald.
Quite an idealized version of Donald, I think, one that he would love to see. But the fact that you can just plug in his name and get realistic renders of him, without his consent, without his knowledge, is pretty dangerous. And I think that’s what Google is talking about. Now, Google in its past hasn’t always been the most considerate of, you know, people’s privacy and the ramifications and consequences of technology, but I think we can all see that this one has such mind-blowing consequences, given what’s being unleashed here, that they are considering the best way to do this, just like OpenAI did. And again, it’s not going to be ideal; there are going to be problems and challenges along the way. But in the end, there’s a lot that can come from this. So finally, I wanted to show this last little piece. This is from Phenaki: they share the prompt, and then they show you what actually
comes out of it. So I should shut up and just watch this. Basically, storytelling through video. And I’ve got to stop here because I’ve got another video I want to dedicate wholly to this, because it’s just absolutely insane what is actually taking place. But generally, the idea is that they’re going to combine Imagen Video’s low-res-to-high-res pipeline with Phenaki’s, you know, understanding of storytelling structure through detailed prompts, which can create longer videos. And they’re saying there won’t even be a limitation on length; videos can be an arbitrary length with no limits. So that’s it for me today. I thank you for checking this out. I just needed to jump on and share a couple of minutes on this Imagen Video release, which is obviously super fascinating, coming, you know, just after Meta and everything else going on. And I think we’re probably going to have a lot more of these videos, these releases, these announcements coming out in the not-too-distant future.
If you’re interested in following those, I’ll be doing my best to cover them. It’s hard at this, you know, breakneck speed, but I’m doing my best, all things considered, and I thank you for checking this out. I hope you have a wonderful rest of your day and a wonderful rest of your weekend. Thank you so much for being here with me. Bye-bye.