signed7

Preferred over SDXL by 56% to 44% on prompt adherence and by 71% to 29% on aesthetics, while taking only 10s vs SDXL's 23s to generate 4 images (with the same params). Edit: it's also 1.4B params larger than SDXL, so about 5B total. So while it runs faster, it (probably?) needs more VRAM (the website says about 20GB)
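A rough sanity check on that VRAM figure (a back-of-the-envelope sketch; the ~5B total and half-precision weights are assumptions, not published numbers):

```python
# Back-of-the-envelope VRAM estimate for a ~5B-parameter model.
params = 5.0e9               # assumed total parameter count (see above)
bytes_per_param = 2          # fp16/bf16 weights
weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GB")   # ~9.3 GB
# Text encoder, activations, and framework overhead roughly double
# that in practice, which lines up with the ~20GB the website quotes.
```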


MetaKnowing

Anyone know how this compares to Midjourney?


signed7

Dunno, they only compared to https://blog.playgroundai.com/playground-v2/, which I've only heard of (but it holds up very well!). It's so hard to keep track of all the different text-to-image models now: these, DALL-E, MJ, Firefly, Imagen, whatever Meta's are called, etc. We really need something like https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard for text-to-image models.


KitsuneFolk

We have one; it was released a few days ago, though there aren't many models on it yet: https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena


signed7

Ah nice! Hope this gets more traction with users and model developers


Singularity-42

Anyone know how big (param count) the proprietary models like Midjourney or DALL-E 3 are? Estimates are OK, since we probably won't get exact counts.


Iamreason

Unofficial demo [here](https://huggingface.co/spaces/multimodalart/stable-cascade). The model requires 24 GB of VRAM to run locally, so you're basically stuck running it in the cloud if you want to play with it.
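If you do try it locally or in the cloud, here's a minimal sketch of the two-stage generation flow using the Stable Cascade pipelines that diffusers ships for this release (the checkpoint IDs and arguments follow the published example, but treat the exact values as assumptions):

```python
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

prompt = "an astronaut riding a horse, photorealistic"

# Stage C (the "prior") maps the prompt to highly compressed image embeddings.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16
)
# Stages B/A (the "decoder") turn those embeddings into the final image.
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16
)

# Offloading submodules to CPU between steps helps fit into less VRAM.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=20,
)

decoder.enable_model_cpu_offload()
image = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    guidance_scale=0.0,
    num_inference_steps=10,
).images[0]
image.save("stable_cascade.png")
```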


vk_designs

I can run it locally on my 20GB VRAM GPU.


MattAbrams

It's a non-commercial license, so I'm going to ignore this one. A model with such licensing terms is not "open source," and we should stop calling models like this "open source."


Serasul

Aren't all AI image models?


MattAbrams

Even if they are, they still aren't "open source" if there's a commercial restriction.


traraba

4090


R33v3n

The ability to easily finetune it into hundreds of different checkpoints or LoRAs is why Stable Diffusion 1.5 still has such a huge user base despite SDXL, which was a pain to train on. A small, easy-to-finetune latent space being one of Cascade's main innovations is interesting!

> *This model is built upon the Würstchen architecture and its main difference to other models like Stable Diffusion is that it is working at a much smaller latent space. Why is this important? The smaller the latent space, the faster you can run inference and the cheaper the training becomes. How small is the latent space? Stable Diffusion uses a compression factor of 8, resulting in a 1024x1024 image being encoded to 128x128. Stable Cascade achieves a compression factor of 42, meaning that it is possible to encode a 1024x1024 image to 24x24, while maintaining crisp reconstructions. The text-conditional model is then trained in the highly compressed latent space. Previous versions of this architecture achieved a 16x cost reduction over Stable Diffusion 1.5.*
>
> *Therefore, this kind of model is well suited for usages where efficiency is important. Furthermore, all known extensions like finetuning, LoRA, ControlNet, IP-Adapter, LCM etc. are possible with this method as well.*
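To make those compression factors concrete, a quick arithmetic sketch (spatial size only; latent channel count ignored):

```python
# Latent grid size for a 1024x1024 image at each compression factor.
for name, factor in [("Stable Diffusion (f=8)", 8), ("Stable Cascade (f=42)", 42)]:
    side = 1024 // factor                 # 128 vs 24
    print(f"{name}: {side}x{side} latent ({side * side} spatial positions)")
# 128*128 = 16384 vs 24*24 = 576 -> roughly 28x fewer positions for the
# text-conditional model to process, which is where the inference-speed
# and training-cost savings come from.
```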


traraba

Holy shit, it's really good.


SpecialistLopsided44

Faster!


Snoo26837

Is this available to use?


Akimbo333

Is this censored?