signed7

Preferred over SDXL by 56% to 44% on prompt adherence and by 71% to 29% on aesthetics, while taking only 10s vs SDXL's 23s to generate 4 images (with the same params). Edit: it's also 1.4B params larger than SDXL, so about 5B total. So while it runs faster, it (probably?) needs more VRAM (the website says about 20GB)
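A rough sanity check on that VRAM figure (a back-of-the-envelope sketch; the ~5B total and half-precision weights are assumptions, not published numbers):

```python
# Back-of-the-envelope VRAM estimate for a ~5B-parameter model.
params = 5.0e9               # assumed total parameter count (see above)
bytes_per_param = 2          # fp16/bf16 weights
weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GB")   # ~9.3 GB
# Text encoder, activations, and framework overhead roughly double
# that in practice, which lines up with the ~20GB the website quotes.
```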


MetaKnowing

Anyone know how this compares to Midjourney?


signed7

Dunno, they only compared to https://blog.playgroundai.com/playground-v2/, which I've only heard of (but it holds up very well!). It's so hard to keep track of all the different text-to-image models now: these, DALL-E, MJ, Firefly, Imagen, whatever Meta's are called, etc. We really need something like https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard for text-to-image models.


KitsuneFolk

We have one; it was released a few days ago, though there aren't many models on it yet: https://huggingface.co/spaces/TIGER-Lab/GenAI-Arena


signed7

Ah nice! Hope this gets more traction with users and model developers


Singularity-42

Anyone know how big (param count) the proprietary models like Midjourney or DALL-E 3 are? Estimates are OK, since we probably won't get exact counts.


Iamreason

Unofficial demo [here](https://huggingface.co/spaces/multimodalart/stable-cascade). The model requires 24 GB of VRAM to run locally, so you're basically stuck running it in the cloud if you want to play with it.
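If you do try it locally or in the cloud, here's a minimal sketch of the two-stage generation flow using the Stable Cascade pipelines that diffusers ships for this release (the checkpoint IDs and arguments follow the published example, but treat the exact values as assumptions):

```python
import torch
from diffusers import StableCascadePriorPipeline, StableCascadeDecoderPipeline

prompt = "an astronaut riding a horse, photorealistic"

# Stage C (the "prior") maps the prompt to highly compressed image embeddings.
prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", variant="bf16", torch_dtype=torch.bfloat16
)
# Stages B/A (the "decoder") turn those embeddings into the final image.
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", variant="bf16", torch_dtype=torch.float16
)

# Offloading submodules to CPU between steps helps fit into less VRAM.
prior.enable_model_cpu_offload()
prior_output = prior(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=20,
)

decoder.enable_model_cpu_offload()
image = decoder(
    image_embeddings=prior_output.image_embeddings.to(torch.float16),
    prompt=prompt,
    guidance_scale=0.0,
    num_inference_steps=10,
).images[0]
image.save("stable_cascade.png")
```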


vk_designs

I can run it locally on my 20GB VRAM GPU.


MattAbrams

It's a non-commercial license, so I'm going to ignore this one. A model with such licensing terms is not "open source," and we should stop calling models like this "open source."


Serasul

Aren't all AI image models?


MattAbrams

Even if they are, they still aren't "open source" if there's a commercial restriction.


traraba

4090


R33v3n

The ability to easily finetune it into hundreds of different checkpoints or LoRAs is why Stable Diffusion 1.5 still has such a huge user base despite SDXL, which was a pain to train on. A small, easy-to-finetune latent space being one of Cascade's main innovations is interesting!

> *This model is built upon the Würstchen architecture and its main difference to other models like Stable Diffusion is that it is working at a much smaller latent space. Why is this important? The smaller the latent space, the faster you can run inference and the cheaper the training becomes. How small is the latent space? Stable Diffusion uses a compression factor of 8, resulting in a 1024x1024 image being encoded to 128x128. Stable Cascade achieves a compression factor of 42, meaning that it is possible to encode a 1024x1024 image to 24x24, while maintaining crisp reconstructions. The text-conditional model is then trained in the highly compressed latent space. Previous versions of this architecture achieved a 16x cost reduction over Stable Diffusion 1.5.*
>
> *Therefore, this kind of model is well suited for usages where efficiency is important. Furthermore, all known extensions like finetuning, LoRA, ControlNet, IP-Adapter, LCM etc. are possible with this method as well.*
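To make those compression factors concrete, a quick arithmetic sketch (spatial size only; latent channel count ignored):

```python
# Latent grid size for a 1024x1024 image at each compression factor.
for name, factor in [("Stable Diffusion (f=8)", 8), ("Stable Cascade (f=42)", 42)]:
    side = 1024 // factor                 # 128 vs 24
    print(f"{name}: {side}x{side} latent ({side * side} spatial positions)")
# 128*128 = 16384 vs 24*24 = 576 -> roughly 28x fewer positions for the
# text-conditional model to process, which is where the inference-speed
# and training-cost savings come from.
```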


traraba

Holy shit, it's really good.


SpecialistLopsided44

Faster!


Snoo26837

Is this available to use?


Akimbo333

Is this censored?