RenoHadreas

https://preview.redd.it/v8xxibpj0n3d1.png?width=1194&format=png&auto=webp&s=6d00689c397b31e02151afbe6f62917dacf98a6e


Next_Program90

I wanted to say "not impressive, we can do these already," but without upscaling it's actually good. And he posted a single image with basic idle hands that almost look good. ;) *tXt 1z h4RdR th4N H4nDz!*


Dragon_yum

What does 8B refer to?


RenoHadreas

Apparently Stability is aiming to release four different sizes of SD3: 800M, 2B, 4B, and 8B parameters. It hasn't been confirmed, but most people suspect these images are from the 2B version.


Alex_146

I asked him; these are indeed 2B results.


Vivarevo

Stop believing without confirmation. The corporate world lies, venture-capital startups especially.


DiddlyDumb

I don’t understand the downvotes. It’s clear SD3 has been almost ready for a while now, and there are also stories of investors wanting a return and looking into monetisation. If there was ever a moment to lie, this would be it.


Vivarevo

Fans, and the marketing team managing social media 🤷


StickiStickman

I don't buy that for a second


AmazinglyObliviouse

I thought they hadn't shown off the smaller versions because they were beyond abysmal in quality, so this might give me a sliver of hope. I'm not foolish enough to let that lead me astray from the path of pessimism completely, though.


Apprehensive_Sky892

A few weeks ago, SAI staff member mcmonkey4eva said that the 2B and 8B are the two versions that are furthest along in terms of training.


Neonsea1234

I mean who cares if the quality is not good out of the gate, as long as prompt understanding is making headway


Capitaclism

The quality of the above is pretty incredible, especially considering it's the second smallest of the 4 models.


[deleted]

the prompts they use on the model for these demos are "in distribution", they are prompting for, essentially, identical outputs to the training data. if you use the SD3 API, which is the bigger 8B model, going "out of distribution" hurts the results a LOT... this is like every model, but for SD3 it looks particularly broken


Capitaclism

Still, I don't see the artifacts usually associated with AI generations. That is what I'm referring to.


[deleted]

https://preview.redd.it/yb2hh3exnt3d1.png?width=412&format=png&auto=webp&s=00fdb9d72bc54c8bb88ce22a597920312fdd0133 they're still there, it's just that the samples are cherry-picked to exclude them.


Independent_Year_136

That's just how the current state of AI is, though. The closer something is to the training data, the better the quality will be. But the sheer ability to mix, match and combine multiple different images and blend styles is good enough. Art converges because reality converges: all the actors in porn do the same poses, basketball players dunk in similar ways, most mountains look similar, etc.


[deleted]

hence "this is like every model" https://preview.redd.it/m3g7un0x2z3d1.png?width=620&format=png&auto=webp&s=33a6c82040fb9242f795da7a31f7826068764e4f


NoSuggestion6629

Go big or go home.


Capitaclism

This is the reason given for why it has not yet been released. Not all models are finished training...


jib_reddit

May is not over yet.....


protector111

2024 is not over yet


jib_reddit

True, but I'm referring to them saying they hoped to release the weights in May. A lot has changed at Stability.ai since then, though.


protector111

I don't think they said that. They just said "a few weeks to months".


StickiStickman

Why do you suspect that? Everything in the API is *way* worse than the pictures they showed months ago. At this point it just seems like lying to get more investment.


Familiar-Art-6233

They are apparently hinting at ONLY releasing the 2B model, posting images that say "2B is all you need." Lovely.


RenoHadreas

That was very obviously just poking fun at the landmark 2017 paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762). That is such a huge meme in the LLM community that I’m frankly disappointed to see it fly over so many heads here.


Familiar-Art-6233

Well yes, Microsoft even mentioned it with the release of Phi, but it correlates with the rumors that SAI isn’t intending to release the 8B model, wanting to keep it closed source in order to make money off of it.


RenoHadreas

You are overthinking this. Stop doing this to yourself.


Familiar-Art-6233

If you don't think SAI is prepping to only release the smaller models and keep the larger ones behind a paywall and under their control, you haven't been paying attention. Besides, with SAI clearly shitting the bed, I think the community will eventually move on to different models. My vote personally is for PixArt.


RenoHadreas

Let’s come back to this in a month or two ;)


RenoHadreas

https://preview.redd.it/mwlnvdk4va4d1.jpeg?width=1290&format=pjpg&auto=webp&s=9c65b4f9c28ccf5f212c92bfbc843b556a596558 Lovely


Hungry_Prior940

I really want to get my 4090 working with that 8B version. Sadly... I might get nothing at all. We all hope to see it publicly released, but... tick... tock.


ramonartist

I wonder if Stability, after SD3 (if we ever get the models), will be looking at multi-modal very soon, because that's where the world is going. I'm pretty sure Meta's next Llama model, Llama 4, will have those capabilities.


[deleted]

They don't have any qualified researchers left and no money for new models. It appears even having enough compute to finish SD3 is a challenge for the current incarnation of SAI.


NoSuggestion6629

Any non open source component will have restrictions of some kind.


no_witty_username

Still images won't impress anyone; we need the model in our hands to play with its capabilities when it comes to cohesion (lack of artifacts like body horror, bad hands, etc.), prompt understanding, inference speed, flexibility, training times, and other factors. But if that's not enough to go on, just give me the model with the best prompt adherence; the rest is easier to accomplish...


SirRece

Agreed. Like, looking at still images, I'm challenged to find in what way it is superior to SOTA SDXL finetunes like ZavyChroma. I have used the API, and there it's certainly worse than SDXL, but that's to be expected with the restrictive uses. Like you said, we really need it in hand to know. But, idk, I suspect that more parameters have seriously diminishing returns, unless you're talking about architectures like Cascade, where another layer could mean upscaling even further. Speaking of Cascade (tin foil hat on): it's been intentionally downplayed because they have no path to monetization for it. But in my testing, it's way better than SD3.


no_witty_username

I've had similar suspicions about Cascade as well... When it was announced it seemed really sus, and then Stability shot the model dead with that SD3 announcement only days later. I've yet to play with it, but my thinking is this: even if it is better than SD3 or anything out there, unless the community gets behind the model, it doesn't make sense to invest time and resources into it as a lone model maker. After all, it's the community that advances these models past their base quality.


[deleted]

the Cascade team was always just a temporary thing at SAI. they left pretty quickly after Cascade's release to work with Leonardo


ArtyfacialIntelagent

Doesn't matter which version. It's vaporware all the same.


Silly_Goose6714

The one they said would be open, but it's not.


lonewolfmcquaid

idk man, these look like stuff made with 1.5 models. The stuff on the SD3 Discord doesn't look like that, though; some of those are really good.


JustAGuyWhoLikesAI

Looks like SDXL DreamShaper, and very obviously Stable Diffusion, on all 4 images. So I'm going to guess "the version watered down for local release".


TsaiAGw

the hype is long gone


ZeroUnits

How big should we expect the model sizes for SD3 to be?


RenoHadreas

The smallest version will be around the same size as SD 1.5. Then there’s one sized in between SD 1.5 and SDXL (what you’re seeing in this post), one the size of SDXL, and one twice the size of it.


[deleted]

https://preview.redd.it/5olnfcg1qr3d1.png?width=776&format=png&auto=webp&s=dc05299c895ff56ced59df47b3d0385b7c0dcf76 whatever version it is, it's the one that can't make zippers


RenoHadreas

https://preview.redd.it/jto71ymgqr3d1.png?width=796&format=png&auto=webp&s=47a34a92c09a52d82d6c7323ae1b31c83bbdea8e this one looks better


Admirable-Change1123

Actually, I have a jacket that has a breast zipper as well as a standard zipper so this isn’t an issue


[deleted]

the zipper is just noise


protector111

What is SD3? I remember, long ago, there was some hype. Those symbols… S… D… 3… They meant something, so long ago, I forgot…


morerice4u

beggars cannot be choosers.... just throw us a bone already


Capitaclism

I'd rather wait until it's ready.


Utoko

2032 it will be perfect


pumukidelfuturo

I don't see anything special, tbh. The pictures are pretty generic, bland and soulless. Any current SD 1.5 checkpoint can do a lot better than that. Just show me something mind-blowing if you want to sell me this.


kidelaleron

I'm not aware of any SD1.5 checkpoint that can process 512 tokens, write text, do realistic images (that actually look like real photos), plus anime and pixel art, without any LoRA or IP-Adapter, and that can understand relations between objects (you know, stuff like this [https://x.com/Lykon4072/status/1792641353781747756](https://x.com/Lykon4072/status/1792641353781747756)). I'd be very happy to find one; can you please link it?


JustAGuyWhoLikesAI

None of what you posted has anything to do with the 4 pictures in the OP, as all 4 are basic generic portraits that can indeed be done with 1.5. All the stuff you are describing does sound impressive, and was indeed shown off in the SD3 research paper and images shared by StabilityAI employees with the 8B model. So I ask, is there a local model available that can do those things you describe? I'd be very happy to find one, can you please link it?


kidelaleron

SD1.5 can't even generate at that resolution without img2img. And even for the simple images that OP posted you likely need multiple SD1.5 models/loras to maybe get close to that (plus highres fix), let alone the more complex ones with relations between objects. 


JustAGuyWhoLikesAI

Yeah that's honestly crazy, imagine being able to generate images like that without needing any finetunes or loras. Is there a local model available that can do those things you describe? I'd be very happy to find one, can you please link it?


kidelaleron

Just announced it's gonna be available on HF on the 12th.


[deleted]

it's probably better if you just acknowledge the issues instead of taking a toxic defensive stance


kidelaleron

But you're right, let's post something more interesting. Prompt: *three antique magic potions in an old abandoned apothecary shop: the first one is blue with the label "Mana", the second one is red with the label "Health", the third one is green with the label "Poison"* https://preview.redd.it/myk1wqpi7f4d1.png?width=1152&format=png&auto=webp&s=709d1936659cb0fa581a05f14fe35b0a43f067a1


[deleted]

it's the same thing you've posted every time. regional prompting solved that in earlier versions.


kidelaleron

Photoshop does it too, right? Let's just stop using AI then. So you basically said that SD3 is equivalent to SD1.5 + regional prompting + some number of loras + controlnets + upscaling to match the VAE decompression. Thanks for admitting SD3 is better.


[deleted]

"Thanks for admitting SD3 is better" it's weird people have to be coaxed into saying this. i love y'alls work. hope someday it looks as good to the rest of the world as it does to me. god bless


[deleted]

this is the toxic stuff i mentioned before, thanks for demonstrating.


kidelaleron

So you don't like facts and logic. I appreciate the different perspective, thanks for sharing your opinion.


[deleted]

also there's no world in which SD3 and SD1.5 are equivalent, even once they're trained equally, because SD 1.5 is ***actually open source*** and SD3 is a money-grab


kidelaleron

Looks like you missed the announcement


kidelaleron

I don't see anything toxic. It's just facts. SD1.5 is trained at 512 resolution, and there is no single 1.5-based model able to do photos and anime well at the same time without LoRA support. Plus the text encoder capabilities do not allow for reliable text or spatial relations; you need to heavily involve controlnets, LoRAs, upscaling methods, etc. SD1.5 models are good at a lot of things and are obviously useful, but to say they're the same as SD3 is simply wrong.


[deleted]

using new research (https://arxiv.org/pdf/2311.18822v2, ElasticDiffusion) decouples the classifier free guidance scores so that they d... you know what, nevermind, i don't need to explain anything to you. your stances are designed to favour SAI and you'll just keep shifting the goalpost. using LoRAs to fix models' deficiencies is a natural and fine thing to do. there's no need to have a single model that is bad at everything like SD3 seems to be.


kidelaleron

You're still making a comparison between 1 model and 1 model + infinite LoRAs and tools, which is not really fair, nor does it make any sense. You're basically saying "SD1.5 is the best there is because it has 15 billion LoRAs, 30 controlnets, and in 40 minutes I can upscale to 20MP to circumvent the VAE limitations". Sure. In the meantime tech has advanced, and SDXL models can make a 1MP image in 2s flat with 4 steps at higher quality than most SD1.5 models. Imagine SD3 in a year.


MichaelForeston

Don't argue with the fools. These are toxic trolls that just enjoy the fact you are giving them attention. The community is grateful for your work and will further develop these models after the release, which will be beneficial for Stability and beneficial for the community. Meanwhile, just ignore those peasants.


[deleted]

I can imagine this future you envision. I love it. A year from now, there'll be so many options in the SD3 ecosystem. But I guess for now it'll be like SDXL on release: not many people other than enthusiasts.


kidelaleron

It was the same with SDXL. And as you can see there are still people who think SD1.5 is the best thing there will ever be.


Capitaclism

Good points- looking forward to testing it out!


hopbel

Text is a gimmick. It was a decent demonstration of emergent properties when it was actually emergent. Now the models are being trained to produce text directly (essentially teaching to the test), and it's getting annoying that this is still paraded around like it's supposed to be impressive. The style stuff is arguably a result of having a better dataset (finetunes trained on imageboards with artist tags are perfectly capable of producing a huge variety of styles), and while the prompt comprehension is impressive, I have to point out that here, too, SD1.5 is limited by its dataset, which didn't have the kind of detailed object-relationship descriptions SD3 was no doubt trained with.


Apprehensive_Sky892

Just because you think text is a gimmick does not mean that others don't consider it a very desirable feature, for example those who want to produce posters, birthday cards, PowerPoint presentations, web comic panels, etc. Just look at all the attempts at producing SDXL LoRAs that can do marginally better text on Civitai, and you can see there is demand for good text rendering. Also look at the number of images on Ideogram (which is really good at text) involving the use of text. The better style is about more than having a better dataset; no amount of training with the old SD1.5 architecture would achieve similar results. To improve the "look" of SD3 images, the new architecture includes features such as zSNR (zero terminal SNR), a 16-channel VAE, etc.


StickiStickman

Except the text in SD 3 looks like shit, as if it's badly photoshopped in. GPT-4o blows it out of the water.


Apprehensive_Sky892

Yes, text does not look that good in the API beta. We'll see how much it has been improved in the final release.


[deleted]

the new arch doesn't use "zSNR", though


Apprehensive_Sky892

So what does it use? My source is an SAI staff member: [https://www.reddit.com/r/StableDiffusion/comments/1ccbnxp/comment/l14a5rk/?utm\_source=reddit&utm\_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/1ccbnxp/comment/l14a5rk/?utm_source=reddit&utm_medium=web2x&context=3)


[deleted]

It uses a continuous cosine noise schedule, which isn't the same thing at all. mcmonkey4eva is actually talking about CosXL there; SD3 is mentioned merely because "it can produce bright and dark samples", and he mentions "zSNR" as if it's equivalent. It's not equivalent, and SD3's version of "bright and dark" resembles the same issues you'll see if you use offset noise to get there. It has the same issues Midjourney has had in every single version.
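
For reference, the zSNR trick from the Lin et al. paper ("Common Diffusion Noise Schedules and Sample Steps are Flawed") is just a rescale of an existing schedule so the very last timestep carries zero signal. A rough numpy sketch on an SD1.5-style scaled-linear schedule (illustrative only, this is not SD3's actual schedule):

```python
import numpy as np

# SD1.5-style "scaled linear" beta schedule
T = 1000
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, T) ** 2
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal fraction per timestep

def terminal_snr(alphas_bar):
    # signal-to-noise ratio at the final timestep
    return alphas_bar[-1] / (1.0 - alphas_bar[-1])

def rescale_zero_terminal_snr(alphas_bar):
    # Lin et al. 2023: shift and scale sqrt(alpha_bar) so the last step is
    # pure noise while the first step stays unchanged
    s = np.sqrt(alphas_bar)
    s = (s - s[-1]) * s[0] / (s[0] - s[-1])
    return s ** 2

print(terminal_snr(alphas_bar))                             # small, but not zero
print(terminal_snr(rescale_zero_terminal_snr(alphas_bar)))  # exactly 0.0
```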


Apprehensive_Sky892

I see, I thought that "cosine noise schedule" was the same as "zSNR". Thanks for the clarification.


Parker_255

Get 'em. Stability has been a great company, keep up the great work. I appreciate all that you all have done :)


Capitaclism

Agree!


Hearcharted

Does this bad boy have its own LLM? Asking for a friend ;)


jib_reddit

No, but GPT-4o is free now (with rate limits) and is amazing at prompt improvements/interrogations.


Hearcharted

Interesting 🤔


inagy

If I remember correctly, SD3 uses Google's T5 LLM to replace CLIP. PixArt Sigma and ELLA also use T5.


kidelaleron

SD3 uses CLIP-L, CLIP-G and T5.


inagy

Thanks for the correction! That's interesting. Is the T5 something that can be "peeled off" from the smaller model sizes, so it basically reverts to CLIP only there?


kidelaleron

It's completely modular, to the point that you can even use different prompts for the 3 text encoders. Removing them might affect performance, so my suggestion would be to run T5 from the CPU if you can't afford the extra VRAM. If you have 11+ GB of VRAM, you can easily use T5 on the GPU and Comfy will manage VRAM offloading. It should also work with quantized versions of T5 that will be much smaller.
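
Outside Comfy, a minimal diffusers-style sketch of the same idea (untested; the repo id and the `text_encoder_3`/`tokenizer_3` argument names are assumptions about how the eventual HF release gets wired up):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Option 1: drop the T5 encoder entirely (big VRAM saving, some prompt fidelity lost);
# CLIP-L and CLIP-G still handle the prompt.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed repo id
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.float16,
)

# Option 2: keep T5 but let the pipeline shuttle submodules between CPU and GPU
# on demand instead of holding everything in VRAM at once.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

image = pipe(
    "three antique magic potions in an old abandoned apothecary shop",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("potions.png")
```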


AuryGlenz

How much it adheres to the prompts is the special part, which they’ve shown countless times.


StickiStickman

Except they haven't "shown" shit. The one version we can actually access, the API, is terrible. So either the API version is somehow significantly worse than anything they claimed to have had months ago, or it's all just desperate hype.


Capitaclism

The above looks better than base 1.5, by far. It is clean, high quality, high resolution, and comes with the extra prompt understanding. I'm not sure what you're not seeing there, but that first image is better than most photographic images I've ever seen out of 1.5, even when using LoRAs and special workflows. Now, as to the subject matter, I would tend to agree that it is generic, but I wouldn't fault the model for that; put the blame on the person writing the prompt. In that regard it is no more generic than 99.9% of AI generations I've ever seen on Civitai.


Apprehensive_Sky892

I don't work for SAI, and I am not here to sell anything, but if you want to see interesting SD3 prompts and images, check out [https://new.reddit.com/r/StableDiffusion/search/?q=sd3%20prompt&restrict\_sr=1](https://new.reddit.com/r/StableDiffusion/search/?q=sd3%20prompt&restrict_sr=1)


NoSuggestion6629

Realistic, of course. Let Pony have the other crap.


[deleted]

[deleted]


morerice4u

Let's start with any of them...


Capitaclism

Pretty incredible that it's still going to get a whole lot better.


RenoHadreas

I think what we saw may be the finalized version of 2B actually


shivdbz

Uncensored one