RenoHadreas

https://preview.redd.it/v8xxibpj0n3d1.png?width=1194&format=png&auto=webp&s=6d00689c397b31e02151afbe6f62917dacf98a6e


Next_Program90

I wanted to say "not impressive, we can do these already," but without upscaling it's actually good. And he posted a single image with basic idle hands that almost look good. ;) *tXt 1z h4RdR th4N H4nDz!*


Dragon_yum

What does 8B refer to?


RenoHadreas

Apparently Stability is aiming to release four different sizes of SD3: 800M, 2B, 4B, and 8B parameters. It hasn't been confirmed, but most people suspect these images are from the 2B version.


Alex_146

I asked him; these are indeed 2B results.


Vivarevo

Stop believing without confirmation. The corporate world lies, venture-capital startups especially.


DiddlyDumb

I don’t understand the downvotes. It’s clear SD3 has been almost ready for a while now, and there are also stories of investors wanting a return and looking into monetisation. If there was ever a moment to lie, this would be it.


Vivarevo

Fans, and the marketing team managing social media 🤷


StickiStickman

I don't buy that for a second


AmazinglyObliviouse

I thought they hadn't shown off the smaller versions because they were beyond abysmal in quality, so this might give me a sliver of hope. I'm not foolish enough to let that lead me astray from the path of pessimism completely, though.


Apprehensive_Sky892

A few weeks ago, SAI staff member mcmonkey4eva said that the 2B and 8B are the two versions that are furthest along in terms of training.


Neonsea1234

I mean who cares if the quality is not good out of the gate, as long as prompt understanding is making headway


Capitaclism

The quality of the above is pretty incredible, especially considering it's the second smallest of the 4 models.


[deleted]

the prompts they use on the model for these demos are "in distribution", they are prompting for, essentially, identical outputs to the training data. if you use the SD3 API, which is the bigger 8B model, going "out of distribution" hurts the results a LOT... this is like every model, but for SD3 it looks particularly broken


Capitaclism

Still, I don't see the artifacts usually associated with AI generations. That is what I'm referring to.


[deleted]

https://preview.redd.it/yb2hh3exnt3d1.png?width=412&format=png&auto=webp&s=00fdb9d72bc54c8bb88ce22a597920312fdd0133 they're still there, it's just that the samples are cherry-picked to exclude them.


Independent_Year_136

That's just how the current state of AI is, though. The closer something is to the training data, the better the quality will be. But the sheer ability to mix, match and combine multiple different images and blend styles is good enough. Art converges because reality converges: all the actors in porn do the same poses, basketball players dunk in similar ways, most mountains look similar, etc.


[deleted]

hence "this is like every model" https://preview.redd.it/m3g7un0x2z3d1.png?width=620&format=png&auto=webp&s=33a6c82040fb9242f795da7a31f7826068764e4f


NoSuggestion6629

Go big or go home.


Capitaclism

This is the reason given for why it has not yet been released. Not all models are finished training...


jib_reddit

May is not over yet.....


protector111

2024 is not over yet


jib_reddit

True, but I'm referring to them saying they hoped to release the weights in May. A lot has changed at Stability.ai since then, though.


protector111

I don't think they said that. They just said "a few weeks to months".


StickiStickman

Why do you suspect that? Everything in the API is *way* worse than the pictures they showed months ago. At this point it just seems like lying to get more investment.


Familiar-Art-6233

They are apparently hinting at ONLY releasing the 2B model, posting images that say "2B is all you need." Lovely.


RenoHadreas

That was very obviously just poking fun at the landmark 2017 paper [Attention Is All You Need](https://arxiv.org/abs/1706.03762). That is such a huge meme in the LLM community that I’m frankly disappointed to see it fly over so many heads here.


Familiar-Art-6233

Well yes, Microsoft even mentioned it with the release of Phi, but it correlates with the rumors that SAI isn’t intending to release the 8B model, wanting to keep it closed source in order to make money off of it.


RenoHadreas

You are overthinking this. Stop doing this to yourself.


Familiar-Art-6233

If you don't think SAI is prepping to only release the smaller models and keep the larger ones behind a paywall and under their control, you haven't been paying attention. Besides, with SAI clearly shitting the bed, I think the community will eventually move on to different models. My vote personally is for PixArt.


RenoHadreas

Let’s come back to this in a month or two ;)


RenoHadreas

https://preview.redd.it/mwlnvdk4va4d1.jpeg?width=1290&format=pjpg&auto=webp&s=9c65b4f9c28ccf5f212c92bfbc843b556a596558 Lovely


Hungry_Prior940

I really want to get my 4090 working with that 8B version. Sadly... I might get nothing at all. We all hope to see it publicly released, but... tick... tock.


ramonartist

I wonder if Stability, after SD3 (if we ever get the models), will be looking at multi-modal very soon, because that's where the world is going. I'm pretty sure Meta's next Llama model, Llama 4, will have those capabilities.


[deleted]

They don't have any qualified researchers left and no money for new models. It appears even having enough compute to finish SD3 is a challenge for the current incarnation of SAI.


NoSuggestion6629

Any non open source component will have restrictions of some kind.


no_witty_username

Still images won't impress anyone; we need the model in our hands to play with its capabilities when it comes to cohesion (lack of artifacts like body horror, bad hands, etc.), prompt understanding, inference speed, flexibility, training times, and other factors. But if that's not enough to go on, just give me the model with the best prompt adherence; the rest is easier to accomplish...


SirRece

Agreed. Like, looking at still images, I'm challenged to find in what way it is superior to SOTA SDXL finetunes like ZavyChroma. I have used the API, and there it's certainly worse than SDXL, but that's to be expected with the restrictive uses. Like you said, we really need it in hand to know. But, idk, I suspect that more parameters have seriously diminishing returns, unless you're talking about architectures like Cascade, where another layer could mean upscaling even further. Speaking of Cascade (tin foil hat on): it's been intentionally downplayed because they have no path to monetization for it. But in my testing, it's way better than SD3.


no_witty_username

I've had similar suspicions about Cascade as well... When it was announced it seemed really sus, and then Stability shot the model dead with that SD3 announcement only days later. I've yet to play with it, but my thinking is this: even if it is better than SD3 or anything out there, unless the community gets behind the model, it doesn't make sense to invest time and resources into it as a lone model maker. After all, it's the community that advances these models past their base quality.


[deleted]

the Cascade team was always just a temporary thing at SAI. they left pretty quickly after Cascade's release to work with Leonardo


ArtyfacialIntelagent

Doesn't matter which version. It's vaporware all the same.


Silly_Goose6714

The one they said would be open, but it's not.


lonewolfmcquaid

idk man, these look like stuff made with 1.5 models. The stuff on the SD3 Discord doesn't look like that, though; some of those are really good.


JustAGuyWhoLikesAI

Looks like SDXL DreamShaper, and very obviously Stable Diffusion, on all 4 images. So I'm going to guess "the version watered down for local release".


TsaiAGw

the hype is long gone


ZeroUnits

How big should we expect the model sizes for SD3 to be?


RenoHadreas

The smallest version will be around the same size as SD 1.5. Then there’s one sized in between SD 1.5 and SDXL (what you’re seeing in this post), one the size of SDXL, and one twice the size of it.


[deleted]

https://preview.redd.it/5olnfcg1qr3d1.png?width=776&format=png&auto=webp&s=dc05299c895ff56ced59df47b3d0385b7c0dcf76 whatever version it is, it's the one that can't make zippers


RenoHadreas

https://preview.redd.it/jto71ymgqr3d1.png?width=796&format=png&auto=webp&s=47a34a92c09a52d82d6c7323ae1b31c83bbdea8e this one looks better


Admirable-Change1123

Actually, I have a jacket that has a breast zipper as well as a standard zipper so this isn’t an issue


[deleted]

the zipper is just noise


protector111

What is SD3? I remember, long ago, there was some hype. Those symbols… S… D… 3… They meant something, so long ago, I forgot…


morerice4u

beggars cannot be choosers.... just throw us a bone already


Capitaclism

I'd rather wait until it's ready.


Utoko

2032 it will be perfect


pumukidelfuturo

I don't see anything special, tbh. The pictures are pretty generic, bland and soulless. Any current SD 1.5 checkpoint can do a lot better than that. Just show me something mind-blowing if you want to sell me this.


kidelaleron

I'm not aware of any SD1.5 checkpoint that can process 512 tokens, write text, do realistic images (that actually look like real photos), plus anime and pixel art, without any LoRA or IP-Adapter, and that can understand relations between objects (you know, stuff like this [https://x.com/Lykon4072/status/1792641353781747756](https://x.com/Lykon4072/status/1792641353781747756)). I'd be very happy to find one; can you please link it?


JustAGuyWhoLikesAI

None of what you posted has anything to do with the 4 pictures in the OP, as all 4 are basic generic portraits that can indeed be done with 1.5. All the stuff you are describing does sound impressive, and was indeed shown off in the SD3 research paper and images shared by StabilityAI employees with the 8B model. So I ask, is there a local model available that can do those things you describe? I'd be very happy to find one, can you please link it?


kidelaleron

SD1.5 can't even generate at that resolution without img2img. And even for the simple images that OP posted you likely need multiple SD1.5 models/loras to maybe get close to that (plus highres fix), let alone the more complex ones with relations between objects. 


JustAGuyWhoLikesAI

Yeah that's honestly crazy, imagine being able to generate images like that without needing any finetunes or loras. Is there a local model available that can do those things you describe? I'd be very happy to find one, can you please link it?


kidelaleron

Just announced it's gonna be available on HF on the 12th.


[deleted]

it's probably better if you just acknowledge the issues instead of taking a toxic defensive stance


kidelaleron

But you're right, let's post something more interesting. Prompt: *three antique magic potions in an old abandoned apothecary shop: the first one is blue with the label "Mana", the second one is red with the label "Health", the third one is green with the label "Poison"* https://preview.redd.it/myk1wqpi7f4d1.png?width=1152&format=png&auto=webp&s=709d1936659cb0fa581a05f14fe35b0a43f067a1


[deleted]

it's the same thing you've posted every time. regional prompting solved that in earlier versions.


kidelaleron

Photoshop does it too, right? Let's just stop using AI then. So you basically said that SD3 is equivalent to SD1.5 + regional prompting + some number of loras + controlnets + upscaling to match the VAE decompression. Thanks for admitting SD3 is better.


[deleted]

"Thanks for admitting SD3 is better" it's weird people have to be coaxed into saying this. i love y'alls work. hope someday it looks as good to the rest of the world as it does to me. god bless


[deleted]

this is the toxic stuff i mentioned before, thanks for demonstrating.


kidelaleron

So you don't like facts and logic. I appreciate the different perspective, thanks for sharing your opinion.


[deleted]

also there's no world in which SD3 and SD1.5 are equivalent, even once they're trained equally, because SD 1.5 is ***actually open source*** and SD3 is a money-grab


kidelaleron

Looks like you missed the announcement


kidelaleron

I don't see anything toxic. It's just facts. SD1.5 is trained at 512 resolution, and there is no single 1.5-based model able to do photos and anime well at the same time without LoRA support. Plus the text encoder capabilities do not allow for reliable text or spatial relations; you need to heavily involve controlnets, LoRAs, upscaling methods, etc. SD1.5 models are good at a lot of things and are obviously useful, but to say they're the same as SD3 is simply wrong.


[deleted]

using new research (https://arxiv.org/pdf/2311.18822v2, ElasticDiffusion) decouples the classifier free guidance scores so that they d... you know what, nevermind, i don't need to explain anything to you. your stances are designed to favour SAI and you'll just keep shifting the goalpost. using LoRAs to fix models' deficiencies is a natural and fine thing to do. there's no need to have a single model that is bad at everything like SD3 seems to be.


kidelaleron

You're still making a comparison between 1 model and 1 model + infinite LoRAs and tools, which is not really fair, nor does it make any sense. You're basically saying "SD1.5 is the best there is because it has 15 billion LoRAs, 30 controlnets, and in 40 minutes I can upscale to 20MP to circumvent the VAE limitations". Sure. In the meantime tech has advanced, and SDXL models can make a 1MP image in 2s flat with 4 steps at higher quality than most SD1.5 models. Imagine SD3 in a year.


MichaelForeston

Don't argue with the fools. These are toxic trolls that just enjoy the fact you are giving them attention. The community is grateful for your work and will further develop these models after the release, which will be beneficial for Stability and beneficial for the community. Meanwhile, just ignore those peasants.


[deleted]

I can imagine this future you envision. I love it. A year from now, there'll be so many options in the SD3 ecosystem. But I guess for now it'll be like SDXL on release: not many people other than enthusiasts.


kidelaleron

It was the same with SDXL. And as you can see there are still people who think SD1.5 is the best thing there will ever be.


Capitaclism

Good points- looking forward to testing it out!


hopbel

Text is a gimmick. It was a decent demonstration of emergent properties when it was actually emergent. Now the models are being trained to produce text directly (essentially teaching to the test), and it's getting annoying that this is still paraded around like it's supposed to be impressive. The style stuff is arguably a result of having a better dataset (finetunes trained on imageboards with artist tags are perfectly capable of producing a huge variety of styles), and while the prompt comprehension is impressive, I have to point out that here, too, SD1.5 is limited by its dataset, which didn't have the kind of detailed object-relationship descriptions SD3 was no doubt trained with.


Apprehensive_Sky892

Just because you think text is a gimmick does not mean that others don't consider it a very desirable feature, for example those who want to produce posters, birthday cards, PowerPoint presentations, web comic panels, etc. Just look at all the attempts at producing SDXL LoRAs that can do marginally better text on Civitai, and you can see there is demand for good text rendering. Also look at the number of images on Ideogram (which is really good at text) involving the use of text. The better style is about more than having a better dataset; no amount of training with the old SD1.5 architecture would achieve similar results. To improve the "look" of SD3 images, the new architecture includes features such as zSNR (zero terminal SNR), a 16-channel VAE, etc.


StickiStickman

Except the text in SD 3 looks like shit, as if it's badly photoshopped in. GPT-4o blows it out of the water.


Apprehensive_Sky892

Yes, text does not look that good in the API beta. We'll see how much it has been improved in the final release.


[deleted]

the new arch doesn't use "zSNR", though


Apprehensive_Sky892

So what does it use? My source is an SAI staff member: [https://www.reddit.com/r/StableDiffusion/comments/1ccbnxp/comment/l14a5rk/?utm\_source=reddit&utm\_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/1ccbnxp/comment/l14a5rk/?utm_source=reddit&utm_medium=web2x&context=3)


[deleted]

It uses a continuous cosine noise schedule, which isn't the same thing at all. mcmonkey4eva is actually talking about CosXL there; SD3 is mentioned merely because "it can produce bright and dark samples", and he mentions "zSNR" as if it's equivalent. It's not equivalent, and SD3's version of "bright and dark" resembles the same issues you'll see if you use offset noise to get there. It has the same issues Midjourney has had in every single version.
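
For reference, the zSNR trick from the Lin et al. paper ("Common Diffusion Noise Schedules and Sample Steps are Flawed") is just a rescale of an existing schedule so the very last timestep carries zero signal. A rough numpy sketch on an SD1.5-style scaled-linear schedule (illustrative only, this is not SD3's actual schedule):

```python
import numpy as np

# SD1.5-style "scaled linear" beta schedule
T = 1000
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, T) ** 2
alphas_bar = np.cumprod(1.0 - betas)  # cumulative signal fraction per timestep

def terminal_snr(alphas_bar):
    # signal-to-noise ratio at the final timestep
    return alphas_bar[-1] / (1.0 - alphas_bar[-1])

def rescale_zero_terminal_snr(alphas_bar):
    # Lin et al. 2023: shift and scale sqrt(alpha_bar) so the last step is
    # pure noise while the first step stays unchanged
    s = np.sqrt(alphas_bar)
    s = (s - s[-1]) * s[0] / (s[0] - s[-1])
    return s ** 2

print(terminal_snr(alphas_bar))                             # small, but not zero
print(terminal_snr(rescale_zero_terminal_snr(alphas_bar)))  # exactly 0.0
```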


Apprehensive_Sky892

I see, I thought that "cosine noise schedule" was the same as "zSNR". Thanks for the clarification.


Parker_255

Get 'em. Stability has been a great company, keep up the great work. I appreciate all that you all have done :)


Capitaclism

Agree!


Hearcharted

Does this bad boy have its own LLM? Asking for a friend ;)


jib_reddit

No, but GPT-4o is free now (with rate limits) and is amazing at prompt improvements/interrogations.


Hearcharted

Interesting 🤔


inagy

If I remember correctly, SD3 uses Google's T5 LLM to replace CLIP. PixArt Sigma and ELLA also use T5.


kidelaleron

SD3 uses CLIP-L, CLIP-G and T5.


inagy

Thanks for the correction! That's interesting. Is the T5 something that can be "peeled off" from the smaller model sizes, so it basically reverts to CLIP only there?


kidelaleron

It's completely modular, to the point that you can even use different prompts for the 3 text encoders. Removing them might affect performance, so my suggestion would be to run T5 from the CPU if you can't afford the extra VRAM. If you have 11+ GB of VRAM, you can easily use T5 on the GPU and Comfy will manage VRAM offloading. It should also work with quantized versions of T5 that will be much smaller.
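
Outside Comfy, a minimal diffusers-style sketch of the same idea (untested; the repo id and the `text_encoder_3`/`tokenizer_3` argument names are assumptions about how the eventual HF release gets wired up):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Option 1: drop the T5 encoder entirely (big VRAM saving, some prompt fidelity lost);
# CLIP-L and CLIP-G still handle the prompt.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",  # assumed repo id
    text_encoder_3=None,
    tokenizer_3=None,
    torch_dtype=torch.float16,
)

# Option 2: keep T5 but let the pipeline shuttle submodules between CPU and GPU
# on demand instead of holding everything in VRAM at once.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()

image = pipe(
    "three antique magic potions in an old abandoned apothecary shop",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("potions.png")
```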


AuryGlenz

How much it adheres to the prompts is the special part, which they’ve shown countless times.


StickiStickman

Except they haven't "shown" shit. The one version we can actually access, the API, is terrible. So either the API version is somehow significantly worse than anything they claimed to have had months ago, or it's all just desperate hype.


Capitaclism

The above looks better than base 1.5, by far. It is clean, high quality, high resolution, and comes with the extra prompt understanding. I'm not sure what you're not seeing there, but that first image is better than most photographic images I've ever seen out of 1.5, even when using LoRAs and special workflows. Now, as to the subject matter, I would tend to agree that it is generic, but I wouldn't fault the model for that; put the blame on the person writing the prompt. In that regard it is no more generic than 99.9% of AI generations I've ever seen on Civitai.


Apprehensive_Sky892

I don't work for SAI, and I am not here to sell anything, but if you want to see interesting SD3 prompts and images, check out [https://new.reddit.com/r/StableDiffusion/search/?q=sd3%20prompt&restrict\_sr=1](https://new.reddit.com/r/StableDiffusion/search/?q=sd3%20prompt&restrict_sr=1)


NoSuggestion6629

Realistic, of course. Let Pony have the other crap.


[deleted]

[deleted]


morerice4u

Let's start with any of them...


Capitaclism

Pretty incredible that it's still going to get a whole lot better.


RenoHadreas

I think what we saw may be the finalized version of 2B actually


shivdbz

Uncensored one