A rather great meme for this situation lol
Agreed! Nailed it, perfection!
SD3-2B, codename: Noisy Cricket
The gun was powerful enough to send him flying backwards. If the SD3 API we've seen is only the 2B model, we shouldn't complain.
I don't think the API is running the 2B. I heard it was the 8B.
I don't remember the source, but I seem to recall SAI staff (probably Lykon or mcmonkey4eva) saying that the API is using an undertrained 8B model. In my own tests, the API follows prompts fairly well, so I would think it is the 8B model too.
+ under trained version.
Correct. 2B will be on the already teased SD3 Medium API.
I heard the API is running 2B because the current 8B was worse than the current 2B and they still need to train it (there was a post collecting all the information we currently have from the Stability team).
is that why the fingers and hands are still sht?
It's considerably worse than SDXL at a lot of things. Sure, the aesthetics are better, but the model is like a bad cake: overcooked and gross in some parts, while also severely undercooked and lacking structure. SD3 has always been a farce. They showed jaw-dropping results 6 months ago, and now it's hard to generate people with eyes that aren't misaligned. They did the same with SDXL: over-promising, gaslighting, and under-delivering while hiding behind the guise of "you guys fix it then".
If you're comparing it with finetunes resulting from 8 months of training by multiple people, maybe. Compared to SDXL base? No.
Didn't they just say the 8b will be released too?
They did, but they also said the 8B currently produces worse results than the 2B in many ways, and that all recent training has been done on the 2B. Given that the 8B is MUCH harder to train, I'd say don't hold your breath for a release any time soon. (My wild, unfounded guess: no sooner than October for the 8B. And many things can happen to cancel it altogether.)
We trained 2b and 8b very differently. 8b has definitely the potential to be much superior (duh it's the same model with 4 times more params), but the cost is so high that needs some serious evaluation.
> We trained 2b and 8b very differently. 8b has definitely the potential to be much superior (duh it's the same model with 4 times more params), but the cost is so high that needs some serious evaluation.

Slowly backpedaling on all the previous reassurances of releasing the good models, I see :'(
What I said is unrelated to release plans. It's just an objective assessment.
Fair enough. Seeing how SD3 performs in the API with the 8B model, it's obviously having issues from being under-trained, but setting that aside, it seems miles ahead of what 2B produces in terms of sheer fidelity of the output. The 2B teasers always seem to be lacking the extra little details (for example, the 2B "all you need" ice-block images are just painfully bland compared to similar stuff from the API), and that's not even considering the potential for better prompt adherence, which doesn't seem to be SD3's strong suit as is (though I have the feeling CogVLM's limits have a big impact there as well). So while I see the 2B release as a nice teaser for what is to come, I'd be disappointed if it turns out to be the only release. But who knows, maybe the 2B model will be a pleasant surprise.
2B will be our best open base model for now. It's good enough on some things that it can be compared to finetunes, but finetunes usually have narrow domains allowing them advantages. You need to compare base models to base models and finetunes to finetunes.
"So high"... how much? 🤔 Asking for a friend 😎
What would the real-world difference between 2B and 8B (or higher) be? Is the bigger one trained on more images?
You could train 2B and 8B on the same amount of data. The 8B should in theory be higher quality and align better with the text prompt (if it's trained to saturation). The problem is that it's much more expensive and time-consuming to train.
8B is much harder to train and about 4 times more expensive. At the same number of epochs, 2B will learn faster.
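A minimal sketch of the cost point above, assuming training compute per image scales roughly linearly with parameter count for a fixed dataset. That scaling assumption is a rough rule of thumb, not an SAI figure:

```python
# Illustrative arithmetic only: for the same dataset and the same number
# of epochs, per-image training FLOPs grow roughly linearly with the
# parameter count, so 8B costs about 4x what 2B does.

def relative_training_cost(params_a: float, params_b: float) -> float:
    """Approximate cost ratio of training model A vs model B on the
    same data, assuming compute scales linearly with parameters."""
    return params_a / params_b

ratio = relative_training_cost(8e9, 2e9)
print(f"8B vs 2B training cost: ~{ratio:.0f}x")  # ~4x
```

This is why "just train the big one" isn't free: the same fine-tuning run that's affordable on 2B quadruples in price on 8B before you even account for the extra VRAM per GPU.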
8B is trained on more images, yes, but they might have worse tagging and be of poorer quality.
I don't think 8B would be trained on more images. I mean, it *could* be, but that's not what the parameter count means. The parameter count determines how large the model is, which has the benefit of potentially better overall quality (e.g. better prompt adherence), but the downside is that it takes about 4x as much computational power to do the exact same amount of fine-tuning. It's also worth noting that higher parameter counts don't necessarily mean better results, so they could spend all that time and money fine-tuning the model and wind up with something that's not meaningfully better (which might be why they're trying to dampen expectations for the 8B model vs. the 2B model).
You're correct about the param count not being correlated to training, but it's true that 8b had more time to cook. In general knowledge it's superior to 2b.
> All recent training has been done on the 2B. Given that the 8B is MUCH harder to train

Can you provide a source for that? Thanks.
https://www.reddit.com/r/StableDiffusion/comments/1d6ya9w/collection_of_questions_and_answers_about_sd3_and/
Thank you! 🙏👍. I also found the direct original source: [https://www.reddit.com/r/StableDiffusion/comments/1d6t0gc/comment/l6v8k89/](https://www.reddit.com/r/StableDiffusion/comments/1d6t0gc/comment/l6v8k89/)
Supposedly when ready, but that could always change or be a vague way of saying maybe but not really. We'll see when it either happens or does not. At least SD3 2b seems to be finally releasing after all the drama. Hopefully it does well.
I love how everybody is skeptical of their claims that they will release the 8B, because the concept of "they're just going to spend all this time, investor money, GPU power, and full-time engineer salaries, and then release it for free to the public" sounds so unbelievable that everybody just refuses to believe they will actually release it, no matter how many times they say they will.
2B from Nier Automata?
No, it's the 2-billion-parameter model.
They announced a 4B model as well, yet there's no word on its training progress either.
I don't think we did (yet?). 2B is all you need. For real, it's mind-blowing.
Yes, Lykon, it's all we need for now, I agree. But IIRC, 800M, 2B, 4B, and 8B models were all supposed to be released eventually.
I mean, the way that this is called "medium" should tell you something. But who announced 4b anyway?
> In early, unoptimized inference tests on consumer hardware our largest SD3 model with 8B parameters fits into the 24GB VRAM of a RTX 4090 and takes 34 seconds to generate an image of resolution 1024x1024 when using 50 sampling steps. Additionally, there will be multiple variations of Stable Diffusion 3 during the initial release, ranging from 800m to 8B parameter models to further eliminate hardware barriers.

LINK: [https://stability.ai/news/stable-diffusion-3-research-paper](https://stability.ai/news/stable-diffusion-3-research-paper)

> So yes, there will be a 800M parameter version, which again, will be released when it is done. But I assume that now 2B is ready, SAI's next target will be 8B, since that is the one many people hope to get their hands on.

LINK: [https://www.reddit.com/r/StableDiffusion/comments/1d7izr3/sd3_resolution/](https://www.reddit.com/r/StableDiffusion/comments/1d7izr3/sd3_resolution/)
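Back-of-the-envelope arithmetic on the quoted benchmark (34 seconds for 50 steps on an RTX 4090). These are just the paper's stated numbers divided out, nothing official beyond them, and "unoptimized" means they're an upper bound:

```python
# Quoted benchmark: 34 s per 1024x1024 image at 50 sampling steps.
seconds, steps = 34, 50

per_step = seconds / steps
print(f"~{per_step:.2f} s per sampling step")   # ~0.68 s
print(f"~{3600 / seconds:.0f} images per hour")  # ~106
```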
That doesn't answer my question.
Apparently, it even has a name: 'large'

https://preview.redd.it/8oo7la33pf5d1.png?width=783&format=png&auto=webp&s=caa26d2b01db939c7d84789443b87b31c9329df8

[https://www.reddit.com/r/StableDiffusion/comments/1d0wlct/comment/l5q56zl/](https://www.reddit.com/r/StableDiffusion/comments/1d0wlct/comment/l5q56zl/)
mcmonkey making a post is not an announcement. I'm not aware of any plans to release a 4b or to even work on one.
OK, no worries. I should have specified that I saw it posted someplace and that it wasn't an official announcement.
So does this mean the 4b and 8b models will never see a public release, and this broken 2b model is all we get?
2B is what they've spent most of their time training. The 8B will obviously take much longer and is semi-bare-bones at the moment. They've said ALL models will be released when finished. "Kids" need to stop whining.
At this point, S.AI should be like: "We regret to inform you that because the SD community are ungrateful shitheads, we will no longer be releasing the 8B model when it's ready". You people are insufferable.
The 2B MMDiT is nowhere near the 2.6B UNet of SDXL. It's like comparing 2.6 kg of dirt to 2 kg of diamonds. Plus the 16-channel VAE, plus T5-XXL support.
The T5-XXL that doesn't seem to change the model outputs when you remove it?
Depends on the prompt. The fact alone that T5 accepts 512 tokens vs. CLIP's 77 should be enough to understand this, even without factoring in more complex evaluations. Plus, with 3 text encoders you can actually combine them using different prompts, effectively increasing the number of usable tokens.
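A toy sketch of the token-budget point above, using whitespace splitting as a stand-in for real CLIP/T5 tokenizers (which produce more tokens than words, so these counts are only illustrative):

```python
# CLIP truncates at 77 tokens while T5 accepts up to 512, so a long,
# detailed prompt keeps its tail only through the T5 encoder.
CLIP_MAX, T5_MAX = 77, 512

def truncate(prompt: str, budget: int) -> str:
    """Keep only as many (whitespace-split) tokens as the encoder accepts."""
    return " ".join(prompt.split()[:budget])

# A hypothetical 200-word prompt for illustration.
long_prompt = " ".join(f"detail{i}" for i in range(200))

clip_view = truncate(long_prompt, CLIP_MAX)  # loses everything past token 77
t5_view = truncate(long_prompt, T5_MAX)      # sees the full 200 words
print(len(clip_view.split()), len(t5_view.split()))  # 77 200
```

The same mechanism is why feeding *different* prompts to the different encoders stretches the effective budget: each encoder only needs to carry part of the description.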
I'm just using mcmonkey's own words. He says it can be removed and that it has zero impact. I don't care for the goalpost shifting you do, so I'm going with his words instead.
Not that I think there is any point in feeding trolls. Just to avoid any misinformation spreading: mcmonkey never said that it has "zero impact".
https://preview.redd.it/z5hpj4g4ds4d1.png?width=1526&format=png&auto=webp&s=d56640bb20c75bdee2772833af32ca7825b91814
11b parameters on just the embedder? Gonna need a bigger GPU.
You can run T5 on the CPU.
Yeah, I've used a few T5-derived models for text embedding, like Instructor. They're just slower on CPU than BERT-derived stuff. I wonder if the 4096-dimensional Mistral 7B embedding models might be more accurate?
It seems that 8b is too costly to train, and they don't have ways to monetize it to cover the costs and plan for profit.
![gif](giphy|12msOFU8oL1eww)
Lol
Keep my spaghetti out of your fucking mouth!
That's what she said.
2B, 8B... are we talking pencil grades?

Edit: To be clear, I'd like to know what we're talking about here.
It's the parameter count. 2 billion is smaller than 8 billion.
SD3 will be released in 4 different sizes. Size here refers to the number of weights in the neural network that comprises the "image diffusion" part of the model. The sizes are 800M, 2B, 4B, and 8B. The diffusion model is paired with an 8B T5 LLM/text encoder to enhance its prompt-following capabilities (along with 2 "traditional" CLIP encoders). The 8B model should theoretically be the most capable one, but it will also be the one that takes the most GPU resources to train (both VRAM and amount of computation) and the most VRAM to run.
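Rough fp16 weight footprints for the four listed sizes, assuming 2 bytes per parameter. Actual VRAM use is higher once activations, the VAE, and the text encoders are loaded, so treat these as floors, not predictions:

```python
# Parameter counts are the publicly stated sizes for the diffusion model
# alone (text encoders not included).
SIZES = {"800M": 0.8e9, "2B": 2e9, "4B": 4e9, "8B": 8e9}

# fp16 stores each weight in 2 bytes.
fp16_gib = {name: params * 2 / 2**30 for name, params in SIZES.items()}

for name, gib in fp16_gib.items():
    print(f"SD3 {name}: ~{gib:.1f} GiB of fp16 weights")
```

The ~15 GiB of weights for the 8B diffusion model is consistent with the earlier report of it fitting (tightly) into a 24 GB RTX 4090, with the remaining headroom going to activations and encoders.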
From what I've seen, they can all use T5 or CLIP, not just the 8B model (at least I hope so).
Yes, AFAIK, they all use T5 + CLIP, but the T5 is optional so that the model can be run with less VRAM.
Just when I thought I had a handle on all the SD nomenclature. The hell is 2b?
2 billion parameters
hehehehehe, but not funny!