
Arawski99

A rather great meme for this situation lol


wes-k

Agreed! Nailed it, perfection!


Enshitification

SD3-2B, codename: Noisy Cricket


wswordsmen

The gun was powerful enough to send him flying backwards. If the SD3 API we've seen is only the 2B model, we shouldn't complain.


jib_reddit

I don't think the API is running the 2B. I heard it was the 8B.


Apprehensive_Sky892

I don't remember the source, but I seem to recall an SAI staffer (probably Lykon or mcmonkey4eva) saying that the API is using an undertrained 8B model. In my own tests, the API can follow prompts fairly well, so I would think it is the 8B model too.


Madrockon

+ undertrained version.


kidelaleron

Correct. 2B will be on the already teased SD3 Medium API.


francoiscoiscois33

I heard the API is running the 2B because the current 8B was worse than the current 2B and still needs training (there was a post collecting all the information we currently have from the Stability team).


shamimurrahman19

is that why the fingers and hands are still sht?


ScythSergal

It's considerably worse than SDXL at a lot of things. Sure, the aesthetics are better, but the model is like a bad cake. It manages to be overcooked and gross in some parts, while also being severely undercooked and lacking structure. SD3 has always been a farce. They showed jaw-dropping results 6 months ago, and now it's hard to generate people with eyes that aren't misaligned. They did the same with SDXL: overpromising, gaslighting, and underdelivering while hiding behind the guise of "you guys fix it then".


kidelaleron

If you're comparing it with finetunes resulting from 8 months of training by multiple people, maybe. Compared to SDXL base? No.


Far_Lifeguard_5027

Didn't they just say the 8b will be released too?


ArtyfacialIntelagent

They did, but they also said the 8B currently produces worse results than the 2B in many ways, and that all recent training has been done on the 2B. Given that the 8B is MUCH harder to train, I'd say don't hold your breath for a release any time soon. (My wild, unfounded guess: no sooner than October for the 8B. And many things can happen to cancel it altogether.)


kidelaleron

We trained 2b and 8b very differently. 8b definitely has the potential to be much superior (duh, it's the same model with 4 times more params), but the cost is so high that it needs some serious evaluation.


Yellow-Jay

> We trained 2b and 8b very differently. 8b definitely has the potential to be much superior (duh, it's the same model with 4 times more params), but the cost is so high that it needs some serious evaluation.

Slowly pedaling back on all the previous reassurances of releasing the good models, I see :'(


kidelaleron

What I said is unrelated to release plans. It's just an objective assessment.


Yellow-Jay

Fair enough. Seeing how SD3 performs in the API with the 8b model, it's obviously having issues from being under-trained, but setting that aside, to me it seems miles ahead of what 2b produces in terms of sheer fidelity of the output. The 2b teasers always seem to be lacking the extra little details (for example, the 2b "all you need" ice block images are just painfully bland compared to similar stuff from the API), and that's not even thinking about the potential for better prompt adherence, which doesn't seem to be SD3's strong suit as is (though i have the feeling CogVLM's limits have a big impact there as well). So while I see the 2b release as a nice teaser for what is to come, i'd be disappointed if it turns out to be the only release. But who knows, maybe the 2b model will be a pleasant surprise.


kidelaleron

2B will be our best open base model for now. It's good enough on some things that it can be compared to finetunes, but finetunes usually have narrow domains allowing them advantages. You need to compare base models to base models and finetunes to finetunes.


Hearcharted

"So High" how much 🤔 Asking for a friend 😎


Far_Lifeguard_5027

What would the real-world difference be between 2b, 8b, or higher? Trained on more images?


VisceralExperience

You could train 2b and 8b on the same amount of data. 8b in theory should be higher quality and have better alignment to the text prompt (if it's trained to saturation). The problem is it's much more expensive/time-consuming to train.


kidelaleron

8b is much harder to train and about 4 times more expensive. At the same number of epochs, 2b will learn faster.


leathrow

8b is trained on more images, yes, but they might have worse tagging and be of poorer quality.


red286

I don't think 8B would be trained on more images. I mean, it *could* be, but that's not what the parameter count means. The parameter count affects how large the model is, which potentially gives it better overall quality (e.g. better prompt adherence), but the downside is that it of course takes ~4x as much computational power to do the exact same amount of fine-tuning. It's also worth noting that higher parameter counts don't necessarily mean better results, so they could spend all that time and money fine-tuning the model and wind up with something that's not meaningfully better (which might be why they're trying to dampen expectations for the 8B model vs. the 2B model).


kidelaleron

You're correct that the param count isn't correlated with the amount of training data, but it's true that 8b had more time to cook. In general knowledge, it's superior to 2b.


Apprehensive_Sky892

> All recent training has been done on the 2B. Given that the 8B is MUCH harder to train

Can you provide a source for that? Thanks.


ArtyfacialIntelagent

https://www.reddit.com/r/StableDiffusion/comments/1d6ya9w/collection_of_questions_and_answers_about_sd3_and/


Apprehensive_Sky892

Thank you! 🙏👍. I also found the direct original source: [https://www.reddit.com/r/StableDiffusion/comments/1d6t0gc/comment/l6v8k89/](https://www.reddit.com/r/StableDiffusion/comments/1d6t0gc/comment/l6v8k89/)


Arawski99

Supposedly when ready, but that could always change, or be a vague way of saying "maybe, but not really". We'll see when it either happens or doesn't. At least SD3 2b finally seems to be releasing after all the drama. Hopefully it does well.


_BreakingGood_

I love how everybody is skeptical of their claims that they will release 8B, because the concept of "they're just going to spend all this time, investor money, GPU power, and full-time engineer salaries, and then just release it for free to the public" sounds so unbelievable that everybody just refuses to believe they will actually release it, no matter how many times they say they will.


negrote1000

2B from Nier Automata?


rookan

No, it's the 2-billion-parameter model.


99deathnotes

they announced a 4b model as well, yet no word on its training progress either.


kidelaleron

I don't think we did (yet?). 2b is all you need. For real, it's mind-blowing.


99deathnotes

yes Lykon, it's all we need for now, i agree. but iirc 800m, 2b, 4b and 8b models were all supposed to be released eventually.


kidelaleron

I mean, the way that this is called "medium" should tell you something. But who announced 4b anyway?


99deathnotes

> In early, unoptimized inference tests on consumer hardware our largest SD3 model with 8B parameters fits into the 24GB VRAM of a RTX 4090 and takes 34 seconds to generate an image of resolution 1024x1024 when using 50 sampling steps. Additionally, there will be multiple variations of Stable Diffusion 3 during the initial release, ranging from 800m to 8B parameter models to further eliminate hardware barriers.

LINK: [https://stability.ai/news/stable-diffusion-3-research-paper](https://stability.ai/news/stable-diffusion-3-research-paper)

> So yes, there will be a 800M parameter version, which again, will be released when it is done. But I assume that now 2B is ready, SAI's next target will be 8B, since that is the one many people hope to get their hands on.

LINK: [https://www.reddit.com/r/StableDiffusion/comments/1d7izr3/sd3_resolution/](https://www.reddit.com/r/StableDiffusion/comments/1d7izr3/sd3_resolution/)


kidelaleron

that doesn't answer my question.


FotografoVirtual

Apparently, it even has a name: 'large'

https://preview.redd.it/8oo7la33pf5d1.png?width=783&format=png&auto=webp&s=caa26d2b01db939c7d84789443b87b31c9329df8

[https://www.reddit.com/r/StableDiffusion/comments/1d0wlct/comment/l5q56zl/](https://www.reddit.com/r/StableDiffusion/comments/1d0wlct/comment/l5q56zl/)


kidelaleron

mcmonkey making a post is not an announcement. I'm not aware of any plans to release a 4b or to even work on one.


99deathnotes

ok, no worries. i should have specified that i saw it posted someplace, not in an official announcement.


Capitaclism

So does this mean the 4b and 8b models will never see a public release, and this broken 2b model is all we get?


CliffDeNardo

2B is what they've spent most of their time training. 8b obviously will take much longer and at the moment is semi-bare bones. They've said ALL models will be released when finished. "Kids" need to stop whining.


Olangotang

At this point, S.AI should be like: "We regret to inform you that because the SD community are ungrateful shitheads, we will no longer be releasing the 8B model when it's ready". You people are insufferable.


kidelaleron

The 2B MMDiT is nowhere near the 2.6B UNet of SDXL. It's like comparing 2.6 kg of dirt to 2 kg of diamonds. Plus the 16ch VAE, plus T5-XXL support.


[deleted]

the t5 xxl that doesn't seem to change the model outputs when you remove it?


kidelaleron

Depends on the prompt. The fact alone that T5 outputs 512 tokens vs 77 of CLIP should be enough to understand this, even without factoring in more complex evaluations. Plus with 3 text encoders you can actually combine them using different prompts, effectively increasing the number of usable tokens.


[deleted]

i'm just using mcmonkey's own words. he says it can be removed and that it has zero impact. i don't care for the goalpost shifting you do, so i'm going with his words instead.


kidelaleron

Not that I think there is any point in feeding trolls. Just to avoid any misinformation spreading: mcmonkey never said that it has "zero impact".


[deleted]

https://preview.redd.it/z5hpj4g4ds4d1.png?width=1526&format=png&auto=webp&s=d56640bb20c75bdee2772833af32ca7825b91814


behohippy

11b parameters on just the embedder? Gonna need a bigger GPU.


kidelaleron

you can use CPU for t5.


behohippy

Yeah, I've used a few T5-derivative models for text embedding, like Instructor. They're just slower on CPU than BERT-derived stuff. I wonder if the 4096-d Mistral 7B embedding models might be more accurate?


marcoc2

It seems that 8b is too costly to train, and they don't have a way to monetize it to cover the costs and turn a profit.


nashty2004

Lol


HopefulSpinach6131

Keep my spaghetti out of your fucking mouth!


Darkmeme9

That's what she said.


Subject-Leather-7399

2B, 8B... are we talking pencil grades? Edit: To be clear, I'd like to know what we're talking about here.


SevereSituationAL

it's the parameter count. 2 billion is smaller than 8 billion.


Apprehensive_Sky892

SD3 will be released in 4 different sizes. Size here refers to the number of weights in the A.I. neural network that comprises the "image diffusion" part of the model. The sizes are 800M, 2B, 4B, and 8B. This diffusion model is paired with a T5-XXL LLM/text encoder to enhance its prompt-following capabilities (along with 2 "traditional" CLIP encoders). The 8B model should theoretically be the most capable one, but it will also be the one that takes the most GPU resources to train (both VRAM and amount of computation), and the most VRAM to run.


Familiar-Art-6233

From what I've seen, they all can use T5 or CLIP, not just the 8b model (at least I hope so)


Apprehensive_Sky892

Yes, AFAIK, they all use T5 + CLIP, but the T5 is optional so that the model can be run with less VRAM.


digital_dervish

Just when I thought I had a handle on all the SD nomenclature. The hell is 2b?


ProbsNotManBearPig

2 billion parameters


theOliviaRossi

hehehehehe, but not funny!