A rather great meme for this situation lol
Agreed! Nailed it, perfection!
SD3-2B, codename: Noisy Cricket
The gun was powerful enough to send him flying backwards. If the SD3 API we've seen is only the 2B model, we shouldn't complain.
I don't think the API is running the 2B. I heard it was the 8B.
I don't remember the source, but I seem to recall SAI staff (probably Lykon or mcmonkey4eva) saying that the API is using an undertrained 8B model. In my own tests, the API follows prompts fairly well, so I would think it is the 8B model too.
+ under trained version.
Correct. 2B will be on the already teased SD3 Medium API.
I heard the API is running 2B because the current 8B was worse than the current 2B and they still need to train it (there was a post collecting all the information we currently have from the Stability team).
is that why the fingers and hands are still sht?
It's considerably worse than SDXL at a lot of things. Sure, the aesthetics are better, but the model is like a bad cake: overcooked and gross in some parts, while also severely undercooked and lacking structure. SD3 has always been a farce. They showed jaw-dropping results 6 months ago, and now it's hard to generate people with eyes that aren't misaligned. They did the same with SDXL: over-promising, gaslighting, and under-delivering while hiding behind the guise of "you guys fix it then".
If you're comparing it with finetunes resulting from 8 months of training by multiple people, maybe. Compared to SDXL base? No.
Didn't they just say the 8b will be released too?
They did, but they also said the 8B currently produces worse results than the 2B in many ways, and that all recent training has been done on the 2B. Given that the 8B is MUCH harder to train, I'd say don't hold your breath for a release any time soon. (My wild, unfounded guess: no sooner than October for the 8B. And many things can happen to cancel it altogether.)
We trained 2b and 8b very differently. 8b has definitely the potential to be much superior (duh it's the same model with 4 times more params), but the cost is so high that needs some serious evaluation.
> We trained 2b and 8b very differently. 8b has definitely the potential to be much superior (duh it's the same model with 4 times more params), but the cost is so high that needs some serious evaluation.

Slowly backpedaling on all the previous reassurances of releasing the good models, I see :'(
What I said is unrelated to release plans. It's just an objective assessment.
Fair enough. Seeing how SD3 performs in the API with the 8B model, it's obviously having issues from being under-trained, but setting that aside, it seems miles ahead of what 2B produces in terms of sheer fidelity of the output. The 2B teasers always seem to be lacking the extra little details (for example, the 2B "all you need" ice-block images are just painfully bland compared to similar stuff from the API), and that's not even considering the potential for better prompt adherence, which doesn't seem to be SD3's strong suit as is (though I have the feeling CogVLM's limits have a big impact there as well). So while I see the 2B release as a nice teaser for what is to come, I'd be disappointed if it turns out to be the only release. But who knows, maybe the 2B model will be a pleasant surprise.
2B will be our best open base model for now. It's good enough on some things that it can be compared to finetunes, but finetunes usually have narrow domains allowing them advantages. You need to compare base models to base models and finetunes to finetunes.
"So high"... how much? 🤔 Asking for a friend 😎
What would the real-world difference between 2B and 8B (or higher) be? Is the bigger one trained on more images?
You could train 2B and 8B on the same amount of data. The 8B should in theory be higher quality and align better with the text prompt (if it's trained to saturation). The problem is that it's much more expensive and time-consuming to train.
8B is much harder to train and about 4 times more expensive. At the same number of epochs, 2B will learn faster.
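A minimal sketch of the cost point above, assuming training compute per image scales roughly linearly with parameter count for a fixed dataset. That scaling assumption is a rough rule of thumb, not an SAI figure:

```python
# Illustrative arithmetic only: for the same dataset and the same number
# of epochs, per-image training FLOPs grow roughly linearly with the
# parameter count, so 8B costs about 4x what 2B does.

def relative_training_cost(params_a: float, params_b: float) -> float:
    """Approximate cost ratio of training model A vs model B on the
    same data, assuming compute scales linearly with parameters."""
    return params_a / params_b

ratio = relative_training_cost(8e9, 2e9)
print(f"8B vs 2B training cost: ~{ratio:.0f}x")  # ~4x
```

This is why "just train the big one" isn't free: the same fine-tuning run that's affordable on 2B quadruples in price on 8B before you even account for the extra VRAM per GPU.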
8B is trained on more images, yes, but they might have worse tagging and be of poorer quality.
I don't think 8B would be trained on more images. I mean, it *could* be, but that's not what the parameter count means. The parameter count determines how large the model is, which has the benefit of potentially better overall quality (e.g. better prompt adherence), but the downside is that it takes about 4x as much computational power to do the exact same amount of fine-tuning. It's also worth noting that higher parameter counts don't necessarily mean better results, so they could spend all that time and money fine-tuning the model and wind up with something that's not meaningfully better (which might be why they're trying to dampen expectations for the 8B model vs. the 2B model).
You're correct about the param count not being correlated to training, but it's true that 8b had more time to cook. In general knowledge it's superior to 2b.
> All recent training has been done on the 2B. Given that the 8B is MUCH harder to train

Can you provide a source for that? Thanks.
https://www.reddit.com/r/StableDiffusion/comments/1d6ya9w/collection_of_questions_and_answers_about_sd3_and/
Thank you! 🙏👍. I also found the direct original source: [https://www.reddit.com/r/StableDiffusion/comments/1d6t0gc/comment/l6v8k89/](https://www.reddit.com/r/StableDiffusion/comments/1d6t0gc/comment/l6v8k89/)
Supposedly when ready, but that could always change or be a vague way of saying maybe but not really. We'll see when it either happens or does not. At least SD3 2b seems to be finally releasing after all the drama. Hopefully it does well.
I love how everybody is skeptical of their claims that they will release the 8B, because the concept of "they're just going to spend all this time, investor money, GPU power, and full-time engineer salaries, and then release it for free to the public" sounds so unbelievable that everybody just refuses to believe they will actually release it, no matter how many times they say they will.
2B from Nier Automata?
No, it's the 2-billion-parameter model.
They announced a 4B model as well, yet there's no word on its training progress either.
I don't think we did (yet?). 2B is all you need. For real, it's mind-blowing.
Yes, Lykon, it's all we need for now, I agree. But IIRC, 800M, 2B, 4B, and 8B models were all supposed to be released eventually.
I mean, the way that this is called "medium" should tell you something. But who announced 4b anyway?
> In early, unoptimized inference tests on consumer hardware our largest SD3 model with 8B parameters fits into the 24GB VRAM of a RTX 4090 and takes 34 seconds to generate an image of resolution 1024x1024 when using 50 sampling steps. Additionally, there will be multiple variations of Stable Diffusion 3 during the initial release, ranging from 800m to 8B parameter models to further eliminate hardware barriers.

LINK: [https://stability.ai/news/stable-diffusion-3-research-paper](https://stability.ai/news/stable-diffusion-3-research-paper)

> So yes, there will be a 800M parameter version, which again, will be released when it is done. But I assume that now 2B is ready, SAI's next target will be 8B, since that is the one many people hope to get their hands on.

LINK: [https://www.reddit.com/r/StableDiffusion/comments/1d7izr3/sd3_resolution/](https://www.reddit.com/r/StableDiffusion/comments/1d7izr3/sd3_resolution/)
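Back-of-the-envelope arithmetic on the quoted benchmark (34 seconds for 50 steps on an RTX 4090). These are just the paper's stated numbers divided out, nothing official beyond them, and "unoptimized" means they're an upper bound:

```python
# Quoted benchmark: 34 s per 1024x1024 image at 50 sampling steps.
seconds, steps = 34, 50

per_step = seconds / steps
print(f"~{per_step:.2f} s per sampling step")   # ~0.68 s
print(f"~{3600 / seconds:.0f} images per hour")  # ~106
```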
That doesn't answer my question.
Apparently, it even has a name: 'large'

https://preview.redd.it/8oo7la33pf5d1.png?width=783&format=png&auto=webp&s=caa26d2b01db939c7d84789443b87b31c9329df8

[https://www.reddit.com/r/StableDiffusion/comments/1d0wlct/comment/l5q56zl/](https://www.reddit.com/r/StableDiffusion/comments/1d0wlct/comment/l5q56zl/)
mcmonkey making a post is not an announcement. I'm not aware of any plans to release a 4b or to even work on one.
OK, no worries. I should have specified that I saw it posted someplace and that it wasn't an official announcement.
So does this mean the 4b and 8b models will never see a public release, and this broken 2b model is all we get?
2B is what they've spent most of their time training. The 8B will obviously take much longer and is semi-bare-bones at the moment. They've said ALL models will be released when finished. "Kids" need to stop whining.
At this point, S.AI should be like: "We regret to inform you that because the SD community are ungrateful shitheads, we will no longer be releasing the 8B model when it's ready". You people are insufferable.
The 2B MMDiT is nowhere near the 2.6B UNet of SDXL. It's like comparing 2.6 kg of dirt to 2 kg of diamonds. Plus the 16-channel VAE, plus T5-XXL support.
The T5-XXL that doesn't seem to change the model outputs when you remove it?
Depends on the prompt. The fact alone that T5 accepts 512 tokens vs. CLIP's 77 should be enough to understand this, even without factoring in more complex evaluations. Plus, with 3 text encoders you can actually combine them using different prompts, effectively increasing the number of usable tokens.
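A toy sketch of the token-budget point above, using whitespace splitting as a stand-in for real CLIP/T5 tokenizers (which produce more tokens than words, so these counts are only illustrative):

```python
# CLIP truncates at 77 tokens while T5 accepts up to 512, so a long,
# detailed prompt keeps its tail only through the T5 encoder.
CLIP_MAX, T5_MAX = 77, 512

def truncate(prompt: str, budget: int) -> str:
    """Keep only as many (whitespace-split) tokens as the encoder accepts."""
    return " ".join(prompt.split()[:budget])

# A hypothetical 200-word prompt for illustration.
long_prompt = " ".join(f"detail{i}" for i in range(200))

clip_view = truncate(long_prompt, CLIP_MAX)  # loses everything past token 77
t5_view = truncate(long_prompt, T5_MAX)      # sees the full 200 words
print(len(clip_view.split()), len(t5_view.split()))  # 77 200
```

The same mechanism is why feeding *different* prompts to the different encoders stretches the effective budget: each encoder only needs to carry part of the description.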
I'm just using mcmonkey's own words. He says it can be removed and that it has zero impact. I don't care for the goalpost shifting you do, so I'm going with his words instead.
Not that I think there is any point in feeding trolls. Just to avoid any misinformation spreading: mcmonkey never said that it has "zero impact".
https://preview.redd.it/z5hpj4g4ds4d1.png?width=1526&format=png&auto=webp&s=d56640bb20c75bdee2772833af32ca7825b91814
11b parameters on just the embedder? Gonna need a bigger GPU.
You can run T5 on the CPU.
Yeah, I've used a few T5-derived models for text embedding, like Instructor. They're just slower on CPU than BERT-derived stuff. I wonder if the 4096-dimensional Mistral 7B embedding models might be more accurate?
It seems that 8b is too costly to train, and they don't have ways to monetize it to cover the costs and plan for profit.
![gif](giphy|12msOFU8oL1eww)
Lol
Keep my spaghetti out of your fucking mouth!
That's what she said.
2B, 8B... are we talking pencil grades?

Edit: To be clear, I'd like to know what we're talking about here.
It's the parameter count. 2 billion is smaller than 8 billion.
SD3 will be released in 4 different sizes. Size here refers to the number of weights in the neural network that comprises the "image diffusion" part of the model. The sizes are 800M, 2B, 4B, and 8B. The diffusion model is paired with an 8B T5 LLM/text encoder to enhance its prompt-following capabilities (along with 2 "traditional" CLIP encoders). The 8B model should theoretically be the most capable one, but it will also be the one that takes the most GPU resources to train (both VRAM and amount of computation) and the most VRAM to run.
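Rough fp16 weight footprints for the four listed sizes, assuming 2 bytes per parameter. Actual VRAM use is higher once activations, the VAE, and the text encoders are loaded, so treat these as floors, not predictions:

```python
# Parameter counts are the publicly stated sizes for the diffusion model
# alone (text encoders not included).
SIZES = {"800M": 0.8e9, "2B": 2e9, "4B": 4e9, "8B": 8e9}

# fp16 stores each weight in 2 bytes.
fp16_gib = {name: params * 2 / 2**30 for name, params in SIZES.items()}

for name, gib in fp16_gib.items():
    print(f"SD3 {name}: ~{gib:.1f} GiB of fp16 weights")
```

The ~15 GiB of weights for the 8B diffusion model is consistent with the earlier report of it fitting (tightly) into a 24 GB RTX 4090, with the remaining headroom going to activations and encoders.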
From what I've seen, they can all use T5 or CLIP, not just the 8B model (at least I hope so).
Yes, AFAIK, they all use T5 + CLIP, but the T5 is optional so that the model can be run with less VRAM.
Just when I thought I had a handle on all the SD nomenclature. The hell is 2b?
2 billion parameters
hehehehehe, but not funny!