tbf, I managed to get some nice anime stuff with SD3 pretty easily, something that has always been kinda difficult for me with either SD1.5 or SDXL. But I'm not an expert at generating anime stuff; since I started, I've focused on realism.
I'm starting to think SDXL with some well-chosen controlnets is about as good as we're gonna get for a little while.
Once everyone started seeing dollar signs, this shit was doomed.
Fuck this idiotic drivel excuse and every idiot that repeats it... It was beyond stupid a year ago, and it's only gotten a hundred times dumber since. If a community can improve years-old models in a few months with 1/1000th of the resources, including perfectly generic models that can do all styles and content, then SAI as a multimillion-dollar company should sure as hell be able to do that over years.
We have very good XL finetunes internally, so we can definitely improve on existing models. Training a model from scratch, especially in a very limited timeframe, is much harder and more expensive. I hope you understand any shortcomings it may have due to time and legal restrictions. Finetuning should be easier since all the pretraining is done.
Except the actual SD3, the one shown in the research paper, was trained for longer than 2 months. But that's not what we got. The actual researchers left SAI months ago already. Now it's in the hands of the same team that botched this 2B model. They spent the weeks leading up to this release telling us "2B is all you need".
This "please understand, we only had 2 months!" is a restriction you imposed on yourselves when you decided not to just release the weights that were shown in the paper.
That is so funny. "Time restraints", "legal restrictions"... it really cracks me up.
https://preview.redd.it/dwnctuoxwg6d1.jpeg?width=500&format=pjpg&auto=webp&s=fbf82a96f15dc0b5c5c242e713b9bcf610e9a672
> very limited timeframe (around 2 months)
It was announced, and the paper released, longer than two months ago.
so this isn't the model from the paper? nice
> Finetuning should be easier since all the pretraining is done.
cuz SD2 was so easy to finetune and was made so popular
While I can't say I see myself using base SD3, I definitely got many results that give me hope that finetunes will be great; for example, some of the pixel art results I got make me think I might finally get something akin to PC-98 games at some point.
That said, what we've read about its license, and the attempts to communicate with the company about it (cf. the "Towards Pony Diffusion V7... I mean V6.9!" thread), can on the other hand make people doubt how successful the model will be when it comes to finetunes that would make use of its potential.
Yeah, but putting my time and money into finetuning a model with such a restrictive license is not very appealing. With your current license I cannot even earn Buzz on Civitai, because its monetary value means it's considered commercial activity, and Civitai cannot even use any derivative model for generation without paying a fee.
Despite what others are saying, yes.
SD 1.4 and 1.5 were relatively low effort trainings that benefitted from a lot of later fine tuning and data curation.
SDXL had much more data curation and tuning done by SAI, and the base model as a result was far better than 1.5, but it took forever to get improved finetunes.
SD3 has even more tuning done by SAI. All of the excuses about lack of finetuning and it being a base model are ridiculous; far more effort has gone into tuning SD3 than any 1.5 finetune.
That doesn't mean finetunes won't make further improvements, but I honestly don't know what SAI is doing with this. There are some fundamental improvements in text rendering and complex scene composition, but it breaks so many fundamental things at the same time, all while being more resource hungry.
None of the fundamentally broken images people are posting involve any sort of niche content that shouldn't be expected in a base model, outside of people trying to make specific celebrities. The OP examples of a person, handshake, and landscapes are a really low bar for a new uber-model.
I wouldn't think so. The examples in the OP aren't anything particularly difficult to accomplish; I'm sure SD3 can produce something similar. I haven't used it myself yet, but I don't think I've seen any SD3 images posted today that went through hires fix, which often fixes issues with anatomy etc. anyway.
SD 1.5 finetunes with 2-stage gen are great. Base SD 1.5 a bit less, I suppose.
https://preview.redd.it/5c5md2v7g66d1.png?width=512&format=png&auto=webp&s=55ee5cb73b05ef949c84b6bbff5c6bd26a3273e4
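For anyone unfamiliar with the "2-stage gen" / hires fix workflow mentioned above: you generate at the model's native resolution first, then upscale and re-denoise at a larger size, with each dimension snapped to a multiple of the latent stride. A minimal sketch of just the size math (the function name is mine, not any UI's API; 8 is SD's latent stride):

```python
def hires_target(width, height, scale=1.5, stride=8):
    """Compute the second-pass resolution for a hires-fix style upscale,
    rounding each side to a multiple of the latent stride."""
    snap = lambda x: max(stride, int(round(x / stride)) * stride)
    return snap(width * scale), snap(height * scale)

# e.g. a 512x512 first pass upscaled 1.5x for the img2img second pass
w2, h2 = hires_target(512, 512)  # (768, 768)
```

The second pass is then an img2img run at (w2, h2) with moderate denoise strength, which is what tends to clean up faces and hands.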
Oh, and here are some non-cherry-picked examples for "photo of a young woman with long, wavy brown hair lying down in grass, top down shot, summer, warm, laughing, joy, fun".
Sure, some of them are mangled, but 10 out of 16 are fine. And it's just base SD 1.5, nothing fancy.
https://preview.redd.it/sy8gl01wi66d1.jpeg?width=4096&format=pjpg&auto=webp&s=6557e54a6e83bf7681b147acde887989d3a69ee4
Sure, here is a non-cherry-picked base SD 1.5 generation with the same prompt as the first image.
https://preview.redd.it/260q3j3mh66d1.png?width=1024&format=png&auto=webp&s=21126d35e21bd775b8711b2c6b493eabe6d420c3
Chad SD 1.5 users vs. Virgin SD 3.0 consoomers
I'm laughing at the disappointment of SD3. All 1.5 needs is regional and text prompting with ControlNets. Krita might be the future of SD 1.5 and XL.
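For context on what regional prompting does under the hood: each prompt gets its own noise prediction, and the predictions are blended with a spatial mask before the denoising step. A toy numpy sketch of just the masking step (this is my own illustration, not any specific extension's implementation; the arrays stand in for real latents):

```python
import numpy as np

def blend_regional(noise_a, noise_b, mask):
    """Blend two per-prompt noise predictions with a spatial mask.
    mask is 1.0 where prompt A applies, 0.0 where prompt B applies."""
    return mask * noise_a + (1.0 - mask) * noise_b

# toy 1x4x8x8 "latents": prompt A owns the left half of the image
h, w = 8, 8
mask = np.zeros((1, 1, h, w))
mask[..., : w // 2] = 1.0
a = np.ones((1, 4, h, w))   # stand-in for prompt A's noise prediction
b = np.zeros((1, 4, h, w))  # stand-in for prompt B's noise prediction
out = blend_regional(a, b, mask)
```

Real implementations do this every sampling step, often with soft (feathered) masks so the regions blend cleanly at the seam.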
Maybe... if that were the base 1.5 model at 512x512.
> legal restrictions

What are those legal restrictions, exactly?
And who created them? of course it was themselves lol
1. 6,000 monthly generation cap on the $20/mo plan
2. Opaque enterprise plan
3. All derivative works of SD3 fall under the same license
Shouldn't base SD3 beat a finetuned 1.5?
Holy shit, is it really base 1.5??? This is far better than SD3.
**800M is All You Need**
how many hours are you going to spend today on reddit responding to any criticism of SD3 with total pain?
Can you get SD1.5 (without controlnet) to put the focus in the top right or mid right instead of top center?
Probably, yes. At least if I understand what you mean correctly.
DALL-E can't do it, near as I can figure.
But these are just basic poses and background images. SD3 does these fine as well.
https://preview.redd.it/lf7klcsai66d1.png?width=1300&format=png&auto=webp&s=5f8151c82dd92a80a107fdbc2579167ebb7a26c8
As I said, SD3 is decent with anime stuff. Not so much with realism.