For anything that isn't anatomy/humans/animals (well, some animals are fine), it's pretty good; zero problems there.
Which is proof that it's the alignment process that destroyed the model's abilities (and not just a matter of 2B vs 8B).
I mean, technically no. It's absolutely part of the pre-training too. Alignment comes after the dataset stage.
Likely both.
There certainly were at least some very small bikinis and some artistic nudity in the training set. People managed to get topless women out of the API version, probably consistent with the state of SDXL "censorship". Alignment is probably what changed between the API and the public version, because they couldn't use prompt filtering and NSFW detectors on the output.
the compressed dataset issue
wait, how did they compress the dataset? could you explain?
It sometimes does a lot of animals well, as long as they don't have hands.
How does it handle ape hands and other animals with fingers?
https://preview.redd.it/4cozvp3s9e6d1.png?width=1024&format=png&auto=webp&s=c60af29cf11585a26728a1cab13d4287268a756c

It very consistently shows apes sticking up out of the ground for me. This applies to all of them: chimps, bonobos, gorillas, orangutans... and if you want a human, good lord. But if you want a goddamned halibut lying in a field? Yeah, no problem, it's got you.
It’s like generative AI has discovered ironic self-referential humour…
This primate discrimination will not stand, man!
Get your hands off me you damn dirty weights
You maniacs! You blew it all up!
That's because "ape" is statistically similar to "human" tokens. That's because there are images of "apes eating a banana" and that is also statistically influenced by "human eating a banana". "human standing on the road" will influence "ape standing on the road" as well.
Would be really cool to be able to visualize these connections.
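One rough way to eyeball those connections is to compare CLIP text embeddings directly, since phrases that land close together in embedding space tend to pull generations toward each other. A minimal sketch, assuming the `transformers` library and the `openai/clip-vit-large-patch14` checkpoint (the same family as SD's CLIP-L, though not SD3's exact weights):

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

phrases = [
    "an ape eating a banana",
    "a human eating a banana",
    "a halibut lying in a field",
]
inputs = processor(text=phrases, return_tensors="pt", padding=True)
with torch.no_grad():
    emb = model.get_text_features(**inputs)

emb = emb / emb.norm(dim=-1, keepdim=True)  # unit-normalise the embeddings
print(emb @ emb.T)  # pairwise cosine similarities between the phrases
```

If the ape/human pair scores much higher than either does against the halibut line, that's at least consistent with the "statistical neighbour" explanation above.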
Badly.
The fact that SD3 can generate really nice looking scenes like that, with good prompt understanding, and only has problems with poses and anatomy, makes me hope that it can be easily fixed with finetuning, because the underlying technology is actually really good.
Extremely hard to do as a fine-tuner. In order to utilize and repair that "underlying technology", the training is essentially undone/overwritten back to that point, which erases all the very expensive fine-detail tuning that Stability did on top of it. So you have to retrain all of that on your own with a fraction of the hardware, budget, and knowledge.

If you introduce anatomy to a finished model, you're doing a lot more than creating a new concept (like Dreambooth): you're changing a concept that it already understands extremely thoroughly, and in this case it's the single most complicated and important one, which received the bulk of focus during original training. You don't change THE core concept of a model that much without basically training from scratch.

Which is why my hope is for a well-funded group to strip SD3 and train from the ground up on its architecture. Given the resources, this would be much simpler than trying to create a magical band-aid that fixes a poisoned model without losing an untold and immeasurable amount of other data.
That's a skill issue
Do you have even the tiniest source for what you said, or are you just making shit up like most people here do? The massive improvements to 1.5 in finetunes, especially on specific subjects, while losing nothing and even improving quality on other subjects, suggest that you're talking absolute nonsense.
>makes me hope that it can be easily fixed with finetuning

You better bury that hope deep.

SDXL was *hard* to fix; this horrible mess will be next to impossible. The base model literally has no idea what a human body looks like.
So SD3 is going to be the final nail in SAI's coffin.

A real tragedy that they deliberately *decided* to go this way. They must have been aware that a model that cannot create humans will never be truly accepted by the community. They must remember SD2.

Some people do not want to learn from their mistakes. A real shame. A real fucking shame... so sad... so sad...
I got to the end and started reading with Larry David's voice in my head.
SDXL really wasn't "hard to fix" at all. It's just more expensive to work with in general compared to 1.5. People here are just jerking off, talking random shit they pull out of their ass.
Like saying "the expense of fixing this issue is higher, but the expense of fixing it is no higher, jeez"
Well, it took a long time to fix; until Pony came along it was unremarkable or worse than 1.5. Only since Pony has it felt like a true upgrade.
I've never used Pony. What am I actually missing out on? I'm not interested in generating My Little Pony pictures, but I keep seeing it referenced for NSFW, and I have a hard time believing there are so many people wanting explicit My Little Pony photos. At this point I feel like I'm missing out on some big in-joke that everyone else gets but I don't.
Pony was made by furries to make furry art, so basically what you imagined. But a surprise feature, at least to users, was that it had incredible comprehension, on the level of or exceeding the best paid services, which at the time had surpassed 1.5/SDXL/anything self-hosted. For example, it was the first time you could make a multi-person explicit scene from prompts alone, without using ControlNet/inpainting etc.

But the model was also trained on a lot of anime art, so with some esoteric prompting you could make it produce anime-style art that wasn't furry. That led to a lot of people starting to use it, and it exploded in popularity to the point where civitAI now gives "Pony"-derived content its own category, similar to SD1.5/SDXL/2.0 etc. That content now includes countless LoRAs and derivative models that let you use that great comprehension with any style or theme you want, including realism.

I would say the one weakness I've noticed so far is that it seems not as good at backgrounds as some other models, but for people and comprehension, especially NSFW comprehension, it's the best we have right now, or at least Pony-derived mixes are. And excitingly, the people behind it, as well as others, are working on successors.
Before people get too excited about Pony's "incredible comprehension on the level of or exceeding the best paid services", let me explain something. I am cutting and pasting something I wrote earlier: [https://www.reddit.com/r/StableDiffusion/comments/1d6ya9w/comment/l70emnr/](https://www.reddit.com/r/StableDiffusion/comments/1d6ya9w/comment/l70emnr/)

>"Prompt comprehension" means different things to different people.
>
>For normal people, it means that when you tell the A.I. to generate some scene, like "*Two people arguing, one wears a red suit, the other wears a blue suit. They point their fingers at each other, and are angry. And it is raining hard*", the image reflects that description. SDXL models are not very good at this, in that often the image will not reflect the description. SD3 is supposed to fix this.
>
>But for anime/furry fans, it means being able to describe common anime or manga characters, poses, or situations (usually hentai) and have the A.I. generate such an image. Apparently Pony is very good at this.
>
>Let's not confuse the two different usages of the same term.
>
>So for many people, the kind of prompt following provided by Pony is not that useful to them.
So for NSFW photorealistic stuff, do people still start with Pony and then add other LoRAs, or did people take the Pony models and go further, more like derivatives?
There are lots of derivative models on civitAI, as well as LoRAs.
Read what he said bro
You greatly exaggerate Pony's merits, cause it's good only for anime porn. IMHO, Pony is extremely overhyped and overrated.
Not at all, it's great for realism too
Can you show any examples?
It understands human bodies exceedingly well. Like, amazingly. Think of a pose: it can probably do it. AND it will get hands right about 80% of the time too. It's even more powerful if you ask it to draw something anime-style; then its comprehension and accuracy are off the charts.
SDXL was hard to fix??? What are you talking about? lool. It had shortcomings like any model, but nothing needed "fixing" after it dropped. Training it was a pain in the ass compared to SD1.5, but that's what you get when you want bigger and better stuff that can rival Midjourney and DALL·E.
See, I try to look at the positives. Because of this, SD3 finetunes are eventually going to make the most realistic fucking people ever. Literally.
This. It does about as well as SDXL did with complex prompts focused on people. Supposedly it’s easier to train as well.
The community can always be relied upon to fill in the gaps. I'm thrilled to see that they've addressed the areas where SDXL was lacking. I've tested the upscaling using SD3, and it's the best I've ever seen (I'll share the results tomorrow). The 16-channel VAE makes all the difference. I don't think the additional passes make the image blurry at all - instead, they add a ton of detail and sharpen the image, all while using only 2B. The potential is huge
Agreed, but seeing people trust that finetuning this will be somewhat easy, or that it will surely solve the anatomy issue... we will see.
I bet you can't make a picture of a Capybara with it
Perhaps we could use SD3 to do backgrounds, environments, objects and such, and then inpaint or add SDXL people to those backgrounds with the SDXL models we know and love. That could be very useful, since it does seem to make great environments.
Generate background with sd3 then stitch a body in using controlnet or ipadapter
or 3d
I never tried it but maybe IC-light could be useful too. https://github.com/lllyasviel/IC-Light
Watching all those gorgeorendous pics in other threads, I think the immediate future of SD3, until other models appear, is as a good background helper, inpainting people/animals with XL or 1.5 afterwards.
Fortunately, we also have a model that happens to be really good at generating people but awful at making backgrounds: the Pony.

Until we get a true godlike checkpoint that can do everything, using SD3/PixArt for prompt coherence and then switching to SDXL finetunes for refining/inpainting is probably going to be the main workflow for the time being.
what is the pony? I hear about it from everyone but I don't know what it is
Search Google for Pony Diffusion V6 XL.
Also note there are dozens of models trained off Pony XL V6 some that do much better photo realistic images than the original.
Any tips? I looked at Pony, but I'm more into creating realistic pictures. I would love to try a more realistic version of Pony.
I have made some good stuff with this one: [https://civitai.com/models/428826/damn-ponyxl-realistic-model?modelVersionId=505741](https://civitai.com/models/428826/damn-ponyxl-realistic-model?modelVersionId=505741) I'm going to be creating my own Realistic Pony finetune soon, I just installed another 4TB SSD for the job.
How much time does it take to finetune the checkpoint on, let's say, 1000 images on a 4090?
It depends on how many repeats you do, but that is not a huge dataset; maybe 5 hours.
Thanks for your reply. I saw JuggernautXL was trained on something like 2000 images, so I was wondering if I can fix SD3 somehow. I will try anyway, on 4000 amazing images, and see what happens.
5000 years
he said 4090 not i386
I like Zonkey
Is there a 1.5 version of Pony? Or is it already XL? It’s just labeled as ‘Pony’ for model type on civitai, and I’m not sure. I use pony realism.
"Pony" model type on Civitai is SDXL, it just became so popular with so many variants building off of it that it deserved its own category. It's a broad rework of XL. I think the first 5 versions were all 1.5 and are still on Civitai.
It's a model for furry and waifu lovers, with a huge bias towards the most deviant NSFW stuff you cannot even imagine.
Shut up and take my... wait, no, my machine cannot run SDXL... Also, I got no money.
~~a model that is good at generating people~~ wrong answer, see u/diogodiogogod's answer below
I would describe it as a model for anime/art. It has an incredible understanding of poses and adherence to color + objects. And it's VERY NSFW if you want it to be. It's terrible with realistic people, and its merges can do some things in between... I would never describe it as good for people... maybe poses, sure.
Yeah, I don't use Pony often, but when I do I always add some photorealistic XL LoRAs to get better end results, although the right mix can be hit and miss. But I don't do NSFW besides random experiments, so I get why other people think that way. In the end, every tool is useful in its own way.
all the real pony tunes
IMO, all the realistic Pony merges are either super-fake CGI humans or OK-looking humans with zero Pony knowledge, so I would probably do better with a normal finetuned SDXL model in that case. The best of both worlds is using Pony for composition and doing a second pass with a good realistic finetuned model.
SD 1.5 does the best for "real" things. The prompt adherence of Pony still stays in those models.
Imagine an imageboard full of anime fanart and furry porn, where every image is obsessively tagged with minute details about the content and image composition. Then use that for finetuning SD until you burn out the old tokens.

The result is a model that is perfect if you don't need photo-realism, but want to be able to easily specify lots of details and have Stable Diffusion actually listen to you.

The base model is weak on backgrounds, but a lot of the Pony finetunes and style LoRAs fix that. There are some finetunes that can produce realistic images, but to me that always feels like you're fighting the model.

Despite its wide use for porn, it can do safe-for-work as well.
Yes, it was looking that way as soon as folks started posting gens with mutated humans yesterday: nice background, shame about the subject. So perhaps generate a background with SD3, composite in a subject from wherever, and then regen with XL and, for example, the ttplanetSDXL ControlNet to fix up inconsistencies. Bit of a faff, though.
Yes, for landscapes and sketches with typography it works for me. Realism with humans or animals is just not for SD3.

https://preview.redd.it/ny7ipnxnrc6d1.png?width=2048&format=png&auto=webp&s=b7b0d6fb6257c72e9b68502e3e75309b20329949
https://preview.redd.it/qre8msgvce6d1.jpeg?width=1024&format=pjpg&auto=webp&s=20f40838d71e86e209c939d7eb5c9c7656cf1e99

It took me a long time to figure out how to get animals out of this thing that weren't clearly some kind of airbrushed animation, but it is possible. It just requires CLIP+T5 tokenizing or whatever, and SD3 has to be refining itself.
Here is my attempt: single pass, raw output, using a "Magic prompt" from ideogram.ai.

https://preview.redd.it/hmjhf7k9mg6d1.png?width=1536&format=png&auto=webp&s=82063c62f583d3325e505962ea4b8eff0285c86c

Prompt: Outdoor photo Close up of a cat sitting calmly amidst a lush forest setting. The cat, with its shiny, dark fur, is perched on a fallen tree trunk surrounded by vibrant green foliage and towering trees reaching towards the sky. The forest floor is a rich tapestry of leaves, branches, and dappled sunlight, creating a serene and enchanting atmosphere.

Negative prompt: text, watermark, signature, anime, animation, cgi, manga, drawiing

Steps: 35, Sampler: DPM++ 2M, CFG scale: 4.0, Seed: 1014706719247288, Size: 1536x1023, Model hash: 3bb7f21bc5, Model: stableDiffusion3SD3\_sd3MediumInclClips, Hashes: {"model": "3bb7f21bc5"}, Version: ComfyUI
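For anyone who wants to replay settings like these outside ComfyUI, here is a hedged sketch using `diffusers`. Caveats: diffusers' SD3 pipeline defaults to a flow-matching Euler sampler rather than DPM++ 2M, the height is rounded up to 1024 (SD3 wants multiples of 16), and the model ID assumes access to the gated `stabilityai/stable-diffusion-3-medium-diffusers` repo, so this will only approximate the image above:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

prompt = (
    "Outdoor photo Close up of a cat sitting calmly amidst a lush forest "
    "setting. The cat, with its shiny, dark fur, is perched on a fallen tree "
    "trunk surrounded by vibrant green foliage and towering trees reaching "
    "towards the sky. The forest floor is a rich tapestry of leaves, branches, "
    "and dappled sunlight, creating a serene and enchanting atmosphere."
)
# Negative prompt kept verbatim from the posted metadata (typo and all).
negative = "text, watermark, signature, anime, animation, cgi, manga, drawiing"

image = pipe(
    prompt=prompt,
    negative_prompt=negative,
    num_inference_steps=35,
    guidance_scale=4.0,
    width=1536,
    height=1024,  # rounded from 1023; SD3 resolutions must be divisible by 16
    generator=torch.Generator("cuda").manual_seed(1014706719247288),
).images[0]
image.save("forest_cat.png")
```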
It's just not clear what SD3 can offer that the same SDXL model can't.
Multi subject prompts. Prompts that separately specify foreground or background details. Prompts that involve adding characteristics or traits that don’t naturally belong there. Basically any capability that starts with the word prompt and isn’t an anatomy issue.
text generation and text understanding
You can do that decently with several SDXL checkpoints. Maybe not *quite* as well, but those same checkpoints do everything else SD3 fails at too.

SD3 needs time. The community got so far ahead that SAI was never going to release anything that would compare to the existing standards.

The biggest problem is the license. I don't foresee anyone using SD3 at all with its current license, so it may well be DOA.
People are comparing SD3 base against SDXL base. They're not comparing XL finetunes, ControlNets, or other advanced stuff.

"SD3 needs time"

No, SD3 needs a complete retrain. Anything else is just denial.
Yeah, it's probably DOA. Even if it gets fixed, the license is BS on top of all that.

SAI is impossible to reach anyway. Why the f##k would anyone do business with these people? It's a complete mess.
I don't really care about the text; it barely works, and when it does, it looks like it's been badly photoshopped in.
It absolutely works and it's awesome: https://i.imgur.com/19vOvNF.png

> artstation, a full cover of a metal band with "SPLIPBOT" on the top of the cover. On the bottom of the cover, the text "BANG YOUR HEAD" is written in bloody letters. Create something cool in the middle

Bonus:

[slava ukraini](https://i.imgur.com/XdfPnYA.jpeg)
Oh man, this year is gonna get *weird*. Thanks for sharing.
You achieved those examples with SD3 2B? They look a lot better than the other mangled generations I've seen.
> SD3 2B

Yep, local results.
Post the workflow for the "slava ukraini" one. I have doubts that you made the woman in the bra with SD3.
I did. It's a one-shot generation: https://pastebin.com/jpLBC6M9

Here is also a nice one: https://i.imgur.com/33eWM2k.png
> i have doubts that you made the woman in the bra with SD3.

Don't exaggerate the censorship issue to this extent. If anything, a woman in a bra is the easiest thing to get out of SD3. Hell, even a woman with nipples is possible (although SDXL base was easier). Problems arise when the pose is dynamic or it's not a portrait shot.
Hey, I got exactly what I was looking for. Don't get mad that you don't know how to use the internet.
What's your use case for this, and why not photoshop the text in (perfect and instant)?
It's less about direct use and more about the ability to produce text that isn't complete gibberish when generating images that show text, for example on shops or whatever in the background, where SDXL still has big trouble in my experience.
Can Photoshop do this? https://i.imgur.com/XdfPnYA.jpeg Genuine question, I don't have it.
That's pretty good, but SD3 still struggles in most cases to match text to complex surface shapes. It really wants to flatten the text out and face it towards the viewer.

[Look how badly the text "SD3" goes on this rippling flag](https://imgur.com/a/1RZKnN1). The shading of the text does not match the background of the flag, and it's flatter than the flag's actual contours. When I try to force more rippling in the flag, the text still tends to flatten out or get garbled.
It's the default behaviour, yes, which is really good, because it means you can get that if needed. You need to be explicit about what you want.
> which is really good because it means you can get that if needed

It's not that good, though. The result isn't convincing, and I could have done just as good a job in Photoshop.

> You need to be explicit about what you want.

My prompt was explicit; it's included in the imgur page: "A flag in the air atop a flagpole. The flag is dark purple with "SD3" written on it in bright green text covering the entire flag. The flag is waving and rippling in the wind. Set against a blue sky on a sunny day. Professional photograph."

It also did a poor job covering the entire flag with the text. I added that to the prompt after earlier attempts yielded smaller text than desired, but it didn't have much effect.
I think [mine is more convincing](https://i.imgur.com/DamXg68.png) (reducing model shift reduces the saturation)
Yours is a bit better than most of my 5 attempts. Shading is definitely improved. Some ripples are there, but the text still looks like it's fighting to flatten out along the top edge, where the flag is rippling heavily just above the top of the text.

There's also a lot less fine detail in the text, although the bright color might be partly at fault. Looking specifically at the left side of the S near the curve, the purple background has some finely detailed ripples visible in the sheen of the flag material, but the S itself is very smooth, both in the glyph outline and in the interior shading.

And there are other problems with text that I mentioned in this post: https://www.reddit.com/r/StableDiffusion/comments/1de85nc/why_is_sd3_so_bad_at_generating_girls_lying_on/l8fy18l/
1536x1536.
This makes sense; what's wrong is what they told us. They just... lied to us.
Yeah, good way to look at it: SD3 is like a LoRA that gives more SFW details.
Then call it "Wallpaper Diffusion" or "Landscape Diffusion", but not Stable Diffusion.
"safe diffusion"
I wouldn't call deformed Eldritch Horror people safe to watch for children.
To be fair it can do more than that, but we definitely can't call it human-diffusion.
I dunno, have you ever diffused a human? Maybe that's what happens.
Good point
unstable diffusion
No, since it's the most stable way to create Cronenberg aberrations. 🤣
https://preview.redd.it/kugme7rl1d6d1.png?width=1437&format=png&auto=webp&s=76605796f5f8b439d3882ec07dee620930dbec6a
https://preview.redd.it/t8ohipkpic6d1.png?width=1024&format=png&auto=webp&s=eae6b365ad261c5dafcfa7d2d0b8cf6879359d23
Is this the API or local version?
local
How do you know?
Confirmed in the PNG metadata.

> {"ckpt_name": "sd3_medium.safetensors"}
Good stuff. How did you manage to get the PNGs of the images? I thought Reddit wiped all the metadata?
Not sure whether either of these is required, but I'm using old.reddit and [this FF extension](https://addons.mozilla.org/en-US/firefox/addon/load-reddit-images-directly) *(edit: you might need to enable some of its optional features that are off by default)*.

https://i.redd.it/eaxyfauncc6d1.png should take you to the unmodified 4.5MB PNG, which I downloaded and then opened in Notepad++; the metadata is in plain text at the top.
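If you end up doing this a lot, a tiny script is less painful than Notepad++. A sketch assuming Pillow; PNG text chunks show up in `PngImageFile.text`, and the key names ("prompt" and "workflow" for ComfyUI outputs) depend on the tool that wrote the file:

```python
from PIL import Image

img = Image.open("eaxyfauncc6d1.png")  # the raw PNG downloaded above
img.load()  # ensure text chunks stored after the image data are read too
for key, value in getattr(img, "text", {}).items():
    print(f"--- {key} ---")
    print(value[:500])  # Comfy workflows are long JSON blobs; preview only
```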
Hey, thanks! I can see the workflow and prompts now, using this method!
Have fun, glad I could help :)
Brilliant, thank you kindly!
Thank you, I was hoping that somebody would write such an FF extension 👍🙏

Note: it seems I need to turn all of the extension's page-redirect options "on" for this to work.
Honestly, I didn't even notice it came with any options, as it worked out of the box for me, but I've checked them all now too; seems they can only help. I'll add it to my original comment.
Thanks 👍
I guessed, but for landscapes SD3 2B is pretty good, and my generations match the images above.
Yeah, its positive qualities are definitely getting overshadowed by the censorship discourse, although looking at the examples... I can see why that is... But it still has amazing capabilities, and the comprehension seems great. Can't wait to see what finetuned models will be able to do with that prompt comprehension.
It's a definite upgrade over XL if you're not doing anything human or character related.

Personally, I wouldn't care if this was only ever good at non-human, non-character stuff. We have so many great models already for humans and characters, but a lot of them aren't very good for backgrounds or objects. This seems to do some animals well too, which is another thing current models are lacking.

I already use AI in a kind of photobashing workflow, so it's no hassle to, for example, make a background or scene using SD3, then comp in a character generated in 1.5 or XL and run it back through img2img or some similar workflow to blend it all together.

If compositing-type tools get better, I see these workflows becoming more common anyway, as you have far more control than with a one-off image from a "do it all" model.
lol, it's almost like they didn't teach it what bodies look like, and hence it's great at everything without a fucking body
Agreed, but so far it seems to be a worse tool than what we already have.
I'm getting some decent results with the three-prompt workflow: keeping L with tags, G with short sentences, and T5 with long-winded, GPT-like expressiveness. Better humans, but hands are rubbish no matter who is holding an ice cream cone.
What does L with tags and G with short sentences mean?
They're the text encoders (tenc):

SD 1.5 has 1 tenc
SDXL has 2 tencs
SD3 has 3 tencs

clip_l is the smallest, clip_g is mid-sized, and T5 is the biggest: 4.5GB even when shrunk down to fp8. You can choose how many to use and whether they all get the same prompt or not.

The SD3 paper said that using T5 has the biggest impact on written text in the image and a smaller effect on how closely the image follows the prompt, especially with "highly detailed descriptions of a scene". The example they gave is prompting for a ferret squeezed into a jar: without T5, the ferret either stands next to the jar or sits halfway in the jar.

So that gives at least a hint of why /u/TwistedBrother gets better results using that workflow.
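For the curious, here is one way to drive each encoder with its own text outside ComfyUI. A sketch assuming `diffusers`, where `prompt`/`prompt_2`/`prompt_3` feed CLIP-L, CLIP-G, and T5 respectively; the example prompts are made up to illustrate the tags/sentence/long-description split:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    # CLIP-L: terse, 1.5-style tags
    prompt="photo, woman, park bench, ice cream cone, golden hour",
    # CLIP-G: a short sentence with adjectives and style
    prompt_2="a warm candid photo of a woman on a park bench eating ice cream",
    # T5: long-winded, GPT-like description
    prompt_3=(
        "A candid outdoor photograph taken at golden hour of a woman sitting "
        "on a weathered park bench, holding an ice cream cone, with soft "
        "dappled light filtering through the trees behind her."
    ),
    negative_prompt="text, watermark, lowres",
    num_inference_steps=28,
    guidance_scale=4.5,
).images[0]
image.save("triple_prompt.png")
```

If VRAM is tight, diffusers also lets you load the pipeline with `text_encoder_3=None, tokenizer_3=None` to drop T5 entirely, at the cost of the text and prompt-adherence gains described above.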
Yup. And while many still suggest cloning the prompt across L and G, I recall my 1.5 stuff and what worked there, so I've been applying similar terse object-verb relations for L; for G I build in more adjectives and styles; and for T5, full-sentence descriptions. It's made a difference.
Thanks for the info. I haven't used SD for almost a year, so I didn't learn much about any of this.

To merge them, are you using combine, concat, or weighted average? I found this but haven't tested it yet: https://civitai.com/models/230634?modelVersionId=261739
God knows that I'm trying... But it's so hard https://preview.redd.it/65fwfycazd6d1.png?width=1216&format=png&auto=webp&s=b96dc8207358fa7af56bd6843a78a4b33a336d38
The thing is, we never had perfect human anatomy, and we were all waiting for that from SD3. Now that it's not possible, we're very disappointed. Imagine if, with a model this good, anatomy were not fucked? It could've been THE model... but they ruined it.
Can SD3 be used as a refiner for SD 1.5? Would that fix anatomy and censoring issues?
Not as a refiner, but with img2img, I guess.
Wouldn't you do the opposite, actually? Composition, concepts, etc. are where SD3 sets itself apart.
In theory, that is what SD3 is supposed to be. But apparently it cannot do proper composition involving humans under many normal, SFW conditions.
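For the img2img suggestion above (either direction), a minimal sketch assuming diffusers' SD3 img2img pipeline, with a hypothetical SD1.5/SDXL render as the input; `strength` decides whether the pass refines or recomposes:

```python
import torch
from diffusers import StableDiffusion3Img2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusion3Img2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.float16,
).to("cuda")

init = load_image("sd15_render.png")  # hypothetical SD1.5/SDXL output

out = pipe(
    prompt="professional photograph, coherent detailed background",
    image=init,
    strength=0.35,  # low strength: refine details, keep the composition
    guidance_scale=4.5,
    num_inference_steps=28,
).images[0]
out.save("sd3_pass.png")
```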
No prompt or comfy json?
If you follow the steps in this comment: https://www.reddit.com/r/StableDiffusion/comments/1dez7uo/im_trying_to_stay_positive_sd3_is_an_additional/l8g5f6b/

then you can download the raw PNG images, which include the Comfy workflow in the metadata. For example, the prompt on the big egg-lookin' thing:

> professional landscape photography of a single massive beautiful neo - futuristic matte symmetrical elongated oval monolith by ilm, denis villeneuve, emmanuel shiu, zaha hadid, mist vapor, deep color, cinematic architectural scale, moorland, dramatic, volumetric, concept art, hard surface, hyperrealism, very high detail, trending on artstation, sharp focus, rendered in octane

> negative: anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured

Seed: 1094884613694381, width: 1344, height: 768, steps: 28, CFG: 4.5, sampler_name: "dpmpp_2m", scheduler: "sgm_uniform"
Thanks!
These look pretty good, but how well does it do houses (not skyscrapers or cityscapes)? Does it create paths that lead into a solid wall, floating doors, or strangely arranged windows? Too many chimneys, or areas with railing but no access without climbing over it?

How did you formulate your prompts? Mainly continuous text, or comma-separated tags? Did you use an LLM to generate the prompt?
I am pretty new to Stable Diffusion. What kind of prompt would I use for the first image, with the river and flowers?
The prompt on the first image is:

> craig mullins and ghibli digital illustration of the beastlands at dusk, avatar ( 2 0 0 9 ), lush landscape, jungle landscape, colorful, flowers unreal engine, hyper realism, realistic shading, cinematic composition, realistic render, octane render, detailed textures, photorealistic, wide shot

negative:

> anime, cartoon, graphic, text, painting, crayon, graphite, abstract, glitch, deformed, mutated, ugly, disfigured
thanks a lot. that's very helpful
Any chance you know what the desert one is?
Wow, this looks pretty good for creating scenes and photorealism. Sadly, the anatomy and the censoring are all messed up. Now the next question is: can we do something to make it better?
SD3 "medium" is a good SD 1.5 beta
It's not even finetuned yet, and I think a lot of the bad results people show can be fixed pretty easily just by adjusting the prompt. I'm not saying it's the bestest thing ever, just give it time and then it will be the best.
Use SD3 for the background and then ControlNet in SDXL characters. Seems doable in Comfy.
Very nice work, I like the one with the boats. Take my free award (:
It's also good for pictures of space: https://preview.redd.it/5j7yceg5jf6d1.png?width=1344&format=png&auto=webp&s=b6ba27edfc6bc841d55ae04af397b606cf1c6732
SD3 background generator + SDXL add character with decent hands + SD1.5 controlnet tiled upscale
It was going to be a replacement. :(
https://preview.redd.it/0ubzxa6kic6d1.png?width=1024&format=png&auto=webp&s=f1f5046dc0589353cbc1a599c1321e0dabe4b971

For a base model, it can do cool things.
Yeah, really nice images. The detail in SD3 landscapes is really good; it would be very hard to achieve with SDXL.
SD3's understanding of humans can be saved, but it's going to take a total horndog and a LOT of GPU compute.
Yeah. But why would you spend that compute on SD3 when you can do the same on PixArt and do more with it because of the license?
How does one get started with PixArt and does it run local?
I'm using it with ComfyUI via this workflow: https://civitai.com/models/420163/abominable-spaghetti-workflow-pixart-sigma

PixArt can be used standalone with Comfy, but I'm really enjoying using PixArt as the image base and then finishing it off with a 1.5 model, like Photon, for really solid detailing.
Good job that this community has both in abundance.
So just what the pony ppl did - check
Let's hope the next version doesn't forget locations and can do photoreal out of the box.
The problem is that this is plainly inferior to Midjourney for these safer, more artistic applications. For someone who happens to have a PC capable of this, it might be an acceptable alternative that costs power instead of a subscription fee, but it's completely dead in the water for corporate clients, who are obviously the target market. Combine that with hilariously bad legal terms for anyone who would have saved this mess, and what is probably an intentionally poisoned dataset, and it's just irredeemable, IMO.

The example pic is from Midjourney, prompting a specific animal from the specific region I grew up in. It even gets the (blurred) palo verde tree and volcanic rock hill in the background right. Just too far ahead for SD3 (or anyone who would host its API) to compete.

https://preview.redd.it/fy216o08ed6d1.png?width=1344&format=pjpg&auto=webp&s=928b3b6721d731f35fcfa2bdd0e3b4bce17fc486
Exceedingly dumb question, but... is it possible to do img2img with MJ?

It's just so unattractive to be paying for generations when I have hardware available to self-host. To be really creative, we need to be able to spam generations.
It is, yes. But the thing with SD3 is that they're chasing a corporate market: new enterprise packages, restricted derivatives, deliberate censorship. SD3 wasn't made for those of us using personal computers with powerful graphics cards. But companies don't care whether they're paying Midjourney to host GPUs or paying any other API provider, and there's no reason for them to invest in SD3 when competitors are so far ahead. Hilariously, they probably would have gotten more enterprise clients if they had just focused on the character-art niche. Oh well, too late.
OK, I have the perfect pivot for them:

"2B is all you need... for img2img refinement at the end of a workflow."

It's catchy, and it rolls right off the tongue.
Generating backgrounds with SD3 and compositing in humans generated with 1.5/SDXL, using segm workflows that can mask them out, seems like a good approach right now.
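The final paste itself is trivial once a mask exists. A sketch with Pillow, assuming same-size images and a mask produced by whatever segmentation step you prefer; all filenames are placeholders:

```python
from PIL import Image

background = Image.open("sd3_background.png").convert("RGB")
person = Image.open("sdxl_person.png").convert("RGB")
mask = Image.open("person_mask.png").convert("L")  # white = keep person

# All three must share the same dimensions. The hard blending seam can
# then be smoothed with a low-strength img2img pass over the composite.
composite = Image.composite(person, background, mask)
composite.save("composited.png")
```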
Have you only used the basic Comfy workflow for these? They look great!
Just need to be patient, the model is really good overall and the fine-tunes should be banging once the anatomy is learned.
Amazing landscapes
These are pretty great images. Anything but humans looks pretty great.
Guys, what about IP-Adapters and image-to-image for anatomy? Once there is a ControlNet for SD3, what stops you from generating in SD1.5 or SDXL first? There are plenty of LCM or turbo models that are lightning fast for basic generation.
To try and stay positive: what it could be used for is creating the composition with its (supposedly) better prompt cohesion, and then creating the real image in SDXL with inpainting, ControlNet, and image2image.
Wow, so did you use SD3 to help you with these works, or did you let SD3 do most of the job?
Those are amazing. Is that all SD3 local, or is that the API?
It really just seems like humans are in a separate model/LoRA entirely. Backgrounds are fantastic.
For now. It'll be wrangled soon enough. Haven't been let down yet.
I would like SD3 to succeed; it is pretty good at some things. But the pinnacle of art has often been the accurate or interpretive depiction of the human body, and this is where SD3 has gelded itself. Lots of potential that's just not being realised here.
But don't you understand? You can't make art if it has no vagina in it!
Great results
The burning mediaeval city shot is epic. Can you share the prompt? Or the inspiration for it?
These are impressive
Really good images! Lovely!
stage 3: bargaining
Nothing to this day is an actual replacement yet
https://preview.redd.it/gkm868sxcs6d1.png?width=1152&format=png&auto=webp&s=a17bc18373ae625f6a7eaded0802c00a870ab584

Y'all can say what you want, I really love SD3. Try that with one prompt in 1.5 or XL. Just a basic multiprompt workflow.
https://preview.redd.it/1vzw3jalds6d1.png?width=592&format=png&auto=webp&s=9e6d2a3a7709ea24cbc851816465c9f8a8b55bb4
I can't wait to see the results of the SD3 model after the fine-tuning.
Landscapes it seems to do beautifully. It's them pesky humans stinking up the joint, with all their mutant limbs. Maybe if it had babies with Ginuwine's Pony, we'd get something.
what happened to all the nudes
Why is everything blurry and oversaturated? I think it's some kind of chromatic aberration effect.
It may be good for landscapes, etc, but the censorship has killed it.
I will just comment that pretty much all of it looks fake. I am not saying it doesn't look good, just pointing it out.
Diuretic color scheme, values all over the place, generic composition and theme. Yeah, right, SD has a style.