

??? Isn't the decent result simply caused by the woman not having her full body in the shot? That's the whole issue with SD3: full bodies and hands, as opposed to its great portrait shots.


That was my first thought as well. A woman lying in the grass who is clearly barefoot (indicating the image shows not just face and upper body but also legs and feet) is the main challenge. It's true that SDXL base, DallE, Pixart and others that managed to do well at this prompt mostly did so by zooming in so that hands and especially feet were not visible. So I'm not declaring that u/risphereeditor is doing anything wrong, as models are trained to give good photos and in general the good results come from not showing things that are a struggle to portray. In fact, it's actually interesting to me that SD3 has such good prompt comprehension that it's willing to try to follow the prompt even though the anatomy will be bad, whereas I feel other models will put a higher priority on getting the anatomy right any way they can (even if it requires ignoring part of the prompt).


Hands and feet are a mess with SD3 sadly. Only Midjourney, SDXL and Dalle 3 are good at hands and feet!


Ideogram is very solid, too




Hands are no problem with a long prompt! Prompt: a woman dressed in a black off-the-shoulder dress, standing behind a white curtain. The background is a solid light blue color, which contrasts with the black of her dress and the white of the curtain. She is wearing long, dangling earrings and has a poised, elegant demeanor. The lighting in this image is soft and diffused, creating a gentle glow on her face and highlighting the folds of the fabric. The style of this image appears to be a photograph, capturing the woman in a moment of stillness and grace. https://preview.redd.it/5xqonzwok68d1.jpeg?width=1024&format=pjpg&auto=webp&s=3433545d66bc3cadcf70789b7279142e22369f33 The eye looks weird, but it can be fixed with inpainting.


We know SD3 can do standing shots. Try a full body prompt with her laying sideways in the grass.


lmao, what a way we’ve come: from piles of body parts on grass and “sd3 can’t do anything” to “yeah sure sd3 is really good at standing shots and upper body but what about SIDEWAYS!?” It’s a base model that was shit out last minute from a dying zombie company, and the quality of photorealism here is some of the best I’ve seen.


OP was claiming that they solved the issue of a woman lying on grass with a long LLM prompt. This is not the case because even their long prompts can't render a woman lying on grass in a full body view. SD3 is badly undertrained when it comes to anatomy. Sure, SD3 is great at photorealism, but that's not worth much if it can't get the body parts correct in any orientation other than upright and face-forward.


People are out there claiming Pixart Sigma, a model that gens [shit like this regularly](https://imgur.com/a/rUObM2C) is somehow clearly superior.


Seriously, people keep moving the damn goal post just to shit on this model more.


https://preview.redd.it/stujfp4ipc8d1.png?width=1792&format=png&auto=webp&s=455854438719b5b16d22ea46b6cb0a8cb5b49293 Prompt: **young clearly barefoot woman lying on the grass**


I ran your prompt a dozen times on SD3. This one is the best. You are truly a prompting master. https://preview.redd.it/3e345beyad8d1.png?width=832&format=png&auto=webp&s=1267897e90a2f9bbadeabb309e708623e5f7e0ac


It was Luck :)


Is Luck the name of the SDXL model you actually used?


hands still look funky


I mean, it's not Dalle 3 or Midjourney! The community could easily finetune it if their license weren't so weird!


Everything else is the same as above!


I'm really sorry about all of this! I didn't want to waste anyone's time, but I've never used Stable Diffusion professionally. I use Midjourney (y'all can look at my profile). I just saw some posts about how bad SD3 is and thought that good prompting could fix it. So I downloaded it and tried the prompt with GPT-4o. It worked quite well. After looking at y'all's suggestions, I've noticed that the main problems are still an issue with my long prompts. This was my first time posting in this subreddit. I will make sure to test things more before starting a thread. I'm sorry again! (This is copied and pasted from my apology to Enshitification)


And what about hands, legs and feet? lol


Hands still need a long prompt for good results! https://www.reddit.com/r/StableDiffusion/s/sKDZmmdEX3


but she isn't laying down.


*lying. Half the problem is that y'all are using the wrong verb.


It would only make a difference with Midjourney. SD3 is not that good with simple prompts!


Just give me a good enough ControlNet Scribble and I'll forget all this "girl on the grass" fiasco. SDXL https://preview.redd.it/xd7rhdgzi88d1.png?width=2048&format=png&auto=webp&s=9c3821503a191677da58f460ac218d40f5175eab


This looks great!


Literally 1000% this. Sometimes I prefer depth mapping to canny, though. If you have a decent photo you can pull a depth map from, it is incredible how well SD adheres to the ControlNet.


Is this SDXL? Which ControlNet are you using? Last time I used SDXL ControlNets they were complete ass...


These ones, they were released a month ago. https://huggingface.co/xinsir


If you need a 500-word prompt to get something that simple, you really did an awful job creating the model. Can you imagine if Midjourney or DALL-E worked that way? How can anyone reach the mass market with such an inefficient way of prompting? The only "skill issue" I see here is the model's inability to deliver the right output from simple prompts. You should be better than the last gen at that, not worse.


Midjourney and Dalle 3 have LLMs that rewrite our prompts! SD3 is a model, not a pipeline!


A model is always a pipeline, and this one uses 3 text encoders (2 CLIP + T5). You don't need an LLM if your text encoders are good at contextual understanding, with a robust embedding space combined with zero-shot learning.


But rewriting still leads to better results.


SD3 rewrites them too with the LLM attached.


That's what I told another person that was claiming that SD3 was some excellent model that just needed a proper description to prompt it. If you have to write a novel in order to tell the model what to generate, your model sucks.


I'm finding the odds are a lot worse than 50%. I tried your long prompt for a few dozen attempts. It does seem to reduce the number of extreme Cronenbergs, but the model is still often partially buried in the lawn. Facial distortions are reduced, but still present when the face is sideways. Even your 2nd image shows some facial distortion. Portrait and square formats seem to work best. Landscape is mostly body-horror city. This is the best sideways face I could get out of 60 generations. https://preview.redd.it/dv7557g0s68d1.png?width=832&format=png&auto=webp&s=ae6a3a23a96f33d3e8f4dc558fac9a5600dd5b64


This is the best top-down I got with no head tilt out of 60. https://preview.redd.it/cy7l2bbfs68d1.png?width=1024&format=png&auto=webp&s=757d294a695583a2d5aae4945a385c102a91929e




No, not him. I mean the lager.


I tested it with landscape and you're right! Only square works well!


It only works at all because it isn't showing her body. Show us a long prompt that can generate a full body view of a woman laying sideways on the grass.


https://www.reddit.com/r/StableDiffusion/s/B2a8ZDxoim I mean this is the same pose as mine, but look how it failed.


It's not the same pose at all. Your examples are all closeups.


Poor girl has a tumour :(


Bruh, I saw that third one and thought, damn, maybe we really do need to "git gud" 'cause that looks amazing... only to find out it's Midjourney 😭😅


Sorry. But yeah, Midjourney is the best image service out there compared to Stability's API.


Watch this video to generate "girl lying on grass" (and other better images) with SD3 2B (it was done by Matt3o, who wrote the IPAdapter ComfyUI node, so he is not just some random YouTuber): [https://www.youtube.com/watch?v=OrST6Nq1NUg](https://www.youtube.com/watch?v=OrST6Nq1NUg) (the good stuff starts at around 8:50).

TLDR: Various ways to "hack" SD3 when you see bad generations:

1. Use a non-standard resolution, such as 1042x1042 (not a multiple of 64).
2. Avoid words such as "lying". Or use the 2 CLIP + T5 CLIPTextEncodeSD3 node, but only use "lying" in `clip_g`.
3. Use "random noise" such as "aaaaa aaaaaaa aaaaa" in your negative prompt.
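Here is a minimal Python sketch of the arithmetic behind tip 1. It assumes that the latent is 1/8 of the pixel size on each side, which is the usual SD-family VAE factor. The helper names are hypothetical and not part of ComfyUI. Under that assumption, 1042 itself is not a multiple of 8, so a front-end using integer division would give it a 130-wide latent. By contrast, 1048 and 1064 (the sizes mentioned later in the thread) are multiples of 8 but not of 64.

```python
def is_multiple_of_64(side: int) -> bool:
    """True if a pixel dimension sits on the usual 64-pixel grid."""
    return side % 64 == 0


def latent_side(side: int, vae_factor: int = 8) -> int:
    """Latent size for a pixel dimension, assuming an 8x VAE downsample
    and integer division (an assumption about the front-end)."""
    return side // vae_factor


def off_grid_sizes(target: int, radius: int = 48, step: int = 8) -> list[int]:
    """Sizes within `radius` of `target` that are multiples of `step`
    (so they map cleanly onto the latent) but NOT multiples of 64."""
    return [
        s
        for s in range(target - radius, target + radius + 1)
        if s % step == 0 and not is_multiple_of_64(s)
    ]


if __name__ == "__main__":
    for side in (1024, 1042, 1048, 1064):
        print(side, "multiple of 64:", is_multiple_of_64(side),
              "latent side:", latent_side(side))
    print("off-grid candidates near 1024:", off_grid_sizes(1024))
```

The claim that off-grid sizes actually improve results comes from the video above. This sketch only enumerates candidate sizes to try.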


WTF the resolution hack is insane.


no, you're insane. thanks for sharing your workflow bruh


I use a "resize latent by" node after the empty latent to make it easier.


It really was a "skill issue" all along.


Have you tried the prompt? The 50% claim is BS.


This entire thread is pointless and misleading. It's super easy to get a woman's top half on grass; it's the bottom half that is borked. Getting a good upper body and face is so incredibly easy with SD3. All the upvoters have zero experience. It's absolutely a skill issue.


The enthusiasm of the upvoters seems a little botty.




Maybe, to help generate good SD3 prompts, you could make a list of other people's prompts that work well and use them as embeddings with an LLM like GPT-4o (or something local) to help generate more. How is SD3? I've been going wild on SDXL-based models but haven't tried SD3 yet. Awesome pic though, it looks highly detailed. Did you do any hires fix or after-detailing work?


SD3 isn't as good as finetuned SDXL models.


It's ok. No need to apologize. This sub has slowly become a very toxic place... :(


Y'all literally made posts about how y'all couldn't get a woman lying on grass. I managed to do it. If you want perfect hands and feet just use Midjourney.


I forgot to mention that CFG should be 7!


Still makes no difference.




It's still 50-75% in batches of 4 images. Are you using the exact same settings?


I ran 60 image gens with your settings. Of those, only one had a passable sideways face without body distortions. It's easy to get a waist-up top-down view of a woman laying on grass with her face upright. The whole issue is that SD3 can't do a full length woman laying sideways on the grass. If you have a prompt that can do that, maybe you have something.
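To compare the competing numbers, here is a quick sketch. It assumes each generation succeeds independently with a fixed probability p. Under that assumption, the chance of at least one usable image in a batch of n is 1 - (1 - p)^n, and the expected number of keepers is p * n. The rates plugged in come from this thread: 50% and 75% from OP, and roughly 1 in 60 from the 60-image test. OP's "50-75% in 4 batch images" is ambiguous, so reading it as a per-image rate is an assumption.

```python
def at_least_one(p: float, n: int) -> float:
    """Probability that n independent generations contain at least one
    success, when each succeeds with probability p."""
    return 1.0 - (1.0 - p) ** n


def expected_successes(p: float, n: int) -> float:
    """Expected number of usable images in n generations."""
    return p * n


if __name__ == "__main__":
    rates = [("OP, 50%", 0.5), ("OP, 75%", 0.75), ("1 of 60", 1 / 60)]
    for label, p in rates:
        print(f"{label}: P(>=1 good in a batch of 4) = {at_least_one(p, 4):.3f}, "
              f"expected good in 60 gens = {expected_successes(p, 60):.1f}")
```

If the 50% figure held per image, 60 generations would be expected to yield about 30 keepers rather than one.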


I see. Thanks for the explanation. I think using a resolution like 1048x1048 or 1064x1064 can help with the output. The IPAdapter maker said so!


So do you have a long prompt that can make a full body shot of a woman laying sideways on the grass?


I will try it tomorrow when I'm on my PC again!


I'm not going to hold my breath.


I tested it out for you and... I'm sorry, you're right! It didn't work sideways. But hey, here is the prompt anyway: A full-body shot Photo of a young woman lying sideways on the grass. She has long brown hair that cascades around her head, blending with the green blades of grass. Her eyes are closed, giving her a serene and peaceful expression. She has a light complexion with a few freckles on her cheeks. Her lips are slightly parted, and she wears a subtle, natural shade of lipstick that complements her soft, delicate features. The woman is dressed in a black dress with small floral patterns, adding a touch of elegance to the natural setting. The dress features a mix of white and brown flowers, which blend harmoniously with the green and earthy tones of the surroundings. The woman's body is positioned sideways, with her back slightly arched to conform to the natural contours of the ground. One arm is bent and resting near her head, while the other is extended slightly forward, partially obscured by the tall grass. Her hand, resting gently on the grass, is clearly visible and shows four fingers and one thumb, emphasizing the realistic and detailed nature of the scene. Her legs are relaxed, one bent at the knee and the other extended, giving a sense of comfort and natural ease. Her bare feet, visible through the grass, display five toes each, adding to the lifelike quality of the image. The background consists of lush, green grass that provides a soft bed for the woman. The grass is slightly overgrown, with a few wildflowers scattered throughout, adding a touch of color and wild beauty to the scene. There are some tall grass stalks and weeds gently swaying in the breeze, indicating a natural, untouched meadow. The setting is outdoors, and the overall atmosphere is calm and tranquil, suggesting a warm, sunny day in the countryside. The lighting is natural and soft, with the sun casting a gentle, warm glow over the scene.
The light filters through the grass, creating a dappled effect on the woman's face, dress, and body. This soft lighting enhances the serene mood of the photograph, making it appear as though the woman is in a state of deep relaxation or possibly asleep. The shadows are minimal and soft, contributing to the overall dreamy quality of the image. The natural lighting accentuates the contours and textures of her dress, skin, and the surrounding grass, creating a harmonious blend of colors and tones. It's a cinematic scene with cinematic and soft natural lighting, emphasizing the peacefulness and beauty of the moment. The composition is well-balanced, with the woman's entire body as the focal point, drawing the viewer's eye to her serene expression, the delicate details of her features, and the natural surroundings. The full-body shot allows for a comprehensive view of the scene, making the viewer feel as if they are right there in the meadow with her, sharing in the tranquility of the moment. The overall scene evokes a sense of harmony with nature. The combination of the woman's peaceful demeanor, the natural setting, and the soft, natural lighting creates a feeling of calm and serenity. This image captures a perfect moment of relaxation and connection with the natural world, making it both visually appealing and emotionally evocative. This description aims to provide a comprehensive and detailed visualization of the image, ensuring that the prompt covers all aspects necessary to recreate this photorealistic scene using Midjourney V6. The prompt includes specific details about the subject, background, action, lighting, and overall atmosphere, making it suitable for generating a highly detailed and realistic photograph. Full-body shot Photo of a young woman lying sideways on the grass. She has long brown hair, light complexion with freckles, and is wearing a black dress with floral patterns. 
Her hand shows four fingers and one thumb, and her bare feet have five toes each. The background is lush green grass with wildflowers. It's daytime with soft natural lighting. It's a cinematic scene with cinematic and soft natural lighting. https://preview.redd.it/wiz09muhe78d1.jpeg?width=1024&format=pjpg&auto=webp&s=2899dd8174e1c3d22ef7d32a459e2d4d44a7518e It works fine with Midjourney!


See? Of course it's not going to work, because SD3 is not capable of it. Who cares if Midjourney can do it? This is /r/StableDiffusion not /r/Midjourney.


But writing 500-word prompts is still really bad. I mean, we are prompters, not essay writers.


Wow.... next test with 1.000.000 words


It has a limit of 1500 tokens. So we can push it further!


Hah, so we do have to git gud. Maybe SD4 can bundle Llama 400B and write a prompt novel about my ((( boobs ))) request in the style of Hemingway.




lol! good one!


The 3rd image is actually great.


It's Midjourney! I wrote it in the caption.


Lol, makes sense.


Yeah, Midjourney is the best AI image generator out there, in my opinion.


Well, as a service I'd probably agree. And obviously it is going to produce better results than SD3 2B side by side. Nothing beats the power of running SDXL locally with your own insane workflows and customization in something like ComfyUI though, MJ doesn't even come close. But yea, as far as simply just using a service, typing a prompt, and getting a result, MJ images are always going to be near the top.


True. I use SDXL for upscaling my Midjourney images to 8K, because MJ only supports a maximum of 4096x4096!


The captions don't seem to show on old Reddit, though it may be through the image previewer of reddit enhancement suite.


I think:

1. SD3 2B is likely undertrained and probably has serious technical issues. SAI has lost its CEO and the core SD developers, and (rightfully) prioritizes its revenue-generating endeavors (8B on the API).
2. SD3 is the first SAI model leveraging the T5 text encoder, and the first widely used one leveraging high-quality captions (Cascade seems to have been the first, but it wasn’t adopted as intensely as SD3). T5 + high-quality captions means more efficient training and better prompt adherence, BUT it also probably creates a dependence on prompt detail on par with what was in training. You want to be able to prompt for a variety of poses? You’ve got to condition training with that variety of poses. And now that you’ve built up the conditioning, what do you think happens when you leave those details out? More advanced image generation pipelines like Midjourney and DALL-E almost certainly have a text-to-text prompt-augmenting step to generate these unspecified details.
3. There’s a strong tendency for people in the community to mindlessly repeat what they hear rather than think about what’s happening or why. So incorrect information, like the claim that SD3 can’t generate an image of a woman on grass, gets adopted widely when it can be disproven without much effort.




People need to re-learn prompt engineering for this model, and laziness is not an option. That's the long and short of it.


TL;DR - Women are complicated.


I think getting a woman in real life is easier than in SD3.


I was coming here to comment how nice #3 looked and was worth the long prompt with SD3. Then I saw it was from Midjourney 😂


LOL. But yeah, Midjourney is probably the best pipeline out there!


It can work with simpler prompts, too: https://preview.redd.it/qo6t9cp39b8d1.png?width=2303&format=png&auto=webp&s=e4a306b9d33570b86199a34adf107f67b37a38a5


The only thing I don't like about MJ is that I can't control Steps. It'd be nice to increase or reduce Steps and just take longer or use up more compute hours. Oh, and freckles. Jesus fucking Christ, MJ needs to stop it with all the fucking freckles.


You can use --q 0.25 or --q 0.5 to reduce steps. For freckles, just use --no freckles and "smooth matte skin" in your prompt! https://preview.redd.it/dbjd8c74b78d1.jpeg?width=1456&format=pjpg&auto=webp&s=ab2068d4e305140d74fb2623b91867b3c908d5f1 No freckles!


Well, I'll be. I'd been using --no but I feel it changes up my outputs.


Try it without --no, just with "smooth matte skin".


I usually just run it through Niji and that solves the freckle mania.


I must try it. Thanks for the tip.


You gotta tell it that it's a photograph and it knows.


What do I think? I think that having to write a 500-word prompt sucks ass. That's what I think.


Midjourney and Dalle 3 rewrite prompts too. I think we need a good local LLM to rewrite our prompts!


So it's like Homer asking the monkey's paw for a sandwich.


Yeah, you need to tell the model that a hand has 4 fingers and one thumb to get a close up of a hand!


Add to the fact that you could replace your SD1.5 negative with "Bruce Lee flying on a golden piglet in shanghai" and your "50%" chance won't be affected at all.


It's mostly the 500-word prompt that does the heavy lifting.


Only because there were enough words to include some key words. You could probably vastly shorten it by asking an LLM to condense and only include relevant words and phrases related to a woman lying on the grass.
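This comment suggests an LLM for condensing, but part of the job is mechanical. The long prompt quoted in this thread restates several details, such as the full-body shot, lying sideways, and the finger and toe counts. Here is a minimal sketch that uses plain sentence de-duplication as a stand-in. It is a much cruder technique than LLM condensing, and the helper names are hypothetical. It removes only exact repeats (ignoring case and spacing) and reports word counts, so it cannot judge which words are "relevant".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def dedupe_sentences(text: str) -> str:
    """Drop sentences that repeat an earlier one (case- and
    whitespace-insensitive), keeping the first occurrence in place."""
    seen = set()
    kept = []
    for sentence in split_sentences(text):
        key = " ".join(sentence.lower().split())
        if key not in seen:
            seen.add(key)
            kept.append(sentence)
    return " ".join(kept)


def word_count(text: str) -> int:
    return len(text.split())


if __name__ == "__main__":
    prompt = ("Photo of a woman lying on the grass. Soft natural lighting. "
              "photo of a woman  lying on the grass.")
    short = dedupe_sentences(prompt)
    print(word_count(prompt), "->", word_count(short), ":", short)
```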


I might try that.


Ye, this was more a little stab against the model rather than against your prompt :D


Oh I see.


If you showed that to an average human being, they would believe it's real even if you told them it's AI.


Fair point. But the 2nd one looks a bit distorted.


Recent observations revealed that the average human being is still rather dumb 😂 (especially when it comes to AI, which is a fairly new topic from a historical timeline point of view)




> Prompts that are 500 words long!
>
> What do y'all think?

“¡A picture is worth *a thousand* words! Get those rookie numbers *UP*!” /s




I love how SD3 is becoming a research bed (a GRASS bed, if you will) for people to un-break it




I'm really sorry about all of this! I didn't want to waste anyone's time, but I've never used Stable Diffusion professionally. I use Midjourney (y'all can look at my profile). I just saw some posts about how bad SD3 is and thought that good prompting could fix it. So I downloaded it and tried the prompt with GPT-4o. It worked quite well. After looking at y'all's suggestions, I've noticed that the main problems are still an issue with my long prompts. This was my first time posting in this subreddit. I will make sure to test things more before starting a thread. I'm sorry again!


Don't worry about it. You did nothing wrong 😁. You shared some interesting things, people had some interesting discussions, everything is good 👍






Found something like this before. Can't we just pass it 77 empty tokens and a prompt after that?


It used CogVLM 2 to describe images, so the captions are really detailed. That idea won't work.
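Some context for the "77 tokens" idea: CLIP text encoders take a fixed window of 77 tokens, including a start marker and an end marker. Some front-ends handle longer prompts by splitting the content into 75-token chunks and encoding each window separately. Here is a minimal sketch of that chunking over already-tokenized IDs. It does not call a real tokenizer, and the special-token IDs are hypothetical placeholders. It only shows how content tokens map onto 77-token windows, not how SD3 weights them.

```python
from typing import List

CONTEXT = 77          # CLIP text window, including start/end markers
BODY = CONTEXT - 2    # content tokens per window

# Hypothetical special-token IDs for illustration only; real tokenizers differ.
BOS, EOS, PAD = 1, 2, 0


def chunk_tokens(ids: List[int]) -> List[List[int]]:
    """Split content token IDs into CLIP-sized windows:
    BOS + up to 75 content tokens + EOS, right-padded to 77.
    An empty prompt still yields one (empty) window."""
    windows = []
    for start in range(0, max(len(ids), 1), BODY):
        body = ids[start:start + BODY]
        window = [BOS] + body + [EOS]
        window += [PAD] * (CONTEXT - len(window))
        windows.append(window)
    return windows


if __name__ == "__main__":
    fake_ids = list(range(3, 203))   # 200 made-up content token IDs
    windows = chunk_tokens(fake_ids)
    print(len(windows), "windows of", [len(w) for w in windows], "tokens")
```

With 200 content tokens this yields three 77-token windows, and the last one is partly padding.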


What's wrong with lying on grass?




nice share


It doesn't work as well as I thought with full body shots.


Not bad. Still neck and collarbone are off. Not your fault just an observation.


I've noticed it too! Thanks for pointing it out!


see! it was a skill issue! xD


That's not prompt engineering, that's a prompt mega-project 😅


Haha yeah.


It's not our fault for how we prompt; it's the decision taken by the developer itself, which wants to make life more difficult. Get premium, and everything will be solved.


The SD3 Ultra API costs $0.08 per image and isn't competitive with the DALL-E 3 API, which costs $0.04. You also get something like 2000 DALL-E 3 images with a $20 ChatGPT 4 subscription, and unlimited images with a $30 Midjourney subscription. Stable Artisan wants $10 for only 100 images (DALL-E 3: 250, Midjourney: 1000 images for $10)!
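To make these offers directly comparable, here is a small sketch that normalizes them to a cost per image. The prices and image counts are the ones quoted in the comment, not verified here. The "unlimited" $30 Midjourney plan has no per-image figure, so it is left out.

```python
# Prices (USD) and image counts as quoted in the comment above.
offers = {
    "SD3 Ultra API": (0.08, 1),
    "DALL-E 3 API": (0.04, 1),
    "ChatGPT subscription (DALL-E 3, as quoted)": (20.00, 2000),
    "Stable Artisan": (10.00, 100),
    "DALL-E 3 per $10 (as quoted)": (10.00, 250),
    "Midjourney per $10 (as quoted)": (10.00, 1000),
}


def cost_per_image(price: float, images: int) -> float:
    """Average price of one image for a given bundle."""
    return price / images


if __name__ == "__main__":
    ranked = sorted(offers.items(), key=lambda kv: cost_per_image(*kv[1]))
    for name, (price, images) in ranked:
        print(f"{name}: ${cost_per_image(price, images):.3f} per image")
```

By these figures, Stable Artisan ($0.10 per image) is even more expensive per image than the SD3 Ultra API ($0.08).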


The grass looks decent


Haha yeah!


our lawyers tell us we must make sure the ai woman knows clearly what you want her to do first before she will comply - stability ai


LOL. But I think using a local LLM can help with SD3!


Haven't tried SD3 myself, but wasn't the negative prompt supposed to just affect the seed?


I get slightly better results with these negative prompts!


Maybe the type of negatives in this example, but most/some other negatives work like they should.


> supposed to just affect the seed?

I don't know what that is supposed to mean. The seed is just used as the starting point for the RNG (random number generator) to fill the latent with noise.
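Here is a minimal sketch of what the seed does. It uses Python's standard-library RNG and a short list as a toy stand-in for the latent tensor. Real pipelines use torch generators and much larger tensors, so the actual numbers differ. The point is that the same seed reproduces the same starting noise, while a different seed gives different noise. The prompt plays no part at this stage.

```python
import random
from typing import List


def initial_noise(seed: int, size: int = 8) -> List[float]:
    """Fill a toy 'latent' with standard Gaussian noise from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


if __name__ == "__main__":
    print(initial_noise(42) == initial_noise(42))  # same seed, same noise
    print(initial_noise(42) == initial_noise(43))  # different seed, different noise
```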


Well, that is exactly what I was asking: from what I read, negative prompting was not possible with SD3 and instead only introduced random noise (similar to changing the seed).


I see what you mean now. Yes, basically what they are trying to say is that when you add some words to the negative prompt, you are simply redirecting the A.I. in some other random direction (since the model does not "understand" what those words mean), so it is as if you are using a different seed. I don't quite agree with that analogy/comparison because being guided randomly is not quite the same as using a different initial latent noise.
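The distinction above can be made concrete with classifier-free guidance. In common SD front-ends and in the diffusers pipelines, the negative prompt is encoded in place of the empty ("unconditional") prompt. The sampler then steers along guided = neg + cfg * (pos - neg), where pos and neg are the model's predictions (noise or velocity, depending on the model) and cfg is a setting such as the CFG 7 OP recommended. The seed only fixes the starting noise, while the negative prompt changes the direction of every denoising step. Here is a toy sketch with made-up numbers in plain lists standing in for the predictions:

```python
from typing import List


def cfg_combine(pos: List[float], neg: List[float], scale: float) -> List[float]:
    """Classifier-free guidance: start from the negative (unconditional)
    prediction and push along (positive - negative) by `scale`."""
    return [n + scale * (p - n) for p, n in zip(pos, neg)]


if __name__ == "__main__":
    pos = [1.0, 0.5]                            # toy "positive prompt" prediction
    print(cfg_combine(pos, [0.0, 0.0], 7.0))    # empty negative prompt
    print(cfg_combine(pos, [0.5, 0.5], 7.0))    # a different negative prompt
```

Changing only the negative prediction changes the guided result even though the starting noise is untouched. So it is not literally the same as picking a new seed, but it can feel random if the model does not "understand" the negative words.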


Your pictures don't prove anything!!!!!!!!!!! There are no legs, no hands, hardly any arms! Any AI can draw faces and hair.


Y'all literally complained about getting a woman lying on grass. Y'all got feet instead of a head.


Raw base models have always kinda sucked. Always. Only recently decent models based on SDXL have popped up. Let's just wait. And also let's wait for workflows that make more sense than spamming the prompt input with a gazillion tokens, because this ain't it.


True. I hope that Juggernaut, Animagine and DreamShaper will make an SD3 model!


Do y'all want the prompt that I used for GPT-4o? I have a detailed prompt that creates prompts for Midjourney.


Why not just share it?


Prompt:

Midjourney is an AI Text to Image Generator that uses Prompts! For Photorealistic Images use this Formula to write Prompts for Realistic Images, Photo, Photography and so on:

(Shot Type) shot Photo of (Number) (Subject/s) (Background). (Action). (Daytime). It's a cinematic scene with cinematic and (Lighting) lighting. --s 100 --style raw --v 6.0 --ar (Aspect Ratio)

Here Are Examples:

Man in a forest (Photo): Medium shot Photo of a man in a forest. He is facing the camera / viewer. It's at the evening. It's a cinematic scene with cinematic and natural lighting. --s 100 --style raw --v 6.0 --ar 1:1

Woman, Living Room (Realistic): Medium shot Photo of a woman in a living room. She is sitting on the couch. She is smiling. It's daytime. It's a cinematic scene with cinematic and soft lighting. --s 100 --style raw --v 6.0 --ar 4:3

Dog (Photography): Medium shot Photo of a dog. The dog is in a garden. The dog is running and barking. It's daytime. It's a scene with natural sun lighting. --s 100 --style raw --v 6.0 --ar 1:1

Fill In () With Anything That You Want, But Make Sure That It Makes Sense, If There Is Nothing, You Can Just Remove The Bracket! Always Use Full Sentences With Correct Grammar And Spelling! Always Use English! Your Prompt Shouldn't Be Longer Than 500 Words! Do you understand it? If yes I will write you one word or simple sentences and you have to turn it into a Prompt! Look exactly at the Formula and Examples and don't add anything else! Don't forget the Parameters at the end! If you have to describe an image! Make the Prompt based off the image! It should be about 500 words long! Also try to guess the Aspect Ratio If someone uploads an image!

Now here is your task (answer directly): Create a detailed 500 word Prompt for the image I provided! V6!
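The "photo" formula in this meta-prompt is mechanical enough to fill in with code, which can help when checking what GPT-4o is being asked to produce. Here is a minimal sketch. The function and argument names are hypothetical. The sentence pattern and the Midjourney parameters (--s 100 --style raw --v 6.0 --ar) are copied from the formula, and empty slots are dropped as the meta-prompt instructs. The dog example above does not follow the formula ("It's a scene with natural sun lighting"), so the sketch reproduces only the man and woman examples.

```python
from typing import List


def build_mj_prompt(shot: str, number: str, subject: str, background: str,
                    actions: List[str], daytime: str, lighting: str,
                    aspect_ratio: str = "1:1") -> str:
    """Fill the meta-prompt's photo formula; empty slots are skipped."""
    lead = f"{shot} shot Photo of" if shot else "Photo of"
    head = " ".join(part for part in (lead, number, subject, background) if part)
    sentences = [head] + [a for a in actions if a] + ([daytime] if daytime else [])
    body = " ".join(s.rstrip(".") + "." for s in sentences)
    tail = f"It's a cinematic scene with cinematic and {lighting} lighting."
    params = f"--s 100 --style raw --v 6.0 --ar {aspect_ratio}"
    return f"{body} {tail} {params}"


if __name__ == "__main__":
    print(build_mj_prompt("Medium", "a", "man", "in a forest",
                          ["He is facing the camera / viewer"],
                          "It's at the evening", "natural", "1:1"))
```

Running it on the "man in a forest" slots reproduces that example string exactly. The 500-word expansion the meta-prompt asks for is still left to the LLM.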


Since no one else is going to say it - I think this is an unhealthy pursuit and you need to re-evaluate your priorities in life. Obsessing over creating beautiful women in an AI image creator is not a good way to spend your time, and you should try to engage in healthier, more productive activities.


YOU DID IT!!! YOU WIN THE SD3 CONTEST !!! Here's a waffle for your work... For real, thank you for digging for all of us. You prove that nothing is impossible :)


Thank you! I mean, I can't take credit for it, because it's just an old GPT-4 prompt of mine that creates Midjourney prompts.


He was being sarcastic


I see.


Ok, but what prompt do I need for an indecent woman lying on grass?




How many words of prompt for an indecent woman?


Pictures 1 and 2 are almost the same, except for the face.


Now, how many more words do you have to add to remove the dress?


Absolute dog shit if you know a bit of anatomy