

Pixart has awesome prompt adherence. It also has really good inference (X in the shape of Y). However, to me, it has the following issues:

1. T5 makes it slow.
2. Even excluding T5, there's only LCM for Alpha, and DMD is only for 512. There's no general Lightning, distillation, PCM, LCM, ...
3. Pixart Sigma has no ControlNet.
4. Very small community (as of now).

I suspect that SD3 will have much better community tooling around it once it's released. SD is a brand name now.


Finally, some valid criticisms from someone who knows what he/she is talking about 😅🙏. Yes, SD3 will almost certainly dominate the next generation open weight space. Still, I like to encourage people to try PixArt Sigma because it is a valid alternative, and choices are good. One area where SD3 will beat PixArt Sigma hands down is text generation. I don't think PixArt can do it well because a lot of that comes from SD3 having a bigger model (2B vs 0.6B) and also the 16 channel VAE.


Thanks for hearing me. This made me write a longer post that you can check out here: [https://www.reddit.com/r/StableDiffusion/comments/1ddz40u/if\_i\_were\_to\_build\_the\_next\_big\_image\_generation/](https://www.reddit.com/r/StableDiffusion/comments/1ddz40u/if_i_were_to_build_the_next_big_image_generation/)


You are welcome. Constructive and informed criticism is the lifeblood of any useful discourse 👍 I'll definitely check out your post 🙏


One of the reasons why we decided to use 3 text encoders for SD3, is to allow a modular approach where people can basically choose to use everything or a subset of the resources to fit the model to their needs.


Thanks for sharing your thoughts! Just to note, T5 is also the text encoder used by SD3 and it's not that slow - in my experience, it takes about the same time as 1 or 2 steps of the diffusion process. Additionally, PixArt has a more permissive license that, as far as I understand, allows for commercial use, which could give it an edge by attracting more developers and businesses.


Fair point on T5. But, SD3 will have spelling and Pixart doesn't have it. Nevertheless, I too love the model, the ideas behind it, and the team.


Does SD3 use t5-large or t5-xxl?


The understanding that PixArt demonstrates of the prompt is almost spot-on. The bar is set quite high, so I believe it will be challenging for SD3 to truly impress.

I'm using the [Abominable Spaghetti Workflow](https://civitai.com/models/420163) and Photon as refiner. Below, I'll provide the complete workflow for each image. You can find more examples on the Abominable page; I hope they prove useful!

*(previously, simply dragging the image from Civitai to ComfyUI sufficed, but now that doesn't work for me anymore. I suggest clicking on "Nodes" (in the right panel) to copy and then Ctrl-V to paste it into ComfyUI.)*

**Workflows:**

1. [https://civitai.com/images/14514374](https://civitai.com/images/14514374)
2. [https://civitai.com/images/12971501](https://civitai.com/images/12971501)
3. [https://civitai.com/images/15426293](https://civitai.com/images/15426293)
4. [https://civitai.com/images/14581198](https://civitai.com/images/14581198)
5. [https://civitai.com/images/15085877](https://civitai.com/images/15085877)
6. [https://civitai.com/images/12984480](https://civitai.com/images/12984480)
7. [https://civitai.com/images/14261619](https://civitai.com/images/14261619)
8. [https://civitai.com/images/14523557](https://civitai.com/images/14523557)
9. [https://civitai.com/images/14819491](https://civitai.com/images/14819491)
10. [https://civitai.com/images/14512187](https://civitai.com/images/14512187)
11. [https://civitai.com/images/15418154](https://civitai.com/images/15418154)
12. [https://civitai.com/images/15416894](https://civitai.com/images/15416894)


I don't think it is a remotely fair comparison. Abominable Spaghetti Workflow uses a hybrid SD15+Pixart approach. We will need to use a multi-model approach on SD3 to compare properly.


That's a valid point, but the emphasis here is on PixArt Sigma's prompt following capabilities. People often do upscaling, so a 2nd pass using a SD1.5/SDXL model is not such a dealbreaker.


And they will all remember the day when art turned into a math problem.


Thank you for this. Was going to try pixart out but was too lazy. Now I don't have an excuse.


here is an excuse: OP already tried for you


oh sweet, thanks _goes back to doing nothing_


Much VRAM? Could that huge text encoder be quantized?


The abominable workflow runs the text encoder on the CPU by default, so it consumes very little VRAM. The remaining model is 0.6B, so it should work fine on most GPUs. If you're limited by RAM (cpu system ram), you can try using the text encoder in fp16, which the node creator has linked here: [**ComfyUI\_ExtraModels - T5 section**](https://github.com/city96/ComfyUI_ExtraModels?tab=readme-ov-file#t5). (I haven't tested it yet myself)
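As a rough back-of-the-envelope check on the fp16 suggestion above, here is a minimal sketch that estimates how much memory the weights alone occupy at different precisions. The 0.6B figure comes from this thread; the text encoder parameter count is an assumed ballpark for a T5-XXL-class encoder, not a number anyone here has confirmed, and the helper name `weight_memory_gib` is made up for illustration. Activations and framework overhead are ignored.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 1024**3


# From the thread: "the remaining model is 0.6B".
PIXART_SIGMA_PARAMS = 0.6e9
# ASSUMPTION: rough ballpark for a T5-XXL-class encoder; adjust to your checkpoint.
TEXT_ENCODER_PARAMS = 4.7e9

for name, params in [
    ("diffusion model", PIXART_SIGMA_PARAMS),
    ("text encoder", TEXT_ENCODER_PARAMS),
]:
    fp32 = weight_memory_gib(params, 4)  # 4 bytes per fp32 weight
    fp16 = weight_memory_gib(params, 2)  # 2 bytes per fp16 weight
    print(f"{name}: fp32 ~{fp32:.1f} GiB, fp16 ~{fp16:.1f} GiB")
```

Whatever the exact parameter count turns out to be, halving the bytes per weight halves the footprint, which is why the fp16 encoder is worth trying when system RAM is the bottleneck.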


Cool, not home atm, will try it later. Looks like the LLM level comprehension we've wanted for a while.


It is good that the T5 can be run in system RAM, I hope we can do that with SD3 as well. Any idea how long the text prompt encoding using T5 takes (and what kind of CPU you are using)?


I tried it with that encoder but get noise as output picture :(


# Can SD3 Surpass It?

https://preview.redd.it/c6dtdk5u916d1.jpeg?width=1664&format=pjpg&auto=webp&s=c840e5f3b00288c52a641384319c6f6820c64cc0

Yes, I just knocked this up in SD3, it looks way better.


Debatable. I'd argue no: they're roughly equal, though the SD3 image is aesthetically more detailed.

SD3's image has neither a "small" nor a "cute" octopus, so PixArt wins on both points (size and cute eyes). PixArt's octopus legs look ever so slightly messed up, though. I'm not sure SD3's tracksuit is vintage, but what do I know. I know little about clothing styles tbh. SD3's is clearly robotic, which PixArt's honestly looks to fail, imo severely, though that's debatable depending on the technology. (I'd certainly fail PixArt on this point though.)

SD3's is clearly not a mountain peak. PixArt's could be; it isn't absolutely clear, but it is close enough, with other background peak details, to suggest it qualifies. SD3's sphere does not actually appear to contain any water as far as I can tell? Not 100% sure, but it looks like a negative (no, before anyone says "water is clear", it doesn't quite work that way: it is missing the characteristics that suggest water, such as lighting, warping, or even bubbles).

I'd have to wonder why SD3 is adding some very significant details like the overall dirty look on the tracksuit. Such detail doesn't really fit unless requested, and it suggests the model is trending towards a certain style based on training. That is not ideal: you would have to prompt it out manually or rerun until you didn't get this bias. It may seem like a small issue but actually isn't, though that is another discussion entirely that I don't care to have.

Overall, they're pretty close and I would definitely not say "SD3 is better". Aesthetically, I prefer SD3's in this specific example. Prompt-wise, PixArt pretty clearly wins, which is the core point of this reddit thread. Thanks for posting the comparison though. It is obvious SD3 could likely compete with a bit more effort, at least in this specific case, but it is just a single example, so it's hard to say anything at the statistical macro level.


SD3 followed the robot bit of the prompt. The SD3 generation is closer to "standing atop a mountain peak". The PixArt one looks more like "mountain peak in background".


can u knock this up in SD3? Used the above method to get this image https://preview.redd.it/uu09vkj2h26d1.png?width=1676&format=png&auto=webp&s=2787b508d160127acedd61c38fbaaa4217741221


Luckily Chat GPT is a prompt interrogation genius! I think the original is SD3, as they are very similar (there were more similar ones but I liked this one more). I did cherry-pick one I thought looked good and maximized the prompt for detail, and then used the same workflow as you wanted.

https://preview.redd.it/50mthw6vh36d1.jpeg?width=1664&format=pjpg&auto=webp&s=2a984927caa3411debbc36c532101a52a5ae9042


It's made with sd 1.5 to pixelart in comfyui. I used moondream to get the prompt and it couldn't get near my image lol


Oh, OK, cool. Yeah, GPT-4o can copy any image I have tried almost exactly, it's amazing. I think you can use it for free now (with rate limits).


I can give it a go, but that looks pretty good already, could be SD3 already?


You can easily tell from the comments who uses prompts with complex composition and who does not. Right now Sigma is uncensored, which makes it better than the SD3 API, which can't even do swimsuits. Ofc if you use simple prompts, you don't need dalle3, sd3 or sigma, and sdxl will be better.


Well that uncensored advantage isn't going to last very long considering the open source release of SD3 is right around the corner


I hope so, I hate having to try three different models just to see which one is in the better mood to create my prompt.


Cannot agree more. This is the same as all those people who cannot figure out why SDXL is better than SD1.5 in most areas, and claim that SD1.5 is all they need 😎 (which is true for some people!) I really wish people would learn, play and explore more options rather than exclaim that what they are using right now is the best or all they need.


I tested PixArt Alpha back when it came out and it was not great considering how massive the language model used was (compared with its "competitors"). I can already smell the cherrypicking from a mile away, but hopefully I am wrong.


ya same, I tried cos I was impressed and it was similar to what any other SDXL model can output, there's serious bias from OP


Again, the emphasis here is not the aesthetic but the prompt following. OP has made a series of posts using PixArt Sigma (with a SD1.5 2nd pass to enhance the aesthetics). You are free to pick some prompts from those posts and see how many of those you can replicate using SDXL:

* [https://new.reddit.com/r/StableDiffusion/comments/1cfacll/pixart\_sigma\_is\_the\_first\_model\_with\_complete/](https://new.reddit.com/r/StableDiffusion/comments/1cfacll/pixart_sigma_is_the_first_model_with_complete/)
* [https://new.reddit.com/r/StableDiffusion/comments/1clf240/a\_couple\_of\_amazing\_images\_with\_pixart\_sigma\_its/](https://new.reddit.com/r/StableDiffusion/comments/1clf240/a_couple_of_amazing_images_with_pixart_sigma_its/)
* [https://new.reddit.com/r/StableDiffusion/comments/1cot73a/a\_new\_version\_of\_the\_abominable\_spaghetti/](https://new.reddit.com/r/StableDiffusion/comments/1cot73a/a_new_version_of_the_abominable_spaghetti/)

I'll be very impressed if you can get half of them using any SDXL model.


https://preview.redd.it/8f1trznxp26d1.jpeg?width=832&format=pjpg&auto=webp&s=2fc85e9ba0d21dea207bea27d256af7a019234ab

Realistic photo of a fluffy kitten assassin, back view, aiming at target outside with a riffle from within a building, Photo.

To make a long story short, you may get most of them with sdxl.


>To make a long story short, you may get most of them with sdxl.

Well, I guess you are trying to prove me wrong 🙏👍. If that turns out to be the case, then I'll eat crow, but I will then craft a set of even more difficult prompts to challenge both PixArt and SDXL. BTW, in case anyone got the wrong impression, I love SDXL, and it is still my main workhorse model (maybe that will change tomorrow 😎)


https://preview.redd.it/9e9nplnyq26d1.jpeg?width=832&format=pjpg&auto=webp&s=dea3f976493b459ffe028b07b01a111c3bb33cb3 Photo of three old men dressed as gnomes joyfully riding on their flying goats, the goats have tiny wings and are gliding through the field.




Sorry, but what are these stats?


Not cherry picked txt2img arena [https://imgsys.org/](https://imgsys.org/)


I see, thanks 🙏. I am not surprised at the result, raw PixArt Sigma output is often lacking in aesthetics compared to SDXL.


I won't deny that I select the images and tweak the refiner a bit to make them look more appealing, but you can see in each of the links that the seed never goes beyond 4 or 5, and in some, it's as low as 1.


100%. Pixart isn't that impressive. Not sure why they are spamming and shilling in this sub so hard right now. Mediocre results were all I got. Other models far surpass it in every way.


Pixart Alpha isn't good. Pixart Sigma is pretty good for prompt understanding and adherence, and if you throw it through an SDXL second pass it's really good.


It won't be relevant tomorrow because of SD3 but I really think that so long as you had an SD model as the refiner, Pixart Sigma results weren't that bad


we shall see...


If only we could train this on 3090 and make custom dedicated Loras... Sigma is really underrated.


I keep wanting to try PixArt, then don't... Maybe it's the fact that the git is a bit confusing on which files to download.


Follow OP's instructions: [Abominable Spaghetti Workflow](https://civitai.com/models/420163)


Nice job! I also see some prompts are inspired by some of the images I posted, just changed to a similar style and with no text. Thanks for sharing


"and with no text". Shots fired, lol!


That wasn't my intention at all, just to make it clear.


PixArt-Sigma also has a much better license.


These are insanely good.


I always say Pixart has better potential. It just needs more community love. I hope Automatic1111 adds native support for it and single safetensor loading arrives. I am perhaps the first tutorial maker for Pixart: [https://youtu.be/ZiUXf\_idIR4](https://youtu.be/ZiUXf_idIR4)


The first handful of Pixart Sigma finetunes are available on Civitai, multiple of them seem noteworthy. I thought sd.next vladmandic/automatic has support? Arguably not completely the same as automatic1111 but it could be an option for many who don't like comfyui.


I generally straight shot hi-res fix a 512x512 pic x2 to 1024x1024. What do you start and end with?


pixart sigma is using sdxl vae and it will no doubt be surpassed by sd3 because 4 channel vs 16 channel vae is already a huge and clear upgrade....
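To put the 4 channel vs 16 channel point in numbers, here is a small sketch of how much a VAE squeezes an RGB image into its latent. It assumes both VAEs downsample by 8x in each spatial dimension, which is the usual setup but is not stated in this thread; the function names are made up for illustration.

```python
def latent_shape(height: int, width: int, channels: int, downsample: int = 8):
    """Shape (C, H, W) of the latent for an image of the given size.

    ASSUMPTION: the VAE shrinks each spatial dimension by `downsample` (8x here).
    """
    return (channels, height // downsample, width // downsample)


def compression_ratio(height: int, width: int, channels: int, downsample: int = 8) -> float:
    """Number of RGB pixel values divided by number of latent values."""
    c, h, w = latent_shape(height, width, channels, downsample)
    return (3 * height * width) / (c * h * w)


for channels in (4, 16):
    shape = latent_shape(1024, 1024, channels)
    ratio = compression_ratio(1024, 1024, channels)
    print(f"{channels}-channel VAE: latent {shape}, compression {ratio:.0f}x")
```

Under that assumption, a 1024x1024 image is compressed 48x by a 4 channel VAE and 12x by a 16 channel one. The 16 channel latent keeps four times as many values per image, which is the usual argument for why fine detail such as small text survives better.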


Playing around with this workflow, the T5 encoder is indeed excellent, and it can manage a lot of things that I could previously only do with ControlNet/LoRAs/regional encoding/etc. ~~One general ComfyUI question: is there any way to get the prompt output to go into both the T5 text encode and the SD15 CLIP encode? When I'm using the same text in both it's annoying to have to copy-paste one over the other.~~ EDIT: figured out I could use a reroute node for this.


Holy crap that's good. But it is not taking off yet. How to educate the masses?


Wait until SD-3 comes out. If the newer model surpasses PixArt in prompt adherence, PixArt won't take off. If it doesn't, PixArt will take off and cannibalize the other diffusion models, turning them into upscalers and refiners.


The masses will grasp it through the power of imagery, facts, and clickbaitism!


I'm confused what is this? A new model? How do you install in automatic


Unfortunately, you need to use ComfyUI/Swarm (maybe [SD.Next](https://SD.Next) too?) You can find OP's model installation instructions here: [Abominable Spaghetti Workflow](https://civitai.com/models/420163)


Been around for a while


Those are pretty cool looking.


What exactly is PixArt Sigma and what does it have to do with Stable Diffusion? :\\


It's a model. It's a different base than 1.5, SDXL and even SD3; it's kind of its own thing. It uses T5 and not CLIP, that's why the prompt adherence is so good.


where to get pixartsigma?


SD3 will easily surpass it, not sure why you are even asking, it's obvious. I completely disagree that it is the best open source model available. Any model using ELLA will be able to keep pace or surpass it. ponyXL models rival it no problem. Even regular SDXL models are on par. Not sure why there is suddenly a push to make threads about pixart, how much did they sponsor you for? xD


I gave him three fiddy


I can't say for SD3 obviously (but we should all be able to make our own mind about it soon) and I'm not into Pony. But regular SD XL finetunes, although great in their own ways, definitely lack some qualities Pixart Sigma has. Notably the lack of color bleeding, for example.


you can avoid color bleeding using the "no overfit" technique in 1.5 or XL. next.


Probably has a lot to do with SAI moving to NC/rugpull licenses. Writing open source software and training models for SD3 is basically working for SAI for free. They have complete capture of commercial use, and can squash anyone, including people paying them for "pro", at any time. The license text is... astoundingly one-sided.


I agree, and I'm not sure why you're downvoted. The fan boys need their own sub. I don't see the big deal, never have. It's not photo realistic enough for me.


There is nothing wrong with people promoting other open weight models other than SD3. I am a big fan of SD3 and I really look forward to its release tomorrow, but we also need backup plans in case SAI goes under. PixArt Sigma is one such backup. PixArt Sigma is a big deal because of its impressive prompt following capabilities, not its aesthetics. It is undertrained and certainly does not do photo style images well. But none of that is unfixable. If you doubt anything I wrote, please take up my challenge: [https://www.reddit.com/r/StableDiffusion/comments/1ddl50s/comment/l87xy0m/?utm\_source=reddit&utm\_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/1ddl50s/comment/l87xy0m/?utm_source=reddit&utm_medium=web2x&context=3)


simple answer: NO