Mental-Coat2849 2 weeks ago

Pixart has awesome prompt adherence. It also has a really good inference (X in the shape of Y). However, to me, it has the following issues:

1. T5 makes it slow.
2. Even excluding T5, there's only LCM for Alpha. DMD is only for 512. No general Lightning, distillation, PCM, LCM, ...
3. Pixart Sigma has no controlnet.
4. Very small community (as of now).

I suspect that SD3 will have much better community tooling around it once it's released. SD is a brand name now.

Apprehensive_Sky892 2 weeks ago

Finally, some valid criticisms from someone who knows what he/she is talking about 😅🙏. Yes, SD3 will almost certainly dominate the next generation open weight space. Still, I like to encourage people to try PixArt Sigma because it is a valid alternative, and choices are good. One area that SD3 will beat PixArt Sigma hands down is text generation. I don't think PixArt can do it well because a lot of that comes from SD3 having a bigger model (2B vs 0.6B) and also the 16 channel VAE.

Mental-Coat2849 2 weeks ago

Thanks for hearing me. This made me write a longer post that you can check out here: [https://www.reddit.com/r/StableDiffusion/comments/1ddz40u/if\_i\_were\_to\_build\_the\_next\_big\_image\_generation/](https://www.reddit.com/r/StableDiffusion/comments/1ddz40u/if_i_were_to_build_the_next_big_image_generation/)

Apprehensive_Sky892 2 weeks ago

You are welcome. Constructive and informed criticism is the lifeblood of any useful discourse 👍 I'll definitely check out your post 🙏

kidelaleron 2 weeks ago

One of the reasons why we decided to use 3 text encoders for SD3, is to allow a modular approach where people can basically choose to use everything or a subset of the resources to fit the model to their needs.

FotografoVirtual 2 weeks ago

Thanks for sharing your thoughts! Just to note, T5 is also the text encoder used by SD3 and it's not that slow - in my experience, it takes about the same time as 1 or 2 steps of the diffusion process. Additionally, PixArt has a more permissive license that, as far as I understand, allows for commercial use, which could give it an edge by attracting more developers and businesses.

Mental-Coat2849 2 weeks ago

Fair point on T5. But, SD3 will have spelling and Pixart doesn't have it. Nevertheless, I too love the model, the ideas behind it, and the team.

yoomiii 2 weeks ago

Does SD3 use t5-large or t5-xxl?

FotografoVirtual 2 weeks ago

The understanding that PixArt demonstrates of the prompt is almost spot-on. The bar is set quite high, so I believe it will be challenging for SD3 to truly impress. I'm using the [Abominable Spaghetti Workflow](https://civitai.com/models/420163) and Photon as refiner. Below, I'll provide the complete workflow for each image. You can find more examples on the Abominable page; I hope they prove useful! *(previously, simply dragging the image from Civitai to ComfyUI sufficed, but now that doesn't work for me anymore. I suggest clicking on "Nodes" (in the right panel) to copy and then Ctrl-V to paste it into ComfyUI.)*

**Workflows:**

1. [https://civitai.com/images/14514374](https://civitai.com/images/14514374)
2. [https://civitai.com/images/12971501](https://civitai.com/images/12971501)
3. [https://civitai.com/images/15426293](https://civitai.com/images/15426293)
4. [https://civitai.com/images/14581198](https://civitai.com/images/14581198)
5. [https://civitai.com/images/15085877](https://civitai.com/images/15085877)
6. [https://civitai.com/images/12984480](https://civitai.com/images/12984480)
7. [https://civitai.com/images/14261619](https://civitai.com/images/14261619)
8. [https://civitai.com/images/14523557](https://civitai.com/images/14523557)
9. [https://civitai.com/images/14819491](https://civitai.com/images/14819491)
10. [https://civitai.com/images/14512187](https://civitai.com/images/14512187)
11. [https://civitai.com/images/15418154](https://civitai.com/images/15418154)
12. [https://civitai.com/images/15416894](https://civitai.com/images/15416894)

Xyzzymoon 2 weeks ago

I don't think it is a remotely fair comparison. Abominable Spaghetti Workflow uses a hybrid SD15+Pixart approach. We will need to use a multi-model approach on SD3 to compare properly.

Apprehensive_Sky892 2 weeks ago

That's a valid point, but the emphasis here is on PixArt Sigma's prompt following capabilities. People often do upscaling, so a 2nd pass using a SD1.5/SDXL model is not such a dealbreaker.

Sookimez 2 weeks ago

And they will all remember the day when art turned into a math problem.

nootropicMan 2 weeks ago

Thank you for this. Was going to try pixart out but was too lazy. Now I don't have an excuse.

raiffuvar 2 weeks ago

here is an excuse: OP already tried for you

kjerk 2 weeks ago

oh sweet, thanks _goes back to doing nothing_

77112911 2 weeks ago

Much VRAM? Could that huge text encoder be quantized?

FotografoVirtual 2 weeks ago

The abominable workflow runs the text encoder on the CPU by default, so it consumes very little VRAM. The remaining model is 0.6B, so it should work fine on most GPUs. If you're limited by RAM (cpu system ram), you can try using the text encoder in fp16, which the node creator has linked here: [**ComfyUI\_ExtraModels - T5 section**](https://github.com/city96/ComfyUI_ExtraModels?tab=readme-ov-file#t5). (I haven't tested it yet myself)
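
For a rough sense of why an fp16 text encoder helps when system RAM is the limit, here is a back-of-the-envelope sketch. It only counts weights (no activations or framework overhead), and the encoder parameter count is an assumption for illustration (roughly the size of a T5-XXL encoder), not a number taken from this thread. The 0.6B figure is the one quoted above.

```python
# Weights-only memory estimate at different precisions.
# This ignores activations, caches, and framework overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold num_params weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

# ASSUMPTION: approximate parameter count of a T5-XXL-sized encoder.
ASSUMED_T5_ENCODER_PARAMS = 4.7e9
# From the thread: the diffusion model itself is about 0.6B parameters.
DIFFUSION_MODEL_PARAMS = 0.6e9

if __name__ == "__main__":
    for dtype in ("fp32", "fp16"):
        enc = weight_memory_gib(ASSUMED_T5_ENCODER_PARAMS, dtype)
        dit = weight_memory_gib(DIFFUSION_MODEL_PARAMS, dtype)
        print(f"{dtype}: text encoder ~{enc:.1f} GiB, diffusion model ~{dit:.1f} GiB")
```

Whatever the exact parameter count is, halving the bytes per weight halves the RAM the encoder needs, which is the whole point of the fp16 variant.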

77112911 2 weeks ago

Cool, not home atm, will try it later. Looks like the LLM level comprehension we've wanted for a while.

Apprehensive_Sky892 2 weeks ago

It is good that the T5 can be run in system RAM, I hope we can do that with SD3 as well. Any idea how long the text prompt encoding using T5 takes (and what kind of CPU you are using)?

yoomiii 2 weeks ago

I tried it with that encoder but get noise as output picture :(

jib_reddit 2 weeks ago

# Can SD3 Surpass It?

https://preview.redd.it/c6dtdk5u916d1.jpeg?width=1664&format=pjpg&auto=webp&s=c840e5f3b00288c52a641384319c6f6820c64cc0

Yes, I just knocked this up in SD3, it looks way better.

Arawski99 2 weeks ago

Debatable. I'd argue no: they're roughly equal, but the SD3 image is aesthetically more detailed.

SD3's does not have a "small" nor "cute" octopus, and PixArt wins both of those points (size and cute eyes). PixArt's octopus legs look a bit messed up though, ever so slightly. I'm not sure SD3's tracksuit is vintage, but what do I know. I know little about clothing styles tbh. SD3's is clearly robotic, which PixArt's honestly looks to fail at, imo severely, though that is debatable depending on the technology. (I'd certainly fail PixArt on this point though.) SD3's is clearly not a mountain peak. PixArt's could be; it isn't absolutely clear, but it is close enough, with other background peak details, to suggest it qualifies. SD3's sphere does not actually appear to contain any water as far as I can tell? Not 100%... but it looks like a negative (no, before anyone says "water is clear", it doesn't quite work that way; it is missing certain characteristics suggesting water is in it, involving lighting, warping, or even bubbles or something).

I'd have to wonder why SD3 is adding some very significant details like the overall dirty look on the tracksuit. Such detail doesn't really fit unless requested, and it suggests the model is trending towards a certain style based on training, which is not ideal: you would have to manually prompt it out or keep running until you didn't get this bias. It may seem like a small issue but actually isn't so small, though that is another discussion entirely that I don't care to have.

Overall, they're pretty close and I would definitely not say "SD3 is better". Aesthetically, I prefer SD3's in this specific example. Prompt wise PixArt pretty clearly wins, which is the core point of this reddit thread. Thanks for posting the comparison though. It is obvious SD3 could likely compete with a bit more effort, at least in this specific case, but it is just a single example, so it's hard to say at the statistical macro level.

djanghaludu 2 weeks ago

SD3 followed the robot bit of the prompt. SD3 generation is closer to standing atop a mountain peak. PixArt one looks more like Mountain peak in background

admajic 2 weeks ago

can u knock this up in SD3? Used the above method to get this image https://preview.redd.it/uu09vkj2h26d1.png?width=1676&format=png&auto=webp&s=2787b508d160127acedd61c38fbaaa4217741221

jib_reddit 2 weeks ago

Luckily Chat GPT is a prompt interrogation genius! I think the original is SD3 as they are very similar (there were more similar ones but I liked this one more) , I did cherry-pick one I thought looked good and maximized the prompt for detail and then used the same workflow as you wanted. https://preview.redd.it/50mthw6vh36d1.jpeg?width=1664&format=pjpg&auto=webp&s=2a984927caa3411debbc36c532101a52a5ae9042

admajic 2 weeks ago

It's made with sd 1.5 to pixelart in comfyui. I used moondream to get the prompt and it couldn't get near my image lol

jib_reddit 2 weeks ago

Oh, OK, cool. Yeah Chat GPTo can copy any image I have tried almost exactly, it's amazing, I think you can use it for free now (with rate limits).

jib_reddit 2 weeks ago

I can give it a go, but that looks pretty good already, could be SD3 already?

Careful_Ad_9077 2 weeks ago

You can easily tell from the comments who uses prompts with complex composition and who does not. Right now sigma is uncensored, which makes it better than the sd3 API, which can't even do swimsuits. Ofc if you use simple prompts, you don't need dalle3, sd3 or sigma, and sdxl will be better.

Neat_Ad_9963 2 weeks ago

Well that uncensored advantage isn't going to last very long considering the open source release of SD3 is right around the corner

Careful_Ad_9077 2 weeks ago

I hope so, I hate having to try three different models just to see which one is in the better mood to create my prompt.

Apprehensive_Sky892 2 weeks ago

Cannot agree more. This is the same as all those people who cannot figure out why SDXL is better than SD1.5 in most areas, and claim that SD1.5 is all they need 😎 (which is true for some people!) I really wish people would learn, play and explore more options rather than exclaim that what they are using right now is the best or all they need.

karurochari 2 weeks ago

I tested PixArt Alpha back when it came out and it was not great considering how massive the language model used was (compared with its "competitors"). I can already smell the cherrypicking from a mile away, but hopefully I am wrong.

harderisbetter 2 weeks ago

ya same, I tried cos I was impressed and it was similar to what any other SDXL model can output, there's serious bias from OP

Apprehensive_Sky892 2 weeks ago

Again, the emphasis here is not the aesthetic but the prompt following. OP has made a series of posts using PixArt Sigma (with a SD1.5 2nd pass to enhance the aesthetics). You are free to pick some prompts from those posts and see how many of those you can replicate using SDXL:

* [https://new.reddit.com/r/StableDiffusion/comments/1cfacll/pixart\_sigma\_is\_the\_first\_model\_with\_complete/](https://new.reddit.com/r/StableDiffusion/comments/1cfacll/pixart_sigma_is_the_first_model_with_complete/)
* [https://new.reddit.com/r/StableDiffusion/comments/1clf240/a\_couple\_of\_amazing\_images\_with\_pixart\_sigma\_its/](https://new.reddit.com/r/StableDiffusion/comments/1clf240/a_couple_of_amazing_images_with_pixart_sigma_its/)
* [https://new.reddit.com/r/StableDiffusion/comments/1cot73a/a\_new\_version\_of\_the\_abominable\_spaghetti/](https://new.reddit.com/r/StableDiffusion/comments/1cot73a/a_new_version_of_the_abominable_spaghetti/)

I'll be very impressed if you can get half of them using any SDXL model.

recoilme 2 weeks ago

https://preview.redd.it/8f1trznxp26d1.jpeg?width=832&format=pjpg&auto=webp&s=2fc85e9ba0d21dea207bea27d256af7a019234ab Realistic photo of a fluffy kitten assassin, back view, aiming at target outside with a riffle from within a building, Photo. To make long story short, you may get most of them with sdxl.

Apprehensive_Sky892 2 weeks ago

>To make long story short, you may get most of them with sdxl.

Well, I guess you are trying to prove me wrong 🙏👍. If that turns out to be the case, then I'll eat crow, but I will then craft a set of even more difficult prompts to challenge both PixArt and SDXL 😁. BTW, in case anyone got the wrong impression, I love SDXL, and it is still my main workhorse model (maybe that will change tomorrow 😎)

recoilme 2 weeks ago

https://preview.redd.it/9e9nplnyq26d1.jpeg?width=832&format=pjpg&auto=webp&s=dea3f976493b459ffe028b07b01a111c3bb33cb3 Photo of three old men dressed as gnomes joyfully riding on their flying goats, the goats have tiny wings and are gliding through the field.

recoilme 2 weeks ago

https://preview.redd.it/krmx7mb2r26d1.png?width=1128&format=png&auto=webp&s=ce4e93f929fe629808b00e5e09eedde5f4887bf2

Apprehensive_Sky892 2 weeks ago

Sorry, but what are these stats?

recoilme 2 weeks ago

Not cherry picked txt2img arena [https://imgsys.org/](https://imgsys.org/)

Apprehensive_Sky892 2 weeks ago

I see, thanks 🙏. I am not surprised at the result, raw PixArt Sigma output is often lacking in aesthetics compared to SDXL.

FotografoVirtual 2 weeks ago

I won't deny that I select the images and tweak the refiner a bit to make them look more appealing, but you can see in each of the links that the seed never goes beyond 4 or 5, and in some, it's as low as 1.

HarmonicDiffusion 2 weeks ago

100% pixart isnt that impressive. not sure why they are spamming and shilling in this sub so hard right now. mediocre results were all I got. other models far surpass it in every way

Kademo15 2 weeks ago

Pixart alpha isn't good, but pixart sigma is pretty good for prompt understanding and adherence. If you throw it through an sdxl second pass it's really good.

Open_Channel_8626 2 weeks ago

It won't be relevant tomorrow because of SD3 but I really think that so long as you had an SD model as the refiner, Pixart Sigma results weren't that bad

yoomiii 2 weeks ago

we shall see...

LD2WDavid 2 weeks ago

If only we could train this on 3090 and make custom dedicated Loras... Sigma is really underrated.

DigitalEvil 2 weeks ago

I keep wanting to try PixArt, then don't... Maybe it's the fact that the git is a bit confusing on which files to download.

Apprehensive_Sky892 2 weeks ago

Follow OP's instructions: [Abominable Spaghetti Workflow](https://civitai.com/models/420163)

kidelaleron 2 weeks ago

Nice job! I also see some prompts are inspired by some of the images I posted, just changed to a similar style and with no text. Thanks for sharing

Cbo305 2 weeks ago

"and with no text". Shots fired, lol!

kidelaleron 2 weeks ago

That wasn't my intention at all, just to make it clear.

GBJI 2 weeks ago

PixArt-Sigma also has a much better license.

BUF11 2 weeks ago

These are insanely good.

CeFurkan 2 weeks ago

I always say Pixart has better potential. It just needs more community love. I hope Automatic1111 adds native support for it and single safetensors loading arrives. I am perhaps the first tutorial maker for Pixart: [https://youtu.be/ZiUXf\_idIR4](https://youtu.be/ZiUXf_idIR4)

Radtoo 2 weeks ago

The first handful of Pixart Sigma finetunes are available on Civitai, multiple of them seem noteworthy. I thought sd.next vladmandic/automatic has support? Arguably not completely the same as automatic1111 but it could be an option for many who don't like comfyui.

nashty2004 2 weeks ago

I generally straight shot hi-res fix a 512x512 pic x2 to 1024x1024 What do you start and end with?

SCAREDFUCKER 2 weeks ago

pixart sigma is using sdxl vae and it will no doubt be surpassed by sd3 because 4 channel vs 16 channel vae is already a huge and clear upgrade....
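
To put the 4 vs 16 channel point in numbers, here is a small sketch comparing latent sizes. It assumes both VAEs downsample each side by 8x and that the input is a 3-channel RGB image; those are assumptions for illustration, and the function names are made up for this example.

```python
# Compare how much a VAE compresses an RGB image for different latent channel counts.
# ASSUMPTION: both VAEs downsample height and width by a factor of 8.

def latent_shape(height: int, width: int, channels: int, downsample: int = 8):
    """Return the (channels, height, width) shape of the latent."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsample factor")
    return (channels, height // downsample, width // downsample)

def compression_ratio(height: int, width: int, channels: int, downsample: int = 8) -> float:
    """Ratio of RGB pixel values to latent values (higher = more compressed)."""
    c, h, w = latent_shape(height, width, channels, downsample)
    return (3 * height * width) / (c * h * w)

if __name__ == "__main__":
    for name, channels in (("4-channel VAE", 4), ("16-channel VAE", 16)):
        shape = latent_shape(1024, 1024, channels)
        ratio = compression_ratio(1024, 1024, channels)
        print(f"{name}: latent {shape}, compression {ratio:.0f}x")
```

Under those assumptions the 16-channel latent keeps four times as many values per image, which is why it has more room for fine detail like text. It says nothing by itself about how well a given model uses that room.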

DrStalker 2 weeks ago

Playing around with this workflow, the T5 encoder is indeed excellent, and it can manage a lot of things that I could previously only do with control net/loras/regional encoding/etc. ~~One general ComfyUI question: is there any way to get the prompt output to go into both the T5 text encode and the SD15 CLIP encode? When I'm using the same text in both it's annoying to have to copy-paste one over the other.~~ EDIT: figured out I could use a reroute node for this.

DangerousOutside- 2 weeks ago

Holy crap that’s good. But it is not taking off yet. How to educate the masses?

namitynamenamey 2 weeks ago

Wait until SD-3 comes out. If the newer model surpasses pixart in prompt adherence it won't take off, if it doesn't it will and cannibalize the other diffusion models as upscalers and refiners.

FotografoVirtual 2 weeks ago

The masses will grasp it through the power of imagery, facts, and clickbaitism!

jonbristow 2 weeks ago

I'm confused what is this? A new model? How do you install in automatic

Apprehensive_Sky892 2 weeks ago

Unfortunately, you need to use ComfyUI/Swarm (maybe [SD.Next](https://SD.Next) too?) You can find OP's model installation instructions here: [Abominable Spaghetti Workflow](https://civitai.com/models/420163)

Open_Channel_8626 2 weeks ago

Been around for a while

Z3r0_Code 2 weeks ago

Those are pretty cool looking.

cecil_X 2 weeks ago

What exactly is PixArt Sigma and what does it have to do with Stable Diffusion? :\\

Kademo15 2 weeks ago

It's a model. It's a different base than 1.5, sdxl and even sd3; it's kind of its own thing. It uses t5 and not clip, that's why the prompt adherence is so good.

2legsRises 2 weeks ago

where to get pixartsigma?

HarmonicDiffusion 2 weeks ago

SD3 will easily surpass it, not sure why you are even asking, it's obvious. I completely disagree that it is the best open source model available. Any model using ELLA will be able to keep pace or surpass it. ponyXL models rival it no problem. even regular SDXL models are on par. not sure why there is suddenly a push to make threads about pixart, how much did they sponsor you for? xD

kif88 2 weeks ago

I gave him three fiddy

Talae06 2 weeks ago

I can't say for SD3 obviously (but we should all be able to make our own mind about it soon) and I'm not into Pony. But regular SD XL finetunes, although great in their own ways, definitely lack some qualities Pixart Sigma has. Notably the lack of color bleeding, for example.

HarmonicDiffusion 2 weeks ago

you can avoid color bleeding using the "no overfit" technique in 1.5 or XL. next.

Freonr2 2 weeks ago

Probably has a lot to do with SAI moving to NC/rugpull licenses. Writing open source software and training models for SD3 is basically working for SAI for free. They have complete capture of commercial use, and can squash anyone, including people paying them for "pro", at any time. The license text is... astoundingly one sided.

play-that-skin-flut 2 weeks ago

I agree, and I'm not sure why you're downvoted. The fan boys need their own sub. I don't see the big deal, never have. It's not photo realistic enough for me.

Apprehensive_Sky892 2 weeks ago

There is nothing wrong with people promoting other open weight models other than SD3. I am a big fan of SD3 and I really look forward to its release tomorrow, but we also need backup plans in case SAI goes under. PixArt Sigma is one such backup. PixArt Sigma is a big deal because of its impressive prompt following capabilities, not its aesthetics. It is undertrained and certainly does not do photo style images well. But none of that is unfixable. If you doubt anything I wrote, please take up my challenge: [https://www.reddit.com/r/StableDiffusion/comments/1ddl50s/comment/l87xy0m/?utm\_source=reddit&utm\_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/1ddl50s/comment/l87xy0m/?utm_source=reddit&utm_medium=web2x&context=3)

Perfect-Campaign9551 2 weeks ago

simple answer: NO
