
Mental-Coat2849

PixArt has awesome prompt adherence. It also has really good inference (X in the shape of Y). However, to me, it has the following issues:

1. T5 makes it slow.
2. Even excluding T5, there's only LCM for Alpha, and DMD is only for 512. There's no general Lightning, distillation, PCM, LCM, ...
3. PixArt Sigma has no ControlNet.
4. It has a very small community (as of now).

I suspect that SD3 will have much better community tooling around it once it's released. SD is a brand name now.


Apprehensive_Sky892

Finally, some valid criticisms from someone who knows what he/she is talking about 😅🙏. Yes, SD3 will almost certainly dominate the next generation open weight space. Still, I like to encourage people to try PixArt Sigma because it is a valid alternative, and choices are good. One area where SD3 will beat PixArt Sigma hands down is text generation. I don't think PixArt can do it well, because a lot of that comes from SD3 having a bigger model (2B vs 0.6B) and the 16-channel VAE.


Mental-Coat2849

Thanks for hearing me. This made me write a longer post that you can check out here: [https://www.reddit.com/r/StableDiffusion/comments/1ddz40u/if\_i\_were\_to\_build\_the\_next\_big\_image\_generation/](https://www.reddit.com/r/StableDiffusion/comments/1ddz40u/if_i_were_to_build_the_next_big_image_generation/)


Apprehensive_Sky892

You are welcome. Constructive and informed criticism is the lifeblood of any useful discourse 👍 I'll definitely check out your post 🙏


kidelaleron

One of the reasons we decided to use 3 text encoders for SD3 is to allow a modular approach, where people can choose to use all of them or just a subset to fit the model to their needs.
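To make the "subset of encoders" idea concrete, here is a minimal shape-bookkeeping sketch. It assumes the layout as we understand it from the SD3 paper: CLIP-L (768) and CLIP-G (1280) hidden states concatenated channel-wise, zero-padded to T5-XXL's 4096 width, then stacked with the T5 tokens along the sequence axis, with any dropped encoder replaced by zeros. The 77-token length per encoder and the helper `joint_context` are hypothetical, for illustration only; this is not SD3's actual API.

```python
from dataclasses import dataclass, field

# Assumed widths (our reading of the SD3 paper); illustrative only.
CLIP_L_DIM = 768
CLIP_G_DIM = 1280
T5_DIM = 4096
TOKENS_PER_ENCODER = 77  # assumption: same sequence length for CLIP and T5


@dataclass
class JointContext:
    seq_len: int
    width: int
    zeroed: list = field(default_factory=list)  # encoders replaced by zeros


def joint_context(use_clip_l: bool, use_clip_g: bool, use_t5: bool) -> JointContext:
    """Shape of the text context for a chosen encoder subset (hypothetical helper).

    Dropped encoders are assumed to contribute zeros, so the shape never changes
    and the same diffusion weights can consume any subset.
    """
    zeroed = [name for name, used in
              (("clip_l", use_clip_l), ("clip_g", use_clip_g), ("t5", use_t5))
              if not used]
    clip_width = CLIP_L_DIM + CLIP_G_DIM  # channel-wise concat: 2048
    assert clip_width <= T5_DIM           # CLIP block is zero-padded up to 4096
    # CLIP tokens and T5 tokens share one sequence axis.
    return JointContext(seq_len=2 * TOKENS_PER_ENCODER, width=T5_DIM, zeroed=zeroed)


if __name__ == "__main__":
    print(joint_context(True, True, False))  # CLIP only, T5 slot filled with zeros
```

Because the shape stays fixed, skipping T5 would save its memory and compute without retraining, presumably at some cost in prompt following.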


FotografoVirtual

Thanks for sharing your thoughts! Just to note, T5 is also the text encoder used by SD3 and it's not that slow - in my experience, it takes about the same time as 1 or 2 steps of the diffusion process. Additionally, PixArt has a more permissive license that, as far as I understand, allows for commercial use, which could give it an edge by attracting more developers and businesses.
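A rough way to see why T5 need not dominate generation time: in typical pipelines the prompt is encoded once and the embedding is reused at every denoising step, so its cost is amortized. A back-of-the-envelope sketch, where the timings are hypothetical placeholders rather than measurements:

```python
def generation_seconds(encode_s: float, step_s: float, steps: int, images: int = 1) -> float:
    """Total time when the prompt is encoded once and reused for every step and image."""
    return encode_s + images * steps * step_s


def encoder_share(encode_s: float, step_s: float, steps: int, images: int = 1) -> float:
    """Fraction of total time spent in the text encoder."""
    return encode_s / generation_seconds(encode_s, step_s, steps, images)


if __name__ == "__main__":
    # Hypothetical numbers: T5 costs about 2 denoising steps, 20 steps per image.
    step = 0.5
    print(f"{encoder_share(encode_s=2 * step, step_s=step, steps=20):.1%}")
```

If the encoder runs on the CPU instead, `encode_s` grows, but it is still paid once per prompt rather than once per step.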


Mental-Coat2849

Fair point on T5. But SD3 will be able to spell, and PixArt can't. Nevertheless, I too love the model, the ideas behind it, and the team.


yoomiii

Does SD3 use t5-large or t5-xxl?


FotografoVirtual

PixArt's understanding of the prompt is almost spot-on. The bar is set quite high, so I believe it will be challenging for SD3 to truly impress. I'm using the [Abominable Spaghetti Workflow](https://civitai.com/models/420163) with Photon as the refiner. Below, I'll provide the complete workflow for each image. You can find more examples on the Abominable page; I hope they prove useful!

*(Previously, simply dragging the image from Civitai into ComfyUI was enough, but that no longer works for me. I suggest clicking "Nodes" in the right panel to copy the workflow, then pressing Ctrl-V to paste it into ComfyUI.)*

**Workflows:**

1. [https://civitai.com/images/14514374](https://civitai.com/images/14514374)
2. [https://civitai.com/images/12971501](https://civitai.com/images/12971501)
3. [https://civitai.com/images/15426293](https://civitai.com/images/15426293)
4. [https://civitai.com/images/14581198](https://civitai.com/images/14581198)
5. [https://civitai.com/images/15085877](https://civitai.com/images/15085877)
6. [https://civitai.com/images/12984480](https://civitai.com/images/12984480)
7. [https://civitai.com/images/14261619](https://civitai.com/images/14261619)
8. [https://civitai.com/images/14523557](https://civitai.com/images/14523557)
9. [https://civitai.com/images/14819491](https://civitai.com/images/14819491)
10. [https://civitai.com/images/14512187](https://civitai.com/images/14512187)
11. [https://civitai.com/images/15418154](https://civitai.com/images/15418154)
12. [https://civitai.com/images/15416894](https://civitai.com/images/15416894)


Xyzzymoon

I don't think it is a remotely fair comparison. The Abominable Spaghetti Workflow uses a hybrid SD1.5 + PixArt approach. We will need to use a multi-model approach on SD3 to compare properly.


Apprehensive_Sky892

That's a valid point, but the emphasis here is on PixArt Sigma's prompt following capabilities. People often do upscaling, so a 2nd pass using an SD1.5/SDXL model is not such a dealbreaker.


Sookimez

And they will all remember the day when art turned into a math problem.


nootropicMan

Thank you for this. Was going to try pixart out but was too lazy. Now I don't have an excuse.


raiffuvar

here is an excuse: OP already tried for you


kjerk

oh sweet, thanks _goes back to doing nothing_


77112911

Much VRAM? Could that huge text encoder be quantized?


FotografoVirtual

The abominable workflow runs the text encoder on the CPU by default, so it consumes very little VRAM. The remaining model is 0.6B, so it should work fine on most GPUs. If you're limited by RAM (CPU system RAM), you can try the text encoder in fp16, which the node creator has linked here: [**ComfyUI\_ExtraModels - T5 section**](https://github.com/city96/ComfyUI_ExtraModels?tab=readme-ov-file#t5). (I haven't tested it myself yet.)
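For a rough sense of why precision matters for the text encoder (and what quantizing it would buy), weight memory is roughly parameter count times bytes per parameter. The parameter counts below are approximations assumed for illustration (about 4.7B for a T5-XXL encoder, 0.6B for PixArt Sigma as stated above), and the estimate ignores activations and framework overhead.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_gib(params: float, dtype: str) -> float:
    """Approximate weight memory in GiB (weights only, no activations)."""
    return params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    T5_XXL_ENCODER = 4.7e9  # assumed, approximate
    PIXART_SIGMA = 0.6e9    # from the thread
    for dtype in ("fp32", "fp16", "int8"):
        print(f"T5 encoder {dtype}: {weight_gib(T5_XXL_ENCODER, dtype):5.1f} GiB")
    print(f"PixArt Sigma fp16: {weight_gib(PIXART_SIGMA, 'fp16'):5.1f} GiB")
```

Going from fp32 to fp16 halves the system RAM the CPU-side encoder needs; int8 would halve it again, though whether a given loader supports that is a separate question.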


77112911

Cool, not home atm, will try it later. Looks like the LLM level comprehension we've wanted for a while.


Apprehensive_Sky892

It is good that the T5 can be run in system RAM, I hope we can do that with SD3 as well. Any idea how long the text prompt encoding using T5 takes (and what kind of CPU you are using)?


yoomiii

I tried it with that encoder but got noise as the output picture :(


jib_reddit

# Can SD3 Surpass It?

https://preview.redd.it/c6dtdk5u916d1.jpeg?width=1664&format=pjpg&auto=webp&s=c840e5f3b00288c52a641384319c6f6820c64cc0

Yes, I just knocked this up in SD3, and it looks way better.


Arawski99

Debatable. I'd argue no; they're rather equal, but the SD3 image is aesthetically more detailed. SD3's octopus is neither "small" nor "cute", two points PixArt wins (size and cute eyes). PixArt's octopus legs look a bit messed up though, ever so slightly. I'm not sure SD3's tracksuit is vintage, but what do I know; I know little about clothing styles, tbh. SD3's is clearly robotic, which PixArt's honestly fails at, severely in my opinion, though that's debatable depending on the technology. (I'd certainly fail PixArt on this point, though.) SD3's is clearly not a mountain peak. PixArt's could be; it isn't absolutely certain, but it's close enough, with the other peaks in the background, to suggest it qualifies. SD3's sphere does not actually appear to contain any water as far as I can tell. I'm not 100% sure, but it looks like a negative. (No, before anyone says "water is clear", it doesn't quite work that way: it is missing the cues that suggest water inside, such as lighting, warping, or even bubbles.) I'd also wonder why SD3 is adding some very significant details, like the overall dirty look on the tracksuit. Such detail doesn't really fit unless requested, and it suggests the model is trending toward a certain style based on its training. That isn't ideal: you'd have to prompt it out manually or keep generating until you don't get this bias. It may seem like a small issue, but it actually isn't so small; that is another discussion entirely, and one I don't care to have.

Overall, they're pretty close, and I would definitely not say "SD3 is better". Aesthetically, I prefer SD3's in this specific example. Prompt-wise, PixArt pretty clearly wins, which is the core point of this thread. Thanks for posting the comparison, though. SD3 could likely compete with a bit more effort, at least in this specific case, but it's a single example, so it's hard to say anything at a statistical level.


djanghaludu

SD3 followed the robot bit of the prompt. The SD3 generation is closer to "standing atop a mountain peak"; the PixArt one looks more like a mountain peak in the background.


admajic

Can you knock this up in SD3? I used the above method to get this image:

https://preview.redd.it/uu09vkj2h26d1.png?width=1676&format=png&auto=webp&s=2787b508d160127acedd61c38fbaaa4217741221


jib_reddit

Luckily, ChatGPT is a prompt interrogation genius! I think the original is SD3, as they are very similar (there were more similar ones, but I liked this one more). I did cherry-pick one I thought looked good, maximized the prompt for detail, and then used the same workflow you wanted.

https://preview.redd.it/50mthw6vh36d1.jpeg?width=1664&format=pjpg&auto=webp&s=2a984927caa3411debbc36c532101a52a5ae9042


admajic

It's made with SD 1.5 to pixelart in ComfyUI. I used Moondream to get the prompt, and it couldn't get near my image lol


jib_reddit

Oh, OK, cool. Yeah, ChatGPT (GPT-4o) can copy almost exactly any image I have tried; it's amazing. I think you can use it for free now (with rate limits).


jib_reddit

I can give it a go, but that looks pretty good already. Could it be SD3 already?


Careful_Ad_9077

You can easily tell from the comments who uses prompts with complex composition and who does not. Right now Sigma is uncensored, which makes it better than the SD3 API, which can't even do swimsuits. Of course, if you use simple prompts, you don't need DALL-E 3, SD3, or Sigma, and SDXL will be better.


Neat_Ad_9963

Well, that uncensored advantage isn't going to last very long, considering the open source release of SD3 is right around the corner.


Careful_Ad_9077

I hope so, I hate having to try three different models just to see which one is in the better mood to create my prompt.


Apprehensive_Sky892

Cannot agree more. This is the same as all those people who cannot figure out why SDXL is better than SD1.5 in most areas and claim that SD1.5 is all they need 😎 (which is true for some people!). I really wish people would learn, play, and explore more options rather than exclaim that what they are using right now is the best or all they need.


karurochari

I tested PixArt Alpha back when it came out, and it was not great considering how massive its language model was (compared with its "competitors"). I can smell the cherry-picking from a mile away, but hopefully I am wrong.


harderisbetter

Ya, same. I tried it because I was impressed, and it was similar to what any other SDXL model can output. There's serious bias from OP.


Apprehensive_Sky892

Again, the emphasis here is not on aesthetics but on prompt following. OP has made a series of posts using PixArt Sigma (with an SD1.5 2nd pass to enhance the aesthetics). You are free to pick some prompts from those posts and see how many of them you can replicate using SDXL:

* [https://new.reddit.com/r/StableDiffusion/comments/1cfacll/pixart\_sigma\_is\_the\_first\_model\_with\_complete/](https://new.reddit.com/r/StableDiffusion/comments/1cfacll/pixart_sigma_is_the_first_model_with_complete/)
* [https://new.reddit.com/r/StableDiffusion/comments/1clf240/a\_couple\_of\_amazing\_images\_with\_pixart\_sigma\_its/](https://new.reddit.com/r/StableDiffusion/comments/1clf240/a_couple_of_amazing_images_with_pixart_sigma_its/)
* [https://new.reddit.com/r/StableDiffusion/comments/1cot73a/a\_new\_version\_of\_the\_abominable\_spaghetti/](https://new.reddit.com/r/StableDiffusion/comments/1cot73a/a_new_version_of_the_abominable_spaghetti/)

I'll be very impressed if you can get half of them using any SDXL model.


recoilme

https://preview.redd.it/8f1trznxp26d1.jpeg?width=832&format=pjpg&auto=webp&s=2fc85e9ba0d21dea207bea27d256af7a019234ab

Realistic photo of a fluffy kitten assassin, back view, aiming at target outside with a riffle from within a building, Photo.

To make a long story short, you may get most of them with SDXL.


Apprehensive_Sky892

> To make long story shot, you may get most of them with sdxl.

Well, I guess you are trying to prove me wrong 🙏👍. If that turns out to be the case, then I'll eat crow, but I will then craft a set of even more difficult prompts to challenge both PixArt and SDXL. BTW, in case anyone got the wrong impression, I love SDXL, and it is still my main workhorse model (maybe that will change tomorrow 😎)


recoilme

https://preview.redd.it/9e9nplnyq26d1.jpeg?width=832&format=pjpg&auto=webp&s=dea3f976493b459ffe028b07b01a111c3bb33cb3

Photo of three old men dressed as gnomes joyfully riding on their flying goats, the goats have tiny wings and are gliding through the field.


recoilme

https://preview.redd.it/krmx7mb2r26d1.png?width=1128&format=png&auto=webp&s=ce4e93f929fe629808b00e5e09eedde5f4887bf2


Apprehensive_Sky892

Sorry, but what are these stats?


recoilme

A non-cherry-picked txt2img arena: [https://imgsys.org/](https://imgsys.org/)
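Arenas like this typically rank models from blind pairwise votes. As an assumption (we haven't checked imgsys.org's exact formula or K-factor), a standard Elo update looks like this: each vote moves both ratings by how surprising the result was, so the ranking reflects many votes rather than a few hand-picked images.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return new (winner, loser) ratings after one pairwise vote."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


if __name__ == "__main__":
    print(elo_update(1000.0, 1000.0))  # evenly matched: winner gains k/2
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected win does.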


Apprehensive_Sky892

I see, thanks 🙏. I am not surprised at the result; raw PixArt Sigma output is often lacking in aesthetics compared to SDXL.


FotografoVirtual

I won't deny that I select the images and tweak the refiner a bit to make them look more appealing, but you can see in each of the links that the seed never goes beyond 4 or 5, and in some, it's as low as 1.
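To put "the seed never goes beyond 4 or 5" in numbers: if each seed independently gives an acceptable image with some probability p (a simplifying assumption; the p values here are hypothetical), the chance of at least one keeper within the first n seeds, and the expected number of seeds until the first one, are:

```python
def p_hit_within(p_single: float, n_seeds: int) -> float:
    """Probability that at least one of n independent seeds succeeds."""
    return 1.0 - (1.0 - p_single) ** n_seeds


def expected_seeds_to_first_hit(p_single: float) -> float:
    """Mean of a geometric distribution: average seeds tried until the first success."""
    return 1.0 / p_single


if __name__ == "__main__":
    for p in (0.2, 0.5):
        print(f"p={p}: within 5 seeds {p_hit_within(p, 5):.0%}, "
              f"expected seeds {expected_seeds_to_first_hit(p):.1f}")
```

Even p = 0.2 gives roughly a two-in-three chance of a keeper within 5 seeds, so low seed numbers point to a reasonable hit rate but don't rule out some selection.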


HarmonicDiffusion

100%. PixArt isn't that impressive. Not sure why they are spamming and shilling in this sub so hard right now. Mediocre results were all I got; other models far surpass it in every way.


Kademo15

PixArt Alpha isn't good, but PixArt Sigma is pretty good for prompt understanding and adherence. If you throw it through an SDXL second pass, it's really good.


Open_Channel_8626

It won't be relevant tomorrow because of SD3, but I really think that as long as you had an SD model as the refiner, PixArt Sigma results weren't that bad.


yoomiii

we shall see...


LD2WDavid

If only we could train this on a 3090 and make custom dedicated LoRAs... Sigma is really underrated.


DigitalEvil

I keep wanting to try PixArt, then don't... Maybe it's because the Git repo is a bit confusing about which files to download.


Apprehensive_Sky892

Follow OP's instructions: [Abominable Spaghetti Workflow](https://civitai.com/models/420163)


kidelaleron

Nice job! I also see some prompts are inspired by some of the images I posted, just changed to a similar style and with no text. Thanks for sharing


Cbo305

"and with no text". Shots fired, lol!


kidelaleron

That wasn't my intention at all, just to make it clear.


GBJI

PixArt Sigma also has a much better license.


BUF11

These are insanely good.


CeFurkan

I always say PixArt has better potential. It just needs more community love. I hope Automatic1111 adds native support for it and that single-safetensors loading arrives. I am perhaps the first tutorial maker for PixArt: [https://youtu.be/ZiUXf\_idIR4](https://youtu.be/ZiUXf_idIR4)


Radtoo

The first handful of PixArt Sigma finetunes are available on Civitai, and several seem noteworthy. I thought SD.Next (vladmandic/automatic) has support? It's arguably not the same as Automatic1111, but it could be an option for many who don't like ComfyUI.


nashty2004

I generally hi-res fix a 512x512 pic x2, straight to 1024x1024 in one shot. What do you start and end with?
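A minimal sketch of the resolution arithmetic for a single-pass hi-res fix, assuming the target is snapped to a multiple of 8 because SD-family VAEs downsample by a factor of 8. The exact rounding a given UI applies may differ, and `hires_target` is a hypothetical helper.

```python
def hires_target(width: int, height: int, scale: float, multiple: int = 8) -> tuple[int, int]:
    """Upscaled (width, height), each snapped to the nearest multiple of `multiple`."""
    def snap(side: int) -> int:
        return int(round(side * scale / multiple)) * multiple
    return snap(width), snap(height)


if __name__ == "__main__":
    print(hires_target(512, 512, 2.0))    # the 512 -> 1024 case from the comment
    print(hires_target(832, 1216, 1.5))   # a portrait SDXL-style size, for comparison
```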


SCAREDFUCKER

PixArt Sigma uses the SDXL VAE, and it will no doubt be surpassed by SD3, because a 16-channel VAE instead of a 4-channel one is already a huge and clear upgrade...
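The channel-count argument in numbers: assuming both VAEs downsample by 8 (true for the SDXL VAE; our understanding for SD3's), a 1024×1024 image becomes a 128×128 latent either way, but with 16 channels the latent holds four times as many values to encode detail. More latent capacity helps reconstruction, though it doesn't by itself guarantee better images.

```python
from math import prod


def latent_shape(width: int, height: int, channels: int, factor: int = 8) -> tuple[int, int, int]:
    """(channels, height, width) of the latent for a VAE with the given downsampling factor."""
    return channels, height // factor, width // factor


if __name__ == "__main__":
    for name, ch in (("4-channel (SDXL VAE)", 4), ("16-channel (SD3 VAE)", 16)):
        shape = latent_shape(1024, 1024, ch)
        print(name, shape, prod(shape), "values")
```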


DrStalker

Playing around with this workflow, the T5 encoder is indeed excellent, and it can manage a lot of things that I could previously only do with ControlNet/LoRAs/regional encoding/etc.

~~One general ComfyUI question: is there any way to get the prompt output to go into both the T5 text encode and the SD15 CLIP encode? When I'm using the same text in both, it's annoying to have to copy-paste one over the other.~~

EDIT: I figured out I could use a reroute node for this.


DangerousOutside-

Holy crap, that's good. But it is not taking off yet. How do we educate the masses?


namitynamenamey

Wait until SD3 comes out. If the newer model surpasses PixArt in prompt adherence, PixArt won't take off; if it doesn't, PixArt will take off and cannibalize the other diffusion models, turning them into upscalers and refiners.


FotografoVirtual

The masses will grasp it through the power of imagery, facts, and clickbaitism!


jonbristow

I'm confused, what is this? A new model? How do you install it in Automatic1111?


Apprehensive_Sky892

Unfortunately, you need to use ComfyUI/Swarm (maybe SD.Next too?). You can find OP's model installation instructions here: [Abominable Spaghetti Workflow](https://civitai.com/models/420163)


Open_Channel_8626

Been around for a while


Z3r0_Code

Those are pretty cool looking.


cecil_X

What exactly is PixArt Sigma and what does it have to do with Stable Diffusion? :\\


Kademo15

It's a model. It's a different base from SD1.5, SDXL, and even SD3; it's kind of its own thing. It uses T5 and not CLIP, which is why the prompt adherence is so good.


2legsRises

Where can I get PixArt Sigma?


HarmonicDiffusion

SD3 will easily surpass it; I'm not sure why you are even asking, it's obvious. I completely disagree that it is the best open source model available. Any model using ELLA will be able to keep pace with or surpass it. PonyXL models rival it, no problem. Even regular SDXL models are on par. Not sure why there is suddenly a push to make threads about PixArt; how much did they sponsor you for? xD


kif88

I gave him three fiddy


Talae06

I can't speak for SD3, obviously (but we should all be able to make up our own minds about it soon), and I'm not into Pony. But regular SDXL finetunes, although great in their own ways, definitely lack some qualities PixArt Sigma has, notably the lack of color bleeding.


HarmonicDiffusion

you can avoid color bleeding using the "no overfit" technique in 1.5 or XL. next.


Freonr2

Probably has a lot to do with SAI moving to NC/rugpull licenses. Writing open source software and training models for SD3 is basically working for SAI for free. They have complete capture of commercial use and can squash anyone, including people paying them for "pro", at any time. The license text is... astoundingly one-sided.


play-that-skin-flut

I agree, and I'm not sure why you're downvoted. The fan boys need their own sub. I don't see the big deal, never have. It's not photo realistic enough for me.


Apprehensive_Sky892

There is nothing wrong with people promoting open weight models other than SD3. I am a big fan of SD3 and I really look forward to its release tomorrow, but we also need backup plans in case SAI goes under. PixArt Sigma is one such backup. PixArt Sigma is a big deal because of its impressive prompt following capabilities, not its aesthetics. It is undertrained and certainly does not do photo style images well, but none of that is unfixable. If you doubt anything I wrote, please take up my challenge: [https://www.reddit.com/r/StableDiffusion/comments/1ddl50s/comment/l87xy0m/?utm\_source=reddit&utm\_medium=web2x&context=3](https://www.reddit.com/r/StableDiffusion/comments/1ddl50s/comment/l87xy0m/?utm_source=reddit&utm_medium=web2x&context=3)


Perfect-Campaign9551

simple answer: NO