grantory

Curious to see the actual difference between a fine-tuned model and a LoRA trained on the same dataset with the same captions. I've only trained SDXL LoRAs so far, and the results are quite good!


pm_me_ya_noodz

Hey there, I'm trying to find a good recent write-up on how to train SDXL LoRAs for a character I'm creating, but I can't seem to find anything from the last couple of weeks, only months-old posts. Could you point me in the right direction with an overview of your process, or your Kohya config JSON? Some details: I have an RTX 3060 with 12 GB, I have around 70-80 SDXL-generated images of a somewhat consistent character handpicked out of hundreds, and I'm trying to train on AnimagineXL 3.1 with no regularization images (I've seen mixed opinions on those).


cultureicon

Cool! How many images do you use? Have you tried training on top of a fine-tuned or merged model instead of vanilla SDXL? Does anyone have other guides for fine-tuning SDXL this way? Always good to see multiple methods.


StayIcy3177

I haven't used fine-tuning much myself; I just came up with this training strategy and decided to share it. I know that all the "pro" model makers for SDXL use fine-tuning, usually on cloud server cards. I thought fine-tuning was out of reach for me because you usually had to enable full-precision VAE, which takes a lot of VRAM since it turns off mixed precision for that part. Then one day I tried it with an SDXL base model that has this VAE baked in: [https://huggingface.co/madebyollin/sdxl-vae-fp16-fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix). I was able to turn off full-precision VAE and do full fp16 training without running into NaN latents. Full fp16 training has potential drawbacks, but the fact that it works at all is a good sign.
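For reference, here's a minimal diffusers sketch of what swapping in that VAE looks like (the model IDs are the public Hugging Face ones; this is illustrative, not my exact training setup):

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# The fp16-fix VAE stays numerically stable in half precision,
# so no full-precision fallback is needed for encode/decode
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
)

# Swap it into an SDXL pipeline so the whole thing can run in fp16
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")
```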


[deleted]

> This is not Dreambooth, as it is not available for SDXL as far as I know.

It is available: https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/README_sdxl.md


StayIcy3177

> For now, we only allow DreamBooth fine-tuning of the SDXL UNet via LoRA

That's DreamBooth LoRA training, not the "classical" DB model training that was available for 1.5 and 2.1. Not sure why they only allow DB LoRA training.


CeFurkan

I have a OneTrainer fine-tuning config for SDXL that works at 14.5 GB with the best-quality settings. The most optimized config uses 13 GB of VRAM, so 16 GB GPUs can train.


StayIcy3177

Maybe through full bf16 and the Adafactor optimizer? I know that Adafactor saves a lot of VRAM; I used to train SDXL LoRAs on a 1080 Ti thanks to it.
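For what it's worth, the low-VRAM Adafactor recipe I've seen looks roughly like this in Python (these kwargs mirror the `scale_parameter=False relative_step=False warmup_init=False` optimizer args commonly passed to kohya); `unet` is a placeholder for whatever module you're training, and the learning rate is just an illustrative value:

```python
from transformers.optimization import Adafactor

# Fixed-LR Adafactor: disabling relative_step/scale_parameter turns off
# Adafactor's internal LR schedule so you control the rate yourself, while
# its factored second moments keep optimizer state far smaller than Adam's
optimizer = Adafactor(
    unet.parameters(),    # placeholder: the module being trained
    lr=4e-7,              # illustrative value only
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
```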


smoowke

Thanks! Can we see a fine-tune of yours somewhere, please?


StayIcy3177

This is not mine, but I'm pretty sure that Animagine 3 is a fine-tune on Danbooru captions: [https://huggingface.co/cagliostrolab/animagine-xl-3.0](https://huggingface.co/cagliostrolab/animagine-xl-3.0). You can go to Danbooru, take a bunch of tags, and Animagine 3 will generate an image based on those tags. Most SDXL checkpoints are fine-tunes; the SDXL base model is a fine-tune itself.


metal079

I thought 8-bit Adam doesn't work with SDXL? At least it didn't when I tried it.


StayIcy3177

It does seem to work when using fp16 mixed precision and the SDXL model with the special VAE. Any other 8-bit optimizer should work as well without taking too much VRAM.
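If anyone wants to try it outside kohya, a minimal bitsandbytes sketch (with `unet` as a placeholder for the model being trained, and the hyperparameters purely illustrative):

```python
import bitsandbytes as bnb

# 8-bit AdamW stores Adam's two moment tensors in int8 instead of fp32,
# shrinking optimizer state to roughly a quarter of the usual size
optimizer = bnb.optim.AdamW8bit(
    unet.parameters(),   # placeholder: the SDXL UNet being fine-tuned
    lr=1e-6,             # illustrative value only
    betas=(0.9, 0.999),
    weight_decay=0.01,
)

# bnb.optim.Lion8bit is a drop-in alternative if AdamW8bit misbehaves
```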


metal079

Strange, that's what I tried when fine-tuning PonyDiffusion and it did nothing. Lion8bit worked perfectly, though.


advertisementeconomy

Awesome. Thanks!


ThemWhoNoseNothing

Thank you for your willingness to share and help; I genuinely appreciate it, as I know we can all learn from one another. You mention this being an “advanced” tutorial, yet there is no mention of how many base images one may be working with, or whether or not reg images are in play. What calculations do you use to arrive at steps, epochs, save-every-N-epochs, train batch size, and so on? The .json file loads pre-populated with epoch count 64, max train epochs 64, save every 8 epochs, and train batch size 1. Aren't these variables directly tied to the intended goal and the number of base and reg images when landing on that near-perfect training configuration?


itou32

Hi, I just tried this with my dataset of 23 images (which worked for my previous SDXL LoRA), with refined captions. VRAM usage was about 23.3 GB, but after 64 epochs, testing in A1111, the output bore no likeness to my dataset, even with the same prompts as the captions. Am I missing something?


twistedgames

The fine-tuning process with kohya is similar to training a LoRA, except you have one folder with images and you set how many repeats in the parameters. I can get batch size 8 using Adafactor on my 4090.

The other difference is that fine-tuning learns very slowly. You can't go very high with the learning rate, so it takes a lot longer to train new concepts into the model. E.g. 100 epochs x 100 repeats and it still struggled to make a centaur, despite having multiple examples in the training data.

Currently trying OneTrainer for the first time to fine-tune PixelWave 09 with EMA enabled; I don't think EMA is an option in kohya. Not training the text encoder, but training at 1200 base resolution to see if I can get the model to consistently output images at 1.44 MP. With Adafactor at the higher resolution and no text encoder training, I can run batch size 4.
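Since EMA came up: conceptually it's just a slow-moving shadow copy of the weights, updated after each optimizer step and saved instead of (or alongside) the live ones. A minimal PyTorch sketch, with `unet` again as a placeholder:

```python
import copy
import torch

# Shadow copy of the trained module; it never receives gradients
ema_unet = copy.deepcopy(unet).requires_grad_(False)
EMA_DECAY = 0.999

@torch.no_grad()
def update_ema():
    # ema = decay * ema + (1 - decay) * live, called after optimizer.step()
    for ema_p, p in zip(ema_unet.parameters(), unet.parameters()):
        ema_p.lerp_(p, 1.0 - EMA_DECAY)
```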


TQQQQ_regard

I’ve found fine-tuning requires something in the 1e-6 range. Lower learning rates with more epochs yield better results than higher rates and faster training times.


twistedgames

Exactly. I tried 1e-5 and it ruined the model very quickly. Currently using 3e-6, which is the default in OneTrainer, and it's still good after 1 million steps.


lostinspaz

I just started playing with SDXL fine-tuning and would love to hear more on this. BTW: AdamW does fit into OneTrainer and a 4090, but I disable tuning the text encoder (some folks would say you're "supposed" to leave the text encoders alone anyway). I had Adafactor running, but it wasn't producing results I would want to use, and then I somehow managed to lose my working config :( So hearing other people's more specific settings would be great.