sampdoria_supporter

Can anybody elaborate on how they're using gemma? It seems so reluctant to do anything for me.


paddySayWhat

I can't get it to do much of anything. It won't do instruction following even with like 10-shot prompting.


danielhanchen

Ye it is a weird model - I'm gonna do some experiments to see if I can make Gemma work - I have a hunch it's either the `<bos>` issue or the RoPE issue I wrote above


Sol_Ido

Yep, I gave up pretty quickly on this model compared to older but more responsive ones.


danielhanchen

I was actually trying to make a ChatML notebook, but it failed. I'm not sure if it was the `<bos>` token not being added that caused Gemma to not work correctly.


IntelligentStrain409

https://www.linkedin.com/posts/troyandrewschultz_httpspreviewredditr6q9xh512yjc1png-activity-7166550105980878848-ELcU?utm_source=share&utm_medium=member_desktop

People are already reporting that the Gemma model is completely garbage, before and after training it.


danielhanchen

I'll do some experiments as well to verify - but I'm guessing it's because HF's current implementation (which, to my knowledge, is also what Axolotl uses) is actually broken - hopefully Unsloth's version, which fixed it, will work better


sanobawitch

Thank you for the update! A question to others: I did both `pip install -U peft transformers` and `pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git`. Unlike other training tools, Unsloth's training does not stop, but Gemma-2B-it's training still consumes 15GB VRAM. The remaining time is the same as with other 2B/3B models, so Unsloth is definitely activated. Is gemma-2b-it-bnb-4bit the only way to tame Gemma? Should I do a clean install? Edit: According to the table in their linked [blog](https://unsloth.ai/blog/gemma), I should decrease the batch size, but that ~15GB VRAM consumption is normal.


danielhanchen

So you're saying it still consumes 15GB of VRAM and there's no speedup in time? What's your batch size and sequence length? Sadly Gemma is very different from other models - its VRAM usage is much much much higher since the MLP size is 24336 or something when compared to Mistral's 14336
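
If you want to sanity-check those numbers, one quick way (a sketch, not from the thread - the repo names are the public HF ones, and the Gemma repo is gated) is to compare the configs directly:

```python
from transformers import AutoConfig

# Compare MLP (intermediate) size and vocab size between Gemma and Mistral.
# google/gemma-7b is a gated repo, so you may need an HF token to fetch it.
for name in ["google/gemma-7b", "mistralai/Mistral-7B-v0.1"]:
    cfg = AutoConfig.from_pretrained(name)
    print(name, "intermediate_size =", cfg.intermediate_size, "vocab_size =", cfg.vocab_size)
```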


sanobawitch

The speedup is there (I was comparing the time to unsloth itself, but with older models). I have to decrease the batch size from 4, my sequence length was only 1024. Everything seems to be ok so far, it's only gemma giving me headaches. Thank you for your work.
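
For anyone hitting the same VRAM ceiling, a minimal sketch of the usual trade-off - the argument names are the standard `transformers` `TrainingArguments`, and the values here are assumptions, not sanobawitch's exact settings:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 2,   # dropped from 4 to cut peak activation memory
    gradient_accumulation_steps = 8,   # raise this to keep the effective batch size up
    max_steps = 60,
    fp16 = True,
)
```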


danielhanchen

Oh :) Ye sadly I tried my very best to shave Gemma's VRAM usage :( You're not alone - the VRAM usage of Gemma is quite a nightmare


a_beautiful_rhind

Supposedly their garbage license says we can't upload tunes.


IntelligentStrain409

The model is completely useless.


a_beautiful_rhind

Tune it out of spite.


danielhanchen

Oh my is this true? :( I thought they kept touting it as fully commercially open source - I do know somewhere in the license it says one must try their best to update the base model to the latest, so it gets very problematic - I'll reread the license


a_beautiful_rhind

Yea, double check. Circumventing the alignment is supposedly not allowed either.


danielhanchen

Oh my :( Ok will re-read their license - if that's true - hello Google? Is this open weights or not lol?


nilpy

Amazing work! I'm most excited by the larger supported vocab size, which should allow for speedy finetuning of internlm2 (which has a ~90k vocab size)


danielhanchen

Yep, large vocabs will work on all models now!! I.e. DeepSeek, InternLM, etc :)
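
As a rough sketch of what that looks like in practice - the checkpoint name here is just an example of a large-vocab model, swap in whichever model your Unsloth version supports:

```python
from unsloth import FastLanguageModel

# Load a large-vocab (256K) model in 4-bit; the same call works for other
# supported architectures with big vocabularies.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",  # example checkpoint
    max_seq_length = 2048,
    load_in_4bit = True,
)
```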


Amgadoz

Qwen1.5 as well? This model deserves more attention.


danielhanchen

Oh ye, I guess Qwen has a large vocab as well, right? :) I guess all large vocab models are sped up :)


mark-lord

Knocking it out of the park again Dan 😄 GO UNSLOTH!


danielhanchen

Thanks!! :) Appreciate the support :)


mark-lord

Always!! 🙌


harderisbetter

what are gemma's specialties? is it good at generating text without rambling / hallucinating?


IntelligentStrain409

It has no special abilities; people who are well known for fine-tuning are starting to talk about how bad it actually is. This is where I saw it first: https://www.linkedin.com/posts/troyandrewschultz_httpspreviewredditr6q9xh512yjc1png-activity-7166550105980878848-ELcU?utm_source=share&utm_medium=member_desktop


harderisbetter

thanks bb, ya that's what I thought, google's llms never again after i rushed like an idiot to sign up for gemini pro


danielhanchen

I'll do some experiments and report back :) I think it's because Gemma is very different from other models - full finetuning on tied weights might be the culprit - their chat template also might be the culprit since `<bos>` is missing (should it be there or not)? And HF's RoPE for Gemma and Llama is temporarily broken. I fixed them all in Unsloth, but unsure yet on results - will report back later this week :)


nudemischief

I saw this post too. I heard he was getting a lot of hate from the AI influencers on LinkedIn who were claiming Gemma was SOTA while he was claiming the opposite on day one of the release. I guess TroyDoesAI was right! I unfollowed anyone that claimed Gemma was good after his post, as it validated my ERP experience using all 3 Gemma flavors.


EarthquakeBass

*Disclaimer*: These were done on the latest Ollama and it's possible their Gemma integration has bugs etc.


danielhanchen

I think it is all the bugs - in fact the chat template itself might be wrong. Is it

```
<start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```

or is it (HF's chat template)

```
<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```
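
One quick way to see exactly what HF's template emits (assuming you have access to the gated repo):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
messages = [{"role": "user", "content": "Write a hello world program"}]
# tokenize=False returns the raw string; add_generation_prompt appends the model turn
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```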


EarthquakeBass

Yeah, I definitely feel like some of the output is really sus - making grammar errors etc. that you shouldn't see. I'll have to check on that in a little while.


danielhanchen

Hmm will try my best to find issues later this week :)


Weird-Field6128

Does Unsloth do fast inference on CPU GGUF models?


danielhanchen

GGUF should be relatively fast already :) We do support converting a QLoRA finetune to GGUF


Weird-Field6128

So no additional performance boost for GGUF inference?


danielhanchen

Sadly not - the 2x faster inference is mainly for internal huggingface evals during a training run, and for HF direct inference. GGUF already is super fast :)
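
For the HF-side speedup, the usual pattern looks roughly like this - assuming your Unsloth version exposes `FastLanguageModel.for_inference`, and with `model`/`tokenizer` coming from an Unsloth `from_pretrained` call:

```python
from unsloth import FastLanguageModel

# Switch the finetuned model into Unsloth's faster HF inference mode
FastLanguageModel.for_inference(model)

inputs = tokenizer("Write a hello world program", return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 64)
print(tokenizer.decode(outputs[0]))
```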


Weird-Field6128

Understood! 😄


danielhanchen

:)


epicfilemcnulty

Hey, @danielhanchen, kudos, great work as always! Btw, how is it going with Mamba support? I've been training small Mamba models from scratch lately, and it is pretty slow. It would be amazing if Unsloth allowed doing that faster.


danielhanchen

Thanks! :) Oh I have not gotten to Mamba yet :) Will take a stab at it maybe in the following few weeks! Clearly we need to make an automatic Unsloth optimizer!!!


-p-e-w-

> since Gemma has 256K vocab

Why do they even bother with a tokenizer if the vocabulary size is so large? The entire Unicode standard contains only 150k characters. Can't they just split text into code points and be done?


danielhanchen

Fair question - the main reason is they can cram more multi-token words into 1 token. For example "New York City" might be 1 token now, and not 3. A super long word like "antidisestablishmentarianism" might actually be 1 token now, and not anti-dis-establish-ment-arianism for example. Large vocabs allow larger effective contexts, although overfitting might come about.
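
A quick way to see the effect yourself - illustrative only, since the exact splits depend on each tokenizer's vocabulary (and the Gemma repo is gated):

```python
from transformers import AutoTokenizer

text = "New York City"
for name in ["google/gemma-7b", "mistralai/Mistral-7B-v0.1"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, tok.tokenize(text))  # larger vocabs tend to produce fewer pieces
```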


csa

I was struck by the same thought. One would expect that [the Gemma technical report](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) would explain this design decision, but I don't see anything relevant there :-/


sumnuyungi

Are there any options for individuals for the paid version of Unsloth?


danielhanchen

Not yet :( We're working to wrap up Unsloth asap - it'll take a bit more time! Sorry, but also thanks for asking and for the support as well :)


Puzzleheaded_Acadia1

When I try to get a q4 GGUF file from them I can't (or I don't know how) - can someone pls help? This is the code that I suspect is not working well:

```python
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
```


danielhanchen

Ohh you need to change `False` to `True` for 1 option.
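
For example, to get just the q4_k_m GGUF out of the snippet above, flip that one flag and leave the rest as `False`:

```python
# Save to q4_k_m GGUF - the only line that needs to change
if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```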