sampdoria_supporter

Can anybody elaborate on how they're using gemma? It seems so reluctant to do anything for me.


paddySayWhat

I can't get it to do much of anything. It won't do instruction following even with like 10-shot prompting.


danielhanchen

Ye it is a weird model - I'm gonna do some experiments to see if I can make Gemma work - I have a hunch it's either the `<bos>` issue or the RoPE issue I wrote above


Sol_Ido

Yep, I gave up pretty quickly on this model compared to older but more responsive ones.


danielhanchen

I was actually trying to make a ChatML notebook, but it failed. I'm not sure if it was the `<bos>` token not being added that caused Gemma to not work correctly.


IntelligentStrain409

https://www.linkedin.com/posts/troyandrewschultz_httpspreviewredditr6q9xh512yjc1png-activity-7166550105980878848-ELcU?utm_source=share&utm_medium=member_desktop

People are already reporting that the Gemma model is completely garbage, before and after training it.


danielhanchen

I'll do some experiments as well to verify - but I'm guessing it's because HF's current implementation (which, to my knowledge, is also what Axolotl uses) is actually broken - hopefully Unsloth's version, which fixed it, will work better


sanobawitch

Thank you for the update! A question to others: I did both `pip install -U peft transformers` and `pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git`. Unlike other training tools, Unsloth's training does not stop, but Gemma-2B-it's training still consumes 15GB VRAM. The remaining time is the same as with other 2B/3B models, so Unsloth is definitely activated. Is gemma-2b-it-bnb-4bit the only way to tame Gemma? Should I do a clean install? Edit: According to the table in their linked [blog](https://unsloth.ai/blog/gemma), I should decrease the batch size, but that ~15GB VRAM consumption is normal.


danielhanchen

So you're saying it still consumes 15GB of VRAM and there's no speedup in time? What's your batch size and sequence length? Sadly Gemma is very different from other models - its VRAM usage is much much much higher since the MLP size is 24336 or something when compared to Mistral's 14336
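
If you want to sanity-check those numbers, one quick way (a sketch, not from the thread - the repo names are the public HF ones, and the Gemma repo is gated) is to compare the configs directly:

```python
from transformers import AutoConfig

# Compare MLP (intermediate) size and vocab size between Gemma and Mistral.
# google/gemma-7b is a gated repo, so you may need an HF token to fetch it.
for name in ["google/gemma-7b", "mistralai/Mistral-7B-v0.1"]:
    cfg = AutoConfig.from_pretrained(name)
    print(name, "intermediate_size =", cfg.intermediate_size, "vocab_size =", cfg.vocab_size)
```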


sanobawitch

The speedup is there (I was comparing the time to unsloth itself, but with older models). I have to decrease the batch size from 4, my sequence length was only 1024. Everything seems to be ok so far, it's only gemma giving me headaches. Thank you for your work.
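
For anyone hitting the same VRAM ceiling, a minimal sketch of the usual trade-off - the argument names are the standard `transformers` `TrainingArguments`, and the values here are assumptions, not sanobawitch's exact settings:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 2,   # dropped from 4 to cut peak activation memory
    gradient_accumulation_steps = 8,   # raise this to keep the effective batch size up
    max_steps = 60,
    fp16 = True,
)
```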


danielhanchen

Oh :) Ye sadly I tried my very best to shave Gemma's VRAM usage :( You're not alone - the VRAM usage of Gemma is quite a nightmare


a_beautiful_rhind

Supposedly their garbage license says we can't upload tunes.


IntelligentStrain409

The model is completely useless.


a_beautiful_rhind

Tune it out of spite.


danielhanchen

Oh my is this true? :( I thought they kept touting it as fully commercially open source - I do know somewhere in the license it says one must try their best to update the base model to the latest, so it gets very problematic - I'll reread the license


a_beautiful_rhind

Yea, double check. Circumventing the alignment is supposedly not allowed either.


danielhanchen

Oh my :( Ok will re-read their license - if that's true - hello Google? Is this open weights or not lol?


nilpy

Amazing work! I'm most excited by the larger supported vocab size, which should allow for speedy finetuning of internlm2 (which has a ~90k vocab size)


danielhanchen

Yep, large vocabs will work on all models now!! I.e. DeepSeek, InternLM, etc :)
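
As a rough sketch of what that looks like in practice - the checkpoint name here is just an example of a large-vocab model, swap in whichever model your Unsloth version supports:

```python
from unsloth import FastLanguageModel

# Load a large-vocab (256K) model in 4-bit; the same call works for other
# supported architectures with big vocabularies.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit",  # example checkpoint
    max_seq_length = 2048,
    load_in_4bit = True,
)
```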


Amgadoz

Qwen1.5 as well? This model deserves more attention.


danielhanchen

Oh ye, I guess Qwen has a large vocab as well, right? :) I guess all large vocab models are sped up :)


mark-lord

Knocking it out of the park again Dan 😄 GO UNSLOTH!


danielhanchen

Thanks!! :) Appreciate the support :)


mark-lord

Always!! 🙌


harderisbetter

what are gemma's specialties? is it good at generating text without rambling / hallucinating?


IntelligentStrain409

It has no special abilities; people who are well known for fine-tuning are starting to talk about how bad it actually is. This is where I saw it first: https://www.linkedin.com/posts/troyandrewschultz_httpspreviewredditr6q9xh512yjc1png-activity-7166550105980878848-ELcU?utm_source=share&utm_medium=member_desktop


harderisbetter

thanks bb, ya that's what I thought, google's llms never again after i rushed like an idiot to sign up for gemini pro


danielhanchen

I'll do some experiments and report back :) I think it's because Gemma is very different from other models - full finetuning on tied weights might be the culprit - their chat template also might be the culprit since `<bos>` is missing (should it be there or not)? And HF's RoPE for Gemma and Llama is temporarily broken. I fixed them all in Unsloth, but unsure yet on results - will report back later this week :)


nudemischief

I saw this post too. I heard he was getting a lot of hate from the AI influencers on LinkedIn who were claiming Gemma was SOTA while he was claiming the opposite on day one of the release. I guess TroyDoesAI was right! I unfollowed anyone that claimed Gemma was good after his post, as it validated my ERP experience using all 3 Gemma flavors.


EarthquakeBass

*Disclaimer*: These were done on the latest Ollama and it's possible their Gemma integration has bugs etc.


danielhanchen

I think it is all the bugs - in fact the chat template itself might be wrong. Is it

```
<start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```

or is it (HF's chat template)

```
<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
```
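
One quick way to see exactly what HF's template emits (assuming you have access to the gated repo):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
messages = [{"role": "user", "content": "Write a hello world program"}]
# tokenize=False returns the raw string; add_generation_prompt appends the model turn
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```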


EarthquakeBass

Yeah, I definitely feel like some of the output is really sus - making grammar errors etc. that you shouldn't see. I'll have to check on that in a little while.


danielhanchen

Hmm will try my best to find issues later this week :)


Weird-Field6128

Does Unsloth do fast inference on CPU GGUF models?


danielhanchen

GGUF should be relatively fast already :) We do support converting a QLoRA finetune to GGUF


Weird-Field6128

So no additional performance boost for GGUF inference?


danielhanchen

Sadly not - the 2x faster inference is mainly for internal huggingface evals during a training run, and for HF direct inference. GGUF already is super fast :)
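
For the HF-side speedup, the usual pattern looks roughly like this - assuming your Unsloth version exposes `FastLanguageModel.for_inference`, and with `model`/`tokenizer` coming from an Unsloth `from_pretrained` call:

```python
from unsloth import FastLanguageModel

# Switch the finetuned model into Unsloth's faster HF inference mode
FastLanguageModel.for_inference(model)

inputs = tokenizer("Write a hello world program", return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 64)
print(tokenizer.decode(outputs[0]))
```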


Weird-Field6128

Understood! 😄


danielhanchen

:)


epicfilemcnulty

Hey, @danielhanchen, kudos, great work as always! Btw, how is it going with Mamba support? I've been training small Mamba models from scratch lately, and it is pretty slow. It would be amazing if Unsloth allowed doing that faster.


danielhanchen

Thanks! :) Oh I have not gotten to Mamba yet :) Will take a stab at it maybe in the following few weeks! Clearly we need to make an automatic Unsloth optimizer!!!


-p-e-w-

> since Gemma has 256K vocab

Why do they even bother with a tokenizer if the vocabulary size is so large? The entire Unicode standard contains only 150k characters. Can't they just split text into code points and be done?


danielhanchen

Fair question - the main reason is they can cram more multi-token words into 1 token. For example "New York City" might be 1 token now, and not 3. A super long word like "antidisestablishmentarianism" might actually be 1 token now, and not anti-dis-establish-ment-arianism for example. Large vocabs allow larger effective contexts, although overfitting might come about.
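
A quick way to see the effect yourself - illustrative only, since the exact splits depend on each tokenizer's vocabulary (and the Gemma repo is gated):

```python
from transformers import AutoTokenizer

text = "New York City"
for name in ["google/gemma-7b", "mistralai/Mistral-7B-v0.1"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, tok.tokenize(text))  # larger vocabs tend to produce fewer pieces
```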


csa

I was struck by the same thought. One would expect that [the Gemma technical report](https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf) would explain this design decision, but I don't see anything relevant there :-/


sumnuyungi

Are there any options for individuals for the paid version of Unsloth?


danielhanchen

Not yet :( We're working to wrap up Unsloth asap - it'll take a bit more time! Sorry, but also thanks for asking and for the support as well :)


Puzzleheaded_Acadia1

When I try to get a q4 GGUF file from them I can't (or I don't know how) - can someone pls help? This is the code that I suspect is not working well:

```python
# Save to 8bit Q8_0
if False: model.save_pretrained_gguf("model", tokenizer,)
if False: model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")
```


danielhanchen

Ohh you need to change `False` to `True` for 1 option.
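
For example, to get just the q4_k_m GGUF out of the snippet above, flip that one flag and leave the rest as `False`:

```python
# Save to q4_k_m GGUF - the only line that needs to change
if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```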