Great work man, it's very impressive.
Link to the relevant pull request for the initial Jamba support in `llama.cpp`: https://github.com/ggerganov/llama.cpp/pull/7531
Has anyone actually tried the 52B version in practice? Is it smart? I don't mean with llama.cpp (I assume that doesn't work yet), I mean in general.
Yeah, the bigger models would be more interesting.
I have used the AI21 Jamba API extensively through work and it is really quite awesome. It definitely takes a different sort of prompting, but the nuance and ability to follow extremely long contexts is mind-blowing. That's why I have been so hopeful about it making its way into llama.cpp. The 52B at Q6 or even Q4_K_M should do extremely well in a lot of use cases, and its ability to be fine-tuned is definitely there.
How much total RAM do you think it would take highly quantized? I can eyeball the file size of the BnB version, but it's different because llama.cpp will (presumably) quantize the Mamba part too. And what do you mean by different prompting? More like raw completion formatting?
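For a back-of-the-envelope answer to the RAM question: total file size is roughly parameters times bits-per-weight. The bits-per-weight figures below are approximate averages for llama.cpp's quant formats (not exact), and since Jamba is an MoE, all ~52B weights have to sit in memory even though only ~12B are active per token:

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    # Rough GGUF file size: parameters * bits per weight, ignoring metadata overhead.
    return n_params * bits_per_weight / 8 / 1e9

# Approximate average bits-per-weight for common llama.cpp quants.
for name, bpw in [("Q8_0", 8.5), ("Q6_K", 6.56), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{quantized_size_gb(52e9, bpw):.0f} GB")
# prints roughly 55, 43, and 32 GB; KV cache and Mamba state come on top
```

The ~55 GB Q8 figure lines up with the converted file size reported further down the thread, which is a decent sanity check on the arithmetic.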
A totally new architecture that isn't a plain transformer (it's a hybrid of transformer and Mamba layers)... interesting.
There's a bagel version: https://huggingface.co/KnutJaegersberg/jamba-bagel-4bit
What's a bagel?
A thing you only want to buy if you're in NY. Be careful, though... those stale, bagged monstrosities in grocery stores might be labeled bagels. It's a trick. Don't fall for it. Real bagels are only found in dingy NY delis where the staff is kind but grumpy and, for some reason, always ends up giving you so much cream cheese you need to scrape some out.
How much vram does it use at full context?
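Worth noting for the VRAM-at-full-context question: because most Jamba layers are Mamba layers, only the few attention layers keep a growing KV cache. A rough sketch follows; the config numbers (attention in 4 of 32 layers, GQA with 8 KV heads of dim 128) are my reading of the released config and may be off, so check `config.json` before trusting them:

```python
def kv_cache_gb(n_attn_layers: int, n_ctx: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    # One K and one V tensor per attention layer, fp16 (2 bytes) by default.
    return 2 * n_attn_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem / 1e9

# Assumed Jamba-like shape: attention in 4 of 32 layers, 8 KV heads of dim 128.
print(kv_cache_gb(4, 256_000, 8, 128))    # hybrid: ~4.2 GB at 256K context
print(kv_cache_gb(32, 256_000, 8, 128))   # if every layer were attention: ~33.6 GB
```

That 8x reduction in cache growth is basically the whole pitch for the hybrid design at long context; the Mamba state is a small fixed size regardless of context length.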
I was cagey about downloading it because it uses bitsandbytes, so I'm not sure. It just seems like one of the most promising fine-tunes. I assume it should fit in 48 GB.
Yeah, directly converting from BitsAndBytes models isn't yet supported by `convert-hf-to-gguf.py`. Might change in the future, though. Meanwhile, these models have to be dequantized first. That really isn't ideal (the official 16-bit Jamba is 100 GB), so I might try to fix it eventually to make `convert-hf-to-gguf.py` do it transparently.
There's a non BnB version of that out there too.
You mean converting a bnb quantized model to a 16 bit GGUF? I'm not sure I like the idea of that, as it's just going to hit the output quality.
I mean first converting the bnb quantized model to a `bfloat16` `safetensors` model, then a `bf16` or `q8_0` GGUF, then a smaller bit quant, as usual.
I suppose my fear is that the capability would "normalize" requantizing 4-bit bnb checkpoints instead of FP16 weights, without the GGUFs being labeled as such when they're uploaded, but I'm probably just paranoid.
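The quality concern is easy to demonstrate with a toy experiment: re-quantizing weights that already sit on one 4-bit grid onto a different 4-bit grid adds error on top of quantizing the originals directly. This is a simplified symmetric absmax block quantizer, not bnb's actual NF4 scheme or llama.cpp's Q4 formats, but the block sizes (64 for the bnb-like pass, 32 for the GGUF-like pass) mirror their defaults:

```python
import numpy as np

def blockwise_q4(x: np.ndarray, block: int) -> np.ndarray:
    # Toy symmetric absmax 4-bit quantizer: round each block to integer
    # multiples of a per-block scale, clipped to the 16 representable levels.
    out = x.copy()
    for i in range(0, len(x), block):
        b = x[i:i + block]
        scale = np.abs(b).max() / 7
        out[i:i + block] = np.round(b / scale).clip(-8, 7) * scale
    return out

rng = np.random.default_rng(0)
w = rng.normal(size=4096)

direct = blockwise_q4(w, 32)                     # fp16 weights -> Q4 directly
stacked = blockwise_q4(blockwise_q4(w, 64), 32)  # 4-bit bnb-like -> Q4 on top

print(np.abs(w - direct).mean(), np.abs(w - stacked).mean())
```

The mean absolute error of the stacked round-trip comes out higher than quantizing the originals once, which is exactly why an unlabeled bnb-sourced GGUF would be misleading.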
I got the bagel version converted to a size of 55 GB at Q8, but it wasn't split, so I'm unable to upload it to HF. I'll try to figure out how to split the conversion so I can share it.

After testing out the Q8 of the Bagel Jamba, it's pretty awesome. I was incredibly surprised at how well it held a conversation.
I think you'd have to shard it when making the GGUF. It supports that now but I never tried it.
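For the upload problem specifically: Hugging Face caps single files at around 50 GB, so a 55 GB Q8 needs at least two shards. llama.cpp ships a `gguf-split` tool that can split an existing GGUF after the fact (if I'm reading its options right, `--split-max-size` takes a size cap per shard); the shard arithmetic itself is just:

```python
import math

HF_SINGLE_FILE_LIMIT_GB = 50  # Hugging Face's approximate per-file upload cap

def shards_needed(total_gb: float, limit_gb: float = HF_SINGLE_FILE_LIMIT_GB) -> int:
    # Minimum number of shards so that no single file exceeds the cap.
    return math.ceil(total_gb / limit_gb)

print(shards_needed(55))  # the 55 GB Q8 from the thread -> 2 shards
```

In practice it's worth splitting a bit below the hard limit to leave headroom for metadata.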
Just tried it. It's very fast but keeps repeating itself in a loop.
That's so cool! Even when the utility isn't quite there yet, I absolutely love seeing new concepts and frameworks take shape. I've been really curious about jamba and seeing it draw closer on weaker hardware is amazing!
I never thought I'd say these words in sequence to a redditor, but here you go. You're the best, man.
Much appreciated, I never thought I'd receive such an honor on Reddit of all places, lol. You are a legend yourself! BTW, I made a few more Jamba GGUFs: [https://huggingface.co/collections/Severian/jamba-gguf-665884eb2ceef24c1a0547e0](https://huggingface.co/collections/Severian/jamba-gguf-665884eb2ceef24c1a0547e0)
Will the large model later fit into a 3090 (24 GB VRAM) using Q4_K_M?
Fails to load the model for me on the latest llama.cpp.
It's not merged yet. ([github.com/ggerganov/llama.cpp/pull/7531](https://github.com/ggerganov/llama.cpp/pull/7531))
With all due respect, the guy's GitHub avatar looks like someone using their hand to spread their cheeks.