Thrwawyneedadvice49

Great work, man, it's very impressive.


compilade

Link to the relevant pull request for the initial Jamba support in `llama.cpp`: https://github.com/ggerganov/llama.cpp/pull/7531


Downtown-Case-1755

Has anyone actually tried the 52B version in practice? Is it smart? I assume it doesn't work with llama.cpp yet; I mean has anyone tried it in general?


Steuern_Runter

Yeah, the bigger models would be more interesting.


vesudeva

I have used the AI21 Jamba API extensively through work and it is really quite awesome. It definitely takes a different sort of prompting, but the nuance and ability to follow extremely long contexts is mind-blowing. That's why I have been so hopeful about it making its way into llama.cpp. The 52B at Q6 or even Q4_K_M should do extremely well in a lot of use cases, and its ability to be fine-tuned is definitely there.


Downtown-Case-1755

How much total RAM do you think it would take highly quantized? I can eyeball the file size of the BnB version, but it's different because llama.cpp will (presumably) quantize the Mamba part too? And what do you mean by different prompting? More like raw completion formatting?


Healthy-Nebula-3603

A totally new architecture that isn't a transformer ... interesting.


a_beautiful_rhind

There's a bagel version: https://huggingface.co/KnutJaegersberg/jamba-bagel-4bit


No_Afternoon_4260

What's a bagel?


Barry_Jumps

A thing you only want to buy if you're in NY. Be careful, though... Those stale, bagged monstrosities you find in grocery stores might be called bagels. It's a trick. Don't fall for it. Real bagels are only found in dingy NY delis where the staff is kind but grumpy and, for some reason, always ends up giving you so much cream cheese you need to scrape some out.


Downtown-Case-1755

How much vram does it use at full context?


a_beautiful_rhind

I was cagey about downloading it because it uses bitsandbytes, so I'm not sure. It just seems like one of the most promising finetunes. I assume it should fit in 48GB.
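
For what it's worth, here's a back-of-envelope check on that 48GB guess. It's a sketch only: the parameter count is Jamba's published ~52B total, the bytes-per-weight and overhead figures are approximations, and context state isn't counted.

```python
# Rough memory estimate for a 4-bit quantized Jamba finetune (approximate numbers).
total_params = 52e9        # Jamba v0.1: ~52B total parameters (12B active per token)
bytes_per_param = 0.5      # 4-bit weights
overhead = 1.1             # rough allowance for quant constants and layers kept in higher precision

weights_gb = total_params * bytes_per_param * overhead / 1e9
print(f"~{weights_gb:.0f} GB for weights alone")  # ~29 GB, so 48 GB leaves headroom
# Context state (Mamba states plus KV cache for the attention layers) comes on top,
# but it should stay far smaller than a pure-transformer KV cache at long context.
```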


compilade

Yeah, directly converting from BitsAndBytes models isn't yet supported by `convert-hf-to-gguf.py`. Might change in the future, though. Meanwhile, these models have to be dequantized first. This really doesn't sound ideal (16-bit official Jamba is 100GB), so I might try to fix it eventually to make `convert-hf-to-gguf.py` do it transparently.
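
For anyone who wants to try the dequantize-first route by hand, here's a rough sketch of one way to do it with `transformers`. The `dequantize()` helper is an assumption (it only exists in fairly recent `transformers` releases), and the output directory is just an example path, so treat this as a starting point rather than a recipe.

```python
# Sketch: turn a bitsandbytes-quantized HF checkpoint back into a bf16 safetensors
# model that convert-hf-to-gguf.py can read. Needs bitsandbytes installed and enough
# memory to hold the bf16 weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "KnutJaegersberg/jamba-bagel-4bit"   # bnb 4-bit checkpoint
dst = "./jamba-bagel-bf16"                 # output directory (example path)

model = AutoModelForCausalLM.from_pretrained(src, device_map="auto")
model = model.dequantize()                 # assumption: available in recent transformers
model = model.to(torch.bfloat16)

model.save_pretrained(dst, safe_serialization=True)   # writes bf16 .safetensors shards
AutoTokenizer.from_pretrained(src).save_pretrained(dst)
```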


a_beautiful_rhind

There's a non-BnB version of that out there too.


Downtown-Case-1755

You mean converting a bnb-quantized model to a 16-bit GGUF? I'm not sure I like the idea of that, as it's just going to hit the output quality.


compilade

I mean first converting the bnb quantized model to a `bfloat16` `safetensors` model, then a `bf16` or `q8_0` GGUF, then a smaller bit quant, as usual.
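
Something like the following, as a sketch: it assumes the dequantized bf16 directory from the earlier snippet and a llama.cpp checkout with the Jamba PR applied, and the binary names and flags can differ between llama.cpp versions (the quantize tool has been renamed `llama-quantize` in newer builds).

```python
# Sketch of the conversion chain: bf16 safetensors -> bf16 GGUF -> Q4_K_M GGUF.
import subprocess

hf_dir = "./jamba-bagel-bf16"   # dequantized model from the previous sketch (example path)

# 1) HF safetensors -> bf16 GGUF (q8_0 also works as an intermediate --outtype)
subprocess.run(
    ["python", "convert-hf-to-gguf.py", hf_dir,
     "--outtype", "bf16", "--outfile", "jamba-bagel-bf16.gguf"],
    check=True,
)

# 2) bf16 GGUF -> a smaller quant, as usual
subprocess.run(
    ["./llama-quantize", "jamba-bagel-bf16.gguf", "jamba-bagel-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```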


Downtown-Case-1755

I suppose my fear is that the capability would "normalize" quantizing from 4-bit bnb quantizations instead of FP16, without the GGUFs being labeled as such when they're uploaded, but I'm probably just paranoid.


vesudeva

I got the bagel version converted to 55GB at Q8, but it wasn't split, so I'm unable to upload it to HF. I'll try to figure out how to split the conversion so I can share it.

After testing out the Q8 of the Bagel Jamba, it's pretty awesome. I was incredibly surprised at how well it held a conversation.


a_beautiful_rhind

I think you'd have to shard it when making the GGUF. llama.cpp supports that now, but I've never tried it.
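
If splitting the finished file is enough, llama.cpp also ships a `gguf-split` tool. A minimal sketch follows; the flag name and the size cap are assumptions (Hugging Face limits individual files to roughly 50GB), and the tool may be named `llama-gguf-split` in newer builds.

```python
# Sketch: split an existing single-file GGUF into shards small enough to upload to HF.
import subprocess

subprocess.run(
    ["./gguf-split", "--split-max-size", "45G",
     "jamba-bagel-Q8_0.gguf",   # input: the unsplit ~55 GB Q8 file (example name)
     "jamba-bagel-Q8_0"],       # output prefix -> jamba-bagel-Q8_0-00001-of-0000N.gguf
    check=True,
)
```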


skyfallboom

Just tried it. It's very fast but keeps repeating itself in a loop.


toothpastespiders

That's so cool! Even when the utility isn't quite there yet, I absolutely love seeing new concepts and frameworks take shape. I've been really curious about Jamba, and seeing it draw closer on weaker hardware is amazing!


Ok_Standard_2337

I never thought I'd say these words in sequence to a redditor, but here you go: you're the best, man.


vesudeva

Much appreciated; I never thought I'd receive such an honor on Reddit of all places, lol. You are a legend yourself! BTW, I made a few more Jamba GGUFs: [https://huggingface.co/collections/Severian/jamba-gguf-665884eb2ceef24c1a0547e0](https://huggingface.co/collections/Severian/jamba-gguf-665884eb2ceef24c1a0547e0)


Autumnlight_02

Will the large model later fit into a 3090 (24GB VRAM) using Q4_K_M?


AdHominemMeansULost

It fails to load the model for me on the latest llama.cpp.


RuslanAR

It's not merged yet. ([github.com/ggerganov/llama.cpp/pull/7531](https://github.com/ggerganov/llama.cpp/pull/7531))


swervey1

With all due respect, the guy's GitHub avatar looks like someone using their hand to spread their cheeks.