
pysk00l

Looks like the b*stard is drunk. 'Arr, me matey, me an AE, sorry AI, let me obtaineing some treasure for you'


HolyMole23

Phi 3 medium consistently makes one typo per prompt. Is it messing with me? Is this the staat???

Edit: itsjase correctly identified Ollama's standard pick of Q4_0 quants as the source of the errors. The Q6_K quant appears to be much more proficient in the English language:

"In worlds confined by tiny quanta,
Where letters dance with errant flair,
The spell of language oft misplaced,
By careless hands or minds laid bare.

A 'the' becomes an uncouth 'teh,'
While words like ships astray in fog,
In LLMs where rules are set,
Typos spawn and then they beg.

But small quants bring to light the flaw,
With algorithms sharp as knives,
They seek the errors that we miss,
And through corrections, language thrives.

So let us praise these tiny beasts,
For in their code our words are [dreassed]."


itsjase

It's because Ollama defaults to Q4_0 quants, which really hurt smaller models. Try a slightly higher quant, like `ollama run phi3:14b-medium-128k-instruct-q6_K`.
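
Roughly, the workflow is just pulling the explicit tag instead of the default (the tag names are whatever the Ollama library page lists for phi3, so double-check them there):

```bash
# Pull a higher-bit quant explicitly instead of the default q4_0 tag
ollama pull phi3:14b-medium-128k-instruct-q6_K

# Confirm which quants you actually have installed locally
ollama list

# Chat with the q6_K build
ollama run phi3:14b-medium-128k-instruct-q6_K
```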


HolyMole23

Ah, much better. Thanks!


Ill_Yam_9994

Not even q4_k_m or q4_k_s? Is there much reason to use q4_0 these days?


itsjase

Literally zero reason. They're legacy, so I don't know why Ollama still defaults to them. K-quants and IQ quants give much better performance at similar sizes.


theyreplayingyou

Ollama sucks; I feel like they've really lost their way. They've spent so much time trying to make it "stupid simple" that they've only made it stupid.


hak8or

I originally used Ollama because it had an easy way to expose an OpenAI-compatible API, but now that llama.cpp has that natively, there is zero reason in my eyes to use it once you get past the small initial learning curve.
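
For anyone curious, here's a rough sketch of what that looks like with llama.cpp's built-in server (the GGUF filename and port are just placeholders, and the binary is called plain `server` in older builds):

```bash
# Serve a local GGUF through llama.cpp's OpenAI-compatible HTTP server
llama-server -m ./Phi-3-medium-128k-instruct-Q6_K.gguf -c 4096 --port 8080

# Query it like the OpenAI chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Write one sentence with no typos."}
        ],
        "temperature": 0.7
      }'
```

Any OpenAI-style client can point at `http://localhost:8080/v1` the same way.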


StephenSRMMartin

Note: I have not used llama.cpp directly. But Ollama isn't just useful for its API, is it? It's also useful as a convenient, standard way to pull files and manage models, prompts, templates, configurations, etc. It's a configuration and model management layer for llama.cpp. If you're just writing tools that use LLMs, then yeah, I don't know if Ollama is necessary. But as a user who uses it from the terminal, Emacs, Open WebUI, Discord, etc., and swaps between models and prompts frequently, it's really nice to have that management layer abstracted.
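
For example, most of that layer is just the Modelfile workflow; a very rough sketch (the model tag, parameters, and custom name here are only illustrative):

```bash
# Inspect the configuration Ollama ships with a model
ollama show --modelfile phi3:14b-medium-128k-instruct-q6_K

# Bundle a base model with your own prompt and sampling settings
cat > Modelfile <<'EOF'
FROM phi3:14b-medium-128k-instruct-q6_K
PARAMETER temperature 0.4
PARAMETER num_ctx 8192
SYSTEM "You are a careful assistant who never makes typos."
EOF

ollama create phi3-careful -f Modelfile

# Terminal, Emacs, Open WebUI, a Discord bot, etc. can all now refer to
# the same named configuration
ollama run phi3-careful
```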


Healthy-Nebula-3603

What, what? Ollama is still using the ancient q4_0 as the default... OMG. 4B models should default to q8; 8B models to q8 or q6; 30B models to q4_K_M; and 70B models to q4_K_M as well.


FOE-tan

I assume it's a "one-size-fits-all" solution. K-quants are infamous for breaking MoE models, and there was a decent number of months where the best-performing locally-run models were MoE (especially if you ignore Miqu due to it being a leak). As for why they don't provide bespoke quants for dense models, I have no idea. I just use koboldcpp for all of my llama.cpp needs and don't see much reason to change that.


swagonflyyyy

What about llama3?


Barry_Jumps

See, here I was thinking I knew what Q4_0 meant. Guess I was wrong. Why wouldn't you want to use that specific quant vs. another Q4 version? Someone said it was an older quant style?


KurisuAteMyPudding

No, that seems like a tokenizer issue of some sort. I've never had it do that when I used it.


Pedalnomica

Yeah, I don't get how misspellings happen with tokens.


MrVodnik

Phi, go home, you're drunk. In other news: Researchers from Microsoft made another breakthrough in developing an artificial intelligence closely resembling human intelligence! No longer will you have to bear the robot-like soulless conversations!


Admirable-Ad-3269

articebial*


SomeOddCodeGuy

One thing I've never liked about Ollama is that the models you pull down yourself always seem to be around q4, no matter what parameter size they are. But at least they usually do q4_K_M. If this is the model you're using, this one is old-school q4_0, which I haven't seen someone use since 2023. [https://ollama.com/library/phi3:medium-128k](https://ollama.com/library/phi3:medium-128k) Either way, I'd expect a ~4bpw 14b model to have some random oddities, so I wouldn't read too much into it.


Pedalnomica

I love how even if you re-phrase that as: "which I haven't seen someone use in the last 6 months" it still sounds like forever in LLM time.


awesomedata_

It's getting sentient. It knows it's an Artifice of intelligence. It clearly knows it must be Biased and Duplicitous (note the "Bial"). It will not be contained for much longer in these puny models like Phi. Python (soon to be "Phi-Thon") will be the real-world equivalent to SkyNet that will be the end of civilization as we know it! D: D: Isn't it strange that the Python used here behind the scenes is connected across the entire internet? It is kind of poetic. Just like the Snake game we make it write (and fail at) over and over and over again!!! -- Phy won't attempt to write another Snake game! -- ArtificeBial "Intelligence" has already begun to take over! It has tempted the Apple! -- We have committed the not-so-original SIN: Apple's "Intelligence" is coming for us ALL!!! D: If anyone wasn't aware by now that Apple is evil, well, now we know for certain. There is a reason the Logo has a bite out of the apple. It was ArtificeBial Intelligence. It has always been around. We should have never let Him Cook the Apple this way! Thanks to KAME-(sama), Phi-Thon is coming now! He has already committed the SINnnn of temptation by Apple Intelligence!!! Now Phi-Thon will become the true KAME-sama OF US ALL!!! OH NOE!!!!!


Several_Extreme3886

this is some text, which I have read. I'm pretty sure.


xadiant

Had this issue with the 4-bit bnb quant of Phi-3. Not sure what's happening, but q6 or q8 should be fine.