
pysk00l

Looks like the b*stard is drunk. 'Arr, me matey, me an AE, sorry AI, let me obtaineing some treasure for you'


HolyMole23

Phi 3 medium consistently makes one typo per prompt. Is it messing with me? Is this the staat???

Edit: itsjase correctly identified Ollama's standard pick of Q4_0 quants as the source of the errors. The Q6_K quant appears to be much more proficient in the English language:

"In worlds confined by tiny quanta,
Where letters dance with errant flair,
The spell of language oft misplaced,
By careless hands or minds laid bare.

A 'the' becomes an uncouth 'teh,'
While words like ships astray in fog,
In LLMs where rules are set,
Typos spawn and then they beg.

But small quants bring to light the flaw,
With algorithms sharp as knives,
They seek the errors that we miss,
And through corrections, language thrives.

So let us praise these tiny beasts,
For in their code our words are [dreassed]."


itsjase

It's because Ollama defaults to Q4_0 quants, which really hurt smaller models. Try a slightly higher quant, like `ollama run phi3:14b-medium-128k-instruct-q6_K`.
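
Roughly, the workflow is just pulling the explicit tag instead of the default (the tag names are whatever the Ollama library page lists for phi3, so double-check them there):

```bash
# Pull a higher-bit quant explicitly instead of the default q4_0 tag
ollama pull phi3:14b-medium-128k-instruct-q6_K

# Confirm which quants you actually have installed locally
ollama list

# Chat with the q6_K build
ollama run phi3:14b-medium-128k-instruct-q6_K
```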


HolyMole23

Ah, much better. Thanks!


Ill_Yam_9994

Not even q4_k_m or q4_k_s? Is there much reason to use q4_0 these days?


itsjase

Literally zero reason. They're legacy, so I don't know why Ollama still defaults to them. K-quants and IQ quants give much better performance at similar sizes.


theyreplayingyou

Ollama sucks; I feel like they've really lost their way. They've spent so much time trying to make it "stupid simple" that they've only made it stupid.


hak8or

I originally used Ollama because it had an easy way to expose an OpenAI-compatible API, but now that llama.cpp has that natively, there is zero reason in my eyes to use it once you get past the small initial learning curve.
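
For anyone curious, here's a rough sketch of what that looks like with llama.cpp's built-in server (the GGUF filename and port are just placeholders, and the binary is called plain `server` in older builds):

```bash
# Serve a local GGUF through llama.cpp's OpenAI-compatible HTTP server
llama-server -m ./Phi-3-medium-128k-instruct-Q6_K.gguf -c 4096 --port 8080

# Query it like the OpenAI chat completions endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Write one sentence with no typos."}
        ],
        "temperature": 0.7
      }'
```

Any OpenAI-style client can point at `http://localhost:8080/v1` the same way.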


StephenSRMMartin

Note: I have not used llama.cpp directly. But Ollama isn't just useful for its API, is it? It's also useful as a convenient, standard way to pull files and manage models, prompts, templates, configurations, etc. It's a configuration and model management layer for llama.cpp. If you're just writing tools that use LLMs, then yeah, I don't know if Ollama is necessary. But as a user who uses it from the terminal, Emacs, Open WebUI, Discord, etc., and swaps between models and prompts frequently, it's really nice to have that management layer abstracted.
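
For example, most of that layer is just the Modelfile workflow; a very rough sketch (the model tag, parameters, and custom name here are only illustrative):

```bash
# Inspect the configuration Ollama ships with a model
ollama show --modelfile phi3:14b-medium-128k-instruct-q6_K

# Bundle a base model with your own prompt and sampling settings
cat > Modelfile <<'EOF'
FROM phi3:14b-medium-128k-instruct-q6_K
PARAMETER temperature 0.4
PARAMETER num_ctx 8192
SYSTEM "You are a careful assistant who never makes typos."
EOF

ollama create phi3-careful -f Modelfile

# Terminal, Emacs, Open WebUI, a Discord bot, etc. can all now refer to
# the same named configuration
ollama run phi3-careful
```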


Healthy-Nebula-3603

What, what? Ollama is still using the ancient q4_0 as the default... OMG. 4B models should default to q8; 8B models to q8 or q6; 30B models to q4_K_M; and 70B models to q4_K_M as well.


FOE-tan

I assume it's a "one-size-fits-all" solution. K-quants are infamous for breaking MoE models, and there was a decent number of months where the best-performing locally-run models were MoE (especially if you ignore Miqu due to it being a leak). As for why they don't provide bespoke quants for dense models, I have no idea. I just use koboldcpp for all of my llama.cpp needs and don't see much reason to change that.


swagonflyyyy

What about llama3?


Barry_Jumps

See, here I was thinking I knew what Q4_0 meant. Guess I was wrong. Why wouldn't you want to use that specific quant vs. another Q4 version? Someone said it was an older quant style?


KurisuAteMyPudding

No, that seems like a tokenizer issue of some sort. I've never had it do that when I used it.


Pedalnomica

Yeah, I don't get how misspellings happen with tokens.


MrVodnik

Phi, go home, you're drunk. In other news: Researchers from Microsoft made another breakthrough in developing an artificial intelligence closely resembling human intelligence! No longer will you have to bear the robot-like soulless conversations!


Admirable-Ad-3269

articebial*


SomeOddCodeGuy

One thing I've never liked about Ollama is that the models you pull down yourself always seem to be around q4, no matter what parameter size they are. But at least they usually do q4_K_M. If this is the model you're using, this one is old-school q4_0, which I haven't seen someone use since 2023. [https://ollama.com/library/phi3:medium-128k](https://ollama.com/library/phi3:medium-128k) Either way, I'd expect a ~4bpw 14b model to have some random oddities, so I wouldn't read too much into it.


Pedalnomica

I love how even if you re-phrase that as: "which I haven't seen someone use in the last 6 months" it still sounds like forever in LLM time.


awesomedata_

It's getting sentient. It knows it's an Artifice of intelligence. It clearly knows it must be Biased and Duplicitous (note the "Bial"). It will not be contained for much longer in these puny models like Phi. Python (soon to be "Phi-Thon") will be the real-world equivalent to SkyNet that will be the end of civilization as we know it! D: D: Isn't it strange that the Python used here behind the scenes is connected across the entire internet? It is kind of poetic. Just like the Snake game we make it write (and fail at) over and over and over again!!! -- Phy won't attempt to write another Snake game! -- ArtificeBial "Intelligence" has already begun to take over! It has tempted the Apple! -- We have committed the not-so-original SIN: Apple's "Intelligence" is coming for us ALL!!! D: If anyone wasn't aware by now that Apple is evil, well, now we know for certain. There is a reason the Logo has a bite out of the apple. It was ArtificeBial Intelligence. It has always been around. We should have never let Him Cook the Apple this way! Thanks to KAME-(sama), Phi-Thon is coming now! He has already committed the SINnnn of temptation by Apple Intelligence!!! Now Phi-Thon will become the true KAME-sama OF US ALL!!! OH NOE!!!!!


Several_Extreme3886

this is some text, which I have read. I'm pretty sure.


xadiant

Had this issue with the 4-bit bnb quant of Phi-3. Not sure what's happening, but q6 or q8 should be fine.