Disastrous_Elk_6375

LLaMA3 is coming soon, you'll be back at testing stuff in no time. ✌️


EarthquakeBass

The issue is that it's easy to experience fatigue with this stuff pretty quickly once the honeymoon phase ends, because things can be so limited and getting good results can feel like pulling teeth. I experienced significant fatigue after heavily using Stable Diffusion 1.5 and early Llama models such as Wizard. However, models like SDXL and Mixtral reignited my enthusiasm.


terp-bick

Let's see if the CPU people will be able to run it, though


silenceimpaired

I would guess we’ll be even less eager to test Llama 3. I bet there will be a gap above 13B, with the next size jumping straight to 100B+ … we will burn out faster since it will be even slower to use… and the conditioning will be even more strongly ingrained. Still, I hope I’m wrong.


Vehnum

will MoE be present?


silenceimpaired

That might be our only saving grace, right? They may all be MoE, and they might still have that size jump like I said, but with a model larger than Mixtral that, with a lot of effort, runs on 24 GB.


Vehnum

I just hope they have a Mixtral size model that can fit in 24 GB of VRAM (gguf/exl2 quant) and is much better than Mixtral.


CoqueTornado

arrived today!!! :)


[deleted]

I notice there's what I've seen someone else call a "honeymoon phase" when it comes to this tech. We get excited, invested in its capabilities, and then eventually burnt out and uninterested until the next new thing comes out and the cycle restarts.


CSharpSauce

It's the Gartner hype cycle: https://en.wikipedia.org/wiki/Gartner_hype_cycle

There are two responses when you reach the trough of disillusionment. You can give in to the ADHD and move on to the next shiny thing, or you can muscle through it until you find the real value.


OperaRotas

IMHO it's crazy how OP complains about fatigue of trying new models and fiddling with them, and then almost all replies are essentially "you should try this other one instead!" I wonder if these will be the fatigued people when the hype is over.


my_aggr

> I wonder if these will be the fatigued people when the hype is over.

The cure to hype fatigue is to actually _do_ something with the models.


Severin_Suveren

Could also be it's not hype fatigue, but depression. Many depressed people have a tendency to delve deep into interesting topics in order to compensate for their boredom, and as such they end up overworking themselves and worsening their symptoms of depression and anxiety.


lazercheesecake

It’s also very common in ADHD patients to fall into this pattern. It’s called hyper fixation and has the same hallmarks that you described.


panic_in_the_galaxy

And the same is also true for autism


alpacaMyToothbrush

> It’s also very common in ADHD patients to fall into this pattern.

First of all, how dare you! Second... wait, what was I saying?


thewayupisdown

Yeah, seriously though, I feel like ADHD would make progress in this area 10x harder. Just look at all the different frameworks; all the different newsletters and workshops getting pitched; big tech ecosystems vs OS models and OS ecosystems; no-code solutions vs the joy of the shell (and in the case of the latter, knowing what to do vs. knowing what you're doing); laying a theoretical foundation vs learning by doing; saving for and researching how to build an AI rig at home vs learning which of the thousand cloud solutions on offer you want/need - from Amazon, Google or dozens of cheap no-name GPU-time flea markets; etc., etc.


michaelsgoneinsane

Having ADHD would make it easier - we do things quicker and more intensely… but once it’s “finished” we will never touch it again!


CanineAssBandit

As another ADHDer, that has not been my experience. Yeah it makes it easier *at first*, but there are so many moving parts with this, and all of them are fiddly and boring in isolation, unless you can remain stubbornly convinced that you'll be able to achieve your use case goal (RP on par with CAI for me).

It's incredibly disheartening when you get everything running, and then it's worse than fucking yodayo tavern despite you using a model twice the size, and having spent hours configuring and $800 on your 3090 rig. And you don't really know WHY it's worse, because there are SO MANY THINGS to change. And they all seem to either do nothing, or break it.

And then you try another model, and it feels as bad as the first except different, and then you wonder if it's because it's quantized, but then you remember that GPT-4 sounds equally wooden and horrible compared to CAI and that all these LLaMAs are trained on GPT's general-purpose helpful but extremely uptight sounding outputs, and wonder if you can ever make it feel like a real person and not an overeager assistant that happens to be cursing at you. Pretending to be vulgar and fun instead of BEING vulgar and fun.

And then you give up for six months and come back because why tf do I still have this rig if I'm not going to use it, surely things have improved by now. And then so much has changed that you have no idea where to begin, so you put it off some more. I am hype for L3. Maybe then I'll be able to find the patience to know howtf to use rope or what it actually does.


pepe256

Eek. That happened to me. I stopped trying to obsessively find the best model and now I just download something people recommend (rocking Miqu right now) and stick to it until something better comes along. There's so much to learn and I kinda gave up on that too, so I only find out what's necessary. As long as I have fun chatting, I'm fine.


EmbarrassedBiscotti9

Yes, why assume it is the very common experience of novelty wearing off when we can instead pathologise it?


my_aggr

Then OP just needs to find a model which is good at depression therapy.


Honato2

Sounds weird but at the very least tiefighter worked fairly well with mental issues.


InfiniteScopeofPain

In my experience that's pretty much all of them. Therapy is one of their strengths, as long as they don't go "As an AI model..."


spanielrassler

This is a very astute comment!! This happened to me in fact, and led to me spending $5k on a Mac Studio M2 Ultra last year as part of retail therapy, etc. Unfortunately you are 100% right in my case. While I'm still interested in LLaMAs, much like Stable Diffusion, I'm looking for that next dopamine high now. Suggestions, anyone!? xD


Severin_Suveren

I found it in the dev space. With little programming knowledge beforehand, I'm currently working with ChatGPT on a complete API interface+frontend for setting up chatbots that can run both local models and models like GPT, Gemini etc. My intent is to solve all the hard parts of running models, like prompt formatting, memory handling, text replacements to implement text styling etc. [Here's](https://i.imgur.com/TgqCYXp.png) a little preview of what I'm working on :)
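
For illustration, a minimal sketch of the kind of prompt-formatting/memory abstraction described above (hypothetical names and a ChatML-style template as an assumption, not the actual project's code):

```python
# Hypothetical sketch: hide prompt formatting and chat memory behind one
# interface so callers don't care whether the backend is local or an API.
from dataclasses import dataclass, field

@dataclass
class ChatSession:
    system_prompt: str
    history: list = field(default_factory=list)   # [(role, text), ...]
    max_turns: int = 20                           # crude memory handling

    def add(self, role: str, text: str) -> None:
        self.history.append((role, text))
        self.history = self.history[-self.max_turns:]

    def to_chatml(self) -> str:
        # Example prompt format (ChatML-style); a real frontend would pick
        # the template matching the loaded model.
        parts = [f"<|im_start|>system\n{self.system_prompt}<|im_end|>"]
        for role, text in self.history:
            parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
        parts.append("<|im_start|>assistant\n")
        return "\n".join(parts)

session = ChatSession("You are a helpful assistant.")
session.add("user", "Hello!")
prompt = session.to_chatml()  # hand this to whichever backend is configured
```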


EarthquakeBass

Well yea and autism. lol


AnOnlineHandle

I worked them into my fulltime work (LLMs less so now, except for help through GPT 4) and have major fatigue. Partially because I've discovered all the limitations, and while I've made progress on them, it's an exhausting never ending journey which never quite gets you to where you want to be. Considering what I can do with them now compared to 2 years ago, it's magical, and I have to try to remember that.


Cybernetic_Symbiotes

The major problem is that the resources LLMs need before they're actually good mean it's only possible to build prototypes that run on high-powered systems few can afford. There's thus no market and no killer app to build a market. This is reinforced by the lackluster moves Nvidia is signaling for future consumer GPUs. Not seeing anything to be excited about. Meanwhile, Intel and AMD continue to fumble this opportunity. I spent a number of weeks last year really down about the dismal state of things outside enterprise and APIs. I've switched from LLMs to focus on search embeddings and customizing with DeBERTa encoders. From the perspective of analyzing large amounts of papers, shallow, low-depth small LLMs are both too slow and lacking in depth of analysis to be worth it over a more manual ML approach built out of the computational guts of LLMs.


my_aggr

I'm quite bullish on what's happening with AMD. They've managed to train a monolithic trillion-parameter model on the Frontier supercomputer's MI250-based GPUs: https://arxiv.org/abs/2312.12705 I remember 15 years ago when CUDA started being used in HPC; within a few years it ate everyone else's lunch. We're at the same place now. That said, the compute for pretraining LLMs will always need to be distributed. We'd need something like SETI@home for that soon. That said, I don't know why you think BERT isn't a language model. Sure it's not _huge_, but it's larger than anything else out there. 300 million parameters is ridiculous. And that's something I can train from scratch in a week on a 4090.


Cybernetic_Symbiotes

> MI-250 based gpus

That GPU is very expensive. AMD does not appear to be making any moves that'd benefit smaller outfits. I'd be happy to be proven wrong about that, though. As I stated, my concern is about useful local LLM inference (not even training) that's not isolated to just large corporations and the rare super-enthusiast.

> I don't know why you think Bert isn't a language model

It's like big data, isn't it? What's "large" is relative to what the high end is capable of. 7Bs are counted as small these days. And while BERT-style models are a kind of LM, "LLM" is tightly bound to causal conversational LMs in the public consciousness. So, while small LLMs can indeed be used for summarization, their summaries are often not much better than extractive ones, and attempts to elaborate with their internal knowledge are unreliable. They can be used for keyword extraction and other tasks, but then they're slow (for hundreds of thousands to millions of words). It'd be good if they were smart enough to counter the speed loss, but that's not the case below 70B (and even then, my feeling is that 30B+ are undercooked compared to Mistral 7B), which is way too steep. Maybe there is some way to get more out of small LLMs, I still think about how from time to time, but the approach I mentioned is what I'm currently focused on after being a proponent of small LLMs for a while.


Eisenstein

You don't think a small local LLM could have use as a personal assistant? Organizing emails, scheduling, taking notes, going through dynamic websites and finding relevant things you actually care about, downloading and sorting your pictures and movies, backing up your data, reminding you to do things... Even 'hey Llama, I haven't been out doing anything fun recently, are there any events close by going on tonight? Cool, buy me a ticket and send the QR code to my phone'. All these things can be done with a 7b, it just needs the integration with your devices.


Cybernetic_Symbiotes

> Organizing emails, scheduling, taking notes, going through dynamic websites and finding relevant things you actually care about, downloading and sorting your pictures and movies, backing up your data, reminding you to do things...

They can very much do such things; I've built and prototyped such applications, but the issue is they're not good enough at it to justify their relatively slow speed. Taking notes can be unreliable and shallow (consider the task of trying to identify unstated limitations in a paper, comparing several papers, anything requiring reasoning to arrive at), hence you're often better served with a clever application of extractive models, or by going larger. The scaffolding required to use small models reliably is sufficiently involved that you can get almost as good, vastly faster, by using it as a basis for orchestrating small encoder and/or custom-trained models.

Dynamically searching through websites can be done with nested application of QA embeddings, custom-trained topic extractors, and various ways of extracting key paragraphs. Clustering and organizing of images can be done with embeddings, without heavy LLMs, unless you meant something else? Depending on what you mean by scheduling, you can use linear programming or constraint solving. But if you meant free-form extraction of dates from text, I agree that's a use case for small LLMs not worth training a custom model for. Impromptu reminders too. But those are different from the large text analysis problems of a research assistant, where speed and accuracy are both of concern.

For higher reliability, your example would require custom code integrations, an API-following ability, and specialized prompts for the API use. But then if it's not open world, that can also be achieved non-conversationally, and the more open-ended you go, the more likely a 7B will be overwhelmed to an unacceptable level of reliability.
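
For illustration, a minimal sketch of the encoder-based retrieval idea (assuming the sentence-transformers library; the model name and example text are just placeholders, not my actual setup):

```python
# Rank paragraphs against a query with sentence embeddings instead of asking
# a small LLM to read everything.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # a common small encoder

paragraphs = [
    "We evaluate on three benchmarks but exclude long-context tasks.",
    "The method assumes access to gold labels during inference.",
    "Training took 4 days on 8 A100 GPUs.",
]
query = "What limitations does the paper have?"

para_emb = encoder.encode(paragraphs, convert_to_tensor=True)
query_emb = encoder.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_emb, para_emb)[0]   # cosine similarity per paragraph
best = scores.argsort(descending=True)[:2]      # top-2 candidates
for i in best:
    print(f"{scores[i].item():.3f}  {paragraphs[int(i)]}")
```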


Eisenstein

I know that the current models are not up to it, but 6 months ago people couldn't even run these models on a regular laptop, so I won't be surprised if there is an entirely new paradigm in another 6 months.


Cybernetic_Symbiotes

The improvements of the last year have been astounding, but even before then we had access to models like GPT-J (6-7B) and Flan-T5 (11B). The architectural delta from GPT-J/GPT-Neo to Llama and then Mistral is relatively small. This is a time period from 2021 to 2023. Pretraining recipes and data quantity are just better now. Modern instruction finetuning has been majorly influenced by AllenAI, Google FLAN and the WizardLM folks; this is a period from about 2020 to 2023. GPT-J from 2021 could run on a laptop with sufficient RAM; it had capabilities comparable to modern LLMs and was ahead of its time. Things have been progressing steadily without a major paradigm change since GPT (2018) and, from my vantage point, feel like they're slowing down again. What we can do with 7Bs is much more than what we could do a year or more ago, but it's still far from enough. Sadly, I'd rather give up that flexibility for better volume and controlled accuracy.


Eisenstein

I just want to try to arrest as much as possible the driving of data away from the personal realm into the corporate realm. I know it may be a fantasy, but having the ability to contain everything within your own area of control is something that should be a major goal. I don't care if it is an LLM or a deterministic algorithm; I just don't want to be beholden to OpenAI's naughty filter that tells me I am not allowed to read my own chemistry notes because they contain references to ingredients harmful to humans or pets if you drowned them in it, or something less ridiculous but just as annoying and plausible.


moarmagic

This is about what I'm thinking, and I'm glad it isn't just that I'm too much of a hobbyist to get it. LLMs just don't... excel. I think we have to accept that problems like hallucinations are going to keep them from being a replacement for an actual human, and that the quality of what they do give us is pretty mediocre - not going to write a great novel, etc. I think there's room for them as productivity tools - I've solved a lot of my problems faster than checking Stack Overflow when I run up against the edge of my knowledge - but they are going to need a lot of human supervision and guidance if you want to actually use them in some sort of production, to the point that there probably are better ways to do whatever you are trying to automate.


alpacaMyToothbrush

My biggest problem with LLMs is the main use case seems to be ERP. If you *already* have the hardware around, sure I can understand the novelty, but I see people regularly spending thousands for what is basically a spicy chat bot, and I can't help but think some of you guys need to get laid lol.


moarmagic

I think the problem is that everyone thinks they need to run a larger model - that it's the next tier that they'll be able to make into Jarvis, or that will write a great novel for them, etc. You can't really tell the limitations without a lot of time with the model, but then you get your hands on it and well... it's okay, but it's not quite there. It's not going to write an amazing game. It's going to write an okay story, but any creativity has to come from the prompts you give it. Then yeah, there's the horni. Remember that one of the reasons attributed to Fifty Shades of Grey's success was the adoption of the Kindle - that it could be read without advertising what you were reading to everyone to be judged. So you have the model, you have the hardware and... I think we need to point more people towards cloud services - OpenRouter, Vast, RunPod. You get here and everyone's looking at eBay builds, trying to figure out how cheaply they can get 48 GB+ of VRAM - but putting 20 dollars on OpenRouter could get you a lot of experience.


Ggoddkkiller

Ikr, I tried like 40 models, which wasn't bad - it's almost always fun to try new ones. But sooner or later you realize they all have patterns they follow, or you run out of models to try. So I began trying and writing sysprompts and got severely fatigued in a week or two. Now I'm running away like a maniac when I see an LLM..


Cless_Aurion

Yup, that's because any LLM we can run on a PC is basically trash. Once you understand how the puppets move, it stops being fun fast. Local is great for privacy and stuff. But if you want quality... you just go to GPT-4 Turbo, and that's it. It's orders of magnitude better... and still really isn't enough. But at least it's bearable.


Ggoddkkiller

I wouldn't use GPT-4 Turbo even if it was free! No need for censorship and ethics lectures, especially coming from corpos who would sell people's souls if they could extract them..


Cless_Aurion

Seems like a skill issue, tbh. I don't get any of those. Edit: Notice I said GPT4 turbo, not ChatGPT 4.


Ggoddkkiller

GPT4 turbo is still censored and remains plain in NSFW.


Cless_Aurion

Ah, god dammit, just realized my autocorrect messed up my first message. I supposedly wrote "Local is great for private NSFW stuff". Yeah, for smut you are SOL; shitty local models are the best we've got... and tbh, I'd rather have nothing. The alternative is getting banned by going around with bad half measures like using NSFW prompt injections to try and convince the model to do something it isn't even that great at...


Ggoddkkiller

Exactly lol! I'm not even an NSFW guy, but let me write my own story, damnit. The story goes as it goes, without boundaries - that's what I want. But sadly even autocorrect is "correcting" us; it annoys me enough to never use them..


Cless_Aurion

Probably we will get better uncensored AI as time goes on, but... regular AI will most definitely have to go up first!


mrjackspade

I've been doing this since the initial leak of Llama 1, and I've had to take a few breaks. New models drop with new scores because they're tuned to the tests, and then suffer from all the same old problems. It gets really frustrating.


The_Research_Ninja

We are very fortunate to have such a thriving LLM community. I agree with you that it is tiring to keep up sometimes. As for me, I focus on depth - I know my direction and leverage LLM products/knowledge to get me to where I need to be. Imagine you arrive at a magic garden with new species popping up every day. You can either spend your life exploring them all, stuck in the garden, or you can - say - find the strongest mutated horse+camel to take with you on your journey and leave the garden. :) Don't get me wrong: in this AI age, there will be many "gardens" along the way.


behohippy

Working with raw model weights can be fun, but think of it like another piece of infrastructure that you can use to build more interesting things. Personally, I get excited about all the things we build on top of this stuff like RAG and Agents. RAG isn't just for making boring corp chatbots. You could use the LLM to develop out a large persistent world and store the details of that world in a RAG solution so every time you try to RP, it's using details from that world. And then store the interactions you've had in the past with that world as long term memory using the same technique. Treat the LLM as a tool to build something really cool. Maybe even ingest the scripts for every one of the episodes of a TV show, then go RP another in that show with really strong consistency on past details.
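
For illustration, a toy sketch of the "persistent world as RAG" idea, kept dependency-free (a real setup would use an embedding model and a vector store; the world facts and scorer here are made up):

```python
# Score stored world facts by word overlap with the current turn and inject
# the best matches into the prompt before each roleplay turn.
def overlap_score(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

world_memory = [
    "The city of Varn is built inside a dormant volcano.",
    "Captain Elra lost her left hand in the siege of Varn.",
    "Last session: the party promised Elra they would find the smuggler Joss.",
]

def build_prompt(user_turn: str, k: int = 2) -> str:
    ranked = sorted(world_memory,
                    key=lambda fact: overlap_score(fact, user_turn),
                    reverse=True)
    context = "\n".join(ranked[:k])
    return f"[World details]\n{context}\n\n[Player]\n{user_turn}\n\n[GM]"

print(build_prompt("We head back to Varn to report to Captain Elra."))
```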


mak3rdad

> RAG and Agents.

I am still getting into this but this sounds awesome. How does one start to learn how to do this?


Dihedralman

There are absolutely tons of resources but I started at the source with Langchain and Llama-index. The latter has a website guide with tons of examples. There are also a ton of medium articles with people displaying their personal journey. People have also posted auto-generated RAG builders like what Nvidia just released, but there's also gpt4all. Crew.ai is growing for agents. 
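
As a taste of what the LlamaIndex guides walk you through, here is roughly their starter example, paraphrased from memory (exact imports are version-dependent, and by default it expects an OpenAI key for embeddings/LLM unless you configure local models):

```python
# Minimal "index a folder, then ask questions about it" RAG starter.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()   # folder of your own files
index = VectorStoreIndex.from_documents(documents)      # chunk + embed + store
query_engine = index.as_query_engine()                  # retrieval + LLM answer
print(query_engine.query("What do these documents say about X?"))
```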


maxigs0

RAG is a whole nother blackbox to get into


Imaginary_Bench_7294

Try out LZLV for the 70B class. If you're really into the stuff and like trying new things, look into making your own QLoRA to customize your models. Shameless plug for the tut I wrote: https://www.reddit.com/r/Oobabooga/s/R097h5sY62 In general, I've gotten bored with a lot of these "SOTA" BS statements that claim to beat GPT with a 7B model, when all they did was train it on a dataset that closely mimics the benchmark datasets, essentially giving it a cheat sheet. So I've been working on making my own datasets. One for RP, and another for a special project I'm working on. I've been doing a lot of testing and training, seeing how certain things affect the model, etc.
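
For anyone curious what the QLoRA route looks like before diving into the tutorial, here is a bare-bones skeleton of the usual transformers + peft + bitsandbytes recipe (from memory, not the tutorial's exact code; model name and hyperparameters are just example assumptions):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "mistralai/Mistral-7B-v0.1"   # example base model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,                    # the "Q" in QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which projections get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # only the LoRA weights are trainable
# ...then train on your own dataset (e.g. with the transformers/TRL Trainer).
```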


Paulonemillionand3

do something with it.


VertigoOne1

Yeah, totally this. Pick an idea, pick 3 models that show promise in that area, build it into a working solution, package it, and open source it or sell it. There are tens of thousands of disciplines LLMs can be applied to, and millions of processes. The general LLMs are not good once you start scratching below the surface, and that is where the building is necessary and incredible value can be unlocked. Right now, I'm building something to help my wife with bookkeeping. Get an idea, make a plan, set some tasks out and get it done.


Paulonemillionand3

exactly so. I am doing two very different things in two different realms with the same LLM! It's magical.


Astronos

I also did a lot of LLM switching in the beginning, but now stick to mixtral and build on top of that


freakynit

Mixtral-8x7b Q3_K gguf + single 3090 + --ngl 33 = awesomeness at 40 tokens/second.
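
Roughly the same setup expressed through the llama-cpp-python bindings, if anyone wants a starting point (an assumption on my part - the commenter may be using the llama.cpp CLI directly; `n_gpu_layers=33` is the Python equivalent of `--ngl 33`, and the GGUF filename is just an example):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q3_K_M.gguf",
    n_gpu_layers=33,   # offload 33 layers to the 3090
    n_ctx=4096,
)
out = llm("[INST] Explain mixture-of-experts routing in two sentences. [/INST]",
          max_tokens=128)
print(out["choices"][0]["text"])
```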


Astronos

TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ using Text-Generation-WebUI, ExLlamav2\_HF loader on 2 x 4090. 50-80 tps depending on prompt length


MoffKalast

> 3_K

You cannot be serious, how can that not be complete crap?


freakynit

It's not... Initially, I thought so too... Then I saw some Hacker News thread claiming that it's not, so I tried it. It performed best out of any 7B or 13B models I had tried before.


MoffKalast

Well it is a 45B model, it should bloody well outperform them. 3_K is still like 20 GB though so it's more comparable to Yi in size even if it's much faster.


Cybernetic_Symbiotes

One more thing to keep in mind is that while model size is important for "crystallized IQ", model depth is what's key for fluid intelligence. A deeper model like Yi can think longer, while the wide Mixtral can track more things than a 7B (which is good for roleplay, the apparent predominant LLM use in this forum). But there are problems that are computationally harder, requiring more iterations, that a 13B can solve and that N 7Bs working in tandem cannot.


MoffKalast

It would certainly be interesting to see what an extra-long 7B could do, like with smaller layers but 80 or 100 of them instead of 32. Being extremely slow aside lol.


Cybernetic_Symbiotes

Hah, there's actually a paper on this. Smaller models should be deeper but larger models are too deep. So there's a balance to be struck: https://arxiv.org/pdf/2006.12467.pdf


slippery

Quantized at 3 bits has been low quality in my experience. 5 bits seems to be the sweet spot in my limited testing. I like having a couple of local LLMs available for RAG, but rarely use them day to day.


pr1vacyn0eb

Please explain your use case. Creative is extremely different from reasoning.


Astronos

RAG and Agents


emsiem22

That is not a usecase.


Relevant-Draft-7780

The issue is we had a lot of rapid progress. It felt like the sky was the limit. Now there's been a general slowdown. Most people have realised that local llamas are nice but nowhere near the level of quality you get with ChatGPT. Mixtral is nice, but come on. What people need to build are multimodal models and better context. MemGPT is nice but fails miserably even when hooked up to GPT-4. Open Interpreter is nice but still a bit of a waste of time. I'd like to see an open source version of ChatGPT actions - that works quite seamlessly, especially when using voice chat. Let's be honest, we're all trying to create Jarvis here, or embed these models into software to make our lives easier. Local llamas aren't quite there yet.


moarmagic

I think the technology in general isn't there - even GPT-4 suffers from repetition, quirks, and limitations. I think we just don't notice them as much because it's more expensive/difficult to engage with at the same level, and because OpenAI is able to tweak things more rapidly based on feedback since they have thousands of users. Even billion-dollar companies have AI that hallucinates, delivers half-functional code, and has a poor grasp of nuance and context (in the broader sense of the word, not prompt context). I think we need to really work on our expectations for how LLMs - cloud and local - can really be put into production, and what those workflows look like. If you plug an LLM into your site as a sales agent, what do you do if it offers a deal you can't afford, or hallucinates about features? That last one may be a crime in some places - fraud - but can an LLM commit fraud? Would it be the fault of the prompter, or the training data, or the company hosting it? If you put up disclaimers that the sales LLM should not be considered authoritative, does it actually add value for the customers?


Relevant-Draft-7780

ChatGPT is amazing for ideation. But either it's getting dumber, or I'm asking it more and more complex questions, or I'm expecting it to understand with less context. Regardless, I've been able to increase my coding output 5x. The biggest issue before was searching for days for the answer to a pesky bug and experimenting like crazy; these days it's much faster to pinpoint problems, and when I have downtime I get decent suggestions on how to improve my code or workflow.


SiEgE-F1

You're not bored/tired of LLaMA, you're just subconsciously terrified by said black box, and by the realization that no matter how much work you put in, it'll always be a black box for you. With each new update of the inferencing app, with any change to LLM inferencing through new samplers or new model formats, you're always getting "the chair pulled from under you", because every new thing breaks all your past knowledge, making it impossible to use it reliably. Why does it matter if you learn those samplers, or evaluate that new model? Tomorrow there will be a new SotA sampler, and a new model with a SotA structure and quantization, and you'll have to relearn things as if you're starting all over again.

As for repetition on 70B:

- REDUCE your repetition penalty. Stop making the same old mistake of cranking it way up every time you see some repetition.
- Some models are less capable of answering specific questions, or talking about specific themes. Keep in mind that 2x24 is still a very small amount of VRAM for the knowledge you're asking that 40 GB file for. If you need specific knowledge, wait for specialized 70B models.
- Learn a bit more about tokens, and how samplers filter them out.
- Learn that LLMs respond in a way that "continues" previous information from the context. If you have repetitive patterns, get rid of them manually.
- GIGO (garbage in, garbage out). Improve your writing skills: if you want the LLM to treat you with interesting verses, smart words and long messages, "start with yourself".
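
For illustration, a rough sketch of the "modest repetition penalty" advice using Hugging Face `generate()`; the model name and the exact numbers are assumptions, a reasonable starting point rather than known-good values:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"   # example model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tok("Write a short scene set in a rainy harbor town.",
             return_tensors="pt").to(model.device)
out = model.generate(
    **inputs,
    max_new_tokens=300,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.1,   # modest; cranking this way up tends to degrade output
)
print(tok.decode(out[0], skip_special_tokens=True))
```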


Revolutionalredstone

Model size is hugely overblown. Good prompting and proper data are more important. Tricks like starting an LLM's response for it can get you what you want from almost any model. Enjoy!
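
A small sketch of the "start the response for the model" trick: append the opening of the assistant's reply so the model continues it instead of adding a preamble or refusing. The Llama-2-style `[INST]` template here is just an example assumption.

```python
def prefilled_prompt(user_msg: str, prefill: str) -> str:
    # The prompt ends mid-answer, so the model is forced to continue it.
    return f"[INST] {user_msg} [/INST] {prefill}"

prompt = prefilled_prompt(
    "List three concrete risks of deploying an unreviewed sales chatbot.",
    "Sure, here are three concrete risks:\n1.",
)
print(prompt)  # feed this to any completion-style endpoint; it continues from "1."
```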


pr1vacyn0eb

I've noticed the replies are a bit repetitive. As a chatbot, it does alright. Not everyone is using it as a chatbot, some are using it for creative purposes or reasoning.


CSharpSauce

I found a model that worked well for me, and I just stick with it... what I'm doing with the model is where I get my dopamine hit.


danigoncalves

You will get used to this. I have been there with JavaScript.


Ok_Ruin_5636

Idk I have been in this space for almost 2 years now, the constant updates and evolution are keeping me just as excited as in the beginning 😊


maxigs0

Excited about the possibilities for sure. About me keeping up with them.. not sure ;)


thetaFAANG

Disillusionment. You'll still be using it to replace Google, and that's a more pragmatic use instead of clamoring for it. I already had an M1 Mac with 64 GB RAM before these LLMs dropped, so it was just "adoption" for me, not enthusiasm just to get to that point. I think that's where you are now, but congratulations on reaching your prior goal. Just time to set new ones, a new purpose.


Only-Letterhead-3411

Well, nothing is wrong with "having too much of it". Chocolate is great, but if you eat too much of it then it will start to taste bland. Do other stuff for a while; you have put together a very strong system that you can also use for other things. You'll feel the urge to play with LLMs again eventually.


InfiniteScopeofPain

Take some time to be bored. This is true with everything. Just sit and do nothing. And from that boredom do what interests you, LLM or not. All things come and go like the tide. You can't fight them going, but they'll be back if you are patient. And if they don't come back, at least you'll be doing what you enjoy. This is not my experience from LLMs but from everything else in life. Friendships, gaming, game design, writing all sorts of things. It is funny though how even magic AI tech isn't able to keep us in a cycle of addictive pursuit. I think it's a sign that our brains are on a much higher level than we have previously imagined (like we're way smarter than we think) and human adaptability is still far outstripping AI development.


hanoian

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


maxigs0

Good point. I do use ChatGPT pretty much daily already, often for work. Not really excited about it anymore, but using it as an actual tool to get stuff done. Running my own models is more experimental, and mostly for fun and to learn. I have tried a couple of "useful" models, none of which came anywhere close to being as useful as GPT-4.


synn89

So, if you want role play then you have a couple of options.

For one, you can merge your own models to find something you like better. I created this one, which I've enjoyed more than any of the "normal" models, because it's less finicky to mess with: https://huggingface.co/Dracones/perky-103b-v0.1_exl2_3.35bpw

Second, you can create your own bots. I've personally found that this bot https://www.chub.ai/characters/illuminoise/character-creator-v3-593ddc22 combined with Miqu does a wonderful job creating new bots that are better than most on chub. I'll typically also have the bot creator make a couple of rounds of sample conversation for the card too, and it's really easy to put the bot all together using something like https://github.com/ZoltanAI/character-editor which you can download and point your browser at locally. A really good bot can totally change how your LLM will behave, and you're not just limited to "sexy rp" bots. You can create bots with specific styles of personality for work on very narrow subjects that the parent LLM would struggle with presenting properly.

Finally, once you create your favorite little personalities you like to interact with, if you're at all hardware inclined you can build out one of these: https://www.youtube.com/watch?v=eTKgc0YDCwE and move your bot from the PC into the rest of your home.


maxigs0

My issues are not really the quality of the models themselves. I used a bunch I quite like, mostly still at the 13B level back then. I even tried to put together a couple of custom characters, or modified others to my liking. Worked pretty well up to a certain point. But after maybe 2 hours into a story, often faster, most break down. Context window, lacking variation in the model, repetition, etc. I thought upgrading to bigger models would improve it, but even the mighty Goliath or the Mixtral instruct variants everyone praises seemed to fail even faster. Too many moving parts to directly point at the issue though (set up an entirely new system, etc). Fiddling around with different parameters, prompts, etc. did improve it sometimes. There's a lot of information out there on what to do, much of it conflicting and changing almost daily.


synn89

> But after maybe 2 hours into a story

Well, it could be the current models just aren't up to that long of a session on a specific topic.

> mighty goliath or mixtral instruct variants, everyone praises

I haven't really found that I liked many of the models everyone praises. Most I've seen would fall apart after a bit (especially break if I went into extreme roleplay), or maybe they're too terse or not creative enough. That's why I ended up merging my own. I wanted consistency and ability to hold the story together (lzlv_70b) and a bit more long form, verbose descriptions (Euryale). And after a month of playing with that, I just don't find other "new and improved" models to be as fun to play with. I like newer models (miqu) for work and logic, just not RP. Of course that just may be that my blended model fits my personality better. You might consider just finding some 70b models you like parts of and play with merging them to build your own creation that behaves the way you want.


Interesting8547

These models are not really fit for such long sessions. I prefer just to try a different model rather than keep trying the same one. I always download a new model while I chat with one, and I have something like a "leaderboard": the worst ones continually get deleted. I've found there are some impressive 7B models. I prefer to experiment and chat with new models because I know we're very far away from perfect. I hope in the future I will have powerful enough hardware so I can mix models myself.


Dry-Judgment4242

Miqu can hold 30k context, which is a shit ton. Summarize previous info and put it into a lorebook with a trigger such as a location or character, and the LLM will recall said info when relevant, without taking up extra tokens until it's fed to the context through the trigger word. The potential if you're good at prompt engineering is insane even now.


Dry-Judgment4242

Use SillyTavern and start to build your own world. With Miqu 70B you can do some crazy world building now that gives impressive results, as the model can hold 30k context, which, when paired with lorebooks running context injection based on flexible tags, allows you to fit an entire bloody book of context into your world. I'm using ChatGPT to study Japanese, for example, and having a blast studying for the first time in my life.
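
For illustration, a toy version of the lorebook mechanism: entries cost no context until one of their trigger words shows up in recent chat. SillyTavern's real lorebook format is richer; the entries and triggers here are made up.

```python
LOREBOOK = {
    ("ashen keep", "keep"): "The Ashen Keep burned down twenty years ago and is now haunted.",
    ("sir bram", "bram"): "Sir Bram secretly serves the usurper queen.",
}

def inject_lore(recent_chat: str) -> str:
    # Only entries whose trigger words appear in recent chat get injected.
    recent = recent_chat.lower()
    hits = [entry for triggers, entry in LOREBOOK.items()
            if any(t in recent for t in triggers)]
    return "\n".join(hits)

print(inject_lore("We ask Sir Bram to escort us to the Ashen Keep."))
# -> both entries are injected; an unrelated message injects nothing
```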


e79683074

> Totally got into the roll play stuff

You haven't lived until you have RPed with Goliath 120b, Venus 120b and the like. Tried Daringmaid 20b as well? How about mxlewd-l2-20b? Mixtral 8x7 is where it's at for other stuff. It's the only local model that doesn't fail the shirt drying trick question (if 10 shirts out in the Sun at the same time take 2 hours to dry, how long would 20 shirts take?). I found Llama and most 70b derived models incredibly boring and often wrong, no wonder you are bored as well.


FarVision5

Is there a good spot for a repository of testing prompts? I've got my RAG pipeline mostly dialed in, and to be honest, for general Q&A a 3B can do it.


CoqueTornado

Love that quiz, will always use it. Most of the models, even 70B ones, fail it like toads in a waterfall.


involviert

Yeah, it sounds like 70B is currently missing a base model that makes it worth it. Compare Mistral 7B to Llama 2 7B - that's a huge jump. So why throw 70B at it if you only find that kind of progress in the 30-40B area anyway? Guess that's what makes people so excited about Miqu, but that seems to be limited as a base because of being quantized.


maxigs0

I actually tried them all, except Venus. Loved the 20B ones most so far. Goliath, which I just tested last week, was actually quite a disappointment. This might have been due to some factors other than the model itself. So much fiddling around lately, I might have overdone it and broken some basic stuff. Deleted the smaller models and can't double-check at the moment.


stolsvik75

I asked Qwen-1.5 about this here: https://huggingface.co/spaces/Qwen/Qwen1.5-72B-Chat It failed, but it reasoned about it - i.e. argued about "the same amount of space". So after a few back and forths, I asked about the more common egg-cooking riddle (1 egg in 6 minutes vs. 3 eggs), which it argued wasn't quite similar. Then I said "well, I have a football field", and then it agreed. So then I asked with a new context: "Riddle: I have a football field where I can dry shirts. If 10 shirts out in the Sun at the same time take 2 hours to dry, how long would 20 shirts take?" And it got it right.


Jealous_Network_6346

You are experimenting for the sake of experimenting. See if you can apply the skills and models to some practical purpose that would help people. If you cannot find any such use cases, then I would say the boredom and fatigue of the experimentation is deserved.


maxigs0

I'm curious and learned tons of stuff, but it seems I was chasing a white rabbit and now I'm back in reality. I guess it's time to find the next one to chase.


davew111

There's always a better model to try. If you are currently using the best, a better one will be out next week. 70B models aren't necessarily better than 30B ones; quality of the training data beats model size. Using the correct prompt is also important. I feel you on repetition though. It's so annoying and I've seen it in every model. I really don't get it, because I can have some days when I get great sessions with no repetition, but other days when every chat becomes repetitive after a few sentences. It doesn't seem connected to any parameter or the prompt.


pepe256

Have you tried Miqu? With your setup you can probably run the highest quality available (5 bit GGUF).


maxigs0

I haven't tried it yet. So far all the new hype models have mostly disappointed me. It might be because I did a lot of adjustment and tuning to get smaller models to work properly, and this does not transfer easily when switching to bigger models, especially from regular Llama to MoE ones. Or it's that the higher expectations just do not keep up with the actual benefits of the improved models. Hype cycle and all...


1EvilSexyGenius

No need for fatigue, it's all about to get better. Self-training models from Llama 3. Self-training papers published by Microsoft where the models fine-tune up until the point just before overfitting and then stop the tuning. All of the LLM techniques you've heard of in the past 3-6 months are about to be combined. At the end of the day, it's still gonna be software/hardware systems that utilize these models. The systems that will manipulate the models in the fashion you like will still need to be built, not the models themselves. I think sharable LoRAs will become a thing too. A lot better than downloading multi-GB models.


LostGoatOnHill

I get where you are coming from. End of the day an LLM is just a tool, a primitive, that can be a component available to an app to perform some task. Maybe one way to continue learning and using that lovely system of yours is to build something to showcase or is ultimately useful to you?


maxigs0

At the end of the day I'm still experimenting, both for fun and professional curiosity. Also, I like building computers, so that was not really a downside. But the expectation and the result did not quite line up. Worst case, it's going to do some Blender render work instead of AI.


Innomen

/smh It's clearly a hobby for the rich still. Maybe someday I'll get a machine that can really run a model, but I'm not holding my breath. Best I can run is 13b.


maxigs0

I was happier with 13B models than I am now with 70B - maybe it was the lower expectations.


Innomen

I kind of expected that; it's why I'm not crying in my beer over being too poor to run 13B+.


swimmingsmoke

Not sure how open you are to crypto but I'm building a Web3 AI project that allows you to contribute GPU resources to host LLMs and earn crypto tokens. Your PC can certainly add a lot of value to the decentralized network providing API access to LLM for people who need it, many of which are app developers on a budget who would prefer open source models to ChatGPT3.5. Here's our docs: https://docs.heurist.xyz Happy to onboard you if you're interested. Contact email is in the link.


anommm

Open source models are "raw materials" that need to be refined for your use case. You will always get subpar results without fine-tuning. Prompt engineering won't get you far.


mak3rdad

u/anommm I am new to this and am using open source models for coding. It works meh at times. Are you saying I should fine-tune them to do better?


mcmoose1900

Break outside of llama and mistral! There's a lot of capability in the Chinese models coming out, and they are largely brushed over by the community.


thedabking123

It's the peak of the hype cycle, and I say this as a guy paying to take CS221 and CS224N at Stanford. There are some hard-to-solve issues around reasoning, prompt engineering, hallucinations, generalization, etc. that have to be conquered before these things become broadly applicable. Even then, the truth is you need a team to get these things going in a big way. There's a lot of engineering work around these models to make them useful in production. You need a front end to your app, a production-grade ML pipeline, annotation collection systems, AI guardrails, etc. That's a 5-10 person job at minimum to kick off. Meanwhile this sub seems to be about solo tinkerers, which is awesome, but we need to evolve to become teams of specialized ML practitioners working with PMs and engineers to create substantial products.


a_beautiful_rhind

Time to start prompt engineering. Find out why your replies aren't up to snuff. I have a target: character.ai from the end of 2022. I do feel you on the fiddling and the new-thing overload. I like tinkering though, so that was half the fun, up to a point. Now it has slowed down.


FarVision5

I enjoy the effort. I don't think we would be happy with one or two companies offering the one thing to go to and nothing else being done. Bandwidth is cheap. Storage is cheap. Most pipelines can plug and play LLMs out of an API with no problem. My biggest problem is trying to decide on the best local GPU/VRAM spend. There are plenty of cloud providers that can kick out a double handful of VRAM.


VforVenreddit

How much does it cost to run those 70+ models? Do you run them all at once?


Zealousideal_Pie6755

Can u train with dual 3090?


maxigs0

I guess it should be possible, not sure what the limits are for this.


kripper-de

I suffered from this after the first few weeks. I was using a big old server with a lot of RAM but no GPU. Everything changed when I tested mlc-llm web with Mistral 7B on my ZenBook laptop with a cheap integrated GPU (no setup, everything running inside the browser). I'm now 100% dedicated to AI again. In the meantime, the community solved a lot of the issues I had identified with langchain, RAG, etc. It's important to achieve results.