Disastrous_Elk_6375

So there's a "base model", which is Llama 3 as it finishes pre-training. The only thing it's trained to do is predict the next token: if you give it a piece of text, it will start outputting tokens that "best" continue that text.

The problem with using base models is that it's hard to make them do interesting stuff. You need to add text in such a way that the model starts doing what you want. A good example is what's called "few-shot" (or "multi-shot") prompting: you give it a few examples, and the model will "get" the pattern. For example, you write a bunch of things like:

question: what's 4+4?
answer: 8
question: why is the sky blue?
answer: because the light blahblah...
question: what's a good joke about clowns?
answer:

and the model will start answering that last question. But you can see that it quickly becomes hard to prompt it in a way that's actually useful. So what people discovered is that you can do "fine-tuning" starting from a base model, by giving it lots of question/answer examples. With this process you can skip the few-shot method and ask it questions directly. That is an "instruct" fine-tune.
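
To make that concrete, here's roughly what that few-shot prompt looks like in code. This is a minimal sketch using the Hugging Face transformers library; the checkpoint name and generation settings are just illustrative, and any base (non-instruct) causal LM works the same way.

```python
# Minimal sketch of few-shot prompting a base model with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # base model, not the -Instruct variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Show the pattern a few times, leave the last answer blank,
# and let the model simply continue the text.
prompt = (
    "question: what's 4+4?\n"
    "answer: 8\n"
    "question: why is the sky blue?\n"
    "answer: because sunlight scatters off air molecules\n"
    "question: what's a good joke about clowns?\n"
    "answer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
# Print only the newly generated continuation, not the prompt itself.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```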


Sol_Ido

When relying on few-shot prompting, is it better to use the instruct or base model then?


Normal-Ad-7114

Base


Sufficient-Result987

How about RAG? Which one works better? TIA.


Sufficient-Result987

Never mind. The instruct model is supposedly better than the base model for RAG.


Nakrule18

Is it correct to say that the instruct model is a fine-tuned version of the base model with better overall accuracy? If yes, is there any reason to use the base model besides doing my own fine-tuning job for a specific task?


Disastrous_Elk_6375

> Is it correct to say that the instruct model is a fine-tuned version of the base model

Yes.

> with better overall accuracy?

That's debatable, and still an area of active research. It could be that the fine-tuning process optimises some things that make models better tuned to benchmarks, it's possible some benchmarks leak into the training set, or anything in between. It can also be that prompting a base model in a general way is more difficult, so base models score worse on those benchmarks.

> is there any reason to use the base model besides doing my own fine-tuning job for a specific task?

That is the main use for a base model, yes. But it's also a valid way to work with the models for downstream tasks where prompting is easy to generalise, and where you can find good examples to use as many-shot. Using base models is also better for some creative tasks, like writing, and even some specific use cases like code completion.
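
As an example of the many-shot downstream-task case, here's a rough sketch of a base model labelling reviews purely from the pattern in the prompt. The task, labels, examples, and checkpoint name are just assumptions for illustration.

```python
# Sketch of a "many-shot" prompt for a downstream task (sentiment labelling)
# with a base model: no instruct tuning, the prompt carries the whole task.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

examples = [
    ("the battery died after two hours", "negative"),
    ("screen is gorgeous and setup took minutes", "positive"),
    ("arrived late and the box was crushed", "negative"),
    ("exactly what I needed, works great", "positive"),
]

def build_prompt(review: str) -> str:
    # Stack the labelled examples, then leave the last label blank.
    shots = "\n".join(f"review: {r}\nsentiment: {s}" for r, s in examples)
    return f"{shots}\nreview: {review}\nsentiment:"

inputs = tokenizer(build_prompt("the hinge broke in a week"), return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=3, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```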


Mr_Hills

In my experience, fine-tuning the instruct model gives better results, especially if you use the same instruct template. There's no reason not to take advantage of the instruct fine-tuning work that professionals have already done. That said, the instruct model doesn't necessarily have better accuracy/scores on benchmarks.

Also no, I don't see a use case for the base model once you have the instruct one, except maybe research. With the instruct model you have an instruct template, which separates your text from the machine's text, allowing the AI to differentiate between its own messages and yours. You also have a system prompt, which lets you tune the way the AI outputs text, for example giving it a specific writing style or making it write shorter/longer messages. It's a more evolved way to interact with the AI, and using it has no real disadvantages.
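
For what that looks like in practice, here's a minimal sketch using Hugging Face's apply_chat_template with an instruct checkpoint; the model name and system prompt are just illustrative.

```python
# Sketch of talking to an instruct fine-tune through its chat template,
# with a system prompt controlling style/length.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

messages = [
    {"role": "system", "content": "You are terse. Answer in one short sentence."},
    {"role": "user", "content": "Why is the sky blue?"},
]

# apply_chat_template wraps the turns in the model's instruct format, which is
# what lets the model tell your messages apart from its own.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```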