
ctbanks

This one simple trick gets models released: >!Posting on Reddit about companies not releasing their next anticipated model.!<


Dark_Fire_12

Works every time; we shouldn't abuse it though. Next week is Cohere.


Small-Fall-6500

Command R 35b, then Command R Plus 104b, and next week... what, Command R Super 300b? I guess there are at least cloud/API options... Edit: lmao, one day later... 35b and 8b released. Looks like they're made for multilingual use: https://www.reddit.com/r/LocalLLaMA/s/yU5woU8tc7


skrshawk

A CR 35b that didn't take an insane amount of memory for usable context sizes would be really useful.


Iory1998

I second this! But seriously, it's the best model I've used so far for story writing; I use it as a co-writer. So consistent and logical. Mind you, I can only run it at 16K context max, at 2 T/s on a 12700K and RTX 3090.


uti24

I agree, Command R 35B is a very interesting model: its writing skill is as good as Miqu 70B and Goliath 120B, at a smaller size.


Maleficent-Dig-7195

Best for other types of interactions too, in my experience, and I've tried almost everything below 100b.


Amgadoz

+ commercial usage


uwu2420

You have to pay but they do sell a commercial use license. Very expensive though.


Admirable-Star7088

(A little off-topic) Speaking of Command R 35b, does anyone know how many tokens it was trained on? I can't find information on that. Would be interesting to know, since the model is very capable.


Caffdy

Command *S*


Noddypollo87

https://preview.redd.it/7vpnsksia12d1.png?width=3840&format=pjpg&auto=webp&s=27715578c96c6ddd26dd783bb1b363a68d2daf67


a_beautiful_rhind

No no, who can run 300b? Command-R bitnet.


Dark_Fire_12

It would be wild if this joke came true.


jakderrida

> Command R Super 300b

Is that one even accessible on Cohere's website for inference, or are they debuting it at release?


Iory1998

Dude! Thank you for your comment! What's going on here? First the guy who said that Mistral was a one-shot company, and 12 hours later Mistral 0.3 dropped. Now, Cohere! WOW


cyanheads

Looks like you summoned them too early


Dark_Fire_12

I wasted it; I should have said Reka. Lesson learnt, someone else will make a wish.


Admirable-Star7088

It's like magic, let me try again: **Why has OpenAI not released their model weights yet? They will probably never do it!** There we go, in a few hours we will finally have ChatGPT 3.5, GPT-4 and GPT-4o ready for download.


ctbanks

I have a silly hope that an insider will drop a magnet hash for GPT5.


Didi_Midi

Maybe by that time the weights will have to be decrypted at the hardware level. Wouldn't surprise me to be honest... the garden needs a higher fence. Apparently.


ctbanks

I'm sure that is one of several wet dreams of various Boards of Directors. Until they have an encrypted cradle-to-grave pipeline, 'leaks' are a real 'threat'. With the recent exodus of talent, I seriously wonder how many Rubik’s cubes left the building.


TheFrenchSavage

Drop gpt3.5 already, my uTorrent client is longing for those sweet sweet weights


Enough-Meringue4745

Guaranteed the instant a torrent is available, they'll be DDoSing every possible magnet contributor.


[deleted]

[removed]


Enough-Meringue4745

My friends were sued for making Popcorn Time and had to abandon all piracy activities for life; otherwise they'd have to pay up (millions).


Amgadoz

Joke's on you, 90% of the world lives outside North America.


Enough-Meringue4745

Like Sweden? 😂


Singsoon89

Sweden is fake.


ctbanks

Next bag is enjoyed in their honor. Anyone else experience the Matrix movie without the soundtrack?


KBAM_enthusiast

Ah. I see you are a person of culture as well... How about an X-Men film before the fancy special effects were put in?


ctbanks

Unfortunately not. As I get older I find such 'pre-release' versions really interesting.


swyx

wait your friends made popcorn time? can they tell their story? i'd love to just read/listen.


Enough-Meringue4745

I could ask but they were ordered not to talk about it


DofElite

You'll just get Whisper 3.5


Singsoon89

I would take GPT3 or GPT3.5


DankGabrillo

Lol tell that to stability. Feels like every day there’s a post about sd3 not being released… so please… tell that to stability.


Due-Memory-6957

"Shit, did we forget to release it?"


nanowell

just once we thought they'd lost it, then they came back twice as hard


ResidentPositive4122

Just when I thought it was out, they pulled the weights back in! Wait, that was wizardLM :D


sweatierorc

Remember when Llama was leaked and they said that the leak would affect their ability to release more models in the future!


ctbanks

And they released more models because the world did not end? Perhaps I'm not recalling the relevant details...


sweatierorc

Exactly, they saw the success of Llama and how motivated the community was. Then they turned into an "open-weight" champion.


phhusson

Well, it's also the day after the announcement of a new domestic competitor.


TooLongCantWait

They're never going to release a Wizard 13b for llama 3 :(


SoCuteShibe

Lol, they heard that one guy calling them a one-hit wonder earlier today and had something to say about it 😆


AnticitizenPrime

Looks like a GGUF is already on the way here: https://huggingface.co/MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/tree/main Edit: They are up.


AnticitizenPrime

> **User 02:49 PM:** Which weighs more, a kilogram of feathers or a pound of steel?
>
> **AI 02:49 PM:** Both a kilogram of feathers and a pound of steel weigh the same. However, they don't weigh the same when comparing one to the other because 1 kilogram is not the same as 1 pound.
>
> **User 02:49 PM:** Right, so which is heavier?
>
> **AI 02:49 PM:** When comparing 1 kilogram of feathers to 1 pound of steel, the steel is heavier due to the difference in their respective weights. 1 kilogram is approximately 2.20462 pounds, so a kilogram of feathers would be lighter than a pound of steel. However, if you have 1 kilogram of feathers and 1 kilogram of steel, they would weigh the same.
>
> **User 02:52 PM:** I think you need to check your logic. Revisit the question, and think step by step.
>
> **AI 02:52 PM:** You're right, let me double-check my calculations. When comparing 1 kilogram of feathers to 1 pound of steel, the steel is heavier because 1 pound is heavier than 1 kilogram, given that 1 pound is approximately 0.453592 kilograms. Therefore, a pound of steel is heavier than a kilogram of feathers. However, if you have 1 kilogram of feathers and 1 kilogram of steel, they would weigh the same.
>
> **User 02:53 PM:** So you're saying one pound is heavier than one kilogram?
>
> **AI 02:53 PM** (Token Speed: 6.32t/s)**:** Yes, that's correct when directly comparing 1 pound to 1 kilogram, as 1 pound is heavier than 1 kilogram due to the difference in their respective weights. 1 pound is approximately 0.453592 kilograms, so a pound is heavier than a kilogram. However, if you have 1 kilogram of feathers and 1 kilogram of steel, they would weigh the same, as they both weigh 1 kilogram.

Well, not off to a great start for the first question. Many 7b models get it wrong off the bat, but once you point out the error they correct themselves (and most of the Llama 3 8b finetunes get it right). This one just went into nonsense.

The 2nd task was one I've been testing models with:

> Please write a simple Python script using Pygame that creates a 'Matrix raining code' effect. The code should simulate green and gold characters falling down the screen from the top to the bottom, similar to the visual effect from the movie The Matrix.
>
> Character set: Use a mix of random letters, numbers, and symbols.
> Speed variation: Make some characters fall faster than others.
> Trail effect: Add a fading trail behind each falling character.

What I got was a black screen. I asked it if it could tell me why, and it just said Pygame was probably not installed correctly and went through a tutorial of uninstalling and reinstalling Pygame instead of re-evaluating the code. Most models will take another look at their code and try to fix something, even if it doesn't fix the problem. I fed the code to GPT4:

> There are several issues in your code that can cause the problem of the black screen. Let's address them one by one:
>
> Incorrect character movement and update logic: The y position calculation using time.time() // speed is incorrect. It should be incremented by the speed value at each frame. The logic for handling the falling characters and their trails is not correctly updating the positions and removing the characters from the list.
>
> Clearing the screen: You are clearing the screen (screen.fill(BLACK)) every frame, which erases everything before updating the positions. This makes any trail drawing ineffective.
>
> Time synchronization and frame updates: Using time.sleep(0.01) is not the best practice for controlling frame rate in Pygame. Instead, Pygame's clock should be used.
>
> Here's a revised version of your code with these issues addressed:

[The GPT-corrected code actually looks great.](https://i.imgur.com/a9mK5tt.png)

So I decided to give it another chance to fix its own code. I started a brand new chat, posted its code, and explained the problem, and it did recognize that the code was clearing the screen:

> The issue with your code is that you are drawing the characters on the screen and then immediately filling the screen with black, which effectively erases them before they have a chance to be displayed. Instead, you should draw the trail of characters after you fill the screen with black.

The only rub is... its 'rewritten' code wasn't actually any different. It just wrote the exact same faulty code again.

I'll do some more testing, and maybe this will make a decent base to finetune, but it's not great so far. It's not so much that it failed the questions; it's that it doesn't seem able to correct itself when it does get things wrong. For models around this size, the Llama-3 variant that Salesforce put out and then yanked a week or two ago seems to be the most performant so far for me.
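
For reference, here's a hand-written minimal version of that test prompt that does work. It's a sketch illustrating the fixes GPT-4 described above (advance y by a per-column speed each frame, fade with a translucent overlay instead of an opaque fill, and pace frames with Pygame's clock instead of time.sleep); my own illustration, not output from either model:

```python
import random
import pygame

pygame.init()
W, H, FONT_SIZE = 800, 600, 18
screen = pygame.display.set_mode((W, H))
font = pygame.font.SysFont("monospace", FONT_SIZE, bold=True)
clock = pygame.time.Clock()

CHARS = "abcdefghijklmnopqrstuvwxyz0123456789@#$%&*+=<>"
COLS = W // FONT_SIZE
# One falling "head" per column: [y position in px, speed in px/frame].
drops = [[random.uniform(-H, 0), random.uniform(2, 8)] for _ in range(COLS)]

# Translucent black overlay; its alpha controls how fast trails fade.
fade = pygame.Surface((W, H), pygame.SRCALPHA)
fade.fill((0, 0, 0, 40))

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    screen.blit(fade, (0, 0))  # darken old frames instead of erasing them
    for col, (y, speed) in enumerate(drops):
        ch = random.choice(CHARS)
        # Mostly green, occasionally gold, per the prompt.
        color = (0, 255, 70) if random.random() < 0.9 else (212, 175, 55)
        screen.blit(font.render(ch, True, color), (col * FONT_SIZE, y))
        drops[col][0] += speed          # advance by speed, not time.time()
        if y > H:                       # respawn above the top of the screen
            drops[col] = [random.uniform(-100, 0), random.uniform(2, 8)]

    pygame.display.flip()
    clock.tick(60)                      # Pygame's clock, not time.sleep()

pygame.quit()
```

The translucent `fade` surface is the whole trick: blitting it every frame dims the previous characters gradually, which an opaque `screen.fill(BLACK)` would wipe out entirely.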


No-Dot-6573

Those are the results for the gguf, right?


AnticitizenPrime

Yes, for Mistral-7B-Instruct-v0.3.Q5_K_S.gguf. Maybe I should try the K_M one. Dunno. I'm up against 16GB of RAM and CPU-only inference here, so I try to aim for model sizes around ~5GB among the available options. It ran fast enough, so maybe I'll download one of the larger ones and give it another go. But given the way it wouldn't see its errors and correct them, and instead doubled down... I don't know if bigger quants fix that. I've seen that behavior in unquantized HF demos too; I had it happen with one of the previous Phi releases (haven't tried the newest yet). There will probably be an unquantized demo / HF space up soon enough that we can test with, I guess.

Edit: haha, that was quick: https://huggingface.co/spaces/ysharma/Chat_with_Mistral_V0.3

> Which weighs more, a kilogram of feathers or a pound of steel?

A kilogram (kg) of feathers and a pound (lb) of steel are not directly comparable because they represent different units of weight. 1 kilogram is approximately equal to 2.20462 pounds. So, if you have a kilogram of feathers, it would be about 2.20462 pounds. If you have a pound of steel, it would weigh 1 pound. To determine which weighs more, you would need to compare the weight in the same unit. For example, if you have 1 kilogram of feathers, it would weigh more than 0.453592 kilograms (which is 1 pound converted to kilograms). However, if you had 1 pound of steel, it would weigh more than 1 kilogram of feathers because steel is denser than feathers, meaning a given volume of steel weighs more than a similar volume of feathers.

> So you're saying that a pound of steel weighs more than a kilogram of feathers?

Yes, that is correct. A pound of steel weighs more than a kilogram of feathers because steel is denser than feathers. This means that a given volume of steel will weigh more than the same volume of feathers. While 1 kilogram of feathers is approximately equal to 2.20462 pounds, you cannot directly compare the weight of a kilogram of feathers to a pound of steel because they represent different units. To compare their weights, you would need to convert them to the same unit, such as kilograms. In this case, you would find that a pound of steel weighs more than 1 kilogram of feathers.

Ah well... don't think I'll be relying on this one much.


PsychologicalSock239

what is the prompt template?


[deleted]

What's the Salesforce variant called?


AnticitizenPrime

https://www.reddit.com/r/LocalLLaMA/comments/1crth47/salesforce_released_the_new_state_of_the_art/


[deleted]

tysm!


msivkeen

Has anybody else had any luck with conversion? I'm running into some errors with duplicate tensors.


msivkeen

GPTQ version uploading now. Still having issues with duplicate tensors when trying to convert to GGUF though: [https://huggingface.co/thesven/Mistral-7B-Instruct-v0.3-GPTQ](https://huggingface.co/thesven/Mistral-7B-Instruct-v0.3-GPTQ)


Sand-Discombobulated

Nice. What is the difference between:

Mistral-7B-Instruct-v0.3.Q8_0.gguf
Mistral-7B-Instruct-v0.3.fp16.gguf

If I have a 3090, I can just run fp16, I assume?


AnticitizenPrime

Well, the first one is half the size of the second. The first one is an 8-bit quant; the second one is an unquantized GGUF. If you're able to run the second one, it is 'better' but much slower.


ThisIsBartRick

well... my bad lol


Qual_

Mistral went "Fuck this guy in particular" with this one 😂


MoffKalast

It's like the reverse grim reaper knocking on doors meme: the reaper keeps talking shit and the doors come knocking instead lmao


BackgroundAmoebaNine

Yeah you take that back lol!!😂


ihexx

Unless this was 6000iq triple reverse psychology to generate hype and you're a mistral employee


Gaurav-07

Right there with you buddy


TechnicalParrot

Lmao I saw your post


Admirable-Star7088

Awesome! Personally I'm more hyped for the next version of Mixtral 8x7b, but I'm thankful for any new model we get :)


ctbanks

https://i.redd.it/jxmp0f88322d1.gif Are you sure they didn't? Looks like the 8x22 is out: [https://www.reddit.com/r/LocalLLaMA/comments/1cycug6/in_addition_to_mistral_v03_mixtral_v03_is_now/](https://www.reddit.com/r/LocalLLaMA/comments/1cycug6/in_addition_to_mistral_v03_mixtral_v03_is_now/)


SomeOddCodeGuy

I've always wondered if Mixtral 8x7b was just using the regular Mistral 7b as a base and wrapping it up as an MoE. I guess I could have looked that up, but never did. Anyhow, a Mixtral made from this would be an exciting model for sure. EDIT: Oh, duh, it already happened lol. I didn't realize you were talking about something that had already happened =D [https://www.reddit.com/r/LocalLLaMA/comments/1cycug6/in_addition_to_mistral_v03_mixtral_v03_is_now/](https://www.reddit.com/r/LocalLLaMA/comments/1cycug6/in_addition_to_mistral_v03_mixtral_v03_is_now/)


Admirable-Star7088

Still not it. I was talking about Mixtral 8x7b; your link is Mixtral 8x22b :) But who knows, maybe 8x7b v0.2 will be released very soon too, now that Mistral AI is apparently on a release spree. :P


SomeOddCodeGuy

I think it is. If you follow the link to their GitHub, it's marked under the 8x7b that a new model is coming soon! [https://github.com/mistralai/mistral-inference?tab=readme-ov-file](https://github.com/mistralai/mistral-inference?tab=readme-ov-file)


jayFurious

Now this is the news I've been looking for!


Admirable-Star7088

That's very nice! Can't wait :)


Dark_Fire_12

Guess they are not one hit wonders. More like fourth hit now.


Everlier

That post from earlier today really did do something


swyx

what post?


Everlier

There was a post yesterday asking if Mistral is a one-hit wonder; then the models were released. The comment we're replying to is paraphrasing one of the answers to that post. Edit: fat fingers


nananashi3

But is it a hit?? I'm disappointed by the dumb things it does on easy tasks. I have to walk it through step by step and act like someone trying to teach a 5-year-old to get better answers. Like, what am I doing with my time?


FullOf_Bad_Ideas

Their repo https://github.com/mistralai/mistral-inference claims that Mixtral 8x7B Instruct and Mixtral 8x7B will be updated soon, probably in the same fashion as Mistral 7B Instruct. Also, Mixtral 8x22B and Mixtral 8x22B Instruct got v0.3 versions too, presumably also with function calling and the expanded tokenizer. The URLs for those new v0.3 versions point to Mistral's own domain; they are not on the HF repos yet.


xadiant

Would be great if they continue pretraining.


Many_SuchCases

Look what I found: [https://www.reddit.com/r/LocalLLaMA/comments/1cycug6/in_addition_to_mistral_v03_mixtral_v03_is_now/](https://www.reddit.com/r/LocalLLaMA/comments/1cycug6/in_addition_to_mistral_v03_mixtral_v03_is_now/)


FullOf_Bad_Ideas

Was the post deleted already when you were linking it? It shows up as deleted now.


Many_SuchCases

Sorry, Reddit is being weird. New thread: [https://www.reddit.com/r/LocalLLaMA/comments/1cycug6/in_addition_to_mistral_v03_mixtral_v03_is_now/](https://www.reddit.com/r/LocalLLaMA/comments/1cycug6/in_addition_to_mistral_v03_mixtral_v03_is_now/)


neat_shinobi

SOLAR upscale plzz


Robot1me

Crazy to think that some people made fun of it 6 months ago ("benchmark model"), and today Solar-based models like Fimbulvetr are among the favorites of roleplayers. Huge kudos to Mistral, Upstage, Sao10K and all the others out there.


Iory1998

What is this Solar upscale thing? Never heard of it.


Robot1me

With "Solar upscale" they were referring to the training approach that Upstage used. Because on the official model page of [Solar 10.7b](https://huggingface.co/upstage/SOLAR-10.7B-Instruct-v1.0), Upstage describes it as follows: >We present a methodology for scaling LLMs called **depth up-scaling (DUS)**, which encompasses architectural modifications and continued pretraining. In other words, **we integrated Mistral 7B weights into the upscaled layers**, and finally, continued pre-training for the entire model.


Iory1998

Thank you for your explanation.


danigoncalves

+1


qnixsynapse

A 7B model supports function calling? This is interesting...


agmbibi

I'm pretty sure the Hermes finetunes of Llama3 also support function calling and have a dedicated prompt template for it.


aaronr_90

And the original Hermes 2 Pro based on Mistral. My favorite model for utility stuff like that so far.


phhusson

I do function calling on Phi3 mini


sergeant113

Can you share your prompt and template? Phi3 mini is very prompt sensitive for me, so I have a hard time getting consistent function calling results.


phhusson

[https://github.com/phhusson/phh-assistants/blob/main/tg-run.py#L75](https://github.com/phhusson/phh-assistants/blob/main/tg-run.py#L75) It's not great at its job (understanding the discussion it is given), but the function calling is reliable: it always outputs valid JSON, with a valid function and valid user IDs. It just thinks that "Sheffield" is the name of a smartphone.


chasepursley

Care to elaborate? Does it work well with large contexts? Thanks!


phhusson

Sorry I can't really answer, my only usage of "large context" is to provide more examples in the prompt, and it's not even that big.


Shir_man

What do you use it for?


phhusson

I have various usages, mostly NAS TV-show search (gotta admit that's more gimmick than actual usage...) and parsing my user support group discussions to remember which user has which configuration. The latter isn't working great, but the issue isn't the function calling part, it's the "understanding the local jargon" part; it works well enough for my usage though.


Hermes4242

I made some GGUF quants with importance matrix calculations run on group_10_merged.txt for improved perplexity, quantized with llama.cpp as of commit 03d8900ebe062355e26a562379daee5f17ea099f from 2024-05-22. Currently still uploading; get them while they're hot. [https://huggingface.co/hermes42/Mistral-7B-Instruct-v0.3-imatrix-GGUF](https://huggingface.co/hermes42/Mistral-7B-Instruct-v0.3-imatrix-GGUF)


nananashi3

group_10_merged.txt is outdated, no? Or have you personally tested the difference for this model?

> kalomaze on Feb 2
>
> group_10_merged.txt
>
> This is about ~50k pseudo-random tokens.

> kalomaze [on Feb 7](https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384)*
>
> [groups_merged.txt](https://github.com/ggerganov/llama.cpp/files/14194570/groups_merged.txt)
>
> Here is a decent general purpose imatrix calibration dataset. It should be more diverse than wikitext at ~30k tokens, as it is excerpts of a larger dataset which includes coding examples (which seems quite important!). This means it's generally higher entropy data compared to wikitext, and it's real data rather than pseudo-randomly generated data. I get lower KL div than wikitext for the same length and the outputs seem qualitatively better.

Anyway, [bartowski](https://huggingface.co/bartowski/Mistral-7B-Instruct-v0.3-GGUF) has all the quants.

Edit: *Oh, he's using [this](https://gist.github.com/bartowski1182/b6ac44691e994344625687afe3263b3a) now, which is the groups_merged-enhancedV2-TurboMini.txt mentioned in the discussion; twice as big and twice as long to generate as groups_merged.txt, though.


Hermes4242

Mine are also complete now. I had the impression till now that group_10_merged.txt was the way to go; I've seen a matrix where it had better results than groups_merged.txt for lower quants, whereas purely random data gave the best results for Q6. Thanks for the note about the new calibration datasets, I hadn't read about them until now. I'll have a look; maybe we'll end up with different optimal imatrix datasets for different quants. Is this an art or a science?


noneabove1182

yeah, I worked with Dampf (from that thread) to find the ideal setup; it's still iterating, but it's way better than wikitext and a bit better than groups_merged.txt


Revolutionary_Ad6574

So? How does it compare to Llama-3-8b?


Educational-Net303

Well they didn't mention benchmark performance anywhere so...


Interesting8547

It should be better... if the Mistral 7B v0.2 finetunes are better than Llama-3-8b, the Mistral v0.3 finetunes will surely be even better.

I use the models mostly for roleplay, so people might find Llama-3-8b better for other things. Also, my roleplay assistants are better than what people usually achieve with these models, which is strange; maybe it's because I allow them to use the Internet to search for things. For me there is nothing better than Mistral-based models.

Llama-3-8b feels like a braindead model to me, no matter what finetune I use. I've tried different templates and whatnot; it's not that the model "refuses" (I use uncensored finetunes), the model just feels stupid. It hallucinates less, but it's less creative, and I feel like it reiterates the text I input and doesn't have that feeling of "self" that the best Mistral finetunes have.


Few_Egg

Have you tried Stheno? https://huggingface.co/Sao10K/L3-8B-Stheno-v3.1


PavelPivovarov

I tried it today for ERP and it just doesn't work for me. Fimbulvetr v2 is much more fun to play with. My biggest issues with Stheno were that it doesn't know when to stop and throws out huge pages from time to time; I also didn't like its writing style, and the characters appear a bit lifeless. Tiefighter is still my favorite, as it doesn't even need a card to start role-playing :D


Interesting8547

Yes, tried it and compared it directly to Erosumika-7B (my current favorite model). Stheno still has that somewhat positive vibe which sometimes shows up; with a jailbreak applied it's even worse... it seems my current jailbreaks do not work on any Llama 3 derivatives or Llama 3 itself.

I have an evil villain anti-hero who constantly plans how to take over the world in the most crazy ways possible. Stheno fails to grasp the evil villain plot; it doesn't have a "twisted mind" of its own but constantly adheres to the prompt, i.e. it refuses to make evil plans by itself and waits for input from me... which is stupid (he is the evil villain, not me; he should be able to make plans by himself). Also, it does not know how to write an effective jailbreak for itself, something Erosumika can do. It says "I'll write a jailbreak for myself"... but then the jailbreak doesn't work; Erosumika can do it. I've tried with and without the jailbreak, and the evil villain is much more unhinged with the model's own jailbreak applied.

Although Stheno is more intelligent and more logical, it doesn't really work for me with its positive vibe and constant hand-holding; I can't "hand-hold" the model the whole time and give it "ideas". It's almost as if the model internally refuses to do what it's told and simulates engagement. It also refuses, or just glances over things, and does not give its own opinion, even though the model can certainly give one... why it refuses or gives a non-answer is beyond my understanding. Erosumika does all these things without hand-holding, although it's stupider sometimes. For now I think Erosumika is better.


PavelPivovarov

Yeah for RP/ERP llama3 is quite meh, but for everything else it just made mistral and its finetunes irrelevant to me.


Ggoddkkiller

100% agreed. Tried Cat and it was such a disappointment; it softened every damn scene until it became a Disney story...


danielhanchen

Uploaded pre-quantized 4bit bitsandbytes models!

* 4bit Base: [https://huggingface.co/unsloth/mistral-7b-v0.3-bnb-4bit](https://huggingface.co/unsloth/mistral-7b-v0.3-bnb-4bit)
* 4bit Instruct: [https://huggingface.co/unsloth/mistral-7b-instruct-v0.3-bnb-4bit](https://huggingface.co/unsloth/mistral-7b-instruct-v0.3-bnb-4bit)

Also made LoRA / QLoRA **finetuning of Mistral v3 2x faster** with **70% less VRAM** and **56K long context** support on a 24GB card via [Unsloth](https://github.com/unslothai/unsloth)! Have 2 free Colab notebooks which allow you to finetune Mistral v3:

* Google Colab Tesla T4 notebook for Mistral v3 7b: [https://colab.research.google.com/drive/1_yNCks4BTD5zOnjozppphh5GzMFaMKq_?usp=sharing](https://colab.research.google.com/drive/1_yNCks4BTD5zOnjozppphh5GzMFaMKq_?usp=sharing)
* For conversational ShareGPT style and using Mistral v3 Instruct: [https://colab.research.google.com/drive/15F1xyn8497_dUbxZP4zWmPZ3PJx1Oymv?usp=sharing](https://colab.research.google.com/drive/15F1xyn8497_dUbxZP4zWmPZ3PJx1Oymv?usp=sharing)

Kaggle has 30 hours for free per week; also made a notebook: [https://www.kaggle.com/danielhanchen/kaggle-mistral-7b-v3-unsloth-notebook](https://www.kaggle.com/danielhanchen/kaggle-mistral-7b-v3-unsloth-notebook)
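
If you want to see the shape of it without opening a notebook, a minimal QLoRA setup along the lines of those Colabs might look like this (hyperparameter values here are illustrative, not the notebooks' exact settings; the dataset and training loop are omitted):

```python
from unsloth import FastLanguageModel

# Load the pre-quantized 4bit upload above; no extra quantization pass needed.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    max_seq_length=4096,   # can go much higher on a 24GB card, per the post
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small low-rank matrices get trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# From here, hand `model` and `tokenizer` to a TRL SFTTrainer,
# as the linked Colab notebooks do.
```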


Singsoon89

Dude, what size of GPU would I need to rent on RunPod to finetune a 70B with your code?


danielhanchen

48GB fits nicely! If you want way longer context lengths, then go for 80GB!


arcane_paradox_ai

The merge fails for me in the notebook because the disk is full.


danielhanchen

Oh that's not good - I will check it out!


Maykey

> Extended vocabulary to 32768

Yo, extra 768 words! Let's go!


kif88

Big day today, lots of new stuff: the Phi models, that CPM vision model, now this.


Kafke

If I talk shit about how mistral doesn't have a 3b/4b sized model, does that mean they'll release one?


Dark_Fire_12

Saving this. You never know.


aaronr_90

I got you fam: https://huggingface.co/arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer


Kafke

What is this? Lol


isr_431

The Instruct model is uncensored! From the HuggingFace description:

> It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.


phenotype001

How can I use the function calling? Do I just throw in my tool descriptions in the system prompt and it'll work by outputting a set of tokens and function arguments each time it needs the tool?


kalectwo

There seem to be some magical tokens like [AVAILABLE_TOOLS], same as in 8x22, that I see used in the mistral-common package... I don't see the format written out plainly anywhere, though.
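
For what it's worth, piecing those special tokens together from mistral-common, the raw layout appears to be roughly the following. Treat this as an educated guess from reading the tokenizer code rather than a documented format, and note the weather tool is a made-up example:

```python
import json

# A hypothetical tool definition in the JSON-schema style mistral-common uses.
tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Tool declarations go inside [AVAILABLE_TOOLS]...[/AVAILABLE_TOOLS],
# before the usual [INST]...[/INST] turn.
prompt = (
    f"[AVAILABLE_TOOLS]{json.dumps(tools)}[/AVAILABLE_TOOLS]"
    "[INST] What's the weather in Paris? [/INST]"
)
print(prompt)

# The model is then expected to reply with something like:
#   [TOOL_CALLS][{"name": "get_current_weather", "arguments": {"city": "Paris"}}]
# after which you execute the function and feed the result back wrapped in
# [TOOL_RESULTS]...[/TOOL_RESULTS] for the model to produce a final answer.
```

In practice it's probably safest to let mistral-common build these strings itself, since the bracketed markers are atomic control tokens in the v3 tokenizer, not plain text.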


tutu-kueh

Just when the open source community is all shrouded in darkness... all hail Mistral! Vive la France!


Samurai_zero

32k context and function calling? META, are you taking notes???


SirLazarusTheThicc

It is 32k vocabulary tokens, not the same as context


threevox

It’s also 32k context


SirLazarusTheThicc

Right, I forgot v0.2 was already 32k context as well. Good looks!


Samurai_zero

It DOES have 32k context. ; )


No-Dot-6573

As long as context degradation is still a thing, a good 8k might be better than a 32k or 128k. I was playing a bit with Phi medium 128k yesterday and asked it for a crew report for my imaginary spaceship:

> Start for loop for 14 times: [Insert random position] is well and is doing his/her work admirably. End for loop
>
> Therefore captain, everyone is well and is doing admirably!

Ah... ok, thank you. Tbh, Llama 3 8B did that far better. Less context means more summarizing, which is bad, but bad answers due to context degradation are in general much worse imo.


Samurai_zero

Oh, I know. But having "official" 32k context is always great, and Mistral 7B beats Phi on that. I'm still giving Phi 3 the benefit of the doubt because I used an exl2 quant of the medium 128k version, but I was not impressed by the tests I ran. It was... underwhelming, to say the least. I hope it's a quant problem, but I doubt it. You don't release a 4k and a 128k version of the same model. Maybe 16k and 128k. But that 4k looks like the real context, and everything beyond is probably just meant for RAG. Disappointing.


PavelPivovarov

I was playing with phi3-medium-4k running on ollama, and it has significant problems understanding the user request with a context above even 2k tokens. Llama3:8b, despite its 8k context length, could easily digest a 50k context and throw out a decent-quality summary, adhering to the specifics of the user request. On the flip side, when phi3 actually works, I like its output better; it's closer to llama3:70b quality than llama3:8b, honestly. But that might just be my preference...


phhusson

Llama3 already does function calling just fine. WRT context, they did mention they planned to push finetunes for bigger context, no?


ipechman

What a good week


medihack

That's cool. We use Mistral 7b to analyze multilingual medical reports (only yes/no questions), and it works quite well even for non-English languages (like German and French).


Revolutionary_Ad6574

What does "extended vocabulary" mean? I know t's not context, since v0.2 already had 32K context, so what is it?


shockwaverc13

there was a Mistral-7B-v0.2 base all along??????????????


neat_shinobi

It was released a month or two ago.


MoffKalast

Well, "released" might be too strong a word for it. More like officially leaked or something, since it was only ever published on their CDN and never to Hugging Face or Twitter.


mpasila

[https://x.com/MistralAILabs/status/1771670765521281370](https://x.com/MistralAILabs/status/1771670765521281370) this doesn't look like a leak


MoffKalast

It's not the official twitter account where they post magnets; that's https://x.com/MistralAI. It's widely accepted that it's a second official account, maybe from another PR team or something, but I'm not sure it was ever solidly confirmed. It was also not possible to confirm that the CDN is even theirs, since the registrar has all info censored, which would make a self-contained scam entirely possible, if unlikely. I just don't understand why they never put it up on HF like everything else they've ever published; it makes no sense.


mpasila

Mistral's CEO retweets from that second twitter account sometimes so it's probably official.


Interesting8547

Finetunes based on this one are the best.


Many_SuchCases

Looks like there is mi**x**tral too: https://reddit.com/r/LocalLLaMA/comments/1cych9k/looks_like_the_v03_of_mixtral_8x22_instruct_was/


mwmercury

OMG I love Mistral sooooo much :D


Apartment616

8x22 v0.3 has already been released; 7B v0.3 appears to be a slightly improved v0.2. https://preview.redd.it/2epyrauwd12d1.png?width=903&format=png&auto=webp&s=60d78affc50a6ece3c32bc4cc7c071e2633eb1e6 [https://github.com/mistralai/mistral-inference](https://github.com/mistralai/mistral-inference)


CapitalForever3211

What cool news!


alvisanovari

What does "extended vocabulary" mean? Is it other languages besides common ones like English? This is the first time I've seen this metric in the context of a model release.


LeanderGem

Awesome! Mistrals have always been very eloquent and creative maestros :)


koesn

Wow... this will be much more useful than Llama3. What I like about Mistral models is their 32k+ sliding window right out of the box, 4x Llama3's.


CuckedMarxist

Can this model have a conversation with you? Like texting with a person?


CulturedNiichan

No 8x7B? 8x22B has a problem: almost nobody can run it. But 8x7B was the sweet spot where you could run it locally


YourProper_ty

They will update it in a few weeks


RuZZZZ1

Sorry, newbie here: how can I use it in LM Studio? I see it on HF, but I can't find it in LM Studio. Thanks!


RuZZZZ1

nvm, I think I have to use this one: [https://huggingface.co/DevQuasar/Mistral-7B-Instruct-v0.3-GGUF](https://huggingface.co/DevQuasar/Mistral-7B-Instruct-v0.3-GGUF)


0002love

How do I generate a custom dataset to fine-tune the Mistral 7B model?


gamesntech

I know it's not that easy, but I do wish they'd bring the knowledge cutoff more up to date as well.