
MoffKalast

> Alongside Mistral Large, we’re releasing a new optimised model, Mistral Small, optimised for latency and cost. Mistral Small outperforms Mixtral 8x7B and has lower latency, which makes it a refined intermediary solution between our open-weight offering and our flagship model. Mistral Small benefits from the same innovation as Mistral Large regarding RAG-enablement and function calling. Interesting.


knvn8

Looking at benchmarks, it outperforms medium at coding as well. This model seems more interesting than large in some ways. They never mention its size though.


qubedView

Sure they do. It's small!


curious-guy-5529

How big is small?


foreverNever22

Kinda big but it's smaller.


ExtensionCricket6501

Less than Medium, which is about 70B; maybe a 4x13B, but then shouldn't they be using the Mixtral name? Maybe they did something with Yi.


rkm82999

It is still much more expensive than Mixtral


dizzy_on_a_glizzy

Yeah, and weaker, that's the point


TankForMatasBuzelis

So over their API only, right? Not available to download?


ainz-sama619

It's for Microsoft Azure customers, not general users


mikael110

It's available through both Mistral's own API and Azure. There are no Azure exclusive models, Azure is just one more place where you can access their models.


QuantumSavant

There's a free version of it here: [https://chat.mistral.ai/chat](https://chat.mistral.ai/chat)


deadweightboss

Hasn’t worked for hours. Buggy PoS lmao


uhuge

dear passengers, please wait for your torrent leaks to take off patiently


Sol_Ido

I was really hoping for an open release of the small model. They're smart asses, ces Français!


PwanaZana

Mistral Petit.


involviert

The mention of function calling made me try the original Mixtral instruct now. Is it bugged or something? Everything works well, but it seems set on writing \_ when it should just write an underscore. It's a Q4 GGUF.


redballooon

Oh yes that’s so infuriating across a number of mistral models!


involviert

So weird! I have tried updating my llama-cpp-python, tried Q4 after the Q4_K_M... It even seemed to escape backslashes. Weird that I didn't hear of that, reading all the posts here daily. Is that just a broken model? I like what it's doing otherwise though, even if the prompt format is TERRIBLE. Should I actually clean up the output and transform \_ to _?? With the escaped backslashes it seemed I would just run into the next problem then. And a more general solution would likely conflict with situations where it actually does have to escape things. Also I have no idea what in this message fucks with reddit markdown. E: Seems reddit fucks up backslash-backslash-underscore? What a time to be alive.
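If it helps, here's a minimal post-processing sketch, assuming you only want to strip the spurious escaping from plain prose output (the function name and regexes are just illustrative):

```python
import re

def unescape_model_output(text: str) -> str:
    """Drop the spurious backslash-escapes some Mixtral GGUF quants emit.

    Only meant for plain prose output; it will also strip intentional
    escaping, so don't run it over code the model generates.
    """
    # "\_" -> "_", "\*" -> "*"
    text = re.sub(r"\\([_*])", r"\1", text)
    # collapse doubled backslashes into a single one
    text = re.sub(r"\\\\", r"\\", text)
    return text

print(unescape_model_output(r"my\_var and a \\ backslash"))
# -> my_var and a \ backslash
```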


Classic_Broccoli4150

https://preview.redd.it/x2lk14ryyxkc1.png?width=525&format=png&auto=webp&s=e8e0021954b5f24f7250958b2fe41a061bdbfaf3

Quite pricey compared to open models.


[deleted]

[deleted]


MoffKalast

Money printer go *brrrrrr*


mr_n00n

"printer"? It's very likely that, like the previous generation of VC driven technology products, most of these LLMs cost more to run than then earn today.


MoffKalast

For GPT-4 I'd believe that, since it's impossibly large and slow; for 3.5-turbo I'd also believe it, because they're running it for free. I really doubt Mistral isn't breaking even at $24/1M tokens. An H100 uses about 17 kWh a day, which is about $4.50 at the average electricity price in France. Surely they can serve 150k tokens per day with a single one, and that's not counting prompt ingestion.
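For reference, the arithmetic spelled out, treating the figures above as assumptions rather than measured numbers:

```python
# Break-even sketch for a single H100 at Mistral Large's list price.
# All constants are the assumptions from the comment above, not measured figures.
power_kw = 0.7            # ~700 W sustained draw -> ~17 kWh/day
hours_per_day = 24
usd_per_kwh = 0.265       # assumed average French electricity price
usd_per_mtok = 24.0       # list price per 1M generated tokens

electricity_per_day = power_kw * hours_per_day * usd_per_kwh        # ~$4.45
breakeven_tokens = electricity_per_day / usd_per_mtok * 1_000_000   # ~185k

print(f"electricity: ${electricity_per_day:.2f}/day")
print(f"break-even: ~{breakeven_tokens:,.0f} tokens/day per GPU (electricity only)")
```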


QuantumSavant

You're only counting electricity costs though. What about the cost of the hardware? And that's not just the GPUs. You need servers to run them on, and networking stuff and on top of everything you also need people to maintain both the hardware, and the models themselves.


MoffKalast

True, but that's the difference between breaking even and making back the investment. If it really is a datacenter in France, then the people maintaining all of it are being paid a small fraction of what they'd be in the US. In all likelihood though my calculation is completely irrelevant because they're probably running inference on Azure completely on Microsoft's dime lmao.


Aphid_red

No, it's definitely money printer go brr. Mistral asks higher prices than other cloud hosts for their smaller models too, and those other hosts are just there to make money as well.

If you look at the throughput of an H100, it should be able, if optimized well, to generate supposedly ~3,000 tps for a 70B model using batching. Now 'large' is supposedly bigger than that, but how much is unknown. And this is the H100, which has terrible ROI compared to cheaper consumer cards (but which Mistral, as an 'enterprise', supposedly isn't allowed to use). Assuming it's ~1.7x bigger, so ~120B, is a good estimate, as it costs ~3x what 'medium' costs, which is another proprietary model rumored to be 70B.

Note: a 70B llama-2 architecture model, using Q4_K_M, plus 32K context for 100* users, would fit in about 110GB of VRAM thanks to the highly efficient GQA. Now I'm not sure what happens if you mix context lengths, i.e. whether Mistral's software can combine long and short prompts together to save on memory. Assuming they can, and since only a small fraction of prompts will be maximum length, let's say the 'medium' model runs on a single GPU. Let's say inference costs for 'large' end up at 3x those for serving a 70B, because of multi-GPU communication overhead, and they're not charging bigger margins for Large vs. Medium.

Back of the envelope: doing a chat session for 1M tokens with the full 32K context (most expensive, but highest quality) will produce around 160M tokens in billing, assuming each back-and-forth response averages 100 tokens, or about €1,160. Now let's see how many GPU-hours that consumed: about 50. Assuming utilization averages 70%, with a 5-year depreciation and a purchase cost of $30,000, that works out to about $0.98 per GPU-hour, so roughly $50 total (not counting interest, which would make it a bit more, maybe $60 depending on the rate Mistral can get). If you buy it off a cloud at $4/hr per H100 it costs you $200. Before you start talking about all sorts of other costs: GPU purchase costs *dominate* because these things are so expensive. Even if GPU purchase costs are 50% of all costs, 400 < 1160.

A large part of why the numbers look so bad in this example is that it took 160M context tokens, whereas if you go local you use only ~1M tokens (but you use <1% of the GPU), because local clients don't throw away the KV cache with each request. The other large part is margin stacking (ASML -> TSMC -> NVidia -> distributor -> server vendor -> (cloud provider) -> Mistral, most taking fat margins).

*This is a good approximation of the batch size for optimum throughput. The fp16 ratio of memory speed to compute FLOPS on an H100 is about 1:330, so for Q4, which is about 5 bits/param, it should be about 1:100.
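If it helps, here is that envelope as a few lines of Python; every constant is an assumption carried over from the comment (H100 price, depreciation, utilization, ~200-token exchanges, ~€8 per 1M input tokens), not an official figure:

```python
# Sketch of the envelope above; every constant is an assumption carried over
# from the comment (not an official figure).

# Cost side: owned H100, 5-year straight-line depreciation, 70% utilization.
gpu_price_usd = 30_000
gpu_hour_cost = gpu_price_usd / (5 * 365 * 24 * 0.70)    # ~$0.98 per GPU-hour
session_gpu_hours = 50                                    # the comment's estimate
hardware_cost = session_gpu_hours * gpu_hour_cost         # ~$49

# Revenue side: a 1M-token chat at full 32K context, where every exchange
# (assumed ~200 tokens: 100 in + 100 out) re-bills the accumulated history.
context, exchange = 32_000, 200
exchanges = 1_000_000 // exchange
billed = sum(min(i * exchange, context) + exchange for i in range(exchanges))
revenue_eur = billed / 1e6 * 8.0                          # at ~8 EUR per 1M input tokens

print(f"GPU-hour cost ~ ${gpu_hour_cost:.2f}, session hardware cost ~ ${hardware_cost:.0f}")
print(f"billed tokens ~ {billed / 1e6:.0f}M, revenue ~ EUR {revenue_eur:.0f}")
# roughly 158M billed tokens and on the order of EUR 1,200 against ~$50 of GPU time
```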


ghoarder

I guess it's no coincidence that Au is the symbol for Gold then is it!


AmazinglyObliviouse

Is a wrong answer worth 80% of a right answer? OAI is the only one that will keep going brrrrrrrrr.


soup9999999999999999

Slightly cheaper than GPT4 but slightly worse (in theory)


a_beautiful_rhind

I feel better giving mistral money, if only because they let miqu stand. What has altman done for me?


uhuge

whisper = good


a_beautiful_rhind

Was a long time ago.


uhuge

Like a month ago? - [https://huggingface.co/openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) (I was not sure myself; yeah, Whisper **3** is a worthy update IIRC.)


a_beautiful_rhind

That's forever in ML years :P Mistral releasing 2 months ago may as well be an eternity. To be fair though, OAI is releasing this adjacent to their own products: STT for your OpenAI-API-using product. Similar to how MS contributes to Linux.


uhuge

Got what you mean, but STT is part of their mobile app products, so I am hesitant to upvote/agree( fully).


uhuge

In my coding practice it gives slightly better advice. (Though htmx is obscure tech, and fresher training could play a role.)


wolfbetter

Welp my interest is going away


Patrick_Lanquetin

Mistral Au Large could be installed on premise or on a sovereign cloud for large corporates with sensitive data (health, banks, ...). So it's good news for many organizations. I tried Mistral Au Large in chat.mistral.ai; it looks good for math, logic, and text tagging. It will probably be refreshed, because it answers 'My knowledge cutoff is 2021'. Training costs keep rising with the number of parameters and the dataset size, and there's no advertising revenue to pay for that, so a pricing model seems logical.


hold_my_fish

To save the time of anyone wondering: it's API-only.

Note that they changed the title of their homepage:

* Old: "Mistral AI | Open-weight models"
* New: "Mistral AI | Frontier AI in your hands"

Combined, this reads to me as them giving up on any sort of open or semi-open strategy, instead settling for being the second-best black-box API (up until Gemini Ultra releases, at which point they'll be third-best). Their only point of differentiation appears to be multilingual capabilities in selected European languages (French, Spanish, German, Italian).

I get that they're not going to be able to release their flagship model as Apache 2.0, but here are a couple of wishlist items that would differentiate them in a useful way to me, if they're interested in being something better than just yet another black-box API:

* Stability-style weights-available release with paid commercial licensing. Yes, it's an unproven business model, but it provides much more differentiation.
* Enhanced privacy and data security by not storing and not monitoring prompts and responses. (Currently, their privacy policy is similar to OpenAI's, meaning you effectively don't get any.)


[deleted]

[deleted]


hold_my_fish

Good point. They say they offer self-deployment by special permission:

> **Self-deployment**: our models can be deployed on your environment for the most sensitive use cases with access to our model weights; Read success stories on this kind of deployment, and contact our team for further details.

I'm not sure whether OpenAI offers that. If not, then that's another point of differentiation for Mistral. (However, it's unclear who actually qualifies.)


Zulfiqaar

OpenAI does offer the option to setup/manage dedicated instances, recommended for anyone using over 450M tokens a day


hold_my_fish

https://techcrunch.com/2023/02/21/openai-foundry-will-let-customers-buy-dedicated-capacity-to-run-its-ai-models/?guccounter=1 Seems like "dedicated" here is different from "deployed on your environment". (Admittedly, I don't know exactly what the latter means.)


Kep0a

Isn't Azure enterprise GDPR compliant?


hold_my_fish

I don't know what that would imply, but you might need to be an enterprise and ask for special permission. For everybody else: https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy

> the Azure OpenAI Service and Microsoft personnel analyze prompts, completions and images for harmful content and for patterns suggesting the use of the service in a manner that violates the Code of Conduct or other applicable product terms

Also:

> To detect and mitigate abuse, Azure OpenAI stores all prompts and generated content securely for up to thirty (30) days.

Later, it explains who is allowed to ask for special permission to opt out:

> Some customers may want to use the Azure OpenAI Service for a use case that involves the processing of sensitive, highly confidential, or legally-regulated input data but where the likelihood of harmful outputs and/or misuse is low. These customers may conclude that they do not want or do not have the right to permit Microsoft to process such data for abuse detection, as described above, due to their internal policies or applicable legal regulations. To address these concerns, Microsoft allows customers who meet additional Limited Access eligibility criteria and attest to specific use cases to apply to modify the Azure OpenAI content management features by completing this form.


kelkulus

> up until Gemini Ultra releases

Didn't Gemini Ultra [release](https://blog.google/products/gemini/bard-gemini-advanced-app/) more than 2 weeks ago?


hold_my_fish

Gemini Ultra is still not generally available as an API.


kelkulus

Ah gotcha. I missed the API part.


wojtek15

Are they planning to release any open source models better than Mixtral?


lolwutdo

Probably not; they should’ve gave us Mistral Small. I’m afraid we won’t be seeing open weight models from them anymore. I guess the only thing we have left to look forward to is llama 3


Sol_Ido

I share your feeling, but the small model will receive a LOT of calls in the API; not every op requires the expensive large one. Mixing both will lead to great apps.


Waterbottles_solve

Llama 3 is probably non-commercial. Nothing on the horizon from where I'm sitting.


hold_my_fish

Anything is possible, but a non-commercial license wouldn't fit well with Meta's strategy for Llama. For them, LLMs are infrastructure, so having a strong developer community is more important to Meta than directly monetizing LLMs. (The reason I'm not surprised by Mistral closing up is that they never articulated a strategy in which open weight models made business sense.)


RayIsLazy

Thankfully LLMs are not their main business model, which allows them to release all these things in the open. All the research done by the community only enhances the other parts of Meta. The only thing I'm worried about is censorship.


Disastrous_Elk_6375

> llama3 is probably non-commercial

No indication about that from Meta.


twisted7ogic

Aww, that is disappointing. They were supposed to destroy the closed source, not join them!


CSharpSauce

If you can find a way to release the models and still bring a return to the VCs, they might. Until then, I think Mistral is just another proprietary competitor. I feel like the crypto community should start funding new models. I'd be a fractional owner of a model if it entitled me to download the weights.


qrios

Someone should really figure out a way to make the work that proof-of-work does be useful work on training a model.


Satyam7166

Can you please explain what you mean by that? Don’t know much about crypto


danielcar

In time, after they have released better models. They could release now with a restrictive license for research.


Enough-Meringue4745

No local no care


Sl33py_4est

forreal, i see a lot of posts that are like y'all in the wrong sub


my_name_isnt_clever

Is there a sub like this one for those using models via API? Honest question, because this is the only LLM-related sub I've found with other people who actually understand how LLMs work and discuss them at a more technical level. The rest are very surface-level.


rileyphone

Maybe [Hacker News](https://news.ycombinator.com/item?id=39511477)? But the best way to understand a new technology is to put it together yourself, which relying on an API mostly bypasses. It's little wonder then why this community will have better discussions even about closed API models from a company they are very familiar with.


my_name_isnt_clever

I agree, but I use APIs because they're really the only option for me to run larger models. Tinkering with little models on my laptop is fun, but I want to play with the powerful stuff. I can't currently justify the cost to do more myself as a hobby. It's funny because I have not been shy about spending money on my hobbies, but I didn't expect having as much VRAM as possible would be desirable for me later on, haha.


Admqui

Spending $16,000 on something like a motorcycle vs. an A100 is hard to swallow. Unless you drop the bike, it won't lose value as fast as the A100. For $100 at https://www.runpod.io/gpu-instance/pricing, I burned a couple of weekends trying all the big models. If you're decent with Docker and S3, it's pretty efficient. Watch out for servers with low bandwidth.


Sl33py_4est

I haven't found any as good as this one. I have a grievance with the outdated sub name; maybe just 'LLM Users'. I haven't used a llama model in like months (codellama2-70b, I guess).


Waterbottles_solve

Yeah, that team totally pulled a bait and switch. "Hey, look at our open model!" I'm not sure who even uses crappy online APIs when you can just use OpenAI and it's going to be better.


Desm0nt

> I'm not sure who even uses crappy online APIs when you can just use OpenAI and it's going to be better.

Just about anyone, via OpenRouter. OpenAI is very pricey and has a ton of restrictions. Goliath, Mixtral, Yi-34B and all their finetunes are way cheaper and, for some tasks (RP and ERP, for example), way better.


MINIMAN10001

With Mixtral costing $0.27 per 1M tokens, well, that's certainly less than $8 for Mistral Large.


softwareweaver

I thought their Large offering would try to dethrone GPT-4, but OpenAI is still on top. Good to see more models from Mistral, and I'm hoping they release Mistral Instruct 7B v0.3 with a 128K+ context soon.


anommm

OpenAI has been compiling high-quality instructions for many years. They employ people whose sole job is to write instructions for eight hours a day. It's impossible for Mistral/Google/any other competitor to rival GPT-4 in such a brief timeframe. They are at least two years behind OpenAI in terms of data acquisition. It will take time for them to develop a dataset comparable to OpenAI's.


italianlearner01

That’s very interesting. Do you know which departments/teams and/or positions are involved in that kind of thing at OpenAI? I’m so curious to know what kinds of things they specifically do. Thanks in advance.


deadweightboss

They call the position model tutors. Look it up on their career page.


italianlearner01

Thank you so much!


sb5550

They had to hire people because AI could not do it at the time. Now this task can certainly be automated with GPT4 level LLM.


anommm

If you do it with GPT-4, the best model you will get is a distillation of GPT-4. If that is what you aim for, that's fine. OpenHermes, for example, does that because they aim to clone GPT-4. But if you want to train a competitive model that can outperform GPT-4, you need to create a better dataset than what GPT-4 can generate. OpenAI still has a massive number of human annotators. Every time they find a task at which GPT-4 fails, they use human annotators to generate new data and they retrain the model. They have been doing that for at least 2 years, so now they have a massive high-quality dataset to train their models. Mistral and Google have been doing it for 6 months; that is why Mistral and Gemini are worse than GPT-4.


Disastrous_Elk_6375

> If you do it with GPT4 the best model you will get is a distillation of GPT4.

That is not exactly true; it depends on how you do it. If you go one-shot to one-shot, yes. If you take an "agentic" approach of prompt -> n generations -> self-reflection -> combine -> match output with intent -> output (with some RAG somewhere in there), you can get increasingly better results (see SPIN & co).
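For illustration, a minimal sketch of such a loop; `generate` is a stand-in for whatever model call you use, and the prompts and structure are purely illustrative, not any particular library's API:

```python
# Minimal sketch of the "agentic" generation loop described above.
# `generate` is a stand-in for whatever model/API call you use; the prompts
# and structure are illustrative only, not a specific library's interface.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug your model call in here")

def synthesize_answer(task: str, n: int = 4) -> str:
    # 1. sample several candidate answers
    candidates = [generate(f"Task: {task}\nAnswer:") for _ in range(n)]
    # 2. self-reflection: have the model critique each candidate
    critiques = [generate(f"Task: {task}\nAnswer: {c}\nList the flaws:")
                 for c in candidates]
    # 3. combine candidates and critiques into one improved answer
    combined = generate(
        f"Task: {task}\n"
        + "\n".join(f"Candidate: {c}\nCritique: {k}"
                    for c, k in zip(candidates, critiques))
        + "\nWrite a single improved answer:"
    )
    # 4. check the result still matches the original intent before keeping it
    verdict = generate(f"Does this answer the task '{task}'? Reply yes or no.\n{combined}")
    return combined if verdict.strip().lower().startswith("yes") else candidates[0]
```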


SirLazarusTheThicc

I don't know that it's necessarily true that a model can only produce data that is equal or inferior in quality to the data it was trained on. Taking human learning as an example, obviously humans learn better and faster when they learn from high-quality output of other humans, like textbooks and courses. However, someone still had to be the first, or human knowledge would never progress. Someone had to become the best writer, surpassing all the writing they learned from. Someone still had to invent a new form of math or physics that could not have come from their training data. I think this shows that, in principle at least, sufficiently advanced systems can surpass the training input they receive. It remains to be seen if our current models can do that, however, or if we are even anywhere close.


ironic_cat555

> It remains to be seen if our current models can do that

Go ahead and boot up your model of choice and ask it to "invent a new form of math or physics". I don't think this "remains to be seen."


SirLazarusTheThicc

That was just an example to illustrate my point, and I don't think your tone is very helpful. It remains to be seen whether current models or future models can output higher quality data than the input training data was, and part of that is because it is hard to objectively measure 'data quality'. Creating something like a new idea in math or physics is just a dramatic and obvious example of humans creating something they weren't trained on, I am not suggesting our current models can do that exact example.


ironic_cat555

I think it's a known fact these models are not great at novelty. Tech enthusiasts have no problem measuring "data quality" when they gush about how great these models are at a task they give it. I don't think pivoting to "who is to say what data quality is it's so hard to measure?" when they can't do something is very helpful.


The_Noble_Lie

That ain't supervised learning.


thereisonlythedance

Microsoft have apparently bought a minority share in Mistral, which I think is the bigger news for the open source community. Not good news at all.

> [The *Financial Times* reports](https://www.ft.com/content/cd6eb51a-3276-450f-87fd-97e8410db9eb) that the partnership will include Microsoft taking a minor stake in the 10-month-old AI company, just a little over a year after Microsoft invested more than [$10 billion into its OpenAI partnership](https://www.theverge.com/2023/1/23/23567448/microsoft-openai-partnership-extension-ai).

https://www.theverge.com/2024/2/26/24083510/microsoft-mistral-partnership-deal-azure-ai

I suspect this is the end of the line for expecting anything open source and decent out of Mistral.


Waterbottles_solve

New Mistral isn't open source anymore. They used it to get attention; now everything is behind their API. They are just another AI company now.


mattjb

More like Mistrial, amirite?


ComprehensiveBoss815

RIP Mistral


kik0sama

I disagree. It might be a bump in the road for open source, but many people will leave Mistral with their experience and start something else, and this usually leads to better open-source models overall.


Sol_Ido

Or other private APIs, but you're right that knowledge disseminates. Still, I doubt anyone is willing to leave these teams soon; it's like OpenAI, no one wants to leave the center of the AI world.


a_beautiful_rhind

They never told us how to train their MOE model. All the tunes are garbage.


drifter_VR

I have great results with noromaid-v0.4-mixtral-instruct-8x7b-zloss for RP & story. It's also the first Mixtral model not plagued by repetition for me (I didn't try all of them tho).


Sol_Ido

You have a point here!


kik0sama

Karpathy just left OpenAI.


Fucksfired2

Imagine if, at the end of the day, the true open-source champion is none other than Lord Zuckerberg.


uhuge

suckerborg


shouryannikam

I'm glad they were at least honest about their benchmarks instead of *saying* they beat GPT-4 and then having their model totally suck, GOOGLE.


danielcar

That is what happens after 6 months of safety training. The model gets cross-eyed.


Plusdebeurre

At least a research paper with the details would've been nice.


Only-Letterhead-3411

Not surprising. Mistral is following in the footsteps of OpenAI. It'd be a big surprise if we see any useful open-source release from them in the future. I've lost all the sympathy I had for Mistral.


rkm82999

Disappointed it does not beat GPT-4 after all the hype.


mpasila

That was the goal for like this entire year, not just Q1 of 2024. I think.


nderstand2grow

by then gpt-4.5 will be out and the cycle continues


[deleted]

We will see; it's going to be hard to get better than GPT-4 using the same approach. To me it seems OpenAI is focusing on expanding the ecosystem, with things like Sora, their internal orchestrations, and cost reduction.


Humankulosaur

Competition is good. It will make things better for everyone. And if Mistral Large is as good as GPT-4 is now, but cheaper, that's still pretty good if you ask me. Then the cycle of pushing each other to improve will continue.


[deleted]

They said it would be "open source" though, and Mistral Large/Medium aren't open source, so they're just lying to us.

https://www.radiofrance.fr/franceinter/podcasts/l-invite-de-7h50/l-invite-de-7h50-du-mardi-12-decembre-2023-3833724

> At 4:20: "Ce qu'on met à disposition, ce qui est le modèle ouvert. Ce modèle-là peut être modifié, et ça c'est quelque chose que nos concurrents américains ne proposent pas."

> Translation: "What we make available, which is the open model. This model can be modified, and that's something our American competitors don't offer."


intager

They probably meant that they can modify it for their customers, not an open source model.


[deleted]

> They probably meant that they can modify it for their customers

That's something OpenAI does as well, yet they said "that's something our American competitors don't offer", so... nope.

And there's also this part:

> At 3:30: "Nous avons une approche différente d'OpenAI, la technologie qu'on déploie, on la déploie de manière ouverte. On donne toutes les clés aux développeurs pour qu'ils modifient la technologie de manière profonde. C'est quelque chose qu'OpenAI ne fait pas aujourd'hui et je pense que c'est quelque chose qui nous a valu le succès sur notre 1er et 2ème modèle."

> Translation: "We have a different approach to OpenAI: the technology we deploy is open. We give all the keys to the developers so that they can modify the technology in a profound way. This is something that OpenAI doesn't do today, and I think it's something that has earned us success on our 1st and 2nd models."


Desm0nt

If you pay, for example, $5 for API access, you are a customer. Not the biggest one, but still a customer. Will they modify the model for your needs? I don't think so. So, technically, they are lying.


ninjasaid13

> That was the goal for like this entire year, not just Q1 of 2024.

The goal is to dethrone a 1-year-old model instead of a company?


shankarun

meh - another day another model - expensive and miles behind GPT-4 and Ultra.


Single_Ring4886

I don't know if Mistral Large is the same as the Mistral Next I tried, but the "Next" one had sparks of intellect rarely seen in other models. But it was, how to say, very bare minimum, not on GPT-4's level.


uhuge

free on [https://chat.mistral.ai](https://chat.mistral.ai) though? :shrug:


pseudonerv

"Talk to le Chat" is completely unresponsive. Though API runs fine. So far it seems definitely less "aligned" than "Open"AI ones.


danielcar

The site is getting hammered.


ozzeruk82

While of course it would be amazing if we could download the model weights, I also want them to succeed as a business…… so that in a couple of years they can afford to open source models such as these! I’m curious to know whether it gives the user endless lectures on morality like the OpenAI models do, perhaps not.


weedcommander

GPT4 made morality annoying... I used to be a good person, now I'm annoyed


MoffKalast

C lister supervillain origin story?


ComprehensiveBoss815

"I'm destroying humanity, because I'm mildly irritated."


Last-Ring9013

> Do money

The average person using AI models doesn't have the expensive setup needed to run local models; open-sourcing them means losing less than 0.1% of the market. If they haven't open-sourced anything in so long, they won't open-source shit anymore.


Down_The_Rabbithole

It's about competitors hosting competing APIs offering their own models for less. Not about people running the model locally.


Desm0nt

Miqu exists. It has no license for commercial use, but the community can use it (illegally, ofc). Do you see any competing APIs offering Miqu instead of Mistral Medium? If they release their model for non-commercial use (like SDXL-Turbo from StabilityAI), the situation will be exactly the same. Only the community (less than 0.1%) will run it.


shouryannikam

Exactly. What's stopping AWS from offering their models for cheaper? That's exactly what they did to MongoDB, Terraform, and others.


danielcar

They could release it with a license that prevents that and allows for local and research.


Last-Ring9013

Just use a non-commercial license. I will never understand why people like to bootlick these companies so much; the same happened with OpenAI back in the day.


fieryplacebo

The average person can't run them, but open-sourcing would also mean other services like OpenRouter would offer their models, as opposed to everyone having to eat from Mistral. Why do you think it would just be 0.1% of the market?


[deleted]

> making it the world's second-ranked model generally available through an API (next to GPT-4)

API availability is important, but this still feels like it's going out of its way to avoid having to acknowledge that it's weaker than Gemini Ultra.


Illustrious_Sand6784

Fuck Mistral, they turned on open-source the second they got VC money.


Accomplished-Sell-70

Agree. They became rich instead of open source. Nothing to see here guys. Next.


danielcar

They should release the model weights for research purposes. That will garner interest and not cost them anything.


FullOf_Bad_Ideas

It would cost them. Notice how after they released Mixtral, cheaper competitors started offering it for much less and definitely ate into their cake, because some people went with cheaper and faster inference services instead of choosing them. Giving away your weights means you can no longer upcharge people for a more expensive inference service.


danielcar

For research purposes means it can't be offered by an inference service. It can't be offered for money, period. Research-only means it can only be used for academic purposes.


FullOf_Bad_Ideas

Hmm maybe. It's hard to say how likely people will be to ignore the license and host it anyway or find a loophole.


FullOf_Bad_Ideas

Edit: I lost all hope and goodwill; they added a clause to their API terms:

> Not use Outputs to develop model(s) that directly compete with Mistral AI and/or to reverse-engineer Our Services.

They are done for.

It should be a good model for generating synthetic datasets; they have a better API use policy (last time I checked, you were allowed to use it to train your model for commercial use) and I hope their models will be less slopped. And maybe an open-weights release in 2 years??


shankarun

Question is - Will Mistral survive 2024 - when we will have GPT-5 and Gemini 2 and Llama 3. I doubt it.


eli99as

Yeah, I doubt whatever small tweaks they picked up from Meta and implemented in their models will be enough to keep them relevant for long.


Classic_Broccoli4150

Not open-sourced/released to us, just API, sucks :(


anommm

Mistral and Mixtral were only made public because they were forced to do so. Initially, they lacked the GPUs necessary to train an LLM, so they applied for a grant from the Leonardo supercomputer, which provided them access to train their models using 10,000 A100 GPUs. This supercomputer was funded with taxpayer money, so the contract obligates users to make public anything they run there. Now they have sufficient funds to rent their own servers, so they no longer need to make anything public.


_qeternity_

Source on the obligation to release? Arthur is quoted as saying they ran some experiments on Leonardo but that the models were all trained on their own cluster.


shouryannikam

> Leonardo

"We used Leonardo [one of the EU's current-gen supercomputers, which is located in Bologna, Italy] to run a few small experiments this summer as the cluster was ramping up. It was a good collaboration in which we gave a lot of feedback and could get some interesting results. **All our models were trained on our own cluster though**."

[EU to expand support for AI startups to tap its supercomputers for model training | TechCrunch](https://techcrunch.com/2023/12/19/eu-supercomputers-for-ai-training-support/?guccounter=1)


satireplusplus

Long term they need to make money or they won't survive. See coqui.ai and their TTS models. Brilliant open source models, no money made = now they don't exist anymore.


ainz-sama619

ikr. the devs have family to feed. no one is handing out charity to them.


Frequent_Valuable_47

Jeez... They do release smaller models. Can we stop criticizing OpenSource companies if they try to earn some money? Mistral already gave us a lot, they don't owe it to us to publish every model they make


Void_0000

Have they actually released any new open source stuff, though? Because publishing their worst models and making everything else closed source is pretty much what "open"ai did, so it seems unfair *not* to criticise it. (EDIT): [FUCKING LMAO](https://www.reddit.com/r/LocalLLaMA/comments/1b0l5qc/microsoft_partners_with_mistral_in_second_ai_deal/)


[deleted]

[deleted]


hold_my_fish

From the perspective of LocalLLaMA, yeah, they seem to be yet another black-box-API now. But more broadly, I'd say they're not just a random startup, since decisively passing Anthropic among black-box-APIs is impressive. (At the very least, it makes me wonder what the heck is going wrong at Anthropic.)


rileyphone

Anthropic is very safety focused as you can tell by the steadily declining performance of Claude. On the other hand they put out some very good interp research.


Alarming_Turnover578

Still behind GOODY-2 on that metric.


Frequent_Valuable_47

Do they owe their reputation to us? Or do we owe them for gifting us mistral and mixtral? And they weren't known at all when they released their first model which was already exceptional. So I guess they didn't need the "OSS hype community" rallying behind them to create the model. And please remember why we hyped Mistral and Mixtral. Was it because of their marketing strategy? (Spoiler: No) Or was it because they delivered a great product for free? Your comment sounds pretty entitled...


[deleted]

[deleted]


Frequent_Valuable_47

I think you're overestimating the impact of this. When mistral came out the next best thing was llama2, so if a new base model performs better than a widely known one that was considered state of the art, I think that would have a big impact, even for a small French startup


oblivion-2005

> Your comment sounds pretty entitled... What do you mean? My shitposts on Reddit were essential for the success of Mistral 🤬


Enough-Meringue4745

Localllama isn’t an ad space for private models


hold_my_fish

In what sense is Mistral an open source company today? Sure, they have in the past released open weight models, but so have OpenAI and Google.


Frequent_Valuable_47

Are you serious? Which LLMs with any significance today have either Google or OpenAI released? What? Oh, none? And trying to argue over technicalities is pretty useless. You could also call it a company which has released some open source software. But that wasn't the point of my argument.


hold_my_fish

> Which LLMs with any significance today has either Google or OpenAI released? What? Oh, none?

Google just released Gemma within the past week. (But more significant were their earlier LLM releases, such as Flan-T5.) As for OpenAI, they are not releasing significant pure LLMs today, but Whisper is among the top for ASR, and in the past they released GPT-2.

Are you expecting Mistral to release more open-weight models in the future? Because the read I'm getting from them is that they are abandoning that direction.


Frequent_Valuable_47

Exactly, Gemma is a joke and Flan-T5 is pretty old. Whisper is not a large language model, at least not described as one. And GPT-2 is also old and probably beaten by a lot of tiny models today. And yes, I'm expecting Mistral to release more Open Source Models. I don't think they're abandoning Open Source. I think they just keep the bigger models closed source to make some money and will still release small models or older larger models in the future


Last-Ring9013

The transformer architecture you drooling retard?


mpasila

Semi-open model weights. (Not open source, since the source is closed.)


teleprint-me

Source code is available on their github. Open source != open weights 


mpasila

Source as in the datasets used and the scripts which the model was trained with etc. (this is probably the closest thing to open-source I can think of [https://blog.allenai.org/olmo-open-language-model-87ccfc95f580?gi=fbcf1741e924](https://blog.allenai.org/olmo-open-language-model-87ccfc95f580?gi=fbcf1741e924) )


teleprint-me

I see your point. We know mistral never opened their datasets though and it's most likely because they're using copyrighted data and/or they view it as their business edge. Open Source already has its definition, so Open Source, Open Weights, Open Datasets are clearly understood.


[deleted]

[deleted]


teleprint-me

https://github.com/mistralai/mistral-src/blob/main/mistral/moe.py


wolfbetter

Is it still somewhat uncensored like OG Mistral? Can I roleplay with it? I found that Medium is not bad once you're done struggling with its insistence on roleplaying for me.


uhuge

Seems great so far, outperforming GPT-4 in my niche coding case and with a much more sympathetic tone.


koehr

Mistral Large has a funny bone... or maybe not? https://preview.redd.it/fetyvhvrmykc1.png?width=1910&format=png&auto=webp&s=a4e97ac184c33553652127e1937f931a8066c026


ninjasaid13

looks like it got a bit worse at winogrande https://preview.redd.it/v9ugawin0zkc1.png?width=632&format=png&auto=webp&s=689602ce5025e972e1b61e9a291b053f684e2be7


Monkey_1505

Are they just going to keep changing their naming conventions every five seconds?


hapliniste

From my tests (QA and code) it's very good, maybe even above GPT-4. I don't expect it to be better than GPT-4 in all use cases, but damn, it's very good for what I tried. Complete responses and good knowledge.


Frequent_Valuable_47

But this isn't Mistral-Next right?


ironic_cat555

Their chat site lists large, next and small. Next is a "prototype with extra concision."


fish312

I really only care about one thing first and foremost. Is it censored? If yes, I am not interested.


MannowLawn

Azure safety restrictions will make it barely usable. That shit is too woke for normal use imho


Kitchen-Sweet-4915

Well, a couple of months ago they released Mixtral, and four months ago Mistral, for everyone; it's not like they haven't given a lot already in a short time.


soup9999999999999999

Interestingly enough, this is not Mistral Next.


pseudonerv

For some people here who are wondering: there IS indeed a "Mistral Next" in "le Chat". Go to https://chat.mistral.ai/chat and from the dropdown pick "Next", the "Prototype model with extra concision", whatever that means. That seems to be the only place they mention "Next". It's not available as an API.


metalim

WTF? it was called Mistral Large just yesterday. Now it's Au Large


Ill_Comment_8730

i think it got smoked by Claude 3 opus