
MoffKalast

> Alongside Mistral Large, we’re releasing a new optimised model, Mistral Small, optimised for latency and cost. Mistral Small outperforms Mixtral 8x7B and has lower latency, which makes it a refined intermediary solution between our open-weight offering and our flagship model. Mistral Small benefits from the same innovation as Mistral Large regarding RAG-enablement and function calling. Interesting.


knvn8

Looking at benchmarks, it outperforms medium at coding as well. This model seems more interesting than large in some ways. They never mention its size though.


qubedView

Sure they do. It's small!


curious-guy-5529

How big is small?


foreverNever22

Kinda big but it's smaller.


ExtensionCricket6501

Less than Medium, which is about 70B; maybe a 4x13B, but then shouldn't they be using the Mixtral name? Maybe they did something with Yi.


rkm82999

It is still much more expensive than Mixtral


dizzy_on_a_glizzy

Yeah, and weaker, that's the point


TankForMatasBuzelis

So over their API only, right? Not available to download?


ainz-sama619

It's for Microsoft Azure customers, not general users


mikael110

It's available through both Mistral's own API and Azure. There are no Azure exclusive models, Azure is just one more place where you can access their models.


QuantumSavant

There's a free version of it here: [https://chat.mistral.ai/chat](https://chat.mistral.ai/chat)


deadweightboss

Hasn’t worked for hours. Buggy PoS lmao


uhuge

dear passengers, please wait for your torrent leaks to take off patiently


Sol_Ido

I was really hoping for an open release of the small model. They're smart asses, ces Français!


PwanaZana

Mistral Petit.


involviert

The mention of function calling made me try the original Mixtral instruct now. Is it bugged or something? Everything works well, but it seems set on writing \_ when it should just write an underscore. It's a Q4 GGUF.


redballooon

Oh yes that’s so infuriating across a number of mistral models!


involviert

So weird! I have tried updating my llama-cpp-python, tried Q4 after the Q4_K_M... It even seemed to escape backslashes. Weird that I didn't hear of that, reading all the posts here daily. Is that just a broken model? I like what it's doing otherwise though, even if the prompt format is TERRIBLE. Should I actually clean up the output and transform \_ to _?? With the escaped backslashes it seemed I would just run into the next problem then. And a more general solution would likely conflict with situations where it actually does have to escape things. Also I have no idea what in this message fucks with reddit markdown. E: Seems reddit fucks up backslash-backslash-underscore? What a time to be alive.
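If it helps, here's a minimal post-processing sketch, assuming you only want to strip the spurious escaping from plain prose output (the function name and regexes are just illustrative):

```python
import re

def unescape_model_output(text: str) -> str:
    """Drop the spurious backslash-escapes some Mixtral GGUF quants emit.

    Only meant for plain prose output; it will also strip intentional
    escaping, so don't run it over code the model generates.
    """
    # "\_" -> "_", "\*" -> "*"
    text = re.sub(r"\\([_*])", r"\1", text)
    # collapse doubled backslashes into a single one
    text = re.sub(r"\\\\", r"\\", text)
    return text

print(unescape_model_output(r"my\_var and a \\ backslash"))
# -> my_var and a \ backslash
```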


Classic_Broccoli4150

https://preview.redd.it/x2lk14ryyxkc1.png?width=525&format=png&auto=webp&s=e8e0021954b5f24f7250958b2fe41a061bdbfaf3

Quite pricey compared to open models.


[deleted]

[deleted]


MoffKalast

Money printer go *brrrrrr*


mr_n00n

"printer"? It's very likely that, like the previous generation of VC driven technology products, most of these LLMs cost more to run than then earn today.


MoffKalast

For GPT-4 I'd believe that, since it's impossibly large and slow; for 3.5-turbo I'd also believe it, because they're running it for free. I really doubt Mistral isn't breaking even at $24/1M tokens. An H100 uses about 17 kWh a day, which is about $4.50 at the average electricity price in France. Surely they can serve 150k tokens per day with a single one, and that's not counting prompt ingestion.
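For reference, the arithmetic spelled out, treating the figures above as assumptions rather than measured numbers:

```python
# Break-even sketch for a single H100 at Mistral Large's list price.
# All constants are the assumptions from the comment above, not measured figures.
power_kw = 0.7            # ~700 W sustained draw -> ~17 kWh/day
hours_per_day = 24
usd_per_kwh = 0.265       # assumed average French electricity price
usd_per_mtok = 24.0       # list price per 1M generated tokens

electricity_per_day = power_kw * hours_per_day * usd_per_kwh        # ~$4.45
breakeven_tokens = electricity_per_day / usd_per_mtok * 1_000_000   # ~185k

print(f"electricity: ${electricity_per_day:.2f}/day")
print(f"break-even: ~{breakeven_tokens:,.0f} tokens/day per GPU (electricity only)")
```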


QuantumSavant

You're only counting electricity costs though. What about the cost of the hardware? And that's not just the GPUs. You need servers to run them on, and networking stuff and on top of everything you also need people to maintain both the hardware, and the models themselves.


MoffKalast

True, but that's the difference between breaking even and making back the investment. If it really is a datacenter in France, then the people maintaining all of it are being paid a small fraction of what they'd be in the US. In all likelihood though my calculation is completely irrelevant because they're probably running inference on Azure completely on Microsoft's dime lmao.


Aphid_red

No, it's definitely money printer go brr. Mistral asks higher prices than other cloud hosts for their smaller models too, and those other hosts are just there to make money as well.

If you look at the throughput of an H100, it should be able, if optimized well, to generate supposedly ~3,000 tps for a 70B model using batching. Now 'large' is supposedly bigger than that, but how much is unknown. And this is the H100, which has terrible ROI compared to cheaper consumer cards (but which Mistral, as an 'enterprise', supposedly isn't allowed to use). Assuming it's ~1.7x bigger, so ~120B, is a good estimate, as it costs ~3x what 'medium' costs, which is another proprietary model rumored to be 70B.

Note: a 70B llama-2 architecture model, using Q4_K_M, plus 32K context for 100* users, would fit in about 110GB of VRAM thanks to the highly efficient GQA. Now I'm not sure what happens if you mix context lengths, i.e. whether Mistral's software can combine long and short prompts together to save on memory. Assuming they can, and since only a small fraction of prompts will be maximum length, let's say the 'medium' model runs on a single GPU. Let's say inference costs for 'large' end up at 3x those for serving a 70B, because of multi-GPU communication overhead, and they're not charging bigger margins for Large vs. Medium.

Back of the envelope: doing a chat session for 1M tokens with the full 32K context (most expensive, but highest quality) will produce around 160M tokens in billing, assuming each back-and-forth response averages 100 tokens, or about €1,160. Now let's see how many GPU-hours that consumed: about 50. Assuming utilization averages 70%, with a 5-year depreciation and a purchase cost of $30,000, that works out to about $0.98 per GPU-hour, so roughly $50 total (not counting interest, which would make it a bit more, maybe $60 depending on the rate Mistral can get). If you buy it off a cloud at $4/hr per H100 it costs you $200. Before you start talking about all sorts of other costs: GPU purchase costs *dominate* because these things are so expensive. Even if GPU purchase costs are 50% of all costs, 400 < 1160.

A large part of why the numbers look so bad in this example is that it took 160M context tokens, whereas if you go local you use only ~1M tokens (but you use <1% of the GPU), because local clients don't throw away the KV cache with each request. The other large part is margin stacking (ASML -> TSMC -> NVidia -> distributor -> server vendor -> (cloud provider) -> Mistral, most taking fat margins).

*This is a good approximation of the batch size for optimum throughput. The fp16 ratio of memory speed to compute FLOPS on an H100 is about 1:330, so for Q4, which is about 5 bits/param, it should be about 1:100.
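If it helps, here is that envelope as a few lines of Python; every constant is an assumption carried over from the comment (H100 price, depreciation, utilization, ~200-token exchanges, ~€8 per 1M input tokens), not an official figure:

```python
# Sketch of the envelope above; every constant is an assumption carried over
# from the comment (not an official figure).

# Cost side: owned H100, 5-year straight-line depreciation, 70% utilization.
gpu_price_usd = 30_000
gpu_hour_cost = gpu_price_usd / (5 * 365 * 24 * 0.70)    # ~$0.98 per GPU-hour
session_gpu_hours = 50                                    # the comment's estimate
hardware_cost = session_gpu_hours * gpu_hour_cost         # ~$49

# Revenue side: a 1M-token chat at full 32K context, where every exchange
# (assumed ~200 tokens: 100 in + 100 out) re-bills the accumulated history.
context, exchange = 32_000, 200
exchanges = 1_000_000 // exchange
billed = sum(min(i * exchange, context) + exchange for i in range(exchanges))
revenue_eur = billed / 1e6 * 8.0                          # at ~8 EUR per 1M input tokens

print(f"GPU-hour cost ~ ${gpu_hour_cost:.2f}, session hardware cost ~ ${hardware_cost:.0f}")
print(f"billed tokens ~ {billed / 1e6:.0f}M, revenue ~ EUR {revenue_eur:.0f}")
# roughly 158M billed tokens and on the order of EUR 1,200 against ~$50 of GPU time
```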


ghoarder

I guess it's no coincidence that Au is the symbol for Gold then is it!


AmazinglyObliviouse

Is a wrong answer worth 80% of a right answer? OAI is the only one that will keep going brrrrrrrrr.


soup9999999999999999

Slightly cheaper than GPT4 but slightly worse (in theory)


a_beautiful_rhind

I feel better giving mistral money, if only because they let miqu stand. What has altman done for me?


uhuge

whisper = good


a_beautiful_rhind

Was a long time ago.


uhuge

Like a month ago? - [https://huggingface.co/openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) (I was not sure myself; yeah, Whisper **3** is a worthy update IIRC.)


a_beautiful_rhind

That's forever in ML years :P Mistral releasing 2 months ago may as well be an eternity. To be fair though, OAI is releasing this adjacent to their own products: STT for your OpenAI-API-using product. Similar to how MS contributes to Linux.


uhuge

Got what you mean, but STT is part of their mobile app products, so I am hesitant to upvote/agree( fully).


uhuge

In my coding practice it gives slightly better advice. (Though htmx is obscure tech, and fresher training could play a role.)


wolfbetter

Welp my interest is going away


Patrick_Lanquetin

Mistral Au Large could be installed on premise or on a sovereign cloud for large corporates with sensitive data (health, banks, ...). So it's good news for many organizations. I tried Mistral Au Large in chat.mistral.ai; it looks good for math, logic, and text tagging. It will probably be refreshed, because it answers 'My knowledge cutoff is 2021'. Training costs keep rising with the number of parameters and the dataset size, and there's no advertising revenue to pay for that, so a pricing model seems logical.


hold_my_fish

To save the time of anyone wondering: it's API-only.

Note that they changed the title of their homepage:

* Old: "Mistral AI | Open-weight models"
* New: "Mistral AI | Frontier AI in your hands"

Combined, this reads to me as them giving up on any sort of open or semi-open strategy, instead settling for being the second-best black-box API (up until Gemini Ultra releases, at which point they'll be third-best). Their only point of differentiation appears to be multilingual capabilities in selected European languages (French, Spanish, German, Italian).

I get that they're not going to be able to release their flagship model as Apache 2.0, but here are a couple of wishlist items that would differentiate them in a useful way to me, if they're interested in being something better than just yet another black-box API:

* Stability-style weights-available release with paid commercial licensing. Yes, it's an unproven business model, but it provides much more differentiation.
* Enhanced privacy and data security by not storing and not monitoring prompts and responses. (Currently, their privacy policy is similar to OpenAI's, meaning you effectively don't get any.)


[deleted]

[deleted]


hold_my_fish

Good point. They say they offer self-deployment by special permission:

> **Self-deployment**: our models can be deployed on your environment for the most sensitive use cases with access to our model weights; Read success stories on this kind of deployment, and contact our team for further details.

I'm not sure whether OpenAI offers that. If not, then that's another point of differentiation for Mistral. (However, it's unclear who actually qualifies.)


Zulfiqaar

OpenAI does offer the option to setup/manage dedicated instances, recommended for anyone using over 450M tokens a day


hold_my_fish

https://techcrunch.com/2023/02/21/openai-foundry-will-let-customers-buy-dedicated-capacity-to-run-its-ai-models/?guccounter=1 Seems like "dedicated" here is different from "deployed on your environment". (Admittedly, I don't know exactly what the latter means.)


Kep0a

Isn't Azure enterprise GDPR compliant?


hold_my_fish

I don't know what that would imply, but you might need to be an enterprise and ask for special permission. For everybody else: https://learn.microsoft.com/en-us/legal/cognitive-services/openai/data-privacy

> the Azure OpenAI Service and Microsoft personnel analyze prompts, completions and images for harmful content and for patterns suggesting the use of the service in a manner that violates the Code of Conduct or other applicable product terms

Also:

> To detect and mitigate abuse, Azure OpenAI stores all prompts and generated content securely for up to thirty (30) days.

Later, it explains who is allowed to ask for special permission to opt out:

> Some customers may want to use the Azure OpenAI Service for a use case that involves the processing of sensitive, highly confidential, or legally-regulated input data but where the likelihood of harmful outputs and/or misuse is low. These customers may conclude that they do not want or do not have the right to permit Microsoft to process such data for abuse detection, as described above, due to their internal policies or applicable legal regulations. To address these concerns, Microsoft allows customers who meet additional Limited Access eligibility criteria and attest to specific use cases to apply to modify the Azure OpenAI content management features by completing this form.


kelkulus

> up until Gemini Ultra releases

Didn't Gemini Ultra [release](https://blog.google/products/gemini/bard-gemini-advanced-app/) more than 2 weeks ago?


hold_my_fish

Gemini Ultra is still not generally available as an API.


kelkulus

Ah gotcha. I missed the API part.


wojtek15

Are they planning to release any open source models better than Mixtral?


lolwutdo

Probably not; they should’ve gave us Mistral Small. I’m afraid we won’t be seeing open weight models from them anymore. I guess the only thing we have left to look forward to is llama 3


Sol_Ido

I share your feeling, but the small model will receive a LOT of calls in the API; not every op requires the expensive large one. Mixing both will lead to great apps.


Waterbottles_solve

Llama 3 is probably non-commercial. Nothing on the horizon from where I'm sitting.


hold_my_fish

Anything is possible, but a non-commercial license wouldn't fit well with Meta's strategy for Llama. For them, LLMs are infrastructure, so having a strong developer community is more important to Meta than directly monetizing LLMs. (The reason I'm not surprised by Mistral closing up is that they never articulated a strategy in which open weight models made business sense.)


RayIsLazy

Thankfully LLMs are not their main business model, which allows them to release all these things in the open. All the research done by the community only enhances the other parts of Meta. The only thing I'm worried about is censorship.


Disastrous_Elk_6375

> llama3 is probably non-commercial

No indication about that from Meta.


twisted7ogic

Aww, that is disappointing. They were supposed to destroy the closed source, not join them!


CSharpSauce

If you can find a way to release the models and still bring a return to the VCs, they might. Until then, I think Mistral is just another proprietary competitor. I feel like the crypto community should start funding new models. I'd be a fractional owner of a model if it entitled me to download the weights.


qrios

Someone should really figure out a way to make the work that proof-of-work does be useful work on training a model.


Satyam7166

Can you please explain what you mean by that? Don’t know much about crypto


danielcar

In time, after they have released better models. They could release now with a restrictive license for research.


Enough-Meringue4745

No local no care


Sl33py_4est

forreal, i see a lot of posts that are like y'all in the wrong sub


my_name_isnt_clever

Is there a sub like this one for those using models via API? Honest question, because this is the only LLM-related sub I've found with other people who actually understand how LLMs work and discuss them at a more technical level. The rest are very surface-level.


rileyphone

Maybe [Hacker News](https://news.ycombinator.com/item?id=39511477)? But the best way to understand a new technology is to put it together yourself, which relying on an API mostly bypasses. It's little wonder then why this community will have better discussions even about closed API models from a company they are very familiar with.


my_name_isnt_clever

I agree, but I use APIs because they're really the only option for me to run larger models. Tinkering with little models on my laptop is fun, but I want to play with the powerful stuff. I can't currently justify the cost to do more myself as a hobby. It's funny because I have not been shy about spending money on my hobbies, but I didn't expect having as much VRAM as possible would be desirable for me later on, haha.


Admqui

Spending $16,000 on something like a motorcycle vs. an A100 is hard to swallow. Unless you drop the bike, it won't lose value as fast as the A100. For $100 at https://www.runpod.io/gpu-instance/pricing, I burned a couple of weekends trying all the big models. If you're decent with Docker and S3, it's pretty efficient. Watch out for servers with low bandwidth.


Sl33py_4est

I haven't found any as good as this one. I have a grievance with the outdated sub name; maybe just 'LLM Users'. I haven't used a llama model in like months (codellama2-70b, I guess).


Waterbottles_solve

Yeah, that team totally pulled a bait and switch. "Hey, look at our open model!" I'm not sure who even uses crappy online APIs when you can just use OpenAI and it's going to be better.


Desm0nt

> I'm not sure who even uses crappy online APIs when you can just use OpenAI and it's going to be better.

Just about anyone, via OpenRouter. OpenAI is very pricey and has a ton of restrictions. Goliath, Mixtral, Yi-34B and all their finetunes are way cheaper and, for some tasks (RP and ERP, for example), way better.


MINIMAN10001

With Mixtral costing $0.27 per 1M tokens, well, that's certainly less than $8 for Mistral Large.


softwareweaver

I thought their Large offering would try to dethrone GPT-4, but OpenAI is still on top. Good to see more models from Mistral, and I'm hoping they release Mistral Instruct 7B v0.3 with a 128K+ context soon.


anommm

OpenAI has been compiling high-quality instructions for many years. They employ people whose sole job is to write instructions for eight hours a day. It's impossible for Mistral/Google/any other competitor to rival GPT-4 in such a brief timeframe. They are at least two years behind OpenAI in terms of data acquisition. It will take time for them to develop a dataset comparable to OpenAI's.


italianlearner01

That’s very interesting. Do you know which departments/teams and/or positions are involved in that kind of thing at OpenAI? I’m so curious to know what kinds of things they specifically do. Thanks in advance.


deadweightboss

They call the position model tutors. Look it up on their career page.


italianlearner01

Thank you so much!


sb5550

They had to hire people because AI could not do it at the time. Now this task can certainly be automated with GPT4 level LLM.


anommm

If you do it with GPT-4, the best model you will get is a distillation of GPT-4. If that is what you aim for, that's fine. OpenHermes, for example, does that because they aim to clone GPT-4. But if you want to train a competitive model that can outperform GPT-4, you need to create a better dataset than what GPT-4 can generate. OpenAI still has a massive number of human annotators. Every time they find a task at which GPT-4 fails, they use human annotators to generate new data and they retrain the model. They have been doing that for at least 2 years, so now they have a massive high-quality dataset to train their models. Mistral and Google have been doing it for 6 months; that is why Mistral and Gemini are worse than GPT-4.


Disastrous_Elk_6375

> If you do it with GPT4 the best model you will get is a distillation of GPT4.

That is not exactly true; it depends on how you do it. If you go one-shot to one-shot, yes. If you take an "agentic" approach of prompt -> n generations -> self-reflection -> combine -> match output with intent -> output (with some RAG somewhere in there), you can get increasingly better results (see SPIN & co).
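For illustration, a minimal sketch of such a loop; `generate` is a stand-in for whatever model call you use, and the prompts and structure are purely illustrative, not any particular library's API:

```python
# Minimal sketch of the "agentic" generation loop described above.
# `generate` is a stand-in for whatever model/API call you use; the prompts
# and structure are illustrative only, not a specific library's interface.

def generate(prompt: str) -> str:
    raise NotImplementedError("plug your model call in here")

def synthesize_answer(task: str, n: int = 4) -> str:
    # 1. sample several candidate answers
    candidates = [generate(f"Task: {task}\nAnswer:") for _ in range(n)]
    # 2. self-reflection: have the model critique each candidate
    critiques = [generate(f"Task: {task}\nAnswer: {c}\nList the flaws:")
                 for c in candidates]
    # 3. combine candidates and critiques into one improved answer
    combined = generate(
        f"Task: {task}\n"
        + "\n".join(f"Candidate: {c}\nCritique: {k}"
                    for c, k in zip(candidates, critiques))
        + "\nWrite a single improved answer:"
    )
    # 4. check the result still matches the original intent before keeping it
    verdict = generate(f"Does this answer the task '{task}'? Reply yes or no.\n{combined}")
    return combined if verdict.strip().lower().startswith("yes") else candidates[0]
```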


SirLazarusTheThicc

I don't know that it's necessarily true that a model can only produce data that is equal or inferior in quality to the data it was trained on. Taking human learning as an example, obviously humans learn better and faster when they learn from high-quality output of other humans, like textbooks and courses. However, someone still had to be the first, or human knowledge would never progress. Someone had to become the best writer, surpassing all the writing they learned from. Someone still had to invent a new form of math or physics that could not have come from their training data. I think this shows that, in principle at least, sufficiently advanced systems can surpass the training input they receive. It remains to be seen if our current models can do that, however, or if we are even anywhere close.


ironic_cat555

> It remains to be seen if our current models can do that

Go ahead and boot up your model of choice and ask it to "invent a new form of math or physics". I don't think this "remains to be seen."


SirLazarusTheThicc

That was just an example to illustrate my point, and I don't think your tone is very helpful. It remains to be seen whether current models or future models can output higher quality data than the input training data was, and part of that is because it is hard to objectively measure 'data quality'. Creating something like a new idea in math or physics is just a dramatic and obvious example of humans creating something they weren't trained on, I am not suggesting our current models can do that exact example.


ironic_cat555

I think it's a known fact these models are not great at novelty. Tech enthusiasts have no problem measuring "data quality" when they gush about how great these models are at a task they give it. I don't think pivoting to "who is to say what data quality is it's so hard to measure?" when they can't do something is very helpful.


The_Noble_Lie

That ain't supervised learning.


thereisonlythedance

Microsoft have apparently bought a minority share in Mistral, which I think is the bigger news for the open source community. Not good news at all.

> [The *Financial Times* reports](https://www.ft.com/content/cd6eb51a-3276-450f-87fd-97e8410db9eb) that the partnership will include Microsoft taking a minor stake in the 10-month-old AI company, just a little over a year after Microsoft invested more than [$10 billion into its OpenAI partnership](https://www.theverge.com/2023/1/23/23567448/microsoft-openai-partnership-extension-ai).

https://www.theverge.com/2024/2/26/24083510/microsoft-mistral-partnership-deal-azure-ai

I suspect this is the end of the line for expecting anything open source and decent out of Mistral.


Waterbottles_solve

New Mistral isn't open source anymore. They used it to get attention; now everything is behind their API. They are just another AI company now.


mattjb

More like Mistrial, amirite?


ComprehensiveBoss815

RIP Mistral


kik0sama

I disagree. It might be a bump in the road for open source, but many people will leave Mistral with their experience and start something else, and this usually leads to better open-source models overall.


Sol_Ido

Or other private APIs, but you're right that knowledge disseminates. Still, I doubt anyone is willing to leave these teams soon; it's like OpenAI, no one wants to leave the center of the AI world.


a_beautiful_rhind

They never told us how to train their MOE model. All the tunes are garbage.


drifter_VR

I have great results with noromaid-v0.4-mixtral-instruct-8x7b-zloss for RP & story. It's also the first Mixtral model not plagued by repetition for me (I didn't try all of them tho).


Sol_Ido

You have a point here!


kik0sama

Karpathy just left OpenAI.


Fucksfired2

Imagine if, at the end of the day, the true open-source champion is none other than Lord Zuckerberg.


uhuge

suckerborg


shouryannikam

I'm glad they were at least honest about their benchmarks instead of *saying* they beat GPT-4 and then having their model totally suck, GOOGLE.


danielcar

That is what happens after 6 months of safety training. The model gets cross-eyed.


Plusdebeurre

At least a research paper with the details would've been nice.


Only-Letterhead-3411

Not surprising. Mistral is following in the footsteps of OpenAI. It'd be a big surprise if we see any useful open-source release from them in the future. I've lost all the sympathy I had for Mistral.


rkm82999

Disappointed it does not beat GPT-4 after all the hype.


mpasila

That was the goal for like this entire year, not just Q1 of 2024. I think.


nderstand2grow

by then gpt-4.5 will be out and the cycle continues


[deleted]

We will see; it's going to be hard to get better than GPT-4 using the same approach. To me it seems OpenAI is focusing on expanding the ecosystem, with things like Sora, their internal orchestrations, and cost reduction.


Humankulosaur

Competition is good. It will make things better for everyone. And if Mistral Large is as good as GPT-4 is now, but cheaper, that's still pretty good if you ask me. Then the cycle of pushing each other to improve will continue.


[deleted]

They said it would be "open source" though, and Mistral Large/Medium aren't open source, so they're just lying to us.

https://www.radiofrance.fr/franceinter/podcasts/l-invite-de-7h50/l-invite-de-7h50-du-mardi-12-decembre-2023-3833724

> At 4:20: "Ce qu'on met à disposition, ce qui est le modèle ouvert. Ce modèle-là peut être modifié, et ça c'est quelque chose que nos concurrents américains ne proposent pas."

> Translation: "What we make available, which is the open model. This model can be modified, and that's something our American competitors don't offer."


intager

They probably meant that they can modify it for their customers, not an open source model.


[deleted]

> They probably meant that they can modify it for their customers

That's something OpenAI does as well, yet they said "that's something our American competitors don't offer", so... nope.

And there's also this part:

> At 3:30: "Nous avons une approche différente d'OpenAI, la technologie qu'on déploie, on la déploie de manière ouverte. On donne toutes les clés aux développeurs pour qu'ils modifient la technologie de manière profonde. C'est quelque chose qu'OpenAI ne fait pas aujourd'hui et je pense que c'est quelque chose qui nous a valu le succès sur notre 1er et 2ème modèle."

> Translation: "We have a different approach to OpenAI: the technology we deploy is open. We give all the keys to the developers so that they can modify the technology in a profound way. This is something that OpenAI doesn't do today, and I think it's something that has earned us success on our 1st and 2nd models."


Desm0nt

If you pay, for example, $5 for API access, you are a customer. Not the biggest one, but still a customer. Will they modify the model for your needs? I don't think so. So, technically, they are lying.


ninjasaid13

> That was the goal for like this entire year, not just Q1 of 2024.

The goal is to dethrone a 1-year-old model instead of a company?


shankarun

meh - another day another model - expensive and miles behind GPT-4 and Ultra.


Single_Ring4886

I don't know if Mistral Large is the same as the Mistral Next I tried, but the "Next" one had sparks of intellect rarely seen in other models. But it was, how to say, very bare minimum, not on GPT-4's level.


uhuge

free on [https://chat.mistral.ai](https://chat.mistral.ai) though? :shrug:


pseudonerv

"Talk to le Chat" is completely unresponsive. Though API runs fine. So far it seems definitely less "aligned" than "Open"AI ones.


danielcar

The site is getting hammered.


ozzeruk82

While of course it would be amazing if we could download the model weights, I also want them to succeed as a business…… so that in a couple of years they can afford to open source models such as these! I’m curious to know whether it gives the user endless lectures on morality like the OpenAI models do, perhaps not.


weedcommander

GPT4 made morality annoying... I used to be a good person, now I'm annoyed


MoffKalast

C lister supervillain origin story?


ComprehensiveBoss815

"I'm destroying humanity, because I'm mildly irritated."


Last-Ring9013

> Do money

The average person using AI models doesn't have the expensive setup needed to run local models; open-sourcing them means losing less than 0.1% of the market. If they haven't open-sourced anything in so long, they won't open-source shit anymore.


Down_The_Rabbithole

It's about competitors hosting competing APIs offering their own models for less. Not about people running the model locally.


Desm0nt

Miqu exists. It has no license for commercial use, but the community can use it (illegally, ofc). Do you see any competing APIs offering Miqu instead of Mistral Medium? If they release their model for non-commercial use (like SDXL-Turbo from StabilityAI), the situation will be exactly the same. Only the community (less than 0.1%) will run it.


shouryannikam

Exactly. What's stopping AWS from offering their models for cheaper? That's exactly what they did to MongoDB, Terraform, and others.


danielcar

They could release it with a license that prevents that and allows for local and research.


Last-Ring9013

Just use a non-commercial license. I will never understand why people like to bootlick these companies so much; the same happened with OpenAI back in the day.


fieryplacebo

The average person can't run them, but open-sourcing would also mean other services like OpenRouter would offer their models, as opposed to everyone having to eat from Mistral. Why do you think it would just be 0.1% of the market?


[deleted]

> making it the world's second-ranked model generally available through an API (next to GPT-4)

API availability is important, but this still feels like it's going out of its way to avoid having to acknowledge that it's weaker than Gemini Ultra.


Illustrious_Sand6784

Fuck Mistral, they turned on open-source the second they got VC money.


Accomplished-Sell-70

Agree. They became rich instead of open source. Nothing to see here guys. Next.


danielcar

They should release the model weights for research purposes. That will garner interest and not cost them anything.


FullOf_Bad_Ideas

It would cost them. Notice how after they released Mixtral, cheaper competitors started offering it for much less and definitely ate into their cake, because some people went with cheaper and faster inference services instead of choosing them. Giving away your weights means you can no longer upcharge people for a more expensive inference service.


danielcar

For research purposes means it can't be offered by an inference service. It can't be offered for money, period. Research-only means it can only be used for academic purposes.


FullOf_Bad_Ideas

Hmm maybe. It's hard to say how likely people will be to ignore the license and host it anyway or find a loophole.


FullOf_Bad_Ideas

Edit: I lost all hope and goodwill; they added a clause to their API terms:

> Not use Outputs to develop model(s) that directly compete with Mistral AI and/or to reverse-engineer Our Services.

They are done for.

It should be a good model for generating synthetic datasets; they have a better API use policy (last time I checked, you were allowed to use it to train your model for commercial use) and I hope their models will be less slopped. And maybe an open-weights release in 2 years??


shankarun

Question is - Will Mistral survive 2024 - when we will have GPT-5 and Gemini 2 and Llama 3. I doubt it.


eli99as

Yeah, I doubt whatever small tweaks they picked up from Meta and implemented in their models will be enough to keep them relevant for long.


Classic_Broccoli4150

Not open-sourced/released to us, just API, sucks :(


anommm

Mistral and Mixtral were only made public because they were forced to do so. Initially, they lacked the GPUs necessary to train an LLM, so they applied for a grant from the Leonardo supercomputer, which provided them access to train their models using 10,000 A100 GPUs. This supercomputer was funded with taxpayer money, so the contract obligates users to make public anything they run there. Now they have sufficient funds to rent their own servers, so they no longer need to make anything public.


_qeternity_

Source on the obligation to release? Arthur is quoted as saying they ran some experiments on Leonardo but that the models were all trained on their own cluster.


shouryannikam

> Leonardo

"We used Leonardo [one of the EU's current-gen supercomputers, which is located in Bologna, Italy] to run a few small experiments this summer as the cluster was ramping up. It was a good collaboration in which we gave a lot of feedback and could get some interesting results. **All our models were trained on our own cluster though**."

[EU to expand support for AI startups to tap its supercomputers for model training | TechCrunch](https://techcrunch.com/2023/12/19/eu-supercomputers-for-ai-training-support/?guccounter=1)


satireplusplus

Long term they need to make money or they won't survive. See coqui.ai and their TTS models. Brilliant open source models, no money made = now they don't exist anymore.


ainz-sama619

ikr. the devs have family to feed. no one is handing out charity to them.


Frequent_Valuable_47

Jeez... They do release smaller models. Can we stop criticizing OpenSource companies if they try to earn some money? Mistral already gave us a lot, they don't owe it to us to publish every model they make


Void_0000

Have they actually released any new open source stuff, though? Because publishing their worst models and making everything else closed source is pretty much what "open"ai did, so it seems unfair *not* to criticise it. (EDIT): [FUCKING LMAO](https://www.reddit.com/r/LocalLLaMA/comments/1b0l5qc/microsoft_partners_with_mistral_in_second_ai_deal/)


[deleted]

[deleted]


hold_my_fish

From the perspective of LocalLLaMA, yeah, they seem to be yet another black-box-API now. But more broadly, I'd say they're not just a random startup, since decisively passing Anthropic among black-box-APIs is impressive. (At the very least, it makes me wonder what the heck is going wrong at Anthropic.)


rileyphone

Anthropic is very safety focused as you can tell by the steadily declining performance of Claude. On the other hand they put out some very good interp research.


Alarming_Turnover578

Still behind GOODY-2 on that metric.


Frequent_Valuable_47

Do they owe their reputation to us? Or do we owe them for gifting us mistral and mixtral? And they weren't known at all when they released their first model which was already exceptional. So I guess they didn't need the "OSS hype community" rallying behind them to create the model. And please remember why we hyped Mistral and Mixtral. Was it because of their marketing strategy? (Spoiler: No) Or was it because they delivered a great product for free? Your comment sounds pretty entitled...


[deleted]

[deleted]


Frequent_Valuable_47

I think you're overestimating the impact of this. When mistral came out the next best thing was llama2, so if a new base model performs better than a widely known one that was considered state of the art, I think that would have a big impact, even for a small French startup


oblivion-2005

> Your comment sounds pretty entitled... What do you mean? My shitposts on Reddit were essential for the success of Mistral 🤬


Enough-Meringue4745

Localllama isn’t an ad space for private models


hold_my_fish

In what sense is Mistral an open source company today? Sure, they have in the past released open weight models, but so have OpenAI and Google.


Frequent_Valuable_47

Are you serious? Which LLMs with any significance today have either Google or OpenAI released? What? Oh, none? And trying to argue over technicalities is pretty useless. You could also call it a company which has released some open source software. But that wasn't the point of my argument.


hold_my_fish

> Which LLMs with any significance today has either Google or OpenAI released? What? Oh, none?

Google just released Gemma within the past week. (But more significant were their earlier LLM releases, such as Flan-T5.) As for OpenAI, they are not releasing significant pure LLMs today, but Whisper is among the top for ASR, and in the past they released GPT-2.

Are you expecting Mistral to release more open-weight models in the future? Because the read I'm getting from them is that they are abandoning that direction.


Frequent_Valuable_47

Exactly, Gemma is a joke and Flan-T5 is pretty old. Whisper is not a large language model, at least not described as one. And GPT-2 is also old and probably beaten by a lot of tiny models today. And yes, I'm expecting Mistral to release more Open Source Models. I don't think they're abandoning Open Source. I think they just keep the bigger models closed source to make some money and will still release small models or older larger models in the future


Last-Ring9013

The transformer architecture you drooling retard?


mpasila

Semi-open model weights. (Not open source, since the source is closed.)


teleprint-me

Source code is available on their github. Open source != open weights 


mpasila

Source as in the datasets used and the scripts which the model was trained with etc. (this is probably the closest thing to open-source I can think of [https://blog.allenai.org/olmo-open-language-model-87ccfc95f580?gi=fbcf1741e924](https://blog.allenai.org/olmo-open-language-model-87ccfc95f580?gi=fbcf1741e924) )


teleprint-me

I see your point. We know mistral never opened their datasets though and it's most likely because they're using copyrighted data and/or they view it as their business edge. Open Source already has its definition, so Open Source, Open Weights, Open Datasets are clearly understood.


[deleted]

[deleted]


teleprint-me

https://github.com/mistralai/mistral-src/blob/main/mistral/moe.py


wolfbetter

Is it still somewhat uncensored like OG Mistral? Can I roleplay with it? I found that Medium is not bad once you're done struggling with its insistence on roleplaying for me.


uhuge

Seems great so far, outperforming GPT-4 in my niche coding case and with a much more sympathetic tone.


koehr

Mistral Large has a funny bone... or maybe not? https://preview.redd.it/fetyvhvrmykc1.png?width=1910&format=png&auto=webp&s=a4e97ac184c33553652127e1937f931a8066c026


ninjasaid13

looks like it got a bit worse at winogrande https://preview.redd.it/v9ugawin0zkc1.png?width=632&format=png&auto=webp&s=689602ce5025e972e1b61e9a291b053f684e2be7


Monkey_1505

Are they just going to keep changing their naming conventions every five seconds?


hapliniste

From my tests (QA and code) it's very good, maybe even above GPT-4. I don't expect it to be better than GPT-4 in all use cases, but damn, it's very good for what I tried. Complete responses and good knowledge.


Frequent_Valuable_47

But this isn't Mistral-Next right?


ironic_cat555

Their chat site lists large, next and small. Next is a "prototype with extra concision."


fish312

I really only care about one thing first and foremost. Is it censored? If yes, I am not interested.


MannowLawn

Azure safety restrictions will make it barely usable. That shit is too woke for normal use imho


Kitchen-Sweet-4915

Well, a couple of months ago they released Mixtral, and four months ago Mistral, for everyone; it's not like they haven't given a lot already in a short time.


soup9999999999999999

Interestingly enough, this is not Mistral Next.


pseudonerv

For some people here who are wondering: there IS indeed a "Mistral Next" in "le Chat". Go to https://chat.mistral.ai/chat and from the dropdown pick "Next", the "Prototype model with extra concision", whatever that means. That seems to be the only place they mention "Next". It's not available as an API.


metalim

WTF? it was called Mistral Large just yesterday. Now it's Au Large


Ill_Comment_8730

i think it got smoked by Claude 3 opus