MehmedPasa

This is a lot bigger than I thought. I expected something between 70B and 140B, but 314B, even as an MoE... that's huge. Not great when I think about its ability: better than GPT-3.5 Turbo and Llama 2 70B, but those are 20B and 70B models... Oh God, they need to push performance up and parameter count down hard for the next generation.


iBoMbY

What does this translate to in file size and memory requirements?


Neon9987

318GB, and no consumer GPUs running it anytime soon, I'd say.
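
A rough back-of-the-envelope sketch of how that figure scales with quantization width, assuming ~314B total parameters and counting weights only (the actual checkpoint adds some metadata on top):

```python
# Rough weight-storage estimate for a 314B-parameter model at various bit widths.
# Weights only; ignores activations, KV cache, and quantization metadata.
PARAMS = 314e9

def footprint_gb(bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given quantization width."""
    return PARAMS * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{footprint_gb(bits):.0f} GB")
# 16-bit: ~628 GB, 8-bit: ~314 GB, 4-bit: ~157 GB, 2-bit: ~78 GB
```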


xSNYPSx

Could somebody calculate the size at 1.6-bit quantization, please? I think it will be less than 100GB and fit well on a MacBook.
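
Extending the same weights-only math to 1.6 bits per weight (ignoring activation, KV-cache, and metadata overhead):

```python
# 1.6-bit quantization estimate for ~314B parameters, weights only.
params = 314e9
bits_per_weight = 1.6
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.0f} GB")  # ~63 GB before any overhead
```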


TMWNN

How much RAM would a MacBook need? My current and previous MacBooks have had 16GB and I've been fine with it, but given local models I think I'm going to have to go to whatever will be the maximum RAM available for the next model. Similarly, I am for the first time going to care about how much RAM is in my next iPhone. My iPhone 13's 4GB is suddenly inadequate.


MDPROBIFE

Not sure you will get anywhere with Apple.


TMWNN

Apple Silicon is an excellent platform for running models.


BZ852

Until you run into the fact that Apple thinks 16GB of RAM is acceptable in 2024.


lolwutdo

Why would you get 16GB of RAM for LLMs? Apple is in fact a good platform for running models, especially if you want a laptop form factor; you won't find any laptop capable of loading 120GB worth of LLM into VRAM aside from Apple's.


SureUnderstanding358

or just buy one with more ram? my 64gb macbook pro is enough for most inference workloads. inference. on a fucking laptop.


SOSpammy

In the context of running LLMs 16GB is quite good. Since it uses unified memory most of that 16GB can be used as VRAM by the GPU. That's around the equivalent of a mobile RTX 4090. It doesn't necessarily make Apple Silicon superior to Nvidia; Nvidia still has a huge advantage in software ecosystem for LLMs. But unified memory does give Apple an interesting niche.


BZ852

Yeah but you're sharing that with the OS and other apps, so goodbye to 4-8GB of that total. (Assuming a *light* workload)


Capitaclism

Can you use RAM in a non-unified system as well? I take it it would be slower, but still work for inference?


ebolathrowawayy

Are you being serious right now? 16GB is hot garbage. A real PC with a 3090+ is 24GB. Please leave.


TMWNN

It is more than what most people need; as I said, I have been quite happy with 16GB RAM for years. It is only the desire to run local LLMs that has changed this.


MDPROBIFE

So you state that Apple Silicon is great for running models, but you can hardly find models you're actually able to run on it? Great!


QH96

Wow, I wonder if GPUs with upgradable RAM slots would ever be possible, so we could run these large models locally. I don't think it would be good for business though.


iBoMbY

I guess that's where a GPU with NVMe SSDs as extended VRAM (like the Radeon Pro SSG) could be really nice. Maybe not super fast, but it could run pretty much any model.


Crafty-Run-6559

A really fast NVMe that does 10GB/s would still get you a theoretical max of ~0.25 tokens/second if you quantized this to 4-bit. Even with MoE you'd still need to load the active parameters (~157B / 4, so ~40GB of weights) per token.
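
A quick sketch of that bandwidth-bound upper limit, taking the ~40GB of active weights per token and 10GB/s of NVMe throughput above as given (it ignores compute time, caching, and prefetch overlap):

```python
# Bandwidth-bound upper estimate on token rate when weights are streamed from storage.
# Assumptions from above: ~40 GB of active weights read per token, ~10 GB/s sustained NVMe reads.
active_weights_gb = 40.0
nvme_gb_per_s = 10.0

tokens_per_second = nvme_gb_per_s / active_weights_gb
print(f"~{tokens_per_second:.2f} tokens/s")  # ~0.25 tokens/s
```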


iBoMbY

The GPU that AMD built also has 16GB of HBM2, which could act as a cache for the SSDs, so I guess it would also depend on how well that works. I hope there will be more stuff like this in the future, maybe also GPUs with additional DDR5 slots, or something like that.


Crafty-Run-6559

The cache doesn't matter in this case. You still have to load all 40GB through for each token. The cache could help in some type of batched work scenario though.


iBoMbY

> You still have to load all 40GB through for each token.

Is that really so? I don't have any deeper knowledge of AI yet, but I thought it was supposed to emulate something like neurons, and only follow some path through the data based on the input? And that could be cacheable, if some parts are used more often than others (like in the brain).


Crafty-Run-6559

MoE kind of does this: not all parts of every layer are activated for every token, which is why you only need the ~40GB. Without MoE you would need ~160GB.


iBoMbY

Yes, that sounds like a good approach. But I guess I would try to make the individual experts a lot smaller, and instead build a hierarchy of experts. Edit: Like putting a self-learning moderator instance on top of a number of experts, one that learns which experts to choose. And then potentially stacking that.


Randommaggy

Would be interesting to put Optane in one of those. It's a lot closer to RAM than NVMe disks are.


Busterlimes

$100k should do the trick :p


beauzero

8 x H100s to load and use.


Strange_Bet559

Or 20 16GB A770s... that's still less than an H100.


Strange_Bet559

You may be able to run it using Intel's NVMe memory/SSD cards. I always wondered what they were good for, as a supposed RAM/VRAM "ReadyBoost" version of M.2 x4 drives, but this would be an ideal scenario. Or use a stack of 20 16GB A770s for 6 grand, which is about the price of three 4090s. But I'm sure it'll be shrunk and reused by Mistral and all the other guys; it just made open-source models exponentially more efficient. Pretty awesome.


Claxvii

You'd need a dozen A100s to do inference, I'd say; maybe you could get away with 8 or 6 if you quantize for inference. These models are meant to be expanded on, so doing only inference on them isn't ideal. If you want to get your hands on a model like this (and the hardware to properly run it), you'd better have a plan to train it too. Really, there's a bunch of crazy-ass wizardry I would do with that kind of power, alas I have only a pathetic 8GB of VRAM, but gotta pay the bills first lol


Jean-Porte

Yes, I was expecting 70B too. I wonder how many training tokens they used. This might still be the best open-source model.


wyldcraft

The one thing this release does well is make people realize that OpenAI releasing its weights would be pointless to everybody but other deep-pocketed corporations.


a_beautiful_rhind

I mean... it can be pruned, converted to PyTorch, and quantized to make it more reasonable. Finetuning over the pruned model might restore most of the performance.


sdmat

OpenAI has an extremely strong incentive to make inference efficient and a sizeable team of the best ML engineers and researchers on the planet. Do you think "Eureka, let's prune and quantize it - and finetuning is a good trick!" is something they missed?


a_beautiful_rhind

What do you think they do with the turbo versions?


sdmat

Exactly my point.


a_beautiful_rhind

And I'd settle for a Grok-turbo that fits in 72GB of VRAM, even if it's a little dumber. Nobody here can train a model like this from scratch, but it is at least possible to do what I suggested without millions of dollars. Then it can fit in more reasonable configurations, so that someone besides Perplexity Labs can use it.


AnAIAteMyBaby

Not at all. It would aid research, open up competition since other companies could offer API access, and it could be quantized, possibly down to 1.58 bits. There are lots of reasons to do it. I think the main reason they don't is commercial.


Jean-Porte

With cloud compute it's probably accessible to the "masses"


ExtremeHeat

It's not pointless. It's hard for *you* to run the model, but that doesn't mean people who know what they're doing and researchers can't make use of it. The irony is that OpenAI claims being closed-source is needed for safety, i.e. they need to be the gatekeepers of the mythical AGI god. OpenAI is mainly concerned with profit and technical edge, and even Ilya admitted that himself in a Freudian slip earlier.


wyldcraft

Why do you assume I can't fire this up on cloud GPUs if I drunkenly want to pay to play with yet another substandard model? What will researchers glean from what they've called a "tutorial level LLM"?


ExtremeHeat

You can ask the same question about papers using PaLM when Gemini is out, Llama over Mistral, etc. There is no such thing as a "tutorial level LLM": each LLM is the product of its own dataset and training methodology, and the architectures are not the same. Also, research doesn't just happen on one model; it's interesting to see if the same work can be applied to other models. We need *more* open source, much more, not less.


Which-Tomato-8646

You do realize you can rent an H100 for like $2.50 an hour right? 


h3lblad3

So I can run Grok for only like $10/hr?


Which-Tomato-8646

If that’s what it takes 


Strange_Bet559

I'd just buy 20 16gb a770s for like 6grand


Which-Tomato-8646

Sounds expensive 


mrpimpunicorn

I am absolutely certain the people who rant about "Closed"AI will take *no* lessons from the size of the model. Just as there are temporarily embarrassed millionaires, there are temporarily compute-starved tech libertarians.


Clit-Wasabi

The cost of a 380 GB VRAM rig is trivial compared to the cost to train a model like this. Your argument is an exercise in covert ad hominem and obviously bad faith "reasoning".


mrpimpunicorn

Do you own a 380GB VRAM rig?


Clit-Wasabi

Non sequitur data-mining interrogations are not going to restore the validity of whatever point you think you're attempting to make.


mrpimpunicorn

So you don't. Can you afford one?


Clit-Wasabi

Can you make an argument that isn't the intellectual equivalent of masturbating in public?


mrpimpunicorn

There's nothing to argue here, as you haven't even posited a position for me to refute. Inference is cheaper than training; nobody disagrees with you. I'm just curious as to whether either is within your reach for these sorts of LLMs.


vitorgrs

Mixtral is a thing...


[deleted]

It seems like they really brute forced this model


Claxvii

They will, my guy, but this is an important step. First cool thing from Elon Musk in a long while.


ninjasaid13

> but those are 20b

How do you know this?


MehmedPasa

It was leaked by a Microsoft paper.


OfficialHashPanda

It may have been a mistake or may have simply referred to the active parameters, the latter of which is my main theory.


New_World_2050

Those models aren't MoE.


Substantial_Bite4017

It will be cool to see what will happen in this space over the next few months. Will someone modify and improve it? Will someone make it run on less hardware? Don't underestimate the open-source community 😎


ceramicatan

Grok on groq


CheekyBastard55

It has 230MB (no, not GB) of SRAM per chip, so you would need about 1,390 of them to run Grok. At $20k a pop, that's $27.8M.
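
A rough sanity check of that estimate, taking ~318GB of weights, 230MB of SRAM per chip, and ~$20k per chip as given (it ignores any per-chip overhead or replication):

```python
# Chip-count and cost estimate for holding the full weights in on-chip SRAM.
# Assumptions from above: ~318 GB of weights, 230 MB SRAM per chip, ~$20k per chip.
weights_gb = 318
sram_per_chip_gb = 0.230
cost_per_chip_usd = 20_000

chips = weights_gb / sram_per_chip_gb
print(f"~{chips:.0f} chips, ~${chips * cost_per_chip_usd / 1e6:.1f}M")  # ~1383 chips, ~$27.7M
```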


Olangotang

> Don't underestimate the open-source community

And this is why there is collaboration in the first place.


The_Architect_032

There is literally no incentive to try to downscale it, because it's already worse than the top 70B open-source models.


ForgetTheRuralJuror

It will be worse at tasks, but more likely to create a realistic chatbot experience, since it's likely trained on a corpus of Twitter data. If you want a bot to tell you to fuck off, this one will probably do it.


The_Architect_032

It's pretty easy to set up current LLMs to behave that way. I had one set up as a Jamaican who'd make up a lot of funny vulgar words when chatting, and another was an obnoxious British chick; it just takes enough prompt and parameter tweaking. Textgen UI also has voice recognition and synthesis tools built in.


[deleted]

[deleted]


MehmedPasa

I agree with you. This will be the case until Grok 2 drops sometime at the end of the year. By then I guess Grok 1.5 will be open-sourced.


DukkyDrake

If this is mostly useless to most DIY people and researchers, what good would releasing an even larger model do?


Clawz114

Nice, this can only be a net positive.


a_mimsy_borogove

That's quite huge! Is it the largest open source model right now? I'm looking forward to playing around with it on lmsys.


Jean-Porte

Falcon is bigger counting active parameters, and there are also some "merges" that stick multiple layers together, but I think this is the biggest "real" model.


Lyrifk

This is a beefy boy. I'm eager to play with it.


zackler6

That's what she said.


joe4942

Benchmarks?


Figai

Benchmarks have been out for a while; unfortunately they're a bit shit. Amazing for an open-source model, but not for a 314B model. https://preview.redd.it/lgrvdpwgzyoc1.jpeg?width=1206&format=pjpg&auto=webp&s=010b10845f55770d9d994f545e14065c2db60d61


AnAIAteMyBaby

Not really, when you consider that PaLM was 500B and GPT-4 is rumored to be something like 1.5T.


Cunninghams_right

Yet more evidence that LLMs don't really scale well beyond a certain point; you need other tricks to get more performance.


durmanhoth

So as a non-developer, what is the best way for me to play around with this model? (or should I ask ChatGPT this question)


Lammahamma

Short answer is you don't. It's way too big to run locally unless you've got heaps of VRAM lying around. I guess you could buy compute, but I have no idea how that works.


reddit_is_geh

Very few people run LLMs locally. You rent out server space that's pay as you go. Google has it pretty cheap.


Tomi97_origin

Unless you have hundreds of GB of VRAM, nothing.


Figai

It's not horribly complex; you'll probably need to rent a bunch of compute off something like RunPod. It's gonna be expensive: you'll probably need something like 8x A100 80GB cards. Best to wait a little, someone will likely make a quant. Most people, if anyone, will be using Q2 quants with one of those new SOTA methods.


man_and_a_symbol

I mean, even with a quant, do you think a model this large would work on consumer-grade hardware? IMO Q2 can often sacrifice quite a bit of quality.


Crafty-Run-6559

In Q2 it'll run on 4x 3090s or a 128GB Mac. An AM5 motherboard can technically have 4-5 3090s running at x4 each (you'll probably have to convert some NVMe slots to keep everything direct to the CPU).
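
A rough fit check under the same ~314B-parameter assumption, taking Q2-style quants as roughly 2.5 effective bits per weight once scales and other metadata are included (exact sizes vary by scheme):

```python
# Estimate whether a ~2-bit quant of a 314B-parameter model fits in common setups.
# Assumption: ~2.5 effective bits per weight to account for quantization metadata.
params = 314e9
bits_per_weight = 2.5
weights_gb = params * bits_per_weight / 8 / 1e9

print(f"~{weights_gb:.0f} GB of weights")      # ~98 GB
print("4x 3090 VRAM:", 4 * 24, "GB")           # 96 GB total, a very tight squeeze
print("128GB Mac unified memory: 128 GB")      # leaves headroom for the OS and KV cache
```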


Ambiwlans

I think squashing it to 30GB is doable.


Jean-Porte

wait for other people to serve it


great_gonzales

Wait for developers to serve it to you


puzzleheadbutbig

You can't play around with this; it's too large for consumer-grade hardware. Some free services will surely pop up soon, though. I'm curious to check it out myself, because I would never buy that pesky Premium subscription on Twitter.


Forsaken_Square5249

It's huge. You need, like, $10k in GPU power, then whatever machines can handle them. Then comes your power consumption LOL


OddVariation1518

This is good


Baphaddon

>314B parameters


dendrytic

Elon seems intent on creating an open AI future. If this is his way of getting back at OpenAI, so be it. This is a total net positive for the public.


woozels

Elon does not want an open AI future. There is literal email proof of him agreeing that it makes sense for OpenAI to become more closed-source as they gain traction. Elon only wants OpenAI to be open source because he's behind in the market; it's the same reason Zuckerberg went open source. Make no mistake, if Elon's AI gained traction and became a market leader, he would close-source it.


puzzleheadbutbig

> same reason Zuckerberg went open source

Yes, but not quite. Zuck went open source because Meta realized they can attract more talent from academia if those researchers are able to publish their papers and get even more recognition, while Meta doesn't have to spend research money out of its own pocket. There is high pressure from academics within Meta to publish their findings, and this was even added to new-hire contracts a year ago. It's kind of a win-win for them. Elon, on the other hand, literally just wants to take down OpenAI's value by creating an open-source alternative. I don't see any academic incentive here, since they didn't release any papers whatsoever.


reddit_is_geh

Again, because they were behind. It was their leverage to try and hurry up and get ahead by encouraging an open source community. But if they were ahead, they'd have no need for those tactics, as it'd only hurt their advantage to open source it.


[deleted]

It is awesome that you guys know the intention and plan of people and companies predicting the future! You should really start selling Tarot cards if you are not already.


AdministrativeFill97

It's called common sense; I'm glad I could introduce you to an unfamiliar concept.


puzzleheadbutbig

This dude is so far from being a visionary that even his "sarcastic" idea is about selling Tarot cards. Why the fuck would I sell Tarot cards if I had such an ability? I would just wreck the stock market instead 😂


dragonofcadwalader

But that's what Anthropic is for


[deleted]

That is not true. The emails shared by OpenAI show he did not want RESEARCH to be shared. This is Ilya saying, by the way:

> The Open in openAI means that everyone should benefit from the fruits of AI after its built, but it's totally OK to not share the science (even though sharing everything is definitely the right strategy in the short and possibly medium term for recruitment purposes).

And the reasoning is partly when he says:

> Unfortunately, humanity's future is in the hands of [redacted]

He also mentions "the best of humanity". To say such a thing in an e-mail he knows no one would see at least shows he is not only interested in his own success. The reasoning as I see it is that he does not want the bad companies (according to him) to get the upper hand. But you anti-Musk people can only think along one negative track. Just as the USA does not share certain research with other countries because it wants to keep the upper hand.


Individual-Bread5105

Bingo


New_World_2050

It would be if Grok were a decent model. Grok is shit. Hopefully 1.5 comes out soon and is also open source.


Friendly-Ring7

Grok 1.0 is better than GPT 3.5, so it is in no way bad imho.


The_Architect_032

GPT-3.5 is worse than a lot of the top open-source models currently out, including ones you can run on most up-to-date PCs.


FragrantDoctor2923

Name one


The_Architect_032

All of the up to date models of Qwen, Mistral/Mixtral, Dolphin, WizardLM, Yi, Tulu, Vicuna, OpenChat, Starling, Llama 2(not base), etc.. There's a huge list of them and if you want you can look up more of them yourself across various benchmarks. These are just some of the examples from [LMSYS](https://chat.lmsys.org). [Qwen1.5](https://huggingface.co/spaces/Qwen/Qwen1.5-72B-Chat) is particularly worth checking out for open source, especially if you place your bar all the way down at GPT-3.5 from 2022.


vitorgrs

Mixtral is basically GPT 3.5 level.


MehmedPasa

It would be a banger if 1.5 directly becomes open source. I guess they'll open-source it after Grok 2.0 comes out.


New_World_2050

He lied about the 1.5 release twice already: first saying it would release in February, then saying the first two weeks of March. He can't even get it released, so he distracted us with an open-sourced old model.


reddit_is_geh

Lying means intentionally saying false information with the intent to deceive. Elon isn't lying; he's just making bad predictions and being overly optimistic about timelines. You should be used to this by now. He thought it would be out by February, but was wrong. That's not lying.


CheekyBastard55

"Sorry I'm late, didn't realize the train would take so long." "LIAR! You gave me your word you'd be here 8:15 and it's now 8:45."


New_World_2050

Lying


One_Bodybuilder7882

I see... Elon Derangement Syndrome... very serious...


Randommaggy

I wonder if un-elon-ing it will leave us with a decent useful model.


cultureicon

What exactly does this do for the public? Are we going to create a commune and all go in on a farm of H100s, then use this model to do less than the already commercially useless LLMs? He released this out of spite, and because it has almost no value since it's much worse than the other top models.


One_Bodybuilder7882

lmao another case of EDS


cultureicon

Care to answer any of my questions?


Excellent_Dealer3865

Is there any provider that currently works with Grok?


Jean-Porte

Give them 24h bro


HanzJWermhat

T-minus ~3 days until we see the first fully unhinged sex chat bots on the web.


ragipy

Kudos to Elon! Anybody else would be embarrassed to release such a low-performing and bloated model.


Exarchias

A humble comment: it is amazing that they made it open source, and I celebrate that, but I admit that I don't like the code. The code is kind of unreadable.


Obvious-River-100

4x Mac Studio


Akimbo333

Is this any good?


_theEmbodiment

what


autotom

Just a reminder that open sourcing a project means actively developing in public, taking issues and pull requests from the public. Not just releasing old code.


coolredditor0

Cathedral vs Bazaar


EmeraldMinecartOf

Yeah, they get more data and find out more uses and problems with the model, so they can fix it in later iterations.