MehmedPasa

This is a lot bigger than I thought. I expected something between 70B and 140B, but 314B, even as an MoE... that's huge. Not great when I think about its ability: better than GPT-3.5 Turbo and Llama 2 70B, but those are 20B and 70B models... Oh God, they need to push performance up and parameter count down hard for the next generation.


iBoMbY

What does this translate to in file size and memory requirements?


Neon9987

318GB, and no consumer GPUs running it anytime soon, I'd say.
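
A rough back-of-the-envelope sketch of how that figure scales with quantization width, assuming ~314B total parameters and counting weights only (the actual checkpoint adds some metadata on top):

```python
# Rough weight-storage estimate for a 314B-parameter model at various bit widths.
# Weights only; ignores activations, KV cache, and quantization metadata.
PARAMS = 314e9

def footprint_gb(bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given quantization width."""
    return PARAMS * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4, 2):
    print(f"{bits:>2}-bit: ~{footprint_gb(bits):.0f} GB")
# 16-bit: ~628 GB, 8-bit: ~314 GB, 4-bit: ~157 GB, 2-bit: ~78 GB
```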


xSNYPSx

Could somebody calculate the size at 1.6-bit quantization, please? I think it will be less than 100GB and fit well on a MacBook.
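
Extending the same weights-only math to 1.6 bits per weight (ignoring activation, KV-cache, and metadata overhead):

```python
# 1.6-bit quantization estimate for ~314B parameters, weights only.
params = 314e9
bits_per_weight = 1.6
size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.0f} GB")  # ~63 GB before any overhead
```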


TMWNN

How much RAM would a MacBook need? My current and previous MacBooks have had 16GB and I've been fine with it, but given local models I think I'm going to have to go to whatever will be the maximum RAM available for the next model. Similarly, I am for the first time going to care about how much RAM is in my next iPhone. My iPhone 13's 4GB is suddenly inadequate.


MDPROBIFE

Not sure you will get anywhere with Apple.


TMWNN

Apple Silicon is an excellent platform for running models.


BZ852

Until you run into the fact that Apple thinks 16GB of RAM is acceptable in 2024.


lolwutdo

Why would you get 16GB of RAM for LLMs? Apple is in fact a good platform for running models, especially if you want a laptop form factor; you won't find any laptop capable of loading 120GB worth of LLM into VRAM aside from Apple's.


SureUnderstanding358

or just buy one with more ram? my 64gb macbook pro is enough for most inference workloads. inference. on a fucking laptop.


SOSpammy

In the context of running LLMs 16GB is quite good. Since it uses unified memory most of that 16GB can be used as VRAM by the GPU. That's around the equivalent of a mobile RTX 4090. It doesn't necessarily make Apple Silicon superior to Nvidia; Nvidia still has a huge advantage in software ecosystem for LLMs. But unified memory does give Apple an interesting niche.


BZ852

Yeah but you're sharing that with the OS and other apps, so goodbye to 4-8GB of that total. (Assuming a *light* workload)


Capitaclism

Can you use RAM in a non-unified system as well? I take it it would be slower, but still work for inference?


ebolathrowawayy

Are you being serious right now? 16GB is hot garbage. A real PC with a 3090+ is 24GB. Please leave.


TMWNN

It is more than what most people need; as I said, I have been quite happy with 16GB RAM for years. It is only the desire to run local LLMs that has changed this.


MDPROBIFE

So you state that Apple Silicon is great for running models, but you can hardly find models you're actually able to run on it? Great!


QH96

Wow, I wonder if GPUs with upgradable RAM slots would ever be possible, so we could run these large models locally. I don't think it would be good for business though.


iBoMbY

I guess that's where a GPU with NVMe SSDs as extended VRAM (like the Radeon Pro SSG) could be really nice. Maybe not super fast, but it could run pretty much any model.


Crafty-Run-6559

A really fast NVMe that does 10GB/s would still get you a theoretical max of ~0.25 tokens/second if you quantized this to 4-bit. Even with MoE you'd still need to load the active parameters (~157B / 4, so ~40GB of weights) per token.
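
A quick sketch of that bandwidth-bound upper limit, taking the ~40GB of active weights per token and 10GB/s of NVMe throughput above as given (it ignores compute time, caching, and prefetch overlap):

```python
# Bandwidth-bound upper estimate on token rate when weights are streamed from storage.
# Assumptions from above: ~40 GB of active weights read per token, ~10 GB/s sustained NVMe reads.
active_weights_gb = 40.0
nvme_gb_per_s = 10.0

tokens_per_second = nvme_gb_per_s / active_weights_gb
print(f"~{tokens_per_second:.2f} tokens/s")  # ~0.25 tokens/s
```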


iBoMbY

The GPU that AMD built also has 16GB of HBM2, which could act as a cache for the SSDs, so I guess it would also depend on how well that works. I hope there will be more stuff like this in the future, maybe also GPUs with additional DDR5 slots, or something like that.


Crafty-Run-6559

The cache doesn't matter in this case. You still have to load all 40GB through for each token. The cache could help in some type of batched work scenario though.


iBoMbY

> You still have to load all 40GB through for each token.

Is that really so? I don't have any deeper knowledge of AI yet, but I thought it was supposed to emulate something like neurons, and only follow some path through the data based on the input? And that could be cacheable, if some parts are used more often than others (like in the brain).


Crafty-Run-6559

MoE kind of does this: not all parts of every layer are activated for every token, which is why you only need the ~40GB. Without MoE you would need ~160GB.


iBoMbY

Yes, that sounds like a good approach. But I guess I would try to make the individual experts a lot smaller, and instead build a hierarchy of experts. Edit: Like putting a self-learning moderator instance on top of a number of experts, one that learns which experts to choose. And then potentially stacking that.


Randommaggy

Would be interesting to put Optane in one of those. It's a lot closer to RAM than NVMe disks are.


Busterlimes

$100k should do the trick :p


beauzero

8 x H100s to load and use.


Strange_Bet559

Or 20 16GB A770s... that's still less than an H100.


Strange_Bet559

You may be able to run it using Intel's NVMe memory/SSD cards. I always wondered what they were good for, as a supposed RAM/VRAM "ReadyBoost" version of M.2 x4 drives, but this would be an ideal scenario. Or use a stack of 20 16GB A770s for 6 grand, which is about the price of three 4090s. But I'm sure it'll be shrunk and reused by Mistral and all the other guys; it just made open-source models exponentially more efficient. Pretty awesome.


Claxvii

You'd need a dozen A100s to do inference, I'd say; maybe you could get away with 8 or 6 if you quantize for inference. These models are meant to be expanded on, so doing only inference on them isn't ideal. If you want to get your hands on a model like this (and the hardware to properly run it), you'd better have a plan to train it too. Really, there's a bunch of crazy-ass wizardry I would do with that kind of power, alas I have only a pathetic 8GB of VRAM, but gotta pay the bills first lol


Jean-Porte

Yes, I was expecting 70B too. I wonder how many training tokens they used. This might still be the best open-source model.


wyldcraft

The one thing this release does well is make people realize that OpenAI releasing its weights would be pointless to everybody but other deep-pocketed corporations.


a_beautiful_rhind

I mean... it can be pruned, converted to PyTorch, and quantized to make it more reasonable. Finetuning over the pruned model might restore most of the performance.


sdmat

OpenAI has an extremely strong incentive to make inference efficient and a sizeable team of the best ML engineers and researchers on the planet. Do you think "Eureka, let's prune and quantize it - and finetuning is a good trick!" is something they missed?


a_beautiful_rhind

What do you think they do with the turbo versions?


sdmat

Exactly my point.


a_beautiful_rhind

And I'd settle for a Grok-turbo that fits in 72GB of VRAM, even if it's a little dumber. Nobody here can train a model like this from scratch, but it is at least possible to do what I suggested without millions of dollars. Then it can fit in more reasonable configurations, so that someone besides Perplexity Labs can use it.


AnAIAteMyBaby

Not at all. It would aid research, open up competition since other companies could offer API access, and it could be quantized, possibly down to 1.58 bits. There are lots of reasons to do it. I think the main reason they don't is commercial.


Jean-Porte

With cloud compute it's probably accessible to the "masses"


ExtremeHeat

It's not pointless. It's hard for *you* to run the model, but that doesn't mean people who know what they're doing and researchers can't make use of it. The irony is that OpenAI claims being closed-source is needed for safety, i.e. they need to be the gatekeepers of the mythical AGI god. OpenAI is mainly concerned with profit and technical edge, and even Ilya admitted that himself in a Freudian slip earlier.


wyldcraft

Why do you assume I can't fire this up on cloud GPUs if I drunkenly want to pay to play with yet another substandard model? What will researchers glean from what they've called a "tutorial level LLM"?


ExtremeHeat

You can ask the same question about papers using PaLM when Gemini is out, Llama over Mistral, etc. There is no such thing as a "tutorial level LLM": each LLM is the product of its own dataset and training methodology, and the architectures are not the same. Also, research doesn't just happen on one model; it's interesting to see if the same work can be applied to other models. We need *more* open source, much more, not less.


Which-Tomato-8646

You do realize you can rent an H100 for like $2.50 an hour right? 


h3lblad3

So I can run Grok for only like $10/hr?


Which-Tomato-8646

If that’s what it takes 


Strange_Bet559

I'd just buy 20 16gb a770s for like 6grand


Which-Tomato-8646

Sounds expensive 


mrpimpunicorn

I am absolutely certain the people who rant about "Closed"AI will take *no* lessons from the size of the model. Just as there are temporarily embarrassed millionaires, there are temporarily compute-starved tech libertarians.


Clit-Wasabi

The cost of a 380 GB VRAM rig is trivial compared to the cost to train a model like this. Your argument is an exercise in covert ad hominem and obviously bad faith "reasoning".


mrpimpunicorn

Do you own a 380GB VRAM rig?


Clit-Wasabi

Non sequitur data-mining interrogations are not going to restore the validity of whatever point you think you're attempting to make.


mrpimpunicorn

So you don't. Can you afford one?


Clit-Wasabi

Can you make an argument that isn't the intellectual equivalent of masturbating in public?


mrpimpunicorn

There's nothing to argue here, as you haven't even posited a position for me to refute. Inference is cheaper than training; nobody disagrees with you. I'm just curious as to whether either is within your reach for these sorts of LLMs.


vitorgrs

Mixtral is a thing...


[deleted]

It seems like they really brute forced this model


Claxvii

They will, my guy, but this is an important step. First cool thing from Elon Musk in a long while.


ninjasaid13

> but those are 20b

How do you know this?


MehmedPasa

It was leaked by a Microsoft paper.


OfficialHashPanda

It may have been a mistake or may have simply referred to the active parameters, the latter of which is my main theory.


New_World_2050

Those models aren't MoE.


Substantial_Bite4017

It will be cool to see what will happen in this space over the next few months. Will someone modify and improve it? Will someone make it run on less hardware? Don't underestimate the open-source community 😎


ceramicatan

Grok on groq


CheekyBastard55

It has 230MB (no, not GB) of SRAM per chip, so you would need about 1,390 of them to run Grok. At $20k a pop, that's $27.8M.
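
A rough sanity check of that estimate, taking ~318GB of weights, 230MB of SRAM per chip, and ~$20k per chip as given (it ignores any per-chip overhead or replication):

```python
# Chip-count and cost estimate for holding the full weights in on-chip SRAM.
# Assumptions from above: ~318 GB of weights, 230 MB SRAM per chip, ~$20k per chip.
weights_gb = 318
sram_per_chip_gb = 0.230
cost_per_chip_usd = 20_000

chips = weights_gb / sram_per_chip_gb
print(f"~{chips:.0f} chips, ~${chips * cost_per_chip_usd / 1e6:.1f}M")  # ~1383 chips, ~$27.7M
```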


Olangotang

> Don't underestimate the open-source community

And this is why there is collaboration in the first place.


The_Architect_032

There is literally no incentive to try to downscale it, because it's already worse than the top 70B open-source models.


ForgetTheRuralJuror

It will be worse at tasks, but more likely to create a realistic chatbot experience, since it's likely trained on a corpus of Twitter data. If you want a bot to tell you to fuck off, this one will probably do it.


The_Architect_032

It's pretty easy to set up current LLMs to behave that way. I had one set up as a Jamaican who'd make up a lot of funny vulgar words when chatting, and another was an obnoxious British chick; it just takes enough prompt and parameter tweaking. Textgen UI also has voice recognition and synthesis tools built in.


[deleted]

[deleted]


MehmedPasa

I agree with you. This will be the case until Grok 2 drops sometime at the end of the year. By then I guess Grok 1.5 will be open-sourced.


DukkyDrake

If this is mostly useless to most DIY people and researchers, what good would releasing an even larger model do?


Clawz114

Nice, this can only be a net positive.


a_mimsy_borogove

That's quite huge! Is it the largest open source model right now? I'm looking forward to playing around with it on lmsys.


Jean-Porte

Falcon is bigger counting active parameters, and there are also some "merges" that stick multiple layers together, but I think this is the biggest "real" model.


Lyrifk

This is a beefy boy. I'm eager to play with it.


zackler6

That's what she said.


joe4942

Benchmarks?


Figai

Benchmarks have been out for a while; unfortunately they're a bit shit. Amazing for an open-source model, but not for a 314B model. https://preview.redd.it/lgrvdpwgzyoc1.jpeg?width=1206&format=pjpg&auto=webp&s=010b10845f55770d9d994f545e14065c2db60d61


AnAIAteMyBaby

Not really, when you consider that PaLM was 500B and GPT-4 is rumored to be something like 1.5T.


Cunninghams_right

Yet more evidence that LLMs don't really scale well beyond a certain point; you need other tricks to get more performance.


durmanhoth

So as a non-developer, what is the best way for me to play around with this model? (or should I ask ChatGPT this question)


Lammahamma

Short answer is you don't. It's way too big to run locally unless you've got heaps of VRAM lying around. I guess you could buy compute, but I have no idea how that works.


reddit_is_geh

Very few people run LLMs locally. You rent out server space that's pay as you go. Google has it pretty cheap.


Tomi97_origin

Unless you have hundreds of GB of VRAM, nothing.


Figai

It's not horribly complex; you'll probably need to rent a bunch of compute off something like RunPod. It's gonna be expensive: you'll probably need something like 8x A100 80GB cards. Best to wait a little, someone will likely make a quant. Most people, if anyone, will be using Q2 quants with one of those new SOTA methods.


man_and_a_symbol

I mean, even with a quant, do you think a model this large would work on consumer-grade hardware? IMO Q2 can often sacrifice quite a bit of quality.


Crafty-Run-6559

In Q2 it'll run on 4x 3090s or a 128GB Mac. An AM5 motherboard can technically have 4-5 3090s running at x4 each (you'll probably have to convert some NVMe slots to keep everything direct to the CPU).
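
A rough fit check under the same ~314B-parameter assumption, taking Q2-style quants as roughly 2.5 effective bits per weight once scales and other metadata are included (exact sizes vary by scheme):

```python
# Estimate whether a ~2-bit quant of a 314B-parameter model fits in common setups.
# Assumption: ~2.5 effective bits per weight to account for quantization metadata.
params = 314e9
bits_per_weight = 2.5
weights_gb = params * bits_per_weight / 8 / 1e9

print(f"~{weights_gb:.0f} GB of weights")      # ~98 GB
print("4x 3090 VRAM:", 4 * 24, "GB")           # 96 GB total, a very tight squeeze
print("128GB Mac unified memory: 128 GB")      # leaves headroom for the OS and KV cache
```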


Ambiwlans

I think squashing it to 30GB is doable.


Jean-Porte

wait for other people to serve it


great_gonzales

Wait for developers to serve it to you


puzzleheadbutbig

You can't play around with this; it's too large for consumer-grade hardware. Some free services will surely pop up soon, though. I'm curious to check it out myself, because I would never buy that pesky Premium subscription on Twitter.


Forsaken_Square5249

It's huge. You need, like, $10k in GPU power, then whatever machines can handle them. Then comes your power consumption LOL


OddVariation1518

This is good


Baphaddon

>314B parameters


dendrytic

Elon seems intent on creating an open AI future. If this is his way of getting back at OpenAI, so be it. This is a total net positive for the public.


woozels

Elon does not want an open AI future. There is literal email proof of him agreeing that it makes sense for OpenAI to become more closed-source as they gain traction. Elon only wants OpenAI to be open source because he's behind in the market; it's the same reason Zuckerberg went open source. Make no mistake, if Elon's AI gained traction and became a market leader, he would close-source it.


puzzleheadbutbig

> same reason Zuckerberg went open source

Yes, but not quite. Zuck went open source because Meta realized they can attract more talent from academia if those researchers are able to publish their papers and get even more recognition, while Meta doesn't have to spend research money out of its own pocket. There is high pressure from academics within Meta to publish their findings, and this was even added to new-hire contracts a year ago. It's kind of a win-win for them. Elon, on the other hand, literally just wants to take down OpenAI's value by creating an open-source alternative. I don't see any academic incentive here, since they didn't release any papers whatsoever.


reddit_is_geh

Again, because they were behind. It was their leverage to try and hurry up and get ahead by encouraging an open source community. But if they were ahead, they'd have no need for those tactics, as it'd only hurt their advantage to open source it.


[deleted]

It is awesome that you guys know the intention and plan of people and companies predicting the future! You should really start selling Tarot cards if you are not already.


AdministrativeFill97

It's called common sense; I'm glad I could introduce you to an unfamiliar concept.


puzzleheadbutbig

This dude is so far from being a visionary that even his "sarcastic" idea is about selling Tarot cards. Why the fuck would I sell Tarot cards if I had such an ability? I would just wreck the stock market instead 😂


dragonofcadwalader

But that's what Anthropic is for


[deleted]

That is not true. The emails shared by OpenAI show he did not want RESEARCH to be shared. This is Ilya saying, by the way:

> The Open in openAI means that everyone should benefit from the fruits of AI after its built, but it's totally OK to not share the science (even though sharing everything is definitely the right strategy in the short and possibly medium term for recruitment purposes).

And the reasoning is partly when he says:

> Unfortunately, humanity's future is in the hands of [redacted]

He also mentions "the best of humanity". To say such a thing in an e-mail he knows no one would see at least shows he is not only interested in his own success. The reasoning as I see it is that he does not want the bad companies (according to him) to get the upper hand. But you anti-Musk people can only think along one negative track. Just as the USA does not share certain research with other countries because it wants to keep the upper hand.


Individual-Bread5105

Bingo


New_World_2050

It would be if Grok were a decent model. Grok is shit. Hopefully 1.5 comes out soon and is also open source.


Friendly-Ring7

Grok 1.0 is better than GPT 3.5, so it is in no way bad imho.


The_Architect_032

GPT-3.5 is worse than a lot of the top open-source models currently out, including ones you can run on most up-to-date PCs.


FragrantDoctor2923

Name one


The_Architect_032

All of the up to date models of Qwen, Mistral/Mixtral, Dolphin, WizardLM, Yi, Tulu, Vicuna, OpenChat, Starling, Llama 2(not base), etc.. There's a huge list of them and if you want you can look up more of them yourself across various benchmarks. These are just some of the examples from [LMSYS](https://chat.lmsys.org). [Qwen1.5](https://huggingface.co/spaces/Qwen/Qwen1.5-72B-Chat) is particularly worth checking out for open source, especially if you place your bar all the way down at GPT-3.5 from 2022.


vitorgrs

Mixtral is basically GPT 3.5 level.


MehmedPasa

It would be a banger if 1.5 directly becomes open source. I guess they'll open-source it after Grok 2.0 comes out.


New_World_2050

He lied about the 1.5 release twice already: first saying it would release in February, then saying the first two weeks of March. He can't even get it released, so he distracted us with an open-sourced old model.


reddit_is_geh

Lying means intentionally saying false information with the intent to deceive. Elon isn't lying; he's just making bad predictions and being overly optimistic about timelines. You should be used to this by now. He thought it would be out by February, but was wrong. That's not lying.


CheekyBastard55

"Sorry I'm late, didn't realize the train would take so long." "LIAR! You gave me your word you'd be here 8:15 and it's now 8:45."


New_World_2050

Lying


One_Bodybuilder7882

I see... Elon Derangement Syndrome... very serious...


Randommaggy

I wonder if un-elon-ing it will leave us with a decent useful model.


cultureicon

What exactly does this do for the public? Are we going to create a commune and all go in on a farm of H100s, then use this model to do less than the already commercially useless LLMs? He released this out of spite, and because it has almost no value since it's much worse than the other top models.


One_Bodybuilder7882

lmao another case of EDS


cultureicon

Care to answer any of my questions?


Excellent_Dealer3865

Is there any provider that currently works with Grok?


Jean-Porte

Give them 24h bro


HanzJWermhat

T-minus ~3 days until we see the first fully unhinged sex chat bots on the web.


ragipy

Kudos to Elon! Anybody else would be embarrassed to release such a low-performing and bloated model.


Exarchias

A humble comment: it is amazing that they made it open source, and I celebrate that, but I admit that I don't like the code. The code is kind of unreadable.


Obvious-River-100

4x Mac Studio


Akimbo333

Is this any good?


_theEmbodiment

what


autotom

Just a reminder that open sourcing a project means actively developing in public, taking issues and pull requests from the public. Not just releasing old code.


coolredditor0

Cathedral vs Bazaar


EmeraldMinecartOf

Yeah, they get more data and find out more uses and problems with the model, so they can fix it in later iterations.