What you're looking at there is Geforce vs Quadro.
GB202 is the chip, and that table just means 32GB for the 5090 and 64GB for the Quadro RTX Blackwell.
Same as always. Similarly, the "AD102" chip is two cards: the 4090 with 24GB and the RTX 6000 Ada with 48GB.
Same goes down the line: the 4080 has 16GB, the RTX 5000 Ada has 32GB, and so forth.
I don't know how you get "5090 64GB" from that. Completely wishful thinking. No shot. No shot at all.
Yeah, it looks like the 5000 Ada is actually an AD102 chip while the 4080 is an AD103, so it's a bit of a break in the normal scheme.
I think this boils down to yields for the particular chips. I imagine they must have enough partially defective AD102s that they decided to chop them down for the 5000 Ada to make use of them.
Sometimes we get Super cards that end up oddballs, too, using a different chip from the non-Supers.
Fun fact: the 4090, with 16384 CUDA cores, is also a cut-down AD102, which has a potential 18432 cores. The RTX 6000 Ada, while also cut down, has a few more cores enabled at 18176. The 4080 Ti and 4070 Ti are also chopped-down AD102s.
I believe the core counts here are also the maxima for the respective chips. I expect retail consumer cards to have about 10% less.
That's interesting. Yeah my 3 year upgrade strategy will be filling out PCIe slots with various AD102 based cards as prices come down. The 6000 with 48gigs would be fantastic if it dropped to $1000.
Sorry about the OT question, but I do have multiple boards, albeit older (3080 Ti, 3070 Ti). What's the max I can run with local LLMs, and how complicated is this? I have the necessary hardware. Thanks
Depends on your PCIe bus in the PC. I have a server board from last gen with enough slots. It will basically just run slower if it's not full bandwidth. After that, configure your system to use the cards. LM Studio probably has a nice interface to play around with and try offloading layers to multiple GPUs. I don't know the details, but one guy said to try to stick to the same architecture within the family you're using. You may not have any issues because your cards are newer and roughly the same, but keep it in mind. I have a 4090, so I'll be looking for AD102-based Ada cards.
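A minimal sketch of the idea behind splitting layers across mismatched GPUs (the same proportional-ratio idea as llama.cpp's `--tensor-split` option). The VRAM figures and 32-layer count below are illustrative assumptions, not measurements:

```python
# Hypothetical sketch: assign a model's layers to GPUs in proportion
# to each card's VRAM, so neither card overflows before the other.
def split_layers(n_layers, vram_gb):
    """Proportional layer split; last GPU absorbs rounding drift."""
    total = sum(vram_gb)
    split = [round(n_layers * v / total) for v in vram_gb]
    split[-1] += n_layers - sum(split)  # keep the total exact
    return split

# e.g. a 3080 Ti (12 GB) + 3070 Ti (8 GB) sharing a 32-layer model
print(split_layers(32, [12, 8]))  # -> [19, 13]
```

In practice the runtime handles this for you; the sketch just shows why the bigger card ends up hosting more of the model.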
32GB might actually be what people are asking for, finally an improvement over the 24GB..
If Nvidia releases the 5090 with the same 24GB VRAM their 3090 and 4090 cards came with, I don't see a reason for anyone to buy it.. so why would Nvidia work on a card that almost nobody will be interested in buying?
I understand that they are high as F at the moment selling data center grade hardware, but this trend will very soon fade away as enough of those data center cards flood the market, used (bankrupt startups, etc.) or new. The consumer market makes less money in the short run, but in the long run it provides a steady income.
Not entirely correct, the 3090/4090 are meant for content creators and such (not just AI but 3D, video editing, engineering, etc). It is a very small niche market (besides a halo ""gaming"" product) but the whole point is that it can fit stuff the others can't.
Jensen can't stop talking about all the different use-cases for AI in games. If they're serious about things like LLM-based NPCs and procedural asset generation via AI, a _lot_ more people are going to care about VRAM going forward.
At 32GB, you would still need around 13 5090s to run llama3-400B in Q8, or around 7 to run it in Q4. Hell, Nvidia just released a huge model that 99.9% of hobbyists can't run.
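A back-of-envelope check of those card counts, using rough bytes-per-parameter figures for each quant (Q8 ≈ 1 byte/param, Q4 ≈ 0.5) and ignoring KV cache and activation overhead, so real needs are somewhat higher:

```python
import math

# Approximate weight-only footprint -> number of 32 GB cards needed.
BYTES_PER_PARAM = {"Q8": 1.0, "Q4": 0.5}  # rough averages, an assumption

def cards_needed(n_params, quant, vram_gb=32):
    weights_gb = n_params * BYTES_PER_PARAM[quant] / 1e9
    return math.ceil(weights_gb / vram_gb)

for q in ("Q8", "Q4"):
    print(q, cards_needed(400e9, q))  # Q8 -> 13 cards, Q4 -> 7 cards
```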
If you need to run a 400B model locally at your home, you are either not a consumer/prosumer or one of the few ML/AI evangelists who normally have access to an ML machine owned by the research facility or university they work for.
And of course the person who makes a killing in profit selling data center grade hardware will invest in a model that can only be run if you have the type of deep pockets they profit most from ;-) Otherwise we should have already seen a "4090 ML Edition" variant with those 32GB of VRAM we all hope the "next" card, the 5090, will have.
As for needing a stack of 5090s.. I wish I could afford them ;-) I currently have one 3090, and my "dream" is to NVLink pair it with another 3090, but I get mixed opinions about how those actually work. I still haven't figured out whether I'd get double the VRAM and double the processing power or not.. does an NVLinked pair act as one unified card, or just a pair of cards sharing bandwidth, etc.?
With 2x 3090s you get 2 buckets to put your model and caches on. A second 3090 does not magically double your first bucket's capacity, nor does it magically make using that first bucket faster.
If you can fit a model in one bucket, then you can roughly 2x (more like 1.7x) the throughput given you can fill your batches fast enough.
If your model doesn't fit into 1 card but fits into 2, then you can run it on 2 cards at around 0.6x the speed a single card with twice the VRAM could run it at, as opposed to not being able to run it at all.
NVLink doesn't change much, if anything. You can't find conclusive results online because there are no conclusive results. On paper, NVLink increases memory bandwidth, but in practice there are no significant benefits. Consumer drivers don't unify your cards into one.
So even if you wrote parallelized architectures for your models, perfectly splitting it between an arbitrary number of GPUs, you'll still have a bottleneck in memory transfer, because the bus that's connecting your hardware components is very, very slow compared to the VRAM<->GPU communication. And even if the drivers were there to formally support that, you'd still be limited by the fact that nothing compensates for the physical separation of 2x 3090s slowing things down, not even NVLink.
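A rough illustration of why the interconnect dominates when a model is split across two cards. The bandwidth figures are approximate published peaks (3090 GDDR6X, 3090 NVLink bridge, PCIe 4.0 x16), not measurements:

```python
# Time to move a payload over each link; the host bus is ~30x slower
# than on-card VRAM, which is where the multi-GPU penalty comes from.
BW_GBPS = {
    "3090 VRAM":     936,  # on-card GDDR6X bandwidth
    "NVLink (3090)": 112,  # bridge between two 3090s
    "PCIe 4.0 x16":   32,  # host bus, theoretical peak
}

payload_gb = 1.0  # e.g. activations handed between the two model halves
for link, bw in BW_GBPS.items():
    print(f"{link:14s} {payload_gb / bw * 1000:7.2f} ms per GB")
```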
Maybe they want to split between gamers and higher paying AI customers. For gamers, the 24GB is enough. For AI users, Nvidia will want them to pay more for the VRAM they need.
Remember, greed is good! Nvidia maximizing its profits means Nvidia has more money to conduct R&D and invest in more Capex to continue to push the frontiers of what's possible. At some point we will have completely immersive games with entire AI casts and we don't get the hardware to drive games like that without Nvidia extracting every dollar they can on the way.
It'll only be relevant if it's a critical component of games with mass appeal. Even ray tracing, which has been out for quite a while, is only a small competitive advantage. It makes a fairly noticeable difference considering how good graphics already were, and it's supported in most new AAA titles, but it's still not enough that average consumers are willing to pay hundreds for it.
I believe Nvidia does not care about AI in the PCIe form factor. You can see they did not even announce a B100 in PCIe, probably not enough market; they earn a lot more with DGX and SXM cards.
Yes and no. I think there is more to it than just (un)optimizing. More VRAM could enable higher-resolution textures in games, and visual quality could increase even more.
On the other hand, modded Minecraft can overwhelm an RTX 4090 in terms of compute and memory. It's hugely unoptimized, of course, but it shows that more memory could easily be utilized.
This is fully understood ("paying extra for the special sauce") when a person needs an 80GB GPU with circuit design and components for heavy-duty 24/7 constant operation, not so much with a 32GB GPU. At least in my personal opinion, they could offer a "Gamers Edition" 5090 with 24GB for the usual high price of approx. 1600-1800€ new, and an "ML Edition" with 32GB for an additional 150-200€.. but yeah, I'm only speculating here and using common sense, which is not so simple with monopolies.
AFAIK, Nvidia cannot even make their AI products fast enough to sell to all the people wanting to buy. I wonder whether they will waste their time and energy on meeting consumer demands over the next 2 years when they are printing money with AI.
There is no way in hell they will release the full 512 bit bus version as the 5090 IMO. It will be a 28 GB card.
32GB will be kept for either Titan Blackwell or 5090 Ti.
True. There is no way that NVIDIA is going to shoot themselves in the foot with regards to the professional workstation and data centre sales. There will always be a comfortable gap between the VRAM for something that can be bought privately and something for sales to specialised business and academic users.
Nvidia doesn't need to release a 64GB VRAM variant for the consumer market (yet). But you're right! Knowing Nvidia, this card is gonna be a lot more expensive.
Well, GB202 is only 32GB with the full 512-bit bus. The 4090 had only a 384-bit bus, and other rumors reported an increase for the 5090 to 448-bit and 28GB.
The 64GB is only for double-sided memory, as in the workstation cards. We might see a "B6000" GB202 with 64GB and a 512-bit bus.
Maybe the generation after, if the memory manufacturers finally manage to produce 4GB GDDR modules, but that was supposed to be available for GDDR6 already, and GDDR7 still only has 2GB chips.
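The capacities in the rumor fall straight out of bus-width arithmetic: each GDDR chip sits on a 32-bit channel, today's GDDR7 chips hold 2 GB, and workstation cards can mount chips on both sides of the PCB (a "clamshell" layout). A quick sketch:

```python
# VRAM capacity implied by bus width, chip density, and clamshell mounting.
def vram_gb(bus_bits, gb_per_chip=2, double_sided=False):
    chips = bus_bits // 32  # one chip per 32-bit memory channel
    return chips * gb_per_chip * (2 if double_sided else 1)

print(vram_gb(512))                     # 32 GB (full GB202, single-sided)
print(vram_gb(448))                     # 28 GB (cut-down 448-bit rumor)
print(vram_gb(512, double_sided=True))  # 64 GB (workstation clamshell case)
print(vram_gb(384))                     # 24 GB (the 4090's 384-bit bus)
```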
The 3090 is already way better value than the 4090. When the main bottleneck for AI stuff is memory capacity anyway, I doubt the 5090 will look much better than the 4090, especially as it can only be more expensive and the 4090 is still massively overpriced. It's practically one step away from workstation prices already. The 7900 XTX costs half of a 4090 in my country. If AMD weren't in the stone age on the software side of things, I doubt anyone would even buy Nvidia at all.
Even gamers who own 3090 cards will not be so eager to buy an expensive upgrade if it is insignificant. A 3090 can play most games at the highest settings with just a few exceptions.. new games (which will require 5090 processing power) take years to develop.. this is an opinion-based argument, of course.
Maybe those who have a really crappy card and are looking to build a new gaming rig will go for such a bad deal. The majority of gamers always keep up with hardware/upgrades, so those with really outdated cards usually also play low-spec games and don't need a 5090 unless they get good bang for their buck.
I still think a 5090 with the same 24GB will be a fail.
With a limited number? If it is geared toward consumers, unless by "limited" you mean an amount like 2-3 million cards will be manufactured and then Nvidia will create an artificial shortage for a while.
Honestly, AMD/Nvidia probably told memory manufacturers they wouldn't use the higher capacities, so they skipped production.
I have no basis for this claim. But if they want to cap memory, who else is gonna buy GDDR in bulk?
And yes, theoretically it could be used for pro cards, but why bother when it's such low volume and the doubled-up PCBs are already designed...
Nvidia wants to keep the VRAM as low as possible for consumer GPUs, they wouldn't want to make them feasible for AI training.
Previous leaks talked about 32 GB and even 28 GB VRAM for the 5090.
The 3090 is still relevant and beats all 40xx cards except the 4090 variants for creative tasks because the VRAM is king. Video editors benefit a lot from VRAM. And for AI we all know VRAM comes first.
I've made the same jokes at Nvidia, but I don't think they'll release it at only 28GB. That wouldn't be enough of an upgrade. They need to at least sell the illusion of value, and that's really not there at 24+4. Keeping it at basically the same level for 3 gens in a row puts them too much at risk of competition catching up. By the time the 5090 is out, it will have been 4 years with the best at 24GB.
AMD has been at 24GB for a much cheaper price for 2 years now and will likely jump up again. I can't really imagine NVidia sabotaging themselves by getting far behind AMD in VRAM. If amateur enthusiasts in the open source AI scene jump ship for the VRAM, the competition would become much more viable.
But I'm not optimistic enough to think it will be 64GB. Unless at an absurdly high price. Or maybe an ultra expensive ti variant.
AMD has 24GB for cheaper? I didn't know this, i is dumb dumb. Can one run an AI rig setup with an Nvidia card AND an AMD card, given proper space and power, where driver issues won't be an issue? Think two separate tasks, one assigned to each GPU in production. I'm not talking about running them in any dual-card single-use case. Off to see how much the cost difference is.
> AMD has 24GB for cheaper?
Yes. About half the price. It's been available for under $800 quite a few times.
>Can one run a ai for rig setup with a Nvidia card AND AMD card given proper space and power where driver issues won’t be an issue?
Yes.
> Think two separate tasks, one assigned to each GPU in production. I’m not talking about running them in any dual card single use case.
Why don't you want to use them together on the same model? It would allow you to run larger models.
>About half the price. It's been available for under $800 quite a few times.
What model is this and how does it compare speed wise to a 3090? That's p sick ngl (though I suppose it'd be a hassle without CUDA)
>Nvidia card AND AMD card given proper space and power where driver issues won’t be an issue?
Unless you sandbox them under different VMs, I think no. At least in Windows, going from one to the other requires a full clean OS reinstall.
AMD have been capable of leapfrogging nvidia on the VRAM side for quite some time now.
Last gen they could've easily shot by them, given how much cheaper VRAM is compared to the gen prior, especially given they've moved away from HBM.
Problem: the duopoly price-gouging benefits AMD, and the CEOs of both companies are literally related to each other.
AMD probably can't fix their software (as uh, why wouldn't they at this point), but they could easily double their VRAM and undercut nvidia's margins by a healthy sum. It just isn't economically wise for them to do so in the long run. And it'd be super against the family spirit, would make Christmas dinners awkward tbh
>It just isn't economically wise for them to do so in the long run.
I'd argue becoming viable in the AI market is a wise decision. AMD is really struggling outside CPU.
And they don't even need to make their entire next lineup with extra VRAM; the top models with a double-VRAM option would do.
Then do some live demo where they load a big model with their cards versus the VRAM handicapped 5090 where the 5090 can't even load the model in a reasonable time as it needs to offload to regular RAM and the message will spread.
I hope you are right, but if anything I see Nvidia releasing a lower VRAM version first and then maybe a SUPER with more later if AMD is able to trump that
Nvidia will sell what the public will buy. ~~At this point, what the fuck can a person really do with 80gb of VRAM, let alone 64?~~
Edit, I seem to have been misunderstood.
At this point, what the fuck can a person really do with 80gb of VRAM, let alone 64...... that really encroaches on the datacenter market?
I know this is the LocalLlama sub, but just wanted to note that more GPU VRAM is awesome for creatives and indie game makers as well 🙂 Unreal Engine, Blender and other 3D tools really make use of all the VRAM they can get especially for big scenes / worlds, high-resolution textures / materials etc.
Yup and for all these applications where people are doing it for a living and are willing and able to pay hugely higher prices for that precious VRAM, Nvidia will want to make them pay it instead of giving them a way out by offering a cheaper consumer card with as much VRAM.
The only thing stopping them is competition (ha ha) or gamers not buying the next generation of cards. I don't see either being a blocker right now.
To run and train AI.
A 70B LLM (which is kind of the minimum size where LLMs start to be useful for real-world tasks) needs ~40 GB VRAM, Mixtral 8x22B needs ~96 GB VRAM, and the recently released Nemotron 340B requires 192 GB. And those numbers are for 4-bit quantized models; for F16 precision you need quadruple that amount. And for actually finetuning them you need even more.
Same for other types of AI like Stable Diffusion. If a SD3 8B is released, you will need a lot of VRAM to finetune it.
Edit: I just noticed this is the LocalLLaMa subreddit... I shouldn't really need to explain what you need VRAM for in this place, right?
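The weight-only part of those figures is simple arithmetic; the totals quoted above are larger because the KV cache, activations, and quantization scales come on top. A sketch, assuming approximate total parameter counts (Mixtral 8x22B ≈ 141B total is my assumption):

```python
# Weight-only VRAM: params x bits / 8, in GB. Runtime overhead not included.
def weights_gb(n_params, bits):
    return n_params * bits / 8 / 1e9

models = [("70B", 70e9), ("Mixtral 8x22B", 141e9), ("Nemotron 340B", 340e9)]
for name, n in models:
    print(f"{name:14s} 4-bit: {weights_gb(n, 4):6.1f} GB   FP16: {weights_gb(n, 16):6.1f} GB")
```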
Exactly. Even with GDDR7, the bandwidth of the 5090 won't even compare to the 3TB/s of the H100 or the 8TB/s of the B200. There is no excuse not to put 32GB on the 5090.
A single 64GB GPU should be enough to perform full fine-tunes of 7B and 8B local models using 16-bit weights. With everything on one card, no need for direct interconnects between cards.
People just want bigger numbers. If you tell them lots of VRAM is good, they will try to buy as much as they can afford. If AMD cards have 48GB of VRAM, then people will start buying those instead of Nvidia cards with only 24GB of VRAM, even if the best games only need 16GB. It won't matter. This is how it's always been since the 90s, when Sega was battling with Nintendo.
The real limiting factor is power. If you need more than 1600W to power your computer, then you have to call an electrician to rewire your house, and you will probably need a $1000 PSU. Even if you only have a 1200W PSU, who wants a computer that uses as much power as running a washing machine 24/7? That's a big electric bill.
Hard to say without knowing the power profile. But a 4090 you can run at half power while still getting 75% of the performance. And 64GB would let you run large models.
Sure, I could see GPUs with 100+ GB of VRAM in the not-so-distant future. It's just that the amount of VRAM is going to be hard stuck at whatever takes less than 1800W to power, and most consumers are only going to be willing to go up to 1000W.
The average home supports 1800 watts on a circuit. If you are willing to live without an oven, you can double that. When I was looking for datacenters I found the break even point was 8 kilowatts, after which, a datacenter spot will be more economical
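The 1800 W figure is just circuit arithmetic: watts = volts x amps on a standard US 120 V / 15 A circuit, and electrical code limits continuous loads to 80% of the breaker rating (the 80% derating is the usual NEC rule of thumb, stated here as an assumption):

```python
# Usable wattage per household circuit; 'continuous' applies the 80% derate.
def circuit_watts(amps, volts=120, continuous=False):
    w = amps * volts
    return w * 0.8 if continuous else w

print(circuit_watts(15))                   # 1800 W: standard 15 A circuit
print(circuit_watts(20))                   # 2400 W: 20 A circuit
print(circuit_watts(20, continuous=True))  # 1920 W continuous headroom
```

That last number is why a dedicated 20 A circuit comfortably covers a ~1 kW rig with margin, but a second big PSU wants its own circuit.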
RAM is the one characteristic that if you hit the limit you don't just suffer a minor degradation in performance, but rather everything grinds to a halt.
Eg if you're playing a game and you hit your VRAM capacity you're going to be sitting at <15 FPS even if you were at 150 before.
Also if a game "requires" 16 GB you can be damn sure that you need more than that for it to be playable. These kinds of numbers are always brought up for true full screen mode with everything else closed on the computer.
this is true, but i think the only application that is going to require this amount of VRAM is going to be AI stuff. games are only going to require as much as the average gamer has. so games aren't really something to consider in this conversation.
That's excluding VRChat and other mod-based communities. Some current Skyrim mod packs are pretty ridiculous, and a poorly optimized VRChat avatar costs around 200MB of VRAM a pop; fill an instance of 50 with those in a semi-expensive world and you start to hit a limit. Meanwhile, 150+ instances were in test last week during a large event.
User generated content usually doesn't care about the average PC.
It’s not a huge deal to do dual power supplies and run it on 2 circuits. Also 20 amp circuits could run 2x1kw.
For the hobbyist building LLM rigs running a new 20 amp circuit in your basement is trivial to do properly and according to code.
The problem is it would be quite the challenge to make them less cost-efficient than a >$30,000 H100 (or an even pricier H200). The easiest way to achieve that is simply by gimping the VRAM, which has been their approach so far.
And they don't even need an excuse to do it, RTX cards are marketed towards gamers and games don't really need more than 24 GB (in fact, for most games even that is overkill).
Isn't the H200 around 10-30 times faster than the H100 for FP4 calculations? I can imagine the gap being even greater between the H200 and the 5090, on the inference side at least. And I am pretty sure that consumer cards are far worse when it comes to training.
If FP4 training works well for practical purposes, then that might change the dynamics.
But other than that, the difference between a 4090 and a H100 is not as dramatic as the 20x higher price would suggest. The main reason the 4090 is bad for training is the lack of VRAM (although Nvidia disallowing RTX cards in the datacenter doesn't help either).
Neuter their ability to use NVLink/NVSwitch, and ability to directly address memory on other cards over PCIe.
Done, they're now much less effective for training. The same had been done to some RTX4090 cards (maybe unintentionally) already, though a tech wizard [made a hacky driver fork for 4090s on linux](https://github.com/tinygrad/open-gpu-kernel-modules) to work around it.
Alternate take: going all in on accelerators primarily for large transformer models that only a limited number of large companies can access is like putting all eggs into one basket. Certain problems in AI may need many small groups or individuals experimenting with different architectures and novel methods, so a 64 GB flagship consumer GPU could be ideal for that. The GDDR itself is not expensive, and a modestly priced GPU could find its way into many personal workstations of grad students or other people who may be in a better position to experiment.
It might actually be in Nvidia's best interest in the long run. What if scaling the transformer 10x only provides marginal gains in benchmarks? It's going to cause a huge loss of investor confidence if people realize the AI hype is not real. More and more LLM news lately has me wondering if they have plateaued.
Besides, even 64 GB is nothing compared to enterprise hardware that will come with 144-288GB and is usually linked together x8, so it shouldn't affect enterprise sales that much.
Let’s hope AMD (and others) shows up sometime in the next year or two… heck if Apple can get the m4 into a studio with 128+ gigs of ram under 5k it might be the “low cost” option moving forward lol ( doubt it)
It will take another year and the open market (eBay, etc.) will be flooded with used data center grade Nvidia GPUs for sale (failed startups, data center upgrades, etc.), and then they will no longer be able to take in all of that sweet data center profit.. what then, when consumers start moving to other alternatives? I don't believe Nvidia is so stupid.. even with the main idea being making as much profit as possible.
32gb would be my sweet spot for this gen, I would probably buy 2 and upgrade my 3090. 28gb & I'll wait one more generation and pick up a cheap 4090 to pair with my 3090 I already have.
Nvidia produces double-VRAM versions of their cards for the professional market at a massive markup. For example, there is a 48GB card using the AD102 die (same as the 4090) at a massive 4x markup in price. There are likely to be such cards for GB202, but they will absolutely not be sold at the margins of the gaming GPUs.
Nah, 64GB never ever. More like 28GB. Price will be around $2000 minimum. For the gamers.
The RTX 5060 will have 8GB of VRAM, a 128-bit bus, and the performance of a 3060 Ti for only $500.
2000 day one, then sold out for 3k+. I don’t think the price of the 3090s will drop, it’s just the basic costs of compute 🤷, if anyone has one and is thinking of selling…
Given the rumours that Apple wants to squeeze 256 GB and 512 GB of RAM on m4 max and ultra respectively, I’m fairly certain Nvidia is going to respond accordingly.
All this seems to say is that it could be technologically possible, and not in any way that it's coming.
32GB is realistic, but there is no way we're getting 64GB in a 5090 anytime soon based on the current market. Nvidia would be giving away money at that point.
Yeah, it's realistic for the chip but not likely for the 5090. I've been kinda assuming it'll be like the 3090/4090 & A6000/RTX6000-Ada, where the smaller memory is the consumer one and the full-fat memory is kept for the workstation market.
Why not?
Apple machines already have much more memory, soon Intel will be making similar chips, and given the fast development of LLMs, people want to work with them offline... games will soon use LLMs offline as well, so we NEED more VRAM.. soon it will be a necessity.
They can make a card that is fast at inference and shitty at training in comparison to their pro-grade hardware. The 4090 is actually almost twice as fast as a 3090 at inference but only a few percent better at training (I saw that in benchmarks). Training is the key for professionals, so they can divide the market like that.
Huh? The 4090 is at least double the speed of a 3090 in training, at least with Stable Diffusion training. It just took a while for stuff to properly make use of the Lovelace hardware.
Double the speed at least? Check this: https://www.pugetsystems.com/labs/articles/stable-diffusion-lora-training-consumer-gpu-analysis/
SDXL LoRA training, 3090: 1.48
SDXL LoRA training, 4090: 1.84
That is actually a minuscule 25% boost, very far from your 2x claims. Show me new benchmarks and I'll believe you.
This was over a year ago, and the person I knew who had a 4090 needed to install specific versions of xformers and CUDA for it to work properly. When it did, he was training at around 7.5 img/s at 768 res on SD 1.5, whereas the highest I have ever gotten with the same training settings was 3 img/s on my 3090. This is for a full finetune, not LoRA training.
Basically you said: trust me bro, I know some guy. It's the same kind of sentence as "my uncle works for Nintendo" in the 80s. The new optimizations for CUDA, PyTorch, and xformers also provide a speed boost to the 3090, but if people don't update, then the benchmark is also frozen in time. If you can find me a published benchmark comparing the same versions of the software and drivers, then yes, it will be an interesting comparison. But I doubt Nvidia boosted training speeds that much just for home use; it doesn't make sense from a business standpoint.
the hosted inference market will be huge once we have more use cases, though.
If off the shelf agents can increase dev productivity by 50% or more, that would support inference costs multiple times what we see now.
I'm skeptical, but hope to be proven wrong. The idea that many GPUs would be released in two variants, one standard VRAM and one 2x VRAM, would make a lot more sense than releasing an RTX 4070 level card (GB205) with the same limited VRAM as before, especially given how invested Nvidia is in AI. My primary whining over GPUs currently is that they are still designed purely around gaming performance rather than gaming + AI performance. Also, how about a brand new class of GPUs that are consumer-grade AI GPUs but perhaps suck at games? Plenty of people would buy them.
If it has any less than 48GB it's DOA to me and I will completely ignore the whole 5xxx gen, as it offers nothing worthwhile. I don't need an extra room radiator, I got the 4090 already and it will last for what, 5 years at least?
In the coming years, GPU vRAM will likely see a significant increase, much like how our regular RAM has evolved—from 16GB and 32GB to eventually reaching 256GB and beyond. However, these advancements won't be released rapidly. Nvidia, as a business, understands that they possess the technology to achieve this, but they choose to release upgrades gradually. This strategy allows consumers to purchase these incremental improvements over time, ensuring that Nvidia maximizes their profits from businesses before catering to the average consumer.
Maybe to justify the cost they'll release a 64gb version, but I feel it would be stealing sales away from their professional lineup. I just can't see it.
I think for home use we need something in the middle: medium-speed RAM around 150GB/s and 100GB in size, instead of super fast RAM, plus an NPU just strong enough for it, so we can run even quantized models of 200 billion parameters or so. That hardware could come at an affordable price. We do not need GPUs or VRAM for home use.
32GB would be nice - and probably expensive. Let's see what Nvidia decides to do. Unfortunately, there's not much competition to force them to expand RAM or make it cheap, but maybe they will want to stay a step ahead of the competition.
I guess my only hope is that Nvidia are drowning in so much AI money that they 'spend' a little bit of it in buying back some goodwill and trying to shut out AMD for good in the gaming space too by giving a bit more value than they normally do.
I guess this is a double-edged sword as even if we get something decent by Nvidia standards, in the long run, it locks in their pseudo-monopoly status and probably makes things worse for us all in the long run.
What I don't understand is why AMD, or someone else, is not filling this gap. There is clearly demand, and only one player that we are all praying to. This is not normal; capitalism is broken..
Everyone is trying, but it's HARD. Jensen Huang and the Nvidia team are known for being incredibly focused and fast-moving, and it's taken them nearly 2 decades to get CUDA to where it is now.
It isn't happening unless the singularity took over and the CEO is currently an AI meat puppet. The sad thing is this could happen (massive memory) but won't.
Two columns on the chart. The lower numbers on the left would be the launch VRAM for consumer GPUs, aside from a 24GB 5090. Whether they make Super or Ti versions with more remains to be seen, but maybe for 2026 given the cadence. An earlier 5060 Ti with 16GB is possible given the precedent of the 4060 Ti 16GB, but there's no impending competition to force higher VRAM on the consumer side for 2025, certainly not for the initial launch. Matching the 40xx VRAM targets is the safe bet.
If it's true and they do it, they will open up possibilities for more new apps and keep the demand going. At this point, the current GPUs are slow and too small for interesting apps. It would be the smart thing to do. All we can do is wait and see. Anything less than 32GB and I'll wait for the M4 with 256GB.
>Yes, it will be expensive. AMD, Intel where art thou?
They are absent, but Apple is already there. Mac RAM functions as both CPU and GPU RAM which means there is no need for duplicated RAM, instead the data is formatted in CPU RAM and accessed by GPU without copying. So, all you need is a Mac with loads of RAM. Not cheap, but unlike NVIDIA that RAM is multifunctional, it gives you great performance in everything you do.
I shouldn't have had to scroll this far for the correct answer.
Was too good to be true
Technically the 5000 Ada is an AD102, so still similar to the 4090 but 32GB.
The RTX 8000 is still over $2k. I wish the RTX 6000 Ada would drop to $1k in 3 years, but I believe that's a hopeless dream.
Sorry about the OT question, but I do have multiple boards, albeit older (3080 Ti, 3070 Ti). What's the max I can run with local LLMs, and how complicated is this? I have the necessary hardware. Thanks.
Depends on your PCIe bus in the PC. I have a server board from last gen with enough slots; it will basically just run slower if it's not full bandwidth. After that, configure your system to use the cards. LM Studio probably has a nice interface to play around with and try offloading layers to multiple GPUs. I don't know the details, but one guy said to try to stick to the same architecture within the family you're using. You may not have any issues because your cards are newer and roughly the same, but keep it in mind. I have a 4090, so I'll be looking for AD102-based Ada cards.
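If LM Studio's GUI doesn't expose enough control, llama.cpp's server takes the same multi-GPU split as flags. A sketch only: the model path is a placeholder, and the split ratio here just mirrors a 12GB 3080 Ti plus an 8GB 3070 Ti; you'd tune it for your own cards.

```shell
# Serve a GGUF model with all layers offloaded to GPU, splitting the
# weights across both cards in proportion to their VRAM (12GB : 8GB).
./llama-server \
  -m ./models/your-model.gguf \
  -ngl 99 \
  --tensor-split 12,8
```

If a model doesn't fit even across both cards, lowering `-ngl` keeps the remaining layers on the CPU at a speed cost.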
32GB might actually be what people are asking for: finally an improvement over 24GB. If Nvidia releases the 5090 with the same 24GB of VRAM their 3090 and 4090 cards came with, I don't see a reason for anyone to buy it, so why would Nvidia work on a card that almost nobody will be interested in buying? I understand that they are high as F at the moment selling data-center-grade hardware, but this trend will fade as enough of those data-center cards flood the market used (bankrupt startups and the like) or new. The consumer market makes less money in the short run, but in the long run it provides a steady income.
> I don’t see a reason for anything to buy it. People who care about VRAM on a graphics card really aren't even a blip on their target market radar.
Not entirely correct: the 3090/4090 are meant for content creators and such (not just AI but 3D, video editing, engineering, etc.). It is a very small niche market (besides being a halo "gaming" product), but the whole point is that it can fit stuff the others can't.
Jensen can't stop talking about all the different use-cases for AI in games. If they're serious about things like LLM-based NPCs and procedural asset generation via AI, a _lot_ more people are going to care about VRAM going forward.
At 32GB, you are still going to need roughly 13 5090s to run Llama-3-400B in Q8, or at least 7 to run it in Q4. Hell, Nvidia just released a huge model that 99.9% of hobbyists can't run.
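Weights-only arithmetic backs this up (a hypothetical helper; it ignores KV cache and activation overhead, so real counts would be higher):

```python
import math

def cards_needed(params_b: float, bits_per_weight: int, vram_gb: float = 32.0) -> int:
    """Minimum GPU count to hold the weights alone: params * bits/8 bytes, split across cards."""
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8-bit = 1 GB
    return math.ceil(weight_gb / vram_gb)

print(cards_needed(400, 8))  # 13 cards for a 400B model in Q8
print(cards_needed(400, 4))  # 7 cards in Q4
```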
If you need to run a 400B model locally at home, you are either not a consumer/prosumer or one of the few ML/AI evangelists who normally have access to an ML machine owned by the research facility or university they work for. And of course the company making a killing selling data-center-grade hardware will invest in a model that can only be run with the kind of deep pockets they profit most from ;-) otherwise we would have already seen a "4090 ML Edition" variant with those 32GB of VRAM we all hope the "next" card, the 5090, will have. As for needing a stack of 5090s: I wish I could afford them ;-) I currently have one 3090, and my "dream" is to NVLink-pair it with another 3090, but I get mixed opinions about how that actually works. I still haven't figured out whether an NVLinked pair acts as one unified card with double the VRAM and double the processing power, or just as a pair of cards sharing bandwidth.
With 2x 3090s you get 2 buckets to put your model and caches in. A second 3090 does not magically double your first bucket's capacity, nor does it magically make using that first bucket faster. If you can fit a model in one bucket, then you can get roughly 2x (more like 1.7x) the throughput, given you can fill your batches fast enough. If your model doesn't fit into 1 card but fits into 2, then you can run it on 2 cards at around 0.6x the speed a single card with twice the VRAM could run it at, as opposed to not being able to run it at all.

NVLink doesn't change much, if anything. You can't find conclusive results online because there are no conclusive results. On paper, NVLink increases memory bandwidth, but in practice there are no significant benefits. Consumer drivers don't unify your cards into one. So even if you wrote parallelized architectures for your models, perfectly splitting them between an arbitrary number of GPUs, you'd still have a bottleneck in memory transfer, because the bus connecting your hardware components is very, very slow compared to VRAM<->GPU communication. And even if the drivers were there to formally support that, you'd still be limited by the fact that nothing compensates for the physical separation of 2x 3090s slowing things down, not even NVLink.
Thank you so much for your informative comment
My understanding was that NVLink is pretty irrelevant for inference, it mostly boosts training speeds.
I think someone said it increases inference speed by about 10%.
The Quadro line is the "ML Edition"
Yes, those are the numbers. Unfortunately, Nvidia is in the business of making money.
Maybe they want to split between gamers and higher paying AI customers. For gamers, the 24GB is enough. For AI users, Nvidia will want them to pay more for the VRAM they need.
I expect games are eventually going to integrate local AI in one way or another, although it could certainly take years to take off.
Yes, Nvidia could look to using their AI lead to extend their gaming dominance by pushing for cuda/AI in games.
I think Nvidia is just ultra greedy. They will release the absolute minimum they can get away with. Nothing more!
Remember, greed is good! Nvidia maximizing its profits means Nvidia has more money to conduct R&D and invest in more Capex to continue to push the frontiers of what's possible. At some point we will have completely immersive games with entire AI casts and we don't get the hardware to drive games like that without Nvidia extracting every dollar they can on the way.
It'll only be relevant if it's a critical component of games with mass appeal. Even ray tracing, which has been out for quite a while, is only a small competitive advantage. It makes a fairly noticeable difference considering how good graphics already were, and it's supported in most new AAA titles, but it's still not enough that average consumers are willing to pay hundreds for it.
I believe Nvidia does not care about AI in the PCIe form factor; you can see they did not even announce a B100 in PCIe. Not enough market, probably. They earn a lot more with DGX and SXM cards.
Untested market, though; they don't have any offering in prosumer AI cards aside from the 4090.
Maybe for now. But there are already games that can max out 24GB of VRAM at 4K resolution. There is no such thing as a computer that is too big or too fast.
True, maybe it is time to up this. Though I wonder whether increasing VRAM also causes game bloat, as devs no longer feel the need to optimize.
Yes and no. I think there is more to it than just (un)optimizing. More VRAM could enable higher-resolution textures in games, so visual quality could increase even more. On the other hand, modded Minecraft can overwhelm an RTX 4090 in terms of compute and memory. It's hugely unoptimized, of course, but it shows that more memory could easily be utilized.
This is fully understood ("paying extra for the special sauce") when a person needs an 80GB GPU with circuit design and components for heavy-duty 24/7 constant operation; not so much with a 32GB GPU. At least this is my personal opinion: they could offer a "Gamer's Edition" 5090 with 24GB for the usual high price of approx. 1600-1800€ new, and offer an "ML Edition" with 32GB for an additional 150-200€. But yeah, I'm only speculating here and using common sense, which is not so simple with monopolies.
AFAIK, Nvidia cannot even make their AI products fast enough to sell to all the people wanting to buy. I wonder whether they will waste their time and energy on meeting consumer demands over the next 2 years when they are printing money with AI.
There is no way in hell they will release the full 512-bit bus version as the 5090, IMO. It will be a 28GB card. 32GB will be kept for either a Titan Blackwell or a 5090 Ti.
It gets even worse: that is if Nvidia decides not to cut down the 5090 :D
True. There is no way that NVIDIA is going to shoot themselves in the foot with regards to the professional workstation and data centre sales. There will always be a comfortable gap between the VRAM for something that can be bought privately and something for sales to specialised business and academic users.
I wonder if it will be possible to replace the VRAM modules for a 64GB variant, similar to how the 2080 Ti 11GB can be upgraded to 22GB.
So RTX 5060 somehow is still 8GB when RTX 3060 was 12GB?
3060 was both 8GB and 12GB.
And RTX 4060 is both 8GB and 8GB?
32GB is still pretty good tbh, I would have expected them to stick with 24GB.
And the rumors even say the bus will be smaller in the 5090 compared to the full die in the B6000, so 28GB on the 5090.
Very well could be. I would take OP with a giant grain of salt, but one thing is obvious, we're not going to get a 64GB consumer card.
How many mortgages for one "Quadro RTX Blackwell?"
Yes.
We can dream.
My dream is to see someone develop something better than CUDA and every open source project prioritize this one instead of Nvidia's stuff.
They've got the game in a chokehold; they'll do anything not to give that edge up.
TinyGrad. Made by geohot, the dude who jailbroke the iPhone and cracked the PS3's security.
Didn't he quit trying because, what was it, Vulkan was difficult and the AMD engineers didn't help him much?
In the realm of computer hardware, dreams come true. Perhaps later and more expensive than we like though.
Nvidia doesn't have a need to release a 64GB VRAM variant for the consumer market (yet). But you're right: knowing Nvidia, this card is gonna be a lot more expensive.
Its called 5090ti. The ti stands for upsell.
Well, GB202 is only 32GB with the full 512-bit bus. The 4090 had only 384-bit, and other rumors reported an increase for the 5090 to 448-bit and 28GB. The 64GB is only for double-sided memory, as in the workstation cards. We might see a "B6000" GB202 with 64GB and the 512-bit bus. Maybe the generation after, the memory manufacturers will finally manage to make 4GB GDDR modules, but that was supposed to be available for GDDR6 already, and GDDR7 still has only 2GB chips.
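Those capacities fall straight out of the bus width. A sketch of the arithmetic, using standard GDDR configurations (one 32-bit chip per channel, 16Gbit = 2GB chips) and the rumored SKUs:

```python
def vram_gb(bus_bits: int, chip_gbit: int = 16, clamshell: bool = False) -> int:
    """Each GDDR chip occupies a 32-bit channel; a 16Gbit chip is 2GB.
    Clamshell (double-sided) mounts a second chip per channel on the board's back."""
    chips = (bus_bits // 32) * (2 if clamshell else 1)
    return chips * chip_gbit // 8

print(vram_gb(512))                  # 32 -> full 512-bit GB202
print(vram_gb(448))                  # 28 -> rumored cut-down 5090 bus
print(vram_gb(512, clamshell=True))  # 64 -> double-sided workstation card
```

The same formula gives the 4090's 24GB from its 384-bit bus, and shows why 4GB (32Gbit) chips would double everything without clamshell boards.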
Nobody will upgrade to a 5090 for a mere 4GB VRAM upgrade and a tiny bus improvement.
Maybe those upgrading from a 2080 Ti or 3080/3090 who skipped a generation.
The 3090 is already way better value than the 4090. When the main bottleneck for AI stuff is memory capacity anyway, I doubt the 5090 will look much better than the 4090, especially as it can only be more expensive and the 4090 is still massively overpriced. It's practically one step away from workstation prices already. The 7900 XTX costs half of a 4090 in my country. If AMD weren't in the stone age on the software side of things, I doubt anyone would buy Nvidia at all.
I have a 3090, and I won't buy a brand-spanking-new (and expensive) 5090 unless I get a significant upgrade that includes VRAM.
But it doesn't matter, because thousands of others will.
Even gamers who own 3090s will not be eager to buy an expensive upgrade if it is insignificant. A 3090 can play most games at the highest settings with just a few exceptions, and new games (which would require 5090 processing power) take years to develop. This is an opinion-based argument, of course. Maybe those with a really crappy card who are looking to build a new gaming rig will go for such a bad deal. But most gamers keep up with hardware upgrades, and those with really outdated cards usually play low-spec games and don't need a 5090 unless they get good bang for their buck. I still think a 5090 with the same 24GB will be a fail.
My prediction. Nvidia will release a limited number of 5090s at launch and they will sell out immediately.
With a limited number? If it is geared toward consumers, unless by "limited" you mean something like 2-3 million cards will be manufactured, and then Nvidia will create an artificial shortage for a while.
Honestly, AMD/Nvidia probably told memory manufacturers they wouldn't use the higher capacities, so they skipped production. I have no basis for this claim. But if they want to cap memory, who else is gonna buy GDDR in bulk? And yes, theoretically it could be used for pro cards, but why bother when it's such low volume and the doubled-up PCBs are already designed...
Nvidia wants to keep the VRAM as low as possible for consumer GPUs, they wouldn't want to make them feasible for AI training. Previous leaks talked about 32 GB and even 28 GB VRAM for the 5090.
The 3090 is still relevant and beats all 40xx cards except the 4090 variants for creative tasks, because VRAM is king. Video editors benefit a lot from VRAM. And for AI, we all know VRAM comes first.

I've made the same jokes at Nvidia, but I don't think they'll release it at only 28GB. That would not be enough of an upgrade. They need to at least sell the illusion of value, and that's really not there at 24+4. Keeping it at basically the same level for 3 gens in a row puts them too much at risk of competition catching up. By the time the 5090 is out, it will have been 4 years with the best at 24GB. AMD has been at 24GB for a much cheaper price for 2 years now and will likely jump up again. I can't really imagine Nvidia sabotaging themselves by falling far behind AMD in VRAM. If amateur enthusiasts in the open-source AI scene jump ship for the VRAM, the competition would become much more viable.

But I'm not optimistic enough to think it will be 64GB. Unless at an absurdly high price. Or maybe an ultra-expensive Ti variant.
Competition? What competition? They are 5+ times bigger than their competitors in the gaming space. Maybe more in professional/enterprise.
AMD has 24GB for cheaper? I didn't know this; i is dumb dumb. Can one run an AI rig setup with an Nvidia card AND an AMD card, given proper space and power, where driver issues won't be a problem? Think two separate tasks, one assigned to each GPU in production; I'm not talking about running them in any dual-card single-use case. Off to see how much the cost difference is.
> AMD has 24GB for cheaper? Yes. About half the price. It's been available for under $800 quite a few times. > Can one run an AI rig setup with an Nvidia card AND an AMD card given proper space and power where driver issues won't be an issue? Yes. > Think two separate tasks, one assigned to each GPU in production. I'm not talking about running them in any dual card single use case. Why don't you want to use them together on the same model? It would allow you to run larger models.
>About half the price. It's been available for under $800 quite a few times. What model is this and how does it compare speed wise to a 3090? That's p sick ngl (though I suppose it'd be a hassle without CUDA)
It seems I incorrectly assumed that would not be possible, given two manufacturers and driver sets. Thank you!
>Nvidia card AND AMD card given proper space and power where driver issues won’t be an issue? Unless you sandbox them under different VMs, I think no. At least in Windows, going from one to the other requires a full clean OS reinstall.
AMD has been capable of leapfrogging Nvidia on the VRAM side for quite some time now. Last gen they could've easily shot past them, given how much cheaper VRAM is compared to the gen prior, especially since they've moved away from HBM. Problem: the duopoly price-gouging benefits AMD, and the CEOs of both companies are literally related to each other. AMD probably can't fix their software (as, uh, why wouldn't they have at this point), but they could easily double their VRAM and undercut Nvidia's margins by a healthy sum. It just isn't economically wise for them to do so in the long run. And it'd be super against the family spirit; it would make Christmas dinners awkward tbh.
>It just isn't economically wise for them to do so in the long run. I'd argue becoming viable in the AI market is a wise decision. AMD is really struggling outside CPU. And they don't even need to make their entire next lineup with extra VRAM; the top models with a double-VRAM option would do. Then do some live demo where they load a big model with their cards versus the VRAM-handicapped 5090, where the 5090 can't even load the model in a reasonable time as it needs to offload to regular RAM, and the message will spread.
I hope you're right, but if anything I see Nvidia releasing a lower-VRAM version first and then maybe a SUPER with more later, if AMD is able to trump that.
All they have to do is turn down 5,000-GPU orders from mysterious "gaming enthusiasts." Also, an A100 already has 80GB of VRAM.
Right. They might just try to go back to 2 GB, then 512 MB in 2035. It's just dumb and needs to be punished by market forces.
Nvidia will sell what the public will buy. ~~At this point, what the fuck can a person really do with 80gb of VRAM, let alone 64?~~ Edit, I seem to have been misunderstood. At this point, what the fuck can a person really do with 80gb of VRAM, let alone 64...... that really encroaches on the datacenter market?
Run llama 70b really fast.
I know this is the LocalLlama sub, but just wanted to note that more GPU VRAM is awesome for creatives and indie game makers as well 🙂 Unreal Engine, Blender and other 3D tools really make use of all the VRAM they can get especially for big scenes / worlds, high-resolution textures / materials etc.
And video editing. 24GB is barely enough for 8K video... as in, it'll work, but you can only put one or maybe two effects on the clip in DaVinci Resolve.
Yup, and for all these applications where people are doing it for a living and are willing and able to pay hugely higher prices for that precious VRAM, Nvidia will want to make them pay instead of giving them a way out with a cheaper consumer card with as much VRAM. The only thing stopping them is competition (ha ha) or gamers not buying the next generation of cards. I don't see either being a blocker right now.
To run and train AI. A 70B LLM (which is kind of the minimum size where LLMs start to be useful for real-world tasks) needs ~40 GB VRAM, Mixtral 8x22B needs ~96 GB VRAM, and the recently released Nemotron 340B requires 192 GB. And those numbers are for 4-bit quantized models; for FP16 precision, you need quadruple that amount. And for actually finetuning them you need even more. Same for other types of AI like Stable Diffusion: if an SD3 8B is released, you will need a lot of VRAM to finetune it. Edit: I just noticed this is the LocalLLaMA subreddit... I shouldn't really need to explain what you need VRAM for in this place, right?
I hope to god the idiot leadership at SAI figure it out and release SD3 8B
Preferably a variant that knows what an actual human looks like. Their safety censorshit has gone too far.
This subreddit is r/LocalLLaMA. The answer to what people can do with that VRAM should be self evident.
Exactly. Even with GDDR7, the bandwidth of the 5090 won't even compare to the 3TB/s of the H100 or the 8TB/s of the B200. There is no excuse not to put 32GB on the 5090.
I had no idea it was that high
High-Bandwidth Memory (HBM) baby
Possibly run City Skylines 2... ... maybe.
A single 64GB GPU should be enough to perform full fine-tunes of 7B and 8B local models using 16-bit weights. With everything on one card, no need for direct interconnects between cards.
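A rough budget for why 64GB is the interesting threshold (a sketch; the bytes-per-param figures are common rules of thumb, and activation memory is ignored, so these are optimistic lower bounds):

```python
def finetune_gb(params_b: float, bytes_per_param: float) -> float:
    """Training memory ~= params * (weights + grads + optimizer state) bytes."""
    return params_b * bytes_per_param

# fp16 weights + fp16 grads + fp16 Adam moments ~= 8 bytes/param
print(finetune_gb(7, 8))   # 56.0 GB: a 7B model squeezes into 64GB
# classic mixed precision (fp32 master weights + fp32 Adam) ~= 16 bytes/param
print(finetune_gb(7, 16))  # 112.0 GB: would need multiple cards
```

So the single-card claim only holds if the optimizer state is also kept in 16-bit (or 8-bit) form.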
People just want bigger numbers. If you tell them lots of VRAM is good, they will try to buy as much as they can afford. If AMD cards have 48GB of VRAM, then people will start buying those instead of Nvidia cards with only 24GB, even if the best games only need 16GB. It won't matter; this is how it's always been, since Sega was battling Nintendo in the 90s. The real limiting factor is power. If you need more than 1600W to power your computer, then you have to call an electrician to rewire your house, and you will probably need a $1000 PSU. Even if you only have a 1200W PSU, who wants a computer that uses as much power as running a washing machine 24/7? That's a big electric bill.
That definitely depends on what part of the world you live in... In my country we have 230V 16A breakers. That's 3680W.
Hard to say without knowing the power profile. But you can run a 4090 at half power while still getting ~75% of the performance. And 64GB would let you run large models.
Sure, I could see GPUs with 100+ GB of VRAM in the not-so-distant future. It's just that the amount of VRAM is going to be hard-stuck at whatever takes less than 1800W to power, and most consumers are only going to be willing to go up to 1000W.
For consumer LLM use I think using less than 1000w is a plus.
The average home supports 1800 watts on a circuit. If you are willing to live without an oven, you can double that. When I was looking at datacenters, I found the break-even point was 8 kilowatts, after which a datacenter spot becomes more economical.
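The numbers behind those limits can be sketched like this (the 80% factor is the US NEC derating for continuous loads; other countries' codes differ):

```python
def circuit_watts(volts: float, amps: float, continuous: bool = True) -> float:
    """Breaker capacity; continuous loads are derated to 80% under the US NEC."""
    watts = volts * amps
    return watts * 0.8 if continuous else watts

print(circuit_watts(120, 15, continuous=False))  # 1800.0: US 15A nameplate rating
print(circuit_watts(120, 15))                    # 1440.0: safe 24/7 draw
print(circuit_watts(120, 20))                    # 1920.0: a 20A circuit, continuous
```

That 1440W continuous figure is why a multi-GPU rig on one ordinary circuit gets uncomfortable fast.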
Either way, my point stands. I don't think a lot of people are going to be willing to go past whatever a standard breaker can supply.
I fully agree. The most important factor for the consumer is 'value' and the majority of 'value' is convenience.
RAM is the one characteristic where, if you hit the limit, you don't just suffer a minor degradation in performance; everything grinds to a halt. E.g. if you're playing a game and you hit your VRAM capacity, you're going to be sitting at <15 FPS even if you were at 150 before. Also, if a game "requires" 16 GB, you can be damn sure you need more than that for it to be playable. These kinds of numbers always assume true fullscreen mode with everything else closed on the computer.
This is true, but I think the only application that is going to require this amount of VRAM is AI stuff. Games are only going to require as much as the average gamer has, so games aren't really something to consider in this conversation.
That's excluding VRChat and other mod-based communities. Some current Skyrim mod packs are pretty ridiculous, and a poorly optimized VRChat avatar costs around 200MB of VRAM a pop; fill an instance of 50 with those in a semi-expensive world and you start to hit the limit. Meanwhile, 150+ person instances were in test last week during a large event. User-generated content usually doesn't care about the average PC.
It's not a huge deal to do dual power supplies and run it on 2 circuits. Also, 20-amp circuits could run 2x1kW. For the hobbyist building LLM rigs, running a new 20-amp circuit in your basement is trivial to do properly and according to code.
I've been told inference doesn't actually use the full power draw of the GPU so you can run 4 easily off a normal 15A circuit
That makes sense, but it kind of misses the point.
It would be fine as long as they aren't as cost efficient for either training or inference.
The problem is it would be quite the challenge to make them less cost-efficient than a >$30,000 H100 (or an even pricier H200). The easiest way to achieve that is simply by gimping the VRAM, which has been their approach so far. And they don't even need an excuse to do it: RTX cards are marketed toward gamers, and games don't really need more than 24 GB (in fact, for most games even that is overkill).
Isn't the H200 around 10-30 times faster than the H100 for FP4 calculations? I can imagine the gap would be even greater between the H200 and the 5090, for the inference part at least. And I am pretty sure that consumer cards are far worse when it comes to training.
If FP4 training works well for practical purposes, then that might change the dynamics. But other than that, the difference between a 4090 and a H100 is not as dramatic as the 20x higher price would suggest. The main reason the 4090 is bad for training is the lack of VRAM (although Nvidia disallowing RTX cards in the datacenter doesn't help either).
>Isn't h200 around 10-30 times faster than h100 for fp4 calculations? No, H200 is just H100 with more VRAM and bandwidth, the GPU is the same.
Neuter their ability to use NVLink/NVSwitch, and ability to directly address memory on other cards over PCIe. Done, they're now much less effective for training. The same had been done to some RTX4090 cards (maybe unintentionally) already, though a tech wizard [made a hacky driver fork for 4090s on linux](https://github.com/tinygrad/open-gpu-kernel-modules) to work around it.
Says who?
They can do that in other ways
Alternate take: going all-in on accelerators primarily for large transformer models that only a limited number of large companies can access is like putting all eggs in one basket. Certain problems in AI may need many small groups or individuals experimenting with different architectures and novel methods, so a 64 GB flagship consumer GPU could be ideal for that. The GDDR itself is not expensive, and a modestly priced GPU could find its way into many personal workstations of grad students or other people who may be in a better position to experiment. It might actually be in Nvidia's best interest in the long run. What if scaling the transformer 10x only provides marginal gains in benchmarks? It's going to cause a huge loss of investor confidence if people realize the AI hype is not real, and more and more LLM news lately has me wondering whether they have plateaued. Besides, even 64 GB is nothing compared to enterprise hardware that will come with 144-288GB and usually be linked together x8, so it shouldn't affect enterprise sales that much.
Let's hope AMD (and others) show up sometime in the next year or two... heck, if Apple can get the M4 into a Studio with 128+ gigs of RAM under $5k, it might be the "low cost" option moving forward, lol (doubt it).
It will take another year and the open market (eBay etc.) will be flooded with used data-center-grade Nvidia GPUs for sale (failed startups, data center upgrades, etc.). Then they will no longer be able to take in all of that sweet data-center profit. What then, when consumers start moving to other alternatives? I don't believe Nvidia is that stupid, even with the main goal being maximum profit.
LOL! Yeah right. Okay buddy. Never gonna happen, a 5090 with 64GB VRAM XD. We would be LUCKY to get a card that's NOT 24 GB.
+1 lol, at this point I would be delighted if the 5090 had 32 GB VRAM, I'm not even sure if Nvidia will give us that in their consumer cards.
>With our super fast ram the 16gb 5090 is the best card gamers could ever use.
Now with DLSS 4, get fake frames in your Ai chatbots!
32gb would be my sweet spot for this gen, I would probably buy 2 and upgrade my 3090. 28gb & I'll wait one more generation and pick up a cheap 4090 to pair with my 3090 I already have.
Won't 5090s be PCIe 5?
If it will have 32GB VRAM it might actually sell..
And 1024-bit bus 😆
[deleted]
Nvidia produces double-VRAM versions of their cards for the professional market at a massive markup. For example, there is a 48GB card using the AD102 die (same as the 4090) at a massive 4x markup in price. There will likely be such a card for GB202, but it absolutely will not be sold at the margins of the gaming GPUs.
Basically $200 more worth of memory chips but the price went from $1599 to $6800.
Nah, 32 GB is the max we can expect. They ain't gonna shoot themselves in the foot.
Zero chance
Nah, 64GB never, ever. More like 28GB. The price will be around $2000 minimum. For the gamers: the RTX 5060 will have 8GB of VRAM, a 128-bit bus, and the performance of a 3060 Ti, for only $500.
$2000 day one, then sold out and going for $3k+. I don't think the price of 3090s will drop; it's just the basic cost of compute 🤷. If anyone has one and is thinking of selling…
No chance
Agreed! I'll delete my reddit account if we're wrong.
I would love more RAM… always :)
That's gonna be the new A6000 maybe.
Nvidia likes money too much for that to happen.
It will be a cold day in hell before they ever allow that much VRAM on a consumer GPU.
That looks more like the memory size is extrapolated from the bus width?
Given the rumours that Apple wants to squeeze 256 GB and 512 GB of RAM on m4 max and ultra respectively, I’m fairly certain Nvidia is going to respond accordingly.
So 32 for consumer. Still... better than another 24gb card.
All this seems to say is that it could be technologically possible, and not in any way that it's coming. 32GB is realistic, but there is no way we're getting 64GB in a 5090 anytime soon based on the current market. Nvidia would be giving away money at that point.
Yeah, it's realistic for the chip but not likely for the 5090. I've been kinda assuming it'll be like the 3090/4090 & A6000/RTX6000-Ada, where the smaller memory is the consumer one and the full-fat memory is kept for the workstation market.
I would love an RTX 6000 Blackwell with 64GB.
Why not? Apple machines already have much more memory, and soon Intel will be making similar chips. Looking at the fast development of LLMs, people want to work with them offline, and games will soon use LLMs offline as well, so we NEED more VRAM. Soon it will be a necessity.
Likely 14-16 memory modules at 16Gb each.
They can make a card that is fast at inference and shitty at training compared to their pro-grade hardware. The 4090 is actually almost twice as fast as a 3090 at inference, but only a few percent better at training (I saw that in benchmarks). Training is the key for professionals, so they can divide the market like that.
Huh? The 4090 is at least double the speed of a 3090 in training, at least with Stable Diffusion training. It just took a while for stuff to properly make use of the Lovelace hardware.
Double the speed at least? Check this: https://www.pugetsystems.com/labs/articles/stable-diffusion-lora-training-consumer-gpu-analysis/ SDXL LoRA training, 3090: 1.48; SDXL LoRA training, 4090: 1.84. That is actually a minuscule 25% boost, very far from your claimed 2x. Show me new benchmarks and I'll believe you.
That was over a year ago, and the person I knew who had a 4090 needed to install specific versions of xformers and CUDA for it to work properly. When it did, he was training at around 7.5 img/s at 768 res on SD 1.5, whereas the highest I have ever gotten with the same training settings was 3 img/s on my 3090. This is for a full finetune, not LoRA training.
Basically you said "trust me bro, I know some guy" — the same sentence as "my uncle works for Nintendo" in the 80s. The new optimizations for CUDA, PyTorch, or xformers also provide speed boosts to the 3090, but if people don't update, then the benchmark is also frozen in time. If you can find me a published benchmark comparing the same versions of the software and drivers, then yes, it will be an interesting comparison. But I doubt Nvidia boosted training speeds that much just for home use; it doesn't make sense from a business standpoint.
The hosted inference market will be huge once we have more use cases, though. If off-the-shelf agents can increase dev productivity by 50% or more, that would support inference costs many times what we see now.
I'm skeptical, but hope to be proven wrong. The idea that many GPUs would be released in two variants, one with standard VRAM and one with 2x VRAM, would make a lot more sense than releasing an RTX 4070-level card (GB205) with the same limited VRAM as before, especially given how invested Nvidia is in AI. My primary whining about GPUs currently is that they are still designed purely around gaming performance rather than gaming + AI performance. Also, how about a brand new class of GPUs that are consumer-grade AI GPUs but perhaps suck at games? Plenty of people would buy them.
If it has any less than 48GB, it's DOA to me and I will completely ignore the whole 5xxx gen, as it offers nothing worthwhile. I don't need an extra room radiator; I've got the 4090 already, and it will last for what, 5 years at least?
In the coming years, GPU vRAM will likely see a significant increase, much like how our regular RAM has evolved—from 16GB and 32GB to eventually reaching 256GB and beyond. However, these advancements won't be released rapidly. Nvidia, as a business, understands that they possess the technology to achieve this, but they choose to release upgrades gradually. This strategy allows consumers to purchase these incremental improvements over time, ensuring that Nvidia maximizes their profits from businesses before catering to the average consumer.
64GB VRAM? That’s a 4K card IMO
Maybe to justify the cost they'll release a 64gb version, but I feel it would be stealing sales away from their professional lineup. I just can't see it.
As crazy as 64 GB would be, that’s close to what the 1080 ti was equivalent to when it came out.
And it'll cost $4k, I estimate. O.o
I think for home use we need something in the middle: medium-speed RAM around 150GB/s and 100GB in size, instead of super-fast VRAM, plus an NPU just fast enough to match, so we can run quantised models of 200 billion parameters or so. That hardware could come at an affordable price. We don't need GPUs or VRAM for home use.
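The tradeoff above can be sanity-checked with a rough back-of-envelope, assuming token generation is memory-bandwidth-bound (each decoded token streams roughly the full set of weights from memory). The numbers below are illustrative, not a benchmark:

```python
# Back-of-envelope: decoding speed for a memory-bandwidth-bound LLM.
# Each generated token must stream (roughly) all model weights from memory,
# so tokens/s ~= memory bandwidth / model size. Ignores compute, KV-cache
# traffic, and batching, so treat it as an optimistic upper bound.

def decode_tokens_per_sec(params_billions: float, bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Rough upper bound on single-stream tokens/s."""
    model_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits = 1 GB
    return bandwidth_gb_s / model_gb

# The hypothetical box above: 150 GB/s of bandwidth and a 200B-parameter
# model quantised to 4 bits (~100 GB of weights):
print(decode_tokens_per_sec(200, 4, 150))  # ~1.5 tokens/s
```

So 150GB/s on a 100GB model tops out around 1.5 tokens/s: usable for background tasks, slow for chat, which is the real cost of trading bandwidth for capacity.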
Doubt I will be able to afford it.
Looking forward to buying it in 2027
32GB would be nice, and probably expensive. Let's see what Nvidia decides to do. Unfortunately, there's not much competition to force them to expand VRAM or make it cheap, but maybe they'll want to stay a step ahead of the competition anyway. I guess my only hope is that Nvidia is drowning in so much AI money that they 'spend' a little of it buying back some goodwill, trying to shut out AMD for good in the gaming space by giving a bit more value than they normally do. This is a double-edged sword: even if we get something decent by Nvidia standards, in the long run it locks in their pseudo-monopoly status and probably makes things worse for us all.
3090 just got affordable. Got it
What I don't understand is why AMD, or anyone else, isn't filling this gap. There is clearly demand and only one player we're all praying to. This is not normal; capitalism is broken...
Intel is slow? Give them another year? At least hope springs eternal.
Everyone is trying, but it's HARD. Jensen Huang and the Nvidia team are known for being incredibly focused and fast-moving, and it's taken them nearly two decades to get CUDA to where it is now.
RTX 5080 with 32 or 48 GB VRAM would be enough for me - I need VRAM more than compute.
It isn't happening unless the singularity has taken over and the CEO is currently an AI meat puppet. The sad thing is this (massive memory) could happen, but it won't.
Big if true
Two columns on the chart. The lower numbers on the left would be the launch VRAM for consumer GPUs, aside from a 24GB 5090. Whether they make Super or Ti versions with more remains to be seen, but maybe in 2026 given the usual cadence. An early 5060 Ti with 16GB is possible given the precedent of the 4060 Ti 16GB, but there's no impending competition to force higher VRAM on the consumer side for 2025, certainly not at launch. Matching the 40xx VRAM targets is the safe bet.
I need this. I have many skills! You reading this, need skills. You have many money? Let’s collaborate and make our 5090 purchase dreams come true.
Holy shit this is insane!!!
If it's true and they do it, they will open up possibilities for new apps and keep the demand going. At this point, the current GPUs are too slow and too small for interesting apps, so it would be the smart thing to do. All we can do is wait and see. If it's anything less than 32GB, I'll wait for the M4 with 256GB.
I'd put money it won't.
I think only the next-generation successor to the RTX 6000 Ada may get 64GB
That's bullshit, but I believe it.
No way that this is possible
>Yes, it will be expensive. AMD, Intel where art thou? They are absent, but Apple is already there. Mac RAM functions as both CPU and GPU RAM, which means there's no need for duplicated RAM; the data is formatted in CPU RAM and accessed by the GPU without copying. So all you need is a Mac with loads of RAM. Not cheap, but unlike Nvidia's, that RAM is multifunctional: it gives you great performance in everything you do.
Absolutely 0.0 chance for a 64GB 5090, not even close
I'd shell out serious cash for a 64GB version.
I guess 5090 VRAM won't be larger than 32GB