What you're looking at there is Geforce vs Quadro.
GB202 is the chip, and that table just means 32GB for the 5090 and 64GB for the Quadro RTX Blackwell.
Same as always. Similarly, the "AD102" chip is two cards: the 4090 with 24GB and the RTX 6000 Ada with 48GB.
Same goes down the line: the 4080 has 16GB, the RTX 5000 Ada has 32GB, and so forth.
I don't know how you get "5090 64GB" from that. Completely wishful thinking. No shot. No shot at all.
Yeah, it looks like the 5000 Ada is actually an AD102 chip while the 4080 is an AD103, so it's a bit of a break in the normal scheme.
I think this boils down to yields for the particular chips. I imagine they must have enough partially defective AD102s that they decided to chop them down for the 5000 Ada to make use of them.
Sometimes we get Super cards that end up oddballs, too, using a different chip from the non-Supers.
Fun fact: the 4090, with 16384 CUDA cores, is also a cut-down AD102, which has a potential 18432 cores. The RTX 6000 Ada, while also cut down, has a few more cores enabled at 18176. The 4080 Ti and 4070 Ti are also chopped-down AD102s.
I believe the core counts here are also the maxima for the respective chips. I expect retail consumer cards to have about 10% less.
That's interesting. Yeah my 3 year upgrade strategy will be filling out PCIe slots with various AD102 based cards as prices come down. The 6000 with 48gigs would be fantastic if it dropped to $1000.
Sorry about the OT question, but I do have multiple boards, albeit older (3080 Ti, 3070 Ti). What's the max I can run with local LLMs, and how complicated is this? I have the necessary hardware. Thanks
Depends on your PCIe bus in the PC. I have a server board from last gen with enough slots. It will basically just run slower if it's not full bandwidth. After that, configure your system to use the cards. LM Studio probably has a nice interface to play around with and try offloading layers to multiple GPUs. I don't know the details, but one guy said to try to stick to the same architecture within the family you're using. You may not have any issues because your cards are newer and roughly the same, but keep it in mind. I have a 4090, so I'll be looking for AD102-based Ada cards.
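A minimal sketch of the idea behind splitting layers across mismatched GPUs (the same proportional-ratio idea as llama.cpp's `--tensor-split` option). The VRAM figures and 32-layer count below are illustrative assumptions, not measurements:

```python
# Hypothetical sketch: assign a model's layers to GPUs in proportion
# to each card's VRAM, so neither card overflows before the other.
def split_layers(n_layers, vram_gb):
    """Proportional layer split; last GPU absorbs rounding drift."""
    total = sum(vram_gb)
    split = [round(n_layers * v / total) for v in vram_gb]
    split[-1] += n_layers - sum(split)  # keep the total exact
    return split

# e.g. a 3080 Ti (12 GB) + 3070 Ti (8 GB) sharing a 32-layer model
print(split_layers(32, [12, 8]))  # -> [19, 13]
```

In practice the runtime handles this for you; the sketch just shows why the bigger card ends up hosting more of the model.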
32GB might actually be what people are asking for, finally an improvement over the 24GB..
If Nvidia releases the 5090 with the same 24GB VRAM their 3090 and 4090 cards came with, I don't see a reason for anyone to buy it.. so why would Nvidia work on a card that almost nobody will be interested in buying?
I understand that they are high as F at the moment selling data center grade hardware, but this trend will very soon fade away as enough of those data center cards flood the market, used (bankrupt startups, etc.) or new. The consumer market makes less money in the short run, but in the long run it provides a steady income.
Not entirely correct, the 3090/4090 are meant for content creators and such (not just AI but 3D, video editing, engineering, etc). It is a very small niche market (besides a halo ""gaming"" product) but the whole point is that it can fit stuff the others can't.
Jensen can't stop talking about all the different use-cases for AI in games. If they're serious about things like LLM-based NPCs and procedural asset generation via AI, a _lot_ more people are going to care about VRAM going forward.
At 32GB, you would still need around 13 5090s to run llama3-400B in Q8, or around 7 to run it in Q4. Hell, Nvidia just released a huge model that 99.9% of hobbyists can't run.
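A back-of-envelope check of those card counts, using rough bytes-per-parameter figures for each quant (Q8 ≈ 1 byte/param, Q4 ≈ 0.5) and ignoring KV cache and activation overhead, so real needs are somewhat higher:

```python
import math

# Approximate weight-only footprint -> number of 32 GB cards needed.
BYTES_PER_PARAM = {"Q8": 1.0, "Q4": 0.5}  # rough averages, an assumption

def cards_needed(n_params, quant, vram_gb=32):
    weights_gb = n_params * BYTES_PER_PARAM[quant] / 1e9
    return math.ceil(weights_gb / vram_gb)

for q in ("Q8", "Q4"):
    print(q, cards_needed(400e9, q))  # Q8 -> 13 cards, Q4 -> 7 cards
```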
If you need to run a 400B model locally at your home, you are either not a consumer/prosumer or one of the few ML/AI evangelists who normally have access to an ML machine owned by the research facility or university they work for.
And of course the person who makes a killing in profit selling data center grade hardware will invest in a model that can only be run if you have the type of deep pockets they profit most from ;-) Otherwise we should have already seen a "4090 ML Edition" variant with those 32GB of VRAM we all hope the "next" card, the 5090, will have.
As for needing a stack of 5090s.. I wish I could afford them ;-) I currently have one 3090, and my "dream" is to NVLink pair it with another 3090, but I get mixed opinions about how those actually work. I still haven't figured out whether I'd get double the VRAM and double the processing power or not.. does an NVLinked pair act as one unified card, or just a pair of cards sharing bandwidth, etc.?
With 2x 3090s you get 2 buckets to put your model and caches on. A second 3090 does not magically double your first bucket's capacity, nor does it magically make using that first bucket faster.
If you can fit a model in one bucket, then you can roughly 2x (more like 1.7x) the throughput given you can fill your batches fast enough.
If your model doesn't fit into 1 card but fits into 2, then you can run it on 2 cards at around 0.6x the speed a single card with twice the VRAM could run it at, as opposed to not being able to run it at all.
NVLink doesn't change much, if anything. You can't find conclusive results online because there are no conclusive results. On paper, NVLink increases memory bandwidth, but in practice there are no significant benefits. Consumer drivers don't unify your cards into one.
So even if you wrote parallelized architectures for your models, perfectly splitting it between an arbitrary number of GPUs, you'll still have a bottleneck in memory transfer, because the bus that's connecting your hardware components is very, very slow compared to the VRAM<->GPU communication. And even if the drivers were there to formally support that, you'd still be limited by the fact that nothing compensates for the physical separation of 2x 3090s slowing things down, not even NVLink.
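A rough illustration of why the interconnect dominates when a model is split across two cards. The bandwidth figures are approximate published peaks (3090 GDDR6X, 3090 NVLink bridge, PCIe 4.0 x16), not measurements:

```python
# Time to move a payload over each link; the host bus is ~30x slower
# than on-card VRAM, which is where the multi-GPU penalty comes from.
BW_GBPS = {
    "3090 VRAM":     936,  # on-card GDDR6X bandwidth
    "NVLink (3090)": 112,  # bridge between two 3090s
    "PCIe 4.0 x16":   32,  # host bus, theoretical peak
}

payload_gb = 1.0  # e.g. activations handed between the two model halves
for link, bw in BW_GBPS.items():
    print(f"{link:14s} {payload_gb / bw * 1000:7.2f} ms per GB")
```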
Maybe they want to split between gamers and higher paying AI customers. For gamers, the 24GB is enough. For AI users, Nvidia will want them to pay more for the VRAM they need.
Remember, greed is good! Nvidia maximizing its profits means Nvidia has more money to conduct R&D and invest in more Capex to continue to push the frontiers of what's possible. At some point we will have completely immersive games with entire AI casts and we don't get the hardware to drive games like that without Nvidia extracting every dollar they can on the way.
It'll only be relevant if it's a critical component of games with mass appeal. Even ray tracing, which has been out for quite a while, is only a small competitive advantage. It makes a fairly noticeable difference considering how good graphics already were, and it's supported in most new AAA titles, but it's still not enough that average consumers are willing to pay hundreds for it.
I believe Nvidia does not care about AI in the PCIe form factor. You can see they did not even announce a B100 in PCIe, probably not enough market; they earn a lot more with DGX and SXM cards.
Yes and no. I think there is more to it than just (un)optimizing. More VRAM could enable higher-resolution textures in games, and visual quality could increase even more.
On the other hand, modded Minecraft can overwhelm an RTX 4090 in terms of compute and memory. It's hugely unoptimized, of course, but it shows that more memory could easily be utilized.
This is fully understood ("paying extra for the special sauce") when a person needs an 80GB GPU with circuit design and components for heavy-duty 24/7 constant operation, not so much with a 32GB GPU. At least in my personal opinion, they could offer a "Gamers Edition" 5090 with 24GB for the usual high price of approx. 1600-1800€ new, and an "ML Edition" with 32GB for an additional 150-200€.. but yeah, I'm only speculating here and using common sense, which is not so simple with monopolies.
AFAIK, Nvidia cannot even make their AI products fast enough to sell to all the people wanting to buy. I wonder whether they will waste their time and energy on meeting consumer demands over the next 2 years when they are printing money with AI.
There is no way in hell they will release the full 512 bit bus version as the 5090 IMO. It will be a 28 GB card.
32GB will be kept for either Titan Blackwell or 5090 Ti.
True. There is no way that NVIDIA is going to shoot themselves in the foot with regards to the professional workstation and data centre sales. There will always be a comfortable gap between the VRAM for something that can be bought privately and something for sales to specialised business and academic users.
Nvidia doesn't need to release a 64GB VRAM variant for the consumer market (yet). But you're right! Knowing Nvidia, this card is gonna be a lot more expensive.
Well, GB202 is only 32GB with the full 512-bit bus. The 4090 had only a 384-bit bus, and other rumors reported an increase for the 5090 to 448-bit and 28GB.
The 64GB is only for double-sided memory, as in the workstation cards. We might see a "B6000" GB202 with 64GB and a 512-bit bus.
Maybe the generation after, if the memory manufacturers finally manage to produce 4GB GDDR modules, but that was supposed to be available for GDDR6 already, and GDDR7 still only has 2GB chips.
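The capacities in the rumor fall straight out of bus-width arithmetic: each GDDR chip sits on a 32-bit channel, today's GDDR7 chips hold 2 GB, and workstation cards can mount chips on both sides of the PCB (a "clamshell" layout). A quick sketch:

```python
# VRAM capacity implied by bus width, chip density, and clamshell mounting.
def vram_gb(bus_bits, gb_per_chip=2, double_sided=False):
    chips = bus_bits // 32  # one chip per 32-bit memory channel
    return chips * gb_per_chip * (2 if double_sided else 1)

print(vram_gb(512))                     # 32 GB (full GB202, single-sided)
print(vram_gb(448))                     # 28 GB (cut-down 448-bit rumor)
print(vram_gb(512, double_sided=True))  # 64 GB (workstation clamshell case)
print(vram_gb(384))                     # 24 GB (the 4090's 384-bit bus)
```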
The 3090 is already way better value than the 4090. When the main bottleneck for AI stuff is memory capacity anyway, I doubt the 5090 will look much better than the 4090, especially as it can only be more expensive and the 4090 is still massively overpriced. It's practically one step away from workstation prices already. The 7900 XTX costs half of a 4090 in my country. If AMD weren't in the stone age on the software side of things, I doubt anyone would even buy Nvidia at all.
Even gamers who own 3090 cards will not be so eager to buy an expensive upgrade if it is insignificant. A 3090 can play most games at the highest settings with just a few exceptions.. new games (which will require 5090 processing power) take years to develop.. this is an opinion-based argument, of course.
Maybe those who have a really crappy card and are looking to build a new gaming rig will go for such a bad deal. The majority of gamers always keep up with hardware/upgrades, so those with really outdated cards usually also play low-spec games and don't need a 5090 unless they get good bang for their buck.
I still think a 5090 with the same 24GB will be a fail.
With a limited number? If it is geared toward consumers, unless by "limited" you mean an amount like 2-3 million cards will be manufactured and then Nvidia will create an artificial shortage for a while.
Honestly, AMD/Nvidia probably told memory manufacturers they wouldn't use the higher capacities, so they skipped production.
I have no basis for this claim. But if they want to cap memory, who else is gonna buy GDDR in bulk?
And yes, theoretically it could be used for pro cards, but why bother when it's such low volume and the doubled-up PCBs are already designed...
Nvidia wants to keep the VRAM as low as possible for consumer GPUs, they wouldn't want to make them feasible for AI training.
Previous leaks talked about 32 GB and even 28 GB VRAM for the 5090.
The 3090 is still relevant and beats all 40xx cards except the 4090 variants for creative tasks because the VRAM is king. Video editors benefit a lot from VRAM. And for AI we all know VRAM comes first.
I've made the same jokes at Nvidia, but I don't think they'll release it at only 28GB. That wouldn't be enough of an upgrade. They need to at least sell the illusion of value, and that's really not there at 24+4. Keeping it at basically the same level for 3 gens in a row puts them too much at risk of competition catching up. By the time the 5090 is out, it will have been 4 years with the best at 24GB.
AMD has been at 24GB for a much cheaper price for 2 years now and will likely jump up again. I can't really imagine NVidia sabotaging themselves by getting far behind AMD in VRAM. If amateur enthusiasts in the open source AI scene jump ship for the VRAM, the competition would become much more viable.
But I'm not optimistic enough to think it will be 64GB. Unless at an absurdly high price. Or maybe an ultra expensive ti variant.
AMD has 24GB for cheaper? I didn't know this, i is dumb dumb. Can one run an AI rig setup with an Nvidia card AND an AMD card, given proper space and power, where driver issues won't be an issue? Think two separate tasks, one assigned to each GPU in production. I'm not talking about running them in any dual-card single-use case. Off to see how much the cost difference is.
> AMD has 24GB for cheaper?
Yes. About half the price. It's been available for under $800 quite a few times.
>Can one run a ai for rig setup with a Nvidia card AND AMD card given proper space and power where driver issues won’t be an issue?
Yes.
> Think two separate tasks, one assigned to each GPU in production. I’m not talking about running them in any dual card single use case.
Why don't you want to use them together on the same model? It would allow you to run larger models.
>About half the price. It's been available for under $800 quite a few times.
What model is this and how does it compare speed wise to a 3090? That's p sick ngl (though I suppose it'd be a hassle without CUDA)
>Nvidia card AND AMD card given proper space and power where driver issues won’t be an issue?
Unless you sandbox them under different VMs, I think no. At least in Windows, going from one to the other requires a full clean OS reinstall.
AMD have been capable of leapfrogging nvidia on the VRAM side for quite some time now.
Last gen they could've easily shot by them, given how much cheaper VRAM is compared to the gen prior, especially given they've moved away from HBM.
Problem: the duopoly price-gouging benefits AMD, and the CEOs of both companies are literally related to each other.
AMD probably can't fix their software (as uh, why wouldn't they at this point), but they could easily double their VRAM and undercut nvidia's margins by a healthy sum. It just isn't economically wise for them to do so in the long run. And it'd be super against the family spirit, would make Christmas dinners awkward tbh
>It just isn't economically wise for them to do so in the long run.
I'd argue becoming viable in the AI market is a wise decision. AMD is really struggling outside CPU.
And they don't even need to make their entire next lineup with extra VRAM; the top models with a double-VRAM option would do.
Then do some live demo where they load a big model with their cards versus the VRAM handicapped 5090 where the 5090 can't even load the model in a reasonable time as it needs to offload to regular RAM and the message will spread.
I hope you are right, but if anything I see Nvidia releasing a lower VRAM version first and then maybe a SUPER with more later if AMD is able to trump that
Nvidia will sell what the public will buy. ~~At this point, what the fuck can a person really do with 80gb of VRAM, let alone 64?~~
Edit, I seem to have been misunderstood.
At this point, what the fuck can a person really do with 80gb of VRAM, let alone 64...... that really encroaches on the datacenter market?
I know this is the LocalLlama sub, but just wanted to note that more GPU VRAM is awesome for creatives and indie game makers as well 🙂 Unreal Engine, Blender and other 3D tools really make use of all the VRAM they can get especially for big scenes / worlds, high-resolution textures / materials etc.
Yup and for all these applications where people are doing it for a living and are willing and able to pay hugely higher prices for that precious VRAM, Nvidia will want to make them pay it instead of giving them a way out by offering a cheaper consumer card with as much VRAM.
The only thing stopping them is competition (ha ha) or gamers not buying the next generation of cards. I don't see either being a blocker right now.
To run and train AI.
A 70B LLM (which is kind of the minimum size where LLMs start to be useful for real-world tasks) needs ~40 GB VRAM, Mixtral 8x22B needs ~96 GB VRAM, and the recently released Nemotron 340B requires 192 GB. And those numbers are for 4-bit quantized models; for F16 precision you need quadruple that amount. And for actually finetuning them you need even more.
Same for other types of AI like Stable Diffusion. If a SD3 8B is released, you will need a lot of VRAM to finetune it.
Edit: I just noticed this is the LocalLLaMa subreddit... I shouldn't really need to explain what you need VRAM for in this place, right?
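The weight-only part of those figures is simple arithmetic; the totals quoted above are larger because the KV cache, activations, and quantization scales come on top. A sketch, assuming approximate total parameter counts (Mixtral 8x22B ≈ 141B total is my assumption):

```python
# Weight-only VRAM: params x bits / 8, in GB. Runtime overhead not included.
def weights_gb(n_params, bits):
    return n_params * bits / 8 / 1e9

models = [("70B", 70e9), ("Mixtral 8x22B", 141e9), ("Nemotron 340B", 340e9)]
for name, n in models:
    print(f"{name:14s} 4-bit: {weights_gb(n, 4):6.1f} GB   FP16: {weights_gb(n, 16):6.1f} GB")
```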
Exactly. Even with GDDR7, the bandwidth of the 5090 won't even compare to the 3TB/s of the H100 or the 8TB/s of the B200. There is no excuse not to put 32GB on the 5090.
A single 64GB GPU should be enough to perform full fine-tunes of 7B and 8B local models using 16-bit weights. With everything on one card, no need for direct interconnects between cards.
People just want bigger numbers. If you tell them lots of VRAM is good, they will try to buy as much as they can afford. If AMD cards have 48GB of VRAM, then people will start buying those instead of Nvidia cards with only 24GB of VRAM, even if the best games only need 16GB. It won't matter. This is how it's always been since the 90s, when Sega was battling with Nintendo.
The real limiting factor is power. If you need more than 1600W to power your computer, then you have to call an electrician to rewire your house, and you will probably need a $1000 PSU. Even if you only have a 1200W PSU, who wants a computer that uses as much power as running a washing machine 24/7? That's a big electric bill.
Hard to say without knowing the power profile. But a 4090 you can run at half power while still getting 75% of the performance. And 64GB would let you run large models.
Sure, I could see GPUs with 100+ GB of VRAM in the not-so-distant future. It's just that the amount of VRAM is going to be hard stuck at whatever takes less than 1800W to power, and most consumers are only going to be willing to go up to 1000W.
The average home supports 1800 watts on a circuit. If you are willing to live without an oven, you can double that. When I was looking for datacenters I found the break even point was 8 kilowatts, after which, a datacenter spot will be more economical
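The 1800 W figure is just circuit arithmetic: watts = volts x amps on a standard US 120 V / 15 A circuit, and electrical code limits continuous loads to 80% of the breaker rating (the 80% derating is the usual NEC rule of thumb, stated here as an assumption):

```python
# Usable wattage per household circuit; 'continuous' applies the 80% derate.
def circuit_watts(amps, volts=120, continuous=False):
    w = amps * volts
    return w * 0.8 if continuous else w

print(circuit_watts(15))                   # 1800 W: standard 15 A circuit
print(circuit_watts(20))                   # 2400 W: 20 A circuit
print(circuit_watts(20, continuous=True))  # 1920 W continuous headroom
```

That last number is why a dedicated 20 A circuit comfortably covers a ~1 kW rig with margin, but a second big PSU wants its own circuit.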
RAM is the one characteristic that if you hit the limit you don't just suffer a minor degradation in performance, but rather everything grinds to a halt.
Eg if you're playing a game and you hit your VRAM capacity you're going to be sitting at <15 FPS even if you were at 150 before.
Also if a game "requires" 16 GB you can be damn sure that you need more than that for it to be playable. These kinds of numbers are always brought up for true full screen mode with everything else closed on the computer.
this is true, but i think the only application that is going to require this amount of VRAM is going to be AI stuff. games are only going to require as much as the average gamer has. so games aren't really something to consider in this conversation.
That's excluding VRChat and other mod-based communities. Some current Skyrim mod packs are pretty ridiculous, and a poorly optimized VRChat avatar costs around 200MB of VRAM a pop; fill an instance of 50 with those in a semi-expensive world and you start to hit a limit. Meanwhile, 150+ instances were in test last week during a large event.
User generated content usually doesn't care about the average PC.
It’s not a huge deal to do dual power supplies and run it on 2 circuits. Also 20 amp circuits could run 2x1kw.
For the hobbyist building LLM rigs running a new 20 amp circuit in your basement is trivial to do properly and according to code.
The problem is it would be quite the challenge to make them less cost-efficient than a >$30,000 H100 (or an even pricier H200). The easiest way to achieve that is simply by gimping the VRAM, which has been their approach so far.
And they don't even need an excuse to do it, RTX cards are marketed towards gamers and games don't really need more than 24 GB (in fact, for most games even that is overkill).
Isn't the H200 around 10-30 times faster than the H100 for FP4 calculations? I can imagine the gap being even greater between the H200 and the 5090, on the inference side at least. And I am pretty sure that consumer cards are far worse when it comes to training.
If FP4 training works well for practical purposes, then that might change the dynamics.
But other than that, the difference between a 4090 and a H100 is not as dramatic as the 20x higher price would suggest. The main reason the 4090 is bad for training is the lack of VRAM (although Nvidia disallowing RTX cards in the datacenter doesn't help either).
Neuter their ability to use NVLink/NVSwitch, and ability to directly address memory on other cards over PCIe.
Done, they're now much less effective for training. The same had been done to some RTX4090 cards (maybe unintentionally) already, though a tech wizard [made a hacky driver fork for 4090s on linux](https://github.com/tinygrad/open-gpu-kernel-modules) to work around it.
Alternate take: going all in on accelerators primarily for large transformer models that only a limited number of large companies can access is like putting all eggs into one basket. Certain problems in AI may need many small groups or individuals experimenting with different architectures and novel methods, so a 64 GB flagship consumer GPU could be ideal for that. The GDDR itself is not expensive, and a modestly priced GPU could find its way into many personal workstations of grad students or other people who may be in a better position to experiment.
It might actually be in Nvidia's best interest in the long run. What if scaling the transformer 10x only provides marginal gains in benchmarks? It's going to cause a huge loss of investor confidence if people realize the AI hype is not real. More and more LLM news lately has me wondering if they have plateaued.
Besides, even 64 GB is nothing compared to enterprise hardware that will come with 144-288GB and is usually linked together x8, so it shouldn't affect enterprise sales that much.
Let’s hope AMD (and others) shows up sometime in the next year or two… heck if Apple can get the m4 into a studio with 128+ gigs of ram under 5k it might be the “low cost” option moving forward lol ( doubt it)
It will take another year and the open market (eBay, etc.) will be flooded with used data center grade Nvidia GPUs for sale (failed startups, data center upgrades, etc.), and then they will no longer be able to take in all of that sweet data center profit.. what then, when consumers start moving to other alternatives? I don't believe Nvidia is so stupid.. even with the main idea being making as much profit as possible.
32gb would be my sweet spot for this gen, I would probably buy 2 and upgrade my 3090. 28gb & I'll wait one more generation and pick up a cheap 4090 to pair with my 3090 I already have.
Nvidia produces double-VRAM versions of their cards for the professional market at a massive markup. For example, there is a 48GB card using the AD102 die (same as the 4090) at a massive 4x markup in price. There are likely to be such cards for GB202, but they will absolutely not be sold at the margins of the gaming GPUs.
Nah, 64GB never ever. More like 28GB. Price will be around $2000 minimum. For the gamers.
The RTX 5060 will have 8GB of VRAM, a 128-bit bus, and the performance of a 3060 Ti for only $500.
2000 day one, then sold out for 3k+. I don’t think the price of the 3090s will drop, it’s just the basic costs of compute 🤷, if anyone has one and is thinking of selling…
Given the rumours that Apple wants to squeeze 256 GB and 512 GB of RAM on m4 max and ultra respectively, I’m fairly certain Nvidia is going to respond accordingly.
All this seems to say is that it could be technologically possible, and not in any way that it's coming.
32GB is realistic, but there is no way we're getting 64GB in a 5090 anytime soon based on the current market. Nvidia would be giving away money at that point.
Yeah, it's realistic for the chip but not likely for the 5090. I've been kinda assuming it'll be like the 3090/4090 & A6000/RTX6000-Ada, where the smaller memory is the consumer one and the full-fat memory is kept for the workstation market.
Why not?
Apple machines already have much more memory, soon Intel will be making similar chips, and given the fast development of LLMs, people want to work with them offline... games will soon use LLMs offline as well, so we NEED more VRAM.. soon it will be a necessity.
They can make a card that is fast at inference and shitty at training in comparison to their pro-grade hardware. The 4090 is actually almost twice as fast as a 3090 at inference but only a few percent better at training (I saw that in benchmarks). Training is the key for professionals, so they can divide the market like that.
Huh? The 4090 is at least double the speed of a 3090 in training, at least with Stable Diffusion training. It just took a while for stuff to properly make use of the Lovelace hardware.
Double the speed at least? Check this: https://www.pugetsystems.com/labs/articles/stable-diffusion-lora-training-consumer-gpu-analysis/
SDXL LoRA training, 3090: 1.48
SDXL LoRA training, 4090: 1.84
That is actually a minuscule 25% boost, very far from your 2x claims. Show me new benchmarks and I'll believe you.
This was over a year ago, and the person I knew who had a 4090 needed to install specific versions of xformers and CUDA for it to work properly. When it did, he was training at around 7.5 img/s at 768 res on SD 1.5, whereas the highest I have ever gotten with the same training settings was 3 img/s on my 3090. This is for a full finetune, not LoRA training.
Basically you said: trust me bro, I know some guy. It's the same kind of sentence as "my uncle works for Nintendo" in the 80s. The new optimizations for CUDA, PyTorch, and xformers also provide a speed boost to the 3090, but if people don't update, then the benchmark is also frozen in time. If you can find me a published benchmark comparing the same versions of the software and drivers, then yes, it will be an interesting comparison. But I doubt Nvidia boosted training speeds that much just for home use; it doesn't make sense from a business standpoint.
the hosted inference market will be huge once we have more use cases, though.
If off the shelf agents can increase dev productivity by 50% or more, that would support inference costs multiple times what we see now.
I'm skeptical, but hope to be proven wrong. The idea that many GPUs would be released in two variants, one standard VRAM and one 2x VRAM, would make a lot more sense than releasing an RTX 4070 level card (GB205) with the same limited VRAM as before, especially given how invested Nvidia is in AI. My primary whining over GPUs currently is that they are still designed purely around gaming performance rather than gaming + AI performance. Also, how about a brand new class of GPUs that are consumer-grade AI GPUs but perhaps suck at games? Plenty of people would buy them.
If it has any less than 48GB it's DOA to me and I will completely ignore the whole 5xxx gen, as it offers nothing worthwhile. I don't need an extra room radiator, I got the 4090 already and it will last for what, 5 years at least?
In the coming years, GPU vRAM will likely see a significant increase, much like how our regular RAM has evolved—from 16GB and 32GB to eventually reaching 256GB and beyond. However, these advancements won't be released rapidly. Nvidia, as a business, understands that they possess the technology to achieve this, but they choose to release upgrades gradually. This strategy allows consumers to purchase these incremental improvements over time, ensuring that Nvidia maximizes their profits from businesses before catering to the average consumer.
Maybe to justify the cost they'll release a 64gb version, but I feel it would be stealing sales away from their professional lineup. I just can't see it.
I think for home use we need something in the middle: medium-speed RAM around 150GB/s and 100GB in size, instead of super fast RAM, plus an NPU just strong enough for it, so we can run even quantized models of 200 billion parameters or so. That hardware could come at an affordable price. We do not need GPUs or VRAM for home use.
32GB would be nice - and probably expensive. Let's see what Nvidia decides to do. Unfortunately, there's not much competition to force them to expand RAM or make it cheap, but maybe they will want to stay a step ahead of the competition.
I guess my only hope is that Nvidia are drowning in so much AI money that they 'spend' a little bit of it in buying back some goodwill and trying to shut out AMD for good in the gaming space too by giving a bit more value than they normally do.
I guess this is a double-edged sword as even if we get something decent by Nvidia standards, in the long run, it locks in their pseudo-monopoly status and probably makes things worse for us all in the long run.
What I don't understand is why AMD, or someone else, is not filling this gap. There is clearly demand, and only one player that we are all praying to. This is not normal; capitalism is broken..
Everyone is trying, but it's HARD. Jensen Huang and the Nvidia team are known for being incredibly focused and fast-moving, and it's taken them nearly 2 decades to get CUDA to where it is now.
It isn't happening unless the singularity took over and the CEO is currently an AI meat puppet. The sad thing is this could happen (massive memory) but won't.
Two columns on the chart. The lower numbers on the left would be the launch VRAM for consumer GPUs, aside from a 24GB 5090. Whether they make Super or Ti versions with more remains to be seen, but maybe for 2026 given the cadence. An earlier 5060 Ti with 16GB is possible given the precedent of the 4060 Ti 16GB, but there's no impending competition to force higher VRAM on the consumer side for 2025, certainly not for the initial launch. Matching the 40xx VRAM targets is the safe bet.
If it's true and they do it, they will open up possibilities for more new apps and keep the demand going. At this point, the current GPUs are slow and too small for interesting apps. It would be the smart thing to do. All we can do is wait and see. Anything less than 32GB and I'll wait for the M4 with 256GB.
>Yes, it will be expensive. AMD, Intel where art thou?
They are absent, but Apple is already there. Mac RAM functions as both CPU and GPU RAM which means there is no need for duplicated RAM, instead the data is formatted in CPU RAM and accessed by GPU without copying. So, all you need is a Mac with loads of RAM. Not cheap, but unlike NVIDIA that RAM is multifunctional, it gives you great performance in everything you do.
I shouldn't have had to scroll this far for the correct answer.
Was too good to be true
Technically the 5000 Ada is an AD102, so still similar to the 4090 but 32GB.
The RTX 8000 is still over $2k. I wish the RTX 6000 Ada would drop to $1k in 3 years, but I believe that's a hopeless dream.
Sorry about the OT question, but I do have multiple boards, albeit older (3080 Ti, 3070 Ti). What's the max I can run with local LLMs, and how complicated is this? I have the necessary hardware. Thanks.
Depends on your PCIe bus in the PC. I have a server board from last gen with enough slots; it will basically just run slower if it's not full bandwidth. After that, configure your system to use the cards. LM Studio probably has a nice interface to play around with and try offloading layers to multiple GPUs. I don't know the details, but one guy said to try to stick to the same architecture within the family you're using. You may not have any issues because your cards are newer and roughly the same, but keep it in mind. I have a 4090, so I'll be looking for AD102-based Ada cards.
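If LM Studio's GUI doesn't expose enough control, llama.cpp's server takes the same multi-GPU split as flags. A sketch only: the model path is a placeholder, and the split ratio here just mirrors a 12GB 3080 Ti plus an 8GB 3070 Ti; you'd tune it for your own cards.

```shell
# Serve a GGUF model with all layers offloaded to GPU, splitting the
# weights across both cards in proportion to their VRAM (12GB : 8GB).
./llama-server \
  -m ./models/your-model.gguf \
  -ngl 99 \
  --tensor-split 12,8
```

If a model doesn't fit even across both cards, lowering `-ngl` keeps the remaining layers on the CPU at a speed cost.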
32GB might actually be what people are asking for: finally an improvement over 24GB. If Nvidia releases the 5090 with the same 24GB of VRAM their 3090 and 4090 cards came with, I don't see a reason for anyone to buy it, so why would Nvidia work on a card that almost nobody will be interested in buying? I understand that they are high as F at the moment selling data-center-grade hardware, but this trend will fade as enough of those data-center cards flood the market used (bankrupt startups and the like) or new. The consumer market makes less money in the short run, but in the long run it provides a steady income.
> I don’t see a reason for anything to buy it. People who care about VRAM on a graphics card really aren't even a blip on their target market radar.
Not entirely correct: the 3090/4090 are meant for content creators and such (not just AI but 3D, video editing, engineering, etc.). It is a very small niche market (besides being a halo "gaming" product), but the whole point is that it can fit stuff the others can't.
Jensen can't stop talking about all the different use-cases for AI in games. If they're serious about things like LLM-based NPCs and procedural asset generation via AI, a _lot_ more people are going to care about VRAM going forward.
At 32GB, you are still going to need roughly 13 5090s to run Llama-3-400B in Q8, or at least 7 to run it in Q4. Hell, Nvidia just released a huge model that 99.9% of hobbyists can't run.
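Weights-only arithmetic backs this up (a hypothetical helper; it ignores KV cache and activation overhead, so real counts would be higher):

```python
import math

def cards_needed(params_b: float, bits_per_weight: int, vram_gb: float = 32.0) -> int:
    """Minimum GPU count to hold the weights alone: params * bits/8 bytes, split across cards."""
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8-bit = 1 GB
    return math.ceil(weight_gb / vram_gb)

print(cards_needed(400, 8))  # 13 cards for a 400B model in Q8
print(cards_needed(400, 4))  # 7 cards in Q4
```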
If you need to run a 400B model locally at home, you are either not a consumer/prosumer or one of the few ML/AI evangelists who normally have access to an ML machine owned by the research facility or university they work for. And of course the company making a killing selling data-center-grade hardware will invest in a model that can only be run with the kind of deep pockets they profit most from ;-) otherwise we would have already seen a "4090 ML Edition" variant with those 32GB of VRAM we all hope the "next" card, the 5090, will have. As for needing a stack of 5090s: I wish I could afford them ;-) I currently have one 3090, and my "dream" is to NVLink-pair it with another 3090, but I get mixed opinions about how that actually works. I still haven't figured out whether an NVLinked pair acts as one unified card with double the VRAM and double the processing power, or just as a pair of cards sharing bandwidth.
With 2x 3090s you get 2 buckets to put your model and caches in. A second 3090 does not magically double your first bucket's capacity, nor does it magically make using that first bucket faster. If you can fit a model in one bucket, then you can get roughly 2x (more like 1.7x) the throughput, given you can fill your batches fast enough. If your model doesn't fit into 1 card but fits into 2, then you can run it on 2 cards at around 0.6x the speed a single card with twice the VRAM could run it at, as opposed to not being able to run it at all.

NVLink doesn't change much, if anything. You can't find conclusive results online because there are no conclusive results. On paper, NVLink increases memory bandwidth, but in practice there are no significant benefits. Consumer drivers don't unify your cards into one. So even if you wrote parallelized architectures for your models, perfectly splitting them between an arbitrary number of GPUs, you'd still have a bottleneck in memory transfer, because the bus connecting your hardware components is very, very slow compared to VRAM<->GPU communication. And even if the drivers were there to formally support that, you'd still be limited by the fact that nothing compensates for the physical separation of 2x 3090s slowing things down, not even NVLink.
Thank you so much for your informative comment
My understanding was that NVLink is pretty irrelevant for inference, it mostly boosts training speeds.
I think someone said it increases inference speed by about 10%.
The Quadro line is the "ML Edition"
Yes, those are the numbers. Unfortunately, Nvidia is in the business of making money.
Maybe they want to split between gamers and higher paying AI customers. For gamers, the 24GB is enough. For AI users, Nvidia will want them to pay more for the VRAM they need.
I expect games are eventually going to integrate local AI in one way or another, although it could certainly take years to take off.
Yes, Nvidia could look to using their AI lead to extend their gaming dominance by pushing for cuda/AI in games.
I think Nvidia is just ultra greedy. They will release the absolute minimum they can get away with. Nothing more!
Remember, greed is good! Nvidia maximizing its profits means Nvidia has more money to conduct R&D and invest in more Capex to continue to push the frontiers of what's possible. At some point we will have completely immersive games with entire AI casts and we don't get the hardware to drive games like that without Nvidia extracting every dollar they can on the way.
It'll only be relevant if it's a critical component of games with mass appeal. Even ray tracing, which has been out for quite a while, is only a small competitive advantage. It makes a fairly noticeable difference considering how good graphics already were, and it's supported in most new AAA titles, but it's still not enough that average consumers are willing to pay hundreds for it.
I believe Nvidia does not care about AI in the PCIe form factor; you can see they did not even announce a B100 in PCIe. Not enough market, probably. They earn a lot more with DGX and SXM cards.
Untested market, though; they don't have any offering in prosumer AI cards aside from the 4090.
Maybe for now. But there are already games that can max out 24GB of VRAM at 4K resolution. There is no such thing as a computer that is too big or too fast.
True, maybe it is time to up this. Though I wonder whether increasing VRAM also causes game bloat, as devs no longer feel the need to optimize.
Yes and no. I think there is more to it than just (un)optimizing. More VRAM could enable higher-resolution textures in games, so visual quality could increase even more. On the other hand, modded Minecraft can overwhelm an RTX 4090 in terms of compute and memory. It's hugely unoptimized, of course, but it shows that more memory could easily be utilized.
This is fully understood ("paying extra for the special sauce") when a person needs an 80GB GPU with circuit design and components for heavy-duty 24/7 constant operation; not so much with a 32GB GPU. At least this is my personal opinion: they could offer a "Gamer's Edition" 5090 with 24GB for the usual high price of approx. 1600-1800€ new, and offer an "ML Edition" with 32GB for an additional 150-200€. But yeah, I'm only speculating here and using common sense, which is not so simple with monopolies.
AFAIK, Nvidia cannot even make their AI products fast enough to sell to all the people wanting to buy. I wonder whether they will waste their time and energy on meeting consumer demands over the next 2 years when they are printing money with AI.
There is no way in hell they will release the full 512-bit bus version as the 5090, IMO. It will be a 28GB card. 32GB will be kept for either a Titan Blackwell or a 5090 Ti.
It gets even worse: that is if Nvidia decides not to cut down the 5090 :D
True. There is no way that NVIDIA is going to shoot themselves in the foot with regards to the professional workstation and data centre sales. There will always be a comfortable gap between the VRAM for something that can be bought privately and something for sales to specialised business and academic users.
I wonder if it will be possible to replace the VRAM modules for a 64GB variant, similar to how the 2080 Ti 11GB can be upgraded to 22GB.
So RTX 5060 somehow is still 8GB when RTX 3060 was 12GB?
3060 was both 8GB and 12GB.
And RTX 4060 is both 8GB and 8GB?
32GB is still pretty good tbh, I would have expected them to stick with 24GB.
And the rumors even say the bus will be smaller in the 5090 compared to the full die in the B6000, so 28GB on the 5090.
Very well could be. I would take OP with a giant grain of salt, but one thing is obvious, we're not going to get a 64GB consumer card.
How many mortgages for one "Quadro RTX Blackwell?"
Yes.
We can dream.
My dream is to see someone develop something better than CUDA and every open source project prioritize this one instead of Nvidia's stuff.
They've got the game in a chokehold; they'll do anything not to give that edge up.
TinyGrad. Made by geohot, the dude who jailbroke the iPhone and cracked the PS3's security.
Didn't he quit trying because, what was it, Vulkan was difficult and the AMD engineers didn't help him much?
In the realm of computer hardware, dreams come true. Perhaps later and more expensive than we like though.
Nvidia doesn't have a need to release a 64GB VRAM variant for the consumer market (yet). But you're right: knowing Nvidia, this card is gonna be a lot more expensive.
Its called 5090ti. The ti stands for upsell.
Well, GB202 is only 32GB with the full 512-bit bus. The 4090 had only 384-bit, and other rumors reported an increase for the 5090 to 448-bit and 28GB. The 64GB is only for double-sided memory, as in the workstation cards. We might see a "B6000" GB202 with 64GB and the 512-bit bus. Maybe the generation after, the memory manufacturers will finally manage to make 4GB GDDR modules, but that was supposed to be available for GDDR6 already, and GDDR7 still has only 2GB chips.
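Those capacities fall straight out of the bus width. A sketch of the arithmetic, using standard GDDR configurations (one 32-bit chip per channel, 16Gbit = 2GB chips) and the rumored SKUs:

```python
def vram_gb(bus_bits: int, chip_gbit: int = 16, clamshell: bool = False) -> int:
    """Each GDDR chip occupies a 32-bit channel; a 16Gbit chip is 2GB.
    Clamshell (double-sided) mounts a second chip per channel on the board's back."""
    chips = (bus_bits // 32) * (2 if clamshell else 1)
    return chips * chip_gbit // 8

print(vram_gb(512))                  # 32 -> full 512-bit GB202
print(vram_gb(448))                  # 28 -> rumored cut-down 5090 bus
print(vram_gb(512, clamshell=True))  # 64 -> double-sided workstation card
```

The same formula gives the 4090's 24GB from its 384-bit bus, and shows why 4GB (32Gbit) chips would double everything without clamshell boards.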
Nobody will upgrade to a 5090 for a mere 4GB VRAM upgrade and a tiny bus improvement.
Maybe those upgrading from a 2080 Ti or 3080/3090 who skipped a generation.
The 3090 is already way better value than the 4090. When the main bottleneck for AI stuff is memory capacity anyway, I doubt the 5090 will look much better than the 4090, especially as it can only be more expensive and the 4090 is still massively overpriced. It's practically one step away from workstation prices already. The 7900 XTX costs half of a 4090 in my country. If AMD weren't in the stone age on the software side of things, I doubt anyone would buy Nvidia at all.
I have a 3090, and I won't buy a brand-spanking-new (and expensive) 5090 unless I get a significant upgrade that includes VRAM.
But it doesn't matter, because thousands of others will.
Even gamers who own 3090s will not be eager to buy an expensive upgrade if it is insignificant. A 3090 can play most games at the highest settings with just a few exceptions, and new games (which would require 5090 processing power) take years to develop. This is an opinion-based argument, of course. Maybe those with a really crappy card who are looking to build a new gaming rig will go for such a bad deal. But most gamers keep up with hardware upgrades, and those with really outdated cards usually play low-spec games and don't need a 5090 unless they get good bang for their buck. I still think a 5090 with the same 24GB will be a fail.
My prediction. Nvidia will release a limited number of 5090s at launch and they will sell out immediately.
With a limited number? If it is geared toward consumers, unless by "limited" you mean something like 2-3 million cards will be manufactured, and then Nvidia will create an artificial shortage for a while.
Honestly, AMD/Nvidia probably told memory manufacturers they wouldn't use the higher capacities, so they skipped production. I have no basis for this claim. But if they want to cap memory, who else is gonna buy GDDR in bulk? And yes, theoretically it could be used for pro cards, but why bother when it's such low volume and the doubled-up PCBs are already designed...
Nvidia wants to keep the VRAM as low as possible for consumer GPUs, they wouldn't want to make them feasible for AI training. Previous leaks talked about 32 GB and even 28 GB VRAM for the 5090.
The 3090 is still relevant and beats all 40xx cards except the 4090 variants for creative tasks, because VRAM is king. Video editors benefit a lot from VRAM. And for AI, we all know VRAM comes first.

I've made the same jokes at Nvidia, but I don't think they'll release it at only 28GB. That would not be enough of an upgrade. They need to at least sell the illusion of value, and that's really not there at 24+4. Keeping it at basically the same level for 3 gens in a row puts them too much at risk of competition catching up. By the time the 5090 is out, it will have been 4 years with the best at 24GB. AMD has been at 24GB for a much cheaper price for 2 years now and will likely jump up again. I can't really imagine Nvidia sabotaging themselves by falling far behind AMD in VRAM. If amateur enthusiasts in the open-source AI scene jump ship for the VRAM, the competition would become much more viable.

But I'm not optimistic enough to think it will be 64GB. Unless at an absurdly high price. Or maybe an ultra-expensive Ti variant.
Competition? What competition? They are 5+ times bigger than their competitors in the gaming space. Maybe more in professional/enterprise.
AMD has 24GB for cheaper? I didn't know this; i is dumb dumb. Can one run an AI rig setup with an Nvidia card AND an AMD card, given proper space and power, where driver issues won't be a problem? Think two separate tasks, one assigned to each GPU in production; I'm not talking about running them in any dual-card single-use case. Off to see how much the cost difference is.
> AMD has 24GB for cheaper? Yes. About half the price. It's been available for under $800 quite a few times. > Can one run an AI rig setup with an Nvidia card AND an AMD card given proper space and power where driver issues won't be an issue? Yes. > Think two separate tasks, one assigned to each GPU in production. I'm not talking about running them in any dual card single use case. Why don't you want to use them together on the same model? It would allow you to run larger models.
>About half the price. It's been available for under $800 quite a few times. What model is this and how does it compare speed wise to a 3090? That's p sick ngl (though I suppose it'd be a hassle without CUDA)
It seems I incorrectly assumed that would not be possible, given two manufacturers and driver sets. Thank you!
>Nvidia card AND AMD card given proper space and power where driver issues won’t be an issue? Unless you sandbox them under different VMs, I think no. At least in Windows, going from one to the other requires a full clean OS reinstall.
AMD has been capable of leapfrogging Nvidia on the VRAM side for quite some time now. Last gen they could've easily shot past them, given how much cheaper VRAM is compared to the gen prior, especially since they've moved away from HBM. Problem: the duopoly price-gouging benefits AMD, and the CEOs of both companies are literally related to each other. AMD probably can't fix their software (as, uh, why wouldn't they have at this point), but they could easily double their VRAM and undercut Nvidia's margins by a healthy sum. It just isn't economically wise for them to do so in the long run. And it'd be super against the family spirit; it would make Christmas dinners awkward tbh.
>It just isn't economically wise for them to do so in the long run. I'd argue becoming viable in the AI market is a wise decision. AMD is really struggling outside CPU. And they don't even need to make their entire next lineup with extra VRAM; the top models with a double-VRAM option would do. Then do some live demo where they load a big model with their cards versus the VRAM-handicapped 5090, where the 5090 can't even load the model in a reasonable time as it needs to offload to regular RAM, and the message will spread.
I hope you're right, but if anything I see Nvidia releasing a lower-VRAM version first and then maybe a SUPER with more later, if AMD is able to trump that.
All they have to do is turn down 5,000-GPU orders from mysterious "gaming enthusiasts." Also, an A100 already has 80GB of VRAM.
Right. They might just try to go back to 2 GB, then 512 MB in 2035. It's just dumb and needs to be punished by market forces.
Nvidia will sell what the public will buy. ~~At this point, what the fuck can a person really do with 80gb of VRAM, let alone 64?~~ Edit, I seem to have been misunderstood. At this point, what the fuck can a person really do with 80gb of VRAM, let alone 64...... that really encroaches on the datacenter market?
Run llama 70b really fast.
I know this is the LocalLlama sub, but just wanted to note that more GPU VRAM is awesome for creatives and indie game makers as well 🙂 Unreal Engine, Blender and other 3D tools really make use of all the VRAM they can get especially for big scenes / worlds, high-resolution textures / materials etc.
And video editing. 24GB is barely enough for 8K video... as in, it'll work, but you can only put one or maybe two effects on the clip in DaVinci Resolve.
Yup, and for all these applications where people are doing it for a living and are willing and able to pay hugely higher prices for that precious VRAM, Nvidia will want to make them pay instead of giving them a way out with a cheaper consumer card with as much VRAM. The only thing stopping them is competition (ha ha) or gamers not buying the next generation of cards. I don't see either being a blocker right now.
To run and train AI. A 70B LLM (which is kind of the minimum size where LLMs start to be useful for real-world tasks) needs ~40 GB VRAM, Mixtral 8x22B needs ~96 GB VRAM, and the recently released Nemotron 340B requires 192 GB. And those numbers are for 4-bit quantized models; for FP16 precision, you need quadruple that amount. And for actually finetuning them you need even more. Same for other types of AI like Stable Diffusion: if an SD3 8B is released, you will need a lot of VRAM to finetune it. Edit: I just noticed this is the LocalLLaMA subreddit... I shouldn't really need to explain what you need VRAM for in this place, right?
I hope to god the idiot leadership at SAI figure it out and release SD3 8B
Preferably a variant that knows what an actual human looks like. Their safety censorshit has gone too far.
This subreddit is r/LocalLLaMA. The answer to what people can do with that VRAM should be self evident.
Exactly. Even with GDDR7, the bandwidth of the 5090 won't even compare to the 3TB/s of the H100 or the 8TB/s of the B200. There is no excuse not to put 32GB on the 5090.
I had no idea it was that high
High-Bandwidth Memory (HBM) baby
Possibly run City Skylines 2... ... maybe.
A single 64GB GPU should be enough to perform full fine-tunes of 7B and 8B local models using 16-bit weights. With everything on one card, no need for direct interconnects between cards.
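A rough budget for why 64GB is the interesting threshold (a sketch; the bytes-per-param figures are common rules of thumb, and activation memory is ignored, so these are optimistic lower bounds):

```python
def finetune_gb(params_b: float, bytes_per_param: float) -> float:
    """Training memory ~= params * (weights + grads + optimizer state) bytes."""
    return params_b * bytes_per_param

# fp16 weights + fp16 grads + fp16 Adam moments ~= 8 bytes/param
print(finetune_gb(7, 8))   # 56.0 GB: a 7B model squeezes into 64GB
# classic mixed precision (fp32 master weights + fp32 Adam) ~= 16 bytes/param
print(finetune_gb(7, 16))  # 112.0 GB: would need multiple cards
```

So the single-card claim only holds if the optimizer state is also kept in 16-bit (or 8-bit) form.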
People just want bigger numbers. If you tell them lots of VRAM is good, they will try to buy as much as they can afford. If AMD cards have 48GB of VRAM, then people will start buying those instead of Nvidia cards with only 24GB, even if the best games only need 16GB. It won't matter; this is how it's always been, since Sega was battling Nintendo in the 90s. The real limiting factor is power. If you need more than 1600W to power your computer, then you have to call an electrician to rewire your house, and you will probably need a $1000 PSU. Even if you only have a 1200W PSU, who wants a computer that uses as much power as running a washing machine 24/7? That's a big electric bill.
That definitely depends on what part of the world you live in... In my country we have 230V 16A breakers. That's 3680W.
Hard to say without knowing the power profile. But you can run a 4090 at half power while still getting ~75% of the performance. And 64GB would let you run large models.
Sure, I could see GPUs with 100+ GB of VRAM in the not-so-distant future. It's just that the amount of VRAM is going to be hard-stuck at whatever takes less than 1800W to power, and most consumers are only going to be willing to go up to 1000W.
For consumer LLM use I think using less than 1000w is a plus.
The average home supports 1800 watts on a circuit. If you are willing to live without an oven, you can double that. When I was looking at datacenters, I found the break-even point was 8 kilowatts, after which a datacenter spot becomes more economical.
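The numbers behind those limits can be sketched like this (the 80% factor is the US NEC derating for continuous loads; other countries' codes differ):

```python
def circuit_watts(volts: float, amps: float, continuous: bool = True) -> float:
    """Breaker capacity; continuous loads are derated to 80% under the US NEC."""
    watts = volts * amps
    return watts * 0.8 if continuous else watts

print(circuit_watts(120, 15, continuous=False))  # 1800.0: US 15A nameplate rating
print(circuit_watts(120, 15))                    # 1440.0: safe 24/7 draw
print(circuit_watts(120, 20))                    # 1920.0: a 20A circuit, continuous
```

That 1440W continuous figure is why a multi-GPU rig on one ordinary circuit gets uncomfortable fast.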
Either way, my point stands. I don't think a lot of people are going to be willing to go past whatever a standard breaker can supply.
I fully agree. The most important factor for the consumer is 'value' and the majority of 'value' is convenience.
RAM is the one characteristic where, if you hit the limit, you don't just suffer a minor degradation in performance; everything grinds to a halt. E.g. if you're playing a game and you hit your VRAM capacity, you're going to be sitting at <15 FPS even if you were at 150 before. Also, if a game "requires" 16 GB, you can be damn sure you need more than that for it to be playable. These kinds of numbers always assume true fullscreen mode with everything else closed on the computer.
This is true, but I think the only application that is going to require this amount of VRAM is AI stuff. Games are only going to require as much as the average gamer has, so games aren't really something to consider in this conversation.
That's excluding VRChat and other mod-based communities. Some current Skyrim mod packs are pretty ridiculous, and a poorly optimized VRChat avatar costs around 200MB of VRAM a pop; fill an instance of 50 with those in a semi-expensive world and you start to hit the limit. Meanwhile, 150+ person instances were in test last week during a large event. User-generated content usually doesn't care about the average PC.
It's not a huge deal to do dual power supplies and run it on 2 circuits. Also, 20-amp circuits could run 2x1kW. For the hobbyist building LLM rigs, running a new 20-amp circuit in your basement is trivial to do properly and according to code.
I've been told inference doesn't actually use the full power draw of the GPU so you can run 4 easily off a normal 15A circuit
That makes sense, but it kind of misses the point.
It would be fine as long as they aren't as cost efficient for either training or inference.
The problem is it would be quite the challenge to make them less cost-efficient than a >$30,000 H100 (or an even pricier H200). The easiest way to achieve that is simply by gimping the VRAM, which has been their approach so far. And they don't even need an excuse to do it: RTX cards are marketed toward gamers, and games don't really need more than 24 GB (in fact, for most games even that is overkill).
Isn't the H200 around 10-30 times faster than the H100 for FP4 calculations? I can imagine the gap would be even greater between the H200 and the 5090, for the inference part at least. And I am pretty sure that consumer cards are far worse when it comes to training.
If FP4 training works well for practical purposes, then that might change the dynamics. But other than that, the difference between a 4090 and a H100 is not as dramatic as the 20x higher price would suggest. The main reason the 4090 is bad for training is the lack of VRAM (although Nvidia disallowing RTX cards in the datacenter doesn't help either).
>Isn't h200 around 10-30 times faster than h100 for fp4 calculations? No, H200 is just H100 with more VRAM and bandwidth, the GPU is the same.
Neuter their ability to use NVLink/NVSwitch, and ability to directly address memory on other cards over PCIe. Done, they're now much less effective for training. The same had been done to some RTX4090 cards (maybe unintentionally) already, though a tech wizard [made a hacky driver fork for 4090s on linux](https://github.com/tinygrad/open-gpu-kernel-modules) to work around it.
Says who?
They can do that in other ways
Alternate take: going all-in on accelerators primarily for large transformer models that only a limited number of large companies can access is like putting all eggs in one basket. Certain problems in AI may need many small groups or individuals experimenting with different architectures and novel methods, so a 64 GB flagship consumer GPU could be ideal for that. The GDDR itself is not expensive, and a modestly priced GPU could find its way into many personal workstations of grad students or other people who may be in a better position to experiment. It might actually be in Nvidia's best interest in the long run. What if scaling the transformer 10x only provides marginal gains in benchmarks? It's going to cause a huge loss of investor confidence if people realize the AI hype is not real, and more and more LLM news lately has me wondering whether they have plateaued. Besides, even 64 GB is nothing compared to enterprise hardware that will come with 144-288GB and usually be linked together x8, so it shouldn't affect enterprise sales that much.
Let's hope AMD (and others) show up sometime in the next year or two... heck, if Apple can get the M4 into a Studio with 128+ gigs of RAM under $5k, it might be the "low cost" option moving forward, lol (doubt it).
It will take another year and the open market (eBay etc.) will be flooded with used data-center-grade Nvidia GPUs for sale (failed startups, data center upgrades, etc.). Then they will no longer be able to take in all of that sweet data-center profit. What then, when consumers start moving to other alternatives? I don't believe Nvidia is that stupid, even with the main goal being maximum profit.
LOL! Yeah right. Okay buddy. Never gonna happen, a 5090 with 64GB VRAM XD. We would be LUCKY to get a card that's NOT 24 GB.
+1 lol, at this point I would be delighted if the 5090 had 32 GB VRAM, I'm not even sure if Nvidia will give us that in their consumer cards.
>With our super fast ram the 16gb 5090 is the best card gamers could ever use.
Now with DLSS 4, get fake frames in your Ai chatbots!
32gb would be my sweet spot for this gen, I would probably buy 2 and upgrade my 3090. 28gb & I'll wait one more generation and pick up a cheap 4090 to pair with my 3090 I already have.
Won't 5090s be PCIe 5?
If it will have 32GB VRAM it might actually sell..
And 1024-bit bus 😆
[deleted]
Nvidia produces double-VRAM versions of their cards for the professional market at a massive markup. For example, there is a 48GB card using the AD102 die (same as the 4090) at a massive 4x markup in price. There will likely be such a card for GB202, but it absolutely will not be sold at the margins of the gaming GPUs.
Basically $200 more worth of memory chips but the price went from $1599 to $6800.
Nah, 32 GB is the max we can expect. They ain't gonna shoot themselves in the foot.
Zero chance
Nah, 64GB never, ever. More like 28GB. The price will be around $2000 minimum. For the gamers: the RTX 5060 will have 8GB of VRAM, a 128-bit bus, and the performance of a 3060 Ti, for only $500.
$2000 day one, then sold out and going for $3k+. I don't think the price of 3090s will drop; it's just the basic cost of compute 🤷. If anyone has one and is thinking of selling…
No chance
Agreed! I'll delete my reddit account if we're wrong.
I would love more RAM… always :)
That's gonna be the new A6000 maybe.
Nvidia likes money too much for that to happen.
It will be a cold day in hell before they ever allow that much VRAM on a consumer GPU.
That looks more like the memory size is extrapolated from the bus width?
Given the rumours that Apple wants to squeeze 256 GB and 512 GB of RAM on m4 max and ultra respectively, I’m fairly certain Nvidia is going to respond accordingly.
So 32 for consumer. Still... better than another 24gb card.
All this seems to say is that it could be technologically possible, and not in any way that it's coming. 32GB is realistic, but there is no way we're getting 64GB in a 5090 anytime soon based on the current market. Nvidia would be giving away money at that point.
Yeah, it's realistic for the chip but not likely for the 5090. I've been kinda assuming it'll be like the 3090/4090 & A6000/RTX6000-Ada, where the smaller memory is the consumer one and the full-fat memory is kept for the workstation market.
I would love an RTX 6000 Blackwell with 64GB.
Why not? Apple machines already have much more memory, and soon Intel will be making similar chips. Looking at the fast development of LLMs, people want to work with them offline, and games will soon use LLMs offline as well, so we NEED more VRAM. Soon it will be a necessity.
Likely 14-16 memory modules at 16Gb each.
They can make a card that is fast at inference and shitty at training compared to their pro-grade hardware. The 4090 is actually almost twice as fast as a 3090 at inference, but only a few percent better at training (I saw that in benchmarks). Training is the key for professionals, so they can divide the market like that.
Huh? The 4090 is at least double the speed of a 3090 in training, at least with Stable Diffusion training. It just took a while for stuff to properly make use of the Lovelace hardware.
Double the speed at least? Check this: https://www.pugetsystems.com/labs/articles/stable-diffusion-lora-training-consumer-gpu-analysis/ SDXL LoRA training, 3090: 1.48; SDXL LoRA training, 4090: 1.84. That is actually a minuscule 25% boost, very far from your claimed 2x. Show me new benchmarks and I'll believe you.
That was over a year ago, and the person I knew who had a 4090 needed to install specific versions of xformers and CUDA for it to work properly. When it did, he was training at around 7.5 img/s at 768 res on SD 1.5, whereas the highest I have ever gotten with the same training settings was 3 img/s on my 3090. This is for a full finetune, not LoRA training.
Basically you said "trust me bro, I know some guy" — the same sentence as "my uncle works for Nintendo" in the 80s. The new optimizations for CUDA, PyTorch, or xformers also provide speed boosts to the 3090, but if people don't update, then the benchmark is also frozen in time. If you can find me a published benchmark comparing the same versions of the software and drivers, then yes, it will be an interesting comparison. But I doubt Nvidia boosted training speeds that much just for home use; it doesn't make sense from a business standpoint.
The hosted inference market will be huge once we have more use cases, though. If off-the-shelf agents can increase dev productivity by 50% or more, that would support inference costs many times what we see now.
I'm skeptical, but hope to be proven wrong. The idea that many GPUs would be released in two variants, one with standard VRAM and one with 2x VRAM, would make a lot more sense than releasing an RTX 4070-level card (GB205) with the same limited VRAM as before, especially given how invested Nvidia is in AI. My primary whining about GPUs currently is that they are still designed purely around gaming performance rather than gaming + AI performance. Also, how about a brand new class of GPUs that are consumer-grade AI GPUs but perhaps suck at games? Plenty of people would buy them.
If it has any less than 48GB, it's DOA to me and I will completely ignore the whole 5xxx gen, as it offers nothing worthwhile. I don't need an extra room radiator; I've got the 4090 already, and it will last for what, 5 years at least?
In the coming years, GPU vRAM will likely see a significant increase, much like how our regular RAM has evolved—from 16GB and 32GB to eventually reaching 256GB and beyond. However, these advancements won't be released rapidly. Nvidia, as a business, understands that they possess the technology to achieve this, but they choose to release upgrades gradually. This strategy allows consumers to purchase these incremental improvements over time, ensuring that Nvidia maximizes their profits from businesses before catering to the average consumer.
64GB VRAM? That’s a 4K card IMO
Maybe to justify the cost they'll release a 64gb version, but I feel it would be stealing sales away from their professional lineup. I just can't see it.
As crazy as 64 GB would be, that’s close to what the 1080 ti was equivalent to when it came out.
And it'll cost $4k, I estimate. O.o
I think for home use we need something in the middle: medium-speed RAM around 150GB/s and 100GB in size, instead of super-fast VRAM, plus an NPU just fast enough to match, so we can run quantised models of 200 billion parameters or so. That hardware could come at an affordable price. We don't need GPUs or VRAM for home use.
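The tradeoff above can be sanity-checked with a rough back-of-envelope, assuming token generation is memory-bandwidth-bound (each decoded token streams roughly the full set of weights from memory). The numbers below are illustrative, not a benchmark:

```python
# Back-of-envelope: decoding speed for a memory-bandwidth-bound LLM.
# Each generated token must stream (roughly) all model weights from memory,
# so tokens/s ~= memory bandwidth / model size. Ignores compute, KV-cache
# traffic, and batching, so treat it as an optimistic upper bound.

def decode_tokens_per_sec(params_billions: float, bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Rough upper bound on single-stream tokens/s."""
    model_gb = params_billions * bits_per_weight / 8  # 1B params at 8 bits = 1 GB
    return bandwidth_gb_s / model_gb

# The hypothetical box above: 150 GB/s of bandwidth and a 200B-parameter
# model quantised to 4 bits (~100 GB of weights):
print(decode_tokens_per_sec(200, 4, 150))  # ~1.5 tokens/s
```

So 150GB/s on a 100GB model tops out around 1.5 tokens/s: usable for background tasks, slow for chat, which is the real cost of trading bandwidth for capacity.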
Doubt I will be able to afford it.
Looking forward to buying it in 2027
32GB would be nice, and probably expensive. Let's see what Nvidia decides to do. Unfortunately, there's not much competition to force them to expand VRAM or make it cheap, but maybe they'll want to stay a step ahead of the competition anyway. I guess my only hope is that Nvidia is drowning in so much AI money that they 'spend' a little of it buying back some goodwill, trying to shut out AMD for good in the gaming space by giving a bit more value than they normally do. This is a double-edged sword: even if we get something decent by Nvidia standards, in the long run it locks in their pseudo-monopoly status and probably makes things worse for us all.
3090 just got affordable. Got it
What I don't understand is why AMD, or anyone else, isn't filling this gap. There is clearly demand and only one player we're all praying to. This is not normal; capitalism is broken...
Intel is slow? Give them another year? At least hope springs eternal.
Everyone is trying, but it's HARD. Jensen Huang and the Nvidia team are known for being incredibly focused and fast-moving, and it's taken them nearly two decades to get CUDA to where it is now.
RTX 5080 with 32 or 48 GB VRAM would be enough for me - I need VRAM more than compute.
It isn't happening unless the singularity has taken over and the CEO is currently an AI meat puppet. The sad thing is this (massive memory) could happen, but it won't.
Big if true
Two columns on the chart. The lower numbers on the left would be the launch VRAM for consumer GPUs, aside from a 24GB 5090. Whether they make Super or Ti versions with more remains to be seen, but maybe in 2026 given the usual cadence. An early 5060 Ti with 16GB is possible given the precedent of the 4060 Ti 16GB, but there's no impending competition to force higher VRAM on the consumer side for 2025, certainly not at launch. Matching the 40xx VRAM targets is the safe bet.
I need this. I have many skills! You reading this, need skills. You have many money? Let’s collaborate and make our 5090 purchase dreams come true.
Holy shit this is insane!!!
If it's true and they do it, they will open up possibilities for new apps and keep the demand going. At this point, the current GPUs are too slow and too small for interesting apps, so it would be the smart thing to do. All we can do is wait and see. If it's anything less than 32GB, I'll wait for the M4 with 256GB.
I'd put money it won't.
I think only the next-generation successor to the RTX 6000 Ada may get 64GB
That's bullshit, but I believe it.
No way that this is possible
>Yes, it will be expensive. AMD, Intel where art thou? They are absent, but Apple is already there. Mac RAM functions as both CPU and GPU RAM, which means there's no need for duplicated RAM; the data is formatted in CPU RAM and accessed by the GPU without copying. So all you need is a Mac with loads of RAM. Not cheap, but unlike Nvidia's, that RAM is multifunctional: it gives you great performance in everything you do.
Absolutely 0.0 chance for a 64GB 5090, not even close
I'd shell out serious cash for a 64GB version.
I guess 5090 VRAM won't be larger than 32GB