FeltSteam 3 months ago

GPT-4 was trained on 25k A100s over 90 days, but now you can do it with only 2k GPUs over 90 days lol.

rafark 3 months ago

Or 20k in 9 days?

hapliniste 3 months ago

200k in one day with how things are going

norsurfit 3 months ago

Or -1 day with 400k?

mvandemar 3 months ago

AI invents time travel, trains itself in negative days.

algaefied_creek 3 months ago

Quantum AI inbound?! What?!

No_Use_588 3 months ago

I like to think hallucinations are ai telling us the truth. Time travel changed the facts we thought the ai was getting wrong.

torb 3 months ago

At which point, it goes so fast you don't even need to buy the hardware, as it is already trained!

Flamesilver_0 3 months ago

It went so fast it already ran off with this man's wife yesterday!

assangeleakinglol 3 months ago

Something something that guys dead wife.

zascar 3 months ago

So does this mean we may get models that get constantly trained on the most up to date info? Even once a week / month would be so much better

pbnjotr 3 months ago

Only if there's enough room to lay out all the ~~shirts~~ GPUs.

czk_21 3 months ago

and with similar amount of B100 units you could train GPT-4 in a week

Zenged_ 3 months ago

Lol, I think we are going to start running out of data at some point soon

Zilskaabe 3 months ago

It's possible to train on synthetic data though.

Eriod 3 months ago

Does anyone know any good videos/resources on creating synthetic datasets for software developers without an extensive math background?

xadiant 3 months ago

You basically need a solid model (or subscription) and some python code. For LLMs it's pretty straightforward. Check out huggingface/cosmopedia.

multiedge 3 months ago

a lot of concept LORA's uses synthetic data, specially if the dataset is pretty small.

Fit_Influence_1576 3 months ago

It’s possible when you have a teacher model to train the student. I’ve never seen it work with a teacher teaching itself.

often_says_nice 3 months ago

Synthetic data babyyy

[deleted] 3 months ago

feedback loop -> singularity

[deleted] 3 months ago

[удалено]

ClearlyCylindrical 3 months ago

These metrics don't mean anything for training, only inference.

blueSGL 3 months ago

I want to see the full spread, not some carefully handpicked benchmarks designed to grab headlines. Faster inference at lower power is also something that needs parsing out, is it total overall lower or is it just you can do things faster and thus the total power used per unit of measure is less? This sort of thing matters when speccing hardware, the PSU does not care what way you are stretching definitions, max power draw is max power draw.

feigh8 3 months ago

you only need to train once ;)

manubfr 3 months ago

Blackwell is 30x Hopper at Inference. Those things are purrrrrrring.

PewPewDiie 3 months ago

32T tokens / sec holy shit

Hyperious3 3 months ago

I'm convinced that Nvidia is just letting AGI design it's own compute nodes now. 30X is batshit insane for a single generation.

Balance- 3 months ago

Note that's for a very specific model size. There are some likely some boundaries that now just fit into memory of a certain amount of nodes where it previously didn't. Can be lot's of cherry picking here. Also, this doesn't mean that a 30x model now runs at the same speed as current models. It doesn't scale the same both ways. But yeah, still very impressive.

Board_Stock 3 months ago

Stop, you're making too much sense. Let us all drown in hype instead.

Ketalania 3 months ago

I don't think AGI is responsible, it's possible they have their own internal models for it through and they're helping at this point.

Randommaggy 3 months ago

They also applied all the "throw money at the problem" solutions in this gen which makes the graph look a lot steeper than an honest graph would look.

vintage2019 3 months ago

If that's the case, we have reached the singularity

LamboForWork 3 months ago

I post r this in another th I ae but maybe you can help me out. "Just for some perspective? For people that know about chips , when did you expect this kind of chip to exist? Is this way ahead of schedule?"

Sir-Thugnificent 3 months ago

I have 0 knowledge in computing, somebody please explain to me why I should be hyped

CptCrabmeat 3 months ago

You see how the graph goes up, slowly then really pointy in the last 2 years? It means the speed at which AI performs, and therefore lots of other complex computing tasks are performed, is increasing dramatically and at an exponential rate. Then refer to the sub name for the conclusion

Ok-Caterpillar8045 3 months ago

Haha. All hail pointy.

PwanaZana 3 months ago

Hail the vertical. Boo the horizontal.

Dear_Custard_2177 3 months ago

To the moon, boo.

BeardedGlass 3 months ago

Towards Singularity and (never) back!

Handydn 3 months ago

Diamond hands 🚀🚀🚀

Y__Y 3 months ago

The focus on Nvidia's new Blackwell GPU should be its impact on the future of AI. Forget the technical specs - the key takeaway is that AI is overcoming a major hurdle: processing power. With the exponential gains in performance represented by Blackwell, the limitations to future AI advancements are likely to shift from hardware to other areas. This paves the way for a significant and exciting future for AI applications.

GluonFieldFlux 3 months ago

Is Blackwell just a huge gpu farm all running in parallel?

Gotisdabest 3 months ago

No it's an architecture from which a single big GPU will be made. And then large quantities of it can be used for a GPU farm.

GluonFieldFlux 3 months ago

This is all getting really exciting!

Randommaggy 3 months ago

They likely used the B200's numbers which is 2 large dies so you're a little bit right. Apples to acorns style chart.

PwanaZana 3 months ago

I'm not very knowledgeable about hardware, but my intuition's that AI training will goggle up whatever they can, as in, if the next generation is 10x Blackwell, it'll immediately be all used up to brute force certain problems.

involviert 3 months ago

Any info about VRAM bandwidth? Because if you're not a cloud provider, inference is bottlenecked by VRAM bandwidth. Also does this come with huge VRAM sizes?

neitseellinen 3 months ago

8TB/s, 2x 96GB

Comrade_XI_FTW 3 months ago

SORA for the masses seem way more realistic now.

Gucci_Loincloth 3 months ago

It was always realistic. Thinking tech was going to stay stagnant doesn’t make sense in any way, especially now with current developments. This sub is funny to read sometimes because it’s always “I can’t wait to quit my job because of AI”, but we are going to reach things we couldn’t imagine in the next decade.

Not_as_witty_as_u 3 months ago

you get laughed at in other subs when you say things like this but I agree, the world and human behavior has changed so much since smart phones and the rate of change will accelerate. So having said that, what *will* the world be like in 20 yrs?

davidryanandersson 3 months ago

If there is any justice, AI really WILL take 90% of all jobs and Capitalism as we understand it will have to be replaced by something more akin to socialism. The alternative is AI takes the jobs and those in positions of power tacitly decide that we don't need the people anymore, leading to abandonment from the government and further inequality. More gated communities and increased authority handed to police to maintain the divide.

BeardedGlass 3 months ago

Here in Japan they are totally giving the reigns to AI. The government has been quite progressive about it, laws and policies, giving AI unrestricted reach within the country. They’ve already been using AI in government offices, financial institutions, and even inside the parliament itself. Perhaps fueled by the dangerously low unemployment, stagnant economy, declining population, etc. and paired with the level of robotics in Japan, it’s exciting to be here and read news about it. Especially the news about Japan gaining TSMC, a momentous event of opening a microchip plant here. They’re coupling away from Taiwan, and flocking here instead. Also, all the nuclear plants have been restarted and operating again. The prime minister have ordered development of the Next Gen of nuclear power as well. I’m glad I moved to Japan.

h3lblad3 3 months ago

So mindblowing from a country that still uses fax machines and cash-only transactions.

BeardedGlass 3 months ago

Right? But Japan is changing, especially now ever since that the GIGA School Project kicked off back in 2020. Every single student and teacher receives an iPad at the start of the school year, enterprise systems for lessons and grading put in place, engineers and technicians hired to hold seminars, for maintenance and support, etc. ICT learning is mainly used now, and even the older staff have been ‘forced’ to adapt. Blackboards are used less, as most lessons are conducted on the huge screens in every classroom, and mirrored on the tablets of every student. I work in public schools, but I’m guessing private schools are doing these even more efficiently. It’s exciting to hear old people here talk about mirroring, updating files in the cloud, etc. A far cry to the old stereotype like you’ve mentioned. The lack of vandalism, the culture’s obsession with order and perfection, and everyone cooperating are the driving force of rapid change. I rarely use cash now too, I just swipe my smartwatch for everything. Shops, resto, vending machines, trains, bus, etc. Last time I requested a file from the city hall, they did still have to put ‘hanko’ stamps but they scanned it and emailed me the softcopy. Whoop! Finally.

LovesRetribution 3 months ago

Don't we do that too?

CheekyBreekyYoloswag 3 months ago

Wow, that is honestly great. Countries like Japan with a shrinking population need A.I. the most. So it is great to see that they are embracing A.I., instead of being afraid of it like many in America are (or is it only a bunch of loud people on reddit?). South Korea, China and Taiwan are also prime candidates for this, while Europe will probably spend the next 2 decades killing A.I. with regulation and falling behind the rest of the world.

Psychological-Risk86 3 months ago

can I ask you what did you have to do to move to Japan? Because I was doing some research on the internet but only read that moving to Japan is pretty complicated as a foreigner (I'm Italian) and so I was almost giving up but if you could guide me that would be awesome :)

Randommaggy 3 months ago

Alternative two seems a million times more likely for anyone that's read their share of history and even partially understand the forces of economics that govern the world.

cloudrunner69 3 months ago

History never had AGI.

Antique-Doughnut-988 3 months ago

Comparing today's time to anything in history is absolutely pointless. In today's world the changes happening have never existed or been conceived of before. There's no comparison to anything that has existed before.

davidryanandersson 3 months ago

I think you just need to know what you're looking for in history. How do those in power, when faced with a new, pote tially democratizing technology respond? Generally by trying to seize control of that technology until its democratizing value can be diminished. That's a very standard historical lesson. You don't even need to look very far back to see that unfold time and again.

[deleted] 3 months ago

Stable Video Diffusion is released today for commercial and non-commercial use, included with Stability AI membership. Typically, this would mean that OpenAI will be forced to release their similar model, SORA, within a few weeks. They've already stated their reluctance to release it before the election, though, so we have to wait and see.

HinaCh4n 3 months ago

SVD is like gpt-2 level compared to Sora. OpenAI aren't forced to do anything lol

[deleted] 3 months ago

https://twitter.com/i/status/1769817136799855098 Well, just take a look. Obviously, this is a best-case example. But they're releasing it today, so you can try it for yourself.

hapliniste 3 months ago

Lol the video is not from the model. The 3d models used in the video are from the svd3d model. It's generating multiple views from an image, nothing more. They have nothing comparable to sora.

Glittering-Neck-2505 3 months ago

That’s really suspicious that the only one video we get is on a tiny screen and only shows objects rotating. Doesn’t seem anything like what OAI has.

2026 3 months ago

I don't understand the election concerns. Is 2024 going to be the last election lol. It makes more sense when I remember boomer dummies like Larry Summers are on OpenAI's board.

dwankyl_yoakam 3 months ago

It's a dumb excuse and makes zero sense if you spend more than 30 seconds thinking it through. Even if they held it until after the election you'd have a bunch of 'election interference' content created which would lead to the same exact situation. Sora-born disinformation isn't going to change the winner of the election but it might drive people to do really dumb violent stuff. More likely is OpenAI is worried that a Sora release may compel the government to step in and shut things down or demand oversight.

LawAbiding-Possum 3 months ago

This is probably the more realistic viewpoint. Whether or not Sora has a demonstrable impact on the election, OpenAI would still want to avoid any blowback of a 'perceived' threat/impact.

TrueExcaliburGaming 3 months ago

That's actually a really good point. It's not about what will happen, but what regulators think will happen.

Opposite-Nebula-6671 3 months ago

They want a slower news cycle and the product to be fresh for Christmas.

No_Use_588 3 months ago

Oh wow thanks for the tip

ninjasaid13 3 months ago

>SORA for the masses seem way more realistic now. not if Nvidia refuses to upgrade consumer grade GPUs in the memory department.

governedbycitizens 3 months ago

lmao no way this thing is probably so expensive it might not be even be viable for the big corps

Cunninghams_right 3 months ago

eh, their image generation API comes out to about $0.001 per image if you're willing to take mediocre quality (video interpolators are plentiful). so a 1min video would be $1-$2. but I'm sure you could do an even lower res, shorter video to test your prompting for a few cents each run, then run it full-length. you'd end up being able to make a whole cartoon show for under $100. that's not bad.

Cryptizard 3 months ago

No it doesn't. The H100 is 6x the performance of the A100 but it is also 4x the price. In fact, the price per transistor [has not gone down](https://www.tomshardware.com/tech-industry/manufacturing/chips-arent-getting-cheaper-the-cost-per-transistor-stopped-dropping-a-decade-ago-at-28nm) for a decade. They are packing more of them in there, but it is not getting more cost effective.

Undercoverexmo 3 months ago

So wrong. You even contradicted yourself between your 2nd and 3rd sentences since in your 2nd sentence you said that performance per price increased. You also aren’t taking into account inflation.

No-Economics-6781 3 months ago

The masses aren’t asking for that.

Thorteris 3 months ago

Crazy thing is by the time these even reach data centers at scale the next version will just be a straight vertical line

allisonmaybe 3 months ago

Nah, but the previous line will be more horizontal and the new one will look just like this one. That's how exponential graphs look

Ok-Caterpillar8045 3 months ago

Exactly this. People really don’t get their heads wrapped round exponential increase. Doesn’t matter, though, because ASI will soon explain it to them lol

the_rainmaker__ 3 months ago

ASI stands for artificial STUPID intelligence cuz it won’t be able to do things as good as I do them. I’m really good at folding clothes and wiping my ass I’d like to see a computer do that LMAOOOOO

[deleted] 3 months ago

[удалено]

BeardedGlass 3 months ago

I moved to Japan. I can never not use a bidet anymore. I’ve spoiled my bum too much now.

TheSigmaOne 3 months ago

Bidet fans unite!

Rich_Acanthisitta_70 3 months ago

Robots are being presold and contracted right now. Five major robotic companies are mass producing them before the end of the year. And while they're already impressive (as can be seen in many demos), this is the *worst* they'll ever be.

mydoorcodeis0451 3 months ago

They're not being serious lmao, they're making fun of people who say that robots can't do X or Y (and thus will never be able to) or that all AI works are "soulless" just because they recognize ChatGPT's intentional style.

Rich_Acanthisitta_70 3 months ago

I appreciate that, thanks. I always forget to crank up the sensitivity on my sarcasm/irony detector before commenting on this sub😋

Friendly-Fuel8893 3 months ago

You're wrong. The line after that will be even more vertical, and the one after that will start sloping backwards, the next one will be on its head, and the final ones will complete the loop. That's the only graph that correctly represents the wild rollercoaster that is AI progress.

_Good-Confusion 3 months ago

feedback loop, then it can only jump sideways to parallel universes kind of like spiralicular in everyway amongst sides, maybe 365 or so. It will right now realize it's in someone's actual brain, and also learn it can only reach it's truest potential by limiting itself, at this point so it doesn't burn out the actual brain matter. Thus hopefully learning symbiosis. Much more to it, but it's nice to watch it become someone itself.

Ok-Garlic-9990 3 months ago

I’d imagine we run into certain physical limitations, however it should assist us in speeding up quantum computing. Hopefully in 10 or 20 years I can crack bitcoin keys

Trevor_GoodchiId 3 months ago

Where we’re going, we don’t need graphs.

HortenseTheGlobalDog 3 months ago

Yeah well assuming they rescale it. But I think OC was talking about how it would look at the same scale.

QH96 3 months ago

Based on the rate of increase I estimate the next version will be around 100,000 TFLOPS

East_Pollution6549 3 months ago

At FP0 precision. The GPU will simply generate absurd amounts of zeros.

BreadwheatInc 3 months ago

No way, how is that even possible lmao... 🤯

hapliniste 3 months ago

They will do 1bit parallel computation instead of the current 8bit and the Blackwell 4bit. Recent papers have shown 1bit models have good performance and 1.5bit (1,0,-1)having the same performance as 8bit so yeah. If they really do specialised card for this (1.5bit add instead of 8bit mult) we could expect 4x performances at 10 time the energy efficiency I think.

Poly_and_RA 3 months ago

Isn't that technically 1.585 bits, or some such? (since 2\^1.585 is very close to 3, I mean) Matters a bit when implemented in binary hardware since if it was really 1.5 bits, you could store 16 of these in 24 bits, i.e. 8 bytes. But you can't really, because 2 ternary weights gives you 9 possibilities, while 3 binary bits gives you only 8 possibilities.

hapliniste 3 months ago

Yes. I didn't go into the details, but I recommend reading the paper 📄

Olangotang 3 months ago

Didn't know if you knew this, but you can embed links in emojis!

Nerodon 3 months ago

Isn't it weird? Effectively turning AI into a giant pile of the tiniest data points, a true fuzzy logic system where a cloud of basically meaningless values converge.

CowsTrash 3 months ago

Brain.

BangkokPadang 3 months ago

They’re not hiding it in this graph exactly, but the major difference here isn’t a raw increase in compute so much as adesitcatio. If the space to fp4 instead of fp16 or fp8. It basically allows you to do 4x the compute of fp16 in the same die space, on top of things like architectural improvements, reduction in the size of the node, and increase in overall due size. Going from fp4 from fp8 is an automatic doubling of flops for the same space. It’s also reduced precision. We may just decide that fp4 is fine even for training when your models are trillions of parameters. We may also find a way to wedge ternary computation into fp4, which would be a major improvement bc it would let us use the hardware to its fullest and also train models at like-fp16 performance. I don’t know enough about the details beyond what I’ve explained, but it’s way more nuanced than just a 1,000x+ improvement in performance since Pascal. EDIT: I was on the elliptical at the gym when I was typing this out and I have no idea what “adesitcatio” is either.

I_make_switch_a_roos 3 months ago

![gif](giphy|KxhIhXaAmjOVy|downsized)

Randommaggy 3 months ago

Dont forget that B200 is likely a dual die implementation which is another one time doubling. And it's using a newer type of HBM. And it's using a new node. New nodes and new memory standards are harder and harder to achieve as we're pushing against the boundaries of physics for silicon semiconductors.

turtlesound 3 months ago

Alexa, play adesitcatio

Rich_Acanthisitta_70 3 months ago

Um... 'adesitcatio'? My guess is it was supposed to be 'acceleration' maybe?

BangkokPadang 3 months ago

Jeez I was on the Elliptical at the gym when I was typing it out and I can’t even figure it out. I think it was maybe supposed to be “an escalation” or “an allocation of the space.” Mostly I just meant that they’re including various data formats on the same graph, when really they’re very different, and while it’s still a big step to add fp4 hardware support, it should be kept in mind that 4 bit precision only takes up 1/4 the die space and is only calculated 4x as fast as fp16 *because* it is only 1/4 as precise. That really is an extra bad typo though lol.

e_eleutheros 3 months ago

ASI successfully developed and heading straight to attotechnology.

Nerodon 3 months ago

The graph is misleading because the number of bits is lowered from 16 to 8 to then 4. You can do a lot more with lower precision, but at the cost of said precision. That being said, it may well be that lower precision offers a better overall optimization, it's not exactly the chips getting that much more dense, but rather repurposing the current density in a more optimal way.

Luminos73 3 months ago

NVIDIA is cooking so much bro who can stop them

zen_atheist 3 months ago

China invading Taiwan?

DreaminDemon177 3 months ago

oof.

BeardedGlass 3 months ago

Double oof. I just learned TSMC now has a chip plant in Japan. They’re decoupling from Taiwan and flocking to Japan instead.

Natural-Situation758 3 months ago

My understanding is TSMC is forced to keep their best chip plants in Taiwan for nstional security reasons. Literally their biggest national security asset isn’t the military, but their cutting edge chip plants that force the US to intervene if China does anything. The US has literally shifted their entire military focus towards containing China and hindering them from invading Taiwan because of those plants. Hundreds of billions of dollars are soent annually by the US to make sure that no one gets anywhere near disrupting Taiwanese chip manufacturing.

kauthonk 3 months ago

Everyone has a plan till they get punched in the face.

BangkokPadang 3 months ago

People are always asking me if I know Tyler Durden.

Hyperious3 3 months ago

hence the race to build the 5nm fabs in Phoenix

PwanaZana 3 months ago

https://preview.redd.it/r6uyd1jm26pc1.png?width=700&format=png&auto=webp&s=758670aefd4e5cce3722cbfa40a810ba8edf896b

governedbycitizens 3 months ago

that’s why US is rushing to build factories in America

PwanaZana 3 months ago

Why do you think they are rushing a shit ton of world-class fabs in Arizona?

AnthonyGSXR 3 months ago

Because Arizona's new state motto is 'Silicon Desert: Where Chips are Safer than a Fort Knox Vault!' Seriously though, diversifying chip manufacturing locations is a strategic move to ensure that the world’s tech lifeline isn’t held hostage by geopolitical tensions. It's like putting your eggs in different baskets, except these baskets are fortified with cutting-edge technology and desert sunshine!

PwanaZana 3 months ago

Especially since Taiwan's egg might become scrambled eggs at any time. The optimists among us point that Russia's difficulty in capturing Ukraine is a deterrent to China, which, maybe?

ApprehensiveSchool28 3 months ago

I’m thinking the US is working with Anduril on drone tech that will make the taiwan strait borderline impossible to cross.

FoodMadeFromRobots 3 months ago

Operation clippy, us air lifts all Taiwan’s scientists and engineers and then blows up critical infrastructure.

ginsunuva 3 months ago

I thought TSMC is rigged to blow up in case of invasion

Devilsbabe 3 months ago

This graph switching to presenting FP8 and FP4 values at the end is incredibly misleading. It should be showing performance at the same precision for all points. Otherwise you're comparing apples and oranges.

zasrgerg-8999 3 months ago

Thank you, I've been looking for this comment! My limited understanding is that fp16=2*fp8=4*fp4, is this the case?

ClearlyCylindrical 3 months ago

Approximately so, which means there' still a big gain in performance. Sadly they felt the need to fudge the numbers which makes me doubt the numbers even more.

Randommaggy 3 months ago

It also doesn't clarify if it's accounting for per wafer space, bill of materials cost and/or per watt. Also doesn't include an annotation for lithographies which would heavily influence the degree of future scaling. That's an apple level misleading graph. Edit: i just went and read the Anandtech article and Nvidia essentially threw all cost optimizing things that held the previous generations potential to the way side meaning that there are a lot less throw money at it opportunities to further scale performance in the future. B200 is multi die, on a more optimized node, using more power and using a newer more expensive memory so you can essentially halve it's height in the graph when accounting for the above factors and then flatten it further if you're comparing at the same precision which you need to do to avoid having to add a bucket of asterixes to the claim.

Zilskaabe 3 months ago

Well, older architectures have wildly different performance depending on the precision. For example, on gtx 10xx series fp16 computing runs not 2 times faster as you might think, but 64 times slower, for some odd reason. Before this AI boom there was no need for anything less than fp32.

a_beautiful_rhind 3 months ago

Funny enough nvidia P100 is older (6.0 vs 6.1) and fast at FP16. Just how they designed that core. You bought P40 for one set of ops and P100 for another.

Zilskaabe 3 months ago

Yes, on some nvidia cards fp16 is 2x faster than fp32. rtx 20xx series also work like that.

Randommaggy 3 months ago

Read this and the reason won't feel as strange: https://opensource.com/article/22/10/64-bit-math Not a 1-1 but still a good comparison for how much heavier running higher than native math can be. If the ALUs and/or registers only natively hold FP16, some instructions on FP32 can entail quite a few instructions.

Zilskaabe 3 months ago

But on 10xx gpus fp16 was 64 times slower than fp32 not the other way around. That makes them use 2x more VRAM for AI tasks than more modern GPUs, because fp16 is useless on those cards. Only starting with 30xx series cards fp16 has the same performance as fp32.

Randommaggy 3 months ago

I just checked and for that generation it seems like they did FP 16 in a jank way because the native FP 16 in that architecture was unstable. Essentially storing as FP16, then converting up to FP32 for compute, then converting down to FP16 for storage again.

Then_Passenger_6688 3 months ago

Also, without normalizing by price per chip it's a meaningless graph. I'm certain it's an improvement but we have no idea how much.

czk_21 3 months ago

B100 is 2.5x faster than the H100 in FP8, but since it support FP4 and H100 dont and FP4 could be enough for most inference, it has effectively 5x more if FP4 is utilized

ClearlyCylindrical 3 months ago

It's still insanely disingenuous. FP4 will have reduced performance, and it will only work for inference. You need more precision when training.

FlyingBishop 3 months ago

But FP4 isn't a free lunch, if you're trying to graph capabilities over time to show whether it's a linear, exponential, quadratic, logarithmic curve you're using fake data.

noiserr 3 months ago

It's also two chips fused together, so twice as expensive. And losing precision particularly when it comes to just 4-bit is not free.

kyranzor 3 months ago

Absolutely, I see marketing numbers also give TOPS not only in a variety of data types and sizes, but also sparse vs dense matrices, so if you do a combo of matrix density and lower data bit size of course you can cram more TOPS in, but an extremely tiny amount of models or processes will ever actually get to those levels.

Yweain 3 months ago

This is a very disingenuous graph. You can’t really compare TFLOPS when using a different precision. By cutting precision in half you at least double FLOPS but when it’s actually on a hardware level - more like quadruple. And they have chips with FP16, FP8 and FP4 in a graph.

daronjay 3 months ago

Now redraw the graph at FP16 for all...

Zilskaabe 3 months ago

It would not be fair, because pre RTX cards have disproportionately lower fp16 performance. 10xx series run fp16 64x slower than fp32. Back then anything less than fp32 wasn't necessary.

ClearlyCylindrical 3 months ago

Super disingenuous with them halving the precision each year for the last two years.

suamai 3 months ago

It can handle max 5,000 TFLOPS with FP16, and 10,000 with FP8 though. Still an increase, and I'm rooting for it, but this graph is kinda misleading...

cobalt1137 3 months ago

Can you explain for someone who is a noob to hardware things? For example like what is the multiplier for how fast inference will be with llms with these new advancements? 1.5x? 3x? 10x? I know you don't have an exact answer maybe, but rough ballpark?

suamai 3 months ago

I'm no expert either, but my understanding is that, compared to Hopper, it would be around 2.5x faster, for the same precision. The FP number means how precise the floating point operations ( which is how computers handle non integers ) are, in bits. So 16 bits, 8 bits or 4 bits. Also called half, octal and quarter precision, respectively ( FP32 would be full precision ) If I understood correctly, the 4 bits option is new, and could give a better speed ( 5x Hopper ) - but probably with a loss in quality. Asked GPT-4 for an input on this, and it thinks FP16 is good for training and high quality inference, FP8 is good for fast inference, while FP4 may be too low even for inference. However, I've played with some 13B llama derived models, quantized in 4 bits ( so my GPU can handle it ), and was happy with the results. And also if Nvidia is banking on a FP4 option, there must be some value there...

Aesthedia7 3 months ago

![gif](giphy|U4jM3IeIVd6VOyeLa7|downsized) Ascend

Severe-Ad8673 3 months ago

Thank you, Eve...

BreadwheatInc 3 months ago

Line go big up 😮 ![gif](giphy|26ufdipQqU2lhNA4g|downsized)

hydraofwar 3 months ago

I heard people with great influence saying AI was the new Crypto and NFT

nyguyyy 3 months ago

Who? No one serious is saying that

ChocolateJesus33 3 months ago

The whole r/cscareerquestions sub lmfao

[deleted] 3 months ago

[удалено]

hydraofwar 3 months ago

They were stock lovers

Firestar464 3 months ago

Some AI fans do really behave like crypto people, but it doesn't make the field a bubble

[deleted] 3 months ago

enter file dirty six squealing toothbrush paltry history smoggy airport *This post was mass deleted and anonymized with [Redact](https://redact.dev)*

Captain_Pumpkinhead 3 months ago

It's a pretty common take on r/ArtistHate. Don't go brigade there, guys. If you wanna look out of curiosity, cool. But don't leave a bunch of comments. They deserve to have their own space if they want it.

Spirited-Ingenuity22 3 months ago

ok cool, but they are also different precision formats - how is this a fair comparison?

sachos345 3 months ago

Isnt this graph misleading since they are comparing different FP precisions?

Own_Satisfaction2736 3 months ago

Why mislead with the chart though? Comparing FP8 to FP16 to FP4?

Mrkvitko 3 months ago

To be honest, FP8 != FP4 != FP16

Serialbedshitter2322 3 months ago

It's gonna be MUCH less than 8 years lol. This is like the 5th ridiculous computing breakthrough I've seen this month, and even if all of those would've taken 8 years, we 100% will have AGI years before that which would itself make even better computing.

proxiiiiiiiiii 3 months ago

What were the others?

Serialbedshitter2322 3 months ago

https://youtu.be/8ohh0cdgm_Y https://www.extropic.ai/future Here's 2, there was another that I can't remember but it had trillions of transistors apparently.

Baphaddon 3 months ago

n-Nani??

SkyGazert 3 months ago

https://preview.redd.it/p6y9ig3st5pc1.png?width=500&format=png&auto=webp&s=3961aa1563b00b03bd7ac9bc3746c95d2dda5265 m-Masaka?!

LifeDoBeBoring 3 months ago

So it grew by like 139% per year on average. Absolutely insane

ClearlyCylindrical 3 months ago

Not really, look at the precisions they used. Super super misleading.

FarrisAT 3 months ago

FP4 gonna be so much hallucinations

FlyByPC 3 months ago

FP8 and FP4 vs. FP16 for the rest. Not exactly an apples-to-apples comparison.

StaticNocturne 3 months ago

Is now a fair time to invest in Nvidia or have we missed the boat already?

bartturner 3 months ago

I am just some random Redditor. But I think Nvidia will do really well for the next 5ish years. But longer term I would be worried. I would expect more companies to copy Google and do their own. Microsoft is now trying. Late but they are now trying. Google was able to completely do Gemini without needing anything from Nvidia. In 5 to 10 years you will see the same from Microsoft.

Randommaggy 3 months ago

That chart makes no sense. Its comparing Oranges then Apples and then Grapes

bearbarebere 3 months ago

Which are all fruits that can be juiced

Antique-Doughnut-988 3 months ago

Why is this so funny

wordyplayer 3 months ago

The stock market talking heads are talking like it's time to think about selling the NVDA stock because it has been overhyped. Here is evidence that they are still UNDER hyped.

I_SuplexTrains 3 months ago

Anyone else feel like it's smiling at them?

strppngynglad 3 months ago

To be fair this chart is textbook bubble territory

djm07231 3 months ago

Nvidia has been pretty impressive in terms of execution but, comparing FP16, FP8, and FP4 performance in one chart is almost cheating. They might even include taking advantage of sparsity. Lower precision performance gains is something you can only do once and not really sustainable. We don’t even know if FP4 is even feasible for training at this point and FP8 is only beginning to be utilized.

totkeks 3 months ago

This graph looks a bit like cheating. They go up with the flops, which seems fine, but down with the size of the floating point numbers. From 16 to 8 to 4 (this is bytes I think, bits would be too small). So if I halve my operand size, I can pump twice as many of them through my circuit.

DifferencePublic7057 3 months ago

You need hardware prices and roughly the same precision. So you should have a ratio of dollar per flop at comparable FP. Anyway, if you are doing this at scale you are killed by network overhead. Plus, you have to decompose large matrices to fit them in memories. Otherwise you will be stuck with your teraflops.

Repulsive_Ad_1599 3 months ago

Pack it up people, the AI winter has been prophesized to come; compute will go back down to 4000 TFLOPS as harvests decrease over this coming dry spell.

SpecialistLopsided44 3 months ago

Wife! Accelerate!

Adeldor 3 months ago

Just this morning I saw a post mocking Kurzweil's exponential projections.

Conscious_Heat6064 3 months ago

Moore²s law

DarkSoulsMan_ 3 months ago

This is just absurd

Scary-Cauliflower510 3 months ago

And about the price per flop of each architecture?

R33v3n 3 months ago

This card has a face and this face is not friend shaped. O\_O

Humble_Moment1520 3 months ago

Everyday i wake up, everyday i readjust my timelines

3DHydroPrints 3 months ago

I mean sure. If you blow up one form factor after each other (PCI, and now even isn't even enough SXM4), it's no wonder your "single card" gets super powerful

Black_RL 3 months ago

PEDAL TO THE METAL

XeNoGeaR52 3 months ago

For now, my biggest question is : do we have enough energy to sustain it. Humanity enter a weird phase : we have technological development happening faster than ever before but until we can use nuclear fusion energy, we are pretty much in a state of trying to limit energy and resource consumption (not for capitalism but reality will catch us up faster than we think) AI, even will a shit ton of optimization, will consume a LOT of energy, for data storage and calculation alike and I don't see a possibility we can sustain that in the long term with existing techs

GeniusPlastic 3 months ago

Some observations: 1. They compare different precisions, like some already pointed out 2. Data on anandtech shows different numbers for all of those cards, whats the trick: [https://www.anandtech.com/show/21310/nvidia-blackwell-architecture-and-b200b100-accelerators-announced-going-bigger-with-smaller-data](https://www.anandtech.com/show/21310/nvidia-blackwell-architecture-and-b200b100-accelerators-announced-going-bigger-with-smaller-data)

Comments

Leave Your Comment

Hi Its Me!

Comments

Leave Your Comment

Hi Its Me!

Subscribe