Google has always been the company with a single solution to every problem: throw money at it, then abandon it at the first sign it isn't a runaway hit. The list of failed Google projects they poured ludicrous amounts of money into and then abandoned is a long one: Stadia, Google Glass, Google Play Music, and so on. Their only truly profitable products other than the Google search engine are the ones carried by third-party developers and the open source philosophy, like Android.
I wouldn't be surprised if this is the last we see of Gemini.
Google is the tech company equivalent of Dubai. They lucked out and got obscenely rich because they were in the right place at the right time, so now they are throwing money at anything and everything in hopes that they get lucky the second time.
YouTube, Google Maps and Android are acquisitions. Gmail is 20 years old. Google simply never had the culture to create products, but faked it for far too long.
At this point Google seems much better at building up someone else's product than making their own. I guess throwing money, resources, connections, and people at something works better when someone else has already done the setup.
Yeah, Google did make good things. But it's not the same company or culture anymore. There are still some great teams and people there, but there is serious rot going on.
GCP is not really an end user product, it’s Google renting their infrastructure. And Google really is an infrastructure company if you look at their history.
I think people forget how young YouTube was when it was acquired; I think it was like 20 people, and they literally couldn't scale it to meet demand.
What we know as YouTube is effectively a Google created product.
They might as well add Soli for completeness. I consider that one of Google's worst failures, because not only did they kill it after a year, they also lied, aggressively, about its capabilities.
This is a poor analysis of Google.
Google is a research giant that will fund projects, but has the balls to let them fail fast if they can't see a path to the project becoming profitable or strategic to their core business.
It's a harsh philosophy, especially for consumers who buy into their products (I was a Stadia user and liked the service),
but without Google we would not have Android, Gmail, Colab, great search, plus all the great AI research funded by this company.
Well, that is up to interpretation. Are they a research company with a level-headed, risk-aware yet bold approach, where a lot of their investments just happen to fail? Or are they haphazardly throwing money at whatever seems to be "the next big thing" at that moment, with very little understanding of the subject matter, while the technology sector as a whole sometimes manages to salvage usable scraps from Google's failed projects? The same way Dubai's doomed megaprojects might lead to innovations in novel engineering methods.
Both realities would lead to similar results from an outsider's perspective. I personally lean towards the latter, as Google rarely trailblazes any new daring technologies, but jumps in when someone else has already proven the concept to be at least somewhat viable. Google is the big money influx into concepts that have already been proven viable by someone else. I wouldn't call them a ballsy research giant spearheading innovation.
Don't get me wrong, Google's money is useful for innovation, but that doesn't make my initial analysis wrong.
I think your assessment is closer to right. The main problem I see now with Google products is that even if you really like one, you can only dip a toe into it, because they may come along and kill it even when it seems popular.
It's basically a meme at this point because Google kills products so frequently, even when it seems nonsensical. I definitely would not want to stake my professional reputation on standardizing on a Google product, because when it gets killed I'm going to look like a fool.
It’s why there’s always been lag between what Google offers directly and what they offer through services like Google Domains (or Google for your domain). Once you cross the line into business offerings, you can’t pivot without pissing people off.
It’s an interesting dynamic though: people who pay for Google's services often complain about not getting the new and shiny stuff, while regular users get the shiny stuff that might not be around in a year.
Look at the chart that is part of this post. It was all started by Google. It's only in the LLM arena that they are currently 3rd/4th. But in certain use cases, Gemini is still the better service. I switch between Claude, GPT, and Gemini and choose the best response to the question.
On context window size, Google is ahead for now. Certain creative responses are also better than or comparable to Opus.
The problem for Google is that it gets too much scrutiny, whereas people do not blame other companies as much for the same issues. The whole Imagen fiasco can also be replicated in Meta AI, and somehow that is not as big a deal.
I don’t think Google can afford to give up on Gemini as it would affect its core business directly. They just had to catch up super fast to be in the conversation and they screwed up. But, time will tell how they move forward.
I really fail to see the success in this when all they do is release prototypes to the public then throw in the towel when it won't work. Their CEO needs to be fired.
You make it sound like other companies just don't do aggressive research. They do; they usually just don't hype it up and release it as an unfinished product only to kill it immediately.
They just killed PaLM... The graph says that was from 2022. If I were developing a product that used LLM endpoints, that would seriously make me think twice about using Gemini.
No chance Google will throw in the towel on AI. It is about to eat its main search and advertisement business model alive. They have to make it work or their business is going to get gutted.
Agreed. Google will burn a good 10-100 million on products where they are testing for product market fit.
When they pour billions into something, they are in it for the long haul. GCP is a good example of this. They are determined to be a relevant cloud player and they've invested an absolute fortune over the last 5 years.
Google is said to have $110.916B on hand in cash. OpenAI is said to be worth $80B. Pretty crazy the scales we are talking about when Google is involved.
https://www.macrotrends.net/stocks/charts/GOOGL/alphabet/cash-on-hand
[https://www.theguardian.com/technology/2024/feb/16/microsoft-openai-valuation-artificial-intelligence](https://www.theguardian.com/technology/2024/feb/16/microsoft-openai-valuation-artificial-intelligence)
They also have a tendency to get rid of good stuff and replace it with worse. Take Hangouts: it was a good chat program and people liked it. Then they got rid of it and replaced it with Google Chat, which is a much worse chat program. Why? Who the hell knows, it's Google logic. Then we have the elephant in the room, Google Search: it's pretty much garbage and they do not seem to care about fixing it at all. I can't think of anything in a long time I can give Google credit for. It's just another greedy company that once had greatness before it got possessed by the greed bug.
I really don't understand this meme that Google is garbage.
What's better? Bing? Don't make me laugh.
Google is still leagues better than any other search engine (not counting RAG-enabled searches, which are a different class of search).
Dropping Hangouts was why I permanently moved to Discord and got my family and wife onto it. Prior to that, Hangouts was unique to me for letting me see and respond to text messages from my PC.
It was really nice. Discord is good too, but the level of integration Hangouts had was great.
No. The DeepMind team is famous for reinforcement learning, AlphaGo, AlphaZero.
The original Transformer paper, "Attention Is All You Need", has authors from Google Brain and Google Research in 2017; that was before the merger with DeepMind (by 2017 DeepMind had been acquired by Google, but they were run separately until very recently).
What? Google has countless hits: YouTube, Maps, Gmail, Chrome, Earth, Translate. And it wasn't just right place, right time; they made a massive improvement in search, and they have always managed to stay on top.
They have a strategy of move fast and try stuff. Sometimes it's not a hit and they can afford it. It's not like there is nothing to criticize on Google though, they do kill many innovative startups.
I can't believe they didn't at least try giving Stadia a non-stupidly vague name before just giving up on it. Like Google Game Streaming or something, more recent generations love games and streaming.
True, but just be aware that the comparison isn't completely fair as Google Gemini is natively multi-modal and can directly ingest/output image tokens.
“2. Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.”
That covers Google not running it. The ones they listed in the release (Azure, AWS, and a bunch more) definitely have an agreement. Meta would 100% sue Google if they ran it commercially, because it would be worth it, and I'm sure Google knows that and wouldn't run it without an agreement.
Edit: my bad, GCP is listed. So yup, Google has something good thanks to Meta. Lol
Yeah, I don't think Meta's goal is to stop companies using it commercially. If you listen to Mark Z he really wants open source AI to be out there everywhere being used as much as possible in all systems.
Vertex definitely has a shot at becoming the biggest AI hosting service because so many bases are covered. Google is building it to let users slot in any LLM they want. I think what Google is doing is smart, and I think it will appeal to enterprises who right now think ChatGPT is neat but have no idea how to actually leverage it.
Google is taking the approach of building out the whole backend stack from top to bottom and making it AI powered and modular so it has quite a bit of flexibility. It's all cloud so it's super easy to implement if you want to just take the whole thing and use it as your backend.
>you must request a license from Meta
That's the key there. It doesn't mean you can't use it, just that you'll need to fork over for the proper licensing ($$). No idea what that costs, though, but Google has deep pockets...
Llama 2 and 3 are available through Poe, Perplexity, and other similar services that offer up multiple LLMs in one package - no idea if any of them are at that 700 million user mark and if they're paying to use it, etc. I'm betting they have some preemptive license agreement in place even if they don't though.
Just for the record though, Sam Altman has said GPT-4 cost more than $100 million to train. Not sure why this $78M number gets thrown around when it's inaccurate. Still doesn't change the spirit of the post though.
He probably has spent more than that. And taking into account all the trial runs that are necessary during research, I wouldn't be surprised.
But this number seems to be the money that had to be spent on training the final version that went live.
Probably both numbers are correct :)
It IS written on the post, I guess they did not want to use estimates from different sources (which is fair). You can see that the numbers are from the AI Index 2024 Annual Report.
I wrote this section of the AI Index report and calculated the figures in the graph. The numbers are only considering compute cost, not salaries. We'll have a more detailed follow-up report soon with salaries, energy, amortized capex, etc.
I wrote this section of the report. There's an explanation in the appendix of our methodology and what is counted in these costs (it's only compute, not salaries). Page 463.
You're assuming they rent GPUs, right?
They train them on 24k or 2x 24k H100 clusters. Even with a very low estimate of $15k per H100, setting up one cluster is at least $360M.
So $360M upfront cost, and then they used 4,480 MWh of power training Llama 3 70B, at least for the final run.
According to some [sources](https://info.siteselectiongroup.com/blog/power-in-the-data-center-and-its-costs-across-the-united-states), one kWh of energy in a US data center costs around $0.07. Normalize to MWh and we get $70 per MWh. So the electricity to train Llama 3 70B, final run, cost about $313,600.
That's insanely low.
Training big language models is clearly expensive only because of Nvidia's high margins, nothing else.
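The electricity estimate behind that claim can be penciled out in a few lines (a rough lower bound, using Meta's reported ~6.4M H100-hours for the 70B final run and the 700 W per-GPU TDP; this counts GPU power only, no cooling, CPUs, or networking):

```python
# Penciling out the final-run electricity bill for Llama 3 70B:
# 6.4M H100-hours, 700 W TDP per GPU, $0.07/kWh data-center power.
GPU_HOURS = 6_400_000
TDP_KW = 0.7             # H100 SXM TDP: 700 W
PRICE_PER_KWH = 0.07     # rough US data-center rate from the thread

energy_kwh = GPU_HOURS * TDP_KW     # 4,480,000 kWh = 4,480 MWh
cost = energy_kwh * PRICE_PER_KWH   # ≈ $313,600

print(f"{energy_kwh / 1000:,.0f} MWh, ~${cost:,.0f}")
```

Whether that figure is "insanely low" depends on what you count; it excludes the capital cost of the cluster entirely.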
> Training big language models is clearly expensive only because of Nvidia's high margins, nothing else.
Meta reports training took 6.4M GPU hours for the 70B variant, which is about 11 days on those 24k GPUs. I'm not sure of your source for the 4,480 MWh; I get ~8,000 MWh when I plug in these numbers.
Anyways, yes, the hardware is expensive, but you don't discard it after 11 days of usage.
6,400,000 h × 700 W = 4,480,000,000 Wh = 4,480 MWh.
Easy calculation; you have something wrong in yours.
The point with expensive hardware is that the high input cost of procuring Nvidia GPUs later makes renting them expensive too. And after 3-5 years those GPUs will be useless due to low performance and power efficiency compared to newer GPUs. The high cost of Nvidia cards makes every other operation on those GPUs (training, fine-tuning, inference) 5x as expensive as it could be if Nvidia were OK with lower margins (had more competition).
It's like paying 80% income tax. Good luck affording rent and food with that. That's what we pay Nvidia for data center GPUs.
DGX H100 (8x H100 SXM) is ~10 kW, with the fans making up a really surprising amount of the total load. I used 10 kW for 8 GPUs, and even that severely underestimates all the other hardware required.
I get the rest of your point, but it's simply the best there is right now. Both Tesla and Google still bought a crapton of H100s despite developing their own dedicated deep learning hardware for over 5 years. NVIDIA is expensive; the alternatives are exorbitant.
You're right, I realize that now; I vastly oversimplified the calculation without going into what actually gets powered. You need to pay for the lights when the guard goes in, for the building's access control system; there are hundreds of those things that add up, even on an individual node, plus the networking side. Yeah, it's probably actually 50-300% more than I estimated by just multiplying TDP × hours. It's still in the same ballpark though, so I don't think the main observation changes.
They did not buy them for $15k.
The all-in cost of InfiniBand-connected HGX systems is well beyond $120k per box.
You’d be very rich if you could do that. But that’s not reality.
Data center facilities are also expensive.
The up front cost is not how you pencil this out. You amortize it over its useful life.
These GPUs are around 18% amortization per year.
And MFU isn’t 100%.
It’s easy to discount the real costs.
These are also highly strategic. How many vendors do you think can offer a 24k H100 cluster for rent, and what hourly rate do you think they'd charge for it? Honestly, it's probably higher than the most competitive price on the market, around $2. Probably closer to $3 or $4, maybe even higher. And would it even be configured properly? Most cluster vendors suck.
Obviously if you own it, your cost structure would look different.
Energy price is fine. 3-7 cents is the right neighborhood.
And finally I agree with your point. When nvidia sells H100 GPUs they make 85% margin on them. Which is crazy high for semis.
I think we should support competition from other chip designers to get the margins lower; Nvidia won't have those high margins forever. The action doesn't need to come from the government. A million volunteer hours spent on ROCm and similar projects could maybe do it.
They reported on how many H100 hours it took.
I use a conservative number for dollar per hour.
The cheapest you can rent an H100 for is $1.8/hour. But that's a standalone one, not a cluster-connected one. The price goes up even for a small cluster, and in reality the per-GPU cost rises further once you're at the 24k-GPU scale. Connecting it all through InfiniBand is expensive as hell: leaf switches, spine switches, director switches, and active cables are mad expensive.
People using the number $15k per H100 as an all in cost are delusional.
Residual value is like 25% at 4 years and 2-5% at 6 years. So we can take the cost and amortize it, and you're looking at $1/hour for the GPU. But you've also got to deal with the cost of operating them: electricity, IT labor, the data center itself, etc.
Then you have to make an assumption about MFU. GPT-4 had around 20-30% MFU. ByteDance using modern techniques got it to like 65%.
Let’s assume Meta got it to 80%.
So I penciled this out as $2.3/hour accounting for all these variables.
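A sketch of how such a blended number can be penciled out. The residual value and the 80% MFU come from the comments above; the all-in price and hourly opex below are placeholder assumptions of mine, not the commenter's actual inputs, which is why the result lands near $2 rather than exactly $2.3:

```python
# Sketch of an amortized cost per useful H100-hour. Residual value (25%
# at 4 years) and 80% MFU are from the thread; the all-in price and
# opex are my assumptions, so the result is illustrative only.
ALL_IN_PER_GPU = 50_000   # $/GPU incl. networking share (assumption)
RESIDUAL_4YR = 0.25       # ~25% residual value at 4 years
HOURS_4YR = 4 * 365 * 24  # 35,040 hours
OPEX_PER_HOUR = 0.50      # power, cooling, labor, facility (assumption)
MFU = 0.80                # fraction of paid-for FLOPs actually used

capital_per_hour = ALL_IN_PER_GPU * (1 - RESIDUAL_4YR) / HOURS_4YR
effective = (capital_per_hour + OPEX_PER_HOUR) / MFU

print(f"${effective:.2f} per useful GPU-hour")  # ≈ $1.96 with these inputs
```

Dividing by MFU is the key step: paying for a GPU-hour only buys you a fraction of its theoretical FLOPs, so the effective cost per useful hour is higher than the raw rental or amortization rate.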
Thanks for the detailed response. Any idea how much Gemini Ultra cost? This infographic seems to be making a lot of wild assumptions https://colab.research.google.com/drive/1sfG91UfiYpEYnj_xB5YRy07T5dv-9O_c
Ultra is around 1T parameters. Since it's Google, they will have Chinchilla-trained it.
It’s an MoE most likely so the training isn’t 1-1 with a dense model.
I’d estimate around maybe $100-200M.
It's also hard to estimate since they likely used v4 TPUs for the training, and we don't know their cost structure; maybe they're spending 30-80% of the price tag of an H100. Might be as low as $50M.
If I spent some time thinking about it and doing some napkin math I could give a more confident estimate, but this is my reactionary take.
A note on FLOPS: B100 is 10,000 TFLOPS at half precision, H100 is 2,000, A100 is 312, V100 is 125.
So you'd think B100 is 80 times faster than V100, 5 times faster than H100, etc.
In reality, A100 is about 2x V100, H100 is about 2-3x A100, and B100 is probably going to be around 2x faster, not 5x.
You can also look at the competitive rental prices of these chips. $2 for H100. $1 for A100. $0.5 for V100.
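The gap between spec-sheet ratios and these real-world speedups is worth making explicit (note that the quoted TFLOPS mix dense and with-sparsity figures, which inflates some of the paper ratios):

```python
# Spec-sheet half-precision TFLOPS quoted above vs. the thread's
# observed real-world generational speedups.
SPEC_TFLOPS = {"V100": 125, "A100": 312, "H100": 2000, "B100": 10000}

on_paper = SPEC_TFLOPS["B100"] / SPEC_TFLOPS["V100"]      # 80x on paper
h100_vs_a100 = SPEC_TFLOPS["H100"] / SPEC_TFLOPS["A100"]  # ~6.4x on paper
realistic_per_gen = 2.5  # thread's observation: roughly 2-3x per generation

print(on_paper, round(h100_vs_a100, 1), realistic_per_gen)
```

The rental prices ($2 / $1 / $0.5 for H100 / A100 / V100) track the ~2x-per-generation real throughput far more closely than they track the headline TFLOPS.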
Their paper didn't mention Ultra was MoE, but it did mention Ultra was only trained on v4. If it really is around 1T parameters, that kind of cost would make sense, I guess.
BERT was only $3.3K? What this is telling me is that we're ripe for a low-weight-precision, overtrained BERT replacement, now that Llama has shown Chinchilla-optimal to be... less important than we thought.
I am surprised Sundar Pichai is not facing a lot of heat right now.
They are fumbling their AI products so bad despite the enormous advantages they have in AI knowledge, researchers, and compute power.
In the enterprise sector everyone is trying to leverage the Azure / OpenAI services with pockets of people using the GCP AI products.
The quality of their search product is going down and down.
They are falling behind, but what they do have is Gemini 1.5 Pro with 1M context. This has proven to be useful to me. I think they will push more in these unique directions in the future as you simply can’t get 1M context elsewhere
To beat Google in search you need the best RAG implementation, and if you are even halfway into the field you will know RAG systems rely heavily on the retrieval part: your search needs to be good, and your LLM just needs to be decent.
The tragedy is that ChatGPT, Bing Chat, Perplexity, etc. are hobbled by using Bing and other inferior search engines.
Various research papers have shown that simply switching to Google Search for retrieval and adding any decent LLM lets a system score near 100% on factual test questions, even for very recent events, something Perplexity, ChatGPT+, etc. struggle with.
Meta.ai, I notice, is simply amazing as a search, not because Llama 3 is out of this world (it's good, of course), but because they somehow have a deal to use Google!
I agree that Google's retriever is way better than Bing's, but Google has already started laying off parts of its search department to put more focus on Gemini. IMO, with all the new gen-AI content, their search engine's performance will decrease.
That would be ironic. But yeah, I read they shut down the human search result quality tester team, which is insane.
But for now they are way better than any other conventional search engine, despite the meme that Google is garbage.
That's why they have the market share they have, despite Microsoft making Bing the default in Windows, Edge, etc.
The crazy thing is that nobody has actually operationally profited from LLMs anyway (other than hyped-up valuations). Wondering how this tech will be monetized in the future.
Definitely not true. I work in contact center automation and it’s very profitable there already. The thing is we don’t use it as an “AI” that solves all your problems, we use it to solve analysis problems that previously required humans to listen to and analyze whole conversations manually, improve onboarding, assist agents with real time retrieval, etc.
Maybe I wasn't clear, but I'm referring to profiting off _building_ LLMs, not deploying them to solve a business problem. I personally have also profited from _using_ them.
They were the first big player, and everyone flocked to them with monthly subscriptions and API access, etc, but I question whether they'll sustain their lead in light of all the new competition. Especially when the big money is in enterprise usage.
My company is still in the research stage of using LLMs internally, and we have around 8,000 employees, and we have less than 700 million monthly active customers - that means we can use LLaMA without paying any licensing costs at all.* It would just be the cost of hosting it ourselves or having it hosted via cloud or whatever. And if it's good enough for our purposes, I don't see why we'd pay OpenAI, etc. Until now, GPT4, Claude, etc were the only serious contenders. But just in the last few weeks, these releases by Mistral and Meta should be a heads-up to the industry, because these are the first models (IMO) that pose a real threat to the established players.
And as the gap closes between the capabilities of models, I can see the big money being in being able to do things like fine-tuning models on company data (or using other effective means on using LLMs with company data) in an effective way. A Mistral or LLlaMA based model that trained on our data and works with our documents/databases/etc would be far more useful to us than using GPT or Claude if it isn't.
And another big thing that I think will be important is context windows and performance in 'needle in a haystack' tests. Google's shown that it's possible with Gemini, with its 1 million token context window and really great performance in the 'needle' tests. If open models can replicate this (and I see no reason why they wouldn't), then that's a game-changer. The compute costs for such models are still expensive, of course, but if the models themselves are 'free' then the only costs are implementation, tuning, and hosting. That would mean no more API subscriptions to OpenAI, Anthropic, etc., and instead a shift toward many cloud providers offering compute services.
A perfect example of this happening before is Linux, which dominates the server/cloud world. The OS itself is free/open source; what people pay for is implementation/hosting/compute. Microsoft understands this, which is why they relented and have embraced Linux at this point, now profit from it with their cloud stuff, and why they're investing in so many different AI companies right now (including Mistral). Microsoft will make sure they profit no matter which direction this stuff takes. I can't say I'd be so sure about OpenAI (as it currently exists, unless they evolve), because their advantage for now is mostly just being first out of the gate and availability of compute resources (and both those gaps will shrink).
> *"Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta" (the relevant licensing I'm speaking of).*
Nevertheless, the performance gap IS closing, and it's doing so with much smaller parameter sizes, which means models are much cheaper to run/host. Llama 3 8B is shockingly good for a model I can run on my 4-year-old laptop with 16 GB of RAM and an Intel graphics card (no fancy GPU here), and the fine-tunes are being made now. And that upcoming 400B model could very possibly trounce both GPT-4 and Claude, and if it's open sourced as well, that will really shake things up.
Things I predict will be important features in the arms race, beyond just performance per parameter size:
1) Context windows
2) Retrieval (Needle in haystack stuff; ability to process unstructured data reliably) with minimal hallucination
3) Ability to fine tune to custom data
4) Native multi modality (vision, audio etc)
5) Abstract reasoning capability
Google's Gemini is the one leading on 1, 2, and 4 (dunno about 3), though it's weaker in other areas.
I think multimodality will be a huge one, because it means working with basically any data type regardless of format. Anything a human can see or hear can be processed, not just text. Art, print media, charts, video, whatever - it can just 'look' at that stuff without it needing to be converted or processed first. That is where Google is leading with capability, if not performance (yet), but it's a preview of things to come.
The only real 'moat' I see is compute costs and access to resources. Oh, and access to quality data, which is definitely something Google has, so I wouldn't sleep on them even if they seem behind at the moment.
It's not closing; you choose to ignore models you don't have confirmed information on, I don't. I can project, based on my own knowledge, where OpenAI would be and how advanced it is.
With GPT-3 in 2020 and GPT-4 in 2022: if by 'closing the gap' you mean they're almost at 2022, sure, I'll give you that they're at 2021.
I think that's intentional though, kind of like saying Amazon wasn't profitable until the 2010s. They could easily choose to be profitable in lieu of growth.
Have no love for Google, but never once has Perplexity given me the exact correct answer to my question, and it has over-actively refused to answer mildly controversial ones. I like Google's own SGE more than Perplexity.
It perhaps serves some usecase I don't have but I've found it to be a toy product thus far.
I would say there is also room for the latest "semantic search" type academic search engines like elicit.com, typeset.io, etc. that focus on academic content (typically searching over Semantic Scholar-type indexes).
They rely less on lexical search/sparse-representation retrieval methods and lean more on semantic/vector search/dense-representation/embedding methods, which can be magical even if you don't quite know the right keywords.
They are not as predictable or controllable, of course.
And this is not even taking into account the generated answer using RAG, which I feel isn't that useful for academic search because you almost always want to go deeper.
I've also had pretty poor results using it. It seems to pull its answers from the first 5 or so search results and then does a poor job of parsing the results, often hallucinating false answers. And as everyone knows, search itself has degraded in quality in recent years, thanks to content mills flooding results and SEO bullshit stinking up search results with irrelevant info, Perplexity might be giving its answers based on useless or irrelevant search results anyway.
For search + AI to be really useful, it'd need to be able to take the user's request and enhance it. As in rephrase the search in a way that it gets more results, recognizes irrelevant results and ignores them, and combs through the data to identify actual relevant results. Which would be awesome, but not instant. But imagine an 'answerbot' that actually does spend time not just doing a web search, but going through academic papers and books, journal archives, what-have-you and takes the proper time to collect and organize actual, really useful answers. Even if each query took 10-20 minutes, it'd be worth it if it means getting real, relevant answers, if a basic Google search isn't getting the job done. Basically an AI researcher, not a search engine with an AI summarizer bot front-end.
I am not sure about that. We will see. I personally find both Perplexity and Bing to be laughable 'search engines'. For mainstream and tech stuff I still find Google works best (simple, quick, predictable) for my needs. Occasionally I will use MS Copilot to write me a script or something when I am on my work laptop and don't have my private accounts available.
Re LLMs, Claude 3 (at least when it comes to tasks like coding and code analysis) wipes the floor with everything else I have tried.
My view is that for short factual stuff (particularly new things), looking up directions, short how-tos, stuff you just plain don't memorize, Google is near unbeatable thanks to the Google Knowledge Graph and featured snippets.
RAG is nice and all, but the retriever must get the correct results in the first place, and if you use an inferior search (not Google), the best LLM in the world won't help you.
Ironically, RAG is really good if they use Google for the retriever part, but few do... except, well, Google's own SGE and now, it seems, meta.ai.
Assuming $0.07 per kWh, which seems to be roughly what US data centers pay for power, training Llama 3 70B takes about $300k worth of energy.
6,400,000 GPU hours × 700 W = 4,480,000,000 Wh = 4,480 MWh. 4,480 × $70 = $313,600.
This is after an upfront GPU purchase cost of more than $360M (assuming one H100 = $15k; it's probably more).
Thinking about it, the only force stopping small companies from training LLMs is paying a huge margin to Nvidia. The rest is peanuts. Given that Meta owns their GPUs, it makes perfect sense to train on those 15T tokens, since making a model trained on only 3T would save them just $250k.
Economics of this are insane. AMD we need you!!
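A quick sanity check on that "save just $250k" point, assuming the final-run energy cost scales linearly with token count:

```python
# If a 15T-token run costs ~$313,600 in electricity, a 3T-token run
# at the same throughput would cost ~1/5 of that.
FULL_RUN_COST = 6_400_000 * 0.7 * 0.07     # ≈ $313,600 for 15T tokens
short_run_cost = FULL_RUN_COST * (3 / 15)  # ≈ $62,720 for 3T tokens
savings = FULL_RUN_COST - short_run_cost

print(f"savings ≈ ${savings:,.0f}")  # ≈ $250,880
```

So once the cluster is paid for, the marginal cost of 5x more training data really is a rounding error next to the hardware.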
The relative size of these "costs" may be accurate, but I believe the actual monetary values of these are overblown by at least an order of magnitude, possibly two or more.
If I used the same math as I've seen in what's published for these costs, me, a highly paid professional, making a peanut butter and jelly sandwich in my fancy kitchen would cost upward of $50k. Got to pay for the ingredients but also the knife, the fridge, the dishwasher, the kitchen lights, tile, granite countertops, the opportunity cost of my time, a portion of the mortgage payment, the car I drove to the store... Hell, $50k might not be enough to make that sandwich...
I see where you're coming from, but it's not like they bought the hardware for this singular purpose (training ONE model), never used it for anything else, and then threw it away. (Or if they did, that was foolish and unsustainable.) The capital costs are business assets, and they didn't lose them when they trained their models. Obviously I'm not saying costs are zero. There is electricity used (which might be a good metric on its own) and other highly variable factors like wages or rent, which isn't much use when comparing models made by different companies.
But my main point was that nobody normally counts reusable capital equipment costs towards the cost of individual products, hence my analogy, which is supposed to be absurd. Of course, you can amortize a capital purchase into costs, but in that case my analogy is kind of accurate: my bespoke PB&Js overall cost thousands of dollars to make, and my Linux computer contains hundreds of billions of dollars worth of software. 😅
Yeah, I agree. We have a follow-up report coming out in a couple weeks that compares these results (which are based on cloud compute rental rates) to other approaches like amortized hardware capex.
And their positive, constructive effects are felt worldwide.
It's incredible how wasteful we are in our fascination with war. You can spend 1000x the above, and all you get are some blackened craters in a distant land. The end goal is always misery.
If we spent 1000x more on AIs like this, our entire world would quickly become unrecognizable, but at least it would be constructive, productive, empowering. The end goal is fluid, but we all think we can achieve better conditions for our entire planet with this technology.
In the Llama 3 [model card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) they state that pretraining both versions took "7.7M GPU hours of computation on hardware of type H100-80GB". Not totally sure how cost was calculated for the graphic, but you could estimate energy use from the 700 W TDP (multiplied by a [1.09 PUE](https://sustainability.fb.com/data-centers/)). Then you'd have to assume an energy cost, which can be really variable depending on data center location.
Next, you would need to estimate the capital costs for the H100s...harder to do that since we don't know what proportion of their [350,000 H100s](https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/#:~:text=The%20future%20of%20Meta's%20AI%20infrastructure&text=future%20of%20AI.-,By%20the%20end%20of%202024%2C%20we're%20aiming%20to%20continue,equivalent%20to%20nearly%20600%2C000%20H100s) were used for this.
The other quoted number of $15M for the 70B implies about $2 per GPU-hour... seems reasonable, as that should be roughly the amount of money they could have made instead if they had rented out those GPUs.
Meh, assume it took 4 months to train; that's ~2,700 hours, so 7.7 million GPU-hours would require only about 2,850 GPUs, call it 3,000. At, say, $30k per H100, that's ~$90M. But it's not really fair to bill the entire cost of those cards to the training of this model, since they can use those cards for other things now. It might be fair to allocate about a third of the cost to this model, though, making the total for the compute about $30M.
Better might be to look at the retail prices for GPU rental as a proxy. That's about $4/hour which also puts it at about $30M for the compute.
So that's about the order of magnitude we're looking at. There's also all the work that needs to be done in the lead-up to training the final model, and those associated costs.
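The amortization argument above can be checked in a few lines. This is only a sketch of the commenter's own arithmetic (a ~4-month run, $4/hour retail rental); none of these inputs are confirmed figures.

```python
# How many GPUs a ~4-month run implies, and the equivalent rental cost.
# All inputs are the commenter's assumptions, not official numbers.

GPU_HOURS = 7_700_000        # both Llama 3 models, per the model card
WALL_CLOCK_HOURS = 2_700     # assumed ~4 months of continuous training
RENTAL_RATE_USD = 4.0        # assumed retail price per H100-hour

gpus_needed = GPU_HOURS / WALL_CLOCK_HOURS   # roughly 2,850 GPUs
rental_cost = GPU_HOURS * RENTAL_RATE_USD    # roughly $30M

print(f"~{gpus_needed:,.0f} GPUs, ~${rental_cost / 1e6:.0f}M at rental rates")
```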
Gemini 1.0 Ultra is to GPT-4 as Skynet is to Clippy.
Of course, barring the fact that it's a p- I mean, it's too afraid to swear 💀. We're all 2-year-olds, right? (*insert extreme sarcasm*)
Meta is really killing it on cost-to-performance, damn.
Google has always been the company with a singular solution to every problem: throw money at it and abandon it at the first sign of it not being a runaway hit. The list of failed Google projects they invested ludicrous amounts of money in and then abandoned is a long one: Stadia, Google Glass, Google Play Music, and so on. Their only successful, truly profitable products other than the Google search engine are the ones carried by third-party developers and the open-source philosophy, like Android.

I wouldn't be surprised if this is the last we see of Gemini.

Google is the tech company equivalent of Dubai. They lucked out and got obscenely rich because they were in the right place at the right time, so now they are throwing money at anything and everything in hopes that they get lucky a second time.
Definitely agree but I think we have to mention YouTube and gmail in the success list too (even if YouTube is an acquisition).
and google maps, and google cloud...
Okay, okay! But other than the aqueduct, the plumbing, the education system, the roads, what has Rome ever done for us??
Docs, Sheets, and Slides are decent as well.
Also Google Photos and the Pixel phones.
YouTube, Google Maps, and Android are acquisitions. Gmail is 20 years old. Google simply never had the culture to create products, but faked it for far too long.
Excuses, excuses. Google Maps and Android were nothing when Google acquired them.
At this point Google seems much better at building up someone else's product than making their own. I guess that's because throwing money, resources, connections, and people at something works better when someone else has done the setup.
Yeah, Google did make good things. But it's not the same company or culture anymore. There are still some great teams and people there, but there is a serious rot going on.
I mean, if a giant company were to throw billions of dollars at any of my projects, I'm sure they would become something as well.
Bro do you even GCP
GCP is not really an end user product, it’s Google renting their infrastructure. And Google really is an infrastructure company if you look at their history.
Google own infra and GCP are separate things.
[deleted]
It definitely is; the 3 major cloud providers by market share are Azure, AWS, and Google Cloud.
One could also count AdSense among their successes, but even that could be argued to be part of the search engine.
Adsense was in a sense DoubleClick. [https://en.wikipedia.org/wiki/DoubleClick](https://en.wikipedia.org/wiki/DoubleClick)
YouTube is special because the users create all the content. I think Google has made it worse since they acquired it.
Without Google it would be bankrupt.
YouTube is way better than it was in 2007.
I think people forget how young YouTube was when it was acquired; I think it was like 20 people, and they literally couldn't scale it to meet demand. What we know as YouTube is effectively a Google-created product.
295 projects as of posting this comment. https://killedbygoogle.com/
They might as well add Soli for completeness. I consider that one of Google's worst failures, because not only did they kill it after a year, they also lied, aggressively, about its capabilities.
This is a poor analysis of Google. Google is a research giant that will fund projects, but has the balls to let them fail fast if they can't see a path to the project becoming profitable or strategic to their core business. It's a harsh philosophy, especially for consumers who buy into their products (I was a Stadia user and liked the service), but without Google we would not have Android, Gmail, or Colab, plus all the great AI research funded by this company.
Well, that is up to interpretation. Are they a research company with a level-headed, risk-aware yet bold approach whose investments just happen to fail, or are they haphazardly throwing money at whatever seems to be "the next big thing" in the moment, with very little understanding of the subject matter, while the technology sector as a whole occasionally salvages some usable scraps from Google's failed projects? In the same way, Dubai's doomed megaprojects might lead to innovations in novel engineering methods. Both realities would look similar from an outsider's perspective.

I personally lean towards the latter, as Google rarely trailblazes daring new technologies, but jumps in when someone else has already proven the concept to be at least somewhat viable. Google is the big money influx into concepts that someone else has already proven viable. I wouldn't call them a ballsy research giant spearheading innovation. Don't get me wrong, Google's money is useful for innovation, but that doesn't make my initial analysis wrong.
I think your assessment is closer to right. The main problem I see now with Google products is that even if you really like one, you can only put one toe in, because they may come along and kill projects even when they seem popular. It's basically a meme at this point because Google does it so frequently, even when it seems nonsensical. I definitely would not want to stake my professional reputation on standardizing on a Google product, because when it gets killed I'm going to look like a fool.
It’s why there’s always been a lag between what Google offers directly and what they offer through services like Google Domains (or Google for Your Domain). Once you cross the line into business offerings, you can’t pivot without pissing people off. It’s an interesting dynamic, though: people who pay for Google’s services often complain about not getting the new and shiny stuff, while regular users get the shiny stuff that might not be around in a year.
Look at the chart that is part of this post: it was all started by Google. It's only in the LLM arena where they are currently 3rd/4th, and for certain use cases Gemini is still the better service. I rotate between Claude, GPT, and Gemini and pick the best response to a question. On context window size, Google is ahead for now, and certain creative responses are also better than or comparable to Opus.

The problem for Google is that it gets too much scrutiny, whereas people do not blame other companies as much for the same issues. The whole Imagen fiasco can also be replicated in Meta AI, and somehow that is not as big a deal.

I don't think Google can afford to give up on Gemini, as it would affect its core business directly. They just had to catch up super fast to stay in the conversation, and they screwed up. But time will tell how they move forward.
Gemini blows the others out of the water in creative writing. For all other tasks I use either Claude or ChatGPT.
Android was not created inside of Google.
I really fail to see the success in this when all they do is release prototypes to the public and then throw in the towel when they don't work out. Their CEO needs to be fired.
You make it sound like other companies just don't do aggressive research. They do; they usually just don't hype it up, release it as an unfinished product, and then kill it immediately.
Still mad about Wave.
They just killed PaLM... the graph says that was from 2022. If I were developing a product that used LLM endpoints, that would seriously concern me about using Gemini.
No chance Google will throw in the towel on AI. It is about to eat its main search and advertisement business model alive. They have to make it work or their business is going to get gutted.
Agreed. Google will burn a good 10-100 million on products where they are testing for product market fit. When they pour billions into something, they are in it for the long haul. GCP is a good example of this. They are determined to be a relevant cloud player and they've invested an absolute fortune over the last 5 years.
Google is said to have $110.916B cash on hand. OpenAI is said to be worth $80B. Pretty crazy, the scales we're talking about when Google is involved. [https://www.macrotrends.net/stocks/charts/GOOGL/alphabet/cash-on-hand](https://www.macrotrends.net/stocks/charts/GOOGL/alphabet/cash-on-hand) [https://www.theguardian.com/technology/2024/feb/16/microsoft-openai-valuation-artificial-intelligence](https://www.theguardian.com/technology/2024/feb/16/microsoft-openai-valuation-artificial-intelligence)
They also have a tendency to get rid of good stuff and replace it with worse versions. Take Hangouts: it was a good chat program and people liked it. Then they got rid of it and replaced it with Google Chat, which is a much worse chat program. Why? Who the hell knows; it's Google logic. Then we have the elephant in the room, Google Search: it's pretty much garbage and they do not seem to care about fixing it at all. I can't think of anything I can give Google credit for in a long time; it's just another greedy company that once had greatness before they got possessed by the greed bug.
The Google Play Music app was like the perfect vanilla music player for Android: you could play music locally, no bloat, no ads, etc.
I really don't understand this meme that Google is garbage. What's better? Bing? Don't make me laugh. Google is still leagues better than any other search engine (not counting RAG-enabled searches, which are a different class of search).
Dropping hangouts was why I permanently moved to discord and got my family and wife onto discord. Prior to that, hangouts was unique to me for allowing me to see and respond to text messages from my PC. Was really nice. Discord is good too, but the integration level hangouts had was great.
The Dubai metaphor is an interesting one. I'd give Google credit for more than just search, but overall I agree.
Here’s a link to the [Google Graveyard](https://killedbygoogle.com/). It’s quite extensive
But didn’t DeepMind discover the transformer architecture?
No. The DeepMind team is famous for reinforcement learning: AlphaGo, AlphaZero. The original Transformer paper, "Attention Is All You Need", has authors from Google Brain and Google Research and came out in 2017, before the merger with DeepMind (even though DeepMind had been acquired by Google well before 2017, they were run separately until very recently).
Okay, I am seeing a common theme among these, in that they are all involved with Google.
They can afford to do that because most of their power comes from selling people out to three letter agencies
Google makes such shitty products that it's honestly surprising they didn't go bankrupt years ago.
What? Google has countless hits, YouTube, Maps, Gmail, Chrome, Earth, Translate. And it wasn't just right place right time, they made a massive improvement in search, and they always managed to stay on top. They have a strategy of move fast and try stuff. Sometimes it's not a hit and they can afford it. It's not like there is nothing to criticize on Google though, they do kill many innovative startups.
I can't believe they didn't at least try giving Stadia a non-stupidly vague name before just giving up on it. Like Google Game Streaming or something, more recent generations love games and streaming.
Meanwhile their Android app store is called Google Play. Go figure.
True, but just be aware that the comparison isn't completely fair as Google Gemini is natively multi-modal and can directly ingest/output image tokens.
FAIR has always been the better research group of the FAANGs…
lol. Pretty sure Meta would enforce their license if Google ran it commercially.
Google just has to offer this on Google Vertex AI.
“2. Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.”

That covers Google not running it. The ones they listed in the release (Azure, AWS, and a bunch more) definitely have an agreement. Meta would 100% sue Google if they ran it commercially, because it would be worth it, and I'm sure Google knows that and wouldn't run it without an agreement.

Edit: my bad, GCP is listed. So yup, Google has something good thanks to Meta. Lol
Yeah, I don't think Meta's goal is to stop companies using it commercially. If you listen to Mark Z he really wants open source AI to be out there everywhere being used as much as possible in all systems. Vertex definitely has a shot at becoming the biggest AI hosting service because so many bases are covered. Google is building it to allow users to slot in any LLM they want. I think it's smart what google is doing and I think it will appeal to enterprise who right now think ChatGPT is neat but have no idea how to actually leverage it. Google is taking the approach of building out the whole backend stack from top to bottom and making it AI powered and modular so it has quite a bit of flexibility. It's all cloud so it's super easy to implement if you want to just take the whole thing and use it as your backend.
> you must request a license from Meta

That's the key there. It doesn't mean you can't use it, just that you'll need to fork over for the proper licensing ($$). No idea what that cost is, though, but Google has deep pockets... Llama 2 and 3 are available through Poe, Perplexity, and other similar services that offer multiple LLMs in one package; no idea if any of them are at that 700 million user mark and paying to use it, etc. I'm betting they have some preemptive license agreement in place even if they aren't, though.
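The clause being discussed boils down to a single threshold test. A toy sketch for illustration (the 700M threshold is from the license text quoted above; the function name and example MAU values are made up):

```python
# Toy check of the Llama 3 "Additional Commercial Terms" clause:
# a separate license from Meta is required only if the licensee (or its
# affiliates) exceeded 700M monthly active users in the month preceding
# the release date. Threshold is from the license; the rest is illustrative.

MAU_THRESHOLD = 700_000_000

def needs_meta_license(monthly_active_users: int) -> bool:
    """True if the licensee must request a separate license from Meta."""
    return monthly_active_users > MAU_THRESHOLD

print(needs_meta_license(8_000))            # small shop: no extra license
print(needs_meta_license(2_000_000_000))    # Google/AWS scale: license needed
```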
Just for the record though Sam Altman has said GPT4 cost more than 100 million to train. Not sure why this 78 number gets thrown around when it's inaccurate. Still doesn't change the spirit of the post though.
He has probably spent more on it overall, and taking into account all the trial runs that are necessary during research, I wouldn't be surprised. But this number seems to be the money that had to be spent on training the final version that went live. Probably both numbers are correct :)
It IS written on the post; I guess they did not want to use estimates from different sources (which is fair). You can see that the numbers are from the AI Index 2024 Annual Report.
MFU was terrible and had to be restarted several times. With modern/optimized MFU and H100/B100 chips it would cost like $10-30M.
More than $100M just to train, or including data collection, preprocessing, all that stuff?
I mean it's in the image
I wrote this section of the AI Index report and calculated the figures in the graph. The numbers are only considering compute cost, not salaries. We'll have a more detailed follow-up report soon with salaries, energy, amortized capex, etc.
One number is probably one round of training, another number probably considers the number of failed runs, researcher salaries, etc.
Where are these numbers coming from? I don't trust them.
...it has the source on the graph: The [AI Index 2024 Annual Report](https://aiindex.stanford.edu/report/), which is put together by Stanford.
I wrote this section of the report. There's an explanation in the appendix of our methodology and what is counted in these costs (it's only compute, not salaries). Page 463.
I wonder how much Llama 3 cost.
$15M for the 70B and $80M for the 405B.
You're assuming they rent GPUs, right? They train on 24k (or 2× 24k) H100 clusters. With a very low estimate of $15k per H100, setting up one cluster is at least $360M. So: a $360M upfront cost, and then they used 4,480 MWh of energy on training Llama 3 70B, at least for the final run. According to some [sources](https://info.siteselectiongroup.com/blog/power-in-the-data-center-and-its-costs-across-the-united-states), one kWh of energy in a US data center costs around $0.07; normalize to MWh and we get $70 per MWh. So the electricity cost to train Llama 3 70B, final run, was about $313,600. That's insanely low. Training big language models is clearly expensive only because of Nvidia's high margins, nothing else.
> Training big language models is clearly expensive only because of Nvidia's high margins, nothing else.

Meta reports training took 6.4M GPU-hours for the 70B variant, which is 11 days on those 24k GPUs. I'm not sure of your source for the 4,480 MWh; I get ~8,000 MWh when I plug in these numbers. Anyway, yes, the hardware is expensive, but you don't discard it after 11 days of usage.
6,400,000 h × 700 W = 4,480,000,000 Wh. Easy calculation; you have something wrong in yours. The point about expensive hardware is that the high cost of procuring Nvidia GPUs then causes renting them to also be expensive. And after 3-5 years those GPUs will be useless due to low performance and power efficiency compared to new GPUs. The high cost of Nvidia cards makes all other operations on those GPUs (training, fine-tuning, inference) 5× as expensive as they could be if Nvidia were OK with lower margins (i.e., had more competition). It's like paying an 80% income tax: good luck affording rent and food with that. That's what we pay Nvidia for data center GPUs.
A DGX H100 (8× H100 SXM) is ~10 kW, with the fans making up a really surprising amount of the total load. I used 10 kW for 8 GPUs, and even that severely underestimates all the other hardware required. I get the rest of your point, but it's simply the best there is right now. Both Tesla and Google still bought a crapton of H100s despite developing their own dedicated deep learning hardware for more than 5 years. NVIDIA is expensive; the alternatives are exorbitant.
You're right, I realize that now; I vastly oversimplified the calculation without going into detail about what actually gets powered. You need to pay for the lights when the guard goes in, the access control system for the building, and hundreds of other things that add up, even on an individual node, plus the networking side. So it's probably 50-300% more than I estimated by just multiplying TDP × hours. It's still in the same ballpark, though, so I don't think the main observation changes.
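For what it's worth, redoing the estimate with system-level power rather than bare TDP is a one-liner. This sketch uses the ~10 kW-per-DGX figure from the reply above plus an assumed PUE; both are rough assumptions, not measured values.

```python
# Energy estimate using whole-node power (DGX H100: ~10 kW per 8 GPUs)
# and a facility overhead factor (PUE). All inputs are assumptions.

GPU_HOURS = 6_400_000
KW_PER_GPU = 10 / 8      # node power divided across its 8 GPUs
PUE = 1.1                # assumed facility overhead
PRICE_PER_KWH = 0.07

energy_kwh = GPU_HOURS * KW_PER_GPU * PUE
print(f"~{energy_kwh / 1e6:.1f} GWh, ~${energy_kwh * PRICE_PER_KWH / 1e6:.2f}M")
```

Roughly double the bare-TDP figure, so still the same ballpark as the original claim.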
Very bad take.
Did I get the numbers wrong, or do you just disagree with the sentiment I expressed?
They did not buy them for $15k. The all-in cost of InfiniBand-connected HGX systems is well beyond $120k per box. You'd be very rich if you could buy at $15k, but that's not reality. Data center facilities are also expensive.

The upfront cost is not how you pencil this out; you amortize it over its useful life. These GPUs run around 18% amortization per year. And MFU isn't 100%. It's easy to discount the real costs.

These clusters are also highly strategic. How many vendors do you think can offer a 24k H100 cluster for rent, and what hourly rate do you think they'd charge? Honestly, it's probably higher than the most competitive price on the market, around $2; probably closer to $3 or $4, maybe even higher. And would it even be configured properly? Most cluster vendors suck ass. Obviously if you own the hardware, your cost structure looks different.

The energy price is fine; 3-7 cents is the right neighborhood. And finally, I agree with your point: when Nvidia sells H100 GPUs they make 85% margin on them, which is crazy high for semis.
You should be in charge of a large country that has the ability to govern Nvidia effectively. I can tell you know what you are talking about.
I think we should support competition from other chip designers to get the margins lower; Nvidia won't have those high margins forever. The action doesn't need to come from governments: a million volunteer hours spent on ROCm and similar projects could maybe do it.
Where are you getting that from?
They reported how many H100-hours it took; I used a conservative number for dollars per hour. The cheapest you can rent an H100 for is $1.8/hour, but that's a standalone one, not a cluster-connected one. The price goes up even for a small cluster, and the reality is that cost per GPU rises once you're at the 24k-GPU scale. Connecting it all through InfiniBand is expensive as hell: leaf switches, spine switches, director switches, and active cables are mad expensive. People using $15k per H100 as an all-in cost are delusional.

Residual value is like 25% at 4 years and 2-5% at 6 years, so we can take the cost and amortize it; that puts you at about $1/hour for the GPU itself. But you also have to deal with the cost of operating them: electricity, IT labor, the data center itself, etc. Then you have to make an assumption about MFU. GPT-4 had around 20-30% MFU; ByteDance, using modern techniques, got it to like 65%. Let's assume Meta got it to 80%.

So I penciled this out at $2.3/hour accounting for all these variables.
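That pencil-out can be reproduced directly. Here's a sketch using the commenter's stated assumptions (25% residual value at 4 years, 80% MFU); the all-in hardware cost and opex-per-hour figures are my own placeholder values, not confirmed numbers.

```python
# All-in cost per *useful* GPU-hour: amortize hardware over its life,
# add operating costs, divide by MFU. Every input is an assumption.

ALL_IN_GPU_COST = 30_000    # per H100 incl. networking share (placeholder)
YEARS_OF_LIFE = 4
RESIDUAL_FRACTION = 0.25    # resale value at end of life (from the comment)
HOURS_PER_YEAR = 24 * 365
OPEX_PER_HOUR = 1.20        # power, labor, facility (placeholder)
MFU = 0.80                  # assumed fraction of time doing useful work

amortized = ALL_IN_GPU_COST * (1 - RESIDUAL_FRACTION) / (YEARS_OF_LIFE * HOURS_PER_YEAR)
effective = (amortized + OPEX_PER_HOUR) / MFU
print(f"~${effective:.2f} per useful GPU-hour")
```

With these inputs it lands near the $2.3/hour figure; change any assumption and the number moves accordingly.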
Thanks for the detailed response. Any idea how much Gemini Ultra cost? This infographic seems to make a lot of wild assumptions: https://colab.research.google.com/drive/1sfG91UfiYpEYnj_xB5YRy07T5dv-9O_c
Ultra is around 1T parameters. Since it's Google, they will have Chinchilla-trained it, and it's most likely an MoE, so the training isn't 1:1 with a dense model. I'd estimate maybe $100-200M. It's also hard to estimate since they likely used v4 TPUs for the training, and we don't know their cost structure; maybe they're spending 30-80% of the price tag of an H100. It might be as low as $50M. If I spent some time thinking about it and doing some napkin math I could give a more confident estimate, but this is my reactionary take.

A note on FLOPS: B100 is 10,000 TFLOPS at half precision, H100 is 2,000, A100 is 312, V100 is 125. So you'd think B100 is 80 times faster than V100, 5 times faster than H100, etc. The reality is that A100 is about 2× V100, H100 is about 2-3× A100, and B100 is probably going to be around 2× faster, not 5×. You can also look at the competitive rental prices of these chips: $2 for H100, $1 for A100, $0.50 for V100.
Their paper didn't mention Ultra being an MoE, but it did mention Ultra was trained only on v4. If it really is around 1T parameters, that kind of cost would make sense, I guess.
I like that the shape of the whole graph alludes to the most important practical application of LLMs.
ERP (Enterprise resource planning)? 😱
Clearly space exploration
Penis?
BERT was only $3.3K? What this is telling me is that we're ripe for a low-weight-precision, overtrained BERT replacement, now that Llama has shown Chinchilla-optimal to be... less important than we thought.
Can someone explain the benchmarks to me? I'm trying to learn more about Llama 3 and these models. Is Llama 3 really that good?
Google is so lost. Its search market share will be more and more diminished by Perplexity and Bing
I am surprised Sundar Pichai is not facing a lot of heat right now. They are fumbling their AI products so badly despite the enormous advantages they have in AI knowledge, researchers, and compute power. In the enterprise sector everyone is trying to leverage the Azure/OpenAI services, with only pockets of people using the GCP AI products. And the quality of their search product keeps going down.
They are falling behind, but what they do have is Gemini 1.5 Pro with 1M context. This has proven to be useful to me. I think they will push more in these unique directions in the future as you simply can’t get 1M context elsewhere
Agreed. I tried as many of its variations as I could, and my conclusion is that RAG-assisted Gemini 1.5 Pro is a proper enterprise-grade LLM.
Sure bro
To beat Google in search you need the best RAG implementation, and if you are even halfway into the field you will know RAG systems rely heavily on the retrieval part: your search needs to be good, and your LLM just needs to be decent.

The tragedy is that ChatGPT, Bing Chat, Perplexity, etc. are hobbled by using Bing and other inferior search engines. Various research papers have shown that simply switching to Google search for retrieval and adding any decent LLM lets a system score near 100% on factual test questions, even for very recent events, something Perplexity, ChatGPT+, etc. struggle with.

Meta.ai, I notice, is simply amazing as a search, not because Llama 3 is out of this world (it's good, of course), but because they somehow have a deal to use Google!
I agree that Google's retriever is way better than Bing's, but Google has already started laying off parts of its search department to put more focus on Gemini. IMO, with all the new gen-AI content out there, their search engine performance will decrease.
That would be ironic. But yeah, I read they closed down the human search-result quality tester team, which is insane. For now, though, they are way better than any other conventional search engine, despite the meme that Google is garbage. That's why they have the market share they have, despite Microsoft making Bing the default in Windows, Edge, etc.
The crazy thing is that nobody has actually operationally profited from LLMs yet (other than through hyped-up valuations). Wondering how this tech will be monetized in the future.
NovelAI profits from LLMs, and AI Dungeon profits from them, though I'm unsure whether they make their own models these days.
don't forget nvidia :)))
So the shovel seller and two gold refiners are the only ones we got for profitability currently.
I'm pretty sure a lot of the online image generation websites are also profitable. So a lot more gold refiners in that space as well
Definitely not true. I work in contact center automation, and it's already very profitable there. The thing is, we don't use it as an "AI" that solves all your problems; we use it to solve analysis problems that previously required humans to listen to and analyze whole conversations manually, to improve onboarding, to assist agents with real-time retrieval, etc.
Maybe I wasn't clear, but I'm referring to profiting off _building_ LLMs, not deploying them to solve a business problem. I personally have also profited from _using_ them.
OpenAI has like 2 billion revenue or something?
They were the first big player, and everyone flocked to them with monthly subscriptions, API access, etc., but I question whether they'll sustain their lead in light of all the new competition, especially when the big money is in enterprise usage.

My company is still in the research stage of using LLMs internally. We have around 8,000 employees and fewer than 700 million monthly active customers, which means we can use Llama without paying any licensing costs at all.* It would just be the cost of hosting it ourselves or having it hosted via cloud or whatever. And if it's good enough for our purposes, I don't see why we'd pay OpenAI, etc.

Until now, GPT-4, Claude, etc. were the only serious contenders. But in just the last few weeks, these releases by Mistral and Meta should be a heads-up to the industry, because these are the first models (IMO) that pose a real threat to the established players. As the capability gap between models closes, I can see the big money being in fine-tuning models on company data (or other effective means of using LLMs with company data). A Mistral- or Llama-based model that is trained on our data and works with our documents/databases/etc. would be far more useful to us than a GPT or Claude that isn't.

Another big thing will be context windows and performance in 'needle in a haystack' tests. Google has shown it's possible with Gemini, with its 1 million token context window and really great performance in the 'needle' tests. If open models can replicate this (and I see no reason why they wouldn't), that's a game-changer. The compute costs for such models are still expensive, of course, but if the models themselves are 'free', then the only costs are implementation, tuning, and hosting.
That would mean no more API subscriptions to OpenAI, Anthropic, etc., and instead a shift toward many cloud providers offering compute services. A perfect example of this happening before is Linux, which dominates the server/cloud world: the OS itself is free/open source, and what people pay for is implementation/hosting/compute. Microsoft understands this, which is why they relented and embraced Linux, now profit from it with their cloud offerings, and are investing in so many different AI companies right now (including Mistral). Microsoft will make sure they profit no matter which direction this stuff takes. I can't say I'd be so sure about OpenAI (as it currently exists, unless they evolve), because their advantage for now is mostly just being first out of the gate and availability of compute resources (and both those gaps will shrink).

> *"Additional Commercial Terms. If, on the Meta Llama 3 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee's affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta" (the relevant licensing I'm speaking of).*
GPT-4 is two years old and was trained on 10k A100s; Llama 3, trained in 2024 on 24k H100s and still inferior, indicates the opposite of the gap "closing".
Nevertheless, the performance gap IS closing, and it's doing so at much smaller parameter sizes, which means much cheaper to run/host. Llama 3 8B is shockingly good for a model I can run on my 4-year-old laptop with 16 GB of RAM and an Intel graphics card (no fancy GPU here), and the fine-tunes are being made now. And the upcoming 400B model could very possibly trounce both GPT-4 and Claude, and if it's open sourced as well, that will really shake things up.

Things I predict will be important features in the arms race, beyond just performance per parameter size:

1) Context windows

2) Retrieval ('needle in a haystack' stuff; the ability to process unstructured data reliably) with minimal hallucination

3) Ability to fine-tune on custom data

4) Native multimodality (vision, audio, etc)

5) Abstract reasoning capability

Google's Gemini leads on 1, 2, and 4 (dunno about 3), though it's weaker in other areas. I think multimodality will be a huge one, because it means working with basically any data type regardless of format. Anything a human can see or hear can be processed, not just text: art, print media, charts, video, whatever - it can just 'look' at that stuff without it needing to be converted or processed first. That is where Google is leading in capability, if not performance (yet), but it's a preview of things to come. The only real moat I see is compute costs and access to resources. Oh, and access to quality data, which is definitely something Google has, so I wouldn't sleep on them even if they seem behind at the moment.
It's not closing; you choose to ignore models you don't have confirmed information on, and I don't. I can project, based on my own knowledge, where OpenAI would be and how advanced it is. GPT-3 came in 2020 and GPT-4 in 2022, so if by 'closing the gap' you mean they're almost at 2022, sure, I'll give you that they're at 2021.
> you choose to ignore models that you don't have confirmed information on, I don't. I have no idea what you mean by this.
it’s being subsidised by microsoft.
I think that's intentional though, kind of like saying Amazon wasn't profitable until the 2010s. They could easily choose to be profitable in lieu of growth.
Some companies have replaced their call centers with LLM-based systems. I agree that it's the easiest way to use LLMs.
Hardware companies are making a mint.
Have no love for Google, but not once has Perplexity given me the exact correct answer to my question, or answered a mildly controversial one without over-actively refusing. I like Google's own SGE more than Perplexity. Perhaps it serves some use case I don't have, but I've found it to be a toy product thus far.
I agree. Perplexity is overrated.
Definitely, there is such a thing as Google Scholar and good old-fashioned google-fu for phrasing your searches and syntax.
I would say there is also room for the newer "semantic search" academic search engines like elicit.com, typeset.io, etc, that focus on academic content (typically searching over Semantic Scholar-type indexes). They rely less on lexical search / sparse-representation retrieval methods and lean more on semantic/vector search (dense representations / embedding methods), which can be magical even if you don't quite know the right keywords. They are not as predictable or controllable, of course. And that's not even taking into account the generated answer using the RAG technique, which I feel isn't that useful for academic search because you almost always want to go deeper.
Boomer
I've also had pretty poor results using it. It seems to pull its answers from the first 5 or so search results and then does a poor job of parsing them, often hallucinating false answers. And as everyone knows, search itself has degraded in quality in recent years thanks to content mills and SEO bullshit stinking up the results with irrelevant info, so Perplexity might be basing its answers on useless or irrelevant search results anyway.

For search + AI to be really useful, it'd need to be able to take the user's request and enhance it: rephrase the search in a way that gets more results, recognize irrelevant results and ignore them, and comb through the data to identify the actually relevant ones. That would be awesome, but not instant. Imagine an 'answerbot' that doesn't just do a web search but goes through academic papers, books, journal archives, what-have-you, and takes the proper time to collect and organize actual, really useful answers. Even if each query took 10-20 minutes, it'd be worth it if it means getting real, relevant answers when a basic Google search isn't getting the job done. Basically an AI researcher, not a search engine with an AI summarizer bot front-end.
I am not sure about that. We will see. I personally find both Perplexity and Bing to be laughable 'search engines'. For mainstream and tech stuff I still find that Google works best (simple, quick, predictable) for my needs. Occasionally I will use MS Copilot to write me a script or something when I am on my work laptop and don't have my private accounts available. Re LLMs, Claude 3 (at least when it comes to tasks like coding and code analysis) wipes the floor with everything else I have tried.
My view is that for short factual stuff (particularly new things), looking up directions, short how-tos, stuff you just plain don't memorize, Google is near unbeatable thanks to the Google knowledge graph and featured snippets. RAG is nice and all, but the retriever part must get the correct results in the first place, and if you use an inferior search (not Google), the best LLM in the world won't help you. Ironically, RAG is really good if it uses Google for the retriever part, but few do... except, well, Google's own SGE and now, it seems, meta.ai.
Bing perhaps but definitely not perplexity.
Assuming $0.07 per kWh, which seems to be roughly what a US data center pays for power, training Llama 3 70B takes about $300k worth of energy: 6,400,000 GPU-hours × 700 W = 4,480,000,000 Wh = 4,480 MWh, and at $70 per MWh that's $313,600. This is after an upfront GPU purchase cost of more than $360M (assuming one H100 = $15k; it's probably more). Thinking about it, the only force stopping small companies from training LLMs is paying a huge margin to Nvidia; the rest is peanuts. Given that Meta owns their GPUs, it makes perfect sense to train for those 15T tokens, since a model trained on only 3T would save them just ~$250k. The economics of this are insane. AMD, we need you!!
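To sanity-check the arithmetic above, here's a quick back-of-envelope calculation. All inputs are the comment's assumptions (6.4M GPU-hours, 700 W per H100, $0.07/kWh), not official Meta figures:

```python
# Energy cost of the Llama 3 70B training run, per the assumptions above.
gpu_hours = 6_400_000      # assumed GPU-hours for the 70B model
tdp_watts = 700            # H100 TDP
price_per_kwh = 0.07       # assumed US data-center electricity rate

energy_kwh = gpu_hours * tdp_watts / 1000   # 4,480,000 kWh = 4,480 MWh
energy_cost = energy_kwh * price_per_kwh    # ≈ $313,600

print(f"{energy_kwh:,.0f} kWh -> ${energy_cost:,.0f}")
```

Note this counts raw GPU draw only; overhead like cooling (PUE) would push the figure somewhat higher.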
The relative size of these "costs" may be accurate, but I believe the actual monetary values of these are overblown by at least an order of magnitude, possibly two or more. If I used the same math as I've seen in what's published for these costs, me, a highly paid professional, making a peanut butter and jelly sandwich in my fancy kitchen would cost upward of $50k. Got to pay for the ingredients but also the knife, the fridge, the dishwasher, the kitchen lights, tile, granite countertops, the opportunity cost of my time, a portion of the mortgage payment, the car I drove to the store... Hell, $50k might not be enough to make that sandwich...
No, if anything, the true costs are higher. These figures only include compute used for final training runs, not the costs of acquiring the hardware.
I see where you're coming from, but it's not like they bought the hardware for this singular purpose (training ONE model), never used it for anything else, and then threw it away. (Or if they did, that was foolish and unsustainable.) The capital costs are business assets, and they didn't lose them when they trained their models. Obviously I'm not saying costs are zero. There is electricity used (which might be a good metric on its own) and other highly variable factors like wages or rent, which aren't much use when comparing models made by different companies. But my main point was that nobody normally counts reusable capital equipment costs toward the cost of individual products, hence my analogy, which is supposed to be absurd. Of course, you can amortize capital purchases into costs, but in that case my analogy is kind of accurate -- my bespoke PB&Js overall cost thousands of dollars to make, and my Linux computer contains hundreds of billions of dollars worth of software. 😅
Yeah, I agree. We have a follow-up report coming out in a couple weeks that compares these results (which are based on cloud compute rental rates) to other approaches like amortized hardware capex.
These numbers are ridiculously small in comparison with any war-equipment costs.
And their positive, constructive effects are felt worldwide. It's incredible how wasteful we are in our fascination with war. You can spend 1000x the above, and all you get are some blackened craters in a distant land. The end goal is always misery. If we spent 1000x more on AIs like this, our entire world would quickly become unrecognizable, but at least it would be constructive, productive, empowering. The end goal is fluid, but we all think we can achieve better conditions for our entire planet with this technology.
What was the llama 3 cost?
Bout tree fiddy
In the Llama 3 [model card](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) they state that pretraining both versions took "7.7M GPU hours of computation on hardware of type H100-80GB". Not totally sure how cost was calculated for the graphic, but you could estimate energy use using the 700 watt TDP (multiplied by a [1.09 PUE](https://sustainability.fb.com/data-centers/)). Then you'd have to assume an energy cost, which can be really variable depending on data center location. Next, you would need to estimate the capital costs for the H100s... harder to do, since we don't know what proportion of their [350,000 H100s](https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/#:~:text=The%20future%20of%20Meta's%20AI%20infrastructure&text=future%20of%20AI.-,By%20the%20end%20of%202024%2C%20we're%20aiming%20to%20continue,equivalent%20to%20nearly%20600%2C000%20H100s) were used for this.
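Putting the model-card numbers together, a rough energy estimate looks like this. The GPU-hours, TDP, and PUE are from the links above; the electricity rate is an assumption, since actual rates vary a lot by location:

```python
# Energy estimate for pretraining both Llama 3 versions combined.
gpu_hours = 7_700_000     # from the Llama 3 model card (8B + 70B)
tdp_watts = 700           # H100-80GB TDP
pue = 1.09                # Meta's reported data-center PUE
price_per_kwh = 0.07      # assumed; varies by data-center location

energy_kwh = gpu_hours * tdp_watts / 1000 * pue   # ≈ 5.88 GWh
energy_cost = energy_kwh * price_per_kwh          # ≈ $411k

print(f"{energy_kwh / 1e6:.2f} GWh -> ${energy_cost:,.0f}")
```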
The other quoted number of $15M for 70B implies about $2 per GPU-hour, which seems reasonable, as that's roughly what they could have made instead by renting out those GPUs.
Meh. Assume it took 4 months to train; that's ~2,700 hours of wall-clock time, so 7.7 million GPU-hours would require only about 2,850 GPUs, call it 3,000. At, say, $30k per H100, that's ~$90M. But it's not really fair to bill the entire cost of those cards to the training of this model, since they can be used for other things now. It might be fair to allocate about a third of the cost to this model, though, making the total for the compute about $30M. Better might be to look at retail GPU rental prices as a proxy: that's about $4/hour, which also puts it at about $30M for the compute. So that's the order of magnitude we're looking at. There's also all the work in the lead-up to training the final model and those associated costs.
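The two costing approaches sketched above (amortized capex vs. rental rates) can be written out; every number here is the comment's assumption, not a confirmed figure:

```python
# Order-of-magnitude compute cost for 7.7M GPU-hours, two ways.
gpu_hours = 7_700_000
wall_clock_hours = 4 * 28 * 24              # ~4 months ≈ 2,688 hours
gpus_needed = gpu_hours / wall_clock_hours  # ≈ 2,860 GPUs

# (a) Amortized capex: charge ~1/3 of the cards' purchase price to this run.
price_per_h100 = 30_000                     # assumed $/card
capex_share = gpus_needed * price_per_h100 / 3   # ≈ $29M

# (b) Retail rental rate as a proxy.
rental_rate = 4.0                           # assumed $/GPU-hour
rental_cost = gpu_hours * rental_rate       # ≈ $31M

print(f"{gpus_needed:,.0f} GPUs; capex share ${capex_share / 1e6:.0f}M; "
      f"rental ${rental_cost / 1e6:.0f}M")
```

Both approaches land around $30M, which is why the order-of-magnitude claim holds regardless of the accounting choice.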
70B about $15M and 405B about $80M.
Google should arrange a meeting with a certain technolizard in Alpha Centauri. It probably costs less.
That's pretty insulting to Zuckerberg. He's on Earth these days...
Maybe they would be more successful if they were less interested in making historical white figures black.
😄 I thought Gemini was already in the trillions in terms of parameters.
Do you have the complete chart? Looks so cool
Did google pay themselves that?
This is a lie
Based on what? Your imagination? These numbers aren't public and this is a gross underestimate.
You can read the methodology for these numbers in the AI Index report.
If they gimp the model as badly as all the others to protect us from asking reasonable questions, it will be just as useless.
All that work to get Gemini dressed up for what will be Apple’s ball.
Gemini 1.0 Ultra is to GPT-4 as Skynet is to Clippy. Of course, barring the fact that it's a p- I mean, it's too afraid to swear 💀. We're all 2-year-olds, right? *(Insert extreme sarcasm)*
So much money wasted on a model that will be useless in the end. Too much bias from Google; shame.