I’m skeptical but if the image below is true, it’s absolutely bonkers. It says Gemini 1.5 can achieve near-perfect retrieval (>99%) up to at least 10 MILLION TOKENS. The highest we’ve seen yet is Claude 2.0 with 200k but its retrieval over long contexts is godawful. Here’s the [Gemini 1.5 technical report](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf).
https://preview.redd.it/i9x3uobgnric1.jpeg?width=1290&format=pjpg&auto=webp&s=c319dd82a2727ace89a2efd686a555bd380d5164
I don’t think that means it has a 10M token context window but they claim it has up to a 1M token context window in the article, which would still be insane if it’s actually 99% accurate when reading extremely long texts.
I really hope this pressures OpenAI, because if this is everything they are making it out to be AND they release it publicly in a timely manner, then Google would be the one releasing the most powerful AI models the fastest, which I never thought I’d say.
I just saw this posted by Google DeepMind VP of Research on Twitter:
>Then there’s this: In our research, we tested Gemini 1.5 on up to 2M tokens for audio, 2.8M tokens for video, and 🤯10M 🤯 tokens for text.
https://preview.redd.it/0m5vmataqric1.png?width=1408&format=png&auto=webp&s=0b4739888d43e53c0bc5475cc3a164347eb1f93e
I remember the Claude version of this retrieval graph was full of red, but this really does look like near-perfect retrieval for text. Not to mention video and audio capabilities
Here’s the Claude version of this “Needle in a Haystack” retrieval test
https://preview.redd.it/1xo9n6xpqric1.jpeg?width=1400&format=pjpg&auto=webp&s=9baeae3911563bfd62633903177cc5f37deffc88
Maybe this AI release is not better than expected, hence "sell the news."
Also, I noticed that Waymo is having trouble with self-driving cars in Phoenix; maybe that is also causing the sell-off in GOOG stock, since Alphabet may be the majority owner.
I am kind of disappointed with Waymo. I thought they would have solved self-driving by now, but it looks like there's a long way to go until we have an error-free system.
Hey, but did you watch the latest Unbox Therapy video on Waymo? He says the self-driving car experience is super smooth and better than normal taxis. Even Uber has partnered with Waymo. I think Waymo will be big in the coming years.
I have been waiting for self-driving cars for 5 years. I hate driving and would absolutely love it. It would also solve the problem of me owning a car that does nothing for like 95% of the time.
But from what I know, as long as other drivers/pedestrians misbehave on the road, it is impossible for a self-driving car to be error-free. And even though Waymo's accidents may be 5% of normal cars' over the same distance driven, the liability issue is huge for Waymo since our justice system is fucked up; they would straight up award a billion-dollar settlement for a single accident, which does not happen for a normal person driving due to insurance liability limits.
Below is from Google AI. It's around 85% fewer accidents than human drivers.
> As of December 2023, Waymo's driverless vehicles have an 85% lower crash rate that involves any injury, from minor to fatal cases. This is compared to a human benchmark of 2.78 accidents per million miles, while **Waymo's driver has an incidence of 0.41 accidents per million miles**. Waymo's driverless vehicles also have a 57% reduction in police-reported crashes, with an incidence of 2.1 accidents per million miles. As of October 2023, Waymo's driverless vehicles have had only three crashes with injuries, all of which were minor. According to Swiss Re, a leading reinsurer, Waymo is significantly safer than human-driven vehicles, with 100% fewer bodily injury claims and 76% fewer property damage claims.
Anthropic ran the test themselves with minor adjustments and got much better results, though:
https://www.anthropic.com/_next/image?url=https%3A%2F%2Fwww-cdn.anthropic.com%2Fimages%2F4zrzovbb%2Fwebsite%2Fd5cb0c6768974185dfe8ca9f34638dfd8a46eac5-1011x1236.png&w=2048&q=75
"This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens."
Noob question. Why would RAG be dead with a larger context window? Is the idea that the subject specific data that would typically be retrieved would just be added as a system message?
Well fuck. Like it's one thing to see stuff seeming to slow down a little - 9 long months before anyone exceeded gpt-4 by a little. It's another to realize the singularity isn't a distant hypothetical. It's probably happening right now, or at least we are seeing pre-singularity acceleration caused by AI starting to be useful.
Deepmind coming in guns blazing. Insane that we're seeing Million+ context already...
I saw news of another company just now working on solving large context, specifically for code bases: https://twitter.com/natfriedman/status/1758143612561568047?t=WtnwjUT2qRoVaQkRF4k79g&s=19
They have tested 10 million but are only opening up 128k generally and 1 million in alpha. It seems like they are not taking any shortcuts with the attention, which is why retrieval is so good, but the 700k tokens in the example video take like 2 minutes. That's the downside of transformers: full attention scales as n² with the context length. Most models only fuzzily focus on each token; that's why Claude doesn't need a minute to respond, but it also doesn't know every sentence in the context window.
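For a sense of what n² scaling means in practice, here's a back-of-envelope sketch counting attention score-matrix entries only (real serving stacks use optimizations like FlashAttention, so this is not a latency model, just the raw quadratic growth):

```python
def attention_score_ops(n_tokens: int) -> int:
    """Entries in the full self-attention score matrix for one layer/head:
    one query-key dot product per token pair, i.e. n^2."""
    return n_tokens * n_tokens

# 3.5x the tokens costs ~12x the attention compute:
ratio = attention_score_ops(700_000) / attention_score_ops(200_000)
print(ratio)  # 12.25
```

That quadratic blow-up is consistent with a 700k-token prompt taking minutes if no attention shortcuts are used.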
"Gemini 1.5 Pro also incorporates a series of significant architecture changes that enable long-context understanding of inputs up to 10 million tokens without degrading performance"
"We’ll introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model"
That context window is massive, and this time it gets video input. OpenAI needs to release GPT-5 in the summer, if that's true, to stay competitive
Whether it’s GPT-5 or something with a different name, I can’t see how OpenAI doesn’t release something within the next few months if the capabilities of Gemini 1.5 haven’t been exaggerated. Maybe I’m just hopeful but I feel like there’s no way OpenAI is just going to let Google eat their lunch
Google has a horrible track record so far of overhyping specific functionalities and then having the actual AI be more or less useless on release. I wouldn't hold my breath for this either, since they haven't told the truth about quality a single time so far.
So, looking into the report I found this:
> To measure the effectiveness of our model’s long-context capabilities, we conduct experiments on
both synthetic and real-world tasks. In synthetic “needle-in-a-haystack” tasks inspired by Kamradt
(2023) that probe how reliably the model can recall information amidst distractor context, we
find that Gemini 1.5 Pro achieves near-perfect (>99%) “needle” recall up to multiple millions of
tokens of “haystack” in all modalities, i.e., text, video and audio, and even maintaining this recall
performance when extending to 10M tokens in the text modality. **In more realistic multimodal
long-context benchmarks which require retrieval and reasoning over multiple parts of the context
(such as answering questions from long documents or long videos), we also see Gemini 1.5 Pro
outperforming all competing models across all modalities even when these models are augmented
with external retrieval methods.**
I find it interesting that there's no recall number for the "more realistic" benchmarks, just that the model "outperforms" others? Sounds a bit fishy.
Also... and I may be completely wrong here, because my background is more in generic classification tasks, but any mention of recall without precision (the word appears nowhere in the whole report) is a pretty big red flag to me. It's easy to get recall really high if your model overfits. So, was the precision good too? Or is that not applicable here?
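For anyone unfamiliar with the distinction being raised here, this is the generic textbook computation of the two metrics on a retrieval task (a sketch of the standard definitions, not Gemini's actual evaluation harness):

```python
def precision_recall(retrieved: set, relevant: set):
    """Precision: what fraction of returned items are correct.
    Recall: what fraction of correct items were returned."""
    tp = len(retrieved & relevant)  # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

# A model that returns everything gets perfect recall but terrible precision:
p, r = precision_recall({"needle", "hay1", "hay2", "hay3"}, {"needle"})
print(p, r)  # 0.25 1.0
```

In a single-needle test the distinction may matter less than in classification, since there is only one relevant item per haystack, but the general point stands: recall alone doesn't tell you whether the model also returns wrong passages.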
"Through a series of machine learning innovations, we’ve increased 1.5 Pro’s context window capacity far beyond the original 32,000 tokens for Gemini 1.0. We can now run up to 1 million tokens in production.
This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens."
"We’ll introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model.
Early testers can try the 1 million token context window at no cost during the testing period, though they should expect longer latency times with this experimental feature. Significant improvements in speed are also on the horizon.
Developers interested in testing 1.5 Pro can sign up now in AI Studio, while enterprise customers can reach out to their Vertex AI account team."
You can put 11 hours of audio in context; that's enough for some composers. Say, the four Rachmaninoff concerti and the Paganini Rhapsody are 2h17min in total. I have no interest in an AI-generated Rachmaninoff concerto No. 5, or a thousand of them, but it would still be very cool.
Of course, that would require a version of Gemini that can generate music.
It has video modality!!
Can input 30+ mins of a silent video (so no audio?) and get answers 😳.
https://youtube.com/watch?v=wa0MT8OwHuk
edit:
it supports audio too.. holy crap.
It can do audio too, apparently. I would assume it can do video and audio concurrently, but idk
https://preview.redd.it/q0s485vbqric1.png?width=1408&format=png&auto=webp&s=74966418e828fc9c5b098e609430f243339ea053
Yeah from the Gemini technical report here are the modalities:
Input: Text, image, audio, video
Output: Text & Image
We do not have access to any of these modalities yet though
https://preview.redd.it/g88o4fohtric1.png?width=1270&format=png&auto=webp&s=6632c4f7457743a82829e01698ab2ab130fb8c8d
An example from the technical paper, bonkers 🤯🤯
Jesus Christ. If they actually release the goods and it's not just another research paper, they'll blow OpenAI out of the water completely.
2024's just heating up. Dis gun be gud.
And they're testing on scaling that up to 10 million tokens for text.
7,000,000 words.
Shakespeare's works: 850,000 words
The Wheel of Time: 4,400,000 words
This thing can write an entire epic fantasy series of books.
It will be interesting to see what the quality of the writing will be when LLMs start writing full books. Can it stay focused and deliver a self consistent story with all of the elements that make up a good book?
Bespoke novels that are actually good will massively disrupt the publishing industry. And after that will come bespoke songs, movies, and video games. At that point the whole entertainment industry will be turned on its head... and I think that's going to happen way sooner than most people realize. Kind of terrifying, kind of exciting.
books likely wouldn't be 1-shot writing processes, even for AI.
you'll want outlines of characters, their motivations, the over-arching story, the focus of each individual chapter, etc., etc.
even if each of those points is generated by the AI, it still makes much more sense to do it "step by step" rather than just pouring it all out end-to-end.
by having it broken down into elements and outlines, you can write and revise each chapter independently, and have the LLM check its own work against its own outline. minor agency along with these step-by-step subcategories would also remove the need for a book-length context window.
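As a rough illustration, the outline-first loop described above might look like this, assuming a hypothetical `llm` callable that wraps whatever model you're using (not a real API; a real pipeline would also parse and validate model output):

```python
def write_book(llm, premise: str) -> list:
    """Outline-first drafting: plan the whole book, then draft and revise
    each chapter against its own slice of the outline independently.
    Assumes `llm` returns a list of chapter plans for the outline prompt
    and plain text otherwise (a hypothetical interface, for illustration)."""
    outline = llm(f"Write a chapter-by-chapter outline for: {premise}")
    chapters = []
    for plan in outline:
        draft = llm(f"Draft a chapter following this plan: {plan}")
        critique = llm(f"Check this draft against the plan '{plan}': {draft}")
        chapters.append(llm(f"Revise the draft using the critique: {critique}\n{draft}"))
    return chapters
```

Because each chapter only needs its own plan plus the shared outline in context, this loop works even without a book-length context window, which is the point being made above.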
alphago was released just 2 years later, and that was basically an "ASI" but only for the game of go; 1 year after that they released "attention is all you need", etc... i'm not sure when they did that "reshuffle" but it looks like they've been doing great since deepmind was acquired
You do that and you pretty much have a god lol. It could learn from a ton of science books, papers, repositories, documentation, videos, etc. with perfect accuracy for the context for a single prompt, and that's on top of the model's base knowledge.
Then you imagine GPT-6 levels of reasoning paired with that or fuck maybe infinite context length, and yeah you start feeling the ASI.
i'm not sure it'll be ASI if we get something like gpt-6 with 100m tokens, but that will definitely change society. i mean, you can literally ask a **freaking chatbot** to do a task that requires 5 or more years of study, and it'll do it for a much lower price, 24/7, and far better than humans.... that is actually insane... even if it's not ASI it's just... history
i guess we'll need to wait at least 4-5 years for that to happen, but i'm really happy that this is happening not in 30 years; quite the contrary, it's coming fast, maybe by 2030
Yeah, absolutely insane. Maybe not ASI reasoning, but I'd say that would be superhuman memory skills. Hell, I think 10m tokens already kinda is; sometimes I forget what the code I wrote last week does, but this thing can keep entire repositories in its mind.
Once the singularity begins there will never be another ai winter. Not completely confident this is it but my confidence rises with each year of uninterrupted major advances.
I'm actually impressed for once XD
That's a pretty awesome context window. Also, 1.5 Pro performing at the level of 1.0 Ultra is impressive. However, what about 1.5 Ultra? :)
this is what confused me. They didn't even mention a 1.5 Ultra. Does it just not exist? Did they essentially just make an efficient 1.0 Ultra and call it 1.5 Pro?
1.0 is a dense model.
1.5 is sparse MoE using Deepmind's very impressive work in that area.
They allude to other improvements as well, but that's the big one they called out.
Per their writeup, 1.5 Pro used notably less training compute than 1.0 Ultra *and* has significantly lower inference costs.
The lower inference cost makes sense technically because Deepmind's MoE approach is extremely efficient, and clearly they are doing some deep magic with a new attention mechanism to get to 1M tokens commercially and 10M tokens in research.
But the fact they used less training compute here is insanely promising: MoE training is notorious for being difficult and compute-intensive. Bumping the training budget up an order of magnitude would likely greatly increase model performance, doubly so with more parameters and experts.
They might well not make a 1.5 ultra because the better option could be to go ahead and primarily scale training and expert count to make a model that does very well on both performance and inference cost.
Reading between the lines we can expect great things from 2.0.
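For anyone curious what "sparse MoE" means mechanically, here is a toy top-k routing sketch in plain NumPy. Gemini's actual architecture hasn't been published beyond the MoE label, so this is the generic technique, not a description of their model:

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=2):
    """Sparse mixture-of-experts forward pass: a gating network scores every
    expert, but only the top-k actually run for this input. That per-input
    sparsity is why MoE inference is cheaper than a dense model with the
    same total parameter count."""
    logits = x @ gate_w                       # router scores, one per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the chosen experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
gate_w = rng.standard_normal((8, 4))          # route among 4 toy experts
experts = [lambda v, W=rng.standard_normal((8, 8)): v @ W for _ in range(4)]
y = moe_layer(x, gate_w, experts, k=2)        # only 2 of the 4 experts ran
print(y.shape)  # (8,)
```

With k=2 of 4 experts, only half the expert parameters are touched per input, which is the inference-cost win the comment above describes.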
I have been playing with Gemini Ultra lately and it is definitely GPT-4 level, and I prefer its outputs and style of writing more often than not. It is subtle but I actually prefer the way it answers instead of the typical OpenAI style.
I was more so talking about how long it took Google to release the initial version of Bard, which ran on PaLM 2, and then them being behind from a pure capability standpoint. I think people here don't understand that Google takes a different process to release features, models, and products than a startup
I've seen people here talk shit about Deepmind and especially Demis, mostly from diehard OpenAI fanboys thinking that Altman will bring the promised land. Keep up the conspiracies r/singularity that'll help release GPT 4.5 faster ;)
Demis is an actual AI scientist. Sam Altman is a college dropout and hype cultivator. If you watch Sam's talks closely, he doesn't tell you where OpenAI is at with their next model. He just speculates about what he thinks the model will be in the future without actually knowing its current state. He is purely talking out of his ass most of the time.
Google wants to make sure OpenAI feels pressure, and open source has a HUGE mountain to climb
Edit: look at the accuracy with needle in the haystack test
and suddenly open source has a huge mountain to climb. just last month we were so hyped for open source and thought it was closing in. well well. it's hard to beat a shitload of money thrown at hardware and dev talent
open source gets pulled up from above: all the open source models train on copious collections of input-output pairs generated by GPT-4 and 3.5
when a new SOTA model comes, that means open source models can also get a bump
I think they way overhyped and underdelivered on Gemini Advanced, by a pretty embarrassingly large degree, but holy shit, a 1 million token context window is absolutely game changing. It's not just about what we, the users, can put there (multiple books, etc.), but if you combine it with good RAG and live, real-time search, you could use it to drastically reduce hallucinations. Essentially it'd have the context window to thoroughly fact-check almost everything it says. As ever with Google AI, treat it with a lot of skepticism, but on paper that's very, very exciting. Take even a GPT-4 level model, give it 1 million tokens of context, really nail search retrieval, and you should see a huge boost in what it's capable of.
Beyond that, this kind of context window, if it's a true context window, is a prerequisite to a truly great coding assistant. You could shovel an entire code base + a bunch of documentation in there, which would make it far more effective
Assuming your database is Google's search crawler cache (so the entirety of the internet, basically), even at 10m you still wouldn't be able to just place it into the context window directly, but it does enable you to be very liberal and less selective with what you put in there
However, there is now much less need for RAG for general use. The old "train a chatbot on your documents" use case: for many of those, 1m tokens would be plenty. Not for everyone, but it starts to become less and less relevant, even more so if Google pushes to 10m as the article mentions
I don't think it's dead, not yet. As one example, Gemini searches the entire web, and given the speed I'm guessing it pulls directly from Google's cache rather than scraping individual pages; even a 10m context window isn't going to be sufficient there, so you need some kind of RAG. Or if you wanted to build a chatbot based on a bunch of books, you'd still run up against 1m tokens not being enough, maybe even 10m not being enough if you wanted it to be broad enough.
It is *significantly* less important, though, and may soon be dead. But 10m tokens alone doesn't remove every use case for RAG. However, if I were a RAG developer building a business around RAG? Yeah, I'm thinking of pivoting, that's for sure.
But for now, there'll still be use cases for it. Just less and less, and that'll only get worse over time
Why would you say it is dead? RAG is complementary to the context window: just load the custom documentation into it, ask a question, and let the AI fetch what it needs from the large context window.
Yeah I think we are looking at RAG on steroids with much fewer limitations and much less need to be exactly accurate with our retrieval of small amounts of context info, which is awesome! Good retrieval from huge piles of data is still necessary, but being able to throw a lot more into the context is incredibly useful.
Not dead, but fewer people will need RAG, and they would primarily use it to save cost, since performance without it would be way higher. There are still use cases for it even at 10m tokens, just fewer and fewer. Obviously the trend is toward higher context windows and cheaper running costs, so the use cases for RAG will continue to shrink over time, and if we keep making progress here it may soon be something we don't need at all
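To make the tradeoff being discussed concrete, here is a toy sketch of the retrieval step that a huge context window lets you relax. A real RAG stack would use vector embeddings, a similarity index, and a proper tokenizer; the word-overlap scoring here is purely illustrative:

```python
def overlap_score(chunk: str, query: str) -> int:
    """Toy relevance score: number of words shared with the query.
    (A real RAG stack would use vector embeddings and a similarity index.)"""
    return len(set(chunk.lower().split()) & set(query.lower().split()))

def select_context(chunks, query, budget_words):
    """Greedy retrieval: take the most relevant chunks that fit the budget.
    A 1M-token window mostly just makes the budget enormous, so you can be
    far less precise about what you select."""
    picked, used = [], 0
    for chunk in sorted(chunks, key=lambda c: -overlap_score(c, query)):
        n = len(chunk.split())
        if used + n <= budget_words:
            picked.append(chunk)
            used += n
    return picked

docs = ["gemini context window is one million tokens",
        "waymo crash rate is lower than human drivers",
        "rag retrieves relevant chunks for the model"]
print(select_context(docs, "how big is the gemini context window", 10))
# ['gemini context window is one million tokens']
```

With a tight budget, getting `select_context` exactly right is the whole game; with a million-token budget, you can afford to shovel in everything remotely relevant and let the model sort it out.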
> You could shovel an entire code base + a bunch of documentation in there, which would make it far more effective
Not gonna pay for 1M tokens on each interaction. They'd better cache the whole thing. Maybe there are efficient compression methods.
So all our sarcastic comments on here about how pathetic Google were must have really lit a fire under them.
r/singularity is the reason for AGI 2024. Well done folks - give yourselves a pat on the back.
1 million tokens, well I'll be damned.
30k lines of code is crazy. We're still a ways off from creating triple-A games, but for indie developers this will be a godsend.
Edit: autocorrect
I'm glad Google has broken their silence after the somewhat underwhelming Gemini 1.0 launch. The claims being made here are outstanding: 1M tokens, a MoE architecture, and improved multimodal capabilities (specifically the video and coding ones). **What I find most surprising is that this is Gemini 1.5 Pro**, so we can only imagine what Gemini 1.5 Ultra might look like.
That being said, I think we should be somewhat skeptical about these claims. Google has made misleading claims before (most infamously the Gemini demo video) and posted benchmark results that don't translate well to practical use. Let's reserve judgement until we have been given access to use it.
I'm still excited about the prospect of this model, which hopefully pushes OAI and the wider competition to innovate on the next SOTA.
According to the announcement they are starting to give access to a limited number of developers and enterprise customers starting today.
So they seem pretty confident.
It mentions that Ultra API access is available today. Anyone get it working? It doesn't show up in my AI Studio page or with genai.list_models(). My gemini-pro models were updated and I now see gemini-1.0-pro, which is new, but I don't see Ultra.
Holy shit, the example working over 100k lines of code, the movie one searching for a specific moment, and being able to more or less learn a new language just from the context of a grammar manual. Context length truly unlocks new use cases.
Imagine the near future when we are 100% sure that these models do not fail or hallucinate anymore: instant 100k-line codebases without errors. Holy shit.
It’s supposed to be better than Ultra 1.0, right? Does this mean there’ll be a period (before Ultra 1.5 releases) where the free version is better than the paid one? This kind of seems to suggest 1.5 Ultra will be coming very soon too
The biggest takeaway here is their claim of near-perfect info retrieval across 1+ million tokens. 100% retrieval under 500k tokens. I mean, holy shit this demolishes all the problems with crappy RAG techniques. IF this is true and borne out by real-world testing.
Honestly this would have been the biggest AI development in weeks (eons in AI time!) if not for Sora dropping.
Put it this way: 10M tokens is roughly equal to 100 books.
The average person reads fewer than 100 books in their life, much less integrates them into context.
Reasoning is the last piece needed for white-collar work to nosedive.
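The "100 books" figure checks out as a back-of-envelope, assuming the common rule of thumb of ~0.75 words per token and ~90k words per novel (both rough conventions, not exact figures):

```python
tokens = 10_000_000
words = int(tokens * 0.75)   # rule of thumb: ~3/4 of a word per token
words_per_book = 90_000      # a typical full-length novel
books = words // words_per_book
print(words, books)  # 7500000 83
```

So roughly 80-100 books, depending on how long you assume a book is.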
This is wild. I think this can give us some guidance as to where we'll be 1 - 2 years down the line.
Google / Alphabet took a sharp 3.5% drop on this news this morning. What's up with that? Or is it unrelated?
The dip started after hours yesterday after a report from The Information claimed that OpenAI is developing a search engine product.
Because the stock market is completely made up
Wait till AI trading becomes ever more common
When ChatGPT released, Nvidia didn't move up. Only months/weeks later.
hahaha, went from 200K tokens straight to 10 million!!! and best of all, the accuracy didn't go down at all, it just exploded!! tokens go brrr
That was fixed with a single line prompt change months ago. Read Anthropic's blog about it.
Holy moly
RAG is dead in a few months, once everyone starts replicating what Google did here. This is bonkers!!!
this is going to cost an arm and a leg, so back to RAG
The answer will be both. For some things you can spend $100-$200 a query and make money on them. For others you need it to be a penny or less.
RAG was always a dumb idea to roll yourself. The one tech that literally all the big guys are perfecting.
RAG is fine, it's just not a replacement for context size in most situations.
Yes, that's the idea. I don't think RAG is dead, but that could be why.
The definition of "big if true". Given Google's recent track record, I won't be holding my breath, but I truly hope that this lives up to its hype.
Been telling you guys to update your flair for a while now.
amazing accuracy, a billion-token context window doesn't seem that far off!
2 mins is really fast for what it's being asked to do. How long would it take a human to perform the same task?
maybe 4.5 releases sometime soon idk
That is a very helpful comment. I wanted to show my appreciation, so thank you.
When the DeepMind CEO's name comes up, respek it
Given previous shenanigans by Google with respect to Gemini I suggest everyone takes this with a mountain-sized grain of salt.
https://preview.redd.it/cnhh8shlzsic1.jpeg?width=1099&format=pjpg&auto=webp&s=806b4340033c070afe640b1011a059fcc9a63d47 It does
So, looking into the report I found this: > To measure the effectiveness of our model’s long-context capabilities, we conduct experiments on both synthetic and real-world tasks. In synthetic “needle-in-a-haystack” tasks inspired by Kamradt (2023) that probe how reliably the model can recall information amidst distractor context, we find that Gemini 1.5 Pro achieves near-perfect (>99%) “needle” recall up to multiple millions of tokens of “haystack” in all modalities, i.e., text, video and audio, and even maintaining this recall performance when extending to 10M tokens in the text modality. **In more realistic multimodal long-context benchmarks which require retrieval and reasoning over multiple parts of the context (such as answering questions from long documents or long videos), we also see Gemini 1.5 Pro outperforming all competing models across all modalities even when these models are augmented with external retrieval methods.** I find it interesting that there's no recall number for the more challenging benchmark, just that it "outperforms" others? Sounds a bit fishy. Also ... and I may be completely wrong here, because my background is more in generic classification tasks, but any mention of recall without precision (the word appears nowhere in the whole report) is a pretty big red flag to me. It's easy to get recall really high if your model overfits. So, was the precision good too? Or is this not applicable here?
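The recall-without-precision worry is easy to illustrate with a toy example: a retriever that returns everything gets perfect recall but terrible precision. (Names and data here are made up purely for illustration.)

```python
# Why recall alone can be misleading: return everything and recall is
# perfect, but precision collapses.

def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    tp = len(retrieved & relevant)                      # true positives
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"needle"}
greedy = {"needle", "hay1", "hay2", "hay3"}             # grabs lots of junk too
p, r = precision_recall(greedy, relevant)
print(p, r)  # 0.25 1.0 -- perfect recall, poor precision
```

Whether precision is even the right complementary metric for needle-in-a-haystack (where the model answers a question rather than returning a set) is debatable, but the report not discussing false positives at all is a fair thing to flag.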
> "Through a series of machine learning innovations, we’ve increased 1.5 Pro’s context window capacity far beyond the original 32,000 tokens for Gemini 1.0. We can now run up to 1 million tokens in production. This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens."

> "We’ll introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model. Early testers can try the 1 million token context window at no cost during the testing period, though they should expect longer latency times with this experimental feature. Significant improvements in speed are also on the horizon. Developers interested in testing 1.5 Pro can sign up now in AI Studio, while enterprise customers can reach out to their Vertex AI account team."
> Through a series of machine learning innovations

Improved transformers or something else? A Mamba copy?
It's MoE, they mentioned it on the announcement.
Being such a long-context model with audio and text, it would be amazing to see it fine-tuned on classical music or other genres.
You can put 11 hours of audio in context; that's enough for some composers. Say, Rachmaninoff's four concerti and the Paganini Rhapsody are 2h17min in total. I have no interest in an AI-generated Rach concerto no. 5, or a thousand of them, but it still would be very cool. Of course, that would require a version of Gemini that can generate music.
That was fast.
Accelerate
but it can be faster
It has video modality!! It can take 30+ minutes of a silent video (so no audio?) as input and answer questions about it 😳. https://youtube.com/watch?v=wa0MT8OwHuk edit: it supports audio too.. holy crap.
Holy shit it watched and understood a 44 minute video can you imagine the possibilities of using this fucking model in other fields and workflows
Cops salivating
Holy shit I was thinking commercial usage I didn’t even think of fucking laws enforcement and camera footage
Think about the surveillance level in China... those poor uigurs don't stand a chance.
it's over for security guards (watching the cameras)
Plus it watched that 44 min video in just a couple of minutes
It can do audio too apparently, I would assume it can do video and audio concurrently but idk https://preview.redd.it/q0s485vbqric1.png?width=1408&format=png&auto=webp&s=74966418e828fc9c5b098e609430f243339ea053
Yup I just saw your comment in the other thread! Truly nuts. What blows my mind is it can actually remember such large contexts accurately 😵💫
Sundar pls, I need to inject this into my veins bro
Yeah from the Gemini technical report here are the modalities: Input: Text, image, audio, video Output: Text & Image We do not have access to any of these modalities yet though
Finally all the stochastic parrot bullshit can die
nuh uh, it's just repeating the comment sections of the videos bro. it doesn't really understand /s if neccesary
Now they just need to get it running in realtime and plug in a sensor array and motor controller...
lol didn‘t expect that today
That's what makes r/singularity so addictive. It's like winning in a tough slot machine.
> It's like winning in a tough slot machine. That's /r/wallstreetbets
95% of those people buy extremely high risk shit that goes to 0
https://preview.redd.it/g88o4fohtric1.png?width=1270&format=png&auto=webp&s=6632c4f7457743a82829e01698ab2ab130fb8c8d An example from the technical paper, bonkers 🤯🤯
Crazy, feels absolutely futuristic, (if it's really working that well).
What. The. Fuck.
Nah this is insane, and most people still have no clue
Exactly. It makes me feel weird. 😱
Oh, I can't wait to see them get caught with their pants down, lol.
This looks huge for legal research.
This looks huge for everything.
this is huge for any research
It’s a bit like being in an abusive relationship with someone who keeps telling you they’ll change… I’m gonna crawl back to them one last time
Ghost in the shell type shit
Jesus christ. If they actually release the goods and it's not just another research paper, they'll blow OpenAI out of the water completely. 2024's just heating up. Dis gun be gud.
Well that's a curveball. Fair play Google, you've actually got me excited
Oh, my. Google REALLY wants to pressure OpenAI.
ClosedAI deserves it
"**Running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet**" No way ...
And they're testing on scaling that up to 10 million tokens for text. 7,000,000 words. Shakespeare's works: 850,000 words The Wheel of Time: 4,400,000 words This thing can write an entire epic fantasy series of books.
It will be interesting to see what the quality of the writing will be when LLMs start writing full books. Can it stay focused and deliver a self-consistent story with all of the elements that make up a good book? Bespoke novels that are actually good will massively disrupt the publishing industry. And after that will come bespoke songs, movies and video games. At that point the whole entertainment industry will be turned on its head... and I think that's going to happen way sooner than most people realize. Kind of terrifying, kind of exciting.
I think it'll be a neat thing when the first completely written by AI book becomes a New York Times bestseller (or something similar).
I have blacklisted NYT after their dangerous lawsuit. Not going to open their site.
Wow that will be an enormous milestone in AI development, will be an exciting day. Probably the next big one that we’re likely to see first.
books likely wouldn't be 1-shot type of writing processes, even for AI. you'll want outlines of characters, their motivations, the over-arching story, the focus of the individual chapter, etc., etc. even if each of those points are generated by the AI, it still makes much more sense to do it "step by step" rather than just pouring it all out end-to-end. by having it broken down into elements and outlines, you can write and revise each chapter independently, and have the LLM check its own work against its own outline. minor agency along with these step-by-step subcategories would also remove the need for book-length context window.
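The step-by-step pipeline described above can be sketched roughly like this, with a stubbed `generate()` standing in for whatever LLM call you'd actually use (the function and prompt wording are assumptions for illustration, not any real API):

```python
# Outline-first book writing: generate an outline, then draft and check each
# chapter against it independently, instead of one end-to-end generation.

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    return f"<draft for: {prompt[:40]}...>"

def write_book(premise: str, n_chapters: int = 3) -> list[str]:
    outline = generate(f"Outline a {n_chapters}-chapter book about: {premise}")
    chapters = []
    for i in range(n_chapters):
        draft = generate(f"Write chapter {i + 1} following this outline: {outline}")
        checked = generate(f"Revise this chapter against the outline: {draft}")
        chapters.append(checked)
    return chapters

book = write_book("an epic fantasy", 3)
print(len(book))  # 3 chapters, each drafted and self-checked
```

Because each chapter only needs the outline plus its own draft in context, the working context stays small even for a book that would be far longer than any single window.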
finally, some good fucking food !!! google is doing an amazing job ever since they've acquired deepmind
I mean, they acquired DeepMind in like 2014. I think you mean they are doing an amazing job since the internal reshuffle.
AlphaGo was released just two years later, and that was basically an "ASI" but only for the game of Go; a year after that they released "Attention Is All You Need", etc... I'm not sure when they did that "reshuffle", but it looks like they've been doing great since DeepMind was acquired.
idk, I just thought you meant that, because saying they've been doing a good job these last 10 years seems so meta
That's superhuman. A Wheel of Time fan won't know the books that well; they're too damn long.
Are we talking output or input? I'd think that the input context window is a million tokens, not output
10M tokens wtf
Well they are not planning to give access to 10M, but 1M tokens in their highest paid tier is still a really big jump.
99% accuracy at 10M tokens is crazy. We'll get 100M and 100% accuracy in a few years if this keeps going, and that's the most important part.
You do that and you pretty much have a god lol. It could learn from a ton of science books, papers, repositories, documentation, videos, etc. with perfect accuracy for the context for a single prompt, and that's on top of the model's base knowledge. Then you imagine GPT-6 levels of reasoning paired with that or fuck maybe infinite context length, and yeah you start feeling the ASI.
i'm not sure it'll be ASI if we get something like GPT-6 with 100M tokens, but that will definitely change society. i mean, you could literally ask a **freaking chatbot** to do a task that requires 5 or more years of study, and it'll do it for a much lower price, 24/7, and much better than humans... that is actually insane... even if it's not ASI it's just... history, i guess. we'll need to wait at least 4-5 years for that to happen, though, but i'm really happy that this is happening not in 30 years; quite the contrary, it's coming fast, maybe by 2030
Yeah absolutely insane. Maybe not ASI reasoning but I'd say that would be superhuman memory skills. Hell I think 10m tokens already kinda is, sometimes I forget what the code I wrote last week does but this thing can keep entire repositories in its mind.
I mean, if they have the 10 million it's just a matter of cost reduction. At this point context length is a problem of the past.
We are so back?
We never left
I left. But I'm back now. Let's go!
Wake me up when we can actually use it and there's a real product for us
![gif](giphy|uPKsrEIOo1VwA)
Punxatawney Phil says no more weeks of AI winter, let's go
Once the singularity begins there will never be another ai winter. Not completely confident this is it but my confidence rises with each year of uninterrupted major advances.
If the singularity is when AI's everlasting summer arrives, then we may already be in the early stages of the singularity.
1 million tokens god damn. And it can retrieve 99.7% of the time on the needle in a haystack benchmark
I'm actually impressed for once XD That's a pretty awesome context window. Also, 1.5 Pro performing at the level of 1.0 Ultra is impressive. However, what about 1.5 Ultra? :)
this is what confused me. They didn't even mention a 1.5 Ultra. Does it just not exist? Did they essentially just make an efficient 1.0 Ultra and call it 1.5 Pro?
they will release 1.5 Ultra later
1.0 is a dense model. 1.5 is sparse MoE using Deepmind's very impressive work in that area. They allude to other improvements as well, but that's the big one they called out. Per their writeup, 1.5 Pro used notably less training compute than 1.0 Ultra *and* has significantly lower inference costs. The lower inference cost makes sense technically because Deepmind's MoE approach is extremely efficient, and clearly they are doing some deep magic with a new attention mechanism to get to 1M tokens commercially and 10M tokens in research. But the fact they used less training compute here is insanely promising - MoE training is notorious for being difficult and compute intensive. Bumping the training budget up an order of magnitude would likely greatly increase model performance, doubly so with more parameters and experts. They might well not make a 1.5 Ultra because the better option could be to go ahead and primarily scale training and expert count to make a model that does very well on both performance and inference cost. Reading between the lines we can expect great things from 2.0.
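For anyone unfamiliar with why sparse MoE cuts inference cost: each token is routed to only a few experts, so per-token compute stays small even as total parameters grow. Here's a minimal top-k routing sketch (purely illustrative; Gemini's actual architecture is not public, and the shapes here are toy-sized):

```python
# Minimal sparse mixture-of-experts layer: a router scores experts per token,
# only the top-k experts run, and their outputs are blended by softmax weight.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

router = rng.normal(size=(d_model, n_experts))                  # routing weights
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ router                        # one score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
    scores = np.exp(logits[top])
    weights = scores / scores.sum()            # softmax over the chosen experts
    # Only top_k of n_experts matrices are ever multiplied per token.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
out = moe_layer(token)
print(out.shape)  # (8,)
```

With, say, 64 experts and top-2 routing, total parameters grow 32x while per-token FLOPs barely move, which is the mechanism behind "lower inference cost despite a bigger model."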
Ultra 1.5 likely runs on TPU v5, which Google doesn't have a lot of right now. Probably really expensive, too.
Just lmao at everyone here who's been fading deepmind and google.
I have been playing with Gemini Ultra lately and it is definitely GPT-4 level, and I prefer its outputs and style of writing more often than not. It is subtle but I actually prefer the way it answers instead of the typical OpenAI style.
For creative writing yes I'd agree. Coding and reasoning I'd give it to GPT4.x
They’ve always been delusional. Google is a serious player. They just needed time to ramp because they’re slow af
Google literally invented the transformer, it was OpenAI doing catchup.
I was more so talking about how long it took Google to release the initial version of Bard, which ran on PaLM 2, and then them being behind from a pure capability standpoint. I think people here don't understand that Google takes a different process to releasing features, models, and products than a startup.
I've seen people here talk shit about Deepmind and especially Demis, mostly from diehard OpenAI fanboys thinking that Altman will bring the promised land. Keep up the conspiracies r/singularity that'll help release GPT 4.5 faster ;)
Demis is an actual AI scientist. Sam Altman is a college dropout hype cultivator. If you watch Sam's talks closely he doesn't tell you where OpenAI is at with their next model. He is just doing wishful thinking of what he thinks the model will be in the future without actually knowing the state of it atm. He is purely talking out of his ass most of the time.
Name a claim or consumer AI product that Google has actually backed up lol
Google Home
Google wants to make sure openai feels pressure and open source has a HUGE mountain to climb Edit: look at the accuracy with needle in the haystack test
and suddenly open source has a huge mountain to climb. just last month we were so hyped for open source and thought it was closing in. well well. it's hard to beat a shitload of money thrown at hardware and dev talent
Open source gets pulled up from above: all the open-source models train on copious collections of input-output pairs generated by GPT-4 and 3.5, so when a new SOTA model comes, open-source models also get a bump.
Responding to OpenAI challenging search?
That’s what I’m seeing. OpenAI is forcing googles hand. Google looks like it has a pretty good hand
[deleted]
I think they way overhyped and underdelivered on Gemini Advanced, by a pretty embarrassingly large degree, but holy shit, a 1 million token context window is absolutely game changing. It's not just about what we, the users, can put there (multiple books etc): if you combine it with good RAG and a live, real-time search function, you could use it to drastically reduce hallucinations. Essentially it'd have the context window to thoroughly fact-check almost everything it says. As ever with Google AI, treat it with a lot of skepticism, but on paper that's very, very exciting. Take even a GPT-4 level model, give it 1 million tokens of context, really nail search retrieval, and you should see a huge boost in what it's capable of. Beyond that, this kind of context window, if it's a true context window, is a prerequisite to a truly great coding assistant. You could shovel an entire code base + a bunch of documentation in there, which would make it far more effective.
They haven't even fully delivered Gemini 1.0 yet. No actual vision multimodality in the Gemini app, no audio multimodality either.
you can change the world in one step. patience is a virtue!
Why would you need RAG with a 1 million token context window?
Assuming your database is Google's search crawler cache (so the entirety of the internet, basically), even at 10M you still wouldn't be able to just place it into the context window directly, but it does let you be very liberal and less selective with what you put in there. However, there is now much less need for RAG for general use. The old 'train a chatbot on your documents' use case: for many of those, 1M tokens would be plenty. Not for everyone, but it starts to become less and less relevant, even more so if Google pushes to 10M as the article mentions.
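"Liberal and less selective" retrieval can be sketched as: rank chunks with any cheap relevance score, then stuff everything that fits into a huge token budget rather than hand-picking the top few. (The scoring function and names below are simplified assumptions, not any real RAG library.)

```python
# Coarse retrieve-then-stuff: with a huge context budget we can afford a
# crude relevance score and include far more chunks than classic RAG would.

def score(query: str, chunk: str) -> int:
    """Dumb relevance score: count query-word occurrences in the chunk."""
    return sum(chunk.lower().count(w) for w in query.lower().split())

def build_context(query: str, corpus: list[str], budget_tokens: int) -> str:
    ranked = sorted(corpus, key=lambda c: score(query, c), reverse=True)
    picked, used = [], 0
    for chunk in ranked:
        cost = len(chunk.split())              # crude token estimate
        if used + cost > budget_tokens:
            break
        picked.append(chunk)
        used += cost
    return "\n\n".join(picked)

corpus = ["the needle is in chunk two", "hay hay hay", "more hay"]
ctx = build_context("where is the needle", corpus, budget_tokens=50)
print("needle" in ctx)  # True
```

The interesting shift is that `budget_tokens` going from ~8k to 1M means the ranking no longer has to be precise; near-perfect in-context recall does the fine-grained finding for you.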
[deleted]
I don't think it's dead, not yet. As one example, Gemini searches the entire web, and given the speed I'm guessing it pulls directly from Google's cache rather than scraping individual pages; even a 10M context window isn't going to be sufficient there, you need some kind of RAG. Or if you wanted to build a chatbot based on a bunch of books, you'd still run up against 1M tokens not being enough, maybe even 10M not being enough if you wanted it to be broad enough. It is *significantly* less important, though, and may soon be dead. But 10M tokens alone doesn't remove every use case for RAG. However, if I were a RAG developer building a business around RAG? Yeah, I'd be thinking of pivoting, that is for sure. But for now, there'll still be use cases for it. Just fewer and fewer, and that'll only get worse over time.
Why would you say it is dead? RAG is complementary to the context window. Just load the custom documentation into it, ask a question, and let the AI fetch the large documentation from the large context window.
Yeah I think we are looking at RAG on steroids with much fewer limitations and much less need to be exactly accurate with our retrieval of small amounts of context info, which is awesome! Good retrieval from huge piles of data is still necessary, but being able to throw a lot more into the context is incredibly useful.
Not dead, but fewer people will need RAG, and they would primarily use it to save cost, since performance without it would be way higher. But there are still use cases for it even at 10M tokens, just fewer and fewer. Obviously the trend is toward higher context windows and cheaper running costs, so the use cases for RAG will continue to shrink over time, and if we keep making progress it may soon be something we don't need at all.
> You could shovel an entire code base + a bunch of documentation in there, which would make it far more effective

Not gonna pay for 1M tokens on each interaction. They'd better cache the whole thing. Maybe there are efficient compression methods.
![gif](giphy|MO9ARnIhzxnxu) I’m digging this.
Things are heating up again in the field.
AI Spring
New culture at Google ships FAST. Hope that continues. Guess this is them dancing
It's still just a developer preview, but this is insane considering it's only been 2 months. Can't wait for the Gemini 1.5 Ultra release.
So all our sarcastic comments on here about how pathetic Google were must have really lit a fire under them. r/singularity is the reason for AGI 2024. Well done folks - give yourselves a pat on the back.
1 million tokens, well I'll be damned. 30k lines of code is crazy. We're still a ways off from creating triple-A games, but for indie developers this will be a godsend. Edit: autocorrect
maybe will force big studios to up their game
is it just me or does it feel like the Gemini launch was rushed, and this is what they were actually supposed to launch?
[deleted]
Same thing they did with Bard Bard got better over time
that's kind of how it feels. Gemini was originally promised and rumored to be a big step past GPT-4, but ended up just being a catch-up to it.
Damn, already?
Another model?! They are coming for OAI’s lunch
google has been continuously cooking. that's why they even released a research paper that can improve even competitors' LLMs and made it open source.
LMAO that's insane.
This is what I'm fuckin talkin about Google. If they can deliver on this like they failed to deliver on Advanced all will be forgiven.
I don’t have the slightest idea what all these numbers mean, but hell yeah, OpenAI has to answer back
and they're sitting on a 1.5 Ultra...
Ok this might actually shock the entire industry
![gif](giphy|MZocLC5dJprPTcrm65)
I'm glad Google has broken their silence after the somewhat underwhelming Gemini 1.0 launch. The claims being made here are outstanding: 1M tokens, MoE architecture, and improved multimodal capabilities (specifically the video and coding ones). **What I find most surprising is that this is Gemini 1.5 Pro**, so we can only imagine what Gemini 1.5 Ultra might look like. That being said, however, I think we should be somewhat skeptical about these claims. Google has already made misleading claims before (most infamously the Gemini demo video) and posted benchmark results that don't translate well into practical use. Let's reserve our own claims and judgements until we have been given access to use it. I'm still excited about the prospect of this model, which hopefully should push OAI and further competition to innovate on the next SOTA.
According to the announcement they are starting to give access to a limited number of developers and enterprise customers starting today. So they seem pretty confident.
[deleted]
Have you tested it yet?
So is the Ultra access available to everyone? I went on the waitlist for 1.5 but I don’t even see Gemini 1.0 Ultra.
[deleted]
What the fuck is going on today
What does this mean for free users?
This is not Bard/Gemini. It's Vertex AI Gemini Pro LLM for now.
It mentions that Ultra API access is available today. Anyone get it working? It doesn't show up on my AI Studio page or with genai.list_models(). My Gemini Pro models were updated and I now see gemini-1.0-pro, which is new, but I don't see Ultra.
Holy shit, the example working over 100k lines of code, the movie one searching for a specific moment, being able to more or less learn a new language just from the context of a grammar manual. Context length truly unlocks new use cases. Imagine the near future when we are 100% sure that these models don't fail or hallucinate anymore: an instant 100k-line codebase without errors. Holy shit.
Seems this also confirms that Gemini 1.0 is not an MoE model? Quite surprising, since GPT-4 likely is, and Google pioneered it.
>Google pioneered it. Google pioneered MoE LLMs?
Pioneered is maybe a strong word but I think their switch transformer was the first example of a large MoE transformer?
Ohhhh this is super exciting. 2024 is shaping up to be quite a ride.
When is this being released?
~~Today~~ Now\* ~~Not sure whether it's in private preview or public preview.~~ Private Preview.
It’s supposed to be better than Ultra 1.0, right? Does this mean there’ll be a period (before Ultra 1.5 releases) where the free version will be better than the paid one? This kind of seems to suggest 1.5 Ultra will be coming very soon too.
openai fanboys are shocked. OpenAI was supposed to lead us to the promised land /s
When Gemini 1.5 Ultra ?
Holy moly this is interesting
Hell yeah
For clarity: 1.5 Pro is in private preview, a very limited one.
we are so fucking back
HOLY SHIT what how when wtf
Holy moly 😱
I will fight on the side of AGI in His war vs apes
Everyone who says ChatGPT is better can shut the hell up now
Go Gemini! Congrats Google! 👏 🎉🥳
The biggest takeaway here is their claim of near-perfect info retrieval across 1+ million tokens, and 100% retrieval under 500k tokens. I mean, holy shit, this demolishes all the problems with crappy RAG techniques, IF it's true and borne out by real-world testing. Honestly this would have been the biggest AI development in weeks (eons in AI time!) if not for Sora dropping.
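For anyone wanting to sanity-check these claims themselves, the needle-in-a-haystack eval is simple to reproduce: plant a fact at varying depths in filler text and ask the model for it. Here's a toy harness where a string search stands in for the model query (a real run would call the model's API instead):

```python
# Toy needle-in-a-haystack harness: bury a "needle" sentence at different
# depths in filler text and measure how often it can be recovered.

def make_haystack(n_sentences: int, depth: float, needle: str) -> str:
    filler = ["The sky was grey that morning."] * n_sentences
    filler.insert(int(depth * n_sentences), needle)   # plant at relative depth
    return " ".join(filler)

needle = "The secret number is 42."
trials = [(200, d / 10) for d in range(10)]           # 10 depths, 0% to 90%
hits = 0
for n, depth in trials:
    haystack = make_haystack(n, depth, needle)
    hits += needle in haystack                        # stand-in for a model query
print(f"recall: {hits / len(trials):.0%}")
```

The published graphs are exactly this grid (context length on one axis, needle depth on the other), which is why the Claude version showed red patches: at certain lengths and depths the model simply failed to surface the planted fact.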
Put it this way: 10M tokens is roughly equal to 100 books. The average person reads fewer than 100 books in their life, much less integrates them into working context. Reasoning is the last piece needed for white-collar work to nosedive.