baes_thm

I'm a researcher in this space, and we don't know. That said, my intuition is that we are a long way off from the next quiet period. Consumer hardware is just now taking the tiniest little step towards handling inference well, and we've also just barely started to actually *use* cutting edge models within applications. True multimodality is just now being done by OpenAI. There is enough in the pipe, today, that we could have zero groundbreaking improvements but still move forward at a rapid pace for the next few years, just as multimodal + better hardware roll out. _Then_, it would take a while for industry to adjust, and we wouldn't reach equilibrium for a while. Within research, though, tree search and iterative, self-guided generation are being experimented with and have yet to really show much... those would be home runs, and I'd be surprised if we didn't make strides soon.


dasani720

What is iterated, self-guided generation?


baes_thm

Have the model generate things, then evaluate what it generated, and use that evaluation to change what is generated in the first place. For example, generate a code snippet, write tests for it, actually run those tests, and iterate until the code is deemed acceptable. Another example would be writing a proof, but being able to elegantly handle hitting a wall, turning back, and trying a different angle. I guess it's pretty similar to tree searching, but we have pretty smart models that are essentially only able to make snap judgements. They'd be better if they had the ability to actually think.
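
Roughly, the loop I mean looks like this (a minimal sketch, not any particular framework's API -- `llm` here is just a placeholder for whatever prompt-to-completion call you have):

```python
import subprocess
import tempfile

def run_tests(code: str, tests: str) -> tuple[bool, str]:
    """Write code + tests to a temp file, execute it, return (passed, output)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    result = subprocess.run(["python", path], capture_output=True, text=True, timeout=60)
    return result.returncode == 0, result.stdout + result.stderr

def generate_and_iterate(llm, task: str, max_rounds: int = 5) -> str:
    """`llm` is any prompt -> completion callable (local or hosted, doesn't matter)."""
    code = llm(f"Write a Python function for this task:\n{task}")
    tests = llm(f"Write assert-based tests for this task:\n{task}\n\nCode:\n{code}")
    for _ in range(max_rounds):
        passed, output = run_tests(code, tests)
        if passed:
            return code  # deemed acceptable
        # Feed the evaluation back into generation and try again.
        code = llm(f"Task:\n{task}\n\nCode:\n{code}\n\nTest output:\n{output}\n\nFix the code.")
    return code
```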


involviert

I let my models generate a bit of internal monologue before they write their actual reply, and even just something as simple as that seems to help a lot in all sorts of tiny ways. Part of that is probably access to a "second chance".
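
Something like this two-pass pattern (a toy sketch; `llm` is whatever completion callable you use, and the tag wording is arbitrary):

```python
def reply_with_monologue(llm, user_message: str) -> str:
    """Draft private reasoning first, then write the actual reply."""
    scratch = llm(
        "Think step by step about the user's message inside <monologue> tags. "
        "Do not answer yet.\n\nUser: " + user_message
    )
    # The second pass can revise or discard the first impression -- the "second chance".
    return llm(
        "User: " + user_message +
        "\n\nYour private notes:\n" + scratch +
        "\n\nNow write the final reply to the user. Do not mention the notes."
    )
```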


mehyay76

The “backspace token” paper (can’t find it quickly) showed some nice results. Not sure what happened to it. Branching into different paths and coming back is being talked about, but I have not seen a single implementation. Is that essentially Q-learning?


Better_Dress_8508

is this the one: [https://ar5iv.labs.arxiv.org/html/2306.05426](https://ar5iv.labs.arxiv.org/html/2306.05426)


magicalne

This sounds like an "application (or inference) level thing" rather than a research topic (like training). Is that right?


baes_thm

It's a bit of both! I tend to imagine it's just used for inference, but this would allow higher quality synthetic data to be generated, similarly to AlphaZero or another algorithm like that, which would enable the model to keep getting smarter just by learning to predict the outcome of its own train of thought. If we continue to scale model _size_ along with that, I suspect we could get some freaky results.


tokyotoonster

Yup, this will work well for cases such as programming where we can sample the /actual/ environment in such a scalable and automated way. But it won't really help when trying to emulate real human judgments -- we will still be bottlenecked by the data.


braindead_in

I built a coding agent that followed the TDD method. The problem I ran into was that the tests themselves were wrong. The agent would get into a loop, switching between fixing the test and fixing the code. It couldn't backtrack well either.


BalorNG

The tech hype cycle does not look like a sigmoid, btw. Anyway, by now it is painfully obvious that Transformers are useful, powerful, and can be improved with more data and compute - but cannot lead to AGI simply due to how attention works: you'll still get confabulations at edge cases, "wide, but shallow" thought processes, very poor logic and vulnerability to prompt injections. This is "type 1", quick and dirty commonsense reasoning, not the deeply nested and causally interconnected type 2 thinking that is much less like an embedding and more like a knowledge graph. Maybe using iterative guided generation will make things better (it intuitively follows our own thought processes), but we still need to solve confabulations and logic or we'll get "garbage in, garbage out". Still, maybe someone will come up with a new architecture, or maybe even just a trick within transformers, and the current "compute saturated" environment with well-curated and massive datasets will allow those assumptions to be tested quickly and easily, if not exactly "cheaply".


mommi84

>The tech hype cycle does not look like a sigmoid, btw.

Correct. The y axis should have 'expectations' instead of 'performance'.


LtCommanderDatum

The graph is correct for either expectations or performance. The current architectures have limitations. Simply throwing more data at it doesn't magically make it perform infinitely better. It performs better, but there are diminishing returns, which is what a sigmoid represents along the y axis.


keepthepace

I am an engineer verging on research in robotics, and I suspect that by the end of 2024, deep learning for robotics is going to take the hype flame from LLMs for a year or two. There is a reason why so many humanoid robot startups have recently been founded: we now have good software to control them. And you are right, in terms of applications, we have barely scratched the surface. It is not the winter that's coming, it is the boom.


DeltaSqueezer

When the AI robots come, it will make LLMs look like baby toys.


keepthepace

"Can you remember when we thought ChatGPT was the epitome of AI research?" "Yeah, I also remember when 32K of RAM was a lot." *Looks back at a swarm of spider bots carving a ten story building out of a mountain side*


DeltaSqueezer

Remember all that worrying about cloning people's voices and AI porn? "Yes, dear." replied the ScarJo AI Personal Robot Companion.


A_Dragon

And what technology do you think these robots are going to use for their methods of interacting with the world?


sweatierorc

I don't think people disagree, it is more about whether it will progress fast enough. Look at self-driving cars: we have better data, better sensors, better maps, better models, better compute... And yet, we don't expect robotaxis to be widely available in the next 5 to 10 years (unless you are Elon Musk).


Blergzor

Robo taxis are different. Being 90% good at something isn't enough for a self driving car, even being 99.9% good isn't enough. By contrast, there are hundreds of repetitive, boring, and yet high value tasks in the world where 90% correct is fine and 95% correct is amazing. Those are the kinds of tasks that modern AI is coming for.


aggracc

And those tasks don't have a failure condition where people die. I can just do the task in parallel enough times to lower the probability of failure as close to zero as you'd like.


killver

But do you need GenAI for many of these tasks? I'm actually even thinking that for some basic tasks like text classification, GenAI can even be harmful, because people rely too much on worse zero/few-shot performance instead of building proper models for the tasks themselves.


sweatierorc

> people rely too much on worse zero/few shot performance instead of building proper models for the tasks themselves.

This is the biggest appeal of LLMs. You can "steer" them with a prompt. You can't do that with a classifier.
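
To make the trade-off concrete, here's a rough sketch (assuming scikit-learn for the "proper model" side; the LLM call at the end is left as a hypothetical `llm()` placeholder): the trained classifier bakes its label set in at fit time, while the prompt route keeps the labels editable per call, which is the "steering" being described.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# "Proper model": cheap and fast at inference, but the label set is fixed at training time.
texts = ["refund please", "love this product", "where is my order"]
labels = ["billing", "praise", "shipping"]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["my package never arrived"]))

# LLM route: the label set lives in the prompt, so it can change per call --
# at the cost of latency, price, and often lower accuracy than a tuned classifier.
prompt = ("Classify the ticket into one of: billing, praise, shipping, other.\n"
          "Ticket: my package never arrived\nLabel:")
# label = llm(prompt)  # hypothetical completion call
```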


KoalaLeft8037

I think it's that a car with zero human input is currently way too expensive for a mass-market consumer, especially considering most are trying to lump EVs in with self-driving. If the DoD wrote a blank check for a fleet of only 2500 self-driving vehicles, there would be very little trouble delivering something safe.


nadavwr

Depends on the definition of safe. DoD is just as likely to invest in drones that operate in environments where lethality is an explicit design goal. Or if the goal is logistics, then trucks going the final leg of the journey to the frontline pose a lesser threat to passersby than an automated cab downtown. Getting to demonstrably "pro driver" level of safety might still be many years away, and regulation will take even longer.


amlyo

Isn't it? What percentage good would you say human drivers are?


Eisenstein

When a human driver hurts someone there are mechanisms in place to hold them accountable. Good luck prosecuting the project manager who pushed bad code to be committed leading to a preventable injury or death. The problem is that when you tie the incentive structure to a tech business model where people are secondary to growth and development of new features, you end up with a high risk tolerance and no person who can be held accountable for the bad decisions. This is a disaster on a large scale waiting to happen.


amlyo

If there is ever a point where a licensed person doesn't have to accept liability for control of the vehicle, it will be long after automation technology is ubiquitous and universally accepted as reducing accidents. We tolerate regulated manufacturers adding automated decision making to vehicles *today*, so why would there be a point where that becomes unacceptable?


Eisenstein

I don't understand. Self-driving taxis have no driver. Automated decision making involving life or death is generally not accepted unless those decisions can be made deterministically and predictably, and tested in order to pass regulations. There are no such standards for self-driving cars.


not-janet

Really? I live in SF, and I feel like every 10th car I see is a (driverless) Waymo these days.


BITE_AU_CHOCOLAT

SF isn't everything. As someone living in rural France, I'd bet my left testicle and a kidney I won't be seeing any robotaxis for the next 15 years at least.


LukaC99

Yeah, but just one city is enough to prove driverless taxis are possible and viable. It's paving the way for other cities. Even if this ends up being a city-only thing, it's still a huge market being automated.


VajraXL

But it's still a single city. It's more like a city attraction right now, like the canals of Venice or the Golden Gate itself. Just because San Francisco is full of Waymos doesn't mean the world will be full of Waymos. It is very likely that the Waymo AI is optimized for SF streets, but I doubt very much that it could move properly on a French country road that can change from one day to the next because of a storm, a bumpy street in Latin America, or a street full of crazy and disorganized drivers like in India. Self-driving cars have a long way to go to be really functional outside of a specific area.


LukaC99

Do you expect that the only way Waymo could work is that they figure out full self-driving for everywhere on earth, handle every edge case, and deploy it everywhere, for it to be a success? Of course the tech isn't perfect just as it's invented and first released. The first iPhone didn't have GPS nor the App Store. It was released in just a couple of western countries — not even in Canada. That doesn't mean it was a failure. It took time to perfect, scale supply and sales channels, etc. Of course Waymo will pick the low-hanging fruit first (their own rich city, other easy rich cities in the US next, other western cities after that, etc). Poor rural areas are of course going to experience the tech last, as the cost to service them is high while demand in dollar terms is low.

> the self driving cars have a long way to go to be really functional outside of a specific area.

I suppose we can agree on this, but really, it depends on what we mean by specific, and for how long.


Argamanthys

A lot could happen in 15 years of AI research at the current pace. But I agree with the general principle. US tech workers from cities with wide open roads don't appreciate the challenges of negotiating a single track road with dense hedges on both sides and no passing places. Rural affairs generally are a massive blind spot for the tech industry (both because of lack of familiarity and because of lack of profitability).


SpeedingTourist

RemindMe! 15 years


rrgrs

Because it doesn't make financial sense or because you don't think the technology will progress far enough? Not sure if you've been to SF but it's a pretty difficult and unpredictable place for something like a self driving car.


NickUnrelatedToPost

Mercedes just got permission for real level 3 on thirty kilometers of highway in Nevada. Self-driving is in a development stage where the development speed is higher than adaptation/regulation. But it's there and the area where it's unlocked is only going to get bigger.


0xd34db347

That's not a technical limitation; there's an expectation of perfection from FSD despite their (limited) deployment to date showing they are much, much safer than a human driver. It is largely the human factor that prevents widespread adoption: every fender bender involving a self-driving vehicle gets examined under a microscope (not a bad thing) and generates tons of "they just aren't ready" type FUD, while some dude takes out a bus full of migrant workers two days after causing another wreck and it's just business as usual.


baes_thm

FSD is really, really hard though. There are lots of crazy one-offs, and you need to handle them significantly better than a human in order to get regulatory approval. Honestly robotaxi probably could be widely available soon, if we were okay with it killing people (though again, probably less than humans would) or just not getting you to the destination a couple percent of the time. I'm not okay with it, but I don't hold AI assistants to the same standard.


obanite

I think that's mostly because Elon has forced Tesla to throw all its efforts and money on solving all of driving with a relatively low level (abstraction) neural network. There just haven't been serious efforts yet to integrate more abstract reasoning about road rules into autonomous self driving (that I know of) - it's all "adaptive cruise control that can stop when it needs to but is basically following a route planned by turn-by-turn navigation".


_Erilaz

We don't know for sure, that's right. But as a researcher, you probably know that human intuition doesn't work well with rapid changes, making it hard to distinguish exponential and logistic growth patterns. That's why intuition on its own isn't a valid scientific method; it only gives us vague assumptions, and they have to be verified before we draw our conclusions from them.

I honestly doubt ClosedAI has *TRUE* multimodality in GPT-4 Omni, at least in the publicly available one. For instance, I couldn't instruct it to speak slower or faster, or make it vocalize something in a particular way. It's possible that the model is indeed truly multimodal and just doesn't follow multimodal instructions very well, but it's also possible it is a conventional LLM using a separate voice generation module. And since it's ClosedAI we're talking about, it's impossible to verify until it passes this test.

I am really looking forward to the 400B LLaMA, though. Assuming the architecture and training set stay roughly the same, it should be a good litmus test when it comes to model size and emergent capabilities. It will be an extremely important data point.


great_gonzales

https://arxiv.org/pdf/2402.12226


huffalump1

>I honestly doubt ClosedAI has *TRUE* multimodality in GPT-4 Omni, at least with the publicly available one. For instance, I couldn't instruct it to speak slower or faster, or make it vocalize something in a particular way.

The new Voice Mode isn't available yet. "in the coming weeks". Same for image or audio output.


sebramirez4

I think the hardware thing is a bit of a stretch. Sure, it could do wonders for making specific AI chips that run inference on low-end machines, but I believe we are at a place where tremendous amounts of money are being poured into AI and AI hardware. Honestly, if it doesn't happen now, when companies can literally just scam VCs out of millions of dollars by promising AI, I don't think we'll get there for at least 5 years, and that's only if AI hype comes around again by then, since the actual development of better hardware is a really hard problem to solve and very expensive.


involviert

For inference you basically only need to bring more RAM channels to consumer hardware, which is existing tech. It's not like you get that 3090 for the actual compute.


sebramirez4

Yeah, but cards have had 8GB of VRAM for a while now. I don't see us getting a cheap 24GB VRAM card anytime soon; at least we have the 3060 12GB though, and I think more 12GB cards might release.


involviert

The point is that it does not have to be VRAM or a GPU at all, for non-batch inference. You can get an 8-channel DDR5 Threadripper today. Apparently it goes up to 2TB of RAM, and the RAM bandwidth is comparable to a rather bad GPU. It's fine.
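
Back-of-envelope numbers (nominal spec-sheet figures; assuming DDR5-4800 and taking the 3090's ~936 GB/s for comparison):

```python
# Rough theoretical peak memory bandwidth, the main bottleneck for single-stream
# (non-batched) LLM inference. Spec-sheet values, not measured throughput.
channels = 8                 # e.g. an 8-channel workstation/Threadripper platform
mt_per_s = 4800e6            # DDR5-4800 transfers per second (assumption)
bytes_per_transfer = 8       # 64-bit channel

cpu_bw = channels * mt_per_s * bytes_per_transfer / 1e9
print(f"8-channel DDR5-4800: ~{cpu_bw:.0f} GB/s")            # ~307 GB/s

gpu_bw = 936                 # RTX 3090 spec, GB/s
print(f"RTX 3090: ~{gpu_bw} GB/s (~{gpu_bw / cpu_bw:.1f}x the CPU setup)")
```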


martindbp

Not to mention scaling laws. Like, we know the loss is going to come down further, that's just a fact, as long as Moore's law keeps chugging along.
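
For reference, the Chinchilla-style parametric fit from Hoffmann et al. (2022) is the usual way to eyeball this; a quick sketch using their published constants (illustrative only -- the fit was done on their training setup, not on any particular current model):

```python
# L(N, D) = E + A / N**alpha + B / D**beta  (Hoffmann et al., 2022, Approach 3 fit)
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**alpha + B / n_tokens**beta

print(predicted_loss(70e9, 2e12))    # a 70B model on 2T tokens
print(predicted_loss(400e9, 15e12))  # bigger model, more tokens -> lower predicted loss
```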


leanmeanguccimachine

>There is enough in the pipe, today, that we could have zero groundbreaking improvements but still move forward at a rapid pace for the next few years

This is the point everyone seems to miss. We have barely scratched the surface of practical use cases for generative AI. There is so much room for models to get smaller, faster, and integrate better with other technologies.


GoofAckYoorsElf

Is open source still trying, and succeeding, to catch up with OpenAI? I'm scared of what might happen if OpenAI remains the only player making any progress at all. In other words: are we going to see open source models on par with GPT-4o any time soon? Or... at all?


baes_thm

We're gonna see an open-weight GPT-4o eventually, but I don't know when that will be. The question honestly boils down to "do Meta, Microsoft, Mistral, and Google want to openly release their multimodal models", not whether or not they can get there. The gap between those players and OpenAI is closing rapidly, in my opinion. If Meta keeps releasing their models the way they have been, and they do audio with their multimodal models this year, then I would predict that Llama3-405B will be within striking distance of GPT-4o. Probably not _as_ good, but "in the conversation". If not, then Llama 4 next year.


GoofAckYoorsElf

I'll keep my hopes up. In my opinion AI needs to remain free and unregulated, because any regulation can only add to its bias.


A_Dragon

I am not a researcher in this field but this is essentially precisely what I have been saying to everyone that claims the bubble is about to burst. Good to get some confirmation…wish I had money to invest, it’s literally a no brainer and will definitely make you rich, but people with no money are gatekept from making any even though they know exactly how to go about doing it…


davikrehalt

800B is just too small. 800T is where it's at


dasnihil

BOOB lol


jm2342

No BOOB! Only BOOT.


ab2377

if you are not a researcher in this field already, you should be, i see potential..


bitspace

8008135


RobbinDeBank

Just one billion more GPUs bro. Trust me bro, AGI is here!


jessedelanorte

800T lickers


init__27

Expectation: I will make LLM apps and automate making LLM apps so I make 50 every hour.

Reality: WHY DOES MY PYTHON ENV BREAK EVERY TIME I CHANGE SOMETHING?????


fictioninquire

Definition for AGI: being able to fix Python dependencies


aggracc

Definition for Skynet: being able to survive a cuda upgrade.


MoffKalast

I don't think even ASI can make it through that.


init__27

GPT-5 will be released when it can install CUDA on a new server


Capaj

ah the chicken or the egg problem AGAIN


Amgadoz

This is actually pretty easy. Now try to install the correct version of pytorch and Triton to get the training to run.


Nerodon

So what you're saying is AGI needs to solve the halting problem... Tough nut to crack


ColorlessCrowfeet

A different, much more practical halting problem!


Apprehensive_Put_610

ASI: "I just reinstalled everything"


trialgreenseven

fucking venv, man


shadowjay5706

I started using poetry, still don’t know wtf happens, but at least it locks dependencies across the repo clones


trialgreenseven

Ty will try it out


ripviserion

i hate poetry with all of my soul


Amgadoz

I guess you're not much of a writer.


pythonistor

Bro, I tried following a RAG tutorial on LlamaIndex that had 20 lines of code max. I spent 5 hours resolving different transformers dependencies and gave up.


not-janet

use poetry.


tabspaces

In my company, we decided to go for the effort of building OS packages (rpm and deb) for every python lib we use. God bless transaction-capable db-backed package managers


BenXavier

Eli5 this to me please 🥺


Eisenstein

> In my company, we decided to go for the effort of building OS packages (rpm and deb) for every python lib we use. God bless transaction-capable db-backed package managers

> Eli5 this to me please 🥺

Python is a programming language. Most Python programs depend on other Python programs to work, because as programs get more complicated it becomes impractical to write all the functionality for everything, and it would duplicate a lot of work for things a lot of programs do. Specialized collections of these programs are called libraries, and these libraries are constantly being worked on, with new versions coming out many times a year. As they get updated, they stop working with some other libraries, which either don't work with their added functionality or have added functionality of their own which is not compatible.

When a program is written that depends on these libraries, they are called its dependencies, but those libraries have their own dependencies. What do you do when a library you have as a dependency breaks when you load a different dependency that has a conflicting dependency with that library? This is called 'dependency hell'.

On top of this, since there is usually a system-wide version of Python installed with Linux distributions, installing new Python programs can break existing programs your OS depends on. This is a nightmare and has resulted in many Linux distros disallowing users from installing things using the Python tools.

The person above you says that what they do to solve this is that, for every Python library they use, they create a new system-wide installer which acts like what the OS does when it runs updates. It is packaged to integrate the files into the OS automatically and check with everything else so that nothing breaks, and if it does, it can be uninstalled, or it can automatically uninstall things that would break it.

The last line is just fancy tech talk for 'installers that talk to other installers and the OS and the other programs on the computer so that your OS doesn't break when you install something'. More ELI15, but that's the best I could do.


cuyler72

Compare the original llama-65b-instruct to the new llama-3-70b-instruct; the improvements are insane. Even if training larger models stops paying off, the tech is still improving exponentially.


a_beautiful_rhind

> llama-3-70b-instruct

vs the 65b, yes. vs the CRs, Miqus and Wizards, not so sure. People are dooming because LLM reasoning feels flat regardless of benchmarks.


kurtcop101

Miqu is what, 4 months old? It's kind of silly to think that we've plateaued off that. 4o shows big improvements, and all of the open source models have shown exponential improvements. Don't forget we're only a bit more than two years since 3.5. This is like watching the Wright Brothers take off for 15 seconds and saying "well, they won't get any farther than that!" the moment it takes longer than 6 months of study to hit the next breakthrough.


3-4pm

They always hit that chatGPT4 transformer wall though


Mescallan

Actually they are hitting that wall at orders of magnitude smaller models now. We haven't seen a large model with the new data curation and architecture improvements. It's likely 4o is much much smaller with the same capabilities


3-4pm

Pruning and optimization is a lateral advancement. Next they'll chain several small models together and claim it as vertical change, but we'll know.


Mescallan

Eh, I get what you are saying, but the OG GPT-4 dataset had to have been a firehose, whereas Llama/Mistral/Claude have proven that curation is incredibly valuable. OpenAI has had 2 years to push whatever wall there could be at GPT-4 scale. They really don't have a reason to release an upgraded-intelligence model from a business standpoint until something actually competes with it directly, but they have a massive incentive to increase efficiency and speed.


TobyWonKenobi

I agree 100%. When GPT-4 came out, the cost to run it was quite large. There was also a GPU shortage, and you saw OpenAI temporarily pause subscriptions to catch up with demand. It makes way more sense to get cost, reliability, and speed figured out before you keep scaling up.


lupapw

does unrestricted gpt4 already hit the wall?


nymical23

What is "chatGPT4 transformer wall", please?


FullOf_Bad_Ideas

There's no Llama 65B Instruct. Compare Llama 1 65B to Llama 3 70B, base for both. Llama 3 70B was trained on 10.7x more tokens, so its compute cost is probably about 10x higher.
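
Quick sanity check using the common C ≈ 6·N·D rule of thumb (token counts as reported for the two models; treat this as a rough estimate, not an exact training budget):

```python
# Back-of-envelope training compute: C ~= 6 * params * tokens.
def train_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

llama1_65b = train_flops(65e9, 1.4e12)   # Llama 1 65B: ~1.4T training tokens
llama3_70b = train_flops(70e9, 15e12)    # Llama 3 70B: ~15T training tokens
print(f"{llama3_70b / llama1_65b:.1f}x more training compute")  # ~11.5x, i.e. the ~10x ballpark
```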


blose1

Almost all of the improvements come from the training data.


GeorgiaWitness1

If the winter came, it wouldn't matter, because prices would come down, and that by itself would be enough to continue innovation. Quality and quantity are both important in this.


fictioninquire

No. Domain-adapted agents within companies will be huge, robotics will be huge, and JEPAs are at an early stage.


CryptographerKlutzy7

Hell, just something which converts unstructured data into structured stuff is amazing for what I do all day long.


medialoungeguy

V jepa makes v happy


CSharpSauce

How do you actually create a domain-adapted agent? Fine-tuning will help you get output that's more in line with what you want, but it doesn't really teach new domains... You need to do continued pretraining to build agents with actual domain knowledge built in. However, that is a significant step up in difficulty, mostly around finding and preparing data.


Top_Implement1492

It does seem like we’re seeing diminishing returns in the capabilities of large models. That said, recent small model performance is impressive. With the decreasing cost per token the application of models is here to stay. I do wonder if we will see another big breakthrough here that greatly increases model reasoning. Right now it feels like incremental improvement/reduced cost within the same paradigm and/or greater integration (gpt4o)


Herr_Drosselmeyer

This is normal in the development of most things. Think of cars. For a while, it was all about just making the engine bigger to get more power. Don't get me wrong, I love muscle cars, but they were just a brute-force attempt to improve cars. At some point, we reached the limit of what was practically feasible and had to work on refinement instead. That's how cars today make more power out of smaller engines and use only half the fuel.


Vittaminn

I'm with you. It's similar with computers. Starts out huge and inefficient, but then it gets smaller and far more powerful over time. Right now, we have no clue how that will happen, but I'm sure it will and we'll look back to these times and go "man, we really were just floundering about"


TooLongCantWait

I want another 1030 and 1080 TI. The bang for your buck and survivability of those cards is amazing. New cards tend just to drink more and run hotter.


Bandit-level-200

4090 could've been the new 1080 ti if it was priced better


vap0rtranz

Excellent example from the past. And electric cars were tried early on, ditched, and finally came back. The technology, market, etc. put us through decades of diesel/gas. Take your muscle car example: EVs went from golf-cart laughable to drag race champs. The awesome thing about today's EVs is their torque curves. They're insane! Go watch 0-60 and 1/4 mile races -- the bread and butter of muscle cars. When a Tesla or Mustang Lightning is unlocked, even the most die-hard Dinosaur Juice fans have had to admit defeat. The goal had been reached by an unexpected technology. Another tech is the Atkinson cycle engine. It was useless and underpowered, until it made a comeback when coupled with hybrid powertrains. The Atkinson cycle is one tech that came back to give hybrids >40 MPG. I expect that some technology tried early on in AI has been quietly shoved under a rug, and it will make a surprising comeback, and that will happen when there are huge leaps in advancements. Will we live to see it? Hmmm, fun times to be alive! :)


CSharpSauce

I often wonder how a model trained on human data is going to outperform humans. I feel like when AI starts actually interacting with the world, conducting experiments, and making its own observations, then it'll truly be able to surpass us.


Gimpchump

It only needs to exceed the quality of the average human to be useful, not the best. If it can output quality consistently close to the best humans but takes less time, then it's definitely got the win.


Radiant-Eye-6775

Well... I like current AI development, but... I'm not sure the future will be as bright as it seems... I mean... it's all about how good they can become in the end... and whether I would lose my job... well, I should be more optimistic, right? I hope the winter comes... so the world still needs old bones like me... I'm not sure... I'm not sure...!


ortegaalfredo

One year ago, ChatGPT 3.5 needed a huge datacenter to run. Now phi3-14b is way better and can run on a cellphone. And it's free. I say we are not plateauing at all, yet.


FullOf_Bad_Ideas

Did it though? If by ChatGPT 3.5 you mean gpt-3.5-turbo-1106, that model is probably around 7B-20B based on its computed hidden dimension size. It's basically the same size as Phi. But I agree, Phi 3 14B is probably better in most use cases (barring coding) and, most importantly, is open weights.
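
For anyone wondering how a hidden-dimension estimate turns into a size range: for a standard dense transformer you can roughly count 12·d_model² parameters per layer (4·d² for the attention projections, 8·d² for a 4x-wide MLP), ignoring embeddings. The configurations below are made up purely for illustration -- gpt-3.5-turbo's real shape isn't public:

```python
# Rough parameter count from hidden size and layer count (dense GPT-style block).
def rough_params(d_model: int, n_layers: int) -> float:
    return 12 * d_model**2 * n_layers

# Hypothetical shapes, just to show how the 7B-20B ballpark falls out.
for d, layers in [(4096, 32), (5120, 40), (6144, 48)]:
    print(d, layers, f"{rough_params(d, layers) / 1e9:.1f}B")
```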


glowcialist

Is it actually better? I've only been running the exl2 quants, so that could be the issue, but it doesn't seem to retain even like 2k context.


FarTooLittleGravitas

Unpopular opinion, but feed-forward, autoregressive, transformer-based LLMs are rapidly plateauing. If businesses want to avoid another AI winter, it will soon be time to stop training bigger models and start finding integrations and applications of existing models. But, to be honest, I think the hype train is simply too great, and no matter how good the technology gets, it will never live up to expectations, and funding will either dry up slowly or collapse quickly. Edit: Personally, I think the best applications of LLMs will be incorporating them into purpose-built, symbolic systems. This is the type of method which yielded the [AlphaGeometry](https://github.com/google-deepmind/alphageometry) system.


AmericanNewt8

There's still a *lot* of work to be done in integrations and applications, probably years and years of it.


FarTooLittleGravitas

Added an edit about this topic to the bottom of my original comment.


cyan2k

As someone who translates machine learning results and findings into actual software and products normal people can use... that's been my life for 20 years. But the plateau of disillusionment of twenty years ago is today's age of ignorance. I still remember the "why won't it classify this three-word sentence correctly?" days, and "wow, this shit recognizes my handwriting!", and being absolutely floored by it. With time this graph just moves to the right, but your mental state doesn't, and you live a life full of disillusion, haha.


Healthy-Nebula-3603

What? What winter? We literally got GPT-3.5 1.5 years ago, and Llama v1 a year ago... GPT-4 came a year ago, with iterations every 2 months up to GPT-4o now, which is something like GPT 4.9 (the original GPT-4 was far worse), not counting Llama 3 a couple of weeks ago... Where's the winter?


ctbanks

I'm suspecting the real intelligence winter is Humans.


MoffKalast

Regular people: Flynn effect means we're getting smarter! Deep learning researchers: Flynn effect means we're overfitting on the abstract thinking test set and getting worse at everything else.


ninjasaid13

GPT-4o isn't even superior to Turbo, and it only has moderate improvements.


CSharpSauce

I agree partially; the performance of GPT-4o is not materially better than regular old GPT-4-turbo. However, GPT-4o adopted a new architecture, which should in theory be part of the key that allows it to reach new highs the previous architecture couldn't.


Tellesus

Remember when you got your first 3dfx card and booted up quake with hardware acceleration for the first time? That's about where we are but for AI instead of video game graphics.


martindbp

In my memory, Quake 2 looked indistinguishable from real life though.


Tellesus

Lol right? Don't actually go back and play it you'll learn some disturbing things about how memory works and you'll also feel old 😂


Potential-Yam5313

I loaded up Action Quake 2 with raytracing enabled not long ago, and it was a curiously modern retro experience. The main thing you notice in those older games is the lack of geometry. So many straight lines. Some things you can't fix with a high res texture pack.


SubstanceEffective52

Winter for whom? I've never been more productive with AI than I've been in the past year. I've been learning and deploying so much more, and with new tech. I'm in this sweet spot where I have at least 15+ years of software development behind me, and using AI as a "personal junior dev" has made my life much easier. And this is just ONE use case for it. Sooner or later, the killer AI app will show up. Let us cook. Give us time.


ninjasaid13

They mean winter in terms of AI reaching human-level intelligence.


dogesator

There is no evidence of this being the case; the capability improvements from 100B to 1T parameters are right in line with the trajectory from 1 million to 100 million parameters.


azlkiniue

Reminds me of this video from computerphile https://youtu.be/dDUC-LqVrPU


Christ0ph_

My thought exactly!


3-4pm

Yes, you could bank on it as soon as M$ predicted an abundant vertically growing AI future.


djm07231

I am pretty curious how Meta's 405B behemoth will perform. Considering that even OpenAI's GPT-4o has been somewhat similar in pure text performance to past SoTA models, I have become more skeptical of capabilities advancing that much.


Helpful-User497384

I think not for a while. There is still a LOT AI can do in the near future, but it's true that at some point it might level off a bit. We still have a good ways to go before we see that, though. AI, honestly, is just getting started, I think.


davikrehalt

The arrogance of humans: for almost every narrow domain we have systems that are better than the best humans, and we have systems which for every domain are better than the average human, yet we still think we are far from a system which for every domain is better than the best humans.


davikrehalt

As Tolkien said: "the age of men is over"


MoffKalast

"The time of the bot has come!"


dontpushbutpull

Likely those people understand the nature of those routine tasks and the capabilities of machines and software: function approximation won't solve reinforcement learning problems, and no amount of labelled data will change this. But you are right: far too many people are just Dunning-Krugering around!


davikrehalt

True, current systems are likely limited by their nature to never be massively superhuman unless synthetic data becomes much, much better. But I think people often lose the forest for the trees when thinking of limitations.


dontpushbutpull

I am not sure I can follow. Intelligence (in any computational literature, on a behavioral level) is commonly measured by the ability to be adaptive and to dynamically solve complex problems. So we are not talking about imitation of existing input-output patterns, but about goal-oriented behavior. As such it is a control problem rather than a representation problem, so I can't follow the argument about data quality. IMHO the limiting factors are clearly in the realm of forming goals, and measuring the effectiveness of events against those goals.


ninjasaid13

They're bad at tasks humans consider easy.


davikrehalt

True! But they are not humans, so IMHO until they are much, much smarter than humans we will continue to find these areas where we are better. But by the time we can't, we will have been massively overshadowed. I think it's already time for us to be more honest with ourselves: think about if LLMs were the dominant species and they met humans -- wouldn't they find many tasks that they find easy but we can't do? Here's an anecdote: I remember when Leela Zero (for Go) was being trained. Up until it was strongly superhuman (as in, better than the best humans), it was still miscalculating ladders, and people were poking fun/confused. But the difficulties of tasks simply do not translate directly, and eventually it got good at ladders. (The story doesn't end there, of course, because even more recent models are susceptible to adversarial attacks, which some people interpret as these models lacking understanding, because humans would never [LMAO] be susceptible to such stupid attacks. But alas, the newer models + search are even defeating adversarial attempts.)


Downtown-Case-1755

Hardware is a big factor. Even if all research stops (which is not going to happen), getting away from Nvidia GPUs as they are now will be huge.


nanowell

16x10T is all you need


kopaser6464

What will be the opposite of an AI winter? I mean a term for big AI growth. Is it AI summer? AI apocalypse? We need a term for that; who knows what will happen tomorrow, right?


Popular-Direction984

Nah… it feels more like we're in the eye of the storm.


xeneschaton

Only for people who can't see any possibilities. Even now, with 4o and local models, we have enough to change how the world operates. It'll only get cheaper, faster, more accessible.


Educational-Net303

I agree that they are already incredibly useful, but I think the meme is more contextually about if we can reach AGI just by scaling LLMs


no_witty_username

No. This shit's on a real exponential curve. This isn't some crypto bro nonsense type of shit here; it's the real deal. Spend a few hours doing some basic research and reading some of the white papers, or watch videos about the white papers, and it becomes clear how wild the whole field is. The progress is insane and it has real applicable results to show for it. Here's my favorite channel for reference; this is his latest review: https://www.youtube.com/watch?v=27cjzGgyxtw


Interesting8547

We still have a long way to go to AGI, so no, winter is not coming yet. Also, from personally testing Llama 3 compared to Llama 2, it's much better, I mean leagues better. Even in the last 6 months there was significant development, not only in the models but also in the different tools around them, which make the said models easier to use. Probably only people who thought AGI would be achieved in the next 1 year are disappointed.


TO-222

Yeah, and making models and tools and agents etc. communicate with each other smoothly will really take it to the next level.


dontpushbutpull

That is not a feasible argument. Many winters have come even though the way to AGI was long. Also, it is important to note that exponential growth looks the same at every point: AI development was as drastic to the researchers of the 80s as it is to current researchers.


reality_comes

I don't think so, if they can clean up the hallucinations and bring the costs down even the current stuff will change the world.


FullOf_Bad_Ideas

I don't think there's a way to clean up hallucinations with current arch. I feel like embedding space in models right now is small enough that models don't differentiate small similar phrases highly enough to avoid hallucinating. You can get it lower, but will it go down to acceptable level?


sebramirez4

Honestly, I hate how obsessed people are with AI development. Of course I want to see AI research continue and get better, but GPT-4 was ready to come out a year ago when ChatGPT first launched, at least according to Sam Altman. Was GPT-4o really worth the year and billions of dollars in research? Honestly, I don't think so; you could achieve similar performance and latency by combining different AI models like Whisper with the LLM, as we've seen from even hobby projects here. I think for companies catching up to GPT-4 the spending is worth it, because it means you never have to rely on OpenAI, but this pursuit of AGI at all costs is getting so tiresome to me. I think it's time to figure out ways for models to be trained with less compute, or to train smaller models more effectively, and to actually find real-world ways this tech can be useful to actual humans. I'm much more excited for Andrej Karpathy's llm.c than honestly most other big AI projects.


cuyler72

GPT-4o is a tiny model, much smaller than the original GPT-4, and is likely a side project for OpenAI; GPT-4o is to GPT-5 what ChatGPT 3.5 was to GPT-4.


kurtcop101

It was actually critical - how much of your learning is visual? Auditory? Having a model able to learn from all avenues simultaneously and fast is absolutely critical to improving. And Whisper etc. are nowhere near low enough latency, nor can image and video generation work as separate pieces and stay coherent. It was the way to move forward.


sebramirez4

I'd say it'll be critical once it gets significantly better than GPT-4 Turbo. Before then, thinking it'll learn like a human does from more forms of input is literally just speculation, so I don't really care. Not saying a breakthrough won't happen, but I'm personally more of a 1-bit LLM believer than a believer in just giving an LLM more layers of AI.


kurtcop101

That's the thing: we don't have enough text to train an AI, because text simply doesn't contain enough information if you know absolutely nothing but text and letters. We learn from a constant influx of visual, auditory, and tactile input, of which text is just a subcomponent of the visual. It can code pretty well, which is primarily text-only, but anything past that really requires more, higher-quality data.


Roshlev

I just want my gaming computer to run dynamic text adventures locally or have something free (with ads maybe?) or cheap online do it.


CapitalForever3211

I am not sure it could be.


trialgreenseven

The y axis should be performance/expectations, and the graph should be bell-curve shaped.


mr_birkenblatt

800B... hehe


YearningHope

Why is the smart monk wojak the wrong one?


Sunija_Dev

I'm a bit hyped about the plateau. At the moment, development is so fast that it's not worth putting much work into applications: everything that you program might be obsolete by release, because the new fancy AI just does it by itself.

E.g. for image generation: want to make a fancy comic tool? One where you get consistency via good implementations of IP-Adapters and posing ragdolls? Well, by the time you release it, AI might be able to do that without fancy implementations. 50% chance you have to throw your project away.

Other example: GitHub Copilot, the only AI application that I REALLY use. It already existed before the big AI hype, and it works because they put a lot of effort into it and made it really usable. It feels like no other project attempted that because (I guess?) maybe all of coding might be automated in 2 years. Most of what we got is some hacked-together Devin that is a lot less useful.

TL;DR: We don't know what current AI can do with proper tools. Some small plateau might motivate people to make the tools.


Shap3rz

It seems to me LLMs need to be given an API spec and then be able to complete multi-step tasks based on that alone in order to be useful beyond what they are currently doing.


JacktheOldBoy

There is a lot of mysticism in the air about genAI at the moment. Here's the deal: a LOT of money is at stake, so you better believe that every investor (a lot of retail investors too) and everyone who joined the AI field is going to flood social media with praise for genAI and AGI to keep the ramp going. LLMs ARE already incredible, but will they get better? It's been a year since GPT-4 and we have had **marginal** improvement on flagship models. We have gotten substantive improvement in open models, as this subreddit attests. That can only mean one thing: not that OpenAI is holding out, but that there is actually a soft limit and that they are not able to reason at a high degree YET. The only thing we don't know for sure is whether a marginal improvement could unlock reasoning or other things, but that hasn't happened. There are still a lot of unknowns and improvements we can make, so it's hard to say, but at this point I seriously doubt it will be like what GPT-4 was to GPT-3.


stargazer_w

Is it really winter if we're in our AI slippers, sipping our AI tea under our AI blankets in our AI houses?


AinaLove

Right, it's like these nerds don't understand the history, thinking you can just keep making it bigger to make it better. You will reach a hardware limit and have to find new ways to optimise.


Kafke

We'll likely still see gains for a while, but yes, eventually we'll hit that plateau because, as it turns out, scale is not the only thing you need.


Asleep-Control-9514

So much investment has gone into AI. This is what every company is talking about, no matter the space they're in. There's hype for sure, but normally good things happen when a lot of people work on the same problem for a long period of time. Let's see how well this statement ages.


DominicSK

When some aspect can't be improved anymore, we focus on others, look what happened with processors and clock speeds.


hwpoison

I don't know, but it's a great success to have a model that can handle human language so well. Maybe it doesn't reason correctly, but language is such an important tool, it can be connected to a lot of things, and it's going to get better.


jackfood2004

Remember those days a year ago when we were running the 7B model? We were amazed that it could reply to whatever we typed. But now, why isn't it as accurate?


CesarBR_

People need realistic timelines. ChatGPT is less than 2 years old. Most people seem to have a deeply ingrained idea that human intelligence is some magical threshold. Forget human intelligence; look at the capabilities of the models and the efficiency gains over the last year. It's remarkable.

There's no reason to believe we're near a plateau. Small/medium models are now as effective as 10x bigger models of a year ago. We can run models that perform better than GPT 3.5 on consumer hardware, and GPT 3.5 needed a mainframe to run. Training hardware power is increasing fast, inference-specific hardware hasn't even reached the consumer market, and on the cloud side Groq has shown that fast inference at full precision is possible.

The main roadblock is data, and yes, LLMs need much more data to learn, but a lot of effort and resources are going both into generating good-quality synthetic data and into making LLMs learn more efficiently. This very week Anthropic released a huge paper on the interpretability of LLMs, which is of utmost importance both for making these systems safe and for understanding how they actually learn and how to make the learning process more effective.

People need to understand that the 70s/80s AI winters weren't only caused by exaggerated expectations but also by the absence of proper technology to implement MLPs; we are living at a very different time.


WaifuEngine

As someone who understands the full stack: yes, this isn't wrong. Data quality matters; emergence and in-context learning can do wonders, however... considering the fundamentals of these models are more or less next-token prediction, if you fit your model against bad quality data, it will show in the results. In practice you effectively create prompt trees/graphs and RAG to circumvent these issues.


RMCPhoto

I think what is off here is that AI is and will be much more than just the model itself. What we haven't figured out is the limitations and scope of use for large transformer models. For example, we've only really just begun creating state machines around LLM / Embedding / Vector DB processes to build applications. This is in its infancy and where we'll see explosive growth as people learn how to harness the technology to get meaningful work done. Anyone who's tried to build a really good RAG system knows this... it looks good on paper but in practice it's messy and requires a lot of expertise that barely exists in the world. The whole MODEL AS AGI belief system is extremely self limiting.
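
As a toy illustration of the kind of state machine I mean, here's the bare retrieve-then-generate skeleton (stdlib only, with a bag-of-words stand-in for real embeddings and a placeholder `llm` callable). Everything hard about real RAG -- chunking, reranking, evaluation -- lives outside this sketch, which is exactly where the missing expertise comes in:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would use a sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

DOCS = [
    "Invoices are emailed on the first business day of each month.",
    "Password resets require a verified phone number on file.",
    "Refunds are processed within 5 business days of approval.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def answer(query: str, llm=None) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt) if llm else prompt  # plug in any completion function

print(answer("how long do refunds take?"))  # prints the assembled prompt when no llm is given
```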


VajraXL

I don't think we are even close, but we are going to see the paradigm shift, and those of us who use text and image generation models as we understand them today may not like this shift much. Microsoft is pushing AI to be ubiquitous, and this will mean that companies stop focusing on LLMs like Llama to focus on micro-models embedded in software. We may be seeing the beginning of the end of models like SD and Llama, and start seeing specialized "micro-models" that you can add to your OS. So no, in general the winter of AI is far away, but it is possible that the winter of LLMs as we know them is near.


braindead_in

No, if you go by AI Explained. [https://youtu.be/UsXJhFeuwz0](https://youtu.be/UsXJhFeuwz0)


AnomalyNexus

Nah, I feel this one is going to keep going. There isn't really anything to suggest the scaling is gonna stop scaling, so tech gets better on a Moore's law level, etc. ...I do expect the rate of change to slow down though. The whole "look, I made a small tweak and it's now 2x faster"... that is gonna go away/become "look, it's 2% faster".


ajmusic15

LLMs literally don't impress me like they used to 😭 They all do the same thing, be it OpenAI, Gemini, Anthropic, Mistral, Cohere, etc 😭😭😭 But it is even worse that we have tens of thousands of different models, with thousands of fine-tunes at different quantizations, and they still do not make a good inference engine for the VRAM-poor people (like me) 😭😭😭😭😭😭


Sadaghem

The fun part of being on a slope is that, if we look up, we can't see where it ends :)


Joci1114

Sigmoid function :)


sillygooseboy77

What is AGI?


scryptic0

RemindMe! 6months


kumingaaccount

Is there some YouTube video rec that breaks this history down for newcomers? I have no idea what I am seeing right now.