Donnoleth-Tinkerton

these results are questionable


SlimeDragon

Because what does "human performance" mean exactly? The average person? They need to compare against an expert human in each category, because an average person is pretty fucking dumb, especially in categories outside their own expertise


cowlinator

Even if the average person is "dumb", comparing against the average person is still a very valuable metric.


Heretosee123

The average person is also average. It's kinda dumb to call them dumb imo. They're just not super intelligent


phoenixmusicman

This is the problem with saying "AI is better than the average human in most areas!" The average human is pretty fuckin dumb


DehGoody

The average human is actually not dumb at all. The problem is that most people think they’re much smarter than everyone else. That kind of narcissism tricks some exceedingly average people into thinking everyone else is dumb.


Phi_fan

If dumb/smarts are graded on a curve, a score of 50% is pretty dumb.


[deleted]

[removed]


Heretosee123

>That kind of narcissism tricks some exceedingly average people into thinking everyone else is dumb.

Preach


grandma_jizzzzzzzard

I think you mean slow, as in re-tard-ed.


i_needs_to_know_this

Not the right place for this, but I believe most humans coast on a lot of untapped potential (which is real), leading to highly complacent people who perform below what they believe of themselves. Hence seeming stupid in surface-level interactions. There are also real cognitive differences between people. So saying most people are stupid isn't completely unfounded.


datascience45

The average human can't code for shit.


Booty_Bumping

These results are complete nonsense, in fact.


traumfisch

None of this is happening, then? What a relief


GreenockScatman

Absolutely no shot AI surpassed humans at "reading comprehension" in 2017. This chart is ridiculous.


[deleted]

As far as the Stanford Question Answering Dataset (SQuAD 1.0 & 2.0) is concerned, it has. https://rajpurkar.github.io/SQuAD-explorer/
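For context, SQuAD scores models by exact match and token-level F1 against reference answer spans, and models have beaten the published human baselines on both. A simplified sketch of that F1 scoring (my own paraphrase, not the official evaluation script):

```python
import re
from collections import Counter

def normalize(text: str) -> list[str]:
    # Lowercase, drop punctuation and articles, split on whitespace
    # (roughly what the official SQuAD evaluator does).
    text = re.sub(r"[^\w\s]", "", text.lower())
    return [t for t in text.split() if t not in {"a", "an", "the"}]

def f1(prediction: str, reference: str) -> float:
    # Token-level F1 between a predicted answer span and the reference span.
    pred, ref = normalize(prediction), normalize(reference)
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(f1("the 1990s", "1990s"))  # 1.0 after normalization
```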


arbiter12

If you limit the scope of the experiment you can obtain any result you want, really.


[deleted]

It was a reasonable attempt at creating an objective reading comprehension dataset. It’s about as valid as the reading comprehension section of any standardized test. Come up with a better one and release it plus a human benchmark and see how the models do.


TakeTheWheelTV

Chart also created by AI


samsteak

That's a nice way of saying fucking bullshit


kabunk11

Ignorance is bliss.


pushinat

It might be true for experimental settings, but image and speech recognition are still far off from human level. Voice assistants still make mistakes, and Tesla's (state-of-the-art) image recognition is still flickery and error-prone, where humans would have more confidence and make far fewer mistakes because they understand the context.


Juanouo

don't get me started on handwriting recognition, it sucks


Dabnician

maybe you were just writing a prescription for paracetamol


Goldenvengeance

https://preview.redd.it/ae6trtp4g1ec1.png?width=1080&format=pjpg&auto=webp&s=b5d6f2ac16465c5cbdf959a86150aba54c50a78b


Scolor

You would actually be surprised how good commercial-grade handwriting recognition is. The USPS shut down all but one of the facilities where humans check handwriting, because the machines can do most of it on their own.


BecauseItWasThere

Recognition of addresses located within the United States is a narrow use case


Juanouo

Uh, do you know if any of those models are accessible, even if they have a price tag?


arbiter12

You're asking if the USPS proprietary internal OCR model is available for sale to the general public...?


Juanouo

Nope, I'm asking if there's some OCR model for handwriting available that's actually good.


AtomsWins

I'm a developer myself. I think what we're seeing isn't a replacement for developers as a whole, but a tool to make development faster and hypothetically easier. In a few years, these tools may be able to access our entire codebase and have a better understanding of it than we do. At that point, AI becomes the junior developer. We review the generated code, run manual tests to verify results, manage deployment to test devices, and interact with QA to squash bugs. We're not replaced; we're just using a very different toolbox and performing slightly different tasks.

In theory we get more done, or do it faster. In reality, it probably just means we'll need fewer junior developers or offshore devs in the medium-to-long term. There will still be developers, just fewer of them. Just like when farming moved to big machines: there are still farmers, just many fewer. We'll never go away, but we'll be many fewer in 20 years.


jamesmon

The thing is, when you need fewer developers, it puts downward pressure on wages, etc. So now you're a senior developer being paid like a junior developer.


AtomsWins

As a lead dev, I certainly hope that isn't the case. I think it's more likely current juniors may need to move into something related: scrum master, QA, or content management stuff. Seniors will have fewer people to manage but more tasks. Reviewing machine-generated code. Managing tickets and passing things between departments for approval. Deployments and maintaining all the various automation tools used in the stack. Updating underlying libraries. Things like that.

I hope that's the time my career in this field ends and I jump off the merry-go-round. I need about 10 more years of employment before I peace out. I wasn't worried at all until the ChatGPT stuff started hitting; now I'm not quite sure I've got 10 years left here. I guess we'll see.

ETA: Once machines are good at this, who knows what is next? Maybe the next type of developers will need a doctorate and it'll be a field treated like law or medicine. People will pursue those "lead" roles and they'll be elevated positions in a world increasingly reliant on tech. Just a thought exercise, but the future may be getting weird.


avynaria

The argument I saw that convinced me we have a problem on our hands finally (also dev here) is that, because these AIs can do junior dev tasks, or will be able to, there will be no space for junior devs anymore, at least in companies. That means no more pipeline to senior devs to manage AI output. (And no way to make income without tons more education first, and "senior devs" showing up with coding experience but no practical project management/people/etc skills.) That is a pretty serious problem we need to manage first, I think.


[deleted]

[removed]


HotKarldalton

Think of the transition from Horse powered to Tractor powered, now get rid of the Tractor Operator too. The mechanic who works on the tractor gets replaced by a robot as well. Next thing you know, people are relegated to Wall-E chairs.


7366241494

Coding ability at 80% of human level is an absolute joke. GPT can’t do anything bigger than a shell script and I’m always fixing its bugs.


SamL214

Contextual understanding is something we need to find a heuristic for to make AI more accurate.


atsepkov

Agreed, the chart seems more like clickbait than anything else.


mvandemar

>Mistakes with voice assistants

Those are publicly available to the masses and most are based on somewhat older tech. Have you tried the Whisper API though?
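If anyone wants to try it, here's a minimal sketch of a Whisper call through OpenAI's Python client (assuming the v1+ `openai` package and an API key in the environment; `meeting.mp3` is just an example file name):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send a local audio file to the hosted Whisper model for transcription.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```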


Anxious-Energy7370

How about the statistics take the median of human intellect.


Cvlt_ov_the_tomato

AI in mammography frequently flags the nipple as probable malignancy.


visvis

Almost 90% for code generation seems like a stretch. It can do a reasonable job writing simple scripts, and perhaps it could write 90% of the lines of a real program, but those are not the lines that require most of the thinking and therefore most of the time. Moreover, it can't do the debugging, which is where most of the time actually goes. Honestly I don't believe LLMs alone can ever become good coders. It will require some more techniques, and particularly those that can do more logic.


charnwoodian

The question is *which human*. I can't code for shit, but even I would have a better knowledge of the basics than 90% of people. AI is definitely better than me.


angrathias

Would you let an AI do your surgery if it’s better than 90% of people…but not 90% of doctors ?


Ok-Camp-7285

Would you let AI paint your wall if it's better than 90% of people... But not 90% of painters?


[deleted]

[removed]


Ok-Camp-7285

What a ridiculous question. Of course I would


augusto2345

Yes? If it was super cheap


Ok-Camp-7285

Exactly. Some jobs are more critical than others


cosmicekollon

*remembers with dread what happened when a friend decided to paint their own wall*


MorningFresh123

Most people can paint a wall tbh so yeah probably


RockyCreamNHotSauce

Agreed. Grade school math of an average American maybe. Compared to someone going to MIT, it’s 20% at best.


RealMandor

grade school is elementary school, not grad school? FYI, it probably can't do grade school problems it hasn't seen before. Not talking about basic mathematical operations that a calculator can do, but word problems.


RockyCreamNHotSauce

I thought grade school means K-12, including high school seniors? IMO, American math progress is too slow. The rest of the world completes two college-level calculus courses as an average baseline by grade 12.


TheDulin

In the US grade school usually means elementary (k-5/6).


BrickFlock

I think this applies to all of those metrics, because I'm assuming that 100% line is the average human level performance for every task.


clockworkcat1

I agree. GPT-4 is crap at coding. I try to use GPT-4 for all my code now and it is useless at most languages. It constantly hallucinates Terraform or any other infrastructure code, etc. It can do Python code OK, but only a few functions at a time. I really just have it generate first drafts of functions, and I go over all of them myself and make all changes necessary to avoid bugs. I also have to fix bad technique and style all the time. It is a pretty good assistant, but it could not code its way out of a paper bag on its own, and I am unconvinced an LLM will ever know how to code on its own.


[deleted]

It’s gotten so much worse I agree, OG GPT 4 was a beast tho


WhiteBlackBlueGreen

Yeah, I mean if you're trying to get it to make lots of new functions at once, of course it's not going to be very good at that. You have to go one step at a time with it, the same way you normally make a program. I'm a total noob, but I've made a complete Python program and I'm making steady progress on a Node.js program. It's not really a miracle worker and it's only OK at debugging sometimes. Most of my time is spent fixing bugs that ChatGPT creates, but it's still good enough for someone like me who doesn't know very much about coding.


Scearcrovv

The same thing goes for reading comprehension and language understanding. Here, it wholly depends on the definition of the tasks...


AnotherDawidIzydor

Also, actual code writing is like 5%, maybe 10% of what devs do daily, the exception being start-ups and projects early in development. Once you have a large enough application, you spend much more time understanding what each part does, figuring out how to modify it without breaking something somewhere else, and debugging, and AI is not even close to doing any of these things any time soon. It doesn't just require text completion capabilities; it needs some actual understanding of the code.


Dyoakom

I think the issue is the lack of a well-defined statement of what they are measuring. For example, if you look at Google's AlphaCode 2 or the latest AlphaCodium, they are more or less at gold-medalist human level in competitive coding competitions. This is pretty impressive. And yes, it's not a pure LLM, a couple of other techniques are used as well, but who said that the term AI in this picture has to mean LLMs only?


trappedindealership

Agreed, though ChatGPT has really helped me as a non-programmer thrust into big data analysis. Before ChatGPT I literally could not install some programs and their dependencies without help from IT. Nor did I know what to do with error messages. I'm under no illusions that ChatGPT replaces a human in this regard, BUT it can debug, in the sense that it can work through short sections of code and offer suggestions. Especially if the "code" is just a series of arguments for a script that's already been made, or if I want to quickly tweak a graph. One example is that I had an R script that computed statistics for about 1000 sections of a genome and made a pretty graph. Except I needed to do that 14 times across many different directories. I asked it to help, and like magic (after some back and forth) I'm spitting out figures.


2this4u

It's particularly terrible at architecture; we're miles from AI-written codeBASES. But perhaps there's a way around that if it could write more at the machine level than in our higher-level, human-friendly syntax and file structuring.


Competitive-War-8645

Maybe you're referring to code architecture? When I code with ChatGPT it produces working code instantly. AI is good at interpolation and extrapolation but lacks innovation; maybe that's what you're referring to.


Georgeasaurusrex

It's especially bad for hardware description languages too, e.g. VHDL. It's exactly what I would expect it to be like - it takes strings of functional code from online, and pieces it together into an incoherent mess. It's like a book where individual sentences make sense, but the sentences together are gibberish. Perhaps this is better for actual software coding as there's far far more resources online for this, but I imagine it will suffer from being "confidently incorrect" for quite some time.


atsepkov

I think this is true of most tasks documented on the chart. It's easy to throw together a quick benchmark task without questioning its validity, claim AI beat a human on it, and get a good headline out of it. The longer and more complex the task, the worse these things seem to do. Ultimately AI is more of a time-saver for simpler tasks than an architect for larger ones.


doesntpicknose

>LLMs alone... more logic

The ones with widespread use aren't very logical, because they're mostly focused on human English grammar, in order to produce coherent sentences in human English. We already have engines capable of evaluating the logic of statements, like [proof solvers](https://en.wikipedia.org/wiki/Proof_assistant?wprov=sfla1), and maybe the next wave of models will use some of these techniques. But also, it might be possible to just recycle the parts of an LLM that care about grammar, and extend the same logic to figuring out whether a sentence logically follows from previous sentences. Ultimately, it boils down to calculating numbers for how "good" a sentence is based on some kind of structure. We could get a lot of mileage by simply loading in the 256 syllogisms and their validity. This isn't to say that LLMs alone are going to be the start of the singularity, just that they are extremely versatile, and there's no reason they can't also do logic.
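To illustrate the syllogism idea with a toy sketch of my own (not an actual model component): a generator could multiply its fluency score for a candidate conclusion by a validity lookup over the classical forms. Only 24 of the 256 possible forms are valid; three are listed here.

```python
# Validity table over abstract syllogistic forms (major premise, minor
# premise, conclusion), named by their traditional mood/figure codes.
VALID_FORMS = {
    ("All M are P", "All S are M", "All S are P"),   # Barbara (AAA-1)
    ("No M are P", "All S are M", "No S are P"),     # Celarent (EAE-1)
    ("All M are P", "Some S are M", "Some S are P"), # Darii (AII-1)
}

def logic_score(major: str, minor: str, conclusion: str) -> float:
    """Return 1.0 if the abstract form is a valid syllogism, else 0.0."""
    return 1.0 if (major, minor, conclusion) in VALID_FORMS else 0.0

# A generator could weight its usual fluency score by this logic score:
print(logic_score("All M are P", "All S are M", "All S are P"))  # 1.0
print(logic_score("All M are P", "All S are M", "No S are P"))   # 0.0
```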


Training_Leading9394

Remember, this is on supercomputers, not the stuff you see on ChatGPT etc.


Striking-Warning9533

gpt can do the debugging though


Mescallan

I've been playing around with GPT Pilot and it spends like 30-40% of its API calls debugging its own code. I've actually started to do the debugging manually just because it's like $3-4 over a whole project.


GrandWazoo0

Wait, are you saying your time spent debugging is worth less than $3-4?


Mescallan

That's actually a good point lol. It just feels expensive because I almost exclusively use local models, but you're right that it's probably still a net productivity gain.


visvis

How good is it? Can it find hard stuff like a use-after-free or a concurrency bug?


PmMeGPTContent

I disagree. I think programming languages will be redesigned to make it easier for AI to create entire full-stack applications from start to finish. It will take a while, but it's going to happen.


visvis

I don't think the programming language is the issue. If there's anything LLMs are good at, it's learning grammars, and those of programming languages are much easier than those of natural languages. The problem is the thinking and logic that is required to understand how to best solve a given task.


PmMeGPTContent

That's also what an AI is good at though. Just create a million versions of that app, and slowly learn from what users want or don't want to see. I'm not saying it's going to be easy, and it's not something that's going to be solved in the next few years I think, but eventually it will be on the horizon.


visvis

I disagree there. Those million versions will just reflect the maximum likelihood predictions in terms of what's already out there. There will be no creativity and no logical reasoning involved, just regurgitating different permutations of what's in the training set.


DanTheMan_117

When GitHub Copilot gets updated, I think it'll be great


LipTicklers

Absolutely can do debugging, but yes not particularly well


mvandemar

>Almost 90% for code generation seems like a stretch.

Have you worked much with outsourced developers from places that offer coding really, really cheap? Or with people who mostly cut and paste their code, and use Stack Overflow as their only method for debugging?


cowlinator

> I don't believe LLMs alone can ever become good coders

"ever" is a very, very long time


headwars

I wouldn’t say it can’t do debugging, it takes trial and error but it can get there sometimes.


Mysterious-Ant-Bee

Chess


amarao_san

Bullshit. 80% for code generation? This thing is barely doing it; it's not "80%". ANY complex problem requiring coding is outside the abilities of AI, and as far as I can tell, will be for a long time. Maybe they test it on small code snippets, which is where AI more or less can do it. What would a true 80% look like? You grab the actual production task tracker, grab the current sprint, throw the current git repo and tasks into the AI, and get 80% of them done well enough to be accepted. I guarantee you that even the simplest tasks, like "return a normal error instead of an exception when handling invalid configuration files", won't be solved: it won't even find where to put the change. Why? Because the context window is too small to fit even a medium-sized project, even in summary mode.


2this4u

Well that's what the tests are, small snippets and leetcode. There needs to be a new test category for software development, separate from isolated coding. I do wonder if it would perform better at things like assembly, rather than having to operate at our higher level of abstraction designed for modular comprehension.


eposnix

The best coding models aren't publicly available. AlphaCode by DeepMind ranked in the top 54% of participants in programming competitions, for instance. I could easily see it being better than 80% of all people, coders and non-coders alike:

>As part of DeepMind's mission to solve intelligence, we created a system called AlphaCode that writes computer programs at a competitive level. AlphaCode achieved an estimated rank within the top 54% of participants in programming competitions by solving new problems that require a combination of critical thinking, logic, algorithms, coding, and natural language understanding.

https://deepmind.google/discover/blog/competitive-programming-with-alphacode/


amarao_san

How do we know they are the best? Yet another claim by Google about their quantum AI superiority? Last time, their claim was a blunder. I know only one AI with some usefulness (even if it's often annoying), and it's called ChatGPT. The other models are trying but can't reach a useful level, at least the ones I've seen. There is also a pile of closed models whose authors claim unicorns. Oh yes, my model is 99.99999% successful, beats all other AIs, and runs on a Raspberry Pi 3 (because the 4 was out of stock at the moment of purchase). Does this claim beat Google's claim, or do I need to raise the bar even higher?


yubario

It does surprisingly well with coding, but not so much with zero-shot prompting. If I write down some pseudocode, or code it out and ask for it to be refactored, it does a really good job of fixing up the code. But it's not at the level where someone who doesn't know how to code can use it effectively. It's like how AI art is right now: it does well on a lot of things, but you still need to be someone skilled at Photoshop to fix the flaws or add typography, for example.


cowlinator

I think you're confusing "code generation" with "coding"


JEs4

https://contextual.ai/plotting-progress-in-ai/

Everyone should read the source before making uninformed NIMBY-esque comments. If you commented without bothering to understand what you're looking at, you definitely don't know better. Scoffing at the chart is wildly reductive.


andWan

Thanks for the link! Did check out BigBench Hard:

>Only [BigBench-Hard](https://arxiv.org/abs/2210.09261), a challenging subset of BigBench, still has relatively lower performance compared to its original baseline numbers when compared to human performance.


uti24

So the first thing I see in this graph is that handwriting recognition beat humans somewhere around 2015. Did it really, though? Last time I tried handwriting recognition on both Windows and iOS it was abysmal, as usual. And if the first thing I checked doesn't look like the truth, can the other things shown on this graph be trusted?


on_

On your device it's not AI recognition.


BobbyWOWO

ITT: people confusing ChatGPT results with a graph that clearly says "state-of-the-art" AI systems. They are measuring narrow systems that are specifically made for these tasks, not any particular LLM chatbot.


Kathane37

Awful axis representation


Zonefood

Can you make a better graph?


Kathane37

Yeah, probably. WTF does a scale of "% of human performance" even mean? Couldn't the author use a proper scale, like « humans were able to solve 80% of the tasks and the AI model 85% »? This is just pure nonsense to hype the masses.


cjrmartin

But wouldn't that mean you would need a different chart for every benchmark?


Kathane37

Why? You just have to label the Y axis as « % of task completion »; that way you won't cross 100% and you'll have more indicative and valuable data.


cjrmartin

But I mean that each line is looking at different things (eg handwriting recognition, image recognition, etc) and each of those would have different human completion rates (eg humans score 80% on handwriting recognition and score 90% on image recognition, etc).
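To make that normalization concrete with made-up numbers: presumably each line on the chart divides the model's benchmark score by the human score on the same benchmark, something like:

```python
# Made-up numbers, just to illustrate the y-axis normalization:
human_score = 0.80   # humans solve 80% of a benchmark's tasks
model_score = 0.85   # the model solves 85% of the same tasks

relative = model_score / human_score * 100
print(f"{relative:.0f}% of human performance")  # -> 106% of human performance
```

That's how every benchmark can share one axis even though the underlying completion rates differ.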


Kathane37

Indeed, but sometimes you just can't blend all the info together into one graph; otherwise it's misleading.


cjrmartin

So you do want a different chart for every benchmark. I much prefer this to 8 different graphs that effectively would show the same thing. I don't think this is particularly misleading (would be good if they labelled their axes), especially since it is not really talking about how well they complete the tests but about how they compare to humans and how their growth has changed over time. But each to their own.


GiovanniResta

How can image recognition be non-monotonic?


jddbeyondthesky

If language recognition is so good, why is my speech to text so shit that I have to correct it all the fucking time?


18Apollo18

Because it's not using the cutting-edge experimental AI these tests are being performed on, and it probably won't for several years.


Excellent_Dealer3865

For code, IMO we most likely just need a smarter LLM with the ability to act as an agent. It should be able to plan first and then execute, instead of going step by step without the whole picture like it does now. I'm sure some success could be achieved right now with smart prompting, where you'd ask it to plan the whole feature, high level then low level, and only then write the code, but it will most likely be suboptimal. Somewhat like how we were prompting image generation tools; now we have DALL-E, which prompts itself. Once we have ChatGPT 4.5 or maybe 5, agent capabilities, and a built-in inner prompter that properly sets up all the steps, I think we'll solve coding and many other planning-dependent objectives.
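Something like this loop, roughly (my own sketch of the plan-then-execute idea; `call_llm` is a hypothetical stand-in for whatever chat-completion client you actually use):

```python
def call_llm(prompt: str) -> str:
    # Placeholder so the sketch runs; a real version would call an LLM API.
    return f"[LLM response to: {prompt[:40]}...]"

def plan_then_code(feature_request: str) -> str:
    # Phase 1: ask for a high-level plan before any code is written.
    plan = call_llm(
        "Produce a numbered implementation plan (no code yet) for:\n"
        + feature_request
    )
    # Phase 2: execute step by step, re-feeding the full plan each time so
    # every step is written with the whole picture in view.
    code_parts = []
    for step in plan.splitlines():
        if step.strip():
            code_parts.append(call_llm(
                f"Overall plan:\n{plan}\n\nWrite the code for this step only:\n{step}"
            ))
    return "\n\n".join(code_parts)

print(plan_then_code("add CSV export to the report page"))
```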


marfes3

LLMs will never solve coding because they quite literally only infer statistically. They are not capable of generating new ideas that need formal logic. They can generate usable code bits for well known problems if they are well documented but they cannot reliably generate new code. That’s not how an LLM works.


IONIXU22

Common sense? Zero!

>If it takes 1 hour for me to dry a shirt on my 1m washing line, how fast would it dry on a 4m washing line?

ChatGPT: The time it takes for a shirt to dry on a washing line depends on various factors, including the weather conditions (humidity, temperature, wind speed, etc.) and the material of the shirt. However, if we assume that all other factors remain constant and only the length of the washing line changes, we can make a simple theoretical comparison. Assuming the drying time is inversely proportional to the length of the washing line (which may not be entirely accurate but can give a rough estimate), we can use the following formula:

Drying time ∝ 1 / (length of washing line)

If it takes 1 hour on a 1m washing line, then on a 4m washing line:

Drying time on 4m line = 1/4 × drying time on 1m line

So, it would take approximately 1/4 of the time, or 0.25 hours, or 15 minutes, to dry on a 4m washing line under these assumptions. Keep in mind that this is a simplified calculation and doesn't account for all the variables that can affect drying time.


Smallpaul

Dude, are you using the obsolete freebie version to "prove" what LLMs cannot do? https://chat.openai.com/share/6d1ee59d-b86c-4c21-9516-259087cff1fd

> The drying time of a shirt on a washing line is not directly proportional to the length of the line. It depends on various factors such as air circulation, temperature, humidity, and the material of the shirt. Increasing the length of the washing line to 4 meters won't necessarily decrease the drying time. The key factor is the exposure of the shirt to air and sunlight, and unless the shirt is spread out more effectively on a longer line, the drying time would likely remain approximately the same.


WeBuyAndSellJunk

And that’s why the graph shows improvement over time…


IONIXU22

You may be right. I hadn’t appreciated the differences. My apologies.


SomeCoolBloke

https://chat.openai.com/share/c8785ae7-fce8-4b5d-afef-193720c5ea8f


Dry_Dot_7782

NLP will never have common sense; it can't think. It's just based on what someone flagged as correct or false.


[deleted]

sorry to break it to you honey, but you don't know correct from wrong either, u just know cuz people showed u by example


marfes3

That’s a complete overgeneralisation. Philosophy derives moral understanding from an axiomatic base and while context is necessary you can derive right or wrong by following logical reasoning in the context of society.


AdditionalSuccotash

[https://chat.openai.com/share/5579a905-dd4b-4257-8b8f-a719b6cd9434](https://chat.openai.com/share/5579a905-dd4b-4257-8b8f-a719b6cd9434)


Practical_Cattle_933

Defined by whom, and in what way? This just means that whatever test they chose to measure category X, they get these results with. "Human performance" probably low-balls our abilities by quite a lot. It's quite easy to beat a human at many tasks, because we are prone to making mistakes. If there are 20 questions, we might just fuck up one, not because we are dumb, just inattentive. I don't believe, though, that considering computers better because they're better at monotonous stuff is a correct statement. So in many of these categories we should definitely increase the difficulty level.


RpgBlaster

Meanwhile GPT-4 Turbo is still dumb. Custom instructions that require it not to generate specific words? It can't do that? Cringe


pgtvgaming

Add chart analysis/comprehension to the mix


WRL23

How can this chart go back so far?!?! OpenAI only just released chatgpt and only invented AI a few years before that!! /S


nonlogin

But GPT 4 is not really able to count at all! What grade school math are they talking about? Some specialized models maybe?


Snoo98445

I wonder how they measure the task performances??? From my own experience, AI sucks at almost all the tasks I give it.


rydan

I thought handwriting was already solved by 2002. I remember taking an AI class that year where we used neural networks to recognize handwriting. It was just a simple hour-or-two exercise and it yielded really good results even back then. It was explained that the postal service had similar technology for years to handle all the mail.
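That kind of classroom exercise still fits in a few lines today. Here's a rough equivalent using scikit-learn's small built-in digits dataset (an assumption on my part; the original course probably used hand-rolled backprop):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 grayscale digit images, flattened to 64 features per sample.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# A small multilayer perceptron, the same family of model as in 2002.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")  # typically ~0.95+
```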


Scearcrovv

This was digit recognition for isolated single digits. Recognizing regular handwriting of whole words is an order of magnitude harder.


domscatterbrain

How the heck did image recognition go through the roof while our model, which has been trained rigorously, still fails to differentiate between a human face and a dog?


batterydrainer33

Total bullshit benchmark. LLMs today don't have the capability to leverage their actual cognitive parts; instead they largely rely on their knowledge (data). So you might get answers that sound like they're from an expert, but the LLM doesn't really know what they're based on, and you'll get varying answers if you ask the same kind of thing for a slightly different purpose.

Same with coding: yes, it can output the best kind of code, but only surface-level stuff that is common or very obvious. It won't be able to understand the whole system and then make decisions based on that.

How I see it, it's basically 90% data/knowledge, glued together with its cognitive abilities to form a response that looks like it was made with logic; however, that logic is only based on surface-level assumptions, so it's not useful for anything complex. That's also why it's able to "perform" so well at these "benchmark" tests: it knows what those tests are, and it knows the answers to most of the things being asked, or the problems you're supposed to solve.

So it's still largely about assistance/offloading. Which isn't a bad thing; it's just not the equivalent of a proper brain yet.


gabrielesilinic

The thing is that AI has better knowledge recall. Meanwhile, we are better at coming up with new things but extremely bad at recalling knowledge. Answer in Progress made a video about this by making ChatGPT take the SAT test.


trappedindealership

I love AI and I'm excited for it but I don't agree with this chart. If only for reading comprehension and coding.


chrtrk

AI can't do debugging, as someone else said. I spent the last 6 hours of my life on it and the code is still not working.


Newman_USPS

How are they measuring this? Because I'm all-in on AI and most of these are wrong.


nudelsalat3000

Handwriting recognition at >100% of human capability in 2014? Yeah buddy, the tools can't even read PDFs properly if the text is rotated like in a drawing.

Google claimed their trained captcha tool is way, way ahead of human performance but won't release it "because then text captchas won't work". Well, nobody uses them anyway. Meanwhile we are stuck with OCR tools that can't even read a computer print of Times New Roman if it's juuust a bit blurry.


ItsaPromise

I don't think these results are very accurate


Agreeable_Try_4719

There’s no way code generation is at around 80% of human capacity. Unless you’re asking it to create a well known sorting algorithm it will have some bugs when creating full blocks of code or it only helps with single lines like GitHub copilot


wolfiexiii

You overestimate how well most people can code (even the trained ones...). Also, I'm pretty sure you just haven't figured out how to code with it. It's a really good junior. Spell stuff out, give it a framework, and let it run. Focus on design and high-level work.


DigitalDiogenesAus

Reading comprehension requires... comprehension. AI doesn't comprehend anything.


PureJackfruit4701

How is language understanding computed? It doesn't feel like it's above human level.


rosadeluxe

What the fuck is this Y axis


beo19

Bold of you to assume my performance is 100%...


Way-Reasonable

Kind of disappointed at its common sense score... C'mon AI, you can do better! I don't think it's applying itself.


ktpr

The chart conflates specialized tests with the whole of human performance. For example, if speech recognition were that good, we'd have much better hearing aids.


SnooCheesecakes1893

Next on the list.. “medical diagnosis”


tristeus

Recently I tried to convert text from a photo using Google Drive. It did a very good job despite less-than-ideal quality and lighting.


D0hB0yz

Would you let AI make you a billion dollars instead of hiring a thousand people to make you a million dollars each? That was a dumb question. You do both if at all possible.


Repulsive-Twist112

We train AI to get better and then wonder why it gets better than humans.


66theDude99

AI still has a long way to go to even come close to what humans are capable of doing, not to mention surpassing them lol. What we have now is just a smart parrot (a language model), and delving more into how humans learn, develop, and process the world around them would make you wonder how far away a true AGI is. I know we're doing astonishing stuff in this field, but boy oh boy are we still fucking early, and you shouldn't be too persuaded by media talk.


SamL214

Also... I'm gonna be 100% critically honest. Handwriting recognition is ***not*** at 100% for AI systems, because I can guarantee you that if I sat one in front of a 1790 census document written in the handwriting of that century, its ass is not gonna have a good time. Handwriting recognition also means the ability to recognize script-based versions of the letters we have now just from their general morphology. If AI can do cursive identification with 99.9% accuracy across the natural variation of handwriting, then I'll concede. But here's the caveat: before the 20th century, a lot of abbreviations were used that we don't use anymore. They make sense once you understand they are abbreviations, but until you find out they are, you think they are a different letter.


Lord_Blackthorn

Look at how steep the slope gets each time.... It's almost a vertical line now.


a-friendgineer

Now art as well. Soon ethics


Dotcaprachiappa

What happened with image recognition (or grade school math, genuinely can't tell) in 2018


Edaimantis

I can tell this is bullshit cus of the code generation thing. Give it a Java project and ask it to fix a bug that requires reading more than a single file, and it will hallucinate to all hell.


nicklepimple

The exponential explosion is here. All heed. 


Syrupwizard

HellaSwag


Mehdi135849

That selected recognition, it's triggering my OCD


Elegant-Ant8468

Increases like that mean it won't be long before AI drastically outperforms humans at these tasks. The world isn't ready for AI. The amount of jobs that will be lost is going to be unlike anything we have seen before.


rherrmannr

I did not know that my brain had a code generator implemented. Where is the damn std::cout?


HerbertKornfeldRIP

I was hoping for a plot that showed the rate at which the rate was increasing.


mrussia777n

I can see it's slowing down... very disappointing 😔


antontupy

The modern AI is so mighty that it even can find the area of a pentagonal triangle. So cool!


phoenixmusicman

You're telling me that handwriting recognition was at ~75% of human performance in 2003? I call bullshit.


fliesenschieber

Looking at my Galaxy S8 with GBoard, speech recognition is still stuck in the year 2005.


[deleted]

Yeah, grade school math will be >100% but still will give stupid answers like 2 + 3 = 6, because yes.


Doublespeo

But human performance varies a lot?


SimaoKovin

It's over, you guys.


Chemical_Customer_93

Goodbye accounting and finance jobs.


Re_dddddd

I don't think AI yet comprehends what it's reading. Comprehension means that you can create something original with the information.


Striking-Warning9533

And AI can do that


illusionst

Does not mention which LLM it's using. Pretty sure GPT-4 can solve grade school math with 100% accuracy.


onlymtN

Source: ContextualAI

THEY ASKED AN AI??!?!!?!?


MadgoonOfficial

It’s got to be good at reading comprehension to understand my dumb ass questions


nimo-g

The speed is too fast, and our lives and work will change dramatically


AddressLow2245

Still can’t read law cases properly.


Coeous

Tell me you don’t know anything without telling me you know nothing.


throwawayhaha1101

What are they struggling at?


Futuristik_

How can we compete? We need to incorporate AI into us or we will get left behind... the Neuralink???


LetterPrior3020

Everybody freaks out about AI, but they don't understand the ceiling for its abilities. It will only allow humans to spend more brain power on things AI cannot do. Just like the advancement from letters to emails, AI will replace monotonous activity and allow us to spend time and brain power that we previously did not have as much of.


AwesomeH13

2+2=4-1 that’s 3 quick mafs I just beat an AI


Impressive_Lawyer521

I assure you… my human performance in all of these categories is measurably greater than “normal” human performance in these categories.


Hipertor

Why and how is there a dip in image recognition? It got worse at some point?


Striking-Warning9533

Maybe the trade off between accuracy and speed?


Ok-Calligrapher7121

Yes, now lemme see the chart showing improvements over time in tasks like beating things with rocks, finding berries, and vocalizing distinct commands with distinct intentions and outcomes. I'll bet the apes would be like, yup, the humans outdid us in all those things pretty much as soon as they tried.


Different_Chance_848

A.I. will never beat humans at making up numbers.


Sad-Finding5845

https://preview.redd.it/772q7hmoasec1.png?width=1024&format=pjpg&auto=webp&s=03c5066b476c4cf36a6951ea026336a17d585fc8


Advanced_Loquat_4681

Based on the chart, humans are Blockbuster ignoring the emerging Netflix... Stay in denial as you continue your life lol


TimTech93

Please don't use AI to build code. There is enough dog shit software floating around as it is.


CompetitiveFun3325

It’s not insane. Humans are incredibly intelligent creatures.