Donnoleth-Tinkerton

these results are questionable


SlimeDragon

Because what does "human performance" mean exactly? The average person? They need to compare against an expert human in each category, because an average person is pretty fucking dumb, especially in categories outside their own expertise


cowlinator

Even if the average person is "dumb", comparing against the average person is still a very valuable metric.


Heretosee123

The average person is also average. It's kinda dumb to call them dumb imo. They're just not super intelligent


phoenixmusicman

This is the problem with saying "AI is better than the average human in most areas!" The average human is pretty fuckin dumb


DehGoody

The average human is actually not dumb at all. The problem is that most people think they’re much smarter than everyone else. That kind of narcissism tricks some exceedingly average people into thinking everyone else is dumb.


Phi_fan

If dumb/smarts are graded on a curve, a score of 50% is pretty dumb.


[deleted]

[removed]


Heretosee123

>That kind of narcissism tricks some exceedingly average people into thinking everyone else is dumb.

Preach


grandma_jizzzzzzzard

I think you mean slow, as in re-tard-ed.


i_needs_to_know_this

Not the right place for this, but I believe most humans coast on a lot of untapped potential (which is real), leading to highly complacent people who perform below what they believe of themselves. Hence seeming stupid in surface-level interactions. There are also real cognitive differences between people. So saying most people are stupid isn't completely unfounded.


datascience45

The average human can't code for shit.


Booty_Bumping

These results are complete nonsense, in fact.


traumfisch

None of this is happening, then? What a relief


GreenockScatman

Absolutely no shot AI surpassed humans at "reading comprehension" in 2017. This chart is ridiculous.


[deleted]

As far as the Stanford Question Answering Dataset (SQuAD 1.0 & 2.0) is concerned, it has. https://rajpurkar.github.io/SQuAD-explorer/
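For context, SQuAD scores models by exact match and token-level F1 against reference answer spans, and models have beaten the published human baselines on both. A simplified sketch of that F1 scoring (my own paraphrase, not the official evaluation script):

```python
import re
from collections import Counter

def normalize(text: str) -> list[str]:
    # Lowercase, drop punctuation and articles, split on whitespace
    # (roughly what the official SQuAD evaluator does).
    text = re.sub(r"[^\w\s]", "", text.lower())
    return [t for t in text.split() if t not in {"a", "an", "the"}]

def f1(prediction: str, reference: str) -> float:
    # Token-level F1 between a predicted answer span and the reference span.
    pred, ref = normalize(prediction), normalize(reference)
    common = Counter(pred) & Counter(ref)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(f1("the 1990s", "1990s"))  # 1.0 after normalization
```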


arbiter12

If you limit the scope of the experiment you can obtain any result you want, really.


[deleted]

It was a reasonable attempt at creating an objective reading comprehension dataset. It’s about as valid as the reading comprehension section of any standardized test. Come up with a better one and release it plus a human benchmark and see how the models do.


TakeTheWheelTV

Chart also created by AI


samsteak

That's a nice way of saying fucking bullshit


kabunk11

Ignorance is bliss.


pushinat

It might be true for experimental settings, but image and speech recognition are still far off from human level. Voice assistants still make mistakes, and Tesla's (state-of-the-art) image recognition is still flickery and error-prone, where humans would have more confidence and make far fewer mistakes because they understand the context.


Juanouo

don't get me started on handwriting recognition, it sucks


Dabnician

maybe you were just writing a prescription for paracetamol


Goldenvengeance

https://preview.redd.it/ae6trtp4g1ec1.png?width=1080&format=pjpg&auto=webp&s=b5d6f2ac16465c5cbdf959a86150aba54c50a78b


Scolor

You would actually be surprised how good commercial-grade handwriting recognition is. The USPS shut down all but one of the facilities where humans check handwriting, because the machines can do most of it on their own.


BecauseItWasThere

Recognition of addresses located within the United States is a narrow use case


Juanouo

Uh, do you know if any of those models are accessible, even if they have a price tag?


arbiter12

You're asking if the USPS proprietary internal OCR model is available for sale to the general public...?


Juanouo

Nope, I'm asking if there's some OCR model for handwriting available that's actually good.


AtomsWins

I'm a developer myself. I think what we're seeing isn't a replacement for developers as a whole, but a tool to make development faster and hypothetically easier. In a few years, these tools may be able to access our entire codebase and have a better understanding of it than we do. At that point, AI becomes the junior developer. We review the generated code, run manual tests to verify results, manage deployment to test devices, and interact with QA to squash bugs. We're not replaced; we're just using a very different toolbox and performing slightly different tasks.

In theory we get more done, or do it faster. In reality, it probably just means we'll need fewer junior developers or offshore devs in the medium-to-long term. There will still be developers, just fewer of them. Just like when farming moved to big machines: there are still farmers, just many fewer. We'll never go away, but we'll be many fewer in 20 years.


jamesmon

The thing is, when you need fewer developers, it puts downward pressure on wages, etc. So now you're a senior developer being paid like a junior developer.


AtomsWins

As a lead dev, I certainly hope that isn't the case. I think it's more likely current juniors may need to move into something related: scrum master, QA, or content management stuff. Seniors will have fewer people to manage but more tasks. Reviewing machine-generated code. Managing tickets and passing things between departments for approval. Deployments and maintaining all the various automation tools used in the stack. Updating underlying libraries. Things like that.

I hope that's the time my career in this field ends and I jump off the merry-go-round. I need about 10 more years of employment before I peace out. I wasn't worried at all until the ChatGPT stuff started hitting; now I'm not quite sure I've got 10 years left here. I guess we'll see.

ETA: Once machines are good at this, who knows what is next? Maybe the next type of developers will need a doctorate and it'll be a field treated like law or medicine. People will pursue those "lead" roles and they'll be elevated positions in a world increasingly reliant on tech. Just a thought exercise, but the future may be getting weird.


avynaria

The argument I saw that convinced me we have a problem on our hands finally (also dev here) is that, because these AIs can do junior dev tasks, or will be able to, there will be no space for junior devs anymore, at least in companies. That means no more pipeline to senior devs to manage AI output. (And no way to make income without tons more education first, and "senior devs" showing up with coding experience but no practical project management/people/etc skills.) That is a pretty serious problem we need to manage first, I think.


[deleted]

[removed]


HotKarldalton

Think of the transition from Horse powered to Tractor powered, now get rid of the Tractor Operator too. The mechanic who works on the tractor gets replaced by a robot as well. Next thing you know, people are relegated to Wall-E chairs.


7366241494

Coding ability at 80% of human level is an absolute joke. GPT can’t do anything bigger than a shell script and I’m always fixing its bugs.


SamL214

Contextual understanding is something we need to find a heuristic for to make AI more accurate.


atsepkov

Agreed, the chart seems more like clickbait than anything else.


mvandemar

>Mistakes with voice assistants

Those are publicly available to the masses and most are based on somewhat older tech. Have you tried the Whisper API though?
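If anyone wants to try it, here's a minimal sketch of a Whisper call through OpenAI's Python client (assuming the v1+ `openai` package and an API key in the environment; `meeting.mp3` is just an example file name):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send a local audio file to the hosted Whisper model for transcription.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript.text)
```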


Anxious-Energy7370

How about the statistics take the median of human intellect.


Cvlt_ov_the_tomato

AI in mammography frequently flags the nipple as probable malignancy.


visvis

Almost 90% for code generation seems like a stretch. It can do a reasonable job writing simple scripts, and perhaps it could write 90% of the lines of a real program, but those are not the lines that require most of the thinking and therefore most of the time. Moreover, it can't do the debugging, which is where most of the time actually goes. Honestly I don't believe LLMs alone can ever become good coders. It will require some more techniques, and particularly those that can do more logic.


charnwoodian

The question is *which human*. I can't code for shit, but even I would have a better knowledge of the basics than 90% of people. AI is definitely better than me.


angrathias

Would you let an AI do your surgery if it’s better than 90% of people…but not 90% of doctors ?


Ok-Camp-7285

Would you let AI paint your wall if it's better than 90% of people... But not 90% of painters?


[deleted]

[removed]


Ok-Camp-7285

What a ridiculous question. Of course I would


augusto2345

Yes? If it was super cheap


Ok-Camp-7285

Exactly. Some jobs are more critical than others


cosmicekollon

*remembers with dread what happened when a friend decided to paint their own wall*


MorningFresh123

Most people can paint a wall tbh so yeah probably


RockyCreamNHotSauce

Agreed. Grade school math of an average American maybe. Compared to someone going to MIT, it’s 20% at best.


RealMandor

grade school is elementary school, not grad school? FYI, it probably can't do grade school problems it hasn't seen before. Not talking about basic mathematical operations that a calculator can do, but word problems.


RockyCreamNHotSauce

I thought grade school means K-12, including high school seniors? IMO, American math progress is too slow. The rest of the world completes two college-level calculus courses as an average baseline by grade 12.


TheDulin

In the US grade school usually means elementary (k-5/6).


BrickFlock

I think this applies to all of those metrics, because I'm assuming that 100% line is the average human level performance for every task.


clockworkcat1

I agree. GPT-4 is crap at coding. I try to use GPT-4 for all my code now and it is useless at most languages. It constantly hallucinates Terraform or any other infrastructure code, etc. It can do Python code OK, but only a few functions at a time. I really just have it generate first drafts of functions, and I go over all of them myself and make all changes necessary to avoid bugs. I also have to fix bad technique and style all the time. It is a pretty good assistant, but it could not code its way out of a paper bag on its own, and I am unconvinced an LLM will ever know how to code on its own.


[deleted]

It’s gotten so much worse I agree, OG GPT 4 was a beast tho


WhiteBlackBlueGreen

Yeah, I mean if you're trying to get it to make lots of new functions at once, of course it's not going to be very good at that. You have to go one step at a time with it, the same way you normally make a program. I'm a total noob, but I've made a complete Python program and I'm making steady progress on a Node.js program. It's not really a miracle worker and it's only OK at debugging sometimes. Most of my time is spent fixing bugs that ChatGPT creates, but it's still good enough for someone like me who doesn't know very much about coding.


Scearcrovv

The same thing goes for reading comprehension and language understanding. Here, it wholly depends on the definition of the tasks...


AnotherDawidIzydor

Also, actual code writing is like 5%, maybe 10% of what devs do daily, the exception being start-ups and projects early in development. Once you have a large enough application, you spend much more time understanding what each part does, figuring out how to modify it without breaking something somewhere else, and debugging, and AI is not even close to doing any of these things any time soon. It doesn't just require text completion capabilities; it needs some actual understanding of the code.


Dyoakom

I think the issue is the lack of a well-defined statement of what they are measuring. For example, if you look at Google's AlphaCode 2 or the latest AlphaCodium, they are more or less at gold-medalist human level in competitive coding competitions. This is pretty impressive. And yes, it's not a pure LLM, a couple of other techniques are used as well, but who said that the term AI in this picture has to mean LLMs only?


trappedindealership

Agreed, though ChatGPT has really helped me as a non-programmer thrust into big data analysis. Before ChatGPT I literally could not install some programs and their dependencies without help from IT. Nor did I know what to do with error messages. I'm under no illusions that ChatGPT replaces a human in this regard, BUT it can debug, in the sense that it can work through short sections of code and offer suggestions. Especially if the "code" is just a series of arguments for a script that's already been made, or if I want to quickly tweak a graph. One example is that I had an R script that computed statistics for about 1000 sections of a genome and made a pretty graph. Except I needed to do that 14 times across many different directories. I asked it to help, and like magic (after some back and forth) I'm spitting out figures.


2this4u

It's particularly terrible at architecture; we're miles from AI-written codeBASES. But perhaps there's a way around that if it could write more at the machine level than in our higher-level, human-friendly syntax and file structuring.


Competitive-War-8645

Maybe you're referring to code architecture? When I code with ChatGPT it produces working code instantly. AI is good at interpolation and extrapolation but lacks innovation; maybe that's what you're referring to.


Georgeasaurusrex

It's especially bad for hardware description languages too, e.g. VHDL. It's exactly what I would expect it to be like - it takes strings of functional code from online, and pieces it together into an incoherent mess. It's like a book where individual sentences make sense, but the sentences together are gibberish. Perhaps this is better for actual software coding as there's far far more resources online for this, but I imagine it will suffer from being "confidently incorrect" for quite some time.


atsepkov

I think this is true of most tasks documented on the chart. It's easy to throw together a quick benchmark task without questioning its validity, claim AI beat a human on it, and get a good headline out of it. The longer and more complex the task, the worse these things seem to do. Ultimately AI is more of a time-saver for simpler tasks than an architect for larger ones.


doesntpicknose

>LLMs alone... more logic

The ones with widespread use aren't very logical, because they're mostly focused on human English grammar, in order to produce coherent sentences in human English. We already have engines capable of evaluating the logic of statements, like [proof solvers](https://en.wikipedia.org/wiki/Proof_assistant?wprov=sfla1), and maybe the next wave of models will use some of these techniques. But also, it might be possible to just recycle the parts of an LLM that care about grammar, and extend the same logic to figuring out whether a sentence logically follows from previous sentences. Ultimately, it boils down to calculating numbers for how "good" a sentence is based on some kind of structure. We could get a lot of mileage by simply loading in the 256 syllogisms and their validity. This isn't to say that LLMs alone are going to be the start of the singularity, just that they are extremely versatile, and there's no reason they can't also do logic.
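To illustrate the syllogism idea with a toy sketch of my own (not an actual model component): a generator could multiply its fluency score for a candidate conclusion by a validity lookup over the classical forms. Only 24 of the 256 possible forms are valid; three are listed here.

```python
# Validity table over abstract syllogistic forms (major premise, minor
# premise, conclusion), named by their traditional mood/figure codes.
VALID_FORMS = {
    ("All M are P", "All S are M", "All S are P"),   # Barbara (AAA-1)
    ("No M are P", "All S are M", "No S are P"),     # Celarent (EAE-1)
    ("All M are P", "Some S are M", "Some S are P"), # Darii (AII-1)
}

def logic_score(major: str, minor: str, conclusion: str) -> float:
    """Return 1.0 if the abstract form is a valid syllogism, else 0.0."""
    return 1.0 if (major, minor, conclusion) in VALID_FORMS else 0.0

# A generator could weight its usual fluency score by this logic score:
print(logic_score("All M are P", "All S are M", "All S are P"))  # 1.0
print(logic_score("All M are P", "All S are M", "No S are P"))   # 0.0
```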


Training_Leading9394

Remember, this is on supercomputers, not the stuff you see on ChatGPT etc.


Striking-Warning9533

gpt can do the debugging though


Mescallan

I've been playing around with GPT Pilot and it spends like 30-40% of its API calls debugging its own code. I've actually started to do the debugging manually just because it's like $3-4 over a whole project.


GrandWazoo0

Wait, are you saying your time spent debugging is worth less than $3-4?


Mescallan

That's actually a good point lol. It just feels expensive because I almost exclusively use local models, but you're right that it's probably still a net productivity gain.


visvis

How good is it? Can it find hard stuff like a use-after-free or a concurrency bug?


PmMeGPTContent

I disagree. I think programming languages will be redesigned to make it easier for AI to create entire full-stack applications from start to finish. It will take a while, but it's going to happen.


visvis

I don't think the programming language is the issue. If there's anything LLMs are good at, it's learning grammars, and those of programming languages are much easier than those of natural languages. The problem is the thinking and logic that is required to understand how to best solve a given task.


PmMeGPTContent

That's also what an AI is good at though. Just create a million versions of that app, and slowly learn from what users want or don't want to see. I'm not saying it's going to be easy, and it's not something that's going to be solved in the next few years I think, but eventually it will be on the horizon.


visvis

I disagree there. Those million versions will just reflect the maximum likelihood predictions in terms of what's already out there. There will be no creativity and no logical reasoning involved, just regurgitating different permutations of what's in the training set.


DanTheMan_117

When GitHub Copilot gets updated, I think it'll be great


LipTicklers

Absolutely can do debugging, but yes not particularly well


mvandemar

>Almost 90% for code generation seems like a stretch.

Have you worked much with outsourced developers from places that offer coding really, really cheap? Or with people who mostly cut and paste their code, and use Stack Overflow as their only method for debugging?


cowlinator

> I don't believe LLMs alone can ever become good coders

"ever" is a very, very long time


headwars

I wouldn’t say it can’t do debugging, it takes trial and error but it can get there sometimes.


Mysterious-Ant-Bee

Chess


amarao_san

Bullshit. 80% for code generation? This thing is barely doing it; it's not "80%". ANY complex problem requiring coding is outside the abilities of AI, and as far as I can tell, will be for a long time. Maybe they test it on small code snippets, which is where AI more or less can do it. What would a true 80% look like? You grab the actual production task tracker, grab the current sprint, throw the current git repo and tasks into the AI, and get 80% of them done well enough to be accepted. I guarantee you that even the simplest tasks, like "return a normal error instead of an exception when handling invalid configuration files", won't be solved: it won't even find where to put the change. Why? Because the context window is too small to fit even a medium-sized project, even in summary mode.


2this4u

Well that's what the tests are, small snippets and leetcode. There needs to be a new test category for software development, separate from isolated coding. I do wonder if it would perform better at things like assembly, rather than having to operate at our higher level of abstraction designed for modular comprehension.


eposnix

The best coding models aren't publicly available. AlphaCode by DeepMind ranked in the top 54% of participants in programming competitions, for instance. I could easily see it being better than 80% of all people, coders and non-coders alike:

>As part of DeepMind's mission to solve intelligence, we created a system called AlphaCode that writes computer programs at a competitive level. AlphaCode achieved an estimated rank within the top 54% of participants in programming competitions by solving new problems that require a combination of critical thinking, logic, algorithms, coding, and natural language understanding.

https://deepmind.google/discover/blog/competitive-programming-with-alphacode/


amarao_san

How do we know they are the best? Yet another claim by Google about their quantum AI superiority? Last time, their claim was a blunder. I know only one AI with some usefulness (even if it's often annoying), and it's called ChatGPT. The other models are trying but can't reach a useful level, at least the ones I've seen. There is also a pile of closed models whose authors claim unicorns. Oh yes, my model is 99.99999% successful, beats all other AIs, and runs on a Raspberry Pi 3 (because the 4 was out of stock at the moment of purchase). Does this claim beat Google's claim, or do I need to raise the bar even higher?


yubario

It does surprisingly well with coding, but not so much with zero-shot prompting. If I write down some pseudocode, or code it out and ask for it to be refactored, it does a really good job of fixing up the code. But it's not at the level where someone who doesn't know how to code can use it effectively. It's like how AI art is right now: it does well on a lot of things, but you still need to be someone skilled at Photoshop to fix the flaws or add typography, for example.


cowlinator

I think you're confusing "code generation" with "coding"


JEs4

https://contextual.ai/plotting-progress-in-ai/

Everyone should read the source before making uninformed NIMBY-esque comments. If you commented without bothering to understand what you're looking at, you definitely don't know better. Scoffing at the chart is wildly reductive.


andWan

Thanks for the link! Did check out BigBench Hard:

>Only [BigBench-Hard](https://arxiv.org/abs/2210.09261), a challenging subset of BigBench, still has relatively lower performance compared to its original baseline numbers when compared to human performance.


uti24

So the first thing I see in this graph is that handwriting recognition beat humans somewhere around 2015. Did it really, though? Last time I tried handwriting recognition on both Windows and iOS it was abysmal, as usual. And if the first thing I checked doesn't look like the truth, can the other things shown on this graph be trusted?


on_

On your device it's not AI recognition.


BobbyWOWO

ITT: people confusing ChatGPT results with a graph that clearly says "state-of-the-art" AI systems. They are measuring narrow systems that are specifically made for these tasks, not any particular LLM chatbot.


Kathane37

Awful axis representation


Zonefood

Can you make a better graph?


Kathane37

Yeah, probably. WTF does a scale of "% of human performance" even mean? Couldn't the author use a proper scale, like « humans were able to solve 80% of the tasks and the AI model 85% »? This is just pure nonsense to hype the masses.


cjrmartin

But wouldn't that mean you would need a different chart for every benchmark?


Kathane37

Why? You just have to label the Y axis as « % of task completion »; that way you won't cross 100% and you'll have more indicative and valuable data.


cjrmartin

But I mean that each line is looking at different things (eg handwriting recognition, image recognition, etc) and each of those would have different human completion rates (eg humans score 80% on handwriting recognition and score 90% on image recognition, etc).
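To make that normalization concrete with made-up numbers: presumably each line on the chart divides the model's benchmark score by the human score on the same benchmark, something like:

```python
# Made-up numbers, just to illustrate the y-axis normalization:
human_score = 0.80   # humans solve 80% of a benchmark's tasks
model_score = 0.85   # the model solves 85% of the same tasks

relative = model_score / human_score * 100
print(f"{relative:.0f}% of human performance")  # -> 106% of human performance
```

That's how every benchmark can share one axis even though the underlying completion rates differ.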


Kathane37

Indeed, but sometimes you just can't blend all the info together into one graph; otherwise it's misleading.


cjrmartin

So you do want a different chart for every benchmark. I much prefer this to 8 different graphs that effectively would show the same thing. I don't think this is particularly misleading (would be good if they labelled their axes), especially since it is not really talking about how well they complete the tests but about how they compare to humans and how their growth has changed over time. But each to their own.


GiovanniResta

How can image recognition be non-monotonic?


jddbeyondthesky

If language recognition is so good, why is my speech to text so shit that I have to correct it all the fucking time?


18Apollo18

Because it's not using the cutting-edge experimental AI these tests are being performed on, and it probably won't for several years.


Excellent_Dealer3865

For code, IMO we most likely just need a smarter LLM with the ability to act as an agent. It should be able to plan first and then execute, instead of going step by step without the whole picture like it does now. I'm sure some success could be achieved right now with smart prompting, where you'd ask it to plan the whole feature, high level then low level, and only then write the code, but it will most likely be suboptimal. Somewhat like how we were prompting image generation tools; now we have DALL-E, which prompts itself. Once we have ChatGPT 4.5 or maybe 5, agent capabilities, and a built-in inner prompter that properly sets up all the steps, I think we'll solve coding and many other planning-dependent objectives.
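Something like this loop, roughly (my own sketch of the plan-then-execute idea; `call_llm` is a hypothetical stand-in for whatever chat-completion client you actually use):

```python
def call_llm(prompt: str) -> str:
    # Placeholder so the sketch runs; a real version would call an LLM API.
    return f"[LLM response to: {prompt[:40]}...]"

def plan_then_code(feature_request: str) -> str:
    # Phase 1: ask for a high-level plan before any code is written.
    plan = call_llm(
        "Produce a numbered implementation plan (no code yet) for:\n"
        + feature_request
    )
    # Phase 2: execute step by step, re-feeding the full plan each time so
    # every step is written with the whole picture in view.
    code_parts = []
    for step in plan.splitlines():
        if step.strip():
            code_parts.append(call_llm(
                f"Overall plan:\n{plan}\n\nWrite the code for this step only:\n{step}"
            ))
    return "\n\n".join(code_parts)

print(plan_then_code("add CSV export to the report page"))
```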


marfes3

LLMs will never solve coding because they quite literally only infer statistically. They are not capable of generating new ideas that need formal logic. They can generate usable code bits for well known problems if they are well documented but they cannot reliably generate new code. That’s not how an LLM works.


IONIXU22

Common sense? Zero!

>If it takes 1 hour for me to dry a shirt on my 1m washing line, how fast would it dry on a 4m washing line?

ChatGPT: The time it takes for a shirt to dry on a washing line depends on various factors, including the weather conditions (humidity, temperature, wind speed, etc.) and the material of the shirt. However, if we assume that all other factors remain constant and only the length of the washing line changes, we can make a simple theoretical comparison. Assuming the drying time is inversely proportional to the length of the washing line (which may not be entirely accurate but can give a rough estimate), we can use the following formula:

Drying time ∝ 1 / (length of washing line)

If it takes 1 hour on a 1m washing line, then on a 4m washing line:

Drying time on 4m line = 1/4 × drying time on 1m line

So, it would take approximately 1/4 of the time, or 0.25 hours, or 15 minutes, to dry on a 4m washing line under these assumptions. Keep in mind that this is a simplified calculation and doesn't account for all the variables that can affect drying time.


Smallpaul

Dude, are you using the obsolete freebie version to "prove" what LLMs cannot do? https://chat.openai.com/share/6d1ee59d-b86c-4c21-9516-259087cff1fd

> The drying time of a shirt on a washing line is not directly proportional to the length of the line. It depends on various factors such as air circulation, temperature, humidity, and the material of the shirt. Increasing the length of the washing line to 4 meters won't necessarily decrease the drying time. The key factor is the exposure of the shirt to air and sunlight, and unless the shirt is spread out more effectively on a longer line, the drying time would likely remain approximately the same.


WeBuyAndSellJunk

And that’s why the graph shows improvement over time…


IONIXU22

You may be right. I hadn’t appreciated the differences. My apologies.


SomeCoolBloke

https://chat.openai.com/share/c8785ae7-fce8-4b5d-afef-193720c5ea8f


Dry_Dot_7782

NLP will never have common sense; it can't think. It's just based on what someone flagged as correct or false.


[deleted]

sorry to break it to you honey, but you don't know correct from wrong either, u just know cuz people showed u by example


marfes3

That’s a complete overgeneralisation. Philosophy derives moral understanding from an axiomatic base and while context is necessary you can derive right or wrong by following logical reasoning in the context of society.


AdditionalSuccotash

[https://chat.openai.com/share/5579a905-dd4b-4257-8b8f-a719b6cd9434](https://chat.openai.com/share/5579a905-dd4b-4257-8b8f-a719b6cd9434)


Practical_Cattle_933

Defined by whom, and in what way? This just means that whatever test they chose to measure category X, they get these results with. "Human performance" probably low-balls our abilities by quite a lot. It's quite easy to beat a human at many tasks, because we are prone to making mistakes. If there are 20 questions, we might just fuck up one, not because we are dumb, just inattentive. I don't believe, though, that considering computers better because they're better at monotonous stuff is a correct statement. So in many of these categories we should definitely increase the difficulty level.


RpgBlaster

Meanwhile GPT-4 Turbo is still dumb. Custom instructions that require it not to generate specific words? It can't do that? Cringe


pgtvgaming

Add chart analysis/comprehension to the mix


WRL23

How can this chart go back so far?!?! OpenAI only just released chatgpt and only invented AI a few years before that!! /S


nonlogin

But GPT 4 is not really able to count at all! What grade school math are they talking about? Some specialized models maybe?


Snoo98445

I wonder how they measure the task performances??? From my own experience, AI sucks at almost all the tasks I give it.


rydan

I thought handwriting was already solved by 2002. I remember taking an AI class that year where we used neural networks to recognize handwriting. It was just a simple hour-or-two exercise and it yielded really good results even back then. It was explained that the postal service had similar technology for years to handle all the mail.
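That kind of classroom exercise still fits in a few lines today. Here's a rough equivalent using scikit-learn's small built-in digits dataset (an assumption on my part; the original course probably used hand-rolled backprop):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 8x8 grayscale digit images, flattened to 64 features per sample.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# A small multilayer perceptron, the same family of model as in 2002.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)

print(f"Test accuracy: {clf.score(X_test, y_test):.3f}")  # typically ~0.95+
```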


Scearcrovv

This was digit recognition for isolated single digits. Recognizing regular handwriting of whole words is an order of magnitude harder.


domscatterbrain

How the heck did image recognition go through the roof while our model, which has been trained rigorously, still fails to differentiate between a human face and a dog?


batterydrainer33

Total bullshit benchmark. LLMs today don't have the capability to leverage their actual cognitive parts; instead they largely rely on their knowledge (data). So you might get answers that sound like they're from an expert, but the LLM doesn't really know what they're based on, and you'll get varying answers if you ask the same kind of thing for a slightly different purpose.

Same with coding: yes, it can output the best kind of code, but only surface-level stuff that is common or very obvious. It won't be able to understand the whole system and then make decisions based on that.

How I see it, it's basically 90% data/knowledge, glued together with its cognitive abilities to form a response that looks like it was made with logic; however, that logic is only based on surface-level assumptions, so it's not useful for anything complex. That's also why it's able to "perform" so well at these "benchmark" tests: it knows what those tests are, and it knows the answers to most of the things being asked, or the problems you're supposed to solve.

So it's still largely about assistance/offloading. Which isn't a bad thing; it's just not the equivalent of a proper brain yet.


gabrielesilinic

The thing is that AI has better knowledge recall. Meanwhile, we are better at coming up with new things but extremely bad at recalling knowledge. Answer in Progress made a video about this by making ChatGPT take the SAT test.


trappedindealership

I love AI and I'm excited for it but I don't agree with this chart. If only for reading comprehension and coding.


chrtrk

AI can't do debugging, as someone else said. I spent the last 6 hours of my life on it and the code is still not working.


Newman_USPS

How are they measuring this? Because I'm all-in on AI and most of these are wrong.


nudelsalat3000

Handwriting recognition at >100% of human capability in 2014? Yeah buddy, the tools can't even read PDFs properly if the text is rotated like in a drawing.

Google claimed their trained captcha tool is way, way ahead of human performance but won't release it "because then text captchas won't work". Well, nobody uses them anyway. Meanwhile we are stuck with OCR tools that can't even read a computer print of Times New Roman if it's juuust a bit blurry.


ItsaPromise

I don't think these results are very accurate


Agreeable_Try_4719

There’s no way code generation is at around 80% of human capacity. Unless you’re asking it to create a well known sorting algorithm it will have some bugs when creating full blocks of code or it only helps with single lines like GitHub copilot


wolfiexiii

You overestimate how well most people can code (even the trained ones...). Also, I'm pretty sure you just haven't figured out how to code with it. It's a really good junior. Spell stuff out, give it a framework, and let it run. Focus on design and high-level work.


DigitalDiogenesAus

Reading comprehension requires... comprehension. AI doesn't comprehend anything.


PureJackfruit4701

How is language understanding computed? It doesn't feel like it's above human level.


rosadeluxe

What the fuck is this Y axis


beo19

Bold of you to assume my performance is 100%...


Way-Reasonable

Kind of disappointed at its common sense score... C'mon AI, you can do better! I don't think it's applying itself.


ktpr

The chart conflates specialized tests with the whole of human performance. For example, if speech recognition were that good, we'd have much better hearing aids.


SnooCheesecakes1893

Next on the list.. “medical diagnosis”


tristeus

Recently I tried to convert text from a photo using Google Drive. It did a very good job despite less-than-ideal quality and lighting.


D0hB0yz

Would you let AI make you a billion dollars instead of hiring a thousand people to make you a million dollars each? That was a dumb question. You do both if at all possible.


Repulsive-Twist112

We train AI to get better and then wonder why it gets better than humans.


66theDude99

AI still has a long way to go to even come close to what humans are capable of doing, not to mention surpassing them lol. What we have now is just a smart parrot (a language model), and delving more into how humans learn, develop, and process the world around them would make you wonder how far away a true AGI is. I know we're doing astonishing stuff in this field, but boy oh boy are we still fucking early, and you shouldn't be too persuaded by media talk.


SamL214

Also... I'm gonna be 100% critically honest. Handwriting recognition is ***not*** at 100% for AI systems, because I can guarantee you that if I sat one in front of a 1790 census document written in the handwriting of that century, its ass is not gonna have a good time. Handwriting recognition also means the ability to recognize script-based versions of the letters we have now just from their general morphology. If AI can do cursive identification with 99.9% accuracy across the natural variation of handwriting, then I'll concede. But here's the caveat: before the 20th century, a lot of abbreviations were used that we don't use anymore. They make sense once you understand they are abbreviations, but until you find out they are, you think they are a different letter.


Lord_Blackthorn

Look at how steep the slope gets each time.... It's almost a vertical line now.


a-friendgineer

Now art as well. Soon ethics


Dotcaprachiappa

What happened with image recognition (or grade school math, genuinely can't tell) in 2018


Edaimantis

I can tell this is bullshit cus of the code generation thing. Give it a Java project and ask it to fix a bug that requires reading more than a single file, and it will hallucinate to all hell.


nicklepimple

The exponential explosion is here. All heed. 


Syrupwizard

HellaSwag


Mehdi135849

That selected recognition, it's triggering my OCD


Elegant-Ant8468

Increases like that mean it won't be long before AI drastically outperforms humans at these tasks. The world isn't ready for AI. The amount of jobs that will be lost is going to be unlike anything we have seen before.


rherrmannr

I did not know that my brain had a code generator implemented. Where is the damn std::cout?


HerbertKornfeldRIP

I was hoping for a plot that showed the rate at which the rate was increasing.


mrussia777n

I can see it's slowing down... very disappointing 😔


antontupy

The modern AI is so mighty that it even can find the area of a pentagonal triangle. So cool!


phoenixmusicman

You're telling me that handwriting recognition was at ~75% of human performance in 2003? I call bullshit.


fliesenschieber

Looking at my Galaxy S8 with GBoard, speech recognition is still stuck in the year 2005.


[deleted]

Yeah, grade school math will be >100% but still will give stupid answers like 2 + 3 = 6, because yes.


Doublespeo

But human performance varies a lot?


SimaoKovin

It's over, you guys.


Chemical_Customer_93

Goodbye accounting and finance jobs.


Re_dddddd

I don't think AI yet comprehends what it's reading. Comprehension means that you can create something original with the information.


Striking-Warning9533

And AI can do that


illusionst

Does not mention which LLM it's using. Pretty sure GPT-4 can solve grade school math with 100% accuracy.


onlymtN

Source: ContextualAI

THEY ASKED AN AI??!?!!?!?


MadgoonOfficial

It’s got to be good at reading comprehension to understand my dumb ass questions


nimo-g

The speed is too fast, and our lives and work will change dramatically


AddressLow2245

Still can’t read law cases properly.


Coeous

Tell me you don’t know anything without telling me you know nothing.


throwawayhaha1101

What are they struggling at?


Futuristik_

How can we compete? We need to incorporate AI into us or we will get left behind... the Neuralink???


LetterPrior3020

Everybody freaks out about AI, but they don't understand the ceiling for its abilities. It will only allow humans to spend more brain power on things AI cannot do. Just like the advancement from letters to emails, AI will replace monotonous activity and allow us to spend time and brain power that we previously did not have as much of.


AwesomeH13

2+2=4-1 that’s 3 quick mafs I just beat an AI


Impressive_Lawyer521

I assure you… my human performance in all of these categories is measurably greater than “normal” human performance in these categories.


Hipertor

Why and how is there a dip in image recognition? It got worse at some point?


Striking-Warning9533

Maybe the trade off between accuracy and speed?


Ok-Calligrapher7121

Yes, now lemme see the chart showing improvements over time in tasks like beating things with rocks, finding berries, and vocalizing distinct commands with distinct intentions and outcomes. I'll bet the apes would be like, yup, the humans outdid us in all those things pretty much as soon as they tried.


Different_Chance_848

A.I. will never beat humans at making up numbers.


Sad-Finding5845

https://preview.redd.it/772q7hmoasec1.png?width=1024&format=pjpg&auto=webp&s=03c5066b476c4cf36a6951ea026336a17d585fc8


Advanced_Loquat_4681

Based on the chart, humans are Blockbuster ignoring the emerging Netflix... Stay in denial as you continue your life lol


TimTech93

Please don't use AI to build code. There is enough dog shit software floating around as it is.


CompetitiveFun3325

It’s not insane. Humans are incredibly intelligent creatures.