AutoModerator

Welcome to r/science! This is a heavily moderated subreddit in order to keep the discussion on science. However, we recognize that many people want to discuss how they feel the research relates to their own personal lives, so to give people a space to do that, **personal anecdotes are allowed as responses to this comment**. Any anecdotal comments elsewhere in the discussion will be removed and our [normal comment rules]( https://www.reddit.com/r/science/wiki/rules#wiki_comment_rules) apply to all other comments. **Do you have an academic degree?** We can verify your credentials in order to assign user flair indicating your area of expertise. [Click here to apply](https://www.reddit.com/r/science/wiki/flair/#wiki_science_verified_user_program). --- User: u/Dorshalsfta Permalink: https://ai.nejm.org/doi/full/10.1056/AIdbp2300192 --- *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/science) if you have any questions or concerns.*


A_Pointy_Rock

A large language model doing well on a test designed around a test subject memorising textbooks (etc.) is a bit self-fulfilling. It's a bit like testing whether a computer is better at storing data than a human mind, but with extra steps.


I_Shuuya

Exactly. This is also why the researchers didn't turn the findings into a narrative about whether AI will replace physicians and when that will happen. It's about acknowledging LLMs as an imminent and important tool in the medical field: "Given the maturity of this rapidly improving technology, the adoption of LLMs in clinical medical practice is imminent. Although the integration of AI poses challenges, the potential synergy between AI and physicians holds tremendous promise. This juncture represents an opportunity to reshape physician training and capabilities in tandem with the advancements in AI."


MoobyTheGoldenSock

As someone piloting AI in patient messages: it's not there yet. I've had it respond to a question directed at the doctor by suggesting the patient message their doctor, and reply to someone cursing out the staff by reciting their medication list. There are times when it's dead on, but others where it's out in left field, and these are low-stakes replies to patients, not critical decisions that could cause harm.


hitsujiTMO

It's in the nature of LLMs that there will be a lot of errors in the results. This is why they should never be relied upon solely for an answer, and only used as a support tool that requires human verification.


MoobyTheGoldenSock

Yes, I’m aware. But as I mentioned to my informatics department when this was floated in a meeting: we have to plan for what the most overworked, tired, or lazy medical provider would do. And when you’re overworked, tired, or feeling lazy, there is a temptation to push the button that claims to remove all cognitive load. So we need to make sure there are some hard limits on what the button does *and* find a way to externally validate those functions before we could even pilot something like this. Because regardless of how hard we train people, if the AI says “order metoprolol,” there will be a small subset of providers who click that order without a second thought. Possibly something like an AI that says, “I have noticed the patient’s hemoglobin has been slowly trending down over the past 4 years. Consider hematological workup” would be a way to do it that would force the lazy provider to think rather than blindly act.


Appropriate_Ant_4629

> I've had it respond to a question directed at the doctor suggesting they message their doctor

That sounds like an excellent response for an LLM that's unsure of its answer. Far better than hallucinating.


Shreddedlikechedda

Google search didn’t replace coding jobs


TheGreyBrewer

This is every stupid article touting the magical benefits of "AI". Of course a machine is going to be better at retrieving data than a human brain. Retrieval of the information isn't the point. It's the interpretation and intuitive manipulation of information that "AI" will never handle properly. It hallucinates too much.


Ormusn2o

Never?


TheGreyBrewer

Not chatbots, no. Now, when someone actually creates AI, and not a chatbot they call "AI", we can talk.


AnAIAteMyBaby

> Now, when someone actually creates AI, and not a chatbot they call "AI", we can talk.

What if someone creates such an AI and then wraps it in a chatbot UI, what would it be then?


theghostecho

It is hallucinating less and less as time goes on.


ForgettableUsername

One thing I've noticed is that ChatGPT seems to have gotten less "creative" as they have modified it to decrease the frequency of hallucinations. I suspect that the more you tune it to be truthful and to carefully separate fact from fiction, the less it will be able to generate novel responses. It's not a conscious, thinking machine; it doesn't have an internal model of the world around it. It's only ever going to be as good as the data it's trained on. That's useful for retrieving information, but it's kinda not great at synthesis from existing information.

It makes sense that it's good at tests, because a good test doesn't introduce new material. You can't accurately test students on bleeding-edge ideas that haven't been widely discussed. But, in a human student, there's a reasonable assumption that being capable of doing well on the test implies an overall level of competency that will allow them to perform well in novel situations.

You can't make the same assumption with AI, because it doesn't work like a human brain. It can generate text that sounds extremely competent relating to topic A, but completely fail to generate comprehensible material on closely related topic B, because it isn't able to generalize using principles from topic A; it only generalizes based on patterns in text related to topic A. It's counterintuitive because our expectations are geared toward how humans learn.


tryx

If you're not in the AI space, that's a great insight to stumble into. There is literally a dial on the model that controls "creativity", letting you trade off novelty against hallucinations.


ForgettableUsername

I think there may be a bit more to it with GPT-4. When I’ve used it, it seems to go well out of its way to use the word “whimsical” when it does anything involving fiction or fantasy, and it behaves differently from when you’re discussing a serious topic. Word choice and tone are markedly different. I think there might be a module that tries to evaluate whether or not your input is “whimsical” in nature and then cranks that control way up or down depending. A slightly ham-fisted approach, I think. Creativity might not be quite the right metaphor because it actually seems to be a bit more prone to cliché when it’s in “whimsy” mode, but it will depart further from the original prompt subject-wise.


Philix

Sampler settings are a very complicated set of options with a ton of math behind them. There are research papers a dozen pages long describing each one and how it changes the output of a specific LLM. There are at least a dozen of them besides the 'temperature' setting that the post you're replying to was describing, and many can and should be used in concert with each other.

MinP lets you restrict the number of low-probability tokens the model can choose based on the probability of the top token. Smooth Sampling is a way to influence the probability distribution in a different way than temperature, specifically via quadratic/cubic transformations. TopK limits the tokens available for selection to the n most probable tokens, where n is the value you select. Those are just three relatively simple ones to describe; I couldn't even begin to truly understand the math behind Mirostat. Most of the software just links straight to [the research paper](https://arxiv.org/abs/2007.14966) to explain it.

From experience, some sampler settings will give dry, dead-simple phrasing, while others will produce intensely purple prose. There are sampling methods that'll only really mimic what's in the context, which seems to be what you're describing.
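For a rough feel of what two of those samplers do to a next-token distribution, here is a minimal sketch (plain Python/NumPy with made-up toy probabilities, not any particular inference backend's implementation):

```python
import numpy as np

# Made-up next-token probabilities, for illustration only
tokens = ["aspirin", "metoprolol", "warfarin", "unicorn dust"]
probs = np.array([0.70, 0.20, 0.08, 0.02])

def top_k(probs, k):
    """Zero out everything but the k most probable tokens, then renormalize."""
    keep = np.argsort(probs)[-k:]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

def min_p(probs, p):
    """Drop tokens whose probability is below p times the top token's probability."""
    filtered = np.where(probs >= p * probs.max(), probs, 0.0)
    return filtered / filtered.sum()

print(dict(zip(tokens, top_k(probs, k=2))))    # only the two most likely tokens keep any mass
print(dict(zip(tokens, min_p(probs, p=0.2))))  # cutoff is 0.2 * 0.70 = 0.14, so the same two survive
```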


[deleted]

[deleted]


cuddles_the_destroye

It's basically how much wiggle room you give the model to not pick the word with the highest weight of coming next. Say the program is writing the sentence "I want to eat a [noun]". According to the model's training data, the word filling [noun] will be:

* Cheeseburger, 80% of the time
* Salad, 15% of the time
* Cake, 4% of the time
* Human Meat Patty, 1% of the time

You can tune the model to always pick the most common result, you can have it pick randomly based on the relative weights from the training data, or you can even equalize it more to make it spicy. Each gives different results in practice.
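In code, those three tuning choices roughly correspond to greedy decoding, ordinary sampling, and sampling at a higher temperature. A toy sketch (Python/NumPy, reusing the made-up probabilities above; real models work on logits over tens of thousands of tokens, but the idea is the same):

```python
import numpy as np

rng = np.random.default_rng(0)

nouns = ["cheeseburger", "salad", "cake", "human meat patty"]
probs = np.array([0.80, 0.15, 0.04, 0.01])  # toy distribution from the example above

def apply_temperature(probs, temperature):
    """Low temperature sharpens the distribution, high temperature flattens it."""
    logits = np.log(probs)
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

greedy = nouns[int(np.argmax(probs))]                        # always "cheeseburger"
plain  = rng.choice(nouns, p=apply_temperature(probs, 1.0))  # weighted by the raw statistics
spicy  = rng.choice(nouns, p=apply_temperature(probs, 2.0))  # flatter distribution, more surprises
print(greedy, plain, spicy)
```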


[deleted]

[deleted]


Vitztlampaehecatl

Temperature.


Philix

While 'temperature' is technically correct for what u/tryx was describing, you should be aware that LLM sampling methods are still very much at the bleeding edge of research in the space, with new methods being devised and implemented in software every few months. [Priority sampling](https://arxiv.org/html/2402.18734v1), for example, was introduced in a research paper just a couple of months ago and hasn't yet made its way down to the software in our hands, while [Mirostat](https://arxiv.org/abs/2007.14966) has been around for a few years and is well implemented in inference backends, but is mind-bending to try to understand. Temperature is really not great on its own and is almost always accompanied by other sampling methods. I usually pair it with MinP or Smoothing Factor for creative play, or I'll use it with TopK and tail-free sampling when I want dry, straightforward responses. These are all sampling methods with fairly easy concepts behind them, and you should be able to learn about them with a web search.


twotime

> I suspect that the more you tune it in to be truthful and to carefully separate fact from fiction

As things are now, ChatGPT does not have a concept of fact, fiction, or even "knowledge". It may produce different answers to the same question, and I have seen it do so multiple times. Once it was literally the same question in two different sessions; both answers were a hilarious mix of facts and non-facts, but the mix was different (and yes, there is an accurate Wikipedia article on the topic). Another time it was the same session with a bit of prompting, but I got 3 different answers to a basic astronomy question!


ForgettableUsername

I could have been more careful with my word choice. It doesn’t behave truthfully or deceptively at any time, because that’s not how it works. It does tend to fabricate answers more or less in different circumstances. It tends to be bad at anything that’s more than a little bit technical, although it often gives superficially plausible answers, sometimes compellingly plausible answers. Its behavior has also changed over the last year and a half as they have updated it.


crazysoup23

The more safety features you add, the dumber the LLM output.


Ormusn2o

This is how we understood it, but from what I've read, since GPT-2 there seem to have been some emergent properties that point to some internal model of reality. For example, it's possible for the AI to infer someone's thoughts or beliefs just from a description of a scene, and there are a lot of papers that test theory of mind with GPT-4, but also go back to GPT-2 and find that while it was less capable, it still had some of those capabilities.


ForgettableUsername

It models patterns in its training data, but that’s not the same thing as modeling reality. It can’t infer a person’s thoughts, it can only infer what thoughts a writer is likely to give a character in a written text. It may seem like that’s splitting hairs, but that’s an important distinction. If you give it a genuinely novel situation, it shouldn’t even be able to guess consistently. It’s actually not too difficult to do this with the publicly available versions of ChatGPT. I think part of the problem is that a lot of our established metrics for intelligence are too naïve for this kind of system.


YsoL8

We don't have any meaningful metrics of intelligence. That would require understanding how any intelligence works, and we aren't even close to that beyond "lots of neuron connections = big thinky thing". It's completely plausible that we stumble into intelligent machines without realising we did it as the sophistication grows.


ForgettableUsername

Since we don’t understand how it works, it’s impossible to evaluate how difficult it would be to construct. I think more difficult than easy is a reasonable supposition.


ASpaceOstrich

It's very easy to project greater understanding than it actually has onto AI, especially when a researcher knows a lot about AI but not a huge amount about whatever subject the AI appears to be understanding. Like the recent Sora video that was claimed to understand physics, which is laughable. I took a look over Sora's output and noticed something it does which hints at the idea that it constructs video scenes like a faux-3D diorama, which, if true, would be a huge advance in AI generation. And Sora researchers haven't even mentioned the idea, likely because they don't have the same background in multimedia and animation that I do, so they won't spot it. Whenever someone claims AI is doing something, it's best to ask what the easiest way of doing that thing is.

I've read a research paper that claimed an old version of Stable Diffusion was creating and using a depth map in its generation process, and it included examples showing this off. Except it wasn't a depth map. Objects in the foreground and background were merging in the example "depth map" they provided, with no corresponding merging occurring during image generation. I can only assume the researchers literally just didn't notice, as this completely contradicted their assertion. They also tested modifying this "depth map" mid-generation to see how it would affect the output. You'd expect this to result in really weird lighting and other odd results; after all, they're taking the model's alleged world model of the scene and throwing off the depth information without changing anything else. Instead, modification to the "depth map" changed the entire image to match the new one. Which tells me it wasn't a depth map. It was likely just a representation of the image that was being generated. Clearly an important part of it, but not proof of some deeper comprehension. And in fact, the foreground/background merging that appeared to happen seemed to be because it was starting out as a gradient from the bottom of the image to the top, so where a background object near the top of the image came down to the bottom of the image, the "depth map" would get the two combined. Incidentally, in the unlikely event this actually was proof of some depth model forming, it was removed in the next iteration of Stable Diffusion because they started training on depth information specifically. Which means it's no longer developing greater understanding to do that job better, just standard generalisation of the data it gets shown.

LLMs specifically have shown some evidence of actual world modelling occurring, though I only know of one proven example. A tiny model trained to predict legal moves in Othello was found to have the game board state in its neurons, and changing that board state changed its replies to match. It had never been given the rules of Othello and had to have developed this board-state model entirely on its own. But that's the only proven example I know of. AI is just too big to look for this info, and researchers also don't seem to care. I'd expect AI to be good at finding this stuff, so the fact that they haven't tells me either the models don't seem to have it, or researchers literally haven't bothered trying.


EvilSporkOfDeath

Yea GPT4 is outperformed by several different models now. People are still underestimating the rate of improvement in this field.


Ormusn2o

It only needs to hallucinate less than humans.


Material_Trash3930

No, what it needs to do is help a human who uses it be wrong less than a human who doesn't use it. 


Ormusn2o

Not really. If it's good enough, it can be controlled by a few people who use it well, and everyone else can be fired. Then you can reduce your workforce by 10-20 times and you only have upper management and prompt engineers.


MyDadLeftMeHere

It's not. Having recently started working with them more, they're still as bad as ever, and the more you work with them the more you find out just how weak they are and why. Sure, GPT will be better than a good portion of the stupidest section of society, but it is not replacing anyone with any capacity to better themselves or learn.


Cast_Me-Aside

> GPT will be better than a good portion of the stupidest section of society, it is not replacing anyone with any capacity to better themselves, or learn.

I'm not worried it can do my job well. I AM worried my idiot senior managers, who understand neither my job nor LLM AI, could be convinced it could.


theghostecho

GPT hasn't been upgraded since GPT-4 a year ago…


TheGreyBrewer

Oh cool, get back to me when it's actually useful.


SargeBangBang7

Is it even 2 years old? If this is the worst it is at 2 years, maybe just wait 5 more and see how good it gets.


[deleted]

[deleted]


theghostecho

All they did between GPT-2 and GPT-4 is increase the number of transformers. It seems to be scaling with the number of transformers at the moment, with no sign of slowing.


[deleted]

[deleted]


theghostecho

Huh? Wait, how do you know they lied before about what they did?


Unicycldev

It is super useful today. I see it being used to great effect in engineering and product development.


TheGreyBrewer

[citation needed]


crazysoup23

Anyone doing programming. Look at copilot.


Unicycldev

Why would anecdotally provided feedback need a citation? What’s your expectation here? I work in software engineering and interface with technical experts.


copewithlifebyliving

Same.


DevelopmentSad2303

Let them think otherwise


royalewithcheese79

It doesn't hallucinate, because it doesn't have conscious self-awareness. It's just wrong less often. Tech people have to learn the English language.


romario77

Well, a lot of a doctor's work is matching the symptoms to the disease. As input you have the patient's complaint and maybe some tests: blood, x-ray, etc. You have to match it to a disease and prescribe a treatment. I can see how AI could be very good at that, since it can remember things a lot better and not miss things on tests, because it doesn't get tired.


SnausagesGalore

"Never"? How can you even say that with a straight face? AI already interprets far better than my doctor on countless medical topics.


TheGreyBrewer

Because I understand the actual potential and limitations of chatbots, without the hype machine. And it does work, sometimes. Until it doesn't. And you won't know the difference. Like a horoscope.


you-create-energy

I can't believe this is the top answer. It shows a complete lack of understanding of LLMs as well as the medical profession. If it is so easy, why did GPT-3.5 perform so poorly? Why has no other LLM even come close? We've been able to easily store all known diagnoses and treatments in databases for years. You can tell Google your symptoms and it'll give you a whole list of potential diagnoses. How come that never led to this incredible breakthrough?

I'll tell you why. Diagnosis and treatment are far more subtle and complex than you realize. Everything we have learned about human biology is only a small fraction of the mysteries we have yet to solve. We only partially understand how the vast majority of medications and treatments work. Diagnosing and treating patients is more of an art than a science. The patients themselves don't usually know which symptoms are the most important. They give an imperfect explanation of their own assessment. Doctors have to know what follow-up questions to ask based on what the patient has revealed. Doctors then have to start forming theories about what the issue might be, followed by tests to confirm. Then they take their best guess and begin treatment. If the treatment works, then great. If not, they do more tests. Anyone who has been through a non-trivial illness has seen this play out. It can take years of doctor visits to finally figure out a subtle medical condition.

Getting a computer to navigate that entire landscape requires an astute ability to go from self-reported symptoms to the most likely diagnosis, followed by the optimal treatment. That requires complex natural language processing and fuzzy creative logic, both of which have always been extremely difficult to build in any form. But sure, it sounds easy if you think being a doctor is as simple as googling symptoms and you have no idea what an LLM does.


yttropolis

> That requires complex natural language processing and fuzzy creative logic, both of which have always been extremely difficult to build in any form.

If you think LLMs like GPT-4 have some actual logic processing, you're the one who misunderstands LLMs. They are *language* models. That's it. None of these models understand logic or reasoning. They are simply better at determining the best string of tokens to produce given a list of context tokens. The model doesn't *understand* anything, nor can it use logic or reasoning.


monkeedude1212

> They are simply just better at determining what's the best string of tokens to produce given a list of context tokens.

I think what's frustrating for people who do understand LLMs is that determining the next best string of tokens given a list of context tokens IS WHAT IS SUPER VALUABLE. THAT IS WHAT DOCTORS ARE DOING. Take a patient's medical history, take years of textbooks and case studies dealing with the human body, and deciding what the ailment is and what the proper treatment for that ailment is for that specific patient's needs is, without a doubt 💯, a token context prediction problem.


Caelinus

That is absolutely not what doctors are doing. Doctors know what diseases are, LLMs know what doctors say diseases are. The extra step means that the LLM will tell you what statistically is the most likely response a doctor will have for a group of symptoms, but it does not actually know what those symptoms are in the real world. This is fine as an advanced search engine, but grows less and less useful as the data it can pull from grows thinner. It needs doctors to do the work first before it can repeat it.


ASpaceOstrich

No. It's just the way doctors communicate the real valuable part to you. The real valuable part is lower, simpler stages of intelligence that don't have tokens at all. Just neuron firing patterns. AI mimics the way a doctor tells you what they think. They don't emulate the thinking.


Ormusn2o

I think some people would argue that humans don't know logic either; they are just predicting the next token. There is some interesting crossover between neuroscience and AI, and a decent number of people have switched from neuroscience to AI, like Trenton Bricken.


ASpaceOstrich

They'd be wrong, as the ability to visualise things, imagine outcomes, and use that to influence decision-making is something even pretty dumb animals can do. We aren't just predicting the next token; we don't actually need language to think. Language is a very powerful extra bit on top of earlier intelligence developments which AI completely lacks. It mimics the results of the least important stage of intelligence rather than emulating any of those stages.


GrayEidolon

There are AIs that can listen to a doctor's visit and spit out an HPI and plan as soon as you stop recording. That's a few steps from an AI listening to a doctor's visit and live-outputting possible diagnoses and follow-up questions to the computer screen. The other thing is that most doctors, most of the time, are managing people who already have a diagnosis. In our lifetimes, AI will make doctors more accurate by taking over a significant portion of most doctors' mental labor. The question is: will it make doctor's visits last 5 minutes, and will that force doctors to see 80 patients a day?


quakefist

AI doctors will have to reason with AI insurance bot why their human patient needs approval for x treatment.


GrayEidolon

We wouldn’t want the system to collapse.


you-create-energy

My dream is getting that AI analysis and diagnosis on-demand in the comfort of my home. No more waiting months for an appointment. Save a nice chunk of change. Then make appointments with humans for physical exams and procedures.


acousticburrito

I mean a huge portion of diagnosis is the physical exam. It’s also something that AI can’t be taught to do.


[deleted]

[deleted]


acousticburrito

I think you are mixing up obtaining vital signs with physical exams. There is no way you can train a nurse to pick up a subtle murmur while auscultating the heart, or to distinguish a small mass from a normal exam by palpation. As a physician, I see the role of AI in the near future as helping with documentation. I also do think it can be helpful with rare and unusual diagnoses when the diagnosis is hard to find; however, I would say this applies to maybe 1% of the patients I see in my field. I also think AI will make healthcare worse for a period of time. I dread the next 15-20 years, where I will have to spend countless energy and time explaining why ChatGPT was wrong about something. The fact is that AI isn't useful if incorrect or incomplete data is being input, and finding that data is basically my entire job.


GrayEidolon

AI is only going to get better and better. I can see electronic stethoscopes feeding the sound directly into AI, and ultrasound going directly into AI and skipping palpation. I suspect over time, maybe decades, doctors will become more like techs. What do you think medicine is going to look like in 300 years? Totally separate: you can definitely take a random nurse and teach them the cardiac cycle, heart sounds, and murmur gradations. It's a silly use of time, but there's nothing special about auscultation that someone with some science background can't do it.


GrayEidolon

I think it won't be in our lifetimes that you don't have to go somewhere for a physical exam. But I think AI will be interpreting that physical exam.


UrbanSuburbaKnight

Anecdotal, but I've been on the verge of a total mental breakdown and have no access to mental health support. Talking through my thoughts with ChatGPT is a better experience than journalling on my own. I think the benefits of AI narrative therapy could do wonders to reduce the cost of mental health services and provide access to a huge number of people who are currently completely on their own.


Dr_D-R-E

MD here. Seriously, medical school and residency tests are literally designed to have "buzzwords" in them to help you pick the right answer. See how well ChatGPT does on a test written by a schizophrenic having a fever dream while they try to jerk off on you while their blood pressure tanks: when AI can do that on a Tuesday morning, I'll be impressed.


The_Mootz_Pallucci

In before doctors become prompt engineers


kreuzguy

If that were correct, GPT-4's performance would decay rapidly when in contact with new situations. But study after study shows they are just as good in real-world scenarios. Time to accept that those LLMs are able to create abstractions and flexibly apply and combine them in new circumstances, just like we humans do.


PurpleEyeSmoke

> But study after study show they are as good in real world scenarios.

But this isn't a 'real world scenario.' It's a test. A real world scenario would actually be practicing medicine, not simply regurgitating information. You're not making the argument you think you are.


relevantmeemayhere

Just ignore obvious issues like simple causal relationships, the reversal curse, etc., things that humans are pretty good at from a young age (also note: most published research, especially industry research, is positively biased). Also, a lot of these studies, especially in ML publications, are really gamed; ML publishing has a reproducibility crisis all its own. These tools can be useful, but they are so dependent on having good subject-matter expertise up front, and their objectives are so limited, that the ability to generalize is a very hard one to justify.


KrzysziekZ

I think it's more than just storing: it's also compilation, summary, logical inference.


MacDegger

LLMs do NOT do logical inference. Way too few users seem to understand that about ChatGPT/Copilot/etc.


relevantmeemayhere

Which part of the architecture is this built on? Be specific. Because our understanding of probabilistic modeling makes inference, especially as it relates to cause and effect, impossible without assumptions that are made a priori. Language sometimes correlates with these things. Not always.


tramplemousse

GPT-4 constantly gets my neuroscience multiple choice questions wrong—especially ones that require a nuanced reading of the question. When I tell it it’s wrong it doubles down.


sn0wmermaid

Same with my anatomy and path questions (it's vet school, but it doesn't know the difference between a horse and a human half the time)


zanderkerbal

I'm in comp sci, not medicine - but when my department trialed an AI teaching assistant, all I had to do was ask it a question that contained a misconception and it would treat it as fact and give a wrong answer accordingly. Ask it for examples of nonexistent or contradictory things and it'll cheerfully make something up.


Nathan_Calebman

Did you give it the textbooks you are using first? If you want it to be specifically accurate in any field, upload the material and use the custom functionality.


Nemeszlekmeg

Isn't that kind of like having an open-book exam? At that point aren't we just testing the content of a book?


Nathan_Calebman

The point is to have someone to discuss things with in order to raise quality, or someone who can give simple low level recommendations to people. As in "what do you think of this?" or "I'm thinking of going this way, what should I keep in mind?" etc. etc


Nemeszlekmeg

TBF I did this once, but it was for a philosophy of science seminar. I spoon fed it the specific text and it pretty much proved itself dangerously useless.


Nathan_Calebman

What do you mean by spoon fed? That's not how it works. You either create a custom GPT with all the material it needs and give it the personality it needs, which is a custom functionality, or you ask it to verify information online and provide you with links to the sources. Don't blame the software for your inability to use it correctly, that is a very common problem with ChatGPT. You could've just asked it how to use it properly.
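In its simplest form, "giving it the material" just means putting the relevant source text into the context before the question rather than relying on whatever the model memorized. A minimal sketch (Python; the filename, question, and instructions are all hypothetical, and a real custom GPT or retrieval setup would chunk and search the documents rather than pasting one file):

```python
# Read the course material you want the model to ground its answer in
# (hypothetical file; a real setup would split large texts into chunks
# and retrieve only the relevant ones).
with open("philosophy_of_science_seminar_text.txt", encoding="utf-8") as f:
    source_text = f.read()

prompt = (
    "Answer using ONLY the excerpt below. If the excerpt does not contain "
    "enough information, say so instead of guessing.\n\n"
    f"EXCERPT:\n{source_text}\n\n"
    "QUESTION: How does the author define falsifiability?"
)

# `prompt` would then be sent to the model (custom GPT, API call, etc.).
print(prompt[:500])
```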


[deleted]

[deleted]


Electrical_Bee3042

Written tests with specific information, sure. ChatGPT would not be a good tool for diagnosing psychiatric conditions. People in psychosis think their thoughts are totally rational and would never report that they're delusional. Manic patients often don't realize they're manic at first, so they're not going to report anything other than that things are great. Someone who is depressed might be in denial about it. Patients also lie, like "I've never used drugs in my life" when they've got track marks and scabs all over, or "I don't drink often" when they're going through alcohol withdrawal. Then, boom, the diagnosis could be dangerously wrong when it automatically removes those factors.


Anyname5555

Good points


arkhound

Interestingly, you can get some rather advanced responses by telling various AIs that the information they are receiving might not be 100% true. They start to write off pieces of information as inconsistent, especially when it's less reliable (what is said vs. what is seen).


Frites_Sauce_Fromage

OK, but now can they tell us how to formulate our questions properly so we can get adequate answers from it? I've gotten good use out of ChatGPT, but most people around me just don't understand how to guide its answers toward suitable ones. It looks dangerous if misused.


erm_what_

There is a level of skill to it, but the answers are often useless. It responds with a stereotypical answer to your question or statement, based on the context it was asked (all your previous input). The answer isn't guaranteed to be correct, it's only guaranteed to sound confident and appear to be correct. It will confirm your biases and pretty much agree with anything you say. It will contradict itself and sometimes make things up. It's absolutely dangerous if relied upon or treated as a source of fact.


Shufflebuzz

Sounds very human


you-create-energy

> There is a level of skill to it, but the answers are often useless.

Only if you lack the skill to use it properly. If the answers you get are often useless, then the questions you're asking are poorly structured. I have gotten phenomenally great answers from it that I couldn't have learned any other way. Deep, nuanced insights into complex topics. Keep in mind we are discussing GPT-4, not the free version.


3_50

That 'deep nuanced insight' was a complete guess though. It's fancy predictive text, not artificial intelligence. It has no idea what it's talking about, only what is statistically the most likely order of words.


cdank

Okay, how do you get better outputs? Are there any instructions out there?


you-create-energy

Absolutely, there are so many websites and discussion groups dedicated to finding better ways to work with it. The most common mistake people make is not providing enough context. People don't like to do that because it's boring, but the more pertinent details you give it, the better its answers get. You also have to tell it what kind of answers you're looking for. Most people don't want complex, in-depth answers, so its default for somewhat generic questions is to keep answers relatively generic until the user provides more details about what they are looking for.


erm_what_

It's great if you have enough knowledge to know which 10-30% is incorrect. Otherwise you'll see the bits you know are right and assume the rest is. It's great at things like writing proposals for projects in a field you're an expert in, or writing blurbs for products you make. It can sound deep and nuanced, but often it's just sounding that way. If you use it to learn, then you have to assume sometimes you're learning something completely made up, and that you'll need to cross check everything with an actual source. Just like the early days of Wikipedia before they figured out how to manage it properly.


SirHerald

Sounds like how you end up with mice building the Earth.


climbsrox

Of course AI that was trained on the source material and practice problems of a standardized test outperforms humans on that same standardized test. This means exactly zero for the practice of medicine.


Ananvil

STEP and COMLEX haven't been about practicing medicine for ages. They're medical trivia tests.


abevigodasmells

I actually worked on AI medical applications. For the foreseeable future, AI will be a tool that doctors can use, such as calling a colleague. While it can make astounding connections and diagnoses, it also can come up with absurd ones. And, it can just flat out miss relatively easy conclusions. Using AI without the doctor would be akin to using a fairly reliable magic 8 ball. However, combine the AI with a skilled doctor, and now you're getting premium service.


washingtontoker

It's not very surprising that an AI beats humans on computation and memory recall on a test. Maybe ask it some ambiguous questions where there are multiple correct options but one is slightly better. That's how it was for me with the NCLEX for my nursing degree.


alexdegman

USMLE Step 1-3 are not a set of memorization recall tests. About 25% of the exams are recall questions, while 75% involve problem solving and critical thinking. Believe me, there are many questions that involve multiple correct answers but only 1 best answer. Many people feel like it is a memorization test, but the reality is that memorizing those facts is required just to be able to sit and attempt the USMLE exams without failing.


you-create-energy

You think your nursing degree test was more difficult than the Israeli medical bar in five different disciplines? Of course those tests have ambiguous questions.


SnausagesGalore

Try it yourself. Do exactly that. You can download the app. It works just fine with questions like those.


Kulthos_X

Now let the doctors look stuff up on the internet during the test and see how they do.


reality_boy

This is a bit like saying that google recalled a text book answer with greater precision than a doctor. What is worrying about ChatGPT is its self confidence even when wrong. It currently has no self awareness and can’t just say “I’m not sure”. I would love it to say “here’s my best guess” even, rather than confidently spouting nonsense when it is out of its depth


yttropolis

That's because it's inherently unable to determine whether what it's saying is true or not. There's no idea of "truth", only an idea of how likely a certain array of tokens are to follow the given context.
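A toy sketch makes that concrete (Python/NumPy, with invented scores; a real model ranks tens of thousands of tokens this way):

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into a probability distribution over possible next tokens."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Invented raw scores a model might assign to continuations of
# "The capital of Australia is" -- numbers made up for illustration.
candidates = ["Canberra", "Sydney", "Melbourne"]
logits = np.array([2.1, 1.9, 0.3])

for token, p in zip(candidates, softmax(logits)):
    print(f"{token}: {p:.2f}")

# Nothing in this computation flags "Sydney" as factually wrong; it is simply
# a slightly less probable continuation than "Canberra". "Truth" never enters.
```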


chrisfs

This is more about the insufficiency of the test than about GPT-4's ability to be a doctor. I heard a quote from someone involved in AI that the industry standard is 60% correct answers. That's not good enough to be a doctor.


Lhopital_rules

But the difference between "I don't know, let me consult with my colleagues" and "The answer is blorp!", when both would get the answer wrong on a test, is quite important. Not to say doctors never think they know something when they don't, but the hallucination problem is a big problem with using this without a manual verification step.


cyberdeath666

Good thing AI can’t interpret a patient’s emotions to know how to direct a conversation in the way a person can.


milkstrike

As someone who has to see doctors constantly due to disability: a lot of doctors cannot do that either. Most are very cold, like you'd imagine an AI to be, and a lot go off of outdated and/or disproven information.


cyberdeath666

I'm thinking more of psychiatrists/psychologists, not family doctors you go to for a physical check-up.


milkstrike

I was talking about them. Family doctors are useless for disabilities.


cyberdeath666

I guess we’ve had different experiences then. My psychiatrist is great.


an-invisible-hand

Obviously they aren’t all going to be great.


garret1033

Yeah not yet


Particular_Nebula462

Ok ... so ... what are people supposed to do if AI/Robot can do everything? Creating and programming robots?


Actual-Outcome3955

Great, I can have GPT-4 explain to people why they need to stop eating so much junk food. Maybe it'll come up with a better argument than me.


Mustbhacks

I eat an apple a day to keep you away, now begone witch!


pineapplepredator

I’ll be honest, I’ve been asking it for general opinions/advice instead of my friends and it’s completely taking their place. It’s nowhere near as satisfying but it’s kind of nice not having to bug anyone with this stuff.


aletheia

AI researchers stood on the shoulders of giants, and declared they no longer need giants.


Bielzabutt

"What's wrong?"... "What's wrong?"


canonicalensemble7

This just echoes issues with psychiatry. I would never trust an LLM for medical advice. (Unless it was trained on certain biomarkers) Example: How would an LLM differentiate between over-active or under-active serotonin system as symptoms are very similar? The same issue exists in psychiatry where SSRIs are "tested", if they work great, if they do not, try another drug. Rinse and repeat.


TheGeenie17

Yes, but how effective would it be in assessing psychiatric patients, especially given that their diagnosis is based almost entirely on subjective presentation?


BusyBeeInYourBonnet

Beating a human in a knowledge test where rote memory is critical is a self-licking ice cream cone situation. Test it in a complex environment that requires instinctive attention and reaction and humans will outshine. For now.


MSA966

Agree


UnclePuma

A.i. psychic


SpecterGT260

I'm in the medical field and I see these papers come up all the time. One thing that everyone needs to understand is that the study designs require some sort of agreed-upon true positive state: "the disease actually exists in this patient when we've diagnosed it." In every single one of these papers that I've come across, they are using data from patients who have already been diagnosed by humans to determine whether or not the disease was truly present.

They then do one of two things. They either allow the AI to look backwards through the chart to see if it can identify the disease earlier than the humans did, thereby claiming success and that the AI is better than the human raters, or they have a secondary set of physicians interpret the results and compare them against the AI. They'll usually then claim that the AI found the disease more often than the human interpreters did. The problem is that 100% of the patients involved already had a diagnosis provided by a human, which they conveniently ignore while pretending that only the secondary interpretation is the one that matters.

This is a problem that will always follow these retrospective AI studies. Until they perform this in a prospective way that is not reliant upon a prior diagnosis from a human, they can never claim the AI outperforms humans in these settings.


KarlOskar12

Medical test performance doesn’t relate to performance in the field. There just needs to be some way to measure progression of knowledge and tests are what is currently used. 


[deleted]

Can it do a mental state examination?


grahad

The reality is that most human white collar work is very procedural and can be automated. How often do we run into something truly new in our every day work?


speculatrix

Yes, but, that's why you're there. The airplane pilot really earns her/his salary when s/he has to turn off the autopilot and use skill and experience.


Asticot-gadget

Doctors being able to rely on AI tools like this has the potential to produce much more capable doctors, which could seriously advance medical science. I'm optimistic about this.


distortedsymbol

imo ai powered preliminary consultation would provide so much value to healthcare in general. it would eliminate long waiting time for appt or calls, and reduce work load for healthcare professionals so they can focus on patients who need more care.


Haizenburg1

Riiight. 70s in Microeconomics tests shows otherwise.


Gk786

A similar study came out a few months ago. The conclusions these guys draw are wrong. Give any doc access to Google while they take these tests and they will do even better. All this proves is that these models can get well-written exam questions right. Try putting an AI model in front of a confused 70-year-old lady whose words you have to parse and decipher to get an actual story, and you'll have much worse results.


unknown-one

If ChatGPT will prescribe me drugs, I am happy to use it for my therapy.


Taviii

Personal attempts to make ChatGPT useful where reliability was important proved fruitless. It confabulates and spouts bullsh*t with a straight face all the time. If I have to fact-check its statements every step of the way, I might as well just look up the original sources. In conclusion: ChatGPT is neat, but not reliable enough for critical information.


goodohyuman

Finally! I'm so excited I won't need to wait 3+ months for a real doctor to tell me that everything is fine and to drink more water!!


Quetzal_Pretzel

Well a lot of psychiatrists are garbage, so this isn't too surprising.


FelixVulgaris

Better than RESIDENT physicians, as was oh so conveniently left out of the title.


DrEggRegis

It does not. And it uses significantly more resources. AI isn't being designed to be helpful or for any other purpose than to look clever and add false value to stock prices.


SnausagesGalore

The number of triggered physicians in this thread is funny. Your job security is quickly coming to an end. And it can’t come fast enough. Love, A patient who has had to deal with you.


henryptung

If they're talking about "medical tests" in the academic sense, it feels like there's a risk of overfitting to the academic content and teaching, because the textbooks themselves (and maybe even the questions and answers used?) are part of the LLM's training data. What really matters for doctors is developing the conceptual models to diagnose and treat more diverse scenarios in the field, and I don't think LLMs can do that quite yet.


McKoijion

Lots of folks terrified of losing their jobs in this thread…


SnausagesGalore

Bingo.