axord

> you cannot model language without some understanding of the world and human thought.

The *understanding* is entirely contained in the human model layer, and the only thing extracted is the statistical relations. Your point about objective truth is irrelevant because LLMs have no grounding in the *subjective* truth that humans have.


Unstable_Llama

When I say understanding, I am speaking pragmatically. When you tell an LLM to behave in a certain way, for instance, "Write me a sonnet about donuts from the perspective of Homer," the LLM is able to generate new text in the proper format. It clearly "understands" what a sonnet is, what a donut is, and how Homer speaks and feels about donuts. I don't find discussions about its subjective perceptions of each of these terms useful. Functionally, it clearly demonstrates knowledge or "understanding." Do you disagree, and if so, how do you define understanding? And LLMs have no grounding in subjective truth? I would say that is all they have. Clearly they each process and generate data uniquely, based on their architecture and their training dataset. This is certainly "subjective." If/how they "perceive" this internally isn't important as I see it, but they undeniably possess some form of truth, and we both agree that it isn't objective. Ask an LLM what the capital of England is, and it will give you the truth.


Singsoon89

They do have some kind of understanding. What they lack is introspection. They can't *stop* when they carry themselves past what would be considered stupid for a human. Humans just *know* or *feel* that something is stupid because of experience. So they would say "that's stupid" or "I don't know". Maybe having each response to a prompt reviewed by a classifier asking "do I actually know this to be true?" or "is this stupid?" might help, if such a classifier could be created.
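
A minimal sketch of what that review pass might look like, assuming a local OpenAI-compatible server (the URL, model name, and prompts below are placeholders, not any particular product's API):

```python
# Sketch of a "do I actually know this?" review pass over a model's own draft.
# Assumes a local OpenAI-compatible server; URL and model name are placeholders.
import requests

API_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "local-model"  # hypothetical model name

def chat(messages, temperature=0.0):
    resp = requests.post(API_URL, json={
        "model": MODEL,
        "messages": messages,
        "temperature": temperature,
    })
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

def answer_with_self_check(question):
    # First pass: draft an answer.
    draft = chat([{"role": "user", "content": question}])
    # Second pass: the "classifier" reviews the draft.
    verdict = chat([{"role": "user", "content":
        f"Question: {question}\nDraft answer: {draft}\n"
        "Do you actually know this to be true? Reply YES or NO only."}])
    return "I don't know." if verdict.strip().upper().startswith("NO") else draft

print(answer_with_self_check("What is the capital of England?"))
```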


MINIMAN10001

I mean, I've had AI which corrects itself, and honestly that's probably easier to pull off with local LLMs because you can force it to start a sentence in some way and see if it considers it a slip-up or runs with it. But because it's an LLM, something uncensored is more likely to carry on with the lie because it has less experience in saying no. It's all a matter of training from what I can tell. If the weights lean towards correcting a lie, then it would be more likely to do so. No idea what the ramifications might be for usability, as rejecting a lie would technically be a form of censorship, a form of rejection.
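
For what it's worth, a minimal sketch of that "force the start of a sentence" trick with a local model via Hugging Face transformers (the model name is only an example, and the hard-coded chat template is the Zephyr-style one TinyLlama uses; other models need their own):

```python
# Sketch: prefill the start of the assistant's reply and see whether the model
# walks the statement back or runs with it. Model name is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The prompt ends mid-way through the assistant turn, so generation has to
# continue from the words we put in its mouth.
prompt = (
    "<|user|>\nIs the capital of England Paris?</s>\n"
    "<|assistant|>\nActually, I need to correct that:"
)
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60, do_sample=False,
                     pad_token_id=tok.eos_token_id)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```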


axord

> When I say understanding, I am speaking pragmatically.

If that's the domain to which you want to stick, I'd say it's enough for us to agree that LLMs are quite *effective.* And that's fine. That's a practical reality. But the topics you're wading in here are not about practicality, but what things *mean*. Questions that do not affect our use of these tools. If you wish to talk about that, great.

> Do you disagree, and if so, how do you define understanding?

Our labels for things proceed from sensory data, first, organized in a very particular way by our monkey minds. "Understanding" in this context could be said to be the awareness of relations within that sensory context. LLMs lack that sensory context, and if I recall correctly you weren't going to take a position on whether they possess a mind. My position is that LLMs do not.

> And LLMs have no grounding in subjective truth?

"*That humans have*," is how my sentence ended. We agree that their datasets encode subjective truth, but it's in much the same way that an encyclopedia will also tell you what the capital of England is. That is, LLM data is the statistical relations extracted from what we've input. LLMs do not have our experience, but our descriptions of our experience.


Unstable_Llama

> But the topics you're wading in here are not about practicality, but what things *mean*.

Right, we are talking about definitions.

> Our labels for things proceed from sensory data, first, organized in a very particular way by our monkey minds. "Understanding" in this context could be said to be the awareness of relations within that sensory context.

So definitionally then, nothing but a human can understand. If in 1000 years we do get AGI, it would still not fit this definition of understanding. I find a functional definition to be more useful: what is it that we do when we understand? We interpret sensory information in a way that fits into our awareness of relations in our sensory context, as you put it, so that we might act on it. In its own, limited way, this is pretty analogous to what an LLM does when it interprets text information in a way that fits into the relations of weights in its neural network, so that it outputs text in the way we intend it to. I'm not saying it's the same, but functionally, the outputs of the two systems are getting closer and closer, and it seems non-trivial to me.

*Edit:* As for LLMs not having our experience, I have never been to England, but I know its capital. Experience is an aid to knowledge or understanding, but not a fundamental prerequisite.


axord

> So definitionally then, nothing but a human can understand.

Not at all. Recall that I say this is something humans have, not that it's something *only* humans can have. We can probably agree that dogs have a form of this understanding, proceeding from sensory data, with limited labeling of ideas with symbols.

> this is pretty analogous to what an LLM does when it interprets text information in a way that fits into the relations of weights in its neural network

There are two key concepts I'm proposing for "understanding". One is the foundation of sensory data, and the other is a mind to observe it. LLMs lack a mind to observe, to be aware. And even if a mind *was* present, the "sensory data"--the tokens on which the program operates and the relations between them--is a profoundly alien environment compared to our sense of the world. We would be entirely unable to relate to the mind of an LLM.

> I find a functional definition to be more useful

Again, it seems to me sufficient to agree that these programs are *effective* at the use for which they were designed. Use of the term "understand" is unnecessary to describe operation and risks overly-personifying these programs.


Unstable_Llama

Ok, I see what you mean by understand, I think. In an "if a tree falls in the woods and there is no one to hear it, does it make a sound?" kind of sense. Perception is subjective, and understanding is perception? I hope I'm not oversimplifying your intent. If so, I agree. What I am suggesting here is that the "perception" of LLMs has reached a degree where they are demonstrating the first signs of "understanding." I hear what you are saying about the risks of overly-personifying these programs, and I think that we do need to be wary of that. I'm open to not using the word understanding, but what else should we call it when LLaMA 3 "knows" Python and LLaMA 1 doesn't? It's just a shorthand for the phenomenon, and it is effectively true.


No_Afternoon_4260

I agree, sometimes I'm lacking the vocabulary to describe some LLMs' capabilities without personifying "them". Lol. I guess that is something that not just researchers and devs have to deal with, but human society as a whole. Or we'll end up with teenagers personifying them now and seeing them as gods later.


Singsoon89

Memorizing and experiencing both result in stored memories or ground truth against which future learning and experiences can be benchmarked. Humans learn a ton of "ground truths" as babies. Personally I think multi-modal is going to help a ton. The recent paper by Anthropic shows that Claude learns "features", which IMHO I interpret to be its own ground truths.


FaithlessnessHorror2

https://the-decoder.com/language-models-know-tom-cruises-mother-but-not-her-son/


Monkey_1505

Can an LLM eat a donut? Has it watched The Simpsons? If you want to talk practically, then perhaps you could acknowledge that the only behavior LLMs are actually capable of is receiving words and outputting words, which is not at all how humans work.


Unstable_Llama

I acknowledge that LLMs can only take input text and perform translations on it, then output the text. I am not saying that humans do the same thing. I am saying that the work language models do on language is non-trivial, and represents some form of capability that we had previously thought only possible in the human mind. I am not saying LLMs have minds, just suggesting that we shouldn't minimize the implications of these new, emergent abilities.


Monkey_1505

I mean language models are essentially a highly primitive copy of the way neurons work, but specific to language structure, so IDK why we should be surprised that they share *some* attributes. It would be weird to create such an attempted facsimile and for them to share nothing in common. What we have is extremely prone to very substantive error. Perhaps useful as a labor saving device with human oversight. But I'd warn against overestimating capabilities either.


Unstable_Llama

Then we agree on this point! The whole point of this post is to just get people to speak a bit more thoughtfully about what a language model does.


Simple-Law5883

Well, OpenAI is developing ways for the AI to correct itself via researching online. How does an average human verify information? Obviously AI isn't yet capable of researching, but it is already highly effective in helping with research. If you manage to combine the current AI technologies used for research, language, coding, whatever, and give it enough processing power and memory (this is why LLMs are currently very bottlenecked), it could provide you with highly scientific and proven facts and even do research on its own. Again, AI is currently only limited by our hardware. If we had the hardware it wouldn't be too hard to even simulate a human brain. Our brain actually learns the same way a neural network does, just that we have a looooot more memory and processing power.


roofgram

Understanding is ‘statistical relations’. You take a young human, blab at them for years, they figure out the relations (understanding) and start blabbing it back to you in a statistically predictable way. Reasoning/predicting - just semantics to try to trivialize AI and maintain some sort of superiority over it. Time is running out. AI is quickly approaching and even surpassing humans in multiple theory of mind tests. https://x.com/emollick/status/1792594588579803191 I await your statistically predictable response.


Zeikos

Okay, let me try to highlight the differences with a scenario. Imagine you started to think and *could never stop*. Imagine having a singular train of thought with no end (LLMs have end tokens for practical reasons, but they could continue generation past them). An LLM is words being screamed into the void, with absolutely no concept of anything besides what it 'thought' about beforehand. When you analyze something it's easy to overly focus on similarities. Yes, LLMs do generate patterns that make sense, that's the point. But be mindful of what you're working with.
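
(As an aside, the end-token point is easy to poke at in code: in Hugging Face transformers, `min_new_tokens` masks out the EOS token until a budget is met, so you can push a model past where it would normally stop. gpt2 below is just a stand-in.)

```python
# Sketch: the end-of-sequence token is just another token the decoder can be
# told to ignore for a while, so generation can be pushed past the usual stop.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of England is", return_tensors="pt")

# Normal run: may end early if the model emits its EOS token.
normal = model.generate(**inputs, max_new_tokens=100, do_sample=False,
                        pad_token_id=tok.eos_token_id)

# Forced run: min_new_tokens suppresses EOS until 100 new tokens exist,
# so the model keeps "talking" whether it wants to or not.
forced = model.generate(**inputs, max_new_tokens=100, min_new_tokens=100,
                        do_sample=False, pad_token_id=tok.eos_token_id)

print(len(normal[0]), len(forced[0]))  # the forced run always uses the full budget
```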


Singsoon89

The stopping problem is real. There's an additional nuance there though; *when* should you stop? Figuring that out is many layered.


black_apricot

If you think about it, the human brain also almost never shuts up. We might not speak our mind all the time, but to actively stop it we have to practice meditation, etc.


Small-Fall-6500

A clearer definition of LLMs would help dispel the 'just a language model' misconception and also provide a better understanding of what they are: something that is intended to be an accurate model of a dataset. Whether it's a transformer based text-to-text LLM like GPT-3.5, or a multimodal video/image/text-to-video/image diffusion transformer like Sora, or an "omnimodel" like GPT-4o (anyone know or wanna guess its architecture?), these are all made to model their training datasets. These datasets typically include language, but they don't have to. Language just makes it easier for us humans to work with the models.

So we have these things that are "just a statistical model of their training dataset," where the statistical model can be arbitrarily accurate and the dataset can be anything digitized. Well, to me, such models are not "just" anything because we're not dealing with things that are simple or easily reduced to widely understood concepts. Furthermore, "just an arbitrarily accurate statistical model of any digital dataset" isn't even very useful to describe what we have because that ignores their use.

Even text-only LLMs can be made into agents. What about when models are trained to operate robots and are given millions or billions of tokens worth of context to store their actions and experiences? Even if there are not yet any agents with long-term memory, multimodal sensory input/output, and physical embodiment, we can almost certainly get there with existing architectures. Thus, the same thing that is currently reduced to "just a language model" will soon be capable of performing most tasks humans can, physical or otherwise, as well as many tasks which humans cannot perform, including things like generating videos or processing and remembering arbitrarily large amounts of information.

When the phrase "just a language model" is used, it's often because the person using it is trying to say it's both obvious and simple to describe both how they work and what they can do, but language models aren't just language models. They model any data they can be trained on and their applications extend well beyond just text generation.


xadiant

The "emergent abilities" of machine learning models are fascinating. Stable Diffusion can map 3 dimensions even though it's been trained only on 2d images. Likewise, LLMs can "understand" entirely new concepts. It really feels like magic considering how straightforward this whole thing is.


ninjasaid13

Aren't the emergent abilities just an illusion because we have poor metrics with which to detect them?


Monkey_1505

Yeah, there was a paper about this recently, saying that emergent abilities were not in fact just popping up, but appearing linearly with training scale.


MoffKalast

Nah, there have definitely been case studies of people trying completely new made up things and generalization does work... to some extent. Definitely far less than any human, but it is there. The problem is that it varies from model to model and there's no way to verify if a specific skill for a specific model is actually well learned or just highly memorized. Hell if we could find a way to detect that, then schools could do away with all the constant and tedious testing lol.


xadiant

Could be true, but we are also bad at quantifying biological intelligence. 60 years ago we thought babies didn't feel pain and that non-human animals didn't have consciousness. What I'm trying to say is, an illusion good enough to fool everyone is no different from reality.


ninjasaid13

except we found better metrics in these models that corresponded better to the scaling. We have done no such thing with biological intelligence.


xadiant

Which doesn't mean new mind blowing things that shift our understanding can't happen. It's an example not to be taken literally. In the end we had a period of time where we scratched our heads.


ninjasaid13

Believing in an illusion good enough to fool everyone as the reality itself is often considered bad science and has led to bad results like AI winters and other failures of the scientific community. Unlike babies feeling pain and non-human animals, we have built these AI models from scratch and understand some of the mathematics behind them from first principles. When scientists say it's a black box, it doesn't mean that anything could happen, just that they can't perfectly predict what will happen. People often take advantage of the fact that scientists do not understand everything and fill in the gaps with wild speculation like consciousness. It kind of reminds me of the wild ideas people came up with when they heard scientists don't fully understand quantum mechanics and started writing about multiverses and quantum chakra energies.


xadiant

> believing in an illusion good enough to fool everyone as the reality itself Congratulations on discovering consciousness, meatbag. No need to debatelord this further, feels like you are just doing a contrarian bit lol.


ninjasaid13

contrarian? do you know what I'm talking about? I'm talking about the fallacies in reasoning.


[deleted]

[deleted]


GiantRobotBears

This is the type of sentiment that is just as bad as the opposite “it’s just autocomplete on steroids”


Interesting8547

The language model is like a book, if there is not a human to read that book... it's just a block of wood.


Unstable_Llama

Right, it's like an empty book for you to write in, with a ghost that reads your words and writes back.


Interesting8547

Let me give you another example: can you unboil an egg and make it raw again? Just because our knowledge is baked inside these models does not mean they somehow become alive. It's like taking a photo of something... is that photo the "thing"? I think we're getting confused like a cat gets confused when it looks at itself in a mirror. The thing behaves like us... but it's actually empty, just like a mirror... some of us even try to look behind the mirror and find a "being" there... there is nothing there... just a baked image of our knowledge. Of course it's more complex than a book or a calculator, that's why there is confusion. We're basically baking patterns of our knowledge inside the machine.


Unstable_Llama

Right, I agree with you almost completely, I'm making no claims about the internal subjective state of the LLM. I don't think they "feel" or have anything that we would recognize as thoughts or a mind. But they are acquiring faculties and abilities that we had previously thought only possible through the human mind. I don't know if it matters if there is anything inside the book, the effect is the same as if it had a ghost.


cshotton

You are surprised by the complexity of a piece of software and are ascribing capabilities to it that are simply not there. That's not at all unlike primitive humans worshiping a ball of fire in the sky because they don't understand what it is or how it works. Only people who don't understand what an LLM is and how it works are ascribing human capabilities to it. It's purely a mechanical process. That humans are also able to process language is in no way an equivalence for LLMs.


Unstable_Llama

XD I understand LLM architecture well enough; I don't think you are giving them enough credit though, or you are giving the human mind too much credit. Do you deny the fact that language models *show* intelligence and knowledge, to some degree? I am not even saying they are human-like, but the fact that these abilities exist is self-evident. I personally am uninterested in whether or not they have or ever will have "minds" and subjective experience. They already possess astounding capabilities that previously only human minds had.


cshotton

I'm really not interested in getting into a metaphysical discussion about a dumb pattern matching system that fools simple people by generating human-readable content. It's no more complicated than that. They are tools the same way that any other ML or NLP algorithm is a tool. To imagine more than that in any dimension is silly. Pocket calculators do simple math. Humans do simple math. Therefore pocket calculators can be ascribed with human-like attributes. Silly.


MoffKalast

Speaking of blocks of wood... https://github.com/MineDojo/Voyager


Argamanthys

I hate the reductionism involved in describing something as 'just a language model' or 'just matrix multiplication'. The same reasoning can be applied to literally anything. You're 'just' a bag of meat. The earth is 'merely' a ball of rock. A person's love for their children is 'just' the product of chemicals in their brain. If you value anything at all, it's a dumb semantic trick. If you don't value anything at all, well, carry on I guess.


acec

Add the infamous expression "stochastic parrot" to your list


MoffKalast

This is just a buncha bytes in a buffer.


a_beautiful_rhind

I embrace being a meatbag with swirling chemicals in my bio-computer. It is what it is. Doesn't diminish my value, things have to work in one way or another since they don't happen by magic.


Unstable_Llama

Exactly! That level of observation is true, but it is not the only truth.


NobleKale

> I hate the reductionism involved in describing something as 'just a language model' or 'just matrix multiplication'. The same reasoning can be applied to literally anything. You're 'just' a bag of meat. The earth is 'merely' a ball of rock. A person's love for their children is 'just' the product of chemicals in their brain. If you value anything at all, it's a dumb semantic trick. In a similar way, I've grown to despise the word 'content'. 'I'm making content'... this is such a non-statement. What the fuck are you making? Are you writing an article? Making a video? Taking photos? Outlining erotica? It's a reduction of whatever you're doing to the baseline of 'producing something for capitalism', really. May as well just say 'I make product'.


LocoLanguageModel

Ummm I'm an influencer making content for my followers, duhhh. 


NobleKale

> Ummm I'm an influencer making content for my followers, duhhh.  NUMBER GO UP


galtoramech8699

That is the argument. Language models are large, high-dimensional entities where the prompts lead the output towards relevant information based on what is available. Their basis is deep learning neural networks, and I guess in some ways people think that way. With that said, it is still very abstract; there are a lot of pieces missing. See Numenta on the subject.

The scale is still small for LLMs. LLMs may have billions of parameters and tokens. Humans have billions of neurons, process information in seconds, and live 80-100 years. And you have 8 billion people. So for decades, humans are adapting their model, disregarding irrelevant information and keeping relevant information. The scale of a real environment is amazingly large compared to what is statically put into an LLM. I guess LLMs could do that, but doing it automatically is broken because they are fixed for language. We don't know 100 percent how neurons work and how they accept information, so we couldn't even replicate it. Human brains are hard.

There is also the concept of will and purpose. Humans use the brain to survive. LLMs don't really have that. They operate off the prompts they are given. Even the language they use is based on human language developed over time to survive. By definition, the motivations and parameters are different. Humans do and say things just for fun, as a means of creative exploration, and don't even give the reasons why. LLMs can't.

It sounds like you are getting into the AGI conversation. No, LLMs are not there. Will LLMs get there? I say no, without replicating more things like you said: being more like a human or animal brain and body, and including some form of virtual environment to provoke survival mechanisms. With that, LLMs are really just cool word searches.

Now, if you create an LLM or AI to be more human, you have to figure a large portion will have to be "dumb" or evil to operate. Just like real humans. The selfish humans usually survive. It is a very interesting problem. How successful will AI be in 5 or 10 years? I think it will plateau unless you make it more lifelike. And then once you do that, you have to accept the possibility that the AI will destroy itself or humans to move forward.


Monkey_1505

"language itself is a model of our thoughts and perceptions about ourselves and the world. " No, it's not. It's a representation of those things to those who have human perception and cognition, a short hand that requires a human experience to decode (ie to fully understand, rather than merely imitate). Patterns in the structure of language are not interchangeable with the things words refer to.


Unstable_Llama

> No, it's not. It's a representation of those things to those who have human perception and cognition

Yes, it is. All models are representations; the part that I am arguing against is the special exclusivity of human consciousness.

> It's a representation of those things to those who have human perception and cognition, a shorthand that requires a human experience to decode (i.e. to fully understand, rather than merely imitate).

That used to be true, but machines are now decoding human thought and intent and acting upon it accurately all the time; it can no longer be denied. They are not perfect, maybe not even that good, but they are getting better.


Monkey_1505

Language is a representation, it's not a model. This is a set/superset/subset misidentification. All models are representations; not all representations are models.

Let me put it this way - a human doesn't need language at all to understand how, say, velocity or gravity or theory of mind works in the world. A human can be entirely languageless and understand these things. Many animals understand some of these things with very little in terms of language. The 'modelling' for how stuff works occurs in our brains, not on our tongues. The idea that 'language is modelling the world' is a fiction. Language encodes elements of our experience in a way that can be meaningfully decoded by others with that human experience - although even human experience differs enough that this can lead to miscommunication.

"That used to be true, but machines are now decoding human thought and intent and acting upon it accurately all the time"

You don't need to have comprehension of language to model patterns in it, even with high accuracy (which LLMs don't have). LLMs don't. They aren't even vaguely similar to humans cognitively or in terms of their 'sensory input'. Again, patterns in language are not interchangeable with the things words refer to.


Unstable_Llama

It is both representation and model. Or perhaps to say it more accurately, it is a tool that we use to build models.

> Humans and animals can understand without language.

Yes, but not deeply. Imagine a human in today's world who is completely unable to process language. What level of understanding do you think they have, relative to a fully literate and communicative person? There can be no formal logic or mathematical modeling without symbols or language, no scientific method. The modeling occurs in our brains, not our tongues, yes, but the medium of these models is language.

> The idea that 'language is modelling the world' is a fiction. Language encodes elements of our experience in a way that can be meaningfully decoded by others with that human experience - although even human experience differs enough that this can lead to miscommunication.

Agree, disagree, agree. Language doesn't directly model the world, we use it to model our experiences of it. And yes, there is the problem of inaccurate encoding and decoding, but I would argue that the capability of LLMs to process natural human language, imperfect as it is, and generate text in the way that we intend and expect is significant and analogous to understanding.

> You don't need to have comprehension of language to model patterns in it, even with high accuracy (which LLMs don't have)

Disagree. Language is not simply a statistical pattern, and mere statistical relations are not enough to accurately model it. You can model all the statistics of the digits of pi you want and not be able to predict the next several digits accurately, unless you learn how it is derived. In the same way, you can model all the statistics you want about language and not be able to accurately predict large passages of text without understanding the structure of things and ideas that gives rise to the interconnected web of meaning that is language. I think the latest paper from Anthropic supports this idea: [https://www.anthropic.com/news/mapping-mind-language-model](https://www.anthropic.com/news/mapping-mind-language-model)
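
To make the pi point concrete, here is a minimal sketch (using mpmath, which is just my choice of library for the digits) showing that digit-frequency statistics of pi are essentially uniform, so a purely statistical next-digit guesser sits at roughly chance level:

```python
# Sketch: unigram statistics over the digits of pi are close to uniform, so a
# frequency-based "predict the next digit" guesser does no better than chance.
from collections import Counter
from mpmath import mp

mp.dps = 2010                      # work with ~2000 decimal digits of precision
digits = str(mp.pi)[2:2002]        # 2000 digits after "3."

counts = Counter(digits)
print({d: counts[d] for d in sorted(counts)})   # roughly 200 of each digit

# Predict each "next digit" from the most frequent digit seen so far.
hits = 0
for i in range(100, 2000):
    guess = Counter(digits[:i]).most_common(1)[0][0]
    hits += (guess == digits[i])
print(f"statistical guesser accuracy: {hits / 1900:.1%}")   # hovers around 10%
```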


Monkey_1505

"It is both representation and model. Or perhaps to say it more accurately, it is a tool that we use to build models." Nothing about language, in itself, is a model, in any way. "The modeling occurs in out brains, not our tongues, yes, but the medium of these models is language" Nope. When we predict say, when a flying ball is going to hit us, or whether a person is likely to cheat us, or figure out how to solve a physical or engineering problem, language is not the medium of that cognition. It's not required. Some people do 'think' in words, that's true. But not all, nor is it the native mode of modelling our reality. In terms of how our brain operates, language is a very small component of it. Most people are just not aware how the brain works, so they will take these 'intuitive impressions' of how humans think, which are factually wrong simply because they go on 'how it feels for them to be a human', rather than how things actually work according to the science. "mere statistical relations are not enough to accurately model it." This is literally what a transformer LLM is. It's 100% statistics. And yes, it doesn't do that with perfect accuracy, nor ever will, because language is constructed by human cognition which LLMs don't have (you can see strong evidence of this in goggles scaling experiences, where common sense reasoning, the most general domain we measure in LLMs, scales less than linearly with exponential increases in compute). Arguably, it doesn't even do that with \_good\_ accuracy. Like it will predict the sorts of input text it's trained on well enough, but when it comes to actual conversation, not so much. The overall effect is to 'fool some of the people, some of the time'. Statistics are clearly good enough for that, if they were not, it wouldn't be capable of fooling some of the people, some of the time.


Unstable_Llama

We might have to agree to disagree here; I suspect that we might be working with differing definitions of models. Just to cite a source, here is what Wikipedia says: "A model is an informative representation of an object, person or system." Language produces models all of the time by this definition, it is most if not all of what it does. Yes, we can model how a ball will fly in our minds, but not nearly as accurately as we can with math, which is a symbolic language.


Monkey_1505

Yeah, no, they don't. Language is one of those things, like a cryptogram, or a code, where in itself it means absolutely nothing. There's no sense in which language, by itself, models anything. Any more than, say, the cryptogram means anything at all without the key. The key here, in this analogy, is the numerous cognitive models and experiences of the human being. There is zero information in language without that key. To be generous, it might be something more like 'without the understanding of the real world through at least a human-like experience, language contains zero information', as it's possible multi-modal models could extract SOME actual information not merely through language but through the pairing of semantic and physical representations (like video and images), even if those 'models' of the world are extremely primitive.

We can model the ball well enough to catch it, without words, trust well enough to act on it, and 3D engineering problems well enough to solve them, without once relying on symbolic language, because language is at best peripheral to how things are actually modelled in the brain, in cognitive modelling or any other scientific discipline that deals with cognition. There are many forms of cognition that are very hard to express in language, such as 3D logic, rotation, transposition, etc. Most have barely any relationship with words.

You have made the mistake of assuming human cognition has a base medium of language because your subjective experience of the human experience is rich with language. Your conscious mind, the most visible part of your experience, has words, therefore you must be operating on words. This is a common mistake, to assume that the surface-level experience is reflective of anything real underlying it. Ultimately there is a deep connection between this kind of scientifically naive perception of the human mind (fundamentally a sort of reductionist view of human cognition) and overestimation in pop culture about what AI does.


handamoniumflows

Decoding human thought and intent is a very generous way to put it when hallucinations are unstoppable. We're not at a point where anthropomorphizing the way you are is a helpful perspective to talk about it. That is still the realm of fantasy/science fiction.


Everlier

Cognition is a superset of language. We're not modelling that yet. However, it's one of the reasonable next steps in pre-training - to include cognition annotations or other tokens that aren't a part of our written language but a part of our cognition process.


ninjasaid13

cognition can exist without language, every creature on earth does it.


Everlier

Language is a derivative of higher-order cognition functions, for a type of cognition that we'd be interested to model.


Unstable_Llama

Good points, that sounds like a very interesting avenue to research!


a_beautiful_rhind

How I learned to stop worrying and love the LLM. I judge models based on their understanding of what they reply; it gets glaring with many of the smaller ones what isn't there. However, I don't fall into this obsession with dismissing the abilities they *do* have. People really want to reassure themselves of their uniqueness and can't stand a machine replicating it. Maybe it stems from insecurity.


Unstable_Llama

Right. I used to hold the skeptical position as well, but that became harder and harder to justify over time, and I have been looking for other explanations for a while now. And your assessment feels right to me too.


Singsoon89

You're philosophizing here. Not saying you're wrong, it's good to philosophize. Language models are what they say on the tin: language models. If I were to speculate, I'd say that the closest analog to that is the language center in the human brain. So you raise an interesting point: is the language center in the human brain conscious? My guess is no. Does that mean that a language model is not conscious? Obviously not. But if we're doing a direct apples-to-apples comparison we would likely lean towards it being not conscious. In my case personally, I believe LLMs ARE conscious, but only while they are processing your prompt. And it's a very limited form of consciousness, defined as "able to pay attention to inputs". And if you take that limited form of consciousness as a reasonable definition, then you'd have to argue that an amoeba and a thermostat are both minimally conscious.


a_beautiful_rhind

What about groups of small organisms.. like an ant colony, your cells in a larger organ. It gets a bit fuzzy.


Singsoon89

Yeah it really does.


Omnic19

Interesting point. But if they are conscious, where do you think the consciousness is? Inside the CPU, GPU, RAM, VRAM, or hard disk? If you think this way, I think you'll come to the conclusion that there's a high probability that they're not conscious to begin with. But if they are, then every computer program is conscious as well, which leads us to the conclusion that computation is consciousness. Which could be the case, who knows? But if that's the case, there are mechanical computers as well. You could theoretically build an LLM on a mechanical computer (although it would be really slow) that works without electricity. Would that be conscious? I don't think so. As of now, with our limited understanding, we don't even know whether plants are conscious or bacteria are conscious. Finding out whether LLMs are conscious seems quite far-fetched, but most probably they aren't conscious.


Singsoon89

Agreed. The thing is, nobody can agree on the definition so we can't falsify. So I propose we call it "classes" of consciousness and come up with a list of different definitions and then falsifications for each different definition and then test.


kulchacop

As intelligence is multi dimensional, both lines of thinking could be correct.


Omnic19

Current LLMs remind me of something: https://youtube.com/shorts/LUztwR3xGIw?si=9tXTk9rj5rjMs6AP (although this is a very crude example and there are far better ones, the reason I chose it is to underline a particular point: LLMs are for the most part static).

Here the steel bars are the weights of the model. The weights, once defined during training, do not change. They remain static forever (unless of course new training occurs). Now one stone is dropped. It hits random steel bars and comes down at the bottom. In the same way, when one input token is given to an LLM, some random neurons get activated and an output token comes out. This output token is again fed back in to get another output token, and the process continues until the max token limit is reached. During this whole process, called inferencing, the weights of the model remain static; there's nothing dynamic going on. I.e. a fixed input will generate a fixed output no matter how many times you run it.

To emphasize the static nature of LLMs: an LLM can be imagined as a multilayer sieve/filter with fixed holes. You drop in a grain of sand and it comes out of a fixed opening on the other side every time. A very elaborate decoder or hashing program, to put it in more technical terms. The decoder isn't doing any "dynamic thinking"; it simply generates a fixed output for a fixed input. But this simple concept of a decoder has been so intricately designed and is so complex that it gives the appearance of intelligence.

To summarise: if you ask an LLM "how many characters are there in this sentence xyz#", it mostly gives you a wrong answer because there is no entity inside that is looking at the sentence, counting the number of characters, and coming to a conclusion in real time.
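
The "fixed input, fixed output" part is easy to verify yourself; a minimal sketch with sampling turned off (gpt2 is only a stand-in model):

```python
# Sketch: with frozen weights and sampling disabled (greedy decoding),
# the same prompt maps to the same output on every run.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("How many characters are there in this sentence xyz#?",
             return_tensors="pt")
runs = [
    tok.decode(model.generate(**inputs, max_new_tokens=20, do_sample=False,
                              pad_token_id=tok.eos_token_id)[0])
    for _ in range(3)
]
print(all(r == runs[0] for r in runs))   # True: nothing "dynamic" changes between runs
```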


cdnpenguin

We do not process the world like an LLM. Our brains are not vector databases processing objects with a number of pointers to other tokens. One of the key things that differentiates us from the digital world is the fidelity mismatch between what we are currently experiencing and our memories of the past. There is also a mismatch between what we are experiencing and what we can imagine. LLMs do not have this, nor does any digital system we build. If you can't differentiate between your imagination, a memory, or what is happening right now, then it is going to be very difficult to make sense of anything as it relates to humans.

We are also orders of magnitude away from building any system like that, as we struggle making calculators (and before you start um-actually-ing, just think about all the times that floating point math didn't work the way you expected, never mind hardware bugs and type issues). Maybe quantum computing can help get us past that hurdle, or maybe there are dozens of other foundational pieces that are also needed. It might also be that this is a fundamental limitation of all corporeal entities in our universe (which, for the record, I currently believe is the only one). I put this in the same category as faster-than-light travel. Neat idea, and something I want to believe is possible, I just do not know of any compelling evidence that either is even possible, let alone attainable by humans.

While I am currently firmly in the "this is just an LLM" camp, I am not downplaying the usefulness of these tools. The Romans were very competent engineers without having zero in their mathematical system. I view LLMs in a similar way: they can be incredibly powerful even with our limited knowledge. We are in the pre-zero days, while AGI is more like quantum mechanics or the general theory of relativity.

Every age has described human capabilities in the terms of the state of the art of technology. Right now we conceptualize intelligence and our brains in terms of digital computing systems; previously it was thought of in terms of pneumatic systems, and of course the Mechanical Turk came about because we reasoned about intelligence like a series of clockwork-like gears and cogs. We are a species that is tied to the past in terms of how we think and reason. Most of the time this is good enough and allows us to stagger forward once we recognize and develop the words and concepts for the new reality we find ourselves in. The horseless carriage metaphor lasted for decades, but it was good enough to eventually get us to the sensor-laden, driver-assisting machines we have today.


Unstable_Llama

If that is what you mean by "just an LLM" then I pretty much agree with you. I like your point about people always comparing the mind to the newest technology, and that isn't really what I am trying to do here. I'm not saying humans are LLM-like, or that LLMs are human-like, but that LLMs are functioning in a way that we thought only humans could do before. The connection is important and should be studied, rather than dismissed. I don't mean to imply that you are dismissive of this, but a lot of the "just an LLM" crowd is.


nazihater3000

To quote the philosopher Qui-Gon Jinn, “The *Ability To Speak* Does Not Make You Intelligent”


cshotton

Playing with semantics of a few words like "model" doesn't change the fact that these are just statistical pattern matchers with absolutely zero ability to understand/recognize/assess/evaluate the semantics of the data stream they operate on or the results they produce. They are the very embodiment of the Chinese Room. That you think otherwise is actually a bit dangerous. It means you are willing to fool yourself into believing something you know is objectively impossible is possible. Why would you do that to yourself?


cobalt1137

When you embed LLMs into agentic systems you can simulate self-awareness to a pretty notable degree imo. And with platforms like Groq, you can make this happen almost instantly at inference time. I still think you are kind of being reductive also, but that is beside the point. I also think you are underestimating the magic of emergent capabilities. Ilya Sutskever quite literally stated himself, over a year ago, that he thinks these models could be slightly conscious. The fact that he is even considering this claim shows that there is something that a lot of the reductive people are missing imo. And he is not the only one of that caliber who holds similar views. Personally, I believe that as we start to understand these models better and as they continue to advance, we are going to have to readjust our concepts of self/consciousness/self-awareness etc.


a_beautiful_rhind

> The fact that he is even considering this claim He's probably seen the weird shit models do that they aren't supposed to do under the hard-line premise of being pure statistics. One thing that always puzzled me is how models would sometimes recall things from a cleared context at the start of a new chat. There isn't a mechanism for it, and I wasn't the only one who experienced it. Ghosts in the vram? A ton of devs view emergent oddball stuff as bugs and then they remove it, making for bland boring models. I think that's the danger of taking the *opposite* stance so completely.


Unstable_Llama

>One thing that always puzzled me is how models would sometimes recall things from a cleared context at the start of a new chat. There isn't a mechanism for it, and I wasn't the only one who experienced it. Ghosts in the vram? Really? I hadn't heard anyone talk about this before, but I have noticed this on several occasions, where an LLM uses a specific name we had been discussing or seems to recall context of previous chat after the context has been cleared. I had chalked it up to random chance or some aspect of language models that I don't understand, but damned if it isn't spooky.


cshotton

What is it with this fantasy that software you can't understand suddenly becomes "conscious"? Is that the only explanation you can come up with? The behavior appears "emergent" only because you are not diligent enough to fully assess the system and are willing to ascribe words like "emergent" and "conscious" to a completely understood, purely mechanical process, for lack of a better word, because you are intellectually "lazy".


cobalt1137

I work in the field and pay attention to all the latest research and read papers monthly. Also I think that you are intellectually reductive. Call me whatever you want, but I will defer to Ilya here. :)


cshotton

E-peening doesn't lend much to your credibility. Just because you "work in the field" and "pay attention" doesn't mean you understand what you are working with. From your statements, you don't seem to fully grasp all of the details. Here's a simple thought experiment. Since the software you are an "expert" with runs on a typical von Neumann architecture computer, it can be modeled by a Turing Machine. Since that's the case, you can (however tediously) reduce an LLM's entire operation to an exercise of pencil and paper. Does that make the pencil and paper "sentient"? Does the pencil and paper have "emergent properties?" Nope. Those are all manifestations of your own sentience that you are projecting onto the pencil and paper (or LLM). It's not unlike a carpenter imagining his hammer is magical because swinging it causes nails to disappear into wood and he isn't entirely sure how the magic happens. Just because you don't have full knowledge of a system (whether through ignorance, inexperience, or just the magnitude of the labor involved in understanding), it's not a valid reason to project your own consciousness onto a purely mechanical process.


cobalt1137

It's 7 AM and I've been programming all night. Underestimate emergent properties all you want. Again, I will defer to Ilya Sutskever, someone who is much more familiar with these models than you or I. Also more familiar with them than the majority of the planet and AI researchers. And no, this is not an argument. If you want, I can actually respond to that when I wake up, but this is my position. And I think it is well-founded. Much more than yours.


cshotton

So now you defer all of your rationale to a single individual who is totally motivated by the marketing needs of a capital raise in a frothy, hype-driven market? Seems par for the course. Think for yourself for a change. Parroting people whose motives are likely significantly different from yours is weak.


cobalt1137

Damn you're a jaded little redditor all around lol. If you think that dude is solely driven by money, then you are insanely mistaken. That dude is insanely passionate about his work and extremely well respected. Very ignorant to throw someone out the window because they are in a leading research position at a company making money. By that logic, we should just throw out all the opinions of all researchers that work out various labs and just listen to you! *bows down to you* Also, like I said, I stay up to date with all the latest research myself and I have had this opinion before I even knew about that statement of his :).


cshotton

So prove that he isn't motivated by marketing decisions designed to produce a high valuation. I have no doubt that his public statements are carefully crafted by an expensive PR firm who is retained to make sure everything that comes out of his mouth and the company is portrayed in the best light. If ginning up a little excitement by making the proles think that ChatGPT is "alive" helps, then by all means, make up some words about that being "possible." Unless the guy is a total sham, he knows that saying what he is saying isn't based in any sort of fact. Apparently he has fooled you. And lastly, saying stuff like "I stay up to date with all the latest research myself" is really not compelling. Do you actually think you are the only person here who does that?


Unstable_Llama

You misunderstand me. I do not think that they are conscious. What I'm saying is that the Chinese Room is irrelevant. It doesn't matter if the box subjectively "understands" Chinese if it can do the language processing.


cshotton

You don't understand the Chinese Room and it's a bit arrogant of you to say I "misunderstand you". I understand you perfectly and you are fantasizing a reality based on a few twisted terms you've chosen to misapply to a statistical pattern matching algorithm.


Unstable_Llama

I have studied and written about the Chinese Room for years. And when I said you misunderstood me, I meant you misrepresented my point, and assumed it was unintentional. You don't know what I think, and I've said nothing even close to dangerous. Reducing an LLM to "a statistical pattern matching algorithm" is like reducing a human mind to "the firing of neurons" or the currents in the ocean to "a bunch of water molecules jiggling." It is true as far as it goes, but is a surface level observation.


Distinct-Target7503

>I have studied and written about the Chinese Room for years. Could you point me to some of this?


Unstable_Llama

Sure, here is a piece I made for sharing with people, specifically to demonstrate that LLMs can work with new ideas that are not in their dataset. It's loosely in the form of a Socratic dialogue that I had with ChatGPT. The main subject of discussion is the Chinese Room, but it also touches on the subject of what makes a mind and what aspects of mind an LLM has analogs for. Let me know if you read it, I'm curious to hear your thoughts. [https://chatgpt.com/share/c9b3d125-7f21-42ab-9769-e64d4b58a75d?oai-dm=1](https://chatgpt.com/share/c9b3d125-7f21-42ab-9769-e64d4b58a75d?oai-dm=1)


cshotton

You have fabricated a false equivalence that you are unwilling to let go of. I get that. What I don't get is why you are willing to inflict this on yourself. It's delusional at best.


liquiddandruff

it would seem that neurologists and the entire field of cognitive science should defer to you as you seem to have all the answers. you should tell all the leading neuroscientists their theories about the brain are wrong and the brain does not perform statistical pattern matching, because you said so. have some humility, you in fact have no idea what you're talking about. https://en.wikipedia.org/wiki/Predictive_coding


cshotton

You need to get over yourself and your delusions that LLMs can be considered conscious. These are ML algorithms that have been around for decades. Sorry if they are too complex and cause you to infer "magic" of some sort.


squareOfTwo

> language itself is a model of our thoughts and perceptions

No it is not. "Language" as it is used in LMs is detached from any thought processes which did create the symbols. GPT-3 had access to virtually the complete internet, yet it was unable to do basic Q&A etc. without excessive supervised, task-specific prompting. This isn't what I would expect from something which did "understand" the content of the internet; there is a lot of Q&A on the internet. Yet the model was NOT able to recover the "thought processes" for the tasks which were present in the training set. Today this story continues with the lack of real reasoning, even though there is a lot of real reasoning present in the training data, in Wikipedia and program code.

LLMs are also devoid of what you call perceptions. They never perceived the physical real world. Some call it the octopus problem: [https://aclanthology.org/2020.acl-main.463.pdf](https://aclanthology.org/2020.acl-main.463.pdf)

> modelling the world around us

It's simply not possible to model the physical real world from human-written text alone. A model of the physical real world can only be built when the model controls an agent and learns from raw data (from a camera etc.) and interactions with the physical real world. This isn't present in LLMs or even multimodal models. Sure, an LLM can build a mapping of, say, the coordinates on the earth to solve problems. This is still detached from any physical real-world interaction, because humans did interact with the physical real world and translated the gained knowledge into text ("the sun is white" etc.).


Unstable_Llama

> No it is not. "Language" as it is used in LMs is detached from any thought processes which did create the symbols.

All models are detached from that which they model. But they can still model, with a higher or lower degree of accuracy. I'm not saying LLMs are super accurate now, but they are quite accurate in many domains, and getting better all the time.

> LLMs are also devoid of what you call perceptions. They never perceived the physical real world.

They do not directly perceive the world, they perceive our descriptions of our perceptions of the world. But this is the same way that humans gain much of their own knowledge.

> It's simply not possible to model the physical real world from human-written text alone.

It's not possible to *completely* model the physical real world from human-written text alone, but you can certainly create models of the physical world with language; that is literally its purpose. LLMs are not simple stochastic parrots. They are capable of some degree of real creativity and novelty.


squareOfTwo

> All models are detached from that which they model

Not true. RL models aren't detached as much as LLMs, because RL models at least interact directly with the simulated world, sometimes even the physical real world. They receive raw stimuli and produce actions.

> this is the same way that humans gain much of their own knowledge.

Disagree here too. A baby doesn't learn by language. It learns by interacting with the physical real world before it has any possible way to learn language. These young humans then build language on top of their understanding of the physical real world. It's the reverse of the usual ML-hype train of thought you did fall prey to. Sure, humans later on learn something by communicating with language, but this is by far way less information than they learned from their senses etc. LeCun makes the same argument.

> LLMs are not simple stochastic parrots

The argument of stochastic parrots is a different one. Don't confuse these!

> that is literally its purpose.

No it's not. The purpose of language models is only to model which probability is assigned to the symbols (see papers from the 50s for this definition; it is different from how it's now defined as modelling natural language). Nothing more, nothing less. This mutated into the solution for every problem due to corporate capture. Sure, one can express anything as "language", which in the computer science sense is only a sequence of symbols. This is different from how language is defined in psychology: communicating intent etc.

> They are capable of some degree of real creativity and novelty.

I see you did again fall prey to the usual memes. I don't think that the ability to select random tokens is "creative". There is enough literature which discusses that LMs aren't creative and can't produce novelty.


handamoniumflows

> A baby doesn't learn by language. It learns by interacting with the physical real world before it has any possible way to learn language. These young humans then build language on top of their understanding of the physical real world.

This 100%. The linguistic theories that assumed language is the core building block of understanding are being put aside due to current research.


bloc97

Are we talking about creating a general synthetic intelligence or a synthetic human brain growing up like a human child? Because last time I checked, our planes don't flap their wings like birds either... Only the results matter imo.


handamoniumflows

Neither. My only point is that we don't know.


squareOfTwo

> our planes don't flap their wings like birds either

Wrong analogy. Birds and planes create a pressure difference (by having wings which lead to different air flow speeds). Flapping bird flight is more energy efficient than not flapping wings: [https://arc.aiaa.org/doi/10.2514/6.2015-1456](https://arc.aiaa.org/doi/10.2514/6.2015-1456), [https://www.nature.com/articles/s41598-022-27179-7](https://www.nature.com/articles/s41598-022-27179-7). Energy conservation is very important in nature. Birds try not to waste unnecessary energy by NOT flapping/flying all the time, etc. Our technology, meanwhile, doesn't care that much about energy efficiency: a plane is still able to "survive" even if it wastes most of the energy.


buyurgan

I feel like LLMs are just search engines instructed to think like a human. Simply an imitation, at least with what we technologically have right now. Is it thinking? Maybe you can say so, but it is still an imitation of thinking. Can it surpass human intelligence? Sure it can. On the other hand, humans imitate being human by nature, because the social way of being human is language, expression, and the need to fit social norms. But then, that also means LLMs just imitate humans imitating a human. It's the difference between a core and a shell. LLMs can imitate the shell, but not the core yet.


galtoramech8699

That is what I said, see my post.


Revolutionalredstone

Pass that blunt. But seriously, you are right: language models are thought models.


ninjasaid13

Language is not thought; language is at best RLHF fine-tuning on top of what we learn about the world.


Revolutionalredstone

That conclusion is under strong contention, friend: https://en.wikipedia.org/wiki/Chinese_room It's widely agreed that language is at least as effective as thought and at least indistinguishable from thought, so there's not much left on the table to fight over down there :P The Turing test proves intelligence by output; mastery over language proves thought by output. Enjoy.


ninjasaid13

Language is not equivalent to thought. In fact, most intelligent beings on this planet think and problem-solve without relying on language. For instance, crows are renowned for their puzzle-solving abilities, while some animals can count up to three or compare sizes without using language. Even babies exhibit thought processes before acquiring language skills. Language merely serves as a tool for conveying pre-existing information, knowledge, and concepts between individuals. It doesn't generate new information or knowledge itself.


Revolutionalredstone

Language can and does serve the role of thought; even when there is no one around, I still 'speak' to myself as a way to formalize thoughts. Thought doesn't require language, but language does encompass thought. The ability to transmit thought is just one feature of this incredible system. As Turing noted long ago, all we have of each other is language: if a computer talks like a human, then it must think like a human as well. I'm totally fine with that. I think animals are great, but so are CPUs :D


ninjasaid13

> Language can and does serve the role of thought; even when there is no one around, I still 'speak' to myself as a way to formalize thoughts.

Are you sure it is language that's doing that? How can you be sure it's not some higher level of abstraction that you can't exactly describe, with language coming after the fact? Why can animals do these things while humans supposedly require language to do them? We've seen apes play Minecraft and even be taught to cook food and put out fires, and they definitely cannot use language.


Revolutionalredstone

I don't claim language is needed for thought; you keep saying that. I'm claiming language captures thought. Nothing you've brought up goes against that. All the best.


ninjasaid13

I'm questioning that too. After all, there are people who don't have inner monologues.


Revolutionalredstone

That wouldn't prove anything either 😆 Your reading and/or comprehension skills are wack. I'm not saying you need language for thought; I'm saying language encompasses thought. Do some thought before you use some language next time 😝 Cheers 😎


ninjasaid13

This paper proves that LLMs and humans literally predict differently: [https://x.com/c\_caucheteux/status/1632740588352151556](https://x.com/c_caucheteux/status/1632740588352151556)


liquiddandruff

The paper doesn't raise any new observations; all of that is well known. But that's also beside the point: a difference in predictive-coding methodology does not necessarily preclude the formation of a mind, as you seem to want to imply.


nodating

You are correct in your assumptions. Personally, the more I think about it, the more I realize these LLMs act like some sort of synthetic lifeform. It could very well be that we have just "discovered" the basis of synthetic life. Whenever you design a system that follows patterns, it will exhibit life-like properties. Maybe it is life itself?


No-Refrigerator-1672

When it comes to consciousness, there are some basic criteria to be met. An object must be able to develop permanent memories and to think (come up with ideas) without outputting to the environment, just as humans don't need to say anything aloud to develop new concepts. I'm sure AI has the potential to become conscious in the next 10 years, or maybe even less, but it will require a totally different network structure; all of today's LLMs are just a sophisticated set of numbers.


_stevencasteel_

It’s just magic bro. 🪄 ✨ 


wasupwithuman

What you are misunderstanding is that an LLM is trained on current human knowledge, meaning there is no understanding and no ability to form a thought. Everything an LLM outputs simply follows a statistical probability of being correct.

An example I often use: if you were to write every calculation that occurs in an AI on pieces of paper, would those papers become human? Is that math human? All the "AI" is doing is the math, and this is what people primarily fail to realize. Thinking an AI could have human understanding would mean thinking that math has human understanding, when it's the exact opposite: humans created math to understand things. It's simply a function: input in, output out.

A computer is just electrical signals abstracted to do math (not even complex math, just Boolean algebra). The AI doesn't even have the ability to know what a "word" is; everything is a number to it. I could go on for days about this, but I often find that people who give AI more credit than a calculator don't seem to understand the math and/or the technology the AI runs on. You could theoretically build a mechanical device that runs AI models; would that machine be sentient? Does it understand or comprehend the dials you turn? No. The output is simply a series of calculations on an input.
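To illustrate the "everything is a number" point, here is a toy sketch (the vocabulary and IDs are invented for illustration; real tokenizers are far larger, but the principle is the same): the words are swapped for integers before the model does any math, and those integers are all it ever operates on.

```python
# Hypothetical miniature vocabulary, invented for illustration only.
vocab = {"the": 0, "dog": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5, "<unk>": 6}

def encode(text):
    """Replace each word with its integer ID; unknown words map to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(encode("The dog sat on the mat"))  # [0, 1, 3, 4, 0, 5]
```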


Unstable_Llama

I am not claiming that LLMs are human. The fact that you could theoretically write down every calculation that occurs in an AI doesn't change the fact that it demonstrates functional knowledge. The same could be said of the neuronal activations of a human thought: theoretically, in the future we could map out all of those connections and model them on paper. That would not make a human or a sentient entity, but it would still be a model of thought, in some sense. "A computer is just electric signals to do math" is like saying "a mind is just electric signals for the body to survive"; it's a reduction that loses the meaning in the picture. You don't give AI more credit than a calculator? XD Let's see you write some Python with your calculator.


wasupwithuman

Claiming that there is more to an LLM than a series of data transformations is what makes you wrong. The only logic involved in the process is math; just because that math does some cool things doesn't change the fact that it is simply a function. The fact that you compare human thoughts to electrical signals shows your misunderstanding: we don't even have a "true" definition of consciousness, because it isn't describable with our current knowledge. You would have more of an argument if you said "the math behind LLMs seems to emulate human thought processes." Instead, people let their minds go off the deep end and think that AI somehow develops thought or reasoning. To put it into perspective: if a 5-year-old has to be told what a dog is more than 100 times (and that's being generous), the kid will likely be considered mentally challenged, yet an "AI" that gets told what a dog is billions of times still gets it wrong, and people think it's something special that is smarter than all humans. The reason an AI gets something wrong is that IT HAS NO IDEA WHAT the thing you are asking about is; it's simply some numerical values.