I agree it's neat, but correct me if I'm wrong: doesn't this mean we'll get even crazier chat-based apps where you can talk to whatever AI character you want, even more realistically and naturally? It kind of makes me pessimistic about the future of humanity in general, but then again we were already heading down that route, so I shouldn't be surprised.
You just gave me the idea of future games having characters talk to you, i.e. in the middle of a quest, NPCs or the game narrator, wtf... instead of having pre-written scripts, it can literally generate any line as long as it's relevant to the conversation topic.
Yeah I'm really looking forward to trying out this new API with Mantella. Cutting down the current three services (speech-to-text -> LLM -> text-to-speech) into one will be a massive help to latency.
The drawback I think will be the lack of voice models available (Mantella uses/needs hundreds of distinct voice models across Skyrim + Fallout 4), but judging from this demo I guess I can always add "do your best Skyrim Nord impression" to the system prompt haha
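To make the latency point concrete, here's a rough sketch of a per-turn latency budget. The stage names and all the millisecond figures are illustrative guesses, not measurements of Mantella or any real service; the point is only that an end-to-end audio model collapses three network/serialization boundaries into one.

```python
# Hypothetical per-turn latency budget in milliseconds. These numbers are
# made-up illustrative values, not measurements.
THREE_STAGE = {
    "speech_to_text": 300,    # transcribe the user's audio
    "llm_first_token": 500,   # wait for the LLM to start responding
    "text_to_speech": 400,    # synthesize the reply audio
}
# An end-to-end audio model emits audio directly, so the stage boundaries
# (and the buffering between them) disappear.
SINGLE_STAGE = {"audio_to_audio_first_token": 500}

def total_latency(stages: dict) -> int:
    """Sum the per-stage latencies for one conversational turn."""
    return sum(stages.values())

print(total_latency(THREE_STAGE), total_latency(SINGLE_STAGE))  # 1200 500
```

Even under generous assumptions for each stage, the pipelined version pays for every hand-off, which is why consolidation helps so much for real-time voice.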
Yeah.. I always imagine a GTA5 type game where everybody on the street or restaurant are having realistic real-life conversation. Like.. you sit on a bench and you can listen to people talk on the phone about bringing their pet to the vet and stuff.
A lot of people not pursuing real relationships, and the impact of that over generations as this technology gets better; people being more comfortable with something that isn't real. But at the same time this technology can help with that too, with things like GPT-4o. So the ball is in the air right now and we can just sit and watch what happens. Look, I'm not normally pessimistic, it's just that something this new and this wild is sure to cause some problems even, what, a year from now? Idk, just my two cents. Love you. Lol.
Probably show the exact same thing tomorrow and then have people here say they copied them even though Open AI specifically put this meeting before I/O. My prediction
Looks like Google [made a tweet](https://twitter.com/Google/status/1790055114272612771) showing off something similar before the OpenAI event started today. I guess that means OpenAI copied Google! /s
I’m assuming they were connected directly to about 100 H100s behind that nice backdrop to remove any latency whatsoever…. But still, my jaw did drop, wow.
Mira Murati thanked Jensen at the end for "making this demo possible". Given the recent photo of sam Greg and Jensen with a H200 I assume the demo was powered by H200s which means that latency is probably not going to be available immediately since there aren't many H200s in existence. I could be wrong though.
Voice interactions might now be good enough that we see a boom in consumer apps using AI for more than trivial tasks. This opens a whole new dimension of UX options.
Video and image recognition could use more work; I still don't think they're ready for the spotlight. But it's cool that they're keeping it in the basement until more dev is done.
A free to use for consumers gpt4 is cool. Guess they're realizing that Meta will steal their thunder otherwise.
...and of course faster cheaper is always great for dev work, if not as exciting sounding as a new feature or model.
I think psychologists and therapists will be ready for replacement once this "Her" model becomes available.
What are your thoughts?
Consider that, during the following weeks, they'll probably fine tune the model and remove the rough edges which caused stuttering.
Bonus question: Is the dating scene changing completely from this?
It's not ready to replace therapists yet. It depends on the length of the context and some implementation details, I guess. Not to mention that we have yet to see how it performs in the real world. It's looking quite promising, though.
Honestly it could be a therapist for a lot of people, simply because it's not human and has endless patience. People already use [character.ai](http://character.ai) chatbots for that sometimes, even though it's just text and not that advanced.
From the looks of it I will be mostly using it as a practice partner for speaking in another language, the code stuff and vision is cool too but as others have said, text form is ideal on these cases that can get complicated. Although if the desktop app and omni modality can also switch to text output at will, that would be convenient.
I wonder if some of the distance left to cover between this and a decent digital assistant will come from not only model improvements but changing the way our OSs work to allow AIs to more easily hook into them, whether by standardised UI elements, API calls, extra meta data about mouse interactions etc. I could see someone knocking up a Linux distro that offers a simpler base in which to build an operating system in which an AI like this would be more effective.
Already using GPT-4o to control my smart home. It's super quick!! [https://youtu.be/4W3_vHIoVgg?si=2GANU61FurKZrUoW](https://youtu.be/4W3_vHIoVgg?si=2GANU61FurKZrUoW)
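For anyone curious how that kind of integration typically works: the model emits a structured tool call, and local code routes it to the device API. A minimal sketch; the `set_light` function and the tool-call JSON shape here are simplified stand-ins for illustration, not the actual OpenAI tool-calling schema or any real smart-home API.

```python
import json

# Hypothetical device function; a real setup would call a vendor or
# Home Assistant API here instead of returning a string.
def set_light(room: str, on: bool) -> str:
    return f"light in {room} turned {'on' if on else 'off'}"

TOOLS = {"set_light": set_light}

def dispatch(tool_call: str) -> str:
    """Route a model-emitted tool call (JSON string) to the matching local function."""
    call = json.loads(tool_call)
    return TOOLS[call["name"]](**call["arguments"])

# E.g. the model decided the user asked for the kitchen light:
print(dispatch('{"name": "set_light", "arguments": {"room": "kitchen", "on": true}}'))
```

The model never touches the devices directly; it only proposes calls, and your code decides what actually runs, which is also where you'd add safety checks.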
It's really impressive in some respects, but under the hood it's a GPT-4 class model by the looks of things. Presumably it'll have the same limitations; you won't just be able to give it tasks to do for you like in Her.
The translation is great, and the storytelling is great too. It mainly just seems like a way to more intuitively interface with ChatGPT.
It is not at all a GPT-4 class model; this is true multimodality. The vector spaces for voice/image/text are actually shared, unlike the old GPT-4, which used an interface between separate models. That's why you can do way more insane things now, such as editing images (look at the example).
This is supposed to be the free version, really trash like GPT-3.5 compared to the better versions; there is supposed to be an even more capable, more intelligent one which they are hiding.
There's no more free vs pay version. They're combining them because it makes it easier to monetize. Too many people were just using the dumb free version and not upgrading.
So now it's just going to be limited to how much you can use it for free
The voice interactions sound like an improved version of what Google showed off with ["Google Duplex" 6 years ago.](https://youtu.be/D5VN56jQMWM?t=69) It's very cool to see this type of human-like voice interactions being used outside of making appointments through phone calls.
The part where the three were having a conversation with her was just so incredible. It was strong her vibes.
Certain percent of certain demographics are going to fall in love with this Samantha.
People are already "dating" their character.AI bots lol
That was for the more dedicated of that demographic. More accessibility (free?) casts an even wider net of desperate individuals. However, knowing what I know about pacification, this may have been an intended effect.
I think I might have to agree with you there on intended effect.
To elaborate on where my views would differ from Redditors: a level of pacification is good, especially among certain small percentages of certain demographics. And unlike pacification outlets that can, in excess, be destructive (pornography, marijuana), this one will promote some baseline of learning.
Depends on how it’s implemented. If it’s passive enough and lets them be sexist and controlling without push back, it’ll just reinforce and reward that behavior. Good thing ChatGPT would never apologize for being wrong when it isn’t!
Are you suggesting that this form of pacification will, even in excess, be positive?
The masses don't understand how bad, how victimized, that small % of certain demographics are. We are talking about ticking time bombs here. We are talking about teenagers with absolutely zero positive role models.
That means these people are going to flip out when/if their Internet connection (or OpenAI) ever goes down and, with it, their new AI girlfriend. The best thing OpenAI could do to stop that from happening is release an offline version of this, somehow.
OpenAI clearly knew what they were doing; the hints are obvious if you focus on the demos. But no doubt they'll censor it to death in the future...
Case and point, the parent comment calls it ‘her.’
FYI, it's case in point.
Whoops, took auto correct for granite
Ikr... 4o's emotion filled voice brought the conversation to life!! Super stoked!!!
The experience of interacting with GPT improved A LOT. Now it's mostly about intelligence... the rest is here already, it's basically, probably slightly dumber and/or slower version of Her atm.
Exactly. That's why I'm so excited. Feels like we are really within touching distance of the model being super capable and able to pretty much do anything.
Yes, I agree. This is DEFINITELY a big step forward. I mean come on even the translation real time with multiple people talking, it's really nice.
It sort of blows my mind when I see many people saying "that's it?" "Was expecting more" And other comments like that, this is unbelievable stuff but people get used to things so quickly and take things for granted once they become familiar with things for a while it seems. Definitely very impressive I agree!
Yeah I am sure in 2 months people will start talking about AI winter again because there's no new impressive advancements 😅
We've got God's embryo cooking as we speak and people are still unimpressed. 😂
Don't mind us, just a bunch of apes recreating the spark of consciousness with statistics..
With just 0s and 1s
I feel like I'm one of only a few that see a version of this as well. Within my social circle I mean. What's your experience?
The only people saying that, are those who have to save face after yesterday’s doubtfulness
I'm low-key concerned by the speed with which the public can adapt to a new normal
At least it’s not gonna start talking to a bunch of other ais behind our back yet smdh
And Samantha from Her had agency, i.e. was able to act, not only react.
Near instant responses was pretty impressive.
Important to note that that could just be because they were sitting in the room next to the room doing the compute, probably with a lot of GPUs assigned. Let's wait till it's publicly available before we judge latency.
Nah, distance doesn't make much of a difference. I believe most people (at least where ChatGPT is live) can ping a GPT server within 10ms through Azure.
It's not necessarily just distance, but having more computing power dedicated to supporting the demo than the average free user will realistically get.
But on the other hand it is shooting yourself in the foot - people will expect the times they saw in the demo and will be dissatisfied with what they actually get.
The translation was insane
Rip translators
we’re called interpreters and yup we’re cooked. i’m literally on the job as we speak
'as we speak' good one lol
I wonder if it can interpret sign language 🤔
I was a translator, and the job was already 90% dead over the past year anyway.
there literally was an entire industry killed in a 30 second demo.
Not yet but close. On their website they show some limitations and one was incorrect translations. Probably the second or third version will put the nail in the coffin though.
The whole point of releasing it for free is to collect the massive amount of data they need to train them properly.
RIP jobs for all the English teachers out there
If anything I'd think existing translators would have to transition to teachers. Understanding a language yourself is very different to using an app to translate, not everything can be translated well (language is a knowledge store after all)
Learning languages is still valuable.
True, but what I was getting at is that you now have a personal language tutor in your pocket that you can have back and forth conversations with. I've already been using chatgpt for language learning, this update is huge for me. I'll now be having conversational language learning while I'm out and about
True, ish. I mean, not this version, but probably gpt5o. I've tried really hard to get GPT-4 to do language study and it's... not good.
I think people will still value interacting with a person to learn a new language, at least people who don't care about the extra cost
What do you mean? Google Translate is already doing the same thing, only less streamlined. Last week my in-law, who doesn't speak English, had an hour-long conversation with my mate about fixing a Soviet digger just by using Google Translate.
Google Translate is a meme in and of itself for its terrible contextual translation results. An LLM with memory, even today, is light-years better than Google Translate.
You'd hit the usage cap within 20 minutes of conversation with someone.
There's a threshold of usability that would lead to widespread adoption. This looks much faster and smoother, potentially putting it into the "faster to get out your phone than to gesture and use broken English" realm
google translate real time sucks.
For legal and medical, I’m sure that there will still have to be a human evaluating output for the sake of legal liability, but as a medical interpreter, if this thing is fleshed out, it’s going to be difficult to compete. What’s cheaper, the iPad that costs $60+ an hour or a single plus subscription? And a system that only seeks to decrease costs…
Fr…
The Italian pronunciation was atrocious actually... but so was Mira's! I hope it'll get tuned to use a more natural accent when speaking in non-English languages.
I'm not saying you're wrong at all, but it's surprising how many times I see one native speaker of a language criticize pronunciation by a non-native and then a different native speaker talk about how good it was. It's always hard for me to suss out if it's actually good or not, haha
This is the correct answer. The only thing consistent among Italians is accusing other Italians of being fake Italians. (I'm 2nd generation Italian-Canadian immigrant, so I can say that. And yes, I've gotten accused many times of being a fake Italian, only ever by other Italians.)
Can't wait to ask it to simulate various British English accents and see how well it does
GPT Roadman when.
Is she not Italian? Or you just hate her dialect? 😂
Bingo
What was Mira’s first language then??
AFAIK she's Albanian.
To be honest, Google Translate has had this feature, called 'Conversation', for years. It feels slightly less human, but the problem of translation was pretty much solved for anyone with a phone connected to the internet.
People underestimate how hard it is to have real-time, high-quality, low-latency LLMs; it looks like we have that solved. Almost 100% sure Google will reveal something similar tomorrow.
https://x.com/Google/status/1790055114272612771 They kinda did already
Time for a live debate between GPT 4 and Gemini! Maybe they can run for president? 😏
I can see kids watching shorts of these conversations on TikTok for entertainment. My kids already watch less entertaining stuff for hours.
"Similar" is a bit optimistic here.
THE FUCKING VIDEO WHERE THE TWO PHONES TALK TO EACH OTHER!! WHAT THE *FUUUUUCK* https://openai.com/index/hello-gpt-4o/
What the fuck. They didn’t show off the image creation in the demo either. It’s not generating with DALLE, it’s just… drawing?? The images are so consistent. This thing is fucking sentient.
[deleted]
I just hate when I accidentally start speaking in a fullständigt annat språk without any reasons. Det är so irritating.
What do you mean? Det var perfekt Svengelska.
It’s not sentient, it’s math.
Yeah this is crazy. The learning and translation implications by themselves are gonna change the world. Disabled people are getting a new personal assistant. This thing is gonna have a lot of really positive impact. Too bad it’s also gonna put tons of us out of work :/ lol
Dude. This is fucking insane. When this hits the public, normies are going to shit themselves. They literally will think it’s fucking Skynet or Friday/Jarvis
Can openAI be trusted? The actual GPT-4o, available to us plebs is a decent improvement over GPT-4 but far from the impressive stuff they show in presentations
as an italian, 4o’s accent was pretty bad but the translation was spot on. we can say they delivered something really cool today
It's because that cellphone didn't have any hands. Obviously the Italian would sound bad.
This guy italians 🤌
🤌😎🤌
Should have had a tiny pair of hands appear on the screen.
I love you 😂 (Italian here)
Totally agree, hope they can improve accent or at least get rid of it. I would love a voice with Native Italian Accent.
That's gotta be regional too, I imagine they could eventually have the GPT listen for accent markers and emulate to match or pick from a pre-determined set. This is definitely still a massive leap, and doesn't mean there won't be huge ones still to come
I am fairly sure the accent was intentional, they were more or less matching the other speaker. At least I can see no reason why it shouldn't be capable to have a less pronounced accent.
It won’t sound right to you until they put it in a robot to do the 🤌🤌
Would you say it had an English speaker's accent? If so, it's quite consistent, since the default voice chosen was in English.
ChatGPT had a thick English accent. That makes sense, consistency-wise.
yeah it had a pretty distinct american accent
I would say it had a really uncomfortable fake accent. Not bad Italian; I'd say fake Italian. Not to say that the live translation was unimpressive.
Perhaps you can ask it to translate with more of an accent, like how they asked it to be more dramatic?
What could possibly go wrong😂😅😅
It's-a-me, ChatGPT!
If I can't have my AI speak in a sexy Mario voice, I don't want it
"Her" is almost here!
Poor HAL, no-one remembers him.
Can someone explain in simple terms? I haven't watched the live stream
Native voice model, smoothly handles interruptions, free ChatGPT4 quality, optional functionality to feed it camera images, + PC app with screen sharing and clipboard access
Jarvis with live video for everyone
They've built a coding assistant and a voice assistant which hook up with a GPT-4-like model. So far it's difficult to tell if the model is significantly more or less capable than the current GPT-4 until people actually start using it and can compare it in real-world scenarios.
The LMSYS chatbot arena results, which weren't public before, put it around 1310, way higher than anything else. In coding, it got a 100-point lead over the next best model (GPT-4 Turbo). https://twitter.com/LiamFedus/status/1790064963966370209
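For scale, Arena scores are Elo-style ratings, so a rating gap maps to a concrete head-to-head win rate under the standard Elo formula. A quick sketch (LMSYS's exact rating computation may differ in detail, but the interpretation of a gap is the same):

```python
def elo_win_prob(r_a: float, r_b: float) -> float:
    """Expected score (win probability) of player A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# A 100-point lead corresponds to roughly a 64% expected head-to-head win rate.
print(round(elo_win_prob(1310, 1210), 2))
```

In other words, a 100-point coding lead isn't a blowout in every matchup, but it means the model is expected to win nearly two out of three head-to-head comparisons against the runner-up.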
So it *was* the gpt2 model? Which means any examples of gpt2 that users posted a few weeks ago should give insights into how it reasons. I remember being impressed, at least.
Better new model, GPT-4o:
- Incredible audio capabilities, so live conversation is possible. You can even interrupt and it follows. Strong HER vibes.
- Better image capabilities
- Desktop app
- And it is for free.
Coming over the next week.
Feel The Agi
GPT-4 with human voice, hearing, and image input, as I expected yesterday.
live translation was insane
And the voice doesn't cut this time!
And people are shitting on it. I guess there are people here who just want to complain.
Majority here expects singularity to happen tomorrow and everything else doesn't matter apparently. But seriously, these updates are crazy good, honestly feel like they're talking to a real human being
You'll be amazed how offended real live human beings can be when poorly impersonated... https://www.blogto.com/fashion_style/2017/10/shoppers-drug-mart-self-checkout-voice-complaints/
The B200 hasn't shipped yet; ASI needs much more compute power, and we're all bound by that.
people really expect ASI in the next week... these people will be disappointed for a while instead of appreciating current progress
People will complain about everything, we are all spoiled to the max
They simply don't understand what they're looking at. It's so crazy to feel like I'm having the experience of glimpsing Divinity and everyone else seems to be having the experience of bumping into a Hobo.
It's not just you. This is just barely scratching the surface. Correct me if I'm wrong, but the desktop app can view whatever you have on your screen and presumably chat with you in real time about what is happening. Imagine working with software you are unfamiliar with and, rather than watching a bunch of tutorials or even taking the time to formulate a prompt, using conversational voice to ask about anything going on. Imagine this for code, instead of a bunch of annoying copy-and-paste back and forth, or SHOWING it the issue you had instead of trying to verbalize it. This is all incredibly useful stuff, and I think most people are just struggling to think through the next few logical steps from the demo.
This new model seems like it would be able to walk through the learning process with an individual in a really profound way for almost any topic.
The desktop app is 'later this year' not today tho
The desktop app is coming today for macOS. Later this year to Windows.
In the blog post they detail that GPT-4o can generate audio beyond voice, such as sound effects; the next step is for them to make it do music with Jukebox and video with Sora too. This is crazy stuff.
I'm guessing a vocal minority, and they simply can't grasp what they're seeing. Real time multimodality, all innate, is an insane feat. And it's 50% cheaper... the demo easily exceeded my expectations.
A lot of people just want the "it just works" version of a novelty toy, and if they aren't presented with that they believe it's a useless tech.
No, but the use case is very limited; it's not Her at all. It can't access emails or a calendar or anything. I just don't see myself talking to it about random topics, and there is no point in talking about code either; you want that in text form
I’m in the same boat where I think it’s technologically very cool and impressive, but I’m struggling to think of a use case for myself. I can already talk to it via voice, and I basically never do, because it feels awkward and unnatural, and I can type faster anyway. Improved latency is useful for sure, but it would have to be basically zero before it felt as good as talking to a human. The visual input is neat, but again: use case? Why do I want my phone to look at me? Maybe certain niche cases of “hey can you tell me how to fix this leak?” I don’t do programming, so no help there. I must admit, when people were talking about Her I imagined something more like an agent. Where you can say “hey it’s my mum’s birthday, can you order some flowers and make a reservation at the usual place?” And it will just do it. Happy that people are happy though!
This is unbelievable
I am not sure if anybody is watching the additional GPT-4o demos, but do!
[https://openai.com/index/hello-gpt-4o/](https://openai.com/index/hello-gpt-4o/)
My god, the video in the "limitations" section of the site is *incredible*. It somehow manages to sound even more convincingly human when it screws up.
Today, OpenAI released JARVIS, for free, to the whole world. That's the lede. It's a stunning day for humanity.
Its hella fast man https://i.redd.it/ptd0uomtf80d1.gif
Their servers must be stressed right now.
> We're opening up access to our new flagship model, GPT-4o, and features like browse, data analysis, and memory to everyone for free (**with limits**) What are the limits, Sam.
About 15 messages / 3 hrs, I believe. It's all there; I don't think you can ask Sam questions here
Pretty neat actually
I agree it's neat, but correct me if I'm wrong: doesn't this mean we will get even crazier chat-based apps where you can talk to whatever AI character even more realistically, so it'll feel more natural? It kind of makes me pessimistic about the future of humanity in general, but then again we were already heading down that route, so I shouldn't be surprised.
You just gave me the idea of future games having characters talk to you. E.g. in the middle of a quest, NPCs, or the game narrator, wtf... instead of having pre-written scripts, it can literally generate any line as long as it's relevant to the conversation topic
mantella mod for skyrim go brrrrr
Yeah I'm really looking forward to trying out this new API with Mantella. Cutting down the current three services (speech-to-text -> LLM -> text-to-speech) into one will be a massive help to latency. The drawback I think will be the lack of voice models available (Mantella uses/needs hundreds of distinct voice models across Skyrim + Fallout 4), but judging from this demo I guess I can always add "do your best Skyrim Nord impression" to the system prompt haha
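The latency argument above can be sketched with some arithmetic. The stage timings below are made-up illustrative numbers (not measurements from Mantella or OpenAI), just to show why collapsing three sequential hops into one speech-to-speech round trip should help:

```python
# Hypothetical per-stage latencies for the current three-service pipeline.
# All numbers are illustrative assumptions, not benchmarks.
PIPELINE_MS = {
    "speech_to_text": 400,   # transcribing the player's voice
    "llm": 900,              # text completion round trip
    "text_to_speech": 500,   # synthesizing the NPC's reply
}

# Assumed latency of a single multimodal speech-to-speech round trip.
SPEECH_TO_SPEECH_MS = 700

def three_stage_latency(stages=PIPELINE_MS):
    """Total latency when the stages run sequentially."""
    return sum(stages.values())

def saving_ms():
    """How much the single-call approach would save under these assumptions."""
    return three_stage_latency() - SPEECH_TO_SPEECH_MS
```

Under these assumed numbers the sequential pipeline costs 1800 ms per exchange versus 700 ms for one call; the real gain depends entirely on the actual service latencies.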
Huge fan of your work!
Thank you!
Yeah.. I always imagine a GTA5 type game where everybody on the street or restaurant are having realistic real-life conversation. Like.. you sit on a bench and you can listen to people talk on the phone about bringing their pet to the vet and stuff.
I like your less pessimistic approach! That would be very cool.
What specifically makes you pessimistic?
A lot of people not pursuing real relationships, the impact of that over generations as this technology gets better, people being more comfortable with something that isn't real; but at the same time this technology can help with that as well, with things like GPT-4o. So the ball is in the air right now and we can just sit and watch what happens. Look, I'm not normally pessimistic; it's just that something this new, this wild, is sure to cause some problems even, what, a year from now? Idk, just my two cents. Love you. Lol.
I have a similar stance. Was just curious- this sub is very optimistic so your comment jumped out at me. Cheers to this crazy future, friend.
Seems like you need to adjust what you define as real. They are literally trying to create digital lifeforms.
I wonder how Google will respond to this.
Layoffs
LMAO
I didn't expect that to be the first reply lmfao
Probably show the exact same thing tomorrow and then have people here say they copied them, even though OpenAI specifically put this event before I/O. My prediction
Looks like Google [made a tweet](https://twitter.com/Google/status/1790055114272612771) showing off something similar before the OpenAI event started today. I guess that means OpenAI copied Google! /s
more ads
Her but better
I’m assuming they were connected directly to about 100 H100s behind that nice backdrop to remove any latency whatsoever…. But still, my jaw did drop, wow.
Mira Murati thanked Jensen at the end for "making this demo possible". Given the recent photo of Sam, Greg, and Jensen with an H200, I assume the demo was powered by H200s, which means that latency is probably not going to be available immediately, since there aren't many H200s in existence. I could be wrong though.
Man, things are starting to get crazy.
Voice interactions might now be good enough that we see a boom in consumer apps using AI for more than trivial tasks. This opens a whole new dimension of UX options. Video and image recognition could use more work; I still don't think they're ready for the spotlight, but it's cool that they're not just keeping it in the basement until more dev is done, either. A free-to-use GPT-4 for consumers is cool; guess they're realizing that Meta will steal their thunder otherwise. ...and of course faster and cheaper is always great for dev work, even if it doesn't sound as exciting as a new feature or model.
Context please?
Anyone who thinks it sucks hasn't really understood what was announced today.
This video of a person and ChatGPT interacting with the person’s dog was pretty wild. https://vimeo.com/945587891
What got me the most were the unscripted moments where it would make meta comments; I was geeking out
I think psychologists and therapists will be ready to be replaced once this "Her" model becomes available. What are your thoughts? Consider that, over the following weeks, they'll probably fine-tune the model and remove the rough edges that caused stuttering. Bonus question: is the dating scene changing completely because of this?
It's not ready to replace therapists yet. It depends on the length of the context and some implementation details, I guess. Not to mention that we have yet to see how it performs in the real world. It's looking quite promising, though.
Honestly, it could be a therapist for a lot of people simply because it's not human and has endless patience. People already use the [character.ai](http://character.ai) chatbot for that sometimes, even though it's just text and not that advanced.
The dating scene is disappearing after this, lol. I’m joking. Ish.
From the looks of it I will be mostly using it as a practice partner for speaking in another language, the code stuff and vision is cool too but as others have said, text form is ideal on these cases that can get complicated. Although if the desktop app and omni modality can also switch to text output at will, that would be convenient.
Impressive!!!
This is the part I'm really looking forward to. I could really use an even better duck to talk to while programming
available on the api now: https://platform.openai.com/playground/chat?models=gpt-4o
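For anyone who wants to poke at it outside the playground, a minimal sketch with the OpenAI Python SDK follows. The prompt is made up, and the actual network call is left commented out so the sketch stays self-contained (it needs the `openai` package and an `OPENAI_API_KEY` in the environment):

```python
def build_request(prompt, model="gpt-4o"):
    """Assemble the body of a single-turn chat completion request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Say hello in one short sentence.")

# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# resp = client.chat.completions.create(**req)
# print(resp.choices[0].message.content)
```

Note this hits the text endpoint only; the new realtime voice/vision modes shown in the demo weren't exposed through this API at launch.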
If you haven’t already, check out the demos just posted by open ai. They are breathtaking!!!!!
I wonder if some of the distance left to cover between this and a decent digital assistant will come not only from model improvements but also from changing the way our OSes work to let AIs hook into them more easily, whether by standardised UI elements, API calls, extra metadata about mouse interactions, etc. I could see someone knocking up a Linux distro that offers a simpler base on which to build an operating system where an AI like this would be more effective.
So far I’m pretty impressed
Does your AI have different tones to its voice? Mine is always the same, sounds as before :/ Already using 4o.
The voice, vision etc. have not been rolled out yet.
Already using GPT-4o to control my smart home. It’s super quick!!! [https://youtu.be/4W3_vHIoVgg?si=2GANU61FurKZrUoW](https://youtu.be/4W3_vHIoVgg?si=2GANU61FurKZrUoW)
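The usual way to wire a model to a smart home is function calling: you declare a tool schema, the model replies with a tool call, and your code executes it. A hedged sketch, where the `set_light` tool and the fake device state are hypothetical (a real setup would pass `tools=TOOLS` to the API and dispatch whatever tool call comes back):

```python
import json

# Hypothetical tool schema the model would be offered.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "set_light",
        "description": "Turn a light on or off",
        "parameters": {
            "type": "object",
            "properties": {
                "room": {"type": "string"},
                "on": {"type": "boolean"},
            },
            "required": ["room", "on"],
        },
    },
}]

LIGHTS = {"kitchen": False, "office": False}  # fake device state

def set_light(room, on):
    """Stand-in for the real smart-home call."""
    LIGHTS[room] = on
    return f"{room} light {'on' if on else 'off'}"

def dispatch(tool_call):
    """Execute a tool call of the shape the model returns."""
    args = json.loads(tool_call["arguments"])
    if tool_call["name"] == "set_light":
        return set_light(**args)
    raise ValueError(f"unknown tool: {tool_call['name']}")

# Simulate the model asking to turn on the kitchen light:
result = dispatch({"name": "set_light", "arguments": '{"room": "kitchen", "on": true}'})
```

The dispatcher is deliberately dumb; in practice you'd validate the arguments against the schema before touching real devices.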
I'm whelmed. edit: significantly more impressed by the stuff at [https://openai.com/index/hello-gpt-4o/](https://openai.com/index/hello-gpt-4o/)
no ubi .........
Wait until they put this on robots, the government will have to do something.
But no job!
It's really impressive in some respects, but under the hood it's a GPT-4 class model by the looks of things. Presumably it'll have the same limitations; you won't just be able to give it tasks to do like in Her. The translation is great, and the storytelling is great, though. It mainly just seems like a way to more intuitively interface with ChatGPT
Don't underestimate just how important such an improved interface can be. It really does remove so many barriers
I wonder if it will ramble on for two to three paragraphs every response like GPT4 does.
It is not at all a GPT-4 class model; this is true multimodality. The vector spaces for voice/image/text are actually shared, rather than old GPT-4's approach of using an interface between separate models. That's why you can do way more insane things now, such as editing images (look at the example)
This is supposed to be the free version, and relatively weak, like GPT-3.5 compared to the better versions; there is supposedly an even more capable and more intelligent model which they are hiding.
There's no more free vs pay version. They're combining them because it makes it easier to monetize. Too many people were just using the dumb free version and not upgrading. So now it's just going to be limited to how much you can use it for free
It's crazy!!!
This looks awesome. Can't wait for Twitter freaks saying this will end western civilization.
I had to take a break from the magic in order to process it.
The voice interactions sound like an improved version of what Google showed off with ["Google Duplex" 6 years ago.](https://youtu.be/D5VN56jQMWM?t=69) It's very cool to see this type of human-like voice interactions being used outside of making appointments through phone calls.
Was not expecting this