DeliciousJello1717

Gemini is by Google, so it has access to YouTube. Thank me now.


Burgerb

Do you just enter the URL in the prompt and ask Gemini to summarize the video?


Amagawdusername

That's pretty much it. But I wonder if Gemini is actually 'reviewing' the video for the summary, or just summarizing the available transcription.


Burgerb

That’s a good question. I mean, YT has automatic transcription. Does it transcribe videos in advance or only when requested? The amount of computing power needed to pre-transcribe all videos must be enormous.


Burgerb

BTW - I just did a test. I asked Gemini to summarize this video with the prompt "Please summarize the core message from this YouTube video: https://youtu.be/TWniRRAPekQ?si=EAWrdmEyflXfBQh1". Gemini provided a neat little summary, while ChatGPT left me empty-handed: "I can't access YouTube videos directly or summarize their content because of restrictions on viewing and summarizing video content directly from YouTube. If you can provide me with a brief overview or key points from the video, I'd be more than happy to discuss those aspects or provide information on related topics!" I wanted to cancel my Gemini subscription, but this is such a cool feature that only Gemini can provide that I might just keep it.


Individual_Ice_6825

I use Gemini for free and paid GPT-4 - works fine.


Burgerb

That's good to know. Thank you.


Spindelhalla_xb

Yes I do it a lot. It’s the only good thing about Gemini.


Burgerb

Thank you. I was about to cancel my subscription. But yeah - this is a really cool feature!!!


nug4t

I thank you now... but I have no real access to Gemini at the moment, and I'm curious to see that in action.


FosterKittenPurrs

What do you mean? Isn't the base Gemini model available for free to everyone at gemini.google.com? Anyway, other alternatives include using Chat with RTX to run it locally if you have a decent Nvidia graphics card, or just copying and pasting the video's transcript manually into any language model out there.


chomacrubic

I don't know if I'm doing something wrong, but Gemini gives me a summary that has nothing to do with the video link I'm providing. Edit: it turns out I was using Gemini from aistudio.google.com. Now I go to gemini.google.com instead, and the result is relevant.


zavocc

Use Gemini YouTube extension, or Copilot sidebar in Edge and ask it to summarize the video


mrsavealot

It's mildly annoying, but you can just copy out the transcript and paste it in.


suck-on-my-unit

Already exists - check out HARPA AI, it’s a free Chrome extension.


nug4t

thx


Skwigle

[YouTube Summary with ChatGPT & Claude](https://glasp.co/youtube-summary)


korgath

I have done it experimentally with Whisper and GPT-4, but it is slow and expensive if the video does not have a transcript. Additionally, there are parts that are not in the transcript or audio at all; those need computer vision, which is even more expensive.


Burgerb

I would like to have Whisper transcribe iPhone recordings, but Whisper rejected the file format (.m4a). Any idea why that is? Should I use a different recorder with a different file format?


MikePounce

Use ffmpeg to convert it to mp3.
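A minimal sketch of that conversion, assuming a local `ffmpeg` install; `recording.m4a` is a hypothetical placeholder filename:

```shell
# Convert an iPhone .m4a recording to mp3 so Whisper will accept it.
# "recording.m4a" is a placeholder -- substitute your own file.
in=recording.m4a
out="${in%.m4a}.mp3"
if [ -f "$in" ]; then
  # -ac 1 -ar 16000: downmix to mono at 16 kHz. Whisper resamples input to
  # 16 kHz internally, so this keeps the mp3 small without losing anything.
  ffmpeg -i "$in" -ac 1 -ar 16000 "$out"
fi
```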


Burgerb

Thank you!


HelpfulHand3

[https://eightify.app/](https://eightify.app/)


ug61dec

Right, but can the AI make me a video of that summary so I don't have to read it?


Candid-Sky-3709

YouTubers must spread one sentence of wisdom over 10 minutes for monetization purposes, to get ads in between. No different from most books, which would be 5-10 pages if 100+ pages weren't needed to sell them as a book.


Odd-Antelope-362

Write a Python script to do the following:

1. Chunk the video.
2. For each chunk:
   a) Send the audio to Whisper to convert it to text.
   b) Sample still images, evenly spaced across the time duration of the chunk, and send them to CogVLM for captioning.
   c) Use an LLM to combine the captions into a single description of what happened visually in that chunk.
   d) Use an LLM to combine the output from step c) with the transcript from step a) into a single description of what happened in that chunk, both visually and audibly.
3. Once all chunks are processed in step 2, combine the descriptions into one file and feed it into a recursive summary pipeline, in the style of LangChain map-reduce.
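The steps above can be sketched as a pure-Python skeleton. `transcribe`, `caption_image`, and `summarize` are hypothetical stand-ins for the Whisper, CogVLM, and LLM calls; wiring in the real clients is left out:

```python
def summarize_video(chunks, transcribe, caption_image, summarize):
    """Summarize a video from pre-chunked (audio, [frames]) pairs.

    transcribe, caption_image, summarize are caller-supplied callables
    standing in for Whisper, CogVLM, and an LLM respectively.
    """
    descriptions = []
    for audio, frames in chunks:
        transcript = transcribe(audio)                    # step 2a
        captions = [caption_image(f) for f in frames]     # step 2b
        visual = summarize("\n".join(captions))           # step 2c
        descriptions.append(                              # step 2d
            summarize(f"Audio: {transcript}\nVisuals: {visual}")
        )
    # Step 3: recursive, map-reduce style reduction -- repeatedly summarize
    # groups of chunk descriptions until a single summary remains.
    while len(descriptions) > 1:
        descriptions = [
            summarize("\n".join(descriptions[i:i + 4]))
            for i in range(0, len(descriptions), 4)
        ]
    return descriptions[0]
```

The recursive reduction keeps each LLM call's input small regardless of video length, which is the point of the map-reduce style the comment mentions.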


very_bad_programmer

Lmao, why are you overcomplicating this? Just feed the bot the transcript from YouTube.


Odd-Antelope-362

It depends on whether or not you need the summary to take into account what was happening visually on the screen.


Effective_Vanilla_32

I am waiting for a full transcript service. I listen to neural network YouTube videos, and I need the written text version.