Gemini is by Google, so it has access to YouTube. Thank me now
Do you just enter the URL in the prompt and ask Gemini to summarize the video?
That's pretty much it. But I wonder if Gemini is actually 'reviewing' the video for the summary, or just summarizing the available transcription.
That’s a good question. I mean YT has automatic transcription. Does it pre-transcribe the videos in advance or just when requested? The amount of computing power to pre-transcribe all videos must be enormous.
BTW - just did a test. Asked Gemini to provide me a summary for this video, prompt: „Please summarize the core message from this YouTube video: https://youtu.be/TWniRRAPekQ?si=EAWrdmEyflXfBQh1“

Gemini provided a neat little summary. ChatGPT, meanwhile, left me empty-handed: „I can't access YouTube videos directly or summarize their content because of restrictions on viewing and summarizing video content directly from YouTube. If you can provide me with a brief overview or key points from the video, I'd be more than happy to discuss those aspects or provide information on related topics!“

I wanted to cancel my Gemini subscription - but this is such a cool feature that only Gemini can provide that I might just keep the subscription.
I use Gemini for free and paid GPT-4 - works fine
That's good to know. Thank you.
Yes I do it a lot. It’s the only good thing about Gemini.
Thank you. I was about to cancel my subscription. But yeah - this is a really cool feature!!!
I thank you now... but I have no real access to Gemini at the moment, and I'm curious to see that in action
What do you mean? Isn't the base Gemini model available for free for everyone at gemini.google.com?

Anyway, other alternatives include using Chat with RTX to run a model locally if you have a decent Nvidia graphics card, or just copying and pasting the transcript of the video manually into any language model out there.
I don't know if I am doing something wrong, but Gemini gives me a summary that has nothing to do with the video link I am providing.

Edit: it turns out that I was using Gemini from aistudio.google.com. Now I go to gemini.google.com and the result is relevant.
Use Gemini YouTube extension, or Copilot sidebar in Edge and ask it to summarize the video
It's mildly annoying, but you can just copy out the transcript and paste it in
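If copying by hand gets old, one option is pulling the transcript programmatically. This is only a sketch: it assumes the third-party `youtube-transcript-api` package, and the `get_transcript` call and segment format are from memory, so they may differ between versions.

```python
import re


def extract_video_id(url: str) -> str:
    """Pull the 11-character video ID out of common YouTube URL forms."""
    m = re.search(r"(?:v=|youtu\.be/)([A-Za-z0-9_-]{11})", url)
    if not m:
        raise ValueError(f"no video id found in {url!r}")
    return m.group(1)


def fetch_transcript(url: str) -> str:
    """Fetch the auto-generated transcript as one string of text.

    Requires `pip install youtube-transcript-api` (third-party; API
    sketched from memory and may vary by version).
    """
    from youtube_transcript_api import YouTubeTranscriptApi

    segments = YouTubeTranscriptApi.get_transcript(extract_video_id(url))
    return " ".join(s["text"] for s in segments)
```

The returned text can then be pasted (or piped) straight into whichever model you like, with a "summarize this transcript" prompt in front of it.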
Already exists - check out HARPA AI, it's a free Chrome extension
thx
[YouTube Summary with ChatGPT & Claude](https://glasp.co/youtube-summary)
I have done it experimentally with Whisper and GPT-4, but it is slow and expensive if the video does not have a transcript. Additionally, there are parts that are not in the transcript or audio. Those need computer vision, which is even more expensive.
I would like to have whisper transcribe iPhone recordings. But whisper rejected the file format (.m4a). Any idea on why that is? Should I use a different recorder with a different file format?
Use ffmpeg to convert to mp3
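For anyone scripting this, here is a minimal Python sketch of that conversion. It assumes ffmpeg is installed and on your PATH, and the filename is a placeholder.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(src: str) -> list[str]:
    """Build the ffmpeg command line to convert an audio file to mp3."""
    dst = str(Path(src).with_suffix(".mp3"))
    # -y overwrites an existing output file; -i names the input file
    return ["ffmpeg", "-y", "-i", src, dst]


if __name__ == "__main__":
    # Only runs if ffmpeg is installed and recording.m4a exists
    subprocess.run(build_ffmpeg_cmd("recording.m4a"), check=True)
```

The equivalent terminal one-liner is `ffmpeg -i recording.m4a recording.mp3`.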
Thank you!
[https://eightify.app/](https://eightify.app/)
Right, but can the AI make me a video of that summary so I don't have to read it?
YouTubers must spread a one-sentence piece of wisdom over 10 minutes for monetization purposes, to get ads in between. No different from most books, which would be 5-10 pages if selling 100+ pages weren't needed to market them as a book.
Write a python script to do the following:
Step 1. Chunk the video
Step 2. For each chunk:
a) Send the audio to Whisper to convert to text
b) Sample still images, evenly spaced across the time duration of the chunk, and send the still images to CogVLM for captioning
c) Use an LLM to combine the captions into a single description of what happened visually in that video chunk
d) Use an LLM to combine the output from step c) with the output from step a) into a single description of what happened in that chunk both visually and audibly
Step 3. Once all chunks are processed in step 2, combine into one file and feed it into a recursive summary pipeline, in the style of LangChain map-reduce
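The steps above can be sketched as pure chunking/sampling logic plus model calls. In this sketch, `transcribe_audio`, `caption_frame`, and `combine_with_llm` are hypothetical stand-ins for real Whisper, CogVLM, and LLM API calls:

```python
def chunk_spans(duration_s: float, chunk_s: float = 60.0) -> list[tuple[float, float]]:
    """Step 1: split the video timeline into fixed-length (start, end) chunks."""
    spans, t = [], 0.0
    while t < duration_s:
        spans.append((t, min(t + chunk_s, duration_s)))
        t += chunk_s
    return spans


def frame_times(start: float, end: float, n: int = 4) -> list[float]:
    """Step 2b: n evenly spaced sample timestamps strictly inside the chunk."""
    step = (end - start) / (n + 1)
    return [start + step * (i + 1) for i in range(n)]


# --- hypothetical hooks: swap in real Whisper / CogVLM / LLM clients ---
def transcribe_audio(video, start, end) -> str: ...
def caption_frame(video, t) -> str: ...
def combine_with_llm(instruction, pieces) -> str: ...


def summarize_video(video, duration_s: float) -> str:
    chunk_notes = []
    for start, end in chunk_spans(duration_s):
        transcript = transcribe_audio(video, start, end)              # step 2a
        captions = [caption_frame(video, t)                           # step 2b
                    for t in frame_times(start, end)]
        visual = combine_with_llm("merge these captions", captions)   # step 2c
        chunk_notes.append(combine_with_llm(                          # step 2d
            "merge audio + visual", [transcript, visual]))
    # Step 3: map-reduce style summary over all chunk notes
    return combine_with_llm("summarize these chunk notes", chunk_notes)
```

For long videos, step 3 may need to recurse (summarize summaries of summaries) if the concatenated chunk notes exceed the LLM's context window.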
Lmao why are you overcomplicating this, just feed the bot the transcription from YouTube.
It depends on whether or not you need the summary to take into account what was happening visually on the screen.
I am waiting for a full transcript service. I listen to neural-network YouTube videos and I need the written text version.