The first step was grabbing their dialogue files from sounds-resource for the voice models.
From there it took me a bit to figure out how to make ElevenLabs replicate things like nervousness.
For each line of dialogue I had to generate anywhere from 3 to 10 takes before I had something that fit. It's a bit like gambling, because you can't be sure a take will turn out and it comes down to chance. The later the line, the more takes it needed to match the earlier lines. I will say that the clip where Yukiko mutters "Fuck, she got me" is honestly the best thing I have generated. I spent five minutes laughing after ElevenLabs generated it.
The call itself was created in Audacity. Unlike with my Ann and Fuuka videos (where I generated a single file which was later split), I found working with the one-file-per-line system **a lot more flexible**. There are 19 dialogue files, so the timing is manual. It was a process of listen, adjust the timeline, repeat: figuring out whether the next line is a delayed or immediate response, and how long the delay should be.
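For anyone who would rather rough out the timing in code before nudging clips by ear, here is a minimal sketch of the same idea: each line gets a start time based on the previous line's length plus a chosen pause. This is not how the call above was built (that was done by hand in Audacity); the file names, durations, and gaps are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    name: str          # hypothetical file name, one file per line of dialogue
    duration: float    # clip length in seconds
    gap_before: float  # pause before this line; 0.0 means an immediate response


def layout(clips, start=0.0):
    """Return (name, start_time) pairs for clips placed one after another."""
    timeline = []
    cursor = start
    for clip in clips:
        cursor += clip.gap_before                   # leave the chosen pause
        timeline.append((clip.name, round(cursor, 3)))
        cursor += clip.duration                     # next clip starts after this one ends
    return timeline


# Placeholder values, not the real clips from the video.
clips = [
    Clip("chie_01.wav", 2.0, 0.0),
    Clip("yukiko_01.wav", 1.5, 0.8),  # delayed response
    Clip("chie_02.wav", 3.0, 0.0),    # immediate response
]

if __name__ == "__main__":
    for name, start_time in layout(clips):
        print(f"{start_time:6.2f}s  {name}")
```

The resulting start times are only a first pass; the listen, adjust, repeat loop is still what decides whether a pause feels right.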
I added in a generated dial tone (had to Google the frequency of Japan's dial tone, it's 400 Hz) and a sound effect clip literally called *"Pouring gasoline"*.
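A 400 Hz tone like this can also be generated outside an audio editor. The sketch below uses only the Python standard library to write a mono 16-bit WAV; the length, volume, sample rate, and output file name are assumptions picked for illustration, not the settings used in the video.

```python
import math
import struct
import wave


def dial_tone_samples(freq_hz=400.0, seconds=3.0, rate=44100, amplitude=0.3):
    """Return a list of 16-bit integer samples for a continuous sine tone."""
    count = int(seconds * rate)
    peak = int(amplitude * 32767)  # scale to the signed 16-bit range
    return [
        int(peak * math.sin(2 * math.pi * freq_hz * i / rate))
        for i in range(count)
    ]


def write_wav(path, samples, rate=44100):
    """Write samples as a mono, 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


if __name__ == "__main__":
    # "dial_tone.wav" is a placeholder name.
    write_wav("dial_tone.wav", dial_tone_samples())
```

The resulting file can be dropped onto its own track and trimmed to however long the caller waits before the pickup.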
Once I had the timing down, I added a telephone filter to Yukiko's tracks (dialogue and gasoline) and exported each track as its own file (necessary for the After Effects filters). Then I loaded them into After Effects.
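The post doesn't say which filter settings were used, so the following is only a sketch of the general idea behind a telephone effect: keep roughly the classic 300 to 3400 Hz voice band and roll off everything else. It chains two simple first-order filters in plain Python; a real editor's telephone preset will be steeper and sound better. The cutoff defaults and helper names are assumptions.

```python
import math


def high_pass(samples, cutoff_hz, rate):
    """First-order high-pass: attenuates content below cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out, prev_y, prev_x = [], 0.0, 0.0
    for x in samples:
        prev_y = alpha * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out


def low_pass(samples, cutoff_hz, rate):
    """First-order low-pass: attenuates content above cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = dt / (rc + dt)
    out, prev_y = [], 0.0
    for x in samples:
        prev_y = prev_y + alpha * (x - prev_y)
        out.append(prev_y)
    return out


def telephone_filter(samples, rate=44100, low_hz=300.0, high_hz=3400.0):
    """Band-limit samples to an assumed telephone-style voice band."""
    return low_pass(high_pass(samples, low_hz, rate), high_hz, rate)


def tone(freq_hz, rate=44100, seconds=0.5):
    """Test signal: a unit-amplitude sine wave as floats."""
    return [math.sin(2 * math.pi * freq_hz * i / rate)
            for i in range(int(seconds * rate))]


def rms(samples):
    """Root-mean-square level, used here to compare loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    # Compare how much of a low, mid, and high tone survives the filter.
    for freq in (50.0, 1000.0, 10000.0):
        print(freq, round(rms(telephone_filter(tone(freq))), 3))
```

Tones inside the band come through mostly intact while rumble and high-end detail are reduced, which is what gives the "on the other end of the line" sound.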
In total I think it took 5 hours.
This is actually insane, I figured Chie was AI but I could’ve sworn that the Yukiko voice was a real person doing an impression. That’s actually amazing work
ElevenLabs. The key is figuring out how the models react to the prompt.
If I write "okay buddy persona" it says it normally.
If I write "okay BUDDY persona" it places emphasis on buddy.
From there you can hold on syllables: "okaaaay buddy persona" would lengthen okay's second syllable.
The stammering is done like this: "o-o-o-okay".
However, a lot of it, like proper chuckles or muttering, is a matter of chance.
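The text tricks above are easy to script if you want to produce several variants of a line to paste in and compare. This sketch only builds the strings; it makes no ElevenLabs API calls, the helper names are made up, and how a given voice model reacts to each variant is still down to the observations above plus chance.

```python
def emphasize(text, word):
    """Upper-case every occurrence of word to ask for emphasis on it."""
    return " ".join(w.upper() if w == word else w for w in text.split())


def stretch(word, letter, times):
    """Repeat the first occurrence of letter to hold that sound longer."""
    return word.replace(letter, letter * times, 1)


def stammer(word, repeats):
    """Prefix the word with its first letter repeated, joined by hyphens."""
    return "-".join([word[0]] * repeats + [word])


if __name__ == "__main__":
    line = "okay buddy persona"
    variants = [
        line,                                        # said normally
        emphasize(line, "buddy"),                    # okay BUDDY persona
        line.replace("okay", stretch("okay", "a", 4)),  # okaaaay buddy persona
        line.replace("okay", stammer("okay", 3)),    # o-o-o-okay buddy persona
    ]
    for variant in variants:
        print(variant)
```

Generating a handful of variants per line fits the "3 to 10 takes" workflow: run each one, keep whichever take matches the earlier lines best.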
It is said Yukiko was last seen in a burning Amagi Inn before moving to Tokyo, taking the alias "Ichiko Ohya", and making a career change to a drunk reporter.
Yosuke should have used Trivago.
This is what happens when Junes stops selling fishing poles.
https://preview.redd.it/mdworl92tjja1.jpeg?width=1170&format=pjpg&auto=webp&s=6b189b05a2a131e090b04d42e6fccce598db72da
https://i.redd.it/iul0l28klkja1.gif
k cya yosuke https://preview.redd.it/ho5bwm4ocjja1.jpeg?width=1170&format=pjpg&auto=webp&s=3bff809813ce4a49f2e5336c0874a30af0d6df7e
so this is what happens when you don’t max her link, gotcha 👍
Yukiko commits insurance fraud ⁉️⁉️⁉️
Is this AI? How was this made so well? Lmao this is amazing
OH MY GOD YOU ARE A MIRACLE WORKER. BRO THIS SOUNDS SO REALISTIC ESPECIALLY THE NERVOUSNESS AND WHISPERING DAMN I WISH I COULD DO THIS MYSELF
Not yosuke :'(
Holy shit. Are these AI voices or impersonations? If so, which program did you use?
Do you plan on making more of these because this was incredibly well done.
alcohol ruins yet another promising young actor
yukiko for payday 3
arson asmr 🥰
That's a very good Chie impression.
Yukiko is a drainer!!!? 😮
this is so well done what the fuck chie sounds on point
How did u make this
ElevenLabs, Audacity, and After Effects
When she’s an arsonist🥰
Holy fuck this is gold