It took me a few minutes to create a fake me giving a fake lecture.
Ethan: Thanks for working through this practical example.
The freakiest bit is your demonstration of how little money - and even less effort - was required for you to produce this very decent example.
It ain't just that generative AI works.
It is suddenly instantly accessible and rapidly becoming ubiquitous.
Now all I need is for the AI with my cloned voice to be able to change how it sounds. I'm an author and a trans woman. I launched my last book on Kickstarter where best practice dictates that I create a video.
When someone can see my face, they see a woman. But when they hear my voice, they hear a man. I would really love a tool that could change that.
Thanks for the tutorial! It took me maybe 20 minutes to complete a video that...if you saw it via Zoom and weren’t paying close attention...you’d likely be convinced. 🤯😳
I’m excited to share this with my MCAD students. It’s so clear this technology needs to be exploited to create art, to create joy. Then we can see it as enabling and not threatening.
I appreciate your wonderful efforts researching and sharing insights in this space.
Imho, the larger picture for such developments is that this is an example of bad engineering. That is, pushing such technology forward as fast as we can fails to take into account that human beings as a species can't adapt to such changes as fast as the changes can be made.
As an example, consider a carmaker who proudly designs a car that can go 500 mph while forgetting that very few, if any, human beings can control a car at that speed. Bad engineering.
These are fun toys for sure, but is it worth it to destroy our confidence in the video medium? When everything can be convincingly faked, how are we supposed to believe in anything?
This is moving quite fast. Hopefully we can all get our own pet AI that will watch out for us in what is bound to be an exponentially treacherous digital environment.
Brings to mind the often quoted: “the future is already here, just not evenly distributed”.
In this case, the future is already here, we just haven’t figured it out yet.
Praise be! I no longer have to spend hours narrating my own audio books! Thank you for compiling this!
I feel like this is such a pivotal moment in human history. Glad I’ve got your writing to navigate it.
Where did ChatGPT get the info about you? Do you have a public page or something like LinkedIn?
Just to understand...
We. Are. Doomed.
The weaponization of AI - destabilization of governments, politics, radicalization, white-collar crime (especially targeting the elderly), pornography (targeting an ex, the “girl next door”, HS students, teachers, etc.), extortion.
So, on a completely minor note…
Re the last audio clip: “It is also worth noting that if I used one of the default AI voices provided by ElevenLabs, the AI-reading of the script would sound even better and more emotional. Here is an example:”
This, to me, sounded extremely fake - like the difference between a poorly engineered CD and an analog master vinyl LP.
I thought the first AI was much more realistic. But it will obviously be a very short time before a “pure AI voice” is indistinguishable from that master analog vinyl LP.
I just commented to someone about this, this morning. There's this story, about an accused murderer's alibi "unraveling" because the prosecutors have audio of him at the crime scene: https://nypost.com/2023/02/09/alex-murdaugh-alibi-unraveling-in-courtroom-source/ . But the software for faking one's voice is, as you note, commoditized, and *really* good. I don't think most lawyers, judges or juries understand any of this. Approximately *all* video, photographic, and audio evidence *cannot* be trusted and ought not be admissible in a court of law.
Something I've been pondering for over a year now: If we cannot trust any photo or video or even audio, how do we verify anything?
Example: Salacious footage airs on the nightly news of a politician soliciting a prostitute. The politician claims the footage was created by an AI. He could be telling the truth, or he could be lying. I think the answer lies with cameras themselves acquiring the capability to embed a real-time cryptographic signature, including location, into any image they create. Like metadata, it would act as a digital tamper-proof seal. Essentially, the end user would be able to verify whether the digital image or video had ever been processed in any way, and to what extent.
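A minimal sketch of what such a tamper-proof seal could look like, in Python. This uses a symmetric HMAC purely for illustration; a real camera (e.g. under the C2PA content-provenance standard) would sign with an asymmetric key held in secure hardware, and every name here is hypothetical:

```python
import hashlib
import hmac
import json

# Hypothetical device-unique secret. A real camera would instead use a
# private key stored in a secure hardware element, so verification would
# not require sharing the secret.
CAMERA_KEY = b"device-unique-secret"

def sign_capture(image_bytes: bytes, lat: float, lon: float) -> dict:
    """Bundle the image hash with capture metadata and sign the bundle."""
    payload = {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "timestamp": 1700000000,  # fixed here for reproducibility
        "location": [lat, lon],
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    payload["signature"] = hmac.new(CAMERA_KEY, blob, hashlib.sha256).hexdigest()
    return payload

def verify_capture(image_bytes: bytes, record: dict) -> bool:
    """Recompute the signature; any edit to pixels or metadata breaks it."""
    claimed = record["signature"]
    payload = {k: v for k, v in record.items() if k != "signature"}
    # First check that the pixels still match the signed hash...
    if payload["sha256"] != hashlib.sha256(image_bytes).hexdigest():
        return False
    # ...then that the metadata bundle itself was not altered.
    blob = json.dumps(payload, sort_keys=True).encode()
    expected = hmac.new(CAMERA_KEY, blob, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, claimed)

original = b"\x89PNG fake image bytes"
record = sign_capture(original, 44.97, -93.27)
print(verify_capture(original, record))              # True
print(verify_capture(original + b"edited", record))  # False
```

The key property the comment is after: editing either the image bytes or the embedded location/timestamp invalidates the signature, so "has this been processed?" becomes a mechanical check rather than a judgment call.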
Well, the mouth-movement synchronisation to the actual spoken script in the "AI" version is totally off. Only the most credulous would believe this was a real video.
Maybe this would improve with a better facial model but it's clearly way off at the moment.
Remember when we created ‘the bomb’ and thought that the ‘bad actors’ somehow wouldn’t abuse it? Yeah, it’s like that, people. I think we know how this will play out. No one will trust ANYTHING any longer.
You are a man.