A quick and sobering guide to cloning yourself
It took me a few minutes to create a fake me giving a fake lecture.
I think a lot of people do not realize how rapidly the multiple strands of generative AI (audio, text, images, and video) are advancing, and what that means for the future.
With a photograph and 60 seconds of audio, you can now create a deepfake of yourself in a matter of minutes by combining a few cheap AI tools. I've tried it myself, and the results are mind-blowing, even if they're not completely convincing. Just a few months ago, this was impossible. Now, it's a reality.
To start, you should probably watch the short video of Virtual Me and Real Me giving the same talk about entrepreneurship. Nothing about the Virtual Me part of the video is real; even the script was completely AI-generated.
I want to give you the instructions on how to do it yourself. This isn’t some secret: lots of people are already creating these sorts of videos. But it is worth seeing how easy it is.
To figure out what Virtual Me should say, I turned to our old friend ChatGPT, and, with no prompt-crafting or effort to revise it, simply requested: "What would Professor Ethan Mollick say about startups? Write a script in first person. Include a brief introduction about him."
The answer, as I think we all would expect by now, was surprisingly good. There were no real hallucinations; it got my title and history right, and the stuff that ChatGPT Fake Ethan Mollick talked about, while not deep, was not wrong:
Hello everyone, my name is Professor Ethan Mollick, and I am a professor of management at the Wharton School of the University of Pennsylvania. I have been studying startups and entrepreneurship for over a decade and have some thoughts on the subject that I would like to share with you today.
When it comes to startups, my first piece of advice is to focus on solving a real problem for customers. Many entrepreneurs get excited about a new idea or technology, but it's important to make sure that there is actually a market for what you are offering. It's not enough to just have a cool idea - you need to have customers who are willing to pay for it.
Another important thing to keep in mind is that startups are a long-term game. Building a successful company takes time and effort, and there will be ups and downs along the way. You need to have a growth mindset, be persistent, and be willing to pivot if necessary.
It's also important to surround yourself with the right team. As the saying goes, "A startup is a team sport." You need to have a group of people who complement each other's skills and can work well together to achieve common goals.
Finally, I would say that startups are all about learning. You are going to make mistakes, and that's okay. The important thing is to learn from those mistakes and use that learning to make better decisions in the future.
There are now a lot of services that can create realistic virtual voices. Many of these can create voices that mimic anybody with high accuracy. For my experiment, I used a service called ElevenLabs, which cost me only $5. I uploaded 60 seconds of me talking about random things, and waited a few seconds. It generated a “clone” of my speech. Now I had a virtual voice that would say anything I typed.
I fed it the script. Here is the AI reading it, and me reading it, just for comparison. Again, it is solid - the Virtual Me even takes breaths and pauses - but it (hopefully) won’t fool anyone yet.
Real Me reading the fake script:
Fake Me reading the fake script (I used a lower quality microphone for the sample, so it has a realistic background sound):
It is also worth noting that if I had used one of the default AI voices provided by ElevenLabs, the AI reading of the script would have sounded even better and more emotional. Here is an example:
As you might be starting to suspect, there is a growing number of services that will create a video of you from just a script and a single photograph. I used D-ID, which costs $5.99 a month. To make this work, I uploaded a single photo and the audio I generated above. After two minutes, I got the video I linked to at the start of this post. It is hopefully still obviously a fake, but the technology is improving rapidly.
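If it helps to see the whole workflow in one place, here is a minimal Python sketch of the three steps described above. It calls no real service or API; it only records which tool takes which inputs and what it produces, and adds up the prices mentioned in this post. The names `Step`, `PIPELINE`, and `total_stated_cost_cents` are made up for illustration, and the ChatGPT cost is left blank because no price for it is given here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One stage of the workflow (a hypothetical record, not a real API)."""
    tool: str
    inputs: Tuple[str, ...]
    output: str
    cost_cents: Optional[int]  # None means no price is stated in the post


# The three stages as described in the post, in order.
PIPELINE = (
    Step("ChatGPT", ("one-line prompt",), "script", None),
    Step("ElevenLabs", ("voice sample", "script"), "cloned-voice audio", 500),
    # The D-ID price is quoted per month, so 599 is one month of service.
    Step("D-ID", ("single photo", "cloned-voice audio"), "talking-head video", 599),
)


def total_stated_cost_cents(steps) -> int:
    """Sum only the prices the post actually states, in integer cents."""
    return sum(s.cost_cents for s in steps if s.cost_cents is not None)


if __name__ == "__main__":
    for s in PIPELINE:
        print(f"{s.tool}: {' + '.join(s.inputs)} -> {s.output}")
    print(f"Stated cost: ${total_stated_cost_cents(PIPELINE) / 100:.2f}")
```

The point of writing it down this way is how short the chain is: each step consumes the previous step's output plus one small piece of raw material, and the stated prices add up to about eleven dollars.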
This is just the start
These tools have all been released over the past few months. Over the next year, it will become easy to create and edit videos with text prompts alone. (This is already trivial with still images: you can modify images with words in Playground right now. As you can see, I added a hat to my image.)
I don’t have any deep insights into what all of this means.
The bad news, or at least some of it, is immediately obvious. You probably shouldn’t trust any video or audio recording ever again. There are some good use cases for this as well: realistic AI-run avatars could serve as customer support agents, personal tutors, and more. Hopefully, the positive uses will outweigh the negative, but our world is changing rapidly, and the consequences are likely to be huge.