I created my talking AI twin. Here’s how to make your own.

Subscribe to 2049 for weekly analysis, insights and resources to help you deepen your understanding of the technologies, innovations and ideas shaping the future.

Hi friends 👋

Welcome to Year 2049! Today I'm sharing an experience I recently had that blew my mind.

Hope you enjoy and try it yourself. If you do, send me the results!

It feels like being in the Wild West of AI. Every day my inbox and social media feeds are flooded with the latest AI tools that can apparently do amazing things.

I wanted to put them to the ultimate test: can they help me create an “AI twin” that looks and talks like me?

It turned out to be much easier than expected. I’ll show you the result and then explain what I did.

Pretty impressive. It took less than 2 hours and cost me $40, although you can recreate a decent version of it for free.

To create my AI twin, I had to do a few things:

  1. Create an AI avatar that looks like me.

  2. Create a synthetic version of my voice that reads any text I give it.

  3. Combine AI avatar and audio to create my talking AI twin.

First, I created an AI avatar that looks like me using Midjourney. For best results, I used the latest version of Midjourney (v5) on their $10/month plan. Even if you don’t want to pay, you can still use the free credits they give you to create a good avatar.

Note: If you have never used Midjourney, here is a Quick Start Guide on how to create an account and use the tool.

I uploaded a picture of myself to one of the #newbies channels so Midjourney could use it as a reference. Let’s call it the reference picture for the rest of this tutorial.

I recommend you upload a clear image of yourself looking into the camera.

Then click on the image, right-click it, and select “Copy link”.

Type “/imagine” in the chat box and press Enter to start writing your prompt:

The first thing you need to do is paste your reference picture URL to tell Midjourney to use it as a guide.

Now you need to write a complete prompt describing the image you want Midjourney to generate. This step requires a lot of trial and error, so don’t expect to get a perfect result on your first try. It took me 6 or 7 generations to get a look I was happy with.

The full prompt I ended up using:

(reference image URL) sitting at a desk, wearing a plain black t-shirt, white wall and bookshelf in the background, looking into the camera, facing the camera, holding a microphone, cinematic lighting, unreal engine, photorealistic 

Of the four images I got, I found the first one looked the most like me (and a hint of Messi? 🧐).

AI Fawzi (made with Midjourney)

There’s plenty of room to be creative here! Here’s what Midjourney gave me when I asked it to turn me into a Pixar character.
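If you want to iterate on styles quickly, a tiny script can assemble prompt variations for you to paste into Discord. This is just a sketch: the URL and descriptor lists below are placeholders, not my actual reference picture.

```python
# Assemble Midjourney /imagine prompts: reference image URL first,
# followed by comma-separated descriptors. The URL is a placeholder.
REFERENCE_URL = "https://example.com/my-reference-photo.png"

base = [
    "sitting at a desk",
    "looking into the camera",
    "photorealistic",
]

# Styles to try, one prompt per style (examples, not a fixed list)
styles = ["cinematic lighting", "Pixar character", "watercolor portrait"]

def build_prompt(url, descriptors):
    """Join the reference URL and descriptors into one prompt string."""
    return url + " " + ", ".join(descriptors)

for style in styles:
    print(build_prompt(REFERENCE_URL, base + [style]))
```

Paste each printed line after typing “/imagine” in Discord and compare the results.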

Once you have an AI avatar that you are happy with, proceed to the second step.

Alternative methods:

  • If you don’t want to use Midjourney, you can use other tools like DALL·E 2 or Lensa.

  • If you don’t want to upload images of your face for this experiment, you can search for existing avatars/AI characters on Lexica. Make sure the character you pick is facing the camera and not at an angle. This will be important for the last step.

What’s an avatar without a voice?

For this step, I wanted to create a synthetic version of my voice that could read any text I gave it and still sound like me. I discovered Descript’s Overdub feature, which allowed me to do this.

The tool allows you to create a “voice model” of yourself based on the audio recordings you provide to it.

Note: This step requires you to create a Descript account and download their desktop app.

In the Descript desktop app, click “Create New Voice” in the “Voice” menu:

Next, you’ll need to provide at least 10 minutes of audio recordings that Descript can use to create your voice model. You have two options here:

  1. Drag any existing audio recordings of yourself onto the page.

  2. Record yourself reading Descript’s training script.

I used the second option and only recorded myself reading the script for 10 minutes. Descript recommends 30 minutes of audio to train the voice model, but mine still sounded pretty accurate.

Once you’ve uploaded your audio recordings to Descript, click “Submit Training Data” in the top right.

After submission, Descript may take up to 24 hours to generate your voice model. Mine took about 12 hours.

Once ready, go to the “Recent Projects” menu and create a new project:

This will create a new document where you can type any text and have your voice model read it. For this experiment, I wanted my AI twin to introduce itself as if I were meeting it in person.

I used ChatGPT to get a script for a short introduction:

I pasted this intro into Descript:

Then I set the speaker to Fawzi:

You’ll need to give Descript a minute or two to generate the audio from your voice model. Expect to be a little freaked out, like I was.

Note: Descript’s free plan limits the words you can use in your script to this 1,000-word list and replaces any other word with “jibber jabber”. Because I wanted the best results, I temporarily upgraded to their $30/month plan.

When the audio is ready, you can export it by clicking on “Publish” then “Export”:

Once the audio file is downloaded, go to Step 3.

Alternative methods: Someone on Instagram recommended ElevenLabs. I’ve never used the tool and can’t speak to its quality, but I’m including it in case you need an alternative.
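Since I haven't tried ElevenLabs myself, take this with a grain of salt too: it exposes a text-to-speech HTTP API, and a minimal sketch of a request might look like the following. The endpoint path and field names are my assumptions based on their public docs, and the voice ID and key are placeholders.

```python
import json

API_BASE = "https://api.elevenlabs.io/v1"  # assumption: ElevenLabs API base URL

def build_tts_request(voice_id, text, api_key):
    """Build the URL, headers, and JSON body for an ElevenLabs
    text-to-speech call (field names assumed from their docs)."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    body = {"text": text}
    return url, headers, body

url, headers, body = build_tts_request(
    "my-voice-id", "Hi, I'm an AI twin.", "YOUR_API_KEY"
)
print(url)
print(json.dumps(body))

# To actually send it (needs the third-party `requests` package and a real key):
# import requests
# audio = requests.post(url, headers=headers, json=body).content
```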

The last step was surprisingly the easiest.

Now that you have your AI avatar and synthetic audio recording of yourself, you’re ready to bring your AI twin to life!

For this step, I used D-ID on their free plan.

After creating an account, go to “Create a video” and upload your AI avatar under “Choose a presenter”:

Then, in the right menu, switch from “Script” to “Audio” and upload the audio recording you exported from Descript:

Click on “Generate a video” and… you are done! The video will appear in your video library after a few minutes.
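If you'd rather script this step than click through the web app, D-ID also offers a REST API. Here's a sketch of what the request body might look like; the endpoint and field names are assumptions from D-ID's docs, and both URLs are placeholders for your hosted avatar image and Descript audio export.

```python
import json

TALKS_ENDPOINT = "https://api.d-id.com/talks"  # assumption: D-ID REST endpoint

def build_talk_request(avatar_url, audio_url):
    """Body for a D-ID 'talk' that animates an avatar image with an
    uploaded audio track (field names assumed from D-ID's docs)."""
    return {
        "source_url": avatar_url,  # your AI avatar image
        "script": {"type": "audio", "audio_url": audio_url},  # Descript export
    }

body = build_talk_request(
    "https://example.com/ai-avatar.png",
    "https://example.com/descript-voice.mp3",
)
print(json.dumps(body, indent=2))
# You would POST this to TALKS_ENDPOINT with your D-ID API key,
# then poll the returned talk ID until the video is ready.
```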

Congratulations, you have officially created your AI twin that looks and sounds like you.

I’m currently working on a fun video that combines all the talking avatars I’ve created, and I think you’ll love it. You’ll see it soon on my Instagram or TikTok.

I was surprised how easy it was to do. I find it a little scary but also amazing.

If there’s a lesson to be learned from this, it’s that you should take audio/video content with a grain of salt, especially content involving public figures and influencers. Technologies like these can be fun for creative experiments, but they can also be used in malicious ways.

So if you come across a video or audio recording of a celebrity or politician saying something crazy or inflammatory, remember that these can be easily faked now.

I’d love to see what you create, so DM me on Instagram or leave a comment to tell me about it.


I plan to make and share more fun experiments and tutorials with you. Subscribe to Year 2049 below if you don’t want to miss them:

The future is too exciting to keep to yourself.

Share this message in your group chats with your friends, family and colleagues.


How did you like this edition of Year 2049?

Boring | All right | Great

Not yet subscribed? Subscribe for free here
Email me at fawzi@year2049.com for any questions or comments.
Follow me on Instagram for more content.
