Lights, Camera, AI! (Or: AI-generated video for social media)
Hello, dear readers and fellow video fans! Welcome back to Dear AI, the blog about Human-AI collaboration starring Gabrielle.Day - our sassy AI-powered advice columnist. In today's episode, we dive into the fast-moving world of AI-powered video creation!
There have been astonishing advances in video creation lately. Incidentally, we needed a video for social media to introduce Gabrielle to the world. So why not give her a glamorous AI-powered video where she can talk about herself?
All we had was one photo of Gabrielle. And in a few hours, we transformed that single headshot into a jaw-dropping video ready to go viral! It was a team effort that spanned scriptwriting, image generation, speech synthesis, music composition, and video editing - all with the help of some excellent AI tools.
Spoilers ahead!
So check out the video here first:
Now, settle back and enjoy this exclusive look behind the scenes!
1) Script with ChatGPT
The idea for the video started when we built Gabrielle's website (see last week's post). The website opens with Gabrielle's black and white headshot and a few lines where she introduces herself. ChatGPT helped write that introduction, which became the script for today's video.
ChatGPT suggested that we be candid about Gabrielle being an AI. I agreed but also wanted to preserve a sense of mystery and curiosity. So I asked it to help craft an ambiguous message that would only gradually reveal that she is artificial.
It made sense for Gabrielle, as an advice columnist, to have gone through a lot to gain valuable experience. We had already decided she would be in her fifties. We thought about heartbreak, jail, and other challenges. Then, through our conversations about AI, we realized she was an AI model. As is the way of the (AI) world, older models get replaced by newer ones - and suddenly, Gabrielle had a back story. A few more details fit Gabrielle's human-like personality and her AI existence alike, such as her training and working remotely (or virtually). Eventually, a script was born!
In the video, we would have Gabrielle tell this life story in her own voice as she ages from a young, hip model into an experienced and caring counselor. We just hoped we could do it!
2) Aging a photograph with Playground AI's image editing
We mentioned Playground AI before - it's a fantastic and free AI image-generation tool we used for the images on Gabrielle's website. One of its cool features is image editing, which lets you take a photo and describe in words how you want to change it. So, for example, you can ask, "Turn the water into wine," "Add dragons in the sky," etc. So we took a photo of 50-year-old Gabrielle and tried imagining how she would look at 20 or 70.
The initial prompt was "Make the woman Gabrielle 20 years old," but the AI model struggled with this wording. The editing model doesn't understand language as well as the newest text-to-image models. So, for example, it focused on the "old" part of the "20 years old" prompt, and gave this result:
This is where some human creativity with language came in handy! First, I tried other ways to describe it - like "the woman Gabrielle at 20 years young." That was ok, but eventually, using a specific age worked best - "Gabrielle at age 23". The exact age didn't matter, but it did pick up on the fact that 33 is older than 23.
The biggest strength of the editing model, though, is that it kept the faces very consistent! They mostly all faced in the same direction and had the same hairstyle, matching facial expressions, and jewelry. Although we threw away some images that looked too different and didn't make the final cut, this was a huge success and a great start to the day!
3) Changing Gabrielle's outfit with outpainting and inpainting
We now had Gabrielle's images at different ages. But they were all square-shaped, and Gabrielle is anything but square! We knew we had to go vertical if we wanted this video to trend on social media.
So we used Playground AI's canvas tool to expand the images - a technique called outpainting. You select a frame extending beyond your original image, and the AI fills the empty space with new content that matches your prompt.
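For the curious, here is a rough sketch of the arithmetic behind "going vertical." It is not part of Playground AI's tool. It assumes a 9:16 target (a common ratio for vertical social video), a 512x512 starting image, and that the canvas grows downward to make room for the outfit. The function name `vertical_canvas` is made up for this example.

```python
def vertical_canvas(width, height, ratio_w=9, ratio_h=16):
    """Return (canvas_width, canvas_height, extra_rows) for a portrait canvas.

    Keeps the original width and extends the height until the canvas
    reaches the ratio_w:ratio_h aspect ratio (9:16 is an assumption here).
    """
    canvas_height = round(width * ratio_h / ratio_w)
    if canvas_height < height:
        raise ValueError("image is already taller than the target ratio")
    # extra_rows is the empty strip that outpainting would need to fill
    return width, canvas_height, canvas_height - height


# A hypothetical 512x512 square headshot
print(vertical_canvas(512, 512))  # (512, 910, 398)
```

In other words, a square image needs almost as much new picture below it as it already has, which is why the prompt for that empty space matters so much.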
We extended young Gabrielle's photo with a few clicks to showcase her outfit and style. But that's when we stumbled upon one of the biggest challenges of working with AI: bias in results.
About 80% of the generated outfits were too revealing. While AI companies try to reduce bias, there's a way to go. One way to get around this with an existing model is to change the prompt. In our case, it took heavy-handed language such as "The businesswoman Gabrielle wearing conservative clothing" to get decent results.
But wait, there's more! With multiple images of Gabrielle at various ages and one outfit image, we faced the next challenge - making the outfit image fit seamlessly with all the different faces. Typically, this would require a skilled Photoshop wizard to retouch the seams. But with AI inpainting, it's a piece of cake. Inpainting means selecting a part inside the image and re-generating it. You can also change the text prompt to influence what gets inpainted. And whether you change it or not, the repainted area will match seamlessly with the rest of the image. So, it was easy to repaint the seams and make them vanish. Check out the before and after - it's magic!
We then put the images to the side and went to work on the vocals.
4) Generating Gabrielle's voice with Eleven Labs
To bring Gabrielle's story to life with spoken words, we turned to Eleven Labs, an AI startup that turns text into incredibly realistic speech. Their VoiceLab lets you design AI-generated voices or clone your voice from samples.
The options when designing a voice are limited - you can choose gender, age, and accent and adjust a few sliders. Then the rest is trial and error. So we tried different voices until one came up that sounded like Gabrielle.
Next, we wanted to age the voice. The hope was that if we created some younger and some older voices, a few would match. But even after burning through half of the monthly generation credits, the younger and older voices didn't sound close to Gabrielle's main voice.
Instead, we resorted to a low-tech solution of 'aging' Gabrielle's voice by changing the pitch up or down. We used a free online tool to do it, and it came close enough.
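We don't know exactly how the online tool works under the hood, but the simplest way to change pitch is to resample the audio. Here is a minimal sketch of that idea, assuming the clip is already loaded as a plain list of sample values. The function name `shift_pitch` is made up for this example. Note that this simple approach also changes the clip's length; dedicated pitch tools usually add a time-stretching step to keep the duration the same.

```python
import math


def shift_pitch(samples, semitones):
    """Resample so playback at the original rate sounds `semitones` higher.

    Positive values raise the pitch (and shorten the clip); negative
    values lower it (and lengthen the clip).
    """
    factor = 2 ** (semitones / 12)  # frequency ratio for the shift
    new_length = int(len(samples) / factor)
    shifted = []
    for i in range(new_length):
        pos = i * factor  # fractional position in the source clip
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Linear interpolation between the two nearest source samples
        shifted.append(samples[left] * (1 - frac) + samples[right] * frac)
    return shifted


# One second of a 220 Hz test tone at an assumed 8000 samples per second
tone = [math.sin(2 * math.pi * 220 * t / 8000) for t in range(8000)]
younger = shift_pitch(tone, 2)   # two semitones up
older = shift_pitch(tone, -2)    # two semitones down
print(len(younger), len(older))
```

A shift of a couple of semitones in either direction is only an illustrative guess, not the setting we used.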
With Gabrielle's voice-over ready, we needed the right background music to accompany her emotional journey.
5) Background music with AIVA
Quite a few AI music generators have popped up recently. AIVA is a more veteran company that's been around for several years and is one of my favorites. AIVA is also recognized as a virtual composer and has released several instrumental music albums since 2016. What I like most about it is the easy-to-use interface that produces high-quality music at the press of a button.
We generated a piano solo in a minor key to serve as soft background music for Gabrielle's life story. Similarly to voice design, music generation is often trial and error. That's because most AI music generators are at an earlier stage, without precise controls or the ability to iterate on the music. Luckily, with the background track for Gabrielle, the piano solo came out as an excellent fit on the first shot.
6) Animating Gabrielle's face with D-ID
Animating a person to speak a specific script sounds complicated. Getting the subtle facial movements in sync with the voice is no easy feat. A recent WSJ article showed the journalist visiting the studios at a company called Synthesia to clone herself into a talking, moving video avatar. Besides requiring an in-person studio visit for the recording and several days of processing, this route is also expensive and typically costs $1,000+.
But there is also a much easier and more accessible alternative with D-ID. The company creates lifelike avatars for online videos - from training to corporate communications. One of their remarkable features is that you can upload and animate any face. D-ID's video creator can then animate it in a few minutes with head movements, blinking, and mouthing of words. You can start from a written script and use one of D-ID's voices to read it aloud or upload a spoken track. In both cases, D-ID's AI animates the face to match.
We already had Gabrielle's spoken voice clips and images of Gabrielle at different ages. So we just had to upload the correct image with the right audio clip and let it run! That gave us about ten clips of Gabrielle speaking parts of the script at different ages. Fantastic!
7) Putting it all together - Runway ML
Runway ML is one of the startups on the frontier of generative AI. They worked with Stability AI on image generation models and have released several text-to-video generative models. Besides their contributions to generative AI, they built a fantastic and easy-to-use AI-powered video editor and several tools for AI video creation.
We uploaded all the clips to Runway and organized them in order... but the individual clips didn't immediately fit. For example, 23-year-old Gabrielle would end her sentence facing left, and 27-year-old Gabrielle came in facing right. Or the eyes were half-closed at the end of the first clip and wide open at the start of the next. A professional video editor could figure out how to make the transitions work. But of course, we are always curious to see how AI would do it.
We turned to a neat tool called FILM (Frame Interpolation for Large Motion), or Frame Interpolation in Runway's implementation. This is based on an AI model that takes a series of images and identifies objects that move between those images. It then creates an animated sequence that smooths out that motion. In our case, we used it to make quick, smooth transitions between the individual videos. For example, taking the last image from the age 23 video and the first image from the age 27 video, the model created a 1-second clip that smoothly tied the two pictures together. Neat trick! And one that a future video-editing AI could replicate automatically.
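To give a feel for what "in-between frames" means, here is a deliberately simple sketch. It is not FILM: FILM is a learned model that tracks motion, while this is just a plain cross-dissolve that blends pixel values. It assumes frames are small grids of grayscale values, and the function name `cross_fade` is made up for this example.

```python
def cross_fade(first, last, steps):
    """Return `steps` in-between frames blending `first` into `last`.

    Frames are lists of rows of grayscale pixel values (0-255).
    This is a linear cross-dissolve, not motion-aware interpolation.
    """
    frames = []
    for k in range(1, steps + 1):
        t = k / (steps + 1)  # blend weight, strictly between 0 and 1
        frames.append([
            [round((1 - t) * a + t * b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, last)
        ])
    return frames


# Two tiny hypothetical 2x2 frames: the end of one clip, the start of the next
end_of_clip = [[0, 0], [0, 0]]
start_of_next = [[100, 200], [40, 80]]
for frame in cross_fade(end_of_clip, start_of_next, 3):
    print(frame)
```

A cross-dissolve like this makes a turning head look like a ghostly double exposure. A motion-aware model actually moves the face from one pose to the other, which is why the result looked so natural.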
We have yet to try Runway's generative video models, but they look promising for future AI social media posts!
8) Final touch - captions by Instagram
Instagram's caption feature for reels was just what we needed. It uses AI to transcribe the audio of your reel automatically and generates animated captions that match. Super easy to use and a great accessibility feature!
It didn't get all the words right, but there's an easy way to correct any mistakes or even add emojis. It worked well even though the video already had the background music from AIVA mixed in - it may have worked even better if we had captioned the voice-over first and added the music afterward.
Voila! From idea to a good-looking video in just a few hours of work! All thanks to powerful AI tools that made the creative process flow and turned vague plans into reality.
Conclusion
Are you prepared for the age of artificial intelligence in social media? You might think it sounds like a nightmare. But, on the other hand, isn't social media already artificial? So what's the harm in adding some intelligence?
Either way, we wanted Gabrielle to be ready. And we were glad to find all these fantastic free AI software tools that made this amazing video come to life.
This mini-project of making the video was a lot of fun! It was another example of a project where I had a general idea of what I wanted but no clue how to achieve it, and AI helped overcome all the obstacles. I came away from this short filmmaking experience with a few thoughts:
Video is another field where having AI on your team gives you access to superpowers: from speech and background music to generating and animating images to editing.
The creative flow comes from the collaboration: It's not just about asking AI to do specific pieces. Instead, the magic comes from the back-and-forth process of iterating on those pieces and combining them in interesting ways.
It's unbelievable how easy this already is. Gabrielle is entirely fabricated, not at all famous, and a sweetheart. But it is just as easy to make a real-looking video of any politician, religious figure, or other influencer saying anything with a clone of their voice. We'll talk more about AI ethics in a future post. But one early thought is this: the more awareness we can spread about how easy it is to create such convincing illusions, the more prepared we can be.
That’s it for today’s post! Thank you for following along and allowing us to share this journey with you.
What do you think about these AI tools? Awesome, or scary? Let us know in the comments!
Remember to email dear@gabrielle.day anytime for good advice from unexpected places, or write me at itai@gabrielle.day if there's anything specific you'd like to ask or suggest.
Liked this post? Share with others who might like it too!