Will voice overs ever be replaced by Artificial Intelligence (AI)? That question seems to be on the minds of many colleagues.
Of course the answer is YES and NO. Don’t kid yourself. We’re already replaced by AI. Just go to any automatic supermarket checkout, or watch some idiotic YouTube videos.
AI can do the easy, predictable jobs that don’t require any degree of voice ACTING. But when it comes to a nuanced performance that requires the talent to emote, AI still falls short.
A colleague recently stated:
“One thing people don’t seem to be getting is that it’s not about the AI being able to interpret emotional cues from a script. You’re right, it can’t do that. And it’ll be a long time before it can. BUT- it doesn’t have to. Because the human who will be using the program likely CAN do that. And once these programs have appropriately large and varied sample libraries, they’ll be able to direct the program to create a performance that WILL get those things.”
THE OPERATIC COMPARISON
Technology moves fast, but as far as I know there’s no AI “instrument” that’s sophisticated enough to replace an opera singer singing the title role in Verdi’s “La Traviata.” That’s the comparison we should make. Here’s one of my favorite sopranos, Lisette Oropesa. The poor woman had to die twice on stage because the audience begged for an encore.
Now be totally honest. Do you ever see robots replacing a vocal performance as beautiful and vulnerable as this one?
The best audio book narrators are our equivalent to opera singers. In this whole AI discussion, I refuse to be compared to some simple sampled piano. Remember that even the best sounding digital pianos need to be played by humans, and even then it’s a very different instrument, acoustically speaking. It will always be an imitation. A fake.
Let’s say, for argument’s sake, that a client chooses a machine over a human being. As my colleague said, we’d still need a real person to interpret emotional cues. That person needs to be paid, so how much cheaper and faster will the artificial recording actually be? When a director asks me to give ten different takes on a tag line, I’ll do it rapid-fire. Do you see stupid software doing that?
Take a look at the demo of this “Voices of Gaia” software. It’s astonishing what it can do, but a lot of human manipulation is needed to make it sound less artificial. Hence all the knobs and faders in the interface.
The maker of this software has added a telling disclaimer to the video. This software is meant for “underscoring, reinforcement, and ambient arrangements.” In other words: it should be used for backgrounds. Not for solos.
In 2019, EastWest came out with Voices of Opera, sampling the voice of a soprano and a tenor. While the untrained ear may be fooled at first, it doesn’t take much musicality to realize that these sounds are as fake as margarine. No matter how well you can make them “sing,” they will always sound uninspired. You cannot program passion into a lifeless object.
All these sampled voices and instruments are great tools for budding composers who want to get a better idea of what their compositions might sound like. They can be used to create the illusion of a background choir, but they will never replace a singer with the emotional depth of Lisette Oropesa. And it’s that depth of vocal storytelling that sets a living, breathing human being apart from even the most expensive software.
TEXT TO SPEECH
Some of you may claim that speaking is less complex than singing. To quote Gershwin: “It ain’t necessarily so.” A classically trained singer will focus heavily on vowels (source). Intelligibility is often sacrificed for aesthetic reasons. In professional narration, intelligibility of the message is one of the main goals. That’s why a professional speaker will emphasize consonants as well as vowels. Speaking is just not as intense as singing, but it can certainly be as emotionally demanding and draining.
I guess one can build certain rules of inflection into text-to-speech software, but that’s always going to be: if this, then that. The computer-generated response is going to be predictable (read: boring).
Individual human emotions are unpredictable, sometimes subtle, and different each time. They are going to be imperfect and surprising. That’s what makes them so beautiful and unique.
The best compliment a narrator can receive from an author is this:
“S/he made my novel come alive. S/he put his/her heart and soul into my novel.” That, a computer can never do, because it has no heart and no soul. It is DEAD.
ETHICAL ARGUMENT
Even if it were possible for us to create software that is able to faithfully create or recreate human speech in all its subtleties and intensity of emotion, do we really want to go there?
You may have seen some astonishing deep fakes specifically created to manipulate impressionable people. So, when discussing the impact of AI, we shouldn’t only be talking about the possibility of us poor voice overs perhaps losing the most boring jobs to a software program. We really should be thinking about the meaning of truth in our society.
If these text-to-speech programs evolve even further, and we can no longer distinguish fact from fiction, and a lie from reality, whom or what can we trust? Do we really want to get to a place where a North Korean dictator can appear to have fired nuclear missiles heading for the United States? It’s already happening! The same software that allows those who lost their voice to cancer to speak can help produce a deep fake Vladimir Putin talking about starting a war.
Technological advances are unstoppable, you may argue, but that does not turn us into hopeless, helpless victims. As a society we can decide how far we wish to take it. Just because we can, doesn’t mean we must. Yes, it is totally possible to edit human DNA, but we deem certain interventions and applications to be off limits because of ethical concerns.
Ideally, technology is supposed to make life easier. Not more complicated and life-threatening. It’s supposed to serve us instead of the other way around. We are not powerless spectators either. At the supermarket we choose to either go to the automatic checkout that is taking away jobs, or go to a real cashier who is earning a living. Our clients face the same choice.
When thinking about Artificial Intelligence and the effect it might have on our line of work, there are no easy answers. So, how can you possibly prepare for what lies ahead?
FUTURE PROOFING
Even though I strongly believe Artificial Intelligence will never replace ALL voice over work, some of what we do has already been taken over by sophisticated text-to-speech software.
It’s usually the type of work that does not require the talent to express emotional highs and lows. It’s the boring and repetitive stuff that doesn’t ask for a lot of acting skills. So, if you have a knack for flat, neutral reads, your job may already be on the line.
To futureproof your career against AI, you have to specialize in areas computers can’t handle.
New York Times tech columnist Kevin Roose has written a new book, “Futureproof: 9 Rules For Humans In The Age Of Automation.”
According to Roose, the following jobs are futureproof:
Surprising jobs are safe.
The more irregularities your job requires (surprises), the harder it is for AI to do. For instance, a computer-simulated voice can read a repetitive text at the supermarket checkout (“please take your receipt”), but it can’t teach kindergarten.
Social jobs are safe.
Those are jobs that involve making people FEEL things, rather than manufacturing things. Roose is talking about professions that require soft skills. Think of jobs in healthcare, social services, and interpersonal education. Think of therapists, ministers, but also flight attendants, baristas and yes… actors.
Scarce work is safe.
By “scarce” Roose means jobs that require a rare combination of skills and people who are experts in their field. He often uses the guy who prepares his taxes as an example. In this day and age of TurboTax, you’d think accountants would be the first to be replaced by AI software. But his accountant Russ, who used to be a stand-up comedian, brings his sense of humor to the job and makes doing the taxes a lot of fun. Good luck finding accounting software that actually makes you laugh!
I’d like to add a fourth category to this list:
Jobs that require creativity and imagination are safe.
Computers are good at replication and repetition (if A, then B), but they cannot think outside the box. They can’t take heartache and spin it into gold.
Kevin Roose:
“Rather than trying to compete with machines, we should be trying to improve our human skills; the kinds of things only people can do. Things involving compassion, critical thinking, humor, and moral courage. And when we do our jobs, we should be doing them as humanely as possible. That means: putting more of yourself in your work. Focus on things that reveal your personality.”
Ever since Henry Ford developed the assembly line, automation has given us more and more of the same. But those who offer something truly unique, like the team at Gotham Garage, know they will get top dollar for making vehicles that are anything but cookie-cutter. It goes to show that people with special skills who love what they do are more valuable!
So, BE AN ORIGINAL!
The rest is already taken.
ONE OF A KIND
I’m going to leave you with one final thought. It has to do with kindness.
I don’t know if you share my experience, but when I’m on social media I often notice how rude people can be (especially to folks they don’t even know). I think the online experience creates distance, and depersonalizes every interaction. We can’t see people’s facial expressions, so we have no idea how our words land. Add to that the fact that some people hide behind fake personas, so they can’t be held accountable for their nasty comments.
We need to be nicer to one another. More caring, more understanding and compassionate.
Kindness is not something you can program into a computer. A computer doesn’t care.
The future belongs to the emotionally intelligent people who do.
Moira Tait says
This is a different take on other blogs about AI and voiceovers and I loved your final comment. It hits the nail on the head.
Paul Strikwerda says
Thank you, Moira. If I said the same things other people have already said, this blog would be redundant. I’ve never been good at carpentry, but I’m glad you think I hit the nail on the head.
I’m currently working on creating a “virtual Paul” that will work WITH me and secure the jobs that someone really wants an AI voice for. You can borrow him. May not match the accent, but at least the name is the same?
Virtually the same should be good enough for most people. Let me know when you have your AI assistant. I might want to borrow him.
Excellent post, Paul. “Just because we can doesn’t mean we should.” So true.
I’m still doing a lot of work that AI could do, like video training for the armed services, but the client wanted to have a real voice to humanize the robotic copy. It got me a repeat booking, too.
In the pandemic lockdown, I have been delinquent about staying in touch and up to date with your posts (and those of far too many other friends as well). But I have been grateful that in our business, we can still work from home while others are locked out of their jobs; it gives me an even greater appreciation for what we do.
A last thought (for now): a recent New Yorker cartoon shows a bear leaving a group of other bears and returning to its cave, saying, “I’m going back into hibernation until the first wave of partying is over.” I feel as though that wave may be wrapping up, at least in certain regions, and I look forward to re-entry and interaction with “real people,” not “Zoom people.” However, my social skills have become rusty and I wonder what the new reality will feel like once we’re out there in it.
Stay happy and healthy, my friend!
Thanks for taking the time to read my blog, Paul. I, too, long for the time when we can be truly social again and enjoy life to the fullest!
Insert big and real smile here (something a computer can’t do and an emoji can never replace).
How do I know this was REALLY Paul Strikwerda who wrote this blog… hmmm….
OH YEAH! It has HEART. 🙂
That was so nice of you to say! I heart your comment.
True to your usual form … inspiringly fresh and authentically clear.
You’ve summed up what we need to strive for, Paul. The brave warriors will fight on victoriously in the game of AI.
Well expressed!
Thanks, Ramesh! Let the TTS software take care of the boring jobs. There will be plenty of interesting work left.
The problem is that many businesses these days don’t want the opera singer because she costs too much, and they place no value on her performances in any case. Far too often I see Mr. and Mrs. Joe Public accept mediocrity as the benchmark. Until the general public gets bored and dissatisfied with below-par performances, these big players in AI are going to continue to carve a chunk out of the voice over industry…
I don’t think it’s going to be a chunk you and I will be interested in. I’ll gladly leave it to the Fiverr and VoiceBunny crowd.