Adobe VoCo

By: Nick Mirra


All you have in this life is your word. The human voice serves as the carrier for our words, thoughts, and feelings. Each of us is endowed with a unique voice, which allows us to be identified among a group.[1] Our voice is our vocal fingerprint. Every word that leaves our lips carries this exclusive mark, identifying the words as our own.[2] Because the uniqueness of voice is implicitly understood by all humans, our words have become intertwined with our identity. As a result of this connection between voice and identity, voice recordings are readily introduced into evidence, where they relay information in a case through a party’s own words.

Technology has complicated the reliability of vocal identification. For example, Alexander Graham Bell’s invention of the telephone changed the use of vocal evidence in court.[3] Since the advent of the telephone, testimony based on voice recognition has grown more complicated, because vocal communication became possible over long distances with relative clarity of voice. Even though the correspondents may be miles apart, they can communicate with each other effectively.

The next hurdle for vocal evidence since the telephone looms on the horizon. What if the proponent of a piece of evidence could introduce a recording that was clearly the voice of an opponent when, in reality, the opponent was not the one speaking at all? Further, what if the opponent were convinced that the voice was in fact their own, yet hotly contested ever having said the words on the recording? A new software program under development allows the user to put words in your mouth. Through this program, your own unique and identifiable voice becomes a marionette bending to the will of the puppeteer.

When Adobe unveiled its Project VoCo software in a live demonstration in November 2016, it shocked the audience.[4] On a stage in front of spectators, an Adobe representative showed the true power of the company’s newest technology.[5] VoCo is software that enables the user to make a computer say anything the user types into it.[6] The program is not mere text-to-speech conversion software. VoCo can take typed text and convert it into human speech in the voice of anyone whose recording the user has on file.[7] It can take a recording of a voice and change one or more words in a spoken sentence, or even create entirely new sentences.[8] More specifically, VoCo works from a 20-minute audio sample of a speaker, and anything the user types after that is read back by the program in the speaker’s own voice.[9] Essentially, the software is Photoshop for the human voice.[10] As the software evolves, the length of the voice sample it requires will shrink, and manipulating another’s voice will become increasingly simple.[11]

The courts will soon be faced with this software, which will shake the principles of earwitness evidence. Practitioners need to be aware of Project VoCo so that they can respond competently to falsified evidence. Such falsification will be hard to detect, but when a party insists that a recording contains words they never spoke, VoCo offers a plausible explanation for how someone put those unfavorable words in their mouth.




[1] See Sophie Scott, Why Do Human Voices Sound the Way They Do, BBC (Dec. 1, 2009).

[2] See Gilbert v. Cal., 388 U.S. 263, 266 (1967).

[3] See, e.g., F.M. English, Annotation, Admissibility of Sound Recordings in Evidence, 71 A.L.R.2d 1024 (enumerating instances where telephone calls and voice recordings appear in American Law Reports).

[4] See Adobe Creative Cloud, #VoCo. Adobe MAX 2016 (Sneak Peeks), YouTube (Nov. 4, 2016).

[5] See id.

[6] See id.

[7] See Nick Statt, Adobe is Working on an Audio App that Lets You Add Words Someone Never Said, The Verge (Nov. 3, 2016).

[8] See id.

[9] See id.

[10] See id.

[11] See id.
