Putting Words in Your Mouth: The Evidentiary Impact of Emerging Voice Editing Software

By: Nicholas Mirra, Esq.*

Cite as: Nicholas Mirra, Putting Words in Your Mouth: The Evidentiary Impact of Emerging Voice Editing Software, 25 Rich. J.L. & Tech., no. 1, 2018.

I. Introduction 

[1]       All you have in this life is your word. The human voice serves as the carrier for our words, thoughts, and feelings; each of us is imparted with a unique voice that allows us to be identified amongst a group.[1] Our voice is our vocal fingerprint.[2] Every word that departs from our lips carries an exclusive trademark which identifies those words as belonging to an individual.[3] Because individuality of voice is a phenomenon implicitly understood by all humans, our words have become intertwined with our identity.[4] As a result of this interconnection between voice and identity, voice recordings and identification have become essential to the legal process.[5]

[2]       In today’s technologically advancing world, evidence can be effortlessly manipulated in more ways than were imaginable even a few years ago.[6] There is little debate that, as a whole, strides in technology make the human experience more convenient and productive.[7] With every new advance in human ingenuity comes a reciprocal: a set of new problems that had never been at issue before the new invention.[8] Few areas of civilization are as susceptible to both the benefits and the hurdles of new technologies as the law.[9]

[3]       Law is uniquely situated in a position where it must play both ends. Although the field of law has felt the positive impacts of technological advancement, it is also open to being manipulated by it.[10] In order to prepare for new technologies, courts must consider how they can be used to provide novel forms of evidence, or conversely, how the new technology may threaten existing and well-established forms of evidence.[11] One type of evidence that owes its genesis to technological innovation is the voice recording, which dates to 1878.[12] The invention of recording gave practitioners an opportunity to capture the human voice for later reproduction.[13] When voice recordings were first introduced, the recordings were preserved on physical mediums such as cylinders of wax used in the phonograph, or later, at the time the Federal Rules of Evidence were enacted, on tape.[14] These mediums were difficult to tamper with after recordation, and thus the bar for introducing evidence of voice recordings was set understandably low at the time the Federal Rules were formed.[15] Now that most voice recordings are digitized, the data of which recordings are composed may be more readily manipulated.[16]

[4]       The next great threat to the reliability and authenticity of recordings of the human voice is looming on the horizon. With Adobe’s advent of Project VoCo, and with other companies beginning to create similar products that operate more quickly,[17] the courts must be made aware of the new technology and of how best to prepare for its arrival.[18] The current law surrounding the admissibility of voice evidence is based on the assumption that, as with all real evidence, juries are inherently capable of assessing the weight of the evidence.[19] This premise is based on the human ability to perceive differences in visual and auditory cues.[20] Project VoCo undermines the importance placed on the ability to perceive differences in sound.[21] When voice recordings are manipulated using Project VoCo, juries will be unable to perceive any auditory difference in the recorded speech.[22]

[5]       Admissibility issues surrounding voice identification have confounded the legal process since its beginning, but the complexity encompassing voice identification continues to expand into the digital age.[23] Issues regarding vocal evidence arise even in its most simple form—when an earwitness is identifying a person based on their natural voice alone.[24] It could be a bank teller who was robbed by a masked man, left only to identify the perpetrator by the unique characteristics of their voice. Perhaps it could be a witness who received a threatening phone call but had no other means of identifying the person aside from their voice. How reliable could the identification of a defendant be if it is solely based on voice?[25]

[6]       This paper will first introduce Adobe’s Project VoCo. This introduction of the new technology will explain the power and breadth of the software. Next, the paper will address how the Federal Rules of Evidence are ill-equipped to handle Project VoCo. Then, the paper will analogize Project VoCo to other technologies, and discuss how courts have adapted to those technologies. Later, the paper will discuss why there should be a new threshold intertwined into the current Rule 901 standard for admitting voice evidence. Finally, the paper will conclude with a brief summary.

II. Project VoCo

[7]       Adobe is a software company known for Photoshop, Acrobat, Illustrator, Dreamweaver, and more than 20 other software programs.[26] Many of these programs are available for public purchase, and following in the footsteps of several other large technology companies,[27] are now available through a subscription service known as Adobe Creative Cloud.[28] These powerful programs are accessible by anyone with an internet connection and the modest capital required to subscribe to the use of these products.[29]

[8]       Although Photoshop’s effect on evidence is still at issue,[30] the newest threat to the legitimacy of evidence is Adobe’s Project VoCo. When Adobe unveiled its Project VoCo software in a live press release in November 2016, it shocked the audience.[31] On a stage in front of hundreds of engaged spectators, an Adobe representative showed the true power of the company’s newest technology.[32] VoCo is software that enables the user to make a computer read back anything the user types into it.[33] Although the underlying concept may seem familiar at first glance, this program is not akin to mere text-to-speech conversion software.[34] VoCo can take typed text and convert it into distinguishable speech spoken in the voice of anyone whose voice the VoCo user has on file.[35] Project VoCo can use a recording of a target’s voice and change one or more words in a spoken sentence, or even create novel sentences altogether.[36] More specifically, VoCo can use a 10-minute audio sample of the target speaking, and then anything the user types can be read back by the program in the target speaker’s voice.[37] The VoCo user can individually adjust each phoneme within any word in the sentence in order to create a sentence that flows as naturally as a real human statement.[38] The software also offers a litany of pronunciations for each word used.[39] The user can modify the pitch and duration of each syllable for even more accurate speech that sounds identical to the target’s natural voice.[40] Essentially, VoCo has been widely dubbed a Photoshop for the human voice.[41] As the software evolves, the length of the voice sample required for the software to function will likely shrink dramatically, and manipulating another’s voice will become increasingly simple.[42]

[9]       VoCo produces a novel issue of law. If VoCo is used by an actor, a voice recording may be properly identified as belonging to a party of a case, but the recording may not accurately represent anything that the party ever said.[43] Essentially, anyone with access to VoCo will be able to put words in someone’s mouth.[44] Although such a misuse would require bad-faith action, once the edits to the recording have been made, it will be extremely difficult to discern any alterations or tampering to the recording.[45]

A. The Federal Rules of Evidence Are Ill-Equipped to Handle VoCo

[10]     The Federal Rules of Evidence were enacted in 1975,[46] a time when photographs were taken on traditional film, the United States military was leaving Vietnam, and the internet had yet to be invented.[47] The future advances in vocal and photographic technologies were understandably beyond the purview of the drafters of the Rules.[48] Although several technologies that would eventually create more complex evidentiary issues had not yet been invented in 1975, the fundamental concerns regarding identification by voice recognition were nonetheless directly addressed by the Rules of Evidence.[49] Rule 901 explains that, to authenticate or identify an item of evidence, the proponent must provide supporting evidence sufficient to show that the item is what the proponent claims it is.[50] More pertinently, Rule 901(b)(5) states that an earwitness may testify to their opinion “identifying a person’s voice—whether heard firsthand or through mechanical or electronic transmission or recording—based on hearing the voice at any time under circumstances that connect it with the alleged speaker.”[51] Rule 901(b)(6) more specifically addresses evidence about telephone calls.[52] The Rule provides that a telephone conversation may be authenticated by evidence that a call was made to the number assigned at the time to a “particular person, if circumstances . . . show that the person answering was the one called . . .”[53]

[11]     Under the Rules, if a proponent posits that the circumstances surrounding the call demonstrate that John Doe was on the other end of a phone call, and the proponent records the phone call, then the proponent would merely need to have someone who has heard John Doe speak identify the voice on the recording as belonging to John Doe.

[12]     In its essence, Rule 901 is a specific application of Rule 104(b).[54] Rule 104(b) states: “When the relevance of evidence depends on whether a fact exists, proof must be introduced sufficient to support a finding that the fact does exist. The court may admit the proposed evidence on the condition that the proof be introduced later.”[55] This standard requires that a proponent merely provide evidence that could support a finding.[56] As a result of this low threshold, these forms of evidence are almost always admitted at trial.[57]

[13]     In practice and context, the problem with Rule 901 is that earwitnesses may testify about whether a voice sample belongs to someone, and the jury is left with the responsibility of giving the evidence whatever weight it deems adequate.[58] Both voice evidence and photographic evidence have extreme probative value in the eyes of juries.[59] In fact, under Rule 403, oral and visual evidence of key facts in a case can become unduly prejudicial to a party.[60] For example, courts have found that gory photos of a crime scene can tip the scale balancing prejudice and probative value too far towards the prejudicial side, and courts will typically not allow them to be shown to a jury.[61] Similarly, a graphic auditory account of a crime scene may cause voice evidence to be unduly prejudicial.[62]

[14]     Under this current scheme, VoCo will present a means by which a proponent could introduce evidence which satisfies the Federal Rules of Evidence in a seemingly imperceptible fashion.[63] Because the Rules plainly permit a witness to testify that a voice on a recording is distinct and belongs to the person in question, a sample altered by VoCo could be easily slipped into evidence.[64] VoCo could produce admissible evidence even if the opponent never actually said the words that were expressed on the recordings.[65] Further, it would be nearly impossible for an opponent to rebut a recording of their own voice being played aloud for the courtroom to hear without producing reams of metadata.[66] Regardless of the application, the potential for abuse by using VoCo is boundless in breadth and limitless in depth.

[15]     The current Federal Rules of Evidence inadvertently provide an avenue for a party to authenticate false vocal evidence with relative ease. The Rules are outdated in this regard because, until now, manipulating one’s speech has never been as viable a threat.[67] Until VoCo becomes more mainstream and occupies the public eye, a blindsided opponent likely would not be able to sufficiently explain how the proponent of the evidence has a “smoking-gun” voice recording of words that were never said. The evidence could be damning, and the opponent would be left without a plausible explanation for why the recording carries her voice but words she never said.

B. Examples Where Voice Evidence Could Be Easily Abused

[16]     Although voice identification evidence has both criminal and civil implications, the potential for abusing VoCo is especially great in the criminal context. Voice evidence plays a substantial role in criminal cases.[68] The entire course of a defendant’s life could be altered with some quick changes made on VoCo to a voice recording that provides a pivotal piece of evidence at trial. In order to illustrate the importance of voice evidence in the criminal context, a few cases where the outcome hinged on voice evidence are described below:

[17]     In United States v. Dionisio, the Supreme Court held that voice exemplars, when compelled, are not in violation of the Fifth Amendment privilege against self-incrimination when the exemplars are used merely for identification purposes.[69] The Court also concluded that compelled disclosures of voice exemplars in front of a grand jury are not a violation of the Fourth Amendment as an unreasonable search or seizure.[70]

[T]he Fourth Amendment provides no protection for what ‘a person knowingly exposes to the public, even in his own home or office . . . .’[71] The physical characteristics of a person’s voice, its tone and manner, as opposed to the content of a specific conversation, are constantly exposed to the public. Like a man’s facial characteristics, or handwriting, his voice is repeatedly produced for others to hear. No person can have a reasonable expectation that others will not know the sound of his voice, any more than he can reasonably expect that his face will be a mystery to the world.[72]

[18]     This case sets a pivotal foundation for VoCo because it illustrates that courts can compel a defendant to provide a voice sample in a criminal context.[73] As a result, if there is a voice recording that is fraudulently crafted using VoCo, then the court can reasonably compel a person of interest to provide a voice sample.[74] If the grand jury deems the voices to be similar enough, the grand jury can indict the defrauded defendant.[75]

[19]     In United States v. Ashers, the defendant was convicted of “accepting a bribe while employed as a classification and parole specialist” at a prison complex.[76] During the trial, the United States proved that the defendant disguised his voice while preparing a voice exemplar for a defense expert to examine.[77] The defendant’s voice was pivotal in the case because the bribery conviction was largely attributable to a recording of the defendant conversing with an inmate who was wearing a wire.[78] The district court enhanced the defendant’s offense level by two levels for obstruction of justice because the defendant intentionally falsified his voice.[79] Ultimately, the Fourth Circuit held that the judgment was proper.[80]

[20]     If VoCo had existed at the time of this case, the defendant might have been able to disguise his voice, or change his words entirely, by altering the recording with VoCo instead of simply attempting to disguise his voice.[81] VoCo’s transformative powers would allow an abuser of the technology to alter any recording of his voice so that it sounds completely different from his natural speech.[82]

[21]     In United States v. Basey, the Ninth Circuit held that voice identification, witness testimony to use of an alias, and recorded telephone conversations were sufficient evidence to identify a defendant.[83] The court explicitly stated that voice identification may be accomplished by direct or circumstantial evidence.[84] In this case, a Drug Enforcement Administration (DEA) agent was able to identify the defendant’s voice based on recorded conversations.[85] This identification, in combination with witness testimony that identified the defendant’s alias and the recorded conversations, was sufficient to identify the defendant, and the conviction was upheld.[86]

[22]     This case demonstrates how low the admissibility bar for voice identification is. Identification by one individual was sufficient to establish that the defendant was the voice on the recording.[87] With VoCo, these recordings could have been modified with or without the agent’s knowledge, and the voice would still have been identified as belonging to the defendant.[88] The identification of the voice would have been correct, but the recording would not be an accurate representation of the true conversation.[89]

[23]     In order to effectively predict how the courts will address VoCo, it is paramount to examine issues surrounding comparable software. Some may argue that the introduction of VoCo will create a slippery slope by which every opponent of a piece of voice evidence alleges that the recording was falsified using VoCo. While this concern is legitimate, the same objection could be raised about photographic evidence and Photoshop, yet that argument does not seem to be raised often.[90]

III.  Courts Should Look to Previous Ground-Breaking Technologies in Order to Prepare for Project VoCo

[24]     Whenever a novel technology is introduced that impacts the viability of evidence, courts have to adapt.[91] Generally, this adaptation either occurs after the new technology has caused a problem, or the courts attempt to assimilate the new technology into the ill-equipped confines of an existing evidentiary schema.[92] Project VoCo is not available for public purchase at the time of this paper, but the threat looms.[93] The drafters of the Rules have the unique proactive opportunity to ensure that the Rules of Evidence are equipped to grapple with this new technology should it become available in the near future. In order to best prepare for the new technology, courts should look to how they adapted to several similar innovative technologies in the past. Specifically, courts should consider the telephone, voice modulators, and Photoshop.

A. Telephone

[25]     Alexander Graham Bell’s revolutionary invention of the telephone has impacted the use of vocal evidence in court.[94] Since the advent of the telephone, testimony based on voice recognition has been further complicated because vocal communication became possible over long distances with relative clarity of voice.[95] Even though the correspondents may be miles apart, parties to a phone call are able to communicate with each other effectively.[96]

[26]     The Federal Rules of Evidence were enacted in 1975, at which point the telephone had become commonplace in American society.[97] The technology was so ubiquitous and understood that the Rules had specific criteria to address the use of telephonic evidence.[98]

[27]     The content of telephone calls and the identity of the speakers have come to play an important role in legal proceedings. An earwitness testifying about the content of a telephone conversation must be able to establish the identity of the specific person called.[99] To meet that standard, the voice of the caller could be cited as support for identifying a specific person on the other end of a phone call in accordance with the standard previously set forth in Rule 901(b)(5).[100]

[28]     Unlike the telephone, VoCo was not a ubiquitous technology at the time the Federal Rules were enacted.[101] In fact, VoCo is still not a ubiquitous technology today.[102] Once the technology is readily available and is possessed by the masses, the law will need to react accordingly.

B. Voice Changer 

[29]     A voice changer, or a voice modulator, is an electronic device or software program that manipulates the human voice, usually in real time.[103] Although voice changers appear less often in cases than photographic evidence, voice changers share many features with VoCo, which makes them an important analog.[104] For example, like VoCo, not all voice changers are standalone electronic devices.[105] Many of these voice changers are computer programs that can be employed to alter swaths of speech much like VoCo.[106]

[30]     Voice changers are also known as voice disguisers.[107] A voice changer alters the user’s voice so dramatically that close friends and relatives would not be able to discern whom they are speaking to on the basis of voice.[108] Voice changers operate primarily by raising and lowering the pitch of the user’s voice.[109]

[31]     In United States v. Gilbert, a defendant was convicted of making a telephone bomb threat.[110] The district court admitted evidence that the defendant purchased a toy voice changer on the day the telephone bomb threat was made.[111] This voice changer was capable of shifting the pitch of the speaker’s voice up or down so that it was disguised.[112] The defendant made multiple threatening calls using the voice changer.[113] Ultimately, an expert was able to prove that the defendant was the one making the calls by taking the recorded messages and altering the pitch so that the defendant’s true voice shone through.[114]

[32]     Although voice changers generally shift the pitch of a natural voice, VoCo is distinguished as being far more powerful and containing many more features.[115] VoCo can not only edit almost any facet of recorded speech but can also generate novel speech.[116] These features make alterations made with VoCo much more difficult to detect than those made with voice changers.[117]

[33]     The way the court handled voice changer evidence in Gilbert was by allowing an expert to testify as to the identification of the defendant.[118] As discussed later in Part IV, courts may need to allow experts to examine evidence of voice recordings in order to determine if it has been tampered with by VoCo.[119] The difficulty still remains that VoCo uses the voice of its target to generate speech.[120] Because VoCo does not merely change the pitch of speech, but instead can substitute or generate words in the target’s voice, a voice recording expert would presumably need to look at other indicators aside from pitch.[121] Perhaps an expert would be able to examine metadata associated with a voice file, but the requisite analysis would be undoubtedly complex as discussed below.[122]

C. Photoshop

[34]     Photoshop allows the user to alter images in order to “[c]reate anything you can imagine[,] [a]nywhere you are.”[123] Photoshop shook the foundations of photographic evidence when it was brought to market, because it allowed users to modify photographs in almost any way imaginable.[124] The software was first introduced in February of 1990, and has transformed the visual world ever since.[125] Photoshop began as a rudimentary digital editing program, but it has advanced greatly since its genesis.[126] Photoshop now enables a user practiced in the program to alter existing images, layer new images over other images, and remove aspects of images.[127] Although complex tasks on Photoshop require some level of experience, novice users are also able to manipulate photos in extraordinary ways.[128]

[35]     Courts grappled with manipulated photographs for years even before Photoshop, and the law in the area is still developing.[129] The ability to untraceably falsify photographic evidence has never been more accessible, and it poses a legitimate threat to justice.[130] Although the programs operate on different electronic media, Photoshop and VoCo pose similar threats.

IV. The Similarities Between Photoshop and Project VoCo

[36]     Though VoCo poses unique challenges in the world of voice evidence, several parallels may be drawn between it and its sister software, Photoshop. Photoshop is capable of taking digital data in the form of photographs and manipulating it in order to produce a wholly altered and convincing image.[131] Photoshop is now so advanced that the user can start with a blank slate and create an image of whatever they desire through the manipulation of both stock and original images.[132] VoCo operates on a similar plane, with an analogous medium. VoCo takes digital data in the form of vocal recordings and can manipulate it to produce a novel vocal recording that was never actually uttered by the original speaker.[133] VoCo is the Photoshop of soundwaves.[134] In order to effectively predict how VoCo will be addressed by the courts, an analysis of how they have dealt with Photoshop’s impact on photographic evidence will provide insight.

A. How the Courts Have Handled Photoshopped Evidence

[37]     Similar to vocal evidence, photographic evidence has been a mainstay in courts for decades.[135] As Photoshop gained traction in the market, digital manipulation of photographs became commonplace as the public began to explore and exploit the program’s functions.[136] While the import of photographic evidence is often the subject of debate, photographs continue to play a fundamental role in the legal system.[137] Few pieces of evidence are more convincing and enduring for a jury than a photograph illustrating a pivotal subject of a trial.[138] Understandably, the newfound powers of Photoshop raised evidentiary red flags amongst academics, practitioners, and courts alike.[139]

[38]     A photograph must be both relevant and authentic in order to be introduced into evidence.[140] As with vocal evidence, low hurdles such as those described above have created an avenue for falsified images to be easily introduced into evidence. Because of the weight that photographic evidence carries in the minds of jurors, having a clear and convincing image to provide to the jury can make or break a case.[141] This weight can occasionally lead a party to modify a photograph to depict a scene more clearly, or simply misrepresent a scene in their favor through editing techniques.[142] Traditional editing techniques were far more detectable when photographs were taken using analog film and development.[143] Just as VoCo makes voice alterations achievable and difficult to detect, digital photography and Photoshop have made alterations to photographs nearly impossible to detect.[144]

[39]     “Because of the assumptions relating to analog informational records, under the current rules only a few quick and sketchy foundational questions are required to allow writings, photographs, and tape recordings to come into evidence as ‘authentic’—as being what they ‘purported to be.’”[145] It is sufficient to prove authenticity and relevance if a witness on the stand testifies that the photograph accurately depicts the scene as they saw it.[146] “Similar to the photograph, a party seeking to admit sound evidence need only show that the recording is an accurate reproduction of sound that was previously heard by a witness.”[147]

[40]     In an effort to combat tampering, there is an emerging practice where digital photographs are authenticated and verified via metadata.[148] Metadata is defined as “data that describes other data.”[149] Common examples of metadata include: author, date created, dates of modification, and file statistics.[150] Unfortunately, metadata can also be readily altered by actors who are only moderately technologically inclined.[151]

[41]     For digital photographs, metadata is a comprehensive listing of all persons who had access to the photograph, as well as digital breadcrumbs that evidence any edits or alterations made to the photo.[152] Even when metadata is monitored, it is possible for edits to be made covertly, without trace or trail.[153] Moreover, using metadata to authenticate a digital photograph requires expert knowledge for both production and analysis.[154] As such, providing metadata for each digital photograph is proving to be impractical, unduly burdensome, and unrealistic in application.[155] Using metadata to track the access and alterations made to voice recordings would presumably pose the same host of issues as using metadata in digital photographic evidence.
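
To make the point about alterable metadata concrete, the following is a minimal sketch in Python. It is an illustration only, not a forensic method described in the sources cited here. It assumes that ordinary filesystem attributes (file size and "last modified" time) stand in for the richer metadata an expert would examine, and the names describe_file, demo, and recording.bin are hypothetical. The sketch shows that a timestamp can be rewritten without touching the content, and that edited content can carry the same size and timestamp as the original, so only a separately preserved content hash reveals the change.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return basic filesystem metadata plus a SHA-256 hash of the content."""
    info = os.stat(path)
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    return {
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "sha256": digest,
    }


def demo():
    """Backdate a file, then edit it, and report what the metadata shows."""
    with tempfile.TemporaryDirectory() as folder:
        # Hypothetical stand-in for a recording; the bytes are not real audio.
        path = os.path.join(folder, "recording.bin")
        with open(path, "wb") as handle:
            handle.write(b"original audio bytes")
        before = describe_file(path)

        # Rewrite the "last modified" timestamp to 1 January 2000 (UTC).
        backdated = datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp()
        os.utime(path, (backdated, backdated))
        after_backdating = describe_file(path)

        # Replace the content with same-length bytes, then backdate again.
        with open(path, "wb") as handle:
            handle.write(b"edited audio bytes!!")
        os.utime(path, (backdated, backdated))
        after_edit = describe_file(path)
    return before, after_backdating, after_edit
```

In this sketch the size and timestamp of the edited file match those of the backdated original, and only the hash differs. A hash helps, however, only if a trusted copy of it was recorded before any tampering, which is one reason metadata alone is an incomplete safeguard.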

[42]     Under federal law, the consequences for false declarations before a court are enumerated in 18 U.S.C. § 1623.[156] Illegal modifications to photographs would likely fall within the scope of false declarations because the modifications are being falsely represented as true.[157] Infractions under this statute carry a penalty of a fine and up to five years of imprisonment.[158] Though the consequences of defrauding the court are severe, modern photographic alteration techniques are so covert that some parties likely calculate that the risk of detection is substantially outweighed by the reward of a victory at trial. VoCo will create the same potential to defraud the court and subvert justice.

B. How Should Project VoCo Be Handled?

[43]     It is paramount that the Federal Rules of Evidence adapt. One plausible solution would be to use voice evidence experts to provide expert testimony regarding the authenticity of voice evidence. This would require the experts to satisfy the Daubert standard.[159] In Daubert, the Supreme Court held that the expert testimony requirements enumerated in the Federal Rules of Evidence superseded the old common law requirements.[160] 

[44]     Daubert was later superseded by statute as codified in Federal Rule of Evidence 702.[161] As mentioned previously, the admissibility standard for voice recordings is simply that someone familiar with the voice is able to verify who is speaking on the recording.[162] Experts testifying about the authenticity of a voice recording would therefore have to satisfy the higher Rule 702 standard in order to testify about evidence that is itself admitted under a lower standard.[163]

[45]     In total, there are three potential solutions for the Federal Rules of Evidence to adapt to VoCo. First, the law could remain exactly the same. Second, the law could require that a metadata expert testify as to the validity and veracity of all native digital, or retroactively digitized, voice recordings in accord with the standard described above. Third, the law could adopt a threshold standard under which expert testimony regarding metadata would be required to establish the validity of a voice recording only once the opponent shows a plausibility of tampering. For the reasons discussed below, the Federal Rules of Evidence should adopt a threshold standard for Rule 901.

[46]     The first option is implausible because the Federal Rules of Evidence should not go unchanged in response to VoCo. The Rules are not equipped to handle this new technology for the reasons discussed in Part II.[164] If the Rules are not modified, the potential for manipulation of voice evidence could create a large obstruction to the administration of actual justice.

[47]     In the alternative, if metadata were required for every piece of voice evidence introduced to the courts, the courts would become clogged by the new standard for admission. The courts would likely be burdened because, as mentioned in Part II, the current admissibility threshold under Rule 901 for voice evidence is so low.[165] If experts were required to testify about the metadata for each voice recording ever used in federal court, the timeframe and costs of litigation would increase significantly.[166] This burden would be especially pronounced in the criminal context because that is where a majority of voice evidence is used.[167] If the prosecution had to produce an expert to interpret and testify every time voice evidence of a defendant was introduced, the government would be heavily burdened.[168] Likewise, a defendant may need to produce an expert to testify against the validity of the metadata, which would further extend the timeline and expense of trial, a cost that many criminal defendants simply cannot afford.[169]

[48]     For the reasons mentioned above, the Federal Rules should adopt a threshold standard. Under this threshold standard, Rule 901 should remain the general rule that applies to voice evidence unless the opponent of the evidence is able to demonstrate a plausibility of tampering with the evidence.[170] The opponent of the voice evidence should bear the burden of demonstrating plausibility of tampering, and if the opponent carries that burden, then a Daubert standard should apply.[171] Under this standard, a metadata expert would need to demonstrate that the voice recording has not been tampered with or manipulated using VoCo or similar software.[172] After this expert testimony, the jury should be able to assess the weight of the evidence.[173]

V. Conclusion

[49]     What is preventing a voice recording created by VoCo from being introduced as authentic evidence? Under the current Federal Rule structure, almost nothing.[174] If there were a threshold showing that the proponent of voice evidence may have altered the recording, perhaps the only thing inhibiting a fabricated recording from being admitted would be an expert’s analysis of the metadata demonstrating the digital modification.[175] Proving that an opponent did not say what he was recorded saying will be a momentous task that must be approached on a case-by-case basis. How does one prove that, despite a recording being in their voice, they never said those words? They must demonstrate that the voice recording is not authentic in much the same way that a Photoshopped photo is proved to be inauthentic.[176] The first step, however, is for practitioners to be aware that this technology exists and to be familiar with its function. Undoubtedly, VoCo will be further refined and released to the public in the near future.[177]

[50]     The Federal Rules of Evidence should be amended to account for the proliferation of this technology.[178] Authenticating voice recordings will become a far greater hurdle to introducing evidence once their presumption of validity is unsettled. In an ideal world, a voice recording would be accompanied by metadata, similar to that of digital photographs, demonstrating that the recording is in a raw and unadulterated format. As with photographic metadata, however, producing metadata for each voice recording used in evidence would prove unreasonably cumbersome because it would magnify the burden of discovery, and the metadata can still be unreliable.[179] This is why the Rules should adopt a threshold showing.[180] Under this threshold showing, once an opponent of the evidence is able to demonstrate plausibility that the proponent tampered with the evidence, the Rules should require experts in metadata to provide their opinion as to the validity of the voice recording.[181] Although metadata is a partial solution, it is not a perfect solution because it too can be altered.[182]

[51]     The law must adapt to the evolving possibilities of tomorrow instead of being entrenched in the antiquated shadow of yesterday. In order to prevent voice recordings from being distorted by a few keystrokes on a laptop, the law must account for potential foul play facilitated by new technologies. As novel innovations in technology proliferate, opportunities for dishonesty multiply. Project VoCo and similar technologies are coming, and they will demand change.


* Nick Mirra, Esq. earned his B.A. in Psychology at Christopher Newport University and his J.D. at the University of Richmond’s T.C. Williams School of Law. He is a current practitioner at Woods Rogers, PLC in Roanoke, Virginia, where his legal focus is on Cybersecurity and Civil Litigation. Nick also served as Editor-in-Chief for Richmond’s Journal of Law and Technology, Volume XXVI.

[1] See Sophie Scott, Why Do Human Voices Sound the Way They Do, BBC (Dec. 1, 2009, 10:40 GMT), http://news.bbc.co.uk/2/hi/health/8382900.stm [https://perma.cc/7KTH-TBU7].

[2] See Hugh McLachlan, Is Every Human Voice and Fingerprint Really Unique?, The Conversation (Aug. 11, 2016, 10:19AM EDT), http://theconversation.com/is-every-human-voice-and-fingerprint-really-unique-63739 [https://perma.cc/9JHT-7M5N].

[3] See Gilbert v. Cal., 388 U.S. 263, 266–67 (1967).

[4] See, e.g., Claudia Roswandowitz et al., Obligatory and Facultative Brain Regions for Voice-Identity Recognition, 141 Brain 234, 235 (2018) (arguing that voice recognition of others is an important skill for social interaction).

[5] See, e.g., J. P. Ludington, Annotation, Identification of Accused by His Voice, 70 A.L.R. 2d 995, 2 (2018) (discussing the admissibility of voice-recognition testimony).

[6] See, e.g., Omar Ramadan et al., Digital Image Manipulation Forensic, in U. Cal. Berkeley 2015, at 1, 6–7 (Technical Report No. UCB/EECS-2015-85, 2015) (describing the ever-increasing ability to manipulate photographic evidence).

[7] See generally Richard C. Dorf, Technology, Humans, and Society: Toward a Sustainable World (2001) (describing how technology has improved various areas of human life including: business, engineering, sustainable energies, natural resource conservation, manufacturing, vehicles, fuel cells, and transportation).

[8] See Carlos Alvarenga, Does Technology Create More Problems Than It Solves? The Linn Effect Says Yes, Reconnomics (Dec. 2, 2014), https://reconnomics.com/2014/12/02/does-technology-create-more-problems-than-it-solves-the-linn-effect-says-yes/ [https://perma.cc/UW6C-Y9YC].

[9] See generally Blair Janis, How Technology is Changing the Practice of Law, 31 ABA, no. 3, 2014, at 2, https://www.americanbar.org/publications/gp_solo/2014/may_june/how_technology_changing_practice_law.html [https://perma.cc/NFY9-7LRA] (describing how technology has affected, and will continue to affect, the practice of law).

[10] See id.

[11] See, e.g., Fed. R. Evid. 403; Fed. R. Evid. 901 (referring to the adjustments courts have made in response to the changes in technology).

[12] See Eli MacKinnon, Edison Voice Recording Is Old, but Not Oldest, Live Science (Oct. 26, 2012, 12:14 PM ET), https://www.livescience.com/24317-earliest-audio-recording.html [https://perma.cc/Y947-7HYJ] (explaining that Edison’s phonograph was invented in 1878, which was the first audio recording that could be played back for listening instead of inscribing an audio pattern on tinfoil).

[13] See id.

[14] See id.; see also Bryan Dewalt, The Tape Recorder, Library & Archives Canada, http://www.collectionscanada.gc.ca/gramophone/028011-3021.3-e.html [https://perma.cc/N8F6-MUG3] (last modified July 15, 2010).

[15] See, e.g., John Holkeboer, An Intro to Analog Tape Splicing & Editing, Tape Op Mag. (Dec. 1998), https://tapeop.com/tutorials/11/intro-analog-tape-splicing-and-editing-and-tape-loops/ [https://perma.cc/Q5H3-2TXM] (describing the “ancient art of audiotape editing and splicing”).

[16] See Mike Williams & Cat Ellis, The Best Free Audio Editor 2018: Trim, Tweak and Mix, Tech Radar (July 30, 2018), https://www.techradar.com/news/the-best-free-audio-editor [https://perma.cc/B7QT-VUAB] (providing a list of free audio editing devices available in 2018). See generally Mary Huismann, Audio Timeline, Yale Univ. Library, Irving S. Gilmore Music Library, https://web.library.yale.edu/cataloging/music/audiotimeline [https://perma.cc/J7CK-PUTZ] (providing a timeline of audio recordation).

[17] See, e.g., Andy Weir, Lyrebird is Launching an API to Copy Anyone’s Voice from a One-Minute Audio Recording, Neowin (Apr. 24, 2017, 08:32 EDT), https://www.neowin.net/news/lyrebird-is-launching-an-api-to-copy-anyone039s-voice-from-a-one-minute-audio-recording [https://perma.cc/62L4-XVZU] (announcing that Lyrebird will be able to capture anyone’s voice from a one-minute audio sample, which is significantly shorter than VoCo’s current twenty-minute sample).

[18] See Adobe VoCo ‘Photoshop-for-voice’ Causes Concern, BBC News (Nov. 7, 2016), http://www.bbc.com/news/technology-37899902 [https://perma.cc/K7WU-RLHG] (addressing ethical concerns following Adobe’s Project VoCo release).

[19] See Fed. R. Evid. 901(a), (b)(5).

[20] See Nadine Lavan, Sophie K. Scott & Carolyn McGettigan, Impaired Generalization of Speaker Identity in the Perception of Familiar and Unfamiliar Voices, 145 J. Experimental Psychol. 1604, 1604 (2016) (explaining the human ability to discern familiar and unfamiliar human voices).

[21] See Nick Statt, Adobe is Working on an Audio App that Lets You Add Words Someone Never Said: Watch What You Don’t Say, The Verge (Nov. 3, 2016, 6:30 PM EDT), https://www.theverge.com/2016/11/3/13514088/adobe-photoshop-audio-project-voco [https://perma.cc/X9YG-Z9UU].

[22] See id. (explaining the ability of VoCo to generate novel speech in a target’s voice that makes distinguishing the new replicated speech almost impossible).

[23] See, e.g., Neil v. Biggers, 409 U.S. 188, 194–95 (1972) (illustrating a victim’s difficulty in identifying her assailant).

[24] See id.

[25] See Gilbert v. Cal., 388 U.S. 263, 266 (1967).

[26] See Adobe Products: Desktop, Web, and Mobile Applications, Adobe, https://www.adobe.com/products/catalog.html [https://perma.cc/6W6E-JPRZ].

[27] See Christy Pettey, Moving to a Software Subscription Model, Gartner: Smarter With Gartner (May 30, 2018), https://www.gartner.com/smarterwithgartner/moving-to-a-software-subscription-model/ [https://perma.cc/M5FU-UNHR].

[28] See Adobe Creative Cloud, Adobe, https://www.adobe.com/creativecloud.html [https://perma.cc/4BGB-HKZ4].

[29] See id.

[30] See, e.g., Zachariah B. Parry, Note, Digital Manipulation and Photographic Evidence: Defrauding the Courts One Thousand Words at a Time, 2009 U. Ill. J.L. Tech. & Pol’y 175, 193–96 (2009) (discussing different courts’ approaches to the use of digitally altered photos).

[31] See Adobe Creative Cloud, #VoCo. Adobe MAX 2016 (Sneak Peeks), YouTube (Nov. 4, 2016) [hereinafter #VoCo], https://www.youtube.com/watch?v=I3l4XLZ59iw [https://perma.cc/C5QJ-TEQU].

[32] See id.  

[33] See id.

[34] See id. (demonstrating VoCo’s ability to offer users type and play functionality similar to text-to-speech conversion, but instead of using typical computer-generated voices, VoCo uses a real human voice).

[35] See Statt, supra note 21.

[36] See id.; see also #VoCo, supra note 31.

[37] See Statt, supra note 21; Adam Finkelstein, VoCo: Text-based Insertion and Replacement in Audio Narration, YouTube (May 11, 2017), https://www.youtube.com/watch?v=RB7upq8nzIU [https://perma.cc/9CCA-UN5D].

[38] See Finkelstein, supra note 37.

[39] See id.

[40] See id.

[41] See id.

[42] See id.

[43] See Finkelstein, supra note 37.

[44] See id. (demonstrating VoCo’s ability to generate novel speech using a target’s voice).

[45] See id.

[46] See FRE Legislative History Overview Resource Page, Fed. Evidence Review, http://federalevidence.com/legislative-history-overview [https://perma.cc/T395-4W9V]. See generally Pub. L. No. 93-595, 88 Stat. 1926 (1975) (stating that the Federal Rules of Evidence would go into effect in 180 days).

[47] See What Happened in 1975: Important News and Events, Key Technology, and Popular Culture, The People History, http://www.thepeoplehistory.com/1975.html [https://perma.cc/G6EC-U5DW].

[48] See Richard Trenholm, Photos: The History of the Camera, Cnet (Nov. 5, 2007, 7:06 AM PST), https://www.cnet.com/news/photos-the-history-of-the-digital-camera/ [https://perma.cc/8K4J-43TF] (explaining digital photography was invented in 1975, but did not reach the commercial market until 1981, six years after the Federal Rules of Evidence were enacted).

[49] See, e.g., Fed. R. Evid. 901 (acknowledging and incorporating telephone conversations into the Federal Rules of Evidence).

[50] See id.

[51] Fed. R. Evid. 901(b)(5) (emphasis added).

[52] See Fed. R. Evid. 901(b)(6).

[53] Id.

[54] See Fed. R. Evid. 104(b).

[55] Id.

[56] See id.

[57] See, e.g., United States v. Dionisio, 410 U.S. 1, 2–3 (1973) (allowing voice recordings into evidence in a gambling investigation case).

[58] See Fed. R. Evid. 901.

[59] See Jules Epstein & Suzanne Mannes, “Gruesome” Evidence, Science, and Rule 403, The Nat’l Judicial Coll. (March 17, 2016), http://www.judges.org/gruesome-evidence-science-and-rule-403/ [https://perma.cc/MR6G-NJXB].

[60] See Fed. R. Evid. 403.

[61] See Epstein & Mannes, supra note 59.

[62] See Fed. R. Evid. 403.

[63] See #VoCo, supra note 31.

[64] See Fed. R. Evid. 901(b)(5).

[65] See id.

[66] See id.; see also Parry, supra note 30, at 189 (explaining how difficult Photoshopped photographic evidence is to refute).

[67] See, e.g., #VoCo, supra note 31 (demonstrating the true power and potential of project VoCo).

[68] See Bethany K. Dumas, Voice Identification in a Criminal Law Context, 65 American Speech (Special Issue), no. 4, 341, 341 (1990).

[69] See United States v. Dionisio, 410 U.S. 1, 14 (1973).

[70] See id.

[71] Id. (quoting Katz v. United States, 389 U.S. 347, 351 (1967)).

[72] Id.

[73] See id.

[74] See Dionisio, 410 U.S. at 14.

[75] See id. (citing United States v. Doe, 457 F.2d 895, 898–99 (2d Cir. 1972)).

[76] See United States v. Ashers, 968 F.2d 411, 412 (4th Cir. 1992).

[77] See id.

[78] See id.

[79] See id.; see also U.S. Sentencing Guidelines Manual § 2C1.1(a), (b)(1) (U.S. Sentencing Comm’n 2016).

[80] See Ashers, 968 F.2d at 412; see also U.S. Sentencing Guidelines Manual § 2C1.1(a), (b)(1) (U.S. Sentencing Comm’n 2016).

[81] See #VoCo, supra note 31.

[82] See Finkelstein, supra note 37.

[83] See United States v. Basey, 613 F.2d 198, 200 (9th Cir. 1979).

[84] See id. at 201 (citing United States v. Turner, 528 F.2d 143, 166 (9th Cir. 1975)).

[85] See id.

[86] See id. at 200.

[87] See id. 

[88] See Basey, 613 F.2d at 202.

[89] See id.

[90] See, e.g., Parry, supra note 30, at 202.

[91] See Hewlett Packard Enterprise, Six Ways Technology is Disrupting the Courts, Actiac 2 (2017), https://www.actiac.org/system/files/HPE_Six%20ways%20technology%20is%20disrupting.pdf [https://perma.cc/F7NE-K2AR] [hereinafter Hewlett Packard Business White Paper].

[92] See Geoff Spencer, Technology, Ethics, and the Law: Grappling with our AI-Powered Future, Microsoft (Apr. 9, 2018), https://news.microsoft.com/apac/features/technology-ethics-and-the-law-grappling-with-our-ai-powered-future/ [https://perma.cc/SZJ6-BGZH].

[93] See generally Mostafa Yosry, Voice Now has a Photoshop, Know Everything About Adobe’s New Project VoCo, Samma3a (Jul. 16, 2017), https://www.samma3a.com/tech/en/know-every-thing-about-adobe-voco/ [https://perma.cc/R769-9YK5] (stating that VoCo is still under development and has not yet been released).

[94] See, e.g., F. M. English, Annotation, Admissibility of Sound Recordings in Evidence, 71 A.L.R.2d 1024 (enumerating instances where telephone calls and voice recordings appear in American Law Reports).

[95] See id.

[96] See id.

[97] See Percentage of Housing Units with Telephones in the United States from 1920–2008, Statista (2018), https://www.statista.com/statistics/189959/housing-units-with-telephones-in-the-united-states-since-1920/ [https://perma.cc/4S6J-96G6] (explaining that by 1970, 90.5% of homes had telephones).

[98] See Fed. R. Evid. 901(b)(6).

[99] See id.

[100] See Fed. R. Evid. 901(b)(5).

[101] See #VoCo, supra note 31 (explaining that VoCo was debuted in 2016).

[102] See id.

[103] See, e.g., Professional Voice Changer, Spycentre Security, https://spycentre.com/products/professional-voice-changer-1 [https://perma.cc/85TX-QVMC]; Voice Changers, Safety Basement, https://www.safetybasement.com/Voice-changer-Voice-Changing-Devices-s/387.htm [https://perma.cc/UMW5-6G2F].

[104] See, e.g., Voice Changer Software – Full Feature and Benefit List, Audio4Fun, https://www.audio4fun.com/voice-changer-features.htm [https://perma.cc/R5U8-TZ52].

[105] See id.

[106] See id.

[107] See Cell Phone Voice Changer, tbo-tech, https://www.tbotech.com/voicechanger.htm [https://perma.cc/LDM5-QM92] (information is located under the description tab of the product).

[108] See id.

[109] See id.

[110] See United States v. Gilbert, 181 F.3d 152, 153 (1st Cir. 1999).

[111] See id. at 155.

[112] See id.

[113] See id. at 155–56.

[114] See Gilbert, 181 F.3d at 157.

[115] See #VoCo, supra note 31.

[116] See id.

[117] See id.

[118] See Gilbert, 181 F.3d at 161–62.

[119] See discussion infra Section IV.B.

[120] See #VoCo, supra note 31.

[121] See id.

[122] See Parry, supra note 30, at 197–98.

[123] Adobe Photoshop CC Homepage, Adobe, [https://web.archive.org/web/20170715045829/http://www.adobe.com/products/photoshop.html].

[124] See K. Mahesh, History of Photoshop: Journey from Photoshop 1.0 to Photoshop CS5, Creative Overflow (Sept. 12, 2011), https://creativeoverflow.net/history-of-photoshop-journey-from-photoshop-1-0-to-photoshop-cs5/ [https://perma.cc/LR42-9ZC9].

[125] See id.

[126] See id.

[127] See Harry Guinness, What Can You Actually Do with Adobe Photoshop, Make Use Of (July 11, 2016), http://www.makeuseof.com/tag/what-can-do-with-photoshop/ [https://perma.cc/XJ98-GBLY].

[128] See id.

[129] See Parry, supra note 30, at 189.

[130] See id. at 176.

[131] See, e.g., Ankur Patar Recreates Rembrandt Masterpiece with Adobe Stock Adobe Creative Cloud, YouTube (June 26, 2016), https://www.youtube.com/watch?v=8LAlcSm6EzA [http://perma.cc/9HTY-9GZB].

[132] See id.

[133] See Statt, supra note 21.

[134] See id.

[135] Cf. Glenn Porter, A New Theoretical Framework Regarding the Application and Reliability of Photographic Evidence, 15 Int’l J. Evidence & Proof 26, 27 (2011) (stating that “the application of pictures, rather than exclusively using words, is having a profound affect on legal persuasion and courts’ decision mechanisms.”).

[136] See Parry, supra note 30, at 182–83.

[137] See Benjamin V. Madison III, Note, Scientific Evidence Symposium: Seeing Can Be Deceiving: Photographic Evidence in a Visual Age – How Much Weight Does It Deserve?, 25 Wm. & Mary L. Rev. 705, 705 (1984).

[138] See generally Kevin S. Douglas et al., The Impact of Graphic Photographic Evidence on Mock Jurors’ Decisions in a Murder Trial: Probative or Prejudicial?, 21 L. & Hum. Behav. 485, 492 (1997) (conducting a study in which jurors were found to be twice as likely to convict a person of murder if there were photographs included in the case file as opposed to no photographs).

[139] See Parry, supra note 30, at 178–79.

[140] See Fed. R. Evid. 402.

[141] See Douglas et al., supra note 138, at 497.

[142] See Parry, supra note 30, at 184–85.

[143] See id. at 178.

[144] See id. at 179.

[145] George L. Paul, The “Authenticity Crisis” in Real Evidence, 15 Prac. Litigator 45, 47 (2004).

[146] See id.

[147] Id.

[148] See Parry, supra note 30, at 197–200.

[149] Margaret Rouse, Metadata, WhatIs.com (July 2014), https://whatis.techtarget.com/definition/metadata [http://perma.cc/UR8Y-V6AZ].

[150] See id.

[151] See Parry, supra note 30, at 199.

[152] See Romanas Naryškin, What Is Metadata in Photography, photographylife (Mar. 1, 2017), https://photographylife.com/what-is-metadata-in-photography [http://perma.cc/7MEW-XPXG].

[153] See Paul, supra note 145, at 48–49.

[154] See id. at 49.

[155] See id. 

[156] See 18 U.S.C. § 1623 (2016).

[157] See generally id. (stating whoever “knowingly makes any false material declaration or makes or uses any other information, including any book, paper, document, record, recording, or other material, knowing the same to contain any false material declaration . . . .”).

[158] See id.

[159] See Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 582 (1993).

[160] See id. at 586–87.

[161] See Fed. R. Evid. 702; see also United States v. Parra, 402 F.3d 752, 758 (7th Cir. 2005) (“At this point, Rule 702 has superseded Daubert, but the standard of review that was established for Daubert challenges is still appropriate.”).

[162] See Fed. R. Evid. 901(b)(5).

[163] Compare Fed. R. Evid. 702 (stating that the “witness who is qualified as an expert by knowledge, skill, expertise, training, or education . . .” may testify to their opinion, but only under certain parameters), with Fed. R. Evid. 901(a), (b)(5) (stating that to authenticate a person’s voice, the proponent must be able to produce sufficient evidence to support that the voice is what the proponent claims it is).

[164] See discussion supra Section II.A.

[165] See Fed. R. Evid. 901.

[166] See, e.g., Expert Witness Fees: How Much Does An Expert Witness Cost?, Seak Experts, https://blog.seakexperts.com/expert-witness-fees-how-much-does-an-expert-witness-cost/ [http://perma.cc/4459-Y3GP]; Gary Edmond et al., Unsound Law: Issues With (‘Expert’) Voice Comparison Evidence, 35 Melb. U. L. Rev. 52, 52 (2011).

[167] See generally Gary Edmond et al., Unsound Law: Issues With (‘Expert’) Voice Comparison Evidence, 35 Melb. U. L. Rev. 52, 52 (2011) (explaining a similar increased rate of voice identification evidence in Australian courts since the 1980s).

[168] Cf. Expert Witness Fees: How Much Does An Expert Witness Cost?, Seak Experts, https://blog.seakexperts.com/expert-witness-fees-how-much-does-an-expert-witness-cost/ [http://perma.cc/4459-Y3GP] (stating the median testimony fee per hour for expert witnesses as well as information on retainer fees).

[169] See Caroline Wolf Harlow, Defense Counsel in Criminal Cases, Dep’t of Just. (2000), https://www.bjs.gov/content/pub/pdf/dccc.pdf [http://perma.cc/B69C-AJKB].

[170] Cf. Fed. R. Evid. 901 (stating that to authenticate a person’s voice, the proponent must be able to produce sufficient evidence to support that the voice is what the proponent claims it is).

[171] See discussion supra Section IV.B.

[172] See discussion supra Section IV.B.

[173] See discussion supra Section IV.B.

[174] See discussion supra Section II.A; see also Fed. R. Evid. 901.

[175] See discussion supra Section IV.B.

[176] See discussion supra Section IV.A.

[177] See Finkelstein, supra note 37; Weir, supra note 17; #VoCo, supra note 31 (concluding that the threat of VoCo is imminent).

[178] See discussion supra Section IV.B.

[179] See Parry, supra note 30, at 176–77, 184–85.

[180] See discussion supra Section IV.B.

[181] See id.; see also Parry, supra note 30, at 187–92.

[182] See Parry, supra note 30, at 176, 182–83.