Putting Words in Your Mouth: The Evidentiary Impact of Emerging Voice Editing Software

By: Nicholas Mirra, Esq.*

Cite as: Nicholas Mirra, Putting Words in Your Mouth: The Evidentiary Impact of Emerging Voice Editing Software, 25 Rich. J.L. & Tech., no. 1, 2018.

I. Introduction 

[1]       All you have in this life is your word. The human voice serves as the carrier for our words, thoughts, and feelings; each of us is endowed with a unique voice that allows us to be identified amongst a group.[1] Our voice is our vocal fingerprint.[2] Every word that departs from our lips carries an exclusive trademark that identifies it as belonging to an individual.[3] Because individuality of voice is a phenomenon implicitly understood by all humans, our words have become intertwined with our identity.[4] As a result of this interconnection between voice and identity, voice recordings and identification have become essential to the legal process.[5]

[2]       In today’s technologically advancing world, evidence can be effortlessly manipulated in more ways than were imaginable even a few years ago.[6] There is little debate that, as a whole, strides in technology make the human experience more convenient and productive.[7] With every new advance in human ingenuity comes a reciprocal: a set of new problems that had never been at issue prior to the new invention.[8] Few areas of civilization are as susceptible to both the benefits and the hurdles of new technologies as the law.[9]

[3]       Law is uniquely situated in a position where it must play both ends. Although the field of law has felt the positive impacts of technological advancement, it is also vulnerable to manipulation by it.[10] In order to prepare for new technologies, courts must consider how they can be used to provide novel forms of evidence, or conversely, how the new technology may threaten existing and well-established forms of evidence.[11] One type of evidence that owes its genesis to technological innovation is the voice recording, which emerged in 1878.[12] The invention of recording gave practitioners an opportunity to capture the human voice for later reproduction.[13] When voice recordings were first introduced, they were preserved on physical mediums such as the wax cylinders used in the phonograph or, later, at the time the Federal Rules of Evidence were enacted, on tape.[14] These mediums were difficult to tamper with after recordation, and thus the bar for introducing evidence of voice recordings was set understandably low at the time the Federal Rules were formed.[15] Now that most voice recordings are digitized, the data of which recordings are composed may be more readily manipulated.[16]

[4]       The next great threat to the reliability and authenticity of recordings of the human voice is looming on the horizon. With Adobe’s introduction of Project VoCo, and with other companies beginning to create similar products that operate more quickly,[17] the courts must be made aware of the new technology and of the issues involved in preparing for its arrival.[18] The current law surrounding the admissibility of voice evidence is based on the assumption that, as with all real evidence, juries are inherently capable of assessing the weight of the evidence.[19] This premise is based on the human ability to perceive differences in visual and auditory cues.[20] Project VoCo undermines the importance placed on the ability to perceive differences in sound.[21] When voice recordings are manipulated using Project VoCo, juries will be unable to perceive an auditory difference in the recorded speech.[22]

[5]       Admissibility issues surrounding voice identification have confounded the legal process from the beginning, and the complexity encompassing voice identification continues to expand in the digital age.[23] Issues regarding vocal evidence arise even in its simplest form: when an earwitness identifies a person based on natural voice alone.[24] It could be a bank teller who was robbed by a masked man, left only to identify the perpetrator by the unique characteristics of his voice. Perhaps it could be a witness who received a threatening phone call but had no means of identifying the caller aside from the voice. How reliable could the identification of a defendant be if it is based solely on voice?[25]

[6]       This paper will first introduce Adobe’s Project VoCo and explain the power and breadth of the software. Next, the paper will address how the Federal Rules of Evidence are ill-equipped to handle Project VoCo. Then, the paper will analogize Project VoCo to other technologies and discuss how courts have adapted to those technologies. Later, the paper will discuss why a new threshold should be woven into the current Rule 901 standard for admitting voice evidence. Finally, the paper will conclude with a brief summary.

II. Project VoCo

[7]       Adobe is a software company known for Photoshop, Acrobat, Illustrator, Dreamweaver, and more than 20 other software programs.[26] Many of these programs are available for public purchase and, following in the footsteps of several other large technology companies,[27] are now available through a subscription service known as Adobe Creative Cloud.[28] These powerful programs are accessible to anyone with an internet connection and the modest capital required to subscribe.[29]

[8]       Although Photoshop’s effect on evidence is still at issue,[30] the newest threat to the legitimacy of evidence is Adobe’s Project VoCo. When Adobe unveiled its Project VoCo software in a live demonstration in November 2016, it shocked the audience.[31] On a stage in front of hundreds of engaged spectators, an Adobe representative showed the true power of the company’s newest technology.[32] VoCo is software that enables the user to make a computer read back anything the user types into it.[33] Although the underlying concept may seem familiar at first glance, this program is not akin to mere text-to-speech conversion software.[34] VoCo can take typed text and convert it into intelligible speech in the voice of anyone whose recording the VoCo user has on file.[35] Project VoCo can use a recording of a target’s voice and change one or more words in a spoken sentence, or even create novel sentences altogether.[36] More specifically, VoCo can use a 10-minute audio sample of the target speaking, and then anything the user types can be read back by the program in the target speaker’s voice.[37] The VoCo user can individually adjust each phoneme within any word in the sentence in order to create a sentence that flows as naturally as a real human statement.[38] The software also offers a variety of pronunciations for each word used.[39] The user can modify the pitch and duration of each syllable for even more accurate speech that sounds identical to the target’s natural voice.[40] Essentially, VoCo has been widely dubbed a Photoshop for the human voice.[41] As the software evolves, the length of the voice sample required for the software to function will likely shorten dramatically, and manipulating another’s voice will become increasingly simple.[42]

[9]       VoCo produces a novel issue of law. If VoCo is used by a bad actor, a voice recording may be properly identified as belonging to a party to a case, but the recording may not accurately represent anything that the party ever said.[43] Essentially, anyone with access to VoCo will be able to put words in someone’s mouth.[44] Although such misuse would require bad-faith action, once the edits to the recording have been made, it will be extremely difficult to discern any alteration of or tampering with the recording.[45]

A. The Federal Rules of Evidence Are Ill-Equipped to Handle VoCo

[10]     The Federal Rules of Evidence were enacted in 1975,[46] a time when photographs were taken on traditional film, the United States military was leaving Vietnam, and the internet had yet to be invented.[47] The future advances in vocal and photographic technologies were understandably beyond the purview of the drafters of the Rules.[48] Although several technologies that would eventually create more complex evidentiary issues had not yet been invented in 1975, the fundamental concerns regarding identification by voice recognition were nonetheless directly addressed by the Rules of Evidence.[49] Rule 901 provides that, to authenticate or identify an item of evidence, the proponent must produce evidence sufficient to support a finding that the item is what the proponent claims it is.[50] More pertinently, Rule 901(b)(5) states that an earwitness may testify to an opinion “identifying a person’s voice—whether heard firsthand or through mechanical or electronic transmission or recording—based on hearing the voice at any time under circumstances that connect it with the alleged speaker.”[51] Rule 901(b)(6) more specifically addresses evidence about telephone calls.[52] The Rule states that a telephone conversation may be authenticated by evidence that a call was made to the number assigned at the time to a “particular person, if circumstances . . . show that the person answering was the one called . . .”[53]

[11]     Under the Rules, if a proponent posits that the circumstances surrounding the call demonstrate that John Doe was on the other end of a phone call, and the proponent records the phone call, then the proponent would merely need to have someone who has heard John Doe speak identify the voice on the recording as belonging to John Doe.

[12]     In essence, Rule 901 is a specific application of Rule 104(b).[54] Rule 104(b) states: “When the relevance of evidence depends on whether a fact exists, proof must be introduced sufficient to support a finding that the fact does exist. The court may admit the proposed evidence on the condition that the proof be introduced later.”[55] This standard requires merely that a proponent provide evidence that could support a finding.[56] As a result of this low threshold, these forms of evidence are almost always admitted at trial.[57]

[13]     In practice and context, the problem with Rule 901 is that earwitnesses may testify about whether a voice sample belongs to someone, and the jury is left with the responsibility of assigning the evidence whatever weight it deems appropriate.[58] Both voice evidence and photographic evidence have extreme probative value in the eyes of juries.[59] In fact, under Rule 403, oral and visual evidence of key facts in a case can become unduly prejudicial to a party.[60] For example, courts have found that gory photos of a crime scene can tip the scale balancing prejudice and probative value too far towards the prejudicial side, and courts will typically not allow them to be shown to a jury.[61] Similarly, a graphic auditory account of a crime scene may cause voice evidence to be unduly prejudicial.[62]

[14]     Under this current scheme, VoCo will present a means by which a proponent could introduce altered evidence that satisfies the Federal Rules of Evidence while the alteration remains seemingly imperceptible.[63] Because the Rules plainly permit a witness to testify that a voice on a recording is distinct and belongs to the person in question, a sample altered by VoCo could easily be slipped into evidence.[64] VoCo could produce admissible evidence even if the opponent never actually said the words that were expressed on the recordings.[65] Further, without producing reams of metadata, it would be nearly impossible for an opponent to rebut a recording of the opponent’s own voice played aloud for the courtroom to hear.[66] Regardless of the application, the potential for abuse by using VoCo is boundless in breadth and limitless in depth.

[15]     The current Federal Rules of Evidence inadvertently provide an avenue for a party to authenticate false vocal evidence with relative ease. The Rules are outdated in this regard because, until now, manipulating a person’s speech has never been so viable a threat.[67] Until VoCo becomes more mainstream and occupies the public eye, a blindsided opponent likely would not be able to sufficiently explain how the proponent of the evidence has a “smoking-gun” voice recording of words that were never said. The evidence could be damning, and the opponent would be left without a plausible explanation for why the recording carries her voice but words she never said.

B. Examples Where Voice Evidence Could Be Easily Abused

[16]     Although voice identification evidence has both criminal and civil implications, the potential for abusing VoCo is especially great in the criminal context. Voice evidence plays a substantial role in criminal cases.[68] The entire course of a defendant’s life could be altered with some quick changes made on VoCo to a voice recording that provides a pivotal piece of evidence at trial. In order to illustrate the importance of voice evidence in the criminal context, a few cases where the outcome hinged on voice evidence are described below:

[17]     In United States v. Dionisio, the Supreme Court held that voice exemplars, when compelled, are not in violation of the Fifth Amendment privilege against self-incrimination when the exemplars are used merely for identification purposes.[69] The Court also concluded that compelled disclosures of voice exemplars in front of a grand jury are not a violation of the Fourth Amendment as an unreasonable search or seizure.[70]

[T]he Fourth Amendment provides no protection for what ‘a person knowingly exposes to the public, even in his own home or office . . . .’[71] The physical characteristics of a person’s voice, its tone and manner, as opposed to the content of a specific conversation, are constantly exposed to the public. Like a man’s facial characteristics, or handwriting, his voice is repeatedly produced for others to hear. No person can have a reasonable expectation that others will not know the sound of his voice, any more than he can reasonably expect that his face will be a mystery to the world.[72]

[18]     This case sets a pivotal foundation for VoCo because it illustrates that courts can compel a defendant to provide a voice sample in a criminal context.[73] As a result, if there is a voice recording that is fraudulently crafted using VoCo, then the court can reasonably compel a person of interest to provide a voice sample.[74] If the grand jury deems the voices to be similar enough, the grand jury can indict the defrauded defendant.[75]

[19]     In United States v. Ashers, the defendant was convicted of “accepting a bribe while employed as a classification and parole specialist” at a prison complex.[76] During the trial, the United States proved that the defendant disguised his voice while preparing a voice exemplar for a defense expert to examine.[77] The defendant’s voice was pivotal in the case because the bribery conviction was largely attributable to a recording of the defendant conversing with an inmate who was wearing a wire.[78] The district court enhanced the defendant’s offense level by two levels for obstruction of justice because the defendant intentionally falsified his voice.[79] Ultimately, the Fourth Circuit held that the judgment was proper.[80]

[20]     If VoCo had been available at the time of this case, the defendant might have been able to disguise his voice or change his words entirely by altering the recording with VoCo instead of simply attempting to disguise his voice.[81] VoCo’s transformative powers would allow an abuser of the technology to alter any recording of his voice so that it sounds completely different from his natural speech.[82]

[21]     In United States v. Basey, the Ninth Circuit held that voice identification, witness testimony regarding use of an alias, and recorded telephone conversations were sufficient evidence to identify a defendant.[83] The court explicitly stated that voice identification may be accomplished by direct or circumstantial evidence.[84] In this case, a Drug Enforcement Administration (DEA) agent was able to identify the defendant’s voice based on recorded conversations.[85] This identification, in combination with witness testimony that identified the defendant’s alias and the recorded conversations, was sufficient to identify the defendant, and the conviction was upheld.[86]

[22]     This case demonstrates how low the admissibility bar for voice identification is. Identification by one individual was sufficient to establish that the defendant was the voice on the recording.[87] With VoCo, these recordings could have been modified with or without the agent’s knowledge, and the voice would still have been identified as belonging to the defendant.[88] The identification of the voice would have been correct, but the recording would not be an accurate representation of the true conversation.[89]

[23]     In order to effectively predict how the courts will address VoCo, it is paramount to examine issues surrounding comparable software. Some may argue that the introduction of VoCo will create a slippery slope by which every opponent of a piece of voice evidence alleges that the recording was falsified using VoCo. While this concern is legitimate, the same defense could be articulated about photographic evidence and Photoshop, yet that argument does not seem to be raised often.[90]

III.  Courts Should Look to Previous Ground-Breaking Technologies in Order to Prepare for Project VoCo

[24]     Whenever a novel technology is introduced that impacts the viability of evidence, courts have to adapt.[91] Generally, this adaptation either occurs after the new technology has caused a problem, or the courts attempt to assimilate the new technology into the ill-equipped confines of an existing evidentiary schema.[92] Project VoCo is not available for public purchase at the time of this paper, but the threat looms.[93] The drafters of the Rules have the unique proactive opportunity to ensure that the Rules of Evidence are equipped to grapple with this new technology should it become available in the near future. In order to best prepare for the new technology, courts should look to how they adapted to several similar innovative technologies in the past. Specifically, courts should consider the telephone, voice modulators, and Photoshop.

A. Telephone

[25]     Alexander Graham Bell’s revolutionary invention of the telephone has impacted the use of vocal evidence in court.[94] Since the advent of the telephone, testimony based on voice recognition has become more complicated because vocal communication was made possible over long distances while providing relative clarity of voice.[95] Even though the correspondents may be miles apart, parties to a phone call are able to communicate with each other effectively.[96]

[26]     The Federal Rules of Evidence were enacted in 1975, at which point the telephone had become commonplace in American society.[97] The technology was so ubiquitous and understood that the Rules had specific criteria to address the use of telephonic evidence.[98]

[27]     The content of telephone calls and the identity of the speakers have come to play an important role in legal proceedings. An earwitness testifying about the content of a telephone conversation must be able to establish the identity of the person called.[99] To meet that standard, the voice of the caller could be cited as support for identifying a specific person on the other end of a phone call in accordance with the standard previously set forth in Rule 901(b)(5).[100]

[28]     Unlike the telephone, VoCo was not a ubiquitous technology at the time the Federal Rules were enacted.[101] In fact, VoCo is still not a ubiquitous technology today.[102] Once the technology is readily available and is possessed by the masses, the law will need to react accordingly.

B. Voice Changer 

[29]     A voice changer, or a voice modulator, is an electronic device or software program that manipulates the human voice, usually in real time.[103] Although voice changers appear less often in cases than photographic evidence, voice changers share many features with VoCo, which makes them an important analog.[104] For example, like VoCo, not all voice changers are standalone electronic devices.[105] Many of these voice changers are computer programs that can be employed to alter swaths of speech much like VoCo.[106]

[30]     Voice changers are also known as voice disguisers.[107] When you use a voice changer, your voice is changed so dramatically that close friends and relatives would not be able to discern who they are speaking to on the basis of voice.[108] They operate primarily by raising and lowering the pitch of the user’s voice.[109]

[31]     In United States v. Gilbert, a defendant was convicted of making a telephone bomb threat.[110] The district court admitted evidence that the defendant purchased a toy voice changer on the day the telephone bomb threat was made.[111] This voice changer was capable of shifting the pitch of the speaker’s voice up or down so that it was disguised.[112] The defendant made multiple threatening calls using the voice changer.[113] Ultimately, an expert was able to prove that the defendant was the one making the calls by taking the recorded messages and altering the pitch so that the defendant’s true voice shone through.[114]

[32]     Although voice changers generally shift the pitch of a natural voice, VoCo is far more powerful and contains many more features.[115] VoCo can not only edit almost any facet of recorded speech but can also generate novel speech.[116] These features make alterations made with VoCo much more difficult to detect than those made with voice changers.[117]

[33]     In Gilbert, the court handled voice changer evidence by allowing an expert to testify as to the identification of the defendant.[118] As discussed later in Part IV, courts may need to allow experts to examine voice recordings in order to determine whether they have been tampered with using VoCo.[119] The difficulty remains that VoCo uses the voice of its target to generate speech.[120] Because VoCo does not merely change the pitch of speech, but instead can substitute or generate words in the target’s voice, a voice recording expert would presumably need to look at indicators other than pitch.[121] Perhaps an expert would be able to examine metadata associated with a voice file, but the requisite analysis would undoubtedly be complex, as discussed below.[122]

C. Photoshop

[34]     Photoshop allows the user to alter images in order to “[c]reate anything you can imagine[,] [a]nywhere you are.”[123] Photoshop shook the foundations of photographic evidence when it was brought to market, because it allowed users to modify photographs in almost any way imaginable.[124] The software was first introduced in February of 1990, and has transformed the visual world ever since.[125] Photoshop began as a rudimentary digital editing program, but it has advanced greatly since its genesis.[126] Photoshop now enables a user practiced in the program to alter existing images, layer new images over other images, and remove aspects of images.[127] Although complex tasks on Photoshop require some level of experience, novice users are also able to manipulate photos in extraordinary ways.[128]

[35]     Courts grappled for years with manipulated photographs even before Photoshop, and the law in this area is still developing.[129] The ability to untraceably falsify photographic evidence has never been more accessible, and it poses a legitimate threat to justice.[130] Although the two programs operate on different electronic mediums, Photoshop and VoCo pose similar threats.

IV. The Similarities Between Photoshop and Project VoCo

[36]     Though VoCo poses unique challenges in the world of voice evidence, several parallels may be drawn between it and its sister software, Photoshop. Photoshop is capable of taking digital data in the form of photographs and manipulating it in order to produce a wholly altered and convincing image.[131] Photoshop is now so advanced that the user can start with a blank slate and create an image of whatever they desire through the manipulation of both stock and original images.[132] VoCo operates in a similar manner, but on a different medium. VoCo takes digital data in the form of vocal recordings and can manipulate it to produce a novel vocal recording of words that were never actually uttered by the original speaker.[133] VoCo is the Photoshop of soundwaves.[134] In order to effectively predict how VoCo will be addressed by the courts, an analysis of how they have dealt with Photoshop’s impact on photographic evidence will provide insight.

A. How the Courts Have Handled Photoshopped Evidence

[37]     Similar to vocal evidence, photographic evidence has been a mainstay in courts for decades.[135] As Photoshop gained traction in the market, digital manipulation of photographs became commonplace as the public began to explore and exploit the program’s functions.[136] While the import of photographic evidence is often the subject of debate, photographs continue to play a fundamental role in the legal system.[137] Few pieces of evidence are more convincing and enduring for a jury than a photograph illustrating a pivotal subject of a trial.[138] Understandably, the newfound powers of Photoshop raised evidentiary red flags amongst academics, practitioners, and courts alike.[139]

[38]     A photograph must be both relevant and authentic in order to be introduced into evidence.[140] As with vocal evidence, low hurdles such as those described above have created an avenue for falsified images to be easily introduced into evidence. Because of the weight that photographic evidence carries in the minds of jurors, having a clear and convincing image to provide to the jury can make or break a case.[141] This weight can occasionally lead a party to modify a photograph to depict a scene more clearly, or simply to misrepresent a scene in their favor through editing techniques.[142] Traditional editing techniques were far more detectable when photographs were taken using analog film and development.[143] Just as VoCo makes voice alterations achievable and difficult to detect, digital photography and Photoshop have made it nearly impossible to detect alterations made to photographs.[144]

[39]     “Because of the assumptions relating to analog informational records, under the current rules only a few quick and sketchy foundational questions are required to allow writings, photographs, and tape recordings to come into evidence as ‘authentic’—as being what they ‘purported to be.’”[145] It is sufficient to prove authenticity and relevance if a witness on the stand testifies that the photograph accurately depicts the scene as they saw it.[146] “Similar to the photograph, a party seeking to admit sound evidence need only show that the recording is an accurate reproduction of sound that was previously heard by a witness.”[147]

[40]     In an effort to combat tampering, there is an emerging practice where digital photographs are authenticated and verified via metadata.[148] Metadata is defined as “data that describes other data.”[149] Common examples of metadata include: author, date created, dates of modification, and file statistics.[150] Unfortunately, metadata can also be readily altered by actors who are only moderately technologically inclined.[151]
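The fragility of file-level metadata can be illustrated with a minimal sketch in Python. It assumes a hypothetical file named recording.wav containing placeholder bytes, and the helper names describe and backdate are illustrative only; nothing here reflects how VoCo or any forensic tool actually works. The sketch records a file's size, modification time, and a SHA-256 digest of its contents, then rewrites the modification time with a single standard-library call.

```python
import hashlib
import os
import tempfile


def describe(path):
    """Collect basic file-system metadata and a content digest for a file."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"size_bytes": st.st_size, "modified": st.st_mtime, "sha256": digest}


def backdate(path, timestamp):
    """Overwrite the access and modification times (hypothetical misuse)."""
    os.utime(path, (timestamp, timestamp))


with tempfile.TemporaryDirectory() as folder:
    # Hypothetical stand-in for an audio file; the bytes are placeholders.
    path = os.path.join(folder, "recording.wav")
    with open(path, "wb") as f:
        f.write(b"placeholder audio bytes")

    before = describe(path)
    backdate(path, 946684800)  # 2000-01-01 00:00:00 UTC
    after = describe(path)

    print("modified time changed:", before["modified"] != after["modified"])
    print("content digest unchanged:", before["sha256"] == after["sha256"])
```

In this sketch the timestamp changes while the digest of the contents does not, which suggests why a date field alone offers little assurance. A digest recorded at the time of creation would reveal later changes to the contents, but only if the digest itself were preserved somewhere the editor could not reach.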

[41]     In regard to digital photographs, metadata is a comprehensive listing of all persons who had access to the photograph, as well as digital breadcrumbs that evidence any edits or alterations made to the photo.[152] Even with metadata tracking in place, it is possible for edits to be made covertly without trace or trail.[153] Unfortunately, utilizing metadata for authenticating a digital photograph requires expert knowledge for both production and analysis.[154] As such, providing metadata for each digital photograph is proving to be impractical, unduly burdensome, and unrealistic in application.[155] Using metadata to track the access and alterations made to voice recordings would presumably pose the same host of issues as using metadata in digital photographic evidence.

[42]     Under federal law, the consequences for false declarations before a court are enumerated in 18 U.S.C. § 1623.[156] Illegal modifications to photographs would likely fall within the scope of false declarations because the modifications are being falsely represented as true.[157] Infractions under this statute carry a penalty of a fine and up to five years of imprisonment.[158] Though the consequences of defrauding the court are severe, modern photographic alteration techniques are so covert that some parties likely calculate that the risk of detection of their bad-faith action is substantially outweighed by the reward of a victory at trial. VoCo will create the same potential to defraud the court and usurp justice.

B. How Should Project VoCo Be Handled?

[43]     It is paramount that the Federal Rules of Evidence adapt. One plausible solution would be to use voice evidence experts to provide expert testimony regarding the authenticity of voice evidence. This would require the experts to satisfy the Daubert standard.[159] In Daubert, the Supreme Court held that the expert testimony requirements enumerated in the Federal Rules of Evidence superseded the old common law requirements.[160] 

[44]     Daubert was later superseded by statute as codified in Federal Rule of Evidence 702.[161] As mentioned previously, the admissibility standard for voice recordings is simply that someone familiar with the voice is able to verify who is speaking on the recording.[162] If experts were to testify about the authenticity of voice recordings, they would have to satisfy the higher Rule 702 standard in order to testify about evidence that is itself subject to a lower admissibility standard.[163]

[45]     In total, there are three potential ways for the Federal Rules of Evidence to respond to VoCo. First, the law could remain exactly the same. Second, the law could require that a metadata expert testify as to the validity and veracity of all natively digital, or retroactively digitized, voice recordings in accord with the standard described above. Third, the law could adopt a threshold standard under which a metadata expert would need to testify about the validity of a voice recording only once a threshold showing of possible tampering has been made. For the reasons discussed below, the Federal Rules of Evidence should adopt a threshold standard for Rule 901.

[46]     The first option is implausible because the Federal Rules of Evidence should not go unchanged in response to VoCo. The Rules are not equipped to handle this new technology for the reasons discussed in Part II.[164] If the Rules are not modified, the potential for manipulation of voice evidence could create a large obstruction to the administration of actual justice.

[47]     In the alternative, if metadata were required for every piece of voice evidence introduced to the courts, the courts would become clogged by the new standard for admission. The courts would likely be burdened because, as mentioned in Part II, the current admissibility threshold under Rule 901 for voice evidence is so low.[165] If experts were required to testify about the metadata for each voice recording ever used in federal court, the timeframe and costs of litigation would increase significantly.[166] This burden is especially pronounced in the criminal context because that is where a majority of voice evidence is used.[167] If the prosecution had to produce an expert to interpret and testify every time voice evidence of a defendant was introduced, the government would be heavily burdened.[168] Likewise, a defendant may need to produce an expert to testify against the validity of the metadata, which would further extend the timeline and expense of trial and which many criminal defendants simply cannot afford.[169]

[48]     For the reasons mentioned above, the Federal Rules should adopt a threshold standard. Under this threshold standard, Rule 901 should be kept as the general rule that applies to voice evidence unless the opponent of the evidence is able to demonstrate a plausibility of tampering with the evidence.[170] The opponent to the voice evidence should bear the burden to demonstrate plausibility of tampering, and if the opponent is able to demonstrate plausibility of tampering, then a Daubert standard should apply.[171] Under this standard, a metadata expert would need to demonstrate that the voice recording has not been tampered with or manipulated using VoCo or similar software.[172] After this expert testimony, the jury should be able to assess the weight of the evidence.[173]

V. Conclusion

[49]     What is preventing a voice recording created by VoCo from being introduced as authentic evidence? Under the current Federal Rule structure, almost nothing.[174] If there were a threshold showing that the proponent of voice evidence may have altered the recording, perhaps the only thing inhibiting a fabricated recording from being admitted would be an expert’s analysis of the metadata demonstrating the digital modification.[175] Proving that an opponent did not say what he was recorded saying will be a momentous task that must be tailored on a case-by-case basis. How does one prove that, despite a recording carrying one’s voice, one never said those words? The opponent must demonstrate that the voice recording is not authentic in much the same way that a Photoshopped photo is proved to be inauthentic.[176] The first step, however, is for practitioners to be aware that this technology exists and to be familiar with its function. Undoubtedly, VoCo will be further refined and released to the public in the near future.[177]

[50]     The Federal Rules of Evidence should be amended to account for the proliferation of this technology.[178] Authenticating voice recordings will become a much greater hurdle to introducing evidence once their presumption of validity is unsettled. In an ideal world, a voice recording would require accompanying metadata, similar to that of digital photographs, demonstrating that the recording is in a raw and unadulterated format. As with photographic metadata, the burden of producing metadata for each voice recording used in evidence will prove to be unreasonably cumbersome, as it would magnify the burden of discovery, and the metadata can still be unreliable.[179] This is why the Rules should adopt a threshold showing.[180] Under this threshold showing, once an opponent of the evidence is able to demonstrate plausibility that the proponent tampered with the evidence, the Rules should require experts in metadata to provide their opinion as to the validity of the voice recording.[181] Although metadata is a partial solution, it is not a perfect solution because it too can be altered.[182]

[51]     The law must adapt to reconcile the evolving possibilities of tomorrow instead of being entrenched in the antiquated shadow of yesterday. In order to prevent voice recordings from being distorted by a few keystrokes on a laptop, the law must account for potential foul play facilitated by new technologies. As novel innovations in technology proliferate, opportunities for dishonesty multiply. Project VoCo and similar technologies are coming, and they will demand change.


