
Finding the Signal in the Noise: Information Governance, Analytics, and the Future of Legal Practice


Cite as: Bennett B. Borden & Jason R. Baron, Finding the Signal in the Noise: Information Governance, Analytics, and the Future of Legal Practice, 20 Rich. J.L. & Tech. 7 (2014), http://jolt.richmond.edu/v20i2/article7.pdf.

 

Bennett B. Borden* and Jason R. Baron**

 

Introduction

[1]        In the watershed year of 2012, the world of law witnessed the first concrete discussion of how predictive analytics may be used to make legal practice more efficient.  That the conversation about the use of predictive analytics has emerged out of the e-Discovery sector of the law is not all that surprising: in the last decade and with increasing force since 2006—with the passage of revised Federal Rules of Civil Procedure that expressly took into account the fact that lawyers must confront “electronically stored information” in all its varieties—there has been a growing recognition among courts and commentators that the practice of litigation is changing dramatically.  What now needs to be recognized, however, is that the rapidly evolving tools and techniques that have been so helpful in providing efficient responses to document requests in complex litigation may be used in a variety of ways complementary to the discovery process itself.

[2]        This Article is informed by the authors’ strong views on the subject of using advanced technological strategies to be better at “information governance,” as defined herein.  If a certain evangelical strain appears to arise out of these pages, the authors willingly plead guilty.  One need not be an evangelist, however, but merely a realist to recognize that the legal world and the corporate world both are increasingly confronting the challenges and opportunities posed by “Big data.”[1]  This Article has a modest aim: to suggest certain paths forward where lawyers may add value in recommending to their clients greater use of advanced analytical techniques for the purpose of optimizing various aspects of information governance.  No attempt at comprehensiveness is aimed for here; instead, the motivation behind writing this Article is simply to take stock of where the legal profession is, as represented by the emerging case law on predictive coding represented by Da Silva Moore,[2] and to suggest that the expertise law firms have gained in this area may be applied in a variety of related contexts.

[3]        To accomplish what we are setting out to do, we will divide the discussion into the following parts: first, a synopsis of why and how predictive coding first emerged against the backdrop of e-Discovery.  This discussion will include a brief overview of predictive coding with references to the technical literature, as the subject has been recently covered exhaustively elsewhere.  Second, we will define what we mean by “Big data,” “analytics,” and “information governance,” for the purpose of providing a proper context for what follows.  Third, we will note those aspects of an information governance program that are most susceptible to the application of predictive coding and related analytical techniques.  Perhaps of most value, we wish to share a few “early” examples of where we as lawyers have brought advanced analytics, like predictive coding, to bear in non-litigation contexts and to assist our clients in creative new ways.  We fully expect that what we say here will be overrun with a multitude of real-life use cases soon to emerge in the legal space.  Armed with the knowledge that we are attempting to catch lightning in a bottle and that law reviews on subjects such as this one have ever decreasing “shelf-lives”[3] in terms of the value proposition they provide, we proceed nonetheless.

A.  The Path to Da Silva Moore

[4]        The Law of Search and Retrieval.  In the beginning, there was manual review.  Any graduate of a law school during the latter part of the twentieth century who found herself or himself employed before the year 2000 at a law firm specializing in litigation and engaged in high-stakes discovery remembers well how document review was conducted: legions of lawyers with hundreds if not thousands of boxes in warehouses, reviewing folders and pages one-by-one in an effort to find the relevant needles in the haystack.[4]  (Some of us also remember “Shepardizing” a case to find subsequent citations to it, using red and yellow booklets, before automated key-citing came along.)  Although manual review continues to remain a default practice in a variety of more modest engagements, it is increasingly the case that all of discovery involves “e-Discovery” of some sort—that the world is simply “awash in data”[5] (starting but by no means ending with email, messages, and other textual documents of all varieties), and that it will increasingly be the unusual case of any size where documents in paper form still loom large as the principal source of discovery.

[5]        At the turn of the century, the dawning awareness of the need to deal with a new realm of electronically stored information (“ESI”) led to burgeoning efforts on many fronts, including, for example, the creation of The Sedona Conference working group on electronic document retention and production, members of which drafted The Sedona Conference Principles: Addressing Electronic Document Production (2005; 2d ed. 2007) and its “prequel,” The Sedona Guidelines: Best Practice Guidelines and Commentary for Managing Records and Information in the Electronic Age (2005; 2d ed. 2007).  These early commentaries, including a smattering of pre-2006 case law,[6] recognized that changes in legal practice were necessary to accommodate the big changes coming in the world of records and information management within the enterprise.  Subsequent developments would constitute various complementary threads leading to the greater use of analytics in the legal space.

[6]        First, part of that early recognition was that in an inflationary universe of rapidly expanding amounts of ESI, new tools and techniques would be necessary for the legal profession to adapt and keep up with the times.[7]  By the time of adoption of the revised Federal Rules of Civil Procedure in 2006, which expressly added the term “ESI” to supplement “documents” in the rule set applicable to discovery practice, the legal profession was well aware of the need to perform automated searches in the form of keyword searching within large data sets as the only realistically available means for sorting information into relevant and non-relevant evidence in particular engagements, be they litigation or investigations.  So too, it was recognized early on in commentaries[8] and followed by case law[9] that keyword searching, as good a tool as it was, had profound limitations that in the end do not scale well.  At the end of the day, even being able to limit or cull down a large data set to one percent of its original size through the use of keywords leaves the lawyer with the near impossible task of manually reviewing a very large set of documents at great cost.[10]

[7]        Second, in evolving e-Discovery practice after 2006, a growing recognition also emerged that e-Discovery workflows are an “industrial” process in need of better metrics and measures for evaluating the quality of productions of large data sets.  As recognized in The Sedona Conference Commentary on Achieving Quality in E-discovery (Post-Public Comment Version 2013):

The legal profession has passed a crossroads: When faced with a choice between continuing to conduct discovery as it had “always been practiced” in a paper world—before the advent of computers, the Internet, and the exponential growth of electronically stored information (ESI)—or alternatively embracing new ways of thinking in today’s digital world, practitioners and parties acknowledged a new reality and chose progress.  But while the initial steps are completed, cost-conscious clients and over-burdened judges are increasingly demanding that parties find new approaches to solve litigation problems.[11]

 [8]        The Commentary goes on to suggest that the legal profession would benefit from greater

awareness about a variety of processes, tools, techniques, methods, and metrics that fall broadly under the umbrella term “quality measures” and that may be of assistance in handling ESI throughout the various phases of the discovery workflow process.  These include greater use of project management, sampling, machine learning, and other means to verify the accuracy and completeness of what constitutes the “output” of e-[D]iscovery.  Such collective measures, drawn from a wide variety of scientific and management disciplines, are intended only as an entry-point for further discussion, rather than an all-inclusive checklist or cookie-cutter solution to all e-[D]iscovery issues.[12]

 [9]        Indeed, more recent case law has recognized the need for quality control, including through the use of greater sampling, iterative methods, and phased productions in line with principles of proportionality.[13]  Still other case law has emphasized the need for cooperation among parties in litigation on technical subjects, especially at the margins of, or outside the range of, lawyer expertise if not basic competence.

[10]      Active or supervised “machine learning,” as used here in the context of e-Discovery, refers to a set of analytical tools and techniques that go by a variety of names, such as “predictive coding,” “computer-assisted review,” and “technology-assisted review.”  As explained in one helpful recent monograph:

Predictive coding is the process of using a smaller set of manually reviewed and coded documents as examples to build a computer generated mathematical model that is then used to predict the coding on a larger set of documents.  It is a specialized application of a class of techniques referred to as supervised machine-learning in computer science.  Other technical terms often used to describe predictive coding include document (or text) “classification” and document (or text) “categorization.”[14]

 [11]      And as stated in The Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery (Post-Public Comment Version 2013):

Generally put, computer- or technology-assisted approaches are based on iterative processes where one (or more) attorneys or [Information Retrieval] experts train the software, using document exemplars, to differentiate between relevant and non-relevant documents.  In most cases, these technologies are combined with statistical and quality assurance features that assess the quality of the results.  The research . . . has demonstrated such techniques superior, in most cases, to traditional keyword based search, and, even, in some cases, to human review.

 The computer- or technology-assisted review paradigm is the joint product of human expertise (usually an attorney or IR expert working in concert with case attorneys) and technology.  The quality of the application’s output, which is an assessment or ranking of the relevance of each document in the collection, is highly dependent on the quality of the input, that is, the human training. Best practices focus on the utilization of informed, experienced, and reliable individuals training the system.  These individuals work in close consultation with the legal team handling the matter, for engineering the application. Similarly . . . the defensibility and usability of computer- or technology-assisted review tools require the application of statistically-valid approaches to selection of a “seed” or “training” set of documents, monitoring of the training process, sampling, and quantification and verification of the results.[15] 

A discussion of the mathematical algorithms that underlie predictive coding is beyond the intended scope of this Article, but the interested reader should refer to references cited at the margin to understand better what is “going on under the hood” with respect to the mathematics involved.[16]
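For readers who would like a concrete, if greatly simplified, sense of the mechanics described above, the following sketch uses the open-source scikit-learn library to train a text classifier on a small set of hypothetical reviewer-coded exemplar documents and then rank a larger unreviewed set by predicted relevance.  It is illustrative only: the documents, labels, and workflow are invented, and commercial predictive coding tools rely on their own (often proprietary) algorithms and far more elaborate processes.

```python
# A minimal sketch of supervised learning for relevance ranking.
# Hypothetical data; commercial predictive coding products differ in
# algorithms, features, and workflow.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# A small "seed set" of documents a reviewing attorney has coded.
seed_docs = [
    "kickback payment to supplier approved by regional director",
    "quarterly safety test results within regulatory limits",
    "invoice inflated to cover rebate to distributor",
    "agenda for annual holiday party planning committee",
]
seed_labels = [1, 0, 1, 0]  # 1 = relevant, 0 = not relevant (attorney coding)

# Unreviewed collection to be ranked by the model.
unreviewed_docs = [
    "email discussing rebate structure with the same distributor",
    "cafeteria menu for the week of March 3",
    "memo on supplier payments routed outside normal approvals",
]

# Turn text into numeric features and fit a simple classifier.
vectorizer = TfidfVectorizer()
X_seed = vectorizer.fit_transform(seed_docs)
model = LogisticRegression()
model.fit(X_seed, seed_labels)

# Score the unreviewed documents and present them in ranked order,
# most likely relevant first, for attorney review.
X_new = vectorizer.transform(unreviewed_docs)
scores = model.predict_proba(X_new)[:, 1]
for score, doc in sorted(zip(scores, unreviewed_docs), reverse=True):
    print(f"{score:.2f}  {doc}")
```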

[12]      The Da Silva Moore Precedent.  The various threads in search and retrieval law, including the need for advanced search methods applied to document review in a world of increasingly large data sets, were well known by 2012.  In February 2012, drawing on recent research and scholarship emanating out of the Text Retrieval Conference (TREC) Legal Track[17] and the 2007 public comment version of The Sedona Conference Search Commentary,[18] Judge Peck approached the Da Silva Moore case as an appropriate vehicle to provide a judicial blessing for the use of predictive coding in e-Discovery.  In doing so, however, Judge Peck’s opinion may also be viewed as setting the stage for greater use of analytics generally in the information governance practice area, beyond “mere” e-Discovery.

[13]      Plaintiffs in Da Silva Moore brought claims of gender discrimination against defendant advertising conglomerate Publicis Groupe and its United States public relations subsidiary, defendant MSL Group.[19]  Prior to the February 2012 opinion issued by Judge Peck, the parties had already agreed that defendant MSL would use predictive coding to review and produce relevant documents, but disagreed on methodology.[20]  Defendant MSL proposed starting with the manual review of a random sample of documents to create a “seed set” of documents that would be used to train the predictive coding software.[21]  Plaintiffs would participate in the creation of the “seed set” of documents by offering keywords.[22] All documents reviewed during the creation of the “seed set,” relevant or irrelevant, would be provided to plaintiffs.[23]

[14]      After creation of the seed set of documents, MSL proposed using a series of “iterative rounds” to test and stabilize the training software.[24]  The results of these iterative rounds would be provided to plaintiffs, who would be able to provide feedback to further refine the searches.[25]  Judge Peck accepted MSL’s proposal.[26]  Plaintiffs filed objections with the district judge on the grounds that Judge Peck’s approval of MSL’s protocol improperly relieved MSL of its duty under Federal Rule of Civil Procedure 26(g) to certify the completeness of its document collection, and that the methodology in MSL’s protocol was not sufficiently reliable to satisfy Federal Rule of Evidence 702 and Daubert.[27]
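The “seed set plus iterative rounds” protocol described above can be understood, in the abstract, as a train-review-retrain loop: after each round, documents about which the model is least certain are routed to human reviewers, their coding is added to the training set, and the process repeats until the results stabilize.  The sketch below is not MSL’s actual protocol (whose technical details are described only at the level set out in the opinion); it simply illustrates the general pattern, with a keyword rule standing in for the human reviewer so the example is self-contained.

```python
# Illustrative train-review-retrain loop ("iterative rounds").
# The "reviewer" is simulated with a keyword rule; in practice
# attorneys supply the coding in each round.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def simulated_reviewer(doc):
    """Stand-in for attorney coding: 1 = relevant, 0 = not relevant."""
    return 1 if "rebate" in doc or "kickback" in doc else 0

collection = [
    "kickback arrangement with supplier", "holiday party agenda",
    "rebate schedule attached for review", "cafeteria menu",
    "supplier rebate approved off the books", "travel itinerary",
    "quarterly safety report", "kickback routed through distributor",
    "meeting notes on marketing plan", "rebate accrual journal entry",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(collection)

# Round 0: a small seed set coded by the (simulated) reviewer.
labels = {i: simulated_reviewer(collection[i]) for i in (0, 1)}

for round_no in range(1, 4):  # three iterative rounds
    model = LogisticRegression()
    model.fit(X[list(labels)], [labels[i] for i in labels])

    # Find the unreviewed document the model is least certain about.
    unlabeled = [i for i in range(len(collection)) if i not in labels]
    if not unlabeled:
        break
    probs = model.predict_proba(X[unlabeled])[:, 1]
    most_uncertain = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]

    # "Review" it, add it to the training set, and report the round.
    labels[most_uncertain] = simulated_reviewer(collection[most_uncertain])
    print(f"Round {round_no}: reviewed doc {most_uncertain}, "
          f"{len(labels)} documents coded so far")
```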

 [15]      Judge Peck found the plaintiffs’ objections to be misplaced and irrelevant.[28]  With respect to Federal Rule of Civil Procedure 26(g), Judge Peck commented that no attorney could certify the completeness of a document production as large as MSL’s. Moreover, Federal Rule of Civil Procedure 26(g) did not require the type of certification plaintiffs described.[29]  Further, Federal Rule of Evidence 702 and Daubert are applicable to expert methodology, not to methodologies used in electronic discovery.[30]  Judge Peck went on to note that the decision to allow computer-assisted review in this case was easy because the parties agreed to this method of document collection and review.[31]  While computer-assisted review may not be a perfect system, he found it to be more efficient and effective than using manual review and keyword searches to locate responsive documents.[32]  Use of predictive coding was appropriate in this case considering:

 (1) the parties’ agreement, (2) the vast amount of ESI to be reviewed (over three million documents), (3) the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches), (4) the need for cost effectiveness and proportionality under Rule 26(b)(2)(C), and (5) the transparent process proposed by MSL.[33]

 [16]      In issuing this opinion, Judge Peck became the first judge to approve the use of computer-assisted review.[34]  He also stressed the limitations of his opinion, stating that computer-assisted review may not be appropriate in all cases, and his opinion was not intended to endorse any particular computer-assisted review method.[35]  However, Judge Peck encouraged the Bar to consider computer-assisted review as an available tool for “large-data-volume cases” where use of such methods could save significant amounts of legal fees.[36]  Judge Peck also stressed the importance of cooperation, or what he called “strategic proactive disclosure of information.”  If counsel is knowledgeable about the client’s key custodians and fully explains proposed search methods to opposing counsel and the court, those proposed search methods are more likely to be approved.  To sum up his opinion, Judge Peck noted that “[c]ounsel no longer have to worry about being the ‘first’ or ‘guinea pig’ for judicial acceptance of computer-assisted review. . . . Computer-assisted review now can be considered judicially-approved for use in appropriate cases.”[37]  In the two years since Da Silva Moore, in addition to cases in which the parties have agreed upon a predictive coding methodology,[38] courts have confronted the issue of having to rule on either the requesting or responding party’s motion to compel a judicial “blessing” of the use of predictive coding (however termed).  In Global Aerospace,[39] the responding party asked that the court approve its own use of such a technique; in Kleen Products, the requesting party made an ultimately unsuccessful demand for a “do-over” in discovery, where the responding party had used keyword search methods and the plaintiffs were demanding that more advanced methods be tried.[40]  In the EORHB case, the court sua sponte suggested that the parties consider using predictive coding, including the same vendor.[41]  And in the In re Biomet case,[42] the court approved a predictive coding methodology over the objections of the requesting party.  These cases represent only some of the reported decisions to date, and we suspect that there will be dozens of reported cases and many more unreported ones in the near term.

[17]      As recognized in these cases (implicitly or explicitly), as well as in a growing number of commentaries,[43] predictive coding is an analytical technique holding the promise of achieving much greater efficiencies in the e-Discovery process.  Notwithstanding Da Silva Moore’s call to action, it needs to be conceded, however, that the research has not proven that active machine learning techniques will always achieve better results than keyword search or manual review.[44]  Additionally, we bow to the reality that in a large class of cases the use of predictive coding is currently infeasible or unwarranted, especially as a matter of cost.[45]

[18]      Nevertheless, it seems apparent that the legal profession finds itself in a new place—namely, in need of recognizing that artificial intelligence techniques are growing in strength from year to year—and thus it appears to be only a matter of time until a much greater percentage of complex cases involving a large magnitude of ESI will constitute good candidates for lawyers using predictive coding techniques, both as available currently and as improved with future technological progress.  As William Gibson once put it, “the future is here, it’s just not evenly distributed.”[46]

 B.  Information Governance and Analytics in the Era of Big Data

[19]      We are now in a post-Da Silva Moore, “Big data” era in which lawyers are on constructive (if not actual) notice of a world of technology-assisted review techniques available at least in the sphere of e-Discovery.  The proposition being advanced here is that the greater revelation of Da Silva Moore is how readily the techniques being put forward as best practices in e-Discovery fit a larger realm of issues familiar to lawyers, many of which fall within what is increasingly being recognized as “information governance” practice.  It is here that we can break new ground in our legal practice by recommending the use of these advanced techniques to solve real-world problems of our clients.  First, however, some definitions are in order to better frame the legal issues that will follow in Section C.

[20]      Big data.  It has been noted that “Big data is a loosely defined term used to describe data sets so large and complex that they become awkward to work with using standard statistical software.”[47]  Alternatively, “Big data” is a term that “describe[s] the technologies and techniques used to capture and utilize the exponentially increasing streams of data with the goal of bringing enterprise-wide visibility and insights to make rapid critical decisions.”[48]

[21]      The fact that the data encountered within the corporate enterprise increasingly is indeed “big” means, at least according to Gartner, that it not only has volume, but velocity and complexity as well.[49]  As Bill Franks has put it, “What this means is that you aren’t just getting a lot of data when you work with big data.  It’s also coming at you fast, it’s coming at you in complex formats, and it’s coming at you from a variety of sources.”[50]  These elements all significantly contribute to the challenge of finding signals in the noise.

[22]      These definitions seem to get us closer to what makes Big data a new and interesting phenomenon in the world: it is not its volume alone, but the fact that we are able to “mine” large data sets using new and advanced techniques to uncover unexpected relationships, patterns and categories within these data sets, that makes the field potentially exciting.  Indeed, “it is tempting to understand big data solely in terms of size. But that would be misleading. Big data is also characterized by the ability to render into data many aspects of the world that have never been quantified before; call it ‘datafication.’”[51]

[23]      Analytics.  Second, we need to place “predictive coding” as one form of active machine learning in the context of the broader realm of “analytics.”  In their book, Keeping Up With the Quants: Your Guide To Understanding and Using Analytics,[52] authors Thomas Davenport and Jinho Kim provide a useful construct in categorizing the newly emergent field of “analytics”: they define analytics to mean “the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and add value,” going on to say that “[a]nalytics is all about making sense of big data, and using it for competitive advantage.”  The authors divide the world of analytics into three categories:

(i) descriptive analytics – gathering, organizing, tabulating and depicting data;

(ii) predictive analytics – using data to predict future courses of action; and

(iii) prescriptive analytics – recommendations on future courses of action.[53]

 
[24]      To the extent that “predictive coding” has been used to date to have machines “predict” relevancy in large ESI data sets, the term comfortably can be said to fall within category (ii).   But the world of analytics is a larger universe, encompassing a greater number of mathematical magic tricks,[54] and this should be kept in mind as we choose to limit our discussion here to a few examples of how predictive coding as one form of analytics may be usefully applied in non-traditional contexts.[55]

 [25]      Corporations (much ahead of the legal profession) have rushed headlong during the past half-decade to use a variety of analytics to understand the Big data they increasingly hold, to add value, and to improve the bottom line.[56]  A 2013 AIIM study indicates that corporations find analytics to be useful in a variety of settings.[57]

[26]      Information Governance.  “Information governance,” as defined in The Sedona Conference’s recently published Commentary on the subject, means:

 an organization’s coordinated, interdisciplinary approach to satisfying information legal and compliance requirements and managing information risks while optimizing information value.  As such, Information Governance encompasses and reconciles the various legal and compliance requirements and risks addressed by different information focused disciplines, such as records and information management (“RIM”), data privacy, information security, and e-[D]iscovery.[58]

Or, as highlighted in the seminal law review article devoted to information governance written by Charles R. Ragan, who quotes Barclay Blair in defining information governance as a “new approach” that “builds upon and adapts disciplines like records management and retention, archiving, business analytics, and IT governance to create an integrated model for harnessing and controlling enterprise information . . . . [I]t is an evolutionary model that requires organizations to make real changes.”[59]

[27]      As the Sedona IG Commentary highlights, “many organizations have traditionally used siloed approaches when managing information.”[60]  The “core shortcoming” of this approach is “that those within particular silos are constrained by the culture, knowledge, and short-term goals of their business unit, administrative function, or discipline.”[61]  This leads in turn to key actors within the organization having “no knowledge of gaps and overlaps in technology or information in relation to other silos. . . .”[62]  In such situations, “[t]here is no overall governance or coordination for managing information as an asset, and there is no roadmap for the current and future use of information technology.”[63]

[28]      The Sedona IG Commentary goes on to provide eleven principles of what constitutes good IG practices, of which Principle 10 is of special relevance to our discussion here: “An organization should consider leveraging the power of new technologies in its Information Governance program.”[64]  As stated therein,

         Organizations should consider using advanced tools and technologies to perform various types of categorization and classification activities. . . such as machine learning, auto-categorization, and predictive analytics to perform multiple purposes, including (i) optimizing the governance of information for traditional RIM [records and information management]; (ii) providing more efficient and more efficacious means of accessing  information for e-discovery, compliance, and open records laws, and (iii) advancing sophisticated business intelligence across the enterprise.[65]

 With respect to the latter category, the Commentary goes on to specifically identify areas where predictive analytics may be used in compliance programs “to predict and prevent wrongful or negligent conduct that might result in data breach or loss,” as a type of “early warning system.”[66] It is precisely this latter type of conduct that we wish to primarily explore in the next section, along with a few final words on using analytics with auto-categorization for the purpose of records classification and data remediation.

C.  Applying the Lessons of E-Discovery In Using Analytics for Optimal Information Governance: Some Examples

 [29]      Advanced analytics are increasingly being used in the e-Discovery context because the legal profession has begun to realize the limitations of manual and keyword searching, while at the same time seeing how advanced techniques are at least as efficacious and far more efficient in a wide variety of substantial engagements.  But more efficient and at least as effective at doing what, precisely?  In e-Discovery, the primary information task involves separating relevant from non-relevant, and to a secondary degree, privileged from non-privileged information, in documents and ESI.  Indeed, lawyers are under a duty to make “reasonable”—not perfect—efforts to find all relevant documents within the scope of a given discovery request.[67]  The elusiveness of this quest in an exponentially expanding data universe is becoming increasingly apparent to many.[68]

[30]      Moreover, the degree of success in being able to either find or demand substantial amounts of relevant information is not (nor should it be) the fundamental goal or point of engaging in e-Discovery.[69]  Rather, the liberal discovery rules that at least U.S. lawyers operate within have as their underlying purpose the ferreting out of facts important and material to the case at hand.  The increasingly overwhelming nature of ESI poses clear technological obstacles for a lawyer seeking to develop, efficiently, the facts from all those relevant documents and to determine what happened and why.[70]  The promise of using an advanced analytical method such as predictive coding is its ability to quickly find and rank-order the most relevant documents for answering these questions.  For once we determine how something happened and why, it is relatively straightforward to figure out the parties’ respective rights, responsibilities, and even liability.  That is precisely the point of litigation, and the purpose of the Rules that govern it.[71]  And, facts drive it all.

[31]      Given our increasing ability in litigation to find the most relevant needles (i.e., facts) in the Big data haystack, it is worth considering whether similar methods may be successfully applied in non-litigation contexts.  Somewhat paradoxically, however, experience indicates that there are advantages to dealing with larger volumes of data when applying analytical tools and methods to solve corporate legal issues.  That is, while a vast amount of data residing in corporate networks and repositories admittedly poses complex information governance challenges, the volume of Big data also may be a boon to the investigator simply trying to figure out what happened.  This is the case because there are simply many more data points from which to derive facts.  One can liken the phenomenon to the difference in quality of a one-megapixel versus a ten-megapixel picture: the difference in the quality of the image is a function of the greater density of points of illumination.

[32]      Big data is more data, and more data means the potential for a more complete picture of what happened in a given situation of interest, assuming of course that the facts can be captured efficiently.  The problem is not one of volume, but of visibility.  In the era of Big data, the investigator with the more powerful analytical methods, who can search into vast repositories of ESI to draw out the facts that are critical to the question at hand, is king (or queen).  This is where the skillful application of advanced analytics to Big data can bring about some remarkable results.  The true strategic advantage of advanced analytics is the speed with which an accurate answer can be ascertained.[72]

[33]      True Life Example #1.[73]  A corporate client was sued by a former employee in a whistleblower qui tam action.[74]  Because of the False Claims Act allegations, the suit represented a significant threat to the company.  The corporation retained counsel to understand the client’s information systems as well as its key players, and to assist in the implementation of a litigation hold.  Counsel strategically targeted the data most likely to shed light on the facts.  The law firm’s Fact Development Team applied advanced analytics to 675,000 documents, and within four days knew enough to defend the client’s position that the allegations were indisputably baseless.  All of this was done before the answer to the Complaint was due.

[34]      Armed with this information, counsel for the corporation approached plaintiff’s counsel and asked to meet.  Prior to the meeting, the corporation voluntarily produced 12,500 documents that laid out the parties’ position precisely.  Counsel then met with plaintiff’s counsel and walked them through the evidence, laying out all the facts.  The case ended up being settled within days for what amounted to nuisance value based on a retaliation claim—without any discovery, and at a small fraction of the cost budgeted for the litigation.

[35]      This example indicates that the real power of advanced analytics is not merely in potentially reducing the cost of vexatious litigation, but rather the strategic advantage that comes with counsel getting to an answer quickly and accurately.  This precise strategic advantage has many applications outside of litigation, each of which involves an aspect of optimizing information governance.

[36]      Only a short step away from the direct litigation realm is using advanced analytics for investigations, either in response to a regulatory inquiry or for purely internal purposes.  As we have already seen, corporate clients are often faced with circumstances where determining whether an allegation is true, and the scope of the potential problem if it is, is critically important.  Often, management must wait, unsure of their company’s exposure and how to remediate it, while traditional investigation techniques crawl along.  However, with the skillful application of advanced analytics upon the right data set, accurate answers can be determined with remarkable speed.

[37]      True Life Example #2.  A highly regulated manufacturing client decided to outsource the function of safety testing some of its products.  A director of the department whose function was being outsourced was offered a generous severance package.  Late on a Friday afternoon, the soon-to-be former director sent an email to the company’s CEO demanding four times the severance amount and threatened to go to the company’s regulator with a list of ten supposed major violations that he described in the email if he did not receive what he was asking for.  He gave the company until the following Monday to respond.

[38]      The lawyers were called in.  They analyzed the list of allegations and determined which IT systems would most likely contain data that would prove their veracity and immediately pulled the data.  Applying advanced analytics, the law firm’s Fact Development Team analyzed on the order of 275,000 documents in thirty-six hours.  By that Monday morning, counsel was able to present a report to the company’s board indisputably proving that the allegations were unfounded.

[39]      True Life Example #3.  A major company received a whistleblower letter from a reputable third party alleging that several senior personnel were involved with an elaborate kickback scheme that also involved FCPA violations.  If true, the company would have faced serious regulatory and legal issues, as well as major internal difficulties.  Because of the extremely sensitive nature of the allegations, a traditional investigation was not possible; even knowing certain personnel were under investigation could have had immense consequences.

[40]      The lawyers were tasked with determining whether there was any information within the company’s possession that shed any light on the allegations.  If there were, the company would proceed to take whatever steps were required.  The investigation was of such a secret nature that no one was authorized to involve the internal IT staff.  Fortunately, counsel knew the company and its information systems well.  Over a weekend, they were able to pull 8.5 million documents from relevant systems using the law firm’s personnel.  This turned out to be a highly complex investigation involving a number of potential subjects, where the task involved tracking the subjects’ travel, meetings with suppliers, subsequent sales orders and fulfillments, and rebates and promotions, all across several years.

[41]      Again, applying advanced analytics, the law firm’s Fact Development Team analyzed the 8.5 million documents in ten days.  They were able to prove that the allegations were largely baseless, and to identify precisely where there were potential areas of concern.  Counsel also was able to make clear recommendations for areas of further investigation and for modifying compliance tracking and programs.  The company was able to act quickly and with certainty.  These real-life use cases illustrate how the power of analytics enhances the ability of lawyers to provide legal advice under conditions of “certainty” previously unobtainable, at least in the past few decades of the digital era.  “Certainty” is a somewhat foreign concept in the law—lawyers tend to be a conservative and caveating bunch, largely because certainty has historically been hard to come by, or at least prohibitively expensive.  With advanced analytics and good lawyers who know how to use these new tools, that is no longer necessarily the case.  There is so much data that if one cannot, after a reasonable effort, find evidence of a fact in the vastness of a company’s electronic information (as long as one has the right information), the fact most likely is not true.  As these examples illustrate, the ability to prove a negative is particularly useful in investigations.

[42]      Using advanced analytics (and good lawyering) for investigations is not that far removed from using it for litigation: one is still attempting to find the answer to the question of what happened and why. But there are many other questions that companies would like to ask of their data.  And indeed, both the analytics tools and the fact development techniques used in litigation and investigations can be “tuned” to solve a variety of novel issues facing our clients.

[43]      For example, analytics can be used to vet candidates for political appointments as well as candidates for senior leadership positions.  Because of the candid nature of the medium, access to corporate email, coupled with analytic capabilities, allows an accurate picture to be drawn before a decision is made about making a candidate your next CEO or running mate.  Analytics can also be used to analyze business divisions to identify good and bad leaders, to understand how decisions are made and why one division is more successful than another, and for many similar applications.

[44]      Quite simply, a company’s data is the digital imprint of the actions and decisions of all of its managers and employees.  Having insight into those actions and decisions can be immensely valuable.  That value has lain largely fallow, hidden in plain sight because the valuable wheat could not effectively be sifted from the chaff.  With the proper application of advanced analytics, that is no longer the case.  The answers we can obtain are limited only by the creativity of management in asking the right questions.

[45]      True Life Example #4.  Advanced analytics were used in connection with a corporate client’s major acquisition of another company.  As with most acquisitions, the client undertook traditional due diligence, gathering information from the target regarding its financial performance, customers, market share, receivables, and potential liabilities, and came up with a valuation, an appropriate multiplier, and a final purchase price.  Also as is typical, the acquisition agreement contained a provision such that if the disclosures made by the target were found to be off by a certain margin within thirty days of the acquisition, the purchase price would be adjusted.

[46]      The moment the acquisition closed, the corporate client owned all of the target’s information systems.  Having some concern about the bases for some of the target’s disclosures, at the client’s request counsel proceeded to use analytics on those newly acquired systems to determine what they could about those disclosures.  Preparing a company for sale is a complicated affair, with many people involved in gathering information to present to the acquirer to satisfy due diligence.  This gathering and presentation of information is done primarily through electronic means—and leaves a trail.

[47]      Using advanced analytics, the law firm’s Fact Development Team traced the compilation of the target’s due diligence information, including all of the discussion that went along with it.  They were able to understand the source of each disclosure, the reasonableness of its basis, and any weaknesses within it.  They uncovered disagreements within the target over such things as what the right numbers were, or how much of a liability to disclose.  Using this information, counsel prepared a claim under the adjustment provision seeking twenty-five percent of the purchase price, totaling millions of dollars.  The claim was composed primarily of quotes from the target’s own documents.  It is difficult to argue with yourself.

[48]      As demonstrated, using advanced analytics in the form of predictive coding and similar technologies can accomplish some notable aims.  But each of the prior examples uses data to look back to determine what has already occurred: the descriptive use of analytics.[75]  This is extremely valuable.  But for many of a law firm’s clients, it would be even more useful to be able to catch bad actors while the misconduct was occurring, or even to predict misconduct before it happens.

[49]      Based on the anecdotal experience gathered from many past investigations, the authors believe that certain kinds of misconduct follow certain patterns, and that when bad actors are acting badly, they tend to undertake the same kinds of actions, or to be experiencing similar circumstances.  For example, in our experience the primary factors associated with a person committing fraud are personal relationship problems, financial difficulties, drug or alcohol problems, gambling, a feeling of underappreciation at work, and unreasonable pressure to achieve a work outcome without a legitimate way to accomplish it (and so they attempt illegitimate ways to do so).  These factors are often detectable in the electronic information the subject creates.  Similarly, a person who is harassing or discriminating against others also tends to undertake specific actions and use particular language in communications.  All of these indicia of misconduct are detectable using advanced analytics and skillful strategy.

[50]      Lawyers have gotten quite good at finding this information when looking back in time.  We thought, then, that it should not be too difficult to find this information while the misconduct is unfolding, or to identify warning signs that misconduct is likely to occur, and then to alleviate certain factors where possible or to take corrective action when needed, as early as possible.  So, we put this to the test, developing Early Warning Systems (“EWS”) for some of our clients.

[51]      The idea for an EWS first occurred to one of the authors when working on a pro bono matter with the ACLU in a case against the Baltimore Police Department (“BPD”) alleging unconstitutional arrest practices in its Zero Tolerance Policing policies.[76]  As a result of the case, the BPD agreed to, among other things, implement a tracking system whereby certain data points were collected regarding police officer conduct and arrest practices that research had proven were warning signs of potential problem officers.[77]  The accumulation of certain data points with respect to an officer triggered a review of the officer’s conduct, with various remediation outcomes.[78]  We thought that a similar approach could be used for our clients.

[52]      An EWS is a tricky thing to implement, and requires careful consideration of many factors, with employee privacy at the forefront.  However, with careful planning, policy development, and training, an effective EWS can be designed and implemented.  Predictive analytics applications can be trained to search for indicia of such conduct, language, or factors across information systems.  The specific systems to be targeted depend on what is being sought and which systems are most likely to contain it, and will vary greatly from company to company.  But, when properly trained and targeted, we have found these systems to be very effective in detecting and even preventing misconduct.  We believe that this use of predictive analytics will become one of the most powerful applications of this technology in the near future.
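By way of illustration only, the sketch below shows the general shape of such a system: a classifier is trained on hypothetical exemplars of communications previously judged (in hindsight) to contain indicia of misconduct, and new messages scoring above a review threshold are routed to a compliance queue rather than acted on automatically.  Every name, message, and threshold here is invented, and, as noted above, any real deployment requires careful attention to privacy, policy, and training.

```python
# Illustrative early-warning sketch: score incoming messages against a
# classifier trained on hypothetical exemplars of past misconduct
# indicators, and queue high-scoring items for human compliance review.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical training exemplars identified in prior investigations.
exemplars = [
    "make sure this payment stays off the ledger",            # indicator
    "delete the test results before the audit",               # indicator
    "attached is the agenda for tomorrow's staff meeting",     # benign
    "reminder to submit travel expenses by Friday",            # benign
]
labels = [1, 1, 0, 0]  # 1 = misconduct indicator, 0 = benign

vectorizer = TfidfVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(exemplars), labels)

REVIEW_THRESHOLD = 0.5  # invented for illustration; tuned with counsel

def score_incoming(messages):
    """Return messages whose indicator score meets the review threshold."""
    scores = model.predict_proba(vectorizer.transform(messages))[:, 1]
    return [(m, s) for m, s in zip(messages, scores) if s >= REVIEW_THRESHOLD]

# Hypothetical daily batch of new messages.
todays_messages = [
    "let's keep this payment off the ledger for now",
    "team lunch moved to noon on Thursday",
]
for message, score in score_incoming(todays_messages):
    print(f"FLAG for compliance review ({score:.2f}): {message}")
```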

[53]      Moving from the business intelligence aspects of information governance to the arguably more prosaic field of records and information management, the authors also count themselves as true believers in the power of analytics to optimize traditional RIM (records and information management) functionality.  A full discussion of archival and records management practices in the digital age is beyond the scope of this Article, but the interested reader will find a wealth of scholarly literature in the leading journals discussing how the traditional practice of records management is being transformed in the digital age. One of the authors has argued that predictive coding and like methods are the most promising way to open up “dark archives” in the public sector, such as digital collections of data appraised as permanent records (mostly consisting of White House email at this point), that for reasons of privacy or privilege will be otherwise inaccessible to the public for many decades to come.[79]

[54]      In the authors’ experience, email archiving using auto-categorization for recordkeeping purposes is available using existing software in the marketplace.  In such instances, email is populated into specific “buckets” in a repository depending on how it is characterized, based on the position of the creator or recipient of the email, the subject matter, or some other attribute appearing as metadata.[80]  In the most advanced versions of auto-categorization software, the system “learns” as it is trained using exemplars in a seed set selected by subject matter experts (i.e., records managers or expert end users), via a protocol highly reminiscent of the methods adopted by the parties in Da Silva Moore and similar cases.  It is only a matter of time before predictive analytics is more widely used to optimize auto-classification while reducing the burden on end users to perform manual records management functions.[81]
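A highly simplified sketch of that rule-plus-training approach appears below: metadata rules route what they can, and a classifier trained on exemplars selected by records managers handles the rest.  The bucket names, rules, addresses, and exemplars are all hypothetical, and production-grade auto-categorization products are considerably more elaborate.

```python
# Illustrative auto-categorization: metadata rules first, then a text
# classifier trained on records-manager exemplars for everything else.
# Bucket names, rules, and exemplars are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Exemplars selected by records managers ("seed set"), by bucket.
exemplars = [
    ("board approved the fiscal year budget resolution", "CORPORATE-GOVERNANCE"),
    ("minutes of the audit committee meeting", "CORPORATE-GOVERNANCE"),
    ("purchase order confirmation for vendor shipment", "PROCUREMENT"),
    ("invoice received and scheduled for payment", "PROCUREMENT"),
]
texts, buckets = zip(*exemplars)

vectorizer = TfidfVectorizer()
classifier = LogisticRegression()
classifier.fit(vectorizer.fit_transform(texts), buckets)

def categorize(email):
    """Assign a records bucket using metadata rules, else the classifier."""
    # Rule: anything sent by or to the corporate secretary is governance.
    if "corporate.secretary@example.com" in (email["from"], email["to"]):
        return "CORPORATE-GOVERNANCE"
    # Otherwise, classify on subject plus body text.
    features = vectorizer.transform([email["subject"] + " " + email["body"]])
    return classifier.predict(features)[0]

sample = {
    "from": "buyer@example.com",
    "to": "vendor@example.com",
    "subject": "PO confirmation",
    "body": "please confirm the purchase order and expected shipment date",
}
print(categorize(sample))  # likely PROCUREMENT with this toy training set
```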

[55]      In similar fashion, the power of predictive analytics to reliably classify content after adequate training makes such tools optimal for data remediation efforts.  The problem of legacy data in corporations is well known, and it only grows over time with the inflationary expansion of the ESI universe.[82]  By using advanced analytics to classify low-value data, the chaos that is the reality of most shared drives and other joint data repositories may potentially be reduced by orders of magnitude.  The challenge of engaging in defensible deletion is one important aspect of optimizing information governance.[83]
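As a purely illustrative sketch of the remediation idea (and not a recommendation to delete anything automatically), the snippet below walks a hypothetical shared-drive path and flags stale files and exact duplicates as candidates for human review under a defensible-deletion policy.  The path and thresholds are invented, and content-based classification of the kind discussed above would typically be layered on top of this simple metadata triage.

```python
# Illustrative remediation triage: flag stale files and exact duplicates
# on a (hypothetical) shared drive as candidates for human review under
# a defensible-deletion policy. Nothing is deleted here.
import hashlib
import time
from pathlib import Path

SHARE_ROOT = Path("/srv/shared-drive")      # hypothetical path
STALE_AFTER_DAYS = 7 * 365                  # invented retention horizon

def file_digest(path):
    """Hash file contents to detect exact duplicates."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def remediation_candidates(root):
    if not root.exists():
        return
    seen_digests = {}
    now = time.time()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        age_days = (now - path.stat().st_mtime) / 86400
        if age_days > STALE_AFTER_DAYS:
            yield path, f"not modified in {age_days:.0f} days"
        digest = file_digest(path)
        if digest in seen_digests:
            yield path, f"exact duplicate of {seen_digests[digest]}"
        else:
            seen_digests[digest] = path

if __name__ == "__main__":
    for path, reason in remediation_candidates(SHARE_ROOT):
        print(f"REVIEW: {path} ({reason})")
```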

 

Conclusion

 [56]      As was made clear at the outset, it is the authors’ intent merely to scratch the surface of what is possible in the analytics space as applied to matters of importance for corporate information governance.  No one has a one hundred percent reliable crystal ball, but it seems evident that as computing power increases, those forms of artificial intelligence that we have referred to here as analytics will themselves only grow in importance in both our daily and professional lives.  By the end of this decade, we would be surprised if the following do not occur: pervasive use of business intelligence software; the use of more automated decision-making (also known as “operational business intelligence”); the use of alerts in the form of early warning systems, including the type described above; and much greater use of text mining and predictive technologies across a variety of domains.[84]

[57]      All of these developments dovetail with the expected demand on the part of corporate clients for lawyers to be familiar with state of the art practices in the information governance space, as already anticipated by the type of technology that Da Silva Moore and related cases suggest.  As best said in The Sedona Commentary on Achieving Quality in E-Discovery, “[i]n the end, cost-conscious firms, organizations, and institutions of all types that are intent on best practices . . . will demand that parties undertake new ways of thinking about how to solve e-[D]iscovery problems. . . .” [85]  The same holds true for the greater playing field of information governance.  Lawyers who have embraced analytics will have a leg up on their competition in this brave new space.

 


* Mr. Borden is a partner in the Commercial Litigation section at Drinker Biddle & Reath, LLP, Washington, D.C., where he serves as Chair of the Information Governance and e-Discovery Group.  He is Co-Chair of the Cloud Computing Committee and Vice Chair of the e-Discovery and Digital Evidence Committee of the Science and Technology Law Section of the ABA.  He is also a founding member of the steering committee for the Electronic Discovery Section of the District of Columbia Bar.  B.A., with highest honors, George Mason University; J.D., cum laude, Georgetown University Law School.

** Mr. Baron serves as Of Counsel in the Information Governance and e-Discovery Group, Drinker Biddle & Reath, LLP, Washington, D.C, and is on the Adjunct Faculty at the University of Maryland.  He formerly served as Director of Litigation at the National Archives and Records Administration, and is a former steering committee Co-Chair of The Sedona Conference Working Group 1 on Electronic Document Retention and Production.  B.A., magna cum laude, Wesleyan University; J.D., Boston University School of Law.  The authors wish to thank Drinker Biddle & Reath associates Amy Frenzen and Nicholas Feltham for their assistance in the drafting of this article.  The views expressed are the authors’ own and do not necessarily reflect the views of any institution, public or private, that they are affiliated with.

 

[1] See infra text accompanying notes 47-49 for a definition.

[2] Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 192 (S.D.N.Y. 2012), aff’d sub nom. Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 58742 (S.D.N.Y. Apr. 26, 2012) (Carter, J.).

[3] We recognize the paradox of articles living “forever” on the Internet, especially when published in online journals such as this one, while at the same time ever more rapidly becoming obsolete and out of date. 

[4] See generally The Sedona Conference, The Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery, 8 Sedona Conf. J. 189, 198 (2007) [hereinafter Sedona Search Commentary].

[5] Thomas H. Davenport & Jinho Kim, Keeping Up with the Quants: Your Guide To Understanding and Using Analytics 1-2 (2013).

[6] See Sedona Search Commentary, supra note 4, at 200-201 nn.16-19.

[7] See, e.g., George L. Paul & Jason R. Baron, Information Inflation: Can The Legal System Adapt?, 13 Rich. J.L. & Tech. 10, ¶ 2 (2007), http://law.richmond.edu/jolt/v13i3/article10.pdf.

[8] Id.; see Sedona Search Commentary, supra note 4, at 201-202; Mia Mazza, Emmalena K. Quesada, & Ashley L. Stenberg, In Pursuit of FRCP1: Creative Approaches to Cutting and Shifting Costs of Discovery of Electronically Stored Information, 13 Rich. J.L. & Tech. 11, ¶ 46 (2007), http://jolt.richmond.edu/v13i3/article11.pdf.

[9] See Victor Stanley v. Creative Pipe, 250 F.R.D. 251, 256-57 (D. Md. 2008); see also United States v. O’Keefe, 537 F. Supp. 2d 14, 23-24 (D.D.C. 2008); William A. Gross Const. Ass’n v Am. Mfrs. Mut. Ins. Co., 256 F.R.D. 134, 135 (S.D.N.Y. 2009); Equity Analytics, LLC v. Lundin, 248 F.R.D. 331, 333 (D.D.C. 2008); In re Seroquel Prod. Liab. Litig., 244 F.R.D. 650, 663 (M.D. Fla. 2007).  See generally Jason R. Baron, Law in the Age of Exabytes: Some Further Thoughts on ‘Information Inflation’ and Current Issues in E-Discovery Search, 17 Rich. J.L. & Tech. 9, ¶ 11 n.38 (2011), http://jolt.richmond.edu/v17i3/article9.pdf.

[10] See Paul & Baron, supra note 7, at ¶ 20; see also Bennett B. Borden, The Demise of Linear Review, Williams Mullen E-Discovery Alert, Oct. 2010, at 1, http://www.clearwellsystems.com/e-discovery-blog/wp-content/uploads/2010/12/E-Discovery_10-05-2010_Linear-Review_1.pdf.

[11] The Sedona Conference, The Sedona Conference Commentary on Achieving Quality in e-Discovery 1 (Post-Public Comment Version 2013), available at www.thesedonaconference.org/publications (for publication in 15 Sedona Conf. J. ___ (2014) (forthcoming)).

[12]Id.

[13] See, e.g., William A. Gross Constr., 256 F.R.D. at 136; Seroquel, 244 F.R.D. at 662.  See generally Bennett B. Borden et al., Four Years Later: How the 2006 Amendments to the Federal Rules Have Reshaped the E-Discovery Landscape and Are Revitalizing the Civil Justice System, 17 Rich. J.L. & Tech. 10, ¶¶ 30-37 (2011), http://jolt.richmond.edu/v17i3/article10.pdf; Ralph C. Losey, Predictive Coding and the Proportionality Doctrine: A Marriage Made in Big Data, 26 Regent U. L. Rev. 7, 53 n.189 (2013) (collecting cases on proportionality).

[14] Rajiv Maheshwari, Predictive Coding Guru’s Guide 21 (2013); see also Baron, supra note 9, at ¶ 32, n.124 (stating predictive coding and other like terminology as used by e-Discovery vendors); Maura R. Grossman & Gordon V. Cormack, The Grossman-Cormack Glossary of Technology-Assisted Review, 7 Fed. Cts. L. Rev. 1, 4 (2013), http://www.fclr.org/fclr/articles/html/2010/grossman.pdf; Nicholas M. Pace & Laura Zakaras, Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery, RAND Institute for Civil Justice 59 (2012), available at http://www.rand.org/pubs/monographs/MG1208.html (defining predictive coding).

[15] The Sedona Conference, The Sedona Conference Best Practices Commentary on the Use of Search and Information Retrieval Methods in E-Discovery (Post-Public Comment Version 2013), available at www.thesedonaconference.org/publications (for publication in 15 Sedona Conf. J. ___ (2014)).  For an excellent, in-depth discussion of how a practitioner may use predictive coding in e-Discovery, with references to experiments by the author, see Losey, supra note 13, at 9.

[16] See, e.g., Sedona Search Commentary, supra note 4, at app. 217-223 (describing various search methods); Douglas W. Oard & William Webber, Information Retrieval for E-Discovery, 7 Foundations and Trends in Information Retrieval 100 (2013), available at http://terpconnect.umd.edu/~oard/pdf/fntir13.pdf; Jason R. Baron & Jesse B. Freeman, Cooperation, Transparency, and the Rise of Support Vector Machines in E-Discovery: Issues Raised By the Need to Classify Documents as Either Responsive or Nonresponsive (2013), http://www.umiacs.umd.edu/~oard/desi5/additional/Baron-Jason-final.pdf.  For good resources in the form of information retrieval textbooks, see Gary Miner, et al., Practical Text Mining and Statistical Structured Text Data Applications (Elsevier: Amsterdam) (2012); Christopher D. Manning, Prabhakar Raghavan, & Hinrich Schutze, Introduction to Information Retrieval  (2008).

[17] See TREC Legal Track, U. Md., http://trec-legal.umiacs.umd.edu (last visited Feb. 23, 2014) (collecting Overview reports from 2006-2011) (as explained on its home page, “[t]he goal of the Legal Track at the Text Retrieval Conference (TREC) [was] to assess the ability of information retrieval techniques to meet the needs of the legal profession for tools and methods capable of helping with the retrieval of electronic business records, principally for use as evidence in civil litigation.”); see also Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review, 17 Rich. J.L. & Tech. 11, ¶¶ 3-4 (2011), http://jolt.richmond.edu/v17i3/article11.pdf; Patrick Oot, et al., Mandating Reasonableness in a Reasonable Inquiry, 87 Denv. U.L. Rev. 533, 558-559 (2010); Herbert Roitblat et al., Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, 61 J. Am. Soc’y for Info. Sci. & Tech. 70, 77-79 (2010), available at http://onlinelibrary.wiley.com/doi/10.1002/asi.21233/full; see generally Pace & Zakaras, supra note 14, at 77-80.

[18] Sedona Search Commentary, supra note 4, at 192-193.

[19] Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 183 (S.D.N.Y. 2012), aff’d sub nom. Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 58742 (S.D.N.Y. Apr. 26, 2012) (Carter, J.).

[20] Id. at 184-87.

[21] Id. at 186-87.

[22] Id. at 187.

[23] Id.

[24] Da Silva Moore, 287 F.R.D. at 187.

[25] Id.

[26] Id.

[27] Id. at 188-89.

[28] Id.

[29] Da Silva Moore, 287 F.R.D. at 188.

[30] Id. at 188-89 (citing Daubert v. Merrell Dow Pharms., 509 U.S. 579, 585 (1993)).  But cf. David J. Waxse & Brenda Yoakum-Kris, Experts on Computer-Assisted Review: Why Federal Rule of Evidence 702 Should Apply to Their Use, 52 Washburn L.J. 207, 219-23 (2013) (arguing that the Daubert standard should be applied to experts presenting evidence on ESI search and review methodologies).

[31] Id. at 189.

[32] Id. at 190-91; see Grossman & Cormack, supra note 17, at ¶ 61.

[33] Da Silva Moore, 287 F.R.D. at 192.

[34] Id. at 193.

[35] Id.

[36] Id.

[37] Id.

[38] See, e.g., In re Actos (Pioglitazone) Prods. Liab. Litig., No. 6:11-md-2299, 2012 U.S. Dist. LEXIS 187519, at *20 (W.D. La. July 27, 2012).

[39] Global Aero. Inc. v. Landow Aviation, No. CL 61040, 2012 Va. Cir. LEXIS 50, at *2 (Apr. 23, 2012).

[40] Kleen Products, LLC v. Packaging Corp., No. 10 C 5711, 2012 U.S. Dist. LEXIS 139632, at *61-63 (N.D. Ill. Sept. 28, 2012).

[41] EORHB v. HOA Holdings, Civ. Ac. No. 7409-VCL (Del. Ch. Oct. 15, 2012), 2012 WL 4896670, as amended in a subsequent order, 2013 WL 1960621 (Del. Ch. May 6, 2013).

[42] In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig., No. 3:12-MD-2391, 2013 U.S. Dist. LEXIS 84440, at *5-6, *9-10 (N.D. Ind. Apr. 18, 2013).

[43] See, e.g., Nicholas Barry, Note, Man Versus Machine Review: The Showdown Between Hordes of Discovery Lawyers and a Computer-Utilizing Predictive Coding Technology, 15 Vand. J. Ent. & Tech. L. 343, 344-345 (2013); Harrison M. Brown, Comment, Searching for an Answer: Defensible E-Discovery Search Techniques in the Absence of Judicial Voice, 16 Chap. L. Rev. 407, 407-409 (2013); Jacob Tingen, Technologies-That-Must-Not-Be-Named: Understanding and Implementing Advanced Search Technologies in E-Discovery, 19 Rich. J.L. & Tech 2, ¶ 63 (2012), http://jolt.richmond.edu/v19i1/article2.pdf.

[44] See Pace & Zakaras, supra note 14, at 61-65.

[45] Cf. Losey, supra note 13, at 68.

[46] Pagan Kennedy, William Gibson’s Future is Now, N.Y. Times (Jan. 13, 2012), www.nytimes.com/2012/01/15/books/review/distrust-that-particular-flavor-by-william-gibson-book-review.html?pagewanted=all&_r=0.

[47] Chris Snijders, Uwe Matzat, & Ulf-Dietrich Reips, “Big Data”: Big Gaps of Knowledge in the Field of Internet Science, 7 Int’l J. Internet Sci. 1 (2012), http://www.ijis.net/ijis7_1/ijis7_1_editorial.pdf.

[48] Daniel Burrus, 25 Game Changing Trends That Will Create Disruption & Opportunity (Part I), Daniel Burrus, http://www.burrus.com/2013/12/game-changing-it-trends-a-five-year-outlook-part-i/ (last visited Feb. 24, 2014).

[49] Bill Franks, Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics 5 (John Wiley & Sons, Inc. ed., 2012) (citing Stephen Prentice, CEO Advisory: ‘Big Data’ Equals Big Opportunity (2011)).

[50] Id. at 5.

[51] Kenneth Neil Cukier & Viktor Mayer-Schoenberger, The Rise of Big Data: How It’s Changing the Way We Think About the World, Council on Foreign Relations (Apr. 3, 2013), http://www.foreignaffairs.com/articles/139104/kenneth-neil-cukier-and-viktor-mayer-schoenberger/the-rise-of-big-data.

[52] Davenport & Kim, supra note 5.

[53] Id. at 3.

[54] See id. at 4-5 (providing a listing of various fields of research that make up a part of and comfortably fit within the broader term “Analytics,” including statistics, forecasting, data mining, text mining, optimization and experimental design).

[55] For additional titles in the popular literature, see Thomas H. Davenport & Jeanne G. Harris, Competing on Analytics: The New Science of Winning (2007); Franks, supra note 49; Thornton May, The New Know: Innovation Powered by Analytics (John Wiley & Sons, Inc. ed., 2009); Michael Minelli, Michele Chambers & Ambiga Dhiraj, Big Data Analytics: Emerging Business Intelligence and Analytic Trends for Today’s Businesses (John Wiley & Sons, Inc. ed., 2013); Eric Siegel, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (John Wiley & Sons, Inc. ed., 2013).

[56] See Davenport & Kim, supra note 5.

[57] See AIIM, Big Data and Content Analytics: measuring the ROI 9 (2013), available at http://www.aiim.org/Research-and-Publications/Research/Industry-Watch/Big-Data-2013.  In a questionnaire asking “What type of analysis would you like to do/already do on unstructured/semi-structured data?”, respondents identified over a dozen uses for analytics which they would consider of high value to their corporation, including: Metadata creation; Content deletion/retention/duplication; Trends/pattern analysis; Compliance breach, illegality; Fraud detection/prevention; Security re-classification/PII (personally identifiable information) detection; Predictive analysis/modeling; Data visualization; Cross relation with demographics; Incident prediction; Geo-correlation; Brand conformance; Sentiment analysis; Image/video recognition; and Diagnostic/medical.  Id.

[58] The Sedona Conference, The Sedona Conference Commentary on Information Governance 2 (2013), available at https://thesedonaconference.org/publication [hereinafter Sedona IG Commentary].

[59] Charles R. Ragan, Information Governance: It’s a Duty and It’s Smart Business, 19 Rich. J.L. & Tech. 12, ¶ 32 (2013), http://jolt.richmond.edu/v19i4/article12.pdf (internal quotation marks omitted) (quoting Barclay T. Blair, Why Information Governance, in Information Governance Executive Briefing Book, 7 (2011), available at http://mimage.opentext.com/alt_content/binary/pdf/Information-Governance-Executive-Brief-Book-OpenText.pdf).  For additional useful definitions of what constitutes information governance, see The Generally Accepted Recordkeeping Principles, ARMA Int’l, http://www.arma.org/r2/generally-accepted-br-recordkeeping-principles (last visited Feb. 24, 2014) (setting out eight principles of IG, under the headings Accountability, Integrity, Protection, Compliance, Availability, Retention, Disposition and Transparency); Debra Logan, What is Information Governance? And Why is it So Hard?, Gartner (Jan. 11, 2010), http://blogs.gartner.com/debra_logan/2010/01/11/what-is-information-governance-and-why-is-it-so-hard/ (defining IG on behalf of Gartner to be “the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of information. It includes the processes, roles, standards and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals.”).

[60] Sedona IG Commentary, supra note 58, at 5.

[61] Id.

[62] Id.

[63] Id.

[64] Id. at 25.

[65] Sedona IG Commentary, supra note 58, at 25.

[66] Id. at 27.

[67] See Pension Comm. of Univ. of Montreal Pension Plan v. Banc of Am. Sec., LLC, 685 F. Supp. 2d 456, 461 (S.D.N.Y. 2010).  The information task in e-Discovery is therefore very unlike the user experience with the leading, well-known commercial search engines on the Web in, for example, finding a place for dinner in a strange city.  For the latter project, few individuals religiously scour hundreds of pages of listings even if thousands of “hits” are obtained in response to a select set of keywords; instead they browse only from the first few pages of listings.  Yet the lawyer is tasked with making reasonable efforts to credibly retrieve “the long tail” represented by “any and all” documents in response to document requests so phrased under Federal Rule of Civil Procedure 34.

[68] See, e.g., Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182, 191 (S.D.N.Y. 2012), aff’d sub nom. Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 58742 (S.D.N.Y. Apr. 26, 2012) (Carter, J.); Pension Comm., 685 F. Supp. 2d at 461.

[69] See Bennett B. Borden et al., Why Document Review Is Broken, EDIG: E-Discovery and Information Governance, May 2011, at 1, available at http://www.umiacs.umd.edu/~oard/desi4/papers/borden.pdf.

[70] An Insider’s Look at Reducing ESI Volumes Before E-Discovery Collection, EXTERRO, http://www.exterro.com/ondemand_webcast/an-insiders-look-at-reducing-esi-volumes-before-e-discovery-collection/ (last visited Feb. 24, 2014); Andrew Bartholomew, An Insider’s Perspective on Intelligent E-Discovery, E-Discovery Beat (Sept. 11, 2013), http://www.exterro.com/e-discovery-beat/2013/09/11/an-insiders-perspective-on-intelligent-e-discovery/.

[71] See Fed. R. Civ. P. 1 (“These rules . . . should be construed and administered to secure the just, speedy, and inexpensive determination of every action and proceeding.”) (emphasis added).

[72] Borden et al., supra note 69, at 3.

[73] All of the “True Life Examples” referred to in this article are “ripped from” the pages of the author’s legal experience, without embellishment.

[74] A qui tam suit is a lawsuit brought by a “private citizen (popularly called a ‘whistle blower’) against a person or company who is believed to have violated the law in the performance of a contract with the government or in violation of a government regulation, when there is a statute which provides for a penalty for such violations.”  Qui Tam Action, The Free Dictionary, http://legal-dictionary.thefreedictionary.com/qui+tam+action (last visited Feb. 24, 2014); see also United States ex rel. Eisenstein v. City of New York, 556 U.S. 928, 932 (2009) (defining a qui tam action as a lawsuit brought by a private party alleging fraud on behalf of the government) (internal citations omitted).

[75] See Davenport & Kim, supra note 5, at 3.

[76] See Amended Complaint and Demand for Jury Trial, NAACP v. Balt. City Police Dep’t, No. 06-1863 (D. Md. Dec. 18, 2007), available at http://www.aclu-md.org/uploaded_files/0000/0205/amended_complaint.pdf.

[77] See Charles F. Wellford, Justice Assessment and Evaluation Services, First Status Report for the Audit of the Stipulation of Settlement Between the Maryland State Conference of NAACP Branches, et al. and the Baltimore City Police Department, et al. 2 (2012), available at http://www.aclu-md.org/uploaded_files/0000/0207/first_audit_report_april_30.pdf; see also Plaintiffs Win Justice in Illegal Arrests Lawsuit Settlement with the Baltimore City Police Department, ACLU (June 23, 2010), https://www.aclu.org/racial-justice/plaintiffs-win-justice-illegal-arrests-lawsuit-settlement-baltimore-city-police-depar.

[78] See Wellford, supra note 77, at 2, 14.

[79] See Jason R. Baron & Simon J. Attfield, Where Light in Darkness Lies: Preservation, Access and Sensemaking Strategies for the Modern Digital Archive, in The Memory of the World in the Digital Age Conference: Digitalization and Preservation 580-595 (2012), http://www.ciscra.org/docs/UNESCO_MOW2012_Proceedings_FINAL_ENG_Compressed.pdf.

[80] See id. at 587.

[81] See id. at 588; see also Ragan, supra note 59, at ¶ 6.

[82] See, e.g., The Sedona Conference, The Sedona Conference Commentary on Inactive Information Sources 2, 5 (2009), available at https://thesedonaconference.org/publication/The%20Sedona%20Conference®%20Commentary%20on%20Inactive%20Information%20Sources.

[83] See Sedona IG Commentary, supra note 58, at 20-22.

[84] See Davenport & Harris, supra note 55, at 176-78.

[85] The Sedona Conference, The Sedona Conference Commentary on Achieving Quality in the E-Discovery Process, 10 Sedona Conf. J. 299, 325 (2009).


Defensible Data Deletion: A Practical Approach to Reducing Cost and Managing Risk Associated with Expanding Enterprise Data

pdf_iconDownloadPDF

Cite as: Dennis R. Kiker, Defensible Data Deletion: A Practical Approach to Reducing Cost and Managing Risk Associated with Expanding Enterprise Data, 20 Rich. J.L. & Tech. 6 (2014), http://jolt.richmond.edu/v20i2/article6.pdf.

 

Dennis R. Kiker*

 

I.  Introduction

[1]        Modern businesses are hosts to steadily increasing volumes of data, creating significant cost and risk while potentially compromising the current and future performance and stability of the information systems in which the data reside.  To mitigate these costs and risks, many companies are considering initiatives to identify and eliminate information that is not needed for any business or legal purpose (a process referred to herein as “data remediation”).  There are several challenges for any such initiative, the most significant of which may be the fear that information subject to a legal preservation obligation might be destroyed.  Given the volumes of information and the practical limitations of search technology, it is simply impossible to eliminate all risk that such information might be overlooked during the identification or remediation process.  However, the law does not require that corporations eliminate “all risk.”  The law requires that corporations act reasonably and in good faith,[1] and it is entirely possible to design and execute a data remediation program that demonstrates both.   Moreover, executing a reasonable data remediation program yields more than just economic and operational benefits.  Eliminating information that has no legal or business value enables more effective and efficient identification, preservation, and production of information requested in discovery.[2]

 [2]        This Article will review the legal requirements governing data preservation in the litigation context, and will demonstrate that a company can conduct data remediation programs while complying with those legal requirements.  First, we will examine the magnitude of the information management challenge faced by companies today.  Then we will outline the legal principles associated with the preservation and disposition of information.  Finally, with that background, we will propose a framework for an effective data remediation program that demonstrates reasonableness and good faith while achieving the important business objectives of lowering cost and risk.

 

II.  The Problem: More Data Than We Want or Need

 [3]        Companies generate an enormous amount of information in the ordinary course of business.  More than a decade ago, researchers at the University of California at Berkeley School of Information Management and Systems undertook a study to estimate the amount of new information generated each year.[3]  Even ten years ago, the results were nearly beyond comprehension.  The study estimated that the worldwide production of original information as of 2002 was roughly five exabytes of data, and that the storage of new information was growing at a rate of up to 30% per year.[4]  Put in perspective, the same study estimates that five exabytes is approximately equal to all of the words ever spoken by human beings.[5]  Regardless of the precision of the study, there is little question that the volume of information, particularly electronically stored information (“ESI”) is enormous and growing at a frantic pace.  Much of that information is created by and resides in the computer and storage systems of companies.  And the timeworn adage that “storage is cheap” is simply not true when applied to large volumes of information.  Indeed, the cost of storage can be great and come from a number of different sources.

[4]        First, there is the cost of the storage media and infrastructure itself, as well as the personnel required to maintain them.  Analysts estimate the total cost to store one petabyte of information to be almost five million dollars per year.[6]  The significance of these costs is even greater when one realizes that the vast majority of the storage for which companies are currently paying is not being used for any productive purpose.  At least one survey indicates that companies could defensibly dispose of up to 70% of the electronic data currently retained.[7]

[5]        Second, there is a cost associated with keeping information that currently serves no productive business purpose.  The existence of large volumes of valueless information makes it more difficult to find information that is of use.  Numerous analysts and experts have recognized the tremendous challenge of identifying, preserving, and producing relevant information in large, unorganized data stores.[8]  As data stores increase in size, identifying particular records relevant to a specific issue becomes progressively more challenging.  One of the best things a company can do to improve its ability to preserve potentially relevant information, while also conserving corporate resources, is to eliminate information from its data stores that has no business value and is not subject to a current preservation obligation.

[6]        Eliminating information can be extremely challenging, however, due to the potential cost and complexity associated with identifying information that must be preserved to comply with existing legal obligations.  When dealing with large volumes of information, manual, item-by-item review by humans is both impractical and ineffective.  From the practical perspective, large volumes of information simply cannot be reviewed in a timely fashion with reasonable cost.  For example, consider an enterprise system containing 500 million items.  Even assuming a very aggressive review rate of 100 documents per hour, 500 million items would require five million man-hours to review.  At any hourly rate, the cost associated with such a review would be prohibitive.
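The arithmetic behind this burden is easy to reproduce.  The short calculation below restates the figures in the preceding paragraph; the fifty-dollar-per-hour reviewer rate is an illustrative assumption added here, not a figure drawn from the text or any study.

# Back-of-the-envelope estimate of the manual review burden described above.
# The item count and review rate come from the text; the hourly reviewer cost
# is a hypothetical assumption for illustration only.

ITEMS = 500_000_000      # items in the hypothetical enterprise system
REVIEW_RATE = 100        # documents reviewed per reviewer-hour (aggressive assumption)
HOURLY_COST = 50         # assumed cost per reviewer-hour, in dollars

hours = ITEMS / REVIEW_RATE          # 5,000,000 reviewer-hours
cost = hours * HOURLY_COST           # $250,000,000 at the assumed rate

print(f"Reviewer-hours required: {hours:,.0f}")
print(f"Estimated review cost:   ${cost:,.0f}")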

[7]        Even when leveraging commonly used methods of data culling to reduce the volume required for review, such as deduplication, date culling, and key word filtering, the anticipated volume would still be unwieldy: even a 90% reduction in volume would leave 50 million items to review.  Moreover, studies have long demonstrated that human reviewers are often quite inconsistent with respect to identifying “relevant” information, even when assisted by key word searches.[9]

[8]        Current scholarship also shows that human reviewers do not consistently apply the concept of relevance and that the overlap, or the measure of the percentage of agreement on the relevancy of a particular document between reviewers, can be extremely low.[10]  Counter-intuitively, the result is the same even when more “senior” review attorneys set the “gold standard” for determining relevance.[11]  Recent studies comparing technology-assisted processes with traditional human review conclude that the former can and will yield better results.  Technology can achieve both higher recall (the percentage of the total number of relevant documents in the general population that are retrieved through search) and higher precision (the percentage of retrieved documents that are, in fact, relevant) than humans can achieve using traditional methods.[12]
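Because recall and precision drive the comparison drawn in these studies, a minimal worked example may help make the two measures concrete.  The document counts below are hypothetical and chosen only for illustration; they are not drawn from any study cited in this Article.

# Illustration of recall and precision as defined in the text, using hypothetical counts.

relevant_in_population = 10_000   # relevant documents that exist in the collection
retrieved = 8_000                 # documents returned by the search or review process
relevant_retrieved = 6_000        # returned documents that are actually relevant

recall = relevant_retrieved / relevant_in_population    # share of all relevant documents found
precision = relevant_retrieved / retrieved              # share of returned documents that are relevant

print(f"Recall:    {recall:.0%}")     # 60%
print(f"Precision: {precision:.0%}")  # 75%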

[9]        There is also growing judicial acceptance of parties’ use of technology to help reduce the substantial burdens and costs associated with identifying, collecting, and reviewing ESI.  Recently, the U.S. District Court for the Southern District of New York affirmed Magistrate Judge Andrew Peck’s order approving the parties’ agreement to use “predictive coding,” a method of using specialized software to identify potentially relevant information.[13]

[10]      Likewise, a Loudoun County, Virginia Circuit Court judge recently granted a defendant’s motion for protective order allowing the use of predictive coding for document review.[14]  The defendant had a data population of 250 GB of reviewable ESI comprising as many as two million documents, which, it argued, would require 20,000 man-hours to review using traditional human review.[15]  The defendant explained that traditional methods of linear human review likely “misses on average 40% of the relevant documents, and the documents pulled by human reviewers are nearly 70% irrelevant.”[16]

[11]      Similarly, the commentary accompanying Rule 502 of the Federal Rules of Evidence indicates that using computer-assisted tools may demonstrate reasonableness in the context of privilege review: “Depending on the circumstances, a party that uses advanced analytical software applications and linguistic tools in screening for privilege may be found to have taken ‘reasonable steps’ to prevent inadvertent disclosure.”[17]

[12]      Simply put, dealing with the volume of information in most business information systems is beyond what would be humanly possible without leveraging technology.  Because such systems contain hundreds of millions of records, companies effectively have three choices for searching for data subject to a preservation obligation: they can rely on the search capabilities of the application or native operating system, they can invest in and employ third-party technology to index and search the data in its native environment, or they can export all of the data to a third-party application for processing and analysis.

 

III.  The Solution: Defensible Data Remediation

[13]      Simply adding storage and retaining the ever-increasing volume of information is not a tenable option for businesses given the cost and risk involved.  However, there are risks associated with data disposition as well, specifically that information necessary to the business or required for legal or regulatory reasons will be destroyed.  Thus, the first stage of a defensible data remediation program requires an understanding of the business and legal retention requirements applicable to the data in question.  Once these are understood, it is possible to construct a remediation framework appropriate to the repository that reflects those requirements.

 A.  Retention and Preservation Requirements

 [14]      The U.S. Supreme Court has recognized that “‘[d]ocument retention policies,’ which are created in part to keep certain information from getting into the hands of others, including the Government, are common in business.”[18]  The Court noted that compliance with a valid document retention policy is not wrongful under ordinary circumstances.[19]  Document retention policies are intended to facilitate retention of information that companies need for ongoing or historical business purposes, or as mandated by some regulatory or similar legal requirement.  Before attempting remediation of a data repository, the company must first understand and document the applicable retention and preservation requirements.

[15]      It is beyond the scope of this Article to outline all of the potential business and regulatory retention requirements.[20]  Ideally, these would be reflected in the company’s record retention schedules.  However, even when a company does not have current, up-to-date retention schedules, embarking on a data remediation exercise affords the opportunity to develop or update such schedules in the context of a specific data repository.  Most data repositories contain limited types of data.  For example, an order processing system would not contain engineering documents.  Thus, a company is generally focused on a limited number of retention requirements for any given repository.  There are exceptions to this rule, such as with e-mail systems, shared-use repositories (e.g., Microsoft SharePoint), and shared network drives.  Even then, focusing on the specific repository will enable the company to likewise focus on some limited subset of its overall record retention requirements.  Once a company has identified the business and regulatory retention requirements applicable to a given data repository, information in the repository that is not subject to those requirements is eligible for deletion unless it is subject to the duty to preserve evidence.

[16]      The modern duty to preserve derives from the common law duty to preserve evidence and is not explicitly addressed in the Federal Rules of Civil Procedure.[21]  The duty does not arise until litigation is “reasonably anticipated.”[22]  Litigation is “reasonably anticipated” when a party “knows” or “should have known” that the evidence may be relevant to current or future litigation.[23] Once litigation is reasonably anticipated, a company should establish and follow a reasonable preservation plan.[24]  Although there are no specific court-sanctioned processes for complying with the preservation duty, courts generally measure the parties’ conduct in a given case against the standards of reasonableness and good faith.[25]  In this context, a “defined policy and memorialized evidence of compliance should provide strong support if the organization is called upon to prove the reasonableness of the decision-making process.”[26]

[17]      The duty to preserve is not without limits: “[e]lectronic discovery burdens should be proportional to the amount in controversy and the nature of the case” so the high cost of electronic discovery does not “overwhelm the ability to resolve disputes fairly in litigation.”[27] Moreover, courts do not equate reasonableness with “perfection.”[28] Nor does the law require a party to take “extraordinary” measures to preserve “every e-mail” even if it is technically feasible to do so.[29]  “Rather, in accordance with existing records and information management principles, it is more rational to establish a procedure by which selected items of value can be identified and maintained as necessary to meet the organization’s legal and business needs[.]”[30]

[18]      Critical tasks in a preservation plan are the identification and documentation of key custodians and other sources of potentially relevant information.[31] Custodians identified as having potentially relevant information should generally receive a written litigation hold notice.[32]  The notice should be sent by someone occupying a position of authority within the organization to increase the likelihood of compliance.[33] The Sedona Guidelines also suggest that a hold notice is most effective when it:

 1)      Identifies the persons likely to have relevant information and communicates a preservation notice to those persons;

2)      Communicates the preservation notice in a manner that ensures the recipients will receive actual, comprehensible and effective notice of the requirement to preserve information;

3)      Is in written form;

4)      Clearly defines what information is to be preserved and how the preservation is to be undertaken; and

5)      Is regularly reviewed and reissued in either its original form or an amended form when necessary.[34]

 [19]      The legal hold should also include a mechanism for confirming that recipients received and understood the notice, for following up with custodians who do not acknowledge receipt, and for escalating the issue until it is resolved.[35]  To be effective, the legal hold should be periodically reissued to remind custodians of their obligation and to apprise them of changes required by the facts and circumstances in the litigation.[36]

[20]      Experience has also shown that legal holds that are not properly managed and ultimately released are less likely to receive the appropriate level of attention by employees. Thus, the legal hold process should also include a means for determining when litigation is no longer reasonably anticipated and the hold can be released, while ensuring that information relevant to another active matter is preserved.[37]

 B.  The Remediation Framework

 [21]      Against this backdrop, it is possible to outline a framework for data remediation that is compliant with legal preservation requirements.  The following describes a high-level data remediation process that can be applied to virtually any data environment and any risk tolerance profile.  The general process is described in Figure 1 below:


Figure 1: Data Remediation Framework

1.  Assemble the Team

[22]      A successful data remediation project depends on invested participation by at least three constituents in the organization: legal, information technology (“IT”), and records and information management (“RIM”).  In addition, the project may require support from experts experienced in information search and retrieval and in statistical analysis.  In-house and/or outside counsel provides legal oversight and risk assessment for the project team, as well as guidance on legal preservation obligations.  IT provides the technological expertise necessary to understand the structure and capabilities of the target data repository.  RIM professionals provide guidance on business and regulatory retention obligations.  The need for information search and retrieval experts and statisticians depends on the complexity of the data remediation effort as described below.  Finally, it may be necessary to include business users of the information to fully document the retention requirements applicable to a particular repository if those requirements are not adequately documented in the organization’s document retention policy and schedule.

 2.  Select Target Data Repository

 [23]      Selecting the target data repository requires consideration of the costs and benefits of the data remediation exercise.  Each type of repository presents unique opportunities and challenges.  For example, e-mail systems, whether traditional or archived, are notorious for containing vast amounts of information that is not needed for any business or legal purpose.  Similarly, shared network drives tend to contain large volumes of unused and unneeded information.  Backup tapes, legacy systems, and even structured databases are other possible targets.  IT and RIM resources are invaluable in identifying a suitable target repository.  For example, IT can often run reports identifying directories and files that have not been accessed recently.

 3.  Document Retention and Preservation Obligations

[24]      As discussed above, it is critical to understand the retention and preservation obligations that are applicable to the data contained in the target repository.  Retention obligations include the business information needs as well as any regulatory requirements mandating the preservation of data.  Ideally, these are incorporated into the document retention policy and schedule for the organization.  If not, it will be important to document those requirements applicable to the target repository.

[25]      Preservation obligations are driven by existing and reasonably anticipated litigation.[38]  In some cases this may be the most challenging part of the project, particularly for highly litigious companies, because, unlike business needs and regulatory requirements, preservation obligations are constantly changing as new matters arise and circumstances evolve in existing matters.  Successful completion of the remediation project will require a detailed understanding of, and constant attention to, the preservation obligations applicable to the target repository.  As discussed below, some of the risk associated with this aspect of the project can be ameliorated through selection of the appropriate repository and culling criteria.  Nevertheless, the scope and timing of the project will be driven in large part by the preservation obligations applicable to the target repository.

4.  Inventory Target Data Repository

[26]      After selecting the target data repository, the team must inventory the information within that repository.  This does not involve creating an exhaustive list or catalog of every item within the repository.  Rather, inventorying the repository involves developing a good understanding of the types of information contained there, the date ranges of the information, and other criteria that will enable the team to identify which information must be retained and which can be deleted.  The details of the inventory will vary by data repository.  For example, for an e-mail server, the pertinent criteria may include only date ranges and custodians, whereas for a shared network drive, the pertinent criteria may include departments and individuals with access, date ranges, and file types.

 5.  Gross Culling

 [27]      The next step is to determine the “gross culling” criteria for the data repository.  In this context, “gross culling” refers to an initial phase of data culling based on broad criteria as opposed to fine or detailed culling criteria that may be used in a later phase of the exercise.[39]  The nature of the information contained within the repository will determine the specific criteria to be used, but the objective is to locate the “low-hanging fruit,” the items within the repository that can be readily identified as not falling within any retention or preservation obligation. These are black-and-white decisions where the remediation team can definitively determine without further analysis that the items identified can be deleted.

[28]      For example, in most cases, dates are effective gross culling criteria.  Quite often, large volumes of e-mail and loose files (data retained in shared network drives or other unstructured storage) predate any existing retention or preservation obligation for such items.  Similarly, in repositories that are subject to short or no retention guidelines, the business need for the data can be evaluated in terms of the date last accessed.  In the case of shared network drives, for example, it is not uncommon to find large volumes of information that has not been accessed by any user in many years.[40]  Such information can be disposed of with very little risk.
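As a concrete illustration of date-based gross culling, the sketch below flags files on a shared network drive that have not been touched in several years.  The directory path, the seven-year cutoff, and the decision to report rather than delete are all assumptions made for the example; an actual project would apply the retention and preservation criteria documented by the remediation team.

# Minimal sketch: identify files on a shared drive whose last-access (or, failing that,
# last-modified) date predates a cutoff.  Path and cutoff are hypothetical; identified
# items are reported for review, not deleted.

import time
from pathlib import Path

SHARE_ROOT = Path("/mnt/shared_drive")          # hypothetical target repository
CUTOFF_YEARS = 7
cutoff = time.time() - CUTOFF_YEARS * 365 * 24 * 3600

candidates = []
for path in SHARE_ROOT.rglob("*"):
    if path.is_file():
        stat = path.stat()
        # Some file systems do not track access times reliably, so fall back to mtime.
        last_touched = max(stat.st_atime, stat.st_mtime)
        if last_touched < cutoff:
            candidates.append((str(path), time.ctime(last_touched)))

print(f"{len(candidates)} files untouched for {CUTOFF_YEARS}+ years (candidates for disposition review)")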

 6.  Fine Culling

 [29]      Sometimes, the process need go no further than the gross culling stage.  Depending on the volume of data deleted and the volume and nature of the data remaining, the remediation team may determine that the cost and benefit of attempting further culling of the data are not worth the effort and risk.  In some cases, however, gross culling techniques will not identify sufficient volumes of unneeded data and more sophisticated culling strategies must be employed.

[30]      The precise culling technique and strategy will depend on the specific data repository, its native search capabilities, and the availability of other search tools.  For example, many modern e-mail archiving systems have fairly sophisticated native search capabilities that can locate with a high degree of accuracy content pertinent to selected criteria.  Other systems will require the use of third-party technology.  In either case, the fine culling process will require selection of culling criteria that will uniquely identify items not subject to a retention or preservation obligation and be susceptible to verification.  Depending on the nature of the data and the complexity of the necessary search criteria, the remediation team may need to engage an expert in information search and retrieval.
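The following sketch illustrates what a set of fine culling criteria might look like when applied to an inventory of e-mail metadata.  The field names, hold custodians, date cutoff, and keywords are hypothetical; in practice the criteria would be drawn from the documented retention schedule and the active legal holds identified earlier in the process.

# Hypothetical fine-culling pass over e-mail metadata records.  An item is eligible for
# deletion only if no retention or preservation criterion applies to it.

from datetime import date

HOLD_CUSTODIANS = {"jsmith", "alee"}           # custodians subject to an active legal hold
RETENTION_CUTOFF = date(2007, 1, 1)            # items on or after this date are retained
HOLD_KEYWORDS = {"project falcon", "recall"}   # terms tied to preserved subject matter

def eligible_for_deletion(item: dict) -> bool:
    if item["custodian"] in HOLD_CUSTODIANS:
        return False
    if item["sent_date"] >= RETENTION_CUTOFF:
        return False
    text = (item["subject"] + " " + item["body"]).lower()
    return not any(keyword in text for keyword in HOLD_KEYWORDS)

inventory = [
    {"custodian": "bdoe", "sent_date": date(2004, 3, 2),
     "subject": "Lunch Friday?", "body": "See you at noon."},
    {"custodian": "jsmith", "sent_date": date(2004, 5, 9),
     "subject": "Budget", "body": "Numbers attached."},
]

to_delete = [item for item in inventory if eligible_for_deletion(item)]
print(f"{len(to_delete)} of {len(inventory)} items eligible for deletion")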

 7.  Sampling and Statistical Analysis

[31]      Regardless of the specific fine culling strategy employed, the remediation team should validate the results by sampling and analysis to ensure defensibility.  Generally, it will be advisable to engage a statistician to direct the sampling effort and perform the analysis because both can be quite complex and rife with opportunity for error.[41]  Moreover, in the event that the company’s process is ever challenged, validation by an independent expert is compelling evidence of good faith.  It is important to realize that the statistical analysis cannot demonstrate that no items subject to a preservation obligation are included in the data to be destroyed.  It can only identify the probability that this is the case, but it can do so with remarkable precision when properly performed.[42]
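A simple way to picture the validation step is the sketch below: draw a random sample from the set identified for deletion, have reviewers check each sampled item, and compute a bound on the likely error rate.  The sample size, the placeholder review function, and the use of the “rule of three” approximation are illustrative assumptions only; a statistician would select the actual sampling design and analysis.

# Sketch of sampling-based validation of a culling result.  The review function is a
# placeholder for human review of each sampled item; all numbers are hypothetical.

import random

def review_item(item_id: int) -> bool:
    """Placeholder: True if a reviewer determines the item must be preserved."""
    return False   # hypothetical result: no preservation-obligated items found

DELETION_SET_SIZE = 50_000_000
SAMPLE_SIZE = 400

sample = random.sample(range(DELETION_SET_SIZE), SAMPLE_SIZE)
errors = sum(review_item(item_id) for item_id in sample)

if errors == 0:
    # "Rule of three": with zero errors observed in n sampled items, an approximate
    # 95% upper bound on the true error rate is 3/n.
    print(f"0 errors in {SAMPLE_SIZE} items; ~95% confidence the error rate is below {3 / SAMPLE_SIZE:.2%}")
else:
    print(f"Observed error rate {errors / SAMPLE_SIZE:.2%}; revisit culling criteria before deletion")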

 8.  Iteration

[32]      Fine culling and validation should continue until the remediation team achieves results that meet its expectations regarding the volume of data identified for deletion and the probability that only data not subject to a preservation obligation are included in the result set.

 

 IV.  Conclusion

[33]      The enormity of the challenge that expanding volumes of unneeded information creates for businesses is difficult to overstate.  Companies literally spend millions of dollars annually to store and maintain information that serves no useful purpose, funds that could be directed to productive uses such as hiring, research, and investment.  Confronting this challenge, on the other hand, can seem daunting in its own right, perhaps due more to the fear of adverse consequences in litigation than any other factor.  It is possible, however, to develop a defensible data remediation process that enables a company to demonstrate good faith and reasonableness while eliminating the cost, waste, and risk of this unnecessary data.

 


* Dennis Kiker has been a partner in a national law firm, director of professional services at a major e-Discovery company, and a founding shareholder of his own law firm. He has served as national discovery counsel for one of the largest manufacturing companies in the country, and counseled many others on discovery and information governance-related issues. He is a Martindale-Hubbell AV-rated attorney admitted at various times to practice in Virginia, Arizona and Florida, and holds a J.D., magna cum laude & Order of the Coif from the University of Michigan Law School.  Dennis is currently a consultant at Granite Legal Systems, Inc. in Houston, Texas.

 

[1] See The Sedona Conference, The Sedona Principles: Second Edition Best Practices Recommendations & Principles for Addressing Electronic Document Production  28 (Jonathan M. Redgrave et al. eds., 2007) [hereinafter “The Sedona Principles”], available at http://www.sos.mt.gov/Records/committees/erim_resources/A%20-%20Sedona%20Principles%20Second%20Edition.pdf (last visited Jan. 30, 2014); see also Louis R. Pepe & Jared Cohane, Document Retention, Electronic Discovery, E-Discovery Cost Allocation, and Spoliation Evidence: The Four Horsemen of the Apocalypse of Litigation Today, 80 Conn. B. J. 331, 348 (2006) (explaining how proposed Rule 37(f) addresses the routine alteration and deletion of electronically stored information during ordinary use).

[2] See The Sedona Principles, supra note 1, at 12.

[3] See Peter Lyman & Hal R. Varian, How Much Information 2003?, http://www.sims.berkeley.edu/research/projects/how-much-info-2003/ (last visited Feb. 9, 2014).

[4] Id.

[5] See id.

[6] Jake Frazier, Hoarders: The Corporate Edition, Business Computing World  (Sept. 25, 2013), http://www.businesscomputingworld.co.uk/hoarders-the-corporate-edition/.

[7] Id.

[8] See James Dertouzos et al., Rand Inst. for Civil Justice, The Legal and Economic Implications of E-Discovery: Options for Future Research ix (2008), available at http://www.rand.org/content/dam/rand/pubs/occasional_papers/2008/RAND_OP183.pdf; see also Robert Blumberg & Shaku Atre, The Problem with Unstructured Data, Info. Mgmt. (Feb. 1, 2003, 1:00 AM), http://soquelgroup.com/Articles/dmreview_0203_problem.pdf; The Radicati Group, Taming the Growth of Email: An ROI Analysis 3-4 (2005), available at http://www.radicati.com/wp/wp-content/uploads/2008/09/hp_whitepaper.pdf.

[9] See David C. Blair & M.E. Maron, An Evaluation of Retrieval Effectiveness for a Full-Text Document Retrieval System, Comm. ACM, March 1985, at 289-90, 295-96.

[10] See Ellen M. Voorhees, Variations in Relevance Judgments and the Measurement of Retrieval Effectiveness, 36 Info. Processing & Mgmt. 697, 701 (2000), available at http://www.cs.cornell.edu/courses/cs430/2006fa/cache/Trec_8.pdf (finding that relevance is not a consistently applied concept between independent reviewers).  See generally Herbert L. Roitblat et al., Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review, 61 J. Am. Soc’y for Info. Sci. & Tech. 70, 77 (2010).

[11] See Voorhees, supra note 10, at 701 (finding that the “overlap” between even senior reviewers shows that they disagree as often as they agree on relevance).

[12]  See generally Maura R. Grossman & Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, 17 Rich. J.L. & Tech. 11, ¶ 2 (2011), http://jolt.richmond.edu/v17i3/article11.pdf (analyzing data from the TREC 2009 Legal Track Interactive Task Initiative).

[13] See Moore v. Publicis Groupe SA, No. 11 Civ. 1279(ALC)(AJP), 2012 WL 1446534, at *1-3 (S.D.N.Y. Apr. 26, 2012).

[14] See Global Aerospace, Inc. v. Landow Aviation, L.P., No. CL 61040, 2012 Va. Cir. LEXIS 50, at *2 (Va. Cir. Ct. Apr. 23, 2012).

[15] See Mem. in Supp. of Mot. for Protective Order Approving the Use of Predictive Coding at 4-5, Global Aerospace, Inc. v. Landow Aviation, L.P., No. CL 61040, 2012 Va. Cir. LEXIS 50 (Va. Cir. Ct. Apr. 9, 2012).

[16] Id. at 6-7.

[17] Fed. R. Evid. 502(b) Advisory Committee’s Notes, Subdivision (b) (2007).

[18] Arthur Andersen LLP v. United States, 544 U.S. 696, 704 (2005).

[19] Id.; see Managed Care Solutions, Inc. v. Essent Healthcare, 736 F. Supp. 2d 1317, 1326 (S.D. Fla. 2010) (rejecting plaintiffs’ argument that a company policy that e-mail data be deleted after 13 months was unreasonable) (citing Wilson v. Wal-Mart Stores, Inc., No. 5:07-cv-394-Oc-10GRJ, 2008 WL 4642596, at *2 (M.D. Fla. Oct. 17, 2008); Floeter v. City of Orlando, No. 6:05-CV-400-Orl-22KRS, 2007 WL 486633, at *7 (M.D. Fla. Feb. 9, 2007)).  But see Day v. LSI Corp., No. CIV 11–186–TUC–CKJ, 2012 WL 6674434, at *16 (D. Ariz. Dec. 20, 2012) (finding evidence of defendant’s failure to follow its own document policy was a factor in entering default judgment sanction for spoliation).

[20] For purposes of this article, such laws and regulations are treated as retention requirements with which a business must comply in the ordinary course of business.  This article focuses on the requirement to exempt records from “ordinary course” retention requirements due to a duty to preserve the records when litigation is reasonably anticipated.  In short, this article relies on the distinction between retention of information and preservation of information, focusing on the latter.  See infra text accompanying note 23.

[21] See Silvestri v. Gen. Motors Corp., 271 F.3d 583, 590 (4th Cir. 2001); see also Victor Stanley, Inc. v. Creative Pipe, Inc., 269 F.R.D. 497, 519 (D. Md. 2010).

[22] See Cache la Poudre Feeds v. Land O’Lakes, 244 F.R.D. 614, 621, 623 (D. Colo. 2007); see also The Sedona Principles, supra note 1, at 14.

[23] See Pension Comm. of the Univ. of Montreal Pension Plan v. Banc of Am. Sec., LLC, 685 F. Supp. 2d 456, 466 (S.D.N.Y. Jan. 15, 2010 as amended May 28, 2010); Rimkus Consulting Grp., Inc. v. Cammarata, 688 F. Supp. 2d 598, 612-13 (S.D. Tex. 2010);  Zubulake v. UBS Warburg LLC, 220 F.R.D. 212, 216 (S.D.N.Y. 2003) (Zubulake IV); see also The Sedona Conference, Commentary on Legal Holds: The Trigger & The Process, 11 Sedona Conf. J. 265, 269 (2010) [hereinafter “Commentary on Legal Holds”].

[24] Commentary on Legal Holds, supra note 23, at 269 (“Adopting and consistently following a policy or practice governing an organization’s preservation obligations are factors that may demonstrate reasonableness and good faith.”); see The Sedona Principles, supra note 1, at 12.

[25] Commentary on Legal Holds, supra note 23, at 270 (stating that an organization’s preservation decisions should be based on a good faith and reasonable evaluation of relevant facts and circumstances).

[26] Id. at 274.

[27] Rimkus Consulting, 688 F. Supp. 2d at 613 n.8 (quoting The Sedona Principles, supra note 1, at 17); see also Stanley v. Creative Pipe, Inc., 269 F.R.D. 497, 523 (D. Md. 2010); Commentary on Legal Holds, supra note 23, at 270.

[28] Pension Comm., 685 F. Supp. 2d at 461 (“Courts cannot and do not expect that any party can meet a standard of perfection.”).

[29] See The Sedona Principles, supra note 1, at 28, 30 (citing Concord Boat Corp. v. Brunswick Corp., No. LR-C-95-781, 1997 WL 33352759, at *4 (E.D. Ark. Aug. 29, 1997)).

[30] The Sedona Principles, supra note 1, at 15.

[31] See Commentary on Legal Holds, supra note 23, at 270; id. at 28.

[32] See Pension Comm., 685 F. Supp. 2d at 465; see also Commentary on Legal Holds, supra note 23, at 270.

[33] The Sedona Principles, supra note 1, at 32.

[34] Commentary on Legal Holds, supra note 23, at 270.

[35] Id. at 283-85.

[36] See id. at 285.

[37] Id. at 287.

[38] See supra ¶ 16.

[39] See Alex Vorro, How to Reduce Worthless Data, InsideCounsel (Mar. 1, 2012), http://www.insidecounsel.com/2012/03/01/how-to-reduce-worthless-data?t=technology.

[40] See, e.g., Anne Kershaw, Hoarding Data Wastes Money, Baseline (Apr. 16, 2012), http://www.baselinemag.com/storage/Hoarding-Data-Wastes-Money/ (80% of the data on shared network and local hard drives has not been accessed in three to five years).

[41] Statistical sampling results can be as valid using a small random sample as they are using a larger one because, in a simple random sample of any given size, all items have an equal probability of being selected for the statistical assessment.  In fact, to achieve a confidence level of 95% with a margin of error of 5%, a sample size of 384 would be sufficient for a population of 300 million.  See Sample Size Table, Research Advisors, http://research-advisors.com/tools/SampleSize.htm (last visited Jan. 12, 2014) (citing Robert V. Krejcie & Daryle W. Morgan, Determining Sample Size for Research Activities, 30 Educ. & Psychol. Measurement 607, 607-10 (1970)).  However, samples remain vulnerable to “sampling error” because the randomness of the selection may result in a sample that does not reflect the makeup of the overall population.  For instance, a simple random sample of ten messages drawn from a population evenly split between messages with and without attachments will on average produce five of each, but any given sample may over-represent one message type (e.g., those with attachments) and under-represent the other (e.g., those without).
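For readers who want to verify the sample-size figure cited in this note, the short calculation below applies the standard formula for estimating a proportion at a 95% confidence level with a ±5% margin of error; whether the result is reported as 384 or 385 depends only on how the z-value and the final figure are rounded.

# Required simple random sample for a 95% confidence level and a +/-5% margin of error,
# using the worst-case proportion p = 0.5.  The population size barely matters once it
# is large.

import math

z = 1.96      # z-score for 95% confidence
p = 0.5       # worst-case proportion (maximizes the required sample)
e = 0.05      # margin of error

n_unlimited = (z ** 2) * p * (1 - p) / (e ** 2)       # about 384.2
print(math.ceil(n_unlimited))                         # 385

# Finite-population correction for a population of 300 million changes almost nothing.
N = 300_000_000
n_adjusted = n_unlimited / (1 + (n_unlimited - 1) / N)
print(math.ceil(n_adjusted))                          # still 385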

[42] See, e.g., Statistics, Wikipedia, http://en.wikipedia.org/wiki/Statistics (last visited Feb. 9, 2014).


Blog: Banned from the Web: Is the Internet Really a Human Right?

by Catherine M. Gray, Associate Manuscripts Editor

 

Just before Valentine’s Day, a Racine County Circuit Court judge banned a Wisconsin resident from using the Internet for thirty months.1 Jason Willis, a thirty-one year old resident of Waterford, created a Craigslist ad requesting “nude male suitors” using his neighbor’s picture and address.2 As one might imagine, Willis’ neighbor “Dawn” was shocked when several men arrived at her door, one wearing only a trench coat.3 In banning Willis from the Internet, Judge Allen Torhost declared, “[i]f you want to drive drunk, you’re not allowed to drive. To me, a public availability of the internet [sic]—to use it the way he did—is unconscionable.”4

But did Judge Torhost’s ruling violate Willis’ human rights? A 2011 United Nations report declared “that disconnecting people from the internet [sic] is a human rights violation and against international law.”5 While the report focused on the United Kingdom’s and France’s efforts to block individuals accused of illegal file sharing and on countries that would block Internet access to “quell political unrest,” it calls into question whether Judge Torhost’s decision in some way violated Willis’ fundamental rights in modern society.6 In the United States, courts higher than Judge Torhost’s have declared that banning an individual from the Internet is an appropriate remedy.7 Nevertheless, there is a question of whether, in a society and work industry so intrinsically linked to the Internet, banning an individual from being online constitutes a human rights violation. Will the individual be able to secure and maintain employment? Does the answer to that question really matter?

Despite the United Nations’ assertion that the Internet is a human right, I’m inclined to agree with Judge Torhost and the Eleventh Circuit here. Recently, Miranda Barbour claimed to have killed more than twenty individuals across the United States, luring them to their deaths through Craigslist.8 The use of the Internet for the purposes of harassment, child pornography, and even murder does much to counter the U.N.’s argument for a human right to the World Wide Web. In these cases, I see little wrong with banning convicted offenders from using the Internet, even permanently. Part of being a productive, contributing member of society is acting responsibly, and Judge Torhost got it right when he likened use of the Internet to driving. The Internet, like driving, is a privilege, not a right, and abuse of a privilege means it’s revoked.

 

 

1 Cody Holyoke, Judge: Waterford man ‘banned from Internet,’ Today’s TMJ4 (Feb. 11, 2014), http://www.jrn.com/tmj4/news/Judge-Waterford-man-banned-from-Internet-245097211.html.

2 Taylor Berman, Man Banned From the Internet for Sending Naked Men to Neighbor’s Home, Gawker (Feb. 13, 2014, 10:08 AM), http://gawker.com/man-banned-from-the-internet-for-sending-naked-men-to-1522072855.

3 Id.

4 Id. 

5 David Kravets, U.N. Report Declares Internet Access a Human Right, Wired (June 2, 2011, 2:47 PM), http://www.wired.com/threatlevel/2011/06/internet-a-human-right/.

6 Id.; see also Greg Sandoval, U.K. embraces ‘three strikes’ for illegal file sharing, CNET (Apr. 8, 2010, 8:35 AM), http://news.cnet.com/8301-31001_3-20002018-261.html.

7 United States v. Dove, 343 F. App’x 428, 430 (11th Cir. 2009) (upholding the defendant’s lifetime ban from the Internet as a condition of supervised release following a conviction for “traveling in interstate commerce with intent to engage in illicit conduct with a person under the age of 18 years, in violation of 18 U.S.C. § 2423(b), (f)”); see also David Kravets, U.S. Courts Split on Internet Bans, Wired (Jan. 12, 2010, 5:11 PM), http://www.wired.com/threatlevel/2010/01/courts-split-on-internet-bans/. 

8 John Bacon, Accused Craigslist killer claims more slayings, USA Today (Feb. 17, 2014, 12:32 PM), http://www.usatoday.com/story/news/nation/2014/02/16/pa-barbour-craigslist-murders-interview/5526113/.

 

 

Blog: Facebook’s Continuing Privacy Policy Battle: Parents Concerned Over the Use of Children’s Information in Advertisements

By: Jasmine McKinney, Associate Manuscripts Editor   

By now, most of us probably know that Facebook is no stranger to lawsuits.  Just two years ago, Facebook was involved in a class-action lawsuit in which the social networking giant was found to have used members’ images without their permission in advertisements commonly called “sponsored stories.”[1]  Facebook users involved in the suit reported that Facebook had used their personal images shared on the website for various commercial activities.[2]  Though at the time users had no choice to opt out of being featured in sponsored stories, they could configure their privacy settings to share information with only certain people.[3]  The suit, which was settled in 2012, cost Facebook a whopping $10 million.[4]

 

Fast-forward to today and throngs of people are still up in arms over Facebook’s lackadaisical privacy policies.  The settlement has yet to take effect due to the large number of appeals making their way through court.  However, under the settlement, Facebook agreed to change its privacy policies so that Facebook users between the ages of thirteen and eighteen could indicate whether their parents were also Facebook users and give the parents control over the use of their children’s ‘likes’ and comments for advertising purposes.[5]  For children whose parents were not Facebook users, the site promised to opt those children out of social advertising until age eighteen.[6]  Still, many are unsatisfied.

 

Now, a group of parents as well as child advocacy and privacy groups are asking a federal appeals court to throw out the 2012 settlement with Facebook over the website’s use of children’s images in these advertisements or sponsored stories.[7]  The plaintiffs and other public interest groups claim that despite the 2012 action, Facebook still uses children’s images without the proper authorization.[8]  Though parents have the ability to ask Facebook to remove an image used in advertisements on the site, many still believe Facebook should ask for permission first.[9]  Plaintiffs in the case claim that Facebook’s practices continue to violate laws prohibiting the use of a minor’s image without parental permission in several states, including California, Florida, New York, Oklahoma, Tennessee, Virginia, and Wisconsin.[10]

 

Scott Michelman, a lawyer for the nonprofit group Public Citizen, argues that Facebook is currently taking a backwards approach to privacy concerning its younger users: “The default should be that a minor’s image should not be used for advertising unless the parent opts in.  Putting the burden on the parent to opt the child out gets it exactly backward.”[11]  Another group, the Campaign for Commercial-Free Childhood, has also concluded that the previous settlement offers little protection to children and has taken the same stance as Public Citizen on the issue.[12]

 

Over the years, Facebook has continued to emphasize that it takes the privacy of its users seriously, but many may question how this can be true given the company’s history of frequently changing its privacy policy, not always in ways that favor privacy.[13]  The safety of minors online is undoubtedly an important issue, and the result of this new suit is bound to show Facebook’s true stance on protecting the privacy of its younger users.


[1] Cecilia Kang, Parents Resume Privacy Fight vs. Facebook Over Use of Children’s Images in Ads, Washington Post (Feb. 13, 2014) http://www.washingtonpost.com/business/technology/parents-resume-privacy-fight-vs-facebook-over-use-of-childrens-images-in-ads/2014/02/12/5ceb9f82-9430-11e3-b46a-5a3d0d2130da_story.html

[2] Hayley Tsukayama, Facebook Settles Sponsored Stories Suit for $10 Million, Washington Post (June 18, 2012) http://www.washingtonpost.com/business/technology/facebook-settles-sponsored-stories-suit-for-10-million/2012/06/18/gJQATcmklV_story.html

[3] Id.

[4] Id.

[5] Vindu Goel, More Pressure on Facebook to Change Its Policy of Using Users’ Images and ‘Likes’ in Ads, The Economic Times (Feb. 13, 2014) http://economictimes.indiatimes.com/tech/internet/more-pressure-on-facebook-to-change-its-policy-of-using-users-images-and-likes-in-ads/articleshow/30324124.cms

[6] Id.

[7] Kang, supra note 1.

[8] Id.

[9] Id.

[10] Id.

[11] Goel, supra note 5.

[12] Id.

[13] See http://newsroom.fb.com/News/735/Reminder-Finishing-the-Removal-of-an-Old-Search-Setting (explaining Facebook’s decision to remove the “Who can look up your Timeline by Name?” setting).

Blog: Net Neutrality

By Jessica Ertel, Associate Articles Editor


The U.S. Court of Appeals for the D.C. Circuit recently struck down the FCC’s net neutrality rules, thus allowing Internet service providers to charge Internet companies fees for faster delivery of Internet content.

Net neutrality rules require that broadband providers treat all Internet traffic equally. The Federal Communications Commission also calls this “Internet openness.” The FCC supports an open Internet because, without net neutrality rules, broadband providers might prevent their subscribers from accessing certain websites altogether or degrade the quality of these sites in order to direct Internet traffic towards their own competing services, or to collect fees from these websites.1 The Commission’s purpose in adopting the net neutrality rules was to prevent broadband providers from blocking or discriminating against certain Internet site providers.

By throwing out the net neutrality rules, the decision opens the door for Internet service providers to charge fees to companies that want their Internet content delivered to consumers more quickly. Internet service providers, such as petitioner Verizon, applaud the ruling because it means they can make money off of Internet companies that want the information from their sites delivered “first class.” The Internet companies that deliver streaming content are the ones most distressed by the court’s decision. Netflix is one such company. The CEO of Netflix, Reed Hastings, responded to the outcome of the case: “Were this draconian scenario to unfold with some [Internet Service Provider], we would vigorously protest and encourage our members to demand the open Internet they are paying their ISP to deliver.”2

Big companies such as Netflix would be the ones most hurt by this ruling; Netflix estimates that it could be forced to pay as much as 10 percent of its annual revenue to broadband providers.3 Those costs could in turn be passed on to consumers in the form of higher prices to access sites like Netflix. Yet such a price increase is unlikely to happen soon, and, further, Internet service providers have expressed their commitment to their consumers’ ability to freely access Internet sites.4

In spite of this roadblock, the FCC has promised to find other ways to pursue Internet openness. The appeals court did find that the FCC had the authority to regulate broadband providers’ treatment of Internet traffic.5 The FCC appears ready for the challenge of finding other ways to promote Internet openness.

 

1 Verizon v. FCC, No. 11-1355, 2014 WL 113946, at *2 (D.C. Cir. Jan. 14, 2014).

2 Steven Russolillo, Netflix CEO on Net Neutrality: We Will ‘Vigorously Protest’ a ‘Draconian Scenario,’ Wall St. J. (Jan. 22, 2014), available at http://blogs.wsj.com/moneybeat/2014/01/22/netflix-ceo-on-net-neutrality-we-will-vigorously-protest-a-draconian-scenario/?mod=e2tw.

3 Scott Moritz & Cliff Edwards, Verizon Victory on Net-Neutrality Rules Seen as Loss for Netflix, Bloomberg Law (Jan. 14, 2014), available at http://about.bloomberglaw.com/legal-news/verizon-victory-on-net-neutrality-rules-seen-as-loss-for-netflix.

4 Edward Wyatt, Rebuffing F.C.C. in ‘Net Neutrality’ Case, Court Allows Streaming Deals, N.Y. Times (Jan. 14, 2014), available at http://www.nytimes.com/2014/01/15/technology/appeals-court-rejects-fcc-rules-on-internet-service-providers.html?_r=0 (Comcast vice president expressing the company’s commitment to deliver an open Internet to its customers).

5 Verizon v. FCC, No. 11-1355, 2014 WL 113946, at *1 (D.C. Cir. Jan. 14, 2014).

Blog: Snapchat May Not Be Just for Friends – How About Insider Trading?

By Dylan Denslow, Associate Technology and Public Relations Editor

 

Since its launch in September 2011, Snapchat has amassed some 26 million users who together send an average of 400 million “snaps” each day.[1]  To say the app is popular is an understatement.  Snapchat’s reputation, however, has been primarily as an outlet for teenagers and college students to send scandalous or embarrassing photos of themselves.  Recently, a new app named Confide has taken the idea behind Snapchat, the notion of a disappearing message, and brought it to Wall Street.[2]

 

Confide is a new “off-the-record” messaging app that has raised $1.9 million in seed funding and was initially referred to as “Snapchat for business.”[3]  Business people commonly run into situations where they do not want to create a paper trail of e-mails discussing a particular subject; instead, they prefer to talk over the phone, where their discussions are not recorded and may not carry as many legal consequences.  Confide is meant to alleviate this situation, in which phone tag is frequent and an unnecessary impediment to transacting business.[4]

 

On its face, this seems like a great idea that could solve business problems faced on a daily basis.  However, the app is ripe for abuse and potentially provides a mechanism by which employees may skirt or break state and federal laws.  Insider trading immediately comes to mind.  For example, an executive with a stock tip could send a message through Confide to an investor, knowing that the record of that message would soon disappear.[5]  Messages sent on Confide are not stored on servers, and the company has put protections in place to prevent users from taking screenshots of the messages themselves.[6]  This seems like a perfect mechanism to help executives and employees send messages without worrying about the later consequences of their statements.

           

Although Confide is geared toward the business world, the app will likely fall into other hands as well.  This in itself creates a number of potential legal issues.  Imagine, for example, a drug dealer using the app.  There would no longer be a record of his texts to buyers; instead, his messages would be deleted immediately, making it more difficult for law enforcement to connect him with his past activities.  Although such use violates the terms of Confide’s user agreement, it is unlikely that the agreement would deter someone already involved in criminal activity.[7]

 

Snapchat has certainly brought value to its users, primarily by allowing them to share fun experiences.  Yet there have already been allegations that Snapchat is being used for insider trading, even though its reputation typically involves a drunken “selfie” at a bar or college party.[8]  Now Confide brings a similar product to market, one specifically geared toward the business community.  A user agreement prohibiting illegal activity will not be enough to deter lawbreaking.  As this technology moves into the business arena, messages will have more serious financial effects than seemingly harmless Snapchats.  Lawmakers should be poised to monitor and regulate the use of this technology in order to avoid the potentially serious legal issues that may arise.

 


[1] See http://expandedramblings.com/index.php/snapchat-statistics/#.UvLdyGiPAc4.

[2] See http://www.forbes.com/sites/jeffbercovici/2014/02/04/snapchat-for-business-its-called-confide-and-it-exists-now/.

[3] Id.

[4] Id.

[5] See http://www.thedailybeast.com/articles/2014/01/17/confide-is-the-best-way-to-keep-your-dastardly-deeds-hidden-for-now.html.

[6] See supra note 2.

[7] Id.

[8] See http://www.cnbc.com/id/100924846.

Sedona Conference® to Use JOLT Article in Webinar

 

 

Sedona Conference® Webinar

ESI in the Criminal Justice System – From Initial Investigation through Trial

 

The Sedona Conference® is hosting a two-part webinar on electronically stored information (ESI) as it relates to the criminal justice system. The 90-minute webinars will address issues concerning the collection, disclosure, and use of “criminal electronically stored information” from the initial investigation through trial. The two webinars will be held on Feb. 19, 2014, and March 19, 2014.

           As part of the presentation, the Sedona Conference® will include material from the following JOLT article:

Social Media Evidence in Government Investigations and Criminal Proceedings: A Frontier of New Legal Issues, by Justin P. Murphy & Adrian Fontecilla.

 


 

