DATE: Thursday Sep. 27, 4:45 — 6:00pm
LOCATION: Hornbake Library North — Room 0302J
TITLE: The Importance of Being Efficient in Searching Electronic Archives (Ripped From the Headlines)
This talk illustrates the potential impact of computational archival science on public policy. It connects hot-button policy issues in the news with the importance of implementing AI and machine learning techniques as part of an overall governmental “access strategy,” one that holds the promise of providing greater citizen access to public records through more efficient, timely responses to Congressional requests, FOIA requests, litigation, and investigations of all types.
The National Archives and federal agencies alike increasingly confront the need to search vast amounts of e-mail and other electronic records in response to access demands coming from all quarters. The National Archives is the legal repository of over 500 million presidential emails accumulated to date from past Administrations, from Reagan through Obama.
The most recent controversy involves the scope of the search to be conducted for Judge Brett Kavanaugh’s records, as created or received by him during his tenure in the George W. Bush Administration. The headlines concerning the required document review illuminate the larger issue NARA (and all of government) faces: a lack of adequate tools for advanced search, in particular what has come to be known in the legal community as “technology assisted review” (a type of machine learning).
NARA has candidly acknowledged the steep challenge it faces in responding to Congressional demands for electronic records. In a letter dated August 2, 2018, sent by NARA’s General Counsel to the US Senate Judiciary Committee, NARA informed the Committee that the records requested by the Majority alone include an estimated 900,000 pages of emails with attachments, dwarfing all prior requests for records of past Supreme Court nominees. NARA estimates that it would take until October to process this request, substantially after the scheduled hearings on the nomination.
The access issue is multiplied hundreds of times over when one considers that, across the entirety of the federal government, FOIA officers and lawyers lack the advanced search software necessary to perform efficient and timely searches in response to access demands from the public. With the unfolding implementation of the total email archiving policy known as “Capstone,” repositories of email totaling in the many millions (if not tens or hundreds of millions) will accumulate in Capstone accounts and will have to be searched under FOIA. Without machine learning tools applied to the search function, agencies will find themselves overwhelmed by the number of “false positive” hits in e-mail that they must wade through to cull out the truly relevant records in response to FOIA requests and litigation demands. They will also miss the “false negatives”: documents that are truly responsive even though they lack the chosen keyword.
The public deserves better than continued reliance on search techniques that are 10 to 20 years old, especially on matters of crucial public importance such as the selection of the next Supreme Court justice.
Jason R. Baron is an internationally recognized speaker and author on the preservation of electronic documents. Jason previously served as the first appointed Director of Litigation for the U.S. National Archives and Records Administration, and as a trial lawyer and senior counsel at the Department of Justice. In those roles, Jason played a leading role in the government’s adoption of electronic recordkeeping practices and acted as lead counsel in landmark cases involving the preservation of White House email. As an adjunct professor in the College of Information Studies, Jason co-taught with Prof. Doug Oard the first e-discovery course for PhD and Masters candidates in the United States.
Jason was prominently featured in the documentary The Decade of Discovery (2014), which tells the story of a government lawyer seeking a better way to search for White House email. The American Lawyer magazine named him one of six “e-discovery trailblazers” in its 2013 issue devoted to “The Top 50 Big Law Innovators of the Last 50 Years.” Jason has appeared on CNN, NBC Nightly News, Good Morning America, The Last Word With Lawrence O’Donnell, and NPR’s All Things Considered, and has been quoted in The New York Times, The Washington Post, TIME Magazine, and numerous other media outlets.