UNC releases thousands more pages of Wainstein records
Posted February 26
Raleigh, N.C. — The University of North Carolina at Chapel Hill has released nearly 200,000 more pages of records from a trove of 5 million pages of emails, memos and other records collected as part of the investigation into the school's long-running academic fraud scandal.
The February release comes about four months after the university first posted 200,000-plus pages of unsearchable PDFs online in response to a public records request from The News & Observer and The Daily Tar Heel, the largest request in the university's history, according to school officials. That release prompted WRAL News to launch an app in November that made those documents searchable by the public.
With its second document dump in December, the university began releasing records containing "readable" text encoded in the files, allowing readers to search for words or phrases in many common PDF readers.
Reporters from WRAL News added the newest documents to our application, allowing users to read and search more than half a million pages of records by names and other keywords.
For example, our app allows users to search for Butch Davis, UNC-Chapel Hill's former head football coach who was fired as the scandal began to brew in July 2011. His name returns 3,261 results, allowing readers to quickly find mentions of him in the documents, including many emails he sent and received.
Readers can also search for Julius Nyang'oro, a professor at the center of the scandal who chaired UNC-Chapel Hill's Department of African and Afro-American Studies. His name returns 1,985 results. A search for Mary Willingham turns up 2,544 results. She is a former academic adviser at the university who spoke out against lax standards for student-athletes and has since written a book about the scandal.
The app includes shortcut searches for some of the best-known names mentioned in the documents and in the Wainstein Report, which details the nearly two decades of academic fraud at the university.
Readers can run their own searches as well, but not all names will show up. The university has redacted students' names and other information it has deemed private under federal and state records law.
Wainstein's 131-page report found student-athletes were given preferential treatment in the classroom and were specifically steered by academic counselors toward classes in the African and Afro-American Studies Department that rarely met and required only a paper to pass. Several employees were terminated or resigned as a result of the investigation, and more could still face discipline.
UNC plans to release more of the 5 million pages as they are reviewed and redacted to protect private information. As of November, the university said it had paid a law firm almost $3 million primarily for help preparing the documents for release, just part of the more than $10 million school officials have paid outside firms for services related to the investigation.
UNC spokesperson Jim Gregory said in an email earlier this month that school officials had also hired more than 30 full-time reviewers from the university's temp service, as well as others from an outside company, to review the documents in preparation for release at a total projected cost of more than $1.4 million.
How we created the app
For the documents that weren't searchable, WRAL News used a Web-based service called DocumentCloud to process the documents with optical character recognition, which attempts to match images of text with their corresponding characters. OCR is never 100 percent accurate. Sometimes, letters and characters are too small, blurry or rendered in a difficult font. But it does give readers a chance to identify text in the documents.
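For readers curious about what imperfect OCR means for search, here is a rough sketch in Python using only the standard library. It compares an exact, case-insensitive match with a looser match that tolerates a misread character. The sample pages, the function names and the 0.8 similarity cutoff are all assumptions made for illustration; this is not how DocumentCloud or the WRAL app actually implements search.

```python
from difflib import SequenceMatcher


def exact_hits(pages, query):
    """Return 1-based page numbers whose text contains the query, ignoring case."""
    q = query.lower()
    return [n for n, text in enumerate(pages, start=1) if q in text.lower()]


def fuzzy_hits(pages, query, cutoff=0.8):
    """Return 1-based page numbers where some run of words resembles the query.

    The cutoff is an arbitrary, illustrative similarity threshold.
    """
    q = query.lower()
    width = len(q.split())
    hits = []
    for n, text in enumerate(pages, start=1):
        words = text.lower().split()
        # Slide a window the same number of words wide as the query.
        for i in range(len(words) - width + 1):
            window = " ".join(words[i:i + width])
            if SequenceMatcher(None, q, window).ratio() >= cutoff:
                hits.append(n)
                break
    return hits


# Hypothetical page text, invented for this example.
pages = [
    "Email from Butch Davis regarding the schedule",
    "Forwarded to Butch Dav1s on Tuesday",  # OCR misread 'i' as '1'
    "Agenda for the faculty council meeting",
]

exact = exact_hits(pages, "Butch Davis")
fuzzy = fuzzy_hits(pages, "Butch Davis")
```

In this made-up sample, the exact search misses the page where the name was misread, while the looser search finds it. The trade-off is that a looser match can also return pages that only resemble the search term.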
New documents released by UNC in February were also uploaded to the collection, bringing the total number of pages to more than 600,000.
While there still may be a few bugs, the most important thing for readers to know is that we want the application to evolve with their feedback.
We hope it becomes a tool we can use to make public records more accessible to the public.