More UNC academic fraud records now searchable
Posted December 22, 2015
Raleigh, N.C. — The University of North Carolina at Chapel Hill has released more documents from a trove of 5 million pages of emails, memos and other records collected as part of the investigation into the school's long-running academic fraud scandal.
The December release, which totals more than 227,000 pages, comes about two months after the university posted 200,000-plus pages of unsearchable PDFs online in response to a public records request from The News & Observer and The Daily Tar Heel – the largest request in the university's history, according to school officials. The records release prompted WRAL News to launch an app in November that opened the search of those documents to the public.
The documents UNC posted online last week, however, do contain "readable" text encoded in the files, allowing readers to search for words or phrases in many common PDF readers.
UNC spokesperson Jim Gregory said Monday that after conversations with the media and others following the release of the first batch, university officials decided making the documents searchable would improve transparency and save costs as they continue to release records.
"We thought it was the right thing to do," Gregory said.
Reporters from WRAL News added the newest documents to our application, allowing users to read and search nearly a half-million pages of records by names and other keywords.
For example, our app allows users to search for Butch Davis, UNC-Chapel Hill's former head football coach who was fired as the scandal began to brew in July 2011. His name returns 1,114 results, allowing readers to quickly find mentions of him in the documents, including many emails he sent and received.
Readers can also search for Julius Nyang'oro, a professor at the center of the scandal who chaired UNC-Chapel Hill's Department of African and Afro-American Studies. His name returns more than 1,985 results. A search for Mary Willingham, the former university academic adviser who spoke out against lax standards for student-athletes and has since written a book about the scandal, turns up 1,336 results.
The app includes shortcut searches for some of the most well-known names mentioned in the documents and in the Wainstein Report, which details the nearly two decades of academic fraud at the university.
Readers can run their own searches as well, but not all names will show up. The university has redacted students' names and other information it has deemed private.
Wainstein's 131-page report found student-athletes were given preferential treatment in the classroom and were specifically steered by academic counselors toward classes in the African and Afro-American Studies Department that rarely met and required only a paper to pass. Four employees were terminated or resigned as a result of the investigation. Six other employees still face a review by the university and could be disciplined.
UNC plans to release more of the 5 million records as they are reviewed and redacted to protect private information. As of November, the university said it had paid a law firm almost $3 million primarily for help preparing the documents for release, just part of the more than $10 million school officials have paid outside firms.
How we created the app
For the documents that weren't searchable, WRAL News used a Web-based service called DocumentCloud to process the documents with optical character recognition, which attempts to match images of text with their corresponding characters. OCR is never 100 percent accurate. Sometimes, letters and characters are too small, blurry or rendered in a difficult font. But it does give readers a chance to identify text in the documents.
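For readers curious how a search can cope with OCR mistakes, here is a minimal sketch in Python. It is an illustration only, not the code behind WRAL's app or DocumentCloud. The function name `fuzzy_find`, the sample page text and the 0.8 similarity threshold are all assumptions made for the example. The idea is to compare the search phrase against every run of words of the same length on a page and accept near matches, so a name with one misread character can still be found.

```python
from difflib import SequenceMatcher


def fuzzy_find(pages, query, threshold=0.8):
    """Return (page_number, matched_text) for pages that approximately contain query.

    pages: dict mapping a page number to that page's OCR text (hypothetical input).
    threshold: minimum similarity ratio between 0 and 1 (an assumed cutoff).
    """
    target = " ".join(query.lower().split())
    n = len(target.split())
    hits = []
    for page_number, text in pages.items():
        words = text.lower().split()
        # Slide a window with as many words as the query across the page.
        for i in range(len(words) - n + 1):
            window = " ".join(words[i:i + n])
            if SequenceMatcher(None, target, window).ratio() >= threshold:
                hits.append((page_number, window))
                break  # record at most one hit per page
    return hits


# Made-up OCR output for three pages; page 2 has "i" misread as "1".
pages = {
    1: "Email from Butch Davis regarding the schedule",
    2: "Forwarded to Butch Dav1s and staff",
    3: "Minutes of the faculty council meeting",
}

for page_number, matched in fuzzy_find(pages, "Butch Davis"):
    print(page_number, matched)
```

An exact-match search would miss page 2 in this sample. Lowering the threshold catches more OCR errors but also returns more false matches, so any real cutoff would need tuning against actual documents.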
New documents released by UNC in December were also uploaded to the collection, bringing the total number of pages to more than 400,000.
While there still may be a few bugs, the most important thing for readers to know is that we want the application to evolve with their feedback.
We hope it becomes a tool we can use to make public records easier for everyone to find and read.