Some UNC academic fraud records now searchable on WRAL
Posted November 3, 2015
Raleigh, N.C. — When the University of North Carolina at Chapel Hill posted 200,000-plus pages of public records on its website last month, it allowed the public to read some of the 5 million pages of emails, memos and other documents collected as part of the investigation into the school's long-running academic fraud scandal.
What the university did not provide was a way for the public to easily search those records. The documents were posted as image-only PDFs with no machine-readable text encoded in the files, so readers can't search for names or other keywords and must instead look through the documents one by one.
To help sift through the records, WRAL News processed the documents to make the text searchable and created an app to open that search to the public. Users can easily share what they find on Twitter using the hashtag #UNCdocs or email findings to the WRAL newsroom.
For example, our app allows users to search for Butch Davis, UNC-Chapel Hill's former head football coach who was fired as the scandal began to brew in July 2011. His name returns 979 results, allowing readers to quickly find mentions of him in the documents, including many emails he sent and received.
Readers can also search for Julius Nyang'oro, a professor at the center of the scandal who chaired UNC-Chapel Hill's Department of African and Afro-American Studies. His name returns 1,981 results. A search for Mary Willingham turns up 858 results. She's the university's former academic adviser who spoke out against lax standards for student-athletes and has since written a book about the scandal.
The app includes shortcut searches for some of the best-known names mentioned in the documents and in the Wainstein Report, which details nearly two decades of academic fraud at the university.
Readers can run their own searches as well, but not all names will show up. The university has redacted students' names and other information it has deemed private.
Wainstein's 131-page report found student-athletes were given preferential treatment in the classroom and were specifically steered by academic counselors toward classes in the African and Afro-American Studies Department that rarely met and required only a paper to pass. Four employees were terminated or resigned as a result of the investigation. Six other employees still face a review by the university and could be disciplined.
UNC-Chapel Hill released the more than 200,000 pages gathered as part of the Wainstein investigation in response to public records requests from The News & Observer and The Daily Tar Heel. According to school officials, they were the largest public records requests in the university's history.
UNC plans to release more of the 5 million records as they are reviewed and redacted to protect private information. To date, the university says it has paid a law firm almost $3 million primarily for help preparing the documents for release, just part of the more than $10 million school officials have paid outside firms.
How we created the app
WRAL News used a Web-based service called DocumentCloud to process the documents with optical character recognition, which attempts to match images of text with their corresponding characters. OCR is rarely 100 percent accurate: letters and characters may be too small or blurry, or set in a font that is hard to recognize. Still, it gives readers a way to find text in the documents.
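Once OCR has produced text for each page, searching it is mostly a matter of smoothing over the small inconsistencies OCR introduces. The sketch below is illustrative only and does not reflect DocumentCloud's search or WRAL's app code. It assumes the OCR output has already been collected into a hypothetical mapping of (document id, page number) to page text. The names `normalize` and `search_pages` and the sample pages are invented for the example. The sketch counts matching pages, which may not be how the app's result totals were calculated.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Normalize OCR text so small rendering differences don't block a match."""
    text = unicodedata.normalize("NFKC", text)
    # Treat curly apostrophes like straight ones (e.g. Nyang'oro vs Nyang\u2019oro).
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    # Rejoin words hyphenated across a line break: "Nyang'o-\nro" -> "Nyang'oro".
    text = re.sub(r"-\s*\n\s*", "", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text)
    # casefold() gives case-insensitive matching.
    return text.casefold()


def search_pages(pages: dict[tuple[str, int], str], query: str) -> list[tuple[str, int]]:
    """Return sorted (document id, page number) pairs whose text contains the query."""
    needle = normalize(query)
    return sorted(key for key, text in pages.items() if needle in normalize(text))


# Hypothetical OCR output for three pages (not real records).
pages = {
    ("memo-001", 1): "Email from Julius Nyang\u2019oro to the\nAfrican and Afro-American Studies office",
    ("memo-001", 2): "Meeting notes: Butch   Davis\nattended",
    ("memo-002", 1): "Prof. Nyang'o-\nro signed the grade roll",
}

print(search_pages(pages, "Julius Nyang'oro"))  # [('memo-001', 1)]
print(search_pages(pages, "nyang'oro"))  # [('memo-001', 1), ('memo-002', 1)]
print(search_pages(pages, "Butch Davis"))  # [('memo-001', 2)]
```

The hyphen rule is a trade-off. It recovers names that OCR split across lines, but it would also turn a genuinely hyphenated word that breaks at a line end, such as "student-athletes", into "studentathletes". A production search would need a gentler rule or a fuzzy match.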
We built the application in a few days, designing it around how we expected readers would want to explore and review the documents. There may still be a few bugs, but the most important thing for readers to know is that we want the application to evolve based on their feedback.
We want it to be a helpful way for the public to review the public records released last month as well as additional ones we expect to see from UNC and other organizations. The best way to do that is to learn more about how our audience uses it.
Dumping hundreds of thousands of unsearchable pages, in the form of almost three gigabytes of files, on a website is the bare minimum of complying with North Carolina’s public records law. This application doesn’t change that, but we hope it becomes a tool that makes public records more accessible to everyone.