TechTalk Blog: Consumer Tech News
Want to stay current on the latest tech issues and trends? Find out about cool stuff you can use, news you should be aware of and resources that should come in handy with WRAL's TechTalk with technology writer and researcher Tara Calishain.

Duke University puts old yearbooks online

Duke University has made back issues of its yearbook, The Chanticleer, available on The Internet Archive. The back issues cover 1918 to 1960 and are available via an Internet Archive search.

This isn't an extensive presentation where you can look up individuals by name or class year. Instead, it's digitized versions of the books themselves, each yearbook an individual entry. They're free to download and are available in several formats, including PDF and text files (which you're probably familiar with) and DjVu and Flip formats (which you may not be familiar with).

Note that these are pretty big files unless you just want the text version. Even a plain ol' PDF is over 25MB. Make sure you have a good connection before you start downloading.

Scanning in all these yearbooks and putting them online was a tremendous amount of work. However, it was driving me CRAZY that there was no way to search the text of all this yearbook content at once. Say you suspect Uncle Fred went to Duke, but you're not sure what year, and tracking him down might take some digging. I wondered if Google was indexing the content of the yearbooks, but running this search:

"duke university" chanticleer site:archive.org filetype:pdf

didn't find anything. So I pulled out links to all the text-only versions of the yearbooks and built a Google CSE.
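If you wanted to gather those text-only links yourself, here's a minimal sketch of one way to do it. It assumes you already have a list of Internet Archive item identifiers, and it follows the URL pattern that shows up in the search results later in this post. The function name and the "http://" prefix are my own assumptions for illustration, not anything the Internet Archive prescribes.

```python
def text_urls(identifiers):
    """Build text-only URLs from Internet Archive item identifiers.

    The pattern (identifier, then identifier + "_djvu.txt") is copied from
    the result URLs shown in this post; the "http://" scheme is an assumption.
    """
    template = "http://www.archive.org/stream/{id}/{id}_djvu.txt"
    return [template.format(id=item) for item in identifiers]


# Example with the one identifier that appears in this post.
for url in text_urls(["chanticleerseria1932duke"]):
    print(url)
```

Each URL in the resulting list would then be added to the Custom Search Engine as a page to search.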

A Google CSE -- a Custom Search Engine -- is just a search engine that looks and operates like Google but searches only the pages and sites that you've specified. It allows you to create your own search engine for little slices of the Web -- in this case, a very tiny slice.

The problem is, when I first made the Google CSE for searching the Duke Chanticleers, it didn't work. I didn't get any results for any of the searches I did. That's probably because Google had not indexed that content yet. But after I let the CSE sit for a couple of days, I could get results when I ran searches.

If you want to search the text of the Duke Chanticleer yearbooks, visit my Chanticleers Google Custom Search Engine at its Google Custom Search page. (WRAL's content management system doesn't allow me to embed search engines in my blog posts. But if you visit that page you'll see that there are ways for you to put the Chanticleer search on your own Web site if you like.) You'll see that this little search engine is only searching 43 pages, but each of these pages is the content of a yearbook.

A note on searching this content: if you want to find a name, be sure to search for it both backwards ("Smith Fred") and forwards ("Fred Smith"), as the names are sometimes listed backwards in the yearbooks.
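If you have a whole list of names to check, a few lines of Python can write out both versions of each query for you. This is just a sketch: the function name is made up for illustration, and it assumes the "backwards" form is simply the last word of the name moved to the front.

```python
def name_queries(full_name):
    """Return quoted phrase queries for a name, forwards and backwards."""
    parts = full_name.split()
    if len(parts) < 2:
        # A single word reads the same in both orders.
        return ['"{}"'.format(full_name.strip())]
    forwards = " ".join(parts)
    # Assumption: "backwards" means surname first, e.g. "Smith Fred".
    backwards = " ".join([parts[-1]] + parts[:-1])
    return ['"{}"'.format(forwards), '"{}"'.format(backwards)]


for query in name_queries("Fred Smith"):
    print(query)
```

Paste each printed query into the search box in turn. The quotation marks keep the words together as a phrase.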

Now, what do you do once you find a name? Let's try an example. I'll try searching for "Smith Fred". I get three results, so I'm ready to go back to the Internet Archive and get copies of the yearbooks in order to go rummaging around for Fred. But how do I know which yearbooks to download?

Look in the URL of the search result. The first result of my Fred Smith search had this page URL:

www.archive.org/stream/chanticleerseria1932duke/chanticleerseria1932duke_djvu.txt

The URL will tell you what year of the Chanticleer to look at -- in this case, 1932. The only one that won't tell you that is this URL:

www.archive.org/stream/chanticleerseria00duke/chanticleerseria00duke_djvu.txt

... and I can tell you that one's 1950.
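If you end up with a pile of result URLs, the year lookup can be scripted. The sketch below assumes every result URL follows the same pattern as the two shown above and that the year is the four-digit number starting with 19 inside the identifier. The function and variable names are hypothetical, and the 1950 entry is taken straight from my note above.

```python
import re

# The one identifier with no year in it; 1950 comes from this post.
KNOWN_EXCEPTIONS = {"chanticleerseria00duke": 1950}


def yearbook_year(url):
    """Return the yearbook year implied by a result URL, or None."""
    match = re.search(r"/stream/([^/]+)/", url)
    if match is None:
        return None
    identifier = match.group(1)
    if identifier in KNOWN_EXCEPTIONS:
        return KNOWN_EXCEPTIONS[identifier]
    # Assumption: the year is a four-digit number beginning with 19.
    year = re.search(r"19\d{2}", identifier)
    return int(year.group(0)) if year else None


print(yearbook_year(
    "www.archive.org/stream/chanticleerseria1932duke/chanticleerseria1932duke_djvu.txt"
))
```

Anything that comes back as None is a URL that doesn't fit the pattern, and you'd have to check that one by hand.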

Is this an ideal solution to searching the yearbooks? Absolutely not. There's a lot of data here that's not divided up, you have to have a search engine in one window and the yearbooks collection in the other, and because the text was scanned by OCR software it might miss once in a while. But it's something to have the yearbooks' text aggregated all in one place, especially as a starting point if you think your Uncle Fred went to Duke but you're not sure when, and you want a quick tool to narrow down the search for his phiz.
