Dates Suck

This weekend we uploaded 8,880 records from the Denver Zine Library. And deleted them and uploaded them. Twice. The errors were exclusively date problems. I was going to let the problem records stand without the date fields, but then reconsidered because it seemed like there should be an easy fix. Just because I didn’t find it doesn’t mean there isn’t one!

A screenshot of a GChat conversation with Lauren: "If I update the MAP again to skip alternate title, I won't get as many errors. I was just hoping that it would work for some of the items." J: "Like it would change its mind about them?" L: "Hope is silly in this situation tho. lol. yeah exactly."

I tried to remove all text from cells in the date column, but neither MS Office nor LibreOffice cared for my [AZaz] find and replace attempt. Ultimately I used a combination of clustering in OpenRefine and pattern matching in Excel, and dumped all of the non-year date information. It was inelegant, and it removed important characters like question marks and granular detail like month, season, or holiday (e.g., “Yuletide” is a word I abolished because I am humbug af).
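The year-only cleanup described above could probably be scripted. Here is a minimal Python sketch, not what was actually run (that was OpenRefine and Excel). It assumes the dates are plain text cells and that every year worth keeping falls between 1900 and 2099. The function name `extract_year` and the `keep_uncertainty` option are made up for illustration.

```python
import re

# Assumption: a usable year is four digits starting with 19 or 20,
# not glued to other digits, optionally followed by a "?".
YEAR_RE = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)(\?)?")


def extract_year(raw, keep_uncertainty=False):
    """Return the first four-digit year in a messy date cell, or "" if none."""
    if raw is None:
        return ""
    match = YEAR_RE.search(str(raw))
    if not match:
        return ""
    year, uncertain = match.group(1), match.group(2)
    # Optionally hang on to the cataloger's question mark instead of dropping it.
    if keep_uncertainty and uncertain:
        return year + "?"
    return year


if __name__ == "__main__":
    for cell in ["Yuletide 1998", "Summer 2003?", "c1998", "n.d."]:
        print(repr(cell), "->", repr(extract_year(cell)))
```

Like the manual approach, this throws away month, season, and holiday detail. Setting `keep_uncertainty=True` would at least preserve a trailing question mark, if the ingest target accepts one.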

It took a while, and I’m confident that there are more thorough and automated ways to accomplish what I did, but my internet searches failed me. I remembered Lisa Rhody once saying something along the lines of “the way you get it done is the best way to do it,” so I’m taking the win. Now Lauren is in the same struggle with my stoopid Barnard records. She’s dealing with alternate title problems, too. I feel so bad! But really I blame fussy library catalogers and their annoying precision. I’m mostly kidding, my Troublesome Catalogers and Magical Metadata Fairies!

Meddlesome Lauren couldn’t leave well enough alone with all the problems from this week’s library ingests, Barnard and Denver. She had to go poking around in last week’s mess and discovered that subject terms are doing something weird. A zine record will include one textual subject, and the rest are printed as their vocabulary item numbers.

A screenshot of the ZineCat record for Hack This Zine.

Eric says it’s fine, and we’ll figure out how to print the text string, not the vocab item number on the display side or in a later ingest, but I’m still like 

A screenshot of a Tweet: "We've now got about 20K records in our development instance, but it's possible that there are some rando numbers in the subject field. #dirtydata #zinelibraries #unioncatalogs #whylordwhy"

Lauren uploaded the 5,957 Barnard records and, at press time, is in the process of mapmaking for the Sallie Bingham Center records. Eric is helping us get an export from the QZAP CA catalog.

The other thing going on right now is that we’re getting started on our white paper content. Lisa and Maura have agreed that we can submit a zine, rather than a white paper. Lauren outlined the zine contents in last week’s blog post. I started a folder and individual pages for each section. I then wrote down the names of each section and put them in an old mug. I drew the slip of paper that read, “Through the MA,” and summed up our ZineCat work in seven MALS and MADH program classes in fewer than 600 words. It feels good to be transitioning from just working on the catalog to starting our formal documentation process.
