Everything is Miscellaneous (not!)

Posted on October 29, 2007 by Walter Underwood

I checked out David Weinberger’s Everything is Miscellaneous from the local library with every intention of reading the whole book, but after 150 pages, I just can’t waste any more of my time on it. It is just too sloppy: badly researched, with flimsy conclusions.

One of my father’s frequent sayings is “Do your homework.” This doesn’t mean finishing your spelling paper; it means you should research your subject and not go in half-prepared. Weinberger has not done his homework and his book shows it. He repeatedly starts with bad assumptions, so the book crumbles into a pile of anecdote and opinion.

I find it especially embarrassing to finish something and find out that I’ve done a halfway job of reinventing a known solution, so I try to follow Dad’s advice and do my homework. I’ve been working in search for the past ten years and I’ve done a lot of homework. In multiple places in this book, Weinberger just gets it wrong, and wrong in places where it shoots down his argument.

Let’s take Chapter 3, where he criticizes how libraries organize information. Libraries have been beating on this problem for centuries and they have something that works. It isn’t always pretty, but it works. Weinberger takes the Dewey Decimal Classification (DDC) as representative of all library organization. It isn’t. DDC looks like a classification scheme, but it is really used to put books on shelves in some useful order, not to fully describe their topics. The goal is to get the WWI books near the WWII books, not to decide whether war is history, politics, or technology.

Then he makes fun of Library of Congress Classification (LCC) for putting the Balkans at the same level as Africa. He manages to make three ignorant mistakes in one example. First, LCC is enumerative, so it doesn’t have a strong hierarchy. It is designed so it is easy to add space at the end, perhaps because there is no end to knowledge. Second, book classification schemes are shaped by when they were designed and by the universe of books. There are lots of books about the Balkans, probably as many as there were about Africa when the scheme was designed. Third, he makes the same mistake he made with DDC, confusing a locating scheme with a subject classification scheme.

He certainly should have looked at the Library of Congress Subject Headings (LCSH). These have the cross-referenced graph structure that he wants DDC and LCC to have. They are updated weekly, extensible, and on-line.

Weinberger picks The Little House Cookbook to demonstrate that Amazon’s info on the book is superior to a library “card catalog”. He shows the Amazon categories below, then uses a customer’s list to find additional books about and by Laura Ingalls Wilder.

Children’s Books>Authors & Illustrators, A-Z>Williams, Garth
Children’s Books>History & Historical Fiction>United States>1800s
Children’s Books>Sports & Activities>Cooking

Now let’s look up the record in WorldCat. Amazingly, it is already tagged with Laura Ingalls Wilder as one of the subjects. We don’t need user lists to do that, just good categories and catalogers (both of which are expensive). Aside from the quaint word “cookery”, this list of categories seems more useful than the ones at Amazon:

Cookery, American — History — Juvenile literature.
Literary cookbooks.
Wilder, Laura Ingalls.
Cookery, American — History.
Frontier and pioneer life.

This is his crowning example in Chapter 3, which he uses to show that Amazon is better than his misunderstanding of DDC. But Amazon isn’t better in this example, so the whole thing falls flat.

If an author makes these kinds of mistakes in a book about organizing information, I can’t trust him. Library technology has been continually developed since at least Callimachus at Alexandria. If you care about information, you need to grok library technology and its true strengths and weaknesses, not tell us why Melvil Dewey spelled his name oddly.

I recommend you don’t waste your time on this book. I gave up when I was spending most of my time picking apart his examples and not learning anything in the process. Disagreeing with someone who has done their homework is invigorating, but this was just red pencil time.

Instead, choose one of these books. I found each one of these increased the depth and breadth of my thinking. You might not agree with the authors, but you’ll get a good workout doing it.

The Social Life of Information, John Seely Brown and Paul Duguid
Understanding Comics, Scott McCloud
A Theory of Fun for Game Design, Raph Koster
Women, Fire, and Dangerous Things, George Lakoff (I just read the first part, part two is for linguistics geeks only)
The Nature and Art of Workmanship, David Pye
Managing the Flow of Technology, Thomas Allen
Democracy in America, Alexis de Tocqueville
Decisions and Organizations, Jim March

By the way, Clay Shirky is just as sloppy. His Ontology is Overrated makes the same mistakes. What a mess.

Future of Bibliographic Control

Posted on March 19, 2007 by Walter Underwood

I went to the Library of Congress open meeting on bibliographic issues a couple of weeks ago. Interesting, but I think they have a long way to go. This meeting was a good stab at understanding users, both searchers and catalogers, but the tricky part is the model and system interface. How to support links and mashups and massive content generation and cataloging? There was some talk about tagging, but the anti-spam algorithms needed for low-trust, low-authority cataloging are far beyond the expertise and budgets of libraries.

The official writeup and lots of notes by Karen Coyle are good places for more thorough coverage.

Bernie Hurley from UC Berkeley gave a talk on issues today with MARC (see Karen’s notes). This was far more interesting than I expected, mostly because it was fact-based. Some tidbits:

MARC cataloging is expensive, even when outsourced to India
thesis cataloging is different, the subject areas tend to be outside of the established categories
MARC has more information than they use (they have 175 fields, but two-thirds of searches use just 3, and they display at most 27)
it does not have the information that is needed for search and faceted browse (from Andrew Pace, NC State)
the book height and depth are measured for shelving, but we need the weight and thickness for mailing them (also from Andrew)

The main fields they use are:

Author
Title
Subject keywords
Date for sorting
LC Classification

Several speakers, both from the podium and the floor, were pinning their hopes on full-text search. I presume that is because they haven’t tried implementing it. I appreciate the optimism, but full-text is Muggle Technology, not magic. Full-text is great for finding the next 20% or 30% of stuff, but most of your good results come from great metadata (including links and attention data). As Dan Clancy (Google Book and Google Scholar) pointed out, book search is much harder than web search precisely because you don’t have as much link data (metadata). No one had any good ideas about how to get access to all that text so it could be indexed. Well, ideas besides Google Book.

Hey, why wasn’t Brewster Kahle invited? Maybe the LoC already knows what he thinks, but a position paper would be handy for the rest of us.

On-line access to content is working OK. The only complaints were about the URL fields in library catalogs. If you don’t know what MARC is, take a moment to look over MARC 856, Electronic Location and Access. It’s a little more complicated than the <a> tag.
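To make the comparison with the <a> tag concrete, here is a minimal sketch that reduces one 856 field to a plain HTML link. It assumes a simplified one-line text rendering of the field (tag, two indicator characters, then "$"-prefixed subfields), which is not the real MARC transmission format. The function names parse_856 and to_anchor and the example.org URL are made up for illustration. The subfield codes ($u for the URI, $y for link text, $z for a public note) and the first-indicator access methods follow my reading of the MARC 21 documentation for field 856.

```python
from html import escape

# First indicator of 856: access method (per the MARC 21 field 856 description).
ACCESS_METHODS = {"0": "email", "1": "ftp", "2": "telnet", "3": "dial-up", "4": "http"}


def parse_856(line: str) -> dict:
    """Parse a simplified text rendering such as
    '856 40 $u http://example.org/book $y Read online'.

    Assumes both indicators are written as non-space characters and that
    no subfield value contains a '$'. Real records need a MARC library.
    """
    tag, indicators, rest = line.split(" ", 2)
    if tag != "856":
        raise ValueError("not an 856 field")
    subfields = {}
    for chunk in rest.split("$")[1:]:
        code, value = chunk[0], chunk[1:].strip()
        # Subfields such as $u and $z are repeatable, so keep lists.
        subfields.setdefault(code, []).append(value)
    return {
        "access": ACCESS_METHODS.get(indicators[0], "unspecified"),
        "subfields": subfields,
    }


def to_anchor(field: dict) -> str:
    """Collapse the field to an <a> tag, dropping everything HTML has no slot for."""
    sub = field["subfields"]
    url = sub["u"][0]
    text = sub.get("y", [url])[0]  # fall back to the URL when there is no link text
    return f'<a href="{escape(url)}">{escape(text)}</a>'


if __name__ == "__main__":
    sample = "856 40 $u http://example.org/book $y Read online $z Access restricted"
    field = parse_856(sample)
    print(field["access"])
    print(to_anchor(field))
```

The public note in $z and the access method have nowhere to go in the anchor, which is roughly the point: the field carries more than a link does.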

The day started with an interesting and dangerous talk by Timothy Burke on the wonders and difficulties of serious research using our current tools (see Karen’s notes). It was mostly about searching techniques, though it wasn’t really explained that way. I would have been happier if he’d started with some terminology from Marcia Bates. The personal view was helpful, but this should be well-understood stuff by now.

The danger is aiming our tool efforts primarily at the expert user. That way lies disaster. There is really only one way to do this and succeed, and that is to follow the Rob Pike architectural rules:

Simple things are simple.
Hard things are possible.
You don’t have to understand the whole system to use part of it.

Once you do this, the fancy tools can be built on top of it. If you design for the fancy stuff, the system will never be simple and it will probably be over-fit to an old problem (like MARC is today).

One other point from Burke’s presentation: universities no longer teach how to do literature search. Each discipline has general techniques and domain-specific ones (think chemical structure search), and this cannot be fobbed off on some other department. Striking out on your own might help avoid the prejudices of the field, but it can also mean missing and reinventing a lot of stuff.

I also saw some premature target lock-on. For example, converting subject headings to strings of standalone “subject keywords” is a lot of work, and is primarily useful for faceted browsing. Faceted browsing is good, but it is only one approach. We may be using facets because they are the best we can do with the HTML-based web apps of the past five years. Is it right for five years from now, when the conversion is done, or did we just blow a wad of cash on another dead technology?
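For a sense of what that conversion looks like in the easy case, here is a rough sketch, not anyone's production method. It splits subdivided headings into standalone keywords and counts them the way a facet sidebar would. The sample headings are the ones from the Little House Cookbook record quoted in the Weinberger post on this page, and I am assuming the subdivisions are separated by "--", the usual plain-text convention. The names split_heading and facet_counts are hypothetical.

```python
from collections import Counter

# Sample headings; "--" between subdivisions is an assumed input format.
HEADINGS = [
    "Cookery, American -- History -- Juvenile literature.",
    "Literary cookbooks.",
    "Wilder, Laura Ingalls.",
    "Cookery, American -- History.",
    "Frontier and pioneer life.",
]


def split_heading(heading: str) -> list[str]:
    """Split one subdivided heading into standalone keywords."""
    parts = heading.rstrip(".").split("--")
    return [p.strip() for p in parts if p.strip()]


def facet_counts(headings: list[str]) -> Counter:
    """Count how many headings mention each keyword (once per heading)."""
    counts = Counter()
    for heading in headings:
        counts.update(set(split_heading(heading)))
    return counts


if __name__ == "__main__":
    for keyword, n in facet_counts(HEADINGS).most_common():
        print(f"{n}  {keyword}")
```

The string splitting is the cheap part. The sketch also shows what gets lost: once "History" stands alone, it no longer says it is the history of American cookery, and deciding which keywords deserve to be facets is where the real work goes.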

Finally, I should have asked Andrew Pace how much NC State spent on Endeca.

A side note — Google did a poor job of hosting this event. We had to park a half-mile away, there were no power strips for laptops, I couldn’t get back on the GoogleGuest net after 10AM, we had a “mini kitchen” instead of the usual wide array of free munchies (dang!), and lunch was “here’s a map of the area”. No one stood up to say “let me know if there are any problems”. A few people got power by unplugging the massage chair. Worst of all, the committee was ushered off to the Google Cafeteria, so there was no way to talk with any of them over lunch. Why have an open meeting if you aren’t going to eat together? That was golden time with users, and it was squandered.

UnSuggester at LibraryThing

Posted on November 13, 2006 by Walter Underwood

Recommendation systems are so serious, so it is fun to see LibraryThing have theirs show you both the best and the worst match for a book. That is, if you like Diplomacy by Henry Kissinger, you won’t like Thud! by Terry Pratchett. Seems like a safe bet.


Most Casual Observer

In physics class, many things are intuitively obvious to the most casual observer. Welcome to my casual observations.
