Most Casual Observer

Tag Archives: Metadata

How much does metadata cost?

Posted on April 11, 2008 by Walter Underwood

It is very hard to find numbers on what metadata really costs, but here is one from a Netflix job posting: $6 per movie for “original, descriptive movie and TV episode synopses.”

Here are links to a Hacking Netflix blog posting (likely to remain a valid URL) and to the Netflix job posting (guaranteed to succumb to link rot as soon as the opening is filled).

The only other published numbers I’ve found are similar: $6.20 to $14.67 per jazz CD, depending on the level of detail, from 2003 at the Public Library of Cincinnati and Hamilton County. They were given a collection of 6200 jazz records and were estimating what that gift would cost them. See the article How Much Will It Cost? Making Informed Policy Choices Using Cataloging Standards.

The Netflix numbers are probably closer for an ecommerce or search application. Still, the close agreement in the numbers makes it pretty safe to say “less than $10 per document”.

Remember that the metadata must be updated when the document changes. Maybe “$10 per document per year” is a better number. HP was spending about that much to manage the HP-UX spec (man pages) about ten years ago. That covered all activities, not just metadata.

The Netflix job posting is for six openings, each with a six week duration. That sounds like a lot of work, but if I assume each writer works 40 hours a week and does three synopses per hour (seems very fast for finished work), that is still only about 4300 movies. Metadata is very, very expensive.
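The arithmetic behind that estimate can be sketched in a few lines of Python. The writer count, duration, pace, and $6 rate come from the post; the 40-hour work week is an assumption, and the function name is made up for illustration.

```python
# Back-of-the-envelope estimate for the Netflix synopsis project.
# From the post: 6 openings, 6 weeks each, 3 synopses per hour, $6 per synopsis.
WRITERS = 6
WEEKS = 6
HOURS_PER_WEEK = 40        # assumption: a full-time week, not stated in the posting
SYNOPSES_PER_HOUR = 3
DOLLARS_PER_SYNOPSIS = 6


def estimate_synopses(writers=WRITERS, weeks=WEEKS,
                      hours_per_week=HOURS_PER_WEEK,
                      per_hour=SYNOPSES_PER_HOUR):
    """Return the number of synopses the whole team could finish."""
    return writers * weeks * hours_per_week * per_hour


total = estimate_synopses()
cost = total * DOLLARS_PER_SYNOPSIS
print(f"{total} synopses, ${cost:,} total")  # prints: 4320 synopses, $25,920 total
```

Changing any one input (say, one synopsis per hour instead of three) shows how quickly the covered catalog shrinks for the same amount of labor.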

I have a couple of other stories without dollars, but still instructive.

One publishing company needed to digitize their back content and planned to start a division in the Philippines with 3000 employees to get it done. They found a different way.

I was consulting with a telecom company, and the CEO asked for metadata on every page in their intranet. They had 4M documents.

One final note, since I work for Netflix. All of the Netflix info here is derived from the job posting. No insider information was required or is included in this post.

Posted in Search Engines | Tagged Cataloguing, Metadata

Everything is Miscellaneous (not!)

Posted on October 29, 2007 by Walter Underwood

I checked out David Weinberger’s Everything is Miscellaneous from the local library with every intention of reading the whole book, but after 150 pages, I just can’t waste any more of my time on it. It is just too sloppy: badly researched, with flimsy conclusions.

One of my father’s frequent sayings is “Do your homework.” This doesn’t mean finish your spelling paper; it means you should research your subject and not go in half-prepared. Weinberger has not done his homework and his book shows it. He repeatedly starts with bad assumptions, so the book crumbles into a pile of anecdote and opinion.

I find it especially embarrassing to finish something and find out that I’ve done a halfway job of reinventing a known solution, so I try to follow Dad’s advice and do my homework. I’ve been working in search for the past ten years and I’ve done a lot of homework. In multiple places in this book, Weinberger just gets it wrong, and wrong in places where it shoots down his argument.

Let’s take Chapter 3, where he criticizes how libraries organize information. Libraries have been beating on this problem for centuries and they have something that works. It isn’t always pretty, but it works. Weinberger takes the Dewey Decimal Classification (DDC) as representative of all library organization. It isn’t. DDC looks like a classification scheme, but it is really used to put books on shelves in some useful order, not to fully describe their topics. The goal is to get the WWI books near the WWII books, not to decide whether war is history, politics, or technology.

Then he makes fun of Library of Congress Classification (LCC) for putting the Balkans at the same level as Africa. He manages to make three ignorant mistakes in one example. First, LCC is enumerative, so it doesn’t have a strong hierarchy. It is designed so it is easy to add space at the end, perhaps because there is no end to knowledge. Second, book classification schemes are shaped by when they were designed and by the universe of books. There are lots of books about the Balkans, probably as many as there were about Africa when the scheme was designed. Third, of course, he makes the same mistake he made with DDC, confusing a locating scheme with a subject classification scheme.

He certainly should have looked at the Library of Congress Subject Headings (LCSH). These have the cross-referenced graph structure that he wants DDC and LCC to have. They are updated weekly, extensible, and on-line.

Weinberger picks The Little House Cookbook to demonstrate that Amazon’s info on the book is superior to a library “card catalog”. He shows the Amazon categories below, then uses a customer’s list to find additional books about and by Laura Ingalls Wilder.

Children’s Books>Authors & Illustrators, A-Z>Williams, Garth
Children’s Books>History & Historical Fiction>United States>1800s
Children’s Books>Sports & Activities>Cooking

Now let’s look up the record in WorldCat. Amazingly, it is already tagged with Laura Ingalls Wilder as one of the subjects. We don’t need user lists to do that, just good categories and catalogers (both of which are expensive). Aside from the quaint word “cookery”, this list of categories seems more useful than the ones at Amazon:

Cookery, American — History — Juvenile literature.
Literary cookbooks.
Wilder, Laura Ingalls.
Cookery, American — History.
Frontier and pioneer life.

This is his crowning example in Chapter 3, which he uses to show that Amazon is better than his misunderstanding of DDC. But Amazon isn’t better in this example, so the whole thing falls flat.

If an author makes these kinds of mistakes in a book about organizing information, I can’t trust him. Library technology has been continually developed since at least Callimachus at Alexandria. If you care about information, you need to grok library technology and its true strengths and weaknesses, not tell us why Melvil Dewey spelled his name oddly.

I recommend you don’t waste your time on this book. I gave up when I was spending most of my time picking apart his examples and not learning anything in the process. Disagreeing with someone who has done their homework is invigorating, but this was just red pencil time.

Instead, choose one of these books. I found each one of these increased the depth and breadth of my thinking. You might not agree with the authors, but you’ll get a good workout doing it.

  • The Social Life of Information, John Seely Brown and Paul Duguid
  • Understanding Comics, Scott McCloud
  • A Theory of Fun for Game Design, Raph Koster
  • Women, Fire, and Dangerous Things, George Lakoff (I just read the first part; part two is for linguistics geeks only)
  • The Nature and Art of Workmanship, David Pye
  • Managing the Flow of Technology, Thomas Allen
  • Democracy in America, Alexis de Tocqueville
  • Decisions and Organizations, Jim March

By the way, Clay Shirky is just as sloppy. His Ontology is Overrated makes the same mistakes. What a mess.

Posted in Books, Libraries, Search Engines | Tagged Cataloguing, Metadata

Variant Spellings of “Guns N’ Roses”

Posted on September 11, 2007 by Walter Underwood

Paul Lamere posts different spellings of “Guns N’ Roses” and “Tchaikovsky” from ID3 tags. Here are the top twelve for “Guns N’ Roses”:

Guns N Roses
Guns and Roses
guns ‘n’ roses
Guns ‘N Roses
Guns & Roses
Guns’N’Roses
Guns N’Roses
Guns’N Roses
Guns´n Roses
Guns N´ Roses
Guns -N- Roses
GNR

You’ll need to go to the posting for the “Пётр Ильич Чайковский” variants.
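A search engine has to fold all of those spellings onto one key. Here is a minimal Python sketch of one way to do that, assuming a simple rule set (lowercase, treat punctuation as spaces, treat “and”, “&”, and “N” as the same word) plus a hand-maintained alias table for abbreviations. The function name and alias table are hypothetical, not anything from Paul Lamere’s posting.

```python
import re

# Hypothetical alias table; abbreviations cannot be derived by rule.
ALIASES = {"gnr": "guns n roses"}
# Tokens treated as the same connecting word.
CONJUNCTIONS = {"and", "n"}

VARIANTS = [
    "Guns N Roses", "Guns and Roses", "guns ‘n’ roses", "Guns ‘N Roses",
    "Guns & Roses", "Guns’N’Roses", "Guns N’Roses", "Guns’N Roses",
    "Guns´n Roses", "Guns N´ Roses", "Guns -N- Roses", "GNR",
]


def normalize_artist(name):
    """Fold a tagged artist name onto a crude matching key."""
    text = name.lower().replace("&", " n ")
    # Every run of characters outside a-z and 0-9 becomes a single space,
    # which covers straight, curly, and acute-accent quote marks and hyphens.
    tokens = re.sub(r"[^a-z0-9]+", " ", text).split()
    tokens = ["n" if token in CONJUNCTIONS else token for token in tokens]
    key = " ".join(tokens)
    return ALIASES.get(key, key)


keys = {normalize_artist(variant) for variant in VARIANTS}
print(keys)  # prints: {'guns n roses'}
```

This only works for names in unaccented Latin letters. The same rules would reduce “Пётр Ильич Чайковский” to an empty string, so the Tchaikovsky variants need transliteration or an authority file, which is the expensive human part.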

Posted in Search Engines | Tagged Cataloguing, Metadata

Cataloger or Director of Metadata?

Posted on July 7, 2007 by Walter Underwood

Marc Siry posts about Job Descriptions from the Future: Director of Metadata, but it sure sounds like a cataloging librarian to me. Here are the key parts of the job description:

  • Fashion metadata requirements for content partnerships
  • Determine metadata needs for editorial programming interfaces
  • Develop a schema that supports product capabilities for discovery and reporting
  • Build out the CMS and data entry methods to support the schema
  • Plan for future expansions and revisions of the schema as the business evolves

A 1200-word post on managing metadata with no mention of librarians. It looks like NBC Universal is going to be reinventing a lot of stuff. To reword Isaac Newton’s famous comment, this is standing on the toes of giants, not their shoulders.

Posted in Search Engines | Tagged Cataloguing, Metadata
