It is very hard to find numbers on what metadata really costs, but here is one from a Netflix job posting: $6 per movie for “original, descriptive movie and TV episode synopses.”
Here are links to a Hacking Netflix blog posting (likely to remain a valid URL) and to the Netflix job posting (guaranteed to succumb to link rot as soon as the opening is filled).
The only other published numbers I’ve found are similar: $6.20 to $14.67 per jazz CD, depending on the level of detail, from 2003 at the Public Library of Cincinnati and Hamilton County. They had been given a collection of 6200 jazz records and were estimating what that gift would cost them to catalog. See the article “How Much Will It Cost? Making Informed Policy Choices Using Cataloging Standards.”
The Netflix number is probably the closer match for an ecommerce or search application. Still, the two sources agree closely enough that it is pretty safe to say “less than $10 per document”.
Remember that the metadata must be updated when the document changes. Maybe “$10 per document per year” is a better number. HP was spending about that much to manage the HP-UX spec (man pages) about ten years ago. That covered all activities, not just metadata.
The Netflix job posting is for six openings, each with a six-week duration. That sounds like a lot of work, but if I assume each writer does three synopses per hour (which seems very fast for finished work), that is still only about 4300 movies. Metadata is very, very expensive.
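Here is that back-of-the-envelope estimate as a small sketch. The six openings, six weeks, three synopses per hour, and $6 per synopsis come from the numbers above. The 40-hour work week is my assumption (the posting does not state it), and the function name is just for illustration.

```python
# Rough estimate of how many synopses the posted openings could produce.
WRITERS = 6                 # openings in the job posting
WEEKS_PER_WRITER = 6        # duration of each opening
HOURS_PER_WEEK = 40         # ASSUMPTION: full-time week, not stated in the posting
SYNOPSES_PER_HOUR = 3       # optimistic rate for finished work
DOLLARS_PER_SYNOPSIS = 6    # per-movie figure quoted from the posting


def total_synopses(writers, weeks, hours_per_week, per_hour):
    """Return the number of synopses written across all writers."""
    return writers * weeks * hours_per_week * per_hour


synopses = total_synopses(
    WRITERS, WEEKS_PER_WRITER, HOURS_PER_WEEK, SYNOPSES_PER_HOUR
)
cost = synopses * DOLLARS_PER_SYNOPSIS

print(synopses)  # 4320
print(cost)      # 25920
```

Under those assumptions the whole effort covers roughly 4300 titles. Change the hours or the rate and the total moves in proportion, so treat it as an order-of-magnitude figure.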
I have a couple of other stories, without dollar figures but still instructive.
One publishing company needed to digitize their back content and planned to start a division in the Philippines with 3000 employees to get it done. They found a different way.
I was consulting with a telecom company, and the CEO asked for metadata on every page in their intranet. They had 4M documents.
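To put a rough scale on that request, here is a sketch that applies the “$10 per document per year” rule of thumb from earlier in this post to the 4M documents. The telecom story itself came with no dollar figure, so the rate is an assumption borrowed from the other sources, and the function name is hypothetical.

```python
# Rough yearly cost of metadata for a large document collection.
def annual_metadata_cost(documents, dollars_per_document_per_year=10):
    """Return the estimated yearly metadata cost in dollars.

    The default rate is the rule of thumb from this post, an ASSUMPTION
    here, not a number from the telecom company.
    """
    return documents * dollars_per_document_per_year


intranet_documents = 4_000_000  # "4M documents" from the story

print(annual_metadata_cost(intranet_documents))  # 40000000
```

Even if the real per-document cost were a tenth of the assumed rate, that is still millions of dollars a year.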
One final note, since I work for Netflix: all of the Netflix info here is derived from the job posting. No insider information was required or is included in this post.