Mimsy Were The Borogoves

I saw a movie trailer for The Last Mimzy and immediately recognized it as a science fiction short story I’d read thirty-five years ago. In ninth grade, I read The Year’s Best S-F (edited by Judith Merril) for every year that the school library had. Since that was in 1971, I probably read all eleven volumes from 1956 through 1966. It was wonderful, a new world every twelve pages.

I remain convinced that Mimsy Were The Borogoves was in one of those anthologies, even though I now know that it was first published in 1943. Henry Kuttner and Catherine Moore, writing as Lewis Padgett, put together a tale of a device from the future that educates two children in mathematics far beyond the current understanding. They construct a tesseract, and disappear. Exciting and sad technology at the same time, probably an interesting read for scientists at the Manhattan Project.

I read the anthologies in chronological order, and saw an interesting shift from rockets to inner space. By the end, I was reading Flowers for Algernon and an odd story about a woman who can communicate with the roaches in her New York apartment. If you haven’t read Flowers for Algernon, find a copy of the short story (technically a “novelette”). It is really more powerful in a single sitting and weaker when stretched to a novel.

A few years later, at North Central High School in Indianapolis, I was stage manager for a play based on that story. As I remember, I had to manage changes for fifty-six scenes in Charly.

Forty years ago, a school librarian at Baton Rouge High School decided to buy that set of books. It wasn’t a big library (I can clearly see it today in my mind), so I’m sure it was a tricky decision. Whoever you are, thank you.

Liberal Pigs Go To War!

The French nationalist right-wing Front National party set up an office in Second Life and were soon besieged by avatars against intolerance. Things quickly got weird:

So amid the exchange of salvos, the chat log was choked over with pro and anti-Le Pen curses, most in French. And when the lag was not too overwhelming to stream audio, the whole fracas was accompanied by bursts of European techno.

One enterprising insurrectionist created a pig grenade, fixed it to a flying saucer, and sent several whirling into Front National headquarters, where they’d explode in a starburst of porcine shrapnel.

No one can protest like the French.

The End of Open Spider Standards?

Yahoo and Microsoft have signed on to Google’s sitemap.xml format and published it at sitemaps.org. Two weeks ago, Yahoo announced support for wildcards in robots.txt, which seems to be something similar to Google’s (non-standard) robots.txt extensions.

It is well past time for updates to these, but it is sad that there was no attempt to include anyone outside of the Big Guys and that there is no invitation for anyone else to contribute. Robotstxt.org is still there, as is the ROBOTS mailing list, but both were bypassed.

The sitemap protocol is published under a Creative Commons license, but there is no mailing list, no wiki, not even a feedback e-mail address on the website. Questions are referred back to each individual search engine.

This is both sad and foolish. The big three are not the only bots in the world, and more eyes make a better spec. Submit this to the IETF, OK? It isn’t that hard, and the spec will be much, much better afterwards. Look at the improvement from RSS 2.0 to Atom (RFC 4287). It is like night and day.

Five Habits of Seven Successful Websites

Aaron Swartz calls his post Seven Habits of Highly Successful Websites, covering five approaches common across MySpace, Wikipedia, Facebook, Flickr, Digg, del.icoi.us (I can never type that correctly), and Google Maps. I guess that the sixth and seventh habits are “get the details wrong” and “always use magic numbers”.

The five are:

  1. Be Ugly
  2. Don’t Have Features
  3. Let Users Do Your Job
  4. Ignore Standards
  5. Build to Flip

Welcome to the real Web 2.0.

Search Transparency and Trust

One way to increase users’ trust in your search engine is to give hints about how it works. When a search engine doesn’t work, the wrong results can be mysterious. That mystery leads to mistrust and to some interesting folklore about search engine algorithms.

Why is Australian radio associated with Disney? Well, because the engine thought that the Australian Broadcasting Corporation (ABC) was the same as the ABC that is part of Disney. With no explanation, that looks stupid, but with “ABC” highlighted, it is a reasonable mistake. That extra information makes the search engine more trusted.

Snippets: These days, we expect search engines to show passages from the matched documents and to highlight the matching words. Why is that important? Because it shows what the engine matched in that document and helps explain why it appears in the results. It eliminates the mystery so the user can say, “Oh you silly engine, that is the wrong ABC!”
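Here is a minimal sketch of that kind of snippet, assuming plain-text documents and a whitespace-separated query. The function name make_snippet, the width parameter, and the asterisk markers are all made up for illustration; a real engine would use its own tokenizer and markup.

```python
import re

def make_snippet(text, query, width=40, mark="*"):
    """Return a passage around the first query-term hit, with hits marked."""
    terms = [re.escape(t) for t in query.split() if t]
    if not terms:
        return text[:2 * width]
    # Match any query term as a whole word, ignoring case.
    pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        # No hit in this document: fall back to the start of the text.
        return text[:2 * width]
    # Keep `width` characters of context on each side of the first hit.
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    passage = text[start:end]
    # Wrap every hit inside the passage in the marker.
    return pattern.sub(lambda m: mark + m.group(0) + mark, passage)
```

With the marker swapped for bold tags, the user sees at a glance which “ABC” the engine matched.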

Because you liked: At Netflix, the recommendations are introduced with an explanation. For our account, it looks like this today, “The following movies were chosen based on your interest in: Man with the Movie Camera, Gladiator, Harvey.” Without that hint, I would be genuinely confused by recommendations including Steamboat Bill, Jr. and The Last Samurai.

Group by Topic: Showing related topics is helpful, but a topic name is usually not enough information for people to trust the link. Instead, show the topic and the first two or three documents in that topic. This is especially useful when the user’s query doesn’t match up with the way the topics are organized. A search for “linux” could match press releases, products, knowledge base, and so on. Show the first few matches in each of those areas and the contents are much more clear.
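A rough sketch of that grouping, assuming the engine hands back ranked (topic, title) pairs. The name group_by_topic, the per_topic limit, and the sample data are invented for this example.

```python
def group_by_topic(results, per_topic=3):
    """Group ranked (topic, title) pairs, keeping the first few titles per topic."""
    groups = {}
    for topic, title in results:
        # Topics appear in the order of their best-ranked document.
        bucket = groups.setdefault(topic, [])
        if len(bucket) < per_topic:
            bucket.append(title)
    return groups

# Hypothetical results for the query "linux", already in rank order.
results = [
    ("Products", "Search Server for Linux"),
    ("Press Releases", "Linux support announced"),
    ("Products", "Linux connector pack"),
    ("Knowledge Base", "Installing on Linux"),
    ("Products", "Linux admin toolkit"),
]
for topic, titles in group_by_topic(results, per_topic=2).items():
    print(topic)
    for title in titles:
        print("   ", title)
```

Each topic heading comes with a couple of real documents under it, which is what makes the link trustworthy.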

Google, Yahoo, and MSN cannot reveal their algorithms, but you can (unless you use Google for your site search, oops). The WWW engines must defend against spammers taking advantage of loopholes in their scores. If you own your own content and your own search engine, you can reveal as much as you want. Just don’t make it all about the engine, the users are there for the content.

Yellow Pages are Dead

At least the paper version of the yellow pages is dead. I opened the new phonebook to find an Indian restaurant for take-out, and the listings were skimpy enough that the by-cuisine section is gone. Restaurants that I know exist were not listed. My guess is that restaurants are dropping their yellow pages ads in favor of web sites. In the latest phone book for Palo Alto, that section in the directory just dropped below the useful level. I won’t bother with it again.

Unfortunately, local web search still isn’t doing the job. Yahoo! Local is the best, but browsing multiple pages for different kinds of restaurants is really clumsy compared to the good ol’ paper yellow pages. Try it: a Yahoo! search for indian restaurants, palo alto links to this Yahoo! Local result. Not bad, but it doesn’t seem very complete or up to date. Why list the Whole Foods six miles away in downtown Palo Alto but not the new one a mile away in Los Altos?

If I have more time to plan, I scan the Metroactive restaurant section which has pretty good coverage, but with some mysterious navigational division between full reviews and the one sentence descriptions.

In our case, we skipped restaurant take-out entirely, and grabbed some Indian from the deli/take-out section at Piazza’s, our local grocery. So switching from paper to web didn’t really pay off for the local restaurants.

Silicon Valley is a few years ahead of the rest of the country in web adoption, so let’s hope that local search can get it together before my parents in Texas are stranded with a skinny, useless yellow section in their phonebook.

Backpacking: A Cutting Board and a Fix for Slippery Pads

While in Bed, Bath, & Beyond getting a new coffee maker, I grabbed a couple of inexpensive items for backpacking.

A flexible cutting board. These cost $4 for a pack of two 12″x15″ sheets of tough plastic. I might cut one to a smaller size for easier packing. I think we’ll keep the other one for car camping. How do you cut a cutting board? I’m betting on my compound metal shears.

A roll of grabby rubber drawer lining, the kind that is soft with a sort of honeycomb of holes. Wrap a length of this around your sleeping pad, and it will stay put in your tent. Your sleeping bag will also stay on your pad. I chose the dark brown color so it won’t show as much dirt. This was $10 for a 20′ roll. Six or seven feet should be enough for one wrap around the pad with a bit of overlap, so this will supply three people.

Both of these ideas are from a backpacking colleague (and fellow Scoutmaster). They are lightweight and cheap, and address serious backcountry issues: food cleanliness and good sleep. Your mind and attitude are critical safety equipment, so you must keep them in good shape. If your trek leader is sleep-deprived and throwing up, they probably aren’t making the best decisions.

Hmm, sounds like another Scoutmaster Minute, if I can find a hook to something that matters to the boys.

More Bad Reporting from CNN Headline News

Yesterday, our local CNN Headline News radio station (KLIV 1590) ran a puff piece on the CEO of Harper-Collins. I think it was the CEO. She sounded like a sharp person, but I can’t manage to even verify her name on the badly-designed H-C website. In it, they claimed that H-C is the first publisher to make excerpts of their books available on-line. They rolled out that “innovative” idea this year. Bzzzt. Wrong.

OK, accepting the qualifier “publisher” does rule out relative unknowns like Amazon (search inside this book) and Google’s book search. Google and Amazon aren’t publishers. And “excerpt” accidentally rules out the Baen Free Library which publishes entire books. But it is still wrong.

Ever heard of the publisher O’Reilly? The reporter would have, if they’d ever even walked into the cube of anyone who keeps cnn.com running. O’Reilly has published book chapters for many years. Five years, seven, who knows? Heck, they’ve been doing it long enough to move past that stuff to a customizable on-line textbook publishing system for universities.

And I can’t even link to the CNN HN story so I can diss it specifically, because it doesn’t seem to exist on their website. Bleah.

I titled this “more bad reporting” because the worst science reporting I’ve ever heard was on CNN Headline News. I’ll write that up later.

Design for Easier HTTP Load Testing

I’ve told two people about this trick in the last few days, so it is worth writing it up.

It is hard to get a good distribution of requests in your HTTP load tester. It usually requires a knowledge of valid keys or users and a model of the distribution of the accesses. This can all be built in the load tester, but that seems to be a big barrier, since I’ve rarely seen that happen. I’ve certainly never done it and I’ve needed it several times.

The easy way to do this is to add a “random choice” parameter to the app. The app already knows the legal set of keys or users and can quickly make a choice. You already know the language and the code in the app, and the changes are localized to the URL parameter parsing. Let’s say you have a back-end server that returns records.

http://example.com/getRecords?key=12345&key=67890
http://example.com/getRecords?key=random&key=random

An HTTP load tester can access the single randomizing URL over and over again, and fetch different records each time. This is a trivial load test script. In Jakarta JMeter, it is one of the samples.

This is really very easy to write inside of the server. Getting a random key looks something like this, assuming that we already have an instance of java.util.Random initialized and ready to go.

// A Set has no get(), so copy the keys into a List before indexing.
key = new ArrayList<>(cache.keySet()).get(random.nextInt(cache.size()));

In Python, you can use the default, shared instance of the random source and the choice() convenience method:

key = random.choice(list(cache.keys()))

This can all be done in the code that parses the URL parameters. Once you have a random key, the remainder of the app executes with no changes.
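As a sketch of what that parameter parsing might look like, here is a small Python version. The function name resolve_keys and the dict-style cache are assumptions for illustration, not code from a real app.

```python
import random
from urllib.parse import urlparse, parse_qs

def resolve_keys(url, cache, rng=random):
    """Return the keys requested in url, replacing each "random" with a real key."""
    # parse_qs collects repeated parameters into a list: key=a&key=b -> ["a", "b"]
    requested = parse_qs(urlparse(url).query).get("key", [])
    legal_keys = list(cache)  # the app already knows the legal set of keys
    return [rng.choice(legal_keys) if k == "random" else k for k in requested]

# Hypothetical cache of records, keyed by record ID.
cache = {"12345": "first record", "67890": "second record"}
print(resolve_keys("http://example.com/getRecords?key=random&key=random", cache))
```

Passing in a seeded random.Random instance as rng makes a test run repeatable.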

If the app should be tested with a non-uniform distribution of accesses, that is also easy to do. Python’s random.paretovariate() looks especially good for Zipf (80/20 or “long tail”) distributions. Or you could duplicate that code in your favorite language:

def paretovariate(self, alpha):
    """Pareto distribution.  alpha is the shape parameter."""
    # Jain, pg. 495
    u = 1.0 - self.random()
    return 1.0 / pow(u, 1.0/alpha)
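One way to turn that into a skewed key choice is to treat the Pareto value as a rank in a list of keys sorted from most to least popular. This is only a sketch: pick_skewed is a made-up name, and the default alpha of 1.16 is an assumption (a shape commonly cited for the 80/20 rule) that you would tune against real access logs.

```python
import random

def pick_skewed(keys, alpha=1.16, rng=random):
    """Pick a key so that keys near the front of the list are chosen far more often."""
    while True:
        # paretovariate() returns a value >= 1.0; shift it to a zero-based rank.
        rank = int(rng.paretovariate(alpha)) - 1
        # The tail is unbounded, so retry when the rank runs off the end of the list.
        if rank < len(keys):
            return keys[rank]

# Hypothetical key list, most popular first.
keys = ["home", "search", "login", "profile", "settings", "help"]
print([pick_skewed(keys) for _ in range(10)])
```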

For user logins, add an option to masquerade as a random user, or even a random user from certain classes (big profile, frequent login, new user …).

For testing search, I once made an especially fancy tester that would access a log of queries in order, but start at a different place for each client. This preserves the time locality of queries while giving each client a different set. I used a cookie to hold the per-client state, so that each client would access the queries in order from their starting place. It went roughly like this:

  1. If the client did not send a cookie, choose a random index in the log.
  2. Otherwise, read the cookie to get an index.
  3. Set the cookie to the next index.
  4. Wrap the index, modulo the log size.
  5. Run the search with the query at that index.
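The steps above can be sketched like this. It is a reconstruction from memory, not the original tester: next_query is a made-up name, the cookie is treated as a plain string holding the index, and setting the actual HTTP cookie is left to whatever web framework you use.

```python
import random

def next_query(query_log, cookie_value=None, rng=random):
    """Return (query, new_cookie_value) for one request from a load-test client."""
    if cookie_value is None:
        # New client: start at a random place in the log.
        index = rng.randrange(len(query_log))
    else:
        # Returning client: continue from where the cookie says.
        index = int(cookie_value)
    # Wrap around so a client that reaches the end starts over at the top.
    index %= len(query_log)
    new_cookie = str((index + 1) % len(query_log))
    return query_log[index], new_cookie

# Hypothetical query log, in the order the queries arrived.
log = ["linux", "abc radio", "indian restaurants palo alto"]
query, cookie = next_query(log)          # first request, no cookie yet
query, cookie = next_query(log, cookie)  # next request continues in order
```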

Now go test your software. I might need to use it someday.

While Reading Richard III

You’re expected to think deep thoughts while reading Hamlet, but Richard III is a crowd-pleaser, Shakespeare’s first big hit on the stage, so herewith a series of thoughtlets.

Shakespeare is famous for insults, but this play specializes in curses. There are a few good insults, of course. Richard calls Queen Margaret a “foul, wrinkled witch.” That sets the tone for that relationship. But the curses are almost as evil as the deeds: “Die neither mother, wife, nor England’s queen!” Queen Margaret says, wishing the deaths of Queen Elizabeth’s father, sons, and husband.

Some of Shakespeare’s plays are more accessible than others. This one is pretty good, though keeping track of four kings, two queens, two near-queens (dead and promised, respectively), and innumerable lords gets a bit old. It would be easier to understand if Richard 3.0 directly followed Richard 2.0. But then we’d have betas. Dang.

Is it time for a new book of management ideology? The main challenge in writing Management Secrets of Richard III would be getting 300 pages out of “demand total loyalty, lie to everyone, kill anyone in your way.”

In Act 3, Scene 7 Richard makes a tremendously risky and confident move. Almost every obstacle between him and the throne is dead or locked away, so he refuses it and makes them beg. His false objections are a hint of truth, “Alas, why would you heap this care on me? / I am unfit for state and majesty.” When he finally agrees, he claims that the blame lies on them if it all goes wrong. Suckers.

The history of actors playing Richard seems to be a continuing struggle to rise above chewing the scenery. The part invites several kinds of overacting, but also allows very different interpretations. It must be a real thrill to nail that part.

Richard Plantagenet was born October 2, 1452, Niccolò Machiavelli on May 3, 1469. It is a shame that they never met. Niccolò had the theory, Richard the practice.

The play isn’t history, Richard couldn’t have been that evil. It is based on seriously biased Tudor histories. That makes it more fun, like listening to the home-town radio announcers for baseball instead of the carefully even-handed TV commentators. Before every Rice football game they’d tell us “No cheering in the pressbox,” but that didn’t stop us from writing how the dominant Owls crushed the hapless Horned Frogs.

I’m not especially happy with the notes in this edition (Signet). About a third of them are things I don’t need to be told, and there are quite a few mysteries without notes. I guess it is back to the Arden Shakespeare. More expensive, but worth it.

Many Shakespeare plays are just full of lines that are widely quoted. Beyond “now is the winter of our discontent” and “my kingdom for a horse,” there aren’t many in Richard III. Those two are the first line of the play and the last of the second to last scene, Richard’s first and last lines. Clearly Shakespeare knew when to play his best cards.

The pacing is interesting in Act 5. The last two scenes are extremely short, 13 and 40 lines to cover the final clash between Richard and Richmond, Richard’s death, and Richmond’s closing speech. If those were preceded by normal action, the play would feel cut short, unfinished. But the scene before delays the clash with a parade of ghosts who recapitulate Richard’s murders, something that would usually be done in a final speech. We are held at the high point of tension, and the shock of the final scenes can hit with full force.

These are not subtle or especially deep characters. Richard is broadly drawn, with some shreds of humanity, but the other characters are pretty shallow. Mostly, we get to watch them get sucked into the evil vortex that is Richard and see how much they struggle against it. They each get their turn, but it is all about Richard.

Update: A few hours after I posted this, I read about a Shakespeare-themed virtual world. The first play they’ll tackle is Richard III.

CALL BRTHDY(50)

FORTRAN turns 50 years old in four days. October 15, 1956 was the release date for Programmer’s Reference Manual, The FORTRAN Automatic Coding System for the IBM 704 EDPM (6.1 Meg scanned PDF). FORTRAN was an amazing achievement, inventing the idea of a compiler while generating code as fast as hand-coded assembler.

There are a couple of early papers that give a feeling for how hard all this was. The FORTRAN Automatic Coding System (1957) describes the design of the compiler. History of FORTRAN I, II, and III (1978) goes into the economics of computing at the time, influences, design decisions, and follow-ons. They didn’t have it really working until April 1957, which seems rather similar to modern software projects.

Some possible ways to commemorate this occasion:

  • RESTRICT YOUR TYPING TO 6-BIT BCD (SEE APPENDIX A OF MANUAL).
  • Use no words longer than six chars.
  • Propose the arithmetic IF as a Java extension.
  • Use GO TO. A lot.
  • Number your statements.
  • Refer to the LEDs on your computer as “sense lights”.
  • Solve a problem that uses only 32K 36-bit words of memory. Data and program have to fit.
  • No indentation.
  • Switch to vacuum tube heat this winter.
  • Write a program on a coding form, type it in, and run it. If there are any errors, even syntax errors, start over.

Prepare for the festivities by (re)reading the Programmer’s Reference Manual. It is only 51 pages, and refreshingly clear. The whole language fits in your head — no running back to the manual to figure out why const is propagating through your templates like a virus or whether you should use notify or notifyAll.

Goodbye, Liz

A dozen years ago, I heard one Liz Phair song on the radio and bought the CD. I’ve bought every one since. I heard the first song on her most recent CD, and I think this may be my last. I listened all the way through, but it was painful. By the end, I was making guesses about the next tired rock cliche: will she say “Baby” in this one? Yes.
As one Amazon reviewer said, “It is OK if she wants to be Sheryl Crow, but this isn’t even good Sheryl Crow.”

What the heck happened? Selling songs and making a living is good, but do they have to be so bland? Exile in Guyville is in the past, I don’t expect another one of those, but Somebody’s Miracle is just lame. Liz can write pop, I’m sure of it. “Polyester Bride” from whitechocolatespaceegg has a monster hook. The chorus will be stuck in my head for two days just from typing that sentence.

It’s been a fun ride, but I’m getting off now. Spread your wings and fly away, Liz.