My Preaching Schedule

I guess preaching is in my blood, like it or not.

I didn’t follow my father and grandfather into the ministry, but I recently realized that I have a regular preaching schedule. Twice a month, I deliver a “Scoutmaster Minute”, a traditional homily given at the end of a Boy Scout troop meeting. We gather in a circle, and I have a minute (or two or three) to say something meaningful and memorable.

My “parish” is this Scout troop, and the boys are in my care for a number of evenings and weekends each year, so I need to connect in those few minutes.

My father is an excellent preacher and a student of the art, so I’m not completely ignorant. Still, knowing and doing are separate things, and I’m still learning to practice what my father preached.

The ancient (and boring) formula is “tell ’em what you’re going to tell ’em, tell ’em, and tell ’em what you told ’em.” You might be able to get all that into a twenty-minute sermon, but it is a bit much for a minute or three.

My father’s preferred approach, learned from Reuel Howe at the Institute of Advanced Pastoral Studies (how do I remember these details?), is more work, but more rewarding — take something from scripture, something from life, and relate the two.

Scouting doesn’t have Scripture, and Baden-Powell was a bit of a free-thinker and pacifist for the current crowd at BSA National, but I keep my eye out for authoritative bits of outdoor lore.

I also pay extra attention to my own life and my own memories. What have I done that is an example, good or bad? What matters this week for this troop?

Somehow, I picked up a few useful sermon-writing habits from my father: always carry a book, make notes, practice your stories, and listen to others’ stories. Start with a rich pile of material (Jerry Weinberg’s fieldstone method), but also learn how to make a “good parts version” of that material. A great storyteller can spin a long yarn (Utah Phillips’ “Moose Turd Pie”), but I’m more comfortable with short and sweet.

I’ve started posting my Scoutmaster Minutes; the first two are Steve Irwin and Take the Bruised Apple.

These look very short when written down, but the second one is about a minute and a half when spoken, and felt pretty long in the meeting. Steve Irwin comes in right around thirty seconds and was very effective. I find this an interesting thing to get better at.

I haven’t had a problem finding a core, some quote or experience, but my first few minutes just petered out at the end. The two that are posted are after I started working on the close. What should I work on next?

Movable Type’s Impenetrable CSS

I really like MT, but the 3.x layout is really, really hard to work with. With MT 2.6, I spent about 45 minutes and had a simple fluid layout. Done. With 3.0, I wasted a couple of hours and got nowhere. I’ve seen a couple posts where people spent six or eight hours to get it working. That’s crazy.

I really, really hate fixed-width designs and Six Apart doesn’t offer a single fluid design in their styles, so I’m forced to do it all from scratch. A fixed-width design is like a guest walking into your home and immediately rearranging all the furniture to suit them. It is total design arrogance and I won’t do it.

The current layout is my old 2.6 templates pasted onto 3.x.

So, if parts of this blog are ugly, like the comments popup, well, it is my responsibility, but I just don’t have a full day to sacrifice to the CSS god and the ridiculous design from Six Apart. Sorry about that. It is on my list of things to fix.

Google Blog Search Catches Up, Sort Of

Google Blog Search adds support for blog ping only two years after it shipped in Ultraseek (in version 5.3, September 2004). Want to bet the Google Appliance still doesn’t have it?

It wasn’t hard to implement, either. I put it together for Ultraseek in a couple of days. Clearly, most of their people are working on ads, not search. After all, ads make money and search costs money.

Take the Bruised Apple

Scoutmaster Minute for Troop 14, September 26, 2006

You’ve probably heard the phrase “rank has its privileges.” That means the Patrol Leader can pick the best spot for his tent, is first in line for dinner, and gets to tell people what to do, right?

A friend of mine was in the Marine Corps, and has a story about this that has a different angle.

You are an officer eating with your unit, and they bring out a bowl of apples. There is one apple per person, because this is the military. One of the apples is bruised. Which one do you pick?

According to Dave, if you pick a good apple, you are not fit to lead in the Marine Corps. When you choose that, you are giving a bad apple to one of your Marines, and that means they will not be at their best. If they are not at their best, they might die, and they are your responsibility. Not giving them enough food is like not giving them enough bullets.

We aren’t Marines, but we are leaders. When you are leading, you might find that your tent goes up last, because you are helping a Tenderfoot get his tent set up snug and dry. You might spend a lot of time on the phone, making sure your guys know what is happening. You might be last in line for food and first in line to clean the pots. You might find that your privilege is to serve, like it is my privilege to serve you.

Take the bruised apple.

Nukers and Shills

Some fun new terminology in the “let’s spam the recommender” business. Paul Lamere of Sun Labs reports on a talk by Bamshad Mobasher at Recommenders ’06 about attacks on recommendation systems.

[Dr. Bamshad Mobasher] outlined two basic types of attacks: shilling – trying to promote an item and nuking – trying to demote an item. These types of attacks are quite real. The social site Digg is under constant attack by shills trying to get their story promoted to the front page.

Bamshad pointed to an example when a loosely organized group who didn’t like evangelist Pat Robertson managed to trick the Amazon recommender into linking his book “Six Steps to a Spiritual Life” with a book on anal sex for men.

Bamshad suggest that one way to defend against shills and nukes is to create hybrid recommenders – recommenders that not only use social data but some inherent measure such as text or acoustic similarity. These types of systems are typically more robust than pure social recommenders.

This isn’t a new thing. Around 1998 at Infoseek someone was trying to sell us a system that used auto-categorization with discussion groups. As soon as they opened it up, some hostile users hijacked a Christian group to be in the Satanism category (or maybe Wicca, it was a long time ago).

At the start of this post I called recommender spam a business. I don’t know if anyone is making money on it yet, but it is one more opportunity for spammers. Your product gets a higher rating, they get paid. If you notice that sites suddenly require a login or even a subscription to post ratings, thank the spammers.

Steve Irwin

Scoutmaster Minute for Troop 14, September 12, 2006

Steve Irwin, the crocodile hunter, died last week. People had a lot of opinions about him, but one thing that everyone agreed on was that he loved what he did. He loved wildlife and he loved being close to it.

You are lucky if you find something that you love that much, and it is really rare to be able to do it all the time.

If you can find something that you love even half as much as Steve Irwin loved wrestling crocs, and you can do it even one hour a week, do it.

That Game Boy Groove

I just got hooked on 8-bit music, made with sequencers on reprogrammed Game Boys, like Nanoloop and Little Sound DJ. I’ve never been that interested in video games, but this is some fun music, with a Zelda Dance Party groove.

Right now, I’m listening to “In the Dark” by Boy vs. Bacteria. The band seems to be one guy in Sweden. Whee!

For a sampler playlist, check out DJ Octobit at This Spartan Life.

Some of the musical transitions don’t quite make sense to me, but that is probably because I haven’t played the game. Everyone knows the music changes when you leave the meadow. Or something.

Some Specific, Non-boring Teaching Methods

Clarke Green has posted a series of articles titled Instructional Methods for Scouts. These are great for learning in Scouts, but they certainly aren’t limited to that. Teaching teenage boys is a special challenge, mostly because they haven’t learned to be quiet and polite when they are bored out of their skulls. If you’d like to move beyond boring your students, give these a try.

  1. Introduction
  2. Round Robins
  3. Guided Discovery
  4. Coach and Pupil
  5. Kim’s Game and Variations
  6. Circle Up!
  7. Preparing
  8. Who Instructs?

Or, get the whole thing as a PDF.

Christopher Alexander (Mis)reading Photographs

I’ve finished the first volume of Christopher Alexander’s The Nature of Order, and the photographs just jumped out at me. Several of the photos showing “wholeness” in everyday life were very, very good. The photos aren’t credited in the text, so I dug through the acknowledgments in the back. Surprise! The photos are by Henri Cartier-Bresson, Alfred Eisenstaedt, Andreas Feininger, André Kertész, and Eliot Porter, some of the finest photographers of the 20th century.

The central concept of The Nature of Order is wholeness, an aesthetic and mathematical order which creates good fit between things and people. There are photographs of wholeness in buildings, ceramics, and rugs, all by masters of those arts. There are also photographs of street scenes and everyday life. These photographs are by masters of photography, but they are not examined as art in themselves, only as documentation of wholeness in something else. Oops.

Alexander looks at the teacup, but through the photograph. The discussions of wholeness are always about the photograph’s time and place, never about the creativity of the photographer who chose that time and place to make the photograph. Alexander makes an important mistake when he treats artistic photography as pure documentation.

The mistake is easier to understand when you look at the photographers he uses. Most of them are working in a narrow style, the “high mimetic” mode (using Northrop Frye’s literary term) typical of Life magazine. The photographs intentionally show a world that is clearly like us, but better in some way. Most people do take these photos as documents, without realizing the skill and art involved in making a beautiful photograph from the living, moving world.

For Alexander, these are photographs of subjects or situations which strongly show wholeness. For me, these are photographers who can create art with strong wholeness from everyday subjects and situations. Unfortunately for Alexander, confusing the two is a serious mistake. Is the wholeness in the world or in the photograph? Is it innate or created by observers? Is wholeness flat and black and white or three-dimensional with colors and smells? If you are espousing a theory of fundamental order and wholeness in the world using photographic evidence, these aren’t questions you can dodge. They are central. These photographs are not neutral evidence of order and wholeness; they are themselves creations.

Alexander does use a few photographs by Eliot Porter and Edward Weston, clearly not high mimetic photographers. Again adapting Northrop Frye, these are recognizably real scenes, obviously superior in degree but not in kind (Frye calls this the “romantic” mode). These photos are used to illustrate form in nature, so it is appropriate to use photos that emphasize formal composition over documentation. Still, Alexander never even mentions that Eliot Porter might have created a photograph with order and wholeness out of available bits of nature instead of merely documenting the existing order. He seems to be misreading these more formal photographs in the same way as the others.

Two glaring examples of this misreading are with a single Henri Cartier-Bresson photo and with a series of André Kertész’s photos of Paris. Both cases have extensive discussions of the wholeness of the scenes as if the photographs were pure documentation.

The first example, pages 92-95, comes with a convenient contrasting example. First, we get a discussion of what is visible in the Cartier-Bresson photo. The next photo is of Alexander’s childhood home, and most of the discussion is about things not shown in the photo. In fact, this discussion is the first one where wholeness is clearly a three-dimensional concept and even an experiential path through three dimensions (like ZEN VIEW or INTIMACY GRADIENT in A Pattern Language). Until this point, it wasn’t clear whether wholeness was purely visual or was a characteristic of human activity.

Toward the end of the volume is a short section dedicated to André Kertész’s Paris. Kertész is an especially poor choice to treat as a documentary photographer. He was deeply visual and emotional, sometimes more surrealist than realist. His own comments on his photography make exactly this point: “The things I photograph are not at all outstanding. I make them stand out.” [from PBS video interview]. Alexander reads these photographs naively: “Can we aspire to this? To Kertész’s pictures?” [page 394].

How can it make sense for architecture to aspire to a photograph? A later Kertész photograph, Broken Bench, makes this point especially clearly. The photograph is of a park, but it certainly isn’t something we aspire to. The bench is broken! It does make sense as a symbolic portrait of an emotional state, perhaps of Kertész’s problems fitting into New York after leaving Paris. It isn’t any kind of evidence for or against the wholeness of that particular park, and there is no way for an architect to “aspire to this”. The art of that photograph has nothing to do with the design of parks and benches.

I do think there is a lot of value in Alexander’s thesis of wholeness, but it is deeply disappointing that a brilliant person working in an applied art (architecture) can’t tell the difference between a document and a work of art. Photography has been around for over 150 years. Get a clue, people.

Good to Great Search

I was reviewing a sample chapter from Lou Rosenfeld and Rich Wiggins’ upcoming book on search log analysis. This chapter covers Michigan State University’s steps in patching around an aging AltaVista engine. It is good history, but not very good advice. MSU’s first step was to build a manual Best Bets system to match individual queries to editorially chosen URLs.

Best Bets are very effective, but are usually a last resort, not the first. The strength of Best Bets is that the results are very, very good. The weakness of Best Bets is that the manual effort only improves the results for a single query. That had better be an important query! Most other kinds of tuning help all queries or at least a broad set, perhaps all results from one website or one web page template.

Here is what I suggest for improving your search:

  1. Get a better search engine. This will help all queries, even the ones you don’t measure. If you don’t already have a metric for “better”, use the relevance measure from step 4 combined with the required number of documents and query rate.
  2. Look at the top few hundred queries and record the rank of the first relevant result.
  3. For each query without a good hit in the top three (“above the fold”), find one or more documents (URLs) which would be good results.
  4. If you want a single number for goodness, use the ranks from step 2 to calculate MRR (mean reciprocal rank). Invert each rank number and average them. You’ll get a number between 0 and 1, where “1” means the first hit was relevant every time. If you are getting above 0.5, your engine is doing a pretty good job: you’re averaging a good result in the second position. You need at least 200 queries for MRR measurements to be statistically valid.
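
As a rough sketch, the MRR arithmetic from step 4 looks like this in Python. The function name and the sample queries are made up for illustration, and counting a query with no relevant result as zero is an assumption (a common convention, but not something the steps above specify).

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/rank over all queries.

    Ranks are 1-based. None means no relevant result was found,
    which this sketch scores as 0 (an assumption).
    """
    if not first_relevant_ranks:
        raise ValueError("need at least one query")
    total = 0.0
    for rank in first_relevant_ranks:
        if rank is not None:
            total += 1.0 / rank
    return total / len(first_relevant_ranks)


# Hypothetical sample: query -> rank of the first relevant result.
ranks = {"jobs": 1, "campus map": 2, "parking permit": 4, "transcripts": None}
mrr = mean_reciprocal_rank(list(ranks.values()))
print(f"MRR = {mrr:.4f}")  # (1 + 1/2 + 1/4 + 0) / 4 = 0.4375
```

Four queries is far too few for a real measurement; the sample is only there to make the arithmetic visible.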

Now you have a list of failed queries matched with good documents. Start at the top of that list, and try the following actions for each one. When one of your preferred documents is ranked above the fold, you are done with that query and should move on to the next query in your list.

  1. Are the preferred documents in the index at all? If not, get them in and recheck the ranking.
  2. Are the documents ranked above the preferred ones good quality or junk? If they are unlikely to be a good answer for a reasonable query, get them out of the index and recheck the ranking.
  3. Are the preferred documents valid HTML? Do they depend heavily on IFrames, JavaScript, Flash, or other too-clever features? Fix them to comply with ADA and Section 508 (it’s the law!), reindex, and recheck.
  4. Do the preferred documents have good titles (the <title> tag in HTML)? If not, fix that, reindex, and check the ranking.
  5. Take a critical look at the preferred documents and decide whether they really answer the query. If they don’t, add a page which does answer it. Index that page and recheck the ranking.
  6. Do the documents include lots of chrome, navigation, and other stuff which swamp the main content? If so, configure your search engine to selectively index the page (Ultraseek Page Expert) or use engine-specific markup for selective indexing in the page templates. Reindex and check the ranking.
  7. Do the terms in the preferred documents match the query? The query is “jobs” but the page says “careers”? If so, consider adding the keywords meta tag or synonym support in your engine (or go to the next step). Reindex and check the ranking.
  8. Add a manual Best Bet for this specific query pointing to the well-formatted, well-written document with the answer. Schedule a recheck in six months to catch site redesigns, hostname changes, etc. and hope that it doesn’t go stale before then.
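
To make the last step concrete, here is a minimal sketch of a Best Bets table in Python: one normalized query maps to one editorially chosen URL, plus a date for the six-month recheck. The class, its method names, and the example URL are all hypothetical; real engines have their own Best Bets features and formats.

```python
from datetime import date, timedelta

RECHECK_AFTER = timedelta(days=182)  # roughly six months


def normalize(query):
    """Lowercase and collapse whitespace so 'Jobs ' and 'jobs' match."""
    return " ".join(query.lower().split())


class BestBets:
    """Hypothetical exact-match table: one query, one chosen URL."""

    def __init__(self):
        self._bets = {}

    def add(self, query, url, added_on):
        self._bets[normalize(query)] = (url, added_on + RECHECK_AFTER)

    def lookup(self, query):
        entry = self._bets.get(normalize(query))
        return entry[0] if entry else None

    def due_for_recheck(self, today):
        """Queries whose recheck date has arrived."""
        return sorted(q for q, (_, due) in self._bets.items() if due <= today)


bets = BestBets()
bets.add("Jobs", "https://www.example.edu/careers/", date(2006, 10, 1))
print(bets.lookup("  JOBS "))    # matches after normalization
print(bets.lookup("careers"))    # None: a Best Bet fixes exactly one query
```

The second lookup shows the weakness described above: the manual effort helps a single query and nothing else.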

As you go through this process, you’ll find entire sites which are not indexed, have bad HTML, are heavy with nav and chrome, or are designed so that they just don’t answer queries (click for the next paragraph). Fixing those will tend to improve lots of things: WWW search rankings, web caching, accessibility, and bookmarkability.

Search matches questions to answers. It is really hard to improve the quality of the questions (get smarter customers?), and the matching algorithms are subtle and tweaky, so don’t be surprised when most of your time is spent improving the quality of the answers.

Lensman, Now With Real Swearing!

Arnold Zwicky’s Goram Motherfrakker! post about fake cuss words reminded me of the last time I read E. E. Smith’s Lensman series. The fake swearing there is of legendary silliness, to the point that it distracts me from the silliness which is essential to the plot. My trick is to substitute my own realistic cursing while reading. You can do it too. Use your imagination. I know you can do better than “she’s a seven sector call-out” even if it is just “check out the ass on that one!” You’ll need to get into a rhythm though, because you will encounter “by Klono’s gadolinium guts!”

The User Is Not Broken

Karen Schneider posted a meme-ifesto under the title The User Is Not Broken. I like it. This is a much better slogan than “The customer is always right.” We all know customers who are deeply mistaken, even wrong, but they are not broken. The user is not the problem. The user has a problem and they may be really confused about how to solve it or what the problem really is. Our work is to help them solve their problem.

Stupid Stemmer Tricks

A “stemmer” is software that returns a “stem” for a word, usually removing inflections like plural or past tense. The people writing stemmers seem to think they are finished when the mapping is linguistically sensible, but that leaves plenty of room for dumb behavior. Just because it is in the dictionary doesn’t mean it is the right answer.

Here are some legal, but not very useful things that our stemmers have done for us:

  • “US” to “we” (wrong answer for “US Mail”)
  • “best.com” to “good.com” (oops, don’t run URLs through the stemmer)
  • “number” to “numb” (correct, but when is the last time you meant “more numb”?)
  • “tracking meeting” to “track meet” (gerund to verb that can also be a noun, bleah)

The stemmer people say “use part of speech tagging”, but we need to do exactly the same transformations to the documents and to the queries. Queries rarely have enough text for the tagger to work.

A search-tuned stemmer would be really nice to have. I’ve got some ideas: leave gerunds alone, don’t treat comparatives and superlatives as inflections, and prefer noun-to-noun mapping. It would need to be relevance-tested with real queries against a real corpus, of course.
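
A few of these ideas can be sketched as guard rules wrapped around whatever stemmer you already have. This Python sketch is an illustration, not a tested design: `toy_stem` is a stand-in that only strips a plural “s”, the guard rules are crude string checks, and the input is assumed to be already tokenized with punctuation removed. The same `analyze` function would be applied to both documents and queries.

```python
import re

# Characters that suggest a hostname, URL, or address rather than a word.
URL_LIKE = re.compile(r"[./:@]")


def toy_stem(word):
    """Stand-in for a real stemmer: strips a plural 's' and nothing else."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def search_stem(token, base_stem=toy_stem):
    """Apply guard rules before handing the token to the base stemmer."""
    if URL_LIKE.search(token):
        return token  # don't run URLs through the stemmer
    if len(token) > 1 and token.isupper():
        return token  # likely an acronym such as "US"
    lowered = token.lower()
    if lowered.endswith("ing") and len(lowered) > 5:
        return lowered  # leave gerunds alone
    return base_stem(lowered)


def analyze(text):
    """Same transformation for documents and for queries."""
    return [search_stem(tok) for tok in text.split()]


print(analyze("US Mail"))            # ['US', 'mail']
print(analyze("best.com"))           # ['best.com']
print(analyze("tracking meetings"))  # ['tracking', 'meeting']
```

The acronym rule only helps when case survives, and many queries arrive in lowercase, so it is no substitute for relevance testing against real queries.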

The Right Tool for the Job

My son watered our plants on the patio with the hose, but missed a couple (and watered a couple of chairs, too). We pointed out the missed plants, so he got his super soaker, loaded that up, and used it to water them. It was just the right amount of water for two plants.

When choosing between tools for a job, why not choose the fun one?

HTTP Compression is not an Obvious Win

Tim Bray posts about How to Send Data and asks, “if you’re sending anything across the Net, why would you ever send it uncompressed?” Mostly because it is a lot messier than it should be and the payoff is small. I’ll survey the problems we ran into when we added HTTP compression to Ultraseek.

Tim also brings up encryption. That has many of the same problems, but the payoff is much, much bigger, so it is usually worth the hassle.

If you can store your content compressed, some of these problems go away, but not all. Compressing on the fly is often not worth the bother.

Algorithm Compatibility: The spec lists three standard compression algorithms: compress, deflate and gzip. Compress isn’t as effective and browsers implement deflate in two incompatible ways, so the first step is to only send gzip. With gzip, you still have to decide on a compression level.
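
The compression-level choice is easy to explore with Python’s standard `gzip` module. This is only a sketch: the sample page is synthetic (a repetitive list of made-up example.com URLs), and real content will compress differently.

```python
import gzip


def gzip_sizes(payload, levels=(1, 6, 9)):
    """Compressed size in bytes at each gzip level."""
    return {
        level: len(gzip.compress(payload, compresslevel=level))
        for level in levels
    }


# Synthetic page: an HTML list of 500 made-up URLs.
page = b"<ul>\n" + b"".join(
    b'<li><a href="http://www.example.com/page%d.html">page %d</a></li>\n' % (i, i)
    for i in range(500)
) + b"</ul>\n"

print("uncompressed:", len(page))
for level, size in gzip_sizes(page).items():
    print(f"level {level}: {size} bytes")
```

Higher levels cost more CPU per response, so when compressing on the fly the size numbers are only half of the trade-off.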

Keep-alive: For HTTP keep-alive, you need to specify the content length in the HTTP header. But with compression, you don’t know that length until after the compression, so you can’t send the header to the client until the compression is finished. This can add substantial delay. You avoid this by using chunked transfer coding, an additional complexity.
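
For reference, chunked framing itself is simple: each chunk is its length in hexadecimal, CRLF, the data, CRLF, and a zero-length chunk ends the body. Here is a minimal Python sketch; the function names are made up, and it ignores chunk extensions and trailers.

```python
LAST_CHUNK = b"0\r\n\r\n"  # zero-length chunk followed by an empty trailer


def encode_chunk(data):
    """Frame one piece of the body as an HTTP/1.1 chunk."""
    if not data:
        return b""  # an empty chunk would end the body early, so skip it
    return b"%x\r\n" % len(data) + data + b"\r\n"


def chunked_body(pieces):
    """Yield framed chunks as pieces become available, then the terminator."""
    for piece in pieces:
        framed = encode_chunk(piece)
        if framed:
            yield framed
    yield LAST_CHUNK


body = b"".join(chunked_body([b"hello", b"", b"world!"]))
print(body)  # b'5\r\nhello\r\n6\r\nworld!\r\n0\r\n\r\n'
```

The response would carry `Transfer-Encoding: chunked` instead of `Content-Length`, so the header can go out before compression has finished.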

Server-side Latency: A great trick for responsive servers is to push content out the socket as soon as you have it. This is especially important if the content takes a while to generate. In our case, you can list all the URLs the spider knows about for a site. This can take a while. So, flush out the template HTML, then flush every N list items. If your content compresses really well (an HTML list of URLs may see 10X compression), then you have a choice of pushing out short packets or making the customer wait. Either way, compression has not improved the user-visible performance.
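
The “push out short packets” option can be sketched with Python’s `zlib` module: compress each piece as it is generated and force a sync flush so the client can decode everything sent so far. This illustrates the trade-off and is not how Ultraseek implemented it; the function name and sample pieces are made up.

```python
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # asks zlib for a gzip header and trailer


def gzip_stream(pieces, level=6):
    """Yield gzip output after every piece instead of buffering the page."""
    compressor = zlib.compressobj(level, zlib.DEFLATED, GZIP_WBITS)
    for piece in pieces:
        # Z_SYNC_FLUSH emits all pending output, at some cost in ratio.
        data = compressor.compress(piece) + compressor.flush(zlib.Z_SYNC_FLUSH)
        if data:
            yield data
    yield compressor.flush()  # finish the stream and write the gzip trailer


pieces = [b"<ul>\n", b"<li>http://www.example.com/</li>\n" * 50, b"</ul>\n"]
sent = list(gzip_stream(pieces))
print([len(block) for block in sent])  # sizes of the blocks pushed out
```

Each flush hurts the compression ratio a little and may produce a small packet, which is the choice described above.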

TCP Latency: If the latency is dominated by network round-trips or new connections, compression won’t help much. New connections go through TCP slow start, so reducing your page from six packets to four won’t eliminate a single round trip. Slow start doubles the outstanding packets for each round trip, so you have 1, 2, 4, … in transit. One RTT for one packet sent, two for three, three for seven, until you hit the max in-transit buffer size.
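
The slow start arithmetic can be checked with a few lines of Python. This is an idealized model that follows the description above: the window starts at one packet and doubles every round trip. It ignores the connection handshake, delayed ACKs, and loss, and the `max_window` value is an arbitrary placeholder.

```python
def round_trips(packets, initial_window=1, max_window=64):
    """Round trips needed to deliver `packets` under idealized slow start."""
    if packets <= 0:
        return 0
    sent, window, trips = 0, initial_window, 0
    while sent < packets:
        sent += window          # send a full window this round trip
        trips += 1
        window = min(window * 2, max_window)
    return trips


for n in (1, 3, 4, 6, 7, 8):
    print(n, "packets:", round_trips(n), "round trips")
# 1 -> 1, 3 -> 2, 4 -> 3, 6 -> 3, 7 -> 3, 8 -> 4
```

Both the four-packet and the six-packet page take three round trips in this model, which is why shrinking the page from six packets to four saves nothing.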

Browser Compatibility: The deflate algorithm mess is one source of browser incompatibility, but there are also older browsers that implement compression badly or only recognize “x-gzip” in the response headers. A really robust implementation may need to check the user agent before sending compressed responses.

Compressed Formats: Compressing an already-compressed format is a complete waste of time, so you need to make sure to not compress JPEGs, zip archives, etc.
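
Put together, the “should this response be gzipped?” decision might look like the following Python sketch. The list of already-compressed types is illustrative and incomplete, the function name is made up, and it does not attempt the user-agent check mentioned above.

```python
# Illustrative, not exhaustive: media types that are already compressed.
ALREADY_COMPRESSED = {
    "image/jpeg",
    "image/png",
    "image/gif",
    "application/zip",
    "application/gzip",
}


def should_gzip(content_type, accept_encoding):
    """True if the body is worth gzipping and the client accepts gzip."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in ALREADY_COMPRESSED:
        return False
    for item in accept_encoding.split(","):
        coding, _, params = item.strip().partition(";")
        if coding.strip().lower() in ("gzip", "x-gzip"):
            quality = 1.0
            name, _, value = params.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as a refusal
            return quality > 0
    return False  # no gzip in Accept-Encoding: send it uncompressed


print(should_gzip("text/html; charset=utf-8", "gzip, deflate"))  # True
print(should_gzip("image/jpeg", "gzip"))                         # False
print(should_gzip("text/html", "deflate, gzip;q=0"))             # False
```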

Hard to Measure: Good performance measures for this need a range of tests over different real networks with varying bandwidth/delay properties. In our tests, we could not demonstrate conclusive improvement. But it didn’t hurt, so we leave it turned on.

Way back in 1997, the LAN-based tests of HTTP compression showed small improvements, around 15-25%. That is not a meaningful difference for the user interface, and maybe not for net utilization. If there is any increase in latency to start rendering the page, that will be a big loss for responsiveness.