The User Is Not Broken

Karen Schneider posted a meme-ifesto under the title The User Is Not Broken. I like it. This is a much better slogan than “The customer is always right.” We all know customers who are deeply mistaken, even wrong, but they are not broken. The user is not the problem. The user has a problem and they may be really confused about how to solve it or what the problem really is. Our work is to help them solve their problem.

HTTP Compression is not an Obvious Win

Tim Bray posts about How to Send Data and asks, “if you’re sending anything across the Net, why would you ever send it uncompressed?” Mostly because it is a lot messier than it should be and the payoff is small. I’ll survey the problems we ran into when we added HTTP compression to Ultraseek.

Tim also brings up encryption. That has many of the same problems, but the payoff is much, much bigger, so it is usually worth the hassle.

If you can store your content compressed, some of these problems go away, but not all. Compressing on the fly is often not worth the bother.

Algorithm Compatibility: The spec lists three standard compression algorithms: compress, deflate and gzip. Compress isn’t as effective and browsers implement deflate in two incompatible ways, so the first step is to only send gzip. With gzip, you still have to decide on a compression level.
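To get a feel for the compression-level decision, here is a minimal sketch using Python's standard gzip module. The sample data (a made-up HTML list of URLs) and the helper name gzip_sizes are assumptions for illustration, not anything from Ultraseek.

```python
import gzip

def gzip_sizes(data: bytes, levels=(1, 6, 9)) -> dict:
    """Return the gzip-compressed size in bytes for each compression level."""
    return {level: len(gzip.compress(data, compresslevel=level)) for level in levels}

# Hypothetical sample: a repetitive HTML list of URLs.
sample = b"".join(b"<li>http://example.com/page/%d</li>\n" % i for i in range(2000))

sizes = gzip_sizes(sample)
for level, size in sizes.items():
    print(f"level {level}: {size} bytes ({len(sample) / size:.1f}x smaller)")
```

Higher levels cost more CPU per response, so it is worth running something like this on your own pages before picking one.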

Keep-alive: For HTTP keep-alive, you need to specify the content length in the HTTP header. But with compression, you don’t know that length until after the compression, so you can’t send the header to the client until the compression is finished. This can add substantial delay. You avoid this by using chunked transfer coding, an additional complexity.
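As a rough sketch of what chunked transfer coding buys you, the generator below frames a gzip stream as HTTP/1.1 chunks, so nothing needs to know the final length up front. It uses Python's zlib module; the function name gzip_chunked is hypothetical, and a real server would also have to send the Transfer-Encoding and Content-Encoding headers.

```python
import zlib

def gzip_chunked(pieces):
    """Yield HTTP/1.1 chunked-coding frames that carry one gzip stream."""
    # wbits=31 asks zlib for gzip framing (gzip header and trailer).
    compressor = zlib.compressobj(6, zlib.DEFLATED, 31)

    def frame(payload: bytes) -> bytes:
        # Each chunk is: hex length, CRLF, payload, CRLF.
        return b"%x\r\n" % len(payload) + payload + b"\r\n"

    for piece in pieces:
        out = compressor.compress(piece)
        if out:  # the compressor may buffer and return nothing yet
            yield frame(out)
    tail = compressor.flush()
    if tail:
        yield frame(tail)
    yield b"0\r\n\r\n"  # zero-length chunk ends the body

body = b"".join(gzip_chunked([b"hello ", b"world"]))
```

The zero-length chunk tells the client the response is done, so the connection can stay open without a Content-Length header.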

Server-side Latency: A great trick for responsive servers is to push content out the socket as soon as you have it. This is especially important if the content takes a while to generate. In our case, you can list all the URLs the spider knows about for a site. This can take a while. So, flush out the template HTML, then flush every N list items. If your content compresses really well (an HTML list of URLs may see 10X compression), then you have a choice of pushing out short packets or making the customer wait. Either way, compression has not improved the user-visible performance.
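The short-packet problem is easy to see in a sketch. This one compresses a made-up list of URLs as a single gzip stream and forces the compressor to release its output every N items with a sync flush, then reports how many bytes each flush produced. The function name flush_sizes and the sample data are assumptions for illustration.

```python
import zlib

def flush_sizes(items, every=50):
    """Compress items as one gzip stream, sync-flushing every `every` items.

    Returns the number of compressed bytes released at each flush,
    plus one final entry for the end of the stream.
    """
    compressor = zlib.compressobj(6, zlib.DEFLATED, 31)  # 31 = gzip framing
    sizes = []
    pending = 0
    for count, item in enumerate(items, start=1):
        pending += len(compressor.compress(item))
        if count % every == 0:
            # Z_SYNC_FLUSH pushes out everything buffered so far.
            pending += len(compressor.flush(zlib.Z_SYNC_FLUSH))
            sizes.append(pending)
            pending = 0
    pending += len(compressor.flush())  # finish the stream
    sizes.append(pending)
    return sizes

# Hypothetical list of URLs for one site.
items = [b"<li>http://example.com/page/%d</li>\n" % i for i in range(200)]
raw_total = sum(len(item) for item in items)
sizes = flush_sizes(items)
print(f"raw bytes: {raw_total}, bytes per flush: {sizes}")
```

Each flush carries far fewer bytes than the uncompressed items it represents, which is exactly the short packet versus waiting trade-off. Frequent sync flushes also cost a little compression ratio.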

TCP Latency: If the latency is dominated by network round-trips or new connections, compression won’t help much. New connections go through TCP slow start, so reducing your page from six packets to four won’t eliminate a single round trip. Slow start doubles the outstanding packets for each round trip, so you have 1, 2, 4, … in transit. That means one round trip delivers one packet, two deliver three, and three deliver seven, until you hit the max in-transit buffer size.
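The slow start arithmetic can be checked with a few lines of Python. This is a simplified model under the assumptions stated above: the window starts at one packet and doubles every round trip, with no loss. The function name round_trips and the optional max_window cap are my own additions for illustration.

```python
def round_trips(packets: int, initial_window: int = 1, max_window=None) -> int:
    """Count round trips needed to deliver `packets` under idealized slow start."""
    window = initial_window
    sent = 0
    trips = 0
    while sent < packets:
        sent += window          # everything in the window arrives this round trip
        trips += 1
        window *= 2             # slow start doubles the window
        if max_window is not None:
            window = min(window, max_window)
    return trips

for n in (1, 3, 4, 6, 7, 8):
    print(f"{n} packets: {round_trips(n)} round trips")
```

In this model both a four-packet page and a six-packet page take three round trips, so the smaller page saves bandwidth but no waiting.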

Browser Compatibility: The deflate algorithm mess is one source of browser incompatibility, but there are also older browsers that implement compression badly or only recognize “x-gzip” in the response headers. A really robust implementation may need to check the user agent before sending compressed responses.
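Here is a minimal sketch of the header side of that check: parse Accept-Encoding and echo back whichever spelling the client asked for, "gzip" or "x-gzip". The function name choose_gzip_token is hypothetical, the "*" wildcard is deliberately ignored, and the user agent check would be a separate step on top of this.

```python
def choose_gzip_token(accept_encoding: str):
    """Return 'gzip' or 'x-gzip' if the client accepts it, else None."""
    accepted = {}
    for part in accept_encoding.split(","):
        fields = part.strip().split(";")
        token = fields[0].strip().lower()
        if not token:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        accepted[token] = q
    # Prefer the standard spelling, fall back to the older one.
    for token in ("gzip", "x-gzip"):
        if accepted.get(token, 0.0) > 0:
            return token
    return None

print(choose_gzip_token("gzip, deflate"))
print(choose_gzip_token("x-gzip"))
print(choose_gzip_token("gzip;q=0, deflate"))
```

Whatever this returns goes in the Content-Encoding response header. A result of None means send the response uncompressed.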

Compressed Formats: Compressing an already-compressed format is a complete waste of time, so you need to make sure to not compress JPEGs, zip archives, etc.
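One simple way to handle this is a content-type check before compressing. The sketch below is an assumption about how you might do it, and the list of types is illustrative, not exhaustive.

```python
# Illustrative, not exhaustive: media types that are already compressed.
ALREADY_COMPRESSED = {
    "image/jpeg",
    "image/png",
    "image/gif",
    "application/zip",
    "application/gzip",
}

def should_compress(content_type: str) -> bool:
    """Return False for media types that are already compressed."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type not in ALREADY_COMPRESSED

print(should_compress("text/html; charset=utf-8"))
print(should_compress("image/jpeg"))
```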

Hard to Measure: Good performance measures for this need a range of tests over different real networks with varying bandwidth/delay properties. In our tests, we could not demonstrate conclusive improvement. But it didn’t hurt, so we leave it turned on.

Way back in 1997, the LAN-based tests of HTTP compression showed small improvements, around 15-25%. That is not a meaningful difference for the user interface, and maybe not for net utilization. If there is any increase in latency to start rendering the page, that will be a big loss for responsiveness.

Older than FORTRAN

But only by a few months. Today is my 50th birthday, and the most reliable “birthday” I can find for FORTRAN is October 15, 1956, the publication date for the FORTRAN Programmer’s Reference Manual (scanned PDF).

I wrote my first program in FORTRAN. To be specific, FORTRAN IV EMU from Eastern Michigan University, running on the IBM 1401 (I think) at Rose-Hulman Institute of Technology. I was at Operation Catapult, a three-week program for high school juniors. Big fun, and I’m glad to see it is still running.

The program was a two-body simulation, with the paths printed in line-printer graphics. I wonder if I still have a copy of that somewhere in the “closed stacks” at the back of the garage.

FORTRAN wasn’t my first computer language; that was BNF grammars. I was reading SF in math class because I was being taught logarithms for the third time, and I’d learned them before I was taught them the first time (got a slide rule for Christmas in seventh grade). The teacher noticed and had me stay after to chat. He sympathized, but asked me to at least read a math book during class. So, I found one on computer programming and churned through it over a couple of weeks. I still have a fondness for colon-equals as an assignment op.

PowerBook out the Window

No, I didn’t throw it and I’m not switching. Someone broke our window at 3:30 AM and grabbed my PowerBook off the table. Gone.

A window breaking is really loud. We thought that the kittens had managed to knock down a stack of cookie sheets with dishes on top of it until we found the broken glass by the table. The Palo Alto police were really nice, but it was hard to get back to sleep. The kids slept through the whole thing, of course. And all this two days before we left on vacation.

The IT department has been really great — my new IntelBook is already delivered, waiting for me to return from Maui.

I miss the data more than the hardware. I wasn’t very good about backups, but I did treat most of the laptop data as volatile. E-mail lives on the server and I’m religious about the digital photos being on two separate storage devices before I delete them from the camera. Code is all in CVS. Software keys are copied to the home iMac. Still, there are plenty of miscellaneous things that are just gone, like notes from the Patrol Leaders Council (time to trust the Troop Scribe to take notes).

Since I’m starting clean on the new machine, I’m open to recommendations for Mac software (especially backup).

Google Linux Distro: Desktop or Appliance?

This news article on Google Ubuntu, aka Goobuntu, only talks about desktop Linux, toolbars, clients, and challenging Microsoft. No mention of a base for running Google search without an appliance or web caching or GFS-based fault-tolerant file servers or any of that other server-room stuff.

Funny how people only think about server-side stuff inside Google (“they’re building their own internet!”) and client-side stuff outside (“they want to be Microsoft!”).

I wonder what is really up.

Problem-Solving Products

Here is a very clear statement about understanding your product:

“Don’t start a business if you can’t explain what pain it solves, for whom, and why your product will eliminate this pain, and how the customer will pay to solve this pain. The other day I went to a presentation of six high tech startups and not one of them had a clear idea for what pain they were proposing to solve.”
— Joel Spolsky, Micro-ISV: From Vision to Reality

I also like the advice from What Color Is Your Parachute?, that companies hire a person to solve a problem and they don’t want to get new problems. They buy products exactly the same way.

What problem does your product solve?

What problems does your product create?