Many search engines don’t index “stopwords”, words that are very common and have little meaning by themselves. The stopword list is often just the most frequent words in the language: “the”, “be” (and its inflections), “a”, “of”, and so on.
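Here is a minimal Python sketch of building a stopword list the way just described: count tokens across a corpus and keep the N most frequent. The function names, the simple regex tokenizer, and the toy corpus of titles are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters and apostrophes (a deliberately simple tokenizer).
    return re.findall(r"[a-z']+", text.lower())

def frequency_stopwords(docs: list[str], n: int) -> set[str]:
    """Return the n most frequent tokens across docs as a stopword set."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return {word for word, _ in counts.most_common(n)}

# Toy corpus, invented for this example.
docs = [
    "the history of the world",
    "a tale of two cities",
    "the name of the rose",
    "a room of one's own",
]
print(sorted(frequency_stopwords(docs, 3)))  # ['a', 'of', 'the']
```

On a corpus this small the boundary is clean ("a" appears twice, everything below it once). On real text, pure frequency can also sweep in common content words, which is one reason lists are often hand-edited.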
Search engines that index all words like to show off searches for “to be or not to be”, because stopword elimination can remove every word in the phrase. Of course, no one really searches for “to be or not to be” because we all know where it came from.
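A short sketch of why that phrase is the classic showcase: once stopwords are dropped at indexing or query time, nothing is left to match. The stopword set and the `index_terms` helper here are hypothetical; real engines use their own lists and analyzers.

```python
import re

# Hypothetical stopword list; real lists vary from engine to engine.
STOPWORDS = {"a", "and", "be", "not", "of", "or", "the", "to"}

def index_terms(text: str, stopwords: set[str] = STOPWORDS) -> list[str]:
    """Tokenize text and drop stopwords, as a stopword-eliminating indexer would."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]

print(index_terms("To be or not to be"))      # []
print(index_terms("A Tale of Two Cities"))    # ['tale', 'two', 'cities']
```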
Are there any real titles that are all stopwords? Does this matter? I’ve been indexing movie titles and found more than a few that are 100% stopwords.
- Being There (this is the first one I noticed)
- To Be and To Have (Être et Avoir)
- To Have and To Have Not
- Once and Again
- To Be or Not To Be (1942) (OK, it isn’t just a quote from Hamlet)
- To Be or Not To Be (1983)
- Now and Then, Here and There
- Be with Me
- I’ll Be There
- It Had to Be You
- You Should Not Be Here
- You Are Here
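One way to find titles like these in a catalog is to flag any title whose tokens are all stopwords. The sketch below assumes a hypothetical stopword set, made large enough to cover the words in the titles above, and normalizes the curly apostrophe so that “I’ll” is matched.

```python
import re

# Hypothetical stopword list for illustration only; it includes "here",
# which many traditional lists would not.
STOPWORDS = {
    "a", "again", "and", "are", "be", "being", "had", "have", "here", "i'll",
    "it", "me", "not", "now", "once", "or", "should", "the", "then", "there",
    "to", "with", "you",
}

def tokenize(title: str) -> list[str]:
    # Map the curly apostrophe to a straight one, lowercase, keep letter runs.
    return re.findall(r"[a-z']+", title.lower().replace("\u2019", "'"))

def is_all_stopwords(title: str, stopwords: set[str] = STOPWORDS) -> bool:
    """True if the title has at least one token and every token is a stopword."""
    tokens = tokenize(title)
    return bool(tokens) and all(t in stopwords for t in tokens)

for title in ["Being There", "I\u2019ll Be There", "You Are Here", "A Tale of Two Cities"]:
    print(title, is_all_stopwords(title))
```

With this tokenizer, a title made only of punctuation, such as +/-, produces no tokens at all, so it is not flagged here even though it is just as hard to find.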
The last one isn’t a traditional stopword, but think about the number of “click here” links on the web. It is a web stopword, for sure.
There was a time when the WordPress search could not handle “The Who”, though there were hundreds of mentions on various sites. Let’s not even go into !!! (an indie band).
My favorite movie title at Netflix (and also a band name) was +/-.
Surprisingly, the links to the movies still work, at least for signed-in Netflix subscribers.
Other stopword-rich performers are Don Was and The The.
Forgot about the band “Was (Not Was)”. Isn’t English fun?