Tuesday, July 31, 2007

The changing nature of search

Since the dawn of the web-driven internet, search engines have been largely key-word based. By that I mean the web pages of the world are indexed by the words that appear on each page. How the results are sorted after you hit the Go button is the product of quite a bit of intellectual property, but the big boys in the industry rank results based on popularity as a proxy for relevance.
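To make that concrete, here is a minimal sketch in Python of a key-word index with popularity-ordered results. The pages, the popularity numbers, and the function names are all invented for illustration; real engines are far more elaborate, and their ranking formulas are not public.

```python
from collections import defaultdict

# Hypothetical toy corpus: page name -> (text, popularity score).
PAGES = {
    "tie-guide": ("how to tie a half-windsor knot", 90),
    "knot-history": ("a history of the windsor knot", 40),
    "boat-knots": ("how to tie a boat to a dock", 70),
}

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, (text, _popularity) in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, pages, query):
    """Return pages containing every query word, most popular first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, key=lambda name: pages[name][1], reverse=True)

INDEX = build_index(PAGES)
print(search(INDEX, PAGES, "tie"))
```

Note that the index knows nothing about what a page means. It only knows which words are on it, and the popularity score alone decides the order.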

This is an approach that works well for items that are popular, but not as well for items that are decidedly niche or were JUST published and therefore don't have much popularity yet. Also, really good searches are best thought of logically, and logic is not something most users are trained in. Drilling down from a popular topic into the sub-realm you're interested in almost always requires excluding search terms, or choosing between two. And most people don't get that sort of thinking.

The search engine I used before Google came on the scene in a big way was Alta Vista. That one used actual logical operators for its searches, which pleased the computer scientist in me. When looking up weird stuff sometimes I ended up with searches that looked like this:

((term OR term) AND (term OR term)) NEAR term

I know what that means, and how the parens work in there. But, um, normal people don't, even though it used words instead of & (and), | (or), or ! (not). What 'normal people' expect is to type the phrase "How do I tie a half-windsor" into the search engine and get a handy how-to in the first few links. Most search engines these days will attempt to search the whole phrase, or failing that, drop the common words (everything but 'tie' and 'half-windsor') and return what THAT comes up with.
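Both behaviors can be sketched in a few lines of Python. In this toy version OR is a set union and AND is a set intersection (NEAR is left out, since it needs word positions), and the phrase search falls back to the uncommon words when the exact phrase finds nothing. The pages, the stop-word list, and the function names are assumptions made up for the example, not how any particular engine does it.

```python
import re

# Hypothetical toy pages, invented for illustration.
PAGES = {
    "windsor-howto": "How do I tie a half-windsor? Start with the wide end on your right.",
    "knot-list": "Knots you can tie: half-windsor, four-in-hand, bowline.",
    "shoe-care": "How do I polish leather shoes?",
}

# Assumed stop-word list; a real one would be much longer.
STOP_WORDS = {"how", "do", "i", "a", "the", "you", "can", "with", "on", "your"}

def words(text):
    """Lowercase the text and split it into words, keeping hyphenated terms whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def pages_with(term):
    """Set of page names whose text contains the given word."""
    return {name for name, text in PAGES.items() if term in words(text)}

def phrase_search(query):
    """Try the whole phrase first; if nothing matches, retry on the uncommon words."""
    query_words = words(query)
    phrase = " " + " ".join(query_words) + " "
    exact = [n for n, t in PAGES.items() if phrase in " " + " ".join(words(t)) + " "]
    if exact:
        return sorted(exact)
    content = [w for w in query_words if w not in STOP_WORDS]
    if not content:
        return []
    return sorted(set.intersection(*(pages_with(w) for w in content)))

# (bowline OR polish) AND (tie OR shoes), written as set operations.
boolean_hits = (pages_with("bowline") | pages_with("polish")) & (
    pages_with("tie") | pages_with("shoes")
)
print(sorted(boolean_hits))
print(phrase_search("How do I tie a half-windsor"))
print(phrase_search("How can I tie the half-windsor"))
```

The second phrase query has no exact match, so it quietly becomes a search for 'tie' AND 'half-windsor', which is the same Boolean logic with the operators hidden from the user.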

Work has been going on for years now on what's called 'semantic search', which analyzes the query for the concept it is driving at, and returns results that match the concept. This is a much harder way to index, but arguably a more valuable one when it comes to returning valid results. As an example of why this is a good thing, take the term 'bondage'. It has two very different concepts associated with it. One is the Christian concept of the term ("deliver me from bondage", or, "I am in bondage to.."), the other is a bedroom activity that gets filtered by family filters. A semantic search engine should be able to tell the difference between queries using that word, and decide which large sub-set of pages it should return for the search.

This is decidedly tricky, as there is no easy way to automate the process of determining the concepts on a page. It takes MUCH beefier computing power to index a page for semantics and concepts than it does for key-words. Google and other search engines are now taking variations on the search terms and searching on those as a way to deliver more relevant results, but this only goes so far.

The other day I was searching on a non-Google site for a query that had the word "replicate" in it. Results returned by that query had the following words highlighted as keyword matches: replication, reproduction, reproduce. As it happens, I was searching for a term that has a rather specific technical meaning and is commonly used in that exact form. Reproduction, while next to replication in the thesaurus, had absolutely nothing to do with what I was interested in. Happily, this particular search engine allowed me to put in "+replicate" as a way to tell it not to try searching on variants, and that returned a much better list.
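Here is a rough Python sketch of that behavior: each query term is widened to a set of variants unless it carries a leading '+'. The variant table is a hand-written stand-in for whatever stemmer or thesaurus that site actually used (which I don't know), and the function names are made up.

```python
# Hypothetical variant table standing in for a stemmer or thesaurus.
VARIANTS = {
    "replicate": {"replicate", "replication", "reproduce", "reproduction"},
}

def expand(term):
    """Return the set of words to match for one query term.

    A leading '+' means: match this exact word only.
    """
    if term.startswith("+"):
        return {term[1:].lower()}
    word = term.lower()
    return VARIANTS.get(word, {word})

def matches(query, text):
    """True if every query term (or one of its variants) appears in the text."""
    text_words = set(text.lower().split())
    return all(expand(term) & text_words for term in query.split())

print(matches("replicate", "plant reproduction basics"))
print(matches("+replicate", "plant reproduction basics"))
print(matches("+replicate", "how to replicate a database"))
```

The plain query matches the page about plant reproduction; the '+' form rejects it and only accepts the page that uses the exact word.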

Once semantic search is here and works well, it'll out-perform traditional key-word based search for most searches. This will take some getting used to for those of us who understand the Boolean logic behind current search engines. Key-word search won't go away, though it may require special search syntax to invoke it. However, I predict that I'll just plain get used to semantic search and start typing sentences into the search box instead of key-words.
