Blekko (a new search engine) has been in the news and on blogs this week, and it looks promising.  The about page has some useful info, like a list of principles for search engines.  The goal of openness is appealing, though time will tell whether it’s a good idea or not.

Slashtags seem to be the main contribution: users can add tags to webpages, and those tags can be included in a search.  The most compelling examples seem to be adding conservative or liberal slashtags to political searches.  The complication is that slashtags are user-defined, so they mix many different kinds of text classification.  Here are a few examples of slashtag types:

  • topic tags:  /music /tech /neuroscience
  • ???:  /humor /gossip
  • source politics:  /conservative /liberal
  • genre?:  /youtube /twitter /tips-and-tricks /homepage /links /blogs

The people at blekko defined several tags themselves, such as /date, which sorts results by date, and /people, which only returns pages associated with a person.

The main question in my mind is:  How much of this can’t be represented in traditional search?  In response, I’d say there are two things.  First, tags may reflect attributes of the document that are orthogonal to the topic/bag-of-words.  Second, tags are defined by users, so even where a page’s author never outright uses the term “homepage”, the slashtag can fill that in.
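To make the second point concrete, here is a minimal sketch of a keyword search with a slashtag filter layered on top.  Everything in it is an assumption for illustration: the toy documents, the tags, and the function names `parse_query` and `search` are made up, and I have no idea how blekko actually implements this.

```python
# Toy index: each document has text plus a set of user-supplied slashtags.
# The documents and tags are invented for illustration only.
DOCS = {
    "a": {"text": "jane smith research publications contact",
          "tags": {"homepage", "people"}},
    "b": {"text": "jane smith wins award reports local paper",
          "tags": {"news"}},
    "c": {"text": "welcome to my homepage about cats",
          "tags": {"homepage"}},
}


def parse_query(query):
    """Split a query like 'jane smith /homepage' into keywords and slashtags."""
    tokens = query.lower().split()
    tags = {t[1:] for t in tokens if t.startswith("/") and len(t) > 1}
    words = [t for t in tokens if not t.startswith("/")]
    return words, tags


def search(query, docs=DOCS):
    """Return ids of docs that contain every keyword and carry every slashtag."""
    words, tags = parse_query(query)
    hits = []
    for doc_id, doc in docs.items():
        doc_words = set(doc["text"].split())
        if all(w in doc_words for w in words) and tags <= doc["tags"]:
            hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    print(search("jane smith homepage"))   # keyword only
    print(search("jane smith /homepage"))  # keyword plus slashtag
```

In this toy setup, document "a" never uses the word “homepage”, so the plain keyword query misses it, while the slashtag query finds it because a user supplied the label.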

My second concern is that it’s very dependent on the user base.  Even the “type” of the slashtags is dependent on the user base.  If you wanted to, you could go in and add a “/interesting” tag to things.  Because people have varying interests, that could become a useless tag.  It also strikes me as more susceptible to search engine optimization pranks (there have been a few semi-famous ones with Google), where people push an unflattering site to the top of the results for a query about someone they don’t like.  I think there was one about George Bush or maybe Scientology, but I can’t remember.

Beyond the problems of reaching a consensus on slashtags and of people gaming them, there are intrinsic difficulties.  For instance, a webpage might not have been tagged yet (or might not be fully tagged).  This blog describes how blekko can auto-tag for certain categories, but it is a small manually-maintained tag set.  I imagine they’ll eventually need to train auto-tagging on user tags, though I imagine it’s scary to trust unknown taggers.  At the same time, with topics at least there is a natural hierarchy, but I don’t think tags work that way.  For example, the query “memory /science” might not return something tagged as /neuroscience (unless I’m mistaken).
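The hierarchy issue is easy to show with a small sketch.  It assumes a hypothetical parent-to-child relation between tags (`TAG_CHILDREN`, which I made up) and compares flat tag matching against matching that expands a query tag to its descendants.  Whether blekko does anything like this, I don’t know.

```python
# Hypothetical parent -> children relation between slashtags.
# Nothing here comes from blekko; it only illustrates the idea.
TAG_CHILDREN = {
    "science": {"neuroscience", "physics"},
    "neuroscience": {"memory-research"},
}


def expand_tag(tag, children=TAG_CHILDREN):
    """Return the tag together with all of its descendants."""
    seen = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in a user-built hierarchy
        seen.add(current)
        stack.extend(children.get(current, ()))
    return seen


def matches(doc_tags, query_tag, use_hierarchy):
    """Check whether a document's tags satisfy a single query slashtag."""
    allowed = expand_tag(query_tag) if use_hierarchy else {query_tag}
    return bool(allowed & set(doc_tags))


if __name__ == "__main__":
    page_tags = {"neuroscience"}
    print(matches(page_tags, "science", use_hierarchy=False))
    print(matches(page_tags, "science", use_hierarchy=True))
```

With flat matching, a page tagged only /neuroscience fails the /science filter; with expansion it passes.  The catch is that someone has to build and maintain the hierarchy, which runs into the same consensus problem as the tags themselves.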

Overall I’ll say that it’s an interesting idea, but risky in terms of trusting anonymous users.  Some communities of anonymous users are excellent and others are awful.  They may come up with automated ways to flag questionable taggings for review, or some similar mechanism for keeping the user base in check.
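As a rough idea of what automated flagging could look like, here is one simple heuristic: send a (page, tag) pair to review if fewer than some number of distinct users applied it.  This is purely my own guess at an approach, not anything blekko has described, and the function name `flag_for_review` and the `min_users` threshold are hypothetical.

```python
from collections import defaultdict


def flag_for_review(taggings, min_users=2):
    """Flag (url, tag) pairs applied by fewer than min_users distinct users.

    taggings is an iterable of (user, url, tag) tuples.  Repeated taggings
    by the same user count only once, so one person cannot vouch for a tag
    just by submitting it many times.
    """
    supporters = defaultdict(set)
    for user, url, tag in taggings:
        supporters[(url, tag)].add(user)
    return sorted(
        pair for pair, users in supporters.items() if len(users) < min_users
    )


if __name__ == "__main__":
    # Made-up sample data.
    sample = [
        ("u1", "x.example", "liberal"),
        ("u2", "x.example", "liberal"),
        ("u3", "y.example", "interesting"),
        ("u3", "y.example", "interesting"),
    ]
    print(flag_for_review(sample))
```

A count of distinct users is easy to game with throwaway accounts, so a real system would presumably need some notion of tagger reputation on top of this.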

Categories: nlp Tags:
