Monthly Archives: March 2009

Blog search broken? DEAD?! Let’s just call it “hibernating”.

Over the last year, there has been a flurry of stories in the blogosphere about the inability of big search players and small startups to create a viable blog search engine. It all started gradually with the ubiquitous niggle at Technorati, the “leading” player in the blog search market, a company that probably spends more time thinking about the perfect ad placement coordinates than about spam reduction and search relevance... hmm, well, you see what I mean by “ubiquitous niggle”.

But back then I just thought it was all meant to encourage Technorati, as we all have this innate need to see Google beaten at any race they’re in, even though we really just need that to happen so we can tell ourselves that Google is not the only option and that we’re using it of our own free will (sorry, drifting off-topic...). Anyway, that seemed not to be the case, as the will to see Technorati improve gradually turned into a general sense of frustration, capped off in August 2008 with Mashable’s supposition that Blog Search is Broken.

The main notion in that post was that Technorati has the most potential, but just doesn’t seem to be getting any love from the users, while the newer options like MyBlogLog, BlogCatalog or Wikio are just trying to put collections of blogs and/or blog posts together in a directory-like manner, without offering much of the search dimension.

Since then, at least one startup has sprung to life which seems to be getting a lot of acclaim, namely Twingly. But while the company is making really good progress, it doesn’t seem to have established itself as a major player yet, and thus we come to March 2009, when, as the legend goes, blog search has finally died. At least that is the verdict of The Blog Herald’s latest analysis, which holds that there is too much spam and irrelevant stuff whenever it comes to blog searching.

Now they are probably completely right about the problem, i.e. there really are a lot of spam-blogs out there, as well as feeds that aren’t even blog feeds, or some company’s great idea to index the whole page of a blog post without even trying to cut out the actual text from it (Go*cough*ogle). As for the solutions, there is much more that needs to be done, but also much reason to be optimistic that a decent solution will come up in the very near future.

First, the problem of spam-blogs, feeds that don’t point to blogs and other basic irrelevancy. Think of just one approach here, the Wikipedia approach, and you will be rewarded aplenty! Yes, spammers will try to add their spam-blogs, while others will be accidentally picked up by the crawlers, but there are users to help us out there. A simple combination of user-voting and admin-monitoring can do wonders here, in my opinion. There just needs to be a good base of blogs to start with to attract the crowd, and then the snowball will be on its way.
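To make the user-voting plus admin-monitoring idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `BlogEntry` record, the function names and the thresholds (`min_votes`, `spam_ratio`) are hypothetical and not taken from any existing blog search engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlogEntry:
    """Hypothetical index record for one blog (illustration only)."""
    url: str
    spam_votes: int = 0
    ok_votes: int = 0
    # Admin decision: True = keep, False = remove, None = not reviewed yet.
    admin_verdict: Optional[bool] = None


def needs_admin_review(entry: BlogEntry, min_votes: int = 5,
                       spam_ratio: float = 0.6) -> bool:
    """Flag an entry once enough users have voted and most call it spam.

    The thresholds are arbitrary assumptions, not tuned values.
    """
    total = entry.spam_votes + entry.ok_votes
    if entry.admin_verdict is not None or total < min_votes:
        return False
    return entry.spam_votes / total >= spam_ratio


def is_listed(entry: BlogEntry, min_votes: int = 5,
              spam_ratio: float = 0.6) -> bool:
    """An admin verdict always wins; otherwise flagged entries are hidden."""
    if entry.admin_verdict is not None:
        return entry.admin_verdict
    return not needs_admin_review(entry, min_votes, spam_ratio)


# Usage: a blog that users have flagged, before and after an admin looks at it.
flagged = BlogEntry("http://example.com/buy-pills", spam_votes=4, ok_votes=1)
fresh = BlogEntry("http://example.com/new-blog", spam_votes=1)
```

The design choice here is that the crowd only hides an entry and puts it in the review queue, while the final keep-or-remove call stays with an admin, so a handful of bad-faith votes cannot permanently bury a legitimate blog.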

A much bigger problem is that blog search engines are still focusing on the old approaches of link-based post ranking and tagging, and also on the new approach of user-voting. As I pointed out in previous posts, the first is just a tonic for top 100 blogs, the second is highly misleading and the third is just not applicable to a pure search site (it surely is to a news aggregator or such). What is missing here are semantic technologies that will mine deeper into the meaning of the blog posts and provide the user with highly relevant stories based on their content and not some keyword-link rank.
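To illustrate what ranking on content rather than on links could look like at its very simplest, here is a rough Python sketch. Note the caveat: it uses plain TF-IDF with cosine similarity, a bag-of-words technique that is far cruder than the semantic technologies meant above. It only shows that a post can be scored from its own text, with no inbound links or votes involved. All function names are hypothetical, and the smoothed IDF formula is just one common variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_content(query, posts):
    """Return post indices ordered by textual similarity to the query."""
    vectors, idf = tfidf_vectors(posts)
    # Query terms that appear in no post get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    # Highest score first; ties keep the original post order.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


posts = [
    "spam filtering for blog search engines",
    "my cat photos from the weekend",
    "recipes for a quick dinner",
]
ranking = rank_by_content("blog spam", posts)
```

A little-known blog with exactly the right post can come out on top here, which is the whole point: nothing in the score depends on how many people link to it.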

Thus, to revive blog search, we have to see that it’s different from the initial Web problem of finding any relevant content in a haystack; it’s about finding the one that makes the most sense. Luckily for all concerned, there are many semantic startups launching and getting good support around the scene, and it can’t be long before they spread over to such tasks as blog search. When that day comes (i.e. I stop talking and finally deliver this thingy called Topify), Jonathan, you will be one of the first to get an invite and have your faith in blog search restored :)


