Blog search broken? DEAD?! Let’s just call it “hibernating”.

Over the last year, there has been a flurry of stories in the blogosphere about the inability of big search players and small startups to create a viable blog search engine. It all started gradually with the ubiquitous niggle at Technorati, the “leading” player in the blog search market, a company that probably spends more time thinking about the perfect ad placement coordinates than about spam reduction and search relevance.. hmm, well, you see what I mean by “ubiquitous niggle”.

But back then I just thought it was all meant to encourage Technorati, as we all have this innate need to see Google beaten at any race they’re in, even though we actually just need that to happen to be able to tell ourselves that Google is not the only option and we’re using it of our own free will (sorry, drifting off topic..). Anyway, that seemed not to be the case, as the will to see Technorati improve slowly turned into a general sense of frustration, capped off in August 2008 with Mashable’s supposition that Blog Search is Broken.

The main notion in that post was that Technorati has the most potential, but just doesn’t seem to be getting any love from the users, while the newer options like MyBlogLog, BlogCatalog or Wikio are just trying to put collections of blogs and/or blog posts together in a directory-like manner, without offering much of the search dimension.

Since then, at least one startup has sprung to life that seems to be getting a lot of acclaim, namely Twingly. But while the company is making really good progress, it doesn’t seem to have established itself as a major player yet, and thus we come to March 2009, when, as the legend goes, blog search has finally died. At least that is the verdict of The Blog Herald’s latest analysis, according to which there is too much spam and irrelevant stuff whenever it comes to blog searching.

Now they are probably completely right on the problem side, i.e. there really are a lot of spam-blogs out there, as well as feeds that aren’t even blog feeds, or some company’s great idea to index the whole page of a blog post without even trying to extract the actual text from it (Go*cough*ogle). As for the solutions, there is much more that needs to be done, but also much reason to be optimistic that a decent solution will come up in the very near future.

First, the problem of spam-blogs, feeds that don’t point to blogs and other basic irrelevancy. Think of just one approach here, the Wikipedia approach, and you will be rewarded aplenty! Yes, spammers will try to add their spam-blogs, while others will be accidentally saved by the crawlers, but there are users to help us out there. A simple combination of user-voting and admin-monitoring can do wonders here, in my opinion. There just needs to be a good base of blogs to start with to attract the crowd, and then the snowball will be on its way.
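
To make the user-voting plus admin-monitoring idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any existing blog search engine works: the names `Source`, `report`, `admin_decide` and `searchable`, as well as the threshold of three reports, are made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ACTIVE = "active"        # indexed and searchable
    UNDER_REVIEW = "review"  # enough user reports; waiting for an admin
    REJECTED = "rejected"    # an admin confirmed spam or "not a blog"


@dataclass
class Source:
    url: str
    status: Status = Status.ACTIVE
    reporters: set = field(default_factory=set)  # ids of users who reported it


# Hypothetical value: how many distinct users must report a source
# before it is escalated to an admin.
REPORT_THRESHOLD = 3


def report(source, user_id):
    """Record one user's report; escalate to admin review at the threshold."""
    if source.status is not Status.ACTIVE:
        return
    source.reporters.add(user_id)  # a set, so one user only counts once
    if len(source.reporters) >= REPORT_THRESHOLD:
        source.status = Status.UNDER_REVIEW


def admin_decide(source, is_spam):
    """An admin confirms or dismisses the reports on a source under review."""
    if source.status is not Status.UNDER_REVIEW:
        return
    if is_spam:
        source.status = Status.REJECTED
    else:
        source.status = Status.ACTIVE
        source.reporters.clear()


def searchable(sources):
    """Only sources that have not been rejected feed the search index."""
    return [s for s in sources if s.status is not Status.REJECTED]
```

The point of the split is that users only raise the flag, while the final call on removing a source stays with an admin, so a handful of malicious reports cannot knock a legitimate blog out of the index on their own.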

A much bigger problem is that blog search engines are still focusing on the old approaches of link-based post ranking and tagging, and also on the new approach of user-voting. As I pointed out in previous posts, the first is just a tonic for top 100 blogs, the second is highly misleading and the third is just not applicable to a pure search site (it surely is to a news aggregator or such). What is missing here are semantic technologies that will dig deeper into the meaning of the blog posts and provide the user with highly relevant stories based on their content and not some keyword-link rank.

Thus, to revive blog search, we have to see that it’s different from the initial Web problem of finding any relevant content in a haystack: it’s about finding the content that makes the most sense. Luckily for all concerned, there are many semantic startups launching and getting good support around the scene, and it can’t be long before they spread to such tasks as blog search. When that day comes (i.e. I stop talking and finally deliver this thingy called Topify), Jonathan, you will be one of the first to get an invite and have your faith in blog search restored :)

Filed under Storyflow

3 responses to “Blog search broken? DEAD?! Let’s just call it “hibernating”.”

  1. First off, I am definitely interested in checking out Topify when the time comes so I am very happy to be on that list of people. Definitely include me if you can.

    As I said on my post on the Blog Herald, I think the semantic Web has potential to really improve things in this area, but I am not certain it will really fix the problem as it doesn’t address a lot of the issues.

    One thing I do find kind of odd is that you hail a user-generated solution to the problem, namely Wikipedia-style voting and recommendations, while condemning other user-generated solutions such as inbound links, tags and voting as non-answers. Perhaps I’m misreading those two paragraphs, I get the feeling I’m missing something, but I don’t see how one user-generated solution can be a fix for a mess created by three others.

    We will see if the semantic Web can help. I’m going to remain a skeptic until I see the progress. Right now though, if blog search isn’t dead, it’s on life support. Doesn’t mean it won’t come back. If I’ve learned one thing from my love affair with horror movies, it’s that being dead really doesn’t mean all that much.

  2. I feel your pain. I’m one of the co-founders of Regator; our site was mentioned in Jonathan’s post over on the Blog Herald. I think your idea of a crowd-sourced, Wikipedia-style approach isn’t a bad one. The angle that we’re taking at Regator is to ensure quality content. So we have real people curate the site as well, we’re just really selective and want a lot of control over the quality. Once that control is dispersed into the crowd, you can get quality hiccups like on Wikipedia. It definitely has its good side and bad side. That is not to say that we will not someday have a more social component to blog selection on our site, but we feel that through some of the things we’re doing by hand, and also some of the semantic stuff we’ve got on the backend and are working on, we will continue to provide a solid resource for people looking for good blog content. We’ll never have everything under the sun and that’s okay by us. Just as long as we have relevant stuff that people find useful and interesting, without all the crap.

    Twingly does well for search but if you take a look at our site, blog search is really just one part of it, it’s more Blog Browsing (with bonus search!).
    Anyways, good followup post and if you have any feedback please let me know, we love to get it. You can email or call me at 404-493-5121 to discuss. Cheers!

  3. Marty

    Jonathan, Scott,

    Thank you very much for your comments. I must admit I probably made it all a little confusing by condemning user-generated solutions first and then saying they are the cure :) What I meant was that when looking for relevant stories, one shouldn’t rely on user votes/tags/links etc., because you will get gamed or at least very biased results.

    On the other hand, when all you need to do is filter out a source that is pure spam or not a blog, that’s where the amazing classifier which is the human brain comes up trumps. So if you only let the users report bad sources and do semantic search on the stories from the acceptable sources, that’s already a big step towards having a broad, yet good range of sources and stories.

    Hope this clears our idea up a little, if you have any further questions or thoughts, let me know :)

    Cheers,
    Martin
