Topics on a rise? Launch of FeedVis certainly suggests so.

Before I say anything more about a possible “competitor”, I would like to clarify my personal position in this whole mess a little: although the Topify project aims to give its users a perfect topic exploration experience and blow all the competition away, *grin*, we also want to see more linguistic tools used on the Web, especially for dealing with natural text. Accordingly, we welcome and encourage every idea that uses NLP tools to make sense of human-written text. As I said before, tagging is an amazing idea and currently the most efficient approach to finding relevant information on the Web, but I firmly believe that analyzing natural language can bring so much more to the scene.

That clarified, I would like to express my sincere joy at seeing a company try to use some kind of (albeit very simple) structural approach to “track hot topics”. FeedVis doesn’t do much more than count word occurrence frequencies in blog posts and put these together to try to detect the most important and/or currently trending topics in a thematically connected set of blogs. This doesn’t sound like much, but it’s already a great start, although probably with a couple of misconceptions.
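As a rough illustration of the kind of counting described above, here is a minimal Python sketch that computes word-uses/total-words over a set of posts and takes the most frequent words as the “cloud”. It is not FeedVis’s actual code; the naive tokenizer, the tiny `STOP_WORDS` list, and the function names `relative_frequencies` and `cloud` are hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical, tiny stop list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "in", "of", "and", "to", "with", "from", "is"}

def tokenize(text):
    # Naive tokenizer: lowercase, keep runs of letters, drop stop words.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def relative_frequencies(posts):
    """Return word-uses / total-words for every word in the set of posts."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def cloud(posts, top_n=10):
    """The top_n words by relative frequency (ties broken alphabetically)."""
    freqs = relative_frequencies(posts)
    return sorted(freqs, key=lambda w: (-freqs[w], w))[:top_n]

posts = [
    "Learning with wikis in the classroom",
    "Learning from blogs",
    "Blogs and learning",
]
print(cloud(posts, top_n=3))  # ['learning', 'blogs', 'classroom']
```

Note that “learning” tops this list simply because it shows up in every post, which is exactly the weakness discussed in the next paragraph.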

What we see in the FeedVis “tag cloud” is, in my opinion, not precisely a set of topics but rather the specialized base vocabulary of the selected area. Whether, when, and to what degree a given word is actually a topic is hard to tell from such an aggregate statistic. The word “learning” might appear once in _every_ post, whereas other words, much lower down in the absolute count, can appear several times in a single post. To me, the first case is not a topic but a general, theme-defined word that doesn’t really help us understand the current trends in the area. The second case, however, is a real topic, which gives strong clues about the main motifs in the area at that time.
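One way to make this distinction measurable, assuming the posts are already tokenized, is to track, next to each word’s total count, the number of posts it occurs in, and divide the two (the word-count/posts-where-word-occurred-count measure proposed in the comments). The following is a hypothetical Python sketch, not Topify’s implementation; `concentration` is an invented name, and the sample data reuses the 100-post example from the comments.

```python
from collections import Counter

def concentration(posts_tokens):
    """Map each word to (total count, posts containing it, count per containing post)."""
    total = Counter()     # word -> occurrences across all posts
    doc_freq = Counter()  # word -> number of posts the word occurs in
    for tokens in posts_tokens:
        counts = Counter(tokens)
        total.update(counts)
        doc_freq.update(counts.keys())  # each word counted at most once per post
    return {w: (total[w], doc_freq[w], total[w] / doc_freq[w]) for w in total}

# 100 posts: "learning" once in each of 50 posts,
# "classroom" five times in each of 10 posts.
posts = []
for i in range(100):
    tokens = []
    if i < 50:
        tokens.append("learning")
    if i < 10:
        tokens.extend(["classroom"] * 5)
    posts.append(tokens)

stats = concentration(posts)
print(stats["learning"])   # (50, 50, 1.0)
print(stats["classroom"])  # (50, 10, 5.0)
```

Both words have the same total count, so a plain frequency ranking ties them; the per-post value separates the background word (near 1) from the word that posts actually dwell on.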

All in all, it’s nice to see a service like FeedVis, which goes beyond the common tagging approach and tries to extract some sense from texts without relying on anything but the texts themselves. It will be exciting to compete with you once Topify is running. You get a head start anyway :)



Filed under Storyflow

2 responses to “Topics on a rise? Launch of FeedVis certainly suggests so.”

  1. Hey, it’s Jason, the guy who built FeedVis. It’s cool that you’re working on something similar. To be honest, FeedVis isn’t really a competitor to anything; it’s just something I banged out in my spare time for fun. It’s all open source, and you’re welcome to take any or all of the code you like, so there’s no “head start” :)

    You wrote one thing I didn’t quite understand, about detecting “topics.” FeedVis measures overall frequency, as in word-uses/total-words for a particular word. I was originally using word-uses/total-posts, which I think is something like what you were describing. I was very surprised to see that the two measures actually came out very similar. It would be interesting to look into this more.

    I think we’re agreed that however it comes about, this kind of automatic metadata generation is the future: tags are nice, but no one is going to sit around tagging the entire internet. That’s what we’ve got computers for.

  2. Marty

    Hey Jason, thanks for stopping by. Yeah, I noticed a bit later that FeedVis is an open source project, but anyway it’s really great that some people are thinking in this direction.

    As for the “detection of topics”, word-count/total-posts should actually rank words exactly like word-count/total-words, since total-words = total-posts × average-words-per-post; to put it bluntly, both denominators are constant for a given corpus, so the two measures differ only by a constant factor. What I was referring to is a measure like word-count/posts-where-word-occurred-count. Say you have 100 posts and both “learning” and “classroom” have a total count of 50: your current approach would rank them the same. But imagine that “learning” appears in 50 posts (i.e. always once per post, never more) and “classroom” only in 10 posts (an average of 5 per post). This would suggest that “classroom” is an important topic within the posts it comes up in, while “learning” is just a background term that’s common to the whole field.

    This is obviously a very simple idea; one can take it further by considering average counts per post where a word occurs, spread over time chunks, etc. Hope you catch my drift.

    And yeah, let’s hope people recognize the power of AI and NLP, as we need more ideas here. The field already has some pretty powerful statistical foundations; what we need now are some inventive applications to push it further.
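The extension mentioned in the second comment, tracking average counts per post where a word occurs over time chunks, could look roughly like the Python sketch below. It assumes each post comes with a publication date and a token list, and it picks ISO calendar weeks as the time chunk; both are hypothetical choices, as is the name `concentration_by_week`.

```python
from collections import Counter, defaultdict
from datetime import date

def concentration_by_week(dated_posts):
    """dated_posts: iterable of (date, tokens).

    Returns {(iso_year, iso_week): {word: count per post containing it}}.
    """
    total = defaultdict(Counter)     # chunk -> word -> occurrences
    doc_freq = defaultdict(Counter)  # chunk -> word -> posts containing it
    for day, tokens in dated_posts:
        iso = day.isocalendar()
        chunk = (iso[0], iso[1])  # (ISO year, ISO week number)
        counts = Counter(tokens)
        total[chunk].update(counts)
        doc_freq[chunk].update(counts.keys())
    return {
        chunk: {w: total[chunk][w] / doc_freq[chunk][w] for w in total[chunk]}
        for chunk in total
    }

dated_posts = [
    (date(2024, 1, 1), ["learning", "classroom", "classroom", "classroom"]),
    (date(2024, 1, 2), ["learning"]),
    (date(2024, 1, 8), ["learning", "wiki", "wiki"]),
]
weeks = concentration_by_week(dated_posts)
print(weeks[(2024, 1)])  # {'learning': 1.0, 'classroom': 3.0}
print(weeks[(2024, 2)])  # {'learning': 1.0, 'wiki': 2.0}
```

Comparing a word’s value across consecutive chunks would then hint at whether it is currently trending; the chunk size is a tuning choice rather than something fixed by the idea itself.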
