Tag Archives: topics

Topify Storyline or Related Stories 2.0

If you still consume news at its source, i.e. by visiting blogs and online newspapers, you will be accustomed to seeing a “related stories” block somewhere next to or at the bottom of a story. The list, usually containing 3 to 5 links, has become an inevitable part of almost any article on the Web, just like the share buttons and the comments. However, how often does one really check the titles of the suggested related articles, let alone go ahead and read them? (That is really meant to be rhetorical, but if the owner of a news site wants to share some analytics stats, you’re more than welcome!)

I would argue that the issue with the provided “related stories” is not that readers are not interested in reading more – in fact, if a person took the time to read this article, and if the suggested stories are truly topically related, he would more likely than not be interested in a few of these. That being said, a “related stories” list does not offer the reader enough background information about the articles to persuade him to actually take a look at these.

What I’m referring to in the above is that, when we consume news, we do so as a continuous lifelong process. We build up our perception of the world by doing so, and we (subconsciously) keep track of the evolution behind the stories and topics that interest us. Reading a bunch of random articles that may be from last week or from five years ago does not fit into that process as it requires too much effort from the reader to place the bits of additional news into the timeline that is kept inside our heads.

To alleviate the above issue, all one needs to do as an owner of a news site is provide the related articles in a visual representation that would enable the reader to easily detect the immediate context of the different stories, and at the same time identify stories of high relevance and importance, as opposed to occasional mentions of a topic that bear little to no significance on its evolution.

The WebScio team invites both article writers and readers to experience a new way to discover the bigger picture behind a single news article by trying out our new Topify Storyline widget. The service is currently in beta and is being tested by the Irish Times in their Financial section (see e.g. here). Interested news site owners are welcome to subscribe to the service and will be notified very shortly with installation details.

We realize that many changes might still need to be made along the way. However, we strongly believe that this is the right next step in the evolution of the online news narrative! Visit Topify to learn more.


Filed under Storyflow

Blog search broken? DEAD?! Let’s just call it “hibernating”.

Over the last year, there has been a flurry of stories on the blogosphere about the inability of big search players and small startups to create a viable blog search engine. It all gradually started with the ubiquitous niggle at Technorati, the “leading” player in the blog search market, a company that probably spends more time thinking about the perfect ad placement coordinates than about spam reduction and search relevance... hmm, well, you see what I mean by “ubiquitous niggle”.

But then I just thought it was all meant to encourage Technorati, as we all have this innate need to see Google beaten at any race they’re in, even though we actually just need that to happen to be able to tell ourselves that Google is not the only option and we’re using it of our own free will (sorry, drifting off topic...). Anyway, that seemed not to be the case, as the will to see Technorati improve gradually turned into a general sense of frustration, capped off in August 2008 with Mashable’s supposition that Blog Search is Broken.

The main notion in that post was that Technorati has the most potential, but just doesn’t seem to be getting any love from the users, while the newer options like MyBlogLog, BlogCatalog or Wikio are just trying to put collections of blogs and/or blog posts together in a directory-like manner, without offering much of the search dimension.

Since then, at least one startup has sprung to life which seems to be getting a lot of acclaim, namely Twingly. But while the company is making really good progress, it doesn’t seem to have established itself as a major player yet, and thus we come to March 2009, where, as the legend goes, blog search has finally died. At least that is the verdict of The Blog Herald’s latest analysis, according to which there is too much spam and irrelevant stuff whenever it comes to blog searching.

Now they are probably completely right on the problem aspect, i.e. there really are a lot of spam blogs out there, as well as feeds that aren’t even blog feeds, or some company’s great idea to index the whole page of a blog post without even trying to cut out the actual text from it (Go*cough*ogle). As for the solutions, there is much more that needs to be done, but also much reason to be optimistic that a decent solution will come up in the very near future.

First, the problem of spam blogs, feeds that don’t point to blogs and other basic irrelevancy. Think of just one approach here, the Wikipedia approach, and you will be rewarded aplenty! Yes, spammers will try to add their spam blogs, while others will be accidentally picked up by the crawlers, but there are users to help us out there. A simple combination of user voting and admin monitoring can do wonders here in my opinion; there just needs to be a good base of blogs to start with to attract the crowd, and then the snowball will be on its way.

A much bigger problem is that blog search engines are still focusing on the old approaches of link-based post ranking and tagging, as well as the new approach of user voting. As I pointed out in previous posts, the first is just a tonic for the top 100 blogs, the second is highly misleading and the third is just not applicable to a pure search site (it surely is to a news aggregator or such). What is missing here are semantic technologies that will mine deeper into the meaning of the blog posts and provide the user with highly relevant stories based on their content and not some keyword-link rank.

Thus, to revive blog search, we have to see that it’s different from the initial Web problem of finding any relevant content in a haystack; it’s about finding the one that makes the most sense. Luckily for all concerned, there are many semantic startups launching and getting good support around the scene, and it can’t be long before they spread over to such tasks as blog search. When that day comes (i.e. I stop talking and finally deliver this thingy called Topify), Jonathan, you will be one of the first to get an invite and have your faith in blog search restored :)


Filed under Storyflow

Topics on the rise? Launch of FeedVis certainly suggests so.

Before I speak any further about a possible “competitor”, I would like to clarify my personal position in this whole mess a little: although the Topify project aims to provide its users with a perfect topic exploration experience and blow all the competition away, *grin*, we also want to see more linguistic tools used on the Web, especially when it comes to dealing with natural text. Accordingly, we welcome and encourage every idea that tries to use NLP tools to make sense of human-generated texts. Again, as I said before, tagging is an amazing idea and currently the most efficient approach to finding relevant information on the Web, but I deeply believe that natural language can bring so much more to the scene.

That clarified, I would like to express my sincere joy at seeing a company try to use some kind of (albeit very simple) structural approach to “track hot topics”. FeedVis doesn’t do much more than count word occurrence frequencies in blog posts and put these together to possibly detect the most important and/or currently trending topics in a thematically connected set of blogs. This doesn’t sound like much, but it’s a great start already, although probably with a couple of misconceptions.

What we see in the FeedVis “tag cloud” is in my opinion not precisely a set of topics, but rather a specialized base vocabulary of the selected area. If, when and to what degree the one word or the other is actually a topic is hard to say from such a generalized statistic though. The word “learning” might appear once in _every_ post, whereas some others, which are much lower down on the absolute count, can appear several times in a single post. For me, the first case would not be a topic, but a general theme-defined word which doesn’t really help us understand what the current trends in the area are. The second case, however, is a real topic, which gives strong clues about the main motifs in an area at the given time.
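To make the distinction a bit more tangible, here is a minimal Python sketch of the two cases. It is not how FeedVis (or Topify) works internally; the function names, the thresholds `max_doc_share` and `min_peak`, and the sample posts are all assumptions made up for illustration. A word that shows up in most posts is treated as base vocabulary, while a word that is rare across posts but repeated within a single post is kept as a topic candidate.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_stats(posts):
    """Return total count, document frequency and peak per-post count per word."""
    total = Counter()     # occurrences across all posts (the "tag cloud" number)
    doc_freq = Counter()  # number of posts that contain the word at least once
    peak = Counter()      # highest count of the word inside any single post
    for post in posts:
        counts = Counter(tokenize(post))
        total.update(counts)
        for word, n in counts.items():
            doc_freq[word] += 1
            peak[word] = max(peak[word], n)
    return total, doc_freq, peak


def topic_candidates(posts, max_doc_share=0.5, min_peak=2):
    """Keep words that are rare across posts but repeated within one post.

    Both thresholds are hypothetical values chosen for this toy example.
    """
    total, doc_freq, peak = word_stats(posts)
    n_posts = len(posts)
    return sorted(
        word for word in total
        if doc_freq[word] / n_posts <= max_doc_share and peak[word] >= min_peak
    )


# Invented sample posts from an imaginary set of education blogs.
posts = [
    "Learning with wikis: wikis in class, wikis at home.",
    "Learning styles differ from student to student.",
    "Notes on learning a new language.",
    "Why learning never stops.",
]

print(topic_candidates(posts))  # ['student', 'wikis']
```

In this toy data “learning” has the highest absolute count, yet it appears exactly once in every post and is therefore filtered out, whereas “wikis” and “student” survive because they are concentrated in single posts.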

All in all, it’s nice to see a service like FeedVis that goes beyond the common tagging approach and tries to extract some sense out of texts without really relying on anything. It will be exciting to compete with you once Topify is running. You get a head start anyway :)


Filed under Storyflow

Extracting the “Public Opinion” from Blogs

I apologize for not being able to write a lot these last few weeks. I’ve been mostly busy working on a chapter about the theory of public opinion for my final thesis. This was a very enlightening experience, since it opened my eyes to the different approaches one can take when measuring the “voix publique” and the corresponding problems. The question I raised at the end of the chapter in my thesis is the one I ask here: do blogs represent the public opinion? This is indeed a very tricky question, but if one takes a look at what the public opinion actually is, there is still hope that the question can be answered!

The concept of public opinion comes from 18th-century France, where it was represented by a group of similarly educated, well-read intellectuals. Due to the similarity in the education and the philosophy of the members of the group, the critique they expressed tended to come in unison, hence the initial singular in “public opinion”. Later on, as more differently educated social groups appeared, the “one” public opinion dispersed into the opinions of these groups. By the 20th century, every person was considered bright enough to think on his own, so the notion of an “opinion expressed in social groups” was dropped for a more “public” definition of the term.

However, ever since public opinion came to be considered the sum of all the personal opinions in the public, no one has really been able to grasp it anymore. Multiple institutions for public opinion research exist all over the world, but all they do is question people on the street (at home, on the Web, whatever... the keyword is “at random”). The biggest problem with this approach is that, though not every person has an opinion on every subject, they’re still browbeaten by the interviewers into saying something.

On the other hand, most people have opinions on certain subjects. And most of us wouldn’t mind sharing them, provided that our anonymity is guaranteed. These are thus exactly the thoughts and opinions that one should gather in order to get an idea of the so-called “public opinion”. How do we find these people? Look on the Web! Find all posts that deal with a topic that you want to examine and see what people are saying. This is being done already by specialized brand-monitoring companies like Trackur, which was just recently reviewed on ReadWriteWeb. Unfortunately, however, such services still charge a fee and do not cover all the random subjects around us.

Thus I would argue that a service that gives its users an overview of the “hot” topics and the opinions about these should be of great value and interest to everyone from the common man to a scheming politician. It’s time to put a stop to the “A”-blogs dominating the agenda like the social groups of the 19th century. There is much more information out there that is just as exciting and valuable; we just tend to underestimate the long tail. To get an impression of the matter, just click over to this post at the Data Mining blog and take a look at the odds of a “normal” blogger getting a headline at a site like TechMeme compared to those of an “A”-blogger. Impressive, huh? Now just try to imagine the diversity of the topics and opinions in the public blogs... it’s IMMENSE!

Filed under Storyflow

Tags and Headlines – Cheating on Topics?

Just a couple of weeks ago I was trying to set things straight in the rather troublesome relationship between tags and topics. The bottom line was that tags are just the same words we use in the content and are hence just as arbitrary and hard to generalize as the content itself. No perfect solution for this problem exists as yet, although I proposed to topify the content in order to keep the generalizing terms (tags.. topics..) as close to the content as possible.

Now I am reminded that there is a third member of the family that may be used in this way, but rather tends to go astray and entertain people. I’m of course talking about the ever-present headlines that set the tone for a piece of content long before we actually get to read the whole thing. So what is the function of this rather distinct member of the meta-family?

One might have thought that titles should give the reader a short taster of what is coming next, so that he or she would be able to filter and prioritize a piece of content among others. However, certain media actors don’t like to play fair, especially those who don’t have much worthy content to offer. This is first and foremost the case with the so-called Yellow Press, which has invented and perfected the art of the indirect headline over the last century or so.

The idea of an indirect headline is to use a quality of human nature – human curiosity – to drive readers to some potentially utterly worthless content, as was the case in a blog entry in The Blog Herald that inspired this post. The evil thing about this is that we can’t help but take the bait over and over, even after getting disappointed time and time again.

So what can we do to tame this extravagant member of the meta-family? Well, frankly, not much. One could check out the site’s tags to get some additional information, but the problem with tags is that they are becoming more and more like headlines as well. If you tag your content in detail, you risk losing readers who know for sure they don’t care about certain stuff. General tags, on the other hand, don’t allow you to filter and prioritize the content properly, so you have to check out at least the first few lines of the text. This is where, I think, a more intrinsic way of meta-describing content, e.g. by topics, can come in handy, from the user’s point of view anyway.

Filed under Storyflow

What People Say When They T..alk

This is a challenging question that comes up all the time in our everyday life. This includes anything from the “normal” face-to-face communication when two people meet on the street to blog entries and instant messages or tweets on the Web. As a matter of fact, this is the main topic of a great ReadWriteWeb post titled “What People Say When They Tweet”. I’m really glad to see that the survey performed by the RWW blog and Summize doesn’t stop at calculating the most frequent words and phrases, since those only reflect our emotions and current activities (e.g. lol, ;), working, sleeping). Instead, they try to extract the topics of conversations, further dividing these into short-term and long-term discussions (e.g. about a “Lost” episode vs. the US election).

I find the results truly fascinating, since this is a kind of information that is really hard to obtain directly. What these results show is an extremely simplified reflection of the matters that in one way or another bother us at a certain point in time. It’s a bit sad that the results show fewer than 10 topics per day, since there must be dozens or hundreds of others that have a big enough population as well. It’s just normal that some themes that are common to the whole population of a country, like the US election, will dominate all the discussions for a long time. This can be compared to a cloud of 100 topics, where one, the election, completely dominates the rest, but there are hundreds of others around it that shouldn’t be neglected, considering that there are millions of potential topics.

But no need to be sad! This is exactly what Topify is aiming to provide to all its users – an overview of the topics that are burning the fingertips of all those who are ardently blogging away in an effort to make themselves heard. We hope to make your everyday experience at this project’s site as fascinating as that one post on RWW, with the difference that we are aiming to concentrate on the more structural and constructive communication area – the blogosphere. Excited already? :)

Filed under Storyflow

Tags and Topics – Yin and Yang or a Dysfunctional Marriage?

I don’t think I have to explain to anyone what tags are, do I? But to make sure, Wikipedia states that tags allow anyone on the Web to annotate and categorize content and then roam around and discover similarly tagged content. Sounds like a blast to a lot of people: all one has to do is find some content one finds interesting (a lucky search will do), tag it accordingly and then sit back and enjoy the thrill of discovery via his or her RSS reader.

Now, suppose we stumble upon one of those Microsoft-Yahoo merger articles (well, you couldn’t avoid running into one even if you wanted to) and want to tag it. It’s surely about.. Microsoft.. and Yahoo.. and business deals/mergers/acquisitions.. the “Future of the Web as we know it”.. then technology.. media.. ads.. revenues.. oh and the ubiquitous Google.. I just crawled these from 5 blog entries or so, if I took a week off and went through the other thousand, I’m sure the number of tags used on this one subject would reach a century.

What then is the problem with tagging? Is it that different people use different words for the same topics? Or that everyone categorizes differently? Or that people read different topics into stories? Obviously it’s all of the above, and this is what makes tagging, a practice meant to be used by communities, so hard to actually employ this way. There is a great post by Ms. Rashmi Sinha on the cognitive processes behind concept building and tagging, which shows that it’s a very tough task to actually employ tags that would make the tagged content easily discoverable by others.

So what are the possible solutions to this mess? The “Tagging” blog describes the approach of ‘tag description’, where users can make a note for themselves of what they actually understand by a selected tag. “Tag gardening” is a great definition for this approach, and I assume that most of you won’t be too keen on having to spend hours on growing your favorite tag plants. Another attempt to make tags more generalized is the idea of semantic tagging, e.g. as provided by Zigtag. I’m sure it will improve the tagging experience by a country mile, but it still doesn’t reduce the number of tags nor the number of words we can theoretically use.

Another solution would be to actually extract the main topics from a piece of content so that the story is naturally categorized. Then, depending on whether you want to read a story about just one of these topics or explore similar topics, you can either search directly or use natural language semantic relationships to find similar stuff. This might take away the social component of inventing tags that are understood by as large a part of the community as possible, but it would bring in more related content discovery and allow people with similar thoughts to find each other and start talking, rather than tagging. Or?
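As a rough illustration of the “find similar stuff” part, here is a small Python sketch. It assumes the topics of each story have already been extracted somehow, and it swaps in plain set overlap (the Jaccard index) for the natural language semantic relationships mentioned above, which is a much cruder measure. The story titles, topic sets, function names and the `threshold` value are all invented for the example.

```python
def jaccard(a, b):
    """Share of topics two stories have in common (0.0 = none, 1.0 = identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_stories(query_topics, stories, threshold=0.25):
    """Return story titles ranked by topic overlap with the query topics.

    `stories` maps a title to its set of already extracted topics;
    `threshold` is a hypothetical cut-off chosen for this toy example.
    """
    scored = [
        (jaccard(query_topics, topics), title)
        for title, topics in stories.items()
    ]
    return [title for score, title in sorted(scored, reverse=True)
            if score >= threshold]


# Invented stories with hand-picked topic sets.
stories = {
    "Microsoft bids for Yahoo": {"microsoft", "yahoo", "merger"},
    "Google reacts to the bid": {"google", "microsoft", "yahoo", "merger"},
    "Tag gardening tips": {"tagging", "folksonomy"},
}

print(similar_stories({"yahoo", "merger"}, stories))
# ['Microsoft bids for Yahoo', 'Google reacts to the bid']
```

No one had to agree on a shared tag vocabulary here: the two merger stories find each other simply because their extracted topics overlap, while the tagging story drops out.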


Filed under Storyflow