Friday, March 24, 2006

Funny Profiles on Zoominfo

These days many people are working hard to improve search on the internet. Today’s internet content is so vast that any method helping people separate quality content from the ballast flooding the net would be extremely beneficial. Well, we already have one such method: PageRank. It is based on the “universal popularity” of a particular site, expressed by the links that point to it. In other words, PageRank extracts semantic information (popularity) from the only available syntactic tool: web links. The PageRank algorithm is well proven and has been fine-tuned about as far as it can go; further improvements are very hard to find.
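The description above stays at the level of words. Below is a minimal sketch of the textbook power-iteration formulation of PageRank, assuming the commonly cited damping factor of 0.85 and a made-up four-page link graph (the `*.example` pages are hypothetical). It illustrates the idea only and is not meant to represent any search engine's production algorithm.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the "random jump" share plus its part of the dangling rank.
        new_rank = {p: (1.0 - damping + damping * dangling) / n for p in pages}
        # Each page passes its rank in equal parts along its outgoing links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# A hypothetical four-page web: each key links to the pages in its list.
web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Page `c.example` collects the most incoming links and ends up with the highest rank, while `d.example`, which nobody links to, keeps only the baseline share of (1 - 0.85) / 4. Popularity is read purely from the link structure, with no understanding of the pages' content.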

Context digging

OK, so where do we go from here? There are just two ways forward:

  • add some additional syntax to the internet (something that would make the content more searchable), or
  • work better with the existing unstructured content.

Zoominfo is a typical application of the second approach. It tries to dig semantic information out of the context of keywords and automatically builds profiles of people from publicly available news sources. To do this, it attempts to uniquely identify a particular person by searching for that person’s name in the context of other keywords that are automatically identified as relevant to him or her. This is a highly non-trivial task indeed!
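Zoominfo’s actual algorithm is not public, so the following is only a hypothetical sketch of the general idea: each mention of a name is reduced to a set of context keywords, and mentions whose keyword sets overlap enough are merged into one profile. The overlap measure (Jaccard similarity), the single-link grouping, the keyword sets and the threshold values are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two keyword sets: |a & b| / |a | b| (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a or b else 0.0


def cluster_mentions(mentions, threshold):
    """Group mentions of one name into profiles (single-link, via union-find).

    Two mentions end up in the same profile if a chain of pairs with
    context overlap >= `threshold` connects them.
    Returns a sorted list of profiles, each a list of mention indices.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent pointers to the group's root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if jaccard(mentions[i], mentions[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(mentions)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical context keywords from four news items that all mention "Karel Gott".
mentions = [
    {"singer", "concert", "prague"},
    {"singer", "album", "pop"},
    {"concert", "tour", "pop"},
    {"album", "award", "pop"},
]
print(cluster_mentions(mentions, threshold=0.15))  # loose: [[0, 1, 2, 3]]
print(cluster_mentions(mentions, threshold=0.4))   # strict: [[0], [1, 3], [2]]
```

The threshold is where the difficulty lies: with a loose setting all four items form one profile, while a strict setting splits the same singer into three profiles. Set it too loose, on the other hand, and genuine namesakes would be merged into one person.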

The Reality Check

Let me share some examples with you. If we search Zoominfo for the most popular Czech singer, Karel Gott, we find eight (!) different profiles. The good news is that all of them are somehow related to the singer; the bad news is that none is really correct, and seven of the eight don’t even mention that this person is a singer! What is the problem? In trying to tell possible namesakes apart, the system splits information about one person into many different profiles. Of course, the right balance is difficult to strike. On one hand, it is reasonable to assume that if there is a lot of information about a particular name, part of it belongs to namesakes. On the other hand, this does not always hold, particularly if the person is really popular.

From professor to journalist or landlord

However, this problem is even more general and is not limited to top celebrities. For example, Prof. Vorisek, who is the Head of the Department of Information Technologies at the Prague Economic University, has 4 different profiles. Only profile No. 2 is roughly correct, but it is vastly incomplete, quoting just his name and school. We don’t even learn his position and have no idea about his other activities. In addition, some of the profiles are pretty funny. My favorite identifies Jiri as a sort of landlord of Zofin Palace. In reality, Zofin Palace is just the venue of a regular annual conference that Jiri’s department organizes.

The conclusion

I am not saying that the people at Zoominfo don’t try hard. They certainly do. The problem is a more serious one: processing the context of keywords exceeds the capabilities of today’s technologies, even if we limit the task to a particular context (e.g., searching for names and positions, as Zoominfo does). The idea itself is not bad, but it is too ambitious. Generally speaking, the complexity of this task is close to that of automatic text comprehension and translation. Zoominfo’s case just illustrates that we are not at this stage yet.

This is a very clear message that shouldn’t be overlooked. It is still very hard, and even counterproductive, to work automatically with unstructured information, even in very specialized scenarios. On the other hand, the syntactic approach (PageRank) works well; the problem, however, is that its mechanism has already been “milked to death”.

The solution?

To get better search results, we will have to add some additional syntax to the web. We should do it smartly: we cannot expect too much work from users, but at the same time we should make this web extension a clear advantage for everybody who joins.

Many applications already tackle the internet search problem this way. Social networks are a good example: thanks to their growing popularity, they are in fact turning a significant part of the internet into a structured form! Another interesting example is the Friend of a Friend (FOAF) project.

We will, however, try to formulate a more general approach based on a Unique Personal Identificator (UPI). It is actually a nice paradox that Zoominfo (and not only Zoominfo) would greatly benefit from such a system. On the other hand, if the internet had a UPI, applications like Zoominfo would not be necessary at all...
