Jiri Donat's weblog: April 2006

Clear Message of Globalization

We are living in an exciting time. Fast changes are occurring everywhere in our global world. I am especially lucky, because I have the opportunity to experience these changes first hand.

Last December I went to China to teach in the MBA program at Peking University. Can you imagine any better place than Beijing to discuss the phenomenon of globalization with managers? These days I am enjoying Moscow. So my life has been arranged in a really special way! I am able to see globalization, this disruptive development of our civilization, from different angles and with my own eyes. And I am also able to think about it.

I remember Moscow from my younger days 20 years ago, when I was on a student exchange in the time of Gorbachev. Now the city is very different, although, interestingly enough, the pace of change is substantially slower than in China.

Our world is changing at a rapid pace. This is a direct consequence of information technologies that help people reorganize the way they interact. Formerly, communication between nations was limited to contacts between emperors or politicians. Today, this has been replaced by direct contact between anybody who needs to communicate. From a slow and centralized method of communication (or should we rather say an “isolating method of communication”?) we are moving to a free flow of news, ideas, and cooperation. We are living in a world where communication is increasingly limited only by the will of the participants.

This development has a direct consequence: the role of states decreases, and with it state regulation recedes, too. In some parts of the world this process is faster, in others slower. But the message is here and it is very clear. The process is global and the changes are inevitable.

Well, we are all lucky to live in an exciting, opening world, a world where barriers are being removed for the benefit of most of us.

Labels: globalization, moscow, Soviet Union

The Power of LinkedIn

I’ve just joined the fast-growing Central European team of Capgemini. In my new role of Managing Consultant it will be my pleasure to develop the offerings of this global IT services and business consultancy along the lines of Service Oriented Architecture.

Elsewhere on this blog I discuss the business model of social networks. Indeed, the model is flawed, as today’s applications motivate participants to grow their “trusted” networks indefinitely (just today I received an invitation saying “it is always beneficial to increase the size and scope of ones network…”). So that conclusion still holds.

But of course, even if the business model is not right, this says nothing about the practical usability of these applications. Actually, I can serve as a good example myself. After being a member of the LinkedIn network for just two months, I was approached by headhunters working for Capgemini. They found my profile on LinkedIn around the same time that another big IT company found me on this network, too. Both companies then approached me directly and gave me the luxury of deciding between two good opportunities.

The lesson learned? Applications like social networks really work. Even before visionary projects like UPI happen (sorry, this is my child :-)), social networks are already turning the internet into a more structured place. By improving search in more and more special areas, the internet is gradually becoming a medium where you can find what you need.

So there is some symbolism in this for me. As of now, I have a new job. But at the same time, I have been shown that the world has changed.

Welcome to a networked world! It will be my pleasure to continue meeting you there.

Labels: internet community, internet recruitment, internet search, social networks

UPI Defined

UPI – Unique Personal Identificator – is a new and open syntactic layer of the internet that uniquely identifies both authors and users of internet content. It can be applied to various forms of electronic communication (web pages, discussion forums, e-mails, and even IMs and VoIP calls). As a result, the traditional PageRank method will be superseded by a truly personalized approach. We will not only see search results sorted by our personal preferences, but will also be able to limit web search to “people similar to us”.

The Motivation

Let us start with several concrete examples. Should UPI be massively adopted, the following internet search queries will become possible:

Find me new ideas on a certain subject that were written or read by people similar to me.
Feed me any new ideas from people whose reasoning and thinking I like.
Find a fishery expert (or any other expert I need right now) who has interests and a way of thinking similar to mine.
Find a business partner who shares my business specialization and who will be easy for me to communicate with.
Find a customer for my product or service whom I can easily target personally.

The Method

There are lots of systems today that attempt to solve similar tasks. Generally speaking, we can divide these systems into the following two categories:

Systems that try to extract more from the context around certain keywords (e.g., Zoominfo, which searches for names in the context of automatically selected adjectives), and
Systems that try to add explicit information from the user to the existing web content.

The second group can be further divided into

a) systems that collect the user’s behavior, either directly or through the user’s work with links (e.g. Google Reader, Google Personalized Search, Flork, del.icio.us, or Stumbleupon)

b) systems that try to add an additional piece of syntax to the internet (e.g. the Friend of a Friend FOAF project)

The UPI system falls into the 2b) category.

Acknowledgements

The UPI system was invented in a discussion (here is its full content, in Czech) that I moderated on the discussion server Lupa.cz this February. Several members of the community added significant pieces to the system design, so the idea I am describing here is by no means my sole work. My special thanks go to Jan Bilek, who created several important elements of the system.

How it works

First of all, the user chooses his or her unique UPI. Although the easiest technical solution would be to go through one centralized registration service, this centralized approach would very likely harm the system’s adoption. We are thus envisioning multiple competing services, so-called identity servers, to handle the registration process. The only thing that must be defined centrally is the UPI syntax. We propose the following one:

chosen_name#identity_server_URL

This syntax corresponds to the popular e-mail syntax

chosen_name@e-mail_server_URL

Our approach makes it easy to choose UPIs that are truly unique and yet assigned in a decentralized way. The UPI identifier is easily distinguishable from the rest of web content, so it creates a new piece of web syntax that is easily understandable to both human readers and machines. It also points directly to the home identity server of the user, which helps to resolve potential conflicts if more than one UPI identity server page (the so-called reading profile, see below) is found for a particular user.
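To make the proposed syntax concrete, here is a minimal sketch in Python of how a UPI string could be split into its two parts. The names UPI, parse_upi and format_upi, as well as the example host id.example.org, are hypothetical; the sketch also assumes that the first "#" is always the separator, which the proposal itself does not spell out.

```python
from typing import NamedTuple


class UPI(NamedTuple):
    """The two parts of 'chosen_name#identity_server_URL'."""
    name: str
    identity_server: str


def parse_upi(text: str) -> UPI:
    """Split a UPI string at its first '#', much like an e-mail address at '@'."""
    # Assumption: the chosen name itself never contains '#'.
    name, separator, server = text.strip().partition("#")
    if not separator or not name or not server:
        raise ValueError(f"not a UPI: {text!r}")
    return UPI(name, server)


def format_upi(upi: UPI) -> str:
    """Turn the two parts back into the written form."""
    return f"{upi.name}#{upi.identity_server}"
```

For example, parse_upi("jdonat#id.example.org") yields the name "jdonat" and the identity server "id.example.org", and format_upi turns the pair back into the original string.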

The Role of Identity Servers

The purpose of UPI is to uniquely identify a particular user in all of his communication activities. To collect as much information as possible, we must cover (that is, uniquely identify) both the reading and the writing activities of the user. To allow for maximum adoption, the system itself should not demand too much activity on the user’s side. We thus propose to include all functions of the system in a simple browser plug-in that will do almost everything for the user automatically. The user will only be required to sign in to this service on the device he is going to use. The plug-in can be provided by any third party (in most cases by a search engine); this area is fully open to competition, too. In addition, the plug-in can automatically identify existing UPIs on the web pages we view and turn them into miniature clickable icons or even pictures of users; clicking on such a picture will show a context menu tied to the search services of a particular search engine. It can, for example, show us the pages we both read, the discussions in which we both participated, or even the entire history of our communication.
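As an illustration of how a plug-in might spot UPIs in page text, here is a rough Python sketch. The regular expression is my own assumption, not part of the proposal: it accepts a name, a "#", and a dotted host name, and it deliberately skips anything that looks like the fragment part of an ordinary URL. The function name find_upis and the example hosts are hypothetical.

```python
import re

# Hypothetical pattern: name, '#', dotted host name ending in a letters-only label.
# The lookbehind stops a match from starting inside a URL path or another word.
UPI_PATTERN = re.compile(
    r"(?<![\w/.#-])"                        # not in the middle of a word or URL
    r"[A-Za-z0-9_.-]+"                      # chosen name
    r"#"
    r"(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}"     # identity server host name
    r"(?![\w-])"                            # host name must end here
)


def find_upis(page_text: str) -> list[str]:
    """Return every UPI-looking token in the text, in order of appearance."""
    return [match.group(0) for match in UPI_PATTERN.finditer(page_text)]
```

A real plug-in would work on the page's document structure rather than on raw text, and would then replace each match with an icon or picture; the sketch only covers the recognition step.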

Tracing Authors

The easy part of unique identification is the “active”, or authoring, part. The browser plug-in will sign everything we publish on the web. This is technically very easy: the browser contains a button that inserts our UPI into any post, article, or even e-mail we write. However, the syntax of UPI is so simple that users can also sign any document manually, in the same way they add an e-mail address to their posts.

Tracing Readers

A much more difficult part of the system is tracing the reading behavior of a particular user. In an ideal world, every page would be signed with the UPI of its author and would at the same time contain the UPIs of all its readers; this highly formalized content would then be publicly available to all competing search engines. This would be an ideal form of the web!

This will of course not happen (most web pages are “read-only”), but we can achieve virtually the same thing by placing our reading history on any publicly visible page of the internet. Our plug-in will automatically add the URL of every page we visit to our “reading profile”, a web page with a specific syntax that can be located on any server we have write access to. The server that hosts this page is then called the identity server. Over time, specialized identity servers will certainly appear on the internet, but to use the UPI system we need nothing more than one web page we have write access to.
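The syntax of the reading profile is not defined yet, so the following Python sketch simply assumes one: a header line naming the owner, followed by one line per visited page holding a timestamp and a URL separated by a tab. Both this layout and the function names are hypothetical and serve only to show how little a plug-in would need to do.

```python
from datetime import datetime, timezone


def new_profile(upi: str) -> list[str]:
    """Start a profile; the 'UPI:' header line is an assumed convention."""
    return [f"UPI: {upi}"]


def append_visit(profile: list[str], url: str, when: datetime) -> None:
    """Record one visited page as 'ISO-8601 timestamp<TAB>URL'."""
    profile.append(f"{when.isoformat()}\t{url}")


def parse_profile(profile: list[str]) -> tuple[str, list[tuple[datetime, str]]]:
    """Read a profile back into its owner and a list of (time, URL) visits."""
    header, *rows = profile
    if not header.startswith("UPI: "):
        raise ValueError("missing UPI header line")
    visits = []
    for row in rows:
        stamp, url = row.split("\t", 1)
        visits.append((datetime.fromisoformat(stamp), url))
    return header[len("UPI: "):], visits
```

The plug-in would call append_visit for each page and upload the lines to the identity server; a search engine would fetch the page and run something like parse_profile on it.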

The Role of Web Search Engines

As soon as web content is signed with the UPIs of its authors and linked via reading profiles to its readers, the main part can begin. All this information is publicly available, so competition between search engines in processing this valuable information can start. The main outcome of this competition will be the implementation of a “people similar to me” search function. Let us underline that the UPI concept will not become a competitive advantage of any particular web search service; it will serve all of them, both general and specialized, in creating better personalized search.

How Will Search Engines Process UPI?

The implementation of the “search people similar to me” function will revolve around the family of statistical cluster analysis methods. The algorithm might look like this:

For each user, the search engine searches the web for the person’s UPI and for his reading profile. If multiple reading profiles are found, it resolves the conflict. The information found is then transformed into a multidimensional representation of the user that serves as input for cluster analysis. This representation is to a certain extent similar to the FOAF project, but it is much richer in information and, in addition, it evolves dynamically over time and so reflects changes in the user’s behavior. The actual realization of this transformation will become a field of competition between web search services.
After the representation has been created for all UPI users, the cluster analysis starts. For each user, the search engine calculates his “distance” to all other users. The detailed realization of the cluster analysis will become a field of competition, too. As a result, we get a two-dimensional matrix of mutual relations between users. This matrix then becomes a direct, personalized successor to PageRank; it will serve every search query the particular user carries out from then on, until the next analysis is performed.
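The two steps can be sketched in a few lines of Python. This is only one possible realization: it represents each user simply as the set of URLs from his reading profile and uses the Jaccard distance between those sets, a deliberately simple stand-in for whatever representation and cluster analysis a real search engine would choose. All names and example UPIs are hypothetical.

```python
def jaccard_distance(a: set[str], b: set[str]) -> float:
    """0.0 for identical reading sets, 1.0 for sets with nothing in common."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles: dict[str, set[str]]) -> dict[str, dict[str, float]]:
    """Two-dimensional matrix of mutual distances, keyed by UPI."""
    return {
        upi: {other: jaccard_distance(pages, other_pages)
              for other, other_pages in profiles.items()}
        for upi, pages in profiles.items()
    }


def most_similar(matrix: dict[str, dict[str, float]], upi: str, k: int = 3) -> list[str]:
    """The k users closest to the given one: the 'people similar to me'."""
    others = [(distance, other) for other, distance in matrix[upi].items() if other != upi]
    return [other for _, other in sorted(others)[:k]]
```

The matrix is computed once and then reused for every query until the next analysis, as described above; a search engine could restrict or re-rank results using the output of most_similar.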

Advantages of Openness

Because the UPI concept itself will not serve as a competitive advantage for any particular search service, all search services will be encouraged to optimize their own implementations.

Competition among search engines will revolve around refining the following areas:

processing the raw content of web pages (web crawlers may, for example, be able to identify UPIs not only on the same page but also within the same discussion thread, or analyze the frequency of communication between particular UPIs in participating e-mail or IM systems);
further processing of UPI-based information (for example, the “aging” of my reading or publishing history could be optimized for particular search scenarios: how should the weight of pages visited or created decrease over time?);
representation of user information in the form that provides the best cluster analysis results;
the cluster analysis itself – it can be modified to best serve specific search queries.
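How the weight of old pages should decrease is left open above. One simple possibility, shown here purely as an assumption, is exponential decay with a tunable half-life: a page visited one half-life ago counts half as much as a page visited today. The function names and the 90-day default are hypothetical.

```python
def visit_weight(age_days: float, half_life_days: float = 90.0) -> float:
    """Weight of a visit that happened age_days ago (1.0 for today)."""
    if age_days < 0:
        raise ValueError("age_days must not be negative")
    return 0.5 ** (age_days / half_life_days)


def weighted_profile(visits: list[tuple[str, float]],
                     half_life_days: float = 90.0) -> dict[str, float]:
    """Sum the decayed weights of all visits per URL.

    visits holds (url, age_days) pairs taken from a reading profile.
    """
    weights: dict[str, float] = {}
    for url, age_days in visits:
        weights[url] = weights.get(url, 0.0) + visit_weight(age_days, half_life_days)
    return weights
```

A search engine could pick a short half-life for news-like queries and a long one for queries about lasting professional interests; that choice is exactly the kind of tuning the list above describes.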

Transferability of UPIs

There are of course many remaining things to be resolved. What happens if I am not satisfied with my identity server? Or if the server stops its service entirely? There should be an easy procedure that allows me to move to another identity server and still keep my existing UPI (a UPI should be persistent over time). So I should be able to transfer my UPI to any other identity server; the original server will then be responsible for pointing to my new identity server.

What happens if the original server stops working entirely? Even this situation can be resolved. My new identity server will always display my UPI on my publicly visible reading profile. This page will be searchable by search engines, so they can find my UPI page wherever it is located (because the UPI is unique and the reading profile has a given syntax, we know the list of reading profiles for each UPI). It is the user’s responsibility (and also in his own interest) to ensure there is just one page with his profile on the internet. If there is more than one page, the user is informed about this problem by his search engine and is then asked to resolve the ambiguity. He can, for example, blacklist a fake “reading page” provided by a malicious server. Such a blacklist can, in addition, be shared by multiple search engines.
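The conflict handling described here could look roughly like the following Python sketch. It assumes that the search engine has already collected the URLs of all pages claiming to be the reading profile of one UPI and that the blacklist is a set of host names; the function name resolve_profile and the example hosts are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_profile(candidate_urls: list[str], blacklist: set[str]) -> Optional[str]:
    """Pick the one valid reading profile, or None if the user must decide.

    Pages hosted on blacklisted servers are ignored. If exactly one
    candidate remains, it is accepted; otherwise the situation is
    ambiguous (or no profile exists) and the user has to be asked.
    """
    allowed = [url for url in candidate_urls
               if urlparse(url).hostname not in blacklist]
    if len(allowed) == 1:
        return allowed[0]
    return None
```

With two candidates and an empty blacklist the function returns None, which is the moment the search engine would inform the user; once the user blacklists the malicious host, the same call returns the remaining genuine profile.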

Motivation of Users

A nice feature of the system is that it motivates its users toward fair and consistent usage. Soon after we start to use the system, we start to benefit from improved web search. If a user decides, for example, to stop using his UPI and replace it with another one, he instantly loses all the information that he built up while using his former identity. In other words, the longer and the more consistently I use my UPI, the more I benefit from it.

OK, there is one special question: what if the user wants to visit some xxx pages? He is certainly not willing to have this part of his history publicly available in his profile. But that is fine, too. The user is free to have more than one UPI if he wants to. His second, “xxx-UPI” will help him find xxx content even better than before, while his “normal” UPI will help him in his normal work. By choosing the right UPI he actually gives the system additional information. The user is of course also free to sign out of his UPI toolbar entirely when he wants to visit pages he doesn’t want to share with anybody else. In that case, he can browse the content entirely anonymously.

So it is in the user’s own interest to use the system as frequently as possible and in a very consistent way. Only this usage pattern will give him the best search benefits.

Conclusion

The main properties of the UPI system are openness and simplicity. It extends the current internet infrastructure and its proven algorithms, so it builds upon existing and verified systems. These properties maximize the system’s chances of mass adoption.
The system is not implemented yet, but I will be happy to assist anybody who is interested in implementing it.

Labels: internet community, internet search, unique personal identificator, upi

My FOAF Comments

The Friend of a Friend (FOAF) project is certainly worth a look. It attempts to provide some basic machinery to help us “tell the Web about the connections between the things that matter to us”. People are one special case of these “things”, so from this perspective, FOAF has similar motivation to UPI (Unique Personal Identificator).

I have, however, one issue with this system. In my opinion, it is not feasible to put condensed personal information (relations to other people or activities) into one short static descriptor. It will never be exact; it stays static over time and still requires quite a lot of work from participating users. In my opinion, another approach makes better sense: to uniquely identify the user and let him freely work and use the internet. As a result, enough information will be created over time. This information will then allow any (competing) web engine to create “FOAF-like” descriptors on the fly that evolve dynamically over time. In addition, these “dynamic FOAFs” can then be focused and optimized for a particular purpose.

I am sure that the UPI approach, which we are going to describe in the next post, can eventually fulfill the FOAF Goals, but can even strive for something more...

Labels: foaf, internet community, internet search, unique personal identificator, upi

Wednesday, April 19, 2006