Jiri Donat's weblog

Monday, November 29, 2010

Google has one last chance to buy Facebook

The Internet has already gone through several battles over its platform. The first was the battle for the browser, which ended with a dramatic courtroom finale. Browsers have since become a nearly uninteresting commodity on which (almost) nobody earns money. The second battle focused on the search engine. Search engines are now at the peak of their career. They earn money so well that the winner of this battle, Google, has become a powerful global corporation. On the Internet, however, nothing is certain forever. We are now ready for the next, already the third major battle: this time the fight is for the platform on which it will be possible to build personalized search and personalized mass advertising. The winner of this "internet personalization layer" will be rewarded with huge income from the modern advertising industry. Today there are only two serious rivals: Google and Facebook. If Google doesn’t buy Facebook now, it may eventually regret its decision.

The magic of the advertising market

The advertising market was, is and always will be very lucrative. The need to sell has been the basic need of all commercial firms since the dawn of business. The Internet brings a breakthrough innovation to this area: for the first time in history it offers mass advertising aimed at the individual needs and preferences of specific customers. Yes, precisely the apparent contradiction between the words "mass" and "personal" is the key to the huge earnings that mass personalized advertising will bring.

Two types of advertising

Personalized advertising has always been available, but it required the personal work of people. The more customers we wanted to reach, the more people we had to involve in the sale: we had to hire more dealers, more salesmen, more call center operators. Advertising could therefore always be personalized, targeted to the specific needs of buyers, but it couldn’t be mass at the same time.

Mass advertising has also been available for some time, specifically since the invention of mass communication media - print, radio and television. It is able to reach millions to tens of millions of people in a single moment, but all of them with the exact same message. Men are then offered hairspray, while ladies are offered a sports car or a razor blade for shaving beards. Mass advertising is relatively inexpensive, which unfortunately leads almost to its "abuse" and results in its overabundance. People today are approached by advertising from all sides. And since most of the advertisements are not relevant to them (they are not targeted), people take advertising as something annoying, something that is just a waste of their time and thus should be ignored ("ad blindness"). But this forces advertisers to try even harder, to try everything to win the customer’s attention. Advertising has thus become increasingly obtrusive and annoying. Paradoxically, however, this goes against the two "golden rules" of successful sales. It would not hurt to recall these rules:

1. Try to solve customer problems;

2. Contact the customer when it is convenient for him.

In other words, the seller should always try to respect the needs of customers and the context of their situation. If the customer is watching a football match right now, let’s assume he cares about the match, not about buying a new car. Conversely, if the customer is walking through the supermarket, now is the right time for commercial information. Advertising in such a situation will be perceived by the customer as useful information and he will be keen to listen.

Era of mass personalization

Personalized online advertising is the right tool to solve this dilemma. The Internet is a two-way mass medium. It can deliver a personal message to billions of people, but at the same time it can do it individually, with knowledge of the situation and context. And it can do so automatically, because the internet is basically an interconnection of computers on which programs communicate with target users according to pre-written rules. The role of expensive personal sellers, dealers or call center operators can be taken over by these cheap computers and their programs. The Internet thus offers the same level of economic efficiency as public broadcasting, but unlike public broadcasting it is able to customize the messages and choose an optimal time of delivery. For the first time in history, advertising can be mass, inexpensive and yet individually addressed to specific potential customers. This is exactly the combination every commercial company has been calling for for years! In addition, personalized Internet advertising does not reach customers only on the Internet - thanks to consumer electronics devices it extends its reach into our daily life. Advertising (and not just the Internet kind) will thus change from today's untargeted, flat and annoying form into useful information that respects the person addressed and the context of his situation.

More effective sales

This offers truly exciting possibilities for both sides of the sales relationship. Sellers will be offered personal contact with exactly those potential customers who might be interested in their product: we can then address just those car owners who are right now considering a replacement of their vehicle, only those tourists who are now walking past my restaurant and love the type of food I cook, just those travelers who are looking for a bed this evening near my hotel and could thus fill my unsold capacity. This new way of advertising is also advantageous for buyers. Buyers will be interested in receiving commercial information about the areas that interest them, at the moment when it is useful to them. If I am walking through a mall, I would be happy to learn that the shop to my left offers the gaming console I have already asked about, and that if I buy it right now, I get a personal discount of 30%. Finally, people will not be bothered by advertising; on the contrary, they will look forward to it.

The need for a single platform

To build such an advertising system, we certainly need to know the users; that means we have to have as much information about them as possible and we need to watch their behavior for as long as possible, and all that of course regardless of the device the users are using. The same user should therefore register with the same username and password on his computer, on his cell phone, video console, television or electronic book reader - personalization must happen at the level of the user, not the device. The platform must be capable of processing data from a large number of users and of automatically monitoring their activities (shopping behavior, topics read on the Internet, books read, movies watched, even their movement in the real world as tracked by GPS). All this information should then be automatically added to their profiles, and groups of “similar” people should be calculated from it. The platform should also be massively used, so that the behavior patterns of as many people as possible can be compared. The more users the platform has, the better suited it will be for targeted advertising.

It is therefore necessary to create a platform that will meet the following two criteria:

1. It is mass-popular

2. People spend as much time as possible using it.

Let's see who is now closer to this “golden platform”: Facebook or Google?

The Facebook approach

The social network Facebook was founded in 2004 as a small student platform. Today it has more than 500 million active users, far beyond the academic realm. Gradually, Facebook is offering more and more services that compete with general Internet services. The latest of its major innovations is an email service called Facebook Messages, which the company launched on 15th November 2010. With this email, users get an @facebook.com address and can send mails with attachments to anyone on the Internet. This email platform competes head-to-head with Google’s Gmail. But Facebook also has other services that compete with established Internet applications: for example, a photo album with no limit on the number of uploaded images and with the possibility to tag other users in photos (competing with Flickr and PicasaWeb), the blogging tool Facebook Notes (competing with Blogger.com), discussions organized by businesses, schools or other criteria (competing with Google Groups), not to mention the personal websites of users, which compete with Google Sites. On top of that come a large number of third-party applications, the possibility to update user status, a shared "wall" where users can write messages and post attachments, and the "news feed" informing users about changes in the profiles and activities of their friends. Facebook is simply a small "Internet-in-one". Everything is extremely well integrated, the user gets to all the features easily, and the system constantly advises them which other functions they could still try. Whether you want to find friends or classmates, or take part in one of the many forums, you are still in one application enjoying the same interface. Besides, nearly all of your friends are already there - Facebook is now by far the most popular social networking platform. Thanks to this, all these services like search and discussions gain even greater meaning and usefulness.

This leaves Facebook with a real wealth of personal information. Facebook is well aware of the value it has in its users and tries to use the data collected from them in contextual advertising. Let us mention the Facebook Beacon project, launched in 2007 (and ended in 2009), which sent data from external websites to Facebook, ostensibly for the purpose of allowing targeted advertisements. This project is proof that Facebook knows which direction it should take.

In summary, Facebook has gradually grown into a natural platform for the Internet - it's a kind of small Internet of its own, which integrates all useful functions but does not suffer from the complexity of the "big" internet. With some exaggeration we can even say that Facebook is the "Internet for ordinary consumers." But ordinary consumers are the majority of Internet users.

How can Google compete?

But what competitive platform could Google set against Facebook? Google has its own applications in most areas of Facebook functionality. These applications usually existed long before Facebook and offer more functionality than Facebook. But does Google have a single product that could stand against Facebook as its direct rival? I am afraid that here the answer is negative.

Google applications grew up as completely separate platforms, created by different teams, or even purchased as finished products. As a result, these applications are not integrated with each other - except for the login, where the same Google Account is used. An additional problem is that the functionality of many of these applications overlaps considerably. For example, several Google products have the characteristics of a social network, namely Gmail with Google Buzz, PicasaWeb, Blogger, YouTube, Google Maps with Latitude, and Orkut. In all these applications you can add "friends" and see what these friends do. Unfortunately, by adding friends in these separate applications you end up with a number of different, separate sets of "friends." To tell the truth, the result is chaos about whom we are actually connected with.

Chaos in friends :-)

In fact, Google applications are very good individually. In particular, Google has an excellent e-mail system, Gmail, which is combined with chat, video chat and calendar. In February 2010, a feature called Buzz was added to Gmail, allowing short messages to be shared with "friends." This step actually moves Gmail towards a sort of social network. To share photos, however, you have to use another application, PicasaWeb, which, unlike Facebook, limits uploaded photos to 1GB of storage space; additional space must be paid for. Here, users can also comment on photos and send messages, but those comments are separate from Gmail or Gmail Chat. If the user wants to blog, he can of course do that as well. But then he must turn to another distinct system: blogger.com. There, your blog can be followed by other people via an RSS feed and similarly you can follow other blogs (again, an analogy of "friendship"). To read the blogs of your friends (the analog of the Facebook News Feed) you can use the excellent RSS reader Google Reader (another separate product), which works regardless of the platform where your friends are blogging. Unfortunately, this creates yet another separate group of your "friends". Google also has excellent maps, enhanced with an interesting community feature, Google Latitude. With this feature, you can disclose your location to your friends and vice versa. Unfortunately, adding "friends" in Google Latitude again creates a separate “set of friends”. The popular video server YouTube is also a sort of social network. Its users can rate videos, comment on videos and recommend videos to others. However, this interaction is again separated from the interaction on Gmail, Google Buzz, Blogger, and PicasaWeb. YouTube users can also watch other people’s “channels” - again, an analogy of friendship, but again separated from all other sets of “friends” in Google's portfolio.
And to make matters even worse, Google also owns a "full-fledged" social networking site, Orkut. This network has 100 million active users, especially in India and Brazil, and certainly cannot be considered a failure. Unfortunately, friends in this network are yet another separate set of friends you have in Google.

Can all this ever be integrated?

Google thus has all the functionality which Facebook offers, and usually at a higher level, but unfortunately, these functions are spread among a number of very different applications. Google’s applications are excellent and in many cases even the best on the market. Thanks to them, Google also owns a large amount of data about its users. But these data are again held in separate applications, without the possibility of accessing them uniformly. Our conclusion? Google lacks a uniform platform that could compete with Facebook.

In contrast, Facebook went through a very different process. First, it created its own platform, and only then did it build all of its functionality on top of it. Thanks to this approach Facebook created its own “small Internet,” which is well integrated on a single platform. As a result, there is only one “set of friends” on Facebook and there is just one platform all the applications run on. Facebook can thus be much more efficient in collecting information about its users. And last, but not least: it is also much easier to use.

This is a big threat to Google. Google is aware that it has no other option than to win users for its own platform, Google Account.

How to get users to Google Account?

The easiest way is to gain market share by "brute force"; in other words, Google must engage all of its popular services in this combat. A similar strategy has already been used by many companies in the past: for example, Microsoft built the success of its Office suite on the success of its Windows operating system. Google has already made the first step: a unified login to all applications in its portfolio via a single Google Account.

But Google has two other irons in the fire. They are the popular Android operating system and the forthcoming Google Chrome OS. For manufacturers of mobile phones and PCs both platforms are very appealing for their zero price and high quality. And they are very attractive for users, too - along with a mobile phone or PC, users get some very useful applications (such as maps, navigation, and the online Google Docs office suite). But to make this system work, users must have a user name and password on the Google platform, otherwise the operating system will not even start.

The Android platform is expected to increase the number of its users by 500% in the next 3 years. Already, there is no doubt that Android aspires to the position of the main platform for mobile devices. But even if this actually happens, Google will still be far from victory. It may easily happen that although most devices will be using the Android operating system and Google Chrome, all of them will also carry a Facebook icon on their desktop. And this icon will in addition be present on all other platforms that are out of Google’s control. Android's success will then be comparable to the success of the Internet Explorer browser. The battle was won, but unfortunately on the wrong battlefield.

Should Google then buy Facebook?

Would it therefore be a solution for Google to buy Facebook?

According to SecondMarket Inc., as of November 2010 the market value of Facebook is US$41 billion, making Facebook the third largest web company in the United States after Google and Amazon (just ahead of eBay). Google's market capitalization is around US$190 billion. An acquisition would be possible, but it would by no means be cheap for Google.

Another problem would be that Facebook functionality massively overlaps with Google’s. In the case of a purchase, therefore, Google would have to either abandon some of its own applications or integrate them into Facebook. If it decided to integrate, it would be a very complex task, both in terms of technology and marketing.

Yet it is perhaps still a better option for Google than the prospect that in a few years, Google will compete with Facebook in a match of equals. Although Google will have better and more sophisticated applications, Facebook will be easier to use and better integrated. And the majority of users will be on its platform.

Labels: advertisement, computing paradigm, Facebook, google, internet search, personalization

Wednesday, February 20, 2008

Life in a glassy fishbowl

One interesting comment appeared today on my previous post:

Anonymous said...

Well described. However there is missing area. Security. People usually not expose contacts which could be pretty close and they feel that another their contact dislike/ hate the person. Second, not all communication whatever intensive mean we are friend or colleagues. Imagine yourself complaining to any bigger company. Many times long story and no relationship will happen. Third, Unified ID. Thanks God there is no way how to enforce it (now). ID itself is great idea, however real people abuse anything they can. Now you loose at maximum limit on your credit card. However with digital identity you could loose more. In worst case you could be completely impersonated with all consequences. In digital world (now I exclude mixing of real and digital world) your reputation could be easily harmed and your chance to prove your innocence is limited

2:23 AM

Good points! However, the future world will not care about which developments we would prefer to happen. There is one general observation we can make even now: the level of transparency of our future world will change. My hypothesis is that whether we like it or not, this "transparency level" will increase significantly. However, it doesn't mean our world will necessarily become a worse place to live. It may work just the opposite way: if all information is transparent we can live a more peaceful life than today - no more will anybody be a subject of gouging, no more will anybody be nervous that something secret will be found out. All information will be public. Everybody will have to live his life with full knowledge of this fact to avoid negative surprises.

The other side of this new set-up, however, is that we will all have to accommodate our lives to this new situation. We will have to live our lives as if we stood at every moment on a public stage. It is not inappropriate to compare this situation to a new kind of religion - from the time when God saw everything (so people had to behave gently and appropriately), we are now approaching a situation when we can be sure that whatever we do can be observed, archived and found by anybody, even by our worst enemy. (And to be frank, to a great extent we already live in this situation today - or do you really think our emails and calls are safe these days?)

Meet my mistress, darling!
Specifically to your first point: I fully agree with your comment that not all our contacts would appreciate knowing about all the other contacts we have or communicate with frequently; for example, our wife will not appreciate our mails, calls and meetings with our mistress(es), our boss will not value our job applications to competing companies, etc., etc. However, as I said above, it will not be our choice to decide which information we share with whom (I do exaggerate here, but only slightly). The result will be a new, "transparent" world, and this world can basically have two consequences:

People will start to behave "more appropriately" (knowing the consequences of each step they take), or
People will become more liberal and will accept certain situations as "normal".

I frequently think about how this new level of transparency will influence people's relationships. My guess is that the final result will lie between these two extremes and will be different for different areas (the work code of conduct will probably be more liberal than the personal code of conduct). It will certainly be very interesting to see how this develops.

Business or personal?
Your second comment falls into a more general category of how to split "business" communication from the personal one. The question is, do we need to split them at all? I agree with you that although even in business we can (and do) make friends, we all have personal experience with annoying communication with institutions which leads nowhere (only to personal frustration). But my experience tells me that in these situations we tend to limit the communication to an absolute minimum.
In addition, there are other tricks that can be used to help separate a "real" relationship from a fake one. If somebody is, say, a spokesman of a large corporation, he automatically gets lots of messages every day and he also replies to lots of messages, because it is the nature of his work. Within this amount of communication, the share of his communication that goes to any particular client naturally gets pretty low. And this can be one of the clues to our problem. The weight of a friendship can be taken relative to the overall amount of communication of each person involved.
Interestingly enough, such an algorithm would also work well with celebrities, actors, politicians, sport stars, and all people who receive a lot of attention and thus a lot of communication (even with our boss). It would automatically take into account the "weight of the communication" on each side of the communication. The more asymmetrical the communication is, the less important the relationship probably is. Such ideas certainly need to be worked out in much more detail, and better and better algorithms will have to be found.
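To make the idea of relative weights a little more concrete, here is a minimal sketch in Python. It is only one possible reading of the idea, not a worked-out algorithm: the function name `relationship_weights`, the choice of taking the smaller of the two shares, and all the names in the sample data are my assumptions for illustration.

```python
from collections import Counter


def relationship_weights(messages):
    """Weigh each relationship relative to both people's total traffic.

    messages: iterable of (sender, recipient) pairs.
    Returns {frozenset({a, b}): weight}. The weight is the SMALLER of the
    two shares the a-b exchange has in a's and in b's overall communication
    (an assumed formula), so a one-sided, asymmetrical contact scores low.
    """
    pair_counts = Counter()
    totals = Counter()
    for sender, recipient in messages:
        if sender == recipient:
            continue  # a message to oneself says nothing about a relationship
        pair_counts[frozenset((sender, recipient))] += 1
        totals[sender] += 1
        totals[recipient] += 1

    weights = {}
    for pair, n in pair_counts.items():
        a, b = tuple(pair)
        weights[pair] = min(n / totals[a], n / totals[b])
    return weights


# Hypothetical sample: "press" is a corporate spokesman talking to everybody,
# while "ann" and "bob" mostly talk to each other.
messages = (
    [("ann", "bob")] * 3 + [("bob", "ann")] * 3
    + [("ann", "press"), ("press", "ann")]
    + [(f"client{i}", "press") for i in range(8)]
    + [("press", f"client{i}") for i in range(8)]
)
weights = relationship_weights(messages)
for pair, w in sorted(weights.items(), key=lambda item: -item[1])[:3]:
    print(sorted(pair), round(w, 3))
```

In this sample the ann-bob pair ends up with a weight of 0.75, while every contact with the spokesman gets about 0.11, even for a client who talks to nobody else: the exchange is a large share of the client's traffic but a tiny share of the spokesman's.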

Let's live in a glassy fishbowl
To your third point: yes indeed, everything in our world can be and will be misused. However, I don't think an instant "loss of identity" can occur; on the other hand, somebody can pretend he is you. But to make this really work he would have to do it continuously for a long time and invest quite a lot of energy into it. Frankly, most people have other things to do. In other words, most people are normal: they tell their real names when we meet them on the street, wear their own faces, not masks, and give their real names on the phone when they call us. So I tend not to be too pessimistic here. But indeed this will be a problem. Certainly some mechanisms will appear to fight these frauds and certainly even smarter frauds will be invented that circumvent these mechanisms. But as I said, most people behave normally and this is, frankly, why our world works, and why the future world will work, too.
A much bigger problem will thus be how people cope with the new transparent world where there will be an absolute minimum of personal secrets. It will depend only on us how we tackle this new situation.

So I would correct your saying slightly:

In a digital world you will have to build your reputation every moment of your life.

Labels: foaf, internet search, social networks, socioware, Web 2.0

Tuesday, April 11, 2006

The Power of LinkedIn

I’ve just joined the fast growing Central European team of Capgemini. In my new role of Managing Consultant it will be my pleasure to develop offerings of this global IT services and business consultancy around the lines of Service Oriented Architecture.

Elsewhere on this blog I discuss the business model of social networks. Indeed, the model is flawed, as today’s applications motivate participants to grow their “trusted” networks indefinitely (only today I got an invitation saying “it is always beneficial to increase the size and scope of ones network…”). So this conclusion still holds.

But of course, even if the business model is not right, it does not imply anything about the practical usability of these applications. Actually, I can serve as a good example myself. After being a member of the LinkedIn network for just two months, I was approached by headhunters working for Cap. They found my profile on LinkedIn around the same time as another big IT company found me on this network, too. Then both these companies approached me directly and gave me the luxury of deciding between two good opportunities.

The lesson learned? Applications like social networks really work. Even before visionary projects like UPI happen (sorry, this is my child :-)), social networks are already turning the internet into a more structured place. By improving search in more and more special areas, the internet is gradually becoming a medium where you can find what you need.

So there is some symbolism in this for me. As of now, I have a new job. But at the same time, I have been shown that the world has changed.

Welcome to a networked world! It will be my pleasure to continue meeting you there.

Labels: internet community, internet recruitment, internet search, social networks

Sunday, April 02, 2006

UPI Defined

UPI (Unique Personal Identificator) is a new and open syntactic layer of the internet that uniquely identifies both authors and users of internet content. It can be applied to various forms of electronic communication (web pages, discussion forums, mails and even IMs and VoIP calls). As a result, the traditional PageRank method will be superseded by a truly personalized approach. We will not only see search results sorted by our personal preferences, but will in addition even be able to limit web search to “people similar to us”.

The Motivation

Let us start with several concrete examples. Should UPI be massively adopted, the following internet search queries will become possible:

Find for me new ideas on a certain subject that were written or read by people similar to myself.
Feed me with any new ideas from people whose reasoning and thinking I like.
Find a fishery expert (or any other expert I need right now) who has similar interests and a similar way of thinking to mine.
Find a business partner who shares my business specialization and who will be an easy communication partner for me.
Find a customer for my product or service whom I can easily target personally.

The Method

There are lots of systems today that attempt to solve similar tasks. Generally speaking, we can divide these systems into the following two categories:

Systems that are trying to dig out more from the context around certain keywords (e.g., Zoominfo, which searches names in the context of automatically selected adjectives), and
Systems that are trying to add some additional explicit information from the user to the existing web content.

The second group can be further divided into

a) systems that collect the user’s behavior, either directly or through their work with links (e.g. Google Reader, Google Personalized Search, Flork, del.icio.us, or Stumbleupon)

b) systems that try to add an additional piece of syntax to the internet (e.g. the Friend of a Friend FOAF project)

The UPI system falls into the 2b) category.

Acknowledgements

The UPI system was invented in a discussion (here is its full content in Czech language) that I moderated on the discussion server Lupa.cz this February. Several members of the community added significant pieces to the system design, so the idea I am now describing is by no means my sole work. My special thanks go to Jan Bilek, who created several important elements of the system.

How it works

First of all, the user chooses his or her unique UPI. Although the easiest technical solution for implementing this function would be to go through one centralized registration service, this centralized approach would very likely harm the system’s adoption. We are thus envisioning multiple competing services (so-called identity servers) to handle the registration process. The only thing that must be defined centrally is the UPI syntax. We propose the following one:

chosen_name#identity_server_URL

This syntax corresponds with the popular email syntax
chosen_name@e-mail_server_URL.
Our approach makes it easy to select UPIs that are really unique and yet chosen in a decentralized way; the UPI identifier is easily distinguishable from the rest of web content, so, in other words, it creates a new piece of web syntax, which is easily understandable to both human readers and machines. It also points directly to the home identity server of the user, which helps to resolve potential conflicts if more than one UPI identity server page (a so-called reading profile, see below) is found for a particular user.
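As an illustration of how little machinery the proposed syntax needs, here is a minimal parsing sketch in Python. The `UPI` type, the `parse_upi` function and the sample identifier `jiri#id.example.org` are hypothetical names used only for this example, and the rule of splitting at the first "#" is my assumption (it means the chosen name itself cannot contain "#").

```python
from typing import NamedTuple


class UPI(NamedTuple):
    name: str             # the part the user chose
    identity_server: str  # URL of the user's home identity server


def parse_upi(text: str) -> UPI:
    """Split 'chosen_name#identity_server_URL' at the first '#'."""
    name, sep, server = text.partition("#")
    if not sep or not name or not server:
        raise ValueError(f"not a UPI: {text!r}")
    return UPI(name, server)


upi = parse_upi("jiri#id.example.org")
print(upi.name)             # jiri
print(upi.identity_server)  # id.example.org
```

Because the identity server is part of the identifier, a program that finds a UPI anywhere on the web knows immediately where the user's home server is supposed to be.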

The Role of Identity Servers

The purpose of UPI is to uniquely identify a particular user in all his communication activities. To collect the maximum information possible, we must cover (that means uniquely identify) both the reading and the writing activities of the user. To allow for maximum adoption, the system itself should not demand too much activity on the user's side. We thus propose to include all functions of the system in a simple browser plug-in which will do almost everything for the user automatically. The user will only be required to sign in to this service on the device he is going to use. The plug-in can be provided by any third party (in most cases by a search engine); this area is fully open to competition, too. In addition, the plug-in can automatically identify existing UPIs on web pages as we view them and turn them into miniature clickable icons or even pictures of users; clicking on such a picture will show a context menu related to the search services of a particular search engine. It can for example automatically show us the pages we have both read, discussions in which we both participated or even the entire history of our communication.

Tracing Authors

The easy part of unique identification is the “active”, or authoring, part. The browser plug-in will sign everything we publish on the web. This is technically very easy: the browser contains a button that inserts our UPI into any post, article, or even email we write. However, the syntax of UPI is so simple that users can sign any document even manually, in a similar way to adding an email address to their posts.

Tracing Readers

A much more difficult part of the system is tracing the reading behavior of a particular user. In an ideal world, every page would be signed with the UPI of its author and would at the same time contain the UPIs of all its readers; this highly formalized content would then be publicly available to all competing search engines. This would be an ideal form of the web!

This will of course not happen (most web pages are “read-only”), but we can achieve virtually the same by placing our reading history on any publicly visible page of the internet. Our plug-in will automatically add the URL of every page we visit to our “reading profile”, a web page with a specific syntax which can be located on any server we have write access to. The server that hosts this page will then be called the identity server. Over time, special identity servers will certainly appear on the internet, but to use the UPI system we don’t need anything more than one web page we have write access to.
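The post does not define the syntax of the reading profile page, so the following Python sketch simply assumes one: a plain text page with a `UPI:` line followed by one `READ:` line per visited URL. The class name, the method names and this line format are all hypothetical; the sketch only shows the bookkeeping the plug-in would have to do.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReadingProfile:
    owner: str                                    # the reader's UPI
    urls: List[str] = field(default_factory=list)

    def record_visit(self, url: str) -> None:
        """Append a visited URL, skipping immediate repeats (e.g. reloads)."""
        if not self.urls or self.urls[-1] != url:
            self.urls.append(url)

    def render(self) -> str:
        """Serialize to the assumed one-entry-per-line text format."""
        lines = [f"UPI: {self.owner}"]
        lines.extend(f"READ: {url}" for url in self.urls)
        return "\n".join(lines)


profile = ReadingProfile("jiri#id.example.org")
profile.record_visit("https://example.org/a")
profile.record_visit("https://example.org/a")  # reload, ignored
profile.record_visit("https://example.org/b")
print(profile.render())
```

The rendered text is what would be uploaded to the one web page the user has write access to; any search engine could then fetch and parse it.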

The Role of Web Search Engines

As soon as we have web content signed with the UPIs of its authors and related via reading profiles to its readers, the main part can begin. All this information is publicly available, so competition between different search engines in processing this valuable information can start. The main outcome of this competition will be the implementation of a “people similar to me” search function. Let us underline that the UPI concept will not become a competitive advantage of any particular web search service; it will serve all of them, both general and specialized, in creating better personalized search.

How Will Search Engines Process UPI?

The implementation of the “search people similar to me” function will revolve around the family of statistical cluster analysis methods. The algorithm may look like this:

For each user, the search engine searches the web for the person’s UPI and for his reading profile. If multiple reading profiles are found, it resolves the conflict. The information found is then transformed into a multidimensional representation of the user that will serve as input for the cluster analysis. This representation is to a certain extent similar to the FOAF project, but it is much richer in information and, in addition, it evolves dynamically over time and so reflects changes in the user’s behavior. The actual realization of this transformation will become an area of competition between different web search services.
After the representation has been created for all UPI users, the cluster analysis starts. For each user, the search engine calculates his “distance” to all other users. The detailed realization of the cluster analysis will become a competitive field, too. As a result, we get a two-dimensional matrix of mutual relations between users. This matrix will then become a direct, personalized successor to PageRank; it will serve every search query the particular user makes from then on, until the next analysis is performed.
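
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: each user is represented as a sparse vector mapping URLs to weights, and cosine distance stands in for whatever distance measure a real search engine would choose. The function names are hypothetical, and the sketch stops at the pairwise distance matrix, which is the input to a cluster analysis rather than the clustering itself.

```python
import math


def cosine_distance(a, b):
    """Distance between two sparse user vectors (dicts of URL -> weight)."""
    dot = sum(w * b.get(url, 0.0) for url, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no information: treat as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(users):
    """Two-dimensional matrix of mutual user distances, keyed by UPI."""
    ids = sorted(users)
    return {u: {v: cosine_distance(users[u], users[v]) for v in ids}
            for u in ids}


def most_similar(matrix, upi, k=3):
    """The k users closest to the given UPI ("people similar to me")."""
    others = [(dist, other) for other, dist in matrix[upi].items()
              if other != upi]
    return [other for dist, other in sorted(others)[:k]]


# Illustrative data: two users with identical reading, one unrelated.
users = {
    "alice": {"http://example.org/a": 1.0, "http://example.org/b": 1.0},
    "bob": {"http://example.org/a": 1.0, "http://example.org/b": 1.0},
    "carol": {"http://example.org/c": 1.0},
}
matrix = distance_matrix(users)
neighbours = most_similar(matrix, "alice", k=1)
```

Computing all pairs is quadratic in the number of users, so a real implementation would need some form of approximation; that is exactly the kind of detail in which competing search engines could differ.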

Advantages of Openness

Because the UPI concept itself will not serve as a competitive advantage for any particular search service, all search services will be encouraged to optimize their own implementations.

Competition between search engines will revolve around refining the following areas:

processing the raw content of web pages (web crawlers may, for example, be able to identify UPIs not only on the same page but also within the same discussion thread, or analyze the frequency of communication between particular UPIs in participating e-mail or IM systems);
further processing of UPI-based information (for example, the “aging” of my reading or publishing history could be optimized for particular search scenarios – how should the weight of pages visited or created decrease over time?);
representation of user information in the form that provides the best cluster analysis results;
the cluster analysis itself – it can be modified to best serve specific search queries.
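
As an illustration of the “aging” question raised in the list above, here is one possible answer sketched in Python: exponential decay with a half-life. The half-life of 90 days and the function names are assumptions chosen for the example, not part of the proposal; a search engine could just as well use a linear or stepwise decay.

```python
def aged_weight(age_days, half_life_days=90.0):
    """Weight of a page visit that happened age_days ago.

    A fresh visit weighs 1.0 and the weight halves every half_life_days.
    """
    return 0.5 ** (age_days / half_life_days)


def aged_profile(visits, half_life_days=90.0):
    """Sum the aged weights per URL; visits is a list of (url, age_days)."""
    weights = {}
    for url, age_days in visits:
        weights[url] = weights.get(url, 0.0) + aged_weight(age_days, half_life_days)
    return weights


# Illustrative history: one page read today and again 90 days ago,
# another page read 180 days ago.
history = [("http://example.org/x", 0),
           ("http://example.org/x", 90),
           ("http://example.org/y", 180)]
weights = aged_profile(history)
```

The resulting URL weights could feed directly into the multidimensional user representation used for the cluster analysis.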

Transferability of UPIs

There are of course many remaining issues to resolve. What happens if I am not satisfied with my identity server? Or if the server stops its service entirely? There should be an easy procedure that allows me to move to another identity server while keeping my existing UPI (as a UPI should be persistent over time). So I should be able to transfer my UPI to any other identity server; the original server will then be responsible for pointing to my new identity server.

What happens if the original server stops working entirely? Even this situation can be resolved. My new identity server will always display my UPI on my publicly visible reading profile. This page will be searchable by search engines, so they can find my UPI page wherever it is located (because the UPI is unique and the reading profile has a given syntax, we know the list of reading profiles for each UPI). It is the user’s responsibility (and also in his own interest) to ensure there is just one page with his reading profile on the internet. If there is more than one page, the user is informed about the problem by his search engine and is then asked to resolve the ambiguity. He can, for example, blacklist a fake “reading page” provided by a malicious server. Such a blacklist can, in addition, be shared by multiple search engines.
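
The conflict-resolution rule described here is simple enough to sketch. The function below is a hypothetical illustration only: it assumes the search engine has already collected the candidate profile URLs for one UPI and holds a set of blacklisted URLs.

```python
def resolve_reading_profile(candidate_urls, blacklist):
    """Pick the single trusted reading profile for a UPI.

    Returns (profile_url, conflicts). If exactly one candidate survives
    the blacklist, it is the profile and conflicts is empty. Otherwise
    profile_url is None and conflicts lists the pages the user must
    sort out himself (an empty list means no profile was found).
    """
    trusted = [url for url in candidate_urls if url not in blacklist]
    if len(trusted) == 1:
        return trusted[0], []
    return None, trusted


candidates = ["http://id.example/profile", "http://evil.example/profile"]
resolved = resolve_reading_profile(candidates, {"http://evil.example/profile"})
unresolved = resolve_reading_profile(candidates, set())
```

With the fake page blacklisted the profile resolves cleanly; without the blacklist both pages are handed back to the user as a conflict.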

Motivation of Users

A nice feature of the system is that it motivates its users toward fair and consistent usage. Soon after we start to use the system, we start to benefit from improved web search. If a user, for example, decides to stop using his UPI and replace it with another one, he instantly loses all the information that he built up while using his former identity. In other words, the longer and the more consistently I use my UPI, the more I benefit from it.

OK, there is one special question: what if the user wants to visit some xxx pages? He is certainly not willing to have this part of his history publicly available in his profile. But that is fine, too. The user is free to have more than one UPI if he wants to. His second, “xxx-UPI” will help him find xxx content even better than before, while his “normal” UPI will help him in his normal work. By choosing the right UPI he actually submits additional information to the system. The user is of course also free to sign out of his UPI toolbar entirely when he wants to visit pages he doesn’t want to share with anybody else. In that case, he can browse the content entirely anonymously.

So the user is himself motivated to use the system as frequently and as consistently as possible. Only this usage pattern will give him the best search benefits.

Conclusion

The main properties of the UPI system are openness and simplicity. It extends the current internet infrastructure and its proven algorithms, so it builds upon existing and verified systems. These properties maximize the system’s chances of mass adoption.
The system is not implemented yet, but I will be happy to assist anybody who is interested in implementing it.

Labels: internet community, internet search, unique personal identificator, upi

My FOAF Comments

The Friend of a Friend (FOAF) project is certainly worth a look. It attempts to provide some basic machinery to help us “tell the Web about the connections between the things that matter to us”. People are one special case of these “things”, so from this perspective, FOAF has similar motivation to UPI (Unique Personal Identificator).

I have, however, one issue with this system. In my opinion, it is not feasible to put condensed personal information (relations to other people or activities) into one short static descriptor. It will never be exact; it stays static over time and still requires quite a lot of work from participating users. In my opinion, another approach makes better sense: uniquely identify the user and let him freely work and use the internet. As a result, enough information will be created over time. This information will then allow any (competing) web search engine to create “FOAF-like” identificators on the fly that, however, evolve dynamically over time. In addition, these “dynamic FOAFs” can then be focused and optimized for a particular purpose.

I am sure that the UPI approach, which we are going to describe in the next post, can eventually fulfill the FOAF goals, and can even strive for something more...

Labels: foaf, internet community, internet search, unique personal identificator, upi

Friday, March 24, 2006

Funny Profiles on Zoominfo

These days a lot of people are working hard on improving search on the internet. Today’s internet content is so vast that any method that helps people separate quality content from the ballast flooding the net would be extremely beneficial. Well, we already have one such method – it is called PageRank. This method is based on the “universal popularity” of a particular site, expressed by the links that point to it. In other words, PageRank digs out the semantic information about popularity from the only available syntactic tool: web links. The PageRank algorithm is well proven and fine-tuned to the best possible extent. It is very hard to improve it any further.

Context digging

OK, so where can we move from this point? There are just two ways forward:

add an additional piece of syntax to the internet (that would make the content more searchable), or
work better with the existing unstructured content.

Zoominfo can serve as a typical application of the second approach. It tries to dig out semantic information from the context of keywords and automatically builds user profiles from publicly available news sources. To do this, it attempts to uniquely identify a particular person by searching for the person’s name in the context of other keywords that are automatically identified as relevant to this person. This is a highly non-trivial thing to do, indeed!

The Reality Check

Let me share some examples with you. If we search Zoominfo for the most popular Czech singer, Karel Gott, we find eight (!) different profiles. The good news is that all are more or less related to the singer; the bad news, however, is that none is really correct and seven of the eight don’t even mention that this person is a singer! Where is the problem? In the attempt to differentiate possible namesakes, the system actually splits information about one person into many different profiles. Of course, the balance is difficult to strike. On one hand, it is wise to suppose that if there is a lot of information about a particular person, part of it should be attributed to namesakes. On the other hand, this doesn’t always hold, particularly if the person is really popular.

From professor to journalist or landlord

However, this problem is even more general and is not limited to top celebrities. For example, Prof. Vorisek, who is the Head of the Department of Information Technologies at the Prague Economic University, has 4 different profiles. Only profile No. 2 is more or less correct, but it is vastly incomplete, giving just his name and school. We don’t even learn his position and get no idea of his other activities. In addition, some of the profiles are pretty funny. My favorite is the one that identifies Jiri as a sort of landlord of Zofin Palace. In reality, Zofin Palace is just the venue of a regular annual conference that Jiri’s department organizes.

The conclusion

I don’t think that the people at Zoominfo aren’t trying hard. They certainly are. The problem is more serious: the task of processing the context of keywords exceeds the capabilities of today’s technologies, even if we limit this task to search in a particular context only (e.g., search for names and positions, as Zoominfo does). The idea itself is not bad, but it is too ambitious. Generally speaking, the complexity of this task is close to that of automatic text comprehension and translation. Zoominfo’s case just illustrates that we are not at this stage yet.

This is a very clear message that shouldn’t be overlooked. It is still very hard, and even counterproductive, to work automatically with unstructured information, even in very special scenarios. On the other hand, the syntax approach (PageRank) works well; the problem, however, is that its mechanism is already “milked to death”.

The solution?

To get better search results, we will have to add some additional syntax to the web. We should do it smartly – we cannot expect too much work from users, but at the same time we should make this web extension a clear advantage for everybody who joins.

There are already many applications that tackle the internet search problem this way – social networks can serve as a good example; thanks to their growing popularity they are in fact turning a significant part of the internet into a structured form! Another interesting example is the Friend of a Friend (FOAF) project.

We will however try to formulate a more general approach based on Unique Personal Identificator (UPI). It is actually a nice paradox that Zoominfo (and not only it) would greatly benefit from such a system. On the other hand, if the internet had UPI, applications like Zoominfo would not be necessary at all...

Labels: internet community, internet recruitment, internet search, pagerank, social networks

Monday, March 13, 2006

What Will Supersede PageRank?

Today we live in a world ruled by PageRank. Every web page has its specific rank that says whether it is valuable to the internet community or not. There is however one problem. There is nothing like a “universal” internet community per se. There are just people with different priorities, interests, expectations.

Although PageRank was a big success in its day (being able to distinguish between valuable content and the “mess” of the web), more and more people understand that the “majority” approach, which fits broadcast media well, is not suitable for the internet, which is by its nature an interactive medium, able to personally identify its users.

“I don’t want to only see the stories that most people are interested in, I want interesting stories.” (Dave’s Wordpress Blog)

OK, this is a reasonable expectation. But how do we move on? By replacing a “universal” PageRank with a “personalized” one?

A “personalized” PageRank

PageRank is a brilliant piece of thinking. It was able to make use of the only semantic information that is embedded in the web syntax (the links) to evaluate the quality of pages. By processing link statistics we can see which pages are most linked to, and this in fact gives us access to the vast amount of work done by people who have already read and evaluated these pages and created links to those they considered valuable.

But the links are already “milked to death” and there is nothing else in the web syntax that would give us an additional clue to the quality of web content. So any attempt to move forward with the quality of web search would require introducing some new piece of syntax to the web or, put simply, something that would make web content more structured. Yes, it is a tremendous task, but not an impossible one. And in fact, it is already happening.
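
For readers who have not seen how link statistics turn into a rank, here is a minimal sketch of the textbook form of PageRank, computed by power iteration. It is a simplification for illustration and certainly not what any production search engine runs; the damping factor of 0.85, the iteration count, and the three-page link graph are assumptions of the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict page -> rank; the ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Illustrative graph: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Page c collects links from two pages and ends up ranked highest, a inherits most of c’s rank through the single link back, and b, which nobody links to, keeps only the base share.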

Towards a more structured web

There are two possible approaches to adding more structure to the web:

Growing popularity and thus mass penetration of structured applications, like social networks.
Introducing a new piece of web syntax that would be seamlessly integrated into the existing web. My candidate: the Unique Personal Identificator (UPI).

These are quite different approaches; while the first one is based on mass adoption of structured applications, the second one is based on adoption of simple additional syntax by users. Let’s start with the first one for now.

Social network as a search engine

A social network is in fact an application that consists of

a specialized web search engine coupled with
a specialized web hosting service.

This approach has a clear motivation: the specialized search engine greatly benefits from being able to work with structured information defined up front. So, for example, if we assume that the name is always filled in a field called “name”, the company name in the appropriate field “company” (and is in addition related to a unique ticker symbol), the education degree and country are selected from a pre-filled list, etc., we are able to provide far better and far more relevant search results for our predefined queries than any full-text-based approach can. So we are just porting good old theory from traditional database systems to the internet. Ideally, the entire web should be structured this way!
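
The difference between the two kinds of search can be shown in a few lines of Python. The records, field names, and both functions below are invented for illustration; they only demonstrate why a query against a known field is more precise than a full-text match.

```python
# Invented example records with predefined fields.
records = [
    {"name": "Jan Novak", "company": "Acme", "country": "CZ"},
    {"name": "Eva Svobodova", "company": "Novak Consulting", "country": "CZ"},
]


def full_text_search(records, term):
    """Match the term anywhere in the record, as a full-text engine would."""
    term = term.lower()
    return [r for r in records
            if any(term in str(value).lower() for value in r.values())]


def field_search(records, **criteria):
    """Match each term only within the named field."""
    return [r for r in records
            if all(str(value).lower() in str(r.get(fld, "")).lower()
                   for fld, value in criteria.items())]


anywhere = full_text_search(records, "novak")
by_name = field_search(records, name="novak")
```

The full-text query returns both records, because “Novak” also appears in a company name, while the field query returns only the person actually called Novak.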

Growing popularity of social networks

But now the interesting part comes. The web is in fact becoming more structured, thanks to these applications. Because search in social networks really works (well, structured search has worked in traditional databases since the 1960s, so why not here), these applications become useful and thus popular. The biggest social networks today contain tens of millions of users and put the profiles of these users on the web. Thanks to this development, a significant part of internet content is becoming structured in a very formal, traditional “database way”. We can even say that the web is becoming a more organized place.

Wider consequences of social networks

So there are now millions of users on the web who took the time to create their personalized and structured profiles, and who keep these profiles updated. This is an amount of work that cannot be overlooked. In fact, it could already be compared (at least to a certain extent) to the effort that web users have invested in linking their pages. This growing body of structured web content will serve as a special (and welcome!) input to universal web search engines. It can greatly improve their search capabilities in the areas where applications like social networks force people to use “strict syntax”.

Vision

This in fact means nothing other than the introduction of new syntax rules to certain application areas of the web. It is fair to expect that there will be more and more applications like social networks over time. All these applications will have one thing in common: they will all motivate users to use the internet in a predefined, highly structured way. Whether this results in structured personal profiles, product descriptions, descriptions of calendar events, or something else, all this information will turn the internet into a more structured base of data. The amount of structured content on the internet will grow and will become a goldmine for any search engine of the future. As a result, traditional full-text-based web search will be complemented by more efficient tools in all areas where possible. Thanks to this development, search will certainly improve. But for a really significant improvement, we should dethrone PageRank from its role as the sole and universal expert for evaluating information relevance.

PageRank Replacement?

To do this, we should shift from evaluating pages to evaluating users. This would be a true revolution in web search, allowing us to search for personally relevant information.

However, as we already said, this would require introducing a new piece of syntax to the entire web. A very difficult concept, indeed! Could we find a way to persuade users and developers to adopt this new piece of web syntax? Let us think about it next time.

Labels: internet, internet community, internet search, pagerank

Live.com - too deep innovation

There are a lot of areas where we should innovate in web search, except for one – the user interface. It was Google’s big contribution to the internet community to go the simplest way. No flashing banners, no “sexy” layouts. Just a very intuitive text list. And page navigation that uses our own browser’s functions. What could be nicer and more practical?

And now have a look at live.com. Its search results are displayed in a fancy window and end some 5 cm above the bottom of the page. You intuitively want to scroll – but oops! There is no scrollbar. Just two strange and almost invisible (because they are light grey on a white background) arrows. Should we move them? Click on them? Click on the bar?

This is not the way to go. As Microsoft Monitor blog puts it: “I see the new doohickeys--slider and macros--as adding complexity without significantly improving search relevancy.”

Microsoft uses the extra white space under the search results for the message “Help us improve”. Interestingly enough – if they really improve in this matter (and focus their innovation efforts on the right areas), the place for this message will disappear automatically...

Labels: internet, internet search, live.com, microsoft