Jiri Donat's weblog: What Will Supersede PageRank?

Today we live in a world ruled by PageRank. Every web page has a rank that says whether it is valuable to the internet community or not. There is, however, one problem: there is no such thing as a “universal” internet community. There are just people with different priorities, interests, and expectations.

Although PageRank was a big success in its day (it could distinguish valuable content from the “mess” of the web), more and more people understand that the “majority” approach, which fits broadcast media well, is not suitable for the internet, which is by its nature an interactive medium, able to identify its users personally.

“I don’t want to only see the stories that most people are interested in, I want interesting stories.” (Dave’s Wordpress Blog)

OK, this is a reasonable expectation. But how do we move on? By replacing a “universal” PageRank with a “personalized” one?

A “personalized” PageRank

PageRank is a brilliant piece of thinking. It makes use of the only semantic information that is embedded in the web syntax (the links) to evaluate the quality of pages. By processing link statistics we can understand which pages are most linked to, and this in fact gives us access to the vast amount of work done by people who have already read and evaluated these pages and created links to those they considered valuable.
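To make the idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank in Python. It is only an illustration of how link statistics turn into a score, not the production algorithm of any search engine. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the four-page toy graph are all assumptions chosen for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a rank for every page in a small link graph.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative defaults.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph the page that collects the most links ends up with the highest score, and the page it links to inherits much of that score, which is the "borrowed human judgment" described above.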

But the links are already “milked to death”, and there is nothing else in the web syntax that would give us an additional clue to the quality of web content. So any attempt to move forward with the quality of web search would require introducing some new piece of syntax to the web or, put simply, something that would make web content more structured. Yes, it is a tremendous task, but not an impossible one. And in fact, it is already happening.

Towards a more structured web

There are two possible approaches to adding more structure to the web:

The growing popularity, and thus mass penetration, of structured applications such as social networks.
Introducing a new piece of web syntax that would be seamlessly integrated into the existing web. My candidate: the Unique Personal Identificator (UPI).

These are quite different approaches; while the first one is based on mass adoption of structured applications, the second one is based on adoption of simple additional syntax by users. Let’s start with the first one for now.

Social network as a search engine

A social network is in fact an application that consists of

a specialized web search engine coupled with
a specialized web hosting service.

This approach has a clear motivation: the specialized search engine benefits greatly from being able to work with structured information defined up front. For example, if we assume that the name is always entered in a field called “name”, the company name in the field “company” (and is in addition related to a unique ticker symbol), and the education degree and country are selected from a pre-filled list, we can provide far better and far more relevant results for our predefined queries than any full-text approach can. So we are just porting the good old theory of traditional database systems to the internet. Ideally, the entire web should be structured this way!
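The difference between the two kinds of search can be sketched in a few lines of Python. This is a toy comparison under stated assumptions, not the search code of any real social network: the `Profile` fields mirror the ones mentioned above, while the three profiles, the ticker symbols, and the helper names `full_text_search` and `structured_search` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # Field names follow the text; all the data below is made up.
    name: str
    company: str
    ticker: str
    degree: str
    country: str


PROFILES = [
    Profile("Jordan Lee", "Acme Widgets", "ACMW", "MSc", "Canada"),
    Profile("Maria Novak", "Jordan Textiles", "JRDT", "BA", "Czech Republic"),
    Profile("Omar Haddad", "Example Logistics", "EXLG", "PhD", "Jordan"),
]


def full_text_search(profiles, term):
    """Match the term anywhere in the profile, as a full-text engine would."""
    term = term.lower()
    return [
        p for p in profiles
        if term in " ".join((p.name, p.company, p.ticker, p.degree, p.country)).lower()
    ]


def structured_search(profiles, **criteria):
    """Match exact values in named fields, as a database query would."""
    return [
        p for p in profiles
        if all(getattr(p, field) == value for field, value in criteria.items())
    ]


# "jordan" is a first name, part of a company name, and a country.
full_hits = full_text_search(PROFILES, "jordan")
structured_hits = structured_search(PROFILES, country="Jordan")

print([p.name for p in full_hits])
print([p.name for p in structured_hits])
```

The full-text query cannot tell a person named Jordan from a company or a country, so it returns all three profiles. The structured query knows which field it is asking about and returns only the person who lives in Jordan.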

Growing popularity of social networks

Now comes the interesting part. The web is in fact becoming more structured, thanks to these applications. Because search in social networks really works (well, structured search has worked in traditional databases since the 1960s, so why not here), these applications become useful and thus popular. The biggest social networks today contain tens of millions of users and put the profiles of these users on the web. Thanks to this development, a significant part of internet content is becoming structured in a very formal, traditional “database way”. We can even say that the web is becoming a more organized place.

Wider consequences of social networks

So there are now millions of users on the web who took the time to create their personalized, structured profiles and who keep these profiles updated. This is an amount of work that cannot be overlooked. In fact, it can already be compared (at least to a certain extent) to the effort that web users have invested in linking their pages. This growing body of structured web content will serve as a special (and welcome!) input to universal web search engines. It can greatly improve their search capabilities in the areas where applications like social networks force people to use “strict syntax”.

Vision

This means nothing other than the introduction of new syntax rules to certain application areas of the web. It is fair to expect that there will be more and more applications like social networks over time. All these applications will have one thing in common: they will all motivate users to use the internet in a predefined, highly structured way. Whether this results in structured personal profiles, product descriptions, descriptions of calendar events, or something else, all this information will turn the internet into a more structured base of data. The amount of structured content on the internet will grow and will become a goldmine for any search engine of the future. As a result, traditional full-text web search will be complemented by more efficient tools in all areas where possible. Thanks to this development, search will certainly improve. But for a really significant improvement, we should dethrone PageRank from its role as the sole and universal expert for evaluating the relevance of information.

PageRank Replacement?

To do this, we should shift from evaluating pages to evaluating users. This would be a true revolution in web search, allowing us to search for personally relevant information.
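One existing technique that goes a small step in this direction is a personalized variant of PageRank, in which the baseline share of rank is not spread evenly over all pages but concentrated on pages a particular user already cares about. The Python sketch below illustrates that known variant only; it is not the user-based evaluation this post is heading towards. The function name `personalized_pagerank`, the `preferences` argument, and the three-page toy graph are assumptions made for the example.

```python
def personalized_pagerank(links, preferences, damping=0.85, iterations=100):
    """Rank pages from the point of view of one user.

    `links` maps each page to the pages it links to.
    `preferences` maps pages the user likes to positive weights (hypothetical input).
    """
    total = sum(preferences.values())
    if total <= 0:
        raise ValueError("preferences must contain at least one positive weight")

    pages = sorted(
        set(links)
        | {t for targets in links.values() for t in targets}
        | set(preferences)
    )
    # The baseline distribution is biased towards the user's preferred pages.
    teleport = {page: preferences.get(page, 0.0) / total for page in pages}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) * teleport[page] for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Pages without outgoing links hand their rank back to the baseline.
                for other in pages:
                    new_rank[other] += damping * rank[page] * teleport[other]
        rank = new_rank
    return rank


# Hypothetical toy web: a news page links to a sports page and a cooking page.
toy_links = {
    "news": ["sports", "cooking"],
    "sports": ["news"],
    "cooking": ["news"],
}
sports_fan = personalized_pagerank(toy_links, {"sports": 1.0})
home_cook = personalized_pagerank(toy_links, {"cooking": 1.0})

print({page: round(score, 3) for page, score in sports_fan.items()})
print({page: round(score, 3) for page, score in home_cook.items()})
```

The link graph is identical for both users, yet the sports page outranks the cooking page for the first user and the order is reversed for the second. The ranking still evaluates pages, though, which is why it falls short of the shift described above.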

However, as we already said, this would require introducing a new piece of syntax to the entire web. A very difficult concept, indeed! Could we find a way to persuade users and developers to adopt this new piece of web syntax? Let us think about it next time.

Labels: internet, internet community, internet search, pagerank