top of page
Search
  • Writer's pictureMichael G

Decoding Google's Page Ranking Algorithm and Link Dynamics


Search engines, such as http://www.google.com/, are characterized by the exceptional results they return for queries. We will provide a rough explanation of how this ranking is done using the linking information between websites on the World Wide Web (WWW). Google.com assigns a positive real number, the page rank, to each page it returns. The page rank is calculated by Google.com using one of the largest eigenvalue computations with the power method. In the graph of Figure 1, each node n represents a website, and an edge from i to j represents a link from the website i to the website j.


Let

be the adjacency matrix of the graph depicted in the Figure.

For A, A(i,j) = 1 when there is a link from node i to node j and 0 otherwise. Google.com's engineers envisioned a user in this network who is on page i with probability pi. Then, the user either moves to a random page (with a specific probability q, usually 0.15) or with probability 1 − q, randomly selects one of the links on page i. The probability that the user moves from page i to page j is given by the equation:


where ni is the sum of the i-th row of A (essentially the number of links on page i). The probability that the user is on page j at any given time is the sum of p{i→j} in the i-th row of the matrix A.


Equation (1) can be written in matrix form as:

where p is the vector of n probabilities of the user being on the n websites, and G is the matrix with G(i,j)=q/n+A(j,i)(1−q)/nj. We will call the matrix G the Google matrix. Each column of the Google matrix sums to 1, making it a stochastic matrix, meaning its maximum eigenvalue is 1. The eigenvector corresponding to the eigenvalue 1 is essentially the page rank.

Based on the example given earlier, the page ranks for q = 0.15 are as follows:​

This specific eigenvector has been normalized so that the sum of its values is 1, as it should be for probabilities. The page rank is higher for pages 13 and 15, followed by pages 14, 10, and 11. Note that page rank depends not only on the input degree (number of pages pointing to a page) but is derived in a more complex way. Even though nodes 10 and 11 have the highest input degrees, their importance is transferred to nodes 13 and 15 because they link to them. This is the idea behind the google-bombing phenomenon, where the significance of a page is increased by convincing other high-traffic pages to link to it.

Note the connection between page rank and the concept of importance. Page rank is currently the best way to represent the importance of a page, even though the exact definition of importance is still unknown. Therefore, someone may find a better method to model importance.


Questions:


  1. Prove in detail that the matrix G is stochastic.

  2. Construct the matrix G for the given example and verify that the eigenvector of the maximum eigenvalue is the one provided above (using the power method).

  3. Add 4 connections of your choice and remove 1 from those already existing in the graph. Construct the new matrix G to improve the importance ranking of a page you select.

  4. Change the teleportation probability q to (a) q = 0.02 and (b) q = 0.6 in the new graph. Describe the changes in the page rank. What is the purpose of the teleportation probability?

  5. Suppose that in the initial graph of Figure 1, page 11 wanted to improve its rank compared to its competitor, page 10. Suppose it wants to achieve this by persuading pages 8 and 12 to link to page 11 more effectively. If we model this by replacing A(8,11) and A(12,11) with 3 in the adjacency matrix, does this strategy work?

  6. Study the impact of removing page 10 from the graph. Which page ranks increase, and which decrease?


Task 1:



The solution provides a detailed proof that the matrix G is stochastic by calculating the sum of each column in the matrix G. Initially, it examines the term q/n, representing the probability of the user moving to a random page. Next, it analyzes the term A(j,i)·(1−q)/nj, corresponding to the probability of the user moving to a specific page j from page i, considering the factor (1 − q), representing the likelihood of following a link instead of moving to a random page. The sum ∑ i=1 -> n G(i, j) indicates the probability of the user being on page j, derived from the overall probability of moving to page j from any other page i in the graph. The result of the sum ∑i=1 -> n G(i, j) is 1, demonstrating that the matrix G is stochastic as its column sums to 1.


Task 2:


• A: The web graph, where elements A(i, j) are 1 if there is a link from page i to page j.

• q: Probability parameter representing the likelihood of a user moving to a random page instead of following a link.

• G: The stochastic Google matrix derived from the original graph.

• randomVector, resultVector, finalVector: Probability vectors used in the iterative process.

• lambda: The largest eigenvalue of the Google matrix.

• EigenVector: The eigenvector corresponding to the largest eigenvalue, utilized as the PageRank vector.



Results:

Task 3:

We observe that the website 1 has the lowest rank. Therefore, we choose to improve its importance by adding four connections and removing one of our choice. We decide to remove the link between website 5 and 1 since 5 has a relatively low importance score. Additionally, we choose to add the connections 13-1, 15-1, 14-1, 10-1 as these websites have high importance scores. The matrix A will take the following form:

We observe that the importance score of webpage 1 has significantly increased.


Task 4:

Results: Invoking the EigenValue(A, q) method for various values of q:

We observe that in the first case, with q = 0.02, the page rank increases, while in the second case, with q = 0.6, the page rank decreases. Changes in page rank are influenced by the new teleportation probability, q. A small value of q (q=0.02) makes it more likely for users to visit new pages, whereas a higher value of q (q=0.6) makes it more likely for users to stay on already visited pages. The purpose of the teleportation probability is to model user behavior, capturing instances when users randomly jump from one page to another, regardless of the graph's connections.


Task 5:

The matrix A will have the following form:

Results after invoking the EigenValue(A, q) method for different matrices A:

We observe that the page rank of page 11 improved significantly (0.1063->0.1240) compared to page 10 (0.1063->0.1029). Therefore, the strategy employed to convince pages 8 and 12 to link to page 11 in a more effective way was successful.


Task 6:

The matrix A will have the following form:


Results after invoking the EigenValue(A, q) method for different matrices A:

We observe that the ranks of pages 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, and 15 have increased, while those of 9, 13, and 14 are decreasing.





Until the next one, cheers! 🍺

Comments


bottom of page