Parallel PageRank and BFS

PageRank

PageRank is an algorithm that determines the relative importance of vertices in a graph. For each iteration, each vertex will be updated by nearby vertices. Let’s define some terms.

$r_i^t$ as the rank of vertex $i$ at iteration $t$, with $\sum\limits_{i \in V}r_i^0 = 1$.
$\delta(i)$ as the set of neighboring vertices of vertex $i$.
$\beta$ is the convergence damping factor, with $0 \le \beta \le 1$.

With terms above, $r_i^{t + 1}$ $=$ $\beta\sum\limits_{j \in \delta(i)}\frac{r_j^t}{|\delta(j)|}$ $+$ $(1 - \beta)\frac{1}{|V|}$. If we keep compute this, it will converges to some values and that is the result of the page rank. Note that if $\sum\limits_{i \in V}r_i^t = 1$, then it remains consistent after each iteration as well. Proof is like follow.

$\sum\limits_{i \in V}r_i^{t + 1}$ $=$ $\sum\limits_{i \in V}(\beta\sum\limits_{j \in \delta(i)}\frac{r_j^t}{|\delta(j)|}$ $+$ $(1 - \beta)\frac{1}{|V|})$ $=$ $\beta\sum\limits_{i \in V}\sum\limits_{j \in \delta(i)}\frac{r_j^t}{|\delta(j)|}$ $+$ $(1 - \beta)\sum\limits_{i \in V}\frac{1}{|V|}$ $=$ $\beta\sum\limits_{(i,j) \in E}\frac{r_j^t}{|\delta(j)|}$ $+$ $(1 - \beta)\frac{|V|}{|V|}$ $=$ $\beta\sum\limits_{j \in V}\sum\limits_{i \in \delta(j)}\frac{r_j^t}{|\delta(j)|}$ $+$ $(1 - \beta)$ $=$ $\beta\sum\limits_{j \in V}\frac{r_j^t}{|\delta(j)|}\sum\limits_{i \in \delta(j)}1$ $+$ $(1 - \beta)$ $=$ $\beta\sum\limits_{j \in V}\frac{r_j^t}{|\delta(j)|}|\delta(j)|$ $+$ $(1 - \beta)$ $=$ $\beta\sum\limits_{j \in V}r_j^t$ $+$ $(1 - \beta)$ $=$ $\beta$ $+$ $(1 - \beta)$ $=$ $1$.

$\operatorname{for} i \leftarrow 0,\cdots,|V| - 1$

$s[i] \leftarrow 1/|V|$

$\operatorname{while} error \le threshold$

$error \leftarrow 0$
$\operatorname{for} v \leftarrow 0,\cdots,|V| - 1$

$o[v] \leftarrow s[v]$
$s[v] \leftarrow 0$

$\operatorname{for} v \leftarrow 0,\cdots,|V| - 1$

$\operatorname{for} u \in \delta(v)$

$s[v] \leftarrow s[v] + \beta \frac{o[u]}{|\delta(u)|}$

$s[v] \leftarrow s[v] + (1 - \beta) \frac{1}{|V|}$
$error \leftarrow error + |o[v] - s[v]|$

The algorithm above can be parallelized across the for loops because there is no dependency between iterations of each loop. As a result, page rank can be parallelized like below.

$\operatorname{for} i \leftarrow 0,\cdots,|V| - 1 \operatorname{in} \operatorname{parallel}$

$s[i] \leftarrow 1/|V|$

$\operatorname{while} error \le threshold$

$error \leftarrow 0$
$\operatorname{for} v \leftarrow 0,\cdots,|V| - 1 \operatorname{in} \operatorname{parallel}$

$o[v] \leftarrow s[v]$
$s[v] \leftarrow 0$

$\operatorname{for} v \leftarrow 0,\cdots,|V| - 1 \operatorname{in} \operatorname{parallel}$

$\operatorname{for} u \in \delta(v) \operatorname{in} \operatorname{reduction}$

$s[v] \leftarrow s[v] + \beta \frac{o[u]}{|\delta(u)|}$

$s[v] \leftarrow s[v] + (1 - \beta) \frac{1}{|V|}$

$\operatorname{for} v \leftarrow 0,\cdots,|V| - 1 \operatorname{in} \operatorname{reduction}$

$error \leftarrow error + |o[v] - s[v]|$

There are two major parallelization techniques used above.

$\operatorname{for} \operatorname{in} \operatorname{parallel}$
$\operatorname{for} \operatorname{in} \operatorname{reduction}$

for in parallel means the iterations can be fully parallelized independently. for in reduction means the iterations are fully parallel but the results are combined in tree order. The first technique applies when there are no inter-iteration dependencies. The second applies when iterations write to a shared accumulator — there are read dependencies but no write conflicts. Inspecting the algorithm above confirms there are read dependencies but no write dependencies.

BFS

BFS (Breadth-First Search) is a graph traversal algorithm that explores vertices level by level from a source. It can be used for various purposes: computing shortest distances from the source, finding connected components, and more. Here, we use BFS to assign each vertex its distance from the source. Let $\delta(v)$ denote the set of neighbors of $v$.

$\operatorname{for} i \leftarrow 0, \cdots, |V| - 1$

$d[i] = -1$

$Q \leftarrow \emptyset$
$Q.push(s)$
$d[s] = 0$
$\operatorname{while} Q \neq \emptyset$

$v \leftarrow Q_{top}$
$Q.pop()$
$\operatorname{for} u \in \delta(v)$

$\operatorname{if} d[u] = -1$

$d[u] = d[v] + 1$
$Q.insert(u)$

Now, it can be parallelized per vetices at the same depth.

$\operatorname{for} i \leftarrow 0, \cdots, |V| - 1 \operatorname{in} \operatorname{parallel}$

$d[i] = -1$

$Q \leftarrow \emptyset$
$Q.push(s)$
$d[s] = 0$
$\operatorname{while} Q \neq \emptyset$

$Q^n \leftarrow \emptyset$
$\operatorname{for} v \in Q \text{ with the same depth } \operatorname{in} \operatorname{parallel}$

$\operatorname{for} u \in \delta(v)$

$\operatorname{if} d[u] = -1$

Critical section

$d[u] = d[v] + 1$
$Q^n.insert(u)$

$Q \leftarrow Q^n$

However, there are two problems with this approach. First, a critical section is needed to update shared state, which limits scalability. We can avoid locking $Q$ by using thread-local queues, but $d[u]$ still requires synchronization. Second, many threads may race to update the same neighbor $u$, resulting in wasted work.

To address these issues, there is an alternative approach known as the bottom-up BFS. To do this, let’s define $\delta(v)$ as incoming edges in this case.

$\operatorname{for} i \leftarrow 0, \cdots, |V| - 1 \operatorname{in} \operatorname{parallel}$

$d[i] = -1$

$Q \leftarrow \emptyset$
$Q.push(s)$
$d[s] = 0$
$\operatorname{while} \text{distance changed}$

$Q^n \leftarrow \emptyset$
$\operatorname{for} v \in V \operatorname{in} \operatorname{parallel}$

$\operatorname{if} d[v] = -1$

$\operatorname{for} u \in \delta(v)$

$\operatorname{if} u \in Q$

$d[v] = d[u] + 1$
Critical section

$Q^n.insert(v)$

$Q \leftarrow Q^n$

This approach makes better use of the full computing resources by avoiding many unnecessary collisions. It also parallelizes more efficiently because there is no critical section needed to update each node’s distance. Therefore, bottom-up BFS often outperforms top-down BFS for dense frontier cases. However, it is known to be optimal to use a hybrid strategy, switching between the two approaches based on the current state. The algorithm starts with top-down BFS. When the frontier becomes large (many edges to explore), it switches to the bottom-up approach. When the frontier shrinks again (few unvisited vertices remain), it switches back to top-down.