Approximating Metrics by Tree Metrics

For a given vertices $V$ and distance $V \times V \rightarrow \mathcal{R} : d$. $(V,d)$ is so called a metric if following properties are hold.

$d_{uv}$ $\ge$ $0$ for all $u,v$ $\in$ $V$
$d_{uv}$ $=$ $0$ iff $u = v$
$d_{uv}$ $=$ $d_{vu}$ for all $u,v$ $\in$ $V$
(Triangular inequality) $d_{uv}$ $\le$ $d_{uw}$ $+$ $d_{wv}$ for all $u,v,w$ $\in$ $V$

Notice that we say it is a pseudometric if property 2 changes to “$d_{uv}$ $=$ $0$ if $u = v$”.

A tree metric $(V’, T)$ for a vertex set $V$ is a tree $T$ defined on a node set $V’ \supseteq V$, with non-negative edge lengths. For $u, v \in V’$, let $T_{uv}$ denote the length of the unique path between $u$ and $v$ in $T$.

A tree metric is indeed a metric. Properties 1–3 are trivial to verify. For property 4 (triangle inequality): if there is a path from $u$ to $w$ and from $w$ to $v$, concatenating them gives a path from $u$ to $v$ with length at most the sum of the two sub-paths (a cycle, if any, can be removed to shorten it).

Given a metric $d$ on $V$, we say a tree metric $(V’, T)$ is a $\operatorname{tree metic embedding}$ of distortion $\alpha$ if $d_{uv}$ $\le$ $T_{uv}$ $\le$ $\alpha d_{uv}$ for all $u,v$ $\in$ $V$.

Can any metric be approximated by a tree metric? With large enough distortion, yes. But with bounded distortion, no — in fact, there exist metrics on $n$ vertices for which no tree metric has distortion less than $\frac{n-1}{8}$. However, the following theorem gives a useful randomized result.

Given a metric $d$ on $V$ such that $d_{uv}$ $\ge$ $1$ for all $u$ $\neq$ $v$, there exists a randomized algorithm that finds a tree metric $(V’, T)$ such that for all $u, v$ $\in$ $V$, $d_{uv}$ $\le$ $T_{uv}$ and $E[T_{uv}]$ $\le$ $O(\ln \left\vert V \right\vert)d_{uv}$. Notice that this randomized algorithm picks a tree metric from a given graph. Proof is like follow.

Consider the following algorithm where $B(x, r)$ is a hypersphere with the center at $x$ and radius of $r$.

Pick $r_0$ $\in$ $[\frac{1}{2}, 1)$ uniformly at random
Choose $\Delta$ as the smallest power of two greater or equal than $2\max_{u,v \in V}d_{uv}$
Let $r_i$ $=$ $2^ir_0$ for $1$ $\le$ $i$ $\le$ $\log_2 \Delta$
Pick a permutation $\pi$ of $v$ uniformly at random
$\mathcal{C}(\log_2 \Delta) \leftarrow \{V\}$
Create a node corresponding to $V$ and make it the root node
$\operatorname{for}$ $i \leftarrow \log_2 \Delta, \log_2 \Delta - 1, \cdots, 1$

$\mathcal{C}(i - 1) \leftarrow \emptyset$
$\operatorname{for}$ $C \in \mathcal{C}(i)$

$S \leftarrow C$
$\operatorname{for}$ $j \leftarrow 1, 2, \cdots, \left\vert V \right\vert$

$\operatorname{if}$ $B(\pi(j), r_{i-1})$ $\cap$ $S$ $\neq$ $\emptyset$

Add $\{B(\pi(j), r_{i-1}) \cap S\}$ to $\mathcal{C}(i -1)$
$S$ $\leftarrow$ $S$ $-$ $(B(\pi(j), r_{i-1}) \cap S)$

Create nodes corresponding to each set in $\mathcal{C}(i -1)$ and attach each node to the node in $\mathcal{C}(i)$ corresponding to its superset by an edge of length $2^i$

$V' \leftarrow$ all nodes in $\bigcup\limits_{k = 0}^{\log_2 \Delta}\mathcal{C}(k)$
$T \leftarrow$ all edges between $\mathcal{C}(k)$ and $\mathcal{C}(k - 1)$ for all $1$ $\le$ $k$ $\le$ $\log_2 \Delta$
return $(V', T)$

For an example, following graph’s result will be like follow.

If given metric is like follow.

Returned tree metric can be like follow.

Now, following is true if we call the deepest nodes as level 0 and level $i$’s parent as level $i + 1$. Note that each node at level 0 corresponding to a singleton and every vertex in $V$ appears exactly once. The reason is that there can’t be more than center itself because $r_0$ $<$ $1$ $\le$ $d_{uv}$ for all $u,v$ $\in$ $V$. Notice that every vertex at level $i$ is belongs to a hyper sphere of radius $r_i$ and centered by one of vertex in side of the set. Which also means that level $\log_2 \Delta$ will contain entire $V$ because $r_{\log_2 \Delta}$ $=$ $2^{\log_2 \Delta} r_0$ $\ge$ $2^{\log_2 \Delta} \frac{1}{2}$ $=$ $\Delta \frac{1}{2}$ $\ge$ $2\max_{u,v \in V}d_{uv}\frac{1}{2}$ $=$ $\max_{u,v \in V}d_{uv}$.

Now, let’s denote some terminologies for a node $n$ in the $(V’, T)$.

$\mathcal{L}_n$ is the level of the $n$. Notice that level of root node is $\log_2 \Delta$ and level of leaf node is $0$.
$\mathcal{S}_n$ is the set of vertices which the corresponding hyper sphere includes.

Then, there are some facts.

$d_{yz}$ $\le$ $2r_{\mathcal{L}_n}$ for any $y,z$ $\in$ $\mathcal{S}_n$ becuase it should be in the same hyper sphere.
For any $u,v$ $\in$ $V$, they can’t belongs to the same node at level $[\log_2 d_{uv}] - 1$ because otherwise $d_{uv}$ $\le$ $2r_{[\log_2 d_{uv}] - 1}$ $=$ $2 \cdot 2^{[\log_2 d_{uv}] - 1}r_0$ $=$ $2^{[\log_2 d_{uv}]}r_0$ $\le$ $2^{\log_2 d_{uv}}r_0$ $=$ $d_{uv}r_0$ $<$ $d_{uv}$ which is a contradiction.

Then, $T_{uv}$ $\ge$ $d_{uv}$ is true. Proof is like follow. If $d_{uv}$ $<$ $4$ then, $T_{uv}$ $\ge$ $d_{uv}$ because $T_{uv}$ $\ge$ $4$. Notice that $T_{uv}$ $\ge$ $4$ is true because all edges in the $T$ is bigger or equal than $2$ and there should be at least one parent to go $v$ from $u$. Now, other cases are $d_{uv}$ $\ge$ $4$. Frist, $d_{uv}$ $\le$ $\sum\limits_{k = 0}^{[\log_2 d_{uv}]} 2^k$ because RHS is bigger than a binary representation of $d_{uv}$. Then, $d_{uv}$ $\le$ $\sum\limits_{k = 0}^{[\log_2 d_{uv}]} 2^k$ $=$ $\sum\limits_{k = 1}^{[\log_2 d_{uv}]} 2^k$ $+$ $1$ $\le$ $\sum\limits_{k = 1}^{[\log_2 d_{uv}]} 2^k$ $+$ $\sum\limits_{k = 1}^{[\log_2 d_{uv}]} 2^k$ $=$ $2\sum\limits_{k = 1}^{[\log_2 d_{uv}]} 2^k$ $\le$ $T_{uv}$. Notice that last inequality comes from the fact 2 above. If we think about path from $u$ to $v$, it should go to at least level $[\log_2 d_{uv}]$ because it can’t belongs to the same node untill level $[\log_2 d_{uv}] - 1$. Also, $2\sum\limits_{k = 1}^{[\log_2 d_{uv}]} 2^k$ is a sum of length from $u$ to $v$ via level $[\log_2 d_{uv}]$. Therefore, claim holds.

Now there are only one thing to show that $E[T_{uv}]$ $\le$ $O(\ln \left\vert V \right\vert)d_{uv}$.

To show this, there some terminologies to define.

$\mathcal{A}_{uv}$ is the least common ancestor of $u$ and $v$.
We say $w$ settles the pair $(u, v)$ on level $i$ if $w$ is the first vertex in $\pi$ such that at least one of $u, v$ is in $B(w, r_i)$.
We say $w$ cuts $(u, v)$ on level $i$ if exactly one of $u, v$ is in $B(w, r_i)$.
$X_{iw}(u,v)$ is the event that $w$ cuts $(u, v)$ on level $i$.
$S_{iw}(u,v)$ is the event that $w$ settles $(u, v)$ on level $i$.
$\mathbb{1}(x)$ is the indicator function: $1$ if $x$ is true, $0$ otherwise.

First, $T_{uv}$ $=$ $2\sum_{k=1}^{\mathcal{L}_{\mathcal{A}_{uv}}}2^k$ $=$ $2(2^{\mathcal{L}_{\mathcal{A}_{uv}} + 1} - 2)$ $=$ $2^{\mathcal{L}_{\mathcal{A}_{uv}} + 2} - 4$ $\le$ $2^{\mathcal{L}_{\mathcal{A}_{uv}} + 2}$.

Then, $T_{uv}$ $\le$ $\max\limits_{i = 0}^{\log_2 \Delta - 1} \mathbb{1}(\exists X_{iw}(u,v) \cap S_{iw}(u,v) \text{ for } w \in V) 2^{i + 3}$.

Notice that $\operatorname{argmax}\limits_{i = 0}^{\log_2 \Delta - 1} \mathbb{1}(\exists X_{iw}(u,v) \cap S_{iw}(u,v) \text{ for } w \in V)2^{i + 3}$ is the $\mathcal{A}_{uv}$.

Therefore, $T_{uv}$ $\le$ $\max\limits_{i = 0}^{\log_2 \Delta - 1} \mathbb{1}(\exists X_{iw}(u,v) \cap S_{iw}(u,v) \text{ for } w \in V) 2^{i + 3}$ $\le$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} \mathbb{1}(\exists X_{iw}(u,v) \cap S_{iw}(u,v)) 2^{i + 3}$.

Now if we go back to the algorithm, there are two random events at the algorithm which are “picking $r_0$” and “picking a random permutation $\pi$”. Therefore, we can average on certain path to select $T_{uv}$. Then, $E[T_{uv}]$ $\le$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v) \cap S_{iw}(u,v)] 2^{i + 3}$. Note that the existence of $X_{iw}(u,v) \cap S_{iw}(u,v)$ for $w \in V$ on level $i$ depends on the random variables; all other quantities are deterministic.

Now, let’s consider followings.

With out loosing generality, let’s assume that $d_{uw}$ $\le$ $d_{vw}$ for $w$ that cuts $u$ and $v$ on level $i$. Then, $d_{uw}$ $\le$ $r_i$ $<$ $d_{vw}$ because $u$ should be in $B(w, r_i)$ and $v$ shouldn’t. With above, we need to select $r_i$ uniformly at random with $2^ir_0$. Then we will pick $r_i$ in $[2^{i-1},2^i)$ because we will pick $r_0$ in $[\frac{1}{2}, 1)$. Now, let’s think about the $Pr[X_{iw}(u,v)]$. Then, $Pr[X_{iw}(u,v)]$ $=$ $\frac{\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert}{\left\vert [2^{i-1}, 2^i \right\vert}$ $=$ $\frac{\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert}{2^{i-1}}$. As a result, $\sum\limits_{i=0}^{\log_2 \Delta - 1}Pr[X_{iw}(u,v)]2^{i + 3}$ $=$ $\sum\limits_{i=0}^{\log_2 \Delta - 1}\frac{\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert}{2^{i-1}}2^{i + 3}$ $=$ $\sum\limits_{i=0}^{\log_2 \Delta - 1}\frac{2^{i + 3}}{2^{i-1}}\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert$ $=$ $2^4\sum\limits_{i=0}^{\log_2 \Delta - 1}\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert$ $=$ $16\sum\limits_{i=0}^{\log_2 \Delta - 1}\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert$ $=$ $16\left\vert [d_{uw}, d_{vw}) \right\vert$ for any $w$. Notice that $\bigcup\limits_{i=0}^{\log_2 \Delta - 1}[2^{i-1}, 2^i)$ $=$ $[2^{-1}, 2^{\log_2 \Delta - 1})$ $=$ $[2^{-1}, 2^{-1}\Delta)$ $=$ $[2^{-1}, 2^{-1}2\max_{u,v \in V}d_{uv})$ $=$ $[2^{-1}, \max_{u,v \in V}d_{uv})$ $\supseteq$ set of possible $d_{uv}$ for all $u,v$ $\in$ $V$. Which means it will see all possible area. Notice that $d_{uv}$ $\ge$ $1$ and $Pr[X_{iw}(u,v)]$ means probability that $w$ on level $i$ will cut $u$ and $v$. As a result, $Pr[X_{iw}(u,v)]$ $=$ $16\left\vert [d_{uw}, d_{vw}) \right\vert$ $=$ $16(d_{vw} - d{uw})$ $\le$ $16(d_{vu} + d_{uw} - d_{uw})$ $=$ $16d_{vu}$ $=$ $16d_{uv}$ from the triangular inequality.

Now, let’s think about list $\mathbb{L}$ from $V$ such that sorted in the order $\min(d_{ux}, d_{vx})$ for $x$ $\in$ $\mathbb{L}$. Now, think about 2 things.

Some $w$ $\in$ $V$ such that $w$ cuts $u$ and $v$.
Let’s denote $w$’s level as $i$.
Let’s denote $w$’s index in $\mathbb{L}$ is $\mathcal{I}_w$

Then, one of $u$ or $v$ should be in $B(z, r_i)$ for any $z$ that has less index than $\mathcal{I}_w$. Notice that $\min(d_{uz}, d_{vz})$ $\le$ $\min(d_{uw}, d_{vw})$ $\le$ $r_i$ because it is sorted by $\min(d_{ux}, d_{vx})$ and $w$ cuts $u$ and $v$. Then, $Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]$ $\le$ $\frac{1}{\mathcal{I}_w}$ because $w$ need to be placed in $\pi$ more previous than all such $z$s. Notice that there are at least $\mathcal{I}_w$ candiadates that can cut $u$ and $v$ and $w$ need to be the first on $\pi$ to settle $u$ and $v$. Moreover, $\sum\limits_{w \in V}Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]$ $\le$ $\sum\limits_{w \in V}\frac{1}{\mathcal{I}_w}$ for a fixed $i$. Notice that if we fix $i$, $\mathcal{I}_w$ will be some arbitrary order of vertices. However that will be still fixed in some way after set-up random variables. Which means $\sum\limits_{w \in V}Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]$ $=$ $\sum\limits_{w \in V : \mathcal{I}_w = k, w \text{ cuts } u \text{ and } v}\frac{1}{k}$ $=$ $\sum\limits_{k = 1 : \mathcal{I}_w = k, w \text{ cuts } u \text{ and } v}^{\left\vert V \right\vert}\frac{1}{k}$ $\le$ $\sum\limits_{k = 1}^{\left\vert V \right\vert}\frac{1}{k}$ $\le$ $\ln \left\vert V \right\vert$ $+$ $1$. Finally, notice that $\mathcal{I}_w$ doesn’t depend on $i$ because it only depend on $\min(d_{ux}, d_{vx})$. Infact, it doesn’t even depend on any random varaible.

Now, following 4 things are true for summary.

$E[T_{uv}]$ $\le$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v) \cap S_{iw}(u,v)] 2^{i + 3}$
$\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v)]2^{i + 3}$ $\le$ $16d_{uv}$
$Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]$ $\le$ $\frac{1}{\mathcal{I}_w}$
$\sum\limits_{w \in V} \frac{1}{\mathcal{I}_w}$ $\le$ $\ln \left\vert V \right\vert$ $+$ $1$

As a result, $E[T_{uv}]$ $\le$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v) \cap S_{iw}(u,v)] 2^{i + 3}$ $=$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]Pr[X_{iw}(u,v)] 2^{i + 3}$ $=$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v)]2^{i + 3} Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]$ $\le$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v)]2^{i + 3}\frac{1}{\mathcal{I}_w}$ $=$ $\sum\limits_{w \in V}\frac{1}{\mathcal{I}_w}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v)]2^{i + 3}$ $\le$ $\sum\limits_{w \in V}\frac{1}{\mathcal{I}_w}16d_{uv}$ $=$ $16d_{uv}\sum\limits_{w \in V} \frac{1}{\mathcal{I}_w}$ $\le$ $16d{uv}(\ln \left\vert V \right\vert + 1)$ $=$ $O(\ln \left\vert V \right\vert)d_{uv}$.

Therefore, claim holds.