Given a set of vertices $V$ and a distance function $d : V \times V \rightarrow \mathbb{R}$, the pair $(V,d)$ is called a metric if the following properties hold.
- $d_{uv}$ $\ge$ $0$ for all $u,v$ $\in$ $V$
- $d_{uv}$ $=$ $0$ iff $u = v$
- $d_{uv}$ $=$ $d_{vu}$ for all $u,v$ $\in$ $V$
- (Triangle inequality) $d_{uv}$ $\le$ $d_{uw}$ $+$ $d_{wv}$ for all $u,v,w$ $\in$ $V$
Notice that $(V, d)$ is called a pseudometric if property 2 is weakened to “$d_{uv}$ $=$ $0$ if $u = v$”, so that distinct vertices may be at distance $0$.
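As a quick illustration, the four properties can be checked mechanically. The following is a minimal Python sketch; the dict-based representation `d[(u, v)]` and the function name are assumptions made for illustration only.

```python
from itertools import product

def is_metric(V, d, tol=1e-9):
    """Check the four metric axioms for distances stored as d[(u, v)]."""
    for u, v in product(V, repeat=2):
        if d[(u, v)] < -tol:                      # property 1: non-negativity
            return False
        if (abs(d[(u, v)]) < tol) != (u == v):    # property 2: zero iff u == v
            return False
        if abs(d[(u, v)] - d[(v, u)]) > tol:      # property 3: symmetry
            return False
    for u, v, w in product(V, repeat=3):          # property 4: triangle inequality
        if d[(u, v)] > d[(u, w)] + d[(w, v)] + tol:
            return False
    return True
```

For example, points on a line with $d_{uv} = \left\vert u - v \right\vert$ pass all four checks, while inflating a single pairwise distance breaks the triangle inequality.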
A tree metric $(V’, T)$ for a set of vertices $V$ is a tree $T$ defined on a set of nodes $V’ \supseteq V$ whose edges are given nonnegative lengths. For $u,v$ $\in$ $V’$, let $T_{uv}$ denote the length of the unique path between $u$ and $v$ in $T$.
Notice that a tree metric is itself a metric (a pseudometric if some edges have length $0$). Properties 1–3 hold trivially. For property 4, concatenating the path from $u$ to $w$ with the path from $w$ to $v$ gives a walk from $u$ to $v$; the unique path from $u$ to $v$ is obtained from this walk by removing any repeated parts, so its length is at most the sum of the two path lengths.
Given a metric $d$ on $V$, we say a tree metric $(V’, T)$ is a $\textit{tree metric embedding}$ of distortion $\alpha$ if $d_{uv}$ $\le$ $T_{uv}$ $\le$ $\alpha d_{uv}$ for all $u,v$ $\in$ $V$.
Does such an embedding always exist? Yes, if we allow a gigantic distortion, but not with a small one: it is known that for some metric $d$ on $n$ vertices, no tree metric has distortion less than $\frac{n - 1}{8}$. However, there is a good theorem.
Given a metric $d$ on $V$ such that $d_{uv}$ $\ge$ $1$ for all $u$ $\neq$ $v$, there exists a randomized algorithm that finds a tree metric $(V’, T)$ such that for all $u, v$ $\in$ $V$, $d_{uv}$ $\le$ $T_{uv}$ and $E[T_{uv}]$ $\le$ $O(\ln \left\vert V \right\vert)d_{uv}$. Notice that the guarantee is on the expectation: the algorithm outputs a random tree metric whose distances always dominate $d$ and are only $O(\ln \left\vert V \right\vert)$ larger in expectation. The proof is as follows.
Consider the following algorithm, where $B(x, r)$ $=$ $\{v \in V : d_{xv} \le r\}$ is the ball of radius $r$ centered at $x$.
Choose $\Delta$ as the smallest power of two greater than or equal to $2\max_{u,v \in V}d_{uv}$
Pick $r_0$ $\in$ $[\frac{1}{2}, 1)$ uniformly at random and let $r_i$ $=$ $2^ir_0$ for $1$ $\le$ $i$ $\le$ $\log_2 \Delta$
Pick a permutation $\pi$ of $V$ uniformly at random
$\mathcal{C}(\log_2 \Delta) \leftarrow \{V\}$
Create a node corresponding to $V$ and make it the root node
$\operatorname{for}$ $i \leftarrow \log_2 \Delta, \log_2 \Delta - 1, \cdots, 1$
$\quad \mathcal{C}(i - 1) \leftarrow \emptyset$
$\quad \operatorname{for}$ $C \in \mathcal{C}(i)$
$\quad\quad S \leftarrow C$
$\quad\quad \operatorname{for}$ $j \leftarrow 1, 2, \cdots, \left\vert V \right\vert$
$\quad\quad\quad \operatorname{if}$ $B(\pi(j), r_{i-1}) \cap S$ $\neq$ $\emptyset$, add it to $\mathcal{C}(i - 1)$ and create a node for it as a child of the node of $C$
$\quad\quad\quad S$ $\leftarrow$ $S$ $-$ $(B(\pi(j), r_{i-1}) \cap S)$
$T \leftarrow$ all edges between $\mathcal{C}(k)$ and $\mathcal{C}(k - 1)$ for all $1$ $\le$ $k$ $\le$ $\log_2 \Delta$, where each edge between level $k$ and level $k - 1$ has length $2^k$
return $(V', T)$
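The decomposition above can be sketched in Python. This is a minimal sketch under a few assumptions: distances are stored as a dict `d[(u, v)]`, the function name `frt_tree` is hypothetical, and the tree is kept implicitly as the clusters at each level, with $T_{uv}$ computed from the level of the lowest cluster containing both vertices.

```python
import math
import random
from itertools import combinations

def frt_tree(V, d, rng=random):
    """One run of the randomized decomposition; requires d[(u, v)] >= 1 for u != v.

    Returns a dict T with the tree distance T[(u, v)] for every pair of vertices.
    """
    dmax = max(d[(u, v)] for u, v in combinations(V, 2))
    L = math.ceil(math.log2(2 * dmax))          # Delta = 2**L, smallest power of two >= 2*dmax
    r0 = rng.uniform(0.5, 1.0)                  # r_i = 2**i * r0
    pi = list(V)
    rng.shuffle(pi)                             # random permutation of V

    clusters = {L: [frozenset(V)]}              # level L holds the root cluster V
    for i in range(L, 0, -1):
        nxt = []
        for C in clusters[i]:                   # carve each level-i cluster ...
            S = set(C)
            for w in pi:                        # ... by balls taken in permutation order
                ball = {x for x in S if x == w or d[(w, x)] <= 2 ** (i - 1) * r0}
                if ball:
                    nxt.append(frozenset(ball))
                    S -= ball
        clusters[i - 1] = nxt

    # cid[i][x]: index of the cluster containing x at level i
    cid = {i: {x: k for k, C in enumerate(clusters[i]) for x in C}
           for i in range(L + 1)}
    T = {}
    for u, v in combinations(V, 2):
        # level of the least common ancestor: lowest level where u, v share a cluster
        lca = next(i for i in range(L + 1) if cid[i][u] == cid[i][v])
        # edge between levels k and k-1 has length 2**k, so the u-v path costs
        # 2 * (2^1 + ... + 2^lca) = 2 * (2**(lca + 1) - 2)
        T[(u, v)] = T[(v, u)] = 2 * (2 ** (lca + 1) - 2)
    return T
```

On any single run the domination $d_{uv} \le T_{uv}$ holds deterministically, while the $O(\ln \left\vert V \right\vert)$ stretch is only an expectation over the random $r_0$ and $\pi$.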
As an example, suppose the given metric is the following.
Then the returned tree metric can look like the following.
Now, call the deepest nodes level $0$ and the parent of a level-$i$ node level $i + 1$; then the following holds. Each node at level $0$ corresponds to a singleton, and every vertex in $V$ appears in exactly one of them: a level-$0$ cluster cannot contain anything besides its center, because $r_0$ $<$ $1$ $\le$ $d_{uv}$ for all $u \neq v$ $\in$ $V$. Notice that every cluster at level $i$ is contained in a ball of radius $r_i$ centered at some vertex. This is consistent with level $\log_2 \Delta$ containing all of $V$, because $r_{\log_2 \Delta}$ $=$ $2^{\log_2 \Delta} r_0$ $\ge$ $2^{\log_2 \Delta} \cdot \frac{1}{2}$ $=$ $\frac{\Delta}{2}$ $\ge$ $\frac{2\max_{u,v \in V}d_{uv}}{2}$ $=$ $\max_{u,v \in V}d_{uv}$.
Now, let’s introduce some notation for a node $n$ in $(V’, T)$.
- $\mathcal{L}_n$ is the level of $n$. Notice that the level of the root node is $\log_2 \Delta$ and the level of a leaf node is $0$.
- $\mathcal{S}_n$ is the set of vertices in the cluster corresponding to $n$.
Then, the following facts hold.
- $d_{yz}$ $\le$ $2r_{\mathcal{L}_n}$ for any $y,z$ $\in$ $\mathcal{S}_n$, because both lie in the same ball of radius $r_{\mathcal{L}_n}$ and the triangle inequality applies.
- For any $u \neq v$ $\in$ $V$, they cannot belong to the same node at level $[\log_2 d_{uv}] - 1$ (where $[x]$ denotes $\lfloor x \rfloor$), because otherwise $d_{uv}$ $\le$ $2r_{[\log_2 d_{uv}] - 1}$ $=$ $2 \cdot 2^{[\log_2 d_{uv}] - 1}r_0$ $=$ $2^{[\log_2 d_{uv}]}r_0$ $\le$ $2^{\log_2 d_{uv}}r_0$ $=$ $d_{uv}r_0$ $<$ $d_{uv}$, which is a contradiction. Since the clusters are nested, $u$ and $v$ cannot share a node at any lower level either.
Then, $T_{uv}$ $\ge$ $d_{uv}$ holds. The proof is as follows. If $d_{uv}$ $<$ $4$, then $T_{uv}$ $\ge$ $4$ $>$ $d_{uv}$: every edge of $T$ has length at least $2$, and since $u$ and $v$ lie in different leaves, the path between them uses at least two edges (up to a common parent and back down). Now consider $d_{uv}$ $\ge$ $4$. By fact 2 above, $u$ and $v$ cannot share a node at any level up to $[\log_2 d_{uv}] - 1$, so the path from $u$ to $v$ in $T$ must climb to at least level $[\log_2 d_{uv}]$, and its length is at least $2\sum\limits_{k = 1}^{[\log_2 d_{uv}]} 2^k$, the total edge length of going from $u$ up to level $[\log_2 d_{uv}]$ and back down to $v$. Therefore $T_{uv}$ $\ge$ $2\sum\limits_{k = 1}^{[\log_2 d_{uv}]} 2^k$ $=$ $2^{[\log_2 d_{uv}] + 2}$ $-$ $4$ $=$ $4 \cdot 2^{[\log_2 d_{uv}]}$ $-$ $4$ $>$ $4 \cdot \frac{d_{uv}}{2}$ $-$ $4$ $=$ $2d_{uv}$ $-$ $4$ $\ge$ $d_{uv}$, where the last two steps use $2^{[\log_2 d_{uv}]}$ $>$ $\frac{d_{uv}}{2}$ and $d_{uv}$ $\ge$ $4$. Therefore, the claim holds.
Now, the only thing left is to show that $E[T_{uv}]$ $\le$ $O(\ln \left\vert V \right\vert)d_{uv}$.
To show this, we define some more notation.
- $\mathcal{A}_{uv}$ is the least common ancestor of $u$ and $v$.
- We say $w$ settles the pair $(u, v)$ on level $i$ if $w$ is the first vertex in the permutation $\pi$ such that at least one of $u$ and $v$ is in the ball $B(w, r_i)$.
- We say $w$ cuts $(u, v)$ on level $i$ if exactly one of $u$ and $v$ is in the ball $B(w, r_i)$.
- $X_{iw}(u,v)$ is the event that $w$ cuts $(u, v)$ on level $i$.
- $S_{iw}(u,v)$ is the event that $w$ settles $(u, v)$ on level $i$.
- $\mathbb{1}(x)$ is the indicator function that equals $1$ if $x$ is true and $0$ otherwise.
First, $T_{uv}$ $=$ $2\sum_{k=1}^{\mathcal{L}_{\mathcal{A}_{uv}}}2^k$ $=$ $2(2^{\mathcal{L}_{\mathcal{A}_{uv}} + 1} - 2)$ $=$ $2^{\mathcal{L}_{\mathcal{A}_{uv}} + 2} - 4$ $\le$ $2^{\mathcal{L}_{\mathcal{A}_{uv}} + 2}$.
Then, $T_{uv}$ $\le$ $\max\limits_{i = 0}^{\log_2 \Delta - 1} \mathbb{1}(\exists w \in V : X_{iw}(u,v) \cap S_{iw}(u,v)) 2^{i + 3}$.
Notice that the maximizing index $i$ in $\max\limits_{i = 0}^{\log_2 \Delta - 1} \mathbb{1}(\exists w \in V : X_{iw}(u,v) \cap S_{iw}(u,v))2^{i + 3}$ is $\mathcal{L}_{\mathcal{A}_{uv}} - 1$, the level just below $\mathcal{A}_{uv}$ where the pair is settled and cut for the last time; with $i$ $=$ $\mathcal{L}_{\mathcal{A}_{uv}} - 1$ we get $2^{i + 3}$ $=$ $2^{\mathcal{L}_{\mathcal{A}_{uv}} + 2}$, matching the bound above.
Therefore, $T_{uv}$ $\le$ $\max\limits_{i = 0}^{\log_2 \Delta - 1} \mathbb{1}(\exists w \in V : X_{iw}(u,v) \cap S_{iw}(u,v)) 2^{i + 3}$ $\le$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} \mathbb{1}(X_{iw}(u,v) \cap S_{iw}(u,v)) 2^{i + 3}$.
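The geometric-sum identity used in this chain, $2\sum_{k=1}^{m}2^k$ $=$ $2^{m+2} - 4$, can be sanity-checked numerically:

```python
# Check the identity 2 * sum_{k=1}^{m} 2^k = 2^(m+2) - 4 for a range of m.
for m in range(1, 30):
    assert 2 * sum(2 ** k for k in range(1, m + 1)) == 2 ** (m + 2) - 4
```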
Now, going back to the algorithm, there are two random choices: picking $r_0$ and picking the random permutation $\pi$, and whether each event $X_{iw}(u,v) \cap S_{iw}(u,v)$ occurs is determined by these two choices. Taking the expectation over both and applying linearity of expectation gives $E[T_{uv}]$ $\le$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v) \cap S_{iw}(u,v)] 2^{i + 3}$.
Now, let’s bound the terms of this sum.
Without loss of generality, assume $d_{uw}$ $\le$ $d_{vw}$ for a $w$ that cuts $u$ and $v$ on level $i$. Then $d_{uw}$ $\le$ $r_i$ $<$ $d_{vw}$, because $u$ must be in $B(w, r_i)$ and $v$ must not. Since $r_0$ is picked uniformly at random from $[\frac{1}{2}, 1)$ and $r_i$ $=$ $2^ir_0$, the radius $r_i$ is uniform on $[2^{i-1}, 2^i)$. Therefore, writing $\left\vert \cdot \right\vert$ for the length of an interval, $Pr[X_{iw}(u,v)]$ $=$ $\frac{\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert}{\left\vert [2^{i-1}, 2^i) \right\vert}$ $=$ $\frac{\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert}{2^{i-1}}$. As a result, $\sum\limits_{i=0}^{\log_2 \Delta - 1}Pr[X_{iw}(u,v)]2^{i + 3}$ $=$ $\sum\limits_{i=0}^{\log_2 \Delta - 1}\frac{2^{i + 3}}{2^{i-1}}\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert$ $=$ $16\sum\limits_{i=0}^{\log_2 \Delta - 1}\left\vert [2^{i-1}, 2^i) \cap [d_{uw}, d_{vw}) \right\vert$ $\le$ $16\left\vert [d_{uw}, d_{vw}) \right\vert$ for any $w$, where the last step holds because the intervals $[2^{i-1}, 2^i)$ for $0$ $\le$ $i$ $\le$ $\log_2 \Delta - 1$ are disjoint; their union $[2^{-1}, \frac{\Delta}{2})$ $\supseteq$ $[2^{-1}, \max_{u,v \in V}d_{uv})$ covers every possible value of $r_i$. Finally, by the triangle inequality, $\sum\limits_{i=0}^{\log_2 \Delta - 1}Pr[X_{iw}(u,v)]2^{i + 3}$ $\le$ $16\left\vert [d_{uw}, d_{vw}) \right\vert$ $=$ $16(d_{vw} - d_{uw})$ $\le$ $16(d_{vu} + d_{uw} - d_{uw})$ $=$ $16d_{uv}$.
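The interval formula for $Pr[X_{iw}(u,v)]$ can be checked by simulation. The level $i$ and the two distances below are hypothetical numbers chosen only for illustration:

```python
import random

def overlap(a, b, c, e):
    """Length of the intersection [a, b) ∩ [c, e)."""
    return max(0.0, min(b, e) - max(a, c))

rng = random.Random(1)
i, duw, dvw = 3, 5.0, 7.0   # hypothetical: w cuts (u, v) exactly when duw <= r_i < dvw
trials = 200_000
# r_i = 2**i * r0 with r0 ~ U[1/2, 1), i.e. r_i is uniform on [2**(i-1), 2**i)
hits = sum(duw <= 2 ** i * rng.uniform(0.5, 1.0) < dvw for _ in range(trials))
exact = overlap(2 ** (i - 1), 2 ** i, duw, dvw) / 2 ** (i - 1)
assert abs(hits / trials - exact) < 0.01   # Monte Carlo estimate matches the formula
```

For these numbers the formula gives $\frac{\left\vert [4, 8) \cap [5, 7) \right\vert}{4} = \frac{2}{4} = 0.5$, and the empirical frequency agrees.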
Now, consider the list $\mathbb{L}$ of the vertices of $V$, sorted in increasing order of $\min(d_{ux}, d_{vx})$ for $x$ $\in$ $V$, and think about the following.
- Take some $w$ $\in$ $V$ that cuts $u$ and $v$.
- Denote the level on which $w$ cuts them by $i$.
- Denote $w$’s index in $\mathbb{L}$ by $\mathcal{I}_w$.
Then, for any $z$ with a smaller index than $\mathcal{I}_w$, at least one of $u$ and $v$ is in $B(z, r_i)$: since $\mathbb{L}$ is sorted by $\min(d_{ux}, d_{vx})$ and $w$ cuts $u$ and $v$, we have $\min(d_{uz}, d_{vz})$ $\le$ $\min(d_{uw}, d_{vw})$ $\le$ $r_i$. Hence $Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]$ $\le$ $\frac{1}{\mathcal{I}_w}$, because $w$ must appear in $\pi$ before all such $z$: there are at least $\mathcal{I}_w$ candidates with one of $u, v$ in their ball, and $w$ must be the first of them in $\pi$ to settle $(u, v)$. Moreover, for a fixed $i$, $\sum\limits_{w \in V}Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]$ $\le$ $\sum\limits_{w \in V}\frac{1}{\mathcal{I}_w}$ $\le$ $\sum\limits_{k = 1}^{\left\vert V \right\vert}\frac{1}{k}$ $\le$ $\ln \left\vert V \right\vert$ $+$ $1$, since the indices $\mathcal{I}_w$ are distinct values in $\{1, \cdots, \left\vert V \right\vert\}$. Finally, notice that $\mathcal{I}_w$ does not depend on $i$; in fact, it depends only on $\min(d_{ux}, d_{vx})$ and not on any random variable.
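The harmonic-sum bound at the end, $\sum_{k=1}^{n}\frac{1}{k}$ $\le$ $\ln n + 1$, can be verified directly:

```python
import math

# Check sum_{k=1}^{n} 1/k <= ln(n) + 1 for a range of n (equality at n = 1).
for n in range(1, 2000):
    assert sum(1.0 / k for k in range(1, n + 1)) <= math.log(n) + 1
```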
In summary, the following four facts are true.
- $E[T_{uv}]$ $\le$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v) \cap S_{iw}(u,v)] 2^{i + 3}$
- $\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v)]2^{i + 3}$ $\le$ $16d_{uv}$
- $Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]$ $\le$ $\frac{1}{\mathcal{I}_w}$
- $\sum\limits_{w \in V} \frac{1}{\mathcal{I}_w}$ $\le$ $\ln \left\vert V \right\vert$ $+$ $1$
As a result, $E[T_{uv}]$ $\le$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v) \cap S_{iw}(u,v)] 2^{i + 3}$ $=$ $\sum\limits_{w \in V}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[S_{iw}(u,v) \vert X_{iw}(u,v)]Pr[X_{iw}(u,v)] 2^{i + 3}$ $\le$ $\sum\limits_{w \in V}\frac{1}{\mathcal{I}_w}\sum\limits_{i = 0}^{\log_2 \Delta - 1} Pr[X_{iw}(u,v)]2^{i + 3}$ $\le$ $\sum\limits_{w \in V}\frac{1}{\mathcal{I}_w}16d_{uv}$ $=$ $16d_{uv}\sum\limits_{w \in V} \frac{1}{\mathcal{I}_w}$ $\le$ $16d_{uv}(\ln \left\vert V \right\vert + 1)$ $=$ $O(\ln \left\vert V \right\vert)d_{uv}$.
Therefore, the claim holds.