Approximation Algorithm (3): Set Cover

Recap $\operatorname{SET COVER}$ can be LP-relaxed to “Minimize the $\sum\limits_{j=1}^{m} w_j x_j$ such that $\sum\limits_{j:e_i \in S_j} x_j \ge 1$, $x_j \ge 0$ then choose $x_j$ to $1$ which $x_j \ge \frac{1}{f}$ and otherwise to 0”. Which is $f$-approximation algorithm. Which $e_i$ denotes each elements, $S_j$ denotes each subsets, $w_j$ denotes costs of each subsets, $x_j$ is the chocie of each subset. With $f$ is the maximum number of existance of elements.

From the above, we can construct an $f$-approximation algorithm. However, $f$ may be arbitrarily large, so we seek a better approximation.

Here are some different approaches to this $\operatorname{SET COVER}$ with $O(ln(n))$-approximation.

Greedy algorithm

$I \leftarrow \emptyset$
$\hat S_j \leftarrow S_j \forall j$
$\operatorname{while}$ $I$ is not a set cover $\operatorname{do}$
$l \leftarrow \operatorname{argmin}_{j:\hat S_j \neq \emptyset} \frac{w_j}{|\hat S_j|}$
$I \leftarrow I \cup$ {$l$}
$\hat S_j \leftarrow \hat S_j - S_l \forall j$

Algorithm above is a natural algorithm for $\operatorname{SET COVER}$ and it is the $O(ln(n))$-approximation algorithm.

First, it is trivial that it returns a valid solution because it terminates iff returned solution is a $\operatorname{SET COVER}$. Second, this algorithm runs in polynomial time because, loops ends up in polynomial time and each step works in only polynomial time.

Now, only left part is that proving solution is bounded by $O(ln(n))$ factor of original solution. First of all, let’s define $I$ as the final solution, $w_1, w_2, \cdots, w_{|I|}$ to weights of each start of iteration and $\hat S_1, \hat S_2, \cdots, \hat S_{|I|}$ to uncovered elements in each start of iteration. From the fact that we will choose the minimum average weights, $w_j \le \frac{|\hat S_j| - |\hat S_{j + 1}|}{|\hat S_j|}\operatorname{OPT}$. Which means $\sum\limits_{j \in I} w_j$ $\le$ $\sum\limits_{j = 1}^{|I|} \frac{|\hat S_j| - |\hat S_{j + 1}|}{|\hat S_j|}\operatorname{OPT}$ $\le$ $\operatorname{OPT}\sum\limits_{j \in I}(\frac{1}{|\hat S_j|} + \frac{1}{|\hat S_j - 1|} + \frac{1}{|\hat S_j - 2|} + \cdots + \frac{1}{|\hat S_{j+1}| + 1})$ $=$ $\operatorname{OPT}\sum\limits_{i}^{n}\frac{1}{i}$ $\le$ $\operatorname{OPT}(1 + ln(n))$.

Random rounding

Another approach is to round each $x_j$ randomly, setting $x_j = 1$ with probability $x_j$ and $x_j = 0$ with probability $1 - x_j$. Note that this algorithm will never set $x_j > 1$.

We cannot guarantee that a single round returns a good solution (it might set all $x_j = 1$). However, we can provide guarantees in expectation, which is standard for randomized approximation algorithms. $E[\sum\limits_{j=1}^{m} w_j X_j]$ $=$ $\sum\limits_{j=1}^{m} Pr[X_j = 1]$ $=$ $\sum\limits_{j=1}^{m} w_j x_j \le \operatorname{OPT}$. As a result, it gives a solution which is better than $\operatorname{OPT}$. However, it never can’t better than the optimal solution unless it doesn’t cover all elements in the ground set. Let’s check whether it really can’t cover all elements. $Pr[e_i$ not covered$]$ $=$ $\prod\limits_{j:e_i \in S_j} (1 - x_j)$ $\le$ $\prod\limits_{j:e_i \in S_j} e^{-x_j}$ $=$ $e^{-\sum_{j:e_i \in S_j} x_j}$ $\le$ $e^{-1}$ As a result, the probability that any given element remains uncovered is at most $e^{-1}$. This means the algorithm is likely to leave some elements uncovered in a single round.

To fix this, we repeat the rounding $c \ln n$ times and take the union: $Pr[e_i$ not covered$]$ $=$ $\prod\limits_{j:e_i \in S_j} (1 - x_j)^{c \operatorname{ln} n}$ $\le$ $\prod\limits_{j:e_i \in S_j} e^{-x_j (c \operatorname{ln} n)}$ $=$ $e^{-(c \operatorname{ln} n) \sum_{j:e_i \in S_j} x_j}$ $\le$ $\frac{1}{n^c}$. As a result, $\Pr[\text{there exists an uncovered element}]$ $\le$ $\sum\limits_{i=1}^{n}\Pr[e_i \text{ not covered}]$ $\le$ $\frac{1}{n^{c-1}}$. This shows that the randomized algorithm produces a feasible solution with reasonably high probability.

How about the average of the solution? First of all, let’s define $p(x_j)$ as the probability of “$x_j$ is included in the solution”. Then, $p(x_j)$ $=$ $1 - (1-x_j)^{c \operatorname{ln} n}$. Which makes $p(x_j)’$ $=$ $(c \operatorname{ln} n)(1 - (1-x_j)^{(c \operatorname{ln} n) - 1})$ $\le$ $c \operatorname{ln} n$. Which also makes $p(x_j)$ $\le$ $(c \operatorname{ln} n) x_j$. As a result, $E[\sum\limits_{j=1}^m w_j X_j]$ $=$ $\sum\limits_{j=1}^m w_j Pr[X_j = 1]$ $\le$ $\sum_{j=1}^m w_j (c \operatorname{ln} n) x_j$ $=$ $(c \operatorname{ln} n)\sum_{j=1}^m w_j x_j$ $\le$ $(c \operatorname{ln} n) OPT$

This concludes the summary of approximation algorithms for $\operatorname{SET COVER}$. The next posts will explore other techniques with additional examples.

Reference :David P. Williamson and David B. Shmoys, The Design of Approximation Algorithms

Reference :CSI6107 lecture at Yonsei University