Approximation Algorithm (2): Set Cover

Recap $\operatorname{SET COVER}$ can be LP-relaxed to “Minimize the $\sum\limits_{j=1}^{m} w_j x_j$ such that $\sum\limits_{j:e_i \in S_j} x_j \ge 1$, $x_j \ge 0$ then choose $x_j$ to $1$ which $x_j \ge \frac{1}{f}$ and otherwise to 0”. Which is $f$-approximation algorithm. Which $e_i$ denotes each elements, $S_j$ denotes each subsets, $w_j$ denotes costs of each subsets, $x_j$ is the chocie of each subset. With $f$ is the maximum number of existance of elements.

Here are some different approaches to this $\operatorname{SET COVER}$.

Dual solution

Maximize the $\sum\limits_{i=1}^{n} y_i$ such that $\sum\limits_{i:e_i \in S_j} y_i \le w_j$, $y_i \ge 0$. This is so-called dual of original LP. Now, how do we rounding? We can choose $\sum\limits_{i:e_i \in S_j} y_i = w_j$.

This approach aims to make the total payment as large as possible while ensuring every subset can be fully funded. Only subsets that are fully paid for are selected.

Now, let’s show this is also an $f$-approximation.

First, let’s show that the dual solution $I$ is a valid set cover. Suppose we failed to cover some $e_k$, so for every subset $S_j$ with $e_k \in S_j$, $\sum\limits_{i:e_i \in S_j} y_i < w_j$. Let $\epsilon = \min_{j:e_k \in S_j}(w_j - \sum\limits_{i:e_i \in S_j} y^o_i)$. Then the new solution with $y^b_k = y^o_k + \epsilon$ is still feasible and has a strictly larger objective value. This contradicts the assumption that $y^o$ was optimal. Therefore, the dual solution always produces a feasible set cover.

From the original LP, it can be known that $\sum\limits_{i=1}^{n} y_i$ $\le$ $\sum\limits_{i=1}^{n} (y_i \sum\limits_{j:e_i \in S_j} x_j)$ $=$ $\sum\limits_{i=1}^{n}\sum\limits_{j:e_i \in S_j} y_i x_j$ $=$ $\sum\limits_{j=1}^{m}\sum\limits_{i:e_i \in S_j} y_i x_j$ $=$ $\sum\limits_{j=1}^{m} (x_j \sum\limits_{i: e_i \in S_j} y_i )$ since $\sum\limits_{j:e_i \in S_j} x_j \ge 1$.

From the constrain $\sum\limits_{j=1}^{m} (x_j \sum\limits_{i: e_i \in S_j} y_i)$ $\le$ $\sum\limits_{j=1}^{m} x_j w_j$ is true. As a result, $\sum\limits_{i=1}^{n} y_i \le \sum\limits_{j=1}^{m} x_j w_j$. Since it works with any feasible solution $x_j$ and $y_i$, $\sum\limits_{i=1}^{n} y^o_i$ $\le$ $\sum\limits_{j=1}^{m} x^o_j w_j$ is true for a solution of original LP $x^o_j$ and a solution of dual $y^o_i$. In summary, $\sum\limits_{i=1}^{n} y^o_i$ $\le$ $\sum\limits_{j=1}^{m} x^o_j w_j$ $\le$ $\operatorname{OPT}$.

Now with the fact above, $\sum\limits_{j \in I} w_j$ $=$ $\sum\limits_{j \in I} (\sum\limits_{i:e_i \in S_j} y^o_i)$ from the constraint. $\sum\limits_{j \in I} (\sum\limits_{i:e_i \in S_j} y^o_i)$ $=$ $\sum\limits_{i=1}^{n}(\sum\limits_{e_i \in S_j, j \in I} y^o_i)$ $=$ $\sum\limits_{i=1}^{n}(|j \in I, e_i \in S_j| y^o_i)$ $\le$ $\sum\limits_{i=1}^{n}f y^o_i$ $\le$ $f\sum\limits_{i=1}^{n}y^o_i$ $\le$ $f \operatorname{OPT}$. Now we proved this is the $f$-approximation algorithm.

Primal-dual algorithm

Now, there are some major points in this algorithm.

$\sum\limits_{i \in l} y_i \le \operatorname{OPT}$
We can make a better solution by choose $y^b_k$ $=$ $y^o_k + \epsilon$ $=$ $w_j$ without lossing valid constraint.
We will choose $\sum\limits_{i:e_i \in S_j} y_i = w_j$.

From the above, we can design a better algorithmic approach.

$y \leftarrow 0$
$I \leftarrow \emptyset$
$\operatorname{while}$ there exists $e_i \not\in \bigcup\limits_{j \in I} S_j \operatorname{do}$

Increase the dual variable $y_i$ until there is some $l$ with $e_i \in S_l$ such that $\sum\limits_{j:e_j \in S_l} y_j = w_l$
$I \leftarrow I \cup \{l\}$

From the facts above, we can observe that after increasing $y_i$, property 1 still holds. By facts 2 and 3, we can adjust any $y_j$ while maintaining feasibility. The proof is omitted here as it follows the same reasoning as the dual program analysis. Importantly, this primal-dual approach avoids solving linear programming explicitly.