Approximation Algorithm (1): Set Cover

An optimization problem is a problem of finding the minimum or maximum possible solution. For example, $\operatorname{SET COVER}$ is a well-known optimization problem. For a universe set $U = \{e_1,e_2,e_3,\cdots,e_n$} and subsets of $U$, $S_j(1 \le j \le n)$. $\operatorname{SET COVER}$ is a problem to find a collection of subsets which minimize the sum of costs of subset $W_j$ and it contains every element in $U$. For example, if $U = \{e_1,e_2,e_3,e_4,e_5$}, $S_1 = \{e_1,e_2$}, $S_2 = \{e_3,e_4$}, $S_3 = \{e_5$}, $S_4 = \{e_2,e_3$}, $S_5 = \{e_1$}, $S_6 = \{e_4$}, $W_1=10$, $W_2=10$, $W_3=1$, $W_4=15$, $W_5=1$, $W_6=1$. Optimal solution would be $W_3 + W_4 + W_5 + W_6 = 1 + 15 + 1 + 1 = 18$. Notice that $S_3 \cup S_4 \cup S_5 \cup S_6 = U$. However, $\operatorname{SET COVER}$ is known to be NP-hard, meaning it requires more than polynomial time unless P $=$ NP.

Now, here the approximation algorithm works.

An $\alpha$-approximation algorithm produces a solution within a factor of $\alpha$ of the optimum. That is, if the optimal solution cost is $\operatorname{OPT}$, the algorithm’s answer $\mathrm{ans}$ satisfies $\mathrm{ans} \le \alpha \cdot \operatorname{OPT}$.

For the easy of explanation, $\operatorname{SET COVER}$ can be converted like below. Minimize the $\sum\limits_{j=1}^{m} w_j x_j$ such that $\sum\limits_{j:e_i \in S_j} x_j \ge 1$, $x_j \in \{0,1$}. Notice that this kinds of problem is known as integer problem.

However, if we do relaxation, we can do approximation over this. New problem is like below.

Minimize the $\sum\limits_{j=1}^{m} w_j x_j$ such that $\sum\limits_{j:e_i \in S_j} x_j \ge 1$, $x_j \ge 0$. Notice that this kinds of problem is known as linear problem. Which is known to be solvable in the polynomial time.

After solving the LP above, we obtain a fractional solution $x_j$. However, to produce a valid integer solution, we need to round the values. To do so, we compute the maximum frequency $f_{e_i}$ of each element $e_i$ across all sets $S_j$. For example, if $U = \{e_1,e_2,e_3,e_4,e_5$}, $S_1 = \{e_1,e_2$}, $S_2 = \{e_3,e_4$}, $S_3 = \{e_5$}, $S_4 = \{e_2,e_3$}, $S_5 = \{e_1$}, $S_6 = \{e_4$}, $W_1=10$, $W_2=10$, $W_3=1$, $W_4=15$, $W_5=1$, $W_6=1$. The frequencies are $f_{e_1} = 2$, $f_{e_2} = 2$, $f_{e_3} = 2$, $f_{e_4} = 2$, $f_{e_5} = 1$. So the maximum frequency is $f = 2$. We then round up: set $x_j = 1$ if $x_j \ge \frac{1}{f}$, and $x_j = 0$ otherwise. Algorithm above can be done in polynomial time.

Now, we need to prove that this is a proper $f$-approximation algorithm for original problem.

Since we set $x_j = 1$ for all $x_j \ge \frac{1}{f}$, and the maximum frequency of any element is $f$, at least one such $x_j$ must be selected. Otherwise, $\sum\limits_{j:e_i \in S_j} x_j \ge 1$ cannot hold. Therefore, $\operatorname{ans}$ is a valid set cover.
Let $\operatorname{round}$ denote the LP solution before rounding. Since $\operatorname{ans}$ selects only those $x_j \ge \frac{1}{f}$, we have $\operatorname{ans} \le f \cdot \operatorname{round}$. Note that $\operatorname{round}$ has no $x_j > 1$. Furthermore, $\operatorname{round} \le \operatorname{OPT}$ because it is a relaxation of the integer program. Therefore, $\operatorname{ans} \le f \cdot \operatorname{round} \le f \cdot \operatorname{OPT}$. This completes the proof.

We have constructed an efficient $f$-approximation algorithm for $\operatorname{SET COVER}$.