Greedy Algorithms: Interval Scheduling and Minimizing Maximum Lateness

Interval Scheduling

Problem Definition

The lecture hall scheduling problem: accept as many non-overlapping requests as possible.

Problem 1. Given $n$ intervals $1, \ldots, n$, find the maximum subset of mutually compatible intervals.

  • $s(i)$, $f(i)$: start and finish times of interval $i$ ($s(i) < f(i)$)
  • Intervals $i$ and $j$ are compatible if $f(i) \leq s(j)$ or $f(j) \leq s(i)$

Incorrect Attempt

1
2
3
4
5
6
let R = given intervals
sol ← ∅
while R ≠ ∅ do
    i ← interval in R with fewest overlaps with others
    sol ← sol ∪ {i}
    remove all intervals overlapping i from R

Counterexample: Choosing the interval with the fewest conflicts first can fail to find the optimal solution.


Correct Algorithm: Select Earliest-Finishing Interval

Intuition: Selecting the interval with the earliest finish time leaves the room free “as soon as possible.”

1
2
3
4
5
6
7
let R = given intervals
sol ← ∅
while R ≠ ∅ do
    i ← interval in R with earliest finish time
    sol ← sol ∪ {i}
    remove all intervals overlapping i from R (including i)
return sol

Observation 1. The algorithm returns a set of compatible intervals (each selected interval’s conflicts are immediately removed).


Correctness Proof (Stay-Ahead Argument)

Notation:

  • Algorithm’s solution: $\text{sol} = \lbrace i_1, \ldots, i_k\rbrace$ ($i_\ell$ selected in iteration $\ell$)
  • Optimal solution: $O = \lbrace j_1, \ldots, j_m\rbrace$ ($f(j_1) \leq \cdots \leq f(j_m)$)

Lemma 1. For all $1 \leq r \leq m$: $k \geq r$ and $f(i_r) \leq f(j_r)$.

Proof. By induction on $r$.

  • Base case ($r=1$): The algorithm selects the earliest finish time, so $k \geq 1$ and $f(i_1) \leq f(j)$ for all $j \in I$.

  • Inductive step ($r \geq 2$): By hypothesis, $f(i_{r-1}) \leq f(j_{r-1})$. Since $O$ is compatible, $f(j_{r-1}) \leq s(j_r)$. Therefore:

\[f(i_1) \leq \cdots \leq f(i_{r-1}) \leq f(j_{r-1}) \leq s(j_r)\]

So $j_r$ is not removed before iteration $r$, meaning the $r$-th iteration exists and $f(i_r) \leq f(j_r)$. ∎

Theorem 1. The algorithm is correct.

Proof. By Lemma 1, $k \geq m$. Since the algorithm returns a compatible set, $k \leq m$. Therefore $k = m$. ∎

Theorem 2. Time complexity: $O(n \log n)$ (sort by finish time, then scan once).


Certifying Algorithm

We can output a certificate of optimality alongside the solution.

Observation 2. If $J \subseteq I$ is compatible, then $\text{OPT}(I) \geq \lvert J\rvert$.

Lemma 2. If $I_1, \ldots, I_k$ partition $I$ and every pair of intervals within each $I_i$ overlaps, then $\text{OPT}(I) \leq k$.

Proof. For the optimal solution $O$, $\lvert I_i \cap O\rvert \leq 1$ for each $I_i$. Therefore $\text{OPT}(I) = \lvert O\rvert \leq k$. ∎

1
2
3
4
5
6
7
8
9
10
let R = given intervals
sol ← ∅
k ← 0
while R ≠ ∅ do
    k ← k + 1
    ik ← interval in R with earliest finish time
    sol ← sol ∪ {ik}
    Ck ← all intervals in R overlapping ik (including ik)
    R ← R \ Ck
return sol

This algorithm outputs sol together with $C_1, \ldots, C_{k^}$ as the optimality certificate. In each $C_k$, all intervals contain $f(i_k)$ and therefore mutually overlap. By Lemma 2, $\text{OPT}(I) \leq k^$, and since the algorithm returns $k^*$ compatible intervals, the solution is optimal.


Characteristics of Greedy Algorithms

The defining characteristic of a greedy algorithm: it makes a myopic choice at each step. While combinatorial choices are often coupled, myopic choices under a carefully selected criterion can yield an optimal algorithm for certain problems.


Minimizing Maximum Lateness

Problem Definition

Problem 2. Given $n$ jobs with processing times $p_i$ and deadlines $d_i$, find a schedule that minimizes the maximum lateness $L := \max_{1 \leq i \leq n} \ell(i)$.

Definitions:

  • $s(i)$: start time of job $i$, $f(i) := s(i) + p_i$: completion time
  • $\ell(i) := \max(f(i) - d_i,\ 0)$: lateness of job $i$

Failed Approaches

Shortest job first: $p_1=1, d_1=100$, $p_2=2, d_2=2$ → doing job 1 first gives $L=1$, optimally job 2 first gives $L=0$.

Smallest slack first ($d_i - p_i$): $p_1=1, d_1=2$, $p_2=10, d_2=10$ → counterexample exists.


EDD (Earliest Due Date) Rule

1
2
3
4
5
sort jobs so that d1 ≤ ... ≤ dn
f ← 0
for each job i = 1, ..., n do
    s(i) ← f
    f ← f + pi

Correctness Proof

Definition 1. A schedule has idle time if the machine is empty at some time $t$ but has a job scheduled at some $t’ > t$.

Lemma 3. There exists an optimal schedule with no idle time. (Compressing idle time does not increase maximum lateness.)

Definition 2. An inversion $(i,j)$ occurs if job $i$ is scheduled before $j$ but $d_i > d_j$.

Lemma 4. There exists an optimal schedule with no inversions and no idle time.

Proof sketch. If an inversion $(i,j)$ exists, there is an adjacent inversion pair. Swapping them:

  • Jobs other than $i, j$: completion time unchanged.
  • $j$: $\bar{s}(j) = s(i) < s(j)$ so $\bar{\ell}(j) \leq \ell(j) \leq L$.
  • $i$: $\bar{\ell}(i) = \max\lbrace 0, f(j) - d_i\rbrace \leq \max\lbrace 0, f(j) - d_j\rbrace = \ell(j) \leq L$ (since $d_i > d_j$).

Each swap reduces inversions by 1. After at most $\binom{n}{2}$ swaps, we reach an inversion-free schedule. ∎

Lemma 5. All schedules with no inversions and no idle time have the same maximum lateness.

Proof. In any inversion-free schedule, jobs with the same deadline are scheduled consecutively after all earlier-deadline jobs. The last completion time among jobs with deadline $d$ is independent of their internal order, so the maximum lateness is the same. ∎

Theorem 3. The EDD algorithm is correct. (Follows from Lemmas 3, 4, and 5.)

Time complexity: $O(n \log n)$ (dominated by sorting).