When implementing arbitrary-precision integers in C++, the key challenge is how fast you can perform multiplication. This post introduces a two-stage hybrid approach combining NTT and Karatsuba.
Table of Contents
Internal Representation
BigInteger stores its absolute value as a little-endian limb array.
Each limb is an int in the range $[0, B)$, with the least significant digits at index 0. The sign is stored as a separate bool flag.
1 | |
Why $B = 10^7$?
When multiplying two limbs as long long, there must be no overflow.
$10^7$ is the largest power of 10 that satisfies this constraint.
Multiplication Strategy — Two-Stage Hybrid
The core of multiplying two numbers is computing the integer convolution of their limb arrays.
\[c[k] = \sum_{i+j=k} a[i] \cdot b[j]\]Propagating carries through the convolution result yields the final product.
NTT-Based Polynomial Convolution
Number Theoretic Transform (NTT) is FFT implemented over modular arithmetic. It computes integer convolution in $O(n \log n)$ without floating-point error.
The two primes used for the transform are NTT-friendly primes of the form $p = k \cdot 2^m + 1$.
| Prime | Value | Form | Max Transform Size |
|---|---|---|---|
| $p_1$ | 998,244,353 | $119 \cdot 2^{23} + 1$ | $2^{23}$ |
| $p_2$ | 985,661,441 | $235 \cdot 2^{22} + 1$ | $2^{22}$ |
NTT proceeds as follows.
\[a, b \xrightarrow{\text{NTT mod }p} \hat{a}, \hat{b} \xrightarrow{\text{pointwise}} \hat{c} \xrightarrow{\text{INTT mod }p} c\]Running this process for each of $p_1$ and $p_2$ gives each result coefficient $c[i]$ expressed under two moduli.
\[c[i] \equiv r_1 \pmod{p_1}, \quad c[i] \equiv r_2 \pmod{p_2}\]Result Recovery via Garner’s CRT
Recovering the actual value from two residues is the Chinese Remainder Theorem (CRT). We use Garner’s algorithm, which is numerically efficient.
Garner’s algorithm uses a mixed-radix representation.
\[c[i] = c_1 + c_2 \cdot p_1\]where
\[c_1 = r_1, \quad c_2 = (r_2 - r_1) \cdot p_1^{-1} \pmod{p_2}\]Since $p_1 \cdot p_2 \approx 9.84 \times 10^{17}$, any coefficient smaller than this product is recovered exactly. The maximum value of each coefficient is $n \cdot (B-1)^2$, so the maximum number of limbs for which NTT can be safely applied is:
\[n \cdot (B-1)^2 < p_1 \cdot p_2 \implies n < \frac{p_1 \cdot p_2}{(10^7)^2} \approx 9860\]This means NTT can be applied directly to numbers with up to approximately 69,000 decimal digits.
Karatsuba Recursive Split
For larger numbers, Karatsuba’s algorithm splits the problem in half.
Split $a$ and $b$ each into two halves.
\[a = a_{\text{hi}} \cdot x^m + a_{\text{lo}}, \quad b = b_{\text{hi}} \cdot x^m + b_{\text{lo}}\]The product becomes
\[a \cdot b = z_2 \cdot x^{2m} + z_1 \cdot x^m + z_0\]Karatsuba’s key insight is computing the middle term $z_1$ with only three multiplications.
\[z_0 = a_{\text{lo}} \cdot b_{\text{lo}}\] \[z_2 = a_{\text{hi}} \cdot b_{\text{hi}}\] \[z_1 = (a_{\text{lo}} + a_{\text{hi}}) \cdot (b_{\text{lo}} + b_{\text{hi}}) - z_0 - z_2\]Compared to a naive divide-and-conquer with 4 recursive calls, 3 calls reduce the complexity to:
\[T(n) = 3T(n/2) \implies T(n) = O(n^{\log_2 3}) = O(n^{1.585})\]Two-stage switching logic:
1 | |
This structure allows multiplication of arbitrarily large numbers, even beyond NTT’s transform size limit ($2^{22}$).
Division — Knuth Algorithm D
Division is implemented using Algorithm D from Knuth’s The Art of Computer Programming Vol. 2. The algorithm proceeds in the following steps.
- Normalization: Multiply both dividend and divisor by the same integer $d$ so that the most significant limb of the divisor is at least $B/2$.
- Quotient estimation: Estimate each digit of the quotient ($\hat{q}$) using the top two limbs.
- Correction: Since $\hat{q}$ can exceed the true quotient by at most 2, verify the estimate and reduce it up to twice if needed.
- Denormalization: Divide the remainder by $d$ to recover the true remainder.
Sign convention follows the C++ standard (truncate toward zero).
| Dividend | Divisor | Quotient | Remainder |
|---|---|---|---|
| 17 | 5 | 3 | 2 |
| -17 | 5 | -3 | -2 |
| 17 | -5 | -3 | 2 |
| -17 | -5 | 3 | -2 |
Together, these three components — the $10^7$-based limb representation, the NTT + Garner CRT convolution engine, and Knuth’s Algorithm D — form a self-contained arbitrary-precision integer library that handles numbers of any size with predictable, well-understood complexity at each layer.