Algorithm Design 13
Chapter 9 Randomized Algorithms
9.4 Approximate Counting
Strategy: Rejection Sampling
Goal: estimate \(|G|\)
Method: take \(U\) s.t. \(G\subseteq U\), where:
- We can take uniform samples from \(U\) efficiently
- There is a membership oracle \(\mathcal O\) for \(G\) : \(\forall x\in U,\mathcal O(x)=\mathbf 1(x\in G)\)
- We can easily compute/estimate \(|U|\)
Theorem: Let \(\alpha=\frac{|G|}{|U|}\). Then \(N\ge c\cdot\frac{1}{\epsilon^2\alpha}\log(\frac{1}{\delta})\) samples (for a suitable constant \(c\)) suffice, s.t. \[ \Pr\left\{\frac{\text{\# samples in }G}{N}\cdot|U|\in (1\pm \epsilon)|G|\right\}\ge 1-\delta \] Proof: Chernoff Bound
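A minimal Python sketch of this estimator, assuming hypothetical helpers `sample_uniform_U()` (uniform sampler over \(U\)) and `in_G(x)` (the membership oracle), plus a known lower bound on \(\alpha\):

```python
import math

def estimate_size_G(sample_uniform_U, in_G, size_U, eps, delta, alpha_lower):
    """Estimate |G| by drawing uniform samples from U and counting hits in G."""
    c = 3.0  # constant from the Chernoff bound (illustrative choice)
    N = math.ceil(c / (eps ** 2 * alpha_lower) * math.log(1.0 / delta))
    hits = sum(1 for _ in range(N) if in_G(sample_uniform_U()))
    return hits / N * size_U  # hits/N estimates alpha = |G|/|U|, so scale by |U|
```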
Counting #DNF solutions
Description
DNF : \(\lor_{i=1}^m C_i\), \(C_i=a_{i,1}\land \cdots\land a_{i,l_i}\) , \(a_{i,j}\in \{x_t,\bar x_t:1\le t\le n\}\)
Goal : Estimate the number of satisfying assignments ( out of \(2^n\) possibilities )
Exact counting is a #P-hard problem
Karp-Luby-Madras Method
If the DNF has a clause of size at most \(3\):
\(U\) : all \(2^n\) assignments, \(G\) : satisfying assignments
\(|G|\ge \frac{|U|}{8}\) (that clause alone is satisfied by at least \(2^{n-3}\) assignments), so \(\alpha\ge \frac{1}{8}\) and \(\mathcal O(\frac{1}{\epsilon^2}\log \frac{1}{\delta})\) samples suffice
For general case:
Consider a \(2^n\times m\) table: \(t_{i,j}=*\) iff assignment \(i\) satisfies clause \(j\), and \(t_{i,j}=\otimes\) if clause \(j\) is moreover the first clause satisfied by assignment \(i\)
\(U\) : All \(*\) , \(G\) : All \(\otimes\)
How to sample from \(U\) :
\(|U|=\sum_{i=1}^m 2^{n-l_i}\) .
- First sample a column \(i\) w.p. \(\frac{2^{n-l_i}}{|U|}\) (naively \(\mathcal O(m)\) per sample; \(\mathcal O(1)\) after preprocessing, e.g. via the alias method)
- Then choose a uniform \(*\) from column \(i\): fix the literals of \(C_i\) and set the remaining \(n-l_i\) variables uniformly at random
Check whether the sampled \(*\) is in \(G\) (i.e. is \(\otimes\)): easy in \(\mathcal O(m)\), check whether any earlier clause is also satisfied
\(\alpha=\frac{|G|}{|U|}\ge \frac{1}{m}\): every row with a \(*\) contains exactly one \(\otimes\) and at most \(m\) \(*\)'s, i.e. each satisfying assignment contributes one element of \(G\) and at most \(m\) elements of \(U\)
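A sketch of the whole estimator under an assumed input encoding (each clause is a dict mapping a variable index to its required truth value; all names here are illustrative):

```python
import math
import random

def klm_dnf_count(clauses, n, eps, delta):
    """Karp-Luby-Madras estimate of the number of satisfying assignments.

    clauses: list of clauses, each a dict {variable index: required truth value}
    n:       number of variables (indices 0 .. n-1)
    """
    m = len(clauses)
    col_sizes = [2 ** (n - len(c)) for c in clauses]  # column i holds 2^(n - l_i) stars
    size_U = sum(col_sizes)

    def sample_star():
        # column i with probability proportional to its size, then a uniform
        # assignment satisfying clause i (fix its literals, free variables uniform)
        i = random.choices(range(m), weights=col_sizes)[0]
        x = [random.random() < 0.5 for _ in range(n)]
        for var, val in clauses[i].items():
            x[var] = val
        return i, x

    def is_circled(i, x):
        # the sampled star lies in G iff no earlier clause is also satisfied
        return not any(all(x[v] == b for v, b in clauses[j].items())
                       for j in range(i))

    # alpha >= 1/m, so O(m / eps^2 * log(1/delta)) samples suffice
    N = math.ceil(3.0 * m / eps ** 2 * math.log(1.0 / delta))
    hits = sum(1 for _ in range(N) if is_circled(*sample_star()))
    return hits / N * size_U
```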
9.5 Streaming Algorithms
Definition
Input arrives in stream fashion, length of stream is \(n\).
Goal: estimate some statistics of the stream.
Setting: space \(\ll n\). Typical sizes: \(\mathcal O(1)\), \(\mathrm{poly}(\log n)\), \(\mathcal O(\sqrt n)\) (at least sublinear)
Usually, we cannot give a precise solution due to space constraint.
\((\epsilon,\delta)\)-solution: \(\Pr\{\mathtt{SOL}\in (1\pm \epsilon)\texttt{ True Value}\}\ge 1-\delta\).
Counting the distinct elements ( \(F_0\)-estimation )
Description
\(n\) elements, each element \(\in [m]\) , \(n,m\) are large
Goal: output the number of distinct elements
Google : HyperLogLog
A simpler (idealized) algorithm with Hash function
an (idealized) hash function \(h:[m]\to [0,1]\): each value \(h(x)\) is uniform, but fixed, so repeated occurrences of an element always hash to the same value
Take representatives: maintain the \(k=\mathcal O(\frac{1}{\epsilon^2})\) smallest \(h\)-values; let \(t\) be the \(k\)-th smallest
Not just the single smallest value: its variance is too large
Estimation: \(\frac{k}{t}\)
Can achieve \((\epsilon,\delta)\)-approximation (Proof: Chernoff Bound)
Real Implementation: \(h:[m]\to [m^2]\), only store smallest \(k\) values
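A sketch of this \(k\)-th-minimum-value scheme; the salted SHA-256 below is only a stand-in for the idealized \(h:[m]\to[0,1]\):

```python
import hashlib

def kmv_distinct(stream, k, salt=b"seed"):
    """F0 estimate: keep the k smallest distinct hash values, output k / t."""

    def h(x):
        # stand-in for the idealized hash h : [m] -> [0,1]
        digest = hashlib.sha256(salt + str(x).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2 ** 64

    kept = set()  # at most k smallest distinct hash values seen so far
    for x in stream:
        v = h(x)
        if v in kept:
            continue                      # duplicate element, nothing to do
        if len(kept) < k:
            kept.add(v)
        elif v < max(kept):
            kept.remove(max(kept))
            kept.add(v)
    if len(kept) < k:
        return len(kept)                  # fewer than k distinct elements: exact
    t = max(kept)                         # the k-th smallest hash value
    return k / t
```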
Chapter 10 Local Search
10.1 Metropolis Algorithm
From statistical mechanics
\(\Pr\{\text{a state with energy }E\}\sim \exp(-\frac{E}{kT})\)
(Figure: the Boltzmann distribution plotted for small \(T\) vs. large \(T\))
Markov Chain
Directed weighted graph \(G\), \(\forall u\in V,\sum_{(u,v)\in E}p(u,v)=1\)
Transition matrix: \(P_{i,j}=p(i,j)=P(j|i)\)
\((P^k)_{i,j}\): probability that start from \(i\) and end at \(j\) with exactly \(k\) steps
Stationary distribution \(\bar x\): \(P^T\bar x=\bar x\)
Finite space: must have stationary distribution
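A quick numpy check of these definitions on a toy 3-state chain (the matrix is an arbitrary illustrative example):

```python
import numpy as np

# row i holds the outgoing probabilities p(i, .), so each row sums to 1
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

# (P^k)_{i,j}: probability of going from i to j in exactly k steps
print(np.linalg.matrix_power(P, 5))

# stationary distribution: repeatedly apply P^T to any starting distribution
x = np.array([1.0, 0.0, 0.0])       # start deterministically at state 0
for _ in range(1000):
    x = P.T @ x
print(x)                             # satisfies P^T x ≈ x
```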
Count # perfect matching in bipartite graph
\(\iff\) Sampling problem \(\iff\) Markov Chain ( matching \(\to\) matching )
Design MC s.t. stationary distribution of MC is uniformly distributed over all states
Starting from one state, run the chain for \(k\) steps and stop; treat the final state as an (approximately) uniform sample, i.e. a draw from the stationary distribution
Under certain condition (Ergodic), a random walk will converge to a unique stationary distribution
\(\forall x_0,\lim\limits_{k\to\infty}(P^T)^k x_0=\bar x\)
Mixing time: the maximum (over all starting distributions \(x_0\)) number of steps needed to get close to \(\bar x\)
Metropolis Algorithm
\(\mathcal C\): set of all states, \(S\): current state, \(S'\): a uniformly chosen neighbor of \(S\).
If \(E(S')<E(S)\), \(S\gets S'\). Otherwise, \(S\gets S'\) with probability \(\exp(-\frac{E(S')-E(S)}{kT})\).
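A minimal sketch of one Metropolis run, assuming user-supplied `energy` and `random_neighbor` functions (hypothetical names) and folding Boltzmann's constant \(k\) into \(T\):

```python
import math
import random

def metropolis(init, energy, random_neighbor, T, steps):
    """Run the Metropolis chain for a fixed number of steps at temperature T."""
    S, E = init, energy(init)
    for _ in range(steps):
        S_new = random_neighbor(S)       # uniformly chosen neighbor of S
        E_new = energy(S_new)
        # always accept a downhill move; accept uphill with prob exp(-dE / T)
        if E_new <= E or random.random() < math.exp(-(E_new - E) / T):
            S, E = S_new, E_new
    return S
```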
Theorem: Let \(Z=\sum_{S\in \mathcal C}\exp(-\frac{E(S)}{kT})\). Then the stationary distribution \(\pi\) of this chain is the Boltzmann distribution: \[ \pi(S)=\frac{1}{Z}\exp\left(-\frac{E(S)}{kT}\right)\quad \forall S\in\mathcal C \] Applications:
- Sampling: Bayesian Inference/ Graphical Models
- Optimization: minimize \(f(x)\): Let \(E(S)=f(S)\) \(\to\) Simulated Annealing
Simulated Annealing
Goal: minimize \(f\)
- High temperature: close to uniform; the Markov chain converges faster for larger \(T\) (the chain has good conductance)
- Low temperature: many local minima, hard to sample (converges more slowly)
Cooling schedule \(T=T(i)\), \(i=1,2,\cdots\):
- \(x_i\gets\) result of running the Metropolis Algorithm at temperature \(T(i)\) with initial state \(x_{i-1}\)
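A sketch of the cooling loop on top of the `metropolis` sketch above; the geometric schedule \(T(i)=T_0\, r^i\) is just one illustrative choice:

```python
def simulated_annealing(init, f, random_neighbor, T0=10.0, r=0.95,
                        rounds=100, steps_per_round=1000):
    """Minimize f by running Metropolis at a gradually decreasing temperature."""
    x = init
    for i in range(rounds):
        T = T0 * r ** i                               # cooling schedule T(i)
        x = metropolis(x, f, random_neighbor, T,      # E(S) = f(S)
                       steps_per_round)
    return x
```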
Mixing Time
Total variation distance: for two distributions \(p,q\), \[ d_{TV}(p,q)=\frac{1}{2}\sum_{i}|p_i-q_i|=\frac{1}{2}\left\|\vec p-\vec q\right\|_1 \] Let \(p_x^k=(P^T)^kx\) and let \(\pi\) be the stationary distribution. Define the mixing time as \[ \tau_x(\epsilon)=\min\{k:d_{TV}(p_x^k,\pi)\le \epsilon\} \] Mixing time \(\sim\) conductance of the chain
- MinCut very small \(\to\) mixing time large
- Good conductance \(\to\) mixing time small
Theorem: Let \(P\) be the transition matrix of the Markov chain and suppose \(P\) is symmetric. Let \(\lambda_1\ge \lambda_2\ge \cdots\ge \lambda_N\) be the eigenvalues of \(P\); it is known that \(\lambda_1=1\) with eigenvector \(v_1=(\frac{1}{\sqrt N},\cdots,\frac{1}{\sqrt N})\), so the stationary distribution is uniform. Let \(\lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\). Then \[ \tau(\epsilon)\le \mathcal O\left(\frac{\log N+\log \frac{1}{\epsilon}}{1-\lambda_{\max}}\right) \] i.e. as \(\lambda_{\max}\to 1\), the mixing time blows up
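A small numpy illustration of the theorem on the lazy random walk over a 4-cycle (a symmetric chain, so its stationary distribution is uniform):

```python
import numpy as np

# lazy random walk on a 4-cycle: stay w.p. 1/2, move to each neighbor w.p. 1/4
P = 0.5 * np.eye(4) + 0.25 * (np.roll(np.eye(4), 1, axis=0)
                              + np.roll(np.eye(4), -1, axis=0))

eigvals = np.sort(np.linalg.eigvalsh(P))[::-1]   # lambda_1 >= ... >= lambda_N
lam_max = max(abs(eigvals[1]), abs(eigvals[-1]))
print("spectral gap 1 - lambda_max =", 1 - lam_max)

# d_TV to the uniform stationary distribution decays geometrically in k
pi = np.ones(4) / 4
x = np.array([1.0, 0.0, 0.0, 0.0])
for k in range(1, 21):
    x = P.T @ x
    d_tv = 0.5 * np.abs(x - pi).sum()
    if k % 5 == 0:
        print(f"k={k:2d}  d_TV={d_tv:.6f}")
```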