Algorithm Design 13

Chapter 9 Randomized Algorithms

9.4 Approximate Counting

  1. Strategy : Rejection Sampling

    Goal : estimate \(|G|\)

    Method : take \(U\) s.t. \(G\subseteq U\)

    • We can take uniform samples from \(U\) efficiently
    • There is a membership oracle \(\mathcal O\) for \(G\) : \(\forall x\in U,\mathcal O(x)=\mathbf 1(x\in G)\)
    • We can easily compute/estimate \(|U|\)

    Theorem : Let \(\alpha=\frac{|G|}{|U|}\). Then \(N\ge c\cdot\frac{1}{\epsilon^2\alpha}\log\frac{1}{\delta}\) samples ( for a suitable constant \(c\) ) suffice, s.t. \[ \Pr\left\{\frac{\text{\# samples in }G}{N}\cdot|U|\in (1\pm \epsilon)|G|\right\}\ge 1-\delta \] Proof : Chernoff Bound
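
    A minimal sketch of this strategy, assuming we are given a uniform sampler for \(U\), the membership oracle, and \(|U|\); the constant `c = 4.0` and all function names are illustrative, not fixed by the theorem:

    ```python
    import math
    import random

    def estimate_count(sample_u, oracle, size_u, eps, delta, alpha_lower, c=4.0):
        """Estimate |G| by sampling from U and counting oracle hits.
        N = c/(eps^2 * alpha) * log(1/delta) follows the theorem above;
        c = 4.0 is an unspecified constant, chosen here arbitrarily."""
        n = math.ceil(c / (eps ** 2 * alpha_lower) * math.log(1.0 / delta))
        hits = sum(oracle(sample_u()) for _ in range(n))
        return size_u * hits / n  # |U| * (fraction of samples landing in G)

    # Toy usage: U = {0, ..., 999}, G = multiples of 3, so |G| = 334.
    est = estimate_count(
        sample_u=lambda: random.randrange(1000),
        oracle=lambda x: x % 3 == 0,
        size_u=1000,
        eps=0.1, delta=0.05,
        alpha_lower=1 / 3,  # a known lower bound on alpha = |G|/|U|
    )
    print(est)  # within (1 +/- 0.1) * 334 with probability >= 0.95
    ```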

  2. Counting #DNF solutions

    1. Description

      DNF : \(\lor_{i=1}^m C_i\), \(C_i=a_{i,1}\land \cdots\land a_{i,l_i}\) , \(a_{i,j}\in \{x_t,\bar x_t:1\le t\le n\}\)

      Goal : Estimate the number of satisfying assignments ( out of \(2^n\) possibilities )

      Computing the exact count is a #P-hard problem

    2. Karp-Luby-Madras Method

      If the DNF has a clause of size at most \(3\):

      \(U\) : all \(2^n\) assignments, \(G\) : satisfying assignments

      That clause alone is satisfied by \(2^{n-3}\) assignments, so \(|G|\ge \frac{|U|}{8}\) and \(\alpha\ge \frac{1}{8}\); we only need \(\mathcal O(\frac{1}{\epsilon^2}\log \frac{1}{\delta})\) samples

      For general case:

      Consider a \(2^n\times m\) table : \(t_{i,j}=*\) iff assignment \(i\) satisfies clause \(j\), and \(t_{i,j}=\otimes\) if clause \(j\) is the first clause satisfied by assignment \(i\)

      \(U\) : All \(*\) , \(G\) : All \(\otimes\)

      1. How to sample from \(U\) :

        \(|U|=\sum_{i=1}^m 2^{n-l_i}\) .

        • First sample a column \(i\) w.p. \(\frac{2^{n-l_i}}{|U|}\) ( naively \(\mathcal O(m)\) per sample; \(\mathcal O(1)\) after preprocessing, e.g. via the alias method )
        • Then choose a \(*\) from column \(i\) uniformly : fix the literals of \(C_i\) and set the remaining \(n-l_i\) variables uniformly at random
      2. Check whether \(*\in G\) : easy in \(\mathcal O(m)\) time, by checking that no earlier clause is satisfied

      3. \(\alpha=\frac{|G|}{|U|}\ge \frac{1}{m}\) : each satisfying assignment contributes exactly one \(\otimes\) and at most \(m\) \(*\)'s, so \(|U|\le m|G|\) ( a sketch of the full estimator follows )
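
      A runnable sketch of the whole estimator. The clause representation ( a dict mapping variable index to its required Boolean value ) and all names are my own choices:

      ```python
      import random

      def klm_estimate(clauses, n, n_samples):
          """Karp-Luby-Madras estimator for #DNF.  Each clause is a dict
          {variable: required_bool}; n is the number of variables."""
          # |U| = sum_i 2^(n - l_i): the number of *'s in each column.
          weights = [2 ** (n - len(c)) for c in clauses]
          size_u = sum(weights)
          hits = 0
          for _ in range(n_samples):
              # Step 1: pick column i w.p. 2^(n - l_i) / |U|, then a uniform
              # * in that column (fix C_i's literals, set the remaining
              # n - l_i variables uniformly at random).
              i = random.choices(range(len(clauses)), weights=weights)[0]
              a = {v: random.random() < 0.5 for v in range(n)}
              a.update(clauses[i])
              # Step 2: O(m) membership check; the sample counts as a
              # "circled *" iff no earlier clause is satisfied.
              if not any(all(a[v] == b for v, b in clauses[j].items())
                         for j in range(i)):
                  hits += 1
          return size_u * hits / n_samples  # |U| * (fraction in G)

      # Toy usage: (x0 AND x1) OR (x1 AND x2); true count is 3 out of 2^3.
      clauses = [{0: True, 1: True}, {1: True, 2: True}]
      print(klm_estimate(clauses, n=3, n_samples=50000))  # close to 3
      ```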

9.5 Streaming Algorithms

  1. Definition

    Input arrives in stream fashion, length of stream is \(n\).

    Goal: estimate some statistics of the stream.

    Setting: space \(\ll n\). Typical sizes : \(\mathcal O(1)\), \(\mathrm{poly}(\log n)\), \(\mathcal O(\sqrt n)\) ( at least sublinear )

    Usually we cannot compute an exact answer due to the space constraint.

    \((\epsilon,\delta)\)-solution: \(\Pr\{\mathtt{SOL}\in (1\pm \epsilon)\texttt{ True Value}\}\ge 1-\delta\).

  2. Counting the distinct elements ( \(F_0\)-estimation )

    1. Description

      \(n\) elements, each element \(\in [m]\) , \(n,m\) are large

      Goal: output the number of distinct elements

    2. Google : HyperLogLog

    3. A simpler (idealized) algorithm with Hash function

      1. An idealized deterministic hash function \(h:[m]\to [0,1]\) : each \(h(x)\) behaves like an independent uniform value, and duplicates hash to the same value

      2. Take a representative : maintain the \(k\)-th smallest \(h\) value, where \(k=\mathcal O(\frac{1}{\epsilon^2})\) ( call this value \(t\) )

        We do not take just the smallest value : its variance is too large

      3. Estimation: \(\frac{k}{t}\) ( the \(k\)-th smallest of \(d\) uniform values concentrates around \(\frac{k}{d}\), so \(\frac{k}{t}\approx d\) )

    4. Can achieve \((\epsilon,\delta)\)-approximation (Proof: Chernoff Bound)

    5. Real implementation: \(h:[m]\to [m^2]\), store only the smallest \(k\) hash values
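
    A sketch of the idealized algorithm; SHA-256 scaled into \([0,1)\) stands in for the idealized hash \(h\) ( this choice, and all names, are illustrative ):

    ```python
    import hashlib
    import heapq
    import random

    def h(x):
        """Deterministic hash into [0, 1): duplicates get the same value."""
        d = hashlib.sha256(str(x).encode()).digest()
        return int.from_bytes(d[:8], "big") / 2 ** 64

    def f0_estimate(stream, k):
        """Estimate the number of distinct elements by tracking only the
        k smallest distinct hash values (a max-heap of size k)."""
        heap, members = [], set()  # heap stores negated hash values
        for x in stream:
            v = h(x)
            if v in members:
                continue  # already among our k smallest; skip duplicates
            if len(heap) < k:
                heapq.heappush(heap, -v)
                members.add(v)
            elif v < -heap[0]:  # smaller than the current k-th smallest
                members.discard(-heapq.heappushpop(heap, -v))
                members.add(v)
        if len(heap) < k:
            return len(heap)  # fewer than k distinct elements: exact answer
        t = -heap[0]          # t = k-th smallest hash value
        return k / t          # the estimate (note k/t, not t/k)

    # Toy usage: a 100k-element stream over 5000 distinct values.
    stream = (random.randrange(5000) for _ in range(100_000))
    print(f0_estimate(stream, k=400))  # roughly 5000
    ```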

10.1 Metropolis Algorithm

  1. From statistical mechanics

    \(\Pr\{\text{a state with energy }E\}\sim \exp(-\frac{E}{kT})\)

    ( Figure : the distribution for small \(T\) (purple) is concentrated on low-energy states; for large \(T\) (red) it is closer to uniform )

  2. Markov Chain

    Directed weighted graph \(G\), \(\forall u\in V,\sum_{(u,v)\in E}p(u,v)=1\)

    Transition matrix: \(P_{i,j}=p(i,j)=P(j|i)\)

    \((P^k)_{i,j}\): the probability of starting from \(i\) and ending at \(j\) after exactly \(k\) steps

    Stationary distribution \(\bar x\): \(P^T\bar x=\bar x\)

    Finite state space: a stationary distribution always exists
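
    A small numerical check of these definitions on a toy 3-state chain ( the transition probabilities are arbitrary ):

    ```python
    import numpy as np

    # Transition matrix P[i, j] = p(i, j); each row sums to 1.
    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])

    print(np.linalg.matrix_power(P, 3)[0, 2])  # Pr[0 -> 2 in exactly 3 steps]

    # Power iteration: repeatedly applying P^T drives any starting vector
    # toward the stationary distribution (this chain is ergodic).
    x = np.array([1.0, 0.0, 0.0])
    for _ in range(200):
        x = P.T @ x
    print(x)            # approx [0.25, 0.5, 0.25]
    print(P.T @ x - x)  # approx 0, i.e. P^T x = x
    ```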

  3. Count # perfect matching in bipartite graph

    \(\iff\) Sampling problem ( draw a uniformly random perfect matching ) \(\iff\) Markov Chain whose states are matchings ( matching \(\to\) matching )

    Design MC s.t. stationary distribution of MC is uniformly distributed over all states

    Starting from an arbitrary state, run for \(k\) steps and stop; treat the final state as an ( approximately ) uniform sample from the stationary distribution

    Under certain conditions ( ergodicity : irreducible and aperiodic ), a random walk converges to a unique stationary distribution

    \(\forall x_0,\lim\limits_{k\to\infty}(P^T)^k x_0=\bar x\)

    Mixing time: the maximum ( over all \(x_0\) ) number of steps needed to get close to \(\bar x\)

  4. Metropolis Algorithm

    \(\mathcal C\): set of all states, \(S\): current state, \(S'\): a uniformly chosen neighbor of \(S\).

    If \(E(S')<E(S)\), \(S\gets S'\). Otherwise, \(S\gets S'\) with probability \(\exp(-\frac{E(S')-E(S)}{kT})\), else stay at \(S\). ( A sketch follows the applications below. )

    Theorem: Let \(Z=\sum_{S\in \mathcal C}\exp(-\frac{E(S)}{kT})\). Then ( assuming a symmetric proposal, e.g. every state has the same number of neighbors ) the stationary distribution is the Boltzmann distribution: \[ \pi(S)=\frac{1}{Z}\exp\left(-\frac{E(S)}{kT}\right) \] Applications:

    • Sampling: Bayesian Inference/ Graphical Models
    • Optimization: minimize \(f(x)\): Let \(E(S)=f(S)\) \(\to\) Simulated Annealing
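
    A minimal sketch of the update rule above. The toy state space and names are my own; uniform-neighbor proposals are symmetric here because every state has the same degree:

    ```python
    import math
    import random

    def metropolis(energy, neighbors, s0, kT, steps):
        """Run the Metropolis chain: propose a uniform neighbor, always
        accept downhill moves, accept uphill moves w.p. exp(-dE / kT)."""
        s = s0
        for _ in range(steps):
            s_new = random.choice(neighbors(s))
            d_e = energy(s_new) - energy(s)
            if d_e <= 0 or random.random() < math.exp(-d_e / kT):
                s = s_new
        return s

    # Toy usage: a cycle of 20 states, energy = distance from state 7.
    # At small kT the samples concentrate near the low-energy state 7.
    energy = lambda s: abs(s - 7)
    neighbors = lambda s: [(s - 1) % 20, (s + 1) % 20]
    print([metropolis(energy, neighbors, s0=0, kT=0.5, steps=500)
           for _ in range(8)])
    ```
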
  5. Simulated Annealing

    Goal: minimize \(f\)

    • High temperature: the distribution is close to uniform, and the Markov Chain converges faster for larger \(T\) ( the chain has good conductance )
    • Low temperature: many local minima, hard to sample ( converges more slowly )

    Cooling schedule \(T=T(i)\), \(i=1,2,\cdots\) ( see the sketch after the list below ):

    • \(x_i\gets\) the result of running the Metropolis Algorithm at temperature \(T(i)\) with initial state \(x_{i-1}\)
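
    A self-contained sketch of annealing with such a schedule ( the objective, neighborhood, and schedule below are arbitrary toy choices ):

    ```python
    import math
    import random

    def anneal(f, neighbors, x0, schedule):
        """Simulated annealing: run the Metropolis rule with E = f at each
        temperature in the cooling schedule, chaining the states so that
        the run at T(i) starts where the run at T(i-1) ended."""
        x = x0
        for kT, steps in schedule:  # e.g. [(T(1), steps), (T(2), steps), ...]
            for _ in range(steps):
                y = random.choice(neighbors(x))
                d_e = f(y) - f(x)
                if d_e <= 0 or random.random() < math.exp(-d_e / kT):
                    x = y
        return x

    # Toy usage: a bumpy 1-D function on {0, ..., 99}; its global minimum
    # is near s = 61.
    f = lambda s: (s - 60) ** 2 / 100 + 3 * math.sin(s)
    nbrs = lambda s: [max(s - 1, 0), min(s + 1, 99)]
    schedule = [(10.0, 1000), (3.0, 1000), (1.0, 1000), (0.3, 1000)]
    print(anneal(f, nbrs, x0=0, schedule=schedule))
    ```
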
  6. Mixing Time

    Total variation distance: for two distributions \(p,q\), \[ d_{TV}(p,q)=\sum_{i}|p_i-q_i|=\left\|\vec p-\vec q\right\|_1 \] ( some texts include a factor of \(\frac12\) ). Let \(p_x^k=(P^T)^kx\) and let \(\pi\) be the stationary distribution. Define the mixing time as \[ \tau_x(\epsilon)=\min\{k:d_{TV}(p_x^k,\pi)\le \epsilon\} \] Mixing time \(\sim\) conductance:

    • MinCut very small \(\to\) mixing time large
    • Good conductance \(\to\) mixing time small

    Theorem: Let \(P\) be the transition matrix of the Markov Chain and suppose \(P\) is symmetric. Let \(\lambda_1\ge \lambda_2\ge \cdots\ge \lambda_N\) be the eigenvalues of \(P\); it is known that \(\lambda_1=1\) with eigenvector \(v_1=(\frac{1}{\sqrt N},\cdots,\frac{1}{\sqrt N})\). Let \(\lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\). Then \[ \tau(\epsilon)\le \mathcal O\left(\frac{\log N+\log \frac{1}{\epsilon}}{1-\lambda_{\max}}\right) \] i.e. as \(\lambda_{\max}\to 1\), the mixing time blows up ( see the numerical illustration below )
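
    A numerical illustration of the theorem on a lazy random walk on a 6-cycle, chosen so that \(P\) is symmetric ( the printed bound ignores the hidden constant in \(\mathcal O(\cdot)\) ):

    ```python
    import numpy as np

    # Lazy random walk on a 6-cycle: stay w.p. 1/2, move to each
    # neighbor w.p. 1/4.  P is symmetric and doubly stochastic.
    N = 6
    P = np.zeros((N, N))
    for i in range(N):
        P[i, i] = 0.5
        P[i, (i - 1) % N] = 0.25
        P[i, (i + 1) % N] = 0.25

    eigs = np.sort(np.linalg.eigvalsh(P))[::-1]  # lambda_1 >= ... >= lambda_N
    lam_max = max(abs(eigs[1]), abs(eigs[-1]))
    print(eigs[0])  # 1.0, with the uniform eigenvector
    eps = 0.01
    print((np.log(N) + np.log(1 / eps)) / (1 - lam_max))  # mixing-time bound
    ```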