Algorithm Design 13

Chapter 9 Randomized Algorithms

9.4 Approximate Counting

  1. Strategy : Rejection Sampling

    Goal : estimate \(|G|\)

    Method : take \(U\) s.t. \(G\subseteq U\)

    • We can take uniform samples from \(U\) efficiently
    • There is a membership oracle \(\mathcal O\) for \(G\) : \(\forall x\in U,\mathcal O(x)=\mathbf 1(x\in G)\)
    • We can easily compute/estimate \(|U|\)

    Theorem : Let \(\alpha=\frac{|G|}{|U|}\). Then \(N\ge c\cdot\frac{1}{\epsilon^2\alpha}\log\frac{1}{\delta}\) samples ( for a suitable constant \(c\) ) suffice, s.t. \[ \Pr\left\{\frac{\text{\# samples in }G}{N}\cdot|U|\in (1\pm \epsilon)|G|\right\}\ge 1-\delta \] Proof : Chernoff Bound
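
    A minimal sketch of this strategy, assuming we are given a uniform sampler for \(U\), the membership oracle, and \(|U|\); the constant `c = 4.0` and all function names are illustrative, not fixed by the theorem:

    ```python
    import math
    import random

    def estimate_count(sample_u, oracle, size_u, eps, delta, alpha_lower, c=4.0):
        """Estimate |G| by sampling from U and counting oracle hits.
        N = c/(eps^2 * alpha) * log(1/delta) follows the theorem above;
        c = 4.0 is an unspecified constant, chosen here arbitrarily."""
        n = math.ceil(c / (eps ** 2 * alpha_lower) * math.log(1.0 / delta))
        hits = sum(oracle(sample_u()) for _ in range(n))
        return size_u * hits / n  # |U| * (fraction of samples landing in G)

    # Toy usage: U = {0, ..., 999}, G = multiples of 3, so |G| = 334.
    est = estimate_count(
        sample_u=lambda: random.randrange(1000),
        oracle=lambda x: x % 3 == 0,
        size_u=1000,
        eps=0.1, delta=0.05,
        alpha_lower=1 / 3,  # a known lower bound on alpha = |G|/|U|
    )
    print(est)  # within (1 +/- 0.1) * 334 with probability >= 0.95
    ```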

  2. Counting #DNF solutions

    1. Description

      DNF : \(\lor_{i=1}^m C_i\), \(C_i=a_{i,1}\land \cdots\land a_{i,l_i}\) , \(a_{i,j}\in \{x_t,\bar x_t:1\le t\le n\}\)

      Goal : Estimate the number of satisfying assignments ( out of \(2^n\) possibilities )

      Computing the exact count is a #P-hard problem

    2. Karp-Luby-Madras Method

      If the DNF has a clause of size at most \(3\):

      \(U\) : all \(2^n\) assignments, \(G\) : satisfying assignments

      That clause alone is satisfied by \(2^{n-3}\) assignments, so \(|G|\ge \frac{|U|}{8}\) and \(\alpha\ge \frac{1}{8}\); we only need \(\mathcal O(\frac{1}{\epsilon^2}\log \frac{1}{\delta})\) samples

      For general case:

      Consider a \(2^n\times m\) table : \(t_{i,j}=*\) iff assignment \(i\) satisfies clause \(j\), and \(t_{i,j}=\otimes\) if clause \(j\) is the first clause satisfied by assignment \(i\)

      \(U\) : All \(*\) , \(G\) : All \(\otimes\)

      1. How to sample from \(U\) :

        \(|U|=\sum_{i=1}^m 2^{n-l_i}\) .

        • First sample a column \(i\) w.p. \(\frac{2^{n-l_i}}{|U|}\) ( naively \(\mathcal O(m)\) per sample; \(\mathcal O(1)\) after preprocessing, e.g. via the alias method )
        • Then choose a \(*\) from column \(i\) uniformly : fix the literals of \(C_i\) and set the remaining \(n-l_i\) variables uniformly at random
      2. Check whether \(*\in G\) : easy in \(\mathcal O(m)\) time, by checking that no earlier clause is satisfied

      3. \(\alpha=\frac{|G|}{|U|}\ge \frac{1}{m}\) : each satisfying assignment contributes exactly one \(\otimes\) and at most \(m\) \(*\)'s, so \(|U|\le m|G|\) ( a sketch of the full estimator follows )
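
      A runnable sketch of the whole estimator. The clause representation ( a dict mapping variable index to its required Boolean value ) and all names are my own choices:

      ```python
      import random

      def klm_estimate(clauses, n, n_samples):
          """Karp-Luby-Madras estimator for #DNF.  Each clause is a dict
          {variable: required_bool}; n is the number of variables."""
          # |U| = sum_i 2^(n - l_i): the number of *'s in each column.
          weights = [2 ** (n - len(c)) for c in clauses]
          size_u = sum(weights)
          hits = 0
          for _ in range(n_samples):
              # Step 1: pick column i w.p. 2^(n - l_i) / |U|, then a uniform
              # * in that column (fix C_i's literals, set the remaining
              # n - l_i variables uniformly at random).
              i = random.choices(range(len(clauses)), weights=weights)[0]
              a = {v: random.random() < 0.5 for v in range(n)}
              a.update(clauses[i])
              # Step 2: O(m) membership check; the sample counts as a
              # "circled *" iff no earlier clause is satisfied.
              if not any(all(a[v] == b for v, b in clauses[j].items())
                         for j in range(i)):
                  hits += 1
          return size_u * hits / n_samples  # |U| * (fraction in G)

      # Toy usage: (x0 AND x1) OR (x1 AND x2); true count is 3 out of 2^3.
      clauses = [{0: True, 1: True}, {1: True, 2: True}]
      print(klm_estimate(clauses, n=3, n_samples=50000))  # close to 3
      ```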

9.5 Streaming Algorithms

  1. Definition

    Input arrives in stream fashion, length of stream is \(n\).

    Goal: estimate some statistics of the stream.

    Setting: space \(\ll n\). Typical sizes : \(\mathcal O(1)\), \(\mathrm{poly}(\log n)\), \(\mathcal O(\sqrt n)\) ( at least sublinear )

    Usually we cannot compute an exact answer due to the space constraint.

    \((\epsilon,\delta)\)-solution: \(\Pr\{\mathtt{SOL}\in (1\pm \epsilon)\texttt{ True Value}\}\ge 1-\delta\).

  2. Counting the distinct elements ( \(F_0\)-estimation )

    1. Description

      \(n\) elements, each element \(\in [m]\) , \(n,m\) are large

      Goal: output the number of distinct elements

    2. Google : HyperLogLog

    3. A simpler (idealized) algorithm with Hash function

      1. An idealized deterministic hash function \(h:[m]\to [0,1]\) : each \(h(x)\) behaves like an independent uniform value, and duplicates hash to the same value

      2. Take a representative : maintain the \(k\)-th smallest \(h\) value, where \(k=\mathcal O(\frac{1}{\epsilon^2})\) ( call this value \(t\) )

        We do not take just the smallest value : its variance is too large

      3. Estimation: \(\frac{k}{t}\) ( the \(k\)-th smallest of \(d\) uniform values concentrates around \(\frac{k}{d}\), so \(\frac{k}{t}\approx d\) )

    4. Can achieve \((\epsilon,\delta)\)-approximation (Proof: Chernoff Bound)

    5. Real implementation: \(h:[m]\to [m^2]\), store only the smallest \(k\) hash values
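
    A sketch of the idealized algorithm; SHA-256 scaled into \([0,1)\) stands in for the idealized hash \(h\) ( this choice, and all names, are illustrative ):

    ```python
    import hashlib
    import heapq
    import random

    def h(x):
        """Deterministic hash into [0, 1): duplicates get the same value."""
        d = hashlib.sha256(str(x).encode()).digest()
        return int.from_bytes(d[:8], "big") / 2 ** 64

    def f0_estimate(stream, k):
        """Estimate the number of distinct elements by tracking only the
        k smallest distinct hash values (a max-heap of size k)."""
        heap, members = [], set()  # heap stores negated hash values
        for x in stream:
            v = h(x)
            if v in members:
                continue  # already among our k smallest; skip duplicates
            if len(heap) < k:
                heapq.heappush(heap, -v)
                members.add(v)
            elif v < -heap[0]:  # smaller than the current k-th smallest
                members.discard(-heapq.heappushpop(heap, -v))
                members.add(v)
        if len(heap) < k:
            return len(heap)  # fewer than k distinct elements: exact answer
        t = -heap[0]          # t = k-th smallest hash value
        return k / t          # the estimate (note k/t, not t/k)

    # Toy usage: a 100k-element stream over 5000 distinct values.
    stream = (random.randrange(5000) for _ in range(100_000))
    print(f0_estimate(stream, k=400))  # roughly 5000
    ```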

10.1 Metropolis Algorithm

  1. From statistical mechanics

    \(\Pr\{\text{a state with energy }E\}\sim \exp(-\frac{E}{kT})\)

    ( Figure : the distribution for small \(T\) (purple) is concentrated on low-energy states; for large \(T\) (red) it is closer to uniform )

  2. Markov Chain

    Directed weighted graph \(G\), \(\forall u\in V,\sum_{(u,v)\in E}p(u,v)=1\)

    Transition matrix: \(P_{i,j}=p(i,j)=P(j|i)\)

    \((P^k)_{i,j}\): the probability of starting from \(i\) and ending at \(j\) after exactly \(k\) steps

    Stationary distribution \(\bar x\): \(P^T\bar x=\bar x\)

    Finite state space: a stationary distribution always exists
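
    A small numerical check of these definitions on a toy 3-state chain ( the transition probabilities are arbitrary ):

    ```python
    import numpy as np

    # Transition matrix P[i, j] = p(i, j); each row sums to 1.
    P = np.array([[0.50, 0.50, 0.00],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.50, 0.50]])

    print(np.linalg.matrix_power(P, 3)[0, 2])  # Pr[0 -> 2 in exactly 3 steps]

    # Power iteration: repeatedly applying P^T drives any starting vector
    # toward the stationary distribution (this chain is ergodic).
    x = np.array([1.0, 0.0, 0.0])
    for _ in range(200):
        x = P.T @ x
    print(x)            # approx [0.25, 0.5, 0.25]
    print(P.T @ x - x)  # approx 0, i.e. P^T x = x
    ```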

  3. Count # perfect matching in bipartite graph

    \(\iff\) Sampling problem ( draw a uniformly random perfect matching ) \(\iff\) Markov Chain whose states are matchings ( matching \(\to\) matching )

    Design MC s.t. stationary distribution of MC is uniformly distributed over all states

    Starting from an arbitrary state, run for \(k\) steps and stop; treat the final state as an ( approximately ) uniform sample from the stationary distribution

    Under certain conditions ( ergodicity : irreducible and aperiodic ), a random walk converges to a unique stationary distribution

    \(\forall x_0,\lim\limits_{k\to\infty}(P^T)^k x_0=\bar x\)

    Mixing time: the maximum ( over all \(x_0\) ) number of steps needed to get close to \(\bar x\)

  4. Metropolis Algorithm

    \(\mathcal C\): set of all states, \(S\): current state, \(S'\): a uniformly chosen neighbor of \(S\).

    If \(E(S')<E(S)\), \(S\gets S'\). Otherwise, \(S\gets S'\) with probability \(\exp(-\frac{E(S')-E(S)}{kT})\), else stay at \(S\). ( A sketch follows the applications below. )

    Theorem: Let \(Z=\sum_{S\in \mathcal C}\exp(-\frac{E(S)}{kT})\). Then ( assuming a symmetric proposal, e.g. every state has the same number of neighbors ) the stationary distribution is the Boltzmann distribution: \[ \pi(S)=\frac{1}{Z}\exp\left(-\frac{E(S)}{kT}\right) \] Applications:

    • Sampling: Bayesian Inference/ Graphical Models
    • Optimization: minimize \(f(x)\): Let \(E(S)=f(S)\) \(\to\) Simulated Annealing
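
    A minimal sketch of the update rule above. The toy state space and names are my own; uniform-neighbor proposals are symmetric here because every state has the same degree:

    ```python
    import math
    import random

    def metropolis(energy, neighbors, s0, kT, steps):
        """Run the Metropolis chain: propose a uniform neighbor, always
        accept downhill moves, accept uphill moves w.p. exp(-dE / kT)."""
        s = s0
        for _ in range(steps):
            s_new = random.choice(neighbors(s))
            d_e = energy(s_new) - energy(s)
            if d_e <= 0 or random.random() < math.exp(-d_e / kT):
                s = s_new
        return s

    # Toy usage: a cycle of 20 states, energy = distance from state 7.
    # At small kT the samples concentrate near the low-energy state 7.
    energy = lambda s: abs(s - 7)
    neighbors = lambda s: [(s - 1) % 20, (s + 1) % 20]
    print([metropolis(energy, neighbors, s0=0, kT=0.5, steps=500)
           for _ in range(8)])
    ```
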
  5. Simulated Annealing

    Goal: minimize \(f\)

    • High temperature: the distribution is close to uniform, and the Markov Chain converges faster for larger \(T\) ( the chain has good conductance )
    • Low temperature: many local minima, hard to sample ( converges more slowly )

    Cooling schedule \(T=T(i)\), \(i=1,2,\cdots\) ( see the sketch after the list below ):

    • \(x_i\gets\) the result of running the Metropolis Algorithm at temperature \(T(i)\) with initial state \(x_{i-1}\)
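
    A self-contained sketch of annealing with such a schedule ( the objective, neighborhood, and schedule below are arbitrary toy choices ):

    ```python
    import math
    import random

    def anneal(f, neighbors, x0, schedule):
        """Simulated annealing: run the Metropolis rule with E = f at each
        temperature in the cooling schedule, chaining the states so that
        the run at T(i) starts where the run at T(i-1) ended."""
        x = x0
        for kT, steps in schedule:  # e.g. [(T(1), steps), (T(2), steps), ...]
            for _ in range(steps):
                y = random.choice(neighbors(x))
                d_e = f(y) - f(x)
                if d_e <= 0 or random.random() < math.exp(-d_e / kT):
                    x = y
        return x

    # Toy usage: a bumpy 1-D function on {0, ..., 99}; its global minimum
    # is near s = 61.
    f = lambda s: (s - 60) ** 2 / 100 + 3 * math.sin(s)
    nbrs = lambda s: [max(s - 1, 0), min(s + 1, 99)]
    schedule = [(10.0, 1000), (3.0, 1000), (1.0, 1000), (0.3, 1000)]
    print(anneal(f, nbrs, x0=0, schedule=schedule))
    ```
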
  6. Mixing Time

    Total variation distance: for two distributions \(p,q\), \[ d_{TV}(p,q)=\sum_{i}|p_i-q_i|=\left\|\vec p-\vec q\right\|_1 \] ( some texts include a factor of \(\frac12\) ). Let \(p_x^k=(P^T)^kx\) and let \(\pi\) be the stationary distribution. Define the mixing time as \[ \tau_x(\epsilon)=\min\{k:d_{TV}(p_x^k,\pi)\le \epsilon\} \] Mixing time \(\sim\) conductance:

    • MinCut very small \(\to\) mixing time large
    • Good conductance \(\to\) mixing time small

    Theorem: Let \(P\) be the transition matrix of the Markov Chain and suppose \(P\) is symmetric. Let \(\lambda_1\ge \lambda_2\ge \cdots\ge \lambda_N\) be the eigenvalues of \(P\); it is known that \(\lambda_1=1\) with eigenvector \(v_1=(\frac{1}{\sqrt N},\cdots,\frac{1}{\sqrt N})\). Let \(\lambda_{\max}=\max\{|\lambda_2|,|\lambda_N|\}\). Then \[ \tau(\epsilon)\le \mathcal O\left(\frac{\log N+\log \frac{1}{\epsilon}}{1-\lambda_{\max}}\right) \] i.e. as \(\lambda_{\max}\to 1\), the mixing time blows up ( see the numerical illustration below )
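
    A numerical illustration of the theorem on a lazy random walk on a 6-cycle, chosen so that \(P\) is symmetric ( the printed bound ignores the hidden constant in \(\mathcal O(\cdot)\) ):

    ```python
    import numpy as np

    # Lazy random walk on a 6-cycle: stay w.p. 1/2, move to each
    # neighbor w.p. 1/4.  P is symmetric and doubly stochastic.
    N = 6
    P = np.zeros((N, N))
    for i in range(N):
        P[i, i] = 0.5
        P[i, (i - 1) % N] = 0.25
        P[i, (i + 1) % N] = 0.25

    eigs = np.sort(np.linalg.eigvalsh(P))[::-1]  # lambda_1 >= ... >= lambda_N
    lam_max = max(abs(eigs[1]), abs(eigs[-1]))
    print(eigs[0])  # 1.0, with the uniform eigenvector
    eps = 0.01
    print((np.log(N) + np.log(1 / eps)) / (1 - lam_max))  # mixing-time bound
    ```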