Algorithm Design 12

Chapter 9 Randomized Algorithms

9.3 Hashing

  1. Universal Hashing

    Intuition : we want the hash function to look "random"

    1. Def [ universal hashing function ] A family \(\mathcal H\) of functions \(h:U\to\{0,\cdots,n-1\}\) is universal if \(\forall u,v\in U,u\neq v\) , \[ \Pr\{h\gets \mathcal H:h(u)=h(v)\}\le \frac{1}{n} \]

    2. Design : Let \(p\approx n\) be a prime number ( \(n\) is the size of the hash table ; for simplicity assume \(p=n\) )

      For all \(x\in U\) , write \(x\) in base-\(p\) : \(x=\sum_{i=0}^{r-1} x_ip^i\) , where \(x_i\in \{0,\cdots,p-1\}\) . \[ \mathcal H:=\left\{h_{\vec a}(x)=\left(\sum_{i=0}^{r-1}a_ix_i\right)\bmod p:\vec a=(a_0,\cdots,a_{r-1}),a_i\in \{0,\cdots,p-1\}\right\} \] Goal : prove \(\forall x,y\in U,x\neq y\) , \[ \Pr\{a_i\gets \{0,\cdots,p-1\}:h_{\vec a}(x)=h_{\vec a}(y)\}\le \frac{1}{n} \] Proof :

      Suppose \(x=\sum_{i=0}^{r-1}x_ip^i\) and \(y=\sum_{i=0}^{r-1}y_ip^i\) , so there exists \(j\) such that \(x_j\neq y_j\) . \[ \begin{aligned} &\quad \Pr\left\{h_{\vec a}(x)=h_{\vec a}(y)\right\}\\ &=\Pr\left\{\sum_{i=0}^{r-1}a_ix_i\equiv \sum_{i=0}^{r-1}a_iy_i\pmod p\right\}\\ &=\Pr\left\{a_j(x_j-y_j)\equiv \sum_{i=0,i\neq j}^{r-1}a_i(y_i-x_i)\pmod p\right\}\\ &=\frac{1}{p} \end{aligned} \] The last step holds because \(a_j\gets \{0,\cdots,p-1\}\) uniformly , and since \(p\) is prime and \(x_j\neq y_j\) , the difference \(x_j-y_j\) is invertible mod \(p\) ; hence for each value of \(\sum_{i=0,i\neq j}^{r-1}a_i(y_i-x_i)\) there is exactly one \(a_j\) satisfying \(a_j(x_j-y_j)\equiv \sum_{i=0,i\neq j}^{r-1}a_i(y_i-x_i)\pmod p\) . \(\square\)
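
A minimal Python sketch of this family, assuming \(p\) is prime (the function name `make_universal_hash` and the digit-extraction loop are illustrative, not from the lecture):

```python
import random

def make_universal_hash(p, r):
    """Sample h_a(x) = (sum_i a_i * x_i) mod p from the universal family,
    where x_0, ..., x_{r-1} are the base-p digits of x and each a_i is
    drawn uniformly from {0, ..., p-1}.  Assumes p is prime."""
    a = [random.randrange(p) for _ in range(r)]
    def h(x):
        total = 0
        for i in range(r):
            total += a[i] * (x % p)  # x % p is the i-th base-p digit
            x //= p
        return total % p
    return h
```

Sampling many independent functions and counting collisions on one fixed pair of keys empirically confirms the \(\le\frac1p\) collision bound.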

  2. Perfect Hashing

    1. Def : static dictionary with no collisions : store \(n\) keys from \(U\) , space \(\mathcal O(n)\) , query time \(\mathcal O(1)\) ( with an \(h(x)\) oracle : evaluating \(h(x)\) costs \(\mathcal O(1)\) time )

      static : only lookups are supported , no insertions/deletions .

    2. FKS Perfect Hashing [1984]

      Intuition : \(2\)-level data structure

      • First , sample \(h\gets\mathcal H\) to get a first-level hash table , space \(\mathcal O(n)\)
      • Second , if index \(i\) has collisions ( let \(B_i=\{u\in U:h(u)=i\}\) be its colliding set ) , sample \(h_i\gets \mathcal H'\) to get a second-level hash table , space \(\mathcal O(|B_i|^2)\) . If a collision still occurs in the second level , resample \(h_i\) until there is none .
    3. Collision Analysis

      Claim : sample \(h\gets \mathcal H\) from a universal family , \(h:[n]\to [m]\) . If \(m=\mathcal O(n^2)\) ( e.g. \(m=5n^2\) ) , then with probability \(\ge 0.9\) there is no collision .

      Proof : \(U=\{x_1,\cdots,x_n\}\) , define random variables \(X_{i,j}=\begin{cases}1&h(x_i)=h(x_j)\\0&\text{otherwise}\end{cases}\) and \(X=\sum_{1\le i<j\le n} X_{i,j}\) . \[ E[X]=\sum_{i<j}E[X_{i,j}]=\sum_{i<j}\Pr\{h(x_i)=h(x_j)\}\le \frac{\binom{n}{2}}{m} \] By Markov's Inequality , \[ \Pr\{X\ge 1\}\le E[X]\le \frac{n(n-1)}{2m}\le \frac{n^2}{2m} \] Therefore , taking \(m=5n^2\) gives \(\Pr\{X\ge 1\}\le \frac{1}{10}\) . \(\square\)

    4. Space Analysis

      Claim : \(E[\sum |B_i|^2]=\mathcal O(n)\)

      Proof : Let \(Y_i=|B_i|\) and \(Y=\sum Y_i^2\) . The number of colliding pairs in the first level is \[ X=\sum \binom{Y_i}{2}=\frac{1}{2}\left(\sum Y_i^2-\sum Y_i\right)=\frac{Y-n}{2} \] As in the collision analysis ( now with table size \(n\) ) , \(E[X]\le \frac{\binom{n}{2}}{n}=\frac{n-1}{2}\) , so \(E[Y]=E[2X+n]\le 2n-1=\mathcal O(n)\) . \(\square\)
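
The two-level construction can be sketched as follows. As a stand-in for the lecture's family \(\mathcal H\) , this sketch uses the Carter-Wegman family \(h(x)=((ax+b)\bmod P)\bmod m\) with a large fixed prime \(P\) ; names like `build_fks` are illustrative:

```python
import random

P = (1 << 61) - 1  # a large prime, assumed larger than any key we hash

def sample_hash(m):
    """Sample h(x) = ((a*x + b) mod P) mod m from a universal family."""
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def build_fks(keys):
    """Two-level FKS table: a first level of size n, then for each bucket
    B_i a collision-free second level of size |B_i|^2, resampling its
    hash function until no collision occurs."""
    n = len(keys)
    h1 = sample_hash(n)
    buckets = [[] for _ in range(n)]
    for x in keys:
        buckets[h1(x)].append(x)
    tables = []
    for B in buckets:
        m = max(len(B) ** 2, 1)
        while True:  # resample until the second level is collision-free
            h2 = sample_hash(m)
            slots = [None] * m
            if all(slots[h2(x)] is None and not slots.__setitem__(h2(x), x)
                   for x in B):
                tables.append((h2, slots))
                break
    return h1, tables

def lookup(table, x):
    """O(1) query: one probe per level."""
    h1, tables = table
    h2, slots = tables[h1(x)]
    return slots[h2(x)] == x
```

The `while True` resampling loop terminates quickly in expectation: by the collision analysis above, each trial succeeds with constant probability.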

  3. Balls-Bins Game

    1. \(n\) bins , \(n\) balls , \(E[\text{max load}]=\Theta(\frac{\log n}{\log\log n})\)

      \(X_i=\#\text{ balls in bin }i\) , \(Y_{i,j}=\begin{cases}1&\text{ball }j\to \text{bin }i\\0&\text{otherwise}\end{cases}\) , so \(X_i=\sum_j Y_{i,j}\)

      Take \(c=\Theta(\frac{\log n}{\log\log n})\) and write \(c=1+\delta\) . Since \(E[X_i]=1\) , the Chernoff bound gives \[ \Pr\{X_i\ge c\}\le \frac{e^{c-1}}{c^c}\le \frac{1}{n^2} \] and a union bound over the \(n\) bins yields the upper bound .
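
A quick simulation illustrates the bound (the helper `max_load` is illustrative, not from the lecture):

```python
import random
from collections import Counter

def max_load(n, balls=None):
    """Throw `balls` balls (default: n) into n bins uniformly at random
    and return the maximum load over all bins."""
    balls = n if balls is None else balls
    counts = Counter(random.randrange(n) for _ in range(balls))
    return max(counts.values())
```

For \(n=10^5\) the observed maximum load is typically a single-digit number, consistent with \(\Theta(\frac{\log n}{\log\log n})\) ; the same helper with `balls = int(n * math.log(n))` illustrates the coupon-collector regime below.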

    2. Coupon Collector Problem

      \(n\ln n\) balls , \(n\) bins , w.h.p. every bin is non-empty

    3. Load Balancing Problem

      \(kn\log n\) balls , \(n\) bins : w.h.p. the load of every bin lies in \(((k-1)\log n,(k+1)\log n)\) ( for \(k\) not too small )

      Application : load balancing in network

    4. Power of two choices

      \(n\) bins , \(n\) balls

      For every ball , choose \(2\) random bins , and place the ball in the bin with smaller load .

      \(E[\text{max load}]=\Theta(\log \log n)\)
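
A simulation of the two-choice process shows the dramatic drop in maximum load (function name illustrative):

```python
import random

def max_load_d_choices(n, d):
    """Throw n balls into n bins; each ball samples d bins uniformly
    (with replacement) and goes to the currently least loaded of them.
    Returns the maximum load."""
    load = [0] * n
    for _ in range(n):
        i = min((random.randrange(n) for _ in range(d)), key=lambda j: load[j])
        load[i] += 1
    return max(load)
```

With \(n=10^5\) , `d=1` reproduces the \(\Theta(\frac{\log n}{\log\log n})\) behaviour while `d=2` stays near \(\log\log n\) .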

  4. (*) Cuckoo Hashing

    1. Goal : worst-case \(\mathcal O(1)\) lookup for a dynamic dictionary

    2. Maintain \(2\) hash tables , \(h_1,h_2\gets \mathcal H\)

      Insert(x)

      1. If slot \(h_1(x)\) in \(T_1\) is empty , insert \(x\) there

      2. If \(T_1[h_1(x)]\) is occupied by some \(x'\) , insert \(x\) into \(T_1\) at \(h_1(x)\) anyway , and move the evicted \(x'\) into \(T_2\) at \(h_2(x')\)

        The evicted element may in turn evict another ; repeat this procedure until an empty slot is found .

      3. If #iterations \(\ge\) threshold \(t\) , rehash all elements ( resample \(h_1,h_2\) )

      If the load is \(\le 50\%\) , then \(E[\text{insertion time}]=\mathcal O(1)\)

      Lookup(x) : check at most \(2\) slots , \(T_1[h_1(x)]\) and \(T_2[h_2(x)]\) .

      Delete(x) : Use lookup then delete .
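
A minimal cuckoo-hashing sketch. It assumes the Carter-Wegman family \(((ax+b)\bmod P)\bmod m\) for \(h_1,h_2\) and doubles the tables on rehash; both are implementation choices the notes leave open:

```python
import random

class CuckooHash:
    """Cuckoo hashing sketch: two tables T[0], T[1] with hash functions
    drawn from the Carter-Wegman family (an assumption; any universal
    family works).  Insertion evicts occupants back and forth between
    the tables; after t displacements we rehash into doubled tables."""

    P = (1 << 61) - 1  # large prime, assumed larger than any key

    def __init__(self, m=16):
        self.m = m
        self._resample()
        self.T = [[None] * m, [None] * m]

    def _resample(self):
        self.ab = [(random.randrange(1, self.P), random.randrange(self.P))
                   for _ in range(2)]

    def _h(self, k, x):
        a, b = self.ab[k]
        return ((a * x + b) % self.P) % self.m

    def lookup(self, x):  # worst-case O(1): probes exactly two slots
        return self.T[0][self._h(0, x)] == x or self.T[1][self._h(1, x)] == x

    def insert(self, x, t=32):
        if self.lookup(x):
            return
        k = 0
        for _ in range(t):
            i = self._h(k, x)
            x, self.T[k][i] = self.T[k][i], x  # place x, pick up occupant
            if x is None:
                return
            k = 1 - k  # the evicted element tries the other table
        self._rehash(x)  # displacement threshold t exceeded

    def _rehash(self, pending):
        items = [y for tab in self.T for y in tab if y is not None]
        self.m *= 2
        self._resample()
        self.T = [[None] * self.m, [None] * self.m]
        for y in items + [pending]:
            self.insert(y)

    def delete(self, x):  # lookup, then clear the slot
        for k in (0, 1):
            if self.T[k][self._h(k, x)] == x:
                self.T[k][self._h(k, x)] = None
```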

  5. Application : Closest Pair Problem

    using D&C , \(\mathcal O(n\log n)\)

    using Hashing , \(\mathcal O(n)\) in expectation

    1. Computation model ( hash-based dictionary ) :

      • Insertion : \(\mathcal O(1)\) in expectation
      • Deletion : \(\mathcal O(1)\) in expectation
      • Lookup : \(\mathcal O(1)\) in expectation
    2. Algorithm :

      1. Order points randomly \(P_1,\cdots,P_n\) , let \(\delta=d(P_1,P_2)\)

      2. When we process \(P_i\) , we check whether \(\exists j<i\) s.t. \(d(P_j,P_i)<\delta\) .

        If YES , we start a new phase with new \(\delta'=d(P_j,P_i)\)

      3. In each phase , maintain a grid with edge length \(\frac{\delta}{2}\) ; each cell contains \(\le 1\) point of \(\{P_1,\cdots,P_{i-1}\}\) ( since \(\delta\) is the minimum pairwise distance among processed points , and a cell of side \(\frac{\delta}{2}\) has diagonal \(<\delta\) )

        maintain all nonempty cells in a dictionary ( \(U\) : all cells )

        Suppose we are processing \(P_i\) : we only need to look up \(25\) cells in the dictionary ( the \(5\times 5\) block of cells around \(P_i\)'s cell ) to find a smaller distance .

        Whenever we start a new phase , we reconstruct the whole dictionary .

    3. Analysis :

      \(X_i=\begin{cases}1&P_i\text{ causes a new phase}\\0&\text{otherwise}\end{cases}\) , running time \(T=\sum_{i=1}^n \left(X_i \cdot\mathcal O(i)+25\cdot\mathcal O(1)\right)\) .

      Therefore , \(E[T]=\left(\sum_{i=1}^n \mathcal O(i)\Pr\{X_i=1\}\right)+\mathcal O(n)\) .

      Claim : \(\Pr\{X_i=1\}\le \frac{2}{i}\) ( by the initial random permutation )

      Proof : Suppose \((P_a,P_b)\) has the minimum distance among \(P_1,\cdots,P_i\) , so \(\Pr\{X_i=1\}=\Pr\{a=i\lor b=i\}\le \Pr\{a=i\}+\Pr\{b=i\}\) .

      \(\Pr\{a=i\}=\Pr\{b=i\}=\frac{1}{i}\) , so \(\Pr\{X_i=1\}\le \frac{2}{i}\) . \(\square\)
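
The algorithm above can be sketched as follows, assuming distinct points (the invariant guarantees one point per grid cell, so a plain dict suffices as the dictionary):

```python
import random
from math import dist, floor

def closest_pair(points):
    """Expected-O(n) closest pair: random order, grid of cell size
    delta/2 stored in a dict keyed by cell coordinates; the whole grid
    is rebuilt whenever a new phase (smaller delta) starts.
    Assumes >= 2 distinct points."""
    pts = points[:]
    random.shuffle(pts)  # the random order makes E[#phases at step i] small
    delta = dist(pts[0], pts[1])

    def cell(p, d):
        return (floor(p[0] / d), floor(p[1] / d))

    def build(i, d):  # grid over the first i points, cell size d
        return {cell(q, d): q for q in pts[:i]}

    grid = build(2, delta / 2)
    for i in range(2, len(pts)):
        p = pts[i]
        cx, cy = cell(p, delta / 2)
        best = delta
        for dx in range(-2, 3):       # the 5x5 block of cells around p:
            for dy in range(-2, 3):   # any point within delta lies here
                q = grid.get((cx + dx, cy + dy))
                if q is not None:
                    best = min(best, dist(p, q))
        if best < delta:              # new phase: shrink delta, rebuild
            delta = best
            grid = build(i + 1, delta / 2)
        else:
            grid[cell(p, delta / 2)] = p
    return delta
```

Correctness does not depend on the shuffle; randomness only affects the expected running time.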

9.4 Approximate Counting

  1. Approximate Counting Problem

    1. #P : the counting analogue of NP : count the accepting witnesses instead of deciding whether one exists

    2. #P-complete

      \(L\in \texttt{NP-Complete}\to\) the counting version of \(L\) is \(\texttt{\#P-Complete}\)

      The converse fails : e.g. counting perfect matchings is \(\texttt{\#P-Complete}\) , yet deciding whether a perfect matching exists is in \(\texttt{P}\)

    3. FPRAS : Fully Polynomial Randomized Approximation Scheme

      An algorithm running in time \(poly(n,\frac{1}{\epsilon})\) with \(\Pr\{(1-\epsilon)ANS\le SOL\le (1+\epsilon)ANS\}\ge 0.99\) , where \(ANS\) is the true count and \(SOL\) is the algorithm's output

    4. Scheme : Rejection Sampling


      General : to estimate \(|G|\) , construct \(U\) s.t. \(G\subseteq U\) and :

      • Can uniformly sample from \(U\)
      • Membership Oracle : \(\forall x\in U\) , can decide whether \(x\in G\) or not
      • We know \(|U|\)

      Then sample \(N\) points uniformly from \(U\) and output \(SOL=|U|\cdot \frac{\#\{\text{samples in }G\}}{N}\) . This yields an FPRAS provided \(\frac{|G|}{|U|}\) is not too small , since \(N=poly(n,\frac{1}{\epsilon})\) samples suffice when \(\frac{|G|}{|U|}\ge \frac{1}{poly(n)}\) .
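
A toy instance of this scheme: take \(U=[-1,1]^2\) ( so \(|U|=4\) , by area ) and \(G\) the unit disk, whose membership oracle is \(x^2+y^2\le 1\) ; the estimate converges to \(\pi\) . The helper name `rejection_estimate` is illustrative:

```python
import random

def rejection_estimate(num_samples, sample_U, in_G, size_U):
    """Estimate |G| as |U| times the fraction of uniform samples from U
    that land in G (requires a uniform sampler for U, a membership
    oracle for G, and knowledge of |U|)."""
    hits = sum(in_G(sample_U()) for _ in range(num_samples))
    return size_U * hits / num_samples

# Example: G = unit disk inside U = [-1,1]^2, so the estimate -> pi.
est = rejection_estimate(
    200_000,
    lambda: (random.uniform(-1, 1), random.uniform(-1, 1)),
    lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0,
    4.0,
)
```

With \(2\times 10^5\) samples the standard deviation of the estimate is about \(0.004\) , so the output is reliably close to \(\pi\approx 3.1416\) .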