Algorithm Design 11
Chapter 8 Network Flow
8.3 Application
8.3.7 Project Selection Problem
Description
A set of projects , project \(i\) has profit \(p_i\) ( \(p_i\) can be positive or negative)
A set of procedure constraints : given by DAG , \(e=(i,j)\) means \(i\) can be selected only if \(j\) is selected .
Goal : Select a subset of projects \(A\) , s.t. \(\sum_{i\in A}p_i\) is maximized .
Flow Construction
Min-Cut Model
- \((i,j)\) : give \(\infty\) capacity , then it cannot be a cut . Then it cannot be \(i\) in \(s\) part while \(j\) in \(t\) part .
- Define in \(s\) part : selected , in \(t\) part : not selected .
- If \(p_i\ge 0\) , then have edge \((s,i,p_i)\) ; If \(p_i<0\) , then have edge \((i,t,-p_i)\) .
- Answer is \(\sum_{p_i\ge 0}p_i-\mathtt{MinCut}\)
Formal construction :
Vertices : \(V'=V\cup \{s,t\}\)
Edges :
\(\forall (i,j)\in E\) , \((i,j,\infty)\)
\(\forall i\in V,p_i\ge 0\) , \((s,i,p_i)\)
\(\forall i\in V,p_i<0\) , \((i,t,-p_i)\)
Find a Min-Cut in \(G'\) : \((A,V'-A)\)
The minimum value of cut \((A,V'-A)\) is \[ \begin{aligned} &\quad \sum_{p_i\ge 0,i\notin A}p_i+\sum_{p_i<0,i\in A}(-p_i)\\ &=\sum_{p_i\ge 0}p_i-\left(\sum_{p_i\ge 0,i\in A}p_i+\sum_{p_i<0,i\in A}p_i\right) \end{aligned} \] Which is equivalent to maximize \[ \sum_{p_i\ge 0,i\in A}p_i+\sum_{p_i<0,i\in A}p_i=\sum_{i\in A}p_i \]
8.3.8 Densest Subgraph Problem
Description
Given undirected graph \(G=(V,E)\) . Density of induced subgraph \(G[S]\) , \(S\subseteq V\) is defined as \(\frac{|E\cap G[S]|}{|S|}\)
Goal : find a subgraph of maximum density .
More generalized (but easier problem) : Baseball Elimination
Input :
- A set \(S\) of teams . For each \(x\in S\) , its current score \(w_x\) .
- For \(x,y\in S\) , they will need to play \(g_{x,y}\) more games .
- For each game , winner gains \(1\) point , loser gains \(0\) point . Not consider draw case .
Consider a special team \(z\) , determine theoretically whether \(z\) can win the league ( have the highest score ) (not necessarily unique)
WOLG , we can let \(z\) win all games it needs to play , and the current score of \(z\) becomes \(m\) .
\(z\) cannot win \(\iff\) \(\exists T\subseteq S\) , \(\sum\limits_{x\in T}w_x+\sum\limits_{x,y\in T}g_{x,y}>m|T|\)
Flow Construction
\(\iff\) \(\exists T\subseteq S\) , \(\sum\limits_{x\in T}(w_x-m)+\sum\limits_{x,y\in T}g_{x,y}>0\)
Min-Cut Model
Construction :
- Vertices : \(V'=V\cup E\cup \{s,t\}\)
- Edges :
- \(\forall (u,v)\in E\) , \((s,(u,v),g_{u,v})\)
- \(\forall (u,v)\in E\) , \(((u,v),u,\infty)\) , \(((u,v),v,\infty)\)
- \(\forall v\in V\) , \((v,t,m-w_v)\)
Analysis
\(\max\limits_{T\subseteq S}\sum\limits_{x\in T}(w_x-m)+\sum\limits_{x,y\in T}g_{x,y}>0\) \(\iff\) \(\sum_{x,y}g_{x,y}-\mathtt{MinCut}>0\)
Min-Cut in \(G'\) : \((A,V'-A)\) , let \(T=A\cap V\) . The min cut value is : \[ \begin{aligned} &\quad \sum_{x\in T} (m-w_x)+\sum_{x\notin T\lor y\notin T} g_{x,y}\\ &=\sum_{x,y}g_{x,y}-\left(\sum_{x,y\in T}g_{x,y}+\sum_{x\in T}(w_x-m)\right) \end{aligned} \]
\(\sum_{x,y}g_{x,y}-\mathtt{MinCut}>0\) \(\iff\) \(z\) cannot win
Consider as MaxFlow : greedily allocate all games , MaxFlow obviously \(\le \sum_{x,y}g_{x,y}\) .
If \(<\) , we cannot allocate all games that satisfies \(f_v+w_v\le m\) , so \(z\) must lose in all possibilities .
If \(=\) , we can find a valid game-result allocation , so \(z\) still have possibility to win .
Find maximum density
Binary search : OK but slow
parametric diagram
Densest Subgraph problem can be solved by parametric flow algorithm in poly-time .
8.4 MaxFlow and LP
LP form of MaxFlow
Goal : maximize \(\sum_{v}f_{s,v}\)
Constraints :
- \(\forall v\in V\backslash\{s,t\}\) , \(\sum_{u}f_{u,v}=\sum_{u}f_{v,u}\)
- \(\forall f_e\in E\) , \(0\le f_e\le c_e\)
Integral Polytope
Claim : If \(c_e\in \mathbb Z_+\) , \(P\) is integral ( vertices of \(P\) are integral vector )
Proof : By FF algorithm
General Result :
- LP constraints : \(Ax\le b\)
- \(A\) is totally unimodular matrix \(\to\) Integral Polytope
Chapter 9 Randomized Algorithm
Complexity
P , RP , ZPP , BPP
Conjecture : P=RP ? (current guess : YES)
9.1 MaxCut Problem
Description
Given undirected \(G=(V,E)\)
Find a cut \((A,V-A)\) , s.t. \(|\{(u,v)\in E:u\in A,v\in V-A\}|\) is maximized
Randomized \(2\)-approximation algorithm
\(\forall v\in V\) , \(\begin{cases}v\in A&w.p.\frac{1}{2}\\ v\notin A&w.p.\frac{1}{2}\end{cases}\)
Analysis : \[ \begin{aligned} &\quad E[cut(A,V-A)]\\ &=\sum_{e\in E} \Pr\{e\in cut(A,V-A)\}\\ &=\sum_{(u,v)\in E} \Pr\{(u\in A,v\notin A)\lor (u\notin A,v\in A)\}\\ &=\sum_{(u,v)\in E} \frac{1}{2}\\ &=\frac{|E|}{2}\ge \frac{OPT}{2} \end{aligned} \]
Similarly , for Max-3-SAT problem , randomly assign each variable
\(E[SOL]\ge \frac{7}{8}OPT\)
Derandomization : Conditional Expectation method
Goal : design a deterministic \(\frac{8}{7}\)-approximation algorithm for Max-3-SAT \[ E_{x_i\in \{0,1\}}[SOL]=\sum_{C}\Pr\{C\text{ is satisfied}\}=\frac{7}{8}|C| \]
\[ E_{x_i\in \{0,1\}}[SOL]=\frac{1}{2}E[SOL|x_1=0]+\frac{1}{2}E[SOL|x_1=1] \]
We can still compute \(E[SOL|x_1=0]\) and \(E[SOL|x_1=1]\) by computing \(\Pr\{C\text{ is satisfied}\}\) for each clause .
Either \(E[SOL|x_1=0]\ge \frac{7}{8}\) or \(E[SOL|x_1=1]\ge \frac{7}{8}\)
9.2 Randomized D&C
\(k\)-th smallest number
Randomized algorithm
1
2
3
4
5
6
7
8
9
10int select(S,k){ // find k-th smallest number in S
a<- randomly choosed element in S ;
S_l=S.filter(x:x<a);
S_r=S.filter(x:x>=a);
if(|S_l|==k-1) return a;
else{
if(|S_l|>=k) return select(S_l,k);
else return select(S_r,k-|S_l|);
}
}Analysis
Goal : prove expected time is \(\mathcal O(n)\) .
Intuition :
- If \(\frac{1}{4}|S|\le |S_l|\le \frac{3}{4}|S|\) , do the function . Otherwise , do not divide and re-choose pivot \(a\) .
- This algorithm is obviously slower than our algorithm .
Analysis :
Define phase \(j\) : \(n(\frac{3}{4})^j\le |S|\le n(\frac{3}{4})^{j+1}\)
We have \(\mathcal O(\log n)\) phases
Key : In each phase , #iterations to find a valid pivot is expected bounded .
In one iteration : \(\Pr\{\text{succeed in choosing a pivot element}\}\ge \frac{1}{2}\) .
Therefore , \(E[|\text{iterations in this phase}|]\le 2\) .
\[ \begin{aligned} &\quad E[\text{running time}]\\ &\le \sum_{j=1}^{\mathcal O(\log n)} E[\text{running time of phase }j]\\ &\le \sum_{j=1}^{\mathcal O(\log n)} (\frac{3}{4})^{j+1} n E[\text{iterations in this phase}]\\ &\le \mathcal O(n) \end{aligned} \]
Derandomization
Divide into \(\frac{n}{k}\) groups , find medium for each group , then find medium for medium of each group as pivot .
We can ensure that this pivot is not far from middle rank \(\to\) good separation .
9.3 Hashing
Intuition
Find a "good" function \(h:U\to\{0,1,\cdots,n-1\}\) , where \(|U|> n\) . ( \(|U|\) is unaffordable in time/memory , but \(n\) is acceptable in time/memory)
Use \(h(u)\) to represent \(u\) .
Good performance : Insert : \(\mathcal O(1)\) , Delete : \(\mathcal O(1)\) , Find : \(\mathcal O(1)\) .
Problem : collision : \(u\neq v\in U\) , \(h(u)=h(v)\) .
Collision resolution -
Chaining : maintain a link-list for each \(i\in \{0,1,\cdots,n-1\}\) .
- If \(h(v)\) is allocated , insert \(v\) at the end of \(h(v)\)-th link-list .
- When find/delete , iterate the whole \(h(v)\)-th link-list .
Open Address method : ( #element stored \(<N\) )
\(h:U\times\{0,\cdots,N-1\}\to \{0,1,\cdots,N-1\}\)
We probe \(h(x,0),h(x,1),\cdots\) in order , until find an empty cell
Linear probing : \(h':U\to \{0,1,\cdots,N-1\}\) , \(h(x,i)=(h'(x)+i)\bmod N\)
Drawback : continuous allocation \(\to\) more possible to allocate near
Ideal hashing function : like random
Primary Cluster : Bad in performance
Improvement :
- Quadratic probing \(h(x,i)=(h'(x)+c_1i+c_2i^2)\bmod N\)
- Double Hashing : \(h(x,i)=(h_1(x)+ih_2(x))\bmod N\)