Algorithm Design 11

Chapter 8 Network Flow

8.3 Application

8.3.7 Project Selection Problem

  1. Description

    A set of projects , project \(i\) has profit \(p_i\) ( \(p_i\) can be positive or negative)

    A set of procedure constraints : given by DAG , \(e=(i,j)\) means \(i\) can be selected only if \(j\) is selected .

    Goal : Select a subset of projects \(A\) , s.t. \(\sum_{i\in A}p_i\) is maximized .

  2. Flow Construction

    Min-Cut Model

    • \((i,j)\) : give \(\infty\) capacity , then it cannot be a cut . Then it cannot be \(i\) in \(s\) part while \(j\) in \(t\) part .
    • Define in \(s\) part : selected , in \(t\) part : not selected .
    • If \(p_i\ge 0\) , then have edge \((s,i,p_i)\) ; If \(p_i<0\) , then have edge \((i,t,-p_i)\) .
    • Answer is \(\sum_{p_i\ge 0}p_i-\mathtt{MinCut}\)

    Formal construction :

    • Vertices : \(V'=V\cup \{s,t\}\)

    • Edges :

      • \(\forall (i,j)\in E\) , \((i,j,\infty)\)

      • \(\forall i\in V,p_i\ge 0\) , \((s,i,p_i)\)

      • \(\forall i\in V,p_i<0\) , \((i,t,-p_i)\)

    • Find a Min-Cut in \(G'\) : \((A,V'-A)\)

      The minimum value of cut \((A,V'-A)\) is \[ \begin{aligned} &\quad \sum_{p_i\ge 0,i\notin A}p_i+\sum_{p_i<0,i\in A}(-p_i)\\ &=\sum_{p_i\ge 0}p_i-\left(\sum_{p_i\ge 0,i\in A}p_i+\sum_{p_i<0,i\in A}p_i\right) \end{aligned} \] Which is equivalent to maximize \[ \sum_{p_i\ge 0,i\in A}p_i+\sum_{p_i<0,i\in A}p_i=\sum_{i\in A}p_i \]

8.3.8 Densest Subgraph Problem

  1. Description

    Given undirected graph \(G=(V,E)\) . Density of induced subgraph \(G[S]\) , \(S\subseteq V\) is defined as \(\frac{|E\cap G[S]|}{|S|}\)

    Goal : find a subgraph of maximum density .

  2. More generalized (but easier problem) : Baseball Elimination

    Input :

    • A set \(S\) of teams . For each \(x\in S\) , its current score \(w_x\) .
    • For \(x,y\in S\) , they will need to play \(g_{x,y}\) more games .
    • For each game , winner gains \(1\) point , loser gains \(0\) point . Not consider draw case .

    Consider a special team \(z\) , determine theoretically whether \(z\) can win the league ( have the highest score ) (not necessarily unique)

    WOLG , we can let \(z\) win all games it needs to play , and the current score of \(z\) becomes \(m\) .

    \(z\) cannot win \(\iff\) \(\exists T\subseteq S\) , \(\sum\limits_{x\in T}w_x+\sum\limits_{x,y\in T}g_{x,y}>m|T|\)

  3. Flow Construction

    \(\iff\) \(\exists T\subseteq S\) , \(\sum\limits_{x\in T}(w_x-m)+\sum\limits_{x,y\in T}g_{x,y}>0\)

    Min-Cut Model

    Construction :

    • Vertices : \(V'=V\cup E\cup \{s,t\}\)
    • Edges :
      • \(\forall (u,v)\in E\) , \((s,(u,v),g_{u,v})\)
      • \(\forall (u,v)\in E\) , \(((u,v),u,\infty)\) , \(((u,v),v,\infty)\)
      • \(\forall v\in V\) , \((v,t,m-w_v)\)
  4. Analysis

    1. \(\max\limits_{T\subseteq S}\sum\limits_{x\in T}(w_x-m)+\sum\limits_{x,y\in T}g_{x,y}>0\) \(\iff\) \(\sum_{x,y}g_{x,y}-\mathtt{MinCut}>0\)

      Min-Cut in \(G'\) : \((A,V'-A)\) , let \(T=A\cap V\) . The min cut value is : \[ \begin{aligned} &\quad \sum_{x\in T} (m-w_x)+\sum_{x\notin T\lor y\notin T} g_{x,y}\\ &=\sum_{x,y}g_{x,y}-\left(\sum_{x,y\in T}g_{x,y}+\sum_{x\in T}(w_x-m)\right) \end{aligned} \]

    2. \(\sum_{x,y}g_{x,y}-\mathtt{MinCut}>0\) \(\iff\) \(z\) cannot win

      Consider as MaxFlow : greedily allocate all games , MaxFlow obviously \(\le \sum_{x,y}g_{x,y}\) .

      If \(<\) , we cannot allocate all games that satisfies \(f_v+w_v\le m\) , so \(z\) must lose in all possibilities .

      If \(=\) , we can find a valid game-result allocation , so \(z\) still have possibility to win .

  5. Find maximum density

    • Binary search : OK but slow

    • parametric diagram

    • Densest Subgraph problem can be solved by parametric flow algorithm in poly-time .

8.4 MaxFlow and LP

  1. LP form of MaxFlow

    Goal : maximize \(\sum_{v}f_{s,v}\)

    Constraints :

    • \(\forall v\in V\backslash\{s,t\}\) , \(\sum_{u}f_{u,v}=\sum_{u}f_{v,u}\)
    • \(\forall f_e\in E\) , \(0\le f_e\le c_e\)
  2. Integral Polytope

    Claim : If \(c_e\in \mathbb Z_+\) , \(P\) is integral ( vertices of \(P\) are integral vector )

    Proof : By FF algorithm

    General Result :

    • LP constraints : \(Ax\le b\)
    • \(A\) is totally unimodular matrix \(\to\) Integral Polytope

Chapter 9 Randomized Algorithm

  1. Complexity

    P , RP , ZPP , BPP

    Conjecture : P=RP ? (current guess : YES)

9.1 MaxCut Problem

  1. Description

    Given undirected \(G=(V,E)\)

    Find a cut \((A,V-A)\) , s.t. \(|\{(u,v)\in E:u\in A,v\in V-A\}|\) is maximized

  2. Randomized \(2\)-approximation algorithm

    \(\forall v\in V\) , \(\begin{cases}v\in A&w.p.\frac{1}{2}\\ v\notin A&w.p.\frac{1}{2}\end{cases}\)

    Analysis : \[ \begin{aligned} &\quad E[cut(A,V-A)]\\ &=\sum_{e\in E} \Pr\{e\in cut(A,V-A)\}\\ &=\sum_{(u,v)\in E} \Pr\{(u\in A,v\notin A)\lor (u\notin A,v\in A)\}\\ &=\sum_{(u,v)\in E} \frac{1}{2}\\ &=\frac{|E|}{2}\ge \frac{OPT}{2} \end{aligned} \]

  3. Similarly , for Max-3-SAT problem , randomly assign each variable

    \(E[SOL]\ge \frac{7}{8}OPT\)

  4. Derandomization : Conditional Expectation method

    Goal : design a deterministic \(\frac{8}{7}\)-approximation algorithm for Max-3-SAT \[ E_{x_i\in \{0,1\}}[SOL]=\sum_{C}\Pr\{C\text{ is satisfied}\}=\frac{7}{8}|C| \]

    \[ E_{x_i\in \{0,1\}}[SOL]=\frac{1}{2}E[SOL|x_1=0]+\frac{1}{2}E[SOL|x_1=1] \]

    We can still compute \(E[SOL|x_1=0]\) and \(E[SOL|x_1=1]\) by computing \(\Pr\{C\text{ is satisfied}\}\) for each clause .

    Either \(E[SOL|x_1=0]\ge \frac{7}{8}\) or \(E[SOL|x_1=1]\ge \frac{7}{8}\)

9.2 Randomized D&C

\(k\)-th smallest number

  1. Randomized algorithm

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    int select(S,k){ // find k-th smallest number in S
    a<- randomly choosed element in S ;
    S_l=S.filter(x:x<a);
    S_r=S.filter(x:x>=a);
    if(|S_l|==k-1) return a;
    else{
    if(|S_l|>=k) return select(S_l,k);
    else return select(S_r,k-|S_l|);
    }
    }
  2. Analysis

    Goal : prove expected time is \(\mathcal O(n)\) .

    Intuition :

    • If \(\frac{1}{4}|S|\le |S_l|\le \frac{3}{4}|S|\) , do the function . Otherwise , do not divide and re-choose pivot \(a\) .
    • This algorithm is obviously slower than our algorithm .

    Analysis :

    • Define phase \(j\) : \(n(\frac{3}{4})^j\le |S|\le n(\frac{3}{4})^{j+1}\)

      We have \(\mathcal O(\log n)\) phases

    • Key : In each phase , #iterations to find a valid pivot is expected bounded .

      In one iteration : \(\Pr\{\text{succeed in choosing a pivot element}\}\ge \frac{1}{2}\) .

      Therefore , \(E[|\text{iterations in this phase}|]\le 2\) .

    • \[ \begin{aligned} &\quad E[\text{running time}]\\ &\le \sum_{j=1}^{\mathcal O(\log n)} E[\text{running time of phase }j]\\ &\le \sum_{j=1}^{\mathcal O(\log n)} (\frac{3}{4})^{j+1} n E[\text{iterations in this phase}]\\ &\le \mathcal O(n) \end{aligned} \]

  3. Derandomization

    Divide into \(\frac{n}{k}\) groups , find medium for each group , then find medium for medium of each group as pivot .

    We can ensure that this pivot is not far from middle rank \(\to\) good separation .

9.3 Hashing

  1. Intuition

    Find a "good" function \(h:U\to\{0,1,\cdots,n-1\}\) , where \(|U|> n\) . ( \(|U|\) is unaffordable in time/memory , but \(n\) is acceptable in time/memory)

    Use \(h(u)\) to represent \(u\) .

    Good performance : Insert : \(\mathcal O(1)\) , Delete : \(\mathcal O(1)\) , Find : \(\mathcal O(1)\) .

    Problem : collision : \(u\neq v\in U\) , \(h(u)=h(v)\) .

    Collision resolution -

    • Chaining : maintain a link-list for each \(i\in \{0,1,\cdots,n-1\}\) .

      • If \(h(v)\) is allocated , insert \(v\) at the end of \(h(v)\)-th link-list .
      • When find/delete , iterate the whole \(h(v)\)-th link-list .
    • Open Address method : ( #element stored \(<N\) )

      • \(h:U\times\{0,\cdots,N-1\}\to \{0,1,\cdots,N-1\}\)

      • We probe \(h(x,0),h(x,1),\cdots\) in order , until find an empty cell

      • Linear probing : \(h':U\to \{0,1,\cdots,N-1\}\) , \(h(x,i)=(h'(x)+i)\bmod N\)

      • Drawback : continuous allocation \(\to\) more possible to allocate near

        Ideal hashing function : like random

        Primary Cluster : Bad in performance

      • Improvement :

        • Quadratic probing \(h(x,i)=(h'(x)+c_1i+c_2i^2)\bmod N\)
        • Double Hashing : \(h(x,i)=(h_1(x)+ih_2(x))\bmod N\)