Algorithm Design 4

Chapter 2 Greedy Algorithm

2.7 Optimal Caching

  1. Description

    • U : \(n\) pieces of distinct elements stored in main memory

    • cache : can hold \(k<n\) pieces

      init : cache holds \(k\) elements

    • need to process a sequence of elements \(d_1,\cdots,d_m\) ( \(m\) can \(\ge n\) , not distinct)

    • when processing \(d_i\) :

      • If \(d_i\) already in cache : cache hit , do nothing to the cache
      • If \(d_i\) not in cache : cache miss , evict some other element from the cache
    • Goal : minimize # of cache miss .

  2. Greedy Algorithm

    1. FF Strategy : evict the element that will be used the furthest in the future .

    2. Proof [Exchange Argument] :

      See BOOK

      Suppose our solution \(S_{FF}\) differs with the optimal solution \(S\) firstly at \(d_i\) , which is a cache miss , and our solution evicts \(a\) and the optimal solution evicts \(b\) , where the future used time : \(t(a)>t(b)\) . Let's prove that we can construct a new solution \(S'\) that is same with our solution at \(d_i\) as well , and is not worse than the optimal solution \(S\).

      We do not need to care about other caches and other elements . We only need to care about \(a,b\) . \(S=S'\) except the following cases :

      Case 0 : At \(d_i\) , \(S'\) should evict \(a\) rather than \(b\)

      Case 1 : \(d_j\) let \(a\) evicted in \(S\) before the first future used time \(t(a)\) . Then let \(S'\) evicts \(b\) at the same time , \(S\) is the same as \(S'\) .

      Case 2 : At \(t(b)\) . If at this time , \(S\) evicts \(a\) , then \(S'\) can do nothing and save one cache miss . If at this time , \(S\) evicts \(c\neq a\) , then we can let \(S'\) evicts \(c\) as well , and then let \(a\) "waiting" until it is used at \(t(a)\) or evicted later .

  3. In reality : we cannot know the sequence ahead

    LRU : least recently used

    using locality of reference

    [ Slater , Tarjan ] : LRU is the earliest online solution (elements come one by one)

    \(LRU\le k(FF+1)\) , \(k\) is the cache size .

2.8 Minimum Cost Arborescence

a.k.a 最小树形图

  1. Description

    Find a min-cost directed spanning tree rooted at \(r\) in a directed graph \(G\) .

    Assumption : \(\forall e\in E,w(e)\ge 0\)

  2. Algorithm

    • For \(v\in V\backslash \{r\}\) , choose the in-edge with minimum weight \(e_v\) .

      Let \(F^*\) be the set of chosen edge , so \(cost(F^*)\le OPT\) .

      If \(F^*\) is already a tree , we get the optimal solution .

    • Otherwise , for all \(v\in V\backslash\{r\}\) , let \(w'(e=(u,v))=w(u,v)-w(e_v)\) , and then contract zero-cost cycles to get a new graph \(G'\) .

    • Process \(G'\) as above .

  3. Extension : matrix-tree theorem

    count # of spanning tree in a graph (either directed or undirected)

    \(Det(L_{00})\)

    variant : count # of spanning tree in a graph with given total weight

Chapter 3 Dynamic Programming

3.1 weighted interval scheduling problem

  1. Description

    Input : \(n\) intervals \([l_i,r_i]\) with weight \(w_i\)

    Goal : maximize the total weight s.t. the chosen intervals are distinct

  2. key : Dynamic Programming recursion

    1. sort the jobs according to \(r_i\) (like greedy algorithm)
    2. define \(opt(j)\) : the optimal value for the first \(j\) intervals

    \[ opt(i)=\max\{opt(i-1),opt(p(i))+w_i\} \]

    where \(p(i)=\max\{j|r_j<l_i\}\) .

    1. If we don't use interval \(i\) , it is \(opt(i-1)\) .

      If we use interval \(i\) , since \(\{r_i\}\) non-decreasing , \(opt(p(i))\) contains all valid optimal solutions before .

  3. Implementation

    naive :

    1
    2
    3
    4
    int compute_opt(int i){
    if(i==0)return 0;
    else return max(compute_opt(i-1),w[i]+compute_opt(p(i)));
    }

    Problem : can be exponential time !

    Key : many redundant computation \(\to\) store them

    memorization / filling out the DP table :

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    int opt[]={-1};
    // recursive implementation
    int compute_opt(int i){
    if(i==0) return 0;
    if(opt[i]!=-1)return opt[i];
    else return opt[i]=max(compute_opt(i-1),w[i]+compute_opt(p(i)));
    }
    // non-recursive implementation
    {
    opt[0]=0;
    for(int i=1;i<=n;++i){
    opt[i]=max(opt[i-1],w[i]+opt[p(i)]);
    }
    }

3.2 Subset Sum / Knapsack

  1. Description

    Input : \(n\) items , item \(i\) has weight \(w_i\) and value \(v_i\) ; Weight limit \(W\)

    Goal : find a subset of items \(S\) s.t. \(\sum_{i\in S}w_i\le W\) and maximize \(\sum_{i\in S}v_i\) .

  2. DP Algorithm

    1. pseudo-poly time / poly time

      pseudo-poly time : \(poly(n,W)\)

      poly time : \(poly(n,\log W)\) ( \(\log W\) is the least number of bits to input \(W\) )

      Knapsack : NP-Hard : hard to find poly time algorithm

      But finding a pseudo-poly time algorithm is easy !

    2. \(opt(i,w)\) : the optimal solution for first \(i\) items and weight \(= w\) \[ opt(i,w)=\max\begin{cases}opt(i-1,w)&\text{not using i-th item}\\ opt(i-1,w-w_i)+v_i&\text{using i-th item} \end{cases} \]

    3. Time Complexity : \(\mathcal O(nW)\) .

  3. Implementation

    \(opt(w)\) : at \(i\)-th iteration : the optimal solution for first \(i\) items and weight \(= w\) \[ opt(w)=\max\{opt(w),opt(w-w_i)+v_i\} \] iterate \(w\) decreasingly

    Memory Complexity : \(\mathcal O(n+W)\)

3.3 RNA secondary structure

  1. Description

    Input : \(\{A,C,G,U\}^n\)

    Constraints :

    • no sharp turns : If pairing \((i,j)\in S\) , then \(i<j-4\)
    • pairing : \(A-G\) , \(C-U\)
    • one base can only appear in \(\le 1\) pairs
    • no crossing pairing : \(\not\exist (i,j),(l,r)\in S,i<l<j<r\)

    Goal : maximize the size of pairing set \(S\)

  2. Algorithm

    \(opt(l,r)\) : the maximum number of pairs in subsequence \([l,r]\) \[ opt(l,r)=\max\begin{cases} opt(l+1,r-1)+1&a[l]\text{ and }a[r] \text{ can pairing}\\ \max_{l\le i\le r-1}\{opt(l,i)+opt(i+1,r)\} \end{cases} \] Order : \(r-l\) increasing

    Time Complexity : \(\mathcal O(n^3)\)

    Memory Complexity : \(\mathcal O(n^2)\)

3.4 DP on tree and tree-like graph

3.4.1 maximum independent set on tree

  1. Description

    Input : A tree , vertex-weighted

    Goal : Find an Independent Set of maximum weight

  2. Algorithm

    \(opt_{0/1}(x)\) : the maximum-weighted independent set of the subtree \(x\) , \(opt_1\) means we choose \(x\) , \(opt_0\) means we don't choose \(x\) . \[ opt_1(x)=w_x+\sum_{y\in child(x)}opt_0(y)\\ opt_0(x)=\sum_{y\in child(x)}\max\{opt_0(y),opt_1(y)\} \]

  3. Tree DP View :

    • subtree is the natural subinstance
    • delete \(x\) , the tree will be separated into several parts , no connection between \(subtree(y)\) (\(y\in child(x)\)) , \(G\backslash subtree(x)\)

3.4.2 treewidth

  1. Tree Decomposition ( clique tree , junction trees )

    1. Def [separator] : For connected graph \(G\) , a separator is a vertex set \(S\subset V\) , s.t. \(G\backslash S\) is not connected .

    2. Def [tree decomposition] : \(T(G)\) is a tree decomposition if :

      Each vertex in \(T(G)\) is called a bag \((V_T(x),E_T(x))\) : containing some vertices in \(G\) and their edges

      • \(\cup_x V_T(x)=V\)
      • \(\forall e\in E,\exists x\in T(G) , e\in E_T(x)\)
      • \(\forall v\in V,\{x|v\in V_T(x)\}\) form a connected subtree
    3. Def [treewidth for tree decomposition] : \(tw(T(G)):=\max_x \{|V_T(x)|\}-1\)

    4. Lemma : If \(G\) is a tree , \(tw(T(G))=1\)

    5. Def [treewidth for graph] : \[ tw(G)=\min\limits_{T(G)\text{ is a tree decomposition}}tw(T(G)) \]

  2. Lemma [ bags and separators ] : \(T(G)\) is a tree decomposition of \(G\) , \(A,B\) are adjacent bags in \(T(G)\) , then

    \(V_T(A)\) is a separator of \(G\) , \(V_T(A\cap B)\) is a separator of \(G\)

  3. Independent Set on a graph with a tree decomposition

    Input : given \(G\) and a tree decomposition \(T(G)\)

    Goal : Find maximum Independent Set in Time Complexity \(\mathcal O(n2^{tw(T(G))})\)

  4. Algorithm

    • \(opt(x,S)\) : consider the subtree \(x\) in the tree decomposition , choosing exactly \(S\) in \(V_T(x)\) , the maximum independent set

    • \[ opt(x,S)=w(S)+\sum_{y\in child(x)}\max_{S' , S\text{ consists with } S'\text{ on }S\cap S'}\{opt(y,S')-w(S\cap S')\} \]