Algorithm Design 4

发表于 2023-10-09 更新于 2023-11-05 分类于课程笔记，算法设计阅读次数：本文字数： 1.4k 阅读时长 ≈ 5 分钟

[2023 秋] 算法设计第四课课堂笔记 / [2023 Fall] Algorithm Design Lecture 4 Notes

Chapter 2 Greedy Algorithm

2.7 Optimal Caching

Description
- U : \(n\) pieces of distinct elements stored in main memory
- cache : can hold \(k<n\) pieces
  
  init : cache holds \(k\) elements
- need to process a sequence of elements \(d_1,\cdots,d_m\) ( \(m\) can \(\ge n\) , not distinct)
- when processing \(d_i\) :
  - If \(d_i\) already in cache : cache hit , do nothing to the cache
  - If \(d_i\) not in cache : cache miss , evict some other element from the cache
- Goal : minimize # of cache miss .
Greedy Algorithm
1. FF Strategy : evict the element that will be used the furthest in the future .
2. Proof [Exchange Argument] :
  
  See BOOK
  
  Suppose our solution \(S_{FF}\) differs with the optimal solution \(S\) firstly at \(d_i\) , which is a cache miss , and our solution evicts \(a\) and the optimal solution evicts \(b\) , where the future used time : \(t(a)>t(b)\) . Let's prove that we can construct a new solution \(S'\) that is same with our solution at \(d_i\) as well , and is not worse than the optimal solution \(S\).
  
  We do not need to care about other caches and other elements . We only need to care about \(a,b\) . \(S=S'\) except the following cases :
  
  Case 0 : At \(d_i\) , \(S'\) should evict \(a\) rather than \(b\)
  
  Case 1 : \(d_j\) let \(a\) evicted in \(S\) before the first future used time \(t(a)\) . Then let \(S'\) evicts \(b\) at the same time , \(S\) is the same as \(S'\) .
  
  Case 2 : At \(t(b)\) . If at this time , \(S\) evicts \(a\) , then \(S'\) can do nothing and save one cache miss . If at this time , \(S\) evicts \(c\neq a\) , then we can let \(S'\) evicts \(c\) as well , and then let \(a\) "waiting" until it is used at \(t(a)\) or evicted later .
In reality : we cannot know the sequence ahead

LRU : least recently used

using locality of reference

[ Slater , Tarjan ] : LRU is the earliest online solution (elements come one by one)

\(LRU\le k(FF+1)\) , \(k\) is the cache size .

2.8 Minimum Cost Arborescence

a.k.a 最小树形图

Description

Find a min-cost directed spanning tree rooted at \(r\) in a directed graph \(G\) .

Assumption : \(\forall e\in E,w(e)\ge 0\)
Algorithm
- For \(v\in V\backslash \{r\}\) , choose the in-edge with minimum weight \(e_v\) .
  
  Let \(F^*\) be the set of chosen edge , so \(cost(F^*)\le OPT\) .
  
  If \(F^*\) is already a tree , we get the optimal solution .
- Otherwise , for all \(v\in V\backslash\{r\}\) , let \(w'(e=(u,v))=w(u,v)-w(e_v)\) , and then contract zero-cost cycles to get a new graph \(G'\) .
- Process \(G'\) as above .
Extension : matrix-tree theorem

count # of spanning tree in a graph (either directed or undirected)

\(Det(L_{00})\)

variant : count # of spanning tree in a graph with given total weight

Chapter 3 Dynamic Programming

3.1 weighted interval scheduling problem

Description

Input : \(n\) intervals \([l_i,r_i]\) with weight \(w_i\)

Goal : maximize the total weight s.t. the chosen intervals are distinct
key : Dynamic Programming recursion
1. sort the jobs according to \(r_i\) (like greedy algorithm)
2. define \(opt(j)\) : the optimal value for the first \(j\) intervals
\[ opt(i)=\max\{opt(i-1),opt(p(i))+w_i\} \]

where \(p(i)=\max\{j|r_j<l_i\}\) .
1. If we don't use interval \(i\) , it is \(opt(i-1)\) .
  
  If we use interval \(i\) , since \(\{r_i\}\) non-decreasing , \(opt(p(i))\) contains all valid optimal solutions before .

Implementation

naive :

int compute_opt(int i){
   if(i==0)return 0;
   else return max(compute_opt(i-1),w[i]+compute_opt(p(i)));
}

Problem : can be exponential time !

Key : many redundant computation \(\to\) store them

memorization / filling out the DP table :

int opt[]={-1};
// recursive implementation 
int compute_opt(int i){
    if(i==0) return 0;
    if(opt[i]!=-1)return opt[i];
    else return opt[i]=max(compute_opt(i-1),w[i]+compute_opt(p(i)));
}
// non-recursive implementation
{
    opt[0]=0;
    for(int i=1;i<=n;++i){
        opt[i]=max(opt[i-1],w[i]+opt[p(i)]);
    }
}

3.2 Subset Sum / Knapsack

Description

Input : \(n\) items , item \(i\) has weight \(w_i\) and value \(v_i\) ; Weight limit \(W\)

Goal : find a subset of items \(S\) s.t. \(\sum_{i\in S}w_i\le W\) and maximize \(\sum_{i\in S}v_i\) .
DP Algorithm
1. pseudo-poly time / poly time
  
  pseudo-poly time : \(poly(n,W)\)
  
  poly time : \(poly(n,\log W)\) ( \(\log W\) is the least number of bits to input \(W\) )
  
  Knapsack : NP-Hard : hard to find poly time algorithm
  
  But finding a pseudo-poly time algorithm is easy !
2. \(opt(i,w)\) : the optimal solution for first \(i\) items and weight \(= w\) \[ opt(i,w)=\max\begin{cases}opt(i-1,w)&\text{not using i-th item}\\ opt(i-1,w-w_i)+v_i&\text{using i-th item} \end{cases} \]
3. Time Complexity : \(\mathcal O(nW)\) .
Implementation

\(opt(w)\) : at \(i\)-th iteration : the optimal solution for first \(i\) items and weight \(= w\) \[ opt(w)=\max\{opt(w),opt(w-w_i)+v_i\} \] iterate \(w\) decreasingly

Memory Complexity : \(\mathcal O(n+W)\)

3.3 RNA secondary structure

Description

Input : \(\{A,C,G,U\}^n\)

Constraints :
- no sharp turns : If pairing \((i,j)\in S\) , then \(i<j-4\)
- pairing : \(A-G\) , \(C-U\)
- one base can only appear in \(\le 1\) pairs
- no crossing pairing : \(\not\exist (i,j),(l,r)\in S,i<l<j<r\)
Goal : maximize the size of pairing set \(S\)
Algorithm

\(opt(l,r)\) : the maximum number of pairs in subsequence \([l,r]\) \[ opt(l,r)=\max\begin{cases} opt(l+1,r-1)+1&a[l]\text{ and }a[r] \text{ can pairing}\\ \max_{l\le i\le r-1}\{opt(l,i)+opt(i+1,r)\} \end{cases} \] Order : \(r-l\) increasing

Time Complexity : \(\mathcal O(n^3)\)

Memory Complexity : \(\mathcal O(n^2)\)

3.4 DP on tree and tree-like graph

3.4.1 maximum independent set on tree

Description

Input : A tree , vertex-weighted

Goal : Find an Independent Set of maximum weight
Algorithm

\(opt_{0/1}(x)\) : the maximum-weighted independent set of the subtree \(x\) , \(opt_1\) means we choose \(x\) , \(opt_0\) means we don't choose \(x\) . \[ opt_1(x)=w_x+\sum_{y\in child(x)}opt_0(y)\\ opt_0(x)=\sum_{y\in child(x)}\max\{opt_0(y),opt_1(y)\} \]
Tree DP View :
- subtree is the natural subinstance
- delete \(x\) , the tree will be separated into several parts , no connection between \(subtree(y)\) (\(y\in child(x)\)) , \(G\backslash subtree(x)\)

3.4.2 treewidth

Tree Decomposition ( clique tree , junction trees )
1. Def [separator] : For connected graph \(G\) , a separator is a vertex set \(S\subset V\) , s.t. \(G\backslash S\) is not connected .
2. Def [tree decomposition] : \(T(G)\) is a tree decomposition if :
  
  Each vertex in \(T(G)\) is called a bag \((V_T(x),E_T(x))\) : containing some vertices in \(G\) and their edges
  - \(\cup_x V_T(x)=V\)
  - \(\forall e\in E,\exists x\in T(G) , e\in E_T(x)\)
  - \(\forall v\in V,\{x|v\in V_T(x)\}\) form a connected subtree
3. Def [treewidth for tree decomposition] : \(tw(T(G)):=\max_x \{|V_T(x)|\}-1\)
4. Lemma : If \(G\) is a tree , \(tw(T(G))=1\)
5. Def [treewidth for graph] : \[ tw(G)=\min\limits_{T(G)\text{ is a tree decomposition}}tw(T(G)) \]
Lemma [ bags and separators ] : \(T(G)\) is a tree decomposition of \(G\) , \(A,B\) are adjacent bags in \(T(G)\) , then

\(V_T(A)\) is a separator of \(G\) , \(V_T(A\cap B)\) is a separator of \(G\)
Independent Set on a graph with a tree decomposition

Input : given \(G\) and a tree decomposition \(T(G)\)

Goal : Find maximum Independent Set in Time Complexity \(\mathcal O(n2^{tw(T(G))})\)
Algorithm
- \(opt(x,S)\) : consider the subtree \(x\) in the tree decomposition , choosing exactly \(S\) in \(V_T(x)\) , the maximum independent set
- \[ opt(x,S)=w(S)+\sum_{y\in child(x)}\max_{S' , S\text{ consists with } S'\text{ on }S\cap S'}\{opt(y,S')-w(S\cap S')\} \]