Algorithm Design 4
Chapter 2 Greedy Algorithm
2.7 Optimal Caching
Description
U : \(n\) pieces of distinct elements stored in main memory
cache : can hold \(k<n\) pieces
init : cache holds \(k\) elements
need to process a sequence of elements \(d_1,\cdots,d_m\) ( \(m\) can \(\ge n\) , not distinct)
when processing \(d_i\) :
- If \(d_i\) already in cache : cache hit , do nothing to the cache
- If \(d_i\) not in cache : cache miss , evict some other element from the cache
Goal : minimize # of cache miss .
Greedy Algorithm
FF Strategy : evict the element that will be used the furthest in the future .
Proof [Exchange Argument] :
See BOOK
Suppose our solution \(S_{FF}\) differs with the optimal solution \(S\) firstly at \(d_i\) , which is a cache miss , and our solution evicts \(a\) and the optimal solution evicts \(b\) , where the future used time : \(t(a)>t(b)\) . Let's prove that we can construct a new solution \(S'\) that is same with our solution at \(d_i\) as well , and is not worse than the optimal solution \(S\).
We do not need to care about other caches and other elements . We only need to care about \(a,b\) . \(S=S'\) except the following cases :
Case 0 : At \(d_i\) , \(S'\) should evict \(a\) rather than \(b\)
Case 1 : \(d_j\) let \(a\) evicted in \(S\) before the first future used time \(t(a)\) . Then let \(S'\) evicts \(b\) at the same time , \(S\) is the same as \(S'\) .
Case 2 : At \(t(b)\) . If at this time , \(S\) evicts \(a\) , then \(S'\) can do nothing and save one cache miss . If at this time , \(S\) evicts \(c\neq a\) , then we can let \(S'\) evicts \(c\) as well , and then let \(a\) "waiting" until it is used at \(t(a)\) or evicted later .
In reality : we cannot know the sequence ahead
LRU : least recently used
using locality of reference
[ Slater , Tarjan ] : LRU is the earliest online solution (elements come one by one)
\(LRU\le k(FF+1)\) , \(k\) is the cache size .
2.8 Minimum Cost Arborescence
a.k.a 最小树形图
Description
Find a min-cost directed spanning tree rooted at \(r\) in a directed graph \(G\) .
Assumption : \(\forall e\in E,w(e)\ge 0\)
Algorithm
For \(v\in V\backslash \{r\}\) , choose the in-edge with minimum weight \(e_v\) .
Let \(F^*\) be the set of chosen edge , so \(cost(F^*)\le OPT\) .
If \(F^*\) is already a tree , we get the optimal solution .
Otherwise , for all \(v\in V\backslash\{r\}\) , let \(w'(e=(u,v))=w(u,v)-w(e_v)\) , and then contract zero-cost cycles to get a new graph \(G'\) .
Process \(G'\) as above .
Extension : matrix-tree theorem
count # of spanning tree in a graph (either directed or undirected)
\(Det(L_{00})\)
variant : count # of spanning tree in a graph with given total weight
Chapter 3 Dynamic Programming
3.1 weighted interval scheduling problem
Description
Input : \(n\) intervals \([l_i,r_i]\) with weight \(w_i\)
Goal : maximize the total weight s.t. the chosen intervals are distinct
key : Dynamic Programming recursion
- sort the jobs according to \(r_i\) (like greedy algorithm)
- define \(opt(j)\) : the optimal value for the first \(j\) intervals
\[ opt(i)=\max\{opt(i-1),opt(p(i))+w_i\} \]
where \(p(i)=\max\{j|r_j<l_i\}\) .
If we don't use interval \(i\) , it is \(opt(i-1)\) .
If we use interval \(i\) , since \(\{r_i\}\) non-decreasing , \(opt(p(i))\) contains all valid optimal solutions before .
Implementation
naive :
1
2
3
4int compute_opt(int i){
if(i==0)return 0;
else return max(compute_opt(i-1),w[i]+compute_opt(p(i)));
}Problem : can be exponential time !
Key : many redundant computation \(\to\) store them
memorization / filling out the DP table :
1
2
3
4
5
6
7
8
9
10
11
12
13
14int opt[]={-1};
// recursive implementation
int compute_opt(int i){
if(i==0) return 0;
if(opt[i]!=-1)return opt[i];
else return opt[i]=max(compute_opt(i-1),w[i]+compute_opt(p(i)));
}
// non-recursive implementation
{
opt[0]=0;
for(int i=1;i<=n;++i){
opt[i]=max(opt[i-1],w[i]+opt[p(i)]);
}
}
3.2 Subset Sum / Knapsack
Description
Input : \(n\) items , item \(i\) has weight \(w_i\) and value \(v_i\) ; Weight limit \(W\)
Goal : find a subset of items \(S\) s.t. \(\sum_{i\in S}w_i\le W\) and maximize \(\sum_{i\in S}v_i\) .
DP Algorithm
pseudo-poly time / poly time
pseudo-poly time : \(poly(n,W)\)
poly time : \(poly(n,\log W)\) ( \(\log W\) is the least number of bits to input \(W\) )
Knapsack : NP-Hard : hard to find poly time algorithm
But finding a pseudo-poly time algorithm is easy !
\(opt(i,w)\) : the optimal solution for first \(i\) items and weight \(= w\) \[ opt(i,w)=\max\begin{cases}opt(i-1,w)&\text{not using i-th item}\\ opt(i-1,w-w_i)+v_i&\text{using i-th item} \end{cases} \]
Time Complexity : \(\mathcal O(nW)\) .
Implementation
\(opt(w)\) : at \(i\)-th iteration : the optimal solution for first \(i\) items and weight \(= w\) \[ opt(w)=\max\{opt(w),opt(w-w_i)+v_i\} \] iterate \(w\) decreasingly
Memory Complexity : \(\mathcal O(n+W)\)
3.3 RNA secondary structure
Description
Input : \(\{A,C,G,U\}^n\)
Constraints :
- no sharp turns : If pairing \((i,j)\in S\) , then \(i<j-4\)
- pairing : \(A-G\) , \(C-U\)
- one base can only appear in \(\le 1\) pairs
- no crossing pairing : \(\not\exist (i,j),(l,r)\in S,i<l<j<r\)
Goal : maximize the size of pairing set \(S\)
Algorithm
\(opt(l,r)\) : the maximum number of pairs in subsequence \([l,r]\) \[ opt(l,r)=\max\begin{cases} opt(l+1,r-1)+1&a[l]\text{ and }a[r] \text{ can pairing}\\ \max_{l\le i\le r-1}\{opt(l,i)+opt(i+1,r)\} \end{cases} \] Order : \(r-l\) increasing
Time Complexity : \(\mathcal O(n^3)\)
Memory Complexity : \(\mathcal O(n^2)\)
3.4 DP on tree and tree-like graph
3.4.1 maximum independent set on tree
Description
Input : A tree , vertex-weighted
Goal : Find an Independent Set of maximum weight
Algorithm
\(opt_{0/1}(x)\) : the maximum-weighted independent set of the subtree \(x\) , \(opt_1\) means we choose \(x\) , \(opt_0\) means we don't choose \(x\) . \[ opt_1(x)=w_x+\sum_{y\in child(x)}opt_0(y)\\ opt_0(x)=\sum_{y\in child(x)}\max\{opt_0(y),opt_1(y)\} \]
Tree DP View :
- subtree is the natural subinstance
- delete \(x\) , the tree will be separated into several parts , no connection between \(subtree(y)\) (\(y\in child(x)\)) , \(G\backslash subtree(x)\)
3.4.2 treewidth
Tree Decomposition ( clique tree , junction trees )
Def [separator] : For connected graph \(G\) , a separator is a vertex set \(S\subset V\) , s.t. \(G\backslash S\) is not connected .
Def [tree decomposition] : \(T(G)\) is a tree decomposition if :
Each vertex in \(T(G)\) is called a bag \((V_T(x),E_T(x))\) : containing some vertices in \(G\) and their edges
- \(\cup_x V_T(x)=V\)
- \(\forall e\in E,\exists x\in T(G) , e\in E_T(x)\)
- \(\forall v\in V,\{x|v\in V_T(x)\}\) form a connected subtree
Def [treewidth for tree decomposition] : \(tw(T(G)):=\max_x \{|V_T(x)|\}-1\)
Lemma : If \(G\) is a tree , \(tw(T(G))=1\)
Def [treewidth for graph] : \[ tw(G)=\min\limits_{T(G)\text{ is a tree decomposition}}tw(T(G)) \]
Lemma [ bags and separators ] : \(T(G)\) is a tree decomposition of \(G\) , \(A,B\) are adjacent bags in \(T(G)\) , then
\(V_T(A)\) is a separator of \(G\) , \(V_T(A\cap B)\) is a separator of \(G\)
Independent Set on a graph with a tree decomposition
Input : given \(G\) and a tree decomposition \(T(G)\)
Goal : Find maximum Independent Set in Time Complexity \(\mathcal O(n2^{tw(T(G))})\)
Algorithm
\(opt(x,S)\) : consider the subtree \(x\) in the tree decomposition , choosing exactly \(S\) in \(V_T(x)\) , the maximum independent set
\[ opt(x,S)=w(S)+\sum_{y\in child(x)}\max_{S' , S\text{ consists with } S'\text{ on }S\cap S'}\{opt(y,S')-w(S\cap S')\} \]