Algorithm Design 3

Chapter 2 Greedy Algorithm

2.3 Minimum Spanning Tree (undirected)

  1. Definition

    1. Input : undirected , edge-weighted graph \(G\)

    2. Output : Minimum Spanning Tree of \(G\)

      Spanning Tree : a subgraph of \(G\) that is a tree containing all vertices

      Minimum : the total weight over all of the tree's edges is minimized

  2. Properties

    1. Spanning Tree

      1. Connectivity \(\leftrightarrow\) check connectivity \(\to\) cut
      2. Acyclic
    2. Cut

      1. Def : Graph \(G=(V,E),V=A\cup B,A\cap B=\varnothing\)

        \((A,B)\)-cut : \(E(A,B)=\{(u,v)\in E|u\in A,v\in B\}\)

      2. Observation : If \(T\) is a Spanning Tree and \(E(A,B)\) is any cut , then \(T\cap E(A,B)\neq \varnothing\)

        Otherwise \(A\) and \(B\) would be disconnected in \(T\)

    3. Lemma : Assumption : weights are distinct

      Suppose \(e=(u,v),u\in A,v\in B\) is the edge with minimum weight in \((A,B)\)-cut , then every MST must contain \(e\) .

      Proof : [Exchange Argument , Prove by contradiction]

      Suppose \(T\) is an MST but \(e\notin T\) . By the observation , there exists \(e'\in T\) with \(e'\in E(A,B)\) .

      Let \(T'=T-e'+e\) ; then \(T'\) is still connected , with smaller total weight .

Problem : \(T'\) may contain a cycle , so it need not be a spanning tree !

Correction :
Choose one specific \(e'\) : since \(e\notin T\) , adding \(e\) to \(T\) creates a cycle ( \(e\) plus the path from \(u\) to \(v\) in \(T\) ). This cycle crosses the cut , so it contains an edge \(e'\in E(A,B)\) with \(e'\neq e\) ; removing this \(e'\) gives a spanning tree \(T'\) with smaller weight , a contradiction .

  1. Kruskal's Algorithm

    1. Algorithm

      Successively insert edges from \(E\) in increasing order of weight .

      If adding edge \(e\) would create a cycle , discard it .

      (Use a Union-Find Set to detect cycles)

    2. Proof of Correctness

      Consider each added edge \(e=(u,v)\) : we show \(e\) is the min-weight edge in some cut .

      Let \(C\) be the connected component containing \(u\) just before \(e\) is added ; then \(e\) is the minimum-weight edge in the cut \(E(C,V\backslash C)\) , so by the lemma it belongs to the MST .

  2. Prim's Algorithm

    1. Algorithm

      At each step , add the node that can be attached as cheaply as possible to the partial tree we already have .

    2. Naive Implementation

      // d[x] : weight of the minimum edge in the cut E({x},S)
      // Init
      for (v in V){
          d[v]=+inf;
      }
      d[1]=0;
      while(S!=V){
          int x=argmin(d[v] | v in V\S);
          S.insert(x); // the edge realizing d[x] joins the tree (for x!=1)
          for(y in Neighbour(x)){
              if (y not in S)
                  d[y]=min(d[y],w(x,y));
          }
      }
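      The naive implementation above can also be written as a runnable sketch; the adjacency-map graph representation and function name are assumptions for illustration.

      ```python
      import math

      def prim_naive(n, w):
          """Naive Prim, O(n^2): vertices are 0..n-1, w maps (u, v) -> weight
          (undirected). Returns the total weight of an MST."""
          adj = {v: {} for v in range(n)}
          for (u, v), wt in w.items():
              adj[u][v] = wt
              adj[v][u] = wt
          d = [math.inf] * n      # d[x]: cheapest edge between x and the tree S
          d[0] = 0
          in_S = [False] * n
          total = 0
          for _ in range(n):
              # argmin over vertices not yet in S
              x = min((v for v in range(n) if not in_S[v]), key=lambda v: d[v])
              in_S[x] = True
              total += d[x]       # edge realizing d[x] joins the tree (0 for the start)
              for y, wt in adj[x].items():
                  if not in_S[y] and wt < d[y]:
                      d[y] = wt
          return total
      ```

      Each round does a linear scan for the argmin, which is exactly the cost the priority-queue improvement below removes.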
    3. Improvement : Priority Queue

  3. Reverse Deletion

    Successively removing edges from \(E\) in decreasing order of weight .

    If removing edge \(e\) would cause disconnectedness , discard it .

    The reverse version of Kruskal's Algorithm
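    A minimal sketch of Reverse Deletion, using a plain DFS connectivity check (the edge-list format and names are illustrative assumptions; this is far from the fastest way to test connectivity):

    ```python
    def reverse_delete(n, edges):
        """Reverse-delete: scan edges in decreasing order of weight; remove an
        edge whenever the graph stays connected without it.
        edges: list of (weight, u, v) tuples over vertices 0..n-1."""
        def connected(edge_list):
            # plain DFS from vertex 0
            adj = {v: [] for v in range(n)}
            for _, u, v in edge_list:
                adj[u].append(v)
                adj[v].append(u)
            seen, stack = {0}, [0]
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            return len(seen) == n

        kept = list(edges)
        for e in sorted(edges, reverse=True):   # decreasing weight
            trial = [f for f in kept if f != e]
            if connected(trial):                # removing e keeps G connected
                kept = trial
        return sum(wt for wt, _, _ in kept)
    ```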

  4. Faster algorithms

    [Chazelle] Deterministic algorithm , \(\mathcal O(|E|\alpha(|E|))\)

    This means computing an MST takes less time than sorting the edges

    [Karger , Klein , Tarjan] Randomized algorithm , expected time \(\mathcal O(|E|+|V|)\)

    Open Question : a deterministic linear-time algorithm

2.4 Union-Find Set

  1. Definition

    Maintain sets of elements , support :

    1. find(element e) , return the set that contains \(e\)
    2. union(set Si,set Sj) , combine set \(S_i\) and \(S_j\) , return the union set \(S_i\cup S_j\)
  2. Implementation

    The sets can be viewed as trees .

    The set can be represented by the root of the corresponding tree .

    find : walk up the parent pointers until reaching the root of the tree

    union : \(fa(root(S_1))\leftarrow root(S_2)\) , i.e. make one root the parent of the other

    Problem : the tree can become very deep , making find slow

  3. Improve

    1. Heuristic merging / union by size

      To keep the tree shallow , suppose \(|S_1|\le |S_2|\)

      Merge the smaller set into the larger one

    2. Path Compression

      Suppose one find visits \(e,v_1,\cdots,v_k,root\) ; after the find , set \(fa(e)=fa(v_1)=\cdots=fa(v_k)=root\)
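    The two improvements can be combined in one sketch (the class and attribute names are illustrative):

    ```python
    class UnionFind:
        """Disjoint sets with path compression and union by size
        (merge the smaller tree into the larger), as described above."""
        def __init__(self, n):
            self.fa = list(range(n))   # fa[x]: parent; a root is its own parent
            self.size = [1] * n

        def find(self, x):
            root = x
            while self.fa[root] != root:
                root = self.fa[root]
            while self.fa[x] != root:  # path compression: point everyone at root
                self.fa[x], x = root, self.fa[x]
            return root

        def union(self, x, y):
            rx, ry = self.find(x), self.find(y)
            if rx == ry:
                return rx
            if self.size[rx] > self.size[ry]:
                rx, ry = ry, rx        # rx is now the smaller tree's root
            self.fa[rx] = ry           # merge smaller into larger
            self.size[ry] += self.size[rx]
            return ry
    ```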

  4. Time Complexity

    Amortized Analysis (using both Path Compression and Heuristic merging)

    Suppose we have a sequence of find or union operations , with length \(m\) .

    Total running time of union-find is \(\mathcal O(m\alpha(m,n))\)

    \(\alpha\) : inverse Ackermann function , grows extremely slowly

    Another slowly growing function : \(\log^*(n)\) , the number of times \(\log\) must be applied to bring \(n\) down to a small constant (i.e. \(\log(\log\cdots\log(n))\le 5\))

    See [CLRS] for detailed proof .
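    To get a feel for how slowly \(\log^*\) grows, here is a small sketch (using the common convention of iterating until \(n\le 1\)):

    ```python
    import math

    def log_star(n):
        """Iterated logarithm: number of times log2 must be applied until n <= 1."""
        count = 0
        while n > 1:
            n = math.log2(n)
            count += 1
        return count
    ```

    Already \(\log^*(65536)=4\), and \(\log^*(2^{65536})\) would be just 5.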

  5. Kruskal's Algorithm Implementation

    sort(E); // weight-increasing
    for(edge e=(u,v):E){
        if(find(u)!=find(v)){ // e does not create a cycle
            T.insert(e);
            union(find(u),find(v));
        }
    }

    Total time complexity : \(\mathcal O(|E|\log|E|)\) .
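    The same loop in runnable form, with a minimal union-find inlined (path halving only; the function name and return format are illustrative choices):

    ```python
    def kruskal(n, edges):
        """Kruskal's algorithm: sort edges by weight, keep an edge iff its
        endpoints lie in different components.
        edges: list of (weight, u, v); returns (total weight, tree edges)."""
        fa = list(range(n))                 # minimal union-find

        def find(x):
            while fa[x] != x:
                fa[x] = fa[fa[x]]           # path halving
                x = fa[x]
            return x

        total, tree = 0, []
        for wt, u, v in sorted(edges):      # weight-increasing
            ru, rv = find(u), find(v)
            if ru != rv:                    # e does not create a cycle
                fa[ru] = rv
                tree.append((u, v))
                total += wt
        return total, tree
    ```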

2.5 Priority Queue

  1. Definition

    Maintain a set of elements \(S\) , support :

    insert(S,x) , insert \(x\) into \(S\) .

    extract_max(S) , return the maximum element in \(S\) , and then remove this element .

    max(S) , return the maximum element in \(S\)

    increase_key(S,x,k) , increase value of \(x\) by \(k\)

  2. (Basic) (max) Heap

    1. Property : Complete binary tree , where each parent \(\ge\) both of its children

    2. Can be implemented in an array

      fa[x]=x/2 , left_child[x]=2x , right_child[x]=2x+1

    insert(S,x){ // sift up , O(log n)
        val[|S|]=x;|S|++;
        int p=|S|-1;
        while(p!=root&&val[p]>val[fa[p]]){
            swap(val[p],val[fa[p]]);
            p=fa[p];
        }
    }
    // Note : building a heap of n elements bottom-up takes O(n)
    extract_max(S){ // sift down , O(log n)
        int ret_val=val[root];
        swap(val[root],val[|S|-1]); // swap the root and the last element
        |S|--; // delete original root
        int p=root;
        // a child with index >= |S| does not exist and is treated as -inf
        while(val[p]<val[left_child[p]]||val[p]<val[right_child[p]]){
            if(val[left_child[p]]>val[right_child[p]]){
                swap(val[p],val[left_child[p]]);
                p=left_child[p];
            }
            else{
                swap(val[p],val[right_child[p]]);
                p=right_child[p];
            }
        }
        return ret_val;
    }
    max(S){ // O(1)
        return val[root];
    }
    increase_key(S,p,k){ // sift up , O(log n)
        val[p]=val[p]+k;
        while(p!=root&&val[p]>val[fa[p]]){
            swap(val[p],val[fa[p]]);
            p=fa[p];
        }
    }
  3. Advanced : Fibonacci Heap

    Time complexity (amortized-time)

    operations     binary heap              Fibonacci heap
    insert         \(\mathcal O(\log n)\)   \(\mathcal O(1)\)
    extract_max    \(\mathcal O(\log n)\)   \(\mathcal O(\log n)\)
    max            \(\mathcal O(1)\)        \(\mathcal O(1)\)
    increase_key   \(\mathcal O(\log n)\)   \(\mathcal O(1)\)

    See [CLRS] for detailed Fibonacci heap .

  4. Prim's Algorithm with Priority Queue

    // using a min priority queue keyed by d[]
    for (v in V\{1}){
        d[v]=+inf;
        inS[v]=false;
        PQ.insert((v,+inf));
    }
    d[1]=0;inS[1]=false;
    PQ.insert((1,0)); // index 1 , value 0
    for(int i=1;i<=n;++i){ // n rounds , one vertex added per round
        int x=PQ.extract_min(); // argmin value
        inS[x]=true; // the edge realizing d[x] joins the tree (for x!=1)
        for(y in Neighbour(x)){
            if(!inS[y] && w(x,y)<d[y]){
                PQ.decrease_key(y,d[y]-w(x,y));
                d[y]=w(x,y);
            }
        }
    }

    Time Complexity : \(\mathcal O(|E|\log |V|)\) .

  5. Dijkstra's Algorithm with Priority Queue

    // using a min priority queue keyed by d[]
    for(v in V\{s}){
        d[v]=+inf;
        inS[v]=false;
        PQ.insert((v,+inf));
    }
    d[s]=0;inS[s]=false;
    PQ.insert((s,0));
    for(int i=1;i<=n;++i){ // n rounds , one vertex finalized per round
        int x=PQ.extract_min();
        inS[x]=true;
        for(y in Neighbour(x)){
            if(!inS[y] && d[y]>d[x]+w(x,y)){
                PQ.decrease_key(y,d[y]-(d[x]+w(x,y)));
                d[y]=d[x]+w(x,y);
            }
        }
    }

    Time Complexity : \(\mathcal O(|E|\log |V|)\) .

2.6 Huffman Code

  1. Code Definition : encode a set \(S\) of letters into binary strings , \(r:S\to \{0,1\}^*\)

    Properties

    1. \(r\) is one-to-one / injection
    2. encoding / decoding efficiently
    3. minimize average length of the code
    4. [optional] robust to errors , Error Correction Code
  2. Prefix Code

    1. Def : \(\forall x,y\in S\) , \(r(x)\) is not a prefix of \(r(y)\)

    2. Property : useful for decoding (unique decode) , can decode greedily

    3. e.g. \(S=\{a,b,c,d,e\}\)

      Prefix Code \[ r(a)=11,r(b)=01,r(c)=001,r(d)=10,r(e)=000 \]

      \[ decode(0000011101)=decode(000\ 001\ 11\ 01)=ecab \]

      non-Prefix Code ( \(r(b)\) is a prefix of \(r(a)\) ) \[ r(a)=110,r(b)=11,r(c)=01 \]

      \[ decode(11011)=decode(110\ 11)\ \text{or}\ decode(11\ 01\ 1)\ \text{: ambiguous} \]

    4. Prefix Code can be represented using a binary tree (left edge = 0 , right edge = 1)

      Leafy tree : each leaf corresponds to a letter , and its codeword is the root-to-leaf path
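      Greedy decoding of a prefix code can be sketched directly; the code table in the test is the example from these notes:

      ```python
      def greedy_decode(code, bits):
          """Scan the bit string left to right, emitting a letter as soon as
          the accumulated buffer equals some codeword. Correct exactly
          because no codeword is a prefix of another."""
          inverse = {v: k for k, v in code.items()}
          out, buf = [], ""
          for b in bits:
              buf += b
              if buf in inverse:
                  out.append(inverse[buf])
                  buf = ""
          return "".join(out)
      ```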

  3. Minimize the length

    1. Average encoding length

      Suppose each letter has a frequency \(p(x)\) , \(\sum_{x\in S}p(x)=1\) .

      Average encoding length : \[ AEL(r)=\sum_{x\in S}p(x)|r(x)| \]

    2. [Shannon] Source Coding Theorem \[ \forall r,AEL(r)\ge \sum_{x\in S}-p(x)\log p(x) =:H \] See : [Thomas Cover] Information Theory
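    The bound can be checked numerically; the uniform frequencies in the test are an assumed example (not from the notes), chosen so that \(AEL=H\) exactly:

    ```python
    import math

    def entropy(p):
        """H = -sum p(x) * log2 p(x): Shannon's lower bound on AEL."""
        return -sum(px * math.log2(px) for px in p.values())

    def ael(p, code):
        """Average encoding length: sum p(x) * |r(x)|."""
        return sum(p[x] * len(code[x]) for x in p)
    ```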

  4. Lemma

    There is an optimal code (or an optimal binary tree) in which the two lowest-frequency letters are assigned to leaves that are as deep as possible , and are siblings .

    Proof : swapping letters within the same level does not change AEL ; across levels , use an exchange argument

  5. Huffman's Code

    Initially , construct a set \(S\) containing all letters.

    while(|S|>1){
        elem x=extract_min(S); // the two lowest-frequency elements
        elem y=extract_min(S);
        elem z=new node with children x,y and p(z)=p(x)+p(y);
        insert(S,z);
    }
    // the last remaining element of S is the root of the Huffman tree
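    The loop above, completed so that merged elements remember their children and the final tree yields the codewords (a sketch using Python's heapq; tie-breaking by an insertion counter is an implementation choice):

    ```python
    import heapq

    def huffman(freq):
        """Build a Huffman code. freq: dict letter -> frequency.
        Returns dict letter -> codeword."""
        # heap entries are (frequency, tiebreak id, node); a node is either
        # a letter or a (left, right) pair for a merged internal element
        heap = [(p, i, x) for i, (x, p) in enumerate(freq.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            px, _, x = heapq.heappop(heap)   # two lowest-frequency elements
            py, _, y = heapq.heappop(heap)
            heapq.heappush(heap, (px + py, counter, (x, y)))
            counter += 1

        code = {}

        def walk(node, prefix):
            if isinstance(node, tuple):      # internal node: recurse on children
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                code[node] = prefix or "0"   # lone-letter alphabet edge case

        walk(heap[0][2], "")
        return code
    ```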