Algorithm Design 12

Chapter 9 Randomized Algorithms

9.3 Hashing

  1. Universal Hashing

    Intuition : we want the hash function to look "random"

    1. Def [ universal hash family ] A family $H$ of functions $h : U \to \{0, \dots, n-1\}$ is universal if $\forall u, v \in U$ , $u \neq v$ : $\Pr_{h \in H}\{h(u) = h(v)\} \le \frac{1}{n}$

    2. Design : Let $p \ge n$ be a prime number ( $n$ is the size of the hash table ; for simplicity let $p = n$ )

      For all $x \in U$ , write $x$ in base $p$ : $x = \sum_{i=0}^{r-1} x_i p^i$ , where $x_i \in \{0, \dots, p-1\}$ .

      $$H := \Big\{\, h_a(x) = \Big( \sum_{i=0}^{r-1} a_i x_i \Big) \bmod p \;:\; a = (a_0, \dots, a_{r-1}) ,\; a_i \in \{0, \dots, p-1\} \,\Big\}$$

      Goal : prove $\forall x, y \in U$ , $x \neq y$ : $\Pr_{a_i \in \{0,\dots,p-1\}}\{h_a(x) = h_a(y)\} \le \frac{1}{n}$

      Proof :

      Suppose $x = \sum_{i=0}^{r-1} x_i p^i$ and $y = \sum_{i=0}^{r-1} y_i p^i$ ; since $x \neq y$ , there exists $j$ such that $x_j \neq y_j$ .

      $$\Pr\{h_a(x) = h_a(y)\} = \Pr\Big\{ \sum_{i=0}^{r-1} a_i x_i \equiv \sum_{i=0}^{r-1} a_i y_i \pmod p \Big\} = \Pr\Big\{ a_j (x_j - y_j) \equiv \sum_{i=0, i \neq j}^{r-1} a_i (y_i - x_i) \pmod p \Big\} = \frac{1}{p}$$

      This is because $a_j$ is uniform over $\{0, \dots, p-1\}$ , and for every possible value of $\sum_{i=0, i \neq j}^{r-1} a_i (y_i - x_i)$ , there exists exactly one $a_j$ such that $a_j (x_j - y_j) \equiv \sum_{i=0, i \neq j}^{r-1} a_i (y_i - x_i) \pmod p$ , since $x_j - y_j \not\equiv 0 \pmod p$ is invertible modulo the prime $p$ .
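The family above can be sketched directly in Python. This is a minimal illustration, not part of the notes: the prime $p = 97$, digit count $r = 3$, and the pair $(x, y)$ are arbitrary choices used to check the collision bound empirically.

```python
import random

def make_universal_hash(p, r):
    """Sample h_a from H: h_a(x) = (sum_i a_i * x_i) mod p, where
    x_0, ..., x_{r-1} are the base-p digits of x."""
    a = [random.randrange(p) for _ in range(r)]
    def h(x):
        total = 0
        for i in range(r):
            total += a[i] * (x % p)  # x % p is the current base-p digit x_i
            x //= p
        return total % p
    return h

# Empirical check of the universality bound for one fixed pair x != y:
# over random draws of h, collisions should occur with frequency ~ 1/p.
p, r = 97, 3              # prime table size p; keys have r base-p digits
x, y = 12345, 54321       # two distinct keys in U = {0, ..., p^r - 1}
trials = 20_000
collisions = 0
for _ in range(trials):
    h = make_universal_hash(p, r)
    if h(x) == h(y):
        collisions += 1
print(collisions / trials)  # close to 1/p ≈ 0.0103
```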

  2. Perfect Hashing

    1. Def : static dictionary with no collision , $|U| = n$ , space $O(n)$ , query time $O(1)$ ( with an $h(x)$ oracle : evaluating $h(x)$ costs $O(1)$ time )

      static : only the lookup operation is supported , no insertion/deletion .

    2. FKS Perfect Hashing [1984]

      Intuition : 2-level data structure

      • Firstly , sample $h \in H$ to get a first-level hash table , space $O(n)$
      • Secondly , if index $i$ has a collision ( suppose the colliding set is $B_i = \{u \in U : h(u) = i\}$ ) , then sample $h_i \in H$ and build a second-level hash table of space $O(|B_i|^2)$ . If a collision still happens in the second level , resample $h_i$ until no collision happens .
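The two steps above can be sketched as a runnable toy. It substitutes a Carter–Wegman style family $h(x) = ((ax + b) \bmod P) \bmod m$ for the base-$p$ family of 9.3.1, and the names `sample_hash`, `fks_build`, `fks_lookup` are my own:

```python
import random

P = 2_147_483_647  # a large prime (2^31 - 1), assumed > any key

def sample_hash(m):
    """Sample h from a universal family: h(x) = ((a*x + b) mod P) mod m."""
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda x: ((a * x + b) % P) % m

def fks_build(keys):
    n = len(keys)
    # Level 1: hash the n keys into n buckets.
    h = sample_hash(n)
    buckets = [[] for _ in range(n)]
    for x in keys:
        buckets[h(x)].append(x)
    # Level 2: bucket B_i gets a table of size |B_i|^2; resample h_i
    # until the bucket hashes into it with no collision.
    tables, hashes = [], []
    for B in buckets:
        m = len(B) ** 2
        while True:
            hi = sample_hash(m) if m else None
            slots = [None] * m
            ok = True
            for x in B:
                j = hi(x)
                if slots[j] is not None:
                    ok = False   # second-level collision: resample
                    break
                slots[j] = x
            if ok:
                break
        tables.append(slots)
        hashes.append(hi)
    return h, hashes, tables

def fks_lookup(structure, x):
    h, hashes, tables = structure
    i = h(x)
    return bool(tables[i]) and tables[i][hashes[i](x)] == x
```

Lookup evaluates two hash functions and probes one slot, so it is $O(1)$ worst case once the structure is built.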
    3. Collision Analysis

      Claim : sample $h$ from a universal family $H$ , $h : [n] \to [M]$ . If $M = O(n^2)$ , then with probability $\ge 0.9$ there is no collision .

      Proof : $U = \{x_1, \dots, x_n\}$ . Define random variables $X_{i,j} = \begin{cases} 1 & h(x_i) = h(x_j) \\ 0 & \text{otherwise} \end{cases}$ and $X = \sum_{1 \le i < j \le n} X_{i,j}$ .

      $$E[X] = \sum_{i<j} E[X_{i,j}] = \sum_{i<j} \Pr\{h(x_i) = h(x_j)\} \le \binom{n}{2} \cdot \frac{1}{M}$$

      By Markov's Inequality , $\Pr\{X \ge 1\} \le E[X] \le \frac{n(n-1)}{2M} \le \frac{n^2}{2M}$ . Therefore , let $M = 5n^2$ , so $\Pr\{X \ge 1\} \le \frac{1}{10}$ .

    4. Space Analysis

      Claim : $E\big[ \sum_i |B_i|^2 \big] = O(n)$

      Proof : Let $Y_i = |B_i|$ and $Y = \sum_i Y_i^2$ . In the first level , the number of collisions is $X = \sum_i \binom{Y_i}{2} = \frac{1}{2} \sum_i (Y_i^2 - Y_i) = \frac{Y - n}{2}$ , using $\sum_i Y_i = n$ . As in the previous claim ( with $M = n$ ) , $E[X] \le \binom{n}{2} \cdot \frac{1}{n} = \frac{n-1}{2}$ , so $E[Y] = E[2X + n] \le 2n - 1 = O(n)$ .

  3. Balls-Bins Game

    1. $n$ bins , $n$ balls , $E[\text{max load}] = \Theta\!\big(\frac{\log n}{\log \log n}\big)$

      $X_i = \#$ balls in bin $i$ , $Y_{i,j} = \begin{cases} 1 & \text{ball } j \to \text{bin } i \\ 0 & \text{otherwise} \end{cases}$ , so $X_i = \sum_j Y_{i,j}$

      Let $c = \Theta\!\big(\frac{\log n}{\log \log n}\big)$ and write $c = 1 + \delta$ ; since $E[X_i] = 1$ , the Chernoff bound gives $\Pr\{X_i \ge c\} \le \frac{e^{c-1}}{c^c} \le \frac{1}{n^2}$ , and a union bound over the $n$ bins finishes the upper bound .
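A quick simulation consistent with this bound (the helper name `max_load` is mine): the maximum load grows extremely slowly with $n$.

```python
import random
from collections import Counter

def max_load(n):
    """Throw n balls into n bins uniformly at random; return the max load."""
    counts = Counter(random.randrange(n) for _ in range(n))
    return max(counts.values())

# Theta(log n / log log n) predicts very slow growth of the max load.
for n in (100, 10_000, 1_000_000):
    print(n, max_load(n))
```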

    2. Coupon Collector Problem

      $\sim n \ln n$ balls , $n$ bins , w.h.p. every bin is non-empty
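The threshold can be seen in simulation (the helper name `count_empty_bins` is mine): with about $n \ln n$ balls, the expected number of empty bins is $n (1 - \frac{1}{n})^{n \ln n} \approx n e^{-\ln n} = 1$, so the count behaves roughly like a Poisson(1) variable.

```python
import math
import random

def count_empty_bins(n, balls):
    """Throw `balls` balls into n bins uniformly; return # of empty bins."""
    hit = [False] * n
    for _ in range(balls):
        hit[random.randrange(n)] = True
    return hit.count(False)

n = 10_000
# Right at the n ln n threshold: empty-bin count is ~ Poisson(1).
print(count_empty_bins(n, int(n * math.log(n))))      # typically 0, 1 or 2
# Well past the threshold: essentially always 0 empty bins.
print(count_empty_bins(n, 2 * int(n * math.log(n))))
```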

    3. Load Balancing Problem

      $kn \log n$ balls , $n$ bins , w.h.p. the load of every bin $\in \big( (k-1)\log n ,\, (k+1)\log n \big)$ ( for $k$ not too small )

      Application : load balancing in network

    4. Power of two choices

      n bins , n balls

      For every ball , choose 2 random bins , and place the ball in the bin with smaller load .

      $E[\text{max load}] = \Theta(\log \log n)$
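Comparing the two processes in simulation (both helper names are mine) shows the dramatic drop in maximum load from one extra random choice:

```python
import random

def max_load_two_choices(n):
    """n balls into n bins; each ball picks 2 random bins and goes
    to the currently less-loaded one (ties broken toward the first)."""
    load = [0] * n
    for _ in range(n):
        i, j = random.randrange(n), random.randrange(n)
        k = i if load[i] <= load[j] else j
        load[k] += 1
    return max(load)

def max_load_one_choice(n):
    """Baseline: each ball picks a single random bin."""
    load = [0] * n
    for _ in range(n):
        load[random.randrange(n)] += 1
    return max(load)

n = 100_000
print("one choice :", max_load_one_choice(n))   # Theta(log n / log log n)
print("two choices:", max_load_two_choices(n))  # Theta(log log n)
```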

  4. (*) Cuckoo Hashing

    1. worst-case $O(1)$ lookup , dynamic dictionary

    2. Maintain 2 hash tables $T_1, T_2$ , with $h_1, h_2 \in H$

      Insert(x)

      1. If slot $h_1(x)$ in $T_1$ is empty , insert $x$ into $T_1$

      2. Otherwise , if $T_1(h_1(x))$ is occupied by some element $y$ , place $x$ into $T_1$ at $h_1(x)$ and move $y$ to $T_2$ at $h_2(y)$ ; if that slot is occupied too , its occupant is evicted back to $T_1$ in the same way .

        Follow this eviction procedure until an empty slot is found .

      3. If #iterations $\ge$ threshold $t$ , resample $h_1, h_2$ and rehash all elements

      If load $\le 50\%$ , $E[\text{insertion time}] = O(1)$

      Lookup(x) : check only the 2 slots $T_1(h_1(x))$ and $T_2(h_2(x))$ , so at most 2 probes .

      Delete(x) : Use lookup then delete .
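The procedures above can be sketched as a small class. This is a toy for integer keys, assuming a Carter–Wegman style hash family; the class API is my own and it is not a production implementation:

```python
import random

class CuckooHash:
    """Sketch of a cuckoo hash table for integer keys: every key x lives
    at T1[h1(x)] or T2[h2(x)], so lookup probes at most 2 slots."""

    P = 2_147_483_647  # prime modulus for the (a*x + b) mod P family

    def __init__(self, m=16):
        self.m = m
        self._resample()
        self.t = [[None] * m, [None] * m]

    def _resample(self):
        self.ab = [(random.randrange(1, self.P), random.randrange(self.P))
                   for _ in range(2)]

    def _h(self, side, x):
        a, b = self.ab[side]
        return ((a * x + b) % self.P) % self.m

    def lookup(self, x):
        return self.t[0][self._h(0, x)] == x or self.t[1][self._h(1, x)] == x

    def insert(self, x, threshold=32):
        if self.lookup(x):
            return
        side = 0
        for _ in range(threshold):
            i = self._h(side, x)
            self.t[side][i], x = x, self.t[side][i]  # place x, evict occupant
            if x is None:          # found an empty slot
                return
            side = 1 - side        # evicted key moves to the other table
        # Too many evictions: resample h1, h2 and rehash everything.
        items = [y for table in self.t for y in table if y is not None] + [x]
        self._resample()
        self.t = [[None] * self.m, [None] * self.m]
        for y in items:
            self.insert(y)

    def delete(self, x):
        for side in (0, 1):
            if self.t[side][self._h(side, x)] == x:
                self.t[side][self._h(side, x)] = None
```

Delete follows the note's recipe literally: look at the two candidate slots and clear the matching one.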

  5. Application : Closest Pair Problem

    using D&C : $O(n \log n)$

    using Hashing : $O(n)$ in expectation

    1. Hash table computation model :

      • Insertion : O(1) in expectation
      • Deletion : O(1) in expectation
      • Lookup : O(1) in expectation
    2. Algorithm :

      1. Order the points randomly as $P_1, \dots, P_n$ , let $\delta = d(P_1, P_2)$

      2. When we process $P_i$ , we check whether $\exists j < i$ s.t. $d(P_j, P_i) < \delta$ .

        If YES , we start a new phase with the new $\delta = d(P_j, P_i)$

      3. In each phase , maintain a grid with cell side length $\frac{\delta}{2}$ ; each cell contains $\le 1$ point of $\{P_1, \dots, P_{i-1}\}$ ( the cell diagonal is $\frac{\delta}{\sqrt{2}} < \delta$ )

        maintain all nonempty cells in a dictionary ( $U$ : the set of all cells )

        Suppose we are processing $P_i$ ; we only need to look up the $5 \times 5 = 25$ cells around $P_i$'s cell in the dictionary to find a smaller distance .

        Whenever we start a new phase , we reconstruct the whole dictionary .

    3. Analysis :

      $X_i = \begin{cases} 1 & P_i \text{ causes a new phase} \\ 0 & \text{otherwise} \end{cases}$ , running time $T = \sum_{i=1}^n \big( X_i \cdot O(i) + 25 \cdot O(1) \big)$ .

      Therefore , $E[T] = \Big( \sum_{i=1}^n O(i) \cdot \Pr\{X_i = 1\} \Big) + O(n)$ .

      Claim : $\Pr\{X_i = 1\} \le \frac{2}{i}$ ( by the initial random permutation )

      Proof : Suppose $(P_a, P_b)$ is the closest pair among $P_1, \dots, P_i$ ; a new phase starts at step $i$ only if $i \in \{a, b\}$ , so $\Pr\{X_i = 1\} \le \Pr\{a = i \vee b = i\} \le \Pr\{a = i\} + \Pr\{b = i\}$ .

      $\Pr\{a = i\} = \Pr\{b = i\} = \frac{1}{i}$ by symmetry of the random order , so $\Pr\{X_i = 1\} \le \frac{2}{i}$ , and $E[T] = \sum_{i=1}^n O(i) \cdot \frac{2}{i} + O(n) = O(n)$ .
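Putting the phases together, here is a sketch of the whole algorithm for distinct 2-D points; the function name `closest_pair` and the use of a plain Python dict as the cell dictionary are my choices.

```python
import math
import random
from itertools import product

def closest_pair(points):
    """Randomized closest pair: process points in random order, keep a
    grid of cell side delta/2 in a dict, rebuild it whenever delta shrinks."""
    pts = points[:]
    random.shuffle(pts)                      # the initial random permutation
    delta = math.dist(pts[0], pts[1])

    def cell(p):
        # Grid cell of point p for the CURRENT value of delta.
        return (int(p[0] // (delta / 2)), int(p[1] // (delta / 2)))

    def rebuild(upto):
        # Each cell holds at most one of the first `upto` points.
        grid = {}
        for q in pts[:upto]:
            grid[cell(q)] = q
        return grid

    grid = rebuild(2)
    for i in range(2, len(pts)):
        p = pts[i]
        cx, cy = cell(p)
        best = delta
        # Only the 5x5 block of cells around p can contain a point
        # at distance < delta from p.
        for dx, dy in product(range(-2, 3), repeat=2):
            q = grid.get((cx + dx, cy + dy))
            if q is not None:
                best = min(best, math.dist(p, q))
        if best < delta:
            delta = best             # new phase: smaller delta
            grid = rebuild(i + 1)    # reconstruct the whole dictionary
        else:
            grid[cell(p)] = p
    return delta
```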

9.4 Approximate Counting

  1. Approximate Counting Problem

    1. #P : the class of counting versions of NP problems

    2. #P-complete

      $L \in \text{NP-Complete} \Rightarrow \#L \in \text{\#P-Complete}$

      $\#L \in \text{\#P-Complete} \nRightarrow L \in \text{NP-Complete}$ : e.g. counting perfect matchings ( the decision version is in P )

    3. FPRAS : Fully-Poly Randomized Approximation Scheme

      An algorithm running in $\text{poly}(n, \frac{1}{\epsilon})$ time such that $\Pr\{(1-\epsilon) \cdot \text{ANS} \le \text{SOL} \le (1+\epsilon) \cdot \text{ANS}\} \ge 0.99$

    4. Scheme : Rejection Sampling

      General : to estimate $|G|$ , construct $U$ s.t. $G \subseteq U$ with :

      • Can uniformly sample from $U$
      • Membership oracle : given $x \in U$ , decide whether $x \in G$ or not
      • We know $|U|$

      Then sample from $U$ repeatedly and output $|U| \cdot \frac{\#\{\text{samples in } G\}}{\#\{\text{samples}\}}$ as the estimate of $|G|$ .
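A classic toy instance of this scheme (all names here are illustrative): take $U$ to be the unit square, with "$|U|$" its area 1, and $G$ the quarter disk $x^2 + y^2 \le 1$, so the rejection-sampling estimate of $4|G|$ approaches $\pi$.

```python
import random

def rejection_count(sample_u, in_g, size_u, trials):
    """Estimate |G| via rejection sampling: |G| ≈ |U| * (hits / trials)."""
    hits = sum(1 for _ in range(trials) if in_g(sample_u()))
    return size_u * hits / trials

est = rejection_count(
    sample_u=lambda: (random.random(), random.random()),  # uniform on U
    in_g=lambda p: p[0] ** 2 + p[1] ** 2 <= 1.0,          # membership oracle
    size_u=1.0,                                           # known |U|
    trials=200_000,
)
print(4 * est)  # close to pi ≈ 3.1416
```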