Probability and Statistics

Chapter 2 Laws of Large Numbers

2.1 Independence

  1. THM [ Independence of r.v.s is a special case of independence of \(\sigma\)-fields ]

    1. THM

      1. If random variables \(X,Y\) are independent , then \(\sigma(X),\sigma(Y)\) are independent
      2. If \(\mathcal F\) and \(\mathcal G\) are independent , \(X\in \mathcal F,Y\in \mathcal G\) , then \(X,Y\) are independent
    2. Proof

      1. We want to show that \(\forall A\in \sigma(X),B\in \sigma(Y)\) , \(P(A\cap B)=P(A)P(B)\)

        By definition of \(\sigma(X)\) , we can find \(C\in \mathcal R\) s.t. \(A=\{X\in C\}\) . Similarly , we can find \(D\in \mathcal R\) s.t. \(B=\{Y\in D\}\). Therefore , \[ P(A\cap B)=P(\{X\in C\}\cap \{Y\in D\})=P(\{X\in C\})P(\{Y\in D\})=P(A)P(B) \]

      2. We want to show that \(\forall C,D\in \mathcal R\) , \(P(\{X\in C\}\cap \{Y\in D\})=P(\{X\in C\})P(\{Y\in D\})\).

        Let \(A=\{X\in C\}\) , \(B=\{Y\in D\}\) , so \(A\in \mathcal F,B\in \mathcal G\) , so \(A,B\) are independent, so \[ P(\{X\in C\}\cap \{Y\in D\})=P(A\cap B)=P(A)P(B)=P(\{X\in C\})P(\{Y\in D\}) \]

    3. Remark

      This means that \(X,Y\) independent is equivalent to \(\sigma(X),\sigma(Y)\) independent , so independence of random variables is a special case of independence of \(\sigma\)-fields.

  2. THM [ Independence of events is a special case of independence of r.v.s ]

    1. THM

      1. If \(A,B\) are independent , then \(A\perp\!\!\!\perp B^c\) , \(A^c\perp\!\!\!\perp B\) , \(A^c\perp\!\!\!\perp B^c\).
      2. \(A\perp\!\!\!\perp B\iff \mathbb 1_A\perp\!\!\!\perp\mathbb 1_B\)
    2. Proof

      1. \(P(A\cap B^c)=P(A)-P(A\cap B)\) , and \(P(A\cap B)=P(A)P(B)\) , so \[ P(A\cap B^c)=P(A)(1-P(B))=P(A)P(B^c) \] The remaining claims are proved similarly.

      2. First , if \(\mathbb 1_A\perp\!\!\!\perp\mathbb 1_B\) , then let \(C=D=\{1\}\) , so \(\{\mathbb 1_A\in C\}=A\) , \(\{\mathbb 1_B\in D\}=B\) , so \(P(A\cap B)=P(A)P(B)\).

        Second , if \(A\perp\!\!\!\perp B\) , we want to prove that \(\forall C,D\in \mathcal R\) , \(P(\{\mathbb 1_A\in C\}\cap \{\mathbb 1_B\in D\})=P(\{\mathbb 1_A\in C\})P(\{\mathbb 1_B\in D\})\).

        Note that \(\{\mathbb 1_A\in C\}\in\{\varnothing,A,A^c,\Omega\}\) and \(\{\mathbb 1_B\in D\}\in \{\varnothing,B,B^c,\Omega\}\).

        For \(\varnothing,\Omega\) the statement is trivial.

        For the remaining cases , one event is \(A\) or \(A^c\) and the other is \(B\) or \(B^c\) ; by (1) all four such pairs are independent , so the product formula holds in every case , hence \(\mathbb 1_A\perp\!\!\!\perp\mathbb 1_B\).

    3. Remark

      This means that \(A,B\) independent is equivalent to \(\mathbb 1_A,\mathbb 1_B\) independent , which is a special case of independence of random variables.

  3. Independence of finite collection of objects

    1. Def [ Independence of finite \(\sigma\)-fields ] : \(\mathcal F_1,\cdots,\mathcal F_n\) are independent , if \(\forall A_i\in \mathcal F_i\), \[ P\left(\bigcap_{i=1}^n A_i\right)=\prod_{i=1}^n P(A_i) \]

    2. Def [ Independence of finite random variables ] : \(X_1,\cdots,X_n\) are independent , if \(\forall B_i\in \mathcal R\), \[ P\left(\bigcap_{i=1}^n \{X_i\in B_i\}\right)=\prod_{i=1}^n P(\{X_i\in B_i\}) \]

    3. Def [ Independence of finite events ] : \(A_1,\cdots,A_n\) are independent , if \(\forall I\subseteq [n]\), \[ P\left(\bigcap_{i\in I} A_i\right)=\prod_{i\in I}P(A_i) \] Remark : we need to enumerate all subsets \(I\subseteq [n]\) , not just \(I=[n]\).

      This definition is consistent with the r.v. definition : let \(X_i=\mathbb 1_{A_i}\) and \(B_i=\begin{cases}\{1\}&i\in I\\\mathbb R&i\notin I\end{cases}\) ; then \(\{X_i\in B_i\}=A_i\) for \(i\in I\) and \(\{X_i\in B_i\}=\Omega\) for \(i\notin I\).

    4. Def [ pairwise independence ] : \(A_1,\cdots,A_n\) are pairwise independent , if \(\forall i\neq j\in [n]\) , \(A_i\perp\!\!\!\perp A_j\).

    5. Exp [ pairwise independence \(\not\Rightarrow\) joint independence ]

      When \(A_1,A_2,A_3\) are pairwise independent , \(A_1,A_2,A_3\) may not be independent.

      Let \(X_1,X_2,X_3\) be independent random variables , \(P(X_i=0)=P(X_i=1)=\frac{1}{2}\).

      Let \(A_1=\{X_1=X_2\}\) , \(A_2=\{X_2=X_3\}\) , \(A_3=\{X_1=X_3\}\).

      \(P(A_i)=\frac{1}{2}\) , \(P(A_i\cap A_j)=\frac{1}{4}=P(A_i)P(A_j)\) for \(i\neq j\) , but \(P(A_1\cap A_2\cap A_3)=\frac{1}{4}\neq \frac{1}{8}=P(A_1)P(A_2)P(A_3)\) , since \(A_1\cap A_2\subseteq A_3\). ( A computational check is sketched after this list. )

    6. Prop : If \(A_1,\cdots,A_n\) are independent,

      1. \(A_1^c,A_2,\cdots,A_n\) are independent
      2. \(\mathbb 1_{A_1},\mathbb 1_{A_2},\cdots,\mathbb 1_{A_n}\) are independent
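
    The example above can be verified by brute force. Below is a minimal Python sketch ( the sample space and events are exactly those of the example ; only the code organization is ours ) that enumerates the eight equally likely outcomes of \((X_1,X_2,X_3)\) and confirms pairwise but not joint independence.

    ```python
    import itertools

    # Exact computation on the 8-point sample space of (X1, X2, X3),
    # each outcome having probability 1/8.  A1 = {X1 = X2},
    # A2 = {X2 = X3}, A3 = {X1 = X3}.
    outcomes = list(itertools.product([0, 1], repeat=3))

    def prob(event):
        """P(event), where event is a predicate on an outcome (x1, x2, x3)."""
        return sum(1 for w in outcomes if event(w)) / len(outcomes)

    A = [lambda w: w[0] == w[1], lambda w: w[1] == w[2], lambda w: w[0] == w[2]]

    # Pairwise independence: P(Ai ∩ Aj) = 1/4 = P(Ai) P(Aj).
    for i, j in itertools.combinations(range(3), 2):
        assert prob(lambda w: A[i](w) and A[j](w)) == prob(A[i]) * prob(A[j])

    # But not joint independence: P(A1 ∩ A2 ∩ A3) = 1/4 != 1/8.
    print(prob(lambda w: A[0](w) and A[1](w) and A[2](w)),
          prob(A[0]) * prob(A[1]) * prob(A[2]))
    ```
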
  4. Independence of infinite collection of objects

    \(O_1,O_2,\cdots\) are independent if every finite sub-collection is independent ( each \(O_n\) may be a \(\sigma\)-field , an r.v. , or an event ).

    THM : \(X_1,X_2,\cdots\) are independent \(\iff\) \(\sigma(X_1),\sigma(X_2),\cdots\) are independent

  5. Sufficient conditions for independence

    1. (*) Def [ \(\pi\)-system ] : \(\mathcal A\) is a \(\pi\)-system , if \(\forall A,B\in \mathcal A\) , \(A\cap B\in \mathcal A\).

    2. (*) THM : Suppose \(\mathcal A_1,\cdots,\mathcal A_n\) are independent , and each \(\mathcal A_i\) is a \(\pi\)-system , then \(\sigma(\mathcal A_1),\sigma(\mathcal A_2),\cdots,\sigma(\mathcal A_n)\) are independent.

      The \(\mathcal A_1,\cdots,\mathcal A_n\) here are not necessarily \(\sigma\)-fields ; their independence is defined exactly as for \(\sigma\)-fields.

    3. Cor : If \(\forall x_1,\cdots,x_n\in (-\infty,\infty]\) , \[ P\left(\bigcap_{i=1}^n \{X_i\le x_i\}\right)=\prod_{i=1}^n P(\{X_i\le x_i\}) \] Then \(X_1,\cdots,X_n\) are independent.

      Let \(\mathcal A_i=\left\{\{X_i\le x_i\}:x_i\in (-\infty,\infty]\right\}\) ; since \(\{X_i\le x\}\cap \{X_i\le y\}=\{X_i\le x\land y\}\) , each \(\mathcal A_i\) is a \(\pi\)-system.

      Since we allow \(x_i=\infty\) ( so that \(\Omega\in \mathcal A_i\) ) , we get \(\sigma(\mathcal A_i)=\sigma(X_i)\) , hence the \(X_i\) are independent by the (*) THM.

    4. Cor : Suppose \(\mathcal F_{i,j}\) ( \(1\le i\le n,1\le j\le m(i)\) ) are independent , and let \(\mathcal G_i=\sigma(\cup_{j}\mathcal F_{i,j})\) ; then \(\mathcal G_1,\cdots,\mathcal G_n\) are independent.

      Let \(\mathcal A_i=\left\{\cap_j A_{i,j}:A_{i,j}\in \mathcal F_{i,j}\right\}\) ; then \(\mathcal A_i\) is a \(\pi\)-system containing \(\Omega\) and \(\cup_{j}\mathcal F_{i,j}\) ( take all but one \(A_{i,j}=\Omega\) ) , so \(\sigma(\mathcal A_i)=\mathcal G_i\) and the (*) THM applies.

    5. Cor : Suppose \(X_{i,j}\) ( \(1\le i\le n,1\le j\le m(i)\) ) are independent , \(f_i:\mathbb R^{m(i)}\to \mathbb R\) are measurable, then

      \(Y_i=f_i(X_{i,1},\cdots,X_{i,m(i)})\) are independent.

      Let \(\mathcal F_{i,j}=\sigma(X_{i,j})\) and \(\mathcal G_i=\sigma(\cup_{j}\mathcal F_{i,j})\) ; then \(Y_i\in \mathcal G_i\) , and the \(\mathcal G_i\) are independent by the previous Cor.

      Remark : when \(X_1,\cdots,X_n\) are independent , let \(X=X_1\) , \(Y=X_2X_3\cdots X_n\) ; then \(X\perp\!\!\!\perp Y\).
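
      As a sanity check of this remark , here is a small exact computation under assumptions of our own choosing ( three fair \(\pm 1\) coins , \(X=X_1\) , \(Y=X_2X_3\) , i.e. \(n=2\) , \(m(1)=1\) , \(m(2)=2\) ) :

      ```python
      import itertools
      from collections import Counter

      # Exact check that X = X1 is independent of Y = X2 * X3
      # when X1, X2, X3 are independent fair +/-1 coins.
      outcomes = list(itertools.product([-1, 1], repeat=3))  # prob 1/8 each
      n = len(outcomes)

      joint = Counter((w[0], w[1] * w[2]) for w in outcomes)
      px = Counter(w[0] for w in outcomes)          # law of X
      py = Counter(w[1] * w[2] for w in outcomes)   # law of Y

      for (a, b), c in joint.items():
          assert c / n == (px[a] / n) * (py[b] / n)
      print("X1 is independent of X2*X3 on this sample space")
      ```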

  6. Distribution and Expectation of independent random variables

    1. THM [ Distribution of Independent r.v. ] : Suppose \(X_1,\cdots,X_n\) are independent , and \(X_i\) has distribution \(\mu_i\) , then

      \((X_1,\cdots,X_n)\) has distribution \(\mu=\mu_1\times\cdots\times \mu_n\)

      Proof : \[ \begin{aligned} &\quad P((X_1,\cdots,X_n)\in A_1\times\cdots\times A_n)\\ &=P\left(\bigcap_{i=1}^n \{X_i\in A_i\}\right)\\ &=\prod_{i=1}^n P(\{X_i\in A_i\})\\ &=\prod_{i=1}^n \mu_i(A_i)\\ &=\mu(A_1\times\cdots\times A_n) \end{aligned} \] Since the measurable rectangles \(A_1\times\cdots\times A_n\) form a \(\pi\)-system generating \(\mathcal R^n\) , this determines \(\mu\).

    2. THM [ Expectation of Independent r.v. ] :

      1. THM

        Suppose \(X\perp\!\!\!\perp Y\) , with distributions \(\mu,\nu\) . If \(h:\mathbb R^2\to\mathbb R\) is a measurable function , and either \(h\ge 0\) or \(E[|h(X,Y)|]<\infty\) , then \[ E[h(X,Y)]=\iint h(x,y)\mu(dx)\nu(dy) \] In particular , if \(h(x,y)=f(x)g(y)\) , where \(f,g:\mathbb R\to\mathbb R\) are measurable functions , and either \(f,g\ge 0\) or \(E[|f(X)|],E[|g(Y)|]<\infty\) , then \[ E[f(X)g(Y)]=E[f(X)]E[g(Y)] \] ( A Monte Carlo illustration is sketched after the remarks below. )

      2. Proof

        Since \(X,Y\) are independent , \(\mu\times \nu\) is the distribution of \((X,Y)\) . By Fubini's Theorem , \[ E[h(X,Y)]=\int hd(\mu\times \nu)=\iint h(x,y)\mu(dx)\nu(dy) \] When \(f,g\ge 0\) , \(h=fg\ge 0\) , so \[ E[f(X)g(Y)]=\iint f(x)g(y)\mu(dx)\nu(dy)=\int g(y)E[f(X)]\nu(dy)=E[f(X)]E[g(Y)] \] When \(E[|f(X)|],E[|g(Y)|]<\infty\) , applying the above to \(|f|,|g|\) gives \(E[|f(X)g(Y)|]=E[|f(X)|]E[|g(Y)|]<\infty\) , so \[ E[f(X)g(Y)]=\iint f(x)g(y)\mu(dx)\nu(dy)=\int g(y)E[f(X)]\nu(dy)=E[f(X)]E[g(Y)] \]

      3. Loophole : when \(f,g\ge 0\) , \(E[f(X)]=\infty\) and \(E[g(Y)]=0\) , the right-hand side is of the form \(\infty\cdot 0\) ; what is \(E[f(X)g(Y)]\) ?

        Fix : \(E[g(Y)]=0\) and \(g\ge 0\) , so \(g(Y)=0\) a.s. , so \(f(X)g(Y)=0\) a.s. , so \(E[f(X)g(Y)]=0\) ( consistent with the convention \(\infty\cdot 0=0\) ).

      4. Remarks :

        1. This holds for \(n\) independent r.v.s : if \(X_1,\cdots,X_n\) are independent , and either \(\forall i\in [n],X_i\ge 0\) or \(\forall i\in [n] , E[|X_i|]<\infty\) , then \[ E\left[\prod_{i=1}^n X_i\right]=\prod_{i=1}^n E[X_i] \]

        2. Even if \(X,Y\) are not independent , \(E[XY]=E[X]E[Y]\) can still hold .

          Def [ uncorrelated ] : If \(E[X^2],E[Y^2]<\infty\) , and \(E[XY]=E[X]E[Y]\) , then \(X,Y\) are uncorrelated.
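
      As promised , a Monte Carlo sketch of \(E[f(X)g(Y)]=E[f(X)]E[g(Y)]\) . The distributions and test functions below are our own choices for illustration ( \(X\sim N(0,1)\) , \(Y\sim \mathrm{Exp}(1)\) , \(f(x)=x^2\) , \(g(y)=\cos y\) , so both sides should be near \(1\cdot\frac{1}{2}\) ) ; agreement is only up to \(O(1/\sqrt n)\) sampling error :

      ```python
      import numpy as np

      # Monte Carlo check of E[f(X) g(Y)] = E[f(X)] E[g(Y)] for
      # independent X ~ N(0,1) and Y ~ Exp(1).
      rng = np.random.default_rng(0)
      n = 10**6
      X = rng.standard_normal(n)
      Y = rng.exponential(1.0, n)

      f = lambda x: x**2          # E[f(X)] = 1
      g = lambda y: np.cos(y)     # E[g(Y)] = 1/2 for Exp(1)

      lhs = np.mean(f(X) * g(Y))
      rhs = np.mean(f(X)) * np.mean(g(Y))
      print(lhs, rhs)             # both close to 0.5
      ```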

  7. Sum of Independent Random Variables

    1. THM : If \(X,Y\) are independent , \(F(x)=P(X\le x) , G(y)=P(Y\le y)\) , then \[ P(X+Y\le z)=\int F(z-y)d G(y) \] Here , if \(\nu\) is the distribution of \(Y\) , then \(dG(y)\) means \(\nu(dy)\) .

      Remark : this is also called the convolution of \(F\) and \(G\) , denoted as \(F*G\) . \[ (F*G)(z)=\int F(z-y)dG(y) \]

    2. THM : Suppose \(X\) has density \(f\) , \(Y\) has distribution function \(G\) , and \(X,Y\) are independent ; then \(X+Y\) has density \(h\) : \[ h(x)=\int f(x-y)dG(y) \] Moreover , when \(Y\) has density \(g\) , \[ h(x)=\int f(x-y)g(y)dy \]
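
      A rough numerical sketch of this convolution formula , under assumptions of our own choosing ( \(X,Y\) independent standard normals , so \(X+Y\sim N(0,2)\) gives a known answer to compare against ) , on a uniform grid :

      ```python
      import numpy as np

      # Discretized convolution h(x) = ∫ f(x - y) g(y) dy for two
      # independent standard normal densities; X + Y should be N(0, 2).
      grid = np.linspace(-10, 10, 2001)       # symmetric grid around 0
      dx = grid[1] - grid[0]
      f = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)   # density of X
      g = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)   # density of Y

      h = np.convolve(f, g, mode="same") * dx         # h on the same grid
      target = np.exp(-grid**2 / 4) / np.sqrt(4 * np.pi)  # N(0,2) density
      print(np.max(np.abs(h - target)))       # small discretization error
      ```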

2.2 Conditional Expectation

  1. Conditioning on set

    Def [ conditioning on set ] : let \(A,B\) be two events with \(P(B)>0\) ; the probability of \(A\) given \(B\) is \(P(A|B)=\frac{P(A\cap B)}{P(B)}\).

  2. Conditioning on discrete random variables

    1. Derivation :

      Consider \(X,Z\) discrete , each taking finitely many values :

      \(X\in \{x_1,\cdots,x_m\},Z\in \{z_1,\cdots,z_n\}\) , with every \(P(Z=z_j)>0\) : \[ P(X=x_i|Z=z_j)=\frac{P(X=x_i,Z=z_j)}{P(Z=z_j)} \]

      Remark : \(\sum\limits_{i=1}^m P(X=x_i|Z=z_j)=1\)

      We can define \[ E[X|Z=z_j]=\sum_{i=1}^m x_iP(X=x_i|Z=z_j)=h(z_j) \] which is a function of \(z_j\) . Therefore , we define \(Y=E[X|Z]\) to be the random variable s.t.

      \(\forall w\in \Omega,Y(w)=h(Z(w))\)

    2. Properties

      1. \(E[X|Z]\) is a function of \(Z\)
      2. \(\forall G\in \sigma(Z)\) , \(\int_G YdP=\int_G XdP\)
    3. Proof of Prop. 2

      Consider \(G_i=\{w:Z(w)=z_i\}\) ; by definition of \(\sigma(Z)\) , there exists \(I\subseteq [n]\) s.t. \(G=\cup_{i\in I}G_i\).

      It then suffices to prove that \(\int_{G_i}YdP=\int_{G_i}XdP\) : \[ \begin{aligned} &\quad\int_{Z=z_i}YdP\\ &=h(z_i)P(Z=z_i)\\ &=\sum_{j=1}^m x_j P(X=x_j|Z=z_i)P(Z=z_i)\\ &=\sum_{j=1}^m x_j P(X=x_j,Z=z_i)\\ &=\int_{G_i}XdP \end{aligned} \]
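
      A minimal numerical sketch of this construction ( the joint pmf below is a randomly generated table , chosen only for illustration ) : compute \(h(z_j)=E[X|Z=z_j]\) from the table and verify Property 2 on each atom \(\{Z=z_j\}\) ; the identity here is exact , not Monte Carlo :

      ```python
      import numpy as np

      # pmf[i, j] = P(X = x_i, Z = z_j) for X in {0,1,2}, Z in {0,1}.
      rng = np.random.default_rng(1)
      pmf = rng.random((3, 2))
      pmf /= pmf.sum()
      xs = np.array([0.0, 1.0, 2.0])

      pz = pmf.sum(axis=0)                       # P(Z = z_j)
      h = (xs[:, None] * pmf).sum(axis=0) / pz   # h(z_j) = E[X | Z = z_j]

      for j in range(2):
          lhs = h[j] * pz[j]                     # ∫_{Z=z_j} Y dP
          rhs = (xs[:, None] * pmf)[:, j].sum()  # ∫_{Z=z_j} X dP
          assert np.isclose(lhs, rhs)
      print("Property 2 holds on each atom {Z = z_j}")
      ```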

  3. Conditioning on continuous random variables

    1. Derivation

      1. Def [ joint PDF ] : \(f_{X,Z}\) is joint PDF , if \(\forall B\in \mathcal R^2\) , \[ P((X,Z)\in B)=\int_{(x,z)\in B}f_{X,Z}(x,z)dxdz \]

      2. Def [ marginal PDF ] : \(f_Z\) is marginal PDF for \(Z\) , defined as \[ f_Z(z)=\int_{-\infty}^{\infty}f_{X,Z}(x,z)dx \]

      3. Prop : \[ \begin{aligned} P(Z\in A)&=P(Z\in A,X\in (-\infty,\infty))\\ &=\int_{Z\in A}\int_{-\infty}^{\infty} f_{X,Z}(x,z)dxdz\\ &=\int_{Z\in A}f_Z(z)dz \end{aligned} \]

      4. Def [ conditional PDF ] : \(f_{X|Z}\) is the conditional PDF , defined ( where \(f_Z(z)>0\) ) as \[ f_{X|Z}(x|z)=\frac{f_{X,Z}(x,z)}{f_Z(z)} \] Therefore \[ \begin{aligned} P(X\in A,Z\in B)&=\int_{Z\in B}\int_{X\in A}f_{X,Z}(x,z)dxdz\\ &=\int_{Z\in B}\int_{X\in A}f_{X|Z}(x|z)f_Z(z)dxdz\\ &=\int_{Z\in B} f_Z(z)\left(\int_{X\in A}f_{X|Z}(x|z)dx\right)dz \end{aligned} \] Compare to the discrete version : \[ P(X\in A,Z\in B)=\sum_{z_i\in B}P(X\in A|Z=z_i) P(Z=z_i) \] This motivates using \(\int_{X\in A} f_{X|Z}(x|z)dx\) to denote \(P(X\in A|Z=z)\) , so that \(P(X\in A,Z\in B)=\int_{Z\in B}f_Z(z)P(X\in A|Z=z)dz\).

      5. Def [ conditional expected value for continuous r.v. ] : \[ h(z)=E[X|Z=z]=\int xf_{X|Z}(x|z)dx \] which is a function of \(z\) . Let \(Y=h\circ Z\) .

    2. Properties

      1. \(E[X|Z]\) is a function of \(Z\)
      2. \(\forall G\in \sigma(Z)\) , \(\int_G YdP=\int_G XdP\)
    3. Proof of Prop.2

      Let \(Z_G=\{z:\exists w\in G,Z(w)=z\}\) \[ \begin{aligned} \int_{G}YdP&=\int_{G}h\circ ZdP\\ &=\int_{Z_G}h(z)f_Z(z)dz\quad\quad\text{using change of variable formula}\\ &=\int_{Z_G}f_Z(z)\int_{-\infty}^{\infty} xf_{X|Z}(x|z)dxdz\\ &=\int_{Z_G}\int_{-\infty}^{\infty} xf_{X,Z}(x,z)dxdz\\ &=\int_{(X,Z)\in \mathbb R\times Z_G} xf_{X,Z}(x,z)dxdz\\ &=\int_{(X,Z)\in \mathbb R\times Z_G} x\mu_{X,Z}(dxdz)\\ &=\int_{\tilde G}XdP\quad\quad\quad\quad\quad\text{using change of variable formula} \end{aligned} \] Where \(\tilde G=(X,Z)^{-1}(\mathbb R\times Z_G)=\{w:(X(w),Z(w))\in \mathbb R\times Z_G\}=\{w:Z(w)\in Z_G\}\).

      First , for any \(w\in G\) , \(Z(w)\in Z_G\) , so \(w\in \tilde G\) ; hence \(G\subseteq \tilde G\) .

      Second , for any \(w\in \tilde G\) , \(Z(w)\in Z_G\) , so \(\exists w_0\in G\) with \(Z(w)=Z(w_0)\) .

      Lemma : let \(Z\) be a random variable and \(G\in \sigma(Z)\) . If \(w,w'\in \Omega\) satisfy \(Z(w)=Z(w')\) , then \(w\in G\iff w'\in G\).

      Proof : Let \(G_z=\{w:Z(w)=z\}\) and \(\mathcal G=\{\cup_{z\in B}G_z:B\in \mathcal R\}\) ; note \(\cup_{z\in B}G_z=Z^{-1}(B)\) , and since preimages preserve complements and unions , \(\mathcal G\) is a \(\sigma\)-field .

      Since \(\forall B\in \mathcal R\) , \(\{w:Z(w)\in B\}=\cup_{z\in B}G_z\in \mathcal G\) , \(Z\) is a measurable map from \((\Omega,\mathcal G)\) to \((\mathbb R,\mathcal R)\)

      Therefore , \(\sigma(Z)\subseteq \mathcal G\) ( since \(\sigma(Z)\) is the smallest such \(\sigma\)-field ). Therefore , every set in \(\sigma(Z)\) is of the form \(\cup_{z\in B}G_z\) with \(B\in \mathcal R\) , so \(G\) contains either both of \(w,w'\) or neither of them. \(\Box\)

      Since \(w_0\in G\) and \(Z(w)=Z(w_0)\) , the Lemma gives \(w\in G\) ; hence \(\tilde G\subseteq G\) , and finally \(G=\tilde G\).
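
      Finally , a Monte Carlo illustration of Property 2 in the continuous case , under assumptions of our own choosing : \((X,Z)\) bivariate normal with unit variances and correlation \(\rho\) , for which \(E[X|Z=z]=\rho z\) is a standard fact , and \(G=\{Z>0\}\in \sigma(Z)\) :

      ```python
      import numpy as np

      # For (X, Z) bivariate normal with unit variances and correlation
      # rho, h(z) = E[X | Z = z] = rho * z.  Check ∫_G Y dP = ∫_G X dP
      # for G = {Z > 0}; both sides ≈ rho * E[Z⁺] = rho / sqrt(2*pi).
      rng = np.random.default_rng(2)
      rho, n = 0.6, 10**6
      Z = rng.standard_normal(n)
      X = rho * Z + np.sqrt(1 - rho**2) * rng.standard_normal(n)

      Y = rho * Z                  # Y = h(Z) = E[X | Z]
      G = Z > 0
      print(np.mean(Y * G), np.mean(X * G))   # both ≈ 0.2394
      ```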