Probability and Statistics 5
Chapter 2 Laws of Large Numbers
2.1 Independence
THM [ Independence of r.v.s is a special case of independence of \(\sigma\)-fields ]
THM
- If random variables \(X,Y\) are independent, then \(\sigma(X),\sigma(Y)\) are independent
- If \(\mathcal F\) and \(\mathcal G\) are independent, \(X\in \mathcal F\), \(Y\in \mathcal G\) (i.e. \(X\) is \(\mathcal F\)-measurable and \(Y\) is \(\mathcal G\)-measurable), then \(X,Y\) are independent
Proof
For (1) : we want to show that \(\forall A\in \sigma(X)\), \(B\in \sigma(Y)\), \(P(A\cap B)=P(A)P(B)\).
By definition of \(\sigma(X)\), we can find \(C\in \mathcal R\) s.t. \(A=\{X\in C\}\). Similarly, we can find \(D\in \mathcal R\) s.t. \(B=\{Y\in D\}\). Therefore, by independence of \(X,Y\), \[ P(A\cap B)=P(\{X\in C\}\cap \{Y\in D\})=P(\{X\in C\})P(\{Y\in D\})=P(A)P(B) \]
For (2) : we want to show that \(\forall C,D\in \mathcal R\), \(P(\{X\in C\}\cap \{Y\in D\})=P(\{X\in C\})P(\{Y\in D\})\).
Let \(A=\{X\in C\}\) and \(B=\{Y\in D\}\); then \(A\in \mathcal F\), \(B\in \mathcal G\), so \(A,B\) are independent, and \[ P(\{X\in C\}\cap \{Y\in D\})=P(A\cap B)=P(A)P(B)=P(\{X\in C\})P(\{Y\in D\}) \]
Remark
This means that \(X,Y\) independent is equivalent to \(\sigma(X),\sigma(Y)\) independent, which is a special case of independence of \(\sigma\)-fields.
THM [ Independence of events is a special case of independence of r.v.s ]
THM
- If \(A,B\) are independent, then \(A\perp\!\!\!\perp B^c\), \(A^c\perp\!\!\!\perp B\), \(A^c\perp\!\!\!\perp B^c\).
- \(A\perp\!\!\!\perp B\iff \mathbb 1_A\perp\!\!\!\perp\mathbb 1_B\)
Proof
\(P(A\cap B^c)=P(A)-P(A\cap B)\) and \(P(A\cap B)=P(A)P(B)\), so \[ P(A\cap B^c)=P(A)(1-P(B))=P(A)P(B^c) \] The other two pairs are proved similarly.
First, if \(\mathbb 1_A\perp\!\!\!\perp\mathbb 1_B\), let \(C=D=\{1\}\); then \(\{\mathbb 1_A\in C\}=A\) and \(\{\mathbb 1_B\in D\}=B\), so \(P(A\cap B)=P(A)P(B)\).
Second, if \(A\perp\!\!\!\perp B\), we want to prove that \(\forall C,D\in \mathcal R\), \(P(\{\mathbb 1_A\in C\}\cap\{\mathbb 1_B\in D\})=P(\{\mathbb 1_A\in C\})P(\{\mathbb 1_B\in D\})\).
Note that \(\{\mathbb 1_A\in C\}\in\{\varnothing,A,A^c,\Omega\}\) and \(\{\mathbb 1_B\in D\}\in \{\varnothing,B,B^c,\Omega\}\).
For \(\varnothing,\Omega\) the statement is trivial.
For the pairs drawn from \(\{A,A^c\}\) and \(\{B,B^c\}\), by (1) all four pairs are independent, so \(\mathbb 1_A\perp\!\!\!\perp\mathbb 1_B\).
Remark
This means that \(A,B\) independent is equivalent to \(\mathbb 1_A,\mathbb 1_B\) independent, which is a special case of independence of random variables.
Independence of finite collections of objects
Def [ Independence of finitely many \(\sigma\)-fields ] : \(\mathcal F_1,\cdots,\mathcal F_n\) are independent if \(\forall A_i\in \mathcal F_i\), \[ P\left(\bigcap_{i=1}^n A_i\right)=\prod_{i=1}^n P(A_i) \]
Def [ Independence of finitely many random variables ] : \(X_1,\cdots,X_n\) are independent if \(\forall B_i\in \mathcal R\), \[ P\left(\bigcap_{i=1}^n \{X_i\in B_i\}\right)=\prod_{i=1}^n P(\{X_i\in B_i\}) \]
Def [ Independence of finitely many events ] : \(A_1,\cdots,A_n\) are independent if \(\forall I\subseteq [n]\), \[ P\left(\bigcap_{i\in I} A_i\right)=\prod_{i\in I}P(A_i) \] Remark : we need to enumerate all subsets \(I\), not just \(I=[n]\).
This requirement is consistent with the definition for random variables: let \(X_i=\mathbb 1_{A_i}\) and \(B_i=\begin{cases}\{1\}&i\in I\\\mathbb R&i\notin I\end{cases}\); then \(\{X_i\in B_i\}=A_i\) for \(i\in I\) and \(\{X_i\in B_i\}=\Omega\) otherwise.
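Exp [ \(I=[n]\) alone is not enough ] : let \(\Omega=\{1,\cdots,8\}\) with the uniform probability, \(A_1=A_2=\{1,2,3,4\}\), \(A_3=\{1,5,6,7\}\). Then \(P(A_1\cap A_2\cap A_3)=P(\{1\})=\frac{1}{8}=P(A_1)P(A_2)P(A_3)\), yet \(P(A_1\cap A_2)=\frac{1}{2}\neq \frac{1}{4}=P(A_1)P(A_2)\).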
Def [ pairwise independence ] : \(A_1,\cdots,A_n\) are pairwise independent if \(\forall i\neq j\in [n]\), \(A_i\perp\!\!\!\perp A_j\).
Exp [ pairwise independence \(\not\Rightarrow\) joint independence ]
Even when \(A_1,A_2,A_3\) are pairwise independent, they may fail to be jointly independent.
Let \(X_1,X_2,X_3\) be independent random variables with \(P(X_i=0)=P(X_i=1)=\frac{1}{2}\).
Let \(A_1=\{X_1=X_2\}\), \(A_2=\{X_2=X_3\}\), \(A_3=\{X_1=X_3\}\).
Then \(P(A_i)=\frac{1}{2}\) and \(P(A_i\cap A_j)=\frac{1}{4}\) for \(i\neq j\), but \(P(A_1\cap A_2\cap A_3)=\frac{1}{4}\neq \frac{1}{8}=P(A_1)P(A_2)P(A_3)\) (any two of the events force the third).
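A quick sanity check of this example, as a minimal Python sketch that enumerates the 8 equally likely outcomes of \((X_1,X_2,X_3)\) exactly:

```python
from itertools import product

# The 8 equally likely outcomes of (X1, X2, X3), each with probability 1/8.
outcomes = list(product([0, 1], repeat=3))

def prob(event):
    """Exact probability of an event given as a predicate on (x1, x2, x3)."""
    return sum(1 for w in outcomes if event(w)) / len(outcomes)

A = [lambda w: w[0] == w[1],   # A1 = {X1 = X2}
     lambda w: w[1] == w[2],   # A2 = {X2 = X3}
     lambda w: w[0] == w[2]]   # A3 = {X1 = X3}

# Pairwise independence: P(Ai ∩ Aj) = 1/4 = P(Ai) P(Aj) for all i != j.
for i in range(3):
    for j in range(i + 1, 3):
        assert prob(lambda w: A[i](w) and A[j](w)) == prob(A[i]) * prob(A[j])

# But not jointly independent: P(A1 ∩ A2 ∩ A3) = 1/4 != 1/8.
print(prob(lambda w: all(a(w) for a in A)))   # 0.25
print(prob(A[0]) * prob(A[1]) * prob(A[2]))   # 0.125
```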
Prop : If \(A_1,\cdots,A_n\) are independent,
- \(A_1^c,A_2,\cdots,A_n\) are independent
- \(\mathbb 1_{A_1},\mathbb 1_{A_2},\cdots,\mathbb 1_{A_n}\) are independent
Independence of infinite collections of objects
\(O_1,O_2,\cdots\) are independent if every finite sub-collection is independent (each \(O_n\) can be a \(\sigma\)-field, an r.v., or an event).
THM : \(X_1,X_2,\cdots\) are independent \(\iff\) \(\sigma(X_1),\sigma(X_2),\cdots\) are independent
Sufficient conditions for independence
(*) Def [ \(\pi\)-system ] : \(\mathcal A\) is a \(\pi\)-system if \(\forall A,B\in \mathcal A\), \(A\cap B\in \mathcal A\).
(*) THM : Suppose \(\mathcal A_1,\cdots,\mathcal A_n\) are independent and each \(\mathcal A_i\) is a \(\pi\)-system; then \(\sigma(\mathcal A_1),\sigma(\mathcal A_2),\cdots,\sigma(\mathcal A_n)\) are independent.
\(\mathcal A_1,\cdots,\mathcal A_n\) here are not necessarily \(\sigma\)-fields; their independence is defined in the same way as for \(\sigma\)-fields.
Cor : If \(\forall x_1,\cdots,x_n\in (-\infty,\infty]\), \[ P\left(\bigcap_{i=1}^n \{X_i\le x_i\}\right)=\prod_{i=1}^n P(\{X_i\le x_i\}) \] then \(X_1,\cdots,X_n\) are independent.
Let \(\mathcal A_i=\left\{\{X_i\le x_i\}:x_i\in (-\infty,\infty]\right\}\); since \(\{X_i\le x\}\cap \{X_i\le y\}=\{X_i\le x\land y\}\), each \(\mathcal A_i\) is a \(\pi\)-system.
Since we allow \(x_i=\infty\) (that is, \(\Omega\in \mathcal A_i\)), \(\sigma(\mathcal A_i)=\sigma(X_i)\), so by the theorem the \(X_i\) are independent.
Cor : Suppose \(\mathcal F_{i,j}\) (\(1\le i\le n\), \(1\le j\le m(i)\)) are independent, and let \(\mathcal G_i=\sigma(\cup_{j}\mathcal F_{i,j})\); then \(\mathcal G_1,\cdots,\mathcal G_n\) are independent.
Let \(\mathcal A_i=\left\{\cap_j A_{i,j}:A_{i,j}\in \mathcal F_{i,j}\right\}\); then \(\mathcal A_i\) is a \(\pi\)-system containing \(\Omega\) and \(\cup_{j}\mathcal F_{i,j}\), so \(\sigma(\mathcal A_i)=\mathcal G_i\) and the theorem applies.
Cor : Suppose \(X_{i,j}\) (\(1\le i\le n\), \(1\le j\le m(i)\)) are independent and \(f_i:\mathbb R^{m(i)}\to \mathbb R\) are measurable; then
\(Y_i=f_i(X_{i,1},\cdots,X_{i,m(i)})\) are independent.
Let \(\mathcal F_{i,j}=\sigma(X_{i,j})\) and \(\mathcal G_i=\sigma(\cup_{j}\mathcal F_{i,j})\); then \(Y_i\in \mathcal G_i\), so the \(Y_i\) are independent by the previous corollary.
Remark : when \(X_1,\cdots,X_n\) are independent, taking \(X=X_1\) and \(Y=X_2X_3\cdots X_n\) gives \(X\perp\!\!\!\perp Y\).
Distribution and Expectation of Independent Random Variables
THM [ Distribution of Independent r.v. ] : Suppose \(X_1,\cdots,X_n\) are independent and \(X_i\) has distribution \(\mu_i\); then
\((X_1,\cdots,X_n)\) has distribution \(\mu=\mu_1\times\cdots\times \mu_n\)
Proof : \[ \begin{aligned} &\quad P((X_1,\cdots,X_n)\in A_1\times\cdots\times A_n)\\ &=P\left(\bigcap_{i=1}^n \{X_i\in A_i\}\right)\\ &=\prod_{i=1}^n P(\{X_i\in A_i\})\\ &=\prod_{i=1}^n \mu_i(A_i)\\ &=\mu(A_1\times\cdots\times A_n) \end{aligned} \] Since the rectangles \(A_1\times\cdots\times A_n\) form a \(\pi\)-system generating \(\mathcal R^n\), this determines the distribution of \((X_1,\cdots,X_n)\).
THM [ Expectation of Independent r.v. ] :
THM
Suppose \(X\perp\!\!\!\perp Y\) with distributions \(\mu,\nu\) respectively. If \(h:\mathbb R^2\to\mathbb R\) is a measurable function, and either \(h\ge 0\) or \(E[|h(X,Y)|]<\infty\), then \[ E[h(X,Y)]=\iint h(x,y)\mu(dx)\nu(dy) \] In particular, if \(h(x,y)=f(x)g(y)\), where \(f,g:\mathbb R\to\mathbb R\) are measurable functions, and either \(f,g\ge 0\) or \(E[|f(X)|],E[|g(Y)|]<\infty\), then \[ E[f(X)g(Y)]=E[f(X)]E[g(Y)] \]
Proof
Since \(X,Y\) are independent, \(\mu\times \nu\) is the distribution of \((X,Y)\). By Fubini's Theorem, \[ E[h(X,Y)]=\int h\,d(\mu\times \nu)=\iint h(x,y)\mu(dx)\nu(dy) \] When \(f,g\ge 0\), \(h=fg\ge 0\), so \[ E[f(X)g(Y)]=\iint f(x)g(y)\mu(dx)\nu(dy)=\int g(y)E[f(X)]\nu(dy)=E[f(X)]E[g(Y)] \] When \(E[|f(X)|],E[|g(Y)|]<\infty\), \(E[|f(X)g(Y)|]=E[|f(X)|]E[|g(Y)|]<\infty\) (by the nonnegative case), so \[ E[f(X)g(Y)]=\iint f(x)g(y)\mu(dx)\nu(dy)=\int g(y)E[f(X)]\nu(dy)=E[f(X)]E[g(Y)] \]
Loophole : when \(f,g\ge 0\) with \(E[f(X)]=\infty\) and \(E[g(Y)]=0\), what is the result?
Fix : \(E[g(Y)]=0\) and \(g\ge 0\), so \(g(Y)=0\) a.s., so \(f(X)g(Y)=0\) a.s., so \(E[f(X)g(Y)]=0\), consistent with the convention \(\infty\cdot 0=0\).
Remarks :
This holds for \(n\) independent r.v.s: if \(X_1,\cdots,X_n\) are independent and either \(\forall i\in [n]\), \(X_i\ge 0\) or \(\forall i\in [n]\), \(E[|X_i|]<\infty\), then \[ E\left[\prod_{i=1}^n X_i\right]=\prod_{i=1}^n E[X_i] \]
Even if \(X,Y\) are not independent, \(E[XY]=E[X]E[Y]\) can still hold.
Def [ uncorrelated ] : If \(E[X^2],E[Y^2]<\infty\) and \(E[XY]=E[X]E[Y]\), then \(X,Y\) are uncorrelated.
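Exp [ uncorrelated \(\not\Rightarrow\) independent ] : let \(X\sim N(0,1)\) and \(Y=X^2\). Then \(E[XY]=E[X^3]=0=E[X]E[Y]\), so \(X,Y\) are uncorrelated, but they are clearly not independent: \(P(|X|\le 1,Y\ge 4)=0\neq P(|X|\le 1)P(Y\ge 4)\).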
Sum of Independent Random Variables
THM : If \(X,Y\) are independent with \(F(x)=P(X\le x)\) and \(G(y)=P(Y\le y)\), then \[ P(X+Y\le z)=\int F(z-y)d G(y) \] Here, if \(\nu\) is the distribution of \(Y\), \(dG(y)\) means \(\nu(dy)\).
Remark : this is also called the convolution of \(F\) and \(G\), denoted \(F*G\). \[ (F*G)(z)=\int F(z-y)dG(y) \]
THM : Suppose \(X\) has density \(f\), \(Y\) has distribution function \(G\), and \(X,Y\) are independent; then \(X+Y\) has density \(h\) : \[ h(x)=\int f(x-y)dG(y) \] Moreover, when \(Y\) has density \(g\), \[ h(x)=\int f(x-y)g(y)dy \]
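As a numerical sanity check of the density formula (a minimal sketch assuming NumPy is available): for two independent \(\mathrm{Uniform}(0,1)\) variables, \(X+Y\) has the triangular density \(h(x)=x\) on \([0,1]\) and \(2-x\) on \([1,2]\), and a rectangle-rule evaluation of \(\int f(x-y)g(y)dy\) matches it.

```python
import numpy as np

# X, Y independent Uniform(0,1): densities f = g = 1 on (0,1).
y = np.linspace(0.0, 1.0, 10_001)   # grid over the support of g
dy = y[1] - y[0]

def h(x):
    """Rectangle-rule approximation of h(x) = ∫ f(x - y) g(y) dy."""
    # f(x - y) = 1 exactly when 0 < x - y < 1
    return np.sum(((x - y > 0) & (x - y < 1)).astype(float)) * dy

# Compare with the exact triangular density of X + Y.
for x in (0.25, 0.5, 1.0, 1.5):
    exact = x if x <= 1 else 2 - x
    print(f"h({x}) ~ {h(x):.4f}   exact: {exact}")
```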
2.2 Conditional Expectation
Conditioning on a set
Def [ conditioning on a set ] : let \(A,B\) be two events with \(P(B)>0\); the probability of \(A\) given \(B\) is \(P(A|B)=\frac{P(A\cap B)}{P(B)}\).
Conditioning on discrete random variables
Derivation :
Consider discrete \(X,Z\) taking finitely many values:
\(X\in \{x_1,\cdots,x_m\},Z\in \{z_1,\cdots,z_n\}\) \[ P(X=x_i|Z=z_j)=\frac{P(X=x_i,Z=z_j)}{P(Z=z_j)} \]
Remark : \(\sum\limits_{i=1}^m P(X=x_i|Z=z_j)=1\)
We can define \[ E[X|Z=z_j]=\sum_{i=1}^m x_iP(X=x_i|Z=z_j)=h(z_j) \] which is a function of \(z_j\). Therefore, we define \(Y=E[X|Z]\) as the random variable s.t.
\(\forall w\in \Omega\), \(Y(w)=h(Z(w))\)
Properties
- \(E[X|Z]\) is a function of \(Z\)
- \(\forall G\in \sigma(Z)\), \(\int_G YdP=\int_G XdP\)
Proof of Prop. 2
Consider \(G_i=\{w:Z(w)=z_i\}\); by definition, there exists \(I\subseteq [n]\) s.t. \(G=\cup_{i\in I}G_i\).
So we only need to prove that \(\int_{G_i}YdP=\int_{G_i}XdP\): \[ \begin{aligned} &\quad\int_{Z=z_i}YdP\\ &=h(z_i)P(Z=z_i)\\ &=\sum_{j=1}^m x_j P(X=x_j|Z=z_i)P(Z=z_i)\\ &=\sum_{j=1}^m x_j P(X=x_j,Z=z_i)\\ &=\int_{G_i}XdP \end{aligned} \]
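A minimal numerical sketch of this construction (the joint pmf below is made up purely for illustration): build \(h(z_j)=E[X|Z=z_j]\) from a joint pmf, then check the property \(\int_{G_j} Y\,dP=\int_{G_j} X\,dP\) on each \(G_j=\{Z=z_j\}\).

```python
import numpy as np

# Made-up joint pmf for illustration: p[i, j] = P(X = x[i], Z = z[j]).
x = np.array([0.0, 1.0, 2.0])
z = np.array([10.0, 20.0])
p = np.array([[0.10, 0.20],
              [0.30, 0.10],
              [0.15, 0.15]])            # entries sum to 1

p_z = p.sum(axis=0)                      # marginal P(Z = z_j)
h = (x[:, None] * p).sum(axis=0) / p_z   # h(z_j) = E[X | Z = z_j]

# Partial averaging on G_j = {Z = z_j}:
# ∫_{G_j} Y dP = h(z_j) P(Z = z_j) should equal ∫_{G_j} X dP.
for j in range(len(z)):
    lhs = h[j] * p_z[j]
    rhs = (x * p[:, j]).sum()
    assert np.isclose(lhs, rhs)
print(h)   # [E[X | Z = 10], E[X | Z = 20]]
```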
Conditioning on continuous random variables
Derivation
Def [ joint PDF ] : \(f_{X,Z}\) is the joint PDF if \(\forall B\in \mathcal R^2\), \[ P((X,Z)\in B)=\int_{(x,z)\in B}f_{X,Z}(x,z)dxdz \]
Def [ marginal PDF ] : \(f_Z\) is the marginal PDF of \(Z\), defined as \[ f_Z(z)=\int_{-\infty}^{\infty}f_{X,Z}(x,z)dx \]
Prop : \[ \begin{aligned} P(Z\in A)&=P(Z\in A,X\in (-\infty,\infty))\\ &=\int_{Z\in A}\int_{-\infty}^{\infty} f_{X,Z}(x,z)dxdz\\ &=\int_{Z\in A}f_Z(z)dz \end{aligned} \]
Def [ conditional PDF ] : \(f_{X|Z}\) is the conditional PDF, defined (where \(f_Z(z)>0\)) as \[ f_{X|Z}(x|z)=\frac{f_{X,Z}(x,z)}{f_Z(z)} \] Therefore \[ \begin{aligned} P(X\in A,Z\in B)&=\int_{Z\in B}\int_{X\in A}f_{X,Z}(x,z)dxdz\\ &=\int_{Z\in B}\int_{X\in A}f_{X|Z}(x|z)f_Z(z)dxdz\\ &=\int_{Z\in B} f_Z(z)\left(\int_{X\in A}f_{X|Z}(x|z)dx\right)dz\\ &\sim \int_{Z\in B}f_Z(z)P(X\in A|Z=z) dz \end{aligned} \] Compare to the discrete version: \[ P(X\in A,Z\in B)=\sum_{z_i\in B}P(X\in A|Z=z_i) P(Z=z_i) \] So we can use \(\int_{X\in A} f_{X|Z}(x|z)dx\) to denote \(P(X\in A|Z=z)\).
Def [ conditional expected value for continuous r.v.s ] : \[ h(z)=E[X|Z=z]=\int xf_{X|Z}(x|z)dx \] which is a function of \(z\). Let \(Y=h\circ Z\).
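Exp : as a concrete worked case, take the toy joint PDF \(f_{X,Z}(x,z)=x+z\) on \((0,1)^2\). Then \(f_Z(z)=\int_0^1 (x+z)dx=\frac{1}{2}+z\), so \[ f_{X|Z}(x|z)=\frac{x+z}{\frac{1}{2}+z},\qquad h(z)=E[X|Z=z]=\int_0^1 x\cdot \frac{x+z}{\frac{1}{2}+z}dx=\frac{\frac{1}{3}+\frac{z}{2}}{\frac{1}{2}+z}=\frac{2+3z}{3(1+2z)} \]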
Properties
- \(E[X|Z]\) is a function of \(Z\)
- \(\forall G\in \sigma(Z)\), \(\int_G YdP=\int_G XdP\)
Proof of Prop. 2
Let \(Z_G=\{z:\exists w\in G,Z(w)=z\}\) be the image of \(G\) under \(Z\). \[ \begin{aligned} \int_{G}YdP&=\int_{G}h\circ ZdP\\ &=\int_{Z_G}h(z)f_Z(z)dz\quad\quad\text{using the change of variables formula}\\ &=\int_{Z_G}f_Z(z)\int_{-\infty}^{\infty} xf_{X|Z}(x|z)dxdz\\ &=\int_{Z_G}\int_{-\infty}^{\infty} xf_{X,Z}(x,z)dxdz\\ &=\int_{(X,Z)\in \mathbb R\times Z_G} xf_{X,Z}(x,z)dxdz\\ &=\int_{(X,Z)\in \mathbb R\times Z_G} x\mu_{X,Z}(dxdz)\\ &=\int_{\tilde G}XdP\quad\quad\quad\quad\quad\text{using the change of variables formula} \end{aligned} \] where \(\tilde G=(X,Z)^{-1}(\mathbb R\times Z_G)=\{w:(X(w),Z(w))\in \mathbb R\times Z_G\}=\{w:Z(w)\in Z_G\}\). It remains to show \(G=\tilde G\).
First, for any \(w\in G\), \(Z(w)\in Z_G\), so \(w\in \tilde G\); hence \(G\subseteq \tilde G\).
Second, for any \(w\in \tilde G\), \(Z(w)\in Z_G\), so \(\exists w_0\in G\) with \(Z(w)=Z(w_0)\).
Lemma : let \(G\in \sigma(Z)\) for a random variable \(Z\). If \(w,w'\in \Omega\) satisfy \(Z(w)=Z(w')\), then \(w\in G\iff w'\in G\).
Proof : let \(G_z=\{w:Z(w)=z\}\) and \(\mathcal G=\{\cup_{z\in B}G_z:B\in \mathcal R\}\); since \(\cup_{z\in B}G_z=Z^{-1}(B)\) and preimages commute with set operations, \(\mathcal G\) is a \(\sigma\)-field.
Since \(\forall B\in \mathcal R\), \(\{w:Z(w)\in B\}=\cup_{z\in B}G_z\in \mathcal G\), \(Z\) is a measurable map from \((\Omega,\mathcal G)\) to \((\mathbb R,\mathcal R)\).
Therefore \(\sigma(Z)\subseteq \mathcal G\) (since \(\sigma(Z)\) is the smallest such \(\sigma\)-field). Hence every set in \(\sigma(Z)\) is of the form \(\cup_{z\in B}G_z\) for some \(B\in \mathcal R\), so \(G\) contains either both of \(w,w'\) or neither. \(\Box\)
Therefore, since \(w_0\in G\) and \(Z(w)=Z(w_0)\), the Lemma gives \(w\in G\); hence \(\tilde G\subseteq G\), and finally \(G=\tilde G\).
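Closing the loop numerically (a rough grid-integration sketch, assuming NumPy, reusing the toy density \(f_{X,Z}(x,z)=x+z\) from the example above with \(G=\{Z\le \frac{1}{2}\}\); both integrals equal \(\frac{11}{48}\approx 0.2292\)):

```python
import numpy as np

# Toy joint density f(x, z) = x + z on (0,1)^2; G = {Z <= 1/2}.
n = 1000
grid = (np.arange(n) + 0.5) / n                 # midpoint grid on (0,1)
X, Z = np.meshgrid(grid, grid, indexing="ij")   # X[i,j]=grid[i], Z[i,j]=grid[j]
f = X + Z                                        # joint density values
dA = (1.0 / n) ** 2                              # area of each grid cell

h = lambda z: (2 + 3 * z) / (3 * (1 + 2 * z))    # E[X | Z = z] from the example

G = Z <= 0.5
int_X = np.sum(X * f * G) * dA       # ∫_G X dP
int_Y = np.sum(h(Z) * f * G) * dA    # ∫_G E[X|Z] dP
print(int_X, int_Y, 11 / 48)         # all ≈ 0.229167
```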