Probability and Statistics 6
Chapter 2 Laws of Large Numbers
2.2 Conditional Expectation
Conditioning on general random variables
Def [ conditional expected value ] :
Let \(X,Z\) be random variables with \(E[|X|]<\infty\) . Define \(Y=E[X|Z]\) to be a random variable with the following properties:
(i). \(Y\) is a function of \(Z\)
(ii). \(E[|Y|]<\infty\)
(iii). \(\forall G\in \sigma(Z)\) , \(\int_{G}YdP=\int_GXdP\)
Moreover , if \(\tilde Y\) is a random variable satisfying (i)(ii)(iii) , then \(\tilde Y=Y\) a.s.
This definition was given by Kolmogorov in 1933 in "Foundations of the Theory of Probability" (Grundbegriffe der Wahrscheinlichkeitsrechnung)
Def [ (regular) conditional probability ] \[ P(X\in A|Z):=E[\mathbb 1(X\in A)|Z] \] Def [ (regular) conditional density ] \[ \exists f_{X|Z}(x|z) \text{ s.t. } P(X\in A|Z)(w)=\int_Af_{X|Z}(x|Z(w))dx \]
Here "regular" means the conditional probability can be chosen so that \(A\mapsto P(X\in A|Z)(w)\) is a genuine probability measure for almost every \(w\) . Without regularity , \(P(X\in A|Z)\) is only defined up to a null set that may depend on \(A\) .
Famous Results
Bayes rule \[ \begin{aligned} P(B|A)&=\frac{P(A|B)P(B)}{P(A)}\\ P(X=x|Z=z)&=\frac{P(Z=z|X=x)P(X=x)}{P(Z=z)}\\ f_{X|Z}(x|z)&=\frac{f_{Z|X}(z|x)f_X(x)}{f_Z(z)} \end{aligned} \]
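As a concrete instance of the discrete form of Bayes' rule above, a small exact computation (the prior and likelihood values are illustrative assumptions, not from the notes):

```python
from fractions import Fraction

# Toy discrete check of Bayes' rule with exact rational arithmetic.
# All numbers below are illustrative assumptions.
p_B = Fraction(1, 100)             # P(B): prior
p_A_given_B = Fraction(9, 10)      # P(A|B)
p_A_given_notB = Fraction(5, 100)  # P(A|B^c)

# Law of total probability: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes' rule: P(B|A) = P(A|B)P(B) / P(A)
p_B_given_A = p_A_given_B * p_B / p_A

print(p_B_given_A)  # exact posterior: 2/13
```

Using `Fraction` keeps the computation exact, so the posterior comes out as the precise rational \(2/13\) rather than a rounded float.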
Correlation
Intuition
\(X\perp\!\!\!\perp Y\Rightarrow E[(X-EX)(Y-EY)]=E[X-EX]E[Y-EY]=0\)
\(E[(X-EX)(Y-EY)]=0\not\Rightarrow X\perp\!\!\!\perp Y\) ( e.g. \(X\) standard normal and \(Y=X^2\) are uncorrelated but not independent )
Def [ uncorrelated ] \(X,Y\) are uncorrelated random variables , if \(E[(X-EX)(Y-EY)]=0\)
This definition is equivalent to \(E[XY]=E[X]E[Y]\)
Def [ correlation coefficient ] \[ \rho=\frac{E[(X-EX)(Y-EY)]}{\sqrt{Var(X)}\sqrt{Var(Y)}} \] Def [ covariance ] : \(E[(X-EX)(Y-EY)]\)
THM : Let \(X_1,\cdots,X_n\) have \(E[X_i^2]<\infty\) and be uncorrelated , then \[ Var(X_1+\cdots+X_n)=\sum_{i=1}^n Var(X_i) \] Proof : Let \(S_n=X_1+\cdots+X_n\) and \(\mu_i=E[X_i]\) \[ \begin{aligned} Var(S_n)&=E[(S_n-E[S_n])^2]\\ &=E\left[\left(\sum_{i=1}^n(X_i-\mu_i)\right)^2\right]\\ &=E\left[\sum_{i=1}^n (X_i-\mu_i)^2+\sum_{i\neq j}(X_i-\mu_i)(X_j-\mu_j)\right]\\ &=\sum_{i=1}^n E[(X_i-\mu_i)^2]+\sum_{i\neq j}E[(X_i-\mu_i)(X_j-\mu_j)]\\ &=\sum_{i=1}^n Var(X_i)\qquad\text{since the cross terms vanish by uncorrelatedness} \end{aligned} \]
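The theorem can be checked on a toy case by exact enumeration; the two-independent-dice setup is an assumption chosen for illustration:

```python
from itertools import product

# Check Var(X+Y) = Var(X) + Var(Y) for two independent fair dice,
# by enumerating the uniform product distribution exactly.
def var(values):
    """Variance of a uniform distribution over the given list of values."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

die = list(range(1, 7))
var_X = var(die)                                      # variance of one die
var_sum = var([x + y for x, y in product(die, die)])  # variance of X+Y

print(var_X, var_sum)  # var_sum equals 2 * var_X
```

Enumerating all 36 equally likely pairs realizes independence exactly, so the identity holds up to floating-point rounding rather than sampling error.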
THM : \(Var(cX)=c^2Var(X)\)
Remark : for "totally correlated" r.v. ( e.g. \(X_1=\cdots=X_n\) , so \(S_n=nX_1\) ) , \(Var(S_n)=n^2Var(X_1)\) , in contrast with the uncorrelated case above
2.3 Introduction of LLN
Def [ i.i.d. ] i.i.d. means independent and identically distributed
THM [ weak law of large numbers (WLLN) ]
THM : Let \(X_1,X_2,\cdots\) be i.i.d. , with \(\lim\limits_{x\to\infty}xP(|X_1|>x)=0\). Let \(\mu_n=E[X_1\mathbb 1(|X_1|\le n)]\) , then \[ \frac{1}{n}\sum_{i=1}^n X_i-\mu_n\xrightarrow{P} 0 \]
Remarks
A sufficient condition for \(\lim\limits_{x\to\infty} xP(|X_1|>x)=0\) is \(E[|X_1|]<\infty\)
Proof : \[ \begin{aligned} &\quad xP(|X_1|>x)\\ &=x\int \mathbb 1(|X_1|>x)dP\\ &\le\int |X_1|\mathbb 1(|X_1|>x)dP\qquad\text{since }x\le|X_1|\text{ on the event }\{|X_1|>x\}\\ &=E[|X_1|\mathbb 1(|X_1|>x)]\\ &\to 0\qquad\qquad\qquad\qquad\qquad\text{by Dominated Convergence , since }E[|X_1|]<\infty \end{aligned} \]
When \(E[|X_1|]<\infty\) , using Dominated Convergence Theorem , \[ \mu_n=E[X_1\mathbb 1(|X_1|\le n)]\to E[X_1]=\mu \]
THM [ WLLN , standard form ] :
Let \(X_1,X_2,\cdots\) be i.i.d. , with \(E[|X_i|]<\infty\) , and \(\mu=E[X_i]\) , then as \(n\to\infty\) , \[ \frac{1}{n}\sum_{i=1}^n X_i\xrightarrow{P}\mu \]
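A quick Monte Carlo sketch of the WLLN; the choice of Exp(1) draws (so \(\mu=1\)), the seed, and the sample sizes are all illustrative assumptions:

```python
import random

# Illustration of the WLLN: sample means of i.i.d. Exp(1) draws (mu = 1)
# concentrate around mu as n grows.
random.seed(42)
mu = 1.0

def sample_mean(n):
    """Mean of n i.i.d. Exp(1) samples."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

# |sample mean - mu| for increasing n; the last error should be small
errors = [abs(sample_mean(n) - mu) for n in (100, 10_000, 200_000)]
print(errors)
```

This only illustrates convergence in probability for one sample path; it is not a proof, just a visualization of the concentration the theorem guarantees.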
THM [ strong law of large numbers (SLLN) ]
Let \(X_1,X_2,\cdots\) be pairwise independent and identically distributed , \(E[|X_i|]<\infty\) , and \(\mu=E[X_i]\) , then , as \(n\to\infty\), \[ \frac{1}{n}\sum_{i=1}^n X_i\to\mu\qquad a.s. \]
2.4 Weak Laws of Large Numbers
(*) \(L^2\) Weak Laws
THM [ \(L^2\) weak law ] : Let \(X_1,X_2,\cdots\) be uncorrelated random variables , with \(E[X_i]=\mu\) and \(Var(X_i)\le C<\infty\). Then as \(n\to\infty\), \[ \frac{1}{n}\sum_{i=1}^n X_i\to \mu \] in \(L^2\) and in probability
\(L^2\)-convergence : \(A\to B\) in \(L^2\) means \(E[(A-B)^2]\to 0\).
Lemma : If \(p>0\) , and \(E[|Z_n|^p]\to 0\) , then \(Z_n\xrightarrow{P} 0\).
Proof : Use Chebyshev's inequality with \(\varphi(x)=x^p\) , which gives \(P(|Z_n|>\epsilon)\le \epsilon^{-p} E[|Z_n|^p]\to 0\).
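The inequality behind the lemma can be sanity-checked empirically; the uniform distribution on \([-1,1]\), the seed, and the parameters \(p=2\), \(\epsilon=0.5\) are assumptions for illustration:

```python
import random

# Empirical check of P(|Z| > eps) <= eps^{-p} E[|Z|^p]
# for Z ~ Uniform[-1, 1], p = 2, eps = 0.5.
random.seed(0)
zs = [random.uniform(-1, 1) for _ in range(100_000)]
p, eps = 2.0, 0.5

lhs = sum(abs(z) > eps for z in zs) / len(zs)            # empirical P(|Z|>eps)
rhs = sum(abs(z) ** p for z in zs) / len(zs) / eps ** p  # empirical eps^{-p} E[|Z|^p]
print(lhs, rhs)  # lhs ~ 0.5, rhs ~ 4/3
```

Here the true values are \(P(|Z|>0.5)=0.5\) and \(\epsilon^{-2}E[Z^2]=(1/3)/0.25=4/3\), so the bound holds with room to spare.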
Proof : Let \(S_n=\sum_{i=1}^n X_i\),
\(L^2\) convergence : \[ E[(S_n/n-\mu)^2]=Var(S_n/n)=\frac{1}{n^2}\sum_{i=1}^n Var(X_i)\le \frac{C}{n}\to 0 \] ( the first equality holds since \(E[S_n/n]=\mu\) ). In probability : Let \(Z_n=S_n/n-\mu\) , take \(p=2\) , and use the lemma above.
Application : see book
Triangular Arrays
(*) THM
Let \(X_1,X_2,\cdots\) be random variables, \(S_n=\sum_{i=1}^n X_i\) , \(\mu_n=E[S_n]\) , \(\sigma^2_n=Var(S_n)\). Let \(b_1,b_2,\cdots\) be a sequence of positive numbers with \(\frac{\sigma_n^2}{b_n^2}\to 0\) , then \[ \frac{S_n-\mu_n}{b_n}\xrightarrow{P}0 \] Proof :
Since \(E\left[\left(\frac{S_n-\mu_n}{b_n}\right)^2\right]=\frac{Var(S_n)}{b_n^2}=\frac{\sigma_n^2}{b_n^2}\to 0\) , by the lemma above ( with \(p=2\) ) , the conclusion holds.
Def [ truncation ] : To truncate r.v. \(X\) at level \(M\) , means \[ \bar X=X\mathbb 1(|X|\le M)=\begin{cases}X&|X|\le M\\0&|X|>M\end{cases} \]
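The truncation operation translates directly into code; `truncate` is a hypothetical helper name used only for this sketch:

```python
# Truncation of x at level M: keep x when |x| <= M, otherwise replace by 0,
# exactly as in the definition of X-bar above.
def truncate(x, M):
    return x if abs(x) <= M else 0.0

print([truncate(v, 2.0) for v in (-3.0, -1.5, 0.0, 2.0, 5.0)])
# -> [0.0, -1.5, 0.0, 2.0, 0.0]
```

Note that the boundary value \(|x|=M\) is kept, matching the \(\le\) in the definition.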
THM [ Weak Law for Triangular Arrays ] :
Conditions
For each \(n\) , \(X_{n,1},\cdots,X_{n,n}\) are independent
\(b_1,b_2,\cdots\) is a sequence s.t. \(b_n>0\) and \(b_n\to\infty\) as \(n\to\infty\) . Let \(\bar X_{n,k}=X_{n,k}\mathbb 1(|X_{n,k}|\le b_n)\).
As \(n\to\infty\)
(i). \(\sum_{k=1}^n P(|X_{n,k}|>b_n)\to 0\)
(ii). \(\frac{1}{b_n^2}\sum_{k=1}^n E[\bar X_{n,k}^2]\to 0\)
Result: Let \(S_n=\sum_{k=1}^n X_{n,k}\) , let \(a_n=\sum_{k=1}^n E[\bar X_{n,k}]\) , then \[ \frac{S_n-a_n}{b_n}\xrightarrow{P}0 \] Proof :
Let \(\bar S_n=\sum_{k=1}^n \bar X_{n,k}\) , so \[ P\left(|\frac{S_n-a_n}{b_n}|>\epsilon\right)\le P(S_n\neq \bar S_n)+P\left(|\frac{\bar S_n-a_n}{b_n}|>\epsilon\right) \]
\[ P(S_n\neq \bar S_n)\le P\left(\bigcup_{k=1}^n\{\bar X_{n,k}\neq X_{n,k}\}\right)\le \sum_{k=1}^n P(|X_{n,k}|>b_n)\to 0 \]
\[ \begin{aligned} &\quad P\left(\left|\frac{\bar S_n-a_n}{b_n}\right|>\epsilon\right)\\ &\le \frac{1}{\epsilon^2} E\left[\left(\frac{\bar S_n-a_n}{b_n}\right)^2\right]\qquad\text{using Chebyshev's Inequality , since }a_n=E[\bar S_n]\\ &=\frac{1}{\epsilon^2b_n^2}Var(\bar S_n)\\ &=\frac{1}{\epsilon^2b_n^2}\sum_{k=1}^n Var(\bar X_{n,k})\qquad{\text{using independence (hence uncorrelatedness) of the }\bar X_{n,k}}\\ &\le \frac{1}{\epsilon^2b_n^2}\sum_{k=1}^n E[\bar X_{n,k}^2]\to0 \end{aligned} \]
Weak Law of Large Numbers
THM [ weak law of large numbers (WLLN) ]
Let \(X_1,X_2,\cdots\) be i.i.d. , with \(\lim\limits_{x\to\infty}xP(|X_1|>x)=0\). Let \(\mu_n=E[X_1\mathbb 1(|X_1|\le n)]\) , then \[ \frac{1}{n}\sum_{i=1}^n X_i-\mu_n\xrightarrow{P} 0 \]
Proof:
Let \(X_{n,k}=X_k,b_n=n\), we want to use Weak Law for Triangular Arrays.
For condition (i) , since the \(X_k\) are identically distributed , \[ \sum_{k=1}^n P(|X_{n,k}|>b_n)=\sum_{k=1}^n P(|X_k|>n)=nP(|X_1|>n)\to 0 \]
Lemma [ \(t\)-th moment via the tail probability ] : For random variable \(Y\ge 0\) and \(t>0\), \[ E[Y^t]=\int_{0}^{\infty}ty^{t-1}P(Y>y)dy \] Proof : \[ \begin{aligned} &\quad \int_{0}^{\infty}ty^{t-1}P(Y>y)dy\\ &=\int_{0}^{\infty}\int_{\Omega}ty^{t-1}\mathbb 1(Y>y)dPdy\\ &=\int_{\Omega}\int_{0}^{\infty}ty^{t-1}\mathbb 1(Y>y)dydP\qquad\text{using Fubini–Tonelli (nonnegative integrand)}\\ &=\int_{\Omega}\int_{0}^{Y}ty^{t-1}dydP\\ &=\int_{\Omega}Y^tdP\\ &=E[Y^t] \end{aligned} \]
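The lemma can be verified numerically for one concrete case, \(Y\sim\text{Uniform}[0,1]\), where \(P(Y>y)=1-y\) on \([0,1]\) and both sides equal \(1/(t+1)\). This midpoint Riemann sum is purely illustrative:

```python
# Numerical check of E[Y^t] = \int_0^infty t y^{t-1} P(Y>y) dy
# for Y ~ Uniform[0,1]: the tail is P(Y>y) = 1-y on [0,1] and 0 beyond,
# and the exact moment is E[Y^t] = 1/(t+1).
t = 3.0
N = 100_000
dy = 1.0 / N

# Midpoint-rule approximation of the tail-probability integral on [0,1]
integral = sum(t * ((k + 0.5) * dy) ** (t - 1) * (1 - (k + 0.5) * dy) * dy
               for k in range(N))
exact = 1.0 / (t + 1.0)
print(integral, exact)  # both ~ 0.25
```

With \(t=3\) both sides are \(1/4\); the midpoint rule makes the discretization error negligible at this resolution.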
For condition (ii) , \[ \begin{aligned} &\quad \frac{1}{b_n^2}\sum_{k=1}^n E[\bar X_{n,k}^2]\\ &=\frac{1}{b_n^2}\sum_{k=1}^n E[(X_{n,k}\mathbb 1(|X_{n,k}|\le b_n))^2]\\ &=\frac{1}{n^2}\sum_{k=1}^n E[(X_k\mathbb 1(|X_k|\le n))^2]\\ &=\frac{1}{n}E[(X_1\mathbb 1(|X_1|\le n))^2]\\ &=\frac{1}{n} E[\bar X_{n,1}^2]\\ &=\frac{1}{n}\int_{0}^{\infty} 2yP(|\bar X_{n,1}|>y)dy\qquad\text{using the lemma with }t=2\\ &\le\frac{2}{n}\int_{0}^{n} yP(|X_1|>y)dy \end{aligned} \] where the last step holds since \(P(|\bar X_{n,1}|>y)=0\) for \(y\ge n\) and \(P(|\bar X_{n,1}|>y)\le P(|X_1|>y)\) for \(y<n\). Let \(g(y)=yP(|X_1|>y)\) , since \(xP(|X_1|>x)\to 0\) , \(g(y)\) is bounded.
Let \(g_n(y):=g(ny)\) , so \(g_n(y)\to 0\) for each \(y>0\) , and the \(g_n\) are uniformly bounded \[ \begin{aligned} &\quad\frac{2}{n}\int_{0}^n g(y)dy\\ &=2\int_0^1 g_n(y)dy\\ &\to 0\qquad\qquad\qquad\text{using Dominated Convergence Theorem} \end{aligned} \]
Remark :
Using the lemma with \(t=1-\epsilon\) : \(xP(|X_1|>x)\to 0\) gives \(P(|X_1|>x)\le C/x\) , so \(E[|X_1|^{1-\epsilon}]=\int_0^\infty(1-\epsilon)y^{-\epsilon}P(|X_1|>y)dy<\infty\) .
This means the condition \(xP(|X_1|>x)\to 0\) is not much weaker than \(E[|X_1|]<\infty\) .
THM [ WLLN , standard form ] :
Let \(X_1,X_2,\cdots\) be i.i.d. , with \(E[|X_i|]<\infty\) and \(\mu=E[X_i]\) , then as \(n\to\infty\) , \[ \frac{1}{n}\sum_{i=1}^n X_i\xrightarrow{P}\mu \]
(*) Remarks :
Weak Law does not hold : [ Cauchy Distribution ]
\(P(X_i\le x)=\int_{-\infty}^x \frac{dt}{\pi(1+t^2)}\)
As \(x\to\infty\) , \[ P(|X_1|>x)=2\int_{x}^{\infty} \frac{dt}{\pi(1+t^2)}\sim \frac{2}{\pi}\int_{x}^{\infty} t^{-2}dt= \frac{2}{\pi} x^{-1} \] Therefore , \(xP(|X_1|>x)\to\frac{2}{\pi}\neq 0\) .
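Since the Cauchy tail has the closed form \(P(|X_1|>x)=1-\frac{2}{\pi}\arctan x\), the limit \(xP(|X_1|>x)\to 2/\pi\) can be checked deterministically, with no sampling involved:

```python
import math

# The Cauchy two-sided tail: P(|X_1| > x) = 1 - (2/pi) * arctan(x),
# so x * P(|X_1| > x) should approach 2/pi instead of 0.
def scaled_tail(x):
    return x * (1 - (2 / math.pi) * math.atan(x))

values = [scaled_tail(x) for x in (10.0, 100.0, 10_000.0)]
print(values, 2 / math.pi)  # values approach 2/pi ~ 0.6366
```

This confirms numerically why the truncation-based WLLN hypothesis fails for the Cauchy distribution.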
Weak Law holds but \(E[X_1]=\infty\)
E.g. \(P(X_i=2^j)=2^{-j}\) for \(j=1,2,\cdots\)
SOL : back to weak law for triangular arrays , choose better \(b_n\)
\(S_n/(n\log_2 n)\xrightarrow P 1\)
2.5 Borel-Cantelli Lemma
\(\limsup,\liminf\) of sets
Def [ limsup of events ] Let \(A_1,A_2,\cdots\) be events , \[ \limsup A_n:=\lim_{m\to\infty}\bigcup_{n=m}^{\infty}A_n=\bigcap_{m=1}^{\infty}\bigcup_{n=m}^{\infty}A_n \] ( the limit exists since the unions \(B_m=\bigcup_{n=m}^{\infty}A_n\) decrease in \(m\) ) THM : \(\limsup A_n=\{w:w\text{ in infinitely many }A_n\}\)
Proof :
\(\supseteq\) : Since \(w\) is in infinitely many \(A_n\) , \(\forall m>0\) , \(w\in B_m=\cup_{n=m}^{\infty} A_n\) , so \(w\in \limsup A_n\)
\(\subseteq\) : If \(w\) is not in infinitely many \(A_n\) , then suppose \(w\in A_{n_1},A_{n_2},\cdots,A_{n_k}\) , let \(m\ge n_k+1\) , so \(w\notin B_m\) , so \(w\notin \limsup A_n\).
Def [ liminf of events ] Let \(A_1,A_2,\cdots\) be events , \[ \liminf A_n:=\lim_{m\to\infty}\bigcap_{n=m}^{\infty}A_n=\bigcup_{m=1}^{\infty}\bigcap_{n=m}^{\infty}A_n \] ( the limit exists since the intersections \(C_m=\bigcap_{n=m}^{\infty}A_n\) increase in \(m\) ) THM : \(\liminf A_n=\{w:w\text{ in all but finitely many }A_n\}\)
Proof :
\(\supseteq\) : Suppose \(w\notin A_{n_1},\cdots,A_{n_k}\) , so \(\forall m\ge n_k+1,w\in C_m=\cap_{n=m}^{\infty} A_n\), so \(w\in \liminf A_n\).
\(\subseteq\) : Suppose \(w\notin A_{n_1},A_{n_2},\cdots\) for infinitely many indices \(n_1<n_2<\cdots\) , so \(\forall m>0\) , \(\exists n_k\ge m\) with \(w\notin A_{n_k}\) , so \(w\notin C_m\) for every \(m\) , so \(w\notin \liminf A_n\).
Def [ infinitely often (i.o.) ] infinitely often : occurs infinitely many times \[ \limsup A_n=\{w:w\in A_n\text{ i.o.}\} \]
(*) Property
\(P(\limsup A_n)\ge \limsup P(A_n)\) , \(P(\liminf A_n)\le \liminf P(A_n)\)
THM : The following three statements are equivalent
(i). \(X_n\to 0\) a.s.
(ii). \(\forall \epsilon>0\) , \(P(w:|X_n(w)|>\epsilon \text{ i.o.})=0\)
(iii). Let \(A_n(\epsilon)=\{w:|X_n(w)|>\epsilon\}\) , \(P\left(\bigcup_{\epsilon>0} \limsup A_n(\epsilon)\right)=0\)
Proof :
(i)\(\to\) (ii) : \(X_n\to 0\) a.s. means \(P\{w:X_n(w)\not\to 0\}=0\) .
Let \(S=\{w:X_n(w)\to 0\}\) , so \(\forall w\in S\) , \(\forall \epsilon>0\) , \(\exists N_{w,\epsilon}>0\) , \(\forall n>N_{w,\epsilon}\) , \(|X_n(w)|\le \epsilon\) . Therefore , \(w\notin \{w:|X_n(w)|>\epsilon\text{ i.o.}\}\). Therefore , \(\{w:X_n(w)\not\to 0\}\supseteq \{w:|X_n(w)|>\epsilon\text{ i.o.}\}\) , hence \(P(\{w:|X_n(w)|>\epsilon\text{ i.o.}\})\le P(\{w:X_n(w)\not\to 0\})=0\) .
(ii)\(\to\)(iii) : \(\forall 0<\epsilon_1<\epsilon_2\) , \(\limsup A_n(\epsilon_1)\supseteq \limsup A_n(\epsilon_2)\)
Let \(\epsilon_1,\epsilon_2,\cdots\) be a sequence , \(\lim_{i\to\infty} \epsilon_i=0\) , so \[ \bigcup_{\epsilon>0}\limsup A_n(\epsilon)=\lim_{I\to\infty} \bigcup_{i=1}^I\limsup A_n(\epsilon_i) \] Therefore , \[ \begin{aligned} &\quad P\left(\bigcup_{\epsilon>0}\limsup A_n(\epsilon)\right)\\ &\le\lim_{I\to\infty}\sum_{i=1}^I P(\limsup A_n(\epsilon_i))\\ &=\lim_{I\to\infty}\sum_{i=1}^I P(\{w:|X_n(w)|>\epsilon_i\text{ i.o.}\})\\ &=0 \end{aligned} \] (iii)\(\to\)(i) : Let \(\Omega_0=\{w:X_n(w)\not\to 0\}\) , we only need to prove that \(\bigcup_{\epsilon>0} \limsup A_n(\epsilon)=\Omega_0\).
\(\forall \epsilon>0\) , \(\limsup A_n(\epsilon)=\{w:|X_n(w)|>\epsilon\text{ i.o.}\}\)
For \(w\in \limsup A_n(\epsilon)\) , \(X_n(w)\not\to 0\) , so \(w\in \Omega_0\).
\(\forall w\in \Omega_0\) , \(\exists \epsilon_0>0\) , \(\forall N>0\) , \(\exists n_N>N\) , s.t. \(|X_{n_N}(w)|>\epsilon_0\) .
Therefore , \(w\in\) infinitely many \(A_n(\epsilon_0)\)
Therefore , \(w\in \limsup A_n(\epsilon_0)\)
Borel-Cantelli Lemma
THM [ Borel-Cantelli Lemma ] : \(A_1,A_2,\cdots\) be a sequence of events , \[ \sum_{n=1}^{\infty}P(A_n)<\infty\Rightarrow P(w:w\in A_n\text{ i.o.})=0 \]
Proof :
Let \(N(w)=\sum_{n=1}^{\infty}\mathbb 1_{A_n}(w)\) , ( the number of events \(A_n\) containing \(w\) ) , therefore, \[ \begin{aligned} E[N]&=E\left[\sum_{n=1}^{\infty}\mathbb 1_{A_n}\right]\\ &=\sum_{n=1}^{\infty} E[\mathbb 1_{A_n}]\qquad \text{using Fubini–Tonelli (nonnegative terms)}\\ &=\sum_{n=1}^{\infty}P(A_n)\\ &<\infty \end{aligned} \] Therefore , \(N(w)<\infty\) a.s.
Therefore , \[ P(w:w\in A_n\text{ i.o.})=P(w:N(w)=\infty)=0 \]
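A Monte Carlo sketch mirroring the proof: with independent events \(A_n=\{U_n<1/n^2\}\) for i.i.d. uniforms \(U_n\), \(\sum_n P(A_n)=\sum_n 1/n^2<\infty\), so \(N\) is a.s. finite with \(E[N]=\sum_n P(A_n)\). The event choice, seed, and truncation/trial sizes are illustrative assumptions:

```python
import random

# Borel-Cantelli illustration: A_n = {U_n < 1/n^2} with independent uniforms.
# sum P(A_n) = sum 1/n^2 < infinity, so N = #{n : A_n occurs} is a.s. finite,
# and E[N] equals the (truncated) sum of the probabilities.
random.seed(1)
M, trials = 200, 10_000  # truncate at n = M, average over this many runs

def count_occurrences():
    """One realization of N, truncated to the first M events."""
    return sum(random.random() < 1 / n ** 2 for n in range(1, M + 1))

mean_N = sum(count_occurrences() for _ in range(trials)) / trials
print(mean_N)  # close to sum_{n<=200} 1/n^2 ~ 1.64
```

Every simulated \(N\) is a small integer, illustrating that only finitely many \(A_n\) occur, and the empirical mean matches \(\sum_{n\le 200}1/n^2\approx 1.64\) up to Monte Carlo error.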