Probability and Statistics 6

Chapter 2 Laws of Large Numbers

2.2 Conditional Expectation

  1. Conditioning on general random variables

    1. Def [ conditional expected value ] :

      Let \(X,Z\) be random variables , \(E[|X|]<\infty\) , define \(Y=E[X|Z]\) with following properties:

      (i). \(Y\) is a function of \(Z\)

      (ii). \(E[|Y|]<\infty\)

      (iii). \(\forall G\in \sigma(Z)\) , \(\int_{G}YdP=\int_GXdP\)

      Moreover , if \(\tilde Y\) is a random variable satisfying (i)(ii)(iii) , then \(\tilde Y=Y\) a.s.

      This definition was given by Kolmogorov in 1933 in "Foundations of the Theory of Probability"

    2. Def [ (regular) conditional probability ] \[ P(X\in A|Z):=E[\mathbb 1(X\in A)|Z] \] Def [ (regular) conditional density ] \[ \exists f_{X|Z}(x|z) \text{ s.t. } P(X\in A|Z)(w)=\int_Af_{X|Z}(x|Z(w))dx \]

      Here "regular" means that for almost every \(w\) , the map \(A\mapsto P(X\in A|Z)(w)\) is a genuine probability measure. A regular version exists in the standard settings considered here, but it can fail to exist in pathological cases.
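      As a minimal sketch (a hypothetical finite example with two fair dice, not from the notes), properties (i)–(iii) can be checked directly: taking \(Z\) as the first die and \(X\) as the sum, \(Y=E[X|Z]\) averages \(X\) over each level set of \(Z\) , and its integral over any \(G\in\sigma(Z)\) agrees with that of \(X\):

```python
import itertools

# Hypothetical finite example: two fair dice, uniform probability.
# Z = first die, X = sum; then E[X|Z=z] = z + 3.5.
omega = list(itertools.product(range(1, 7), repeat=2))
p = 1.0 / len(omega)

def X(w): return w[0] + w[1]
def Z(w): return w[0]

def cond_exp(z):
    # Y = E[X|Z] on {Z = z}: average X over that level set
    ws = [w for w in omega if Z(w) == z]
    return sum(X(w) for w in ws) / len(ws)

# Property (iii): for G = {Z <= 2}, an event in sigma(Z),
# the integral of Y over G equals the integral of X over G.
G = [w for w in omega if Z(w) <= 2]
lhs = sum(cond_exp(Z(w)) * p for w in G)
rhs = sum(X(w) * p for w in G)
assert abs(lhs - rhs) < 1e-12
```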

  2. Famous Results

    1. Bayes rule \[ \begin{aligned} P(B|A)&=\frac{P(A|B)P(B)}{P(A)}\\ P(X=x|Z=z)&=\frac{P(Z=z|X=x)P(X=x)}{P(Z=z)}\\ f_{X|Z}(x|z)&=\frac{f_{Z|X}(z|x)f_X(x)}{f_Z(z)} \end{aligned} \]
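      A small numerical sketch of the discrete form (hypothetical numbers, not from the notes): \(X\) is a bit with prior \(P(X=1)=0.3\) , and \(Z\) is a noisy copy that flips \(X\) with probability \(0.1\):

```python
# Hypothetical discrete Bayes-rule example: prior on X, noisy observation Z.
p_x = {0: 0.7, 1: 0.3}

def p_z_given_x(z, x):
    # Z equals X with prob 0.9, is flipped with prob 0.1
    return 0.9 if z == x else 0.1

def p_z(z):
    # law of total probability
    return sum(p_z_given_x(z, x) * p_x[x] for x in p_x)

def p_x_given_z(x, z):
    # Bayes rule: P(X=x|Z=z) = P(Z=z|X=x) P(X=x) / P(Z=z)
    return p_z_given_x(z, x) * p_x[x] / p_z(z)

post = p_x_given_z(1, 1)
assert abs(post - 0.27 / 0.34) < 1e-12   # = 0.9*0.3 / (0.9*0.3 + 0.1*0.7)
```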

    2. Correlation

      1. Intuition

        \(X\perp\!\!\!\perp Y\Rightarrow E[(X-EX)(Y-EY)]=E[X-EX]\,E[Y-EY]=0\)

        \(E[(X-EX)(Y-EY)]=0\not\Rightarrow X\perp\!\!\!\perp Y\)

      2. Def [ uncorrelated ] \(X,Y\) are uncorrelated random variables , if \(E[(X-EX)(Y-EY)]=0\)

        This definition is equivalent to \(E[XY]=E[X]E[Y]\)

      3. Def [ correlation coefficient ] \[ \rho=\frac{E[(X-EX)(Y-EY)]}{\sqrt{Var(X)}\sqrt{Var(Y)}} \] Def [ covariance ] : \(E[(X-EX)(Y-EY)]\)
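        A worked discrete example of the gap between uncorrelated and independent (a standard construction, stated here as a sketch): take \(X\) uniform on \(\{-1,0,1\}\) and \(Y=X^2\) ; then \(E[XY]=E[X]E[Y]=0\) although \(Y\) is a function of \(X\):

```python
# X uniform on {-1, 0, 1}, Y = X^2: uncorrelated but dependent.
support = [(-1, 1), (0, 0), (1, 1)]   # (x, y) pairs, each with prob 1/3
p = 1 / 3

EX  = sum(x * p for x, _ in support)
EY  = sum(y * p for _, y in support)
EXY = sum(x * y * p for x, y in support)

assert abs(EXY - EX * EY) < 1e-12   # covariance is zero
# But not independent: P(Y=0 | X=0) = 1 while P(Y=0) = 1/3.
```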

      4. THM : Let \(X_1,\cdots,X_n\) have \(E[X_i^2]<\infty\) and be uncorrelated , then \[ Var(X_1+\cdots+X_n)=\sum_{i=1}^n Var(X_i) \] Proof : Let \(S_n=X_1+\cdots+X_n\) and \(\mu_i=E[X_i]\) \[ \begin{aligned} Var(S_n)&=E[(S_n-E[S_n])^2]\\ &=E\left[\left(\sum_{i=1}^n(X_i-\mu_i)\right)^2\right]\\ &=E\left[\sum_{i=1}^n (X_i-\mu_i)^2+\sum_{i\neq j}(X_i-\mu_i)(X_j-\mu_j)\right]\\ &=\sum_{i=1}^n E[(X_i-\mu_i)^2]+\sum_{i\neq j}E[(X_i-\mu_i)(X_j-\mu_j)]\\ &=\sum_{i=1}^n Var(X_i)\qquad\text{since each cross term vanishes by uncorrelatedness} \end{aligned} \]

      5. THM : \(Var(cX)=c^2Var(X)\)

        Remark : for "totally correlated" r.v. , e.g. \(X_1=\cdots=X_n=X\) , this gives \(Var(S_n)=Var(nX)=n^2Var(X)\) , so the additivity above fails without uncorrelatedness.
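        Both variance facts can be verified exactly on a fair coin (a minimal sketch): for an independent pair the variances add, while for the totally correlated pair \(X_1=X_2=X\) one gets \(Var(2X)=4\,Var(X)\):

```python
from itertools import product

# Fair coin X in {0, 1}: Var(X) = 1/4.
p, var_x = 0.5, 0.25

# Independent pair (X1, X2): enumerate the product distribution.
pairs = list(product((0, 1), repeat=2))
e_s  = sum((a + b) * p * p for a, b in pairs)
e_s2 = sum((a + b) ** 2 * p * p for a, b in pairs)
var_sum = e_s2 - e_s ** 2
assert abs(var_sum - 2 * var_x) < 1e-12   # variances add: 1/2

# Totally correlated pair X1 = X2 = X: the sum is 2X.
e_2x  = sum(2 * v * p for v in (0, 1))
e_4x2 = sum((2 * v) ** 2 * p for v in (0, 1))
var_2x = e_4x2 - e_2x ** 2
assert abs(var_2x - 4 * var_x) < 1e-12    # Var(2X) = 4 Var(X) = 1, not 1/2
```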

2.3 Introduction of LLN

  1. Def [ i.i.d. ] i.i.d. means independent and identically distributed

  2. THM [ weak law of large numbers (WLLN) ]

    1. THM : Let \(X_1,X_2,\cdots\) be i.i.d. , with \(\lim\limits_{x\to\infty}xP(|X_1|>x)=0\). Let \(\mu_n=E[X_1\mathbb 1(|X_1|\le n)]\) ; then \[ \frac{1}{n}\sum_{i=1}^n X_i-\mu_n\xrightarrow{P} 0 \]

    2. Remarks

      1. A sufficient condition for \(\lim\limits_{x\to\infty} xP(|X_1|>x)=0\) is \(E[|X_1|]<\infty\)

        Proof : \[ \begin{aligned} &\quad xP(|X_1|>x)\\ &=x\int \mathbb 1(|X_1|>x)dP\\ &\le\int |X_1|\mathbb 1(|X_1|>x)dP\\ &=E[|X_1|\mathbb 1(|X_1|>x)]\\ &\to 0\qquad\qquad\qquad\qquad\qquad\text{since }E[|X_1|]<\infty \end{aligned} \]

      2. When \(E[|X_1|]<\infty\) , using Dominated Convergence Theorem , \[ \mu_n=E[X_1\mathbb 1(|X_1|\le n)]\to E[X_1]=\mu \]

    3. THM [ WLLN in common ] :

      Let \(X_1,X_2,\cdots\) be i.i.d. , with \(E[|X_i|]<\infty\) , and \(\mu=E[X_i]\) , then as \(n\to\infty\) , \[ \frac{1}{n}\sum_{i=1}^n X_i\xrightarrow{P}\mu \]

  3. THM [ strong law of large numbers (SLLN) ]

    Let \(X_1,X_2,\cdots\) be pairwise independent and identically distributed , with \(E[|X_i|]<\infty\) and \(\mu=E[X_i]\) . Then , as \(n\to\infty\), \[ \frac{1}{n}\sum_{i=1}^n X_i\to\mu\qquad a.s. \]
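    A quick Monte Carlo illustration of both laws (a sketch, with an arbitrary seed): the running mean of i.i.d. Uniform(0,1) draws settles near \(\mu=1/2\):

```python
import random

# i.i.d. Uniform(0,1) draws; the sample mean should approach mu = 0.5.
random.seed(42)
n = 100_000
sample_mean = sum(random.random() for _ in range(n)) / n
# For this n the standard deviation of the mean is about 0.0009,
# so the sample mean lands well within 0.01 of mu.
assert abs(sample_mean - 0.5) < 0.01
```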

2.4 Weak Laws of Large Numbers

  1. (*) \(L^2\) Weak Laws

    1. THM [ \(L^2\) weak law ] : Let \(X_1,X_2,\cdots\) be uncorrelated r.v. , with \(E[X_i]=\mu\) and \(Var(X_i)\le C<\infty\). Then as \(n\to\infty\), \[ \frac{1}{n}\sum_{i=1}^n X_i\to \mu \] in \(L^2\) and in probability.

      \(L^2\)-convergence : \(A\to B\) in \(L^2\) means \(E[(A-B)^2]\to 0\).

    2. Lemma : If \(p>0\) , and \(E[|Z_n|^p]\to 0\) , then \(Z_n\xrightarrow{P} 0\).

      Proof : Use Chebyshev's inequality (Markov's inequality with \(\varphi(x)=x^p\)) : \(P(|Z_n|>\epsilon)\le \epsilon^{-p} E[|Z_n|^p]\to 0\).

    3. Proof : Let \(S_n=\sum_{i=1}^n X_i\),

      \(L^2\) convergence : \[ E[(S_n/n-\mu)^2]=Var(S_n/n)=\frac{1}{n^2}\sum_{i=1}^n Var(X_i)\le \frac{C}{n}\to 0 \] In probability : Let \(Z_n=S_n/n-\mu\) , take \(p=2\) , and use the lemma above.

    4. Application : see book

  2. Triangular Arrays

    1. (*) THM

      Let \(X_1,X_2,\cdots\) be random variables, \(S_n=\sum_{i=1}^n X_i\) , \(\mu_n=E[S_n]\) , \(\sigma^2_n=Var(S_n)\). Let \(b_1,b_2,\cdots\) be a sequence of positive numbers with \(\frac{\sigma_n^2}{b_n^2}\to 0\) , then \[ \frac{S_n-\mu_n}{b_n}\xrightarrow{P}0 \] Proof :

      Since \(E[((S_n-\mu_n)/b_n)^2]=Var(S_n)/b_n^2\to 0\) , by the lemma above , the conclusion holds.

    2. Def [ truncation ] : To truncate r.v. \(X\) at level \(M\) , means \[ \bar X=X\mathbb 1(|X|\le M)=\begin{cases}X&|X|\le M\\0&|X|>M\end{cases} \]
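      The truncation in this definition is one line of code (a trivial sketch, with hypothetical values):

```python
# Truncate X at level M: keep X when |X| <= M, otherwise 0.
def truncate(x, m):
    return x if abs(x) <= m else 0.0

assert truncate(3.0, 5.0) == 3.0
assert truncate(-7.0, 5.0) == 0.0   # large values are zeroed, not clipped
assert truncate(5.0, 5.0) == 5.0    # the boundary |X| = M is kept
```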

    3. THM [ Weak Law for Triangular Arrays ] :

      Conditions

      1. For each \(n\) , \(X_{n,1},\cdots,X_{n,n}\) are independent

      2. \(b_1,b_2,\cdots\) is a sequence s.t. \(b_n>0\) and \(b_n\to\infty\) as \(n\to\infty\) . Let \(\bar X_{n,k}=X_{n,k}\mathbb 1(|X_{n,k}|\le b_n)\).

      3. As \(n\to\infty\)

        (i). \(\sum_{k=1}^n P(|X_{n,k}|>b_n)\to 0\)

        (ii). \(\frac{1}{b_n^2}\sum_{k=1}^n E[\bar X_{n,k}^2]\to 0\)

      Result: Let \(S_n=\sum_{k=1}^n X_{n,k}\) , let \(a_n=\sum_{k=1}^n E[\bar X_{n,k}]\) , then \[ \frac{S_n-a_n}{b_n}\xrightarrow{P}0 \] Proof :

      1. Let \(\bar S_n=\sum_{k=1}^n \bar X_{n,k}\) , so \[ P\left(\left|\frac{S_n-a_n}{b_n}\right|>\epsilon\right)\le P(S_n\neq \bar S_n)+P\left(\left|\frac{\bar S_n-a_n}{b_n}\right|>\epsilon\right) \]

      2. \[ P(S_n\neq \bar S_n)\le P\left(\bigcup_{k=1}^n\{\bar X_{n,k}\neq X_{n,k}\}\right)\le \sum_{k=1}^n P(|X_{n,k}|>b_n)\to 0 \]

      3. \[ \begin{aligned} &\quad P\left(\left|\frac{\bar S_n-a_n}{b_n}\right|>\epsilon\right)\\ &\le \frac{1}{\epsilon^2} E\left[\left(\frac{\bar S_n-a_n}{b_n}\right)^2\right]\qquad\text{using Chebyshev's Inequality with }a_n=E[\bar S_n]\\ &=\frac{1}{\epsilon^2b_n^2}Var(\bar S_n)\\ &=\frac{1}{\epsilon^2b_n^2}\sum_{k=1}^n Var(\bar X_{n,k})\qquad{\text{using independence (hence uncorrelatedness)}}\\ &\le \frac{1}{\epsilon^2b_n^2}\sum_{k=1}^n E[\bar X_{n,k}^2]\to0 \end{aligned} \]
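      One classic application of this theorem (not worked out in these notes; the coupon collector's problem, sketched here as a simulation with an arbitrary seed) is that the time \(T_n\) to collect all \(n\) coupon types satisfies \(T_n/(n\log n)\xrightarrow{P}1\):

```python
import random, math

random.seed(1)

def collect(n):
    # draw coupons uniformly at random until all n types have been seen
    seen, t = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        t += 1
    return t

n = 2000
ratio = collect(n) / (n * math.log(n))
# The weak law for triangular arrays gives ratio -> 1 in probability;
# for moderate n the ratio is typically within a few tenths of 1.
assert 0.5 < ratio < 2.5
```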

  3. Weak Law of Large Numbers

    1. THM [ weak law of large numbers (WLLN) ]

      Let \(X_1,X_2,\cdots\) be i.i.d. , with \(\lim\limits_{x\to\infty}xP(|X_1|>x)=0\). Let \(\mu_n=E[X_1\mathbb 1(|X_1|\le n)]\) ; then \[ \frac{1}{n}\sum_{i=1}^n X_i-\mu_n\xrightarrow{P} 0 \]

    2. Proof:

      Let \(X_{n,k}=X_k,b_n=n\), we want to use Weak Law for Triangular Arrays.

      1. For condition (i) , \[ \sum_{k=1}^n P(|X_{n,k}|>b_n)=\sum_{k=1}^n P(|X_k|>n)=nP(|X_1|>n)\to 0 \]

      2. Lemma [ \(t\)-th moment via tail probabilities ] : For random variable \(Y\ge 0\) and \(t>0\), \[ E[Y^t]=\int_{0}^{\infty}ty^{t-1}P(Y>y)dy \] Proof : \[ \begin{aligned} &\quad \int_{0}^{\infty}ty^{t-1}P(Y>y)dy\\ &=\int_{0}^{\infty}\int_{\Omega}ty^{t-1}\mathbb 1(Y>y)dPdy\\ &=\int_{\Omega}\int_{0}^{\infty}ty^{t-1}\mathbb 1(Y>y)dydP\qquad\text{using Fubini--Tonelli}\\ &=\int_{\Omega}\int_{0}^{Y}ty^{t-1}dydP\\ &=\int_{\Omega}Y^tdP\\ &=E[Y^t] \end{aligned} \]

      3. For condition (ii) , \[ \begin{aligned} &\quad \frac{1}{b_n^2}\sum_{k=1}^n E[\bar X_{n,k}^2]\\ &=\frac{1}{b_n^2}\sum_{k=1}^n E[(X_{n,k}\mathbb 1(|X_{n,k}|\le b_n))^2]\\ &=\frac{1}{n^2}\sum_{k=1}^n E[(X_k\mathbb 1(|X_k|\le n))^2]\\ &=\frac{1}{n}E[(X_1\mathbb 1(|X_1|\le n))^2]\\ &=\frac{1}{n} E[\bar X_{n,1}^2]\\ &=\frac{1}{n}\int_{0}^{\infty} 2yP(|\bar X_{n,1}|>y)dy\\ &\le\frac{2}{n}\int_{0}^{n} yP(|X_1|>y)dy\qquad\text{since }P(|\bar X_{n,1}|>y)\le P(|X_1|>y)\mathbb 1(y<n) \end{aligned} \] Let \(g(y)=yP(|X_1|>y)\) ; since \(xP(|X_1|>x)\to 0\) , \(g(y)\) is bounded.

        Let \(g_n(y):=g(ny)\) ; the \(g_n\) are uniformly bounded and \(g_n(y)\to 0\) as \(n\to\infty\) for every \(y>0\) , so \[ \begin{aligned} &\quad\frac{2}{n}\int_{0}^n g(y)dy\\ &=2\int_0^1 g_n(y)dy\\ &\to 0\qquad\qquad\qquad\text{using Dominated Convergence Theorem} \end{aligned} \]

    3. Remark :

      Use the lemma with \(t=1-\epsilon\) : \(xP(|X_1|>x)\to 0\) implies \(E[|X_1|^{1-\epsilon}]<\infty\) for every \(\epsilon>0\) .

      This means the hypothesis \(xP(|X_1|>x)\to 0\) is not much weaker than \(E[|X_1|]<\infty\) .

    4. THM [ WLLN in common ] :

      Let \(X_1,X_2,\cdots\) be i.i.d. , with \(E[|X_i|]<\infty\) and \(\mu=E[X_i]\) , then as \(n\to\infty\) , \[ \frac{1}{n}\sum_{i=1}^n X_i\xrightarrow{P}\mu \]

    5. (*) Remarks :

      1. Weak Law does not hold : [ Cauchy Distribution ]

        \(P(X_i\le x)=\int_{-\infty}^x \frac{dt}{\pi(1+t^2)}\)

        As \(x\to\infty\) , \[ P(|X_1|>x)=2\int_{x}^{\infty} \frac{dt}{\pi(1+t^2)}\sim \frac{2}{\pi}\int_{x}^{\infty} t^{-2}dt= \frac{2}{\pi} x^{-1} \] Therefore , \(xP(|X_1|>x)\to\frac{2}{\pi}\neq 0\) .
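        A simulation sketch of this failure (arbitrary seed): sample means of standard Cauchy draws are again standard Cauchy, so their spread never shrinks; the fraction of sample means exceeding \(1\) in absolute value stays near \(P(|X_1|>1)=1/2\) for every \(n\):

```python
import random, math

random.seed(0)

def cauchy():
    # standard Cauchy via the inverse CDF: tan(pi (U - 1/2))
    return math.tan(math.pi * (random.random() - 0.5))

def frac_large_means(n, trials=200):
    # fraction of size-n sample means with |mean| > 1
    count = 0
    for _ in range(trials):
        m = sum(cauchy() for _ in range(n)) / n
        count += abs(m) > 1
    return count / trials

fracs = {n: frac_large_means(n) for n in (10, 1000)}
# No concentration: both fractions hover near 1/2 instead of dropping to 0.
```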

      2. Weak Law holds but \(E[X_1]=\infty\)

        E.g. \(P(X_i=2^j)=2^{-j}\) for \(j=1,2,\cdots\)

        SOL : go back to the weak law for triangular arrays and choose a larger \(b_n\) , e.g. \(b_n=n\log_2 n\)

        \(S_n/(n\log_2 n)\xrightarrow P 1\)
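        A simulation sketch of this example (arbitrary seed): with \(b_n=n\log_2 n\) (base-2 logarithm, matching the dyadic tails) the normalized sum is typically of order \(1\) even though \(E[X_1]=\infty\) ; since convergence is slow and the upper tail is heavy, only a loose bound is checked:

```python
import random, math

random.seed(7)

def petersburg():
    # P(X = 2^j) = 2^{-j} for j = 1, 2, ...
    j = math.floor(-math.log2(random.random())) + 1
    return 2 ** j

n = 200_000
s = sum(petersburg() for _ in range(n))
ratio = s / (n * math.log2(n))
# S_n/(n log2 n) -> 1 in probability; the realized ratio is of order 1,
# with occasional large upward excursions from a single huge draw.
assert 0.5 < ratio < 50
```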

2.5 Borel-Cantelli Lemma

  1. \(\limsup,\liminf\) of sets

    1. Def [ limsup of events ] Let \(A_1,A_2,\cdots\) be events , \[ \limsup A_n:=\lim_{m\to\infty}\bigcup_{n=m}^{\infty}A_n=\bigcap_{m=1}^{\infty}\bigcup_{n=m}^{\infty}A_n \] ( the limit exists since the unions decrease in \(m\) ) THM : \(\limsup A_n=\{w:w\text{ in infinitely many }A_n\}\)

      Proof :

      \(\supseteq\) : Since \(w\) is in infinitely many \(A_n\) , \(\forall m>0\) , \(w\in B_m=\cup_{n=m}^{\infty} A_n\) , so \(w\in \limsup A_n\)

      \(\subseteq\) : If \(w\) is in only finitely many \(A_n\) , say \(w\in A_{n_1},\cdots,A_{n_k}\) with \(n_1<\cdots<n_k\) , let \(m= n_k+1\) , so \(w\notin B_m\) , so \(w\notin \limsup A_n\).

    2. Def [ liminf of events ] Let \(A_1,A_2,\cdots\) be events , \[ \liminf A_n:=\lim_{m\to\infty}\bigcap_{n=m}^{\infty}A_n=\bigcup_{m=1}^{\infty}\bigcap_{n=m}^{\infty}A_n \] ( the limit exists since the intersections increase in \(m\) ) THM : \(\liminf A_n=\{w:w\text{ in all but finitely many }A_n\}\)

      Proof :

      \(\supseteq\) : Suppose \(w\) is outside only finitely many \(A_n\) , say \(A_{n_1},\cdots,A_{n_k}\) with \(n_1<\cdots<n_k\) ; then \(\forall m\ge n_k+1\) , \(w\in C_m=\cap_{n=m}^{\infty} A_n\) , so \(w\in \liminf A_n\).

      \(\subseteq\) : Suppose \(w\) is outside infinitely many \(A_n\) , say \(A_{n_1},A_{n_2},\cdots\) ; then \(\forall m>0\) , \(\exists n_k>m\) with \(w\notin A_{n_k}\) , so \(w\notin C_m\) , so \(w\notin \liminf A_n\).

    3. Def [ infinitely often (i.o.) ] an event occurs infinitely often if it occurs for infinitely many \(n\) \[ \limsup A_n=\{w:w\in A_n\text{ i.o.}\} \]

    4. (*) Property

      \(P(\limsup A_n)\ge \limsup P(A_n)\) , \(P(\liminf A_n)\le \liminf P(A_n)\)

    5. THM : The following three statements are equivalent

      (i). \(X_n\to 0\) a.s.

      (ii). \(\forall \epsilon>0\) , \(P(w:|X_n(w)|>\epsilon \text{ i.o.})=0\)

      (iii). Let \(A_n(\epsilon)=\{w:|X_n(w)|>\epsilon\}\) , \(P\left(\bigcup_{\epsilon>0} \limsup A_n(\epsilon)\right)=0\)

      Proof :

      (i)\(\to\) (ii) : \(X_n\to 0\) a.s. means \(P\{w:X_n(w)\not\to 0\}=0\) .

      Let \(S=\{w:X_n(w)\to 0\}\) , so \(\forall w\in S\) , \(\forall \epsilon>0\) , \(\exists N_{w,\epsilon}>0\) , \(\forall n>N_{w,\epsilon}\) , \(|X_n(w)|\le \epsilon\) . Therefore , \(w\notin \{w:|X_n(w)|>\epsilon\text{ i.o.}\}\). Therefore , \(\{w:X_n(w)\not\to 0\}\supseteq \{w:|X_n(w)|>\epsilon\text{ i.o.}\}\) , and the left side has probability \(0\) .

      (ii)\(\to\)(iii) : \(\forall 0<\epsilon_1<\epsilon_2\) , \(\limsup A_n(\epsilon_1)\supseteq \limsup A_n(\epsilon_2)\)

      Let \(\epsilon_1,\epsilon_2,\cdots\) be a sequence , \(\lim_{i\to\infty} \epsilon_i=0\) , so \[ \bigcup_{\epsilon>0}\limsup A_n(\epsilon)=\lim_{I\to\infty} \bigcup_{i=1}^I\limsup A_n(\epsilon_i) \] Therefore , \[ \begin{aligned} &\quad P\left(\bigcup_{\epsilon>0}\limsup A_n(\epsilon)\right)\\ &\le\lim_{I\to\infty}\sum_{i=1}^I P(\limsup A_n(\epsilon_i))\\ &=\lim_{I\to\infty}\sum_{i=1}^I P(\{w:|X_n(w)|>\epsilon_i\text{ i.o.}\})\\ &=0 \end{aligned} \] (iii)\(\to\)(i) : Let \(\Omega_0=\{w:X_n(w)\not\to 0\}\) , we only need to prove that \(\bigcup_{\epsilon>0} \limsup A_n(\epsilon)=\Omega_0\).

      1. \(\forall \epsilon>0\) , \(\limsup A_n(\epsilon)=\{w:|X_n(w)|>\epsilon\text{ i.o.}\}\)

        For \(w\in \limsup A_n(\epsilon)\) , \(X_n(w)\not\to 0\) , so \(w\in \Omega_0\).

      2. \(\forall w\in \Omega_0\) , \(\exists \epsilon_0>0\) , \(\forall N>0\) , \(\exists n_N>N\) , s.t. \(|X_{n_N}(w)|>\epsilon_0\) .

        Therefore , \(w\) is in infinitely many \(A_n(\epsilon_0)\)

        Therefore , \(w\in \limsup A_n(\epsilon_0)\)

  2. Borel-Cantelli Lemma

    1. THM [ Borel-Cantelli Lemma ] : \(A_1,A_2,\cdots\) be a sequence of events , \[ \sum_{n=1}^{\infty}P(A_n)<\infty\Rightarrow P(w:w\in A_n\text{ i.o.})=0 \]

    2. Proof :

      Let \(N(w)=\sum_{k=1}^{\infty}\mathbb 1_{A_k}(w)\) , ( which counts the number of \(A_k\) containing \(w\) ) , therefore, \[ \begin{aligned} E[N]&=E\left[\sum_{k=1}^{\infty}\mathbb 1_{A_k}\right]\\ &=\sum_{k=1}^{\infty} E[\mathbb 1_{A_k}]\qquad \text{using Fubini's Theorem (Tonelli)}\\ &=\sum_{k=1}^{\infty}P(A_k)\\ &<\infty \end{aligned} \] Therefore , \(N(w)<\infty\) a.s.

      Therefore , \[ P(w:w\in A_n\text{ i.o.})=P(w:N(w)=\infty)=0 \]
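      The counting argument in this proof is easy to watch numerically (a sketch with independent events and an arbitrary seed): take \(P(A_n)=2^{-n}\) , so \(\sum_n P(A_n)=1<\infty\) ; along every simulated path only finitely many \(A_n\) occur, and the average count is near \(E[N]=1\):

```python
import random

random.seed(3)

def occurrences(n_max=60):
    # N = number of events A_n that occur, with independent P(A_n) = 2^{-n};
    # truncated at n = 60, beyond which P(A_n) is below ~1e-18
    return sum(random.random() < 2.0 ** (-n) for n in range(1, n_max + 1))

trials = [occurrences() for _ in range(10_000)]
avg = sum(trials) / len(trials)

assert max(trials) < 10   # every simulated path saw only finitely many A_n
assert 0.9 < avg < 1.1    # E[N] = sum_n 2^{-n} is approximately 1
```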