Probability and Statistics 4
Chapter 1 Background in Probability
1.5 Properties of Integration
Hölder's Inequality
Def [ \(L_p\) Norm ] : For \(p\ge 1\) , the \(L_p\) norm of \(f\) is \(||f||_p=\left(\int |f|^p d\mu\right)^{1/p}\)
THM [ Hölder's Inequality ] : For all \(p,q>1\) with \(\frac{1}{p}+\frac{1}{q}=1\) , \[ \int |fg|d\mu \le ||f||_p||g||_q \]
Proof
Lemma [ Young's Inequality ] : If \(p,q>1,\frac{1}{p}+\frac{1}{q}=1\) , then \(\forall x,y\ge 0\) , \(xy\le \frac{1}{p}x^p+\frac{1}{q}y^q\) .
Proof :
Fix \(y\) , let \(f(x)=\frac{1}{p}x^p-yx+\frac{1}{q}y^q\) , so \(f'(x)=x^{p-1}-y\)
Therefore \(f\) attains its minimum at \(x=y^{1/(p-1)}\) , and since \(\frac{p}{p-1}=q\) , \(f(y^{1/(p-1)})=\frac{1}{p}y^q-y^q+\frac{1}{q}y^q=0\) , so \(f(x)\ge 0\) . \(\Box\)
If \(||f||_p=0\) or \(||g||_q=0\) , then \(f=0\) a.e. or \(g=0\) a.e. , so \(\int |fg|d\mu=0\) and the inequality holds trivially .
When \(||f||_p\neq 0,||g||_q\neq 0\) , we may assume \(||f||_p=||g||_q=1\) without loss of generality , since replacing \(f,g\) by \(f/||f||_p,g/||g||_q\) divides both sides by \(||f||_p||g||_q\) .
Therefore , applying the lemma pointwise with \(x=|f(w)|\) , \(y=|g(w)|\) , \[ \begin{aligned} \int |fg|d\mu&\le \int \left(\frac{1}{p}|f|^p+\frac{1}{q}|g|^q\right)d\mu\\ &=\frac{1}{p}||f||_p^p+\frac{1}{q}||g||_q^q\\ &=\frac{1}{p}+\frac{1}{q}\\ &=1\\ &=||f||_p||g||_q \end{aligned} \]
Remark : The normalization \(||f||_p=||g||_q=1\) is essential . Without it , the lemma only gives \[ \int |fg|d\mu\le \frac{1}{p}||f||_p^p+\frac{1}{q}||g||_q^q , \] and the right-hand side is in general \(\ge ||f||_p||g||_q\) ( by Young's inequality ) , which is not what we want .
But exactly when \(||f||_p=||g||_q=1\) , the right-hand side equals \(\frac{1}{p}+\frac{1}{q}=1=||f||_p||g||_q\) .
Remark : When \(p=q=2\) , Hölder's Inequality becomes the Cauchy–Schwarz Inequality . \[ \left(\int |fg|d\mu\right)^2\le \left(\int f^2d\mu\right)\left(\int g^2d\mu\right) \]
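As a quick numerical sanity check ( not part of the proof ) , the following minimal Python sketch verifies Hölder's inequality under the counting measure on a finite set , where integrals reduce to sums ; the vectors \(f,g\) , the exponents , and the seed are arbitrary illustrative choices .

```python
import numpy as np

# Sanity check of Hölder's inequality under the counting measure on
# {1, ..., 1000}, where integrals reduce to sums. The vectors f, g and
# the exponents p below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
f = rng.normal(size=1000)
g = rng.normal(size=1000)

for p in [1.5, 2.0, 3.0]:
    q = p / (p - 1)                                  # conjugate exponent: 1/p + 1/q = 1
    lhs = np.sum(np.abs(f * g))                      # "∫ |fg| dμ" as a sum
    rhs = np.sum(np.abs(f) ** p) ** (1 / p) * np.sum(np.abs(g) ** q) ** (1 / q)
    print(f"p={p}: {lhs:.2f} <= {rhs:.2f}  ->  {lhs <= rhs}")
```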
Convergence of functions
Def [ converge a.e. ] : Let \(f_1,f_2,\cdots\) be a sequence of functions . We say \(f_n\) converges to \(f\) almost everywhere if \[ \mu\left(\left\{w:\lim_{n\to\infty} f_n(w)\neq f(w)\right\}\right)=0 \]
Def [ converge in measure ] : Let \(f_1,f_2,\cdots\) be a sequence of functions . We say \(f_n\) ***converges to \(f\) in measure*** if \[ \forall \epsilon>0 , \lim_{n\to\infty}\mu\left(\left\{w:|f_n(w)-f(w)|\ge \epsilon\right\}\right)=0 \]
Def [ almost uniform convergence ] : Let \(f_1,f_2,\cdots\) be a sequence of functions and \(f:E\to \mathbb R\) . We say \(f_n\) converges to \(f\) almost uniformly if
\(\forall \epsilon_1>0\) , there exists a measurable set \(D\subseteq E\) with \(\mu(D)<\epsilon_1\) such that \(f_n\to f\) uniformly on \(E\backslash D\) , i.e. \[ \forall \epsilon>0,\exists N>0,\forall n>N,\forall x\in E\backslash D,|f_n(x)-f(x)|<\epsilon \]
THM [ Egorov's Theorem ] : If \(f\) and the \(f_n\) are supported on a set \(E\) with \(\mu(E)<\infty\) , then
\(f_n\to f\) a.e. \(\Rightarrow\) \(f_n\to f\) almost uniformly
THM : \(f_n\to f\) almost uniformly \(\Rightarrow\) \(f_n\to f\) in measure
Remarks
The differences between the modes of convergence :
- a.e. : essentially pointwise convergence , but the rate of convergence may differ from point to point
- uniform : the \(N\) depends only on \(\epsilon\) ( not on \(w\) ) , i.e. the rate of convergence is uniform
- in measure : similar to uniform convergence , but the exceptional set where convergence fails is allowed to depend on \(n\)
Almost uniform convergence is stronger than convergence in measure .
When \(\mu(E)<\infty\) , convergence a.e. is stronger than almost uniform convergence .
When \(\mu(E)=\infty\) , Egorov's Theorem may fail :
\((\mathbb R,\mathcal R,\lambda)\) , \(f_n(x)=\mathbb 1_{[n,n+1]}(x)\)
\(f_n\to 0\) a.e. ( in fact everywhere ) , but for \(\epsilon=\frac{1}{2}\) , \(\mu(\{w:|f_n(w)|>\frac{1}{2}\})=1\) for every \(n\) , so \(f_n\) does not converge to \(0\) in measure ( nor almost uniformly )
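The sketch below makes this concrete : the grid is a crude numerical stand-in for Lebesgue measure , and the interval \([0,200]\) , resolution , and sample points are my own illustrative choices .

```python
import numpy as np

# The escaping indicators f_n = 1_{[n, n+1]}: at every fixed x, f_n(x) = 0
# once n > x, yet λ({|f_n| > 1/2}) = λ([n, n+1]) = 1 for every n, so f_n
# converges to 0 pointwise but not in measure.
x = np.linspace(0, 200, 2_000_001)
dx = x[1] - x[0]
for n in [1, 10, 100]:
    fn = ((x >= n) & (x <= n + 1)).astype(float)
    print(f"n={n}: f_n(0.5) = {fn[5000]},  λ(|f_n| > 1/2) ≈ {np.sum(fn > 0.5) * dx:.3f}")
```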
Convergence of random variables
Now let \(\mu\) be a probability measure , and \(f,f_1,f_2,\cdots\) random variables .
Def [ converge a.s. ] : If \(f_n\to f\) a.e. , then we say \(f_n\) converges to \(f\) almost surely , denoted \(f_n\to f\) a.s.
Def [ converge in probability ] : If \(f_n\to f\) in measure , then we say \(f_n\) converges to \(f\) in probability , denoted \(f_n\xrightarrow{P} f\) .
Bounded Convergence Theorem
THM [ Bounded Convergence Theorem (BCT) ]
Condition :
There exists \(E\in \mathcal F\) with \(\mu(E)<\infty\) such that \(\forall n\ge 1\) , \(f_n=0\) on \(E^c\) .
\(\exists M>0\) , \(\forall n\ge 1\) , \(|f_n(x)|\le M\)
\(f_n\to f\) in measure
Result : \[ \lim_{n\to\infty}\int f_n d\mu=\int fd\mu \]
Proof
First note that \(f=0\) a.e. on \(E^c\) : for \(x\in E^c\) we have \(f_n(x)=0\) , so \(\{x\in E^c:|f(x)|\ge \delta\}\subseteq \{x:|f_n(x)-f(x)|\ge \delta\}\) for every \(n\) , and the latter has measure tending to \(0\) . Hence all the integrals below effectively live on \(E\) .
\(\forall \epsilon>0\) , let \(G_n=\{x:|f_n(x)-f(x)|<\epsilon\}\) , let \(B_n=\Omega-G_n\) \[ \begin{aligned} &\quad\left|\int f_nd\mu-\int fd\mu\right|\\ &\le \int |f_n-f|d\mu\\ &=\int_{G_n}|f_n-f|d\mu+\int_{B_n\cap \{|f|\le M+1\}}|f_n-f|d\mu+\int_{B_n\cap \{|f|>M+1\}}|f_n-f|d\mu\\ &\le \epsilon \mu(E)+(2M+1)\mu(B_n)+\int_{\{|f|\ge M+\frac{1}{2}\}} (|f|+M)d\mu\\ &\le \epsilon \mu(E)+(2M+1)\mu(B_n)+M\mu\left\{|f|\ge M+\frac{1}{2}\right\}+\int_{\{|f|\ge M+\frac{1}{2}\}}|f|d\mu \end{aligned} \] Since \(f_n\to f\) in measure , \(\mu(B_n)\to 0\) as \(n\to \infty\) .
Moreover \(\mu\{|f|\ge M+\frac{1}{2}\}=0\) : otherwise , since \(|f_n|\le M\) , we would have \(\mu\{x:|f_n(x)-f(x)|\ge \frac{1}{2}\} \ge \mu\{|f|\ge M+\frac{1}{2}\}>0\) for every \(n\) , contradicting \(f_n\to f\) in measure .
Hence \(\int_{\{|f|\ge M+\frac{1}{2}\}}|f|d\mu=0\) , since the integral of any measurable function over a null set vanishes . Letting \(n\to\infty\) and then \(\epsilon\to 0\) completes the proof . \(\Box\)
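A small numerical illustration of BCT ( my own example , not from the proof ) : take \(f_n(x)=x^n\) on \(([0,1],\lambda)\) , which is bounded by \(M=1\) and converges to \(0\) in measure , and indeed \(\int f_nd\lambda=\frac{1}{n+1}\to 0\) .

```python
import numpy as np

# BCT illustration on ([0,1], Lebesgue): f_n(x) = x^n satisfies |f_n| <= M = 1
# and f_n -> 0 in measure; indeed ∫ f_n dλ = 1/(n+1) -> 0 = ∫ f dλ.
# Grid resolution and threshold 0.1 are illustrative choices.
x = np.linspace(0, 1, 1_000_001)
dx = x[1] - x[0]
for n in [1, 10, 100, 1000]:
    fn = x ** n
    integral = np.sum(fn) * dx        # Riemann sum ≈ 1 / (n + 1)
    bad = np.mean(fn >= 0.1)          # ≈ λ({|f_n - 0| ≥ 0.1}) = 1 - 0.1^(1/n)
    print(f"n={n}: ∫f_n ≈ {integral:.4f},  λ(|f_n| ≥ 0.1) ≈ {bad:.4f}")
```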
Fatou's Lemma
Lemma [ Fatou's Lemma ] : If \(f_n\ge 0\) , then \[ \liminf_{n\to\infty} \int f_nd\mu\ge \int \left(\liminf_{n\to\infty}f_n\right)d\mu \]
Proof
Let \(g_n(x)=\inf_{m\ge n} f_m(x)\) , so \(f_n(x)\ge g_n(x)\) . And as \(n\uparrow\infty\) , \(g_n(x)\uparrow g(x)=\liminf\limits_{n\to\infty} f_n(x)\) .
Therefore , we only need to prove that \[ \lim_{n\to\infty} \int g_nd\mu \ge \int g\,d\mu \] ( the limit exists since \(\int g_nd\mu\) is non-decreasing ) . Consider \(E_m\uparrow \Omega\) with \(\mu(E_m)<\infty\) , i.e. \(E_1\subseteq E_2\subseteq\cdots\) and \(\bigcup_{m=1}^{\infty} E_m=\Omega\) , then
\(\forall m>0\) , \(m\) fixed , \((g_n\land m)\mathbb 1_{E_m}\to (g\land m)\mathbb 1_{E_m}\) a.e. ; these functions are bounded by \(m\) and supported on \(E_m\) with \(\mu(E_m)<\infty\) , and a.e. convergence on a set of finite measure implies convergence in measure , so the Bounded Convergence Theorem applies .
Therefore , for any fixed \(m>0\) , \[ \begin{aligned} &\quad \lim_{n\to\infty}\int g_nd\mu\\ &\ge \lim_{n\to\infty} \int_{E_m}(g_n\land m)d\mu\\ &=\int_{E_m}(g\land m)d\mu \quad\text{(by BCT)}\\ \end{aligned} \] Therefore , \[ \begin{aligned} &\quad \lim_{n\to\infty}\int g_nd\mu\\ &\ge \sup_{m>0}\int_{E_m}(g\land m)d\mu\\ &=\lim_{m\to\infty} \int_{E_m}(g\land m)d\mu\\ &=\int gd\mu \end{aligned} \] where the last equality holds because the integral of a non-negative function is the supremum of the integrals of its bounded truncations supported on sets of finite measure .
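For intuition , here is a numerical example ( of my own choosing ) where Fatou's inequality is strict : \(f_n=n\cdot\mathbb 1_{(0,1/n)}\) on \(([0,1],\lambda)\) , so \(\int f_nd\lambda=1\) for every \(n\) while \(\liminf f_n=0\) pointwise .

```python
import numpy as np

# Strict Fatou: f_n = n·1_{(0,1/n)} on [0,1]. Every ∫ f_n dλ = 1, yet
# f_n(x) -> 0 for every x, so ∫ (liminf f_n) dλ = 0 < 1 = liminf ∫ f_n dλ.
x = np.linspace(0, 1, 1_000_001)
dx = x[1] - x[0]
for n in [10, 100, 1000]:
    fn = np.where((x > 0) & (x < 1 / n), float(n), 0.0)
    print(f"n={n}: ∫f_n ≈ {np.sum(fn) * dx:.3f},  f_n(0.5) = {fn[len(x) // 2]}")
```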
Monotone Convergence Theorem
THM [ Monotone Convergence Theorem (MCT) ]
Condition :
\(f_n\ge 0\)
\(f_n\uparrow f\) a.e.
Result : \[ \int f_nd\mu \uparrow \int fd\mu \]
Proof
Since \(f_n\uparrow f\) , we have \(f_n\le f\) for every \(n\) , so \(\int f_nd\mu\le \int fd\mu\) and hence \(\limsup\limits_{n\to\infty} \int f_nd\mu\le \int fd\mu\) .
By Fatou's Lemma ,
\[ \begin{aligned} &\quad \liminf_{n\to\infty}\int f_nd\mu\\ &\ge \int \left(\liminf_{n\to\infty} f_n\right)d\mu\\ &=\int \left(\lim_{n\to\infty} f_n\right)d\mu\\ &=\int fd\mu \end{aligned} \] Therefore , \(\lim\limits_{n\to\infty} \int f_nd\mu=\int fd\mu\) .
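A quick numerical illustration of MCT ( illustrative example , my choice ) : truncations \(f_n=f\land n\) of an unbounded but integrable \(f\) increase to \(f\) , and the integrals increase accordingly .

```python
import numpy as np

# MCT illustration on ((0,1), Lebesgue): f(x) = x^{-1/2} is integrable with
# ∫ f dλ = 2, and the truncations f_n = f ∧ n increase to f, so
# ∫ f_n dλ = 2 - 1/n ↑ 2. Grid-based Riemann sums, illustrative only.
x = np.linspace(1e-9, 1, 1_000_001)
dx = x[1] - x[0]
f = x ** -0.5
for n in [2, 10, 100, 1000]:
    fn = np.minimum(f, n)
    print(f"n={n}: ∫f_n ≈ {np.sum(fn) * dx:.4f}   (exact: {2 - 1/n:.4f})")
```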
Dominated Convergence Theorem
THM [ Dominated Convergence Theorem (DCT) ]
Condition :
\(f_n\to f\) a.e.
There exists a function \(g\) such that \(\forall n\ge 1,|f_n|\le g\) a.e.
\(g\) is integrable , i.e. \(\int |g|d\mu <\infty\)
Result : \[ \int f_n d\mu\to \int fd\mu \]
Proof
Since \(|f_n|\le g\) , \(f_n+g\ge 0\) a.e. and \(-f_n+g\ge 0\) a.e.
By Fatou's Lemma , \[ \begin{aligned} \liminf_{n\to\infty} \int (f_n+g)d\mu\ge \int (f+g)d\mu \quad&\Rightarrow\quad \liminf_{n\to\infty} \int f_nd\mu \ge \int fd\mu\\ \liminf_{n\to\infty} \int (-f_n+g)d\mu\ge \int (-f+g)d\mu \quad&\Rightarrow\quad \limsup_{n\to\infty} \int f_nd\mu \le \int fd\mu\\ \end{aligned} \]
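A numerical illustration of DCT ( my own example ) : \(f_n(x)=n\sin(x/n)e^{-x}\to xe^{-x}\) pointwise on \([0,\infty)\) , with the integrable dominating function \(g(x)=xe^{-x}\) ( since \(|\sin t|\le t\) for \(t\ge 0\) ) .

```python
import numpy as np

# DCT illustration on ([0,∞), Lebesgue): f_n(x) = n·sin(x/n)·e^{-x} -> x·e^{-x}
# pointwise and |f_n| ≤ x·e^{-x}, which is integrable. Hence
# ∫ f_n dλ -> ∫ x e^{-x} dx = 1. Truncating at x = 50, where e^{-x} is negligible.
x = np.linspace(0, 50, 1_000_001)
dx = x[1] - x[0]
for n in [1, 10, 100]:
    fn = n * np.sin(x / n) * np.exp(-x)
    print(f"n={n}: ∫f_n ≈ {np.sum(fn) * dx:.4f}")
print(f"limit: ∫ x·e^(-x) dx ≈ {np.sum(x * np.exp(-x)) * dx:.4f}")
```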
1.6 Expected Value
Basic concept of Expectation
Def [ Expected Value ] : For \(X\ge 0\) be a random variable on \((\Omega,\mathcal F,P)\) , its expected value is \[ E[X]=\int X dP \] For general case , let \(X^+=\max\{X,0\}\) , \(X^-=\max\{-X,0\}\) .
Define \(E[X]\) when \(E[X^+]<\infty\) or \(E[X^-]<\infty\) , as \(E[X]=E[X^+]-E[X^-]\) .
Remarks
The definition of expected value is slightly broader than integrability , since \(E[X]\) is also defined ( as \(+\infty\) or \(-\infty\) ) when exactly one of \(E[X^+],E[X^-]\) is infinite . But usually this does not matter .
We can construct an example where \(E[X]=\infty\) :
\(P(X=2^{j})=2^{-j}\) for integer \(j\ge 1\) , then \(E[X]=\sum_{j=1}^{\infty} 2^jP(X=2^j)=\sum_{j=1}^{\infty}1=\infty\)
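A minimal simulation of this example ( sample sizes and seed are arbitrary choices ) : the running sample mean keeps growing instead of settling , reflecting \(E[X]=\infty\) .

```python
import numpy as np

# Simulating P(X = 2^j) = 2^{-j}, j ≥ 1: the sample mean grows roughly like
# log₂ of the sample size rather than converging, reflecting E[X] = ∞.
rng = np.random.default_rng(1)
j = rng.geometric(p=0.5, size=10_000_000)    # P(j = k) = 2^{-k}, k = 1, 2, ...
X = 2.0 ** j
for size in [10**3, 10**5, 10**7]:
    print(f"mean of first {size:>8} samples: {X[:size].mean():.1f}")
```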
\(E[X]\) is often called mean of \(X\) , and denoted as \(\mu\) (different from measure!).
Basic properties
- \(E[X+Y]=E[X]+E[Y]\)
- \(E[aX+b]=aE[X]+b\)
- If \(X\ge Y\) , then \(E[X]\ge E[Y]\)
Inequalities
THM [ Jensen's Inequality ] : Suppose \(\varphi\) is convex , and \(E[\varphi(X)],E[X]\) exist , then \[ E[\varphi(X)]\ge \varphi(E[X]) \]
Corollary : \(|E[X]|\le E[|X|]\) , \((E[X])^2\le E[X^2]\)
THM [ Hölder's Inequality ] : If \(p,q\in [1,\infty]\) , \(\frac{1}{p}+\frac{1}{q}=1\) , then \[ E|XY|\le ||X||_p||Y||_q \] Here define \(||X||_r=(E[|X|^r])^{1/r}\) for \(r\in [1,\infty)\) ,
and define \(||X||_{\infty}=\inf\{M:P(|X|>M)=0\}\) ( the essential supremum , i.e. the a.s. maximum )
THM [ Chebyshev's Inequality ] : \(\varphi : \mathbb R\to \mathbb R\) , \(\varphi\ge 0\)
Let \(A\in \mathcal R\) , and let \(i_A=\inf\{\varphi(y):y\in A\}\) . Therefore , \[ i_A P(X\in A)\le E[\varphi(X)\mathbb 1(X\in A)]\le E[\varphi(X)] \] Proof : \[ i_A\mathbb 1(X\in A)\le \varphi(X)\mathbb 1(X\in A)\le \varphi(X) \]
THM [ Chebyshev's Inequality 2 ] : Let \(\varphi(x)=(x-\mu)^2\) and \(A=\{x:|x-\mu|\ge t\}\) , so \(i_A=t^2\) and \[ \Pr\{|X-\mu|\ge t\}\le\frac{Var(X)}{t^2} \]
THM [ Markov's Inequality ] : If \(X\ge 0\) a.s. , then for every \(t>0\) , \[ \Pr\{X\ge t\}\le \frac{E[X]}{t} \]
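A Monte Carlo sanity check of both inequalities ( the Exponential(1) distribution , sample size , thresholds , and seed are arbitrary illustrative choices ; for Exponential(1) , \(E[X]=Var(X)=1\) ) :

```python
import numpy as np

# Empirical check of Markov's and Chebyshev's inequalities for X ~ Exp(1).
rng = np.random.default_rng(2)
X = rng.exponential(scale=1.0, size=1_000_000)
mu, var = X.mean(), X.var()
for t in [2.0, 3.0, 5.0]:
    print(f"t={t}: P(X ≥ t) = {np.mean(X >= t):.4f} ≤ E[X]/t = {mu / t:.4f};  "
          f"P(|X-μ| ≥ t) = {np.mean(np.abs(X - mu) >= t):.4f} ≤ Var/t² = {var / t**2:.4f}")
```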
Convergence
THM [ Egorov's Theorem ] : Let \(X_1,X_2,\cdots\) be a sequence of random variables . Since \(P(\Omega)=1<\infty\) ,
\(X_n\to X\) a.s. \(\Rightarrow\) \(X_n\to X\) in probability
THM [ Bounded Convergence Theorem ] :
Condition :
\(X_n\to X\) a.s.
\(\exists M>0\) , \(\forall n\ge 1,|X_n|\le M\)
Result : \(E[X_n]\to E[X]\)
THM [ Fatou's Lemma ] : If \(X_n\ge 0\) , then \[ \liminf_{n\to\infty} E[X_n]\ge E\left[\liminf_{n\to\infty} X_n\right] \]
THM [ Monotone Convergence Theorem ] :
Condition :
\(X_n\ge 0\)
\(X_n\uparrow X\) a.s.
Result : \(E[X_n]\uparrow E[X]\)
THM [ Dominated Convergence Theorem ] :
Condition :
\(X_n\to X\) a.s.
\(\exists Y\) , \(\forall n\ge 1\) , \(|X_n|\le Y\) a.s.
\(E[|Y|]<\infty\)
Result : \(E[X_n]\to E[X]\)
THM
Condition :
\(X_n\to X\) a.s.
\(g,h\) continuous functions
\(g\ge 0\) , \(g(x)\to\infty\) as \(|x|\to\infty\)
\(|h(x)|/g(x)\to 0\) as \(|x|\to\infty\)
\(\exists K>0\) , s.t. \(\forall n\ge 1\) , \(E[g(X_n)]\le K<\infty\)
Result : \(E[h(X_n)]\to E[h(X)]\) .
Remark : similar to DCT , use \(g\) to dominate \(h\)
Proof : see book
Intuition : truncation . Consider \(\bar X=X\cdot \mathbb 1(|X|\le M)\) , choosing the level \(M\) so that \(P(|X|=M)=0\) ; the tail errors are controlled because \(h\) is dominated by \(g\) for large \(|x|\) . \[ E[h(X_n)]\approx E[h(\bar X_n)]\to E[h(\bar X)]\approx E[h(X)] \]
Computing Expected Value
THM [ Change of variable formula ] :
Condition :
(i). \(X\) is a random variable taking values in \((S,\mathcal S)\) , with distribution \(\mu\) ( i.e. \(\mu(A)=P(X\in A)\) )
(ii). \(f\) is a measurable function from \((S,\mathcal S)\) to \((\mathbb R,\mathcal R)\)
(iii). \(f\ge 0\) or \(E[|f(X)|]<\infty\)
Result : \[ E[f(X)]=\int_S f(y)\mu(dy) \]
Remark : let \(\mu=P\circ X^{-1}\) , so \[ \int f(X)dP=\int_S f(y)d(P\circ X^{-1}) \]
Further computation
When \(X\) is a continuous random variable ( i.e. \(\mu\ll\lambda\) ) , the PDF \(p\) is the Radon–Nikodym derivative \(\frac{d\mu}{d\lambda}\) , so \[ \begin{aligned} E[f(X)]&=\int_S f(y)\mu(dy)\\ &=\int_\mathbb R f \frac{d\mu }{d\lambda} d\lambda\\ &=\int_{-\infty}^{\infty} f(x)p(x)dx \end{aligned} \]
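A minimal sketch of the two routes to \(E[f(X)]\) ( the choices \(f(x)=x^2\) , \(X\sim N(0,1)\) , grid , and sample size are mine ) : integrating \(f\) against the density \(p=\frac{d\mu}{d\lambda}\) versus Monte Carlo over the underlying probability space ; both should be \(\approx E[X^2]=1\) .

```python
import numpy as np

# E[f(X)] for f(x) = x² and X ~ N(0,1), computed two ways.
x = np.linspace(-10, 10, 2_000_001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)    # standard normal density dμ/dλ
print("∫ f(x) p(x) dx ≈", np.sum(x**2 * p) * dx)

rng = np.random.default_rng(3)
print("Monte Carlo E[f(X)] ≈", np.mean(rng.normal(size=1_000_000) ** 2))
```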
Moments
Def [ \(k\)-th moment ] : If \(k\in \mathbb N^*\) , then \(E[X^k]\) is called the \(k\)-th moment of \(X\) .
Def [ Variance ] : \(Var(X)=E[(X-E[X])^2]\)
Property : \(Var(X)=E[X^2]-(E[X])^2\)
Examples
Def [ Bernoulli random variable ] : \(X\in \{0,1\}\) , \(\begin{cases}P(X=0)&=1-p\\P(X=1)&=p\end{cases}\)
\(E[X]=p\) , \(Var(X)=p(1-p)\)
Def [ Poisson random variable ] : with parameter \(\lambda\) , \[ P(X=k)=e^{-\lambda} \frac{\lambda^k}{k!} \quad k=0,1,2,\cdots \] Property : \[ E[\prod_{i=0}^{k-1}(X-i)]=\lambda^k \] \(Var(X)=\lambda\)
Def [ Geometric distribution ] : with success probability \(p\) , \[ P(X=k)=p(1-p)^{k-1} \quad k=1,2,\cdots \] \(E[X]=\frac{1}{p}\) , \(Var(X)=\frac{1-p}{p^2}\)
Def [ Gaussian random variable / Normal distribution ] : with mean \(\mu\) and variance \(\sigma^2\) , \[ f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \] \(E[X]=\mu\) , \(Var[X]=\sigma^2\)
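An empirical check of the stated means and variances ( the parameter values \(p=0.3\) , \(\lambda=2.5\) , \(p=0.2\) , \(\mu=1,\sigma=2\) , the sample size , and the seed are arbitrary illustrative choices ) :

```python
import numpy as np

# Sample means and variances should match the formulas above.
rng = np.random.default_rng(4)
n = 1_000_000

bern = rng.binomial(1, 0.3, size=n)       # Bernoulli(0.3): E = 0.3, Var = 0.21
pois = rng.poisson(2.5, size=n)           # Poisson(2.5):   E = 2.5, Var = 2.5
geom = rng.geometric(0.2, size=n)         # Geometric(0.2): E = 5,   Var = 20
norm = rng.normal(1.0, 2.0, size=n)       # N(1, 4):        E = 1,   Var = 4

for name, s in [("Bernoulli", bern), ("Poisson", pois),
                ("Geometric", geom), ("Normal", norm)]:
    print(f"{name:>9}: mean ≈ {s.mean():.3f}, var ≈ {s.var():.3f}")
```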
1.7 Product Measure and Fubini's Theorem
Product measure
Definition
For measure spaces \((X,\mathcal A,\mu_1)\) and \((Y,\mathcal B,\mu_2)\) ( assume both are \(\sigma\)-finite ) ,
Let \(\Omega=X\times Y=\{(x,y):x\in X,y\in Y\}\)
Let \(\mathcal S=\{A\times B:A\in \mathcal A,B\in \mathcal B\}\) , so \(\mathcal S\) is a semi-algebra since \((A\times B)^c=(A^c\times B)\cup (A\times B^c)\cup (A^c\times B^c)\) .
Let \(\mathcal F=\sigma(\mathcal S)\) , that is the \(\sigma\)-field generated by \(\mathcal S\) , denote as \(\mathcal F=\mathcal A\times\mathcal B\) .
THM [ product measure ] :
There is a unique measure \(\mu\) on \(\mathcal F\) , s.t. \(\mu(A\times B)=\mu_1(A)\mu_2(B)\) for all \(A\in \mathcal A\) , \(B\in \mathcal B\) .
(*) Proof :
By the extension theorem for semi-algebras , we only need to prove countable additivity on \(\mathcal S\) :
If \(A\times B=+_i (A_i\times B_i)\) is a finite or countable disjoint union , then \[ \mu(A\times B)=\sum_{i} \mu(A_i\times B_i) \] For each \(x\in A\) , let \(I(x)=\{i:x\in A_i\}\) . The sets \(\{B_i:i\in I(x)\}\) partition \(B\) , i.e. \(B=+_{i\in I(x)}B_i\) , so \(\mu_2(B)=\sum_{i\in I(x)}\mu_2(B_i)\) , which can be written as \[ \mathbb 1_A(x)\mu_2(B)=\sum_{i}\mathbb 1_{A_i}(x)\mu_2(B_i) \] Integrating with respect to \(\mu_1\) and exchanging sum and integral ( legitimate since all terms are non-negative ) , \[ \begin{aligned} \int \mathbb 1_A(x)\mu_2(B)d\mu_1&=\sum_{i}\mu_2(B_i)\int\mathbb 1_{A_i}(x)d\mu_1\\ \mu_1(A)\mu_2(B)&=\sum_{i}\mu_1(A_i)\mu_2(B_i)\\ \end{aligned} \]
Remark
\(\mu\) is often denoted as \(\mu=\mu_1\times \mu_2\)
This result generalizes to the product of \(n\) measure spaces :
Consider measure spaces \((\Omega_i,\mathcal F_i,\mu_i)\) . Let \(\Omega=\times_{i=1}^n \Omega_i\) , \(\mathcal F=\times_{i=1}^n \mathcal F_i\) ; then there is a unique measure \(\mu\) such that for \(A=\times_{i=1}^n A_i\) with \(A_i\in \mathcal F_i\) , \[ \mu(\times_{i=1}^n A_i)=\prod_{i=1}^n \mu_i(A_i) \]
When \((\Omega_i,\mathcal F_i,\mu_i)=(\mathbb R,\mathcal R,\lambda)\) , then \(\mu\) is the Lebesgue measure on the Borel subsets of \(\mathbb R^n\)
Fubini's Theorem
THM [ Fubini's Theorem ] : Let \((X,\mathcal A,\mu_1)\) and \((Y,\mathcal B,\mu_2)\) be \(\sigma\)-finite measure spaces and \(\mu=\mu_1\times\mu_2\) . If \(f\ge 0\) or \(\int |f|d\mu<\infty\) , then \[ \int_X\int_Y f(x,y)\mu_2(dy)\mu_1(dx)=\int_{X\times Y} fd\mu=\int_Y\int_X f(x,y)\mu_1(dx)\mu_2(dy) \]
(*) Proof
Firstly , we need to make sure that
- Fixing \(x\) , \(y\mapsto f(x,y)\) is \(\mathcal B\)-measurable
- \(x\mapsto\int_Y f(x,y)\mu_2(dy)\) is \(\mathcal A\)-measurable
We have the following lemmas , dealing with \(f=\mathbb 1_{E}\) :
Lemma 1 : Let \(E_x=\{y:(x,y)\in E\}\) . If \(E\in \mathcal A\times \mathcal B\) , then \(E_x\in \mathcal B\)
Lemma 2 : If \(E\in \mathcal A\times \mathcal B\) , then \(g(x):=\mu_2(E_x)\) is a measurable map on \(\mathcal A\) , and \[ \int_X gd\mu_1=\mu(E) \]
By these lemmas , we can prove Fubini's Theorem for \(f=\mathbb 1_E\) with any \(E\in \mathcal A\times \mathcal B\) .
Using the linearity of integration , Fubini's Theorem then holds for all simple functions .
For a non-negative function \(f\) , let \(f_n(x,y)=(\lfloor 2^nf(x,y)\rfloor/2^n)\land n\) , so each \(f_n\) is a simple function and \(f_n\uparrow f\) . By MCT , Fubini's Theorem holds for all non-negative functions .
For a general integrable function , decompose \(f=f^+-f^-\) and apply the result to each part . Then Fubini's Theorem holds for all integrable functions .
Remarks
Tonelli's Theorem : the case \(f\ge 0\) ( where no integrability assumption is needed ) is often called Tonelli's Theorem .
When \(f\) is not non-negative and not integrable , Fubini's Theorem may fail :
Let \(X=Y=\mathbb N^*\) , \(\mathcal A=\mathcal B=\{S:S\subseteq \mathbb N^*\}\) , \(\mu_1=\mu_2=\text{counting measure}\) .
Let \(f(m,n)=\begin{cases}1&m=n\\ -1&m=n+1\\ 0&otherwise\end{cases}\) for all \(m,n\ge 1\) , so \[ \sum_m\sum_n f(m,n)=1,\sum_{n}\sum_{m}f(m,n)=0 \]
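The two iterated sums can be computed directly ; in the sketch below the truncation level \(N\) is an arbitrary choice , and since each inner sum has at most two nonzero terms , running the inner index one step past \(N\) reproduces the infinite inner sums exactly .

```python
# Iterated sums for the Fubini counterexample: summing over n first
# (inner) vs. over m first gives different answers, 1 and 0.
def f(m, n):
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0

N = 1000
sum_m_then_n = sum(sum(f(m, n) for n in range(1, N + 2)) for m in range(1, N + 1))
sum_n_then_m = sum(sum(f(m, n) for m in range(1, N + 2)) for n in range(1, N + 1))
print(sum_m_then_n, sum_n_then_m)   # prints: 1 0
```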
Chapter 2 Laws of Large Numbers
2.1 Independence
- Definition ( for probability space \((\Omega,\mathcal F,P)\) )
- Def [ Independence of Events ] : Let \(A,B\in \mathcal F\) . \(A\) and \(B\) are independent if \(P(A\cap B)=P(A)P(B)\) , denoted \(A\perp\!\!\!\perp B\) .
- Def [ Independence of Random Variables ] : Let \(X,Y\) be two random variables . \(X\) and \(Y\) are independent if \(\forall C,D\in \mathcal R\) , \(P(\{X\in C\}\cap \{Y\in D\})=P(\{X\in C\})P(\{Y\in D\})\) , denoted \(X\perp\!\!\!\perp Y\) .
- Def [ Independence of \(\sigma\)-fields ] : Two \(\sigma\)-fields \(\mathcal F,\mathcal G\) are independent if \(\forall A\in \mathcal F\) , \(\forall B\in \mathcal G\) , \(A,B\) are independent , denoted \(\mathcal F\perp\!\!\!\perp \mathcal G\) .
- THM [ Independence of r.v. is a special case of \(\sigma\)-field ]
- If random variables \(X,Y\) are independent , then \(\sigma(X),\sigma(Y)\) are independent
- If \(\mathcal F\) and \(\mathcal G\) are independent , \(X\in \mathcal F\) , \(Y\in \mathcal G\) ( i.e. \(X\) is \(\mathcal F\)-measurable and \(Y\) is \(\mathcal G\)-measurable ) , then \(X,Y\) are independent
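An empirical check of the product rule ( a minimal sketch : the uniform distributions , the Borel sets \(C=[0,0.3)\) , \(D=[0,0.5)\) , the sample size , and the seed are all arbitrary illustrative choices ) :

```python
import numpy as np

# X, Y ~ Uniform(0,1) drawn independently; the empirical frequencies
# should satisfy P(A ∩ B) ≈ P(A)·P(B) for A = {X ∈ C}, B = {Y ∈ D}.
rng = np.random.default_rng(5)
X = rng.uniform(size=1_000_000)
Y = rng.uniform(size=1_000_000)
A = X < 0.3                     # event {X ∈ C}
B = Y < 0.5                     # event {Y ∈ D}
print("P(A ∩ B)  ≈", np.mean(A & B))
print("P(A)·P(B) ≈", np.mean(A) * np.mean(B))
```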