Probability and Statistics 1

Chapter 0

  1. Categories

    1. Background in Probability

      • what is probability \(\to\) measure theory

      • what is integration \(\to\) Riemann / Lebesgue integration

      • Expectation & its properties

    2. Probability foundations of Asymptotic Statistics

      • weak law of large numbers
      • strong law of large numbers (proof by Kolmogorov)
      • central limit theorem
      • characteristic function
    3. Estimation , inference & testing

      • hypothesis testing
      • regression analysis
      • frontiers of statistical research (e.g. distribution-free tests)
  2. Textbook

    [Durrett] Probability: Theory and Examples (or PTE)

Chapter 1 Background in Probability

1.1 Probability Space

  1. Probability Space

    1. Def : \((\Omega,\mathcal F,P)\)

      \(\Omega\) : set of "outcomes"

      \(\mathcal F\) : set of "events" , a collection of subsets of \(\Omega\)

      \(P\) : function : \(\mathcal F\to [0,1]\)

    2. \(\mathcal F\) should be a \(\sigma\)-field

      1. Def : [\(\sigma\)-field] A nonempty collection \(\mathcal F\) of subsets of \(\Omega\) such that

        • closed under complements : If \(A\in \mathcal F\) , then \(A^c\in \mathcal F\)
        • closed under countable unions : If \(A_i\in \mathcal F\) is a countable sequence , then \(\cup_i A_i\in \mathcal F\)
      2. Prop :

        • \(\varnothing \in \mathcal F,\Omega\in \mathcal F\)

        Proof : \(A\in\mathcal F\) \(\to\) \(A^c\in \mathcal F\) \(\to\) \(A\cup A^c\in \mathcal F\) \(\to\) \(\Omega\in \mathcal F\) \(\to\) \(\varnothing\in \mathcal F\)

        • closed under countable intersections : If \(A_i\in \mathcal F\) is a countable sequence , then \(\cap_i A_i\in \mathcal F\)

        Proof : \(\cap_i A_i=(\cup_i A_i^c)^c\) , the countable version of \(A\cap B=(A^c\cup B^c)^c\)

  2. Measurable Space

    1. Def : [measure] : non-negative , countably additive , set function

      A function \(\mu:\mathcal F\to \mathbb R\) with :

      • \(\forall A\in \mathcal F\) , \(\mu(A)\ge \mu(\varnothing)=0\)

      • If \(A_i\in \mathcal F\) , \(A_i\) countable disjoint sequence , then \[ \mu(\cup_i A_i)=\sum_{i}\mu(A_i) \]

    2. Def : [probability measure] : a measure \(\mu\) with \(\mu(\Omega)=1\)

    3. Thm : a measure \(\mu\) on \((\Omega,\mathcal F)\) satisfies

      • monotonicity : \(A\subset B\Rightarrow \mu(A)\le \mu(B)\)

      • subadditivity : \(A\subset \cup_{i=1}^{\infty} A_i\Rightarrow \mu(A)\le \sum_{i=1}^{\infty}\mu(A_i)\)

      • continuity :

        Def : [\(A_i\uparrow A\) ]

        For set \(A\) : \(A_1\subset A_2\subset\cdots , \cup_i A_i=A\)

        For real number \(A\) : \(A_1\le A_2\le \cdots,\lim_{n\to\infty}A_n=A\)

        If \(A_i\uparrow A\) , then \(\mu(A_i)\uparrow \mu(A)\)

        If \(A_i\downarrow A\) and \(\mu(A_1)<\infty\) , then \(\mu(A_i)\downarrow \mu(A)\)

      Proof :

      1. : Let \(B-A=B\cap A^c\) , so if \(A\subset B\) , then \(B=A+(B-A)\) , where \(+\) denotes a disjoint union , since \(A\) and \(B-A\) are disjoint \[ \mu(B)=\mu(A+(B-A))=\mu(A)+\mu(B-A)\ge \mu(A) \]
      2. : Let \(A_n':=A_n\cap A\) , so \(A=\cup_{i=1}^{\infty} A'_i\) . Let \(B_n=\begin{cases}A_1'&n=1\\A_n'-\cup_{i=1}^{n-1} A_i'&n\ge 2\end{cases}\)
      Therefore , \(B_n\) are disjoint , \(B_n\subset A_n\) , and \(\cup_{i=1}^{\infty} B_i=\cup_{i=1}^{\infty} A_i'=A\) . By monotonicity , \[ \mu(A)=\mu(\cup_{i} B_i)=\sum_{i}\mu(B_i)\le \sum_{i} \mu(A_i) \]
      3. : Let \(A_0=\varnothing\) and \(B_n=A_n-A_{n-1}\) , so \(B_n\) are disjoint , \(\cup_{i=1}^{n} B_i=A_n\) , \(\cup_{i=1}^{\infty}B_i=A\) \[ \mu(A)=\sum_{i=1}^{\infty} \mu(B_i)=\lim_{n\to\infty}\sum_{i=1}^n \mu(B_i)=\lim_{n\to \infty}\mu(A_n) \] For \(A_i\downarrow A\) with \(\mu(A_1)<\infty\) , apply the increasing case to \(A_1-A_i\uparrow A_1-A\) .
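
The disjointification step used in the proofs of subadditivity and continuity can be tried on finite sets. The following is a minimal sketch, assuming the counting measure on a small made-up family of sets; the names `disjointify`, `mu`, `A` and `B` are illustrative and do not come from the notes.

```python
def disjointify(sets):
    """Return B_1, B_2, ... with B_n = A_n minus (A_1 ∪ ... ∪ A_{n-1})."""
    seen = set()
    result = []
    for a in sets:
        result.append(set(a) - seen)  # keep only the points not covered so far
        seen |= set(a)
    return result


def mu(a):
    """Counting measure (an assumption for this sketch): number of points."""
    return len(a)


# Hypothetical overlapping sets A_1, A_2, A_3.
A = [{1, 2, 3}, {2, 3, 4}, {4, 5}]
B = disjointify(A)

union_A = set().union(*A)
union_B = set().union(*B)

print(B)                                   # the disjoint pieces
print(mu(union_A), sum(mu(a) for a in A))  # measure of the union vs. sum of measures
```

The pieces `B` have the same union as `A` and are pairwise disjoint, so additivity applies to them, and \(B_n\subset A_n\) gives the subadditivity bound.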
    4. E.g.1 Discrete Probability Space

      \(\Omega\) : countable set , \(\mathcal F\) : the set of all subsets of \(\Omega\) , \(p: \Omega\to[0,1]\) , where \(\sum_{\omega\in \Omega}p(\omega)=1\) . \[ P(A):=\sum_{\omega\in A}p(\omega) \]
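
As a concrete instance of the discrete probability space, here is a short sketch assuming a fair six-sided die. The names `omega`, `p`, `P`, `even` and `small` are chosen for illustration only.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
p = {w: Fraction(1, 6) for w in omega}


def P(event):
    """P(A) = sum of p(w) over w in A, for any subset A of omega."""
    return sum((p[w] for w in event), Fraction(0))


even = {2, 4, 6}
small = {1, 2}

print(P(omega))          # total mass
print(P(even))           # probability of an even outcome
print(P(even | small))   # probability of a union of overlapping events
```

Exact fractions are used so that additivity on disjoint events and the bound \(P(A\cup B)\le P(A)+P(B)\) can be checked without rounding error.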

  3. Measure on real line

    1. Def : [generate] \(\mathcal A\) is a set of some subsets of \(\Omega\). A \(\sigma\)-field is generated by \(\mathcal A\) if it is the smallest \(\sigma\)-field containing \(\mathcal A\) : \[ \sigma(\mathcal A):=\bigcap_{\mathcal A\subset\mathcal F,\mathcal F\text{ is }\sigma\text{-field}}\mathcal F \]
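
When \(\Omega\) is finite , \(\sigma(\mathcal A)\) can be computed by closing \(\mathcal A\) under complements and unions until nothing new appears. The sketch below assumes a finite \(\Omega\) (so countable unions reduce to finite ones); `generate_sigma_field` is a hypothetical helper, not a library function.

```python
def generate_sigma_field(omega, collection):
    """Smallest family containing `collection` that is closed under
    complement and union (finite omega, so countable unions are finite)."""
    omega = frozenset(omega)
    family = {frozenset(a) for a in collection} | {omega}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            # Candidates: the complement of a, and the union of a with every known set.
            candidates = [omega - a] + [a | b for b in current]
            for c in candidates:
                if c not in family:
                    family.add(c)
                    changed = True
    return family


omega = {1, 2, 3, 4}
sigma = generate_sigma_field(omega, [{1}, {1, 2}])

for s in sorted(sigma, key=lambda s: (len(s), sorted(s))):
    print(set(s) if s else "{}")
```

In this example the generated \(\sigma\)-field consists of all unions of the blocks \(\{1\},\{2\},\{3,4\}\) , so it cannot separate the points 3 and 4.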

    2. Def : [Borel Set]

      Let \(\mathcal A\) be the collection of open subsets of \(\mathbb R^d\) . The Borel \(\sigma\)-field is \(\sigma(\mathcal A)\) , denoted as \(\mathcal R^d\) ; its members are called Borel sets .

    3. measure for \(d=1\)

      1. Def : [Stieltjes measure function] \(F:\mathbb R\to\mathbb R\) satisfies :

        • non-decreasing : \(\forall x\ge y , F(x)\ge F(y)\)
        • right-continuous : \(\lim_{y\downarrow x}F(y)=\lim_{y\to x^+}F(y)=F(x)\)
      2. Thm : For every Stieltjes measure function \(F\) , there is a unique measure \(\mu\) on \((\mathbb R,\mathcal R)\) , with \[ \mu((a,b])=F(b)-F(a) \]

      3. When \(F(x)=x\) , \(\mu\) is Lebesgue measure

        Why right-continuous : If \(b_n\downarrow b\) , then \(\cap_{n}(a,b_n]=(a,b]\) (the right endpoint stays closed) , so continuity from above requires \(F(b_n)\downarrow F(b)\)

      4. Def [CDF] : For a probability measure , \(F\) additionally satisfies \(\lim\limits_{x\to -\infty}F(x)=0,\lim\limits_{x\to+\infty}F(x)=1\)

        \(F\) : Cumulative Distribution Function [CDF] .
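
The rule \(\mu((a,b])=F(b)-F(a)\) can be tried numerically. This sketch assumes two example CDFs that are not taken from the notes: the Exponential(1) CDF `F` and a right-continuous step function `G` for a fair coin on \(\{0,1\}\) ; `mu` is an illustrative helper.

```python
import math


def F(x):
    """CDF of the Exponential(1) distribution (an assumed example)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def G(x):
    """Right-continuous step CDF of a fair coin on {0, 1} (an assumed example)."""
    if x < 0:
        return 0.0
    return 0.5 if x < 1 else 1.0


def mu(cdf, a, b):
    """Measure of the half-open interval (a, b], for a <= b."""
    return cdf(b) - cdf(a)


print(mu(F, 0, 1) + mu(F, 1, 2), mu(F, 0, 2))  # additivity on adjacent intervals
print(mu(G, -1e-9, 0))                          # the jump of G at 0 is the mass of {0}
print(mu(G, 0, 0.999))                          # no mass strictly between the atoms
```

Because intervals are closed on the right , the interval \((-\epsilon,0]\) captures the jump of `G` at 0 for every \(\epsilon>0\) , which matches the right-continuity requirement.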

  4. (*) semi-algebra , algebra , \(\sigma\)-algebra

    1. Def : [semi-algebra , algebra , \(\sigma\)-algebra]

      | structure | complement | intersection/union |
      | --- | --- | --- |
      | semi-algebra | \(S^c\) is a finite , disjoint union of sets in \(\mathcal S\) | \(S,T\in \mathcal S\) , then \(S\cap T\in\mathcal S\) |
      | algebra | \(A\in \mathcal A\) , then \(A^c\in \mathcal A\) | \(S,T\in \mathcal A\) , then \(S\cap T,S\cup T\in \mathcal A\) |
      | \(\sigma\)-algebra | \(A\in \mathcal F\) , then \(A^c\in \mathcal F\) | \(A_i\in \mathcal F\) , countable sequence , then \(\cup_i A_i\in \mathcal F\) |

  1. E.g. [algebra but not \(\sigma\)-algebra]

    \(\Omega=\mathbb Z\) , \(\mathcal A=\{A\subset \Omega | A\text{ or }A^c\text{ is finite}\}\)

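    \(\mathcal A\) is an algebra but not a \(\sigma\)-algebra : each singleton \(\{2k\}\) is in \(\mathcal A\) , but their countable union , the set of even integers , is neither finite nor has a finite complement .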

  2. Lemma

    If \(\mathcal S\) is a semi-algebra , then \(\bar{\mathcal S}=\{\text{finite disjoint unions of sets in }\mathcal S\}\) is an algebra .

    \(\bar{\mathcal S}\) is called the algebra generated by \(\mathcal S\) .

    Proof : easy to check two properties of algebra

    Question : Is \(\bar{\mathcal S}\) the smallest algebra containing \(\mathcal S\) ?

  3. Def : [measure for algebra] a measure \(\mu\) on an algebra \(\mathcal A\) satisfies :

    • \(\forall A\in \mathcal A , \mu(A)\ge \mu(\varnothing)=0\)

    • If \(A_i\in \mathcal A\) is a disjoint sequence , and \(\cup_i A_i\in \mathcal A\) , then \[ \mu(\cup_i A_i)=\sum_{i}\mu(A_i) \]

    Def : [\(\sigma\)-finite] \(\mu\) is \(\sigma\)-finite if there exists a sequence of sets \(A_n\in\mathcal A\) with \(\mu(A_n)<\infty\) and \(\cup_n A_n=\Omega\) .

    We can let \(A_n'=\cup_{i=1}^n A_i\) , then \(A_n'\uparrow \Omega\) .

    We can let \(A_n'=A_n\cap(\cap_{i=1}^{n-1}A_i^c)\) , then \(A_n'\) are disjoint .

    That is , when constructing such \(A_n\) , we may directly assume either \(A_n\uparrow \Omega\) or that the \(A_n\) are disjoint .

  4. Thm : \(\mathcal S\) is a semi-algebra , \(\mu\) defined on \(\mathcal S\) with \(\mu(\varnothing)=0\)

    (i) If \(\mu\) satisfies :
    • If \(S\in \mathcal S\) is a finite disjoint union of sets \(S_i\in \mathcal S\) , then \(\mu(S)=\sum_{i}\mu(S_i)\)
    • If \(S_i,S\in \mathcal S\) , \(S=+_{i\ge 1} S_i\) , then \(\mu(S)\le \sum_{i\ge 1} \mu(S_i)\)

    Then \(\mu\) has a unique extension \(\bar \mu\) that is a measure on \(\bar{\mathcal S}\) .

    (ii) If \(\bar\mu\) is \(\sigma\)-finite , then there is a unique extension \(\hat \mu\) that is a measure on \(\sigma(\mathcal S)\) .
  5. Lemma : Let \(\mathcal S\) be a semi-algebra and \(\mu\) be defined on \(\mathcal S\) with \(\mu(\varnothing)=0\) . Suppose that whenever \(S\in \mathcal S\) is a finite disjoint union of sets \(S_i\in \mathcal S\) , \(\mu(S)=\sum_{i}\mu(S_i)\) . Then ,

    • If \(A,B_i\in \bar{\mathcal S}\) , \(A=+_{i=1}^n B_i\) , then \(\bar \mu(A)=\sum_{i=1}^n \bar\mu(B_i)\)
    • If \(A,B_i\in \bar{\mathcal S}\) , \(A\subset \cup_{i=1}^n B_i\) , then \(\bar \mu(A)\le \sum_{i=1}^n \bar\mu(B_i)\)

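    In other words , if the first condition in (i) of the Thm above holds , then the finite version of the second condition holds automatically , and both properties carry over directly to \(\bar{\mathcal S}\) and \(\bar \mu\) .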

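  6. Using the Thm above , one can prove that the measure associated with a Stieltjes measure function exists ; the proof requires intervals that are open on the left , i.e. of the form \((a,b]\) .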

  7. (*) measure on \(\mathbb R^d\)

    1. Directly imposing conditions analogous to those of a Stieltjes measure function is not enough to construct a measure

      Restrictions :

      • non-decreasing : If \(\vec x\le \vec y\) ( \(\forall i\in [d] , x_i\le y_i\)) , then \(F(\vec x)\le F(\vec y)\)
      • right-continuous : Define \(\vec y\downarrow \vec x\) as \(\forall i\in [d] , y_i\downarrow x_i\) , then \(\lim_{\vec y\downarrow \vec x}F(\vec y)=F(\vec x)\)
      • (probability measure) \(\lim\limits_{\vec x\downarrow -\infty}F(\vec x)=0\) , \(\lim_{\vec x\uparrow +\infty}F(\vec x)=1\)

      Problem : \[ F(x_1,x_2)=\begin{cases} 1&x_1\ge 1,x_2\ge 1\\ \frac{2}{3}&x_1\ge 1,x_2\in [0,1)\\ \frac{2}{3}&x_1\in [0,1),x_2\ge 1\\ 0&\text{otherwise} \end{cases} \]

      \[ \mu((a_1,b_1]\times(a_2,b_2])=F(b_1,b_2)-F(a_1,b_2)-F(b_1,a_2)+F(a_1,a_2) \]

      Let \(a_1=a_2=1-\epsilon\) , \(b_1=b_2=1\) , \(\epsilon \to 0\) , then \[ \mu(\{1\}\times\{1\})=1-\frac{2}{3}-\frac{2}{3}+0=-\frac{1}{3}<0 \]
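
The negative mass in the counterexample above can be reproduced directly. This is a small sketch using exact fractions; the helper name `rect_mass` and the choice \(\epsilon=1/1000\) are assumptions made for illustration.

```python
from fractions import Fraction


def F(x1, x2):
    """The candidate function from the counterexample above."""
    if x1 >= 1 and x2 >= 1:
        return Fraction(1)
    if x1 >= 1 and 0 <= x2 < 1:
        return Fraction(2, 3)
    if 0 <= x1 < 1 and x2 >= 1:
        return Fraction(2, 3)
    return Fraction(0)


def rect_mass(a1, b1, a2, b2):
    """Value assigned to (a1, b1] x (a2, b2] by the inclusion-exclusion formula."""
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)


eps = Fraction(1, 1000)
mass = rect_mass(1 - eps, 1, 1 - eps, 1)
print(mass)
```

The value does not depend on \(\epsilon\) once \(0<\epsilon\le 1\) , so the small square around \((1,1)\) always receives the same negative mass.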

    2. Def : [ \(\mathbb R^d\) measure ]

      Consider finite rectangles \(A=(a_1,b_1]\times\cdots\times(a_d,b_d]\) , \(V=\{a_1,b_1\}\times\cdots\times\{a_d,b_d\}\)

      If \(v\in V\) , define \[ sgn(v)=(-1)^{|\{i\in [d]\,:\,v_i=a_i\}|} \] \[ \Delta_A F:=\sum_{v\in V}sgn(v)F(v) \] and let \(\mu(A)=\Delta_A F\) .

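      This is the \(d\)-dimensional analogue of prefix sums and differences : \(V\) is the set of all vertices of the \(d\)-dimensional rectangle \(A\) , \(sgn(v)\) records the parity of the number of coordinates in which the vertex \(v\) uses a left endpoint , and \(\Delta_A F\) is the difference obtained by inclusion-exclusion .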

    3. Thm : [\(\mathbb R^d\) measure ] If \(F:\mathbb R^d\to [0,1]\) satisfies the \(3\) restrictions above , and \(\Delta_A F\ge 0\) for all finite rectangles \(A\) , then there is a unique probability measure \(\mu\) on \((\mathbb R^d,\mathcal R^d)\) such that \(\mu(A)=\Delta_A F\) for all finite rectangles \(A\) .

    4. If \(F(\vec x)=\prod_{i=1}^d F_i(x_i)\) , where the \(F_i\) are all Stieltjes measure functions , then \[ \Delta_A F=\prod_{i=1}^d (F_i(b_i)-F_i(a_i)) \] When \(F_i(x)=x\) for all \(i\in [d]\) , \(\mu\) is Lebesgue measure on \(\mathbb R^d\) .
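
The signed vertex sum \(\Delta_A F\) is straightforward to compute for any \(d\) . The sketch below is one possible implementation, with the hypothetical names `delta` and `volume_F` ; it checks the product formula in the Lebesgue case \(F_i(x)=x\) on a made-up rectangle in \(\mathbb R^3\) .

```python
from itertools import product
from math import prod


def delta(F, a, b):
    """Delta_A F for A = (a_1, b_1] x ... x (a_d, b_d]."""
    total = 0
    for choice in product((0, 1), repeat=len(a)):
        # choice[i] == 0 picks the left endpoint a_i, 1 picks b_i.
        vertex = tuple(b[i] if c else a[i] for i, c in enumerate(choice))
        lefts = choice.count(0)  # number of coordinates with v_i = a_i
        total += (-1) ** lefts * F(vertex)
    return total


def volume_F(x):
    """F(x) = x_1 * ... * x_d, i.e. F_i(x) = x for every coordinate."""
    return prod(x)


# Hypothetical rectangle (0, 2] x (1, 4] x (2, 3].
a, b = (0, 1, 2), (2, 4, 3)
print(delta(volume_F, a, b))
print(prod(bi - ai for ai, bi in zip(a, b)))
```

Both printed values are the volume of the rectangle , which illustrates that the \(2^d\)-term inclusion-exclusion sum collapses to a product when \(F\) factorizes.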

1.2 Random Variables

  1. measurable map

    1. Def : [measurable map] \(X:\Omega\to S\) is a measurable map from \((\Omega,\mathcal F)\) to \((S,\mathcal S)\) if \[ \forall B\in \mathcal S , X^{-1}(B):=\{w\in \Omega|X(w)\in B\}\in \mathcal F \]

      Def : [random vector] When \((S,\mathcal S)=(\mathbb R^d,\mathcal R^d)\) , \(X\) is a random vector .

      Def : [random variable] When \((S,\mathcal S)=(\mathbb R,\mathcal R)\) , \(X\) is a random variable .

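    2. Although a measurable map is written as going from \((\Omega,\mathcal F)\) to \((S,\mathcal S)\) , \(X\) itself does not map \(\mathcal F\to\mathcal S\) ; it only maps \(\Omega\to S\) . \(\mathcal F\) and \(\mathcal S\) indicate the "scope" with respect to which measurability is checked .

      A random variable is not a variable but a (measurable) map

      This also explains what notation such as \(E(X^2)\) really means : \(X^2\) is the map \(\omega\mapsto X(\omega)^2\)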

    3. Thm [a sufficient condition for measurable map]

      \(X:\Omega\to S\) , \(\mathcal A\) : a collection of some subsets of \(S\) . If

      • \(\forall A\in \mathcal A , X^{-1}(A)\in \mathcal F\)
      • \(\mathcal A\) generates \(\mathcal S\)

      Then \(X\) is a measurable map from \((\Omega,\mathcal F)\) to \((S,\mathcal S)\) .

      Proof : Check that \(\mathcal B=\{B\subset S|X^{-1}(B)\in \mathcal F\}\) is a \(\sigma\)-field ; by the first condition , \(\mathcal A\subset \mathcal B\) . Since \(\mathcal S=\sigma(\mathcal A)\) is the smallest \(\sigma\)-field containing \(\mathcal A\) , \(\mathcal S\subset \mathcal B\) .

    4. E.g. \(f:\mathbb R^d\to \mathbb R\) : \(f(x_1,\cdots,x_d)=\sum_{i=1}^d x_i\) is a measurable map from \((\mathbb R^d,\mathcal R^d)\) to \((\mathbb R,\mathcal R)\) .
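
On finite spaces the definition of a measurable map can be checked by brute force. The following sketch assumes a made-up example: \(\Omega=\{1,\dots,6\}\) with the \(\sigma\)-field generated by the odd numbers , and target \(\{0,1\}\) with all subsets. The names `preimage`, `is_measurable`, `parity` and `is_small` are illustrative.

```python
def preimage(X, omega, B):
    """X^{-1}(B) = {w in omega : X(w) in B}."""
    return frozenset(w for w in omega if X(w) in B)


def is_measurable(X, omega, F, S_field):
    """Check that every B in S_field has its preimage in F."""
    return all(preimage(X, omega, B) in F for B in S_field)


omega = frozenset(range(1, 7))
odd, even = frozenset({1, 3, 5}), frozenset({2, 4, 6})
F = {frozenset(), odd, even, omega}

# Target space S = {0, 1} with the sigma-field of all subsets.
S_field = {frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})}


def parity(w):
    """1 for odd outcomes, 0 for even outcomes."""
    return w % 2


def is_small(w):
    """1 for outcomes at most 3, else 0."""
    return 1 if w <= 3 else 0


print(is_measurable(parity, omega, F, S_field))
print(is_measurable(is_small, omega, F, S_field))
# Checking only the generator {{1}} of S_field, as in the sufficient condition.
print(is_measurable(parity, omega, F, [frozenset({1})]))
```

Here `parity` is measurable because \(\mathcal F\) can distinguish odd from even outcomes , while `is_small` is not , since \(\{1,2,3\}\notin\mathcal F\) . The last check only tests the generating collection , which by the sufficient condition is enough.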