Statistical Inference

1 Populations and Samples

  1. A population is the complete set of all possible observations or measurements that are of interest. Its size, denoted by $N$, is the number of elements in the population.

  2. A sample is a subset of the population selected by a sampling procedure. Its size, denoted by $n$, is the number of elements in the sample.

  3. A sampling procedure is called biased if it produces inferences that consistently overestimate or underestimate the population parameter of interest.

2 Location Measures

  1. Mean is the sum of all observations divided by the number of observations. It is denoted by $\bar{x}$. It is a measure of central tendency and is sensitive to outliers.

    • For scalar data $x$:
    $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
    • For vector data $x_i \in \mathbb{R}^{d \times 1}$:
    $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
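    A minimal NumPy sketch of both cases (the data values are invented for illustration):

    ```python
    import numpy as np

    # Scalar data: the mean is a single number.
    x = np.array([2.0, 4.0, 4.0, 5.0, 10.0])
    print(np.mean(x))  # 5.0

    # Vector data: stack the samples x_i as rows; the mean is taken
    # component-wise, giving one d-dimensional mean vector.
    X = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])  # n = 3 samples in R^2
    print(X.mean(axis=0))       # [3. 4.]
    ```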

  2. Median is the middle value of a dataset. It is denoted by $M$. It is a measure of central tendency and is robust to outliers.

    • For scalar data $x$, with $x_{(k)}$ denoting the $k$-th smallest value:
    $$M = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & \text{if } n \text{ is even} \end{cases}$$
    • For vector data $x_i \in \mathbb{R}^{d \times 1}$: We compute the geometric median.
    $$M = \arg\min_{m} \sum_{i=1}^{n} \|x_i - m\|_2$$
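    The geometric median has no closed form; a standard way to approximate it is Weiszfeld's algorithm, sketched below (the tolerance and iteration cap are arbitrary choices):

    ```python
    import numpy as np

    def geometric_median(X, tol=1e-8, max_iter=1000):
        """Weiszfeld's algorithm: an iteratively re-weighted mean.
        X has shape (n, d); returns the point m minimizing sum_i ||x_i - m||_2."""
        m = X.mean(axis=0)                     # start from the ordinary mean
        for _ in range(max_iter):
            d = np.linalg.norm(X - m, axis=1)  # distances ||x_i - m||
            d = np.where(d < tol, tol, d)      # guard against division by zero
            w = 1.0 / d
            m_new = (w[:, None] * X).sum(axis=0) / w.sum()
            if np.linalg.norm(m_new - m) < tol:
                break
            m = m_new
        return m

    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
    print(geometric_median(X))  # pulled far less toward the outlier than X.mean(axis=0)
    ```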
  3. Mode is the value that appears most frequently in a dataset. It is denoted by $Mo$. It is a measure of central tendency and is robust to outliers.

    • For scalar data $x$:
    $$Mo = \arg\max_{x} \sum_{i=1}^{n} I(x_i = x)$$
    • For vector data $x_i \in \mathbb{R}^{d \times 1}$: We compute the mode of each component separately, then combine the component-wise modes to form a mode vector.

    • If samples are rarely repeated exactly (e.g., continuous data), we group them into bins and take the most populated bin as the mode.
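    A sketch of the binning idea for continuous data (the bin count of 30 is an arbitrary choice):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.0, size=1000)

    # Exact repeats are rare in continuous data, so bin first
    # and report the midpoint of the fullest bin.
    counts, edges = np.histogram(x, bins=30)
    k = np.argmax(counts)
    mode_estimate = 0.5 * (edges[k] + edges[k + 1])
    print(mode_estimate)  # should land near the true peak at 2.0
    ```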

  4. Variance and Standard Deviation are measures of dispersion. They quantify the spread of the data around the mean. They are denoted by $s^2$ and $s$ (for a sample) and $\sigma^2$ and $\sigma$ (for a population).

    • For scalar data $x$:
    $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad s = \sqrt{s^2}$$
    • For vector data $x_i \in \mathbb{R}^{d \times 1}$:
    $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\|x_i - \bar{x}\|_2^2, \qquad s = \sqrt{s^2}$$
    • For the population variance and standard deviation, replace $n-1$ by $n$.

    Why?

    The denominator is $n-1$ instead of $n$ (Bessel's correction) so that the sample variance is an unbiased estimator of the population variance. An unbiased estimator is one that gives the correct answer on average over many samples, i.e., $E[s^2] = \sigma^2$. The correction is needed because the deviations are measured from the sample mean $\bar{x}$ rather than the true mean $\mu$; since $\bar{x}$ is fitted to the same data, the squared deviations are on average slightly too small, and dividing by $n-1$ exactly compensates.
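    A quick simulation makes the bias visible: averaged over many samples, the $1/(n-1)$ estimator lands near $\sigma^2$ while the $1/n$ estimator falls short by the factor $(n-1)/n$ (the sample size and trial count are arbitrary):

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    sigma2, n, trials = 4.0, 5, 100_000

    samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
    dev2 = (samples - samples.mean(axis=1, keepdims=True)) ** 2
    ss = dev2.sum(axis=1)        # sum of squared deviations per sample

    print(ss.mean() / (n - 1))   # ~4.0  (unbiased)
    print(ss.mean() / n)         # ~3.2  ((n-1)/n * sigma^2, biased low)
    ```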

  5. Boxplot is a graphical representation of the data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum. It is useful for detecting outliers and comparing distributions.

    1. Minimum: The smallest value in the dataset.
    2. First Quartile ($Q_1$): The median of the lower half of the dataset.
    3. Median ($Q_2$): The middle value of the dataset.
    4. Third Quartile ($Q_3$): The median of the upper half of the dataset.
    5. Maximum: The largest value in the dataset.
    6. Interquartile Range (IQR): The range between the first and third quartiles, $IQR = Q_3 - Q_1$.
    7. Outliers: Values that fall below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$.
    8. Whiskers: The lines extending from the box to the smallest and largest values that are not outliers.
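    The five-number summary and the outlier fences can be computed directly; note that quartile interpolation conventions vary across software, and NumPy's default is used here:

    ```python
    import numpy as np

    x = np.array([1, 2, 2, 3, 4, 5, 5, 6, 7, 30])  # 30 is a planted outlier

    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    print(x.min(), q1, med, q3, x.max())  # five-number summary
    print(x[(x < lo) | (x > hi)])         # [30]
    ```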

3 Correlation Measures

  1. Pearson Correlation Coefficient measures the linear relationship between two variables. It ranges from $-1$ to $1$, where:

    • $1$ indicates a perfect positive linear relationship.
    • $-1$ indicates a perfect negative linear relationship.
    • $0$ indicates no linear relationship.
    $$\rho = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\mathrm{Cov}(x, y)}{s_x s_y}$$
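    The coefficient can be computed from the definition and checked against np.corrcoef (toy data, roughly $y = 2x$ plus noise):

    ```python
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])

    # From the definition: centered dot product over the product of norms.
    xc, yc = x - x.mean(), y - y.mean()
    rho = (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))
    print(rho)

    # Library version: off-diagonal entry of the 2x2 correlation matrix.
    print(np.corrcoef(x, y)[0, 1])
    ```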
  2. Multi-correlation Coefficient measures the linear relationship between one variable and multiple predictor variables. Its square, $R^2$, ranges from $0$ to $1$, where:

    • $1$ indicates a perfect linear relationship.
    • $0$ indicates no linear relationship.
    $$R^2 = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
    Here $\hat{y}_i$ is the value fitted by a linear regression of $y$ on the predictors.
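    A sketch using a least-squares line to produce the fitted values $\hat{y}_i$ (the data are invented):

    ```python
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

    # Fit y ~ a*x + b, then compute the fitted values y_hat.
    a, b = np.polyfit(x, y, deg=1)
    y_hat = a * x + b

    r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
    print(r2)  # close to 1: the line explains most of the variance in y
    ```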
  3. Covariance measures the relationship between two variables. It is the Pearson correlation coefficient multiplied by the standard deviations of the variables.

    • If $\mathrm{Cov}(x, y) > 0$, the variables are positively correlated.
    • If $\mathrm{Cov}(x, y) < 0$, the variables are negatively correlated.
    $$\mathrm{Cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

    Note

    Given two variables $A \in \mathbb{R}^{1 \times n}$ and $B \in \mathbb{R}^{1 \times n}$, their covariance is given by:

    $$\mathrm{Cov}(A, B) = \frac{1}{n-1}\sum_{i=1}^{n}(A_i - \bar{A})(B_i - \bar{B})$$
  4. Covariance Matrix: The covariance matrix is a square matrix that contains the covariance between each pair of variables. It is symmetric and positive semi-definite.

    • The diagonal elements are the variances of the variables.
    • The off-diagonal elements are the covariances between the variables.
    $$\mathrm{Cov}(X) = \begin{bmatrix} \mathrm{Cov}(X_1, X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_d) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Cov}(X_2, X_2) & \cdots & \mathrm{Cov}(X_2, X_d) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_d, X_1) & \mathrm{Cov}(X_d, X_2) & \cdots & \mathrm{Cov}(X_d, X_d) \end{bmatrix}$$

    Note

    Let $x_1, x_2, \ldots, x_n \in \mathbb{R}^{d \times 1}$ be a sample. Then its covariance matrix is given by:

    $$\Sigma = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})^T \in \mathbb{R}^{d \times d}.$$
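    The outer-product formula can be checked against np.cov, which expects variables in rows, so the $(n, d)$ sample matrix is transposed:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 200, 3
    X = rng.normal(size=(n, d))    # n samples x_i in R^d, stacked as rows

    xbar = X.mean(axis=0)
    Sigma = (X - xbar).T @ (X - xbar) / (n - 1)   # sum of outer products

    print(Sigma.shape)                       # (3, 3), i.e. d x d
    print(np.allclose(Sigma, np.cov(X.T)))   # True: matches the library version
    ```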

    Question

    1. How should we understand the covariance matrix?

    2. A linear relationship means direction + strength. Does the covariance matrix indicate this information?

    1. The covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$ represents the relationships between the components of the vectors. The diagonal elements are the variances of the components; the off-diagonal elements are the covariances between pairs of components. The covariance matrix is symmetric and positive semi-definite. If an entry is positive, the two components are positively correlated; if it is negative, they are negatively correlated; if it is zero, they are uncorrelated (which does not by itself imply independence).
    2. A linear relationship consists of direction and strength.
    3. Strength of Linear Relationship: Covariance values indicate the degree to which two dimensions are linearly related. However, these values are not normalized, so interpreting the "strength" directly can be difficult. Instead, the correlation coefficient (a normalized version of covariance) is often used to quantify strength. For $\Sigma_{ij}$, the closer its magnitude is to $\sqrt{\Sigma_{ii}\Sigma_{jj}}$, the stronger the linear relationship.
    4. Direction of Linear Relationship: The same as for scalar covariance:
    • If $\mathrm{Cov}(x, y) > 0$, the variables are positively correlated.
    • If $\mathrm{Cov}(x, y) < 0$, the variables are negatively correlated.

    Question

    If we suppose the dataset is $X = [A_1, A_2, \ldots, A_d]^T \in \mathbb{R}^{d \times n}$, then what is the dimension of the covariance matrix $\Sigma$?

    $$\Sigma = \begin{bmatrix} \mathrm{Var}(A_1) & \mathrm{Cov}(A_1, A_2) & \cdots & \mathrm{Cov}(A_1, A_d) \\ \mathrm{Cov}(A_2, A_1) & \mathrm{Var}(A_2) & \cdots & \mathrm{Cov}(A_2, A_d) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(A_d, A_1) & \mathrm{Cov}(A_d, A_2) & \cdots & \mathrm{Var}(A_d) \end{bmatrix}$$

    The covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$.

4 Shape Measures

  1. Skewness measures the asymmetry of a data distribution. It is denoted by $\gamma$.

    • If $\gamma > 0$, the distribution is right-skewed.
    • If $\gamma < 0$, the distribution is left-skewed.
    • If $\gamma = 0$, the distribution is symmetric.
    • If $|\gamma|$ is small, the distribution has mild skewness.
    • If $|\gamma|$ is large, the distribution has severe skewness.
    $$\gamma = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{s^3}$$

    From Wikipedia:

    In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.

    $$\gamma_1 := \tilde{\mu}_3 = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right] = \frac{\mu_3}{\sigma^3} = \frac{E[(X - \mu)^3]}{\left(E[(X - \mu)^2]\right)^{3/2}} = \frac{\kappa_3}{\kappa_2^{3/2}}$$

    where $\mu_3$ is the third central moment and $\kappa_i$ is the $i$-th cumulant.
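    The sample formula can be compared with scipy.stats.skew; SciPy uses $1/n$ inside the moments, so the two differ by a factor of $((n-1)/n)^{3/2}$ and agree for large $n$:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    x = rng.exponential(scale=1.0, size=10_000)   # right-skewed by construction

    # Formula from the notes: third central moment over s^3 (s uses n-1).
    n, xbar, s = len(x), x.mean(), x.std(ddof=1)
    gamma = np.sum((x - xbar) ** 3) / n / s ** 3
    print(gamma)          # positive, near 2 (the skewness of an exponential)

    print(stats.skew(x))  # SciPy's version, nearly identical here
    ```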
  2. Kurtosis measures the tailedness of a data distribution. It is denoted by $\kappa$.

    • If $\kappa > 3$, the distribution has heavier tails than the normal distribution (leptokurtic).
    • If $\kappa < 3$, the distribution has lighter tails than the normal distribution (platykurtic).
    • If $\kappa = 3$, the distribution has tails similar to the normal distribution (mesokurtic).
    $$\kappa = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^4}{s^4}$$

    From Wikipedia:

    In probability theory and statistics, kurtosis (from Greek: κυρτός, kyrtos or kurtos, meaning "curved, arching") refers to the degree of "tailedness" in the probability distribution of a real-valued random variable. Similar to skewness, kurtosis provides insight into specific characteristics of a distribution. Various methods exist for quantifying kurtosis in theoretical distributions, and corresponding techniques allow estimation based on sample data from a population. It's important to note that different measures of kurtosis can yield varying interpretations.

    $$\mathrm{Kurt}[X] := \tilde{\mu}_4 = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\sigma^4} = \frac{E[(X - \mu)^4]}{\left(E[(X - \mu)^2]\right)^2}$$

    The excess kurtosis, which is $0$ for a normal distribution, is $\gamma_2 := \frac{\kappa_4}{\kappa_2^2} = \mathrm{Kurt}[X] - 3$, where $\kappa_i$ is the $i$-th cumulant.
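    Similarly for kurtosis; note that scipy.stats.kurtosis reports excess kurtosis by default (0 for a normal distribution), so 3 is added back for comparison:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    x = rng.standard_t(df=5, size=10_000)   # Student t: heavier tails than normal

    # Formula from the notes: fourth central moment over s^4 (s uses n-1).
    n, xbar, s = len(x), x.mean(), x.std(ddof=1)
    kappa = np.sum((x - xbar) ** 4) / n / s ** 4
    print(kappa)                  # > 3: leptokurtic (true value is 9 for df=5)

    print(stats.kurtosis(x) + 3)  # SciPy excess kurtosis + 3, nearly identical
    ```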