Statistical Data Analysis - Course program


Preliminary version of the course program. As the course progresses, the program becomes final and the font color changes from gray to black. 

lesson
date
lesson topics
total time

Part 1: Probability theory and probability models


1
6/10/2021
Introduction to the course. Review of the basic concepts of probability. Basic combinatorics.
2
2
8/10/2021
Example of approximate combinatorial reasoning in physics: Pauling and the ice model (link to presentation).
Set-theoretic representation of the probability space. Addition law for probabilities of mutually exclusive events. Addition law for probabilities of non-mutually exclusive events: the inclusion-exclusion principle. Application of the inclusion-exclusion principle.
4
3
13/10/2021
Conditional probabilities. Independent events. Statistical independence and dimensional reduction.
Bayes' theorem and basic introduction to Bayesian inference. Elementary example of Bayesian inference.
6
4
15/10/2021
Conditional probabilities in stochastic modeling: gambler's ruin and probabilistic diffusion models. Discrete and continuous forms of gambler's ruin. Overview of random walk models in physics. Proof of the first and second Borel-Cantelli lemmas.
The transition from sample space to random variables. Review of basic concepts and definitions on discrete and continuous random variables. Uniform distribution.
8
5
20/10/2021
Buffon's needle.
Expectation, dispersion and their properties. Higher moments. Moment generating function.
Chebyshev's inequality. Other inequalities in probability theory: Markov's inequality, generalized Chebyshev's inequality. From Chebyshev's inequality to the weak law of large numbers.
10
6
22/10/2021
Strong law of large numbers.
Brief review of common probability models: 1. the uniform distribution; 2. the Bernoulli distribution; 3. the binomial distribution; 4. the geometric distribution.
12
7
27/10/2021
Distributions (ctd.): 5. the negative binomial distribution; application of the negative binomial distribution to opinion polling. 6. the hypergeometric distribution. 7. the multinomial distribution. 8. the Poisson distribution.
14
8
29/10/2021
Distributions (ctd.): 8. the Poisson distribution (ctd.); memoryless random processes; examples: cell plating statistics in biology; Poisson survival probability in radiobiology. 9. the exponential distribution. Example: paralyzable and non-paralyzable detectors.
15
9
5/11/2021
Distributions (ctd.): 9. Example: coincidence counting. 10. The De Moivre-Laplace theorem and the normal distribution. Properties of the normal distribution. Transformations of random variables. Sum of random variables (convolution).
17
10
10/11/2021
Functions of random variables. Approximate transformation of random variables: error propagation. Linear (orthogonal) transformation of random variables. The multivariate normal distribution.  
19
11
12/11/2021
Other important distributions (lognormal, gamma, beta, Rayleigh, logistic, Laplace, Cauchy). Example of a complex model used to set up a null hypothesis: the distribution of nearest-neighbor distances. Bertrand's paradox.
21
12
19/11/2021
Overview of Jaynes' solution of Bertrand's paradox.
Introduction to probability generating and characteristic functions.  
23
13
24/11/2021
Generating and characteristic functions of some common distributions. Poisson distribution as a limiting case of the binomial distribution. PGF of the Galton-Watson branching process. Photomultiplier noise.
25
14
26/11/2021
Photomultiplier noise (ctd.). More on the moments of a distribution. Skewness and kurtosis. Mode and median. Properties of characteristic functions.
27
15
1/12/2021
The Central Limit Theorem (CLT). Further comments on the CLT. The Berry-Esseen theorem. Multiplicative processes. Power laws.
29

Part 2: Statistical inference


16
3/12/2021
Descriptive and exploratory statistics. Quick review of sample mean, sample variance, estimate of covariance and correlation coefficient. Visualization in statistics (see this beautiful example due to Hans Rosling).
31
17
10/12/2021
PDF of the sample mean for exponentially distributed data. Order statistics. Box plots. Outliers. Violin plots. Rug plots. Kernel density plots. Schwarz's inequality and Pearson's correlation coefficient.
Introduction to the Monte Carlo method. Pseudorandom numbers. Uniformly distributed pseudorandom numbers.
33
18
15/12/2021
Transformation method. Example: generation of exponentially distributed pseudorandom numbers.
The Box-Muller method for the generation of pairs of Gaussian variates. Transformation method and the transformation of differential cross sections. Acceptance-rejection method. Monte Carlo method examples: 1. generation of angles in e+e- -> mu+mu- scattering; 2. generation of angles in Bhabha scattering; 3. the structure of a complete MC program to simulate low-energy electron transport.
34
19
17/12/2021
Early history of the Monte Carlo method.
Statistical bootstrap.
Review of the Bayesian approach to statistical inference.
36
20
20/12/2021
Maximum likelihood method 1. Connection with Bayes' theorem. The maximum likelihood principle. Example with exponentially distributed data. Point estimators. Properties of estimators. Transformations of estimators.
38
21
22/12/2021
Maximum likelihood method 2. Consistency of maximum likelihood estimators. Asymptotic optimality of ML estimators. Bartlett's identities. Cramér-Rao-Fisher bound. Variance of ML estimators. Efficiency and Gaussianity of ML estimators. Introduction to Shannon's entropy.
Information measures based on Shannon's entropy: Kullback-Leibler divergence, Jeffreys distance, Fisher information.
40
22
12/1/2022
Information measures based on Shannon's entropy: Kullback-Leibler divergence, Fisher information (ctd.).
Maximum likelihood method 3. Non-uniqueness of the likelihood function. Credible intervals in the Bayesian perspective. Introduction to confidence intervals. Confidence intervals for the sample mean of exponentially distributed samples. Graphical method for the variance of ML estimators. Example with two counting channels. Confidence intervals and confidence level. Detailed analysis of the Neyman construction of confidence intervals (link to the Neyman paper). Confidence intervals for the correlation coefficient of a bivariate Gaussian distribution from MC simulation.
42
23
14/1/2022
More properties of the likelihood function. Extended maximum likelihood. Examples. Introduction to ML with binned data. Example of ML with binned data: decay rate in radioactive decay. Example of ML with binned data: power laws. Chi-square and its relation to ML. Very brief overview of chi-square and least squares fits.
44
24
19/1/2022
The chi-square distribution. The chi-square distribution in the framework of multidimensional geometry. Least squares fits and chi-square. An anecdote about Fermi and multiparametric fits (see Dyson's paper and this paper on "fitting an elephant").
46
25
21/01/2022
Hypothesis tests, significance level. Examples. Critical region and acceptance region. Errors of the first and of the second kind. Chi-square as a test statistic. The Neyman-Pearson lemma. Fisher's p-value and rejection of the null hypothesis. p-values and multiple testing: the Šidák correction and the Bonferroni correction, with a discussion of the look-elsewhere effect and trials factors.
48

 Edoardo Milotti - January 2022