Statistical Data Analysis - Course program


Preliminary version of the course program. As the course progresses, the font color changes from gray to black. 

lesson
date
lesson topics
total time

Part 1: Probability theory and probability models


1
6/10/2020
Introduction to the course. Review of the basic concepts of probability. Basic combinatorics.
2
2
9/10/2020
Example of approximate combinatorial reasoning in physics: Pauling and the ice model.
Stirling's formula. Set theory representation of probability space. Addition law for probabilities of incompatible events. Addition law for probabilities of non-mutually exclusive events.
4
3
13/10/2020
Conditional probabilities. Bayes' theorem. Independent events. Statistical independence and dimensional reduction.
Elementary examples of Bayesian inference. Conditional probabilities in stochastic modeling: gambler's ruin and probabilistic diffusion models. Discrete and continuous forms of gambler's ruin. Overview of random walk models in physics.
6
4
16/10/2020
Proof of the first and second Borel-Cantelli lemmas.
The transition from sample space to random variables. Review of basic concepts and definitions on discrete and continuous random variables.
Uniform distribution. Buffon's needle.
8
5
20/10/2020
Expectation, dispersion and their properties. Higher moments. Moment generating function.
Chebyshev's inequality. Other inequalities in probability theory: Markov's inequality, generalized Chebyshev's inequality. From Chebyshev's inequality to the weak law of large numbers. Strong law of large numbers.
10
6
23/10/2020
Brief review of common probability models: 1. the uniform distribution; 2. the Bernoulli distribution; 3. the binomial distribution; 4. the geometric distribution; 5. the negative binomial distribution; 5. the hypergeometric distribution.
12
7
27/10/2020
Distributions (continued): application of the negative binomial distribution to opinion polling; 6. the multinomial distribution; 7. the Poisson distribution; memoryless random processes. Examples: cell plating statistics in biology; Poisson survival probability in radiobiology.
14
8
30/10/2020
Distributions (continued): 8. the exponential distribution. Example: paralyzable and non-paralyzable detectors. 9. The De Moivre-Laplace theorem and the normal distribution.
15
9
6/11/2020
Properties of the normal distribution. Transformations of random variables. Sum of random variables (convolution). Functions of random variables. Approximate transformation of random variables: error propagation. Linear (orthogonal) transformation of random variables. The multivariate normal distribution.  
17
10
10/11/2020
Other important distributions (lognormal, gamma, beta, Rayleigh, logistic, Laplace, Cauchy). Example of a complex model used to set up a null hypothesis: the distribution of nearest-neighbor distances. Bertrand's paradox.
19
11
13/11/2020
Overview of Jaynes' solution of Bertrand's paradox.
Introduction to probability generating and characteristic functions. 
21
12
17/11/2020
Generating and characteristic functions of some common distributions. Poisson distribution as a limiting case of the binomial distribution. PGF of the Galton-Watson branching process. Photomultiplier noise.
23
13
20/11/2020
Description and discussion of the Delbrück-Luria experiment (link to slides). More on the moments of a distribution. Skewness and kurtosis. Mode and median. Properties of characteristic functions. The Central Limit Theorem (CLT).
25

Part 2: Statistical inference


14
24/11/2020
Further comments on the CLT. Multiplicative processes and the log-normal distribution.
Descriptive and exploratory statistics. Sample mean, sample variance, estimate of covariance and correlation coefficient.
27
15
27/11/2020
PDF of the sample mean for exponentially distributed data. Considerations on the adequacy of common estimators (sample mean, sample standard deviation, and one alternative to the standard deviation as a measure of the extent of the bulk of a distribution). Order statistics. Visualization in statistics (see this beautiful example due to Hans Rosling). Box plots. Outliers. Violin plots. Rug plots. Kernel density plots. Important inequalities (Schwarz's inequality and Pearson's correlation coefficient, Jensen's inequality).
29
16
1/12/2020
Introduction to the Monte Carlo method. Early history of the Monte Carlo method. Pseudorandom numbers. Uniformly distributed pseudorandom numbers. Transformation method. Example: generation of exponentially distributed pseudorandom numbers.
31
17
4/12/2020
The Box-Muller method for the generation of pairs of Gaussian variates. Transformation method and the transformation of differential cross sections. Acceptance-rejection method. Monte Carlo method examples: 1. generation of angles in e+e- -> mu+mu- scattering; 2. generation of angles in Bhabha scattering.
33
18
5/12/2020
Examples: 3. the structure of a complete MC program to simulate low-energy electron transport.
34
19
11/12/2020
Examples: 4. MC program to understand the basics of treatment planning in radiotherapy (link to presentation). Statistical bootstrap.
36
20
15/12/2020
Maximum likelihood method 1. Connection with Bayes' theorem. The Maximum Likelihood principle. Example with exponentially distributed data. Point estimators. Properties of estimators. Transformations of estimators.
38
21
18/12/2020
Maximum likelihood method 2. Consistency of maximum likelihood estimators. Asymptotic optimality of ML estimators. Bartlett's identities. Cramér-Rao-Fisher bound. Variance of ML estimators. Efficiency and Gaussianity of ML estimators. Introduction to Shannon's entropy.
40
22
22/12/2020
Information measures based on Shannon's entropy: Kullback-Leibler divergence, Jeffreys distance, Fisher information. Maximum likelihood method 3. Non-uniqueness of the likelihood function. Credible intervals in the Bayesian perspective. Introduction to confidence intervals. Confidence intervals for the sample mean of exponentially distributed samples. Graphical method for the variance of ML estimators. Example with two counting channels.
42
23
12/01/2021
Confidence intervals and confidence level. Detailed analysis of the Neyman construction of confidence intervals (link to the Neyman paper). Confidence intervals for the correlation coefficient of a bivariate Gaussian distribution from MC simulation.
More properties of the likelihood function. Extended maximum likelihood. Examples. Introduction to ML with binned data. Example of ML with binned data: decay rate in radioactive decay. Example of ML with binned data: power laws. 
44
24
15/01/2021
Chi-square and its relation to ML. Very brief overview of chi-square and least squares fits. The chi-square distribution. The chi-square distribution in the frame of multidimensional geometry.
46
25
19/01/2021
An anecdote about Fermi and multiparametric fits (see Dyson's paper and this paper on "fitting an elephant"). Hypothesis tests, significance level. Examples. Critical region and acceptance region. Errors of the first and of the second kind. Chi-square as a test statistic. The Neyman-Pearson lemma. Fisher's p-value and rejection of the null hypothesis. p-values and multiple testing, the Šidák correction and the Bonferroni correction, with a discussion of the look-elsewhere effect and trials factors.
48

 Edoardo Milotti - January 2021