Statistical Data Analysis - Course program


Preliminary version of the course program. As the course progresses, the font color changes from gray to black. 

lesson
date
lesson topics
total time (cumulative hours)

Part 1: Probability theory and probability models


1
4/10/2016
Introduction to the course. Review of the basic concepts of probability. Basic combinatorics.
2
2
6/10/2016
Set theory representation of probability space. Addition law for probabilities of incompatible events. Addition law for probabilities of non-mutually exclusive events. Conditional probabilities. Bayes' theorem. Independent events.
4
3
11/10/2016
Overview of random walk models in physics. Conditional probabilities in stochastic modeling: gambler's ruin and probabilistic diffusion models. Discrete and continuous forms of the gambler's ruin problem. Bertrand's paradox (1). Introduction to the Borel-Cantelli lemmas, with proofs of some auxiliary theorems.
6
4
13/10/2016
First Borel-Cantelli lemma. Second Borel-Cantelli lemma. Statistical independence and dimensional reduction. The transition from sample space to random variables.
Review of basic concepts and definitions for discrete and continuous random variables. Uniform distribution. Buffon's needle. Transformations of random variables. Brief review of mathematical expectation and its properties.
8
5
18/10/2016
Review of dispersion (variance) and its properties. Chebyshev's inequality. From Chebyshev's inequality to the weak law of large numbers. Other inequalities in probability theory: Markov's inequality, generalized Chebyshev's inequality. Strong law of large numbers.
Moment generating function. Bernoulli (indicator) random variables. Proof of the Chernoff bound.
10
6
25/10/2016
Applications of the Chernoff bound: more specific forms of the Chernoff bound; application to network routing.
11
7
27/10/2016
Brief review of common probability models: 1. the uniform distribution; 2. the Bernoulli distribution; 3. the binomial distribution; 4. the geometric distribution; 5. the negative binomial distribution; 6. the hypergeometric distribution.
13
8
8/11/2016
Distributions (ctd.): 7. the multinomial distribution; 8. the Poisson distribution; memoryless random processes; examples: lottery tickets and Poisson statistics, cell plating statistics in biology; 9. the exponential distribution; example: paralyzable and non-paralyzable detectors.
15
9
10/11/2016
Distributions (ctd.): 10. short overview of the De Moivre-Laplace theorem and the Gaussian distribution.
Transformations of random variables. Sum of random variables (convolution). Product of two random variables (Mellin's convolution). Functions of random variables. Approximate transformation of random variables: error propagation.
17
10
15/11/2016
Linear (orthogonal) transformation of random variables. The multivariate normal distribution. Other important distributions (lognormal, gamma, beta, Rayleigh, logistic, Laplace, Cauchy). Example: the distribution of nearest-neighbor distance.
19
11
17/11/2016
Distribution of nearest-neighbor distance (ctd.). Overview of Jaynes' solution of Bertrand's paradox.
Introduction to generating functions.
21
12
22/11/2016
Generating functions. Probability generating functions (PGF). PGF of the Poisson distribution. PGF of the uniform and binomial distributions. The Poisson distribution as a limiting case of the binomial distribution. PGF of the Galton-Watson branching process. Photomultiplier noise.
24
13
24/11/2016
Characteristic functions. Moments of a distribution. Skewness and kurtosis. Mode and median. Properties of characteristic functions. The Central Limit Theorem (CLT). The Berry-Esseen theorem. Additive and multiplicative processes.
25

Part 2: Statistical inference


13
24/11/2016
Descriptive and exploratory statistics. Sample mean, sample variance, estimates of the covariance and the correlation coefficient. PDF of the sample mean for exponentially distributed data. Limitations of the standard deviation as a descriptor of the width of a distribution. Shannon entropy and information concentration in PDFs. Box plots. Outliers. Violin plots. Rug plots. Kernel density plots.
27
14
29/11/2016
Introduction to the Monte Carlo method. Early history of the Monte Carlo method. Pseudorandom numbers. Uniformly distributed pseudorandom numbers. Transformation method. Transformation method and the transformation of differential cross sections. Acceptance-rejection method. Examples: generation of angles in e+e− → μ+μ− scattering; generation of angles in Bhabha scattering.
30
15
1/12/2016
The structure of a complete MC program to simulate low-energy electron transport. 
Statistical bootstrap.
33
16
6/12/2016
Maximum likelihood method 1. Point estimators. Connection with Bayes' theorem. Properties of estimators. Consistency of maximum likelihood (ML) estimators. Asymptotic optimality of ML estimators. Bartlett's identities. Cramér-Rao-Fisher bound.
36
17
13/12/2016
Maximum likelihood method 2. Variance of ML estimators. Introduction to Shannon entropy. Information measures based on Shannon entropy: Kullback-Leibler divergence, Jeffreys distance, Fisher information.
Introduction to confidence intervals. Confidence intervals and confidence level. Detailed analysis of the Neyman construction of confidence intervals (link to the Neyman paper). Confidence intervals for the sample mean of exponentially distributed samples.
39
18
15/12/2016
Confidence intervals for the sample mean of exponentially distributed samples. Confidence intervals for the correlation coefficient of a bivariate Gaussian distribution from MC simulation. Maximum likelihood method 3. More properties of the likelihood function. Graphical method for the variance of ML estimators.
41
19
21/12/2016
Maximum likelihood method 4. Extended maximum likelihood. Examples. Introduction to ML with binned data. Example with two channels. Other examples of ML with binned data: decay rate in radioactive decay; exponent of a power law. Very brief overview of chi-square and least squares fits, the chi-square distribution, weighted straight-line fits, general least squares fits, least squares fitting of binned data, and nonlinear least squares. Fit quality and the dimension of the parameter space. Chi-square and chi-square tests.
43
20
22/12/2016
Hypothesis tests, significance level. Examples. Critical region and acceptance region. Errors of the first and of the second kind. p-value and rejection of the null hypothesis. Chi-square as a test statistic. Neyman-Pearson lemma. Confidence intervals and the Feldman-Cousins construction (link to the FC paper).
45
23
10/1/2017
Final seminar by Diego Tonelli: Statistical Methods for the Large Hadron Collider (link to slides).
47

 Edoardo Milotti - January 2017