ECE 901 Spring '14: Statistical Learning Theory

Prerequisites:

Background in applied mathematics, probability, and statistics

Instructor:

Robert Nowak
E-mail: nowak@ece.wisc.edu
Web: http://nowak.ece.wisc.edu
3539 Engineering Hall
Office Hours: Tuesday and Thursday, 12:15-1pm

Meeting times: Tuesday and Thursday 11am-12:15pm, room 2305 Engineering Hall

Course Format:
This course will serve as a primer on statistical learning theory as well as a platform
for exploring emerging theory and methods in large-scale data analytics. The course is
intended for PhD students in ECE, CS, Statistics, and Mathematics who already have a strong
background in signal processing, machine learning, or mathematical statistics. The first
few weeks of the course will give an introduction to statistical learning theory (roughly
following the lecture notes below). The rest of the course will involve reading and
discussion of a selection of papers. Each student will be responsible for reading all of the
papers and for presenting one paper in class.

Grading: Grades will be based on class participation, quizzes, homework assignments,
presentations and/or reports. Each student will be responsible for presenting one of the papers
that the class will study. Every student is required to carefully read the lecture notes and/or
papers prior to class lectures.

Lectures:

Topic 1: Regression
http://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture24.pdf
http://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture26.pdf
http://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture21.pdf
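
For concreteness, here is a minimal NumPy sketch of the ordinary least squares setup
analyzed in these notes; the model, sizes, noise level, and seed are illustrative
choices, not taken from the notes.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 5                       # sample size and dimension (illustrative)
    X = rng.standard_normal((n, d))     # design matrix
    beta = np.arange(1, d + 1, dtype=float)
    y = X @ beta + 0.5 * rng.standard_normal(n)   # noisy linear responses

    # Ordinary least squares: minimize ||y - X b||^2 over b
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.round(beta_hat, 2))        # should be close to [1, 2, 3, 4, 5]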

Topic 2: Concentration of Measure and Empirical Risk Minimization
http://nowak.ece.wisc.edu/ece901_concentration.pdf
http://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture25.pdf
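
The connection these notes develop fits in one display: for a finite class F of
predictors and n i.i.d. samples, Hoeffding's inequality plus a union bound give,
with probability at least 1 - delta,

    \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big|
        \;\le\; \sqrt{ \frac{\log(2|\mathcal{F}|/\delta)}{2n} }
    \qquad \Longrightarrow \qquad
    R(\hat{f}_n) \;\le\; \min_{f \in \mathcal{F}} R(f)
        + 2 \sqrt{ \frac{\log(2|\mathcal{F}|/\delta)}{2n} }

for the empirical risk minimizer \hat{f}_n.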

Topic 3: Vapnik-Chervonenkis Theory and Binary Classification
http://nowak.ece.wisc.edu/SLT09/lecture18.pdf
http://nowak.ece.wisc.edu/SLT09/lecture19.pdf
http://nowak.ece.wisc.edu/SLT09/lecture20.pdf
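
For orientation, one standard form of the VC inequality (as in Devroye, Györfi, and
Lugosi) bounds the uniform deviation of empirical risks via the shatter coefficient:

    \Pr\Big( \sup_{f \in \mathcal{F}} \big| \hat{R}_n(f) - R(f) \big| > \epsilon \Big)
        \;\le\; 8\, S_{\mathcal{F}}(n)\, e^{-n\epsilon^2/32},
    \qquad
    S_{\mathcal{F}}(n) \le (n+1)^{V} \ \text{(Sauer's lemma, } V = \mathrm{VC}(\mathcal{F})\text{)}.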

Topic 4: Structural Risk Minimization
http://nowak.ece.wisc.edu/SLT09/lecture9.pdf
http://nowak.ece.wisc.edu/SLT09/lecture10.pdf
http://code.ucsd.edu/~zeger/publications/journals/LuZe96-IT-Concept/LuZe96-IT-Concept.pdf
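
A minimal sketch of the SRM idea on a toy regression problem: fit nested polynomial
models and pick the degree minimizing empirical risk plus a complexity penalty. The
penalty shape and constant here are illustrative, not those from the readings.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    x = rng.uniform(-1, 1, n)
    y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(n)   # true function is not polynomial

    def empirical_risk(deg):
        coef = np.polyfit(x, y, deg)                 # least-squares fit of degree `deg`
        return np.mean((np.polyval(coef, x) - y) ** 2)

    # SRM: nested classes F_1 < F_2 < ... indexed by degree; penalize complexity.
    # The penalty c * (deg + 1) / n is an illustrative choice.
    c = 2.0
    scores = {deg: empirical_risk(deg) + c * (deg + 1) / n for deg in range(1, 11)}
    print("selected degree:", min(scores, key=scores.get))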

Topic 5: Minimax Lower Bounds
http://nowak.ece.wisc.edu/SLT14/lecture21.pdf
http://nowak.ece.wisc.edu/SLT14/sparse_recovery.pdf
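
Most of the arguments in these notes reduce to a two-point test (Le Cam's method):
if theta_0, theta_1 are parameters with d(theta_0, theta_1) >= 2*delta and P_0, P_1
are the corresponding data distributions, then every estimator satisfies

    \inf_{\hat{\theta}} \max_{i \in \{0,1\}}
        P_i\big( d(\hat{\theta}, \theta_i) \ge \delta \big)
        \;\ge\; \frac{1 - \|P_0 - P_1\|_{\mathrm{TV}}}{2},

so no estimator can be accurate under both hypotheses unless the two distributions
are well separated.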

Topic 6: Bernstein's Inequality and the Johnson-Lindenstrauss Lemma
Sections 2.8 and 2.9 of Concentration Inequalities: A Nonasymptotic Theory of Independence, Boucheron, Lugosi, and Massart
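
To make the Johnson-Lindenstrauss lemma concrete, here is a minimal NumPy sketch of
a Gaussian random projection; the dimensions, target dimension, and seed are
illustrative.

    import numpy as np

    rng = np.random.default_rng(2)
    n, d, k = 50, 10_000, 500           # points, ambient dim, target dim (illustrative)
    X = rng.standard_normal((n, d))

    # J-L map: a scaled Gaussian matrix preserves pairwise distances w.h.p.
    A = rng.standard_normal((k, d)) / np.sqrt(k)
    Y = X @ A.T                          # projected points, shape (n, k)

    # Compare one pairwise distance before and after projection.
    orig = np.linalg.norm(X[0] - X[1])
    proj = np.linalg.norm(Y[0] - Y[1])
    print(f"distortion: {proj / orig:.3f}")   # typically within a few percent of 1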

Topic 7: Convex Losses and Rademacher Complexity
http://machinelearning.wustl.edu/mlpapers/paper_files/BartlettM02.pdf
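
For reference, the key quantity in this paper is the Rademacher complexity, and the
standard symmetrization bound reads

    \mathcal{R}_n(\mathcal{F}) = \mathbb{E}_{\sigma} \Big[ \sup_{f \in \mathcal{F}}
        \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(X_i) \Big],
    \qquad
    \mathbb{E} \Big[ \sup_{f \in \mathcal{F}} \Big( \mathbb{E} f
        - \frac{1}{n} \sum_{i=1}^{n} f(X_i) \Big) \Big]
        \;\le\; 2\, \mathbb{E}\, \mathcal{R}_n(\mathcal{F}),

where the sigma_i are i.i.d. uniform signs, independent of the data.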

Topic 8: Logistic Regression and Lasso
Nikhil Rao, March 6 http://arxiv.org/abs/1202.1212
Nikhil Rao, March 11 http://arxiv.org/pdf/1402.4512.pdf
Ravi Sastry, March 13: TBA
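
A minimal sketch of the lasso via proximal gradient descent (ISTA); the sizes,
regularization weight, and iteration count are illustrative, and this is a generic
algorithm rather than the method of the assigned papers.

    import numpy as np

    rng = np.random.default_rng(3)
    n, d, s = 100, 200, 5                 # underdetermined: more features than samples
    X = rng.standard_normal((n, d))
    beta = np.zeros(d); beta[:s] = 3.0    # sparse ground truth
    y = X @ beta + 0.1 * rng.standard_normal(n)

    lam = 5.0                              # l1 weight (illustrative)
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    b = np.zeros(d)
    for _ in range(500):
        grad = X.T @ (X @ b - y)           # gradient of the least-squares term
        z = b - grad / L                   # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft-thresholding
    print("support found:", np.nonzero(np.abs(b) > 0.1)[0])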

Topic 9: Compressed Sensing
Jing F and Shike M, March 25 http://www.eecs.berkeley.edu/~brecht/papers/11.candes.recht.pdf
http://nowak.ece.wisc.edu/SLT14/ece901_compressed_sensing.pdf
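
A minimal sketch of sparse recovery via orthogonal matching pursuit under a Gaussian
measurement model; all sizes are illustrative, and this is a generic implementation
rather than the method of the assigned paper.

    import numpy as np

    rng = np.random.default_rng(4)
    d, m, s = 400, 80, 5                  # signal dim, measurements, sparsity
    x = np.zeros(d); x[rng.choice(d, s, replace=False)] = rng.standard_normal(s)
    A = rng.standard_normal((m, d)) / np.sqrt(m)
    y = A @ x                             # noiseless measurements y = A x

    # Orthogonal matching pursuit: greedily grow the support.
    support, r = [], y.copy()
    for _ in range(s):
        j = int(np.argmax(np.abs(A.T @ r)))        # column most correlated with residual
        support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ coef               # update residual
    x_hat = np.zeros(d); x_hat[support] = coef
    print("recovery error:", np.linalg.norm(x_hat - x))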

Topic 10: Sparsity and Learning Graphs
Sumeet K and Atul D, March 27 http://stat.ethz.ch/~nicolai/consistent.pdf
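
The paper's neighborhood-selection idea can be sketched in a few lines with
scikit-learn: regress each variable on the rest with the lasso and read edges off
the nonzero coefficients. The regularization weight, planted dependencies, and the
OR rule for combining neighborhoods are illustrative choices.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(5)
    n, p = 500, 10
    X = rng.standard_normal((n, p))
    X[:, 1] += X[:, 0]; X[:, 2] += X[:, 1]          # plant a simple chain of dependencies

    edges = set()
    for j in range(p):
        others = [k for k in range(p) if k != j]
        fit = Lasso(alpha=0.1).fit(X[:, others], X[:, j])   # nodewise lasso
        for k, c in zip(others, fit.coef_):
            if abs(c) > 1e-6:
                edges.add(frozenset((j, k)))        # OR rule: keep edge if either side selects it
    print(sorted(tuple(sorted(e)) for e in edges))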

Topic 11: Dictionary Learning
Qinyuan Sun and Shulei Wang, April 1 http://people.csail.mit.edu/moitra/docs/bookex.pdf, chapter 5
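
For hands-on experimentation alongside the reading, scikit-learn's DictionaryLearning
implements a standard alternating-minimization scheme; the data and parameter
settings below are illustrative.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(6)
    X = rng.standard_normal((200, 20))             # rows are signals (illustrative data)

    dl = DictionaryLearning(n_components=30, alpha=1.0, max_iter=50, random_state=0)
    codes = dl.fit_transform(X)                    # sparse codes, shape (200, 30)
    D = dl.components_                             # learned dictionary, shape (30, 20)
    print("avg nonzeros per code:", np.mean(np.count_nonzero(codes, axis=1)))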

Topic 12: The SVD and Matrix Completion
Urvashi O and Vamsi K, April 3 chapter 4 of http://www.cs.cornell.edu/jeh/NOSOLUTIONS90413.pdf
Han Li and Karthik A, April 8 http://pages.cs.wisc.edu/~brecht/papers/09.Recht.ImprovedMC.pdf
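
A minimal sketch of matrix completion by singular value soft-thresholding, in the
spirit of (though much simpler than) the assigned reading; the rank, sampling rate,
threshold, and iteration count are illustrative, and results will vary.

    import numpy as np

    rng = np.random.default_rng(7)
    n, r = 50, 3
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-3 matrix
    mask = rng.random((n, n)) < 0.5                                  # observe ~50% of entries

    # Soft-impute style iteration: fill unknowns, soft-threshold singular values.
    tau = 5.0                                   # threshold (illustrative)
    X = np.zeros((n, n))
    for _ in range(200):
        Z = np.where(mask, M, X)                # keep observed entries, impute the rest
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt # singular value soft-thresholding
    err = np.linalg.norm((X - M)[~mask]) / np.linalg.norm(M[~mask])
    print(f"relative error on unobserved entries: {err:.3f}")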

Topic 13: Clustering
Xin Zhang and Albert Oh, April 10 http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf
Daniel P and Rao Fu, April 15 http://www.cs.sunysb.edu/~jgao/paper/clustering_lines.pdf
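
For reference, here is a minimal NumPy sketch of Lloyd's algorithm for k-means, the
baseline these readings build on; the data and the random initialization are
illustrative, and the sketch does no empty-cluster handling.

    import numpy as np

    rng = np.random.default_rng(8)
    # Three well-separated Gaussian blobs (illustrative data).
    X = np.vstack([rng.standard_normal((100, 2)) + c for c in ([0, 0], [6, 0], [0, 6])])

    k = 3
    centers = X[rng.choice(len(X), k, replace=False)]   # random init (k-means++ would be better)
    for _ in range(50):
        # Assignment step: nearest center for each point.
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        # Update step: recompute centers as cluster means.
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    print(np.round(centers, 1))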

Topic 14: Stochastic Gradient Methods and Large-scale Optimization
Xiaomao L and Ting-Ting, April 17 http://www.eecs.berkeley.edu/~brecht/cs294docs/week1/09.Nemirovski.pdf
Willem M and Urvashi O, April 22 http://nowak.ece.wisc.edu/ece830/ece830_spring13_adaptive_filtering.pdf
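
A minimal sketch of stochastic gradient descent on a least-squares objective; the
model, step-size schedule, and iteration count are illustrative choices rather than
anything from the readings.

    import numpy as np

    rng = np.random.default_rng(9)
    n, d = 10_000, 10
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true + 0.1 * rng.standard_normal(n)

    # SGD on the least-squares objective: one random example per step.
    w = np.zeros(d)
    for t in range(1, 50_001):
        i = rng.integers(n)
        grad = (X[i] @ w - y[i]) * X[i]      # stochastic gradient at example i
        w -= 0.1 / np.sqrt(t) * grad         # decaying step size (illustrative schedule)
    print("error:", np.linalg.norm(w - w_true))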

Topic 15: Sketching
Wentao Wu and Luwan Z, April 24 chapter 7 of http://www.cs.cornell.edu/jeh/NOSOLUTIONS90413.pdf
Song W and Won Kim, April 29 http://www.eecs.berkeley.edu/~brecht/cs294docs/week1/09.Strohmer.pdf
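
The randomized Kaczmarz iteration from the second reading can be sketched in a few
lines; the problem sizes and iteration count below are illustrative.

    import numpy as np

    rng = np.random.default_rng(10)
    n, d = 500, 20
    A = rng.standard_normal((n, d))
    x_true = rng.standard_normal(d)
    b = A @ x_true                          # consistent linear system

    # Randomized Kaczmarz (Strohmer-Vershynin): project onto one equation per step,
    # picking row i with probability proportional to ||a_i||^2.
    probs = (A ** 2).sum(axis=1); probs /= probs.sum()
    x = np.zeros(d)
    for _ in range(5000):
        i = rng.choice(n, p=probs)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a   # orthogonal projection onto {z : a.z = b_i}
    print("error:", np.linalg.norm(x - x_true))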

Topic 16: Multi-armed Bandits and Active Learning
Sumeet K and Atul D, May 1 http://arxiv.org/pdf/1312.7308.pdf
Wentao W and Karthik R, May 6 http://nowak.ece.wisc.edu/IT_minimax.pdf
Shike M and Ting-Ting, May 8 http://www.cc.gatech.edu/%7Eninamf/papers/optimal-lin-sep.pdf
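
For concreteness, a minimal sketch of the classical UCB1 index policy, a baseline
for the bandit readings; the arm means and horizon are illustrative.

    import numpy as np

    rng = np.random.default_rng(11)
    means = np.array([0.2, 0.5, 0.45])          # true Bernoulli arm means (illustrative)
    K, T = len(means), 10_000
    counts, sums = np.zeros(K), np.zeros(K)

    # UCB1: pull the arm maximizing empirical mean + exploration bonus.
    for t in range(1, T + 1):
        if t <= K:
            a = t - 1                            # pull each arm once to initialize
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            a = int(np.argmax(ucb))
        reward = float(rng.random() < means[a])  # Bernoulli reward
        counts[a] += 1; sums[a] += reward
    print("pulls per arm:", counts.astype(int))  # the best arm should dominate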


Lecture Notes from Previous Course Offerings

Lecture 1 A Probabilistic Approach to Pattern Recognition
Lecture 2 Introduction to Classification and Regression
Lecture 3 Introduction to Complexity Regularization
Lecture 4 Denoising in Smooth Function Spaces
Lecture 5 Plug-in Rules and Histogram Classifiers
Lecture 6 Probably Approximately Correct (PAC) Learning
Lecture 7 Chernoff's Bound and Hoeffding's Inequality
Lecture 8 Classification Error Bounds
Lecture 9 Error Bounds in Countably Infinite Model Spaces
Lecture 10 Complexity Regularization
Lecture 11 Decision Trees
Lecture 12 Complexity Regularization for Squared Error Loss
Lecture 13 Maximum Likelihood Estimation
Lecture 14 Maximum Likelihood and Complexity Regularization
Lecture 15 Denoising II: Adapting to Unknown Smoothness
Lecture 16 Wavelet Approximation Theory
Lecture 17 Denoising III: Spatial Adaptivity
Lecture 18 Introduction to VC Theory
Lecture 19 The VC Inequality
Lecture 20 Applications of VC Theory
Lecture 21 Minimax Lower Bounds


Textbooks and References:

No single textbook will be followed in this course. A collection of notes,
relevant papers, and other materials will be prepared and distributed.
Textbooks recommended for further reading are listed below.

A Probabilistic Theory of Pattern Recognition, Devroye, Györfi, and Lugosi, Springer
Nonparametric Estimation Theory, Iain Johnstone, unpublished monograph
The Elements of Statistical Learning, Hastie, Tibshirani, and Friedman, Springer
An Introduction to Support Vector Machines, Cristianini and Shawe-Taylor, Cambridge University Press
Combinatorial Methods in Density Estimation, Devroye and Lugosi, Springer
Statistical Learning Theory, Vapnik, Wiley
An Introduction to Computational Learning Theory, Kearns and Vazirani, MIT Press
Empirical Processes in M-Estimation, van de Geer, Cambridge University Press
