ECE 901 Spring '14: Statistical Learning Theory

Prerequisites: Background in applied mathematics, probability, and statistics

Instructor:

Robert Nowak

E-mail: nowak@ece.wisc.edu

Web: http://nowak.ece.wisc.edu

3539 Engineering Hall

Office Hours: T-R 12:15-1pm

Meeting times: Tuesday and Thursday 11am-12:15pm, room 2305 Engineering Hall

Course Format:

This course will serve as a primer for statistical learning theory as well as a platform

for exploring emerging theory and methods in large-scale data analytics. The course is

intended for PhD students in ECE, CS, Statistics and Mathematics who already have a strong

background in signal processing, machine learning, or mathematical statistics. The first

few weeks of the course will give an introduction to statistical learning theory (somewhat

following the lecture notes below). The rest of the course will involve reading and

discussion of a selection of papers. Each student will be responsible for reading all

papers and presenting one paper in class.

Grading: Grades will be based on class participation, quizzes, homework assignments,

presentations and/or reports. Each student will be responsible for presenting one of the papers

that the class will study. Every student is required to carefully read the lecture notes and/or

papers prior to class lectures.

Lectures:

Topic 1: Regression

http://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture24.pdf

http://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture26.pdf

http://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture21.pdf

Topic 2: Concentration of Measure and Empirical Risk Minimization

http://nowak.ece.wisc.edu/ece901_concentration.pdf

http://nowak.ece.wisc.edu/ece830/ece830_fall11_lecture25.pdf
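As a warm-up for this topic, the following self-contained sketch (an illustration, not part of the assigned readings; all parameters are chosen arbitrarily) compares Hoeffding's bound 2*exp(-2*n*eps^2) with the simulated deviation frequency of an empirical mean of coin flips:

```python
import math
import random

random.seed(0)
n, p, eps, trials = 100, 0.5, 0.1, 5000

# Count how often the empirical mean of n Bernoulli(p) samples
# deviates from p by at least eps.
deviations = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    if abs(mean - p) >= eps:
        deviations += 1

empirical = deviations / trials
bound = 2 * math.exp(-2 * n * eps ** 2)
print(empirical, bound)  # the simulated frequency sits below Hoeffding's bound
```

The bound here is roughly 0.27, while the simulated frequency is much smaller, which illustrates that Hoeffding's inequality is distribution-free and therefore loose for any particular distribution.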

Topic 3: Vapnik-Chervonenkis Theory and Binary Classification

http://nowak.ece.wisc.edu/SLT09/lecture18.pdf

http://nowak.ece.wisc.edu/SLT09/lecture19.pdf

http://nowak.ece.wisc.edu/SLT09/lecture20.pdf

Topic 4: Structural Risk Minimization

http://nowak.ece.wisc.edu/SLT09/lecture9.pdf

http://nowak.ece.wisc.edu/SLT09/lecture10.pdf

http://code.ucsd.edu/~zeger/publications/journals/LuZe96-IT-Concept/LuZe96-IT-Concept.pdf

Topic 5: Minimax Lower Bounds

http://nowak.ece.wisc.edu/SLT14/lecture21.pdf

http://nowak.ece.wisc.edu/SLT14/sparse_recovery.pdf

Topic 6: Bernstein's Inequality and the Johnson-Lindenstrauss Lemma

Sections 2.8 and 2.9 in Concentration Inequalities: A Nonasymptotic Theory of Independence (Boucheron, Lugosi, and Massart)
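A minimal numerical sketch of the Johnson-Lindenstrauss phenomenon (an illustration only; the Gaussian projection and all dimensions below are choices for this example, not from the reading): a random projection to k dimensions approximately preserves all pairwise squared distances.

```python
import math
import random

random.seed(0)
n, d, k = 20, 500, 200  # n points in R^d, projected down to R^k

# Random data points and a Gaussian projection matrix scaled by 1/sqrt(k),
# so that projected squared norms are correct in expectation.
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
P = [[random.gauss(0, 1) / math.sqrt(k) for _ in range(d)] for _ in range(k)]

def project(x):
    return [sum(p_i * x_i for p_i, x_i in zip(row, x)) for row in P]

Y = [project(x) for x in X]

def sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Ratio of projected to original squared distance for every pair;
# JL says these concentrate around 1 when k is large enough.
ratios = [sqdist(Y[i], Y[j]) / sqdist(X[i], X[j])
          for i in range(n) for j in range(i + 1, n)]
print(min(ratios), max(ratios))  # all close to 1
```

Each ratio is distributed as a chi-squared with k degrees of freedom divided by k, so its standard deviation is sqrt(2/k), which is exactly the kind of concentration statement Bernstein-type inequalities make quantitative.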

Topic: Convex Losses and Rademacher Complexity

http://machinelearning.wustl.edu/mlpapers/paper_files/BartlettM02.pdf

Topic: Logistic Regression and Lasso

Nikhil Rao, March 6 http://arxiv.org/abs/1202.1212

Nikhil Rao, March 11 http://arxiv.org/pdf/1402.4512.pdf

March 13 TBA: Ravi Sastry

Topic: Compressed Sensing

Jing F and Shike M, March 25 http://www.eecs.berkeley.edu/~brecht/papers/11.candes.recht.pdf

http://nowak.ece.wisc.edu/SLT14/ece901_compressed_sensing.pdf

Topic: Sparsity and Learning Graphs

Sumeet K and Atul D, March 27 http://stat.ethz.ch/~nicolai/consistent.pdf

Topic: Dictionary Learning

Qinyuan Sun and Shulei Wang, April 1 http://people.csail.mit.edu/moitra/docs/bookex.pdf, chapter 5

Topic: The SVD and Matrix Completion

Urvashi O and Vamsi K, April 3 chapter 4 of http://www.cs.cornell.edu/jeh/NOSOLUTIONS90413.pdf

Han Li and Karthik A, April 8 http://pages.cs.wisc.edu/~brecht/papers/09.Recht.ImprovedMC.pdf

Topic: Clustering

Xin Zhang and Albert Oh, April 10 http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf

Daniel P and Rao Fu, April 15 http://www.cs.sunysb.edu/~jgao/paper/clustering_lines.pdf

Topic: Stochastic Gradient Methods and Large-scale Optimization

Xiaomao L and Ting-Ting, April 17 http://www.eecs.berkeley.edu/~brecht/cs294docs/week1/09.Nemirovski.pdf

Willem M and Urvashi O, April 22 http://nowak.ece.wisc.edu/ece830/ece830_spring13_adaptive_filtering.pdf
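As a toy companion to this topic (an illustration only; the problem, step-size constant, and tolerances are choices for this sketch), stochastic gradient descent with a decaying 1/sqrt(t) step size on a least-squares problem looks like:

```python
import random

random.seed(1)
w_true = [2.0, -3.0]

# Stream of noisy linear measurements y = <w_true, x> + noise.
data = []
for _ in range(2000):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    y = w_true[0] * x[0] + w_true[1] * x[1] + random.gauss(0, 0.1)
    data.append((x, y))

# SGD on the loss 0.5*(<w, x> - y)^2, one sample per step.
w = [0.0, 0.0]
for t, (x, y) in enumerate(data, start=1):
    residual = w[0] * x[0] + w[1] * x[1] - y
    step = 0.2 / t ** 0.5          # decaying step size, O(1/sqrt(t))
    w = [w[0] - step * residual * x[0],
         w[1] - step * residual * x[1]]

print(w)  # close to w_true = [2.0, -3.0]
```

The 1/sqrt(t) schedule is the one analyzed in the robust stochastic approximation literature linked above; constant step sizes converge faster initially but stall at a noise floor.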

Topic: Sketching

Wentao Wu and Luwan Z, April 24 chapter 7 of http://www.cs.cornell.edu/jeh/NOSOLUTIONS90413.pdf

Song W and Won Kim, April 29 http://www.eecs.berkeley.edu/~brecht/cs294docs/week1/09.Strohmer.pdf

Topic: Multi-armed Bandits and Active Learning

Sumeet K and Atul D, May 1 http://arxiv.org/pdf/1312.7308.pdf

Wentao W and Karthik R, May 6 http://nowak.ece.wisc.edu/IT_minimax.pdf

Shike M and Ting-Ting, May 8 http://www.cc.gatech.edu/%7Eninamf/papers/optimal-lin-sep.pdf

Lecture Notes from Previous Course Offerings

Lecture 1 A Probabilistic Approach to Pattern Recognition
Lecture 2 Introduction to Classification and Regression
Lecture 3 Introduction to Complexity Regularization
Lecture 4 Denoising in Smooth Function Spaces
Lecture 5 Plug-in Rules and Histogram Classifiers
Lecture 6 Probably Approximately Correct (PAC) Learning
Lecture 7 Chernoff's Bound and Hoeffding's Inequality
Lecture 8 Classification Error Bounds
Lecture 9 Error Bounds in Countably Infinite Model Spaces
Lecture 10 Complexity Regularization
Lecture 11 Decision Trees
Lecture 12 Complexity Regularization for Squared Error Loss
Lecture 13 Maximum Likelihood Estimation
Lecture 14 Maximum Likelihood and Complexity Regularization
Lecture 15 Denoising II: Adapting to Unknown Smoothness
Lecture 16 Wavelet Approximation Theory
Lecture 17 Denoising III: Spatial Adaptivity
Lecture 18 Introduction to VC Theory
Lecture 19 The VC Inequality
Lecture 20 Applications of VC Theory
Lecture 21 Minimax Lower Bounds

Textbooks and References:

A textbook will not be followed in this course. A collection of

notes, relevant papers and materials will be prepared and distributed.

Textbooks recommended for further reading are listed below.

A Probabilistic Theory of Pattern Recognition, Devroye, Györfi, and Lugosi, Springer

Nonparametric Estimation Theory, Iain Johnstone, unpublished monograph

The Elements of Statistical Learning, Hastie et al., Springer

An Introduction to Support Vector Machines, Cristianini and Shawe-Taylor, Cambridge University Press

Combinatorial Methods in Density Estimation, Devroye and Lugosi, Springer

Statistical Learning Theory, Vapnik, Wiley

An Introduction to Computational Learning Theory, Kearns and Vazirani, MIT Press

Empirical Processes in M-Estimation, van de Geer, Cambridge University Press
