My notation for the EM algorithm was a bit sloppy, as Nancy pointed out during the lecture. On pages 6, 7, 8, and 10 of my slides [first part of my lecture, before Professor Eric Kolaczyk's talk], I have now written E(z_{ik}|X, \hat{\Theta}) instead of E(z_{ik}|X, \Theta), where \hat{\Theta} denotes the *current* parameter estimate, which gets updated at every EM iteration [a fact made explicit on page 6].
For students who didn't appreciate the subtlety here, my old notation could have given the impression that w_{ik} was a function of \Theta. But that's not the case. In the M-step, w_{ik} takes on a particular numerical value [computed by the E-step from the current estimate \hat{\Theta}]; if w_{ik} were a function of \Theta, the maximization problem in the M-step would become a lot more complicated [and almost defeat the very purpose of EM].
I have also improved page 13 of my slides [again, first part of my lecture]. When we start to model each word [rather than each document] as a mixture, we would, of course, expect the distribution p(\cdot;\theta_k) to take on a slightly different meaning.
On page 12, x_i = (x_{i1}, x_{i2}, ..., x_{id})^T, where each x_{ij} counts the number of times word j [or the j-th word in the vocabulary] appears in document i. In this case, each component of the mixture, p(x_i;\theta_k), is a "usual" multinomial distribution.
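To spell out what the "usual" multinomial on page 12 looks like numerically, here is a small sketch [the function name and the toy counts are mine, purely for illustration]: the log-pmf of a count vector x_i under word probabilities \theta_k, with n equal to the total number of words in the document.

```python
import math

def multinomial_log_pmf(x, theta):
    """log p(x; theta) for a count vector x over a d-word vocabulary,
    i.e. the 'usual' multinomial with n = sum(x) die tosses."""
    n = sum(x)
    # log of the multinomial coefficient n! / (x_1! ... x_d!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(xj + 1) for xj in x)
    return log_coef + sum(xj * math.log(tj)
                          for xj, tj in zip(x, theta) if xj > 0)

# A toy 3-word document over a 3-word vocabulary:
x = [2, 1, 0]
theta = [0.5, 0.3, 0.2]
p = math.exp(multinomial_log_pmf(x, theta))
# 3!/(2! 1! 0!) * 0.5^2 * 0.3^1 * 0.2^0 = 3 * 0.25 * 0.3 = 0.225
```

Working in the log domain avoids underflow when documents are long.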
On page 13, I have now used the notation x_{it} [rather than x_{ij}, to avoid confusion] to denote the t-th word in document i, and it could be the first word in the vocabulary [x_{it}=1], the second word in the vocabulary [x_{it}=2], or the j-th word in the vocabulary [x_{it}=j], for j going all the way up to d. In this case, each component of the mixture, p(x_{it};\theta_k), is equal to \theta_{kj} for x_{it}=j. In other words, it is the probability that the t-th word in document i is the j-th word in the vocabulary. We can see that this distribution is still very much "multinomial" in spirit, except we are now tossing the die only once, rather than multiple times. I have revised page 13 of my slides to make this distinction more explicit.
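The contrast with page 12 becomes obvious in code: for a single word, the "multinomial" pmf collapses to a table lookup, one toss of the d-sided die. A minimal sketch [function name mine, for illustration only]:

```python
import math

def word_log_prob(x_it, theta_k):
    """log p(x_it; theta_k) in the page-13 model: x_it is the vocabulary
    index (1..d) of the t-th word in document i, so the probability is
    simply theta_{k, x_it} -- a single toss of the d-sided die."""
    return math.log(theta_k[x_it - 1])

theta_k = [0.5, 0.3, 0.2]
# The t-th word is the 2nd word in the vocabulary, so p = theta_{k,2} = 0.3.
p = math.exp(word_log_prob(2, theta_k))
```

No multinomial coefficient appears, because with one toss there is only one ordering.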
(MZ)