(NR & MZ)

Like Mu, I'd like to learn something new, to learn that you learned something new, and to read something that hangs together and looks nice! Feel free to submit a preliminary version for comments.

For me, I think the best way to put it is that

There are some usual constraints, of course. For example, your essay obviously should be related to the course, and you should not plagiarize others’ work. Keep in mind that, generally, “double dipping” is not allowed, i.e., you cannot submit the same work to receive multiple academic credits.

(MZ)

- Due Monday, __April 13, 2015__, @ 12:00 noon EDT.
- Submit both a blinded version (without your name) and a "regular" version (with your name).
- Peer review assignments will be made randomly and use the blinded version.

- Due Monday, __April 20, 2015__, @ 12:00 noon EDT.

- Eliminated. Course enrollment is higher than expected.

(NR & MZ)

For students who didn't appreciate the subtlety here, my old notation could give the impression that w_{ik} was a function of \Theta. But that's not the case. In the M-step, w_{ik} takes on a particular value [computed in the E-step]; if w_{ik} were a function of \Theta, the maximization problem in the M-step would become a lot more complicated [and would largely defeat the purpose of the EM algorithm].
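The point above can be made concrete with a small sketch of one EM iteration for a K-component mixture. The function and variable names (`em_step`, `log_p`, etc.) are illustrative assumptions, not the notation from the slides; the key feature is that `w` is computed once in the E-step and is then an ordinary array of numbers, held fixed, during the M-step.

```python
import numpy as np

def em_step(X, pi, theta, log_p):
    """One EM iteration for a mixture with mixing weights pi and
    component parameters theta; log_p(x, theta_k) returns log p(x; theta_k).
    (A hypothetical sketch, not the course's reference implementation.)"""
    n, K = len(X), len(pi)

    # E-step: w_ik = P(z_i = k | x_i), evaluated under the CURRENT Theta.
    log_w = np.array([[np.log(pi[k]) + log_p(X[i], theta[k])
                       for k in range(K)] for i in range(n)])
    log_w -= log_w.max(axis=1, keepdims=True)   # shift for numerical stability
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)           # each row sums to 1

    # M-step: w is now a fixed array of numbers -- NOT a function of Theta --
    # so the maximization over Theta stays simple. For example, the updated
    # mixing weights are just column averages of w:
    pi_new = w.mean(axis=0)
    return w, pi_new
```

If `w` were instead recomputed from \Theta inside the M-step's objective, the maximization would no longer decouple, which is exactly the complication described above.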

I have also improved page 13 of my slides [again, first part of my lecture]. When we start to model each word [rather than each document] as a mixture, we would, of course, expect the distribution p(\cdot;\theta_k) to take on a slightly different meaning.

On page 12, x_i = (x_{i1}, x_{i2}, ..., x_{id})^T, where each x_{ij} counts the number of times word j [or the j-th word in the vocabulary] appears in document i. In this case, each component of the mixture, p(x_i;\theta_k), is a "usual" multinomial distribution.
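As a sketch of the page-12 setting, here is the log-likelihood of a document's count vector x_i under one multinomial component. The name `theta` stands for the d-dimensional vector of word probabilities for component k (an assumed name for illustration), and all entries of `theta` are assumed strictly positive.

```python
import math
import numpy as np

def multinomial_loglik(x, theta):
    """log p(x; theta) for a count vector x = (x_1, ..., x_d):
    the log multinomial coefficient plus sum_j x_j * log(theta_j)."""
    x = np.asarray(x, dtype=float)
    theta = np.asarray(theta, dtype=float)
    n = x.sum()  # total number of word tokens in the document
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in x)
    return log_coef + float((x * np.log(theta)).sum())
```

For example, with d = 3, counts (2, 1, 0), and theta = (0.5, 0.3, 0.2), this evaluates log[3!/(2!1!0!) * 0.5^2 * 0.3 * 0.2^0] = log(0.225).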

On page 13, I have now used the notation x_{it} [rather than x_{ij}, to avoid confusion] to denote the t-th word in document i, and it could be the first word in the vocabulary [x_{it}=1], the second word in the vocabulary [x_{it}=2], or the j-th word in the vocabulary [x_{it}=j], for j going all the way up to d. In this case, each component of the mixture, p(x_{it};\theta_k), is equal to \theta_{kj} for x_{it}=j. In other words, it is the probability that the t-th word in document i is the j-th word in the vocabulary. We can see that this distribution is still very much "multinomial" in spirit, except we are now tossing the die only once, rather than multiple times. I have revised page 13 of my slides to make this distinction more explicit.
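The page-13 setting can be sketched the same way. Each token x_{it} is a single draw, so its probability under component k is just the corresponding entry of \theta_k. For illustration the code uses 0-based word indices (0, ..., d-1) rather than the slides' 1-based indexing, and the names are assumptions.

```python
import numpy as np

def word_prob(x_it, theta_k):
    """p(x_it; theta_k): one toss of the die, so the probability that
    the t-th token of document i is vocabulary word j is theta_k[j]."""
    return theta_k[x_it]

def doc_loglik(tokens, theta_k):
    """Log-likelihood of a document's token sequence under component k:
    the product of per-token probabilities, which recovers the page-12
    multinomial likelihood up to the count (multinomial) coefficient."""
    return float(np.sum(np.log(theta_k[np.asarray(tokens)])))
```

So with theta_k = (0.5, 0.3, 0.2), a document whose tokens are (word 0, word 0, word 1) has likelihood 0.5 * 0.5 * 0.3 = 0.075 under component k, i.e., one factor of \theta_{kj} per toss.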
