Please tell us about your experiences @ https://www.surveymonkey.com/r/YDGVSWL. (NR & MZ)
I'd expect students to pick a topic from one or more lectures in this course, or from a presentation at one of the workshops in the Big Data program. It would probably be necessary to look at one or more of the source papers. Ideally, you'd describe your topic, for example 'reproducible research' (although this particular topic wouldn't quite fit, since we haven't covered it); identify what seem to be some important references on the topic; explain how the topic fits into the framework of "Big Data"; and discuss ideas of current interest.
Like Mu, I'd like to learn something new, to learn that you learned something new, and to read something that hangs together and looks nice! Feel free to submit a preliminary version for comments. (NR)

Quite a few students have asked me what our expectations are for the 5-page critical essay that determines more than 50% of the grade in this course. Nancy and I have agreed to each write a blog post about it. For me, the best way to put it is that I expect to learn something useful from each essay. Why else would I spend hours reading through them? In some ways, that is really the only expectation I have. Of course, nobody can learn much from an essay that is more or less incomprehensible, whether due to poor English writing, poor mathematical notation, or weak logical argument. There are some usual constraints as well. For example, your essay should obviously be related to the course, and you should not plagiarize others' work. Keep in mind that, generally, "double dipping" is not allowed, i.e., you cannot submit the same work to receive multiple academic credits. (MZ)
The schedule that was published on January 9 has been updated for our second half. It includes the remaining course requirements:
- Short critical essay (5 pages max)
- Peer review of critical essay (1 page max)
- Oral presentation
(NR & MZ)
My notation for the EM algorithm was a bit sloppy, as Nancy pointed out during the lecture. On pages 6, 7, 8, and 10 of my slides [first part of my lecture, before Professor Eric Kolaczyk's talk], I have now written E(z_{ik}|X, \hat{\Theta}) instead of E(z_{ik}|X, \Theta), where \hat{\Theta} denotes the *current* parameter estimate, which gets updated at every EM iteration [a fact made explicit on page 6]. For students who didn't appreciate the subtlety here, my old notation could give the impression that w_{ik} was a function of \Theta. That's not the case. In the M-step, w_{ik} takes on a particular value [computed by the E-step]; if w_{ik} were a function of \Theta, the maximization problem in the M-step would become a lot more complicated [and almost defeat the very purpose of the EM].

I have also improved page 13 of my slides [again, first part of my lecture]. When we start to model each word [rather than each document] as a mixture, we would, of course, expect the distribution p(\cdot;\theta_k) to take on a slightly different meaning. On page 12, x_i = (x_{i1}, x_{i2}, ..., x_{id})^T, where each x_{ij} counts the number of times word j [i.e., the j-th word in the vocabulary] appears in document i. In this case, each component of the mixture, p(x_i;\theta_k), is a "usual" multinomial distribution. On page 13, I now use the notation x_{it} [rather than x_{ij}, to avoid confusion] to denote the t-th word in document i; it could be the first word in the vocabulary [x_{it}=1], the second word in the vocabulary [x_{it}=2], or, in general, the j-th word in the vocabulary [x_{it}=j], for j going all the way up to d. In this case, each component of the mixture, p(x_{it};\theta_k), is equal to \theta_{kj} when x_{it}=j; in other words, it is the probability that the t-th word in document i is the j-th word in the vocabulary. We can see that this distribution is still very much "multinomial" in spirit, except we are now tossing the die only once rather than multiple times. I have revised page 13 to make this distinction more explicit. (MZ)
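For readers who want to see the mechanics in code, below is a minimal sketch [not taken from the slides] of EM for the page-12 formulation, a mixture of multinomials over word-count vectors. The names pi, theta, and w are meant to mirror \pi_k, \theta_k, and w_{ik}; the initialization and smoothing details are my own illustrative choices, not the course's. Note how w is computed in the E-step from the current estimates and is then held fixed in the M-step, so both M-step updates are in closed form.

import numpy as np

def em_multinomial_mixture(X, K, n_iter=100, seed=0):
    # X: (n, d) document-term count matrix; K: number of mixture components.
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    pi = np.full(K, 1.0 / K)                    # mixing weights pi_k
    theta = rng.dirichlet(np.ones(d), size=K)   # word probabilities theta_k, shape (K, d)

    for _ in range(n_iter):
        # E-step: w_{ik} = E(z_{ik} | X, current Theta-hat), computed from the
        # *current* estimates and then treated as a fixed number in the M-step.
        log_w = np.log(pi) + X @ np.log(theta).T   # (n, K), up to an additive constant
        log_w -= log_w.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        w = np.exp(log_w)
        w /= w.sum(axis=1, keepdims=True)

        # M-step: maximize the expected complete-data log-likelihood with w held fixed.
        pi = w.mean(axis=0)
        theta = w.T @ X + 1e-10                    # tiny constant keeps theta strictly positive
        theta /= theta.sum(axis=1, keepdims=True)

    return pi, theta, w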
We have added a new page called "Materials" (see menu bar above), where links to lectures, slides, and various other materials (if any) will be posted as the course progresses. (NR & MZ)
http://www.fields.utoronto.ca/video-archive/static/2015/01/324-3560/mergedvideo.ogv (NR & MZ)
Please let me know if this link to the slides is broken. Note that FieldsLive also archives all the lectures, with the slides linked to the audio. (NR)
If you are registered for this course in the Department of Statistical Sciences, then the course number is STA 4412S, and the title on the department web page is "Topics in Inference for Big Data". But apparently on ROSI it comes up as "Likelihood and Asymptote" (which probably should have been "Likelihood Asymptotics"). This is probably because we created the course after the calendar deadline, but I'll check. If it's unchangeable, I'll be *delighted* to give you a lecture on likelihood asymptotics ;) (NR)
Nancy will present for the first hour, and Mu will continue after a short break. Nancy will give an overview of the topics to be covered during the thematic program, along with some background on how and why these topics were chosen; describe other topics that are important but not covered; and give a summary of various musings on "Big Data" that she has been collecting since starting on this adventure. Mu will cover some basics of supervised learning, in particular regression (e.g., linear regression, nearest neighbors, kernel regression), classification (e.g., logistic regression, linear discriminant analysis, naive Bayes, neural networks), and fundamental principles such as the bias-variance trade-off and the curse of dimensionality. (NR & MZ)
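As a small warm-up for the bias-variance trade-off, here is a toy simulation [not course material] using k-nearest-neighbour regression on a sine curve; scikit-learn's KNeighborsRegressor, the noise level, and the test point are all illustrative choices of mine. Small k tends to give low bias but high variance at a fixed test point, and large k the reverse.

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
x0 = np.array([[0.5]])              # a fixed test point

def true_f(x):
    return np.sin(4 * x)            # true regression function

for k in (1, 5, 50):
    preds = []
    for _ in range(200):            # repeat over many simulated training sets
        x = rng.uniform(0, 1, size=(100, 1))
        y = true_f(x).ravel() + rng.normal(0, 0.3, size=100)
        preds.append(KNeighborsRegressor(n_neighbors=k).fit(x, y).predict(x0)[0])
    preds = np.array(preds)
    bias_sq = (preds.mean() - true_f(x0)[0, 0]) ** 2
    variance = preds.var()
    print(f"k={k:2d}  squared bias={bias_sq:.4f}  variance={variance:.4f}")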