Friday, November 14, 2014

LDA and Topic Models Reading List

A big thank you to everyone who came to see me talk about topic models at DC-NLP on Wednesday. I am grateful for the feedback that I received. I'd also like to give a big shout out to my co-author, Brian St. Thomas. Not only has his hard work made our research shine, but he is also the one who came up with the "ball and urns" graphic to explain topic models. Many people came up to me afterward saying how intuitive that was; I wish I could take the credit, but it was all Brian.

While I wait on approval from work to release my slides, I thought I'd put together an LDA-related reading list of many of my sources. I've done a bit of that before here. Some of those papers appear below, along with others.

LDA Basics

  1. Rethinking LDA: Why Priors Matter (This is a good paper, though I am skeptical of the conclusion.)
  2. Comparison of topic models, their estimation algorithms, and priors. (Very underrated, MUST READ.)
  3. Incorporating Zipf's law in language models
  4. A note on estimating LDA with asymmetric priors

Evaluating LDA/Issues With LDA

  1. LDA is an inconsistent estimator
  2. Reading Tea Leaves: How humans interpret topic models (Also, MUST READ.)
  3. A coherence (cohesion?) metric for topic models. (Note: This metric has the issue of "liking" topics full of statistically independent words. It is still useful, though.)

Other Topic Models

  1. Spherical topic models. (My co-author assures me that these are consistent estimators; we've not yet implemented them, though. Know anyone who has?) (Update 2:48: I was wrong, this model is *not* consistent, but it could be made so. See Brian's note, below.)
  2. Dynamic topic models
  3. Ensembles of topic models (not our stuff, but from Jordan Boyd-Graber who is super smart and a friend of DC-NLP)

Other Stuff

  1. KERA keyword extraction used to label topics in one of my examples. (The paper applying it to LDA is forthcoming, however.)
  2. Rethinking Language: How probabilities shape the words we use (MUST READ, though not about topic modeling specifically.)
  3. David Blei's topic modeling website

From Brian on spherical topic models: "A small note on spherical topic models - the basic spherical topic model that is out there (SAM) is *not* a consistent estimator, but we have a framework to make a consistent estimator from my work on estimating mixtures of linear subspaces by tweaking the prior."
