Biased Estimates: LDA and Topic Models Reading List

Friday, November 14, 2014

Good crowd tonight to hear @thos_jones talk about topic modeling #NLProc #datadc cc: @DataCommunityDC @YourGirlK pic.twitter.com/dthE7z1lRB
— DC NLP Meetup (@DCNLP) November 13, 2014

A big thank you to everyone who came to see me talk about topic models at DC-NLP on Wednesday. I am grateful for the feedback that I received. I'd also like to give a big shout-out to my co-author, Brian St. Thomas. Not only has his hard work made our research shine, but he is also the one who came up with the "ball and urns" graphic to explain topic models. Many people came up to me afterward saying how intuitive that was; I wish I could take the credit, but it was all Brian.

While I wait for approval from work to release my slides, I thought I'd put together an LDA-related reading list of many of my sources. I've done a bit of that before here. Some of those papers appear below, along with others.

LDA Basics

On Priors and Zipf's Law

Rethinking LDA: Why Priors Matter (This is a good paper, though I am skeptical of the conclusion.)
Comparison of topic models, their estimation algorithms, and priors. (Very underrated, MUST READ.)
Incorporating Zipf's law in language models
A note on estimating LDA with asymmetric priors

Evaluating LDA/Issues With LDA

LDA is an inconsistent estimator
Reading Tea Leaves: How humans interpret topic models (Also, MUST READ.)
A coherence (cohesion?) metric for topic models. (Note: This metric has the issue of "liking" topics full of statistically independent words. It is still useful, though.)
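
To make the note on the coherence metric concrete, here is a minimal sketch. The post does not spell out the metric's formula, so this assumes a document co-occurrence score of the form sum over word pairs of log((D(w_i, w_j) + 1) / D(w_j)), where D counts documents containing the given words. That form, the function name `cooccurrence_coherence`, and the toy corpus are all assumptions for illustration, not necessarily what the linked paper defines.

```python
import math
import random


def cooccurrence_coherence(top_words, docs):
    """Assumed score: sum over pairs of log((D(w_i, w_j) + 1) / D(w_j))."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            joint = doc_freq(top_words[i], top_words[j])
            score += math.log((joint + 1) / doc_freq(top_words[j]))
    return score


random.seed(0)
common = ["the", "of", "and"]        # frequent, independent of one another
related = ["gene", "dna", "protein"]  # rare, but strongly associated

docs = []
for _ in range(2000):
    # Each common word lands in a document independently with probability 0.9.
    doc = [w for w in common if random.random() < 0.9]
    # Related words only occur in the roughly 10% of documents "about" the topic,
    # so seeing one of them makes the others far more likely than their base rate.
    if random.random() < 0.1:
        doc += [w for w in related if random.random() < 0.6]
    docs.append(doc)

independent_score = cooccurrence_coherence(common, docs)
related_score = cooccurrence_coherence(related, docs)
print("independent common words:", round(independent_score, 3))
print("related topical words:   ", round(related_score, 3))
```

In this toy corpus the "topic" of independent but frequent words scores higher (closer to zero) than the topic of genuinely associated words, because each pair's term is driven by how often the words appear at all rather than by how much they depend on each other.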

Other Stuff

KERA keyword extraction used to label topics in one of my examples. (The paper applying it to LDA is forthcoming, however.)
Rethinking Language: How probabilities shape the words we use (MUST READ, though not about topic modeling specifically.)
David Blei's topic modeling website

From Brian on spherical topic models: "A small note on spherical topic models - the basic spherical topic model that is out there (SAM) is *not* a consistent estimator, but we have a framework to make a consistent estimator from my work on estimating mixtures of linear subspaces by tweaking the prior."
