Topic Modeling & LDA Basics
- The clearest statement of LDA I've seen is on Wikipedia.
- Here is David Blei et al.'s original paper.
- This paper introduces Gibbs sampling for LDA.
- pLSA (pLSI) is the frequentist version of LDA. They are equivalent under certain conditions: pLSA corresponds to LDA estimated by maximum a posteriori under a uniform Dirichlet prior.
The Topic Modeling Software I Use
On Priors and Zipf's Law
- Rethinking LDA: Why Priors Matter (This is a good paper, though I am skeptical of the conclusion.)
- Comparison of topic models, their estimation algorithms, and priors. (Very underrated, MUST READ.)
- Incorporating Zipf's law in language models
- A note on estimating LDA with asymmetric priors
Evaluating LDA/Issues With LDA
- LDA is an inconsistent estimator
- Reading Tea Leaves: How humans interpret topic models (Also, MUST READ.)
- A coherence (cohesion?) metric for topic models. (Note: This metric has the issue of "liking" topics full of statistically independent words. It is still useful, though.)
- My working paper on an R-squared for topic models
Other Topic Models
- KERA keyword extraction, which I used to label topics in one of my examples. (The paper applying it to LDA is still forthcoming.)
- Rethinking Language: How probabilities shape the words we use (MUST READ, though not about topic modeling specifically.)
- David Blei's topic modeling website