Topic Modeling & LDA Basics
- The clearest statement of LDA I've seen is on Wikipedia.
- Here is David Blei et al.'s original paper.
- This paper introduces Gibbs sampling for LDA.
- pLSA (pLSI) is the frequentist version of LDA. They are equivalent under certain conditions: pLSA corresponds to LDA estimated by maximum a posteriori under a uniform Dirichlet prior.
The Topic Modeling Software I Use
- My own textmineR package
- R's lda package by Jonathan Chang
On Priors and Zipf's Law
- Rethinking LDA: Why Priors Matter (This is a good paper, though I am skeptical of the conclusion.)
- Comparison of topic models, their estimation algorithms, and priors. (Very underrated, MUST READ.)
- Incorporating Zipf's law in language models
- A note on estimating LDA with asymmetric priors
Evaluating LDA/Issues With LDA
- LDA is an inconsistent estimator
- Reading Tea Leaves: How humans interpret topic models (Also, MUST READ.)
- A coherence (cohesion?) metric for topic models. (Note: This metric has the issue of "liking" topics full of statistically independent words. It is still useful, though.)
- My working paper on an R-squared for topic models
Other Topic Models
- Spherical topic models
- Dynamic topic models
- Ensembles of topic models (not our stuff, but from Jordan Boyd-Graber, who is super smart and a friend of DC-NLP)
Other Stuff
- KERA keyword extraction, which I used to label topics in one of my examples. (The paper applying it to LDA is still forthcoming.)
- Rethinking Language: How probabilities shape the words we use (MUST READ, though not about topic modeling specifically.)
- David Blei's topic modeling website