The vignettes cover the philosophy of textmineR, basic corpus statistics, document clustering, topic modeling, text embeddings (which is basically topic modeling of a term co-occurrence matrix), and building a basic document summarizer. That last vignette uses text embeddings plus a variation of the TextRank algorithm.
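To make the embeddings idea concrete, here is a rough sketch of fitting a topic model to a term co-occurrence matrix. It assumes textmineR 3.0 or later and the `nih_sample` data that ships with the package. The window size, number of topics, and iteration counts are illustrative placeholders rather than recommendations.

```r
library(textmineR)

# Sample of NIH grant abstracts bundled with textmineR
data(nih_sample)

# Term co-occurrence matrix: counts of terms appearing within
# a skip-gram window of each other (window size is an assumption)
tcm <- CreateTcm(doc_vec = nih_sample$ABSTRACT_TEXT,
                 skipgram_window = 10,
                 verbose = FALSE)

# Fit LDA to the TCM instead of a document-term matrix.
# Each row of the TCM is treated as a "document", so theta holds
# the term embeddings and phi holds the embedding dimensions.
embeddings <- FitLdaModel(dtm = tcm,
                          k = 20,
                          iterations = 100,
                          burnin = 80,
                          alpha = 0.1,
                          beta = 0.05,
                          optimize_alpha = TRUE,
                          calc_likelihood = FALSE,
                          calc_coherence = FALSE,
                          calc_r2 = FALSE)

# Probabilistic coherence of each embedding dimension,
# computed over its top M terms
coherence <- CalcProbCoherence(phi = embeddings$phi, dtm = tcm, M = 5)
```

Here `embeddings$theta` has one row per term and one column per embedding dimension, and `coherence` has one value per dimension.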
The other updates are relatively minor. First, @manuelbickle discovered that my implementation of `CalcProbCoherence` was scaled differently from what I'd intended. That's fixed, though it shouldn't affect the qualitative use of probabilistic coherence. Second, I realized that my documentation for `CreateTcm` was misleading. That's now fixed as well.