Biased Estimates: November 2018

Friday, November 16, 2018

textmineR v3.0 is here

textmineR version 3 (and up!) is here. This release is a major overhaul. The two most substantive changes are a native implementation of LDA and a more object-oriented take on topic models. The former allows more flexibility in setting priors and a better Bayesian treatment of model fitting (e.g. averaging over the chain after a predetermined burn-in period). The latter enables a predict method for models, giving textmineR's topic models a syntax similar to more traditional models in R. A longer list of changes is below.

Several functions that were slated for deletion in version 2.1.3 are now gone:
- RecursiveRbind
- Vec2Dtm
- JSD
- HellDist
- GetPhiPrime
- FormatRawLdaOutput
- Files2Vec
- DepluralizeDtm
- CorrectS
- CalcPhiPrime
FitLdaModel has changed significantly:
- Gibbs sampling is now the only supported training method. The sampler no longer wraps lda::lda_collapsed_gibbs_sampler; it is native to textmineR. It's a little slower, but has additional features.
- Asymmetric priors are supported for both alpha and beta.
- There is an option, optimize_alpha, which updates alpha every 10 iterations based on the value of theta at the current iteration.
- The log likelihood of the data given estimates of phi and theta is optionally calculated every 10 iterations.
- Probabilistic coherence is optionally calculated at the time of model fit.
- R-squared is optionally calculated at the time of model fit.
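To make the options above concrete, here is a minimal sketch of a FitLdaModel call that exercises them. It assumes the nih_sample_dtm example data that ships with textmineR; the number of topics, the iteration counts, and the prior values are arbitrary choices for illustration, not recommendations.

```r
library(textmineR)

# Small example document-term matrix bundled with textmineR
data(nih_sample_dtm)

set.seed(12345)

k <- 10

model <- FitLdaModel(
  dtm = nih_sample_dtm,
  k = k,
  iterations = 200,
  burnin = 175,                              # phi and theta are averaged over iterations after burn-in
  alpha = seq(0.05, 0.5, length.out = k),    # asymmetric prior: one value per topic (illustrative values)
  beta = 0.05,                               # a single number gives a symmetric prior
  optimize_alpha = TRUE,                     # update alpha during sampling
  calc_likelihood = TRUE,                    # track the log likelihood during sampling
  calc_coherence = TRUE,                     # probabilistic coherence per topic
  calc_r2 = TRUE                             # R-squared of the fitted model
)

# The optional diagnostics are stored on the fitted model object
head(model$log_likelihood)
summary(model$coherence)
model$r2
```

Setting burnin below iterations is what triggers the averaging over the chain; the three calc_* flags can be switched off to speed up fitting when the diagnostics are not needed.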
Supported topic models (LDA, LSA, CTM) are now object-oriented, each with its own S3 class. These classes have their own predict methods, meaning you do not have to do your own math to make predictions for new documents.
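As a rough sketch of what this looks like for LDA, the example below fits a model on part of the bundled nih_sample_dtm data and predicts topic distributions for the held-out rows. The 20/80 split, k, and the iteration counts are assumptions made for illustration only.

```r
library(textmineR)

data(nih_sample_dtm)

set.seed(12345)

# Fit on the first 20 documents, hold out the rest
model <- FitLdaModel(
  dtm = nih_sample_dtm[1:20, ],
  k = 5,
  iterations = 200,
  burnin = 175
)

# The fitted object carries an S3 class, so predict() dispatches on it
class(model)

# Predict topic distributions for held-out documents by Gibbs sampling
pred_gibbs <- predict(
  model,
  nih_sample_dtm[21:100, ],
  method = "gibbs",
  iterations = 200,
  burnin = 175
)

# Faster alternative: the dot product method
pred_dot <- predict(model, nih_sample_dtm[21:100, ], method = "dot")

# Each result has one row per new document and one column per topic
dim(pred_gibbs)
dim(pred_dot)
```

The Gibbs method is slower but consistent with how the model was fit; the dot product method is a quick approximation that is handy for large batches of new documents.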
A new function SummarizeTopics has been added.
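A minimal sketch of calling it on a fitted LDA model follows, again assuming the bundled nih_sample_dtm data and arbitrary settings for k and the iteration counts.

```r
library(textmineR)

data(nih_sample_dtm)

set.seed(12345)

model <- FitLdaModel(
  dtm = nih_sample_dtm,
  k = 5,
  iterations = 200,
  burnin = 175
)

# One row per topic, summarizing the fitted model in a single table
topic_summary <- SummarizeTopics(model)

print(topic_summary)
```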
tm is no longer a dependency for stopwords. We now use the stopwords package. A knock-on effect is that there is no longer any Java dependency.
Several packages have been moved from "Imports" to "Suggests". The result is a faster install and lower likelihood of install failure based on packages with system dependencies. (Looking at you, topicmodels!)
Finally, I have changed the textmineR license to the MIT license. Note, however, that some dependencies may have more restrictive licenses. So if you're looking to use textmineR in a commercial project, you may want to dig deeper into what is/isn't permissible.