textmineR version 3 (and up!) is here. This represents a major overhaul. The two most substantive changes are a native implementation of LDA and a more object-oriented take on topic models. The former allows for more flexibility in setting priors and a better Bayesian treatment of model fitting (e.g. averaging over the chain after a pre-determined burn in period). The latter enables a predict method for models, making textmineR's topic models have syntax similar to more traditional models in R. A longer list of changes is below.
- Several functions that were slated for deletion in version 2.1.3 are now gone.
- FitLdaModel has changed significantly.
- Now only Gibbs sampling is a supported training method. The Gibbs sampler is no longer wrapping lda::lda_collapsed_gibbs_sampler. It is now native to textmineR. It's a little slower, but has additional features.
- Asymmetric priors are supported for both alpha and beta.
- There is an option, optimize_alpha, which updates alpha every 10 iterations based on the value of theta at the current iteration.
- The log likelihood of the data given estimates of phi and theta is optionally calculated every 10 iterations.
- Probabilistic coherence is optionally calculated at the time of model fit.
- R-squared is optionally calculated at the time of model fit.
- Supported topic models (LDA, LSA, CTM) are now object-oriented, creating their own S3 classes. These classes have their own predict methods, meaning you do not have to do your own math to make predictions for new documents.
- A new function SummarizeTopics has been added.
- tm is no longer a dependency for stopwords. We now use the stopwords package. The extended result of this is that there is no longer any Java dependency.
- Several packages have been moved from "Imports" to "Suggests". The result is a faster install and lower likelihood of install failure based on packages with system dependencies. (Looking at you, topicmodels!)
- Finally, I have changed the textmineR license to the MIT license. Note, however, that some dependencies may have more restrictive licenses. So if you're looking to use textmineR in a commercial project, you may want to dig deeper into what is/isn't permissable.