This module provides utilities for text processing.
Note that standard utilities in the text_analytics package can be used for
transforming text data into “bag of words” format, where a document is
represented as a dictionary mapping each unique word to the number of times
that word occurs in the document. See
for more details. Also, see
unstack() for ways of creating SArrays
containing dictionary types.
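
As a rough illustration of the bag-of-words format described above, the sketch below builds such dictionaries with the Python standard library only. It is not this package's implementation: the helper name bag_of_words is hypothetical, and lowercasing plus whitespace splitting is an assumed, deliberately simple tokenization.

```python
from collections import Counter


def bag_of_words(document):
    """Map each unique word in `document` to the number of times it occurs.

    Assumption: words are lowercased and separated by whitespace; a real
    tokenizer would usually also handle punctuation.
    """
    return dict(Counter(document.lower().split()))


# Two toy documents, each converted to a word -> count dictionary.
docs = ["the quick brown fox", "the lazy dog and the fox"]
bags = [bag_of_words(d) for d in docs]
```

Each element of bags is a plain dictionary, which is the kind of value a dictionary-typed column would hold, one entry per document.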
We provide methods for learning topic models, which can be useful for modeling
large document collections. See
create() for more, as well as the
How-Tos, data science Gallery, and text analysis chapter of
the User Guide.