Gensim is a Python library for topic modelling, document indexing and
similarity retrieval with large corpora. The target audience is the natural
language processing (NLP) and information retrieval (IR) community.

Features:
* All algorithms are memory-independent w.r.t. the corpus size (can process
  input larger than RAM, streamed, out-of-core).
* Intuitive interfaces:
  * easy to plug in your own input corpus/datastream (trivial streaming API)
  * easy to extend with other Vector Space algorithms (trivial transformation
    API)
* Efficient multicore implementations of popular algorithms, such as online
  Latent Semantic Analysis (LSA/LSI/SVD), Latent Dirichlet Allocation (LDA),
  Random Projections (RP), Hierarchical Dirichlet Process (HDP) and word2vec
  deep learning.
* Distributed computing: can run Latent Semantic Analysis and Latent Dirichlet
  Allocation on a cluster of computers.
* Extensive documentation and Jupyter Notebook tutorials.

WWW: https://radimrehurek.com/gensim/
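
Example (illustrative sketch only): the streaming interface mentioned in the
feature list rests on the idea that a corpus is any iterable that yields one
sparse bag-of-words vector per document, so the full collection never has to
sit in memory. The snippet below imitates that idea in plain Python without
importing Gensim. The class name StreamedCorpus, the token2id mapping and the
sample documents are assumptions made for illustration and are not part of the
library.

```python
from collections import Counter


class StreamedCorpus:
    """Hypothetical corpus that yields one bag-of-words vector at a time."""

    def __init__(self, lines, token2id):
        self.lines = lines        # any re-iterable source of text lines
        self.token2id = token2id  # maps token -> integer id

    def __iter__(self):
        for line in self.lines:
            # Count only tokens that are present in the vocabulary.
            counts = Counter(
                self.token2id[tok]
                for tok in line.lower().split()
                if tok in self.token2id
            )
            # Sparse vector: list of (token_id, count) pairs sorted by id.
            yield sorted(counts.items())


# Sample data, assumed purely for illustration.
documents = ["Human machine interface", "Graph of trees", "Graph minors trees"]
vocab = sorted({tok for doc in documents for tok in doc.lower().split()})
token2id = {tok: i for i, tok in enumerate(vocab)}

corpus = StreamedCorpus(documents, token2id)
vectors = list(corpus)  # materialised here only to inspect the result
```

In practice the lines would typically be read from a file opened inside
__iter__, so that each pass over the corpus streams the data from disk again.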