NLP-Related Python Packages · python

Jieba Chinese word segmentation [jieba](https://github.com/fxsjy/jieba)

*****

Feature-rich Chinese NLP toolkit [pyhanlp](https://github.com/hankcs/pyhanlp)

*****

Feature-rich Chinese NLP toolkit [pyltp](https://github.com/HIT-SCIR/pyltp)

*****

Models such as word2vec and LDA [gensim](https://radimrehurek.com/gensim/)

*****

English NLP toolkit [textblob](https://textblob.readthedocs.io/en/dev/)

*****

English NLP toolkit [nltk](https://www.nltk.org/)

*****

English NLP toolkit [spacy](https://spacy.io/)

*****

Hashing algorithms [datasketch](https://github.com/ekzhu/datasketch)

*****

Pinyin conversion [pinyin](https://github.com/mozillazg/python-pinyin)

*****

Web crawling [scrapy](https://scrapy.org)

*****

Multilingual NLP with an English focus, including language detection and other features [polyglot](https://polyglot.readthedocs.io/en/latest/index.html)

*****

Industrial-strength topic modeling [Familia](https://github.com/baidu/Familia)

*****

Keyword search based on the FlashText algorithm [flashtext](https://github.com/vi3k6i5/flashtext)

*****

Text similarity scoring based on Levenshtein distance [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy/). Install python-Levenshtein to improve speed: `pip install python-Levenshtein`

*****

Stanford CoreNLP [stanford-corenlp](https://stanfordnlp.github.io/CoreNLP/)

*****

Chinese word segmentation based on Bi-LSTM [FoolNLTK](https://github.com/rockyzhengwu/FoolNLTK)

*****

Word segmentation package from Peking University [pkuseg](https://github.com/lancopku/pkuseg-python)

*****

Synonyms [Synonyms](https://github.com/huyingxi/Synonyms)

*****

fastText [pyfasttext](https://github.com/vrasneur/pyfasttext) [official](https://github.com/facebookresearch/fastText)

*****

English synonyms, antonyms, and more [vocabulary](https://github.com/tasdikrahman/vocabulary)

*****

Reading and writing special file formats [pynlpl](https://github.com/proycon/pynlpl)
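
*****

The fuzzywuzzy entry above mentions Levenshtein distance as the basis for text matching. As a rough illustration of that idea only, here is a minimal pure-Python sketch. It is not fuzzywuzzy's API or its scoring formula; the function names `levenshtein` and `similarity` and the 0 to 100 scaling are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical 0-100 score: edit distance normalized by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(round(similarity("kitten", "sitting"), 2))
```

A pure-Python loop like this is slow on long strings, which is the reason a compiled helper such as python-Levenshtein is worth installing for real workloads.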