1. NLTK (Natural Language Toolkit): NLTK is one of the most widely used open-source Python libraries for NLP. It provides modules for common text-processing tasks such as tokenization, part-of-speech tagging, stemming, and sentiment analysis. Example:

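import nltk
# word_tokenize needs the Punkt tokenizer data; download it once
# ("punkt_tab" on recent NLTK releases, "punkt" on older ones).
nltk.download("punkt_tab")
sentence = "The brown fox jumps over the lazy dog."
tokens = nltk.word_tokenize(sentence)
print(tokens)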
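
Stemming, listed above among NLTK's features, can be sketched in a similar way. This is a minimal illustration, assuming the Porter stemmer that ships with NLTK; the word list is made up for the example, and no extra data download should be needed for this stemmer.

```python
from nltk.stem import PorterStemmer

# Porter stemmer: strips common English suffixes by rule
stemmer = PorterStemmer()

# Illustrative word list (not from the original post)
words = ["jumps", "jumping", "jumped"]

# Reduce each inflected form to its stem
stems = [stemmer.stem(word) for word in words]
print(stems)
```

All three forms should reduce to the same stem, which is why stemming is often used to group word variants before counting or indexing them.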

2. spaCy: spaCy is an open-source library for advanced NLP. Its pretrained pipelines provide fast syntactic parsing, part-of-speech tagging, and named entity recognition. Example:

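import spacy
# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The brown fox jumps over the lazy dog.")
# Print each token with its coarse part-of-speech tag
for token in doc:
    print(token.text, token.pos_)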

3. Gensim: Gensim is an open-source library for unsupervised topic modeling and natural language processing. It provides tools for training and querying vector space models such as Word2Vec and Doc2Vec. Example:

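from gensim.models import Word2Vec
# Toy corpus: each sentence is a list of tokens
sentences = [["the", "brown", "fox"], ["jumps", "over", "the", "lazy", "dog"]]
# min_count=1 keeps words that appear only once
model = Word2Vec(sentences, min_count=1)
# Cosine similarity between the two word vectors (not meaningful on a corpus this small)
print(model.wv.similarity("fox", "dog"))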
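
The similarity score in the Gensim example is a cosine similarity between two word vectors. The sketch below shows that computation in plain Python so the number is easier to interpret. The function name `cosine_similarity` and the three-dimensional vectors are hypothetical, made up for illustration; real Word2Vec vectors are learned from data and typically have around a hundred dimensions or more.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical "word vectors", invented for this example only
fox = [1.0, 2.0, 0.0]
dog = [2.0, 4.0, 0.0]  # points in the same direction as fox
car = [0.0, 0.0, 3.0]  # orthogonal to fox

sim_fox_dog = cosine_similarity(fox, dog)
sim_fox_car = cosine_similarity(fox, car)
print(sim_fox_dog, sim_fox_car)
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, regardless of their lengths. A score near 1 from a trained model therefore suggests that two words occur in similar contexts.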
