Knowledge Base
Natural Language Processing
NLP is a branch of computer science that allows computers to understand Human Language.Using NLP we are able to derive meaningful insights and use them in practical applications such as
ChatBots,Spam filtering,spell check ,making google
search better and the list goes on ….
NLP has few steps such as
Tokenization
Stemmation
.Lemmetization
.POS Tags
(Parts of Speech).Named Entity Recognition
.Chunking
.
Tokenization
Tokenization is the first step of the NLP process. It’s process of splitting text into minimal meaningful units so our machine can understand. Furthermore Read
Stemmation
In simplest terms Stemmation is a process of getting a root word. For instance if there are words such as Plays
,Played
,Playing
for this example the root word is Play
.
Stemming is usually done by stripping the prefixes and suffixes from the words.
Further Read
Lemmetization
Lemmetization is more sophisticated technique compared stemmation
. As we know stemmation just gets the root word for instance words like car
wont be matched with automobile
when doing stemmation
. But in case of lemmmetization
car
will be matched with automobile
. In lemmatization, the part of speech of a word should be first determined and the normalisation rules will be different for different part of speech.
Note: Stemmation
only strips down words to root words while stripping prefixes and suffixes. While lemmatization will put together words by the use of correct vocabulary.
For instance , car
will be matched with automobile
. or truck
will be matched with lorry
.
Further Read
POS
Tagging words with correct parts of speech.
Named Entities.
This process is related to defining the named entities in the text. For instance Mark Zuckerburg is the CEO of Facebook. In this examples Mark Zuckerburg
and Facebook
is a named entity.
Chunking
Chunking is a process of extracting phrases from unstructured text. Instead of just simple tokens which may not represent the actual meaning of the text, its advisable to use phrases such as “South Africa” as a single word instead of ‘South’ and ‘Africa’ separate words.
** Further Read **
Home