Introduction to Natural Language Processing in Python

This tutorial introduces the basics of natural language processing (NLP) in Python. If you are facing a pile of textual data for the first time, this is a good place to start making sense of it. The tutorial is based on Python 3.6.5 and NLTK 3.3.

Before performing common NLP tasks such as word frequency analysis, word clouds, named entity recognition (NER) and TF-IDF, the data should be cleaned: tokenize the text into words, convert the words to their canonical form, and remove noise.
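
As a rough illustration of the cleaning step, the following sketch uses only the Python standard library rather than NLTK. It assumes tweet-like input; the regular expressions and the short STOP_WORDS set are hypothetical choices for this example, not the tutorial's. Converting words to their canonical form (for example, lemmatization with NLTK's WordNetLemmatizer) is left out because it needs NLTK's WordNet data, so lowercasing stands in as a much cruder normalization.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# NLTK provides a much fuller English list in its stopwords corpus.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "to",
              "of", "in", "on", "for", "it", "this"}

URL_RE = re.compile(r"https?://\S+")             # links such as https://t.co/...
MENTION_RE = re.compile(r"@\w+")                 # @user handles
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")  # words, keeping contractions like don't


def clean_tokens(text):
    """Tokenize text and strip common noise (URLs, mentions, punctuation, stopwords)."""
    text = URL_RE.sub(" ", text)       # drop links
    text = MENTION_RE.sub(" ", text)   # drop @mentions
    text = text.lower()                # simple normalization in place of lemmatization
    tokens = TOKEN_RE.findall(text)    # punctuation and '#' are never matched, so they vanish
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `clean_tokens("Loving the new #Python release! https://t.co/abc @dev_team")` returns `['loving', 'new', 'python', 'release']`. A regular-expression tokenizer like this is simpler than NLTK's tokenizers and will handle edge cases differently.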

Counting word frequencies is the most basic form of analysis on textual data. A single tweet is too small to reveal the distribution of words, so the frequency analysis is done on all 20,000 tweets.
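
Here is a minimal sketch of counting frequencies across the whole corpus rather than per tweet, using `collections.Counter` from the standard library. NLTK's `FreqDist` is a subclass of `Counter` and offers the same `most_common()` method. The three tokenized tweets below are made-up stand-ins for the cleaned dataset.

```python
from collections import Counter


def word_frequencies(tokenized_docs):
    """Count every token across all documents (e.g. all tweets), not per document."""
    counts = Counter()
    for tokens in tokenized_docs:
        counts.update(tokens)  # add this document's tokens to the running totals
    return counts


# Hypothetical pre-cleaned tweets standing in for the real dataset.
tweets = [
    ["python", "nlp", "fun"],
    ["nlp", "tokenization", "python"],
    ["python", "wordcloud"],
]
freq = word_frequencies(tweets)
print(freq.most_common(2))  # [('python', 3), ('nlp', 2)]
```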

After finding the most frequent words, you can visualize the overall distribution of words by creating a word cloud with the wordcloud package.

Named Entity Recognition (NER) is the process of detecting named entities, such as persons, locations and organizations, in your text.
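
NLTK's own NER (`nltk.ne_chunk` applied to part-of-speech-tagged tokens) relies on a trained classifier and downloaded models. The sketch below swaps in a much simpler technique, a dictionary (gazetteer) lookup, purely to show what NER output looks like: spans of tokens paired with entity labels. The `GAZETTEER` entries and the `tag_entities` helper are hypothetical. Unlike a statistical tagger, this approach cannot recognize any entity that is missing from its table.

```python
# Hypothetical gazetteer: a hand-written table mapping token spans to labels.
GAZETTEER = {
    ("new", "york"): "LOCATION",
    ("alibaba",): "ORGANIZATION",
    ("jack", "ma"): "PERSON",
}
MAX_LEN = max(len(key) for key in GAZETTEER)  # longest span worth checking


def tag_entities(tokens):
    """Greedy longest-match lookup of token spans in the gazetteer."""
    entities = []
    lowered = [t.lower() for t in tokens]  # match case-insensitively
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(lowered[i:i + n]))
            if label:
                entities.append((" ".join(tokens[i:i + n]), label))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no entity starts here; move on one token
    return entities
```

For example, `tag_entities("Jack Ma founded Alibaba before visiting New York".split())` returns `[('Jack Ma', 'PERSON'), ('Alibaba', 'ORGANIZATION'), ('New York', 'LOCATION')]`.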

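TF-IDF (term frequency - inverse document frequency) is a statistic that reflects how important a term is to a document within a collection of documents: it grows with the number of times the term appears in the document and shrinks with the number of documents that contain the term. Ideally, the terms with the highest TF-IDF scores should play an important role in determining the topic of the text.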
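
One common formulation, assumed here, defines tf(t, d) as the count of term t in document d divided by the number of tokens in d, and idf(t) as log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The sketch below implements that formulation in plain Python; the two tiny documents are made up.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document; docs are lists of tokens."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf (relative count in this document) times idf (log N / df)
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    ["python", "nlp", "nlp"],
    ["python", "flink"],
]
scores = tf_idf(docs)
top = max(scores[0], key=scores[0].get)
print(top)  # nlp
```

A term that appears in every document, like `python` here, gets an IDF of log(1) = 0 and therefore a score of 0, while `nlp`, which is frequent in the first document and absent from the second, ranks highest. Libraries differ in the exact weighting (scikit-learn's `TfidfVectorizer`, for instance, smooths the IDF by default), so absolute scores vary between tools.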

For a step-by-step tutorial on natural language processing in Python, see Natural Language Processing in Python 3 Using NLTK.

Alibaba Clouder
