Andrew-Curiohh adding data
Latest commit 8c938c5 Dec 5, 2018

Natural Language Processing

Unit 4: Required

Materials We Provide

| Topic | Description | Link |
| --- | --- | --- |
| Lesson | Natural Language Processing | Here |
| Practice | Four sample NLP activities | Here |
| Data | Yelp Review and Tweet Datasets | Here |
| Slides | Sample slide deck for this topic (PPTX, deprecated) | Here |
| Extra Materials | Optional materials on Bayes Theorem and Naive Bayes | Here |

The Yelp dataset was chosen for its rich, colloquial review text and because it lends itself well to sentiment analysis.

Note: This lesson also uses the Naive Bayes model MultinomialNB, which is often used for NLP applications, such as spam detection. An appendix is included at the end of the lesson for interested students. Supplemental materials are also offered if you want to explore Bayes-related topics.
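
As a rough illustration of how MultinomialNB is typically paired with word counts for text classification, here is a minimal sketch. It assumes scikit-learn is installed. The toy reviews and labels are hypothetical stand-ins, not rows from the Yelp data, and the pipeline shown is one common arrangement rather than necessarily the one the lesson notebook uses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews for illustration; the lesson uses the Yelp data.
train_texts = [
    "great food and friendly service",
    "loved the pizza, great place",
    "terrible service and cold food",
    "awful experience, never again",
]
train_labels = ["positive", "positive", "negative", "negative"]

# CountVectorizer turns each review into a vector of token counts;
# MultinomialNB then models those counts separately for each class.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Classify two unseen snippets.
predictions = model.predict(["great pizza", "terrible cold food"])
print(predictions)
```

With so few training examples the predictions depend entirely on which words happened to appear in each class, which is a useful reminder of why the lesson works with a larger dataset.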


Learning Objectives

By the end of this lesson, students should be able to:

  • Discuss the major tasks involved with natural language processing
  • Discuss, at a low level, the components of natural language processing
  • Identify why natural language processing is difficult
  • Demonstrate text classification
  • Demonstrate common text preprocessing techniques
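
To make "text preprocessing" concrete, the sketch below shows three steps that are commonly grouped under that label: lowercasing, tokenizing, and stop-word removal, followed by building bigrams. It uses only the Python standard library. The stop-word list and the helper names `preprocess` and `ngrams` are hypothetical, and the techniques the lesson itself covers may differ.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "to", "of"}


def preprocess(text):
    """Lowercase, tokenize on letters/apostrophes, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def ngrams(tokens, n=2):
    """Return the list of n-grams (as tuples) from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = preprocess("The pizza was GREAT, and the service was friendly!")
print(tokens)          # ['pizza', 'great', 'service', 'friendly']
print(ngrams(tokens))  # bigrams built from the filtered tokens
```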

Student Requirements

Before this lesson, students should already be able to:

  • Use Anaconda for package management
  • Use train_test_split to split features and target values into training and test sets
  • Read data into a Pandas DataFrame
  • Build and evaluate predictive models using scikit-learn
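
As a quick refresher on the prerequisites above, the following sketch builds a small Pandas DataFrame and splits it with scikit-learn's `train_test_split`. The DataFrame contents and the column names `text` and `stars` are assumptions made for illustration; they are not taken from the lesson's data files.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a reviews file; column names are assumptions.
reviews = pd.DataFrame({
    "text": ["great food", "bad service", "loved it", "never again",
             "friendly staff", "cold pizza", "tasty and cheap", "rude waiter"],
    "stars": [5, 1, 5, 1, 5, 1, 5, 1],
})

X = reviews["text"]   # feature: raw review text
y = reviews["stars"]  # target: star rating

# Hold out 25% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(len(X_train), len(X_test))  # 6 2
```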

Lesson Guide


Installation Notes

To proceed through the lesson, first install TextBlob as explained below. We generally prefer Anaconda-based installations, since they are more likely to have been tested alongside our other Anaconda packages. However, TextBlob is not available through Anaconda on some platforms (e.g. Win64). To install TextBlob with conda:

  1. conda install -c https://conda.anaconda.org/sloria textblob

Or:

  1. pip install textblob
  2. python -m textblob.download_corpora lite
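
After either install route, a short check like the one below can confirm that the package is importable from the interpreter you plan to use for the lesson. This is a minimal sketch: the helper name `textblob_available` is hypothetical, and the example sentence is made up.

```python
import importlib.util


def textblob_available():
    """Return True if the textblob package can be found by this interpreter."""
    return importlib.util.find_spec("textblob") is not None


if textblob_available():
    from textblob import TextBlob

    # Build a TextBlob from a sample sentence and print its sentiment scores.
    blob = TextBlob("The pizza was great and the service was friendly.")
    print(blob.sentiment)
else:
    print("textblob is not installed in this environment")
```

If the check reports that textblob is missing even though the install succeeded, the install most likely went into a different Python environment than the one running the notebook.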

Additional Resources

For more information, we recommend the following resources: