Intro to Natural Language Text Mining - Short Course
This class covers machine learning applied to natural language text documents, focusing on statistical algorithms for accomplishing machine learning tasks on text. We won't cover more traditional rule-based approaches such as semantic analysis and parsing.
We'll start with an introduction to the subject matter, a comparison of statistical techniques with semantic approaches, definitions of common text mining problems, and simple text manipulations. We'll then cover algorithms for standard text mining problems such as indexing, automatic classification (e.g. spam filtering), and topic modeling.
Course Outline
Intro to text mining problems
R language background
Basic text manipulations
Normalization
Stop words
Stemming
Document-Term Matrix Processing
Formation and Basic Manipulations of Document-Term Matrix
Latent Semantic Indexing - Search
Topic Modeling - Clustering and Classification
Spam Detection
Prerequisites - Programming experience is required. We'll work through the material using code examples in the R programming language, so you should have R and RStudio installed. There will be a short intro to R for those who haven't used it. Beyond that, you'll only need general undergraduate-level math.
There's a $100 discount if you sign up at least 5 days before the class starts.
http://www.meetup.com/HandsOnProgrammingEvents/events/92124662/