Shared Flashcard Set

Details

Patterns in Language Final
Introductory course in computational linguistics
24
Computer Science
Undergraduate 3
12/13/2015

Cards

Term
Document Classification
Definition
Sort documents into user-defined classes (categories).
Term
Sentiment Analysis
Definition
Automatically identify the positive and negative terms in a document to judge its overall opinion. Useful for political polls and marketing.
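A minimal sketch of the lexicon-based idea, assuming a tiny hand-made word list. `POSITIVE`, `NEGATIVE`, `sentiment_terms` and `sentiment_score` are hypothetical names; real systems use much larger lexicons or learned models.

```python
import re

# Hypothetical toy lexicons; a real system would use a much larger word list.
POSITIVE = {"good", "great", "excellent", "love", "win"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "lose"}


def sentiment_terms(text):
    """Return the positive and negative terms found in text, in order of appearance."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = [w for w in words if w in POSITIVE]
    negative = [w for w in words if w in NEGATIVE]
    return positive, negative


def sentiment_score(text):
    """Positive minus negative term count: above 0 leans positive, below 0 leans negative."""
    positive, negative = sentiment_terms(text)
    return len(positive) - len(negative)


print(sentiment_terms("I love this great phone, but the battery is bad."))
# (['love', 'great'], ['bad'])
```

Counting terms ignores negation ("not good"), which is one reason practical sentiment systems go beyond plain word lists.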
Term
Spam Identification
Definition
Calculate how frequently n-grams that usually indicate spam (spam words) occur in a message.
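A minimal sketch of n-gram frequency counting, assuming bigrams (n = 2) and a small hypothetical list of spam bigrams; `SPAM_BIGRAMS`, `ngrams` and `spam_ngram_frequency` are illustrative names, not part of any library.

```python
import re
from collections import Counter

# Hypothetical list of spam-indicating bigrams (an illustrative assumption).
SPAM_BIGRAMS = {("free", "money"), ("click", "here"), ("act", "now")}


def ngrams(tokens, n):
    """All contiguous n-token sequences, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def spam_ngram_frequency(text, spam_ngrams=SPAM_BIGRAMS, n=2):
    """Fraction of the message's n-grams that appear in the spam list."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    grams = Counter(ngrams(tokens, n))
    total = sum(grams.values())
    if total == 0:
        return 0.0
    hits = sum(count for gram, count in grams.items() if gram in spam_ngrams)
    return hits / total


print(spam_ngram_frequency("Click here for free money! Act now."))  # 0.5
```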
Term
Rule based spam identification
Definition
Filters spam using hand-written rules: certain n-grams add weight to a message's score, and once the score passes some threshold, the message is identified as spam.
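A minimal sketch of a weighted rule filter. The rules, their weights and the threshold of 4.0 are hypothetical values chosen for illustration; each rule adds its weight at most once per message.

```python
import re

# Hypothetical rules: (regex pattern, weight). Weights and threshold are illustrative.
RULES = [
    (r"\bfree money\b", 3.0),
    (r"\bclick here\b", 2.0),
    (r"\bact now\b", 2.0),
    (r"!{2,}", 1.0),  # two or more exclamation marks in a row
]
THRESHOLD = 4.0


def rule_score(text):
    """Add the weight of every rule that matches the message (each rule counts once)."""
    lowered = text.lower()
    return sum(weight for pattern, weight in RULES if re.search(pattern, lowered))


def is_spam(text, threshold=THRESHOLD):
    """A message is spam once its total weight passes the threshold."""
    return rule_score(text) > threshold


print(is_spam("Click here for FREE MONEY!!"))  # True (score 6.0)
```

Every rule here had to be written by a person after the pattern was seen, which is the "one step behind" drawback on the next card.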
Term
Statistical approach spam identification
Definition
These learn from a large set of examples: one spam set and one ham (legitimate email) set. They can adapt based on which emails are marked as spam, either by all users or by specific users.
Term
Rule based identification drawbacks
Definition
They are, by nature, one step behind spammers: a pattern has to be identified before a rule can be written, and by that time the spam is already out.
Term
Supervised learning
Definition
Learning from a training set and a test set that are labeled in advance with the correct answers.
Term
Supervised learning method
Definition
1. Label a corpus of articles with the desired categories to make training and test sets.
2. Apply machine learning software to the labeled training set to build a model that summarizes what has been learned.
3. Use the model to generate predictions for the test set and check them against the labels.
4. Deploy the model on new, unlabeled documents.
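The four steps can be sketched end to end as below. This is a minimal illustration, assuming a tiny made-up corpus and a simple word-overlap classifier standing in for real machine learning software; the stopword list and all function names are hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "in", "of", "as", "to"}


def tokenize(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def train(examples):
    """Step 2: summarize the labeled training set as word counts per category."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(tokenize(text))
    return model


def predict(model, text):
    """Pick the category whose training words overlap most with the text."""
    words = tokenize(text)
    return max(model, key=lambda label: sum(model[label][w] for w in words))


def accuracy(model, examples):
    """Step 3: fraction of test examples whose prediction matches the label."""
    return sum(predict(model, t) == y for t, y in examples) / len(examples)


# Step 1: a (tiny, made-up) labeled corpus, split into training and test sets.
corpus = [
    ("The team won the match", "sports"),
    ("A goal was scored in the final minute", "sports"),
    ("The striker scored a goal", "sports"),
    ("Stocks fell as markets closed", "business"),
    ("The company reported record profits", "business"),
    ("Investors sold shares in the company", "business"),
]
train_set = [ex for i, ex in enumerate(corpus) if i % 3 != 2]
test_set = [ex for i, ex in enumerate(corpus) if i % 3 == 2]

model = train(train_set)
print(accuracy(model, test_set))  # 1.0
# Step 4: deploy on new, unlabeled text.
print(predict(model, "Markets rallied as profits rose"))  # business
```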
Term
Unsupervised learning
Definition
There are no predefined categories; instead, the system clusters articles that have similar properties, such as being about sports. It is less costly because no one has to sit down and label every single document, but the clusters may not be intuitive and clustering solutions are difficult to evaluate.
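A minimal sketch of clustering unlabeled articles, using simple single-pass "leader" clustering over word sets with Jaccard similarity. This technique is swapped in for illustration (the card does not name one), and the stopword list, the 0.2 threshold and the function names are assumptions.

```python
import re

STOPWORDS = {"the", "a", "an", "in", "of", "to", "and", "as"}


def words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Shared words divided by all words in either set."""
    return len(a & b) / len(a | b) if a | b else 0.0


def leader_cluster(docs, threshold=0.2):
    """Each doc joins the first cluster whose leader (first member) is similar
    enough; otherwise it starts a new cluster. Returns lists of doc indices."""
    clusters = []  # list of (leader word set, member indices)
    for i, doc in enumerate(docs):
        w = words(doc)
        for leader, members in clusters:
            if jaccard(w, leader) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((w, [i]))
    return [members for _, members in clusters]


docs = [
    "The team scored a late goal to win the match",
    "A late goal won the match for the home team",
    "Stocks fell as investors sold shares",
    "Investors bought shares as stocks rose",
]
print(leader_cluster(docs))  # [[0, 1], [2, 3]]
```

The output is only groups of indices: nothing names the first group "sports", which illustrates why clusters can be hard to interpret and evaluate.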
Term
Feature-engineering
Definition
Identifying the most relevant properties (features) of the data, such as the properties that distinguish spam from ham.
Term
Kitchen sink feature engineering
Definition
Use many features in the hope that some will be relevant and useful. Make every word a feature, and choose a machine learning method that is good at focusing on the few important features while ignoring irrelevant ones.
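A minimal sketch of "every word is a feature", assuming simple lowercase word counts; the `word=` naming scheme and both function names are hypothetical.

```python
import re
from collections import Counter


def kitchen_sink_features(text):
    """Every word in the text becomes a feature; its value is how often it occurs."""
    return Counter(f"word={w}" for w in re.findall(r"[a-z']+", text.lower()))


def feature_space(docs):
    """All feature names across a corpus: one dimension per distinct word."""
    names = set()
    for doc in docs:
        names.update(kitchen_sink_features(doc))
    return sorted(names)


print(feature_space(["free prize", "meeting at noon"]))
# ['word=at', 'word=free', 'word=meeting', 'word=noon', 'word=prize']
```

The feature space grows with every new word in the corpus, so the learning method has to be the part that picks out the few features that matter.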
Term
Hand-crafted strategy of feature engineering
Definition
Carefully and thoughtfully identify a small set of features that are likely to be relevant. The downside is that you have to choose the features yourself.
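A minimal sketch of a hand-crafted feature set for spam, assuming four hypothetical features: whether the word "free" appears, the number of exclamation marks, whether there is a URL, and the share of capital letters. Choosing these particular features is exactly the human judgment call the card describes.

```python
import re


def handcrafted_features(text):
    """A small, hand-picked feature set (the choice of features is illustrative)."""
    letters = [c for c in text if c.isalpha()]
    return {
        "has_free": bool(re.search(r"\bfree\b", text, re.IGNORECASE)),
        "num_exclamations": text.count("!"),
        "has_url": bool(re.search(r"https?://", text)),
        # fraction of letters that are uppercase (0.0 if there are no letters)
        "caps_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
    }


print(handcrafted_features("FREE prize!!! Visit http://example.com"))
```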
Term
Naive Bayes for document classification
Definition
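Take a word. Count how often it occurs in spam and how often in ham, and calculate that ratio. Then calculate the odds ratio (ham count/total ham words over spam count/total spam words). Combine the ratios for all the words in the document (by multiplying them) to decide whether it is ham or spam.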
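A minimal sketch of the ratio-combining idea, assuming add-one smoothing (so an unseen word never makes a ratio zero or infinite) and summing log ratios instead of multiplying raw ratios, which is equivalent but avoids numeric underflow. It also includes the prior odds of ham vs. spam from the training set, as standard Naive Bayes does. The class name and training messages are hypothetical. Following the card, each word's ratio is ham over spam, so a positive total means ham.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    def __init__(self, spam_docs, ham_docs):
        # word counts in each class and the total number of words per class
        self.spam = Counter(w for d in spam_docs for w in tokenize(d))
        self.ham = Counter(w for d in ham_docs for w in tokenize(d))
        self.spam_total = sum(self.spam.values())
        self.ham_total = sum(self.ham.values())
        self.vocab_size = len(set(self.spam) | set(self.ham))
        # prior odds: proportion of ham vs. spam training documents
        self.log_prior = math.log(len(ham_docs) / len(spam_docs))

    def word_log_ratio(self, word):
        """log of (ham count/ham total) over (spam count/spam total), add-one smoothed."""
        p_ham = (self.ham[word] + 1) / (self.ham_total + self.vocab_size)
        p_spam = (self.spam[word] + 1) / (self.spam_total + self.vocab_size)
        return math.log(p_ham / p_spam)

    def classify(self, text):
        # summing logs = multiplying the per-word ratios
        score = self.log_prior + sum(self.word_log_ratio(w) for w in tokenize(text))
        return "ham" if score > 0 else "spam"


nb = NaiveBayesSpamFilter(
    spam_docs=["win free money now", "free prize click now", "claim your free money"],
    ham_docs=["meeting moved to monday", "lunch on monday", "project notes for the meeting"],
)
print(nb.classify("free money now"))     # spam
print(nb.classify("meeting on monday"))  # ham
```

The "naive" part is treating each word's ratio as independent of the others, which is what allows the ratios to be simply multiplied.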
Term
Bag of words assumption
Definition
Pretend the document is an unstructured collection of words, ignoring syntax and topic structure. Put all the words of a document in a bag, draw a word, and calculate which document it is most likely to have come from.
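A minimal sketch of the bag-of-words view, assuming lowercase word counts. `most_likely_source` is a hypothetical helper that scores each document by the relative frequency of the drawn word in its bag.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Word counts with all order and structure thrown away."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def most_likely_source(word, docs):
    """Index of the document whose bag gives the drawn word the highest probability."""
    bags = [bag_of_words(d) for d in docs]

    def probability(i):
        total = sum(bags[i].values())
        return bags[i][word] / total if total else 0.0

    return max(range(len(docs)), key=probability)


# Word order is lost, so these two sentences get the same bag.
print(bag_of_words("The dog bit the man") == bag_of_words("The man bit the dog"))  # True
```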
Term
Perceptron
Definition
Error-driven learning. It predicts an outcome and adjusts its weights whenever the prediction is wrong. Initially the weights are uninformative, but over time it learns to associate features with outcomes. It's a network with two layers: one node for each possible input feature and one for each possible outcome (spam and ham).
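A minimal sketch of a two-layer perceptron for spam vs. ham, assuming each distinct lowercase word is an input feature and each outcome is an output node with one weight per feature. Ties go to the first outcome listed, an arbitrary choice. The class, function names and training messages are hypothetical.

```python
from collections import defaultdict

OUTCOMES = ("spam", "ham")


def features(text):
    return set(text.lower().split())


class Perceptron:
    def __init__(self, outcomes=OUTCOMES):
        # one weight per (outcome node, input feature) connection; all start at 0
        self.weights = {o: defaultdict(float) for o in outcomes}

    def score(self, outcome, feats):
        return sum(self.weights[outcome][f] for f in feats)

    def predict(self, text):
        feats = features(text)
        # highest-scoring outcome; max() keeps the first one on a tie
        return max(self.weights, key=lambda o: self.score(o, feats))

    def train(self, examples, epochs=5):
        for _ in range(epochs):
            for text, gold in examples:
                guess = self.predict(text)
                if guess != gold:  # error-driven: update only on mistakes
                    for f in features(text):
                        self.weights[gold][f] += 1.0
                        self.weights[guess][f] -= 1.0


p = Perceptron()
p.train([
    ("meeting on monday", "ham"),
    ("free money on monday", "spam"),
    ("lunch on monday", "ham"),
    ("win free money", "spam"),
])
print(p.predict("free money"))     # spam
print(p.predict("lunch meeting"))  # ham
```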
Term
Past tense debate
Definition
How do people learn the regular and irregular past-tense forms of verbs (e.g. "walked" vs. "went")?
Term
U-shaped curve
Definition
Start with good performance on some task, then get substantially worse, and then gradually get better again. For example, a child says "went", later overregularizes to "goed", and finally returns to "went".
Term
Wug test
Definition
A test in which children are shown a made-up noun, "wug", to see whether they can produce its plural form ("wugs").
Term
Gricean Maxims
Definition
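Quantity: Be as informative as needed, but no more. Keep it short and sweet; no TMI.
Quality: Don't say what you believe is false (don't lie); sarcasm deliberately breaks this maxim.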
Relation: Say things that are pertinent to the question.
Manner: Be clear, brief, and orderly.
Term
SHRDLU
Definition
A program that was an expert at moving shapes (blocks) around a simulated world in response to typed English commands, and it was really good at it. This showed that AI can succeed, but only in a very controlled setting within a specific domain.
Term
The Chinese room
Definition
A man who does not know Chinese sits in a room with a rule book written in English. Questions in Chinese come in, he follows the rule book to manipulate the symbols, and he outputs answers in perfect Chinese. Does he know Chinese? Does the room know Chinese?
Term
Eliza
Definition
An early chatbot that played a therapist by matching patterns in what the user typed; she wasn't very good at her job.
Term
Semantics
Definition
The logical aspects of language and its meaning.
Term
Pragmatics
Definition
How context contributes to meaning