Deep Learning: a sub-field of ML, learning representations of data
Existing ML uses manually designed features
- often over-specified and incomplete
- take a long time to design and validate
DL
- Learned features are easy to adapt, fast to train
- Deep learning provides a very flexible, (almost?) universal framework for representing information
- Effective end-to-end joint system learning
- speech recognition: not good; visual perception: good; question answering: good
4 + 2 = 6 neurons (not counting inputs); biases: same number, one per non-input neuron
[3 x 4] + [4 x 2] = 20 weights (for a 3-4-2 network)
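A quick arithmetic check of those counts (Python sketch; the 3-4-2 layer sizes are the ones implied by the numbers above):

```python
# Parameter count for a 3 -> 4 -> 2 fully connected network.
layer_sizes = [3, 4, 2]  # inputs, hidden, output

weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
biases = sum(layer_sizes[1:])  # one bias per non-input neuron

print(weights)  # (3*4) + (4*2) = 20
print(biases)   # 4 + 2 = 6
```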

Optimize (min. or max.) the objective/cost function J(θ); generate an error signal that measures the difference between predictions and target values
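A minimal sketch of that loop, assuming a mean-squared-error J(θ) on a toy 1-D linear model (all names and data here are illustrative):

```python
import numpy as np

# Toy data: y = 2x, so the optimal theta is 2.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

theta = 0.0
lr = 0.05
for _ in range(100):
    pred = theta * x
    error = pred - y               # error signal: prediction minus target
    grad = 2 * np.mean(error * x)  # dJ/dtheta for J = mean squared error
    theta -= lr * grad             # minimize J by stepping against the gradient

print(round(theta, 3))  # ~2.0
```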
naive word representation: the input x0 holds raw character codes, e.g. "tree" = 74 72 65 65 in hex ASCII
but this representation is poor
so we need a new representation of words
WordNet: contains lists of synonyms/hypernyms ← built using human labor (manual annotation)
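For reference, WordNet can be queried from Python through NLTK (a sketch; assumes `pip install nltk` and `nltk.download('wordnet')` have been run):

```python
from nltk.corpus import wordnet as wn

# Synonyms: lemma names grouped into synsets.
for syn in wn.synsets("good")[:3]:
    print(syn.name(), syn.lemma_names())

# Hypernyms: "is-a" parents of a synset.
dog = wn.synsets("dog")[0]
print(dog.hypernyms())  # e.g. [Synset('canine.n.02'), ...]
```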
A word's meaning is given by the words that frequently appear close by (distributional hypothesis)
Context: the set of words that appear nearby within a fixed-size window (e.g., 5 words before/after)
co-occurrence vectors: dimension = number of words, so the full matrix is (number of words) × (number of words)
memory O(n^2): poor
so we use dimensionality reduction (PCA, SVD) ← almost all values are 0 (sparse), so we can compress with little loss; see the sketch below
→ word embeddings: similarity can now be computed with vector (dot) products (dense vectors, instead of one-hot vectors)
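A sketch of the count-then-reduce pipeline, assuming a tiny toy corpus and a ±1 context window (both made up for illustration):

```python
import numpy as np

corpus = ["I like deep learning".split(),
          "I like NLP".split(),
          "I enjoy flying".split()]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# |V| x |V| co-occurrence counts within a +/-1 window (mostly zeros: sparse).
M = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if i != j:
                M[idx[w], idx[sent[j]]] += 1

# SVD-based dimensionality reduction: keep the top k singular vectors.
U, S, Vt = np.linalg.svd(M)
k = 2
embeddings = U[:, :k] * S[:k]  # dense k-dim word vectors
print(embeddings.shape)        # (|V|, 2)
```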
We can also build embeddings that connect different languages (cross-lingual embeddings)
vector similarity (cosine)
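Cosine similarity written out (numpy sketch):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product of the normalized vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707
```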

Language Modeling: models of P(text), i.e., of a sentence
→ scores a sentence

example: add up all the word features (linear features)
usually apply softmax to the sum

softmax is applied just before the output (see the softmax sketch below)
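A standard softmax implementation (numpy sketch):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; exponentiate; normalize.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1.0
```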

1. P(text) - linear model
CBOW
Predict a word based on the sum of the surrounding words' embeddings
Skip-gram
Predict each word in the context given the (center) word
Linear models can't learn feature combinations (see the sketch below)
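To make "sum of surrounding embeddings" concrete, a minimal CBOW-style scorer (numpy sketch; sizes and the random untrained weights are illustrative, and a real model learns E and W by gradient descent):

```python
import numpy as np

vocab_size, dim = 100, 16
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, dim))   # word embedding matrix
W = rng.normal(size=(dim, vocab_size))   # output projection

def cbow_scores(context_ids):
    # CBOW: sum the context embeddings, then score every vocab word.
    h = E[context_ids].sum(axis=0)
    return h @ W                         # usually followed by softmax

scores = cbow_scores([3, 17, 42, 8])
print(scores.shape)  # (100,)
```

Skip-gram reverses the direction: it uses the center word's embedding to score each context word. Note the model above is purely linear in the summed embeddings, which is why it can't capture feature combinations.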
2. Models of P(label | text)
BOW
each word has its own 5 elements, one per label [very good, good, neutral, bad, very bad]
DeepCBOW
Combination features - each vector has "features" (e.g. is this an animate object? is this a positive word? etc.), and nonlinear layers can combine them (sketch below)
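A DeepCBOW sketch: the same summed embeddings, but with a nonlinear hidden layer on top so combination features can be learned (illustrative numpy, untrained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, hidden, n_labels = 100, 16, 32, 5
E = rng.normal(size=(vocab_size, dim))
W1 = rng.normal(size=(dim, hidden))
W2 = rng.normal(size=(hidden, n_labels))

def deep_cbow_scores(word_ids):
    h = E[word_ids].sum(axis=0)  # bag of embeddings
    h = np.tanh(h @ W1)          # nonlinearity -> combination features
    return h @ W2                # scores over the 5 labels

print(deep_cbow_scores([3, 17, 42]).shape)  # (5,)
```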
Convolutional Networks
x/ CNN - pooling
weak as a long-distance feature extractor
doesn't have a holistic view of the sentence
x/ RNN
good as a long-distance feature extractor
weakness - indirect passing of information, so credit assignment is more difficult
can be slow, due to incremental processing
Modeling Sentences w/ n-grams (see the bigram sketch below)
P(text | text)
RNNs are frequently used in language modeling since they can capture long-distance dependencies
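For contrast with the RNN, a count-based bigram model of P(text) (toy corpus; no smoothing, so unseen bigrams get probability 0):

```python
from collections import Counter

corpus = ["<s> I like NLP </s>", "<s> I like deep learning </s>"]
bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    toks = sent.split()
    unigrams.update(toks[:-1])          # counts of each history word
    bigrams.update(zip(toks, toks[1:])) # counts of adjacent word pairs

def p_sentence(sent):
    # P(text) as a product of P(cur | prev) maximum-likelihood estimates.
    toks = sent.split()
    p = 1.0
    for prev, cur in zip(toks, toks[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(p_sentence("<s> I like NLP </s>"))  # 0.5: "like" -> "NLP" half the time
```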