Sub-field of ML: learning representations of data
Existing ML uses manually designed features
- often over-specified and incomplete
- take a long time to design and validate
DL (deep learning)
- Learned features are easy to adapt, fast to learn
- Deep learning provides a very flexible, (almost?) universal, learnable framework for representing information
- Effective end-to-end joint system learning
speech recognition: not good; visual perception: good; question answering: good
4 + 2 = 6 neurons (not counting inputs) - biases have the same count (one per resulting node)
[3 x 4] + [4 x 2] = 20 weights
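A quick sanity check of the counting above, assuming the 3-4-2 fully connected layout implied by the [3 x 4] and [4 x 2] weight matrices (just illustrative arithmetic):

```python
# Parameter count for the assumed 3-4-2 feed-forward net
layer_sizes = [3, 4, 2]   # inputs, hidden units, outputs (inferred from the weight shapes)

neurons = sum(layer_sizes[1:])                                       # 4 + 2 = 6
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))   # 3*4 + 4*2 = 20
biases = neurons                                                     # one per non-input neuron = 6

print(neurons, weights, biases, weights + biases)                    # 6 20 6 26
```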
Optimize (min. or max.) the objective/cost function 𝐽(𝜃)
Generate an error signal that measures the difference between predictions and target values
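A minimal sketch of that optimize-from-error-signal loop, assuming a squared-error cost on a toy 1-D linear model (data and learning rate are made up for illustration):

```python
import numpy as np

# Toy data: y = 2x; theta is the single learnable parameter of a 1-D linear model.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

theta, lr = 0.0, 0.05
for _ in range(200):
    pred = theta * x
    error = pred - y                       # error signal: predictions vs. targets
    grad = (2.0 / len(x)) * (error @ x)    # gradient of J(theta) = mean squared error
    theta -= lr * grad                     # move against the gradient to minimize J

print(round(theta, 2))                     # converges to 2.0
```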
a word's raw representation is just its character codes, e.g. 0x74 0x72 0x65 0x65 ("tree")
but this representation is poor
so we need a new representation of words
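For reference, the codes above come straight from the characters of the string:

```python
# ASCII codes of the characters in "tree"
print([hex(ord(c)) for c in "tree"])   # ['0x74', '0x72', '0x65', '0x65']
```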
WordNet
WordNet: contains lists of synonyms/hypernyms ← built using human resources (manual annotation)
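A quick look at what WordNet stores, via NLTK's interface (assumes nltk is installed; the wordnet corpus is downloaded on first use):

```python
import nltk
nltk.download("wordnet", quiet=True)        # one-time download of the WordNet corpus
from nltk.corpus import wordnet as wn

good = wn.synsets("good")                   # synsets group synonymous word senses
print(good[0].lemma_names())                # synonyms listed for the first sense
print(wn.synsets("panda")[0].hypernyms())   # "is-a" parents of the first panda sense
```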
Word’s meaning is given by words that frequently appear close-by
Context: the set of words that appear nearby within a fixed-size window (e.g. 5 words before/after)
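A small sketch of extracting such contexts (window size and sentence are made up):

```python
def contexts(tokens, window=2):
    """Yield (center word, context words) pairs from a fixed-size window."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right

for center, ctx in contexts("the quick brown fox jumps".split()):
    print(center, ctx)   # e.g. brown ['the', 'quick', 'fox', 'jumps']
```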
Vector dimension = number of words × number of words (a V × V co-occurrence matrix)
memory is O(n^2): poor
so we use dimensionality reduction (PCA, SVD) ← almost all values are 0 (sparse), so we can compress
→ word embedding: similarity can now be judged with vector products (dense vectors, instead of one-hot vectors)
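A minimal count-then-reduce sketch with NumPy (toy corpus, window of 1, and k = 2 are arbitrary choices for illustration):

```python
import numpy as np

corpus = ["i like deep learning", "i like nlp", "i enjoy flying"]
vocab = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# V x V co-occurrence counts with a window of 1 word on each side.
M = np.zeros((V, V))
for s in corpus:
    ws = s.split()
    for i, w in enumerate(ws):
        for j in range(max(0, i - 1), min(len(ws), i + 2)):
            if j != i:
                M[idx[w], idx[ws[j]]] += 1

# SVD: keep the top-k dimensions as dense word embeddings.
U, S, Vt = np.linalg.svd(M)
k = 2
embeddings = U[:, :k] * S[:k]   # one k-dimensional dense vector per word
print(embeddings.shape)         # (V, 2) instead of (V, V)
```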
We can make embeddings between other languages (cross-lingual embeddings)
vector similarity (cosine)
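Cosine similarity between two embedding vectors, sketched with NumPy (vectors are placeholders):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(np.array([1.0, 2.0]), np.array([2.0, 4.0])))   # ≈ 1.0 (parallel)
print(cosine(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # 0.0 (orthogonal)
```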
Language Modeling: models of P(text), i.e. of a sentence
scores a sentence
e.g. add up all the (linear) feature scores
usually softmax the sum
softmax is applied right before the output (turns scores into probabilities)
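The softmax step itself, as a short NumPy sketch (the example scores are invented):

```python
import numpy as np

def softmax(scores):
    """Turn a vector of summed scores into probabilities that sum to 1."""
    z = np.exp(scores - np.max(scores))   # subtract max for numerical stability
    return z / z.sum()

# e.g. summed feature scores for three candidate next words
print(softmax(np.array([2.0, 1.0, 0.1])))   # ≈ [0.66, 0.24, 0.10]
```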
1. P(text) - linear model
CBOW
Predict a word based on the sum of the surrounding words' embeddings (see the pair sketch after Skip-gram)
Skip-gram
Predict each word in the context given the (center) word
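A sketch of how the two objectives differ, shown as training-pair generation only (window size and sentence are arbitrary; the embedding/softmax layers are omitted):

```python
def cbow_and_skipgram_pairs(tokens, window=2):
    """CBOW predicts the center word from its context;
    skip-gram predicts each context word from the center word."""
    cbow, skipgram = [], []
    for i, center in enumerate(tokens):
        ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        cbow.append((ctx, center))                 # (context -> center)
        skipgram += [(center, c) for c in ctx]     # (center -> each context word)
    return cbow, skipgram

cb, sg = cbow_and_skipgram_pairs("the cat sat on the mat".split())
print(cb[2])   # (['the', 'cat', 'on', 'the'], 'sat')
print(sg[:2])  # [('the', 'cat'), ('the', 'sat')]
```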
Linear models can't learn feature combinations
2. Models of P(label | text)
BOW
each word has its own 5 elements, corresponding to the labels [very good, good, neutral, bad, very bad]
DeepCBOW
Combination features - each vector has "features" (e.g. is this an animate object? is this a positive word? etc.)
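A minimal forward-pass sketch of DeepCBOW: sum the embeddings, then let a nonlinear layer combine features before scoring the 5 labels (all dimensions and the random initialization are placeholders; training is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, H, L = 1000, 64, 32, 5            # vocab size, embedding dim, hidden dim, 5 labels

E = rng.normal(size=(V, D)) * 0.1       # word embedding table
W1 = rng.normal(size=(D, H)) * 0.1; b1 = np.zeros(H)
W2 = rng.normal(size=(H, L)) * 0.1; b2 = np.zeros(L)

def deep_cbow_scores(word_ids):
    h = E[word_ids].sum(axis=0)         # CBOW step: sum the word embeddings
    h = np.tanh(h @ W1 + b1)            # nonlinearity lets features combine
    return h @ W2 + b2                  # one score per label

print(deep_cbow_scores([3, 17, 42]))    # 5 scores, one per sentiment label
```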
Convolutional Networks
CNN + pooling
- weak as a long-distance feature extractor
- doesn't have a holistic view of the sentence
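A toy CNN-for-text sketch: convolve a filter over every window of K consecutive word vectors, then max-pool over time (dimensions and random weights are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, F = 8, 3, 16                   # embedding dim, filter width (n-gram size), num filters
Wc = rng.normal(size=(K * D, F))     # each filter looks at K consecutive word vectors

def cnn_features(embeddings):
    """1-D convolution over word windows followed by max pooling over time."""
    T = len(embeddings)
    windows = [embeddings[i:i + K].reshape(-1) for i in range(T - K + 1)]
    conv = np.maximum(0, np.stack(windows) @ Wc)   # ReLU over each K-gram window
    return conv.max(axis=0)                        # max-pool: one value per filter

sentence = rng.normal(size=(10, D))  # a 10-word sentence as random embeddings
print(cnn_features(sentence).shape)  # (16,)
```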
RNN
- good as a long-distance feature extractor
- weakness: indirect passing of information, so credit assignment is more difficult
- can be slow, due to incremental processing
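A bare-bones RNN encoder sketch, showing the incremental left-to-right processing (random weights, no gates or training; just the recurrence):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 8, 16                          # input (embedding) dim, hidden state dim
Wx = rng.normal(size=(D, H)) * 0.1
Wh = rng.normal(size=(H, H)) * 0.1
b = np.zeros(H)

def rnn_encode(embeddings):
    """Process words one at a time; the final hidden state summarizes the sentence."""
    h = np.zeros(H)
    for x in embeddings:              # incremental, sequential processing
        h = np.tanh(x @ Wx + h @ Wh + b)
    return h

sentence = rng.normal(size=(10, D))
print(rnn_encode(sentence).shape)     # (16,)
```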
Modeling Sentences w/ n-grams
P(text | text)
Conditional Language Models
Calculating the Probability of a Sentence
RNNs are frequently used in language modeling since they can capture long-distance dependencies
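A tiny illustration of scoring a sentence with the chain rule, here under a bigram (n = 2) assumption on an invented toy corpus (counts only, no smoothing):

```python
from collections import Counter

corpus = ["<s> i like nlp </s>", "<s> i like deep learning </s>", "<s> i enjoy nlp </s>"]
bigrams, unigrams = Counter(), Counter()
for s in corpus:
    ws = s.split()
    unigrams.update(ws[:-1])           # counts of possible previous words
    bigrams.update(zip(ws, ws[1:]))    # counts of adjacent word pairs

def p_sentence(sentence):
    """Chain rule with a bigram assumption: P(w1..wn) = prod_i P(w_i | w_{i-1})."""
    ws = ("<s> " + sentence + " </s>").split()
    p = 1.0
    for prev, w in zip(ws, ws[1:]):
        p *= bigrams[(prev, w)] / unigrams[prev]   # MLE estimate from counts
    return p

print(p_sentence("i like nlp"))   # 3/3 * 2/3 * 1/2 * 2/2 ≈ 0.33
```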