Deep Learning: a sub-field of ML, learning representations of data
Existing ML uses manually designed features
- often over-specified and incomplete
- take a long time to design and validate
DL
- Learned features are easy to adapt, fast to train
- Deep learning provides a very flexible, (almost?) universal framework for representing information
- Effective end-to-end joint system learning
- speech recognition: not good; visual perception: good; question answering: good
4 + 2 = 6 neurons (not counting inputs); biases: same number, one per non-input neuron
[3 x 4] + [4 x 2] = 20 weights (for a 3-4-2 network)
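A quick arithmetic check of those counts (Python sketch; the 3-4-2 layer sizes are the ones implied by the numbers above):

```python
# Parameter count for a 3 -> 4 -> 2 fully connected network.
layer_sizes = [3, 4, 2]  # inputs, hidden, output

weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
biases = sum(layer_sizes[1:])  # one bias per non-input neuron

print(weights)  # (3*4) + (4*2) = 20
print(biases)   # 4 + 2 = 6
```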

Optimize (min. or max.) the objective/cost function J(θ); generate an error signal that measures the difference between predictions and target values
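A minimal sketch of that loop, assuming a mean-squared-error J(θ) on a toy 1-D linear model (all names and data here are illustrative):

```python
import numpy as np

# Toy data: y = 2x, so the optimal theta is 2.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

theta = 0.0
lr = 0.05
for _ in range(100):
    pred = theta * x
    error = pred - y               # error signal: prediction minus target
    grad = 2 * np.mean(error * x)  # dJ/dtheta for J = mean squared error
    theta -= lr * grad             # minimize J by stepping against the gradient

print(round(theta, 3))  # ~2.0
```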
naive word representation: the input x0 holds raw character codes, e.g. "tree" = 74 72 65 65 in hex ASCII
but this representation is poor
so we need a new representation of words
WordNet: contains lists of synonyms/hypernyms ← built using human labor (manual annotation)
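For reference, WordNet can be queried from Python through NLTK (a sketch; assumes `pip install nltk` and `nltk.download('wordnet')` have been run):

```python
from nltk.corpus import wordnet as wn

# Synonyms: lemma names grouped into synsets.
for syn in wn.synsets("good")[:3]:
    print(syn.name(), syn.lemma_names())

# Hypernyms: "is-a" parents of a synset.
dog = wn.synsets("dog")[0]
print(dog.hypernyms())  # e.g. [Synset('canine.n.02'), ...]
```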
A word's meaning is given by the words that frequently appear close by (distributional hypothesis)
Context: the set of words that appear nearby within a fixed-size window (e.g., 5 words before/after)
co-occurrence vectors: dimension = number of words, so the full matrix is (number of words) × (number of words)
memory O(n^2): poor
so we use dimensionality reduction (PCA, SVD) ← almost all values are 0 (sparse), so we can compress with little loss; see the sketch below
→ word embeddings: similarity can now be computed with vector (dot) products (dense vectors, instead of one-hot vectors)
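A sketch of the count-then-reduce pipeline, assuming a tiny toy corpus and a ±1 context window (both made up for illustration):

```python
import numpy as np

corpus = ["I like deep learning".split(),
          "I like NLP".split(),
          "I enjoy flying".split()]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# |V| x |V| co-occurrence counts within a +/-1 window (mostly zeros: sparse).
M = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if i != j:
                M[idx[w], idx[sent[j]]] += 1

# SVD-based dimensionality reduction: keep the top k singular vectors.
U, S, Vt = np.linalg.svd(M)
k = 2
embeddings = U[:, :k] * S[:k]  # dense k-dim word vectors
print(embeddings.shape)        # (|V|, 2)
```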
We can also build embeddings that connect different languages (cross-lingual embeddings)
vector similarity (cosine)
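Cosine similarity written out (numpy sketch):

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity: dot product of the normalized vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # ~0.707
```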

Language Modeling: models of P(text), i.e., of a sentence
→ scores a sentence

example: add up all the word features (linear features)
usually apply softmax to the sum

softmax is applied just before the output (see the softmax sketch below)
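A standard softmax implementation (numpy sketch):

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; exponentiate; normalize.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1.0
```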

1. P(text) - linear model
CBOW
Predict a word based on the sum of the surrounding words' embeddings
Skip-gram
Predict each word in the context given the (center) word
Linear models can't learn feature combinations (see the sketch below)
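To make "sum of surrounding embeddings" concrete, a minimal CBOW-style scorer (numpy sketch; sizes and the random untrained weights are illustrative, and a real model learns E and W by gradient descent):

```python
import numpy as np

vocab_size, dim = 100, 16
rng = np.random.default_rng(0)
E = rng.normal(size=(vocab_size, dim))   # word embedding matrix
W = rng.normal(size=(dim, vocab_size))   # output projection

def cbow_scores(context_ids):
    # CBOW: sum the context embeddings, then score every vocab word.
    h = E[context_ids].sum(axis=0)
    return h @ W                         # usually followed by softmax

scores = cbow_scores([3, 17, 42, 8])
print(scores.shape)  # (100,)
```

Skip-gram reverses the direction: it uses the center word's embedding to score each context word. Note the model above is purely linear in the summed embeddings, which is why it can't capture feature combinations.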
2. Models of P(label | text)
BOW
each word has its own 5 elements, one per label [very good, good, neutral, bad, very bad]
DeepCBOW
Combination features - each vector has "features" (e.g. is this an animate object? is this a positive word? etc.), and nonlinear layers can combine them (sketch below)
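A DeepCBOW sketch: the same summed embeddings, but with a nonlinear hidden layer on top so combination features can be learned (illustrative numpy, untrained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, hidden, n_labels = 100, 16, 32, 5
E = rng.normal(size=(vocab_size, dim))
W1 = rng.normal(size=(dim, hidden))
W2 = rng.normal(size=(hidden, n_labels))

def deep_cbow_scores(word_ids):
    h = E[word_ids].sum(axis=0)  # bag of embeddings
    h = np.tanh(h @ W1)          # nonlinearity -> combination features
    return h @ W2                # scores over the 5 labels

print(deep_cbow_scores([3, 17, 42]).shape)  # (5,)
```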
Convolutional Networks
x/ CNN - pooling
weak as a long-distance feature extractor
doesn't have a holistic view of the sentence
x/ RNN
good as a long-distance feature extractor
weakness - indirect passing of information, so credit assignment is more difficult
can be slow, due to incremental processing
Modeling Sentences w/ n-grams (see the bigram sketch below)
P(text | text)
RNNs are frequently used in language modeling since they can capture long-distance dependencies
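For contrast with the RNN, a count-based bigram model of P(text) (toy corpus; no smoothing, so unseen bigrams get probability 0):

```python
from collections import Counter

corpus = ["<s> I like NLP </s>", "<s> I like deep learning </s>"]
bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    toks = sent.split()
    unigrams.update(toks[:-1])          # counts of each history word
    bigrams.update(zip(toks, toks[1:])) # counts of adjacent word pairs

def p_sentence(sent):
    # P(text) as a product of P(cur | prev) maximum-likelihood estimates.
    toks = sent.split()
    p = 1.0
    for prev, cur in zip(toks, toks[1:]):
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p

print(p_sentence("<s> I like NLP </s>"))  # 0.5: "like" -> "NLP" half the time
```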