Deep Learning Math


Created: 2019 Nov 5 5:18
Edited: 2023 Mar 26 5:11

Sub-field of ML: learning representations of data

Existing ML uses manually designed features
  • often over-specified and incomplete
  • take a long time to design and validate
DL
  • Learned features are easy to adapt and fast to train
  • Deep learning provides a very flexible, (almost?) universal, learnable framework for representing information
  • Effective end-to-end joint system learning
 
speech recognition: not good; visual perception: good; question answering: good
 
4 + 2 = 6 neurons (not counting inputs) - the biases have the same count (one for each resulting node)
[3 x 4] + [4 x 2] = 12 + 8 = 20 weights
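A minimal sketch of where those counts come from, assuming the fully connected layer sizes 3 → 4 → 2 implied by the numbers above:

```python
# Parameter counts for a fully connected net with layer sizes 3 -> 4 -> 2
# (sizes assumed from the counts above; the inputs are not counted as neurons).
layer_sizes = [3, 4, 2]

neurons = sum(layer_sizes[1:])                                       # 4 + 2 = 6
biases = neurons                                                     # one bias per resulting node
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))   # 3*4 + 4*2 = 20

print(neurons, biases, weights)  # 6 6 20
```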
 
 
 
 
 
 
Optimize (minimize or maximize) the objective/cost function 𝐽(𝜃): generate an error signal that measures the difference between predictions and target values
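A minimal sketch of that loop, assuming a toy one-parameter linear model, a squared-error cost as J(θ), and plain gradient descent (all of these are illustrative assumptions):

```python
import numpy as np

# Toy data and model: y_hat = theta * x (assumed for illustration).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
theta, lr = 0.0, 0.05  # initial parameter and learning rate (assumed)

for step in range(200):
    y_hat = theta * x
    error = y_hat - y                 # error signal: predictions vs. target values
    J = 0.5 * np.mean(error ** 2)     # objective/cost function J(theta)
    grad = np.mean(error * x)         # dJ/dtheta
    theta -= lr * grad                # minimize J with gradient descent

print(round(theta, 3))  # approaches 2.0
```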
 
 
 
a word representation could be the raw character codes, e.g. 74 72 65 65 (the hex ASCII codes of "tree") in x0
but that is a poor representation
so we need a new representation for words
 
 
 

WordNet


WordNet: contains lists of synonyms/hypernyms ← built by hand (using human labor)
A word's meaning is given by the words that frequently appear close by

Context: the set of words that appear nearby within a fixed-size window (e.g. 5 words before/after)

Vector dimension = number of words × number of words (the word-word co-occurrence matrix is |V| × |V|)
 
memory is O(n^2) → poor
so we use dimensionality reduction (PCA, SVD) ← almost all values are 0 (the matrix is sparse), so we can reduce it
→ word embeddings: from one-hot vectors to dense vectors; similarity can now be judged with a vector (dot) product
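A minimal sketch of that pipeline (co-occurrence counts → SVD → dense vectors → dot-product/cosine similarity); the tiny corpus, window size 1, and 2 dimensions are illustrative assumptions:

```python
import numpy as np

corpus = [["i", "like", "deep", "learning"],
          ["i", "like", "nlp"],
          ["i", "enjoy", "flying"]]                 # toy corpus (assumed)
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Word-word co-occurrence matrix (|V| x |V|), window = 1 word before/after.
C = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                C[idx[w], idx[sent[j]]] += 1

# Dimensionality reduction with SVD -> dense word embeddings.
U, S, Vt = np.linalg.svd(C)
emb = U[:, :2] * S[:2]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

print(cosine(emb[idx["nlp"]], emb[idx["learning"]]))
```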
 
 
We can also build embeddings that map between other languages
vector similarity: cosine
 
 
 
 

Language Modeling: models of P(text), i.e. the probability of a sentence


scores a sentence
 
for example, add up all the (linear) features
usually a softmax is applied to the sum
 
 
the softmax is applied right before the output
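A minimal softmax sketch; subtracting the max before exponentiating is a standard numerical-stability trick, not something from these notes:

```python
import numpy as np

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    z = scores - np.max(scores)   # for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))
```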
 

1. P(text) - linear model

CBOW

Predict a word based on the sum of the surrounding embeddings
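A minimal CBOW-style sketch, assuming a toy vocabulary and random (untrained) parameters, just to show the mechanics: sum the context embeddings, project to a score per vocabulary word, softmax:

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]        # toy vocabulary (assumed)
idx = {w: i for i, w in enumerate(vocab)}
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))              # word embeddings (untrained)
W = rng.normal(size=(8, len(vocab)))              # output projection

context = ["the", "sat", "on"]                    # surrounding words of a center word
h = sum(E[idx[w]] for w in context)               # CBOW: sum of surrounding embeddings
scores = h @ W                                    # one score per vocabulary word
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print(vocab[int(np.argmax(probs))])               # predicted center word
```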
 

Skip-gram

Predict each word in the context given the word
Linear models can't learn feature combinations

2. Models of P(label | text)

BOW

each word has its own 5 elements corresponding to the labels [very good, good, neutral, bad, very bad]

DeepCBOW

Combination features - each vector has “features” (e.g. is this an animate object? is this a positive word? etc.)
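A minimal DeepCBOW-style sketch, assuming the 5 sentiment labels above, random parameters, and one tanh hidden layer (sizes are illustrative): the nonlinearity is what lets the model pick up feature combinations such as "not ... good":

```python
import numpy as np

labels = ["very good", "good", "neutral", "bad", "very bad"]
vocab = {"movie": 0, "was": 1, "not": 2, "good": 3}       # toy vocabulary (assumed)
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))      # word embeddings
W1 = rng.normal(size=(8, 16))             # hidden layer -> combination features
W2 = rng.normal(size=(16, len(labels)))   # scores over the 5 labels

def predict(words):
    h = sum(E[vocab[w]] for w in words)   # CBOW: sum of word embeddings
    h = np.tanh(h @ W1)                   # nonlinearity: feature combinations
    return labels[int(np.argmax(h @ W2))]

print(predict(["movie", "was", "not", "good"]))
```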
 

Convolutional Networks

CNN - pooling

weak as a long-distance feature extractor
doesn't have a holistic view of the sentence
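A minimal sketch of convolution + pooling over a sequence of word embeddings (window width 3 and all sizes are illustrative assumptions); max-pooling keeps one value per filter, which is why the long-distance / holistic view is limited:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim, n_filters, width = 7, 8, 4, 3      # assumed sizes
X = rng.normal(size=(seq_len, dim))              # embeddings of a 7-word sentence
F = rng.normal(size=(n_filters, width * dim))    # filters over 3-word windows

windows = np.stack([X[i:i + width].ravel() for i in range(seq_len - width + 1)])
feature_maps = windows @ F.T                     # (positions, n_filters)
pooled = feature_maps.max(axis=0)                # max-pooling over positions
print(pooled.shape)                              # (4,) - one feature per filter
```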

RNN

good as a long-distance feature extractor
weakness - indirect passing of information makes credit assignment more difficult
can be slow, due to incremental processing
 

Modeling Sentences w/ n-grams
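A minimal bigram sketch (counts with maximum-likelihood estimates, no smoothing; the toy corpus is an assumption):

```python
from collections import Counter

# Toy corpus for a bigram language model; <s> / </s> mark sentence boundaries (assumed).
sentences = [["<s>", "i", "like", "nlp", "</s>"],
             ["<s>", "i", "like", "deep", "learning", "</s>"]]
unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter((s[i], s[i + 1]) for s in sentences for i in range(len(s) - 1))

def p_sentence(words):
    # P(sentence) = product of P(w | prev) = count(prev, w) / count(prev)
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= bigrams[(prev, w)] / unigrams[prev]
    return p

print(p_sentence(["<s>", "i", "like", "nlp", "</s>"]))  # 1 * 1 * 0.5 * 1 = 0.5
```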

 

P(text | text)

 
Conditional Language Models
Calculating the Probability of a Sentence
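The standard decomposition is the chain rule: P(x_1, …, x_n) = P(x_1) · P(x_2 | x_1) · … · P(x_n | x_1, …, x_{n-1}), so a language model only needs to assign a probability to each next word given the words so far.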
 
RNNs are frequently used in language modeling since they can capture long-distance dependencies
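A minimal sketch of an RNN language-model step, assuming a plain tanh recurrence and random (untrained) parameters: the hidden state is updated from every previous word, which is how long-distance information can be carried forward:

```python
import numpy as np

vocab = ["<s>", "i", "like", "nlp", "</s>"]   # toy vocabulary (assumed)
idx = {w: i for i, w in enumerate(vocab)}
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))          # input embeddings
Wh = rng.normal(size=(8, 8)) * 0.1            # recurrent weights
Wo = rng.normal(size=(8, len(vocab)))         # output projection

def sentence_log_prob(words):
    h, log_p = np.zeros(8), 0.0
    for prev, w in zip(words, words[1:]):
        h = np.tanh(E[idx[prev]] + h @ Wh)    # hidden state summarizes all previous words
        scores = h @ Wo
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        log_p += np.log(probs[idx[w]])        # log P(w_t | w_1 ... w_{t-1})
    return log_p

print(sentence_log_prob(["<s>", "i", "like", "nlp", "</s>"]))
```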
 
 
 
