Skip-gram maximizes the probability of each context word given the current centre word.
The positions of the surrounding words do not matter: every (centre, context) pair within the window is treated identically.
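A minimal sketch of how such position-agnostic (centre, context) pairs are generated; the function name generate_pairs, the toy sentence, and the window size are illustrative assumptions, not part of the original notes:

```python
def generate_pairs(tokens, window=2):
    """Yield (centre, context) training pairs for skip-gram.

    Every word inside the window counts equally -- its position
    relative to the centre word is ignored.
    """
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield centre, tokens[j]

pairs = list(generate_pairs("the quick brown fox jumps".split()))
# e.g. ('brown', 'the'), ('brown', 'quick'), ('brown', 'fox'), ...
```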
Skip-gram with negative sampling implicitly factorizes a word-context matrix whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant (log k, where k is the number of negative samples).
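Concretely, in Levy & Goldberg's formulation (the paper referenced at the end of this section), the matrix M being factorized into word vectors w and context vectors c satisfies:

```latex
M_{ij} \;=\; \vec{w}_i \cdot \vec{c}_j
      \;=\; \mathrm{PMI}(w_i, c_j) - \log k
      \;=\; \log \frac{P(w_i, c_j)}{P(w_i)\,P(c_j)} - \log k
```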
Has better modeling capability for rare words, which results in slightly better performance than CBOW.
Works well with small amounts of training data and represents even rare words or phrases well.
skip-trigram
"A… B C" [source]... [destination][out]
Negative Sampling
Negative sampling changes the objective function, since the original softmax objective is a naive and computationally expensive way to do parameter inference: it normalizes over the entire vocabulary. The new objective drives the dot product with observed context words high and the dot product with randomly drawn negative samples low.
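A minimal NumPy sketch of that objective for a single training pair; the function name sgns_loss, the embedding dimensions, and the random initialisation are hypothetical choices for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(centre_vec, context_vec, neg_vecs):
    """Negative-sampling loss for one (centre, context) pair.

    centre_vec, context_vec: (d,) embedding vectors
    neg_vecs: (k, d) embeddings of k randomly drawn negative samples
    """
    pos = np.log(sigmoid(centre_vec @ context_vec))      # high dot product with context
    neg = np.log(sigmoid(-neg_vecs @ centre_vec)).sum()  # low dot product with negatives
    return -(pos + neg)                                  # minimized during training

rng = np.random.default_rng(0)
d, k = 50, 5
loss = sgns_loss(rng.normal(size=d) * 0.1,
                 rng.normal(size=d) * 0.1,
                 rng.normal(size=(k, d)) * 0.1)
```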
Neural Word Embedding as Implicit Matrix Factorization (Levy & Goldberg, 2014)