Skip-gram maximizes the probability of each context word given the current centre word.
The positions of the surrounding words do not matter: every (centre, context) pair within the window is treated identically.
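A minimal sketch of how such position-agnostic (centre, context) pairs are generated; the function name generate_pairs, the toy sentence, and the window size are illustrative assumptions, not part of the original notes:

```python
def generate_pairs(tokens, window=2):
    """Yield (centre, context) training pairs for skip-gram.

    Every word inside the window counts equally -- its position
    relative to the centre word is ignored.
    """
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield centre, tokens[j]

pairs = list(generate_pairs("the quick brown fox jumps".split()))
# e.g. ('brown', 'the'), ('brown', 'quick'), ('brown', 'fox'), ...
```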
Skip-gram with negative sampling implicitly factorizes a word-context matrix whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant (log k, where k is the number of negative samples).
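Concretely, in Levy & Goldberg's formulation (the paper referenced at the end of this section), the matrix M being factorized into word vectors w and context vectors c satisfies:

```latex
M_{ij} \;=\; \vec{w}_i \cdot \vec{c}_j
      \;=\; \mathrm{PMI}(w_i, c_j) - \log k
      \;=\; \log \frac{P(w_i, c_j)}{P(w_i)\,P(c_j)} - \log k
```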
Has better modeling capability for rare words, which results in slightly better performance than CBOW.
Works well with small amounts of training data and represents even rare words or phrases well.
skip-trigram
"A… B C" [source]... [destination][out]
Negative Sampling
Negative sampling changes the objective function, since the original softmax objective is a naive and computationally expensive way to do parameter inference: it normalizes over the entire vocabulary. The new objective drives the dot product with observed context words high and the dot product with randomly drawn negative samples low.
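A minimal NumPy sketch of that objective for a single training pair; the function name sgns_loss, the embedding dimensions, and the random initialisation are hypothetical choices for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(centre_vec, context_vec, neg_vecs):
    """Negative-sampling loss for one (centre, context) pair.

    centre_vec, context_vec: (d,) embedding vectors
    neg_vecs: (k, d) embeddings of k randomly drawn negative samples
    """
    pos = np.log(sigmoid(centre_vec @ context_vec))      # high dot product with context
    neg = np.log(sigmoid(-neg_vecs @ centre_vec)).sum()  # low dot product with negatives
    return -(pos + neg)                                  # minimized during training

rng = np.random.default_rng(0)
d, k = 50, 5
loss = sgns_loss(rng.normal(size=d) * 0.1,
                 rng.normal(size=d) * 0.1,
                 rng.normal(size=(k, d)) * 0.1)
```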
Neural Word Embedding as Implicit Matrix Factorization (Levy & Goldberg, 2014)