Understanding Word2Vec

Limitations of One-Hot Encoding

One-hot encoding cannot represent similarity because the distance between every pair of word vectors is identical. Additionally, as the vocabulary grows, the dimension of each vector grows with it, which makes computation inefficient.
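
To see this concretely, here is a minimal sketch with a made-up toy vocabulary: no matter which pair of words we compare, the distance and the similarity come out the same, so the encoding carries no information about meaning.

```python
# Minimal sketch (toy vocabulary is illustrative): every pair of one-hot
# vectors has the same distance and zero cosine similarity.
import numpy as np

vocab = ["king", "queen", "apple"]
one_hot = np.eye(len(vocab))  # each row is one word's one-hot vector

for i in range(len(vocab)):
    for j in range(i + 1, len(vocab)):
        dist = np.linalg.norm(one_hot[i] - one_hot[j])  # always sqrt(2)
        cos = one_hot[i] @ one_hot[j]                   # always 0 (unit vectors)
        print(vocab[i], vocab[j], round(dist, 3), cos)
```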

Word2Vec as a Solution

One solution is to use embeddings instead of one-hot encoding. The main purpose of Word2Vec is to capture the relationships between words more accurately using low-dimensional dense vectors, such as the 2-dimensional vectors used in the examples here (real models typically use a few hundred dimensions).
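
The sketch below uses made-up 2-dimensional embeddings (the numbers are illustrative, not learned) to show what dense vectors buy us: similar words can sit close together, so a similarity measure such as cosine similarity becomes meaningful.

```python
# Made-up 2-dimensional embeddings (illustrative values, not learned).
import numpy as np

embedding = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.8, 0.9]),
    "apple": np.array([-0.7, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embedding["king"], embedding["queen"]))  # close to 1: similar
print(cosine(embedding["king"], embedding["apple"]))  # much lower: dissimilar
```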

How Word2Vec Works

Word2Vec is a method for learning dense vector representations (2-dimensional in this example) from the co-occurrence of neighboring words within the sentences of a given dataset. It rests on the assumption that words appearing in similar neighborhoods are likely to have similar meanings.
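
As a sketch of what "neighboring words" means in practice, the snippet below (assuming a window size of 2) extracts the (center word, neighbor) pairs that Word2Vec trains on.

```python
# Extract (center word, neighboring word) training pairs, window size 2.
sentence = "the quick brown fox jumps".split()
window = 2

pairs = []
for i, center in enumerate(sentence):
    for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
        if j != i:
            pairs.append((center, sentence[j]))

print(pairs[:4])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...]
```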

Simply put, we provide sentences as input. Each word is first one-hot encoded, and the hidden (linear) layer in the middle produces the embedding by multiplying that one-hot vector by its weight matrix, which amounts to selecting a single row of the matrix. Thus, Word2Vec returns 2-dimensional vectors, such as [1, 1].
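
Here is a minimal numpy sketch of that hidden layer. The weights are random stand-ins for values that would be learned during training; the point is that the matrix product with a one-hot vector is just a row lookup.

```python
# Hidden (linear) layer: one-hot vector times weight matrix = one row of the matrix.
import numpy as np

vocab = ["the", "quick", "brown", "fox"]
rng = np.random.default_rng(0)
W = rng.normal(size=(len(vocab), 2))  # vocabulary size x embedding dimension (2)

one_hot = np.zeros(len(vocab))
one_hot[vocab.index("fox")] = 1.0

embedding = one_hot @ W                                 # matrix product...
assert np.allclose(embedding, W[vocab.index("fox")])    # ...is just a row lookup
print(embedding)                                        # a 2-dimensional dense vector
```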

CBOW and Skip-Gram

Word2Vec employs either CBOW (Continuous Bag of Words) or Skip-gram to learn word similarities. CBOW predicts the word in the blank from its neighboring words, while Skip-gram predicts the neighboring words from the word in the blank. Training itself optimizes this prediction task; it does not directly use cosine similarity (angle) or Euclidean distance. Those measures are applied afterward to compare the learned vectors.
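
As a rough sketch using the gensim library (assumed installed), the sg parameter switches between the two variants: sg=0 trains CBOW and sg=1 trains Skip-gram. The tiny corpus is illustrative only, so the resulting similarities will be noisy.

```python
# CBOW (sg=0) vs Skip-gram (sg=1) with gensim; tiny corpus for illustration.
from gensim.models import Word2Vec

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "farmer", "grows", "an", "apple"],
]

cbow = Word2Vec(corpus, vector_size=2, window=2, min_count=1, sg=0)      # CBOW
skipgram = Word2Vec(corpus, vector_size=2, window=2, min_count=1, sg=1)  # Skip-gram

# Cosine similarity is applied afterward, on the learned vectors.
print(cbow.wv.similarity("king", "queen"))
print(skipgram.wv.most_similar("king", topn=2))
```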