BoW (Bag of Words)
Assumption: words are independent, order doesn’t matter (Unigram Language model)
Data representation method that ignores word order and looks only at word frequency
Originated from Texture Recognition
Not effective for training models that generate or predict
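A minimal sketch of the frequency-only representation described above, assuming lowercase whitespace tokenization (the notes do not specify a tokenizer). The helper name `bag_of_words` is made up for illustration. Two sentences with opposite meanings end up with the same bag, which is one way to see why the representation is a poor fit for generation or prediction.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count each token (assumed tokenizer)."""
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

print(a["the"])  # 2
print(a == b)    # True: word order is lost, only counts remain
```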
- Represent each word as a one-hot vector: assign every word an id (index)
→ all vectors are mutually orthogonal → similarity between any two different words is 0 → cannot encode semantics
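The orthogonality argument can be checked with a small sketch. The three-word vocabulary and the helper names (`build_vocab`, `one_hot`, `dot`) are assumptions for illustration, not part of the notes. The dot product of "cat" with "kitten" is the same as with "car", so the vectors carry no notion of relatedness.

```python
def build_vocab(tokens):
    """Assign each distinct word an integer id in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def one_hot(word, vocab):
    """Vector of zeros with a single 1 at the word's id."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))

vocab = build_vocab("cat kitten car".split())
cat, kitten, car = (one_hot(w, vocab) for w in ("cat", "kitten", "car"))

print(dot(cat, kitten))  # 0: related words look unrelated
print(dot(cat, car))     # 0: same score as for an unrelated word
print(dot(cat, cat))     # 1: a word only matches itself
```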
Look at the distribution of the data to judge what kind of corpus it is