
GZIP is all you need

Creator
Seonglae Cho
Created
2023 Apr 30 9:27
Editor
Seonglae Cho
Edited
2024 Apr 22 6:15

Good for text classification

The method pairs gzip compression with a k-nearest-neighbor classifier. It achieves results competitive with non-pretrained deep learning methods on six in-distribution datasets and outperforms BERT on all five out-of-distribution datasets, including four low-resource languages.
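
The gzip plus kNN idea can be sketched in a few lines of Python. This is a minimal illustration, not the script from the paper: it assumes the distance between two texts is the normalized compression distance (NCD) computed from gzip-compressed lengths, and the helper names `compressed_len`, `ncd`, and `classify`, along with the default `k=3`, are hypothetical choices made for this example.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts (assumed metric)."""
    cx, cy = compressed_len(x), compressed_len(y)
    # Texts that share content compress better when concatenated.
    cxy = compressed_len(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training texts closest to query by NCD.

    train is a list of (text, label) pairs; k=3 is an arbitrary default.
    """
    nearest = sorted(train, key=lambda pair: ncd(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Calling `classify(query, train, k)` compresses every training text once per query, so the cost grows linearly with the training set size. That is acceptable for a small experiment but would likely need caching of compressed lengths on larger datasets.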
David Sauerwein on LinkedIn
What a fantastic result! A 0-parameter 14-line Python script using gzip compression and K-nearest neighbors outperforms a 345 Million-parameter transformer…
https://www.linkedin.com/feed/update/urn:li:activity:7085349430819700737/
aclanthology.org
https://aclanthology.org/2023.findings-acl.426.pdf
Luke Gessler | @lgessler@lingo.lol on Twitter
this paper's nuts. for sentence classification on out-of-domain datasets, all neural (Transformer or not) approaches lose to good old kNN on representations generated by.... gzip (@LukeGessler, July 12, 2023)
https://twitter.com/LukeGessler/status/1679211291292889100
 

Copyright Seonglae Cho