by josh-sematic
661 days ago
Very cool! At Airtrain we’ve also found embeddings can be very valuable for building classification models. If you’re looking to play around with a large amount of text and embeddings, we recently deduped and embedded all of fineweb-edu (also mentioned in the article) and put the resulting dataset on Hugging Face: https://huggingface.co/datasets/airtrain-ai/fineweb-edu-fort...