| HN Mirror



	by localhost 747 days ago
	This is a giant dataset of 536GB of embeddings. I wonder how much compression is possible by training or fine-tuning a transformer model directly using these embeddings, i.e., no tokenization/decoding steps? Could a 7B or 14B model "memorize" Wikipedia?
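	For a rough sense of scale, here is a back-of-the-envelope sketch of the compression ratio the question implies. Only the 536GB figure and the 7B/14B model sizes come from the comment; the 2 bytes per parameter (fp16/bf16 weights), the decimal GB convention, and the helper names are assumptions made for illustration.

```python
# Back-of-the-envelope: how much smaller would the model weights be than
# the 536GB embedding dataset? Assumes 2 bytes per parameter (fp16/bf16)
# and 1 GB = 1e9 bytes; both are assumptions, not from the comment.

DATASET_GB = 536  # dataset size quoted in the comment


def model_size_gb(params_billions, bytes_per_param=2):
    """Weight size in GB: (N * 1e9 params * bytes each) / 1e9 bytes per GB."""
    return params_billions * bytes_per_param


def compression_ratio(params_billions, bytes_per_param=2):
    """Dataset size divided by model weight size."""
    return DATASET_GB / model_size_gb(params_billions, bytes_per_param)


for p in (7, 14):
    print(f"{p}B model: {model_size_gb(p)} GB of weights, "
          f"about {compression_ratio(p):.1f}x smaller than the dataset")
# 7B model: 14 GB of weights, about 38.3x smaller than the dataset
# 14B model: 28 GB of weights, about 19.1x smaller than the dataset
```

	Under these assumptions, "memorizing" the dataset would mean roughly 19x to 38x lossy compression of the embeddings. This says nothing about whether a model could actually reach that fidelity; it only sizes the gap.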