	by moyix 1944 days ago
	Yep, I trained my own BPE tokenizer using Hugging Face's tokenizers library. During training I didn't keep the entire dataset in memory, because even on an RTX 8000 the full dataset + model weights + the optimizer state (Adam) is too big to fit.
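
	For anyone wondering what "not keeping the entire dataset in memory" can look like, here is a minimal sketch. It is not the code used for the training run described above. It assumes the corpus has already been tokenized and written to disk as a flat binary file of little-endian uint16 token IDs (so a vocabulary under 65,536 entries); the file layout and the names `iter_token_blocks` and `write_demo_file` are hypothetical. The generator reads one fixed-size block at a time, so only the current block is resident.

```python
import os
import struct
import tempfile
from typing import Iterator, List

# Assumption: token IDs are stored as little-endian uint16 (vocab size < 65536).
BYTES_PER_TOKEN = 2


def iter_token_blocks(path: str, block_size: int) -> Iterator[List[int]]:
    """Yield consecutive blocks of `block_size` token IDs, one block at a time.

    Only the current block is held in memory; a trailing partial block is dropped.
    """
    chunk_bytes = block_size * BYTES_PER_TOKEN
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk_bytes)
            if len(buf) < chunk_bytes:
                break  # end of file, or an incomplete final block
            yield list(struct.unpack(f"<{block_size}H", buf))


def write_demo_file(n_tokens: int) -> str:
    """Write the token IDs 0..n_tokens-1 to a temporary file (demo data only)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(struct.pack(f"<{n_tokens}H", *range(n_tokens)))
    return path


# Demo: 10 tokens in blocks of 4 gives two full blocks; the last 2 tokens are dropped.
demo_path = write_demo_file(10)
blocks = list(iter_token_blocks(demo_path, block_size=4))
os.remove(demo_path)
```

	In a real training loop each yielded block would be turned into a tensor and moved to the GPU just before the forward pass, which leaves the card's memory for the weights, gradients, and optimizer state.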