Hacker News


	by thomasahle 763 days ago
	Maybe you could just use a good old 1D CNN for the bottom 3-4 layers. By then, the model will have been able to combine characters into roughly token-length chunks anyway. Just make sure to have some big MLPs at the start too, to enrich the "tokens" with the information currently stored in the embedding tables.
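	A minimal pure-Python sketch of the idea, under stated assumptions: bytes are looked up in a small 256-row embedding table, three 1D convolutions shrink the sequence about 4x so each output position covers a roughly subword-sized span, and two wide position-wise MLPs then enrich those pseudo-tokens. The stride-2 downsampling, all sizes, the placement of the MLPs right after the convs, and every name (`char_frontend`, `Conv1D`, `MLP`) are hypothetical choices for illustration. The weights are random and untrained; a real model would learn them and use a tensor library.

```python
import math
import random

def make_weights(rows, cols, rng):
    # Random uniform weights scaled by 1/sqrt(fan_in); untrained, illustration only.
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    return [sum(wi * xi for wi, xi in zip(row, v)) for row in w]

class Conv1D:
    """Strided 1D convolution over a sequence of vectors (no padding)."""
    def __init__(self, in_dim, out_dim, kernel, stride, rng):
        self.kernel, self.stride = kernel, stride
        # One matrix maps the flattened window (kernel * in_dim values) to out_dim.
        self.w = make_weights(out_dim, kernel * in_dim, rng)

    def __call__(self, seq):
        out = []
        for start in range(0, len(seq) - self.kernel + 1, self.stride):
            window = [x for vec in seq[start:start + self.kernel] for x in vec]
            out.append(relu(matvec(self.w, window)))
        return out

class MLP:
    """Position-wise two-layer MLP with a wide hidden layer and a residual add."""
    def __init__(self, dim, hidden, rng):
        self.w1 = make_weights(hidden, dim, rng)
        self.w2 = make_weights(dim, hidden, rng)

    def __call__(self, seq):
        return [[a + b for a, b in zip(v, matvec(self.w2, relu(matvec(self.w1, v))))]
                for v in seq]

def char_frontend(text, dim=16, hidden=64, seed=0):
    """Hypothetical byte-level front end: small embedding -> 1D CNN -> wide MLPs."""
    rng = random.Random(seed)
    # Tiny byte embedding table (256 rows) in place of a large subword table.
    embed = make_weights(256, dim, rng)
    seq = [embed[b] for b in text.encode("utf-8")]
    # Bottom layers: one stride-1 conv, then two stride-2 convs (~4x shorter overall).
    convs = (Conv1D(dim, dim, 3, 1, rng), Conv1D(dim, dim, 2, 2, rng), Conv1D(dim, dim, 2, 2, rng))
    for conv in convs:
        seq = conv(seq)
    # Wide MLPs to enrich each pseudo-token before the transformer layers.
    for _ in range(2):
        seq = MLP(dim, hidden, rng)(seq)
    return seq

tokens = char_frontend("hello world, this is a test")
print(len(tokens), len(tokens[0]))
```

	For the 27-byte example string this yields 6 vectors of width 16, i.e. about 4.5 bytes per position, which would then be fed to the usual transformer layers. Fixed strides ignore word boundaries; that is a simplification of this sketch, not something the comment specifies.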