	by npodbielski 30 days ago
	Yes, I was thinking about the same approach, because I have a Strix Halo and it slows down with longer context, so keeping the context under 10k tokens would be achievable this way. If this could be done with a small model that runs at >50 tok/s, that would be huge. Unfortunately, I am caught up right now in other projects, at work and otherwise, and have only tried a few dozen prompts to see if this is even achievable.