Hacker News new | ask | show | jobs
by float-trip 924 days ago
That's what I ended up doing (`[Author] username [Title] post title...`)

> Adding new tokens needs a ton of data to train what the token means.

But how much? 300M tokens is fine for a simple version of ChatML with ~4 tokens. Not for 15, at least in my case. How's this relationship scale?

Just trying to offer one datapoint for what doesn't work, with the hedge that I might have just had a bug

1 comments

I don't know how many tokens are required to get good results, because I simply didn't mark mine as "special_tokens" due to the issues that I had read about. I got great results, whereas others who have tried special tokens got pretty poor results. I'm sure there is a magic number, but it's just not been worth it for me to explore that area yet.