by immibis
538 days ago
NP is a class of decision problems - problems with yes/no answers. Strictly speaking, saying that it's NP-complete to find the tokenizer that produces the fewest symbols is meaningless. You have to convert it to the form "is there a tokenizer that produces fewer than N symbols?" before it even makes sense to ask whether it's NP-complete.

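To make the optimization-versus-decision distinction concrete, here is a rough sketch on a toy problem of my own choosing, not the formulation from the paper. It assumes a "tokenizer" is just a set of `k` extra multi-character tokens (single characters are always allowed), and the names `segment_count`, `exists_tokenizer`, `fewest_symbols` and the `max_len` cap are all made up for illustration. The decision version uses "at most `limit` symbols", which is the same as "fewer than N" with N shifted by one. The decision question is answered by brute force, and the optimization answer is then recovered by binary search over the bound:

```python
from itertools import combinations


def segment_count(text, vocab):
    """Fewest symbols needed to encode text using vocab plus single characters."""
    n = len(text)
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        best[i] = best[i - 1] + 1  # a single character is always a valid symbol
        for tok in vocab:
            if i >= len(tok) and text[i - len(tok):i] == tok:
                best[i] = min(best[i], best[i - len(tok)] + 1)
    return best[n]


def exists_tokenizer(text, k, limit, max_len=2):
    """Decision version: can some set of k tokens encode text in <= limit symbols?

    Brute force over all candidate vocabularies, so exponential in general.
    """
    candidates = sorted({
        text[i:i + length]
        for length in range(2, max_len + 1)
        for i in range(len(text) - length + 1)
    })
    return any(
        segment_count(text, vocab) <= limit
        for vocab in combinations(candidates, k)
    )


def fewest_symbols(text, k, max_len=2):
    """Optimization version, answered using only yes/no queries."""
    lo, hi = 0, len(text)  # encoding with single characters always fits in len(text)
    while lo < hi:
        mid = (lo + hi) // 2
        if exists_tokenizer(text, k, mid, max_len):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The binary search only asks a logarithmic number of yes/no questions, which is why hardness statements about the optimization problem are normally phrased in terms of the decision problem.
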
There might be plenty of perfectly "good" tokenizers (whatever that ends up meaning) that can be found or generated without formulating their design as an NP-complete decision problem. Claiming "tokenization is NP-complete" (paper title) in general seems like an overstatement.