by windsignaling
495 days ago
I'm not a fan of these "gotchas" because they don't test for what we really care about. Like counting the number of R's in strawberry, many of these are character-counting or character-manipulation problems, which tokenization is poorly suited to. I'm sure an engineer could come up with a clever way to train for this, but that seems like optimizing for the wrong thing. IMO these questions go in the wrong direction. Character permutation is a problem for "Software 1.0", not LLMs. You wouldn't use an LLM to multiply two large numbers; you'd use a calculator.
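For illustration, here's roughly what the "Software 1.0" route looks like, as a minimal Python sketch. The helper name `count_char` and the two large integers are made up for this example; the point is only that conventional code works on characters and digits directly, so tokenization never comes into it.

```python
def count_char(text: str, ch: str) -> int:
    """Count case-insensitive occurrences of a single character in text."""
    return text.lower().count(ch.lower())


# Character counting: operates on characters, not tokens.
print(count_char("strawberry", "r"))  # 3

# Large-number multiplication: Python ints are arbitrary precision,
# so the product is exact. Both operands are arbitrary example values.
a = 123456789012345678901234567890
b = 987654321098765432109876543210
print(a * b)
```

Neither task needs a model at all, which is the sense in which these are calculator problems.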
|
Imagine a model that isn't sure if 9.11 is greater than 9.9 - which is difficult to reason about, because tokens.
Could such a model coach kids in math? Could it proofread a paper, or sense-check a business plan? Could it summarise a long document about carbon emissions? Could it generate a GUI? Could it spot mistakes in an OCRed document? Spot an off-by-one error or divide-by-zero in computer code?