Hacker News
by ndnichols 1205 days ago
I just had this interaction with ChatGPT.

Me: Reverse the digits of 12+39

ChatGPT: The sum of 12 and 39 is 51. If you reverse the digits, you get 15.

Me: Reverse the digits of 12 + 84. Only respond with the reversed digits, no explanation

ChatGPT: The reversed digits of 12 + 84 are 96.

That makes me think longer explanations give it more of a chance to "think", because it gets more passes through the model. Weird!
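The "more passes" intuition matches how autoregressive decoding works: the model runs once per generated token, conditioned on everything produced so far. The sketch below only counts those passes. `make_canned_model` is a hypothetical stand-in that replays a fixed answer instead of running a real network, and the tokenization is made up.

```python
from typing import Callable, List, Tuple

def generate(prompt: List[str], next_token: Callable[[List[str]], str],
             max_tokens: int = 50) -> Tuple[List[str], int]:
    """Greedy autoregressive loop: every new token costs one more call to the model."""
    tokens = list(prompt)
    passes = 0
    for _ in range(max_tokens):
        nxt = next_token(tokens)  # the model is conditioned on the whole prefix each time
        passes += 1
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return tokens[len(prompt):], passes

def make_canned_model(prompt_len: int, answer: List[str]) -> Callable[[List[str]], str]:
    """Hypothetical stand-in for a real model: replays a fixed answer, then <eos>."""
    def next_token(tokens: List[str]) -> str:
        i = len(tokens) - prompt_len  # answer tokens emitted so far
        return answer[i] if i < len(answer) else "<eos>"
    return next_token

# Made-up tokenization; real tokenizers split text differently.
prompt = ["Reverse", " the", " digits", " of", " 12", " +", " 84"]
terse = ["69"]
verbose = ["12", " +", " 84", " =", " 96", ",", " reversed", " is", " 69"]

terse_out, terse_passes = generate(prompt, make_canned_model(len(prompt), terse))
verbose_out, verbose_passes = generate(prompt, make_canned_model(len(prompt), verbose))
print(terse_passes, verbose_passes)  # 2 10
```

Real implementations cache attention keys and values so earlier tokens are not recomputed, but each output token still needs its own pass, so a longer answer means more passes.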


It's never going to be great at math problems; it is a language model.
I wonder if ChatGPT could somehow be wired up to https://www.wolfram.com/ to shore up that weakness?
Yes. And if you give it a database schema, it can answer free-form questions about the data by generating SQL queries, as long as you wire up the results (or just copy/paste them manually). It does sometimes hallucinate fields in tables, but if your wiring reports errors in a readable way, it will usually self-correct.
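A minimal sketch of that loop using Python's built-in `sqlite3`. `fake_model` is a hypothetical stand-in for a real LLM call: its first query hallucinates a `total` column, and once the readable error is fed back it corrects itself. The schema, data, and retry limit are invented for illustration.

```python
import sqlite3
from typing import Callable, List, Tuple

SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"

def fake_model(question: str, schema: str, errors: List[str]) -> str:
    """Hypothetical LLM stand-in: hallucinates a column first, fixes it after feedback."""
    if not errors:
        return "SELECT customer, SUM(total) FROM orders GROUP BY customer ORDER BY customer"
    return "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"

def answer(question: str, conn: sqlite3.Connection,
           model: Callable[[str, str, List[str]], str],
           max_attempts: int = 3) -> Tuple[List[tuple], List[str]]:
    """Ask for SQL, run it, and feed any database error back to the model."""
    errors: List[str] = []
    for _ in range(max_attempts):
        sql = model(question, SCHEMA, errors)
        try:
            return conn.execute(sql).fetchall(), errors
        except sqlite3.Error as exc:
            # A readable error message is what lets the model self-correct.
            errors.append(f"{sql!r} failed: {exc}")
    raise RuntimeError("no valid query after retries: " + "; ".join(errors))

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                 [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)])
rows, errors = answer("How much has each customer spent?", conn, fake_model)
print(rows)    # [('alice', 12.5), ('bob', 5.0)]
print(errors)  # one "no such column: total" error from the first attempt
```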

I think the most interesting potential development of this concept would be to give it the ability to spawn child instances to process subtasks (so that each subtask gets its own token window!) and produce intermediate results that it would then combine. It can be done manually (copy/paste) with a lot of hand-holding; the trick is automating it, so that it's clear which part of the output is a request to spawn a submodel (plus its prompt), and the result is communicated back in a way that's clear to the model.
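One way to make spawn requests unambiguous is a line-based convention the parent model is prompted to follow. In this sketch the `SPAWN:` prefix, the `RESULT n:` reply format, and the `fake_llm` stub are all hypothetical; a real version would call an LLM API and start each child from a fresh context containing only its sub-prompt.

```python
import re
from typing import Callable

SPAWN = re.compile(r"^SPAWN:\s*(.+)$", re.MULTILINE)

def run(prompt: str, llm: Callable[[str], str], depth: int = 0, max_depth: int = 2) -> str:
    """Ask the model; if it requests children, run each separately and feed results back."""
    output = llm(prompt)
    subprompts = SPAWN.findall(output)
    if not subprompts or depth >= max_depth:
        return output
    # Each child sees only its own sub-prompt, so it gets its own token window.
    results = [run(p, llm, depth + 1, max_depth) for p in subprompts]
    feedback = "\n".join(f"RESULT {i}: {r}" for i, r in enumerate(results, 1))
    # Combine step: the parent sees its original request plus the children's answers.
    return llm(prompt + "\n" + output + "\n" + feedback)

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call, hard-coded for this one task."""
    if "RESULT" in prompt:
        nums = [int(n) for n in re.findall(r"^RESULT \d+: (\d+)$", prompt, re.MULTILINE)]
        return f"Total words: {sum(nums)}"
    if prompt.startswith("Count words in:"):
        return str(len(prompt[len("Count words in:"):].split()))
    return ("SPAWN: Count words in: the quick brown fox\n"
            "SPAWN: Count words in: jumps over the lazy dog")

print(run("Count the words in this long document.", fake_llm))  # Total words: 9
```

The `max_depth` cap is a design choice to stop a model that keeps spawning children from recursing forever.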

Or it could write code in Python and evaluate it; people are experimenting with that sort of thing.
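A bare-bones version of that idea: pull the first fenced Python block out of the model's reply and run it in a separate interpreter process. The `reply` string is a hypothetical model output for the 12 + 84 prompt at the top of the thread. A subprocess with a timeout is not a sandbox; untrusted model output would need real isolation.

```python
import re
import subprocess
import sys

FENCE = "`" * 3  # built up so this example can itself live inside a Markdown code block

def extract_code(reply: str) -> str:
    """Return the body of the first fenced python block in the model's reply."""
    pattern = re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, re.DOTALL)
    if match is None:
        raise ValueError("no python code block in reply")
    return match.group(1)

def run_code(code: str, timeout: float = 5.0) -> str:
    """Run code in a fresh interpreter and return its stdout (not a security sandbox)."""
    proc = subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=timeout)
    if proc.returncode != 0:
        return "error: " + proc.stderr.strip()
    return proc.stdout.strip()

# Hypothetical model reply to "Reverse the digits of 12 + 84".
reply = f"Let me compute that.\n{FENCE}python\nprint(str(12 + 84)[::-1])\n{FENCE}\n"
print(run_code(extract_code(reply)))  # 69
```

Returning the error text instead of raising lets the harness paste it back to the model, the same self-correction loop described for SQL.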
OOoo… Hook that up to the ChatGPT API and let it modify itself with additional code? SkyNet / Matrix here we come!
The amount of compute applied to the problem is roughly linear in the number of input + output tokens. It is hard to tell how much of that compute goes into parsing the problem and building the embedding that represents it, and how much goes into actually solving it.
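A back-of-the-envelope version of the "roughly linear" claim, using the common rule of thumb that a dense transformer costs about 2 × (parameter count) FLOPs per token, ignoring the attention term that grows with context length. The parameter count and token counts are assumptions for illustration, not ChatGPT's real figures.

```python
def approx_flops(n_params: float, input_tokens: int, output_tokens: int) -> float:
    """~2 FLOPs per parameter per token processed; the attention term is ignored."""
    return 2 * n_params * (input_tokens + output_tokens)

N_PARAMS = 175e9     # assumption: GPT-3 scale; ChatGPT's actual size is not public
PROMPT_TOKENS = 20   # assumption: rough length of the prompt

terse = approx_flops(N_PARAMS, PROMPT_TOKENS, 2)     # assumed: answer only
verbose = approx_flops(N_PARAMS, PROMPT_TOKENS, 25)  # assumed: one-sentence explanation
print(f"terse: {terse:.2e} FLOPs, verbose: {verbose:.2e} FLOPs, ratio {verbose / terse:.2f}")
```

Under this approximation the verbose reply gets about twice the compute, but the count says nothing about how that compute splits between understanding the question and solving it.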

And anyway, most of the compute is probably spent judging the social standing of the person asking the question, and whether it's worth bothering to answer ;)