| HN Mirror

by photonthug 282 days ago

From the paper abstract,

> (4) we derive the optimal chain-of-thought length as [..math..] with explicit constants

I know we probably have to dive into math and abandon metaphor and analogy, but the whole structure of a claim like this just strikes me as bizarre.

Chain-of-thought always makes me think of that old joke. Alexander the Great was a great general. Great generals are forewarned. Forewarned is forearmed. Four is an odd number of arms to have. Four is also an even number. And the only number that is both odd and even is infinity. Therefore, Alexander, the great general, had an infinite number of arms.

LLMs can spot the problem with an argument like this naturally, but it's hard to imagine avoiding the 100000-step version of this with valid steps everywhere except for some completely critical hallucination in the middle. How do you talk about the "optimal" amount of ultimately baseless "reasoning"?
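
To put a rough number on the 100000-step worry, here is a minimal sketch. It assumes (unrealistically) that every step is valid independently and with the same probability; the function name and the 99.99% figure are made up for illustration and do not come from the paper.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step in a chain is valid.

    Assumption (not from the paper): steps fail independently and all
    share the same per-step accuracy.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Illustrative accuracy only; real per-step error rates are unknown here.
    for steps in (10, 1_000, 100_000):
        p = chain_success_probability(0.9999, steps)
        print(f"{steps:>7} steps -> {p:.6f} chance of a fully valid chain")
```

Under that assumption, even a 99.99% reliable step leaves well under a 1% chance that a 100000-step chain contains no bad step at all.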

ep103 282 days ago

Yesterday I used ChatGPT to transform a csv file. Move around a couple of columns, add a few new ones. Very large file.

It got them all right. Except when I really looked through the data, for 3 of the Excel cells, it clearly just made up new numbers. I found the first one by accident; finding the remaining two took longer than it would have taken to modify the file from scratch myself.
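
A check like this can be mechanized. Here is a minimal sketch, assuming both files have a header row and share a unique key column; the function name and the `key` parameter are hypothetical. It compares every column that exists in both files, as raw strings, and reports cells that differ.

```python
import csv
from typing import List, TextIO, Tuple


def mismatched_cells(
    original: TextIO, transformed: TextIO, key: str
) -> List[Tuple[str, str, str, str]]:
    """Report cells that differ between two CSV files.

    Rows are matched on the `key` column (assumed unique). Only columns
    present in both files are compared, and values are compared as raw
    strings, since moving a column should not change its contents.
    Rows missing from the transformed file are not reported.
    Returns (key value, column, original value, transformed value).
    """
    before = {row[key]: row for row in csv.DictReader(original)}
    mismatches = []
    for row in csv.DictReader(transformed):
        source = before.get(row[key])
        if source is None:
            mismatches.append((row[key], key, "<row not in original>", row[key]))
            continue
        for column in sorted(row.keys() & source.keys()):
            if row[column] != source[column]:
                mismatches.append((row[key], column, source[column], row[column]))
    return mismatches
```

Newly added columns are skipped on purpose, because there is nothing in the original to compare them against; those still need a separate check.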

Watching my coworkers blindly trust output like this is concerning.

photonthug 282 days ago

After we fix all the simple specious reasoning of stuff like Alexander-the-great and agree to out-source certain problems to appropriate tools, the high-dimensional analogs of stuff like Datasaurus[0] and Simpson's paradox[1] etc are still going to be a thing. But we'll be so disconnected from the representation of the problems that we're trying to solve that we won't even be aware of the possibility of any danger, much less able to actually spot it.

My take-away re: chain-of-thought specifically is this. If the answer to "LLMs can't reason" is "use more LLMs", and then the answer to problems with that is to run the same process in parallel N times and vote/retry/etc, it just feels like a scam aimed at burning through more tokens.
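
For concreteness, the "run it N times and vote" pattern looks roughly like this. It is a minimal sketch: `sample_answer` is a hypothetical stand-in for one full model call, and the `min_share` threshold is an arbitrary choice.

```python
from collections import Counter
from typing import Callable, Optional


def majority_vote(
    sample_answer: Callable[[], str], n: int, min_share: float = 0.5
) -> Optional[str]:
    """Call `sample_answer` n times and return the most common answer.

    Returns None when the winner's share of the votes does not exceed
    `min_share`, which a caller might treat as "retry or give up".
    """
    if n <= 0:
        raise ValueError("n must be positive")
    counts = Counter(sample_answer() for _ in range(n))
    answer, votes = counts.most_common(1)[0]
    return answer if votes / n > min_share else None
```

Note the cost side: every vote is another full generation, so token usage scales with N. Voting also only helps when the wrong answers disagree with each other; if the samples share the same mistake, the vote just confirms it.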

Hopefully chain-of-code[2] is better in that it's at least trying to force LLMs into emulating a more deterministic abstract machine instead of rolling dice. Trying to eliminate things like code, formal representations, and explicit world-models in favor of implicit representations and inscrutable oracles might be good business but it's bad engineering.

[0] https://en.wikipedia.org/wiki/Datasaurus_dozen

[1] https://towardsdatascience.com/how-metrics-and-llms-can-tric...

[2] https://icml.cc/media/icml-2024/Slides/32784.pdf

dingnuts 282 days ago

> it just feels like a scam aimed at burning through more tokens.

IT IS A SCAM TO BURN MORE TOKENS. You will know when it is no longer a scam when you either:

1) pay a flat price with NO USAGE LIMITS

or

2) pay per token with the ability to mark a response as bullshit & get a refund for those wasted tokens.

Until then: the incentives are the same as a casino's which means IT IS A SCAM.

phs318u 281 days ago

Ding ding ding! We have a winner!

befictious 281 days ago

>it just feels like a scam aimed at burning through more tokens.

I have a growing tin foil hat theory that the business model of LLMs is the same as 1-900-psychic numbers of old.

For just 25¢ 1-900-psychic will solve all your problems in just 5 minutes! Still need help?! No problem! We'll work with you until you get your answers for only 10¢ a minute until you're happy!

eerily similar

jmogly 282 days ago

To me it’s a problem of: if a piece of information is not well represented in the training data, the LLM will always tend towards bad token predictions related to said information. I think the next big thing in LLMs could be figuring out how to tell if a token was just a “fill in” or “guess” vs a well predicted token. That way you can have some sort of governor that can kill a response if it is getting too guessy, or at least provide some other indication that the provided tokens are likely hallucinated.

Maybe there is some way to do it based on the geometry of how the neural net activated for a token, or some other more statistics based approach, idk I’m not an expert.
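
One crude statistics-based version of such a governor can be sketched, assuming the generator hands back a log-probability with each token. The function name, window size, and threshold below are all hypothetical choices, not an established method.

```python
import math
from typing import Iterable, List, Tuple


def governed_stream(
    tokens_with_logprobs: Iterable[Tuple[str, float]],
    window: int = 5,
    min_mean_prob: float = 0.3,
) -> Tuple[List[str], bool]:
    """Pass tokens through until the model looks too "guessy".

    Tracks the mean probability of the last `window` tokens and stops
    as soon as it falls below `min_mean_prob`.
    Returns (tokens kept so far, whether the governor tripped).
    """
    kept: List[str] = []
    recent: List[float] = []
    for token, logprob in tokens_with_logprobs:
        recent.append(math.exp(logprob))  # log-probability -> probability
        if len(recent) > window:
            recent.pop(0)
        if len(recent) == window and sum(recent) / window < min_mean_prob:
            return kept, True
        kept.append(token)
    return kept, False
```

The big caveat is that low token probability and hallucination are not the same thing. A model can be confidently wrong, and it can be legitimately uncertain wherever many phrasings are equally fine, so a signal like this is at best a rough hint.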

photonthug 281 days ago

A related topic you might want to look into here is called nucleus sampling. It's similar to temperature but also different. It's been surprising to me that people don't talk about it more often, and that lots of systems won't expose the knobs for it.
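
For anyone who hasn't seen it, nucleus (top-p) sampling can be sketched in a few lines. This is a toy version over a plain dict of token probabilities; the function names are hypothetical and nothing here is tied to a particular library.

```python
import random
from typing import Dict, List, Optional, Tuple


def nucleus(probs: Dict[str, float], top_p: float) -> List[Tuple[str, float]]:
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches `top_p`, then renormalise so they sum to 1."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return [(token, p / total) for token, p in kept]


def sample_top_p(
    probs: Dict[str, float], top_p: float, rng: Optional[random.Random] = None
) -> str:
    """Draw one token from the truncated, renormalised distribution."""
    rng = rng or random.Random()
    candidates = nucleus(probs, top_p)
    tokens = [token for token, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

The difference from temperature: temperature reshapes the whole distribution but leaves every token possible, while top-p cuts off the unlikely tail entirely, so the long shots can never be drawn.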

weinzierl 282 days ago

It sometimes happens with simple things. I once pasted the announcement for an event into Claude to check for spelling and grammar.

It had a small suggestion for the last sentence and repeated the whole corrected version for me to copy and paste.

Only the last sentence was slightly modified, or so I thought: it had also moved the date of the event in the first sentence by one day.

Luckily I caught it before posting, but it was a close call.

toss1 281 days ago

Yup, I always take editing suggestions and implement them manually, then re-feed the edited version back in for new suggestions if needed. Never let it edit your stuff directly: the risk of stealth random errors sneaking in is too great.

Just because every competent human we know would edit ONLY the specified parts, or move only the specified columns with a cut/paste operation (or similar deterministically reliable operation), does not mean an LLM will do the same. In fact, it seems to prefer to regenerate everything on the fly. NO, just NO.
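
If you do paste a regenerated version back, a mechanical diff catches this kind of stealth change. Here is a minimal sketch using Python's standard `difflib`; the function name is hypothetical, and it compares line by line, so it works best with one sentence per line.

```python
import difflib
from typing import List, Tuple


def stealth_changes(original: str, edited: str) -> List[Tuple[str, List[str], List[str]]]:
    """List every region where the edited text differs from the original.

    Each entry is (kind of change, original lines, edited lines), where
    the kind is "replace", "delete", or "insert".
    """
    before = original.splitlines()
    after = edited.splitlines()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, before[i1:i2], after[j1:j2]))
    return changes
```

Anything in the result that you did not ask for, such as a changed date in a sentence that was supposed to stay untouched, is a red flag.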

K0balt 280 days ago

Tool use seems like a much better solution in theory. I wonder how it works out IRL?

throwawayoldie 282 days ago

> Yesterday I used ChatGPT to transform a csv file. Move around a couple of columns, add a few new ones. Very large file.

I'm struggling with trying to understand how using an LLM to do this seemed like a good idea in the first place.

recursive 282 days ago

When you have a shiny new hammer, everything around you takes on a nail-like aspect.

spongebobstoes 282 days ago

the safe way to do this is to have it write code to transform data, then run the code

I expect future models will be able to identify when a computational tool will work, and use it directly

epiccoleman 281 days ago

I don't mean to be rude, but this sounds like user error. I don't understand why anyone would use an LLM for this - or at least, why you would let the LLM perform the transformation.

If I was trying to do something like this I would ask the LLM to write a Python script, then validate the output by running it against the first handful of rows (like, `head -n 10 thing.csv | python transform-csv.py`).
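
A minimal sketch of what such a `transform-csv.py` could look like, reading from stdin and writing to stdout so it fits the pipe above. The column names and the derived `full_name` column are invented for illustration, since the original file's layout isn't known.

```python
import csv
import sys
from typing import Dict, TextIO

# Hypothetical layout: input has id, first_name, last_name, amount.
# Output reorders those columns and appends one derived column.
OUTPUT_COLUMNS = ["last_name", "first_name", "id", "amount", "full_name"]


def transform_row(row: Dict[str, str]) -> Dict[str, str]:
    """Copy existing values through unchanged and add the new column."""
    out = {col: row[col] for col in ("last_name", "first_name", "id", "amount")}
    out["full_name"] = f"{row['first_name']} {row['last_name']}"
    return out


def transform(src: TextIO, dst: TextIO) -> int:
    """Stream rows from src to dst; returns the number of data rows written."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=OUTPUT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow(transform_row(row))
        count += 1
    return count


if __name__ == "__main__":
    transform(sys.stdin, sys.stdout)
```

Existing values are copied through as strings and never re-typed or regenerated, so a number in the output can only differ from the input if the code explicitly changes it. The script streams row by row, so file size shouldn't matter much.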

There are times when statistical / stochastic output is useful. There are other times when you want deterministic output. A transformation on a CSV is the latter.

ep103 280 days ago

Because it markets and presents itself as deterministic and honest. That's the whole issue. AI is unethically marketed and presented to the public.

epiccoleman 280 days ago

iPod marketing presented them as a device that made you cool. I just used mine to listen to music though.