Hacker News
by devin 1031 days ago
It takes like 2-3 experiences of receiving a confidently wrong answer to downgrade your usage. If you use a refactoring tool to rename and it misses one, you won’t use it again.
3 comments

While that would likely be my experience with a refactoring tool (unless I didn't have a better alternative), that's not my experience with ChatGPT 4. And that's considering I have very little tolerance for buggy software.

There was a period of a few weeks or months in which it seemed like ChatGPT had really degraded to the point of being unusable (although it could have been my biases). However, it seems to be better now (again, my subjective experience).

Sometimes I still catch it making really basic mistakes, but most times I can convince it to correct them (especially if I point them out).

But what's most amazing to me is how ChatGPT is absolutely brilliant at some things, and not just technical or even obscure topics.

Recently, it gave me the most amazing idea for navigating a complex and nuanced social situation I was having difficulty with. And given the constraints of the situation, there was no way I could have gotten that idea otherwise, especially in the allotted time.

So despite its flaws and mistakes, I still find it to be a tremendously useful tool, even if only to point me in the right direction.

Given the fact that OpenAI has constant resources (for any given small span of time) and varying demand (users and query type), it's not crazy to think they dynamically adjust to consume all available resources on their side.

Obviously the base model would be the same, but aren't there +/- flavors they could overlay with extra compute? E.g. multi-pass, additional experts, etc.

The benefits to giving someone an occasional "magic" answer are too great not to.

Have there been any wide studies on same-prompt-different-times?

> So despite its flaws and mistakes, I still find it to be a tremendously useful tool, even if only to point me in the right direction.

Much of this resonates. That said, I get tremendous value simply by writing things down (or dictating them) and replying to my own question. I would expect that a sizable fraction of people have forgotten about these strategies and/or don't use them when they are most useful. For many, there is tremendous muscle memory to run a Hooli search almost on mental autopilot. Who has time to slow down and write a well-conceived question? Or perhaps we should turn it around ... On a longer time horizon, who would want to waste time with poorly-conceived questions?

It is the question that starts the process. So we should ask good questions. Do we? I'd be curious about the usage data OpenAI collects. I do my best to lower expectations about people in general, but I'm confident I'd still be unprepared for the level of thought put into questions.

> But what's most amazing to me is how ChatGPT is absolutely brilliant at some things, and not just technical or even obscure topics.

I'm not amazed in the way you are. I expect a variation in quality across topics and domains and question styles.

> I'm not amazed in the way you are. I expect a variation in quality across topics and domains and question styles.

Yes, I can see that. But over time, you also learn and adapt the prompts to ChatGPT's peculiarities so that it provides more useful output.

Still, I'm sure there are many topics/domains for which it's not useful.

As another anecdote, I'm not a mathematician but at one point I was playing around with proving theorems on a theorem prover.

What I found is that ChatGPT is this paradoxical entity which makes the most elementary math errors all the time (I'm talking third-grade level math mistakes), and yet, it was by far the most useful tool ever in coming up with lots of useful PhD-level ideas and math theorems that would allow me to complete proofs when I was completely stuck (and not just for proofs which it had seen before).

It kept coming up with brilliant ideas and theorems that I didn't even know existed, that were not part of any theorem database of any theorem prover I had seen before (and I've seen the vast majority of them), and that I was never going to find by searching the web or writing things down on a notepad (I know this because I had tried, for days at a time, along with other ideas such as visualizations and simulations).

That's not to say a mathematician wouldn't be aware of them, but I don't have easy access to one, and I was surely not going to pay one given that I was just exploring, mostly for curiosity.

This seems like a paid ad, but I promise you, I have no affiliation whatsoever...

Anecdotes sometimes take a beating, but I happen to like the personal ones. Thanks for sharing.

A quick thought about your success: ChatGPT's imprecision and stochasticity can work in its favor for many creative efforts. Unexpected token connections can have a lot of value in a space where vast numbers of novel directions are worthwhile.

For me, having spent thousands of hours thinking about statistics, ML, logic, and reasoning, ChatGPT is not paradoxical. To me, the human aspect is more interesting; namely, the ways in which people are surprised reveal a tremendous diversity in people's expectations about intelligence, algorithms, and pattern-matching.

For many people, most of the time, basic reasoning is a basic requirement for intelligence. By themselves, sequence to sequence models are not computationally capable of deductive reasoning with an arbitrary number of steps, since that would require recursion (or iteration).

I don't think I've spent nearly as much time as you thinking about these things and I'm not entirely sure I understood your perspective, but I have a couple of reflections for you which perhaps you can comment on:

> By themselves, sequence to sequence models are not computationally capable of deductive reasoning with an arbitrary number of steps, since that would require recursion (or iteration).

Isn't the fact that LLMs perform their inference step by step, where in each step they output only one token, an instance of deductive reasoning with a (potentially) arbitrary number of steps?

I say this because on each inference step, the tokens that were previously generated do become part of the input.
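To make the loop described above concrete, here is a minimal sketch of token-by-token generation in which each newly produced token is appended to the input for the next step. The names `generate` and `toy_counter` are hypothetical, and `toy_counter` is only a stand-in for a model's next-token function, not how any real LLM works.

```python
from typing import Callable, List


def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_steps: int,
    stop: str = "<eos>",
) -> List[str]:
    """Autoregressive loop: every generated token becomes part of the next input."""
    tokens = list(prompt)
    for _ in range(max_steps):
        tok = next_token(tokens)  # one step sees the prompt plus everything generated so far
        if tok == stop:
            break
        tokens.append(tok)
    return tokens


def toy_counter(tokens: List[str]) -> str:
    """Hypothetical stand-in for a model: emits the last number plus one, then stops at 5."""
    last = int(tokens[-1])
    return "<eos>" if last >= 5 else str(last + 1)


print(generate(toy_counter, ["1"], max_steps=10))  # ['1', '2', '3', '4', '5']
```

The iteration lives in the outer `for` loop, not inside `next_token`: a single call does a fixed amount of work, and the number of steps is bounded by `max_steps` (loosely analogous to a context limit).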

At a higher level of abstraction, I'm also thinking about chain-of-thought prompting, in which the LLMs first output the easier-to-deduce steps, then build on these steps to perform more deductive steps up until they finally produce the desired answer [1].

Of course, they have a limited context, but the context can be (and has been) increased. And humans have a limited context as well (except if we consider long-term memory or taking notes, perhaps).

The main difference I see is that in LLM chain-of-thought reasoning, they are currently outputting their intermediate "thoughts" before actually giving the final answer, whereas we humans are capable of silencing ourselves before actually having figured out the answer, which we then "output" as speech [2].

So I think there is still a form of recursion or iteration happening in LLMs, it's just that it's in a somewhat limited form in that we are observing it as it happens, i.e. as they output tokens one-by-one.

That said, something that I think could really make LLMs take a big step forward would be to have something akin to long-term memory. And the other big step would probably be being able to learn continuously, rather than only during their training. These two potential steps might even be the same thing.

So I don't know. I'm obviously not an expert but these are my thoughts with regard to what you've just said.

[1] https://ai.googleblog.com/2022/05/language-models-perform-re...

[2] Interestingly, there have been studies that show that humans produce micro-speech patterns when we are thinking, i.e. as if we are really speaking, although imperceptibly. That said, I have no idea how trustworthy these studies are.

Edit: added a clarification at the beginning.

First, I hope that my estimate of hours input into my brain didn't come across as boastful. I'm still working on the balancing act of stating my experience so people get my point of view without sounding arrogant. In this case, I should have also said that thinking about anything long enough can sometimes cause some of the wonder to fade. Luckily, though, for me, the curiosity remains, just focused in different directions.

Second, your comment above covers the ground I was referring to regarding deduction. It seems like we're on the same page. The main difference may be where one draws the lines. When I said "by themselves sequence to sequence models..." I was excluding algorithms that chain language models together in various ways.

Not too long ago, when people said "AI" that tended to refer to algorithms like forward chaining over a set of facts.
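For anyone who hasn't seen it, forward chaining can be sketched in a few lines. This is a minimal illustration, assuming rules are simple (premises, conclusion) pairs over string facts; the function name and the example facts are made up for illustration.

```python
from typing import Iterable, Set, Tuple

Rule = Tuple[Tuple[str, ...], str]  # (premises, conclusion)


def forward_chain(facts: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Repeatedly fire rules whose premises are all known until nothing new is derived."""
    known = set(facts)
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True  # a new fact may enable other rules, so loop again
    return known


# Hypothetical example facts and rules.
rules = [
    (("socrates_is_human",), "socrates_is_mortal"),
    (("socrates_is_mortal", "mortals_die"), "socrates_dies"),
]
derived = forward_chain({"socrates_is_human", "mortals_die"}, rules)
print(sorted(derived))
```

Note the explicit `while` loop: the number of deduction steps is not fixed in advance, which is the kind of iteration discussed above.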

> That said, something that I think could really make LLMs take a big step forward would be to have something akin to long-term memory.

Yes. There is significant work in this direction.

When I was doing a lot of C++ gamedev, we were definitely doing a lot of stuff that would trip up static analysis, e.g. X-macros.

We would still use refactoring tools even though they would often miss stuff. You just rely on a combination of refactoring tool / search and replace / the compiler.

We would also debug our code in release mode with symbols. You get used to a debugging environment where you don't trust anything you're seeing in variables, etc. too.

Depends on what you expect it to be capable of, given the limitations of these systems.