Hacker News
by akamoonknight 1184 days ago
One thought that came to mind when the author mentioned that GPT-4 seemed to want to use A* was that maybe the problem statement was not formed in a way that would get the best response. I'm not entirely sure if this type of back and forth would be useful, but it struck me as interesting to try.

I put the following prompt into ChatGPT (no GPT-4 access for me yet):

*A problem statement is given as follows:

An agent needs to find the best path on a 10x10 tile grid from their current location to a target location. They have a limited movement range of 5 points. Regular tiles cost 1 point to move through, water tiles cost 2 points to move through. Fire tiles cost 1 point to move through, but they should avoid pathing through them even if it means taking a longer path to their destination (provided the path is still within their limited movement range)

Please point out any parts of the problem statement that require clarification.*

Which returned the following response:

Based on the given problem statement, some potential areas that may require clarification are:

Current location and target location: It is not clear if the current location and the target location are fixed or if they change with each problem instance. If they are fixed, it should be mentioned explicitly. If they are not fixed, it should be specified how they are generated for each problem instance.

Movement range: It is not specified whether the agent's movement range is circular or square. If it is circular, it means that the agent can move to any tile that is within a distance of 5 from its current location. If it is square, it means that the agent can move to any tile that is within a range of 5 tiles in both the horizontal and vertical directions. The shape of the movement range needs to be specified.

Fire tiles: The problem statement mentions that the agent should avoid pathing through fire tiles even if it means taking a longer path to their destination, but it is not clear how the agent knows which tiles are fire tiles. It should be specified how the agent perceives and distinguishes fire tiles from regular tiles.

Tile costs: The problem statement mentions that regular tiles cost 1 point and water tiles cost 2 points, but it is not specified what happens if the agent moves diagonally. It should be specified whether diagonal moves are allowed and how much they cost.

Objective function: It is not specified how the "best path" is defined. It should be mentioned whether the objective is to find the shortest path or the path that minimizes some other cost function (e.g., total cost of the path).
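
For what it's worth, here is one way the statement could be pinned down once those questions are answered. This is only a sketch under assumptions that are not in the original prompt: 4-way movement, the cost of a tile is paid when entering it, "best" means fewest fire tiles first and lowest movement cost second, and the tile characters (`.`, `~`, `F`) and the name `best_path` are made up for illustration. It tracks the number of fire tiles entered as part of the search state, so a fire-free detour is only preferred when it still fits the movement range.

```ruby
# Hypothetical tile legend: '.' regular (cost 1), '~' water (cost 2), 'F' fire (cost 1, avoided).
COST = { '.' => 1, '~' => 2, 'F' => 1 }.freeze

# Returns [path, move_cost, fire_tiles], or nil when the target is out of range.
# Assumptions: 4-way movement, entering a tile pays that tile's cost,
# "best" = fewest fire tiles first, then lowest movement cost.
def best_path(grid, start, target, budget = 5)
  rows = grid.length
  cols = grid[0].length
  # A search state is [row, col, fires_entered]; best[state] is the cheapest known cost.
  first = [start[0], start[1], 0]
  best = { first => 0 }
  parent = {}
  queue = [first]

  until queue.empty?
    # Linear scan for the cheapest state; fine for a 10x10 grid.
    state = queue.min_by { |s| best[s] }
    queue.delete(state)
    r, c, fires = state

    [[1, 0], [-1, 0], [0, 1], [0, -1]].each do |dr, dc|
      nr = r + dr
      nc = c + dc
      next if nr.negative? || nc.negative? || nr >= rows || nc >= cols

      tile = grid[nr][nc]
      cost = best[state] + COST.fetch(tile)
      next if cost > budget # outside the movement range

      nxt = [nr, nc, fires + (tile == 'F' ? 1 : 0)]
      next if best.key?(nxt) && best[nxt] <= cost

      best[nxt] = cost
      parent[nxt] = state
      queue << nxt unless queue.include?(nxt)
    end
  end

  # Among reachable target states, prefer fewest fire tiles, then lowest cost.
  goal = best.keys.select { |s| s[0, 2] == target }.min_by { |s| [s[2], best[s]] }
  return nil unless goal

  path = []
  node = goal
  while node
    path.unshift(node[0, 2])
    node = parent[node]
  end
  [path, best[goal], goal[2]]
end
```

On the two-row grid `[".F.", "..."]`, going from `[0, 0]` to `[0, 2]` with the default range of 5 takes the four-step detour along the bottom row; with a range of 3 the detour no longer fits, so the path goes through the fire tile instead.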

4 comments

Interesting: not a single one of the things it asked for clarification on is something it actually got wrong in its suggested solution.
I've found similar success in using this approach rather than going direct to "write me some code AI-code-monkey". You can also take the following steps (which are the same sorts of things you'd want to do as a programmer writing this code yourself):

- Rewrite the problem as a list of constraints

- Point out any ambiguous or missing constraints

- Write an example that demonstrates each constraint

etc.

Effectively, this article is asking: when you pose a novel problem to the LLM, how deep does the internal state go in producing the completions? When it doesn't go deep enough, the trick is to make it do the things that deepen the internal state.

I'd guess that supplying a good system message to GPT-4 (waiting for my access) would help. Something like:

You're an engineer responsible for writing correct code from a specification. Break down the specification into small chunks that can be explained simply. If there are ambiguities, seek clarification. Only write code once all ambiguities are resolved and each part of the problem is described simply.

I really like that approach of asking it what needs to be clarified. That's going straight into my GPT toolbox.
This is perhaps one of the most impressive responses I've read. It truly seems like there is some reasoning happening. I don't understand how this can be the output of a generative LLM.
> It truly seems like there is some reasoning happening. I don't understand how this can be the output of a generative LLM

Right - this seeming "cognition" is exactly what's so spooky about the whole thing.

Here's what spooked me yesterday: https://news.ycombinator.com/item?id=35167685 - specifically how it determines the divide-by-zero error in this code: https://whatdoesthiscodedo.com/g/6a8f359

...which demonstrates GPT as being capable of at least C++ "constexpr"-style compile-time computation, which shouldn't even be possible if one presumes GPT is "just" a giant database storing only multidimensional word similarity scores and sequence distribution from text inference.

> a generative LLM

I definitely wanted to believe that GPT was "just predicting the next word" - it was somewhat comforting to think of GPT as still being far from being-human or behaving like a real brain does - but that explanation never sat well with me: it was too simplistic and dismissive, and didn't explain the behaviour I was witnessing in GPT.

...so after having read Wolfram's (surprisingly accessible) article on GPT ( https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-... ) a lot of things "clicked" in my head, and I started to understand why and how GPT is capable of... the surprising things it does. It also leads me to believe we're (warning: incoming cliche) barely scratching the surface of what we can do: I do believe we're almost at the point where we could simply ask GPT how to adapt it into some kind of early AGI - and we've all heard of what's supposed to follow next... and it really is quite unsettling.

Whoa, that explanation is really cool.

> capable of at least C++ "constexpr"-style compile-time computation, which shouldn't even be possible if one presumes GPT is "just" a giant database storing only multidimensional word similarity scores and sequence distribution from text inference

I don't see how being a giant word-DB necessarily disqualifies compile-time computation. You can view computation as applying a series of term rewrite rules to an input until some sort of termination condition is reached (or the rewriting loops indefinitely). In the case of these AIs, the input is the prompt and predicting the next token is a limited form of term rewriting (where the rules are probabilistic and based off the network), and because code and explanations were probably included in the training data, it seems reasonable to me that the "rewrite rules" of Python bled a little bit into the AI.
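
To make the "computation as term rewriting" view concrete, here is a toy sketch. It is deterministic and has nothing to do with how the network works internally; the term encoding (nested arrays like `[:+, 1, [:*, 2, 3]]`) and the names `RULES`, `rewrite_step` and `normalize` are invented for this illustration. Each step rewrites one reducible subterm, and the termination condition is simply that no rule applies any more.

```ruby
# Hypothetical rule table: an operator applied to two integers rewrites to its value.
RULES = {
  :+ => ->(a, b) { a + b },
  :- => ->(a, b) { a - b },
  :* => ->(a, b) { a * b }
}.freeze

# Performs at most one rewrite and returns the new term.
# Returns the term unchanged when no rule applies anywhere inside it.
def rewrite_step(term)
  return term unless term.is_a?(Array)

  op, left, right = term
  # Try to rewrite inside the left operand first, then the right one.
  new_left = rewrite_step(left)
  return [op, new_left, right] unless new_left == left

  new_right = rewrite_step(right)
  return [op, left, new_right] unless new_right == right

  # Both operands are in normal form: apply a rule at this node if one matches.
  if left.is_a?(Integer) && right.is_a?(Integer) && RULES.key?(op)
    RULES[op].call(left, right)
  else
    term
  end
end

# Keeps rewriting until a fixed point is reached (the termination condition).
def normalize(term)
  loop do
    nxt = rewrite_step(term)
    return term if nxt == term

    term = nxt
  end
end
```

For example, `normalize([:+, 1, [:*, 2, 3]])` goes through `[:+, 1, 6]` and ends at `7`, while a term with an operator that has no rule, such as `[:foo, 1, 2]`, is already in normal form and comes back unchanged.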

It makes me insanely curious about the internal structures though. I gave that site 2 similar examples: one produces a correct explanation while another produces an incorrect explanation. The difference: a deleted line of insignificant whitespace.

* https://whatdoesthiscodedo.com/g/dd2af89

* https://whatdoesthiscodedo.com/g/45ea060

From those 2 examples, I think it's pretty clear that the AI's "rewrite rules" don't always coincide with Python's, but I would expect this to be mitigated by targeted training (like Copilot).

In the same vein, I was playing around with its ability to annotate code with explanations of the changing constraints on different variables (I thought this was a good proxy for deeper understanding of code).

I asked it to write a moderately complex algorithm, and it chose to write a function implementing Pollard's rho algorithm.

Then, with a few example comments to show what I meant, I asked it to annotate the code to show how the constraints on the variable changed as each line was executed.

This was its output:

  # n : an integer n > 3, no upper limit
  # seed : an optional integer parameter (default = 1) for the polynomial function
  # pollards_rho(n, seed) : an integer or nil, representing a non-trivial factor of n, or nil if the algorithm fails to find a factor

  def pollards_rho(n, seed = 1)
    return n if n <= 3

    x = 2
    # x : an integer, initially set to 2, will be updated using the polynomial function f
    # x will always be in the range [0, n)

    y = 2
    # y : an integer, initially set to 2, will be updated using the polynomial function f
    # y will always be in the range [0, n)

    d = 1
    # d : an integer, initially set to 1, will store the greatest common divisor of (x - y) and n
    # 1 <= d <= n

    while d == 1
      x = f(x, n, seed)
      # x : updated using the polynomial function f, x = (x * x + seed) % n
      # x will always be in the range [0, n)

      y = f(f(y, n, seed), n, seed)
      # y : updated using the polynomial function f twice, y = ((y * y + seed) * (y * y + seed) + seed) % n
      # y will always be in the range [0, n)

      d = gcd((x - y).abs, n)
      # d : the greatest common divisor of the absolute difference between x and y, and n
      # 1 <= d <= n, such that (x - y) % d == 0 and n % d == 0
    end

    d == n ? nil : d
    # Returns a non-trivial factor of n (1 < d < n) if one is found, or nil if the algorithm fails to find a factor
  end

This is not actually the most impressive thing I've seen it do, but it illustrates that, even if GPT doesn't "really understand" code, it can emulate understanding to an impressive level. I'd go further and say that it opens up new questions as to what understanding actually means.
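
One caveat if you want to run the annotated output above: it calls `f` and `gcd` without defining either. A minimal runnable sketch follows, assuming `f` is the polynomial `(x * x + seed) % n` described in the generated comments, and swapping the free-standing `gcd(a, b)` call for Ruby's built-in `Integer#gcd`. The definition of `f` is my reconstruction, not part of the quoted output.

```ruby
# Assumed helper, reconstructed from the generated comments: x^2 + seed (mod n).
def f(x, n, seed)
  (x * x + seed) % n
end

# Same control flow as the quoted output, with the annotations stripped.
def pollards_rho(n, seed = 1)
  return n if n <= 3

  x = 2 # "tortoise": advances one step of f per iteration
  y = 2 # "hare": advances two steps of f per iteration
  d = 1
  while d == 1
    x = f(x, n, seed)
    y = f(f(y, n, seed), n, seed)
    # Integer#gcd replaces the undefined free function gcd(a, b).
    d = (x - y).abs.gcd(n)
  end

  # d == n means the two sequences collided without exposing a factor.
  d == n ? nil : d
end
```

With the default seed, `pollards_rho(8051)` finds the factor 97 on the third iteration (8051 = 83 * 97), while `pollards_rho(4)` returns nil because the sequences collide before a non-trivial factor shows up.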

One personal "woah" moment was asking it to write some unit tests for a simple 2d game GPT and I wrote together. One function, "create_area", took a 2d array of characters (representing a map), four integers representing coordinates, and a tile type. (The purpose being to create a rectangular area of the desired tile on the map according to the passed coordinates.)

GPT-4 successfully figured out how to write a unit test: it created a 5x5 array of ROCK tiles, passed it to create_area with the coordinates 1, 1 and 3, 3, and successfully figured out what the output should look like, even writing a fairly concise test to check the output (modified) 5x5 array. This was an eyebrow-raising moment for me: it made clear that GPT really does emulate some kind of "computation" internally, though quite possibly in some abstracted form. The geometric nature of this problem stuck out to me: a human can "see" a 2d array as a rectangular grid, and might realise the function carved out a smaller rectangle from that grid, but I never expected to see a computer (let alone a language model) figure it out. Interesting times, indeed.
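
For readers who want to picture that test, here is a rough reconstruction. The original game code and the generated test are not shown in this comment, so everything here is a guess: the implementation of `create_area`, the argument order, the assumption that the coordinates are inclusive, and the tile characters are all hypothetical stand-ins.

```ruby
ROCK = '#'  # hypothetical tile characters
FLOOR = '.'

# Hypothetical stand-in for create_area: fills the inclusive rectangle
# from (x1, y1) to (x2, y2) with the given tile and returns the map.
def create_area(map, x1, y1, x2, y2, tile)
  (y1..y2).each do |y|
    (x1..x2).each { |x| map[y][x] = tile }
  end
  map
end

# The test described above: a 5x5 map of ROCK with a 3x3 area carved out.
map = Array.new(5) { Array.new(5, ROCK) }
create_area(map, 1, 1, 3, 3, FLOOR)

expected = [
  '#####',
  '#...#',
  '#...#',
  '#...#',
  '#####'
]
raise 'create_area produced an unexpected map' unless map.map(&:join) == expected
```

Writing `expected` out by hand is the part that requires "seeing" the grid: the test author has to know which cells the rectangle covers without running the function.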