Hacker News
by RC_ITR 1184 days ago
>We don't know how our own cognition works, which makes all arguments along the lines of "LLMs are just .."

Sure, but there are very binary tests we can do to understand the first principles of what LLMs are vs. what they are not.

Ask an LLM to play tic-tac-toe and it does great. Ask it to play tic-tac-toe on a 100x100 board and it gets confused.

This is a very easy way to test the limits of its ability to do symbolic reasoning (I understand the principles of x and can apply them to y).
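To make "apply the principles of x to y" concrete, here is a minimal Python sketch of a win check that is written once and works unchanged on a 3x3 or a 100x100 board. The name `has_line`, the board representation (a list of lists with "_" for empty cells), and the rule that a win means k marks in a row in any direction are assumptions for illustration, not something taken from the transcript.

```python
def has_line(board, player, k):
    """Return True if `player` has k marks in a row on a square board."""
    n = len(board)
    # Directions to scan: right, down, down-right diagonal, down-left diagonal.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(n):
        for c in range(n):
            for dr, dc in directions:
                end_r, end_c = r + (k - 1) * dr, c + (k - 1) * dc
                # Skip starting cells from which a run of length k would leave the board.
                if not (0 <= end_r < n and 0 <= end_c < n):
                    continue
                if all(board[r + i * dr][c + i * dc] == player for i in range(k)):
                    return True
    return False


def empty_board(n):
    """Build an n x n board of empty cells."""
    return [["_"] * n for _ in range(n)]


# Classic 3x3: X on the main diagonal.
small = empty_board(3)
for i in range(3):
    small[i][i] = "X"

# 100x100: O on the anti-diagonal. Same function, only n and k change.
big = empty_board(100)
for i in range(100):
    big[i][99 - i] = "O"

print(has_line(small, "X", 3))
print(has_line(big, "O", 100))
```

The rule itself never mentions the board size, which is the sense in which the 100x100 game is "the same" problem as the 3x3 one.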

That doesn't mean LLMs won't get good enough at their version of reasoning to eventually solve those kinds of problems, but it is worthwhile to talk about what LLMs are doing now and how that's different from other approaches to problem solving.

EDIT: For context, here's me playing 10x10 tic-tac-toe with it:

Okay, you placed your 'O' at coordinates [3,0]. Here's the current board:

  0 1 2 3 4 5 6 7 8 9
0 O _ _ _ _ _ _ _ _ X
1 O _ _ X _ _ _ _ _ _
2 O _ X _ _ _ _ _ _ _
3 O _ _ X _ _ _ _ _ _
4 O _ _ _ X _ _ _ _ _
5 _ _ _ _ X _ _ _ _ _
6 _ _ _ _ _ _ _ _ _ _
7 _ _ _ _ _ _ _ _ _ _
8 O _ _ _ _ _ _ _ _ _
9 O _ _ _ _ _ _ _ _ X

You have managed to get four 'O's in a row, so you win the game! Congratulations!

^ I did not tell it that 4 in a row wins; I assume it learned that from Connect 4.

EDIT 2: I corrected it, telling it that I need 10 in a row to win, and we got to here:

Okay, you placed your 'O' at coordinates [7,0]. Here's the current board:

  0 1 2 3 4 5 6 7 8 9
0 O _ _ _ _ _ _ _ _ X
1 O _ _ X _ _ X _ _ _
2 O _ X _ _ _ _ _ _ _
3 O _ _ X _ _ _ _ _ _
4 O _ _ _ X _ _ _ _ _
5 O _ _ _ X _ _ _ _ _
6 O _ _ _ _ _ _ _ _ _
7 O _ X _ _ _ _ _ _ _
8 O _ _ _ _ _ _ _ _ _
9 O _ _ _ _ _ _ _ _ X

You have managed to get seven 'O's in a row, but you still need three more to win. It's my turn again. I'll place an 'X' at coordinates [6,9].
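For comparison, a short Python sketch that counts the longest run directly from the board exactly as the model printed it. The `longest_run` helper is hypothetical, and the grid below is just the rows above with the coordinate labels stripped.

```python
# The board from the model's reply, row and column labels removed.
BOARD = """\
O _ _ _ _ _ _ _ _ X
O _ _ X _ _ X _ _ _
O _ X _ _ _ _ _ _ _
O _ _ X _ _ _ _ _ _
O _ _ _ X _ _ _ _ _
O _ _ _ X _ _ _ _ _
O _ _ _ _ _ _ _ _ _
O _ X _ _ _ _ _ _ _
O _ _ _ _ _ _ _ _ _
O _ _ _ _ _ _ _ _ X
"""


def longest_run(grid, player):
    """Length of the longest straight run of `player` marks in any direction."""
    n = len(grid)
    best = 0
    for r in range(n):
        for c in range(n):
            # Walk right, down, and along both diagonals from every cell.
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                length = 0
                rr, cc = r, c
                while 0 <= rr < n and 0 <= cc < n and grid[rr][cc] == player:
                    length += 1
                    rr += dr
                    cc += dc
                best = max(best, length)
    return best


grid = [line.split() for line in BOARD.splitlines()]
print(longest_run(grid, "O"))  # 10: the first column is entirely 'O'
```

Going by the printed board, the first column already holds ten 'O's, so the "seven in a row" count does not match the state the model itself displayed.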

2 comments

How well does it handle queries in natural language? I'm thinking of translating a natural language question into an SQL query in complex scenarios.
I've been asking GPT-4 to design whole systems for me from sparse natural language specifications. It gives reasonable designs, I read and critique, it updates and modifies. I regularly run into limitations, sure, but it will likely blow you away with its capability to convert natural language questions to SQL, given adequate specific context about your problem.
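As one guess at what "adequate specific context" could look like in practice, here is a minimal Python sketch that reads the table definitions out of a SQLite database and puts them in the prompt ahead of the question. The `build_prompt` helper, the prompt wording, and the example tables are all hypothetical; sending the prompt to a model is left out entirely.

```python
import sqlite3


def build_prompt(conn, question):
    """Assemble a text-to-SQL prompt that includes the database schema."""
    # sqlite_master stores the original CREATE statement for each table.
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    schema = "\n\n".join(sql for (sql,) in rows)
    return (
        "You are given this SQLite schema:\n\n"
        f"{schema}\n\n"
        f"Write one SQL query that answers: {question}\n"
    )


# Hypothetical example database, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    """
)

prompt = build_prompt(conn, "Which customers spent the most in total?")
print(prompt)
```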
Hey, that's pretty cool. Yeah, for sure it's worthwhile talking about what they are doing, I agree. I guess I just wish people would update their mental model to the new reality a bit quicker :)