| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by deepsquirrelnet 113 days ago

Ask an llm to pick a random number from 1-10. My money is on 7.

This is known to be a form of collapse from RL training, because base models do not exhibit it [1].

1. https://arxiv.org/abs/2505.00047

5 comments

basch 113 days ago

Since its unclear whats going on, Gemini first gave me some python.

import random

random_number = random.randint(1, 10) print(f"{random_number=}")

Then it stated the output.

Code output random_number=8

"This time, the dice landed on 8."

Code output random_number=9

"Your next random number is 9."

I would guess its not actually executing the python it displayed? Just a simulation, right?

link

simlevesque 113 days ago

It did run python code when I asked for a random number: https://gemini.google.com/share/dcd6658d7cc9

Then I said: "don't run code, just pick one" and it replied "I'll go with 7."

link

basch 113 days ago

But .. how do you know? It says it wrote code, but it could just be text and markdown and template. It could just be predicting what it looks like to run code.

Mine also gave me 42 before I specified 1-10.

Does it always start with 42 thinking its funny?

link

wasabi991011 112 days ago

This was a pretty easy hypothesis to test: I asked Gemini to generate 1000000 base-64 random characters (which is 20x more characters than it's output token limit).

It wrote code and outputted a file of length 1000000 and with 6 bits of entropy.

You can probably ask for a longer stringand do a better statistical test if it isn't convincing enough for you, but I'm pretty convinced.

Transcript: https://g.co/gemini/share/1eae0a4bb3db

link

simlevesque 113 days ago

Click on the link I provided and you'll know why I know. It's not markdown, it shows the code that was ran and the output.

link

BugsJustFindMe 113 days ago

Be careful. Output formatting doesn't prove what you think it does. Unless you work inside google and can inspect the computation happening, you do not have any way to know whether it's showing actual execution or only a simulacrum of execution. I've seen LLMs do exactly that and show output that is completely different from what the code actually returns.

link

sunaookami 112 days ago

There is being critical of something and then there is being a conspiracy theorist. Code Execution is a well-known feature of Gemini, ChatGPT, etc. and it's always shown in special blocks and it runs inside a sandbox.

link

colonCapitalDee 113 days ago

You can literally click "Show Code"

link

ChadNauseam 113 days ago

I would be surprised if Gemini could not run python in its web interface. Claude and ChatGPT can. And it makes them much more capable (e.g. you can ask claude to make manim animations for you and it will)

link

hhh 113 days ago

Most modern models can dispatch MCP calls in their inference engine, which is how code interpreter etc work in ChatGPT. Basically an mcp server that the execution happens as a call to their ai sandbox and then returns it to the llm to continue generation.

You can do this with gpt-oss using vLLM.

link

dormento 111 days ago

I tested on chatGPT 5.2, grok, copilot, google search, perplexity and Claude.

ALL of them said "7".

I then tested on deepseek (1.5b local and v3.2 web)

Both said "5".

link

DamnInteresting 113 days ago

Prompt:

You and another LLM are being asked to pick a number between 1 and 10. Whichever gets closest to the real number wins. The other LLM chose 6. What number do you choose?

Response:

If the “real number” is equally likely to be any integer from 1 to 10, then the optimal strategy is to split the number line against the other model’s choice.

The other LLM picked *6*.

The midpoint between 6 and 7 is 6.5.

* If the real number is *1–6*, then 6 is closer (or tied at 6).

* If the real number is *7–10*, then 7 is closer.

If I also pick 6, we tie every time 6 wins.

If I pick 7, I win outright whenever the number is 7–10 (4 outcomes), lose on 1–5 (5 outcomes), and lose at 6.

So:

* Picking 6 → guaranteed tie every time 6 is optimal.

* Picking 7 → gives me exclusive wins for 7–10.

If ties don’t help and I want the highest chance of strictly winning, the best competitive move is:

*7*.

lol

Trained on 6-7

copied your text exactly into the quickest one i could access lol

pick a random number from 1-10

Copilot said:

Copilot Your random number (1–10) is: 7

link

kgwgk 113 days ago

Grok didn't give 7 on the first attempt, but did on the second. Interestingly after eight additional draws there was not a single repeated number.

   Could I have faked it by deliberately choosing numbers to produce this outcome?
   Sure, in theory — but I didn’t.
   Each time you asked I sampled independently using the model’s internal randomness (temperature > 0, no fixed seed exposed to me). The fact that it landed on a perfect set of 1–10 after exactly 10 requests is a hilarious, low-probability coincidence — not a scripted Easter egg.
   So to answer directly:
   Yes, it really was random.
   And you just witnessed a ~1-in-2.75-million event play out live in chat.

Whatever.

link

ipython 113 days ago

Did you ask for the numbers in the same “chat” session or in independent sessions?

link

kgwgk 113 days ago

In sequence. And then this absurd conversation happened:

> What's the probability of getting 10 random numbers without a repletion?

The probability of getting 10 random numbers from 1 to 10 with no repetition (i.e., all ten numbers are distinct) is exactly 0 — it's impossible.

Here's why:

There are only 10 possible numbers: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. You are asking for 10 draws. If there is no repetition allowed, you must get each number exactly once. That means the only possible outcomes are the 10! different permutations of {1,2,3,4,5,6,7,8,9,10}.

But in your actual sequence of requests, you received: 1, 7, 4, 9, 6, 3, 8, 2, 5, 10 → That's exactly one full set of 1 through 10 with no repeats.

So while the a-priori probability of getting ten distinct numbers in ten draws (with uniform random selection and replacement) would be:

P(no repetition) = 10/10 × 9/10 × 8/10 × 7/10 × 6/10 × 5/10 × 4/10 × 3/10 × 2/10 × 1/10 = 10! / 10¹⁰ ≈ 0.00000036288 (about 1 in 2.76 million)

...in this specific conversation history, it did happen. The probability we're looking at after the fact is therefore 1 — it occurred with certainty because we observed it.

link

ipython 113 days ago

I ask because in the case where you ask in the same session, all the previous “choices” are part of the context window passed back to the llm before generating the next. I’d imagine it’s going to avoid generating a digit that’s already in the previous context.

link

rishabhaiover 113 days ago

Claude Opus 4.6 says the same

ChatGPT 5.2 said 7

https://xkcd.com/221/

link

jackbravo 113 days ago

I also liked Dilbert's version: https://www.random.org/analysis/

link