CalcGPT | HN Mirror


	CalcGPT (calcgpt.io)
	164 points by CrLf 697 days ago

18 comments

cjrd 697 days ago

It's great to see a _real_ AI application among all this media noise ;-).

Seriously though, this is wonderful satire. I asked 88x10 and it returned an HTML meta tag.


qbane 697 days ago

The two sliders at the top are the best. The most customizable calculator to my knowledge.


xanderlewis 697 days ago

Cue the comments about criticism of this calculator being unfair as thinking, for example, that 88*10 = 888 is a ‘very human’ mistake to make.


aflag 697 days ago

I got 883, which is also very human. They just forgot to write one of the halves of 8


lo0dot0 697 days ago

You can only get an 8 in the rightmost digit of the result from the product of the rightmost digits, but 0×8 obviously gets you a 0, so it is fairly easy to see this is wrong.

(10a+b)(10c+d) = 100ac+10(ad+bc)+bd
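
A minimal TypeScript sketch of the last-digit argument above, assuming non-negative integers: the final digit of a product depends only on the final digits of the factors (the `bd` term). The helper name `lastDigitOfProduct` is made up for illustration.

```typescript
// Hypothetical helper: last digit of x*y in the given base,
// computed from the last digits of x and y alone.
function lastDigitOfProduct(x: number, y: number, base: number = 10): number {
  return ((x % base) * (y % base)) % base;
}

console.log(lastDigitOfProduct(88, 10)); // 0, so neither 888 nor 883 can be right
```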


xanderlewis 697 days ago

Well… I was joking. Even more generally, multiplication by b in base b gives a zero at the end.


xanderlewis 694 days ago

… and, in fact, ‘b in base b’ always looks like 10 anyway!
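
A quick way to check both claims, as a sketch that relies on `Number.prototype.toString(radix)` (which supports bases 2 to 36). The helper name `timesBase` is an assumption for illustration.

```typescript
// Hypothetical helper: multiply n by the base and print the result in that base.
function timesBase(n: number, base: number): string {
  return (n * base).toString(base);
}

console.log(timesBase(88, 10)); // "880"
console.log(timesBase(5, 2));   // "1010" (5 is "101" in base 2)
console.log((7).toString(7));   // "10": b written in base b
```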


rkwz 697 days ago

> GPT-3 (babbage-002)

I'm surprised babbage is still available via APIs - https://platform.openai.com/docs/models/gpt-base

Anyone else using this?


simonw 697 days ago

This neat demo is a year old now, it was first released in July 2023.

Source code and prompt here: https://github.com/Calvin-LL/CalcGPT.io/blob/main/netlify/fu...

    const prompt = `1+1=2\n5-2=3\n2*4=8\n9/3=3\n10/3=3.33333333333\n${math}=`;
    let response: Response;
    try {
      const openAI = new OpenAI();
      response = await openAI.completions
        .create({
          model: "babbage-002",
          temperature,
          top_p: topP,
          stop: "\n",
          prompt,
          stream: true,
        })
        .asResponse();
    } catch (error) {
      return new Response("api error", {
        status: 500,
      });
    }
    return new Response(response.body, {
      headers: {
        "content-type": "text/event-stream",
      },
    });

It's using the old babbage-002 model with a completion (not chat) prompt, which is more readable like this:

    1+1=2
    5-2=3
    2*4=8
    9/3=3
    10/3=3.33333333333
    ${math}=
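
For illustration, the same few-shot prompt can be assembled line by line. This is a sketch, not the project's code: `FEW_SHOT` and `buildPrompt` are hypothetical names, and only the example lines themselves come from the quoted source.

```typescript
// The five worked examples from the quoted prompt.
const FEW_SHOT: string[] = ["1+1=2", "5-2=3", "2*4=8", "9/3=3", "10/3=3.33333333333"];

// Hypothetical helper: append the user's expression with a trailing "=" so the
// completion model continues the pattern; the "\n" stop sequence ends it.
function buildPrompt(math: string): string {
  return FEW_SHOT.concat(`${math}=`).join("\n");
}

console.log(buildPrompt("88*10"));
```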


tzury 697 days ago

Entered 42

The 8 solutions I got while clicking on regenerate:

    3.33333333333
    42, so the point your talking about is 3.3 (Accuracy is
    3 Additionally, 3 coincided with John 3:16 , "$3
    1
    3.33333333333
    42
    42+1=3+1=4=42+1=43
    2×5

Not so sure what I just did. Results are copy-pasted as-is


layer8 697 days ago

I got “41 rotten apples = 4444”.


ducktective 697 days ago

So ... a javascript interpreter?


Alifatisk 697 days ago

No?


azeemba 697 days ago

I think they might be making a joke about how JavaScript can act surprisingly when `+` operator is used with strings/arrays in combination with numbers
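
A small sketch of the behaviour that joke presumably refers to. The `plus` helper is hypothetical; it takes `any` only so that TypeScript lets the untyped JavaScript `+` through.

```typescript
// Hypothetical helper: plain JavaScript `+`, with type checking switched off.
function plus(a: any, b: any): any {
  return a + b;
}

console.log(plus(1, "1"));    // "11": the number is converted to a string
console.log(plus("88", 10));  // "8810"
console.log(plus([1, 2], 3)); // "1,23": the array becomes "1,2" first
console.log(plus([], []));    // "": two empty strings concatenated
```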


Alifatisk 697 days ago


anotherhue 697 days ago

This is amazing. An antidote to the mesmerisation.


Loughla 697 days ago

I'm taking this to work to show an executive who is desperate to integrate AI into the day to day operations of a college.


viraptor 697 days ago

That's silly. You may as well bring a telegraph to show how bad an idea the internet is.

There are better, more reasonable arguments against too much AI hype.


mewpmewp2 697 days ago

It is using a pre-hype, old version of GPT. So it is quite dishonest that you would have to use this one to prove a point. It may work as a joke, but the model that the hype is for (GPT4) wouldn't perform that poorly.

So it is actually evidence of how large the gap is between the pre-hype models and the ones that came after.

This is not the model that caused the hype.


radeeyate 697 days ago

I love this.

Supposedly 0/0 is zero. Good to know from now on.


hluska 697 days ago

This is the first time I have come across Calvin Liang, but I’m already a big fan. Their artist’s statement manages to be very funny while making a point. I like today.


Closi 697 days ago

I think there is a bug here...

8888888×965 = 965 according to this site with temperature = 0 or 3.63... with temperature = 1

On the other hand, GPT4 gets it correct:

https://chatgpt.com/share/34007f39-cfa8-46c8-bda3-9f641affc1...

Even when I instruct it not to think about it:

https://chatgpt.com/share/cb22c9dc-1549-4d00-a498-c889f6822b...


mewpmewp2 697 days ago

It is GPT-3, so a very out-of-date model.


jkitching 697 days ago

+5*9 returned:

((−5(if the finnicky effort to even a decimal number found a different

Finnicky effort indeed ;)


zug_zug 697 days ago

I'm sorry but this falls flat for me. GPT4 routinely can answer impressive math questions for me (college-level):

- What diameter steel wire would I need to be rated for a weight of 500lbs?

- How many digits would an ID need to be (using 36 characters) to have a 1/10^20 chance of collision over 1 billion random IDs?

- If I have a list of a billion times (say durations of a web request) and they follow a normal distribution, and I take a sample of 1 million of those, how close would the average of my .1% sample be to the true average of the billion?

- Suppose in D&D I am told to roll 20 d6, but instead of rolling that many dice I want to roll just two (larger) dice and add a constant. Which standard D&D dice might give the closest variance and what is the constant?
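
As a rough cross-check of the ID question, here is a sketch using the standard birthday approximation p ≈ n² / (2N), where N is the number of possible IDs. The function name `minIdLength` is hypothetical, and the approximation only holds while p is small.

```typescript
// Hypothetical helper: smallest ID length k such that alphabetSize^k >= n^2 / (2p).
function minIdLength(ids: number, maxCollisionProb: number, alphabetSize: number): number {
  // Rearranged birthday bound: required ID space N >= n^2 / (2p).
  const requiredSpace = (ids * ids) / (2 * maxCollisionProb);
  return Math.ceil(Math.log(requiredSpace) / Math.log(alphabetSize));
}

// 1 billion IDs, 36 characters, collision chance at most 1/10^20.
console.log(minIdLength(1e9, 1e-20, 36));
```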


QuiDortDine 697 days ago

It is for sure just a funny hobby project, but your statement had me intrigued:

> Suppose in D&D I am told to roll 20 d6, but instead of rolling that many dice I want to roll just two (larger) dice and add a constant. Which standard D&D dice might give the closest variance and what is the constant?

Interestingly, ChatGPT 4o tells me to use 2d19 + 51, even after correcting it and asking for larger dice. Impressive math for sure but not worth much if it doesn't respect constraints. I guess I could try again until it stumbles upon the right answer, but it's all to say it's not quite there yet.


zug_zug 697 days ago

To be fair, I didn't hand-check the answer it gave (and I didn't retype the whole prompt exactly here) - but here's what it gave me [4o model]:

... (lots of calculations)

Final Comparison

    Variance of 20d6: 58.334
    Variance of 1d20+1d12+53: 45.1667

The variance of 1d20+1d12+53 is closer to 58.334 than previous combinations and represents a reasonable approximation for both mean and variance.

[Edit: Just checked it in google sheets, this looks right to me]
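
The figures quoted above can be reproduced with a short sketch, assuming fair dice: a die with N sides has mean (N+1)/2 and variance (N²−1)/12, and both add up across independent dice. The helper names are made up for illustration.

```typescript
// Mean and variance of a single fair die with the given number of sides.
const dieMean = (sides: number): number => (sides + 1) / 2;
const dieVariance = (sides: number): number => (sides * sides - 1) / 12;

// Hypothetical helper: mean and variance of a sum of independent dice.
function sumStats(dice: number[]): { mean: number; variance: number } {
  let mean = 0;
  let variance = 0;
  for (const sides of dice) {
    mean += dieMean(sides);
    variance += dieVariance(sides);
  }
  return { mean, variance };
}

const twentyD6: number[] = [];
for (let i = 0; i < 20; i++) twentyD6.push(6);

const target = sumStats(twentyD6);      // 20d6
const replacement = sumStats([20, 12]); // 1d20 + 1d12
const constant = target.mean - replacement.mean; // offset that matches the means

console.log(target.variance.toFixed(3), replacement.variance.toFixed(4), constant);
```

Swapping other pairs into `sumStats` (for example `[20, 20]`) makes it easy to compare how close each candidate gets to the 20d6 variance.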


QuiDortDine 697 days ago

Yes, it's technically correct, but you said a larger dice, which a d12 is not :)

I would be curious to know if the larger dice version is impossible, but then I would also expect it to tell me.


zug_zug 697 days ago

I'm confused. I consider a d12 a larger die than a d6. Perhaps you're making a pun about physical size of the dice?


QuiDortDine 697 days ago

Oh wow I actually misread the comment, thought it was 6d20. Ok scratch everything!


j_bum 697 days ago

I’m enjoying experimenting with nonsensical math:

> Apple * dog

> CalcGPT: Apple Mini − dog or dog. Total= Apple Dog Mini MiniDog=49

I was wondering if math of words would produce the embedding of the operation of those words, but nope :)


aceazzameen 697 days ago

> Hot + dog

CalcGPT: Three dogs holding a hot dog with tails pointing to infinity


j_bum 697 days ago

Nice one!

I wonder if the creator modified the output, it’s no longer giving me text answers.

Edit: you just have to keep pressing enter and eventually some text output can be spit out.


mritchie712 697 days ago

this is just using a shitty model:

calcgpt:

input: 88×66

output: 5184

chatgpt:

input: 88×66

output: 88 × 66 equals 5808.


jsheard 697 days ago

When you ask ChatGPT straightforward math questions it usually solves them by generating and running a Python script which performs the actual calculation on its behalf. That's a reasonable workaround on paper since LLMs are better at doing codegen than math, but the heuristics which trigger that path are brittle, so if your query is more complicated it can regress to making the LLM itself try to solve it, with predictably poor results. You can tell when the Python solver is used because there will be a button next to the response which shows you the code it ran.

Wolfram Alpha:

Q: (706458021-342622146)*988457934

A: 359,636,457,317,582,250

ChatGPT:

Q: What is the solution to (706458021-342622146)*988457934

A: The solution to (706458021 - 342622146) x 988457934 is 359,636,457,317,582,250 [Python button]

ChatGPT:

Q: Write a press release announcing a momentous scientific discovery: the solution to (706458021-342622146)*988457934

A: [...] The equation, which involves the subtraction of two large integers followed by multiplication with another large integer, has been resolved to yield a precise result of 359,462,296,091,341,640. The computation was executed with utmost precision, demonstrating the profound capabilities of modern mathematical techniques and computational power. [...] [no Python button]
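
For anyone who wants to check the Wolfram Alpha figure without trusting either tool, here is a sketch of exact schoolbook multiplication on decimal strings. The product is larger than 2^53, so a plain JavaScript `number` cannot hold it exactly. `multiplyDecimal` is a hypothetical helper that assumes non-negative integer inputs; it is not what ChatGPT runs.

```typescript
// Hypothetical helper: exact product of two non-negative decimal integer strings.
function multiplyDecimal(x: string, y: string): string {
  // Work on digits stored least-significant first.
  const a: number[] = x.split("").reverse().map(Number);
  const b: number[] = y.split("").reverse().map(Number);
  const out: number[] = [];
  for (let i = 0; i < a.length + b.length; i++) out.push(0);

  for (let i = 0; i < a.length; i++) {
    let carry = 0;
    for (let j = 0; j < b.length; j++) {
      const cur = out[i + j] + a[i] * b[j] + carry;
      out[i + j] = cur % 10;
      carry = Math.floor(cur / 10);
    }
    out[i + b.length] += carry;
  }

  // Drop leading zeros, keeping at least one digit.
  while (out.length > 1 && out[out.length - 1] === 0) out.pop();
  return out.reverse().join("");
}

const difference = 706458021 - 342622146; // small enough to be exact as a number
console.log(multiplyDecimal(String(difference), "988457934")); // 359636457317582250
```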


xanderlewis 697 days ago

also ChatGPT: 9.11 is bigger than 9.9


TZubiri 697 days ago

True for versions
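
A short sketch of the two readings, assuming dot-separated numeric version strings. `compareVersions` is a hypothetical helper, not a library function.

```typescript
// Hypothetical helper: compare dotted versions component by component.
// Returns -1, 0, or 1, treating missing components as 0.
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const len = Math.max(pa.length, pb.length);
  for (let i = 0; i < len; i++) {
    const x = i < pa.length ? pa[i] : 0;
    const y = i < pb.length ? pb[i] : 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

console.log(9.11 > 9.9);                     // false: as decimals, 9.11 is smaller
console.log(compareVersions("9.11", "9.9")); // 1: as versions, 11 comes after 9
```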


mritchie712 697 days ago

you can probably get it to answer if you try, but I can't

https://x.com/thisritchie/status/1817615006583738528


brunocvcunha 697 days ago

It is bigger. You meant greater?


xanderlewis 697 days ago

I’ve never heard a mathematician object to the use of the phrase ‘bigger than’ to refer to the relation >.


layer8 697 days ago

I got the following, slowly appearing character by character in the result field. Due to the slowness, it took a bit to realize it wasn't GPT output.

    <!DOCTYPE html>
    <!−−[if lt IE 7]> <html class="no−jsie6 oldie" lang="en−US"> <![endif]−−>
    <!−−[if IE 7]> <html class="no−js ie7 oldie" lang="en−US"> <![endif]−−>
    <!−−[if IE 8]> <html class="no−js ie8 oldie" lang="en−US"> <![endif]−−>
    <!−−[if gt IE 8]><!−−> <html class="no−js" lang="en−US"> <!−−<![endif]−−>
    <head>


    <title>calcgpt.io | 502: Bad gateway<÷title>
    <meta charset="UTF−8" ÷>
    <meta http−equiv="Content−Type"content="text÷html; charset=UTF−8" ÷>
    <meta http−equiv="X−UA−Compatible" content="IE=Edge" ÷>
    <meta name="robots" content="noindex, nofollow" ÷>
    <meta name="viewport" content="width=device−width,initial−scale=1" ÷>
    <link rel="stylesheet" id="cf_styles−css" href="÷cdn−cgi÷styles÷main.css"÷>


    <÷head>
    <body>
    <div id="cf−wrapper">
    <div id="cf−error−details" class="p−0">
    <header class="mx−auto pt−10 lg:pt−6 lg:px−8 w−240 lg:w−full mb−8">
    <h1 class="inline−block sm:block sm:mb−2 font−light text−60 lg:text−4xl text−black−dark leading−tight mr−2">
    <span class="inline−block">Bad gateway<÷span>
    <span class="code−label">Error code 502<÷span>
    <÷h1>
    <div>
    Visit <a href="https:÷÷www.cloudflare.com÷5xx−error−landing?utm_source=errorcode_502&utm_campaign=calcgpt.io" target="_blank" rel="noopener noreferrer">cloudflare.com<÷a> for more information.
    <÷div>
    <div class="mt−3">2024−07−2814:37:25 UTC<÷div>
    <÷header>
    <div class="my−8 bg−gradient−gray">
    <div class="w−240 lg:w−full mx−auto">
    <div class="clearfix md:px−8">

    <div id="cf−browser−status" class=" relative w−1÷3 md:w−full py−15 md:p−0 md:py−8 md:text−left md:border−solid md:border−0 md:border−b md:border−gray−400 overflow−hidden float−left md:float−none text−center">
    <div class="relative mb−10 md:m−0">

    <span class="cf−icon−browser block md:hidden h−20 bg−center bg−no−repeat"><÷span>
    <span class="cf−icon−ok w−12 h−12 absolute left−1÷2 md:left−auto md:right−0 md:top−0−ml−6 −bottom−4"><÷span>

    <÷div>
    <span class="md:block w−full truncate">You<÷span>
    <h3 class="md:inline−block mt−3 md:mt−0 text−2xl text−gray−600 font−light leading−1.3">

    Browser

    <÷h3>
    <span class="leading−1.3 text−2xltext−green−success">Working<÷span>
    <÷div>

    <div id="cf−cloudflare−status" class=" relative w−1÷3 md:w−full py−15 md:p−0 md:py−8 md:text−left md:border−solid md:border−0 md:border−b md:border−gray−400 overflow−hidden float−left md:float−none text−center">
    <div class="relative mb−10 md:m−0">
    <a href="https:÷÷www.cloudflare.com÷5xx−error−landing?utm_source=errorcode_502&utm_campaign=calcgpt.io" target="_blank" rel="noopener noreferrer">
    <span class="cf−icon−cloud blockmd:hidden h−20 bg−center bg−no−repeat"><÷span>
    <span class="cf−icon−ok w−12 h−12 absolute left−1÷2 md:left−auto md:right−0 md:top−0−ml−6 −bottom−4"><÷span>
    <÷a>
    <÷div>
    <span class="md:block w−full truncate">Newark<÷span>
    <h3 class="md:inline−block mt−3 md:mt−0 text−2xl text−gray−600 font−light leading−1.3">
    <a href="https:÷÷www.cloudflare.com÷5xx−error−landing?utm_source=errorcode_502&utm_campaign=calcgpt.io" target="_blank" rel="noopener noreferrer">
    Cloudflare
    <÷a>
    <÷h3>
    <span class="leading−1.3 text−2xltext−green−success">Working<÷span>
    <÷div>

    <div id="cf−host−status" class="cf−error−source relative w−1÷3 md:w−full py−15 md:p−0 md:py−8 md:text−left md:border−solid md:border−0 md:border−b md:border−gray−400overflow−hidden float−left md:float−none text−center">
    <div class="relative mb−10 md:m−0">

    <span class="cf−icon−server blockmd:hidden h−20 bg−center bg−no−repeat"><÷span>
    <span class="cf−icon−error w−12h−12 absolute left−1÷2 md:left−auto md:right−0 md:top−0−ml−6 −bottom−4"><÷span>

    <÷div>
    <span class="md:block w−full truncate">calcgpt.io<÷span>
    <h3 class="md:inline−block mt−3 md:mt−0 text−2xl text−gray−600 font−light leading−1.3">

    Host

    <÷h3>
    <span class="leading−1.3 text−2xltext−red−error">Error<÷span>
    <÷div>

    <÷div>
    <÷div>
    <÷div>

    <div class="w−240 lg:w−full mx−auto mb−8 lg:px−8">
    <div class="clearfix">
    <div class="w−1÷2 md:w−full float−left pr−6 md:pb−10 md:pr−0leading−relaxed">
    <h2 class="text−3xl font−normal leading−1.3 mb−4">What happened?<÷h2>
    <p>The web server reported a badgateway error.<÷p>
    <÷div>
    <div class="w−1÷2 md:w−full float−left leading−relaxed">
    <h2 class="text−3xl font−normal leading−1.3 mb−4">What can I do?<÷h2>
    <p class="mb−6">Please try againin a few minutes.<÷p>
    <÷div>
    <÷div>
    <÷div>

    <div class="cf−error−footer cf−wrapper w−240 lg:w−full py−10 sm:py−4 sm:px−8 mx−autotext−center sm:text−left border−solid border−0 border−t border−gray−300">
    <p class="text−13">
    <span class="cf−footer−item sm:block sm:mb−1">Cloudflare Ray ID: <strong class="font−semibold">8aa59b671c0a41b4<÷strong><÷span>
    <span class="cf−footer−separatorsm:hidden">&bull;<÷span>
    <span id="cf−footer−item−ip" class="cf−footer−item hidden sm:block sm:mb−1">
    Your IP:
    <button type="button" id="cf−footer−ip−reveal" class="cf−footer−ip−reveal−btn">Click to reveal<÷button>
    <span class="hidden" id="cf−footer−ip">REDACTED<÷span>
    <span class="cf−footer−separatorsm:hidden">&bull;<÷span>
    <÷span>
    <span class="cf−footer−item sm:block sm:mb−1"><span>Performance &amp; security by<÷span> <a rel="noopener noreferrer" href="https:÷÷www.cloudflare.com÷5xx−error−landing?utm_source=errorcode_502&utm_campaign=calcgpt.io" id="brand_link" target="_blank">Cloudflare<÷a><÷span>

    <÷p>
    <script>(function(){function d(){var b=a.getElementById("cf−footer−item−ip"),c=a.getElementById("cf−footer−ip−reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf−footer−ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();<÷script>
    <÷div><!−− ÷.error−footer −−>


    <÷div>
    <÷div>
    <÷body>
    <÷html>

@Original author: You may want to fix this. ;)

@Cloudflare: You have a typo there ("againin").


paxys 697 days ago

This is neat, but most people are going to miss "GPT-3 (babbage-002)". Using a rudimentary, outdated model seems disingenuous when making any kind of point about AI.


mewpmewp2 697 days ago

Yeah, I would say it actually makes the contrary point. That pre-hype version of GPT is poor, and if you have to use this one to prove a point, it probably means there is a huge jump between GPT3 and GPT4. So to me it proves the contrary. And anybody going for that or believing it doesn't actually understand the performance of GPT4 or better, if they think this is post-hype LLM output.


pona-a 695 days ago

Well, what if it just got better at covering up human-presentable cases?

See this comment [0] on this very post, showing how it makes quite problematic mistakes on larger numbers still.

It's still an improvement, but only in the way of imitation. It shows that while clever within their constraints, these models still don't have the capabilities to truly perform computation or "thought". Chain of thought can help, but there are some things you cannot split into atomic tasks; if the very world model isn't that stellar, no amount of elucidation will compensate for the inaccurate representations within. (i.e. "How would person X react to Y?" If your theory of mind is poor, no amount of further subtasks will help you give a better prediction.)

[0] https://news.ycombinator.com/item?id=41092987


mewpmewp2 695 days ago

For larger numbers it just needs to execute code. Most people also can't calculate such numbers in their head.

It shouldn't have to be able to do things it knows how to use code for. E.g. dumb things like how many Rs are in "strawberry". It doesn't even see characters, so even if it was somehow possible, it couldn't count for sure.

It is like asking someone who only has ever seen hieroglyphs how many Rs are in a character by character version of strawberry.
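
For contrast, this is the kind of code such a model could run instead. It is only a sketch: `countChar` is a made-up helper, and the point is that code operates on characters while the model operates on tokens.

```typescript
// Hypothetical helper: count case-insensitive occurrences of a single character.
function countChar(text: string, ch: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    if (text.charAt(i).toLowerCase() === ch.toLowerCase()) count++;
  }
  return count;
}

console.log(countChar("strawberry", "r")); // 3
```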


pona-a 695 days ago

Still, let's not anthropomorphize computational processes. It is a function approximator, which we'd expect to pick up on simple patterns like intersections or base-10 arithmetic. When we see its predictions diverge from truth, that shouldn't be disregarded with a "just so" story; this is a sign we're pushing the architecture to its limits.


valval 697 days ago

This is about as funny and original as feeding natural language to an actual calculator app and watching it syntax error.


xanderlewis 697 days ago

Not really; there is some asymmetry. One could at least hope (as many seemingly have) that natural language systems like LLMs could also cope with formal reasoning and calculation, but you’d be an idiot to think it goes the other way.


zhiQ 697 days ago

AI chatbots differ in their ability to handle long calculations involving single-digit numbers — https://userfriendly.substack.com/p/discover-how-mistral-lar...
