| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jewel 841 days ago

We have a product that uses ChatGPT via the API, using the 3.5 turbo version. Our query involves some dates. Instead of giving back text like it usually does, today it has been giving errors because it does not think 2024-02-29 is a valid date.

This is easy to reproduce with the web interface, at least sometimes [0]. It start out by saying it's not a valid date and then as it's explaining why it isn't it realizes its mistake and sometimes corrects itself.

[0] https://chat.openai.com/share/37490c9f-81d6-499f-b491-116536...

4 comments

esalman 841 days ago

Blows my mind that people consider using ChatGPT for serious applications. I mean it's fine as a code autocorrect/autocomplete tool as in GitHub copilot. But it should not replace the code itself. You encounter a bug in the code, you fix it, you never encounter it again. But ChatGPT will repeat the same mistake sooner or later. That's not how we should engineer solution for critical problems.

Draiken 841 days ago

As long as you add the "AI" keyword to the product/app/company and sell to people that don't understand how unreliable it is, you're good.

Let me rephrase that: you can profit from it. Even if it's not good.

makoto12 841 days ago

If you sandbox your connection to openAI correctly, then you can get the benefit of a llm without making your application look silly at the same time. Identifying the correct places in your business to use it is tricky, but imo it certainly makes sense in a lot of specific areas. Just not a catch all that can run your business for you

ativzzz 841 days ago

It's great for prototyping and creating outlines/rough drafts or for creating rough summaries - you can build this into some features to help your customers speed up writing lots of text

lucumo 841 days ago

If the cost-savings is worth it compared to the problems...

I mean, that's how we do it with humans. It's quite a common occurrence to keep a part of a business process human because automating it would we too expense due to edge cases.

Humans make mistakes and are expensive, but are also flexible and usually smartish. ChatGPT makes mistakes and is usually dumbish, but is also flexible and cheap.

Engineering is about picking the right trade-offs in your solution.

fuzztester 841 days ago

>I mean, that's how we do it with humans. It's quite a common occurrence to keep a part of a business process human because automating it would we too expense due to edge cases.

Which part of the human do people keep? The head? Arms? ;)

#ParsingAmbiguityError

fractalb 841 days ago

> ChatGPT makes mistakes and is usually dumbish, but is also flexible and cheap.

People wouldn't mind it if the keyword `dumbish` has been all along there.

littlestymaar 841 days ago

Wired: LLM are practically AGI

Tired: ChatGPT thinks February 29th isn't a valid date.

pquki4 841 days ago

My favorite prompt: asking "How many e's are there in the word egregious". Always says three (ChatGPT 3.5, 4, 4 turbo). When you ask it which three, it realizes its mistake and apologizes (or sometimes tells you where they are that is completely wrong). Looks like it just outputs gibberish for these things.

moozilla 841 days ago

ChatGPT is specifically bad at these kinds of tasks because of tokenization. If you plug your query into https://platform.openai.com/tokenizer, you can see that"egregious" is a single token, so the LLM doesn't actually see any "e" characters -- to answer your question it would have had to learn a fact about how the word was spelled from it's training data, and I imagine texts explicitly talking about how words are spelled are not very common.

Good explanation here if this still doesn't make sense: https://twitter.com/npew/status/1525900849888866307, or check out Andrej Karpathy's latest video if you have 2 hours for a deep dive: https://www.youtube.com/watch?v=zduSFxRajkE

IMO questions about spelling or number sense are pretty tired as gotchas, because they are all basically just artifacts of this implementation detail. There are other language models available that don't have this issue. BTW this is also the reason DALL-E etc suck at generating text in images.

Izkata 841 days ago

> If you plug your query into https://platform.openai.com/tokenizer, you can see that"egregious" is a single token

That says it's 3 tokens.

wruza 841 days ago

It doesn’t even matter how many tokens there is, because LLMs are completely ignorant about how their input is structured. They don’t see letters or syllables cause they have no “eyes”. The closest analogy with a human is that vocal-ish concepts just emerge in their mind without any visual representation. They can only “recall” how many “e”s are there, but cannot look and count.

alickz 841 days ago

>They can only “recall” how many “e”s are there, but cannot look and count.

Like a blind person?

brewtide 841 days ago

Pre-cogs, I knew it.

redox99 841 days ago

" egregious" (with a leading space) is the single token. Most lower case word tokens start with a space.

ToValueFunfetti 841 days ago

The number of tokens depends on context; if you just entered 'egregious' it will have broken it into three tokens, but with the whole query it's one.

fuzztester 841 days ago

Why three tokens, not one?

chgs 841 days ago

Chatgpt could say “I don’t know”

devjab 841 days ago

It’s always gibberish, it’s just really good at guessing.

I forgot what exactly it was I was doing. We were trying to get it to generate word lists of words ending with x or maybe it was starting with. For a marketing PoC and it made up oceans of words that not only didn’t start/end with x but mostly didn’t include x at all.

Isn’t this also why it can pass CS exams and job interviews better than like 95% of us, but then can’t help you solve the most simple business process in the world. Because nobody has asked that question two billion times on its training data.

mewpmewp2 841 days ago

But also it doesn't see characters. It sees tokens. The only way it would be reliably able to solve it is if it had a lookup table of token to characters. Which it likely doesn't.

You couldn't do it either unless you learned the exact matchings of all tokens to all characters in that token and their positions if you were given tokens as an input. You would have learned the meaning of the token, but not what the exact characters it represents.

yousif_123123 841 days ago

Even if it sees tokens, I don't think it's an impossible task. Certainly an advanced enough LLM should be able to decipher token meanings, to know that a word is made up of the individual character tokens regardless of how the full word is tokenized. Maybe something gpt5 can do (or there's some real technical limitation which I don't understand)

Izkata 841 days ago

> to know that a word is made up of the individual character tokens

A token is the smallest unit, it's not made of further tokens. It maps to a number.

mewpmewp2 840 days ago

The point is though that this is definitely not a task to evaluate an LLMs intelligence with. It's kind of like laughing at Einstein when he wouldn't be able to decipher hieroglyphs in any language without previous training for those hieroglyphs. Could Einstein potentially learn those hieroglyphs? Sure. But is it the best use of his time - or memory space?

botanical 841 days ago

I just asked it with Gemini; at first, it got it right. Then I asked if it was sure and it apologised and said 3 is the correct answer. When asked what are the 3 "e"s, it says:

> The three "e"s in "egregious" are: > > 1. The first "e" is located in the first syllable, between the "g" and the "g". > 2. The second "e" is located in the second syllable, following the "r". > 3. The third "e" is located in the third syllable, following the "i" and before the last "o".

AlienRobot 841 days ago

That's because it's trained on 2021 data. No February 29 back then!

pants2 841 days ago

For the record, both ChatGPT-4 and Gemini Ultra affirmed that it's a valid date. Gemini reasoned through it and GPT-4 ran python code to make sure 2024 was divisible by 4.

roland35 841 days ago

Interesting... But that isn't exactly true! Centuries that are not divisible by 4 don't count!

swores 841 days ago

I've yet to come across a form of 100 that isn't divisible by 4... since 25 usually still exists!

But I do remember there being some weird niche rules about which years are or aren't leap years, so I'm guessing your comment is basically right just wrongly worded?

latexr 841 days ago

The rule is that leap years are the ones divisible by 4. Unless it’s also divisible by 100. Unless unless it’s divisible by 400.

So 2000 was leap, but 2100, 2200, and 2300 won’t be, but 2400 will be.

swores 841 days ago

Ahh, so it's centuries that aren't divisible by 400 rather than that aren't divisible by 4, that makes more sense!

Thanks for answering

inglor_cz 841 days ago

The GP formulated it in a somewhat unclear way. "Centuries" divisible by 4 probably meant "years" divisible by 400.

So, 19th century (1900 is the last year) isn't divisible by 4 (19/4 is not integer), which is the same as saying that 1900 isn't divisible by 400.

This is the main reform of the Gregorian calendar - leap days aren't introduced on xy00 years which aren't divisible by 400. This corrects the length of a year to 365.2425 days, which is fairly close to the real value of 364.2422 days.

The original Julian calendar had year of 365.25 days, which aggregated an error of more than ten days over the centuries.

daseiner1 841 days ago

Did it also check if 2024 is divisible by 100 but not 400?

peteradio 841 days ago

But akshually neither does my uncle Ned

lynx23 841 days ago

Saw the same via the API with gpt-4-0125-preview. IOW, even gpt-4 thinks 2024 is not a leap year.

fragmede 841 days ago

That's weird, I don't get that with gpt-4

https://chat.openai.com/share/336b1c4b-53b7-4d56-ac68-e3c868...

lynx23 836 days ago

Given the probabilistic nature of LLMs, you likely need to run an experiment several times to see the various different results it can produce.

baron816 841 days ago

ChatGPT thinks today is March 1. Go ahead and ask it what today’s date is.

huytersd 841 days ago

> No, 2024 is not a leap year. Leap years are years divisible by 4, except for years that are divisible by 100 unless they are also divisible by 400. Therefore, the next leap year will be 2024.

rodface 841 days ago

As the proud new owner of a leap day baby, I find this to be extra hilarious

pizzafeelsright 841 days ago

I was at the hospital and heard two babies being born. I love that music.

heywoods 841 days ago

Congratulations!

hilux 841 days ago

Congratulations!

Big savings on parties and cake!

muzani 841 days ago

Apparently it reads 2024 as two different numbers: 202 and 4

s9df898r32h 841 days ago

thats better then Mistral... it "thinks" it is March 15th 2023

"The current date is March 15, 2023. However, please note that as a large language model, my knowledge is based on the data I was trained on, which is up to 2021. Therefore, I cannot provide real-time information or updates on current events or dates. I recommend checking a reliable source such as a calendar or a trusted news website for the most accurate and up-to-date information."

vdfs 841 days ago

It is March 1 in many places

cheapgeek 841 days ago

It's UTC. Ask it what day was yesterday.

js2 841 days ago

As of 10 PM US/Eastern, ChatGPT 3.5 answers as follows for me:

> You: what is today's date?

> ChatGPT: Today's date is February 29, 2024. Please note that February 29 occurs only in leap years. If you have any more questions or if there's anything else I can help you with, feel free to ask!

floydianspiral 841 days ago

openai actually in their platform for billing, etc shows that its actually march 1 as well.