Hacker News
by alariccole 199 days ago
ChatGPT just told me to put the turkey in my toaster oven legs facing the door, and you think it can replace school. Unless there is a massive architectural change that can be provably verified by third parties, this can never be. I’d hate for my unschooled surgeon to check an LLM while I’m under.
3 comments

Just curious, not being a turkey SME, what's the downside to positioning the turkey that way?
Most turkeys of my acquaintance would not fit into a toaster oven without some percussive assistance.
I see, I overlooked the 'toaster' part. That's a good world model benchmark question for models and a good reading comprehension question for humans. :-P

GPT 5.1 Pro made the same mistake ("Face the legs away from the door"). Claude Sonnet 4.5 agreed but added, "Note: Most toaster ovens max out around 10-12 pounds for a whole turkey."

Gemini 3 acknowledged that toaster ovens are usually very compact and that the legs shouldn't be positioned where they will touch the glass door. When challenged, it hand-waved something to the effect of "Well, some toaster ovens are large countertop convection units that can hold up to a 12-pound turkey." When asked for a brand and model number of such an oven, it backtracked and admitted that no toaster oven would be large enough.

Changing the prompt to explicitly specify a 12-pound turkey yielded good answers ("A 12-pound turkey won't fit in a toaster oven - most max out at 4-6 pounds for poultry. Attempting this would be a fire hazard and result in dangerously uneven cooking," from Sonnet.)

So, progress, but not enough.

Don't worry, someone will put another hack on top of the model to teach it to handle this specific case better. That will totally fix the problem, right? Right?
What's the alternative if someone doesn't know something during a procedure? Just wing it? Getting another data point from an LLM seems beneficial to me.
Ask a human who does. If there are no competent humans on-call before the procedure starts, reschedule the procedure.
A trained professional making their best guess is far more capable and trustworthy than the slop LLMs put out. So yeah, winging it is a good alternative here.