by latexr 617 days ago
Sorry to say that it failed at everything I threw at it. For coding questions it flat out refused to answer. For factual questions it was dead wrong. I asked a few very easy questions about Monty Python members, all of them verifiable with a 10-second search, and it got every answer wrong.

Hey! Sorry about that. It's still not perfect, but it shows that using a CoT prompt does improve LLM responses. Compared with its base model, you can clearly see a difference. If you like, please email me at contact@pixelverse.tech with the prompts t1 failed to answer correctly and I can take a look.
> but it shows that using a CoT prompt does improve LLM responses.

A wrong answer is a wrong answer. On one of the questions it failed in exactly the same way GPT-4o did when I asked it, so it's not at all clear this is better. I could even see the chain of thought and identify exactly where it made the mistake, but that's not much consolation.

As I said, it's not perfect at answering every question correctly. My point is that CoT prompting does have an effect on the quality of LLM responses. Ask t1 and Llama 3.1 how many r's are in "strawberry" (or a similar question) and you will see that the CoT strategy makes a difference.
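A minimal sketch of the side-by-side comparison described above, assuming you send both prompts to the same model and compare the replies yourself. t1's actual prompt is not shown in this thread, so the prompt wording and the helper names `direct_prompt`, `cot_prompt`, and `ground_truth` are hypothetical. The point is only to show the difference between asking for a bare answer and asking for step-by-step reasoning, plus a direct count to check either reply against.

```python
# Hypothetical sketch: build a direct prompt and a chain-of-thought (CoT) prompt
# for the same question, so the two model replies can be compared side by side.
# The prompt wording is an assumption, not t1's real system prompt.

QUESTION = 'How many times does the letter "r" appear in the word "strawberry"?'


def direct_prompt(question: str) -> str:
    """Ask for the answer only, with no intermediate reasoning."""
    return f"{question}\nAnswer with a single number."


def cot_prompt(question: str) -> str:
    """Ask the model to reason step by step before giving a final answer."""
    return (
        f"{question}\n"
        "Think step by step: spell the word out one letter at a time, "
        "keep a running count of each 'r', and then give the final number "
        "on its own line prefixed with 'Answer:'."
    )


def ground_truth(word: str, letter: str) -> int:
    """Count occurrences directly, to check whichever answer a model gives."""
    return word.count(letter)


if __name__ == "__main__":
    print(direct_prompt(QUESTION))
    print("---")
    print(cot_prompt(QUESTION))
    print("---")
    print("Expected answer:", ground_truth("strawberry", "r"))  # 3
```

Note that this only builds the prompts; as the thread itself shows, a visible chain of reasoning makes mistakes easier to spot but does not guarantee a correct answer.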
Also, to be clear, I never claimed that t1 is better than GPT-4o or o1, but thank you for trying it and providing feedback :)