Hacker News
by fnordpiglet 1136 days ago
I’ve been comparing Bard and ChatGPT in most of my tasks since Bard’s release. Bard is infuriating. It claims it can’t answer most things, although if I tweak the prompt it’ll eventually answer. It has terrible contextual awareness - it literally can’t carry a thread from one prompt to the next - each prompt is self-contained as far as I can tell. There’s no history of prior discussions other than in Google activity, but you can’t resume those. In theory they all form a continuum, but I don’t really want that - I want the context from one session to be distinct from the other.

I don’t think ChatGPT is a particularly amazing interface or experience. But Bard is so far off the mark it makes me realize Google still won’t be able to make a product. LLMs, as they mature and integrate into a better ecosystem of feedback mechanisms and interfaces, will eat Google’s search business alive, and I see no indication Google can do anything about that.

3 comments

I feel the same way about Bing Chat. GPT-4 is still the king; however, phind.com may be better for coding, though I only have the browser plugin now and haven't really vetted it.

The Codeium and Genie plugins for VS Code and CodeWhisperer replaced my need for Copilot.

Though I try to use Codeium first, since it's free and Genie uses my GPT-4 API key.

Have you tested in the last few days? There was a major upgrade.
I have compared the updated PaLM 2 model directly against GPT 4, and Bard is dramatically worse than GPT 4 at everything I've tried. It refused to translate to a non-English language that I know, so its one supposed strength is not available for evaluation.

Disclaimer: I have no way to know whether Bard is using PaLM 1 or PaLM 2 in "my country". This is because Google knows better, and doesn't want to bother my pretty little head with these irrelevant facts. So, if my review of Bard is in some way unfair because I was unknowingly using PaLM 1, this is 100% Google's own fault.

I tried a bunch of logic problems, tricky sentences, etc... and Bard failed all of them, or at least partially failed.

The current favourite test is to ask LLMs to perform transformations such as translations or explanations of difficult sentences.

Prompt:

For each use of the word bear or its variants in the following sentence, put the meaning after the word in brackets: In an interview with John Bear -- a real bear of man -- of Bear Stearns, said he was bearish about bear pelt futures on the famous Bear Markets of Bern because he saw a bear wandering the streets, indicating that there may be too many wild bears.

GPT 4, first attempt:

In an interview with John Bear [a surname] -- a real bear [metaphor: a large, physically imposing person] of a man -- of Bear Stearns [a company name], said he was bearish [financial term: expecting a decline in prices] about bear pelt futures [referring to the hides of the animal, bear] on the famous Bear Markets [a play on words: typically refers to a declining market, but here it's used as a market name] of Bern because he saw a bear [an animal] wandering the streets, indicating that there may be too many wild bears [animals].

Comment: Spot on, including picking up that the capitals in Bear Markets are a hint that it's the name of the market, not the market decline concept. In some runs it also picks up that the city name Bern means Bear in German.

Google Bard as of today, best of 3:

In an interview with John Bear (a real bear of a man), of Bear Stearns, said he was bearish (pessimistic) about bear pelt futures (the price of bear fur) on the famous Bear Markets of Bern (a type of market where bear pelts are traded) because he saw a bear (a large, furry mammal) wandering the streets, indicating that there may be too many wild bears (an excess of bears).

Comment: Muddles things up, combines things incorrectly, missed the bank name, put the brackets in the wrong places, etc... Most other results don't follow the prompt at all, just listing bullet points instead.

To compare reasoning capability, a follow-up question might be "Why might Bear think this about bear pelts?"

GPT 4: "Based on this context, it can be speculated that John Bear's bearish stance on bear pelt futures could be influenced by the potential oversupply of bear pelts in the market."

Bard: "If there are too many bears, they may become a nuisance or even a danger to humans."

There's just no comparison...
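
For anyone who wants to repeat the bear test on more than a handful of runs, here is a rough sketch of a format check. It only counts how many uses of bear/bears/bearish in a response are immediately followed by a bracketed meaning; it says nothing about whether the meaning is right. The function name `annotation_coverage` is made up for illustration, and accepting both square and round brackets is my assumption about what "brackets" should allow.

```python
import re

# Whole-word "bear", "Bear", "bears" or "bearish" (assumed set of variants).
BEAR_WORD = re.compile(r"\b[Bb]ear(?:s|ish)?\b")
# The same word immediately followed by a [bracketed] or (bracketed) meaning.
ANNOTATED = re.compile(r"\b[Bb]ear(?:s|ish)?\b\s*[\[(][^\])]+[\])]")
# Any bracketed span, used to drop annotations before counting words.
BRACKETED = re.compile(r"\s*[\[(][^\])]*[\])]")


def annotation_coverage(response: str) -> tuple[int, int]:
    """Return (annotated, total) for bear-words in a model response.

    total counts bear-words outside brackets, so a "bear" that appears
    inside an annotation is not counted as something needing a meaning.
    """
    annotated = len(ANNOTATED.findall(response))
    outside = BRACKETED.sub("", response)
    total = len(BEAR_WORD.findall(outside))
    return annotated, total


if __name__ == "__main__":
    sample = "John Bear [a surname] saw a bear [an animal] and was bearish about it."
    print(annotation_coverage(sample))
```

A response that puts the meaning after a later word (say, after "futures" instead of after "bear") scores lower on this check, which is the "brackets in the wrong places" failure. Judging the meanings themselves still needs a human read.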

Yes this matches my experiences 100%. I have been using PaLM 2.
From what I've seen, Bard often gives great answers but sometimes it can't answer even very simple questions.
Yeah, when it answers they’re good answers, and more up to date. But it’s so difficult to interact with that any value is lost.