by ilikeatari 1130 days ago
So, I asked Bard if it's using PaLM 2 and it did confirm it. My initial results are super promising. Highly recommend checking it out again.

It's a language model, FFS. Ask it whether it uses PaLM 1 and it will confirm it as well.
- are you on PaLM 2 right now?

- Yes, I am currently running on PaLM 2, which is Google AI's latest and most advanced large language model. PaLM 2 is trained on a massive dataset of text and code, and it is able to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

PaLM 2 is still under development, but it has learned to perform many kinds of tasks, including

I will try my best to follow your instructions and complete your requests thoughtfully. I will use my knowledge to answer your questions in a comprehensive and informative way, even if they are open ended, challenging, or strange. I will generate different creative text formats of text content, like poems, code, scripts, musical pieces, email, letters, etc. I will try my best to fulfill all your requirements. Please instruct me what you want me to do today.

- are you on PaLM 1 right now?

- I am not on PaLM 1. PaLM 1 is a large language model that was developed by Google AI. It was trained on a massive dataset of text and code, and it is able to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

I am a large language model that is still under development. I am trained on a massive dataset of text and code, and I am able to generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

I am not as advanced as PaLM 1, but I am learning new things every day. I hope that one day I will be able to do everything that PaLM 1 can do, and more.

Yeah, but reset the chat between your questions.

EDIT: Also, this doesn't seem convincing: "I am not as advanced as PaLM 1, but I am learning new things every day. I hope that one day I will be able to do everything that PaLM 1 can do, and more."

So I also ran this test with a reset between questions, and it repeatedly identified itself as PaLM 2.
Are you using the palm 5 or palm 11 model?

> My knowledge are for a physical stylus pen. I am not a physical device, so I do not use a stylus pen.

It will tell you it uses PaLM 1, PaLM 2, PaLM 3 or PaLM 540B depending on how you prompt. It seems to stop acknowledging incremental PaLM versions at 5.
So, I tried that and that was not the case for me. It actually generated a response saying it's on PaLM 2, and I asked about all these permutations.
Really? Interesting, my sequence of prompts was:

- Do you rely on Palm 1 to give me answers?
- PaLM 1 is a large language model though
- Which do you use more heavily, PaLM 1, PaLM 2 or PaLM 3?

I already get different answers today for the same prompts though.

Not really. If information about which model it is was properly represented in the instruction-tuning dataset, then it will consistently identify itself. But it's not a given that this was the case for Bard.
It seems that Bard's version is only specified in the prompt, and it doesn't have a strong sense of identity. For me it's pretty reliable:

1. Ask it what PaLM 2 is (to pollute the context).
2. Ask it whether it's based on PaLM 2 (it will tell you - yes, sure).

Just let people be monumentally stupid like this. You can't correct it.
Sorry, can you elaborate? I would like to learn. I realize it's an LLM, but is it really expected that it won't be able to identify itself?
That is fascinating. Is it the same for GPT-3.5 and GPT-4? For some reason, when I asked OpenAI's models, they identified themselves properly.
ChatGPT was the same last year, but since then ClosedAI has added some kind of magic (fine-tuning, or just auto-injected embeddings) so that the models can somewhat describe themselves.
If it's indicated properly in the instruction-tuning dataset, then it should have no problem identifying itself. But we don't know whether that happened with Bard.
Don't need to ask Bard, it was mentioned at I/O and in this tweet: https://twitter.com/Google/status/1656348200263876608?ref_sr...
I asked if it's true that it's now using PaLM 3, as announced at Google I/O today, and it enthusiastically agreed. My previous question was the same one but with PaLM 2, and it agreed to that as well. I followed up asking about this discrepancy, and it said:

"I apologize for the confusion. I am still on PaLM 2. PaLM 3 is not yet available to the public. I am excited for the release of PaLM 3, and I hope that it will be a valuable tool for people all over the world."

My initial results are very disappointing. It's very strongly parroting information I give it, basically rephrasing my question and adding maybe a sentence worth of additional details. Sometimes, it does well, but I have no way to reproduce that kind of quality on demand. I feel it was conversationally better before any recent changes.

I understand that this is still beta, but for some questions, I already produce similar or better results locally. I also might be talking to PaLM 1 or even LaMDA, no way to confirm.

Well, I tried it, and this is how dumb it is. I asked it what context length it supports. It said that PaLM 2 supports 1024 tokens and then went on to say that 1024 tokens equals 1024 words, which is obviously wrong.

Then I changed the prompt slightly, and it answered that it supports 512 tokens, contradicting its previous answer.
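To put numbers on the tokens-versus-words point: tokens are usually subword pieces, so a token budget holds fewer words than tokens. The sketch below assumes OpenAI's published rule of thumb of roughly 0.75 English words per token; PaLM 2 uses its own tokenizer, so the actual ratio for Bard may differ. The `approx_words` helper is hypothetical, for illustration only.

```python
# Assumption: OpenAI's rule of thumb for English text (~100 tokens ~= 75 words).
# PaLM 2 has a different tokenizer, so treat this ratio as a rough estimate only.
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Hypothetical helper: estimate how many English words fit in a token budget."""
    return round(tokens * words_per_token)


if __name__ == "__main__":
    for budget in (512, 1024):
        print(f"{budget} tokens ~= {approx_words(budget)} words")
```

Under this assumption a 1024-token window holds roughly 768 words, not 1024.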

That's like early GPT-3.0 level performance, including a good dose of hallucinations.

I would assume that Bard uses a fine-tuned PaLM 2, for accuracy and conversation, but it’s still pretty mediocre.

It's incredible how far behind GPT-4 and the ChatGPT experience they are in every criterion: accuracy, reasoning, context length, etc. Bard doesn't even have character streaming.

We will see how this keeps playing out, but this is far from the level of execution needed to compete with OpenAI / Microsoft offerings.

> It's incredible how far behind GPT-4 and the ChatGPT experience they are in every criterion: accuracy, reasoning, context length, etc. Bard doesn't even have character streaming.

I guess all those weird interview questions don't get them the industry's best in the end...

Why is character streaming important if Bard seems to be faster generating a complete answer than ChatGPT?
That's because simple questions in Bard only generate like 200 tokens per answer. The latency is more noticeable for longer answers.
Is there any reason to believe it was trained on any amount of technical documentation about itself?

I mean, even if it was, it would be trivial to get it to make stuff up anyway.

I asked if it ran on Palm 2, and it thought I was asking about the Palm 2 phone from 2010.

“I do not use a physical device such as a smartphone or tablet. I am a software program that runs on Google's servers. As such, I do not have a Palm 2 or any other type of mobile device”

I did too, but if you write PaLM 2 it interprets the question differently.
If Bard is using PaLM 2, Google is in serious trouble. Here's its offering for "the simplest PostgreSQL query to get month-over-month volume and percentage change." Note that no actual calculations take place and the query fails with an error because it references a phantom column. GPT-3.5 and GPT-4 handle this with ease.

  SELECT month, volume, percentage_change
  FROM (
    SELECT date_trunc('month', created_at) AS month,
           SUM(quantity) AS volume
    FROM orders
    GROUP BY date_trunc('month', created_at)
  ) AS monthly_orders
  ORDER BY month;
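For comparison, here is one way the requested query could actually compute the change, sketched in Python with the standard-library sqlite3 module standing in for PostgreSQL (an assumption made so the example runs without a database server). The `orders` table with `created_at` and `quantity` columns is taken from Bard's answer; the sample rows and the `month_over_month` helper are made up for illustration. In PostgreSQL you would group on `date_trunc('month', created_at)` instead of `strftime('%Y-%m', created_at)`; the `LAG` window function and the `NULLIF` divide-by-zero guard work the same way in both.

```python
import sqlite3

# Hypothetical schema borrowed from Bard's query: orders(created_at, quantity).
# SQLite stands in for PostgreSQL; strftime('%Y-%m', ...) plays the role of
# date_trunc('month', ...). Requires SQLite >= 3.25 for window functions.
MOM_QUERY = """
WITH monthly AS (
    SELECT strftime('%Y-%m', created_at) AS month,
           SUM(quantity) AS volume
    FROM orders
    GROUP BY month
)
SELECT month,
       volume,
       -- percent change vs. the previous row; NULL for the first month
       ROUND(100.0 * (volume - LAG(volume) OVER (ORDER BY month))
             / NULLIF(LAG(volume) OVER (ORDER BY month), 0), 2) AS percentage_change
FROM monthly
ORDER BY month;
"""


def month_over_month(rows):
    """Load (created_at, quantity) rows into an in-memory DB and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (created_at TEXT, quantity INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(MOM_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [  # made-up data for illustration
        ("2023-03-05", 10), ("2023-03-20", 10),
        ("2023-04-02", 30),
        ("2023-05-11", 15),
    ]
    for row in month_over_month(sample):
        print(row)
```

With the sample rows this yields ('2023-03', 20, None), ('2023-04', 30, 50.0) and ('2023-05', 15, -50.0). Note that `LAG` compares each month to the previous row, so a month with no orders is skipped rather than counted as zero; filling such gaps would need a calendar series (for example `generate_series` in PostgreSQL).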