| HN Mirror


by andai 793 days ago

Supposedly LLMs (especially smaller ones) are best suited to tasks where the answer is in the text, i.e. summarization, translation, and answering questions.

Asking a model to answer questions from its own knowledge, with no source text to draw on, is much more prone to hallucination.

To that end I've been using Llama 3 for summarizing transcripts of YouTube videos. It does a decent job, but... every single time (literally 100% of the time), it will hallucinate a random name for the speaker.* Every time! I thought it might be the system prompt, but there isn't one.

My own prompt is just "{text}\n\n###\n\nPlease summarize the text above."

If I ask it to summarize in bullet points, it doesn't do that.

I'm assuming there was something in the (instruct) training data that strongly encourages that, i.e. a format of summaries beginning with the author's name? Seems sensible enough, but obviously backfires when there's literally no data and it just makes something up...

*In videos where the speaker's name isn't in the transcript. If it's a popular field, it will often come up with something plausible (e.g. Andrew Ng for an AI talk.) If it's something more obscure, it'll dream up something completely random.

3 comments

kiratp 793 days ago

The technique to use is to give the model an “out” for the missing/negative case.

"{text}\n\n###\n\nPlease summarize the text above. The text is a video transcript. It may not have the names of the speakers in it. If you need to refer to an unnamed speaker, call them Speaker_1, Speaker_2 and so on."
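A minimal sketch of how that prompt could be assembled in code. The function name `build_summary_prompt` is made up for illustration; the wording and the `###` separator are taken from the prompts quoted in this thread.

```python
def build_summary_prompt(transcript: str) -> str:
    """Return a summarization prompt that gives the model an explicit
    fallback ("out") for speakers whose names are not in the transcript."""
    instructions = (
        "Please summarize the text above. "
        "The text is a video transcript. "
        "It may not have the names of the speakers in it. "
        "If you need to refer to an unnamed speaker, "
        "call them Speaker_1, Speaker_2 and so on."
    )
    # Same layout as the original prompt: text, a ### separator, then the request.
    return f"{transcript}\n\n###\n\n{instructions}"


if __name__ == "__main__":
    # Hypothetical transcript snippet, only to show the resulting prompt.
    print(build_summary_prompt("Today we are going to talk about gradient descent."))
```

The only change from the original prompt is the extra instruction text, so the two versions can be compared on the same transcripts to see whether the invented names go away.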


woodson 793 days ago

I had very bad results with small models in particular when using them for translation. Even trying all kinds of tricks didn’t help (apparently prompting in the target language helps for some). Encoder-decoder models such as FLAN-T5 or MADLAD-400 seemed far superior at equal or even smaller model size.


andai 793 days ago

I forget which model (LLaMA 3?) but I heard 95% of the training data was English.


Grimblewald 792 days ago

For sure. My use case, for example, is:

"using the following documentation to guide you {api documentation}, edit this code {relevant code}, with the following objective: Replace uses of {old API calls} in {some function} with relevant functions from the supplied documentation"

It mostly works, but if the context is a little too long, it will sometimes just spam the same umlaut or number over and over (always umlauts or numbers). Perhaps tuning parameters like temperature or repetition penalty might fix it; time will tell.


andai 792 days ago

Are you using ollama? The devs said there is a bug that occurs when the context is full, and they're working on it.


Grimblewald 780 days ago

That would do it, I am indeed.
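A rough sketch of the knobs discussed above, assuming Ollama's REST API accepts an `options` object with `num_ctx`, `temperature`, and `repeat_penalty` (worth checking against the current Ollama docs). The helper names, the default values, and the 4-characters-per-token estimate are illustrative assumptions, not anything from the thread, and none of this is claimed to fix the bug itself. It only builds the request body and checks whether a prompt is likely to fill the context window.

```python
import json


def build_ollama_request(model: str, prompt: str, *, num_ctx: int = 8192,
                         temperature: float = 0.2,
                         repeat_penalty: float = 1.1) -> dict:
    """Build a JSON-serializable request body for a text generation call.

    The option names are assumed to match Ollama's documented parameters;
    the default values are placeholders to tune, not recommendations.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_ctx": num_ctx,                # context window size in tokens
            "temperature": temperature,        # lower means less random sampling
            "repeat_penalty": repeat_penalty,  # values above 1 discourage repeats
        },
    }


def roughly_fits(prompt: str, num_ctx: int, reserve_for_output: int = 512,
                 chars_per_token: float = 4.0) -> bool:
    """Crude check that the prompt leaves room for the reply.

    Uses a characters-per-token heuristic (an assumption, not a real
    tokenizer), so treat the result as a warning rather than a guarantee.
    """
    estimated_tokens = len(prompt) / chars_per_token
    return estimated_tokens + reserve_for_output <= num_ctx


if __name__ == "__main__":
    prompt = "using the following documentation to guide you ..."
    body = build_ollama_request("llama3", prompt)  # model name is an example
    if not roughly_fits(prompt, body["options"]["num_ctx"]):
        print("Prompt may fill the context window; consider trimming it.")
    print(json.dumps(body, indent=2))
```

If the estimate says the prompt does not fit, trimming the documentation or code passed in is probably a safer first step than adjusting the sampling parameters.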
