Hacker News new | ask | show | jobs
by loudmax 1049 days ago
My understanding of ghost attention is that the interface will inject periodic reminders into the conversation. These are seen by the model but hidden from the user. They do use up some of the tokens available in the context window.
2 comments

But from the paper, it sounds like this happens only during training. Some trick about constantly re-injecting the system prompt during chat conversations.

But during inference, there's no trick. The system message remains once at the top.

Couldn't you replicate that by doing the same thing and prepending system prompts in the Vicuna models?