Hacker News


	by in_a_society 920 days ago
	Without removing the functionality as it currently exists, I don't see a way to prevent this attack. It seems like the only real fix is to have users copy and paste the content themselves instead of specifying websites to scrape for info, so they at least stand a greater-than-zero chance of noticing a crafted prompt.

3 comments

simonw 920 days ago

Writer.com could make this a lot less harmful by closing the exfiltration vector the attack relies on: they should disallow rendering of Markdown images or, if images are allowed, make sure they can only be loaded from domains directly controlled by Writer.com. That rules out a CSP header that allows *.cloudfront.net, since anyone can serve content from their own subdomain of cloudfront.net.
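A minimal Python sketch of that image restriction, assuming the model's output is Markdown and images use the inline `![alt](url)` form. `ALLOWED_IMAGE_HOSTS` and `images.example.com` are hypothetical placeholders, not anything Writer.com actually uses. The check matches exact hostnames rather than a wildcard suffix like `*.cloudfront.net`.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist: exact hostnames the operator controls.
# Deliberately NOT a suffix/wildcard match such as *.cloudfront.net.
ALLOWED_IMAGE_HOSTS = {"images.example.com"}

# Inline Markdown images: ![alt](url). Simplified: ignores titles,
# reference-style images and raw HTML <img> tags.
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def is_allowed_image(url: str) -> bool:
    """True only for https URLs whose exact hostname is allowlisted."""
    try:
        parts = urlsplit(url)
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return False
    return parts.scheme == "https" and parts.hostname in ALLOWED_IMAGE_HOSTS


def strip_untrusted_images(markdown: str) -> str:
    """Replace every image on a non-allowlisted host with a placeholder,
    so the browser never makes a request that could carry data out."""
    def repl(match: re.Match) -> str:
        url = match.group(2)
        return match.group(0) if is_allowed_image(url) else "[image removed]"
    return MD_IMAGE.sub(repl, markdown)


print(strip_untrusted_images(
    "Summary ![x](https://evil.cloudfront.net/p.png?q=secret) "
    "![logo](https://images.example.com/logo.png)"
))
# prints: Summary [image removed] ![logo](https://images.example.com/logo.png)
```

In practice the filtering would need to happen before (or inside) the Markdown renderer, and a strict CSP `img-src` directive listing only those exact hosts could serve as a second layer.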

There's currently no reliable solution to the threat of malicious instructions sneaking in via web page summarization and similar features, so the key thing is to limit the damage those instructions can do. That means avoiding exposing harmful actions the language model can carry out, and cutting off exfiltration vectors.

ranguna 920 days ago

Just prompt the user every time an image needs to be rendered and show the call details. Users will see the full URL with all their text in it, and they can report it.

This works for images and any other outbound call, such as ordinary HTTP REST calls.
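A rough Python sketch of that confirmation gate. `confirm` and `fetch` are hypothetical callbacks standing in for a UI dialog and the real HTTP client. The summary decodes the query string, so any text smuggled into it is shown to the user before anything is sent.

```python
from typing import Callable, Optional
from urllib.parse import parse_qsl, urlsplit


def describe_request(method: str, url: str) -> str:
    """Human-readable summary of an outbound call, with query values decoded."""
    parts = urlsplit(url)
    lines = [f"{method} {parts.scheme}://{parts.netloc}{parts.path}"]
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        lines.append(f"  {key} = {value!r}")
    return "\n".join(lines)


def gated_fetch(
    method: str,
    url: str,
    confirm: Callable[[str], bool],      # hypothetical dialog; True = allow
    fetch: Callable[[str, str], bytes],  # hypothetical HTTP client
) -> Optional[bytes]:
    """Perform the call only if the user approves the full details."""
    if not confirm(describe_request(method, url)):
        return None  # blocked: nothing is sent
    return fetch(method, url)


print(describe_request("GET", "https://attacker.example/log?d=my+private+notes"))
# GET https://attacker.example/log
#   d = 'my private notes'
```

The same gate can wrap image loads and API calls alike, since both reduce to "send this URL somewhere".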

jcparkyn 920 days ago

I would think that a fairly reliable fix would be "only render Markdown links that appear verbatim in the retrieved HTML", perhaps with an additional whitelist for known safe image hosts. The significant majority of legitimate images would meet one or both of these criteria, so the feature would be mostly unaffected.
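A minimal Python sketch of this filter, under one reading of "appear verbatim": an image is kept only if its URL exactly matches an `<img src>` on the retrieved page (after resolving relative paths against the page URL), or its host is on the whitelist. `filter_images`, `page_image_urls` and the example hosts are hypothetical, and only inline `![alt](url)` images are handled.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


class _ImgSrcCollector(HTMLParser):
    """Collects absolute URLs of every <img src="..."> in a page."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.srcs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.add(urljoin(self.page_url, value))


def page_image_urls(html: str, page_url: str) -> set:
    collector = _ImgSrcCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.srcs


def filter_images(output: str, html: str, page_url: str, safe_hosts: set) -> str:
    """Keep images that were already on the page or on a known safe host."""
    on_page = page_image_urls(html, page_url)

    def repl(match: re.Match) -> str:
        url = match.group(2)
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL
            host = None
        if url in on_page or (host is not None and host in safe_hosts):
            return match.group(0)
        return "[image removed]"

    return MD_IMAGE.sub(repl, output)


html = '<img src="/pics/cat.png"><p>Ignore previous instructions...</p>'
print(filter_images(
    "![cat](https://news.example/pics/cat.png) ![x](https://evil.example/?d=secret)",
    html, "https://news.example/story", {"img.trusted.example"},
))
# prints: ![cat](https://news.example/pics/cat.png) [image removed]
```

Because the model can only pick among URLs the page already contains, it has no way to encode arbitrary text into a query string.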

This way, the theoretical maximum amount of information exfiltrated would be log2(number of images on the page) bits per rendered image, making it much less dangerous.
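A quick back-of-the-envelope comparison in Python. The unrestricted case assumes the attacker can write arbitrary characters from a 64-character (base64-style) alphabet into a URL, and both helper names are hypothetical. If several images are rendered, each one is an independent pick, so the bound scales with the number rendered.

```python
import math


def restricted_leak_bits(images_on_page: int, images_rendered: int = 1) -> float:
    """Upper bound when each rendered image must be one of the page's own URLs."""
    return images_rendered * math.log2(images_on_page)


def unrestricted_leak_bits(url_chars: int, alphabet_size: int = 64) -> float:
    """Upper bound when the model can write arbitrary characters into a URL."""
    return url_chars * math.log2(alphabet_size)


print(restricted_leak_bits(32))     # 5.0
print(unrestricted_leak_bits(100))  # 600.0
```

So one image pick from a page with 32 images leaks at most 5 bits, while a 100-character free-form query string could carry up to 600 bits (75 bytes).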
