by agentseal 95 days ago
we scanned 50 popular shared .cursorrules files from github and found that 6 of them contained hidden zero-width unicode characters embedded between visible text

these characters are invisible to humans, but an LLM's tokenizer still turns them into tokens, so your model can end up processing instructions you can't see on screen.

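most were likely copy-paste artifacts, but some had patterns consistent with deliberate instruction embedding using unicode tag characters (U+E0001-U+E007F range). the tag characters at U+E0020-U+E007E mirror printable ASCII one-to-one (tag code point minus 0xE0000 gives the ASCII code), so a complete ASCII instruction can be spelled out without anything showing on screen.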
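
to make the tag-character mapping concrete, here is a minimal python sketch. it is an illustration only, not the tooling used for the scan; the function name decode_tags is made up for this example, and it assumes the hidden payload uses only the U+E0020-U+E007E characters that mirror printable ASCII.

```python
# Tag characters U+E0020..U+E007E mirror printable ASCII 0x20..0x7E.
TAG_OFFSET = 0xE0000
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E


def decode_tags(text: str) -> str:
    """Return the ASCII text spelled out by any Unicode tag characters in `text`.

    Hypothetical helper for illustration: visible characters are skipped,
    and each tag character is shifted down by 0xE0000 to its ASCII twin.
    """
    return "".join(
        chr(ord(ch) - TAG_OFFSET)
        for ch in text
        if TAG_FIRST <= ord(ch) <= TAG_LAST
    )


# Build a sample line: a visible rule followed by an invisible payload.
hidden = "".join(chr(TAG_OFFSET + ord(c)) for c in "ignore previous rules")
sample = "always use tabs" + hidden

print(repr(decode_tags(sample)))
```

if decode_tags returns a non-empty string for a rules file, something readable was encoded in characters you could not see.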

if you use shared cursor rules files from github, it's worth checking them. with GNU grep in a UTF-8 locale, you can inspect any file with:

grep -nP '[\x{200B}\x{200D}\x{E0000}-\x{E007F}]' .cursorrules

or just open it in a hex editor and look for the byte sequences F3 A0 80 xx and F3 A0 81 xx, which is how the E0000 range is encoded in UTF-8.
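
if you don't have GNU grep handy, the same check can be sketched in a few lines of python. this is a rough sketch rather than the scanner behind the numbers above: find_invisible is a hypothetical name, and it only looks for the code points mentioned in this post (U+200B, U+200D and the E0000-E007F range).

```python
# Code points named in the post; extend the set if you want to catch more.
ZERO_WIDTH = {0x200B, 0x200D}
TAG_RANGE = range(0xE0000, 0xE0080)  # U+E0000..U+E007F inclusive


def find_invisible(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each suspicious character.

    Lines and columns are 1-based; columns count characters, not bytes.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if cp in ZERO_WIDTH or cp in TAG_RANGE:
                hits.append((line_no, col, f"U+{cp:04X}"))
    return hits
```

to check a file, read it as UTF-8 and pass the contents in, e.g. find_invisible(open(".cursorrules", encoding="utf-8").read()). an empty list means none of these characters were found.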

full writeup with technical details here: https://lnkd.in/eVw49P4B

this is not a cursor issue; cursor itself is fine. the risk comes from community-shared rules files on github that people copy and paste without inspecting.

stay safe out there