Hacker News new | ask | show | jobs
by lhl 1101 days ago
That's not true/a hallucination, you can check the /moderations endpoint in your browser's console. I am running into the exact same truncation issue, but the moderation return is:

""" { "flagged": false, "blocked": false, "moderation_id": "modr-7TPN7eXsOEZkd6kCjz7fSGvtLcoM1" } """

Note, when I ask it to replace fear with "smile" or "hate", both of those work. I suspect it must be tokenization issue of some sort.

Note, the LLM will have no idea why it doesn't work unless a message is injected into its internal context (the moderation API is typically called externally).