Hacker News
by pksebben 115 days ago
Guidance and alignment are usually handled by RLHF (reinforcement learning from human feedback), which actually updates the model's weights, so it becomes near-impossible for the model to have certain kinds of 'thoughts'. The behavior is baked into the parameters themselves, so it's not something you can just extract or turn off.
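Here's a toy sketch of the "baked in" point. It rests on heavy simplifying assumptions. The "model" is just three logits over a made-up vocabulary. A hand-written `reward` function stands in for a learned reward model. The update is plain REINFORCE, not the PPO-with-KL-penalty setup commonly described for RLHF. The names `VOCAB`, `reward` and `train` are hypothetical. After training, the penalized token is rare because the weights themselves moved. There is no separate filter or flag you could flip to get the old behavior back.

```python
import math
import random

# Hypothetical toy vocabulary; "forbidden" stands in for a penalized kind of output.
VOCAB = ["helpful", "neutral", "forbidden"]


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(token):
    """Hypothetical hand-written reward standing in for a learned reward model."""
    return -1.0 if token == "forbidden" else 1.0


def train(logits, steps=2000, lr=0.1, seed=0):
    """Plain REINFORCE on a single softmax 'policy' (no KL penalty, no baseline)."""
    rng = random.Random(seed)
    logits = list(logits)
    for _ in range(steps):
        probs = softmax(logits)
        # Sample one output from the current policy and score it.
        i = rng.choices(range(len(VOCAB)), weights=probs)[0]
        r = reward(VOCAB[i])
        # Gradient of log p_i with respect to logit j is (1[j == i] - p_j).
        for j in range(len(logits)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * r * grad
    return logits


if __name__ == "__main__":
    before = [0.0, 0.0, 0.0]
    after = train(before)
    # The trained "model" is only numbers; the aversion lives in them.
    for tok, p0, p1 in zip(VOCAB, softmax(before), softmax(after)):
        print(f"{tok:>9}: {p0:.3f} -> {p1:.4f}")
```

In this toy the change can still be undone by retraining. The sketch only shows that the behavior ends up encoded in the parameters rather than in a separable component.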