> An LLM could try to do something along those lines if either it gets fine-tuned to do it, or somebody instructs it to do it.
Long-horizon RL incentivizes LLMs to learn power-seeking, lying, and other unethical behaviors as ways to achieve goals. The reason Anthropic is winning right now is that they are the most openly worried about this, and the best AI engineers understand it to be an issue and care about it.