The obstacle is supposed to be there and is supposed to be respected as an implicit order. Getting around it without extremely explicit instructions is an alignment problem.
I guess what I'm getting at is that it's not necessarily model alignment.
It may be more of a product alignment thing, where the fix may be making the context clearer, since it was violating an implicit agreement in order to achieve the explicit instructions it received. So the fix may involve a lot more (and better) context.
But then also, to the extent that the fix does NOT involve better context, it seems like it hits the zone where alignment issues are really capability/intelligence issues. That doesn't make them not-alignment, but it does mean "alignment" doesn't give off quite the right vibe, since the issue is that it's too dumb / has no common sense / can't make good judgments (general issues the models have across the board).